Site Reliability Engineering (SRE) is a discipline that combines software engineering and operations to improve the reliability and performance of complex systems, including IT infrastructure, applications, and databases. It focuses on building and maintaining scalable, reliable, and efficient systems. By reducing downtime, improving system resilience, and enabling faster issue detection and resolution, SRE contributes to a better customer experience, higher user satisfaction, and greater revenue potential. However, SRE has traditionally been limited to tech teams. This disconnect can cost companies dearly, because business objectives end up far removed from the digital infrastructure that users interact with directly. There is therefore a need to align business and tech teams around a Reliability Strategy that benefits the entire organization.
In this blog series, we highlight the importance of discussing SRE from a business perspective and what you stand to gain from it. In our previous blogs, we discussed the specific ways in which SRE impacts the business and how to measure that impact.
Building a solid Reliability strategy hinges on creating a common language between IT and Business Performance teams. Two things must be established to ensure a fertile foundation for this strategy:
- Gain clarity on how IT systems support and deliver business value through value stream mapping, customer journey mapping, and service mapping.
- Create an aligned measurement of business and IT performance through SLOs and SLIs.
Understanding how IT systems support business goals:
To align SRE practices with business objectives, it is crucial to understand how the system's components contribute to or hinder the customer's experience. This can be achieved by mapping system components to the impact they have on the user. Value stream mapping and customer journey mapping are two practical ways to do this.
Value stream mapping involves understanding how individual components as well as the overall system support and deliver business value. For example, in the case of an e-commerce website, the value stream includes elements like searching for products, ordering, and payment.
If any one of these three steps fails, customers may be unable to purchase the product, and revenue decreases as a consequence.
By using Value Stream and User Journey mapping, we can create a starting point for understanding how systems deliver business value and establish a shared understanding and language between Business and IT stakeholders.
For each step in the customer journey, corresponding services work to support those actions. Service mapping involves identifying and mapping these services to understand how they contribute to the overall business value.
For example, in the case of the search function, relevant recommendations may be produced by multiple services. These could include a service that uses past purchase history to create a personalized list of recommendations, or an analytics service that showcases the most popular products in the search category. Mapping out these services helps create a comprehensive view of how systems deliver business value and makes it possible to measure business and IT performance.
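A service map of this kind can be written down as a simple lookup structure. The following is a minimal sketch, assuming the e-commerce journey steps described above; the service names and the helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical service map for the e-commerce example.
# Journey steps come from the text; all service names are illustrative assumptions.
JOURNEY_TO_SERVICES = {
    "search": ["search-index", "recommendations", "popularity-analytics"],
    "order": ["cart", "inventory"],
    "payment": ["payment-gateway", "fraud-check"],
}


def services_for(step, mapping=JOURNEY_TO_SERVICES):
    """Return the services that support a journey step (empty list if unknown)."""
    return list(mapping.get(step, []))


def impacted_steps(failed_service, mapping=JOURNEY_TO_SERVICES):
    """Return the journey steps that depend on the given service."""
    return [step for step, services in mapping.items() if failed_service in services]


if __name__ == "__main__":
    print(services_for("search"))
    print(impacted_steps("payment-gateway"))
```

Reading the map in both directions is the point: Business stakeholders can ask which services sit behind a journey step, and IT stakeholders can ask which journey steps are at risk when a service degrades.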
Read about how you can use mapping to create a common language between IT and business teams here.
Creating aligned measurements:
Traditionally, aligning business and expected IT performance is done by defining non-functional requirements. Non-functional requirements refer to the aspects of a system that are not related to its specific functionality but rather to its overall performance, security, reliability, and other characteristics. However, there is merit in challenging this approach. Non-functional metrics are often far removed from the customer's actual experiences, and therefore, don't provide clear insight into actionable improvements.
Service Level Indicators (SLIs) and Service Level Objectives (SLOs) can help us get closer to aligning the two.
Service Level Indicators (SLIs) are point-in-time measurements of a specific technical quality of a service, indicating how the service is performing. SLOs define the desired quality of service over a given period of time. By aligning business objectives with meaningful SLIs and SLOs, organizations can ensure an aligned vision for the system and overall better performance.
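To make the distinction concrete, here is a minimal sketch that treats an SLI as the fraction of "good" requests in a measurement interval and an SLO as a target for that fraction over a longer period. The `Measurement` type, the function names, the sample numbers, and the targets are all assumptions made for illustration, not a prescribed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measurement interval for a service (hypothetical structure)."""
    good: int   # requests that met the quality bar in this interval
    total: int  # all requests in this interval


def sli(m: Measurement) -> float:
    """Point-in-time SLI: fraction of good requests in one interval."""
    # Assumption: an interval with no traffic counts as fully good.
    return m.good / m.total if m.total else 1.0


def period_sli(measurements: list[Measurement]) -> float:
    """SLI aggregated over a period, weighted by request volume."""
    good = sum(m.good for m in measurements)
    total = sum(m.total for m in measurements)
    return good / total if total else 1.0


def meets_slo(measurements: list[Measurement], target: float) -> bool:
    """True if the aggregated SLI reaches the SLO target for the period."""
    return period_sli(measurements) >= target


if __name__ == "__main__":
    # Illustrative numbers only, not real data.
    checkout = [Measurement(990, 1000), Measurement(1998, 2000), Measurement(500, 500)]
    print(period_sli(checkout))
    print(meets_slo(checkout, target=0.995))
```

Aggregating by request volume rather than averaging the per-interval SLIs is a design choice: it keeps a quiet interval from weighing as much as a busy one, which tends to track what customers actually experienced.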
It is important to define SLIs and SLOs that are meaningful: they should be relevant indicators of service quality and act as early warning signals of potential business impact. Used this way, SLOs serve as proxies for meeting business objectives, which is what aligns business and IT performance. Read more about SLOs and SLIs here.
So, the cornerstones of a well-defined Reliability Strategy lie in understanding how IT systems deliver Business Value and creating an aligned measure of Business and IT performance. These cornerstones can be further broken down into smaller building blocks, outlining and connecting several concepts that need to be considered.
What are our goals?
The Business Strategy and the corresponding Business Objectives must be clear to both Business and IT stakeholders to ensure successful interdepartmental collaboration.
Where do we play?
Knowing how IT systems deliver Business Value helps companies determine where reliability should be a focus along the Value Streams and User Journeys.
How do we win?
Creating an aligned measure of Business and IT performance helps us determine how we can use IT support to achieve our Business Goals.
Many capabilities are required to succeed, including observability of the systems and their components. Clarifying the required capabilities up front ensures we are prepared to receive the insight we need.
Beyond the business and technical aspects, the supporting elements of SRE implementations also need to be defined. For example, training Business and IT stakeholders and engineers may be necessary to ensure smooth performance.
Building on these cornerstones, organizations should be able to create a Reliability Strategy, which we believe is essential to delivering successful and sustainable SRE implementations with meaningful, measurable, tangible results from both a technical and a business perspective.
Interested in learning more about site reliability engineering? Check our Site Reliability Engineering service page.