This article was originally published on The New Stack on October 24th.
Service-level objectives (SLOs) are a familiar concept for DevOps professionals and site reliability engineers (SREs), as they are crucial for monitoring system health and sounding the alarm when something is wrong. While SLOs have traditionally been the domain of backend engineering, they are just as valuable for mobile teams, helping them keep apps highly performant and decide how to prioritize feature work against reliability work.
For many organizations, however, mobile SLOs are a new, and sometimes intimidating, endeavor. But they don’t have to be! Any team can effectively adopt SLOs as part of their mobile observability strategy by following a few best practices. Here are five tips for designing highly effective mobile SLOs.
1. Think in terms of end-to-end user experiences
For those coming from a DevOps background, it might be tempting to translate familiar concepts about endpoint availability, latency, etc., directly into mobile. But you’ll have to shift your thinking when building SLOs for mobile to look at end-user experiences in their entirety. You’ll want to build your SLOs around an end-to-end flow or activity that you’re trying to optimize, such as a login or search process, rather than on the individual technical components that make the flow happen – like screen renderings, API calls, etc. The technical actions are events within your SLO that, with the right tooling, can be isolated when it’s time to troubleshoot an issue, but they shouldn’t be the ultimate focus.
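To make the difference concrete, here is a minimal Python sketch that scores a login flow as a whole and, separately, each technical step inside it. The step names (`render_screen`, `auth_api_call`, `store_token`) and the attempt data are hypothetical, invented only for illustration; real flows and event schemas will differ.

```python
STEPS = ("render_screen", "auth_api_call", "store_token")  # hypothetical step names


def make_attempt(failed_step=None):
    """Build one login attempt: steps are recorded in order until one fails."""
    steps = {}
    for step in STEPS:
        steps[step] = step != failed_step
        if step == failed_step:
            break  # later steps never run once a step has failed
    return steps


def flow_success_rate(attempts):
    """Share of attempts in which every recorded step of the flow succeeded."""
    if not attempts:
        return 0.0
    completed = sum(1 for steps in attempts if all(steps.values()))
    return completed / len(attempts)


def step_success_rate(attempts, step):
    """Share of attempts that reached the named step and succeeded at it."""
    outcomes = [steps[step] for steps in attempts if step in steps]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Made-up data: 85 clean logins, plus 5 failures at each of the three steps.
attempts = [make_attempt() for _ in range(85)]
attempts += [make_attempt(step) for step in STEPS for _ in range(5)]

login_rate = flow_success_rate(attempts)
step_rates = {step: step_success_rate(attempts, step) for step in STEPS}
```

In this made-up data set every individual step succeeds in roughly 94% to 95% of the attempts that reach it, yet only 85% of login attempts complete. An SLO defined on the flow surfaces that gap, while the per-step rates remain available for troubleshooting.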
2. Measure user impact numbers, not just incidents
Events that happen on mobile can have an unexpected level of impact across your user base, both above and below what you might anticipate. That’s because mobile data is largely shaped by the concept of unique users and unique sessions, whereas backend data is not. If, for example, you notice 1,000 instances of a certain type of failure, how would you know how those instances are distributed across your users? Did 1,000 unique users each experience the failure once, or did one unfortunate user experience the failure 1,000 times?
If you’re only measuring incident counts, it’s impossible to know this.
As a result, you may be sounding the alarm for SLO violations too readily or not readily enough. To decide how to prioritize your response to an SLO violation, look at both the number of events and the number of unique users affected.
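The 1,000-failure example can be sketched in a few lines of Python. This is an illustrative summary function, not a prescribed method; the user IDs and the figure of 50,000 active users are assumptions made up for the example.

```python
from collections import Counter


def failure_impact(failure_events, total_users):
    """Summarize failures by event count and by unique users affected.

    failure_events: iterable of user IDs, one entry per failure event.
    total_users: unique active users in the same time window (assumed known).
    """
    per_user = Counter(failure_events)
    affected = len(per_user)
    return {
        "failure_events": sum(per_user.values()),
        "affected_users": affected,
        "affected_user_share": affected / total_users if total_users else 0.0,
        "max_failures_for_one_user": max(per_user.values(), default=0),
    }


# Two hypothetical windows, each containing exactly 1,000 failure events.
spread_out = [f"user-{i}" for i in range(1000)]  # 1,000 users, one failure each
concentrated = ["user-0"] * 1000                 # one user, 1,000 failures

spread_summary = failure_impact(spread_out, total_users=50_000)
concentrated_summary = failure_impact(concentrated, total_users=50_000)
```

Both summaries report the same event count, but `affected_users` is 1,000 in the first case and 1 in the second, which is the distinction an incident count alone cannot show.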
3. Identify the user flows that have the biggest impact on business objectives
Ultimately, the purpose of SLOs is to prioritize and direct technical work so that it serves your business. That’s why, when you consider which SLOs to build for mobile, it’s crucial to pinpoint the user flows that have the largest impact on your business. That way, everyone understands why a violation of the SLO will take priority over other work.
Start with the most direct, obvious indicators of business impact. For example, your customers not being able to successfully check out on the app will directly affect your sales. An issue with push notifications, on the other hand, might contribute to a gradual decline in revenue, but it sits far enough back in the sales funnel that a disruption to this functionality shouldn’t make your engineers drop everything to fix it.
4. Avoid sampling
One of the big challenges with mobile data is that there is a lot of it. You may be used to sampling data that feeds into backend SLOs to reduce data processing and storage costs, and this makes sense. After all, you’re dealing with a predictable environment composed of a limited number of device types and other fairly stable variables.
When it comes to mobile, these assumptions do not hold. There are nearly endless permutations of device types, operating systems, app versions, networking conditions, local infrastructure, etc. This means that sampling the data you feed into SLOs will almost guarantee you’re missing key visibility.
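A rough calculation shows why. Assuming each session is kept independently with the same probability (simple uniform sampling, which is an assumption here and not a description of any particular tool), the chance that a sample contains none of the sessions from a small affected segment can be sketched in Python as follows.

```python
def chance_sample_misses_all(affected_sessions, sample_rate):
    """Probability that uniform per-session sampling keeps none of the
    affected sessions, assuming each session is sampled independently."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return (1.0 - sample_rate) ** affected_sessions


# Hypothetical segment: one device/OS/app-version combination with
# 20 failing sessions in the window being evaluated.
miss_at_1_percent = chance_sample_misses_all(20, 0.01)
miss_at_10_percent = chance_sample_misses_all(20, 0.10)
miss_unsampled = chance_sample_misses_all(20, 1.0)
```

Under these assumptions, a 1% sample misses all 20 failing sessions about 82% of the time, and even a 10% sample misses them about 12% of the time. With unsampled data the probability is zero.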
5. Define the population you really care about
How do you analyze a mountain of high-cardinality data if you’re not sampling? And what does that mean, practically speaking, for your mobile SLOs? This quagmire can be largely solved by hyper-focusing on the populations you really care about, and doing so touches on the suggestion we made in point three around business objectives.
Consider, out of all the people using your app, which groups are responsible for the majority of revenue. Depending on your business model, it might mean paying customers vs. free-trial users. Or, it might mean people who are on the latest version of your app rather than laggards. Or, it may even be those living in certain geographic markets who are driving 80% of purchases in your app.
The point is, it’s impossible to strive for a perfect experience for all users all the time – you’d spend all your time on reliability and none of it on innovation. However, if you can isolate certain audiences that are business-critical, you can refine your mobile SLOs and their resulting error protocols so that you limit the disruption to other important engineering work when reliability becomes an issue.
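As an illustration, the Python sketch below evaluates the same checkout SLO twice: once over all users and once over a business-critical population. The `Session` fields, the 99% target, and the sample data are all hypothetical; which attributes define your population depends on your own business model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str
    plan: str            # hypothetical attribute: "paid" or "free_trial"
    checkout_ok: bool    # did the checkout flow complete in this session?


def cohort_slo(sessions, in_population, target):
    """Evaluate a user-based SLO over only the sessions in the population."""
    cohort = [s for s in sessions if in_population(s)]
    users = {s.user_id for s in cohort}
    if not users:
        return None  # nothing to evaluate for this population
    failed_users = {s.user_id for s in cohort if not s.checkout_ok}
    attained = 1 - len(failed_users) / len(users)
    return {"users": len(users), "attained": attained, "met": attained >= target}


# Made-up sessions: four paying users check out fine, two trial users fail.
sessions = [Session(f"u{i}", "paid", True) for i in range(1, 5)]
sessions += [Session(f"u{i}", "free_trial", False) for i in range(5, 7)]

all_users = cohort_slo(sessions, lambda s: True, target=0.99)
paying_users = cohort_slo(sessions, lambda s: s.plan == "paid", target=0.99)
```

Here the all-users SLO is violated while the paying-customer SLO is met, so a team that has scoped its SLO to paying customers would not be pulled off other work by this particular failure.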
Continuously iterate and learn
There’s much more at play when working through your SLO strategy, as every app is unique when it comes to its user base, product goals and revenue structure. The above tips are applicable across nearly all cases, but you should always consider your unique customer and business needs, and map out measurements accordingly.
One of the great things about SLOs, especially for mobile where there are not yet any “universal standards” or stringent expectations, is the ability to iterate. Don’t be afraid to start measuring and iterating again and again, as you better understand what your app’s performance benchmarks are and what levels of failure your users are realistically willing to tolerate.
Maintaining strong app performance is a long-term endeavor, so treat SLOs as a tool to guide you along.
If you’d like an in-depth exploration of mobile SLOs, including more detailed best practices, examples, and templates, download Embrace’s free mobile SLO guidebook.