This article was originally published on The New Stack.
Before Observability 2.0 practices became commonplace, site reliability engineers (SREs) ensured their backend systems behaved as expected by monitoring key metrics. Apps would be instrumented so that real-time performance metrics were reported and aggregated in production, with outliers surfaced through dashboards and alerts.
You monitored performance in terms of percentiles, and if, for example, the P99 of execution time in a key service spiked, it might be time to investigate.
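To make the percentile idea concrete, here's a minimal sketch of how a P99 might be computed from raw latency samples. The function name `p99` and the nearest-rank method are illustrative choices, not a reference to any particular monitoring tool:

```python
import math

def p99(samples):
    """Return the 99th-percentile value of a list of latency samples
    using the nearest-rank method."""
    ordered = sorted(samples)
    # nearest-rank index for the 99th percentile
    idx = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return ordered[idx]

# e.g., execution times in milliseconds for one service
latencies_ms = list(range(1, 101))  # 1 ms .. 100 ms
print(p99(latencies_ms))  # → 99
```

In practice, a monitoring backend computes this continuously over sliding windows and fires an alert when the value crosses a threshold; the spike in P99 is what tells the SRE it's time to investigate.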
Now, while I’m not going to argue that the previous Observability 1.0 approach was the be-all and end-all, I think we can all agree it worked. Or at least, it got the job done, for some definition of the job.
But then, why wasn’t this approach adopted by mobile teams? After all, mobile is one of the most complicated ecosystems, and poor app performance and frustrating user experiences are leading contributors to brand erosion.
If measuring app performance with P99s worked for backend systems, shouldn’t it work for mobile apps?
In short, no. I’ll cover why, and what a better approach looks like, but first it’s important to understand why Observability 1.0 never took off for mobile teams.