Here’s a scenario. Your infrastructure is humming along beautifully. Uptime is 99.9%. Your SLOs are green across the board. Your error budget is intact and no one has been paged in two weeks. You are, by every traditional measure of reliability, crushing it.
And yet... users are abandoning your app after two seconds on the loading screen. Scrolling feels choppy. Taps on buttons feel laggy. The app crashes, not often, but just often enough that a certain percentage of your users – the ones who’ve experienced it – have uninstalled and moved on.
You are, simultaneously, completely reliable and completely failing your users.
