Editor’s Note: This post was originally published on July 15, 2021. It has been updated to ensure content is accurate and links are current and was republished on Dec. 8, 2022.
Application performance monitoring (APM), observability, and real user monitoring (RUM) are three closely related approaches mobile teams are leveraging today to better understand how their apps are performing, and where improvements can be made.
And, as users invest more of their time into mobile offerings and demand flawless experiences every time, APM, observability, and RUM are being used to identify and eliminate critical issues that send users searching for alternatives.
In this post, we’ll cover the following approaches, their strengths and weaknesses, and how they provide much-needed visibility to mobile teams:
- Mobile APM
- Mobile RUM
- Mobile observability
Mobile APM is the most basic monitoring necessary for a team to know the health of their application. This monitoring system provides key insights and information about an application’s performance and usage patterns.
Performance monitoring is different from error tracking, which only reports errors and stability issues in your mobile app. An easy way to picture the difference: error tracking is like acute pain, the signal that something has already gone terribly wrong. Performance monitoring, on the other hand, is like gauging how healthy a person is by timing how long they can run on a treadmill and comparing that time to a benchmark. You might see that they’re doing worse than average, but you won’t know exactly what is causing them to do poorly.
Strengths of mobile APM
The primary strength of mobile APM is to help the mobile team understand the performance of the app before the errors start piling up. Knowing how the app is currently performing can help the team focus on the metrics that matter and give them actionable KPIs to improve upon.
The following are some common metrics that mobile APMs collect:
- Startup time
- Network call duration and payload size
- Network errors
- Timing of custom traces (e.g. add to cart, purchase)
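As a rough illustration of the last item, a custom trace can be as simple as a timer wrapped around a key interaction. The sketch below is an assumption for illustration only, not the API of any particular APM SDK; the `trace` helper, the `recorded_traces` store, and the `add_to_cart` name are all hypothetical.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store; a real SDK would batch and upload these.
recorded_traces = []

@contextmanager
def trace(name, clock=time.monotonic):
    """Record how long the wrapped block takes and whether it raised."""
    start = clock()
    outcome = "success"
    try:
        yield
    except Exception:
        outcome = "error"
        raise
    finally:
        recorded_traces.append(
            {"name": name, "duration_s": clock() - start, "outcome": outcome}
        )

# Usage: wrap the interaction you care about.
with trace("add_to_cart"):
    time.sleep(0.01)  # stand-in for the real work
```

The clock is injectable so the timing logic can be exercised in tests without real delays.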
Mobile APM solutions can point out regressions in aggregate metrics over time. Teams can set up and test the timing of key interactions within the app so that when it goes into production, they know the expected performance. Any significant deviations probably warrant an investigation.
The expression “where there’s smoke, there’s fire” comes to mind. Mobile APMs can indicate when an issue may have surfaced so mobile teams know when to dive in.
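To make "significant deviation" concrete, here is a minimal sketch of a regression check on an aggregate metric. It assumes the team compares the 95th-percentile timing of a new release against a baseline; the `is_regression` name and the 20% tolerance are illustrative choices, not a recommendation from any specific tool.

```python
from statistics import quantiles

def p95(samples):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]

def is_regression(baseline_ms, current_ms, tolerance=0.20):
    """Flag the release when its p95 exceeds the baseline p95 by more than `tolerance`."""
    return p95(current_ms) > p95(baseline_ms) * (1 + tolerance)

# Usage with made-up startup times in milliseconds.
baseline = list(range(900, 1100, 10))
slower_release = [t * 2 for t in baseline]
print(is_regression(baseline, slower_release))
```

A check like this only says that something moved. It says nothing about which users were affected or why, which is the gap discussed next.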
Weaknesses of mobile APM
Unfortunately, identifying that failures are happening within a mobile application is not the same as providing the context needed to solve them. The biggest weakness mobile APM faces is an overreliance on metrics and logs.
For example, if the mobile team discovered a key endpoint suddenly had a large increase in response time, the ultimate impact on the user could be trivial, or it could be huge. Without individual session context to know, for example, whether or not exits are caused by users force-quitting the app, the mobile team is forced to guess whether a regression deserves immediate attention based on flimsy metrics alone.
To drive this point home, let’s go through just a few of the many scenarios that might cause a network response time to suddenly balloon.
It happens when users are attempting a purchase. For example:
- Users might be upset, but ultimately complete the purchase.
- Users could cancel the purchase and force quit the app, choosing to use a competitor’s offering instead.
It happens when users are uploading a data-heavy video. For example:
- The call results in a connection error when it takes so long that it times out. Users, unfazed, try the upload again and again until it’s successful.
- A slow call results in a crash. Users are upset at having to go through the upload process again and decide against it, ultimately sending app engagement into a nosedive.
It happens when users launch the app. For example:
- Users hate the long load time but patiently wait for the app to become interactive.
- The users just background the app and uninstall it.
Because the metric isn’t tied to individual user journeys, mobile teams cannot effectively prioritize issues without guesswork. Tracking indicators instead of actual user experiences will always have this visibility gap. And for that reason, many are turning to mobile real user monitoring (RUM).
Mobile RUM, aka End User Monitoring
Mobile real user monitoring (RUM), also known as end user monitoring, is a monitoring system which provides key insights and information about an end user’s usage patterns and experience with the application. It’s an extension of performance monitoring that captures and analyzes transactions at the individual user level. Thus, it’s designed to gauge the underlying user experience, including key metrics like load time and transaction paths, and it’s a logical expansion of mobile APM’s capabilities.
Let’s expand on our analogy of monitoring a person’s health:
- In mobile APM we determine how healthy a person is by timing how long they can run on a treadmill. We can get a general sense of their health this way, but we won’t be able to tell exactly why they are underperforming.
- In mobile RUM we instead get a list of every action they take and how long it takes to complete them. For example, we know if they went into an ice cream store and spent 30 minutes eating ice cream. Thus, we have more insight into how the person’s actions ultimately affect their health.
Mobile teams will want to implement mobile RUM when the user’s experience with the app is starting to take priority over simple performance tracking. Remember: just because your app is “healthy” and doesn’t crash or freeze doesn’t mean your users are enjoying the experience.
Strengths of mobile RUM
Mobile RUM is frequently added as a mobile counterpart to existing backend monitoring tools like AppDynamics, Dynatrace, and Datadog. Teams can thus extend their existing coverage to include device-side visibility. The goal is to track transactions from the device through the entire backend to discover shortcomings.
Here’s what a good mobile RUM solution tracks:
- Session-level context: breadcrumbs of user actions (e.g., taps, scrolls) and screens visited
- Device-level information: device, OS, version, region, etc.
- Network-level information: network calls, network quality, etc.
Mobile teams can be notified of a spike in a given event (e.g., network call, trace) and investigate a sample of sessions to gain context.
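The session, device, and network context above, and the idea of pulling a sample of sessions for a spiking event, can be sketched roughly as follows. The `Session` fields and the `sample_sessions_with_call` helper are hypothetical and do not mirror any vendor’s data model.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Session:
    session_id: str
    device: str
    os_version: str
    region: str
    breadcrumbs: list = field(default_factory=list)    # e.g. "tap:checkout"
    network_calls: list = field(default_factory=list)  # (path, status, duration_ms)

def sample_sessions_with_call(sessions, path, k=5, seed=0):
    """Return up to k randomly chosen sessions that made a call to `path`."""
    matching = [s for s in sessions if any(c[0] == path for c in s.network_calls)]
    rng = random.Random(seed)  # seeded so the same sample can be re-pulled
    return rng.sample(matching, min(k, len(matching)))

# Usage with made-up data: 10 sessions, every other one hits /checkout.
sessions = [
    Session(f"s{i}", "Pixel 6", "13", "US",
            network_calls=[("/checkout" if i % 2 == 0 else "/feed", 200, 120)])
    for i in range(10)
]
picked = sample_sessions_with_call(sessions, "/checkout", k=3)
```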
Weaknesses of mobile RUM
Mobile RUM is typically an additional feature built out from a backend monitoring solution. While it functions on a basic level for mobile teams to gain insight into user actions within a given session, mobile RUM tools don’t provide the true session replay necessary for debugging individual issues.
Mobile has so many variables that every single device is unique. When you factor in differences between the OS, app version, region, connectivity, and device state, each user experience has incredibly high cardinality. Thus, the key to uncovering the root cause of issues is diving into high-fidelity data from the individual affected session. Instead of relying on broad coverage like breadcrumbs and screens, mobile teams need the ability to immediately reproduce the full technical and behavioral details of any session.
Let’s go through a few issue types that a mobile RUM solution would not provide visibility into.
Issues spanning multiple sessions
In mobile, users jump in and out of apps all the time. A unified user experience could span several foreground sessions. As such, with monitoring solutions that are not built to collect and stitch together nearby sessions, mobile teams will not be able to pinpoint the root cause when the failure originates in a previous session.
One of the largest mobile e-commerce apps in the world had a crash that affected 1% of users every release. It existed for years since the app’s first submission to the app store. The mobile team knew the impact on the business but could not discover what the cause was. The problem stemmed from a failing third-party network call in a previous session. It took a mobile-first platform like Embrace to provide the visibility needed to solve it.
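One way to picture "stitching together nearby sessions" is to group each user’s sessions into chains whenever one session starts shortly after the previous one ends. This is a minimal sketch under assumed field names (`user_id`, `start`, `end`, in seconds) and an assumed five-minute gap. It is not a description of how Embrace or any other platform implements stitching.

```python
from itertools import groupby

def stitch_sessions(sessions, max_gap_s=300):
    """Group each user's sessions into chains where a session starts
    within max_gap_s of the previous session ending."""
    chains = []
    ordered = sorted(sessions, key=lambda s: (s["user_id"], s["start"]))
    for _, user_sessions in groupby(ordered, key=lambda s: s["user_id"]):
        chain = []
        for s in user_sessions:
            if chain and s["start"] - chain[-1]["end"] > max_gap_s:
                chains.append(chain)  # gap too large: close the current chain
                chain = []
            chain.append(s)
        chains.append(chain)
    return chains

# Usage with made-up sessions.
sessions = [
    {"id": "a", "user_id": "u1", "start": 0, "end": 60},
    {"id": "b", "user_id": "u1", "start": 100, "end": 200},
    {"id": "c", "user_id": "u1", "start": 1000, "end": 1100},
    {"id": "d", "user_id": "u2", "start": 50, "end": 70},
]
chains = stitch_sessions(sessions)
```

With chains in hand, a crash in session "b" can be investigated alongside whatever happened in session "a".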
Exceeding resource limits
Mobile apps crash when they exceed system resource limits. Having a complete replay of the technical details of a session includes knowing exactly when there is CPU pegging, a low memory warning, low battery, etc. Knowing what leads up to these bad device states can both pinpoint optimization opportunities and prevent app failures.
For example, e-commerce and social media apps frequently load large amounts of photos and videos, leading to out-of-memory (OOM) exceptions that kill the app. OOMs tend to occur an order of magnitude more frequently than traditional crashes. A 99% crash-free app with a 97% OOM-free rate really has an effective crash-free rate of about 96%, because users experience an OOM kill the same way they experience a crash.
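The arithmetic behind that 96% figure depends on how crashes and OOMs overlap across sessions, which the numbers alone don’t tell us. The sketch below works through two assumptions: the failures never hit the same session (disjoint), or they occur independently. The function name and the `overlap` parameter are illustrative.

```python
def effective_crash_free_rate(crash_free, oom_free, overlap="disjoint"):
    """Combine a crash-free rate and an OOM-free rate into one failure-free rate."""
    if overlap == "disjoint":
        # No session has both a crash and an OOM, so the failure rates add up.
        return 1 - ((1 - crash_free) + (1 - oom_free))
    if overlap == "independent":
        # A session is failure-free only if it avoids both kinds of failure.
        return crash_free * oom_free
    raise ValueError(f"unknown overlap assumption: {overlap}")

disjoint = effective_crash_free_rate(0.99, 0.97)
independent = effective_crash_free_rate(0.99, 0.97, overlap="independent")
print(f"{disjoint:.2%} {independent:.2%}")
```

Either way, the result lands at roughly 96%, well below what the crash-free number alone suggests.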
A mobile RUM solution offers no context into these issues. If the network calls that download photos all complete quickly and successfully, there’s no indication of a problem. The mobile team needs insight into the device state at all times during a session to spot these types of failures.
Unclassified crash types
There are several crashes that are difficult for traditional monitoring solutions to classify, including Watchdog terminations, Auto Layout exceptions, and CollectionView crashes. When users complain about such a crash, the mobile team either doesn’t get a stack trace or doesn’t get a useful one. Mobile RUM solutions do not provide this mobile-focused depth in their coverage.
Application Not Responding (ANR) is a type of error on Android that predominantly occurs when the main thread of the application is blocked for a minimum of 5 seconds, upon which the user is prompted to terminate the app. However, ANRs do not have to last 5 seconds to impact the user experience. Freezes and stutters happen anytime the main thread is blocked, and these lead to users backgrounding and force quitting the app.
Mobile RUM can track slow network calls, but frozen screens can happen in many ways beyond a single slow network call. If multiple SDKs are initializing at startup, the network calls they fire could resolve quickly yet lead to congestion when processing the data on the device. Likewise, storing or transforming data can be CPU-intensive and result in a frozen screen.
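To show why sub-ANR freezes are worth capturing, here is a rough sketch that scans main-thread "heartbeat" timestamps for gaps. It assumes the app records a timestamp each time the main thread is responsive. That approach, and the 250 ms stutter cutoff, are assumptions for illustration. Only the 5-second ANR threshold comes from the text above.

```python
ANR_THRESHOLD_MS = 5000    # the Android ANR threshold cited above
FREEZE_THRESHOLD_MS = 250  # assumed cutoff for a user-visible stutter

def find_freezes(heartbeats_ms, threshold_ms=FREEZE_THRESHOLD_MS):
    """Return every gap between consecutive heartbeats longer than threshold_ms."""
    freezes = []
    for prev, cur in zip(heartbeats_ms, heartbeats_ms[1:]):
        gap = cur - prev
        if gap > threshold_ms:
            kind = "anr" if gap >= ANR_THRESHOLD_MS else "freeze"
            freezes.append({"start_ms": prev, "blocked_ms": gap, "kind": kind})
    return freezes

# Usage with made-up heartbeat timestamps (milliseconds since session start).
freezes = find_freezes([0, 100, 200, 1200, 1300, 7000])
```

In this made-up session, the one-second freeze would never surface as an ANR, yet it is exactly the kind of stall that prompts a user to background the app.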
Users force quit mobile apps for many reasons. It could be that they are just clearing out their running apps, or it could be a sign of frustration with the app (e.g., slowness, too many ads, frozen screen). A mobile RUM solution cannot provide insights about a spike in user terminations because it’s not built to monitor entire user experiences. In troubleshooting these types of issues, where there’s an absence of information, having complete data from every session is crucial. That way, the mobile team can go from an indicator metric to immediately filtering affected sessions across attributes to spot patterns that lead to causes.
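"Filtering affected sessions across attributes" can be pictured as computing the force-quit rate per attribute value and looking for outliers. The following is a minimal sketch over hypothetical session records. The `force_quit` flag and the attribute names are assumptions.

```python
from collections import Counter

def termination_rate_by(sessions, attribute):
    """Share of sessions ending in a force quit, per value of `attribute`."""
    total = Counter(s[attribute] for s in sessions)
    terminated = Counter(s[attribute] for s in sessions if s["force_quit"])
    return {value: terminated[value] / count for value, count in total.items()}

# Usage with made-up sessions.
sessions = [
    {"app_version": "4.1", "os": "Android 13", "force_quit": False},
    {"app_version": "4.1", "os": "Android 12", "force_quit": False},
    {"app_version": "4.2", "os": "Android 13", "force_quit": True},
    {"app_version": "4.2", "os": "Android 12", "force_quit": True},
]
by_version = termination_rate_by(sessions, "app_version")
by_os = termination_rate_by(sessions, "os")
```

Here the rate differs sharply by app version but not by OS, which points the investigation at the 4.2 release. This only works when the data covers every session rather than a sample.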
Similar to the weakness of mobile APM, mobile RUM can alert teams to issues but doesn’t provide the context necessary to solve them. Mobile teams want a data platform that uncovers both known and unknown issues. Instead of monitoring for specific elements, they want a solution that will show every user-impacting issue, its impact, and provide the context needed to solve it. That’s where mobile observability comes into play.
Observability is a design concept defined as the ability of a system to enable identifying its internal states by analyzing its external outputs. In layman’s terms, you can look at an observable system from the outside and know exactly what’s happening inside. An observable application leverages design and instrumentation to provide insights that enhance monitoring and logging data.
Let’s revisit, one more time, our analogy of monitoring a person’s health:
- In mobile APM we determine how healthy a person is by timing how long they can run on a treadmill.
- In mobile RUM we instead get a list of every action they take and how long it takes to complete them.
- In mobile observability we have a camera following the person around, cataloging where they are and what they’re doing at any given time. In addition, we have a machine hooked up to them that continually sends us their body’s vitals so we know how they react physiologically to a given situation.
In other words, observability is a measure of a system’s ability to enable teams to diagnose what’s happening inside without the need for guesswork or the process of elimination. When observability is integrated into an application, the mobile team can easily gauge its internals and navigate to the root cause of issues faster.
As businesses increasingly turn to mobile as a primary revenue mechanism, the ability to make decisions with high-fidelity data and insights becomes a key competitive advantage. To move fast, it’s crucial to have full visibility into your mobile applications:
Performance and stability
- Are users abandoning the app because the startup is really slow?
- Is the app freezing when users try to make a purchase?
- Are users dropping out of key funnels?
- Whether it’s a crash, slow startup, or failing endpoint, how does the issue affect revenue and churn?
- When a regression happens, how does it affect engagement (e.g., session length, feature use)?
- How does in-app advertising impact the user experience?
- Are users spending more time and money in Feature A as opposed to Feature B?
- Do features have more adoption with specific user segments (e.g., device, OS, region)?
- Does a new feature not perform as well as other parts of the app (e.g., uses too many system resources, runs slow)?
- Are mobile engineers the first to know when something’s wrong?
- When a third-party SDK crashes your app, do you know before the company makes an announcement?
- Can you control precisely when your team is notified for a given issue?
If your current mobile tooling cannot help you answer these questions, then you may need to switch to a more capable data platform.
The only tool built specifically for mobile observability
Embrace is the only mobile-centric data analytics and observability company that empowers enterprises to transform their businesses in a mobile-first world. With actionable data and insights derived from 100% of mobile user experiences, engineering and data science teams proactively uncover, prioritize, and quickly solve issues before they affect a business’s bottom line.
Embrace collects 100% of the data from 100% of user sessions and makes it available to the entire mobile team. With this data:
- Engineering can replay any session to pinpoint the root cause of any issue — even one that doesn’t result in a crash or error log.
- Product can check feature adoption and run experiments to make roadmap decisions based on which parts of the app are getting the biggest return.
- Data Science can run LTV models with complete data and uncover where churn is impacting revenue the most.
- QA can easily test new app versions and send the associated session data to engineers without the need to create manual bug reports.
- CS can triage user complaints by looking up individual user sessions to see if the issue was with the code, the user, or the network.
A platform that offers observability comes with all the benefits of mobile APM and mobile RUM plus so much more, including:
- 100% of the technical and behavioral data from 100% of sessions.
- Timing and outcome of every network call.
- The full user journey (e.g., views/screens/activities, breadcrumbs, and webviews).
- Full user actions (e.g., taps, swipes, scrolls, button presses).
- Automatic crash classification and deduplication.
- Advanced ANR detection and solving, through the capture of stack traces across every frozen interval.
- Device state (e.g., CPU, memory, battery).
- Error logging with filtering by key-value pairs.
- Timing and abandonment tracking for custom traces.
For teams that lack the visibility needed to drive business decisions, observability is the answer.
Learn how Embrace can help you achieve mobile observability by booking a demo today.