Solving nativePollOnce ANRs with the right tooling

Android engineers often struggle with nativePollOnce ANRs in the Google Play Console. These incredibly common issues can account for up to 60% of ANRs, yet are nearly impossible to solve with the limited data provided by GPC. In this blog, we'll take a closer look at this type of ANR and how more specialized tools, like Embrace, can help engineers resolve it.

The nativePollOnce error is one of the most common types of Application Not Responding (ANR) errors that Android users and developers encounter.

In fact, nativePollOnce is so common, we’ve seen customers with upwards of 60% of all their ANRs attributed to nativePollOnce in their Google Play Console (GPC).

Unfortunately, while GPC is able to label the error for you, it gives very little information to help mobile developers resolve the root cause.

Better tooling can help, and in this blog we’ll show you how it makes it possible to fully understand and resolve issues related to nativePollOnce.

What is a nativePollOnce ANR?

ANR refresh

Before we dive into the details of nativePollOnce issues, let’s do a quick refresh on what ANRs are more broadly. 

ANRs, or “application not responding” errors, happen when the UI thread of an Android app is blocked for too long. As a result, the app’s UI appears frozen to the end user and the system sends an error that reads “application not responding” to the user. 

Google Play Store closely monitors apps’ ANR rates, and officially counts when an app becomes unresponsive for at least 5 seconds. As Android developers know all too well, their apps are penalized by Google if their rate of ANRs is above a certain threshold, leading to lower rankings and discoverability. 

There can be many reasons for an ANR. Some common causes, as outlined by Android, include input dispatch timeouts, broadcast receiver timeouts, execute service timeouts, and content-provider-not-responding errors.

nativePollOnce ANRs

The Google Play Console and Android’s Developer Resources pages offer quite a bit of guidance around solving some of the common causes of ANRs, such as those mentioned above. Unfortunately, when it comes to nativePollOnce issues, there’s not much help to be offered. 

Android’s Developer Resources call nativePollOnce errors “Mystery ANRs,” and explain that they are often due to the unresponsive thread being idle and waiting for looper messages. The reasons for the unresponsive thread being idle include:

  • Thread misattribution, whereby the thread used to build the ANR signature wasn’t actually the unresponsive thread that caused the ANR.
  • A late stack dump, whereby the thread recovered in the short period of time between when the ANR started and when GPC received the stack traces.
  • A system-wide issue, whereby the process wasn’t scheduled due to a heavy system load or an issue in the system server.

Aggregate view of nativePollOnce ANRs in the Google Play Console

It's common for apps to have 30-60% of ANRs in the nativePollOnce group.

In our experience at Embrace, many of the nativePollOnce ANRs that our customers deal with are indeed rooted in a late stack dump. The reasons for this are twofold.

First, certain phones and Android OS versions take a noticeable amount of time to send a stack dump. On Pixels running Android 13 (Google’s flagship hardware and a recent OS), for example, the latency is around 100ms but can exceed one second, whereas on Android 14 (Google’s current mobile OS) it’s usually under 10ms. If the ANR resolves itself between the moment it is detected and the moment the stack dump is sent, the stack trace doesn’t actually capture the problem.

Second, and related to this latency, GPC defines the threshold of an ANR as five seconds of blockage on the main thread. It does not capture samples of the stack frame before the five-second mark, even though those earlier samples are often the ones that would reveal the origin of the blockage. ANRs can evolve over a few seconds before a sample is taken. By the time five seconds (plus additional latency) have passed, something else might be going on in the stack frame, so the samples arrive too late: they’ve missed the real problem, which may have started two seconds before the stack dump.

The problem with nativePollOnce ANRs

If the stack trace that’s available to you was actually captured too late, then you’ll never get the information about what really caused the ANR.

Google says just about as much in the Play Console if you look into any single nativePollOnce ANR, explaining that the stack traces “do not show the blocking condition that is causing the ANR.”

An unhelpful message from the Google Play Console about nativePollOnce ANRs

And while Google can’t help you resolve these types of issues, they will still punish your app for having them. 

This harsh reality makes nativePollOnce ANRs some of the most frustrating to deal with among Android devs. 

Embrace removes that frustration through better tooling.

Tackling nativePollOnce ANRs with Embrace

Early sampling and higher-volume sampling

Embrace’s ANR resolution tool is built to address these deficiencies in GPC and make difficult ANRs like nativePollOnce easier to analyze.

One way to mitigate the too-late stack dump, for example, is to just do an earlier stack dump. Our SDK will do exactly this when integrated into your app. 

While GPC considers an ANR “triggered” when the main thread has been blocked for five seconds, Embrace’s SDK recognizes it much sooner. The moment the main thread is blocked for one second, the Embrace SDK starts taking samples of the stack frame. This addresses a large part of the problem because it reduces the chance that your stack trace samples have missed that crucial ANR window. 

Additionally, Embrace samples the blocked thread’s stack frame every 100ms for the duration of the ANR. By the time the ANR has either resolved itself or ended in a force-quit by the user, you may have hundreds of samples to analyze. This means that, even if something changes quickly within the call stack, it’s very likely that those different conditions will have been captured at least to some degree across the many samples. You don’t have to rely on just a handful of out-of-date samples taken at five seconds or later.
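
To make the idea of early, repeated sampling concrete, here is a minimal sketch of a watchdog in plain Java. It is not Embrace’s SDK code: the class name StallSampler, its heartbeat() method, and the demo thread are all hypothetical, and a real Android integration would watch the main thread’s message loop rather than a worker thread. The sketch only assumes the one-second threshold and 100ms interval described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: StallSampler and its methods are illustrative names,
// not part of the Embrace SDK or the Android framework.
final class StallSampler {
    private final Thread target;
    private final long thresholdMs;  // how long the target may stay silent before sampling starts
    private final long intervalMs;   // gap between samples while the target stays blocked
    private final AtomicLong lastBeatNanos = new AtomicLong(System.nanoTime());
    private final List<StackTraceElement[]> samples = new ArrayList<>();
    private volatile boolean running = true;

    StallSampler(Thread target, long thresholdMs, long intervalMs) {
        this.target = target;
        this.thresholdMs = thresholdMs;
        this.intervalMs = intervalMs;
    }

    // The monitored thread calls this each time it finishes a unit of work.
    void heartbeat() {
        lastBeatNanos.set(System.nanoTime());
    }

    // Starts a daemon thread that samples the target's stack while it is blocked.
    void start() {
        Thread watchdog = new Thread(() -> {
            while (running) {
                long silentMs = (System.nanoTime() - lastBeatNanos.get()) / 1_000_000L;
                if (silentMs >= thresholdMs) {
                    StackTraceElement[] trace = target.getStackTrace();
                    if (trace.length > 0) {  // empty when the thread is not alive
                        synchronized (samples) {
                            samples.add(trace);
                        }
                    }
                }
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "stall-sampler");
        watchdog.setDaemon(true);
        watchdog.start();
    }

    void stop() {
        running = false;
    }

    // Returns a copy of the samples collected so far.
    List<StackTraceElement[]> samples() {
        synchronized (samples) {
            return new ArrayList<>(samples);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a "main thread" that stalls for 1.5 s without sending a heartbeat.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(1500);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this demo
            }
        }, "fake-main");
        StallSampler sampler = new StallSampler(worker, 1000, 100);
        worker.start();
        sampler.start();
        worker.join();
        sampler.stop();
        System.out.println("samples captured: " + sampler.samples().size());
    }
}
```

Because sampling begins at the one-second mark instead of the five-second mark, the earliest traces are taken while the thread is still blocked on whatever caused the stall, which is exactly the window a late stack dump misses.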

ApplicationExitInfo API and its correlated data

One of the most frustrating things about nativePollOnce errors is that Google provides no context for solving them, and yet still punishes your app when there are too many of them.

What’s more, Google has historically been a closed box in terms of exposing how exactly it captures and flags an ANR. This has led to discrepancies between ANR stats on GPC and stats that other tools, like Embrace, have been able to capture. 

This changed in a big way with the introduction of the ApplicationExitInfo API (AEI API), which was added in Android 11 (API level 30) to let developers understand why an application process died. An ANR termination is one of the reasons it reports.

To make use of this, Embrace scrapes data from the AEI API, so we know when an ANR-induced exit has happened in your app. Using this API, our listing of ANRs mirrors GPC’s with about 90-95% accuracy.

As we’ve established, Embrace takes many more samples (as well as earlier samples) for a specific ANR window than Google. In our dashboard, you can actually use this abundance of data to troubleshoot the ANRs that Google doesn’t give you, because our backend correlates our data with Google’s. 

During the time frame that a session experienced a terminal ANR (according to the AEI API), the Embrace SDK will have captured many samples of the main thread’s stack frame. Our dashboard allows engineers to explore all those samples as “correlated data.” In the case of nativePollOnce ANRs, this is especially useful because we provide a window to look at what was happening on the thread based on our own data collection, whereas Google provides nothing. 
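
The correlation idea can be illustrated with a small sketch. On Android 11 and later, exit records come from ActivityManager.getHistoricalProcessExitReasons(), and each ApplicationExitInfo exposes getReason() (compared against ApplicationExitInfo.REASON_ANR) and getTimestamp(). The code below deliberately uses plain stand-in types so it runs off-device. ExitRecord, Sample, correlate(), and the five-second window are all assumptions made for illustration, not a description of Embrace’s backend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types only model the idea of correlating an
// OS-reported ANR exit with locally captured samples. ExitRecord is a
// stand-in, not Android's ApplicationExitInfo.
final class AnrCorrelation {

    static final class ExitRecord {
        final long timestampMs;  // when the process died
        final boolean anr;       // true if the reported exit reason was an ANR

        ExitRecord(long timestampMs, boolean anr) {
            this.timestampMs = timestampMs;
            this.anr = anr;
        }
    }

    static final class Sample {
        final long timestampMs;  // when the stack sample was taken
        final String topFrame;   // innermost method at that moment

        Sample(long timestampMs, String topFrame) {
            this.timestampMs = timestampMs;
            this.topFrame = topFrame;
        }
    }

    // Returns the samples captured within windowMs before an ANR exit.
    // Non-ANR exits produce an empty list.
    static List<Sample> correlate(ExitRecord exit, List<Sample> samples, long windowMs) {
        List<Sample> matched = new ArrayList<>();
        if (!exit.anr) {
            return matched;
        }
        long from = exit.timestampMs - windowMs;
        for (Sample s : samples) {
            if (s.timestampMs >= from && s.timestampMs <= exit.timestampMs) {
                matched.add(s);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample(1_000, "android.os.MessageQueue.nativePollOnce"));
        samples.add(new Sample(9_200, "com.example.Db.query"));
        samples.add(new Sample(9_300, "com.example.Db.query"));
        ExitRecord exit = new ExitRecord(10_000, true);
        for (Sample s : correlate(exit, samples, 5_000)) {
            System.out.println(s.timestampMs + " " + s.topFrame);
        }
    }
}
```

In this made-up data, the idle nativePollOnce sample falls outside the window, while the two samples taken shortly before the exit point at the work that was actually blocking the thread.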

Embrace's AEI summary view of ANRs in the dashboard

Multiple views of the data

Going from zero to thousands of data points can be overwhelming. That’s why we’ve developed flame graph displays to help engineers make sense of all the stack trace samples. 

A flame graph essentially maps out every single sample of the stack frame that’s been captured down to the individual method level. If you interpret the graph vertically, each slice is an individual ANR sample, so the further down the graph you go, the closer to the top of the stack the methods get. 

Interpreting the graph horizontally shows you the frequency of methods. The wider the block, the more frequently it appears across all the different samples collected. Analyzing the methods sampled across both depth in the stack and frequency of occurrences can help you figure out what to prioritize. 
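
For readers who want to see how many samples collapse into one graph, here is a minimal sketch of the aggregation step. It is a generic flame-graph fold, not the implementation behind Embrace’s dashboard, and FlameNode, build(), and render() are hypothetical names. Each node’s count is the number of samples that passed through that method, which corresponds to the width of its block, and indentation in the printed output corresponds to depth in the stack.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of flame-graph aggregation; FlameNode is an
// illustrative name, not part of any Embrace or Android API.
final class FlameNode {
    final String frame;
    int count;  // number of samples passing through this frame (the block's width)
    final Map<String, FlameNode> children = new LinkedHashMap<>();

    FlameNode(String frame) {
        this.frame = frame;
    }

    // Merges samples into one tree. Each sample lists frames from the bottom
    // of the stack (entry point) to the top (innermost call). Note that
    // Thread.getStackTrace() returns the opposite order, so reverse it first.
    static FlameNode build(List<List<String>> samples) {
        FlameNode root = new FlameNode("all");
        for (List<String> sample : samples) {
            root.count++;
            FlameNode node = root;
            for (String frame : sample) {
                node = node.children.computeIfAbsent(frame, FlameNode::new);
                node.count++;
            }
        }
        return root;
    }

    // Appends one line per node: indentation is depth, percentage is width.
    void render(int depth, int total, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(frame).append(" ").append(100 * count / Math.max(total, 1)).append("%\n");
        for (FlameNode child : children.values()) {
            child.render(depth + 1, total, out);
        }
    }

    public static void main(String[] args) {
        List<List<String>> samples = Arrays.asList(
            Arrays.asList("main", "Looper.loop", "Handler.dispatchMessage", "parseJson"),
            Arrays.asList("main", "Looper.loop", "Handler.dispatchMessage", "parseJson"),
            Arrays.asList("main", "Looper.loop", "MessageQueue.nativePollOnce"));
        FlameNode root = build(samples);
        StringBuilder out = new StringBuilder();
        root.render(0, root.count, out);
        System.out.print(out);
    }
}
```

In this made-up input, parseJson appears in two of the three samples and nativePollOnce in only one, so the wider parseJson block is the better candidate to investigate first.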

To further whittle down all the data you’re seeing in a flame graph, you can choose different visualization modes. 

The “prioritization” mode, for example, cuts out a lot of the noise and offers you a cleaner view of what methods are heavily represented, both in terms of breadth and depth. 

The “debug” mode, on the other hand, brings in the detail you need to actually dig in and solve the issue.

By getting multiple views of data that either hide or expose the granular, method-level details, you have adequate tools to analyze what happened in the thread during the timeframe when the nativePollOnce error was caught.

Prioritized view in the ANR flame graph

Debug view in the ANR flame graph

What else can you get with better tooling?

The nativePollOnce error is a common issue many developers run across, but it’s one that can’t be solved with common tools like GPC. Better tooling — tools built specifically to solve mobile engineers’ frustrations and pain points — is required to get to the bottom of these ANRs.

But there is so much more that the right tool can give you when it comes to ensuring your app performs well. From crashes to UI issues, network failures to custom user flows, Embrace collects all of your app’s critical mobile data and helps you make sense of it in our dashboard. 

Our data-rich issue detection and resolution tool is specially designed for early discovery because we capture the full story for each individual session. That means that you’re able to catch early performance risks as soon as they start impacting even just a handful of people, rather than 5%-10% of your entire user base. 

To learn more, check out the rest of our product suite, or request a demo to see how our solutions can help you maintain fast, stable, and all-around amazing mobile experiences.
