Mobile Crash Owners is one of Embrace’s newest and most compelling features. It allows engineers to upload their own GitHub CODEOWNERS rules directly to Embrace. These rules are then used to tag each captured crash with its correct owner, making it quick and easy to identify who is responsible for solving a specific issue in your app.
Embrace is the only monitoring solution for mobile that has an intelligent crash tagging feature like this. Building it, as you can imagine, took some time and creative problem solving. To get a look at how the sausage was made, we talked to one of our Product Managers, Scott Breudecheck, who took the lead on building Mobile Crash Owners. Read on for some of his insights.
Q: Let’s start at the beginning. What was your motivation to build the Mobile Crash Owners feature?
Our customers were big motivators for this feature: we kept hearing the same pain point about missing ownership information. Two use cases specifically kicked off ideation. The first centered on customers who had a number of SDK teams, and whose app teams kept blaming these SDK teams for crashes. These customers were essentially saying “I need the SDK team to get the same monitoring and debugging info at the same time the app team does, so they can start fixing it.”
The second use case was for customers who manage multi-team projects. For every crash, they would need a pretty senior developer with lots of enterprise knowledge to dig into the crash just to figure out who it should be assigned to. This was taking up the time of their most valuable people and not even for the purpose of building solutions.
With these two use cases alone, we knew there was a need amongst our customers, and very likely a bigger need in the market, for a tool like Mobile Crash Owners.
Q: Based on this customer pain, how did you identify what you needed to build and what steps did you take to build it?
We wanted to validate the problem space before we over-built a solution. We took a functional-but-ugly approach with Crash Owners:
First, we built some simple matching rules and manually ran queries against the crash data, exploring the results with alpha customers. This helped us confirm how rules should work, and what data would be most impactful.
Then, we automated our rule set and worked on exposing results via an API. At this point, we still had to deal with a bunch of JSON wrangling, but now at least this early feature was running 24/7. We saw our customers build their own reporting, alerting, and monitoring off of it.
Finally, we implemented it in our UI. Crashes show up with owner values on the Crash Summary page, highlighting the most-known bugs for investigation. This was our ultimate goal for the feature: integrating it seamlessly into the rest of our product’s UI so that it is really easy for customers to use.
Q: What were your biggest challenges in building out this feature?
There were quite a few challenges we had to figure out how to solve. One of the biggest was probably dealing with siloed data. Another challenge was desymbolication – crash data needs to be desymbolicated and understood, so that created another layer of complexity.
Also, ownership itself can often be complex and institutional on an engineering team. If you’re lucky, a team might have a CODEOWNERS file, but even then, coverage is often pretty sparse.
Collecting – and often generating – those ownership frameworks, and then immediately applying them across every crash, is a game of mashing together two fragmented, often incomplete datasets and short-circuiting the triage process.
Q: This concept of code ownership being complex is interesting. Did anything surprise you about how teams actually use ownership rules? How did this end up impacting the feature?
Well, we started with the hypothesis that teams would have a robust CODEOWNERS file. This turned out to be a big whiff.
What we found on mobile is that most code reviews are being done with a “tap on the shoulder”. Even in the Zoom age, developers still know the handful of peers they want reviewing their code and just tag them in the PR (or via Slack). There’s little need for documented ownership. The most common feedback we got was: “Oh cool, finally a use for CODEOWNERS, let me go put that on the backlog”. (As a PM, I know that means it’ll never happen.)
So we ended up pivoting. While we’re embracing the general matching style of GitHub’s CODEOWNERS, we’ve invented a more flexible, lightweight rule-setting system. Users can (soon) tag specific crashes, or make up single rules on the fly. This allows for a compounding corpus of rules built over time, rather than one big CODEOWNERS file to rule them all.
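To make the idea of a compounding rule set concrete, here is a minimal sketch in Python. It is not Embrace's implementation: the `OwnerRule` type, the `find_owner` helper, the team handles, and the file paths are all hypothetical. It borrows one behavior from GitHub's CODEOWNERS, where the last matching pattern takes precedence, and it uses plain glob matching as a simplification of the real gitignore-style pattern syntax.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class OwnerRule:
    """Hypothetical rule: a glob over a frame's file path plus an owner tag."""
    pattern: str
    owner: str


def find_owner(path: str, rules: list[OwnerRule]) -> Optional[str]:
    """Return the owner of the last rule that matches `path`, or None."""
    owner = None
    for rule in rules:
        # fnmatchcase lets '*' span '/' characters, which is looser than
        # real CODEOWNERS matching; this is an assumption of the sketch.
        if fnmatchcase(path, rule.pattern):
            owner = rule.owner  # later rules override earlier ones
    return owner


# A small starting rule set (all names are made up for illustration).
rules = [
    OwnerRule("app/*", "@app-team"),
    OwnerRule("app/checkout/*", "@payments-team"),
]

# A single rule added "on the fly" later; the corpus simply grows.
rules.append(OwnerRule("sdk/ads/*", "@ads-sdk-team"))

print(find_owner("app/checkout/Cart.swift", rules))
print(find_owner("sdk/ads/Banner.kt", rules))
```

Because rules are just appended to a list, a one-off rule created while triaging a single crash costs the same as a rule imported from a file, which is the property the lightweight approach depends on.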
Q: I’ve heard you mention before that Embrace supports “opinionated” tagging. What does this mean exactly and why is it helpful for customers using this feature?
Well, stack frames can take a circuitous route. While we tag every single frame, we’re pretty confident in Embrace’s ability to identify the frame that actually caused the crash. By highlighting this frame’s tag, we are improving the signal-to-noise. As an end user, you’re able to focus on the tag that actually needs attention because it’s associated with a causal frame.
We’re not 100% right all the time. But this is always a meaningful starting spot: the tagged team should be able to quickly give an explanation, or redirect to a more helpful colleague.
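As a rough illustration of opinionated tagging, the Python sketch below tags every frame in a stack but surfaces only one owner. How Embrace actually identifies the causal frame is not described here, so the sketch substitutes a simple stand-in heuristic: pick the frame nearest the crash site that belongs to app code and has an owner. The `Frame` type, the `in_app` flag, the prefix table, and all team names are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical ownership table keyed by path prefix.
OWNERS_BY_PREFIX = {
    "app/checkout/": "@payments-team",
    "sdk/ads/": "@ads-sdk-team",
}


@dataclass
class Frame:
    symbol: str
    path: str
    in_app: bool  # assumption: frames arrive labelled as app vs. system code
    owner: Optional[str] = None


def tag_frames(frames: list[Frame]) -> None:
    """Tag every frame with an owner, or None when no prefix matches."""
    for frame in frames:
        frame.owner = next(
            (owner for prefix, owner in OWNERS_BY_PREFIX.items()
             if frame.path.startswith(prefix)),
            None,
        )


def highlighted_owner(frames: list[Frame]) -> Optional[str]:
    """Stand-in heuristic: first owned app frame, counting from the crash site."""
    for frame in frames:  # frames are ordered from the crash site outward
        if frame.in_app and frame.owner is not None:
            return frame.owner
    return None


crash = [
    Frame("objc_msgSend", "libobjc.A.dylib", in_app=False),
    Frame("AdBanner.render", "sdk/ads/AdBanner.swift", in_app=True),
    Frame("CartView.load", "app/checkout/CartView.swift", in_app=True),
]
tag_frames(crash)
print(highlighted_owner(crash))  # prints "@ads-sdk-team"
```

Every frame keeps its own tag, so the payments team is still visible on the third frame, but only the ads SDK team is highlighted as the starting point for triage.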
Q: Last question for you – what’s next on the horizon for this feature?
We’re most excited about exposing the rule-creation flow in the Embrace dashboard. Much like in Asana, Zendesk, or Jira, we know users will see an unassigned issue and be itching to tag the right person. This is what we’re working on supporting right now. Not only does that get the bug closer to fixed, but it also creates a new rule that can be applied automatically to similar bugs as they happen!
Eventually, we’d also like to extend the concept of ownership beyond crashes and into other specialized issues, like ANRs. That’s a bit further down the road, but it will be a huge benefit for our Android customers, and especially for mobile gaming apps.