Decoding mobile games: Mastering Unity C# Symbolication with Embrace

Learn about C# symbolication for mobile games, the importance of this process, and how we support it at Embrace.

Unity C# symbolication is the process of converting memory addresses in a compiled Unity application into readable C# method names, file names, and line numbers, which aids in debugging and understanding crash reports.

As you may have seen in our recent release announcement, Embrace now supports C# symbolication for Unity projects!

In this article, we’ll explain exactly why that’s important and how we’re able to support this at Embrace, as well as all the context you need to better understand Unity C# Symbolication.

Let’s get started with some context for those who don’t live and breathe Unity or C# every day.

What is Unity?

Simply put, Unity is a widely-used game engine that enables developers to create, operate, and monetize interactive 3D and 2D games across mobile platforms.

If you’re an experienced Unity engineer, you can probably skip ahead to the compilers section below. But if you’re new to Unity or mobile game development, you might be wondering why you would choose this game engine over another. One reason Unity has become so popular is that it solves a very common problem for mobile engineers.

Let’s say you have an idea in your head for a great new game, and all you need to do is sit down and write the code.

Before you can start writing some of your game logic, you realize you need a few things: you’re going to need to be able to draw some pixels to the screen, either from 3D models or from some 2D sprites you have lying around. You’ll probably want to play some sounds, and let the player provide some input, maybe to move a character around.

So off you go to read your system documentation, since your operating system helpfully provides APIs for doing all of these things.

Then someone else on the team makes an off-handed comment about wanting to run on PS5, or mobile, so you go look up their API documentation, and learn, to your horror, that it’s all different for each system.

You could rewrite all of this for each system you support, but that sounds like an awful lot of work … enter: game engines.

These helpful bundles of code provide APIs to do all the above, and even more helpfully, delegate all of those API calls to the system-specific APIs that you’re targeting.

Now, you can write code once, and run it across multiple platforms! Unity is one of the most popular choices for game engines because of its wide support for platforms like Windows, consoles, mobile devices, and more.

And while Unity has supported other scripting languages in the past, C# is the language engineers use when working with Unity today.

Programming Languages: C#

So what is C#?

C# is a programming language developed by Microsoft, mostly as a response to the C++ and Java programming languages.

C++ was already a very popular choice at Microsoft, and well supported by their system APIs and compilers, with large portions of Windows itself written in it.

At the time, Java was a very popular language, but it broke from more traditional languages in a few ways, the most significant of which was running in something called a virtual machine.

Virtual machines are a way of taking a program and compiling it once, but still being able to run the compiled program on multiple architectures and operating systems. Most languages in broad use at that time were compiled directly to machine code, and had to be recompiled for each operating system and machine architecture supported, with Java being one of the first virtual machine hosted languages to gain broad support in industry.

Since Sun Microsystems controlled the Java language and platform at the time (with much more to this story) Microsoft recognized the need for a similarly modern language. Consequently, they sought to create their own language, usable both in Microsoft products and as a programming environment for their customers.

The result was C#, .NET, and the Common Language Runtime, where the Common Language Runtime is really the virtual machine used by C#. Given Microsoft’s pedigree in gaming, and their support for C# on both Xbox and Windows, it’s easy to see why C# gained significant traction among game developers, and why Unity would want to support it.

Compilers: A Quick Intro

We have almost all of the pieces we need to understand C#’s use in Unity, but we need to understand a few details of a familiar tool to programmers: compilers.

A compiler translates source code written in a programming language into machine code that a computer’s hardware can execute.

People like to reason about things in terms of names — we give names to everything from rivers and mountains, to entire countries, abstract concepts, and our own pets and children. We use names to refer to those concrete objects or abstract ideas so that we can communicate or reason about them.

That works really well for us, but computers are a completely different animal.

Computers care only about numbers — even things we traditionally think of as being “low level” for a computer, like op codes for CPUs, only have names so humans can think about them. To the computer each one is just another number, along with every other piece of data we may be working on: text, images, sounds, everything a computer deals with is just a number.

This disconnect means we need a way to interact with machines that bridges the gap between human understandable names (or symbols), and the incomprehensible everything-is-a-number world those machines inhabit: compilers.

A compiler is both very simple and extremely complex: code written in a programming language goes in, machine code (or virtual machine code) comes out. The way it accomplishes this — parsers, lexers, optimization passes, machine code generation — is an incredibly complex topic, but for our purposes we only care about a few of those features: the idea of a frontend, an intermediate representation, and a backend.

This is a very common design paradigm for compilers, with parsers, lexers and abstract syntax trees being commonly thought of as the “frontend,” and machine code generators being commonly understood as the “backend.”

Today we’re most interested in the middle part: intermediate representation or intermediate language. We’ll become more familiar with this topic as we explore how Unity deals with C# code shortly.

Unity’s C# Compilation Pipeline

That was a lot of context, but now we’re well positioned to talk about the specifics of C# and Unity, and upset the model of compilers we just built up.

As you may have noticed, we talked about how C# is a virtual machine hosted language, but then spent a bunch of time talking about compilers generating machine code. Why?

Well, it turns out that, probably for performance reasons, Unity opted not to run C# within a virtual machine. They also opted not to write a bunch of compiler backends to support the various architectures and operating systems they supported. Instead they went with something called a source-to-source compiler, or transpiler.

When we spoke about compiler backends before, we said that they generate machine code, and indeed that is the most common type of backend, but it’s not the only type. They can also generate something else: source code.

One of the reasons compilers can be so complex is that they deal a lot with a concept called recursion.

Recursion comes up in a bunch of contexts in programming languages, but compilers themselves can be recursive: A compiler can compile source code, and turn it into yet more source code, usually in a different language.

For Unity and C#, that language is C++.
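To make the frontend, intermediate representation, and backend split a bit more concrete, here is a toy sketch in Python. It is not how il2cpp (or any real compiler) is implemented: the tiny postfix IR and all of the function names are invented for illustration, and it only handles addition and multiplication of constants. The point is that the same IR can feed two different backends, one emitting stack-machine style instructions and one emitting C++ source text.

```python
import ast

def frontend(source: str) -> list[tuple]:
    """Parse an arithmetic expression into a flat, postfix IR (hypothetical format)."""
    ops = {ast.Add: "add", ast.Mult: "mul"}

    def lower(node) -> list[tuple]:
        if isinstance(node, ast.Constant):
            return [("push", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return lower(node.left) + lower(node.right) + [(ops[type(node.op)],)]
        raise ValueError("unsupported expression")

    return lower(ast.parse(source, mode="eval").body)

def backend_stack_machine(ir: list[tuple]) -> list[str]:
    """Backend 1: emit one instruction per IR entry for an imaginary stack machine."""
    return [" ".join(str(part) for part in instr) for instr in ir]

def backend_cpp(ir: list[tuple]) -> str:
    """Backend 2: emit C++ source text instead of instructions (a transpiler)."""
    symbols = {"add": "+", "mul": "*"}
    stack = []
    for instr in ir:
        if instr[0] == "push":
            stack.append(str(instr[1]))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {symbols[instr[0]]} {right})")
    return f"int result() {{ return {stack.pop()}; }}"

ir = frontend("1 + 2 * 3")
print(backend_stack_machine(ir))
print(backend_cpp(ir))
```

The C++ text produced by the second backend would still need to go through a platform compiler before anything could run, which is the same two-step shape described here for Unity.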

This has some advantages in that it’s a very widely supported language among different platforms, and so helps Unity tick the portability box we talked about earlier. Platform specific compilers are also likely to be highly optimized for the target platform, which is great for gaming where performance is always a key consideration. Finally both C# and C++ share some semantic similarities, and that often makes transpilation easier than between dissimilar programming languages.

In the case of Unity, the component responsible is called il2cpp (Intermediate Language to C++), and we’ll see this referenced a little later on.

Symbolication: Making sense of machine code

At this point we’re doing well: We understand how Unity’s compiler takes our C# code and compiles it into C++, and how we can then use the platform specific compiler to compile that C++ code, which lets us ship our game to the app store.

But what happens when you get your first user complaint and check your bug report, only to find that it’s … all numbers?

Not exactly helpful!

As a game developer, I really need to know what function was involved when something went wrong, what file it is in, and what line it’s on.

Thankfully compilers have our backs here too in the form of debug symbols. During the compilation process a compiler can helpfully keep track of which number (or address) lines up with which function name (or symbol), and which file and line that function was located on. We can use a tool (like lldb, gdb, or Embrace) to match those debug symbols up with the numbers, and get something useful: a process called symbolication.
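As a rough illustration of that matching step, the sketch below looks up an address in a sorted table of function start addresses. The table contents, the Symbol type, and the symbolicate function are all made up for this example. Real debug formats carry much more detail (function sizes and per-instruction line tables, for instance), so treat this as the general idea rather than how lldb, gdb, or Embrace actually do it.

```python
import bisect
from typing import NamedTuple, Optional

class Symbol(NamedTuple):
    start: int  # address where the function's machine code begins
    name: str
    file: str
    line: int   # line where the function is defined

# Hypothetical symbol table, sorted by start address.
SYMBOLS = [
    Symbol(0x1000, "Player_Update", "Player.cpp", 120),
    Symbol(0x1400, "Enemy_Spawn", "Enemy.cpp", 42),
    Symbol(0x1A00, "Game_Render", "Game.cpp", 300),
]
STARTS = [symbol.start for symbol in SYMBOLS]

def symbolicate(address: int) -> Optional[Symbol]:
    """Return the symbol with the greatest start address at or below `address`."""
    index = bisect.bisect_right(STARTS, address) - 1
    return SYMBOLS[index] if index >= 0 else None

print(symbolicate(0x1404))
```

Because the table is sorted, each lookup is a binary search rather than a scan. Without function sizes, an address past the end of the last function would still resolve to that function, which is one of the simplifications made here.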

Except, as far as the compiler is concerned we wrote our code in C++, when in fact we wrote it in C# and Unity’s il2cpp transpiled it into C++ for the platform specific compiler.

That means all of our debug symbols reference generated C++ code that we may not even have access to anymore, and that definitely isn’t the code we actually wrote or need to update to solve the problem.

Don’t worry, we just need more recursion: Unity’s compiler also creates debug symbols, except instead of translating between numbers and human-understandable function names and files, it translates between the C++ function names and files, and our original C# code.

We just need to symbolicate after we symbolicate!

Unity C# Symbol Translation

Now we can dig into the meat of Unity C# symbolication.

Let’s assume you’ve already symbolicated your code using the regular platform specific tools, and now you need to translate those symbols back into C# symbols.

How do you do that?

Unity provides two files for this purpose: MethodMap.tsv and LineNumberMappings.json, normally found under Classes/Native/Symbols in the Unity project root. As you might expect, they provide two different parts of the mapping:

MethodMap.tsv

From the file name you might have guessed that this is a tab separated value file, and that’s exactly what it is. It contains three columns:

  • C++ symbol
  • C# symbol
  • C# assembly

It will look something like this:

U3CPrivateImplementationDetailsU3E_ComputeStringHash_mC7DC26EF4301846E2947FBD7916A16E88C887055 System.UInt32 <PrivateImplementationDetails>::ComputeStringHash(System.String) Mono.Security
U3CPrivateImplementationDetailsU3E_ComputeStringHash_mD3F36EC5C78F193C349C41776297E0482C988A51 System.UInt32 <PrivateImplementationDetails>::ComputeStringHash(System.String) Newtonsoft.Json
U3CPrivateImplementationDetailsU3E_ComputeStringHash_m171C269D828658C44041FA68B6DE8CA290ED517F System.UInt32 <PrivateImplementationDetails>::ComputeStringHash(System.String) System
U3CPrivateImplementationDetailsU3E_ComputeStringHash_mF94ADA7AE429F335FB436FEDD374F6ED6E8DB707 System.UInt32 <PrivateImplementationDetails>::ComputeStringHash(System.String) System.Net.Http
U3CPrivateImplementationDetailsU3E_ComputeStringHash_m8AD748350993B116B2C4A98803EE1E291A0ADFEE System.UInt32 <PrivateImplementationDetails>::ComputeStringHash(System.String) System.Xml

Figuring out your C# symbol from a C++ symbol using this map is really straightforward: You find your symbol in the first column, then the second and third column contain your C# symbol and assembly, respectively.
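Here is a minimal sketch of that lookup in Python. It assumes the file has exactly three tab-separated columns and no header row, as described above. The load_method_map function name is ours, not something Unity provides, and the sample row is one of the rows shown earlier.

```python
import csv
import io

def load_method_map(tsv_text: str) -> dict[str, tuple[str, str]]:
    """Map each C++ symbol to its (C# symbol, C# assembly) pair."""
    method_map = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        cpp_symbol, csharp_symbol, assembly = row
        method_map[cpp_symbol] = (csharp_symbol, assembly)
    return method_map

# One row from the example above, with the columns joined by tabs.
SAMPLE = (
    "U3CPrivateImplementationDetailsU3E_ComputeStringHash_mF94ADA7AE429F335FB436FEDD374F6ED6E8DB707"
    "\tSystem.UInt32 <PrivateImplementationDetails>::ComputeStringHash(System.String)"
    "\tSystem.Net.Http\n"
)

method_map = load_method_map(SAMPLE)
for cpp_symbol, (csharp_symbol, assembly) in method_map.items():
    print(cpp_symbol, "->", csharp_symbol, "in", assembly)
```

Loading the file into a dictionary once means every later lookup is a constant-time key access, which matters when the same map is consulted for every frame of every stack trace.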

LineNumberMappings.json

This is the second piece of the symbol mapping, and contains the file information. It is a map of maps of maps to numbers. From the outermost key to the innermost value, the levels are:

  • C++ file name
  • C# file name
  • C++ line number, as a string
  • C# line number, as a number

An Example:

{
  "/Users/embrace/repos/embrace-unity-sdk/UnityProjects/2021/Library/Il2cppBuildCache/iOS/il2cppOutput/Embrace.SDK.cpp" : {
    "/Users/embrace/repos/embrace-unity-sdk/io.embrace.sdk/Scripts/Embrace.cs" : {
      "2576": 24,
      "2598": 31,
      "2620": 33,
      "2628": 36,
      ...
    },
    ...
  }
}

As you’ve probably already guessed, using this is a little bit trickier than the symbol mappings.

For one, the C++ line numbers won’t always match exactly with the line numbers in the JSON file. To match up your file information you’ll need to do the following:

  • Find your C++ file in the outermost map, if not found then no mapping exists.
  • Iterate through all of the C# files and C++ line numbers under that outermost key.
  • If your original C++ line number is equal to a line number in the file, you’re done; the result is the C# file referenced, and the C# line number stored under that line number key.
  • If not, keep track of the C# file and C# line number for the greatest C++ line number that is less than the line number of your crash.

At the end you should have a candidate for mapping your crash back to the original C# file and line number.

You can, optionally, enforce a distance threshold at this point: if the C++ line number you found isn’t “close enough” to your target (crash) line number, you can reject the candidate rather than report a misleading location.
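The steps above, including the optional distance check, can be sketched roughly as follows. The map_line function and its max_distance parameter are our own names for illustration, and the file names in the sample data are shortened stand-ins for the full paths a real LineNumberMappings.json would contain.

```python
import json
from typing import Optional

def map_line(mappings: dict, cpp_file: str, cpp_line: int,
             max_distance: Optional[int] = None) -> Optional[tuple[str, int]]:
    """Map a C++ file and line to (C# file, C# line), or None if nothing fits."""
    by_csharp_file = mappings.get(cpp_file)
    if by_csharp_file is None:
        return None  # no mapping exists for this C++ file

    best = None  # (C++ line key, C# file, C# line) for the closest key below cpp_line
    for csharp_file, lines in by_csharp_file.items():
        for key, csharp_line in lines.items():
            candidate = int(key)  # keys are C++ line numbers stored as strings
            if candidate == cpp_line:
                return csharp_file, csharp_line  # exact match, done
            if candidate < cpp_line and (best is None or candidate > best[0]):
                best = (candidate, csharp_file, csharp_line)

    if best is None:
        return None
    if max_distance is not None and cpp_line - best[0] > max_distance:
        return None  # nearest entry is too far away to trust
    return best[1], best[2]

# Shortened, hypothetical file names; line numbers taken from the example above.
MAPPINGS = json.loads("""
{
  "Embrace.SDK.cpp": {
    "Embrace.cs": {"2576": 24, "2598": 31, "2620": 33, "2628": 36}
  }
}
""")

print(map_line(MAPPINGS, "Embrace.SDK.cpp", 2600))
```

What counts as “close enough” is a judgment call. The sketch leaves the threshold off by default and lets the caller decide.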

Congratulations, now you know where to look in the original source code again!

Unity C# Symbols at Embrace

As you can probably appreciate at this point, there are a lot of moving parts involved here.

To support this feature at Embrace we add one more complication: scale.

Doing this for tens or hundreds of millions of crashes and ANRs isn’t a trivial task. Let’s start with how the symbols flow through the system:

  • At build time, an Embrace customer will upload symbol files. This generally means native symbol files for iOS and NDK; for Android, it means ProGuard mapping files. For specific frameworks like Unity and React Native we also upload those framework-specific symbol mapping files. These files are uploaded into an S3 bucket for later use.
  • Then, some event with a stack trace, whether that’s a crash or an exception, is received by the Embrace backend. This event is loaded through our normal pipeline, but at some stage we do some data enrichment — symbolication, via an internal service called symbol service.
  • Symbol service, being primarily a caching layer, checks various levels of caching for the symbol results on a stackframe-by-stackframe basis. This includes things like in-memory caching and a Redis based cache.
  • On a cache miss at that step, the request is forwarded to a service which performs the actual symbolication work. This component then checks its own caches for the files it needs and, failing that, fetches them from S3 (and caches them locally). There’s also some indexing done here, but we won’t get into that.
  • Now we finally have everything we need: We take our stack, translate all of the symbols, and return things back up through the stack, filling in caches along the way. In the case of C# symbolication, we first apply the native symbols, and then we do the lookup outlined above for the C# symbols.
  • Finally the data pipeline has its stack data, and it stores it away in a database to be used later.
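To give a feel for the layered caching in the steps above, here is a small sketch of a tiered lookup. It is not Embrace’s implementation: plain dictionaries stand in for the in-memory and Redis caches, and the TieredSymbolCache class and slow_symbolicate function are hypothetical names invented for this example.

```python
from typing import Callable, Optional

class TieredSymbolCache:
    """Check the fast caches first; fall back to the slow symbolicator on a miss."""

    def __init__(self, symbolicate: Callable[[str], Optional[str]]):
        self.memory: dict[str, str] = {}  # stands in for an in-process cache
        self.shared: dict[str, str] = {}  # stands in for a shared cache such as Redis
        self.symbolicate = symbolicate    # the expensive path (fetch files, look up)

    def lookup(self, frame: str) -> Optional[str]:
        if frame in self.memory:
            return self.memory[frame]
        if frame in self.shared:
            self.memory[frame] = self.shared[frame]  # promote to the faster tier
            return self.memory[frame]
        result = self.symbolicate(frame)
        if result is not None:
            # Fill the caches on the way back up.
            self.shared[frame] = result
            self.memory[frame] = result
        return result

calls = []

def slow_symbolicate(frame: str) -> Optional[str]:
    """Pretend symbolicator that records how often it is called."""
    calls.append(frame)
    return {"0x1404": "Enemy.Spawn()"}.get(frame)

cache = TieredSymbolCache(slow_symbolicate)
first = cache.lookup("0x1404")
second = cache.lookup("0x1404")  # served from memory, no second slow call
print(first, second, calls)
```

One design choice in this sketch is that failed lookups are not cached, so a frame that could not be symbolicated gets another chance once its symbol files arrive. Whether a production system wants that, or a short-lived negative cache instead, depends on its traffic.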

In some cases, we also attempt to do this process at query time.

For example, if we notice that we were unable to symbolicate a stack previously (typically because we hadn’t received symbols for it yet), we will attempt to do so again to try to provide as much context to our users as possible.

This is a process that happens many, many times per second at Embrace, hence the heavy use of caching.

And that’s it: by now you should have a basic understanding of symbolication, what it is and why it’s needed, a general idea about how symbolication works for machine code, a more detailed understanding of what’s needed for C# symbols specifically, and how all of this fits together at Embrace.

To learn more about how Embrace helps Unity engineers build better mobile experiences, register for our upcoming webinar here.
