
A Giant Leap For Span.Kind

At Embrace, we've been excited about conforming to, and helping grow, the OpenTelemetry specification for some time now. We also have a ton of domain-specific knowledge about how mobile telemetry should be collected and processed. This led to a decision point while rebuilding our mobile SDKs to emit OpenTelemetry primitives: how can we quickly inform the data ingest layer about the types of telemetry we're sending, to make its life easier?

Intro to Span.Kind

When we started our endeavor into modeling the data we collect as OpenTelemetry primitives, we were excited to see a Span.Kind field. Initially, we thought we could use this field to provide type annotations to our telemetry in order to help our backend ingest data and apply specific behavior. Unfortunately for us, that is not Span.Kind’s intended purpose.

Span.Kind is specifically used to hint at the relationships within a distributed trace. Its primary goal is to reflect whether the Span is a “logical” child or parent when a trace crosses process boundaries.
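
To make that concrete, here is a minimal sketch of how Span.Kind describes the two sides of a call that crosses a process boundary. The five kind names come from the OpenTelemetry specification; the Span class, the process field, and the crosses_process_boundary helper are stand-ins invented for this illustration, not part of any SDK.

```python
from dataclasses import dataclass
from enum import Enum


class SpanKind(Enum):
    # Stand-in for the five kinds named in the OpenTelemetry specification.
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


@dataclass
class Span:
    name: str
    kind: SpanKind
    process: str  # hypothetical label for where the span was recorded


# A mobile app calls an API: the trace crosses a process boundary here.
client_span = Span("GET /profile", SpanKind.CLIENT, process="mobile-app")
server_span = Span("GET /profile", SpanKind.SERVER, process="api-server")


def crosses_process_boundary(parent: Span, child: Span) -> bool:
    """True when a CLIENT span is the logical parent of a remote SERVER span."""
    return (
        parent.kind is SpanKind.CLIENT
        and child.kind is SpanKind.SERVER
        and parent.process != child.process
    )
```

Notice that nothing in the kind says what the span measured (a network call, a file read, a user session), only where it sits relative to the boundary.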

We were looking for a way to model any data that has a duration as a Span. When this data arrived at our backend, having an explicit structure for the kinds of spans we expect would let us perform specific analysis and aggregation.

For example, if we encounter a span that represents a network request, we want to make sure we can pull out the URL path and response status code attributes to analyze how many 404s the client receives for each route it hits. Mobile apps are generally used in predictable ways, as prescribed by both the app developer and the app ecosystem, so it’s easy to create a relatively short list that covers the majority of mobile use cases.
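
As a rough sketch of that kind of analysis, the snippet below counts 404 responses per route. It assumes spans arrive as plain dictionaries, which is a simplification for illustration; the attribute keys url.path and http.response.status_code follow OpenTelemetry's HTTP semantic conventions, and count_404s_by_route is a hypothetical helper rather than our backend's actual code.

```python
from collections import Counter

# Hypothetical network-request spans, reduced to the attributes we care about.
spans = [
    {"attributes": {"url.path": "/profile", "http.response.status_code": 200}},
    {"attributes": {"url.path": "/avatar", "http.response.status_code": 404}},
    {"attributes": {"url.path": "/avatar", "http.response.status_code": 404}},
    {"attributes": {"url.path": "/feed", "http.response.status_code": 404}},
]


def count_404s_by_route(spans):
    """Count 404 responses for each URL path seen in the spans."""
    counts = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        if attrs.get("http.response.status_code") == 404:
            counts[attrs.get("url.path", "<unknown>")] += 1
    return counts


not_found = count_404s_by_route(spans)
```

This only works if the backend already knows the span represents a network request, which is exactly the gap we needed to fill.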

Type-hinted telemetry

We quickly realized we wanted some kind of type system for our telemetry. At the same time, we made the switch to OpenTelemetry so we wouldn’t need to implement discrete types – we liked the common envelope that the OTel primitives provided.

We needed something that would help us sort the variety of telemetry we sought to collect, and something flexible enough to prevent redundant implementation any time we wanted to add new instrumentation. So how about a compromise?

We decided to add a hint to our telemetry so that our backend can process input with type-specific intuition, while staying true to the Span specification. This works by adding a Span Attribute with the key emb.type.
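
On the client side, the idea is roughly the following. This sketch uses a tiny stand-in Span class so it runs on its own; in the OpenTelemetry Python API the equivalent call on a real span is set_attribute(key, value). The start_typed_span helper is hypothetical, and the value shown is one of the emb.type values we use for HTTP requests.

```python
from dataclasses import dataclass, field

EMB_TYPE_KEY = "emb.type"


@dataclass
class Span:
    # Minimal stand-in for an OpenTelemetry span: a name plus attributes.
    name: str
    attributes: dict = field(default_factory=dict)

    def set_attribute(self, key: str, value: str) -> None:
        self.attributes[key] = value


def start_typed_span(name: str, emb_type: str) -> Span:
    """Create a span and tag it with the type hint (hypothetical helper)."""
    span = Span(name)
    span.set_attribute(EMB_TYPE_KEY, emb_type)
    return span


span = start_typed_span("GET /profile", "perf.network.http")
```

The span itself stays a perfectly ordinary span; the hint rides along as one more attribute.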

As we were developing this approach, we found that we needed to constrain the emb.type value, because treating it as a “type system” quickly led to scope creep. We needed a focused approach to avoid bloat and over-reliance on one Span Attribute. In the end, we decided to set some unbreakable rules around how these type values would be interpreted:

  1. There must not be any attempt to create “child types” where behavior would be established up an ancestral hierarchy. Each “type” needs to exist and be meaningful in its own context.
  2. There must not be open-ended “composition” by using delimiters in the value.
  3. The type must not be used to indicate a “version” of the schema if breaking changes occur.

These rules mean that emb.type, for the most part, is an opaque value that allows customization of the processing and displaying of telemetry.
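
One way to picture "opaque" is a backend that looks the value up by exact match and never tries to derive behavior from its pieces. The sketch below is an assumption about how such a dispatcher could look, not our ingest code; the handler names and the dictionary-shaped spans are invented for the example.

```python
def summarize_http(span):
    attrs = span["attributes"]
    return f"HTTP {attrs.get('http.response.status_code')} {attrs.get('url.path')}"


def summarize_session(span):
    return f"session lasting {span['duration_ms']} ms"


def summarize_generic(span):
    # Fallback for spans with no hint, or a hint we do not recognize.
    return f"span {span['name']}"


# Exact-match registry: no inheritance from "parent" types, no parsing of
# composed values, and no version suffixes to interpret.
HANDLERS = {
    "perf.network.http": summarize_http,
    "ux.session": summarize_session,
}


def process(span):
    emb_type = span["attributes"].get("emb.type")
    return HANDLERS.get(emb_type, summarize_generic)(span)
```

A value such as "perf.network.http.v2" would simply fall through to the generic handler here, rather than being treated as a newer version or a child of the HTTP type.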

Primary categories of mobile telemetry

Once we started to plan our re-instrumentation atop OTel, we realized that not all data we collect is the same. There were some major distinctions that fundamentally change how we interpret data.

The data we started to collect broke into three primary categories. These were separated by what we were observing. They are:

  1. The application’s performance – emb.type = "perf"
  2. The user’s interaction – emb.type = "ux"
  3. The system’s status – emb.type = "sys"

These categories are different because, when processing the data, we perform similar aggregations, but we interpret them differently.

Consider an example using span duration:

  • For application performance, developers want to minimize the span duration and make the logic as performant as possible. We look at the longest-running cases and try to remove any reason for these outliers. Here, we aim for the smallest duration.
  • In contrast, when measuring user interaction, developers may want to maximize the span duration in order to keep users engaged. You might wish to look at the longest-running cases and try to mimic these flows for other users! Here, we aim for the longest duration possible.
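
The contrast between those two cases can be sketched in a few lines. The aggregation is identical (find the longest-running span) and only the reading of the result changes. The span data, field names, and wording of the readings are made up for illustration.

```python
# Hypothetical, simplified spans: name, emb.type hint, and duration in ms.
spans = [
    {"name": "load_feed", "emb.type": "perf", "duration_ms": 120},
    {"name": "decode_image", "emb.type": "perf", "duration_ms": 950},
    {"name": "browse_catalog", "emb.type": "ux", "duration_ms": 240_000},
    {"name": "checkout_flow", "emb.type": "ux", "duration_ms": 35_000},
]

# Same aggregation for both categories, different reading of the result.
READING = {
    "perf": "outlier to shorten",
    "ux": "engaging flow to learn from",
}


def longest_span(spans, emb_type):
    """Return the longest-running span with the given emb.type."""
    matching = [s for s in spans if s["emb.type"] == emb_type]
    return max(matching, key=lambda s: s["duration_ms"])


def describe(spans, emb_type):
    top = longest_span(spans, emb_type)
    return f"{top['name']}: {READING[emb_type]}"
```

Calling describe(spans, "perf") flags decode_image as something to fix, while describe(spans, "ux") flags browse_catalog as something to replicate.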

Note that, when monitoring the system, it’s unlikely that we as app developers can do much to change its behavior. This data may not be actionable, but it is still relevant. For instance, if a number of spans are performing poorly, knowing that the device was in low power mode gives us insight into the issue. It might indicate that the CPU is being throttled or that network calls are delayed, since the system batches device radio use to improve battery performance.
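
A simple way to surface that correlation is to check which slow performance spans overlap in time with a low power span. This is a sketch under assumptions: the start and end fields (in milliseconds), the threshold, and the helper names are hypothetical, and the emb.type values are ones we use elsewhere in this post.

```python
# Hypothetical spans with start/end times in milliseconds.
spans = [
    {"name": "low_power", "emb.type": "sys.low_power", "start": 1_000, "end": 9_000},
    {"name": "sync_inbox", "emb.type": "perf.network.http", "start": 2_000, "end": 6_500},
    {"name": "load_avatar", "emb.type": "perf.network.http", "start": 10_000, "end": 14_200},
    {"name": "read_cache", "emb.type": "perf.fileio", "start": 3_000, "end": 3_040},
]


def overlaps(a, b):
    """True when the two spans share any stretch of time."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def slow_spans_in_low_power(spans, threshold_ms):
    """Names of slow performance spans that ran while low power mode was on."""
    low_power = [s for s in spans if s["emb.type"] == "sys.low_power"]
    slow = [
        s for s in spans
        if s["emb.type"].split(".", 1)[0] == "perf"
        and s["end"] - s["start"] > threshold_ms
    ]
    return [s["name"] for s in slow if any(overlaps(s, lp) for lp in low_power)]


suspects = slow_spans_in_low_power(spans, threshold_ms=4_000)
```

With this sample data, sync_inbox is the only slow span that ran during low power mode; load_avatar is also slow, but it started after the device left that mode, so the cause is probably elsewhere.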

Secondary categories

These primary categories are just the start. We still want type-hinting to work for more than these three categories. Our emb.type needed to evolve to allow for more specific scenarios, so the format is:

emb.type = "<primary>.<secondary>"

The primary category – one of perf, ux, or sys – is used as a prefix to distinguish our aggregation behavior. An optional secondary suffix can then be used to provide contextual relevance to the operation.

Some examples of emb.type values for Spans that we have in use are:

perf.fileio
perf.network.http
perf.sql.vacuum

ux.session
ux.action.tap

sys.low_power

These values classify a type of interaction or operation that we observe from within the application.
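
For illustration, here is how a consumer might separate the primary category from the optional secondary suffix. The parse_emb_type helper is hypothetical. It splits only on the first dot, so the whole remainder (for example "network.http") is kept as a single secondary value instead of being broken into a hierarchy.

```python
PRIMARY_CATEGORIES = {"perf", "ux", "sys"}


def parse_emb_type(value):
    """Split an emb.type value into (primary, secondary or None)."""
    primary, _, secondary = value.partition(".")
    if primary not in PRIMARY_CATEGORIES:
        raise ValueError(f"unknown primary category: {primary!r}")
    # The secondary part is optional and is not interpreted any further.
    return primary, secondary or None


examples = [
    "perf.fileio",
    "perf.network.http",
    "perf.sql.vacuum",
    "ux.session",
    "ux.action.tap",
    "sys.low_power",
]
parsed = {value: parse_emb_type(value) for value in examples}
```

The primary category is enough to choose how durations are aggregated and read, while the full value can still be matched as-is for type-specific processing.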

Wrapping up

So what does this mean? At Embrace, it means we can provide specific insight into the telemetry we collect. If you choose to use our SDK and the instrumentation it provides, it means your backend will be able to take advantage of this emb.type attribute. Hopefully this post shares some of the thought process behind it.

This type-hinting might grow and change as both our instrumentation and the OTel spec grow. We’ve just started our OpenTelemetry journey, and this was our initial, stable attempt to create a useful paradigm for mobile clients. We’re excited to try new things and share what we find useful along the way.
