Pumpkin spice and OpenTelemetry for mobile panel recap

In this OpenTelemetry expert panel, we discuss the challenges of collecting telemetry in mobile apps, why mobile developers struggle with observability, and what the current support for OpenTelemetry is on Android and Swift.

Recently, I got together with several members of OpenTelemetry’s Android and Swift SIGs for a fun Fall-themed discussion about how mobile developers can improve their observability practices. (You can watch the full video here.)

There’s definitely a “season of change” happening right now in terms of extending observability tooling and practices to frontend app development. Earlier this year, we covered OpenTelemetry’s new Browser SIG, and now we’re diving into all things OpenTelemetry for mobile.

That’s right, the Android and Swift SIGs have been hard at work planting the seeds for improved client-side observability for mobile, and now it’s harvest time.

In our discussion, they shared the bounty of their knowledge, experience, and contributions to the OTel community, so that engineers can have better instrumentation, telemetry, and observability for their mobile apps. Here’s a small sample of the topics we covered:

  • What are some of the challenges of collecting telemetry in mobile apps?
  • What are instrumentation considerations that are unique to mobile apps?
  • How do telemetry data volumes stack up for mobile apps compared to backend systems?
  • Why is tracing so difficult in mobile apps?
  • What’s missing in terms of semantic conventions for mobile observability using OpenTelemetry?
  • Why does Swift Package Manager make it challenging to use OTel for iOS apps?
  • Does OpenTelemetry have support for Kotlin?

If you’d like to see some of the highlights, read on! We’ve got the best Fall-themed item award, key panelist quotes, favorite answers to questions, and more. If you’d prefer to check out the video instead, you can watch the full panel discussion here.

Hope you enjoy it, and we’ll see you at the next one!

The best Fall-themed item award

When we did our intros during this Fall-themed panel, we learned a few things about Jason Plumb:

  • He’s a big Halloween fan.
  • He loves horror movies.
  • He really loves Fall.
  • He can carve the OpenTelemetry telescope into a pumpkin!

Key quotes from the panel

Ari Demarco

On why collecting telemetry from mobile apps is so difficult: “Mobile apps don’t run continuously, so they are suspended, backgrounded, terminated, killed by OS, there’s a crash, […] the OS can pre-warm your application, the application could launch because of a push notification, a background fetch, or because a human tapped into the icon. So, when do you flush your telemetry? […] How do you track session continuity across app restarts? What happens to, I don’t know, in-flight spans whenever there’s a crash, or the OS kills your process. So there’s a bunch of complexity in terms of what do you decide to do in those cases? And it’s not trivial, like, just solving one of those questions is not a one-liner thing you’ll solve in your code. It’s something you really have to think through to actually solve that.”

On why collecting telemetry from mobile apps is so difficult, part deux: “You gotta figure out how to deal with the different platform challenges that mobile provides to you, because mobile devices operate under fundamentally different constraints than the backend system. So, every telemetry operation consumes battery, CPU, memory, etc. So, those resources are valuable and have somewhat limited, because they are basically what the device has. So, a backend can indicate scores and can escalate vertically, for example, while a mobile app must use the limited resources the operating system provides you to do that, while rendering actions, doing interactions, and receiving other operating system stuff. So the challenge isn’t just collecting the data efficiently, it’s also doing that invisibly.”
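To make the "when do you flush?" question concrete, here is a minimal sketch of one possible policy: keep a small bounded buffer, export it when the app is backgrounded, and only discard records once the export succeeds. It is written in Python purely as a language-neutral illustration. It is not code from the OpenTelemetry Swift or Android SDKs, and `TelemetryBuffer`, `on_background`, and `fake_export` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Record = Dict[str, object]


class TelemetryBuffer:
    """Hypothetical bounded buffer; not an OpenTelemetry API."""

    def __init__(self, export: Callable[[List[Record]], bool], max_records: int = 100) -> None:
        self._export = export
        # deque(maxlen=...) evicts the oldest record when full,
        # which caps memory use on the device.
        self._records: Deque[Record] = deque(maxlen=max_records)

    def add(self, record: Record) -> None:
        self._records.append(record)

    def on_background(self) -> bool:
        """Would be called from a lifecycle hook when the app is backgrounded."""
        return self.flush()

    def flush(self) -> bool:
        if not self._records:
            return True
        batch = list(self._records)
        if self._export(batch):
            # Drop records only after the exporter reports success.
            self._records.clear()
            return True
        # Keep the records so a later lifecycle event can retry.
        return False

    def __len__(self) -> int:
        return len(self._records)


sent: List[List[Record]] = []


def fake_export(batch: List[Record]) -> bool:
    """Stand-in exporter that records each batch and always succeeds."""
    sent.append(batch)
    return True


buffer = TelemetryBuffer(fake_export, max_records=2)
for name in ("app_start", "view_load", "tap"):
    buffer.add({"name": name})
buffer.on_background()
```

A real implementation would also have to persist unsent records to disk, since the OS can kill the process before a flush completes. That part is left out here.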

Bryce Buchanan

On why it’s challenging to instrument mobile apps: “I think something as simple as, like, ‘When should I start a span?’ Like, on a mobile device is not… it’s not trivial. On a backend, it’s very trivial. It’s like, oh, when I get a request, that’s when a span starts. Like, but for a mobile developer, […] should I do it when somebody clicks a button? When a network starts? […] There’s no right answer to that, like, how should you instrument that? It really depends on what your app does and what you’re trying to monitor.”

On the instrumentations available in the OTel Swift SDK: “We do have a number of other instrumentations, which are just things that we kind of see as useful bits of data that we pipe through the other signals, traces, logs, or what have you. So we got network instrumentation using URLSession. If you enable that, it’ll just automatically create spans for your network instrumentation. We just recently had a contribution of session events, so when your app launches and, you know, closes, you’ll get events for that automatically if you enable that instrumentation.”

Hanson Ho

On why it’s challenging to instrument mobile apps: “I think OpenTelemetry really promotes the sense of automatic instrumentation. So you drop it in the library, just… instruments by default, and you just get traces and logs at the end. […] On mobile apps, there are key steps that we could do, you know, for, like, app startup, or, like, you know, view transitions, things like that, crashes. But ultimately, an Uber Eats, what those folks want is different than what a Pinterest wants, or, your banking app. Like, the goals are just so different. And to understand the goals and translating that into what kind of telemetry is a non-trivial leap. It seems trivial, if you haven’t done it, but when you do it, you’re, like, I care about everything. Do you really care about everything?”

On getting an OpenTelemetry Kotlin SDK: “At Embrace, we have submitted a donation proposal for a Kotlin API and SDK to OpenTelemetry, so hopefully we could, you know, build a SIG around that. So, whereas the Swift SDK, or the Swift SIG, both is… handles the agent and the underlying language SDK, we hope that the eventual OTel Kotlin SDK will be the language part, and the Android SDK will be the agent part. And those two, hopefully, will work in harmony with the API, so that there could be other agents that are built alongside the official OTel Android agent.”

Jason Plumb

On why it’s challenging to collect telemetry from mobile apps: “The place that my brain went to first is, more around performance, and I think that often on mobile and client-side stuff, people are hyper-focused on performance, specifically about the operational cost of capturing telemetry. So, like, what API calls you have to make to the platform, how long you can spend in those callbacks or those event handlers to, like, get the requisite data, but then also payload size on the wire, like, that becomes really important for people.

You know, on the mobile… in the mobile realm, networks are, like, hugely unreliable. They go up and down all the time, and you’re transferring between towers and everything else. It’s… like, efficiently handling those payloads is also something people, I think, are specifically challenged with on mobile that doesn’t exist in other platforms, and we don’t have the luxury of just, like, scale horizontally, like, fire up a few more instances, and you can handle this new load. It’s like, no, we’re all just kind of contained in this little package, and it has to be kind of good, or at least improving release to release, so that’s where my brain went.”
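The concern about payload size on the wire can be illustrated with a small sketch: batch records together and compress the batch before sending. This is only an illustration in Python, using JSON and gzip from the standard library. It does not reflect how any OpenTelemetry exporter encodes its data, and `encode_batch` and `decode_batch` are hypothetical helper names.

```python
import gzip
import json
from typing import Dict, List

Record = Dict[str, object]


def encode_batch(records: List[Record]) -> bytes:
    """Serialize a batch as compact JSON, then gzip it."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decode_batch(payload: bytes) -> List[Record]:
    """Reverse of encode_batch, as a receiving backend might do."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Telemetry tends to be repetitive (same keys, similar values),
# so batching many records together usually compresses well.
records = [
    {"name": "screen_view", "screen": "checkout", "duration_ms": i}
    for i in range(200)
]
raw_size = len(json.dumps(records, separators=(",", ":")).encode("utf-8"))
payload = encode_batch(records)
print(f"raw: {raw_size} bytes, compressed: {len(payload)} bytes")
```

The trade-off is that compression itself costs CPU and battery, which is the other half of the performance budget described above.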

Nacho Bonafonte

On creating OTel Swift: “I started in a startup company who wanted to do tracing on tests. So… That was the product that we were gonna build. We built an MVP with OpenTracing. And we had that working with that, but we saw that OpenTelemetry, it was already announced, just recently announced, and it was the future, right? Because it was not only a spec, as OpenTracing was, but also a default implementation that will allow to interact way better with the tests and with the servers that you have in the cloud, and you can interact in a way that you don’t need, like, having your own code running on the server, because the spec gives you all. So, yeah, there was no OpenTelemetry Swift, there was nothing for Swift, so I started that, I built it, and donated to the community.”

On why Apple makes it difficult to keep Swift OTel package sizes small: “It [OpenTelemetry Swift SDK] had to […] support a protobuf OTLP protocol with protobuf, and that means that you have a dependency on Apple on a library from Apple that has a dependency of another library from Apple, and it has a dependency of another library, and another, and another, and another. So, at the end, when you need some feature, you need a special library, and that has lots of dependencies that are controlled by Apple. […] I mean, even if you don’t use them, the package manager decides that it downloads everything to your… to your laptop. You don’t build anything, but it loads everything, and you cannot do anything to fix that.”

Favorite answers to “What are some of the key challenges in collecting telemetry from mobile apps?”

Bryce Buchanan: “Could it be scale? What do you think? Do you guys think scale? […] With mobile applications, you know, you have millions of users, potentially. Anybody who downloads your app, if you’re monitoring it, you know, they’re gonna be sending you data, and, I think OTel is kind of… starts… started out as a server-side monitoring, kind of, like, by design, and when you’re server-side, you’re dealing with, like, thousands of clients, right? Rather than millions of clients, so… that really becomes a problem when you design everything with that assumption.”

Ari Demarco: “That also leads to the problem of data volumes, because depending on the app, a mobile application can generate an enormous amount of telemetry. So, unlike backends that you can control sampling centrally, in mobile, the sampling decisions probably should be made on-device with kind of limited visibility into the bigger picture. And then you have the question, if you oversample, you’ll waste a lot of bandwidth or battery. […] But if you undersample, you probably miss critical telemetry that is necessary to identify issues or understanding behaviors.”
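One way to make a sampling decision on-device, without any central coordination, is to hash a session identifier and compare it against the sample rate, so that every signal in a session is kept or dropped together. The sketch below is an assumption-laden illustration in Python, not the sampler of any OpenTelemetry SDK. `keep_session` is a hypothetical function, and the session ID format is made up.

```python
import hashlib


def keep_session(session_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to keep all telemetry for a session."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Compare integers rather than floats so a rate of 1.0 keeps every
    # session and a rate of 0.0 keeps none.
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(sample_rate * 2**64)
    return bucket < threshold


sessions = [f"session-{i}" for i in range(10_000)]
kept = sum(keep_session(s, 0.10) for s in sessions)
print(f"kept {kept} of {len(sessions)} sessions")
```

Because the decision depends only on the session ID, it survives app restarts within the same session. It still has the limitation Ari describes: the device cannot know in advance whether a dropped session would have contained the critical issue.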

Favorite answer to “Where do mobile teams struggle when it comes to observability and OpenTelemetry?”

Hanson Ho: “Tracing and […] telemetry is not a core competency of mobile developers […] because, you know, the challenges that they face are different. […] There’s so much to actually teach a team, and the architecture, the mobile app architectures also aren’t super well designed for maintainable instrumentation. When you have, like, these modules that don’t… talk to each other, but you gotta put it in the same trace, and they run on different threads, like, what the hell do you do?”
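One common answer to the "different modules, different threads, same trace" problem is to pass the parent span's context explicitly across the boundary instead of relying on thread-local state. The following is a minimal Python sketch of that idea under simplified assumptions. `SpanContext`, `Span`, `start_span`, and `end_span` are hypothetical stand-ins, not the OpenTelemetry API.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[str]


finished: List[Span] = []
_lock = threading.Lock()


def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    # A child reuses the parent's trace ID; a root span creates a new one.
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    context = SpanContext(trace_id=trace_id, span_id=uuid.uuid4().hex[:16])
    return Span(name, context, parent.span_id if parent else None)


def end_span(span: Span) -> None:
    with _lock:
        finished.append(span)


def load_image(parent: SpanContext) -> None:
    # Runs on a worker thread in a separate "module". The parent context is
    # an explicit argument because thread-local state does not follow it.
    span = start_span("load_image", parent)
    end_span(span)


root = start_span("open_product_screen")
worker = threading.Thread(target=load_image, args=(root.context,))
worker.start()
worker.join()
end_span(root)
```

The cost is exactly the maintainability problem Hanson points at: every module boundary now needs a parameter for the context, which is hard to retrofit into an existing app architecture.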

Favorite answers to “What are some of the big things that the SIGs are working on to improve that mobile support?”

Ari Demarco: “One of the main problems we had is that the OpenTelemetry Swift repository is really big in terms of size. […] Whenever you have to download it, or compile your application, run tests, run this in CI, build the application, and deploy that, all that takes a bunch of time, and obviously, for example, in terms of CI, minutes is money, so… for every single iOS developer, it was going to be a pain. And probably, maybe they just wanted to use the API, or just our implementation of the OpenTelemetry SDK.

So, one of the things we did is basically create a plan on how to divide this. It is something really, really particular, because it’s a super specific problem of OpenTelemetry Swift, because of the Swift ecosystem, and because of how Apple manages all the dependencies. So, what we end up doing is we keep the official OpenTelemetry Swift repository as our main repository. It has everything we need, and everybody needs to do anything related to all OTLP.

But at the same time, that repository depends on another one, that is the OpenTelemetry Swift Core. That only has the OpenTelemetry Swift API and OpenTelemetry Swift SDK, which are, like, the bare minimum for you to get started, to start creating your own traces, to start emitting logs. And using the already implemented tracer, or logger, that is basically what the OpenTelemetry SDK is.

So, in that way, you can… process data, export that data, and start instrumenting your application without all the other overhead that the rest of the repository adds. While at the same time, we are still all OTLP-compliant, so anybody that wants the other use cases can still benefit from. So… It’s… it’s a good balance between what the community was asking, something that is also useful for the community and for ourselves.”

Jason Plumb: “I classify it in 3 main areas. […] First and foremost is in API stability. So, we have an initialization API for the agent. We’re working on stabilizing that in the next couple of releases. […]

Second area is broadening our instrumentation. So what we instrument… how deep and what we cover with instrumentation. And then documenting that, but also, especially enhancing our support for auto instrumentation, which we have build time auto-instrumentation support. […]

And then, the third thing, the third category, which is, I think, maybe just as important, are semantic conventions. So we are… with every bit of instrumentation, with every kind of new feature that we’re adding, we’re trying to mirror that in the semantic conventions, even if the first pass is in development or experimental, like, at least having that out there and documented, what it means, like, what the intent is when you see a piece of data marked with this name, what these attributes hang off of it mean. So getting those defined and used in the Android SDK in a way that multiple vendors, multiple backends can expect a consistent naming.”

The “Apple, you should know better” moment of the panel

Bryce Buchanan: “I think this also speaks back to the server-centricness of OpenTelemetry, where the core features require gRPC, which, I think is becoming more widely adopted, but it’s just a huge library, and you know, if you’re… recently we added an HTTP OTLP library. I mean, like, in the last 2 years, it’s not really that recently, but… you can’t only use that, because Swift Package Manager just downloads everything in the dependency list, even if you’re linking to it or not. So, I really wish that would get fixed. Please, Apple, fix it.”

Hanson Ho: “Send those radars.”

Bryce Buchanan: “Yeah, send those radars!”

Resources about OpenTelemetry

Full transcript

Colin Contreary: Hello, everyone, and welcome to today’s event, “Pumpkin spice and OpenTelemetry for mobile.” I’m Colin Contreary, I’m the head of content at Embrace, and I will be today’s moderator. We’ve got a wonderful panel with several members of OpenTelemetry’s Android and Swift SIGs.

Now, the OTel community has been hard at work planting the seeds for improved client-side observability for mobile, and now it’s harvest time. That’s right! They’re here to share the bounty of their knowledge, experience, and contributions to the OTel community, so that engineers can have better instrumentation, telemetry, and observability for their mobile apps.

So, what will we be covering in today’s discussion? Well, we’ll cover some of the key challenges that mobile platforms face when collecting telemetry. And note, I did say some of the challenges, not all of the challenges. We only have an hour here.

We’ll dive into what the Android and Swift SIGs are working on to improve mobile support for OpenTelemetry.

And we’ll also cover some helpful suggestions for mobile developers who are just getting started with OpenTelemetry.

And also, we’d love to answer as many questions as you have, so ask your questions with the Q&A feature, and we’ll answer them either during the panel, or during a dedicated Q&A time we have at the end. And so, with all of that out of the way, let’s dive in!

And we’re gonna start with a poll question as we meet our panelists. So, let me launch a poll question. Alright, everyone should see it.

And so, wonderful panelists, while we give the audience some time to get their answers in, let’s go around, and let’s have everyone introduce themselves, just a quick intro, and then share your favorite Fall thing, whether it’s a food, a drink, a decoration, etc. So let’s start with…

Actually, maybe I’ll go first, because I do want to show my Fall thing. So, for me, I already said, hello, my name’s Colin Contreary. My favorite Fall thing, at least this year… I don’t know if y’all can see this. I got this at Costco. It is a pumpkin streusel muffin. And so if y’all like Costco, go buy this. It’s delicious.

And so with that, I will now hand it over to Hanson. Hanson, can you go next?

Hanson Ho: My name’s Hanson, I do OTel and Android stuff here at Embrace. My favorite Fall thing is Fall baseball. Playoffs, especially when my team isn’t the Toronto Blue Jays. Big game four coming up, because… Yeah, we gotta, we gotta do it, because blowing that lead the last thing was not good. But anyway, that’s me.

Colin Contreary: Nice. Bryce, how about you go next?

Bryce Buchanan: Hi, I’m Bryce Buchanan, I’m an OTel, or not OTel, but an observability engineer at Elastic, and I work on the OTel Swift SDK. I’ve been kind of doing mobile stuff since the, you know, one of the first iterations of iPhone, working at a shop building apps. And my favorite Fall item is an apple, which is from my very own apple tree in my backyard, so… They’re quite tasty.

Colin Contreary: Do you have… do you have one… you have one singular apple tree, or do you–

Bryce Buchanan: I have… I have a li- just a little orchard, but they’re all, like, different varieties. I think I’ve got, like, 7 different trees. One of them’s an Asian pear tree, which is really tasty too, but… The apples are where it’s at.

Colin Contreary: I am jealous. Alright, very cool. Jason, how about you go next?

Jason Plumb: I also love those Washington apples. Yeah, those are delicious. Yeah, I’m Jason Plumb, I am an engineer at Splunk. I’m based out of Portland, Oregon. I’ve been helping out with OpenTelemetry for at least 5 years, and I am a maintainer on OpenTelemetry Android.

And I’ve been helping out with the Java, kind of, core and instrumentation repos on the… on the server, I guess, server side, like, on the core Java repos as well.

I’m a big Halloween fan, I love horror movies. I’m, like, pretty deep into horror movies, and so… Halloween’s a special time for me, you know, October. I also love just, like, going out and, like, seeing the leaves and, like, watching the weather change, the seasons change. I really love Fall. It’s absolutely my favorite season. I did a… I did an OTel-specific pumpkin for you all.

Hanson Ho: Oh, wow!

Bryce Buchanan: Thank you.

Jason Plumb: Stop, okay, there we go. So freshly carved, there you go. On the pumpkin.

Colin Contreary: How did you do that? Just with a knife? You just, like…

Jason Plumb: Yeah, with a knife, yeah,

Colin Contreary: Whoa!

Bryce Buchanan: With a spoon.

Colin Contreary: That’s very cool. Wow, Jason, that is tough to top. That is incredible. Alright, let’s… let’s kick it over to Nacho.

Nacho Bonafonte: Yeah, I am Nacho Bonafonte. I work on OpenTelemetry Swift, as a maintainer. I’m basically now doing on my… on my free time, because I started when I was working on a company, and my current company is not observability-related, so I keep working on the project, just not as much as before, but yeah, I try to help.

My favorite Autumn item, I am bringing just an example, is an Apple device because, you know, that Apple presents new stuff in September. Usually, it gets… the new devices come on the 21st, 22nd of September. That means when Autumn starts, so yeah, Autumn is synonym of just new Apple devices that we choose.

Colin Contreary: Nice. Did not expect that answer, Nacho. Very surprising. Very good. Alright, and Ari, let’s… let’s go to you next.

Ari Demarco: Hello, so I’m Ari Demarco, or Ariel Demarco. I also am a maintainer on the Swift SDK. And I’m working at Embrace as an iOS engineer. I… my background responds to both backend and mobile development, but this day, I’m mostly focused on the mobile side of things, in particular, iOS.

And regarding the Fall item, as I mentioned, I’m calling from Buenos Aires, where it’s actually Spring now, so while you guys are enjoying some pumpkin pies, some Fall colors, I’m over here embracing the Spring vibes with my floral shirt. And even though I wore them all year round. This is the time where this type of clothing, this type of shirts, really hits right.

Colin Contreary: Nice. Alright. Well said, Ari. Alright, and with all those intros, I think we’ve gotten all of our answers in the poll, so let me… Let’s share the results. And it looks like people prefer the pumpkin drinks. Alright, maybe they were biased by this, the panel title of “pumpkin spice,” or maybe pumpkin spice is indeed the best, but interesting.

Pumpkin drinks, everyone, I guess we gotta go grab one after, except Ari, because you probably don’t want to drink pumpkin things if you’re feeling springtime.

All right. And so, with all of that out of the way, I know we did quick intros. Because part of this discussion, we’re going to be getting into some of the differences in working at OpenTelemetry on backend systems, which is what it was originally designed for, versus, obviously, what we’re here to talk about, mobile.

I’d love to learn a bit more about everyone’s background here. For example, what’s your engineering background? Are you… primarily a backend engineer? Did you get into mobile work recently? Are you mostly a mobile engineer? Have you dabbled in backend?

Did you get started in OTel for back-end work versus mobile? Just to give everyone some context into, kind of, what your area of specialty is, and what you’re most interested in working on, obviously, in OTel.

So, with that said, why don’t we go around and just get a little bit more about everyone’s background, both engineering-wise and also with the OpenTelemetry project?

So maybe, well, I already started with you, Hanson, so maybe that’s not to you. Maybe Jason. Jason, would you like to go first?

Jason Plumb: Yeah, a little bit about my background. Gosh, I’ve been doing this… I’ve been doing this, this being engineering, for a long time. In college, I studied electrical engineering and computer science, so I kind of have, like, a formal engineering background. But got into software, you know, when… when it was really lucrative to do that, and there was a lot going on back then.

I’ve always… I’ve worked in a bunch of different verticals, but kind of… I guess my… my domain area has kind of been, like, big data, back when people used that as a term, and, like, making scalable APIs, and, like, doing a lot with REST, and that’s kind of, like, where my… and, you know, server-side focused, like, deploying services and building like, big, robust, scalable APIs with huge data backends, and so that’s kind of where I came from. And then I dabbled a little bit in Android when I was at Adidas, and I was the lead on the backend system for this mobile device, and fitness app, and so there was Android and iOS representation on that project, and so I helped kind of… I mean, everyone did everything on that… on that team, but, at least a little bit, but yeah, I was leading up the backend, effort there, so… kind of getting familiarized with mobile back then, and then, got really specialized in telemetry and instrumentation, kind of in, like, maybe 6 or 7 years ago. So that’s kind of… It’s kind of how I came into this… into this boat.

Colin Contreary: Okay, awesome. Thank you so much, Jason. How about–

Hanson Ho: I want to interrupt. How’d you get roped into Android?

Colin Contreary: Oh, yeah, I noticed that, yeah.

Jason Plumb: So, yeah, when, we had, we, Splunk, had, a RUM product that was kind of homegrown, internal, like, developed open source, based on the Java SDK and the Java instrumentation. So I helped out with that because it was heavily based in Java, and because I was here and familiarized with some of that codebase, I helped to shepherd some of that. So I wasn’t… I wasn’t the main developer on it, but I helped out enough that, when there needed to be some… some help there, that’s how I got… got brought into it. Yeah.

And I’ve written some toy apps, I never, like, you know, I’ve never, like, deployed anything to the App Store and, like, not that serious about it, but I’ve, you know, I’ve made sure I can at least kick the tires on mobile development a few times.

Colin Contreary: Alright, thank you. How about we do, Ari, can you go next, please?

Ari Demarco: Yeah, so I mentioned that I work in both sides of things, like in the client side and the backend side. I started my career mostly on the backend side and web, but for the past several years, I’d say, like, 9-10 years, I’ve been focused primarily on mobile, in specific, iOS.

And while I was already familiar with the observability concepts from my backend days and web days, I first heard of OpenTelemetry some couple years ago, while working at my previous company.

And I think it was a good idea, because the idea of bridging the gap between all the different tools, and forging a unified standard really resonated, like, well to me. That’s where I started to… contribute to the OpenTelemetry Swift, and I realized that there’s a bunch of things to do, because, again, it’s a standard that was designed for… the backend ecosystem, let’s say.

And there are a bunch of challenges to… to move forward. So, yeah, and that’s what drew me in to get involved with Swift SIG, with mobile observability. And there’s where I met both Nacho and Bryce.

Colin Contreary: Alright, nice! Well, let’s hear from one of them next. Nacho, how about you?

Nacho Bonafonte: Yeah, I am… my background, I, you know, so, always, like, a Mac guy, so I started, building, yeah, updating apps from classic Mac to macOS 10, using Carbon and C, Basic, and yeah, and I have been rolling, all the new, you know, all the new technologies Apple brings. Every some time. So yeah, basically updating apps all my life, to the new, libraries that Apple has. I started with Macs, because iPhone was not even a thing then, and, I moved to… to mobile development, and yeah, like, yeah, like, 2010, or… or so.

And, yeah, and how I came to OpenTelemetry, yeah, I started in a startup company who wanted to do tracing on tests. So… That was the product that we were gonna build. We built an MVP with OpenTracing. And we had that working with that, but we saw that OpenTelemetry, it was already announced, just recently announced, and it was the future, right? Because it was not only a spec, as OpenTracing was, but also a default implementation that will allow to interact way better with the tests and with the servers that you have in the cloud, and you can interact in a way that you don’t need, like, having your own code running on the server, because the spec gives you all. So, yeah, there was no OpenTelemetry Swift, there was nothing for Swift, so I started that, I built it, and, donated to the community.

We… we… after that, we… we were bought by Datadog, and we created a product based on… on the, on, on, on the test… the test instrumentation that we did. We also moved to CI/CD instrumentation, but yeah, the test product for Swift is still working on top of OpenTelemetry, and not using, other, like, the stuff that Datadog has. So, yeah, it’s, like, workable like that. And that’s my background.

Colin Contreary: Wow. Okay. There’s a heck of a background, the OG.

Nacho Bonafonte: Yeah, I must say also that I wrote the first version, but really quickly, Bryce came to help, and doing lots, lots, lots of things in the, in the project. I just started the, the SIG, and… and most of the work is now done by, by Ari and Bryce, mainly, yeah.

Colin Contreary: Alright, well, let’s hear from Bryce, then. Bryce, is that true? Oh, you’re muted, by the way, Bryce.

Bryce Buchanan: Sorry, I was gonna say that there’s still a few sections of the, of the, SDK that I still need, Nacho’s help with, so… I was like, I don’t know about that. But, yeah, so, I kind of started my career as an iOS developer, back in 2009. I worked for this little startup in Seattle, that contracted out to larger companies building apps for them, and, so that was a lot of fun, and I got kind of bored. It turned into, like, this thing where it’s like, oh, this company wants an app that just is, like, a table that presents its, you know, list of stuff, and then… this app, you know, this company wants the same app, and that company wants the same app, and so it just turned into, like, moving assets around, essentially.

And so I was like, well, I’m gonna go do something else, so I joined New Relic in 2013 to, start working on what is now called observability for iOS, you know, building SDKs for them, and kind of dipping my toes into a more full-stack role, like, doing the backend and, web development as well.

And all of that was all in Objective-C, which I still am a little nostalgic for. But Swift has really come into its own, and I like that a lot now. But I didn’t really embrace Swift fully until joining Elastic 5 years ago, and building out their first fully supported agent that was built entirely on top of OpenTelemetry. So, that’s really where I started, diving into OpenTelemetry wholeheartedly, whereas in… at New Relic, it was more like, like, what is this OpenTelemetry thing? I’m not really sure. But, yeah, and that’s… that’s where I’m, at now, yeah.

Colin Contreary: Okay, nice. So your work doing that is what made you start participating in the… in the Swift SIG?

Bryce Buchanan: Yeah, exactly, yep, yep. Yeah, Elastic really wanted to participate in OpenTelemetry, and, and I, you know, really dove in, as kind of, like, the first, engineer at Elastic to, to really start working entirely in OpenTelemetry.

Colin Contreary: Okay, awesome. Thank you so much.

Ari Demarco: Bryce, I also… I also feel some nostalgia for Objective-C, if it’s worth saying.

Bryce Buchanan: Yeah, it’s like, there’s not enough square brackets in my life anymore.

Colin Contreary: The chances of the Objective-C SIG ever forming have, are far, far in the rearview mirror.

Bryce Buchanan: I think that’s a dead language, unfortunately. It’s a shame. Maybe there’s still some engineers at Apple who write frameworks for Apple that use it, but I’m not sure.

Colin Contreary: Awesome. And let’s round it out with, Hanson.

Hanson Ho: Well, I feel no nostalgia for Java 8, especially the HTTP network library. So, my background, I started off doing web stuff early in my career, back when we had to use alert boxes to debug JavaScript applications. Back when DHTML was a term, Ajax was new.

Did that for a number of years, but then started doing Android development when I joined Twitter in 2015, and have been doing Android since. What I did with Twitter was about reliability and stability.

And so telemetry is super important. Experimentation is super important, trying to understand user impact and value and what things are happening on the client side. Ultimately how impact user experience, was important. So what we did was we needed something better than what we had, which was basically just logs. So we built something, we call PCT, which looks very similar to, the many tracing libraries you’re familiar with, including OpenTelemetry’s tracing.

So when… when that all went… that way, I joined Embrace, and built out, as a vendor, this reliability stuff. And somewhere along the way, we said we should build on an open standard so that we’re not just sitting in the silo alone, and OpenTelemetry was there, and I started, you know, to join the SIG and contribute to it. Not as much as I wanted to, but, you know, startups have to do startup work, so, you know, you do what you can. But I think it’s been a year and a half or so that I’ve been doing OpenTelemetry, something like that.

And since then, you know, my focus is really on how to bring just the notion of observability and production telemetry to a world, mobile, that, you know, isn’t super familiar with those concepts. I mean, they think of telemetry as crash reports and RUM, which is useful on its own, but observability kind of takes it to a different level, especially if you have the ability to access, you know, direct information on the app, so you get user-focused observability, which is the latest thing I want to just keep talking about. So yeah, here I am.

Colin Contreary: Nice, awesome, thank you, Hanson. Well, I’m actually glad we took the time, because we have a lot of great coverage here. I’m loving everyone’s background. We have a lot of people backend first, then to mobile, some people web first, then backend, then mobile, then mobile, then backend, then back. So, I think everyone here has worked across the full stack. Everyone knows the sheer spectrum of problems in terms of operating these systems, both from the frontend and the backend perspective.

And also, I’m hearing a lot of the same thing of, oh, you know, we needed this thing, we had to build it internally. So I’m so happy everyone is here, because that’s one of the goals of OpenTelemetry, is if we have some of this stuff built by a community, then every engineer is not in their own silo, building everything from scratch. So, very excited to hear. And with that.

Let’s get into our discussion. So the first topic, and I’m sure there’s a lot of things we could cover, our first topic is sharing what are some of the key challenges that mobile platforms face when collecting telemetry.

So, you know, we’re all here, we’re working in the SIGs, we’re working on libraries, instrumentation, APIs to help mobile developers be able to collect telemetry so that they can have better observability. So, I’m curious for the group, once you, either as a mobile developer, or when you started working for these SIGs, what are some of these challenges in collecting quality telemetry within mobile apps?

And maybe we can start there. Maybe, Bryce, would you like to start us off?

Bryce Buchanan: Sure, I don’t know, I’m trying to predict what everybody’s gonna say, but I think that we might all be thinking the same thing. Could it be scale? What do you think? Do you guys think scale? Okay. I don’t know. I kind of think, like, what happened… I think we’re having this problem with metrics, for sure. You mentioned that earlier, maybe just a little hint there, but with mobile applications, you know, you have millions of users, potentially. Anybody who downloads your app, if you’re monitoring it, you know, they’re gonna be sending you data, and, I think OTel is kind of… starts… started out as a server-side monitoring, kind of, like, by design, and when you’re server-side, you’re dealing with, like, thousands of clients, right? Rather than millions of clients, so… that really becomes a problem when you design everything with that assumption. Does anybody else want to counter my… My thoughts?

Jason Plumb: I’ll jump in. I don’t disagree with any of that. I mean, I think that’s spot on. The place that my brain went to first is, more around performance, and I think that often on mobile and client-side stuff. People are hyper-focused on performance, specifically about the operational cost of capturing telemetry. So, like, what API calls you have to make to the platform, how long you can spend in those callbacks or those event handlers to, like, get the requisite data, but then also payload size on the wire, like, that becomes really important for people.

You know, on the mobile… in the mobile realm, networks are, like, hugely unreliable. They go up and down all the time, and you’re transferring between towers and everything else. It’s… like, efficiently handling those payloads is also something people, I think, are specifically challenged with on mobile that doesn’t exist in other platforms, and we don’t have the luxury of just, like, scale horizontally, like, fire up a few more instances, and you can handle this new load. It’s like, no, we’re all just kind of contained in this little package, and it has to be kind of good, or at least improving release to release, so that’s where my brain went.
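As an illustration of the point about unreliable networks and fixed device resources (not something shown on the panel), here is a minimal Python sketch of a bounded, batching telemetry buffer. The class name `BoundedTelemetryBuffer`, the `send` callback, and the capacity and batch-size defaults are all hypothetical; real mobile SDKs have their own export pipelines, and this is not the OpenTelemetry Android or Swift API.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedTelemetryBuffer:
    """Hypothetical on-device buffer: bounded memory, batched sends, retry later on failure."""

    def __init__(
        self,
        send: Callable[[List[dict]], bool],
        capacity: int = 100,
        batch_size: int = 10,
    ) -> None:
        self._send = send  # assumed to return True only when the batch was delivered
        # A deque with maxlen silently drops the oldest item when full,
        # so memory use stays bounded while the device is offline.
        self._queue: Deque[dict] = deque(maxlen=capacity)
        self._batch_size = batch_size

    def record(self, item: dict) -> None:
        self._queue.append(item)

    def flush(self) -> int:
        """Send queued items in batches; stop at the first failed send.

        Returns the number of items delivered in this call.
        """
        delivered = 0
        while self._queue:
            size = min(self._batch_size, len(self._queue))
            batch = [self._queue[i] for i in range(size)]
            if not self._send(batch):
                break  # network is down: keep everything for the next attempt
            for _ in range(size):
                self._queue.popleft()
            delivered += size
        return delivered

    def __len__(self) -> int:
        return len(self._queue)
```

The design choice here is to drop the oldest data when the buffer fills rather than grow without limit, since the app cannot "fire up a few more instances" to absorb the load.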

Ari Demarco: Yeah.

Bryce Buchanan: I’m thinking more along the lines of what I have… our side of the problem, not the customer’s side, so I think, yeah, I think that’s… more… more accurate for you.

Jason Plumb: But it’s also legit, I mean, yeah, absolutely, scaling to millions of handsets, potentially.

Ari Demarco: Yeah, I think that… Yeah, I think that also, in terms of scaling… it’s not, I think, the right match, but you gotta figure out how to deal with the different platform challenges that mobile provides to you, because mobile devices operate under fundamentally different constraints than the backend system. So, every telemetry operation consumes battery, CPU, memory, etc. So, those resources are valuable and somewhat limited, because they are basically what the device has. So, a backend can add cores and can scale vertically, for example, while a mobile app must use the limited resources the operating system provides you to do that, while rendering actions, doing interactions, and receiving other operating system stuff. So the challenge isn’t just collecting the data efficiently, it’s also doing that invisibly.

Invisible, so the user actually doesn’t really understand that that’s happening under the hood. So that’s a really big challenge for us, and for the different types of devices out there.

Nacho Bonafonte: Yeah, also related to what Ari mentions, I think that the privacy restrictions that the platform puts on you are also something that’s difficult. I mean, you definitely need operating system support for collecting what you really are interested in, and if the system doesn’t provide it, it makes you work around many things, or even provide less than great information, so, yeah, that also limits the functionality you can send.

Hanson Ho: Yeah, a little bit more of that is, you know, Android, iOS, every version comes with new APIs that you collect more information from. On Android, the uptick, or the adoption for new versions takes quite a bit of time. It takes a few years to actually get, you know, any sort of significant market share. Well, a couple years now.

So, like, trying to work with different APIs to get the same information, when the telemetry looks the same, but the source is different, you know, there’s an inherent kind of, you know, difference in how this stuff is collected, and we’re at the, we’re at the mercy of what the OS gives us. So, you know, sometimes we want to find some things out, and you’re like, sorry, you’re running as a sandbox process that has no access to anything but what we tell you.

It’s sometimes hard to understand the conditions, you know, it’s running in. Like, how do you know that Spotify is playing in the background? How do you know YouTube is playing in the background? That can take some cores, and that’s why things are slow. Like, these are things that affect… we don’t control the environment, and it makes things super difficult. But quickly, I kind of want to bring in, like, the technology is, I think, there’s a huge… a number of challenges, but just, I think, the people. Like, when you go up to a mobile team and you say, hey, you know, log some telemetry, and they’re like okay, what do I do? And you’re like, just, you know, create some spans. What’s a span?

Tracing and, and, like telemetry is not a core competency of mobile developers for… I mean, because, you know, the challenges that they face are different, so they… they’re just not used to it. So there’s so much to actually teach a team, and the architecture, the mobile app architectures also aren’t super, you know, well designed for, you know, maintainable instrumentation.

When you have, like, these modules that don’t… talk to each other, but you gotta put it in the same trace, and they run on different threads, like, what the hell do you do? So… I’ll stop there. We do only have an hour.

Nacho Bonafonte: Yeah, also, in that sense, we… have always tried to… make it as easy as possible for the user to… add instrumentation to the app, right? Because, as Hanson said, usually mobile developers are not used to this kind of stuff, so… if you can just initialize something there and make things just flow, you at least have something early, and… after that, probably you have to refine more, but trying to have a really easy way of just starting to report things and getting stuff is something that, yeah, we have focused… at least personally, I have tried to focus a lot on, on the Swift side. Yeah, but that’s a challenge.

Hanson Ho: I mean, Jason could probably speak to this, but one of the challenges of OpenTelemetry Android is… or one of the biggest things that we do is help initialize the SDK. And the thought around what the API should look like, and what the defaults, you know, should be like. So, I don’t know if you want to go into a little detail about that, Jason, because I think we spend a lot of time talking about it in the SIGs.

Jason Plumb: We certainly… yeah, we absolutely do. I mean, it’s a matter of a little bit of guesswork about what people might want to customize. And, especially in Android, the ecosystem is so diverse, that it’s really hard, or it’s challenging to predict what teams are going to want to customize which features. And it’s… it’s a little bit different, I think, than server-side, where it’s, like, everyone’s kind of doing the same thing, that, like, different teams are gonna care about and need to configure different aspects of the SDK, and it’s, like, very app-specific, but it’s also to some extent, vendor-specific, which is one of the goals of OpenTelemetry, right, is to try and be vendor-neutral.

To be portable, to avoid vendor lock-in, so we want to do that in a way that allows the flexibility of customizing for the users that have these special cases, while also being easy. So falling back to the case that covers, hopefully, 80-90% of application developers. Yeah. I think we’re doing a… I think we’re doing a really good job, though. I love to see where stuff is going in Android, and we can talk about that more in specifics later.

Colin Contreary: What are some of the questions? So, Hanson, you brought up a good point, where if someone is a mobile developer, they’re interested in OpenTelemetry, so we’ll get into, in the next section, we can talk about some of the specific work that SIGs are working on, but, like, what are some of those questions or common challenges they have? Other than just, I don’t even know what a trace is, are there… Using the tooling out there, is… is it just very difficult for them to kind of get started?

Hanson Ho: So, generally, when a mobile developer interacts with OpenTelemetry, they are told to, because somebody said, we’re using OpenTelemetry to log, you know, spans and things like that. They’ll be like, huh? OpenTelemetry… So, on a little bit of a background, for Android, there are product analytics frameworks that track, you know, things like what screens are opened and things like that, so that product managers can know if their features are being used so they can get a promotion or not. Yes, burn.

But those tend to be focused on what happened, and they don’t really explain. So when, you know, the funnel drops, and you’re like, oh, what happened?

They asked the developers, and the developers say, I don’t know, you know, they’ll go and check the, oh, did we deploy a new release? Oh, we did, you know, look at the commits and see what happened. And that’s typically… and because, frankly, mobile apps are fairly simple. They’re not… they’re not distributed systems where you don’t really know what is causing what. It’s just one process, you know? It… it, you know, so investigation based on, you know, what changed in repo, that’s all possible. So we’re not used to, like, having to log sophisticated telemetry. So, you know, we get maybe, you know, exactly, you know, some, some, some product analytics stuff, so we know how to create logs, but not anything sophisticated.

So… the first thing is, like, okay, what am I supposed to do? Like, what do you want to do here? And how do you validate that you did it right? Because, you know, when you put it in production, you can’t just, you know, oh, I made a mistake, I called the event name by a different thing, I’m just gonna deploy something new. Well, sorry, you’re gonna have to deploy a whole new version, go through all the reviews and then have people install it. So, even if you fix something, it may be, like, 2 weeks before, you know, the majority of your users have it, if you have a good, you know, deployment system and patching system. So, like, all that just makes it, like, what is observability? They would ask what is observability instead of what is OpenTelemetry. And once they understand the notion of tracing, the OpenTelemetry API is, I think, fairly intuitive. But then setting up the SDK and all the collectors, that’s the tricky part, and I think what we’re trying to do is to reduce the friction for somebody who, you know, like you said, somebody said, oh yeah, we want some OpenTelemetry signals from this app. And they go, what? They can go look at our documentation, look at OpenTelemetry Android, OpenTelemetry Swift, drop in the SDK, a couple lines of configuration, boom.

And it’s, like, at least the Hello World is set up. So… the main challenge is just, like, learning what the hell all this is.

Ari Demarco: Yeah, also, one thing regarding… I think Jason mentioned a bit that there are problems regarding network variability and the different costs for mobile. It’s not the same doing the request from a mobile application and doing it from the backend. But you also add to that the app’s lifecycle complexity. Mobile apps don’t run continuously, so they are suspended, backgrounded, terminated, killed by the OS, there’s a crash, the OS can pre-warm your application, the application could launch because of a push notification, a background fetch, or because a human tapped on the icon. So, when do you flush your telemetry? How do you track session continuity across app restarts? What happens to, I don’t know, in-flight spans whenever there’s a crash, or the OS kills your process? So there’s a bunch of complexity in terms of what you decide to do in those cases. And it’s not trivial, like, just solving one of those questions is not a one-liner thing you’ll solve in your code. It’s something you really have to think through to actually solve. So, it’s really complex whenever you compare it to what is there on the backend side.
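To make the session-continuity question concrete, here is one possible policy sketched in Python (an illustration, not something presented on the panel and not how any OpenTelemetry SDK defines sessions). It assumes a session is resumed when the app returns to the foreground within a timeout and replaced otherwise. `SessionTracker`, `SessionState`, and the 30-minute default are hypothetical names and values, and persisting the returned state to disk is left to the caller.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    session_id: str
    last_active: float  # timestamp in seconds


class SessionTracker:
    """Hypothetical policy: resume a session after a short gap, start a new one after a long gap."""

    def __init__(self, timeout_s: float = 1800.0, persisted: Optional[SessionState] = None) -> None:
        self._timeout_s = timeout_s  # assumed 30-minute inactivity timeout
        self._state = persisted      # state restored from storage after a restart, if any

    def on_foreground(self, now: float) -> str:
        """Return the session ID to attach to telemetry from this point on."""
        expired = self._state is None or (now - self._state.last_active) > self._timeout_s
        if expired:
            self._state = SessionState(session_id=uuid.uuid4().hex, last_active=now)
        else:
            self._state.last_active = now
        return self._state.session_id

    def on_background(self, now: float) -> Optional[SessionState]:
        """Update the activity time and return the state the caller should persist.

        Backgrounding is also a natural point to flush buffered telemetry,
        because the OS may kill the process without further notice.
        """
        if self._state is not None:
            self._state.last_active = now
        return self._state
```

Even this small sketch bakes in opinions (foreground-driven sessions, a fixed timeout) that would not suit an app that mostly runs in the background, which is the kind of decision the panelists describe as app-specific.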

Colin Contreary: And is handling some of these cases, we’ll get into this in the next section, but is handling some of this for the developers, some of the work that y’all are working on in these SIGs? So just so that… developers don’t have to kind of think about all of these edge cases, or are these sorts of things unavoidable? And it’s up to the developer to kind of manage them.

Bryce Buchanan: Yeah, I think, yeah, this is a tough one, because I was going to kind of jump on to what Ari was saying, like, I think something as simple as, like, when… when should I start a span? Like, on a mobile device is not… it’s not trivial. On a backend, it’s very trivial. It’s like, oh, when I get a request, that’s when a span starts. Like, but for a mobile developer, like, it’s like, should I do it when somebody clicks a button, when a network starts? It’s… it’s really… it really comes down to, like, I think, what your goal is, like, what do you want to track? And so, we can’t tell you, right? It really depends, like, what kind of app are you building, you know? Like, I don’t know, like, are you building a calendar app that, you know, sits next to your meeting room and is just on all the time? Like, or are you… creating a music app that sits in the background and plays music, like, or a streaming app, you know? Like, there’s all these different types of apps that have completely different ways of being used, and so, like, we can’t really predict… like, there’s no right answer to that, like, how, like, how should you instrument that? Like, it really depends on what your app does and what you’re trying to monitor.

Jason Plumb: Yeah, I think we, OpenTelemetry, don’t have excellent guidance to developers yet around some of this stuff. We don’t yet have, like, a really good data model or just, like, like, conceptual description of, like, what sessions are. I think some of us who have, like, been involved in RUM to some extent, like, we might have a slightly different vernacular. We understand the problem maybe differently than a lot of the other OpenTelemetry contributors.

Like, Hanson touched on the word funnel. I think if you have never used a RUM product, you might not even know what a funnel is, like, like, that is… but that’s critical, like, to people looking at performance and user journeys, like, through a RUM product, and I think every vendor for their tracing solution is going to have a trace waterfall at some… in some screen. Probably every RUM vendor has some way of looking at funnel, but then there’s all this other stuff, and that is kind of non-standard. It’s very different, and we haven’t… we haven’t… I mean, the… the UX being outside the scope of what OpenTelemetry is, but… the data that… that allows for building different UXs, we haven’t quite figured that part out yet.

It’s a little bit rambly, sorry, but I wanted to just, like… this domain is, like, very… it’s very kind of specialized, and to Bryce’s point, it really does depend on the app and what you’re doing.

Hanson Ho: Yeah, when.

Ari Demarco: Yeah, none…

Hanson Ho: Go ahead. Go ahead, sir.

Ari Demarco: No, go, go, go.

Hanson Ho: Oh, so, I mean, when you’re a backend service, the goal is to, like, take the request and, you know, shoot out the response. You want to log how long that took, and if there’s anything interesting happening, you know, in the middle. The goal is simple. The goal of a mobile app is to be defined. And, you know, when you ask, well, what should I log and trace, the product managers go, I don’t know. So it takes, like, a level of understanding. Like, I think OpenTelemetry really promotes the sense of automatic instrumentation. So you drop in the library, it just… instruments by default, and you just get traces and logs at the end.

On mobile apps, there are key steps that we could do, you know, for, like, app startup, or, like, you know, view transitions, things like that, crashes. But ultimately, like, you know, an Uber Eats what those, you know, folks want is different than what a, you know, a, you know, Pinterest wants, or, your banking app. Like, the goals are just so different. And to understand the goals and translating that into what kind of telemetry is a non-trivial leap. It seems trivial, if you haven’t done it, but when you do it, you’re, like, I care about everything. Do you really care about everything?

Colin Contreary: Nice. Okay, well…

Ari Demarco: I…

Colin Contreary: Oh, sorry, go ahead.

Ari Demarco: No, just… I think that wraps a bit with what Bryce started. That also leads to the problem of data volumes, because depending on the app, a mobile application can generate an enormous amount of telemetry. So, unlike backends, where you can control sampling centrally, in mobile, the sampling decisions probably should be made on-device with kind of limited visibility into the bigger picture. And then you have the question, if you oversample, you’ll waste… a lot of bandwidth or battery, and the resources that I mentioned before. But if you undersample, you probably miss critical telemetry that is necessary to identify issues or understand behaviors. So, the scale, it’s a problem when you generate that amount of data in different kinds of applications.
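One common way to make a sampling decision on-device without central coordination is to hash a stable identifier into a bucket and compare it to a sample rate. The Python sketch below illustrates that general technique; it is not taken from the panel or from the OpenTelemetry SDKs, and the function name `keep_session` and the choice of the session ID as the hashing key are assumptions.

```python
import hashlib


def keep_session(session_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to keep all telemetry for a session.

    Hashing the session ID means every span and log in the same session gets
    the same decision, and no server round trip is needed to make it.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

A fixed rate like this still has the blind spot described above: the device cannot know whether the session it drops is the one that would have explained an incident, so a sketch like this would typically be combined with rules that always keep crashes or other critical events.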

Colin Contreary: Yep, that’s a really good point. Yeah, because like Bryce mentioned, the millions of devices that you’re collecting telemetry from are individual people launching those apps in different places, on different devices, with different specs, using different app versions. It’s not the 1,000 very similar servers running your…

Bryce Buchanan: All using the same deployment.

Colin Contreary: Yeah. I think this is a great spot for us to move to our next discussion question, because we’ve started talking about this already, and so the next thing I want to dive into is we’ve talked about some of those challenges in collecting telemetry if you’re a mobile developer, right? You have to decide what you want to collect, you have to instrument it, it’s specific to your app, your business use case, all of that. And we’re here, we’re working in the SIGs, right? We’re working on Android and Swift to make it easier for developers, so let’s dive into what are some of the things we’re actively working on in terms of improving how developers can get the OpenTelemetry into their apps. So, what are some of the big things that the SIGs are working on to improve that mobile support?

And maybe we’ll start with Swift, because I know, Ari, you’d mentioned to me previously that one of the big things y’all worked on was, I think you called it, dividing the repository to reduce some of the, like, bundle size and needing to install so much software in order to collect telemetry. Can you dive into that a little bit?

Ari Demarco: Yeah, yeah. So, this was mostly… it’s a historic problem we have, like, just for context. As time went by, to download dependencies in the iOS ecosystem, well, the Apple ecosystem, we always had to do it manually, or through CocoaPods, or other package dependency managers, but now there’s an official one, which is Swift Package Manager.

It’s… kind of brand new. It’s not new, but it’s not, I think, as mature as I would want it to be. And one of the main problems we had is that the OpenTelemetry Swift repository is really big in terms of size.

So, even though… all the things that are bundled, you know, in OpenTelemetry are not bundled, finally, in the application, whenever you have to download it, or compile your application, run tests, run this in CI, build the application, and deploy that, all that takes a bunch of time, and obviously, for example, in terms of CI, minutes are money, so… for every single iOS developer, it was going to be a pain. And probably, maybe they just wanted to use the API, or just our implementation of the OpenTelemetry SDK.

And not all the other stuff, that probably is the heavy one.

So, one of the things we did is basically create a plan on how to divide this. It is something really, really particular, because it’s a super specific problem of OpenTelemetry Swift, because of the Swift ecosystem, and because of how Apple manages all the dependencies. So, what we end up doing is we keep the official OpenTelemetry Swift repository as our main repository. It has everything we need, and everybody needs to do anything related to all OTLP.

But at the same time, that repository depends on another one, that is the OpenTelemetry Swift Core. That only has the OpenTelemetry Swift API and OpenTelemetry Swift SDK, which are, like, the bare minimum for you to get started, to start creating your own traces, to start emitting logs. And using the already implemented tracer, or logger, that is basically what the OpenTelemetry SDK is.

So, in that way, you can… process data, export that data, and start instrumenting your application without all the other overhead that the rest of the repository adds. While at the same time, we are still all OTLP compliant, so anybody that wants the other use cases can still benefit from them. So… it’s a good balance between what the community was asking for and something that is also useful for the community and for ourselves.

And at the same time, we are still compliant, let’s say, with OpenTelemetry as a whole. So that’s… that’s… that was a really good move. I think it was a long talk. It’s been, like, a long time we had to discuss this. It adds a bunch of overhead in terms of maintaining, because now we are the same amount of people for more repositories, but all in all, it’s… it’s good for the community, for the mobile community.

Colin Contreary: Gotcha. And did anyone ask Nacho? Nacho, why did you build it so big? That we had to do all this work to break it up.

Nacho Bonafonte: Yeah, okay, yeah. There were a minimum number of features that OpenTelemetry had to have for a language to be official. And, for example, it had to support the OTLP protocol with protobuf, and that means that you have a dependency on a library from Apple that has a dependency on another library from Apple, and it has a dependency on another library, and another, and another, and another. So, at the end, when you need some feature, you need a special library, and that has lots of dependencies that are controlled by Apple. The package manager doesn’t allow, I mean, even if you don’t use them, the package manager decides that it downloads everything to your… to your laptop. You don’t build anything, but it downloads everything, and you cannot do anything to fix that. It’s something that SPM does, and… when I started, I must say, it was, like, 2019 or so. SPM had just started, and I thought, okay, yeah, they will fix this, right? Like, this is version 1.0. Who would want to have this, right? If I’m not using this dependency, why would I, like, download it? So, yeah, it was like, yeah, it will be fixed in time. No. It wasn’t.

Bryce Buchanan: Yeah, yeah, they didn’t. If I may chime in, yeah, so I think this also speaks back to the server-centricness of OpenTelemetry, where the core features require gRPC, which, I think is becoming more widely adopted, but it’s just a huge library, and you know… recently we added an HTTP OTLP library. I mean, like, in the last, like, 2 years, it’s not really that recently, but… you can’t only use that, because Swift Package Manager just downloads everything in the dependency list, whether you’re linking to it or not. So, I really wish that would get fixed. Please, Apple, fix it.

Hanson Ho: Send those radars.

Bryce Buchanan: Yeah, send those radars!

Colin Contreary: Nice! Well, thank y’all for that, so that was covering the Swift SIG. I’d love to kick it over to Jason. Can you talk about some of the work in the Android SIG that y’all are doing?

Jason Plumb: Yeah, to make it easier for developers and to, like, provide better support for OpenTelemetry from Android. Yeah, I classify it in, like, 3 main areas. I think first and, like, top of mind right now, first and foremost is, in, API stability. So, we have an initialization API for the agent. We’re working on stabilizing that in the next couple of releases.

We’re actively soliciting feedback from users who are trying it out, that want to provide, input to us before we mark it as stable. So that’s the first area. Second area is broadening our instrumentation. So what instrument… how deep and what we cover with instrumentation. And then documenting that, but also, especially enhancing our support for auto instrumentation, which we have build time auto-instrumentation support. I think we’ll start to see that rapidly expand as well. And then, the third thing, the third category, which is, I think, maybe just as important, are semantic conventions. So we are… with every bit of instrumentation, with every kind of new feature that we’re adding, we’re trying to mirror that in the semantic conventions, even if the first pass is in development or experimental, like, at least having that out there and documented, what it means, like, what the intent is when you see a piece of data marked with this name, what these attributes hang off of it mean.

So getting those defined and used in the Android SDK in a way that multiple vendors, multiple backends can expect a consistent naming. So the semantic conventions for mobile is also just a huge area of interest in… work for us, so… yeah, that’s my kind of three… three-pronged answer. Hanson, did I miss anything?

Hanson Ho: No, I think in terms of the Android SIG, that’s definitely, you know, those three main focuses. I mean, frankly, you know, they each deserve more time than we have time for, but we just do our best to kind of, like, cover, you know, the rest. And, you know, I want to kind of jump on the semantic convention piece, because things aren’t real until they’re official and defined, so the community understands and shares the same definitions. And semantic conventions is how we, you know, reify these ideas that we all share, you know, in abstract, in something concrete. So, in terms of my kind of focus, with both, you know, OTel Android and just, like, OTel in general, nailing the… notion of what a client data model ought to be, and how it differs from backend, what you do and what you use data for, I think, you know, we all have ideas, but we all need to write them down and make sure everybody agrees. And the thing is, we’re not all going to agree on everything, and… the nature of mobile is that there is no right answer. So… whatever we write down has to be inclusive of all the different opinions of, you know, where you want to start the session, whether the app is foreground, or just something that runs in the background, because, you know, if you rely too much on the app being in the foreground, well, what if you’re a podcast app? You know, then what do you do? So it has to, like, take care of cases like that, and a lot of it is non-trivial. So, coming to an understanding and an agreement, that’s gonna be, I think, the next step that we’re gonna get to.

And on the tech front, so, the OTel Android SDK is mostly an agent, and under the hood, we use the Java SDK and API to actually record telemetry. And for the most part, it’s okay. But Android developers, especially in 2025, really prefer Kotlin, in terms of idiomaticness, and also in terms of, usability. And with the advent of Kotlin Multiplatform, so the ability to basically develop in Kotlin and basically target both… well, not both, but iOS, Android, and web, you need something that’s purely Kotlin in order to, you know, record telemetry at that layer. So, at Embrace, we have submitted a donation proposal for a Kotlin API and SDK to OpenTelemetry, so hopefully we could, you know, build a SIG around that. So, whereas the Swift SDK, or the Swift SIG, both is… handles the agent and the underlying language SDK, we hope that the, eventual OTel Kotlin SDK will be the language part, and the, Android SDK will be the, agent part. And those two, hopefully, will work in harmony with the API, so that there could be other agents that are built alongside, the, OTel, official OTel Android agent, but does similar things, and offers similar services, like, you know, what’s my session? Hey, session is ending, what should I do? Like, APIs that are, like, an agent API, so to speak, so that, you know, there can be an agent specialized for background apps, an agent specialized for, car apps, and things like that. Because the diversity of runtime environments means that to help you know, set up you know, the SDK will require different opinions. And, I am a belief that the strongest part of OTel is the standard and the API, and the ability to build more APIs so that there could be different implementations doing the same thing with the same contracts. That’s where the future is, and I believe that is super key, as well as documentation of how you actually, you know, use this stuff. 
Sometimes it’s obvious to us, because we’ve worked on it. For others, it’s like, okay, what kind of URL do I pass into this exporter? It’s connecting the dots. We’re building the pieces, but we’ve got to connect them together. And I think not only the Android and Swift SIGs, but the entire OpenTelemetry community could do with an extra effort in how to get people ramped up.
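
Editor's sketch: the "agent API" idea Hanson describes, where different agents share one contract but hold different opinions about sessions, might look roughly like the Java below. Every name here (`MobileAgent`, `ForegroundAgent`, `BackgroundAwareAgent`, the event strings) is hypothetical and invented for illustration; none of it is part of OpenTelemetry or of the proposed Kotlin SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are part of OpenTelemetry.
public class AgentApiSketch {

    /** Shared contract: answers "what's my session?" and reacts to lifecycle changes. */
    public interface MobileAgent {
        String currentSessionId();
        void onAppForegrounded();
        void onAppBackgrounded();
        List<String> events(); // recorded events, exposed for inspection
    }

    /** Opinion 1: a session ends whenever the app leaves the foreground. */
    public static class ForegroundAgent implements MobileAgent {
        private final List<String> events = new ArrayList<>();
        private int counter = 0;
        private String sessionId = null;

        @Override public String currentSessionId() { return sessionId; }

        @Override public void onAppForegrounded() {
            if (sessionId == null) {
                sessionId = "fg-" + (++counter);
                events.add("session.start " + sessionId);
            }
        }

        @Override public void onAppBackgrounded() {
            if (sessionId != null) {
                events.add("session.end " + sessionId);
                sessionId = null;
            }
        }

        @Override public List<String> events() { return events; }
    }

    /** Opinion 2 (e.g. a podcast app): the session survives backgrounding. */
    public static class BackgroundAwareAgent implements MobileAgent {
        private final List<String> events = new ArrayList<>();
        private String sessionId = null;

        private void ensureSession() {
            if (sessionId == null) {
                sessionId = "bg-1";
                events.add("session.start " + sessionId);
            }
        }

        @Override public String currentSessionId() { return sessionId; }

        @Override public void onAppForegrounded() {
            ensureSession();
            events.add("app.foreground " + sessionId);
        }

        @Override public void onAppBackgrounded() {
            ensureSession();
            events.add("app.background " + sessionId);
        }

        @Override public List<String> events() { return events; }
    }

    /** Drives the same lifecycle through any agent: foreground, background, foreground. */
    public static List<String> simulate(MobileAgent agent) {
        agent.onAppForegrounded();
        agent.onAppBackgrounded();
        agent.onAppForegrounded();
        return agent.events();
    }

    public static void main(String[] args) {
        System.out.println(simulate(new ForegroundAgent()));
        System.out.println(simulate(new BackgroundAwareAgent()));
    }
}
```

The same lifecycle produces two sessions under the first agent and one continuous session under the second, which is the kind of difference in opinion a shared contract would need to allow.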

Colin Contreary: Nice! Well, I want to ask a question, because we actually have an audience question that’s about both the Android and Swift SIGs. I’d like to ask that, and depending on how long the answer is, we might have time for maybe one more, because we’re coming up on time. The question says, “OTel Android’s README lists many features, like crash reporting and ANR detection, that seem to provide more information than simply enabling traces, logs, and spans.” So, question one, how mature are these features? This will be a question for Jason or Hanson. And then the second part is, are the same features present in OTel Swift, even if they’re not listed in the README? So, maybe Jason, can you take a swing?

Jason Plumb: Yeah, I’ll… just because we’re time-limited, I’ll take a quick stab at it. Yeah, we have a lot of features that are not just basic SDK-level tracing, so we have a lot of instrumentation around the things that were described, like ANRs, crash reporting, and views, and other Android-specific components are also instrumented. We have network availability events, and there’s a lot of other telemetry that’s generated.

How mature are they? It depends on how you define mature, I suppose. This did come from a Splunk codebase originally. Most of that instrumentation is still used in production by Splunk customers, so at scale. It’s evolving, though, like, the community has taken it over and made changes to it, so… you know, is instrumentation going to be marked stable in 1.0 any time this year? Very unlikely. We’re focused on the APIs first. So, mature, you know, I’d say it is in production, it’s at scale. Whether OpenTelemetry considers it mature, probably not, you know, but we’re working on it with every release.

Hanson Ho: I will say, though, there are many companies that build on top of this, and those companies have paying customers that have apps in deployment, so if your sense of maturity is, are there real apps using it in volume, there are real apps using this in volume. Real companies. So…

Colin Contreary: Nice. And for our Swift SIG representatives, are there… obviously, probably not ANR detection, because that’s an Android-specific issue, but crash reporting, is that available in the Swift SDK?

Bryce Buchanan: Unfortunately, we don’t have crash reporting in the Swift SDK. Crash reporting is kind of a touchy subject on iOS, because there’s a lot that needs to go on behind the scenes to make it useful, and so we haven’t really prioritized that specifically. You need to be able to de-obfuscate all of the crashes, because what you get from the app in production is just a bunch of memory addresses. So it needs a dSYM and usually requires backend support to make that work, so we kind of leave that to vendors to figure out. If you’re interested in adding it to your app, or adding it on to an app you’re instrumenting with OpenTelemetry, you can take a look at the Elastic SDK, which implements a crash reporter. But again, even over at Elastic, we still haven’t come to a solution on how to de-obfuscate those.
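
Editor's sketch: to illustrate why a crash made of raw memory addresses needs build-time symbol data, here is a minimal Java example of the lookup step. It assumes a hypothetical symbol table that maps each function's start address to its name (the kind of information a dSYM carries) and assumes the addresses are already relative to where the binary was loaded. The class, method names, and addresses are invented for illustration; this is not how any particular vendor or the Elastic SDK does it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of address-to-symbol lookup; not a real symbolication tool.
public class SymbolicationSketch {
    // Function start address -> function name, kept sorted by address.
    private final TreeMap<Long, String> symbols = new TreeMap<>();

    public void addSymbol(long startAddress, String name) {
        symbols.put(startAddress, name);
    }

    /** Resolves a frame address to "name + offset", or "<unknown>" if it precedes every symbol. */
    public String symbolicate(long address) {
        // floorEntry finds the closest function that starts at or before the address.
        Map.Entry<Long, String> entry = symbols.floorEntry(address);
        if (entry == null) {
            return "<unknown>";
        }
        return entry.getValue() + " + " + (address - entry.getKey());
    }

    public static void main(String[] args) {
        SymbolicationSketch table = new SymbolicationSketch();
        table.addSymbol(0x1000L, "AppDelegate.launch");
        table.addSymbol(0x1400L, "CartView.checkout");
        System.out.println(table.symbolicate(0x1410L));
        System.out.println(table.symbolicate(0x0500L));
    }
}
```

Without the table, the crash report only carries the numbers, which is why this step usually happens on a backend that has access to the matching build's symbols.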

But we do have a number of other instrumentations, which are just things that we kind of see as useful bits of data that we pipe through the other signals, traces, logs, or what have you. So we’ve got, like, network instrumentation using URLSession. If you enable that, it’ll just automatically create spans for your network requests.

We just recently had a contribution of session events, so when your app launches and, you know, closes, you’ll get events for that automatically if you enable that instrumentation. There’s a couple of other ones you can take a look at. Although, we do need to do a lot better job of documenting those things, so I apologize for that.

Hanson Ho: So, if you’re interested in… if you’re a mobile app and you’re just interested in creating manual traces and logs, you don’t have to use OTel Android. You can just use OTel Java, and in the future, OTel Kotlin, and basically that’ll give you the pipeline to basically, you know, send the data and send it to your collector.

Now, there’s a bunch of work that you have to do on top of that, and this is why the OTel Android SIG and package exist, not only to give you out-of-the-box instrumentation, but also to simplify some of those workflows. But, again, you don’t need to use it. We just use the Java SDK and initialize the Java SDK on your behalf.

So, if you’re an Android app, and you’re just like, “I have just one trace I want to do, I don’t want all your crap,” well, go ahead, just use OTel Android, or OTel Java, or OTel Kotlin, and you can do that.
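
Editor's sketch: on the earlier question of what URL to pass into an exporter when sending data to a collector, the OTLP exporter specification has OTLP over HTTP use per-signal paths (`v1/traces`, `v1/metrics`, `v1/logs`) appended to a base endpoint, with 4318 as the default HTTP port. The helper below is a hypothetical illustration of that convention in plain Java; `OtlpEndpointSketch` and `httpEndpoint` are invented names, not part of any OpenTelemetry SDK. Check your exporter's documentation for whether it expects the base URL or the full per-signal URL.

```java
// Hypothetical helper illustrating the OTLP/HTTP per-signal URL convention.
public class OtlpEndpointSketch {

    public enum Signal {
        TRACES("v1/traces"), METRICS("v1/metrics"), LOGS("v1/logs");

        final String path;

        Signal(String path) {
            this.path = path;
        }
    }

    /** Appends the signal path to a base collector URL, tolerating a trailing slash. */
    public static String httpEndpoint(String baseUrl, Signal signal) {
        if (!baseUrl.startsWith("http://") && !baseUrl.startsWith("https://")) {
            throw new IllegalArgumentException("endpoint must start with http:// or https://");
        }
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + signal.path;
    }

    public static void main(String[] args) {
        // localhost:4318 is an assumed local collector address.
        System.out.println(httpEndpoint("http://localhost:4318", Signal.TRACES));
        System.out.println(httpEndpoint("http://localhost:4318/", Signal.LOGS));
    }
}
```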

Colin Contreary: Nice, okay. Well, we are coming up on time, and I don’t think we’re just gonna get cut off at the hour, but I probably should wrap up, and what I’m learning is, I think we need to schedule another one of these, because we didn’t get to a lot of the stuff we wanted to talk about. But for now, I think we’re gonna have to say the end of Chapter 1, and we’ll have to redo this again, maybe in a different season, Ari, and you won’t be wearing springtime gear. But let me do a quick wrap-up.

That’s all the time we have for questions. We had a few more we couldn’t get to. Hopefully we’ll do another one of these soon, and we can answer more of your questions.

That’s all the time we have for this awesome discussion. I want to give a big thank you to this wonderful panel. Thank you all, audience, for being here. Thank you for your questions.

I hope you learned a bit about the current state of OTel support for mobile, specifically Android and Swift, and how you can use OpenTelemetry in your mobile apps. You can use both Android and Swift today.

If you’d like to get involved in OpenTelemetry, there are many ways you can participate. You can provide feedback, so some of the people here have mentioned that. Jason, you mentioned they’re looking for feedback. You can join the discussions, they’re all public.

All the meeting notes are public, so you can also read them and chime in asynchronously. You can also become a contributor, just like everyone here. If you’re not sure where to start, you can go to the OpenTelemetry website, opentelemetry.io, and you can also join the Cloud Native Computing Foundation Slack community. I’d encourage you to go there. People are very friendly, and they can point you the right way.

And so, with all of that said, one more time, I want to say a big thanks to this awesome panel. Thank you for being here in attendance, and we will see you at the next one.

Hanson Ho: Go Jays!

Colin Contreary: Go Jays, that’s the best end there is. Alright, thank y’all for being here.

Ari Demarco: Bye-bye. Yeah, bye-bye.
