Riding the wave: OTel experts share observability tips

Advice on getting started with OpenTelemetry, the state of OTel support for mobile, understanding the true cost of telemetry collection and more.

This article was originally published on The New Stack.

Learning how to best use OpenTelemetry for observability can be a daunting task. While it’s a relatively new project, it’s growing fast and has already become the open standard for how to collect and export telemetry.

And there are so many ways to use OpenTelemetry, including:

  • Developers instrumenting their applications with OTel community SDKs
  • Library authors adding instrumentation to automatically emit OTel signals
  • Platform teams building telemetry pipelines with the OTel Collector
  • Observability vendors building OTel-compatible SDKs and OpenTelemetry Protocol (OTLP) endpoints to embrace data flexibility and reduce vendor lock-in

With so much activity going on in OpenTelemetry — it’s now the second-biggest project in the Cloud Native Computing Foundation (CNCF) — it’s no surprise that keeping up to date can sometimes feel like a full-time job.

I sat down with several OpenTelemetry experts for a fun, summer-themed panel discussion on driving observability success.

Getting started with OpenTelemetry

When organizations are considering investing in OpenTelemetry for better observability, it’s important to start small and to show value before diving headfirst into a full migration. Dyrmishi said one approach is to look at your current observability platform and find a telemetry signal that you have not paid attention to for a long time. In many cases, this is going to be tracing. You can then use OpenTelemetry to instrument your application and services with traces to show your company how good the data is and how easy the transition is from your existing technology to OpenTelemetry.

“It’s something that shouldn’t be rushed because it goes both ways. Some are skeptics, and some are like, ‘OpenTelemetry is the best thing that has ever happened! Let’s migrate to it immediately.’ And in the process, things are breaking, dashboards are not working and users are not happy,” Dyrmishi said.

Organizations might be quick to blame OpenTelemetry for not fixing all their problems when, in reality, they created an uphill battle by attempting too large an initial migration. Roll out slowly and learn the tools well to maximize your chance of a successful migration to OpenTelemetry.

Ho then chimed in, “So I think before you start with any lines of code or anything technical, you’ve got to align the organization, especially the incentives in each stakeholder.”

Having buy-in from leadership, even the C-suite, is not sufficient for a successful rollout, he said. Without agreement on which KPIs are most important to measure, it’s easy to end up with a bunch of data that doesn’t connect to anything useful in terms of prioritizing engineering work.

Educating developers about observability

The panelists agreed that it can be a challenge getting reliability teams and developers to speak the same language when it comes to observability.

As Kröhling shared, “We forget that developers, they don’t care about security. They don’t care about observability. I mean, sure, they care, but they don’t, right? […] We have to help them. And we don’t help them by teaching them PromQL. That’s not the point. The point is, do you know how long your users are waiting for an answer from your service and how much of that answer is caused by your downstream services? So that’s why they care. And I think if you have this mindset that, you know, devs are not observability engineers, I have to bring them answers with value, then you’re going to have a successful implementation of OpenTelemetry.”
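The question Kröhling poses can be answered directly from trace data. As a rough sketch in plain Python (not an OpenTelemetry API), suppose each request yields one server span plus the spans for its downstream calls. The `Span` type, its fields, the `downstream_share` helper and the sample timings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span: a name plus start and end times in ms."""
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def downstream_share(server: Span, downstream: list) -> float:
    """Fraction of the server span's duration spent waiting on downstream calls."""
    if server.duration_ms <= 0:
        return 0.0
    # Merge overlapping intervals so parallel calls are not double-counted.
    intervals = sorted((s.start_ms, s.end_ms) for s in downstream)
    waited = 0.0
    current_start, current_end = None, None
    for start, end in intervals:
        if current_end is None or start > current_end:
            if current_end is not None:
                waited += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        waited += current_end - current_start
    return waited / server.duration_ms


# Made-up timings for one request and its three downstream calls.
request = Span("GET /checkout", 0, 200)
calls = [
    Span("db.query", 10, 90),
    Span("cache.get", 50, 110),   # overlaps with db.query
    Span("payments.charge", 120, 170),
]
print(f"{downstream_share(request, calls):.0%} of the wait was downstream")
# prints "75% of the wait was downstream"
```

A single number like this is closer to the kind of answer a developer can act on than a query language lesson.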

Part of the challenge for developers adopting observability practices is the lack of good frontend support in OpenTelemetry. As Dyrmishi mentioned, “Can we get some more love for frontend and web observability? Yeah, of course, backend observability is very well established now. I mean, of course, there is still work in progress, but it feels like frontend is the forgotten child.”

After all, OpenTelemetry has its roots firmly in observability for backend systems, and it’s only recently that we’ve started seeing investment in better OpenTelemetry support for mobile and web. For example, a new Browser Special Interest Group (SIG) has just been formed to provide dedicated browser support for OpenTelemetry. You can learn more about what they’re working on at this live panel discussion tomorrow, July 31 at 1 p.m. ET/10 a.m. PT, hosted by Embrace.

The panelists received several questions asking about the state of OpenTelemetry support for mobile. Ho said that the Android and Swift SDKs work great for modeling performance traces, but there is active work being done to better support the needs of mobile developers.

“We’re really building momentum, I think, in the mobile space, trying to take something that was designed for the backend to monitor backend devices, app performance, to something that is more user-centric, user-focused, to link it to performance for client applications.”

He mentioned how Embrace is creating a Kotlin API for the tracing spec, with plans to build a Kotlin SDK as well. Currently, OTel only supports Java for instrumenting Android apps, which is challenging for developers who use Kotlin. For context, Google named Kotlin as the preferred language for Android app developers back in 2019, so having first-class Kotlin support is important for modern mobile teams, especially ones building cross-platform apps with frameworks like Kotlin Multiplatform.

Understanding the cost of telemetry collection

While much has been written about the increasing cost of observability, little attention gets paid to the cost of telemetry collection in terms of computing resources. The panelists discussed the importance of measuring the performance impact at the source of telemetry collection.

“So certainly people talk about costs a lot of times to think about storage and processing in the backend, but on mobile, the cost is also in the collection,” Ho said. “So even if you’re like, ‘Yeah, I’ve got some sampling going on, tail sampling …’ Well, anytime you collect data, it is costly on Android depending on where you do it. Not just Android too. [It can be for] iOS and other mobile devices just because they sometimes are 10 years old.”

Kröhling also pointed out that it’s just as important to profile your backend services to ensure you’re being purposeful in your instrumentation and only sending valuable telemetry. While the OTel Collector can process, filter and sample telemetry to reduce what’s ultimately sent to your observability platform, engineers should ensure they are not wasting resources on bad telemetry.

“I’d argue that the Collector is a stopgap solution,” said Kröhling. “You should definitely do tail sampling. You should definitely do PII cleanup at the Collector. But you are still collecting. You are still at SDK, at your application, you are still processing. You have processing cycles, creating that data, placing it in memory, queuing up and exporting that data somewhere. You have traffic between those two services. You have network. So go back there and clean up the data at the source.”
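One possible reading of "clean up the data at the source" is to make the keep-or-drop decision before any span data is built, and to strip sensitive attributes before they are ever recorded. The following is a minimal, standard-library-only sketch, not the OpenTelemetry SDK. `should_sample`, `scrub_attributes`, `record_request`, `SENSITIVE_KEYS` and the 10% ratio are hypothetical names and values chosen for illustration, and the ratio check is only loosely modeled on trace-ID ratio sampling.

```python
import secrets

# Hypothetical settings for illustration; tune them for your own service.
SAMPLE_RATIO = 0.10
SENSITIVE_KEYS = {"user.email", "user.phone", "auth.token"}

# Trace IDs are 128-bit; compare the low 64 bits against a fixed threshold.
_MAX_64 = 1 << 64
_THRESHOLD = int(SAMPLE_RATIO * _MAX_64)


def new_trace_id() -> int:
    """Return a random 128-bit trace ID."""
    return secrets.randbits(128)


def should_sample(trace_id: int, threshold: int = _THRESHOLD) -> bool:
    """Decide deterministically from the trace ID whether to record the trace."""
    return (trace_id & (_MAX_64 - 1)) < threshold


def scrub_attributes(attributes: dict) -> dict:
    """Drop sensitive keys before the attributes are stored or exported."""
    return {k: v for k, v in attributes.items() if k not in SENSITIVE_KEYS}


def record_request(trace_id: int, attributes: dict):
    """Build span data only for sampled traces; otherwise do no further work."""
    if not should_sample(trace_id):
        return None
    return {
        "trace_id": f"{trace_id:032x}",
        "attributes": scrub_attributes(attributes),
    }
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict, and unsampled requests never pay for building, queuing or exporting span data. Tail sampling and PII cleanup at the Collector can still run on whatever is sent.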

Teams that want guidance on improving the quality of the telemetry they collect can check out Instrumentation Score, a new open source specification for assessing instrumentation quality.

Weakly also shared a key suggestion for engineering teams that want to invest in better observability but struggle to justify it from a cost-versus-value standpoint.

“Nobody really complains about the cost of business analytics or business intelligence tooling. It costs a lot. You could complain about it a little bit, but it’s so directly correlated to the value that you get out of it, to the ability to understand what the business needs and how to go from there, that it’s a worthwhile investment.”

Instead of looking at observability as a cost center, teams should spend in the right places, so that the way they measure and observe their systems is connected to driving better business outcomes.
