This article was originally published on The New Stack. Part 1 and Part 2.
OpenTelemetry has been, in my opinion, one of the most engaging developments in the software community over the past few years. It’s proven incredibly valuable for instrumenting distributed systems, microservices, and complex architectures. Because of it, teams can understand their systems more effectively and share that understanding across their organizations.
With its rapid adoption, OpenTelemetry is becoming increasingly prevalent on the frontend as well. However, we run into a problem: It feels awkward to use, particularly in the browser.
This isn’t necessarily anyone’s fault. It’s a natural consequence of having so many different languages share a single API; something is bound to feel off. The OpenTelemetry spec does state that APIs should feel idiomatic to each language, yet the awkwardness persists. I’m not sure exactly why, but I suspect that when you combine the needs of every community with the lowest common denominator of language features, you inevitably end up with something that doesn’t feel quite natural in any given language.
That said, there’s a tremendous opportunity to build on top of this foundation and provide something that frontend developers would find more ergonomic. Several language ecosystems have already done this: Ruby, Go, and Java, for example, have fairly ergonomic OpenTelemetry integrations.
These ergonomic implementations have a few things in common: Language-specific features are used to build conveniences on top of the common API, and common control flow patterns fit naturally into the state machine that OpenTelemetry expects.
Some languages don’t have a single dominant control flow pattern (Haskell and Ruby, for example), but both are flexible enough to shape control flow in ways that keep their instrumentation libraries ergonomic despite that potential friction.
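As a rough illustration of what such a language-specific convenience can look like, the TypeScript sketch below models a span as a tiny state machine (started, then ended exactly once) and wraps it in a callback helper, so the callback’s own return, throw, or promise settlement decides when the span ends. `SketchSpan`, `withSpan`, and `finished` are hypothetical names, not the `@opentelemetry/api` interface; the pattern is only similar in spirit to Ruby’s block-based `in_span` helper.

```typescript
// Hypothetical, simplified span: NOT the @opentelemetry/api interface.
// It models only the state machine: started -> ended, exactly once.
type AttrValue = string | number | boolean;

class SketchSpan {
  readonly name: string;
  state: "started" | "ended" = "started";
  attributes: Record<string, AttrValue> = {};
  error?: string;

  constructor(name: string) {
    this.name = name;
  }

  setAttribute(key: string, value: AttrValue): void {
    if (this.state === "ended") throw new Error(`span ${this.name} already ended`);
    this.attributes[key] = value;
  }

  recordError(err: unknown): void {
    this.error = err instanceof Error ? err.message : String(err);
  }

  end(): void {
    if (this.state === "ended") throw new Error(`span ${this.name} ended twice`);
    this.state = "ended";
  }
}

// Finished spans, standing in for an exporter.
const finished: SketchSpan[] = [];

// Hypothetical convenience wrapper: the callback's own control flow
// (return, throw, or promise settlement) decides when the span ends,
// so callers never call end() themselves.
function withSpan<T>(name: string, fn: (span: SketchSpan) => T): T {
  const span = new SketchSpan(name);
  const finish = (): void => {
    span.end();
    finished.push(span);
  };
  const fail = (err: unknown): never => {
    span.recordError(err);
    finish();
    throw err;
  };

  let result: T;
  try {
    result = fn(span);
  } catch (err) {
    return fail(err); // synchronous throw
  }

  if (result instanceof Promise) {
    // Async callback: keep the span open until the promise settles.
    return result.then((value) => {
      finish();
      return value;
    }, fail) as unknown as T;
  }

  finish(); // synchronous return
  return result;
}
```

For comparison, OpenTelemetry JS’s `tracer.startActiveSpan` runs a callback with the span active but still leaves calling `span.end()` to you, which is exactly the kind of state-machine bookkeeping a wrapper like this hides.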
In fact, I’m going to make a bold claim: The heart of OpenTelemetry is context management, a concept that is intentionally separated from the rest of the spec so that context can be implemented in whatever way is most sensible for the runtime environment. Despite that intent, in practice we don’t seem to realize the benefits of this separation of concerns.
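To make the claim about context concrete, here is a simplified model of what “separating context from the rest of the spec” can mean in TypeScript. It is a sketch under stated assumptions, not the library: `Context`, `ContextManager`, `SyncStackContextManager`, `setContextManager`, `userKey`, and `currentUser` are hypothetical names. The shape loosely mirrors the idea that a context is an immutable map and that a manager activates one for the duration of a callback; instrumentation reads context only through the manager interface, so a runtime can swap the implementation without touching instrumentation code.

```typescript
// Hypothetical, simplified model of OpenTelemetry-style context.
// Not the @opentelemetry/api implementation.
type ContextKey = symbol;

// An immutable key/value map: setValue returns a new Context.
class Context {
  static readonly ROOT: Context = new Context(new Map());

  private readonly values: ReadonlyMap<ContextKey, unknown>;

  private constructor(values: ReadonlyMap<ContextKey, unknown>) {
    this.values = values;
  }

  getValue(key: ContextKey): unknown {
    return this.values.get(key);
  }

  setValue(key: ContextKey, value: unknown): Context {
    const next = new Map(this.values);
    next.set(key, value);
    return new Context(next);
  }
}

// All instrumentation needs: "what is active?" and
// "run this function with that context active".
interface ContextManager {
  active(): Context;
  with<T>(ctx: Context, fn: () => T): T;
}

// One possible implementation: a synchronous stack. Simple, but it does
// not follow work across async boundaries (timers, promises, events).
class SyncStackContextManager implements ContextManager {
  private stack: Context[] = [Context.ROOT];

  active(): Context {
    return this.stack[this.stack.length - 1];
  }

  with<T>(ctx: Context, fn: () => T): T {
    this.stack.push(ctx);
    try {
      return fn();
    } finally {
      this.stack.pop();
    }
  }
}

// The runtime chooses the manager; instrumentation code never does.
let manager: ContextManager = new SyncStackContextManager();
function setContextManager(next: ContextManager): void {
  manager = next;
}

// Instrumentation-side code: reads context only through the interface.
const userKey: ContextKey = Symbol("user.id");
function currentUser(): unknown {
  return manager.active().getValue(userKey);
}
```

Because `currentUser` only calls `manager.active()`, a browser build could install an async-aware manager through the hypothetical `setContextManager` without changing any instrumentation code. OpenTelemetry JS takes a broadly similar approach with pluggable context managers, though its real API differs in detail.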
If we want those benefits, and truly ergonomic telemetry instrumentation, we need to be able to separate the control flow that OpenTelemetry expects from the control flow that makes sense in our programs. If there’s one thing I’d love readers to take away from this article, it’s that we would benefit massively from disaggregating context management, data instrumentation, and control flow in our systems.
There’s a trade-off here, and it can be tricky to navigate. If you take the state machine behind OpenTelemetry’s desired control flow and push it explicitly into the libraries themselves, they can become extremely cumbersome to use. On the other hand, if you rely on propagating that control flow implicitly, you’ll run into problems whenever OpenTelemetry’s required control flow differs from your program’s natural control flow.
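The trade-off can be illustrated with a toy tracer. In the sketch below, all names (`implicitSpan`, `explicitSpan`, `defer`, `drain`) are hypothetical, and a simple task queue stands in for the browser event loop. The implicit style parents each span on whatever is current on a synchronous stack; the explicit style passes the parent along by hand. Work deferred past the end of its enclosing scope, which is routine in frontend code, loses its implicit parent, while the explicit version keeps it at the cost of threading `parent` through every call.

```typescript
// Hypothetical toy tracer: spans are just names with an optional parent.
interface SpanRecord {
  name: string;
  parent: string | undefined;
}
const spans: SpanRecord[] = [];

// Implicit style: the parent is whatever span is current on a
// synchronous stack at the moment the child starts.
const current: string[] = [];
function implicitSpan(name: string, fn: () => void): void {
  spans.push({ name, parent: current[current.length - 1] });
  current.push(name);
  try {
    fn();
  } finally {
    current.pop();
  }
}

// Explicit style: the caller passes the parent in, so it survives any
// control flow, at the cost of threading it through every signature.
function explicitSpan(name: string, parent: string | undefined, fn: (self: string) => void): void {
  spans.push({ name, parent });
  fn(name);
}

// Stand-in for the browser event loop (setTimeout, fetch callbacks, ...).
const taskQueue: Array<() => void> = [];
const defer = (task: () => void): void => {
  taskQueue.push(task);
};
const drain = (): void => {
  while (taskQueue.length > 0) taskQueue.shift()!();
};

implicitSpan("click", () => {
  implicitSpan("validate", () => {}); // parent: "click"
  defer(() => implicitSpan("save", () => {})); // runs after "click" has ended
});

explicitSpan("click-explicit", undefined, (self) => {
  defer(() => explicitSpan("save-explicit", self, () => {}));
});

drain(); // "save" ends up with no parent; "save-explicit" keeps its parent
```

Neither style is free: the implicit version is convenient until the program’s control flow leaves the synchronous stack, and the explicit version is robust but leaks tracing concerns into every signature. Async-aware context managers, such as the Zone.js-based one OpenTelemetry JS offers for browsers, try to keep the implicit ergonomics across those boundaries.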