This article was originally published on The New Stack.
As software systems grow larger, they become less comprehensible to any one engineer. Organizations expand, responsibilities narrow and domain knowledge becomes more diffuse. Any successful software organization, whether a medical information provider or a delivery service, will eventually begin to split responsibilities among engineers.
Then, once everything is in production, teams run into the unknown unknowns of deployed software: business risks, performance degradation and anything else that can only be learned from systems actively running in the world. Teams therefore need information to understand a system that has grown too large for any one person's comprehension or intuition to keep up with.
Increasingly, the solution to this problem has been observability. Observability is a loaded term, especially since its tools, practices and purpose are mostly prescribed by the vendors selling their version of it. Common to its many definitions, however, is the idea of investigating and making sense of unpredictable applications in production.