Giulia Di Pietro
Dec 16, 2022
Topics
On my YouTube channel, I have recently launched a new series called “Observable Lightning Talks”, where I invite three observability experts to share their knowledge with us.
In my first episode, I invited Steve McGhee, Aolita Sharma, and Michael Hausenblas to share their remarkable stories and takeaways from their work.
I already summarized my talk with Steve McGhee in a previous blog post (Non-standard SLOs: beyond availability), and today I'll share the key takeaways from Aolita Sharma’s presentation.
Aolita Sharma is a Governance Committee Member on the OpenTelemetry board and co-chair of the CNCF observability technical advisory group. She has vast experience in cloud-native observability and shares with us why observability is important, where OpenTelemetry fits into the picture, and what themes are currently being discussed in the project.
Why observability?
Observability:
1. Enables active understanding of the behavior of a system. (You look at the end-to-end system and understand how it behaves.)
2. Tracks the dynamic states of a system. (You understand the different states a system goes through while an action is running, and whether each state is healthy or unhealthy.)
3. Takes into account uncertainty and variation in behavior.
Today, the capacity of computers and networks has grown exponentially. Their scale, diversity, computing power, commodity infrastructure, and speed vastly increase the volume and complexity of the telemetry data that can be collected, and they make advanced data analysis and visualization possible.
The whole pipeline of collecting, analyzing, and visualizing data is part of observability. Telemetry data is hugely important in understanding how a system functions. Metrics, traces, and logs are the data most commonly used to understand that behavior.
What should be observed?
If you want to do observability well, Aolita recommends making every aspect of a system observable: every layer, service, application, and component. This is crucial for understanding your system as a whole and for preventing failures. And when failures do happen, it lets you find the root causes quickly so you can prevent the same failures in the future.
Full stack for observability means:
Observability touches every single area, just like security. When we build new features and libraries for observability, they integrate with every area of the system to collect data, process it, and analyze it.
The more you understand your system, the deeper insight you have into how a system can perform well and deliver the quality of service you want.
Why does OpenTelemetry matter?
OpenTelemetry is an open source project and a system that collects and processes traces, metrics, and logs in an integrated way. It provides a baseline where the whole industry works together to define a specification to be implemented in the SDKs and APIs.
OpenTelemetry provides:
1. The OpenTelemetry observability specification
2. The OpenTelemetry Protocol (OTLP)
3. Client application instrumentation APIs and SDKs in 11 languages
4. A standalone collector, which functions as a collection agent
5. Exporters and samplers
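To make the "samplers" item more concrete, here is a minimal sketch in plain Python of ratio-based sampling, where the keep-or-drop decision depends only on the trace ID. It uses the standard library only and is not the OpenTelemetry SDK's implementation; the names `new_trace_id` and `should_sample` are hypothetical and chosen just for this illustration.

```python
import secrets

TRACE_ID_BITS = 128
LOWER_64_MASK = (1 << 64) - 1


def new_trace_id() -> int:
    """Return a random, non-zero 128-bit trace ID (hypothetical helper)."""
    trace_id = 0
    while trace_id == 0:
        trace_id = secrets.randbits(TRACE_ID_BITS)
    return trace_id


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep roughly `ratio` of all traces, deciding only from the trace ID.

    Because the decision is a pure function of the ID, every service that
    sees the same trace ID and ratio reaches the same decision.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Scale the ratio to a threshold in the 64-bit range and compare it
    # with the low 64 bits of the trace ID.
    threshold = int(ratio * (1 << 64))
    return (trace_id & LOWER_64_MASK) < threshold


if __name__ == "__main__":
    ids = [new_trace_id() for _ in range(10_000)]
    kept = sum(should_sample(t, 0.25) for t in ids)
    print(f"kept {kept} of {len(ids)} traces")
```

Deciding from the trace ID rather than from a fresh random number is a design choice: it avoids traces where one service recorded its spans and another dropped them.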
OpenTelemetry matters because it’s the future of observability.
OpenTelemetry is the second-largest project in the CNCF after Kubernetes. Thousands of engineers and observability experts have come together to make observability more standardized and to use open source to achieve it. Developing a standardized, open source observability framework is a very large effort.
OpenTelemetry provides the foundational infrastructure for the automation of future smart systems. It enables intelligent root cause analysis, self-healing, self-learning, and self-evolving systems. Some systems can already do this, but we're still moving towards systems that do it end to end. Observability provides the path to get there.
Several topics are currently being discussed within the project, all building toward the larger goal of in-depth observability that is out of the box, easy to use, and standardized:
1. The end-user workgroup
2. Web maintainer team expanding (documentation)
3. Improving the contributor experience
4. Increasing the footprint and supporting better client instrumentation (RUM telemetry)
5. Developing a control plane (OpAMP)
6. Profiles
7. Implementing logs
8. File-Based Configuration
9. Capturing telemetry from CI/CD systems (Jenkins, Argo, etc.)
10. Semantic conventions
11. Additional Kubernetes metadata
12. Standard query format for telemetry data
13. OpenTelemetry Demo Application
OpenTelemetry Demo Application
The OpenTelemetry Demo Application runs through a complete end-to-end distributed system instrumented with 100% OpenTelemetry traces and metrics.
You can try it out here on GitHub: https://github.com/open-telemetry/opentelemetry-demo
Or learn more: https://github.com/open-telemetry/opentelemetry-demo/tree/main/docs
It showcases how to:
1. Instrument your application for 10 language SDKs
2. Create custom spans for richer, more useful traces using auto instrumentation
3. Propagate trace context automatically and manually
4. Pass observability data attributes between services
5. Create attributes, events, and other telemetry metadata
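To give a feel for what propagating trace context manually involves, here is a minimal sketch in plain Python. It assumes the W3C Trace Context `traceparent` header (version `00`) as the wire format and uses only the standard library, not the OpenTelemetry SDK or the demo's code. `SpanContext`, `inject`, `extract`, and `child_of` are hypothetical names for this illustration.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# traceparent layout: version "00", 32-hex trace ID, 16-hex span ID, 2-hex flags
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the context from incoming headers; return None if absent or invalid."""
    match = _TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are not valid identifiers.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a child span: same trace ID, new span ID, inherited sampling flag."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


if __name__ == "__main__":
    # Service A starts a trace and calls service B.
    root = SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)
    outgoing: Dict[str, str] = {}
    inject(root, outgoing)

    # Service B continues the same trace from the incoming headers.
    incoming = extract(outgoing)
    if incoming is not None:
        span = child_of(incoming)
        print(span.trace_id == root.trace_id)
```

In practice, auto-instrumentation libraries do the inject and extract steps for you; writing them by hand is mainly useful for transports that have no instrumentation yet.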
Get involved and contribute
The project roadmap is prioritized through action: contribute to areas that are important to you.
The best way to get involved is to join a weekly SIG call or the CNCF Slack #OpenTelemetry channels.