Giulia Di Pietro
Aug 18, 2022
On the Is It Observable blog and YouTube channel, we have already covered OpenTelemetry several times. However, until now, we hadn’t focused on observing Kubernetes clusters with OpenTelemetry, particularly with the OpenTelemetry Collector.
In today’s blog post, I'll start by introducing the OpenTelemetry Collector and the components from core and contrib. Then, I’ll give you an overview of the receivers and processors that are useful for observing a Kubernetes cluster. Finally, we will move on to the tutorial so you can try it out in practice.
What is the OpenTelemetry Collector?
OpenTelemetry is a standard that helps you create measurements from your application. It supports several observability signals, like traces, metrics, and logs.
OpenTelemetry is composed of two main components:
1. The instrumentation library
2. The Collector
The Collector helps you process measurements and forward them to your preferred observability backend. It’s not a mandatory component, but it may be required to process OpenTelemetry logs in the future. (Read more details about OpenTelemetry in our short guide.)
As already explained in our short guide to OpenTelemetry, the Collector can be deployed in several ways:
1. Agent: a Collector instance running on the same host as the application (sidecar container, DaemonSet, etc.)
2. Gateway: one or more Collector instances running as standalone services per cluster, datacenter, or region
It’s recommended to deploy it in agent mode to collect local measurements and use several Collectors to forward your measurements to your observability solutions.
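If you run the Collector on Kubernetes, the OpenTelemetry operator (used later in the tutorial) makes this agent-mode deployment easy to express. Below is a minimal sketch of a Collector deployed as a DaemonSet through the operator's OpenTelemetryCollector resource; the name, the gateway endpoint, and the pipeline content are placeholders to adapt to your own setup:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent                 # placeholder name
spec:
  mode: daemonset                  # one Collector pod per node (agent mode)
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      otlphttp:
        endpoint: "https://my-gateway-collector:4318"   # placeholder gateway endpoint
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp]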
The Collector helps you keep your code agnostic. You'll only import standard OpenTelemetry libraries in your code and do all the vendor transformation and export using the Collector.
The Collector requires you to build a pipeline for each signal (traces, metrics, logs, etc.).
Like an agent-based log collector, a Collector pipeline is a sequence of tasks: it starts with one or more receivers, continues with a sequence of processors, and ends with exporters that forward the measurements.
The OpenTelemetry Collector also provides extensions. They're generally used for implementing components that can be added to the Collector but don't require direct access to telemetry data.
Each pipeline step uses a component that comes either from the Collector core or from the contrib repository. In the end, you'll use the components shipped with the Collector release you deploy.
Every plugin supports one or more signals, so make sure that the one you’d like to use supports traces, metrics, or logs.
Designing a pipeline
Designing your pipeline is very simple. First, you need to declare your various receivers, processors, and exporters, as follows:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:55680
      http:
        endpoint: 0.0.0.0:55681
exporters:
  otlphttp:
    endpoint: "TENANTURL_TOREPLACE/api/v2/otlp"
    headers: {"Authorization": "Api-Token DT_API_TOKEN_TO_REPLACE"}
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
In this example, we only define one receiver, OTLP, and two exporters, OTLP/HTTP and logging.
Then, you need to define the actual flow of each signal pipeline:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [otlphttp, logging]
The core components of the OpenTelemetry Collector
The core distribution of the OpenTelemetry Collector includes a few plugins that help you build your pipeline. The contrib distribution includes everything in the core, plus many additional components.
Receiver
The core Collector has only one available receiver for incoming data: OTLP, the standard OpenTelemetry protocol, for traces, metrics, and logs.
To define your OTLP receiver, you declare:
receivers:
  otlp:
    protocols:
      grpc:
      http:
The Collector will bind a local port to listen for incoming data. The default port for gRPC is 4317, and the default port for HTTP is 4318.
The HTTP receiver also exposes CORS (Cross-Origin Resource Sharing) settings, where we can allowlist the origins allowed to send requests and the allowed headers.
receivers:
  otlp:
    protocols:
      http:
        endpoint: "localhost:4318"
        cors:
          allowed_origins:
            # Origins can have wildcards with *, use * by itself to match any origin.
            - https://*.example.com
          allowed_headers:
            - Example-Header
          max_age: 7200
Processor
The core Collector also includes a few processors that modify the data before exporting it: adding attributes, batching, dropping data, etc. No processors are enabled by default in your pipelines. Each processor supports all or only some data sources (traces, metrics, logs), and the order of the processors in a pipeline matters.
The community recommends the following processor order:
Traces:
1. Any sampling processors
2. Any processor relying on the sending source from Context (e.g. k8sattributes)
3. Any other processors
Metrics:
1. Any processor relying on the sending source from Context (e.g. k8sattributes)
2. Any other processors
The core Collector provides only two processors: the batch processor and the memory limiter processor.
Memory limiter processor
The memory limiter is a crucial component of every pipeline, whatever the data source. It's recommended to place it as one of the first steps after receiving the data.
The memory limiter controls how much memory the Collector uses, to avoid out-of-memory situations. It works with both a soft limit and a hard limit.
When the memory usage exceeds the soft limit, the Collector starts dropping data and returns an error to the previous step of the pipeline (normally the receiver).
If the memory usage goes above the hard limit, the processor forces a garbage collection to free some memory, and data keeps being dropped. Once the memory usage goes back below the soft limit, normal operation resumes: no more data is dropped and no forced GC is triggered.
The soft limit is derived from two other parameters (soft limit = hard limit - spike limit), which means you can't set it directly in the settings.
Some other important parameters are:
1. check_interval: (default value = 0, recommended value = 1s) the time between two measurements of the memory usage
2. limit_mib: the maximum amount of memory in MiB; this defines the hard limit
3. spike_limit_mib: (default value = 20% of limit_mib) the maximum spike expected between two measurements; the value must be less than the hard limit
4. limit_percentage: the hard limit defined as a percentage of the total available memory
5. spike_limit_percentage: the spike limit defined as a percentage of the total available memory
Example:
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 4000
    spike_limit_mib: 800
Batch processor
The batch processor supports all data sources and groups the data into batches. Batching is important because it allows better compression and reduces the number of outgoing connections needed to send the data.
As explained previously, the batch processor must be added to all your pipelines after any sampling processor.
The batch processor has several parameters:
1. send_batch_size: the number of spans, metric data points, or log records in a batch sent to the exporter (default value = 8192)
2. timeout: the time duration after which a batch is sent regardless of its size (default = 200ms)
3. send_batch_max_size: the upper limit of the batch size (0 means no upper limit)
Example:
processors:
  batch:
  batch/2:
    send_batch_size: 10000
    timeout: 10s
Exporters
There are three default exporters in the OpenTelemetry Collector: OTLP/HTTP, OTLP/gRPC, and Logging.
OTLP/HTTP
OTLP/HTTP has only one required parameter, endpoint, which is the target base URL to send your data to. For each signal, the Collector will append:
1. /v1/traces for traces
2. /v1/metrics for metrics
3. /v1/logs for logs
There are also some optional parameters:
1. traces_endpoint: use this setting if you want to customize the traces URL so the Collector doesn't append /v1/traces
2. metrics_endpoint
3. logs_endpoint
4. tls: holds all the TLS configuration, i.e.:
   tls:
     insecure: false
     ca_file: server.crt
     cert_file: client.crt
     key_file: client.key
     min_version: "1.1"
     max_version: "1.2"
5. timeout: (default 30s)
6. read_buffer_size
7. write_buffer_size
By default, the Collector compresses the data with gzip. To disable compression, set compression: none.
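Putting these options together, here is a minimal sketch of an otlphttp exporter configuration; the endpoint URL and the certificate file names are placeholders:
exporters:
  otlphttp:
    endpoint: "https://my-backend.example.com"   # placeholder; /v1/traces, /v1/metrics, /v1/logs are appended per signal
    timeout: 30s
    compression: none      # gzip is the default; set none to disable it
    tls:
      insecure: false
      ca_file: server.crt  # placeholder certificate file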
OTLP/gRPC
OTLP/gRPC has fewer parameters: endpoint and tls.
gRPC also compresses the content with gzip. If you want to disable it, you can add:
compression: none
The OTLP/gRPC exporter supports proxies through the environment variables (see the sketch after this list):
1. HTTP_PROXY
2. HTTPS_PROXY
3. NO_PROXY
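As a small sketch, an OTLP/gRPC exporter configuration could look like this; the endpoint is a placeholder:
exporters:
  otlp:
    endpoint: "my-backend.example.com:4317"   # placeholder gRPC endpoint
    compression: none                         # disable the default gzip compression
    tls:
      insecure: false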
Logging
Logging is an exporter that you'll probably use to debug your pipelines. It accepts a couple of parameters:
1. loglevel: (default = info)
2. sampling_initial: the number of messages initially logged each second
3. sampling_thereafter: the sampling rate applied after the initial messages (as used in the earlier example)
Extensions
The core Collector provides two extensions: memory_ballast and zpages.
Like the memory_limiter processor, memory_ballast is also a recommended extension for controlling the memory consumption of the Collector.
Memory_ballast
A memory ballast optimizes the number of garbage collections (GC) triggered by your system. It increases the heap size to reduce the number of GC cycles, and by reducing the GC cycles, we directly reduce the CPU usage related to the GC.
The memory_ballast can be configured with:
1. size_mib: the ballast size in MiB
2. size_in_percentage: the ballast size as a percentage of the total memory available
If we're using memory limits in Docker or Kubernetes, then the ballast size would be:
memory_limit * size_in_percentage / 100
For example:
extensions:
  memory_ballast:
    size_in_percentage: 20
Zpages
zPages is a format introduced by OpenCensus. This extension creates HTTP endpoints that provide live data for debugging the different components of the Collector.
zPages has one parameter, endpoint (default = localhost:55679), and provides different routes with different details (a configuration sketch follows the list):
1. ServiceZ (http://localhost:55679/debug/servicez) gives an overview of the Collector services
2. PipelineZ (http://localhost:55679/debug/pipelinez) provides details on the running pipelines defined in the Collector
3. ExtensionZ provides details on the extensions
4. FeatureZ shows the list of features available
5. TraceZ shows the number of spans per latency bucket (0us, 10us, 100us, 1ms, 10ms, 100ms, 1s, 10s, 1m)
6. And last, RpcZ shows statistics on remote procedure calls
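Enabling zPages only requires declaring the extension and listing it in the service section; in this sketch the endpoint is the default value:
extensions:
  zpages:
    endpoint: localhost:55679
service:
  extensions: [zpages]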
The components from the contrib repo
The OpenTelemetry Collector Contrib provides many plugins, but it wouldn’t be useful to describe all of them. Here, we will only look at the receivers, the exporters, and the processors.
Receiver
Receivers are crucial because they can connect to a third-party solution and collect measurements from it.
We can roughly separate the receivers into two categories: listening mode and polling mode. You can find a complete list in the OpenTelemetry-collector-contrib repository.
Here are some examples of receivers:
Most receivers for traces are in listening mode, for example:
1. AWS X-Ray
2. Google PubSub
3. Jaeger
4. Etc.
For metrics, there are fewer listening plugins. Here are a few of them:
1. CollectD
2. Expvar
3. Kafka
4. OpenCensus
5. Etc.
On the other hand, a larger number of receivers work in polling mode, like:
1. Apache
2. AWS Container Insights
3. Cloud Foundry
4. CouchDB
5. Elasticsearch
6. flinkmetrics
7. Etc.
From this list, we could be interested in all the database, web server, message broker, and OS receivers, but also in kubelet, k8scluster, Prometheus, and hostmetrics.
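The Prometheus receiver, for instance, works in polling mode: it embeds a standard Prometheus scrape configuration and scrapes the targets itself. A minimal sketch, where the job name and target are placeholders:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "my-app"               # placeholder job name
          scrape_interval: 30s
          static_configs:
            - targets: ["my-app:9090"]     # placeholder target exposing /metrics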
In terms of logs, there are fewer receivers available, mainly acting in listening mode:
1. fluentforward
2. Google PubSub
3. journald
4. Kafka
5. SignalFx
6. splunkHEC
7. tcplog
8. udplog
And fewer acting in polling mode:
1. filelog (see the sketch after this list)
2. k8sevent
3. windowsevent
4. MongoDB Atlas
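The filelog receiver is a typical polling-mode log receiver: it tails files on disk and turns each line into a log record. A minimal sketch, assuming an application writing to /var/log/myapp/ (a placeholder path):
receivers:
  filelog:
    include: [/var/log/myapp/*.log]   # placeholder path to tail
    start_at: beginning               # also read content that existed before the Collector started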
Processors
The Collector contrib provides various processors that help you modify the structure of the data: attributes (adding, updating, or deleting attributes), k8sattributes, resource detection, and resource (updating/deleting resource attributes). All those processors support all existing signals.
Then you have specific processors for traces that help you adjust the sampling decisions (probabilistic sampling or tail sampling) or group your traces (groupbytrace). For metrics, two specific processors help you transform data points: cumulativetodelta and deltatorate. Another interesting processor, spanmetrics, exposes observability metrics derived from your spans and deserves to be tested, as it can help you count spans per latency bucket.
In the end, you'll probably use these processors frequently:
1. transform
2. attributes (see the sketch after this list)
3. k8sattributes
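As an illustration of the attributes processor, the sketch below inserts a static attribute on every span, metric, and log going through the pipeline; the key and value are placeholders:
processors:
  attributes:
    actions:
      - key: deployment.environment   # placeholder attribute key
        value: production             # placeholder attribute value
        action: insert                # only adds the attribute if it's not already present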
Extensions
The Collector contrib provides more extensions to the Collector. We can split them into various categories:
Extensions handling the authentication mechanism for the receivers or exporters:
1. asapauth
2. basicauth
3. bearertokenauth
4. oauth2client
5. oidcauth
6. sigv4auth
Extensions for operations:
1. health_check: provides an HTTP endpoint that can be used with Kubernetes liveness or readiness probes
2. pprof: to generate profiling data from the Collector
3. storage: to store the state of the data in a database or a file
Extensions for sampling:
1. jaegerremotesampling
And then you have a very powerful extension: the observer.
An observer helps you discover networked endpoints like Kubernetes pods, Docker containers, or local listening ports. Other components can subscribe to an observer instance to be notified of endpoints coming and going. A few receivers can use observers to adjust how data is collected based on the discovered endpoints.
How to observe k8s using the Collector contrib
With all those components, the big question is: can we observe our k8s cluster using only the Collector?
One option would be to use Prometheus exporters and the Prometheus receiver to scrape the metrics directly from them. But we don't want to use Prometheus exporters in our case; instead, we will try to find a way to use the Collector components designed for Kubernetes.
Receivers collecting metrics:
1. k8scluster
2. kubeletstats
3. hostmetrics
Receivers collecting logs:
1. Kubernetes events
A few processors:
1. memory_limiter and batch
2. k8sattributes
3. transform
Let’s first look at the various receivers we're going to use.
The receivers
k8scluster
The k8scluster receiver collects cluster-level metrics from the Kubernetes API. This receiver requires specific rights to collect data, and it provides different ways to handle authentication:
1. A service account (the default mode), which requires creating a service account with a ClusterRole that can get and list most of the Kubernetes objects of the cluster
2. A kubeconfig, i.e. mounting the kubeconfig file so the receiver can interact with the Kubernetes API
The receiver has several optional parameters:
1. collection_interval
2. node_conditions_to_report
3. distribution: openshift or kubernetes
4. allocatable_types_to_report: to specify the types of resources we're interested in (cpu, memory, ephemeral-storage, storage)
This plugin generates data with resource attributes, so if your observability solution doesn't support resource attributes on metrics, make sure to convert the resource attributes into labels.
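Putting those parameters together, a k8scluster receiver configuration could look like the following sketch (the configuration key is k8s_cluster in the contrib distribution; the interval and reported types are only examples):
receivers:
  k8s_cluster:
    auth_type: serviceAccount          # or kubeConfig
    collection_interval: 30s           # example interval
    distribution: kubernetes           # use openshift on OpenShift clusters
    node_conditions_to_report: [Ready, MemoryPressure]
    allocatable_types_to_report: [cpu, memory]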
Kubelet Stats receiver
The Kubelet Stats receiver interacts with the kubelet API exposed on each node. To interact with the kubelet, you'll need to handle the authentication using TLS settings or with the help of a service account.
If you use a service account, give it the appropriate rights to get, list, and watch the relevant Kubernetes objects.
Because this receiver requires the node’s information, you can utilize an environment variable in the Collector to specify the node:
apiVersion: apps/v1
kind: Deployment
…
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
And then
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "serviceAccount"
    endpoint: "https://${K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
Or we could also combine it with the powerful observer extension. For example:
extensions:
  k8s_observer:
    auth_type: serviceAccount
    node: ${K8S_NODE_NAME}
    observe_pods: true
    observe_nodes: true
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      kubeletstats:
        rule: type == "k8s.node"
        config:
          auth_type: serviceAccount
          collection_interval: 10s
          endpoint: "`endpoint`:`kubelet_endpoint_port`"
          extra_metadata_labels:
            - container.id
          metric_groups:
            - container
            - pod
            - node
In this example, the endpoint and kubelet_endpoint_port will be provided by the observer.
Then we could add extra metadata using the extra_metadata_labels parameter and use metric_groups to specify which groups of metrics should be collected. By default, it collects metrics from containers, pods, and nodes, but you can also add volume.
To get more details on the usage of the nodes at the host level, you could also use the Hostmetrics receiver.
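The hostmetrics receiver is host-centric rather than Kubernetes-centric, so it's typically enabled on a Collector running as a DaemonSet. A minimal sketch, where the set of scrapers is just an example:
receivers:
  hostmetrics:
    collection_interval: 30s    # example interval
    scrapers:                   # enable only the scrapers you need
      cpu:
      memory:
      disk:
      filesystem:
      network: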
Kubernetes event receiver
The Kubernetes event receiver collects events from the Kubernetes API and turns them into OpenTelemetry log data. As with the previous receivers, you'll need specific rights and can authenticate using a kubeconfig or a service account.
This receiver also offers the ability to filter the events to specific namespaces with the namespaces parameter. By default, it watches all namespaces.
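A minimal sketch of this receiver (the configuration key is k8s_events in the contrib distribution; the namespace list is a placeholder):
receivers:
  k8s_events:
    auth_type: serviceAccount
    namespaces: [default, my-app]   # placeholder namespaces; omit the parameter to watch all of them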
The processors
Now, let’s look at the processors we’re going to use.
k8sattributes
The k8sattributes processor is used to add extra labels to our Kubernetes data.
This processor collects information by interacting with the Kubernetes API. Therefore, similarly to the previous receivers, you'll need to authenticate using a service account or a kubeconfig.
You can specify the extra metadata you want to attach using the extract block.
For example:
k8sattributes:
  auth_type: "serviceAccount"
  passthrough: false
  filter:
    node_from_env_var: K8S_NODE_NAME
  extract:
    metadata:
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.deployment.name
      - k8s.namespace.name
      - k8s.node.name
      - k8s.pod.start_time
MetricTransform
The MetricTransform processor helps you to:
1. Rename metrics
2. Add labels
3. Rename labels
4. Delete data points
5. And more
In our case, we'll use this processor to add an extra label with the cluster ID and name to all the reported metrics, as shown in the sketch below. This label is crucial to help us filter and split data coming from several clusters.
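A sketch of how this could be done with the metricstransform processor, adding a k8s.cluster.name label to every metric; the label key and the cluster name value are placeholders:
processors:
  metricstransform:
    transforms:
      - include: .*                       # apply to every metric
        match_type: regexp
        action: update
        operations:
          - action: add_label
            new_label: k8s.cluster.name   # placeholder label key
            new_value: my-cluster         # placeholder cluster name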
Last, we won't describe the exporters here, but you'll need one exporter for the produced metrics and one for the generated logs.
Tutorial
This tutorial will show you how to use the OpenTelemetry operator to observe your Kubernetes cluster. It’s a perfect exercise to use the various receivers and processors explained previously for k8s.
For this tutorial, we will need several things:
1. A Kubernetes cluster
2. The Nginx ingress controller to expose our demo app and Grafana
3. The OpenTelemetry operator
4. The Prometheus operator (without the default exporters)
5. Loki to forward our events to
6. Prometheus to store the collected metrics
7. Grafana to build a quick dashboard
We will build the right metric and log pipelines to export the metrics to the Prometheus remote write endpoint and the logs to Loki.
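As a preview of what the tutorial builds, here is a rough sketch of such a service configuration; the exporter endpoints are placeholders, and depending on your Collector version the Loki exporter may need additional label-mapping configuration:
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus-server.default.svc:9090/api/v1/write   # placeholder remote write URL
  loki:
    endpoint: http://loki.default.svc:3100/loki/api/v1/push            # placeholder Loki push URL
service:
  pipelines:
    metrics:
      receivers: [k8s_cluster, kubeletstats, hostmetrics]
      processors: [memory_limiter, k8sattributes, metricstransform, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [k8s_events]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [loki]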
Watch the full video tutorial on YouTube here: OpenTelemetry Collector - How to observe K8s using OpenTelemetry
Or go directly to GitHub: How to observe your K8s cluster using OpenTelemetry