Linkerd is one of the most popular service mesh tools. In this blog post, I will describe its architecture and how to set up observability with OpenTelemetry.
In a previous Is It Observable? episode and blog post, I introduced the concept of a service mesh and how you can use it to improve the architecture of your microservices. That tutorial used a service mesh called Istio, but there are several other tools you can use instead. That’s why today we will focus on another service mesh: Linkerd.
Here’s what you can expect from today’s blog post:
1. A quick recap about service mesh
2. An introduction to Linkerd and its architecture
3. The service profile CRD
4. The observability features
5. Tutorial
The YouTube video will also include an interview with Jason Morgan, Technical Evangelist from Buoyant. Watch it to learn more about the status of the project!
In short, a service mesh handles the communication between services by providing features like retry logic, TLS, ingress and egress, observability, etc.
With the help of a service mesh, you can focus on building the code for your application without having to include code that manages how the app behaves in the network.
Linkerd’s control plane combines several components:
1. The destination service. This component is used by the data plane, and it provides all the destination rules: which requests, routes, retries, etc. are allowed. In the end, the destination service is the core component that manages the communication of our services through the linkerd-proxy.
2. The identity service. This component acts as the TLS certificate authority and provides signed certificates to the various proxies, thus guaranteeing secure communication between proxies.
3. The proxy injector. This component is registered with the Kubernetes admission controller. Every time a pod is created, the admission controller calls the proxy injector, which inspects the pod definition and its annotations. If the inject annotation exists, the injector modifies the workload by adding the proxy-init init container and the linkerd-proxy sidecar to the pod.
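For example, meshing every workload in a namespace is just a matter of adding the inject annotation. Here is a minimal sketch, reusing the hipster-shop namespace from the demo application used later in this post:

apiVersion: v1
kind: Namespace
metadata:
  name: hipster-shop
  annotations:
    linkerd.io/inject: enabled   # tells the proxy injector to add the sidecar to every pod created in this namespace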
Linkerd’s data plane relies on the Linkerd proxy and an init container. The linkerd-proxy manages the communication, provides Prometheus metrics, handles TLS, and more. The init container runs before any other container in the pod and rewrites the pod’s iptables rules, forcing all traffic to be routed through the linkerd-proxy.
Like Istio, Linkerd provides a CLI that allows you to interact with the control plane. It helps you install the control plane, inject the Linkerd proxy into your workloads, check the installation, and more. You can also deploy Linkerd using Helm.
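A typical CLI-based installation looks like the following sketch (depending on your Linkerd version, you may also need to install the CRDs first with linkerd install --crds):

# validate that the cluster is ready for Linkerd
linkerd check --pre

# install the control plane
linkerd install | kubectl apply -f -

# verify the installation
linkerd check

# add the proxy to an existing deployment manifest
cat deployment.yaml | linkerd inject - | kubectl apply -f -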
Linkerd also provides new CRDs in your K8s cluster:
1. Server
2. ServerAuthorization
3. ServiceProfile
Server and ServerAuthorization are required if you want to authorize specific incoming traffic to a specific service. In that case, you'll need to define a Server mapping your service and a ServerAuthorization that specifies the clients allowed to connect to it.
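Here is a minimal sketch of what that could look like for the frontend of the hipster-shop demo application (the apiVersion may differ depending on your Linkerd version, and the pod label, port name, and client service account are assumptions used purely for illustration):

apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: hipster-shop
  name: frontend-http
spec:
  podSelector:
    matchLabels:
      app: frontend          # assumed label of the frontend pods
  port: http                 # assumed name of the frontend container port
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: hipster-shop
  name: frontend-http
spec:
  server:
    name: frontend-http
  client:
    meshTLS:
      serviceAccounts:
        - name: loadgenerator   # assumed client allowed to reach the frontend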
Let’s take the frontend of the hipster-shop demo application as an example. An HTTP GET on <host>/products/\w+ corresponds to all the URLs related to a product page, an HTTP POST on <host>/setCurrency (with the currency code passed as form data) changes the currency, and an HTTP GET on <host>/cart displays the cart page.
We can build a ServiceProfile with rules for the product pages, the currency change, and the cart page.
By creating those routes, Linkerd will provide specific metrics for each defined route on top of the service profile, plus the ability to define retry logic and timeout logic per route.
The following example shows the service profile:
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  creationTimestamp: null
  name: frontend.hipster-shop.svc.cluster.local
  namespace: hipster-shop
spec:
  routes:
  - condition:
      method: GET
      pathRegex: /
    name: GET /
  - condition:
      method: GET
      pathRegex: /carts
    name: GET /carts
  - condition:
      method: GET
      pathRegex: /products/[^/]*
    name: GET /products/{id}
  - condition:
      method: POST
      pathRegex: /setcurrency
    name: POST /setcurrency
  - condition:
      method: POST
      pathRegex: /carts
    name: POST /carts
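You don’t have to write this file entirely by hand: the Linkerd CLI can generate a ServiceProfile for you (a sketch; the exact flags may vary between Linkerd versions):

# generate an empty ServiceProfile template for the frontend service
linkerd profile --template frontend -n hipster-shop > frontend-profile.yaml

# or build one from an OpenAPI/Swagger definition if the service exposes one
linkerd profile --open-api swagger.json frontend -n hipster-shop | kubectl apply -f -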
The service profile exposes extra metrics through the viz extension, which I'll present later.
With the help of the service profile, we can also define retry logic for a given route:
  - condition:
      method: GET
      pathRegex: /products/[^/]*
    name: GET /products/{id}
    isRetryable: true
Linkerd applies a default retry budget; if you want to customize it, you can add a retryBudget section to the service profile definition:
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
And the timeout, which is also defined per route:
  - condition:
      method: GET
      pathRegex: /products/[^/]*
    name: GET /products/{id}
    timeout: 300ms
Each defined route provides extra metrics in the Prometheus exporter. On top of the route metrics, the linkerd-proxy exposes TCP-level metrics:
- tcp_open_connections (number of connections currently open)
- tcp_write_bytes_total and tcp_read_bytes_total
- tcp_connection_duration_ms
as well as identity metrics to report on the TLS identity certificates:
- identity_cert_expiration_timestamp_seconds
- identity_cert_refresh_count
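These exporter metrics alone are already enough to build useful queries or alerts; for example, in PromQL (a sketch based on the metric names listed above):

# total number of TCP connections currently open across the mesh
sum(tcp_open_connections)

# proxy TLS certificates expiring within the next 24 hours
(identity_cert_expiration_timestamp_seconds - time()) < 24 * 3600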
In addition, Linkerd provides extra metrics through the viz extension, which comes with its own Prometheus instance, Grafana dashboards, and a web interface to drill down into Linkerd’s metrics.
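Installing the extension and opening the dashboard is done through the CLI (a sketch of the usual commands):

# install the viz extension
linkerd viz install | kubectl apply -f -

# open the web dashboard
linkerd viz dashboard

# show live, per-route metrics once a service profile exists
linkerd viz routes deploy/frontend -n hipster-shop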
If you don’t want to use the Prometheus instance provided by Linkerd, there is a process to deploy the viz extension without it: you specify in the Helm chart that you don’t need Grafana and Prometheus, and you point the viz extension at your own Prometheus server. That Prometheus instance then needs additional scrape configurations to collect the Linkerd metrics. You can also keep the Prometheus instance provided by Linkerd and rely on one of the features provided by Prometheus: federation, which lets one Prometheus instance collect the data of another server. Proxies and control-plane components automatically expose Prometheus exporters, so we can easily ingest those metrics into Prometheus and other solutions like Dynatrace.
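Here is a sketch of what such a federation scrape job could look like in your own Prometheus configuration (the linkerd-viz namespace and the job matchers are assumptions based on a default viz installation):

scrape_configs:
  - job_name: 'linkerd-federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="linkerd-proxy"}'
        - '{job="linkerd-controller"}'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['linkerd-viz']
    relabel_configs:
      # keep only the Prometheus pod shipped with the viz extension
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep
        regex: prometheus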
Linkerd provides support for distributed tracing, mainly compatible with OpenCensus (the predecessor of OpenTelemetry).
OpenCensus propagates traces using a specific tracing propagator: B3.
If you want to enable the tracing feature of Linkerd, you'll need to check that your instrumentation library uses the B3 propagator and that your observability backend supports B3 trace contexts. Then, you need to install the Jaeger extension, which deploys the Jaeger backend, the Jaeger injector, and an OpenTelemetry collector. If you already have Jaeger and the Collector installed, you can remove them from the Helm deployment file.
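The extension is installed the same way as the other ones (a sketch):

# install the Linkerd Jaeger extension
linkerd jaeger install | kubectl apply -f -

# verify the extension
linkerd jaeger check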
If you're generating traces with OpenTelemetry, you'll need to create a Collector pipeline that receives both OpenTelemetry and OpenCensus spans. Of course, the spans need B3 trace contexts so they can all be mapped together. The Jaeger injector is a core component that injects the right environment variables, in particular the Collector service URL, into every Linkerd proxy.
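Such a Collector pipeline could look like the following sketch (it assumes a Collector distribution that ships the OpenCensus receiver and the Jaeger exporter; the Jaeger service name and port are assumptions):

receivers:
  otlp:
    protocols:
      grpc:
  opencensus:
processors:
  batch:
exporters:
  jaeger:
    endpoint: jaeger.linkerd-jaeger:14250   # assumed Jaeger gRPC endpoint
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp, opencensus]
      processors: [batch]
      exporters: [jaeger]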
You'll need to restart your pods for the tracing configuration to be applied to all of them.
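For example, with the hipster-shop namespace used earlier:

# restart the deployments so the proxies pick up the tracing environment variables
kubectl rollout restart deployment -n hipster-shop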
By enabling tracing on Linkerd, you'll get the latency of each communication going through the Linkerd proxies.