Giulia Di Pietro
Nov 10, 2022
This blog and YouTube Channel have already covered many topics related to Kubernetes and Service mesh. Today, we will focus on a Service mesh platform that manages your network using eBPF technology: Cilium.
Before I introduce Cilium, I'll briefly describe how networks are managed in Kubernetes. Then I'll explain how Cilium works, how it supports service meshes, what Hubble is, and how it works together with Cilium. Finally, I'll wrap it all up with a tutorial.
How networks are managed in Kubernetes
Our vanilla Kubernetes cluster provides a networking layer to allow us to expose our pods with the help of services. Referring to a service instead of a pod makes more sense because pods are ephemeral and can be destroyed.
As you probably already know, when you deploy a service, you specify a selector that matches all the pods backing that service. Once the service is created, Kubernetes also creates an Endpoints object that maps the various pods of your service to their IP addresses. Endpoints are important because kube-proxy uses this information to create the right networking rules.
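To make this concrete, here is a minimal sketch of a service definition; the name, labels, and ports are only placeholders:

apiVersion: v1
kind: Service
metadata:
  name: checkout            # hypothetical service name
spec:
  selector:
    app: checkout           # matches every pod carrying this label
  ports:
    - port: 80              # port exposed by the service
      targetPort: 8080      # port the pods actually listen on

Once it's applied, running kubectl get endpoints checkout shows the IP addresses of the pods matched by the selector.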
Kube-proxy is a crucial component running on each node of your cluster, routing the traffic to the right pods and IP addresses. It creates iptables rules per service: each rule takes the service IP and port and redirects the traffic to one of the pods' IPs. For each endpoint, kube-proxy installs iptables rules that help select the right pod, and it relies on the readiness probe to determine which backing pods are ready to serve.
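If you're curious, you can look at these rules yourself. The exact output depends on your distribution and kube-proxy mode, but on a node running kube-proxy in iptables mode, something like this lists the per-service chains:

# run on a cluster node (illustrative; KUBE-SERVICES is the chain kube-proxy maintains)
sudo iptables -t nat -L KUBE-SERVICES -n | head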
How does Kubernetes resolve the names of our services into the various iptables rules created by kube-proxy? It uses the cluster DNS (kube-dns, most likely CoreDNS), which holds one DNS record per service.
DNS resolves the service name to the service IP, and kube-proxy's rules then redirect that IP to the right pod.
With this mechanism, any pod in your cluster can access any service, unless we filter the traffic by creating NetworkPolicy rules.
The standard NetworkPolicy only allows you to apply rules on IPs and ports. You can lock or unlock access to a given service, but you can't restrict access to specific HTTP endpoints.
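As a point of comparison, here is a sketch of a standard NetworkPolicy (names and labels are hypothetical): it can restrict traffic to a pod selector and a TCP port, but there is no place to express an HTTP method or path:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only      # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: checkout              # hypothetical label of the protected pods
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # only pods with this label may connect
      ports:
        - protocol: TCP
          port: 8080             # and only on this port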
As explained, Kubernetes networking relies heavily on iptables. Iptables is a core technology for managing the networking of your services, but it has an impact once you have a large number of services deployed in your cluster (meaning lots and lots of iptables rules on each node). From that point on, iptables could become one of the bottlenecks of your cluster. So how can you resolve this? One option is to rely on eBPF.
If you're new to eBPF, watch my previous episode on this technology on YouTube or read my blog post about it.
What is Cilium?
Cilium is an open source project that provides networking, security, and observability in your cloud-native environment, like Kubernetes. With the help of eBPF, Cilium can inject network and security policies without changing your application code.
You can use the Cilium CLI or the helm chart to deploy Cilium.
Deploying Cilium requires you to add the right taints to your nodes to force any workload to wait for the Cilium agent:
node-taints node.cilium.io/agent-not-ready=true:NoExecute
And to disable the default Container Network Interface (CNI).
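As a rough sketch (the exact flags depend on your cloud provider and the Cilium version), the installation usually looks like this:

# taint the nodes so workloads wait for the Cilium agent
# (managed clusters usually expose this as a node-pool option)
kubectl taint nodes <node-name> node.cilium.io/agent-not-ready=true:NoExecute

# install with the Cilium CLI...
cilium install

# ...or with the Helm chart
helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium --namespace kube-system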
Once Cilium is deployed, you'll have several components that will manage the network in your cluster:
1. The Cilium agent, deployed as a DaemonSet. Each node runs a Cilium agent that injects eBPF programs to observe the node's network interfaces and each container deployed on the node.
2. The Cilium operator, which manages the various policies that apply in your cluster.
3. The Cilium node init, which runs as a DaemonSet and handles tasks like mounting the eBPF filesystem and updating the existing CNI plugin to run in 'transparent' mode.
4. The Cilium CNI plugin, which triggers the necessary datapath configuration to provide networking, load balancing, and network policies for the pods.
Once Cilium is deployed, you can start managing the network policy of your cluster using a CRD provided by Cilium: CiliumNetworkPolicy.
With CiliumNetworkPolicy, you can select the pod or pods targeted by a rule via label selectors and authorize their traffic.
For example, if I want to authorize all the pods from the hipster-shop namespace to send their spans to the OpenTelemetry Collector deployed in the default namespace, and the collector has the label component: otel-collector, I could create the following rule:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress-from-oteld
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      component: otel-collector
  ingress:
    - fromEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: hipster-shop
      toPorts:
        - ports:
            - port: "4317"
This means that only the pods from the hipster-shop namespace can send traffic to my OpenTelemetry Collector.
A CiliumNetworkPolicy can even specify egress rules if the pod needs to contact a service outside the cluster: we can authorize communication to a specific domain name.
In our example, my collector will export the spans to Dynatrace, so I could even adjust the rule as follows:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress-from-oteld
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      component: otel-collector
  ingress:
    - fromEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: hipster-shop
      toPorts:
        - ports:
            - port: "4317"
  egress:
    - toFQDNs:
        - matchPattern: "*.live.dynatrace.com"
Cilium also allows you to create rules at the application layer by specifying the authorized HTTP endpoints, Kafka topics, queries allowed to be sent to Cassandra, and memcached requests.
For example, with OpenTelemetry:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress-from-oteld
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      component: otel-collector
  ingress:
    - fromEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: hipster-shop
      toPorts:
        - ports:
            - port: "4317"
          rules:
            http:
              - method: "POST"
                path: "/api/v1/traces"
  egress:
    - toFQDNs:
        - matchPattern: "*.live.dynatrace.com"
For more information, I would recommend looking at Cilium’s documentation.
The standard version of Cilium includes a few CRDs:
1. CiliumClusterwideNetworkPolicy
2. CiliumEndpoint
3. CiliumExternalWorkload
4. CiliumIdentity
5. CiliumNetworkPolicy
6. CiliumNode
You will mainly use CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy.
The CiliumClusterwideNetworkPolicy is similar to the CiliumNetworkPolicy, except it targets the entire cluster: instead of selecting pods based on labels, you select nodes.
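Here is a minimal sketch of a node-selecting policy (the node label is hypothetical, and host policies like this also require Cilium's host firewall feature to be enabled):

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: lock-down-worker-nodes             # cluster-wide, so no namespace
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""   # hypothetical node label
  ingress:
    - fromEntities:
        - cluster                          # only allow traffic coming from inside the cluster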
Cilium automatically creates CiliumEndpoint objects. One CiliumEndpoint is created for each pod managed by Cilium, with the same name and in the same namespace as the pod. You can also list the various Cilium endpoints using the Cilium CLI. Cilium will also create a CiliumNode for each node it manages. Cilium manages all the pods that are configured with hostNetwork: false.
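For example, assuming Cilium is deployed in the kube-system namespace, you can inspect these objects with:

kubectl get ciliumendpoints --all-namespaces
kubectl get ciliumnodes

# or directly through the agent's CLI
kubectl -n kube-system exec ds/cilium -- cilium endpoint list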
Cilium provides many advanced networking features to manage your cluster, but they require you to enable extensions in Cilium (bandwidth management, egress gateway, cluster mesh, and more).
There is also an option to replace Kube-proxy with Cilium.
Each advanced networking feature requires enabling a specific extension when deploying Cilium with Helm.
Once the extension is deployed, it also provides extra CRDs to manage bandwidth management, the egress gateway, the cluster mesh (to manage several clusters within the same network), and more.
The Cilium service mesh
Cilium supports ingress, among other networking features. To enable it, you are required to fully disable kube-proxy and deploy Cilium with the mode:
kubeProxyReplacement=strict
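As a sketch, with Helm this could look like the following (the API server host and port are placeholders you need to fill in, since Cilium can no longer rely on kube-proxy to reach the API server):

helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=<API_SERVER_IP> \
  --set k8sServicePort=<API_SERVER_PORT>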
To disable kube-proxy in an existing cluster, you'll need to:
1. Delete the kube-proxy DaemonSet
2. Delete the kube-proxy ConfigMap (so it doesn't come back after a cluster upgrade)
3. Back up the iptables rules on each of your nodes
In a cluster managed by a cloud provider, you won't be able to disable kube-proxy easily. That's why GCP, AWS, and others provide options to create clusters with Cilium enabled.
Once Cilium is properly configured, you can also deploy the ingress extension, delegating the ingress implementation to Cilium. We would then be able to define ingress rules with the ingress class name: cilium.
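An ingress rule handled by Cilium then looks like any other Ingress object, just with the cilium class (the names and backend service here are hypothetical):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hipster-shop               # hypothetical name
  namespace: hipster-shop
spec:
  ingressClassName: cilium         # delegate this ingress to Cilium
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend     # hypothetical backend service
                port:
                  number: 80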
In the end, with the kube-proxy replacement and the ingress support enabled, Cilium can cover the following features:
1. Network filtering
2. Load balancing
3. Ingress
4. Observability
As previously explained in the introduction to service mesh, a service mesh provides several features: traffic split, observability, circuit breaker, rate limit, retry logic, and more.
Cilium has two service mesh modes: with a sidecar proxy or without. To enable proxying without a sidecar, you'll need to enable Cilium's ingress support and add the extra Envoy configuration. Once deployed, Cilium provides one Envoy proxy per node and a new CRD that lets you define proxy rules directly on those Envoys:
CiliumEnvoyConfig
Here is an example (using the cluster-wide variant, CiliumClusterwideEnvoyConfig):
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
    - name: echo-service-1
      namespace: default
    - name: echo-service-2
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-lb-listener
                rds:
                  route_config_name: lb_route
                http_filters:
                  - name: envoy.filters.http.router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                weighted_clusters:
                  clusters:
                    - name: "default/echo-service-1"
                      weight: 50
                    - name: "default/echo-service-2"
                      weight: 50
                retry_policy:
                  retry_on: 5xx
                  num_retries: 3
                  per_try_timeout: 1s
                regex_rewrite:
                  pattern:
                    google_re2: { }
                    regex: "^/foo.*$"
                  substitution: "/"
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-1"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-2"
      connect_timeout: 3s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
With this Envoy proxy approach, you only have one Envoy per node instead of one Envoy sidecar per pod.
Alternatively, Cilium provides an integration with Istio. This integration gives you a slightly different data plane than a regular Istio deployment. Once Istio is deployed, it has a control plane and a data plane; the data plane is made up of all the sidecar proxies injected into our workloads. Every namespace with the Istio injection label gets the sidecar proxy injected.
The Envoy sidecar proxy injected into our workloads uses iptables to route traffic properly.
With the Cilium integration, you remove the reliance on iptables and increase the performance of your mesh.
Deploying the Istio version for Cilium requires using the cilium-istioctl CLI.
What is Hubble?
Hubble is a fully distributed networking and security observability platform. It is built on top of Cilium and eBPF to enable deep visibility into the communication and behavior of services as well as the networking infrastructure in a completely transparent manner.
Hubble allows you to understand:
1. The dependencies between our services, by providing a communication map
2. How your network is currently behaving
3. The level of availability and performance of your services
Hubble comes with:
1. The Hubble server, deployed as a DaemonSet, which collects all the information provided by the Cilium agents. The Hubble server is in fact part of the Cilium agent.
2. The Hubble relay, which collects the data from all the Hubble servers
3. The Hubble CLI (see the sketch just after this list)
4. A web UI dashboard
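As a quick sketch of the workflow (flags may differ slightly between versions), enabling Hubble and observing flows looks like this:

# enable Hubble (and its UI) through the Cilium CLI
cilium hubble enable --ui

# expose the Hubble relay locally, then check the status
cilium hubble port-forward &
hubble status

# observe the HTTP flows of a given namespace
hubble observe --namespace hipster-shop --protocol http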
Cilium and Hubble's Prometheus support
Cilium and Hubble provide standard Prometheus support that must be enabled when deploying either of them.
Once the Prometheus support is enabled, each component will produce Prometheus metrics (the various Cilium agents, the Cilium operator, the Cilium Envoy if you enable the Envoy support, and Hubble).
When the Hubble Prometheus support is enabled, it creates a new service, hubble-metrics, that exposes the Prometheus data.
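As a sketch, with Helm the corresponding values look roughly like this (the exact value names can vary between chart versions):

helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"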
Cilium updates the various components by adding the right Prometheus annotations:
For the Cilium agents:
prometheus.io/scrape: true
prometheus.io/port: 9962
The Cilium operator:
prometheus.io/scrape: true
prometheus.io/port: 9963
The Hubble metric service:
prometheus.io/scrape: true
prometheus.io/port: 9965
Let’s have a look at the types of metrics provided by this support.
The Cilium Endpoint
You can keep track of the number of endpoints managed by Cilium and the time to regenerate the endpoints.
Cluster health
unreachable_nodes = number of nodes that can't be reached
unreachable_health_endpoints = number of health endpoints that can't be reached
Node connectivity
node_connectivity_status = observed status of the connectivity between the current Cilium agent and other Cilium nodes
node_connectivity_latency_seconds = latency in seconds between the current Cilium agent and other Cilium nodes
Clustermesh
The data allowing you to report the number of nodes in the cluster mesh, their readiness status, and the number of failing nodes.
eBPF
The metrics reporting the number of calls to the BPF maps, the memory consumed by the BPF programs, the latency of the BPF system calls, and more.
Drops/Forwards (L3/L4)
Reporting the number of packets forwarded and dropped.
Identity
Reporting the number of identities.
Policy
For L3/L4 policies, we can report the number of policies loaded and the status of each policy. For L7 policies (HTTP or Kafka), we can report the redirects, the number of requests, and more.
Cilium API rate limiting
The number of requests processed, the current rate-limit settings, the wait duration, and more.
About the operator
With the operator, you can keep track of the number of allocated IPs, and the number of nodes having issues allocating an address. All the metrics from the operator and the agent allow you to keep track of the health of the Cilium core components.
Hubble will help you report metrics related to the traffic managed by Cilium, similar to the metrics provided by a service mesh.
Tutorial
In this tutorial, we will deploy Cilium, Hubble, and the Istio integration. We will build a few network policies, configure Istio to expose our application, and then explore the observability provided by Cilium, Hubble, and Istio.
For this tutorial, you will need:
1. A k8s cluster
2. Cilium
3. Hubble
4. The cert-manager
5. The OpenTelemetry operator
6. A Dynatrace tenant
7. A version of the online boutique fully instrumented by the OpenTelemetry community
Follow the full tutorial on my YouTube channel: What are Cilium and Hubble - with Thomas Graf
Or on GitHub: https://github.com/isItObserva...