Giulia Di Pietro
Apr 14, 2022
The Fluent Operator simplifies the maintenance of your log agents, especially when you combine Fluent Bit and Fluentd in your log pipeline.
This blog post will introduce you to the Fluent Operator and share some best practices for implementing it. At the end, you'll find more information about the tutorial I built on how to use the Fluent Operator, along with all other relevant links.
Let’s start with a quick introduction to the two log agents before focusing on the Fluent Operator.
Introduction to Fluentd and Fluent Bit
In some of my previous blog posts, I have already spoken at length about Fluentd and Fluent Bit, but here’s a quick summary of the differences and similarities between the two tools to get us started.
Fluent Bit and Fluentd are both log agents able to collect, filter, parse, and forward log streams to the observability backend of your choice.
The major differences between the two solutions are that:
1. Fluent Bit is lightweight, built for cloud-native environments, and more efficient at collecting and parsing logs.
2. Fluentd has multiple plugins to transform the logs and forward them to almost any solution on the market.
Ideally, we'd use Fluent Bit to collect logs and do most of the work in our log pipeline, and rely on Fluentd to finalize the transformation of our logs with advanced processing before forwarding them to our preferred solution.
Both log agents can be deployed on bare metal or within a Kubernetes cluster. They require building a log stream pipeline describing how we want to collect and process logs. This pipeline is stored in a configuration file (a ConfigMap in Kubernetes).
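To make this concrete, here's a minimal sketch of how such a pipeline could be stored in a ConfigMap. The name, namespace, and the pipeline itself are illustrative placeholders, not taken from a real deployment:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config   # hypothetical name
  namespace: logging        # hypothetical namespace
data:
  fluent-bit.conf: |
    # A deliberately tiny pipeline: tail container logs and print them to stdout
    [INPUT]
        Name   tail
        Path   /var/log/containers/*.log
    [OUTPUT]
        Name   stdout
        Match  *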
Operating Fluentd or Fluent Bit means managing the deployment of the log agent (usually deployed in Kubernetes as a DaemonSet) and updating your various log stream pipelines by modifying the corresponding ConfigMaps.
When you combine Fluentd with Fluent Bit, you have to keep two deployments and two sets of configuration up to date and consistent with each other. To simplify this process, the Fluent community has created the Fluent Operator.
The Fluent Operator
The Fluent Operator lets you manage the deployment of Fluent Bit alone or of Fluent Bit combined with Fluentd.
The Fluent community recommends using Fluentd for advanced processing of your logs. The operator deploys Fluent Bit as a DaemonSet to collect logs from the cluster; when Fluentd is used, the operator deploys it as a StatefulSet to finalize the log stream pipeline.
This means that the operator comes with several Custom Resource Definitions (CRDs), for both Fluent Bit and Fluentd, to manage:
1. The deployment itself
2. The configuration that links the various steps of the log stream pipeline
3. The input steps
4. The parser steps
5. The filter steps
6. The output steps
The Fluent Bit CRDs
The Fluent Operator provides specific CRDs to manage your Fluent Bit deployment and pipeline. Put simply, the CRDs create new Kubernetes objects that the Fluent Operator watches to update the log stream pipeline and the deployment. You can find an extensive list of all CRDs in the Fluent Operator documentation on GitHub.
Here's an example of a FluentBit object (from the FluentBit CRD, which defines the Fluent Bit DaemonSet and its configuration):
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit
  namespace: kubesphere-logging-system
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  image: kubesphere/fluent-bit:v1.8.3
  positionDB:
    hostPath:
      path: /var/lib/fluent-bit/
  resources:
    requests:
      cpu: 10m
      memory: 25Mi
    limits:
      cpu: 500m
      memory: 200Mi
  fluentBitConfigName: fluent-bit-config
  tolerations:
    - operator: Exists
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: DoesNotExist
This object uses the default Fluent Bit image (kubesphere/fluent-bit), but you can use a custom one if needed. However, remember that it's not a standard Fluent Bit image: it's built for the Fluent Operator and handles the dynamic reloading of the configuration.
The FluentBit object specifies how to deploy the log agents and links to a configuration object that maps the various log stream pipeline steps. The configuration is defined at the cluster level with the help of ClusterFluentBitConfig. You can limit the scope of your pipeline by specifying the namespace field.
Here is an example of a ClusterFluentBitConfig:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFluentBitConfig
metadata:
  name: fluent-bit-config
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  service:
    parsersFile: parsers.conf
  inputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  filterSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  outputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
The cluster config attaches the log stream steps with the help of label selectors.
Image source: https://github.com/fluent/fluent-operator/raw/master/docs/images/fluent-bit-operator-workflow.svg
Those labels must be specified on each Fluent Bit object describing your pipeline steps.
You define how to map your:
1. Input steps, using inputSelector
2. Filter steps, using filterSelector
3. Parser steps, using parserSelector
4. Output steps, using outputSelector
Each big step of your log stream then gets its own pipeline object:
1. Input, by creating a ClusterInput object
2. Parser, by creating a ClusterParser (parsing can also be performed as one of the operations of a filter task); a sketch of a ClusterParser follows this list
3. Filter, with the ClusterFilter
4. And, lastly, ClusterOutput for the output
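Since a ClusterParser never appears in the examples below, here's a minimal sketch of one, modeled on the docker JSON parser shipped with the operator's sample configuration. Treat the label as an assumption; it has to match the parserSelector of your ClusterFluentBitConfig:

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterParser
metadata:
  name: docker
  labels:
    fluentbit.fluent.io/enabled: "true"   # assumed to match your parserSelector
spec:
  json:
    # Parse each record as JSON and promote the "time" field to the event timestamp
    timeKey: time
    timeFormat: "%Y-%m-%dT%H:%M:%S.%L"
    timeKeep: true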
The operator watches the various objects and steps of the pipeline and generates Secrets containing the actual log stream pipeline. The deployed DaemonSet refers to those Secrets.
So, defining a Fluent Bit pipeline with the Fluent Operator looks quite different from a traditional Fluent Bit pipeline.
For example, here is a traditional Fluent Bit pipeline:
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On

[INPUT]
    Name              systemd
    Tag               host.*
    Systemd_Filter    _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail    On

[FILTER]
    Name      modify
    Match     *
    Rename    message content
    Rename    log content

[FILTER]
    Name                  kubernetes
    Match                 kube.*
    Merge_Log             On
    Merge_Log_Trim        On
    Labels                Off
    Annotations           Off
    K8S-Logging.Parser    Off
    K8S-Logging.Exclude   Off

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  kubernetes
    Add_prefix    kubernetes_
In the Fluent Operator, each block of this pipeline is represented by a specific CRD. For example, the ClusterInput:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  name: tail
  labels:
    fluentbit.fluent.io/enabled: "true"
    fluentbit.fluent.io/component: logging
spec:
  tail:
    tag: kube.*
    path: /var/log/containers/*.log
    parser: docker
    refreshIntervalSeconds: 10
    memBufLimit: 5MB
    skipLongLines: true
    db: /fluent-bit/tail/pos.db
    dbSync: Normal
The ClusterFilter:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFilter
metadata:
  name: kubeletfilter
  labels:
    fluentbit.fluent.io/enabled: "true"
    fluentbit.fluent.io/component: logging
spec:
  match: service.kubelet
  filters:
    - modify:
        rules:
          - rename:
              message: content
              log: content
And the ClusterOutput, here using stdout:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: fluentd-stdout-kub
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  stdout:
    format: json_stream
  match: kube.*
Every update made to one of these CRDs automatically updates the running Fluent Bit agents.
As of now, not all Fluent Bit plugins are available in the Fluent Operator, but we can expect more plugins to be supported soon.
The Fluentd CRDs
The Fluentd CRDs follow a similar logic to the Fluent Bit CRDs. You need to create a Fluentd object that specifies the deployment of your Fluentd agent. The significant difference is that you have to define how Fluentd receives the logs from Fluent Bit:
apiVersion: fluentd.fluent.io/v1alpha1
kind: Fluentd
metadata:
  name: fluentd
  namespace: kubesphere-logging-system
  labels:
    app.kubernetes.io/name: fluentd
spec:
  globalInputs:
    - forward:
        bind: 0.0.0.0
        port: 24224
  replicas: 1
  image: kubesphere/fluentd:v1.14.4
  fluentdCfgSelector:
    matchLabels:
      config.fluentd.fluent.io/enabled: "true"
Similarly to the FluentBit object, you can customize the image, but remember that it's a specific build of Fluentd designed for the operator.
The Fluentd object also specifies which configuration to use. There are two options: ClusterFluentdConfig at the cluster level or FluentdConfig at the namespace level.
Here’s an example of a ClusterFluentdConfig:
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFluentdConfig
metadata:
  name: fluentd-config
  labels:
    config.fluentd.fluent.io/enabled: "true"
spec:
  watchedNamespaces:
    - kube-system
    - kubesphere-monitoring-system
  clusterOutputSelector:
    matchLabels:
      output.fluentd.fluent.io/enabled: "true"
Here, you can limit the configuration to specific namespaces using watchedNamespaces. Similarly to the Fluent Bit CRDs, you'll map your pipeline using label selectors.
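For completeness, here's a hedged sketch of the namespace-scoped variant, FluentdConfig. The namespace is a placeholder, and the selector mirrors the cluster-level example above:

apiVersion: fluentd.fluent.io/v1alpha1
kind: FluentdConfig
metadata:
  name: fluentd-config
  namespace: my-app-namespace   # hypothetical namespace
  labels:
    config.fluentd.fluent.io/enabled: "true"
spec:
  # Select namespace-level Output objects by label
  outputSelector:
    matchLabels:
      output.fluentd.fluent.io/enabled: "true"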
Because Fluentd acts as a log forwarder in the Fluent Operator, only filter and output steps are available.
So, the idea is to collect, parse, and filter your logs with Fluent Bit and then forward them to Fluentd to take advantage of its filter and output plugins, as shown in the sketch below.
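The glue between the two agents is a Fluent Bit output that forwards records to Fluentd's forward input (port 24224 in the Fluentd object above). Here's a sketch, assuming Fluentd is reachable through a Service named fluentd in the kubesphere-logging-system namespace; adjust the host to your own deployment:

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: fluentd-forward
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  match: kube.*
  forward:
    # Assumed Service name and namespace; the port matches the Fluentd globalInputs above
    host: fluentd.kubesphere-logging-system.svc
    port: 24224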
That is why Fluentd only has the following pipeline CRDs: ClusterFilter and ClusterOutput at the cluster level (when using ClusterFluentdConfig), or Filter and Output at the namespace level (when using FluentdConfig).
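As a sketch of the filter side, a Fluentd ClusterFilter could look like the following, using the record transformer plugin. The label is an assumption: the ClusterFluentdConfig shown above only selects outputs, so it would also need a matching cluster filter selector:

apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFilter
metadata:
  name: fluentd-filter
  labels:
    filter.fluentd.fluent.io/enabled: "true"   # assumed label for a filter selector
spec:
  filters:
    - recordTransformer:
        # Add a static field to every record passing through Fluentd
        records:
          - key: cluster_name
            value: my-cluster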
Here is an example of ClusterOutput:
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
  name: fluentd-output-es
  labels:
    output.fluentd.fluent.io/enabled: "true"
spec:
  outputs:
    - elasticsearch:
        host: elasticsearch-logging-data.kubesphere-logging-system.svc
        port: 9200
        logstashFormat: true
        logstashPrefix: ks-logstash-log
Best practices for using the Fluent Operator
Adding more plugins
Currently, the Fluent Operator doesn't let you add extra plugins, and only a subset of plugins is supported (see the full list in the Fluent Operator documentation on GitHub).
Since the pipeline is built upon multiple CRDs, providing a new image (as you would in a traditional Fluentd deployment) won't help: the CRDs themselves would need to be updated with the parameter definitions of the new plugin.
If you need a specific Fluentd plugin that isn't supported, I recommend deploying Fluentd manually (without the operator) and configuring your Fluent Bit pipeline to forward the log stream to your custom Fluentd deployment, for example with a forward ClusterOutput like the sketch shown earlier, pointed at your own Fluentd Service.
Designing your log stream pipeline
When designing your log stream pipeline, I recommend defining your inputs first with only one output (stdout).
Then look at the records produced in the Fluent Bit logs, add your filters, and validate your changes.
Make small changes and validate each one. This avoids building a complex pipeline and then spending lots of time debugging it.
Save time with the Regex parser
Another small tip that can save you a lot of time: test your regex parser using this website: Rubular.
This works because the regex parser used by Fluent Bit is based on Ruby regular expressions.
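For example, once a pattern validates on Rubular, you can drop it into a regex parser. A hypothetical sketch for a simplified NGINX access-log line (the pattern and field names are illustrative only):

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterParser
metadata:
  name: nginx-access
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  regex:
    # Named capture groups become fields of the parsed record
    regex: '^(?<remote>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*)'
    timeKey: time
    timeFormat: "%d/%b/%Y:%H:%M:%S %z"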
Tutorial
Now let’s jump into the tutorial!
We'll deploy the Fluent Operator and design a pipeline that collects logs from our pods and sends them to Dynatrace.
Before you start, make sure you fulfill these requirements:
1. A Kubernetes cluster
2. An NGINX ingress controller
3. K6 to generate traffic and produce logs
4. The Fluent Operator
5. A Dynatrace tenant and a local Dynatrace ActiveGate in our cluster
Access the full tutorial here:
1. Link to the video on YouTube: What is the Fluent Operator and how do you use it?
2. Link to the GitHub repository: isItObservable/FluentOperator