
Critical vulnerability found in Fluent Bit: What you need to know

A newly-discovered vulnerability in Fluent Bit could be exploited to expose the data of major cloud providers and technology giants.

Henrik Rexed

May 22, 2024


Recently, the Tenable research team made a significant discovery: a vulnerability in Fluent Bit, one of our favorite observability agents.

As a lightweight observability agent, Fluent Bit is crucial in collecting, processing, and exporting logs, metrics, and traces to your observability backend. Many cloud providers rely on Fluent Bit for log collection and reporting, and if you’re using Google Kubernetes Engine (GKE), chances are a Fluent Bit instance is running in your cluster.

I’ve already made multiple videos about Fluent Bit, most recently one comparing Fluent Bit to the OpenTelemetry Collector and one presenting the latest updates in Fluent Bit 3.0, so I thought it was important to share this information with you.

The Fluent Bit security vulnerability: all the details

Let’s dive into the details of this vulnerability.

Fluent Bit allows us to enable an HTTP server on the agent, which aids in troubleshooting agent behavior. To enable the HTTP server, modify the Fluent Bit settings as follows:

            

    [SERVICE]
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port   2020

Once the HTTP server is enabled, Fluent Bit exposes several HTTP endpoints, including:

  • Build details of the Fluent Bit instance

  • Uptime information

  • Metrics

  • Metrics in Prometheus format

  • And more
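To see what a given instance actually exposes, the endpoints listed above can be queried with a few lines of Python. This is a minimal sketch, not an official client: the paths are the monitoring paths described in the Fluent Bit documentation, while the host, the port, and the helper names `monitoring_urls` and `fetch` are assumptions made for illustration.

```python
import urllib.request

# Monitoring paths served by the built-in HTTP server.
MONITORING_PATHS = {
    "build": "/",
    "uptime": "/api/v1/uptime",
    "metrics": "/api/v1/metrics",
    "prometheus": "/api/v1/metrics/prometheus",
}


def monitoring_urls(host: str = "127.0.0.1", port: int = 2020) -> dict[str, str]:
    """Build the full URL of each monitoring endpoint (hypothetical helper)."""
    return {
        name: f"http://{host}:{port}{path}"
        for name, path in MONITORING_PATHS.items()
    }


def fetch(url: str, timeout: float = 3.0) -> str:
    """Return the response body as text; raises OSError if unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name, url in monitoring_urls().items():
        try:
            # Print only the first 200 characters of each response.
            print(name, "->", fetch(url)[:200])
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            print(name, "-> unreachable:", exc)
```

If any of these URLs answers from outside the pod or node, the HTTP server is reachable by other workloads, which is exactly the exposure this vulnerability depends on.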

So, where is the flaw? The vulnerability lies in the exposed HTTP server, specifically related to a troubleshooting feature called “tracing.” This feature allows us to send HTTP requests with the name of a plugin to retrieve the time spent on that plugin. Unfortunately, Fluent Bit does not adequately validate the data structure of these requests.

How can this vulnerability be exploited?

An attacker can exploit this vulnerability by sending a non-string value, for example, an integer value, to the tracing endpoint. Although this action may seem harmless, it can lead to memory corruption within Fluent Bit.
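To make the missing check concrete, here is a minimal sketch of the kind of type validation a tracing request needs before any plugin name is used. It is written in Python for readability and is not Fluent Bit's actual code (Fluent Bit is written in C). The request shape, a JSON object with an `inputs` list of plugin names, and the function name `validate_trace_request` are assumptions for illustration.

```python
import json


def validate_trace_request(body: str) -> list[str]:
    """Return the plugin names in a tracing request, or raise ValueError.

    Assumes a JSON body shaped like {"inputs": ["name", ...]}; this shape
    is an illustrative assumption, not taken from the article.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"body is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")

    names = payload.get("inputs")
    if not isinstance(names, list) or not names:
        raise ValueError("'inputs' must be a non-empty list")

    for name in names:
        # The crucial check: reject integers, booleans, nested objects, etc.
        if not isinstance(name, str):
            raise ValueError(
                f"plugin name must be a string, got {type(name).__name__}"
            )
    return names
```

With this check in place, a body such as `{"inputs": [12345]}` is rejected up front instead of being handed to code that assumes every entry is a string.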

In practical terms, this vulnerability could enable denial-of-service (DoS) attacks against Fluent Bit instances: an attacker could crash the agent and potentially access sensitive information within our cluster.

How it was fixed

Thankfully, the Fluent Bit community has responded swiftly. A fix has been provided and is slated for inclusion in the next minor version of Fluent Bit (v3). You can find the details of the fix in this commit: Fluent Bit GitHub Commit.

How to mitigate the risks of the Fluent Bit vulnerability

To reduce the risks associated with this vulnerability, follow these steps:

  1. Disable the HTTP server: If you’ve enabled the HTTP server to expose Prometheus metrics, consider deactivating it. Instead, use Fluent Bit’s built-in metrics input and export those metrics using the Prometheus output plugin.

  2. Layer 7 network policies: For Kubernetes environments, create network policies at the layer 7 level that define which HTTP requests and parameters are authorized when interacting with Fluent Bit. In a service mesh, you can achieve this with the Gateway API or a VirtualService. In Cilium, define a CiliumNetworkPolicy for the HTTP rules, alongside a NetworkPolicy that controls which workloads can send network packets to Fluent Bit.
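Before applying either step, it helps to know which of your configurations enable the HTTP server at all. The following is a minimal sketch of such a check for classic-mode configuration files. The function name `audit_service_section` and the warning texts are hypothetical, and the sketch assumes that `on`, `true`, and `yes` all enable the server and that the listen address defaults to `0.0.0.0` when it is not set.

```python
def audit_service_section(config_text: str) -> list[str]:
    """Return warnings about the built-in HTTP server in a classic-mode config."""
    settings: dict[str, str] = {}
    in_service = False

    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            # Track whether we are inside the [SERVICE] section.
            in_service = line.upper() == "[SERVICE]"
            continue
        if in_service:
            # Keys and values are separated by whitespace; keys are
            # compared case-insensitively.
            parts = line.split(None, 1)
            settings[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""

    warnings: list[str] = []
    # Assumption: these three spellings enable the server.
    if settings.get("http_server", "off").lower() in {"on", "true", "yes"}:
        warnings.append("HTTP server is enabled")
        # Assumption: an unset listen address means all interfaces.
        if settings.get("http_listen", "0.0.0.0") == "0.0.0.0":
            warnings.append("HTTP server listens on all interfaces")
    return warnings
```

Run against the `[SERVICE]` block shown earlier in this article, the function returns both warnings; a configuration with `HTTP_Server Off` returns an empty list.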

If you want to learn more about securing your cluster, I recommend watching the dedicated videos I released about Cilium and Istio a few months ago.

By following these precautions, you can safeguard your Fluent Bit deployment and maintain observability without compromising security.

Conclusion

The discovery of a vulnerability in Fluent Bit underscores the importance of ongoing vigilance and proactive security measures in observability tools.

Organizations using Fluent Bit are advised to implement the recommended mitigations, such as turning off the HTTP server and enforcing strict network policies, to protect their deployments.

As we move forward, this incident reminds us of the ever-present need to balance functionality with security in our increasingly connected world.



