
How to observe your NGINX Controller with Loki

The NGINX Ingress controller provides a bridge between Kubernetes services and external clients. Let's look at how to extend the observability of this communication with the help of Loki.

Giulia Di Pietro

Jan 28, 2022


When you use a Prometheus exporter to observe ingress controllers, you'll notice that a few dimensions are missing and you can't get the level of detail you want. That's why it's useful to implement another solution, like Grafana Loki, to extend your visibility. But how do you do it?

This blog post is part of a three-part series on observing the NGINX controller, each part with its own article and YouTube video.


In this article, we will focus on observing the NGINX Controller with Loki by looking first at how to collect the logs and turn them into metrics (with the help of Loki's main contributor Cyril Tovena from Grafana) and then at how to build your own LogQL queries.

Collecting logs and turning them into metrics

As mentioned above, with a Prometheus exporter alone a few dimensions are missing, and you can't get the level of detail you want from your ingress controller. That's why it's a good idea to also use Loki to extend that visibility.

A key part of the journey from logs to metrics is setting up an agent like Promtail, which ships the contents of the logs to a Grafana Loki instance. The logs are then parsed and turned into metrics using LogQL.

As Cyril explains in the video, Loki and Promtail were developed to create a solution like Prometheus for logs. Promtail was developed right at the start and it has the same service discovery and configuration as Prometheus. You can use other agents instead of Promtail, but it’s easier to use Promtail if you already know how to use Prometheus. You can just copy-paste the config, and you're set.

Loki is very useful for the NGINX use case because ingress controllers don't provide a lot of metrics. However, they produce a lot of logs, and those logs carry a lot more detail than the metrics. With LogQL, you can parse those logs and create your own metrics to get the information you want.

The first step is to use your index to find the logs you need (label filter), which determines how many log streams you'll query. For example, if you want to query a full namespace, use the namespace label; if you want a specific application, use both the application and the namespace labels.

It's better to reduce the number of log lines as early as possible in the pipeline, so that the more expensive stages later on have less to process. For this, you use a line filter, which filters the logs based on a word or expression. There are two types of line filters: contains and regexp. Cyril recommends chaining multiple contains filters and using regexp only if you really need to: contains is faster, and the more you filter at the beginning, the faster the last operations at the end will be.
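As a rough sketch, combining a label filter with two contains line filters could look like this (the namespace and app label values are assumptions and depend on how your ingress controller is deployed):

# Hypothetical labels: adjust them to match your NGINX Ingress deployment
{namespace="nginx-ingress", app="nginx-ingress-controller"} |= "GET" |= "/api"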

Before you parse logs, you need to ensure they're in the right format. There are two ways you can do this:

  • Discard all logs that aren’t in a JSON format.

  • Select only the log streams that have the specific format you require.

And finally, we have parsing, where you extract additional labels from the log line to use for aggregation or filtering.

So, to summarize:

Logs are ingested with Promtail. You build a LogQL query that filters on the label related to the NGINX controller and parses the log stream to match the NGINX log structure. The parser helps us extract new labels, which we can then unwrap to expose the right metrics.
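To make this concrete, here's a minimal sketch of such a query. The app label value and the exact log pattern are assumptions that depend on how your controller is deployed and configured; the query computes the per-second request rate split by HTTP status code:

# Assumption: the controller pods carry the label app="nginx-ingress" and
# log in an access-log format matching the pattern below.
sum by (status) (
  rate(
    {app="nginx-ingress"}
      | pattern `<ip> - - <_> "<method> <path> <_>" <status> <size> <_> "<agent>" <_>`
    [1m]
  )
)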

Another recommendation is to avoid parsing logs too much at the source and to do it in Loki instead. Loki has four parsers: two depend on the log format (json or logfmt), and the other two, pattern and regexp, work on any format. The pattern parser is easier to use than the regexp one, and it's also faster and consumes fewer resources.

Now that we know what the process looks like, let's jump into how to build a LogQL query.

How to build a LogQL

LogQL stands for Log Query Language, and, similarly to what PromQL is for Prometheus, it’s the query language for logs. If you're using an observability backend solution supporting LogQL, it’s crucial to understand what you can achieve with it.

Everything starts with a log stream pipeline that collects logs from various sources and stores them in a log storage solution like Loki.

Once the log streams are stored, we want to be able to consume and transform the collected data to build dashboards or alerts for our projects.

LogQL allows you to filter, transform, and extract data as metrics. Once you have metrics, you can aggregate them with any of the functions you know from PromQL. A query can also simply return a filtered log stream instead of a metric.

LogQL is composed of a {Stream selector} and a log pipeline, where each step of your pipeline is separated by a vertical bar: |.
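As an illustrative sketch (the label names and values are made up), a query with a stream selector and a three-step pipeline looks like this:

{namespace="demo", container="frontend"}   # stream selector
  |= "error"                               # line filter
  | logfmt                                 # parser
  | status >= 500                          # label filter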

The log agent collector

When collecting logs, our log agent adds context to our log stream, like the pod name, the service name, and so on. The log stream selector allows us to filter our logs based on the labels available in our log stream.

Similar to PromQL, we can filter by using label matching operators like the ones below (see the example after this list):

  • = exactly equal

  • != not equal

  • =~ regexp matches

  • !~ regexp does not match
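For instance, four separate selectors, one per operator (the label values are just placeholders):

{namespace="production"}          # exactly this namespace
{namespace!="kube-system"}        # everything except kube-system
{container=~"nginx-.*"}           # container names matching the regexp
{container!~".*-canary"}          # exclude canary containers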

The log pipeline

The log pipeline helps you process and filter the log stream. It can be composed of:

  • Line filter

  • Parser

  • Label filter

  • Line format

  • Labels format

  • Unwrap (for metrics)

Let's go through them one by one.

Line filter

The line filter is similar to a grep applied over the aggregated logs. It will search the content of the log line. Here are the operators:

  • |=: Log line contains string

  • !=: Log line does not contain string

  • |~: Log line contains a match to the regular expression

  • !~: Log line does not contain a match to the regular expression

Here's an example in which we're only interested in logs that have an error:

{container="frontend"} |= "error"

{cluster="us-central-1"} |= "error" != "timeout"

Parser expression

The parser expression can parse our log stream and extract labels from the log content using different functions:

  • JSON

  • Logfmt

  • Pattern

  • Unpack

  • Regexp

Building upon the previous example:

{container="frontend"} |= "error" | json

Suppose the log stream is in JSON format:

{
  "pod.name": { "id": "deded" },
  "namespace": "test"
}

If I apply the json parser, all those JSON attributes are exposed as new labels in our transformed log stream, which will look like this:

  • pod_name_id = "deded"
  • namespace = "test"

You can also pass parameters to the json parser to specify which attributes you want to extract, to avoid getting everything.
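For example, to extract only the namespace attribute from the JSON document above:

# Only the "namespace" attribute is extracted as a label
{container="frontend"} |= "error" | json namespace="namespace"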

Logfmt will extract all keys and values from a logfmt-formatted line, for example:

at=info method=GET path=/ host=mutelight.org fwd="123.443.21.212" status=200 bytes=1653

Will turn into:

  • at = "info"

  • method = GET

  • path = /

  • host = mutelight.org

  • fwd = "123.443.21.212"

  • status = 200

  • bytes = 1653

Pattern parser

The pattern parser is a powerful tool that allows you to explicitly extract fields from log lines.

For example, we have this log stream:

192.176.12.1 - - [10/Jun/2021:09:14:29 +0000] "GET /api/plugins/versioncheck HTTP/1.1" 200 2 "-" "Go-http-client/2.0" "13.76.247.102, 34.120.177.193" "TLSv1.2" "US" ""

This log line can be parsed with the following expression:

<ip> - - <_> "<method> <uri> <_>" <status> <size> <_> "<agent>" <_>

Using <_> indicates that you're not interested in keeping that field as a label.
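Putting it together, a full query using the pattern parser could look like the following sketch (the cluster and container label values are assumptions):

# Hypothetical labels; the pattern matches the access-log line shown above
{cluster="ops-tools1", container="ingress-nginx"}
  | pattern `<ip> - - <_> "<method> <uri> <_>" <status> <size> <_> "<agent>" <_>`
  | status >= 400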

Regexp parser

Regexp is similar to pattern, but you specify the expected format yourself with a regular expression, using named capture groups to create the labels.
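As a hedged sketch, the following extracts the same fields as the pattern example, this time with RE2 named capture groups (same assumed labels as above):

# Each named capture group becomes an extracted label
{cluster="ops-tools1", container="ingress-nginx"}
  | regexp `(?P<ip>\S+) - - \[[^\]]+\] "(?P<method>\S+) (?P<uri>\S+) [^"]*" (?P<status>\d+)`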

Label filter

Once you have parsed your log stream and extracted new labels, you can apply label filters on them.

After my previous expression:

{container="frontend"} |= "error" | json

I can add:

{container="frontend"}
  |= "error"
  | json
  | duration > 1m and bytes_consumed > 20MB

This keeps only the log lines whose extracted duration is above one minute and that consumed more than 20 MB.

Line Format Expression

The line format expression helps you rewrite the content of the log line, for example, to display only a few labels:

{container="frontend"}
  | logfmt
  | line_format "{{.ip}} {{.status}} {{div .duration 1000}}"

Labels Format Expression

The label format expression can rename, modify, and even add labels to our modified log stream. Once the stream has the labels we need, we can use metric queries to extract metrics from our logs.
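As a small sketch (the label names are invented), label_format can rename an extracted label and add a static one:

{container="frontend"}
  | logfmt
  | label_format response_status=status, team="platform"   # rename status, add a static label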

Metric queries

Metric queries apply a function to log query results over a time range and return numeric samples that you can graph and aggregate.

There are two types of aggregators:

  • Log range aggregation

  • Unwrapped range aggregation

Log range aggregation

Similarly to Prometheus, a range aggregation is a query followed by a duration. Here are some of the supported functions:

  • rate(log-range): calculates the number of entries per second

  • count_over_time(log-range): counts the entries for each log stream within the given range.

  • bytes_rate(log-range): calculates the number of bytes per second for each stream.

  • bytes_over_time(log-range): counts the amount of bytes used by each log stream for a given range.

  • absent_over_time(log-range): returns an empty vector if the range vector passed to it has any elements and a 1-element vector with the value 1 if the range vector passed to it has no elements. (absent_over_time is useful for alerting when no time series and logs stream exist for label combination for a certain amount of time.)

Example

sum by (host) (
  rate(
    {job="mysql"}
      |= "error" != "timeout"
      | json
      | duration > 10s
    [1m]
  )
)

In this example, we split the results by host and look only at the MySQL job, keeping lines that contain "error" but not "timeout". We parse them with the json parser to add new labels, and lastly we apply a label filter to keep only requests whose duration is above 10 seconds.

Unwrap range aggregations

Unwrap specifies which extracted label holds the numeric value to expose as a metric. Here are some of the supported functions for operating over unwrapped ranges:

  • rate(unwrapped-range): calculates per second rate of all values in the specified interval.

  • sum_over_time(unwrapped-range): the sum of all values in the specified interval.

  • avg_over_time(unwrapped-range): the average value of all points in the specified interval.

  • max_over_time(unwrapped-range): the maximum value of all points in the specified interval.

  • min_over_time(unwrapped-range): the minimum value of all points in the specified interval.

  • first_over_time(unwrapped-range): the first value of all points in the specified interval.

  • last_over_time(unwrapped-range): the last value of all points in the specified interval.

  • stdvar_over_time(unwrapped-range): the population standard variance of the values in the specified interval.

  • stddev_over_time(unwrapped-range): the population standard deviation of the values in the specified interval.

  • quantile_over_time(scalar,unwrapped-range): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.

  • absent_over_time(unwrapped-range): returns a 1-element vector with the value 1 if the range has no elements (useful for alerting, just like its log-range counterpart).

Example

quantile_over_time(0.99,
  {cluster="ops-tools1", container="ingress-nginx"}
    | json
    | __error__ = ""
    | unwrap request_time [1m]
) by (path)

In this example, we want the 99th percentile. We filter on the cluster label ops-tools1 and add another filter for the ingress-nginx container. Then we parse the lines with json and use __error__ = "" to drop lines that failed to parse. Lastly, we unwrap request_time and compute the quantile over a 1-minute range, grouped by path.

Tutorial

Now that we’ve learned how to use Loki to observe the NGINX controller and how a LogQL works, let’s dive into the practical tutorial!

Here are some step-by-step tutorials on YouTube and GitHub. Just follow the links below:

Also, check out this content by Grafana to learn more about Loki, as recommended by Cyril:

Thanks again, Cyril, for coming on my YouTube channel and sharing many insights about Loki! I hope to see you again soon for a future video.


Watch Episode

Let's watch the whole episode on our YouTube channel.
