How to build a PromQL (Prometheus Query Language)

Learning how to use the Prometheus Query Language is key to making full use of Prometheus. This blog post will show you what components it’s made of and how to use various PromQL queries.

Giulia Di Pietro

Jan 20, 2022


Prometheus Query Language (PromQL) is the language you use to visualize Prometheus data in a dashboard or to write alerting rules, whose alerts Prometheus then sends to Alertmanager. Knowing how to build a PromQL query is an essential skill for using Prometheus, and with this blog post, I want to help you learn how to do it.

This is a summary of my video called “How to build a PromQL” on the Is It Observable YouTube Channel, which covers the following topics:

  • The types of Prometheus metrics

  • The ways of storing data in Prometheus

  • The ways to filter data

  • The operators

  • A small tutorial on various queries

Types of Prometheus metrics

I won’t describe the architecture of Prometheus in this blog post, since this topic was covered in another video (How to collect metrics in K8s). If you’d like to get a refresher, you can also read the summary here: how to collect metrics in k8s.

As we know, Prometheus stores the metrics scraped from the various exporters in a time series database. Each series has an identifier, and every value is stored together with the timestamp at which it was collected:

identifier -> (t1, v1), (t2, v2), ...
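If it helps to picture this model, here is a toy in-memory version in Python. It is only a sketch of the idea (one append-only list of (timestamp, value) pairs per identifier); Prometheus's real TSDB compresses samples into chunks and blocks on disk. The class name TinySeriesStore is hypothetical.

```python
from collections import defaultdict


class TinySeriesStore:
    """Hypothetical toy store: one append-only sample list per series identifier."""

    def __init__(self):
        # identifier -> [(timestamp_in_seconds, value), ...] in time order
        self.series = defaultdict(list)

    def append(self, identifier, timestamp, value):
        samples = self.series[identifier]
        # This toy simply rejects samples that are not newer than the last one
        if samples and timestamp <= samples[-1][0]:
            raise ValueError("sample is not newer than the last one")
        samples.append((timestamp, float(value)))


store = TinySeriesStore()
ident = 'http_request_total{job="httpcollector",method="GET",path="/test"}'
store.append(ident, 1000, 1)
store.append(ident, 1015, 4)
store.append(ident, 1030, 9)
print(store.series[ident])  # [(1000, 1.0), (1015, 4.0), (1030, 9.0)]
```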

The identifier is composed of a metric name and a set of labels.

For example:

http_request_total{job="httpcollector", method="GET", path="/test"}

job = the scrape job that collected the metric (defined in the scrape configuration of Prometheus)

method = type of request

path = endpoint of our request

Thanks to these labels, you can filter the metrics. Internally, however, the metric name is just another label, so the identifier is really a set of labels like this:

{
  job="httpcollector",
  method="GET",
  path="/test",
  __name__="http_request_total"
}

__name__ holds the metric name. Because it is an ordinary label, you can filter on it like any other label, for example {__name__=~"http_.*"} selects every metric whose name starts with http_.
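A small Python sketch makes the point that the familiar name{...} notation is just a display form of a label set. The helper format_series is hypothetical, and it skips the escaping of quotes and backslashes in label values that real Prometheus output performs.

```python
def format_series(labels):
    """Render a label set in the usual name{label="value",...} notation."""
    # __name__ is stored like any other label; it is only pulled out for display
    name = labels.get("__name__", "")
    rest = sorted((k, v) for k, v in labels.items() if k != "__name__")
    inner = ",".join(f'{k}="{v}"' for k, v in rest)
    return f"{name}{{{inner}}}"


series = [
    {"job": "httpcollector", "method": "GET", "path": "/test", "__name__": "http_request_total"},
    {"job": "httpcollector", "__name__": "process_cpu_seconds_total"},
]

# Selecting by metric name is just a label comparison on __name__
http_series = [s for s in series if s["__name__"] == "http_request_total"]
print(format_series(http_series[0]))
# http_request_total{job="httpcollector",method="GET",path="/test"}
```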

When evaluating a PromQL expression, Prometheus returns the result in one of 4 types:

  • String

  • Scalar

  • Instant vector

  • Range vector

The first two are easy to understand: a string is just text, and a scalar is a single numeric value. Instant vectors and range vectors are a bit more complex.

Instant vector

Data does not arrive in Prometheus all at once. Prometheus scrapes the various exporters at its configured interval, so their samples land at different times. When you run a query, it is evaluated at a single point in time: the current time by default, or a specific time you choose.

For example, take this query:

rate(cpu_memory_usage_total{host="myserver", type="applicationserver", namespace="test"}[30s])

At the end, I'm using a time filter ([30s]), which means "use the samples from the last 30 seconds".

It returns the per-second rate of cpu_memory_usage_total over the last 30 seconds, as an instant vector.

An instant vector contains one value per series, all for the same evaluation time. If you don't indicate otherwise, that value is the most recently reported sample.
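Here is a simplified Python sketch of how an instant selector picks its values: for each series, take the newest sample at or before the evaluation time. Prometheus also ignores samples older than a lookback window, 5 minutes by default (configurable with the --query.lookback-delta flag). The sketch ignores staleness markers, and the series names are hypothetical.

```python
LOOKBACK_SECONDS = 5 * 60  # Prometheus default lookback delta


def instant_select(store, eval_time, lookback=LOOKBACK_SECONDS):
    """Return {identifier: value} with the newest sample at or before eval_time."""
    result = {}
    for ident, samples in store.items():
        # Only samples inside (eval_time - lookback, eval_time] are candidates
        candidates = [(t, v) for t, v in samples if eval_time - lookback < t <= eval_time]
        if candidates:
            result[ident] = candidates[-1][1]  # samples are time-ordered: take the newest
    return result


store = {
    'cpu{host="a"}': [(0, 1.0), (15, 2.0), (30, 3.0)],
    'cpu{host="b"}': [(0, 7.0)],
}
print(instant_select(store, eval_time=20))   # {'cpu{host="a"}': 2.0, 'cpu{host="b"}': 7.0}
print(instant_select(store, eval_time=600))  # {} because every sample is older than 5 minutes
```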

Range vector

Unlike an instant vector, a range vector doesn't give you one value per series: it gives you the set of samples recorded between two timestamps, for example the last 5 minutes with http_requests_total[5m].
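For comparison, a range selector keeps every sample in the window instead of only the newest one. This Python sketch assumes the left-open window (eval_time - range, eval_time] used by recent Prometheus releases; older versions also included a sample sitting exactly on the start boundary.

```python
def range_select(samples, eval_time, range_seconds):
    """Return every (timestamp, value) pair in (eval_time - range, eval_time]."""
    start = eval_time - range_seconds
    return [(t, v) for t, v in samples if start < t <= eval_time]


# One series scraped every 15 seconds
samples = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0), (60, 5.0)]
print(range_select(samples, eval_time=60, range_seconds=30))
# [(45, 4.0), (60, 5.0)]
```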

The various ways of storing data in Prometheus

If you are planning to write a Prometheus exporter, it’s crucial that you understand the various data types of Prometheus because you will have to select the right set of data types depending on your use case. And even if you don’t build your own exporter, it’s beneficial to understand the various data types so you can utilize the right functions.

Here’s an overview of those data types:

  • Counter

  • Gauge

  • Histogram

  • Summary

There are several Prometheus clients that can help you build your exporter. These clients are usually easy to use and several of their functions depend on the data type.

  • Go

  • Java

  • Python

  • Ruby
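Whichever client you pick, an exporter ultimately serves plain text in the Prometheus exposition format over HTTP. The following is a minimal sketch that renders one counter by hand with Python's standard library only; in practice the official client libraries generate this text for you. The render_counter helper and the counts are illustrative. Labels such as job and instance are not emitted by the exporter: Prometheus attaches them at scrape time.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Label values are not escaped here; real clients escape quotes and backslashes
        inner = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{inner}}} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_request_total",
    "Total number of HTTP requests.",
    [({"method": "GET", "path": "/test"}, 42), ({"method": "POST", "path": "/test"}, 3)],
)
print(text)
```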

Let’s look at the data types in detail.

Counter

The counter type is designed for metrics that only increase over time, e.g., request_count, error_count, or any *_total metric. It should not be used for metrics that can decrease.

How can you query counters?

To query counters, you should use the rate() function:

rate(<metric name>[<time range>])

rate() takes the range vector of a counter and returns its average per-second increase over that range, as an instant vector.

rate(http_request_count[5m]) = number of HTTP requests per second, averaged over the last 5 minutes

rate() needs at least two samples in the range, so choose a range that spans several scrape intervals.

Please be aware that if, for any reason, the exporter (or the instrumented application) restarts, the counter starts again from 0 and increments from there. This is the biggest limitation of counters.
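To show how rate() copes with such a reset, here is a simplified Python sketch: any drop in value is treated as a reset, and the post-reset value counts as fresh increase. The real rate() additionally extrapolates to the edges of the range window, so its results can differ slightly; simple_rate is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplified: no extrapolation to the window boundaries, unlike PromQL's rate().
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # A drop means the counter was reset (e.g. the exporter restarted),
            # so everything counted since the restart is new increase
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Counter scraped every 15s; the process restarts between t=30 and t=45
samples = [(0, 0.0), (15, 30.0), (30, 60.0), (45, 30.0), (60, 60.0)]
print(simple_rate(samples))  # 2.0
```

Without the reset handling, the drop from 60 to 30 would be read as a negative increase and the result would be far too low.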

Gauge

Gauges are used for metrics whose value goes up and down, e.g., response time, memory usage, CPU usage. Gauges are used quite often.

How do you query gauge in Prometheus?

Gauges are usually queried directly or with aggregation functions, such as the *_over_time family:

avg_over_time(http_response_time[5m]) = the average response time over the last 5 minutes

DO NOT use rate() with a gauge metric! rate() treats every decrease as a counter reset, so the result is meaningless. Use deriv() if you need the per-second change of a gauge.
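Here is a Python sketch of what two gauge-friendly functions compute over the samples of a range: avg_over_time() is a plain mean, and deriv(), PromQL's function for the per-second change of a gauge, is the slope of a least-squares linear regression. The sample values are illustrative.

```python
def avg_over_time(samples):
    """Arithmetic mean of all values in the range."""
    values = [v for _, v in samples]
    return sum(values) / len(values)


def deriv(samples):
    """Per-second slope from simple linear regression (needs 2+ distinct timestamps)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# A response-time gauge (seconds) that goes up and down
samples = [(0, 0.20), (15, 0.30), (30, 0.25), (45, 0.35)]
print(avg_over_time(samples))  # about 0.275
print(deriv(samples))          # about 0.00267 seconds of response time per second
```

Because deriv() fits a line through all points, the dip from 0.30 to 0.25 simply pulls the slope down; a counter-style rate would instead mistake it for a reset.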

Histogram

A histogram samples observations and counts them in predefined buckets. Here's an example:

Let's say I want to report HTTP response times. Every time a request completes, say in 2 seconds, the histogram records that observation and increments the counter of every bucket whose upper bound is greater than or equal to 2 seconds.

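In other words, a histogram counts how many requests had a value within each bucket. Buckets are cumulative: each one is identified by its upper bound, exposed in the le ("less than or equal") label. The client libraries come with default buckets that have a clear structure; the Python client, for example, uses:

.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10 (plus the implicit +Inf bucket)

So the default buckets only give you resolution up to 10 seconds; anything slower only lands in the +Inf bucket. If you need to report a metric with higher values, you will need to define custom buckets, which means you need to know the minimum and maximum values your metric can take.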

Histograms are used to report measurements so that you can later calculate averages, percentiles, etc. If you are not interested in the raw data, histograms do a great job of estimating these for you.

Alongside the buckets, a histogram also exposes two counters: _count, the number of observations, and _sum, the sum of all observed values. These make averages easy to calculate.

Here’s an example:

rate(request_duration_sum[5m]) / rate(request_duration_count[5m]) = the average response time over the last 5 minutes
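The bookkeeping behind a histogram can be sketched in a few lines of Python. ToyHistogram is hypothetical and only mimics what a client library tracks: cumulative bucket counters (using the Python client's default bounds listed above), plus _sum and _count. The observed values are illustrative.

```python
import math

DEFAULT_BUCKETS = [.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10, math.inf]


class ToyHistogram:
    """Hypothetical toy histogram with cumulative buckets plus _sum and _count."""

    def __init__(self, bounds=DEFAULT_BUCKETS):
        self.bounds = list(bounds)
        self.bucket_counts = [0] * len(self.bounds)  # index i is the le="<bounds[i]>" bucket
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Buckets are cumulative: every bucket with upper bound >= value is incremented
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.bucket_counts[i] += 1
        self.sum += value
        self.count += 1


h = ToyHistogram()
for seconds in [0.02, 0.3, 0.3, 2.0, 12.0]:
    h.observe(seconds)

print(h.count)                                   # 5
print(h.sum / h.count)                           # lifetime average response time
print(h.bucket_counts[-2], h.bucket_counts[-1])  # 4 5 (le="10" and le="+Inf")
```

Note that h.sum / h.count is the average since the process started. Dividing the rate of _sum by the rate of _count, as in the query above, gives the average for the chosen window only.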

There is also a predefined function, histogram_quantile(), to calculate percentiles. The bucket series must be aggregated by le, the label that holds each bucket's upper bound:

histogram_quantile(0.95, sum by (le) (rate(request_duration_bucket[5m])))
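Once the buckets are summed by le, histogram_quantile() finds the bucket containing the requested rank and interpolates linearly inside it, assuming observations are spread evenly within the bucket. This Python sketch follows that idea but is simplified: it assumes 0 < q <= 1, positive upper bounds in ascending order, and a final +Inf bucket. The bucket counts are illustrative.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: return the highest finite bound
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))  # 0.5
print(histogram_quantile(0.5, buckets))   # 0.1
```

The interpolation is why histogram quantiles are estimates: their accuracy depends on how well the bucket bounds fit your data.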

Summary

A summary is very similar to a histogram. The difference is where quantiles are calculated: histogram quantiles are calculated in the Prometheus server, while summary quantiles are calculated in the application itself. As a consequence, summary quantiles cannot be aggregated across several application instances.

A histogram requires you to use the default buckets or to define your own depending on the range of the metric's values, whereas a summary works well if you don't know the range of values up front.

We usually use a summary when we need to collect many measurements to later calculate averages or percentiles. Like a histogram, a summary is a good fit when you don't need the raw data, e.g., for response times or response sizes.
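To see why summary quantiles don't aggregate, here is a small Python sketch with illustrative response times for two instances of one service. Real client libraries estimate quantiles with streaming algorithms; this toy uses exact nearest-rank quantiles, which is enough to show the problem: averaging the per-instance p95 values is not the p95 of all requests.

```python
import math


def nearest_rank_quantile(values, q):
    """Exact quantile using the nearest-rank method (0 < q <= 1)."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Response times (seconds) seen by two instances of the same service
instance_a = [0.1] * 95 + [0.2] * 5  # 100 fast requests
instance_b = [1.0] * 10              # 10 slow requests

# What each instance's summary would report as its p95
p95_a = nearest_rank_quantile(instance_a, 0.95)
p95_b = nearest_rank_quantile(instance_b, 0.95)

averaged = (p95_a + p95_b) / 2
true_p95 = nearest_rank_quantile(instance_a + instance_b, 0.95)
print(averaged, true_p95)  # 0.55 1.0
```

With histograms you would instead add up the bucket counters of both instances (the sum by (le) step) and compute the quantile once over the combined distribution.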

How to filter data with PromQL

PromQL offers multiple ways to filter data. In this section, we will look at the following possibilities in detail:

  • Labels

  • Range operator

  • Offset

  • @ modifier

Filter data with labels in PromQL

We’ve seen that Prometheus data has labels, which allows us to filter the metrics that we have stored. Here’s an overview of the label matching operators:

  • = : Select labels that are exactly equal to the provided string.

  • != : Select labels that are not equal to the provided string.

  • =~ : Select labels that regex-match the provided string.

  • !~ : Select labels that do not regex-match the provided string.

Important: a selector must contain either a metric name or at least one label matcher that does not match the empty string.

NO: {path=""}

NO: {path=~".*"}

YES: {path=~".+"}

YES: http_request_total{path=~".*"}
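The four matchers and the empty-string rule can be sketched in Python as follows. Two details mirror PromQL: a missing label behaves like an empty string, and regex matches are fully anchored (hence re.fullmatch). PromQL uses RE2 syntax, so Python's re module is only an approximation here; matches and select are hypothetical helpers.

```python
import re


def matches(labels, matcher):
    """Apply one label matcher (name, op, value) to a label set."""
    name, op, value = matcher
    actual = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # fully anchored regex
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator {op!r}")


def select(series, matchers):
    """Return the series matching all matchers; a metric name is a __name__ matcher."""
    # At least one matcher must reject the empty label set
    if all(matches({}, m) for m in matchers):
        raise ValueError("selector needs a matcher that does not match the empty string")
    return [s for s in series if all(matches(s, m) for m in matchers)]


series = [
    {"__name__": "http_request_total", "path": "/test"},
    {"__name__": "http_request_total", "path": "/login"},
    {"__name__": "http_request_total"},
]
print(len(select(series, [("path", "=~", ".+")])))  # 2
# select(series, [("path", "=~", ".*")]) raises ValueError, like {path=~".*"}
```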

Filter data with the range operator in PromQL

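With the range operator, you append a duration in square brackets to a selector. This turns it into a range vector containing all samples between the evaluation time (now, by default) and that duration earlier.

<selector>[<duration>], e.g. http_requests_total[5m]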

Time durations are specified as a number, followed immediately by one of the following units:

  • ms - milliseconds

  • s - seconds

  • m - minutes

  • h - hours

  • d - days - assuming a day always has 24h

  • w - weeks - assuming a week always has 7d

  • y - years - assuming a year always has 365d
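The unit table above translates directly into seconds. This Python sketch parses durations such as 5m, and also concatenations such as 1h30m, which recent Prometheus versions accept. Unlike Prometheus, it does not check that units appear from longest to shortest; parse_duration is a hypothetical helper.

```python
import re

UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
                "d": 86400, "w": 7 * 86400, "y": 365 * 86400}

# "ms" is listed before "m" so that "500ms" is not read as "500m" followed by "s"
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a duration such as '5m' or '1h30m' to seconds."""
    parts = _PART.findall(text)
    # Reject anything that is not made entirely of <number><unit> pieces
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


print(parse_duration("5m"))     # 300
print(parse_duration("1h30m"))  # 5400
```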

Filter data with offset in PromQL

With offset, you can request the value from a given amount of time before the query's evaluation time. For example, this returns the value of http_requests_total as it was 5 minutes earlier:

http_requests_total offset 5m

Filter data with the @ modifier in PromQL

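The @ modifier sets the evaluation time of a selector to a specific Unix timestamp. Without this modifier, the query's evaluation time (by default, the current time) is used.

You can combine range selectors and offsets with the @ modifier. Here are some examples:

# range selector evaluated at a fixed timestamp

rate(http_requests_total[5m] @ 1609746000)

# offset after @

http_requests_total @ 1609746000 offset 5m

# offset before @

http_requests_total offset 5m @ 1609746000

Both forms are equivalent: the offset is always applied relative to the @ timestamp, whichever modifier is written first.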
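The arithmetic behind these modifiers is simple enough to sketch in Python: @ replaces the evaluation time, offset is subtracted from it, and a range then extends backwards from the result. The effective_eval_time helper and the "current" timestamp are hypothetical.

```python
from datetime import datetime, timezone


def effective_eval_time(now, offset_seconds=0, at=None):
    """Timestamp a selector is evaluated at, given optional @ and offset modifiers."""
    # @ replaces the query's evaluation time; offset is then subtracted from it,
    # regardless of the order in which the two modifiers are written
    base = at if at is not None else now
    return base - offset_seconds


now = 1_700_000_000  # hypothetical "current" query time

# http_requests_total offset 5m
print(effective_eval_time(now, offset_seconds=300))  # 1699999700

# rate(http_requests_total[5m] @ 1609746000): the 5m range ends at the @ time
end = effective_eval_time(now, at=1609746000)
print(end - 300, end)  # 1609745700 1609746000

# http_requests_total @ 1609746000 offset 5m (same as "offset 5m @ 1609746000")
print(effective_eval_time(now, offset_seconds=300, at=1609746000))  # 1609745700

print(datetime.fromtimestamp(1609746000, tz=timezone.utc))  # 2021-01-04 07:40:00+00:00
```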

Operators in PromQL

In PromQL, you have 3 types of operators:

  1. Aggregation operators. They only take instant vectors as input and return instant vectors as output. Some aggregation operators are: sum, min, max, stddev, stdvar, quantile.

    <aggregation operator>(<instant vector>) => <instant vector>

  2. Binary operators, which can be:

    Arithmetic operators: +, -, *, /, %, ^

    Comparison operators: ==, !=, >, <, >=, <=

    Logical/set operators: and, or, unless

  3. Functions. There are multiple functions that you can use, like abs(), changes(), sort(), vector(), rate(), etc. Some functions, such as rate(), take a range vector as input and output an instant vector; others, such as abs() and sort(), work on instant vectors. The rate() function is smart enough to detect if your counter has been reset (after a crash, for example).
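Aggregation operators are usually combined with a by clause, as in sum by (method) (rate(http_requests_total[5m])), which collapses per-instance series into one series per method. Here is a Python sketch of that grouping logic; sum_by is a hypothetical helper and the input values are illustrative.

```python
from collections import defaultdict


def sum_by(instant_vector, by_labels):
    """Sketch of `sum by (<labels>) (<instant vector>)`.

    instant_vector is a list of (labels, value) pairs. The result has one element
    per distinct combination of the `by` labels; all other labels are dropped.
    """
    groups = defaultdict(float)
    for labels, value in instant_vector:
        key = tuple((name, labels[name]) for name in by_labels if name in labels)
        groups[key] += value
    return [(dict(key), total) for key, total in groups.items()]


# Per-instance request rates, e.g. the output of rate(http_requests_total[5m])
vector = [
    ({"job": "api", "instance": "a", "method": "GET"}, 2.0),
    ({"job": "api", "instance": "b", "method": "GET"}, 3.0),
    ({"job": "api", "instance": "a", "method": "POST"}, 0.5),
]
print(sum_by(vector, ["method"]))
# [({'method': 'GET'}, 5.0), ({'method': 'POST'}, 0.5)]
```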

Tutorial

Now that we’ve covered the most important data types, filters and operators in PromQL, let’s jump into the tutorial!

As a prerequisite, make sure that the following tools are installed on your machine before you start:

  • jq

  • kubectl

  • git

  • gcloud (if you are using GKE)

  • Helm


Watch Episode

Let's watch the whole episode on our YouTube channel.
