Learning how to use the Prometheus Query Language is key to making full use of Prometheus. This blog post will show you what components it’s made of and how to use various PromQL queries.
Prometheus Query Language is the primary language for visualizing Prometheus data in a dashboard or building alerting rules for Prometheus's Alertmanager. Knowing how to build a PromQL query is an essential skill for using Prometheus, and with this blog post, I want to help you learn how to do it.
This is a summary of my video “How to build a PromQL” on the Is It Observable YouTube channel, which covers the topics below.
I won’t describe the architecture of Prometheus in this blog post since this topic was covered in another video (How to collect metrics in K8s). If you’d like a refresher, read the summary here: how to collect metrics in k8s.
As we know, Prometheus stores the metrics scraped from the various exporters in a time series database. Each metric has an identifier, and its values are stored together with the timestamp at which they were collected:
identifier -> (timestamp1, value1), (timestamp2, value2), ...
The identifier is composed of a metric name and various labels, for example:
job = the job that scraped the metric (defined in the scrape configuration file of Prometheus)
method = the type of request
path = the endpoint of the request
Thanks to these labels, you can filter through the metrics. Technically, the identifier is a set of label/value pairs that looks like a JSON object:
{
  job="httpcollector",
  method="GET",
  path="/test",
  __name__="http_request_total"
}
__name__ holds the metric name itself. You can use it to filter all the metrics with a certain name, for example.
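To make that concrete, here is a minimal Python sketch of how a label selector narrows down stored series. The series, labels, and the `select` helper are purely illustrative (this is not how Prometheus is implemented internally), but the matching logic is the same idea as a PromQL selector like `http_request_total{method="GET"}`:

```python
# Each series is identified by its full label set, including __name__.
series = [
    {"__name__": "http_request_total", "job": "httpcollector", "method": "GET", "path": "/test"},
    {"__name__": "http_request_total", "job": "httpcollector", "method": "POST", "path": "/login"},
    {"__name__": "node_cpu_seconds_total", "job": "node"},
]

def select(series, **matchers):
    """Keep only the series whose labels equal every matcher."""
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]

# Equivalent of the PromQL selector http_request_total{method="GET"}:
matched = select(series, __name__="http_request_total", method="GET")
print(matched)  # only the GET series remains
```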
When applying PromQL, Prometheus will transform the data and present it in 4 different formats:
1. String
2. Scalar
3. Instant vector
4. Range vector
The first two are easy to understand: a string is just text, and a scalar is a single numerical value. Instant vectors and range vectors are a bit more complex: an instant vector contains a single sample per series, all sharing the same evaluation timestamp, while a range vector contains a range of samples per series over a given time window.
Data does not arrive at Prometheus all at the same time. Prometheus reaches out to the various exporters to collect data, and the samples arrive at different times. When you run a query, you can evaluate the data at the current time or at a specific time.
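Here is a small Python sketch of that distinction (the timestamps and values are made up for illustration): an instant vector keeps the most recent sample at or before the evaluation time, while a range vector keeps every sample inside a window ending at the evaluation time:

```python
# One series' samples as (timestamp, value) pairs.
samples = [(10, 1.0), (20, 2.0), (30, 4.0), (40, 7.0)]

def instant_vector(samples, at):
    """Most recent sample at or before `at` -- like evaluating `metric`."""
    eligible = [s for s in samples if s[0] <= at]
    return eligible[-1] if eligible else None

def range_vector(samples, at, window):
    """All samples in (at - window, at] -- like evaluating `metric[30s]`."""
    return [s for s in samples if at - window < s[0] <= at]

print(instant_vector(samples, 35))    # (30, 4.0)
print(range_vector(samples, 40, 30))  # [(20, 2.0), (30, 4.0), (40, 7.0)]
```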
If you're planning to write a Prometheus exporter, it’s crucial that you understand the various data types of Prometheus because you'll have to select the right set of data types depending on your use case. And even if you don’t build your exporter, it’s beneficial to understand the various data types so you can utilize the suitable functions.
Here’s an overview of those data types:
1. Counter
2. Gauge
3. Histogram
4. Summary
Several Prometheus clients can help you build your exporter. These clients are usually easy to use and several of their functions depend on the data type.
The counter type is designed to store metrics that only increase over time, e.g., request_count, error_count, totals, etc. It should not be used for metrics that can decrease.
How can you query counters?
To query counters, you should use the “rate” function.
rate(<metric name>[<time window>])
rate() only takes counters as input and returns the per-second rate of increase of the counter over the given window, as an instant vector. For example:
rate(http_request_count[5m]) = requests per second, averaged over the last 5 minutes
Note that the window must be wide enough to contain at least two scraped samples.
Please be aware that if the exporter crashes for any reason, the counter restarts at 0 and increments from there. This is the biggest limitation of counters, although rate() detects such resets and compensates for them.
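To show what that means in practice, here is a simplified Python sketch of the rate calculation over a range vector of counter samples, including the reset compensation. It is not Prometheus's exact algorithm (the real rate() also extrapolates to the window boundaries); the sample data is invented:

```python
def counter_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples,
    compensating for counter resets back to 0."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset detected (e.g., the exporter restarted):
            # the counter started again from 0 and climbed to `value`.
            increase += value
        else:
            increase += value - prev
        prev = value
    duration = samples[-1][0] - samples[0][0]
    return increase / duration

# 60 seconds of data with a reset between t=20 and t=40:
print(counter_rate([(0, 100), (20, 160), (40, 30), (60, 90)]))  # 2.5 per second
```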
The histogram counts observations into predefined buckets. Here's an example:
Let's say I want to report the http_response duration. Every time I observe a value, e.g., 2 seconds, the histogram takes that response time and counts the request in the buckets it fits into.
The histogram therefore counts the number of requests whose value falls within each bucket. The client libraries ship with a default set of buckets with a clear structure.
The default buckets only cover values up to 10 seconds, so you'll need to define custom buckets if you need to report a metric with higher values. To do that, you need to know the minimum and maximum values your metric can take.
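Here is a Python sketch of how cumulative buckets work, using the default bucket boundaries of the official client libraries (0.005 up to 10 seconds, plus a catch-all +Inf bucket). The observation values are invented, and real client libraries do this incrementally per observation:

```python
import math

# Default bucket upper bounds in the official Prometheus client libraries.
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, math.inf]

def observe_all(values, buckets=DEFAULT_BUCKETS):
    """Count observations per bucket. Buckets are cumulative:
    a bucket with upper bound `le` counts every value <= le."""
    counts = {le: 0 for le in buckets}
    for v in values:
        for le in buckets:
            if v <= le:
                counts[le] += 1
    return counts

response_times = [0.003, 0.09, 0.4, 2.0, 12.0]  # seconds
counts = observe_all(response_times)
print(counts[0.1])       # 2 -- two observations took <= 0.1s
print(counts[math.inf])  # 5 -- everything lands in the +Inf bucket
```

Note how the 12-second observation only shows up in the +Inf bucket: with the default buckets, you know it took more than 10 seconds, but not how much more.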
Histograms record every measured value in bucketed form, which allows us to calculate averages, percentiles, etc. If you're not interested in the raw data, histograms will do a great job approximating them.
Prometheus automatically stores the number of observations (a _count series) and their total (a _sum series), e.g., to calculate averages easily.
Here’s an example:
rate(request_duration_sum[5m]) / rate(request_duration_count[5m]) = the average response time over the last 5 minutes
There is also a predefined function to calculate percentiles, where “le” is the label holding the upper bound of each bucket:
histogram_quantile(0.95, sum(rate(request_duration_bucket[5m])) by (le))
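To demystify what histogram_quantile() does with those buckets, here is a simplified Python sketch: find the bucket where the target rank falls, then interpolate linearly inside it. This is a rough approximation of Prometheus's logic, and the bucket data is invented:

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets,
    given as a sorted list of (le, cumulative_count) pairs."""
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            fraction = (rank - lower_count) / in_bucket if in_bucket else 0.0
            # Linear interpolation inside the matching bucket.
            return lower_bound + (le - lower_bound) * fraction
        lower_bound, lower_count = le, count
    return buckets[-1][0]

# 100 observations: 10 under 0.1s, 60 under 0.5s, 90 under 1s, all under 2.5s.
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (2.5, 100)]
print(histogram_quantile(0.95, buckets))  # estimated 95th percentile
```

Note that the result is an estimate: its precision depends entirely on how fine-grained your buckets are around the quantile you care about.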
The summary is very similar to the histogram. The difference is where the quantiles are calculated: histogram quantiles are computed in the Prometheus server, while summary quantiles are computed in the application itself. As a consequence, summary quantiles can't be aggregated across several application instances.
A histogram requires you to use the default buckets or to define your own, depending on the range of the metric's values, while summaries work well if you don't know that range upfront.
We usually use a summary when we need to collect as many measurements as possible to calculate averages or percentiles later. The summary is great if you don't need the raw data but do want precise measurements, e.g., of response times or response sizes.
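Here is a Python sketch of the summary side of that trade-off: the client computes quantiles directly from the raw observations it has seen, so the result is exact for that one instance (the nearest-rank method below is one simple way to do it; real client libraries use streaming approximations, and the latency values are invented):

```python
def summary_quantile(q, observations):
    """Exact q-quantile over raw values (nearest-rank method),
    as a single application instance could report it."""
    ordered = sorted(observations)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

latencies = [0.12, 0.30, 0.05, 0.44, 0.09, 0.91, 0.21, 0.33, 0.15, 0.27]
print(summary_quantile(0.9, latencies))  # exact 90th percentile of this instance
```

This is also why summary quantiles can't be aggregated: averaging the 90th percentile of two instances does not give you the 90th percentile of the combined traffic.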
Aggregation operators. They only take instant vectors as input and return instant vectors as output. Some of the aggregation operators are: sum, min, max, stddev, stdvar, and quantile.
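To illustrate what an aggregation operator does to an instant vector, here is a Python sketch of sum by (...): series are grouped by the label you keep, and their values are summed per group. The vector data is invented:

```python
from collections import defaultdict

def sum_by(vector, label):
    """Aggregate an instant vector, given as (labels_dict, value) pairs,
    by summing values per distinct value of `label`."""
    grouped = defaultdict(float)
    for labels, value in vector:
        grouped[labels.get(label)] += value
    return dict(grouped)

vector = [
    ({"path": "/test", "method": "GET"}, 4.0),
    ({"path": "/test", "method": "POST"}, 1.0),
    ({"path": "/login", "method": "GET"}, 2.0),
]
# Equivalent of: sum by (path) (http_request_total)
print(sum_by(vector, "path"))  # {'/test': 5.0, '/login': 2.0}
```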
Functions. There are multiple functions you can use, like abs(), changes(), sort(), vector(), rate(), etc. Some functions, such as rate(), take a range vector as input and return an instant vector; others, like abs() or sort(), operate directly on instant vectors. The rate() function is smart enough to detect whether your counter value has been reset (after a crash, for example).