Stanza is the most lightweight logging technology, developed by observIQ and now part of OpenTelemetry. Learn how to create a pipeline with Stanza’s operators and plugins.
Stanza is a logging technology developed by observIQ that has recently been donated to OpenTelemetry to be embedded into the OpenTelemetry Collector. It is the most lightweight Go-based log collector and forwarder, but it’s still quite new in the logging space.
In this blog post, I’d like to introduce Stanza, its various operators and plugins, and share how to build a pipeline with it. All of its features will potentially become part of the OpenTelemetry Collector.
Stanza is the new kid on the block in the logging space, and it’s more lightweight than its peers, Fluent Bit and Fluentd.
Stanza can be used both in Kubernetes and on traditional bare-metal systems running Linux, macOS, and Windows. Similarly to other log collectors, Stanza requires you to define a pipeline describing how logs are ingested:
A Stanza pipeline is stored in a configuration file (for bare-metal deployments) or a ConfigMap (for Kubernetes). It is structured in YAML, i.e.,
pipeline:
  - type: <plugin name>
    # various arguments of the plugin
    id: <optional>
    output: <next operator>
  - type: <...>
    ...
Every step of the pipeline corresponds to an operator. In the example above, we defined two operators.
You can create a linear pipeline or a non-linear one.
A linear pipeline follows the order of the operations defined in our file:
pipeline:
  - type: operation_1
  - type: operation_2
  - type: operation_3
It’s possible, but not recommended, to change the order of the operations in the file.
Each operator supports two standard parameters: output and id. The output parameter defines which step will follow this operation; id is the name of the current operation.
It’s important to note that you always start with an input operation. In the following example, the first operation collects logs with the file_input operator.
For example:
pipeline:
  - type: json_parser    # 2nd operator
  - type: stdout         # 3rd operator
  - type: file_input     # 1st operator
    include:
      - my-log.json
    output: json_parser
We can specify the id of each operator:
pipeline:
  - type: file_input
    id: operator_number_1
    include:
      - my-log.json
    output: operator_number_2
  - type: json_parser
    id: operator_number_2
    output: operator_number_3
  - type: stdout
    id: operator_number_3
Stanza will produce a log stream in JSON format with the following structure:
{
  "timestamp": "...",
  "record": {
    "key1": "value1",
    "key2": "value2",
    "message": ""
  },
  "resource": {
    "resource1": "value1",
    "resource2": "value2"
  },
  "labels": {},
  "severity": int
}
Stanza has an expression language that gives access to the objects of the log entry (for example $record.message or $labels.app):
Built-in functions: len (length), all (returns true if all elements satisfy the condition), none (true if none do), any, one, filter (filters an array by a condition), map, and count.
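As a quick sketch of how these functions can be combined with an operator, a filter step could drop every entry whose status codes are all successful (the $record.codes array field here is purely hypothetical):

- type: filter
  # drop the entry when every element of the hypothetical codes array is below 400
  expr: 'all($record.codes, { # < 400 })'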
An operator in Stanza is one task of our log pipeline: it can read from a file, parse a log, filter it, push it to another log pipeline (similarly to the forward plugin of Fluentd or Fluent Bit), or send it directly to the observability backend of your choice.
Similarly to the other agents on the market, there are several types of operators: inputs, parsers, transformers, and outputs.
Severity
Severity objects allow you to map a severity based on fixed values or a range of values:
mapping:
  # single value to be parsed as "error"
  error: oops
  # list of values to be parsed as "warning"
  warning:
    - hey!
    - YSK
  # range of values to be parsed as "info"
  info:
    min: 300
    max: 399
  # special value representing the range 200-299, to be parsed as "debug"
  debug: 2xx
  # single value to be parsed as a custom level of 36
  36: medium
  # mix and match the above concepts
  95:
    - really serious
    - min: 9001
      max: 9050
    - 5xx
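Such a mapping is attached to a severity parser. A minimal sketch, assuming the raw value sits in a hypothetical field named severity_field:

- type: severity_parser
  parse_from: severity_field
  mapping:
    # map the raw value "oops" to the "error" severity
    error: oops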
CSV_parser
Another important parser to look at is the csv_parser, which will help you parse log streams or log keys. You’ll be able to specify the log key names using the header field, and you can also customize the delimiter.
- type: csv_parser
  parse_from: message
  header: 'id,severity,message'
  header_delimiter: ","
  delimiter: "\t"
Input record
{
  "timestamp": "",
  "record": {
    "message": "1 debug \"Debug Message\""
  }
}
Output record
{
  "timestamp": "",
  "record": {
    "id": "1",
    "severity": "debug",
    "message": "\"Debug Message\""
  }
}
If your CSV content contains a timestamp or severity that you want to use as the time or severity reference for your log entry, you can add a timestamp or severity block to the parser.
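A minimal sketch of what this can look like (the field names time_field and sev_field, and the strptime layout, are hypothetical):

- type: csv_parser
  header: 'time_field,sev_field,message'
  timestamp:
    # use this CSV column as the entry's timestamp
    parse_from: time_field
    layout: '%Y-%m-%d %H:%M:%S'
  severity:
    # use this CSV column as the entry's severity
    parse_from: sev_field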
Parse_from
The parse_from parameter specifies which field the parser reads from; in the timestamp block above, for example, it specifies which field contains our timestamp. If you have logs in the logfmt format, you can imagine using the csv_parser with a delimiter set to \t.
JSON_parser
The json_parser is used to extract JSON content stored in a specific field. You can use the if parameter to validate that the content is JSON before parsing.
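A minimal sketch, where the regular expression in if is a rough, hypothetical check that the field looks like a JSON object:

- type: json_parser
  parse_from: message
  # only parse when the message roughly looks like a JSON object
  if: '$record.message matches "^{.*}$"'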
Stanza has operators to allow you to filter, add extra metadata, restructure logs, and more.
Let’s have a look at a few useful operators:
1. Rate_limit
2. Filter
3. Route
4. Metadata
5. Restructure
6. Host_metadata
7. K8s_metadata_decorator
8. Add
9. Copy
Rate_limit
Rate_limit is an operator that helps you limit the rate of logs that can pass through.
If your backend solution allows a certain number of logs/s to be ingested, then this operator will allow you to limit the traffic pushed to your output plugin.
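A minimal sketch, assuming the rate field expresses the number of entries allowed per second (the value 1000 is purely illustrative):

- type: rate_limit
  # allow roughly 1000 entries per second through this step
  rate: 1000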
Filter
Filter helps you drop the log entries matching a given condition:

- type: filter
  expr: '$record.logkey1 contains "isitobservable"'
In the example above, all the logs containing “isitobservable” will be dropped. You can also filter based on labels, as you can see below:
- type: filter
  expr: '$labels.namespace == "istio-system"'
Route
Route is a very important operator because you can create different routes in your pipeline based on certain conditions. For example:
- type: router
  routes:
    - output: my_json_parser
      expr: '$.format == "json"'
    - output: my_syslog_parser
      expr: '$.format == "syslog"'
    - output: my_csv_parser
      expr: '$.format == "csv"'
    - output: my_xml_parser
      expr: '$.format == "xml"'
  default: default_parser
Metadata
Metadata is an operator that will help you to enrich your log stream by adding labels and resources. Combining it with expr enables you to create dynamic metadata.
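A minimal sketch, assuming a hypothetical $record.environment field for the dynamic label value:

- type: metadata
  labels:
    # static label added to every entry
    logtype: nginx
    # dynamic label computed from the hypothetical environment field
    environment: 'EXPR($record.environment)'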
Plugins
When you reference a plugin, for example nginx, Stanza will first look for a pre-built plugin with that name. If there is none, it will look in the plugin folder and generate the new operator on the fly.
Stanza also provides a utility helping us to package our plugins into a Configmap for Kubernetes deployments: https://github.com/observIQ/st...
Plugins can be added in a ConfigMap. This means you will have one ConfigMap with your pipeline and one ConfigMap for the extensions.
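A minimal sketch of such a plugin ConfigMap (the name stanza-plugins, the file name my_plugin.yaml, and the trivial plugin body are all hypothetical; real plugins usually declare parameters as well):

apiVersion: v1
kind: ConfigMap
metadata:
  name: stanza-plugins
data:
  my_plugin.yaml: |
    # a trivial plugin wrapping a single json_parser step
    pipeline:
      - type: json_parser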
This tutorial will collect Kubernetes logs and push the modified log stream to Dynatrace.
For this tutorial, we will need:
1. A Kubernetes cluster
2. An NGINX ingress controller to help us route the traffic to the right services
3. A demo application, in our case the Google Hipster Shop
4. A Dynatrace tenant
5. A deployed local Dynatrace ActiveGate to push our logs
For this tutorial, I have created the Dynatrace output operator that will transform the log stream to the Dynatrace format and interact with the log ingest API.
We will create a simple pipeline based on a predesigned plugin that will collect the logs from the container, add the Kubernetes metadata, and then use the dynatrace_output operator.
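A minimal sketch of what that pipeline could look like (the kubernetes_container plugin name is an assumption based on observIQ’s plugin catalog; dynatrace_output is the custom operator described above):

pipeline:
  # collect the container logs with a pre-built plugin
  - type: kubernetes_container
  # enrich each entry with Kubernetes metadata
  - type: k8s_metadata_decorator
  # push the enriched stream to the Dynatrace log ingest API
  - type: dynatrace_output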