Giulia Di Pietro
Mar 03, 2022
Stanza is a logging technology developed by observIQ that has recently been donated to OpenTelemetry to be embedded into the OpenTelemetry Collector. It is a very lightweight log collection and forwarding agent written in Go, but it's still quite new in the logging space.
In this blog post, I'd like to introduce Stanza, its various operators and plugins, and share how to build a pipeline with it. All of its features will potentially become part of the OpenTelemetry Collector.
Let’s dive into it!
Introduction to Stanza
Stanza is the new kid on the block in the logging space, and it’s more lightweight than its peers, Fluent Bit and FluentD.
Stanza can be used both in Kubernetes and on traditional bare-metal hosts running Linux, macOS, and Windows. Similarly to other log collection agents, Stanza requires you to define a pipeline describing how logs are ingested:
1. Collect
2. Parse
3. Filter
4. Forward (with an output operator)
How do you design a pipeline with Stanza?
A Stanza pipeline is stored in a configuration file (for bare-metal deployments) or a ConfigMap (for Kubernetes). It is structured in YAML format, for example:
pipeline:
  - type: <plugin name>
    # various arguments of the plugin
    id: <optional>
    output: ...
  - type: ...
    ...
Every step of the pipeline corresponds to an operator. In the example above, we defined two operators.
You can create a linear pipeline or a non-linear one.
The linear pipeline follows the flow defined in our file:
pipeline:
  - <operation 1>
  - <operation 2>
  - <operation 3>
It’s possible, but not recommended, to change the order of the operations.
Each operator supports two standard parameters: output and id. The output parameter defines which step follows this operation, and id is the name of the current operation.
It’s important to note that you always start with an input operation. In the following example, the first operation is related to collecting logs with the plugin: file_input.
For example:
- type: json_parser   # 2nd operator
- type: stdout        # 3rd operator
- type: file_input    # 1st operator
  include:
    - my-log.json
  output: json_parser
We can also specify the id of each operator:
- type: file_input
  id: operator_number_1
  include:
    - my-log.json
  output: operator_number_2
- type: json_parser
  id: operator_number_2
  output: operator_number_3
- type: stdout
  id: operator_number_3
Stanza will produce a log stream in JSON format with the following structure:
{
  "timestamp": "...",
  "record": {
    "label1": "value1",
    "label2": "value2",
    "message": "..."
  },
  "resource": {
    "resource1": "value1",
    "resource2": "value2"
  },
  "labels": {},
  "severity": <int>
}
Stanza provides default expressions that give access to each part of the log entry:
1. $timestamp = the timestamp of the entry
2. $record = the content of the log entry
3. $resource = the resources of the log entry
4. $labels = the labels of the log entry
5. severity = the severity of the log
6. env() = gives access to environment variables
You can apply various expression operators:
1. Arithmetic: +, -, *, /, %, **
2. Comparison: ==, !=, <, >, <=, >=
3. Logical: not or !, and or &&, or or ||
4. String: +, matches (regexp), contains, startsWith, endsWith
5. Arrays: in, not in
6. Built-in functions: len (length), all (returns true if all elements satisfy the condition), none (no elements satisfy it), any, one, filter (filters an array by a condition), map, count
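As an illustration, here is a minimal sketch of such an expression used in a filter operator. The log key (path) and label (host) are made up for the example and would need to match the fields that actually exist in your entries:
pipeline:
  - type: file_input
    include:
      - /var/log/app.log
  - type: filter
    # drop health-check entries coming from hosts whose name starts with "test-"
    expr: '$record.path contains "/healthz" and $labels.host startsWith "test-"'
  - type: stdout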
What is an operator in Stanza?
An operator in Stanza is a task of our log stream pipeline that helps us read from a file, parse the log, filter it, and then push it either to another log stream pipeline (similar to the forwarding plugin of Fluentd or Fluent Bit) or directly to the observability backend of your choice.
Similarly to the other agents on the market, there are several types of operators:
1. Input
2. Parser
3. Transform
4. Output
Input operators
Multiple input operators can be used:
1. file_input
2. forward_input
3. windows_eventlog_input
4. tcp_input
5. udp_input
6. journald_input
7. aws_cloudwatch
8. azure_event_hub
9. azure_log_analytics
10. http_input
11. k8s_event_input
Each operator has its own set of parameters, in addition to the standard id and output parameters.
Please check the latest Stanza documentation to learn more about the operator that captures your interest: stanza/docs/operators at master · observIQ/stanza (github.com)
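For instance, here is a minimal sketch of a pipeline reading raw log lines over TCP and printing them to stdout. The listen_address value is just an example; check the tcp_input documentation for the full parameter list:
pipeline:
  - type: tcp_input
    listen_address: "0.0.0.0:54525"
  - type: stdout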
Parser operators
Stanza has various parser operators. Most of them will have standard fields:
1. parse_from = which field to parse
2. if = a condition determining when to apply this parser
3. on_error = specifies what to do in case of a parsing issue
4. drop = drop the log entry
5. send = continue processing
6. timestamp = extract the time
7. severity = extract the severity
Severity
Let’s look at an example of severity. This field has a mapping option to map values to a specific severity:
pipeline:
  - type: file_input
    include:
      - /var/log/test.log
  - type: regex_parser
    regex: '^Time=(?P<timestamp_field>\d{4}-\d{2}-\d{2}), Host=(?P<host>[^,]+), Type=(?P<type>.*)$'
    timestamp:
      parse_from: timestamp_field
      layout_type: strptime
      layout: '%Y-%m-%d'
    severity:
      parse_from: severity_field
      mapping:
        critical: 5xx
        error: 4xx
        info: 3xx
        debug: 2xx
Severity objects allow you to map a severity based on fixed values or a range of values:
mapping:
  # single value to be parsed as "error"
  error: oops
  # list of values to be parsed as "warning"
  warning:
    - hey!
    - YSK
  # range of values to be parsed as "info"
  info:
    min: 300
    max: 399
  # special value representing the range 200-299, to be parsed as "debug"
  debug: 2xx
  # single value to be parsed as a custom level of 36
  36: medium
  # mix and match the above concepts
  95:
    - really serious
    - min: 9001
      max: 9050
    - 5xx
CSV_parser
Another important parser to look at is the csv_parser, which helps you parse log streams or log keys. You can specify the log key names using the header field, and you can also customize the delimiter.
- type: csv_parser
  parse_from: message
  header: 'id,severity,message'
  header_delimiter: ","
  delimiter: "\t"
Input record
{
"timestamp": "",
"record": {
"message": "1 debug \"Debug Message\""
}
}
Output record
{
"timestamp": "",
"record": {
"id": "1",
"severity": "debug",
"message": "\"Debug Message\""
}
}
If your CSV content contains a timestamp or severity that you want to use as the time reference or severity of your log stream, you can use the timestamp or severity object.
Parse_from
The parse_from field specifies which field contains our timestamp. If your logs are in logfmt format, you could imagine using the csv_parser with the delimiter set to \t.
JSON_parser
The json_parser is used to extract JSON content stored in a specific field. You can use the if parameter to validate that the content is JSON before parsing it.
- type: json_parser
  parse_from: message
  timestamp:
    parse_from: seconds_since_epoch
    layout_type: epoch
    layout: s
Input record
{
"timestamp": "",
"record": {
"message": "{\"key\": \"val\", \"seconds_since_epoch\": 1136214245}"
}
}
Output record
{
"timestamp": "2006-01-02T15:04:05-07:00",
"record": {
"key": "val"
}
}
For example:
pipeline:
  - type: file_input
    include:
      - /var/log.*.log
  - type: json_parser
    if: '$record matches "^{.*}$"'
On top of the CSV and JSON parsers, Stanza provides more parser operators:
1. regex_parser
2. syslog_parser, to parse logs in syslog format
3. severity_parser
4. time_parser
5. xml_parser
6. uri_parser
7. key_value_parser
Transform operators
Stanza has operators to allow you to filter, add extra metadata, restructure logs, and more.
Let’s have a look at a few useful operators:
1. rate_limit
2. filter
3. router
4. metadata
5. restructure
6. host_metadata
7. k8s_metadata_decorator
8. add
9. copy
Rate_limit
Rate_limit is an operator that helps you limit the rate of logs that can pass through.
If your backend solution only accepts a certain number of logs per second, this operator allows you to limit the traffic pushed to your output plugin.
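A minimal sketch of how this could look, assuming the rate parameter expresses how many entries per second are allowed through (check the rate_limit documentation for the exact field names and defaults):
pipeline:
  - type: file_input
    include:
      - /var/log/app.log
  - type: rate_limit
    # assumed: maximum number of entries per second allowed through
    rate: 100
  - type: stdout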
Filter
Filter helps you drop the log entries that match a given condition:
- type: filter
  expr: '$record.logkey1 contains "isitobservable"'
In the example above, all the logs containing “isitobservable” will be dropped. You can also filter based on labels, as you can see below:
- type: filter
  expr: '$labels.namespace == "istio-system"'
Route
Route is a very important operator because you can create different routes in your pipeline based on certain conditions. For example:
- type: router
  routes:
    - output: my_json_parser
      expr: '$.format == "json"'
    - output: my_syslog_parser
      expr: '$.format == "syslog"'
    - output: my_csv_parser
      expr: '$.format == "csv"'
    - output: my_xml_parser
      expr: '$.format == "xml"'
  default: default_parser
Metadata
Metadata is an operator that will help you to enrich your log stream by adding labels and resources. Combining it with expr enables you to create dynamic metadata.
- type: metadata
  output: metadata_receiver
  labels:
    environment: 'EXPR( $.environment == "production" ? "prod" : "dev" )'
Restructure
Restructure is an operator that enables you to change the record of your logs by adding, dropping, and moving fields.
Restructure takes a list of operations specified with "ops". It accepts the following operations: add, move, retain, remove, and flatten.
Add and remove require no explanation, so let’s look at the other operations in more detail.
Retain is used to specify which fields you would like to keep; the rest will be removed.
- type: restructure
  ops:
    - retain:
        - "key1"
        - "key2"
Input record
{
"key1": "val1",
"key2": "val2",
"key3": "val3",
"key4": "val4"
}
Output record
{
"key1": "val1",
"key2": "val2"
}
Move is used to rename fields.
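For example, a short sketch that renames a field (the key names are illustrative):
- type: restructure
  ops:
    - move:
        from: "old_key"
        to: "new_key"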
Flatten will move fields that contain a JSON object up to the parent level.
- type: restructure
  ops:
    - flatten: "key1"
Input record
{
"key1": {
"nested1": "nestedval1",
"nested2": "nestedval2"
},
"key2": "val2"
}
Output record
{
"nested1": "nestedval1",
"nested2": "nestedval2",
"key2": "val2"
}
Host Metadata
The host_metadata operator adds host information (IP address, hostname) to your log stream as resources.
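A minimal sketch, assuming the include_hostname and include_ip flags (verify the exact field names in the host_metadata documentation):
- type: host_metadata
  include_hostname: true
  include_ip: true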
K8s_metadata_decorator
This operator adds labels and annotations to the log stream by interacting with the k8s API.
To get efficient results, you'll need to specify the namespace and the pod name.
You can customize the names of the fields containing your namespace and pod name with:
1. namespace_field (default: namespace)
2. pod_name_field (default: pod_name)
For example:
type: k8s_metadata_decorator
Input record
{
"timestamp": "",
"record": {
"namespace": "my-namespace",
"pod_name": "samplepod-6cdcf6bf9d-f4f9n"
}
}
Output record
{
"timestamp": "",
"labels": {
"k8s_ns_annotation/addonmanager.kubernetes.io/mode": "Reconcile",
"k8s_ns_annotation/control-plane": "true",
"k8s_ns_annotation/kubernetes.io/cluster-service": "true",
"k8s_ns_label/addonmanager.kubernetes.io/mode": "Reconcile",
"k8s_ns_label/control-plane": "true",
"k8s_ns_label/kubernetes.io/cluster-service": "true",
"k8s_pod_annotation/k8s-app": "dashboard-metrics-scraper",
"k8s_pod_annotation/pod-template-hash": "5f44bbb8b5",
"k8s_pod_label/k8s-app": "dashboard-metrics-scraper",
"k8s_pod_label/pod-template-hash": "5f44bbb8b5"
},
"record": {
"namespace": "my-namespace",
"pod_name": "samplepod-6cdcf6bf9d-f4f9n"
}
}
Add
Add new fields, resources, or labels to your log stream.
Copy
Copy is used to copy a value from a record to resources, labels, or records.
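A minimal sketch combining both operators; the field names are illustrative, and the field/value and from/to parameters are assumed here, so double-check them against the add and copy operator documentation:
pipeline:
  - type: add
    # assumed parameters: add a static label to every entry
    field: $labels.environment
    value: production
  - type: copy
    # assumed parameters: copy a record value into a label
    from: $record.ip
    to: $labels.ip
  - type: stdout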
Output operators
Stanza also utilizes output operators to specify where the parsed logs should be forwarded. Here are the most common ones:
1. stdout
2. file
3. elasticsearch
4. Google Cloud Logging
5. forward
6. New Relic
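For example, a minimal sketch that writes the processed entries to a local file, assuming the file_output operator and its path parameter (adjust to whichever output operator you actually use):
pipeline:
  - type: file_input
    include:
      - /var/log/app.log
  - type: file_output
    path: /tmp/stanza-output.json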
What is a plugin in Stanza?
In Stanza, a plugin is a template containing a set of operators. You can create a separate YAML file for each of your templates.
This template is loaded in Stanza by either:
1. Adding the file to the <stanza home>/plugins folder
2. Or using a folder of your choice with the --plugin_dir argument
Many predefined plugins already exist, which you can find here: stanza-plugins/plugins at master · observIQ/stanza-plugins (github.com)
Let’s look at an example. Say we wanted to parse NGINX extended logs. Here’s what the pipeline would look like:
version: 0.0.15
title: Nginx
description: Log parser for Nginx
min_stanza_version: 0.13.12
supported_platforms:
- linux
- windows
- macos
- kubernetes
parameters:
  - name: path
    label: path
    description: path to nginx file
    type: string
    default: "/var/log/nginx/access.log*"
pipeline:
  - type: file_input
    labels:
      log_type: nginx
    include:
      - {{ .path }}
  - type: regex_parser
    regex: '(?P<ip>\S+)\s+\[(?P<time_local>[^\]]*)\]\s+(?P<method>\S+)\s+(?P<request>\S+)\s+(?P<httpversion>\S*)\s+(?P<status>\S*)\s+(?P<bytes_sent>\S*)\s+(?P<responsetime>\S*)\s+(?P<proxy>\S*)\s+(?P<upstream_responsetime>\S*)\s+(?P<resourcename>\S*)\s+(?P<upstream_status>\S*)\s+(?P<ingress_name>\S*)\s+(?P<resource_type>\S*)\s+(?P<resource_namespace>\S*)\s+(?P<service>\w*)'
    timestamp:
      parse_from: time_local
      layout: '%d/%b/%Y:%H:%M:%S %z'
    severity:
      parse_from: status
      preserve_to: status
      mapping:
        error: "4xx"
        info:
          - min: 300
            max: 399
        debug: 200
    output: {{ .output }}
Once loaded, I can use the plugin:
pipeline:
  - type: nginx
    path: dedede.log
  - type: stdout
So Stanza will first look if there is a pre-built plugin for NGINX. If not, it will look at the plugin folder and generate the new operator on the fly.
Stanza also provides a utility helping us to package our plugins into a Configmap for Kubernetes deployments: https://github.com/observIQ/st...
Plugins can be added in a configmap. This means you will have one configmap with your pipeline and one configmap for the extensions.
Tutorial
In this tutorial, we will collect Kubernetes logs and push the modified log stream to Dynatrace.
For this tutorial, we will need:
1. A Kubernetes cluster
2. An NGINX ingress controller to help us route the traffic to the right services
3. A demo application, in our case the Google Hipster Shop
4. A Dynatrace tenant
5. A deployed local Dynatrace ActiveGate to push our logs
For this tutorial, I have created the Dynatrace output operator that will transform the log stream to the Dynatrace format and interact with the log ingest API.
We will create a simple pipeline based on a predesigned plugin that will collect the logs from the container, add the Kubernetes metadata, and then use the dynatrace_output operator.
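As a rough sketch of what such a pipeline could look like (not the exact configuration used in the tutorial): kubernetes_container is one of the predefined stanza-plugins, while dynatrace_output is the custom operator mentioned above, so its parameter names (endpoint, api_token) are purely illustrative:
pipeline:
  - type: kubernetes_container
    # predefined plugin that tails container logs and parses the container runtime format
  - type: k8s_metadata_decorator
    # enrich entries with namespace and pod labels/annotations
  - type: dynatrace_output
    # custom operator; the parameters below are illustrative placeholders
    endpoint: https://<active-gate>/e/<tenant-id>/api/v2/logs/ingest
    api_token: <api token>
With such a pipeline, Stanza tails the container logs, enriches them with Kubernetes metadata, and ships the resulting log stream to the Dynatrace log ingest API.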