Secrets of Scalable Logging Revealed: How to Use Fluentbit to Capture Hashicorp Nomad Logs Like a Pro?

Davinder Pal
Published in DevOps.dev · 6 min read · Dec 11, 2023


Nomad is a highly available, distributed, data-center-aware cluster and application scheduler designed for the modern data center, with support for long-running services, batch jobs, and much more.

Before we dive into the solution, let's first understand the challenge of logging in Nomad. Nomad lets us access the logs from all our jobs by capturing the standard output and standard error streams, as described here: developer.hashicorp.com/nomad/docs/job-specification/logs
However, we also need to know where these logs are stored on disk so that we can collect them later.

Nomad’s log rotation works by writing stdout/stderr output from tasks to a file inside the alloc/logs/ directory with the following format: <task-name>.<stdout/stderr>.<index>.
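For example, a hypothetical task named redis inside the allocation used later in this article would produce files like these (the exact data directory location depends on your client configuration):

```
<data-dir>/alloc/0af996ed-aff4-8ddb-a566-e55ebf8969c9/alloc/logs/redis.stdout.0
<data-dir>/alloc/0af996ed-aff4-8ddb-a566-e55ebf8969c9/alloc/logs/redis.stderr.0
<data-dir>/alloc/0af996ed-aff4-8ddb-a566-e55ebf8969c9/alloc/logs/redis.stdout.1
```

Notice that the allocation ID appears only in the path, not in the log content, which is why we will extract it from the filename later.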

Now, how do we configure our log scraper or collector to capture logs together with metadata such as task name, job name, and allocation ID? This metadata is super important, since it makes parsing and sorting at the final stage, for example in ES/Kibana, a piece of cake. Also, most people overlook the monitoring of Fluent Bit itself; I will cover this as a bonus for all of my readers.

Please read all the articles mentioned in the reference section before you try anything; otherwise, you may end up as the subject of a programming meme (see reddit.com/r/ProgrammerHumor).

One inspiring example is Filebeat's add_nomad_metadata processor: https://www.elastic.co/guide/en/beats/filebeat/current/add-nomad-metadata.html but given the license issues around Elastic, I don't know whether this is a viable option for you.

So, here is an OSS example that achieves almost the same result as Filebeat, but using Fluent Bit. The diagram below shows the implementation.

[Diagram: implementation overview (Lucidchart)]

The sample Nomad Fluent Bit system job below has several requirements and characteristics:
1. Fluent Bit is installed during Nomad worker image creation or at boot via cloud-init data.
2. I am using Lua, as it is the prescribed language, instead of a regex-based approach (feel free to explore the regex route as well).
3. I am running this as a system job, not as a sidecar attached to each job, since sidecars would add too many Fluent Bit processes to the whole system.
4. I am also not using Docker, since it adds an extra software stack to the worker image.
5. I configure Fluent Bit to export its metrics on port 2020; please reconfigure as per your needs.

job "fluentbit" {
  type = "system"

  group "logger" {
    task "fluentbit" {
      driver = "raw_exec"
      user   = "root"

      config {
        command = "/opt/fluent-bit/bin/fluent-bit"
        args = [
          "-c", "${NOMAD_ALLOC_DIR}/fluent-main.conf",
          # -H enables the built-in HTTP monitoring server; --port sets its TCP port
          "-H",
          "--port", "2020"
        ]
      }

      # Note: the template block's "source" expects a path on the client node;
      # to embed a local file at submit time, use "data" with the HCL2 file() function.
      template {
        data        = file("files/fluent-main.conf")
        destination = "${NOMAD_ALLOC_DIR}/fluent-main.conf"
      }

      template {
        data        = file("files/example.lua")
        destination = "${NOMAD_ALLOC_DIR}/example.lua"
      }
    }
  }
}

Now, let's take a look at the Fluent Bit configuration. It looks simple at first glance, but the devil is in the details. example.lua must be in the same directory as the Fluent Bit configuration, i.e. NOMAD_ALLOC_DIR.

1. Path must be the same on all nodes, i.e. the Nomad data dir on clients.
2. Tag should be used to separate different logs in case you want to process them separately.
3. Exclude_Path ensures that Fluent Bit doesn't process its own logs (or those of any other jobs you want to skip).
4. An alias is required if you later configure monitoring, since Fluent Bit uses it to name input/filter/output metrics.
5. Path_Key is super important, since we will extract useful information from it such as task name, allocation ID, and log type.
6. Filter 1 (record_modifier) attaches key/value metadata so that we can tell which node/IP/etc. the data is coming from; those fields can also be used in later filters.
7. Filter 2 (lua) is super important, as it processes the filename and creates several useful fields such as task name, allocation ID, and log type. I am not here to teach you Lua and Fluent Bit integration; please see the documentation mentioned in the references.
8. Output 1 (stdout): I am only showing stdout for now, since ES Upstream support is blocked on this MR, but it is still possible with a single ES node.

[INPUT]
    Name             tail
    Tag              nomad
    Path             /data/nomad/alloc/*/alloc/logs/*.*.*
    Exclude_Path     *fluentbit*
    Skip_Empty_Lines On
    Path_Key         filename
    Alias            nomad-job-logs

[FILTER]
    Name   record_modifier
    Match  *
    Record hostname ${HOSTNAME}
    Record <extra-metadata-key> <value>

[FILTER]
    Name   lua
    Match  nomad*
    Script example.lua
    Call   find_allocation_id
    Alias  filter-allocation-id

[OUTPUT]
    Name  stdout
    Match *
    Alias output_stdout
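If you prefer enabling the monitoring endpoint in the configuration file instead of via CLI flags, a [SERVICE] section at the top of fluent-main.conf can do the same job. This is a sketch based on the Fluent Bit documentation; adjust the port and listen address to your environment:

```
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020
    Health_Check On
```

Health_Check On is what enables the /api/v1/health endpoint used by the Consul check later in this article.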

Now, you must be curious about the code below, since it is just a single function :). The following things are super important:
1. The name of the file is hardcoded in the Fluent Bit configuration (Script).
2. The name of the function is also hardcoded in the Fluent Bit configuration (Call).
Let's walk through what this function does:
1. The record/message arrives as a Lua table (dict) by default.
2. A return code of 2 means that we have modified the record but not its timestamp (0 means keep the record unchanged).
3. (Optional, I think) We make a copy of the record and extract the filename key, which was set by the input's Path_Key.
4. We use a Lua pattern to find the allocation ID, which looks like 0af996ed-aff4-8ddb-a566-e55ebf8969c9.
5. Once we know the allocation ID, we create a new field in the record.
6. Finally, we return the return code, the timestamp, and the new record.

function find_allocation_id(tag, timestamp, record)
    local retcode = 0
    local new_record = record
    local file = new_record["filename"]
    if file ~= nil then
        -- loosely matches a UUID such as 0af996ed-aff4-8ddb-a566-e55ebf8969c9
        local _alloc_pattern = "%w+-%w+-%w+-%w+-%w+"
        -- string.match returns nil safely when there is no match,
        -- unlike string.sub(file, string.find(...)), which errors on nil
        local allocation_id = string.match(file, _alloc_pattern)
        if allocation_id ~= nil then
            new_record["nomad.allocation.id"] = allocation_id
            retcode = 2
        end
    end
    return retcode, timestamp, new_record
end
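If you want to sanity-check the extraction logic outside Fluent Bit, here is a rough Python equivalent. This is an illustration only: the sample path and the stricter 8-4-4-4-12 hex UUID regex are my assumptions, not part of the job above (the Lua pattern is deliberately looser).

```python
import re

# Stricter equivalent of the Lua pattern %w+-%w+-%w+-%w+-%w+:
# match a standard 8-4-4-4-12 hex UUID, the shape of a Nomad allocation ID.
ALLOC_ID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def find_allocation_id(path):
    """Return the allocation ID embedded in a log file path, or None."""
    match = ALLOC_ID_RE.search(path)
    return match.group(0) if match else None

# Hypothetical path following Nomad's <data-dir>/alloc/<alloc-id>/alloc/logs layout
sample = "/data/nomad/alloc/0af996ed-aff4-8ddb-a566-e55ebf8969c9/alloc/logs/redis.stdout.0"
print(find_allocation_id(sample))  # 0af996ed-aff4-8ddb-a566-e55ebf8969c9
```

Running this against a few real paths from your cluster is a quick way to validate the pattern before wiring the Lua filter into production.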

Example Drawbacks of the Approach

  1. It doesn't capture job metadata from Nomad, such as the job/group name, so it produces fewer fields, and a task name shared across jobs may cause issues when filtering in ES at the final stage.
  2. Missing Upstream support in Fluent Bit for ES (link in references).
  3. A general drawback of system jobs: they are hard to debug at scale and have to process many different log formats, so one issue can break the whole pipeline. Luckily, there are a few ways to recover, such as nomad revert or having Fluent Bit ship the unprocessed logs.
  4. Fluent Bit's processing capabilities are limited compared to Fluentd, Filebeat, or other similar software.

Monitoring Fluent Bit

(See docs.fluentbit.io for the full monitoring reference.)

Since we will be running many instances of Fluent Bit, I want to understand how these instances are doing: is a given instance under load, are any instances dropping logs, and many more questions from the SRE perspective.

We can also add the following service block to the Nomad job so that Consul can monitor instance health; if an instance degrades, Nomad can start an automatic recovery process. A nice side effect of the Consul integration is that Fluent Bit gets registered in the Consul catalog.

# pseudo-code, please adjust accordingly, I am sorry :(
service {
  name     = "fluentbit"
  port     = "2020"  # in a real job, reference a port label from a network block
  provider = "consul"

  check {
    type     = "http"
    path     = "/api/v1/health"  # requires Health_Check On in [SERVICE]
    interval = "10s"
    timeout  = "2s"
  }
}

Here is a sample Prometheus scrape config that discovers all Fluent Bit instances through Consul instead of listing them manually in the Prometheus configuration. This does come at a cost: you now have to manage a HashiCorp Consul cluster.

scrape_configs:
  - job_name: 'fluentbit'
    metrics_path: '/api/v1/metrics/prometheus'
    consul_sd_configs:
      - server: '<consul-address>:8500'
        services:
          - 'fluentbit'
    relabel_configs:
      - source_labels: ['__meta_consul_node']
        replacement: '$1.example.com'
        target_label: instance
