TLDR
⏱️
Using RFC5424-based formatter in syslog output from ESXi 8.0 build 22380479 U2 IA, the output seems to include an extra NILVALUE after the MSGID, breaking the STRUCTURED-DATA (SD) parts (being read as MSG instead). The PARAM-VALUE is also missing double quote characters, making it harder for parser to read (e.g. Promtail in Loki.
Background
I’m currently (ok, always) evaluating alternative (sys)log solutions. I’ve been slowly leaning towards Loki lately, in addition to existing Elastic and VMware vRealize Log Insight aka vRLI (guess it’s called now Aria-something, but I usually ignore marketing people, and focus on the technical bits). Since ESXi 8.x, VMware introduced some new goodies for us “log fanatics” (maybe just me), including improved standards, both for locally stored logs (scratch) and by forwarding logs using syslog (vmsyslogd). This was a great time for me to (re)evaluate using the improved RFC5424 (which obsoletes RFC3164). In turn, this led me down a little “troubleshooting session”, where main project (still) being researching ESXi 8 syslog to Loki best practices (which I might write more about later). Let’s solve some issues first.
Summary
- Running ESXi 8.0 build 22380479 U2 IA
- Configured syslog on ESXi to send as RFC5424 using
?formatter=RFC_5424
- Relevant example parts from
esxcli system syslog config get
Remote Host: tcp://loki-testing.esod.local:1514?formatter=RFC_5424
- Relevant example parts from
- 🔍 ESXi sends out syslog using an extra NILVALUE (where I’d expect the STRUCTURED-DATA to be)
- 🟨 UNEXPECTED:
<166>1 2023-09-27T14:47:53.655Z esx-13.esod.local vsansystem 1050126 - - [vSAN@6876 ...] MSG
(notice the extra NILVALUE before STRUCTURED-DATA) - 🟩 EXPECTED:
<166>1 2023-09-27T14:47:53.655Z esx-13.esod.local vsansystem 1050126 - [vSAN@6876 ...] MSG
(notice that the STRUCTURED-DATA is immediately after the NILVALUE for MSGID)
- 🟨 UNEXPECTED:
- 🔍 ESXi is (also) not using double quote characters around the PARAM-VALUE (guess this is not required, but recommended, as it makes it easier to parse the value(s)). Using double quotes is also included in most (working) examples I’ve seen, including in the RFC5424 itself, and from Loki (and promtail) documentation
- 🟨 UNEXPECTED:
vSAN@6876 sub=AdapterServer opId=08e02b68-33dd
(PARAM-VALUE missing double quote character) - 🟩 EXPECTED:
vSAN@6876 sub="AdapterServer" opId="08e02b68-33dd"
(PARAM-VALUE includes double quote character)
- 🟨 UNEXPECTED:
- The syslog receiver used is promtail 2.9.1 (Loki 2.9.1), but the issue should seen in any syslog platform trying to parse STRUCTURED-DATA, based on RFC5424.
- Examples on VALID messages, and the expected output from ESXi syslog
- Examples included in RFC5424
- e.g.
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] BOMAn applicationevent log entry...
- e.g.
- Examples included in Loki 2.9.1 syslogtarget_test.go, L621, L622 and L623.
- https://github.com/grafana/loki/blob/v2.9.1/clients/pkg/promtail/targets/syslog/syslogtarget_test.go#L620
- e.g.
<165>1 2018-10-11T22:14:15.003Z host5 e - id1 [custom@32473 exkey="1"] An application event log entry...
- Examples included in RFC5424
- Bedtime stories (..yes, I was awake until 2AM last night.. fine-reading details in multiple RFCs, trying to figure out the details)
- RFC 5424 - The Syslog Protocol
- VMware specific
- promtail and Loki specific
Troubleshooting notes (RAW)
Inspect: syslog packets sent from ESXi using tcpdump-uw
Using tcpdump-uw to view (unencrypted, for now..) syslog traffic to Loki (port 1514, tcp) with some “structured data” included
- filter (grep) on
6876
which is the IANA assigned enterpriseId number for VMware - example output from the command
tcpdump-uw port 1514 -An|grep 6876
, run on esx-13
Example syslog message
<166>1 2023-09-27T14:47:53.655Z esx-13.esod.local vsansystem 1050126 - - [vSAN@6876 sub=Libs] VsanInfoImpl: Refresh config generation
Finding: extra NILVALUE after the MSGID, causing STRUCTURED-DATA to be NILVALUE
Possibly extra NILVALUE after the MSGID (also NILVALUE), where I guess the STRUCTURED-DATA is suppose to be(?)
Unexpected structure
vsansystem 1050126 - - [vSAN@6876 sub=Libs] ^ ^ ^ ^ ^ app-name procid msgid ?NIL? structured data
Expected structure
vsansystem 1050126 - [vSAN@6876 sub=Libs] ^ ^ ^ ^ app-name procid msgid structured data
Example: on VALID STRUCTURED-DATA from RFC5424
Based on RFC5424,
6.5 Examples
>Example 3 - with STRUCTURED-DATA
; could it be an extra NILVALUE (hyphen) issue in log entries from VMware ESXi syslog?Example from https://www.rfc-editor.org/rfc/rfc5424#section-6.5 (see
Example 3 - with STRUCTURED-DATA
)<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] BOMAn applicationevent log entry...
Testing: syslog receiver with modified data to verify STRUCTURED-DATA
To verify issues, I’m using a simple echo and netcat (nc
) to send the original and modified syslog to the (very same) server
echo '<166>1 2023-09-27T14:47:53.655Z esx-dummy.esod.local vsansystem 1050126 - - [vSAN@6876 sub=Libs] VsanInfoImpl: Refresh config generation' | nc -v -w 0 loki-testing.esod.local 1514
Connection to loki-testing.esod.local port 1514 [tcp/fujitsu-dtcns] succeeded!
Results: ❌ Structured data is parsed as MSG (FAIL)
[
{
"line": "[vSAN@6876 sub=Libs] VsanInfoImpl: Refresh config generation",
"timestamp": "1695843921656941908",
"fields": {
"Line": "[vSAN@6876 sub=Libs] VsanInfoImpl: Refresh config generation",
"host": "esx-dummy.esod.local",
"job": "syslog",
"syslog_app_name": "vsansystem",
"syslog_connection_hostname": "archer.esod.local.",
"syslog_connection_ip_address": "192.168.1.99",
"syslog_facility": "local4",
"syslog_procid": "1050126",
"level": "informational"
}
}
]
Testing: to remove the extra NILVALUE (hyphen)
echo '<166>1 2023-09-27T14:47:53.655Z esx-dummy.esod.local vsansystem 1050126 - [vSAN@6876 sub=Libs] VsanInfoImpl: Refresh config generation' | nc -v -w 0 loki-testing.esod.local 1514
Connection to loki-testing.esod.local port 1514 [tcp/fujitsu-dtcns] succeeded!
Results: 🔍 Seems to work better, but additional (new) error detected
Notice we now get this (new) output from the promtail log (syslogtarget.go
)
level=warn ts=2023-09-27T19:49:25.88006848Z caller=syslogtarget.go:105 msg="error parsing syslog stream" err="expecting a structured data parameter (`key=\"value\"`, both part from 1 to max 32 US-ASCII characters; key cannot contain `=`, ` `, `]`, and `\"`, while value cannot contain `]`, backslash, and `\"` unless escaped) [col 89]"
So the error is “expecting a structured data parameter”, and the expected key=\"value\"
I guess the above error is thrown due to missing double quote characters around the PARAM-VALUE?
Testing: to use double quote characters around the PARAM-VALUE
e.g.
sub="Libs"
vs previoussub=Libs
echo '<166>1 2023-09-27T14:47:53.655Z esx-dummy.esod.local vsansystem 1050126 - [vSAN@6876 sub="Libs"] VsanInfoImpl: Refresh config generation' | nc -v -w 0 loki-testing.esod.local 1514
Connection to loki-testing.esod.local port 1514 [tcp/fujitsu-dtcns] succeeded!
Results: ✅ STRUCTURED-DATA now parsed successfully
Notice that the parser now successfully reads the STRUCTURED-DATA
- After removing the extra NILVALUE
- After surrounding PARAM-VALUE in double quote characters
- After correcting the source_labels for STRUCTURED-DATA to case-sensitive (promtail-part)
Verify: troubleshooting and theory
echo '<166>1 2023-09-27T14:47:53.655Z esx-dummy.esod.local vsansystem 1050126 - [vSAN@6876 sub="Libs" test="esod"] VsanInfoImpl: Refresh config generation' | nc -v -w 0 loki-testing.esod.local 1514
Connection to loki-testing.esod.local port 1514 [tcp/fujitsu-dtcns] succeeded!
The additional PARAM-NAME (
test
) is just for testing
The STRUCTURED-DATA is now being parsed successfully.
{ "line": "VsanInfoImpl: Refresh config generation", "timestamp": "1695845695416159983", "fields": { "Line": "VsanInfoImpl: Refresh config generation", "host": "esx-dummy.esod.local", "job": "syslog", "syslog_app_name": "vsansystem", "syslog_connection_hostname": "archer.esod.local.", "syslog_connection_ip_address": "192.168.1.99", "syslog_facility": "local4", "syslog_procid": "1050126", "syslog_sd_vsan_6876_sub": "Libs", "syslog_sd_vsan_6876_test": "esod", "level": "informational" } }
Related
For reference, this is my working scrape_configs
for syslog (using Promtail directly), including relevant relabel_configs
(in YAML) for STRUCTURED-DATA (which is not working if parsed as NIL)
Note that the STRUCTURED-DATA examples are based on output directly from ESXi syslog, as detected when viewing (unencrypted) traffic using tcpdump-uw - directly from ESXi sending the syslog.
scrape_configs:
- job_name: syslog
syslog:
listen_address: 0.0.0.0:1514
listen_protocol: tcp
idle_timeout: 60s
label_structured_data: yes
labels:
job: "syslog"
relabel_configs:
- source_labels: [__syslog_message_hostname]
target_label: host
- source_labels: [__syslog_message_severity]
target_label: level
- source_labels: [__syslog_message_app_name]
target_label: syslog_app_name
- source_labels: [__syslog_message_facility]
target_label: syslog_facility
- source_labels: [__syslog_message_proc_id]
target_label: syslog_procid
- source_labels: [__syslog_message_msg_id]
target_label: syslog_msgid
- source_labels: [__syslog_connection_ip_address]
target_label: syslog_connection_ip_address
- source_labels: [__syslog_connection_hostname]
target_label: syslog_connection_hostname
- source_labels: [__syslog_message_sd_Originator_6876_sub]
target_label: syslog_sd_originator_6876_sub
- source_labels: [__syslog_message_sd_vSAN_6876_sub]
target_label: syslog_sd_vsan_6876_sub
And while testing PARAM values, I created this name (temporarily)
- source_labels: [__syslog_message_sd_vSAN_6876_test]
target_label: syslog_sd_vsan_6876_test