Notes & Tools

A jq idiom for nested log parsing

I keep re-deriving this. Application logs in JSON-lines are great, but they nest deep, and jq calls turn into a wall of .foo.bar.baz?.

The trick is paths. Say I have:

{"req":{"id":"abc","headers":{"x-trace":"7af"}},"latency_ms":12}
{"req":{"id":"def","headers":{"x-trace":"8ab"}},"latency_ms":18}

To get every (id, x-trace, latency) tuple as a tab-separated row, list the paths in one array:

jq -r '[.req.id, .req.headers["x-trace"], .latency_ms] | @tsv'

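That’s the obvious one. But when the schema drifts (say half the rows lose headers), the output gets ragged. A missing key comes back as null, which @tsv prints as an empty field, and if headers turns up as a string or number instead of an object, jq raises a "Cannot index" error instead of printing the row. The fix is ? on the lookup to swallow the error, plus // "-" after it to fill in a placeholder: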

jq -r '[.req.id // "-", .req.headers["x-trace"]? // "-",
        .latency_ms // "-"] | @tsv'
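
To convince myself the fallbacks behave, here is a small sketch on three made-up rows: one intact, one with headers missing, one where headers has drifted into a string. The second and third rows are my own illustration, not real log data.

```shell
# Hypothetical sample rows: intact, headers missing, headers as a string.
rows='{"req":{"id":"abc","headers":{"x-trace":"7af"}},"latency_ms":12}
{"req":{"id":"def"},"latency_ms":18}
{"req":{"id":"ghi","headers":"redacted"},"latency_ms":25}'

# ? turns the "Cannot index string" error into no output,
# and // "-" replaces null or no output with a placeholder.
out=$(printf '%s\n' "$rows" |
  jq -r '[.req.id // "-", .req.headers["x-trace"]? // "-",
          .latency_ms // "-"] | @tsv')

printf '%s\n' "$out"
```

All three rows come out with three fields each, and "-" stands in for the x-trace value on the last two. One caveat: // also replaces a literal false, so it is a poor fit for boolean fields.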

Three idioms I lean on:

  1. to_entries[] to flatten a map of metrics into rows.
  2. select(.latency_ms > 100) for filtering rows before projecting them; reads like a SQL WHERE clause.
  3. --slurpfile for joining two log streams by ID: one stream is loaded into a variable, the other is read with jq -n and inputs.
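
A rough sketch of the three idioms, one command each. The metrics map, the users.jsonl and app.jsonl files, and the user field are all hypothetical names I made up for the example; the join assumes the lookup stream is small enough to hold in memory.

```shell
# 1. to_entries[]: one row per key of a hypothetical metrics map.
metric_rows=$(echo '{"metrics":{"errors":3,"requests":120}}' |
  jq -r '.metrics | to_entries[] | [.key, .value] | @tsv')

# 2. select(): keep the slow requests, then project to columns.
slow_rows=$(printf '%s\n' \
    '{"req":{"id":"abc"},"latency_ms":12}' \
    '{"req":{"id":"def"},"latency_ms":180}' |
  jq -r 'select(.latency_ms > 100) | [.req.id, .latency_ms] | @tsv')

# 3. --slurpfile: load one stream as $users, index it by id,
#    then walk the other stream with -n and inputs.
tmp=$(mktemp -d)
printf '%s\n' '{"id":"abc","user":"ann"}' \
              '{"id":"def","user":"bob"}' > "$tmp/users.jsonl"
printf '%s\n' '{"req":{"id":"abc"},"latency_ms":12}' \
              '{"req":{"id":"def"},"latency_ms":18}' > "$tmp/app.jsonl"

joined_rows=$(jq -n -r --slurpfile users "$tmp/users.jsonl" '
  ($users | map({key: .id, value: .user}) | from_entries) as $by_id
  | inputs
  | [.req.id, ($by_id[.req.id] // "-"), .latency_ms]
  | @tsv' "$tmp/app.jsonl")
rm -r "$tmp"

printf '%s\n' "$metric_rows" "$slow_rows" "$joined_rows"
```

Building the $by_id object once keeps the join to a single lookup per log line, instead of scanning $users for every row.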

There’s mlr (Miller) for tabular data, but for JSONL, jq plus column -t is still my default.
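
For completeness, a sketch of the jq plus column -t combination on two made-up rows. As far as I know, column -t splits on runs of whitespace by default, so an empty TSV field would shift the remaining columns left; the "-" placeholder is what keeps the table aligned.

```shell
# Hypothetical rows; the second one has no headers.
table=$(printf '%s\n' \
    '{"req":{"id":"abc","headers":{"x-trace":"7af"}},"latency_ms":12}' \
    '{"req":{"id":"defghij"},"latency_ms":180}' |
  jq -r '[.req.id // "-", .req.headers["x-trace"]? // "-",
          .latency_ms // "-"] | @tsv' |
  column -t)   # align the tab-separated fields into padded columns

printf '%s\n' "$table"
```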