
Structured JSON Logging in Python CLIs

Emit machine-readable JSON logs from a Python CLI with structlog or a custom formatter, add context fields, and keep human-friendly output for terminals.


When your CLI runs in CI, under a service manager, or inside a container, machines read its logs before humans do: a collector ships them, jq filters them, a search backend indexes them. Free-form text like "INFO connected to db in 0.3s" works against every one of those tools. This page shows how to emit one JSON object per log line with either a stdlib formatter or structlog, how to attach context fields such as a request ID, and how to fall back to readable console output when a human is watching.

TL;DR

  • One JSON object per line (JSON Lines) makes logs greppable, jq-able, and ready for any log pipeline.
  • The zero-dependency route is a custom logging.Formatter whose format() returns json.dumps(...).
  • The ergonomic route is structlog: composable processors, add_log_level, an ISO timestamp, and JSONRenderer at the end.
  • Bind context once (log = log.bind(request_id=...)) and every later record carries it automatically.
  • Switch between JSON and a console renderer based on sys.stderr.isatty() or an explicit --log-format flag.

Why JSON logs at all

Structured logs turn "search the text" into "query the fields." Once each line is an object with level, event, timestamp, and your own keys, you can answer operational questions with standard tools instead of fragile regexes:

$ mycli sync --log-format json 2>logs.jsonl
$ jq -c 'select(.level | ascii_downcase == "error")' logs.jsonl   # only errors (stdlib emits "ERROR", structlog "error")
$ jq -r 'select(.request_id=="r-42") | .event' logs.jsonl   # one request's trail

Text logs can't do that reliably: a change to the message wording silently breaks the grep. JSON lines are also what log platforms such as CloudWatch Logs Insights and Grafana Loki (through its json parser) can split into fields automatically. The cost is readability for a human at a terminal, which is exactly why you keep a console renderer for interactive runs and reserve JSON for when output is redirected.
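If jq isn't available, the same two queries take only a few lines of Python, because every line parses on its own. This is a minimal sketch assuming a logs.jsonl file shaped like the examples on this page. The helper name select is hypothetical.

```python
import json
from collections.abc import Callable, Iterator
from pathlib import Path


def select(path: Path, predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield each JSON log record in `path` for which `predicate` is true."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # one object per line, parsed independently
            if predicate(record):
                yield record


def is_error(record: dict) -> bool:
    # lower() accepts both "ERROR" (stdlib levelname) and "error" (structlog).
    return str(record.get("level", "")).lower() == "error"
```

For example, `[r["event"] for r in select(Path("logs.jsonl"), lambda r: r.get("request_id") == "r-42")]` returns one request's trail. Because the reader streams line by line, it never needs the whole file in memory.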

The stdlib route: a custom JSON formatter

You don't need a dependency to emit JSON. A logging.Formatter subclass that serializes the LogRecord is enough, and it slots into the same handler setup any CLI already has:

from __future__ import annotations
import datetime as dt
import json
import logging

# Attributes the stdlib puts on every LogRecord; anything else is user context.
# "message" and "asctime" are added later by logging.Formatter.format(), so they
# show up when another handler has already formatted the same record.
_RESERVED = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": dt.datetime.fromtimestamp(
                record.created, tz=dt.timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Merge any fields passed via logging's `extra=` argument.
        for key, value in record.__dict__.items():
            if key not in _RESERVED and not key.startswith("_"):
                payload[key] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        if record.stack_info:
            payload["stack_info"] = self.formatStack(record.stack_info)
        # default=str turns non-JSON values (Path, datetime, ...) into strings.
        return json.dumps(payload, default=str)

Wire it to a stderr handler and log with extra= to attach fields:

import logging
import sys

handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

log = logging.getLogger("mycli")
log.info("sync complete", extra={"request_id": "r-42", "rows": 128})
{"timestamp": "2026-07-05T12:00:00+00:00", "level": "INFO", "logger": "mycli", "event": "sync complete", "request_id": "r-42", "rows": 128}

The _RESERVED trick is what makes extra= work cleanly: it computes the set of attributes a bare record already has, so anything else on the record must be a field you added. default=str keeps json.dumps from crashing on a Path or datetime value someone logs.
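To see both points in isolation, the sketch below builds a record with two added attributes via logging.makeLogRecord (which sets arbitrary attributes, the same way extra= does), picks out the non-reserved ones, and serializes them with and without default=str. The field names path and at, and the helper extra_fields, are illustrative only.

```python
import datetime as dt
import json
import logging
from pathlib import Path

# Same idea as the formatter: the attributes a bare record already has.
_RESERVED = set(logging.makeLogRecord({}).__dict__)


def extra_fields(record: logging.LogRecord) -> dict:
    """Return only the attributes that were added on top of a bare record."""
    return {
        key: value
        for key, value in record.__dict__.items()
        if key not in _RESERVED and not key.startswith("_")
    }


record = logging.makeLogRecord({
    "msg": "wrote report",
    "path": Path("reports") / "out.csv",                            # not JSON-native
    "at": dt.datetime(2026, 7, 5, 12, 0, tzinfo=dt.timezone.utc),  # not JSON-native
})
fields = extra_fields(record)
print(sorted(fields))  # ['at', 'path']

try:
    json.dumps(fields)  # plain dumps rejects Path and datetime values
except TypeError as exc:
    print(f"without default=str: {exc}")

print(json.dumps(fields, default=str))  # both values fall back to str()
```

Note that str() on a datetime uses a space separator rather than "T"; if you want strict ISO-8601 for such values, pass a custom default function instead of str.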

The structlog route: processors and renderers

structlog is worth the dependency once you want context binding and a clean pipeline. You compose a list of processors — small functions that each mutate the event dict — ending in a renderer that turns the dict into a string. For JSON that's JSONRenderer; for humans it's ConsoleRenderer.

import logging
import sys

import structlog

def configure_structlog(json_logs: bool) -> None:
    shared = [
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.StackInfoRenderer(),
    ]
    if json_logs:
        # Turn exc_info into an "exception" string field, then serialize.
        renderers = [
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ]
    else:
        # ConsoleRenderer pretty-prints exceptions itself; format_exc_info in
        # front of it would hand it a pre-rendered string instead.
        renderers = [structlog.dev.ConsoleRenderer(colors=True)]
    structlog.configure(
        processors=[*shared, *renderers],
        wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        # PrintLoggerFactory writes to stdout unless given a file.
        logger_factory=structlog.PrintLoggerFactory(file=sys.stderr),
        cache_logger_on_first_use=True,
    )

configure_structlog(json_logs=True)
log = structlog.get_logger("mycli")
log.info("sync complete", request_id="r-42", rows=128)

With json_logs=True you get the same one-object-per-line output as the stdlib formatter, but the pipeline is declarative: add_log_level injects level, TimeStamper injects an ISO timestamp, and format_exc_info renders exceptions. Flip the flag and the identical call sites render as colorized key=value lines for a developer. Note that context fields are keyword arguments here — no extra= wrapper — which is the main ergonomic win over the stdlib.

Binding context so every line carries it

The reason to log structurally is context: you want every record within an operation to carry the same request_id, user, or command without repeating it at each call. bind() returns a new logger with those fields baked in:

log = structlog.get_logger("mycli").bind(request_id="r-42", command="sync")
log.info("started")                      # includes request_id + command
log.info("fetched", rows=128)            # includes them too, plus rows
log.warning("retrying", attempt=2)       # still carried

Every line inherits request_id and command. For fields that should span function boundaries without threading a logger object through every call, call structlog.contextvars.bind_contextvars(request_id="r-42") at the top of your command. The merge_contextvars processor then folds those values into every record logged from the current context. That is how you'd stamp one ID across an entire CLI invocation, for instance from the top of a Click command callback. Call structlog.contextvars.clear_contextvars() first if the same process may run more than one command.

Switching JSON on and off

Autodetect the common case, then let a flag win. Emit console output when stderr is a real terminal and JSON otherwise (pipes, CI, systemd), with --log-format as an explicit override:

import sys

import click
import structlog

# configure_structlog() is the function defined earlier on this page.

@click.command()
@click.option("--log-format", type=click.Choice(["auto", "json", "console"]),
              default="auto", show_default=True)
def main(log_format: str) -> None:
    if log_format == "auto":
        json_logs = not sys.stderr.isatty()   # redirected -> JSON
    else:
        json_logs = log_format == "json"
    configure_structlog(json_logs=json_logs)
    structlog.get_logger("mycli").info("ready", log_format=log_format)


if __name__ == "__main__":
    main()

This pairs naturally with verbosity: verbose and quiet flags control how much is logged, while --log-format controls how it is rendered. The two are orthogonal knobs on the same logger; both are introduced in the structured logging overview.

Routing library logs through the same renderer

Your own logs are only half the story. The HTTP client, database driver, and other dependencies your CLI pulls in all log through the stdlib logging module — and by default those records bypass structlog entirely, landing as unformatted text amid your clean JSON. Wire structlog's ProcessorFormatter into a stdlib handler so every record, yours and theirs, exits as one consistent JSON stream:

import logging
import sys

import structlog

def unify_logging(json_logs: bool, level: int = logging.WARNING) -> None:
    renderer = (
        structlog.processors.JSONRenderer()
        if json_logs
        else structlog.dev.ConsoleRenderer(colors=True)
    )
    formatter = structlog.stdlib.ProcessorFormatter(
        # Strip structlog's internal _record/_from_structlog keys, then render.
        processors=[
            structlog.stdlib.ProcessorFormatter.remove_processors_meta,
            renderer,
        ],
        foreign_pre_chain=[            # applied to records from stdlib loggers
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso", utc=True),
        ],
    )
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(formatter)
    root = logging.getLogger()
    root.handlers[:] = [handler]       # replace any handlers set up earlier
    root.setLevel(level)               # WARNING keeps chatty libraries quiet

foreign_pre_chain is the key: it runs the timestamp and level processors against records that originated in the stdlib (a requests or urllib3 logger, say) so they carry the same fields as your structlog events. The result is a single JSON stream a collector can parse without special-casing which library emitted a line. This is also where verbosity and format meet: the level set on the root logger should come from the verbose and quiet flags, and it applies to library logs too. That is a good reason to keep the default at WARNING, so a chatty dependency doesn't drown your output.

Testing captured JSON

Because each line is a JSON object, tests can assert on parsed structure instead of substrings, which is far less brittle than matching formatted text. Format a record (or capture the handler's stream), parse the line, and check the fields:

import json
import logging

# JsonFormatter is the class defined earlier; import it from your package.

def test_json_formatter_emits_fields():
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    record = logging.makeLogRecord({
        "name": "mycli", "levelname": "INFO", "levelno": logging.INFO,
        "msg": "done", "request_id": "r-1",
    })
    line = handler.format(record)
    obj = json.loads(line)
    assert obj["event"] == "done"
    assert obj["level"] == "INFO"
    assert obj["request_id"] == "r-1"

For structlog, use structlog.testing.capture_logs() to collect emitted event dicts directly, so you can assert on request_id and event without touching the renderer at all.

Production notes

  • JSON Lines, not a JSON array. Emit one object per line and never wrap the whole stream in [...]. Streaming consumers read line by line and can't wait for a closing bracket.
  • Still log to stderr. JSON is a rendering choice, not a routing one. Keep diagnostics on stderr so stdout stays clean for a tool's actual result.
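  • Constrain structlog. structlog uses calendar versioning and deprecates APIs from time to time, and structlog>=24 is only a floor, not a pin: lock the version you tested (for example in a lockfile) and re-run your tests after upgrades. The stdlib formatter has no such risk if you want zero moving parts.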
  • Don't log secrets. Structured fields make logs easy to index — and easy to leak. Add a processor that drops or masks keys like password and token before the renderer.
  • UTC timestamps. Log in UTC ISO-8601 (TimeStamper(utc=True)); mixed local zones make cross-machine correlation miserable.