
Error Handling and Exit Codes for CLIs

Design predictable Python CLI failure: meaningful exit codes, clean error messages instead of tracebacks, and errors that scripts and CI can detect.


A command-line tool is judged as much by how it fails as by how it succeeds. When something goes wrong, a script piping into your CLI, a CI job gating a deploy, or a human at 2 a.m. all need the same three things: a truthful exit code, a message that says what to do next, and no wall of Python internals. This overview shows how to design failure on purpose — classifying errors, keeping stdout and stderr honest, and installing one error boundary so every command exits cleanly.

TL;DR

  • Exit code 0 means success; anything non-zero means failure. That number is the only part of your output a shell reads, so treat it as your CLI's real API.
  • Sort every failure into three buckets: usage errors (bad invocation, exit 2), expected runtime errors (file missing, network down — exit 1 or a specific code), and unexpected bugs (exit 1, print a traceback only under --debug).
  • Send results to stdout, send diagnostics and errors to stderr, so mytool | jq never chokes on a warning.
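  • Signal failure through the mechanism your framework provides: sys.exit(code) or raise SystemExit(code) in plain Python, raise typer.Exit(code=...) in Typer, raise click.ClickException(msg) in Click. Don't scatter print() plus sys.exit() calls through command bodies.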
  • Wrap main() in one top-level error boundary that maps exceptions to messages and codes, so no command leaks a raw traceback.
Figure: Exit codes, the CLI's contract with scripts. When your CLI runs, it ends in one of three ways: success (result on stdout, exit 0), an expected error (message on stderr, exit 1–2), or an unexpected bug (traceback only with --debug, exit ≥ 70). 0 means success; every non-zero code tells a script how it failed.

Exit codes are the CLI's API for scripts

Humans read your error text. Machines read your exit code, and nothing else. The moment your tool is used in a pipeline — mytool build && mytool deploy, a Makefile target, a GitHub Actions step — the surrounding shell decides what happens next purely from $?, the status of the last command.

$ mytool deploy --env prod
$ echo $?
0

That && chain advances only when the left side exits 0. If your tool prints "Error: could not connect" to the screen but still exits 0, the deploy proceeds on a lie. Getting the number right is not a nicety; it is the contract every automation depends on.
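To see the contract from the calling side, here is a minimal Python sketch of what && does. It assumes a throwaway child interpreter as a stand-in for mytool; fake_tool and build_and_deploy are hypothetical names, and the only thing the caller inspects is returncode, the same value a shell exposes as $?.

```python
import subprocess
import sys


def fake_tool(exit_code: int, message: str = "") -> int:
    """Run a throwaway child interpreter that writes message to stderr, then exits."""
    # Hypothetical stand-in for a real CLI such as "mytool build".
    script = f"import sys; print({message!r}, file=sys.stderr); sys.exit({exit_code})"
    completed = subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True
    )
    return completed.returncode  # the same number a shell exposes as $?


def build_and_deploy(build_exit: int, build_message: str = "") -> list[str]:
    """Mimic "mytool build && mytool deploy": deploy runs only if build exited 0."""
    ran = ["build"]
    if fake_tool(build_exit, build_message) == 0:
        ran.append("deploy")
    return ran


print(build_and_deploy(1, "error: build failed"))  # → ['build']
print(build_and_deploy(0))                         # → ['build', 'deploy']
# An error message paired with exit 0 still lets the deploy through.
print(build_and_deploy(0, "Error: could not connect"))  # → ['build', 'deploy']
```

The text on stderr never influences the decision; only the number does.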

A minimal, honest program looks like this:

import sys

def main() -> int:
    if not config_exists():          # config_exists() and run() are your tool's own helpers
        print("error: no config found; run 'mytool init' first", file=sys.stderr)
        return 1
    run()
    return 0

if __name__ == "__main__":
    sys.exit(main())

Returning an int from main() and handing it to sys.exit() keeps the exit logic in one place and makes the function trivially testable. Which specific numbers to use — and when the richer sysexits.h codes earn their keep — is its own topic; see choosing exit codes for CLI tools.
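To make "trivially testable" concrete, here is a sketch that calls main() directly and inspects the returned int, with no subprocess and no SystemExit to catch. It assumes a hypothetical config_path parameter in place of config_exists(), so a test can point it at a file that does not exist, and it captures stderr with the standard library's contextlib.redirect_stderr.

```python
import io
import sys
from contextlib import redirect_stderr
from pathlib import Path


def main(config_path: str = "mytool.toml") -> int:
    # Hypothetical: the config location is a parameter so tests can control it.
    if not Path(config_path).is_file():
        print("error: no config found; run 'mytool init' first", file=sys.stderr)
        return 1
    print("done")  # placeholder for the real work; results go to stdout
    return 0


# Call main() like any other function and assert on the int it returns.
captured = io.StringIO()
with redirect_stderr(captured):
    code = main("no/such/dir/mytool.toml")
print(code, captured.getvalue().strip())
```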

A failure taxonomy: usage, expected, unexpected

Every failure your CLI can hit falls into one of three categories, and each wants different handling.

Usage errors are the caller's fault at the invocation level: an unknown flag, a missing required argument, a value that fails validation. The user needs to fix the command line and retry. Both argparse and Click already exit 2 for these and print a short usage hint, so match that convention. Argument-level validation belongs here — see advanced argument validation strategies for turning a ValidationError into a clean exit-2 message.
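The argparse convention is easy to observe: a value that fails type conversion makes the parser print a usage hint to stderr and raise SystemExit with code 2. In this sketch the --retries flag is hypothetical; the exception is caught only so the code can be inspected.

```python
import argparse
import contextlib
import io

parser = argparse.ArgumentParser(prog="mytool")
parser.add_argument("--retries", type=int, default=3)  # hypothetical flag

usage_code = 0
captured = io.StringIO()
try:
    with contextlib.redirect_stderr(captured):
        parser.parse_args(["--retries", "many"])  # not an int: a usage error
except SystemExit as exc:
    usage_code = exc.code  # argparse reports usage errors via SystemExit

print(usage_code)  # → 2
print(captured.getvalue())  # the usage line followed by "mytool: error: ..."
```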

Expected runtime errors are conditions your code anticipates but cannot prevent: a file that isn't there, a network timeout, a permission denied, an API returning 409. These are not bugs — they are the world being the world. Catch them, print a one-line explanation, and exit non-zero (commonly 1, or a specific code so scripts can branch).
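If scripts need to branch on which expected failure occurred, one option is to attach a code to each anticipated error. The sketch below is an assumption layered on the ExpectedError idea: the exit_code attribute, the NotFound and NetworkDown subclasses, the numbers 3 and 4, and the report() helper are all hypothetical, picked only to stay clear of 0 (success) and 2 (usage).

```python
import sys


class ExpectedError(Exception):
    """A failure we anticipated; message is safe to show the user."""

    exit_code = 1  # default for anticipated failures


# Hypothetical subclasses: the names and numbers are illustrative, not a standard.
class NotFound(ExpectedError):
    exit_code = 3


class NetworkDown(ExpectedError):
    exit_code = 4


def report(exc: ExpectedError) -> int:
    """Print a one-line explanation to stderr and return the code for sys.exit()."""
    print(f"error: {exc}", file=sys.stderr)
    return exc.exit_code


print(report(NotFound("project file not found: mytool.toml")))  # → 3
print(report(NetworkDown("timed out contacting the API")))      # → 4
print(report(ExpectedError("API returned 409 Conflict")))       # → 1
```

A wrapper script can then test $? and, for example, retry on 4 but give up on 3; a top-level handler would return exc.exit_code instead of a fixed 1.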

Unexpected bugs are the case you did not foresee: an AttributeError, a KeyError deep in your own logic. Here a traceback is genuinely useful — but only to whoever can fix the code. For everyone else it is noise that hides the real message. The answer is to keep the traceback available behind a flag and show a calm one-liner by default.

class ExpectedError(Exception):
    """A failure we anticipated; message is safe to show the user."""

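def load_project(path: str) -> dict:
    try:
        with open(path, encoding="utf-8") as fh:
            return parse(fh.read())      # parse() is your tool's own format loader
    except FileNotFoundError as exc:
        # "from exc" keeps the original error chained for --debug tracebacks
        raise ExpectedError(f"project file not found: {path}") from exc
    except PermissionError as exc:
        raise ExpectedError(f"cannot read {path}: permission denied") from exc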

The discipline is to raise your own ExpectedError for the anticipated cases and let everything else bubble up as a genuine bug. The guide to friendly error messages and tracebacks builds this into a full boundary.

Keep stdout and stderr honest

The single most common CLI hygiene bug is writing errors to stdout. Stdout is for your tool's output — the JSON, the table, the value another program will consume. Stderr is for everything else: errors, warnings, progress, prompts.

import json
import sys

result = {"status": "ok", "items": 3}          # whatever your command produced
print(json.dumps(result))                      # data → stdout
print("warning: cache stale, refetching", file=sys.stderr)  # noise → stderr

Why it matters: users pipe your data into other tools. If a warning lands on stdout, mytool export | jq . fails to parse because a human sentence is now sitting in the middle of the JSON stream. Keeping diagnostics on stderr means the pipe stays clean while the human still sees the message on their terminal. This separation also lets someone run mytool export > out.json and still watch progress and errors scroll past live. Structured diagnostics deserve the same care; the structured logging for CLI apps section covers routing logs to stderr so they never pollute your data channel.
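A quick way to check the separation is to run a child process and parse only its stdout. In this sketch the child is a throwaway Python one-liner standing in for mytool export; the parent decodes stdout as JSON while the warning arrives separately on stderr, as it would on a terminal.

```python
import json
import subprocess
import sys

# Stand-in for "mytool export": JSON on stdout, a warning on stderr.
CHILD = (
    "import json, sys\n"
    "print('warning: cache stale, refetching', file=sys.stderr)\n"
    "print(json.dumps({'items': 3}))\n"
)

completed = subprocess.run(
    [sys.executable, "-c", CHILD], capture_output=True, text=True, check=True
)

data = json.loads(completed.stdout)  # parses cleanly: only data is on stdout
print(data["items"])  # → 3
print("diagnostics:", completed.stderr.strip())
```

If the child printed its warning without file=sys.stderr, json.loads would raise on the mixed stream, which is the mytool export | jq . failure described above.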

Signalling failure: SystemExit, Exit, and ClickException

Python gives you several ways to end a program, and the difference matters.

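sys.exit(code) raises SystemExit, which unwinds the stack (running finally blocks and context managers) before the interpreter exits with code. Because it is an exception, it can be swallowed, but only by a bare except: or an except BaseException:. SystemExit derives from BaseException rather than Exception, so except Exception: lets it pass through. Catch Exception, never use a bare except:, if you want exits to work.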
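The exception hierarchy is easy to verify directly. This standard-library sketch uses two hypothetical helpers: risky() shows that except Exception lets SystemExit through while finally still runs, and swallowed() shows that catching BaseException (which is what a bare except: does) eats the exit.

```python
import sys

events: list[str] = []


def risky() -> None:
    try:
        sys.exit(3)  # raises SystemExit(3)
    except Exception:
        # Never reached: SystemExit derives from BaseException, not Exception.
        events.append("swallowed by except Exception")
    finally:
        events.append("finally ran")  # cleanup still runs while the stack unwinds


def swallowed() -> str:
    try:
        sys.exit(3)
    except BaseException:  # equivalent to a bare "except:"
        return "exit swallowed"
    return "unreachable"


try:
    risky()
except SystemExit as exc:
    events.append(f"SystemExit reached the top with code {exc.code}")

print(issubclass(SystemExit, Exception))  # → False
print(events)  # → ['finally ran', 'SystemExit reached the top with code 3']
print(swallowed())  # → exit swallowed
```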

In Click, prefer raise click.ClickException(message). Click catches it, prints Error: message to stderr, and exits 1 automatically — no manual sys.exit needed:

import click

@click.command()
@click.argument("name")
def greet(name: str) -> None:
    if not name.isascii():
        raise click.ClickException("name must be ASCII")
    click.echo(f"hello {name}")

Subclass ClickException and override exit_code to change the number, or raise click.UsageError to get exit 2 with a usage hint.

In Typer, raise typer.Exit(code=1) ends the command with that code, and raising typer.BadParameter gives you the usage-error path. Typer is built on top of Click, so the mental model carries across both. If you are choosing between the frameworks, Typer vs Click compares them head to head.

One top-level error boundary

Tie it together with a single boundary around your entry point. Every command flows through it, so no individual command has to remember to handle failure:

import sys

def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)   # your argparse setup; it must define --debug
    try:
        run(args)             # the tool's real work
        return 0
    except ExpectedError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    except KeyboardInterrupt:
        print("aborted", file=sys.stderr)
        return 130          # 128 + SIGINT(2)
    except Exception as exc:         # a real bug
        if args.debug:
            raise                    # full traceback for developers
        print(f"internal error: {exc} (run with --debug for details)", file=sys.stderr)
        return 1

if __name__ == "__main__":
    sys.exit(main())

Three things earn their place here: ExpectedError becomes a tidy one-liner, KeyboardInterrupt exits 130 (the shell convention for a process stopped by Ctrl-C) instead of dumping a traceback, and genuine bugs stay quiet unless --debug is set. Click and Typer give you much of this for free (Click, for instance, reports Ctrl-C as "Aborted!" and exits 1 rather than 130), but a hand-rolled argparse CLI needs the boundary written out, and even framework apps benefit from wrapping unexpected exceptions.
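For a version you can run end to end, here is a self-contained sketch of the same boundary. The --simulate flag and the bodies of parse_args() and run() are hypothetical scaffolding that trigger each failure path on demand; main() mirrors the boundary above, except that it reports bugs with {exc!r} so the exception type shows up in the one-liner.

```python
from __future__ import annotations

import argparse
import sys


class ExpectedError(Exception):
    """A failure we anticipated; message is safe to show the user."""


def parse_args(argv: list[str] | None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--debug", action="store_true", help="show full tracebacks")
    # Hypothetical flag, only here so each failure path can be triggered on demand.
    parser.add_argument(
        "--simulate", choices=["ok", "expected", "bug", "interrupt"], default="ok"
    )
    return parser.parse_args(argv)


def run(args: argparse.Namespace) -> None:
    """Stand-in for the tool's real work."""
    if args.simulate == "expected":
        raise ExpectedError("project file not found: mytool.toml")
    if args.simulate == "bug":
        settings: dict[str, str] = {}
        print(settings["output"])  # a genuine bug: KeyError
    if args.simulate == "interrupt":
        raise KeyboardInterrupt  # what Ctrl-C raises in the main thread
    print("ok")


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)  # usage errors: argparse prints usage and exits 2 itself
    try:
        run(args)
        return 0
    except ExpectedError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    except KeyboardInterrupt:
        print("aborted", file=sys.stderr)
        return 130  # 128 + SIGINT (2)
    except Exception as exc:  # a real bug
        if args.debug:
            raise  # full traceback for developers
        print(f"internal error: {exc!r} (run with --debug for details)", file=sys.stderr)
        return 1
```

Calling main() with different argument lists exercises every branch: main(["--simulate", "expected"]) returns 1, main(["--simulate", "interrupt"]) returns 130, adding --debug to the bug case re-raises the KeyError with its full traceback, and main(["--bogus"]) never returns because argparse raises SystemExit(2) before the try block starts. The sketch omits the if __name__ == "__main__" guard only so it can be imported for testing; a real tool keeps sys.exit(main()).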

Production notes

  • Test the number, not just the text. Assert exit codes in CI. Click's CliRunner exposes result.exit_code; for a subprocess, check completed.returncode. A tool that prints the right error but exits 0 will silently break pipelines.
  • finally still runs on SystemExit. Temp-file cleanup and lock release in a finally block or context manager execute during a normal sys.exit(), but os._exit() skips them entirely. Reserve os._exit() for special cases such as a forked child process, and never use it for an ordinary error exit.
  • Broken pipes. When a downstream consumer closes early (mytool | head), Python may raise BrokenPipeError. Catch it near your boundary and exit quietly rather than printing a traceback.
  • Windows. Exit codes are 32-bit there, and signal-based codes like 130 are a Unix convention; keep your meaningful codes in the 0–125 range for portability (shells reserve 126 and 127, and 128 + N for signals).
  • Never exceed 255. On POSIX systems only the low 8 bits of the exit status survive, so codes wrap modulo 256 and sys.exit(256) becomes 0, a silent success. Keep custom codes small and clear of reserved values.
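The broken-pipe advice can be sketched as follows, adapted from the pattern the Python documentation gives in its note on SIGPIPE for the signal module. emit() is a hypothetical helper; returning 1 mirrors that documentation's example, and the dup2 call exists because the interpreter flushes stdout again at shutdown, which would otherwise raise a second BrokenPipeError.

```python
import os
import sys
from typing import Iterable


def emit(lines: Iterable[str]) -> int:
    """Write lines to stdout, exiting quietly if the reader closes the pipe early."""
    try:
        for line in lines:
            sys.stdout.write(line + "\n")
        sys.stdout.flush()  # force any EPIPE to surface inside this try block
        return 0
    except BrokenPipeError:
        # Point the stdout file descriptor at devnull so the interpreter's
        # final flush at shutdown cannot raise BrokenPipeError again.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        os.close(devnull)
        return 1  # non-zero, but no traceback; hand this to sys.exit()
```

In an entry point, call it as sys.exit(emit(rows)) so that mytool | head ends without a traceback.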