Choosing Exit Codes for CLI Tools

Every time your CLI exits, it hands the shell a single integer. That number is the only thing &&, ||, set -e, and CI gates ever look at — the text you printed is invisible to them. This guide covers which numbers to use: the 0/1/2 baseline, the sysexits.h set and when it earns its keep, the reserved range above 125, and how to model your codes as one IntEnum you can document and test.

TL;DR

0 is success. Everything non-zero is failure. Never exit 0 on an error path.
1 is a general error; 2 is a usage error (bad flags/args) — argparse and Click already use 2 for this, so match them.
The sysexits.h codes (EX_USAGE 64, EX_DATAERR 65, …) give richer semantics. Adopt them only if your callers actually branch on them; otherwise they add noise.
Values 126, 127, 128, and 128+N are reserved by the shell (not executable, not found, killed by signal N). Stay out of that range.
Define your codes once as an IntEnum, document them in --help or the man page, and assert them in tests with CliRunner or subprocess.

Start with 0, 1, and 2

The bedrock convention is older than Python and universally understood:

0 — success. The command did what was asked.
1 — a general, catch-all failure. Something went wrong and you have nothing more specific to say.
2 — a usage error: an unknown option, a missing required argument, a malformed value. The user needs to fix the command line, not the world.

That 2-means-usage split is not arbitrary. Both argparse and Click exit 2 when parsing fails, so if you invent your own error handling you should keep 2 reserved for the same meaning. Otherwise a script that treats 2 as "retry with different args" gets confused when your tool returns 2 for a network error.

$ mytool deploy --nonsuch
Usage: mytool deploy [OPTIONS]
Try 'mytool deploy --help' for help.

Error: No such option: --nonsuch
$ echo $?
2

For a great many tools, 0/1/2 is the entire vocabulary you need. Reach for more only when a caller genuinely needs to distinguish failure modes programmatically.
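The argparse half of that convention can be checked in-process. The following is a minimal sketch, not part of any real tool: the parser, the `--target` option, and the `parse_exit_code` helper are all made up for illustration.

```python
import argparse
import contextlib
import io


def parse_exit_code(argv: list[str]) -> int:
    """Return the exit code argparse would use for argv (0 if parsing succeeds)."""
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--target", required=True)  # hypothetical option
    try:
        # On a parse failure argparse writes a usage message to stderr,
        # then raises SystemExit with code 2. Silence stderr for the demo.
        with contextlib.redirect_stderr(io.StringIO()):
            parser.parse_args(argv)
    except SystemExit as exc:
        return int(exc.code)
    return 0


print(parse_exit_code(["--nonsuch"]))          # unknown option
print(parse_exit_code([]))                     # missing required argument
print(parse_exit_code(["--target", "prod"]))   # valid invocation
```

Both failing invocations should report 2, and the valid one 0, which is the behavior your own error handling should leave alone.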

When distinct failure modes earn distinct codes

Sometimes "it failed" is not enough. A backup tool might want CI to retry on a transient network failure but hard-stop on corrupted data. That is a real reason to hand out different numbers:

raise SystemExit(3)   # network unreachable — safe to retry
raise SystemExit(4)   # data integrity check failed — do NOT retry

Now a wrapper script can branch:

mytool backup
case $? in
  0) echo "ok" ;;
  3) echo "transient — retrying"; retry ;;
  4) echo "corruption — paging oncall"; page ;;
  *) echo "unknown failure"; exit 1 ;;
esac

The test for whether a custom code is worth it is simple: will a caller ever behave differently because of it? If yes, define it. If the only consumer is a human reading the message, a plain 1 with good error text is enough — see friendly error messages and tracebacks for making that text actionable.
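If the caller is itself a Python program rather than a shell script, the same branching might look like the sketch below. It assumes the 3-means-retryable scheme from this section; `run_with_retry` and its parameters are hypothetical names, and a stand-in child process plays the role of mytool.

```python
import subprocess
import sys
import time

RETRYABLE = {3}   # network unreachable, per the scheme above


def run_with_retry(cmd: list[str], attempts: int = 3, delay: float = 0.0) -> int:
    """Run cmd, retrying only on retryable exit codes; return the final code."""
    code = 0
    for attempt in range(1, attempts + 1):
        code = subprocess.run(cmd).returncode
        if code not in RETRYABLE:
            break                  # success, fatal, or unknown: stop immediately
        if attempt < attempts:
            time.sleep(delay)      # a real wrapper would back off here
    return code


# Stand-in for "mytool backup": a child that always reports corruption (4).
fake_backup = [sys.executable, "-c", "raise SystemExit(4)"]
print(run_with_retry(fake_backup))
```

The wrapper never retries on 4, which is exactly the behavioral difference that justifies giving corruption its own code.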

The sysexits.h codes

BSD's <sysexits.h> defines a standard set of codes in the 64–78 range, meant to give failures a shared vocabulary across tools. The ones you are most likely to meet:

Code	Name	Meaning
64	`EX_USAGE`	Command used incorrectly (bad args)
65	`EX_DATAERR`	Input data was incorrect
66	`EX_NOINPUT`	An input file did not exist or was unreadable
69	`EX_UNAVAILABLE`	A required service is unavailable
70	`EX_SOFTWARE`	Internal software error
73	`EX_CANTCREAT`	Cannot create an output file
74	`EX_IOERR`	An I/O error occurred
77	`EX_NOPERM`	Permission denied
78	`EX_CONFIG`	Something is misconfigured

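On Unix, Python exposes these as os.EX_USAGE, os.EX_DATAERR, and so on, but the constants are missing on Windows. If you need them on every platform, define the ones you use with a literal fallback:

import os

# Prefer the platform constant; fall back to the sysexits.h value where absent.
EX_USAGE = getattr(os, "EX_USAGE", 64)
EX_DATAERR = getattr(os, "EX_DATAERR", 65)
EX_NOINPUT = getattr(os, "EX_NOINPUT", 66)
EX_CONFIG = getattr(os, "EX_CONFIG", 78)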

When to bother: adopt sysexits.h if your tool lives in an ecosystem that already reads them — mail delivery agents, some init systems, tools invoked by xargs pipelines that inspect specific codes. For a typical developer CLI, most callers only distinguish 0 from non-zero, and the 64–78 numbers are more obscure than a documented 1/2/3 scheme of your own. Consistency and documentation beat conformance to a table nobody reads. Pick one convention and hold it across every subcommand.

Reserved values above 125

Some codes are not yours to assign — the shell claims them, and reusing them creates ambiguity:

126 — the command was found but is not executable (permission problem).
127 — command not found.
128 — listed in some references as "invalid argument to exit", but mainly relevant as the base for the signal codes below.
128 + N — the process was killed by signal N. So 130 = 128 + 2 (SIGINT, a Ctrl-C), 137 = 128 + 9 (SIGKILL), 143 = 128 + 15 (SIGTERM).
Above 255 — impossible. Exit status is 8 bits, so codes wrap modulo 256: exit(256) reports as 0, exit(257) as 1. A code over 255 is a silent bug.
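Both the wrapping and the 128 + N rule can be observed from Python. This is a minimal sketch that assumes a POSIX system; `shell_status` is a hypothetical helper, not a standard-library function.

```python
import subprocess
import sys


def shell_status(returncode: int) -> int:
    """Translate subprocess's returncode into what `$?` would show in a shell."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N; shells report 128 + N.
        return 128 + (-returncode)
    return returncode


# A child that asks for 257 is seen by its parent as 257 % 256 on POSIX.
wrapped = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(257)"]
).returncode
print(wrapped)

# SIGINT (2), SIGKILL (9), SIGTERM (15) as a shell would report them.
print(shell_status(-2), shell_status(-9), shell_status(-15))
```

The negative-returncode convention is specific to Python's subprocess module; a shell script looking at `$?` sees the 128 + N form directly.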

The practical rule: keep your own meaningful codes in the 1–125 range. If a caller sees 130, they should be able to conclude "someone hit Ctrl-C," not "the backup's canary check failed." Matching Ctrl-C to 130 is a nicety worth implementing:

try:
    main()                  # your CLI's entry point
except KeyboardInterrupt:
    raise SystemExit(130)   # 128 + SIGINT
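A slightly fuller version of that idea, written so it can be unit-tested without actually exiting the interpreter, might look like this. `run_cli` and `interrupted` are hypothetical names used only for the sketch.

```python
import signal
from typing import Callable


def run_cli(entry: Callable[[], int]) -> int:
    """Call entry() and translate Ctrl-C into the conventional 128 + SIGINT."""
    try:
        return int(entry())
    except KeyboardInterrupt:
        # signal.SIGINT has the value 2, so this evaluates to 130.
        return 128 + int(signal.SIGINT)


def interrupted() -> int:
    """Hypothetical entry point that behaves as if the user pressed Ctrl-C."""
    raise KeyboardInterrupt


print(run_cli(interrupted))   # interrupted run
print(run_cli(lambda: 0))     # normal run
```

Returning the code instead of raising keeps the wrapper easy to test; the real script would pass the result to sys.exit.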

One IntEnum as the single source of truth

Scattering magic numbers like raise SystemExit(4) across a codebase is how a 4 comes to mean two different things in two commands. Centralize them:

from enum import IntEnum

class ExitCode(IntEnum):
    OK = 0
    ERROR = 1          # general failure
    USAGE = 2          # bad invocation
    NETWORK = 3        # transient, retryable
    DATA = 4           # corrupt input, do not retry
    CONFIG = 5         # misconfiguration

    def __str__(self) -> str:            # str()/print() give "5", not "ExitCode.CONFIG", on Python < 3.11
        return str(self.value)

Because IntEnum is an int, you can hand it straight to sys.exit or return it from main:

import sys

def main() -> ExitCode:
    if not config_ok():
        print("error: config invalid; see 'mytool config --check'", file=sys.stderr)
        return ExitCode.CONFIG
    if not reachable():
        print("error: registry unreachable", file=sys.stderr)
        return ExitCode.NETWORK
    do_work()
    return ExitCode.OK

if __name__ == "__main__":
    sys.exit(main())      # IntEnum → int, cleanly

The enum becomes the one place you look to answer "what does 5 mean?" — and the names make the call sites self-documenting. This pairs naturally with the top-level error boundary described in the error handling and exit codes overview, where each caught exception maps to one ExitCode.
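One way such an error boundary could map exceptions to codes is sketched below. The exception classes, the `EXIT_FOR` table, and the `boundary` function are assumptions made for illustration; they are not taken from the overview article.

```python
import sys
from enum import IntEnum
from typing import Callable


class ExitCode(IntEnum):
    OK = 0
    ERROR = 1
    NETWORK = 3
    DATA = 4
    CONFIG = 5


class ConfigError(Exception):
    """Hypothetical application error for bad configuration."""


class CorruptDataError(Exception):
    """Hypothetical application error for failed integrity checks."""


# Checked in order; the first isinstance match wins.
EXIT_FOR = (
    (ConfigError, ExitCode.CONFIG),
    (CorruptDataError, ExitCode.DATA),
    (ConnectionError, ExitCode.NETWORK),
)


def boundary(work: Callable[[], object]) -> ExitCode:
    """Run work() and convert any exception into exactly one ExitCode."""
    try:
        work()
    except Exception as exc:
        print(f"error: {exc}", file=sys.stderr)
        for exc_type, code in EXIT_FOR:
            if isinstance(exc, exc_type):
                return code
        return ExitCode.ERROR      # anything unmapped is a general failure
    return ExitCode.OK


def broken_config() -> None:
    raise ConfigError("missing 'registry' key")   # hypothetical failure


print(boundary(broken_config).name)
```

Keeping the mapping in one table means adding a failure mode touches two places at most: the enum and this table.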

Documenting your codes

An exit code nobody can look up might as well be random. Publish the table where callers will find it — in --help epilog text, a man page, or the README:

import click

# The \b line tells Click to keep this paragraph's line breaks instead of rewrapping it.
EPILOG = """\
\b
Exit codes:
  0  success
  1  general error
  2  usage error
  3  network error (retryable)
  4  data error (do not retry)
  5  configuration error
"""

@click.command(epilog=EPILOG)
def cli() -> None:
    ...

Keeping the table next to the IntEnum — ideally generated from it — means the docs cannot drift from the code.
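Generating the table from the enum could be as simple as the sketch below. The `DESCRIPTIONS` mapping and the `exit_code_epilog` function are hypothetical; the descriptions are copied from the table above.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0
    ERROR = 1
    USAGE = 2
    NETWORK = 3
    DATA = 4
    CONFIG = 5


# Keyed by member, so a new code without a description raises KeyError
# when the epilog is built instead of silently going undocumented.
DESCRIPTIONS = {
    ExitCode.OK: "success",
    ExitCode.ERROR: "general error",
    ExitCode.USAGE: "usage error",
    ExitCode.NETWORK: "network error (retryable)",
    ExitCode.DATA: "data error (do not retry)",
    ExitCode.CONFIG: "configuration error",
}


def exit_code_epilog() -> str:
    """Render the exit-code table from the enum, one line per member."""
    lines = ["Exit codes:"]
    for code in ExitCode:
        lines.append(f"  {code.value}  {DESCRIPTIONS[code]}")
    return "\n".join(lines) + "\n"


print(exit_code_epilog())
```

The result can be passed wherever the hand-written text would go. With Click, help text is rewrapped by default, so you would likely want to prepend a line containing only `\b` to keep the table's line breaks.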

Testing exit codes

An exit code is a promise; test it like one. With Click's CliRunner you get the code without spawning a process:

from click.testing import CliRunner
from mytool.cli import ExitCode, cli  # assumes ExitCode is importable from mytool.cli

def test_bad_flag_is_usage_error() -> None:
    result = CliRunner().invoke(cli, ["--nonsuch"])
    assert result.exit_code == 2

def test_missing_config_returns_config_code() -> None:
    result = CliRunner().invoke(cli, ["run"])
    assert result.exit_code == ExitCode.CONFIG

For an end-to-end check that includes your real entry point and sys.exit, drive it as a subprocess:

import subprocess

def test_network_failure_exit_code() -> None:
    proc = subprocess.run(
        ["mytool", "backup", "--registry", "http://127.0.0.1:1"],
        capture_output=True,
        text=True,
    )
    assert proc.returncode == 3
    assert "unreachable" in proc.stderr

Note that the subprocess test also asserts that the message went to stderr, not stdout: the exit code and the stream are the two promises a good failure keeps. For the entry-point mechanics that this kind of test exercises, see best practices for Python CLI entry points.
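The enum itself is also worth a guard test, so a later edit cannot quietly reuse a value or wander into the shell's reserved range. A minimal sketch, assuming the six-member ExitCode from earlier and plain assert-style tests:

```python
from enum import IntEnum, unique


@unique   # raises ValueError at class creation if two names share a value
class ExitCode(IntEnum):
    OK = 0
    ERROR = 1
    USAGE = 2
    NETWORK = 3
    DATA = 4
    CONFIG = 5


def test_codes_stay_out_of_reserved_range() -> None:
    for code in ExitCode:
        assert 0 <= code <= 125, f"{code.name}={code.value} is reserved or out of range"


def test_conventional_meanings_are_pinned() -> None:
    # Callers and parsers (argparse, Click) already rely on these three.
    assert ExitCode.OK == 0
    assert ExitCode.ERROR == 1
    assert ExitCode.USAGE == 2
```

These tests are cheap and fail at the moment someone adds `TIMEOUT = 130` or renumbers USAGE.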

Production notes

Wrapping bites silently. sys.exit(256) exits 0. If codes are computed, clamp or assert they stay in 0–125.
Click and Typer own some codes. Click exits 2 on parse errors and 1 for ClickException; Abort exits 1. Don't reassign those meanings in the same app.
set -e and pipelines. In a | b, the shell reports b's code by default. Callers who need a's status use set -o pipefail — document that your meaningful codes may be masked mid-pipe.
Cross-platform. Signal-derived codes (130, 143) are a Unix convention; Windows reports different values. Keep the codes you rely on in the low range for portability.
CI gates. Most CI systems treat any non-zero as a failed step. If you use codes like 3 for "retryable," make sure the wrapper — not the raw CI step — is what interprets them.
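For the wrapping note above, a clamp for computed codes might look like the following sketch. `to_exit_status` is a hypothetical helper, and falling back to 1 is one reasonable policy among several (asserting instead is equally valid).

```python
def to_exit_status(value: int) -> int:
    """Return value if it is a usable exit status (0..125), else 1."""
    if 0 <= value <= 125:
        return value
    # Out of range (negative, shell-reserved, or above 255): report a generic
    # failure rather than letting the OS truncate it to something misleading.
    return 1


failed_checks = 256                     # hypothetical count used as an exit code
print(failed_checks % 256)              # what the shell would see unclamped
print(to_exit_status(failed_checks))    # what it sees after clamping
```

Without the clamp, 256 failed checks would be reported to the shell as success.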
