Handling Config Files and Env Vars in CLIs

Implement a deterministic config hierarchy in Python CLIs — merge env vars, dotfiles, and YAML/TOML configs with strict type safety and clear precedence rules.

A real CLI reads settings from several places at once: a command-line flag, an environment variable, a project config file checked into the repo, a user config in your home directory, and hard-coded defaults. The hard part is not reading any one source — it is deciding which wins when two of them disagree. This page gives you a single, deterministic precedence chain and a runnable merge function that resolves it.

TL;DR

  • Fix the precedence once and document it: CLI flags > env vars > project config > user config > defaults.
  • Merge low-to-high into one plain dict, then validate the result with a single Pydantic v2 model so types and unknown keys are checked in one place.
  • Put user config under the XDG base directory (~/.config/<app>/config.yaml), and look for a project config in the current tree.
  • Coerce strings (env vars are always strings) by letting Pydantic do the work — "5432" becomes 5432, "true" becomes True.

The precedence chain

Configuration precedence from highest to lowest — CLI flags, environment variables, project config file, user config file, then defaults; each source is consulted left to right and the first to define a value wins.

The single most important decision is the order. Higher-priority sources overwrite lower ones key by key. The rule of thumb: the closer a value is to the moment of invocation, the more it should win. A flag you typed this second beats an env var in your shell, which beats a file someone committed last month, which beats a file in your home directory, which beats the built-in default.

| Priority | Source | Example | Why it wins |
| --- | --- | --- | --- |
| 1 (highest) | CLI flag | --port 9000 | Explicit, this invocation |
| 2 | Env var | MYCLI_PORT=9000 | Session/deploy scoped |
| 3 | Project config | ./myapp.yaml | Per-repo, shared with team |
| 4 | User config | ~/.config/myapp/config.yaml | Per-machine preference |
| 5 (lowest) | Defaults | code constants | Fallback |
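
The "first source to define a value wins" rule can be sketched with the standard library's collections.ChainMap, which searches its mappings left to right. The layer contents below are made-up values for illustration only, not output from a real run.

```python
from collections import ChainMap

# Hypothetical layer contents, listed highest priority first.
cli_flags = {"port": 9000}
env_vars = {"port": 7000, "host": "env-host"}
project_file = {"host": "project-host", "timeout": 30}
user_file = {"timeout": 5, "verbose": True}
defaults = {"host": "localhost", "port": 8000, "timeout": 10, "verbose": False}

# ChainMap looks each key up left to right and returns the first hit.
layers = ChainMap(cli_flags, env_vars, project_file, user_file, defaults)
resolved = dict(layers)

print(resolved["port"])     # 9000, set by the CLI flag
print(resolved["host"])     # env-host, set by the env var
print(resolved["timeout"])  # 30, set by the project file
print(resolved["verbose"])  # True, set by the user file
```

Updating a single dict from the lowest-priority layer to the highest produces the same result; ChainMap simply makes the lookup order visible.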

Where config files live

Don't invent paths. On Linux and macOS, follow the XDG Base Directory spec: user config lives in $XDG_CONFIG_HOME (default ~/.config). The project config is whatever file you find walking up from the working directory. A small resolver keeps this honest:

from __future__ import annotations
from pathlib import Path
import os

APP = "myapp"

def user_config_path() -> Path:
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / APP / "config.yaml"

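def project_config_path(start: Path | None = None) -> Path:
    """Walk up from `start` (default: cwd) and return the first myapp.yaml found."""
    here = (start or Path.cwd()).resolve()
    for directory in (here, *here.parents):
        candidate = directory / f"{APP}.yaml"
        if candidate.is_file():
            return candidate
    # Nothing found: return the path in the starting directory; a missing file reads as empty.
    return here / f"{APP}.yaml"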

A runnable merge function

Merge each source into one dict in priority order (lowest first so higher overwrites), then validate once. Keeping validation at the end means every source — file, env, or flag — is checked against the same schema and coerced to the same types. This snippet runs as-is:

from __future__ import annotations
from pathlib import Path
import yaml
from pydantic import BaseModel, ConfigDict, ValidationError


class AppConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")
    host: str = "localhost"
    port: int = 8000
    timeout: int = 10
    verbose: bool = False


DEFAULTS = {"host": "localhost", "port": 8000, "timeout": 10, "verbose": False}
ENV_MAP = {
    "MYCLI_HOST": "host",
    "MYCLI_PORT": "port",
    "MYCLI_TIMEOUT": "timeout",
    "MYCLI_VERBOSE": "verbose",
}


def _read_yaml(path: Path) -> dict:
    if not path.is_file():
        return {}
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    if data is None:
        return {}
    if not isinstance(data, dict):
        raise ValueError(f"{path}: top-level YAML must be a mapping")
    return data


def _from_env(environ: dict) -> dict:
    return {key: environ[name] for name, key in ENV_MAP.items() if name in environ}


def merge_config(user_file: Path, project_file: Path,
                 environ: dict, cli_flags: dict) -> AppConfig:
    """Precedence (low -> high): defaults < user < project < env < CLI."""
    merged: dict = {}
    merged.update(DEFAULTS)
    merged.update(_read_yaml(user_file))
    merged.update(_read_yaml(project_file))
    merged.update(_from_env(environ))
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    try:
        return AppConfig.model_validate(merged)  # coerces "3333" -> 3333, "true" -> True
    except ValidationError as exc:
        raise SystemExit(f"Bad merged config: {exc}")

Run it with a user file that sets host, port, and timeout (timeout: 99), a project file that sets host and port, the env vars MYCLI_PORT=3333 and MYCLI_VERBOSE=true, and a single flag, --host flag-host, and you get:

Final config: {'host': 'flag-host', 'port': 3333, 'timeout': 99, 'verbose': True}

host came from the flag, port from the env (overriding both files), timeout from the user file (no higher source set it), and verbose from the env — exactly the precedence table above.
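
merge_config drops any flag whose value is None, which only works if the argument parser leaves unset options as None. Here is a minimal sketch of one way to build that cli_flags dict with argparse. The helper names build_parser and parse_cli_flags are hypothetical, and the flag set is assumed to mirror the four config fields.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="myapp")
    # default=None marks "not passed", so lower layers keep their value.
    parser.add_argument("--host", default=None)
    parser.add_argument("--port", type=int, default=None)
    parser.add_argument("--timeout", type=int, default=None)
    # store_true normally defaults to False; None keeps "absent" distinct from "off".
    parser.add_argument("--verbose", action="store_true", default=None)
    return parser


def parse_cli_flags(argv: list[str] | None = None) -> dict:
    """Return only the flags the user actually passed."""
    namespace = build_parser().parse_args(argv)
    return {key: value for key, value in vars(namespace).items() if value is not None}


print(parse_cli_flags(["--host", "flag-host"]))  # {'host': 'flag-host'}
```

If --verbose defaulted to False instead, every invocation without the flag would override verbose: true from a config file or env var.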

Why merge-then-validate

Two patterns compete here. You could validate each source separately and then merge typed objects, but that forces every source to be complete and duplicates the schema. The merge-then-validate approach treats every layer as a partial dict, lets dict.update express precedence with zero ceremony, and runs one schema check on the final result. That single check is where type coercion and unknown-key rejection happen — see Advanced argument validation strategies for the validation patterns this leans on.

The one subtlety is type coercion. Environment variables are always strings, so MYCLI_PORT=3333 arrives as "3333". Pydantic v2 coerces it to int during model_validate, and "true"/"false" to bool. Because coercion happens after the merge, you never have to parse types by hand per-source. Set extra="forbid" so a typo'd key fails loudly instead of being silently ignored.
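
A small illustration of both behaviours, assuming Pydantic v2 is installed. DemoConfig is a hypothetical two-field model used only for this sketch, not the AppConfig defined earlier.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class DemoConfig(BaseModel):
    # Hypothetical cut-down model for illustration.
    model_config = ConfigDict(extra="forbid")
    port: int = 8000
    verbose: bool = False


# String values, as they would arrive from environment variables.
config = DemoConfig.model_validate({"port": "3333", "verbose": "true"})
print(config.port, config.verbose)  # 3333 True

# A misspelled key is rejected rather than silently dropped.
try:
    DemoConfig.model_validate({"prot": "3333"})
except ValidationError as exc:
    error_types = [err["type"] for err in exc.errors()]
    print(error_types)  # ['extra_forbidden']
```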

We deliberately don't use pydantic-settings here. Building the merge by hand keeps the precedence explicit and testable, and avoids a dependency you may not want in a small CLI.

Production notes

  • Deep merge for nested config. dict.update is shallow — a nested table in the project file replaces the whole nested table from the user file. If you need per-key merging inside nested mappings, recurse.
  • Boolean env vars. Pydantic v2 accepts true/false, 1/0, yes/no, on/off, y/n, and t/f for bools, case-insensitively. Document which strings your users should set.
  • Testing precedence. Make merge_config take environ and cli_flags as arguments (as above) rather than reading os.environ directly — that makes precedence trivial to unit-test with pytest.mark.parametrize.
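  • TOML too. The pattern is identical for TOML; swap yaml.safe_load for tomllib (stdlib since Python 3.11). Note that tomllib.load expects a file opened in binary mode, so either open the file with "rb" or call tomllib.loads(path.read_text(encoding="utf-8")). The merge and validation layers don't change.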
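
For the deep-merge note above, a recursive helper might look like the sketch below. deep_merge is a hypothetical name, and the nested db table is an invented example; the flat AppConfig on this page does not need it.

```python
from __future__ import annotations

import copy


def deep_merge(low: dict, high: dict) -> dict:
    """Return a new dict where `high` overrides `low`, recursing into nested dicts."""
    merged = copy.deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the higher layer wins outright.
            merged[key] = copy.deepcopy(value)
    return merged


user = {"db": {"host": "localhost", "port": 5432}, "verbose": False}
project = {"db": {"port": 6543}}

print(deep_merge(user, project))
# {'db': {'host': 'localhost', 'port': 6543}, 'verbose': False}
```

In merge_config, each merged.update(layer) call would become merged = deep_merge(merged, layer). Lists are still replaced wholesale in this sketch; whether they should be concatenated instead is a design decision to document.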