Loading YAML configs safely in CLI apps

Load and validate YAML configuration files safely in Python CLI apps using PyYAML's safe_load() and Pydantic schema validation.

YAML is the default config format for many CLIs because it is readable and supports nesting. But the most obvious way to parse it in Python, yaml.load(), can execute arbitrary code embedded in the file when it is paired with an unsafe loader, and older PyYAML releases used one by default. If your CLI reads a config that a user, a CI system, or a teammate can write, parsing it unsafely turns a config file into a code-execution vector. This page shows the one-line fix and the validation layer that turns a raw dict into a checked, typed config object.

TL;DR

  • Always call yaml.safe_load(). Never yaml.load(), yaml.FullLoader, or yaml.UnsafeLoader on files you didn't generate.
  • safe_load refuses the YAML tags (!!python/object/...) that construct arbitrary Python objects — so a malicious file can't run code through your parser.
  • A parsed dict is still untrusted data: validate it against a Pydantic v2 model with extra="forbid" to catch wrong types, missing required keys, and typo'd extra keys.
  • Turn Pydantic's ValidationError into a short, line-oriented message so the user knows exactly what to fix.

Two paths from config.yaml: safe_load yields a plain dict for Pydantic validation, while yaml.load/FullLoader can construct arbitrary objects and risks code execution.

Why safe_load, and what load actually does

PyYAML's unsafe loaders (yaml.UnsafeLoader and the equivalent legacy yaml.Loader) implement YAML tags that construct native Python objects. The tag !!python/object/apply:os.system tells such a loader to call os.system with the given argument while parsing. So this is a valid YAML document:

# DO NOT load this with yaml.load()
!!python/object/apply:os.system ['echo pwned']

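yaml.load(text, Loader=yaml.UnsafeLoader), or the shorthand yaml.unsafe_load(text), will execute that os.system call. So did a bare yaml.load(text) in PyYAML releases before 5.1, which is how the pattern ended up in so much existing code. PyYAML 5.4 removed this tag from FullLoader after repeated bypasses, and 6.0 made the Loader argument mandatory, but FullLoader still resolves some !!python/ tags and should not be used on untrusted input. The attacker doesn't need your code to do anything; the parse step itself is the exploit. This is the YAML analogue of pickle.loads on untrusted input. Several CVEs trace back to applications calling yaml.load on input they did not control.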

yaml.safe_load() uses SafeLoader, which only constructs standard scalars and containers — strings, ints, floats, bools, None, lists, and dicts. Encounter a !!python/... tag and it raises ConstructorError instead of executing anything. Here is the difference, runnable:

import yaml

danger = "!!python/object/apply:os.system ['echo pwned']\n"
try:
    yaml.safe_load(danger)
except yaml.YAMLError as e:
    print("safe_load refused unsafe tag:", type(e).__name__)
# -> safe_load refused unsafe tag: ConstructorError
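
For contrast, here is a small sketch of what safe_load hands back for an ordinary document: nothing but built-in Python types. The sample keys are illustrative only and are not a required schema.

```python
import yaml

doc = """
host: db.internal
port: 5432
ratio: 0.5
verbose: true
note: null
tags: [prod, east]
"""

data = yaml.safe_load(doc)

# Map each key to the name of the Python type safe_load produced for it.
kinds = {key: type(value).__name__ for key, value in data.items()}
print(kinds)
# -> {'host': 'str', 'port': 'int', 'ratio': 'float', 'verbose': 'bool',
#     'note': 'NoneType', 'tags': 'list'}
```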

The rule is absolute: for any file outside your own build process, use safe_load. There is no flag combination that makes the unsafe loaders acceptable for user-supplied YAML.
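
One way to enforce the rule mechanically is a small static check run in CI. The sketch below uses only the standard library's ast module; the function name find_unsafe_yaml_calls is hypothetical, and the check is deliberately strict: it flags every yaml.load-style call, even one that passes SafeLoader explicitly, and it only recognises calls made through a module imported under the name yaml (it would miss `from yaml import load`).

```python
import ast

# Attribute names on the yaml module that can use a non-safe loader.
UNSAFE_CALLS = {
    "load", "load_all",
    "full_load", "full_load_all",
    "unsafe_load", "unsafe_load_all",
}


def find_unsafe_yaml_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line_number, call_name) pairs for yaml.<unsafe>(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "yaml"
            and node.func.attr in UNSAFE_CALLS
        ):
            hits.append((node.lineno, f"yaml.{node.func.attr}"))
    return sorted(hits)


sample = (
    "import yaml\n"
    "cfg = yaml.load(open('c.yaml'))\n"
    "ok = yaml.safe_load('a: 1')\n"
)
print(find_unsafe_yaml_calls(sample))
# -> [(2, 'yaml.load')]
```

Compared with a plain text search, parsing the source avoids false hits inside comments and strings, at the cost of missing aliased imports.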

Loading and validating: the complete pattern

Parsing safely gets you a dict, but that dict is still arbitrary: it might be missing a required key, have port: "five thousand", or contain a typo like prot:. Push it through a schema. This program runs end-to-end and is the pattern you should copy:

from __future__ import annotations
from pathlib import Path
import yaml
from pydantic import BaseModel, ConfigDict, ValidationError


class Settings(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown keys
    host: str                                   # required
    port: int = 5432
    timeout: int = 10
    retries: int = 3
    verbose: bool = False
    tags: list[str] = []


def load_yaml(path: Path) -> dict:
    text = path.read_text(encoding="utf-8")
    data = yaml.safe_load(text)            # SAFE: never yaml.load
    if data is None:                       # empty file -> empty config
        return {}
    if not isinstance(data, dict):
        raise ValueError(
            f"{path}: top-level YAML must be a mapping, got {type(data).__name__}"
        )
    return data


def load_settings(path: Path) -> Settings:
    raw = load_yaml(path)
    try:
        return Settings.model_validate(raw)
    except ValidationError as exc:
        lines = [f"Invalid configuration in {path}:"]
        for err in exc.errors():
            loc = ".".join(str(p) for p in err["loc"]) or "(root)"
            lines.append(f"  - {loc}: {err['msg']}")
        raise SystemExit("\n".join(lines))

Given this sample.yaml:

host: db.internal
port: 5432
timeout: 30
retries: 3
verbose: true
tags:
  - prod
  - east

load_settings(Path("sample.yaml")) returns a fully typed Settings object. Printing its model_dump() after a "Loaded & validated:" label gives:

Loaded & validated: {'host': 'db.internal', 'port': 5432, 'timeout': 30,
 'retries': 3, 'verbose': True, 'tags': ['prod', 'east']}

Handling missing, extra, and wrong keys

The schema decides three failure modes, and each should produce a clear message rather than a stack trace:

  • Missing required key: host has no default, so a file without it fails validation with Field required.
  • Extra/typo'd key: extra="forbid" rejects unknown keys. A file with prot: 5432 (typo for port) fails instead of silently dropping the value. This is the single most valuable guard: silent extra-key drops are how users spend an hour wondering why their setting "isn't working."
  • Wrong type: port: "abc" fails because it can't coerce to int. (Note Pydantic will coerce "5432" → 5432, which is what you want for values that arrive as strings.)
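
The three failure modes can be checked side by side with a short sketch. It assumes Pydantic v2 and uses a trimmed-down Settings model with only host and port; the helper name error_types is hypothetical. It reports Pydantic's machine-readable error type for each problem rather than the human-readable message.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Settings(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown keys
    host: str                                   # required
    port: int = 5432


def error_types(raw: dict) -> list[str]:
    """Return 'location: error_type' for each validation problem, or [] if valid."""
    try:
        Settings.model_validate(raw)
    except ValidationError as exc:
        return [
            f"{'.'.join(str(p) for p in err['loc'])}: {err['type']}"
            for err in exc.errors()
        ]
    return []


print(error_types({"port": 5432}))                # -> ['host: missing']
print(error_types({"host": "x", "prot": 5432}))   # -> ['prot: extra_forbidden']
print(error_types({"host": "x", "port": "abc"}))  # -> ['port: int_parsing']
print(error_types({"host": "x", "port": "5432"})) # -> [] (string coerced to int)
```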

The load_settings error handler above flattens ValidationError.errors() into one line per problem. Feed it a bad.yaml containing a typo:

host: x
prot: 5432

and you get an actionable message, then a non-zero exit:

Invalid configuration in bad.yaml:
  - prot: Extra inputs are not permitted

That is the difference between a CLI a user can debug and one that dumps a 30-line Pydantic traceback at them.

Loading from the right path

Read config from a predictable, documented location, not the current directory by accident. Resolve the path explicitly and handle "file absent" as "use defaults," not as an error:

def load_config(path: Path) -> Settings:
    if not path.is_file():
        return Settings(host="localhost")  # documented fallback
    return load_settings(path)

For where these paths should be — user config under ~/.config/<app>/, project config in the repo — and how to combine a YAML file with environment variables and flags, see the precedence rules in the parent hub. The merge there feeds the same safe_load + Pydantic pipeline you built on this page.
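
As a rough sketch of explicit path resolution, the helper below builds an ordered list of candidate locations and returns the first that exists. Everything specific here is an assumption made for illustration: the app name mytool, the project file name .mytool.yaml, the use of XDG_CONFIG_HOME as an override for ~/.config, and the project-before-user order. The actual precedence rules are the ones defined in the parent hub.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

APP_NAME = "mytool"  # hypothetical application name


def candidate_config_paths(cwd: Path, env: Optional[Mapping[str, str]] = None) -> list[Path]:
    """Return config locations in the order they should be tried (assumed order)."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    user_dir = Path(xdg) if xdg else Path.home() / ".config"
    return [
        cwd / f".{APP_NAME}.yaml",            # project config (hypothetical file name)
        user_dir / APP_NAME / "config.yaml",  # user config under ~/.config/<app>/
    ]


def find_config(cwd: Path, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first candidate that is an existing file, else None."""
    for path in candidate_config_paths(cwd, env):
        if path.is_file():
            return path
    return None
```

When find_config returns None, the caller can fall back to the documented defaults, in the same spirit as load_config above.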

Production notes

  • safe_load only, every time. Wrap parsing in one helper (like load_yaml above) so no caller can ever reach for yaml.load. A grep for yaml.load( in CI catches regressions.
  • Audit your plugins. A plugin that parses its own YAML with yaml.load reopens the hole you closed. Hold third-party config readers to the same rule.
  • Catch yaml.YAMLError too. Malformed YAML (bad indentation, unclosed quotes or brackets) raises YAMLError from safe_load before validation runs. Catch it and report the file and line. Note that PyYAML does not reject duplicate keys: the last value silently wins.
  • Don't trust nesting depth. Deeply nested or huge YAML can be a denial-of-service vector. For files from fully untrusted sources, cap file size before parsing.
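
The last two notes can be folded into the single parsing helper. The following is a sketch under stated assumptions, not a drop-in replacement for load_yaml: the names ConfigError, read_yaml_mapping, and MAX_CONFIG_BYTES are hypothetical, and the 1 MB cap is an arbitrary placeholder. It relies on the problem_mark attribute that PyYAML attaches to scanner and parser errors; since not every YAMLError carries one, the code falls back to reporting the path alone.

```python
from pathlib import Path

import yaml

MAX_CONFIG_BYTES = 1_000_000  # hypothetical cap; choose one that fits your configs


class ConfigError(Exception):
    """Raised for any problem reading or parsing a config file."""


def read_yaml_mapping(path: Path, max_bytes: int = MAX_CONFIG_BYTES) -> dict:
    # Check the size on disk before reading anything into memory.
    size = path.stat().st_size
    if size > max_bytes:
        raise ConfigError(f"{path}: file is {size} bytes, limit is {max_bytes}")
    try:
        data = yaml.safe_load(path.read_text(encoding="utf-8"))
    except yaml.YAMLError as exc:
        # Marks are 0-based, so add 1 for a human-readable line and column.
        mark = getattr(exc, "problem_mark", None)
        where = f"{path}:{mark.line + 1}:{mark.column + 1}" if mark else str(path)
        problem = getattr(exc, "problem", None) or "could not parse YAML"
        raise ConfigError(f"{where}: {problem}") from exc
    if data is None:  # empty file -> empty config
        return {}
    if not isinstance(data, dict):
        raise ConfigError(f"{path}: top-level YAML must be a mapping")
    return data
```

A caller would catch ConfigError next to ValidationError and exit with the message. The size check guards against oversized inputs but does not limit nesting depth on its own.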