YAML is the default config format for many CLIs because it is readable and supports nesting. But the historical default way to parse it in Python, a bare yaml.load() with an unrestricted loader, can execute arbitrary code embedded in the file. If your CLI reads a config that a user, a CI system, or a teammate can write, parsing it unsafely turns a config file into a remote-code-execution vector. This page shows the one-line fix and the validation layer that turns a raw dict into a checked, typed config object.
TL;DR
- Always call yaml.safe_load(). Never yaml.load(), yaml.FullLoader, or yaml.UnsafeLoader on files you didn't generate.
- safe_load refuses the YAML tags (!!python/object/...) that construct arbitrary Python objects, so a malicious file can't run code through your parser.
- A parsed dict is still untrusted data: validate it against a Pydantic v2 model with extra="forbid" to catch wrong types, missing required keys, and typo'd extra keys.
- Turn Pydantic's ValidationError into a short, line-oriented message so the user knows exactly what to fix.
Why safe_load, and what load actually does
PyYAML's unrestricted loaders (yaml.UnsafeLoader and the legacy yaml.Loader) implement YAML tags that construct native Python objects. The tag !!python/object/apply:os.system tells the loader to call os.system with the given argument while parsing. So this is a valid YAML document:
# DO NOT load this with yaml.load()
!!python/object/apply:os.system ['echo pwned']
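Load that with an unrestricted loader, yaml.load(text, Loader=yaml.UnsafeLoader) or the legacy yaml.Loader, and PyYAML executes the os.system call. Current PyYAML releases require an explicit Loader argument and have tightened FullLoader, but FullLoader has itself been the subject of code-execution CVEs, so it is not a safe choice for untrusted input either. The attacker doesn't need your code to do anything; the parse step itself is the exploit. This is the YAML analogue of pickle.loads on untrusted input. CVE history includes many tools that shipped yaml.load on user-controlled input.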
yaml.safe_load() uses SafeLoader, which only constructs standard scalars and containers — strings, ints, floats, bools, None, lists, and dicts. Encounter a !!python/... tag and it raises ConstructorError instead of executing anything. Here is the difference, runnable:
import yaml

danger = "!!python/object/apply:os.system ['echo pwned']\n"

try:
    yaml.safe_load(danger)
except yaml.YAMLError as e:
    print("safe_load refused unsafe tag:", type(e).__name__)
# -> safe_load refused unsafe tag: ConstructorError
The rule is absolute: for any file outside your own build process, use safe_load. There is no flag combination that makes the unsafe loaders acceptable for user-supplied YAML.
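To make "standard scalars and containers" concrete, the following minimal sketch prints the Python type safe_load produces for each value. The document text and its keys are made up for illustration; only yaml.safe_load is taken from the page.

```python
import yaml

# Hypothetical config text; the keys are illustrative only.
doc = """
host: db.internal
port: 5432
ratio: 0.5
verbose: true
note: null
tags: [prod, east]
quoted_port: "5432"
enabled: yes
"""

data = yaml.safe_load(doc)

# Every value is a plain built-in type; no custom objects are constructed.
for key, value in data.items():
    print(f"{key}: {type(value).__name__} = {value!r}")
```

Two details are worth noticing: a quoted number stays a string, and PyYAML follows YAML 1.1, so an unquoted yes is parsed as the boolean True. Both are reasons the parsed dict still needs schema validation.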
Loading and validating: the complete pattern
Parsing safely gets you a dict, but that dict is still arbitrary: it might be missing a required key, have port: "five thousand", or contain a typo like prot:. Push it through a schema. This program runs end-to-end and is the pattern you should copy:
from __future__ import annotations

from pathlib import Path

import yaml
from pydantic import BaseModel, ConfigDict, ValidationError


class Settings(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown keys

    host: str  # required
    port: int = 5432
    timeout: int = 10
    retries: int = 3
    verbose: bool = False
    tags: list[str] = []


def load_yaml(path: Path) -> dict:
    text = path.read_text(encoding="utf-8")
    data = yaml.safe_load(text)  # SAFE: never yaml.load
    if data is None:  # empty file -> empty config
        return {}
    if not isinstance(data, dict):
        raise ValueError(
            f"{path}: top-level YAML must be a mapping, got {type(data).__name__}"
        )
    return data


def load_settings(path: Path) -> Settings:
    raw = load_yaml(path)
    try:
        return Settings.model_validate(raw)
    except ValidationError as exc:
        lines = [f"Invalid configuration in {path}:"]
        for err in exc.errors():
            loc = ".".join(str(p) for p in err["loc"]) or "(root)"
            lines.append(f" - {loc}: {err['msg']}")
        raise SystemExit("\n".join(lines))
Given this sample.yaml:
host: db.internal
port: 5432
timeout: 30
retries: 3
verbose: true
tags:
- prod
- east
load_settings(Path("sample.yaml")) returns a fully typed Settings object. Printing its model_dump() after a "Loaded & validated:" label gives:
Loaded & validated: {'host': 'db.internal', 'port': 5432, 'timeout': 30,
'retries': 3, 'verbose': True, 'tags': ['prod', 'east']}
Handling missing, extra, and wrong keys
The schema decides three failure modes, and each should produce a clear message rather than a stack trace:
- Missing required key: host has no default, so a file without it fails validation with Field required.
- Extra/typo'd key: extra="forbid" rejects unknown keys. A file with prot: 5432 (typo for port) fails instead of silently dropping the value. This is the single most valuable guard: silent extra-key drops are how users spend an hour wondering why their setting "isn't working."
- Wrong type: port: "abc" fails because it can't coerce to int. (Note Pydantic will coerce "5432" → 5432, which is what you want for values that arrive as strings.)
The load_settings error handler above flattens ValidationError.errors() into one line per problem. Feed it a bad.yaml containing a typo:
host: x
prot: 5432
and you get an actionable message, then a non-zero exit:
Invalid configuration in bad.yaml:
- prot: Extra inputs are not permitted
That is the difference between a CLI a user can debug and one that dumps a 30-line Pydantic traceback at them.
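The three failure modes can be checked without any file at all by validating plain dicts. The sketch below uses a trimmed, illustrative copy of the Settings model and a hypothetical helper named problems; it reports Pydantic's machine-readable error type next to each location, on the assumption that those codes are more stable to match on than the human-readable messages.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Settings(BaseModel):
    """Trimmed copy of the page's model, for illustration only."""

    model_config = ConfigDict(extra="forbid")  # reject unknown keys

    host: str  # required
    port: int = 5432


def problems(raw: dict) -> list[str]:
    """Return one 'location: error_type' string per validation error."""
    try:
        Settings.model_validate(raw)
    except ValidationError as exc:
        found = []
        for err in exc.errors():
            loc = ".".join(str(p) for p in err["loc"]) or "(root)"
            found.append(f"{loc}: {err['type']}")
        return found
    return []


missing = problems({"port": 5432})             # no host
extra = problems({"host": "x", "prot": 5432})  # typo'd key
wrong = problems({"host": "x", "port": "abc"})  # not an integer
coerced = Settings.model_validate({"host": "x", "port": "5432"})

print(missing)       # ['host: missing']
print(extra)         # ['prot: extra_forbidden']
print(wrong)         # ['port: int_parsing']
print(coerced.port)  # 5432
```

A table of cases like this drops straight into a unit test, so a later schema change that loosens one of the guards is caught early.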
Loading from the right path
Read config from a predictable, documented location, not the current directory by accident. Resolve the path explicitly and handle "file absent" as "use defaults," not as an error:
def load_config(path: Path) -> Settings:
    if not path.is_file():
        return Settings(host="localhost")  # documented fallback
    return load_settings(path)
For where these paths should live (user config under ~/.config/<app>/, project config in the repo) and how to combine a YAML file with environment variables and flags, see the precedence rules in the parent hub. The merge there feeds the same safe_load + Pydantic pipeline you built on this page.
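As a rough sketch of "resolve the path explicitly", the helpers below build the user-level location under ~/.config/<app>/ and let an explicitly passed path win. The app name mytool, the file name config.yaml, both function names, and the XDG_CONFIG_HOME override are assumptions made for illustration; the parent hub's precedence rules remain the authority on ordering.

```python
from __future__ import annotations

import os
from pathlib import Path

APP_NAME = "mytool"  # hypothetical application name


def user_config_path(app: str = APP_NAME, filename: str = "config.yaml") -> Path:
    """Build <config home>/<app>/<filename> without touching the filesystem."""
    # Assumption: honor XDG_CONFIG_HOME when set, else fall back to ~/.config.
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / app / filename


def resolve_config_path(explicit: str | None = None) -> Path:
    """An explicit path (for example from a --config flag) wins over the default."""
    if explicit:
        return Path(explicit).expanduser().resolve()
    return user_config_path()


print(resolve_config_path())
```

Neither helper checks that the file exists; that stays in load_config, which treats a missing file as "use defaults".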
Production notes
- safe_load only, every time. Wrap parsing in one helper (like load_yaml above) so no caller can ever reach for yaml.load. A grep for yaml.load( in CI catches regressions.
- Audit your plugins. A plugin that parses its own YAML with yaml.load reopens the hole you closed. Hold third-party config readers to the same rule.
- Catch yaml.YAMLError too. Malformed YAML (bad indentation, unclosed brackets or quotes) raises YAMLError from safe_load before validation runs. Duplicate keys are not an error in PyYAML; the last value silently wins. Catch YAMLError and report the file and line.
- Don't trust nesting depth. Deeply nested or huge YAML can be a denial-of-service vector. For files from fully untrusted sources, cap file size before parsing.
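The last two notes can be folded into the parsing helper. This is one possible shape, not the page's canonical loader: the names load_yaml_strict, ConfigFileError, and MAX_CONFIG_BYTES, and the 1 MB limit, are all hypothetical. It relies on the problem_mark attribute that PyYAML attaches to marked syntax errors, whose line and column are zero-based.

```python
from pathlib import Path

import yaml

MAX_CONFIG_BYTES = 1_000_000  # hypothetical cap; pick one that fits your configs


class ConfigFileError(Exception):
    """Carries a short, user-facing message about an unreadable config file."""


def load_yaml_strict(path: Path, max_bytes: int = MAX_CONFIG_BYTES) -> dict:
    # Coarse guard: refuse oversized files before handing them to the parser.
    size = path.stat().st_size
    if size > max_bytes:
        raise ConfigFileError(f"{path}: file is {size} bytes, limit is {max_bytes}")

    try:
        data = yaml.safe_load(path.read_text(encoding="utf-8"))
    except yaml.YAMLError as exc:
        # Syntax errors usually carry a mark with zero-based line and column.
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            where = f"line {mark.line + 1}, column {mark.column + 1}"
        else:
            where = "unknown position"
        problem = getattr(exc, "problem", None) or "invalid YAML"
        raise ConfigFileError(f"{path}: {where}: {problem}") from None

    if data is None:  # empty file -> empty config
        return {}
    if not isinstance(data, dict):
        raise ConfigFileError(
            f"{path}: top-level YAML must be a mapping, got {type(data).__name__}"
        )
    return data
```

A caller can catch ConfigFileError in one place and exit with its message, the same way load_settings turns ValidationError into a SystemExit.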