Advanced Argument Validation Strategies

Enforce strict argument validation in Python CLIs using Pydantic v2, Typer, and custom validators for schema-driven pre-execution data integrity.

Robust validation turns brittle scripts into resilient tools. The core principle is simple: validate everything at the boundary, before a single line of business logic runs, so your command body can assume it is working with clean, typed data. This hub walks through schema-driven validation with Pydantic v2, custom validators and callbacks in Typer and Click, layering checks from type to range to cross-field, and converting validation failures into clean CLI errors with the right exit codes.

TL;DR

  • Validate at the boundary: parse raw strings into a typed, validated model first; the rest of the command trusts that model.
  • Use a Pydantic v2 BaseModel as the single source of truth for shape, types, ranges, and cross-field rules.
  • Wire it in with a Typer/Click callback or a custom ParamType, catch ValidationError, and re-raise as a click.BadParameter so the user gets exit code 2 and a usage message.
  • Layer your checks: type coercion, then per-field constraints (Field(ge=..., le=...)), then cross-field invariants (@model_validator).

Funnel diagram: raw input passes through a type check, then range/format, then cross-field rules into a valid typed object; any failing stage branches to a BadParameter error with exit code 2.

Validate at the boundary

The most common validation mistake is scattering if checks through the command body. By the time you discover that --replicas is negative, you may have already opened a connection or written a file. Instead, treat the argument layer as a gate: nothing untrusted gets past it.

A Pydantic v2 model is the cleanest way to express that gate. It captures the entire contract — field names, types, bounds, defaults, and relationships — in one declarative place:

from typing import Annotated
from pydantic import BaseModel, Field, field_validator, model_validator

class Resources(BaseModel):
    cpu: Annotated[int, Field(ge=1, le=64)]
    memory_mb: Annotated[int, Field(ge=128)]

class DeployConfig(BaseModel):
    name: Annotated[str, Field(min_length=1, max_length=63)]
    replicas: Annotated[int, Field(ge=1, le=100)]
    resources: Resources
    canary_percent: Annotated[int, Field(ge=0, le=100)] = 0

    @field_validator("name")
    @classmethod
    def name_is_dns_safe(cls, v: str) -> str:
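        # str.isalnum() alone also accepts non-ASCII letters and digits, which are not DNS-safe.
        if not all((c.isascii() and c.isalnum()) or c == "-" for c in v):
            raise ValueError("name must contain only ASCII letters, digits, and hyphens")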
        return v.lower()

    @model_validator(mode="after")
    def canary_needs_replicas(self) -> "DeployConfig":
        if self.canary_percent > 0 and self.replicas < 2:
            raise ValueError("canary_percent requires at least 2 replicas")
        return self

Call DeployConfig.model_validate(data) once, and every layer fires in order.
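As a minimal usage sketch, assuming the models above (restated so the snippet runs on its own) and a hand-written payload whose values are purely illustrative, one model_validate call either returns a normalized object or raises with the failing rule attached:

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator


class Resources(BaseModel):
    cpu: Annotated[int, Field(ge=1, le=64)]
    memory_mb: Annotated[int, Field(ge=128)]


class DeployConfig(BaseModel):
    name: Annotated[str, Field(min_length=1, max_length=63)]
    replicas: Annotated[int, Field(ge=1, le=100)]
    resources: Resources
    canary_percent: Annotated[int, Field(ge=0, le=100)] = 0

    @field_validator("name")
    @classmethod
    def name_is_dns_safe(cls, v: str) -> str:
        if not all((c.isascii() and c.isalnum()) or c == "-" for c in v):
            raise ValueError("name must contain only ASCII letters, digits, and hyphens")
        return v.lower()

    @model_validator(mode="after")
    def canary_needs_replicas(self) -> "DeployConfig":
        if self.canary_percent > 0 and self.replicas < 2:
            raise ValueError("canary_percent requires at least 2 replicas")
        return self


# Illustrative payload, for example the result of json.loads on a CLI argument.
payload = {"name": "Web-API", "replicas": "3", "resources": {"cpu": 4, "memory_mb": 512}}

config = DeployConfig.model_validate(payload)
# The name was lowercased by the field validator and "3" was coerced to the int 3.
print(config.name, config.replicas, config.canary_percent)

# Same payload, but asking for a canary with a single replica breaks the cross-field rule.
bad_payload = dict(payload, replicas=1, canary_percent=10)
try:
    DeployConfig.model_validate(bad_payload)
    canary_errors = []
except ValidationError as exc:
    canary_errors = exc.errors()
```

The cross-field failure is reported with an empty loc tuple because it belongs to the model as a whole rather than to one field, which is worth remembering when you format errors for the user.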

Layered validation: type, range, cross-field

Good validation is layered, and Pydantic runs the layers for you in a predictable sequence:

  1. Type coercion: Pydantic parses "4" into 4 for an int field, or rejects "four". This is the cheapest, broadest layer.
  2. Per-field constraints: Field(ge=1, le=64) and field_validator enforce bounds and shape on individual values. The name_is_dns_safe validator both checks and normalizes (lowercasing), which is a useful trick: validators can return a cleaned value.
  3. Cross-field invariants: @model_validator(mode="after") sees the fully built object, so it can assert relationships between fields, like "canary deployments need at least two replicas." These rules cannot be expressed on a single field.

Layering matters because earlier layers protect later ones: an after-mode model_validator runs only once every field has passed its own checks, so it never has to guard against replicas being a string; the type layer has already guaranteed it is an int.
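The ordering can be observed directly. The sketch below uses a hypothetical trimmed model, Rollout, and a calls list added only to record whether the cross-field validator ran; neither is part of the DeployConfig example above.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError, model_validator

calls = []  # illustration only: records each run of the cross-field validator


class Rollout(BaseModel):  # hypothetical, trimmed-down stand-in for DeployConfig
    replicas: Annotated[int, Field(ge=1, le=100)]
    canary_percent: Annotated[int, Field(ge=0, le=100)] = 0

    @model_validator(mode="after")
    def canary_needs_replicas(self) -> "Rollout":
        calls.append("cross-field")
        if self.canary_percent > 0 and self.replicas < 2:
            raise ValueError("canary_percent requires at least 2 replicas")
        return self


def error_types(data: dict) -> list[str]:
    """Return the Pydantic error type codes for a payload, or [] if it is valid."""
    try:
        Rollout.model_validate(data)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


# Layer 1 fails: "four" is not an int, so the range check and cross-field rule never run.
type_failure = error_types({"replicas": "four", "canary_percent": 10})

# Layer 2 fails: 0 is an int but violates ge=1, so the cross-field rule still does not run.
range_failure = error_types({"replicas": 0, "canary_percent": 10})
ran_after_field_failures = list(calls)

# Layers 1 and 2 pass, so layer 3 finally runs and rejects the combination.
cross_failure = error_types({"replicas": 1, "canary_percent": 10})
```

One consequence of this design is that a user who gets a type error will not see the cross-field complaint until the type error is fixed, so error messages arrive in layers as well.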

Wiring validators into Typer and Click

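In Typer, a callback attached to an option runs after Typer has converted the raw string to the annotated type, so here it receives an int rather than a str. Raise typer.BadParameter from it to produce a clean usage error: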

import typer

app = typer.Typer()

def parse_replicas(value: int) -> int:
    # Policy check on an already-typed value; the int conversion happened earlier.
    if value > 50:
        raise typer.BadParameter("replicas above 50 require sign-off")
    return value

@app.command()
def deploy(replicas: int = typer.Option(..., callback=parse_replicas)):
    ...
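To check the behaviour without a shell, the callback can be exercised through Typer's test runner. This is a sketch under a few assumptions: the command body just echoes its argument (the original body is elided), and the argument values 10, 80, and "many" are arbitrary test inputs.

```python
import typer
from typer.testing import CliRunner

app = typer.Typer()


def parse_replicas(value: int) -> int:
    # Runs after Typer's own int conversion, so value is already an int.
    if value > 50:
        raise typer.BadParameter("replicas above 50 require sign-off")
    return value


@app.command()
def deploy(replicas: int = typer.Option(..., callback=parse_replicas)):
    # Placeholder body for the sketch; a real command would do its work here.
    typer.echo(f"deploying {replicas} replicas")


runner = CliRunner()

# With a single registered command, arguments are passed to it directly.
accepted = runner.invoke(app, ["--replicas", "10"])
rejected = runner.invoke(app, ["--replicas", "80"])        # fails in the callback
not_a_number = runner.invoke(app, ["--replicas", "many"])  # fails in type conversion

print(accepted.exit_code, rejected.exit_code, not_a_number.exit_code)
```

Both failure paths end with the same usage-error exit status, whether the value was rejected by Typer's type conversion or by the custom callback, so callers do not need to distinguish them.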

In Click, subclass click.ParamType and override convert; call self.fail(...) on bad input. That is the natural home for parsing structured payloads — covered in depth in parsing nested JSON args in Python CLIs, which builds a ParamType that runs json.loads and then model_validate in one step.
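A minimal sketch of that ParamType pattern might look like the following. The class name ResourcesParam, the --resources option, and the error wording are assumptions made for illustration; the linked article's implementation may differ.

```python
import json
from typing import Annotated

import click
from click.testing import CliRunner
from pydantic import BaseModel, Field, ValidationError


class Resources(BaseModel):
    cpu: Annotated[int, Field(ge=1, le=64)]
    memory_mb: Annotated[int, Field(ge=128)]


class ResourcesParam(click.ParamType):  # hypothetical name
    name = "resources-json"

    def convert(self, value, param, ctx):
        # Click may pass an already-converted value (for example a default).
        if isinstance(value, Resources):
            return value
        try:
            return Resources.model_validate(json.loads(value))
        except json.JSONDecodeError as exc:
            self.fail(f"not valid JSON: {exc.msg}", param, ctx)
        except ValidationError as exc:
            details = "; ".join(
                f"{'.'.join(str(p) for p in err['loc']) or '(root)'}: {err['msg']}"
                for err in exc.errors()
            )
            self.fail(details, param, ctx)


@click.command()
@click.option("--resources", type=ResourcesParam(), required=True)
def deploy(resources: Resources) -> None:
    # By the time the body runs, resources is a validated model, not a string.
    click.echo(f"cpu={resources.cpu} memory_mb={resources.memory_mb}")


runner = CliRunner()
ok = runner.invoke(deploy, ["--resources", '{"cpu": 4, "memory_mb": 512}'])
too_big = runner.invoke(deploy, ["--resources", '{"cpu": 128, "memory_mb": 512}'])
malformed = runner.invoke(deploy, ["--resources", "{cpu: 4"])
```

Because self.fail raises click.BadParameter with the parameter and context attached, Click can name the offending option in its error message without any extra work in convert.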

Clean errors and correct exit codes

A validation failure should never surface as a raw Python traceback. The Pydantic ValidationError carries a structured .errors() list — turn it into a tidy, field-addressed message and route it through Click's error machinery so the process exits with status 2 (the conventional "usage error" code):

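import click
from pydantic import ValidationError

def format_errors(exc: ValidationError) -> str:
    # One indented "field.path: message" line per error; model-level errors
    # have an empty loc and are shown as "(root)".
    lines = []
    for err in exc.errors():
        loc = ".".join(str(p) for p in err["loc"]) or "(root)"
        lines.append(f"  {loc}: {err['msg']}")
    return "validation failed:\n" + "\n".join(lines)

# Call this from a ParamType.convert or a callback.
# DeployConfig is the model defined earlier.
def validate_config(data: dict) -> DeployConfig:
    try:
        return DeployConfig.model_validate(data)
    except ValidationError as exc:
        # "from exc" keeps the original error available when debugging.
        raise click.BadParameter(format_errors(exc)) from exc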

Now a bad cpu produces resources.cpu: Input should be less than or equal to 64 and an exit code that scripts and CI can detect — not a stack trace.
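To see the formatted message on its own, here is a small sketch that feeds a payload with two independent problems through the same formatter. The model is trimmed to the two fields involved, and the payload values are made up for the example.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class Resources(BaseModel):
    cpu: Annotated[int, Field(ge=1, le=64)]
    memory_mb: Annotated[int, Field(ge=128)]


class DeployConfig(BaseModel):  # trimmed to the fields this example needs
    replicas: Annotated[int, Field(ge=1, le=100)]
    resources: Resources


def format_errors(exc: ValidationError) -> str:
    lines = []
    for err in exc.errors():
        # Nested fields arrive as a tuple such as ("resources", "cpu").
        loc = ".".join(str(p) for p in err["loc"]) or "(root)"
        lines.append(f"  {loc}: {err['msg']}")
    return "validation failed:\n" + "\n".join(lines)


message = ""
try:
    DeployConfig.model_validate(
        {"replicas": 0, "resources": {"cpu": 128, "memory_mb": 512}}
    )
except ValidationError as exc:
    message = format_errors(exc)

print(message)
# validation failed:
#   replicas: Input should be greater than or equal to 1
#   resources.cpu: Input should be less than or equal to 64
```

Pydantic collects the failures from every field before raising, so the user sees both problems in one run instead of fixing them one at a time.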