Structuring a Large Python CLI Project

A CLI that started as a single main.py and grew to forty subcommands needs more than a tidy desk: it needs a layout that keeps import paths predictable, startup time low, and the test suite quick. This article gives a concrete, opinionated structure for a large Python CLI: the src/ layout, layered packages, namespace packages for plugins, and lazy command loading so a tool with dozens of commands imports only the code for the command being run.

TL;DR

Use the src/ layout — your package lives under src/, not at the repo root — so tests run against the installed package and can't accidentally import from the working directory.
Organize by layer, not by feature dump: commands/ (thin parsing wrappers), core/ (business logic), io/ (formatting).
Lazy-load commands so importing the root group doesn't import every subcommand's dependencies — this is what keeps startup time flat as the command count grows.
Put tests in a top-level tests/ directory: fast unit tests against core/, CliRunner smoke tests against commands/.
Use namespace packages when you want third parties (or separate internal repos) to contribute subcommands.

Figure: directory tree for a large CLI using the src/ layout, with commands/, core/, and io/ layers and a top-level tests/ directory.

The src/ layout, and why

The single most important structural decision is putting your importable package inside a src/ directory rather than at the repository root:

my-cli/
├── pyproject.toml
├── src/
│   └── reporter/
│       ├── __init__.py
│       ├── cli.py
│       ├── commands/
│       ├── core/
│       └── io/
└── tests/
    ├── test_sales.py
    └── test_report.py

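Without src/, the repository root often ends up on sys.path: python -c, the interactive interpreter, and python -m (including python -m pytest) all put the current directory first. So import reporter finds the source tree directly, even if you forgot to install the package and even if your packaging config is broken. Tests pass locally and fail in CI or for users. The src/ layout takes the package out of the root: the only way to import reporter is to install it, either editable (pip install -e .) or as a built wheel. An editable install is convenient day to day, but it still points at src/, so also run CI against a regular, non-editable install. That is what catches files left out of the wheel (a subpackage your build backend's discovery skipped, a data file never declared) and gaps in your sdist/wheel include rules before release.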

The matching pyproject.toml is short:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "reporter"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = ["click>=8.1"]

[project.scripts]
reporter = "reporter.cli:cli"

[tool.hatch.build.targets.wheel]
packages = ["src/reporter"]
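To see which copy of a package an import would actually pick up, you can ask the import system without executing anything. A minimal sketch using importlib.util.find_spec; import_origin and resolves_from_checkout are hypothetical helper names, and the rule that anything under the repo root but outside src/ counts as "the checkout" is an assumption matching the tree above:

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def import_origin(name: str) -> Path | None:
    """Return the file `import name` would load, without executing the module."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location or spec.origin is None:
        # Not importable, built-in/frozen, or a namespace package with no single file.
        return None
    return Path(spec.origin).resolve()


def resolves_from_checkout(name: str, repo_root: Path) -> bool:
    """True if `name` resolves into repo_root itself rather than src/ or site-packages."""
    origin = import_origin(name)
    if origin is None:
        return False
    root = repo_root.resolve()
    if not origin.is_relative_to(root):
        return False  # e.g. a regular install in site-packages
    # An editable install of a src/ layout legitimately points into repo_root/src.
    return not origin.is_relative_to(root / "src")
```

Run from the repository root, resolves_from_checkout("reporter", Path.cwd()) returning True suggests imports are bypassing the installed package.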

Organize by layer

Resist the urge to make one giant commands.py. Split responsibilities into packages so the import graph mirrors the architecture. The core/ service layer is pure Python with no framework imports — it's where the work happens:

# src/reporter/core/sales.py
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesSummary:
    total: float
    count: int

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0

def summarize(amounts: list[float]) -> SalesSummary:
    """Pure business logic: no I/O, no framework objects."""
    return SalesSummary(total=sum(amounts), count=len(amounts))
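The "no framework imports" rule for core/ is easy to enforce mechanically. A minimal sketch that parses source with the standard-library ast module instead of importing it; the banned list (click, rich, typer) is an assumption about what "framework" means for your project, and framework_imports and check_core_is_pure are hypothetical names:

```python
from __future__ import annotations

import ast
from pathlib import Path

# Assumption: top-level packages that core/ must never import.
BANNED = frozenset({"click", "rich", "typer"})


def framework_imports(source: str, banned: frozenset[str] = BANNED) -> list[str]:
    """Return banned modules imported anywhere in `source` (including inside functions)."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports (level > 0) stay inside the package
        else:
            continue
        found.extend(n for n in names if n.split(".")[0] in banned)
    return found


def check_core_is_pure(core_dir: Path) -> dict[str, list[str]]:
    """Map each offending file (relative to core_dir) to the banned imports it contains."""
    violations: dict[str, list[str]] = {}
    for path in sorted(core_dir.rglob("*.py")):
        hits = framework_imports(path.read_text(encoding="utf-8"))
        if hits:
            violations[str(path.relative_to(core_dir))] = hits
    return violations
```

A one-line pytest test asserting check_core_is_pure(Path("src/reporter/core")) == {} keeps the layering from eroding as contributors add code.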

The io/ layer turns service results into output, and only this layer changes when you add --format json or a Rich table:

# src/reporter/io/render.py
from __future__ import annotations
from reporter.core.sales import SalesSummary

def render_summary(summary: SalesSummary) -> str:
    return f"count={summary.count} total={summary.total:.2f} avg={summary.average:.2f}"
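A sketch of how io/ might grow a JSON renderer while the formatting logic stays in one place. render_text, render_json, RENDERERS, and render are hypothetical names, and SalesSummary is re-declared here only so the block runs on its own; in the package you would import it from reporter.core.sales:

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class SalesSummary:  # standalone copy of reporter.core.sales.SalesSummary
    total: float
    count: int

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def render_text(summary: SalesSummary) -> str:
    return f"count={summary.count} total={summary.total:.2f} avg={summary.average:.2f}"


def render_json(summary: SalesSummary) -> str:
    # asdict() only includes fields, so add the derived average explicitly.
    return json.dumps({**asdict(summary), "average": summary.average})


RENDERERS: dict[str, Callable[[SalesSummary], str]] = {
    "text": render_text,
    "json": render_json,
}


def render(summary: SalesSummary, fmt: str = "text") -> str:
    """Dispatch to a renderer by format name; unknown names raise ValueError."""
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(RENDERERS)}") from None
    return renderer(summary)
```

On the commands/ side, a --format option declared with click.Choice(sorted(RENDERERS)) could pass the user's choice straight to render, so core/ never learns that output formats exist.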

The commands/ layer is thin glue — parse, delegate to core/, render with io/:

# src/reporter/commands/report.py
from __future__ import annotations
import click
from reporter.core.sales import summarize
from reporter.io.render import render_summary

@click.command(name="report")
@click.argument("amounts", nargs=-1, type=float)
def report_command(amounts: tuple[float, ...]) -> None:
    """Summarize AMOUNTS. Thin wrapper: parse -> service -> render."""
    summary = summarize(list(amounts))
    click.echo(render_summary(summary))

Where tests live

Tests sit in a top-level tests/ directory, outside src/, so they aren't packaged into the wheel. Mirror the layering: most tests target core/ as plain function calls, and a thin smoke test per command verifies the wiring. Both run in-process — no subprocess — so the suite stays fast even with dozens of commands.

# tests/test_report.py
from click.testing import CliRunner
from reporter.cli import cli
from reporter.core.sales import summarize

def test_service_pure() -> None:
    s = summarize([10.0, 20.0, 30.0])
    assert s.total == 60.0 and s.count == 3 and s.average == 20.0

def test_report_command() -> None:
    result = CliRunner().invoke(cli, ["report", "10", "20", "30"])
    assert result.exit_code == 0
    assert "count=3 total=60.00 avg=20.00" in result.output

Running this against an editable install:

$ pytest -q
..                                                                       [100%]
2 passed in 0.01s

Lazy command loading for startup time

Here's the problem that bites large CLIs: if cli.py imports every command module at the top, then any invocation, even reporter --version, imports the dependencies of every command. If one command imports pandas and another imports boto3, a version check now pays for both, and startup time creeps up with every command you add.

The fix is a lazy Group that defers importing a command module until Click asks for that command by name, which normally means until the user invokes it. The group knows each command's import path as a "module:attribute" string and resolves it on demand:

# src/reporter/cli.py
from __future__ import annotations
import importlib
import click

class LazyGroup(click.Group):
    """Defer importing command modules until Click asks for a command by name."""
    def __init__(self, *args, lazy_subcommands: dict[str, str] | None = None, **kwargs):
        super().__init__(*args, **kwargs)
        # Command name -> "package.module:attribute" import string.
        self._lazy = lazy_subcommands or {}

    def list_commands(self, ctx):
        # Returns names only; nothing is imported here.
        return sorted([*super().list_commands(ctx), *self._lazy])

    def get_command(self, ctx, name):
        if name in self._lazy:
            # The command's module (and its dependencies) is imported only now.
            module_path, attr = self._lazy[name].rsplit(":", 1)
            cmd = getattr(importlib.import_module(module_path), attr)
            if not isinstance(cmd, click.Command):
                raise TypeError(f"{self._lazy[name]!r} did not resolve to a click.Command")
            return cmd
        return super().get_command(ctx, name)

@click.group(cls=LazyGroup, lazy_subcommands={
    "report": "reporter.commands.report:report_command",
})
def cli() -> None:
    """reporter: example layered CLI."""

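list_commands makes report appear in the command listing without importing reporter.commands.report, and get_command imports it only when Click resolves that name. Running reporter report loads only that command's module and its dependencies, and a root-level option such as --version loads none of them. One caveat: to render the top-level reporter --help, Click calls get_command for every listed name to fetch its short help, so that one screen still imports every command module; reporter report --help loads only report. The mapping is a plain dict, so it's also the natural place to register commands discovered from plugins.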
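Because lazy loading is easy to break silently (one stray top-level import in cli.py undoes it), it helps to test which modules an import actually drags in. A minimal sketch that imports a module in a fresh interpreter and diffs sys.modules; modules_imported_by is a hypothetical name, and modules the probe itself needs (json and its dependencies) are loaded beforehand and so never show up in the result:

```python
from __future__ import annotations

import json
import subprocess
import sys


def modules_imported_by(module: str) -> set[str]:
    """Import `module` in a fresh interpreter and return the modules it newly loaded."""
    if not all(part.isidentifier() for part in module.split(".")):
        raise ValueError(f"not a dotted module name: {module!r}")
    probe = (
        "import json, sys\n"
        "before = set(sys.modules)\n"
        f"import {module}\n"
        "print(json.dumps(sorted(set(sys.modules) - before)))\n"
    )
    # A subprocess gives a clean sys.modules; this process has already imported plenty.
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    # Use the last line in case the imported module prints something on import.
    return set(json.loads(result.stdout.splitlines()[-1]))
```

For example, a test could assert "pandas" not in modules_imported_by("reporter.cli"). For timing rather than membership, CPython's -X importtime flag (python -X importtime -c "import reporter.cli") prints per-module import cost to stderr.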

Namespace packages and scaling to dozens of commands

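When you want a command tree spread across multiple distributions (a core package plus optional plugin packages, or separate internal repos), reach for namespace packages (PEP 420). Multiple installed distributions can contribute modules under a shared package name without any of them shipping an __init__.py for the namespace, and without one of them "owning" the directory. One constraint matters for the layout above: Python merges namespace portions found on different sys.path entries only when every level of the dotted name is itself a namespace package. Because src/reporter/ ships an __init__.py, a name like reporter.commands.* is searched only inside that one package directory, so portions installed elsewhere need a shared top-level name that has no __init__.py. Combined with entry-point discovery, this lets a plugin register its command into the lazy mapping at install time. The deep dive on this lives in Plugin architectures for extensible CLIs.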

At the scale of dozens of commands, three habits keep the project healthy: one module per command under commands/ (never a grab-bag file), all real logic in core/ where it's unit-testable without the framework, and lazy loading so the import cost of a rarely-used command never slows the common path. The directory shape stays identical from five commands to fifty — contributors always know exactly where a new command, its logic, and its tests belong.
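Lazy loading adds one failure mode that eager imports don't have: a new module under commands/ that nobody adds to the mapping simply never appears. A minimal consistency check, assuming one module per command and that names starting with an underscore are private helpers; unregistered_commands is a hypothetical name:

```python
from __future__ import annotations

import pkgutil
from pathlib import Path


def unregistered_commands(
    commands_dir: Path, lazy: dict[str, str], package: str
) -> list[str]:
    """List command modules on disk that the lazy mapping never references."""
    registered = {target.split(":", 1)[0] for target in lazy.values()}
    on_disk = {
        f"{package}.{info.name}"
        for info in pkgutil.iter_modules([str(commands_dir)])
        if not info.name.startswith("_")  # skip private helpers such as _shared.py
    }
    return sorted(on_disk - registered)
```

In a test, calling it with Path("src/reporter/commands"), the dict passed to lazy_subcommands, and "reporter.commands" should return an empty list.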
