The reproducibility gap

Most infrastructure starts as someone's working setup. A cluster an engineer provisioned by hand six months ago. A deployment process that lives in a Slack thread of curl commands. An environment that works on staging but behaves differently in production because the configuration diverged and nobody noticed.

The tooling to fix this exists. The problem is that it usually requires committing to a managed platform — a cloud provider's Kubernetes service, a CI/CD SaaS, a secrets manager with its own API surface. Each of those choices is a dependency. Some of them are justified. Some of them quietly make it harder to move, audit, or reproduce the environment when it matters.

This post is about a different shape: a single repository that owns the full stack from VM provisioning to running application, with Python all the way down and no mandatory managed services.

The stack

Three tools, composable and independently replaceable:

Pulumi handles cloud resource provisioning. Instead of YAML or HCL, you write Python. The same language your application uses, the same review process, the same version control. Provisioning four Linux machines on GCP becomes a function call, not a configuration file you maintain separately from everything else.
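As a rough sketch of what that function call looks like — the resource names, zone, and machine type here are illustrative, not taken from the post's repository:

```python
import pulumi
import pulumi_gcp as gcp

# Four identical Debian VMs that will later form the kubeadm cluster.
nodes = []
for i in range(4):
    node = gcp.compute.Instance(
        f"k8s-node-{i}",
        machine_type="e2-standard-2",  # assumed size
        zone="europe-west1-b",         # assumed zone
        boot_disk=gcp.compute.InstanceBootDiskArgs(
            initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
                image="debian-cloud/debian-12",
            ),
        ),
        network_interfaces=[gcp.compute.InstanceNetworkInterfaceArgs(
            network="default",
            access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()],
        )],
    )
    nodes.append(node)

# Export the public IPs so the configuration step can use them as its inventory.
pulumi.export("ips", [
    n.network_interfaces[0].access_configs[0].nat_ip for n in nodes
])
```

Because this is ordinary Python, the loop, the list, and the exported output go through the same code review as everything else in the repository.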

Pyinfra handles configuration management. It is Ansible, but the playbooks are Python. No YAML DSL to learn, no Jinja templating edge cases, no context-switching between the infrastructure code and the application code. Pyinfra connects to the machines Pulumi provisioned and installs the Kubernetes cluster from scratch — kubeadm, flannel, an nginx ingress controller, the full setup.
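A Pyinfra deploy file is likewise just Python calling operations. A minimal sketch, assuming a hypothetical deploy file that prepares nodes for kubeadm (the package list is illustrative; the real setup also adds the Kubernetes apt repository):

```python
from pyinfra.operations import apt, server

apt.packages(
    name="Install container runtime and prerequisites",
    packages=["containerd", "apt-transport-https", "ca-certificates"],
    update=True,
    _sudo=True,
)

server.shell(
    name="Disable swap, which kubeadm refuses to run with",
    commands=["swapoff -a"],
    _sudo=True,
)
```

No templating layer sits between you and the machines: conditionals, loops, and shared helpers are plain Python.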

Kubernetes manifests deploy the application. Standard, portable, not tied to any particular managed Kubernetes offering.
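The manifests themselves are the standard kind that run anywhere. A hypothetical Deployment, with placeholder names and image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 2
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest
          ports:
            - containerPort: 8000
```

Nothing here references a cloud provider; the same file applies to the kubeadm cluster, a managed offering, or a local kind cluster.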

Why this matters for platform teams

The obvious argument for this pattern is developer independence: a team that can provision, configure, and deploy without filing a ticket to an infrastructure team or clicking through a cloud console is a faster team.

But the deeper argument is about what reproducibility enables.

When your infrastructure is code — real code, not configuration prose — it can be reviewed, tested, and version-controlled like any other engineering artefact. You can see exactly what changed between the environment that works and the environment that doesn't. You can reproduce the production environment locally for debugging. You can hand it to a new team member and have them running a full stack in an afternoon.

That last point is the one that matters most in practice. Infrastructure that only one person understands is a bottleneck with a countdown timer. Infrastructure expressed as a documented, executable codebase is something a team can own collectively.

The repository structure

├── infra/
│   ├── pulumi/          # VM provisioning (Python)
│   └── pyinfra/         # Cluster configuration (Python)
├── k8s/                 # Kubernetes manifests
├── app/                 # Application code
├── Pulumi.yaml
└── pyproject.toml       # Unified dependency management with Poetry

Everything in one place. git clone, install dependencies, run Pulumi to get the machines, run Pyinfra to get the cluster, kubectl apply to get the application. The entire environment is reproducible by anyone with the credentials.
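Roughly, assuming Poetry for dependencies and hypothetical file names for the Pyinfra inventory and deploy script, the sequence looks like this:

```shell
poetry install                        # Pulumi, Pyinfra, and app dependencies
pulumi up --cwd infra/pulumi          # provision the four VMs
pyinfra inventory.py install_k8s.py   # turn them into a kubeadm cluster
kubectl apply -f k8s/                 # deploy the application
```

Four commands, each driven by files that live in the repository.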

What this is not

This is not an argument against managed Kubernetes, or against Terraform, or against the cloud platforms' own deployment tooling. Those choices are appropriate for many teams and many contexts.

It is an argument that infrastructure should be owned by the teams who use it, expressed in a form they can read and modify, and reproducible without institutional knowledge. Whether you get there with this stack or another one is secondary.


The full working repository — including the Pulumi provisioning code, Pyinfra playbooks, Kubernetes manifests, and a sample Python application — is on GitHub.