Reproducible environments with uv

Understand how Xorq uses uv to create hermetic, reproducible Python environments

Your pipeline works perfectly on your laptop today. You deploy it to production next week, and it breaks because pandas updated from 2.1.0 to 2.2.0. A method you use changed behavior. Code that hasn’t changed produces different results because the environment changed. Reproducible environments with uv solve this by locking every dependency to exact versions, ensuring identical execution across time and machines.

What are reproducible environments?

A reproducible environment is a Python environment where every dependency, including transitive dependencies, is locked to a specific version. Running code in this environment produces identical results regardless of when or where you execute it.

Xorq integrates with uv, a fast Python package and project manager, to create these environments. When you run xorq uv-build, Xorq creates an isolated environment with pinned dependencies, builds your expression in that environment, and packages everything for deployment.

# Standard build (uses current environment)
xorq build pipeline.py -e features

# Hermetic build (creates isolated environment)
xorq uv-build pipeline.py -e features

Why reproducible environments matter

Without reproducibility, your code works on your laptop but fails in production. Dependencies update, Python versions change, and behavior drifts. What worked yesterday might break today.

This creates four critical problems:

Environment drift causes production failures. Your laptop has pandas 2.1.0, but production has 2.0.0. A method you use, such as DataFrame.map, doesn’t exist in the older version. Your pipeline fails in production despite working locally. Debugging takes hours because the code looks correct, but the environment is different.

Transitive dependency conflicts break environments. You install package A, which depends on package B version 1.0. Later, you install package C, which requires package B version 2.0. Your environment breaks because B can’t satisfy both requirements simultaneously. pip tries to resolve this, fails, and leaves your environment in an inconsistent state.

Time-based failures come from external updates. Your code worked yesterday, but today a dependency shipped a breaking change: for example, pandas 2.2.0 changes groupby behavior. Your deployment fails even though you didn’t change your code. Production breaks because of external changes you neither anticipated nor controlled.

Missing audit trails prevent reproduction. Six months later, regulators ask “Which dependency versions produced this model?” You can’t remember because you didn’t track pandas 2.1.0, numpy 1.25.0, or pyarrow 13.0.0. Reproduction becomes guesswork. Compliance failures cost money and reputation.

Reproducible environments solve these problems by locking every dependency to an exact version, ensuring identical behavior across time and machines.
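Environment drift, the first problem above, can be detected with a small check. The following is a minimal sketch and not part of Xorq: it compares a set of expected pins against what is installed in the running interpreter, using the standard library's importlib.metadata. The function name find_drift and the pin values are illustrative assumptions taken from the examples on this page.

```python
from importlib import metadata


def find_drift(pins, installed_version=metadata.version):
    """Return {package: (expected, actual)} for every pin the environment violates.

    `pins` maps package names to the exact versions you expect.
    `installed_version` is injectable so the check can be exercised without
    installing anything; by default it queries the running environment.
    """
    drift = {}
    for name, expected in pins.items():
        try:
            actual = installed_version(name)
        except metadata.PackageNotFoundError:
            actual = None  # the package is missing entirely
        if actual != expected:
            drift[name] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical pins, matching the versions used as examples in this page.
    expected = {"pandas": "2.1.0", "numpy": "1.25.0", "pyarrow": "13.0.0"}
    for pkg, (want, got) in find_drift(expected).items():
        print(f"{pkg}: expected {want}, found {got or 'not installed'}")
```

A check like this only reports drift after it has happened. Locking dependencies prevents it in the first place.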

How reproducible environments work

Reproducible environments with uv operate in four stages:

Dependency resolution: When you run uv-build, uv reads your project’s dependencies and resolves them to exact versions. This includes transitive dependencies, which are dependencies of dependencies.

Environment isolation: uv creates an isolated Python environment separate from your system Python. This environment contains only the specified dependencies, nothing more.

Build execution: Xorq runs the build in this isolated environment. The expression compiles using the exact dependency versions specified.

Artifact packaging: The build artifacts include metadata about the environment, such as the Python version. The packaged sdist contains a requirements.txt that lists the dependency versions, so you can recreate the environment later.

For a single xorq uv-build run, these stages follow this sequence:

sequenceDiagram
    participant User
    participant UV
    participant Isolated as Isolated Env
    participant Build as Build System
    
    User->>UV: xorq uv-build pipeline.py
    UV->>UV: Resolve dependencies
    UV->>Isolated: Create environment
    Isolated->>Isolated: Install locked versions
    UV->>Build: Build in isolated env
    Build->>Build: Generate artifacts
    Build-->>User: builds/a3f5c9d2/
    Note over Build: Includes env metadata

The isolated environment is ephemeral. uv creates it for the build, then discards it. The build artifacts capture what you need to recreate the environment later. The dependency flow from pyproject.toml to build artifacts looks like this:

graph LR
    A[pyproject.toml] --> B[uv lock]
    B --> C[uv.lock]
    C --> D[xorq uv-build]
    D --> E[Isolated Env<br/>with locked deps]
    E --> F[Build Expression]
    F --> G[builds/a3f5c9d2/]
    G --> H[Artifacts:<br/>metadata.json<br/>+ sdist]

Tip

Reproducible environments keep build-time and runtime dependencies in lockstep. You build with locked versions, then execute with the same locked versions. This prevents drift between environments.

uv-build versus regular build

Xorq provides two build commands with different reproducibility guarantees:

xorq build

Uses your current Python environment:

xorq build pipeline.py -e features

Pros

  • Fast with no environment creation.
  • Simple because it uses your existing environment with no additional tools.
  • Good for development iteration with quick build, test, and repeat cycles.

Cons

  • Not reproducible because it depends on whatever packages are currently installed.
  • Environment drift is possible, so code that works today might break tomorrow.
  • No dependency locking means you can’t recreate the exact environment later.

Use when

Use this command for development iteration, prototyping, and solo work where build speed matters more than reproducibility.

xorq uv-build

Creates an isolated, locked environment:

xorq uv-build pipeline.py -e features

Pros

  • Reproducible because locked dependencies pin every package to the same version on every build.
  • Isolated with no environment pollution from other projects.
  • Hermetic and self-contained, including all dependency info.
  • Auditable because metadata tracks exactly what was used.

Cons

  • Slower because it creates an environment each time.
  • Requires uv as an additional tooling dependency.
  • More complex setup because it requires pyproject.toml and a lock file.
  • Ongoing lock file management because uv.lock must be committed and kept up to date.

Use when

Use this command for production deployments, compliance audits, and team collaboration where reproducibility is critical.

Dependency locking

Dependency locking means specifying exact versions for every package, including transitive dependencies. Instead of “pandas >= 2.0”, you specify “pandas == 2.1.0, pyarrow == 13.0.0, numpy == 1.25.0”.

uv handles this automatically:

# uv resolves dependencies and creates lock file
uv lock

# uv-build uses the lock file
xorq uv-build pipeline.py -e features

The lock file uv.lock captures the entire dependency tree. A simplified excerpt:

[[package]]
name = "pandas"
version = "2.1.0"
dependencies = [
    { name = "numpy", version = "1.25.0" },
    { name = "pyarrow", version = "13.0.0" },
]

[[package]]
name = "numpy"
version = "1.25.0"

This ensures that six months from now, you can recreate the exact same environment.

Environment metadata

uv-builds include environment metadata in metadata.json and dependency information in the packaged sdist:

{
  "current_library_version": "0.3.4",
  "metadata_version": "0.0.0",
  "sys-version-info": [3, 11, 5, "final", 0],
  "git_state": {
    "commit": "a3f5c9d2e1b4...",
    "branch": "main"
  }
}

The build directory also includes a packaged sdist (.tar.gz file) that contains requirements.txt with all dependency versions. This enables:

Audit trails: Know exactly which versions produced a build by checking the sdist’s requirements.txt, for example pandas 2.1.0 rather than 2.2.0.

Reproduction: Recreate the environment from the lock file and sdist six months later.

Debugging: Identify version-specific bugs by comparing metadata and requirements across builds.

Compliance: Prove which software versions were used for regulatory requirements by inspecting the sdist.
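The audit and debugging uses above both come down to reading requirements.txt out of the sdist and comparing it across builds. The sketch below does this with the standard tarfile module. It assumes the sdist is a gzip-compressed tarball with a file named requirements.txt inside it, holding one requirement per line. The function names are hypothetical, and the exact layout of the sdist may differ.

```python
import tarfile


def read_sdist_requirements(sdist_path):
    """Return the non-blank, non-comment lines of the first requirements.txt in an sdist."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Match on the file name only, since the top-level directory varies.
            if member.isfile() and member.name.split("/")[-1] == "requirements.txt":
                text = archive.extractfile(member).read().decode("utf-8")
                lines = (line.strip() for line in text.splitlines())
                return [line for line in lines if line and not line.startswith("#")]
    raise FileNotFoundError(f"no requirements.txt found in {sdist_path}")


def diff_requirements(old, new):
    """Return (removed, added): requirement lines present in only one build."""
    old_set, new_set = set(old), set(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)
```

For example, calling diff_requirements on the requirement lists of two builds shows which pins changed between them. That is often the fastest way to narrow down a version-specific bug.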

When to use reproducible environments

Deciding when to use reproducible environments depends on your deployment patterns and collaboration needs.

Use reproducible environments when

  • Deploying to production.
  • Compliance or audit trails matter.
  • Multiple people need identical environments.
  • You need to recreate results later.

Use regular builds when

  • Interactive development or prototyping.
  • Solo work with no deployment.
  • Build speed matters more than reproducibility.
  • Code never leaves your machine.
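If you script your builds, the criteria above can be encoded so the right command is chosen consistently. The following is a rough sketch under stated assumptions: BuildContext and build_command are hypothetical names, not part of Xorq, and the only CLI invocations used are the two shown on this page.

```python
from dataclasses import dataclass


@dataclass
class BuildContext:
    """Flags mirroring the "use reproducible environments when" criteria."""
    deploying_to_production: bool = False
    needs_audit_trail: bool = False
    shared_with_team: bool = False
    must_recreate_later: bool = False

    def needs_reproducible_build(self):
        # Any single criterion is enough to justify the hermetic build.
        return any((
            self.deploying_to_production,
            self.needs_audit_trail,
            self.shared_with_team,
            self.must_recreate_later,
        ))


def build_command(script, expression, context):
    """Return the argument list for the Xorq build matching the context."""
    subcommand = "uv-build" if context.needs_reproducible_build() else "build"
    return ["xorq", subcommand, script, "-e", expression]
```

The returned list can be passed to subprocess.run(..., check=True). Returning the arguments instead of running them keeps the decision easy to test.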

Reproducibility guarantees

Reproducible environments provide three levels of guarantee:

Package-level reproducibility

Same package versions across all environments:

# Development
xorq uv-build pipeline.py -e features
# Uses: pandas==2.1.0, numpy==1.25.0

# Production (6 months later)
xorq uv-build pipeline.py -e features
# Uses: pandas==2.1.0, numpy==1.25.0 (same!)

Python-level reproducibility

Same Python version across environments:

{
  "sys-version-info": [3, 11, 5, "final", 0]
}

uv ensures the build uses the specified Python version, not whatever’s installed on the machine.
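Before executing a build elsewhere, you can confirm that the interpreter matches the one recorded at build time. This minimal sketch assumes metadata.json has a "sys-version-info" list like the one shown in the Environment metadata section. The function name python_matches is hypothetical.

```python
import json
import sys


def python_matches(metadata_text, running=None):
    """Compare a build's recorded Python version with the running interpreter.

    Returns (matches, recorded, current), where the versions are
    (major, minor, micro) tuples.
    """
    recorded = tuple(json.loads(metadata_text)["sys-version-info"][:3])
    source = running if running is not None else sys.version_info
    current = tuple(source[:3])
    return recorded == current, recorded, current


if __name__ == "__main__":
    # Sample metadata copied from the example on this page.
    sample = '{"sys-version-info": [3, 11, 5, "final", 0]}'
    ok, recorded, current = python_matches(sample)
    print(f"recorded={recorded} current={current} match={ok}")
```

The check compares major, minor, and micro versions. Loosen it to the first two elements if patch-level differences are acceptable for your use case.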

System-level reproducibility

Same system dependencies with Nix (optional):

For full system-level reproducibility including system libraries like OpenSSL, combine uv with Nix. This is advanced and typically only needed for maximum reproducibility.

Trade-offs

Benefits: Full reproducibility, audit trails, isolation, time-proof builds, team alignment.

Costs: Longer build time, complexity from uv and lock files, storage, tooling dependency, and lock file management.

Learning more

Build system explains how uv-build extends regular builds with environment locking. Content-addressed hashing covers how uv-builds are still content-addressed.

Compute catalog discusses how to catalog uv-builds like regular builds.

Build reproducible environments guide provides production workflows with uv. uv-build CLI reference covers complete uv-build documentation.