Skip to main content

Data Management

Guide to copying production data to your local development environment for testing with real data.

Copy Data from Production

Use the copy_db.py script to copy schemas from production to your local database for testing with real data.

Prerequisites

  • Production database access - Connection URL to production database
  • pg_dump and pg_restore - Installed with PostgreSQL

Basic Usage

# Copy entire schema from production
python scripts/copy_db.py \
--source "postgresql://user:pass@prod-host:5432/prod_db" \
--target "postgresql://orange_user:orange_password@localhost:5432/orange_db" \
--overwrite

Skip the largest table (report_row_changes) for 50-80% faster copying:

python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--exclude-table report_row_changes \
--overwrite

Fast Mode: Parallel Processing

Use parallel jobs for 3-5x speedup:

python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--exclude-table report_row_changes \
--overwrite

Fastest Mode: Parallel + Single-Phase

Achieve 6-10x overall speedup:

python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--single-phase \
--exclude-table report_row_changes \
--overwrite

Options

OptionDescriptionExample
--sourceSource database URLpostgresql://user:pass@host:5432/db
--targetTarget database URLpostgresql://user:pass@localhost:5432/db
--source-schemaSource schema nameorange_schema (default)
--target-schemaTarget schema nameorange_schema (default)
--overwriteReplace existing target schema(flag)
--exclude-tableExclude table from copy--exclude-table report_row_changes
--parallel NUse N parallel jobs--parallel 4
--compression NCompression level 0-9--compression 0 (sequential only)
--single-phaseFaster single-phase restore(flag)

Using Environment Variables

Store connection URLs in environment variables:

# .env or shell profile
export PROD_DATABASE_URL="postgresql://user:pass@prod-host:5432/prod_db"
export DATABASE_URL="postgresql://orange_user:orange_password@localhost:5432/orange_db"

# Then use in script
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--exclude-table report_row_changes \
--overwrite

What Gets Copied

Included:

  • Table structures and constraints
  • Data from all tables (except excluded)
  • Indexes and sequences
  • Foreign key relationships

Not included:

  • Users and roles (--no-owner)
  • Permissions (--no-privileges)
  • Functions and triggers (can be added)

Reset Database

Clear all data and start fresh:

Docker Compose

# Stop services and remove volumes
docker compose -f docker-compose.dev.yml down -v

# Start services
docker compose -f docker-compose.dev.yml up -d

# Run migrations
uv run alembic upgrade head

Native Setup

# Drop and recreate database
psql -U postgres << EOF
DROP DATABASE IF EXISTS orange_db;
CREATE DATABASE orange_db OWNER orange_user;
GRANT ALL PRIVILEGES ON DATABASE orange_db TO orange_user;
EOF

# Run migrations
uv run alembic upgrade head

Copying Production Data Workflow

# 1. Copy production data (fast mode)
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--exclude-table report_row_changes \
--overwrite

# 2. Run migrations if needed
uv run alembic upgrade head

# 3. Start development
uv run uvicorn app.main:app --reload

Next Steps