Data Management
Guide to copying production data to your local development environment for testing with real data.
Copy Data from Production
Use the copy_db.py script to copy schemas from production to your local database for testing with real data.
Prerequisites
- Production database access - Connection URL to production database
- pg_dump and pg_restore - Installed with PostgreSQL
Basic Usage
# Copy entire schema from production
python scripts/copy_db.py \
--source "postgresql://user:pass@prod-host:5432/prod_db" \
--target "postgresql://orange_user:orange_password@localhost:5432/orange_db" \
--overwrite
Recommended: Exclude Large Tables
Skip the largest table (report_row_changes) for 50-80% faster copying:
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--exclude-table report_row_changes \
--overwrite
Fast Mode: Parallel Processing
Use parallel jobs for 3-5x speedup:
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--exclude-table report_row_changes \
--overwrite
Fastest Mode: Parallel + Single-Phase
Achieve 6-10x overall speedup:
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--single-phase \
--exclude-table report_row_changes \
--overwrite
Options
| Option | Description | Example |
|---|---|---|
--source | Source database URL | postgresql://user:pass@host:5432/db |
--target | Target database URL | postgresql://user:pass@localhost:5432/db |
--source-schema | Source schema name | orange_schema (default) |
--target-schema | Target schema name | orange_schema (default) |
--overwrite | Replace existing target schema | (flag) |
--exclude-table | Exclude table from copy | --exclude-table report_row_changes |
--parallel N | Use N parallel jobs | --parallel 4 |
--compression N | Compression level 0-9 | --compression 0 (sequential only) |
--single-phase | Faster single-phase restore | (flag) |
Using Environment Variables
Store connection URLs in environment variables:
# .env or shell profile
export PROD_DATABASE_URL="postgresql://user:pass@prod-host:5432/prod_db"
export DATABASE_URL="postgresql://orange_user:orange_password@localhost:5432/orange_db"
# Then use in script
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--exclude-table report_row_changes \
--overwrite
What Gets Copied
✅ Included:
- Table structures and constraints
- Data from all tables (except excluded)
- Indexes and sequences
- Foreign key relationships
❌ Not included:
- Users and roles (
--no-owner) - Permissions (
--no-privileges) - Functions and triggers (can be added)
Reset Database
Clear all data and start fresh:
Docker Compose
# Stop services and remove volumes
docker compose -f docker-compose.dev.yml down -v
# Start services
docker compose -f docker-compose.dev.yml up -d
# Run migrations
uv run alembic upgrade head
Native Setup
# Drop and recreate database
psql -U postgres << EOF
DROP DATABASE IF EXISTS orange_db;
CREATE DATABASE orange_db OWNER orange_user;
GRANT ALL PRIVILEGES ON DATABASE orange_db TO orange_user;
EOF
# Run migrations
uv run alembic upgrade head
Copying Production Data Workflow
# 1. Copy production data (fast mode)
python scripts/copy_db.py \
--source "$PROD_DATABASE_URL" \
--target "$DATABASE_URL" \
--parallel 4 \
--exclude-table report_row_changes \
--overwrite
# 2. Run migrations if needed
uv run alembic upgrade head
# 3. Start development
uv run uvicorn app.main:app --reload
Next Steps
- Development Commands - Common operations