Performance & Scalability
Strategies for running Copybara efficiently on large codebases.
Memory Configuration
JVM Heap Size
Large repositories require more memory. Set the heap size with -Xmx:

    # Default (may be insufficient for large repos)
    java -jar copybara.jar migrate copy.bara.sky

    # 4GB heap for medium repos (10k-50k files)
    java -Xmx4g -jar copybara.jar migrate copy.bara.sky

    # 8GB heap for large repos (50k-200k files)
    java -Xmx8g -jar copybara.jar migrate copy.bara.sky

    # 16GB+ for very large repos (200k+ files)
    java -Xmx16g -jar copybara.jar migrate copy.bara.sky

Signs You Need More Memory
- OutOfMemoryError: Java heap space
- Process killed by the OOM killer (check dmesg)
- Unexplained slowdowns during the transformation phase
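The heap tiers listed under JVM Heap Size can be folded into a small helper so that CI scripts pick a flag from the file count. This is a minimal sketch: suggest_heap is a hypothetical name, and treating each tier's lower bound as inclusive is an assumption made here, not something Copybara defines.

```shell
# suggest_heap FILE_COUNT
# Prints a JVM heap flag for the given number of files, following the tiers
# from this page. Prints nothing below 10,000 files (JVM default heap).
# Assumption: the lower bound of each tier is inclusive.
suggest_heap() {
  if [ "$1" -ge 200000 ]; then
    echo "-Xmx16g"
  elif [ "$1" -ge 50000 ]; then
    echo "-Xmx8g"
  elif [ "$1" -ge 10000 ]; then
    echo "-Xmx4g"
  fi
}

# Example (not executed here): count tracked files, then pass the flag on.
#   files=$(git ls-files | wc -l | tr -d ' ')
#   java $(suggest_heap "$files") -jar copybara.jar migrate copy.bara.sky
```

Treat the result as a starting point and raise it if the symptoms listed above persist.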
Glob Optimization
File matching is often the biggest performance factor.
Use Specific Patterns

    # SLOW: Scans everything, then filters
    origin_files = glob(
        include = ["**"],
        exclude = ["vendor/**", "node_modules/**", "*.generated.*"],
    )

    # FAST: Only scans needed directories
    origin_files = glob(["src/**", "lib/**", "docs/**"])

Understand Root Calculation
Copybara calculates “roots” from glob patterns to determine which directories to query:
| Pattern | Root Queried |
|---|---|
| `src/**/*.py` | `src/` |
| `pkg/api/**` | `pkg/api/` |
| `**/*.java` | `/` (entire repo) |
| `*.md` | `/` (root only) |
Patterns that start with ** have no literal directory prefix, so they force a full repository traversal.
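The mapping in the table can be approximated with one rule: keep the leading path segments up to the first one that contains a wildcard, and never count the final segment. The sketch below illustrates that rule in POSIX shell. glob_root is a hypothetical helper written for this page, not Copybara's implementation, and it prints / for both of the last two table rows because the root alone does not say whether the match is recursive.

```shell
# glob_root PATTERN
# Prints the literal directory prefix of a glob pattern, or "/" when the
# pattern has none. The body runs in a subshell so its variables stay local.
glob_root() (
  rest=$1
  root=""
  while :; do
    case $rest in
      */*) segment=${rest%%/*}; rest=${rest#*/} ;;
      *) break ;;  # final segment: a file name or wildcard, never part of the root
    esac
    case $segment in
      *'*'* | *'?'* | *'['* | *'{'*) break ;;  # first wildcard segment ends the prefix
    esac
    root="$root$segment/"
  done
  echo "${root:-/}"
)

# Print the root for each pattern from the table.
for pattern in 'src/**/*.py' 'pkg/api/**' '**/*.java' '*.md'; do
  printf '%s -> %s\n' "$pattern" "$(glob_root "$pattern")"
done
```

Running a candidate origin_files pattern through a check like this before committing it makes an accidental full-repository root easy to spot.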
Exclude Heavy Directories
If a broad include is unavoidable, always exclude the directories you don't need:

    origin_files = glob(
        include = ["**"],
        exclude = [
            "node_modules/**",
            "vendor/**",
            ".git/**",
            "build/**",
            "dist/**",
            "target/**",
            "**/*.min.js",
            "**/*.map",
        ],
    )

Git Configuration
Shallow Clones
For CI environments, use shallow clones when you only need recent history:

    # Clone with limited depth
    git clone --depth=100 https://github.com/org/repo

    # Then run Copybara with explicit last-rev
    java -jar copybara.jar migrate copy.bara.sky --last-rev HEAD~50

Partial Clone (Sparse Checkout)
For very large monorepos, combine Git's partial clone with a sparse checkout:

    # Clone without blobs initially
    git clone --filter=blob:none https://github.com/org/repo
    cd repo

    # Configure sparse checkout (run inside the clone)
    git sparse-checkout init --cone
    git sparse-checkout set src/component docs/

Reference Repositories
When running multiple Copybara workflows against the same origin, use reference repos:

    # Create a shared reference
    git clone --bare https://github.com/org/repo /shared/repo.git

    # Copybara workflows reference it (reduces network I/O)
    git clone --reference /shared/repo.git https://github.com/org/repo

Workflow Optimization
Limit History Processing
Don't process the entire history on every run:

    # Only process the last 100 commits
    java -jar copybara.jar migrate copy.bara.sky --last-rev HEAD~100

    # Start from a specific known-good commit
    java -jar copybara.jar migrate copy.bara.sky --last-rev abc123def

Use SQUASH Mode for Large Histories
When importing many commits, SQUASH mode is faster than ITERATIVE:

    core.workflow(
        name = "import",
        mode = "SQUASH",  # Combines all changes into one commit
        # ...
    )

ITERATIVE mode processes each commit separately, which is slower but preserves history.
Batch Transformations
Group related transformations to minimize file I/O:

    # Less efficient: Multiple passes over files
    transformations = [
        core.replace(before = "old1", after = "new1", paths = glob(["**/*.java"])),
        core.replace(before = "old2", after = "new2", paths = glob(["**/*.java"])),
        core.replace(before = "old3", after = "new3", paths = glob(["**/*.java"])),
    ]

    # More efficient: Combined patterns where possible
    transformations = [
        core.replace(
            before = "old${n}",
            after = "new${n}",
            paths = glob(["**/*.java"]),
            regex_groups = {"n": "[1-3]"},  # One pass handles old1, old2, and old3
        ),
    ]

CI/CD Optimization
Caching
Cache the Copybara JAR and Git repos between runs:

    # GitHub Actions example
    - uses: actions/cache@v4
      with:
        path: |
          ~/.copybara
          ~/.m2/repository
        key: copybara-${{ hashFiles('copy.bara.sky') }}

Parallel Workflows
Run independent workflows in parallel:

    jobs:
      sync-component-a:
        runs-on: ubuntu-latest
        steps:
          - run: java -jar copybara.jar migrate copy.bara.sky component-a

      sync-component-b:
        runs-on: ubuntu-latest
        steps:
          - run: java -jar copybara.jar migrate copy.bara.sky component-b

Incremental Triggers
Only run Copybara when relevant files change:

    on:
      push:
        paths:
          - "src/**"
          - "docs/**"
          - "copy.bara.sky"

Monitoring & Debugging
Verbose Logging
Enable detailed logging for performance analysis:

    java -jar copybara.jar migrate copy.bara.sky -v

Timing Information
The verbose output includes timing for each phase:
- Git operations (clone, fetch, push)
- File matching
- Transformation execution
- Commit creation
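Per-phase timings come from the verbose output. To compare whole runs, for example before and after a glob change, a wall-clock wrapper is often enough. A minimal sketch, assuming a shell where date +%s is available (Linux and macOS); run_timed is a hypothetical helper name.

```shell
# run_timed LABEL COMMAND [ARGS...]
# Runs the command, prints "LABEL: <seconds>s" to stderr, and returns the
# command's own exit status so CI still fails when the command fails.
run_timed() {
  _rt_label=$1
  shift
  _rt_start=$(date +%s)
  "$@"
  _rt_status=$?
  _rt_end=$(date +%s)
  echo "$_rt_label: $((_rt_end - _rt_start))s" >&2
  return "$_rt_status"
}

# Example (not executed here):
#   run_timed "copybara migrate" java -jar copybara.jar migrate copy.bara.sky -v
```

Logging the total next to the verbose output makes it easier to tell whether a change moved the overall run time or only shifted work between phases.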
Profiling Large Runs
For persistent performance issues, record a JDK Flight Recorder (JFR) profile. On current JDKs, -XX:StartFlightRecording is sufficient; the separate -XX:+FlightRecorder flag is deprecated:

    java -XX:StartFlightRecording=duration=300s,filename=copybara.jfr \
        -jar copybara.jar migrate copy.bara.sky

Analyze the recording with the jfr command or Java Mission Control.
Scaling Patterns
Sharding Large Monorepos
Split large repos into independent workflows:

    # workflow-frontend.bara.sky
    core.workflow(
        name = "frontend",
        origin_files = glob(["frontend/**"]),
        # ...
    )

    # workflow-backend.bara.sky
    core.workflow(
        name = "backend",
        origin_files = glob(["backend/**"]),
        # ...
    )

Rate Limiting
For high-frequency syncs, implement rate limiting:

    # Only sync if the last sync was more than 5 minutes ago.
    # find prints the marker file only if it was modified within the last 5 minutes.
    if [ -n "$(find /tmp/last-sync -mmin -5 2>/dev/null)" ]; then
        echo "Skipping: synced recently"
        exit 0
    fi
    touch /tmp/last-sync
    java -jar copybara.jar migrate copy.bara.sky

Benchmarks
Typical performance on modern hardware (8-core, 32GB RAM):
| Repo Size | Files | First Sync | Incremental |
|---|---|---|---|
| Small | 1-1,000 | ~30s | ~5s |
| Medium | 1k-10k | ~2m | ~15s |
| Large | 10k-100k | ~10m | ~1m |
| Very Large | 100k+ | ~30m+ | ~5m |
Next Steps
- Glob reference - Pattern optimization
- Debugging guide - Troubleshooting slow runs
- CI/CD integration - Automation patterns