Open Source a Project
This guide walks through Copybara’s most common use case: publishing internal code to a public GitHub repository while:
- Filtering out sensitive or internal-only files
- Replacing internal URLs and references
- Verifying no secrets accidentally leak
- Automating ongoing sync
What You’ll Learn
- File filtering with origin_files and globs
- Text replacement with core.replace()
- Safety verification with core.verify_match()
- GitHub destination configuration
- Setting up automated sync
Scenario
You have an internal repository with this structure:

    my-project/
    ├── src/
    │   ├── lib.py
    │   ├── cli.py
    │   └── internal/          # Internal-only code
    │       └── metrics.py
    ├── tests/
    │   └── test_lib.py
    ├── docs/
    │   └── guide.md
    ├── .internal/             # Internal config
    │   └── deploy.yaml
    ├── README.md
    └── INTERNAL_NOTES.md      # Should not be public

Goal: Publish src/, tests/, docs/, and README.md to GitHub, excluding internal files and replacing internal references.
Prerequisites
- Copybara installed (see Installation)
- A GitHub account with a personal access token
- An internal Git repository (can be local or remote)
- A GitHub repository created for the public release
Step 1: Set Up Authentication
Create a GitHub Personal Access Token (PAT) with the repo scope:
- Go to GitHub Settings → Developer settings → Personal access tokens
- Click “Generate new token (classic)”
- Select the repo scope
- Copy the token
Configure Git to use the token:

    # Store credentials (replace YOUR_GITHUB_USERNAME and YOUR_TOKEN)
    git config --global credential.helper store
    echo "https://YOUR_GITHUB_USERNAME:YOUR_TOKEN@github.com" >> ~/.git-credentials

Step 2: Write the Configuration
Create copy.bara.sky:

    # Open Source Workflow Configuration

    # Repository URLs
    internal_repo = "https://github.com/mycompany/internal-project"
    public_repo = "https://github.com/mycompany/public-project"

    core.workflow(
        name = "export",

        # Source: internal repository
        origin = git.origin(
            url = internal_repo,
            ref = "main",
        ),

        # Destination: public GitHub repository
        destination = git.destination(
            url = public_repo,
            fetch = "main",
            push = "main",
        ),

        # Preserve original authors where possible
        authoring = authoring.pass_thru(
            default = "Open Source Bot <oss-bot@mycompany.com>",
        ),

        # Only include these files (exclude internal directories)
        origin_files = glob(
            include = [
                "src/**",
                "tests/**",
                "docs/**",
                "README.md",
                "LICENSE",
                "pyproject.toml",
                "setup.py",
            ],
            exclude = [
                "src/internal/**",
                "**/*_internal.py",
                "**/internal_*",
            ],
        ),

        # Transformations applied in order
        transformations = [
            # Replace internal URLs
            core.replace(
                before = "https://internal.mycompany.com",
                after = "https://github.com/mycompany/public-project",
            ),

            # Replace internal package references
            core.replace(
                before = "from mycompany.internal",
                after = "from myproject",
            ),

            # Replace internal email domain
            core.replace(
                before = "@internal.mycompany.com",
                after = "@mycompany.com",
            ),

            # SAFETY: Verify no internal markers remain
            core.verify_match(
                regex = "INTERNAL|CONFIDENTIAL|DO NOT SHARE|@internal\\.",
                verify_no_match = True,
            ),

            # SAFETY: Verify no hardcoded secrets pattern
            core.verify_match(
                regex = "(?i)(api[_-]?key|secret|password)\\s*=\\s*['\"][^'\"]+['\"]",
                verify_no_match = True,
                paths = glob(["**/*.py", "**/*.yaml", "**/*.yml", "**/*.json"]),
            ),

            # Add export metadata to commit message
            metadata.add_header("Exported from internal repository"),
        ],

        # Combine all changes into one commit
        mode = "SQUASH",
    )

Step 3: Understand the Key Parts
File Filtering with Globs

    origin_files = glob(
        include = ["src/**", "tests/**", "docs/**", "README.md"],
        exclude = ["src/internal/**", "**/*_internal.py"],
    )

- include: Only files matching these patterns are synced
- exclude: Files matching these patterns are never synced, even if they also match include
- ** matches any directory depth
- * matches any characters within a single file or directory name
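
To make the include/exclude interplay concrete, the following Python sketch applies the same patterns to the file list from the Scenario section. It is a simplified model, not Copybara's matcher: the helper names `glob_to_regex` and `select` are hypothetical, and it assumes only the two rules stated above (`**` may cross directories, `*` stays within one path segment). Copybara's real glob handling may differ on edge cases.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Simplified glob model: '**' may cross '/', '*' stays inside one path segment."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # any depth, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything except a path separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

def select(paths, include, exclude=()):
    """Keep paths that match any include pattern and no exclude pattern."""
    inc = [glob_to_regex(p) for p in include]
    exc = [glob_to_regex(p) for p in exclude]
    return [
        path for path in paths
        if any(r.fullmatch(path) for r in inc)
        and not any(r.fullmatch(path) for r in exc)
    ]

# File list taken from the Scenario tree
repo_files = [
    "src/lib.py",
    "src/cli.py",
    "src/internal/metrics.py",
    "tests/test_lib.py",
    "docs/guide.md",
    ".internal/deploy.yaml",
    "README.md",
    "INTERNAL_NOTES.md",
]

published = select(
    repo_files,
    include=["src/**", "tests/**", "docs/**", "README.md"],
    exclude=["src/internal/**", "**/*_internal.py"],
)
print(published)
```

In this model, src/internal/metrics.py matches the src/** include but is dropped by the src/internal/** exclude, while .internal/deploy.yaml and INTERNAL_NOTES.md are never selected because no include pattern matches them.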
Safety Verification

    core.verify_match(
        regex = "INTERNAL|CONFIDENTIAL",
        verify_no_match = True,
    )

With verify_no_match = True, the migration fails if the regex matches the content of any migrated file. This check is essential for preventing accidental leaks.
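
Before relying on these patterns, it can help to try them against sample content. The sketch below uses Python's `re` module as a stand-in for Copybara's regex engine, on the assumption that both treat these particular patterns the same way. The function `find_leaks` and the sample file contents are hypothetical. Unlike the configuration above, it applies both patterns to every file rather than restricting the secrets check by path.

```python
import re

# Patterns copied from the configuration (Starlark escaping removed)
MARKERS = re.compile(r"INTERNAL|CONFIDENTIAL|DO NOT SHARE|@internal\.")
SECRETS = re.compile(r"""(?i)(api[_-]?key|secret|password)\s*=\s*['"][^'"]+['"]""")

def find_leaks(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, matched_text) for each pattern that matches a file's content."""
    hits = []
    for path, text in sorted(files.items()):
        for pattern in (MARKERS, SECRETS):
            match = pattern.search(text)
            if match:
                hits.append((path, match.group(0)))
    return hits

# Hypothetical file contents for illustration
sample = {
    "src/lib.py": 'API_KEY = "abc123"\n',
    "src/cli.py": "print('hello')\n",
    "docs/guide.md": "Contact ops@internal.mycompany.com\n",
    "README.md": "Internal tools are not covered here.\n",
}

leaks = find_leaks(sample)
for path, text in leaks:
    print(f"{path}: {text}")
```

Note that the marker pattern is case-sensitive, so the word "Internal" in the sample README does not trigger it, while the secrets pattern uses (?i) and therefore catches API_KEY as well as api_key.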
Authoring

    authoring = authoring.pass_thru(
        default = "Open Source Bot <oss-bot@mycompany.com>",
    )

- pass_thru: Preserves original commit authors
- default: Used when the original author can’t be determined
Step 4: Test Locally First
Before pushing to GitHub, test with a local folder destination:

    # Temporarily use folder destination for testing
    destination = folder.destination(),

Run with folder output:

    java -jar copybara.jar migrate copy.bara.sky export \
        --folder-dir /tmp/copybara-test \
        --force

Inspect /tmp/copybara-test to verify:
- Only expected files are present
- Internal files are excluded
- Text replacements are correct
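
Part of this inspection can be scripted. The following is a minimal sketch, assuming the scenario's naming conventions: directories named internal or .internal and the file INTERNAL_NOTES.md must not appear in the output. The function `unexpected_paths` and both name sets are hypothetical. The demo builds a throwaway tree so the block runs on its own.

```python
import tempfile
from pathlib import Path

# Names taken from the Scenario section; adjust to your own layout
FORBIDDEN_DIRS = {"internal", ".internal"}
FORBIDDEN_FILES = {"INTERNAL_NOTES.md"}

def unexpected_paths(root: Path) -> list[str]:
    """List files under root that sit in a forbidden directory or have a forbidden name."""
    bad = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        in_forbidden_dir = bool(FORBIDDEN_DIRS & set(rel.parts[:-1]))
        if in_forbidden_dir or rel.name in FORBIDDEN_FILES:
            bad.append(rel.as_posix())
    return bad

# Self-contained demo on a temporary tree that deliberately contains two leaks
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["src/lib.py", "src/internal/metrics.py", "README.md", "INTERNAL_NOTES.md"]:
        target = root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n")
    demo = unexpected_paths(root)

print(demo)
```

To check a real test run, call unexpected_paths(Path("/tmp/copybara-test")) and treat a non-empty result as a failure.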
Step 5: Run the Migration
Once verified, switch the destination back to git.destination and run the actual migration:

    java -jar copybara.jar migrate copy.bara.sky export --force

The --force flag is needed for the initial migration.
Step 6: Set Up Automated Sync
For ongoing synchronization, create a GitHub Actions workflow:

    name: Sync to Public Repo

    on:
      push:
        branches: [main]
        paths:
          - "src/**"
          - "tests/**"
          - "docs/**"
          - "README.md"
      workflow_dispatch:

    jobs:
      sync:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
            with:
              fetch-depth: 0

          - name: Set up Java
            uses: actions/setup-java@v4
            with:
              distribution: temurin
              java-version: "21"

          - name: Download Copybara
            run: |
              curl -fsSL -o copybara.jar \
                https://github.com/google/copybara/releases/latest/download/copybara_deploy.jar

          - name: Configure Git
            run: |
              git config --global user.name "github-actions[bot]"
              git config --global user.email "github-actions[bot]@users.noreply.github.com"
              git config --global credential.helper store
              echo "https://x-access-token:${{ secrets.PUBLIC_REPO_TOKEN }}@github.com" >> ~/.git-credentials

          - name: Run Copybara
            run: |
              java -jar copybara.jar migrate copy.bara.sky export --ignore-noop

Complete Example
Here’s the full workflow showing the sync process:
Adding More Transformations
Rename Files or Directories

    transformations = [
        # Move internal path to standard path
        core.move("src/mycompany/", "src/myproject/"),
    ]

Remove Internal Comments

    transformations = [
        # Remove lines containing internal comments
        core.replace(
            before = "# INTERNAL:${content}\n",
            after = "",
            regex_groups = {"content": ".*"},
            multiline = True,
        ),
    ]

Add License Headers

    transformations = [
        # Handled via core.transform or custom transformations
        # See Transformations docs for details
    ]

Troubleshooting
“verify_match found matches”
The safety check found content that shouldn’t be public. Check the error message for which file and pattern matched, then either:
- Remove the content from source
- Add the file to exclude in origin_files
- Add a transformation to remove/replace it
“Nothing to migrate”
No new changes since last sync. This is normal; use --ignore-noop in CI so the job does not fail.
“Authentication failed”
Check that your token has the correct permissions and hasn’t expired. See Authentication.
Next Steps
- Transformations Overview - All available transformations
- GitHub Integration - GitHub-specific features
- GitHub Actions Setup - CI/CD automation
- SQUASH vs ITERATIVE - Choosing a workflow mode