Sync Documentation via Pull Requests
This guide shows you how to automatically sync documentation from an internal repository to a public docs site, creating Pull Requests for review instead of pushing directly.
This is ideal when:
- You want changes reviewed before they go live
- You don’t have (or don’t want) direct push access
- You need an audit trail of all sync operations
- Multiple teams need to approve documentation changes
What You’ll Build
A workflow that:
- Watches for changes in your internal docs/ folder
- Filters out internal-only content
- Transforms internal references to public ones
- Creates a Pull Request on your public docs repository
- Runs automatically via GitHub Actions
Scenario
You have this structure in your internal repository:
internal-repo/
├── src/                      Application code
│   └── …
├── docs/
│   ├── public/               Customer-facing docs
│   │   ├── getting-started.md
│   │   ├── api-reference.md
│   │   └── images/
│   │       └── diagram.png
│   └── internal/             Internal-only docs (won’t be synced)
│       ├── architecture.md
│       └── runbooks.md
└── copy.bara.sky             Copybara config
And a separate public documentation repository:
public-docs/
├── getting-started.md        Synced from internal
├── api-reference.md          Synced from internal
├── images/
│   └── diagram.png
├── README.md                 Not managed by Copybara
└── CONTRIBUTING.md           Not managed by Copybara
Prerequisites
- Two GitHub repositories: source (internal) and destination (public docs)
- GitHub Personal Access Token with repo access to the destination
- Java 11+ and Git installed
- Copybara JAR downloaded
Step 1: Create a GitHub Personal Access Token
You need a token that can create PRs on your destination repository. Use either a fine-grained token or a classic token.
Fine-grained token:
- Go to GitHub Settings → Developer settings → Personal access tokens → Fine-grained tokens
- Click Generate new token
- Configure:
  - Token name: copybara-docs-sync
  - Expiration: Choose based on your needs
  - Repository access: Select “Only select repositories” → choose your public docs repo
  - Permissions:
    - Contents: Read and write
    - Pull requests: Read and write
- Click Generate token and copy it
Classic token:
- Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
- Click Generate new token (classic)
- Select the repo scope
- Click Generate token and copy it
Step 2: Set Up Your Source Documentation
In your internal repository, organize your docs with clear separation:
Create docs/public/getting-started.md:
    ---
    title: Getting Started
    ---

    # Getting Started with Our Product

    Welcome to the official documentation!

    ## Installation

    Download from https://internal.company.com/downloads

    <!-- INTERNAL: Requires VPN access for internal users -->

    ## Quick Start

    Run the following command:

    ```bash
    our-cli init
    ```

    <!-- BEGIN INTERNAL -->
    ### Debug Mode (Internal Only)

    For internal testing, use:

    ```bash
    our-cli init --debug --internal-api
    ```
    <!-- END INTERNAL -->

    ## Next Steps

    - Read the [API Reference](./api-reference.md)
    - Contact support@internal.company.com for help
Notice the markers:
| Marker | Purpose |
|---|---|
| <!-- INTERNAL: ... --> | Single-line internal notes (will be removed) |
| <!-- BEGIN INTERNAL --> … <!-- END INTERNAL --> | Multi-line internal sections (will be removed) |
These let you keep internal notes in your source while automatically stripping them during sync.
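To preview what the sync would strip before running Copybara, the marker rules can be approximated locally. The following is a minimal Python sketch, not part of Copybara: `strip_internal` is a hypothetical helper, and its regular expressions are assumed equivalents of the two marker patterns described above.

```python
import re

# Assumed equivalents of the two marker patterns used in this guide.
SINGLE_LINE = re.compile(r"<!-- INTERNAL: [^>]* -->")
MULTI_LINE = re.compile(r"<!-- BEGIN INTERNAL -->[\s\S]*?<!-- END INTERNAL -->")


def strip_internal(markdown: str) -> str:
    """Remove both kinds of internal markers, then collapse extra blank lines."""
    text = SINGLE_LINE.sub("", markdown)
    text = MULTI_LINE.sub("", text)
    # Collapse runs of three or more newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text)


sample = (
    "Public intro.\n\n"
    "<!-- INTERNAL: Requires VPN access -->\n\n"
    "<!-- BEGIN INTERNAL -->\nSecret notes.\n<!-- END INTERNAL -->\n\n"
    "Public outro.\n"
)
print(strip_internal(sample))
```

The non-greedy `[\s\S]*?` matters when a file contains several internal sections: a greedy match would also remove the public text between the first BEGIN marker and the last END marker.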
Step 3: Write the Copybara Configuration
Create copy.bara.sky in your internal repository root:
    # Documentation Sync Configuration
    # Syncs docs/public/ to the public documentation repository via PR

    INTERNAL_REPO = "https://github.com/YOUR_ORG/internal-repo"
    PUBLIC_DOCS_REPO = "https://github.com/YOUR_ORG/public-docs"

    core.workflow(
        name = "sync-docs",

        # Read from the internal repository
        origin = git.github_origin(
            url = INTERNAL_REPO,
            ref = "main",
        ),

        # Create a PR on the public docs repository
        destination = git.github_pr_destination(
            url = PUBLIC_DOCS_REPO,
            destination_ref = "main",

            # Branch name for the PR (includes commit SHA for uniqueness)
            pr_branch = "copybara/docs-sync-${CONTEXT_REFERENCE}",

            # PR title and description
            title = "docs: sync from internal repository",
            body = """\
    ## Automated Documentation Sync

    This PR was automatically created by Copybara to sync documentation changes.

    ### What's included
    - All changes from `docs/public/` in the internal repository
    - Internal-only content has been automatically removed
    - Internal URLs have been replaced with public URLs

    ### Source
    Commit: `${COPYBARA_CONTEXT_REFERENCE}`

    ---
    *Please review the changes and merge when ready.*
    """,
            # Update PR description if we push new changes
            update_description = True,
        ),

        # Only sync the public documentation folder
        origin_files = glob(
            include = ["docs/public/**"],
            exclude = [
                "**/*.draft.md",
                "**/*.draft.mdx",
                "**/INTERNAL_*.md",
            ],
        ),

        # Don't overwrite these files in the destination
        # (they're manually maintained in the public repo)
        destination_files = glob(
            include = ["**"],
            exclude = [
                "README.md",
                "CONTRIBUTING.md",
                "LICENSE",
                ".github/**",
                "CNAME",
            ],
        ),

        # Preserve original authors, with fallback for automated commits
        authoring = authoring.pass_thru(
            default = "Documentation Bot <docs-bot@your-company.com>",
        ),

        # Transformations applied in order
        transformations = [
            # 1. Flatten the directory structure
            #    docs/public/getting-started.md → getting-started.md
            core.move("docs/public/", ""),

            # 2. Replace internal URLs with public ones
            core.replace(
                before = "https://internal.company.com",
                after = "https://docs.your-company.com",
                paths = glob(["**/*.md", "**/*.mdx"]),
            ),

            # 3. Replace internal email domains
            core.replace(
                before = "@internal.company.com",
                after = "@your-company.com",
                paths = glob(["**/*.md", "**/*.mdx"]),
            ),

            # 4. Remove single-line internal comments
            #    <!-- INTERNAL: any text here -->
            core.replace(
                before = "<!-- INTERNAL: ${content} -->",
                after = "",
                regex_groups = {"content": "[^>]*"},
                paths = glob(["**/*.md", "**/*.mdx"]),
            ),

            # 5. Remove multi-line internal sections
            #    <!-- BEGIN INTERNAL -->
            #    ... anything here ...
            #    <!-- END INTERNAL -->
            core.replace(
                before = "<!-- BEGIN INTERNAL -->${content}<!-- END INTERNAL -->",
                after = "",
                regex_groups = {"content": "[\\s\\S]*?"},
                multiline = True,
                paths = glob(["**/*.md", "**/*.mdx"]),
            ),

            # 6. Clean up extra blank lines left by removals
            #    (the pattern spans several lines, so multiline must be enabled)
            core.replace(
                before = "\n\n\n",
                after = "\n\n",
                multiline = True,
                paths = glob(["**/*.md", "**/*.mdx"]),
            ),

            # 7. SAFETY: Verify no internal content leaked through
            core.verify_match(
                regex = "INTERNAL|CONFIDENTIAL|internal\\.company\\.com|@internal\\.",
                verify_no_match = True,
                paths = glob(["**/*.md", "**/*.mdx"]),
            ),

            # 8. Add sync metadata to commit message
            metadata.squash_notes(
                prefix = "Documentation sync:\n\n",
                show_description = True,
                show_author = True,
                oldest_first = True,
            ),
        ],

        # Combine all changes into one commit
        mode = "SQUASH",
    )
Understanding Key Parts
git.github_pr_destination
This is what makes Copybara create a PR instead of pushing directly:
    destination = git.github_pr_destination(
        url = PUBLIC_DOCS_REPO,
        destination_ref = "main",  # Target branch for the PR
        pr_branch = "copybara/docs-sync-${CONTEXT_REFERENCE}",
        title = "docs: sync from internal",
        body = "...",
        update_description = True,  # Update PR description on re-runs
    )
destination_files
Protects files that exist only in the public repo:
    destination_files = glob(
        include = ["**"],
        exclude = ["README.md", "CONTRIBUTING.md", ".github/**"],
    )
Files matched by exclude are left untouched: Copybara neither overwrites nor deletes them, even though they don’t exist in the source.
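The effect of destination_files on deletions can be modeled in a few lines. This Python sketch is an illustration only and is not how Copybara is implemented: the file lists are hypothetical, and `fnmatchcase` merely approximates Copybara's glob matching (its `*` also crosses `/`, which Copybara's does not).

```python
from fnmatch import fnmatchcase

# Exclude list copied from the destination_files example above.
EXCLUDE = ["README.md", "CONTRIBUTING.md", ".github/**"]


def is_managed(path: str) -> bool:
    """True if the path falls under include=["**"] and hits no exclude pattern."""
    return not any(fnmatchcase(path, pattern) for pattern in EXCLUDE)


def files_to_delete(destination: set[str], synced: set[str]) -> set[str]:
    """Managed destination files that are absent from the synced output."""
    return {p for p in destination if is_managed(p) and p not in synced}


# Hypothetical repository contents, for illustration only.
destination = {"README.md", ".github/workflows/ci.yml", "old-page.md", "api-reference.md"}
synced = {"getting-started.md", "api-reference.md"}
print(sorted(files_to_delete(destination, synced)))
```

In this model only old-page.md is removed: it is managed by the sync but no longer present in the source, while README.md and the workflow file are protected by the exclude list.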
core.verify_match
The safety net: it fails the sync if internal content would leak.
    core.verify_match(
        regex = "INTERNAL|CONFIDENTIAL|internal\\.company\\.com",
        verify_no_match = True,
    )
Step 4: Test Locally
Before automating, test the sync locally:
    # Clone your internal repo
    git clone https://github.com/YOUR_ORG/internal-repo
    cd internal-repo

    # Set up Git credentials (use your PAT)
    # Note: the "store" helper saves the token in plaintext in ~/.git-credentials
    git config --global credential.helper store
    echo "https://YOUR_USERNAME:YOUR_PAT@github.com" >> ~/.git-credentials

    # Run Copybara (first run needs --force)
    java -jar copybara.jar migrate copy.bara.sky sync-docs --force
Check GitHub: you should see a new PR on your public docs repository.
Step 5: Set Up GitHub Actions
Automate the sync to run whenever docs change.
Create .github/workflows/sync-docs.yml in your internal repository:
    name: Sync Documentation

    on:
      push:
        branches: [main]
        paths:
          - "docs/public/**"
          - "copy.bara.sky"
      workflow_dispatch: # Allow manual trigger

    jobs:
      sync:
        name: Sync to Public Docs
        runs-on: ubuntu-latest

        steps:
          - name: Checkout repository
            uses: actions/checkout@v4
            with:
              fetch-depth: 0 # Full history needed for Copybara

          - name: Install D2 (for diagrams)
            run: curl -fsSL https://d2lang.com/install.sh | sh -s --

          - name: Set up Java
            uses: actions/setup-java@v4
            with:
              distribution: temurin
              java-version: "21"

          - name: Download Copybara
            run: |
              curl -fsSL -o copybara.jar \
                https://github.com/google/copybara/releases/latest/download/copybara_deploy.jar

          - name: Configure Git credentials
            run: |
              git config --global user.name "github-actions[bot]"
              git config --global user.email "github-actions[bot]@users.noreply.github.com"
              git config --global credential.helper store
              echo "https://x-access-token:${{ secrets.DOCS_SYNC_TOKEN }}@github.com" >> ~/.git-credentials

          - name: Run Copybara
            run: |
              java -jar copybara.jar migrate copy.bara.sky sync-docs --ignore-noop
Add the Secret
- Go to your internal repository → Settings → Secrets and variables → Actions
- Click New repository secret
- Name: DOCS_SYNC_TOKEN
- Value: Paste your GitHub PAT from Step 1
Step 6: The PR Workflow
Here’s what happens to the sync PR when documentation changes.
PR Lifecycle
| Scenario | What Happens |
|---|---|
| First sync | New PR is created |
| More changes (PR still open) | PR branch is updated (force push) |
| PR was merged | New PR is created for new changes |
| PR was closed without merging | New PR is created |
Marking Internal Content
Use these patterns in your documentation:
Single-Line Internal Notes
    This is public content.

    <!-- INTERNAL: Remember to update the staging server first -->

    More public content.
After sync:
    This is public content.

    More public content.
Multi-Line Internal Sections
    ## Public Feature

    Public description here.

    <!-- BEGIN INTERNAL -->
    ### Internal Implementation Notes

    This entire section is stripped during sync.

    - Internal detail 1
    - Internal detail 2
    <!-- END INTERNAL -->

    ## Another Public Section
After sync:
    ## Public Feature

    Public description here.

    ## Another Public Section
Internal-Only Files
Any file matching these patterns is completely excluded:
- *.draft.md - Draft documents
- INTERNAL_*.md - Files prefixed with INTERNAL_
- Anything else listed in the exclude patterns of origin_files
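To check ahead of time which files the exclude patterns would drop, the file-name rules can be approximated in Python. This is a rough sketch under stated assumptions: `is_synced` is a hypothetical helper, it matches on the file name only (which corresponds to the `**/` prefix used in the configuration for files under docs/public/), and the candidate paths are made up.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name patterns taken from the origin_files exclude list in this guide.
EXCLUDED_NAMES = ["*.draft.md", "*.draft.mdx", "INTERNAL_*.md"]


def is_synced(path: str) -> bool:
    """True if the path is under docs/public/ and its name hits no exclude."""
    p = PurePosixPath(path)
    if p.parts[:2] != ("docs", "public"):
        return False
    return not any(fnmatchcase(p.name, pattern) for pattern in EXCLUDED_NAMES)


# Hypothetical paths, for illustration only.
for candidate in [
    "docs/public/getting-started.md",
    "docs/public/roadmap.draft.md",
    "docs/public/INTERNAL_notes.md",
    "docs/internal/runbooks.md",
]:
    print(candidate, "->", "synced" if is_synced(candidate) else "skipped")
```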
Handling Images and Assets
Images in docs/public/images/ are synced along with the Markdown, because the docs/public/** pattern matches every file under that folder. To restrict the sync to specific file types, list them explicitly:
    origin_files = glob(
        include = [
            "docs/public/**/*.md",
            "docs/public/**/*.mdx",
            "docs/public/**/*.png",
            "docs/public/**/*.jpg",
            "docs/public/**/*.gif",
            "docs/public/**/*.svg",
        ],
        exclude = [...],
    )
If your images use absolute paths, add a transformation:
    # Update image paths after flattening
    core.replace(
        before = "](/docs/public/images/",
        after = "](/images/",
        paths = glob(["**/*.md"]),
    )
Troubleshooting
“Cannot create pull request”
Cause: Token doesn’t have PR permissions.
Fix: Ensure your token has the Pull requests: Read and write permission (fine-grained token) or the repo scope (classic token) for the destination repository.
“verify_match found matches”
Cause: Internal content would leak to public.
Fix: Check which file triggered it:
    java -jar copybara.jar migrate copy.bara.sky sync-docs --force 2>&1 | grep -A5 "verify_match"
Then either:
- Remove the internal content from the source
- Wrap it in <!-- BEGIN INTERNAL --> … <!-- END INTERNAL --> markers
- Add the file to the exclude patterns
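When the Copybara output is hard to read, a small local scan can point to the offending lines. The Python sketch below is an assumption-laden aid rather than a Copybara feature: `find_leaks` is a hypothetical helper that reuses the verify_match regex from this guide, and it scans the untransformed source, so lines inside markers that the sync would strip are reported too.

```python
import re
from pathlib import Path

# Same pattern as the core.verify_match step in this guide.
LEAK_PATTERN = re.compile(r"INTERNAL|CONFIDENTIAL|internal\.company\.com|@internal\.")


def find_leaks(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each matching line in .md/.mdx files."""
    hits = []
    if not root.is_dir():
        return hits
    for path in sorted(root.rglob("*")):
        if path.suffix not in {".md", ".mdx"} or not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if LEAK_PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumes the script is run from the root of the internal repository.
    for rel_path, number, line in find_leaks(Path("docs/public")):
        print(f"{rel_path}:{number}: {line}")
```

Hits on URLs or email addresses are harmless if the replace transformations rewrite them; hits on other text are the ones to remove, wrap in markers, or exclude.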
“Nothing to migrate”
Cause: No new changes since last sync.
Fix: This is expected. Pass --ignore-noop in CI so the job doesn’t fail:
    java -jar copybara.jar migrate copy.bara.sky sync-docs --ignore-noop
“Branch already exists”
Cause: PR branch exists from a previous sync.
Fix: This is expected. Copybara will update the existing branch. If you want a fresh start, delete the branch from the public docs repository:
    # Run in a clone of the public docs repository, where the PR branch lives
    git push origin --delete copybara/docs-sync-abc1234
PR shows more changes than expected
Cause: destination_files might not be excluding manually-maintained files.
Fix: Add those files to the exclude list:
    destination_files = glob(
        include = ["**"],
        exclude = [
            "README.md",
            "CONTRIBUTING.md",
            "YOUR_MANUAL_FILE.md",  # Add this
        ],
    )
Complete Working Example
Here’s the full configuration with all pieces:
    INTERNAL = "https://github.com/acme/internal-monorepo"
    PUBLIC_DOCS = "https://github.com/acme/developer-docs"

    core.workflow(
        name = "sync-docs",

        origin = git.github_origin(
            url = INTERNAL,
            ref = "main",
        ),

        destination = git.github_pr_destination(
            url = PUBLIC_DOCS,
            destination_ref = "main",
            pr_branch = "copybara/docs-${CONTEXT_REFERENCE}",
            title = "docs: automated sync from internal",
            body = """\
    ## Documentation Sync

    Automated sync from internal repository.

    **Commit:** `${COPYBARA_CONTEXT_REFERENCE}`

    Please review and merge when ready.
    """,
            update_description = True,
        ),

        origin_files = glob(
            include = ["docs/public/**"],
            exclude = ["**/*.draft.md", "**/INTERNAL_*"],
        ),

        destination_files = glob(
            include = ["**"],
            exclude = ["README.md", "CONTRIBUTING.md", ".github/**", "CNAME"],
        ),

        authoring = authoring.pass_thru(
            default = "Docs Bot <docs@acme.com>"
        ),

        transformations = [
            core.move("docs/public/", ""),
            # Rewrite the email domain first: the URL replacement below would
            # otherwise turn "@internal.acme.com" into "@docs.acme.com".
            core.replace("@internal.acme.com", "@acme.com"),
            core.replace("internal.acme.com", "docs.acme.com"),
            core.replace(
                before = "<!-- INTERNAL: ${note} -->",
                after = "",
                regex_groups = {"note": "[^>]*"},
            ),
            core.replace(
                before = "<!-- BEGIN INTERNAL -->${content}<!-- END INTERNAL -->",
                after = "",
                regex_groups = {"content": "[\\s\\S]*?"},
                multiline = True,
            ),
            core.verify_match(
                regex = "INTERNAL|internal\\.acme\\.com",
                verify_no_match = True,
            ),
        ],

        mode = "SQUASH",
    )
What’s Next?
- CLI Reference - All Copybara commands and flags
- Transformations - More ways to modify content
- GitHub Integration - Advanced GitHub features
- CHANGE_REQUEST Mode - Deep dive into PR workflows