Filtering & Verification
Control which files are synced and verify that no sensitive content is leaked.
core.remove
Remove files matching a pattern:

    core.remove(glob(["**/internal/**"]))

Examples

    # Remove internal directories
    core.remove(glob(["**/internal/**"]))

    # Remove specific file types
    core.remove(glob(["**/*.bak", "**/*.tmp"]))

    # Remove test files
    core.remove(glob(["**/*_test.go", "**/test_*.py"]))

    # Remove build artifacts
    core.remove(glob([
        "**/node_modules/**",
        "**/dist/**",
        "**/__pycache__/**",
    ]))

origin_files vs core.remove
Two ways to exclude files:

origin_files (Preferred)

Files never leave the origin:

    core.workflow(
        origin_files = glob(
            include = ["**"],
            exclude = ["**/internal/**"],
        ),
        ...
    )

core.remove

Files are read, then removed:

    transformations = [
        core.remove(glob(["**/internal/**"])),
    ]

destination_files
Control what Copybara can modify in the destination:

    core.workflow(
        # Only modify files under src/
        destination_files = glob(["src/**"]),
        ...
    )

Protect External Files

    # Don't touch manually-managed files
    destination_files = glob(
        include = ["**"],
        exclude = [
            "README.md",        # External readme
            "CONTRIBUTING.md",  # External contribution guide
            ".github/**",       # External CI/CD
        ],
    )

core.verify_match
Verify that a pattern does or does not appear in the synced files:
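The two modes can be modeled outside Copybara. The sketch below is plain Python, not Copybara code: `verify_match` here is a hypothetical helper that imitates the behavior described in this section, and it assumes the check is applied to each file separately. Python's `re` engine also differs in places from the regex engine Copybara uses, so treat this as an illustration of the logic only.

```python
import re


def verify_match(files, regex, verify_no_match=False):
    """Hypothetical model of the check: return the paths that fail it.

    files maps a path to that file's text content.
    """
    pattern = re.compile(regex)
    failures = []
    for path, text in files.items():
        found = pattern.search(text) is not None
        # verify_no_match=True:  a hit is a failure.
        # verify_no_match=False: a miss is a failure.
        if found == verify_no_match:
            failures.append(path)
    return failures


# Made-up sample files for the demonstration.
files = {
    "A.java": "// Copyright 2024 Google\nclass A {}\n",
    "B.java": "class B { String API_KEY = \"x\"; }\n",
}

missing_header = verify_match(files, r"Copyright.*Google")
leaks = verify_match(files, r"API_KEY|SECRET|PASSWORD", verify_no_match=True)

print(missing_header)  # ['B.java']
print(leaks)           # ['B.java']
```

In this model the same file fails both checks for opposite reasons: it lacks the required header and it contains a forbidden token.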
Verify No Secrets
    core.verify_match(
        regex = "API_KEY|SECRET|PASSWORD",
        verify_no_match = True,  # Fail if pattern is found
    )

Verify Required Content

    core.verify_match(
        regex = "Copyright.*Google",
        paths = glob(["**/*.java"]),
        # verify_no_match defaults to False
        # Fails if pattern is NOT found
    )

Multiple Patterns

    transformations = [
        # Check for various secret patterns
        core.verify_match(
            regex = "INTERNAL_SECRET",
            verify_no_match = True,
        ),
        core.verify_match(
            regex = "@internal\\.corp\\.com",
            verify_no_match = True,
        ),
        core.verify_match(
            regex = "api-key-[a-z0-9]+",
            verify_no_match = True,
        ),
    ]

Common Secret Patterns
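    # Comprehensive secret detection
    # The pattern is built from single-line pieces: whitespace and newlines
    # inside a regex string are matched literally, so the alternatives must
    # not be indented across lines of one multi-line string.
    core.verify_match(
        regex = "(?i)(api[_-]?key|secret[_-]?key|password|credential|" +
                "private[_-]?key|access[_-]?token|auth[_-]?token)" +
                "[\"']?\\s*[:=]\\s*[\"'][^\"']+[\"']",
        verify_no_match = True,
    )

Combining Filters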
Use multiple filtering mechanisms together:

    core.workflow(
        # Step 1: Don't read internal files
        origin_files = glob(
            include = ["**"],
            exclude = ["**/internal/**"],
        ),

        # Step 2: Don't touch external files
        destination_files = glob(
            include = ["**"],
            exclude = ["README.md", ".github/**"],
        ),

        transformations = [
            # Step 3: Remove any remaining sensitive files
            core.remove(glob(["**/*.secret", "**/credentials.*"])),

            # Step 4: Remove sensitive content
            core.replace(
                before = "// SECRET: ${content}\n",
                after = "",
                regex_groups = {"content": ".*"},
            ),

            # Step 5: Verify nothing slipped through
            core.verify_match(
                regex = "INTERNAL|SECRET|CONFIDENTIAL",
                verify_no_match = True,
            ),
        ],
    )

Error Messages
When a verify_no_match check fails:

    WARN: Pattern 'API_KEY' found in file(s):
      src/config.py:42: API_KEY = "sk-..."

    Verification failed: Pattern should not match but did.

Testing Filters
Validate your filters locally:

    # Preview what would be synced
    java -jar copybara.jar migrate copy.bara.sky workflow \
      --folder-destination /tmp/output

    # Search for sensitive patterns
    grep -r "internal\|secret\|api.key" /tmp/output
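The search step can also be scripted so that it reports file and line number and can be reused in a pre-sync check. The following is a minimal sketch in plain Python, independent of Copybara: `scan_tree` and `SENSITIVE` are hypothetical names, the pattern simply mirrors the grep expression (made case-insensitive here), and the demo runs against a throwaway directory rather than a real preview folder.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern, mirroring the grep expression; adjust to your own rules.
SENSITIVE = re.compile(r"internal|secret|api.key", re.IGNORECASE)


def scan_tree(root, pattern=SENSITIVE):
    """Return (relative path, line number, line) for every matching line."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps undecodable (binary) files from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, number, line.strip()))
    return hits


# Demo on a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "config.py").write_text('API_KEY = "placeholder"\n', encoding="utf-8")
    (root / "README.md").write_text("Public docs only.\n", encoding="utf-8")
    hits = scan_tree(root)

for rel, number, line in hits:
    print(f"{rel}:{number}: {line}")
# src/config.py:1: API_KEY = "placeholder"
```

To check a real preview, call `scan_tree` on the folder the preview was written to (`/tmp/output` in the command above) and treat a non-empty result as a failure.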