Filtering & Verification

Control which files are synced and verify no sensitive content is leaked.

Remove files matching a pattern:

# Remove internal directories
core.remove(glob(["**/internal/**"]))

# Remove specific file types
core.remove(glob(["**/*.bak", "**/*.tmp"]))

# Remove test files
core.remove(glob(["**/*_test.go", "**/test_*.py"]))

# Remove build artifacts
core.remove(glob([
    "**/node_modules/**",
    "**/dist/**",
    "**/__pycache__/**",
]))

There are two ways to exclude files:

Exclude at the origin, so the files never leave it:

core.workflow(
    origin_files = glob(
        include = ["**"],
        exclude = ["**/internal/**"],
    ),
    ...
)

Remove in a transformation, so the files are read first and then dropped:

transformations = [
    core.remove(glob(["**/internal/**"])),
]

Control what Copybara can modify in the destination:

core.workflow(
    # Only modify files under src/
    destination_files = glob(["src/**"]),
    ...
)

# Don't touch manually-managed files
destination_files = glob(
    include = ["**"],
    exclude = [
        "README.md",        # External readme
        "CONTRIBUTING.md",  # External contribution guide
        ".github/**",       # External CI/CD
    ],
)

Verify that a pattern is absent, or that it is present:

# Pattern must be absent
core.verify_match(
    regex = "API_KEY|SECRET|PASSWORD",
    verify_no_match = True,  # Fail if the pattern is found
)

# Pattern must be present
core.verify_match(
    regex = "Copyright.*Google",
    paths = glob(["**/*.java"]),
    # verify_no_match defaults to False,
    # so this fails if the pattern is NOT found
)
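
The two modes can be modelled locally to sanity-check a pattern before it goes into a workflow. The sketch below is plain Python, not Copybara code: `failing_files` is a hypothetical helper, it assumes the check is applied to each file separately, and Python's `re` module stands in for Copybara's regex engine, so unusual syntax may behave differently.

```python
import re


def failing_files(files, regex, verify_no_match=False):
    """Return the paths that would fail a verify_match-style check.

    `files` maps a path to its text content. Assumption: each file is
    checked on its own, and a file fails when "pattern found" has the
    same truth value as `verify_no_match`.
    """
    pattern = re.compile(regex)
    failures = []
    for path, text in sorted(files.items()):
        found = pattern.search(text) is not None
        if found == verify_no_match:
            failures.append(path)
    return failures


files = {
    "src/A.java": "// Copyright 2024 Google LLC\nclass A {}\n",
    "src/B.java": "class B { String API_KEY = \"x\"; }\n",
}

# Pattern must be absent: B.java fails because it contains API_KEY.
print(failing_files(files, "API_KEY|SECRET|PASSWORD", verify_no_match=True))

# Pattern must be present: B.java fails because it has no copyright line.
print(failing_files(files, "Copyright.*Google"))
```

Both calls report `src/B.java`, for opposite reasons, which is the distinction the `verify_no_match` flag controls.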

transformations = [
    # Check for various secret patterns
    core.verify_match(
        regex = "INTERNAL_SECRET",
        verify_no_match = True,
    ),
    core.verify_match(
        regex = "@internal\\.corp\\.com",
        verify_no_match = True,
    ),
    core.verify_match(
        regex = "api-key-[a-z0-9]+",
        verify_no_match = True,
    ),
]
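
# Comprehensive secret detection
# The pattern is built from single-line pieces: whitespace and newlines
# inside a regex string are matched literally, so the alternation must
# not be spread across lines within one string.
core.verify_match(
    regex = "(?i)(api[_-]?key|secret[_-]?key|password|credential|" +
            "private[_-]?key|access[_-]?token|auth[_-]?token)" +
            "[\"']?\\s*[:=]\\s*[\"'][^\"']+[\"']",
    verify_no_match = True,
)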
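
A broad pattern like this is easy to get subtly wrong, so it can help to try it on a few sample lines first. The following is a minimal sketch in plain Python, with the same alternation written on one line. The sample strings are invented for illustration, and Python's `re` is only an approximation of the engine Copybara uses.

```python
import re

# The secret-detection pattern, written as Python raw strings.
SECRET_ASSIGNMENT = re.compile(
    r"""(?i)(api[_-]?key|secret[_-]?key|password|credential|"""
    r"""private[_-]?key|access[_-]?token|auth[_-]?token)"""
    r"""["']?\s*[:=]\s*["'][^"']+["']"""
)

samples = [
    'API_KEY = "sk-123"',               # literal assigned to a key name
    "{'password': 'hunter2'}",          # quoted key in a dict or JSON
    'api_key = os.environ["API_KEY"]',  # lookup, no literal value
    "password = read_password()",       # function call, no literal value
]

flagged = [s for s in samples if SECRET_ASSIGNMENT.search(s)]
for line in flagged:
    print("would fail verification:", line)
```

Only the first two samples are flagged. The pattern requires a quoted, non-empty value right after the `:` or `=`, so environment lookups and function calls pass.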

Use multiple filtering mechanisms together:

core.workflow(
    # Step 1: Don't read internal files
    origin_files = glob(
        include = ["**"],
        exclude = ["**/internal/**"],
    ),
    # Step 2: Don't touch external files
    destination_files = glob(
        include = ["**"],
        exclude = ["README.md", ".github/**"],
    ),
    transformations = [
        # Step 3: Remove any remaining sensitive files
        core.remove(glob(["**/*.secret", "**/credentials.*"])),
        # Step 4: Remove sensitive content
        core.replace(
            before = "// SECRET: ${content}\n",
            after = "",
            regex_groups = {"content": ".*"},
            # The pattern ends with a newline, so match across line boundaries
            multiline = True,
        ),
        # Step 5: Verify nothing slipped through
        core.verify_match(
            regex = "INTERNAL|SECRET|CONFIDENTIAL",
            verify_no_match = True,
        ),
    ],
)
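
The ordering of Steps 4 and 5 matters: the final check only sees what the earlier steps left behind. As a rough local approximation in plain Python (the names `scrub` and `leftovers` are hypothetical, and the regexes are hand-translated from the config above), the interaction looks like this:

```python
import re

# Hand-written stand-ins for Step 4 and Step 5; not Copybara code.
SECRET_COMMENT = re.compile(r"// SECRET: .*\n")
FORBIDDEN = re.compile(r"INTERNAL|SECRET|CONFIDENTIAL")


def scrub(text):
    """Step 4: drop each '// SECRET: ...' comment through its newline."""
    return SECRET_COMMENT.sub("", text)


def leftovers(text):
    """Step 5: return the forbidden words that are still present."""
    return FORBIDDEN.findall(text)


source = (
    "int retries = 3;\n"
    "// SECRET: rotate before launch\n"
    "String mode = \"INTERNAL\";\n"
)

cleaned = scrub(source)
print(cleaned)
print(leftovers(cleaned))
```

The comment line is removed, but the `INTERNAL` string literal survives the scrub, so the final check still reports it. That is the reason for keeping a verification step last rather than relying on the removals alone.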

When verify_no_match fails:

WARN:
Pattern 'API_KEY' found in file(s):
src/config.py:42: API_KEY = "sk-..."
Verification failed: Pattern should not match but did.

Validate your filters locally:

# Preview what would be synced
# (--folder-dir applies when the workflow's destination is folder.destination())
java -jar copybara.jar migrate copy.bara.sky workflow \
    --folder-dir /tmp/output
# Search for sensitive patterns
grep -r "internal\|secret\|api.key" /tmp/output
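
Where `grep` is unavailable, or the check needs to run as part of a script, the same search can be sketched in Python. This is an illustrative helper, not part of Copybara: `scan` is a hypothetical name, the demo files are created in a temporary directory purely for the example, and unlike the `grep` command above it matches case-insensitively.

```python
import re
import tempfile
from pathlib import Path


def scan(root, regex):
    """Return (relative path, line number, line) for every matching line."""
    pattern = re.compile(regex, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on non-UTF-8 files.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, number, line.strip()))
    return hits


# Demo on a throwaway directory standing in for /tmp/output.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "src").mkdir()
    (out / "src" / "config.py").write_text(
        'TIMEOUT = 5\nAPI_KEY = "sk-..."\n', encoding="utf-8"
    )
    (out / "README.md").write_text("Public docs\n", encoding="utf-8")
    demo_hits = scan(out, r"internal|secret|api.key")

for rel, number, line in demo_hits:
    print(f"{rel}:{number}: {line}")
```

To check a real preview, call `scan("/tmp/output", ...)` with the patterns that matter for your repository and treat any hit as a reason to tighten the filters.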