What's New
GitHub Source
You can now scan GitHub resources natively with Betterleaks.
The GitHub source exposes several resource types that can be included or excluded with the CLI options --include and --exclude.
# Scan GitHub org (defaults to only scanning repos)
betterleaks github https://github.com/betterleaks
# Scan GitHub org (all resources)
betterleaks github https://github.com/betterleaks --include prs,pr-comments,issues,issue-comments,discussions,releases,release-assets,actions,action-artifacts
# Scan GitHub user (including gists)
betterleaks github https://github.com/cooluser123456789 --gists
# Scan GitHub org but exclude certain repos (glob matching)
betterleaks github https://github.com/betterleaks --exclude-repo '**/*betterleaks'
# Scan a specific resource, like a PR, but exclude its comments (only scan the PR itself)
betterleaks github https://github.com/betterleaks/betterleaks/pull/113 --exclude pr-comments
Check the scanning docs for more examples.
CEL-filtering (bye bye allowlists)
Filters replace legacy allowlists, entropy checks, and token efficiency checks with dynamic Common Expression Language (CEL) statements. If a filter expression evaluates to true, the item is skipped/discarded.
- prefilter: Exists only at the global level. It evaluates before any regex runs and only has access to file/commit metadata (attributes). Use this to entirely bypass binary files or bot commits.
- filter: Exists globally and per-rule. It evaluates after a regex match is found and has access to both attributes and the finding itself.
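The ordering of the two stages can be sketched in Python. This is a simplified model for illustration only, not how Betterleaks is implemented: the `scan` function, its parameters, and the sample data are all hypothetical, and the filters are plain Python callables standing in for CEL expressions.

```python
import re
from typing import Callable, Iterable, Iterator

Attributes = dict[str, str]
Finding = dict[str, str]


def scan(
    items: Iterable[tuple[Attributes, str]],
    rule_id: str,
    pattern: re.Pattern,
    prefilter: Callable[[Attributes], bool],
    finding_filter: Callable[[Attributes, Finding], bool],
) -> Iterator[Finding]:
    """Yield the findings that survive both filter stages (hypothetical model)."""
    for attributes, content in items:
        # Stage 1 (prefilter): metadata only, checked before any regex runs.
        # A true result skips the whole item.
        if prefilter(attributes):
            continue
        for line in content.splitlines():
            for m in pattern.finditer(line):
                finding = {
                    "rule_id": rule_id,
                    "match": m.group(0),   # the full regex match
                    "secret": m.group(1),  # the extracted value
                    "line": line,
                }
                # Stage 2 (filter): sees both the attributes and the finding.
                # A true result discards this one finding.
                if finding_filter(attributes, finding):
                    continue
                yield finding


# Hypothetical inputs: one image that is skipped outright, one source file.
items = [
    ({"path": "logo.png"}, 'token = "abcd1234efgh5678"'),
    ({"path": "src/app.py"}, 'token = "abcd1234efgh5678"\ntoken = "EXAMPLE_TEST_VALUE"'),
]
results = list(
    scan(
        items,
        rule_id="demo-token",
        pattern=re.compile(r'token = "([A-Za-z0-9_]+)"'),
        prefilter=lambda a: a.get("path", "").endswith(".png"),
        finding_filter=lambda a, f: "_TEST_" in f["secret"],
    )
)
```

In this model the `.png` item never reaches the regex, while the `_TEST_` value is matched first and only then discarded, which is why metadata-only checks are cheaper as a prefilter.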
Note that safe attribute access requires a somewhat cumbersome syntax: attributes[?"key"].orValue(""). If key does not exist in the attributes map, the expression falls back to the empty string, "".
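For readers more used to Python than CEL, the optional-access pattern behaves roughly like `dict.get` with a default. The helper name `or_value` below is made up for this comparison and is not part of Betterleaks.

```python
def or_value(attributes: dict[str, str], key: str, default: str = "") -> str:
    """Rough Python analogue of the CEL expression attributes[?"key"].orValue(default)."""
    return attributes.get(key, default)


# Hypothetical attributes for a plain file scan: no git metadata is present.
attributes = {"path": "tests/fixtures/creds.txt"}

path = or_value(attributes, "path")
author = or_value(attributes, "git.author_name")

# The missing key yields "", so string methods can be chained safely.
is_bot = author.endswith("[bot]")
```

Without the optional syntax, looking up a key that is absent from a CEL map is an evaluation error, much like `attributes["git.author_name"]` raising `KeyError` here.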
Available filter bindings
| Binding / Function | Description |
|---|---|
| attributes | A map of metadata. Keys include: path, git.sha, git.author_name, git.author_email, git.date, git.message, git.remote_url, git.platform, fs.symlink. A full list of keys is available here. |
| finding | A map representing the secret. Keys include: secret (the extracted value), match (the full regex match), line (the line of code), rule_id, and description. |
| matchesAny(string, list) | Returns true if the string matches any of the provided regex patterns. |
| containsAny(string, list) | Returns true if the string contains any of the provided strings (uses an efficient Aho-Corasick substring match). |
| entropy(string) | Returns the Shannon entropy (float) of the string. Useful for filtering out non-random placeholders. |
| failsTokenEfficiency(string) | Returns true if the string tokenizes too efficiently (i.e., it looks like natural language instead of a random secret). |
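To make three of these bindings concrete, here is a minimal Python sketch of what they compute. These are illustrative analogues and not the Betterleaks implementations: the snake_case names are made up, the entropy is assumed to be measured in bits per character (log base 2), `contains_any` uses a plain substring scan rather than Aho-Corasick, and Python's `re` syntax differs in places from the regex engine Betterleaks uses. `failsTokenEfficiency` is left out because its tokenizer is not described here.

```python
import math
import re
from collections import Counter


def entropy(s: str) -> float:
    """Shannon entropy of s, assuming bits per character (log base 2)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def matches_any(s: str, patterns: list[str]) -> bool:
    """True if s matches at least one of the regex patterns."""
    return any(re.search(p, s) for p in patterns)


def contains_any(s: str, needles: list[str]) -> bool:
    """True if s contains at least one needle (simple scan, not Aho-Corasick)."""
    return any(needle in s for needle in needles)


# A repetitive placeholder carries no information; four distinct characters
# carry two bits each.
low = entropy("aaaa")
high = entropy("abcd")

is_doc = matches_any("docs/README.md", [r"(?i)\.(?:md|txt|csv)$"])
is_mock = contains_any("API_MOCK_KEY", ["_MOCK_", "_TEST_"])
```

A threshold such as `entropy(...) <= 2.5` then separates short, repetitive placeholders from strings with many distinct characters.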
Example filter CEL expression:
filter = '''
(
// Ignore if authored by a bot AND inside the fixtures folder AND the secret contains a known test string.
attributes[?"git.author_name"].orValue("").endsWith("[bot]") &&
attributes[?"path"].orValue("").startsWith("tests/fixtures/") &&
containsAny(finding["secret"], ["_MOCK_", "_TEST_"])
)
||
(
// Ignore if it's a Markdown or text file AND the specific line of code contains instructional text.
matchesAny(attributes[?"path"].orValue(""), [r"""(?i)\.(?:md|txt|csv)$"""]) &&
(
containsAny(finding["line"], ["Example:", "Placeholder:", "Replace this with"]) ||
finding["secret"] == "SUPER_SECRET_EXAMPLE_KEY_12345"
)
)
||
(
// Ignore if the entropy is low AND it tokenizes like natural language instead of a random string.
entropy(finding["secret"]) <= 2.5 &&
failsTokenEfficiency(finding["secret"])
)
'''
Existing allowlists will still work! Internally there is a translation layer that converts allowlists to equivalent CEL expressions.
One of the coolest things about moving to CEL for filtering is that we can now filter on source-specific attributes without changing anything in the filtering engine itself. If you're authoring a new source, you just have to set its attributes. For example, since this release adds a new GitHub source, we can filter on GitHub source attributes like this:
prefilter = '''
attributes[?"github.release"].orValue("") == "v1.2.0"
'''
In this example prefilter, we are telling the engine (specifically, the github.go source) to bail out early and not download or scan the v1.2.0 release.
Changelog
- ff43d2a Dynamic attributes (prep work for more sources) and Run() entry (better api) (#92)
- 007410e Feat/cel filters (#100)
- 7f689bb GitHub source + tweaks (#115)
- 42d8f28 Update build command in README (#90)
- 3947777 feat(rule): add MongoDB connection string (#94)
- 296fee3 fix up validation filtering (#116)
- baf13cb fix: do not crash when workers exceed commits (#99)
- 0a41896 fix: do not pass filenames to betterleaks (#91)
- c65402b ovhcloud rules, make crypto bindings consistent (#102)
- 267028d promote validation (#84)
- 6208265 that perp skerp (#95)