github betterleaks/betterleaks v1.2.0

4 hours ago

What's New

GitHub Source

You can now scan GitHub resources natively with Betterleaks.
The GitHub source exposes several resource types, which can be included or excluded with the CLI options --include and --exclude.

# Scan GitHub org (defaults to only scanning repos) 
betterleaks github https://github.com/betterleaks

# Scan GitHub org (all resources)
betterleaks github https://github.com/betterleaks --include prs,pr-comments,issues,issue-comments,discussions,releases,release-assets,actions,action-artifacts

# Scan GitHub user (w/ gists)
betterleaks github https://github.com/cooluser123456789 --gists

# Scan GitHub org but exclude certain repos (glob matching)
betterleaks github https://github.com/betterleaks --exclude-repo '**/*betterleaks'

# Scan a specific resource, like a PR, but exclude its comments (only scan the description)
betterleaks github https://github.com/betterleaks/betterleaks/pull/113 --exclude pr-comments

Check the scanning docs for more examples.

CEL-filtering (bye bye allowlists)

Filters replace legacy allowlists, entropy checks, and token efficiency checks with dynamic Common Expression Language (CEL) statements. If a filter expression evaluates to true, the item is skipped/discarded.

  • prefilter: Exists only at the global level. It evaluates before any regex runs and only has access to file/commit metadata (attributes). Use this to entirely bypass binary files or bot commits.
  • filter: Exists globally and per-rule. It evaluates after a regex match is found and has access to both attributes and the finding itself.
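
The evaluation order of the two stages can be modeled roughly as follows. This is a simplified Python sketch, not Betterleaks' actual engine: `scan`, `Rule`, and the plain callables standing in for CEL expressions are all hypothetical, and treating a finding as discarded when either the global or the per-rule filter returns true is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

Attributes = dict[str, str]
Finding = dict[str, str]


@dataclass
class Rule:
    rule_id: str
    regex: re.Pattern  # must contain one capture group holding the secret
    # Optional per-rule filter; returning True discards the finding.
    rule_filter: Optional[Callable[[Attributes, Finding], bool]] = None


def scan(text, attributes, rules, prefilter=None, global_filter=None):
    """Return the findings that survive the prefilter and all filters."""
    # Stage 1: the prefilter sees only metadata and runs before any regex.
    if prefilter is not None and prefilter(attributes):
        return []

    findings = []
    for rule in rules:
        for m in rule.regex.finditer(text):
            finding = {
                "rule_id": rule.rule_id,
                "match": m.group(0),
                "secret": m.group(1),
            }
            # Stage 2: filters run after a match and see attributes + finding.
            # Assumption: either filter returning True is enough to discard.
            checks = [f for f in (global_filter, rule.rule_filter) if f is not None]
            if any(f(attributes, finding) for f in checks):
                continue
            findings.append(finding)
    return findings


demo_rules = [Rule("demo-token", re.compile(r"token=(\w+)"))]
kept = scan("token=abc123", {"path": "src/app.py"}, demo_rules)
skipped = scan(
    "token=abc123",
    {"path": "logo.png"},
    demo_rules,
    prefilter=lambda attrs: attrs.get("path", "").endswith(".png"),
)
```

In this model the prefilter saves the cost of running every regex on content that would be thrown away anyway, while a filter can only decide once a match exists.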

Note that safe attribute access requires somewhat cumbersome syntax: attributes[?"key"].orValue(""). If key does not exist in the attributes map, the expression falls back to the empty string, "".
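
The optional-access syntax exists because indexing a CEL map with a key that is not present is an evaluation error rather than an empty value. A rough Python analogy (not CEL, and not Betterleaks code; the `attr` helper and the sample attributes are hypothetical) shows the difference between the two lookups:

```python
def attr(attributes: dict[str, str], key: str, default: str = "") -> str:
    """Rough analogue of attributes[?"key"].orValue(default)."""
    return attributes.get(key, default)


# Assumed sample data: a scan whose attributes carry no git.* keys at all.
fs_attributes = {"path": "tests/fixtures/creds.txt"}

# Safe lookup: a missing key falls back to "" so string methods still work.
is_bot = attr(fs_attributes, "git.author_name").endswith("[bot]")

# Plain indexing, by contrast, fails when the key is absent.
try:
    fs_attributes["git.author_name"]
    plain_lookup_failed = False
except KeyError:
    plain_lookup_failed = True
```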

Available filter bindings

  • attributes: A map of metadata. Keys include: path, git.sha, git.author_name, git.author_email, git.date, git.message, git.remote_url, git.platform, fs.symlink. Full list of keys available here.
  • finding: A map representing the secret. Keys include: secret (the extracted value), match (the full regex match), line (the line of code), rule_id, and description.
  • matchesAny(string, list): Returns true if the string matches any of the provided regex patterns.
  • containsAny(string, list): Returns true if the string contains any of the provided strings (uses an efficient Aho-Corasick substring match).
  • entropy(string): Returns the Shannon entropy (float) of the string. Useful for filtering out non-random placeholders.
  • failsTokenEfficiency(string): Returns true if the string tokenizes too efficiently (i.e., it looks like natural language instead of a random secret).
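
To make the helper functions concrete, here is a minimal Python sketch of what three of them compute. It is an illustration under assumptions, not the Betterleaks implementation: the function names are Python stand-ins, `matches_any` is assumed to do an unanchored search, `contains_any` uses a naive substring scan instead of Aho-Corasick, and `entropy` is assumed to be measured in bits per character. `failsTokenEfficiency` is left out because the release notes do not describe its algorithm.

```python
import math
import re
from collections import Counter


def matches_any(s: str, patterns: list[str]) -> bool:
    """True if any regex pattern matches somewhere in s (assumed unanchored)."""
    return any(re.search(p, s) for p in patterns)


def contains_any(s: str, needles: list[str]) -> bool:
    """True if any needle is a substring of s (naive scan, not Aho-Corasick)."""
    return any(n in s for n in needles)


def entropy(s: str) -> float:
    """Shannon entropy of s in bits per character (assumed unit)."""
    if not s:
        return 0.0
    n = len(s)
    # Sum -p * log2(p) over the relative frequency p of each character.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())
```

Under these assumptions a string of one repeated character has entropy 0, and a string of four distinct characters has entropy 2, so a placeholder like "aaaa" falls well below a threshold such as the 2.5 used in the example filter.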

Example filter CEL expression:

filter = '''
(
    // Ignore if authored by a bot AND inside the fixtures folder AND the secret contains a known test string.
    attributes[?"git.author_name"].orValue("").endsWith("[bot]") &&
    attributes[?"path"].orValue("").startsWith("tests/fixtures/") &&
    containsAny(finding["secret"], ["_MOCK_", "_TEST_"])
)
||
(
    // Ignore if it's a Markdown or text file AND the specific line of code contains instructional text.
    matchesAny(attributes[?"path"].orValue(""), [r"""(?i)\.(?:md|txt|csv)$"""]) &&
    (
        containsAny(finding["line"], ["Example:", "Placeholder:", "Replace this with"]) ||
        finding["secret"] == "SUPER_SECRET_EXAMPLE_KEY_12345"
    )
)
||
(
    // Ignore if the entropy is low AND it tokenizes like natural language instead of a random string.
    entropy(finding["secret"]) <= 2.5 &&
    failsTokenEfficiency(finding["secret"])
)
'''

Existing allowlists will still work! Internally, a translation layer converts allowlists to equivalent CEL expressions.

One of the coolest things about moving to CEL for filtering is that we can now filter on source-specific attributes without changing anything in the filtering engine itself. If you're authoring a new source, you just have to set attributes in that source. For example, since this release adds a GitHub source, we can filter on GitHub source attributes like this:

prefilter = '''
attributes[?"github.release"].orValue("") == "v1.2.0"
'''

In this example prefilter, we are telling the engine (in the github.go source specifically) to bail out early and not download or scan the v1.2.0 release.

Changelog
