github thalesgroup-cert/Watcher v2.4


v2.4

This release brings major improvements to the Threat Watcher module. It adds a reliability score for each trending word, state-of-the-art named entity recognition (NER), fewer false positives, a smarter trending algorithm, and several bug fixes and optimizations.

Update Procedure for Docker

Please follow this process:

[WARNING] Manual Deletion Step:

This operation will permanently delete all existing data in the Source, BannedWord, and TrendyWord tables.
If you have custom sources, banned words, or other critical data, make sure to back them up or export them before proceeding.

Before anything else, delete the existing data to avoid conflicts. Run the following command. It uses the Django shell to delete all Source, BannedWord, and TrendyWord rows, in that order:

python manage.py shell -c "from threats_watcher.models import Source, BannedWord, TrendyWord; Source.objects.all().delete(); BannedWord.objects.all().delete(); TrendyWord.objects.all().delete()"

Then continue with the update procedure:

  1. Pull the latest Docker image from the repository.

  2. Rebuild the Docker image (important for the new dependencies):

    docker compose build
  3. Stop all containers:

    docker compose down
  4. Apply the database migrations and repopulate the database with the new blocklist and sources (new fields were added):

    docker compose run watcher bash
    # then, inside the container shell:
    python manage.py migrate
    python manage.py populate_db
  5. Restart the containers:

    docker compose up

If you run Watcher without Docker

1. Install all system dependencies

 sudo apt update && sudo apt install -y \
     build-essential \
     libsasl2-dev \
     libldap2-dev \
     libssl-dev \
     curl \
     git

2. Install Rust (required for tokenizers/transformers)

curl https://sh.rustup.rs -sSf | sh -s -- -y
source $HOME/.cargo/env

3. (Re)install Python dependencies

pip install --upgrade pip
pip install --no-cache-dir -r requirements.txt

4. Install the CPU-only builds of torch, torchvision, and torchaudio

pip install --extra-index-url https://download.pytorch.org/whl/cpu torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0

5. Install NLTK dependencies

python ./nltk_dependencies.py

What’s Changed

ThreatWatcher – Major Improvements

  • Reliability scoring for each trending word:

    • Each source in sources.csv now has a confidence level, set in the new confident column (1 = 100%, 2 = 50%, 3 = 20%).
    • A word's reliability is the average confidence of the sources in which it appeared.
    • The new field is shown in the UI (“Reliability %” column).
  • Entity extraction now uses BERT-base-NER:

    • Improved word/entity detection in news titles.
    • A 10× smaller blocklist is now sufficient; the blocklist file has been reduced accordingly.
    • Vastly fewer false positives.
    • For more information on BERT-base-NER, see https://huggingface.co/dslim/bert-base-NER
  • Trending algorithm refactor:

    • Now only the last 30 days of news headlines are used for trending word calculation.
    • Old: a word could “dominate” because of a historic surge (e.g. 200 hits a year ago plus 1 this month was enough to trend).
    • New: a word must actually be trending within the last 30 days to rank.
    • Minimum occurrences for trend detection reduced from 7 → 5.
  • Improved testing coverage:

    • Three new unit tests added in the backend to validate recent changes.
    • Existing frontend tests adjusted to reflect UI updates (e.g. Reliability column).
  • Improved Entity Detection, Reliability Scoring, and Trending Algorithm by @ygalnezri and @LeonNadot in #224

  • v2.4 by @ygalnezri and @LeonNadot in #225
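The new trending rule can be illustrated with a minimal Python sketch. It is not Watcher's actual implementation: the function name `trending_words`, the `(word, published_at)` input format, and the plain count threshold are assumptions made for the example. Only the 30-day window and the minimum of 5 occurrences come from the notes above.

```python
from collections import Counter
from datetime import datetime, timedelta

# Values from the release notes; the constant names are hypothetical.
WINDOW_DAYS = 30
MIN_OCCURRENCES = 5


def trending_words(hits, now, window_days=WINDOW_DAYS, min_occurrences=MIN_OCCURRENCES):
    """Return {word: count} for words seen at least `min_occurrences` times
    in headlines published within the last `window_days` days.

    `hits` is an iterable of (word, published_at) pairs, one per headline
    occurrence (hypothetical input format).
    """
    cutoff = now - timedelta(days=window_days)
    # Only occurrences inside the window are counted; older surges are ignored.
    counts = Counter(word for word, published_at in hits if published_at >= cutoff)
    return {word: n for word, n in counts.items() if n >= min_occurrences}


if __name__ == "__main__":
    now = datetime(2025, 6, 30)
    hits = (
        # 200 hits a year ago plus 1 this month: no longer trending.
        [("oldsurge", now - timedelta(days=365))] * 200
        + [("oldsurge", now - timedelta(days=3))]
        # 6 hits within the last 30 days: trending.
        + [("newthreat", now - timedelta(days=d)) for d in range(6)]
    )
    print(trending_words(hits, now))  # {'newthreat': 6}
```

Filtering by date before counting is what prevents the “historic surge” case from the Old/New example: the 200 old hits never enter the counter.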

Breaking changes & warnings

  • If you use custom code for word parsing/blocklist:
    • Review your blocklist (now much smaller).
    • Word detection logic has changed (it now relies on BERT-based NER).
  • sources.csv structure:
    • Now requires a confident column.
    • Ensure your source feeds are updated to match the new format.
  • Database migration required (new fields).
  • The minimum word occurrence is now 5 (was 7); it can be changed in settings.py.
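The reliability rule and the new confident column can be sketched in Python as follows. This is an illustration under assumptions, not Watcher's code. The `confident` column and the mapping (1 = 100%, 2 = 50%, 3 = 20%) come from these notes. The CSV layout (a hypothetical `url` column), the function names, and the input format are invented for the example.

```python
import csv
import io
from statistics import mean

# Confidence levels from the release notes: 1 = 100%, 2 = 50%, 3 = 20%.
CONFIDENCE_WEIGHTS = {1: 1.0, 2: 0.5, 3: 0.2}


def load_source_confidence(csv_file):
    """Map each source URL to its confidence weight.

    Assumes a hypothetical layout with `url` and `confident` columns;
    check the real sources.csv shipped with v2.4 for the exact headers.
    """
    weights = {}
    for row in csv.DictReader(csv_file):
        level = int(row["confident"])
        if level not in CONFIDENCE_WEIGHTS:
            raise ValueError(f"unknown confidence level {level} for {row['url']}")
        weights[row["url"]] = CONFIDENCE_WEIGHTS[level]
    return weights


def word_reliability(sources_for_word, weights):
    """Reliability (%) = average confidence of the sources in which the word appeared."""
    return round(100 * mean(weights[url] for url in sources_for_word), 1)


if __name__ == "__main__":
    sample = io.StringIO(
        "url,confident\n"
        "https://feed-a.example/rss,1\n"
        "https://feed-b.example/rss,2\n"
        "https://feed-c.example/rss,3\n"
    )
    weights = load_source_confidence(sample)
    # A word seen in feeds A and C: (100% + 20%) / 2 = 60%.
    print(word_reliability({"https://feed-a.example/rss", "https://feed-c.example/rss"}, weights))
```

Each source is counted once per word (a set of URLs), so a single low-confidence feed repeating a headline does not drag the average down more than once.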

Full Changelog: v2.3...v2.4
