github BlessedRebuS/Krawl v2.1.0

5 hours ago

Krawl 2.1.0

New Features

Self-Hosted LLM Support (#180, closes #176)

by @BlessedRebuS

Krawl can now generate deception pages using locally-hosted LLMs instead of cloud providers. Two engines are supported out of the box via Docker Compose:

  • llama.cpp — lightweight C++ inference server, runs GGUF models directly with minimal overhead. Ideal when you want full control and raw performance over a single model.
  • Ollama — wraps llama.cpp with a model management layer, making model switching easier at the cost of a small performance overhead.

Both expose an OpenAI-compatible HTTP API, so the existing ai.provider: "openai" path is reused — just point openai_base_url at the local service. In Docker Compose deployments the service name is krawl-llm (http://krawl-llm:8080/v1). Both services are included in docker-compose.yaml as opt-in (commented out by default).

Tested with Qwen 1.5-1.8B and Qwen 3.5-4B. Larger models produce more realistic pages; smaller ones are sufficient for basic JSON/HTML generation.

See docs/ai_generation.md for full setup instructions.


OpenAI Base URL Parameter (#177)

by @Matthias-vdE

A new optional openai_base_url config field (and KRAWL_AI_OPENAI_BASE_URL env var) overrides the default api.openai.com endpoint when provider is set to openai. Useful for routing through LiteLLM, custom proxies, or self-hosted OpenAI-compatible services.

ai:
  provider: "openai"
  openai_base_url: "http://your-proxy/v1"
  api_key: "your-key"

Dashboard Warmup Aggregation & Noise Filtering (#184)

by @Lore09

Three new configuration options improve dashboard performance at scale:

Option Env var Default Description
dashboard.warmup_pages KRAWL_DASHBOARD_WARMUP_PAGES 10 Pages to pre-warm per table panel
dashboard.warmup_aggregation KRAWL_DASHBOARD_WARMUP_AGGREGATION false Pre-compute full top_paths/top_ua aggregations so any page is served instantly with zero DB queries
dashboard.top_n_min_count KRAWL_DASHBOARD_TOP_N_MIN_COUNT 5 Minimum access count to appear in top paths/user-agents panels — filters noise at the DB level

warmup_aggregation is enabled by default in all scalable deployments (Helm, Kubernetes manifests, and Docker Compose scalable files). The aggregation cache is paginated in-memory so any page of the top-N panel is served instantly without additional queries.


Default favicon.ico (#179)

by @Matthias-vdE

Krawl now ships a default favicon.ico (based on the Krawl spider logo) so that browser requests for /favicon.ico are answered correctly instead of generating an AI deception page. A custom favicon can be placed at ./data/favicon.ico to override the default.


Bug Fixes

Fix wp-login Attack Misclassification (#187, closes #186)

by @3isenHeiM

/wp-login.php POST requests were incorrectly classified as COMMAND_INJECTION because the HTML form field named pwd matched the pwd shell command pattern. The form field has been renamed to password, so submissions are now correctly captured as CREDENTIALS CAPTURED / LOGIN ATTEMPT.

Before:

[COMMAND_INJECTION DETECTED] X.X.X.X - /wp-login.php - Method: POST

After:

[LOGIN ATTEMPT] X.X.X.X - /wp-login.php - Mozilla/5.0
[CREDENTIALS CAPTURED] X.X.X.X - Username: admin - Path: /wp-login.php

Fix Dashboard Authentication (#182)

by @Lore09

The dashboard login was broken due to a mismatch between client-side SHA-256 hashing and the backend comparison. Authentication now uses plain-text password comparison end-to-end, removing the client-side hashing step. Also fixed Docker Compose volume paths in docker/ subdirectory variants to correctly use parent-directory references (../wordlists.json, etc.).


Improvements

Improved Attack Type Detection (#188)

by @Lore09

  • Added a login_attempt regex pattern covering common auth endpoints (/login, /signin, /wp-login.php, /admin/login, etc.) so brute-force and credential stuffing hits are classified correctly instead of falling through to other categories.
  • Refined command_injection regex to better detect chained shell commands and reduce false positives on POST body fields.
  • Introduced scoring_weights to fine-tune the classification scoring for attackers, good crawlers, bad crawlers, and regular users.
  • Extended attack patterns and command outputs for richer honeypot coverage.

Improved CI/CD Workflows (#190)

by @Lore09

  • Replaced flake8/pylint with Ruff for linting. Ruff rules now include security checks (S) alongside standard style/import/upgrade rules.
  • Added Bandit static security analysis with automatic PR commenting: findings are posted directly on the pull request for visibility.
  • Replaced safety with pip-audit for dependency vulnerability scanning.
  • Added pip-missing-reqs and pip-extra-reqs checks to catch unused or undeclared dependencies.
  • Removed the beta branch from all workflow triggers — CI now runs only on main and dev.

What's Changed

New Contributors

Full Changelog: v2.0.0...v2.1.0

Don't miss a new Krawl release

NewReleases is sending notifications on new releases.