Krawl 2.1.0

New Features

Self-Hosted LLM Support (#180, closes #176)

Krawl can now generate deception pages using locally-hosted LLMs instead of cloud providers. Two engines are supported out of the box via Docker Compose:

llama.cpp — lightweight C++ inference server, runs GGUF models directly with minimal overhead. Ideal when you want full control and raw performance over a single model.
Ollama — wraps llama.cpp with a model management layer, making model switching easier at the cost of a small performance overhead.

Both expose an OpenAI-compatible HTTP API, so the existing ai.provider: "openai" path is reused — just point openai_base_url at the local service. In Docker Compose deployments the service name is krawl-llm (http://krawl-llm:8080/v1). Both services are included in docker-compose.yaml as opt-in (commented out by default).

Tested with Qwen 1.5-1.8B and Qwen 3.5-4B. Larger models produce more realistic pages; smaller ones are sufficient for basic JSON/HTML generation.

See docs/ai_generation.md for full setup instructions.

OpenAI Base URL Parameter (#177)

by @Matthias-vdE

A new optional openai_base_url config field (and KRAWL_AI_OPENAI_BASE_URL env var) overrides the default api.openai.com endpoint when provider is set to openai. Useful for routing through LiteLLM, custom proxies, or self-hosted OpenAI-compatible services.

ai:
  provider: "openai"
  openai_base_url: "http://your-proxy/v1"
  api_key: "your-key"

Dashboard Warmup Aggregation & Noise Filtering (#184)

by @Lore09

Three new configuration options improve dashboard performance at scale:

Option	Env var	Default	Description
`dashboard.warmup_pages`	`KRAWL_DASHBOARD_WARMUP_PAGES`	`10`	Pages to pre-warm per table panel
`dashboard.warmup_aggregation`	`KRAWL_DASHBOARD_WARMUP_AGGREGATION`	`false`	Pre-compute full top_paths/top_ua aggregations so any page is served instantly with zero DB queries
`dashboard.top_n_min_count`	`KRAWL_DASHBOARD_TOP_N_MIN_COUNT`	`5`	Minimum access count to appear in top paths/user-agents panels — filters noise at the DB level

warmup_aggregation is enabled by default in all scalable deployments (Helm, Kubernetes manifests, and Docker Compose scalable files). The aggregation cache is paginated in-memory so any page of the top-N panel is served instantly without additional queries.

Default favicon.ico (#179)

by @Matthias-vdE

Krawl now ships a default favicon.ico (based on the Krawl spider logo) so that browser requests for /favicon.ico are answered correctly instead of generating an AI deception page. A custom favicon can be placed at ./data/favicon.ico to override the default.

Bug Fixes

Fix wp-login Attack Misclassification (#187, closes #186)

by @3isenHeiM

/wp-login.php POST requests were incorrectly classified as COMMAND_INJECTION because the HTML form field named pwd matched the pwd shell command pattern. The form field has been renamed to password, so submissions are now correctly captured as CREDENTIALS CAPTURED / LOGIN ATTEMPT.

Before:

[COMMAND_INJECTION DETECTED] X.X.X.X - /wp-login.php - Method: POST

After:

[LOGIN ATTEMPT] X.X.X.X - /wp-login.php - Mozilla/5.0
[CREDENTIALS CAPTURED] X.X.X.X - Username: admin - Path: /wp-login.php

Fix Dashboard Authentication (#182)

by @Lore09

The dashboard login was broken due to a mismatch between client-side SHA-256 hashing and the backend comparison. Authentication now uses plain-text password comparison end-to-end, removing the client-side hashing step. Also fixed Docker Compose volume paths in docker/ subdirectory variants to correctly use parent-directory references (../wordlists.json, etc.).

Improvements

Improved Attack Type Detection (#188)

by @Lore09

Added a login_attempt regex pattern covering common auth endpoints (/login, /signin, /wp-login.php, /admin/login, etc.) so brute-force and credential stuffing hits are classified correctly instead of falling through to other categories.
Refined command_injection regex to better detect chained shell commands and reduce false positives on POST body fields.
Introduced scoring_weights to fine-tune the classification scoring for attackers, good crawlers, bad crawlers, and regular users.
Extended attack patterns and command outputs for richer honeypot coverage.

Improved CI/CD Workflows (#190)

by @Lore09

Replaced flake8/pylint with Ruff for linting. Ruff rules now include security checks (S) alongside standard style/import/upgrade rules.
Added Bandit static security analysis with automatic PR commenting: findings are posted directly on the pull request for visibility.
Replaced safety with pip-audit for dependency vulnerability scanning.
Added pip-missing-reqs and pip-extra-reqs checks to catch unused or undeclared dependencies.
Removed the beta branch from all workflow triggers — CI now runs only on main and dev.

What's Changed

Add OpenAI Base URL optional parameter by @Matthias-vdE in #177
Update version and appVersion to 2.0.1 by @BlessedRebuS in #178
Add default favicon.ico by @Matthias-vdE in #179
Added llama.cpp and ollama self-hosted alternative by @BlessedRebuS in #180
Fix/failed authentication by @Lore09 in #182
Feat/improve dashboard responsiveness by @Lore09 in #184
Fixed wrong wp-login attack categorisation (fix #186) by @3isenHeiM in #187
Feat/improve attack types detection by @Lore09 in #188
Improving Github Workflows by @Lore09 in #190

New Contributors

@3isenHeiM made their first contribution in #187

Full Changelog: v2.0.0...v2.1.0

BlessedRebuS/Krawl v2.1.0 on GitHub

Krawl 2.1.0

New Features

Self-Hosted LLM Support (#180, closes #176)

OpenAI Base URL Parameter (#177)

Dashboard Warmup Aggregation & Noise Filtering (#184)

Default favicon.ico (#179)

Bug Fixes

Fix wp-login Attack Misclassification (#187, closes #186)

Fix Dashboard Authentication (#182)

Improvements

Improved Attack Type Detection (#188)

Improved CI/CD Workflows (#190)

What's Changed

New Contributors

BlessedRebuS/Krawl v2.1.0
on GitHub