Krawl 2.1.0
New Features
Self-Hosted LLM Support (#180, closes #176)
Krawl can now generate deception pages using locally-hosted LLMs instead of cloud providers. Two engines are supported out of the box via Docker Compose:
- llama.cpp — lightweight C++ inference server, runs GGUF models directly with minimal overhead. Ideal when you want full control and raw performance over a single model.
- Ollama — wraps llama.cpp with a model management layer, making model switching easier at the cost of a small performance overhead.
Both expose an OpenAI-compatible HTTP API, so the existing ai.provider: "openai" path is reused — just point openai_base_url at the local service. In Docker Compose deployments the service name is krawl-llm (http://krawl-llm:8080/v1). Both services are included in docker-compose.yaml as opt-in (commented out by default).
Tested with Qwen 1.5-1.8B and Qwen 3.5-4B. Larger models produce more realistic pages; smaller ones are sufficient for basic JSON/HTML generation.
See docs/ai_generation.md for full setup instructions.
OpenAI Base URL Parameter (#177)
A new optional openai_base_url config field (and KRAWL_AI_OPENAI_BASE_URL env var) overrides the default api.openai.com endpoint when provider is set to openai. Useful for routing through LiteLLM, custom proxies, or self-hosted OpenAI-compatible services.
ai:
provider: "openai"
openai_base_url: "http://your-proxy/v1"
api_key: "your-key"Dashboard Warmup Aggregation & Noise Filtering (#184)
by @Lore09
Three new configuration options improve dashboard performance at scale:
| Option | Env var | Default | Description |
|---|---|---|---|
dashboard.warmup_pages
| KRAWL_DASHBOARD_WARMUP_PAGES
| 10
| Pages to pre-warm per table panel |
dashboard.warmup_aggregation
| KRAWL_DASHBOARD_WARMUP_AGGREGATION
| false
| Pre-compute full top_paths/top_ua aggregations so any page is served instantly with zero DB queries |
dashboard.top_n_min_count
| KRAWL_DASHBOARD_TOP_N_MIN_COUNT
| 5
| Minimum access count to appear in top paths/user-agents panels — filters noise at the DB level |
warmup_aggregation is enabled by default in all scalable deployments (Helm, Kubernetes manifests, and Docker Compose scalable files). The aggregation cache is paginated in-memory so any page of the top-N panel is served instantly without additional queries.
Default favicon.ico (#179)
Krawl now ships a default favicon.ico (based on the Krawl spider logo) so that browser requests for /favicon.ico are answered correctly instead of generating an AI deception page. A custom favicon can be placed at ./data/favicon.ico to override the default.
Bug Fixes
Fix wp-login Attack Misclassification (#187, closes #186)
by @3isenHeiM
/wp-login.php POST requests were incorrectly classified as COMMAND_INJECTION because the HTML form field named pwd matched the pwd shell command pattern. The form field has been renamed to password, so submissions are now correctly captured as CREDENTIALS CAPTURED / LOGIN ATTEMPT.
Before:
[COMMAND_INJECTION DETECTED] X.X.X.X - /wp-login.php - Method: POST
After:
[LOGIN ATTEMPT] X.X.X.X - /wp-login.php - Mozilla/5.0
[CREDENTIALS CAPTURED] X.X.X.X - Username: admin - Path: /wp-login.php
Fix Dashboard Authentication (#182)
by @Lore09
The dashboard login was broken due to a mismatch between client-side SHA-256 hashing and the backend comparison. Authentication now uses plain-text password comparison end-to-end, removing the client-side hashing step. Also fixed Docker Compose volume paths in docker/ subdirectory variants to correctly use parent-directory references (../wordlists.json, etc.).
Improvements
Improved Attack Type Detection (#188)
by @Lore09
- Added a
login_attemptregex pattern covering common auth endpoints (/login,/signin,/wp-login.php,/admin/login, etc.) so brute-force and credential stuffing hits are classified correctly instead of falling through to other categories. - Refined
command_injectionregex to better detect chained shell commands and reduce false positives on POST body fields. - Introduced
scoring_weightsto fine-tune the classification scoring for attackers, good crawlers, bad crawlers, and regular users. - Extended attack patterns and command outputs for richer honeypot coverage.
Improved CI/CD Workflows (#190)
by @Lore09
- Replaced
flake8/pylintwith Ruff for linting. Ruff rules now include security checks (S) alongside standard style/import/upgrade rules. - Added Bandit static security analysis with automatic PR commenting: findings are posted directly on the pull request for visibility.
- Replaced
safetywith pip-audit for dependency vulnerability scanning. - Added
pip-missing-reqsandpip-extra-reqschecks to catch unused or undeclared dependencies. - Removed the
betabranch from all workflow triggers — CI now runs only onmainanddev.
What's Changed
- Add OpenAI Base URL optional parameter by @Matthias-vdE in #177
- Update version and appVersion to 2.0.1 by @BlessedRebuS in #178
- Add default favicon.ico by @Matthias-vdE in #179
- Added llama.cpp and ollama self-hosted alternative by @BlessedRebuS in #180
- Fix/failed authentication by @Lore09 in #182
- Feat/improve dashboard responsiveness by @Lore09 in #184
- Fixed wrong wp-login attack categorisation (fix #186) by @3isenHeiM in #187
- Feat/improve attack types detection by @Lore09 in #188
- Improving Github Workflows by @Lore09 in #190
New Contributors
- @3isenHeiM made their first contribution in #187
Full Changelog: v2.0.0...v2.1.0