[v2.3.2] - 2026-03-02
⭐ Highlights
- Added optional CDN asset downloading for complete offline mirroring.
- Improved URL resolution to fix malformed paths and protocol-relative URLs.
- Added CSS asset discovery so fonts and background images referenced in stylesheets are downloaded automatically.
- Introduced CI validation workflows and automated code formatting.
🌐 External CDN Asset Support
Modern websites often load assets from external CDNs.
This release introduces an optional feature to download those resources locally.
New CLI flag:
--download-external-assets
When enabled:
- CDN assets (CSS, JS, fonts, images) are downloaded.
- Files are stored under: cdn//
- HTML links are automatically rewritten to reference the local copies.
This allows complete offline mirroring of CDN-heavy websites.
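The notes above do not show how the rewrite works internally. As a rough illustration only, the sketch below maps an external asset URL to a local path and swaps that path into the HTML. The helper names `local_cdn_path` and `rewrite_links` are hypothetical, and the `cdn/<host>/<path>` layout is an assumption made for this example, not the tool's documented layout.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


# Hypothetical helper for illustration; the cdn/<host>/<path> layout is an assumption.
def local_cdn_path(asset_url: str, cdn_root: str = "cdn") -> str:
    """Map an external asset URL to a relative local path under cdn_root."""
    parts = urlsplit(asset_url)
    # Query strings and fragments are ignored in this sketch.
    path = parts.path.lstrip("/") or "index"
    return str(PurePosixPath(cdn_root, parts.hostname or "unknown-host", path))


# Hypothetical helper: plain string replacement, longest URLs first so a
# shorter URL never clobbers part of a longer one that shares its prefix.
def rewrite_links(html: str, asset_urls: list[str]) -> str:
    for url in sorted(asset_urls, key=len, reverse=True):
        html = html.replace(url, local_cdn_path(url))
    return html


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="https://cdn.example.net/lib/app.css">'
    print(rewrite_links(page, ["https://cdn.example.net/lib/app.css"]))
```

Links here are written relative to the mirror root; a page saved in a subdirectory would need a prefix such as `../`, which this sketch does not compute.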
🚀 URL Resolution Improvements
- Fixed incorrect URL normalization order that previously caused malformed asset paths.
- Protocol-relative URLs (//cdn.domain.com/...) are now handled correctly.
- Prevented invalid internal paths such as https://example.com/npm/...
- Reduced false 404 errors on modern CDN-heavy websites (Webflow, etc.).
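To illustrate the ordering issue described above, here is a minimal sketch, assuming each reference is resolved against the page URL before any other normalization. `resolve_asset_url`, `is_internal`, and the `SKIP_PREFIXES` list are hypothetical names. Python's `urllib.parse.urljoin` follows RFC 3986 for network-path references, so `//cdn.domain.com/...` inherits the page's scheme but keeps its own host instead of being treated as a path on the crawled site.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlsplit

# Prefixes that never point at a downloadable asset (assumed list).
SKIP_PREFIXES = ("data:", "javascript:", "mailto:", "tel:", "#")


# Hypothetical helper: resolve first, normalize second.
def resolve_asset_url(page_url: str, ref: str) -> Optional[str]:
    ref = ref.strip()
    if not ref or ref.lower().startswith(SKIP_PREFIXES):
        return None
    # 1. Resolve against the page. urljoin treats "//host/path" as a
    #    network-path reference: it inherits the page's scheme but keeps
    #    its own host, so it is never folded into the crawled site.
    absolute = urljoin(page_url, ref)
    # 2. Normalize the already-absolute URL (here: just drop the fragment).
    absolute, _fragment = urldefrag(absolute)
    return absolute


# Hypothetical helper: an asset is internal only if its host matches the site's.
def is_internal(url: str, site_url: str) -> bool:
    return urlsplit(url).hostname == urlsplit(site_url).hostname
```

Classifying a URL as internal or external only after it is absolute means a CDN host can no longer be mistaken for a path on the mirrored site.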
🎨 CSS Asset Discovery
The crawler now parses CSS files to detect additional assets.
Automatically downloads resources referenced in url(...), including:
- Fonts (.woff, .woff2)
- Background images
- SVG assets
This significantly improves offline rendering of mirrored sites.
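A minimal sketch of `url(...)` discovery, assuming a regular expression is adequate for typical stylesheets (a real CSS tokenizer would be more robust, and `@import "file.css"` without `url()` is not covered). The names `CSS_URL_RE` and `css_asset_urls` are hypothetical. The key design point is that relative references in CSS resolve against the stylesheet's own URL, not the HTML page that loaded it.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


# Hypothetical helper: returns absolute asset URLs in first-seen order.
def css_asset_urls(css_text: str, stylesheet_url: str) -> list[str]:
    found: list[str] = []
    for match in CSS_URL_RE.finditer(css_text):
        ref = match.group(2).strip()
        # Inline data URIs and fragment-only references need no download.
        if not ref or ref.startswith(("data:", "#")):
            continue
        # Resolve relative to the stylesheet, as CSS requires.
        absolute = urljoin(stylesheet_url, ref)
        if absolute not in found:
            found.append(absolute)
    return found
```

For a stylesheet at `https://cdn.example.net/theme/css/main.css`, a rule such as `src: url("../fonts/inter.woff2")` resolves to `https://cdn.example.net/theme/fonts/inter.woff2`.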
🧪 CI & Validation Enhancements
- Added a dynamic GitHub Action for testing crawls against any target website.
- Implemented validation checks for:
  - unresolved protocol-relative URLs
  - malformed internal asset paths
  - missing HTML output
- Improved crawl reports and added artifact uploads.
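The GitHub Action's actual checks are not shown in these notes. A hypothetical post-crawl check along the same lines might scan the output directory for missing HTML and for `src`/`href` attributes that still begin with `//`. The function `validate_mirror` and its regex are assumptions; the malformed-internal-path check is omitted because its exact criteria are not described.

```python
import re
from pathlib import Path

# Hypothetical pattern: a src/href attribute whose value still starts with "//".
PROTOCOL_RELATIVE_RE = re.compile(r"""\b(?:src|href)\s*=\s*["']?//""", re.IGNORECASE)


def validate_mirror(output_dir: Path) -> list[str]:
    """Return a list of human-readable problems found in a mirrored site."""
    problems: list[str] = []
    html_files = sorted(output_dir.rglob("*.html"))
    if not html_files:
        problems.append(f"no HTML output found in {output_dir}")
    for page in html_files:
        text = page.read_text(encoding="utf-8", errors="replace")
        if PROTOCOL_RELATIVE_RE.search(text):
            problems.append(f"{page}: unresolved protocol-relative URL")
    return problems
```

In CI, a non-empty result could be turned into a failing step with something like `sys.exit(1 if validate_mirror(Path("output")) else 0)`, where `output` is a placeholder for the crawl directory.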
🧹 Code Quality Automation
- Integrated automatic formatting with Black and isort.
- Added Ruff linting for fast static analysis.
- CI now automatically formats code if style issues are detected.
🛠 Stability & Robustness
- Improved path normalization and filesystem safety.
- Enhanced logging clarity and crawl diagnostics.
- Maintained strict traversal protection and hashing safeguards.
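Traversal protection typically means refusing to write any file whose resolved path escapes the output directory. A minimal sketch of that idea, assuming `pathlib` and Python 3.9+ for `Path.is_relative_to`; `safe_join` is a hypothetical name, and the project's hashing safeguards are not modeled here.

```python
from pathlib import Path


def safe_join(output_root: Path, relative: str) -> Path:
    """Hypothetical helper: join a URL-derived path onto output_root, refusing escapes."""
    root = output_root.resolve()
    # Treat leading slashes as relative to the mirror root, not the filesystem root.
    candidate = (root / relative.lstrip("/")).resolve()
    # resolve() collapses ".." segments and follows existing symlinks,
    # so any attempt to escape the root shows up in this comparison.
    if not candidate.is_relative_to(root):
        raise ValueError(f"refusing to write outside {root}: {relative!r}")
    return candidate
```

Checking the resolved path, rather than searching the raw string for `..`, also catches escapes hidden behind symlinks that already exist inside the output tree.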
What's Changed
- Improve URL handling and CDN asset support in website-downloader.py by @PKHarsimran in #23
Full Changelog: v2.2.0...v2.3.2