github PKHarsimran/website-downloader v2.3.2
URL Resolution Improvements, CDN Asset Support & CI Enhancements

[v2.3.2] - 2026-03-02

⭐ Highlights

  • Added optional CDN asset downloading for complete offline mirroring.
  • Improved URL resolution to fix malformed paths and protocol-relative URLs.
  • Added CSS asset discovery so fonts and background images referenced in stylesheets are downloaded automatically.
  • Introduced CI validation workflows and automated code formatting.

🌐 External CDN Asset Support

Modern websites often load assets from external CDNs.
This release introduces an optional feature to download those resources locally.

New CLI flag:
--download-external-assets

When enabled:

  • CDN assets (CSS, JS, fonts, images) are downloaded.
  • Files are stored under:
    cdn//
  • HTML links are automatically rewritten to reference the local copies.

This allows complete offline mirroring of CDN-heavy websites.
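A minimal sketch of the rewrite step follows. It assumes the crawler compares each asset's host with the mirrored site's host and maps external assets into the cdn/ folder. The helper names (parse_args, is_external, cdn_local_path, rewrite_ref) and the host/path layout inside cdn/ are hypothetical illustrations, not taken from website-downloader.py. Only the --download-external-assets flag comes from these notes.

```python
import argparse
import posixpath
from urllib.parse import urlparse

CDN_DIR = "cdn"  # top-level folder named in the release notes


def parse_args(argv=None):
    # Only the flag name comes from the release notes; the real parser may differ.
    parser = argparse.ArgumentParser(description="website mirror (sketch)")
    parser.add_argument("--download-external-assets", action="store_true",
                        help="also mirror assets served from other hosts (CDNs)")
    return parser.parse_args(argv)


def is_external(asset_url: str, site_host: str) -> bool:
    """True when the asset lives on a different host than the mirrored site."""
    host = urlparse(asset_url).hostname or ""  # .hostname is already lowercased
    return bool(host) and host != site_host.lower()


def cdn_local_path(asset_url: str) -> str:
    """HYPOTHETICAL layout cdn/<host>/<path>; query strings are ignored here."""
    parsed = urlparse(asset_url)
    # Collapse "." and ".." so a crafted URL cannot climb out of cdn/.
    path = posixpath.normpath("/" + parsed.path.lstrip("/")).lstrip("/") or "index"
    return posixpath.join(CDN_DIR, parsed.hostname, path)


def rewrite_ref(asset_url: str, page_local_path: str, site_host: str, enabled: bool) -> str:
    """Return the value to write back into the HTML src/href attribute."""
    if not enabled or not is_external(asset_url, site_host):
        return asset_url  # unchanged: flag is off, or the asset is on the site's own host
    target = cdn_local_path(asset_url)
    page_dir = posixpath.dirname(page_local_path) or "."
    # A relative link keeps the mirror working wherever it is opened from.
    return posixpath.relpath(target, page_dir)
```

A fuller version would also need to decide how query strings (common on web-font stylesheet URLs) map to filenames; this sketch simply drops them.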


🚀 URL Resolution Improvements

  • Fixed incorrect URL normalization order that previously caused malformed asset paths.
  • Protocol-relative URLs (//cdn.domain.com/...) are now handled correctly.
  • Prevented invalid internal paths such as https://example.com/npm/... from being generated.
  • Reduced false 404 errors on modern CDN-heavy websites (Webflow, etc.).
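The fix described above comes down to ordering: resolve each raw href/src value against the page URL first, then normalize. The sketch below shows that order using the standard library's urllib.parse. resolve_asset_url is a hypothetical helper, not the function used in website-downloader.py. As an illustration of the failure mode (not necessarily the original bug), keeping only the path of //cdn.jsdelivr.net/npm/... and joining it to the page URL yields exactly the kind of https://example.com/npm/... path mentioned in the list.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlparse

SKIP_PREFIXES = ("data:", "javascript:", "mailto:", "tel:", "#")


def resolve_asset_url(page_url: str, raw: str) -> Optional[str]:
    """Turn a raw href/src value into an absolute http(s) URL, or None."""
    raw = raw.strip()
    if not raw or raw.lower().startswith(SKIP_PREFIXES):
        return None  # inline data, scripts, mail/phone links, in-page anchors
    # 1) Resolve first: urljoin gives "//host/path" the page's scheme and keeps
    #    the CDN host, instead of treating it as a path on the current site.
    absolute = urljoin(page_url, raw)
    # 2) Normalize second: drop the #fragment (the query string is kept).
    absolute, _fragment = urldefrag(absolute)
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

For example, resolve_asset_url("https://example.com/blog/post.html", "//cdn.jsdelivr.net/npm/pkg/x.js") returns "https://cdn.jsdelivr.net/npm/pkg/x.js".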

🎨 CSS Asset Discovery

The crawler now parses CSS files to detect additional assets.

Resources referenced in url(...) are downloaded automatically, including:

  • Fonts (.woff, .woff2)
  • Background images
  • SVG assets

This significantly improves offline rendering of mirrored sites.
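A minimal sketch of url(...) discovery follows, assuming a regular-expression scan is good enough for typical stylesheets (the notes do not say how the crawler actually parses CSS). The key detail is that relative references resolve against the stylesheet's own URL, not the HTML page that linked it. find_css_assets and CSS_URL_RE are hypothetical names.

```python
import re
from urllib.parse import urljoin

# url(foo.png), url('foo.png') or url( "foo.png" ); group 2 is the reference.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def find_css_assets(css_text: str, css_url: str) -> list:
    """Return absolute URLs of fonts, images, SVGs, etc. referenced by url(...)."""
    found = []
    for match in CSS_URL_RE.finditer(css_text):
        ref = match.group(2).strip()
        if not ref or ref.startswith(("data:", "#")):
            continue  # inline data URIs need no download; "#id" points at SVG fragments
        # Relative refs resolve against the STYLESHEET's URL, not the page's.
        absolute = urljoin(css_url, ref)
        if absolute not in found:  # keep first-seen order, drop duplicates
            found.append(absolute)
    return found
```

This sketch does not follow bare @import "file.css" strings or unescape characters inside url(); a production parser would need to handle both.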


🧪 CI & Validation Enhancements

  • Added a dynamic GitHub Action for testing crawls against any target website.
  • Implemented validation checks for:
    • unresolved protocol-relative URLs
    • malformed internal asset paths
    • missing HTML output
  • Improved crawl reports and added artifact uploads.
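The three checks listed above could be approximated by a post-crawl script like the one below. This is a hypothetical sketch, not the project's workflow: validate_mirror is an invented name, and treating https://<site-host>/npm/ as the marker of a malformed internal path is an assumption based on the example in the URL resolution section.

```python
import re
from pathlib import Path

# src="//host/..." or href='//host/...' left in the output means the URL was never resolved.
PROTOCOL_RELATIVE_RE = re.compile(r"""\b(?:src|href)\s*=\s*["']//""", re.IGNORECASE)


def validate_mirror(out_dir, site_host, suspicious_prefixes=("/npm/",)):
    """Return a list of human-readable problems found in a finished mirror."""
    root = Path(out_dir)
    html_files = sorted(root.rglob("*.html"))
    if not html_files:
        return ["missing HTML output"]

    problems = []
    for page in html_files:
        text = page.read_text(encoding="utf-8", errors="replace")
        rel = page.relative_to(root).as_posix()
        if PROTOCOL_RELATIVE_RE.search(text):
            problems.append(f"{rel}: unresolved protocol-relative URL")
        for prefix in suspicious_prefixes:
            # HYPOTHETICAL heuristic: CDN-style paths that ended up on the site's own host.
            if any(f"{scheme}://{site_host}{prefix}" in text for scheme in ("http", "https")):
                problems.append(f"{rel}: malformed internal asset path ({prefix})")
    return problems
```

In CI, a non-empty result would fail the job, for instance by printing the list and exiting with a non-zero status.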

🧹 Code Quality Automation

  • Integrated automatic formatting with Black and isort.
  • Added Ruff linting for fast static analysis.
  • CI now automatically formats code if style issues are detected.
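One way to reproduce the "check, then auto-format" behavior locally is a small wrapper around the three tools' command-line interfaces. The flags shown (isort --profile black --check-only --diff, black --check --diff, ruff check --fix) are standard options of those tools, but the wrapper itself (style_commands, run_style) is hypothetical, and the project's actual CI configuration may pass different options.

```python
import shutil
import subprocess
import sys


def style_commands(fix: bool, target: str = "."):
    """Build the isort/Black/Ruff invocations for check mode or fix mode."""
    if fix:
        return [
            ["isort", "--profile", "black", target],  # sort imports first, Black-compatible
            ["black", target],
            ["ruff", "check", "--fix", target],
        ]
    return [
        ["isort", "--profile", "black", "--check-only", "--diff", target],
        ["black", "--check", "--diff", target],
        ["ruff", "check", target],
    ]


def run_style(fix: bool, target: str = ".") -> int:
    """Run each tool and return how many of them reported a problem."""
    failures = 0
    for cmd in style_commands(fix, target):
        if shutil.which(cmd[0]) is None:
            print(f"skipping {cmd[0]}: not installed", file=sys.stderr)
            failures += 1
            continue
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures
```

Calling run_style(fix=False) reports problems without touching files; if it returns a non-zero count, run_style(fix=True) applies the fixes, mirroring the CI behavior described above.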

🛠 Stability & Robustness

  • Improved path normalization and filesystem safety.
  • Enhanced logging clarity and crawl diagnostics.
  • Maintained strict traversal protection and hashing safeguards.
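The notes do not describe how path normalization, traversal protection and hashing fit together, so the following is only one common arrangement, under stated assumptions. safe_local_path and MAX_SEGMENT_BYTES are hypothetical: URL paths are percent-decoded and collapsed lexically, overlong segments and query strings are replaced by short SHA-256 digests, and a final resolve() check rejects anything that still lands outside the output root.

```python
import hashlib
import posixpath
from pathlib import Path
from urllib.parse import unquote, urlparse

MAX_SEGMENT_BYTES = 200  # hypothetical; many filesystems cap names at 255 bytes


def _digest(text: str, length: int) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def safe_local_path(root, url: str) -> Path:
    """Map a URL to a file path that is guaranteed to stay inside root."""
    root = Path(root).resolve()
    parsed = urlparse(url)
    path = unquote(parsed.path) or "/"  # decode %2e%2e etc. BEFORE normalizing
    if path.endswith("/"):
        path += "index.html"
    # Collapse "." and ".." lexically; a leading ".." cannot climb above "/".
    rel = posixpath.normpath("/" + path.lstrip("/")).lstrip("/") or "index.html"

    segments = []
    for seg in rel.split("/"):
        if len(seg.encode("utf-8")) > MAX_SEGMENT_BYTES:
            ext = posixpath.splitext(seg)[1][:10]
            seg = _digest(seg, 16) + ext  # deterministic short name
        segments.append(seg)

    if parsed.query:
        # Keep ?v=1 and ?v=2 apart without putting raw query text on disk.
        stem, ext = posixpath.splitext(segments[-1])
        segments[-1] = f"{stem}_{_digest(parsed.query, 8)}{ext}"

    candidate = root.joinpath(*segments).resolve()
    # Defense in depth: symlinks or unusual separators must not escape root.
    if not candidate.is_relative_to(root):
        raise ValueError(f"refusing to write outside {root}: {url}")
    return candidate
```

Hashing keeps names deterministic, so re-running a crawl maps the same URL to the same file.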

What's Changed

  • Improve URL handling and CDN asset support in website-downloader.py by @PKHarsimran in #23

Full Changelog
v2.2.0...v2.3.2
