github PKHarsimran/website-downloader v2.4.0
Improved Offline Mirroring and External Asset Handling

2 months ago

This release makes offline site mirroring more complete by expanding asset discovery and link rewriting, and it adds safer handling for modern websites that rely on external resources such as CDNs, manifests, responsive images, and inline CSS references.

✨ Highlights

Added external domain whitelisting

You can now restrict external asset downloads to approved domains with:

  • --external-domains

Passing this flag also enables external asset downloading automatically, so CDN-heavy websites can be mirrored in a more controlled way.
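
The notes do not say how a host is matched against --external-domains. A minimal Python sketch, assuming that an approved domain also admits its subdomains (so example.com would cover cdn.example.com), could look like this. The function names are hypothetical and not taken from the project:

```python
from urllib.parse import urlsplit


def is_allowed_external(url: str, allowed_domains: set) -> bool:
    """True if the URL's host is an approved domain or one of its subdomains.

    Assumption: subdomain matching; the real tool may match exact hosts only.
    """
    # .hostname is already lowercased by urllib.parse
    host = (urlsplit(url).hostname or "").rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower().strip(".")
        # Requiring "." + domain stops "notexample.com" from matching "example.com"
        if host == domain or host.endswith("." + domain):
            return True
    return False


def effective_download_external(download_external_assets: bool,
                                external_domains: set) -> bool:
    # Per the release notes: supplying --external-domains implies external downloads.
    return download_external_assets or bool(external_domains)


allowed = {"cdn.example.com"}
print(is_allowed_external("https://cdn.example.com/lib.js", allowed))        # True
print(is_allowed_external("https://cdn.example.com.evil.io/x.js", allowed))  # False
```

Checking the "." boundary, rather than using a plain endswith, is what keeps look-alike hosts out of the allowlist.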

Improved external asset support

External assets are now handled more cleanly for offline use:

  • external resources can be stored under cdn/<domain>/...
  • rewritten references now better support localized CDN assets
  • external assets can be selectively allowed instead of downloading everything
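
To illustrate the cdn/<domain>/... layout, here is a hypothetical sketch that maps an external URL to a local file path and computes the relative reference a mirrored page would use instead. The helper names, the index.html default, and the choice to drop "." and ".." segments are assumptions, not the project's actual code:

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def external_local_path(url: str) -> str:
    """Map an external asset URL to a relative path under cdn/<domain>/."""
    parts = urlsplit(url)
    host = (parts.hostname or "unknown-host").lower()
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # hypothetical default for directory-style URLs
    # Drop empty, "." and ".." segments so the result stays inside cdn/<domain>/
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return str(PurePosixPath("cdn", host, *segments))


def relative_ref(from_page: str, to_file: str) -> str:
    """Relative reference from a mirrored page to a localized file."""
    return posixpath.relpath(to_file, posixpath.dirname(from_page) or ".")


local = external_local_path("https://cdn.example.com/lib/app.css")
print(local)                                  # cdn/cdn.example.com/lib/app.css
print(relative_ref("blog/post.html", local))  # ../cdn/cdn.example.com/lib/app.css
```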

Expanded rewrite coverage

Offline rewriting now supports more resource types and locations, including:

  • src
  • href
  • data-src
  • poster
  • srcset
  • inline style URLs
  • inline <style> blocks
  • CSS url(...)
  • CSS @import
  • common static asset references inside downloaded JS files
  • og:image
  • twitter:image
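
Of these, srcset is the least obvious to rewrite, because one attribute holds several comma-separated candidates, each a URL optionally followed by a width (480w) or pixel-density (2x) descriptor. A simplified sketch, assuming a caller-supplied rewrite function (hypothetical), rewrites only the URL part of each candidate:

```python
from typing import Callable


def rewrite_srcset(srcset: str, rewrite: Callable[[str], str]) -> str:
    """Rewrite each URL in a srcset value, keeping its descriptor.

    Simplification: candidates are split on commas, so URLs that themselves
    contain commas (for example some data: URLs) are not handled.
    """
    rewritten = []
    for candidate in srcset.split(","):
        parts = candidate.strip().split(None, 1)  # URL, then optional descriptor
        if not parts:
            continue
        url = rewrite(parts[0])
        descriptor = parts[1].strip() if len(parts) > 1 else ""
        rewritten.append(f"{url} {descriptor}".strip())
    return ", ".join(rewritten)


print(rewrite_srcset("/img/a-480.jpg 480w, /img/a-800.jpg 800w", lambda u: "." + u))
# ./img/a-480.jpg 480w, ./img/a-800.jpg 800w
```

A fully spec-compliant srcset parser has to handle commas inside URLs; the comma split keeps the sketch short.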

Broader modern web resource support

Support has been expanded for more <link rel> resource types, including:

  • stylesheet
  • icon
  • shortcut
  • apple-touch-icon
  • preload
  • modulepreload
  • manifest
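
In HTML, rel is a space-separated list of case-insensitive tokens, which is why shortcut normally appears as part of rel="shortcut icon". A hypothetical filter over the types listed above only needs to test for any matching token:

```python
# rel types listed in the release notes; the set name is hypothetical
FETCHABLE_REL = {"stylesheet", "icon", "shortcut", "apple-touch-icon",
                 "preload", "modulepreload", "manifest"}


def should_fetch_link(rel_value: str) -> bool:
    """True if any token of a <link rel="..."> value is a fetchable type."""
    tokens = {token.lower() for token in rel_value.split()}
    return bool(tokens & FETCHABLE_REL)


print(should_fetch_link("shortcut icon"))  # True
print(should_fetch_link("canonical"))      # False
```

If the HTML parser already exposes rel as a list of tokens, the same set intersection applies without the split.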

Better URL normalization and protocol-relative handling

Improved canonicalization and protocol-relative URL handling now help reduce malformed paths and duplicate fetches.

Examples:

  • //cdn.example.com/file.css
  • default port normalization
  • fragment removal for stable deduplication
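
The three normalization steps above can be approximated with the standard library's urllib.parse. This is a sketch under stated assumptions (the function name is hypothetical, and userinfo and IPv6 literal hosts are not handled), not the downloader's own canonicalizer:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(href: str, base: str) -> str:
    """Resolve href against base and canonicalize it for deduplication."""
    # urljoin resolves relative and protocol-relative ("//host/path") references;
    # a protocol-relative URL inherits the base page's scheme.
    parts = urlsplit(urljoin(base, href))
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Drop the fragment: page.html#a and page.html#b are the same resource.
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, ""))


print(normalize_url("//cdn.example.com/file.css", "https://www.example.com/page.html"))
# https://cdn.example.com/file.css
print(normalize_url("HTTPS://Example.com:443/a#top", "https://example.com/"))
# https://example.com/a
```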

Safer filesystem handling

Path handling has been hardened to reduce failures across platforms:

  • illegal character replacement
  • reserved filename protection
  • segment sanitization
  • long-path shortening
  • hashed fallbacks for overly long filenames
  • safer handling of query-string collisions
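
A per-segment version of these safeguards might look like the following. The 100-character limit, the 8- and 12-character SHA-1 prefixes, and the function name are illustrative assumptions; the reserved names are the standard Windows device names:

```python
import hashlib
import re

# Characters Windows forbids in filenames, plus ASCII control characters
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Windows reserved device names (reserved with or without an extension)
RESERVED = {"CON", "PRN", "AUX", "NUL",
            *(f"COM{i}" for i in range(1, 10)),
            *(f"LPT{i}" for i in range(1, 10))}
MAX_SEGMENT = 100  # hypothetical limit for this sketch


def safe_segment(segment: str, query: str = "") -> str:
    """Turn one URL path segment (plus optional query) into a safe filename."""
    # Replace illegal characters; Windows also rejects trailing dots and spaces.
    name = ILLEGAL_CHARS.sub("_", segment).rstrip(" .") or "_"
    if name.split(".")[0].upper() in RESERVED:
        name = "_" + name
    if query:
        # Distinct query strings get distinct files instead of colliding.
        stem, dot, ext = name.rpartition(".")
        tag = hashlib.sha1(query.encode()).hexdigest()[:8]
        name = f"{stem}_{tag}.{ext}" if dot else f"{name}_{tag}"
    if len(name) > MAX_SEGMENT:
        # Hashed fallback: readable prefix + digest of the full name + extension.
        digest = hashlib.sha1(name.encode()).hexdigest()[:12]
        _, dot, ext = name.rpartition(".")
        ext = ext if dot and len(ext) <= 10 else ""
        keep = MAX_SEGMENT - len(digest) - len(ext) - 2
        name = f"{name[:keep]}_{digest}" + (f".{ext}" if ext else "")
    return name


print(safe_segment("a:b?.css"))  # a_b_.css
print(safe_segment("con.txt"))   # _con.txt
```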

Improved CSS and JS asset discovery

Downloaded CSS files are now scanned and rewritten for asset references such as fonts, images, and imports.
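
A regex-based sketch of such a CSS pass, assuming a caller-supplied rewrite function (hypothetical), rewrites url(...) references and quoted @import targets while leaving data: URIs and fragment references untouched:

```python
import re
from typing import Callable

# url(...) with optional single or double quotes around the reference
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# @import "file.css";  (the @import url(...) form is already caught by CSS_URL)
CSS_IMPORT = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def rewrite_css(css: str, rewrite: Callable[[str], str]) -> str:
    def repl_url(m: re.Match) -> str:
        ref = m.group(2).strip()
        if not ref or ref.startswith(("data:", "#")):
            return m.group(0)  # inline data and fragment refs need no download
        return f'url("{rewrite(ref)}")'

    def repl_import(m: re.Match) -> str:
        return f'@import "{rewrite(m.group(2).strip())}"'

    css = CSS_URL.sub(repl_url, css)
    return CSS_IMPORT.sub(repl_import, css)


print(rewrite_css("a{background:url(img/bg.png)}", lambda u: "../" + u))
# a{background:url("../img/bg.png")}
```

Running the url(...) pass first means @import url(...) is rewritten exactly once, because the second pattern only matches the quoted-string form.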

Obvious static asset URLs inside downloaded JS files are also rewritten when they point to known file types, which improves offline compatibility for some frontend bundles.
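
One way to spot "obvious" static asset URLs in JS is to look for quoted string literals that end in a known file extension. In the hypothetical sketch below, the extension list is an assumption, template literals and escaped quotes are ignored, and the rewrite callback returns None to leave a literal unchanged:

```python
import re
from typing import Callable, Optional

# Hypothetical list of "known file types" for this sketch
ASSET_EXTS = ("png", "jpg", "jpeg", "gif", "svg", "webp",
              "woff2", "woff", "ttf", "css", "json")

# A single- or double-quoted literal with no whitespace that ends in a known
# extension, optionally followed by a query string
JS_ASSET = re.compile(
    r"""(["'])([^"'\s]+\.(?:""" + "|".join(ASSET_EXTS) + r""")(?:\?[^"'\s]*)?)\1"""
)


def rewrite_js_assets(js: str, rewrite: Callable[[str], Optional[str]]) -> str:
    def repl(m: re.Match) -> str:
        new = rewrite(m.group(2))
        quote = m.group(1)
        return m.group(0) if new is None else f"{quote}{new}{quote}"

    return JS_ASSET.sub(repl, js)


src = 'const logo = "/static/logo.png"; const api = "/api/users";'
print(rewrite_js_assets(src, lambda u: "." + u))
# const logo = "./static/logo.png"; const api = "/api/users";
```

Matching only complete quoted literals keeps the pass conservative: API paths and computed URLs are left alone, which fits the notes' wording that only some frontend bundles benefit.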

Better non-fetchable URL handling

The crawler now skips more unsupported schemes safely, including:

  • mailto:
  • tel:
  • sms:
  • javascript:
  • data:
  • geo:
  • blob:
  • about:
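
A minimal filter for these schemes might look like this. It assumes that bare #fragment links are also skipped and that only relative, protocol-relative, http, and https URLs are fetched; both are assumptions, and the function name is hypothetical:

```python
from urllib.parse import urlsplit

SKIP_SCHEMES = {"mailto", "tel", "sms", "javascript", "data", "geo", "blob", "about"}


def is_fetchable(href: str) -> bool:
    href = href.strip()
    if not href or href.startswith("#"):
        return False  # empty or same-page fragment: nothing to download (assumption)
    scheme = urlsplit(href).scheme.lower()
    if scheme in SKIP_SCHEMES:
        return False
    # Relative ("page.html"), protocol-relative ("//host/x"), or http(s) URLs
    return scheme in ("", "http", "https")


print(is_fetchable("mailto:someone@example.com"))  # False
print(is_fetchable("//cdn.example.com/app.js"))    # True
```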

Optional Brotli-aware request handling

The script now detects whether Brotli support is available and, if it is, adjusts the Accept-Encoding request header automatically.
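
The notes do not say how Brotli support is detected. A plausible sketch, assuming a requests/urllib3-based client (urllib3 decodes br responses when the brotli or brotlicffi package is installed), checks for those packages before advertising br. The function and constant names are hypothetical:

```python
import importlib.util


def brotli_available() -> bool:
    # Assumption: "Brotli support" means a decoder package usable by urllib3
    # (and therefore requests) is importable; the project may check differently.
    return any(importlib.util.find_spec(name) is not None
               for name in ("brotli", "brotlicffi"))


def accept_encoding(with_brotli: bool) -> str:
    # Only advertise br when br-encoded responses can actually be decoded.
    return "gzip, deflate, br" if with_brotli else "gzip, deflate"


HEADERS = {"Accept-Encoding": accept_encoding(brotli_available())}
print(HEADERS)
```

Advertising br without a decoder would risk saving still-compressed bytes to disk, which is why the header depends on the check.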

Improved offline compatibility for localized external assets

When external assets are rewritten locally, integrity and crossorigin attributes are removed from localized <script> and <link> tags where needed to avoid offline loading problems.
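
The likely reasoning: the downloader may have rewritten a localized file's contents (so an SRI integrity hash would no longer match), and crossorigin triggers a CORS-mode request that typically fails for pages opened from disk. A hypothetical sketch over a plain attribute dict, such as an HTML parser might expose, looks like this:

```python
from typing import Dict

# Which attribute holds the resource URL for each localized tag type
URL_ATTR = {"script": "src", "link": "href"}


def localize_tag(tag_name: str, attrs: Dict[str, str], local_path: str) -> Dict[str, str]:
    """Point a <script> or <link> at its local copy and drop offline blockers."""
    new_attrs = dict(attrs)  # leave the caller's dict untouched
    new_attrs[URL_ATTR[tag_name]] = local_path
    new_attrs.pop("integrity", None)    # SRI hash may not match a rewritten file
    new_attrs.pop("crossorigin", None)  # CORS-mode fetch tends to fail from disk
    return new_attrs


print(localize_tag("script",
                   {"src": "https://cdn.example.com/app.js",
                    "integrity": "sha384-abc", "crossorigin": "anonymous"},
                   "cdn/cdn.example.com/app.js"))
# {'src': 'cdn/cdn.example.com/app.js'}
```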

🔧 CLI

New

  • --external-domains

Existing

  • --url
  • --destination
  • --max-pages
  • --threads
  • --download-external-assets

📌 Notes

This release is focused on making the downloader more reliable for modern websites with:

  • CDN-hosted assets
  • responsive images
  • inline style-based assets
  • CSS imports
  • social preview images
  • manifest and icon resources

It should provide a more complete offline mirror than the previous release, especially for sites that depend on external static assets.

⚠️ Limitations

This is still best suited for static or mostly server-rendered websites.

Some sites may still require additional handling if they depend heavily on:

  • authentication flows
  • JavaScript-driven navigation
  • API-loaded content
  • dynamic tokens or runtime state

🙌 Feedback

If you test this release on a site that previously had missing assets or broken offline rendering, feel free to open an issue with:

  • target URL
  • command used
  • what improved
  • what still failed
  • relevant log snippets

What's Changed

Full Changelog: v2.3.2...v2.4.0
