✨ Add Media Collection to Scraping Pipeline

Summary

This PR adds the collect_media function, which detects and downloads media assets from a web page through a Selenium-controlled browser session.

🔧 Features

Supported Media Types:

Images (<img>)
Videos (<video>)
Audio files (<audio>)
PDFs (<a href="*.pdf">)
Documents (.doc, .docx, .txt, .rtf)
Presentations (.ppt, .pptx)
Spreadsheets (.xls, .xlsx, .csv)
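
As a rough illustration of how the media types listed above could map onto CSS selectors, here is a minimal sketch. The names LINK_EXTENSIONS, MEDIA_SELECTORS, link_selector, and extract_urls are hypothetical and not taken from the PR; the driver argument is assumed to be a Selenium WebDriver (or anything exposing find_elements and elements with get_attribute).

```python
# Hypothetical grouping of link-based media types by file extension.
# The extensions themselves come from the list above; the names do not.
LINK_EXTENSIONS = {
    "pdfs": [".pdf"],
    "documents": [".doc", ".docx", ".txt", ".rtf"],
    "presentations": [".ppt", ".pptx"],
    "spreadsheets": [".xls", ".xlsx", ".csv"],
}


def link_selector(extensions):
    """Build a CSS selector matching <a> tags whose href ends in any extension."""
    return ", ".join(f'a[href$="{ext}"]' for ext in extensions)


# Tag-based media types are matched by element name; link-based ones by href suffix.
MEDIA_SELECTORS = {"images": "img", "videos": "video", "audio": "audio"}
MEDIA_SELECTORS.update(
    {media_type: link_selector(exts) for media_type, exts in LINK_EXTENSIONS.items()}
)


def extract_urls(driver, selector):
    """Collect the src (or, failing that, href) of every element matching selector.

    "css selector" is the string value of Selenium's By.CSS_SELECTOR, used here
    directly so the sketch does not need to import Selenium.
    """
    urls = []
    for element in driver.find_elements("css selector", selector):
        url = element.get_attribute("src") or element.get_attribute("href")
        if url:
            urls.append(url)
    return urls
```

Note that an href$= suffix match is case-sensitive and will miss links that carry a query string after the extension, so the actual implementation may well filter URLs differently.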

Functionality:

Uses CSS selectors to find elements containing media links.
Downloads each valid media file (HTTP/HTTPS only).
Saves all assets to a structured media/ directory, grouped by media type.
Writes a download_summary.txt with the original URLs and their local file paths.
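
The download-and-record flow described above might look roughly like the following. This is a sketch under assumptions, not the PR's code: the helper names (is_http_url, target_path, download, write_summary) and the "url -> path" line format of the summary file are invented for illustration. Only the media/ directory layout and the download_summary.txt filename come from the description.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def is_http_url(url):
    """Only HTTP and HTTPS URLs are treated as downloadable."""
    return urlparse(url).scheme in ("http", "https")


def target_path(media_dir, media_type, url):
    """Place a file under <media_dir>/<media_type>/ using the URL's last path segment."""
    filename = PurePosixPath(urlparse(url).path).name
    return Path(media_dir) / media_type / filename


def download(url, dest, timeout=30):
    """Fetch url and write its body to dest, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=timeout) as response:
        dest.write_bytes(response.read())


def write_summary(media_dir, downloaded):
    """Write one 'url -> local path' line per download (line format is an assumption)."""
    summary = Path(media_dir) / "download_summary.txt"
    summary.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{url} -> {path}" for url, path in downloaded]
    summary.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return summary
```

A caller would filter URLs with is_http_url, compute each destination with target_path, call download, and pass the resulting (url, path) pairs to write_summary at the end.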

Error Handling:

Skips failed downloads and logs the error.
Generates fallback filenames when none are detected in the URL.
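
One way the two error-handling behaviors above could be implemented is sketched below. The function names, the "<media_type>_<index>" fallback pattern, and the injectable fetch callable are all assumptions made for illustration; the PR does not specify how fallback names are formed or which exceptions are caught.

```python
import logging
from pathlib import PurePosixPath
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def filename_for(url, media_type, index):
    """Use the URL's last path segment, or a generated name if there is none."""
    name = PurePosixPath(urlparse(url).path).name
    if name:
        return name
    # Hypothetical fallback pattern: media type plus position in the list.
    return f"{media_type}_{index}"


def download_all(urls, media_type, fetch):
    """Fetch every URL, logging and skipping failures instead of aborting.

    fetch is any callable that takes a URL and returns bytes; it is injected
    so the loop can be exercised without network access.
    """
    saved = []
    for index, url in enumerate(urls):
        try:
            data = fetch(url)
        except Exception as exc:  # broad on purpose: one bad asset must not stop the run
            logger.error("Failed to download %s: %s", url, exc)
            continue
        saved.append((url, filename_for(url, media_type, index), data))
    return saved
```

Catching a broad Exception keeps a single broken asset from stopping the whole collection pass; a real implementation might narrow this to network and I/O errors.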

jaypyles/Scraperr v1.0.6 (Media Collection) on GitHub
