github D4Vinci/Scrapling v0.4.10
Release v0.4.10

8 hours ago

A new update with a brand-new Scrapy integration and a batch of community fixes 🎉

🚀 New Stuff and quality of life changes

  • Added a Scrapy integration so you can use Scrapling's parsing API inside existing Scrapy projects without rewriting them. Put the scrapling_response decorator on any spider callback, and the response it receives becomes a Scrapling Response while Scrapy keeps handling the crawling (see the docs for details):

    import scrapy
    from scrapling.integrations.scrapy import scrapling_response
    
    
    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com"]
    
        @scrapling_response
        def parse(self, response):  # `response` is now a Scrapling Response
            first_quote = response.find_by_text("The world as we have created it", partial=True)
            for quote in [first_quote, *first_quote.find_similar()]:
                yield {"text": quote.get_all_text(strip=True)}
  • The MCP server can now use a custom Chromium-compatible browser for all browser-based tools. Set it globally with scrapling mcp --executable-path "/path/to/chromium" or the SCRAPLING_EXECUTABLE_PATH environment variable, or per request with the executable_path argument. By @samrusani in #360 (Solves #347).

  • Updated all browsers and fingerprints. Run scrapling install --force after updating to refresh them.
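
The two global ways of setting the browser path mentioned above can also be scripted from Python. This is a minimal sketch, assuming the scrapling CLI is available on PATH. The helper name build_mcp_command and the example path are hypothetical; only the --executable-path flag and the SCRAPLING_EXECUTABLE_PATH variable come from these release notes.

```python
import os


def build_mcp_command(executable_path, use_env=False):
    """Return (argv, env) for launching the MCP server with a custom browser.

    Hypothetical helper for illustration: it only assembles the command
    and environment, it does not start anything.
    """
    env = dict(os.environ)
    argv = ["scrapling", "mcp"]
    if use_env:
        # Option 1: pass the browser path through the environment variable.
        env["SCRAPLING_EXECUTABLE_PATH"] = executable_path
    else:
        # Option 2: pass the browser path through the CLI flag.
        argv += ["--executable-path", executable_path]
    return argv, env


argv, env = build_mcp_command("/path/to/chromium")
print(argv)
```

Passing the result to subprocess.run(argv, env=env) would then start the server with the chosen browser.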

🐛 Bug Fixes

  • Fixed garbled text (mojibake) from browser fetchers on non-UTF-8 websites by @yehudalevy-collab in #365 (Fixes #364).
  • Fixed LinkExtractor not filtering compound file extensions like .tar.gz by @renbkna in #359 (Fixes #349).
  • Fixed paused crawls losing their in-flight requests from checkpoints, so resuming no longer skips them, by @yetval in #358.
  • Fixed spiders calculating wrong crawl delays from robots.txt Request-rate directives by upgrading Protego, and aligned the tests with the new behavior, by @Disaster-Terminator in #355.
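
To illustrate why compound extensions such as .tar.gz are easy to miss: a check that looks only at the final suffix of archive.tar.gz sees .gz, not .tar.gz. The sketch below shows one way to match compound suffixes using only the standard library. The function has_denied_extension and the deny list are hypothetical, and this is not Scrapling's actual LinkExtractor implementation.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def has_denied_extension(url, denied):
    """Return True if the URL path ends with any denied extension,
    whether simple ("zip") or compound ("tar.gz")."""
    # Use only the path component, so query strings do not interfere.
    name = PurePosixPath(urlparse(url).path).name.lower()
    return any(name.endswith("." + ext.lower().lstrip(".")) for ext in denied)


denied = {"tar.gz", "zip"}

# A suffix-only check sees just the last part of a compound extension.
print(PurePosixPath("/files/archive.tar.gz").suffix)

print(has_denied_extension("https://example.com/files/archive.tar.gz", denied))
print(has_denied_extension("https://example.com/docs/page.html", denied))
```

Matching on the end of the file name rather than on the last suffix handles simple and compound extensions with the same rule.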

Docs

  • Clarified how init_script interacts with Patchright's isolated execution context in stealth mode by @mturac in #353 (Solves #350).
  • Added the skills.sh install method for the agent skill by @ob-aion in #363.

🙏 Special thanks to the community for all the continuous testing and feedback

