A new update with a brand-new Scrapy integration and a batch of community fixes 🎉
🚀 New Stuff and quality of life changes
- Added a Scrapy integration so you can use Scrapling's parsing API inside your existing Scrapy projects without rewriting them. Put the `scrapling_response` decorator on any spider callback, and the response it receives becomes a `ScraplingResponse` while Scrapy keeps handling the crawling (check the docs):

  ```python
  import scrapy
  from scrapling.integrations.scrapy import scrapling_response


  class QuotesSpider(scrapy.Spider):
      name = "quotes"
      start_urls = ["https://quotes.toscrape.com"]

      @scrapling_response
      def parse(self, response):
          # `response` is now a Scrapling Response
          first_quote = response.find_by_text("The world as we have created it", partial=True)
          for quote in [first_quote, *first_quote.find_similar()]:
              yield {"text": quote.get_all_text(strip=True)}
  ```
- The MCP server can now use a custom Chromium-compatible browser for all browser-based tools. Set it once with `scrapling mcp --executable-path "/path/to/chromium"` or the `SCRAPLING_EXECUTABLE_PATH` environment variable, or per request with the `executable_path` argument, by @samrusani in #360 (Solves #347).
- Updated all browsers and fingerprints. Run `scrapling install --force` after updating to refresh them.
🐛 Bug Fixes
- Fixed garbled text (mojibake) from browser fetchers on non-UTF-8 websites by @yehudalevy-collab in #365 (Fixes #364).
- Fixed `LinkExtractor` not filtering compound file extensions like `.tar.gz` by @renbkna in #359 (Fixes #349).
- Fixed paused crawls losing their in-flight requests from checkpoints, so resuming no longer skips them by @yetval in #358.
- Fixed spiders calculating wrong crawl delays from robots.txt `Request-rate` directives through the Protego upgrade, with tests aligned by @Disaster-Terminator in #355.
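
To picture the compound-extension fix in the list above, here is a minimal standalone sketch of why checking only the text after the last dot misses `.tar.gz`. It is not Scrapling's implementation; `DENIED_EXTENSIONS`, `has_denied_extension`, and `last_suffix_only` are hypothetical names made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical deny-list for illustration only; not Scrapling's actual defaults.
DENIED_EXTENSIONS = frozenset({"zip", "pdf", "tar.gz"})


def last_suffix_only(url: str, denied: frozenset = DENIED_EXTENSIONS) -> bool:
    """Naive check: looks only at the text after the final dot in the path."""
    path = urlparse(url).path.lower()
    return path.rsplit(".", 1)[-1] in denied


def has_denied_extension(url: str, denied: frozenset = DENIED_EXTENSIONS) -> bool:
    """Suffix check: matches the whole extension, so 'tar.gz' is caught."""
    path = urlparse(url).path.lower()
    return any(path.endswith("." + ext) for ext in denied)


if __name__ == "__main__":
    url = "https://example.com/files/backup.tar.gz"
    print(last_suffix_only(url))       # the naive check sees only "gz"
    print(has_denied_extension(url))   # the suffix check sees ".tar.gz"
```

Matching against the URL path rather than the full URL keeps query strings such as `?download=1` from hiding the extension.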
Docs
- Clarified how `init_script` interacts with Patchright's isolated execution context in stealth mode by @mturac in #353 (Solves #350).
- Added the skills.sh install method for the agent skill by @ob-aion in #363.
🙏 Special thanks to the community for all the continuous testing and feedback