github D4Vinci/Scrapling v0.4.3
Release v0.4.3

8 hours ago

A new update with many important changes 🎉

New Stuff and quality of life changes

  • Added a new MCP tool to open a persistent normal/stealthy browser to keep using it with the rest of the tools, and another new tool to close it. (Examples)
  • Added a new MCP tool to list all existing browser sessions. Aimed to be used with the new tools.
  • Added a new option to browser sessions to automatically collect all background requests that happen during a request (Solves #159) [Examples].
  • Added a new sanitizer to protect the MCP server from common Prompt Injection attacks by removing hidden/invisible content.
  • Added a new commandline option called --ai-targeted to the Web Scraping commands to make content targeted to AI and safe against common Prompt Injection attacks like the MCP server.
  • Added a new option to browser sessions called executable_path to allow setting a custom browser path (Solves #202)
  • Refactored the MCP server code to be easily maintained and unified all tools to be async.
  • Refactored the CLI commands code to be easily maintained and shorter by 210 lines.

Solved bugs

  • A fix to preserve HTTP method across retries in spider session by @karesansui-u in #201
  • Added a max retry limit to getting page content to prevent infinite loop by @haosenwang1018 & @D4Vinci in #197
  • Replace bare raise with return False in _restore_from_checkpoint by @haosenwang1018 in #196
  • Replaced get_all with getall in Texthandler to match the Selector class.

Coverage/tests improvement

  • Added _normalize_credentials edge case coverage tests by @Bortlesboat in #192
  • Added save/retrieve round-trip and core storage coverage tests by @haosenwang1018 in #193
  • Added coverage for TextHandler regex paths and TextHandlers.re() by @haosenwang1018 in #194
  • Added edge case tests for filter, iterancestors, and find_similar by @awanawana in #200

Agent Skill improvement

  • Fixed broken markdown links in skill references by @yetval in #204
  • Improved the skill structure to be more acceptable by Clawhub validation.
  • Forced the skill to use the --ai-targeted commandline option when scraping through commandline commands.

Docs improvement

  • Added Korean README translation by @greatsk55 in #187
  • CJK Latin spacing fixes for the Chinese and Japanese READMEs.
  • Fixed broken links from the old website design.

🙏 Special thanks to the community for all the continuous testing and feedback


Big shoutout to our Platinum Sponsors

Don't miss a new Scrapling release

NewReleases is sending notifications on new releases.