A new update with many important changes 🎉
New Stuff and quality of life changes
- Added a new MCP tool that opens a persistent normal/stealthy browser session the rest of the tools can keep using, and another new tool to close it. (Examples)
- Added a new MCP tool to list all existing browser sessions. Intended to be used with the new tools above.
- Added a new option to browser sessions to automatically collect all background requests that happen during a request (Solves #159) [Examples].
- Added a new sanitizer to protect the MCP server from common Prompt Injection attacks by removing hidden/invisible content.
- Added a new commandline option called `--ai-targeted` to the Web Scraping commands to make content targeted to AI and safe against common Prompt Injection attacks like the MCP server.
- Added a new option to browser sessions called `executable_path` to allow setting a custom browser path (Solves #202)
- Refactored the MCP server code to be easily maintained and unified all tools to be async.
- Refactored the CLI commands code to be easily maintained and shorter by 210 lines.
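To illustrate what "removing hidden/invisible content" can mean in practice, here is a minimal standard-library sketch. It is not the sanitizer shipped in this release: the names `VisibleTextExtractor` and `visible_text` are hypothetical, and the rules it applies (the `hidden` attribute, inline `display: none` / `visibility: hidden` styles, HTML comments, script-like tags, zero-width characters) are an assumed subset of what a real sanitizer would need to cover.

```python
import re
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Tags whose text content is never shown to a human reader.
NON_VISIBLE_TAGS = {"script", "style", "template", "noscript"}
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects only text outside hidden subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    @staticmethod
    def _is_hidden(tag, attrs):
        attrs = dict(attrs)
        if tag in NON_VISIBLE_TAGS or "hidden" in attrs:
            return True
        return bool(HIDDEN_STYLE.search(attrs.get("style") or ""))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a hidden element, every nested tag deepens the count.
        if self.hidden_depth or self._is_hidden(tag, attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

    # HTML comments are dropped: handle_comment is left as the default no-op.


def visible_text(html: str) -> str:
    """Return whitespace-normalized text a human would plausibly see."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    text = ZERO_WIDTH.sub("", "".join(parser.parts))
    return " ".join(text.split())
```

This sketch errs on the side of dropping text: an unclosed tag inside a hidden element keeps everything after it hidden. It also ignores stylesheet-based hiding, which a production sanitizer would have to handle.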
Solved bugs
- Fixed the spider session to preserve the HTTP method across retries by @karesansui-u in #201
- Added a max retry limit to getting page content to prevent an infinite loop by @haosenwang1018 & @D4Vinci in #197
- Replaced bare `raise` with `return False` in `_restore_from_checkpoint` by @haosenwang1018 in #196
- Replaced `get_all` with `getall` in `TextHandler` to match the Selector class.
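The two retry-related fixes above follow a common pattern, sketched here in generic Python. This is not the library's code: `Request`, `fetch_with_retries`, and the `send` callable are hypothetical names, and treating only `ConnectionError` as retryable is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request description; frozen so retries cannot mutate it."""
    url: str
    method: str = "GET"


def fetch_with_retries(request: Request,
                       send: Callable[[Request], str],
                       max_retries: int = 3) -> str:
    """Call send(request) once, then retry at most max_retries times."""
    last_error: Optional[Exception] = None
    # A bounded range replaces an open-ended loop, so the function
    # always terminates even if the target never responds.
    for _ in range(1 + max_retries):
        try:
            # The same request object is reused on every attempt, so a POST
            # stays a POST instead of silently falling back to a default GET.
            return send(request)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {max_retries} retries") from last_error
```

Reusing one immutable request object is one simple way to guarantee that the method survives a retry. Rebuilding the request from the URL alone on each attempt is the kind of shortcut that loses it.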
Coverage/tests improvement
- Added `_normalize_credentials` edge case coverage tests by @Bortlesboat in #192
- Added save/retrieve round-trip and core storage coverage tests by @haosenwang1018 in #193
- Added coverage for `TextHandler` regex paths and `TextHandlers.re()` by @haosenwang1018 in #194
- Added edge case tests for `filter`, `iterancestors`, and `find_similar` by @awanawana in #200
Agent Skill improvement
- Fixed broken markdown links in skill references by @yetval in #204
- Improved the skill structure to be more acceptable to Clawhub validation.
- Forced the skill to use the `--ai-targeted` commandline option when scraping through commandline commands.
Docs improvement
- Added Korean README translation by @greatsk55 in #187
- CJK Latin spacing fixes for the Chinese and Japanese READMEs.
- Fixed broken links from the old website design.
🙏 Special thanks to the community for all the continuous testing and feedback
Big shoutout to our Platinum Sponsors