What's changed
New features
- Introducing the `Fetchers` feature with 3 new main types to make Scrapling fetch pages for you with a LOT of options!
  - The `Fetcher` class for basic HTTP requests.
  - The `StealthyFetcher` class, a completely stealthy fetcher that uses a stealthy modified version of Firefox.
  - The `PlayWrightFetcher` class, which allows doing browser-based requests with vanilla PlayWright, PlayWright with stealth mode made by me, real browsers through CDP, and NSTBrowser's docker browserless!
- Added the completely new `find_all`/`find` methods to find elements easily on the page with dark magic!
- Added the methods `filter` and `search` to the `Adaptors` class for easier bulk operations on `Adaptor` object groups.
- Added the `css_first` and `xpath_first` methods for easier usage.
- Added the new class type `TextHandlers`, which is used for bulk operations on `TextHandler` objects like the `Adaptors` class.
- Added the `generate_full_css_selector` and `generate_full_xpath_selector` methods.
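
To give a feel for the bulk operations mentioned above, here is a minimal sketch of what `filter` and `search` on a group of objects could look like. It is not Scrapling's code: the `Group` class is a hypothetical stand-in built on a plain list, and the semantics (that `filter` returns a new group and `search` returns the first hit or `None`) are assumptions made for illustration.

```python
from typing import Callable, Optional


class Group(list):
    """Hypothetical stand-in for a group of elements (not Scrapling's class)."""

    def filter(self, func: Callable[[object], bool]) -> "Group":
        # Keep every item for which the predicate is true, as a new group.
        return Group(item for item in self if func(item))

    def search(self, func: Callable[[object], bool]) -> Optional[object]:
        # Return the first item that satisfies the predicate, or None.
        for item in self:
            if func(item):
                return item
        return None


prices = Group(["$10", "n/a", "$25", "$7"])

# Bulk operation: keep only the entries that look like prices.
with_dollar = prices.filter(lambda text: text.startswith("$"))

# Single lookup: first price above 20 (short-circuits on non-price entries).
first_over_20 = prices.search(
    lambda text: text.startswith("$") and int(text[1:]) > 20
)

print(with_dollar)
print(first_over_20)
```

Because the assumed `filter` returns the same group type, calls can be chained, which is the main convenience over a plain list comprehension.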
Bugs Squashed
- The `Adaptors` class version of `re_first` now returns the first result that matches across all `Adaptor` objects inside, instead of the faulty logic of returning the results of `re_first` of all `Adaptor` objects.
- If the user selects text-type content to be returned from selected elements (like the CSS `::text` function) with any method like `.css` or `.xpath`, the `Adaptor` object now returns the `TextHandlers` class instead of a list of strings like before. So now you can do `page.css('something::text').re_first(r'regex_pattern').json()` instead of `page.css('something::text')[0].re_first(r'regex_pattern').json()`.
- The `Adaptor`/`Adaptors` `re`/`re_first` arguments are now consistent with the `TextHandler` ones, so you now have the `clean_match` and `case_sensitive` arguments.
- The `auto_match` argument is now enabled by default in the initialization of `Adaptor`, but you still have to enable it while selecting elements if you want to use it. (Not a bug but a design decision.)
- A lot of type-annotation corrections here and there for a better auto-completion experience while you are coding with Scrapling.
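
The `re_first` fix above is easier to see side by side. The sketch below is not Scrapling's implementation: `TextItem` and `TextGroup` are hypothetical stand-ins, and the exact shape of the old result (one entry per item, including misses) is an assumption. It only contrasts "one result per object" with "first match across all objects".

```python
import re
from typing import List, Optional


class TextItem(str):
    """Hypothetical stand-in for a single text result (not Scrapling's class)."""

    def re_first(self, pattern: str) -> Optional[str]:
        # First regex match inside this one string, or None.
        match = re.search(pattern, self)
        return match.group(0) if match else None


class TextGroup(list):
    """Hypothetical stand-in for a group of text results."""

    def re_first_old(self, pattern: str) -> List[Optional[str]]:
        # Old, faulty behavior (as assumed here): one result per item.
        return [item.re_first(pattern) for item in self]

    def re_first(self, pattern: str) -> Optional[str]:
        # Fixed behavior: the first match found across all items.
        for item in self:
            found = item.re_first(pattern)
            if found is not None:
                return found
        return None


texts = TextGroup(
    [TextItem("no digits here"), TextItem("id=42"), TextItem("id=7")]
)

old_result = texts.re_first_old(r"\d+")  # a list, one entry per item
new_result = texts.re_first(r"\d+")      # a single value

print(old_result)
print(new_result)
```

Returning a single value from the group is what makes the shorter chained form possible, since the caller no longer has to index into the group with `[0]` first.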
Quality of life changes
- Renamed both the `css_selector` and `xpath_selector` methods to `generate_css_selector` and `generate_xpath_selector` for clarity and to not interrupt the auto-completion while coding.
- Restructured most of the old code into a `core` subpackage, along with other design decisions for cleaner and easier maintenance in the future.
- Restructured the tests folder into a cleaner structure and added tests for the new features. Also, tox environments are now cached on GitHub for faster automated tests with each commit.