github D4Vinci/Scrapling v0.3.10
Release v0.3.10

3 months ago

A maintenance update with many significant changes and possible breaking changes

  • Solved all encoding issues with a better approach that also handles web pages whose encoding is not declared correctly (thanks to @Kemsty2 for pointing that out in #110 #111).
  • Fixed a logic issue, present since v0.3, with request-level parameters overriding session-level parameters in all browser-based fetchers.
  • Fixed the signatures of the shortcuts in the interactive web scraping shell, so autocompletion now works properly for those shortcuts. This issue has been present since v0.3 as well.
  • Bumped the version of the MaxMind database, which will improve the geoip argument for StealthyFetcher and its session classes.
  • Updated all used browser versions to the latest available ones.
  • BREAKING - all fetchers have gone through a big refactor, which resulted in some interesting things that might break your code:
    1. The Scrapling codebase is now smaller by ~750 lines, with many changes that should make future maintenance much easier and use slightly fewer resources.
    2. The validation for all fetchers and their session classes became much faster, which will reflect on their overall speed.
    3. To achieve this, all fetchers no longer accept positional arguments other than the url argument; the rest of the arguments must be passed as keyword arguments. Your code must look like Fetcher.get('https://google.com', stealthy_headers=True), not Fetcher.get('https://google.com', True), if you were doing that for some reason!
    4. An annoying difference between browser-based fetchers and their session classes since v0.3 was that the argument used to pass custom parser settings per request was called custom_config, while it was named selector_config in the session classes. This refactor allowed us to unify the naming to selector_config without breaking your code, so the main one is now selector_config with backward compatibility for the custom_config argument. The autocompletion support will be available only for the selector_config argument.
    5. Also, to achieve all of this, we had to make the type hints of the fetchers' functions dynamically generated, so if you don't get a proper autocompletion in your IDE, make sure you are using a modern version of it. We have tested almost all known IDEs/editors.
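
To illustrate points 3 and 4, here is a minimal sketch of how a keyword-only signature with a backward-compatible alias behaves in Python. The fetch function, its return value, and the example_option key are hypothetical stand-ins invented for this illustration; this is not Scrapling's actual implementation, and only the argument names stealthy_headers, selector_config, and custom_config come from the notes above.

```python
def fetch(url, *, stealthy_headers=False, selector_config=None, custom_config=None):
    """Hypothetical stand-in for a fetcher method (not Scrapling's real code)."""
    # Everything after the bare `*` is keyword-only, so fetch(url, True)
    # raises TypeError instead of silently binding True to some argument.
    if selector_config is None and custom_config is not None:
        # Assumed fallback: accept the old name when the new one is absent.
        selector_config = custom_config
    return {
        "url": url,
        "stealthy_headers": stealthy_headers,
        "selector_config": selector_config or {},
    }


# New-style call: only the URL is positional.
new_style = fetch(
    "https://example.com",
    stealthy_headers=True,
    selector_config={"example_option": True},  # placeholder key
)

# Old argument name still accepted through the alias.
old_style = fetch("https://example.com", custom_config={"example_option": True})

# A second positional argument is rejected by the signature itself.
try:
    fetch("https://example.com", True)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

If your existing code passes extra positional arguments to a fetcher, converting each one to its keyword form should be enough to adapt to this release.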

We have also updated all benchmark tables with the current numbers against the latest versions of all alternative libraries.

🙏 Special thanks to our Discord community for all the continuous testing and feedback


Big shoutout to our biggest Sponsors
