github crwlrsoft/crawler v0.1.0

pre-release, 2 years ago

Initial version, containing:

  • Crawler class: the main unit, which executes all the steps that you add to it and handles the input and output of those steps.
  • HttpCrawler class, which uses the PoliteHttpLoader (a version of the HttpLoader that sticks to robots.txt rules). It works with any PSR-18 HTTP client under the hood and has its own cookie jar implementation.
  • Some ready-to-use steps for HTTP, HTML, XML, JSON and CSV.
  • Loops and Groups.
  • The Crawler takes a PSR-3 LoggerInterface and passes it on to all the steps. The included steps log some messages about what they're doing. The package includes a simple CliLogger.
  • The Crawler requires a user agent, and the included BotUserAgent class provides an easy interface for bot user agent strings.
  • Stores for saving the final results can be added to the Crawler. A simple CSV file store ships with the package.
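
To illustrate the general idea behind the list above (a crawler that chains steps, hands its logger to each of them, and passes final results to a store), here is a minimal conceptual sketch in Python. It is not the package's PHP API: the names `Step`, `Crawler`, `add_step`, `set_store` and `run` below are hypothetical and chosen only for this illustration.

```python
import logging
from typing import Any, Callable, Iterable, List, Optional


class Step:
    """Hypothetical step: turns one input into zero or more outputs."""

    def __init__(self, fn: Callable[[Any], Iterable[Any]]) -> None:
        self.fn = fn
        self.logger = logging.getLogger("step")  # replaced by the crawler

    def invoke(self, value: Any) -> Iterable[Any]:
        self.logger.info("running %s on %r", self.fn.__name__, value)
        return self.fn(value)


class Crawler:
    """Hypothetical crawler: runs steps in order, feeding outputs forward."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger
        self.steps: List[Step] = []
        self.store: Optional[Callable[[Any], None]] = None

    def add_step(self, step: Step) -> "Crawler":
        step.logger = self.logger  # the crawler passes its logger to the step
        self.steps.append(step)
        return self

    def set_store(self, store: Callable[[Any], None]) -> "Crawler":
        self.store = store
        return self

    def run(self, inputs: Iterable[Any]) -> List[Any]:
        values = list(inputs)
        for step in self.steps:
            # Every output of the previous step becomes an input of this one.
            values = [out for value in values for out in step.invoke(value)]
        if self.store is not None:
            for value in values:
                self.store(value)  # only final results reach the store
        return values


def split_words(text: str) -> List[str]:
    return text.split()


def to_upper(word: str) -> List[str]:
    return [word.upper()]


saved: List[str] = []
crawler = (
    Crawler(logging.getLogger("sketch"))
    .add_step(Step(split_words))
    .add_step(Step(to_upper))
    .set_store(saved.append)
)
results = crawler.run(["hello crawler"])
```

In this sketch, `results` and `saved` both end up holding the outputs of the last step. The real package's steps, loaders and stores have their own interfaces, so consult its documentation for actual usage.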
