github crwlrsoft/crawler v0.4.0

Pre-release, 2 years ago

Added

  • The BaseStep class now has where() and orWhere() methods to filter step outputs. You can set multiple filters, which are applied to all outputs. A filter set using orWhere() is linked to the previously added filter with "OR". Outputs that don't pass the filters are not yielded. The available filters can be accessed through static methods on the new Filter class. Currently available are comparison filters (equal, greater/less than, ...), a few string filters (contains, starts/ends with) and URL filters (scheme, domain, host, ...).
  • The GetLink and GetLinks steps now have the methods onSameDomain(), notOnSameDomain(), onDomain(), onSameHost(), notOnSameHost() and onHost() to restrict which links are found.
  • The crawler's logger is now automatically added to the Store, so you can also log messages from there. This may be a breaking change, because StoreInterface now also requires an addLogger() method. The new abstract Store class already implements it, so you can simply extend that class.
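The notes only say that an orWhere() filter is linked to the previously added one with "OR". The following is a minimal Python sketch of one plausible reading, assuming each where() adds a filter that must pass (AND) and each orWhere() forms an OR group with the filter before it. `FilteredStep`, `or_where` and the helper functions are hypothetical illustrations of the semantics, not the PHP library's API.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Iterator, Optional

# Hypothetical stand-ins for static constructors on the PHP Filter class.
Predicate = Callable[[Any], bool]


def equal(expected: Any) -> Predicate:
    return lambda value: value == expected


def greater_than(limit: float) -> Predicate:
    return lambda value: value is not None and value > limit


def contains(needle: str) -> Predicate:
    return lambda value: isinstance(value, str) and needle in value


def starts_with(prefix: str) -> Predicate:
    return lambda value: isinstance(value, str) and value.startswith(prefix)


class FilteredStep:
    """Models where()/or_where() as AND-ed groups of OR-ed filters (assumed)."""

    def __init__(self) -> None:
        self._groups: list[list[tuple[Optional[str], Predicate]]] = []

    def where(self, key: Optional[str], predicate: Predicate) -> FilteredStep:
        # Every where() call starts a new group that must match (AND).
        self._groups.append([(key, predicate)])
        return self

    def or_where(self, key: Optional[str], predicate: Predicate) -> FilteredStep:
        if not self._groups:
            raise ValueError("or_where() needs a preceding where()")
        # or_where() is OR-linked to the previously added filter.
        self._groups[-1].append((key, predicate))
        return self

    def _passes(self, output: Any) -> bool:
        # key=None filters the whole output; otherwise one array key is checked.
        return all(
            any(pred(output if key is None else output.get(key)) for key, pred in group)
            for group in self._groups
        )

    def run(self, outputs: Iterable[Any]) -> Iterator[Any]:
        # Outputs that fail the filters are simply not yielded.
        for output in outputs:
            if self._passes(output):
                yield output


outputs = [
    {"currency": "EUR", "price": 20, "title": "PHP crawler"},
    {"currency": "EUR", "price": 5, "title": "PHP crawler"},
    {"currency": "EUR", "price": 30, "title": "Scraper toolkit"},
    {"currency": "USD", "price": 40, "title": "PHP crawler"},
]

step = (
    FilteredStep()
    .where("currency", equal("EUR"))
    .where("price", greater_than(10))
    .where("title", contains("crawler"))
    .or_where("title", starts_with("Scraper"))
)
kept = list(step.run(outputs))  # first and third outputs pass
```

Under this reading, the last two calls read as "title contains 'crawler' OR title starts with 'Scraper'", and that group is AND-ed with the currency and price filters.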
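The following sketch uses Python's `abc` module to show why adding a method to StoreInterface can break existing stores, and why extending the abstract Store class avoids it. The class names mirror the notes, but the code is a hypothetical analogue, not the library's PHP code. In PHP, a class that implements the interface without addLogger() fails with a fatal error instead of Python's `TypeError`.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class StoreInterface(ABC):
    """Hypothetical analogue of the PHP StoreInterface."""

    @abstractmethod
    def store(self, result: Any) -> None: ...

    # Newly required (per the notes): implementations must accept a logger.
    @abstractmethod
    def add_logger(self, logger: logging.Logger) -> None: ...


class Store(StoreInterface):
    """Abstract base that already implements add_logger()."""

    def __init__(self) -> None:
        self.logger: Optional[logging.Logger] = None

    def add_logger(self, logger: logging.Logger) -> None:
        self.logger = logger


class ListStore(Store):
    """Custom store: extending Store means only store() must be written."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Any] = []

    def store(self, result: Any) -> None:
        self.results.append(result)
        if self.logger is not None:
            self.logger.info("Stored result %r", result)


class OldStyleStore(StoreInterface):
    """Implements the interface directly but predates add_logger()."""

    def store(self, result: Any) -> None:
        pass


store = ListStore()
# The crawler would do this automatically; done by hand here for illustration.
store.add_logger(logging.getLogger("crawler"))
store.store({"title": "Example"})

try:
    OldStyleStore()
    old_style_breaks = False
except TypeError:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    old_style_breaks = True
```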

Changed

  • The Csv step can now also be used without defining a column mapping. In that case, it uses the values from the first line as the output array keys, which makes sense when the file has column headings.
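Python's standard `csv.DictReader` behaves much like the default described above: when no field names are given, it takes them from the first line. The following is a minimal sketch of that behavior, in which `parse_csv` and its list-based `column_mapping` parameter are hypothetical and not the Csv step's actual API.

```python
import csv
import io
from typing import Dict, List, Optional


def parse_csv(text: str, column_mapping: Optional[List[str]] = None) -> List[Dict[str, str]]:
    # With fieldnames=None, DictReader uses the first line as the keys.
    # With a mapping (hypothetical format: one key per column position),
    # the first line is treated as data instead.
    return [dict(row) for row in csv.DictReader(io.StringIO(text), fieldnames=column_mapping)]


with_header = parse_csv("id,name\n1,Alice\n2,Bob\n")
with_mapping = parse_csv("1,Alice\n", column_mapping=["id", "name"])
```

Without a mapping, the header line is consumed as keys and not returned as a row, which matches the case where the file has column headings.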
