Added
- The `BaseStep` class now has `where()` and `orWhere()` methods to filter step outputs. You can set multiple filters, which are applied to all outputs. A filter added with `orWhere()` is linked to the previously added filter with "OR". Outputs that do not pass the filters are not yielded. The available filters can be accessed through static methods on the new `Filter` class. Currently available filters are comparison filters (equal, greater/less than, ...), a few string filters (contains, starts/ends with) and URL filters (scheme, domain, host, ...).
- The `GetLink` and `GetLinks` steps now have the methods `onSameDomain()`, `notOnSameDomain()`, `onDomain()`, `onSameHost()`, `notOnSameHost()` and `onHost()` to restrict which links to find.
- Automatically add the crawler's logger to the `Store`, so you can also log messages from there. This can be a breaking change, as the `StoreInterface` now also requires the `addLogger` method. The new abstract `Store` class already implements it, so you can just extend it.
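
To illustrate how `where()` and `orWhere()` filters combine, here is a minimal, self-contained sketch. It does not use the library: `FilterChain` and its methods are hypothetical stand-ins, and the grouping shown (filters added with `where()` are ANDed, a filter added with `orWhere()` is ORed with the filter added just before it) is one plausible reading of the description above, not the library's actual implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a step's filter handling (requires PHP 8.0+).
final class FilterChain
{
    /**
     * Groups are ANDed with each other; filters inside one group are ORed.
     *
     * @var list<list<callable(mixed): bool>>
     */
    private array $groups = [];

    public function where(callable $filter): self
    {
        // Every where() call opens a new group that must also match.
        $this->groups[] = [$filter];

        return $this;
    }

    public function orWhere(callable $filter): self
    {
        if ($this->groups === []) {
            throw new LogicException('orWhere() needs a preceding where().');
        }

        // Attach the filter to the most recently added group ("OR" link).
        $this->groups[array_key_last($this->groups)][] = $filter;

        return $this;
    }

    public function passes(mixed $output): bool
    {
        foreach ($this->groups as $group) {
            $matched = false;

            foreach ($group as $filter) {
                if ($filter($output)) {
                    $matched = true;
                    break;
                }
            }

            if (!$matched) {
                return false; // No filter in this group matched.
            }
        }

        return true;
    }

    /** Yields only the outputs that pass all filter groups. */
    public function apply(iterable $outputs): Generator
    {
        foreach ($outputs as $output) {
            if ($this->passes($output)) {
                yield $output;
            }
        }
    }
}

// Reads as: https AND (path contains /blog/ OR path contains /news/).
$chain = (new FilterChain())
    ->where(fn (string $url): bool => str_starts_with($url, 'https://'))
    ->where(fn (string $url): bool => str_contains($url, '/blog/'))
    ->orWhere(fn (string $url): bool => str_contains($url, '/news/'));

$kept = iterator_to_array($chain->apply([
    'https://www.example.com/blog/post-1',
    'http://www.example.com/blog/post-2',
    'https://www.example.com/news/item-3',
    'https://www.example.com/about',
]), false);

// $kept contains the post-1 and item-3 URLs; the other two are not yielded.
```

In the library itself, the filters are created through the static methods on the `Filter` class rather than through closures as in this sketch.
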
Changed
- The `Csv` step can now also be used without defining a column mapping. In that case it uses the values from the first line as output array keys, so this only makes sense when the first line contains column headings.
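
The idea of using the first line as output array keys can be sketched in plain PHP as follows. The function `csvRowsKeyedByHeader()` is a hypothetical helper written for this illustration and is not part of the `Csv` step. It assumes comma-separated values and no line breaks inside quoted fields.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: parses CSV text and keys every row by the header line.
 *
 * @return list<array<string, string|null>>
 */
function csvRowsKeyedByHeader(string $csv): array
{
    // Split into lines; this simple approach does not support multi-line fields.
    $lines = preg_split('/\R/', trim($csv));

    // The first line provides the keys for all following rows.
    $header = str_getcsv((string) array_shift($lines), ',', '"', '\\');
    $rows = [];

    foreach ($lines as $line) {
        $values = str_getcsv($line, ',', '"', '\\');

        // Skip rows whose column count does not match the header.
        if (count($values) !== count($header)) {
            continue;
        }

        $rows[] = array_combine($header, $values);
    }

    return $rows;
}

$rows = csvRowsKeyedByHeader("id,name,city\n1,Ada,London\n2,Linus,Helsinki");

// $rows[0] is ['id' => '1', 'name' => 'Ada', 'city' => 'London'].
```
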