Added
- New method `Step::refineOutput()` to manually refine step output values. It takes either a `Closure` or an instance of the new `RefinerInterface` as argument. If the step produces array output, you can provide a key from the array output to refine as the first argument, and the refiner as the second argument. You can call the method multiple times, and all the refiners will be applied to the outputs in the order you add them. If you want to refine multiple output array keys with a `Closure`, you can skip providing a key, and the `Closure` will receive the full output array for refinement. As mentioned, you can also provide an instance of the `RefinerInterface`. There are already a few implementations: `StringRefiner::afterFirst()`, `StringRefiner::afterLast()`, `StringRefiner::beforeFirst()`, `StringRefiner::beforeLast()`, `StringRefiner::betweenFirst()`, `StringRefiner::betweenLast()` and `StringRefiner::replace()`.
- New method `Step::excludeFromGroupOutput()` to exclude a normal step's output from the combined output of a group that it's part of.
- New method `HttpLoader::setMaxRedirects()` to customize the limit of redirects to follow. This only works when using the HTTP client.
- New filters to filter by string length, with the same options as the comparison filters (equal, not equal, greater than, ...).
- New `Filter::custom()` that you can use with a `Closure`, so you're not limited to the available filters.
- New method `DomQuery::link()` as a shortcut for `DomQuery::attribute('href')->toAbsoluteUrl()`.
- New static method `HttpCrawler::make()` returning an instance of the new class `AnonymousHttpCrawlerBuilder`. This makes it possible to create your own Crawler instance with a one-liner like `HttpCrawler::make()->withBotUserAgent('MyCrawler')`. There's also a `withUserAgent()` method to create an instance with a normal (non-bot) user agent.
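To make the refiner semantics more concrete, here is a minimal Python sketch. It is not the library's PHP API. The functions `after_first`, `before_last`, `between_first`, `replace` and `refine_output` are hypothetical stand-ins for a subset of the `StringRefiner` methods and for `Step::refineOutput()`. One assumption is worth flagging: when a delimiter is not found, these sketches return the input unchanged, and the library may handle that case differently.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical stand-ins for some StringRefiner methods (not the PHP API).
Refiner = Callable[[Any], Any]


def after_first(needle: str) -> Refiner:
    """Keep the part after the first occurrence of `needle`."""
    def refine(value: str) -> str:
        _, found, tail = value.partition(needle)
        return tail if found else value  # assumption: unchanged if not found
    return refine


def before_last(needle: str) -> Refiner:
    """Keep the part before the last occurrence of `needle`."""
    def refine(value: str) -> str:
        head, found, _ = value.rpartition(needle)
        return head if found else value
    return refine


def between_first(start: str, end: str) -> Refiner:
    """Keep the part between the first `start` and the next `end` after it."""
    def refine(value: str) -> str:
        _, found_start, rest = value.partition(start)
        inner, found_end, _ = rest.partition(end)
        return inner if found_start and found_end else value
    return refine


def replace(search: str, replacement: str) -> Refiner:
    """Replace every occurrence of `search` with `replacement`."""
    return lambda value: value.replace(search, replacement)


def refine_output(output: Any, refiners: Iterable[Refiner], key: Optional[str] = None) -> Any:
    """Apply `refiners` in order, either to `output[key]` or to the whole output."""
    if key is None:
        # No key: each refiner receives the full output (e.g. the whole dict).
        for refiner in refiners:
            output = refiner(output)
        return output
    refined = dict(output)  # shallow copy so the caller's dict is untouched
    for refiner in refiners:
        refined[key] = refiner(refined[key])
    return refined


row = {"title": "Product: Fancy Shoes (sale)", "price": "EUR 49,99"}
row = refine_output(row, [after_first("Product: "), before_last(" (")], key="title")
row = refine_output(row, [replace("EUR ", "")], key="price")
# Without a key, a closure sees the whole dict and can touch several keys at once.
row = refine_output(row, [lambda r: {**r, "price": r["price"].replace(",", ".")}])
print(row)  # {'title': 'Fancy Shoes', 'price': '49.99'}
```

The order of the refiners for the `title` key matters. `after_first` runs first and strips the prefix, and `before_last` then cuts off the suffix from that already-shortened value.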
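The string length filters reuse the comparison options, and `Filter::custom()` accepts any closure. The following Python sketch shows how the two can sit side by side. The names `string_length`, `custom` and `keep_matching` are hypothetical, as are the option keys. The sketch also assumes that several filters on one step must all pass.

```python
import operator
from typing import Callable, Dict, Iterable, List

# Hypothetical filter helpers; the library's actual PHP filter names may differ.
Filter = Callable[[str], bool]

# The comparison options mentioned above (equal, not equal, greater than, ...).
COMPARISONS: Dict[str, Callable[[int, int], bool]] = {
    "equal": operator.eq,
    "not_equal": operator.ne,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
}


def string_length(comparison: str, length: int) -> Filter:
    """Filter that compares len(value) against `length`."""
    compare = COMPARISONS[comparison]
    return lambda value: compare(len(value), length)


def custom(predicate: Callable[[str], bool]) -> Filter:
    """Wrap any closure as a filter, so you aren't limited to the built-in ones."""
    return predicate


def keep_matching(values: Iterable[str], filters: List[Filter]) -> List[str]:
    """Keep only values that pass every filter (assumed AND semantics)."""
    return [v for v in values if all(f(v) for f in filters)]


values = ["a", "abcd", "hello world", "xyz123"]
print(keep_matching(values, [string_length("greater_than", 3), custom(lambda v: " " not in v)]))
# ['abcd', 'xyz123']
```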
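`DomQuery::link()` combines reading the `href` attribute with resolving it against the page URL. Below is a rough Python equivalent that uses only the standard library's `html.parser` and `urllib.parse.urljoin`. The `links` helper is hypothetical, and unlike a complete implementation it ignores any `<base>` element in the document.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collect the raw href values of all <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def links(html: str, page_url: str) -> List[str]:
    """Like attribute('href') followed by toAbsoluteUrl(), for every link."""
    collector = _HrefCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


html = '<a href="/about">About</a> <a href="../docs/">Docs</a> <a href="https://example.org/x">X</a>'
print(links(html, "https://www.example.com/blog/post"))
# ['https://www.example.com/about', 'https://www.example.com/docs/', 'https://example.org/x']
```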
Changed
- BREAKING: The `FileCache` now also respects the `ttl` (time to live) argument, and by default it is one hour (3600 seconds). If you're using the cache and expect the items to live (basically) forever, please provide a high enough value for the default time to live. When you try to get a cache item that is already expired, it (the file) is immediately deleted.
- BREAKING: The `TooManyRequestsHandler` (and with that also the constructor argument in the `HttpLoader`) was renamed to `RetryErrorResponseHandler`. It now reacts to 503 (Service Unavailable) responses the same way as to 429 (Too Many Requests) responses. If you're actively passing your own instance to the `HttpLoader`, you need to update it.
- You can now have multiple different loaders in a `Crawler`. To use this, return an array containing your loaders from the protected `Crawler::loader()` method, with keys to name them. You can then selectively use them by calling the `Step::useLoader()` method on a loading step with the key of the loader it should use.
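The new `FileCache` expiry behavior can be pictured with the toy Python cache below. The class and method names are hypothetical, and pickle storage is simply a convenient choice for the sketch. Each entry stores an expiry timestamp, the default TTL is 3600 seconds, and reading an expired entry deletes its file immediately. The injectable `clock` exists only to make the behavior testable.

```python
import hashlib
import os
import pickle
import time
from typing import Any, Callable, Optional


class TtlFileCache:
    """Toy file cache: one file per key holding (expires_at, value)."""

    def __init__(self, directory: str, default_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.directory = directory
        self.default_ttl = default_ttl  # one hour, like the FileCache default
        self.clock = clock
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary strings (e.g. URLs) become safe file names.
        return os.path.join(self.directory, hashlib.sha256(key.encode()).hexdigest())

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        ttl = self.default_ttl if ttl is None else ttl
        with open(self._path(key), "wb") as f:
            pickle.dump((self.clock() + ttl, value), f)

    def get(self, key: str, default: Any = None) -> Any:
        path = self._path(key)
        if not os.path.exists(path):
            return default
        with open(path, "rb") as f:
            expires_at, value = pickle.load(f)
        if self.clock() >= expires_at:
            os.remove(path)  # expired: delete the file immediately
            return default
        return value
```

To keep entries (basically) forever in this sketch, you would pass a large default, for example `TtlFileCache(path, default_ttl=10 * 365 * 24 * 3600)`.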
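The Python sketch below gives a simplified picture of treating 503 responses like 429 responses. The function, its parameters and the doubling back-off are assumptions made for illustration. The real `RetryErrorResponseHandler` may use different wait times, may honor a `Retry-After` header, or may handle exhausted retries differently; here the last response is simply returned.

```python
import time
from typing import Callable, Tuple

# Status codes that trigger a retry: 429 Too Many Requests, 503 Service Unavailable.
RETRY_STATUSES = frozenset({429, 503})

Response = Tuple[int, str]  # (status code, body)


def fetch_with_retries(fetch: Callable[[str], Response], url: str,
                       max_retries: int = 2, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch(url); on 429 or 503, wait and retry, doubling the wait each time."""
    delay = base_delay
    attempt = 0
    response = fetch(url)
    while response[0] in RETRY_STATUSES and attempt < max_retries:
        sleep(delay)
        delay *= 2  # assumed exponential back-off, not the library's policy
        attempt += 1
        response = fetch(url)
    return response  # may still be a 429/503 if all retries were used up
```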
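The following Python sketch shows roughly how a step could select one of several named loaders. `LoadingStep`, `use_loader` and the loader dict are hypothetical stand-ins for a loading step, `Step::useLoader()` and the keyed array returned from `Crawler::loader()`. Raising an error when several loaders exist and none is chosen is an assumption of this sketch, not documented library behavior.

```python
from typing import Callable, Dict, Optional

Loader = Callable[[str], str]  # takes a URL, returns the response body


class LoadingStep:
    """Hypothetical loading step that can be pinned to a named loader."""

    def __init__(self) -> None:
        self.loader_key: Optional[str] = None

    def use_loader(self, key: str) -> "LoadingStep":
        self.loader_key = key
        return self  # return self to allow fluent chaining

    def run(self, url: str, loaders: Dict[str, Loader]) -> str:
        if self.loader_key is not None:
            return loaders[self.loader_key](url)
        if len(loaders) == 1:
            # Only one loader configured: no choice to make.
            return next(iter(loaders.values()))(url)
        raise ValueError("multiple loaders configured; call use_loader() to pick one")


loaders = {
    "http": lambda url: f"plain HTTP response for {url}",
    "browser": lambda url: f"rendered page for {url}",
}
print(LoadingStep().use_loader("browser").run("https://example.com", loaders))
# rendered page for https://example.com
```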
Removed
- BREAKING: The loop feature. The only real-world use case should be paginating listings, and this should be solved with the Paginator feature.
- BREAKING: `Step::dontCascade()` and `Step::cascades()`, because since the change in v0.7 (groups can only produce combined output), there should be no use case for them anymore. If you want to exclude one step's output from the combined group output, you can use the new `Step::excludeFromGroupOutput()` method.
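Both `Step::excludeFromGroupOutput()` and the removal of `Step::dontCascade()` concern how a group merges its steps' outputs. Below is a minimal Python sketch of that merge. It assumes that each step yields a dict and that later keys overwrite earlier ones on conflicts; both are assumptions, and `combine_group_output` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def combine_group_output(step_outputs: List[Tuple[str, Dict[str, object]]],
                         excluded: Set[str]) -> Dict[str, object]:
    """Merge the dict outputs of a group's steps, skipping excluded steps."""
    combined: Dict[str, object] = {}
    for step_name, output in step_outputs:
        if step_name in excluded:
            continue  # the step ran, but its output is not merged
        combined.update(output)  # assumption: later keys win on conflicts
    return combined


outputs = [
    ("title", {"title": "Fancy Shoes"}),
    ("price", {"price": "49.99"}),
    ("debug", {"raw_html": "<html>...</html>"}),
]
print(combine_group_output(outputs, excluded={"debug"}))
# {'title': 'Fancy Shoes', 'price': '49.99'}
```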