github crwlrsoft/crawler v3.5.0

latest release: v3.5.1
23 days ago

Added

  • Dynamically building request URLs from extracted data: Http steps now have a new staticUrl() method. That static URL, as well as request headers and the request body, can contain placeholders such as https://www.example.com/foo/[crwl:some_extracted_property]. Each placeholder is replaced with the corresponding property from the step's input data (kept data works too).
  • New Refiners:
    • DateTimeRefiner::reformat('Y-m-d H:i:s') to convert a date-time string to a different format. The refiner tries to detect the input format automatically; if detection fails, pass the input format as the second argument.
    • HtmlRefiner::remove('#foo') to remove nodes matching the given selector from selected HTML.
  • Steps that produce multiple outputs per input can now group them per input by calling the new Step::oneOutputPerInput() method.
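
To make the placeholder behavior from the first item more concrete, here is a minimal sketch in plain PHP of how [crwl:...] placeholders could be resolved against an input array. It is not the library's implementation: the function name fillCrwlPlaceholders() and the sample data are assumptions made for illustration, and details such as URL encoding or the handling of missing keys may differ in the actual package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of crwlr/crawler): replaces every
 * [crwl:key] placeholder in $template with the matching value from $data.
 * Placeholders without a matching scalar value are left untouched.
 */
function fillCrwlPlaceholders(string $template, array $data): string
{
    $result = preg_replace_callback(
        '/\[crwl:([^\]]+)\]/',
        static function (array $match) use ($data): string {
            $key = $match[1];

            // Only scalar values can be embedded in a URL, header, or body string.
            return isset($data[$key]) && is_scalar($data[$key])
                ? (string) $data[$key]
                : $match[0];
        },
        $template
    );

    // preg_replace_callback() returns null on a regex error; fall back to the template.
    return $result ?? $template;
}

// Sample input data, standing in for properties extracted by a previous step.
$input = ['some_extracted_property' => 'bar', 'page' => 2];

echo fillCrwlPlaceholders(
    'https://www.example.com/foo/[crwl:some_extracted_property]?page=[crwl:page]',
    $input
), PHP_EOL;
// prints "https://www.example.com/foo/bar?page=2"
```

The sketch leaves unknown placeholders in place rather than replacing them with an empty string, which makes a typo in a property name easy to spot in the resulting URL.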
