github crwlrsoft/crawler v1.5.0

7 months ago

Added

  • The DomQuery class (parent of CssSelector (Dom::cssSelector) and XPathQuery (Dom::xPath)) has a new method formattedText() that uses the new crwlr/html-2-text package to convert the HTML to formatted plain text. You can also provide a customized instance of the Html2Text class to the formattedText() method.
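As a rough illustration of the new method, the sketch below calls formattedText() on a CSS selector inside an Html extraction step. The selectors '#content' and 'h1' and the output keys are hypothetical, and the snippet assumes the package is installed via Composer; it is a minimal sketch and not taken from the release notes.

```php
<?php

// Assumes crwlr/crawler >= 1.5.0 is installed and autoloaded via Composer.
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Crawler\Steps\Dom;
use Crwlr\Crawler\Steps\Html;

// Hypothetical selectors and keys; adjust them to the page being scraped.
$step = Html::root()->extract([
    // text() yields the plain text content of the matched element.
    'title' => Dom::cssSelector('h1')->text(),
    // formattedText() converts the matched element's HTML
    // to formatted plain text using crwlr/html-2-text.
    'body' => Dom::cssSelector('#content')->formattedText(),
]);
```

Because the method lives on the DomQuery parent class, the same call should also work on an XPath query, for example Dom::xPath('//article')->formattedText(). A customized Html2Text instance can be passed as the argument to formattedText(), as described above.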

Fixed

  • The Http::crawl() step won't yield a page again if a newly found URL responds with a redirect to a previously loaded URL.
