github jhy/jsoup jsoup-1.18.1

2 months ago

https://jsoup.org/news/release-1.18.1

Improvements

  • Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is
    emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an
    (empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse,
    e.g. to conserve memory. This provides a mechanism to parse an input document that would otherwise be too large
    to fit into memory, while still providing a DOM interface to the document and its elements. Additionally, the
    parser provides selectFirst(String query) and selectNext(String query) methods, which run the parser until a hit
    is found, at which point the parse is suspended. It can be resumed via another select() call, or via the
    stream() or iterator() methods. 2096
  • Download Progress: added a Response Progress event interface, which reports progress as URLs are downloaded (and
    parsed). Supported at both the session level and the single connection
    level. 2164, 656
  • Added Path-accepting parse methods: Jsoup.parse(Path), Jsoup.parse(path, charsetName, baseUri, parser),
    etc. 2055
  • Updated the button tag configuration to include a space between multiple button elements in the Element.text()
    method. 2105
  • Added support for the ns|* all elements in namespace Selector. 1811
  • When normalising attribute names during serialization, invalid characters are now replaced with _, rather than
    being stripped. This should make the process clearer, and generally prevent an invalid attribute name being
    coerced unexpectedly. 2143
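
The attribute-name change in the last item above can be illustrated with a small standalone sketch. It does not call jsoup and is not jsoup's implementation: the class name AttributeNameSketch, its method names, and the set of characters treated as valid (ASCII letters, digits, -, _, : and .) are assumptions chosen for illustration. It only contrasts stripping invalid characters with replacing each one by _.

```java
import java.util.regex.Pattern;

// Illustrative only: not jsoup's code. The "valid" character set is an assumption.
final class AttributeNameSketch {
    // Any character outside the assumed valid set counts as invalid.
    private static final Pattern INVALID = Pattern.compile("[^-a-zA-Z0-9_:.]");

    // Stripping behaviour: invalid characters are silently dropped.
    static String strip(String name) {
        return INVALID.matcher(name).replaceAll("");
    }

    // Replacing behaviour: each invalid character becomes an underscore.
    static String replace(String name) {
        return INVALID.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        String name = "data@id";
        System.out.println(strip(name));   // prints "dataid"
        System.out.println(replace(name)); // prints "data_id"
    }
}
```

Under these assumptions, stripping turns data@id into dataid, which is indistinguishable from a name that was valid to begin with, while replacement leaves a visible trace of where the invalid character was.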

Changes

  • Removed previously deprecated internal classes and methods. 2094
  • Build change: the built jar's OSGi manifest no longer imports itself. 2158

Bug Fixes

  • When tracking source positions, if the first node was a TextNode, its position was incorrectly set
    to -1. 2106
  • When connecting (or redirecting) to URLs with characters such as {, } in the path, a Malformed URL exception would
    be thrown (if in development), or the URL might otherwise not be escaped correctly (if in
    production). The URL encoding process has been improved to handle these characters
    correctly. 2142
  • When using W3CDom with a custom output Document, a Null Pointer Exception would be
    thrown. 2114
  • The :has() selector did not match correctly when using sibling combinators
    (e.g. h1:has(+h2)). 2137
  • The :empty selector incorrectly matched elements that started with a blank text node and were followed by
    non-empty nodes, due to an incorrect short-circuit. 2130
  • Element.cssSelector() would fail with "Did not find balanced marker" when building a selector for elements that had
    a ( or [ in their class names, and selectors with those characters escaped would not match as
    expected. 2146
  • Updated Entities.escape(string) to make the escaped text suitable for both text nodes and attributes (previously it
    was suitable only for text nodes). This does not impact the output of Element.html(), which correctly applies a
    minimal escape depending on whether the output is text data or a quoted
    attribute. 1278
  • Fuzz: a Stack Overflow exception could occur when resolving a crafted <base href> URL, in the normalizing regex.
    2165
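
The URL-escaping fix listed above (characters such as { and } in a path) can be illustrated with the JDK alone. This is a minimal sketch of the general problem, not jsoup's encoder: the class name PathEscapeSketch, its method names, and the example.com URL are hypothetical. It relies on the documented behaviour of java.net.URI, whose single-argument constructor rejects illegal characters while its multi-argument constructors percent-encode them.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: uses the JDK's java.net.URI, not jsoup's internal URL encoding.
final class PathEscapeSketch {
    // Builds a URI from its components; the multi-argument constructor
    // percent-encodes characters that are illegal in a path, such as { and }.
    static String escape(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns true if the strict single-argument parser rejects the raw string.
    static boolean isRejectedRaw(String rawUrl) {
        try {
            new URI(rawUrl);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The raw form is rejected because { is not a legal URI character.
        System.out.println(isRejectedRaw("https://example.com/item/{id}")); // prints "true"
        // Building from components escapes the braces instead.
        System.out.println(escape("https", "example.com", "/item/{id}"));
        // prints "https://example.com/item/%7Bid%7D"
    }
}
```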
