jhy/jsoup jsoup-1.18.2 on GitHub

Improvements

Optimized throughput and memory use throughout the input read and parse flows: heap allocations and GC are down by between 6% and 89%, and throughput is up by as much as 143% for small inputs. Most input sizes will see throughput increases of about 20%. These improvements come from recycling the backing byte[] and char[] arrays used to read and parse the input. 2186
Speed optimized html() and Entities.escape() when the input contains UTF characters in a supplementary plane, by around 49%. 2183
The form-associated elements returned by FormElement.elements() now reflect changes made to the DOM after the original parse. 2140
In the TreeBuilder, the onNodeInserted() and onNodeClosed() events are now also fired for the outermost (root) Document node. This enables source position tracking on the Document node (which was previously unset), and also lets the node traversor see the outer Document node. 2182
Selected Elements can now be position swapped inline using Elements#set(). 2212
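
The buffer recycling mentioned in the first item above can be pictured with a minimal sketch. This is not jsoup's internal implementation: the class name CharBufferPool, its methods, and the sizes used are all hypothetical, and the pool is single-threaded for simplicity. It only illustrates why reusing a backing char[] avoids repeated heap allocations.

```java
import java.util.ArrayDeque;

/** Illustrative only (hypothetical class, not part of jsoup): recycles char[] buffers. */
final class CharBufferPool {
    private final ArrayDeque<char[]> idle = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxIdle;

    CharBufferPool(int bufferSize, int maxIdle) {
        this.bufferSize = bufferSize;
        this.maxIdle = maxIdle;
    }

    /** Returns an idle buffer if one is available, otherwise allocates a new one. */
    char[] borrow() {
        char[] buf = idle.poll();
        return buf != null ? buf : new char[bufferSize];
    }

    /** Hands a buffer back so a later borrow() can reuse it instead of allocating. */
    void release(char[] buf) {
        // Keep only correctly sized buffers, and cap how many sit idle.
        if (buf.length == bufferSize && idle.size() < maxIdle) {
            idle.push(buf);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        CharBufferPool pool = new CharBufferPool(8192, 4);
        char[] first = pool.borrow();   // allocated: the pool starts empty
        pool.release(first);
        char[] second = pool.borrow();  // recycled: no new allocation
        System.out.println(first == second); // prints "true"
    }
}
```

A real pool would also need to be thread-safe and to let idle buffers be reclaimed under memory pressure; both are left out of this sketch.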

Bug Fixes

Element.cssSelector() would fail if the element's class contained a * character. 2169
When tracking source ranges, a text node following an invalid self-closing element may be left untracked. 2175
When a document has no doctype, or a doctype not named html, it should be parsed in Quirks Mode. 2197
With a selector like div:has(span + a), the has() component was not working correctly, as the inner combining query caused the evaluator to match against the outer element's siblings rather than its children. 2187
A selector query that included multiple :has() components in a nested :has() might incorrectly execute. 2131
When cookie names in a response are duplicated, the simple view of cookies available via Connection.Response#cookies() will provide the last one set. Generally it is better to use the Jsoup.newSession method to maintain a cookie jar, as that applies appropriate path selection on cookies when making requests. 1831
When parsing named HTML entities, base entities should resolve if they are a prefix of the input token (and not in an attribute). 2207
Fixed incorrect tracking of source ranges for attributes merged from late-occurring elements that were implicitly created (html or body). 2204
Follow the current HTML specification in the tokenizer to allow < as part of a tag name, instead of emitting it as a character node. 2230
Similarly, allow a < as the start of an attribute name, rather than creating a new element. The previous behavior was intended to parse closer to what we anticipated the author's intent to be, but that does not align with the spec or with how browsers behave. 1483
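
The named-entity fix above (2207) describes a prefix rule that a small sketch can make concrete. The code below is an assumption-laden illustration, not jsoup's tokenizer: the class BaseEntityPrefixSketch, the resolve method, and the five-entry table are hypothetical, and the attribute handling is simplified to "never resolve a partial match inside an attribute".

```java
import java.util.Map;

/** Illustrative only (hypothetical class, not part of jsoup): longest-prefix entity lookup. */
final class BaseEntityPrefixSketch {
    // A tiny, assumed subset of the "base" entities that may appear without a semicolon.
    private static final Map<String, String> BASE = Map.of(
        "amp", "&",
        "lt", "<",
        "gt", ">",
        "not", "\u00AC",
        "copy", "\u00A9");

    /**
     * Resolves the text that follows an '&'. Returns the decoded text, or null
     * when nothing resolves and the input should be left as literal characters.
     */
    static String resolve(String name, boolean inAttribute) {
        // Try the longest candidate first, so a whole-token match wins over a prefix.
        for (int end = name.length(); end > 0; end--) {
            String value = BASE.get(name.substring(0, end));
            if (value == null) continue;
            boolean partial = end < name.length();
            if (partial && inAttribute) return null; // simplified attribute rule
            return value + name.substring(end);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("notit", false)); // "not" is a prefix: ¬ followed by "it"
        System.out.println(resolve("notit", true));  // in an attribute: left unresolved (null)
    }
}
```

Under these assumptions, a token such as notit in body text resolves its not prefix and keeps the trailing it as text, while the same token inside an attribute value is left alone.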
