jsoup 1.15.4

21 months ago

jsoup Java HTML Parser release 1.15.4

jsoup 1.15.4 is out now. It includes a number of improvements, particularly to HTML pretty-printing, along with several bug fixes.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Download jsoup now.

Improvements

  • Added the ability to escape CSS selectors (tags, IDs, classes) to match elements that don't follow regular CSS syntax. For example, to match by classname <p class="one.two">, use document.select("p.one\\.two"); #838
  • When pretty-printing, wrap text that follows a <br> tag. #1858
  • When pretty-printing, normalize newlines that follow self-closing tags in custom tags. #1852
  • When pretty-printing, collapse non-significant whitespace between a block and an inline tag. #1802
  • Added a new method Document.forms(), to conveniently retrieve a List<FormElement> containing the <form> elements in a document.
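
To illustrate the selector escaping described in the first item, here is a minimal sketch of how a class name such as one.two could be turned into the escaped form one\.two before being placed in a selector. The class CssEscapeSketch and its escape method are hypothetical helpers written for this example; they are not part of jsoup, and the sketch deliberately ignores cases that need hex escapes in CSS (a leading digit, control characters).

```java
// Hypothetical helper for illustration only; not part of jsoup.
final class CssEscapeSketch {
    private CssEscapeSketch() {}

    /**
     * Backslash-escapes every ASCII character that is not a letter, digit,
     * hyphen, or underscore. Simplification (assumption): leading digits and
     * control characters, which need hex escapes in CSS, are not handled.
     */
    static String escape(String identifier) {
        StringBuilder out = new StringBuilder(identifier.length() + 4);
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            boolean plain = c >= 0x80
                    || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_';
            if (!plain) {
                out.append('\\'); // a single backslash in the selector text
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds the selector text p.one\.two, the same string that the Java
        // literal "p.one\\.two" from the release note denotes.
        String selector = "p." + escape("one.two");
        System.out.println(selector);
    }
}
```

With jsoup on the classpath, the resulting string could be passed to document.select(selector) in the same way as the literal shown in the release note.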

Bug Fixes

  • URLs containing certain characters were not escaped correctly, and would throw a MalformedURLException when fetched. #1873
  • Element.cssSelector() would create invalid selectors for elements where the tag name, ID, or classnames needed to be escaped (e.g. if a class name contained a : or .). #1742
  • If a Node or an Element was replaced with itself, that node would incorrectly be orphaned. #1843
  • Form data from a previous request was copied to a new request in newRequest(). In multi-step form submissions this caused form data to accumulate, or data to be sent incorrectly to later requests. Now, newRequest() copies only session-related settings (cookies, proxy settings, user-agent, etc.), and not the request data or the body. #1778
  • Fixed an issue in Safelist.removeAttributes() which could throw a ConcurrentModificationException when using the :all pseudo-attribute.
  • Given extremely deeply nested HTML, a number of methods in Element could throw a StackOverflowError due to excessive recursion. Namely: #data(), #hasText(), #parents(), and #wrap(html). #1864
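
The release notes do not say how the recursion issue in the last item was resolved. As a general illustration of the problem, the sketch below contrasts a recursive text check with an iterative one that keeps pending nodes on a heap-allocated deque. The Node class and all method names are hypothetical stand-ins invented for this example; jsoup's Element class and its actual fix are not shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a DOM tree; jsoup's real classes are not used here.
final class DeepTreeSketch {
    static final class Node {
        final String text;                          // the node's own text, may be blank
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
    }

    /** Recursive check: one stack frame per nesting level, so very deep trees may overflow. */
    static boolean hasTextRecursive(Node node) {
        if (!node.text.trim().isEmpty()) return true;
        for (Node child : node.children) {
            if (hasTextRecursive(child)) return true;
        }
        return false;
    }

    /** Iterative check: pending nodes are held in a deque on the heap, not on the call stack. */
    static boolean hasTextIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (!node.text.trim().isEmpty()) return true;
            for (Node child : node.children) pending.push(child);
        }
        return false;
    }

    /** Builds a chain of depth nested nodes (depth >= 1) with text only in the innermost one. */
    static Node nestedChain(int depth, String innermostText) {
        Node root = new Node(depth == 1 ? innermostText : "");
        Node current = root;
        for (int level = 2; level <= depth; level++) {
            Node child = new Node(level == depth ? innermostText : "");
            current.children.add(child);
            current = child;
        }
        return root;
    }

    public static void main(String[] args) {
        Node deep = nestedChain(200_000, "hello");
        // The iterative version handles this depth; the recursive one may
        // throw StackOverflowError, depending on the JVM's stack size.
        System.out.println(hasTextIterative(deep));
    }
}
```

The trade-off is a small amount of explicit bookkeeping in exchange for memory use bounded by the heap rather than the thread's stack.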

Changes

  • Deprecated the unused Document.normalise() method. Normalization now occurs during HTML tree construction rather than as a distinct phase.


My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.

You can also follow me (@jhy@tilde.zone) on Mastodon / Fediverse to receive occasional notes about jsoup releases.
