github sparklemotion/nokogiri v1.14.0
1.14.0 / 2023-01-12

latest releases: v1.16.6, v1.16.5, v1.16.4...
17 months ago

1.14.0 / 2023-01-12

Notable Changes

Ruby

This release introduces native gem support for Ruby 3.2. (Also see "Technical note" under "Changed" below.)

This release ends support for:

Faster, more reliable installation: Native Gem for aarch64-linux (aka linux/arm64/v8)

This version of Nokogiri ships official native gem support for the aarch64-linux platform, which should support AWS Graviton and other ARM64 Linux platforms. Please note that glibc >= 2.29 is required for aarch64-linux systems, see Supported Platforms for more information.

Faster, more reliable installation: Native Gem for arm-linux (aka linux/arm/v7)

This version of Nokogiri ships experimental native gem support for the arm-linux platform. Please note that glibc >= 2.29 is required for arm-linux systems, see Supported Platforms for more information.

Pattern matching

This version introduces an experimental pattern matching API for XML::Attr, XML::Document, XML::DocumentFragment, XML::Namespace, XML::Node, and XML::NodeSet (and their subclasses).

Some documentation on what can be matched:

We welcome feedback on this API at #2360.

Dependencies

CRuby

  • Vendored libiconv is updated to v1.17

JRuby

  • This version of Nokogiri uses jar-dependencies to manage most of the vendored Java dependencies. nokogiri -v now outputs maven metadata for all Java dependencies, and Nokogiri::VERSION_INFO also contains this metadata. [#2432]
  • HTML parsing is now provided by net.sourceforge.htmlunit:neko-htmlunit:2.61.0 (previously Nokogiri used a fork of org.cyberneko.html:nekohtml)
  • Vendored Jing is updated from com.thaiopensource:jing:20091111 to nu.validator:jing:20200702VNU.
  • New dependency on net.sf.saxon:Saxon-HE:9.6.0-4 (via nu.validator:jing:20200702VNU).

Added

  • Node#wrap and NodeSet#wrap now also accept a Node type argument, which will be duped for each wrapper. For cases where many nodes are being wrapped, creating a Node once using Document#create_element and passing that Node multiple times is significantly faster than re-parsing markup on each call. [#2657]
  • [CRuby] Invocation of custom XPath or CSS handler functions may now use the nokogiri namespace prefix. Historically, the JRuby implementation required this namespace but the CRuby implementation did not support it. It's recommended that all XPath and CSS queries use the nokogiri namespace going forward. Invocation without the namespace is planned for deprecation in v1.15.0 and removal in a future release. [#2147]
  • HTML5::Document#quirks_mode and HTML5::DocumentFragment#quirks_mode expose the quirks mode used by the parser.

Improved

Functional

Performance

  • Serialization of HTML5 documents and fragments has been re-implemented and is ~10x faster than previous versions. [#2596, #2569]
  • Parsing of HTML5 documents is ~90% faster thanks to additional compiler optimizations being applied. [#2639]
  • Compare Encoding objects rather than compare their names. This is a slight performance improvement and is future-proof. [#2454] (Thanks, @casperisfine!)

Error handling

  • Document#canonicalize now raises an exception if inclusive_namespaces is non-nil and the mode is inclusive, i.e. XML_C14N_1_0 or XML_C14N_1_1. inclusive_namespaces can only be passed with exclusive modes, and previously this silently failed.
  • Empty CSS selectors now raise a clearer Nokogiri::CSS::SyntaxError message, "empty CSS selector". Previously the exception raised from the bowels of racc was "unexpected '$' after ''". [#2700]
  • [CRuby] XML::Reader parsing errors encountered during Reader#attribute_hash and Reader#namespaces now raise an XML::SyntaxError. Previously these methods would return nil and users would generally experience NoMethodErrors from elsewhere in the code.
  • Prefer ruby_xmalloc to malloc within the C extension. [#2480] (Thanks, @Garfield96!)

Installation

  • Avoid compile-time conflict with system-installed gumbo.h on OpenBSD. [#2464]
  • Remove calls to vasprintf in favor of platform-independent rb_vsprintf
  • Installation from source on systems missing libiconv will once again generate a helpful error message (broken since v1.11.0). [#2505]
  • [CRuby+OSX] Compiling from source on MacOS will use the clang option -Wno-unknown-warning-option to avoid errors when Ruby injects options that clang doesn't know about. [#2689]

Fixed

  • SAX::Parser's encoding attribute will not be clobbered when an alternative encoding is passed into SAX::Parser#parse_io. [#1942] (Thanks, @kp666!)
  • Serialized HTML4::DocumentFragment will now be properly encoded. Previously this empty string was encoded as US-ASCII. [#2649]
  • Node#wrap now uses the parent as the context node for parsing wrapper markup, falling back to the document for unparented nodes. Previously the document was always used.
  • [CRuby] UTF-16-encoded documents longer than ~4000 code points now serialize properly. Previously the serialized document was corrupted when it exceeded the length of libxml2's internal string buffer. [#752]
  • [CRuby] The HTML5 parser now correctly handles text at the end of form elements.
  • [CRuby] HTML5::Document#fragment now always uses body as the parsing context. Previously, fragments were parsed in the context of the associated document's root node, which allowed for inconsistent parsing. [#2553]
  • [CRuby] Nokogiri::HTML5::Document#url now correctly returns the URL passed to the constructor method. Previously it always returned nil. [#2583]
  • [CRuby] HTML5 encoding detection is now case-insensitive with respect to meta tag charset declaration. [#2693]
  • [CRuby] HTML5 fragment parsing in context of an annotation-xml node now works. Previously this rarely-used path invoked rb_funcall with incorrect parameters, resulting in an exception, a fatal error, or potentially a segfault. [#2692]
  • [CRuby] HTML5 quirks mode during fragment parsing more closely matches document parsing. [#2646]
  • [JRuby] Fixed a bug with adding the same namespace to multiple nodes via #add_namespace_definition. [#1247]
  • [JRuby] NodeSet#[] now raises a TypeError if passed an invalid parameter type. [#2211]

Deprecated

  • Nokogiri.install_default_aliases is deprecated in favor of Nokogiri::EncodingHandler.install_default_aliases. This is part of a private API and is probably not called by anybody, but we'll go through a deprecation cycle before removal anyway. [#2643, #2446]

Changed

  • [CRuby+OSX] Technical note: On MacOS Ruby 3.2, the symbols from libxml2 and libxslt are no longer exported. Ruby 3.2 adopted new features from the Darwin toolchain that make it challenging to continue to support this rarely-used binary API. A future minor release of Nokogiri may remove these symbols (and others) entirely. Feedback from downstream gem maintainers is welcome at #2746, where you'll also be able to read deeper context on this decision.

Thank you!

The following people and organizations were kind enough to sponsor @flavorjones or the Nokogiri project during the development of v1.14.0:


sha256 checksums:

c87564f5f8fbfb72fbcb7ed9781f6472ceabe2f288ede6b9c37071dc32320ba6  nokogiri-1.14.0-aarch64-linux.gem
33617e8a94993b8130a50bd59d6141a8d4d2aa4d4053f5c7874c71608e6e6dcc  nokogiri-1.14.0-arm-linux.gem
5c0cd4eeb8501526e7e2aaba93b60ebf3dda37bfda665691196d4e9bb87adb1a  nokogiri-1.14.0-arm64-darwin.gem
772936bf635b33b99bc89828de8e7077de47009638fe5ff11795f8b1d578465c  nokogiri-1.14.0-java.gem
ee11c092b2cf2b137e71f623746162c578b53483dccf4c6209c80f5ba47927fe  nokogiri-1.14.0-x64-mingw-ucrt.gem
9b91eede6155eb8891d7d95d8087d514f3007dd19813982104ed77452a2a7ace  nokogiri-1.14.0-x64-mingw32.gem
649019d961b0ea8aee1bc8aa2573ab8ffb77d3f5e9c333aa2462a79fc56745fc  nokogiri-1.14.0-x86-linux.gem
40985fc46315ea3d33ed900a649c0bb77484035ea882b7c9e55aef436b1958a8  nokogiri-1.14.0-x86-mingw32.gem
5d328c0d0c5f6f37a26c75b0282f9014c9686d4c10578ec8dfbbfcbea7da8b95  nokogiri-1.14.0-x86_64-darwin.gem
faa88b2bca46adaa3420c6e27eb8eb71f5b8d9f454ed7488a194a00c5ef52fbe  nokogiri-1.14.0-x86_64-linux.gem
55ca6e87ae85e944a5901dd5a6cacbb961eaaf8b8dd3901b57475665396914bb  nokogiri-1.14.0.gem

Don't miss a new nokogiri release

NewReleases is sending notifications on new releases.