v1.11.0 / 2021-01-03
Notes
Faster, more reliable installation: Native Gems for Linux and OSX/Darwin
"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. The result is much faster and more reliable installation, and as you probably know, installation speed and reliability are the biggest headaches for Nokogiri users.
We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:
- Linux: x86-linux and x86_64-linux, including musl platforms like alpine
- OSX/Darwin: x86_64-darwin and arm64-darwin
We'd appreciate your thoughts and feedback on this work at #2075.
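If you want to check whether a given machine should pick up one of these native gems, here is a minimal sketch using RubyGems' own platform matching. The constant NATIVE_GEM_PLATFORMS and the helper native_gem_expected? are hypothetical names introduced for illustration; the platform strings come from the list above plus the Windows gems named in the checksums at the end of these notes.

```ruby
require "rubygems"

# Platforms with precompiled gems as of v1.11.0 (copied from these notes).
NATIVE_GEM_PLATFORMS = %w[
  x86-linux x86_64-linux
  x86_64-darwin arm64-darwin
  x86-mingw32 x64-mingw32
].freeze

# Hypothetical helper: true when `platform` matches one of the platforms
# above. Gem::Platform#=== compares CPU, OS, and (when present) version.
def native_gem_expected?(platform = Gem::Platform.local)
  NATIVE_GEM_PLATFORMS.any? { |name| Gem::Platform.new(name) === platform }
end

puts "local platform:      #{Gem::Platform.local}"
puts "native gem expected: #{native_gem_expected?}"
```

This only approximates what RubyGems or Bundler will actually select at install time; treat it as a quick sanity check rather than a guarantee.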
Dependencies
Ruby
This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.
This release ends support for:
- Ruby 2.3, for which official support ended on 2019-03-31 [#1886] (Thanks @ashmaroli!)
- Ruby 2.4, for which official support ended on 2020-04-05
- JRuby 9.1, which is the Ruby 2.3-compatible release.
Gems
- Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
- [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)
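For anyone unsure what the change in the mini_portile2 constraint permits, this small sketch evaluates both pessimistic constraints with RubyGems' Gem::Requirement. The version numbers being tested are illustrative only, and the constant names are made up for this example.

```ruby
require "rubygems"

OLD_MINI_PORTILE_REQUIREMENT = Gem::Requirement.new("~> 2.4.0")
NEW_MINI_PORTILE_REQUIREMENT = Gem::Requirement.new("~> 2.5.0")

# "~> 2.5.0" means ">= 2.5.0 and < 2.6.0".
%w[2.4.0 2.5.0 2.5.3 2.6.0].each do |v|
  version = Gem::Version.new(v)
  old_ok = OLD_MINI_PORTILE_REQUIREMENT.satisfied_by?(version)
  new_ok = NEW_MINI_PORTILE_REQUIREMENT.satisfied_by?(version)
  puts "#{v}: old constraint #{old_ok}, new constraint #{new_ok}"
end
```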
Security
See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema parsing treats input as untrusted by default".
Added
- Add Node methods for manipulating "keyword attributes" (for example, class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]
- Add support for CSS queries a:has(> b), a:has(~ b), and a:has(+ b). [#688] (Thanks, @jonathanhefner!)
- Add Node#value? to better match expected semantics of a Hash-like object. [#1838, #1840] (Thanks, @MatzFan!)
- [CRuby] Add Nokogiri::XML::Node#line= for use by downstream libs like nokogumbo. [#1918] (Thanks, @stevecheckoway!)
- nokogiri.gemspec is back after a 10-year hiatus. We still prefer you use the official releases, but master is pretty stable these days, and YOLO.
Performance
- [CRuby] The CSS ~= operator and class selector . are about 2x faster. [#2137, #2135]
- [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
- Improve performance of some namespace operations. [#1916] (Thanks, @ashmaroli!)
- Remove unnecessary array allocations from Node serialization methods [#1911] (Thanks, @ashmaroli!)
- Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks, @ashmaroli!)
- Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
- [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks, @kares!)
- [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]
Improved
- [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
- {HTML,XML}::Document#parse now accept Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
- [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
- Add frozen_string_literal: true magic comment to all lib files. [#1745] (Thanks, @oniofchaos!)
- [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)
Fixed
- HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
- The CSS ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
- The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]
- The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
- [JRuby] XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
- [JRuby] Standardize reading from IO-like objects, including StringIO. [#1888, #1897]
- [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
- [JRuby] Clarify exception message when custom XPath functions can't be resolved.
- [JRuby] Comparison of Node to Document with Node#<=> now matches CRuby/libxml2 behavior.
- [CRuby] Syntax errors are now correctly captured in Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
- [CRuby] Fixed installation on AIX with respect to vasprintf. [#1908]
- [CRuby] On some platforms, avoid symbol name collision with glibc's canonicalize. [#2105]
- [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]
- [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
- [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)
Removed
- The internal method Nokogiri::CSS::Parser.cache_on= has been removed. Use .set_cache if you need to muck with the cache internals.
- The class method Nokogiri::CSS::Parser.parse has been removed. This was originally deprecated in 2009 in 13db61b. Use Nokogiri::CSS.parse instead.
Changed
XML::Schema input is now "untrusted" by default
Address CVE-2020-26247.
In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.
This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.
Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".
More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later are available in the public advisory.
HTML parser now obeys the strict or norecover parsing option
(Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.
If you're using the default parser options, you will be unaffected by this fix. If you're passing strict or norecover to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises an XML::SyntaxError exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit setting that parse option to restore the behavior that you have been relying upon.
Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.
VersionInfo, the output of nokogiri -v, and related constants
This release changes the metadata provided in Nokogiri::VersionInfo which also affects the output of nokogiri -v. Some related constants have also been changed. If you're using VersionInfo programmatically, or relying on constants related to underlying library versions, please read the detailed changes for Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.
SHA-256 Checksums of published gems
17ed2567bf76319075b4a6a7258d1a4c9e2661fca933b03e037d79ae2b9910d0 nokogiri-1.11.0.gem
2f0149c735b0672c49171b18467ce25fd323a8e608c9e6b76e2b2fa28e7f66ee nokogiri-1.11.0-java.gem
2f249be8cc705f9e899c07225fcbe18f4f7dea220a59eb5fa82461979991082e nokogiri-1.11.0-x64-mingw32.gem
9e219401dc3f93abf09166d12ed99c8310fcaf8c56a99d64ff93d8b5f0604e91 nokogiri-1.11.0-x86-mingw32.gem
bda2a9c9debf51da7011830c7f2dc5771c122ebcf0fc2dd2c4ba4fc95b5c38f2 nokogiri-1.11.0-x86-linux.gem
d500c3202e2514b32f4b02049d9193aa825ae3e9442c9cad2d235446c3e17d8d nokogiri-1.11.0-x86_64-linux.gem
3a613188e3b76d593b04e0ddcc46f44c288b13f80b32ce83957356f50e22f9ee nokogiri-1.11.0-arm64-darwin.gem
b8f9b826d09494b20b30ecd048f5eb2827dccd85b77abeb8baf1f610e5ed28ed nokogiri-1.11.0-x86_64-darwin.gem
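To compare a downloaded gem against the checksums above, something like the following sketch works with Ruby's standard library alone. The helper name gem_checksum_ok? and the assumption that the gem file sits in the current directory (for example after running gem fetch nokogiri -v 1.11.0) are ours; only two of the published checksums are copied here to keep the example short.

```ruby
require "digest"

# Checksums copied from the list above.
PUBLISHED_SHA256 = {
  "nokogiri-1.11.0.gem" =>
    "17ed2567bf76319075b4a6a7258d1a4c9e2661fca933b03e037d79ae2b9910d0",
  "nokogiri-1.11.0-x86_64-linux.gem" =>
    "d500c3202e2514b32f4b02049d9193aa825ae3e9442c9cad2d235446c3e17d8d",
}.freeze

# Hypothetical helper: true when the file's SHA-256 digest matches.
def gem_checksum_ok?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

PUBLISHED_SHA256.each do |filename, expected|
  if File.exist?(filename)
    status = gem_checksum_ok?(filename, expected) ? "OK" : "MISMATCH"
    puts "#{filename}: #{status}"
  else
    puts "#{filename}: not found in current directory, skipped"
  end
end
```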