pypi chardet 7.4.2

10 hours ago

Patch release: fixes a crash on short inputs and closes a bunch of WHATWG/IANA alias gaps.

Bug Fixes

  • Fixed RuntimeError: pipeline must always return at least one result on ~2% of all possible two-byte inputs (e.g. b"\xf9\x92"). Multi-byte encodings like CP932 and Johab could score above the structural confidence threshold on very short inputs, but then statistical scoring would return nothing, leaving an empty result list instead of falling through to the fallback. (#367, #368, thanks @jasonwbarnett)

Improvements

  • Added ~90 encoding aliases from the WHATWG Encoding Standard and IANA Character Sets registry so that <meta charset> labels like x-cp1252, x-sjis, dos-874, csUTF8, and the cswindows* family all resolve correctly through the markup detection stage. Every alias was driven by a failing spec-compliance test, not speculative. (#366)
  • Added a spec-compliance test suite covering Python decode round-trips for all 86 registry encodings, WHATWG label resolution, IANA preferred MIME names, and Unicode/RFC conformance (BOM sniffing, UTF-8 boundary cases, UTF-16 surrogate pairs). This is the test suite that would have caught the 7.4.1 BOM bug before release. (#366)

Full Changelog: 7.4.1...7.4.2

Don't miss a new chardet release

NewReleases is sending notifications on new releases.