charset-normalizer 2.0.0 (PyPI)
Version 2.0.0

This package is approaching two years of existence, so now is a good time for a nice refresh.

Changes: See PR #45

  • Performance: ⚡ 4x to 5x faster than the previous 1.4.0 release.
  • Performance: ⚡ At least 2x faster than Chardet.
  • Performance: ⚡ Emphasis has been placed on UTF-8 detection, which should be nearly instantaneous.
  • Improvement: 🔙 Backward compatibility with Chardet has been greatly improved. The legacy detect function returns the same charset name as Chardet whenever possible.
  • Improvement: ❇️ The detection mechanism has been slightly improved. Turkish content is now detected correctly (most of the time).
  • Code: 🎨 The program has been rewritten to improve readability and maintainability (now using static typing).
  • Tests: ✔️ New workflows now verify performance, backward compatibility with Chardet, and detection coverage, in addition to the current tests (+CodeQL).
  • Dependency: ➖ This package no longer requires any dependency when used with Python 3.5 (dropped cached_property).
  • Docs: ✏️ Updated the performance claims, the contributing guide, and the issue template.
  • Improvement: ❇️ Added a --version argument to the CLI.
  • Bugfix: 🐛 The CLI output used the relative path of the file(s). It now uses the absolute path.
  • Deprecation: 🔴 The methods coherence_non_latin, w_counter, and chaos_secondary_pass of the CharsetMatch class are now deprecated and scheduled for removal in v3.0.
  • Improvement: ❇️ If no language is detected in the content, the language is now inferred from the encoding name or the alphabets used.
  • Removal: 🔥 Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
  • Improvement: ❇️ utf_7 detection has been reinstated.
  • Removal: 🔥 The exception hook on UnicodeDecodeError has been removed.
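
The following is a minimal usage sketch of the Chardet-compatible detect function mentioned above. It assumes charset-normalizer is installed (pip install charset-normalizer) and that detect is importable from the package root, as the project README shows. The sample text is arbitrary. The exact encoding and confidence reported for a given input may differ between versions.

```python
# Minimal sketch: assumes charset-normalizer is installed (pip install charset-normalizer).
from charset_normalizer import detect

# Arbitrary non-ASCII sample text, encoded as UTF-8 bytes.
data = "Déjà vu, naïve café, façade.".encode("utf-8")

# Chardet-style call: returns a dict with "encoding", "language" and "confidence" keys.
result = detect(data)
print(result)

# The encoding name is meant to match what Chardet would report whenever possible,
# so it can be passed to bytes.decode() when one was found.
if result["encoding"] is not None:
    print(data.decode(result["encoding"]))
```

Because the return value keeps Chardet's dict shape, existing code that calls chardet.detect can often switch imports without other changes.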

After much consideration, Python 3.5 support will not be dropped in v2.