This release adds a simple feature as a workaround for a bug that will be
resolved in a later version. The bug is described in
#1244, and can be summarised as: When
charset-normalizer
is used to detect the encoding of a file, it will
erroneously detect a UTF-8 file as having no encoding (i.e. a binary file) when
the 2048th byte is a non-final byte of a multi-byte glyph.
You can run reuse as REUSE_ENCODING_MODULE=chardet reuse
to circumvent this
bug. If you use pre-commit, you can use this snippet:
repos:
- repo: https://github.com/fsfe/reuse-tool
rev: v6.1.0
hooks:
- id: reuse
entry: env REUSE_ENCODING_MODULE=chardet reuse
You will not encounter this bug if your environment has libmagic available.
Added
- You can now specify the module that will be used for detecting the encoding of
files with theREUSE_ENCODING_MODULE
environment variable. (#1245) - The Docker images and the pre-commit hooks now come bundled with all encoding
modules. (#1245) - The
--debug
flag now tells you the detected encoding and detected newlines
of each file, as well as which encoding module is used. (#1246)