github mhx/dwarfs v0.15.0
dwarfs-0.15.0


Leaner Core, Hollow Filesystems and REUSE compliance

Major dependency reduction

This release removes the project's long-standing dependencies on folly and fbthrift, along with a number of transitive dependencies that came with them. DwarFS now uses a much lighter internal stack built around standard C++23 facilities, Boost, and a small set of in-repo components tailored specifically to what the project actually needs.

At the center of this change is a new internal thrift_lite library, which replaces fbthrift for the subset of functionality DwarFS uses: the thrift compiler, compact protocol support, JSON serialization, debug output, and frozen-layout support. The frozen library from fbthrift has been forked, cleaned up, and integrated as an internal component, with all folly dependencies removed. The frozen fork also opens up new possibilities for adding safety features that would have been hard to upstream.

All this significantly simplifies the build, reduces the dependency footprint, and removes several indirect dependencies as well, including gflags, glog, double-conversion, fast_float, and libevent. On macOS, it also eliminates the libsodium dependency. In addition to making the project easier to build and package, the new implementation also improves performance in some places: the refactored frozen code, while being fully compatible, is measurably faster than the original thanks to a new abstraction for accessing bit-packed data.

Compatibility: thrift_lite is only intended to be "compatible enough" with fbthrift for DwarFS' purposes. The compact protocol remains fully compatible, ensuring forwards- and backwards-compatibility of filesystem schema metadata. However, the debug and JSON output formats are no longer identical to the old fbthrift output. The debug output should look very familiar, but it is not byte-for-byte the same. The JSON output now follows a simpler style and represents maps as lists of pairs rather than JSON objects.
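
For tools that consume the JSON output, the change in map representation is the part most likely to matter. The following Python sketch only contrasts the two styles. The sample data is hypothetical, and the assumption that each pair is emitted as a two-element array is an illustration, not a statement about the exact thrift_lite output.

```python
import json

# Hypothetical map field, used purely for illustration.
sample_map = {"alpha": 1, "beta": 2}

# Object style: the map becomes a JSON object, so keys must be strings.
as_object = json.dumps(sample_map)

# Pair-list style (assumed shape): each entry is a [key, value] array.
as_pairs = json.dumps([[key, value] for key, value in sample_map.items()])

def pairs_to_dict(text):
    """Convert a pair-list JSON document back into a Python dict."""
    return {key: value for key, value in json.loads(text)}
```

A consumer written against the object style would need a small adapter like pairs_to_dict to keep working with the pair-list style.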

Hollow filesystem images

mkdwarfs now supports a new --hollow option for building "hollow" filesystem images. In this mode, DwarFS preserves the full directory structure, metadata, and file sizes of the input, but does not store the actual file contents. Instead, files are represented as empty sparse files of the same size, and reads from those files return zeroes.

This is useful for test scenarios where a realistic filesystem layout is needed, but the actual contents are irrelevant or would be wasteful to include.
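
What a reader of a hollow image observes can be mimicked on an ordinary filesystem. The Python sketch below is not how mkdwarfs implements --hollow; it only creates a file with the same size as the original and no stored data, assuming the underlying filesystem zero-fills (and ideally keeps sparse) the extended region. The helper names are hypothetical.

```python
import os
import tempfile

def make_hollow_copy(src_path, dst_path):
    """Create dst_path with the same size as src_path but without its data."""
    size = os.path.getsize(src_path)
    with open(dst_path, "wb") as f:
        # Extending an empty file with truncate() typically leaves a hole
        # that reads back as zero bytes on filesystems with sparse support.
        f.truncate(size)
    return size

def demo():
    """Build a small file, make a hollow copy, and return (size, contents)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "real.bin")
        dst = os.path.join(tmp, "hollow.bin")
        with open(src, "wb") as f:
            f.write(b"payload" * 1000)
        size = make_hollow_copy(src, dst)
        with open(dst, "rb") as f:
            data = f.read()
    return size, data
```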

New binary file categorizer

This release adds a new binary file categorizer, enabled with --categorize=binary. It can detect ELF, PE, and Mach-O executables and shared libraries, and group them by type and architecture.

This can significantly improve compression ratios when the input data contains binaries from different platforms and architectures. Even in less extreme cases, it can still help by separating binary and non-binary content into different streams.
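
As a rough illustration of how the three formats can be told apart, the following Python sketch checks only the well-known header signatures. It is not the DwarFS categorizer: it ignores architecture, skips Mach-O universal ("fat") binaries, and the function name is hypothetical.

```python
import struct

# Mach-O magic numbers as they appear on disk, in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit
}

def sniff_binary_format(data):
    """Return 'elf', 'pe', 'mach-o', or None based on header bytes only."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:4] in MACHO_MAGICS:
        return "mach-o"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3c.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "pe"
    return None
```

Grouping files by the returned label before compression keeps similar machine code together, which is the effect the categorizer aims for.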

REUSE compliance

The project is now compliant with the REUSE specification. All source files have been annotated with SPDX license identifiers, a REUSE.toml file has been added, and the full license texts are now included in the LICENSES directory.

This makes the licensing status of the repository much clearer and easier to verify automatically.


Bug fixes

  • Commas in the filesystem image path were not escaped when passed to the FUSE driver as the fsname option. Because commas are used as FUSE argument separators, this could cause mounting to fail for paths containing commas. Fixes #323.

  • Progress reporting in dwarfsextract was broken when extracting a subset of files using patterns, because it was computed relative to the total filesystem size rather than the total size of the selected files. Several subtle edge cases could also cause progress percentages to fail to reach 100% or even exceed it. These issues have been fixed. Fixes #316.

  • Fixed FUSE argument vector initialization in dwarfs_main, which could trigger an assertion inside libfuse when extra arguments were added after an uninitialized vector was passed to FUSE_ARGS_INIT.

  • The Windows build of mkdwarfs no longer aborts with a fatal error when it encounters an empty file during scanning.

  • Fixed a metadata lookup bug for parent_dir_entry in filesystems with format version 2.2 and earlier (that is, DwarFS releases before v0.5.0), where an additional level of indirection was required but missing. Fortunately, this only affected the debug output of dwarfsck: the parent= field shown with -d directory_tree would display the parent inode number rather than the parent entry number. The only other code path using parent_dir_entry effectively compensated for the missing indirection.

  • Recompressing a filesystem image with sparse files, without also rebuilding the metadata, could erroneously fail with an error claiming that sparse file support could not be disabled, even without --no-sparse-files. The root cause was an unchecked std::optional access. This has been fixed.

  • When rewriting a filesystem image, the bytes_in and bytes_out progress counters were updated at different times, which could lead to incorrect compression ratios being shown during progress reporting. Both counters are now updated together after compression.

  • When using --format=newc and extracting a subset of hardlinked files, dwarfsextract could crash with an "unexpected deferred entry" error. This was caused by a peculiarity of libarchive's newc implementation that was not handled correctly. The bug has been fixed and is now covered by a test.

  • Corrected the license information in a few headers, changing them from GPL-3.0-or-later to MIT.
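
The comma-escaping fix in the first item above can be illustrated with a short Python sketch. libfuse separates mount options with commas and treats a backslash as the escape character; its C helper fuse_opt_add_opt_escaped escapes both characters. Whether DwarFS uses that helper or its own escaping is not stated here, and the function names below are hypothetical.

```python
def escape_fuse_option_value(value):
    """Backslash-escape commas and backslashes in a FUSE option value."""
    out = []
    for ch in value:
        if ch in (",", "\\"):
            out.append("\\")
        out.append(ch)
    return "".join(out)

def build_fsname_option(image_path):
    """Build an fsname= option that survives comma-based option splitting."""
    return "fsname=" + escape_fuse_option_value(image_path)
```

Without the escaping, an image path such as /data/a,b.dwarfs would be split into two options at the comma.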

Features

  • Major dependency reduction / de-Meta-ing the project. fbthrift and folly are no longer dependencies of DwarFS, and the corresponding submodules have been removed from the repository. fbthrift has been replaced by a new thrift_lite library that implements the subset DwarFS actually needs, including the thrift compiler, compact protocol support, JSON serialization, debug output, and frozen-layout support. The frozen library from fbthrift has been forked and is now maintained as an internal component with all folly dependencies removed. DwarFS now relies on standard C++23, Boost, and a few new in-repo components instead. This also removes several indirect dependencies (gflags, glog, double-conversion, libevent, and on macOS also libsodium). The resulting code is simpler, the dependency footprint is smaller, and binary size is reduced in many cases. The compact protocol remains fully compatible, but the debug and JSON output formats are no longer identical to fbthrift's output.

  • mkdwarfs now automatically selects the progress display mode based on whether the output is connected to a terminal and whether the current locale uses UTF-8. Previously, the default was always unicode, which could produce garbled output in non-UTF-8 environments. Addresses #326.

  • The project is now compliant with the REUSE specification. All source files now carry SPDX license identifiers, a REUSE.toml file has been added, and full license texts are included in the LICENSES directory.

  • New --hollow option for mkdwarfs. This allows building hollow filesystem images that preserve the structure, metadata, and file sizes of the input while replacing actual file contents with zero-filled sparse files. This is useful for testing scenarios where realistic filesystem structure matters but the actual contents do not. Fixes #131.

  • mkdwarfs now supports ZSTD long-distance matching (LDM) via a new long algorithm option for --compression. This can improve compression ratios with extremely large block sizes (typically above 128 MiB), or with smaller block sizes at lower compression levels. In addition, all ZSTD compression parameters can now be tuned when DwarFS is linked against a statically linked libzstd that does not expose the division-by-zero bug described in zstd issue #4590. The statically linked release binaries contain a patched libzstd and support tuning all parameters. Fixes #322.

  • New binary file categorizer (--categorize=binary). This categorizer can identify ELF (Linux/FreeBSD), PE (Windows), and Mach-O (macOS) executables and shared libraries and group them into separate categories by type and architecture. This can dramatically improve compression ratios when binaries from different platforms and architectures are mixed together.

  • dwarfsextract has a new --num-disk-writers option to run multiple writer threads in parallel when extracting files to disk. This can improve throughput, especially when extracting large numbers of small files.

  • dwarfsextract has new --skip-devices and --skip-specials options to skip device nodes and special files (such as sockets and FIFOs) during extraction.

  • dwarfsextract now emits a warning when a pattern is provided but no matching files are found.

  • dwarfsck can now export metadata to stdout instead of to a file by using --export-metadata=-.

  • With mkdwarfs in --input-list mode, specifying --order=none now preserves the exact order of entries given in the input file. Previously, none was not treated as a meaningful ordering guarantee in this mode.

  • New --no-check option for mkdwarfs to skip the filesystem integrity check before recompression. This can speed up recompression workflows when the source image is assumed to be valid. The individual checks are still performed during the rewrite itself. Addresses #322.

  • When recompressing a filesystem image, blocks that are uncompressed in the source image are no longer unnecessarily copied into memory. Instead, the mapped memory region is passed directly to the compressor, saving some memory and CPU time in this case.

  • While profiling mkdwarfs, reading memory usage from /proc/self/smaps_rollup turned out to be a significant hotspot. DwarFS now reads /proc/self/status by default instead, trading some accuracy for much lower overhead. It is still possible to use /proc/self/smaps_rollup by setting DWARFS_ACCURATE_MEMORY_USAGE=1. This can be combined with DWARFS_LOG_MEMORY_USAGE to periodically log memory usage during a mkdwarfs run. If /proc/self/smaps_rollup is inaccessible, DwarFS automatically falls back to /proc/self/status.

  • Added FreeBSD support for memory usage tracking in mkdwarfs using the kinfo_proc interface.

  • Memory usage during mkdwarfs rewrite operations is now properly constrained by the -L / --memory-limit option, taking into account both queued blocks and memory used by the compression algorithm itself. Addresses #322.

  • The similarity ordering option now uses a different hash mixing function. The main goal is to improve distribution, since only a small number of hash bits are actually used. This may or may not improve compression ratios, but it will affect the resulting image size.
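
The memory usage fallback described in the /proc/self/smaps_rollup item above can be sketched in Python as follows. This is an illustration under assumptions, not the DwarFS code: the choice of the Rss and VmRSS fields is a guess at what "memory usage" means here, and the function names are hypothetical. Only the environment variable name and the two /proc paths come from the release notes.

```python
import os

def parse_kb_field(text, field):
    """Return the value in bytes of a 'Field:   123 kB' line, or None."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1]) * 1024
    return None

def read_memory_usage(accurate=None):
    """Read resident memory in bytes on Linux; return None if unavailable."""
    if accurate is None:
        accurate = os.environ.get("DWARFS_ACCURATE_MEMORY_USAGE") == "1"
    if accurate:
        try:
            with open("/proc/self/smaps_rollup") as f:
                value = parse_kb_field(f.read(), "Rss")  # assumed field
            if value is not None:
                return value
        except OSError:
            pass  # inaccessible: fall back to the cheaper source below
    try:
        with open("/proc/self/status") as f:
            return parse_kb_field(f.read(), "VmRSS")  # assumed field
    except OSError:
        return None
```

The design trade-off is the one the item describes: /proc/self/status is cheap to read, while smaps_rollup requires the kernel to walk all mappings.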

Build

  • The project now requires C++23 compiler support instead of C++20. Care has been taken to restrict usage to widely available and long-supported C++23 features.

  • Ubuntu 26.04 (resolute) and Debian testing have been added to the continuous integration matrix.

  • Cleaned up many compiler warnings across different platforms.

Docs

  • Added documentation for the fits, hotness, and binary categorizers to the mkdwarfs manual page.

Test

  • Test coverage has improved from 96.4% to 97.1%, with more than 10,000 lines of new test code.

Full Changelog: v0.14.1...v0.15.0

SHA-256 Checksums

725f22f8a762ed3448afdd5551cd0da50547240bb66a2b22d684a427c6804cfe  dwarfs-0.15.0-Linux-aarch64.tar.xz
0f5d14d1c0b12b3e42a4ecb017c850a5ab5cc93f5f43c6d6461f368c21361c4d  dwarfs-0.15.0-Linux-arm.tar.xz
da68e94cfadaf6e09848edcc172fa23d4f598ce3bcdd1876c566e74dd28c220a  dwarfs-0.15.0-Linux-i386.tar.xz
2dbdf21013656bc19448ef6e2cdb392f5f8f667af8b6b9c6ef9396978a31e89c  dwarfs-0.15.0-Linux-loongarch64.tar.xz
b0bfdf2b2a427d8ae772670964578e8eccb3de20306f35c427971fde70c355b6  dwarfs-0.15.0-Linux-ppc64le.tar.xz
9b03bc0ccdb55efed322471db50397a91c22820c898e389d734939fad2b49af7  dwarfs-0.15.0-Linux-ppc64.tar.xz
e1340e850f2b35b5a271e3e2ff3e908e3cb166e106f060fa2da6e2fa6429576e  dwarfs-0.15.0-Linux-riscv64.tar.xz
340cbbff70e5a1d9c6b6d08145a4598503ab7c58cd0d0653933d546f3f0cb167  dwarfs-0.15.0-Linux-s390x.tar.xz
e05cb439217f0797583ebd14542ce7169590366d988f0454e7444094608ea9d0  dwarfs-0.15.0-Linux-x86_64.tar.xz
790f3bae70f18e9a6b27d821986fcdb72f00f6c821bf7466eb4b228c19ae78d7  dwarfs-0.15.0.tar.xz
799145f1fb0c0f446e69242eb91916315cde4ad54404e68e36fb09bc76e61a4d  dwarfs-0.15.0-Windows-AMD64.7z
624e038cbea12a2bd4bde35f81f5bf79b00ce727b297ad8c95a6e535e535fd51  dwarfs-fuse-extract-0.15.0-Linux-aarch64
be50f151d8c967c6dd1b8b42cf686564dc9864757bc735b702bd2b9795d6c819  dwarfs-fuse-extract-0.15.0-Linux-aarch64.upx
9abd2d0c60292bfd89e55eb3e85e073526c9a4d2862660c10fd57e3e7a1ed5fe  dwarfs-fuse-extract-0.15.0-Linux-arm
4a2df6a5e2168f8dbb3fb430eec0ac4daee5d93a9ba862aee71e3e28d9535aa6  dwarfs-fuse-extract-0.15.0-Linux-arm.upx
c86aa373e483e33ed43fb4a35a12b57f055923dcbdb6a70e5cadb6487c67527c  dwarfs-fuse-extract-0.15.0-Linux-i386
fc134057440b1250b164c754c0d570323255b05489f30f512034f9425833293c  dwarfs-fuse-extract-0.15.0-Linux-i386.upx
c3fa279d582a7551e41fbf44ab85c05d12044baad0b34b9a7245f8fce5491554  dwarfs-fuse-extract-0.15.0-Linux-loongarch64
c5cc036fd464f60117c60942279b987141dcf34ef6b56503083354c324a4286d  dwarfs-fuse-extract-0.15.0-Linux-ppc64
29b6a807d3357c38c4eb8ba0290cec9764046db30652d74e83c5f6472e4aca83  dwarfs-fuse-extract-0.15.0-Linux-ppc64le
49249bd112c16f45385843f46a0afdc3b33c85d3bca481e3116a96a9107c0dd7  dwarfs-fuse-extract-0.15.0-Linux-riscv64
3b4647d167de03e514bc6defb79eca7990e793f64533f050c17dc9aed379bc80  dwarfs-fuse-extract-0.15.0-Linux-s390x
07c90d83cb4cd29a9f7774a72203ef5291793af4901e6bfbb9558ea4c45c163d  dwarfs-fuse-extract-0.15.0-Linux-x86_64
aa4fe493d8fa8032bba78cbc264d6235c0357acc783be6342599b963a6d4caa9  dwarfs-fuse-extract-0.15.0-Linux-x86_64.upx
b4dafb76bb9f72c07902c677e20db258c13ee2c81a8cf2579aa769c868ea2c69  dwarfs-universal-0.15.0-Linux-aarch64
b3e050fdbc9dbee720b6c071d70fbd6f44e4e5f3a45e1e03d784a2d9dab83a10  dwarfs-universal-0.15.0-Linux-aarch64.upx
aca02400c65980790e7c66011d3c39743c58159eee99577387fcba5425b2b1f1  dwarfs-universal-0.15.0-Linux-arm
4003d6a98780292eb4e991a69b80cf6b954c2a244a1ee7ab89a319cecf29e05d  dwarfs-universal-0.15.0-Linux-arm.upx
979c7627aa1c014a19a33f48528f42c98e9f165f02e6283ca0181e43fbc3772e  dwarfs-universal-0.15.0-Linux-i386
e5f8b66b39130a1fe5e65a5a1d5ae458fe380d9dc40412f4b09f8a73f1c0a343  dwarfs-universal-0.15.0-Linux-i386.upx
4d5b9e42db61d8d4bf0bd87ed870f0659464b6b5282f070ddb7e1f57407b1449  dwarfs-universal-0.15.0-Linux-loongarch64
3861cbb196086d638186e53d7aeaba10579dc42698a5b53f40336c8ddb9fbdbc  dwarfs-universal-0.15.0-Linux-ppc64
1cda86ee3f537cc5765ad180086cb1951f5743ad35a786699641253295cabaf3  dwarfs-universal-0.15.0-Linux-ppc64le
342cc4f9539540c6b4146ebf4262d19deb8307a2a827193ca631df7c31a48099  dwarfs-universal-0.15.0-Linux-riscv64
333e3b62111c48809df14e0ce1bcd0787223d72f232f57c6317e9bb7ea6a9735  dwarfs-universal-0.15.0-Linux-s390x
7789f0dc6f28b09714c78956eed034c9093c64381d531f62451fa65dfabf0547  dwarfs-universal-0.15.0-Linux-x86_64
a1a63d006eed462162d8f016d31950eaafec20ae63db1dbf649e6bd23c1ef829  dwarfs-universal-0.15.0-Linux-x86_64.upx
f100a88e575e5dfe173abdf2580f08a768b47fa65c7f98d6a1ebf76f7d724027  dwarfs-universal-0.15.0-Windows-AMD64.exe
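
A downloaded artifact can be checked against the list above with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib

def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def read_chunks(path, chunk_size=1 << 20):
    """Yield the file at path in chunks so large files need little memory."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

def matches_checksum(path, expected_hex):
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of_chunks(read_chunks(path)) == expected_hex.strip().lower()
```

For example, matches_checksum("dwarfs-0.15.0.tar.xz", "790f3bae70f18e9a6b27d821986fcdb72f00f6c821bf7466eb4b228c19ae78d7") should return True for an intact source tarball.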
