Leaner Core, Hollow Filesystems and REUSE compliance
Major dependency reduction
This release removes the project's long-standing dependencies on folly and fbthrift, along with a number of transitive dependencies that came with them. DwarFS now uses a much lighter internal stack built around standard C++23 facilities, Boost, and a small set of in-repo components tailored specifically to what the project actually needs.
At the center of this change is a new internal thrift_lite library, which replaces fbthrift for the subset of functionality DwarFS uses: the thrift compiler, compact protocol support, JSON serialization, debug output, and frozen-layout support. The frozen library from fbthrift has been forked, cleaned up, and integrated as an internal component, with all folly dependencies removed. The frozen fork also opens up new possibilities for adding safety features that would have been hard to upstream.
All this significantly simplifies the build, reduces the dependency footprint, and removes several indirect dependencies as well, including gflags, glog, double-conversion, fast_float, and libevent. On macOS, it also eliminates the libsodium dependency. In addition to making the project easier to build and package, the new implementation also improves performance in some places: the refactored frozen code, while remaining fully compatible, is measurably faster than the original thanks to a new abstraction for accessing bit-packed data.
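The notes don't describe the new bit-packed access abstraction itself. To illustrate the kind of operation it serves, the sketch below stores fixed-width unsigned fields back to back in 64-bit words and reads them back, including fields that straddle a word boundary. packed_vector and its bit layout are hypothetical and are not the interface or layout of DwarFS's frozen code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-width bit-packed array: each element occupies `bits`
// bits (1..64), stored back to back, least significant bit first, in 64-bit
// words. A field may straddle two adjacent words.
class packed_vector {
 public:
  packed_vector(unsigned bits, std::size_t size)
      : bits_{bits}, words_((bits * size + 63) / 64, 0) {}

  std::uint64_t get(std::size_t i) const {
    std::size_t const bit = i * bits_;
    std::size_t const word = bit / 64;
    unsigned const shift = bit % 64;
    // Low part of the field comes from the current word.
    std::uint64_t v = words_[word] >> shift;
    // High part, if the field continues into the next word.
    if (shift != 0 && shift + bits_ > 64) {
      v |= words_[word + 1] << (64 - shift);
    }
    return v & mask();
  }

  void set(std::size_t i, std::uint64_t value) {
    value &= mask();
    std::size_t const bit = i * bits_;
    std::size_t const word = bit / 64;
    unsigned const shift = bit % 64;
    // Clear the field's bits in the current word, then insert the low part.
    words_[word] = (words_[word] & ~(mask() << shift)) | (value << shift);
    if (shift != 0 && shift + bits_ > 64) {
      unsigned const low_bits = 64 - shift;  // bits already stored above
      words_[word + 1] = (words_[word + 1] & ~(mask() >> low_bits)) |
                         (value >> low_bits);
    }
  }

 private:
  std::uint64_t mask() const {
    return bits_ == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits_) - 1;
  }

  unsigned bits_;
  std::vector<std::uint64_t> words_;
};
```

Storing 100 values of 13 bits this way takes 1300 bits (21 words) instead of 100 full words; the price is the shift-and-mask work on every access, which is exactly where a well-optimized accessor pays off.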
Compatibility: thrift_lite is only intended to be "compatible enough" with fbthrift for DwarFS's purposes. The compact protocol remains fully compatible, ensuring forwards and backwards compatibility of filesystem schema metadata. However, the debug and JSON output formats are no longer identical to the old fbthrift output. The debug output should look very familiar but is not byte-for-byte identical. The JSON output now follows a simpler style and represents maps as lists of pairs rather than JSON objects.
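thrift_lite's own code isn't shown here, but the wire format it has to stay compatible with is defined by the Apache Thrift compact protocol specification: integer fields (i16, i32, i64) are zigzag-encoded and then written as unsigned varints carrying 7 payload bits per byte. The sketch below implements just those two primitives for illustration; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Zigzag maps signed to unsigned so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
std::uint64_t zigzag_encode(std::int64_t n) {
  return (static_cast<std::uint64_t>(n) << 1) ^
         static_cast<std::uint64_t>(n >> 63);
}

std::int64_t zigzag_decode(std::uint64_t z) {
  return static_cast<std::int64_t>(z >> 1) ^ -static_cast<std::int64_t>(z & 1);
}

// Varint: 7 payload bits per byte, least significant group first, with the
// continuation bit (0x80) set on every byte except the last.
void write_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(v | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// Decodes a varint starting at `pos` and advances `pos` past it.
// Reads at most 10 bytes; malformed longer input is not diagnosed.
std::uint64_t read_varint(std::vector<std::uint8_t> const& in,
                          std::size_t& pos) {
  std::uint64_t v = 0;
  for (unsigned shift = 0; shift < 64; shift += 7) {
    std::uint8_t const b = in.at(pos++);
    v |= static_cast<std::uint64_t>(b & 0x7f) << shift;
    if ((b & 0x80) == 0) {
      break;
    }
  }
  return v;
}
```

Because -1 zigzag-encodes to 1, a field holding -1 costs a single byte on the wire. As long as these byte-level rules are unchanged, images written by older and newer releases decode identically, which is why output formats like debug and JSON could change without affecting compatibility.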
Hollow filesystem images
mkdwarfs now supports a new --hollow option for building "hollow" filesystem images. In this mode, DwarFS preserves the full directory structure, metadata, and file sizes of the input, but does not store the actual file contents. Instead, files are represented as empty sparse files of the same size, and reads from those files return zeroes.
This is useful for test scenarios where a realistic filesystem layout is needed, but the actual contents are irrelevant or would be wasteful to include.
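Read behavior for a hollow file can be modeled with nothing but the stored size. Here is a minimal sketch with pread-like semantics; hollow_file is a hypothetical type, not part of DwarFS.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a hollow file: only the size is stored, no data.
struct hollow_file {
  std::uint64_t size;

  // pread-like: produces up to `len` bytes starting at `offset` into `buf`
  // and returns how many were produced (0 at or past end of file).
  // Every byte produced is zero, since no contents were stored.
  std::size_t read(char* buf, std::size_t len, std::uint64_t offset) const {
    if (offset >= size) {
      return 0;
    }
    auto const n = static_cast<std::size_t>(
        std::min<std::uint64_t>(len, size - offset));
    std::memset(buf, 0, n);
    return n;
  }
};
```

Tools that only look at sizes, offsets, or directory structure see the same results as on the original tree, while anything that inspects contents sees zeroes.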
New binary file categorizer
This release adds a new binary file categorizer, enabled with --categorize=binary. It can detect ELF, PE, and Mach-O executables and shared libraries, and group them by type and architecture.
This can significantly improve compression ratios when the input data contains binaries from different platforms and architectures. Even in less extreme cases, it can still help by separating binary and non-binary content into different streams.
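The notes don't describe how the categorizer inspects files. One plausible starting point, sketched below under that assumption, is to check each format's well-known header magic and then read its machine or CPU-type field. classify_binary and its "format/arch" labels are hypothetical; the offsets and constants come from the ELF, PE/COFF, and Mach-O formats. Only a few architectures are mapped, and Mach-O universal (fat) binaries are ignored because their 0xCAFEBABE magic collides with Java class files.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace {

std::uint32_t load_le(std::string_view d, std::size_t off, std::size_t n) {
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v |= static_cast<std::uint32_t>(static_cast<unsigned char>(d[off + i]))
         << (8 * i);
  }
  return v;
}

std::uint32_t load_be(std::string_view d, std::size_t off, std::size_t n) {
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v = (v << 8) | static_cast<unsigned char>(d[off + i]);
  }
  return v;
}

}  // namespace

// Hypothetical classifier: returns "<format>/<arch>", or "" if unrecognized.
// `d` holds (at least) the first few KiB of the file.
std::string classify_binary(std::string_view d) {
  // ELF: "\x7f" "ELF", EI_DATA (byte 5) gives endianness, e_machine at 18.
  if (d.size() >= 20 && d.substr(0, 4) == "\x7f" "ELF") {
    std::uint32_t const m = d[5] == 2 ? load_be(d, 18, 2) : load_le(d, 18, 2);
    switch (m) {
      case 3: return "elf/i386";
      case 40: return "elf/arm";
      case 62: return "elf/x86_64";
      case 183: return "elf/aarch64";
      default: return "elf/other";
    }
  }
  // PE: "MZ" DOS stub, e_lfanew at 0x3c points to "PE\0\0", then Machine.
  if (d.size() >= 0x40 && d.substr(0, 2) == "MZ") {
    std::size_t const pe = load_le(d, 0x3c, 4);
    if (pe + 6 <= d.size() && d.substr(pe, 4) == std::string_view("PE\0\0", 4)) {
      switch (load_le(d, pe + 4, 2)) {
        case 0x14c: return "pe/i386";
        case 0x8664: return "pe/x86_64";
        case 0xaa64: return "pe/aarch64";
        default: return "pe/other";
      }
    }
  }
  // Mach-O (thin): 32/64-bit magic in either byte order, cputype at 4.
  if (d.size() >= 8) {
    std::uint32_t const magic = load_le(d, 0, 4);
    bool const le = magic == 0xfeedface || magic == 0xfeedfacf;
    bool const be = magic == 0xcefaedfe || magic == 0xcffaedfe;
    if (le || be) {
      std::uint32_t const cpu = le ? load_le(d, 4, 4) : load_be(d, 4, 4);
      switch (cpu) {
        case 0x01000007: return "macho/x86_64";
        case 0x0100000c: return "macho/arm64";
        default: return "macho/other";
      }
    }
  }
  return {};
}
```

Grouping by such a label puts, for example, all x86_64 ELF objects into one stream, so the compressor sees similar instruction encodings next to each other instead of interleaved with other architectures or text.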
REUSE compliance
The project is now compliant with the REUSE specification. All source files have been annotated with SPDX license identifiers, a REUSE.toml file has been added, and the full license texts are now included in the LICENSES directory.
This makes the licensing status of the repository much clearer and easier to verify automatically.
Bug fixes
- Commas in the filesystem image path were not escaped when passed to the FUSE driver as the fsname option. Because commas are used as FUSE argument separators, this could cause mounting to fail for paths containing commas. Fixes #323.
- Progress reporting in dwarfsextract was broken when extracting a subset of files using patterns, because it was computed relative to the total filesystem size rather than the total size of the selected files. Several subtle edge cases could also cause progress percentages to fail to reach 100% or even exceed it. These issues have been fixed. Fixes #316.
- Fixed FUSE argument vector initialization in dwarfs_main, which could trigger an assertion inside libfuse when extra arguments were added after an uninitialized vector was passed to FUSE_ARGS_INIT.
- The Windows build of mkdwarfs no longer aborts with a fatal error when it encounters an empty file during scanning.
- Fixed a metadata lookup bug for parent_dir_entry in filesystems with format version 2.2 and earlier (that is, DwarFS releases before v0.5.0), where an additional level of indirection was required but missing. Fortunately, this only affected the debug output of dwarfsck: the parent= field shown with -d directory_tree would display the parent inode number rather than the parent entry number. The only other code path using parent_dir_entry effectively compensated for the missing indirection.
- Recompressing a filesystem image with sparse files, without also rebuilding the metadata, could erroneously fail with an error claiming that sparse file support could not be disabled, even when --no-sparse-files had not been specified. The root cause was an unchecked std::optional access. This has been fixed.
- When rewriting a filesystem image, the bytes_in and bytes_out progress counters were updated at different times, which could lead to incorrect compression ratios being shown during progress reporting. Both counters are now updated together after compression.
- When using --format=newc and extracting a subset of hardlinked files, dwarfsextract could crash with an "unexpected deferred entry" error. This was caused by a peculiarity of libarchive's newc implementation that was not handled correctly. The bug has been fixed and is now covered by a test.
- Corrected the license information in a few headers, changing them from GPL-3.0-or-later to MIT.
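For the fsname comma fix above: libfuse's option parser treats a backslash as an escape character inside -o option lists, so one way to embed an arbitrary path in fsname= is to prefix every comma and backslash with a backslash, the same rule libfuse's own fuse_opt_add_opt_escaped() applies. The sketch shows only that transformation; escape_fuse_opt_value is a hypothetical name, and this is not necessarily how dwarfs_main implements the fix.

```cpp
#include <string>
#include <string_view>

// Escapes a value for use inside a FUSE "-o" option list: commas separate
// options and backslash is the escape character, so both get a backslash
// prefix.
std::string escape_fuse_opt_value(std::string_view value) {
  std::string out;
  out.reserve(value.size());
  for (char const c : value) {
    if (c == ',' || c == '\\') {
      out.push_back('\\');
    }
    out.push_back(c);
  }
  return out;
}
```

A caller would then build the option as "fsname=" + escape_fuse_opt_value(image_path).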
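The two dwarfsextract progress problems above (a denominator that didn't match the selected files, and percentages outside 0..100%) suggest a defensive formulation like the following sketch. progress_percent is a hypothetical helper; the clamp only guards the display and is no substitute for correct byte accounting.

```cpp
#include <algorithm>
#include <cstdint>

// Progress in percent for `done` out of `total` bytes, where `total` should
// be the size of the *selected* files rather than the whole image. The result
// is clamped to [0, 100], and an empty selection counts as complete.
double progress_percent(std::uint64_t done, std::uint64_t total) {
  if (total == 0) {
    return 100.0;
  }
  return 100.0 * static_cast<double>(std::min(done, total)) /
         static_cast<double>(total);
}
```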
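The bytes_in/bytes_out fix above comes down to keeping two related counters consistent for whoever reads them. A minimal sketch, assuming a hypothetical rewrite_progress class: both values change under one lock once a block has been compressed, so a progress display never divides by input bytes whose output hasn't been counted yet.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical progress counters for a rewrite. Both counters are updated
// together after a block is compressed, and read as a consistent pair.
class rewrite_progress {
 public:
  void block_done(std::uint64_t in, std::uint64_t out) {
    std::lock_guard lock{mx_};
    bytes_in_ += in;
    bytes_out_ += out;
  }

  // Returns {bytes_in, bytes_out} as one snapshot.
  std::pair<std::uint64_t, std::uint64_t> snapshot() const {
    std::lock_guard lock{mx_};
    return {bytes_in_, bytes_out_};
  }

  // Output size relative to input size, or 1.0 before any block is done.
  double ratio() const {
    auto const [in, out] = snapshot();
    return in == 0 ? 1.0 : static_cast<double>(out) / static_cast<double>(in);
  }

 private:
  mutable std::mutex mx_;
  std::uint64_t bytes_in_{0};
  std::uint64_t bytes_out_{0};
};
```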
Features
- Major dependency reduction / de-Meta-ing the project. fbthrift and folly are no longer dependencies of DwarFS, and the corresponding submodules have been removed from the repository. fbthrift has been replaced by a new thrift_lite library that implements the subset DwarFS actually needs, including the thrift compiler, compact protocol support, JSON serialization, debug output, and frozen-layout support. The frozen library from fbthrift has been forked and is now maintained as an internal component with all folly dependencies removed. DwarFS now relies on standard C++23, Boost, and a few new in-repo components instead. This also removes several indirect dependencies (gflags, glog, double-conversion, libevent, and on macOS also libsodium). The resulting code is simpler, the dependency footprint is smaller, and binary size is reduced in many cases. The compact protocol remains fully compatible, but the debug and JSON output formats are no longer identical to fbthrift's output.
- mkdwarfs now automatically selects the progress display mode based on whether the output is connected to a terminal and whether the current locale uses UTF-8. Previously, the default was always unicode, which could produce garbled output in non-UTF-8 environments. Addresses #326.
- The project is now compliant with the REUSE specification. All source files now carry SPDX license identifiers, a REUSE.toml file has been added, and full license texts are included in the LICENSES directory.
- New --hollow option for mkdwarfs. This allows building hollow filesystem images that preserve the structure, metadata, and file sizes of the input while replacing actual file contents with zero-filled sparse files. This is useful for testing scenarios where realistic filesystem structure matters but the actual contents do not. Fixes #131.
- mkdwarfs now supports ZSTD long-distance matching (LDM) via a new long algorithm option for --compression. This can improve compression ratios with extremely large block sizes (typically above 128 MiB), or with smaller block sizes at lower compression levels. In addition, all ZSTD compression parameters can now be tuned when DwarFS is statically linked against a libzstd that is not affected by the division-by-zero bug described in zstd issue #4590. The statically linked release binaries contain a patched libzstd and support tuning all parameters. Fixes #322.
- New binary file categorizer (--categorize=binary). This categorizer can identify ELF (Linux/FreeBSD), PE (Windows), and Mach-O (macOS) executables and shared libraries and group them into separate categories by type and architecture. This can dramatically improve compression ratios when binaries from different platforms and architectures are mixed together.
- dwarfsextract has a new --num-disk-writers option to run multiple writer threads in parallel when extracting files to disk. This can improve throughput, especially when extracting large numbers of small files.
- dwarfsextract has new --skip-devices and --skip-specials options to skip device nodes and special files (such as sockets and FIFOs) during extraction.
- dwarfsextract now emits a warning when a pattern is provided but no matching files are found.
- dwarfsck can now export metadata to stdout instead of to a file by using --export-metadata=-.
- With mkdwarfs in --input-list mode, specifying --order=none now preserves the exact order of entries given in the input file. Previously, none was not treated as a meaningful ordering guarantee in this mode.
- New --no-check option for mkdwarfs to skip the filesystem integrity check before recompression. This can speed up recompression workflows when the source image is assumed to be valid. The individual checks are still performed during the rewrite itself. Addresses #322.
- When recompressing a filesystem image, blocks that are uncompressed in the source image are no longer unnecessarily copied into memory. Instead, the mapped memory region is passed directly to the compressor, saving some memory and CPU time in this case.
- While profiling mkdwarfs, reading memory usage from /proc/self/smaps_rollup turned out to be a significant hotspot. DwarFS now reads /proc/self/status by default instead, trading some accuracy for much lower overhead. It is still possible to use /proc/self/smaps_rollup by setting DWARFS_ACCURATE_MEMORY_USAGE=1. This can be combined with DWARFS_LOG_MEMORY_USAGE to periodically log memory usage during a mkdwarfs run. If /proc/self/smaps_rollup is inaccessible, DwarFS automatically falls back to /proc/self/status.
- Added FreeBSD support for memory usage tracking in mkdwarfs using the kinfo_proc interface.
- Memory usage during mkdwarfs rewrite operations is now properly constrained by the -L/--memory-limit option, taking into account both queued blocks and memory used by the compression algorithm itself. Addresses #322.
- The similarity ordering option now uses a different hash mixing function. The main goal is to improve distribution, since only a small number of hash bits are actually used. This may or may not improve compression ratios, but it will affect the resulting image size.
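The automatic progress-mode selection above can be sketched with POSIX isatty() and nl_langinfo(CODESET). The exact mapping is an assumption for illustration, since the notes don't spell out mkdwarfs's rules: not a terminal gives simple, a UTF-8 terminal gives unicode, and any other terminal gives ascii. The sketch also assumes progress goes to stderr. pick_progress_mode and detect_progress_mode are hypothetical names.

```cpp
#include <string_view>

#include <langinfo.h>
#include <unistd.h>

// Pure decision function, kept separate from the environment so it can be
// tested. The mapping is an illustrative assumption.
std::string_view pick_progress_mode(bool is_terminal, std::string_view codeset) {
  if (!is_terminal) {
    return "simple";  // no cursor movement when writing to a file or pipe
  }
  return codeset == "UTF-8" ? "unicode" : "ascii";
}

// Queries the environment. Assumes setlocale(LC_ALL, "") was called earlier;
// otherwise nl_langinfo reports the codeset of the default "C" locale.
std::string_view detect_progress_mode() {
  return pick_progress_mode(isatty(STDERR_FILENO) != 0, nl_langinfo(CODESET));
}
```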
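A plausible reason extra disk writers help with many small files is that per-file system calls (create, write, close, set metadata) can then overlap instead of running strictly one after another. The sketch below shows one simple scheme: a fixed pool of threads pulling jobs from a shared atomic index. It illustrates the general technique, not dwarfsextract's implementation. run_parallel_writers is a hypothetical name, and error handling is omitted.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch: runs independent write jobs (e.g. "create this file
// and write its contents") on `num_writers` threads. Each thread claims the
// next unclaimed job via a shared atomic index until all jobs are taken.
// Jobs may finish in any order, so they must not depend on each other.
void run_parallel_writers(std::vector<std::function<void()>> const& jobs,
                          unsigned num_writers) {
  num_writers = std::max(num_writers, 1u);  // always use at least one thread
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> writers;
  writers.reserve(num_writers);
  for (unsigned i = 0; i < num_writers; ++i) {
    writers.emplace_back([&jobs, &next] {
      while (true) {
        std::size_t const j = next.fetch_add(1);
        if (j >= jobs.size()) {
          break;
        }
        jobs[j]();
      }
    });
  }
  for (auto& t : writers) {
    t.join();
  }
}
```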
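The /proc/self/status versus /proc/self/smaps_rollup trade-off above can be sketched as follows. The notes don't say which fields DwarFS reads, so using VmRSS from status and Rss from smaps_rollup is an assumption here, as are the helper names parse_kb_field and current_rss_bytes. The environment variable handling follows the description of DWARFS_ACCURATE_MEMORY_USAGE=1, including the fallback.

```cpp
#include <algorithm>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <system_error>

// Finds the first line starting with `key` (e.g. "VmRSS:" or "Rss:") and
// parses the "<number> kB" that follows, returning the value in bytes.
std::optional<std::uint64_t> parse_kb_field(std::string_view text,
                                            std::string_view key) {
  std::size_t pos = 0;
  while (pos < text.size()) {
    std::size_t const eol = text.find('\n', pos);
    std::string_view line = text.substr(pos, eol - pos);  // npos count clamps
    if (line.substr(0, key.size()) == key) {
      line.remove_prefix(key.size());
      line.remove_prefix(std::min(line.find_first_not_of(" \t"), line.size()));
      std::uint64_t kb = 0;
      auto const res = std::from_chars(line.data(), line.data() + line.size(), kb);
      if (res.ec != std::errc{}) {
        return std::nullopt;
      }
      return kb * 1024;
    }
    if (eol == std::string_view::npos) {
      break;
    }
    pos = eol + 1;
  }
  return std::nullopt;
}

namespace {

std::optional<std::string> read_file(char const* path) {
  std::ifstream in{path};
  if (!in) {
    return std::nullopt;
  }
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}

}  // namespace

// Cheap /proc/self/status by default; the more expensive smaps_rollup only
// when DWARFS_ACCURATE_MEMORY_USAGE=1, falling back if it can't be used.
std::optional<std::uint64_t> current_rss_bytes() {
  char const* accurate = std::getenv("DWARFS_ACCURATE_MEMORY_USAGE");
  if (accurate != nullptr && std::string_view{accurate} == "1") {
    if (auto const text = read_file("/proc/self/smaps_rollup")) {
      if (auto const rss = parse_kb_field(*text, "Rss:")) {
        return rss;
      }
    }
  }
  if (auto const text = read_file("/proc/self/status")) {
    return parse_kb_field(*text, "VmRSS:");
  }
  return std::nullopt;
}
```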
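Constraining a rewrite by -L/--memory-limit while accounting for both queued blocks and compressor working memory amounts to charging both against one shared budget. Below is a minimal sketch of such a budget, using a hypothetical memory_budget class. How DwarFS estimates each allocation is not described, so the caller supplies the byte counts.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <stdexcept>

// Hypothetical memory budget: reserve an estimate before queueing a block or
// starting a compressor, release it when done. acquire() blocks while the
// reservation would exceed the limit.
class memory_budget {
 public:
  explicit memory_budget(std::uint64_t limit) : limit_{limit} {}

  void acquire(std::uint64_t bytes) {
    if (bytes > limit_) {
      // Could never be satisfied; fail instead of blocking forever.
      throw std::invalid_argument("request exceeds total memory limit");
    }
    std::unique_lock lock{mx_};
    cv_.wait(lock, [&] { return used_ + bytes <= limit_; });
    used_ += bytes;
  }

  void release(std::uint64_t bytes) {
    {
      std::lock_guard lock{mx_};
      used_ -= bytes;
    }
    cv_.notify_all();  // wake any waiters that may now fit
  }

  std::uint64_t used() const {
    std::lock_guard lock{mx_};
    return used_;
  }

 private:
  std::uint64_t const limit_;
  mutable std::mutex mx_;
  std::condition_variable cv_;
  std::uint64_t used_{0};
};
```

A reader thread would acquire a block's size before queueing it, and a compression worker would acquire its estimated working memory before starting; both then compete for the same limit.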
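The notes don't name the new mixing function. As a stand-in, the sketch uses the well-known 64-bit finalizer from MurmurHash3 (fmix64), a typical choice when only a few bits of a hash are used: every output bit depends on every input bit, so taking just the top bits still spreads similar inputs across buckets. This is an illustration of the idea, not necessarily what DwarFS uses; bucket is a hypothetical helper.

```cpp
#include <cstdint>

// MurmurHash3's fmix64 finalizer: xor-shift and multiply rounds with good
// avalanche behavior.
constexpr std::uint64_t fmix64(std::uint64_t k) {
  k ^= k >> 33;
  k *= 0xff51afd7ed558ccdULL;
  k ^= k >> 33;
  k *= 0xc4ceb9fe1a85ec53ULL;
  k ^= k >> 33;
  return k;
}

// Selects one of 2^bits buckets from the high bits of the mixed hash.
// `bits` must be in 1..64.
constexpr std::uint64_t bucket(std::uint64_t raw_hash, unsigned bits) {
  return fmix64(raw_hash) >> (64 - bits);
}
```

Without the mixer, taking the top 4 bits of the inputs 0..1023 directly would put every one of them into bucket 0; after mixing, they spread across all 16 buckets.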
Build
- The project now requires C++23 compiler support instead of C++20. Care has been taken to restrict usage to widely available and long-supported C++23 features.
- Ubuntu 26.04 (resolute) and Debian testing have been added to the continuous integration matrix.
- Cleaned up many compiler warnings across different platforms.
Docs
- Added documentation for the fits, hotness, and binary categorizers to the mkdwarfs manual page.
Test
- Test coverage has been significantly improved from 96.4% to 97.1%, with more than 10,000 lines of new test code.
Full Changelog: v0.14.1...v0.15.0
SHA-256 Checksums
725f22f8a762ed3448afdd5551cd0da50547240bb66a2b22d684a427c6804cfe dwarfs-0.15.0-Linux-aarch64.tar.xz
0f5d14d1c0b12b3e42a4ecb017c850a5ab5cc93f5f43c6d6461f368c21361c4d dwarfs-0.15.0-Linux-arm.tar.xz
da68e94cfadaf6e09848edcc172fa23d4f598ce3bcdd1876c566e74dd28c220a dwarfs-0.15.0-Linux-i386.tar.xz
2dbdf21013656bc19448ef6e2cdb392f5f8f667af8b6b9c6ef9396978a31e89c dwarfs-0.15.0-Linux-loongarch64.tar.xz
b0bfdf2b2a427d8ae772670964578e8eccb3de20306f35c427971fde70c355b6 dwarfs-0.15.0-Linux-ppc64le.tar.xz
9b03bc0ccdb55efed322471db50397a91c22820c898e389d734939fad2b49af7 dwarfs-0.15.0-Linux-ppc64.tar.xz
e1340e850f2b35b5a271e3e2ff3e908e3cb166e106f060fa2da6e2fa6429576e dwarfs-0.15.0-Linux-riscv64.tar.xz
340cbbff70e5a1d9c6b6d08145a4598503ab7c58cd0d0653933d546f3f0cb167 dwarfs-0.15.0-Linux-s390x.tar.xz
e05cb439217f0797583ebd14542ce7169590366d988f0454e7444094608ea9d0 dwarfs-0.15.0-Linux-x86_64.tar.xz
790f3bae70f18e9a6b27d821986fcdb72f00f6c821bf7466eb4b228c19ae78d7 dwarfs-0.15.0.tar.xz
799145f1fb0c0f446e69242eb91916315cde4ad54404e68e36fb09bc76e61a4d dwarfs-0.15.0-Windows-AMD64.7z
624e038cbea12a2bd4bde35f81f5bf79b00ce727b297ad8c95a6e535e535fd51 dwarfs-fuse-extract-0.15.0-Linux-aarch64
be50f151d8c967c6dd1b8b42cf686564dc9864757bc735b702bd2b9795d6c819 dwarfs-fuse-extract-0.15.0-Linux-aarch64.upx
9abd2d0c60292bfd89e55eb3e85e073526c9a4d2862660c10fd57e3e7a1ed5fe dwarfs-fuse-extract-0.15.0-Linux-arm
4a2df6a5e2168f8dbb3fb430eec0ac4daee5d93a9ba862aee71e3e28d9535aa6 dwarfs-fuse-extract-0.15.0-Linux-arm.upx
c86aa373e483e33ed43fb4a35a12b57f055923dcbdb6a70e5cadb6487c67527c dwarfs-fuse-extract-0.15.0-Linux-i386
fc134057440b1250b164c754c0d570323255b05489f30f512034f9425833293c dwarfs-fuse-extract-0.15.0-Linux-i386.upx
c3fa279d582a7551e41fbf44ab85c05d12044baad0b34b9a7245f8fce5491554 dwarfs-fuse-extract-0.15.0-Linux-loongarch64
c5cc036fd464f60117c60942279b987141dcf34ef6b56503083354c324a4286d dwarfs-fuse-extract-0.15.0-Linux-ppc64
29b6a807d3357c38c4eb8ba0290cec9764046db30652d74e83c5f6472e4aca83 dwarfs-fuse-extract-0.15.0-Linux-ppc64le
49249bd112c16f45385843f46a0afdc3b33c85d3bca481e3116a96a9107c0dd7 dwarfs-fuse-extract-0.15.0-Linux-riscv64
3b4647d167de03e514bc6defb79eca7990e793f64533f050c17dc9aed379bc80 dwarfs-fuse-extract-0.15.0-Linux-s390x
07c90d83cb4cd29a9f7774a72203ef5291793af4901e6bfbb9558ea4c45c163d dwarfs-fuse-extract-0.15.0-Linux-x86_64
aa4fe493d8fa8032bba78cbc264d6235c0357acc783be6342599b963a6d4caa9 dwarfs-fuse-extract-0.15.0-Linux-x86_64.upx
b4dafb76bb9f72c07902c677e20db258c13ee2c81a8cf2579aa769c868ea2c69 dwarfs-universal-0.15.0-Linux-aarch64
b3e050fdbc9dbee720b6c071d70fbd6f44e4e5f3a45e1e03d784a2d9dab83a10 dwarfs-universal-0.15.0-Linux-aarch64.upx
aca02400c65980790e7c66011d3c39743c58159eee99577387fcba5425b2b1f1 dwarfs-universal-0.15.0-Linux-arm
4003d6a98780292eb4e991a69b80cf6b954c2a244a1ee7ab89a319cecf29e05d dwarfs-universal-0.15.0-Linux-arm.upx
979c7627aa1c014a19a33f48528f42c98e9f165f02e6283ca0181e43fbc3772e dwarfs-universal-0.15.0-Linux-i386
e5f8b66b39130a1fe5e65a5a1d5ae458fe380d9dc40412f4b09f8a73f1c0a343 dwarfs-universal-0.15.0-Linux-i386.upx
4d5b9e42db61d8d4bf0bd87ed870f0659464b6b5282f070ddb7e1f57407b1449 dwarfs-universal-0.15.0-Linux-loongarch64
3861cbb196086d638186e53d7aeaba10579dc42698a5b53f40336c8ddb9fbdbc dwarfs-universal-0.15.0-Linux-ppc64
1cda86ee3f537cc5765ad180086cb1951f5743ad35a786699641253295cabaf3 dwarfs-universal-0.15.0-Linux-ppc64le
342cc4f9539540c6b4146ebf4262d19deb8307a2a827193ca631df7c31a48099 dwarfs-universal-0.15.0-Linux-riscv64
333e3b62111c48809df14e0ce1bcd0787223d72f232f57c6317e9bb7ea6a9735 dwarfs-universal-0.15.0-Linux-s390x
7789f0dc6f28b09714c78956eed034c9093c64381d531f62451fa65dfabf0547 dwarfs-universal-0.15.0-Linux-x86_64
a1a63d006eed462162d8f016d31950eaafec20ae63db1dbf649e6bd23c1ef829 dwarfs-universal-0.15.0-Linux-x86_64.upx
f100a88e575e5dfe173abdf2580f08a768b47fa65c7f98d6a1ebf76f7d724027 dwarfs-universal-0.15.0-Windows-AMD64.exe