This release contains big performance improvements:
- File hashing is performed in the order of the physical location of the data on disk. This minimizes disk seek latency and hugely improves performance on HDDs. On file systems that don't support the `FIEMAP` ioctl for getting the physical location of file data, accesses are ordered by file identifiers (e.g. inode ids on Unix), which also seems to improve performance as long as file data are not heavily fragmented.
- Switched from `HashMap` to `BTreeMap` for file grouping. This reduces memory usage (and also improves memory access locality, but this probably isn't something you'll notice in cold-cache runs).
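
To make the two changes more concrete, here is a minimal sketch of how they might fit together. It is not the actual implementation: the `FileInfo` struct and the `group_by_len` and `hashing_order` functions are hypothetical names, grouping by file size is an assumption made for illustration, and only the fallback ordering by file identifier is shown (not the `FIEMAP` path).

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical record for one scanned file (not the tool's real type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileInfo {
    pub path: PathBuf,
    /// File size in bytes.
    pub len: u64,
    /// File identifier, e.g. the inode number on Unix.
    pub file_id: u64,
}

/// Groups files by size in a `BTreeMap` and drops sizes that occur only once.
/// Grouping by size is an assumption; the key could be anything orderable.
pub fn group_by_len(files: Vec<FileInfo>) -> BTreeMap<u64, Vec<FileInfo>> {
    let mut groups: BTreeMap<u64, Vec<FileInfo>> = BTreeMap::new();
    for file in files {
        groups.entry(file.len).or_default().push(file);
    }
    // A file with a unique size needs no hashing at all.
    groups.retain(|_, group| group.len() > 1);
    groups
}

/// Flattens the remaining groups and sorts the files by identifier, which
/// approximates on-disk order when physical offsets are unavailable.
pub fn hashing_order(groups: &BTreeMap<u64, Vec<FileInfo>>) -> Vec<&FileInfo> {
    let mut order: Vec<&FileInfo> = groups.values().flatten().collect();
    order.sort_by_key(|file| file.file_id);
    order
}
```

On Unix, the `file_id` field could be filled from `std::os::unix::fs::MetadataExt::ino()`. Files would then be read and hashed in the order returned by `hashing_order`, while the `BTreeMap` keeps the groups sorted by key without the spare capacity a hash table reserves.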
Stay tuned for updated benchmarks...