dwarfs-0.4.0
Up to twice as fast and up to 10% better compression
The segmenting algorithm has been completely rewritten and is now much cleaner, uses much less memory, is significantly faster and detects a lot more duplicate segments. At the same time it's easier to configure (just a single window size instead of a list).
As a result, mkdwarfs speed has been significantly improved. The 47 GiB worth of Perl installations can now be turned into a DwarFS image in less than 6 minutes, about 30% faster than with the 0.3.1 release. Using lzma compression, it actually takes less than 4 minutes now, almost twice as fast as 0.3.1.
At the same time, the compression ratio has also improved significantly, mostly due to the new segmenting algorithm. With the 0.3.1 release, using the default configuration, the 47 GiB of Perl installations compressed down to 471.6 MiB. With the 0.4.0 release, this has dropped to 426.5 MiB, a 10% improvement. Using lzma compression (-l9), the size of the resulting image went from 319.5 MiB to 300.9 MiB, about 6% better. More importantly, though, the uncompressed file system size dropped from about 7 GiB to 4 GiB thanks to improved segmenting, which means fewer blocks need to be decompressed on average when using the file system.
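The size figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `reduction_percent` is made up for illustration, and the inputs are the image sizes stated in these notes.

```python
def reduction_percent(old_size: float, new_size: float) -> float:
    """Relative size reduction from old_size to new_size, in percent."""
    return (old_size - new_size) / old_size * 100.0


# Image sizes in MiB, as quoted in these release notes.
default_reduction = reduction_percent(471.6, 426.5)  # default configuration
lzma_reduction = reduction_percent(319.5, 300.9)     # lzma compression (-l9)

# Uncompressed file system size in GiB (both values are approximate).
uncompressed_reduction = reduction_percent(7.0, 4.0)

print(f"default:      {default_reduction:.1f}% smaller")
print(f"lzma:         {lzma_reduction:.1f}% smaller")
print(f"uncompressed: {uncompressed_reduction:.1f}% smaller")
```

The drop in uncompressed size is by far the largest of the three, which is why it matters most for read performance.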
New dwarfsextract tool
The new tool allows extracting a file system image directly to disk without having to use the FUSE driver. It also allows conversion of the file system image directly into a standard archive format (e.g. tar or cpio). Extracting a DwarFS image can be significantly faster than extracting an equivalent compressed archive.
Options have been cleaned up
The --blockhash-window-sizes and --blockhash-increment-shift options were replaced by --window-size and --window-step, respectively. The new --window-size option takes only a single window size instead of a list. There's also a new option --max-lookback-blocks that allows duplicate segments to be detected across multiple blocks, which can result in significantly better compression when using small file system blocks.
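To give a rough feel for what a window size and a window step control, here is a toy duplicate-segment finder. It is not the mkdwarfs implementation: the function name, the plain byte-count parameters, and the use of a dictionary keyed by raw window contents (rather than a rolling hash) are all assumptions made to keep the sketch short.

```python
def find_duplicate_segments(seen: bytes, new: bytes,
                            window_size: int, window_step: int):
    """Return (new_offset, seen_offset) pairs where a window of `new`
    also occurs in `seen`. Toy illustration only."""
    # Index windows of the already-stored data, one every `window_step` bytes.
    # A larger step means a smaller index, but unaligned repeats are missed.
    index = {}
    for pos in range(0, len(seen) - window_size + 1, window_step):
        index.setdefault(seen[pos:pos + window_size], pos)

    matches = []
    pos = 0
    while pos + window_size <= len(new):
        hit = index.get(new[pos:pos + window_size])
        if hit is None:
            pos += 1              # no match: slide the window by one byte
        else:
            matches.append((pos, hit))
            pos += window_size    # match: skip past the duplicated window
    return matches


stored = b"abcdefghijklmnop"
incoming = b"XXefghYYYijklZ"
print(find_duplicate_segments(stored, incoming, window_size=4, window_step=4))
```

In this sketch, `seen` stands in for the data that is still available for matching. Widening that set, which is roughly what looking back across more blocks does, gives the search more chances to find a repeat.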
Bugfixes
- The rewrite of the segmenting algorithm was triggered by a "bug" (github #35) that caused excessive memory consumption in mkdwarfs. It wasn't really a bug, though, more like a bad algorithm that used memory proportional to the file size. This issue has now been fully solved.
- Scanning of large files would excessively grow the mkdwarfs RSS. The memory would sooner or later have been reclaimed by the kernel, but the code now actively releases the memory while scanning.
- The project can now be built to use the system installed zstd and xxHash libraries. (fixes github #34)
- The project can now be built without the legacy FUSE driver. (fixes github #32)