dwarfs-0.4.0

Released by @mhx on 06 Mar 17:59

Up to twice as fast and up to 10% better compression

The segmenting algorithm has been completely rewritten and is now much cleaner, uses much less memory, is significantly faster and detects a lot more duplicate segments. At the same time it's easier to configure (just a single window size instead of a list).

As a result, mkdwarfs speed has improved significantly. The 47 GiB of Perl installations can now be turned into a DwarFS image in less than 6 minutes, about 30% faster than with the 0.3.1 release. Using lzma compression, it actually takes less than 4 minutes now, almost twice as fast as 0.3.1.

At the same time, the compression ratio has also improved significantly, mostly due to the new segmenting algorithm. With the 0.3.1 release, using the default configuration, the 47 GiB of Perl installations compressed down to 471.6 MiB. With the 0.4.0 release, this has dropped to 426.5 MiB, a 10% improvement. Using lzma compression (-l9), the size of the resulting image went from 319.5 MiB to 300.9 MiB, about 5% better. More importantly, though, the uncompressed file system size dropped from about 7 GiB to 4 GiB thanks to improved segmenting, which means fewer blocks need to be decompressed on average when using the file system.
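
For reference, building such an image boils down to an mkdwarfs invocation along these lines; the input path and image names are placeholders, and -l9 selects the lzma compression level mentioned above:

    # default compression settings
    mkdwarfs -i /path/to/perl-installs -o perl.dwarfs

    # lzma compression, as used in the numbers above
    mkdwarfs -i /path/to/perl-installs -o perl-lzma.dwarfs -l9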

New dwarfsextract tool

The new tool allows extracting a file system image directly to disk without having to use the FUSE driver. It also allows converting the file system image directly into a standard archive format (e.g. tar or cpio). Extracting a DwarFS image can be significantly faster than extracting an equivalent compressed archive.
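
As a rough sketch of how this might look (the image and output names are placeholders; check dwarfsextract --help for the exact set of options and supported archive formats):

    # extract the image contents directly to a directory
    dwarfsextract -i perl.dwarfs -o /tmp/perl-extracted

    # convert the image into a tar archive instead
    dwarfsextract -i perl.dwarfs -f ustar > perl.tar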

Options have been cleaned up

The --blockhash-window-sizes and --blockhash-increment-shift options were replaced by --window-size and --window-step, respectively. The new --window-size option takes only a single window size instead of a list. There's also a new option --max-lookback-blocks that allows duplicate segments to be detected across multiple blocks, which can result in significantly better compression when using small file system blocks.
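
As an illustration, a small-block image that still detects duplicate segments across several blocks might be created with something like the following; the numeric values are purely illustrative rather than tuning recommendations, and -S (which controls the file system block size) is assumed from the regular mkdwarfs options rather than mentioned in these notes:

    # smaller file system blocks plus cross-block duplicate segment detection
    mkdwarfs -i /path/to/input -o image.dwarfs \
      -S 20 --window-size 12 --window-step 2 --max-lookback-blocks 4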

Bugfixes

  • The rewrite of the segmenting algorithm was triggered by a "bug" (github #35) that caused excessive memory consumption in mkdwarfs. It wasn't really a bug, though, more like a bad algorithm that used memory proportional to the file size. This issue has now been fully solved.

  • Scanning of large files would excessively grow mkdwarfs RSS. The memory would sooner or later have been reclaimed by the kernel, but the code now actively releases the memory while scanning.

  • The project can now be built to use the system installed zstd and xxHash libraries. (fixes github #34)

  • The project can now be built without the legacy FUSE driver. (fixes github #32)