All notable changes to this project will be documented in this file.
The format follows keepachangelog.com. Please stick to it.
[2.5.0 Penetrating Pineapple] -- unreleased
- Fix symbolic link emitting in sh script (sometimes files were omitted from rmlint.sh)
- Fix compile stop on BSD systems in utilities.c (thanks f99aq8ove)
- Fix some compiler warnings and typos.
- Add basic Spanish translation.
- Add basic compile support on Cygwin.
- Fix bad size stats using `--replay` with hardlinks.
- Fix unicode emission of `json` formatter.
- Fix broken `gui/SConscript` (was looking for python4...)
- Add `unique` formatter to print unique files. (See sahib#161)
- Removed `--with-metadata-cache` and `--cache` since both introduced lots of code without giving many benefits. In total about 1000 lines of code were removed in this process.
- For limiting memory usage there is just `--limit-mem` now.
2.4.1 Myopic Micrathene Bugfixes -- 2015-11-01
- `btrfs`: Do not try to clone files on different subvolumes.
- `gui`: Tie down version for gobject-introspection.
- `freebsd`: Fix very bad performance due to mounttable and some compile issues.
- `-S` now understands two new letters to match via regular expressions:
  - `r`: Select the path as original that matches the pattern given in `<PATTERN>` after this letter.
  - `x`: Same as `r` but match only basename.
2.4.0 Myopic Micrathene -- 2015-10-25
- `rmlint` should compile on Mac OSX now.
- Bugfix: Broken `chown` calls in sh script (thanks Shukrat Mukimov)
- Bugfix: memory corruption when specifying `-T dd` alone.
- Bugfix: Make `-D` and `-k / -K` play together nicely (thanks phiresky).
- Smaller compile time troubles fixed.
- Progressbar uses timeout-based redraws which leads to much smoother drawing and less cpu footprint.
- `pretty` formatter (default) now produces validly escaped commands. It is still intended for visual output only, which is why a note about this was added.
- A fully working graphical user interface which is installed as a python module by default (can be disabled via compile option ie `scons --without-gui`). It can be started via `rmlint --gui`.
- Support for automatic deduplication on btrfs using `BTRFS_IOC_FILE_EXTENT_SAME`. The Shellscript now will contain calls to `rmlint --btrfs $source $dest` for duplicates on `btrfs` filesystems if the user specified `-c sh:clone`.
- Benchmark suite that will track the performance of rmlint from release to release. This helps developers detect any speed regressions or improvements and is a tool to help develop and validate optimization strategies.
- Shell/Python-script now does more sanity checks before removing and can be told to re-compare files byte-by-byte before removing them (`-p` option when running the `.sh` file).
- Add a new `--hash` option so rmlint can be used as a very fast file hashing utility, eg `rmlint --hash` works like `sha1sum`, or `rmlint --hash -d md5` works like `md5sum`. Also does sha256, sha512, murmur{128}, spooky{32,64,128}, city{128}.
- `--sort-by` learned new keys: `l` (path length) and `d` (path depth).
- New `--unmatched-basename` option only finds twins with differing basenames.
- Smaller performance and memory optimisations in shredder.
- `-g` now checks if there is already a `sh` and `json` formatter before it adds one.
- `-PP` now defaults to `xxhash` as hashing algorithm.
- `-o / --output` learned to guess the formatter you want to use from the file ending. For example `-o /tmp/test.json` will work like `-o json:/tmp/test.json`.
- JSON output contains `rmlint` version and revision now.
- `--replay` learned to merge several json files.
- Internal refactoring (credits go to Daniel) of the scheduler and hashing library. The duplicate finding process has been split into separate modules.
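
The formatter guessing for `-o / --output` noted above can be illustrated with a small Python sketch. This is only an illustration of the idea, not rmlint's actual code: the function name `guess_formatter` and the `KNOWN_FORMATTERS` set are hypothetical, and only the `json` and `sh` formatter names are taken from this changelog.

```python
import os

# Hypothetical set of formatter names for illustration only.
KNOWN_FORMATTERS = {"json", "sh"}


def guess_formatter(spec):
    """Split an -o style argument into (formatter, path).

    An explicit "formatter:path" spec wins; otherwise the file ending
    is used. Returns (None, spec) if no formatter can be guessed.
    """
    name, sep, path = spec.partition(":")
    if sep and name in KNOWN_FORMATTERS:
        return name, path
    ending = os.path.splitext(spec)[1].lstrip(".")
    if ending in KNOWN_FORMATTERS:
        return ending, spec
    return None, spec
```

Under these assumptions, `guess_formatter("/tmp/test.json")` and `guess_formatter("json:/tmp/test.json")` yield the same formatter and path.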
2.3.0 Ominous Oscar -- 2015-06-15
- Compiles on Mac OSX now. See also: sahib#139
- Fix a crash that happened with `-e`.
- Protect other lint than duplicates by `-k` or `-K`.
- `chown` in sh script fixed (was `chmod` by accident).
- `--replay`: Re-output a previously written json file. Allow filtering by using all other standard options (like size or directory filtering).
- `--sort-by`: Similar to `-S`, but sorts groups of files. So showing the group with the biggest size sucker is as easy as `-y s`.
- `-S`'s long option is `--rank-by` now (previously `--sortcriteria`).
- `-o` can guess the formatter from the filename if given.
- Remove some optimisations that gave no visible effect.
- Simplified FIEMAP optimisation to reduce initial delay and reduce memory overhead
- Improved hashing strategy for large disks (do repeated smaller sweeps across the disk instead of incrementally hashing every file on the disk)
- Incorrect handling of -W, --no-with-color option
- Handling of $PKG_CONFIG in SConstruct
- Failure to build manpage
- Various BSD compatibility issues
- Nonstandard header sequence in modules using fts
- Removed some unnecessary warnings
2.2.0 Dreary Dropbear -- 2015-05-09
- Issue with excessive memory usage and processing delays with very large file counts (>5M files)
- Problems and crashes on 32bit with large files and normal files.
- Bug in memory manager for "paranoid" file comparison method which could lead to OOM error in some cases and infinite looping in others.
- Fixed bug which prevented option --max-paranoid-mem working.
- Note: much kudos to our user "vvs-" who provided many useful testcases and was prepared to re-run a 10-hour duplicate search after each effort to fix the underlying issues.
- Handling of json formatter on invalid utf8, which fixed `--cache` in return.
- Bug during file traversal when encountering symlinks to empty folders
- More aggressive test suite, leading to higher coverage rates (90% of lines, almost 100% functions at least). Let's not speak of branch coverage for now. 😄
- A primitive benchmark suite.
- A GUI sketch that can be shipped along rmlint.
- Most internal filesystems like `proc` are ignored now.
- Improved progressbar
- Memory footprint reduced to enable larger filesets to be processed. See discussion at sahib#109. Improvements include a Pat(h)ricia-Trie used as data structure to efficiently map file paths with much less memory consumption. Also the file preprocessing strategy (eg to find path doubles) has been improved to avoid having several large hashtables active at the same time.
- Improved threading strategy which increases speed of duplicate matching. As before, the threading strategy uses just one thread per physical disk to enable fast reading without disk thrash. The improved algorithm now increases the number of cpu threads used to hash the data as it is read in. Also an improved mutex strategy reduces the wait time before the hash results can be processed. Note the new threading strategy is particularly effective on the "paranoid" (byte-by-byte) file comparison method (option -pp), which is now almost as fast as the default (SHA1 hash) method.
- The optimisation in 2.1.0 which detects existing reflinks has been reverted for now due to conflicts between shredder and treemerge.
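
The memory saving from keeping file paths in a trie, as mentioned in the footprint entry above, comes from storing shared directory prefixes only once. The following Python sketch shows that idea with a plain per-component trie. It is a simplification (a real Patricia trie also compresses single-child chains), and the class `PathTrie` and its methods are hypothetical names, not rmlint's implementation.

```python
class PathTrie:
    """Store file paths by component so shared directory prefixes are kept once."""

    def __init__(self):
        self.children = {}    # path component -> PathTrie node
        self.is_file = False  # True if a stored path ends at this node

    def insert(self, path):
        node = self
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, PathTrie())
        node.is_file = True

    def __contains__(self, path):
        node = self
        for part in path.strip("/").split("/"):
            node = node.children.get(part)
            if node is None:
                return False
        return node.is_file

    def node_count(self):
        # Number of nodes including the root; a rough proxy for memory use.
        return 1 + sum(child.node_count() for child in self.children.values())
```

In this sketch, inserting `/home/user/a.txt` and `/home/user/b.txt` creates the `home` and `user` nodes once and adds one leaf per file.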
2.1.0 Malnourished Molly -- beta-release 2015-04-13
- performance regression: When having many pairs of duplicates, the core got slower very fast due to linear lookups. Fixed.
- performance regression: No SSDs were detected due to two bugs.
- commandline aborts also on non-fatal option misuses.
- Some statistic counts were sometimes updated incorrectly.
- Fixes in treemerge to respect directories tagged as originals.
- Ignore "evil" fs types like bindfs, nullfs completely.
- Fix race in file tree traversal.
- Various smaller bugfixes.
- `--with-metadata-cache` makes `rmlint` less memory hungry by storing its paths in a sqlite3 database and selecting them when needed.
- `--without-fiemap` disables the `fiemap` optimization when focus is on memory footprint.
- `--perms` can check if a file should be readable/writable or executable.
- Json output is enabled by default and is written to `rmlint.json`.
- `--partial-hidden` does only see hidden files in duplicate directories.
- `--cache/--write-unfinished` can be used to speedup re-runs drastically.
- Checksums can be stored in the xattr of files with `--xattr-read/write/clear`.
- New progressbar output inspired by `journalctl --verify`.
- Better support for reflink-capable filesystems (e.g. btrfs):
  - detect existing reflinks using `fiemap` data (significant speedup)
  - support replacing files by a reflink if the filesystem supports it.
- Optional dependency for sqlite3 for `--with-metadata-cache`.
- `--hardlinked` is enabled by default.
- Support -n (dry-run) for rmlint.sh; require user input on ask.
- Default digest is now sha1 instead of spooky.
- updated `.pot` template with help strings.
- updated German translation accordingly.
- -T supports arguments like df,dd properly now.
- New --help text that shows a short reference only.
- sahib made his 1000th commit on rmlint with this text and wonders where all the time has gone and why he isn't rich yet.
2.0.0 Personable Pidgeon -- 2014-01-23
Initial release of the rewrite.