Releases: COMBINE-lab/RapMap
rapmap v0.6.0
This release brings the master branch and tagged release up to date with many of the bug fixes, developments and improvements that have been going into the `develop-salmon` branch in support of salmon development. Among the most important new features in this release is the ability to have rapmap apply selective-alignment to improve the sensitivity and specificity of the mappings. For more details on selective-alignment and mapping validation, refer to the release notes of salmon, specifically the options related to `--validateMappings`. This release also adds the ability to optionally write out unmapped reads in the output SAM with the `-u` flag.
RapMap v0.5.0
This release is accompanied by more re-organization and refinement of the code. It also adds some new options and changes the SAM output format (removing quality scores, resulting in smaller SAM files and faster output).
RapMap v0.4.0
This new release of RapMap is accompanied by a substantial cleanup of the underlying codebase (including a few bug fixes). The new version should better handle cases where there are almost equally-good mappings on the forward and reverse-complement strands (but where one mapping is slightly better than the other). It also introduces a few more user-facing features (see New Features below).
Important note:
The quasi-indices from previous versions of RapMap are not binary compatible with the new version (see below). Please re-build your indices before using RapMap v0.4.0.
New Features
- New hash map for default index - The default `quasiindex` command now uses the sparsepp sparse hash map. While providing very similar lookup performance to the prior hash map implementation, sparsepp provides a number of benefits. Specifically, it uses substantially less memory (typically ~50% less) and, crucially, its memory usage grows gradually with the number of keys. A big problem with the previous implementation (Google's dense hash map) is that, on resize, the map would double and memory usage would jump by a factor of 3 (a new map twice the size of the old one, plus the original map from which to copy the keys). This means that even if you had enough memory to hold the final map, you might not be able to build it. Sparsepp, on the other hand, exhibits memory usage that scales almost linearly with the number of items in the map. For more details on the performance characteristics of the new default hash used in the index, please see the sparsepp benchmarks.
- New frugal perfect hash index - The vastly improved memory usage of the new default quasi index essentially obviates the previous perfect-hash-based index. Specifically, since that perfect hash also stored the keys (to validate queries from outside the universe on which the hash was built), the size of the resulting index was similar; it simply required less memory to build. However, sparsepp achieves very similar memory usage to the previous perfect-hash-based index. Instead of removing the perfect-hash-based index entirely, the `-p`/`--perfectHash` flag now tells the `quasiindex` command to build a frugal perfect-hash-based index. This index uses a number of aggressive space-saving techniques, which result in a much smaller memory footprint (but it is also slower to construct and has slower lookups than the default index). For large references, the new frugal perfect-hash-based index exhibits a memory reduction (over the new, reduced-memory, default index) of 40-50% (hence, it shows close to this savings over the old perfect-hash-based index as well). Also, for large references, the size of the index on disk is ~40% smaller. The cost of this substantial size reduction is that the frugal perfect-hash-based index takes 2-2.5 times longer to build, and lookups are slower. This slower lookup speed can, conceivably, reduce quasi-mapping speed a bit, but the speed hit (if there is one) is dataset dependent. This new indexing scheme should allow the construction of quasi indices on substantially larger references for a fixed RAM budget, and it also reduces the memory required to retain the index in memory during mapping.
- New options to the `quasimap` command - The following options have been introduced to the `quasimap` command:
  - sensitive mode - the `-e`/`--sensitive` flag will turn off some NIP-based jumping in the algorithm and will allow reads to compete for mapping using MMP-based coverage profiles. This can increase the sensitivity and specificity of difficult-to-map reads.
  - quasi coverage - the `-z`/`--quasiCoverage` option takes a number `c` with `0 <= c <= 1` that allows the user to specify that a read will only be considered "mappable" if at least a fraction `c` of the read is covered by maximum-mappable-prefixes (MMPs). Note that the condition that the coverage must be in terms of MMPs is rather stringent, so this parameter is not to be interpreted as the fraction of nucleotides that would be covered under an optimal alignment. Nonetheless, it allows enforcing the requirement that a single k-length hit should not be sufficient evidence of mapping, and it can reduce false-positive mappings when similar but distinct sequences are present in the sample but not the reference (the `quasiCov` option implies `sensitive` mode, but not vice-versa).
  - quiet flag - the `-q`/`--quiet` flag will disable all non-warning/non-error output of the quasimapping command to the console.
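The quasi-coverage criterion described above can be illustrated with a small sketch. This is not RapMap's implementation: the function names and the representation of MMP hits as half-open `(start, end)` intervals on the read are assumptions made for illustration only.

```python
def covered_fraction(read_len, mmp_intervals):
    """Fraction of read positions covered by the union of half-open
    (start, end) intervals. The interval representation is hypothetical."""
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    covered = 0
    cur_end = 0  # right edge of the region already counted
    for start, end in sorted(mmp_intervals):
        start = max(start, cur_end, 0)  # skip positions already counted
        end = min(end, read_len)        # clip to the read
        if end > start:
            covered += end - start
            cur_end = end
    return covered / read_len


def is_mappable(read_len, mmp_intervals, c):
    """A read counts as mappable only if at least a fraction c is covered."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must satisfy 0 <= c <= 1")
    return covered_fraction(read_len, mmp_intervals) >= c
```

Under this sketch, a 100-base read with a single 31-base hit has coverage 0.31, so a threshold such as `c = 0.7` would reject it, which matches the stated goal that a single k-length hit should not be sufficient evidence of mapping.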
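The factor-of-3 memory jump described for the doubling hash map can be checked with a little arithmetic. The sketch below models a generic table that doubles on resize; the initial capacity and maximum load factor are illustrative assumptions, not the actual parameters of Google's dense hash map or of sparsepp.

```python
def peak_slots_doubling(n_keys, initial_capacity=32, max_load=0.5):
    """Return (final_capacity, peak_slots) for a table that doubles on resize.

    During a resize the old table and the new, twice-as-large table exist
    at the same time, so the transient footprint is 3x the old capacity.
    Parameter defaults are hypothetical.
    """
    capacity = initial_capacity
    peak = capacity
    for n in range(1, n_keys + 1):
        if n > capacity * max_load:
            # old table (capacity) + new table (2 * capacity)
            peak = max(peak, capacity + 2 * capacity)
            capacity *= 2
    return capacity, peak


final_capacity, peak = peak_slots_doubling(1000)
print(final_capacity, peak)
```

In this model, holding 1000 keys ends with a 2048-slot table, but building it briefly needs 3072 slots. That gap is why a machine with enough memory for the final map might still fail during construction.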
Other changes
- Removal of quality strings from the SAM output - RapMap now outputs `*` in place of the quality string of a read in the output SAM file. This is consistent with the SAM standard, and produces output that is considerably less verbose (faster to write and takes up less space), and which also compresses to BAM much better. If quality strings for particular reads are desired, they can always be retrieved from the corresponding read IDs and the original file (we may provide a tool for this in the future).
- CIGAR string of unmapped reads - The CIGAR string of unmapped reads is now reported as `*` rather than `NS` (i.e. softclipping of length `N`, where `N` is the read length).
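To make the two SAM changes concrete, here is a minimal sketch of an unmapped-read record with `*` in both the CIGAR and QUAL columns. The helper names are hypothetical, and the values in the remaining columns follow common SAM conventions for unmapped reads; they are not confirmed to match RapMap's output byte for byte.

```python
def old_style_cigar(sequence):
    """The previous convention: a soft clip spanning the whole read (NS)."""
    return f"{len(sequence)}S"


def unmapped_sam_record(read_name, sequence):
    """Build a tab-separated SAM line for an unmapped single-end read."""
    fields = [
        read_name,  # QNAME
        "4",        # FLAG: 0x4 means the segment is unmapped
        "*",        # RNAME: no reference
        "0",        # POS
        "0",        # MAPQ
        "*",        # CIGAR: now '*' rather than e.g. '10S'
        "*",        # RNEXT
        "0",        # PNEXT
        "0",        # TLEN
        sequence,   # SEQ
        "*",        # QUAL: quality string omitted
    ]
    return "\t".join(fields)


print(unmapped_sam_record("read1", "ACGTACGTAC"))
```

For a 10-base read, the old convention would have reported `10S` in the CIGAR column and the full quality string in the last column.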
RapMap v0.3.0
RapMap v0.3.0 includes one major new feature and one major bugfix since v0.2.2.
New Feature
The emphf library has been replaced by BooM, which is based on the excellent BBHash minimal perfect hash library. The big benefits of this switch are that (1) generating the perfect hash now doesn't require any significant extra memory, even further reducing the memory requirements for indexing, and (2) the perfect hash can be computed much more quickly and in parallel. Use the `-x` flag to pass an argument for the number of threads that should be used to build the perfect hash function.
Bug Fix
Previous versions of RapMap contained a bug that could be triggered when the total transcriptome size was between 2^31 and 2^32-1 (transcriptomes of smaller or larger size were unaffected). This bug would most likely have caused RapMap to segfault. This was the result of a possible integer overflow, and has been fixed in version 0.3.0.
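The affected size range suggests a value that fits in an unsigned 32-bit integer but not in a signed one. The release note does not say which variable overflowed, so the sketch below is only an illustration of that general failure mode, using hypothetical helper names and plain modular arithmetic to mimic fixed-width integers.

```python
def as_int32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def as_uint32(value):
    """Interpret the low 32 bits of value as an unsigned 32-bit integer."""
    return value % 2**32


for size in (2**31 - 1, 2**31, 2**32 - 1):
    print(size, as_int32(size), as_uint32(size))
```

A size of 2^31 - 1 survives both conversions, while 2^31 and 2^32 - 1 stay intact only in the unsigned form and turn negative when treated as signed. A negative offset used as an index is a plausible route to a segfault, though that link is an assumption here.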
RapMap v0.2.2
This is mainly a bug-fix and maintenance release.
Bug Fix
- In rare circumstances, when an equal-quality mapping existed on both the forward and reverse-complement strands, the algorithm would exhibit a preference for the forward mapping. This bug was due to a shadowed variable and has been resolved.
Features
- The `-c` flag enforces co-linearity within chains of hits (i.e. the hits must be monotonically increasing / decreasing with respect to both the reference and query). We're adding the appropriate framework for enforcing different types of filters, so more should be forthcoming in future releases.
- Though it has been implemented for a while, here we're documenting the existence of the perfect-hash-based quasi indexing. This replaces the dense hash map with a perfect hash (using the fantastic EMPHF library). This is enabled when building the quasi-index by passing the `-p` or `--perfectHash` flag. The tradeoff is that index construction will be slower, but the resulting index will require considerably less space (40-50% less) during mapping. Mapping speed is roughly equivalent regardless of whether a "normal" or perfect hash is used.
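The co-linearity condition on chains of hits can be sketched as follows. This is one interpretation of the description above, not RapMap's code: hits are assumed to be `(query_pos, ref_pos)` pairs, strict monotonicity is assumed, and the function name is hypothetical.

```python
def is_colinear(hits):
    """Return True if the hits, ordered by query position, have reference
    positions that are strictly increasing or strictly decreasing."""
    if len(hits) < 2:
        return True
    ordered = sorted(hits)  # sort by query position
    query = [q for q, _ in ordered]
    ref = [r for _, r in ordered]
    # Query positions must be distinct for the ordering to be meaningful.
    if any(a == b for a, b in zip(query, query[1:])):
        return False
    increasing = all(a < b for a, b in zip(ref, ref[1:]))
    decreasing = all(a > b for a, b in zip(ref, ref[1:]))
    return increasing or decreasing
```

For example, the chain `[(0, 100), (10, 110), (25, 125)]` passes, while `[(0, 100), (10, 90), (25, 125)]` fails because the reference positions go down and then up.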
RapMap v0.2.1
This version includes two bug fixes:
- Correct SAM flags for primary and non-primary mappings of paired-end reads.
- Correct reported sequence / quality values when the read maps to the reverse complement strand.
RapMap v0.2.0
Description of changes forthcoming.
RapMap v0.1.0-pre
This release provides a (hopefully widely-compatible) Linux binary for RapMap.