Salmon v0.6.0
This is a fairly major new release of Salmon (hence the version bump). It includes some new features and makes minor but backward-incompatible changes to the output format. Many of these changes mirror recent changes in Sailfish.
Note for OSX binary:
If you receive a message that a library cannot be found (i.e., if you run into an @rpath issue), try running Salmon using the following command:
$ DYLD_FALLBACK_LIBRARY_PATH=<PATH_TO_SALMON>/lib <PATH_TO_SALMON>/bin/salmon
If this works, you can add the library path to the DYLD_FALLBACK_LIBRARY_PATH variable automatically by placing the line:
export DYLD_FALLBACK_LIBRARY_PATH=<PATH_TO_SALMON>/lib:$DYLD_FALLBACK_LIBRARY_PATH
in your ~/.profile file.
Major Changes
- Default index --- The quasi index is now the default index type. This means that it is no longer necessary to provide the --type option to the index command. The fmd index remains available, but may be removed in a future version. We urge you to move to the quasi index if you are not already using it.
- Sequence-specific bias correction --- The old bias-correction methodology has been removed from Salmon and replaced with a new sequence-specific bias-correction model. Bias correction is enabled with the --biasCorrect flag. The new model has several benefits over the old one. First, it should correct sequence-specific biases more accurately, leading to better estimates in biased samples. Second, it should not suffer from the pathological "over-correction" failure cases of the old model: if there is no substantial bias in the sample, it should have only a minimal effect on the quantification results.
- New output format --- The new output format adds a column, EffectiveLength, which records the effective length of each transcript. This is now the third column, so the TPM and NumReads columns have each shifted one position to the right (the full column order is Name, Length, EffectiveLength, TPM, NumReads). Also, the quant.sf output file has been simplified and no longer contains any comment lines. The first row of the file is an (uncommented) header that lists the column names, and each subsequent row holds the quantification estimates for one transcript.
- Information about the command used --- Because the comment lines have been removed from the quant.sf file, the information they carried (and more), which can sometimes be useful, is now written elsewhere. A JSON-formatted file called cmd_info.json in the top-level output directory records the relevant command-line parameters (which used to appear in the quant.sf comments).
- Meta-information about the run --- The file aux/meta_info.json under the main quantification directory records a good deal of useful information, such as the number of reads processed, the number mapped, the percentage mapped, and which type of posterior sampling (e.g. Gibbs or bootstrap), if any, was performed.
- Auxiliary parameters from the run --- In addition to the meta_info.json file, the aux/ directory of the main quantification directory contains other useful files. Specifically, it contains gzipped binary data for any bootstrap or Gibbs samples that were generated, as well as gzipped binary data describing the fragment length distribution and the bias parameters (the latter are only meaningful if bias correction was performed).
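The new quant.sf layout and the JSON side files are easy to consume from a script. The following is a minimal sketch, not part of Salmon: the helper names (read_quant_sf, tpm_from_counts, load_run_info) and the small example file are hypothetical, it assumes quant.sf is tab-separated, and it makes no assumptions about the keys inside cmd_info.json or aux/meta_info.json (they are loaded as plain dictionaries). It also recomputes TPM using the standard definition (NumReads divided by EffectiveLength, rescaled to sum to one million), which should approximately match the TPM column up to the rounding in the written file.

```python
import csv
import io
import json
from pathlib import Path

# Column order of quant.sf in Salmon v0.6.0, per these release notes:
# one un-commented header row followed by one row per transcript.
EXPECTED_COLUMNS = ["Name", "Length", "EffectiveLength", "TPM", "NumReads"]

# Small made-up example (not real Salmon output) used for illustration.
EXAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t666666.67\t80\n"
    "tx2\t2000\t1600.0\t333333.33\t80\n"
)


def read_quant_sf(text):
    """Parse quant.sf text (assumed tab-separated) into per-transcript dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected quant.sf header: {header}")
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        name, length, eff_len, tpm, num_reads = fields
        rows.append({
            "Name": name,
            "Length": int(length),
            "EffectiveLength": float(eff_len),
            "TPM": float(tpm),
            "NumReads": float(num_reads),
        })
    return rows


def tpm_from_counts(rows):
    """Recompute TPM from NumReads and EffectiveLength (standard TPM definition)."""
    # Reads per unit of effective length; transcripts with a non-positive
    # effective length contribute nothing.
    rates = [
        r["NumReads"] / r["EffectiveLength"] if r["EffectiveLength"] > 0 else 0.0
        for r in rows
    ]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rows]
    # Rescale so that the values sum to one million.
    return [1e6 * rate / total for rate in rates]


def load_run_info(quant_dir):
    """Load cmd_info.json and aux/meta_info.json as plain dicts (keys not assumed)."""
    quant_dir = Path(quant_dir)
    cmd_info = json.loads((quant_dir / "cmd_info.json").read_text())
    meta_info = json.loads((quant_dir / "aux" / "meta_info.json").read_text())
    return cmd_info, meta_info


if __name__ == "__main__":
    rows = read_quant_sf(EXAMPLE_QUANT_SF)
    for row, tpm in zip(rows, tpm_from_counts(rows)):
        print(row["Name"], row["TPM"], round(tpm, 2))
```

Comparing the recomputed values with the TPM column is a quick sanity check that a file was read with the new column order; a file in the old four-column layout is rejected by the header check instead of being silently misparsed.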
Minor Changes
- Position-specific start distribution --- Modeling of the position-specific start distribution has been improved, and the way it is enabled or disabled has changed. This model is off by default and can be enabled with the --useFSPD flag.
Bug Fixes
- This release fixes a bug where the mapping location of a fragment may have been miscalculated by a small number of bases in certain cases. This in turn could lead to a small shift in the fragment length distribution and in the resulting quantification estimates.
Acknowledgements
- Special thanks go to Ayush Sengupta for helping out with the implementation of sequence-specific bias correction.
- Special thanks go to Mike Love for testing the effectiveness of the sequence-specific bias correction implementation (in Sailfish, which uses the same model) on some experimental (GEUVADIS) data!
Note
As you may notice, there are two DebianSqueeze binaries listed below. The binary called SalmonBeta-0.6.0_DebianSqueeze.tar.gz is the "standard" binary, which is built to use the JEMalloc memory allocator. In certain situations (involving files on NFS), this allocator has been observed to segfault upon program termination. This does not seem to affect the results, which have already been written by the time the crash occurs. However, if you encounter this problem, you can try SalmonBeta-0.6.0_DebianSqueeze_tcmalloc.tar.gz, which is built to use the TCMalloc memory allocator instead and does not seem to suffer from the same issue.