Merge branch 'develop'
rob-p committed Jun 23, 2022
2 parents c8b0ee2 + 595ecb7 commit 9b60125
Showing 14 changed files with 928 additions and 713 deletions.
10 changes: 3 additions & 7 deletions .drone.yml
@@ -1,26 +1,22 @@
 pipeline:
 setup:
-image: hbb:salmon_build_new
+image: combinelab/hbb_salmon_build:latest
 commands:
 - echo "Starting build"
 - ./.drone/build.sh
 test_indexing:
-image: hbb:salmon_build_new
+image: combinelab/hbb_salmon_build:latest
 commands:
 - echo "[Testing quant]"
 - ./.drone/test_quant.sh
 volumes:
 - /mnt/scratch6/avi/data:/mnt/data
 - /mnt/scratch6/salmon_ci:/mnt/ci_res
 copy_build:
-image: hbb:salmon_build_new
+image: combinelab/hbb_salmon_build:latest
 commands:
 - echo "[Packaging binary]"
 - ./.drone/copy_build.sh
 volumes:
 - /mnt/scratch6/avi/data:/mnt/data
 - /mnt/scratch6/salmon_ci:/mnt/ci_res
-notify_gitter:
-image: plugins/gitter
-commands:
-- echo "[Notifying gitter]"
2 changes: 1 addition & 1 deletion current_version.txt
@@ -1,3 +1,3 @@
VERSION_MAJOR 1
-VERSION_MINOR 8
+VERSION_MINOR 9
VERSION_PATCH 0
4 changes: 2 additions & 2 deletions doc/source/conf.py
@@ -55,9 +55,9 @@
# built documents.
#
# The short X.Y version.
-version = '1.8'
+version = '1.9'
# The full version, including alpha/beta/rc tags.
-release = '1.8.0'
+release = '1.9.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
26 changes: 13 additions & 13 deletions doc/source/salmon.rst
@@ -10,8 +10,8 @@ alignments (in the form of a SAM/BAM file) to the transcripts rather than the
raw reads.

The **mapping**-based mode of Salmon runs in two phases; indexing and
-quantification. The indexing step is independent of the reads, and only need to
-be run one for a particular set of reference transcripts. The quantification
+quantification. The indexing step is independent of the reads, and only needs to
+be run once for a particular set of reference transcripts. The quantification
step, obviously, is specific to the set of RNA-seq reads and is thus run more
frequently. For a more complete description of all available options in Salmon,
see below.
@@ -24,7 +24,7 @@ see below.
salmon. When salmon is run with selective alignment, it adopts a
considerably more sensitive scheme that we have developed for finding the
potential mapping loci of a read, and score potential mapping loci using
-the chaining algorithm introdcued in minimap2 [#minimap2]_. It scores and
+the chaining algorithm introduced in minimap2 [#minimap2]_. It scores and
validates these mappings using the score-only, SIMD, dynamic programming
algorithm of ksw2 [#ksw2]_. Finally, we recommend using selective
alignment with a *decoy-aware* transcriptome, to mitigate potential
Expand Down Expand Up @@ -90,7 +90,7 @@ set of alignments.

For quasi-mapping-based Salmon, the story is somewhat different.
Generally, performance continues to improve as more threads are made
-available. This is because the determiniation of the potential mapping
+available. This is because the determination of the potential mapping
locations of each read is, generally, the slowest step in
quasi-mapping-based quantification. Since this process is
trivially parallelizable (and well-parallelized within Salmon), more
@@ -140,9 +140,9 @@ This will build the mapping-based index, using an auxiliary k-mer hash
over k-mers of length 31. While the mapping algorithms will make used of arbitrarily
long matches between the query and reference, the `k` size selected here will
act as the *minimum* acceptable length for a valid match. Thus, a smaller
-value of `k` may slightly improve sensitivty. We find that a `k` of 31 seems
+value of `k` may slightly improve sensitivity. We find that a `k` of 31 seems
to work well for reads of 75bp or longer, but you might consider a smaller
-`k` if you plan to deal with shorter reads. Also, a shoter value of `k` may
+`k` if you plan to deal with shorter reads. Also, a shorter value of `k` may
improve sensitivity even more when using selective alignment (enabled via the `--validateMappings` flag). So,
if you are seeing a smaller mapping rate than you might expect, consider building
the index with a slightly smaller `k`.
@@ -243,7 +243,7 @@ mode, and a description of each, run ``salmon quant --help-alignment``.
.. note:: Genomic vs. Transcriptomic alignments

Salmon expects that the alignment files provided are with respect to the
-transcripts given in the corresponding fasta file. That is, Salmon expects
+transcripts given in the corresponding FASTA file. That is, Salmon expects
that the reads have been aligned directly to the transcriptome (like RSEM,
eXpress, etc.) rather than to the genome (as does, e.g. Cufflinks). If you
have reads that have already been aligned to the genome, there are
@@ -415,7 +415,7 @@ distribution of the sequencing library. This value will affect the
effective length correction, and hence the estimated effective lengths
of the transcripts and the TPMs. The value passed to ``--fldSD`` will
be used as the standard deviation of the assumed fragment length
-distribution (which is modeled as a truncated Gaussan with a mean
+distribution (which is modeled as a truncated Gaussian with a mean
given by ``--fldMean``).
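
The hunk above describes the assumed fragment length distribution as a truncated Gaussian parameterized by ``--fldMean`` and ``--fldSD``. The following is a minimal sketch of such a distribution over integer fragment lengths, not Salmon's implementation. The function name, the truncation range ``[1, max_len]``, and the example values 250, 25 and 1000 are assumptions chosen for illustration.

```python
import math


def truncated_gaussian_pmf(fld_mean, fld_sd, max_len):
    """Return probabilities for fragment lengths 1..max_len.

    Assumption: the Gaussian is truncated to [1, max_len] and renormalized.
    The truncation bounds Salmon actually uses are not stated in the text.
    """
    weights = [
        math.exp(-0.5 * ((length - fld_mean) / fld_sd) ** 2)
        for length in range(1, max_len + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only (hypothetical library with mean 250, SD 25).
pmf = truncated_gaussian_pmf(fld_mean=250.0, fld_sd=25.0, max_len=1000)

# The most probable length sits at the mean; index 0 corresponds to length 1.
mode_length = 1 + max(range(len(pmf)), key=pmf.__getitem__)
```

A larger standard deviation spreads probability mass over more lengths, which is why the value passed to ``--fldSD`` feeds into the effective length correction.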


@@ -529,7 +529,7 @@ have a prior count of 1 fragment, while a transcript of length 50000 will have
a prior count of 0.5 fragments, etc. This behavior can be modified in two
ways. First, the prior itself can be modified via Salmon's ``--vbPrior``
option. The argument to this option is the value you wish to place as the
-*per-nucleotide* prior. Additonally, you can modify the behavior to use
+*per-nucleotide* prior. Additionally, you can modify the behavior to use
a *per-transcript* rather than a *per-nucleotide* prior by passing the flag
``--perTranscriptPrior`` to Salmon. In this case, whatever value is set
by ``--vbPrior`` will be used as the transcript-level prior, so that the
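
The arithmetic in this hunk can be sketched as follows. This is an illustration of the described behavior, not code from Salmon. The function and parameter names are hypothetical, and the default per-nucleotide value of 1e-5 is inferred from the two examples in the hunk (length 100000 gives 1 fragment, length 50000 gives 0.5 fragments).

```python
import math


def prior_count(transcript_length, vb_prior=1e-5, per_transcript_prior=False):
    """Prior fragment count for one transcript.

    per_transcript_prior=False: vb_prior is applied per nucleotide, so the
    prior count scales with transcript length.
    per_transcript_prior=True: vb_prior is used directly as the
    transcript-level prior, independent of length.
    """
    if per_transcript_prior:
        return vb_prior
    return vb_prior * transcript_length


long_tx = prior_count(100000)   # about 1 fragment
short_tx = prior_count(50000)   # about 0.5 fragments
flat = prior_count(50000, vb_prior=0.01, per_transcript_prior=True)  # 0.01
```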
@@ -559,7 +559,7 @@ bootstraps allows us to assess technical variance in the main abundance estimate
we produce. Such estimates can be useful for downstream (e.g. differential
expression) tools that can make use of such uncertainty estimates. This option
takes a positive integer that dictates the number of bootstrap samples to compute.
-The more samples computed, the better the estimates of varaiance, but the
+The more samples computed, the better the estimates of variance, but the
more computation (and time) required.

"""""""""""""""""""""""""""""""
@@ -664,7 +664,7 @@ the length of the transcriptome --- though each evaluation itself is
efficient and the process is highly parallelized.

It is possible to speed this process up by a multiplicative factor by
-considering only every *i*:sup:`th` fragment length, and interploating
+considering only every *i*:sup:`th` fragment length, and interpolating
the intermediate results. The ``--biasSpeedSamp`` option allows the
user to set this sampling factor. Larger values speed up effective
length correction, but may decrease the fidelity of bias modeling.
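
The trade-off described for ``--biasSpeedSamp`` can be illustrated with a small sketch that evaluates a costly function only at every *i*-th fragment length and fills in the rest. Linear interpolation is an assumption here, since the text does not say which interpolation scheme Salmon uses. The function name and the stand-in ``expensive`` function are hypothetical.

```python
def sampled_interpolation(f, max_len, speed_samp):
    """Approximate f(1..max_len) using roughly max_len / speed_samp evaluations.

    f is evaluated at lengths 1, 1 + speed_samp, 1 + 2 * speed_samp, ...
    and at max_len; values in between are linearly interpolated (assumed).
    Returns a list whose entry i approximates f(i + 1).
    """
    sampled = list(range(1, max_len + 1, speed_samp))
    if sampled[-1] != max_len:
        sampled.append(max_len)  # always anchor the final length
    values = {x: f(x) for x in sampled}

    result = []
    for lo, hi in zip(sampled, sampled[1:]):
        for x in range(lo, hi):
            t = (x - lo) / (hi - lo)
            result.append((1 - t) * values[lo] + t * values[hi])
    result.append(values[max_len])
    return result


calls = []


def expensive(length):
    """Stand-in for a costly per-length evaluation; records each call."""
    calls.append(length)
    return 2.0 * length + 1.0


approx = sampled_interpolation(expensive, max_len=10, speed_samp=3)
```

With a sampling factor of 3, only lengths 1, 4, 7 and 10 are evaluated directly. A larger factor means fewer evaluations but a coarser approximation wherever the underlying function is not close to linear.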
@@ -683,7 +683,7 @@ map to the transcriptome. When mapping paired-end reads, the entire
fragment (both ends of the pair) are identified by the name of the first
read (i.e. the read appearing in the ``_1`` file). Each line of the unmapped
reads file contains the name of the unmapped read followed by a simple flag
-that designates *how* the read failed to map completely. If fragmetns are
+that designates *how* the read failed to map completely. If fragments are
aligned against a decoy-aware index, then fragments that are confidently
assigned as decoys are written in this file followed by the ``d`` (decoy)
flag. Apart from the decoy flag, for single-end
@@ -694,7 +694,7 @@ reads, there are a number of different possibilities, outlined below:
u = The entire pair was unmapped. No mappings were found for either the left or right read.
m1 = Left orphan (mappings were found for the left (i.e. first) read, but not the right).
-m2 = Right orphan (mappinds were found for the right read, but not the left).
+m2 = Right orphan (mappings were found for the right read, but not the left).
m12 = Left and right orphans. Both the left and right read mapped, but never to the same transcript.

By reading through the file of unmapped reads and selecting the appropriate
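
The unmapped-reads file described in the hunks above (a read name followed by one of the flags ``u``, ``m1``, ``m2``, ``m12`` or ``d``) can be tallied with a few lines of Python. This is a sketch under two assumptions: fields are whitespace-separated, and the flag is the last field on each line. The read names in the example are made up.

```python
from collections import Counter

# Flag meanings as listed in the documentation text above.
FLAG_MEANINGS = {
    "u": "entire pair unmapped",
    "m1": "left orphan",
    "m2": "right orphan",
    "m12": "left and right orphans (never mapped to the same transcript)",
    "d": "confidently assigned to a decoy",
}


def count_unmapped_flags(lines):
    """Count how often each flag occurs in an iterable of text lines.

    Assumption: each line is '<read name> <flag>' separated by whitespace.
    Lines with fewer than two fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        counts[fields[-1]] += 1
    return counts


# Hypothetical input lines for illustration.
example = ["read1 u", "read2 m1", "read3 d", "read4 m12", "read5 u"]
summary = count_unmapped_flags(example)
```

Passing an open file handle instead of the example list works the same way, since the function only iterates over lines.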
11 changes: 8 additions & 3 deletions docker/Dockerfile
@@ -1,12 +1,12 @@
# image: COMBINE-lab/salmon
# This dockerfile is based on the one created by
# Titus Brown (available at https://github.com/ctb/2015-docker-building/blob/master/salmon/Dockerfile)
-FROM ubuntu:18.04
+FROM ubuntu:18.04 as base
MAINTAINER [email protected]

ENV PACKAGES git gcc make g++ libboost-all-dev liblzma-dev libbz2-dev \
ca-certificates zlib1g-dev libcurl4-openssl-dev curl unzip autoconf apt-transport-https ca-certificates gnupg software-properties-common wget
-ENV SALMON_VERSION 1.8.0
+ENV SALMON_VERSION 1.9.0

# salmon binary will be installed in /home/salmon/bin/salmon

@@ -36,7 +36,7 @@ RUN curl -k -L https://github.com/COMBINE-lab/salmon/archive/v${SALMON_VERSION}.
cd salmon-${SALMON_VERSION} && \
mkdir build && \
cd build && \
-cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local && make && make install
+cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/salmon && make && make install

# For dev version
#RUN git clone https://github.com/COMBINE-lab/salmon.git && \
@@ -46,6 +46,11 @@ RUN curl -k -L https://github.com/COMBINE-lab/salmon/archive/v${SALMON_VERSION}.
# cd build && \
# cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local && make && make install

+FROM ubuntu:18.04
+RUN apt-get update \
+&& apt-get install -y --no-install-recommends libhwloc5 \
+&& rm -rf /var/lib/apt/lists/*
+COPY --from=base /usr/local/salmon/ /usr/local/
ENV PATH /home/salmon-${SALMON_VERSION}/bin:${PATH}
ENV LD_LIBRARY_PATH "/usr/local/lib:${LD_LIBRARY_PATH}"

2 changes: 1 addition & 1 deletion docker/build_test.sh
@@ -1,3 +1,3 @@
#! /bin/bash
-SALMON_VERSION=1.8.0
+SALMON_VERSION=1.9.0
docker build --no-cache -t combinelab/salmon:${SALMON_VERSION} -t combinelab/salmon:latest .
4 changes: 2 additions & 2 deletions include/SalmonConfig.hpp
@@ -26,9 +26,9 @@

namespace salmon {
constexpr char majorVersion[] = "1";
-constexpr char minorVersion[] = "8";
+constexpr char minorVersion[] = "9";
constexpr char patchVersion[] = "0";
-constexpr char version[] = "1.8.0";
+constexpr char version[] = "1.9.0";
constexpr uint32_t indexVersion = 5;
constexpr char requiredQuasiIndexVersion[] = "p7";
} // namespace salmon
1 change: 1 addition & 0 deletions include/SalmonDefaults.hpp
@@ -14,6 +14,7 @@ namespace defaults {
constexpr const bool posBiasCorrect{false};
constexpr const uint32_t numThreads{8};
constexpr const double incompatPrior{0.0};
+constexpr const bool writeQualities{false};
constexpr const char quasiMappingDefaultFile[] = "";
constexpr const char quasiMappingImplicitFile[] = "-";
constexpr const bool metaMode{false};
1 change: 1 addition & 0 deletions include/SalmonOpts.hpp
@@ -202,6 +202,7 @@ struct SalmonOpts {

uint32_t eelCountCutoff{50};
// For writing quasi-mappings
+bool writeQualities;
std::string qmFileName;
std::ofstream qmFile;
std::unique_ptr<std::ostream> qmStream{nullptr};
4 changes: 2 additions & 2 deletions scripts/fetchPufferfish.sh
@@ -23,8 +23,8 @@ if [ -d ${INSTALL_DIR}/src/pufferfish ] ; then
rm -fr ${INSTALL_DIR}/src/pufferfish
fi

-SVER=salmon-v1.8.0
-#SVER=develop
+#SVER=salmon-v1.8.0
+SVER=develop
#SVER=sketch-mode

EXPECTED_SHA256=9c415bf431629929153625b354d8bc96828da2a236e99b5d1e6624311b3e0ad5
1 change: 1 addition & 0 deletions scripts/make-release.sh
@@ -56,6 +56,7 @@ rm ${DIR}/../RELEASES/${betaname}/lib/libc.so.6
rm ${DIR}/../RELEASES/${betaname}/lib/ld-linux-x86-64.so.2
rm ${DIR}/../RELEASES/${betaname}/lib/libdl.so.2
rm ${DIR}/../RELEASES/${betaname}/lib/libpthread*.so.*
+rm ${DIR}/../RELEASES/${betaname}/lib/libm.so.6

# now make the tarball
echo -e "Making the tarball\n"
5 changes: 5 additions & 0 deletions src/ProgramOptionsGenerator.cpp
@@ -269,6 +269,11 @@ namespace salmon {
"format. By default, output will be directed to "
"stdout, but an alternative file name can be "
"provided instead.")
+("writeQualities", po::bool_switch(&(sopt.writeQualities))->default_value(salmon::defaults::writeQualities),
+"This flag only has meaning if mappings are being written (with --writeMappings/-z). "
+"If this flag is provided, then the output SAM file will contain quality strings as well as "
+"read sequences. Note that this can greatly increase the size of the output file."
+)
("hitFilterPolicy",
po::value<string>(&sopt.hitFilterPolicyStr)->default_value(salmon::defaults::hitFilterPolicyStr),
"[selective-alignment mode only] : Determines the policy by which hits are filtered in selective alignment. Filtering hits after "