I think this issue was mostly about the ephemeral intermediates, not anything we present to the outside world. In that regard, the only reason to use bam over sam is so that we don't always require a VM instance to have a ton of local disk space for handling large sets of reads (say, from big sequencers). But going for cram is probably cpu-overkill on a file that we're just going to delete anyway. In fact, for this particular issue, I was thinking that we should just be using the -1 compression level flag on samtools, which optimizes for speed (you really don't want this part to be the bottleneck) while reducing unnecessary wastage on the local temp disk.
In a few places we use `.sam` intermediary files where we could use `.bam` files. The latter take a bit more IO/CPU time, with the advantage of a better compression ratio. One such instance is here: https://github.com/broadinstitute/viral-ngs/blob/master/tools/bwa.py#L228

We should consider switching these occurrences across the codebase to use `.bam` by piping to samtools with the `-b` flag.
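
For illustration, here is a minimal sketch of what piping an aligner into samtools could look like. It is not the code in `tools/bwa.py`: the function names `build_pipeline` and `run_pipeline` and the file paths are hypothetical, and it assumes `bwa` and `samtools` are on the `PATH`. The `-1` option asks `samtools view` for fast BAM compression, as suggested in the comments on this issue.

```python
import subprocess


def build_pipeline(ref_fasta, reads_fastq, out_bam, fast=True):
    """Build argv lists for `bwa mem ... | samtools view -b [-1] -o out.bam -`.

    Hypothetical helper for illustration only.
    """
    # bwa mem writes SAM to stdout by default.
    producer = ["bwa", "mem", ref_fasta, reads_fastq]
    # -b: emit BAM; "-" as the input file means read from stdin.
    consumer = ["samtools", "view", "-b"]
    if fast:
        # -1: fast compression, trading compression ratio for speed.
        consumer.append("-1")
    consumer += ["-o", out_bam, "-"]
    return producer, consumer


def run_pipeline(producer, consumer):
    """Run `producer | consumer` without writing an intermediate file."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout)
    # Close our copy of the pipe so the producer sees SIGPIPE
    # if the consumer exits early.
    p1.stdout.close()
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 != 0:
        raise subprocess.CalledProcessError(rc1, producer)
    if rc2 != 0:
        raise subprocess.CalledProcessError(rc2, consumer)
```

Usage would be along the lines of `run_pipeline(*build_pipeline("ref.fasta", "reads.fastq", "aligned.bam"))`. Since the SAM text only ever exists in the pipe, the local temp disk holds just the compressed `.bam`.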