## Show how the Rsamtools pileup function could be improved to include details about indels

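library(Rsamtools)
bam_file <- "test.bam"
# Restrict the pileup to the single position chr17:7578425
sbp <- ScanBamParam(which = GRanges("chr17", IRanges(7578425, 7578425)))
# Accept bases of any quality and also report insertions (not reported by default)
p_param <- PileupParam(max_depth = 30000, min_base_quality = 0, include_insertions = TRUE)
res <- pileup(bam_file, scanBamParam = sbp, pileupParam = p_param)
res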

##    seqnames     pos strand nucleotide count           which_label
## 1     chr17 7578425      +          +     7 chr17:7578425-7578425
## 2     chr17 7578425      -          +    14 chr17:7578425-7578425
## 3     chr17 7578425      +          -    61 chr17:7578425-7578425
## 4     chr17 7578425      -          -     2 chr17:7578425-7578425
## 5     chr17 7578425      +          A    11 chr17:7578425-7578425
## 6     chr17 7578425      -          A     5 chr17:7578425-7578425
## 7     chr17 7578425      +          C    21 chr17:7578425-7578425
## 8     chr17 7578425      -          C    23 chr17:7578425-7578425
## 9     chr17 7578425      +          G     7 chr17:7578425-7578425
## 10    chr17 7578425      -          G     5 chr17:7578425-7578425
## 11    chr17 7578425      +          T  4866 chr17:7578425-7578425
## 12    chr17 7578425      -          T  6273 chr17:7578425-7578425

Note that the first four lines don't give details about the actual insertions and deletions. Samtools pileup does report this information, and it would be good to have it (optionally) here too. For example, here:

The 7 insertions on the + strand consist of 6 insertions of a G and 1 insertion of TCA.
The 14 insertions on the - strand consist of 11 insertions of a G, 2 of an A and 1 of a C.
The 61 deletions on the + strand consist of 41 deletions of TGT, 18 of a T and 2 of TG.
The 2 deletions on the - strand consist of 2 deletions of a T.
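
A breakdown like the one above can be recovered from the read-bases column of samtools mpileup text output, where each indel is written as `[+-]<length><sequence>`, in uppercase for the forward strand and lowercase for the reverse strand. The following is only a minimal base-R sketch of that parsing step: the function name `tally_indels` and the toy string are made up for illustration and are not part of Rsamtools or taken from `test.bam`.

```r
# Tally indels by strand from one mpileup "read bases" string.
# tally_indels() is a hypothetical helper, not an Rsamtools function.
tally_indels <- function(bases) {
  # Drop read-start markers ("^" plus one mapping-quality character) so the
  # quality character is never mistaken for the start of an indel.
  bases <- gsub("\\^.", "", bases)
  # Locate every "+<digits>" or "-<digits>" prefix.
  m <- gregexpr("[+-][0-9]+", bases)[[1]]
  if (m[1] == -1) {
    return(data.frame(strand = character(0), nucleotide = character(0),
                      count = integer(0), stringsAsFactors = FALSE))
  }
  starts <- as.integer(m)
  prefix_len <- attr(m, "match.length")
  sign <- substring(bases, starts, starts)
  n <- as.integer(substring(bases, starts + 1, starts + prefix_len - 1))
  # The indel sequence is exactly the n characters after the length.
  seq_start <- starts + prefix_len
  indel_seq <- substring(bases, seq_start, seq_start + n - 1)
  # Uppercase sequence: forward strand; lowercase: reverse strand.
  strand <- ifelse(indel_seq == toupper(indel_seq), "+", "-")
  key <- paste(strand, paste0(sign, toupper(indel_seq)), sep = "\t")
  counts <- table(key)
  parts <- strsplit(names(counts), "\t", fixed = TRUE)
  data.frame(strand = vapply(parts, `[`, "", 1),
             nucleotide = vapply(parts, `[`, "", 2),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy input (invented): two +G and one -TGT on the forward strand,
# one -T on the reverse strand, plus a read-start marker "^+".
toy <- ".+1G.-3TGT,-1t.+1G^+,"
indel_counts <- tally_indels(toy)
indel_counts
```

The sketch reads the declared length and takes exactly that many characters, rather than matching letters greedily, so that a base call following an indel is not swallowed into the indel sequence.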

So one possible way would be to output a data frame like this containing the full data:

##    seqnames     pos strand nucleotide count           which_label
## 1     chr17 7578425      +         +G     6 chr17:7578425-7578425
## 2     chr17 7578425      +       +TCA     1 chr17:7578425-7578425
## 3     chr17 7578425      -         +G    11 chr17:7578425-7578425
## 4     chr17 7578425      -         +A     2 chr17:7578425-7578425
## 5     chr17 7578425      -         +C     1 chr17:7578425-7578425
## 6     chr17 7578425      +       -TGT    41 chr17:7578425-7578425
## 7     chr17 7578425      +         -T    18 chr17:7578425-7578425
## 8     chr17 7578425      +        -TG     2 chr17:7578425-7578425
## 9     chr17 7578425      -         -T     2 chr17:7578425-7578425
## 51    chr17 7578425      +          A    11 chr17:7578425-7578425
## 61    chr17 7578425      -          A     5 chr17:7578425-7578425
## 71    chr17 7578425      +          C    21 chr17:7578425-7578425
## 81    chr17 7578425      -          C    23 chr17:7578425-7578425
## 91    chr17 7578425      +          G     7 chr17:7578425-7578425
## 10    chr17 7578425      -          G     5 chr17:7578425-7578425
## 11    chr17 7578425      +          T  4866 chr17:7578425-7578425
## 12    chr17 7578425      -          T  6273 chr17:7578425-7578425

Note that the last 8 lines are the same as above; only the indel lines now show more detail.
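
To make the proposed transformation concrete, here is a rough base-R sketch that swaps the collapsed `+` and `-` rows for per-sequence rows. It is an illustration under assumptions, not how Rsamtools works: `expand_indels` is a hypothetical helper, the `indels` table is typed in by hand from the breakdown above, `res` is rebuilt by hand (with character columns instead of factors) so the code runs without a BAM file, and only a single position is handled.

```r
# Pileup result as printed above, rebuilt by hand.
res <- data.frame(
  seqnames = "chr17", pos = 7578425L,
  strand = rep(c("+", "-"), times = 6),
  nucleotide = rep(c("+", "-", "A", "C", "G", "T"), each = 2),
  count = c(7L, 14L, 61L, 2L, 11L, 5L, 21L, 23L, 7L, 5L, 4866L, 6273L),
  which_label = "chr17:7578425-7578425",
  stringsAsFactors = FALSE
)

# Per-sequence indel counts, entered by hand from the breakdown above.
indels <- data.frame(
  strand = c("+", "+", "-", "-", "-", "+", "+", "+", "-"),
  nucleotide = c("+G", "+TCA", "+G", "+A", "+C", "-TGT", "-T", "-TG", "-T"),
  count = c(6L, 1L, 11L, 2L, 1L, 41L, 18L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace collapsed indel rows with detailed ones.
# Assumes `res` covers exactly one genomic position.
expand_indels <- function(res, indels) {
  is_indel <- res$nucleotide %in% c("+", "-")
  # Sanity check: details must add up to the collapsed counts
  # for every strand / indel-type combination.
  type <- substr(indels$nucleotide, 1, 1)
  detail_sum <- tapply(indels$count, paste(indels$strand, type), sum)
  collapsed_key <- paste(res$strand[is_indel], res$nucleotide[is_indel])
  stopifnot(all(as.vector(detail_sum[collapsed_key]) == res$count[is_indel]))
  detail_rows <- data.frame(
    seqnames = res$seqnames[1], pos = res$pos[1],
    strand = indels$strand, nucleotide = indels$nucleotide,
    count = indels$count, which_label = res$which_label[1],
    stringsAsFactors = FALSE
  )
  # Detailed indel rows first, then the unchanged base rows.
  out <- rbind(detail_rows, res[!is_indel, ])
  rownames(out) <- NULL
  out
}

expanded <- expand_indels(res, indels)
expanded
```

The consistency check inside the helper guards against a detail table whose counts do not add up to the collapsed `+` and `-` counts per strand, which is the property the breakdown above relies on.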
