Skip to content

Commit

Permalink
Merge branch 'main' of github.com:davetang/learning_bam_file into main
Browse files Browse the repository at this point in the history
  • Loading branch information
davetang committed Sep 13, 2022
2 parents 5f4ea0b + 9934df0 commit 0a4bb18
Showing 1 changed file with 43 additions and 42 deletions.
85 changes: 43 additions & 42 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ Table of Contents

<!-- Created by https://github.com/ekalinin/github-markdown-toc -->

Mon 18 Apr 2022 01:51:54 PM UTC
Thu 18 Aug 2022 11:49:13 PM UTC

Learning the BAM format
================
Expand Down Expand Up @@ -124,7 +124,7 @@ samtools --help

##
## Program: samtools (Tools for alignments in the SAM format)
## Version: 1.15 (using htslib 1.15)
## Version: 1.16 (using htslib 1.16)
##
## Usage: samtools <command> [options]
##
Expand Down Expand Up @@ -156,6 +156,7 @@ samtools --help
## fastq converts a BAM to a FASTQ
## fasta converts a BAM to a FASTA
## import Converts FASTA or FASTQ files to SAM/BAM/CRAM
## reference Generates a reference from aligned data
##
## -- Statistics
## bedcov read depth per BED region
Expand Down Expand Up @@ -214,15 +215,15 @@ Size of SAM file.
ls -lh eg/ERR188273_chrX.sam
```

## -rw-r--r-- 1 root root 321M Apr 18 13:49 eg/ERR188273_chrX.sam
## -rw-r--r-- 1 root root 321M Aug 18 23:46 eg/ERR188273_chrX.sam

Size of BAM file.

``` bash
ls -lh eg/ERR188273_chrX.bam
```

## -rw-r--r-- 1 root root 67M Apr 18 13:47 eg/ERR188273_chrX.bam
## -rw-r--r-- 1 root root 67M Aug 18 23:44 eg/ERR188273_chrX.bam

We can use `head` to view a SAM file.

Expand All @@ -233,7 +234,7 @@ head eg/ERR188273_chrX.sam
## @HD VN:1.0 SO:coordinate
## @SQ SN:chrX LN:156040895
## @PG ID:hisat2 PN:hisat2 VN:2.2.0 CL:"/Users/dtang/github/rnaseq/hisat2/../src/hisat2-2.2.0/hisat2-align-s --wrapper basic-0 --dta -p 4 -x ../raw/chrX_data/indexes/chrX_tran -1 /tmp/4195.inpipe1 -2 /tmp/4195.inpipe2"
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.15 CL:samtools view -h eg/ERR188273_chrX.bam
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.16 CL:samtools view -h eg/ERR188273_chrX.bam
## ERR188273.4711308 73 chrX 21649 0 5S70M = 21649 0 CGGGTGATCACGAGGTCAGGAGATCAAGACCATCCTGGCCAACACAGTGAAACCCCATCTCTACTAAAAATACAA @@@F=DDFFHGHBHIFFHIGGIFGEGHFHIGIGIFIIIGIGIGGDHIIGIIC@>DGHCHHHGHHFFFFFDEACC@ AS:i:-5 ZS:i:-5 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:70 YT:Z:UP NH:i:2
## ERR188273.4711308 133 chrX 21649 0 * = 21649 0 CTACAGGTGCCCGCCACCATGCCCAGCTAATTTTTTTTGTATTTTTAGTAGAGATGGGGTTTCACTGTGTTGGCC CB@FDFFFHHGFHIJJJJIIIIIIIGGGIJGIIJJJJJJFFHIIIIGECHEHHGGHHFF?AACCDDDDDDDDBCD YT:Z:UP
## ERR188273.4711308 329 chrX 233717 0 5S70M = 233717 0 CGGGTGATCACGAGGTCAGGAGATCAAGACCATCCTGGCCAACACAGTGAAACCCCATCTCTACTAAAAATACAA @@@F=DDFFHGHBHIFFHIGGIFGEGHFHIGIGIFIIIGIGIGGDHIIGIIC@>DGHCHHHGHHFFFFFDEACC@ AS:i:-5 ZS:i:-5 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:70 YT:Z:UP NH:i:2
Expand Down Expand Up @@ -272,9 +273,9 @@ samtools view -T genome/chrX.fa -C -o eg/ERR188273_chrX.cram eg/ERR188273_chrX.b
ls -lh eg/ERR188273_chrX.[sbcr]*am
```

## -rw-r--r-- 1 root root 67M Apr 18 13:47 eg/ERR188273_chrX.bam
## -rw-r--r-- 1 root root 40M Apr 18 13:49 eg/ERR188273_chrX.cram
## -rw-r--r-- 1 root root 321M Apr 18 13:49 eg/ERR188273_chrX.sam
## -rw-r--r-- 1 root root 67M Aug 18 23:44 eg/ERR188273_chrX.bam
## -rw-r--r-- 1 root root 40M Aug 18 23:46 eg/ERR188273_chrX.cram
## -rw-r--r-- 1 root root 321M Aug 18 23:46 eg/ERR188273_chrX.sam

You can use `samtools view` to view a CRAM file just as you would for a
BAM file.
Expand Down Expand Up @@ -311,8 +312,8 @@ ls -l eg/ERR188273_chrX.bam
ls -l eg/sorted.bam
```

## -rw-r--r-- 1 root root 69983526 Apr 18 13:47 eg/ERR188273_chrX.bam
## -rw-r--r-- 1 root root 69983598 Apr 18 13:49 eg/sorted.bam
## -rw-r--r-- 1 root root 69983526 Aug 18 23:44 eg/ERR188273_chrX.bam
## -rw-r--r-- 1 root root 69983598 Aug 18 23:46 eg/sorted.bam

You should use use additional threads (if they are available) to speed
up sorting; to use four threads, use `-@ 4`.
Expand All @@ -324,9 +325,9 @@ time samtools sort eg/ERR188273_chrX.sam -o eg/sorted.bam
```

##
## real 0m9.192s
## user 0m8.878s
## sys 0m0.236s
## real 0m10.403s
## user 0m10.045s
## sys 0m0.268s

Time taken using four threads.

Expand All @@ -336,9 +337,9 @@ time samtools sort -@ 4 eg/ERR188273_chrX.sam -o eg/sorted.bam

## [bam_sort_core] merging from 0 files and 4 in-memory blocks...
##
## real 0m5.257s
## user 0m9.655s
## sys 0m0.302s
## real 0m6.061s
## user 0m11.180s
## sys 0m0.378s

Many of the SAMtools subtools can use additional threads, so make use of
them if you have the resources\!
Expand Down Expand Up @@ -366,7 +367,7 @@ samtools head eg/ERR188273_chrX_rg.bam
## @HD VN:1.0 SO:coordinate
## @SQ SN:chrX LN:156040895
## @PG ID:hisat2 PN:hisat2 VN:2.2.0 CL:"/Users/dtang/github/rnaseq/hisat2/../src/hisat2-2.2.0/hisat2-align-s --wrapper basic-0 --dta -p 4 -x ../raw/chrX_data/indexes/chrX_tran -1 /tmp/4195.inpipe1 -2 /tmp/4195.inpipe2"
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.15 CL:samtools addreplacerg -r @RG\tID:ERR188273\tSM:ERR188273\tPL:illumina -o eg/ERR188273_chrX_rg.bam eg/ERR188273_chrX.bam
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.16 CL:samtools addreplacerg -r @RG\tID:ERR188273\tSM:ERR188273\tPL:illumina -o eg/ERR188273_chrX_rg.bam eg/ERR188273_chrX.bam
## @RG ID:ERR188273 SM:ERR188273 PL:illumina

If you want to replace existing read groups, just use the same command.
Expand All @@ -379,8 +380,8 @@ samtools head eg/ERR188273_chrX_rg2.bam
## @HD VN:1.0 SO:coordinate
## @SQ SN:chrX LN:156040895
## @PG ID:hisat2 PN:hisat2 VN:2.2.0 CL:"/Users/dtang/github/rnaseq/hisat2/../src/hisat2-2.2.0/hisat2-align-s --wrapper basic-0 --dta -p 4 -x ../raw/chrX_data/indexes/chrX_tran -1 /tmp/4195.inpipe1 -2 /tmp/4195.inpipe2"
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.15 CL:samtools addreplacerg -r @RG\tID:ERR188273\tSM:ERR188273\tPL:illumina -o eg/ERR188273_chrX_rg.bam eg/ERR188273_chrX.bam
## @PG ID:samtools.1 PN:samtools PP:samtools VN:1.15 CL:samtools addreplacerg -r @RG\tID:ERR188273_2\tSM:ERR188273_2\tPL:illumina_2 -o eg/ERR188273_chrX_rg2.bam eg/ERR188273_chrX_rg.bam
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.16 CL:samtools addreplacerg -r @RG\tID:ERR188273\tSM:ERR188273\tPL:illumina -o eg/ERR188273_chrX_rg.bam eg/ERR188273_chrX.bam
## @PG ID:samtools.1 PN:samtools PP:samtools VN:1.16 CL:samtools addreplacerg -r @RG\tID:ERR188273_2\tSM:ERR188273_2\tPL:illumina_2 -o eg/ERR188273_chrX_rg2.bam eg/ERR188273_chrX_rg.bam
## @RG ID:ERR188273_2 SM:ERR188273_2 PL:illumina_2

Popular alignment tools such as BWA MEM and STAR can add read groups;
Expand Down Expand Up @@ -694,8 +695,8 @@ head eg/ERR188273_chrX_fillmd.bam
## @HD VN:1.0 SO:coordinate
## @SQ SN:chrX LN:156040895
## @PG ID:hisat2 PN:hisat2 VN:2.2.0 CL:"/Users/dtang/github/rnaseq/hisat2/../src/hisat2-2.2.0/hisat2-align-s --wrapper basic-0 --dta -p 4 -x ../raw/chrX_data/indexes/chrX_tran -1 /tmp/4195.inpipe1 -2 /tmp/4195.inpipe2"
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.15 CL:samtools view -b eg/ERR188273_chrX.bam
## @PG ID:samtools.1 PN:samtools PP:samtools VN:1.15 CL:samtools fillmd -e - genome/chrX.fa
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.16 CL:samtools view -b eg/ERR188273_chrX.bam
## @PG ID:samtools.1 PN:samtools PP:samtools VN:1.16 CL:samtools fillmd -e - genome/chrX.fa
## ERR188273.4711308 73 chrX 21649 0 5S70M = 21649 0 CGGGT====================================================================== @@@F=DDFFHGHBHIFFHIGGIFGEGHFHIGIGIFIIIGIGIGGDHIIGIIC@>DGHCHHHGHHFFFFFDEACC@ AS:i:-5 ZS:i:-5 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:70 YT:Z:UP NH:i:2
## ERR188273.4711308 133 chrX 21649 0 * = 21649 0 CTACAGGTGCCCGCCACCATGCCCAGCTAATTTTTTTTGTATTTTTAGTAGAGATGGGGTTTCACTGTGTTGGCC CB@FDFFFHHGFHIJJJJIIIIIIIGGGIJGIIJJJJJJFFHIIIIGECHEHHGGHHFF?AACCDDDDDDDDBCD YT:Z:UP
## ERR188273.4711308 329 chrX 233717 0 5S70M = 233717 0 CGGGT====================================================================== @@@F=DDFFHGHBHIFFHIGGIFGEGHFHIGIGIFIIIGIGIGGDHIIGIIC@>DGHCHHHGHHFFFFFDEACC@ AS:i:-5 ZS:i:-5 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:70 YT:Z:UP NH:i:2
Expand Down Expand Up @@ -818,26 +819,26 @@ samtools mpileup -s -f test_ref.fa aln_bwa.bam aln_mm.bam | head -20
```

## [mpileup] 2 samples in 2 input files
## 1 7741 T 1 ^]. E ] 1 ^]. E ]
## 1 7742 G 1 . J ] 1 . J ]
## 1 7743 C 1 . J ] 1 . J ]
## 1 7744 A 1 . J ] 1 . J ]
## 1 7745 G 1 . J ] 1 . J ]
## 1 7746 C 1 . J ] 1 . J ]
## 1 7747 A 1 . J ] 1 . J ]
## 1 7748 G 1 . J ] 1 . J ]
## 1 7749 G 1 . J ] 1 . J ]
## 1 7750 T 1 . J ] 1 . J ]
## 1 7751 C 1 . J ] 1 . J ]
## 1 7752 C 1 . J ] 1 . J ]
## 1 7753 C 1 . J ] 1 . J ]
## 1 7754 C 1 . J ] 1 . J ]
## 1 7755 G 1 . J ] 1 . J ]
## 1 7756 A 1 . J ] 1 . J ]
## 1 7757 C 1 . J ] 1 . J ]
## 1 7758 C 1 . J ] 1 . J ]
## 1 7759 A 1 . J ] 1 . J ]
## 1 7760 A 1 . J ] 1 . J ]
## 1 2466 C 1 ^]. B ] 1 ^]. B ]
## 1 2467 C 1 . E ] 1 . E ]
## 1 2468 T 1 . J ] 1 . J ]
## 1 2469 G 1 . J ] 1 . J ]
## 1 2470 A 1 . J ] 1 . J ]
## 1 2471 C 1 . J ] 1 . J ]
## 1 2472 T 1 . J ] 1 . J ]
## 1 2473 G 1 . J ] 1 . J ]
## 1 2474 T 1 . J ] 1 . J ]
## 1 2475 T 1 . J ] 1 . J ]
## 1 2476 G 1 . J ] 1 . J ]
## 1 2477 G 1 . J ] 1 . J ]
## 1 2478 G 1 . J ] 1 . J ]
## 1 2479 A 1 . J ] 1 . J ]
## 1 2480 C 1 . J ] 1 . J ]
## 1 2481 G 1 . J ] 1 . J ]
## 1 2482 A 1 . J ] 1 . J ]
## 1 2483 A 1 . J ] 1 . J ]
## 1 2484 A 1 . J ] 1 . J ]
## 1 2485 A 1 . J ] 1 . J ]

Another approach is to use
[deepTools](https://deeptools.readthedocs.io/en/develop/) and the
Expand Down Expand Up @@ -877,7 +878,7 @@ samtools view -H eg/ERR188273_chrX.bam
## @HD VN:1.0 SO:coordinate
## @SQ SN:chrX LN:156040895
## @PG ID:hisat2 PN:hisat2 VN:2.2.0 CL:"/Users/dtang/github/rnaseq/hisat2/../src/hisat2-2.2.0/hisat2-align-s --wrapper basic-0 --dta -p 4 -x ../raw/chrX_data/indexes/chrX_tran -1 /tmp/4195.inpipe1 -2 /tmp/4195.inpipe2"
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.15 CL:samtools view -H eg/ERR188273_chrX.bam
## @PG ID:samtools PN:samtools PP:hisat2 VN:1.16 CL:samtools view -H eg/ERR188273_chrX.bam

Substitute header with new name.

Expand Down

0 comments on commit 0a4bb18

Please sign in to comment.