Authors: Kayla Barton, Lindsey Fenderson, and Ben King
Using the UCSC Genome Browser, we can visualize the locations of our prospective sites, SnpEff VCF outputs, and BED files.
SnpEff lets us see how, and where, our variants change a gene. Here is what a raw SnpEff output file looks like. I've highlighted in purple the important parts that I'll put into a spreadsheet.
Note that, just like in the picture of DAB2 on UCSC, we found the same amino acid changes!
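If you'd rather not copy the highlighted fields into a spreadsheet by hand, a short script can pull them out of the ANN field that SnpEff adds to the INFO column. The sketch below assumes a standard SnpEff VCF in which each ANN entry has the pipe-separated sub-fields listed in the SnpEff documentation; the columns kept in `keep` are our guess at the useful ones (not necessarily the ones highlighted in the picture), and the test record is made up, not real data.

```python
import csv
import io

# SnpEff ANN sub-fields, in the order given by the SnpEff annotation format.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def ann_rows(vcf_lines):
    """Yield one dict per ANN annotation found in the VCF data lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        for entry in info.split(";"):
            if not entry.startswith("ANN="):
                continue
            # One variant can have several annotations, separated by commas.
            for ann in entry[len("ANN="):].split(","):
                row = {"CHROM": chrom, "POS": pos, "REF": ref, "ALT": alt}
                row.update(zip(ANN_FIELDS, ann.split("|")))
                yield row


def to_csv(rows):
    """Return the chosen columns as CSV text for a spreadsheet."""
    keep = ["CHROM", "POS", "REF", "ALT", "Annotation", "Annotation_Impact",
            "Gene_Name", "HGVS.c", "HGVS.p"]  # assumption: columns worth keeping
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Passing the lines of a downloaded SnpEff VCF to `ann_rows`, and the result to `to_csv`, gives text you can save as a .csv file and open in any spreadsheet program.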
You can access the White-throated Sparrow genome on the UCSC Genome Browser by googling it or by using this link: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_2172235_GCF_000385455.1&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=NW_005081537v1%3A11248104%2D11248175&hgsid=1271870139_caIUP1mE0oiZeK7TPV2S5EB8EPno
Click the custom tracks button to add our files.
First, let's try loading a BAM file. The page http://katahdin.acg.maine.edu/~benking/GECO/ has large BAM files for each species. BAM files are the compressed, binary form of SAM files, which are tab-delimited text files containing sequence alignment data. Because BAM files are so large, it's not practical to download them and then upload them to UCSC; instead, right-click the link, copy the link address, and paste it into the custom track data box for a much faster upload. For this example, we'll upload all of the Nelson's Sparrow BAM files. (You can either copy each link or copy and paste the list below.)
http://katahdin.acg.maine.edu/~benking/GECO/2391-71617_Anelsoni_BassHarborME_20100715_trimmed-bwamem-Zalbicollis-1.0.1.sorted_UCSC.bam
http://katahdin.acg.maine.edu/~benking/GECO/2631-21304_Anelsoni_MaquoitBayME_20150812_trimmed-bwamem-Zalbicollis-1.0.1.sorted_UCSC.bam
http://katahdin.acg.maine.edu/~benking/GECO/2781-84960_Anelsoni_PleasantRiverAddisonME_20190624_trimmed-bwamem-Zalbicollis-1.0.1.sorted_UCSC.bam
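Pasting the bare URLs works, but UCSC also accepts an explicit track line for each file, which lets you give each track a readable name. Below is a minimal sketch that builds such lines. The `track type=bam name=... bigDataUrl=...` form follows UCSC's custom track format; the naming rule (keep the first three underscore-separated parts of the file name) is just a hypothetical choice. UCSC also needs each BAM's index file, which by default it looks for at the same URL with `.bai` appended.

```python
import posixpath
from urllib.parse import urlparse


def bam_track_line(url):
    """Build a UCSC custom-track line for a remotely hosted BAM file."""
    filename = posixpath.basename(urlparse(url).path)
    # Hypothetical naming rule: keep "<sampleID>_<species>_<site>" from the file name.
    short_name = "_".join(filename.split("_")[:3])
    return f'track type=bam name="{short_name}" bigDataUrl={url}'


def index_url(url):
    """Default location where UCSC expects the BAM index (.bai) file."""
    return url + ".bai"
```

Printing `bam_track_line(url)` for each of the three URLs above gives three lines you can paste into the custom track box in place of the bare links.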
If the files uploaded successfully, you should see the Manage Custom Tracks page, where you can add more files or delete unwanted tracks.
Next, we'll add a BED file of gene regions: copy the link for UCSC_WTSFoundGenesFull.bed from http://katahdin.acg.maine.edu/~benking/GECO/ and upload it to custom tracks.
http://katahdin.acg.maine.edu/~benking/GECO/UCSC_WTSFoundGenesFull.bed
UCSC can be finicky about BED files, so make sure yours follows this same format! UCSC's basic guidelines for BED files are here: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
Finally, press "go" to see your custom tracks alongside the genome! (Use the toolbar to toggle the view.) Now you can explore different regions of the genome; with our BED track, we can see that this particular section is part of GREM1. If your screen doesn't match, change the location by copying and pasting the text below into the position box:
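A quick way to catch the most common BED problems before uploading is to check each line yourself. This sketch assumes a tab-delimited file and checks only the three required columns (chrom, chromStart, chromEnd); the optional `known_chroms` argument is a hypothetical extra check that chromosome names match the browser's names (for example `NW_005081537v1` on this hub).

```python
def check_bed_line(line, known_chroms=None):
    """Return a list of problems in one BED line (an empty list means it looks OK)."""
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return []  # comments, track lines, and browser lines are not data lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 tab-separated columns (need chrom, chromStart, chromEnd)"]
    chrom, start, end = fields[:3]
    problems = []
    if known_chroms is not None and chrom not in known_chroms:
        problems.append(f"unknown chromosome name {chrom!r}")
    if not (start.isdigit() and end.isdigit()):
        problems.append("chromStart/chromEnd must be non-negative integers")
    elif int(start) > int(end):
        problems.append("chromStart must not be greater than chromEnd")
    return problems
```

Running it over every line of the file and printing any non-empty results points you to the exact lines UCSC is likely to reject.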
NW_005081537v1:11,248,139-11,248,210
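The position box uses 1-based, inclusive coordinates, while BED files store 0-based starts and exclusive ends, so the same region is written differently in the two places (a common reason features look shifted by one base). A small sketch of the conversion, assuming a `chrom:start-end` string with optional thousands separators:

```python
import re


def position_to_bed(position):
    """Convert a UCSC position string (1-based, inclusive) to a BED interval (0-based, half-open)."""
    match = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start_1based = int(match.group(2).replace(",", ""))
    end_1based = int(match.group(3).replace(",", ""))
    # BED start is 0-based and BED end is exclusive, so only the start shifts by one.
    return chrom, start_1based - 1, end_1based
```

For the window above, this gives `("NW_005081537v1", 11248138, 11248210)`, a 72 bp interval.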
Click the bars on the new BAM tracks to expand them and look at the individual reads and variants.
Notice anything interesting about these 3 different individuals?
As with DAB2, we can use SnpEff output to find variants that have different allele frequencies.
Now that we understand why we need these visualization tools, let's make our SnpEff file!
Galaxy provides a user-friendly interface for running jobs without using the command line. SnpEff is a tool for predicting the effects of genetic variants on genes and proteins. In this tutorial, we will go through how to use SnpEff to help narrow down the list of genes that will be most useful for the GTseq assay.
Note: all files used in this tutorial and more are available at http://katahdin.acg.maine.edu/~benking/GECO/
First, log in to or create a Galaxy account: https://usegalaxy.org/login
The basic Galaxy layout consists of three major sections: history, menu, and tools.
Use the following link to obtain the premade history: https://usegalaxy.org/u/suika_64/h/snpeffanalysisgtseq2
After clicking the link, you'll be redirected to the history. In the upper right-hand corner, click the plus button to add the history to your Galaxy account.
If this worked, your Galaxy homepage should now look like this (if you have another history open, go to view all histories and switch to the imported one):
First, search for and select the tool you want to use. Let's start by searching for "isec" and selecting "bcftools isec".
When you select a tool, the middle section changes to show that tool's options; here you can select the input files and set parameters.
After selecting options, just click Execute! (You can also have Galaxy email you a notification when the job is done.)
bcftools isec lets you create intersections, unions, and complements of VCF files. Specifically, we'll use it to find variants that are unique to one species when compared to another.
Recommended options: Set the "Complement" option to Yes. This outputs positions that are present in the first file but absent from the others.
Select the two species you'd like to compare. (Helpful tip: isec always treats the earliest dataset in your history as the first file. The way around this is to make a new history and copy over only the species you're interested in. For example, NESP-BCF.bcf.gz is the first dataset in my history, so no matter which combination I choose, isec will always output the Nelson's Sparrow variants that are unique when compared to the other samples.)
Example usage: Let's say I want to find all the variants that are unique to the Saltmarsh Sparrow when compared to the Swamp Sparrow. First, I would create a new history and copy over SALS-BCF.bcf.gz (drag it from the previous history into the current one), then copy over SWSP-BCF.bcf.gz, making them datasets 1 and 2, respectively. Then search for the isec tool, select it, set the Complement option to Yes, select SALS-BCF.bcf.gz and SWSP-BCF.bcf.gz, and hit Execute.
Note: You can also restrict this output to a specific region: click "Restrict to", scroll down to "Regions", and select "Specify one or more Region(s) directly" from the drop-down menu. For example:
Chromosome: NW_005081537.1
Start: 11247728
End: 11248940
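On the command line, the same restriction is usually written as one `chrom:start-end` string (1-based, inclusive), as taken by the `-r` option of `bcftools view` and `bcftools isec`. Note also that the VCFs use `NW_005081537.1` while the UCSC hub uses `NW_005081537v1`. The sketch below formats the region string and includes a hypothetical helper for converting the UCSC-style name, assuming the version suffix is the only difference between the two naming schemes.

```python
import re


def ucsc_to_ncbi_name(ucsc_chrom):
    """Hypothetical helper: NW_005081537v1 (UCSC hub style) -> NW_005081537.1 (VCF style).

    Assumes the two names differ only in how the version suffix is written.
    """
    return re.sub(r"v(\d+)$", r".\1", ucsc_chrom)


def region_string(chrom, start, end):
    """Format a 1-based, inclusive region as chrom:start-end."""
    if start > end:
        raise ValueError("start must not be greater than end")
    return f"{chrom}:{start}-{end}"
```

With the values above, `region_string("NW_005081537.1", 11247728, 11248940)` gives `NW_005081537.1:11247728-11248940`.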
The isec output should look like this! Another way to think about bcftools isec is as a Venn diagram.
Note that this list matches our isec results!
This tool is pretty straightforward: just select the file(s) you'd like to compare or get stats on and hit Execute (we can keep all the defaults for this tool).
Note: we can also run this on our finished SnpEff file later on.
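The Venn picture maps directly onto set operations. In this sketch, a variant counts as "the same" in both files when chromosome, position, REF, and ALT all match, which is a simplification of how bcftools isec compares records; with Complement set to Yes, the isec output corresponds to the `A_only` set. The variant records in the test are made up.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))  # multi-allelic ALT kept as one string
    return keys


def isec_sets(variants_a, variants_b):
    """Split two variant collections into the three regions of a two-circle Venn diagram."""
    a, b = set(variants_a), set(variants_b)
    return {"A_only": a - b, "shared": a & b, "B_only": b - a}
```
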
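To get a feel for the kind of numbers a VCF statistics tool reports, here is a rough sketch that counts records and classifies each ALT allele as a SNP, an indel, or something else. It is an illustration only, not how bcftools computes its statistics, and it assumes plain-text (uncompressed) VCF lines; the test records are made up.

```python
BASES = set("ACGT")


def summarize(vcf_lines):
    """Count records and classify each ALT allele as SNP, indel, or other."""
    counts = {"records": 0, "multiallelic": 0, "snps": 0, "indels": 0, "other": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        ref, alts = line.rstrip("\n").split("\t")[3:5]
        alt_list = alts.split(",")
        counts["records"] += 1
        if len(alt_list) > 1:
            counts["multiallelic"] += 1
        for alt in alt_list:
            if ref.upper() in BASES and alt.upper() in BASES:
                counts["snps"] += 1  # single-base substitution
            elif alt not in (".", "*") and not alt.startswith("<") and len(ref) != len(alt):
                counts["indels"] += 1  # length change: insertion or deletion
            else:
                counts["other"] += 1  # MNPs, missing, or symbolic alleles
    return counts
```
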
bcftools view lets us filter variants by region and allele frequency.
Recommended options: Click "Filter options" and scroll down to "Min AF" and "Max AF" to set allele frequency filters. As with bcftools isec, you can also restrict the output to a region if you're only interested in certain genes.
Example usage: we only want variants in the GREM1 region with an allele frequency between 10% and 90% (Min AF 0.1, Max AF 0.9).
SnpEff lets us identify the specific changes caused by each variant.
Select your filtered file, choose Yes to create a CSV report, and change the genome source to "Custom snpEff database in your history", which should automatically select the "SnpEff4.3 database for Zonotrichia_albicollis".
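The 10-90 AF filter amounts to a simple rule applied to each site. This sketch reads `INFO/AF` directly and keeps sites that fall inside the region and AF bounds; that is a simplification, since bcftools may compute allele frequency from allele counts rather than reading AF. `parse_info` and `keep_record` are hypothetical names, the example region is the one from the isec section (NW_005081537.1, 11247728 to 11248940), and the test record is made up.

```python
def parse_info(info):
    """Turn a VCF INFO string like 'AF=0.5;DP=20;DB' into a dict (flags map to True)."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value if value else True
    return out


def keep_record(line, chrom, start, end, min_af=0.1, max_af=0.9):
    """True if a VCF data line lies in chrom:start-end (1-based, inclusive)
    and has an INFO/AF value within [min_af, max_af]."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != chrom or not (start <= int(fields[1]) <= end):
        return False
    af = parse_info(fields[7]).get("AF")
    if not isinstance(af, str):
        return False  # no AF value recorded for this site
    # Multi-allelic sites list one AF per ALT allele; keep the site if any is in range.
    return any(min_af <= float(x) <= max_af for x in af.split(",") if x != ".")
```
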
After the parameters are set, hit the blue "Execute" button to run the job! The job may take a while, so feel free to click away or do other work while it runs. (It turns grey when submitted, yellow when running, and green once completed.) You can even check the status on your phone or have Galaxy email you once it's completed.
Once the job is finished, you should have a VCF/BED file!
Once your new VCF or BED file is made, you can download it by clicking the floppy disk icon.
SnpEff introduction: http://pcingola.github.io/SnpEff/se_introduction/
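Once the SnpEff VCF is downloaded, one way to start narrowing down genes for the GTseq assay is to rank them by the most severe predicted impact (SnpEff labels each annotation HIGH, MODERATE, LOW, or MODIFIER). This is a minimal sketch, not the authors' workflow: the ranking rule (most severe impact first, then number of annotations) is our assumption, and the gene names in the test are placeholders.

```python
# Severity order of SnpEff's impact categories (higher = more severe).
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def gene_impacts(vcf_lines):
    """Return (gene, most severe impact, number of ANN annotations) tuples, most severe first."""
    best, counts = {}, {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]
        for entry in info.split(";"):
            if not entry.startswith("ANN="):
                continue
            for ann in entry[len("ANN="):].split(","):
                fields = ann.split("|")
                if len(fields) < 4:
                    continue  # malformed annotation
                impact, gene = fields[2], fields[3]
                counts[gene] = counts.get(gene, 0) + 1
                current = best.get(gene)
                if current is None or IMPACT_RANK.get(impact, -1) > IMPACT_RANK.get(current, -1):
                    best[gene] = impact
    # Sort by severity (descending), then annotation count (descending), then gene name.
    return sorted(((g, best[g], counts[g]) for g in counts),
                  key=lambda t: (-IMPACT_RANK.get(t[1], -1), -t[2], t[0]))
```
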
Custom SnpEff database generation workflow: