This README describes each analysis code file used to analyze the data for the paper "Linking candidate causal autoimmune variants to T-cell networks using genetic and epigenetic screens".
This .Rmd removes a SNP that appeared in the results but should not have, because it was not in the sequencing library.
This Rmd takes the basic MPRA information and contextualizes it with linkage disequilibrium, epigenetic, and transcription factor binding data. This creates the expanded table, which is called MPRA merge.
After the MPRA merge table is created, I incorporated human genome liftover data to produce separate hg19, hg38, and combined hg19 and hg38 tables. I have already done this and both columns appear in the final table, so you don't need to do this again.
This .Rmd contains code to replicate the MPRA analysis for the previously published Jurkat T-cell line data (from Mouri, K., Guo, M.H., de Boer, C.G. et al. Prioritization of autoimmune disease-associated genetic variants that perturb regulatory element activity in T cells. Nat Genet 54, 603–612 (2022).) as well as some plots of our own. The analyses include:
Comparisons between the Jurkat and primary T-cell data using Venn diagrams, tables, and plots of allelic bias
Enrichments for epigenetic data including DHS, ATAC-seq, caQTL, histone marks, etc.
A plot describing the enrichment of MPRA emVars for PICS fine-mapped variants
An initial motifbreakR analysis describing the enrichment for transcription factor binding sites
This Rmd contains the grid search used to estimate the cut-offs for high-activity variants (p-CREs) and allele-specific expression variants (emVars).
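A minimal sketch of how a cut-off grid search of this kind could work. It assumes, which is not stated above, that each pair of candidate cut-offs is scored by the fold enrichment of the called variants in an annotation such as DHS overlap. The field names (`activity`, `skew_fdr`, `in_dhs`), the candidate cut-offs, and the toy variants are all hypothetical, and the analysis itself is written in R.

```python
from itertools import product

def fold_enrichment(variants, is_called, in_annotation):
    """Annotated fraction among called variants divided by the annotated fraction among all variants."""
    called = [v for v in variants if is_called(v)]
    background = sum(in_annotation(v) for v in variants) / len(variants)
    if not called or background == 0:
        return None
    foreground = sum(in_annotation(v) for v in called) / len(called)
    return foreground / background

def grid_search(variants, activity_cutoffs, skew_fdr_cutoffs, min_called=2):
    """Score every (activity, FDR) cut-off pair and return the rows sorted by enrichment."""
    rows = []
    for act, fdr in product(activity_cutoffs, skew_fdr_cutoffs):
        # Hypothetical calling rule: active enough AND significant allelic skew.
        def is_called(v, act=act, fdr=fdr):
            return v["activity"] >= act and v["skew_fdr"] <= fdr
        n_called = sum(is_called(v) for v in variants)
        if n_called < min_called:
            continue  # too few variants to estimate an enrichment
        enrichment = fold_enrichment(variants, is_called, lambda v: v["in_dhs"])
        if enrichment is not None:
            rows.append({"activity_cutoff": act, "skew_fdr_cutoff": fdr,
                         "n_called": n_called, "enrichment": enrichment})
    return sorted(rows, key=lambda r: r["enrichment"], reverse=True)

# Toy data, invented for illustration only.
toy_variants = [
    {"activity": 2.5, "skew_fdr": 0.01, "in_dhs": True},
    {"activity": 2.0, "skew_fdr": 0.04, "in_dhs": True},
    {"activity": 1.2, "skew_fdr": 0.20, "in_dhs": False},
    {"activity": 1.2, "skew_fdr": 0.03, "in_dhs": False},
    {"activity": 1.8, "skew_fdr": 0.08, "in_dhs": True},
    {"activity": 0.5, "skew_fdr": 0.50, "in_dhs": False},
    {"activity": 1.6, "skew_fdr": 0.09, "in_dhs": False},
    {"activity": 0.7, "skew_fdr": 0.60, "in_dhs": True},
]

grid_results = grid_search(toy_variants, [1.0, 1.5, 2.0], [0.05, 0.1])
for row in grid_results:
    print(row)
```

In practice the enrichment would be weighed against the number of variants retained, since very strict cut-offs can look highly enriched while calling only a handful of variants.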
This Rmd contains the transcription factor binding analysis of the MPRA data. The steps to this analysis include:
1. Create a GRanges BED file of the variants tested in the MPRA
2. Run the motifbreakR function to generate TF binding data for the MPRA variants
3. Merge the motifbreakR and MPRA data
4. Run a t-test on primary T-cell MPRA expression between variants that do and do not bind each TF
5. Repeat step 4 with the Jurkat MPRA expression data
6. Run the t-test analysis for variants fine-mapped to each disease
7. Merge the primary T-cell and unstimulated Jurkat data
8. Compare the results for Jurkat and primary T cells
After the TF data are created in the previous .Rmd, this .Rmd creates the columns that are used in MPRA merge. This .Rmd incorporates data from two TF binding site programs, motifbreakR and Ananastra.
This Rmd contains the enrichment analysis of MPRA emVars for variants fine-mapped in the UK Biobank (UKBB) data. The steps in this analysis are:
Import the UKBB data and merge with MPRA data.
Create a table with MPRA variants and the UKBB data for the paper.
Create the enrichment plots for MPRA emVars in UKBB data.
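One common way to quantify an enrichment like this is a 2x2 table comparing emVars with the other tested variants. The sketch below is an assumption about the general approach, not the code in the .Rmd: it computes a risk ratio and a one-sided Fisher exact p-value, and the PIP threshold of 0.1 and the toy records are invented for illustration.

```python
from math import comb

def risk_ratio(emvar_hit, emvar_total, other_hit, other_total):
    """Fraction of emVars that are fine-mapped divided by the same fraction among other variants."""
    return (emvar_hit / emvar_total) / (other_hit / other_total)

def fisher_one_sided(emvar_hit, emvar_total, other_hit, other_total):
    """P(at least emvar_hit fine-mapped emVars) under the hypergeometric null."""
    n_total = emvar_total + other_total
    n_hit = emvar_hit + other_hit
    max_k = min(emvar_total, n_hit)
    tail = sum(comb(n_hit, k) * comb(n_total - n_hit, emvar_total - k)
               for k in range(emvar_hit, max_k + 1))
    return tail / comb(n_total, emvar_total)

# Toy records: (is_emvar, fine-mapping PIP). Values are made up.
records = [
    (True, 0.50), (True, 0.30), (True, 0.20), (True, 0.01),
    (False, 0.40), (False, 0.01), (False, 0.02),
    (False, 0.00), (False, 0.05), (False, 0.03),
]
PIP_THRESHOLD = 0.1  # hypothetical cut-off for calling a variant fine-mapped

emvar_total = sum(1 for is_emvar, _ in records if is_emvar)
other_total = len(records) - emvar_total
emvar_hit = sum(1 for is_emvar, pip in records if is_emvar and pip >= PIP_THRESHOLD)
other_hit = sum(1 for is_emvar, pip in records if not is_emvar and pip >= PIP_THRESHOLD)

rr = risk_ratio(emvar_hit, emvar_total, other_hit, other_total)
p_value = fisher_one_sided(emvar_hit, emvar_total, other_hit, other_total)
print(f"risk ratio = {rr:.2f}, one-sided Fisher p = {p_value:.3f}")
```

Repeating the calculation over a range of PIP thresholds gives the series of points that an enrichment plot would show.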
Finally, using all the MPRA-related tables created so far, I put the tables into the final format that appears in the paper. The tables created in this file are listed below.
Note: this list is not in the order of the actual supplementary tables.
Tcell MPRA results
Jurkat MPRA results
PICS enrichment all loci
PICS enrichment emvars loci
UK biobank enrichment all loci
UK biobank enrichment emvars loci
tcell motifbreakr mpra combined
tcell motifbreakr logskew ttest
jurkat motifbreakr mpra combined
jurkat motifbreakr logskew ttest
ChromHMM enrich
Histone CAGE DHS enr
T cell MPRA functional annotations
PICS by MPRA
UKBB by MPRA
Jurkat MPRA functional annotations
Encode DHS Enrichment
T-cell DHS Grid Search
Jurkat DHS Grid Search
Tcell TF ttest by disease
This Jupyter notebook generates variant-to-gene (V2G) mapping for rsIDs of interest. Key steps include:
Converting rsIDs to variant IDs using genopyc
Mapping variants to genes with the V2G otargen pipeline
Processing T cell expression data from the DICE database
Filtering V2G output based on cell-specific expression
Creating background and foreground datasets for network analysis
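The expression-filtering step can be illustrated with a small, dependency-free sketch. It assumes (hypothetically) that the V2G output is a list of variant-gene pairs with a score and that DICE expression is available as TPM per gene and cell type. The 1 TPM threshold, the field names, the cell-type labels, and the toy values are illustrative and not taken from the notebook, and the foreground/background split shown is only one way such sets could be defined.

```python
# Hypothetical V2G output: one row per variant-gene pair.
v2g_rows = [
    {"rsid": "rs1", "gene": "GENE_A", "v2g_score": 0.40},
    {"rsid": "rs1", "gene": "GENE_B", "v2g_score": 0.25},
    {"rsid": "rs2", "gene": "GENE_C", "v2g_score": 0.10},
]

# Hypothetical expression table: TPM per gene and cell type.
expression_tpm = {
    "GENE_A": {"CD4_naive": 12.0, "CD8_naive": 8.5},
    "GENE_B": {"CD4_naive": 0.2, "CD8_naive": 0.0},
    "GENE_C": {"CD4_naive": 0.0, "CD8_naive": 0.4},
    "GENE_D": {"CD4_naive": 3.1, "CD8_naive": 2.2},
}

def expressed_genes(expression, cell_types, min_tpm=1.0):
    """Genes reaching min_tpm in at least one of the given cell types."""
    return {
        gene
        for gene, by_cell in expression.items()
        if any(by_cell.get(cell, 0.0) >= min_tpm for cell in cell_types)
    }

def filter_v2g(rows, keep_genes):
    """Keep only variant-gene pairs whose gene is in keep_genes."""
    return [row for row in rows if row["gene"] in keep_genes]

t_cell_types = ["CD4_naive", "CD8_naive"]

# Background: every gene expressed in the cell types of interest.
background_genes = expressed_genes(expression_tpm, t_cell_types)
# Foreground: expressed genes that are also linked to a variant of interest.
filtered_v2g = filter_v2g(v2g_rows, background_genes)
foreground_genes = {row["gene"] for row in filtered_v2g}

print(sorted(background_genes))
print(sorted(foreground_genes))
```

With pandas or polars, the same filter would be a join between the V2G table and the expression table followed by a threshold on the TPM column.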
Requires Python (pandas, genopyc, polars) and R (otargen, purrr, dplyr, readr) libraries. Outputs include filtered V2G data and gene sets for further analysis.
This markdown file uses Seurat and SCEPTRE to analyze single-cell CRISPR screen data.