Add glomeruli data download from greenelab/rheum-plier-data (#31)
* Add glom data download from data repo

Ignore downloaded files

* Update: rerun DELV with data downloaded from data repo

* Update: rerun with data from data repo

* Update: rerun figure notebook with data downloaded from data repo

No changes in p-values on display items
jaclyn-taroni authored Aug 17, 2018
1 parent ba18d18 commit 042154e
Showing 18 changed files with 101,001 additions and 100,728 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -33,12 +33,14 @@ data/expression_data/E-MTAB-2452_hugene11st_SCANfast.pcl
data/expression_data/GSE18885_series_matrix.txt
data/expression_data/NARES_SCANfast_ComBat.pcl
data/expression_data/SLE_WB_all_microarray_QN_zto_before.pcl
+data/expression_data/ERCB_Glom_CustCDF19_forVCRC.txt
data/sample_info/E-GEOD-39088.sdrf.txt
data/sample_info/E-GEOD-65391.sdrf.txt
data/sample_info/E-GEOD-78193.sdrf.txt
data/sample_info/E-MTAB-2452.sdrf.txt
data/sample_info/NARES_demographic_data.tsv
data/sample_info/sle-wb_sample_dataset_mapping.tsv
+data/sample_info/ERCB_glom_diagnosis.tsv

# private data
data/kidney/
5 changes: 5 additions & 0 deletions 00-data_download.sh
@@ -27,6 +27,9 @@ wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b
# isolated blood cell populations from autoimmune conditions
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/isolated-cell-pop/processed/E-MTAB-2452_hugene11st_SCANfast.pcl

+# glomeruli data
+wget https://github.com/greenelab/rheum-plier-data/raw/55d86bb537f9e38c83fc3cca993cde48dc984411/glomeruli/ERCB_Glom_CustCDF19_forVCRC.txt

# get sample (e.g., phenotype) data
cd .. && mkdir sample_info && cd sample_info
# sle-wb sample to dataset of origin data
@@ -37,3 +40,5 @@ wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/arrayexpress/E-GEOD-39088/E-GEOD-39088.sdrf.txt
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/arrayexpress/E-GEOD-78193/E-GEOD-78193.sdrf.txt
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/NARES/NARES_demographic_data.tsv
+wget https://github.com/greenelab/rheum-plier-data/raw/55d86bb537f9e38c83fc3cca993cde48dc984411/glomeruli/ERCB_glom_diagnosis.tsv
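
The two glomeruli files added here are pinned to a different data-repo commit than the files already in the script. The following is a minimal sketch of how that commit could be kept in one variable and repeat downloads skipped. The helper names `raw_url` and `fetch_pinned` are hypothetical and are not part of `00-data_download.sh`; only the repository name, commit hash, and file paths come from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: raw_url and fetch_pinned are hypothetical helpers,
# not functions defined in 00-data_download.sh.
set -eu

repo="greenelab/rheum-plier-data"
# Data-repo commit that the glomeruli files are pinned to in this commit.
glom_commit="55d86bb537f9e38c83fc3cca993cde48dc984411"

# Print the raw-file URL for a repository path at a fixed commit.
raw_url() {
  local commit="$1" path="$2"
  printf 'https://github.com/%s/raw/%s/%s\n' "$repo" "$commit" "$path"
}

# Download one file into dest_dir unless a non-empty copy is already there.
fetch_pinned() {
  local commit="$1" path="$2" dest_dir="$3"
  local dest
  dest="${dest_dir}/$(basename "$path")"
  mkdir -p "$dest_dir"
  if [ -s "$dest" ]; then
    echo "already present: $dest"
  else
    wget -O "$dest" "$(raw_url "$commit" "$path")"
  fi
}

# Print the two URLs this commit adds; nothing is downloaded here.
raw_url "$glom_commit" "glomeruli/ERCB_Glom_CustCDF19_forVCRC.txt"
raw_url "$glom_commit" "glomeruli/ERCB_glom_diagnosis.tsv"
```

To actually download, one could call `fetch_pinned "$glom_commit" glomeruli/ERCB_glom_diagnosis.tsv data/sample_info` from the repository root, which matches the `data/sample_info/ERCB_glom_diagnosis.tsv` entry added to `.gitignore`. The `-s` test treats an empty file left behind by a failed `wget -O` as missing, so a rerun retries it.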

38 changes: 5 additions & 33 deletions 20-kidney_differential_expression.Rmd
@@ -35,7 +35,7 @@ dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)
### Expression data

```{r}
-ercb.file <- file.path("data", "kidney",
+ercb.file <- file.path("data", "expression_data",
"ERCB_Glom_CustCDF19_forVCRC.txt")
exprs.df <- readr::read_tsv(ercb.file)
exprs.df <- dplyr::select(exprs.df, -EntrezGeneID)
@@ -46,7 +46,7 @@ colnames(exprs.df)[1] <- "Gene"
agg.exprs.df <- PrepExpressionDF(exprs.df)
# any genes that don't have a gene symbol need to be dropped
agg.exprs.df <- dplyr::filter(agg.exprs.df, !(is.na(Gene)))
-readr::write_tsv(agg.exprs.df, path = file.path("data", "kidney",
+readr::write_tsv(agg.exprs.df, path = file.path("data", "expression_data",
"ERCB_Glom_mean_agg.pcl"))
# as a matrix for use with PLIER
exprs.mat <- as.matrix(dplyr::select(agg.exprs.df, -Gene))
@@ -56,36 +56,8 @@ rownames(exprs.mat) <- agg.exprs.df$Gene
### Clinical data

```{r}
-clinical.file <- file.path("data", "kidney",
-"Neptune_ERCB_GE_Clinical_Data_2016-08-24_15-11-58-1.txt")
-clinical.df <- readr::read_tsv(clinical.file)
-```
-
-```{r}
-# only retain samples that are in the expression data
-microarray.samples <- colnames(exprs.mat)
-all(microarray.samples %in% clinical.df$`Microarray Sample ID`)
-```
-
-```{r}
-# we only want the info for the samples in the ERCB glomeruli data -- that's
-# the microarray data we're looking at
-clinical.df <- clinical.df %>%
-dplyr::filter(`Microarray Sample ID` %in% microarray.samples,
-`Tissue Source` == "Glomeruli")
-# diagnosis information only - this will serve as group labels
-# recode diagnosis such that nephrotic syndrome diagnoses are grouped
-diagnosis.df <- clinical.df %>%
-dplyr::select(c(`Microarray Sample ID`, Diagnosis)) %>%
-dplyr::mutate(Diagnosis =
-dplyr::recode(Diagnosis,
-MCD = "Nephrotic syndrome",
-MN = "Nephrotic syndrome",
-FSGS = "Nephrotic syndrome",
-`FSGS/MCD` = "Nephrotic syndrome"),
-Sample = `Microarray Sample ID`) %>%
-dplyr::select(c(Sample, Diagnosis))
+clinical.file <- file.path("data", "sample_info", "ERCB_glom_diagnosis.tsv")
+diagnosis.df <- readr::read_tsv(clinical.file)
```

## Apply recount2 model
@@ -101,7 +73,7 @@ recount.b <- GetNewDataB(exprs.mat = as.matrix(exprs.mat),

```{r}
# save B matrix to file
-b.file <- file.path("data", "kidney", "ERCB_glom_recount2_B.RDS")
+b.file <- file.path(results.dir, "ERCB_glom_recount2_B.RDS")
saveRDS(recount.b, file = b.file)
```

223 changes: 164 additions & 59 deletions 20-kidney_differential_expression.nb.html

Large diffs are not rendered by default.

8 changes: 0 additions & 8 deletions 21-AAV_DLVE.Rmd
@@ -19,14 +19,6 @@ At first, we'll take the naive approach of just looking at the overlapping
sets of significant LVs.
There's no guarantee that the directionality will be in agreement this way.

-## Install VennDiagram & cowplot
-
-```{r}
-# current versions on CRAN
-devtools::install_url("https://cran.r-project.org/src/contrib/VennDiagram_1.6.20.tar.gz")
-devtools::install_url("https://cran.r-project.org/src/contrib/cowplot_0.9.2.tar.gz")
-```
-
## Functions and directory set up

```{r}
256 changes: 218 additions & 38 deletions 21-AAV_DLVE.nb.html

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions data/expression_data/ERCB_Glom_mean_agg.pcl
Git LFS file not shown
414 changes: 205 additions & 209 deletions figure_notebooks/AAV_multitissue_figures.nb.html

Large diffs are not rendered by default.

Binary file modified figure_notebooks/figures/AAV_LV10_multipanel.pdf
Binary file not shown.
Binary file modified figure_notebooks/figures/AAV_LV10_multipanel_no_barplot.pdf
Binary file not shown.
Binary file modified figure_notebooks/figures/AAV_LV937_multipanel.pdf
Binary file not shown.
Binary file modified figure_notebooks/figures/AAV_LV937_multipanel_no_barplot.pdf
Binary file not shown.
Binary file modified plots/20/ERCB_glom_recount2_model_LV_boxplots.pdf
Binary file not shown.
15 changes: 15 additions & 0 deletions plots/21/AAV_FDR_0.05_overlap.png.2018-08-15_20-45-43.log
@@ -0,0 +1,15 @@
+INFO [2018-08-15 20:45:43] $x
+INFO [2018-08-15 20:45:43] sig.list
+INFO [2018-08-15 20:45:43]
+INFO [2018-08-15 20:45:43] $filename
+INFO [2018-08-15 20:45:43] vd.file
+INFO [2018-08-15 20:45:43]
+INFO [2018-08-15 20:45:43] $imagetype
+INFO [2018-08-15 20:45:43] [1] "png"
+INFO [2018-08-15 20:45:43]
+INFO [2018-08-15 20:45:43] $resolution
+INFO [2018-08-15 20:45:43] [1] 600
+INFO [2018-08-15 20:45:43]
+INFO [2018-08-15 20:45:43] $category.names
+INFO [2018-08-15 20:45:43] c("NARES", "GPA blood", "ERCB glomeruli")
+INFO [2018-08-15 20:45:43]
Binary file modified plots/21/significant_LVs_3_AAV_sets.pdf
Binary file not shown.
3 changes: 3 additions & 0 deletions results/20/ERCB_glom_recount2_B.RDS
Git LFS file not shown