Adding batch 2 consensus profiles #61
also adding Metadata_cell_id column
This must be a metadata issue or a missing grouping column. Batch 2 has 3 cell lines x 3 dose points x 3 time points x 360 compounds = ~9720 (not exact, because some compounds might be missing all doses).
Here is the exact number of consensus profiles for batch 2:

```r
library(tidyverse)

# The three platemaps read for batch 2 (3 doses x ~360 compounds)
platemaps <-
  c("https://raw.githubusercontent.com/gwaygenomics/lincs-cell-painting/batch2-consensus/metadata/platemaps/2017_12_05_Batch2/platemap/ASG003_A549_24H.txt",
    "https://raw.githubusercontent.com/gwaygenomics/lincs-cell-painting/batch2-consensus/metadata/platemaps/2017_12_05_Batch2/platemap/LKCP001_A549_24H.txt",
    "https://raw.githubusercontent.com/gwaygenomics/lincs-cell-painting/batch2-consensus/metadata/platemaps/2017_12_05_Batch2/platemap/LKCP002_A549_24H.txt")

n_cell_lines <- 3
n_time_points <- 3

platemaps %>%
  map_df(read_tsv) %>%
  # One row per unique compound-dose pair
  distinct(broad_sample, mmoles_per_liter) %>%
  tally(name = "n_consensus") %>%
  # Scale by the number of cell lines and time points
  mutate(n_consensus = n_consensus * n_cell_lines * n_time_points) %>%
  knitr::kable()
```
Missing time as a grouping column, thanks!
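For our notes, a minimal sketch of why leaving the time point out of the grouping columns shrinks the number of consensus profiles. This is a toy table, not the real batch 2 data: every column name other than Metadata_cell_id, and every placeholder value other than A549 and 24H, is hypothetical, and a plain pandas groupby median stands in for the actual consensus step.

```python
from itertools import product

import pandas as pd

# Hypothetical toy design: 3 cell lines x 3 time points x 2 compounds x 2 doses,
# with 2 replicate profiles per condition.
cell_lines = ["A549", "cell_line_B", "cell_line_C"]
time_points = ["24H", "time_B", "time_C"]
compounds = ["cpd_1", "cpd_2"]
doses = [0.1, 1.0]
replicates = [1, 2]

rows = [
    {
        "Metadata_cell_id": cell,
        "Metadata_time_point": time,
        "Metadata_broad_sample": cpd,
        "Metadata_dose": dose,
        "feature_1": float(rep),
    }
    for cell, time, cpd, dose, rep in product(
        cell_lines, time_points, compounds, doses, replicates
    )
]
profiles = pd.DataFrame(rows)


def n_consensus(df, group_cols):
    """Count the consensus profiles produced by a median over group_cols."""
    return len(df.groupby(group_cols)["feature_1"].median())


base_cols = ["Metadata_cell_id", "Metadata_broad_sample", "Metadata_dose"]

# Without the time point, profiles from different time points are merged.
without_time = n_consensus(profiles, base_cols)

# With the time point, each cell line x time x compound x dose gets its own profile.
with_time = n_consensus(profiles, base_cols + ["Metadata_time_point"])

print(without_time, with_time)
```

In this toy design the count with the time point is larger by exactly the number of time points; the real data need not scale that cleanly if some conditions are missing.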
This turned out to be an even larger problem. The aggregate function will drop samples if one of their aggregating columns ( This impacted both batches of data, but batch 2 substantially more. Batch 2 now has 10,368 consensus profiles.
Note that your example above does not include platemaps from multiple time points.
Also note that I do update MOAs in the profiling step for both batches: lincs-cell-painting/profiles/profile_cells.py, lines 73 to 83 in d471bbd.
But I wonder if I need to update the external MOA file first with the new batch broad IDs: lincs-cell-painting/profiles/profiling_pipeline.py, lines 46 to 50 in d471bbd.
In other words, if I have to do this, then I'll need to rerun the profiling pipeline again for at least batch 2 data.
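A small sketch of one way an aggregation can silently drop samples, assuming the aggregation is a pandas groupby and that the affected grouping column holds missing values. Both points are assumptions made for illustration, not a description of the actual aggregate function; the column names and values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy profiles: cpd_2 has no annotation in one grouping column.
profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["cpd_1", "cpd_1", "cpd_2", "cpd_2"],
        "Metadata_moa": ["inhibitor", "inhibitor", np.nan, np.nan],
        "feature_1": [1.0, 3.0, 5.0, 7.0],
    }
)
group_cols = ["Metadata_broad_sample", "Metadata_moa"]

# By default, pandas groupby excludes rows whose group key is NaN,
# so cpd_2 disappears from the result without any warning.
default = profiles.groupby(group_cols)["feature_1"].median().reset_index()

# One workaround: replace missing keys with a placeholder before grouping.
filled = (
    profiles.fillna({"Metadata_moa": "unknown"})
    .groupby(group_cols)["feature_1"]
    .median()
    .reset_index()
)

print(len(default), len(filled))
```

pandas 1.1 and later also accept `dropna=False` in `groupby`, which keeps the NaN groups without needing a placeholder value.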
Wow, glad you found it! Bad 🐼 !
For our notes: It does actually – there are only 3 unique platemaps (containing 3 doses x ~360 compounds), so I read 3 of them then multiplied that by 3x3. But that example is useless given that we are computing consensus by including the
lgtm
Here, I add consensus profiles for batch 2. I also add Metadata_cell_id to the aggregation columns for both batches (batch 2 has three cell lines). I make some minor changes throughout the notebook.
We find only 1,620 consensus profiles in batch 2 (we have 8,340 in batch 1).