Commit

Fixed up some of the documentation, added the saved parameters thingy.
innesbre committed Aug 17, 2018
1 parent 753f5ed commit 5987d4a
Showing 8 changed files with 583 additions and 340 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -2,7 +2,7 @@ Package: scClustViz
Type: Package
Title: Differential Expression-based scRNAseq Cluster Assessment and Viewing
Version: 0.2.0
Date: 2018-08-16
Date: 2018-08-17
Authors@R: c(as.person("Brendan T. Innes <[email protected]> [aut,cre]"),
as.person("Gary D. Bader [aut,ths]"))
Description: An interactive R Shiny tool for visualizing single-cell RNAseq clustering
213 changes: 148 additions & 65 deletions R/deTest.R
@@ -1,80 +1,157 @@
#' Cluster-wise differential expression testing
#'
#' Performs differential expression testing between clusters for all cluster solutions in
#' order to assess the biological relevance of each cluster solution. Differential
#' expression testing is done using the Wilcoxon rank-sum test implemented in the base R
#' \code{stats} package. For details about what is being compared in the tests, see the
#' "Value" section.
#' Performs differential expression testing between clusters for all cluster
#' solutions in order to assess the biological relevance of each cluster
#' solution. Differential expression testing is done using the Wilcoxon rank-sum
#' test implemented in the base R \code{stats} package. For details about what
#' is being compared in the tests, see the "Value" section.
#'
#' @param il The list outputted by one of the importData functions (either
#' @param il The list outputted by one of the importData functions (either
#' \code{\link{readFromSeurat}} or \code{\link{readFromManual}}).
#'
#' @param testAll Logical value indicating whether to test all cluster solutions
#' (\code{TRUE}) or stop testing once a cluster solution has been found where there is
#' no differentially expressed genes found between at least one pair of nearest
#' neighbouring clusters (\code{FALSE}). \emph{If set to (\code{FALSE}), only the
#' cluster solutions tested will appear in the scClustViz shiny app.}
#'
#' @param exponent The log base of your normalized input data. Seurat normalization uses
#' the natural log (set this to exp(1)), while other normalization methods generally use
#' log2 (set this to 2).
#'
#' @param pseudocount The pseudocount added to all log-normalized values in your input
#' data. Most methods use a pseudocount of 1 to eliminate log(0) errors.
#'
#' @param FDRthresh The false discovery rate to use as a threshold for determining statistical
#' significance of differential expression calculated by the Wilcoxon rank-sum test.
#'
#' @param threshType Filtering genes for use in differential expression testing can be
#' done multiple ways. We use an expression ratio filter for comparing each cluster to
#' the rest of the tissue as a whole, but find that difference in detection rates works
#' better when comparing clusters to each other. You can set threshType to
#' \code{"logGER"} to use a gene expression ratio for all gene filtering, or leave it as
#' default (\code{"dDR"}) to use difference in detection rate as the thresholding method
#' when comparing clusters to each other.
#'
#' @param dDRthresh Magnitude of detection rate difference of a gene between clusters to
#' use as filter for determining which genes to test for differential expression between
#' clusters.
#'
#' @param logGERthresh Magnitude of gene expression ratio for a gene between clusters to
#' use as filter for determining which genes to test for differential expression between
#' clusters.
#'
#' @return The function returns a list containing the results of differential expression
#' testing for all sets of cluster solutions. \emph{Saving both the input (the object passed
#' to the \code{il} argument) and the output of this function to an RData file is all
#' the preparation necessary for running the scClustViz Shiny app itself.}
#' The output list of this function contains the following elements:
#' \describe{
#' \item{CGS}{}
#' \item{deTissue}{}
#' \item{deVS}{}
#' \item{deMarker}{}
#' \item{deDist}{}
#' \item{deNeighb}{}
#'
#' @param testAll Default = TRUE. Logical value indicating whether to test all
#' cluster solutions (\code{TRUE}) or stop testing once a cluster solution has
#' been found where no differentially expressed genes are found between at
#' least one pair of nearest neighbouring clusters (\code{FALSE}). \emph{If
#' set to \code{FALSE}, only the cluster solutions tested will appear in the
#' scClustViz Shiny app.}
#'
#' @param exponent Default = 2. The log base of your normalized input data.
#' Seurat normalization uses the natural log (set this to exp(1)), while other
#' normalization methods generally use log2 (set this to 2).
#'
#' @param pseudocount Default = 1. The pseudocount added to all log-normalized
#' values in your input data. Most methods use a pseudocount of 1 to eliminate
#' log(0) errors.
#'
#' @param FDRthresh Default = 0.01. The false discovery rate to use as a
#' threshold for determining statistical significance of differential
#' expression calculated by the Wilcoxon rank-sum test.
#'
#' @param threshType Default = "dDR". Filtering genes for use in differential
#' expression testing can be done in multiple ways. We use an expression ratio
#' filter for comparing each cluster to the rest of the tissue as a whole, but
#' find that difference in detection rates works better when comparing
#' clusters to each other. You can set \code{threshType} to \code{"logGER"} to
#' use a gene expression ratio for all gene filtering, or leave it as default
#' (\code{"dDR"}) to use difference in detection rate as the thresholding
#' method when comparing clusters to each other.
#'
#' @param dDRthresh Default = 0.15. Magnitude of detection rate difference of a
#' gene between clusters to use as filter for determining which genes to test
#' for differential expression between clusters.
#'
#' @param logGERthresh Default = 1. Magnitude of gene expression ratio for a
#' gene between clusters to use as filter for determining which genes to test
#' for differential expression between clusters.
#'
#' @return The function returns a list containing the results of differential
#' expression testing for all sets of cluster solutions. \emph{Saving both the
#' input (the object passed to the \code{il} argument) and the output of this
#' function to an RData file is all the preparation necessary for running the
#' scClustViz Shiny app itself.} The output list of this function contains the
#' following elements:
#' \describe{
#' \item{CGS}{A nested list of dataframes. Each list element is named for
#' a column in \code{il$cl} (a cluster resolution). That list element
#' contains a named list of clusters at that resolution. Each of those
#' list elements contains a dataframe of three variables, where each
#' sample is a gene. \code{DR} is the proportion of cells in the cluster
#' in which that gene was detected. \code{MDTC} is mean normalized gene
#' expression for that gene in only the cells in which it was detected
#' (see \link{meanLogX} for mean calculation). \code{MTC} is the mean
#' normalized gene expression for that gene in all cells of the cluster
#' (see \link{meanLogX} for mean calculation).}
#' \item{deTissue}{Differential testing results from Wilcoxon rank sum tests
#' comparing a gene in each cluster to the rest of the cells as a whole in
#' a one vs all comparison. The results are stored as a nested list of
#' dataframes. Each list element is named for a column in \code{il$cl} (a
#' cluster resolution). That list element contains a named list of
#' clusters at that resolution. Each of those list elements contains a
#' dataframe of three variables, where each sample is a gene.
#' \code{logGER} is the log gene expression ratio calculated by
#' subtracting the mean expression of the gene (see \link{meanLogX} for
#' mean calculation) in all other cells from the mean expression of the
#' gene in this cluster. \code{pVal} is the p-value of the Wilcoxon rank
#' sum test. \code{qVal} is the false discovery rate-corrected p-value of
#' the test.}
#' \item{deVS}{Differential testing results from Wilcoxon rank sum tests
#' comparing a gene in each cluster to that gene in every other cluster in
#' a series of tests. The results are stored as a nested list of
#' dataframes. Each list element is named for a column in \code{il$cl} (a
#' cluster resolution). That list element contains a named list of
#' clusters at that resolution (cluster A). Each of those lists contains a
#' named list of all the other clusters at that resolution (cluster B).
#' Each of those list elements contains a dataframe of four variables,
#' where each sample is a gene. \code{dDR} is the difference in detection
#' rate of that gene between the two clusters (DR[A] - DR[B]).
#' \code{logGER} is the log gene expression ratio calculated by taking the
#' difference in mean expression of the gene (see \link{meanLogX} for
#' mean calculation) between the two clusters (MTC[A] - MTC[B]).
#' \code{pVal} is the p-value of the Wilcoxon rank sum test. \code{qVal}
#' is the false discovery rate-corrected p-value of the test.}
#' \item{deMarker}{Differential testing results from Wilcoxon rank sum tests
#' comparing a gene in each cluster to that gene in every other cluster in
#' a series of tests, and filtering for only those genes that show
#' significant positive differential expression versus all other clusters.
#' The results are stored as a nested list of dataframes. Each list
#' element is named for a column in \code{il$cl} (a cluster resolution).
#' That list element contains a named list of clusters at that resolution
#' (cluster A). Each of those list elements contains a dataframe where
#' variables represent comparisons to all the other clusters and each
#' sample is a gene. For each other cluster (cluster B), there are three
#' variables, named as follows: \code{vs.B.dDR} is the difference in
#' detection rate of that gene between the two clusters (DR[A] - DR[B]).
#' \code{vs.B.logGER} is the log gene expression ratio calculated by
#' taking the difference in mean expression of the gene (see
#' \link{meanLogX} for mean calculation) between the two clusters (MTC[A]
#' - MTC[B]). \code{vs.B.qVal} is the false discovery rate-corrected
#' p-value of the Wilcoxon rank sum test.}
#' \item{deDist}{A named list of distances between clusters for each cluster
#' resolution. Distances are calculated as number of differentially
#' expressed genes between clusters.}
#' \item{deNeighb}{Differential testing results from Wilcoxon rank sum tests
#' comparing a gene in each cluster to that gene in its nearest
#' neighbouring cluster (calculated by number of differentially expressed
#' genes), and filtering for only those genes that show significant
#' positive differential expression versus all other clusters. The results
#' are stored as a nested list of dataframes. Each list element is named
#' for a column in \code{il$cl} (a cluster resolution). That list element
#' contains a named list of clusters at that resolution (cluster A). Each
#' of those list elements contains a dataframe where variables represent
#' the comparison to its nearest neighbouring cluster (cluster B) and each
#' sample is a gene. There are three variables, named as follows:
#' \code{vs.B.dDR} is the difference in detection rate of that gene
#' between the two clusters (DR[A] - DR[B]). \code{vs.B.logGER} is the log
#' gene expression ratio calculated by taking the difference in mean
#' expression of the gene (see \link{meanLogX} for mean calculation)
#' between the two clusters (MTC[A] - MTC[B]). \code{vs.B.qVal} is the
#' false discovery rate-corrected p-value of the Wilcoxon rank sum test.}
#' \item{params}{A list of the parameters from the argument list of this
#' function used to do the analysis, saved so that the same parameters are
#' used in the Shiny app.}
#' }
#'
#' @examples
#'
#' @examples
#' \dontrun{
#' data_for_scClustViz <- readFromSeurat(your_seurat_object,
#' convertGeneIDs=F)
#' rm(your_seurat_object)
#' data_for_scClustViz <- readFromSeurat(your_seurat_object)
#' rm(your_seurat_object)
#' # All the data scClustViz needs is in 'data_for_scClustViz'.
#'
#'
#' DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz)
#'
#'
#' save(data_for_scClustViz,DE_for_scClustViz,
#' file="for_scClustViz.RData")
#' # Save these objects so you'll never have to run this slow function again!
#'
#'
#' runShiny(filePath="for_scClustViz.RData")
#' }
#'
#' @seealso \code{\link{readFromSeurat}} or \code{\link{readFromManual}} for reading in
#' data to generate the input object for this function, and \code{\link{runShiny}} to
#' use the interactive Shiny GUI to view the results of this testing.
#'
#'
#' @seealso \code{\link{readFromSeurat}} or \code{\link{readFromManual}} for
#' reading in data to generate the input object for this function, and
#' \code{\link{runShiny}} to use the interactive Shiny GUI to view the results
#' of this testing.
#'
#' @export
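
The one-vs-rest comparison documented for `deTissue` can be sketched in a few lines of base R. This is a minimal illustration on simulated data, not the package's implementation: the toy matrix, the cluster labels, and the helper `mean_log_x` are all hypothetical. In particular, `mean_log_x` only assumes that `meanLogX` averages in linear space before returning to log space; the real function may differ. The `<=` comparison against `FDRthresh` is likewise an assumption.

```r
set.seed(1)
exponent    <- 2     # log base of the normalized data (documented default)
pseudocount <- 1     # pseudocount added before taking logs (documented default)
FDRthresh   <- 0.01  # FDR threshold (documented default)

# Toy data (hypothetical): 20 genes x 60 cells, three clusters of 20 cells each.
counts <- matrix(rpois(20 * 60, lambda = 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:60)))
cluster <- rep(c("A", "B", "C"), each = 20)
# Raise genes 1-3 in cluster A so there is something to detect.
counts[1:3, cluster == "A"] <- counts[1:3, cluster == "A"] + 30
nge <- log(counts + pseudocount, base = exponent)

# Hypothetical stand-in for meanLogX: undo the log, average, re-apply the log.
mean_log_x <- function(x) {
  log(mean(exponent^x - pseudocount) + pseudocount, base = exponent)
}

# Cluster A versus all other cells, one gene (row) at a time.
in_A <- cluster == "A"
deTissue_A <- data.frame(
  logGER = apply(nge, 1, function(g) mean_log_x(g[in_A]) - mean_log_x(g[!in_A])),
  pVal   = apply(nge, 1, function(g) {
    wilcox.test(g[in_A], g[!in_A], exact = FALSE)$p.value
  }),
  row.names = rownames(nge)
)
# False discovery rate correction across all tested genes.
deTissue_A$qVal <- p.adjust(deTissue_A$pVal, method = "fdr")

significant <- rownames(deTissue_A)[deTissue_A$qVal <= FDRthresh]
```

The resulting data frame has one row per gene and the three columns described for `deTissue` (`logGER`, `pVal`, `qVal`). The sketch tests every gene; it does not apply the `threshType` gene filtering described above.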


@@ -87,7 +164,13 @@ clusterWiseDEtest <- function(il,testAll=TRUE,
options(warn=-1)

out <- list(CGS=list(),deTissue=list(),deVS=list(),
deMarker=list(),deDist=list(),deNeighb=list())
deMarker=list(),deDist=list(),deNeighb=list(),
params=list(exponent=exponent,
pseudocount=pseudocount,
FDRthresh=FDRthresh,
threshType=threshType,
dDRthresh=dDRthresh,
logGERthresh=logGERthresh))
# This loop iterates through every cluster solution, and does DE testing between clusters
# to generate the DE metrics for assessing your clusters. This takes some time.
for (res in colnames(il[["cl"]])) {
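
The effect of the new `params` element can be sketched in isolation. The function below is a hypothetical stand-in that reproduces only the list construction shown in this hunk, with defaults taken from the roxygen documentation; it performs no testing. How the Shiny app actually reads these values is not shown in this commit, so the round trip through an RData file is only an assumption about the intended use.

```r
# Hypothetical stand-in: builds the empty output skeleton with the saved parameters.
make_output_skeleton <- function(exponent = 2, pseudocount = 1, FDRthresh = 0.01,
                                 threshType = "dDR", dDRthresh = 0.15,
                                 logGERthresh = 1) {
  list(CGS = list(), deTissue = list(), deVS = list(),
       deMarker = list(), deDist = list(), deNeighb = list(),
       params = list(exponent = exponent,
                     pseudocount = pseudocount,
                     FDRthresh = FDRthresh,
                     threshType = threshType,
                     dDRthresh = dDRthresh,
                     logGERthresh = logGERthresh))
}

# Seurat-style natural-log data: override only the log base.
out <- make_output_skeleton(exponent = exp(1))

# Save and reload, as a downstream reader of the RData file might.
rdata_path <- tempfile(fileext = ".RData")
save(out, file = rdata_path)
loaded <- new.env()
load(rdata_path, envir = loaded)
restored_params <- loaded$out$params
```

Because the parameters travel with the results, a later consumer can recover the log base and thresholds from the saved object instead of asking the user to supply them again.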
