diff --git a/DESCRIPTION b/DESCRIPTION index 29ee9db..6d8205e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -2,7 +2,7 @@ Package: scClustViz Type: Package Title: Differential Expression-based scRNAseq Cluster Assessment and Viewing Version: 0.2.0 -Date: 2018-08-16 +Date: 2018-08-17 Authors@R: c(as.person("Brendan T. Innes [aut,cre]"), as.person("Gary D. Bader [aut,ths]")) Description: An interactive R Shiny tool for visualizing single-cell RNAseq clustering diff --git a/R/deTest.R b/R/deTest.R index 06838a9..953cbc3 100644 --- a/R/deTest.R +++ b/R/deTest.R @@ -1,80 +1,157 @@ #' Cluster-wise differential expression testing #' -#' Performs differential expression testing between clusters for all cluster solutions in -#' order to assess the biological relevance of each cluster solution. Differential -#' expression testing is done using the Wilcoxon rank-sum test implemented in the base R -#' \code{stats} package. For details about what is being compared in the tests, see the -#' "Value" section. +#' Performs differential expression testing between clusters for all cluster +#' solutions in order to assess the biological relevance of each cluster +#' solution. Differential expression testing is done using the Wilcoxon rank-sum +#' test implemented in the base R \code{stats} package. For details about what +#' is being compared in the tests, see the "Value" section. #' -#' @param il The list outputted by one of the importData functions (either +#' @param il The list outputted by one of the importData functions (either #' \code{\link{readFromSeurat}} or \code{\link{readFromManual}}). -#' -#' @param testAll Logical value indicating whether to test all cluster solutions -#' (\code{TRUE}) or stop testing once a cluster solution has been found where there is -#' no differentially expressed genes found between at least one pair of nearest -#' neighbouring clusters (\code{FALSE}). 
\emph{If set to (\code{FALSE}), only the -#' cluster solutions tested will appear in the scClustViz shiny app.} -#' -#' @param exponent The log base of your normalized input data. Seurat normalization uses -#' the natural log (set this to exp(1)), while other normalization methods generally use -#' log2 (set this to 2). -#' -#' @param pseudocount The pseudocount added to all log-normalized values in your input -#' data. Most methods use a pseudocount of 1 to eliminate log(0) errors. -#' -#' @param FDRthresh The false discovery rate to use as a threshold for determining statistical -#' significance of differential expression calculated by the Wilcoxon rank-sum test. -#' -#' @param threshType Filtering genes for use in differential expression testing can be -#' done multiple ways. We use an expression ratio filter for comparing each cluster to -#' the rest of the tissue as a whole, but find that difference in detection rates works -#' better when comparing clusters to each other. You can set threshType to -#' \code{"logGER"} to use a gene expression ratio for all gene filtering, or leave it as -#' default (\code{"dDR"}) to use difference in detection rate as the thresholding method -#' when comparing clusters to each other. -#' -#' @param dDRthresh Magnitude of detection rate difference of a gene between clusters to -#' use as filter for determining which genes to test for differential expression between -#' clusters. -#' -#' @param logGERthresh Magnitude of gene expression ratio for a gene between clusters to -#' use as filter for determining which genes to test for differential expression between -#' clusters. -#' -#' @return The function returns a list containing the results of differential expression -#' testing for all sets of cluster solutions. 
\emph{Saving both the input (the object passed -#' to the \code{il} argument) and the output of this function to an RData file is all -#' the preparation necessary for running the scClustViz Shiny app itself.} -#' The output list of this function contains the following elements: -#' \describe{ -#' \item{CGS}{} -#' \item{deTissue}{} -#' \item{deVS}{} -#' \item{deMarker}{} -#' \item{deDist}{} -#' \item{deNeighb}{} +#' +#' @param testAll Default = TRUE. Logical value indicating whether to test all +#' cluster solutions (\code{TRUE}) or stop testing once a cluster solution has +#' been found where there is no differentially expressed genes found between +#' at least one pair of nearest neighbouring clusters (\code{FALSE}). \emph{If +#' set to (\code{FALSE}), only the cluster solutions tested will appear in the +#' scClustViz shiny app.} +#' +#' @param exponent Default = 2. The log base of your normalized input data. +#' Seurat normalization uses the natural log (set this to exp(1)), while other +#' normalization methods generally use log2 (set this to 2). +#' +#' @param pseudocount Default = 1. The pseudocount added to all log-normalized +#' values in your input data. Most methods use a pseudocount of 1 to eliminate +#' log(0) errors. +#' +#' @param FDRthresh Default = 0.01. The false discovery rate to use as a +#' threshold for determining statistical significance of differential +#' expression calculated by the Wilcoxon rank-sum test. +#' +#' @param threshType Default = "dDR". Filtering genes for use in differential +#' expression testing can be done multiple ways. We use an expression ratio +#' filter for comparing each cluster to the rest of the tissue as a whole, but +#' find that difference in detection rates works better when comparing +#' clusters to each other. 
You can set threshType to \code{"logGER"} to use a +#' gene expression ratio for all gene filtering, or leave it as default +#' (\code{"dDR"}) to use difference in detection rate as the thresholding +#' method when comparing clusters to each other. +#' +#' @param dDRthresh Default = 0.15. Magnitude of detection rate difference of a +#' gene between clusters to use as filter for determining which genes to test +#' for differential expression between clusters. +#' +#' @param logGERthresh Default = 1. Magnitude of gene expression ratio for a +#' gene between clusters to use as filter for determining which genes to test +#' for differential expression between clusters. +#' +#' @return The function returns a list containing the results of differential +#' expression testing for all sets of cluster solutions. \emph{Saving both the +#' input (the object passed to the \code{il} argument) and the output of this +#' function to an RData file is all the preparation necessary for running the +#' scClustViz Shiny app itself.} The output list of this function contains the +#' following elements: +#' \describe{ +#' \item{CGS}{A nested list of dataframes. Each list element is named for +#' a column in \code{il$cl} (a cluster resolution). That list element +#' contains a named list of clusters at that resolution. Each of those +#' list elements contains a dataframe of three variables, where each +#' sample is a gene. \code{DR} is the proportion of cells in the cluster +#' in which that gene was detected. \code{MDTC} is mean normalized gene +#' expression for that gene in only the cells in which it was detected +#' (see \link{meanLogX} for mean calculation). \code{MTC} is the mean +#' normalized gene expression for that gene in all cells of the cluster +#' (see \link{meanLogX} for mean calculation).} +#' \item{deTissue}{Differential testing results from Wilcoxon rank sum tests +#' comparing a gene in each cluster to the rest of the cells as a whole in +#' a one vs all comparison. 
The results are stored as a nested list of +#' dataframes. Each list element is named for a column in \code{il$cl} (a +#' cluster resolution). That list element contains a named list of +#' clusters at that resolution. Each of those list elements contains a +#' dataframe of three variables, where each sample is a gene. +#' \code{logGER} is the log gene expression ratio calculated by +#' subtracting the mean expression of the gene (see \link{meanLogX} for +#' mean calculation) in all other cells from the mean expression of the +#' gene in this cluster. \code{pVal} is the p-value of the Wilcoxon rank +#' sum test. \code{qVal} is the false discovery rate-corrected p-value of +#' the test.} +#' \item{deVS}{Differential testing results from Wilcoxon rank sum tests +#' comparing a gene in each cluster to that gene in every other cluster in +#' a series of tests. The results are stored as a nested list of +#' dataframes. Each list element is named for a column in \code{il$cl} (a +#' cluster resolution). That list element contains a named list of +#' clusters at that resolution (cluster A). Each of those lists contains a +#' named list of all the other clusters at that resolution (cluster B). +#' Each of those list elements contains a dataframe of four variables, +#' where each sample is a gene. \code{dDR} is the difference in detection +#' rate of that gene between the two clusters (DR[A] - DR[B]). +#' \code{logGER} is the log gene expression ratio calculated by taking the +#' difference in mean expression of the gene (see \link{meanLogX} for +#' mean calculation) between the two clusters (MTC[A] - MTC[B]). +#' \code{pVal} is the p-value of the Wilcoxon rank sum test. 
\code{qVal} +#' is the false discovery rate-corrected p-value of the test.} +#' \item{deMarker}{Differential testing results from Wilcoxon rank sum tests +#' comparing a gene in each cluster to that gene in every other cluster in +#' a series of tests, and filtering for only those genes that show +#' significant positive differential expression versus all other clusters. +#' The results are stored as a nested list of dataframes. Each list +#' element is named for a column in \code{il$cl} (a cluster resolution). +#' That list element contains a named list of clusters at that resolution +#' (cluster A). Each of those list elements contains a dataframe where +#' variables represent comparisons to all the other clusters and each +#' sample is a gene. For each other cluster (cluster B), there are three +#' variables, named as follows: \code{vs.B.dDR} is the difference in +#' detection rate of that gene between the two clusters (DR[A] - DR[B]). +#' \code{vs.B.logGER} is the log gene expression ratio calculated by +#' taking the difference in mean expression of the gene (see +#' \link{meanLogX} for mean calculation) between the two clusters (MTC[A] +#' - MTC[B]). \code{vs.B.qVal} is the false discovery rate-corrected +#' p-value of the Wilcoxon rank sum test.} +#' \item{deDist}{A named list of distances between clusters for each cluster +#' resolution. Distances are calculated as number of differentially +#' expressed genes between clusters.} +#' \item{deNeighb}{Differential testing results from Wilcoxon rank sum tests +#' comparing a gene in each cluster to that gene in its nearest +#' neighbouring cluster (calculated by number of differentially expressed +#' genes), and filtering for only those genes that show significant +#' positive differential expression versus all other clusters. The results +#' are stored as a nested list of dataframes. Each list element is named +#' for a column in \code{il$cl} (a cluster resolution). 
That list element +#' contains a named list of clusters at that resolution (cluster A). Each +#' of those list elements contains a dataframe where variables represent +#' the comparison to its nearest neighbouring cluster (cluster B) and each +#' sample is a gene. There are three variables, named as follows: +#' \code{vs.B.dDR} is the difference in detection rate of that gene +#' between the two clusters (DR[A] - DR[B]). \code{vs.B.logGER} is the log +#' gene expression ratio calculated by taking the difference in mean +#' expression of the gene (see \link{meanLogX} for mean calculation) +#' between the two clusters (MTC[A] - MTC[B]). \code{vs.B.qVal} is the +#' false discovery rate-corrected p-value of the Wilcoxon rank sum test.} +#' \item{params}{A list of the parameters from the argument list of this +#' function used to do the analysis, saved so that the same parameters are +#' used in the Shiny app.} #' } -#' -#' @examples +#' +#' @examples #' \dontrun{ -#' data_for_scClustViz <- readFromSeurat(your_seurat_object, -#' convertGeneIDs=F) -#' rm(your_seurat_object) +#' data_for_scClustViz <- readFromSeurat(your_seurat_object) +#' rm(your_seurat_object) #' # All the data scClustViz needs is in 'data_for_scClustViz'. -#' +#' #' DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz) -#' +#' #' save(data_for_scClustViz,DE_for_scClustViz, #' file="for_scClustViz.RData") #' # Save these objects so you'll never have to run this slow function again! -#' +#' #' runShiny(filePath="for_scClustViz.RData") #' } -#' -#' @seealso \code{\link{readFromSeurat}} or \code{\link{readFromManual}} for reading in -#' data to generate the input object for this function, and \code{\link{runShiny}} to -#' use the interactive Shiny GUI to view the results of this testing. 
-#' +#' +#' @seealso \code{\link{readFromSeurat}} or \code{\link{readFromManual}} for +#' reading in data to generate the input object for this function, and +#' \code{\link{runShiny}} to use the interactive Shiny GUI to view the results +#' of this testing. +#' #' @export @@ -87,7 +164,13 @@ clusterWiseDEtest <- function(il,testAll=TRUE, options(warn=-1) out <- list(CGS=list(),deTissue=list(),deVS=list(), - deMarker=list(),deDist=list(),deNeighb=list()) + deMarker=list(),deDist=list(),deNeighb=list(), + params=list(exponent=exponent, + pseudocount=pseudocount, + FDRthresh=FDRthresh, + threshType=threshType, + dDRthresh=dDRthresh, + logGERthresh=logGERthresh)) # This loop iterates through every cluster solution, and does DE testing between clusters # to generate the DE metrics for assessing your clusters. This takes some time. for (res in colnames(il[["cl"]])) { diff --git a/R/importData.R b/R/importData.R index 81474f1..b578064 100644 --- a/R/importData.R +++ b/R/importData.R @@ -1,65 +1,68 @@ #' Read in data from a Seurat object automatically #' -#' Loads the necessary data from a Seurat object for use in both the cluster-wise -#' differential expression testing function, as well as in the Shiny app itself. +#' Loads the necessary data from a Seurat object for use in both the +#' cluster-wise differential expression testing function, as well as in the +#' Shiny app itself. #' #' @param inD A Seurat object containing slots as outlined in Details. #' -#' @return The function returns a list containing input data necessary for both the -#' cluster-wise differential expression testing function and the Shiny app itself. The -#' list contains the following elements: +#' @return The function returns a list containing input data necessary for both +#' the cluster-wise differential expression testing function and the Shiny app +#' itself. 
The list contains the following elements: #' \describe{ #' \item{nge}{The normalized gene expression matrix.} #' \item{md}{The metadata dataframe, not including cluster assignments.} -#' \item{cl}{The cluster assignment dataframe, containing cluster assignments for each -#' resolution tested. The columns will be sorted in order of increasing resolution -#' (k, number of clusters).} -#' \item{dr_clust}{The cell embeddings used in the clustering, from PCA.} -#' \item{dr_viz}{The cell embeddings used for visualization in 2D, from tSNE.} +#' \item{cl}{The cluster assignment dataframe, containing cluster +#' assignments for each resolution tested. The columns will be sorted in +#' order of increasing resolution (k, number of clusters).} +#' \item{dr_clust}{The cell embeddings used in the clustering, from PCA.} +#' \item{dr_viz}{The cell embeddings used for visualization in 2D, from +#' tSNE.} #' } #' -#' @section Seurat object slots: The following slots are expected in the Seurat object. If -#' you're using Seurat v1.x, the equivalent slots are expected (this code takes -#' advantage of \code{UpdateSeuratObject} to find the relevant data in older Seurat -#' objects.) +#' @section Seurat object slots: The following slots are expected in the Seurat +#' object. If you're using Seurat v1.x, the equivalent slots are expected +#' (this code takes advantage of \code{UpdateSeuratObject} to find the +#' relevant data in older Seurat objects.) #' \describe{ -#' \item{@@data}{Holds the normalized gene expression matrix.} -#' \item{@@meta.data}{Holds the metadata, including cluster assignments. \emph{Cluster -#' assignment columns of the metadata should be titled with their resolution -#' parameters, as is the default in Seurat (ie. "res.0.8").} } -#' \item{@@dr$pca@@cell.embeddings}{Holds the results of the PCA run by Seurat. The -#' cell embeddings are used for the silhouette plot in the Shiny app. 
If Seurat v2.x -#' or greater was used, only the PC dimensions used in clustering will be considered -#' in silhouette calculations. If an alternative dimensionality reduction method was -#' used prior to clustering, use \code{readFromManual} to manually specify the -#' desired cell embeddings.} -#' \item{@@dr$tsne@@cell.embeddings}{Holds the results of the tSNE run by Seurat. The -#' cell embeddings are used for cell visualizations in the Shiny app. If an -#' alternative 2D projection method was used, use \code{readFromManual} to manually -#' specify the desired cell embeddings.} +#' \item{@@data}{Holds the normalized gene expression matrix.} +#' \item{@@meta.data}{Holds the metadata, including cluster assignments. +#' \emph{Cluster assignment columns of the metadata should be titled with +#' their resolution parameters, as is the default in Seurat (ie. +#' "res.0.8").} } +#' \item{@@dr$pca@@cell.embeddings}{Holds the results of the PCA run by +#' Seurat. The cell embeddings are used for the silhouette plot in the +#' Shiny app. If Seurat v2.x or greater was used, only the PC dimensions +#' used in clustering will be considered in silhouette calculations. If +#' an alternative dimensionality reduction method was used prior to +#' clustering, use \code{readFromManual} to manually specify the desired +#' cell embeddings.} +#' \item{@@dr$tsne@@cell.embeddings}{Holds the results of the tSNE run by +#' Seurat. The cell embeddings are used for cell visualizations in the +#' Shiny app. If an alternative 2D projection method was used, use +#' \code{readFromManual} to manually specify the desired cell embeddings.} #' } #' -#' @examples +#' @examples #' \dontrun{ -#' data_for_scClustViz <- readFromSeurat(your_seurat_object, -#' convertGeneIDs=F) -#' rm(your_seurat_object) +#' data_for_scClustViz <- readFromSeurat(your_seurat_object) +#' rm(your_seurat_object) #' # All the data scClustViz needs is in 'data_for_scClustViz'. 
-#' +#' #' DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz) -#' +#' #' save(data_for_scClustViz,DE_for_scClustViz, #' file="for_scClustViz.RData") #' # Save these objects so you'll never have to run this slow function again! -#' +#' #' runShiny(filePath="for_scClustViz.RData") #' } #' #' @family importData functions #' -#' @seealso https://satijalab.org/seurat/ for more information on the Seurat package, and -#' \code{\link{readFromManual}} for loading data by manually passing the requisite data -#' objects. +#' @seealso https://satijalab.org/seurat/ for more information on the Seurat +#' package, and \code{\link{readFromManual}} for loading data by manually +#' passing the requisite data objects. #' #' @export @@ -101,42 +104,73 @@ readFromSeurat <- function(inD) { #' Read in data manually by passing the requested objects #' -#' Creates the data object expected by the cluster-wise differential expression testing -#' function and the Shiny app. The user must provide the input objects as arguments. -#' -#' @param ngs A matrix (can be a sparse matrix from the Matrix package) of normalized gene -#' expression per cell. Genes on the rows, with row names as gene names (suggest using -#' official gene symbols, will convert if requested - see \code{convertGeneIDs}) -#' -#' @param md A dataframe containing cell metadata, not including cluster assignments. -#' -#' @param cl A dataframe containing cell cluster assignments, where every column is the -#' result of a clustering run with different parameters. The columns will be sorted in -#' order of increasing resolution (k, number of clusters). -#' -#' @param dr_clust A matrix of cell embeddings in the reduced-dimensional space used for -#' clustering (ie. PCA), with rows of cells (with rownames), and columns of dimensions. 
-#' This will be used to calculate euclidean distance between cells for the silhouette -#' plot, so it will be more relevant if the scale of each dimension is weighted by the -#' variance it explains. -#' -#' @return The function returns a list containing input data necessary for both the -#' cluster-wise differential expression testing function and the Shiny app itself. -#' The list contains the following elements: -#' \describe{ +#' Creates the data object expected by the cluster-wise differential expression +#' testing function and the Shiny app. The user must provide the input objects +#' as arguments. +#' +#' @param nge A matrix (can be a sparse matrix from the Matrix package) of +#' normalized gene expression per cell. Genes on the rows, with row names as +#' gene names (suggest using official gene symbols, will convert if requested +#' - see \code{convertGeneIDs}) +#' +#' @param md A dataframe containing cell metadata, not including cluster +#' assignments. +#' +#' @param cl A dataframe containing cell cluster assignments, where every column +#' is the result of a clustering run with different parameters. The columns +#' will be sorted in order of increasing resolution (k, number of clusters). +#' +#' @param dr_clust A matrix of cell embeddings in the reduced-dimensional space +#' used for clustering (ie. PCA), with rows of cells (with rownames), and +#' columns of dimensions. This will be used to calculate euclidean distance +#' between cells for the silhouette plot, so it will be more relevant if the +#' scale of each dimension is weighted by the variance it explains. +#' +#' @param dr_viz A nx2 matrix of cell embeddings in two-dimensional space used +#' for visualization of cells, with rows of cells (with rownames), and 2 +#' columns of dimensions. This is typically a tSNE projection, but any 2D +#' embedding of cells is accepted. 
+#' +#' @return The function returns a list containing input data necessary for both +#' the cluster-wise differential expression testing function and the Shiny app +#' itself. The list contains the following elements: +#' \describe{ #' \item{nge}{The normalized gene expression matrix.} #' \item{md}{The metadata dataframe, not including cluster assignments.} -#' \item{cl}{The cluster assignment dataframe, containing cluster assignments for each -#' resolution tested. The columns will be sorted in order of increasing resolution -#' (k, number of clusters).} -#' \item{dr_clust}{The cell embeddings used in the clustering, from PCA.} -#' \item{dr_viz}{The cell embeddings used for visualization in 2D, from tSNE.} +#' \item{cl}{The cluster assignment dataframe, containing cluster +#' assignments for each resolution tested. The columns will be sorted in +#' order of increasing resolution (k, number of clusters).} +#' \item{dr_clust}{The cell embeddings used in the clustering.} +#' \item{dr_viz}{The cell embeddings used for visualization in 2D.} #' } -#' +#' +#' @examples +#' \dontrun{ +#' ### Reading in data from a SingleCellExperiment class ### +#' clusterAssignments <- grepl("^Clust",colnames(colData(mySCE))) +#' # A logical vector separating the cluster assignments from the rest of the +#' # cell metadata in the colData slot. This is an example that you will have +#' # to change to reflect your cluster assignment column names. +#' data_for_scClustViz <- readFromManual(nge=logcounts(mySCE), +#' md=colData(mySCE)[,!clusterAssignments], +#' cl=colData(mySCE)[,clusterAssignments], +#' dr_clust=reducedDim(mySCE,"PCA"), +#' dr_viz=reducedDim(mySCE,"tSNE")) +#' # All the data scClustViz needs is in 'data_for_scClustViz'. +#' +#' DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz) +#' +#' save(data_for_scClustViz,DE_for_scClustViz, +#' file="for_scClustViz.RData") +#' # Save these objects so you'll never have to run this slow function again! 
+#' +#' runShiny(filePath="for_scClustViz.RData") +#' } +#' #' @family importData functions -#' +#' #' @seealso \code{\link{readFromSeurat}} for loading data from a Seurat object. -#' +#' #' @export readFromManual <- function(nge,md,cl,dr_clust,dr_viz) { diff --git a/R/runViz.R b/R/runViz.R index 71982e6..fe607b0 100644 --- a/R/runViz.R +++ b/R/runViz.R @@ -44,56 +44,59 @@ #' function will try to predict the appropriate keytype of the rownames (this #' takes a bit of time). #' -#' @param exponent Default = 2. The log base of your normalized input data. -#' Seurat normalization uses the natural log (set this to exp(1)), while other -#' normalization methods generally use log2 (set this to 2). This is used if +#' @param exponent Default = Taken from \code{clusterWiseDEtest} output. The log +#' base of your normalized input data. Seurat normalization uses the natural +#' log (set this to exp(1)), while other normalization methods generally use +#' log2 (set this to 2). This is used if you use the function for testing +#' differential gene expression between custom sets, and is set automatically +#' to match the parameters used in \code{clusterWiseDEtest}. +#' +#' @param pseudocount Default = Taken from \code{clusterWiseDEtest} output. The +#' pseudocount added to all log-normalized values in your input data. Most +#' methods use a pseudocount of 1 to eliminate log(0) errors. This is used if #' you use the function for testing differential gene expression between -#' custom sets, and should be set to the same parameters as in +#' custom sets, and is set automatically to match the parameters used in #' \code{clusterWiseDEtest}. #' -#' @param pseudocount Default = 1. The pseudocount added to all log-normalized -#' values in your input data. Most methods use a pseudocount of 1 to eliminate -#' log(0) errors. 
This is used if you use the function for testing -#' differential gene expression between custom sets, and should be set to the -#' same parameters as in \code{clusterWiseDEtest}. -#' -#' @param FDRthresh Default = 0.01. The false discovery rate to use as a -#' threshold for determining statistical significance of differential -#' expression calculated by the Wilcoxon rank-sum test. This is used if you -#' use the function for testing differential gene expression between custom -#' sets, and should be set to the same parameters as in -#' \code{clusterWiseDEtest}. +#' @param FDRthresh Default = Taken from \code{clusterWiseDEtest} output. The +#' false discovery rate to use as a threshold for determining statistical +#' significance of differential expression calculated by the Wilcoxon rank-sum +#' test. This is used if you use the function for testing differential gene +#' expression between custom sets, and is set automatically to match the +#' parameters used in \code{clusterWiseDEtest}. #' -#' @param threshType Default = "dDR". Filtering genes for use in differential -#' expression testing can be done multiple ways. We use an expression ratio -#' filter for comparing each cluster to the rest of the tissue as a whole, but -#' find that difference in detection rates works better when comparing -#' clusters to each other. You can set threshType to \code{"logGER"} to use a -#' gene expression ratio for all gene filtering, or leave it as default -#' (\code{"dDR"}) to use difference in detection rate as the thresholding -#' method when comparing clusters to each other. This is used if you use the -#' function for testing differential gene expression between custom sets, and -#' should be set to the same parameters as in \code{clusterWiseDEtest}. +#' @param threshType Default = Taken from \code{clusterWiseDEtest} output. +#' Filtering genes for use in differential expression testing can be done +#' multiple ways. 
We use an expression ratio filter for comparing each cluster +#' to the rest of the tissue as a whole, but find that difference in detection +#' rates works better when comparing clusters to each other. You can set +#' threshType to \code{"logGER"} to use a gene expression ratio for all gene +#' filtering, or leave it as default (\code{"dDR"}) to use difference in +#' detection rate as the thresholding method when comparing clusters to each +#' other. This is used if you use the function for testing differential gene +#' expression between custom sets, and is set automatically to match the +#' parameters used in \code{clusterWiseDEtest}. #' -#' @param dDRthresh Default = 0.15. Magnitude of detection rate difference of a -#' gene between clusters to use as filter for determining which genes to test -#' for differential expression between clusters. This is used if you use the -#' function for testing differential gene expression between custom sets, and -#' should be set to the same parameters as in \code{clusterWiseDEtest}. +#' @param dDRthresh Default = Taken from \code{clusterWiseDEtest} output. +#' Magnitude of detection rate difference of a gene between clusters to use as +#' filter for determining which genes to test for differential expression +#' between clusters. This is used if you use the function for testing +#' differential gene expression between custom sets, and is set automatically +#' to match the parameters used in \code{clusterWiseDEtest}. #' -#' @param logGERthresh Default = 1. Magnitude of gene expression ratio for a -#' gene between clusters to use as filter for determining which genes to test -#' for differential expression between clusters. This is used if you use the -#' function for testing differential gene expression between custom sets, and -#' should be set to the same parameters as in \code{clusterWiseDEtest}. +#' @param logGERthresh Default = Taken from \code{clusterWiseDEtest} output. 
+#' Magnitude of gene expression ratio for a gene between clusters to use as +#' filter for determining which genes to test for differential expression +#' between clusters. This is used if you use the function for testing +#' differential gene expression between custom sets, and is set automatically +#' to match the parameters used in \code{clusterWiseDEtest}. #' #' @return The function causes the scClustViz Shiny GUI app to open in a #' seperate window. #' #' @examples #' \dontrun{ -#' data_for_scClustViz <- readFromSeurat(your_seurat_object, -#' convertGeneIDs=F) +#' data_for_scClustViz <- readFromSeurat(your_seurat_object) #' rm(your_seurat_object) #' # All the data scClustViz needs is in 'data_for_scClustViz'. #' @@ -111,7 +114,7 @@ #' runShiny(system.file("e13cortical_forViz.RData",package="MouseCortex"), #' # Load input file (E13.5 data) from package directory. #' outPath=".", -#' # Save any further analysis performed in the app to the +#' # Save any further analysis performed in the app to the #' # working directory rather than library directory. #' annotationDB="org.Mm.eg.db", #' # This is an optional argument, but will add annotations. @@ -131,10 +134,10 @@ #' ) #' } #' -#' @seealso \code{\link{readFromSeurat}} or \code{\link{readFromManual}} for reading in -#' data to generate the first input object for this function, and -#' \code{\link{clusterWiseDEtest}} to do the differential expression testing to generate -#' the second input object for this function. +#' @seealso \code{\link{readFromSeurat}} or \code{\link{readFromManual}} for +#' reading in data to generate the first input object for this function, and +#' \code{\link{clusterWiseDEtest}} to do the differential expression testing +#' to generate the second input object for this function. 
#' #' @import shiny #' @importFrom scales alpha @@ -145,8 +148,8 @@ runShiny <- function(filePath,outPath, cellMarkers=list(), annotationDB,rownameKeytype, - exponent=2,pseudocount=1,FDRthresh=0.01, - threshType="dDR",dDRthresh=0.15,logGERthresh=1) { + exponent,pseudocount,FDRthresh, + threshType,dDRthresh,logGERthresh) { # ^ Load data from file ------------------------------------------------------------------ while(T) { if (exists(".lastFileCall")) { @@ -165,9 +168,9 @@ runShiny <- function(filePath,outPath, break } } - # The above weird-ass loop checks to see if the file has already been loaded - # (if this function has been run previously this session), otherwise loads the - # file. + # The above weird-ass loop (or weird ass-loop if you prefer) checks to see if + # the file has already been loaded (if this function has been run previously + # this session), otherwise loads the file. temp_objNames <- sapply(.lastFileCall[[filePath]],function(X) names(get(X)),simplify=F) for (L in names(temp_objNames)) { @@ -180,6 +183,14 @@ runShiny <- function(filePath,outPath, # needed in the Shiny app, and saves the objects in the function environment # under the names the shiny app expects. + # Load parameters from clusterWiseDEtest output + if (missing(exponent)) { exponent <- params$exponent } + if (missing(pseudocount)) { pseudocount <- params$pseudocount } + if (missing(FDRthresh)) { FDRthresh <- params$FDRthresh } + if (missing(threshType)) { threshType <- params$threshType } + if (missing(dDRthresh)) { dDRthresh <- params$dDRthresh } + if (missing(logGERthresh)) { logGERthresh <- params$logGERthresh } + cl <- cl[names(deNeighb)] # Ensures that only clusters that were tested for differential expression are # displayed. This prevents a whole pile of errors. 
diff --git a/man/clusterWiseDEtest.Rd b/man/clusterWiseDEtest.Rd index d262c78..8c8a53f 100644 --- a/man/clusterWiseDEtest.Rd +++ b/man/clusterWiseDEtest.Rd @@ -9,82 +9,159 @@ clusterWiseDEtest(il, testAll = TRUE, exponent = 2, pseudocount = 1, logGERthresh = 1) } \arguments{ -\item{il}{The list outputted by one of the importData functions (either +\item{il}{The list outputted by one of the importData functions (either \code{\link{readFromSeurat}} or \code{\link{readFromManual}}).} -\item{testAll}{Logical value indicating whether to test all cluster solutions -(\code{TRUE}) or stop testing once a cluster solution has been found where there is -no differentially expressed genes found between at least one pair of nearest -neighbouring clusters (\code{FALSE}). \emph{If set to (\code{FALSE}), only the -cluster solutions tested will appear in the scClustViz shiny app.}} +\item{testAll}{Default = TRUE. Logical value indicating whether to test all +cluster solutions (\code{TRUE}) or stop testing once a cluster solution has +been found where there is no differentially expressed genes found between +at least one pair of nearest neighbouring clusters (\code{FALSE}). \emph{If +set to (\code{FALSE}), only the cluster solutions tested will appear in the +scClustViz shiny app.}} -\item{exponent}{The log base of your normalized input data. Seurat normalization uses -the natural log (set this to exp(1)), while other normalization methods generally use -log2 (set this to 2).} +\item{exponent}{Default = 2. The log base of your normalized input data. +Seurat normalization uses the natural log (set this to exp(1)), while other +normalization methods generally use log2 (set this to 2).} -\item{pseudocount}{The pseudocount added to all log-normalized values in your input -data. Most methods use a pseudocount of 1 to eliminate log(0) errors.} +\item{pseudocount}{Default = 1. The pseudocount added to all log-normalized +values in your input data. 
Most methods use a pseudocount of 1 to eliminate +log(0) errors.} -\item{FDRthresh}{The false discovery rate to use as a threshold for determining statistical -significance of differential expression calculated by the Wilcoxon rank-sum test.} +\item{FDRthresh}{Default = 0.01. The false discovery rate to use as a +threshold for determining statistical significance of differential +expression calculated by the Wilcoxon rank-sum test.} -\item{threshType}{Filtering genes for use in differential expression testing can be -done multiple ways. We use an expression ratio filter for comparing each cluster to -the rest of the tissue as a whole, but find that difference in detection rates works -better when comparing clusters to each other. You can set threshType to -\code{"logGER"} to use a gene expression ratio for all gene filtering, or leave it as -default (\code{"dDR"}) to use difference in detection rate as the thresholding method -when comparing clusters to each other.} +\item{threshType}{Default = "dDR". Filtering genes for use in differential +expression testing can be done multiple ways. We use an expression ratio +filter for comparing each cluster to the rest of the tissue as a whole, but +find that difference in detection rates works better when comparing +clusters to each other. You can set threshType to \code{"logGER"} to use a +gene expression ratio for all gene filtering, or leave it as default +(\code{"dDR"}) to use difference in detection rate as the thresholding +method when comparing clusters to each other.} -\item{dDRthresh}{Magnitude of detection rate difference of a gene between clusters to -use as filter for determining which genes to test for differential expression between -clusters.} +\item{dDRthresh}{Default = 0.15. 
Magnitude of detection rate difference of a +gene between clusters to use as filter for determining which genes to test +for differential expression between clusters.} -\item{logGERthresh}{Magnitude of gene expression ratio for a gene between clusters to -use as filter for determining which genes to test for differential expression between -clusters.} +\item{logGERthresh}{Default = 1. Magnitude of gene expression ratio for a +gene between clusters to use as filter for determining which genes to test +for differential expression between clusters.} } \value{ -The function returns a list containing the results of differential expression - testing for all sets of cluster solutions. \emph{Saving both the input (the object passed - to the \code{il} argument) and the output of this function to an RData file is all - the preparation necessary for running the scClustViz Shiny app itself.} - The output list of this function contains the following elements: - \describe{ - \item{CGS}{} - \item{deTissue}{} - \item{deVS}{} - \item{deMarker}{} - \item{deDist}{} - \item{deNeighb}{} +The function returns a list containing the results of differential + expression testing for all sets of cluster solutions. \emph{Saving both the + input (the object passed to the \code{il} argument) and the output of this + function to an RData file is all the preparation necessary for running the + scClustViz Shiny app itself.} The output list of this function contains the + following elements: + \describe{ + \item{CGS}{A nested list of dataframes. Each list element is named for + a column in \code{il$cl} (a cluster resolution). That list element + contains a named list of clusters at that resolution. Each of those + list elements contains a dataframe of three variables, where each + sample is a gene. \code{DR} is the proportion of cells in the cluster + in which that gene was detected. 
\code{MDTC} is mean normalized gene + expression for that gene in only the cells in which it was detected + (see \link{meanLogX} for mean calculation). \code{MTC} is the mean + normalized gene expression for that gene in all cells of the cluster + (see \link{meanLogX} for mean calculation).} + \item{deTissue}{Differential testing results from Wilcoxon rank sum tests + comparing a gene in each cluster to the rest of the cells as a whole in + a one vs all comparison. The results are stored as a nested list of + dataframes. Each list element is named for a column in \code{il$cl} (a + cluster resolution). That list element contains a named list of + clusters at that resolution. Each of those list elements contains a + dataframe of three variables, where each sample is a gene. + \code{logGER} is the log gene expression ratio calculated by + subtracting the mean expression of the gene (see \link{meanLogX} for + mean calculation) in all other cells from the mean expression of the + gene in this cluster. \code{pVal} is the p-value of the Wilcoxon rank + sum test. \code{qVal} is the false discovery rate-corrected p-value of + the test.} + \item{deVS}{Differential testing results from Wilcoxon rank sum tests + comparing a gene in each cluster to that gene in every other cluster in + a series of tests. The results are stored as a nested list of + dataframes. Each list element is named for a column in \code{il$cl} (a + cluster resolution). That list element contains a named list of + clusters at that resolution (cluster A). Each of those lists contains a + named list of all the other clusters at that resolution (cluster B). + Each of those list elements contains a dataframe of four variables, + where each sample is a gene. \code{dDR} is the difference in detection + rate of that gene between the two clusters (DR[A] - DR[B]). 
+ \code{logGER} is the log gene expression ratio calculated by taking the + difference in mean expression of the gene (see \link{meanLogX} for + mean calculation) between the two clusters (MTC[A] - MTC[B]). + \code{pVal} is the p-value of the Wilcoxon rank sum test. \code{qVal} + is the false discovery rate-corrected p-value of the test.} + \item{deMarker}{Differential testing results from Wilcoxon rank sum tests + comparing a gene in each cluster to that gene in every other cluster in + a series of tests, and filtering for only those genes that show + significant positive differential expression versus all other clusters. + The results are stored as a nested list of dataframes. Each list + element is named for a column in \code{il$cl} (a cluster resolution). + That list element contains a named list of clusters at that resolution + (cluster A). Each of those list elements contains a dataframe where + variables represent comparisons to all the other clusters and each + sample is a gene. For each other cluster (cluster B), there are three + variables, named as follows: \code{vs.B.dDR} is the difference in + detection rate of that gene between the two clusters (DR[A] - DR[B]). + \code{vs.B.logGER} is the log gene expression ratio calculated by + taking the difference in mean expression of the gene (see + \link{meanLogX} for mean calculation) between the two clusters (MTC[A] + - MTC[B]). \code{vs.B.qVal} is the false discovery rate-corrected + p-value of the Wilcoxon rank sum test.} + \item{deDist}{A named list of distances between clusters for each cluster + resolution. 
Distances are calculated as number of differentially + expressed genes between clusters.} + \item{deNeighb}{Differential testing results from Wilcoxon rank sum tests + comparing a gene in each cluster to that gene in its nearest + neighbouring cluster (calculated by number of differentially expressed + genes), and filtering for only those genes that show significant + positive differential expression versus all other clusters. The results + are stored as a nested list of dataframes. Each list element is named + for a column in \code{il$cl} (a cluster resolution). That list element + contains a named list of clusters at that resolution (cluster A). Each + of those list elements contains a dataframe where variables represent + the comparison to its nearest neighbouring cluster (cluster B) and each + sample is a gene. There are three variables, named as follows: + \code{vs.B.dDR} is the difference in detection rate of that gene + between the two clusters (DR[A] - DR[B]). \code{vs.B.logGER} is the log + gene expression ratio calculated by taking the difference in mean + expression of the gene (see \link{meanLogX} for mean calculation) + between the two clusters (MTC[A] - MTC[B]). \code{vs.B.qVal} is the + false discovery rate-corrected p-value of the Wilcoxon rank sum test.} + \item{params}{A list of the parameters from the argument list of this + function used to do the analysis, saved so that the same parameters are + used in the Shiny app.} } } \description{ -Performs differential expression testing between clusters for all cluster solutions in -order to assess the biological relevance of each cluster solution. Differential -expression testing is done using the Wilcoxon rank-sum test implemented in the base R -\code{stats} package. For details about what is being compared in the tests, see the -"Value" section. +Performs differential expression testing between clusters for all cluster +solutions in order to assess the biological relevance of each cluster +solution. 
Differential expression testing is done using the Wilcoxon rank-sum +test implemented in the base R \code{stats} package. For details about what +is being compared in the tests, see the "Value" section. } \examples{ \dontrun{ - data_for_scClustViz <- readFromSeurat(your_seurat_object, - convertGeneIDs=F) - rm(your_seurat_object) + data_for_scClustViz <- readFromSeurat(your_seurat_object) + rm(your_seurat_object) # All the data scClustViz needs is in 'data_for_scClustViz'. - + DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz) - + save(data_for_scClustViz,DE_for_scClustViz, file="for_scClustViz.RData") # Save these objects so you'll never have to run this slow function again! - + runShiny(filePath="for_scClustViz.RData") } } \seealso{ -\code{\link{readFromSeurat}} or \code{\link{readFromManual}} for reading in - data to generate the input object for this function, and \code{\link{runShiny}} to - use the interactive Shiny GUI to view the results of this testing. +\code{\link{readFromSeurat}} or \code{\link{readFromManual}} for + reading in data to generate the input object for this function, and + \code{\link{runShiny}} to use the interactive Shiny GUI to view the results + of this testing. } diff --git a/man/readFromManual.Rd b/man/readFromManual.Rd index 2a2f92e..f068c03 100644 --- a/man/readFromManual.Rd +++ b/man/readFromManual.Rd @@ -7,39 +7,71 @@ readFromManual(nge, md, cl, dr_clust, dr_viz) } \arguments{ -\item{md}{A dataframe containing cell metadata, not including cluster assignments.} +\item{nge}{A matrix (can be a sparse matrix from the Matrix package) of +normalized gene expression per cell. Genes on the rows, with row names as +gene names (suggest using official gene symbols, will convert if requested +- see \code{convertGeneIDs})} -\item{cl}{A dataframe containing cell cluster assignments, where every column is the -result of a clustering run with different parameters. 
The columns will be sorted in -order of increasing resolution (k, number of clusters).} +\item{md}{A dataframe containing cell metadata, not including cluster +assignments.} -\item{dr_clust}{A matrix of cell embeddings in the reduced-dimensional space used for -clustering (ie. PCA), with rows of cells (with rownames), and columns of dimensions. -This will be used to calculate euclidean distance between cells for the silhouette -plot, so it will be more relevant if the scale of each dimension is weighted by the -variance it explains.} +\item{cl}{A dataframe containing cell cluster assignments, where every column +is the result of a clustering run with different parameters. The columns +will be sorted in order of increasing resolution (k, number of clusters).} -\item{ngs}{A matrix (can be a sparse matrix from the Matrix package) of normalized gene -expression per cell. Genes on the rows, with row names as gene names (suggest using -official gene symbols, will convert if requested - see \code{convertGeneIDs})} +\item{dr_clust}{A matrix of cell embeddings in the reduced-dimensional space +used for clustering (ie. PCA), with rows of cells (with rownames), and +columns of dimensions. This will be used to calculate euclidean distance +between cells for the silhouette plot, so it will be more relevant if the +scale of each dimension is weighted by the variance it explains.} + +\item{dr_viz}{A nx2 matrix of cell embeddings in two-dimensional space used +for visualization of cells, with rows of cells (with rownames), and 2 +columns of dimensions. This is typically a tSNE projection, but any 2D +embedding of cells is accepted.} } \value{ -The function returns a list containing input data necessary for both the - cluster-wise differential expression testing function and the Shiny app itself. 
- The list contains the following elements: - \describe{ +The function returns a list containing input data necessary for both + the cluster-wise differential expression testing function and the Shiny app + itself. The list contains the following elements: + \describe{ \item{nge}{The normalized gene expression matrix.} \item{md}{The metadata dataframe, not including cluster assignments.} - \item{cl}{The cluster assignment dataframe, containing cluster assignments for each - resolution tested. The columns will be sorted in order of increasing resolution - (k, number of clusters).} - \item{dr_clust}{The cell embeddings used in the clustering, from PCA.} - \item{dr_viz}{The cell embeddings used for visualization in 2D, from tSNE.} + \item{cl}{The cluster assignment dataframe, containing cluster + assignments for each resolution tested. The columns will be sorted in + order of increasing resolution (k, number of clusters).} + \item{dr_clust}{The cell embeddings used in the clustering.} + \item{dr_viz}{The cell embeddings used for visualization in 2D.} } } \description{ -Creates the data object expected by the cluster-wise differential expression testing -function and the Shiny app. The user must provide the input objects as arguments. +Creates the data object expected by the cluster-wise differential expression +testing function and the Shiny app. The user must provide the input objects +as arguments. +} +\examples{ +\dontrun{ + ### Reading in data from a SingleCellExperiment class ### + clusterAssignments <- grepl("^Clust",colnames(colData(mySCE))) + # A logical vector separating the cluster assignments from the rest of the + # cell metadata in the colData slot. This is an example that you will have + # to change to reflect your cluster assignment column names. 
+ data_for_scClustViz <- readFromManual(nge=logcounts(mySCE), + md=colData(mySCE)[,!clusterAssignments], + cl=colData(mySCE)[,clusterAssignments], + dr_clust=reducedDim(mySCE,"PCA"), + dr_viz=reducedDim(mySCE,"tSNE")) + # All the data scClustViz needs is in 'data_for_scClustViz'. + + DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz) + + save(data_for_scClustViz,DE_for_scClustViz, + file="for_scClustViz.RData") + # Save these objects so you'll never have to run this slow function again! + + runShiny(filePath="for_scClustViz.RData") +} + } \seealso{ \code{\link{readFromSeurat}} for loading data from a Seurat object. diff --git a/man/readFromSeurat.Rd b/man/readFromSeurat.Rd index 76a540b..d9f7fab 100644 --- a/man/readFromSeurat.Rd +++ b/man/readFromSeurat.Rd @@ -10,67 +10,70 @@ readFromSeurat(inD) \item{inD}{A Seurat object containing slots as outlined in Details.} } \value{ -The function returns a list containing input data necessary for both the - cluster-wise differential expression testing function and the Shiny app itself. The - list contains the following elements: +The function returns a list containing input data necessary for both + the cluster-wise differential expression testing function and the Shiny app + itself. The list contains the following elements: \describe{ \item{nge}{The normalized gene expression matrix.} \item{md}{The metadata dataframe, not including cluster assignments.} - \item{cl}{The cluster assignment dataframe, containing cluster assignments for each - resolution tested. The columns will be sorted in order of increasing resolution - (k, number of clusters).} - \item{dr_clust}{The cell embeddings used in the clustering, from PCA.} - \item{dr_viz}{The cell embeddings used for visualization in 2D, from tSNE.} + \item{cl}{The cluster assignment dataframe, containing cluster + assignments for each resolution tested.
The columns will be sorted in + order of increasing resolution (k, number of clusters).} + \item{dr_clust}{The cell embeddings used in the clustering, from PCA.} + \item{dr_viz}{The cell embeddings used for visualization in 2D, from + tSNE.} } } \description{ -Loads the necessary data from a Seurat object for use in both the cluster-wise -differential expression testing function, as well as in the Shiny app itself. +Loads the necessary data from a Seurat object for use in both the +cluster-wise differential expression testing function, as well as in the +Shiny app itself. } \section{Seurat object slots}{ - The following slots are expected in the Seurat object. If - you're using Seurat v1.x, the equivalent slots are expected (this code takes - advantage of \code{UpdateSeuratObject} to find the relevant data in older Seurat - objects.) + The following slots are expected in the Seurat + object. If you're using Seurat v1.x, the equivalent slots are expected + (this code takes advantage of \code{UpdateSeuratObject} to find the + relevant data in older Seurat objects.) \describe{ - \item{@data}{Holds the normalized gene expression matrix.} - \item{@meta.data}{Holds the metadata, including cluster assignments. \emph{Cluster - assignment columns of the metadata should be titled with their resolution - parameters, as is the default in Seurat (ie. "res.0.8").} } - \item{@dr$pca@cell.embeddings}{Holds the results of the PCA run by Seurat. The - cell embeddings are used for the silhouette plot in the Shiny app. If Seurat v2.x - or greater was used, only the PC dimensions used in clustering will be considered - in silhouette calculations. If an alternative dimensionality reduction method was - used prior to clustering, use \code{readFromManual} to manually specify the - desired cell embeddings.} - \item{@dr$tsne@cell.embeddings}{Holds the results of the tSNE run by Seurat. The - cell embeddings are used for cell visualizations in the Shiny app. 
If an - alternative 2D projection method was used, use \code{readFromManual} to manually - specify the desired cell embeddings.} + \item{@data}{Holds the normalized gene expression matrix.} + \item{@meta.data}{Holds the metadata, including cluster assignments. + \emph{Cluster assignment columns of the metadata should be titled with + their resolution parameters, as is the default in Seurat (ie. + "res.0.8").} } + \item{@dr$pca@cell.embeddings}{Holds the results of the PCA run by + Seurat. The cell embeddings are used for the silhouette plot in the + Shiny app. If Seurat v2.x or greater was used, only the PC dimensions + used in clustering will be considered in silhouette calculations. If + an alternative dimensionality reduction method was used prior to + clustering, use \code{readFromManual} to manually specify the desired + cell embeddings.} + \item{@dr$tsne@cell.embeddings}{Holds the results of the tSNE run by + Seurat. The cell embeddings are used for cell visualizations in the + Shiny app. If an alternative 2D projection method was used, use + \code{readFromManual} to manually specify the desired cell embeddings.} } } \examples{ \dontrun{ - data_for_scClustViz <- readFromSeurat(your_seurat_object, - convertGeneIDs=F) - rm(your_seurat_object) + data_for_scClustViz <- readFromSeurat(your_seurat_object) + rm(your_seurat_object) # All the data scClustViz needs is in 'data_for_scClustViz'. - + DE_for_scClustViz <- clusterWiseDEtest(data_for_scClustViz) - + save(data_for_scClustViz,DE_for_scClustViz, file="for_scClustViz.RData") # Save these objects so you'll never have to run this slow function again! - + runShiny(filePath="for_scClustViz.RData") } } \seealso{ -https://satijalab.org/seurat/ for more information on the Seurat package, and - \code{\link{readFromManual}} for loading data by manually passing the requisite data - objects. 
+https://satijalab.org/seurat/ for more information on the Seurat + package, and \code{\link{readFromManual}} for loading data by manually + passing the requisite data objects. Other importData functions: \code{\link{readFromManual}} } diff --git a/man/runShiny.Rd b/man/runShiny.Rd index 25f66c3..d86939d 100644 --- a/man/runShiny.Rd +++ b/man/runShiny.Rd @@ -5,8 +5,8 @@ \title{Run the scClustViz Shiny app} \usage{ runShiny(filePath, outPath, cellMarkers = list(), annotationDB, - rownameKeytype, exponent = 2, pseudocount = 1, FDRthresh = 0.01, - threshType = "dDR", dDRthresh = 0.15, logGERthresh = 1) + rownameKeytype, exponent, pseudocount, FDRthresh, threshType, dDRthresh, + logGERthresh) } \arguments{ \item{filePath}{A character vector giving the relative filepath to an RData @@ -47,48 +47,52 @@ symbols. If less than 80% of rownames map to official gene symbols, the function will try to predict the appropriate keytype of the rownames (this takes a bit of time).} -\item{exponent}{Default = 2. The log base of your normalized input data. -Seurat normalization uses the natural log (set this to exp(1)), while other -normalization methods generally use log2 (set this to 2). This is used if +\item{exponent}{Default = Taken from \code{clusterWiseDEtest} output. The log +base of your normalized input data. Seurat normalization uses the natural +log (set this to exp(1)), while other normalization methods generally use +log2 (set this to 2). This is used if you use the function for testing +differential gene expression between custom sets, and is set automatically +to match the parameters used in \code{clusterWiseDEtest}.} + +\item{pseudocount}{Default = Taken from \code{clusterWiseDEtest} output. The +pseudocount added to all log-normalized values in your input data. Most +methods use a pseudocount of 1 to eliminate log(0) errors. 
This is used if you use the function for testing differential gene expression between -custom sets, and should be set to the same parameters as in +custom sets, and is set automatically to match the parameters used in \code{clusterWiseDEtest}.} -\item{pseudocount}{Default = 1. The pseudocount added to all log-normalized -values in your input data. Most methods use a pseudocount of 1 to eliminate -log(0) errors. This is used if you use the function for testing -differential gene expression between custom sets, and should be set to the -same parameters as in \code{clusterWiseDEtest}.} - -\item{FDRthresh}{Default = 0.01. The false discovery rate to use as a -threshold for determining statistical significance of differential -expression calculated by the Wilcoxon rank-sum test. This is used if you -use the function for testing differential gene expression between custom -sets, and should be set to the same parameters as in -\code{clusterWiseDEtest}.} +\item{FDRthresh}{Default = Taken from \code{clusterWiseDEtest} output. The +false discovery rate to use as a threshold for determining statistical +significance of differential expression calculated by the Wilcoxon rank-sum +test. This is used if you use the function for testing differential gene +expression between custom sets, and is set automatically to match the +parameters used in \code{clusterWiseDEtest}.} -\item{threshType}{Default = "dDR". Filtering genes for use in differential -expression testing can be done multiple ways. We use an expression ratio -filter for comparing each cluster to the rest of the tissue as a whole, but -find that difference in detection rates works better when comparing -clusters to each other. You can set threshType to \code{"logGER"} to use a -gene expression ratio for all gene filtering, or leave it as default -(\code{"dDR"}) to use difference in detection rate as the thresholding -method when comparing clusters to each other. 
This is used if you use the -function for testing differential gene expression between custom sets, and -should be set to the same parameters as in \code{clusterWiseDEtest}.} +\item{threshType}{Default = Taken from \code{clusterWiseDEtest} output. +Filtering genes for use in differential expression testing can be done +multiple ways. We use an expression ratio filter for comparing each cluster +to the rest of the tissue as a whole, but find that difference in detection +rates works better when comparing clusters to each other. You can set +threshType to \code{"logGER"} to use a gene expression ratio for all gene +filtering, or leave it as default (\code{"dDR"}) to use difference in +detection rate as the thresholding method when comparing clusters to each +other. This is used if you use the function for testing differential gene +expression between custom sets, and is set automatically to match the +parameters used in \code{clusterWiseDEtest}.} -\item{dDRthresh}{Default = 0.15. Magnitude of detection rate difference of a -gene between clusters to use as filter for determining which genes to test -for differential expression between clusters. This is used if you use the -function for testing differential gene expression between custom sets, and -should be set to the same parameters as in \code{clusterWiseDEtest}.} +\item{dDRthresh}{Default = Taken from \code{clusterWiseDEtest} output. +Magnitude of detection rate difference of a gene between clusters to use as +filter for determining which genes to test for differential expression +between clusters. This is used if you use the function for testing +differential gene expression between custom sets, and is set automatically +to match the parameters used in \code{clusterWiseDEtest}.} -\item{logGERthresh}{Default = 1. Magnitude of gene expression ratio for a -gene between clusters to use as filter for determining which genes to test -for differential expression between clusters. 
This is used if you use the -function for testing differential gene expression between custom sets, and -should be set to the same parameters as in \code{clusterWiseDEtest}.} +\item{logGERthresh}{Default = Taken from \code{clusterWiseDEtest} output. +Magnitude of gene expression ratio for a gene between clusters to use as +filter for determining which genes to test for differential expression +between clusters. This is used if you use the function for testing +differential gene expression between custom sets, and is set automatically +to match the parameters used in \code{clusterWiseDEtest}.} } \value{ The function causes the scClustViz Shiny GUI app to open in a @@ -103,8 +107,7 @@ is being compared in the tests, see the "Value" section. } \examples{ \dontrun{ - data_for_scClustViz <- readFromSeurat(your_seurat_object, - convertGeneIDs=F) + data_for_scClustViz <- readFromSeurat(your_seurat_object) rm(your_seurat_object) # All the data scClustViz needs is in 'data_for_scClustViz'. @@ -122,7 +125,7 @@ is being compared in the tests, see the "Value" section. runShiny(system.file("e13cortical_forViz.RData",package="MouseCortex"), # Load input file (E13.5 data) from package directory. outPath=".", - # Save any further analysis performed in the app to the + # Save any further analysis performed in the app to the # working directory rather than library directory. annotationDB="org.Mm.eg.db", # This is an optional argument, but will add annotations. @@ -144,8 +147,8 @@ is being compared in the tests, see the "Value" section. } \seealso{ -\code{\link{readFromSeurat}} or \code{\link{readFromManual}} for reading in - data to generate the first input object for this function, and - \code{\link{clusterWiseDEtest}} to do the differential expression testing to generate - the second input object for this function. 
+\code{\link{readFromSeurat}} or \code{\link{readFromManual}} for + reading in data to generate the first input object for this function, and + \code{\link{clusterWiseDEtest}} to do the differential expression testing + to generate the second input object for this function. }