diff --git a/404.html b/404.html index 6d63175..9b4052b 100644 --- a/404.html +++ b/404.html @@ -22,7 +22,7 @@ - + diff --git a/SVP_files/figure-html/FigGbv-1.png b/SVP_files/figure-html/FigGbv-1.png new file mode 100644 index 0000000..c824af2 Binary files /dev/null and b/SVP_files/figure-html/FigGbv-1.png differ diff --git a/SVP_files/figure-html/HeatmapCelltype-1.png b/SVP_files/figure-html/HeatmapCelltype-1.png new file mode 100644 index 0000000..b27c826 Binary files /dev/null and b/SVP_files/figure-html/HeatmapCelltype-1.png differ diff --git a/SVP_files/figure-html/LISAGFig-1.png b/SVP_files/figure-html/LISAGFig-1.png new file mode 100644 index 0000000..7524334 Binary files /dev/null and b/SVP_files/figure-html/LISAGFig-1.png differ diff --git a/SVP_files/figure-html/LISAIFig-1.png b/SVP_files/figure-html/LISAIFig-1.png new file mode 100644 index 0000000..34618c2 Binary files /dev/null and b/SVP_files/figure-html/LISAIFig-1.png differ diff --git a/SVP_files/figure-html/runGLOBALBVCellTypeCancerSEA-1.png b/SVP_files/figure-html/runGLOBALBVCellTypeCancerSEA-1.png new file mode 100644 index 0000000..243ca45 Binary files /dev/null and b/SVP_files/figure-html/runGLOBALBVCellTypeCancerSEA-1.png differ diff --git a/SVP_files/figure-html/runSGSAFree-1.png b/SVP_files/figure-html/runSGSAFree-1.png new file mode 100644 index 0000000..67d6ec0 Binary files /dev/null and b/SVP_files/figure-html/runSGSAFree-1.png differ diff --git a/SVP_files/figure-html/runSGSARef-1.png b/SVP_files/figure-html/runSGSARef-1.png new file mode 100644 index 0000000..c3cbe60 Binary files /dev/null and b/SVP_files/figure-html/runSGSARef-1.png differ diff --git a/index.html b/index.html index 6e4fc15..0645d32 100644 --- a/index.html +++ b/index.html @@ -22,7 +22,7 @@ - + @@ -206,7 +206,7 @@

Predicting cell states and their variability in single-cell or Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University
xshuangbin@163.com and guangchuangyu@gmail.com -

2024-12-12

+

2024-12-13

Introduction

diff --git a/overview-of-svp.html b/overview-of-svp.html index 3331bb1..0c2961a 100644 --- a/overview-of-svp.html +++ b/overview-of-svp.html @@ -22,7 +22,7 @@ - + diff --git a/quantification-of-cell-states-using-svp.html b/quantification-of-cell-states-using-svp.html index 9d5a946..7240a10 100644 --- a/quantification-of-cell-states-using-svp.html +++ b/quantification-of-cell-states-using-svp.html @@ -22,7 +22,7 @@ - + @@ -265,16 +265,16 @@

2.1 Quantification of CancerSEA e ) #> Building the nearest neighbor graph with the distance between features and #> cells ... -#> elapsed time is 6.205000 seconds +#> elapsed time is 3.968000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... -#> elapsed time is 0.043000 seconds +#> elapsed time is 0.040000 seconds #> Calculating the affinity score using random walk with restart ... -#> elapsed time is 0.226000 seconds +#> elapsed time is 0.117000 seconds #> Tidying the result of random walk with restart ... -#> elapsed time is 0.009000 seconds +#> elapsed time is 0.007000 seconds #> ORA analysis ... -#> elapsed time is 0.011000 seconds +#> elapsed time is 0.010000 seconds # Then the result was added to gsvaExps to return a SVPExperiment object, the result # can be extracted with gsvaExp, you can view more information via help(gsvaExp). @@ -372,16 +372,16 @@

2.1 Quantification of CancerSEA e ) #> Building the nearest neighbor graph with the distance between features and #> cells ... -#> elapsed time is 5.897000 seconds +#> elapsed time is 3.566000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... -#> elapsed time is 0.030000 seconds +#> elapsed time is 0.026000 seconds #> Calculating the affinity score using random walk with restart ... -#> elapsed time is 0.288000 seconds +#> elapsed time is 0.206000 seconds #> Tidying the result of random walk with restart ... -#> elapsed time is 0.018000 seconds +#> elapsed time is 0.012000 seconds #> ORA analysis ... -#> elapsed time is 0.028000 seconds +#> elapsed time is 0.021000 seconds # The result also was added to gsvaExps pdac_a_spe @@ -406,7 +406,7 @@

2.2 Quantification of cell-type

2.2.1 Quantification of cell-type using the pre-established marker gene sets

We can use the pre-established marker gene sets to quantifying the cell-type for each captured location. Here, we used the markers obtained from the CARD, which was used as input for CARDfree command.

-

(ref:runSGSA_free) The pie plot of cell-type activity

+
# The marker gene sets
 load(url("https://github.com/YingMa0107/CARD/blob/master/data/markerList.RData?raw=true"))
 markerList |> names()
@@ -461,16 +461,16 @@ 

2.2.1 Quantification of cell-type ) #> Building the nearest neighbor graph with the distance between features and #> cells ... -#> elapsed time is 6.086000 seconds +#> elapsed time is 3.473000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... -#> elapsed time is 0.018000 seconds +#> elapsed time is 0.012000 seconds #> Calculating the affinity score using random walk with restart ... -#> elapsed time is 0.234000 seconds +#> elapsed time is 0.126000 seconds #> Tidying the result of random walk with restart ... -#> elapsed time is 0.012000 seconds +#> elapsed time is 0.007000 seconds #> ORA analysis ... -#> elapsed time is 0.016000 seconds +#> elapsed time is 0.011000 seconds # Then we can visualize the cell-type activity via sc_spatial of SVP with pie plot gsvaExp(pdac_a_spe, "CellTypeFree") |> @@ -481,14 +481,14 @@

2.2.1 Quantification of cell-type color = NA, pie.radius.scale = .9 )

-
-(ref:runSGSA_free) +
+The pie plot of cell-type activity

-(#fig:runSGSA_free)(ref:runSGSA_free) +Figure 2.2: The pie plot of cell-type activity

We cal also display each cell-type activity with heat-map according to the spatial coordinate.

-

(ref:heatmap_celltype) the heatmap of each cell-type activity

+
pdac_a_spe |> gsvaExp(3) |> 
    sc_spatial(
      features = sort(names(markerList)),
@@ -500,10 +500,10 @@ 

2.2.1 Quantification of cell-type scale_color_viridis_c() #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale.

-
-(ref:heatmap_celltype) +
+the heatmap of each cell-type activity

-(#fig:heatmap_celltype)(ref:heatmap_celltype) +Figure 2.3: the heatmap of each cell-type activity

@@ -563,7 +563,7 @@

2.2.2 Quantification of cell-type #> RBCs Fibroblasts #> 9 97

Then, the quantification of the cell-type gene signatures can also be performed via runSGSA with the marker gene sets.

-

(ref:runSGSA_ref) The pie plot of cell-type activity with the marker gene sets from reference single-cell transcriptome

+
# use the maker gene from the reference single-cell transcriptome
 pdac_a_spe <- runSGSA(
   pdac_a_spe,
@@ -572,16 +572,16 @@ 

2.2.2 Quantification of cell-type ) #> Building the nearest neighbor graph with the distance between features and #> cells ... -#> elapsed time is 6.065000 seconds +#> elapsed time is 3.518000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... -#> elapsed time is 0.020000 seconds +#> elapsed time is 0.011000 seconds #> Calculating the affinity score using random walk with restart ... -#> elapsed time is 0.228000 seconds +#> elapsed time is 0.114000 seconds #> Tidying the result of random walk with restart ... -#> elapsed time is 0.010000 seconds +#> elapsed time is 0.007000 seconds #> ORA analysis ... -#> elapsed time is 0.012000 seconds +#> elapsed time is 0.010000 seconds pdac_a_spe |> gsvaExp('CellTypeRef') |> @@ -592,10 +592,10 @@

2.2.2 Quantification of cell-type color = NA, pie.radius.scale = .9 )

-
-(ref:runSGSA_ref) +
+The pie plot of cell-type activity with the marker gene sets from reference single-cell transcriptome

-(#fig:runSGSA_ref)(ref:runSGSA_ref) +Figure 2.4: The pie plot of cell-type activity with the marker gene sets from reference single-cell transcriptome

We can found that the result based on marker gene sets from reference single-cell transcriptome is similar with the result based on marker gene sets from previous reports.

diff --git a/reference-keys.txt b/reference-keys.txt index d9400ff..f9f0108 100644 --- a/reference-keys.txt +++ b/reference-keys.txt @@ -1,5 +1,12 @@ fig:overview fig:CancerSEA +fig:runSGSAFree +fig:HeatmapCelltype +fig:runSGSARef +fig:LISAGFig +fig:LISAIFig +fig:FigGbv +fig:runGLOBALBVCellTypeCancerSEA fig:runLOCALBV overview-of-svp quantification-of-cell-states-using-svp diff --git a/search_index.json b/search_index.json index 38b9c01..499b996 100644 --- a/search_index.json +++ b/search_index.json @@ -1 +1 @@ -[["index.html", "Predicting cell states and their variability in single-cell or spatial omics data Introduction", " Predicting cell states and their variability in single-cell or spatial omics data Shuangbin Xu and GuangChuang Yu Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University xshuangbin@163.com and guangchuangyu@gmail.com 2024-12-12 Introduction Understanding the spatial distribution and interplay of cell states in tissue is critical for elucidating tissue formation and function. Single-cell and spatial omics present a promising approach to addressing this need. Traditional methods typically include the identification of highly variable genes, dimensionality reduction, clustering, and the annotation of cells or functions based on gene over-expression. Nevertheless, these qualitative approaches are inadequate for accurately mapping the distributions of spatial features. To address this, integrating biomedical knowledge such as Gene Ontology, KEGG, Reactome, transcription factors, and cell-type marker genes directly allows for the evaluation of cell states from gene expression data, creating quantitative functional pathway profiles at the single captured location. After quantifying cell functions, analyzing their spatial distribution and co-distribution with other features can provide deeper insights into related biological questions. We focus on three aspects: the spatial variability of cell functions, regions where these functions cluster, and their co-distribution patterns with other features. Although existing tools such as SPARK-X(Zhu, Sun, and Zhou 2021), nnSVG(Weber et al. 2023), SpatialDE(Svensson, Teichmann, and Stegle 2018), SpaGFT(Chang et al. 2024), Seurat(Hao et al. 2023), and Squidpy(Palla et al. 2022) facilitate the exploration of spatially variable genes, they are primarily designed for gene-level analysis and lack the capability to investigate the spatial co-distribution of features. Additionally, many of these tools, including SpatialDE(Svensson, Teichmann, and Stegle 2018), SPARK(Sun, Zhu, and Zhou 2020), MERINGUE(Miller et al. 2021), and nnSVG(Weber et al. 2023), face challenges in handling large-scale spatial transcriptome data due to high memory consumption and low computational efficiency. To fill the gaps, we developed SVP to accurately predict cell states, explore their spatial distribution, and assess their spatial relationship with other features. References "],["overview-of-svp.html", "1 Overview of SVP", " 1 Overview of SVP The evaluation of functional status at individual locations captured by SVP is achieved using Multiple Correspondence Analysis (MCA) for dimensionality reduction. This process employs a standardized gene expression matrix to project both cells and genes into a unified MCA space. It has been established that this method allows for the calculation of distances not only between genes and cells but also between cells and genes, thereby facilitating the assessment of their associations(Cortal et al. 2021). Proximity in this space indicates a stronger relationship. These calculated distances are crucial for constructing a weighted k-nearest neighbors (KNN) network, linking each cell or gene to its most relevant counterparts. To discern features with varying levels of proximity, distances are first normalized to a 0-1 scale, with closer distances approaching 1 and farther distances approaching 0. This normalization is followed by division by the total distance among the nearest features, thereby assigning greater connection weights to closer features. Subsequently, databases of known biological knowledge, such as transcription factor target gene sets, Reactome functional gene sets, and cell-type marker gene sets, serve as initial seeds. Random walks on the constructed weighted KNN network yield preliminary functional state activity scores for each location. To mitigate potential biases introduced by dimensionality reduction, a hypergeometric distribution test enhances the enrichment analysis of top-ranking genes extracted directly from the expression matrix at each location. These analyses provide weights for functional activities, culminating in the derivation of functional activity scores at the single captured location (Fig. 1A). To identify spatially variable cell functions, we first established a cell neighbor weight matrix based on spot locations using the Delaunay triangulation (default) or KNN algorithm. This weight matrix, alongside global autocorrelation analyses such as Moran’s I (default), Geary’s C, or Getis-Ord’s G, facilitated the identification of spatially variable cell functions or gene characteristics. Additionally, we utilized the same cell neighbor weight matrix and local spatial autocorrelation algorithms (Local Getis-Ord or Local Moran) to delineate the local spatial distribution of these variable features (Fig. 1B). To examine spatial co-distribution among cell functions, we designed a bivariate spatial global and local autocorrelation algorithm employing the Lee index. This approach enables the assessment of correlation between different cell characteristics in their spatial distribution (Fig. 1C). Figure 1.1: Overview of SVP References "],["quantification-of-cell-states-using-svp.html", "2 Quantification of cell states using SVP 2.1 Quantification of CancerSEA etc function 2.2 Quantification of cell-type", " 2 Quantification of cell states using SVP SVP can integrate gene or protein lists from existing biomedical databases like cell-type markers (PanglaoDB, CellMarkers), gene ontology (GO), molecular features (MSigDB), and pathways (KEGG, Reactome, Wikipathways), transcription factors, and disease ontology, to assess cell states. Of course, the cell-type markers also can be extracted from reference single cell data. In this vignette, we use the spatial transcriptome and single cell transcriptome of a human pancreatic ductal adenocarcinomas from the article(Moncada et al. 2020) to demonstrate. Here, we first perform the normalization by logNormCounts of scuttle and the MCA dimensionality reduction by runMCA of SVP. # load the packages required in the vignette library(SpatialExperiment) library(SingleCellExperiment) library(scuttle) library(scran) library(SVP) library(ggsc) library(ggplot2) library(magrittr) # load the spatial transcriptome, it is a SpatialExperiment object # to build the object, you can refer to the SpatialExperiment package pdac_a_spe <- readRDS(url("https://raw.githubusercontent.com/xiangpin/Data_for_Vignette_SVP/main/data/pdac_a_spe.rds")) # we had removed the genes that did not express in any captured location. # First, we use logNormCounts of scuttle to normalize the spatial transcriptome pdac_a_spe <- logNormCounts(pdac_a_spe) # Now the normalized counts was added to the assays of pdac_a_spe as `logcounts` # it can be extracted via logcounts(pdac_a_spe) or assay(pdac_a_spe, 'logcounts') # Next, we use `runMCA` of `SVP` to preform the MCA dimensionality reduction pdac_a_spe <- runMCA(pdac_a_spe, assay.type = 'logcounts') #> Computing Fuzzy Matrix #> Computing SVD #> Computing Coordinates # The MCA result was added to the reducedDims, it can be extracted by reducedDim(pdac_a_spe, 'MCA') # more information can refer to SingleCellExperiment package pdac_a_spe #> class: SpatialExperiment #> dim: 13160 428 #> metadata(0): #> assays(2): counts logcounts #> rownames(13160): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): sample_id cluster_domain sizeFactor #> reducedDimNames(1): MCA #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(1): sample_id 2.1 Quantification of CancerSEA etc function SVP accepts SingleCellExperiment or SpatialExperiment object as input. The gene set should be a named list object, and the elements of the list must be included in the row names of the SingleCellExperiment or SpatialExperiment object. To quantifying the CancerSEA or other function for each captured location, the command runSGSA is used. # This is gene list of Cancer Single-cell State Atlas to comprehensively decode distinct # functional states of cancer cells at single cell resolution. data(CancerSEASymbol) # It contains 14 types of cancer single-cell state, you can obtain # more information via ?CancerSEASymbol names(CancerSEASymbol) #> [1] "Angiogenesis" "Apoptosis" "Cell_Cycle" "Differentiation" #> [5] "DNA_damage" "DNA_repair" "EMT" "Hypoxia" #> [9] "Inflammation" "Invasion" "Metastasis" "Proliferation" #> [13] "Quiescence" "Stemness" pdac_a_spe <- runSGSA(pdac_a_spe, gset.idx.list = CancerSEASymbol, # The target gene set list assay.type = 'logcounts', # The name of assays of gene profiler gsvaExp.name = 'CancerSEA' # The name of result to save to gsvaExps slot ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 6.205000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.043000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.226000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.009000 seconds #> ORA analysis ... #> elapsed time is 0.011000 seconds # Then the result was added to gsvaExps to return a SVPExperiment object, the result # can be extracted with gsvaExp, you can view more information via help(gsvaExp). pdac_a_spe #> class: SVPExperiment #> dim: 13160 428 #> metadata(0): #> assays(2): counts logcounts #> rownames(13160): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): sample_id cluster_domain sizeFactor #> reducedDimNames(1): MCA #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): #> gsvaExps names(1) : CancerSEA gsvaExpNames(pdac_a_spe) #> [1] "CancerSEA" # The result is also a SingleCellExperiment or SpatialExperiment. gsvaExp(pdac_a_spe, 'CancerSEA') #> class: SpatialExperiment #> dim: 14 428 #> metadata(0): #> assays(1): affi.score #> rownames(14): Angiogenesis Apoptosis ... Quiescence Stemness #> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate #> geneSets #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): cluster_domain sizeFactor sample_id #> reducedDimNames(0): #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): # We use sc_spatial of ggsc to visualize the distribution of each functions # Note: when the number of features is too large, we reco gsvaExp(pdac_a_spe, 'CancerSEA') %>% sc_spatial( features = rownames(.), mapping = aes(x = x, y = y), geom = geom_bgpoint, pointsize = 1.2, ncol = 4, common.legend = FALSE # The output is a patchwork ) & scale_color_viridis_c(option='B', guide='none') #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. Figure 2.1: The heatmap of each CancerSEA function # The other function also can be quantified with runSGSA # the gset.idx.list supports the named gene list, gson object or # locate gmt file or html url of gmt. pdac_a_spe <- runSGSA( pdac_a_spe, gset.idx.list = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.2.Hs/h.all.v2023.2.Hs.symbols.gmt", gsvaExp.name = 'hallmark', assay.type = 'logcounts' # default ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 5.897000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.030000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.288000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.018000 seconds #> ORA analysis ... #> elapsed time is 0.028000 seconds # The result also was added to gsvaExps pdac_a_spe #> class: SVPExperiment #> dim: 13160 428 #> metadata(0): #> assays(2): counts logcounts #> rownames(13160): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): sample_id cluster_domain sizeFactor #> reducedDimNames(1): MCA #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): #> gsvaExps names(2) : CancerSEA hallmark 2.2 Quantification of cell-type To quantifying the cell-type for each captured location, the gene set should be the marker gene sets of cell-type. It can be the pre-established marker gene sets, such as the markers from CellMarkers or PanglaoDB. High-quality cell marker genes from past reports or your own compilations are acceptable. In addition, It can also be extracted from the reference single cell data. The following sections will detail the specific operations of each approach. 2.2.1 Quantification of cell-type using the pre-established marker gene sets We can use the pre-established marker gene sets to quantifying the cell-type for each captured location. Here, we used the markers obtained from the CARD, which was used as input for CARDfree command. (ref:runSGSA_free) The pie plot of cell-type activity # The marker gene sets load(url("https://github.com/YingMa0107/CARD/blob/master/data/markerList.RData?raw=true")) markerList |> names() #> [1] "Acinar_cells" #> [2] "Ductal_terminal_ductal_like" #> [3] "Ductal_CRISP3_high-centroacinar_like" #> [4] "Cancer_clone_A" #> [5] "Ductal_MHC_Class_II" #> [6] "Cancer_clone_B" #> [7] "mDCs_A" #> [8] "Ductal_APOL1_high-hypoxic" #> [9] "Tuft_cells" #> [10] "mDCs_B" #> [11] "pDCs" #> [12] "Endocrine_cells" #> [13] "Endothelial_cells" #> [14] "Macrophages_A" #> [15] "Mast_cells" #> [16] "Macrophages_B" #> [17] "T_cells_&_NK_cells" #> [18] "Monocytes" #> [19] "RBCs" #> [20] "Fibroblasts" # We can view the marker gene number of each cell-type. lapply(markerList, length) |> unlist() #> Acinar_cells Ductal_terminal_ductal_like #> 49 13 #> Ductal_CRISP3_high-centroacinar_like Cancer_clone_A #> 34 81 #> Ductal_MHC_Class_II Cancer_clone_B #> 7 110 #> mDCs_A Ductal_APOL1_high-hypoxic #> 292 47 #> Tuft_cells mDCs_B #> 128 209 #> pDCs Endocrine_cells #> 134 182 #> Endothelial_cells Macrophages_A #> 165 210 #> Mast_cells Macrophages_B #> 286 167 #> T_cells_&_NK_cells Monocytes #> 89 160 #> RBCs Fibroblasts #> 92 224 pdac_a_spe <- runSGSA( pdac_a_spe, gset.idx.list = markerList, gsvaExp.name = 'CellTypeFree', assay.type = 'logcounts' ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 6.086000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.018000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.234000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.012000 seconds #> ORA analysis ... #> elapsed time is 0.016000 seconds # Then we can visualize the cell-type activity via sc_spatial of SVP with pie plot gsvaExp(pdac_a_spe, "CellTypeFree") |> sc_spatial( features = sort(names(markerList)), mapping = aes(x = x, y = y), plot.pie = TRUE, color = NA, pie.radius.scale = .9 ) (#fig:runSGSA_free)(ref:runSGSA_free) We cal also display each cell-type activity with heat-map according to the spatial coordinate. (ref:heatmap_celltype) the heatmap of each cell-type activity pdac_a_spe |> gsvaExp(3) |> sc_spatial( features = sort(names(markerList)), mapping = aes(x = x, y = y), geom = geom_bgpoint, pointsize = 1.2, ncol = 4 ) + scale_color_viridis_c() #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. (#fig:heatmap_celltype)(ref:heatmap_celltype) 2.2.2 Quantification of cell-type using the reference single-cell data This approach requires obtaining the cell-type marker genes from the reference single-cell data. First, the reference single cell data should be normalized. Here, the command of logNormCounts of scuttle is used. Next, the MCA dimensionality reduction is performed by runMCA of SVP. Finally, the marker gene sets of cell-type are extracted via runDetectMarker of SVP. # load the reference single cell transcriptome # it is a SingleCellExperiment pdac_a_sce <- readRDS(url("https://raw.githubusercontent.com/xiangpin/Data_for_Vignette_SVP/main/data/pdac_a_sce.rds")) pdac_a_sce #> class: SingleCellExperiment #> dim: 15418 1926 #> metadata(0): #> assays(1): counts #> rownames(15418): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(1926): Cell1 Cell2 ... Cell1925 Cell1926 #> colData names(1): Cell #> reducedDimNames(0): #> mainExpName: NULL #> altExpNames(0): pdac_a_sce <- logNormCounts(pdac_a_sce) # obtain the top high variable genes set.seed(123) stats <- modelGeneVar(pdac_a_sce) hvgs <- getTopHVGs(stats) # perform the MCA reduction. pdac_a_sce <- pdac_a_sce |> runMCA(assay.type = 'logcounts', subset.row=hvgs[seq(2000)]) #> Computing Fuzzy Matrix #> Computing SVD #> Computing Coordinates # obtain the marker gene sets from reference single-cell transcriptome based on MCA space refmakergene <- runDetectMarker(pdac_a_sce, group.by='Cell', ntop=200, present.prop.in.group=.2, present.prop.in.sample=.5) refmakergene |> lapply(length) |> unlist() #> Acinar cells Ductal - terminal ductal like #> 53 82 #> Ductal - CRISP3 high/centroacinar like Cancer clone A #> 90 90 #> Ductal - MHC Class II Cancer clone B #> 72 90 #> mDCs A Ductal - APOL1 high/hypoxic #> 89 92 #> Tuft cells mDCs B #> 87 71 #> pDCs Endocrine cells #> 50 60 #> Endothelial cells Macrophages A #> 81 76 #> Mast cells Macrophages B #> 120 48 #> T cells & NK cells Monocytes #> 39 56 #> RBCs Fibroblasts #> 9 97 Then, the quantification of the cell-type gene signatures can also be performed via runSGSA with the marker gene sets. (ref:runSGSA_ref) The pie plot of cell-type activity with the marker gene sets from reference single-cell transcriptome # use the maker gene from the reference single-cell transcriptome pdac_a_spe <- runSGSA( pdac_a_spe, gset.idx.list = refmakergene, gsvaExp.name = 'CellTypeRef' ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 6.065000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.020000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.228000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.010000 seconds #> ORA analysis ... #> elapsed time is 0.012000 seconds pdac_a_spe |> gsvaExp('CellTypeRef') |> sc_spatial( features = sort(names(refmakergene)), mapping = aes(x = x, y = y), plot.pie = TRUE, color = NA, pie.radius.scale = .9 ) (#fig:runSGSA_ref)(ref:runSGSA_ref) We can found that the result based on marker gene sets from reference single-cell transcriptome is similar with the result based on marker gene sets from previous reports. References "],["spatial-statistical-analysis.html", "3 Spatial statistical analysis 3.1 Univariate spatial statistical analysis 3.2 Bivariate spatial statistical analysis", " 3 Spatial statistical analysis SVP implements several spatial statistical algorithms to explore the spatial distribution of cell function or other features from three aspects: whether cell function states have spatial variability, where spatially variable cell functions cluster, and the co-distribution patterns of spatially variable cell functions with other features. The following sections provide a detailed description of each aspect. 3.1 Univariate spatial statistical analysis This section addresses whether the feature is spatially variable and identifies its high-cluster areas. In detail, to identify the spatial variable features, SVP implements three spatial autocorrelation algorithms using Rcpp and RcppParallel: global Moran’s I, Geary’s C and Getis-Ord’s G. To identify the spatial high-cluster areas, SVP implements two local univariate spatial statistical algorithms using Rcpp and RcppParallel: local Getis-Ord’s G and local Moran’s I. 3.1.1 Identification of spatial variable features Here, we identify the spatial variable cell-types using runDetectSVG, which implements the global Moran’s I, Geary’s C and Getis-Ord’s G algorithms. # First approach: # we can first obtain the target gsvaExp with gsvaExp names via # gsvaExp() function, for example, CellTypeFree pdac_a_spe |> gsvaExp("CellTypeFree") #> class: SpatialExperiment #> dim: 20 428 #> metadata(0): #> assays(1): affi.score #> rownames(20): Acinar_cells Ductal_terminal_ductal_like ... RBCs #> Fibroblasts #> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate #> geneSets #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): cluster_domain sizeFactor sample_id #> reducedDimNames(0): #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): # Then we use runDetectSVG with method = 'moran' to identify the spatial variable # cell-types with global Moran's I. celltype_free.I <- pdac_a_spe |> gsvaExp("CellTypeFree") |> runDetectSVG( assay.type = 'affi.score', # because default is 'logcounts', it should be adjused method = 'moransi', action = 'only' ) #> Identifying the spatially variable gene sets (pathway) based on moransi #> elapsed time is 0.060000 seconds celltype_free.I |> dplyr::arrange(rank) |> head() #> obs expect.moransi sd.moransi #> Cancer_clone_A 0.7402441 -0.00234192 0.02807961 #> Cancer_clone_B 0.7187722 -0.00234192 0.02806609 #> Acinar_cells 0.6959057 -0.00234192 0.02788079 #> mDCs_A 0.5823319 -0.00234192 0.02760418 #> Ductal_CRISP3_high-centroacinar_like 0.5605952 -0.00234192 0.02794187 #> Endothelial_cells 0.5362277 -0.00234192 0.02757573 #> Z.moransi pvalue padj rank #> Cancer_clone_A 26.44573 2.042568e-154 4.085135e-153 1 #> Cancer_clone_B 25.69343 6.921386e-146 6.921386e-145 2 #> Acinar_cells 25.04404 1.013736e-138 6.758238e-138 3 #> mDCs_A 21.18062 7.205485e-100 3.602743e-99 4 #> Ductal_CRISP3_high-centroacinar_like 20.14672 1.437699e-90 5.750794e-90 5 #> Endothelial_cells 19.53057 3.018138e-85 1.006046e-84 6 # Second approach: # the result was added to the original object by specific the # gsvaexp argument in the runDetectSVG pdac_a_spe <- pdac_a_spe |> runDetectSVG( gsvaexp = 'CellTypeFree', method = 'moran' ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on moransi #> elapsed time is 0.077000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.moransi' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified #> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment` #> or `SpatialExperiment`),then also using `svDf()` to extract. gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi") |> dplyr::arrange(rank) |> head() #> obs expect.moransi sd.moransi #> Cancer_clone_A 0.7402441 -0.00234192 0.02807961 #> Cancer_clone_B 0.7187722 -0.00234192 0.02806609 #> Acinar_cells 0.6959057 -0.00234192 0.02788079 #> mDCs_A 0.5823319 -0.00234192 0.02760418 #> Ductal_CRISP3_high-centroacinar_like 0.5605952 -0.00234192 0.02794187 #> Endothelial_cells 0.5362277 -0.00234192 0.02757573 #> Z.moransi pvalue padj rank #> Cancer_clone_A 26.44573 2.042568e-154 4.085135e-153 1 #> Cancer_clone_B 25.69343 6.921386e-146 6.921386e-145 2 #> Acinar_cells 25.04404 1.013736e-138 6.758238e-138 3 #> mDCs_A 21.18062 7.205485e-100 3.602743e-99 4 #> Ductal_CRISP3_high-centroacinar_like 20.14672 1.437699e-90 5.750794e-90 5 #> Endothelial_cells 19.53057 3.018138e-85 1.006046e-84 6 The obs is the Moran’s I, expect.moransi is the expect Moran’s I (E(I)). Moran’s I quantifies the correlation between a single feature at a specific spatial point or region and its neighboring points or regions. Global Moran’s I usually ranges from -1 to 1, Global Moran’s I significantly above E(I) indicate a distinct spatial pattern or spatial clustering, which occurs when neighboring captured locations tend to have similar feature value. Global Moran’s I around E(I) suggest random spatial expression, that is, absence of spatial pattern. Global Moran’s I significantly below E(I) imply a chessboard-like pattern or dispersion. # using Geary's C pdac_a_spe <- pdac_a_spe |> runDetectSVG( gsvaexp = 'CellTypeFree', method = 'geary' ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on gearysc #> elapsed time is 0.046000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.gearysc' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified #> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment` #> or `SpatialExperiment`),then also using `svDf()` to extract. gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf('sv.gearysc') |> dplyr::arrange(rank) |> head() #> obs expect.gearysc sd.gearysc #> Cancer_clone_A 0.2709051 1 0.02873561 #> Cancer_clone_B 0.2880919 1 0.02879022 #> Acinar_cells 0.3055211 1 0.02952587 #> Ductal_CRISP3_high-centroacinar_like 0.4213014 1 0.02928596 #> mDCs_A 0.4603825 1 0.03058249 #> Endothelial_cells 0.4843175 1 0.03068854 #> Z.gearysc pvalue padj rank #> Cancer_clone_A 25.37252 2.535621e-142 5.071243e-141 1 #> Cancer_clone_B 24.72743 2.711724e-135 2.711724e-134 2 #> Acinar_cells 23.52103 1.242716e-122 8.284772e-122 3 #> Ductal_CRISP3_high-centroacinar_like 19.76027 3.272367e-87 1.636183e-86 4 #> mDCs_A 17.64466 5.592678e-70 2.237071e-69 5 #> Endothelial_cells 16.80375 1.145504e-63 3.818347e-63 6 The obs is the Geary’s C. Global Geary’s C is another popular index of global spatial autocorrelation, but focuses more on differences between neighboring values. Global Geary’s C usually range from 0 to 2. Global Geary’s C significantly below E[C]=1 indicate a distinct spatial pattern or spatial clustering, the value significantly above E[C]=1 suggest a chessboard-like pattern or dispersion, the value around E[C]=1 imply absence of spatial pattern. # using Getis-ord's G pdac_a_spe <- pdac_a_spe |> runDetectSVG( gsvaexp = 'CellTypeFree', method = 'getis' ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on getisord #> elapsed time is 0.037000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.getisord' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified #> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment` #> or `SpatialExperiment`),then also using `svDf()` to extract. gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf('sv.getisord') |> dplyr::arrange(rank) |> head() #> obs expect.G sd.G #> Cancer_clone_A 0.007214987 0.00234192 0.0001844816 #> Cancer_clone_B 0.007406175 0.00234192 0.0001975938 #> Acinar_cells 0.007109879 0.00234192 0.0001920919 #> mDCs_A 0.005853463 0.00234192 0.0001651428 #> Ductal_CRISP3_high-centroacinar_like 0.005800695 0.00234192 0.0001749367 #> Endothelial_cells 0.005749155 0.00234192 0.0001750884 #> Z.G pvalue padj rank #> Cancer_clone_A 26.41492 4.617621e-154 9.235241e-153 1 #> Cancer_clone_B 25.62962 3.567592e-145 3.567592e-144 2 #> Acinar_cells 24.82124 2.644298e-136 1.762865e-135 3 #> mDCs_A 21.26368 1.231569e-100 6.157844e-100 4 #> Ductal_CRISP3_high-centroacinar_like 19.77158 2.615753e-87 1.046301e-86 5 #> Endothelial_cells 19.46009 1.196819e-84 3.989397e-84 6 The obs is the Getis-Ord’s G of each cell-type. Global Getis-Ord’s G measures global spatial autocorrelation by evaluating the clustering of high/low value, specifically identifying hot spots or cold spots. Global Getis-Ord’s G values have no fixed range, as they depend on the specific dataset and spatial weight matrix. Global Getis-Ord’s G significantly above E(G) indicate a distinct spatial pattern or spatial clustering, which occurs when neighboring captured locations tend to have similar feature value. Global Getis-Ord’s G around E(G) suggest random spatial expression, that is, absence of spatial pattern. Global Getis-Ord’s G significantly below E(G) imply a chessboard-like pattern or dispersion. So the Cancer clone A, Cancer clone B, and Acinar cells were more concentrated in the spatial distribution compare to others. 3.1.2 Identification of local spatial aggregation areas Next, we can identify the spatial high-cluster areas for the spatial variable features via runLISA, which provides two local spatial statistics algorithms: Local Getis-Ord’s G and Local Moran’s I celltypefree_lisares <- pdac_a_spe |> gsvaExp("CellTypeFree") %>% runLISA(features=rownames(.), assay.type=1) celltypefree_lisares$Cancer_clone_A |> head() #> Gi E.Gi Var.Gi Z.Gi Pr (z != E(Gi)) #> Spot1 0.0001746202 0.00234192 1.649943e-06 -1.687270 0.09155141 #> Spot2 0.0003158584 0.00234192 2.996272e-06 -1.170475 0.24180992 #> Spot3 0.0002774819 0.00234192 2.489785e-06 -1.308341 0.19075761 #> Spot4 0.0001673520 0.00234192 2.489501e-06 -1.378215 0.16813699 #> Spot5 0.0004016950 0.00234192 2.994463e-06 -1.121225 0.26219219 #> Spot6 0.0004403359 0.00234192 2.130895e-06 -1.302670 0.19268729 #> cluster.no.test cluster.test #> Spot1 Low NoSign #> Spot2 Low NoSign #> Spot3 Low NoSign #> Spot4 Low NoSign #> Spot5 Low NoSign #> Spot6 Low NoSign Local Getis-Ord’s G (Gi) measures the aggregation of a feature value in a specific spot and its neighbors, determining if locations are significantly clustered with high (hot spots) or low values (cold spots). Gi usually have no fixed range, a larger Gi indicates a stronger clustering of high expression for the feature in captured location and its neighboring spots, while a smaller Gi indicates a stronger clustering of low expression. (ref:LISA_G_fig) The result of local Getis-Ord for cell-type gsvaExp(pdac_a_spe, 'CellTypeFree') |> plot_lisa_feature( lisa.res = celltypefree_lisares[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')], assay.type = 1, gap_line_width = .18 ) (#fig:LISA_G_fig)(ref:LISA_G_fig) The highlighted area signifies that the feature is significantly clustered in this region. We also can use local Moran’s I method to identify the region. celltypefree_lisa.i <- pdac_a_spe |> gsvaExp("CellTypeFree") %>% runLISA( features = rownames(.), assay.type = 1, method = 'localmoran' ) celltypefree_lisa.i$Cancer_clone_A |> head() #> Ii E.Ii Var.Ii Z.Ii Pr (z != E(Ii)) #> Spot1 0.2411323 -0.0004394871 0.02049855 1.687270 0.09155141 #> Spot2 0.2610045 -0.0005894003 0.04994947 1.170475 0.24180992 #> Spot3 0.2907443 -0.0007044064 0.04962291 1.308341 0.19075761 #> Spot4 0.3124376 -0.0007329623 0.05163310 1.378215 0.16813699 #> Spot5 0.2792327 -0.0007358277 0.06234951 1.121225 0.26219219 #> Spot6 0.2256567 -0.0005002450 0.03014053 1.302670 0.19268729 #> cluster.no.test cluster.test #> Spot1 Low-Low NoSign #> Spot2 Low-Low NoSign #> Spot3 Low-Low NoSign #> Spot4 Low-Low NoSign #> Spot5 Low-Low NoSign #> Spot6 Low-Low NoSign Local Moran’s I (Ii) measures the spatial correlation between a feature’s values in a specific area and its neighbors, assessing the similarity or difference between a location’s value and those of its surroundings. Ii also have no fixed range, a larger Ii signifies a stronger positive correlation in a region and its neighbors, with clusters of high or low values. A smaller value indicates a negative correlation, where high values are next to low values, or vice versa. (ref:LISA_I_fig) The result of local Moran for cell-type gsvaExp(pdac_a_spe, "CellTypeFree") |> plot_lisa_feature( lisa.res = celltypefree_lisa.i[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')], clustertype = 'High-High', assay.type = 1, gap_line_width = .18 ) (#fig:LISA_I_fig)(ref:LISA_I_fig) 3.2 Bivariate spatial statistical analysis Next, to explore the bivariate spatial co-distribution, SVP implements global and local bivariate spatial statistics algorithms. Global bivariate spatial statistics algorithm can be used to identify whether the bivariate is co-cluster in the specific space. Local bivariate spatial algorithm is to find the areas of spatial co-clustering of bivariate. 3.2.1 Global bivariate spatial analysis Here, we use runGLOBALBV of SVP to explore the spatial co-distribution between the cell-types. gbv.res <- pdac_a_spe |> gsvaExp('CellTypeFree') |> runGLOBALBV( features1 = names(markerList), assay.type = 1, permutation = NULL, # if permutation is NULL, mantel test will be used to calculated the pvalue add.pvalue = TRUE, alternative = 'greater' # one-side test ) # gbv.res is a list contained Lee and pvalue matrix. gbv.res |> as_tbl_df(diag=FALSE, flag.clust = TRUE) |> dplyr::filter(grepl("Cancer", x)) |> dplyr::arrange(desc(Lee)) #> # A tibble: 35 × 4 #> x y Lee pvalue #> <fct> <fct> <dbl> <dbl> #> 1 Cancer_clone_A Cancer_clone_B 0.728 0 #> 2 Cancer_clone_B Fibroblasts 0.185 2.04e-27 #> 3 Cancer_clone_A Fibroblasts 0.180 5.82e-27 #> 4 Cancer_clone_B Endothelial_cells 0.000286 4.99e- 1 #> 5 Cancer_clone_A Endothelial_cells -0.0169 8.07e- 1 #> 6 Cancer_clone_B T_cells_&_NK_cells -0.0442 9.97e- 1 #> 7 Cancer_clone_A RBCs -0.0443 9.80e- 1 #> 8 Cancer_clone_B RBCs -0.0513 9.91e- 1 #> 9 Cancer_clone_A T_cells_&_NK_cells -0.0696 1.00e+ 0 #> 10 Cancer_clone_B Macrophages_A -0.0773 1.00e+ 0 #> # ℹ 25 more rows The Lee value typically ranges from -1 to 1. When the index is closer to 1, it indicates a strong positive spatial association between the two variables, meaning the high-value areas of one variable tend to overlap with the high-value areas of the other, and low-value areas similarly overlap, showing strong spatial consistency. Conversely, if the index is near -1, it suggests a strong negative spatial association, where the high-value areas of one variable tend to overlap with the low-value areas of the other. We developed a function plot_heatmap_globalbv to visualize the result of global bivariate spatial analysis (ref:fig_gbv) The results of clustering analysis of spatial distribution between cell-type # We can also display the result of global univariate spatial analysis moran.res <- gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi") # The result of local univariate spatial analysis also can be added lisa.f1.res <- gsvaExp(pdac_a_spe, "CellTypeFree", withColData = TRUE) |> cal_lisa_f1(lisa.res = celltypefree_lisares, group.by = 'cluster_domain') plot_heatmap_globalbv(gbv.res, moran.t=moran.res, lisa.t=lisa.f1.res) #> Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height, #> resolveHJust(x$just, : semi-transparency is not supported on this device: #> reported only once per page (#fig:fig_gbv)(ref:fig_gbv) We can explore the spatial co-cluster between the different type of gsvaExp. For example, we explore the spatial co-distribution between the CellTypeFree and CancerSEA of pdac_a_spe object. (ref:runGLOBALBV_CellType_CancerSEA) The heatmap of spatial correlation between cell tyep and CancerSEA states # If the number of features is excessive, we recommend selecting # those with spatial variability. # celltypeid <- gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf("sv.moransi") |> # dplyr::filter(padj<=0.01) |> rownames() celltypeid <- rownames(gsvaExp(pdac_a_spe, "CellTypeFree")) cancerseaid <- gsvaExp(pdac_a_spe, "CancerSEA") |> runDetectSVG(assay.type = 1, verbose=FALSE) |> svDf() |> dplyr::filter(padj <= 0.01) |> rownames() #> elapsed time is 0.037000 seconds runGLOBALBV( pdac_a_spe, gsvaexp = c("CellTypeFree", "CancerSEA"), gsvaexp.features = c(celltypeid, cancerseaid), permutation = NULL, add.pvalue = TRUE ) -> res.gbv.celltype.cancersea #> The `gsvaexp` is specified, the specified `gsvaExp` will be #> used to perform the analysis. plot_heatmap_globalbv(res.gbv.celltype.cancersea) (#fig:runGLOBALBV_CellType_CancerSEA)(ref:runGLOBALBV_CellType_CancerSEA) 3.2.2 Local bivariate spatial analysis After global bivariate spatial analysis, we found the Cancer cell and Fibroblasts show significant spatial co-aggregation. Next, we use runLOCALBV of SVP to identify the co-aggregation areas. lbv.res <- pdac_a_spe |> gsvaExp('CellTypeFree') |> runLOCALBV(features1='Fibroblasts', features2=c('Cancer_clone_A', "Cancer_clone_B"), assay.type=1) # The result can be added to gsvaExp, then visualized by ggsc. LISAsce(pdac_a_spe, lisa.res=lbv.res, gsvaexp.name='LBV.res') |> gsvaExp("LBV.res") %>% plot_lisa_feature(lisa.res = lbv.res, assay.type = 1) Figure 3.1: The heatmap of Local bivariate spatial analysis The highlighted area denotes the region where the two variables significantly co-aggregate in space. "],["session-information.html", "4 Session information", " 4 Session information Here is the output of sessionInfo() on the system on which this document was compiled: #> R Under development (unstable) (2024-12-10 r87437) #> Platform: x86_64-pc-linux-gnu #> Running under: CentOS Linux 7 (Core) #> #> Matrix products: default #> BLAS: /biostack/tools/devtools/R/devel_241212/lib64/R/lib/libRblas.so #> LAPACK: /biostack/tools/devtools/R/devel_241212/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0 #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C #> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 #> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 #> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C #> [9] LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C #> #> time zone: Asia/Shanghai #> tzcode source: system (glibc) #> #> attached base packages: #> [1] stats4 stats graphics grDevices utils datasets methods #> [8] base #> #> other attached packages: #> [1] scatterpie_0.2.4 magrittr_2.0.3 #> [3] ggplot2_3.5.1 SVP_0.99.0 #> [5] scran_1.35.0 scuttle_1.17.0 #> [7] ggsc_1.5.0 SpatialExperiment_1.17.0 #> [9] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0 #> [11] Biobase_2.67.0 GenomicRanges_1.59.1 #> [13] GenomeInfoDb_1.43.2 IRanges_2.41.2 #> [15] S4Vectors_0.45.2 BiocGenerics_0.53.3 #> [17] generics_0.1.3 MatrixGenerics_1.19.0 #> [19] matrixStats_1.4.1 bookdown_0.41 #> #> loaded via a namespace (and not attached): #> [1] RcppAnnoy_0.0.22 splines_4.5.0 #> [3] later_1.4.1 ggplotify_0.1.2 #> [5] tibble_3.2.1 polyclip_1.10-7 #> [7] fastDummies_1.7.4 lifecycle_1.0.4 #> [9] edgeR_4.5.1 globals_0.16.3 #> [11] lattice_0.22-6 MASS_7.3-61 #> [13] limma_3.63.2 plotly_4.10.4 #> [15] sass_0.4.9 rmarkdown_2.29 #> [17] jquerylib_0.1.4 yaml_2.3.10 #> [19] metapod_1.15.0 httpuv_1.6.15 #> [21] Seurat_5.1.0 sctransform_0.4.1 #> [23] spam_2.11-0 sp_2.1-4 #> [25] spatstat.sparse_3.1-0 reticulate_1.40.0 #> [27] cowplot_1.1.3 pbapply_1.7-2 #> [29] RColorBrewer_1.1-3 abind_1.4-8 #> [31] zlibbioc_1.53.0 Rtsne_0.17 #> [33] purrr_1.0.2 pracma_2.4.4 #> [35] yulab.utils_0.1.8 tweenr_2.0.3 #> [37] GenomeInfoDbData_1.2.13 ggrepel_0.9.6 #> [39] irlba_2.3.5.1 listenv_0.9.1 #> [41] spatstat.utils_3.1-1 tidytree_0.4.6 #> [43] goftest_1.2-3 RSpectra_0.16-2 #> [45] spatstat.random_3.3-2 dqrng_0.4.1 #> [47] fitdistrplus_1.2-1 parallelly_1.40.1 #> [49] DelayedMatrixStats_1.29.0 leiden_0.4.3.1 #> [51] codetools_0.2-20 DelayedArray_0.33.3 #> [53] ggforce_0.4.2 tidyselect_1.2.1 #> [55] aplot_0.2.3 UCSC.utils_1.3.0 #> [57] farver_2.1.2 ScaledMatrix_1.15.0 #> [59] spatstat.explore_3.3-3 jsonlite_1.8.9 #> [61] BiocNeighbors_2.1.2 progressr_0.15.1 #> [63] ggridges_0.5.6 survival_3.7-0 #> [65] tools_4.5.0 treeio_1.31.0 #> [67] ggstar_1.0.4 ica_1.0-3 #> [69] Rcpp_1.0.13-1 glue_1.8.0 #> [71] gridExtra_2.3 SparseArray_1.7.2 #> [73] xfun_0.49 dplyr_1.1.4 #> [75] withr_3.0.2 fastmap_1.2.0 #> [77] bluster_1.17.0 fansi_1.0.6 #> [79] digest_0.6.37 rsvd_1.0.5 #> [81] gridGraphics_0.5-1 R6_2.5.1 #> [83] mime_0.12 colorspace_2.1-1 #> [85] scattermore_1.2 tensor_1.5 #> [87] spatstat.data_3.1-4 utf8_1.2.4 #> [89] tidyr_1.3.1 data.table_1.16.4 #> [91] httr_1.4.7 htmlwidgets_1.6.4 #> [93] S4Arrays_1.7.1 uwot_0.2.2 #> [95] pkgconfig_2.0.3 gtable_0.3.6 #> [97] lmtest_0.9-40 XVector_0.47.0 #> [99] htmltools_0.5.8.1 dotCall64_1.2 #> [101] SeuratObject_5.0.2 scales_1.3.0 #> [103] png_0.1-8 spatstat.univar_3.1-1 #> [105] ggfun_0.1.8 knitr_1.49 #> [107] reshape2_1.4.4 rjson_0.2.23 #> [109] nlme_3.1-166 cachem_1.1.0 #> [111] zoo_1.8-12 stringr_1.5.1 #> [113] KernSmooth_2.23-24 parallel_4.5.0 #> [115] miniUI_0.1.1.1 pillar_1.9.0 #> [117] grid_4.5.0 vctrs_0.6.5 #> [119] RANN_2.6.2 promises_1.3.2 #> [121] BiocSingular_1.23.0 tidydr_0.0.5 #> [123] beachmat_2.23.4 xtable_1.8-4 #> [125] cluster_2.1.8 evaluate_1.0.1 #> [127] magick_2.8.5 locfit_1.5-9.10 #> [129] cli_3.6.3 compiler_4.5.0 #> [131] rlang_1.1.4 crayon_1.5.3 #> [133] future.apply_1.11.3 labeling_0.4.3 #> [135] plyr_1.8.9 fs_1.6.5 #> [137] stringi_1.8.4 viridisLite_0.4.2 #> [139] deldir_2.0-4 BiocParallel_1.41.0 #> [141] munsell_0.5.1 lazyeval_0.2.2 #> [143] spatstat.geom_3.3-4 Matrix_1.7-1 #> [145] RcppHNSW_0.6.0 patchwork_1.3.0 #> [147] sparseMatrixStats_1.19.0 future_1.34.0 #> [149] statmod_1.5.0 shiny_1.9.1 #> [151] ROCR_1.0-11 igraph_2.1.2 #> [153] RcppParallel_5.1.9 bslib_0.8.0 #> [155] ggtree_3.15.0 fastmatch_1.1-4 #> [157] ape_5.8 "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "Predicting cell states and their variability in single-cell or spatial omics data Introduction", " Predicting cell states and their variability in single-cell or spatial omics data Shuangbin Xu and GuangChuang Yu Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University xshuangbin@163.com and guangchuangyu@gmail.com 2024-12-13 Introduction Understanding the spatial distribution and interplay of cell states in tissue is critical for elucidating tissue formation and function. Single-cell and spatial omics present a promising approach to addressing this need. Traditional methods typically include the identification of highly variable genes, dimensionality reduction, clustering, and the annotation of cells or functions based on gene over-expression. Nevertheless, these qualitative approaches are inadequate for accurately mapping the distributions of spatial features. To address this, integrating biomedical knowledge such as Gene Ontology, KEGG, Reactome, transcription factors, and cell-type marker genes directly allows for the evaluation of cell states from gene expression data, creating quantitative functional pathway profiles at the single captured location. After quantifying cell functions, analyzing their spatial distribution and co-distribution with other features can provide deeper insights into related biological questions. We focus on three aspects: the spatial variability of cell functions, regions where these functions cluster, and their co-distribution patterns with other features. Although existing tools such as SPARK-X(Zhu, Sun, and Zhou 2021), nnSVG(Weber et al. 2023), SpatialDE(Svensson, Teichmann, and Stegle 2018), SpaGFT(Chang et al. 2024), Seurat(Hao et al. 2023), and Squidpy(Palla et al. 2022) facilitate the exploration of spatially variable genes, they are primarily designed for gene-level analysis and lack the capability to investigate the spatial co-distribution of features. Additionally, many of these tools, including SpatialDE(Svensson, Teichmann, and Stegle 2018), SPARK(Sun, Zhu, and Zhou 2020), MERINGUE(Miller et al. 2021), and nnSVG(Weber et al. 2023), face challenges in handling large-scale spatial transcriptome data due to high memory consumption and low computational efficiency. To fill the gaps, we developed SVP to accurately predict cell states, explore their spatial distribution, and assess their spatial relationship with other features. References "],["overview-of-svp.html", "1 Overview of SVP", " 1 Overview of SVP The evaluation of functional status at individual locations captured by SVP is achieved using Multiple Correspondence Analysis (MCA) for dimensionality reduction. This process employs a standardized gene expression matrix to project both cells and genes into a unified MCA space. It has been established that this method allows for the calculation of distances not only between genes and cells but also between cells and genes, thereby facilitating the assessment of their associations(Cortal et al. 2021). Proximity in this space indicates a stronger relationship. These calculated distances are crucial for constructing a weighted k-nearest neighbors (KNN) network, linking each cell or gene to its most relevant counterparts. To discern features with varying levels of proximity, distances are first normalized to a 0-1 scale, with closer distances approaching 1 and farther distances approaching 0. This normalization is followed by division by the total distance among the nearest features, thereby assigning greater connection weights to closer features. Subsequently, databases of known biological knowledge, such as transcription factor target gene sets, Reactome functional gene sets, and cell-type marker gene sets, serve as initial seeds. Random walks on the constructed weighted KNN network yield preliminary functional state activity scores for each location. To mitigate potential biases introduced by dimensionality reduction, a hypergeometric distribution test enhances the enrichment analysis of top-ranking genes extracted directly from the expression matrix at each location. These analyses provide weights for functional activities, culminating in the derivation of functional activity scores at the single captured location (Fig. 1A). To identify spatially variable cell functions, we first established a cell neighbor weight matrix based on spot locations using the Delaunay triangulation (default) or KNN algorithm. This weight matrix, alongside global autocorrelation analyses such as Moran’s I (default), Geary’s C, or Getis-Ord’s G, facilitated the identification of spatially variable cell functions or gene characteristics. Additionally, we utilized the same cell neighbor weight matrix and local spatial autocorrelation algorithms (Local Getis-Ord or Local Moran) to delineate the local spatial distribution of these variable features (Fig. 1B). To examine spatial co-distribution among cell functions, we designed a bivariate spatial global and local autocorrelation algorithm employing the Lee index. This approach enables the assessment of correlation between different cell characteristics in their spatial distribution (Fig. 1C). Figure 1.1: Overview of SVP References "],["quantification-of-cell-states-using-svp.html", "2 Quantification of cell states using SVP 2.1 Quantification of CancerSEA etc function 2.2 Quantification of cell-type", " 2 Quantification of cell states using SVP SVP can integrate gene or protein lists from existing biomedical databases like cell-type markers (PanglaoDB, CellMarkers), gene ontology (GO), molecular features (MSigDB), and pathways (KEGG, Reactome, Wikipathways), transcription factors, and disease ontology, to assess cell states. Of course, the cell-type markers also can be extracted from reference single cell data. In this vignette, we use the spatial transcriptome and single cell transcriptome of a human pancreatic ductal adenocarcinomas from the article(Moncada et al. 2020) to demonstrate. Here, we first perform the normalization by logNormCounts of scuttle and the MCA dimensionality reduction by runMCA of SVP. # load the packages required in the vignette library(SpatialExperiment) library(SingleCellExperiment) library(scuttle) library(scran) library(SVP) library(ggsc) library(ggplot2) library(magrittr) # load the spatial transcriptome, it is a SpatialExperiment object # to build the object, you can refer to the SpatialExperiment package pdac_a_spe <- readRDS(url("https://raw.githubusercontent.com/xiangpin/Data_for_Vignette_SVP/main/data/pdac_a_spe.rds")) # we had removed the genes that did not express in any captured location. # First, we use logNormCounts of scuttle to normalize the spatial transcriptome pdac_a_spe <- logNormCounts(pdac_a_spe) # Now the normalized counts was added to the assays of pdac_a_spe as `logcounts` # it can be extracted via logcounts(pdac_a_spe) or assay(pdac_a_spe, 'logcounts') # Next, we use `runMCA` of `SVP` to preform the MCA dimensionality reduction pdac_a_spe <- runMCA(pdac_a_spe, assay.type = 'logcounts') #> Computing Fuzzy Matrix #> Computing SVD #> Computing Coordinates # The MCA result was added to the reducedDims, it can be extracted by reducedDim(pdac_a_spe, 'MCA') # more information can refer to SingleCellExperiment package pdac_a_spe #> class: SpatialExperiment #> dim: 13160 428 #> metadata(0): #> assays(2): counts logcounts #> rownames(13160): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): sample_id cluster_domain sizeFactor #> reducedDimNames(1): MCA #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(1): sample_id 2.1 Quantification of CancerSEA etc function SVP accepts SingleCellExperiment or SpatialExperiment object as input. The gene set should be a named list object, and the elements of the list must be included in the row names of the SingleCellExperiment or SpatialExperiment object. To quantifying the CancerSEA or other function for each captured location, the command runSGSA is used. # This is gene list of Cancer Single-cell State Atlas to comprehensively decode distinct # functional states of cancer cells at single cell resolution. data(CancerSEASymbol) # It contains 14 types of cancer single-cell state, you can obtain # more information via ?CancerSEASymbol names(CancerSEASymbol) #> [1] "Angiogenesis" "Apoptosis" "Cell_Cycle" "Differentiation" #> [5] "DNA_damage" "DNA_repair" "EMT" "Hypoxia" #> [9] "Inflammation" "Invasion" "Metastasis" "Proliferation" #> [13] "Quiescence" "Stemness" pdac_a_spe <- runSGSA(pdac_a_spe, gset.idx.list = CancerSEASymbol, # The target gene set list assay.type = 'logcounts', # The name of assays of gene profiler gsvaExp.name = 'CancerSEA' # The name of result to save to gsvaExps slot ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 3.968000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.040000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.117000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.007000 seconds #> ORA analysis ... #> elapsed time is 0.010000 seconds # Then the result was added to gsvaExps to return a SVPExperiment object, the result # can be extracted with gsvaExp, you can view more information via help(gsvaExp). pdac_a_spe #> class: SVPExperiment #> dim: 13160 428 #> metadata(0): #> assays(2): counts logcounts #> rownames(13160): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): sample_id cluster_domain sizeFactor #> reducedDimNames(1): MCA #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): #> gsvaExps names(1) : CancerSEA gsvaExpNames(pdac_a_spe) #> [1] "CancerSEA" # The result is also a SingleCellExperiment or SpatialExperiment. gsvaExp(pdac_a_spe, 'CancerSEA') #> class: SpatialExperiment #> dim: 14 428 #> metadata(0): #> assays(1): affi.score #> rownames(14): Angiogenesis Apoptosis ... Quiescence Stemness #> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate #> geneSets #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): cluster_domain sizeFactor sample_id #> reducedDimNames(0): #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): # We use sc_spatial of ggsc to visualize the distribution of each functions # Note: when the number of features is too large, we reco gsvaExp(pdac_a_spe, 'CancerSEA') %>% sc_spatial( features = rownames(.), mapping = aes(x = x, y = y), geom = geom_bgpoint, pointsize = 1.2, ncol = 4, common.legend = FALSE # The output is a patchwork ) & scale_color_viridis_c(option='B', guide='none') #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. Figure 2.1: The heatmap of each CancerSEA function # The other function also can be quantified with runSGSA # the gset.idx.list supports the named gene list, gson object or # locate gmt file or html url of gmt. pdac_a_spe <- runSGSA( pdac_a_spe, gset.idx.list = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.2.Hs/h.all.v2023.2.Hs.symbols.gmt", gsvaExp.name = 'hallmark', assay.type = 'logcounts' # default ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 3.566000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.026000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.206000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.012000 seconds #> ORA analysis ... #> elapsed time is 0.021000 seconds # The result also was added to gsvaExps pdac_a_spe #> class: SVPExperiment #> dim: 13160 428 #> metadata(0): #> assays(2): counts logcounts #> rownames(13160): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): sample_id cluster_domain sizeFactor #> reducedDimNames(1): MCA #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): #> gsvaExps names(2) : CancerSEA hallmark 2.2 Quantification of cell-type To quantifying the cell-type for each captured location, the gene set should be the marker gene sets of cell-type. It can be the pre-established marker gene sets, such as the markers from CellMarkers or PanglaoDB. High-quality cell marker genes from past reports or your own compilations are acceptable. In addition, It can also be extracted from the reference single cell data. The following sections will detail the specific operations of each approach. 2.2.1 Quantification of cell-type using the pre-established marker gene sets We can use the pre-established marker gene sets to quantifying the cell-type for each captured location. Here, we used the markers obtained from the CARD, which was used as input for CARDfree command. # The marker gene sets load(url("https://github.com/YingMa0107/CARD/blob/master/data/markerList.RData?raw=true")) markerList |> names() #> [1] "Acinar_cells" #> [2] "Ductal_terminal_ductal_like" #> [3] "Ductal_CRISP3_high-centroacinar_like" #> [4] "Cancer_clone_A" #> [5] "Ductal_MHC_Class_II" #> [6] "Cancer_clone_B" #> [7] "mDCs_A" #> [8] "Ductal_APOL1_high-hypoxic" #> [9] "Tuft_cells" #> [10] "mDCs_B" #> [11] "pDCs" #> [12] "Endocrine_cells" #> [13] "Endothelial_cells" #> [14] "Macrophages_A" #> [15] "Mast_cells" #> [16] "Macrophages_B" #> [17] "T_cells_&_NK_cells" #> [18] "Monocytes" #> [19] "RBCs" #> [20] "Fibroblasts" # We can view the marker gene number of each cell-type. lapply(markerList, length) |> unlist() #> Acinar_cells Ductal_terminal_ductal_like #> 49 13 #> Ductal_CRISP3_high-centroacinar_like Cancer_clone_A #> 34 81 #> Ductal_MHC_Class_II Cancer_clone_B #> 7 110 #> mDCs_A Ductal_APOL1_high-hypoxic #> 292 47 #> Tuft_cells mDCs_B #> 128 209 #> pDCs Endocrine_cells #> 134 182 #> Endothelial_cells Macrophages_A #> 165 210 #> Mast_cells Macrophages_B #> 286 167 #> T_cells_&_NK_cells Monocytes #> 89 160 #> RBCs Fibroblasts #> 92 224 pdac_a_spe <- runSGSA( pdac_a_spe, gset.idx.list = markerList, gsvaExp.name = 'CellTypeFree', assay.type = 'logcounts' ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 3.473000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.012000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.126000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.007000 seconds #> ORA analysis ... #> elapsed time is 0.011000 seconds # Then we can visualize the cell-type activity via sc_spatial of SVP with pie plot gsvaExp(pdac_a_spe, "CellTypeFree") |> sc_spatial( features = sort(names(markerList)), mapping = aes(x = x, y = y), plot.pie = TRUE, color = NA, pie.radius.scale = .9 ) Figure 2.2: The pie plot of cell-type activity We cal also display each cell-type activity with heat-map according to the spatial coordinate. pdac_a_spe |> gsvaExp(3) |> sc_spatial( features = sort(names(markerList)), mapping = aes(x = x, y = y), geom = geom_bgpoint, pointsize = 1.2, ncol = 4 ) + scale_color_viridis_c() #> Scale for colour is already present. #> Adding another scale for colour, which will replace the existing scale. Figure 2.3: the heatmap of each cell-type activity 2.2.2 Quantification of cell-type using the reference single-cell data This approach requires obtaining the cell-type marker genes from the reference single-cell data. First, the reference single cell data should be normalized. Here, the command of logNormCounts of scuttle is used. Next, the MCA dimensionality reduction is performed by runMCA of SVP. Finally, the marker gene sets of cell-type are extracted via runDetectMarker of SVP. # load the reference single cell transcriptome # it is a SingleCellExperiment pdac_a_sce <- readRDS(url("https://raw.githubusercontent.com/xiangpin/Data_for_Vignette_SVP/main/data/pdac_a_sce.rds")) pdac_a_sce #> class: SingleCellExperiment #> dim: 15418 1926 #> metadata(0): #> assays(1): counts #> rownames(15418): 1-Mar 1-Sep ... ZZEF1 ZZZ3 #> rowData names(0): #> colnames(1926): Cell1 Cell2 ... Cell1925 Cell1926 #> colData names(1): Cell #> reducedDimNames(0): #> mainExpName: NULL #> altExpNames(0): pdac_a_sce <- logNormCounts(pdac_a_sce) # obtain the top high variable genes set.seed(123) stats <- modelGeneVar(pdac_a_sce) hvgs <- getTopHVGs(stats) # perform the MCA reduction. pdac_a_sce <- pdac_a_sce |> runMCA(assay.type = 'logcounts', subset.row=hvgs[seq(2000)]) #> Computing Fuzzy Matrix #> Computing SVD #> Computing Coordinates # obtain the marker gene sets from reference single-cell transcriptome based on MCA space refmakergene <- runDetectMarker(pdac_a_sce, group.by='Cell', ntop=200, present.prop.in.group=.2, present.prop.in.sample=.5) refmakergene |> lapply(length) |> unlist() #> Acinar cells Ductal - terminal ductal like #> 53 82 #> Ductal - CRISP3 high/centroacinar like Cancer clone A #> 90 90 #> Ductal - MHC Class II Cancer clone B #> 72 90 #> mDCs A Ductal - APOL1 high/hypoxic #> 89 92 #> Tuft cells mDCs B #> 87 71 #> pDCs Endocrine cells #> 50 60 #> Endothelial cells Macrophages A #> 81 76 #> Mast cells Macrophages B #> 120 48 #> T cells & NK cells Monocytes #> 39 56 #> RBCs Fibroblasts #> 9 97 Then, the quantification of the cell-type gene signatures can also be performed via runSGSA with the marker gene sets. # use the maker gene from the reference single-cell transcriptome pdac_a_spe <- runSGSA( pdac_a_spe, gset.idx.list = refmakergene, gsvaExp.name = 'CellTypeRef' ) #> Building the nearest neighbor graph with the distance between features and #> cells ... #> elapsed time is 3.518000 seconds #> Building the seed matrix using the gene set and the nearest neighbor graph for #> random walk with restart ... #> elapsed time is 0.011000 seconds #> Calculating the affinity score using random walk with restart ... #> elapsed time is 0.114000 seconds #> Tidying the result of random walk with restart ... #> elapsed time is 0.007000 seconds #> ORA analysis ... #> elapsed time is 0.010000 seconds pdac_a_spe |> gsvaExp('CellTypeRef') |> sc_spatial( features = sort(names(refmakergene)), mapping = aes(x = x, y = y), plot.pie = TRUE, color = NA, pie.radius.scale = .9 ) Figure 2.4: The pie plot of cell-type activity with the marker gene sets from reference single-cell transcriptome We can found that the result based on marker gene sets from reference single-cell transcriptome is similar with the result based on marker gene sets from previous reports. References "],["spatial-statistical-analysis.html", "3 Spatial statistical analysis 3.1 Univariate spatial statistical analysis 3.2 Bivariate spatial statistical analysis", " 3 Spatial statistical analysis SVP implements several spatial statistical algorithms to explore the spatial distribution of cell function or other features from three aspects: whether cell function states have spatial variability, where spatially variable cell functions cluster, and the co-distribution patterns of spatially variable cell functions with other features. The following sections provide a detailed description of each aspect. 3.1 Univariate spatial statistical analysis This section addresses whether the feature is spatially variable and identifies its high-cluster areas. In detail, to identify the spatial variable features, SVP implements three spatial autocorrelation algorithms using Rcpp and RcppParallel: global Moran’s I, Geary’s C and Getis-Ord’s G. To identify the spatial high-cluster areas, SVP implements two local univariate spatial statistical algorithms using Rcpp and RcppParallel: local Getis-Ord’s G and local Moran’s I. 3.1.1 Identification of spatial variable features Here, we identify the spatial variable cell-types using runDetectSVG, which implements the global Moran’s I, Geary’s C and Getis-Ord’s G algorithms. # First approach: # we can first obtain the target gsvaExp with gsvaExp names via # gsvaExp() function, for example, CellTypeFree pdac_a_spe |> gsvaExp("CellTypeFree") #> class: SpatialExperiment #> dim: 20 428 #> metadata(0): #> assays(1): affi.score #> rownames(20): Acinar_cells Ductal_terminal_ductal_like ... RBCs #> Fibroblasts #> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate #> geneSets #> colnames(428): Spot1 Spot2 ... Spot427 Spot428 #> colData names(3): cluster_domain sizeFactor sample_id #> reducedDimNames(0): #> mainExpName: NULL #> altExpNames(0): #> spatialCoords names(2) : x y #> imgData names(0): # Then we use runDetectSVG with method = 'moran' to identify the spatial variable # cell-types with global Moran's I. celltype_free.I <- pdac_a_spe |> gsvaExp("CellTypeFree") |> runDetectSVG( assay.type = 'affi.score', # because default is 'logcounts', it should be adjused method = 'moransi', action = 'only' ) #> Identifying the spatially variable gene sets (pathway) based on moransi #> elapsed time is 0.077000 seconds celltype_free.I |> dplyr::arrange(rank) |> head() #> obs expect.moransi sd.moransi #> Cancer_clone_A 0.7402441 -0.00234192 0.02807961 #> Cancer_clone_B 0.7187722 -0.00234192 0.02806609 #> Acinar_cells 0.6959057 -0.00234192 0.02788079 #> mDCs_A 0.5823319 -0.00234192 0.02760418 #> Ductal_CRISP3_high-centroacinar_like 0.5605952 -0.00234192 0.02794187 #> Endothelial_cells 0.5362277 -0.00234192 0.02757573 #> Z.moransi pvalue padj rank #> Cancer_clone_A 26.44573 2.042568e-154 4.085135e-153 1 #> Cancer_clone_B 25.69343 6.921386e-146 6.921386e-145 2 #> Acinar_cells 25.04404 1.013736e-138 6.758238e-138 3 #> mDCs_A 21.18062 7.205485e-100 3.602743e-99 4 #> Ductal_CRISP3_high-centroacinar_like 20.14672 1.437699e-90 5.750794e-90 5 #> Endothelial_cells 19.53057 3.018138e-85 1.006046e-84 6 # Second approach: # the result was added to the original object by specific the # gsvaexp argument in the runDetectSVG pdac_a_spe <- pdac_a_spe |> runDetectSVG( gsvaexp = 'CellTypeFree', method = 'moran' ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on moransi #> elapsed time is 0.064000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.moransi' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified #> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment` #> or `SpatialExperiment`),then also using `svDf()` to extract. gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi") |> dplyr::arrange(rank) |> head() #> obs expect.moransi sd.moransi #> Cancer_clone_A 0.7402441 -0.00234192 0.02807961 #> Cancer_clone_B 0.7187722 -0.00234192 0.02806609 #> Acinar_cells 0.6959057 -0.00234192 0.02788079 #> mDCs_A 0.5823319 -0.00234192 0.02760418 #> Ductal_CRISP3_high-centroacinar_like 0.5605952 -0.00234192 0.02794187 #> Endothelial_cells 0.5362277 -0.00234192 0.02757573 #> Z.moransi pvalue padj rank #> Cancer_clone_A 26.44573 2.042568e-154 4.085135e-153 1 #> Cancer_clone_B 25.69343 6.921386e-146 6.921386e-145 2 #> Acinar_cells 25.04404 1.013736e-138 6.758238e-138 3 #> mDCs_A 21.18062 7.205485e-100 3.602743e-99 4 #> Ductal_CRISP3_high-centroacinar_like 20.14672 1.437699e-90 5.750794e-90 5 #> Endothelial_cells 19.53057 3.018138e-85 1.006046e-84 6 The obs is the Moran’s I, expect.moransi is the expect Moran’s I (E(I)). Moran’s I quantifies the correlation between a single feature at a specific spatial point or region and its neighboring points or regions. Global Moran’s I usually ranges from -1 to 1, Global Moran’s I significantly above E(I) indicate a distinct spatial pattern or spatial clustering, which occurs when neighboring captured locations tend to have similar feature value. Global Moran’s I around E(I) suggest random spatial expression, that is, absence of spatial pattern. Global Moran’s I significantly below E(I) imply a chessboard-like pattern or dispersion. # using Geary's C pdac_a_spe <- pdac_a_spe |> runDetectSVG( gsvaexp = 'CellTypeFree', method = 'geary' ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on gearysc #> elapsed time is 0.031000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.gearysc' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified #> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment` #> or `SpatialExperiment`),then also using `svDf()` to extract. gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf('sv.gearysc') |> dplyr::arrange(rank) |> head() #> obs expect.gearysc sd.gearysc #> Cancer_clone_A 0.2709051 1 0.02873561 #> Cancer_clone_B 0.2880919 1 0.02879022 #> Acinar_cells 0.3055211 1 0.02952587 #> Ductal_CRISP3_high-centroacinar_like 0.4213014 1 0.02928596 #> mDCs_A 0.4603825 1 0.03058249 #> Endothelial_cells 0.4843175 1 0.03068854 #> Z.gearysc pvalue padj rank #> Cancer_clone_A 25.37252 2.535621e-142 5.071243e-141 1 #> Cancer_clone_B 24.72743 2.711724e-135 2.711724e-134 2 #> Acinar_cells 23.52103 1.242716e-122 8.284772e-122 3 #> Ductal_CRISP3_high-centroacinar_like 19.76027 3.272367e-87 1.636183e-86 4 #> mDCs_A 17.64466 5.592678e-70 2.237071e-69 5 #> Endothelial_cells 16.80375 1.145504e-63 3.818347e-63 6 The obs is the Geary’s C. Global Geary’s C is another popular index of global spatial autocorrelation, but focuses more on differences between neighboring values. Global Geary’s C usually range from 0 to 2. Global Geary’s C significantly below E[C]=1 indicate a distinct spatial pattern or spatial clustering, the value significantly above E[C]=1 suggest a chessboard-like pattern or dispersion, the value around E[C]=1 imply absence of spatial pattern. # using Getis-ord's G pdac_a_spe <- pdac_a_spe |> runDetectSVG( gsvaexp = 'CellTypeFree', method = 'getis' ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on getisord #> elapsed time is 0.030000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.getisord' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified #> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment` #> or `SpatialExperiment`),then also using `svDf()` to extract. gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf('sv.getisord') |> dplyr::arrange(rank) |> head() #> obs expect.G sd.G #> Cancer_clone_A 0.007214987 0.00234192 0.0001844816 #> Cancer_clone_B 0.007406175 0.00234192 0.0001975938 #> Acinar_cells 0.007109879 0.00234192 0.0001920919 #> mDCs_A 0.005853463 0.00234192 0.0001651428 #> Ductal_CRISP3_high-centroacinar_like 0.005800695 0.00234192 0.0001749367 #> Endothelial_cells 0.005749155 0.00234192 0.0001750884 #> Z.G pvalue padj rank #> Cancer_clone_A 26.41492 4.617621e-154 9.235241e-153 1 #> Cancer_clone_B 25.62962 3.567592e-145 3.567592e-144 2 #> Acinar_cells 24.82124 2.644298e-136 1.762865e-135 3 #> mDCs_A 21.26368 1.231569e-100 6.157844e-100 4 #> Ductal_CRISP3_high-centroacinar_like 19.77158 2.615753e-87 1.046301e-86 5 #> Endothelial_cells 19.46009 1.196819e-84 3.989397e-84 6 The obs is the Getis-Ord’s G of each cell-type. Global Getis-Ord’s G measures global spatial autocorrelation by evaluating the clustering of high/low value, specifically identifying hot spots or cold spots. Global Getis-Ord’s G values have no fixed range, as they depend on the specific dataset and spatial weight matrix. Global Getis-Ord’s G significantly above E(G) indicate a distinct spatial pattern or spatial clustering, which occurs when neighboring captured locations tend to have similar feature value. Global Getis-Ord’s G around E(G) suggest random spatial expression, that is, absence of spatial pattern. Global Getis-Ord’s G significantly below E(G) imply a chessboard-like pattern or dispersion. So the Cancer clone A, Cancer clone B, and Acinar cells were more concentrated in the spatial distribution compare to others. 3.1.2 Identification of local spatial aggregation areas Next, we can identify the spatial high-cluster areas for the spatial variable features via runLISA, which provides two local spatial statistics algorithms: Local Getis-Ord’s G and Local Moran’s I celltypefree_lisares <- pdac_a_spe |> gsvaExp("CellTypeFree") %>% runLISA(features=rownames(.), assay.type=1) celltypefree_lisares$Cancer_clone_A |> head() #> Gi E.Gi Var.Gi Z.Gi Pr (z != E(Gi)) #> Spot1 0.0001746202 0.00234192 1.649943e-06 -1.687270 0.09155141 #> Spot2 0.0003158584 0.00234192 2.996272e-06 -1.170475 0.24180992 #> Spot3 0.0002774819 0.00234192 2.489785e-06 -1.308341 0.19075761 #> Spot4 0.0001673520 0.00234192 2.489501e-06 -1.378215 0.16813699 #> Spot5 0.0004016950 0.00234192 2.994463e-06 -1.121225 0.26219219 #> Spot6 0.0004403359 0.00234192 2.130895e-06 -1.302670 0.19268729 #> cluster.no.test cluster.test #> Spot1 Low NoSign #> Spot2 Low NoSign #> Spot3 Low NoSign #> Spot4 Low NoSign #> Spot5 Low NoSign #> Spot6 Low NoSign Local Getis-Ord’s G (Gi) measures the aggregation of a feature value in a specific spot and its neighbors, determining if locations are significantly clustered with high (hot spots) or low values (cold spots). Gi usually have no fixed range, a larger Gi indicates a stronger clustering of high expression for the feature in captured location and its neighboring spots, while a smaller Gi indicates a stronger clustering of low expression. gsvaExp(pdac_a_spe, 'CellTypeFree') |> plot_lisa_feature( lisa.res = celltypefree_lisares[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')], assay.type = 1, gap_line_width = .18 ) Figure 3.1: The result of local Getis-Ord for cell-type The highlighted area signifies that the feature is significantly clustered in this region. We also can use local Moran’s I method to identify the region. celltypefree_lisa.i <- pdac_a_spe |> gsvaExp("CellTypeFree") %>% runLISA( features = rownames(.), assay.type = 1, method = 'localmoran' ) celltypefree_lisa.i$Cancer_clone_A |> head() #> Ii E.Ii Var.Ii Z.Ii Pr (z != E(Ii)) #> Spot1 0.2411323 -0.0004394871 0.02049855 1.687270 0.09155141 #> Spot2 0.2610045 -0.0005894003 0.04994947 1.170475 0.24180992 #> Spot3 0.2907443 -0.0007044064 0.04962291 1.308341 0.19075761 #> Spot4 0.3124376 -0.0007329623 0.05163310 1.378215 0.16813699 #> Spot5 0.2792327 -0.0007358277 0.06234951 1.121225 0.26219219 #> Spot6 0.2256567 -0.0005002450 0.03014053 1.302670 0.19268729 #> cluster.no.test cluster.test #> Spot1 Low-Low NoSign #> Spot2 Low-Low NoSign #> Spot3 Low-Low NoSign #> Spot4 Low-Low NoSign #> Spot5 Low-Low NoSign #> Spot6 Low-Low NoSign Local Moran’s I (Ii) measures the spatial correlation between a feature’s values in a specific area and its neighbors, assessing the similarity or difference between a location’s value and those of its surroundings. Ii also have no fixed range, a larger Ii signifies a stronger positive correlation in a region and its neighbors, with clusters of high or low values. A smaller value indicates a negative correlation, where high values are next to low values, or vice versa. gsvaExp(pdac_a_spe, "CellTypeFree") |> plot_lisa_feature( lisa.res = celltypefree_lisa.i[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')], clustertype = 'High-High', assay.type = 1, gap_line_width = .18 ) Figure 3.2: The result of local Moran for cell-type 3.2 Bivariate spatial statistical analysis Next, to explore the bivariate spatial co-distribution, SVP implements global and local bivariate spatial statistics algorithms. Global bivariate spatial statistics algorithm can be used to identify whether the bivariate is co-cluster in the specific space. Local bivariate spatial algorithm is to find the areas of spatial co-clustering of bivariate. 3.2.1 Global bivariate spatial analysis Here, we use runGLOBALBV of SVP to explore the spatial co-distribution between the cell-types. gbv.res <- pdac_a_spe |> gsvaExp('CellTypeFree') |> runGLOBALBV( features1 = names(markerList), assay.type = 1, permutation = NULL, # if permutation is NULL, mantel test will be used to calculated the pvalue add.pvalue = TRUE, alternative = 'greater' # one-side test ) # gbv.res is a list contained Lee and pvalue matrix. gbv.res |> as_tbl_df(diag=FALSE, flag.clust = TRUE) |> dplyr::filter(grepl("Cancer", x)) |> dplyr::arrange(desc(Lee)) #> # A tibble: 35 × 4 #> x y Lee pvalue #> <fct> <fct> <dbl> <dbl> #> 1 Cancer_clone_A Cancer_clone_B 0.728 0 #> 2 Cancer_clone_B Fibroblasts 0.185 2.04e-27 #> 3 Cancer_clone_A Fibroblasts 0.180 5.82e-27 #> 4 Cancer_clone_B Endothelial_cells 0.000286 4.99e- 1 #> 5 Cancer_clone_A Endothelial_cells -0.0169 8.07e- 1 #> 6 Cancer_clone_B T_cells_&_NK_cells -0.0442 9.97e- 1 #> 7 Cancer_clone_A RBCs -0.0443 9.80e- 1 #> 8 Cancer_clone_B RBCs -0.0513 9.91e- 1 #> 9 Cancer_clone_A T_cells_&_NK_cells -0.0696 1.00e+ 0 #> 10 Cancer_clone_B Macrophages_A -0.0773 1.00e+ 0 #> # ℹ 25 more rows The Lee value typically ranges from -1 to 1. When the index is closer to 1, it indicates a strong positive spatial association between the two variables, meaning the high-value areas of one variable tend to overlap with the high-value areas of the other, and low-value areas similarly overlap, showing strong spatial consistency. Conversely, if the index is near -1, it suggests a strong negative spatial association, where the high-value areas of one variable tend to overlap with the low-value areas of the other. We developed a function plot_heatmap_globalbv to visualize the result of global bivariate spatial analysis # We can also display the result of global univariate spatial analysis moran.res <- gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi") # The result of local univariate spatial analysis also can be added lisa.f1.res <- gsvaExp(pdac_a_spe, "CellTypeFree", withColData = TRUE) |> cal_lisa_f1(lisa.res = celltypefree_lisares, group.by = 'cluster_domain') plot_heatmap_globalbv(gbv.res, moran.t=moran.res, lisa.t=lisa.f1.res) #> Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height, #> resolveHJust(x$just, : semi-transparency is not supported on this device: #> reported only once per page Figure 3.3: The results of clustering analysis of spatial distribution between cell-type We can explore the spatial co-cluster between the different type of gsvaExp. For example, we explore the spatial co-distribution between the CellTypeFree and CancerSEA of pdac_a_spe object. # If the number of features is excessive, we recommend selecting # those with spatial variability. # celltypeid <- gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf("sv.moransi") |> # dplyr::filter(padj<=0.01) |> rownames() celltypeid <- rownames(gsvaExp(pdac_a_spe, "CellTypeFree")) cancerseaid <- gsvaExp(pdac_a_spe, "CancerSEA") |> runDetectSVG(assay.type = 1, verbose=FALSE) |> svDf() |> dplyr::filter(padj <= 0.01) |> rownames() #> elapsed time is 0.032000 seconds runGLOBALBV( pdac_a_spe, gsvaexp = c("CellTypeFree", "CancerSEA"), gsvaexp.features = c(celltypeid, cancerseaid), permutation = NULL, add.pvalue = TRUE ) -> res.gbv.celltype.cancersea #> The `gsvaexp` is specified, the specified `gsvaExp` will be #> used to perform the analysis. plot_heatmap_globalbv(res.gbv.celltype.cancersea) Figure 3.4: The heatmap of spatial correlation between cell tyep and CancerSEA states 3.2.2 Local bivariate spatial analysis After global bivariate spatial analysis, we found the Cancer cell and Fibroblasts show significant spatial co-aggregation. Next, we use runLOCALBV of SVP to identify the co-aggregation areas. lbv.res <- pdac_a_spe |> gsvaExp('CellTypeFree') |> runLOCALBV(features1='Fibroblasts', features2=c('Cancer_clone_A', "Cancer_clone_B"), assay.type=1) # The result can be added to gsvaExp, then visualized by ggsc. LISAsce(pdac_a_spe, lisa.res=lbv.res, gsvaexp.name='LBV.res') |> gsvaExp("LBV.res") %>% plot_lisa_feature(lisa.res = lbv.res, assay.type = 1) Figure 3.5: The heatmap of Local bivariate spatial analysis The highlighted area denotes the region where the two variables significantly co-aggregate in space. "],["session-information.html", "4 Session information", " 4 Session information Here is the output of sessionInfo() on the system on which this document was compiled: #> R Under development (unstable) (2024-12-10 r87437) #> Platform: x86_64-pc-linux-gnu #> Running under: CentOS Linux 7 (Core) #> #> Matrix products: default #> BLAS: /biostack/tools/devtools/R/devel_241212/lib64/R/lib/libRblas.so #> LAPACK: /biostack/tools/devtools/R/devel_241212/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0 #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C #> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 #> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 #> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C #> [9] LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C #> #> time zone: Asia/Shanghai #> tzcode source: system (glibc) #> #> attached base packages: #> [1] stats4 stats graphics grDevices utils datasets methods #> [8] base #> #> other attached packages: #> [1] scatterpie_0.2.4 magrittr_2.0.3 #> [3] ggplot2_3.5.1 SVP_0.99.0 #> [5] scran_1.35.0 scuttle_1.17.0 #> [7] ggsc_1.5.0 SpatialExperiment_1.17.0 #> [9] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0 #> [11] Biobase_2.67.0 GenomicRanges_1.59.1 #> [13] GenomeInfoDb_1.43.2 IRanges_2.41.2 #> [15] S4Vectors_0.45.2 BiocGenerics_0.53.3 #> [17] generics_0.1.3 MatrixGenerics_1.19.0 #> [19] matrixStats_1.4.1 bookdown_0.41 #> #> loaded via a namespace (and not attached): #> [1] RcppAnnoy_0.0.22 splines_4.5.0 #> [3] later_1.4.1 ggplotify_0.1.2 #> [5] tibble_3.2.1 polyclip_1.10-7 #> [7] fastDummies_1.7.4 lifecycle_1.0.4 #> [9] edgeR_4.5.1 globals_0.16.3 #> [11] lattice_0.22-6 MASS_7.3-61 #> [13] limma_3.63.2 plotly_4.10.4 #> [15] sass_0.4.9 rmarkdown_2.29 #> [17] jquerylib_0.1.4 yaml_2.3.10 #> [19] metapod_1.15.0 httpuv_1.6.15 #> [21] Seurat_5.1.0 sctransform_0.4.1 #> [23] spam_2.11-0 sp_2.1-4 #> [25] spatstat.sparse_3.1-0 reticulate_1.40.0 #> [27] cowplot_1.1.3 pbapply_1.7-2 #> [29] RColorBrewer_1.1-3 abind_1.4-8 #> [31] zlibbioc_1.53.0 Rtsne_0.17 #> [33] purrr_1.0.2 pracma_2.4.4 #> [35] yulab.utils_0.1.8 tweenr_2.0.3 #> [37] GenomeInfoDbData_1.2.13 ggrepel_0.9.6 #> [39] irlba_2.3.5.1 listenv_0.9.1 #> [41] spatstat.utils_3.1-1 tidytree_0.4.6 #> [43] goftest_1.2-3 RSpectra_0.16-2 #> [45] spatstat.random_3.3-2 dqrng_0.4.1 #> [47] fitdistrplus_1.2-1 parallelly_1.40.1 #> [49] DelayedMatrixStats_1.29.0 leiden_0.4.3.1 #> [51] codetools_0.2-20 DelayedArray_0.33.3 #> [53] ggforce_0.4.2 tidyselect_1.2.1 #> [55] aplot_0.2.3 UCSC.utils_1.3.0 #> [57] farver_2.1.2 ScaledMatrix_1.15.0 #> [59] spatstat.explore_3.3-3 jsonlite_1.8.9 #> [61] BiocNeighbors_2.1.2 progressr_0.15.1 #> [63] ggridges_0.5.6 survival_3.7-0 #> [65] tools_4.5.0 treeio_1.31.0 #> [67] ggstar_1.0.4 ica_1.0-3 #> [69] Rcpp_1.0.13-1 glue_1.8.0 #> [71] gridExtra_2.3 SparseArray_1.7.2 #> [73] xfun_0.49 dplyr_1.1.4 #> [75] withr_3.0.2 fastmap_1.2.0 #> [77] bluster_1.17.0 fansi_1.0.6 #> [79] digest_0.6.37 rsvd_1.0.5 #> [81] gridGraphics_0.5-1 R6_2.5.1 #> [83] mime_0.12 colorspace_2.1-1 #> [85] scattermore_1.2 tensor_1.5 #> [87] spatstat.data_3.1-4 utf8_1.2.4 #> [89] tidyr_1.3.1 data.table_1.16.4 #> [91] httr_1.4.7 htmlwidgets_1.6.4 #> [93] S4Arrays_1.7.1 uwot_0.2.2 #> [95] pkgconfig_2.0.3 gtable_0.3.6 #> [97] lmtest_0.9-40 XVector_0.47.0 #> [99] htmltools_0.5.8.1 dotCall64_1.2 #> [101] SeuratObject_5.0.2 scales_1.3.0 #> [103] png_0.1-8 spatstat.univar_3.1-1 #> [105] ggfun_0.1.8 knitr_1.49 #> [107] reshape2_1.4.4 rjson_0.2.23 #> [109] nlme_3.1-166 cachem_1.1.0 #> [111] zoo_1.8-12 stringr_1.5.1 #> [113] KernSmooth_2.23-24 parallel_4.5.0 #> [115] miniUI_0.1.1.1 pillar_1.9.0 #> [117] grid_4.5.0 vctrs_0.6.5 #> [119] RANN_2.6.2 promises_1.3.2 #> [121] BiocSingular_1.23.0 tidydr_0.0.5 #> [123] beachmat_2.23.4 xtable_1.8-4 #> [125] cluster_2.1.8 evaluate_1.0.1 #> [127] magick_2.8.5 locfit_1.5-9.10 #> [129] cli_3.6.3 compiler_4.5.0 #> [131] rlang_1.1.4 crayon_1.5.3 #> [133] future.apply_1.11.3 labeling_0.4.3 #> [135] plyr_1.8.9 fs_1.6.5 #> [137] stringi_1.8.4 viridisLite_0.4.2 #> [139] deldir_2.0-4 BiocParallel_1.41.0 #> [141] munsell_0.5.1 lazyeval_0.2.2 #> [143] spatstat.geom_3.3-4 Matrix_1.7-1 #> [145] RcppHNSW_0.6.0 patchwork_1.3.0 #> [147] sparseMatrixStats_1.19.0 future_1.34.0 #> [149] statmod_1.5.0 shiny_1.9.1 #> [151] ROCR_1.0-11 igraph_2.1.2 #> [153] RcppParallel_5.1.9 bslib_0.8.0 #> [155] ggtree_3.15.0 fastmatch_1.1-4 #> [157] ape_5.8 "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/session-information.html b/session-information.html index 6c9e944..be4e241 100644 --- a/session-information.html +++ b/session-information.html @@ -22,7 +22,7 @@ - + diff --git a/spatial-statistical-analysis.html b/spatial-statistical-analysis.html index de22b3f..b5f3b72 100644 --- a/spatial-statistical-analysis.html +++ b/spatial-statistical-analysis.html @@ -22,7 +22,7 @@ - + @@ -239,7 +239,7 @@

3.1.1 Identification of spatial v action = 'only' ) #> Identifying the spatially variable gene sets (pathway) based on moransi -#> elapsed time is 0.060000 seconds +#> elapsed time is 0.077000 seconds celltype_free.I |> dplyr::arrange(rank) |> head() #> obs expect.moransi sd.moransi @@ -267,7 +267,7 @@

3.1.1 Identification of spatial v ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on moransi -#> elapsed time is 0.077000 seconds +#> elapsed time is 0.064000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.moransi' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified @@ -297,7 +297,7 @@

3.1.1 Identification of spatial v ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on gearysc -#> elapsed time is 0.046000 seconds +#> elapsed time is 0.031000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.gearysc' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified @@ -328,7 +328,7 @@

3.1.1 Identification of spatial v ) #> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'. #> Identifying the spatially variable gene sets (pathway) based on getisord -#> elapsed time is 0.037000 seconds +#> elapsed time is 0.030000 seconds #> The result is added to the input object, which can be extracted using #> `svDf()` with type='sv.getisord' for `SingleCellExperiment` or `SpatialExperiment`. #> If input object is `SVPExperiment`, and `gsvaexp` is specified @@ -376,17 +376,17 @@

3.1.2 Identification of local spa #> Spot5 Low NoSign #> Spot6 Low NoSign

Local Getis-Ord’s G (Gi) measures the aggregation of a feature value in a specific spot and its neighbors, determining if locations are significantly clustered with high (hot spots) or low values (cold spots). Gi usually have no fixed range, a larger Gi indicates a stronger clustering of high expression for the feature in captured location and its neighboring spots, while a smaller Gi indicates a stronger clustering of low expression.

-

(ref:LISA_G_fig) The result of local Getis-Ord for cell-type

+
gsvaExp(pdac_a_spe, 'CellTypeFree') |> 
   plot_lisa_feature(
     lisa.res = celltypefree_lisares[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')], 
     assay.type = 1, 
     gap_line_width = .18
   )
-
-(ref:LISA_G_fig) +
+The result of local Getis-Ord for cell-type

-(#fig:LISA_G_fig)(ref:LISA_G_fig) +Figure 3.1: The result of local Getis-Ord for cell-type

The highlighted area signifies that the feature is significantly clustered in this region.

@@ -415,7 +415,7 @@

3.1.2 Identification of local spa #> Spot5 Low-Low NoSign #> Spot6 Low-Low NoSign

Local Moran’s I (Ii) measures the spatial correlation between a feature’s values in a specific area and its neighbors, assessing the similarity or difference between a location’s value and those of its surroundings. Ii also have no fixed range, a larger Ii signifies a stronger positive correlation in a region and its neighbors, with clusters of high or low values. A smaller value indicates a negative correlation, where high values are next to low values, or vice versa.

-

(ref:LISA_I_fig) The result of local Moran for cell-type

+
gsvaExp(pdac_a_spe, "CellTypeFree") |>
   plot_lisa_feature(
      lisa.res = celltypefree_lisa.i[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')], 
@@ -423,10 +423,10 @@ 

3.1.2 Identification of local spa assay.type = 1, gap_line_width = .18 )

-
-(ref:LISA_I_fig) +
+The result of local Moran for cell-type

-(#fig:LISA_I_fig)(ref:LISA_I_fig) +Figure 3.2: The result of local Moran for cell-type

@@ -466,7 +466,7 @@

3.2.1 Global bivariate spatial an #> # ℹ 25 more rows

The Lee value typically ranges from -1 to 1. When the index is closer to 1, it indicates a strong positive spatial association between the two variables, meaning the high-value areas of one variable tend to overlap with the high-value areas of the other, and low-value areas similarly overlap, showing strong spatial consistency. Conversely, if the index is near -1, it suggests a strong negative spatial association, where the high-value areas of one variable tend to overlap with the low-value areas of the other.

We developed a function plot_heatmap_globalbv to visualize the result of global bivariate spatial analysis

-

(ref:fig_gbv) The results of clustering analysis of spatial distribution between cell-type

+
# We can also display the result of global univariate spatial analysis 
 moran.res <- gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi")
 
@@ -478,14 +478,14 @@ 

3.2.1 Global bivariate spatial an #> Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height, #> resolveHJust(x$just, : semi-transparency is not supported on this device: #> reported only once per page

-
-(ref:fig_gbv) +
+The results of clustering analysis of spatial distribution between cell-type

-(#fig:fig_gbv)(ref:fig_gbv) +Figure 3.3: The results of clustering analysis of spatial distribution between cell-type

We can explore the spatial co-cluster between the different type of gsvaExp. For example, we explore the spatial co-distribution between the CellTypeFree and CancerSEA of pdac_a_spe object.

-

(ref:runGLOBALBV_CellType_CancerSEA) The heatmap of spatial correlation between cell tyep and CancerSEA states

+
# If the number of features is excessive, we recommend selecting 
 # those with spatial variability.
 # celltypeid <- gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf("sv.moransi") |> 
@@ -497,7 +497,7 @@ 

3.2.1 Global bivariate spatial an svDf() |> dplyr::filter(padj <= 0.01) |> rownames() -#> elapsed time is 0.037000 seconds +#> elapsed time is 0.032000 seconds runGLOBALBV( pdac_a_spe, @@ -510,10 +510,10 @@

3.2.1 Global bivariate spatial an #> used to perform the analysis. plot_heatmap_globalbv(res.gbv.celltype.cancersea)

-
-(ref:runGLOBALBV_CellType_CancerSEA) +
+The heatmap of spatial correlation between cell tyep and CancerSEA states

-(#fig:runGLOBALBV_CellType_CancerSEA)(ref:runGLOBALBV_CellType_CancerSEA) +Figure 3.4: The heatmap of spatial correlation between cell tyep and CancerSEA states

@@ -532,7 +532,7 @@

3.2.2 Local bivariate spatial ana
The heatmap of Local bivariate spatial analysis

-Figure 3.1: The heatmap of Local bivariate spatial analysis +Figure 3.5: The heatmap of Local bivariate spatial analysis

The highlighted area denotes the region where the two variables significantly co-aggregate in space.