How to run enrichment analysis of metabolites using clusterProfier #731

Krithika-Bhuvan · 2024-10-15T22:44:15Z

I have a list of metabolites, In the form of metabolite names, and also in the form of KEGG Ids. How can i run enrichment analysis on this data ? Please advise

Example data:

name keggid
Pyruvate  C00022
Acetyl-CoA  C00024
2OG  C00026
Glycine  C00037
Succinate  C00042
Aspartate  C00049

The text was updated successfully, but these errors were encountered:

guidohooiveld · 2024-10-22T22:54:21Z

A pointer is given in, for example, the protocol recently published in Nature Methods, See section 1.9 on page 16.
https://doi.org/10.1038/s41596-024-01020-z

In essence you can use the enrichKEGG function (since you have a list of KEGG ids), and use it with organism = "cpd" (because the input are 'compounds'.

> library(clusterProfiler)
> 
> input <- data.frame("name"=c("Pyruvate","Acetyl-CoA","2OG","Glycine","Succinate","Aspartate"),
+                     "keggid"=c("C00022","C00024","C00026","C00037","C00042","C00049") )
> 
> ## run over-representation analysis.
> ## note not cutoffs are applied for significanc and minimum number of compunds.
> cpd_enrich_result <- enrichKEGG(input[,"keggid"],
+                                 organism = "cpd",
+                                 minGSSize=1,
+                                 pvalueCutoff=1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/cpd/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway"...
> cpd_enrich_result
#
# over-representation test
#
#...@organism    KEGG Compound 
#...@ontology    KEGG 
#...@keytype     kegg_compound 
#...@gene        chr [1:6] "C00022" "C00024" "C00026" "C00037" "C00042" "C00049"
#...pvalues adjusted by 'BH' with cutoff <1 
#...103 enriched terms found
'data.frame':   103 obs. of  14 variables:
 $ category      : chr  "Human Diseases" "Metabolism" "Metabolism" "Metabolism" ...
 $ subcategory   : chr  "Cancer: overview" "Global and overview maps" "Chemical structure transformation maps" "Carbohydrate metabolism" ...
 $ ID            : chr  "map05230" "map01200" "map01060" "map00630" ...
 $ Description   : chr  "Central carbon metabolism in cancer" "Carbon metabolism" "Biosynthesis of plant secondary metabolites" "Glyoxylate and dicarboxylate metabolism" ...
 $ GeneRatio     : chr  "6/6" "6/6" "6/6" "5/6" ...
 $ BgRatio       : chr  "37/6500" "112/6500" "141/6500" "64/6500" ...
 $ RichFactor    : num  0.1622 0.0536 0.0426 0.0781 0.0746 ...
 $ FoldEnrichment: num  175.7 58 46.1 84.6 80.8 ...
 $ zScore        : num  32.4 18.5 16.5 20.4 20 ...
 $ pvalue        : num  2.22e-14 2.29e-11 9.38e-11 4.70e-10 5.95e-10 ...
 $ p.adjust      : num  2.29e-12 1.18e-09 3.22e-09 1.10e-08 1.10e-08 ...
 $ qvalue        : num  2.11e-13 1.08e-10 2.96e-10 1.01e-09 1.01e-09 ...
 $ geneID        : chr  "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042" ...
 $ Count         : int  6 6 6 5 5 5 4 4 4 4 ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z 

> 
> as.data.frame(cpd_enrich_result)[1:3, ]
               category                            subcategory       ID
map05230 Human Diseases                       Cancer: overview map05230
map01200     Metabolism               Global and overview maps map01200
map01060     Metabolism Chemical structure transformation maps map01060
                                         Description GeneRatio  BgRatio
map05230         Central carbon metabolism in cancer       6/6  37/6500
map01200                           Carbon metabolism       6/6 112/6500
map01060 Biosynthesis of plant secondary metabolites       6/6 141/6500
         RichFactor FoldEnrichment   zScore       pvalue     p.adjust
map05230 0.16216216      175.67568 32.38612 2.224526e-14 2.291262e-12
map01200 0.05357143       58.03571 18.50615 2.289233e-11 1.178955e-09
map01060 0.04255319       46.09929 16.45613 9.376114e-11 3.219132e-09
               qvalue                                    geneID Count
map05230 2.107446e-13 C00022/C00024/C00026/C00037/C00042/C00049     6
map01200 1.084374e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
map01060 2.960878e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
> 
> dotplot(cpd_enrich_result)
>

lungboyz · 2025-01-10T00:53:42Z

I tried to run this using a set of compound IDs, and I got an error. I also get the same error using your example code. , but if I run enrichKEGG using gene ID, it works fine, suggesting that the KEGG website is okay, but the cpd IDs don't map.

keggChem [1:13]
[1] "C00130" "C00946" "C02990" "C01762" "C01081" "C00315" "C00750" "C00882"
[9] "C05385" "C16300" "C03406" "C00357" "C05282"

kk <- enrichKEGG(gene = keggChem[1:10],
organism = 'cpd',
pvalueCutoff = 0.05)

Error in content[, 1] : subscript out of bounds

Krithika-Bhuvan · 2025-01-10T03:01:32Z

got an error. I also get the same error using your example code. , but if I run enrichKEGG using gene ID, it works fine, suggesting that the KEGG website is okay, but the cpd IDs don't map.

Hello @lungboyz - thanks for sharing. Would it be possible to share an both - example with error , and the example that worked, so its easier to compare and understand ? Much appreciated

guidohooiveld · 2025-01-10T07:32:10Z

@lungboyz

When using your 13 input ids it is working for me.... For testing purposes I would set all significance cutoffs to 1 (like I do below; so that no filtering is occurring). You also may need to use the latest version of clusterProfiler.

> library(clusterProfiler)
> 
> keggChem <- c("C00130","C00946","C02990","C01762","C01081","C00315","C00750",
+               "C00882","C05385","C16300","C03406","C00357","C05282")
> 
> cpd_enrich_result <- enrichKEGG(keggChem ,
+                                 organism = "cpd",
+                                 minGSSize=1,
+                                 pvalueCutoff=1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/cpd/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway"...
> 
> cpd_enrich_result
#
# over-representation test
#
#...@organism    KEGG Compound 
#...@ontology    KEGG 
#...@keytype     kegg_compound 
#...@gene        chr [1:13] "C00130" "C00946" "C02990" "C01762" "C01081" "C00315" "C00750" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...29 enriched terms found
'data.frame':   29 obs. of  14 variables:
 $ category      : chr  "Metabolism" "Metabolism" "Metabolism" "Metabolism" ...
 $ subcategory   : chr  "Metabolism of cofactors and vitamins" "Metabolism of other amino acids" "Chemical structure transformation maps" "Global and overview maps" ...
 $ ID            : chr  "map00770" "map00410" "map01065" "map01240" ...
 $ Description   : chr  "Pantothenate and CoA biosynthesis" "beta-Alanine metabolism" "Biosynthesis of alkaloids derived from histidine and purine" "Biosynthesis of cofactors" ...
 $ GeneRatio     : chr  "2/11" "2/11" "2/11" "4/11" ...
 $ BgRatio       : chr  "30/6502" "32/6502" "35/6502" "328/6502" ...
 $ RichFactor    : num  0.0667 0.0625 0.0571 0.0122 0.0526 ...
 $ FoldEnrichment: num  39.41 36.94 33.78 7.21 31.11 ...
 $ zScore        : num  8.68 8.39 8 4.75 7.66 ...
 $ pvalue        : num  0.0011 0.00126 0.0015 0.00158 0.00177 ...
 $ p.adjust      : num  0.0103 0.0103 0.0103 0.0103 0.0103 ...
 $ qvalue        : num  0.00447 0.00447 0.00447 0.00447 0.00447 ...
 $ geneID        : chr  "C00750/C00882" "C00315/C00750" "C00130/C01762" "C00130/C01081/C00750/C00882" ...
 $ Count         : int  2 2 2 4 2 2 2 2 2 2 ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320 

> 
> as.data.frame(cpd_enrich_result)[1:3,]
           category                            subcategory       ID
map00770 Metabolism   Metabolism of cofactors and vitamins map00770
map00410 Metabolism        Metabolism of other amino acids map00410
map01065 Metabolism Chemical structure transformation maps map01065
                                                         Description GeneRatio
map00770                           Pantothenate and CoA biosynthesis      2/11
map00410                                     beta-Alanine metabolism      2/11
map01065 Biosynthesis of alkaloids derived from histidine and purine      2/11
         BgRatio RichFactor FoldEnrichment   zScore      pvalue   p.adjust
map00770 30/6502 0.06666667       39.40606 8.679041 0.001103125 0.01026385
map00410 32/6502 0.06250000       36.94318 8.390155 0.001255496 0.01026385
map01065 35/6502 0.05714286       33.77662 8.003453 0.001501922 0.01026385
              qvalue        geneID Count
map00770 0.004470644 C00750/C00882     2
map00410 0.004470644 C00315/C00750     2
map01065 0.004470644 C00130/C01762     2
> 
> 
> packageVersion("clusterProfiler")
[1] ‘4.14.4’
>

guidohooiveld mentioned this issue Oct 23, 2024

KEGG analysis of metabolites #712

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to run enrichment analysis of metabolites using clusterProfier #731

How to run enrichment analysis of metabolites using clusterProfier #731

Krithika-Bhuvan commented Oct 15, 2024

guidohooiveld commented Oct 22, 2024

lungboyz commented Jan 10, 2025

Krithika-Bhuvan commented Jan 10, 2025 •

edited

Loading

guidohooiveld commented Jan 10, 2025 •

edited

Loading

How to run enrichment analysis of metabolites using clusterProfier #731

How to run enrichment analysis of metabolites using clusterProfier #731

Comments

Krithika-Bhuvan commented Oct 15, 2024

guidohooiveld commented Oct 22, 2024

lungboyz commented Jan 10, 2025

Krithika-Bhuvan commented Jan 10, 2025 • edited Loading

guidohooiveld commented Jan 10, 2025 • edited Loading

Krithika-Bhuvan commented Jan 10, 2025 •

edited

Loading

guidohooiveld commented Jan 10, 2025 •

edited

Loading