Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to run enrichment analysis of metabolites using clusterProfier #731

Open
Krithika-Bhuvan opened this issue Oct 15, 2024 · 4 comments
Open

Comments

@Krithika-Bhuvan
Copy link

I have a list of metabolites, In the form of metabolite names, and also in the form of KEGG Ids. How can i run enrichment analysis on this data ? Please advise

Example data:

name keggid
Pyruvate  C00022
Acetyl-CoA  C00024
2OG  C00026
Glycine  C00037
Succinate  C00042
Aspartate  C00049
@guidohooiveld
Copy link

A pointer is given in, for example, the protocol recently published in Nature Methods, See section 1.9 on page 16.
https://doi.org/10.1038/s41596-024-01020-z

In essence you can use the enrichKEGG function (since you have a list of KEGG ids), and use it with organism = "cpd" (because the input are 'compounds'.

> library(clusterProfiler)
> 
> input <- data.frame("name"=c("Pyruvate","Acetyl-CoA","2OG","Glycine","Succinate","Aspartate"),
+                     "keggid"=c("C00022","C00024","C00026","C00037","C00042","C00049") )
> 
> ## run over-representation analysis.
> ## note not cutoffs are applied for significanc and minimum number of compunds.
> cpd_enrich_result <- enrichKEGG(input[,"keggid"],
+                                 organism = "cpd",
+                                 minGSSize=1,
+                                 pvalueCutoff=1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/cpd/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway"...
> cpd_enrich_result
#
# over-representation test
#
#...@organism    KEGG Compound 
#...@ontology    KEGG 
#...@keytype     kegg_compound 
#...@gene        chr [1:6] "C00022" "C00024" "C00026" "C00037" "C00042" "C00049"
#...pvalues adjusted by 'BH' with cutoff <1 
#...103 enriched terms found
'data.frame':   103 obs. of  14 variables:
 $ category      : chr  "Human Diseases" "Metabolism" "Metabolism" "Metabolism" ...
 $ subcategory   : chr  "Cancer: overview" "Global and overview maps" "Chemical structure transformation maps" "Carbohydrate metabolism" ...
 $ ID            : chr  "map05230" "map01200" "map01060" "map00630" ...
 $ Description   : chr  "Central carbon metabolism in cancer" "Carbon metabolism" "Biosynthesis of plant secondary metabolites" "Glyoxylate and dicarboxylate metabolism" ...
 $ GeneRatio     : chr  "6/6" "6/6" "6/6" "5/6" ...
 $ BgRatio       : chr  "37/6500" "112/6500" "141/6500" "64/6500" ...
 $ RichFactor    : num  0.1622 0.0536 0.0426 0.0781 0.0746 ...
 $ FoldEnrichment: num  175.7 58 46.1 84.6 80.8 ...
 $ zScore        : num  32.4 18.5 16.5 20.4 20 ...
 $ pvalue        : num  2.22e-14 2.29e-11 9.38e-11 4.70e-10 5.95e-10 ...
 $ p.adjust      : num  2.29e-12 1.18e-09 3.22e-09 1.10e-08 1.10e-08 ...
 $ qvalue        : num  2.11e-13 1.08e-10 2.96e-10 1.01e-09 1.01e-09 ...
 $ geneID        : chr  "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042" ...
 $ Count         : int  6 6 6 5 5 5 4 4 4 4 ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z 

> 
> as.data.frame(cpd_enrich_result)[1:3, ]
               category                            subcategory       ID
map05230 Human Diseases                       Cancer: overview map05230
map01200     Metabolism               Global and overview maps map01200
map01060     Metabolism Chemical structure transformation maps map01060
                                         Description GeneRatio  BgRatio
map05230         Central carbon metabolism in cancer       6/6  37/6500
map01200                           Carbon metabolism       6/6 112/6500
map01060 Biosynthesis of plant secondary metabolites       6/6 141/6500
         RichFactor FoldEnrichment   zScore       pvalue     p.adjust
map05230 0.16216216      175.67568 32.38612 2.224526e-14 2.291262e-12
map01200 0.05357143       58.03571 18.50615 2.289233e-11 1.178955e-09
map01060 0.04255319       46.09929 16.45613 9.376114e-11 3.219132e-09
               qvalue                                    geneID Count
map05230 2.107446e-13 C00022/C00024/C00026/C00037/C00042/C00049     6
map01200 1.084374e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
map01060 2.960878e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
> 
> dotplot(cpd_enrich_result)
> 

image

@lungboyz
Copy link

I tried to run this using a set of compound IDs, and I got an error. I also get the same error using your example code. , but if I run enrichKEGG using gene ID, it works fine, suggesting that the KEGG website is okay, but the cpd IDs don't map.

keggChem [1:13]
[1] "C00130" "C00946" "C02990" "C01762" "C01081" "C00315" "C00750" "C00882"
[9] "C05385" "C16300" "C03406" "C00357" "C05282"

kk <- enrichKEGG(gene = keggChem[1:10],
organism = 'cpd',
pvalueCutoff = 0.05)

Error in content[, 1] : subscript out of bounds

@Krithika-Bhuvan
Copy link
Author

Krithika-Bhuvan commented Jan 10, 2025

got an error. I also get the same error using your example code. , but if I run enrichKEGG using gene ID, it works fine, suggesting that the KEGG website is okay, but the cpd IDs don't map.

Hello @lungboyz - thanks for sharing. Would it be possible to share an both - example with error , and the example that worked, so its easier to compare and understand ? Much appreciated

@guidohooiveld
Copy link

guidohooiveld commented Jan 10, 2025

@lungboyz

When using your 13 input ids it is working for me.... For testing purposes I would set all significance cutoffs to 1 (like I do below; so that no filtering is occurring). You also may need to use the latest version of clusterProfiler.

> library(clusterProfiler)
> 
> keggChem <- c("C00130","C00946","C02990","C01762","C01081","C00315","C00750",
+               "C00882","C05385","C16300","C03406","C00357","C05282")
> 
> cpd_enrich_result <- enrichKEGG(keggChem ,
+                                 organism = "cpd",
+                                 minGSSize=1,
+                                 pvalueCutoff=1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/cpd/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway"...
> 
> cpd_enrich_result
#
# over-representation test
#
#...@organism    KEGG Compound 
#...@ontology    KEGG 
#...@keytype     kegg_compound 
#...@gene        chr [1:13] "C00130" "C00946" "C02990" "C01762" "C01081" "C00315" "C00750" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...29 enriched terms found
'data.frame':   29 obs. of  14 variables:
 $ category      : chr  "Metabolism" "Metabolism" "Metabolism" "Metabolism" ...
 $ subcategory   : chr  "Metabolism of cofactors and vitamins" "Metabolism of other amino acids" "Chemical structure transformation maps" "Global and overview maps" ...
 $ ID            : chr  "map00770" "map00410" "map01065" "map01240" ...
 $ Description   : chr  "Pantothenate and CoA biosynthesis" "beta-Alanine metabolism" "Biosynthesis of alkaloids derived from histidine and purine" "Biosynthesis of cofactors" ...
 $ GeneRatio     : chr  "2/11" "2/11" "2/11" "4/11" ...
 $ BgRatio       : chr  "30/6502" "32/6502" "35/6502" "328/6502" ...
 $ RichFactor    : num  0.0667 0.0625 0.0571 0.0122 0.0526 ...
 $ FoldEnrichment: num  39.41 36.94 33.78 7.21 31.11 ...
 $ zScore        : num  8.68 8.39 8 4.75 7.66 ...
 $ pvalue        : num  0.0011 0.00126 0.0015 0.00158 0.00177 ...
 $ p.adjust      : num  0.0103 0.0103 0.0103 0.0103 0.0103 ...
 $ qvalue        : num  0.00447 0.00447 0.00447 0.00447 0.00447 ...
 $ geneID        : chr  "C00750/C00882" "C00315/C00750" "C00130/C01762" "C00130/C01081/C00750/C00882" ...
 $ Count         : int  2 2 2 4 2 2 2 2 2 2 ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320 

> 
> as.data.frame(cpd_enrich_result)[1:3,]
           category                            subcategory       ID
map00770 Metabolism   Metabolism of cofactors and vitamins map00770
map00410 Metabolism        Metabolism of other amino acids map00410
map01065 Metabolism Chemical structure transformation maps map01065
                                                         Description GeneRatio
map00770                           Pantothenate and CoA biosynthesis      2/11
map00410                                     beta-Alanine metabolism      2/11
map01065 Biosynthesis of alkaloids derived from histidine and purine      2/11
         BgRatio RichFactor FoldEnrichment   zScore      pvalue   p.adjust
map00770 30/6502 0.06666667       39.40606 8.679041 0.001103125 0.01026385
map00410 32/6502 0.06250000       36.94318 8.390155 0.001255496 0.01026385
map01065 35/6502 0.05714286       33.77662 8.003453 0.001501922 0.01026385
              qvalue        geneID Count
map00770 0.004470644 C00750/C00882     2
map00410 0.004470644 C00315/C00750     2
map01065 0.004470644 C00130/C01762     2
> 
> 
> packageVersion("clusterProfiler")
[1] ‘4.14.4’
> 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants