Can't find my sample names #31

jsaintvanne · 2023-01-18T11:12:09Z

Hi,

I'm quickly trying out some workflows using ramclustR and I can't find my sample names in the output...
After taking a look at the script of the ramclustR function, I can see that there are a lot of result tables containing everything you need (rt, intensity, cluster, etc.), but I can't find the sample names in the resulting MSP file (whereas the row names of the tables are my sample names).

Can someone help me, please?

Thanks a lot!

cbroeckl · 2023-01-18T15:31:48Z

@jsaintvanne - the spectra are stored in the .msp output file. The spectra that are exported are representative of all the files. While not perfectly accurate, you can picture each individual spectrum as the average spectrum for that compound, taking into account all the data in the dataset. So given that, every spectrum is associated with every sample - it is only the signal intensity that changes, which is stored in the SpecAbund data matrix and exported .csv file.
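To make the split described above concrete, here is a minimal sketch of reading a per-sample abundance table. The CSV content, the `sample` header, and the compound labels are made up for illustration, and the layout (one row per sample, one column per compound) is an assumption based on this thread; the real exported file may be organized differently.

```python
import csv
import io

# Hypothetical stand-in for the exported abundance .csv:
# one row per sample, one column per compound (assumed layout).
csv_text = """sample,C001,C002
treatment-4hr-rep1,1520.5,310.0
control-2hr-rep3,980.2,295.7
"""

reader = csv.DictReader(io.StringIO(csv_text))

# Map each sample name to its per-compound signal intensities.
abundance = {
    row["sample"]: {k: float(v) for k, v in row.items() if k != "sample"}
    for row in reader
}

# Sample names live in this table, not in the .msp spectra.
sample_names = list(abundance)
print(sample_names)
print(abundance["control-2hr-rep3"]["C002"])
```

The point is only that the spectra are shared across samples, while anything sample-specific has to be looked up in the abundance table.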

jsaintvanne · 2023-01-19T11:23:38Z

Thanks for your really fast answer @cbroeckl !

Here we work with sample metadata and different conditions, but we analyze everything at the same time, and I thought that ramclustR could differentiate between them. Sorry for the stupid previous question, and now another one: would it be useful to have something that takes the sample conditions as input, so that they can be differentiated and clusters obtained for each condition (which can change a lot between blank and standard, for example)?

cbroeckl · 2023-01-19T15:42:59Z

@jsaintvanne - your sample names are derived from the xcms object. Generally my approach to sample naming is to use concatenated factors, i.e.

treatment-4hr-rep1
control-2hr-rep3

such that the sample name can be split into separate factors. There is a function to enable the splitting, rc.expand.sample.names. Above, this would split the two sample names into a data frame with three columns when '-' is used as the delimiter.
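As an illustration of the idea only, the splitting can be sketched in a few lines of Python. This is not the rc.expand.sample.names implementation; the function name `expand_sample_names` and the factor names `group`, `time`, and `replicate` are hypothetical.

```python
def expand_sample_names(names, delim="-", factor_names=None):
    """Split delimiter-concatenated sample names into one record per sample."""
    rows = [name.split(delim) for name in names]
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError("all sample names must contain the same number of factors")
    n_factors = widths.pop()
    if factor_names is None:
        # Generic column names when none are supplied.
        factor_names = [f"fact{i + 1}" for i in range(n_factors)]
    if len(factor_names) != n_factors:
        raise ValueError("factor_names must match the number of factors")
    return [dict(zip(factor_names, row)) for row in rows]


samples = ["treatment-4hr-rep1", "control-2hr-rep3"]
table = expand_sample_names(samples, factor_names=["group", "time", "replicate"])
for row in table:
    print(row)
```

The sketch raises an error when names contain different numbers of factors, which is usually a sign of an inconsistent naming scheme.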

That said, I think that class-specific clustering will be a difficult path forward:

- ramclustR's algorithm is dependent on quantitative variation in the feature data. The less variation, the less clear the relationships between features.
- The feature grouping behavior is likely to be slightly different in different sample groups, and rectifying those discrepancies is not trivial, i.e. what if feature 1211 (for example) is part of C003 in one group and C008 in another group? That isn't to say that there aren't solutions, but they require a good deal of thought before implementing.
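One way to at least detect such discrepancies is to compare co-membership rather than cluster labels, since labels from independent runs are arbitrary. The sketch below is hypothetical (the function `conflicting_pairs` and the example assignments are made up) and only flags conflicts; it does not resolve them.

```python
from itertools import combinations


def conflicting_pairs(assign_a, assign_b):
    """Return feature pairs grouped together in one run but not in the other.

    Each argument maps a feature id to the cluster label it received in one
    sample group. Only features present in both runs are compared.
    """
    shared = sorted(set(assign_a) & set(assign_b))
    conflicts = []
    for f1, f2 in combinations(shared, 2):
        together_a = assign_a[f1] == assign_a[f2]
        together_b = assign_b[f1] == assign_b[f2]
        if together_a != together_b:
            conflicts.append((f1, f2))
    return conflicts


# Made-up assignments: feature 1211 moves from C003 to C008 between groups.
blank = {1210: "C003", 1211: "C003", 1500: "C008"}
standard = {1210: "C003", 1211: "C008", 1500: "C008"}
print(conflicting_pairs(blank, standard))
```

Deciding which of the two groupings to trust for a flagged pair is the hard part and is left open here.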

An alternative path which may alleviate your concerns is to switch from Pearson's correlation to Spearman's. Rank correlation will be much less prone to the influence of outliers (blanks, for example) than Pearson's. This is enabled in the main ramclustR function via the cor.method option. Pearson's is the default, but you could set cor.method = "spearman".
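The effect of blanks on the two correlation types can be seen in a small, self-contained sketch. The intensities are invented: two features that are unrelated across the six biological samples but both near zero in two blanks. The helper functions are plain-Python stand-ins, not ramclustR code.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Six biological samples followed by two blanks (made-up intensities).
feat_a = [100, 104, 98, 102, 101, 99, 1, 2]
feat_b = [52, 49, 50, 48, 51, 53, 2, 1]

print("pearson: ", round(pearson(feat_a, feat_b), 3))
print("spearman:", round(spearman(feat_a, feat_b), 3))
```

With these numbers the blanks alone push Pearson's coefficient close to 1, while the rank-based value comes out much lower, because the blanks contribute only their rank positions rather than their extreme intensities.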

hechth · 2023-01-24T14:27:56Z

@cbroeckl thanks for the explanation - didn't know this about the Spearman correlation!

I assume that another option would be to actually run the individual conditions independently and then build networks/find identical or similar features across the groups using spectral matching?

cbroeckl · 2023-01-24T16:05:37Z

@hechth - absolutely could be done. A few items to consider:

- Are there enough samples in each group for correlation-based clustering to be meaningful? If not, it would be best to develop a peak-shape-based clustering as well. I had actually started down this path, lost steam, and ultimately abandoned it for lack of time to validate it. There is a clear path forward for it though: you can simultaneously use all the similarity metrics (retention time of the feature, correlation, and peak shape) by expanding the existing similarity product score. In theory IMS data could also be incorporated, if available.
- If you perform RAMClustR by sample group and then cluster spectra, how do you deal with feature assignments which are in conflict?
- How do you deal with missing spectra in the blanks (NA values are a bit of a nuisance...)?
- If you are going to be performing clustering by sample type, would it be best to perform XCMS by sample type as well?
- If two spectra from two groups align pretty well but imperfectly, what set of features should be used in the quantitative assignments: only the overlapping features or all features?
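The idea of expanding the similarity product score with a peak shape term can be sketched as follows. Everything here is an assumption for illustration: the Gaussian form of each term, the function names, and the sigma parameters and their defaults are hypothetical, and the peak shape term is not part of the package.

```python
import math


def gaussian_similarity(distance, sigma):
    """Map a distance to a similarity in (0, 1]; a distance of 0 gives 1."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def product_similarity(rt_diff, corr, shape_corr=None,
                       sigma_t=1.0, sigma_r=0.5, sigma_s=0.5):
    """Combine per-metric similarities for a feature pair by multiplication.

    rt_diff    : retention time difference between the two features
    corr       : correlation of their intensities across samples
    shape_corr : optional correlation of their chromatographic peak shapes
    All sigma_* values are hypothetical tuning parameters.
    """
    score = gaussian_similarity(rt_diff, sigma_t)
    score *= gaussian_similarity(1 - corr, sigma_r)
    if shape_corr is not None:
        # Hypothetical extra term: any further metric (IMS, for example)
        # could be multiplied in the same way.
        score *= gaussian_similarity(1 - shape_corr, sigma_s)
    return score


print(product_similarity(rt_diff=0.5, corr=0.9))
print(product_similarity(rt_diff=0.5, corr=0.9, shape_corr=0.2))
```

Because the terms are multiplied, a pair has to look similar on every metric to score well, so a poor peak shape match lowers the score even when retention time and correlation agree.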

hechth mentioned this issue Apr 14, 2023

Possible future developments #41

Open
