Different results each time CITE-seq count is run #165
Comments
Hey Colin,
This is really strange as there is no randomness in the code, it should
pretty much be the exact same output each time for the same parameters.
Could you show me some examples?
On Fri, 11 Mar 2022, 22:05 colin986 wrote:
Hi,
I'm getting a different output each time CITE-seq count is run. My
whitelist and parameters do not change each time.
Is this expected? Is there any way to control this in terms of
reproducibility (e.g. setting a seed)?
Thanks,
Colin
Hi Hoohm,

Thanks for coming back to me. You were right: the CITE-seq count output is the same each time. The variation seems to come from the HTODemux function in Seurat when using the clara clustering option (with kmeans clustering the output is consistent). The function has an option to set the seed, but I've still found that the result changes each time I re-run CITE-seq count.

To be clear: HTODemux is reproducible when given the same CITE-seq count output, and CITE-seq count is also reproducible. However, when I re-run both CITE-seq count and HTODemux, I get a different result, and I don't understand why.

I know HTODemux draws 100 samples from the dataset for clara clustering. I wonder whether CITE-seq count writes the same data in a different order on each run, so that the 100 samples are drawn differently, and that gives rise to the variability in the output?

Thanks,
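The ordering hypothesis above can be illustrated with a small sketch. This is not Seurat's or clara's code; it only assumes that a subsample is drawn by position with a fixed seed. The helper `sample_by_position` and the barcode names are hypothetical.

```python
import random


def sample_by_position(barcodes, k, seed):
    """Draw k barcodes by position using a fixed seed (hypothetical helper)."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(barcodes)), k)
    return [barcodes[i] for i in positions]


# Hypothetical cell barcodes: the same set of cells in two different column orders.
barcodes_run1 = [f"cell_{i:04d}" for i in range(1000)]
barcodes_run2 = list(reversed(barcodes_run1))

picked_run1 = sample_by_position(barcodes_run1, k=100, seed=42)
picked_run2 = sample_by_position(barcodes_run2, k=100, seed=42)

# Same seed and same cells, but the sampled positions now point at different cells.
print("same subsample without sorting:", picked_run1 == picked_run2)

# Putting the barcodes into a canonical order first makes the draw reproducible.
picked_sorted1 = sample_by_position(sorted(barcodes_run1), k=100, seed=42)
picked_sorted2 = sample_by_position(sorted(barcodes_run2), k=100, seed=42)
print("same subsample after sorting:", picked_sorted1 == picked_sorted2)
```

If the downstream sampling works anything like this, a seed alone would not be enough; the input order would also have to be fixed.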
I can verify "different"

The difference is in the column order, not in the actual content of the count matrices. Reordering the columns to match each other (or the whitelist) results in identical matrices.

I haven't been able to pin down the source of the variation. I can't see any random functions. Initially I suspected parallelization, with different chunks finishing in different orders depending on the run, but the problem persists even with only one thread.

This difference in ordering produces different assignments from

I haven't looked at why, but @colin986's suggestion that different ordering might produce different sampling (even with the same seed) seems plausible to me. Setting

For now I am reordering
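The column-order check described in this comment can be sketched as follows. This is a toy illustration only: the matrices are plain dictionaries mapping hypothetical cell barcodes (columns) to per-tag counts, and `reorder_columns` is an illustrative helper, not part of CITE-seq count or Seurat.

```python
def reorder_columns(matrix, column_order):
    """Return a copy of a {barcode: counts} matrix with columns in the given order."""
    return {barcode: matrix[barcode] for barcode in column_order}


# Hypothetical outputs of two runs: identical counts, different column order.
run1 = {"AAAC": [5, 0, 1], "CCGT": [0, 7, 2], "GGTA": [3, 3, 0]}
run2 = {"GGTA": [3, 3, 0], "AAAC": [5, 0, 1], "CCGT": [0, 7, 2]}

# Hypothetical whitelist, used here as the reference column order.
whitelist = ["AAAC", "CCGT", "GGTA"]

# Comparing items as lists makes the comparison order-sensitive
# (plain dict equality would ignore column order).
print("identical as written:", list(run1.items()) == list(run2.items()))

aligned1 = reorder_columns(run1, whitelist)
aligned2 = reorder_columns(run2, whitelist)
print("identical after reordering:", list(aligned1.items()) == list(aligned2.items()))
```

Aligning every run to the whitelist order before any downstream step gives that step the same input regardless of how the columns were written.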
Thank you for looking into this. I was afraid there was a bug I had missed in my code, but a downstream cause seems more plausible. By the way, if you are interested in testing it, I have a beta branch rewritten in Polars available. Some input names have changed, but overall it should decrease memory usage and improve speed.
Thanks @Hoohm. I'll check out the beta branch when I get a moment. I'm not sure if it is worth making a feature request, but I do think it would be helpful if