Performance comparison #272
Replies: 2 comments
Interesting comparisons! The rayon_basic implementation looks about the same as how I would have done it. Here's a version using MPMC channels:
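For readers following along, here is a rough sketch of the general pattern: one producer feeds records into a queue and several worker threads pull from it. This is not the code from this reply. It assumes std only, and emulates a multi-consumer receiver by sharing a `std::sync::mpsc::Receiver` behind an `Arc<Mutex<..>>`, where a real version would more likely use crossbeam's channel. The names `count_cat` and `count_cat_parallel` are hypothetical, and each record is counted independently, so a CAT spanning two records would not be counted.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Count occurrences of the 3-mer "CAT" in one sequence.
fn count_cat(seq: &[u8]) -> usize {
    seq.windows(3).filter(|w| *w == b"CAT").count()
}

/// Hypothetical helper: fan records out to `workers` threads over a shared queue.
fn count_cat_parallel(records: Vec<Vec<u8>>, workers: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // std's Receiver is single-consumer, so share it behind a mutex
    // (an assumption made here to stay std-only).
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut total = 0;
                loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the lock is not held while counting.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(seq) => total += count_cat(&seq),
                        Err(_) => break, // sender dropped and queue drained
                    }
                }
                total
            })
        })
        .collect();

    // Producer side: send every record, then close the channel.
    for record in records {
        tx.send(record).unwrap();
    }
    drop(tx);

    // Sum the per-worker partial counts.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let records = vec![b"CATCAT".to_vec(), b"GGCATGG".to_vec()];
    println!("{}", count_cat_parallel(records, 4));
}
```

Each worker keeps a local total and returns it through its join handle, so the only shared state is the queue itself.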
Thanks @zaeleus! By the way, I should probably mention my file and machine, even though I'm only trying to compare the various strategies roughly. I'm using a Mac Mini from fall 2020 with an 8-core M1 processor and 16 GB RAM. Pretty average specs. I'm running on the chm13v2 genome, which I downloaded gzipped but unzipped/re-zipped with
I added three more strategies for counting CATs to my repository and added my results above. I hadn't tried crossbeam (thanks!), but I did attempt a tokio multitask implementation, spawning tokio tasks and using the analogous tokio channels for inter-task communication. I gave that up, though.
Anyway, I'm not sure why the crossbeam one doesn't match my rayon_basic. At one point I did get a ~2x speed boost by changing from a classic for loop + array index + incrementing a count variable to the iterator with filter()/count(), but that's what you did, so I dunno. I changed your version a little so I could get it to run on my normal non-nightly Rust, but I kept the main stuff intact.
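To make the loop-versus-iterator change concrete, here is a minimal sketch of the two styles on a plain byte slice. The function names are hypothetical and this is not the code from the repository. One possible reason the iterator form can be faster is that `windows(3)` lets the compiler skip per-index bounds checks, but that is a guess and not something measured here.

```rust
/// Classic style: index loop, explicit comparisons, incrementing a counter.
fn count_cat_loop(seq: &[u8]) -> usize {
    let mut count = 0;
    if seq.len() < 3 {
        return 0;
    }
    for i in 0..seq.len() - 2 {
        if seq[i] == b'C' && seq[i + 1] == b'A' && seq[i + 2] == b'T' {
            count += 1;
        }
    }
    count
}

/// Iterator style: slide a 3-byte window and count the matches.
fn count_cat_iter(seq: &[u8]) -> usize {
    seq.windows(3).filter(|w| *w == b"CAT").count()
}

fn main() {
    let seq = b"GGCATTACATG";
    // Both styles should agree on any input.
    println!("{} {}", count_cat_loop(seq), count_cat_iter(seq));
}
```

Both functions are case-sensitive, so soft-masked (lowercase) bases in a FASTA would need to be uppercased first if they should count.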
I made a crate with three binaries that all count the number of times CAT occurs in a FASTA file. I used noodles_bgzf and noodles_fasta. The goal was just a learning exercise to understand the runtime and memory footprint of a few strategies for reading a compressed/indexed FASTA.
https://github.com/andypohl/cat-count
I did three styles:
Some basic results, not measured super scientifically, just my rough observations:
Does anyone have any suggestions? I'm still learning Rust, so I'm sure there are a lot of tricks I'm not aware of yet.
It seems like the async readers for FASTA and BGZF are missing some capabilities that the synchronous code has. Namely, there's a low-level sequence_reader iterator in the non-async part that isn't in the async part, and I couldn't figure out a way to seek to a particular sequence in a BGZF-compressed FASTA with the async readers.
Again, any feedback is welcome!