Speedup CB loading via indexing #43

Open: wants to merge 1 commit into base: v1.1

@aerval aerval commented Dec 3, 2024

Hello,

Cell barcode loading was quite slow for me (~48 min for a file with 8000 barcodes), and I figured it was because looking up rows in a pandas data frame by matching on a column value is slow. Looking the barcodes up via an index on the df instead speeds this up dramatically (less than 5 seconds for the same file for me) while producing the exact same dictionary (CB_dict, tested on two different files via md5).

I understand that this may not be a problem on all systems and is definitely not an issue on smaller datasets, but maybe it helps others as well.

Cheers,
Philipp

- much faster to access barcode indexes than search via matching
- some code optimization:
  - read file via for loop
  - access df column directly
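
For readers who want to see the difference between the two lookup styles, here is a minimal sketch. It is not the code from this PR's diff: the column name `barcode`, the helper names, and the toy data are all hypothetical, and it assumes each barcode occurs only once in the data frame.

```python
import io

import pandas as pd


def read_barcodes(handle):
    """Read one barcode per line with a plain for loop, skipping blank lines."""
    barcodes = []
    for line in handle:
        line = line.strip()
        if line:
            barcodes.append(line)
    return barcodes


def build_cb_dict_matching(df, barcodes):
    """Slow path: one full scan of the column for every barcode."""
    cb_dict = {}
    for bc in barcodes:
        mask = (df["barcode"] == bc).to_numpy()  # compares against every row
        cb_dict[bc] = int(df.index[mask][0])
    return cb_dict


def build_cb_dict_indexed(df, barcodes):
    """Fast path: build a hash-backed index once, then look each barcode up."""
    lookup = pd.Index(df["barcode"])  # assumes barcodes are unique
    return {bc: int(lookup.get_loc(bc)) for bc in barcodes}


# Toy data (hypothetical): a three-row data frame and a two-line barcode file.
df = pd.DataFrame({"barcode": ["AAAC", "AAAG", "AAAT"]})
barcodes = read_barcodes(io.StringIO("AAAT\n\nAAAC\n"))

slow = build_cb_dict_matching(df, barcodes)
fast = build_cb_dict_indexed(df, barcodes)
```

Both helpers should return the same mapping, which is the property the md5 comparison above checks on real files. The matching version does work proportional to rows times barcodes, while the indexed version pays for building the index once and then does near-constant-time lookups.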