-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEGA-106-Segmentation-Metrics #36
base: main
Are you sure you want to change the base?
Conversation
27e9e5a
to
95956a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jaspreetishar, great work! I had some minor comments that is focusing on readability, simplicity, and reproducibility of our codebase. Other than that, can share the data dir data/segmentation_metrics_data
so I can do a quick test of the QC notebook? Thanks!
notebooks/Segmentation_QC.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaspreetishar These are minor but can enhance readability, simplicity, and mostly important, reproducibility.
For the string that needs to be repeated multiple times (say more than 2-3 times),it is better to make it a variable so you only need to change one place later if needed. 2. Additionally, use for loop if possible.
For example:
cell_polygon_metadata_files = [f"data/segmentation_metrics_data/inputs/{directories[0]}/cell_metadata.parquet", f"data/segmentation_metrics_data/inputs/{directories[1]}/cell_metadata.parquet", f"data/segmentation_metrics_data/inputs/{directories[2]}/cell_metadata.parquet", None]
and
paths_output_cell_metrics = [f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[0]}.csv", f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[1]}.csv", f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[2]}.csv", f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[3]}.csv"]
can be simplified as:
cell_polygon_metadata_files = [ f'{seg_qc_dir}/inputs/{d}/cell_metadata.parquet' for d in directories[:3] ] + [None]
paths_output_cell_metrics = cell_polygon_metadata_files = [ f'{seg_qc_dir}/outputs/cell_specific_metrics_{dataset_name}-{sa}.csv' for sa in segmentation_approaches ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error was due to a faulty transformation matrix. I have fixed it and will share it shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the effort to simplify the code. In lines with reproducibility, the variable directories
can be calculated as following so you don't need to spell them out and next person on a different dataset only need to change the value of variable dataset_name
instead of 4 strings: directories = [f"{dataset_name.lower()}_{d.lower().replace('-','_')}" for d in segmentation_approaches]
. This one is minor you don't have to change the code but it will be good to get familiar with.
src/celldega/qc/__init__.py
Outdated
gene = "feature_name" | ||
transcript_index = "transcript_id" | ||
|
||
trx = transform_transcript_coordinates(technology='Xenium', chunk_size=1000000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this only support Xenium? What if we are QCing a MERSCOPE dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now added support for Merscope as well
Adding a qc module that will contain a iST segmentation metrics method.