DEGA-106-Segmentation-Metrics #36

Open
wants to merge 25 commits into
base: main
from

Conversation

cornhundred
Collaborator

Adding a qc module that will contain an iST segmentation metrics method.

@jaspreetishar jaspreetishar marked this pull request as ready for review November 19, 2024 17:12
@jaspreetishar jaspreetishar force-pushed the DEGA-106-Segmentation-Metrics branch from 27e9e5a to 95956a1 Compare November 20, 2024 21:39
@jaspreetishar jaspreetishar marked this pull request as draft December 18, 2024 21:13
@cornhundred cornhundred marked this pull request as ready for review January 17, 2025 02:03
src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
src/celldega/pre/boundary_tile.py
src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
@jaspreetishar jaspreetishar marked this pull request as draft January 17, 2025 17:14
@jaspreetishar jaspreetishar marked this pull request as ready for review January 17, 2025 17:14
@jaspreetishar jaspreetishar self-assigned this Jan 17, 2025
Contributor

@huanlity huanlity left a comment

Hi @jaspreetishar, great work! I have some minor comments focused on the readability, simplicity, and reproducibility of our codebase. Other than that, can you share the data directory data/segmentation_metrics_data so I can do a quick test of the QC notebook? Thanks!

Contributor

@jaspreetishar These are minor but can enhance readability, simplicity, and, most importantly, reproducibility.

1. For a string that is repeated multiple times (say, more than 2-3 times), it is better to make it a variable so you only need to change it in one place later.
2. Additionally, use a for loop where possible.

For example:
cell_polygon_metadata_files = [f"data/segmentation_metrics_data/inputs/{directories[0]}/cell_metadata.parquet", f"data/segmentation_metrics_data/inputs/{directories[1]}/cell_metadata.parquet", f"data/segmentation_metrics_data/inputs/{directories[2]}/cell_metadata.parquet", None]

and

paths_output_cell_metrics = [f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[0]}.csv", f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[1]}.csv", f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[2]}.csv", f"data/segmentation_metrics_data/outputs/cell_specific_metrics_{dataset_name}-{segmentation_approaches[3]}.csv"]

can be simplified as:
seg_qc_dir = 'data/segmentation_metrics_data'

cell_polygon_metadata_files = [ f'{seg_qc_dir}/inputs/{d}/cell_metadata.parquet' for d in directories[:3] ] + [None]

paths_output_cell_metrics = [ f'{seg_qc_dir}/outputs/cell_specific_metrics_{dataset_name}-{sa}.csv' for sa in segmentation_approaches ]
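
A minimal runnable sketch of the suggestion above, checking that the comprehension-based list matches the hand-written one. The values of dataset_name, segmentation_approaches, and directories are placeholders assumed for illustration; the notebook's real values are not shown in this thread.

```python
# Placeholder inputs (assumed for illustration, not taken from the notebook).
seg_qc_dir = "data/segmentation_metrics_data"
dataset_name = "Skin"
segmentation_approaches = ["approach-a", "approach-b", "approach-c", "approach-d"]
directories = ["dir_a", "dir_b", "dir_c", "dir_d"]

# Hand-written style: the base path is repeated in every entry.
verbose_inputs = [
    f"data/segmentation_metrics_data/inputs/{directories[0]}/cell_metadata.parquet",
    f"data/segmentation_metrics_data/inputs/{directories[1]}/cell_metadata.parquet",
    f"data/segmentation_metrics_data/inputs/{directories[2]}/cell_metadata.parquet",
    None,
]

# Comprehension style: the base path lives in one variable.
# The fourth entry has no cell metadata file, so None is appended explicitly.
concise_inputs = [
    f"{seg_qc_dir}/inputs/{d}/cell_metadata.parquet" for d in directories[:3]
] + [None]

# One output CSV per segmentation approach.
concise_outputs = [
    f"{seg_qc_dir}/outputs/cell_specific_metrics_{dataset_name}-{sa}.csv"
    for sa in segmentation_approaches
]

# Both styles produce the same input list.
assert verbose_inputs == concise_inputs
```

With this layout, moving the data only requires changing seg_qc_dir.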

Contributor

Thanks for providing the example data. I am getting an error during the 4th iteration of the for loop, presumably from skin_xenium_default. Do you have any idea what causes it?
Screenshot 2025-01-21 at 3 51 22 PM

Contributor

The error was due to a faulty transformation matrix. I have fixed it and will share it shortly.

Contributor

Thanks for making the effort to simplify the code. In line with reproducibility, the variable directories can be computed as follows, so you don't need to spell the names out and the next person working on a different dataset only needs to change the value of dataset_name instead of 4 strings: directories = [f"{dataset_name.lower()}_{d.lower().replace('-','_')}" for d in segmentation_approaches]. This one is minor; you don't have to change the code, but it is good to be familiar with.
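
As a quick illustration of the derivation above, the sketch below builds the directory names from dataset_name and segmentation_approaches. The two values are assumed for the example (chosen so that one result matches the skin_xenium_default name mentioned earlier in the thread); the notebook's actual lists may differ.

```python
# Assumed example values; only dataset_name needs to change for a new dataset.
dataset_name = "Skin"
segmentation_approaches = ["Xenium-Default", "Cellpose-Custom"]

# Lowercase both parts and turn hyphens into underscores to form directory names.
directories = [
    f"{dataset_name.lower()}_{d.lower().replace('-', '_')}"
    for d in segmentation_approaches
]

print(directories)  # ['skin_xenium_default', 'skin_cellpose_custom']
```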

src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
gene = "feature_name"
transcript_index = "transcript_id"

trx = transform_transcript_coordinates(technology='Xenium', chunk_size=1000000,
Contributor

@huanlity huanlity Jan 21, 2025

Does this only support Xenium? What if we are QCing a MERSCOPE dataset?

Contributor

I have now added support for MERSCOPE as well.
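
One way to keep technology-specific settings out of the calling code is a small lookup table keyed by technology name. The sketch below is hypothetical and is not the code added in this PR: TRANSCRIPT_COLUMNS and get_transcript_columns are illustrative names, the Xenium entries mirror the gene and transcript_index values shown in the snippet above, and the MERSCOPE column names are an assumption that should be checked against the actual transcript files.

```python
# Hypothetical helper for illustration; not part of celldega.
TRANSCRIPT_COLUMNS = {
    # Values taken from the Xenium snippet in this review thread.
    "Xenium": {"gene": "feature_name", "transcript_index": "transcript_id"},
    # Assumed MERSCOPE column names; verify against your own data.
    "MERSCOPE": {"gene": "gene", "transcript_index": "transcript_id"},
}


def get_transcript_columns(technology: str) -> dict:
    """Return the column-name mapping for a technology, or raise ValueError."""
    try:
        return TRANSCRIPT_COLUMNS[technology]
    except KeyError:
        supported = ", ".join(sorted(TRANSCRIPT_COLUMNS))
        raise ValueError(
            f"Unsupported technology {technology!r}; expected one of: {supported}"
        ) from None


cols = get_transcript_columns("Xenium")
gene = cols["gene"]
transcript_index = cols["transcript_index"]
```

Failing loudly on an unknown technology makes an unsupported dataset show up at the start of a QC run rather than partway through.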

src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
src/celldega/qc/__init__.py Outdated
notebooks/Segmentation_QC.ipynb Outdated
3 participants