Dbit #95
Merged
Changes from 90 commits
Commits (99)
af2a72f Added draft of DBiT reader (lillux)
c50fc92 Hardcoded variable for grid scaling (lillux)
faeb59d Merge branch 'main' of https://github.com/lillux/spatialdata-io (lillux)
1666707 Added comments (lillux)
61b71a0 Ignore spyder data (lillux)
7dec791 Remove spyder data (lillux)
22c25e4 Removed metadata (lillux)
8c35aef Add symbol export spec in DBiT (lillux)
d944741 Fixed typos (lillux)
b8fd2a6 Added _constants for DBiT, and related check (lillux)
d12c518 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6ea5614 Open .h5ad with anndata instead of scanpy (lillux)
b41451d Merge branch 'dbit' of https://github.com/lillux/spatialdata-io into … (lillux)
ed34628 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
fb55943 Fixed import (lillux)
697064e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
f2960aa fixes from pre-commit.ci (lillux)
1c493e3 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
63a916f fixed exceptions and type hints (lillux)
3234bba [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
9f862da Fix typing (lillux)
d7dc6e2 fixed variable names (lillux)
24bb806 Merge branch 'scverse:main' into dbit (lillux)
48fca73 Merge branch 'scverse:main' into dbit (lillux)
fbe6c82 code revision started (lillux)
3821757 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
8d4ae3e Code revision part 2 (lillux)
92df11d [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6c9c748 code revision fix docstring (lillux)
2c62b45 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
fbd48b3 Code revision small fixes (lillux)
aef5275 Code review small fixes to docstrings (lillux)
96efc11 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
d0e3023 Code review inject_docs (lillux)
6adfcf4 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
0544e46 Code review remove inject_doc (lillux)
d5ad8f8 Code review docs api.md (lillux)
5d50168 Code review docstring fix (lillux)
568566a Code review Typo (lillux)
3bb717f Typo (lillux)
e3a1a66 Docs fix (lillux)
a4abc3a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
f5c34db Typo docstring (lillux)
1f7d88b Typo (lillux)
06177c5 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
f76c89b Searching error... (lillux)
912948b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
950b2b1 Typo fix (lillux)
8fcd5b6 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
158b0bb Fixing docstring (lillux)
b1d35ad Docstring done. Fixing typo (lillux)
d40a00b Fixed dbit function to remove duplcated code (lillux)
2f4e59a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
090d4f7 Docstring fix (lillux)
cc2add1 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
c97196f Docstring fix (lillux)
e326942 Fix docstring typing (lillux)
82dea42 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
583eb81 Typing stuff (lillux)
0795475 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
1426f39 Typing ... (lillux)
69191f0 typing (lillux)
0ed986d Fix naming (lillux)
187d1dc Typing (lillux)
ce53ad7 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
1254a89 Typing (lillux)
82119de Working on type hints in docstring (lillux)
8e8e400 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
efd4805 Fixed typo (lillux)
f214968 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
88b7197 Tyiping (lillux)
4372901 Fix typo (lillux)
76b228b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
5e3c33e Mypy fix (lillux)
632cb22 Mypy (lillux)
c5785d2 Mypy complains (lillux)
1813285 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
7681f0e Test for mypy (lillux)
1edaaff [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6cddf83 Fix mypy (lillux)
aa4cd2f Typo docstring and fix (lillux)
d68969e Fix mypy (lillux)
c2227ab Mypy (lillux)
74ca806 Fix mypy (lillux)
7578970 fix mypy (lillux)
ee959b8 Docstring fix (lillux)
c04ba2a mypy errors fix (lillux)
1b95a97 mypy errors (lillux)
fd6d9cc ignore mypy (lillux)
aa58adc [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
a965e32 Merge branch 'scverse:main' into dbit (lillux)
c49c5d5 Now the dimension of the DBiT grid depends on the number of barcodes. (lillux)
ffd8a9d Docstrings fix (lillux)
b1173c2 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
2ec3f7a Added DBiT-seq reader to changelog (lillux)
955a52c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
90ba226 Merge branch 'scverse:main' into dbit (lillux)
b9ddfb0 Merge branch 'main' into dbit (lillux)
cb94f9b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
@@ -23,6 +23,7 @@ __pycache__/
 # IDEs
 /.idea/
 /.vscode/
+/.spyproject/

 # docs
 /docs/generated/
@@ -20,4 +20,5 @@ I/O for the `spatialdata` project.
 steinbock
 merscope
 mcmicro
+dbit
 ```
@@ -0,0 +1,328 @@
from __future__ import annotations

import os
import re
from pathlib import Path
from re import Pattern
from typing import Optional, Union

import anndata as ad
import numpy as np
import pandas as pd
import shapely
import spatialdata as sd
from dask_image.imread import imread
from numpy.typing import NDArray
from spatialdata import SpatialData
from spatialdata._logging import logger
from xarray import DataArray

from spatialdata_io._constants._constants import DbitKeys
from spatialdata_io._docs import inject_docs

__all__ = ["dbit"]


def _check_path(
    path: Path,
    pattern: Pattern[str],
    key: DbitKeys,
    path_specific: Optional[str | Path] = None,
    optional_arg: bool = False,
) -> tuple[Union[Path, None], bool]:
""" | ||
Check that the path is valid and match a regex pattern. | ||
|
||
Parameters | ||
---------- | ||
path : | ||
The path of the main directory where to search for the path. | ||
path_specific : | ||
path to the file, if it is not in the main directory. | ||
pattern : | ||
regex pattern. | ||
key : | ||
String to match in the path or path_specific path. | ||
optional_arg : | ||
User specify if the file to search is mandatory (optional_arg=True, raise an Error if not found) | ||
or optional (optional_arg=False, raise a Warning if not found). | ||
|
||
Raises | ||
------ | ||
FileNotFoundError | ||
The error is raised if no match is found in the given paths and optional_arg=True. | ||
|
||
Returns | ||
------- | ||
tuple(pathlib.PosixPath, bool) | ||
return a tuple(pathlib.PosixPath, bool). The bool is a flag that indicate if one of the supplied path arguments points to a file that match the supplied key. | ||
""" | ||
    flag = False
    file_path = None
    try:
        checked_file = [i for i in os.listdir(path) if pattern.match(i)][0]  # this is the filename
        file_path = Path.joinpath(path, checked_file)
        flag = True
    except IndexError:
        # handle the case in which the searched file is not in the same directory as path
        if path_specific is not None:
            if os.path.isfile(path_specific):
                file_path = Path(path_specific)
                flag = True
            else:
                if optional_arg:
                    logger.warning(f"{path_specific} is not a valid path for {key}. No {key} will be used.")
                else:
                    raise FileNotFoundError(f"{path_specific} is not a valid path for a {key} file.")
        else:
            if optional_arg:
                logger.warning(f"No file containing {key} found in folder {path}. No {key} will be used.")
            else:
                raise FileNotFoundError(f"No file containing {key} found in folder {path}.")
    return file_path, flag


def _barcode_check(barcode_file: Path) -> pd.DataFrame:
    """
    Check that the barcode file is formatted as expected.

    Expected format: a tab-separated file without header, with 2 columns:
        column 0 : str composed of "A" or "B" followed by a number, like 'A6', 'A22', 'B7'
        column 1 : str of 8 characters representing nucleotides, like 'AACTGCTA'

    Parameters
    ----------
    barcode_file :
        The path to the barcode file.

    Raises
    ------
    ValueError :
        Raised if a field of the file does not comply with the expected pattern,
        with an appropriate error message.

    Returns
    -------
    pd.DataFrame :
        A pandas.DataFrame with 2 columns, named 'A' and 'B', with the barcode as row index.
        Columns 'A' and 'B' each contain an int, the spatial coordinate of the barcode.
        The columns are ordered in ascending order.

    """
    df = pd.read_csv(barcode_file, header=None, sep="\t")
    # check if there are 2 columns
    if len(df.columns) != 2:
        raise ValueError(
            f"The barcode file you passed at {barcode_file} does not have 2 columns.\nYour file has to be formatted with 2 columns, the first for positions, the second for the barcode, as follows:\n\nA1 AACCTTGG\nA2 GGCATGTA\nA3 GCATATGC\n..."
        )
    # check if the data in the columns are correct.
    # Pattern 1: match A or B at the start, then match 1 or 2 numbers at the end. Case sensitive.
    patt_position = re.compile(r"(^[A|B])([\d]{1,2}$)")
    # Pattern 2: match nucleotides string of 8 char. Case insensitive.
    patt_barcode = re.compile(r"^[A|a|T|t|C|c|G|g]{8}$")
    # dict, used to collect data after matching
    bc_positions: dict[str, dict[str, str]] = {}
    # line[0]: row index, line[1]: row values. line[1][0]: barcode coordinates, line[1][1]: barcode
    for line in df.iterrows():
        if not bool(patt_position.fullmatch(line[1][0])):
            raise ValueError(
                f"Row {line[0]} has an incorrect positional id: {line[1][0]}.\nThe correct pattern for the position is a str containing the letter A or B followed by one or two digits. Case sensitive."
            )
        if not bool(patt_barcode.fullmatch(line[1][1])):
            raise ValueError(
                f"Row {line[0]} has an incorrect barcode: {line[1][1]}.\nThe correct pattern for a barcode is a str of 8 nucleotides, each character one of A, T, C, G. Case insensitive."
            )
        barcode = line[1][1]
        letter = line[1][0][0]
        try:
            bc_positions[barcode][letter] = line[1][0][1:]
        except KeyError:
            bc_positions[barcode] = {}
            bc_positions[barcode][letter] = line[1][0][1:]
    # return pandas.DataFrame, in (pseudo)long form
    return pd.DataFrame(bc_positions).transpose()


def _xy2edges(xy: list[int], scale: float = 1.0, border: bool = True, border_scale: float = 1) -> NDArray[np.double]:
    """
    Construct the vertex coordinates of a square from the barcode coordinates.

    The constructed square has a scalable border.

    Parameters
    ----------
    xy :
        Coordinates of the spot identified by its barcode.
    scale :
        Resize the square.
    border :
        If True, the square is shrunk toward its center, leaving an empty border.
    border_scale :
        The factor by which the border is scaled.
        The default is 1. It corresponds to a border length of 0.125 * length of the square's edge.

    Returns
    -------
    ndarray
        The resized square derived from the barcoded reads coordinates of a certain spot.

    """
    # unpack coordinates
    x, y = xy
    # create border grid
    border_grid = np.array([[0.125, 0.125], [0.125, -0.125], [-0.125, -0.125], [-0.125, 0.125], [0.125, 0.125]])
    # create square grid
    square = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]])
    # position the square in the right place following the barcode coordinates
    # barcode coordinates start at 1, so we subtract 1 to start at 0
    square[:, 0] += x - 1
    square[:, 1] += y - 1
    # check if the user wants a border, and scale accordingly
    if border:
        return (np.array(square) * scale) + (border_grid * border_scale)

    return np.array(square) * scale


@inject_docs(vx=DbitKeys)
def dbit(
    path: str | Path,
    anndata_path: Optional[str] = None,
    barcode_position: Optional[str] = None,
    image_path: Optional[str] = None,
    dataset_id: Optional[str] = None,
    border: bool = True,
    border_scale: float = 1,
) -> SpatialData:
    """
    Read DBiT experiment data (Deterministic Barcoding in Tissue).

    This function reads the following files:

        - ``{vx.COUNTS_FILE!r}`` : Counts matrix.
        - ``{vx.BARCODE_POSITION!r}`` : Barcode file.
        - ``{vx.IMAGE_LOWRES_FILE!r}`` : Histological image. Optional.

    .. seealso::

        - `High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue <https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8/>`_.

    Parameters
    ----------
    path :
        Path to the directory containing the data.
    anndata_path :
        Path to the counts and metadata file.
    barcode_position :
        Path to the barcode coordinates file.
    dataset_id :
        Dataset identifier to name the constructed `SpatialData` elements.
        If not given, the file name is used as dataset_id.
    image_path :
        Path to the low-resolution image.
        The image is expected to be correctly cropped and transformed.
    border :
        Value passed internally to _xy2edges().
        If True, the square is shrunk toward its center, leaving an empty border.
    border_scale :
        Value passed internally to _xy2edges().
        The factor by which the border is scaled.
        The default is 1. It corresponds to a border length of (0.125 * length of the square's edge).

    Returns
    -------
    :class:`spatialdata.SpatialData`.
    """
    path = Path(path)
    # if path is invalid, raise error
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"The path you have passed: {path} has not been found. A correct path to the data directory is needed."
        )
    # compile regex patterns to find file names in path, according to _constants.DbitKeys()
    patt_h5ad = re.compile(f".*{DbitKeys.COUNTS_FILE}")
    patt_barcode = re.compile(f".*{DbitKeys.BARCODE_POSITION}.*")
    patt_lowres = re.compile(f".*{DbitKeys.IMAGE_LOWRES_FILE}")

    # search for file paths. Gives priority to files matching the pattern found in path.
    anndata_path_checked = _check_path(
        path=path, path_specific=anndata_path, pattern=patt_h5ad, key=DbitKeys.COUNTS_FILE
    )[0]
    barcode_position_checked = _check_path(
        path=path, path_specific=barcode_position, pattern=patt_barcode, key=DbitKeys.BARCODE_POSITION
    )[0]
    image_path_checked, hasimage = _check_path(
        path=path,
        path_specific=image_path,
        pattern=patt_lowres,
        key=DbitKeys.IMAGE_LOWRES_FILE,
        optional_arg=True,
    )

    # read AnnData.
    adata = ad.read_h5ad(anndata_path_checked)
    # read barcodes.
    bc_df = _barcode_check(barcode_file=barcode_position_checked)  # type: ignore

    # add barcode positions to AnnData.
    # A and B naming follows the original publication and protocol
    adata.obs["array_A"] = [int(bc_df.loc[x[8:16], "A"]) for x in adata.obs_names]
    adata.obs["array_B"] = [int(bc_df.loc[x[:8], "B"]) for x in adata.obs_names]
    # sort AnnData by barcode position. Barcode A first, then barcode B
    adata.obs.sort_values(by=["array_A", "array_B"], inplace=True)

    # populate AnnData
    if dataset_id is None:  # if no dataset_id, use the file name as id.
        logger.warning("No dataset_id received as input.")
        dataset_id = ".".join(
            anndata_path_checked.name.split(".")[:-1]  # type: ignore
        )  # this is the filename stripped of the file extension
        logger.warning(f"{dataset_id} is used as dataset_id.")

    adata.obs["region"] = dataset_id
    adata.obs["sample"] = dataset_id
    adata.obs["region"] = adata.obs["region"].astype("category")
    # assignment of pixel ids follows the row ordering of adata.obs
    adata.obs["pixel_id"] = np.arange(len(adata.obs_names))
    adata.uns["spatialdata_attrs"] = {"region_key": "region", "region": dataset_id, "instance_key": "pixel_id"}
    # the array that we will create below has 2 columns without header,
    # so it is up to the user to know that:
    # the first column is 'array_A',
    # the second column is 'array_B'.
    adata.obsm["spatial"] = adata.obs[["array_A", "array_B"]].values
    # parse data from AnnData using the SpatialData parser
    table_data = sd.models.TableModel.parse(adata)

    # read and convert the image for SpatialData
    # check if the image exists, and has been passed as argument
    if hasimage:
        image = imread(image_path_checked).squeeze().transpose(2, 0, 1)  # channel, y, x
        image = DataArray(image, dims=("c", "y", "x"), name=dataset_id)
        image_sd = sd.models.Image2DModel.parse(image)
    # calculate the scale factor of the grid wrt the histological image.
    # this is needed because we want to maintain the original histological image
    # dimensions, and scale the grid accordingly in such a way that the grid
    # overlaps with the histological image.
    # We assume that the grid is a square, and indeed it is if we follow the DBiT protocol.
    grid_length = 50  # hardcoded number of lines
    # we are passing an image that is supposed to be perfectly
    # cropped to match the tissue part from which the data are collected.
    # We assume the user has already cropped and transformed everything correctly.
    if hasimage:
        scale_factor = max(image_sd.shape) / grid_length
    else:
        scale_factor = 1
    # construct polygon grid with grid coordinates
    xy = adata.obs[["array_A", "array_B"]].values.astype(int)
    f = np.array(
        [_xy2edges(x, scale=scale_factor, border=border, border_scale=scale_factor * border_scale) for x in xy]
    )
    ra = shapely.to_ragged_array([shapely.Polygon(x) for x in f])
    grid = sd.models.ShapesModel.parse(ra[1], geometry=ra[0], offsets=ra[2], index=adata.obs["pixel_id"].copy())
    # create the SpatialData object!
    sdata = sd.SpatialData(table=table_data, shapes={dataset_id: grid})
    if hasimage:
        imgname = dataset_id + "_image"
        sdata.images[imgname] = image_sd
    return sdata
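
For orientation, here is a minimal usage sketch of the reader added in this PR. All directory and file names below are hypothetical, and it assumes `dbit` is re-exported at the package level (as the api.md entry above suggests); the explicit path arguments are optional, since the reader can also discover the files by pattern inside `path`.

import spatialdata_io

# Hypothetical paths; adjust them to your own DBiT-seq output layout.
sdata = spatialdata_io.dbit(
    path="my_dbit_run/",                          # directory searched for counts, barcodes and image
    anndata_path="my_dbit_run/counts.h5ad",       # optional explicit path to the counts .h5ad
    barcode_position="my_dbit_run/barcodes.tsv",  # optional explicit path to the barcode coordinates
    image_path="my_dbit_run/tissue_lowres.tif",   # optional cropped/aligned histological image
    dataset_id="dbit_sample_1",
    border=True,
    border_scale=1,
)
print(sdata)  # table, polygon grid, and (if provided) the low-resolution image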
Just out of curiosity, which pipeline/preprocessing outputs an h5ad directly?
One thing is that DBiT-seq can be used for both scATAC and scRNA data. We currently use a simple script to process the fastq files so that they can be fed to any scRNA/scATAC processing pipeline. From that point on, it's just SOTA processing producing h5ad files. The obs_name in the h5ad will then be used to map texels onto the surface.
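
As a side note for readers of the diff above, the reader expects each obs_name in the h5ad to carry both 8-nt barcodes back to back; a toy illustration with made-up barcodes:

# Toy example (made-up 16-nt obs_name) of how the reader splits the barcodes:
obs_name = "AACCTTGGTTGGCCAA"
barcode_b = obs_name[:8]    # "AACCTTGG" -> looked up in column 'B' of the barcode table
barcode_a = obs_name[8:16]  # "TTGGCCAA" -> looked up in column 'A' of the barcode table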
Great, thank you for the clarification @dawe!