Problem with indices with left_exclusive join #824

Open
LucaMarconato opened this issue Jan 13, 2025 · 0 comments

To reproduce

While working on some tests for #822, I discovered a bug with left_exclusive (unrelated to the bug addressed in the mentioned PR).

To reproduce, add the left_exclusive string to the pytest.mark.parametrize list in test_inner_join_match_rows_duplicate_obs_indices() (test_relational_query.py).
The bug is unrelated to the one fixed by the mentioned PR because it also appears if we comment out this line from the test:

sdata["table"].obs.index = ["a"] * sdata["table"].n_obs

The error I get is shown in the traceback below.
@melonora could you please have a look at it?

Traceback

tests/core/query/test_relational_query.py:380 (test_inner_join_match_rows_duplicate_obs_indices[left_exclusive])
sdata_query_aggregation = SpatialData object
├── Points
│     └── 'points': DataFrame with shape: (<Delayed>, 5) (2D points)
├── Shapes
│     ├─...:
        points (Points), by_circles (Shapes), by_polygons (Shapes), values_circles (Shapes), values_polygons (Shapes)
join_type = 'left_exclusive'

    @pytest.mark.parametrize('join_type', ['left', 'right', 'inner', 'right_exclusive', 'left_exclusive'])
    def test_inner_join_match_rows_duplicate_obs_indices(sdata_query_aggregation: SpatialData, join_type: str) -> None:
        sdata = sdata_query_aggregation
        # sdata["table"].obs.index = ["a"] * sdata["table"].n_obs
        sdata["values_circles"] = sdata_query_aggregation["values_circles"][:4]
        sdata["values_polygons"] = sdata_query_aggregation["values_polygons"][:5]
    
>       element_dict, table = join_spatialelement_table(
            sdata=sdata,
            spatial_element_names=["values_circles", "values_polygons"],
            table_name="table",
            how=join_type,
        )

test_relational_query.py:388: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../src/spatialdata/_core/query/relational_query.py:680: in join_spatialelement_table
    elements_dict_joined, table = _call_join(elements_dict, table, how, match_rows)
../../../src/spatialdata/_core/query/relational_query.py:697: in _call_join
    elements_dict, table = JoinTypes[how](elements_dict, table, match_rows)
../../../src/spatialdata/_core/query/relational_query.py:528: in __call__
    return self.value(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

element_dict = defaultdict(<function _create_sdata_elements_dict_for_join.<locals>.<lambda> at 0x3087c3c40>, {'shapes': defaultdict(<...))  ...                3
4  POLYGON ((66 -6, 66 6, 78 6, 78 -6, 66 -6))  ...                4

[5 rows x 3 columns]})})
table = AnnData object with n_obs × n_vars = 21 × 1
    obs: 'region', 'instance_id', 'categorical_in_obs', 'numerical_in_obs'
    uns: 'spatialdata_attrs'
match_rows = 'no'

    def _left_exclusive_join_spatialelement_table(
        element_dict: dict[str, dict[str, Any]], table: AnnData, match_rows: Literal["left", "no", "right"]
    ) -> tuple[dict[str, Any], AnnData | None]:
        regions, region_column_name, instance_key = get_table_keys(table)
        groups_df = table.obs.groupby(by=region_column_name, observed=False)
        for element_type, name_element in element_dict.items():
            for name, element in name_element.items():
                if name in regions:
                    group_df = groups_df.get_group(name)
                    table_instance_key_column = group_df[instance_key]
                    if element_type in ["points", "shapes"]:
                        mask = np.full(len(element), True, dtype=bool)
>                       mask[table_instance_key_column.values] = False
E                       IndexError: index 4 is out of bounds for axis 0 with size 4

../../../src/spatialdata/_core/query/relational_query.py:435: IndexError

Process finished with exit code 1
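
The failing line uses the table's instance ids as positions into a boolean mask whose length is the number of rows in the (truncated) element. The sketch below reproduces that failure mode with plain NumPy and pandas and contrasts it with a label-based mask. It is only an illustration under assumptions: the `element` DataFrame, the `table_instance_ids` values, and the idea that instance ids should be matched against the element's index labels are hypothetical stand-ins, not the actual spatialdata code or a confirmed fix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a shapes element truncated to its first 4 rows.
element = pd.DataFrame({"radius": [1.0, 1.0, 1.0, 1.0]}, index=[0, 1, 2, 3])

# Hypothetical instance ids annotated by the table for this region;
# some of them refer to rows that no longer exist in the element.
table_instance_ids = pd.Series([0, 1, 4, 5])

# Positional masking, as in the traceback: ids are treated as array positions.
mask = np.full(len(element), True, dtype=bool)
positional_error = None
try:
    mask[table_instance_ids.values] = False
except IndexError as err:
    # An id >= len(element) is out of bounds for the mask.
    positional_error = err

# Label-based alternative (an assumption about the intended semantics):
# keep the element rows whose index label is not among the table's instance ids.
label_mask = ~element.index.isin(table_instance_ids)
unmatched = element[label_mask]

print(positional_error)
print(unmatched)
```

The label-based mask does not depend on the element having as many rows as the largest instance id, and it also stays valid when the element index is not a contiguous range starting at 0.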