Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract_track bug #697

Open
daniel-caichac-DHI opened this issue May 29, 2024 · 8 comments
Open

extract_track bug #697

daniel-caichac-DHI opened this issue May 29, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@daniel-caichac-DHI
Copy link
Collaborator

daniel-caichac-DHI commented May 29, 2024

Describe the bug
Hi, I was facing issues with modelskill track comparisons, so I had to dig deeper (thanks to JAN for the tips), but there seems to be an issue with the .extract_track tool when working with UTM coordinates at least.
I ended up running dataextractionfm.exe and that one worked fine, but it breaks the workflow of having a single notebook.

The issue is that it is not finding the paired data, when indeed there is.

To Reproduce
I am uploading a zip file with csv file of tracks, a 1-day dfsu with results, and the full workflow (notebook) to reproduce the issue.
Bug.zip

System information:

  • Python version: 3.10.8
  • MIKE IO version : 1.6.2
@daniel-caichac-DHI daniel-caichac-DHI changed the title extractk_track bug extract_track bug May 29, 2024
@ecomodeller ecomodeller added the bug Something isn't working label May 29, 2024
@ecomodeller
Copy link
Member

ecomodeller commented May 29, 2024

The error is caused by the GeometryFM2D.contains method, which incorrectly identifies points as outside the domain.

Here is a breakdown of the boundary segments, where some of the exterior segments seem to be classified as interior.

import matplotlib.pyplot as plt
import pandas as pd

import mikeio

df = pd.read_csv('Altmetry_data_debug.csv',index_col=0,parse_dates=True)
df_utm = df[['easting','northing','significant_wave_height']]
ds = mikeio.Dfsu('Model_subset.dfsu').read()
fig, ax = plt.subplots(figsize=(10,4))
g = ds.geometry

for exterior in g.boundary_polylines.exteriors:
    ax.plot(*exterior.xy.T, color='k', linewidth=1.2)
for interior in g.boundary_polylines.interiors:
    ax.plot(*interior.xy.T, color='r', linewidth=0.5, linestyle='dashed')
plt.scatter(df_utm['easting'],df_utm['northing']);

image

@jsmariegaard I suppose the detection if a point is inside the domain or not, is handled in a different way in DataTrackExtractionFM.exe?

@daniel-caichac-DHI
Copy link
Collaborator Author

This explains why all the modelskill comparisons where done with data in this part only
image

@ecomodeller
Copy link
Member

If I replace the current implementation with shapely it seems to work as expected. The reason why this approach is not the default, was that is can be quite slow to construct.

def contains(self, points: np.ndarray) -> np.ndarray:
        """test if a list of points are contained by mesh

        Parameters
        ----------
        points : array-like n-by-2
            x,y-coordinates of n points to be tested

        Returns
        -------
        bool array
            True for points inside, False otherwise
        """
        # import matplotlib.path as mp 

        # points = np.atleast_2d(points)

        # exterior = self.boundary_polylines.exteriors[0]
        # cnts = mp.Path(exterior.xy).contains_points(points)

        # if self.boundary_polylines.n_exteriors > 1:
        #     # in case of several dis-joint outer domains
        #     for exterior in self.boundary_polylines.exteriors[1:]:
        #         in_domain = mp.Path(exterior.xy).contains_points(points)
        #         cnts = np.logical_or(cnts, in_domain)

        # # subtract any holes
        # for interior in self.boundary_polylines.interiors:
        #     in_hole = mp.Path(interior.xy).contains_points(points)
        #     cnts = np.logical_and(cnts, ~in_hole)

        # return cnts

        from shapely.geometry import Point

        domain = self.to_shapely().buffer(0)
        return np.array([domain.contains(Point(p)) for p in points])

image

@daniel-caichac-DHI
Copy link
Collaborator Author

I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains or .inteserction methods implemented

@ecomodeller
Copy link
Member

I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains or .inteserction methods implemented

See this branch https://github.com/DHI/mikeio/tree/geometryFM_contains_shapely

@ecomodeller
Copy link
Member

This problem is not only relevant for track extraction.

It is a problem for point extraction as well in some cases when using IDW interpolation.

Below is an example of a tide gauge located on an island, where the 5 nearest elements creates a domain, where the contains method fails.

image

@daniel-caichac-DHI
Copy link
Collaborator Author

daniel-caichac-DHI commented Jun 3, 2024

But then why don't we:

  • create a failing test (we can use the files I sent and the example you just showed)
  • change to the solution with shapely
  • pass the tests while sacrificing speed, and then
  • add to the backlog as task of improve_efficiency_contains_points or similar
    ?

@ecomodeller
Copy link
Member

But then why don't we:

  • create a failing test (we can use the files I sent and the example you just showed)
  • change to the solution with shapely
  • pass the tests while sacrificing speed, and then
  • add to the backlog as task of improve_efficiency_contains_points or similar
    ?

We will!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants