Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for full containment and overlapping modes in polygonToCellsExperimental #796

Merged
merged 9 commits into from
Nov 27, 2023

Conversation

nrabinowitz
Copy link
Collaborator

@nrabinowitz nrabinowitz commented Nov 4, 2023

Adds support for full containment mode and overlapping modes in polygonToCellsExperimental.

  • Defines an enum for the containment modes. My assumption here is that the lowest 4 bits of the 32-bit flag are reserved for containment modes (not sure that we'd need more than 7, but wanted to be on the safe side), and that we'd use them as an integer, not as bitwise flags, since the modes are mutually exclusive.
  • Implements the check for containment in iterStepPolygonCompact. This was very straightforward as we already have cellBoundaryInsidePolygon polygon available.
  • Implements the check for overlapping cells in iterStepPolygonCompact. This checks center point inclusion first, then shares code with the full containment check, substituting a new cellBoundaryCrossesPolygon check for cellBoundaryInsidePolygon.
  • Adds some very basic tests, which essentially just duplicate existing polyfill tests with full containment and overlapping turned on, substituting an empirically determined result and checking that it's reasonable compared to the center containment mode.
  • Updates all tests to use modes instead of 0 for flags.

TODO:

  • We should probably clean up the tests and use CENTER_CONTAINMENT instead of 0 for the flag value in existing tests.
  • There are a couple of error branches here that I think we can get coverage on by sticking invalid cells into the iterator object.
  • I'd love better tests here, though I'm not sure what would make sense.

@coveralls
Copy link

coveralls commented Nov 4, 2023

Coverage Status

coverage: 98.806% (+0.02%) from 98.784%
when pulling f0f1b45 on nrabinowitz:polyfill-contained
into 6c1b003 on uber:master.

@nrabinowitz nrabinowitz changed the title Add support for full containment mode in polygonToCellsExperimental Add support for full and overlapping containment modes in polygonToCellsExperimental Nov 4, 2023
@nrabinowitz nrabinowitz changed the title Add support for full and overlapping containment modes in polygonToCellsExperimental Add support for full containment and overlapping modes in polygonToCellsExperimental Nov 4, 2023

BENCHMARK(polygonToCells_AllCountries4, 5, {
for (int index = 0; index < ${polygons.length}; index++) {
H3_EXPORT(polygonToCellsExperimental)(&COUNTRIES[index], res, OVERLAPPING, hexagons);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new modes are slower than center containment, unsurprisingly, but still manage to be faster than the old algo in some cases:

Res 0	-- polygonToCells_AllCountries1: 198222.000000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries2: 42474.800000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries3: 247580.600000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries4: 282227.200000 microseconds per iteration (5 iterations)
Res 1	-- polygonToCells_AllCountries1: 324889.800000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries2: 18640.000000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries3: 72821.200000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries4: 89494.400000 microseconds per iteration (5 iterations)
Res 2	-- polygonToCells_AllCountries1: 257009.400000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries2: 66020.600000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries3: 143866.400000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries4: 181394.600000 microseconds per iteration (5 iterations)
Res 3	-- polygonToCells_AllCountries1: 424776.400000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries2: 280514.200000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries3: 442108.400000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries4: 545078.600000 microseconds per iteration (5 iterations)
Res 4	-- polygonToCells_AllCountries1: 2035545.200000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries2: 1462475.200000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries3: 1969331.600000 microseconds per iteration (5 iterations)
	-- polygonToCells_AllCountries4: 2559085.000000 microseconds per iteration (5 iterations)

Comment on lines 47 to 52
typedef enum {
CENTER_CONTAINMENT = 0,
FULL_CONTAINMENT = 1,
OVERLAPPING = 2,
INVALID_CONTAINMENT = 3
} ContainmentMode;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to prefix the enun names with something here (POLYGON_CENTER_CONTAINMENT?) so that it is clear a valid enum is being fed into polygonToCells.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - leaning towards CONTAINMENT_ as the prefix, so we'd have

  • CONTAINMENT_CENTER
  • CONTAINMENT_FULL
  • CONTAINMENT_OVERLAPPING
  • CONTAINMENT_INVALID

One thought here is that we might apply the same modes to circles or bounding boxes at some point, so I don't necessarily think we want to bake POLYGON into the name.

Comment on lines 48 to 51
CENTER_CONTAINMENT = 0,
FULL_CONTAINMENT = 1,
OVERLAPPING = 2,
INVALID_CONTAINMENT = 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CENTER_CONTAINMENT = 0,
FULL_CONTAINMENT = 1,
OVERLAPPING = 2,
INVALID_CONTAINMENT = 3
CENTER_CONTAINMENT = 0, ///< Cells are included if their center is in the polygon
FULL_CONTAINMENT = 1, ///< Cells are included if they are entirely contained in the polygon
OVERLAPPING = 2, ///< Cells are included if any part of them overlaps the polygon
INVALID_CONTAINMENT = 3 ///< This mode is invalid and should not be used.

Comment on lines +422 to +423
iterErrorPolygonCompact(iter, centerErr);
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and similar case below) are uncovered in test, but look like they should be testable by setting an invalid cell on the iterator and then calling this function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll try to add coverage here today or tomorrow.

Comment on lines 460 to 463
if (bboxErr) {
iterErrorPolygonCompact(iter, bboxErr);
return;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case may not be testable because presumably the boundary error above would not be E_SUCCESS? Perhaps there exists some index for which cellToBoundary returns 0 but cellToBbox returns non-0?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related question, do we have fuzzer question of these functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is coverable, will switch to NEVER if I can't figure out an approach.

We don't have a fuzzer for polygonToCellsExperimental, but I can add one - that's probably best in a separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to help getting a fuzzer added. Agreed doing that in a separate PR would make the most sense.

@nrabinowitz
Copy link
Collaborator Author

@isaacbrodsky I believe the latest commit should close the holes in coverage.

@michaelbrichko
Copy link
Contributor

Thanks for adding this functionality !!
How mature is it? Are there any knows issues for overlapping functionality? Any known gaps ?

@nrabinowitz
Copy link
Collaborator Author

Thanks for adding this functionality !! How mature is it? Are there any knows issues for overlapping functionality? Any known gaps ?

Well, you can see it was merged two weeks ago, so that's how mature it is :). But we've done a fair amount of testing against the older algo, and it seems to behave as expected. Overlapping mode seems to work as expected, we aren't aware of any issues, but it's hard to extensively test this. The main (expected) issue is that overlapping mode is slower than the other modes, as discussed in this PR.

@michaelbrichko
Copy link
Contributor

Ok, thanks for quick answer !

@brendan-morin
Copy link

Look great! Has this been released yet? If not, is there a stable tag we could reference? We use the h3-py bindings and it looks like it’s been a bit since the last release.

@bertday
Copy link

bertday commented Apr 2, 2024

Hello H3 friends! 👋

I use H3 often and this PR would be tremendously helpful. Is there any timeline for when a new release might come out with the new polyfill modes?

Thank you for all of the hard work that went into this!

isaacbrodsky added a commit that referenced this pull request Oct 6, 2024
This adds new fuzzers for the `polygonToCellsExperimental` function, based on the existing functions. Since #796 (this PR is based on that branch) adds containment modes, this updates the existing fuzzers to exercise those options too.

The estimation functions are adjusted so that the fuzzers pass. This may reduce performance of the new algorithms somewhat, but avoids potential buffer problems. This can be revisited to get them back to how they were intended to work.

---------

Co-authored-by: Nick Rabinowitz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants