
make queryables and summaries automatically updatable #18

Open
wants to merge 4 commits into
base: main
from

Conversation

mishaschwartz

@mishaschwartz mishaschwartz commented Jan 16, 2025

Previously, this app implemented a custom /queryables endpoint that crawled the database to display information about the stored items. This method had some limitations:

  • It only worked for individual collections, not all queryables across all collections
  • It was really slow since it had to inspect the entire database every time the endpoint was called

This PR improves on this method by introducing Postgres functions that collect the same queryables information from the database and store it in the queryables table. Caching the information this way lets the default /queryables endpoint function return it quickly for a single collection or for all collections.

A similar strategy is also implemented here to ensure that the collection summaries and extents are kept up to date.
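
For illustration only, here is a minimal Python sketch of the caching idea described above. It is not the PR's implementation, which uses Postgres functions and the queryables table. The names `ITEMS`, `discover_queryables`, and `QueryablesCache` are hypothetical, and the in-memory structures stand in for the database tables.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the items stored in the database.
ITEMS = [
    {"collection": "c1", "properties": {"platform": "landsat-8", "eo:cloud_cover": 12}},
    {"collection": "c1", "properties": {"platform": "landsat-9"}},
    {"collection": "c2", "properties": {"variable_id": "tas"}},
]


def discover_queryables(items):
    """Scan every item once and return {collection: {property: sorted values}}."""
    found = defaultdict(lambda: defaultdict(set))
    for item in items:
        for key, value in item["properties"].items():
            found[item["collection"]][key].add(value)
    return {
        collection: {key: sorted(values, key=str) for key, values in props.items()}
        for collection, props in found.items()
    }


class QueryablesCache:
    """Stand-in for the queryables table: refreshed on demand, read cheaply."""

    def __init__(self):
        self._cache = {}

    def refresh(self, items):
        # The expensive full scan happens only here (assumed to correspond
        # to the update triggered by PATCH /queryables).
        self._cache = discover_queryables(items)

    def get(self, collection=None):
        # Reads never touch the items; they only consult the cache.
        if collection is not None:
            return self._cache.get(collection, {})
        merged = defaultdict(set)
        for props in self._cache.values():
            for key, values in props.items():
                merged[key].update(values)
        return {key: sorted(values, key=str) for key, values in merged.items()}
```

Calling `refresh(ITEMS)` once and then `get("c1")` or `get()` shows the trade-off: reads are fast for one collection or for all of them, but they only reflect the database as of the last refresh.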

See the updated README for more detailed information.

To test this in birdhouse:

  • clone this repo and run docker build -t tmp-stac-all:local-test .
  • in birdhouse/components/stac/default.env update:
    • export STAC_IMAGE='tmp-stac-all:local-test'
  • start up the birdhouse stack normally (add in the stac-populator optional component to add some info to the STAC database if there's nothing there already).
  • check out the /queryables endpoints to see what is there by default
  • check out the extent and summaries sections from the /collections endpoint to see what is there by default
  • run PATCH /queryables and PATCH /summaries (you may need admin permissions since magpie permissions are set to deny non-GET requests)
  • check out the endpoints again to see what has changed

Hint: to send PATCH requests to stac with the admin cookies:

ADMIN_COOKIE=$(curl -X POST http://<BIRDHOUSE_FQDN>/magpie/signin -H "Content-Type: application/json" -d '{"user_name": "<ADMIN_USERNAME>", "password": "<ADMIN_PASSWORD>"}' --cookie-jar -)

curl -X PATCH http://<BIRDHOUSE_FQDN>/stac/queryables --cookie <(echo "$ADMIN_COOKIE")
curl -X PATCH http://<BIRDHOUSE_FQDN>/stac/summaries --cookie <(echo "$ADMIN_COOKIE")
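
The same sign-in and PATCH sequence can be sketched in Python with only the standard library. This assumes the endpoints and JSON body shown in the curl commands above; the helper names (`build_signin_request`, `build_patch_request`, `refresh_stac`) are hypothetical, and `base_url` would be something like `http://<BIRDHOUSE_FQDN>`.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def build_signin_request(base_url, username, password):
    """Build the POST request for the Magpie sign-in endpoint."""
    body = json.dumps({"user_name": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/magpie/signin",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_patch_request(base_url, endpoint):
    """Build a PATCH request for /stac/queryables or /stac/summaries."""
    return urllib.request.Request(f"{base_url}/stac/{endpoint}", method="PATCH")


def refresh_stac(base_url, username, password):
    """Sign in, then PATCH both endpoints using the session cookies."""
    # The cookie jar plays the role of --cookie-jar / --cookie in the curl version.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.open(build_signin_request(base_url, username, password)).close()
    statuses = {}
    for endpoint in ("queryables", "summaries"):
        with opener.open(build_patch_request(base_url, endpoint)) as response:
            statuses[endpoint] = response.status
    return statuses
```

Only `refresh_stac` performs network calls; the two builder functions can be inspected without a running stack.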


@fmigneault fmigneault left a comment


Should probably start a CHANGELOG to keep track of updates and features applied. Could also start versioning properly.

.dockerignore (outdated, resolved)
README.md (resolved)
README.md (outdated, resolved)
README.md (outdated, resolved)
discover_queryables.sql (outdated, resolved)
discover_summaries.sql (outdated, resolved)
Comment on lines 125 to 135
JOIN LATERAL jsonb_each(properties) ON TRUE
JOIN LATERAL jsonb_array_elements(
    CASE jsonb_typeof(value)
        WHEN 'array' THEN
            value
        ELSE
            jsonb_build_array(value)
    END
) AS a ON TRUE
-- see https://github.com/stac-extensions/timestamps
WHERE key NOT IN ('created', 'updated', 'published', 'expires', 'unpublished', 'datetime', 'start_datetime', 'end_datetime')


I'm curious about the result of more complex structures. How are they handled?

I'm thinking of, for example, bands/eo:bands/raster:bands (https://github.com/radiantearth/stac-spec/blob/master/commons/common-metadata.md#bands, https://github.com/stac-extensions/eo, https://github.com/stac-extensions/raster) or mlm:inputs/mlm:outputs/mlm:hyperparameters (https://github.com/stac-extensions/mlm) that have JSON arrays of nested objects.

Same question applies for queryables.

Author


Yeah that's a good question. Right now the summaries will show them as an array of objects (in queryables it's also an array of objects under an "enum" key).

It might be nice in the future to add special handling cases for bands and other common object structures. The tricky part is identifying them since there are many different property names from different stac extensions.
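
To make the behaviour described here concrete, the following Python sketch mirrors the expansion step of the SQL fragment under review: each property value is treated as an array (scalars are wrapped), timestamp keys are skipped, and the elements are collected per key. The aggregation into de-duplicated lists is an assumption for illustration, since the reviewed lines only show the expansion, and the function names and sample items are hypothetical.

```python
# Keys excluded by the WHERE clause in the reviewed SQL fragment.
TIMESTAMP_KEYS = {
    "created", "updated", "published", "expires",
    "unpublished", "datetime", "start_datetime", "end_datetime",
}

# Hypothetical item properties, one with a nested eo:bands array.
ITEMS = [
    {
        "platform": "sentinel-2",
        "datetime": "2025-01-16T00:00:00Z",
        "eo:bands": [
            {"name": "B04", "common_name": "red"},
            {"name": "B08", "common_name": "nir"},
        ],
    },
    {"platform": "sentinel-2", "instruments": ["msi"]},
]


def expand_properties(properties):
    """Yield (key, element) pairs, treating scalars as one-element arrays."""
    for key, value in properties.items():
        if key in TIMESTAMP_KEYS:
            continue
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            yield key, element


def summarize(items):
    """Collect the distinct elements seen for each property key."""
    summaries = {}
    for properties in items:
        for key, element in expand_properties(properties):
            values = summaries.setdefault(key, [])
            if element not in values:
                values.append(element)
    return summaries
```

With these sample items, `summarize(ITEMS)["eo:bands"]` is a list of the two band objects, which is the "array of objects" outcome mentioned above: the expansion unwraps the outer array but does not look inside each object.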


I think the expectation of /queryables would be that unusual or complicated structures that cannot be directly queried would not be listed. For the time being, I don't think STAC queries offer an official way to search nested objects, beside maybe some convoluted CQL2 filter?
In most cases, I believe those complicated structures are informative metadata rather than something expected to be queryable. So, maybe just omitting them from /queryables would be best?
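
One possible reading of this suggestion, sketched in Python: keep a property only if its value is a scalar or an array of scalars, and drop objects and arrays of objects. This is a hypothetical filter for illustration, not code from the PR, and the function names are invented.

```python
def is_scalar(value):
    """True for JSON scalars: null, string, number, or boolean."""
    return value is None or isinstance(value, (str, int, float, bool))


def is_directly_queryable(value):
    """Accept scalars and arrays of scalars; reject objects and arrays of objects."""
    if isinstance(value, list):
        return all(is_scalar(element) for element in value)
    return is_scalar(value)


def advertised_queryables(properties):
    """Return only the properties that would be listed under /queryables."""
    return {
        key: value
        for key, value in properties.items()
        if is_directly_queryable(value)
    }
```

Under this rule, properties such as `platform` or `instruments` would still be advertised, while `eo:bands` or `mlm:hyperparameters` would be omitted.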

Author


This code tries to replicate the behaviour we had before but if I have time I can try to handle this in a nicer way.

> beside maybe some convoluted CQL2 filter

This app implements the Filter extension, so it supports CQL2 syntax.


I had to double-check to be sure: https://github.com/stac-api-extensions/filter#queryables
I remembered this as still being in a "not supported" state.

See emphasized text parts:

Queryables can also be used to advertise "synthesized" property values. The only requirement in CQL2 is that the property have a type and evaluate to literal value of that type or NULL. For example, a filter like "Items must have an Asset with an eo:band with the common_name of 'nir'" can be expressed. A Queryable assets_bands could be defined to have a type of array of string and have the semantics that it contains all of common_name values across all assets and bands for an Item. This could then be filtered with the CQL2 expression 'nir' in assets_bands. Implementations would then expand this expression into the appropriate query against its datastore. (TBD if this will actually work or not. This is also related to the upcoming restriction on property/literal comparisons)

An implementation may also choose not to advertise any queryables, and provide the user with out-of-band information or simply let them try querying against fields. While this is not allowed according to the OGC CQL2 Queryable spec, it is allowed in STAC API by the Filter Extension.

This is still somewhat of a free-for-all.
To be compliant, I guess properties that /queryables does not explicitly handle (objects/arrays) would have to be omitted, especially if the code does not process them in any special way. Listing them is counter-productive for servers that rely on /queryables.
The filter can still allow users to try some complex JSON-path syntax within a CQL2 expression, but /queryables doesn't advertise any "guarantee". The best option really would be for those edge cases to have custom properties to facilitate filtering.
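
As a rough illustration of the "synthesized" queryable from the quoted Filter Extension text, the Python sketch below derives an `assets_bands` value for a single item: the `common_name` values across all of its assets and bands. The sample item is hypothetical, and reading the bands from an `eo:bands` key on each asset is an assumption. How a server would store such a custom property or expand a CQL2 expression against it is not shown.

```python
# Hypothetical item with two assets that carry eo:bands metadata.
ITEM = {
    "id": "example-item",
    "assets": {
        "visual": {"eo:bands": [{"name": "B04", "common_name": "red"}]},
        "analytic": {
            "eo:bands": [
                {"name": "B04", "common_name": "red"},
                {"name": "B08", "common_name": "nir"},
            ]
        },
        "thumbnail": {},
    },
}


def assets_bands(item):
    """Collect distinct common_name values across all assets and bands."""
    names = []
    for asset in item.get("assets", {}).values():
        for band in asset.get("eo:bands", []):
            name = band.get("common_name")
            if name is not None and name not in names:
                names.append(name)
    return names
```

The result is an array of strings, so a membership check such as `"nir" in assets_bands(ITEM)` corresponds to the CQL2 expression `'nir' in assets_bands` from the quoted text.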

README.md (resolved)
.bumpversion.cfg (resolved)
CHANGES.md (outdated, resolved)
CHANGES.md (resolved)

@tlvu tlvu left a comment


I defer to Francis's review.


3 participants