-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check query before fetching data from server #12
Comments
Assuming that the purpose of pytest-gee is to test packages that use the Earth Engine API (either via the Earth Engine Python client library or REST API), you may want to mock responses of the Earth Engine API so the test run without accessing Earth Engine. |
Sorry for not answering earlier but the objective is actually to check the Python API response. The problem we've been facing in the past is small changes made by the GEE behavior without notifying developers. The purpose of the lib here is to really asset the result of the server-side computation. To provide some context, this plugn is a byproduct of the geetools package that create and manipulate server side object. I really need the response to be evaluate to make sur I'm doing the right thing. I actually got an idea from a colleague on a private repo, I'll try to implement it when I have the time (you'll see it's fancy 😄) |
In this case I would suggest setting up two sets of tests:
|
ah sure this will be done in downstream packages (geetools, ipygee, pygaul... maybe more ?) |
Just offering my opinion from the EE perspective (cc @schwehr). The onus should not be on EE users and packages to verify the consistency of Python API calls to EE. I can only speak from the Python team's perspective, but we take backward compatibility and API stability seriously. Here's what we've done in the last year to improve the reliability and consistency of the Python client libs.
With that being said, there is still room for improvement––code changes can still happen that modify the output of an algorithm call. I like your idea to record server responses. We call these types of tests hermetic tests internally. We record live backend responses, cache them, and replay them in a hermetic test environment. They are a powerful tool for reducing test instability, but are often brittle and expensive to maintain the recordings. |
thanks @naschmitz for your detailed answer, I think a simple use case would be the following: I create a function that removes the system properties from the property names of an object: def removeSystem(obj) -> ee.list:
names = obj.propertyNames()
systemNames = names.filter(ee.Filter.stringStartsWith("item", "system:"))
return name.removeAll(systemNames) Which would work. If I understand what you suggest, I should not test consistency of the API call response but simply the API call itself: def test_remove_system(data_regression, obj):
list= removeSystem(obj)
data = ee.serializer.encode(list)
data_regression.check(data) Which work consistently as I'm always doing the same API call. But at some point I guess a clever contributor will notify me and say: "hey you can do it better": def removeSystem(obj) -> ee.list:
names = obj.propertyNames()
return names.filter(ee.Filter.stringStartsWith("item", "system:").Not()) And then without testing the actual backend response I cannot check if the new implemented call is still providing the same results. If I understood Rodrigo's suggestion correctly, it's: @naschmitz is using |
Thank you @naschmitz for taking the time to reply on this thread 🙂 @12rambau understood my suggestion correctly. It's based on the hypothesis that "If the query string (from Using a simple example: def my_func() -> ee.Number:
"""A function that returns 4"""
return ee.Number(2).add(2)
# a normal test
def test_my_func():
four = my_func()
assert four.getInfo() == 4 If the package is being updated frequently and it has many tests like this, we'll be making a lot of call unnecessarily. So, we could do: def test_my_func():
# open the recorded query string
with open('serialized_four.json', 'r') as file:
expected_serialized = json.load(file) # {"result": "0", "values": {"0": {"constantValue": 4}}}
# get query string of "my_func"
serialized = my_func().serialize()
# if query string has not changed, test passes
if serialized == expected_serialized:
assert True
# if query string has changed, check server's response
else:
is_same = four.getInfo() == 4
assert is_same
# if the server's response is the same, record the new query string
if is_same:
# store new expected_serialized
with open('serialized_four.json', 'w') as file:
json.dump(serialized, file) |
+1 to what @naschmitz said. The results from An example of the EE api unit tests checking the complete expression graph for an API call: DictionaryTest.test_combine() Please note that the EE server responses can change over time independent of the client version (for the same client request). Pinning on a client version will not help with that issue. Migrating our internal integration tests to GitHub will help make that more observable. |
Problem
Commercial user of GEE API get charged for its usage, so every time a test is triggered, it fetches data from server creating cost for the company. Also, it takes time to fetch data, so when trying to run a complete set of tests, it could take long to finish.
Solution
Save the query string into a file (write/read functionality can be found in geetools) and, when the test is triggered, first check if the query string is the same, if
True
then skip it, ifFalse
then fetch.Context
The version of GEE python API could be something to consider, since the structure of the query could change from version to version, thus a version check could be implemented. The version should be stored somewhere alongside the query string, so to avoid having to modify the string and for example create a new string/dict with the version code, my proposal is to write it as part of the filename, something like:
test_some_method_0-1-384.gee
Probably @tylere could have an opinion on this.
The text was updated successfully, but these errors were encountered: