[BUG] Custom expectation created using SQL becomes missing after initializing the datacontext #10756

Open
Erua-chijioke opened this issue Dec 10, 2024 · 2 comments

Erua-chijioke commented Dec 10, 2024

Describe the bug
Using Great Expectations version 1.2.4, I created a custom expectation using SQL and added it to an expectation suite. After the Python session ended (my Databricks cluster was shut down because of inactivity), I restarted the cluster and re-initialized the existing data context that holds the expectation suite. When I tried to access the suite to view its contents, I discovered that the custom expectation I had created earlier was no longer present, and an error was thrown.

To Reproduce
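Create the data source, asset, batch definition and expectation suite:
import great_expectations as gx

# `spark` is the SparkSession that the Databricks notebook provides
dataframe = spark.sql("SELECT * FROM xxx")
context = gx.get_context(project_root_dir="/dbfs/xxx")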

data_source_name = "my_data_source"
data_asset_name = "my_dataframe_data_asset"
batch_definition_name = "my_batch_definition"

data_source = context.data_sources.add_or_update_spark(name=data_source_name)
try:
    data_asset = data_source.add_dataframe_asset(name=data_asset_name)
except Exception as e:
    data_asset = context.data_sources.get(data_source_name).get_asset(data_asset_name)

try:
    batch_definition = data_asset.add_batch_definition_whole_dataframe(batch_definition_name)
except Exception as e:
    batch_definition = data_asset.get_batch_definition(batch_definition_name)

suite_name = "d365_enriched_generaljournaltransaction_expectation_suite"
try:
    suite = gx.ExpectationSuite(name=suite_name)
    suite = context.suites.add(suite)
except Exception as e:
    suite = context.suites.get(name=suite_name)

# Now let's create a custom expectation using SQL
class ExpectValidLineItemSum(gx.expectations.UnexpectedRowsExpectation):
    unexpected_rows_query: str = ("""SELECT CrayonCompanyIdRef, SUM(accountingcurrencyamount) AS total_amount
        FROM {batch}
        WHERE accountingdate >= MAKE_DATE(YEAR(CURRENT_DATE) - 1, 1, 1)
        AND accountingdate <= MAKE_DATE(YEAR(CURRENT_DATE) - 1, 12, 31)
        GROUP BY CrayonCompanyIdRef
        HAVING SUM(accountingcurrencyamount) NOT BETWEEN -1 AND 1""")
    description: str = "Line items should have a valid sum between -1 and +1"

expectation = ExpectValidLineItemSum()

try:
    suite.add_expectation(expectation)
except Exception as e:
    print("Expectation already exists in the suite")

At this point the expectation has been added to the expectation suite successfully.
End the current Python session (in my case, restarting the Databricks cluster does that).
Then run the following code to retrieve the expectation suite saved in the previous session.

Re-initialize the data context and load the expectation suite:

import great_expectations as gx

context = gx.get_context(project_root_dir="/dbfs/xxx")

data_source_name = "my_data_source"
data_asset_name = "my_dataframe_data_asset"
batch_definition_name = "my_batch_definition"

suite_name = "d365_enriched_generaljournaltransaction_expectation_suite"
try:
    suite = gx.ExpectationSuite(name=suite_name)
    suite = context.suites.add(suite)
except Exception as e:
    suite = context.suites.get(name=suite_name)

This is the error it throws:
ERROR:great_expectations.core.expectation_suite:Could not add expectation; provided configuration is not valid: Could not add expectation; provided configuration is not valid: expect_valid_line_item_sum not found
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/core/expectation_suite.py", line 639, in _build_expectation
expectation = expectation_configuration.to_domain_obj()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/expectations/expectation_configuration.py", line 447, in to_domain_obj
expectation_impl = self._get_expectation_impl()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/expectations/expectation_configuration.py", line 444, in _get_expectation_impl
return get_expectation_impl(self.type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/expectations/registry.py", line 396, in get_expectation_impl
raise gx_exceptions.ExpectationNotFoundError(f"{expectation_name} not found") # noqa: TRY003
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
great_expectations.exceptions.exceptions.ExpectationNotFoundError: expect_valid_line_item_sum not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/core/expectation_suite.py", line 94, in __init__
self.expectations.append(self._process_expectation(exp))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/core/expectation_suite.py", line 216, in _process_expectation
return self._build_expectation(expectation_configuration=expectation_like)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/great_expectations/core/expectation_suite.py", line 646, in _build_expectation
raise gx_exceptions.InvalidExpectationConfigurationError( # noqa: TRY003
great_expectations.exceptions.exceptions.InvalidExpectationConfigurationError: Could not add expectation; provided configuration is not valid: expect_valid_line_item_sum not found
Expected behavior
The expectation I saved to the expectation suite should be retrievable after I start a new Python session, because the suite is persisted as a JSON file inside the data context.

Environment (please complete the following information):

  • Operating System: Windows
  • Great Expectations Version: 1.2.4
  • Data Source: Spark Dataframe
  • Cloud environment: Databricks

@Erua-chijioke Erua-chijioke changed the title Custom expectation created using SQL becomes missing after initializing the datacontext [BUG] Custom expectation created using SQL becomes missing after initializing the datacontext Dec 10, 2024
@adeola-ak adeola-ak moved this from To Do to In progress in GX Core Issues Board Dec 17, 2024
@adeola-ak
Contributor

Does the earlier suggestion of importing your custom expectation into the Python session where you load your context resolve this issue? We believe importing the expectation in the second Python session will help.

@Erua-chijioke
Author

@adeola-ak, I used UnexpectedRowsExpectation directly, without subclassing. That was an alternative @tyler-hoffman suggested.
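
A minimal sketch of that alternative, reusing the query and description from the original report. The helper name `build_line_item_sum_expectation` and the two module-level constants are made up for this example, and GX is imported inside the helper only so that the snippet loads even where GX is not installed.

```python
# SQL from the original report; {batch} is the placeholder used there for the
# batch under validation.
UNEXPECTED_ROWS_QUERY = """
SELECT CrayonCompanyIdRef, SUM(accountingcurrencyamount) AS total_amount
FROM {batch}
WHERE accountingdate >= MAKE_DATE(YEAR(CURRENT_DATE) - 1, 1, 1)
  AND accountingdate <= MAKE_DATE(YEAR(CURRENT_DATE) - 1, 12, 31)
GROUP BY CrayonCompanyIdRef
HAVING SUM(accountingcurrencyamount) NOT BETWEEN -1 AND 1
"""

DESCRIPTION = "Line items should have a valid sum between -1 and +1"


def build_line_item_sum_expectation():
    """Hypothetical helper: instantiate the built-in class instead of a subclass."""
    import great_expectations as gx  # lazy import; requires GX to be installed

    return gx.expectations.UnexpectedRowsExpectation(
        unexpected_rows_query=UNEXPECTED_ROWS_QUERY,
        description=DESCRIPTION,
    )
```

With this approach, `suite.add_expectation(build_line_item_sum_expectation())` stores the built-in expectation type rather than a custom one, so a new session should be able to load the suite without first importing a custom class.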

I also noticed another issue with UnexpectedRowsExpectation: its results are not displayed properly in the Data Docs. I will create a separate ticket for that.
