Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure API upload script throws error/warning if Athena and Socrata columns mismatch #702

Merged
merged 12 commits into from
Jan 10, 2025

Conversation

wrridgeway
Copy link
Member

@wrridgeway wrridgeway commented Jan 9, 2025

This PR makes sure we don't accidentally update part of an open data asset. If there are any columns in an open data asset that wouldn't get updated by the script, it will throw an error.

For now, it will only log a warning if there are columns in the Athena view feeding an asset that are not also in the asset. This is done to give us flexibility for now since not all columns in a view are intended to end up on the open data portal. This can be adjusted in the future if we shift the views feeding all assets to the open_data db since we'll want those views and assets to mirror one another 1:1.

Here are examples of expected failure and expected success with a warning.

@wrridgeway wrridgeway self-assigned this Jan 9, 2025
@wrridgeway wrridgeway changed the title Ensure API upload script throws an error if Athena and Socrata columns mismatch Ensure API upload script throws error if Athena and Socrata columns mismatch Jan 9, 2025
@wrridgeway wrridgeway changed the title Ensure API upload script throws error if Athena and Socrata columns mismatch Ensure API upload script throws error/warning if Athena and Socrata columns mismatch Jan 9, 2025
@wrridgeway wrridgeway marked this pull request as ready for review January 9, 2025 21:06
@wrridgeway wrridgeway requested a review from a team as a code owner January 9, 2025 21:06
Copy link
Contributor

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for separating this from the SSL error business that I was unsure about!


if len(columns_not_on_socrata) > 0:
exception_message += f"\nColumns in Athena but not on Socrata: {columns_not_on_socrata}"
logging.warning(exception_message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Thought, non-blocking] Not necessary to change, but I thought you might be interested in some Python code style: It's actually somewhat uncommon to call methods on the logging module directly like this. Per the docs, typical usage is to instantiate a module-level logger object that we then use to call logging methods:

# __name__ is a reserved keyword storing the name of the current module
logger = logging.getLogger(__name__)
logger.warning("Your warning message here")

When we call a method like warning() directly on the logging object that we import into the module, we're calling it on the "root" logger, which is the default logging object that the logging module exports.

There are a few ways that a module-level logger object offers more flexibility than the root logger, but the most important one is that the log output contains a reference to the module that emitted the log:

>>> import logging
>>> logging.warning("foo")

WARNING:root:foo

>>> logger = logging.getLogger(__name__)
>>> logger.warning("foo")

# __main__ is a special module name referring to the top-level code environment, see:
# https://docs.python.org/3/library/__main__.html
WARNING:__main__:foo

Since all of the code for socrata/socrata_upload.py lives in one module, this feature doesn't matter much, since we can be confident that any logs emitted by the root logger are coming from this script. If we ever decide to refactor the code to pull out certain pieces into other modules, however, module-level loggers would help us know which part of the code is emitting the logs that we see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jeancochrane, fixed!

@wrridgeway wrridgeway merged commit f37dd20 into master Jan 10, 2025
7 checks passed
@wrridgeway wrridgeway deleted the ensure-bad-column-match-errors branch January 10, 2025 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants