-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure API upload script throws error/warning if Athena and Socrata columns mismatch #702
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for separating this from the SSL error business that I was unsure about!
socrata/socrata_upload.py
Outdated
|
||
if len(columns_not_on_socrata) > 0: | ||
exception_message += f"\nColumns in Athena but not on Socrata: {columns_not_on_socrata}" | ||
logging.warning(exception_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Thought, non-blocking] Not necessary to change, but I thought you might be interested in some Python code style: It's actually somewhat uncommon to call methods on the logging
module directly like this. Per the docs, typical usage is to instantiate a module-level logger object that we then use to call logging methods:
# __name__ is a reserved keyword storing the name of the current module
logger = logging.getLogger(__name__)
logger.warning("Your warning message here")
When we call a method like warning()
directly on the logging
object that we import into the module, we're calling it on the "root" logger, which is the default logging object that the logging
module exports.
There are a few ways that a module-level logger object offers more flexibility than the root logger, but the most important one is that the log output contains a reference to the module that emitted the log:
>>> import logging
>>> logging.warning("foo")
WARNING:root:foo
>>> logger = logging.getLogger(__name__)
>>> logger.warning("foo")
# __main__ is a special module name referring to the top-level code environment, see:
# https://docs.python.org/3/library/__main__.html
WARNING:__main__:foo
Since all of the code for socrata/socrata_upload.py
lives in one module, this feature doesn't matter much, since we can be confident that any logs emitted by the root logger are coming from this script. If we ever decide to refactor the code to pull out certain pieces into other modules, however, module-level loggers would help us know which part of the code is emitting the logs that we see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jeancochrane, fixed!
This PR makes sure we don't accidentally update part of an open data asset. If there are any columns in an open data asset that wouldn't get updated by the script, it will throw an error.
For now, it will only log a warning if there are columns in the Athena view feeding an asset that are not also in the asset. This is done to give us flexibility for now since not all columns in a view are intended to end up on the open data portal. This can be adjusted in the future if we shift the views feeding all assets to the
open_data
db since we'll want those views and assets to mirror one another 1:1.Here are examples of expected failure and expected success with a warning.