Ensure API upload script throws error/warning if Athena and Socrata columns mismatch #702

wrridgeway · 2025-01-09T17:04:11Z

This PR makes sure we don't accidentally update part of an open data asset. If there are any columns in an open data asset that wouldn't get updated by the script, it will throw an error.

For now, it only logs a warning if the Athena view feeding an asset contains columns that are not also in the asset. This gives us flexibility, since not all columns in a view are intended to end up on the open data portal. We can tighten this in the future if we move the views feeding all assets to the open_data db, since we'll then want those views and assets to mirror one another 1:1.
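As a rough illustration of the two-way check described above, here is a minimal sketch. It is not the code in socrata/socrata_upload.py: the function name `check_columns`, the variable `columns_not_in_athena`, and the choice of `ValueError` are assumptions made for the example; only `columns_not_on_socrata` appears in the diff.

```python
import logging

logger = logging.getLogger(__name__)


def check_columns(athena_columns, socrata_columns):
    """Compare column names between an Athena view and a Socrata asset.

    Hypothetical helper for illustration only.
    """
    athena = set(athena_columns)
    socrata = set(socrata_columns)

    # Asset columns that the upload would leave untouched: treated as an error.
    columns_not_in_athena = sorted(socrata - athena)
    # View columns that never reach the asset: only a warning for now.
    columns_not_on_socrata = sorted(athena - socrata)

    if columns_not_on_socrata:
        logger.warning(
            "Columns in Athena but not on Socrata: %s", columns_not_on_socrata
        )

    if columns_not_in_athena:
        # ValueError is an assumed exception type, not taken from the PR.
        raise ValueError(
            f"Columns on Socrata but not in Athena: {columns_not_in_athena}"
        )

    return columns_not_on_socrata
```

With this sketch, extra columns in the view produce a warning and the upload can proceed, while extra columns in the asset stop the run before a partial update can happen.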

Here are examples of expected failure and expected success with a warning.

jeancochrane

Thanks for separating this from the SSL error business that I was unsure about!

jeancochrane · 2025-01-10T16:29:04Z

socrata/socrata_upload.py

+
+        if len(columns_not_on_socrata) > 0:
+            exception_message += f"\nColumns in Athena but not on Socrata: {columns_not_on_socrata}"
+            logging.warning(exception_message)


[Thought, non-blocking] Not necessary to change, but I thought you might be interested in some Python code style: It's actually somewhat uncommon to call methods on the logging module directly like this. Per the docs, typical usage is to instantiate a module-level logger object that we then use to call logging methods:

    import logging

    # __name__ is a special module-level variable storing the name of the current module
    logger = logging.getLogger(__name__)
    logger.warning("Your warning message here")

When we call a method like warning() directly on the logging object that we import into the module, we're calling it on the "root" logger, which is the default logging object that the logging module exports.

There are a few ways that a module-level logger object offers more flexibility than the root logger, but the most important one is that the log output contains a reference to the module that emitted the log:

    >>> import logging
    >>> logging.warning("foo")
    WARNING:root:foo
    >>> logger = logging.getLogger(__name__)
    >>> logger.warning("foo")
    # __main__ is a special module name referring to the top-level code environment, see:
    # https://docs.python.org/3/library/__main__.html
    WARNING:__main__:foo

Since all of the code for socrata/socrata_upload.py lives in one module, this feature doesn't matter much, since we can be confident that any logs emitted by the root logger are coming from this script. If we ever decide to refactor the code to pull out certain pieces into other modules, however, module-level loggers would help us know which part of the code is emitting the logs that we see.
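To make the refactoring scenario concrete, here is a small sketch of what the output could look like if the code were split into two modules. The logger names `socrata_upload` and `column_checks` are hypothetical stand-ins for what `__name__` would resolve to in each module, and the in-memory stream is only there to keep the example self-contained.

```python
import io
import logging

# Capture log output in memory so the example does not depend on the console.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Attach the handler once, on the root logger; named loggers propagate to it.
logging.getLogger().addHandler(handler)

# Hypothetical module names; in real modules these would be getLogger(__name__).
upload_logger = logging.getLogger("socrata_upload")
columns_logger = logging.getLogger("column_checks")

upload_logger.warning("starting upload")
columns_logger.warning("column mismatch")

print(stream.getvalue())
```

Each line of output carries the name of the logger that emitted it, so the two warnings can be told apart even though they share a single handler.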

wrridgeway · 2025-01-10

Thank you @jeancochrane, fixed!

First commit

2135377

wrridgeway self-assigned this Jan 9, 2025

wrridgeway added 3 commits January 9, 2025 17:26

Switch some errors to warnings

6c72b34

Commenting

fa7e049

Remove needless list

5638db9

wrridgeway changed the title ~~Ensure API upload script throws an error if Athena and Socrata columns mismatch~~ Ensure API upload script throws error if Athena and Socrata columns mismatch Jan 9, 2025

wrridgeway changed the title ~~Ensure API upload script throws error if Athena and Socrata columns mismatch~~ Ensure API upload script throws error/warning if Athena and Socrata columns mismatch Jan 9, 2025

wrridgeway added 3 commits January 9, 2025 17:56

Remove stray comment

aa7d295

Add ndg-httpsclient

739adb2

Limit columns

769365b

wrridgeway marked this pull request as ready for review January 9, 2025 21:06

wrridgeway requested a review from a team as a code owner January 9, 2025 21:06

wrridgeway added 4 commits January 9, 2025 21:15

Make output more useful

860c958

Commenting

45925a9

Change package version

5f2ca83

Remove new package

4f1b391

jeancochrane approved these changes Jan 10, 2025


Improve logging

77a63ad

wrridgeway merged commit f37dd20 into master Jan 10, 2025
7 checks passed

wrridgeway deleted the ensure-bad-column-match-errors branch January 10, 2025 17:28
