Update extract-chicago-permits workflow to query Athena and output Excel workbooks #13

@jeancochrane jeancochrane commented Dec 11, 2023

This PR updates the extract-chicago-permits workflow to enable the changes made in #11, namely:

  • Updating the IAM role to allow querying data to check for invalid PINs
  • Updating dependencies and workflow commands to allow outputting data as Excel workbooks rather than CSVs
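The invalid-PIN check mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the `pin` field, and the assumption that a valid PIN is 14 digits are all hypothetical, and a plain Python set stands in for the PINs that would be pulled from Athena.

```python
from typing import Iterable


def normalize_pin(raw: str) -> str:
    """Keep only digits so dashed and undashed PINs compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


def split_by_pin_validity(
    permits: Iterable[dict], known_pins: set[str]
) -> tuple[list[dict], list[dict]]:
    """Partition permits into (valid, invalid) based on their PIN.

    Assumption: a PIN is valid when it has 14 digits and appears in
    `known_pins`, which stands in for the result of an Athena query.
    """
    valid, invalid = [], []
    for permit in permits:
        pin = normalize_pin(permit.get("pin", ""))
        if len(pin) == 14 and pin in known_pins:
            valid.append(permit)
        else:
            invalid.append(permit)
    return valid, invalid


# Hypothetical sample data for illustration only.
known = {"17091234560000"}
permits = [
    {"permit": "A", "pin": "17-09-123-456-0000"},  # known PIN
    {"permit": "B", "pin": "17-09-123-456-9999"},  # not in the known set
    {"permit": "C", "pin": ""},                    # missing PIN
]
valid, invalid = split_by_pin_validity(permits, known)
```

Returning both partitions, rather than dropping invalid rows, keeps the flagged permits available for review.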

Example successful workflow: https://github.com/ccao-data/extract-permits/actions/runs/7170976334/job/19525652232

Closes #12.

@@ -2,7 +2,7 @@
Chicago Permit Ingest Process - Automation

This script automates the current process for cleaning permit data from the Chicago Data Portal's Building Permits table
and preparing it for upload to iasWorld via SmartFile. This involves fetching the data, cleaning up certain fields,
@jeancochrane

The changes in this file exclusively remove stray whitespace, which you can confirm by hiding whitespace changes in the diff. I introduced them accidentally by opening and saving the file in my editor, but removing the stray whitespace seems worthwhile, so I left the diff in.

@jeancochrane jeancochrane marked this pull request as ready for review December 11, 2023 17:57
@jeancochrane jeancochrane requested a review from dfsnow December 11, 2023 17:57

@dfsnow dfsnow left a comment

question (blocking): @jeancochrane Do you expect this run to output valid permits as well as invalid ones? Considering we're not diffing yet, I'm surprised that all the permits specified are invalid.

@jeancochrane

Hmm that's pretty weird @dfsnow, I'll take a look!

@jeancochrane

Good catch @dfsnow, I identified and fixed a couple bugs in 78702bc and the output is looking more reasonable:

Pulling PINs from Athena
# rows ready for upload:  4643
# rows flagged for length:  5
# rows flagged for empty/invalid fields:  1282
creating 24 xlsx files ready for SmartFile upload
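For readers wondering how a set of upload-ready rows might end up spread across several workbooks, here is a rough sketch of fixed-size chunking. The chunk size, the file-name pattern, and the function names are hypothetical and are not taken from the workflow. The sketch only plans which rows go in which file. Writing the workbooks would need an Excel-capable library, which is omitted here.

```python
from typing import Iterator, Sequence


def chunk_rows(rows: Sequence[dict], chunk_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of `rows`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def plan_workbooks(
    rows: Sequence[dict], chunk_size: int, prefix: str
) -> list[tuple[str, Sequence[dict]]]:
    """Pair each chunk with a numbered .xlsx file name (naming is hypothetical)."""
    return [
        (f"{prefix}_{index:03d}.xlsx", chunk)
        for index, chunk in enumerate(chunk_rows(rows, chunk_size), start=1)
    ]


# Hypothetical example: 450 rows with a chunk size of 200.
rows = [{"row": i} for i in range(450)]
plan = plan_workbooks(rows, chunk_size=200, prefix="ready_for_upload")
for name, chunk in plan:
    print(name, len(chunk))
```

Separating the planning step from the file writing makes the chunk boundaries easy to check without touching the filesystem.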

Workflow run logs: https://github.com/ccao-data/extract-permits/actions/runs/7172832443/job/19530866528

@jeancochrane jeancochrane merged commit f53dfa2 into main Dec 12, 2023
1 check passed
@jeancochrane jeancochrane deleted the jeancochrane/12-update-chicago-extraction-workflow-to-allow-pulling-data-from-athena branch December 12, 2023 22:41
Development

Successfully merging this pull request may close these issues.

Update Chicago extraction workflow to allow pulling data from Athena