
BACKEND: Make scraped hearing data available on web (push to database with code) #11

Open · bbrewington opened this issue Jan 18, 2022 · 4 comments

Labels:
  skill-python: Involves tech skill: Python (web scraping, APIs, etc.)
  skill-web-dev-backend: Involves tech skill: back-end web development
Milestone: version-2

Comments

@bbrewington (Member)

No description provided.

bbrewington added the skill-python and skill-web-dev-backend labels on Jan 18, 2022
bbrewington added this to the version-2 milestone on Jan 18, 2022
bbrewington changed the title from "Make hearing data available on web (automated push to database)" to "Make scraped hearing data available on web (push to database with code)" on Jan 18, 2022
bbrewington changed the title to "Feature: Make scraped hearing data available on web (push to database with code)" on Jan 18, 2022
bbrewington changed the title to "BACKEND: Make scraped hearing data available on web (push to database with code)" on Jan 18, 2022
@BennyJW (Contributor) commented Jan 19, 2022

Has there been any discussion of how to do this? One option is to deploy the scraper to Google Cloud Run, with a scheduler triggering it maybe once a day. Load the resulting CSV into Cloud Storage, then load it into Google BigQuery. This would be serverless and easily within the Google free tier.

And GitHub Actions could trigger a rebuild of the Cloud Run Docker image.
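
To make that concrete, here is a minimal sketch of the daily job in Python. The bucket, dataset, and table names are placeholders, and `rows` stands in for whatever the scraper produces:

```python
import csv
import datetime

from google.cloud import bigquery, storage


def run_daily_export(rows: list[dict]) -> None:
    """Scraper output -> CSV -> Cloud Storage -> BigQuery (all names hypothetical)."""
    today = datetime.date.today().isoformat()
    local_path = f"/tmp/hearings_{today}.csv"
    gcs_path = f"raw/hearings_{today}.csv"

    # 1. Write the scraped rows out as CSV
    with open(local_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

    # 2. Stage the CSV in Cloud Storage (bucket name is an assumption)
    bucket = storage.Client().bucket("hearing-scraper-staging")
    bucket.blob(gcs_path).upload_from_filename(local_path)

    # 3. Load from Cloud Storage into BigQuery, appending to a raw table
    job = bigquery.Client().load_table_from_uri(
        f"gs://hearing-scraper-staging/{gcs_path}",
        "my-project.hearings.raw_daily",  # placeholder table ID
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        ),
    )
    job.result()  # block until the load job finishes
```

Cloud Scheduler would then hit the Cloud Run endpoint that invokes this once a day.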

@bbrewington (Member, Author)

@BennyJW I think once per day would be a good start. These are solid ideas; any thoughts on picking between the options below? Feel free to spin up a test, and maybe share details here in the comments if you take a stab at it.

Options:

  1. Scraper --> CSV --> Cloud Storage --> BigQuery
  2. Scraper --> JSON --> Firebase

I might be conflating phone-number storage / API calls with scraped-data storage / API calls.
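
If option 2 wins out, here is a minimal sketch assuming Firestore as the Firebase database; the collection name and the `case_id` key are hypothetical:

```python
import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()  # picks up default credentials on GCP
db = firestore.client()


def push_hearings(hearings: list[dict]) -> None:
    """Write scraped hearings to a Firestore collection in one batch."""
    batch = db.batch()
    for hearing in hearings:
        # Deterministic doc IDs mean re-running the scraper overwrites
        # instead of duplicating. Note: Firestore batches cap at 500 writes.
        doc_ref = db.collection("hearings").document(str(hearing["case_id"]))
        batch.set(doc_ref, hearing)
    batch.commit()
```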

@BennyJW (Contributor) commented Jan 25, 2022

@bbrewington These are good options. My inclination would be option 1, because then we can upload the daily raw data to BigQuery and use SQL to process/extract the data we need, particularly if, as we go, we decide we need data from other sources.
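
To illustrate, once the raw daily loads land in BigQuery, the extraction step is just a query; table and column names here are assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT case_id, hearing_date, courtroom
    FROM `my-project.hearings.raw_daily`  -- placeholder table
    WHERE hearing_date >= CURRENT_DATE()
    -- keep only the most recently scraped row per case
    QUALIFY ROW_NUMBER() OVER (PARTITION BY case_id ORDER BY scraped_at DESC) = 1
"""
for row in client.query(sql).result():
    print(row.case_id, row.hearing_date, row.courtroom)
```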

@abrie (Contributor) commented Feb 6, 2022

See PR #26, which enables an implementation of option 1, but without the Cloud Storage step.
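
For reference, skipping the Cloud Storage step could look roughly like the following, loading the scraper's DataFrame straight into BigQuery; the table name is a placeholder, and the actual PR may differ in detail:

```python
import pandas as pd
from google.cloud import bigquery


def load_direct(df: pd.DataFrame) -> None:
    """Load scraped rows into BigQuery without staging in Cloud Storage."""
    client = bigquery.Client()
    # Requires pyarrow; the schema is inferred from the DataFrame dtypes
    job = client.load_table_from_dataframe(df, "my-project.hearings.raw_daily")
    job.result()  # block until the load completes
```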
