We are using Supabase to host a PostgreSQL database. Rather than using Supabase's Python client library, we use the SQLAlchemy ORM to insert into and query the database; this lets us move to a different host when the time comes.

The database credentials are in a shared Google Drive. Contact Tung, Wilson, or Tony for access (read+write).
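As a minimal sketch of what the SQLAlchemy setup looks like (the connection string, table name, and columns below are illustrative assumptions, not the project's actual schema):

```python
from sqlalchemy import create_engine, Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, Session

# Supabase exposes a standard PostgreSQL connection string; the real
# credentials live in the shared Google Drive mentioned above.
engine = create_engine("postgresql://user:password@db.example.supabase.co:5432/postgres")

Base = declarative_base()

class ParkingCode(Base):
    # Hypothetical model for illustration; the actual tables may differ.
    __tablename__ = "parking_codes"
    id = Column(Integer, primary_key=True)
    state = Column(String)
    municipality = Column(String)
    code_url = Column(Text)

Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(ParkingCode(state="CA", municipality="Davis", code_url="https://..."))
    session.commit()
    rows = session.query(ParkingCode).filter_by(state="CA").all()
```

Because the spiders only talk to SQLAlchemy, swapping Supabase for another PostgreSQL host should only require changing the connection string.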
From the top-level `web_scraping` folder (where `ls` returns `scrapy.cfg` and an inner `web_scraping` folder), run:

```
scrapy crawl [spider name]
```

For example:

```
scrapy crawl searchspider
```
Depending on the spider, a `.json` file is created or overwritten in `web_crawling/jsons`.
Currently:
- `statespider` --> `states.json` (state, url)
- `munispider` --> `municipalities.json` (state, municipality, url)
- `searchspider` --> `parking_code.json` (state, municipality, state_url, parking_code)
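To make the shape concrete, an entry in `parking_code.json` would look roughly like this (the values are made up; the keys follow the fields listed above, and `parking_code` is assumed to hold the URL found by the search):

```json
{
  "state": "California",
  "municipality": "Davis",
  "state_url": "https://library.municode.com/ca",
  "parking_code": "https://library.municode.com/ca/davis/codes/code_of_ordinances"
}
```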
The `searchspider` loops through every entry of `municipalities.json` and follows the URL for each municipality.
Using scrapy-playwright, for each request the spider (see the sketch after this list):
- waits 6 seconds for JS to load
- types a keyword into the search bar
- presses "Enter" key
- waits 6 seconds for the results to load
- sends the results page to `parse_search`, which extracts the URL containing the parking code
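A rough sketch of that flow with scrapy-playwright is below. The search-bar selector, the keyword `"parking"`, and the yielded field names are assumptions for illustration; the project's actual spider may differ.

```python
import json

import scrapy
from scrapy_playwright.page import PageMethod

SEARCH_BAR = "input[type='search']"  # assumed selector; the real one may differ

class SearchSpider(scrapy.Spider):
    """Illustrative skeleton of the flow described above, not the actual code."""
    name = "searchspider"

    def start_requests(self):
        with open("web_crawling/jsons/municipalities.json") as f:
            municipalities = json.load(f)
        for entry in municipalities:
            yield scrapy.Request(
                entry["url"],
                meta={
                    "playwright": True,
                    "playwright_page_methods": [
                        PageMethod("wait_for_timeout", 6000),       # wait 6 s for JS to load
                        PageMethod("fill", SEARCH_BAR, "parking"),  # type the keyword
                        PageMethod("press", SEARCH_BAR, "Enter"),   # submit the search
                        PageMethod("wait_for_timeout", 6000),       # wait 6 s for results
                    ],
                },
                callback=self.parse_search,
                cb_kwargs={"entry": entry},
            )

    def parse_search(self, response, entry):
        # Currently we just take the first link on the results page
        # (see "To resolve" below).
        first_link = response.css("a::attr(href)").get()
        if first_link:
            yield {
                "state": entry["state"],
                "municipality": entry["municipality"],
                "parking_code": response.urljoin(first_link),
            }
```

Running this requires the scrapy-playwright download handler and Twisted reactor to be enabled in `settings.py`, per the scrapy-playwright README.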
To resolve:
- how to find the right link to the parking code (currently we extract the first link)
- municipality URLs that redirect to a site that is not Municode
- keywords that return no results