Repository for IS3107 project: JobPulse
- Set the working directory to `/Web_App`
- Run `streamlit run app.py`
- Press `Ctrl + C` to terminate Streamlit
- Open a bash terminal
- Create and activate a virtual environment: `python3 -m venv .venv && source .venv/bin/activate`
- Set `AIRFLOW_HOME`: `export AIRFLOW_HOME=$(pwd)`
- Run Airflow: `airflow standalone`
- Navigate to http://localhost:8070/ for the Airflow web server
- Press `Ctrl + C` to terminate Airflow
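The virtual-environment and `AIRFLOW_HOME` steps above can be combined into one snippet. Note that the variable is assigned without a leading `$`, and command substitution is written `$(pwd)`, not `{pwd}`:

```shell
# Create and activate a virtual environment in the repository root,
# then point Airflow's home directory at the current directory.
python3 -m venv .venv
source .venv/bin/activate
export AIRFLOW_HOME=$(pwd)
echo "$AIRFLOW_HOME"
```

Run `airflow standalone` afterwards; Airflow resolves its configuration and (by default) its `dags/` folder relative to `AIRFLOW_HOME`.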
- `standalone_admin_password.txt` - password for the local Airflow instance
- `user_credentials.txt` - credentials for the LinkedIn account used for web scraping
- `credentials.py` - stores the credentials for the PostgreSQL database
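A minimal sketch of what a `credentials.py` module like this might contain; every name and value below is a hypothetical placeholder, not the repository's actual contents:

```python
# Hypothetical placeholders -- the real credentials.py in this repository
# may use different variable names and values.
DB_HOST = "localhost"
DB_PORT = 5432
DB_NAME = "jobpulse"
DB_USER = "postgres"
DB_PASSWORD = "changeme"

def connection_uri() -> str:
    """Build a PostgreSQL connection URI from the constants above."""
    return f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
```

Keeping credentials in a dedicated module makes them easy to import (`from credentials import connection_uri`) while keeping the file itself out of version control via `.gitignore`.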
- Backup store of older project artifacts
- Scraping code for Indeed website
- Raw data from scraping
- Scraping code for InternSG website
- Raw data from scraping
- Scraping code for JobStreet website
- Raw data from scraping
- Scraping code for LinkedIn website
- Raw data from scraping
- Scraping code for MyCareersFuture website
- Raw data from scraping
- Data and reference code for the Web_App application
- `Jobs_analysis_with_Spacy.ipynb` contains additional exploration of using part-of-speech tagging to identify and match jobs with "skills" tags. The word cloud in the dashboard is derived from here.
- `recc_system.ipynb` contains additional exploration of recommendation-system ML models. The job recommender is derived from here.
- `app.py` is the main file for the web application that presents the downstream application
- `DAG.py` is the DAG file for Apache Airflow
- Modified and cleaned versions of the raw data files
- Final CSV of all combined data sources after pre-processing and cleaning
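The job recommender mentioned for `recc_system.ipynb` above can be sketched as a content-based ranker. The standalone bag-of-words/cosine-similarity version below is an illustrative assumption, not the notebook's actual model (which may use TF-IDF or other ML techniques); `recommend` and `cosine_similarity` are names invented for this sketch:

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(query: str, jobs: dict[str, str], top_n: int = 3) -> list[str]:
    """Rank job titles by how similar their descriptions are to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(
        jobs,
        key=lambda title: cosine_similarity(q, Counter(jobs[title].lower().split())),
        reverse=True,
    )
    return ranked[:top_n]

jobs = {
    "Data Engineer": "build airflow pipelines and postgresql databases",
    "Chef": "prepare food in a restaurant kitchen",
}
recommend("airflow postgresql pipelines", jobs, top_n=1)  # -> ["Data Engineer"]
```

A real pipeline would weight terms by inverse document frequency and lemmatize tokens first, but the ranking structure stays the same.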