Repository for IS3107 project: JobPulse
- Set the working directory to `/Web_App`
- Run `streamlit run app.py`
- Press `Ctrl + C` to terminate Streamlit
- Open a bash terminal
- Create and activate a virtual environment: `python3 -m venv .venv && source .venv/bin/activate`
- Set `AIRFLOW_HOME`: `export AIRFLOW_HOME=$(pwd)`
- Run Airflow: `airflow standalone`
- Navigate to http://localhost:8070/ for the Airflow web server
- Press `Ctrl + C` to terminate Airflow
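The virtual-environment and `AIRFLOW_HOME` steps above can be combined into one snippet. Note that the variable is assigned without a leading `$`, and command substitution is written `$(pwd)`, not `{pwd}`:

```shell
# Create and activate a virtual environment in the repository root,
# then point Airflow's home directory at the current directory.
python3 -m venv .venv
source .venv/bin/activate
export AIRFLOW_HOME=$(pwd)
echo "$AIRFLOW_HOME"
```

Run `airflow standalone` afterwards; Airflow resolves its configuration and (by default) its `dags/` folder relative to `AIRFLOW_HOME`.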
- `standalone_admin_password.txt` - password for the local Airflow instance
- `user_credentials.txt` - credentials for the LinkedIn account used for web scraping
- `credentials.py` - stores the credentials for the PostgreSQL database
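A minimal sketch of what a `credentials.py` module like this might contain; every name and value below is a hypothetical placeholder, not the repository's actual contents:

```python
# Hypothetical placeholders -- the real credentials.py in this repository
# may use different variable names and values.
DB_HOST = "localhost"
DB_PORT = 5432
DB_NAME = "jobpulse"
DB_USER = "postgres"
DB_PASSWORD = "changeme"

def connection_uri() -> str:
    """Build a PostgreSQL connection URI from the constants above."""
    return f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
```

Keeping credentials in a dedicated module makes them easy to import (`from credentials import connection_uri`) while keeping the file itself out of version control via `.gitignore`.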
- Backup store of older project artifacts
- Scraping code for Indeed website
- Raw data from scraping
- Scraping code for InternSG website
- Raw data from scraping
- Scraping code for JobStreet website
- Raw data from scraping
- Scraping code for LinkedIn website
- Raw data from scraping
- Scraping code for MyCareersFuture website
- Raw data from scraping
- Data and reference code for the Web_App application
- `Jobs_analysis_with_Spacy.ipynb` contains additional exploration of using part-of-speech tagging to identify and match jobs with "skills" tags. The word cloud in the dashboard is derived from here.
- `recc_system.ipynb` contains additional exploration of recommendation-system ML models. The job recommender is derived from here.
- `app.py` is the main file for the web application that presents the downstream application
- `DAG.py` is the DAG file for Apache Airflow
- Modified and cleaned versions of the raw data files
- Final CSV of all combined data sources after pre-processing and cleaning
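The job recommender mentioned for `recc_system.ipynb` above can be sketched as a content-based ranker. The standalone bag-of-words/cosine-similarity version below is an illustrative assumption, not the notebook's actual model (which may use TF-IDF or other ML techniques); `recommend` and `cosine_similarity` are names invented for this sketch:

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(query: str, jobs: dict[str, str], top_n: int = 3) -> list[str]:
    """Rank job titles by how similar their descriptions are to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(
        jobs,
        key=lambda title: cosine_similarity(q, Counter(jobs[title].lower().split())),
        reverse=True,
    )
    return ranked[:top_n]

jobs = {
    "Data Engineer": "build airflow pipelines and postgresql databases",
    "Chef": "prepare food in a restaurant kitchen",
}
recommend("airflow postgresql pipelines", jobs, top_n=1)  # -> ["Data Engineer"]
```

A real pipeline would weight terms by inverse document frequency and lemmatize tokens first, but the ranking structure stays the same.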