
A simple Scrapy project to scrape course information from unidiscover.


danielchancfa/unidiscover_scraper


unidiscover_scraper

About the Project

A simple project that uses the Python Scrapy framework to scrape university course information from unidiscover, including but not limited to:

  1. course description
  2. salary
  3. employment

The final output is a CSV table, discover_uni(1).csv.

Below is an excerpt of the first 5 rows:

|   | courseidentifier | uniname | uniid | coursename | link | course_name | Study mode | Distance learning | Placement year | Year abroad |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10008071/AAUNDERRADUATE5YEAR/Full-time | AA School of Architecture | 10008071 | MArch Architecture | /course-details/10008071/AAUNDERRADUATE5YEAR/Full-time | MArch Architecture | Full time | Not Available | Not Available | Not Available |
| 1 | 10007783/LV61/Full-time | University of Aberdeen | 10007783 | MA (Hons) Anthropology and History | /course-details/10007783/LV61/Full-time | MA (Hons) Anthropology and History | Full time | Not Available | Not Available | Optional |
| 2 | 10007783/LV65/Full-time | University of Aberdeen | 10007783 | MA (Hons) Anthropology and Philosophy | /course-details/10007783/LV65/Full-time | MA (Hons) Anthropology and Philosophy | Full time | Not Available | Not Available | Optional |
| 3 | 10007783/LR61/Full-time | University of Aberdeen | 10007783 | MA (Hons) Anthropology and French | /course-details/10007783/LR61/Full-time | MA (Hons) Anthropology and French | Full time | Not Available | Not Available | Compulsory |
| 4 | 10007783/LQ65/Full-time | University of Aberdeen | 10007783 | MA (Hons) Anthropology and Gaelic | /course-details/10007783/LQ65/Full-time | MA (Hons) Anthropology and Gaelic | Full time | Not Available | Not Available | Optional |
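In the sample rows, courseidentifier appears to follow the pattern uniid/course code/study mode, and link is that same identifier prefixed with /course-details/. The sketch below splits an identifier along those lines. The pattern is inferred from these five rows only, and the names CourseId, parse_course_identifier and course_link are hypothetical helpers, not part of this repository.

```python
from typing import NamedTuple


class CourseId(NamedTuple):
    """Parts of a courseidentifier such as '10007783/LV61/Full-time'.

    Hypothetical helper: the three-part layout is inferred from sample rows.
    """
    uniid: str        # institution identifier, e.g. '10007783'
    course_code: str  # course code, e.g. 'LV61'
    mode: str         # study mode segment, e.g. 'Full-time'


def parse_course_identifier(identifier: str) -> CourseId:
    """Split a courseidentifier into its three '/'-separated parts."""
    parts = identifier.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected courseidentifier: {identifier!r}")
    return CourseId(*parts)


def course_link(identifier: str) -> str:
    """Rebuild the relative link as it appears in the 'link' column."""
    return "/course-details/" + identifier.strip("/")


if __name__ == "__main__":
    cid = parse_course_identifier("10007783/LV61/Full-time")
    print(cid.uniid, cid.course_code, cid.mode)    # 10007783 LV61 Full-time
    print(course_link("10007783/LV61/Full-time"))  # /course-details/10007783/LV61/Full-time
```

Raising on anything other than three segments makes rows that break the assumed pattern visible instead of silently mis-splitting them.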

Built With

  • Python
  • Scrapy

Getting Started

Prerequisites

  • Python 3.8+
  • Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install scrapy

Usage

scrapy crawl unispider -o course_data_40page.json
  • writes the raw scraped data to course_data_40page.json as a JSON file
  • the JSON file is then converted into a CSV table in the notebook convert json to csv.ipynb
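A minimal standard-library sketch of the JSON-to-CSV step. It is not the code in convert json to csv.ipynb (which may use pandas instead); it assumes the feed is a single JSON array of flat objects, which is what Scrapy's JSON feed exporter writes for -o *.json. The function name json_feed_to_csv is hypothetical.

```python
import csv
import json


def json_feed_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a Scrapy JSON feed (one JSON array of objects) to CSV.

    Hypothetical helper, not the project's notebook code.
    Returns the number of rows written.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    # Collect every key in first-seen order so that items with missing
    # fields still line up under the right column header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # restval fills the cell for any field an item does not have.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

For example, json_feed_to_csv("course_data_40page.json", "discover_uni.csv") would write one CSV row per scraped item. Nested values (lists or dicts) end up as their Python representation, so flatten them first if the spider yields any. Also note that -o appends to an existing file, which leaves a JSON feed invalid after a second crawl; Scrapy 2.4 and later also provide -O to overwrite the file instead.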

Road map

  • Define the output as items in items.py
  • Connect to a database
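A hedged sketch of both road map items together: a dataclass item that could serve as items.py (Scrapy 2.2 and later accept dataclass instances as items) and an item pipeline that stores those items in SQLite. The field names, the class names CourseItem and SQLitePipeline, the courses.db file and the choice of SQLite are all illustrative assumptions; the project does not currently define any of them.

```python
import sqlite3
from dataclasses import asdict, dataclass, fields


@dataclass
class CourseItem:
    """Hypothetical item mirroring some columns of discover_uni(1).csv."""
    courseidentifier: str
    uniname: str
    uniid: str
    coursename: str
    link: str
    study_mode: str = ""
    distance_learning: str = ""
    placement_year: str = ""
    year_abroad: str = ""


class SQLitePipeline:
    """Hypothetical pipeline storing CourseItem objects in SQLite.

    Assumes the spider yields CourseItem instances (asdict needs a dataclass).
    """

    def __init__(self, db_path: str = "courses.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        columns = ", ".join(f.name for f in fields(CourseItem))
        # courseidentifier is unique per course, so it serves as the key.
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS courses ({columns}, "
            "PRIMARY KEY (courseidentifier))"
        )

    def process_item(self, item, spider):
        row = asdict(item)
        placeholders = ", ".join("?" for _ in row)
        # INSERT OR REPLACE keeps a single row per course across re-crawls.
        self.conn.execute(
            f"INSERT OR REPLACE INTO courses ({', '.join(row)}) "
            f"VALUES ({placeholders})",
            list(row.values()),
        )
        return item

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()
```

The pipeline would be enabled through Scrapy's ITEM_PIPELINES setting in settings.py, for example ITEM_PIPELINES = {"unidiscover_scraper.pipelines.SQLitePipeline": 300}, where the module path is a guess at the project layout. Keying the table on courseidentifier means a repeated crawl updates existing rows instead of duplicating them.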
