MUSA 620 - Geospatial Data Science in Python
University of Pennsylvania, Stuart Weitzman School of Design
Thursdays from 5pm to 8pm in Meyerson Hall, room B4.
- Instructor: Nick Hand, [email protected]
- Teaching Assistant: Chloe Sheen, [email protected]
Nick:
- Mondays, 6-8pm
- Remotely via Google Hangouts ([email protected]).
- Easiest by appointment, so please send me an email if you'd like to chat.
Chloe:
- Tuesdays, 2-4pm
- On campus in EC 226
- GitHub: https://github.com/MUSA-620-Fall-2019
- Piazza: https://piazza.com/upenn/fall2019/musa620/home
This course will provide students with the knowledge and tools to turn data into meaningful insights, with a focus on real-world case studies in the urban planning and public policy realm. Focusing on the latest Python software tools, the course will outline the “pipeline” approach to data science. It will teach students the tools to gather, visualize, and analyze datasets, providing the skills to effectively explore large datasets and transform results into understandable and compelling narratives. The course is organized into five main sections:
- Exploratory Data Science: Students will be introduced to the main tools needed to get started analyzing and visualizing data using Python.
- Introduction to Geospatial Data Science: Building on the previous set of tools, this module will teach students how to work with geospatial datasets using a range of modern Python toolkits.
- Data Ingestion & Big Data: Students will learn how to collect new data through web scraping and APIs, as well as how to work effectively with the large datasets often encountered in real-world applications.
- Geospatial Data Science in the Wild: Armed with the necessary data science tools, students will be introduced to a range of advanced analytic and machine learning techniques using a number of innovative examples from modern researchers.
- From Exploration to Storytelling: The final module will teach students to present their analysis results using web-based formats to transform their insights into interactive stories.
The course will be conducted in weekly sessions devoted to lectures, interactive demonstrations, and in-class labs.
There is one required final project at the end of the semester, and you must complete five of the seven homework assignments. Four of the assignments are required, and you are allowed to choose the last assignment to complete (out of the remaining three options). The required assignments are denoted by asterisks below.
For the final project, students will replicate the pipeline approach on a dataset (or datasets) of their choosing. Students will be required to use several of the analysis techniques taught in the class and produce a web-based data visualization that effectively communicates the empirical results to a non-technical audience. The final product should also include a description of the methods used in each step of the data science process (collection, analysis, and visualization).
For more details on the final project, see its repository.
The grading breakdown is as follows: 50% for homework, 40% for the final project, and 10% for participation. Your participation grade is based on both in-class participation and Piazza participation.
Of the seven homework assignments, you must complete five; four are required (denoted by an asterisk below). Late homework will be accepted but penalized.
This course relies on Python and a variety of related packages, including packages for geospatial analysis. All software is open-source and freely available.
Students are expected to be familiar with and comply with Penn’s Code of Academic Integrity, which is available in the Pennbook, or online at https://catalog.upenn.edu/pennbook/code-of-academic-integrity.
* Denotes a required homework assignment.
Homework assignment dates are tentative and subject to change.