The final project is to replicate the pipeline approach on a dataset (or datasets) of your choosing.
The final deliverable will be a web-based data visualization and accompanying description including a summary of the results and the methods used in each step of the process (collection, analysis and visualization).
Written proposal due date: Wednesday, December 7th (or earlier)
Project due date: 5pm on Tuesday, December 20th
Final projects must receive prior approval in the form of a written proposal.
The final deliverable must include all three of the below items:
1. a web-based data visualization with a URL (public or private)
2. a document describing the project, the results, and the technical methods used in each step (collection, analysis and visualization)
3. all code/spreadsheets/datasets used
The materials for items #2 and #3 above should be submitted to your own GitHub repository, which can be created using the link below:
https://classroom.github.com/a/HKj-gRRZ
- The code for your web-based visualization (#1) can be either in the same repository or in a separate public repository. If it is in its own repository, be sure to link to it from the main submission repository.
- Be sure to include the names of everyone who worked on the final project somewhere in the README, etc!
The project is open-ended. The topic and technologies used are up to you. However, it must satisfy at least two of the items below:
- Data is collected through a means more sophisticated than downloading (e.g. scraping, API).
- At least one of the datasets contains more than 1,000,000 rows.
- It combines data collected from 3 or more different sources.
- The analysis of the data is reasonably complex, involving multiple steps (geospatial joins/operations, data shaping, data frame operations, etc).
- You use one of the analysis techniques for urban street networks (e.g., osmnx, pandana), clustering (e.g., scikit-learn), or raster datasets.
- You perform a machine learning analysis with scikit-learn as part of the analysis.
- The webpage includes a significant interactive component (cross-filtering, interactive widgets, etc.).
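To make the first item in the list above more concrete, here is a minimal sketch of collecting a dataset from a paginated API rather than downloading a single file. It is an illustration only, not a required approach: the `limit`/`offset` query parameters, the assumption that each page comes back as a JSON list, and the function names `fetch_json` and `collect_all` are all hypothetical, so check the documentation of the API you actually use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Download one page of JSON from a URL (this makes a real network call)."""
    with urlopen(url) as response:
        return json.load(response)


def collect_all(base_url, page_size=1000, fetch=fetch_json):
    """Request pages until the API returns fewer rows than page_size.

    Assumes (hypothetically) that the API accepts `limit` and `offset`
    query parameters and returns each page as a JSON list of records.
    """
    rows, offset = [], 0
    while True:
        query = urlencode({"limit": page_size, "offset": offset})
        page = fetch(f"{base_url}?{query}")
        rows.extend(page)
        # A short (or empty) page means there is nothing left to request.
        if len(page) < page_size:
            return rows
        offset += page_size
```

Passing the `fetch` function as an argument is a design choice that lets you try out the paging logic with a stand-in function before pointing it at a live API.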
As a rough guideline, you should shoot for something that is 3-4 times as involved as the required assignments.
Group projects are permitted, with a maximum of 3 members per group. You are also permitted to combine this assignment with one you are working on for another course. But keep in mind that if you choose either of these options, the expectations for the project's scope will be adjusted accordingly.
If you combine this assignment with one from another course, the portion that you are submitting for this final project must be a clearly defined addition to the original project. In such a case, you will be graded only on the portion submitted for this course, not on the entire project.
The final project is worth 40% of the final grade and will be graded on four criteria:
- Concept: Is it sufficiently complex/challenging/sophisticated? Is the final product useful/interesting/creative?
- Technical implementation: Was it well thought out? Was each step done correctly? Does it work as described? Is it consistent with the proposal?
- Visualization: How well does the data visualization serve its purpose? Does it tell a clear story? Are the colors/layout/titles well-chosen?
- Writeup: Is all of the above explained clearly? The writeup should be a multi-page document that explains in depth all aspects of the project's implementation as well as the final results.
To have a better sense of what will be possible for the interactive, web-based component of the final project, here are a few links with public examples.
Note: We'll cover both of these examples in much more detail in the next couple of weeks.
Option 1: Use the Panel python library
- Example gallery: https://panel.pyviz.org/gallery/index.html
- Other demos from a Panel developer: https://jsignell.github.io/
Option 2: Use GitHub Pages to display charts in a more blog-oriented format. Here are a couple of demo examples from last semester:
- https://musa-550-fall-2021.github.io/github-pages-starter/
- https://musa-550-fall-2021.github.io/github-pages-single-page-starter/
- An Analysis System for Taxi Data: A series of planning, visualization and prediction tools around taxi ridership.
- Hospitality in the Era of Airbnb