The TLC releases monthly trip record data, including fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data can be found here: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Using data from 2016, please build a model that would help a prospective employee determine the best place to live in 2017 based on the following:
- The employee is moving to New York to start a new job at 1 Irving Pl, New York, NY 10003
- The employee wants to live in one of the following areas: a) within 2 blocks of Lincoln Center, b) Sutton Place, or c) Two Bridges.
- The employee prefers an efficient commute (she doesn't like to ride her bike or take the subway). Her employer pays for yellow cab rides.
- She aims to get into work before 9:00 AM, and leave around 6:00 PM.
- (Bonus) Factor in real estate prices. Her budget is around 2500 USD.
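One possible starting point for the commute constraints above is to reduce the trip records to weekday trips that fall inside the commute windows. The sketch below is only an illustration: the timestamp format, the 7:00 AM earliest pick-up, and the 5:30 PM to 6:30 PM evening window are assumptions, not part of the brief, and the helper names are hypothetical. Matching trips to the three candidate areas and to 1 Irving Pl is left out.

```python
from datetime import datetime, time

# Assumed timestamp format; check it against the actual trip record files.
FMT = "%Y-%m-%d %H:%M:%S"

# Assumed commute windows (only "before 9:00 AM" and "around 6:00 PM" come from the brief).
MORNING_EARLIEST_PICKUP = time(7, 0)
MORNING_LATEST_DROPOFF = time(9, 0)
EVENING_START = time(17, 30)
EVENING_END = time(18, 30)


def trip_minutes(pickup: str, dropoff: str) -> float:
    """Trip duration in minutes, computed from pick-up and drop-off timestamps."""
    start = datetime.strptime(pickup, FMT)
    end = datetime.strptime(dropoff, FMT)
    return (end - start).total_seconds() / 60.0


def is_morning_commute(pickup: str, dropoff: str) -> bool:
    """True for a weekday trip that starts after 7:00 AM and ends by 9:00 AM the same day."""
    start = datetime.strptime(pickup, FMT)
    end = datetime.strptime(dropoff, FMT)
    return (
        start.weekday() < 5  # Monday=0 ... Friday=4
        and start.date() == end.date()
        and start.time() >= MORNING_EARLIEST_PICKUP
        and end.time() <= MORNING_LATEST_DROPOFF
    )


def is_evening_commute(pickup: str) -> bool:
    """True for a weekday trip whose pick-up falls in the assumed evening window."""
    start = datetime.strptime(pickup, FMT)
    return start.weekday() < 5 and EVENING_START <= start.time() <= EVENING_END
```

Whatever windows you choose, state them in your report, since they directly shape the commute-time estimates.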
- A data science model that predicts which of her three preferred areas gives the employee the most efficient commute, based on her commute times.
- Clear assumptions on how efficiency is defined.
- Visualization of the sample data and of the method used to compute commute times. Which statistical methods did you use? Be sure to document your assumptions and thought process.
- A report that details your process of experimenting and building the above.
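As an illustration of what a clearly stated efficiency definition might look like, the sketch below scores each area by its median trip duration plus a penalty for unreliability (the gap between the 90th percentile and the median). This is one assumed definition among many, not the expected answer; the function names and the `reliability_weight` parameter are hypothetical.

```python
from statistics import median, quantiles


def summarize(durations_min):
    """Return (median, 90th percentile) of trip durations in minutes.

    Requires at least two observations.
    """
    # quantiles(..., n=10) returns the 9 decile cut points; the last is the 90th percentile.
    p90 = quantiles(durations_min, n=10, method="inclusive")[-1]
    return median(durations_min), p90


def efficiency_score(durations_min, reliability_weight=0.5):
    """Lower is better: typical duration plus a penalty for slow outlier trips.

    reliability_weight is an assumed tuning knob, not something the brief defines.
    """
    med, p90 = summarize(durations_min)
    return med + reliability_weight * (p90 - med)


def best_area(durations_by_area):
    """Return the area name whose trip durations give the lowest efficiency score."""
    return min(
        durations_by_area,
        key=lambda area: efficiency_score(durations_by_area[area]),
    )
```

Here `durations_by_area` would map each candidate area to the list of commute durations observed for it. A definition like this one weighs a predictable commute as well as a short one; a different choice (mean duration, fare, distance) is equally acceptable as long as it is documented.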
In your report, be sure to include answers to the following:
- Where should the candidate live and why? What's the commute time from that location?
- Which data science problem are you tackling?
- Which features do you find more relevant? Why?
- Which subset of the data are you using? Which (if any) sampling methods did you apply?
- We are very interested in your thought process, assumptions, and design decisions. Please document them in your report.
- The time limit for this challenge is 72 hours. You can use whichever programming languages or stack you feel most comfortable with.
- Please submit a PR to this repository, with the code that you have produced and the report on your process.
- Your solution should be functional, and we should be able to reproduce the results in your report.
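If you sample the data, a fixed random seed helps keep the results reproducible. The sketch below shows one option, reservoir sampling (Algorithm R), which draws a uniform sample of `k` rows from a stream without loading the whole file into memory. It is offered only as an example of documenting a sampling method; the function name and the default seed are hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=2016):
    """Return a uniform random sample of up to k items from an iterable.

    A fixed seed makes the sample identical across runs on the same input.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(row)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

For example, `reservoir_sample(csv.DictReader(f), 100_000)` would sample rows from an open CSV file `f` in a single pass, assuming `csv` has been imported.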
As data scientists, an invaluable part of our skill set is knowing how to effectively Google our problems and bugs. As such, it is OK for you to use resources on the Internet for this challenge. We only ask you to refrain from doing two things:
- Copying and pasting code samples from the Internet and presenting them as your own work. This would be considered plagiarism and disqualify you immediately.
- Googling anything specific to this dataset. Please treat the dataset as if it is novel and unique to you.
VERY IMPORTANT: Don't hesitate to contact us along the way and update us on your progress, so we can provide feedback on your direction.