Use regression to predict the potential success of low-budget films based on film characteristics. What factors drive success, and how do these factors differ when success is defined by revenue versus recognition (Oscar nominations)?
For more information, see my blog post.
movie-scraper.py
scrapes data for all movies (16,100+) on boxofficemojo.com
oscar-scraper.py
scrapes Oscars data from Newsday
movie-predictor-analysis.ipynb
iPython notebook with regression analysis
data/
contains data (pkl files and CPI csv) used in iPython notebook analysis; also the default subdir where pkl files will be saved if scraping scripts are run
presentation/
contains pdf presentation of findings & recommendations
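The CPI csv in data/ suggests that dollar figures are adjusted for inflation before analysis, though that is an assumption here and not something this README states. A minimal sketch of such an adjustment follows; the function name `adjust_for_inflation` and the index values are hypothetical and are not taken from the repository.

```python
def adjust_for_inflation(amount, cpi_then, cpi_target):
    """Convert a nominal dollar amount to target-year dollars using CPI."""
    if cpi_then <= 0:
        raise ValueError("cpi_then must be positive")
    return amount * cpi_target / float(cpi_then)


# Placeholder index values for illustration, not real CPI figures.
adjusted = adjust_for_inflation(1000000, cpi_then=100.0, cpi_target=200.0)
print(adjusted)
```

With a doubled index, the placeholder amount doubles as well. The notebook may handle this step differently.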
$ git clone https://github.com/dianalam/movie-predictor.git
Scripts were written in Python 2.7. You'll need the following modules:
matplotlib >= 1.5.1
numpy >= 1.10.1
pandas >= 0.17.1
python-dateutil >= 2.4.2
scipy >= 0.16.0
seaborn >= 0.6.0
scikit-learn >= 0.17 (imported as sklearn)
statsmodels >= 0.6.1
To install modules, run:
$ pip install <module>
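Before running the scripts, it can help to confirm that every module imports. The following is a minimal sketch and not part of the repository; the helper name `find_missing` is hypothetical. Note that import names can differ from package names: python-dateutil is imported as dateutil.

```python
import importlib

# Import names for the modules listed above (python-dateutil imports as
# "dateutil"). This helper is illustrative and not part of the repo.
REQUIRED = ["matplotlib", "numpy", "pandas", "dateutil",
            "scipy", "seaborn", "sklearn", "statsmodels"]


def find_missing(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `find_missing(REQUIRED)` returns an empty list when everything is installed. The sketch only checks that modules import; it does not verify the minimum versions.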
Scrape boxofficemojo. Note that the script can optionally send a text alert via Twilio once it finishes running. To use this option, pass your Twilio Account SID, Auth Token, and phone number (with +1 at the beginning) as environment variables.
$ TWILIO_SID=<my-account-sid> TWILIO_TOKEN=<my-auth-token> PHONE_NUM=<my-phone-number> python movie-scraper.py
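For reference, environment variables passed this way can be read in Python roughly as follows. This is a sketch under assumptions: the variable names come from the command above, but the function `load_twilio_settings` is hypothetical and how movie-scraper.py actually reads the values is not shown here.

```python
import os


def load_twilio_settings(environ=None):
    """Return the three alert settings as a dict, or None if any is unset.

    The variable names match the command above; the rest is an assumption
    of this sketch, not the script's actual code.
    """
    if environ is None:
        environ = os.environ
    keys = ("TWILIO_SID", "TWILIO_TOKEN", "PHONE_NUM")
    settings = {key: environ.get(key) for key in keys}
    if not all(settings.values()):
        return None  # the alert is optional, so a missing value disables it
    return settings
```

Returning None when a value is missing lets a caller skip the alert without failing the scrape, which fits the alert being optional.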
Scrape Newsday
$ python oscar-scraper.py
View/run regression analysis (cells already executed in file)
$ ipython notebook movie-predictor-analysis.ipynb
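To illustrate the idea of fitting one regression per definition of success, here is a minimal single-variable least-squares sketch in plain Python. It is not the notebook's code: the notebook relies on the libraries listed above, and the helper `fit_line` and all numbers below are made up for illustration, with no relation to the actual data or findings.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up numbers for illustration only; they are not from the scraped data.
budgets = [1.0, 2.0, 3.0, 4.0]        # hypothetical budget, $ millions
revenues = [3.0, 5.0, 7.0, 9.0]       # hypothetical revenue, $ millions
nominations = [2.0, 2.0, 1.0, 1.0]    # hypothetical Oscar nominations

# Fit the same predictor against each definition of success.
rev_intercept, rev_slope = fit_line(budgets, revenues)
nom_intercept, nom_slope = fit_line(budgets, nominations)
print("revenue: intercept=%.2f slope=%.2f" % (rev_intercept, rev_slope))
print("nominations: intercept=%.2f slope=%.2f" % (nom_intercept, nom_slope))
```

Comparing the two fitted slopes is the simplest version of asking whether a factor matters differently for revenue than for recognition.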
Thanks to: