This repository has been archived by the owner on May 7, 2018. It is now read-only.
forked from jo-tham/datascibowl

Notes and scripts for kaggle datascibowl

cortexml/datascibowl

Project Structure

Ideally you're using virtualenvwrapper; make a new project and clone this repo into the venv's project directory.

mkproject datascibowl -p /usr/bin/python2.7

This has only been tested with Python 2.7, so using it as the venv's Python is recommended.
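Because only Python 2.7 has been tested, a notebook can warn early when it runs under a different interpreter. This is a minimal sketch; check_python_version is a hypothetical helper, not part of this repo, and it is written to run under both Python 2.7 and 3.

```python
import sys
import warnings


def check_python_version(tested=(2, 7)):
    """Warn if the running interpreter is not the tested major.minor version.

    Hypothetical helper: returns True on a match, False (plus a warning)
    otherwise.
    """
    running = tuple(sys.version_info[:2])
    if running != tested:
        # tested + running concatenates the tuples, e.g. (2, 7, 3, 11)
        warnings.warn("Tested only with Python %d.%d; running %d.%d"
                      % (tested + running))
        return False
    return True
```

Calling check_python_version() in the first cell of a notebook is enough; it does not stop execution, it only warns.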

The contents of this project are something like this:

ls -l
total 252
drwxr-xr-x 4 jotham users   4096 Feb 24 07:01 data
-rw-r--r-- 1 jotham users    117 Feb 22 17:10 data_download.txt
drwxr-xr-x 3 jotham users   4096 Mar  4 20:09 ipynb_tests
drwxr-xr-x 3 jotham users   4096 Mar  3 22:01 ipynb_work
-rw-r--r-- 1 jotham users 185519 Mar  3 22:10 notes.org
drwxr-xr-x 8 jotham users   4096 Feb 22 17:33 pylearn2_fork
-rw-r--r-- 1 jotham users    566 Feb 22 17:32 requirements.txt
-rw-r--r-- 1 jotham users    613 Feb 22 17:58 theano_gpu_test.py
-rw-r--r-- 1 jotham users   1009 Feb 22 20:35 theano_gpu_test.pyc
-rw-r--r-- 1 jotham users  31011 Feb 23 20:59 theano_test.txt

ipynb_tests has some gross hackery and early model tests.

ipynb_work has scripts for fitting a multilayer perceptron and a convolutional neural network; these should run on their own.

The data directory should contain the data you want to use with pylearn2, e.g.:

find ./data -maxdepth 2 -type d -ls
524382    4 drwxr-xr-x   4 jotham   users        4096 Feb 24 07:01 ./data
1183533    4 drwxr-xr-x   4 jotham   users        4096 Feb 24 07:02 ./data/datascibowl
1451882    4 drwxr-xr-x 123 jotham   users        4096 Feb 22 17:25 ./data/datascibowl/train
1315239 3864 drwxr-xr-x   2 jotham   users     3956736 Feb 22 17:23 ./data/datascibowl/test
1050702    4 drwxr-xr-x   2 jotham   users        4096 Feb 22 17:12 ./data/mnist
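Before fitting anything, it can help to confirm the layout matches. The sketch below counts the class folders under train and the images in test. It assumes one subdirectory per class under datascibowl/train, a flat datascibowl/test, and image files ending in .jpg; summarize_datascibowl is a hypothetical helper, not part of pylearn2 or this repo.

```python
import os


def summarize_datascibowl(data_path, ext=".jpg"):
    """Count class folders and images in a datascibowl data directory.

    Hypothetical helper. Assumes data_path/datascibowl/train/<class>/*.jpg
    and data_path/datascibowl/test/*.jpg.
    """
    root = os.path.join(data_path, "datascibowl")
    train_dir = os.path.join(root, "train")
    test_dir = os.path.join(root, "test")

    # One subdirectory per class; count the image files inside each.
    per_class = {}
    for name in sorted(os.listdir(train_dir)):
        class_dir = os.path.join(train_dir, name)
        if os.path.isdir(class_dir):
            per_class[name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith(ext))

    # The test directory is flat: unlabeled images only.
    n_test = sum(1 for f in os.listdir(test_dir) if f.lower().endswith(ext))
    return per_class, n_test


if __name__ == "__main__":
    data_path = os.environ.get("PYLEARN2_DATA_PATH")
    if data_path and os.path.isdir(os.path.join(data_path, "datascibowl")):
        per_class, n_test = summarize_datascibowl(data_path)
        print("%d classes, %d training images, %d test images"
              % (len(per_class), sum(per_class.values()), n_test))
```

On typical Linux filesystems a directory's link count is 2 plus its number of subdirectories, so the 123 links on ./data/datascibowl/train above are consistent with 121 class folders.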

Put the data in there as shown above. The datascibowl dataset class requires the PYLEARN2_DATA_PATH environment variable. Set it however you like; I set it in the IPython notebooks with:

import os
os.environ["PYLEARN2_DATA_PATH"] = "/home/jotham/projects/2014-12-20_datascibowl/data"

Alternatively, you could export it from a virtualenvwrapper hook, such as the venv's postactivate script.
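Rather than hard-coding an absolute path in every notebook, a helper along these lines keeps any value already in the environment (for example one exported by a venv hook), falls back to a default otherwise, and fails early if the directory is missing. It is a sketch: set_data_path and the fallback path are hypothetical, and it should run before the datascibowl dataset class is used.

```python
import os


def set_data_path(default):
    """Ensure PYLEARN2_DATA_PATH points at an existing directory.

    Hypothetical helper: an existing environment value wins over `default`.
    The environment is only updated once the path has been validated.
    """
    path = os.path.expanduser(os.environ.get("PYLEARN2_DATA_PATH", default))
    if not os.path.isdir(path):
        raise IOError("PYLEARN2_DATA_PATH does not exist: %s" % path)
    os.environ["PYLEARN2_DATA_PATH"] = path
    return path
```

In a notebook this would be, e.g., set_data_path("~/projects/datascibowl/data"), with the argument adjusted to your own layout.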

Project and pylearn setup

Required packages are in requirements.txt.

cat requirements.txt
Cython==0.22
Jinja2==2.7.3
MarkupSafe==0.23
Pillow==2.7.0
PyYAML==3.11
Theano==0.6.0
argparse==1.3.0
backports.ssl-match-hostname==3.4.0.2
certifi==14.05.14
ipython==2.3.1
matplotlib==1.4.2
mock==1.0.1
nose==1.3.4
numexpr==2.4
numpy==1.9.1
pandas==0.15.2
pyaml==14.12.10
-e git+git@github.com:jo-tham/pylearn2.git@6a7d018b4c388617df57244e7df0b825839a1329#egg=pylearn2-origin/datascibowl
pyparsing==2.0.3
python-dateutil==2.4.0
pytz==2014.10
pyzmq==14.4.1
scikit-image==0.10.1
scikit-learn==0.15.2
scipy==0.14.1
six==1.9.0
tables==3.1.1
tornado==4.0.2
wsgiref==0.1.2

If you want to contribute to the pylearn2 fork, remove it from requirements.txt and install it as described below. Install everything else into the venv:

pip install -r requirements.txt
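Removing the fork from requirements.txt can be scripted instead of hand-edited. A minimal sketch, assuming the only line that mentions pylearn2 is the editable -e entry (true of the file listed above); the helper names and the output filename requirements-no-pylearn2.txt are hypothetical.

```python
def drop_requirement(lines, name="pylearn2"):
    """Return requirement lines, skipping any line that mentions `name`."""
    return [line for line in lines if name not in line]


def write_filtered(src="requirements.txt",
                   dst="requirements-no-pylearn2.txt",
                   name="pylearn2"):
    """Copy src to dst without the lines for `name`; return dst."""
    with open(src) as f:
        lines = f.readlines()
    with open(dst, "w") as f:
        f.writelines(drop_requirement(lines, name))
    return dst
```

The rest of the packages would then be installed with pip install -r requirements-no-pylearn2.txt. The plain substring match is deliberately simple; it would also drop any other line containing the name.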

Clone the pylearn2 fork at jo-tham/pylearn2, check out the datascibowl branch, and install it into the active venv with:

git clone https://github.com/jo-tham/pylearn2.git
cd pylearn2
git checkout datascibowl
python setup.py develop
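To confirm that the develop install took effect, check where Python imports pylearn2 from: after python setup.py develop it should resolve to the package directory inside your clone rather than to the venv's site-packages. module_dir is a hypothetical helper, shown as a sketch.

```python
import importlib
import os


def module_dir(name):
    """Return the directory that module `name` is imported from."""
    module = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(module.__file__))
```

Running module_dir("pylearn2") in the venv should then print a path ending in pylearn2/pylearn2 under wherever you cloned the fork.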

Try running the contents of the ipython notebooks in ipynb_work.
