Create a scheduled event that scrapes order book data from a public exchange API (implemented here with the Kraken API), compresses it into relevant features, and inserts it into a private database. The scraper runs serverless on AWS Lambda, which makes it virtually free to run.
- Zappa - deploy Python Lambdas and schedule events
- krakenex - API for Kraken exchange
- AWS Lambda - serverless compute service
- AWS RDS - relational database service
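As a rough illustration of what compressing an order book into features could look like, here is a minimal sketch. The function name and the three features (mid price, spread, volume imbalance) are assumptions made for this example, not necessarily the features this scraper stores. The input shape follows the Kraken order book format, where each level is a list of price, volume, and timestamp with price and volume given as strings.

```python
def orderbook_features(book):
    """Reduce one order book snapshot to a few summary numbers.

    `book` is assumed to look like
    {"asks": [[price, volume, timestamp], ...], "bids": [[...], ...]}
    with price and volume given as strings.
    """
    # Convert the string levels to (price, volume) float pairs.
    asks = [(float(level[0]), float(level[1])) for level in book["asks"]]
    bids = [(float(level[0]), float(level[1])) for level in book["bids"]]

    best_ask = min(price for price, _ in asks)
    best_bid = max(price for price, _ in bids)
    ask_volume = sum(volume for _, volume in asks)
    bid_volume = sum(volume for _, volume in bids)

    return {
        "mid_price": (best_ask + best_bid) / 2,
        "spread": best_ask - best_bid,
        # Ranges from -1 (only asks) to +1 (only bids).
        "imbalance": (bid_volume - ask_volume) / (bid_volume + ask_volume),
    }


# Hypothetical snapshot used only to show the input shape.
example_book = {
    "asks": [["101.0", "1.0", 1700000000], ["102.0", "1.0", 1700000000]],
    "bids": [["99.0", "3.0", 1700000000], ["98.0", "1.0", 1700000000]],
}
features = orderbook_features(example_book)
```

With krakenex, a snapshot of this shape can likely be obtained from the "result" entry of krakenex.API().query_public("Depth", {"pair": ...}), which is keyed by pair name.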
Clone this repo to your local machine.
$ git clone https://github.com/hexamax/orderbook-crypto-scraper.git
Zappa requires an active virtual environment to deploy. Either install and activate your own virtual environment or execute the following steps.
$ cd orderbook-crypto-scraper
$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install -r requirements.txt
Write your database credentials into the corresponding fields of the database configuration file located at orderbook-crypto-scraper/db_config.py.
WARNING: The scraper was written, used, and tested with a PostgreSQL database only. For compatibility, make sure you are running a Postgres instance as well. To set up a low-cost RDS Postgres instance on AWS, check out this tutorial.
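For orientation, the sketch below shows one way a set of Postgres credentials might be laid out and turned into a connection URI. The field names (host, port, dbname, user, password) and the build_dsn helper are hypothetical and chosen for this example only; check db_config.py in the repo for the actual field names.

```python
from urllib.parse import quote

# Hypothetical credentials; replace every value with your own.
DB_CONFIG = {
    "host": "localhost",
    "port": 5432,
    "dbname": "orderbook",
    "user": "scraper",
    "password": "change-me",
}


def build_dsn(config):
    """Build a PostgreSQL connection URI from a credentials dict.

    User and password are percent-encoded so that characters such as
    '@' or '/' do not break the URI.
    """
    return "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(
        user=quote(config["user"], safe=""),
        password=quote(config["password"], safe=""),
        host=config["host"],
        port=config["port"],
        dbname=config["dbname"],
    )


dsn = build_dsn(DB_CONFIG)
```

A URI of this form is accepted by common Postgres clients, which makes it a quick way to check the credentials before deploying.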
The zappa_settings.json file was initialized with some sensible defaults and will run fine without additional manipulation. However, here are some easy changes you can make to customize your deploy:
- Specify the rate at which the data scraping event is executed by changing the rate expression located in zappa_settings.json > events > expression.
- Specify the AWS region of your deploy in the aws_region field.
- Specify a custom name for your S3 bucket using the s3_bucket field.
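The three settings above can also be changed programmatically. The following is a minimal sketch, assuming the settings are nested under a stage named scrape_event (the name used in the commands in this README) and that events is a list of entries with an expression field. The helper names, the sample values, and the scraper.handler function path are hypothetical. The rate check follows the AWS rate(value unit) syntax, where the unit is singular only when the value is 1.

```python
import re

RATE_PATTERN = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def is_valid_rate(expression):
    """Return True if `expression` looks like a valid AWS rate expression."""
    match = RATE_PATTERN.match(expression)
    if not match:
        return False
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        return False
    # AWS expects a singular unit for 1 and a plural unit otherwise.
    return unit.endswith("s") == (value != 1)


def customize(settings, stage, expression=None, aws_region=None, s3_bucket=None):
    """Update the chosen fields of one stage in a zappa_settings-style dict."""
    stage_settings = settings[stage]
    if expression is not None:
        if not is_valid_rate(expression):
            raise ValueError("invalid rate expression: " + expression)
        for event in stage_settings.get("events", []):
            event["expression"] = expression
    if aws_region is not None:
        stage_settings["aws_region"] = aws_region
    if s3_bucket is not None:
        stage_settings["s3_bucket"] = s3_bucket
    return settings


# Hypothetical settings mirroring the fields named above.
settings = {
    "scrape_event": {
        "aws_region": "us-east-1",
        "s3_bucket": "zappa-example-bucket",
        "events": [{"function": "scraper.handler", "expression": "rate(1 minute)"}],
    }
}
customize(settings, "scrape_event", expression="rate(5 minutes)", aws_region="eu-west-1")
```

In practice the dict would be read from and written back to zappa_settings.json with the json module; editing the file by hand works just as well.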
Use the following command for the initial deploy only.
$ zappa deploy scrape_event
Zappa prints the deployment information to your terminal and tells you whether the deploy was successful. If it was, your data scraper should now be up and running.
Subsequent deploys are possible by calling zappa update.
$ zappa update scrape_event
If you changed the rate expression in the zappa_settings.json file, you can easily reschedule your scraper.
$ zappa schedule scrape_event
To take the scraper down, use the following command. This will remove the Lambda function.
$ zappa undeploy scrape_event
You can monitor your scraper's AWS CloudWatch logs directly from the console.
$ zappa tail scrape_event