-
Create a new Google Compute Engine instance from the sdow-web-server instance template, which is configured with the following specs:
- Name: sdow-web-server-#
- Zone: us-central1-c
- Machine Type: e2-micro (2 vCPU, 1 core, 1 GB memory)
- Boot disk: 32 GB SSD, Debian GNU/Linux 12 (bookworm)
- Notes: Click "Set access for each API" and use default values for all APIs except set Storage to "Read Write"
- Firewall: Allow HTTP and HTTPS traffic
- Monitoring: Install Ops Agent for Monitoring and Logging
-
Set the default region and zone for the gcloud CLI:
$ gcloud config set compute/region us-central1
$ gcloud config set compute/zone us-central1-c
-
SSH into the machine:
$ gcloud compute ssh sdow-web-server-# --project=sdow-prod
-
Install required operating system dependencies to run the Flask app:
$ sudo apt-get -q update
$ sudo apt-get -yq install git pigz sqlite3
$ sudo apt install python3-virtualenv
-
Clone this repository via HTTPS and navigate into the repo:
$ git clone https://github.com/jwngr/sdow.git
$ cd sdow/
-
Create and activate a new virtualenv environment:
$ virtualenv -p python3 env
$ source env/bin/activate
-
Install the required Python libraries:
$ pip install -r requirements.txt
-
Copy the latest compressed SQLite file from the sdow-prod GCS bucket:
$ gsutil -u sdow-prod cp gs://sdow-prod/dumps/<YYYYMMDD>/sdow.sqlite.gz sdow/
-
Decompress the SQLite file:
# Warning: This may take ~10 minutes.
$ pigz -d sdow/sdow.sqlite.gz
-
Create the searches.sqlite file:
$ sqlite3 sdow/searches.sqlite ".read sql/createSearchesTable.sql"
Note: Alternatively, copy a backed-up version of searches.sqlite:
$ gsutil -u sdow-prod cp gs://sdow-prod/backups/<YYYYMMDD>/searches.sql.gz sdow/searches.sql.gz
$ pigz -d sdow/searches.sql.gz
$ sqlite3 sdow/searches.sqlite ".read sdow/searches.sql"
$ rm sdow/searches.sql
-
Install required operating system dependencies to generate an SSL certificate (this and the following instructions are based on these blog posts):
$ sudo apt-get -q update
$ sudo apt install nginx snapd
$ sudo snap install --classic certbot
$ sudo ln -s /snap/bin/certbot /usr/bin/certbot
-
Add this location block inside the server block in /etc/nginx/sites-available/default:
location ~ /.well-known {
  allow all;
}
-
Start NGINX:
$ sudo systemctl restart nginx
-
Ensure the VM has been assigned the proper static IP address (sdow-web-server-static-ip) by editing it on the GCP console.
-
Create an SSL certificate using Let's Encrypt's certbot:
$ sudo certbot certonly -a webroot --webroot-path=/var/www/html -d api.sixdegreesofwikipedia.com --email [email protected]
-
Ensure auto-renewal of the SSL certificate is configured properly:
$ sudo certbot renew --dry-run
-
Configure the following cron jobs:
$ crontab -e
# Add the entries below and save.
# Auto-renew the SSL certificate daily.
0 4 * * * sudo /usr/bin/certbot renew --noninteractive --renew-hook "sudo /bin/systemctl reload nginx"
# Restart the web server every ten minutes (to defend against hangs).
*/10 * * * * /home/jwngr/sdow/env/bin/supervisorctl -c /home/jwngr/sdow/config/supervisord.conf restart gunicorn
# Back up the searches database weekly.
0 6 * * 0 /home/jwngr/sdow/scripts/backupSearchesDatabase.sh
Note: Let's Encrypt debug logs can be found at /var/log/letsencrypt/letsencrypt.log.
Note: Supervisor debug logs can be found at /tmp/supervisord.log.
-
Install a mail service in order to read logs from cron jobs:
$ sudo apt-get -yq install postfix
# Choose "Local only" and use the default email address.
Note: Cron job logs will be written to /var/mail/jwngr.
-
Generate a strong Diffie-Hellman group to further increase security:
$ sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048
-
Copy over the NGINX configuration, making sure to back up the original configuration:
$ sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup
$ sudo cp config/nginx.conf /etc/nginx/nginx.conf
-
Restart nginx:
$ sudo systemctl restart nginx
-
Activate the virtualenv environment:
$ cd sdow/
$ source env/bin/activate
-
Start the Flask web server via Supervisor, which runs Gunicorn:
$ cd config/
$ supervisord
-
Use supervisorctl to manage the running web server:
$ supervisorctl status # Get status of running processes
$ supervisorctl stop gunicorn # Stop web server
$ supervisorctl start gunicorn # Start web server
$ supervisorctl restart gunicorn # Restart web server
Note: supervisord and supervisorctl must either be run from the config/ directory or be given the configuration file via the -c argument; otherwise they return an obscure "http://localhost:9001 refused connection" error message.
Note: Log output from supervisord is written to /tmp/supervisord.log and log output from gunicorn is written to /tmp/gunicorn-stdout---supervisor-<HASH>.log. Logs are also written to Stackdriver Logging.
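As a quick sanity check after the installation steps above, the sketch below reports which of the command-line tools used in this guide are not yet on the PATH. The function names `missing_tools` and `check_sdow_tools` are hypothetical helpers, not part of this repo, and the tool list is assembled from the commands in this guide (nginx is left out because it typically lives in /usr/sbin, which may not be on a regular user's PATH).

```shell
# Hypothetical helper: print each named command that is not found on PATH,
# one per line. Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: check the tools this guide relies on.
# Returns 1 and lists the missing ones on stderr if any are absent.
check_sdow_tools() {
  missing="$(missing_tools git pigz sqlite3 virtualenv certbot gsutil)"
  if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

After pasting the functions into a shell on the VM, running `check_sdow_tools` should print nothing when everything is installed.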
To update the web server to a more recent sdow.sqlite file with minimal downtime, run the following commands after SSHing into the web server:
$ cd sdow/
$ source env/bin/activate
$ gsutil -u sdow-prod cp gs://sdow-prod/dumps/<YYYYMMDD>/sdow.sqlite.gz sdow/sdow_new.sqlite.gz
$ pigz -d sdow/sdow_new.sqlite.gz # This takes ~10 minutes and causes search to be non-responsive.
$ mv sdow/sdow_new.sqlite sdow/sdow.sqlite
$ cd config/
$ supervisorctl restart gunicorn
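The update commands above could be wrapped in a small script so that a mistyped date fails before anything is downloaded. This is only a sketch: `sdow_dump_url` and `update_sdow_database` are hypothetical names, and it assumes it is run from the repo root with the virtualenv already activated.

```shell
# Hypothetical helper: build the GCS URL of the compressed dump for a date.
# Fails with status 1 unless the argument is exactly eight digits (YYYYMMDD).
sdow_dump_url() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "expected a YYYYMMDD date, got: '$1'" >&2; return 1 ;;
  esac
  printf 'gs://sdow-prod/dumps/%s/sdow.sqlite.gz\n' "$1"
}

# Hypothetical helper: run the same steps as above, stopping at the first
# failure. The live sdow.sqlite is only replaced once decompression succeeds.
update_sdow_database() {
  url="$(sdow_dump_url "$1")" || return 1
  gsutil -u sdow-prod cp "$url" sdow/sdow_new.sqlite.gz &&
    pigz -d sdow/sdow_new.sqlite.gz &&
    mv sdow/sdow_new.sqlite sdow/sdow.sqlite &&
    (cd config/ && supervisorctl restart gunicorn)
}
```

With the functions defined, `update_sdow_database <YYYYMMDD>` would perform the whole update for the chosen dump date.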
To update the Python server code which powers the SDOW backend, run the following commands after SSHing into the web server:
$ cd sdow/
$ source env/bin/activate
$ git pull
$ pip install -r requirements.txt
$ cd config/
$ supervisorctl restart gunicorn
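After either kind of update, it can be worth confirming that Gunicorn actually came back up. The sketch below parses the output of `supervisorctl status`, assuming its usual layout of program name in the first column and state in the second. `program_is_running` is a hypothetical helper, not part of this repo.

```shell
# Hypothetical helper: read `supervisorctl status` output on stdin and
# succeed only if the named program is listed in the RUNNING state.
program_is_running() {
  awk -v name="$1" '
    $1 == name && $2 == "RUNNING" { found = 1 }
    END { exit !found }
  '
}
```

From the config/ directory, `supervisorctl status | program_is_running gunicorn && echo "gunicorn is up"` would then report success only when the process is running.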