Backups
The CantusDB Staging and Production servers are set up to be backed up automatically on a periodic schedule. Past backups of the database can be found in the `/home/ubuntu/backups/postgres/` directory on the relevant server.
Only backups of the production server are also transferred to Project Storage on DRAC's Cedar cluster.
Backups are triggered by cron, and scheduling information for these backups can be found in the `cron` role of the CantusDB ansible set-up.
Retention
We retain a limited number of daily (7), weekly (8), and monthly (12) backups, and an unlimited number of annual backups. Within `/home/ubuntu/backups/postgres/`, these can be found in the `daily/`, `weekly/`, `monthly/`, and `yearly/` directories respectively. Logic implementing this retention policy can be found in `CantusDB/cron/postgres/db_backup.sh`.
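The actual rotation logic lives in `db_backup.sh`; a minimal sketch of that kind of pruning (hypothetical — the function name and structure here are illustrative, only the paths and retention counts come from the policy above) might look like:

```shell
#!/bin/sh
# Hypothetical sketch of backup rotation: keep the newest N files in each
# tier and delete the rest. The real logic lives in
# CantusDB/cron/postgres/db_backup.sh.
BACKUP_ROOT="${BACKUP_ROOT:-/home/ubuntu/backups/postgres}"

prune() {
    dir="$1"   # tier directory, e.g. daily/
    keep="$2"  # number of backups to retain
    [ -d "$BACKUP_ROOT/$dir" ] || return 0
    # List files newest-first, skip the first $keep, delete the rest.
    ls -1t "$BACKUP_ROOT/$dir" | tail -n +"$((keep + 1))" | while read -r f; do
        rm -- "$BACKUP_ROOT/$dir/$f"
    done
}

prune daily 7
prune weekly 8
prune monthly 12
# yearly/ is never pruned: annual backups are kept indefinitely.
```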
A manual database dump can be created by ssh-ing into the relevant server and running `docker-compose exec postgres pg_dump cantusdb -U cantusdb > ~/cantus_dump.sql`.
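To restore from such a dump, the companion `psql` invocation would be something like the following — a sketch only, assuming the same docker-compose setup and an existing, empty `cantusdb` database on the target:

```shell
# Hypothetical restore counterpart to the dump command above.
# -T disables pseudo-tty allocation so stdin redirection works.
docker-compose exec -T postgres psql -U cantusdb cantusdb < ~/cantus_dump.sql
```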
A set of backups (following the policy outlined in "Retention" above) from the production server is stored in DRAC's Cedar Project Storage. Transfer of these backups to Cedar storage is managed by a cron job on the server.
The transfer process uses key-pair authentication, with a public key installed by an active DRAC user with access to Jennifer's DRAC allocation. The following outlines the process of setting up this key:
- Create a new key pair on your local machine.
- Install the public key for your DRAC user. Log in to https://ccdb.alliancecan.ca/ and go to "My Account > Manage SSH Keys". Paste your public key in the designated field, provide an optional name, and click "Add Key".
- Test that your key was installed by attempting to use your private key to log in to Cedar. In your terminal, try `ssh -i /path/to/private/key/created/above [your ccdb account username]@cedar.computecanada.ca`. Note that you may need to configure your CCDB account with two-factor authentication.
- Once logged in, you'll be on the command line in your user's home directory. Backups are at `./projects/def-jbain/cantusdb/`. Note that if you have to re-create this folder, you'll probably need to change the group ownership of this directory to the `def-jbain` group rather than the default (your own user group) and set the GID bit for the directory and all sub-directories. Set the GID bit by running `chmod -R g+s [/directory]`. See "Backup File Permissions" below for more details and links to relevant DRAC documentation.
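The ownership fix described above boils down to two commands. The demo below runs against a scratch directory and the current user's primary group, since the `def-jbain` group only exists on Cedar:

```shell
# Demonstration of the group-ownership + setgid fix on a scratch directory.
# On Cedar you would target ./projects/def-jbain/cantusdb/ and use the
# def-jbain group instead of the current user's primary group.
DIR=$(mktemp -d)/cantusdb
mkdir -p "$DIR"
GROUP=$(id -gn)              # local stand-in for def-jbain

chgrp -R "$GROUP" "$DIR"     # give the allocation group ownership
chmod -R g+s "$DIR"          # setgid: new files inherit the group

# Directories now show an 's' in the group-execute position.
ls -ld "$DIR"
```

With the GID bit set, any file or subdirectory later created under the directory inherits the `def-jbain` group automatically, which is what keeps transfers within the group quota.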
If this works, you'll then need to configure this key to be used for automation purposes. Details of the requirements for this are listed on the DRAC wiki.
In short, you'll need to:
- email Alliance support to have them add your CCDB account to those whitelisted for automated access.
- reinstall the ssh key you created with the automation restrictions listed in the DRAC page linked above. Delete the key you just added to your CCDB account and re-add the public key as `restricted,from="[IP of production CantusDB server]",command="/cvmfs/soft.computecanada.ca/custom/bin/computecanada/allowed_commands/transfer_commands.sh" [public key]`
Now, you'll need to give the CantusDB server access to the private key paired to the public key you just installed.
- `scp` the private key to the CantusDB server.
- Access the command line as root with `sudo bash`.
- Change the owner and group of the private key file to `cantusdb` using `chown cantusdb path/to/keyfile` and `chgrp cantusdb path/to/keyfile`.
- Move the private keyfile to `/home/cantusdb/.ssh` with `mv /path/to/keyfile /home/cantusdb/.ssh`. If the `.ssh` directory does not exist, create it as the `cantusdb` user so it has appropriate permissions.
- Try to `scp` a file into the Cedar directory to test that your key is properly set up. Remember to point `scp` to your keyfile through the `-i` flag. You'll want to use the "automation" host for the cluster (as of writing, `robot.cedar.alliancecan.ca`; see the DRAC documentation). If the file appears, you're probably set up!
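Putting that last step together, a test transfer might look like the following — filenames and key path are hypothetical; substitute your own CCDB username and the key location used above:

```shell
# Hypothetical test transfer to the automation host.
scp -i /home/cantusdb/.ssh/keyfile test.txt \
    [your ccdb account username]@robot.cedar.alliancecan.ca:projects/def-jbain/cantusdb/
```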
We use the `rsync` utility to transfer files from the production server to project storage. See CantusDB's ansible repo `cron-update` role for the actual command run. Use of `rsync` comes with some additional considerations.
Our `rsync` command is configured not to delete files from Cedar. This ensures that accidental deletion of the backup files on the production server doesn't trigger deletion of those files from the Cedar cluster, defeating the purpose of the backup! However, this currently means that any deletion of unnecessary backup files will need to be done manually. Jennifer's allocation is currently 1TB of Cedar storage space, so it's unlikely we'll run into that limit anytime soon, but it's good to know that at some point some manual cleaning will be necessary.
Backup File Permissions
File ownership and permissions are a bit tricky because the Cedar cluster has strict storage quotas for individual users (the larger quotas are tied to the user group associated with the DRAC allocation; in this case, the `def-jbain` group has the 1TB quota). This means that if `rsync` tries to create a file owned by an individual's user group instead of `def-jbain`, you'll get an error. There's even a question about this point in DRAC's FAQs. In short, the `rsync` command needs to be configured so that the `def-jbain` group owns the files created on the other end. The current two-part `cron` job does this, but there may be a single command that could be substituted.
Note that if new subdirectories are created within the `backups` directory on the production machine, the same subdirectories will be created on Cedar. These newly-created directories will probably need to have their GID bit set (see documentation) so that files in those subdirectories have the correct group ownership by default.
Jennifer currently has much more storage (30TB) allocated on the Graham cluster. This is "nearline" tape-based storage, rather than "project" storage. See more on the benefits of nearline storage on the DRAC wiki. Whether the increased space and decreased cost are useful to us, balanced against some of the considerations listed there, is potentially a project for another time.