Google Colab is user-friendly and relatively easy to get started with.
Below are several recommendations and warnings that we feel are worth highlighting.
GitHub & Colab
- It’s extremely useful to link GitHub and Colab (using Colab in a browser where you are signed into GitHub), following these instructions. You can also install the “Open in Colab” extension, which lets you open notebooks directly from the GitHub user interface.
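Colab can open a notebook hosted on GitHub through a predictable URL: a notebook at github.com/<owner>/<repo>/blob/<branch>/<path>.ipynb opens at colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>.ipynb. The helper below is a small sketch of that rewrite, which can be handy for generating links in a README. The function name github_to_colab is our own, and it deliberately handles only "blob" URLs that point at an .ipynb file.

```python
from urllib.parse import urlparse


def github_to_colab(github_url: str) -> str:
    """Rewrite a GitHub notebook URL into the equivalent Colab URL (hypothetical helper)."""
    parsed = urlparse(github_url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url}")

    # Expected path: owner/repo/blob/branch/path/to/notebook.ipynb
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob" or not parts[-1].endswith(".ipynb"):
        raise ValueError(f"not a link to a notebook file: {github_url}")

    return "https://colab.research.google.com/github/" + "/".join(parts)


print(github_to_colab(
    "https://github.com/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb"
))
```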
File Transfer
- There are many ways to get data in and out of Google Colab. We strongly recommend that students use a solution that can be scripted (e.g., linking to Google Drive in the notebook code) instead of one that requires manual input (e.g., uploading data from your local computer). More information can be found here.
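A minimal sketch of the scripted approach, assuming your data lives in a Google Drive folder (the folder name my_project is a placeholder). google.colab.drive.mount is Colab's own helper for attaching Drive; it asks for authorization the first time it runs in a session. Outside Colab the import fails, so this sketch falls back to a local folder and the same notebook code runs in both places.

```python
from pathlib import Path


def get_data_dir(subdir: str = "my_project") -> Path:
    """Return a persistent data directory, mounting Google Drive when running in Colab.

    The subdir name is a hypothetical placeholder for your own project folder.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Not in Colab: use a folder next to the notebook instead.
        local_dir = Path.cwd() / subdir
        local_dir.mkdir(parents=True, exist_ok=True)
        return local_dir

    drive.mount("/content/drive")  # prompts for authorization on first use
    drive_dir = Path("/content/drive/MyDrive") / subdir
    drive_dir.mkdir(parents=True, exist_ok=True)
    return drive_dir


data_dir = get_data_dir()
print("Reading and writing data under", data_dir)
```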
GPU Use
- Using a GPU on Colab should be as simple as enabling the GPU in your notebook settings and running your Keras/TensorFlow code. If you do encounter issues or need more advanced GPU settings, this page is a good starting point.
- Colab’s free GPU is great, but it takes some care to use well. There are two important caveats:
  - Free Colab notebooks are aggressively CPU-constrained. Watch out for CPU bottlenecks and work around them creatively. Techniques such as using smaller batch sizes, caching pre-processed data, and moving processing steps onto the GPU can be very helpful.
  - Colab notebooks are also aggressively time-constrained. Kernels are shut down based on activity in ways that are opaque and inconsistent. This is manageable, but it is very valuable to know about in advance.
- The best way to approach the second problem (and an important approach to cloud computing in general) is to set up your system so that it can be interrupted at any moment and pick up gracefully where it left off. For example, you might use Google Drive to store a log of neural network training progress as well as the weights for each epoch. You can then write your script or notebook so that it checks whether training was in progress and picks it back up in the middle: read the log, check whether the most recent epoch < max_epochs, and if so, load the most recent weights, set the learning rate and other parameters accordingly, and you're off to the races.
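The checkpoint-and-resume pattern for surviving disconnects can be sketched in plain Python. Everything here is illustrative rather than a prescribed method: train_with_resume, the progress.json log, the weights_epoch_NNN.pkl files, and the exponential learning-rate schedule (base_lr, lr_decay) are assumptions, and train_one_epoch is a hypothetical stand-in for your own training step. In practice run_dir would be a folder on the mounted Google Drive, and with Keras you would swap the pickle calls for the model's own save_weights/load_weights methods.

```python
import json
import pickle
from pathlib import Path


def train_with_resume(run_dir, max_epochs, train_one_epoch, init_params,
                      base_lr=1e-3, lr_decay=0.9):
    """Train for max_epochs, resuming from the last completed epoch if a log exists.

    train_one_epoch(params, lr) -> (new_params, loss) is a hypothetical
    stand-in for your own training step.
    """
    run_dir = Path(run_dir)  # e.g. a folder on the mounted Google Drive
    run_dir.mkdir(parents=True, exist_ok=True)
    log_path = run_dir / "progress.json"

    # Read the progress log, or start a fresh one.
    if log_path.exists():
        log = json.loads(log_path.read_text())
    else:
        log = {"completed_epochs": 0, "history": []}

    start = log["completed_epochs"]
    if start > 0:
        # Reload the weights saved at the end of the most recent epoch.
        with open(run_dir / f"weights_epoch_{start:03d}.pkl", "rb") as f:
            params = pickle.load(f)
    else:
        params = init_params

    for epoch in range(start, max_epochs):
        # Derive the learning rate from the epoch index so a resumed run
        # follows the same schedule as an uninterrupted one.
        lr = base_lr * lr_decay ** epoch
        params, loss = train_one_epoch(params, lr)

        # Save weights first, then the log, so the log never refers to
        # weights that were not written.
        with open(run_dir / f"weights_epoch_{epoch + 1:03d}.pkl", "wb") as f:
            pickle.dump(params, f)
        log["completed_epochs"] = epoch + 1
        log["history"].append({"epoch": epoch + 1, "lr": lr, "loss": loss})
        tmp_path = log_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(log))
        tmp_path.replace(log_path)  # swap in the new log with a single rename

    return params, log
```

Because the learning rate is computed from the epoch index rather than updated in place, a resumed run follows the same schedule it would have followed without the interruption. Writing the weights before updating the log means a disconnect between the two steps costs at most one repeated epoch.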
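For the CPU-bottleneck caveat, one of the listed techniques, caching pre-processed data, can be sketched with the standard library: the expensive preprocessing runs once, the result is pickled, and later epochs or later sessions load it instead of recomputing. The function name load_or_preprocess and the cache path are hypothetical; keeping the cache file on the mounted Google Drive also lets it survive a disconnect. If your pipeline uses tf.data, Dataset.cache() is a built-in alternative.

```python
import pickle
from pathlib import Path


def load_or_preprocess(raw_items, preprocess, cache_path):
    """Return preprocessed items, computing them only if no cache file exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the CPU-heavy preprocessing entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache miss: do the expensive work once and persist the result.
    processed = [preprocess(item) for item in raw_items]
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(processed, f)
    return processed
```

This sketch does not detect stale caches, so delete the cache file whenever the preprocessing code or the raw data changes.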