
There is one problem with machine learning: not all of us have a good GPU, or any GPU at all. As a student it is quite common to only have a laptop, which usually doesn't have a dedicated GPU and can't really be upgraded short of getting an external GPU...

As mentioned in a previous article, Google is here to help us out, at least kind of. Google Colab is a service that gives you a Jupyter Notebook on Google's servers, including a K80 GPU. It can be connected to your gdrive, and then you can get started.

This post is more about the downsides and how to actually work with the service.

The first steps after creating a notebook:

  • Activate the GPU:

    • Runtime -> Change runtime type -> Select GPU
  • Access to your gdrive:

        !apt-get install -y -qq software-properties-common python-software-properties module-init-tools
        !add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
        !apt-get update -qq 2>&1 > /dev/null
        from google.colab import auth
        auth.authenticate_user()
        from oauth2client.client import GoogleCredentials
        creds = GoogleCredentials.get_application_default()
        import getpass

    and

        from google.colab import drive
        drive.mount('/content/drive')

    Both of these blocks will give you a URL where you get a token that actually grants Google Colab access to your gdrive. Unfortunately this has to be done every time. The ! in front of some lines/commands indicates that it is a Linux shell command and not a Python command.

    This is useful for some other steps later on. You can check how much disk space you have using

    !df -h

    Which gives you something like this:

    Filesystem      Size  Used Avail Use% Mounted on
    overlay         359G  9.8G  331G   3% /
    tmpfs           6.4G     0  6.4G   0% /dev
    tmpfs           6.4G     0  6.4G   0% /sys/fs/cgroup
    tmpfs           6.4G  249M  6.2G   4% /opt/bin
    /dev/sda1       365G   12G  354G   4% /etc/hosts
    shm              64M     0   64M   0% /dev/shm
    tmpfs           6.4G     0  6.4G   0% /sys/firmware
    drive           100G   57G   44G  57% /content/drive

    The last line corresponds to your mounted gdrive.
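Once the runtime type is switched to GPU, you can sanity-check from inside the notebook that GPU tooling is actually available. A minimal sketch (it only looks for the nvidia-smi driver tool, so it also runs harmlessly on a CPU runtime):

```python
import shutil

# nvidia-smi ships with the GPU driver; on a Colab GPU runtime it is on the PATH.
has_gpu = shutil.which('nvidia-smi') is not None
print('GPU runtime attached:', has_gpu)
```

If this prints False even though you selected GPU, re-check Runtime -> Change runtime type and restart the runtime.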

Okay, this is basically everything to get you started, but here is some more useful information:

  • Installing Python packages with !pip install ...

This sounds awesome, but actually working with it is sometimes a pain. If you have a lot of training data (here: images), it would normally take hours to transfer it to gdrive, as gdrive is extremely slow at creating files, and gdrive normally doesn't give you the option of extracting zip files. I know there are some Chrome extensions for doing that, but what they seem to do is ridiculous in our case: they download your zip file, extract it on their server and push the files to gdrive, which itself again is slow at creating files. Furthermore, those extensions sometimes have a file size limit.

Anyway... Remember that with this Colab notebook you basically have access to some kind of server. So you can unzip your zip file yourself now, but as you have guessed, it is still slow, as gdrive still has to create the files. Additionally, it often appears to work: the unzip command finishes and you can access the files in Google Colab, but your gdrive doesn't show them. I have no idea where they are...

After seeing how much space you seem to have on the Google Colab instance, my idea was to copy the zip file from the mount point to the Colab disk and extract it on that server. This is much faster than the other approach and hasn't shown any weird behavior so far. The downside is that the zip and the extracted folder are gone if you restart your instance, but at least it is relatively fast.
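The copy-then-extract step can be sketched in plain Python instead of shell commands; the gdrive paths in the usage comment are assumptions, so adjust them to your own layout:

```python
import shutil
import zipfile

def copy_and_extract(drive_zip, local_zip, dest_dir):
    """Copy a zip from the slow gdrive mount to the local Colab disk, then extract it there."""
    shutil.copy(drive_zip, local_zip)   # copying one big file is fast, even over the mount
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(dest_dir)         # extraction only touches the fast local disk

# In Colab (hypothetical paths):
# copy_and_extract('/content/drive/My Drive/data.zip', '/content/data.zip', '/content/data')
```

Everything under /content outside the mount lives on the instance disk, so it disappears on restart, exactly as described above.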

Now, running your neural network model is straightforward; the next thing is to save the model. Here I also sometimes have problems saving the trained model to my gdrive via the mount point, but you can use PyDrive instead:

import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

and then

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Save the model on the Google Colab server and then use:

upload = drive.CreateFile({'title': 'FILENAME_ON_GDRIVE'})
upload.SetContentFile('FILENAME_ON_COLAB')
upload.Upload()

The filename on gdrive somehow needs to be a plain filename and not a path, as it seems the file can only be stored in your root folder. Don't ask me why...
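That said, the underlying Drive API does let you pick a target folder by passing a parent folder id in the file metadata. A small wrapper can cover both cases; this is a sketch, and the folder id (which you can read off the folder's URL in gdrive) is an assumption you have to fill in yourself:

```python
def upload_to_gdrive(drive, local_path, title, folder_id=None):
    """Upload local_path to gdrive as `title`; place it in folder_id if given, else in the root folder."""
    metadata = {'title': title}
    if folder_id is not None:
        # 'parents' tells the Drive API which folder the file should live in
        metadata['parents'] = [{'id': folder_id}]
    f = drive.CreateFile(metadata)
    f.SetContentFile(local_path)
    f.Upload()
    return f

# With the GoogleDrive object from above (hypothetical folder id):
# upload_to_gdrive(drive, 'model.h5', 'model.h5', folder_id='YOUR_FOLDER_ID')
```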

Try it out and post your thoughts!

Have fun!

Okay, and if you consider buying a GPU for your computer:

Affiliate link:

I would love to get one myself, so use the link above (any item on Amazon afterwards) to increase my chances ;)

If you enjoy the blog in general, please consider a donation via PayPal:


Ole Kröger

