Adding GPU to your Google cloud platform cluster

2 minute read

Adding GPU to your Google cloud platform

Google cloud platform is Google’s product in the competitive market of cloud services providers. We used Google cloud platform in one of our course at UGA, Data science practicum. This is a tutorial to add supported Nvidia GPU to your Google cloud platform. A side note: This tutorial has all the information provided as on 04/01/2018 and there might be new features added to the platform such that many of the steps described below can be reduced.

Types of GPUs supported

  1. Nvidia-Tesla-K80
  2. Nvidia-Tesla-p100

Before you begin

  1. You will need GPU quota in your project. Check your quota.
  2. You will not be able to create cluster with GPU through console, You will have to install Google cloud sdk on your local machine. Instructions can be found here
  3. You will have to learn how to create a Google cloud dataproc cluster using gcloud command. Go here for details.

gcloud create dataproc cluster command

An example of the create command can be found below:

gcloud dataproc clusters create ankit-31 --initialization-actions gs://uga-dsp/scripts/conda-dataproc-bootstrap.sh --master-boot-disk-size 500GB --master-machine-type custom-4-15360 --num-masters 1 --region us-east1 --scopes cloud-platform --worker-boot-disk-size 500GB --worker-machine-type custom-6-30720 --num-workers 3

Compare it with the tutorial link in 4th point above to understand different arg parameters.

I want to weigh on specifically one arg which is --initialization-actions. We can specify a .sh file which can install our choice of software while creation of the cluster. Take conda-dataproc-bootstrap.sh for an example, It installs conda environment for us while creating the cluster.

Let’s add GPU to the cluster

So if you have understood above requirements, we just have to do two things to attach GPU to the cluster:

1. Attaching GPU to the cluster

Here, I want to mention that this feature is a Beta feature of Google so you will have to use your create cluster command as gcloud beta ... instead of gcloud....

You will have to add two args to the gcloud create command as below:

gcloud beta dataproc clusters create <args> --master-accelerator type=nvidia-tesla-k80 --worker-accelerator type=nvidia-tesla-k80,count=4

Example:

gcloud beta dataproc clusters create ankit-31 --initialization-actions gs://uga-dsp/scripts/conda-dataproc-bootstrap.sh --master-boot-disk-size 500GB --master-machine-type custom-4-15360 --num-masters 1 --region us-east1 --scopes cloud-platform --worker-boot-disk-size 500GB --worker-machine-type custom-6-30720 --num-workers 3 --master-accelerator type=nvidia-tesla-k80 --worker-accelerator type=nvidia-tesla-k80,count=3

This will attach the GPU to master and worker nodes.

2. Installing GPU drivers

You will have to use one more initialization script to install nvidia drivers in the --initialization-actions. You will have to upload below script on your bucket and use it’s gs:// url in the --initialization-actions.

script: https://github.com/ankit-vaghela30/Cilia-Segmentation/blob/master/scripts/install-gpu-drivers.sh

This script will install required drivers on the cluster.

To verify that the drivers are installed properly, execute below command

nvidia-smi

I would highly recommend visiting below tutorial and especially go to “Spark Configuration” and “Example GPU job” sections to see how to run the script.

tutorial: https://cloud.google.com/dataproc/docs/concepts/compute/gpus

References:

[1] Adding GPU to Cluster

[2] gcloud create dataproc cluster command

[3] Installing Google Cloud sdk