The Caffe deep learning framework is relatively straightforward to use, but getting CUDA and cuDNN to play nicely can be daunting. Here I'll show you how to set up an Amazon EC2 instance with CUDA 7.0, cuDNN v3, the NVIDIA fork of Caffe and DIGITS v3. DIGITS is a web application written in Python that provides a clean GUI for interfacing with Caffe.
Setting Up An EC2 Instance
This is not an AWS tutorial, but setting up an EC2 instance is pretty self-explanatory. If you haven't already, navigate to https://aws.amazon.com and create an account. Sign into the console and navigate to the EC2 dashboard. There are two gotchas to be aware of up front:
- You can have two AWS accounts with the same email. When I first started I had two accounts without realizing it.
- Take note of your region (top right). When you create an EC2 instance it is located in a particular region. You won't be able to view your instances from a different region.
If you haven't set up an EC2 instance before, the steps are as follows:
- Click Launch Instance on the EC2 dashboard. Select "Ubuntu Server 14.04 LTS ..." and on the next page select a GPU-enabled instance type (g2.2xlarge or g2.8xlarge).
- Now press Next until you get to Add Storage. Increase the storage size of the root volume. I suggest 20GB. 8GB is not enough.
- Click Next twice to get to Configure Security Group. Click Add Rule, enter 5000 in the port range, and set Source to Anywhere.
- Now select Review and Launch. Select Launch again, and you should be prompted to download a keypair.
- Select Create a new key pair from the drop-down box, name it, and hit Download. I'm naming mine keypair. Move this file to your home directory.
Once the instance has a status of running, copy its public IP. After restricting the permissions on the keypair file, we can SSH into the instance:
$ chmod 600 ~/keypair.pem
$ ssh -i ~/keypair.pem ubuntu@<public ip>
We are now in control of the Ubuntu machine.
Installing NVIDIA Drivers
Update the package lists, upgrade the installed packages, install the build tools and download the CUDA 7.0 installer using these commands:
$ sudo apt-get update && sudo apt-get upgrade
$ sudo apt-get install build-essential
$ wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
This package also includes an NVIDIA proprietary driver. Now we extract the CUDA 7.0 installer:
$ chmod +x cuda_7.0.28_linux.run
$ mkdir nvidia_installers
$ ./cuda_7.0.28_linux.run -extract=`pwd`/nvidia_installers
Before we install the driver, we need to install the extra kernel modules package:
$ sudo apt-get install linux-image-extra-virtual
And we need to disable the currently installed open-source driver so it doesn't interfere:
$ sudo nano /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Save the file, then disable kernel mode setting for nouveau, rebuild the initramfs and reboot:
$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
$ sudo update-initramfs -u
$ sudo reboot
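If you would rather not edit the blacklist file by hand in nano, the same five lines can be written from a script. This is a minimal sketch, not part of the original procedure: the function name write_nouveau_blacklist is hypothetical, and it writes to whatever path you pass so that it can be tried on a scratch file before touching /etc.

```shell
# write_nouveau_blacklist FILE: write the five blacklist lines listed above
# to FILE. The function name is hypothetical, chosen for this sketch.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Usage on the instance (root is needed to copy into /etc):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
```

Writing to a temporary file first keeps the function itself free of sudo, so the only privileged step is the final copy.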
SSH back into the machine after waiting for it to reboot, and run the following commands to complete installation:
$ sudo apt-get install linux-source
$ sudo apt-get install linux-headers-`uname -r`
$ cd nvidia_installers
$ sudo ./NVIDIA-Linux-x86_64-346.46.run
Accept the EULA. You may get a few warnings; just select OK. When asked about the nvidia-xconfig utility I selected Yes. The NVIDIA proprietary drivers are now installed. Run nvidia-smi for information about your GPU and driver version.
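Besides nvidia-smi, it can be reassuring to confirm that the nouveau module is gone and the nvidia module is loaded. The following is a small sketch under the assumption that the kernel lists loaded modules in /proc/modules, one per line with the module name first; the function names are hypothetical, and the optional file argument exists only so the check can be tried against a sample file.

```shell
# module_loaded NAME [FILE]: succeed if NAME is listed as a loaded module.
# FILE defaults to /proc/modules.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# report_driver_state [FILE]: print the state of both GPU drivers.
report_driver_state() {
    for mod in nouveau nvidia; do
        if module_loaded "$mod" "${1:-/proc/modules}"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
}

# Usage on the instance:
#   report_driver_state
```

After a successful install you would hope to see nouveau reported as not loaded and nvidia as loaded. If nvidia is missing, running sudo modprobe nvidia is a reasonable first thing to try.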
Next, still inside the nvidia_installers directory, load the driver module and install the CUDA toolkit and samples:
$ sudo modprobe nvidia
$ sudo apt-get install build-essential
$ sudo ./cuda-linux64-rel-7.0.28-19326674.run
$ sudo ./cuda-samples-linux-7.0.28-19326674.run
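Accept the EULA and press Enter to accept the defaults. Now we update the path variables. Run nano ~/.bashrc (sudo is not needed for a file in your own home directory) and add the following lines:
export PATH=$PATH:/usr/local/cuda-7.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.0/lib64
Then run source ~/.bashrc, or open a new shell, so the changes take effect.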
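If you end up re-running this setup, it is easy to add the same export lines to ~/.bashrc more than once. The sketch below shows one way to append a line only when it is not already there; append_once is a hypothetical helper name, not something provided by CUDA or Ubuntu.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line
# is already present. grep -x matches whole lines, -F treats LINE literally.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage on the instance (single quotes keep $PATH unexpanded in the file):
#   append_once 'export PATH=$PATH:/usr/local/cuda-7.0/bin' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.0/lib64' ~/.bashrc
#   . ~/.bashrc
```

Note that the LD_LIBRARY_PATH line appends to the existing value rather than replacing it, so any library paths already set are preserved.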
To download cuDNN you must register with NVIDIA's Accelerated Computing Developer Program at https://developer.nvidia.com/cudnn. Select cuDNN v3 Library for Linux and download it to the home folder of your local machine. To get the library onto your Ubuntu VM, use the following command on your local machine:
$ scp -i ~/keypair.pem ~/cudnn-7.0-linux-x64-v3.0-prod.tgz ubuntu@<public ip>:/home/ubuntu/cudnn-7.0-linux-x64-v3.0-prod.tgz
Back in the Ubuntu VM now, run the following commands:
$ cd
$ tar -zxf cudnn-7.0-linux-x64-v3.0-prod.tgz
$ cd cuda
$ sudo cp lib64/libcudnn.so.7.0.64 /usr/local/cuda/lib64/
$ sudo cp include/cudnn.h /usr/local/cuda/include/
$ cd /usr/local/cuda/lib64
$ sudo ln -s libcudnn.so.7.0.64 libcudnn.so.7.0
$ sudo ln -s libcudnn.so.7.0 libcudnn.so
The cuDNN libraries are now available.
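A mistyped ln -s leaves a dangling symlink that only shows up later as a linker error, so it can be worth checking the chain right away. This is a rough sketch assuming GNU readlink (as shipped with Ubuntu); check_cudnn_links is a hypothetical name, and the directory argument defaults to the location used above.

```shell
# check_cudnn_links [DIR]: verify that DIR/libcudnn.so resolves, through
# the symlink chain, to a regular file. DIR defaults to /usr/local/cuda/lib64.
check_cudnn_links() {
    dir="${1:-/usr/local/cuda/lib64}"
    resolved=$(readlink -f "$dir/libcudnn.so") || return 1
    if [ -f "$resolved" ]; then
        echo "libcudnn.so -> $resolved"
    else
        echo "libcudnn.so does not resolve to a file in $dir" >&2
        return 1
    fi
}

# Usage on the instance:
#   check_cudnn_links
```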
Install Caffe's dependencies:
$ sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev protobuf-compiler gfortran libjpeg62 libfreeimage-dev libatlas-base-dev git python-dev python-pip libgoogle-glog-dev libbz2-dev libxml2-dev libxslt-dev libffi-dev libssl-dev libgflags-dev liblmdb-dev python-yaml python-numpy
$ sudo easy_install pillow
Now we download Caffe:
$ cd ~
$ git clone https://github.com/NVIDIA/caffe.git nv-caffe
We need to install Caffe's Python dependencies. This can take a while (for me, up to half an hour).
$ cd nv-caffe
$ cat python/requirements.txt | xargs -L 1 sudo pip install
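The -L 1 option makes xargs run one pip command per line of requirements.txt instead of one command for the whole file, presumably so the packages install one at a time in file order. The sketch below makes that visible by substituting echo for the real command, so nothing is installed; show_pip_commands is a hypothetical name and the two requirement lines are only samples.

```shell
# show_pip_commands: read requirement lines on stdin and print the command
# that xargs -L 1 would run for each one. echo stands in for the real
# "sudo pip install", so nothing is installed.
show_pip_commands() {
    xargs -L 1 echo sudo pip install
}

# Two sample requirement lines (illustrative, not the full file):
printf 'Cython>=0.19.2\nnumpy>=1.7.1\n' | show_pip_commands
```

Each input line becomes its own pip invocation. To preview the real file, run cat python/requirements.txt | show_pip_commands from the nv-caffe directory.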
Copy the example configuration with cp Makefile.config.example Makefile.config, then uncomment the line USE_CUDNN := 1 using nano Makefile.config (file names are case-sensitive). Use the command htop (you may need to install it with sudo apt-get install htop) to check how many CPU cores you have, then we can compile Caffe. Execute the following commands with X as your number of CPU cores:
$ make pycaffe -jX
$ make all -jX
$ make test -jX
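Both the config edit and the core count can be handled from the command line instead of nano and htop. This is a minimal sketch, assuming GNU sed (for the -i in-place flag) and the nproc utility from GNU coreutils, both of which ship with Ubuntu; enable_cudnn is a hypothetical helper name.

```shell
# enable_cudnn [FILE]: uncomment the "# USE_CUDNN := 1" line in a Caffe
# Makefile.config. FILE defaults to Makefile.config in the current directory.
# Other commented lines are left untouched.
enable_cudnn() {
    sed -i 's/^#[[:space:]]*\(USE_CUDNN[[:space:]]*:=[[:space:]]*1\)/\1/' "${1:-Makefile.config}"
}

# Usage from ~/nv-caffe, with nproc supplying the core count for -j:
#   enable_cudnn Makefile.config
#   make pycaffe -j"$(nproc)"
#   make all -j"$(nproc)"
#   make test -j"$(nproc)"
```

Afterwards, grep USE_CUDNN Makefile.config should show the line without its leading #.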
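Run the MNIST example from the Caffe root to confirm Caffe is working properly (the errors listed below refer to these scripts):
$ ./data/mnist/get_mnist.sh
$ ./examples/mnist/create_mnist.sh
$ ./examples/mnist/train_lenet.sh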
Errors are common here. Here are some fixes:
# Error: 'build/examples/mnist/convert_mnist_data.bin: error while loading shared libraries: libcudart.so.7.0: cannot open shared object file: No such file or directory'
$ sudo ldconfig /usr/local/cuda/lib64
# Error: 'libdc1394 error: Failed to initialize libdc1394'
$ sudo ln /dev/null /dev/raw1394
# Can't complete the train script:
$ nvidia-modprobe -u -c=0
Download DIGITS v3:
$ cd
$ git clone https://github.com/NVIDIA/DIGITS.git digits
$ cd digits
DIGITS is written in Python, so we have to install some Python dependencies. We'll install the system packages first and the rest inside a virtual environment.
$ sudo apt-get install python-pil python-numpy python-scipy python-protobuf python-gevent python-flask python-flaskext.wtf gunicorn python-h5py
$ sudo pip install virtualenv
$ virtualenv venv
$ source venv/bin/activate
# Will take up to half an hour
$ cat requirements.txt | xargs -L 1 pip install
Use the command deactivate when you're ready to leave the virtual Python environment (not now). Finally, run DIGITS with the command:
$ ./digits-devserver
When prompted for the Caffe installation directory, direct it to ~/nv-caffe. You may need to execute:
$ sudo ln /dev/null /dev/raw1394
Open DIGITS in a local web browser by accessing the URL http://<public ip>:5000. That's it. You now have a DIGITS web server that you can use to train image classification models.