To ensure the code can be trained efficiently on servers, I create a docker image here.
System configuration
Linux dist: Ubuntu 16.04
NVIDIA driver: 384/387
CUDA version: V9.0
cuDNN version: V7
PyTorch: v1.1.0 (the highest version that still supports CUDA 9.0)
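If you need to check what a given server has installed, the standard version queries are listed below (note that nvcc is only present if the CUDA toolkit itself is installed):
$ nvidia-smi      # NVIDIA driver version and visible GPUs
$ nvcc --version  # CUDA toolkit version
$ lsb_release -a  # Ubuntu release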
Docker install
A complete Docker tutorial can be found at Docker and Docker Engine on Ubuntu.
1. Configurations
Update apt packages:
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
You may get the following curl error:
curl : Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2) but 7.47.0-1ubuntu2.7 is to be installed
which may be caused by a dependency issue between curl and libcurl. A quick fix is to purge libcurl:
$ sudo apt-get purge libcurl3-gnutls
$ sudo apt-get install curl
Add docker GPG keys:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88 #fingerprint verification
Use the following command to show the Linux distribution codename:
$ lsb_release -cs
> xenial # dist of Ubuntu 16.04
Then add the stable repo:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
Make sure you use [arch=amd64] above instead of x86_64, even if the system info shows x86_64.
2. Install docker.
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
A new dependency error may arise:
containerd.io : Depends: libseccomp2 (>= 2.4.0) but 2.2.3-3ubuntu3 is to be installed
docker-ce : Depends: libseccomp2 (>= 2.3.0) but 2.2.3-3ubuntu3 is to be installed
Both containerd.io and docker-ce depend on libseccomp2. The docker-ce one can be addressed following this suggestion:
$ sudo add-apt-repository ppa:ubuntu-sdk-team/ppa
$ sudo apt-get update
However, the containerd.io dependency problem still remains:
containerd.io : Depends: libseccomp2 (>= 2.4.0) but 2.3.1-2ubuntu2~ubuntu16.04.1~ppa1 is to be installed
To address that, the only feasible solution is to manually update the package, following this suggestion:
$ sudo nano /etc/apt/sources.list
Then add the following line to the above sources.list file:
deb http://security.ubuntu.com/ubuntu xenial-security main
Don't forget to update afterwards:
$ sudo apt-get update
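Optionally, confirm that a newer libseccomp2 is now available before retrying the docker install (the exact candidate version may differ on your machine):
$ apt-cache policy libseccomp2  # candidate version should now be >= 2.4.0
$ sudo apt-get install docker-ce docker-ce-cli containerd.io  # retry the install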
3. Verify the installation with the “hello-world” docker image.
$ sudo docker run hello-world
The docker version can be checked by:
$ docker version
4. Allow non-sudo users to use docker
More details can be found in the docker docs.
Because the docker daemon binds to a Unix socket (owned by the root user and accessible via sudo), you can create a docker group specifically for members who are not on the sudo list.
$ sudo groupadd docker #create docker group
$ sudo usermod -aG docker $USER # add current USER to the docker group
$ newgrp docker # activate the group change; re-log into the account if this does not work.
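You can then verify that docker works without sudo:
$ docker run hello-world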
NVIDIA Container Toolkit (a.k.a. nvidia-docker)
Install
As described in the nvidia-docker repo, install it on Ubuntu:
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Usage
CUDA version: the CUDA version depends heavily on the NVIDIA GPU driver version. Here I use cuda:9.0.
# Test nvidia-smi with the official CUDA 9.0 image
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi
# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi
# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi
Build and run your image
More details can be found at the docker get-started guide and Docker Hub.
Docker Hub account login:
$ docker login
docker image build
The ./Dockerfile is used to configure the docker image to be built. Note that, to minimize the docker image size, a multi-stage build is used (see the Dockerfile sketch further below). Run the following bash script to build the docker image:
$ bash ./docker_build.sh
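For reference, a minimal docker_build.sh could look like the sketch below; the image name and tag are just the example used later for the push, and the real script may differ:
#!/bin/bash
# build the image from ./Dockerfile and tag it (name/tag are placeholders)
docker build -t elliothe/pytorch:1.1 -f ./Dockerfile .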
To better manage the Python packages, you can use a requirements.txt in the Dockerfile. Note that the dependencies defined in requirements.txt can be installed by the conda-compatible pip.
Some more descriptions can be found here
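As a rough illustration of the multi-stage idea combined with requirements.txt, a Dockerfile could be organized along the following lines; the base image tags, the Miniconda installer, and the paths are assumptions for this sketch, not the exact Dockerfile of this repo:
# --- Stage 1: build the Python environment in the full CUDA devel image ---
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# install Miniconda to get a conda-compatible pip
RUN curl -o /miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /miniconda.sh -b -p /opt/conda && rm /miniconda.sh
ENV PATH=/opt/conda/bin:$PATH
# install the dependencies defined in requirements.txt with conda's pip
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# --- Stage 2: copy only the conda environment into the slimmer runtime image ---
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
COPY --from=builder /opt/conda /opt/conda
ENV PATH=/opt/conda/bin:$PATH
WORKDIR /workspace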
Share docker images on Docker Hub
$ docker login
$ docker push <docker-id>/<repository-name>:<tag>
# docker push elliothe/pytorch:1.1
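If the local image is not already named as <docker-id>/<repository-name>:<tag>, tag it first with docker tag before pushing; the local image name below is just a placeholder:
$ docker tag <local-image>:<tag> <docker-id>/<repository-name>:<tag>
# docker tag pytorch:1.1 elliothe/pytorch:1.1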
docker container run
In order to use the container as the development platform, the code is mounted into the container via the docker volume approach.
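For example, a run command along the following lines starts a GPU-enabled container and bind-mounts the current code directory into it; the image name and the /workspace mount point are placeholders for this sketch:
$ docker run --gpus all -it --rm \
    -v $(pwd):/workspace \
    elliothe/pytorch:1.1 bash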