To ensure the code can be trained efficiently on servers, I create a docker image here.
System configuration
Linux dist: Ubuntu 16.04
NVIDIA driver: 384/387
CUDA version: V9.0
cuDNN version: V7
PyTorch: v1.1.0 (the highest version that still supports CUDA 9.0)
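If you need to check what a given server has installed, the standard version queries are listed below (note that nvcc is only present if the CUDA toolkit itself is installed):
$ nvidia-smi      # NVIDIA driver version and visible GPUs
$ nvcc --version  # CUDA toolkit version
$ lsb_release -a  # Ubuntu release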
Docker install
A complete Docker tutorial can be found at Docker and Docker Engine on Ubuntu.
1. Configurations
Update apt packages:
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
You may get the following curl error:
curl : Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2) but 7.47.0-1ubuntu2.7 is to be installed
which may be caused by a dependency issue between curl and libcurl. A quick fix is to purge libcurl:
$ sudo apt-get purge libcurl3-gnutls
$ sudo apt-get install curl
Add docker GPG keys:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88 #fingerprint verification
Use the following command to show the Linux distribution codename:
$ lsb_release -cs
> xenial # dist of Ubuntu 16.04
Then add the stable repo:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
Make sure you use [arch=amd64] above instead of x86_64, even if the system info shows x86_64.
2. Install docker.
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
A new dependency error may arise:
containerd.io : Depends: libseccomp2 (>= 2.4.0) but 2.2.3-3ubuntu3 is to be installed
docker-ce : Depends: libseccomp2 (>= 2.3.0) but 2.2.3-3ubuntu3 is to be installed
Both containerd.io and docker-ce depend on libseccomp2. The docker-ce one can be addressed following this suggestion:
$ sudo add-apt-repository ppa:ubuntu-sdk-team/ppa
$ sudo apt-get update
However, the containerd.io dependency problem still remains:
containerd.io : Depends: libseccomp2 (>= 2.4.0) but 2.3.1-2ubuntu2~ubuntu16.04.1~ppa1 is to be installed
To address that, the only feasible solution is to manually update the package, following this suggestion:
$ sudo nano /etc/apt/sources.list
Then add the following line to the above sources.list file:
deb http://security.ubuntu.com/ubuntu xenial-security main
Don't forget to update afterwards:
$ sudo apt-get update
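Optionally, confirm that a newer libseccomp2 is now available before retrying the docker install (the exact candidate version may differ on your machine):
$ apt-cache policy libseccomp2  # candidate version should now be >= 2.4.0
$ sudo apt-get install docker-ce docker-ce-cli containerd.io  # retry the install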
3. Verify the installation with the “hello-world” docker image.
$ sudo docker run hello-world
The docker version can be checked by:
$ docker version
4. Allow non-sudo users to use docker
More details can be found in the docker docs.
Because the docker daemon binds to a Unix socket (owned by the root user and accessible via sudo), you can create a docker group specifically for members who are not on the sudo list.
$ sudo groupadd docker #create docker group
$ sudo usermod -aG docker $USER # add current USER to the docker group
$ newgrp docker # activate the group change; re-log into the account if this does not work.
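You can then verify that docker works without sudo:
$ docker run hello-world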
NVIDIA Container Toolkit (a.k.a. nvidia-docker)
Install
As described in the nvidia-docker repo, install it on Ubuntu:
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Usage
CUDA version: the CUDA version depends heavily on the NVIDIA GPU driver version. Here I use cuda:9.0.
# Test nvidia-smi with the official CUDA 9.0 image
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi
# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi
# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi
Build and run your image
More details can be found at the docker get-started guide and Docker Hub.
Docker Hub account login:
$ docker login
docker image build
The ./Dockerfile is used to configure the docker image to be built. Note that, to minimize the docker image size, a multi-stage build is used (see the Dockerfile sketch further below). Run the following bash script to build the docker image:
$ bash ./docker_build.sh
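For reference, a minimal docker_build.sh could look like the sketch below; the image name and tag are just the example used later for the push, and the real script may differ:
#!/bin/bash
# build the image from ./Dockerfile and tag it (name/tag are placeholders)
docker build -t elliothe/pytorch:1.1 -f ./Dockerfile .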
To better manage the Python packages, you can use a requirements.txt in the Dockerfile. Note that the dependencies defined in requirements.txt can be installed by the conda-compatible pip.
Some more descriptions can be found here
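As a rough illustration of the multi-stage idea combined with requirements.txt, a Dockerfile could be organized along the following lines; the base image tags, the Miniconda installer, and the paths are assumptions for this sketch, not the exact Dockerfile of this repo:
# --- Stage 1: build the Python environment in the full CUDA devel image ---
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# install Miniconda to get a conda-compatible pip
RUN curl -o /miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /miniconda.sh -b -p /opt/conda && rm /miniconda.sh
ENV PATH=/opt/conda/bin:$PATH
# install the dependencies defined in requirements.txt with conda's pip
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# --- Stage 2: copy only the conda environment into the slimmer runtime image ---
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
COPY --from=builder /opt/conda /opt/conda
ENV PATH=/opt/conda/bin:$PATH
WORKDIR /workspace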
Share docker images on Docker Hub
$ docker login
$ docker push <docker-id>/<repository-name>:<tag>
# docker push elliothe/pytorch:1.1
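If the local image is not already named as <docker-id>/<repository-name>:<tag>, tag it first with docker tag before pushing; the local image name below is just a placeholder:
$ docker tag <local-image>:<tag> <docker-id>/<repository-name>:<tag>
# docker tag pytorch:1.1 elliothe/pytorch:1.1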
docker container run
In order to use the container as the development platform, the code is mounted into the container via the docker volume approach.
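For example, a run command along the following lines starts a GPU-enabled container and bind-mounts the current code directory into it; the image name and the /workspace mount point are placeholders for this sketch:
$ docker run --gpus all -it --rm \
    -v $(pwd):/workspace \
    elliothe/pytorch:1.1 bash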