To ensure the code can efficiently be trained on servers. Here I create a docker image.
System configruation
Linux dist: Ubuntu 16.4 Nvidia-driver: 384/387 Cuda version: V9.0 Cudnn version: V7 Pytorch: v1.1.0 (highest version still support cuda v.9.0)
Docker install
A complete docker tutorial can be found at docker and docker engine on ubuntu.
1. configurations
update apt packages
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
Yo may get the curl error curl : Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2) but 7.47.0-1ubuntu2.7 is to be installed, which may caused by the dependance issue between curl and libcurl. A quick fix is purge the libcurl:
$ sudo apt-get purge libcurl3-gnutls
$ sudo apt-get install curl
Add docker GPG keys:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88 #fingerprint verification
use the following command shows the linux distribution:
$ lsb_release -cs
> xenial # dist of ubuntu 16.4
then add the stable repo:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
make sure you use [arch=amd64] above instead of x86_64 even if the system info shows x86_64.
2. Install docker.
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
new dependance error may raise.
containerd.io : Depends: libseccomp2 (>= 2.4.0) but 2.2.3-3ubuntu3 is to be installed docker-ce : Depends: libseccomp2 (>= 2.3.0) but 2.2.3-3ubuntu3 is to be installedboth the
containered.ioanddocker-cehave the dependace onlibseccomp2. Thedocker-ceone can be addressed suggestion:$ sudo add-apt-repository ppa:ubuntu-sdk-team/ppa $ sudo apt-get updateHowever, the
containered.iodependency problem still remains:containerd.io : Depends: libseccomp2 (>= 2.4.0) but 2.3.1-2ubuntu2~ubuntu16.04.1~ppa1 is to be installedTo address that, the only feasible solution is to manually update the package. The installation follows the suggestion:
$ sudo nano /etc/apt/sources.listthen add the following line to the above
sources.listfile:deb http://security.ubuntu.com/ubuntu xenial-security mainthen don’t forget the update:
$ sudo apt-get update
3. verify the installation by “hello-world” docker image.
$ sudo docker run hello-world
The docker version can be checked by:
$ docker version
4. Allow non-sudo user use docker
more details can be found at docker
Due to the bind between docker daemon and unix socket (owned by root user and granted access by sudo), you can create docker group specifically for members not on the sudo-list.
$ sudo groupadd docker #create docker group
$ sudo usermod -aG docker $USER # add current USER to the docker group
$ newgrp docker # activate the group change, relog the account if not work.
NVIDIA Container toolkit (aka. nvidia-docker)
install
As described in their repo, install the nvidia docker on ubuntu:
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Usage
CUDA version: the CUDA version highly depends on the NVIDIA GPU driver version. Here I use cuda:9.0.
#### Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi
# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi
# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi
Build and run your image
more details can be found at docker get-start
docker hub
docker hub account login:
$ docker login
docker image build
The ./Dockerfile is used for configure the docker image to be builded. Note that, to minimize the docker image size, the multi-stage build is used. Run following bash script for docker image build:
$ bash ./docker_build.sh
To better manage the python packages, you can use requirements.txt in the Dockerfile. Note that, the dependancies defined in requirements.txt can be install by conda-compatible pip.
Some more descriptions can be found here
share docker images on Docker Hub
$ docker login
$ docker push <docker-id>/<repository-name>:<tag>
# docker push elliothe/pytorch:1.1
docker container run
In order to let the code running with the container as the development platform, the code the is mounted via the docker volume approach.