Getting Started with Docker on NVIDIA Jetson Nano
Verifying that the Board Ships with Docker Binaries
JetPack, the OS image for the Jetson Nano, comes with the Docker Engine preinstalled. Confirm this by checking the client and server versions:
ajeetraina@ajeetraina-desktop:~$ sudo docker version
[sudo] password for ajeetraina:
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:47:53 2020
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:
Checking the Docker runtime
Starting with JetPack 4.2, NVIDIA introduced a container runtime with Docker integration. This custom runtime lets Docker containers access the underlying GPU on Jetson family devices.
pico@pico1:/tmp/docker-build$ sudo nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:47:53 2020
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:
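Beyond nvidia-docker version, you can confirm that the nvidia runtime is registered with the daemon. This is a generic docker info query, not something specific to this walkthrough; expect nvidia to be listed alongside runc:

sudo docker info --format '{{.Runtimes}}'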
Installing Docker Compose on NVIDIA Jetson Nano
The Jetson Nano doesn't come with Docker Compose installed by default, so you will need to install it first:
export DOCKER_COMPOSE_VERSION=1.27.4
sudo apt-get install -y python3 python3-pip libhdf5-dev libssl-dev
sudo pip3 install docker-compose=="${DOCKER_COMPOSE_VERSION}"
Verify the installation:
docker-compose version
docker-compose version 1.26.2, build unknown
docker-py version: 4.3.1
CPython version: 3.6.9
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
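To confirm Compose works end to end, a minimal smoke-test file can be brought up and torn down. The file below is a hypothetical example, not part of the original walkthrough; hello-world is a multi-arch image that also runs on arm64:

cat > docker-compose.yml <<'EOF'
version: "3"
services:
  hello:
    image: hello-world
EOF
sudo docker-compose up
sudo docker-compose down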
Next, make the NVIDIA runtime the default for Docker:
Edit /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "node-generic-resources": [ "NVIDIA-GPU=0" ]
}
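A stray comma in daemon.json will prevent the daemon from starting, so it is worth validating the file before restarting. python3 -m json.tool is a convenient syntax check that ships with Python:

sudo python3 -m json.tool /etc/docker/daemon.json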
Restart the Docker daemon:
sudo systemctl restart docker
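After the restart, docker info should report nvidia as the default runtime:

sudo docker info | grep -i 'default runtime'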
Identify the Jetson board
pico@pico1:~$ git clone https://github.com/jetsonhacks/jetsonUtilities
Cloning into 'jetsonUtilities'...
remote: Enumerating objects: 123, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 123 (delta 15), reused 23 (delta 8), pack-reused 84
Receiving objects: 100% (123/123), 32.87 KiB | 5.48 MiB/s, done.
Resolving deltas: 100% (49/49), done.
pico@pico1:~$ cd jetsonUtilities/
pico@pico1:~/jetsonUtilities$ ls
LICENSE README.md jetsonInfo.py scripts
pico@pico1:~/jetsonUtilities$ python3 jetsonInfo.py
NVIDIA Jetson Nano (Developer Kit Version)
L4T 32.4.4 [ JetPack 4.4.1 ]
Ubuntu 18.04.5 LTS
Kernel Version: 4.9.140-tegra
CUDA 10.2.89
CUDA Architecture: 5.3
OpenCV version: 4.1.1
OpenCV Cuda: NO
CUDNN: 8.0.0.180
TensorRT: 7.1.3.0
Vision Works: 1.6.0.501
VPI: 4.4.1-b50
Vulkan: 1.2.70
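The L4T release string is also recorded on disk, so the board can be identified without cloning anything; both files below are standard on Jetson images:

cat /etc/nv_tegra_release
cat /proc/device-tree/model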
Install the latest version of CUDA
On Jetson boards, CUDA normally arrives through JetPack itself; the steps below instead pull NVIDIA's CUDA 11.3 repository packages for Ubuntu 18.04 on arm64 directly:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/sbsa/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_arm64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_arm64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
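If the installation succeeds, nvcc should report the toolkit version (assuming the default /usr/local/cuda symlink):

export PATH=/usr/local/cuda/bin:${PATH}
nvcc --version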
Verify the Docker runtime (note the case-insensitive grep, since docker info capitalizes the Runtimes line):
sudo docker info | grep -i runtime
Runtimes: nvidia runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Testing GPU Support
We'll use NVIDIA's deviceQuery test application (part of the CUDA samples shipped with L4T) to check that we can access the GPU in the cluster. First, we'll build a Docker image with the appropriate software, run it directly with Docker, then run it using containerd's ctr tool, and finally on the Kubernetes cluster itself.
Test 1: Running deviceQuery on Docker with GPU support
Create a directory
mkdir test
cd test
Copy the sample files
Copy the CUDA samples (which include deviceQuery) into the working directory where the Docker image will be built:
cp -R /usr/local/cuda/samples .
Create a Dockerfile
FROM nvcr.io/nvidia/l4t-base:r32.5.0
RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples
WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make
CMD ["./deviceQuery"]
sudo docker build -f Dockerfile -t ajeetraina/jetson_devicequery .
pico@pico2:~/test$ sudo docker run --rm --runtime nvidia ajeetraina/jetson_devicequery:latest
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3963 MBytes (4155383808 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
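Before moving on, push the image to Docker Hub so that containerd and Kubernetes can pull it in the next two tests. This assumes you are logged in to a registry account that can host this tag; substitute your own Docker Hub ID if you built the image under a different name:

sudo docker login
sudo docker push ajeetraina/jetson_devicequery:latest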
Test 2: Running deviceQuery on containerd with GPU support
Since K3s uses containerd as its runtime by default, we will use the ctr command-line tool to pull and run the deviceQuery image we pushed, using this script:
#!/bin/bash
IMAGE=ajeetraina/jetson_devicequery:latest
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# Pull the image into containerd's local store, then run it with GPU 0 attached
ctr i pull docker.io/${IMAGE}
ctr run --rm --gpus 0 --tty docker.io/${IMAGE} deviceQuery
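If you are targeting the containerd instance embedded in K3s rather than the system containerd, point ctr at the K3s socket. A hedged variant, assuming K3s's default socket path:

sudo ctr -a /run/k3s/containerd/containerd.sock i pull docker.io/ajeetraina/jetson_devicequery:latest
sudo ctr -a /run/k3s/containerd/containerd.sock run --rm --gpus 0 --tty docker.io/ajeetraina/jetson_devicequery:latest deviceQuery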
Execute the script:
sudo sh usectr.sh
docker.io/ajeetraina/jetson_devicequery:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:dfeaad4046f78871d3852e5d5fb8fa848038c57c34c6554c6c97a00ba120d550: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4438ebff930fb27930d802553e13457783ca8a597e917c030aea07f8ff6645c0: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:b1cdeb9e69c95684d703cf96688ed2b333a235d5b33f0843663ff15f62576bd4: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:bf60857fb4964a3e3ce57a900bbe47cd1683587d6c89ecbce4af63f98df600aa: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:0aac5305d11a81f47ed76d9663a8d80d2963b61c643acfce0515f0be56f5e301: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:37987db6d6570035e25e713f41e665a6d471d25056bb56b4310ed1cb1d79a100: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f0f57d03cad8f8d69b1addf90907b031ccb253b5a9fc5a11db83c51aa311cbfb: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:08c23323368d4fde5347276d543c500e1ff9b712024ca3f85172018e9440d8b0: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:04da93b342eb651d6b94c74a934a3290697573a907fa0a06067b538095601745: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f84ceb6e8887e9b3b454813459ee97c2b9730869dbd37d4cca4051958b7a5a36: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:93752947af53e2a3225e145b359b956df36e20521b5dde0fe6d3fb92fd2a9538: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:b235194751dee33624fc154603f7e25ecdfbb02538fb7d55fa796df9afa95fee: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:905b1329c1d473c79650e33b882d980b3522fb72e58ecd3456c4fb3c4039fe92: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8931d5ba88b488c949f77f990e8f9198b153ceb71afd0369eac9c39beb38f2d6: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:cfb2938be99fb944fe31165bdf44532a5536865ce53b12eb7758d1e2a51ad33e: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:606a67bb8db9a1111022bdc6406442e11c1a66653136c5c777114bf67b61038a: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:2f37138d1c8ac71d9314a0f8996ba69579bbc6ee6a57440557bc7eef486ed292: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9ce7ce1da17c2b8149573d1d73132f61a73083f0cd498eeb7a0da404fd77db14: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:a36863a728ec9221c83c745f40511946dfd63beca0f10c9afcc774ef7a98e420: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:86dd6e5994e2c15f2783d8d543327479ccee7f3b20023dd962fdb9a211071e16: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f5299db1221c515de91f59d84b79f2f839f9c94a5d0cc7fad04134e23ec9b88a: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:15a5811e1a7bf377cbac066b04e0b36b4c1a41ca63eb3c67c17b734577f6beea: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:cb893097de39451407d7167b312ec56eaea80baa041877af8239dbe833fa044b: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 81.4s total: 305.5 (3.8 MiB/s)
unpacking linux/arm64/v8 sha256:dfeaad4046f78871d3852e5d5fb8fa848038c57c34c6554c6c97a00ba120d550...
done
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3963 MBytes (4155383808 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Test 3: Running deviceQuery on the K3s cluster
pico@pico2:~/test$ cat pod_deviceQuery.yaml
apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  containers:
    - name: nvidia
      image: ajeetraina/jetson_devicequery:latest
      command: [ "./deviceQuery" ]
sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply -f ./pod_deviceQuery.yaml
pod/devicequery created
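While the image pulls, you can watch the pod's progress (the -w flag streams status updates):

sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pod devicequery -w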
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl describe pod devicequery
Name: devicequery
Namespace: default
Priority: 0
Node: pico4/192.168.1.163
Start Time: Sun, 13 Jun 2021 09:16:44 -0700
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
nvidia:
Container ID:
Image: ajeetraina/jetson_devicequery:latest
Image ID:
Port: <none>
Host Port: <none>
Command:
./deviceQuery
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mcrmv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-mcrmv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 78s default-scheduler Successfully assigned default/devicequery to pico4
Normal Pulling 77s kubelet Pulling image "ajeetraina/jetson_devicequery:latest"
To pin the pod to a specific Jetson node, create a variant of the manifest with nodeName set:
pico@pico2:~/test$ cat pod_deviceQuery_jetson4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  nodeName: pico4
  containers:
    - name: nvidia
      image: ajeetraina/jetson_devicequery:latest
      command: [ "./deviceQuery" ]
Meanwhile, describing the original pod shows its container crash-looping:
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl describe pod devicequery
Name: devicequery
Namespace: default
Priority: 0
Node: pico4/192.168.1.163
Start Time: Sun, 13 Jun 2021 09:16:44 -0700
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.42.1.3
IPs:
IP: 10.42.1.3
Containers:
nvidia:
Container ID: containerd://fd502d6bfa55e2f80b2d50bc262e6d6543fd8d09e9708bb78ecec0b2e09621c3
Image: ajeetraina/jetson_devicequery:latest
Image ID: docker.io/ajeetraina/jetson_devicequery@sha256:dfeaad4046f78871d3852e5d5fb8fa848038c57c34c6554c6c97a00ba120d550
Port: <none>
Host Port: <none>
Command:
./deviceQuery
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 13 Jun 2021 09:21:50 -0700
Finished: Sun, 13 Jun 2021 09:21:50 -0700
Ready: False
Restart Count: 5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mcrmv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-mcrmv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m51s default-scheduler Successfully assigned default/devicequery to pico4
Normal Pulled 5m45s kubelet Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 2m5.699757621s
Normal Pulled 5m43s kubelet Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 1.000839703s
Normal Pulled 5m29s kubelet Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 967.072951ms
Normal Pulled 4m59s kubelet Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 1.025604394s
Normal Created 4m59s (x4 over 5m45s) kubelet Created container nvidia
Normal Started 4m59s (x4 over 5m45s) kubelet Started container nvidia
Warning BackOff 4m20s (x8 over 5m42s) kubelet Back-off restarting failed container
Normal Pulling 2m47s (x6 over 7m51s) kubelet Pulling image "ajeetraina/jetson_devicequery:latest"
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply -f ./pod_deviceQuery_jetson4.yaml
pod/devicequery configured
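Once the pinned pod runs to completion, its log should contain the same deviceQuery report ending in Result = PASS:

sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl logs devicequery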