Proxmox GPU Passthrough to LXC for Docker: WebODM (with ClusterODM) and Immich

How to pass an NVIDIA GPU on a Proxmox host through to an LXC container, then into Docker, to accelerate WebODM/ClusterODM and Immich.

Remove Old NVIDIA Drivers

  1. List existing NVIDIA or CUDA packages:
    apt list --installed | grep -Ei "nvidia|cuda" | cut -d/ -f1
  2. If drivers are listed, uninstall the current NVIDIA runfile driver:
    sudo ./NVIDIA-Linux-*.run --uninstall
  3. Re-check installed packages:
    apt list --installed | grep -Ei "nvidia|cuda" | cut -d/ -f1
  4. If any packages remain, remove them:
    apt list --installed | grep -Ei "nvidia|cuda" | cut -d/ -f1 | xargs -r apt remove -y
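
The pipeline in steps 1-4 just extracts package names from `apt list` output. A quick illustration on sample lines (the sample output is made up for demonstration, not from a real system):

```shell
# Package-name extraction used in the steps above, run against sample
# `apt list --installed` output (illustrative lines only).
sample='nvidia-driver-570/stable,now 570.133.07-1 amd64 [installed]
cuda-toolkit-12-8/stable,now 12.8.0-1 amd64 [installed]
bash/stable,now 5.2.15-2 amd64 [installed]'

# grep -Ei matches case-insensitively; cut keeps the name before the first "/"
pkgs=$(printf '%s\n' "$sample" | grep -Ei "nvidia|cuda" | cut -d/ -f1)
printf '%s\n' "$pkgs"
```

Only the matching package names survive, ready to be fed to xargs.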

Setting Up GPU Passthrough on Proxmox Server

  1. Install required packages:
    apt install pve-headers dkms pciutils
  2. Edit /etc/default/grub and update (on AMD CPUs, use amd_iommu=on instead):
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
  3. Update grub:
    update-grub2
  4. Blacklist default GPU drivers:
    echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
  5. Add to /etc/modules (on Proxmox 8 / kernel 6.2 and later, vfio_virqfd has been merged into vfio and can be omitted):
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd
  6. Update initramfs:
    update-initramfs -u -k all
  7. Reboot the Proxmox server.
  8. Download the NVIDIA driver runfile from the NVIDIA driver download page. This document uses the official runfile rather than the distro package, which can break with apt updates.
    Example:

    NVIDIA-Linux-x86_64-570.133.07.run
  9. Set the installer as executable:
    chmod +x NVIDIA-Linux-*.run
  10. Run the installer (optionally add --dkms so the kernel module is rebuilt automatically after kernel updates):
    ./NVIDIA-Linux-*.run
  11. Reboot the Proxmox server again.
  12. Check installation:
    nvidia-smi
  13. Check NVIDIA device IDs:
    ls -al /dev/nvidia*
    Example output:

    crw-rw-rw- 1 root root 195,   0 /dev/nvidia0
    crw-rw-rw- 1 root root 195, 255 /dev/nvidiactl
    crw-rw-rw- 1 root root 509,   0 /dev/nvidia-uvm
    crw-rw-rw- 1 root root 509,   1 /dev/nvidia-uvm-tools

    Note the major device numbers (the first number before the comma, e.g. 195 and 509 above). They can differ between systems, so use the numbers from your own output in the next step.

  14. Edit the LXC config file at /etc/pve/lxc/<ID>.conf, matching the devices.allow majors to your own ls output:
    lxc.cgroup2.devices.allow: c 195:* rwm
    lxc.cgroup2.devices.allow: c 235:* rwm
    lxc.cgroup2.devices.allow: c 255:* rwm
    lxc.cgroup2.devices.allow: c 509:* rwm
    lxc.mount.entry: /dev/nvidia0 /dev/nvidia0 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidiactl /dev/nvidiactl none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-modeset /dev/nvidia-modeset none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm /dev/nvidia-uvm none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm-tools /dev/nvidia-uvm-tools none bind,optional,create=file
  15. Restart the container so the new config takes effect, then check nvidia-smi on the host again:
    nvidia-smi
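
The cgroup allow rules in step 14 can be derived mechanically from the `ls -al /dev/nvidia*` output in step 13. A sketch that pulls out the unique major numbers (run here against the sample output above; on the host, pipe in real `ls -l /dev/nvidia*` output instead):

```shell
# Extract unique character-device major numbers from `ls -l /dev/nvidia*`
# output and print matching lxc.cgroup2.devices.allow lines.
# Sample output used here; timestamps are placeholders.
sample='crw-rw-rw- 1 root root 195,   0 Jan  1 00:00 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan  1 00:00 /dev/nvidiactl
crw-rw-rw- 1 root root 509,   0 Jan  1 00:00 /dev/nvidia-uvm
crw-rw-rw- 1 root root 509,   1 Jan  1 00:00 /dev/nvidia-uvm-tools'

rules=$(printf '%s\n' "$sample" |
  # field 5 is the major number followed by a comma; strip the comma
  awk '{gsub(",", "", $5); print $5}' |
  sort -un |
  while read -r major; do
    echo "lxc.cgroup2.devices.allow: c ${major}:* rwm"
  done)
printf '%s\n' "$rules"
```

The resulting lines can be pasted straight into /etc/pve/lxc/<ID>.conf.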

Set Up LXC Container for Docker

  1. Inside the LXC container, install tools:
    apt install pciutils
  2. Inside the container, install the same NVIDIA driver version as on the host, skipping the kernel modules (the container shares the host's kernel):
    ./NVIDIA-Linux-*.run --no-kernel-modules
  3. Install APT prerequisites:
    apt update
    apt install -y apt-transport-https ca-certificates curl gnupg lsb-release
  4. Add NVIDIA APT repo:
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  5. Update package list:
    apt update
  6. Install NVIDIA Container Toolkit:
    apt install -y nvidia-container-toolkit
  7. Configure Docker to use NVIDIA runtime:
    sudo nvidia-ctk runtime configure --runtime=docker
  8. Restart Docker:
    sudo systemctl restart docker
  9. Verify GPU inside container:
    nvidia-smi
    Example output:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Quadro P600                    On  |   00000000:01:00.0 Off |                  N/A |
    |  0%   37C    P8            N/A  /  N/A  |       3MiB /   2048MiB |      0%      Default |
    +-----------------------------------------+------------------------+----------------------+
    | Processes:                                                                              |
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
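
The user-space driver installed in the container (step 2) must match the host's kernel module version exactly, or nvidia-smi fails with a version-mismatch error. One way to compare is to parse the banner line of each nvidia-smi output (a sketch, using the sample line above; the field positions assume the standard banner layout):

```shell
# Pull driver and CUDA versions out of an nvidia-smi banner line so the
# host and container values can be compared (sample line from the output above).
line='| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |'

driver=$(printf '%s\n' "$line" | grep -oE 'Driver Version: [0-9.]+' | awk '{print $3}')
cuda=$(printf '%s\n' "$line" | grep -oE 'CUDA Version: [0-9.]+' | awk '{print $3}')
echo "driver=${driver} cuda=${cuda}"
```

On a live system, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the driver version directly, which is easier to compare in scripts.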

Reference: NVIDIA Container Toolkit Installation Guide


WebODM + ClusterODM (Docker Setup)

Run ClusterODM (e.g. on the master node):

docker run -d --rm -ti -p 3000:3000 -p 10000:10000 -p 8080:8080 opendronemap/clusterodm

Run a GPU NodeODM worker (note that docker run options such as --gpus and --restart must come before the image name; anything after the image is passed as arguments to the container):

docker run -d -p 3001:3000 --gpus all --restart always opendronemap/nodeodm:gpu

Start WebODM without its default local node:

./webodm.sh start --default-nodes 0 --detached --port 80

Connect the node in the WebODM UI (10.0.1.131 is the example host IP; substitute your own):
Go to: http://10.0.1.131:10000
Add Node: 10.0.1.131:3001
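
Before adding the worker in the UI, it can help to confirm NodeODM is actually answering. NodeODM exposes a GET /info endpoint; a small polling helper (a sketch; the host/port in the example call are the values from above, and the retry count and interval are arbitrary choices):

```shell
# Poll a NodeODM worker's /info endpoint until it responds, or give up
# after max_tries attempts (2 seconds apart).
wait_for_node() {
  host="$1"; port="$2"; max_tries="${3:-30}"; tries=0
  until curl -fsS "http://${host}:${port}/info" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "node ${host}:${port} not responding"
      return 1
    fi
    sleep 2
  done
  echo "node ${host}:${port} is up"
}

# Example: wait_for_node 10.0.1.131 3001
```

Once the helper reports the node is up, adding it in the UI should succeed immediately.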


Configure Immich for GPU

(Adapted from: Immich Docs)

  1. Download the latest hwaccel.ml.yml file and place it in the same folder as docker-compose.yml.
  2. In docker-compose.yml, under the immich-machine-learning service, uncomment the extends section and change cpu to the appropriate backend.
  3. Still under immich-machine-learning, append one of [armnn, cuda, rocm, openvino, rknn] to the image tag (for NVIDIA, the cuda suffix).
  4. Redeploy the immich-machine-learning container with the updated settings.
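
Steps 2-4 correspond to a docker-compose.yml fragment roughly like the following (a sketch for the CUDA backend, following the structure of the upstream compose file; verify the tag and service names against the current Immich docs):

```yaml
services:
  immich-machine-learning:
    # step 3: append the backend suffix (-cuda here) to the image tag
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    # step 2: uncomment extends and change the service from "cpu" to the backend in use
    extends:
      file: hwaccel.ml.yml
      service: cuda
```

After editing, `docker compose up -d immich-machine-learning` redeploys the service with GPU acceleration.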