Proxmox GPU Passthrough to LXC for Docker: WebODM (with ClusterODM) and Immich

How to pass an NVIDIA GPU on a Proxmox host through to an LXC container, then into Docker, to accelerate WebODM/ClusterODM and Immich.

Remove Old NVIDIA Drivers

  1. List existing NVIDIA or CUDA packages:
    apt list --installed | grep -Ei "nvidia|cuda" | cut -d/ -f1
  2. If drivers are listed, uninstall the current NVIDIA runfile driver:
    sudo ./NVIDIA-Linux-*.run --uninstall
  3. Re-check installed packages:
    apt list --installed | grep -Ei "nvidia|cuda" | cut -d/ -f1
  4. If any packages remain, remove them:
    apt list --installed | grep -Ei "nvidia|cuda" | cut -d/ -f1 | xargs -r apt remove -y
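
The pipeline in steps 1-4 just extracts package names from `apt list` output. A quick illustration on sample lines (the sample output is made up for demonstration, not from a real system):

```shell
# Package-name extraction used in the steps above, run against sample
# `apt list --installed` output (illustrative lines only).
sample='nvidia-driver-570/stable,now 570.133.07-1 amd64 [installed]
cuda-toolkit-12-8/stable,now 12.8.0-1 amd64 [installed]
bash/stable,now 5.2.15-2 amd64 [installed]'

# grep -Ei matches case-insensitively; cut keeps the name before the first "/"
pkgs=$(printf '%s\n' "$sample" | grep -Ei "nvidia|cuda" | cut -d/ -f1)
printf '%s\n' "$pkgs"
```

Only the matching package names survive, ready to be fed to xargs.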

Setting Up GPU Passthrough on Proxmox Server

  1. Install required packages:
    apt install pve-headers dkms pciutils
  2. Edit /etc/default/grub and update (on AMD CPUs, use amd_iommu=on instead):
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
  3. Update grub:
    update-grub2
  4. Blacklist default GPU drivers:
    echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
  5. Add to /etc/modules (on Proxmox 8 / kernel 6.2 and later, vfio_virqfd has been merged into vfio and can be omitted):
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd
  6. Update initramfs:
    update-initramfs -u -k all
  7. Reboot the Proxmox server.
  8. Download the NVIDIA driver runfile from the NVIDIA driver download page. This document uses the official runfile rather than the distro package, which can break with apt updates.
    Example:

    NVIDIA-Linux-x86_64-570.133.07.run
  9. Set the installer as executable:
    chmod +x NVIDIA-Linux-*.run
  10. Run the installer (optionally add --dkms so the kernel module is rebuilt automatically after kernel updates):
    ./NVIDIA-Linux-*.run
  11. Reboot the Proxmox server again.
  12. Check installation:
    nvidia-smi
  13. Check NVIDIA device IDs:
    ls -al /dev/nvidia*
    Example output:

    crw-rw-rw- 1 root root 195,   0 /dev/nvidia0
    crw-rw-rw- 1 root root 195, 255 /dev/nvidiactl
    crw-rw-rw- 1 root root 509,   0 /dev/nvidia-uvm
    crw-rw-rw- 1 root root 509,   1 /dev/nvidia-uvm-tools

    Note the major device numbers (the first number before the comma, e.g. 195 and 509 above). They can differ between systems, so use the numbers from your own output in the next step.

  14. Edit the LXC config file at /etc/pve/lxc/<ID>.conf, matching the devices.allow majors to your own ls output:
    lxc.cgroup2.devices.allow: c 195:* rwm
    lxc.cgroup2.devices.allow: c 235:* rwm
    lxc.cgroup2.devices.allow: c 255:* rwm
    lxc.cgroup2.devices.allow: c 509:* rwm
    lxc.mount.entry: /dev/nvidia0 /dev/nvidia0 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidiactl /dev/nvidiactl none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-modeset /dev/nvidia-modeset none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm /dev/nvidia-uvm none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm-tools /dev/nvidia-uvm-tools none bind,optional,create=file
  15. Restart the container so the new config takes effect, then check nvidia-smi on the host again:
    nvidia-smi
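
The cgroup allow rules in step 14 can be derived mechanically from the `ls -al /dev/nvidia*` output in step 13. A sketch that pulls out the unique major numbers (run here against the sample output above; on the host, pipe in real `ls -l /dev/nvidia*` output instead):

```shell
# Extract unique character-device major numbers from `ls -l /dev/nvidia*`
# output and print matching lxc.cgroup2.devices.allow lines.
# Sample output used here; timestamps are placeholders.
sample='crw-rw-rw- 1 root root 195,   0 Jan  1 00:00 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan  1 00:00 /dev/nvidiactl
crw-rw-rw- 1 root root 509,   0 Jan  1 00:00 /dev/nvidia-uvm
crw-rw-rw- 1 root root 509,   1 Jan  1 00:00 /dev/nvidia-uvm-tools'

rules=$(printf '%s\n' "$sample" |
  # field 5 is the major number followed by a comma; strip the comma
  awk '{gsub(",", "", $5); print $5}' |
  sort -un |
  while read -r major; do
    echo "lxc.cgroup2.devices.allow: c ${major}:* rwm"
  done)
printf '%s\n' "$rules"
```

The resulting lines can be pasted straight into /etc/pve/lxc/<ID>.conf.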

Set Up LXC Container for Docker

  1. Inside the LXC container, install tools:
    apt install pciutils
  2. Inside the container, install the same NVIDIA driver version as on the host, skipping the kernel modules (the container shares the host's kernel):
    ./NVIDIA-Linux-*.run --no-kernel-modules
  3. Install APT prerequisites:
    apt update
    apt install -y apt-transport-https ca-certificates curl gnupg lsb-release
  4. Add NVIDIA APT repo:
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  5. Update package list:
    apt update
  6. Install NVIDIA Container Toolkit:
    apt install -y nvidia-container-toolkit
  7. Configure Docker to use NVIDIA runtime:
    sudo nvidia-ctk runtime configure --runtime=docker
  8. Restart Docker:
    sudo systemctl restart docker
  9. Verify GPU inside container:
    nvidia-smi
    Example output:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Quadro P600                    On  |   00000000:01:00.0 Off |                  N/A |
    |  0%   37C    P8            N/A  /  N/A  |       3MiB /   2048MiB |      0%      Default |
    +-----------------------------------------+------------------------+----------------------+
    | Processes:                                                                              |
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
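
The user-space driver installed in the container (step 2) must match the host's kernel module version exactly, or nvidia-smi fails with a version-mismatch error. One way to compare is to parse the banner line of each nvidia-smi output (a sketch, using the sample line above; the field positions assume the standard banner layout):

```shell
# Pull driver and CUDA versions out of an nvidia-smi banner line so the
# host and container values can be compared (sample line from the output above).
line='| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |'

driver=$(printf '%s\n' "$line" | grep -oE 'Driver Version: [0-9.]+' | awk '{print $3}')
cuda=$(printf '%s\n' "$line" | grep -oE 'CUDA Version: [0-9.]+' | awk '{print $3}')
echo "driver=${driver} cuda=${cuda}"
```

On a live system, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the driver version directly, which is easier to compare in scripts.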

Reference: NVIDIA Container Toolkit Installation Guide


WebODM + ClusterODM (Docker Setup)

Run ClusterODM (e.g. on the master node):

docker run -d --rm -ti -p 3000:3000 -p 10000:10000 -p 8080:8080 opendronemap/clusterodm

Run a GPU NodeODM worker (note that docker run options such as --gpus and --restart must come before the image name; anything after the image is passed as arguments to the container):

docker run -d -p 3001:3000 --gpus all --restart always opendronemap/nodeodm:gpu

Start WebODM without its default local node:

./webodm.sh start --default-nodes 0 --detached --port 80

Connect the node in the WebODM UI (10.0.1.131 is the example host IP; substitute your own):
Go to: http://10.0.1.131:10000
Add Node: 10.0.1.131:3001
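
Before adding the worker in the UI, it can help to confirm NodeODM is actually answering. NodeODM exposes a GET /info endpoint; a small polling helper (a sketch; the host/port in the example call are the values from above, and the retry count and interval are arbitrary choices):

```shell
# Poll a NodeODM worker's /info endpoint until it responds, or give up
# after max_tries attempts (2 seconds apart).
wait_for_node() {
  host="$1"; port="$2"; max_tries="${3:-30}"; tries=0
  until curl -fsS "http://${host}:${port}/info" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "node ${host}:${port} not responding"
      return 1
    fi
    sleep 2
  done
  echo "node ${host}:${port} is up"
}

# Example: wait_for_node 10.0.1.131 3001
```

Once the helper reports the node is up, adding it in the UI should succeed immediately.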


Configure Immich for GPU

(Adapted from: Immich Docs)

  1. Download the latest hwaccel.ml.yml file and place it in the same folder as docker-compose.yml.
  2. In docker-compose.yml, under the immich-machine-learning service, uncomment the extends section and change cpu to the appropriate backend.
  3. Still under immich-machine-learning, append one of [armnn, cuda, rocm, openvino, rknn] to the image tag (for NVIDIA, the cuda suffix).
  4. Redeploy the immich-machine-learning container with the updated settings.
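
Steps 2-4 correspond to a docker-compose.yml fragment roughly like the following (a sketch for the CUDA backend, following the structure of the upstream compose file; verify the tag and service names against the current Immich docs):

```yaml
services:
  immich-machine-learning:
    # step 3: append the backend suffix (-cuda here) to the image tag
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    # step 2: uncomment extends and change the service from "cpu" to the backend in use
    extends:
      file: hwaccel.ml.yml
      service: cuda
```

After editing, `docker compose up -d immich-machine-learning` redeploys the service with GPU acceleration.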