Version v1.6 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.

NVIDIA Fabric Manager

In this guide we’ll follow the procedure to enable NVIDIA Fabric Manager.

NVIDIA GPUs that have nvlink support (for eg: A100) will need the nvidia-fabricmanager system extension also enabled in addition to the NVIDIA drivers. For more information on Fabric Manager refer https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html

The published versions of the NVIDIA fabricmanager system extensions is available here

The nvidia-fabricmanager extension version has to match with the NVIDIA driver version in use.

Enabling the NVIDIA fabricmanager system extension

Create the boot assets or a custom installer and perform a machine upgrade which include the following system extensions:

ghcr.io/siderolabs/nvidia-open-gpu-kernel-modules:535.129.03-v1.6.7
ghcr.io/siderolabs/nvidia-container-toolkit:535.129.03-v1.13.5
ghcr.io/siderolabs/nvidia-fabricmanager:535.129.03

Patch the machine configuration to load the required modules:

machine:
  kernel:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
  sysctls:
    net.core.bpf_jit_harden: 1