@573
Last active April 17, 2024 07:06
WSL GPU tip: just test whether an NVIDIA GPU inside rootless Docker is working on NixOS

UPDATE: See nix-community/NixOS-WSL#454

  programs.nix-ld = {
    enable = true;
  };
  wsl = {
    enable = true;
    defaultUser = "${username}";
    nativeSystemd = true;
    useWindowsDriver = true;
  };
  hardware.opengl = {
    enable = true;
    driSupport = true;
    setLdLibraryPath = true;

    extraPackages = with pkgs; [
      mesa.drivers
      libvdpau-va-gl
      (libedit.overrideAttrs (attrs: {
        postInstall = (attrs.postInstall or "") + ''
          ln -s $out/lib/libedit.so $out/lib/libedit.so.2
        '';
      }))
    ];
  };
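For the `--gpus all` path, Docker also needs the NVIDIA runtime registered, which the module above does not do. A minimal sketch of what that could look like on NixOS; the option names differ between releases (`virtualisation.docker.enableNvidia` is the older route, `hardware.nvidia-container-toolkit.enable` the newer one), so verify against the nixpkgs revision in use:

```nix
{
  # Sketch: make the NVIDIA container runtime available to Docker.
  virtualisation.docker.enable = true;

  # Older option, deprecated on recent nixpkgs:
  virtualisation.docker.enableNvidia = true;

  # Newer alternative (CDI-based), on recent nixpkgs:
  # hardware.nvidia-container-toolkit.enable = true;
}
```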
MESA_D3D12_DEFAULT_ADAPTER_NAME=Nvidia strace -o strace.log glxinfo -B

UPDATE: NixOS/nixpkgs#278969 is merged, retest https://discourse.nixos.org/t/docker-rootless-with-nvidia-support/37069/17 or https://discourse.nixos.org/t/using-nvidia-container-runtime-with-containerd-on-nixos/27865/30 and https://discourse.nixos.org/t/broken-nvidia-gpu-acceleration-in-docker-containers/40459

https://www.svlsimulator.com/docs/installation-guide/running-linux-gpu-applications-on-windows/

docker run --rm -it --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

Just for later: docker run -e LD_LIBRARY_PATH=/usr/lib64/ ...

# cuda-docker/flake.nix
{
  description = "gpu docker creating";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = {
    self,
    nixpkgs,
    flake-utils,
  }:
    flake-utils.lib.eachDefaultSystem (system: let
      pkgs = import nixpkgs {inherit system;};
    in
      with pkgs; {
        packages.default = dockerTools.buildImage {
          name = "cuda-env-docker";
          tag = "latest";
          copyToRoot = pkgs.buildEnv {
            name = "image-root";
            pathsToLink = ["/bin"];
            paths = [
              pkgs.cudatoolkit
            ];
          };
          config = {
            Env = [
              "CUDA_PATH=${pkgs.cudatoolkit}"
              "PYTHONFAULTHANDLER=1"
              "PYTHONBREAKPOINT=ipdb.set_trace"
              "LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib64"
              "PATH=/bin:$PATH"
            ];
            Cmd = ["/bin/nvidia-smi"];
          };
        };
      });
}
NIXPKGS_ALLOW_UNFREE=1 nix build --impure --no-link
ls -la result
docker load < result
docker run -it --rm --gpus all cuda-env-docker:latest
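The `NIXPKGS_ALLOW_UNFREE=1 ... --impure` dance is only needed because cudatoolkit is unfree. A sketch of setting the flag at import time inside the flake instead, so a plain `nix build` suffices:

```nix
# Sketch: allow unfree packages (cudatoolkit) when importing nixpkgs,
# replacing the `pkgs = import nixpkgs { ... }` line in the flake above.
pkgs = import nixpkgs {
  inherit system;
  config.allowUnfree = true;
};
```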

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Giving up for now.

With:

$ env | grep LD_LIBRARY_PATH
NIX_LD_LIBRARY_PATH=/run/current-system/sw/share/nix-ld/lib
LD_LIBRARY_PATH=/mnt/c/Windows/System32/lxss/lib:/etc/sane-libs

and following the config at https://github.com/573/nix-config-1/tree/74cbd2ca9a3fb4a485484f21905e5456bd71f45c I was at least able to run the following without errors:

$ nvidia-container-cli -k -d /dev/tty info

NB: $ export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH should do as well; I guess it is the cudatoolkit version I installed that by that time finally matched the Windows driver (https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=cudatoolkit). docker --gpus all is still not working, though.
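Instead of exporting LD_LIBRARY_PATH per shell, the WSL driver path could be set system-wide in the NixOS configuration. A sketch, not tested here, and it will be combined with whatever other modules set for the variable:

```nix
# Sketch: make the WSL driver libraries visible without a manual export.
environment.sessionVariables.LD_LIBRARY_PATH = [ "/usr/lib/wsl/lib" ];
```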

On the glxgears image I created, docker run -it -v /tmp/.X11-unix:/tmp/.X11-unix -v /mnt/wslg:/mnt/wslg -v /mnt/c/Windows/System32/lxss/lib:/usr/lib/wsl/lib --device=/dev/dxg -e DISPLAY=$DISPLAY -e WAYLAND_DISPLAY=$WAYLAND_DISPLAY -e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR -e PULSE_SERVER=$PULSE_SERVER --gpus all glxgears still gives:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown. ERRO[0000] error waiting for container:

(see microsoft/WSL#9099 (comment))

Brainstorming: do I need to override the kernelPackage / driver inside WSL to match the Windows side (545.29.02-6.6.13, nixos.org search https://search.nixos.org/packages?channel=23.11&from=50&size=50&sort=relevance&type=packages&query=linuxKernel.packages.linux_*nvidia; compare /nix/store/2v117slwx40lhjjf9yrx8k13dz5y0f4y-nvidia-x11-545.29.02-6.1.74-bin in the log below against Driver Version: 551.23 in the nvidia-smi log below)? Also, what about https://search.nixos.org/packages?channel=23.11&show=cudaPackages.nvidia_driver&from=250&size=50&sort=relevance&type=packages&query=nvidia ?
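On the driver-matching idea: on bare-metal NixOS the userspace driver version can be pinned via hardware.nvidia.package, sketched below. Note that inside WSL the kernel module is supplied by Windows, so it is unclear whether this helps there at all; treat it purely as what version pinning would look like:

```nix
# Sketch: pin the NVIDIA driver package on (non-WSL) NixOS.
# `production` could be swapped for a specific nvidiaPackages attribute
# to chase a particular version.
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
```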

Logs so far:

$ LD_LIBRARY_PATH=/usr/lib/wsl/lib nvidia-smi
Thu Feb  1 13:33:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce 940MX           On  |   00000000:02:00.0 Off |                  N/A |
| N/A    0C    P8             N/A /  200W |      56MiB /   2048MiB |     15%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
$ LD_LIBRARY_PATH=/usr/lib/wsl/lib nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0201 12:33:08.626976 75084 nvc.c:376] initializing library context (version=1.9.0, build=v1.9.0)
I0201 12:33:08.627013 75084 nvc.c:350] using root /
I0201 12:33:08.627020 75084 nvc.c:351] using ldcache /etc/ld.so.cache
I0201 12:33:08.627027 75084 nvc.c:352] using unprivileged user 1000:100
I0201 12:33:08.627042 75084 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0201 12:33:08.638109 75084 dxcore.c:227] Creating a new WDDM Adapter for hAdapter:40000000 luid:2457037
I0201 12:33:08.648218 75084 dxcore.c:268] Adding new adapter via dxcore hAdapter:40000000 luid:2457037 wddm version:2700
I0201 12:33:08.648281 75084 dxcore.c:326] dxcore layer initialized successfully
W0201 12:33:08.649878 75084 nvc.c:401] skipping kernel modules load on WSL
I0201 12:33:08.650083 75085 rpc.c:71] starting driver rpc service
I0201 12:33:08.759571 75086 rpc.c:71] starting nvcgo rpc service
I0201 12:33:08.760437 75084 nvc_info.c:761] requesting driver information with ''
W0201 12:33:08.770810 75084 nvc_info.c:394] missing library libnvidia-ml.so
W0201 12:33:08.770848 75084 nvc_info.c:394] missing library libnvidia-cfg.so
W0201 12:33:08.770857 75084 nvc_info.c:394] missing library libnvidia-nscq.so
W0201 12:33:08.770862 75084 nvc_info.c:394] missing library libcuda.so
W0201 12:33:08.770866 75084 nvc_info.c:394] missing library libnvidia-opencl.so
W0201 12:33:08.770870 75084 nvc_info.c:394] missing library libnvidia-ptxjitcompiler.so
W0201 12:33:08.770874 75084 nvc_info.c:394] missing library libnvidia-fatbinaryloader.so
W0201 12:33:08.770897 75084 nvc_info.c:394] missing library libnvidia-allocator.so
W0201 12:33:08.770902 75084 nvc_info.c:394] missing library libnvidia-compiler.so
W0201 12:33:08.770918 75084 nvc_info.c:394] missing library libnvidia-pkcs11.so
W0201 12:33:08.770945 75084 nvc_info.c:394] missing library libnvidia-ngx.so
W0201 12:33:08.770961 75084 nvc_info.c:394] missing library libvdpau_nvidia.so
W0201 12:33:08.770970 75084 nvc_info.c:394] missing library libnvidia-encode.so
W0201 12:33:08.770976 75084 nvc_info.c:394] missing library libnvidia-opticalflow.so
W0201 12:33:08.770982 75084 nvc_info.c:394] missing library libnvcuvid.so
W0201 12:33:08.770986 75084 nvc_info.c:394] missing library libnvidia-eglcore.so
W0201 12:33:08.770991 75084 nvc_info.c:394] missing library libnvidia-glcore.so
W0201 12:33:08.770998 75084 nvc_info.c:394] missing library libnvidia-tls.so
W0201 12:33:08.771017 75084 nvc_info.c:394] missing library libnvidia-glsi.so
W0201 12:33:08.771021 75084 nvc_info.c:394] missing library libnvidia-fbc.so
W0201 12:33:08.771046 75084 nvc_info.c:394] missing library libnvidia-ifr.so
W0201 12:33:08.771064 75084 nvc_info.c:394] missing library libnvidia-rtcore.so
W0201 12:33:08.771071 75084 nvc_info.c:394] missing library libnvoptix.so
W0201 12:33:08.771094 75084 nvc_info.c:394] missing library libGLX_nvidia.so
W0201 12:33:08.771099 75084 nvc_info.c:394] missing library libEGL_nvidia.so
W0201 12:33:08.771105 75084 nvc_info.c:394] missing library libGLESv2_nvidia.so
W0201 12:33:08.771109 75084 nvc_info.c:394] missing library libGLESv1_CM_nvidia.so
W0201 12:33:08.771129 75084 nvc_info.c:394] missing library libnvidia-glvkspirv.so
W0201 12:33:08.771135 75084 nvc_info.c:394] missing library libnvidia-cbl.so
W0201 12:33:08.771139 75084 nvc_info.c:394] missing library libdxcore.so
W0201 12:33:08.771144 75084 nvc_info.c:398] missing compat32 library libnvidia-ml.so
W0201 12:33:08.771149 75084 nvc_info.c:398] missing compat32 library libnvidia-cfg.so
W0201 12:33:08.771173 75084 nvc_info.c:398] missing compat32 library libnvidia-nscq.so
W0201 12:33:08.771180 75084 nvc_info.c:398] missing compat32 library libcuda.so
W0201 12:33:08.771185 75084 nvc_info.c:398] missing compat32 library libnvidia-opencl.so
W0201 12:33:08.771190 75084 nvc_info.c:398] missing compat32 library libnvidia-ptxjitcompiler.so
W0201 12:33:08.771194 75084 nvc_info.c:398] missing compat32 library libnvidia-fatbinaryloader.so
W0201 12:33:08.771200 75084 nvc_info.c:398] missing compat32 library libnvidia-allocator.so
W0201 12:33:08.771205 75084 nvc_info.c:398] missing compat32 library libnvidia-compiler.so
W0201 12:33:08.771209 75084 nvc_info.c:398] missing compat32 library libnvidia-pkcs11.so
W0201 12:33:08.771215 75084 nvc_info.c:398] missing compat32 library libnvidia-ngx.so
W0201 12:33:08.771220 75084 nvc_info.c:398] missing compat32 library libvdpau_nvidia.so
W0201 12:33:08.771225 75084 nvc_info.c:398] missing compat32 library libnvidia-encode.so
W0201 12:33:08.771231 75084 nvc_info.c:398] missing compat32 library libnvidia-opticalflow.so
W0201 12:33:08.771236 75084 nvc_info.c:398] missing compat32 library libnvcuvid.so
W0201 12:33:08.771241 75084 nvc_info.c:398] missing compat32 library libnvidia-eglcore.so
W0201 12:33:08.771247 75084 nvc_info.c:398] missing compat32 library libnvidia-glcore.so
W0201 12:33:08.771252 75084 nvc_info.c:398] missing compat32 library libnvidia-tls.so
W0201 12:33:08.771258 75084 nvc_info.c:398] missing compat32 library libnvidia-glsi.so
W0201 12:33:08.771262 75084 nvc_info.c:398] missing compat32 library libnvidia-fbc.so
W0201 12:33:08.771268 75084 nvc_info.c:398] missing compat32 library libnvidia-ifr.so
W0201 12:33:08.771287 75084 nvc_info.c:398] missing compat32 library libnvidia-rtcore.so
W0201 12:33:08.771292 75084 nvc_info.c:398] missing compat32 library libnvoptix.so
W0201 12:33:08.771316 75084 nvc_info.c:398] missing compat32 library libGLX_nvidia.so
W0201 12:33:08.771322 75084 nvc_info.c:398] missing compat32 library libEGL_nvidia.so
W0201 12:33:08.771326 75084 nvc_info.c:398] missing compat32 library libGLESv2_nvidia.so
W0201 12:33:08.771331 75084 nvc_info.c:398] missing compat32 library libGLESv1_CM_nvidia.so
W0201 12:33:08.771337 75084 nvc_info.c:398] missing compat32 library libnvidia-glvkspirv.so
W0201 12:33:08.771342 75084 nvc_info.c:398] missing compat32 library libnvidia-cbl.so
W0201 12:33:08.771347 75084 nvc_info.c:398] missing compat32 library libdxcore.so
I0201 12:33:08.774404 75084 nvc_info.c:274] selecting /usr/lib/wsl/drivers/nvltig.inf_amd64_60796aa09a3d57d6/nvidia-smi
I0201 12:33:08.777698 75084 nvc_info.c:294] selecting /nix/store/2v117slwx40lhjjf9yrx8k13dz5y0f4y-nvidia-x11-545.29.02-6.1.74-bin/origBin/nvidia-debugdump
I0201 12:33:08.777890 75084 nvc_info.c:294] selecting /nix/store/2v117slwx40lhjjf9yrx8k13dz5y0f4y-nvidia-x11-545.29.02-6.1.74-bin/origBin/nvidia-cuda-mps-control
I0201 12:33:08.777961 75084 nvc_info.c:294] selecting /nix/store/2v117slwx40lhjjf9yrx8k13dz5y0f4y-nvidia-x11-545.29.02-6.1.74-bin/origBin/nvidia-cuda-mps-server
I0201 12:33:08.778051 75084 nvc_info.c:294] selecting /nix/store/1vpvyqk554niba666l220lxl6yc46zbw-nvidia-persistenced-545.29.02/origBin/nvidia-persistenced
W0201 12:33:08.778126 75084 nvc_info.c:420] missing binary nv-fabricmanager
I0201 12:33:08.778149 75084 nvc_info.c:436] skipping path lookup for dxcore
I0201 12:33:08.778173 75084 nvc_info.c:524] listing device /dev/dxg
W0201 12:33:08.778219 75084 nvc_info.c:344] missing ipc path /var/run/nvidia-persistenced/socket
W0201 12:33:08.778264 75084 nvc_info.c:344] missing ipc path /var/run/nvidia-fabricmanager/socket
W0201 12:33:08.778302 75084 nvc_info.c:344] missing ipc path /tmp/nvidia-mps
I0201 12:33:08.778326 75084 nvc_info.c:817] requesting device information with ''
I0201 12:33:08.813269 75084 nvc_info.c:689] listing dxcore adapter 0 (GPU-15f44737-27a6-acec-0167-d15a9e435d7b at 00000000:02:00.0)
NVRM version:   551.23
CUDA version:   12.3

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce 940MX
Brand:          GeForce
GPU UUID:       GPU-15f44737-27a6-acec-0167-d15a9e435d7b
Bus Location:   00000000:02:00.0
Architecture:   5.0
I0201 12:33:08.813365 75084 nvc.c:430] shutting down library context
I0201 12:33:08.813412 75086 rpc.c:95] terminating nvcgo rpc service
I0201 12:33:08.813943 75084 rpc.c:132] nvcgo rpc service terminated successfully
I0201 12:33:08.817439 75085 rpc.c:95] terminating driver rpc service
I0201 12:33:08.820885 75084 rpc.c:132] driver rpc service terminated successfully

related:
