This is a follow-up to https://hackweek.suse.com/projects/architecting-a-machine-learning-project-with-suse-caasp.

In the last Hack Week I learned that the missing piece for running machine learning workflows on top of SUSE CaaSP is having libnvidia-container and nvidia-container-runtime-hook packaged.

Since then, NVIDIA has added Leap 15 builds to libnvidia-container and nvidia-container-runtime.

However, neither build has been released in the libnvidia-container repo or the nvidia-container-runtime repo.

This project is about packaging those two projects in the openSUSE Build Service for openSUSE Leap 15.1.

Looking for hackers with the skills:

nvidia machinelearning containers

This project is part of:

Hack Week 19

Activity

  • 9 months ago: drdavis liked Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: afesta liked Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: jordimassaguerpla added keyword "nvidia" to Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: jordimassaguerpla added keyword "machinelearning" to Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: jordimassaguerpla added keyword "containers" to Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: a_faerber liked Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: jordimassaguerpla started Packaging libnvidia-containers and nvidia-container-runtime-hook
  • 9 months ago: jordimassaguerpla originated Packaging libnvidia-containers and nvidia-container-runtime-hook

  • Comments

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      First package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/libnvidia-container

      And a pull request upstream: https://github.com/NVIDIA/libnvidia-container/pull/77

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      Second package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/nvidia-container-runtime-toolkit

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      Proof that this works:

      On a workstation with a Quadro K2000 running SLE 15 SP1:

      Installing the NVIDIA graphics driver kernel module

      zypper ar https://download.nvidia.com/suse/sle15sp1/ nvidia
      zypper ref
      zypper install nvidia-gfxG05-kmp-default
      modprobe nvidia
      lsmod | grep nvidia
      

      Expected output:

      nvidia_drm             49152  0
      nvidia_modeset       1114112  1 nvidia_drm
      drm_kms_helper        204800  1 nvidia_drm
      drm                   536576  3 nvidia_drm,drm_kms_helper
      nvidia_uvm           1036288  0
      nvidia              20414464  2 nvidia_modeset,nvidia_uvm
      ipmi_msghandler       110592  2 nvidia,ipmi_devintf
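
A minimal sketch for checking that output automatically rather than by eye. The function name `missing_nvidia_modules` is hypothetical, and the list of required modules is an assumption taken from the expected output above; adjust it if your driver version loads a different set.

```shell
# Read `lsmod` output on stdin and print each required module that is absent.
# Assumption: the four nvidia* modules from the expected output are all required.
missing_nvidia_modules() {
    awk '
        BEGIN { n = split("nvidia nvidia_uvm nvidia_modeset nvidia_drm", want, " ") }
        { loaded[$1] = 1 }                      # first column is the module name
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in loaded)) print want[i]
        }
    '
}
```

Usage: `lsmod | missing_nvidia_modules` prints nothing when all four modules are loaded.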
      

      Installing the NVIDIA driver for GPU computing with CUDA

      zypper install nvidia-computeG05
      nvidia-smi
      

      Expected output:

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
      | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
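
If you want to use the driver version in a script, for example to compare it with what the container tools report later, it can be pulled out of the banner. This is a sketch only: `driver_version_from_smi` is a hypothetical helper, and it assumes the header line keeps the `Driver Version: X.Y` layout shown above.

```shell
# Read `nvidia-smi` output on stdin and print the driver version from the header.
# Assumption: the header contains "Driver Version: <digits and dots>".
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Usage: `nvidia-smi | driver_version_from_smi`.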
      

      Installing libnvidia-container

      zypper ar https://download.opensuse.org/repositories/home:/jordimassaguerpla:/nvidia_container/SLE_15_SP1/ nvidia_container
      zypper install libnvidia-container
      usermod -a -G root,video USER
      

      USER should be a non-root user on your system.

      su - USER -c "nvidia-container-cli info"
      

      Expected output:

      NVRM version:   440.59
      CUDA version:   10.2
      
      Device Index:   0
      Device Minor:   0
      Model:          Quadro K2000
      Brand:          Quadro
      GPU UUID:       GPU-6a04b812-c20e-aeb6-9047-6382930eef7d
      Bus Location:   00000000:05:00.0
      Architecture:   3.0
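
A small sketch for reading a single field from that output, for instance to confirm that the NVRM version matches the driver version reported by nvidia-smi. `cli_field` is a hypothetical helper; it assumes the `Label:   value` layout shown above and keeps any colons inside the value (as in the bus location).

```shell
# Read `nvidia-container-cli info` output on stdin and print the value of one field.
# $1 is the label without the trailing colon, e.g. "NVRM version".
cli_field() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label, first colon and padding
            print
            exit
        }
    '
}
```

Usage: `su - USER -c "nvidia-container-cli info" | cli_field "NVRM version"`.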
      

      NOTE: this test needs a non-root user because the root user does not run with the video group by default. We will fix this later when installing the toolkit. If you run the test as root, you will see this error message:

      nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
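
Since the error message does not mention groups, it can help to check group membership before running the test. A minimal sketch, with `has_group` as a hypothetical helper that only inspects a group list you pass in:

```shell
# Succeed if group $2 appears in the space-separated group list $1.
# Padding both sides with spaces avoids partial matches such as "videos".
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

Usage: `has_group "$(id -nG USER)" video || echo "USER is not in the video group"`.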
      

      Installing nvidia-container-toolkit

      zypper install nvidia-container-toolkit
      

      Test with podman

      zypper install podman podman-cni-config
      podman run nvidia/cuda nvidia-smi
      

      Expected output:

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
      | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
      

      So it works!

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      As a result, I updated the docs: https://github.com/jordimassaguerpla/SUSEhackweek18/commit/5fca6c12034b4df34c403f14276be754e809b086#diff-2df0241dfedf44f37dcafae751ab29ae

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      The previous link got broken ... damn markdown ;) docs

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      Upstream (NVIDIA) uses Dockerfiles to build the packages for the other distros.

      Here is a small experiment in building the openSUSE Leap RPM with a Dockerfile within OBS:

      https://build.opensuse.org/package/show/home:jordimassaguerpla:branches:openSUSE:Templates:Images:15.1/libnvidia-containers

    • jordimassaguerpla
      9 months ago by jordimassaguerpla

      Result of the experiment: using a Dockerfile works very well, because you can develop and debug with "docker build" and then commit the Dockerfile to OBS to get a build in a central location, store the sources, and so on.

      The issue is that the result is an image, not the RPM. There is no "-v" option to mount a volume during the build. Thus, even though you can build the image in OBS, you then have to run the image to extract the RPM.

      (OBS = build.opensuse.org)
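
One way to get the RPM out of such an image, sketched under assumptions: `extract_from_image` is a hypothetical helper, and the path inside the image where the Dockerfile leaves the RPMs is something you have to supply. It creates a stopped container from the image, copies the directory out with `podman cp`, and removes the container again, so the image's entrypoint never runs.

```shell
# Copy build results out of an image.
# $1 = image name, $2 = path inside the image (assumption: depends on the Dockerfile),
# $3 = destination directory on the host.
extract_from_image() {
    image=$1 src=$2 dest=$3
    cid=$(podman create "$image") || return 1   # container is created but not started
    mkdir -p "$dest"
    podman cp "$cid:$src" "$dest"
    status=$?
    podman rm "$cid" >/dev/null                 # clean up even if the copy failed
    return "$status"
}
```

Usage: `extract_from_image IMAGE /path/in/image ./rpms`, where both IMAGE and the path are placeholders.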
