Nvidia has a way to support GPUs on Kubernetes via docker and crio, but so far it does not support SLES or CaaSP. Adding that support is the goal of this project.

This project is part of:

Hack Week 19

Activity

  • 9 months ago: drdavis liked Nvidia GPU support for CaaSP
  • 9 months ago: a_faerber liked Nvidia GPU support for CaaSP
  • 9 months ago: huizhizhao originated Nvidia GPU support for CaaSP

  • Comments

    • drdavis
      9 months ago by drdavis | Reply

      https://github.com/NVIDIA/gpu-operator

      • huizhizhao
        8 months ago by huizhizhao | Reply

        Thank you Darren. I have not looked into this carefully yet, but I tried the steps and failed to enable the GPU in my environment. I will dig deeper into this.

    • huizhizhao
      8 months ago by huizhizhao | Reply

      GPU support on CaaSP. What I think enabling GPU on CaaSP requires:

      1. Get Nvidia to officially support GPU on CaaSP.
      2. Provide an installation guide for customers on how to enable GPU on CaaSP.

      Regarding point 1: there is an email thread "[caasp-internal] Sharing a GPU" in which Roger mentioned that "We are planning on integrating Nvidia’s vGPU driver, after initial release. But it must be licensed from them." From what I have heard, though, it has already been 2 years since we started on this. Also, that is about enabling vGPU on SLES, not CaaSP.

    • huizhizhao
      8 months ago by huizhizhao | Reply

      Steps to enable Nvidia GPU on CaaSP:

      1. Hardware prerequisites: make sure your GPU is CUDA-capable. If your graphics card is from NVIDIA and is listed at http://developer.nvidia.com/cuda-gpus, it is CUDA-capable.
      2. Install the Nvidia driver on the GPU nodes (https://www.nvidia.com/Download/index.aspx).
      3. Install CUDA on the GPU nodes (https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=SLES&target_version=15&target_type=rpmlocal).
      4. Install nvidia-container-runtime-hook on the GPU nodes:
         4.1 Get the source code from https://github.com/Hui-Zhi/libnvidia-container/tree/sles15-sp1-support
         4.2 Run "make sle15.1". Steps 4.1 and 4.2 do not have to be done on a GPU node, but the build needs docker.
         4.3 Install dist/sle15.1/x86_64/libnvidia-container*.rpm on the GPU nodes.
      5. Label the GPU nodes, for example:
           nvidia-smi -q | grep 'Product Name'
           kubectl label nodes node1 nvidia.com/brand=Quadro   # use the brand of your own product
           kubectl label nodes node1 hardware-type=NVIDIAGPU
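The labelling in step 5 could be scripted roughly as follows. This is only a sketch: the helper names gpu_brand and label_gpu_node are made up for illustration, and it assumes that the first word of the "Product Name" line is the brand you want in the label (as in the Quadro example), which may not hold for every card or driver version.

```shell
# Hypothetical helper: read `nvidia-smi -q` output on stdin and print the
# first word of the first "Product Name" value (assumed to be the brand).
gpu_brand() {
  awk -F': *' '/Product Name/ { split($2, words, " "); print words[1]; exit }'
}

# Hypothetical helper: apply both labels from step 5 to one node.
# $1 = node name, $2 = brand (for example the output of gpu_brand).
label_gpu_node() {
  kubectl label nodes "$1" "nvidia.com/brand=$2" hardware-type=NVIDIAGPU
}
```

Usage, assuming nvidia-smi is run on the GPU node and kubectl on a machine with access to the cluster: first `nvidia-smi -q | gpu_brand` on the node, then `label_gpu_node node1 Quadro` with the value it printed.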

      If you followed the steps above, GPU support is now enabled in your CaaSP environment, but please verify it. For example, verify with vector-add, using this nvidia-test.yaml:

        apiVersion: v1
        kind: Pod
        metadata:
          name: cuda-vector-add
          namespace: default
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cuda-vector-add
              image: "docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1"
              env:
                - name: NVIDIA_VISIBLE_DEVICES
                  value: all
                - name: NVIDIA_DRIVER_CAPABILITIES
                  value: "compute,utility"
                - name: NVIDIA_REQUIRE_CUDA
                  value: "cuda>=5.0"
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop: ["ALL"]
                seLinuxOptions:
                  type: nvidia_container_t
              resources:
                limits:
                  nvidia.com/gpu: 1

      Then run:

        kubectl create -f nvidia-test.yaml

      Or run:

        kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/examples/workloads/deployment.yml
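To check the result of the test pod, something like the following could be used. It is a sketch, not a tested procedure: the function names is_terminal_phase and wait_for_pod are hypothetical, the retry count and sleep interval are arbitrary defaults, and the pod is assumed to live in the default namespace as in nvidia-test.yaml.

```shell
# Hypothetical helper: succeed when a pod phase will not change any more.
is_terminal_phase() {
  case "$1" in
    Succeeded|Failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll the pod phase until it is terminal.
# $1 = pod name, $2 = number of tries (default 30), $3 = seconds between tries (default 5).
# Prints the final phase, or "timeout" (and returns 1) when the tries run out.
wait_for_pod() {
  pod="$1"
  tries="${2:-30}"
  interval="${3:-5}"
  while [ "$tries" -gt 0 ]; do
    phase="$(kubectl get pod "$pod" -n default -o jsonpath='{.status.phase}')"
    if is_terminal_phase "$phase"; then
      echo "$phase"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "timeout"
  return 1
}
```

For example: `wait_for_pod cuda-vector-add && kubectl logs cuda-vector-add`. Because the pod uses restartPolicy: OnFailure, a failing container is restarted rather than left in a Failed phase, so a "timeout" here most likely means the GPU is not usable yet.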

    • huizhizhao
      8 months ago by huizhizhao | Reply

      Before the verification, please install the Nvidia device plugin:

        kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
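Once the device plugin is running, each GPU node should advertise an nvidia.com/gpu resource. A possible way to check this is sketched below; the function names list_node_gpus and nodes_with_gpu are hypothetical, and the sketch assumes that kubectl prints a non-numeric placeholder for nodes without the resource.

```shell
# Hypothetical helper: print "<node name> <allocatable nvidia.com/gpu>" per node.
list_node_gpus() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Hypothetical helper: keep only the nodes whose GPU column is a positive number.
nodes_with_gpu() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}
```

Usage: `list_node_gpus | nodes_with_gpu`. If the labelled GPU nodes are missing from the output, the verification pod will probably stay in Pending, because nothing can satisfy its nvidia.com/gpu limit.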
