The idea here is to study and understand how ephemeral storage for containers works, and to investigate whether local storage could be avoided entirely and Ceph used instead. Could a new storage driver be developed to support Ceph storage? https://github.com/containers/storage

The goal of this project is to understand the requirements for containers' ephemeral storage (performance, latency, etc.) and possibly build a PoC of a storage driver for Ceph.

Looking for hackers with the skills:

ceph, k8s

This project is part of:

Hack Week 19


Comments

  • denisok
    about 2 months ago by denisok

    After reading up on the different storage drivers and getting to know some details of containers/storage, I will first try simply mounting CephFS, pointing the vfs storage driver at that mount, and seeing how it works and what the performance impact is.

    I also plan to look for benchmarks I can use to test the solution.
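
    A minimal sketch of that first experiment (the monitor address, secret file, and graphroot path are placeholders for my setup, not tested values):

    ```shell
    # Mount CephFS on the container host (kernel client); mon address and
    # credentials are placeholders.
    mkdir -p /mnt/cephfs
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # Point containers/storage at it: vfs driver, graphroot on CephFS.
    cat > /etc/containers/storage.conf <<'EOF'
    [storage]
    driver = "vfs"
    graphroot = "/mnt/cephfs/containers/storage"
    runroot = "/run/containers/storage"
    EOF

    # Any pull now unpacks layers onto CephFS.
    podman pull registry.opensuse.org/opensuse/toolbox
    ```
    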

  • denisok
    about 1 month ago by denisok

    I have created a virtual 3-node Ceph cluster and tried CephFS as the mount point for the storage driver's graphroot. It worked fine, but was considerably slower than local (SSD) storage:

    dkondratenko@f66:~/Workspace> time podman pull opensuse/toolbox
    Trying to pull docker.io/opensuse/toolbox...
      denied: requested access to the resource is denied
    Trying to pull quay.io/opensuse/toolbox...
      unauthorized: access to the requested resource is not authorized
    Trying to pull registry.opensuse.org/opensuse/toolbox...
    Getting image source signatures
    Copying blob 6b18d304886e done  
    Copying blob 32329fe4ff6e done  
    Copying config d5ebabf36c done  
    Writing manifest to image destination
    Storing signatures
    d5ebabf36cd400a32148eda6b97ec2989c307a4d4ebfdb03822ec6e35b0d7f36
    
    real    3m25.789s
    user    0m35.065s
    sys     0m9.852s
    
    real    0m28.302s
    user    0m30.095s
    sys     0m6.377s
    
    
    dkondratenko@f66:~/Workspace> time podman pull opensuse/busybox
    Trying to pull docker.io/opensuse/busybox...
      denied: requested access to the resource is denied
    Trying to pull quay.io/opensuse/busybox...
      unauthorized: access to the requested resource is not authorized
    Trying to pull registry.opensuse.org/opensuse/busybox...
    Getting image source signatures
    Copying blob ce98310bda16 done  
    Copying config 52423b05e2 done  
    Writing manifest to image destination
    Storing signatures
    52423b05e29476a7619641e48c4a5c960eb18b7edb5264c433f7af0bfeebcad0
    
    real    0m25.236s
    user    0m1.147s
    sys     0m0.593s
    
    real    0m3.696s
    user    0m0.907s
    sys     0m0.327s
    
    
    time podman run -it opensuse/toolbox /bin/sh -c 'zypper in -y bonnie; bonnie'
    
    Bonnie: Warning: You have 15927MiB RAM, but you test with only 100MiB datasize!    
    Bonnie:          This might yield unrealistically good results,
    Bonnie:          for reading and seeking and writing.
    Bonnie 1.6: File './Bonnie.201', size: 104857600, volumes: 1
    Writing       25MiB with putc()...         done:  64357 kiB/s  74.1 %CPU
    Rewriting    100MiB...                     done:2696297 kiB/s  41.2 %CPU
    Writing      100MiB intelligently...       done: 558372 kiB/s  31.7 %CPU
    Reading       12MiB with getc()...         done:  86603 kiB/s 100.0 %CPU
    Reading      100MiB intelligently...       done:5248052 kiB/s  99.5 %CPU
    Seeker 2 1 3 4 7 6 8 5 9 10 12 11 15 16 14 13 start 'em ................
    Estimated seek time: raw 0.010ms, eff 0.009ms
    
                ----Sequential Output (nosync)----- ----Sequential Input--- --Rnd Seek-
                -Per Char-- --Block---- -Rewrite--- -Per Char-- --Block---- --04k (16)-
    Machine     MiB  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU   /sec %CPU
    2b337e6 1*  100  64357 74.1 558372 31.7 2696297 41.2  86603  100 5248052 99.5  96801  617
    
    real    0m18.456s
    user    0m0.127s
    sys     0m0.056s
    
    
    Bonnie: Warning: You have 15927MiB RAM, but you test with only 100MiB datasize!
    Bonnie:          This might yield unrealistically good results,
    Bonnie:          for reading and seeking and writing.
    Bonnie 1.6: File './Bonnie.202', size: 104857600, volumes: 1
    Writing       25MiB with putc()...         done:  37109 kiB/s  46.4 %CPU
    Rewriting    100MiB...                     done:1687791 kiB/s  26.0 %CPU
    Writing      100MiB intelligently...       done: 197676 kiB/s  11.3 %CPU
    Reading       12MiB with getc()...         done:  87179 kiB/s 100.0 %CPU
    Reading      100MiB intelligently...       done:6430948 kiB/s  99.4 %CPU
    Seeker 1 3 2 4 6 5 8 7 9 10 12 11 14 13 15 16 start 'em ................
    Estimated seek time: raw 1.088ms, eff 1.087ms
    
                ----Sequential Output (nosync)----- ----Sequential Input--- --Rnd Seek-
                -Per Char-- --Block---- -Rewrite--- -Per Char-- --Block---- --04k (16)-
    Machine     MiB  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU   /sec %CPU
    b653a06 1*  100  37109 46.4 197676 11.3 1687791 26.0  87179  100 6430948 99.4    919  5.3
    
    real    0m51.470s
    user    0m0.113s
    sys     0m0.094s
    

  • denisok
    about 1 month ago by denisok

    So there are two main investigation points here:

    • tar unpacking to CephFS is quite slow
    • operations inside the container image are too slow

    As for unpacking to CephFS: first of all, it is slow because Ceph runs in VMs here and cannot deliver real performance.

    Second, with CephFS each image layer would only need to be unpacked once; after that, the same layer would not need unpacking again and could simply be mounted. Running a container from CephFS does not have such a dramatic performance impact:

    dkondratenko@f66:~/Workspace> time podman run -it opensuse/toolbox /bin/sh -c 'echo Hello world!'
    Hello world!
    
    real    0m0.448s
    user    0m0.097s
    sys     0m0.088s
    
    real    0m1.470s
    user    0m0.117s
    sys     0m0.079s
    

    The next point is the container's writable layer, which shouldn't be persistent but eats quite a lot of space and performance, as in the bonnie example, where it is used to install the package and run the tests. That layer could be mounted on the local FS and deleted after each run.

    CephFS performance could also be improved by caching: images are persistent, so techniques similar to ceph-immutable-object-cache could be used.
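
    One existing mechanism that comes close to the hybrid idea is the additionalimagestores option of containers/storage, which adds read-only image stores next to the writable graphroot; a store populated on CephFS could then be shared by all nodes. A sketch (the CephFS path is hypothetical):

    ```shell
    # Local writable storage plus a read-only image store shared over CephFS.
    cat > /etc/containers/storage.conf <<'EOF'
    [storage]
    driver = "overlay"
    graphroot = "/var/lib/containers/storage"
    runroot = "/run/containers/storage"

    [storage.options]
    # Read-only store; layers here are mounted, never re-unpacked locally.
    additionalimagestores = [ "/mnt/cephfs/containers/storage" ]
    EOF
    ```
    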

    I have tried pulling some images to local storage while mounting some of the layers from CephFS instead, but such a container failed to start. I tracked it down to a simple container that starts from a pre-mounted fuse-overlayfs mount combining CephFS layers with local ones, but the container doesn't see the right mount point. I wasn't able to track it down further.
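
    The failing setup can be reproduced outside of podman roughly like this (the layer digests and paths are made up; lower layers come from CephFS, the writable upper layer is local):

    ```shell
    # Merge read-only layers from CephFS with a local writable layer.
    mkdir -p /tmp/upper /tmp/work /tmp/merged
    fuse-overlayfs \
      -o lowerdir=/mnt/cephfs/layers/<digest1>:/var/lib/containers/storage/overlay/<digest2>/diff \
      -o upperdir=/tmp/upper,workdir=/tmp/work \
      /tmp/merged

    # Try to start a container directly from the merged rootfs.
    podman run --rootfs /tmp/merged /bin/sh -c 'echo Hello'
    ```
    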

    Next steps:

    • try a hardware Ceph cluster and see how big the performance difference is
    • find out whether CephFS layers could be mounted together with local layers as an overlayfs into the namespace
    • implement a storage driver providing a hybrid approach: CephFS for the common layers, local for the container's writable layer
    • find out whether it makes sense to implement an on-prem registry that acts as a proxy and unpacks layers to CephFS
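
    For the last point, the unpack-to-CephFS idea could be prototyped with existing tools before writing any registry code (the image name and CephFS paths are just examples):

    ```shell
    # Copy the image into an OCI layout on CephFS (done once, e.g. by the proxy).
    skopeo copy docker://registry.opensuse.org/opensuse/toolbox \
      oci:/mnt/cephfs/oci/toolbox:latest

    # Unpack it into a runnable rootfs bundle, also on CephFS.
    umoci unpack --image /mnt/cephfs/oci/toolbox:latest /mnt/cephfs/bundles/toolbox
    ```
    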
