Introduction

TensorFlow™ is an open-source software library for Machine Intelligence, used mainly through its Python API. It was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural network research, but the system is general enough to be applicable in a wide variety of other domains as well. (https://www.tensorflow.org/)

Using values recorded by SUSE Manager, it should be possible to predict the outcome of certain operations by applying machine learning. We are especially interested in the time it takes to apply patches to systems. Using recorded past values, a neural network should be trained to predict this for future operations. We need to find out which values can and should be provided, which classifier(s) to use, and so on.

Goals:

  • Monday:

    • Learn about TensorFlow: definitions, how to create a model, different frameworks, etc.
    • Define the set of features that can be gathered from the SUSE Manager DB to create our dataset.
    • Explore the values of the dataset: min/max values, boundaries, type of data (categorical or continuous).
    • Define crossed relations between features (crossed columns).
    • Is our dataset good enough?
  • Tuesday:

    • Create and test different TensorFlow models: DNNLinearCombinedClassifier, DNNClassifier, etc.
    • Are those models' estimations good enough?
    • Is TensorFlow suitable for achieving the project goal? Are the estimations good enough for us?
    • Upload working example.
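The Monday exploration steps (min/max values, categorical vs. continuous columns, crossed columns) could be sketched in plain Python as below. This is a minimal, hypothetical illustration: the sample rows are invented, the split into continuous and categorical columns is an assumption, and `cross_bucket` only mimics the idea behind a hashed crossed column. TensorFlow's `crossed_column` uses its own fingerprint function, so its bucket ids would differ.

```python
import zlib

# Hypothetical rows using the dataset's COLUMNS; the values are invented.
ROWS = [
    {"server_id": 1, "errata_id": 10, "nrcpu": 2, "mhz": 2400.0, "ram": 2048,
     "package_id": 100, "size": 51200, "time": 0.5},
    {"server_id": 1, "errata_id": 11, "nrcpu": 2, "mhz": 2400.0, "ram": 2048,
     "package_id": 101, "size": 204800, "time": 1.5},
    {"server_id": 2, "errata_id": 10, "nrcpu": 4, "mhz": 3100.0, "ram": 8192,
     "package_id": 100, "size": 51200, "time": 0.25},
]

# Assumed split: IDs are treated as categorical, measurements as continuous.
CONTINUOUS = ["nrcpu", "mhz", "ram", "size", "time"]
CATEGORICAL = ["server_id", "errata_id", "package_id"]


def profile(rows, continuous, categorical):
    """Min/max for continuous columns, distinct-value count for categorical ones."""
    report = {}
    for col in continuous:
        values = [row[col] for row in rows]
        report[col] = {"min": min(values), "max": max(values)}
    for col in categorical:
        report[col] = {"distinct": len({row[col] for row in rows})}
    return report


def cross_bucket(a, b, num_buckets):
    """Hash the combination of two categorical values into one of num_buckets.

    Illustrates a hashed feature cross; crc32 is used because it is stable
    across runs, unlike Python's built-in str hash.
    """
    key = f"{a}_X_{b}".encode("utf-8")
    return zlib.crc32(key) % num_buckets


if __name__ == "__main__":
    for column, stats in profile(ROWS, CONTINUOUS, CATEGORICAL).items():
        print(column, stats)
    print(cross_bucket(1, 10, 1000))  # bucket for (server_id=1, errata_id=10)
```

A distinct-value count close to the number of rows for an ID column is a hint that the column will generalize poorly on its own, which is one reason to consider crossing it with another feature.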

Outcomes:

  • The initial dataset was not good enough, so we modified the SQL query to also collect package IDs.
  • Previously we restricted the dataset to actions for errata that contain only one package, but the resulting dataset was too small.
  • We implemented a DNNRegressor.
  • Dataset: COLUMNS = ["server_id","errata_id","nrcpu","mhz","ram","package_id","size","time"] (currently we only use server_id, errata_id and package_id).
  • Currently the dataset is based on patch installation actions that contain a single errata, but that errata can have multiple packages associated with it.
  • We don't know the installation time of each package, because the "time" value we have is for the complete action, so we make a rough estimate by dividing the total time by the number of packages the errata contains.
  • The estimations seem good enough, although the dataset still needs to be improved, as does the model itself, whose feature column definitions can be adjusted to get better results.
  • The current estimations are good enough to at least tell whether the action you are planning will take less than ~10 seconds, ~30 seconds, ~1 minute, ~5 minutes, etc.
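The rough per-package estimate described above (total action time divided by the number of packages in the errata) might look like the following sketch. The record layout (`action_id`, `package_ids`) and the sample values are hypothetical; the real data comes from the SQL query against the SUSE Manager DB, which is not shown here.

```python
# Hypothetical action records: each action installs one errata, "time" is the
# duration of the whole action and "package_ids" lists the errata's packages.
ACTIONS = [
    {"action_id": 1, "server_id": 1, "errata_id": 10,
     "package_ids": [100, 101, 102], "time": 3.0},
    {"action_id": 2, "server_id": 2, "errata_id": 11,
     "package_ids": [103], "time": 0.8},
]


def per_package_rows(actions):
    """Expand each action into one row per package, spreading the total action
    time evenly across the errata's packages (a rough estimate)."""
    rows = []
    for action in actions:
        packages = action["package_ids"]
        if not packages:
            continue  # nothing to attribute the time to
        share = action["time"] / len(packages)
        for package_id in packages:
            rows.append({
                "server_id": action["server_id"],
                "errata_id": action["errata_id"],
                "package_id": package_id,
                "time": share,
            })
    return rows


if __name__ == "__main__":
    for row in per_package_rows(ACTIONS):
        print(row)
```

The even split keeps the per-errata total intact but assigns the same time to a small and a large package, which is the limitation the "avoid averaging" next step refers to.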

Some samples of estimations:

expected -> estimated

0.233874837557475 -> 0.230502188205719
0.233874837557475 -> 0.25423765182495117
0.233874837557475 -> 0.1823016107082367
0.979458148662861 -> 0.8299890756607056
0.979458148662861 -> 0.8462812900543213
0.211660345395406 -> 0.22346541285514832
1.70577935377757 -> 1.9606330394744873
2.60000002384186 -> 2.39455509185791
0.976182460784912 -> 0.1866598129272461
0.976182460784912 -> 0.614652693271637
2.80241966247559 -> 1.0975050926208496
0.6621074676513671 -> 0.6865990161895752
0.0968895809991019 -> 0.041620612144470215
0.0968895809991019 -> 0.1236574649810791
0.0968895809991019 -> 0.05707252025604248
1.3669094741344499 -> 2.2393956184387207
1.3669094741344499 -> 2.2393956184387207
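To put numbers on these samples, a simple check is the mean absolute error and the share of estimates within a relative tolerance. This is a hypothetical sketch: both metrics and the 25% tolerance are choices made for illustration, not the evaluation the project used. The pairs are copied from the samples listed above, whose unit is not stated.

```python
# (expected, estimated) pairs copied from the samples above.
SAMPLES = [
    (0.233874837557475, 0.230502188205719),
    (0.233874837557475, 0.25423765182495117),
    (0.233874837557475, 0.1823016107082367),
    (0.979458148662861, 0.8299890756607056),
    (0.979458148662861, 0.8462812900543213),
    (0.211660345395406, 0.22346541285514832),
    (1.70577935377757, 1.9606330394744873),
    (2.60000002384186, 2.39455509185791),
    (0.976182460784912, 0.1866598129272461),
    (0.976182460784912, 0.614652693271637),
    (2.80241966247559, 1.0975050926208496),
    (0.6621074676513671, 0.6865990161895752),
    (0.0968895809991019, 0.041620612144470215),
    (0.0968895809991019, 0.1236574649810791),
    (0.0968895809991019, 0.05707252025604248),
    (1.3669094741344499, 2.2393956184387207),
    (1.3669094741344499, 2.2393956184387207),
]


def mean_absolute_error(pairs):
    """Average of |expected - estimated| over all pairs."""
    return sum(abs(expected - estimated) for expected, estimated in pairs) / len(pairs)


def share_within(pairs, tolerance):
    """Fraction of estimates whose relative error is at most `tolerance`."""
    hits = sum(1 for expected, estimated in pairs
               if abs(estimated - expected) <= tolerance * abs(expected))
    return hits / len(pairs)


if __name__ == "__main__":
    print("MAE:", mean_absolute_error(SAMPLES))
    print("within 25%:", share_within(SAMPLES, 0.25))
```

A relative measure is worth reporting next to the MAE here because the expected values span more than an order of magnitude, so the same absolute error means very different things for short and long installations.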

[Figure: "Actual" vs "Predicted" values (screenshot and full graph not reproduced here)]

Next steps:

  • Refinement of model and dataset
  • Add actions with multiple errata to the dataset
  • Also implement a DNNClassifier to classify directly instead of predicting a float value (possible classes: seconds, minutes, hours).
  • POC of integration with the SUSE Manager UI
  • Feed the actual results of new actions on SUSE Manager back into the neural network.
  • Replace package_id with something consistent across customers (e.g. package name)
  • Try to find a way to avoid averaging the time per package for errata that point to multiple packages
  • Estimate the actual action (not per package)
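For the planned DNNClassifier, each measured duration would first have to be turned into a class label. The sketch below assumes durations in seconds and thresholds of 60 s and 3600 s between the three classes named above; both are assumptions for illustration. It produces integer class ids in the range `[0, n_classes)`, which is what TensorFlow's `DNNClassifier` expects by default when no `label_vocabulary` is given.

```python
# Class names from the "Next steps" list; the thresholds are assumptions.
CLASSES = ["seconds", "minutes", "hours"]
MINUTE = 60
HOUR = 60 * MINUTE


def time_class(duration_seconds):
    """Map a duration in seconds to an integer class id for a classifier."""
    if duration_seconds < MINUTE:
        return 0  # "seconds"
    if duration_seconds < HOUR:
        return 1  # "minutes"
    return 2      # "hours"


if __name__ == "__main__":
    for duration in (4.2, 95.0, 5400.0):
        print(duration, CLASSES[time_class(duration)])
```

With this labeling the classifier would be created with `n_classes=len(CLASSES)`; finer buckets (for example ~10 s, ~30 s, ~1 min, ~5 min) would only require more thresholds.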

Code repository: Internal GitLab

This project is part of:

Hack Week 16
