Use machine learning and natural language processing techniques to analyze the changes made to a project and classify them as:

  • Small / unimportant fix
  • Big / important fix
  • Small / unimportant feature
  • Big / important feature

For this project I will:

  1. Generate a basic corpus of labeled data from a set of different projects related to openSUSE
  2. Evaluate which features give the best classification: n-grams, PoS tags, TF-IDF (with and without stemming)
  3. Evaluate and measure the best classification model: Naive Bayes, Linear SVM, Maximum Entropy, ...
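
The plan above can be illustrated with a minimal sketch, assuming word n-grams as features and a hand-written multinomial Naive Bayes as the model. The toy corpus, the label strings, and the names `ngrams`, `NaiveBayes`, and `TRAIN` are all hypothetical and only serve the illustration; a real evaluation would use the labeled openSUSE corpus and compare several feature sets and models.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(message, n_max=2):
    """Return lowercase word n-grams (n = 1..n_max) of one log message."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for message, label in zip(messages, labels):
            feats = ngrams(message)
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def predict(self, message):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus the log likelihood of every known n-gram.
            score = math.log(doc_count / total_docs)
            for feat in ngrams(message):
                if feat not in self.vocabulary:
                    continue  # skip n-grams never seen during training
                score += math.log((counts[feat] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-labeled toy corpus (not real project data).
TRAIN = [
    ("Fix typo in README", "small fix"),
    ("Fix whitespace in spec file", "small fix"),
    ("Fix crash when parsing empty changelog", "big fix"),
    ("Fix security issue in login handler", "big fix"),
    ("Add --verbose option", "small feature"),
    ("Add color to log output", "small feature"),
    ("Add new REST API for package search", "big feature"),
    ("Implement new plugin system for backends", "big feature"),
]

model = NaiveBayes().fit([m for m, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("Fix typo in changelog"))
```

With only eight training messages the prediction mostly reflects word overlap, so this shows the shape of the pipeline rather than a meaningful accuracy figure.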

Looking for hackers with the skills:

nlp, machinelearning, git, github

This project is part of:

Hack Week 10, Hack Week 11, Hack Week 12

    Comments

    • aplanas
      over 5 years ago by aplanas

      Yeah. Hackweek 10 collided with openSUSE 13.1, so I will try to work on this during this new Hackweek instance :)

    • froh
      over 5 years ago by froh

      Would it be hard to train for regression fix vs. new feature, based on the comment? I'd be curious how much energy projects have to put into regression fixes vs. feature additions.

    • osynge
      almost 5 years ago by osynge

      Have you considered looking at ELK and integrating this work into the ELK stack?