The current crash-setup source is located here. It works nicely for the most part, but it does not handle the debug source, which makes it impossible to use the crash> gdb list *<symbol> command right away. This is tracked as bug 997558 and should be worked on.

A further script would be really useful for building the analysis environment on l3slave for core dumps of crashed user-space applications. The MF Open Enterprise Server (OES) has a tool called NetIQ GetCore Utility, which prepares such an environment consisting of the files opencore.sh and opencore.ini, the binaries of the executable and its libraries, and the core dump itself. The files it generates can serve as a base, and they prove that it is possible to build a core dump analysis environment decoupled from the host system. What it lacks is setting up the debug symbols and the debug source. I have already done that manually, so it is possible and should be automated.

Both topics are efficiency improvements for the L3 work.

Looking for mad skills in:

shell c

This project is part of:

Hack Week 17


Comments

  • sparschauer
    about 1 year ago by sparschauer

    The core-setup project should not depend on l3slave and should work with openSUSE for non-employees as well. It should have different modes for:

    • analysis on the system which collected the dump:
      • auto-get debuginfo and debugsource packages with zypper - packages are latest? (recommended for reproducible segfaults)
      • include paths to custom code such as scanmem, which I compile from source as an upstream maintainer
    • analysis on another system
      • auto-get and extract RPMs from SUSE servers, e.g. http://download.opensuse.org/update/leap/42.3/oss/x86_64/ and http://download.opensuse.org/debug/update/leap/42.3/oss/x86_64/
    • SLES
      • get packages from NFS mounts
      • get packages from *.suse.de servers
    • openSUSE
      • get packages from download.opensuse.org or mirrors

    Key features:

    • read coredump, detect crashed process and libs
    • convert build-id to package name + version with the help of SUSE repos
    • maybe use rpm.txt from supportconfig if exact package info cannot be gathered from the dump
    • get and extract the required binary, debuginfo, and debugsource packages
    • build up opencore.sh and opencore.ini - maybe with support for custom settings

    Initial tasks:

    • check for, evaluate, and analyze similar tools
    • analyze coredump structure - What is included and can be used?

    Main goals:

    • prevent gdb from picking up wrong code/source/debuginfo or wrong packages
    • take needed code parts from other FOSS projects to avoid duplicate efforts

  • sparschauer
    about 1 year ago by sparschauer

    For crash-setup -d vmcore, crash-setup uses kdumpid, which in turn uses libkdumpfile.

    Example output of kdumpid:

    Format: compressed kdump
    Arch: x86_64
    Version: 4.4.138-94.39-default
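
As a rough illustration of how output like the above could be consumed by a script, here is a minimal sketch. The helper name kdump_field is hypothetical and not part of crash-setup; it only assumes the "Name: value" layout shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from kdumpid-style
# output read on stdin. Assumes "Name: value" lines as in the example.
# $1: field name, e.g. Format, Arch or Version
kdump_field() {
    sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}
```

It could be used as kdumpid vmcore | kdump_field Version, assuming kdumpid is called with the dump file as its argument.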

    • sparschauer
      about 1 year ago by sparschauer

      crash-setup uses kernel-source.git to convert the version into a git tag, then to a commit, and then to the oldest git branch containing it. This step is extremely slow as it is done by comparing the commit id in the git log of all the configured git branches from old to new. The SLE release is gathered from this. I'm sure this can be sped up.

      Example: 4.4.138-94.39 -> rpm-4.4.138-94.39 -> baa07f9df91b -> SLE12-SP3
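
One possible way to speed this up, sketched under assumptions: instead of scanning the git log of every branch, ask git directly whether the tagged commit is an ancestor of each branch. This is not what crash-setup currently does. The function names, the branch list, and the rule for stripping the flavor suffix (such as -default) are all hypothetical.

```shell
#!/bin/sh
# Map a kernel release to the tag naming used in the example above,
# dropping a trailing flavor such as "-default" (assumed naming rule).
version_to_tag() {
    printf 'rpm-%s\n' "${1%-[a-z]*}"
}

# Print the first branch, in the given old-to-new order, that contains
# the commit behind the tag.
# $1: path to a kernel-source.git clone, $2: tag, rest: branch names
oldest_branch_with_tag() {
    repo=$1; tag=$2; shift 2
    commit=$(git -C "$repo" rev-parse --verify --quiet "$tag^{commit}") || return 1
    for branch in "$@"; do
        # merge-base --is-ancestor exits 0 if $commit is reachable from $branch
        if git -C "$repo" merge-base --is-ancestor "$commit" "$branch" 2>/dev/null; then
            printf '%s\n' "$branch"
            return 0
        fi
    done
    return 1
}
```

A call could look like oldest_branch_with_tag ./kernel-source "$(version_to_tag 4.4.138-94.39-default)" SLE12-SP2 SLE12-SP3, where the clone path and branch names are placeholders. Whether this is actually faster on the real repository would need to be measured.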

      • sparschauer
        about 1 year ago by sparschauer

        git describe is even slower. Only configuring major.minor per branch would help.

  • sparschauer
    about 1 year ago by sparschauer

    I've introduced a new crash-setup option '-s' or '--source' to fetch the debugsource package automatically as well. With the crash cd command I can change to the right directory so that the gdb list command works. The idea is to automatically create a file opencrash.sh containing crash -i ./opencrash.ini vmlinux.gz vmlinux.debug vmcore and a file opencrash.ini containing, e.g., cd ./root/usr/src/debug/kernel-default-4.4.138/linux-4.4/kernel.
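
A minimal sketch of how the two files could be generated, assuming they are written into the current directory. The function name write_opencrash is hypothetical, and the actual crash-setup implementation may differ.

```shell
#!/bin/sh
# Hypothetical helper: write opencrash.ini and opencrash.sh into the
# current directory.
# $1: source directory that the crash "cd" command should change to
write_opencrash() {
    srcdir=$1
    # opencrash.ini holds the commands crash runs before taking user input
    printf 'cd %s\n' "$srcdir" > opencrash.ini
    cat > opencrash.sh <<'EOF'
#!/bin/sh
crash -i ./opencrash.ini vmlinux.gz vmlinux.debug vmcore
EOF
    chmod +x opencrash.sh
}
```

For the example above, the call would be write_opencrash ./root/usr/src/debug/kernel-default-4.4.138/linux-4.4/kernel.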

    • sparschauer
      about 1 year ago by sparschauer

      Autocreation of opencrash.sh/.ini is implemented. The paths to the source files changed with SLE12, so cd root is used for SLE11 and earlier.

      Commits can be found here.

  • sparschauer
    about 1 year ago by sparschauer

    Merge request for crash-setup submitted.

  • sparschauer
    about 1 year ago by sparschauer

    I have already worked on core-setup. gdb -ex "quit" -c core reports the build id of the crashed main program. We can then search for it in repodata/*primary.xml.gz. The nearest "location href" line above the match shows the name of the debuginfo package.

    Example:
    $ ./core-setup ./core.hald.2341
    main build id from core.hald.2341: da45f02baff8ef519840d5a17fd15926f2c802e2
    looking up main build id...
    x86_64/hal-debuginfo-0.5.12-23.76.1.x86_64.rpm

    Now we can remove "-debuginfo" from the RPM name and get the standard package name "hal-0.5.12-23.76.1.x86_64.rpm". With these two packages it is possible to run gdb normally and to gather the build ids of the libraries.
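
The lookup described above could be sketched roughly as follows. The function names are hypothetical. The sketch assumes the metadata is laid out one element per line, with the build id appearing on a line below the "location href" line of the same package.

```shell
#!/bin/sh
# Hypothetical helper: print the href of the package whose metadata
# mentions the given build id.
# $1: path to a *primary.xml.gz file, $2: build id
lookup_debuginfo_rpm() {
    gzip -dc "$1" | awk -v id="$2" '
        /<location href="/ {
            # remember the most recent location href seen so far
            href = $0
            sub(/.*<location href="/, "", href)
            sub(/".*/, "", href)
        }
        index($0, id) && href != "" { print href; found = 1; exit }
        END { exit !found }
    '
}

# Derive the standard package name by dropping "-debuginfo".
strip_debuginfo() {
    printf '%s\n' "$1" | sed 's/-debuginfo-/-/'
}
```

The function returns a non-zero status if the build id is not found, so a caller can move on to the next repository.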

  • sparschauer
    about 1 year ago by sparschauer

    I made a lot of progress with core-setup. It handles the hald example core dump pretty well already. The CPU arch, the crashed executable, and the main build id are gathered from gdb -batch -ex "info auxv" -c core. Besides the packages for the crashed executable, the debug and standard packages for 20 of 22 libraries are fetched and extracted.
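
A sketch of how string values might be pulled out of the "info auxv" output. It is an assumption that the AT_PLATFORM and AT_EXECFN entries are the ones of interest, and that gdb prints each string in double quotes at the end of the line. The exact layout may vary between gdb versions, and the helper name auxv_string is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the quoted string value of one auxv entry
# from "info auxv" output read on stdin.
# $1: auxv tag name, e.g. AT_EXECFN or AT_PLATFORM
auxv_string() {
    awk -v tag="$1" '$2 == tag && match($0, /"[^"]*"$/) {
        # strip the surrounding double quotes from the matched string
        print substr($0, RSTART + 1, RLENGTH - 2)
        exit
    }'
}
```

A call could look like gdb -batch -ex "info auxv" -c core | auxv_string AT_EXECFN.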

    Unfortunately, just removing "-debuginfo" from the package name is not enough, because a single debuginfo package can correspond to several standard packages. The biggest issue is that the build ids of the libraries are not stored in the coredump, so the ones gathered only reflect the standard packages of the host system. As a result, the wrong package versions are fetched.

    It looks like I have to parse rpm.txt from the supportconfig to get the right versions, check which package provides each library file (that package has to be extracted first), and then make sure that gdb really picks up the correct files.

    • sparschauer
      about 1 year ago by sparschauer

      At least this method works pretty well with the NetIQ GetCore Utility which gathers all loaded ELF files from the crashed system. This way the correct build ids and therefore debuginfo packages are picked up.

  • michalnowak
    about 1 year ago by michalnowak

    You may find Fedora's ABRT project aligned with yours.

  • sparschauer
    7 months ago by sparschauer

    @alnovak ignored my merge request for crash-setup. I will use the change as a custom enhancement if nobody else is interested in it. The core-setup idea proved to work when I manually picked up the RPMs listed in the supportconfig rpm.txt for L3 bug 1105883.
