Monday, April 4, 2016

Monitoring RPM dependency graph size with INSIM

In recent months, there have been multiple attempts to reduce the installation size of packages that are included in cloud and container images. Such attempts usually rely on repoquery or just human estimation. While those techniques can suffice, I think that in the long term we need a tool that at least partially automates this effort. The main reason is to prevent regressions by continuously monitoring the installation size.

INSIM (INstallation SIze Monitor) is a tool that monitors the installation sizes of given packages over time, across multiple releases of Fedora. It can compare the dependency graphs of a given package at any two points in history.

Example


Let's take a look at a concrete example. I'm the maintainer of the freemind package, and although freemind is not in any cloud image, I'm getting complaints from users that it pulls in hundreds of MBs of dependencies, while the upstream size is way smaller.

You can see it here: http://insim.fedorainfracloud.org/insim/module/freemind

Let's take a look at how INSIM can help me in this case. The basic view shows installation size comparison graphs across multiple Fedora releases. We can clearly see that although I tried to reduce the dependency size in Fedora 23, there has been a regression in size in Fedora 24. I haven't updated freemind itself, so the change has to be somewhere deeper in the dependency graph.

Below the basic view, INSIM shows graphs of dependency size changes within each Fedora release. By clicking a point on a graph's line, you can either display the dependency details at that time or select the point for comparison. After you select it for comparison, you can pick a second point, possibly in the graph for a different release, to compare against. In the case of freemind, I'll select points on the Fedora 23 and Fedora 24 graphs. INSIM then displays the changes in dependencies in two forms: as a list and as a graph. Since I'm comparing across two releases, the graph form is more useful.


This is an excerpt from the dependency graph, which wouldn't fit here. You can see the full diff at: http://insim.fedorainfracloud.org/insim/installation/diff/74409/112879
There you can clearly see which package is to blame for the regression in installation size: groovy-lib. With INSIM, I was able to identify the reason for the size increase within a few seconds.
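
To make the idea of such a comparison more concrete, here is a minimal sketch in Python of diffing two installation snapshots. This is not INSIM's actual code: the function name diff_installations, the example-lib package, and all sizes are made up for illustration, and I assume each snapshot is simply a mapping from package name to installed size.

```python
def diff_installations(old, new):
    """Compare two {package: size} snapshots (hypothetical data model)."""
    # Packages present only in the newer snapshot
    added = {p: s for p, s in new.items() if p not in old}
    # Packages present only in the older snapshot
    removed = {p: s for p, s in old.items() if p not in new}
    # Packages present in both whose size changed (new size minus old size)
    changed = {p: new[p] - old[p]
               for p in old.keys() & new.keys() if new[p] != old[p]}
    total_delta = sum(new.values()) - sum(old.values())
    return added, removed, changed, total_delta


# Invented sizes in MB, not real Fedora data
f23 = {"freemind": 10, "example-lib": 1}
f24 = {"freemind": 10, "example-lib": 2, "groovy-lib": 30}

added, removed, changed, total_delta = diff_installations(f23, f24)

# List the newly pulled-in packages, largest first
for name, size in sorted(added.items(), key=lambda kv: -kv[1]):
    print(f"+ {name}: {size} MB")
print(f"total change: {total_delta:+d} MB")
```

Sorting the added packages by size puts the most likely culprit at the top, which is roughly what the graph view lets you spot visually.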

How it works

Now that I have shown how it can help, I'd like to mention a few details about how it works. The main entity tracked by INSIM is called a module. A module is a set of packages to be monitored. In most cases, it will contain just a single package, the one the maintainer is interested in, but it can be a whole package group. A module can be based on another module, which is then used as a baseline in comparisons. In the previous example, the baseline for freemind was java, that is, the java-1.8.0-openjdk package. The dependencies pulled in by the base are not included in the graphs and calculations.
INSIM periodically processes Fedora repos to resolve dependencies of packages using the libsolv/hawkey library. Therefore the resolution algorithm is the same as with dnf. The dependency difference calculations are then done on demand.
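
The baseline idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: real resolution in INSIM goes through libsolv/hawkey, which handles provides, conflicts, and so on, while here the dependencies are a plain hand-written graph. The function names closure and module_size, the example-runtime-dep package, and all sizes are hypothetical.

```python
from collections import deque


def closure(requires, roots):
    """Return roots plus everything they transitively require."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def module_size(requires, sizes, packages, base_packages=()):
    """Installed size of a module, excluding what the base pulls in."""
    own = closure(requires, packages)
    base = closure(requires, base_packages)
    return sum(sizes[p] for p in own - base)


# Invented dependency graph and sizes in MB, not real Fedora data
requires = {
    "freemind": ["java-1.8.0-openjdk", "groovy-lib"],
    "groovy-lib": ["java-1.8.0-openjdk"],
    "java-1.8.0-openjdk": ["example-runtime-dep"],
}
sizes = {
    "freemind": 10,
    "groovy-lib": 30,
    "java-1.8.0-openjdk": 100,
    "example-runtime-dep": 5,
}

with_base = module_size(requires, sizes, ["freemind"], ["java-1.8.0-openjdk"])
without_base = module_size(requires, sizes, ["freemind"])
print(with_base, without_base)
```

Subtracting the base closure keeps the graphs focused on what the monitored package itself adds on top of a runtime that would be installed anyway.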

How can you use it?

The production instance is available at http://insim.fedorainfracloud.org/insim/. The current version is not self-service yet. If you want to add your own module there, you need to ping the INSIM maintainer, Mikolaj Izdebski (mizdebsk@redhat.com, mizdebsk on #fedora-devel).

Links

Production instance: http://insim.fedorainfracloud.org/insim/
Official documentation: https://fedoraproject.org/wiki/Insim
Source repository and issue tracking: https://pagure.io/insim/
Announcement on fedora-devel: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/LTRYDUPS347ZNWZRE6QLVYWV2TICWQVC/