Monday, April 4, 2016

Monitoring RPM dependency graph size with INSIM

During the recent months, there have been multiple attempts to reduce
installation size of packages that are in cloud and container images. Such attempts usually rely on using repoquery or just human estimation. While those techniques can suffice, I think in the long term we need to have tool that helps to at least partially automate this effort. The main reason for that is to prevent regressions by continuously monitoring the installation size.

INSIM (INstallation SIze Monitor) is a tool that monitors installation sizes of given packages over the time for multiple releases of Fedora. It can compare dependency graph of given package at any two points in history.

Example


Let's take a look at a concrete example. I'm a maintainer of freemind package and although freemind is not in any cloud image, I'm getting complaints from users that it's pulling in hundreds of MBs, while upstream size is way smaller.

You can see it here: http://insim.fedorainfracloud.org/insim/module/freemind

Let's take a look how INSIM can help me in this case.The basic view can show you the installation size comparison graphs across multiple Fedora releases. We can clearly see that although I tried to reduce the depenency size in Fedora 23, there has been a regression in size in Fedora 24. I haven't updated freemind itself, so the change has to be somewhere deeper in the dependency graph.

INSIM then shows graphs showing dependency size changes within particular Fedora releases. By clicking on a particular point on the graph's line, you can select it for displaying dependency details at particular time, or select it for comparison. After you select it for comparison, you can select another point, possibly in a graph for different release, to do the comparison against. In case of freemind, I'll select points on Fedora 23 and Fedora 24 graphs. Insim will display the changes in dependencies in multiple forms - as a list, and as a graph. Since I'm comparing across two releases, the graph form is more useful.


This is an excerpt from the dependency graph, which wouldn't fit here. You can see the full diff at: http://insim.fedorainfracloud.org/insim/installation/diff/74409/112879
There you can clearly see which package is to blame for the regression in installation size - groovy-lib. Therefore I was able to identify the reason for size increase within few seconds.

How it works

Now that I introduced how it can help, I'd like to mention a few details about
how it works. The main entity tracked by INSIM is called a module. Module is a set of packages to be monitored. In most cases, it will contain just the single package - the one the maintainer is interested in - but it can be a whole package group. A module can be based on another module which is used as a baseline in comparisons. In the previous example the baseline for freemind was java - the java-1.8.0-openjdk package. The dependencies pulled in by the base are not included in the graphs and calculations.
INSIM periodically processes Fedora repos to resolve dependencies of packages using libsolv/hawkey library. Therefore the resolution algorithm is the same as with dnf. The dependency difference calculations are then done on demand.

How can you use it?

The production instance is available at http://insim.fedorainfracloud.org/insim/. The current version is not self service yet - if you want to add you own module there, you need to ping INSIM maintainer, Mikolaj Izdebski (mizdebsk@redhat.com, mizdebsk on #fedora-devel).

Links

Production instance: http://insim.fedorainfracloud.org/insim/
Official documentation: https://fedoraproject.org/wiki/Insim
Source repository and issue tracking: https://pagure.io/insim/
Announcement on fedora-devel: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/LTRYDUPS347ZNWZRE6QLVYWV2TICWQVC/

Monday, October 13, 2014

New features in mock-1.2

You may noticed there's been a new release of mock in rawhide (only). It incorporates all the new features I've been working on during my Google Summer of Code project, so I'd like to summarize them here for the people who haven't been reading my blog. Note that there were some other new features that weren't implemented by me, so I don't mention them here. You can read more about the release at http://miroslav.suchy.cz/blog/archives/2014/10/12/big_changes_in_mock/index.html.

LVM plugin
The usual way to cache already initialized buildroot is using tarballs. Mock can now also use LVM as a backend for caching buildroots which is a bit faster and enables efficient snapshotting (copy-on-write). This feature is intended to be used by people who maintain a lot packages and find themselves waiting for mock to install the same set of BuildRequires over and over again.
Mock uses LVM thin provisioning which means that one logical volume (called thinpool) can hold all thin logical volumes and snapshots used by all buildroots (you have to set it like that in the config) without each of them having fixed size. Thinpool is created by mock when it's starts initializing and after the buildroot is initialized, it creates a postinit snapshot which will be used as default. Default snapshot means that when you execute clean or start a new build without --no-clean option, mock will rollback to the state in default snapshot. As you install more packages you can create your own snapshots (usually for dependency chains that are common to many of your packages). I'm a Java packager and most of my packages BuildRequire maven-local which pulls in 100MB worth of packages. Therefore I can install maven-local just once and then make a snapshot with
mock --snapshot maven
and then it will be used as the default snapshot to which --clean will rollback whenever I build another package. When I want to rebuild a package that doesn't use maven-local, I can use
mock --rollback-to postinit
and the initial snapshot will be used for following builds. My maven snapshot will still exist, so I can get back to it later using --rollback-to maven. To get rid of it completely, I can use
mock --remove-snapshot maven
So how do you enable it?
The plugin is distributed as separate subpackage mock-lvm because it pulls in additional dependencies which are not available on RHEL6. So you first need to install it.
You need to specify a volume group which mock will use to create it's thinpool. Therefore you need to have some unoccupied space in your volume group, so you'll probably need to shrink some partition a bit. Mock won't touch anything else in the VG, so don't be afraid to use the VG you have for your system. It won't eat your data, I promise. The config for enabling it will look like this:
config_opts['plugin_conf']['root_cache_enable'] = False
config_opts['plugin_conf']['lvm_root_enable'] = True
config_opts['plugin_conf']['lvm_root_opts'] = {
    'volume_group': 'my-volume-group',
    'size': '8G',
    'pool_name': 'mock',
}

To explain it: You need to disable root_cache - having two caches with the same contents would just slow you down. You need to specify a size for the thinpool. It can be shared across all mock buildroots so make sure it's big enough. Ideally there will be just one thinpool. Then specify name for the thinpool - all configs which have the same pool_name will share the thinpool, thus being more space-efficient. Just make sure the name doesn't clash with existing volumes in your system (you can list existing volumes with lvs command). For information about more configuration options for LVM plugin see config documentation in /etc/mock/site-defaults.cfg.
Additional notes:
Mock leaves the volume mounted by default so you can easily acces the data. To conveniently unmount it, there's a command --umount. To remove all volumes use --scrub lvm. This will also remove the thinpool only if no other configuration has it's volumes there.
Make sure there's always enough space, overflown thinpool will stop working.

Nosync - better IO performance

One of the reasons why mock has always been quite slow is because installing a lot of packages generates heavy IO load. But the main bottleneck regarding IO is not unpacking files from packages to disk but writing Yum DB entries. Yum DB access (used by both yum and dnf) generates a lot of fsync(2) calls. Those don't really make sense in mock because people generally don't try to recover mock buildroots after hardware failure. We discovered that getting rid of fsync improves the package installation speed by almost a factor of 4. Mikolaj Izdebski developed small C library 'nosync' that is LD_PRELOADed and replaces fsync family of calls with (almost) empty implementations. I added support for it in mock.
How to activate it?
You need to install nosync package (available in rawhide) and for multilib systems (x86_64) you need version for both architectures. Then it can be enabled in mock by setting
config_opts['nosync'] = True
It is requires those extra steps to set up but it really pays off quickly.

DNF support
Mock now has support for using DNF as package manager instead of Yum. To enable it, set
config_opts['package_manager'] = 'dnf'
You need to have dnf and dnf-plugins-core installed. There are also commandline switches --yum and --dnf which you can use to choose the package manager without altering the config. The reason for this is that DNF is still not yet 100% mature and there may be a situation where you'd need to revert back to Yum to install something.
You can specify separate config for dnf with dnf.conf config option. If you omit it, mock will use the configuration you have for Yum (config_opts['yum.conf']). To use yum-cache with DNF you have to explicitly set
cachedir=/var/cache/yum
in the dnf.conf or yum.conf config option.
Otherwise, it should behave the same in most situations and also be a bit faster.

Printing more useful output on terminal
Mock will now print the output of Yum/DNF and rpmbuild. It also uses a pseudoterminal to trick it into believing it's attached to terminal directly and also get package downloading output including the progress bars. That way you know whether it's dowloading something or it cannot connect. You need to have debuglevel=2 in your yum.conf for this to work.

Concurrent shell acces to buildroot
Non-desctructive operations use just a shared lock intread of exclusive one. That means you can get shell even though there's a build running. Please use it with caution to not alter the environment of the running build. Destructive operations like clean still need exclusive lock.

Executing package management commands
Mock now has a switch --pm-cmd which you can use to execute arbitrary Yum/DNF command. Example:
mock --pm-cmd clean metadata-expire
There are also --yum-cmd and --dnf-cmd aliases which force using particular package manager.

--enablerepo and --disablerepo options
Passing --enablerepo/--disablerepo to package manager whenever mock invokes it. Now you can have a list of disabled repos in your mock config and enable them only when you need them.

short-circuit
rpmbuild has --short-circuit option that can skip certain stages of build. It can be very useful for debuging builds which fail in later stages. Mock now also has --short-circuit option which leverages it. It accepts a name of the stage that will be the first one to be executed. Available stages are: build, install and binary. (prep stage is also possible, but I'm not the one who added that and I have no idea what it's supposed to do :D). Example:
mock --short-circuit install foo.1.2-fc22.src.rpm
rpmbuild arguments
You can specify arbitrary options that will be passed to rpmbuild with --rpmbuild-opts. Mainly for build debugging purposes.

Configurable executable paths
Mock now also supports specifying paths to rpm, rpmbuild, yum, yum-builddep and dnf executables so you can use different than system-wide versions. This may be useful for Software Collections in the future.

Automatic initialization
You don't need to call --init, you can just do --rebuild and it will do init for you. It will also correctly detect when the initialization didn't finish succesfully and start over.

More thorough cleanup logic
There sould be no more mounted volumes left behind after you interrupted build by ^C. And if they are (i.e. because it was killed), it should handle it without crashing.

Python 3 support
Main part of mock should be fully Python 3 compatible. Python 2 is still used as default. Unported parts are LVM plugin and mockchain.

--unique-ext
This is a feature that was already present for few releases, but it seems only a few people know about it, so I'd like to mention it even though it's not new. I quite often find myself in situation when I want to build a package with the same config, but there's some other build already running, so I cannot. A lot of people just copy the config and change the name of chroot, but that means additional work and most importantly it cannot use the same caches as the original config, because mock sees them as something different. Unique-ext provides a better way. It's a commandline switch that adds a suffix to chroot name, so mock creates different chroot, but it uses the same config and in turn also same caches. Caching mechanisms provide locking to make this work. Using unique-ext with LVM plugin means that the new chroot is based on the postinit snapshot. There's a lock that prevents the postinit snapshot being unnecessarily initialized twice.

If you have any questions, ping me on #fedora-devel (nick msimacek)

Saturday, August 9, 2014

GSoC - week 10+11

Last week I've been working on improving the LVM plugin thinpool sharing capabilities. I didn't explain LVM thin-provisioning before, so I'll do it now to rationalize what I'm doing.
With classical volumes you have a volume with given size and it always occupies the whole space that was given to it at the time of creation, which means that when you have more volumes, you usually can't use the space very efficiently because some of the volumes aren't used up to the capacity whereas others are full. Resizing them is a costly and risky operation. With LVM thin-provisioning there's one volume called thinpool, which provides space to thin volumes that are created within. The thin volumes take only as much space as they need and they don't have physical size limit (unless you set virtual size limit). That means that if the space is not used by one volume it can be used by another.
Previously there was one thinpool per configuration which corresponded to one buildroot. It could have snapshots, but there was still one buildroot that could be used at the moment. Now you can use mock's unique-ext mechanism to spawn more buildroots from single config. Unique-ext was there even before I started making changes, but now I implemented proper support for it in the LVM plugin. It's a feature that was designed mostly for buildsystems, but I think it can also be very useful for regular users who want to have more builds running at the same time. With LVM plugin the thinpool and all the snapshots are shared between the unique-exts, which means you can have multiple builds sharing the same cache and each one can be based on different snapshot. The naming scheme had to be changed to have multiple working volumes where the builds are executed. Mock implements locking of the initial volume creation, so if you launch two builds from the same config and there wasn't any snapshot before, only one of the processes will create the initial snapshot with base packages. The other process will block for that time, because otherwise you'll end up with two identical snapshots and that would be a waste of resources.
Other sharing mechanism that is now implemented is sharing the thinpool among independent configs. Then the snapshots aren't shared because the configs can be entirely different (for example different versions of Fedora), but you can have only one big logical volume (the thinpool) for all mock related stuff, which can save a lot of space for people that often use many different configs. You can set it with config_opts['pool_name'] = 'my-pool' and then all the configs with pool_name set to the same name will share the same thinpool.
Other than that I was mostly fixing bugs and communicating with upstream.

This week I've been on Flock and it has been amazing. There were some talks that are relevant for mock, most notably State of Copr build service, which will probably use some of the new features of mock in the future and Env&Stacks WG plans, which also mentioned mock improvements as one of their areas of interest.

Thursday, July 24, 2014

GSoC - Mock improvements - week 9

This week I've been mostly focusing on minor improvements and documentation (manpages). Almost all my changes were already submitted upstream and if everything goes well, you can expect a new release of mock to be available in rawhide in the near future. I merged changes from the ready branch to master so now they should differ only in minor things. (Sorry for duplicates in git history, I didn't realize that beforehand)

Support for Mikolaj's nosync external library was added and the old implementations that existed as a part of mock were dropped. You can enable it by setting
config_opts['nosync'] = True
and you have to install the nosync library (mock doesn't require it in the specfile, because it's not available everywhere). If the target is multilib, you need both aritectures of the library to be installed in order to have a preload library for both types of executables. If you don't, it will print a warning and won't be activated. If you can't install both versions and still want to use it, set
config_opts['nosync_force'] = True
but expect a lot of (harmless) warnings from ld.so.  The library is available in rawhide (your mirrors might not have picked it yet)

LVM plugin was moved to separate subpackage and conditionaly disabled on RHEL 6, since it requires lvm2-python-libs and newer kernel and glibc (for setns). One of the things that I needed to sacrifice when I was making the LVM plugin was the IPC namespace unsharing, which mock uses for a long time. The problem was that lvcreate and other commands deadlocked on unitialized semaphore in the new namespace, so I temporarily disabled it and hoped I'll find a solution later. And I did, I wrapped all functions that manipulate LVM in function that calls setns to get back to global IPC namespace and after the command is done, it call setns again to get back to mock's IPC namespace.

One of the other problems I encountered is Python's (2.7) handling of SIGPIPE signal. It sets it to ignored and doesn't set it back to default when it executes a new process, so a shell launched from Python 2 (by Popen, or execve) doesn't always behave the same as regular shell.
Example in shell:
$ cat /dev/zero | head -c5
# cat got SIGPIPE and exited without error

$ python -c 'import subprocess as s; s.call(["bash"])'
$ cat /dev/zero | head -c5
cat: write error: Broken pipe
# SIGPIPE was ignored and cat got EPIPE from write()

It can be fixed by calling
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
in Popen's preexec_fn argument.

For cat it's just an example and it didn't make much difference. But if you put tee between the cat and head, it will loop indefinitely instead of exiting after first 5 bytes. And there are lots of scripts out there relying on the standard behavior. It actually bit me in one of my other programs, so I thought it's worth sharing and I also fixed it in mock.

Friday, July 18, 2014

GSoC - Mock improvements - week 8

Good news, we're merging my changes upstream. It's been a lot of changes and the code wasn't always in the best shape, so I didn't want to submit it before all major features are implemented. Mirek Suchy agreed he'll do the code review and merge the changes. Big thanks to him for that :)
I've setup a new branch, rebased it on top of current upstream and tried to revisit all my code and get rid of changes that were reverted/superseded, or are not appropriate for merging yet.  I squashed fixup commits to their originl counterparts to reduce the number of commits and changed lines.
The changes that weren't submitted are:
  • C nofsync library, because Mikolaj made a more robust nosync library that is packaged separately, and therefore supersedes the bundled one.
    Link: https://github.com/kjn/nosync/
    I did the review: https://bugzilla.redhat.com/show_bug.cgi?id=1118850
    That way mock can stay noarch, which gets rid of lots of packaging issues. And also saves me a lot problems with autoconf/automake. There is no support for it yet, because I need to figure out how to make it work correctly in multilib environment.
  • nofsync DNF plugin - it's an ugly DNF hack and I consider it superseded by aforementioned nosync library
  • noverify plugin - it's also a DNF hack, I will make a RFE for optional verification in DNF upstream instead
Everything else was submitted including the LVM plugin.  The merging branch is not pushed on github because I frequently need to make changes by interactive rebasing and force pushing the branch each time kind of defeats the purpose of SCM.
Other than that I was mostly fixing bugs, the only new features are the possibility of specifying additional commandline options to rpmbuild, such as --rpmfcdebug with --rpmbuild-opts option and ability to override command executable paths for rpm, rpmbuild, yum, yum-builddep and dnf, in order to be able to use different version of the tools than the system-wide version.

Saturday, July 12, 2014

GSoC 2014 - week 7

Hi again, I'm sorry I didn't post last week, because I've been on a vacation.
Here's what I've done this week:

Passing additional options to underlying tools
rpmbuild has an option --short-circuit that skips stages of build preceding the specified one. It doesn't build a complete RPM package, but it's very handy for debugging builds that fail, especially in the install section. But this option is not accessible from within mock and I already mentioned in my proposal that I want to make it available. The option is also called --short-circuit and it accepts an argument - either build, install, or binary, representing the build phase that will be the first while the preceding phases would be skipped.
Example invocation:
$ mock rnv-1.7.11-6.fc21.src.rpm --short-circuit install

For Yum or DNF some of the options that are often used when user invokes the package manager directly also weren't available in mock. --enablerepo and --disablerepo are very common ones and now they are also supported by mock - they're directly passed to the underlying package manager.
Example invocation:
$ mock --install xmvn maven-local --enablerepo jenkins-xmvn --enablerepo jenkins-javapackages
The repos of course have to be present in the yum.conf in mock config.

Python 3 support
I started working on porting mock to Python 3. This doesn't mean that mock will run on Python 3 only, I'm trying to preserve compatibility with Python 2.6 without the need to have two version of mock for each. I changed the trace_decorator to use regular Python decorators instead of peak.utils.decorate and dropped dependency on the decoratortools package. There are slight changes in traceLog's output, that I don't consider important, but if someone did, it could be solved by using python-decorator package, which is available for both versions. There are some features that are still untested, but the regularly used functionality is already working. Rebuilding RPMs, SRPMs, working in shell, manipulating packages is tested. The plugins, that are enabled by default (yum-cache, root-cache, ccache, selinux) also work. What doesn't work is the LVM plugin, because it uses lvm2-python-libs, which doesn't have a Python 3 version yet. Same applies to mockchain, which uses urlgrabber. To try mock with Python 3, either change your system default Python implementation or manually hardcode python3 as the interpreter to the shebang in /usr/sbin/xmock.

Wednesday, June 25, 2014

GSoC week 5

Nofsync
I created another implementation of nofsync plugin (disables fsync(), makes it much faster), this time in python as DNF plugin that disables fsyncing in the YumDB. It is a small bit slower than the C library using LD_PRELOAD, because it doesn't eliminate fsyncs made from scriptlets (by gtk-update-icon-cache and such). But it's much simpler from packaging perspective (mock can stay noarch) and could be actually upstreamable (in dnf), because there are some other use cases, where you don't try to recover from hardware failure anyway - for example anaconda. If the power goes down, you probably don't try to resume existing installation. And this could make it faster (nofsync makes package installation approximately 3 times faster).
To compare the two implementations, set either
config_opts['nofsync'] = 'python'
or
config_opts['nofsync'] = 'ld_preload'
Default is python, to disable it, set the option to something else (empty string)

LVM support
Last week I implemented base for LVM plugin for mock using regular snapshots. This week I rewrote the plugin to use LVM Thin snapshots, which offer better performance, flexibility and share the space with the original volume and other snapshots, therefore don't waste much space. I created basic commands that can be used to manipulate the snapshots.
Example workflow:
I'll try to demonstrate how building different packages can be faster with LVM plugin. Let's repeat the configuration options necessary to set it up:
config_opts['plugin_conf']['root_cache_enable'] = False
config_opts['plugin_conf']['lvm_root_enable'] = True
config_opts['plugin_conf']['lvm_root_opts'] = {
    'volume_group': 'my-volume-group',

}
You can now also specify 'mount_options', which will be passed to -o option of mount. To set size to larger than the default 2GB, use for example 'size': '4G' (it is passed to lvcreate's -L option, so it can be any string lvcreate will understand). Now let's initialize it:
$ mock --init
Mock will now create thin pool with given size, create a logical volume in it, mount it and install the base packages into it. After the initialization is done, it creates a new snapshot named 'postinit', which will be then used to rollback changes during --clean (which is by default also executed as part of --rebuild). Now try to install some packages you often use for building your own packages. I'm a Java packager and almost every Java package in Fedora requires maven-local to build.
$ mock --install maven-local
But now since I want to rebuild more Java packages, I'd like to make snapshot of the buildroot.
$ mock --snapshot mvn
This creates a new snapshot of the current state and sets it as the default. We can list snapshots of current buildroot with --list-snapshots command (the default snapshot is prefixed with asterisk)
$ mock --list-snapshots
Snapshots for mock-devel:
  postinit
* mvn


So let's rebuild something
$ mock --rebuild jetty-9.2.1-1.fc21.src.rpm
$ mock --rebuild jetty-schemas-3.1-3.fc21.src.rpm
Because the 'mvn' snapshot was set as the default, it means that each clean executed as part of the rebuild command didn't return to the state in 'postinit', but to the state in 'mvn' snapshot. And that was the reason we wanted LVM support in the first place - it didn't have to install 300+MB of maven-local's dependencies again (with original mock, this would probably take more than 3 minutes) but still the buildroot was cleaned of the packages pulled in by previous build. We could then install some additional packages, for example eclipse, and make a snapshot that can be used to build eclipse plugins.
Now let's pretend there has been an update to my 'rnv' package, which is in C and doesn't use maven-local.
$ mock --rollback-to postinit
$ mock --list-snapshots
  mvn
* postinit
Now 'postinit' snapshot was set as default and buildroot has been restored to the state it was in when 'postinit' snapshot was taken (after initialization, no maven-local there). The 'mvn' snapshot is retained and we can switch back again using --rollback-to mvn.
So now I can rebuild my hypothetical rnv update. If I decide that I don't need the 'mvn' snapshot anymore, I can remove it with
$ mock --remove-snapshot mvn
You cannot remove 'postinit' snapshot. To remove all logical volumes belonigng to the buildroot, use mock --scrub lvm
 
So that's it. You can create as many snapshots as you want (and snapshots of snapshots) and keep a hierarchy of them to build packages that have different sets of BuildRequires.
Few more details:
  • The real snapshot names passed to LVM commands have root name prefixed to avoid clashes with other buildroots or volumes that don't belong to mock at all. It also checks whether the snapshots belong to mock's thinpool.
  • The volume group needs to be provided by user, mock won't create one. It won't touch anything else besides the thinpool, so it should be quite safe if it uses the same volume group as you system (I have it like that).
  • The command names suck. I know. I'll try to provide short options for them.
  • If you try the version in my jenkins repository, everything is renamed to xmock including the command - to allow it to exist alongside original mock.