Wednesday, May 28, 2014

GSoC 2014 - week 2

It's been a week (and one day) since my last post, so I'd like to share my progress. I've been continuing with refactoring the mock initialization and splitting the functionality into smaller pieces.

DNF support
I've implemented suport for using DNF as a package manager instead of Yum. Both package managers are used through a common abstract class and which one is used is decided based on the configuration ('package_manager' option, which defaults to yum when unspecified). This behavior can be overriden by explicitly passing --yum or --dnf commandline options. There have been some concerns whether the installroot support in DNF is working correctly, but I've been able to do a rebuild of a package using DNF exclusively and it worked fine without any changes to the config.

Direct interaction with package manager
One of the features I mentioned in my proposal was providing direct interaction with the package manager. Current mock allows invoking only small subset of commands that Yum/DNF provide. I added a '--pm-cmd' command that allows invoking arbitrary package management commands. There are also '--yum-cmd' and '--dnf-cmd' that are just shortcuts to specifying '--yum' or '--dnf' and the '--pm-cmd' option.
Example:
mock --yum-cmd reinstall expat-devel

Mock interactivity
The week before I implemented shared locking that allows multiple mock processes to operate on the same buildroot and I was able to get more instances of mock shell. But that was only part of what needed to be done, because other mock commands had their own cleanup logic, thus usually killing still running processes and clashing while unmounting shared devices. I've been working on integrating the new initialization/cleanup scheme into all remaining commands and now there shouldn't be such clashes anymore. I've also made the cleanup more reliable (runs even if exception is raised) and the initialization more resistant - dealing with possible remaining mounts from previous run without needing the user to unmount them manually. I'm also trying to make the startup faster by marking what parts of initialization which were already done, therefore don't need to be repeated.

Next steps
I'd like to start making foundation for the main feature I want to implement - support for using LVM for storing buildroot images and therefore being able to leverage it's snapshoting features.

Wednesday, May 21, 2014

GSoC 2014: mock improvements - week 1

My proposal for Google Summer of Code 2014 - improving the mock tool has been accepted. In the next few months I'm going to focus on improving this tool used in packaging to facilitate faster and more effective workflow for packagers.

What do I want to improve?
(This section is just recapitulation of the proposal, if you already read it, skip it)
  • Better caching - imporve the performance and provide easier mechanisms for cloning and manipulating buildroots. Currently, mock uses tarballs for keeping the buildroot cache and extracts them when needed. This is slow and not very flexible. Using snapshots would provide a way to conveniently save, restore or clone the current state of the buildroot. This could speed up the workflow of packagers who have to work with many packages at once. Current options are either cleaning the buildroot with each package which takes a long time to since dependencies have to be installed again. Or manually copying the buildroot which is also slow and inconvenient. Or not cleaning the mock at all which can make some packaging mistakes (missing BuildRequires) go unnoticed. My proposed modification could present another workflow that could be faster than cleaning with each rebuild but not error prone as not cleaning at all. For example Java packages built with Maven could save a buildroot snapshot with maven-local and its long chain of dependencies and reuse it for multiple packages.
  • Provide a way to revert recent changes without having to clean the buildroot and install all dependencies over again. A similar idea to the previous. Mock could automatically make a snapshot after installing package's build dependencies, but before actually building it. This could provide a way to do manual modifications to buildroot without having to reinstall dependencies after you turned your immediate changes into patches and you want to rebuild it to see whether they work.
  • Ability to work interactively - add realtime logging to the terminal. Packager should immediately see the output of the package management system and the build system on his terminal without having to look for the logs stored deep in the filesystem
  • Allow to use mock shell during running build - mock uses one global lock per buildroot which prevents you to use the mock shell when there is already a build in progress and vice versa. More finely-grained locking would be more appropriate, because packager usually doesn't want to interfere with the running build, but just query some information or copy files.
  • Allow using rpmbuild's short-circuit mechanisms to reinitiate failed build without having to start over - rpmbuild provides short-circuit option that starts the execution of the build in the given phase skipping previous phases. If the package built fine but the file verification failed, don't force the packager to repeat the whole process from the very beginnning.
  • Use DNF instead of YUM to install packages - DNF is a modular package manager that is meant to replace YUM in the future. Since it is already usable, mock should be able to use it instead.
  • Provide more ways to interact with the package management system within the mock - using DNF straight from the mock shell could be more straightforward than interacting with the package manager via mock command line interface.
  • Handle user interrupts without corrupting the buildroot or leaving still running processes - If you run a mock build and realize you made a mistake and want to stop it with <C-c>, there is a high probability that your buildroot will not be usable for the next build or there will be some remaining processes that weren't terminated when you interrupted. It also should provide a way to pause the whole build, in case you need more computational power or your battery is running low due to increased resource usage
The repository
I've setup a git fork of the upstream git repository based on the msuchy-work branch. You can see my changes here: https://github.com/msimacek/mock

What have I already done?
Started refactoring the mock backend, which currently lacks the abstraction needed for implementing aforementioned features. Mock started as a simple script and evolved as such. The functionality is scattered across the modules and lots of the commands are big monolithic sequences of function calls performing different things that could and should be split into smaller parts. My mentor already suggested making a set of small, atomic commands forming a low-level interface on which the high-level and more complex commands would be built. I began with refactoring the locking scheme. The current implementation allows only one running instance of mock per buildroot by using an exclusive file-based lock. The functions that operate on the buildroot are responsible for locking the buildroot, performing the initialization and the cleanup. And sometimes those actions are done directly from main. I'm trying to factor out the common parts, so the buildroot preparation and finalization is a responsibility of single function and the commands can assume that the builroot is already prepared and will be cleaned up after they finish without their direct intervention.
I changed the locking scheme to use the exclusive lock only when the operations are destructive (buildroot cleaning/deleting). For non-destructive commands the buildroot is locked exclusively only for its initialization and only if it wasn't initialized already. Then it is locked with a shared lock, allowing multiple instances at the same time to coexist. If the buildroot is locked with a shared lock, we know that there is still some mock process working. Thus, the initialization is done only by the first process to enter buildroot and the cleanup actions are performed only by the process that happens to be the last. This allows for example having a running build and still logging into the mock shell and examining the build environment. Further redesigning is needed to enable the possibility of having lvm-based storage with snapshot abilities, which will require slightly different initialization and finalization to work.

Another part where an additional abstraction will be needed is the package management system. Currently, only yum is supported and the command is hardcoded, therefore adding support for dnf needs some refactoring. I created an abstract class providing a common interface for package management, which will then have subclasses Yum and Dnf that would handle specifics of those two systems. The current implementation I have is just a skeleton denoting how it may look like, but it needs to be integrated to the rest of the mock and replace hardcoded invocations of yum.

What I plan to do next?
Continue with refactoring. Apropriate abstraction is essesntial to be able to add new features and fix existing issues. The whole initialization part is quite messy and mixes input parsing, config file parsing, initialization and performing the actual commands into one big part. The backend should be divided into low-level atomic commands and high-level ones, similarly to git which has a plumbing a layer and a porcelain layer built on top of it. It should also provide direct acces to the underlying package manager (currently yum) in order to not restrict users to a particular set of commands and enable them to utilize its full power.