One

“One Ring to rule them all”

– J.R.R. Tolkien, The Lord of the Rings

It has been done.

LibreOffice is now built by a single instance of make that is aware of the whole dependency tree. According to my master development build yesterday (that is, a build without localization, help, or extensions), this instance of make now knows about

126,501 targets from 1,717 makefiles

and has a complete view of how they relate to each other. The memory usage of make is at 207 MiB, only slightly overshooting the initial estimate of 170-190 MiB made in the early days of gbuild (considering that the codebase changed a lot in two years, the estimate is actually really good). Given that recursive make is considered harmful, and that LibreOffice can do this, there is little excuse left for other projects not to follow suit. LibreOffice is, after all, one of the biggest open source projects, with huge dependencies and releases on three platforms (Windows, OS X and Unix, or a lot more if you count the different Unix flavours separately).
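
To make "one instance of make that sees the whole tree" concrete at toy scale, here is a minimal sketch of a non-recursive setup. The directory names liba and app and the file name module.mk are made up for illustration; this is not LibreOffice's layout, and gbuild itself is far more elaborate. The point is only that the top-level Makefile includes the per-directory fragments instead of spawning a sub-make per directory, so a dependency that crosses directories is visible to the single make process.

```shell
# Create a throwaway two-"module" project in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/liba" "$demo/app"

# Top-level Makefile: includes the fragments rather than recursing into them.
cat > "$demo/Makefile" <<'EOF'
.PHONY: all
all: app/app.out
include liba/module.mk
include app/module.mk
EOF

# Each fragment declares its targets with paths relative to the top level.
# The "target: prerequisite ; recipe" form avoids tab-indented recipe lines.
cat > "$demo/liba/module.mk" <<'EOF'
liba/liba.out: liba/liba.in ; cp liba/liba.in liba/liba.out
EOF

# app depends on a file produced in another directory; make can see that edge.
cat > "$demo/app/module.mk" <<'EOF'
app/app.out: liba/liba.out ; cp liba/liba.out app/app.out
EOF

echo hello > "$demo/liba/liba.in"

if command -v make >/dev/null 2>&1; then
    make -s -C "$demo"                                  # builds liba first, then app
    make -s -q -C "$demo" && echo "nothing to rebuild"  # -q only asks, runs nothing
else
    echo "make is not installed; skipping the build step" >&2
fi
```

The second invocation uses make's question mode (-q), which exits with status 0 when everything is up to date. It is the toy equivalent of the "is there anything to do?" check timed in this post.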

On my machine, checking whether anything needs to be rebuilt in LibreOffice now takes ~28.7 sec (or 37.2 sec when also running the default sanity checks along with it). That might sound like a lot, but consider the scale! And it is a long way from the old OpenOffice.org build system that we came from: from memory, it took about 5 minutes to do that on the old build system. On Windows it took almost 30 minutes to find out that there was nothing to do. If you find these numbers hard to believe, one of my earliest talks (Slide 29) on the topic of gbuild compared the performance of partial builds. Oh, and of course you can still check only a subset of LibreOffice (a “module”, e.g. Writer) for updates, and that takes only 2-3 seconds even for the biggest ones.

How gbuild spends the 37 seconds to ensure that nothing needs to be rebuilt: orange = reading/parsing the makefiles, i.e. the definition of targets (singlethreaded, CPU-bound), grey = stat’ing and checking the filesystem, blue = running sanity tests (multithreaded)

Does this difference in performance matter? As Linus argued so eloquently in his Google Tech Talk on git: yes, it does, because it enables different ways of working that just were not possible before. One such example is that we can have incremental build tinderboxes like the Linux-Fedora-x86_64_22-Incremental one, which by now has a turnaround of some 3-5 minutes most of the time and quickly reports if something was broken.

Other things have improved with the new build system too. For example, in the old build system, if you wanted to add a library, you had to touch a lot of places (at minimum: makefile.mk for building it, prj/d.lst for copying it, solenv/inc/libs.mk for others to be able to link to it, scp2 to add it to the installation, and likely some other things I have forgotten), while now you only have to modify two places: one to describe what to build and one to describe where it ends up in the install. So while the old build system was like a game of Jenga, we can now move more confidently and quickly.

Touching the old build system was like a game of Jenga. Except that it wasn’t fun. (Photo: Copyright CC BY-NC-SA 2.0 Jose Hernandez)

Then there is scalability: the old build system did not scale well beyond 4-8 jobs, as it had no global notion of how make jobs were running. As CPU architectures with slower but cheaper cores become more important, this is getting increasingly relevant. Do you have a 1000 core distcc cluster you want to test-drive? LibreOffice might be the project you want to try.

Finally, the migration to gbuild is proof of how amazing the community growing around the project is: while I set up the initial infrastructure for gbuild, the hard work of migrating over 200 modules (each the size of your average open source project) to it, without breaking any of the three platforms or disrupting the ongoing development of features and bugfixes, was mostly done by a crowd of volunteers. Looking back, I doubt the migration to gbuild would have been completed in reasonable time in an environment less inviting to volunteers and contributors: it was the distribution of the work that made this possible. So the credit for the benefits of gbuild that we can now profit from really goes to these guys. Big kudos to everyone working on this, you created something amazing!

Addendum: This post has been featured on LWN and led to a spirited discussion there.

Notes:

For estimating the number of targets, I used:

make -f Makefile -np all slowcheck|grep 'File.*update'|wc -l
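
For readers unfamiliar with that pipeline: -n makes it a dry run and -p makes GNU make print its internal database, in which every file it knows about gets a status comment such as "File has been updated." or "File has not been updated.". Counting those lines counts the targets. Here is a minimal sketch of the counting step on a hand-written sample; the function name count_make_targets and the sample database are invented for illustration, and grep -c stands in for grep piped into wc -l.

```shell
# Count the targets in a GNU make database dump (make -p output) read from stdin.
count_make_targets() {
    grep -c 'File.*update'
}

# Hand-written excerpt in the shape of a make -p dump; not real LibreOffice output.
sample_db='# Files
foo.o: foo.c
#  File has been updated.
bar.o: bar.c
#  File has not been updated.
all: foo.o bar.o
#  File has not been updated.'

printf '%s\n' "$sample_db" | count_make_targets   # prints 3
```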

For the memory usage:

pmap -d $(ps -a|grep make|cut -f1 -d' ')|egrep -o 'writeable/private:.[0-9]+K'|cut -f 2 -d' '
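
The summary line that pmap -d prints at the end reports the writeable/private memory in KiB, which is what the pipeline above picks out. Here is a minimal sketch of that extraction plus the conversion to MiB, run on a hand-written summary line. The function name private_mib is made up, and the sample numbers are invented so that the result lines up with the 207 MiB quoted in this post. It assumes the input describes a single process.

```shell
# Read pmap -d output on stdin, pick the writeable/private figure (in KiB)
# from the summary line and print it in whole MiB.
private_mib() {
    kib=$(grep -Eo 'writeable/private: *[0-9]+K' | grep -Eo '[0-9]+')
    echo "$((kib / 1024)) MiB"
}

# Hand-written line in the shape of the pmap -d summary; the numbers are invented.
sample='mapped: 260000K    writeable/private: 211968K    shared: 28K'

printf '%s\n' "$sample" | private_mib   # prints 207 MiB
```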

About bmichaelsen

productivity liberator

14 responses to “One”

  1. HelloWorld says :

    37 seconds is certainly better than 5 minutes. But you know what? It’s still way too slow, for my taste anyway.

    If you want things to be FAST, get rid of make and try tup:

    http://www.gittup.org/tup/

    • bmichaelsen says :

      Patches welcome! ;)

      But seriously: I wasn’t even trying hard. As you can see above, of the 37 seconds, 8 seconds are spent executing unit tests. Removing that would bring the time down to ~30 seconds. Now I happened to just use the latest upstream version of GNU make, which regresses a bit compared to GNU make 3.81. If I use that version and put myself in full Gentoo ricer mode (those were the days) and therefore compile it with CFLAGS="-O3 -march=native", I can get the time down to real 0m19.695s. If I then skip reading the 186MB of files generated by “gcc -MM”, it’s down to 0m10.962s.

      Also note that the depth and size of the LibreOffice dependency tree is something to take into account: the tup examples are way too tame for my taste in that regard. Even a “time find ../workdir/unxlngx6.pro/Dep/ -type f|xargs cat|wc -l” takes 0m0.936s on my machine (or 5% of the total make time) just for the most trivial parsing of those 186MB, and that doesn’t even store the file names in a data structure. “time find ../workdir/unxlngx6.pro/Dep/ -type f|xargs cat|sort > /dev/null” already takes … 3m32.874s.

      So tup and also e.g. http://martine.github.com/ninja/ are certainly interesting projects, but I don’t think that walking the graph is actually the important place to optimize. If one wants to optimize something, I think the best bets are to a/ go for the hashtables where the filenames are stored and b/ create a “dumb dependency only mode” in make, in which the makefile parser drops to a much simpler submode that only allows declaring dependencies between files. Both could be done directly in GNU make.

      But that would kinda miss the point, which was to make LibreOffice build on something that is pretty much universally available and hopefully doesn’t force us to maintain our own tools on three platforms. GNU make was a conservative bet there, but I think it’s still a good one: I don’t see it dying in the next 10 years, nor us having to take over maintainership (as was the case for dmake).

      Do I think you should use gbuild to build your project? Most likely not: unless you need complex stuff like building libs for different directories and rpathing to each other on three platforms (OS X is especially fun on that one), precompiled headers on Windows, or optionally building your project against external or internal versions of a library etc., it is most likely overkill.

      But I suggest you look hard at not using automake and just writing plain makefiles instead. If you tried automake for a project as complex as LibreOffice, the result would look even more horrible than your average automake project, if you even got it to work with the bazillion special cases. And if your project is sweet and small, your plain makefiles will be a lot simpler, faster and clearer than anything automake does.

      • Thomas says :

        Out of curiosity, do you use a ramdisk or something similar to build? Maybe a fast SSD?

      • bmichaelsen says :

        Yes, this particular build was done on tmpfs. Though the performance gains (on Linux) aren’t that huge: once you have enough RAM you have already won, as the kernel cache does a really good job even without tmpfs.

        From what I see, having enough RAM is king. Once you have enough of it, an SSD is not needed anymore. In fact, that machine still has plain old magnetic discs.

        Another reason for RAM, apart from being faster: you don’t have to worry about disc wear. On the machine where I test the Ubuntu release builds I have an SSD. With the size of the LibreOffice build (esp. in a release configuration) and when doing full builds from scratch, that possibly churns away quite a bit.

        Anyway: That SSD is still chugging along nicely …

  2. kowalski marcin says :

    Being a Gentoo user, I would like to know which version of LibreOffice will feature this change.

    Is that something that will appear in the nearest minor version, or a feature put off for the next major release?

    • bmichaelsen says :

      The migration to gbuild was done step by step, or rather module by module. LibreOffice 4.0 already builds almost everything with it, but things like getting the job management right only work with everything in gbuild. This will be in LibreOffice 4.1.x and, no, I don’t think we will backport that to 4.0.x ;)

  3. foo says :

    Do you have pointers to resources on how to craft a good build system like gbuild using make? I mean something that would go beyond the O’Reilly book, which I think doesn’t go very far.
    The GNU Make manual is fine, but it merely suggests what basic mechanisms are available, I’d like to see the bigger picture of what can be achieved.

  4. Vasileios Anagnostopoulos says :

    Does this mean that mingw builds are easier now?

    • bmichaelsen says :

      Not per se, but I would be tempted to entertain the idea that the cleaned up makefiles make it less likely for a change to break stuff on one platform, but not the others.
