50 ways to fill your vector …

“The problem is all inside your head” she said to me
“The answer is easy if you take it logically”
— Paul Simon, 50 ways to leave your lover

So recently I tweaked around with these newfangled C++11 initializer lists and created an EasyHack to use them to initialize property sequences in a readable way. This caused a short exchange on the LibreOffice mailing list, which I assumed had its part in motivating Stephans interesting post “On filling a vector”. For all the points being made (also in the quick follow up on IRC), I wondered how much the theoretical “can use a move constructor” discussed etc. really meant when the C++ is translated to e.g. GENERIC, then GIMPLE, then amd64 assembler, then to the internal RISC instructions of the CPU – with multiple levels of caching in addition.

So I quickly wrote the following (thanks so much for C++11 having the nice std::chrono now).

data.hxx:

#include <vector>
struct Data {
    Data();
    Data(int a);
    int m_a;
};
void DoSomething(std::vector<Data>&);

data.cxx:

#include "data.hxx"
// noop in different compilation unit to prevent optimizing out what we want to measure
void DoSomething(std::vector<Data>&) {};
Data::Data() : m_a(4711) {};
Data::Data(int a) : m_a(a+4711) {};

main.cxx:

#include "data.hxx"
#include <iostream>
#include <vector>
#include <chrono>
#include <functional>

void A1(long count) {
    while(--count) {
        std::vector<Data> vec { Data(), Data(), Data() };
        DoSomething(vec);
    }
}

void A2(long count) {
    while(--count) {
        std::vector<Data> vec { {}, {}, {} };
        DoSomething(vec);
    }
}

void A3(long count) {
    while(--count) {
        std::vector<Data> vec { 0, 0, 0 };
        DoSomething(vec);
    }
}

void B1(long count) {
    while(--count) {
        std::vector<Data> vec;
        vec.reserve(3);
        vec.push_back(Data());
        vec.push_back(Data());
        vec.push_back(Data());
        DoSomething(vec);
    }
}

void B2(long count) {
    while(--count) {
        std::vector<Data> vec;
        vec.reserve(3);
        vec.push_back({});
        vec.push_back({});
        vec.push_back({});
        DoSomething(vec);
    }
}

void B3(long count) {
    while(--count) {
        std::vector<Data> vec;
        vec.reserve(3);
        vec.push_back(0);
        vec.push_back(0);
        vec.push_back(0);
        DoSomething(vec);
    }
}

void C1(long count) {
    while(--count) {
        std::vector<Data> vec;
        vec.reserve(3);
        vec.emplace_back(Data());
        vec.emplace_back(Data());
        vec.emplace_back(Data());
        DoSomething(vec);
    }
}

void C3(long count) {
    while(--count) {
        std::vector<Data> vec;
        vec.reserve(3);
        vec.emplace_back(0);
        vec.emplace_back(0);
        vec.emplace_back(0);
        DoSomething(vec);
    }
}

double benchmark(const char* name, std::function<void (long)> testfunc, const long count) {
    const auto start = std::chrono::system_clock::now();
    testfunc(count);
    const auto end = std::chrono::system_clock::now();
    const std::chrono::duration<double> delta = end-start;
    std::cout << count << " " << name << " iterations took " << delta.count() << " seconds." << std::endl;
    return delta.count();
}

int main(int, char**) {
    long count = 10000000;
    while(benchmark("A1", &A1, count) < 60l)
        count <<= 1;
    std::cout << "Going with " << count << " iterations." << std::endl;
    benchmark("A1", &A1, count);
    benchmark("A2", &A2, count);
    benchmark("A3", &A3, count);
    benchmark("B1", &B1, count);
    benchmark("B2", &B2, count);
    benchmark("B3", &B3, count);
    benchmark("C1", &C1, count);
    benchmark("C3", &C3, count);
    return 0;
}

Makefile:

CFLAGS?=-O2
main: main.o data.o
    g++ -o $@ $^

%.o: %.cxx data.hxx
    g++ $(CFLAGS) -std=c++11 -o $@ -c $<

Note the object here is small and trivial to copy as one would expect from objects passed around as values (as expensive to copy objects mostly can be passed around with a std::shared_ptr). So what did this measure? Here are the results:

Time for 1280000000 iterations on a Intel i5-4200U@1.6GHz (-march=core-avx2) compiled with gcc 4.8.3 without inline constructors:

implementation / CFLAGS -Os -O2 -O3 -O3 -march=…
A1 89.1 s 79.0 s 78.9 s 78.9 s
A2 89.1 s 78.1 s 78.0 s 80.5 s
A3 90.0 s 78.9 s 78.8 s 79.3 s
B1 103.6 s 97.8 s 79.0 s 78.0 s
B2 99.4 s 95.6 s 78.5 s 78.0 s
B3 107.4 s 90.9 s 79.7 s 79.9 s
C1 99.4 s 94.4 s 78.0 s 77.9 s
C3 98.9 s 100.7 s 78.1 s 81.7 s

creating a three element vector without inlined constructors
And, for comparison, following are the results, if one allows the constructors to be inlined.
Time for 1280000000 iterations on a Intel i5-4200U@1.6GHz (-march=core-avx2) compiled with gcc 4.8.3 with inline constructors:

implementation / CFLAGS -Os -O2 -O3 -O3 -march=…
A1 85.6 s 74.7 s 74.6 s 74.6 s
A2 85.3 s 74.6 s 73.7 s 74.5 s
A3 91.6 s 73.8 s 74.4 s 74.5 s
B1 93.4 s 90.2 s 72.8 s 72.0 s
B2 93.7 s 88.3 s 72.0 s 73.7 s
B3 97.6 s 88.3 s 72.8 s 72.0 s
C1 93.4 s 88.3 s 72.0 s 73.7 s
C3 96.2 s 88.3 s 71.9 s 73.7 s

creating a three element vector without inlined constructors
Some observations on these measurements:

  • -march=... is at best neutral: The measured times do not change much in general, they only even slightly improve performance in five out of 16 cases, and the two cases with the most significant change in performance (over 3%) are actually hurting the performance. So for the rest of this post, -march=... will be ignored. Sorry gentooers. ;)
  • There is no silver bullet with regard to the different implementations: A1, A2 and A3 are the faster implementations when not inlining constructors and using -Os or -O2 (the quickest A* is ~10% faster than the quickest B*/C*). However when inlining constructors and using -O3, the same implementations are the slowest (by 2.4%).
  • Most common release builds are still done with -O2 these days. For those, using initializer lists (A1/A2/A3) seem too have a significant edge over the alternatives, whether constructors are inlined or not. This is in contrast to the conclusions made from “constructor counting”, which assumed these to be slow because of additional calls needed.
  • The numbers printed in bold are either the quickest implementation in a build scenario or one that is within 1.5% of the quickest implementation. A1 and A2 are sharing the title here by being in that group five times each.
  • With constructors inlined, everything in the loop except DoSomething() could be inline. It seems to me that the compiler could — at least in theory — figure out that it is asked the same thing in all cases. Namely, reserve space for three ints on the heap, fill them each with 4711 and make the ::std::vector<int> data structure on the stack reflect that, then hand that to the DoSomething() function that you know nothing about. If the compiler would figure that out, it would take the same time for all implementations. This doesnt happen either on -O2 (differ by ~18% from quickest to slowest) nor on -O3 (differ by ~3.6%).

One common mantra in applications development is “trust the compiler to optimize”. The above observations show a few cracks in the foundations of that, esp. if you take into account that this is all on the same version of the same compiler running on the same platform and hardware with the same STL implementation. For huge objects with expensive constructors, the constructor counting approach might still be valid. Then again, those are rarely statically initialized as a bigger bunch into a vector. For the more common scenario of smaller objects with cheap constructors, my tentative conclusion so far would be to go with A1/A2/A3 — not so much because they are quickest in the most common build scenarios on my platform, but rather because the readability of them is a value on its own while the performance picture is muddy at best.

And hey, if you want to run the tests above on other platforms or compilers, I would be interested in results!

Note: I did these runs for each scenario only once, thus no standard deviation is given. In general, they seemed to be rather stable, but this being wallclock measurements, one or the other might be outliers. caveat emptor.

SwNodeIndex: Ludicious Speed

No-no-no, light speed is too slow!
Yes, we’ll have to go right to… ludicrous speed!

– Dark Helmet, Spaceballs

So, I recently brought up the topic of writers notes in the LibreOffice ESC call. More specifically: the SwNodeIndex class, which is, if one broadly simplifies an iterator over the container holding all the paragraphs of a text document. Before my modifications, the SwNodes container class had all these SwNodeIndices in a homegrown intrustive double linked list, to be able to ensure these stay valid e.g. if a SwNode gets deleted/removed. Still — as usual with performance topics — wild guesses arent helpful, and measurements should trump over intuition. I used valgrind for that, and measured the number of instructions needed for loading the ODF spec. Since I did the same years and years ago on the old OpenOffice.org performance project, I just checked if we regressed against that. Its comforting that we did not at all — we were much faster, but that measurement has to be taken with a few pounds of salt, as a lot of other things differ between these two measurements (e.g. we now have a completely new build system, compiler versions etc.). But its good we are moving in the right direction.

implementation SwNodes SwNodeIndex total instructions performance linedelta
DEV300_m45 71,727,655 73,784,052 9,823,158,471 ? ?
master@fc93c17a 84,553,232 60,987,760 6,170,762,825 0% 0
std::list 18,461,317 103,461,317 14,502,230,571 -5,725%
(-235% of total)
+12/-70
std::vector 18,986,848 3,707,286,032 9,811,541,380 -2,502% +22/-70
std::unordered_map 18,984,984 82,843,000 7,083,620,244 -627%
(-15% of total)
+16/-70
std::vector rbegin 18,986,848 143,851,229 6,214,602,532 -30%
(-0.7% of total)
+23/-70
sw::Ring<> 23,447,256 inlined 6,154,660,709 11%
(2.6% of total)
+108/-229

With that comforting knowledge, I started to play around with the code. The first thing I did was to replace the handcrafted intrusive list with a std::list pointing to the SwNodeIndex instances as a member in the SwNodes class. This is expected to slow down things, as now two allocs are needed: one for the SwNodeIndex class and one for the node entry in the std::list. To be honest though, I didnt expect this to slow down the code handling the nodes by a factor of ~57 for the loading of the example document. This whole document loading time (not just the node handling) slows by a factor of ~2.4. So ok, this establishes for certain that this part of the code is highly performance sensitive.

The next thing I tried to get a feel for how the performance reacts was using a std::vector in the SwNodes class. When reserving some memory early, this should severely reduce the amount of allocs needed. And indeed this was quicker than the std::list even with a naive approach just doing a push_back() for insertion and a std::find()/std::erase() for removal. However, the node indices are often temporarily created and quickly destroyed again. Thus adding new indices at the end and searching from the start certainly is not ideal: Thus this is also slower than the intrusive list that was on master by a factor of ~25 for the code doing the node handling.

Searching for a SwNodeIndex from the end of the vector, where we likely just inserted it and then swapping it with the last entry makes the std::vector almost compatitive with the original implementation: but still 30% slower than the original implementation. (The total loading time would only have increased by 0.7% using the vector like this.)

For completeness, I also had a look at a std::unordered_map. It did a bit better than I expected, but still would have slowed down loading by 15% for the example experiment.

Having ruled out that standard containers would do much good here without lots of tweaking, I tried the sw::Ring<> class that I recently rewrote based on Boost.Intrusive as a inline header class. This was 11% quicker than the old implementation, resulting in 2.6% quicker loading for the whole document. Not exactly a heroic archivement, but also not too bad for just some 200 lines touched. So this is now on master.

Why do this linked list outperform the old linked list? Inlining. Especially, the non-inlined constructors and the destructor calling a trivial non-inlined member function. And on top of that, the contructors and the function called by the destructor called two non-inlined friend functions from a different compilation unit, making it extra hard for a compiler to optimize that. Now, link time optimization (LTO) could maybe do something about that someday. However, with LTO being in different states on different platforms and with developers possibly building without LTO for build time performance for some time, requiring the compiler/linker to be extra clever might be a mixed blessing: The developers might run into “the map is not the territory” problems.

my personal take-aways:

  • The SwNodeIndex has quite a relevant impact on performance. If you touch it, handle with care (and with valgrind).
  • The current code has decent performance, further improvement likely need deeper structual work (see e.g. Kendys bplustree stuff).
  • Intrusive linked lists might be cumbersome, but for some scenarios, they are really fast.
  • Inlining can really help (doh).
  • LTO might help someday (or not).
  • friend declarations for non-inline functions across compilation units can be a code smell for possible performance optimization.

Please excuse the extensive writing for a meager 2.6% performance improvement — the intention is to avoid somebody (including me) to redo some or all of the work above just to come to the same conclusion.


Note: Here is how this was measured:

  • gcc 4.8.3
  • boost 1.55.0
  • test document: ODF spec
  • valgrind --tool=callgrind "--toggle-collect=*LoadOwnFormat*" --callgrind-out-file=somefilename.cg ./instdir/program/soffice.bin
  • ./autogen.sh --disable-gnome-vfs --disable-odk --disable-postgresql-sdbc --disable-report-builder --disable-scripting-beanshell --enable-gio --enable-symbols --with-external-tar=... --with-junit=... --with-hamcrest=... --with-system-libs --without-doxygen --without-help --without-myspell-dicts --without-system-libmwaw --without-system-mdds --without-system-orcus --without-system-sane --without-system-vigra --without-system-libodfgen --without-system-libcmis --disable-firebird-sdbc --without-system-libebook --without-system-libetonyek --without-system-libfreehand --without-system-libabw --disable-gnome-vfs --without-system-glm --without-system-glew --without-system-librevenge --without-system-libcdr --without-system-libmspub --without-system-libvisio --without-system-libwpd --without-system-libwps --without-system-libwpg --without-system-libgltf --without-system-libpagemaker --without-system-coinmp --with-jdk-home=...

Auld Lang Syne

we’ll tak’ a cup o’ kindness yet,
for auld lang syne.

– Die Roten Rosen, Auld Lang Syne

Eike already greeted the Year of Our Lady of Discord 3181 four days ago, but I’d like to take this opportunity to have a look at the state of the LibreOffice project — the bug tracker status that is.

By the end of 2014:

unconfirmed

And a special “Thank You!” goes out to everyone who created one of the over 100 Easy Hacks written for LibreOffice in 2014, and everyone who helped, mentored or reviewed patches by new contributors to the LibreOffice project. Easy Hacks are a good way someone curious about the code of LibreOffice to get started in the project with the help of more experienced developers. If you are interested in that, you find more information on Easy Hacks on the TDF wiki. Note that there are also Easy Hacks about Design related topics and on topics related to QA.

If “I should contribute to LibreOffice once in 2015″ wasnt part of your new years resolutions yet, you are invited to add this as Easy Hacks might convince you that its worthwhile and … easy.

To Win in Toulouse

Now the only thing a gambler needs
Is a suitcase and a trunk.
– Animals, The House of the Rising Sun

So, as many others, I have been to the LibreOffice Hackfest in Toulouse which — unlike many of our other Hackfests — was part of a bigger event: Capitole du Libre. As we had our own area and were not 30+ hackers, this also had the advantage that we got quicker to work. And while I had still some boring administrative work to do, this is a Hackfest were I actually got to do some coding. I looked for some bookmark related bugs in Writer, but the first bugs I looked at were just too well suited to be Easy Hacks: fdo#51741 (“Deleting bookmark is not seen as modification of document”) and fdo#56116 (“Names of bookmarks should allow all characters which are valid in HTML anchor names (missing: ‘:’ and ‘.’)”). Both were made Easy Hacks and both are fixed on master now. I then fixed fdo#85542 (“DOCX import of overlapping bookmarks”), which proved slightly more work then expected and provided a unittest for it to never come back. I later learned that the second part was entirely nonoptional, as Markus promised he would not have let me leave Toulouse without writing a unittest for commited code. I have to admit that that is a supportable position.

Toulouse Hackfest Room

Toulouse Hackfest Room

Scenes like the above were actually rather rare as we were mostly working over our notebooks. One thing I came up with at the Hackfest, but didnt finish there was some clang plugins for finding cascading conditional ops and and conditional ops that have assignments as a sideeffect in their midst. While I found nothing as mindboggling as the tweet that gave inspiration to these plugins in sw (Writer), I found some impressive expressions that certainly wouldnt be a joy to step through in gdb (or even better: set a breakpoint in) when debugging and fixed those. We probably could make a few EasyHacks out of what these (or similar) plugins find outside of sw/ (I only looked there for now) — those are reasonably easy to refactor, but you dont want to do that in the middle of a debugging session. While at it, I also looked at clangs “value assigned, but never read” hints. Most were harmless, but also trivial to get rid of. On the other hand, some of those pointed to real logic errors that are otherwise hard to see. Like this one, which has been hiding — if git is to be believed — in plain sight ever since OpenOffice.org was originally open sourced in 2000. All in all, this experience is encouraging. Now that there are our coverity defect density is a just a rounding error above zero getting more fancy clang plugins might be promising.

Just one week after the Hackfest in Toulouse, there was another event LibreOffice took part in: The Bug Squashing Party in Munich — its encouraging to see Jonathan Riddell being a commiter to LibreOffice too now, but that is not all, we have more events coming up: The Document Foundation and LibreOffice will have an assembly at 31c3 in Hamburg, you are most welcome to drop by there! And next then, there will be FOSDEM 2015 in Bruessels, where LibreOffice will be present as usual.

Killing the NPAPI plugin

“Alles hat ein Ende nur die Wurst hat zwei.”
— horrible 1980ies german folk song lyrics

So, three month ago, the ESC discussed if we would still support the NPAPI plugin to show documents in the browser. The discussion was ignited over an mostly innocent bug: fdo#45071, but the discussion soon dived into long-term viability of this technology and with Chrome “phasing out NPAPI support over the coming year” and as Mozilla Firefox “will no longer activate most plugins by default” it became quickly clear that trying to keep the plugin alive would be throwing good developer time after bad. So in LibreOffice 4.4.x there will not be a NPAPI plugin anymore, as the patch that was waiting on gerrit for a month is now pushed to the master branch. As by the time of release plugins will not be enabled anymore on the most common browsers using it, this likely will not make much of a difference to most users anyway.

Luckily, LibreOffice is not only deprecating old technologies when they become more and more irrelevant — it is also moving towards new technologies when they gain relevance.

Free Four

The memories of a man in his old age

are the deeds of a man in his prime

– Free Four, Obscured by Clouds, Pink Floyd

I just donated to:

  • the Wikimedia Foundation
  • the OpenBSD project

and became:

Being involved in a project that is heavily driven by donations, I keep remembering myself of the importance of putting my money were my mouth is.

Some of these donations were triggered by recent events and initiatives in these projects. GNOME’s outreach for women program for example. Or OpenBSDs bold initiative in starting LibreSSL, which is doing what needed to be done and vitalizing an overlooked area of open source development. Watching them explain the status quo and how they are attacking it remembers me of LibreOffice — beyond the name. Plus, I dont want to be compared with a My little Pony character again.

goals of LibreSSL -- they remind me of something

goals of LibreSSL — they remind me of something

Others are already working examples of the long tail, crowd funding and the meshed society (Wikipedia) or tailblazing to be one (Krautreporter) beyond the world of software. The latter might also have been influenced by one of the last wishes of a man that unexpectedly died way to early. May he rest in peace.

Train Stops

And the sons of pullman porters and the sons of engineers
Ride their father’s magic carpets made of steel
Mothers with their babes asleep are rockin’ to the gentle beat
And the rhythm of the rails is all they feel

– The City of New Orleans, Willie Nelson interpreting Steve Goodman

So, LibreOffice does its releases on a train release schedule and since we recently modified the schedule a bit (by putting out the alpha1 release earlier), I took the opportunity to take a closer look and explain a bit on what we are doing. With this every 6 months of LibreOffice development currently roughly look like this:
week after x.y.0 development release candidates finalized releases
fresh stable fresh stable
0 x.y.0
1 x.y.1~rc1 x.(y-1).4~rc1
2
3 x.y.1~rc2 x.(y-1).4~rc2
4 x.y.1 x.(y-1).4
5
6 x.y.2~rc1
7
8 x.y.2~rc2 x.(y-1).5~rc1
9 x.y.2
10 x.(y-1).5~rc2
11 x.y.3~rc1 x.(y-1).5
12
13 x.(y+1)~alpha1 x.y.3~rc2
14 x.y.3
15
16
17
18 x.(y+1)~beta1
19
20 x.(y+1)~beta2 x.(y-1).6~rc1
21
22 x.(y+1)~rc1 x.(y-1).6~rc2
23 x.(y-1).6
24 x.(y+1)~rc2
25 x.(y+1)~rc3

The last two columns are most visible to most visitors of the LibreOffice website. Those are the versions found on the LibreOffice Fresh and LibreOffice Stable download pages. We are in roughly at week 18 after 4.2.0 release now, and the versions available are 4.2.4 fresh and 4.1.6 stable. A careful reader will note that according to that schedule we should be at 4.2.3 and 4.1.5 — that is true, but the 4.2 series still had an extra 4.2.1 intermediate release to adjust the schedule of 4.2 in direction of the current plan. This is not expected for future releases (also note that there is always some flexibility in the plan to allow for holidays etc.)

If you count all the prereleases, release candidates and releases, you will find that we do 25 of those in 26 weeks. Beside the fact that this is a lot of work for release engineers, one might wonder if anyone can keep up with that, and if so — how? The answer to that depends on how you are using LibreOffice.

self deployment on LibreOffice fresh

If you are an user or a small business installing LibreOffice yourself, you will probably run LibreOffice fresh and the table above simplifies for you as follows:

week after x.y.0 development release candidates finalized releases
0 x.y.0
1 x.y.1~rc1
4 x.y.1
6 x.y.2~rc1
9 x.y.2
11 x.y.3~rc1
14 x.y.3
18 x.(y+1)~beta1
20 x.(y+1)~beta2
22 x.(y+1)~rc1
24 x.(y+1)~rc2
25 x.(y+1)~rc3

The last column shows the releases you are running. If you are a member of the LibreOffice community it would be very helpful if you also spend some time of this 6 months period for three actions:

  • running at least one of the release candidates in the table (available for download here) before the final is released.
  • running at least one beta releases in the table. Note that there will be a bug hunting session on the 4.3.0 beta release this week, that will help you get started.
  • running a nightly build once anywhere in the weeks 1-18. Note that if you are getting excited about seeing the latest and greatest builds while they are still steaming, there are tools that can help you with this on Linux and Windows.

If you do these each of these three things once in the timeframe of six months and report any issues you find, you are helping LibreOffice already a lot — and you are making sure that the finalized releases of the fresh series are not only containing all the latest features, but also free of severe regressions.

bigger deployments on LibreOffice stable

If you are not installing LibreOffice yourself, but instead have a major deployment administrated centrally, things are a bit different. You might be more conservative and interested in the releases from LibreOffice stable. And you probably have professional support from a certified developer or a company employing certified developers.

week after x.y.0 development release candidates finalized releases
1 x.(y-1).4~rc1
4 x.(y-1).4
8 x.(y-1).5~rc1
11 x.(y-1).5
13 x.(y+1)~alpha1
18 x.(y+1)~beta1
20 x.(y-1).6~rc1
23 x.(y-1).6

If you intend to deploy one series of LibreOffice (e.g. 4.3), there are two things that are highly recommended to be done:

  • make the alpha or beta releases available quickly to interested volunteers in your deployment early. They might find bugs or regressions that are specific to your use of the software.
  • make the release candidates of versions that you intent to deploy available early to your users.

Of these two actions, the first is by far the most important: It identifies issues early on in the life cycle and gives both your support provider and the LibreOffice developer community at large time to resolve the issue. In fact, I would argue that if you have a major deployment, the only excuse for not making available prereleases, is that you made available nightly builds.

Ubuntu

So, Ubuntu qualifies as a “bigger deployment” and I have to take care of LibreOffice on it. Also people want to be able to run the latest and greatest LibreOffice releases from the LibreOffice fresh series. Do I follow my recommendations here? Yes, mostly I do:

  • both LibreOffice fresh and LibreOffice stable series are available from PPAs for Ubuntu and are updated regularly and quickly when an rc2 is available.
  • prereleases are made available as bibisect repositories rather quick (build on Ubuntu 12.04 LTS). In addition, fully packaged versions of LibreOffice are build in the prereleases PPA as early as starting with beta1.

So, you are invited to run or test builds from these PPAs — or download the bibisect repositories — to keep LibreOffice releases coming in the steady and stable fashion they do. Finally, there is a bug hunting session for LibreOffice this week and as said above, no matter if you are running a huge deployment or installing on your own, you are helping LibreOffice — and yourself, as a user of LibreOffice — a lot by testing the prereleases:

Follow

Get every new post delivered to your Inbox.