Powerplay

I’ve got the power.

— The power, Snap

So, Im back from vacation. One of the things I did was reorganizing my hardware, and for doing so, I bought a wattmeter to measure what my machines and toys actually consume. A lot of the stuff was what I expected, but there where a few nasty surprises:

 (all values in Watt)
Ideapad S12 Thinkpad W520 Bertha TV Pandaboard ES
power supply only 0 0.2 2.5 0.3 0.1
standby 0.3 0.2 2.5 15 3.2
desktop w/o display 13 10 122 130 6.1
with display 18 16 180 100
g+/gmail 20 20 212
compiling 90/70/35 417 8.1

From this set a few surprising takeaways:

  • The wimpy Ideapad S12 with its Atom CPU eats more power when idling than the Thinkpad W520 with its beefy i7 Quad-Core and 16GB of RAM (13 Watts vs. 10 Watts).
  • My TV doing nothing but waiting for the remote to tell it to turn itself on eats more power that each of my notebooks (15 Watts vs. 10/13 Watts).
  • Just opening Firefox with one tab google plus and one tab google mail eats 4 extra Watts on my notebook and 32 extra Watts on my desktop. It seems all that JavaScript voodoo does not come free at all: ~6 Euros per month when I leave it open on my desktop all the time.
  • Running my desktop (Bertha) as an tinderbox for LibreOffice 24/7 would cost me ~1.000EUR per annum. Doing it with three of those boxes would a very expensive and noisy alternative to what others sell as a room heater.
  • My TV eats 30 Watts more when displaying the black screen of a disconnected HDMI signal than with normal TV display. Maybe its expensive to search for a signal?
  • Compiling LibreOffice without ccache on my Notebook kicks the power consumption to 90 Watts — but only for a few minutes. Then the thermal controls throttle the machine down to 70 or even 35 Watts, which seems all the machine can disperse over sustained periods.
My electricity is 100% from water power, btw. Admittedly — its unlikely to come from the Hoover dam, though (Image copyright CC BY 2.0 by Gordon Wrigley)

And then there where these leftover pieces to measure, no surprises there, just a confirmation of my suspicion that the old Asus notebook I run as a home server is eating way too much power:

(all values in Watt)
bits and pieces
mic preamp off 1.1
mic preamp on 10
hub 5
phone 4
“home server” (decommissioned Asus Z53 notebook) 30

My tentative conclusions are:

  • replacing my old “home server” with something ARM-based like a Raspberry Pi or a Pandaboard breaks even after one year — I should do that.
  • Even when under load, a ARM-based Pandaboard has a modest power consumption.
  • I will completely turn off my TV on principle as the standby consumption is just pure impudence. As a bonus it prevents my BluRay player from kicking on the 100 Watt TV when I throw in a audio CD (Thanks Panasonic, for providing this excellent and “useful” integration).
  • A cheap Netbook might be less powerful, but it hardly consumes less than a high-end Notebook when idling. You get what you pay for.
  • I bought a cooler for my Notebook, hoping to unlock it from choking itself with thermal restriction. It should be a good idea in general as the logs not only talked about throttling, but also about more scary MCEs.
  • Buying a wattmeter is a good decision, when you run nontrivial amounts of hardware.

Addendum: The 2.5 Watts for Bertha when off may seem bad — but its not at all, if you consider it is running a lights-out management on that.

tb3 — more efficient tinderboxing

Stop right there I gotta know right now before we go any further

Let me sleep on it and I’ll give you an answer in the morning

— Paradise by the Dashboard Light, Bat out of Hell, Meat Loaf

So, I did some work recently to possibly make our tinderboxes more efficient and scalable — which is a bit ironic as I recently hinted others at Paul Grahams advise to “do things that do not scale”. At LibreOffice we currently have tinderbox setup that served us as good as it could in the first years: It gave a quick overview of the basic health of current development branch of LibreOffice. But LibreOffice takes some time to build and test and with 50-100 commits to master each day it is playing catch-up with a moving target.

And whle they did a good job at this, they also have a few distinct weaknesses: For one, these tinderboxes would also mail everyone who commited on a branch since the last known good build if they were unhappy. Since they do not know anything about each other, with a generic breaker each tinderbox would do that on its own. In a tragic imitation of a certain comic this would result in the incremental Linux tinderbox reporting after 5 minutes something went wrong, with all the other tinderboxes dribbling in with the same message over time, finalized by the full Windows build tinderbox excitedly reporting to 200 people (as a slow builder would have more commits between builds) that something was amiss — possibly hours after it was fixed again. This resulted in these messages being filtered away by most users and even worse: the Windows tinderbox reports, which should be the most useful of them, as most developers use Linux as development platform, being easily ignored as “someone else broke it”.

So I set out improve the situation with the initial goal:

  • to start make tinderboxes being able to coordinate
  • to make it possible to easily collate the information from multiple builders
  • while leaving the control over what is build with the owner of the tinderbox (as most of these boxes are sponsored, we dont want to make them into drones)
  • for slow platforms like Windows or ARM enable bisecting a breaker as the frequency of builds is too low for those in the commit range to feel personally responsible
  • while bisecting a breaker, also keep an eye one the branch moving forward (as in: dont try to bisect a breaker further when it was fixed in the meantime)

And I am happy to report to have reached this initial goal with tb3 which is a tinderbox coordinator written in Python3 and having as many lines of codes for unittests as for the product itself. So how is tb3 intended to work?

Leaving control over what is build with the owner of the tinderbox

tb3 is build around the idea, that the information about the state of the source is collected and managed by a central “tinderbox coordinator” and one or more tinderboxes go to it to:

  • ask for something to build, giving the coordinator a branch and a platform that they are interested to work for
  • report that they have started to build a certain state and give an estimate on when they will be finished
  • report that they have finished to build a certain state and give a result

Note that the first two steps are separate: The tinderbox is essentially just asking for a suggestion on what to build — its not promising to actually follow these proposals. It can come back and report to be building something completely different(*). Now the proposals the coordinator hands out come with a score. Just looking at a classical tinderbox mode, which will always build the current HEAD of a branch on a specific platform, the score of the highest ranking proposal will be equal to the number of commits since the last finished build. With tb3, a tinderbox can watch multiple branches (e.g. a development branch and a release branch) and commit itself to building the one which saw the most commits since the last finished. It can also use multipliers and use something like “if there are 10 times as many new commits on the development branch as on the release branch, then build that, otherwise stick to the release branch” or use limits: “I only want run a build if there are at least 5 new commits”.

Coordinating multiple tinderboxes

So how do we coordinate multiple tinderboxes and ensure that e.g. if someone pushes 9 commits to master, we do not get five Linux tinderboxes to build that last commit and then sprinkle everyones mailbox over the next hour? Here is where the “coordinator” part truly kicks in. The first tinderbox that asks for something to build will get proposals with scores as shown by the green line in the chart below: The highest score is the “9” of the newest commit — the commit that has the biggest distance from the last build. If the first tinderbox reported to have taken on that proposed build, what would a second tinderbox that also asks to build something see? It makes little sense to give it the same build as the first tinderbox. Optimistically assuming that tinderbox will report something back, the best thing this second box can do is build something with the biggest distance to to the finished build and to the build running on the first tinderbox. As such, the coordinator will send it scored as denoted by the blue line and if the tinderbox accepts it will build commit 5 — which is why a third tinderbox asking for something to build, while the other two are running, will get proposals as per the pink line and thus be suggested to build commit 3.

proposal scores with tinderboxes just started
proposal scores with tinderboxes just started

Trusting tinderboxes … a bit

Now these tinderboxes “promised” to build some commit. But can we give the tinderbox unconstrained trust? E.g. should we never ever tell any other tinderbox to build this one commit, because some other tinderbox promised to build it? The answer is obviously no: As a tinderbox is a gift, the owner should be allowed to reboot or reassign a tinderbox for other tasks at any time with imprudence. This is why the tinderbox gives the coordinator an estimated duration for its build and the tinderbox coordinator “reserves” this commit for that time. As you did see in the last chart the commit that just had a tinderbox running got scores of zero. As time goes by the coodinator looses trust in the tinderbox to still report back: the chart below shows the scores given after twice the time the tinderbox gave as an estimate has passed. You see the blue line now scores highest at commit 6, not commit 5 and the pink line scores highest at commit 5, not commit 3 — so as the coordinator looses trust in the running tinderboxes to come back, it again proposes to do builds closer to the already scheduled ones.

proposed scores with tinderbox results overdue
proposed scores with tinderbox results overdue

Another thing to note is that the highest score is rising: While in the first chart, each running tinderbox lowered the highest score by one (green line: highest at 9, blue line: highest at 8, pink line: highest at 7) after twice the time has passed, the highscores are all around 9 again.

Bisecting a breaker

Should a branch be broken, it usually would be very helpful if the tinderboxes would help bisecting. This is especially true for slow platforms and builds like Windows, ARM or the document load torturer by Markus. However, we do not want the tinderbox to over fixate on that, as our branch is a moving target. If there is a build breaker somewhere in a range of 256 commits, we do not want a slow tinderbox to bust away for 8 builds to find the offending one, and while doing that leave the head of the branch unwatched for a long time. So by default, the bisecting proposals have a highscore that is equal to the number of commits to bisect still. As such, by default, a tinderbox will be told to bisect — as long as:

  • the head of the branch is still broken
  • there are more commits in the bisect range, than there are new commit on the branch.
scores of commits in a range to bisect
proposal scores of commits in a range to bisect

Otherwise, the tinderbox will be told to build the latest commit, to check if the branch is still broken or fixed in the meantime. As such the coordinator will guard against commiting tinderboxes to bisect a breaker that was already fixed. Therefore the coordinator knows a few more states than plain ‘good’ or ‘bad’ for a commit:

  • UNKNOWN — nothing known yet
  • RUNNING — a tinderbox is currently claiming to run this commit
  • GOOD — a tinderbox was happy with it
  • BAD — a tinderbox was unhappy with it
  • ASSUMED_GOOD — not tested, but the previous and the next finished build were good
  • ASSUMED_BAD — not tested, but the previous and the next finished build were bad
  • POSSIBLY_BREAKING — not tested, but the previous finished build was good and the next finished build was bad
  • POSSIBLY_FIXING — not tested, but the previous finished build was bad and the next finished build was good
  • BREAKING — this one was bad, while the previous commit was good

Here is some example output

$ ./tb3-show-history --repo ~/checkouts/core.git --platform linux --branch 65134fb75c3e94b7869fb6d490f88bf4b252760e --history-count 10
65134fb75c3e94b7869fb6d490f88bf4b252760e started on 2013-07-25 17:27:30.383767 with builder ubuntu-tinderbox and finished on 2013-07-25 17:40:41.226494 -- artifacts at 65134fb75c3e94b7869fb6d490f88bf4b252760e-137476605045.out, state: BAD (took 0:13:10.842727)
6100d94078d37cb1413a0e45460cee480ba3e211 started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_BAD
24d46ea66485ff8b5bca49ec587b41547787bf42 started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_BAD
d041980a7aad0e6d111752ca98db42f9853a3c6b started on 2013-07-25 17:40:52.587150 with builder ubuntu-tinderbox and finished on 2013-07-25 17:53:04.204549 -- artifacts at d041980a7aad0e6d111752ca98db42f9853a3c6b-137476685269.out, state: BAD (took 0:12:11.617399)
3b28ec6855e5df0629427752d7dafae1f0a277d4 started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_BAD
cca0b9ae02603ab88ec7d8810aab2a8a1b4efda2 started on 2013-07-25 18:08:01.201013 with builder ubuntu-tinderbox and finished on 2013-07-25 18:20:39.536451 -- artifacts at cca0b9ae02603ab88ec7d8810aab2a8a1b4efda2-137476848124.out, state: BREAKING (took 0:12:38.335438)
767b02bd7614059dd80d0cd1be306d9b63291f31 started on 2013-07-25 17:53:14.745394 with builder ubuntu-tinderbox and finished on 2013-07-25 18:07:42.527839 -- artifacts at 767b02bd7614059dd80d0cd1be306d9b63291f31-137476759480.out, state: GOOD (took 0:14:27.782445)
c852f83bc4d91de51c61ad4be0edf1b848247eaa started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_GOOD
0d874ee2e452ea67c03a27bf1a7f26d0ffc617dc started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_GOOD
ff14c3b595ebe71153f97ebb8871cf024ea76959 started on 2013-07-25 17:12:58.024727 with builder ubuntu-tinderbox and finished on 2013-07-25 17:27:17.439374 -- artifacts at ff14c3b595ebe71153f97ebb8871cf024ea76959-137476517809.out, state: GOOD (took 0:14:19.414647)

Some details and missing bits

The coordinator stores the results in git notes as JSON objects. This has multiple advantages: There is no need for a external database and the state of the notes are under revision control. It also has one disadvantage: Its not exactly quick. However the revision control can help to mitigate that mostly  — as e.g. a webfrontend can easily ask: “what changed on the state since I last polled you?” and do incremental updates from there.

Which brings me to the missing bits: The stuff that tells the world the state of the repo on a webfrontend, RSS feed, IRC Bots or via email digests. The second missing bit is some kind of privilege separating between the tinderboxes and the coordinator. tb3 is currently churning away on the Sun Ultra 24 that I donated to the Document Foundation doing duty as an Ubuntu tinderbox, but coordinator and tinderbox are still running on the same account — even though as separate processes. As setuid for scripts is messy business, I plan to give tb3 a trivial REST-like interface on a non-public HTTP server. In addition to being able to offload the authentication and authorization problems outside of tb3 to something considering it a solved problem, it also makes integration in webfrontends etc. simple (esp. given that all the data is in JSON already anyway.)

In the long run, the scoring of tb3 also should make it easier for the buildbots that do duty on gerrit to make a call on if they should test build something there or if their help is more needed for tinderbox duty.

tl;dr

tb3 can:

  • coordinate multiple tinderboxes working on the same build scenario or branch
  • coordinate one tinderbox working on multiple build scenarios or multiple branches
  • make tinderboxes bisect without loosing sight of the head of a branch
  • especially help tests and builds that are painfully slow

They can also create builds for bibisect along the way, but that is a story for another day.

(*) This is helpful for some test suites like e.g. subsequentcheck. If you do a build as proposed by the coordinator, you can cheaply report back the result of the build only. And since you then can just the subsequentcheck test suite on top of the build of that commit (and only on that commit), you can then report to be running these tests and report the results without ever caring if the coordinator thinks this commit has as high priority for this.

postscriptum: Yeah, I know, I promised to be on vacation now and not harass you with any posts, but this is a scheduled blogpost and as such does not count.