So, I'm back from vacation. One of the things I did was reorganize my hardware, and to do so I bought a wattmeter to measure what my machines and toys actually consume. A lot of it was what I expected, but there were a few nasty surprises:
The wimpy Ideapad S12 with its Atom CPU eats more power when idling than the Thinkpad W520 with its beefy i7 Quad-Core and 16GB of RAM (13 Watts vs. 10 Watts).
My TV, doing nothing but waiting for the remote to tell it to turn itself on, eats more power than either of my notebooks (15 Watts vs. 10/13 Watts).
Running my desktop (Bertha) as a tinderbox for LibreOffice 24/7 would cost me ~1,000 EUR per annum. Doing it with three of those boxes would be a very expensive and noisy alternative to what others sell as a room heater.
My TV eats 30 Watts more when displaying the black screen of a disconnected HDMI signal than with normal TV display. Maybe it's expensive to search for a signal?
Compiling LibreOffice without ccache on my Notebook kicks the power consumption to 90 Watts, but only for a few minutes. Then the thermal controls throttle the machine down to 70 or even 35 Watts, which seems to be all the machine can dissipate over sustained periods.
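To put readings like these into perspective, a wattmeter value can be converted into a yearly bill. This is only a minimal sketch: the price of 0.25 EUR per kWh is an assumed value for illustration, not a figure from this post, and the function name is made up.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_cost_eur(watts, eur_per_kwh=0.25):
    """Yearly cost of a device drawing `watts` around the clock.

    eur_per_kwh is an assumed electricity price, not a measured value.
    """
    kwh_per_year = watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * eur_per_kwh


# The 15 Watts of TV standby from the list above, at the assumed price:
print(round(annual_cost_eur(15), 2))
```

At that assumed price the standby TV alone would come to roughly 33 EUR per year; plug in your own tariff to get a number that means something for your setup.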
And then there were these leftover pieces to measure. No surprises there, just a confirmation of my suspicion that the old Asus notebook I run as a home server is eating way too much power:
(all values in Watt)
bits and pieces
mic preamp off
mic preamp on
“home server” (decommissioned Asus Z53 notebook)
My tentative conclusions are:
Replacing my old “home server” with something ARM-based like a Raspberry Pi or a Pandaboard breaks even after one year, so I should do that.
Even under load, an ARM-based Pandaboard has a modest power consumption.
I will completely turn off my TV on principle as the standby consumption is just pure impudence. As a bonus it prevents my BluRay player from kicking on the 100 Watt TV when I throw in an audio CD (Thanks Panasonic, for providing this excellent and “useful” integration).
A cheap Netbook might be less powerful, but it hardly consumes less than a high-end Notebook when idling. You get what you pay for.
I bought a cooler for my Notebook, hoping to keep it from choking itself with thermal throttling. It should be a good idea in general, as the logs not only talked about throttling, but also about scarier MCEs (machine check exceptions).
Buying a wattmeter is a good decision when you run nontrivial amounts of hardware.
Addendum: The 2.5 Watts for Bertha when off may seem bad, but it's not bad at all if you consider that it is running lights-out management on that.
So, I did some work recently to possibly make our tinderboxes more efficient and scalable, which is a bit ironic as I recently pointed others to Paul Graham's advice to “do things that do not scale”. At LibreOffice we currently have a tinderbox setup that served us as well as it could in the first years: it gave a quick overview of the basic health of the current development branch of LibreOffice. But LibreOffice takes some time to build and test, and with 50-100 commits to master each day it is playing catch-up with a moving target.
And while they did a good job at this, the tinderboxes also have a few distinct weaknesses. For one, they would mail everyone who committed on a branch since the last known good build if they were unhappy. Since they do not know anything about each other, with a generic breaker each tinderbox would do that on its own. In a tragic imitation of a certain comic, this would result in the incremental Linux tinderbox reporting after 5 minutes that something went wrong, with all the other tinderboxes dribbling in with the same message over time, finished off by the full Windows build tinderbox excitedly reporting to 200 people (as a slow builder has more commits between builds) that something was amiss, possibly hours after it was fixed again. As a result, most users filtered these messages away and, even worse, the Windows tinderbox reports, which should be the most useful of them since most developers use Linux as their development platform, were easily ignored as “someone else broke it”.
So I set out to improve the situation with these initial goals:
to make tinderboxes able to coordinate
to make it possible to easily collate the information from multiple builders
while leaving the control over what is built with the owner of the tinderbox (as most of these boxes are sponsored, we don't want to make them into drones)
for slow platforms like Windows or ARM, to enable bisecting a breaker, as the frequency of builds is too low for those in the commit range to feel personally responsible
while bisecting a breaker, to also keep an eye on the branch moving forward (as in: don't try to bisect a breaker further when it was fixed in the meantime)
And I am happy to report that I have reached this initial goal with tb3, a tinderbox coordinator written in Python3 that has as many lines of code for unit tests as for the product itself. So how is tb3 intended to work?
Leaving control over what is built with the owner of the tinderbox
tb3 is built around the idea that the information about the state of the source is collected and managed by a central “tinderbox coordinator”, and one or more tinderboxes go to it to:
ask for something to build, giving the coordinator a branch and a platform that they are interested in working on
report that they have started to build a certain state and give an estimate of when they will be finished
report that they have finished building a certain state and give a result
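The three interactions above can be sketched as a tiny in-memory class. This is not tb3's actual interface: the class, method and parameter names are hypothetical, and for simplicity the branch is passed in as a plain list of commit names instead of being read from a git repository.

```python
import time


class Coordinator:
    """Toy in-memory stand-in for a tinderbox coordinator (hypothetical API)."""

    def __init__(self):
        self.running = {}  # (commit, platform) -> (builder, start time, estimate)
        self.results = {}  # (commit, platform) -> result string

    def propose(self, branch_commits, platform):
        """Step 1: suggest commits to build; the caller is free to ignore them."""
        return [commit for commit in branch_commits
                if (commit, platform) not in self.results
                and (commit, platform) not in self.running]

    def report_start(self, commit, platform, builder, estimated_seconds):
        """Step 2: a tinderbox says what it actually started to build."""
        self.running[(commit, platform)] = (builder, time.time(), estimated_seconds)

    def report_result(self, commit, platform, result):
        """Step 3: a tinderbox reports the outcome of a finished build."""
        self.running.pop((commit, platform), None)
        self.results[(commit, platform)] = result


coordinator = Coordinator()
suggestions = coordinator.propose(["c3", "c2", "c1"], "linux")
# The tinderbox may pick something other than suggestions[0]:
coordinator.report_start("c2", "linux", "ubuntu-tinderbox", 900)
coordinator.report_result("c2", "linux", "good")
```

Keeping `propose` free of side effects is the point of the sketch: only `report_start` changes what other tinderboxes will be offered.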
Note that the first two steps are separate: the tinderbox is essentially just asking for a suggestion on what to build; it's not promising to actually follow these proposals. It can come back and report that it is building something completely different(*). Now, the proposals the coordinator hands out come with a score. Looking just at the classical tinderbox mode, which always builds the current HEAD of a branch on a specific platform, the score of the highest-ranking proposal is equal to the number of commits since the last finished build. With tb3, a tinderbox can watch multiple branches (e.g. a development branch and a release branch) and commit itself to building the one which saw the most commits since the last finished build. It can also use multipliers, as in “if there are 10 times as many new commits on the development branch as on the release branch, then build that, otherwise stick to the release branch”, or use limits: “I only want to run a build if there are at least 5 new commits”.
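How a tinderbox might turn such scores into a decision can be sketched like this. The function and its parameters are hypothetical and live on the tinderbox side; the only thing taken from the text is that the HEAD score equals the number of commits since the last finished build.

```python
def pick_branch(new_commits, multipliers=None, min_commits=1):
    """Pick the branch to build next, or None if nothing is worth building.

    new_commits: branch name -> commits since the last finished build
                 (the score of building HEAD in classical tinderbox mode).
    multipliers: branch name -> factor applied to that branch's score.
    min_commits: skip the build if the winning branch has fewer new commits.
    """
    if not new_commits:
        return None
    multipliers = multipliers or {}
    weighted = {branch: count * multipliers.get(branch, 1.0)
                for branch, count in new_commits.items()}
    best = max(weighted, key=weighted.get)
    return best if new_commits[best] >= min_commits else None


# "Stick to the release branch unless master has 10 times as many new commits":
print(pick_branch({"master": 40, "release": 5}, {"release": 10}))
print(pick_branch({"master": 60, "release": 5}, {"release": 10}))
```

With 40 new commits on master the weighted release score (50) still wins; with 60 it no longer does, so the tinderbox switches to master.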
Coordinating multiple tinderboxes
So how do we coordinate multiple tinderboxes and ensure that, e.g., if someone pushes 9 commits to master, we do not get five Linux tinderboxes to build that last commit and then sprinkle everyone's mailbox over the next hour? Here is where the “coordinator” part truly kicks in. The first tinderbox that asks for something to build will get proposals with scores as shown by the green line in the chart below: the highest score is the “9” of the newest commit, the commit that has the biggest distance from the last build. If the first tinderbox reported that it has taken on that proposed build, what would a second tinderbox that also asks for something to build see? It makes little sense to give it the same build as the first tinderbox. Optimistically assuming that the first tinderbox will report something back, the best thing this second box can do is build something with the biggest distance both to the finished build and to the build running on the first tinderbox. As such, the coordinator will send it scores as denoted by the blue line, and if the tinderbox accepts, it will build commit 5. This is why a third tinderbox asking for something to build while the other two are running will get proposals as per the pink line and thus be suggested to build commit 3.
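The walkthrough above can be imitated with a much simpler rule than real scoring: offer HEAD while nobody has claimed it, otherwise offer the middle of the widest stretch of commits that is neither finished nor claimed. This is a simplified stand-in, not tb3's scoring function; it only reproduces which commits get picked (9, then 5, then 3), not the scores on the charts. The function name and the integer commit numbering are assumptions for illustration.

```python
def propose_commit(head, last_finished, running):
    """Return the commit number to suggest next, or None if nothing is left.

    Commits are numbered along the branch, so in the example from the
    text last_finished is 0 and head is 9.
    """
    if head == last_finished:
        return None                      # nothing new to build
    if head not in running:
        return head                      # newest commit is still unclaimed
    anchors = sorted({last_finished, head, *running})
    gaps = zip(anchors, anchors[1:])
    # Widest stretch between two known commits; on a tie prefer the newer one.
    low, high = max(gaps, key=lambda gap: (gap[1] - gap[0], gap[1]))
    if high - low < 2:
        return None                      # every commit is finished or claimed
    return low + (high - low + 1) // 2   # midpoint, rounded towards newer commits


running = []
for _ in range(3):
    running.append(propose_commit(9, 0, running))
print(running)  # [9, 5, 3]
```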
Trusting tinderboxes … a bit
Now these tinderboxes “promised” to build some commit. But can we give a tinderbox unconstrained trust? E.g. should we never ever tell any other tinderbox to build this one commit, because some other tinderbox promised to build it? The answer is obviously no: as a tinderbox is a gift, the owner should be allowed to reboot a tinderbox or reassign it to other tasks at any time with impunity. This is why the tinderbox gives the coordinator an estimated duration for its build and the tinderbox coordinator “reserves” this commit for that time. As you saw in the last chart, a commit that just had a tinderbox start running on it got a score of zero. As time goes by, the coordinator loses trust in the tinderbox to still report back: the chart below shows the scores given after twice the time the tinderbox gave as an estimate has passed. You see the blue line now scores highest at commit 6, not commit 5, and the pink line scores highest at commit 5, not commit 3. So as the coordinator loses trust in the running tinderboxes to come back, it again proposes builds closer to the already scheduled ones.
Another thing to note is that the highest score is rising: while in the first chart each running tinderbox lowered the highest score by one (green line: highest at 9, blue line: highest at 8, pink line: highest at 7), after twice the time has passed the highest scores are all around 9 again.
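The idea of a reservation that fades can be sketched as follows. The text does not give the formula tb3 uses, so the decay curve here (full trust until the estimate has passed, then halving for every further estimate-length of silence) is purely an assumption, as are the function names.

```python
def trust(elapsed_seconds, estimated_seconds):
    """Weight between 1.0 and 0.0 still given to a running build's promise.

    Assumed decay: full trust until the estimate has passed, then the
    trust halves for every further estimate-length without a report.
    """
    if estimated_seconds <= 0:
        return 0.0
    if elapsed_seconds <= estimated_seconds:
        return 1.0
    overdue = (elapsed_seconds - estimated_seconds) / estimated_seconds
    return 0.5 ** overdue


def reserved_score(base_score, elapsed_seconds, estimated_seconds):
    """Score of a reserved commit: zero while trusted, recovering as trust fades."""
    return base_score * (1.0 - trust(elapsed_seconds, estimated_seconds))


print(reserved_score(8, 600, 900))   # still within the estimate
print(reserved_score(8, 1800, 900))  # twice the estimate has passed
```

Within the estimate the reserved commit scores zero, as in the first chart; once the tinderbox is overdue, the score creeps back up so that another tinderbox will eventually be offered the same commit.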
Bisecting a breaker
Should a branch be broken, it would usually be very helpful if the tinderboxes helped with bisecting. This is especially true for slow platforms and builds like Windows, ARM or the document load torturer by Markus. However, we do not want the tinderbox to over-fixate on that, as our branch is a moving target. If there is a build breaker somewhere in a range of 256 commits, we do not want a slow tinderbox to grind away for 8 builds to find the offending one, and while doing that leave the head of the branch unwatched for a long time. So by default, the bisecting proposals have a highest score equal to the number of commits still left to bisect. As such, by default, a tinderbox will be told to bisect as long as:
the head of the branch is still broken
there are more commits in the bisect range than there are new commits on the branch.
Otherwise, the tinderbox will be told to build the latest commit, to check whether the branch is still broken or has been fixed in the meantime. As such, the coordinator will guard against committing tinderboxes to bisecting a breaker that was already fixed. Therefore the coordinator knows a few more states than plain ‘good’ or ‘bad’ for a commit:
UNKNOWN — nothing known yet
RUNNING — a tinderbox is currently claiming to run this commit
GOOD — a tinderbox was happy with it
BAD — a tinderbox was unhappy with it
ASSUMED_GOOD — not tested, but the previous and the next finished build were good
ASSUMED_BAD — not tested, but the previous and the next finished build were bad
POSSIBLY_BREAKING — not tested, but the previous finished build was good and the next finished build was bad
POSSIBLY_FIXING — not tested, but the previous finished build was bad and the next finished build was good
BREAKING — this one was bad, while the previous commit was good
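Going back to the rule for when to keep bisecting, described before the list of states: it boils down to a two-condition check. A minimal sketch with hypothetical function names; the second helper only illustrates why a range of 256 commits takes 8 builds.

```python
def should_bisect(head_is_broken, commits_in_bisect_range, new_commits_on_branch):
    """True if the default rule would send a tinderbox bisecting.

    Bisecting only wins while the head is still broken and the bisect
    range holds more commits than have newly arrived on the branch.
    """
    return head_is_broken and commits_in_bisect_range > new_commits_on_branch


def bisect_builds_needed(commits_in_range):
    """Worst-case number of builds to pin down one breaker by halving."""
    return max(commits_in_range - 1, 0).bit_length()


print(should_bisect(True, 256, 20))   # keep bisecting
print(should_bisect(True, 4, 20))     # rather check the new head first
print(bisect_builds_needed(256))      # 8
```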
Here is some example output from tb3-show-history:
$ ./tb3-show-history --repo ~/checkouts/core.git --platform linux --branch 65134fb75c3e94b7869fb6d490f88bf4b252760e --history-count 10
65134fb75c3e94b7869fb6d490f88bf4b252760e started on 2013-07-25 17:27:30.383767 with builder ubuntu-tinderbox and finished on 2013-07-25 17:40:41.226494 -- artifacts at 65134fb75c3e94b7869fb6d490f88bf4b252760e-137476605045.out, state: BAD (took 0:13:10.842727)
6100d94078d37cb1413a0e45460cee480ba3e211 started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_BAD
24d46ea66485ff8b5bca49ec587b41547787bf42 started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_BAD
d041980a7aad0e6d111752ca98db42f9853a3c6b started on 2013-07-25 17:40:52.587150 with builder ubuntu-tinderbox and finished on 2013-07-25 17:53:04.204549 -- artifacts at d041980a7aad0e6d111752ca98db42f9853a3c6b-137476685269.out, state: BAD (took 0:12:11.617399)
3b28ec6855e5df0629427752d7dafae1f0a277d4 started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_BAD
cca0b9ae02603ab88ec7d8810aab2a8a1b4efda2 started on 2013-07-25 18:08:01.201013 with builder ubuntu-tinderbox and finished on 2013-07-25 18:20:39.536451 -- artifacts at cca0b9ae02603ab88ec7d8810aab2a8a1b4efda2-137476848124.out, state: BREAKING (took 0:12:38.335438)
767b02bd7614059dd80d0cd1be306d9b63291f31 started on 2013-07-25 17:53:14.745394 with builder ubuntu-tinderbox and finished on 2013-07-25 18:07:42.527839 -- artifacts at 767b02bd7614059dd80d0cd1be306d9b63291f31-137476759480.out, state: GOOD (took 0:14:27.782445)
c852f83bc4d91de51c61ad4be0edf1b848247eaa started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_GOOD
0d874ee2e452ea67c03a27bf1a7f26d0ffc617dc started on None with builder None and finished on None -- artifacts at None, state: ASSUMED_GOOD
ff14c3b595ebe71153f97ebb8871cf024ea76959 started on 2013-07-25 17:12:58.024727 with builder ubuntu-tinderbox and finished on 2013-07-25 17:27:17.439374 -- artifacts at ff14c3b595ebe71153f97ebb8871cf024ea76959-137476517809.out, state: GOOD (took 0:14:19.414647)
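The derived states in a listing like the one above can be inferred from the finished builds alone. The following is a sketch of that inference, not tb3's code: the function name is made up, RUNNING is left out for brevity, and it assumes the listing is ordered newest first (the commit given as --branch is the first row), so the input below is the same history turned around to oldest first.

```python
def derive_states(tested):
    """Map per-commit results (oldest first) to coordinator-style states.

    tested holds "GOOD", "BAD" or None (never built) for each commit.
    """
    def nearest(indices):
        # First finished result found when walking over the given indices.
        return next((tested[i] for i in indices if tested[i]), None)

    pair_to_state = {
        ("GOOD", "GOOD"): "ASSUMED_GOOD",
        ("BAD", "BAD"): "ASSUMED_BAD",
        ("GOOD", "BAD"): "POSSIBLY_BREAKING",
        ("BAD", "GOOD"): "POSSIBLY_FIXING",
    }
    states = []
    for i, result in enumerate(tested):
        if result == "BAD" and i > 0 and tested[i - 1] == "GOOD":
            states.append("BREAKING")    # bad right after a tested good commit
        elif result:
            states.append(result)        # plain GOOD or BAD
        else:
            older = nearest(range(i - 1, -1, -1))
            newer = nearest(range(i + 1, len(tested)))
            states.append(pair_to_state.get((older, newer), "UNKNOWN"))
    return states


history = ["GOOD", None, None, "GOOD", "BAD", None, "BAD", None, None, "BAD"]
for state in reversed(derive_states(history)):
    print(state)
```

Printed newest first, this yields the same column of states as the listing: BAD, ASSUMED_BAD, ASSUMED_BAD, BAD, ASSUMED_BAD, BREAKING, GOOD, ASSUMED_GOOD, ASSUMED_GOOD, GOOD.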
Some details and missing bits
The coordinator stores the results in git notes as JSON objects. This has multiple advantages: there is no need for an external database and the state of the notes is under revision control. It also has one disadvantage: it's not exactly quick. However, the revision control can mostly mitigate that, as e.g. a web frontend can easily ask: “what changed in the state since I last polled you?” and do incremental updates from there.
Which brings me to the missing bits: the stuff that tells the world the state of the repo via a web frontend, RSS feed, IRC bots or email digests. The second missing bit is some kind of privilege separation between the tinderboxes and the coordinator. tb3 is currently churning away on the Sun Ultra 24 that I donated to the Document Foundation, which is doing duty as an Ubuntu tinderbox, but coordinator and tinderbox are still running under the same account, even though as separate processes. As setuid for scripts is messy business, I plan to give tb3 a trivial REST-like interface on a non-public HTTP server. In addition to offloading the authentication and authorization problems from tb3 to something that considers them solved, it also makes integration into web frontends etc. simple (esp. given that all the data is in JSON already anyway).
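Storing a JSON object in a git note needs nothing beyond the stock `git notes` command. A rough sketch of the idea: the notes ref name and the JSON fields are invented for illustration and are not necessarily what tb3 uses.

```python
import json
import subprocess

NOTES_REF = "tb3-example"  # hypothetical notes ref, not necessarily tb3's


def encode_state(state, builder=None):
    """Serialize a build state as a JSON note body (illustrative schema)."""
    return json.dumps({"state": state, "builder": builder}, sort_keys=True)


def notes_add_argv(repo, commit, payload):
    """Command line that attaches a note to a commit, overwriting any old one."""
    return ["git", "-C", repo, "notes", f"--ref={NOTES_REF}",
            "add", "-f", "-m", payload, commit]


def notes_show_argv(repo, commit):
    """Command line that prints the note attached to a commit."""
    return ["git", "-C", repo, "notes", f"--ref={NOTES_REF}", "show", commit]


def store_state(repo, commit, state, builder=None):
    """Write the state of a commit into its note."""
    payload = encode_state(state, builder)
    subprocess.run(notes_add_argv(repo, commit, payload), check=True)


def load_state(repo, commit):
    """Read the state of a commit back from its note."""
    done = subprocess.run(notes_show_argv(repo, commit), check=True,
                          capture_output=True, text=True)
    return json.loads(done.stdout)
```

Because a notes ref is itself a chain of commits, running `git log` on it lists every change to the stored states, which is what makes the "what changed since I last polled?" question cheap to answer.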
In the long run, the scoring of tb3 should also make it easier for the buildbots that do duty on gerrit to make a call on whether they should test-build something there or whether their help is more needed for tinderbox duty.
coordinate multiple tinderboxes working on the same build scenario or branch
coordinate one tinderbox working on multiple build scenarios or multiple branches
make tinderboxes bisect without losing sight of the head of a branch
especially help tests and builds that are painfully slow
They can also create builds for bibisect along the way, but that is a story for another day.
(*) This is helpful for some test suites like e.g. subsequentcheck. If you do a build as proposed by the coordinator, you can cheaply report back the result of the build only. And since you can then just run the subsequentcheck test suite on top of the build of that commit (and only on that commit), you can then report that you are running these tests and report the results without ever caring whether the coordinator thinks this commit has a high priority for this.
postscriptum: Yeah, I know, I promised to be on vacation now and not harass you with any posts, but this is a scheduled blogpost and as such does not count.
After having uploaded slides quite some time ago, it's time for an update. So I added slides to the talks I gave at FISL 14 and 29c3 and added some video links to the descriptions for the FISL, 29c3 and the LibreOffice conference 2012 talks. Here are all the slides. And with that last long-pending task done, I will bolt out for vacation. Enjoy!