Re: [Tails-dev] Automated tests specification

Author: anonym
Date:
To: The Tails public development discussion list
Subject: Re: [Tails-dev] Automated tests specification

On 07/01/2015 07:19 PM, intrigeri wrote:
> bertagaz wrote (25 Jun 2015 09:41:23 GMT) :
>> for feature branches, we could run the full test suite only on the
>> daily builds and, on every Git push, run either only the automated
>> tests related to the branch and/or a subset of the whole test suite.
>
> I'm not sure what the benefit is of testing a topic branch every day if
> no new work has been pushed to it. In the general case, as a developer
> I'd rather see them tested on Git push only, with some per-day rate
> limiting if needed.

I would say that testing only images built due to a Git push is good
enough, but not ideal. We have good reasons for building branches on
Debian package uploads too, and testing the images built for that reason
would be nice as well.
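
To make the policy concrete, here is a minimal Python sketch of the
trigger rule discussed above: test images built because of a Git push or
a Debian package upload, skip rebuilds that bring no new work, and
rate-limit test runs per branch and per day. All names
(`BranchTestPolicy`, the trigger strings, `max_runs_per_day`, the branch
name) are hypothetical and do not reflect our actual Jenkins setup.

```python
from collections import defaultdict
from datetime import date

# Hypothetical trigger names; they don't mirror any real Jenkins setup.
TESTED_TRIGGERS = {"git_push", "package_upload"}


class BranchTestPolicy:
    """Decides whether a freshly built image of a topic branch gets tested."""

    def __init__(self, max_runs_per_day: int) -> None:
        self.max_runs_per_day = max_runs_per_day
        # (branch, day) -> number of test runs already scheduled that day
        self.runs = defaultdict(int)

    def should_test(self, branch: str, trigger: str, day: date) -> bool:
        # Rebuilds without new work (e.g. a plain daily build) are not tested.
        if trigger not in TESTED_TRIGGERS:
            return False
        # Per-branch, per-day rate limiting.
        if self.runs[(branch, day)] >= self.max_runs_per_day:
            return False
        self.runs[(branch, day)] += 1
        return True


policy = BranchTestPolicy(max_runs_per_day=2)
today = date(2015, 7, 1)
print(policy.should_test("feature/example", "daily", today))           # False
print(policy.should_test("feature/example", "git_push", today))        # True
print(policy.should_test("feature/example", "package_upload", today))  # True
print(policy.should_test("feature/example", "git_push", today))        # False: limit reached
```

The counter is keyed on (branch, day), so the limit resets every day and
one busy branch cannot use up the quota of another.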

> See below wrt. one specific case.

I couldn't find what this refers to.

>> We can also consider testing only the feature branches that are marked
>> as ReadyforQA as a beginning, even if that doesn't cover Scenario
>> 2 (developers).
>
> Absolutely, I think that would be the top-priority thing to do for
> topic branches: let's ensure we don't merge crap.

It would be great to also ensure that we don't review "crap". :) I guess
that is scenario 2, which we explicitly ignore with this proposal. I'll
post some ideas about how to deal with that in a separate thread, but I
think this is a good start that will give us 95% of what we want.

>> We could also find more ways to split the automated test suite into
>> faster subsets of features depending on the context, and define
>> priorities for built ISOs and/or tests.
>
> This feels ambitious and potentially quite complex. I say let's design
> something simple and then re-adjust.

I'm not sure I like this idea in principle. By "context" I assume you
(bertagaz) mean the context of the change implemented in the ISO to be
tested, e.g. for an ISO that upgrades Tor, the context is "tests that
use Tor". It's true that in that case we may only want to run some
subset of the tests that use Tor, but not, for instance, the Tails USB
installation/upgrade tests. This is in fact something we have done
manually, and while it has worked quite well, I think we've already
"missed" stuff. After all, these subsets would represent the obvious
things that I, as an implementer or reviewer, would probably test
explicitly anyway before asking for a review. Hence, running only them
wouldn't catch the non-obvious edge cases that lie outside the subsets.

It should be noted, though, that defining such subsets actually isn't
very complex. It can be implemented with Cucumber tags, e.g. we could
have scenarios tagged @networking, @tor, @lan, @persistence,
@usb_upgrade, etc., even in combination, and then run only the scenarios
that have at least one of the tags we're interested in.
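
As an illustration of the "at least one of the tags" selection, here is a
minimal Python sketch; the scenario names and tag sets are made up for
the example. In practice Cucumber's own `--tags` option can express this
kind of OR selection directly (the exact expression syntax depends on the
Cucumber version), so this only shows the selection logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scenario:
    name: str
    tags: frozenset = field(default_factory=frozenset)


def select_scenarios(scenarios, wanted_tags):
    """Keep scenarios carrying at least one of the wanted tags (logical OR)."""
    wanted = set(wanted_tags)
    return [s for s in scenarios if s.tags & wanted]


# Hypothetical scenarios and tags, for illustration only.
suite = [
    Scenario("Browse the web over Tor", frozenset({"@tor", "@networking"})),
    Scenario("Upgrade a Tails USB stick", frozenset({"@usb_upgrade"})),
    Scenario("Enable persistence", frozenset({"@persistence"})),
    Scenario("Access a LAN web server", frozenset({"@lan", "@networking"})),
]

# An ISO that upgrades Tor: only run the Tor-related scenarios.
for s in select_scenarios(suite, {"@tor"}):
    print(s.name)  # prints: Browse the web over Tor
```

Note that this is exactly the limitation discussed above: the USB upgrade
and persistence scenarios would never run for a Tor upgrade.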

>> The automated test suite MUST be able to run features in parallel
>> for a single automatically built ISO. This way, if more than one
>> isotester is idle, it can use several of them to test an
>> ISO faster.
>
> Wow! Not sure if/how this can work out, or actually optimize things
> much, with the upcoming new VM snapshots handling.

I doubt we'll have this for a while and agree with downgrading it to a
"MAY", or maybe even dropping it; see my answer to the next quote. The
rest of my answer to this quote deals with the issues we'd face if we
wanted to do this, for those interested:

The test suite wasn't written with this requirement in mind, and
Cucumber itself doesn't support any form of parallelism. I've seen hacks
that wrap Cucumber to allow it in various forms, like the
parallel_tests gem [1].

[1] https://github.com/grosser/parallel_tests

In principle, for parallelism to be valid, the "units" you parallelize
over must be atomic, i.e. independent of each other: no global, mutable
state may be shared between them, at least not without carefully working
around the problems that such sharing can cause. In its current state,
without the new snapshot system, our test suite could be parallelized at
the feature level since we do not share anything important between
features. However, *with* the new snapshot system, the checkpoints we
create are shared between features, so extra care must be taken. I guess
we'd need to use locks and have features wait for checkpoints to be
created, or something like that.
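
A minimal Python sketch of the "locks + waiting" idea, assuming features
run as threads in a single process and that a checkpoint is just an
opaque object returned by a creation function; `CheckpointStore`,
`get_or_create`, `boot_and_save` and the checkpoint name are hypothetical.
Runs spread over several isotesters would need a cross-process mechanism
(e.g. file locks) instead.

```python
import threading
from typing import Callable, Dict


class CheckpointStore:
    """Shares VM checkpoints between features running in parallel threads.

    The first feature that needs a checkpoint creates it; any other feature
    asking for the same checkpoint meanwhile blocks until it exists.
    """

    def __init__(self) -> None:
        self._checkpoints: Dict[str, object] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, name: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(name, threading.Lock())

    def get_or_create(self, name: str, create: Callable[[], object]) -> object:
        with self._lock_for(name):
            # Only the first caller runs create(); later callers wait on the
            # per-checkpoint lock and then find the finished checkpoint here.
            if name not in self._checkpoints:
                self._checkpoints[name] = create()
            return self._checkpoints[name]


creations = []


def boot_and_save():  # hypothetical stand-in for "boot Tails, save a snapshot"
    creations.append("created")
    return "snapshot-of-tails-has-booted"


store = CheckpointStore()
threads = [
    threading.Thread(target=store.get_or_create,
                     args=("tails-has-booted", boot_and_save))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(creations))  # 1: the checkpoint was created only once
```

Using one lock per checkpoint name, rather than a single global lock, lets
features that need different checkpoints create them concurrently.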

So, at least from the test suite's *internal* perspective it should be
possible and perhaps not even super difficult (although I'm sure I've
missed something that will make it more complex). I do worry about the
logic needed outside the test suite, though: something has to organize
the parallel runs and combine their results (e.g. error logs). That
sounds very hairy.

> Anyway: I doubt we'll have the situation when we have idle
> isotesterN's -- we're rather trying to limit the workload to something
> they can handle -- so perhaps it's not worth putting too much time
> into this?

Actually, let's ask the question: what would parallel tests actually do
for us? Given intrigeri's observation (i.e. "I doubt we'll [...] have
idle isotesterN's"), it would just speed up individual test suite runs,
in queue order, and it would only improve the overall waiting time if we
had more isotesters than images to test. But we're already starved in
terms of how many images we can test, so as we get new hardware we'll
just start testing more images for quite a while, pushing that
break-even point ever further into the future. Hence I think we can stop
thinking about parallelism entirely for quite some time.

>> The automated test suite MUST be able to accept a threshold of
>> failures for some features before sending notifications. This can
>> help if a scenario fails because of network congestion, but other
>> use cases will probably arise.
>
> The current running theory is that the test suite *itself* (as opposed
> to the way it's being run, e.g. by Jenkins) should handle this,
> see e.g. https://labs.riseup.net/code/issues/9515. I much prefer this
> to having Jenkins ignore failures, as it also benefits people who
> run the test suite outside of Jenkins.

I agree: the test suite itself has the most detailed information about
errors and flexibility to re-run stuff in a fine-grained way.
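
A minimal Python sketch of what fine-grained re-running inside the test
suite could look like, assuming steps raise a hypothetical
`TransientError` for failures believed to be transient (e.g. network
congestion). It is not what #9515 implements; it only illustrates
retrying a single scenario while keeping a log of the failed attempts.

```python
from typing import Callable, List, Tuple


class TransientError(Exception):
    """Hypothetical: raised when a failure looks transient (e.g. congestion)."""


def run_with_retries(scenario: Callable[[], None],
                     max_attempts: int = 3) -> Tuple[bool, List[str]]:
    """Run a scenario, re-running it on transient errors up to max_attempts times.

    Returns (passed, log), where log records every failed attempt so the
    final report still shows what went wrong before a retry succeeded.
    Any other exception propagates immediately and is never retried.
    """
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            scenario()
            return True, log
        except TransientError as e:
            log.append(f"attempt {attempt}: transient failure: {e}")
    return False, log


attempts = {"n": 0}


def flaky_tor_bootstrap():  # hypothetical scenario: fails once, then passes
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise TransientError("Tor bootstrap timed out")


passed, log = run_with_retries(flaky_tor_bootstrap)
print(passed, log)  # True ['attempt 1: transient failure: Tor bootstrap timed out']
```

Keeping the retry decision inside the suite means it works the same way
whether the suite is run by Jenkins or by hand.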

> But realistically, we'll surely have transient failures anyway, and
> I'm not sure what the best way to deal with them is. I doubt it
> affects much how we design the whole thing, though: it seems to be
> only about the configuration of Jenkins publishers, and should not
> impact the rest, so perhaps we can just postpone this topic (and not
> call it a MUST) until #9515 and friends are resolved, and an initial
> deployment makes our actual needs clearer? (See, I'm not *always* in
> favour of over-engineering things ;)

I personally hope that #9519 ("Make the test suite more deterministic
through network simulation") will be the final solution for the
"transient (network) errors" problem.

Cheers!
