Re: [Tails-dev] Automated tests specification

Author: bertagaz
Date:  
To: The Tails public development discussion list
Subject: Re: [Tails-dev] Automated tests specification
Hi,

Many thanks for your thorough review and for sharing your opinion. More below.

On Wed, Aug 26, 2015 at 02:00:25PM +0200, anonym wrote:
> On 07/01/2015 07:19 PM, intrigeri wrote:
> > bertagaz wrote (25 Jun 2015 09:41:23 GMT) :
> >> for feature branches, we could run the full test suite only on the
> >> daily builds and, on every Git push, run either only the automated
> >> tests related to the branch and/or a subset of the whole test suite.
> >
> > I'm not sure what the benefit is of testing a topic branch every day if
> > no new work has been pushed to it. In the general case, as a developer
> > I'd rather see them tested on Git push only, with some rate limiting
> > per-day if needed.
>
> I would say that testing images built due to a Git push only is good
> enough, but not ideal. We have good reasons for building branches on
> Debian package uploads too, and retrying on builds triggered by that
> would be nice as well.


Yes, the APT upload use case has been considered, but it is deferred to
future development: it is not in scope for the current milestone.

The rationale behind my proposal was that daily builds would at least
surface external changes (mostly in APT/Debian) that break the build of
a feature branch, even when nobody has pushed to it.

> > See below wrt. one specific case.
>
> I couldn't find what this refers to.
>
> >> We can also consider testing only the feature branches that are
> >> marked as ReadyForQA as a beginning, even if that doesn't cover
> >> Scenario 2 (developers).
> >
> > Absolutely, I think that would be the top-priority thing to do for
> > topic branches: let's ensure we don't merge crap.
>
> It would be great to also ensure that we don't review "crap". :) I guess
> that is scenario 2, which we explicitly ignore with this proposal. I'll
> post some ideas about how to deal with that in a separate thread, but I
> guess this is a good start that will give us 95% of what we want.


OK, so testing only ReadyForQA branches looks like the way to cut down
the number of automated test runs that everyone agrees on so far.

Now I'm wondering whether we should implement this right away, or just
start by testing all branches on every push and switch to that solution
only if our infrastructure can't cope with it.
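
For the per-day rate limiting intrigeri suggested, something as simple as
a per-branch daily counter, consulted before scheduling a test run, might
be enough. This is only a sketch in Python; the class name and keying on
(branch, day) are my assumptions, not something our Jenkins setup
provides:

```python
from collections import defaultdict
from datetime import date


class DailyRateLimiter:
    """Hypothetical per-branch cap on test runs triggered by Git pushes."""

    def __init__(self, max_runs_per_day):
        self.max_runs_per_day = max_runs_per_day
        self._counts = defaultdict(int)  # (branch, day) -> runs scheduled

    def should_test(self, branch, day=None):
        """Return True (and count the run) if the branch is under its cap."""
        if day is None:
            day = date.today()
        key = (branch, day)
        if self._counts[key] >= self.max_runs_per_day:
            return False
        self._counts[key] += 1
        return True
```

The counter never forgets old days; a real implementation would expire
them.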

> >> We can also maybe find more ways to split the automated test suite
> >> into faster subsets of features depending on the context, and define
> >> priorities for built ISOs and/or tests.
> >
> > This feels ambitious and potentially quite complex. I say let's design
> > something simple and then re-adjust.
>
> I'm not sure I like this idea in principle. With "context" I assume you
> (bertagaz) mean the context of the change implemented in the ISO to be
> tested, e.g. for an ISO that upgrades Tor, the context is "tests that
> use Tor". It's true that in that case we may only want to run some
> subset of tests that use Tor, but not Tails USB installation/upgrades,
> for instance. This is in fact something we have done manually, and while
> it has worked quite well, I think we've already "missed" stuff. After
> all, these subsets would represent the obvious things to test that I as
> an implementer or reviewer probably would explicitly test before asking
> for a review. Hence, only running them wouldn't catch the non-obvious
> edge cases that would be found outside of the subsets.
>
> It should be noted, though, that defining such subsets actually isn't
> very complex. It can be implemented with cucumber tags, e.g. we could
> have scenarios tagged @networking, @tor, @lan, @persistence,
> @usb_upgrade etc. even in combinations, and then run only scenarios that
> have at least one of the tags we're interested in.


Yes, cucumber tags were the solution I had in mind to implement this.
But I get your "don't miss stuff" argument, and it sounds completely
reasonable to me.

That said, this could be combined with the previous option ("test only
ReadyForQA branches"): we could run only the relevant features during
the whole development life of a branch, and then run the whole test
suite once it is marked as ReadyForQA. That would closely match the way
you describe the development of a branch.

That could be interesting, especially if a developer uses the test suite
for TDD. But I guess in that case she doesn't need such input from
Jenkins, since she'd already be getting it herself.
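
To make this combination more concrete, here is a rough Python sketch of
the selection logic. The Scenario type, the per-branch tag set and the
ready_for_qa flag are all hypothetical; in practice Cucumber's own --tags
option would do the filtering, with a tag-expression syntax that depends
on the Cucumber version:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scenario:
    name: str
    tags: frozenset = field(default_factory=frozenset)


def select_scenarios(scenarios, branch_tags, ready_for_qa):
    """Pick the scenarios to run for one feature branch build.

    Hypothetical policy: the full suite once the branch is ReadyForQA,
    otherwise only scenarios sharing at least one tag with the branch.
    """
    if ready_for_qa:
        return list(scenarios)
    wanted = set(branch_tags)
    # Non-empty intersection means the scenario has one of the wanted tags
    return [s for s in scenarios if s.tags & wanted]
```

For example, a Tor upgrade branch tagged {"@tor"} would only run
scenarios tagged @tor until it is marked ReadyForQA.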

> >> The automated test suite MUST be able to run features in parallel
> >> for a single automatically built ISO. This way, if more than one
> >> isotester is idle, it can use several of them to test an
> >> ISO faster.
> >
> > Wow! Not sure if/how this can work out, or actually optimize things
> > much, with the upcoming new VM snapshots handling.
>
> I doubt we'll have this for a while and agree with downgrading it to a
> "MAY". Or maybe even dropping it, see my answer to the next quote. The
> remainder of the answer under this quote will deal with the issues we'd
> face if we want to do this, for the interested:


Agreed on the MAY; I've already updated the blueprint accordingly.

> The test suite wasn't written with this requirement considered, and
> cucumber itself doesn't support any form of parallelism. I've seen hacks
> that wrap around cucumber and allow it in various forms, like the
> parallel_tests gem [1].
>
> [1] https://github.com/grosser/parallel_tests
>
> In principle, for parallelism to be valid, the "units" you parallelize
> around must be atomic, so e.g. no global, mutable state is shared
> between them, at least not without carefully working around the problems
> that such sharing can cause. In its current state, without the new
> snapshot system, our test suite could be parallelized on the feature
> level since we do not share anything important between features.
> However, *with* the new snapshot system, the checkpoints we create are
> shared between features, so extra care must be taken. I guess we'd need
> to use locks + waiting for their creation, or something.


Hmm, true, I didn't think about your shiny new snapshot feature when
writing this. Maybe it could be worked around by sharing the created
snapshots via NFS or another mechanism. But I agree it looks quite
complicated.
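
For the record, the "locks + waiting for their creation" approach could
look roughly like this Python sketch: the first worker that needs a
checkpoint creates it, and concurrent workers asking for the same one
block until it is ready. CheckpointRegistry and create_fn are made-up
names, and this only works between threads of a single process; across
several isotesters it would need a lock on shared storage instead:

```python
import threading


class CheckpointRegistry:
    """Hypothetical helper ensuring each shared checkpoint is created once."""

    def __init__(self, create_fn):
        self._create_fn = create_fn    # e.g. boots a VM and saves a snapshot
        self._lock = threading.Lock()  # protects the two dicts below
        self._events = {}              # name -> Event, set when creation ends
        self._results = {}             # name -> created snapshot handle

    def get(self, name):
        with self._lock:
            event = self._events.get(name)
            is_creator = event is None
            if is_creator:
                event = threading.Event()
                self._events[name] = event
        if is_creator:
            try:
                self._results[name] = self._create_fn(name)
            finally:
                event.set()  # wake up waiters even if creation raised
        else:
            event.wait()
        if name not in self._results:
            # Reached by waiters (and later callers) after a failed creation
            raise RuntimeError(f"checkpoint {name!r} could not be created")
        return self._results[name]
```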

> So, at least from the test suite's *internal* perspective it should be
> possible and perhaps not even super difficult (although I'm sure I've
> missed something that will make it more complex). I do worry about the
> logic needed outside the test suite though, that organizes the parallel
> runs and combines results (e.g. error logs) and such. Sounds very hairy.


My idea was rather to use Jenkins' abilities for that: it has ways to
start jobs in parallel, wait for their results and then aggregate them.
So it may not require any change to our test suite, but it doesn't help
with snapshot sharing.
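
The fan-out/aggregate pattern I have in mind is roughly the following,
sketched in Python rather than as a Jenkins job definition. run_feature
is a hypothetical stand-in for a downstream job that tests one feature
and returns whether it passed, plus its log:

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(features, run_feature, max_workers=4):
    """Start one job per feature, wait for all of them, aggregate results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with features
        results = list(pool.map(run_feature, features))
    failed = [f for f, (passed, _) in zip(features, results) if not passed]
    combined_log = "\n".join(
        f"== {f} ==\n{log}" for f, (_, log) in zip(features, results)
    )
    return {"passed": not failed, "failed": failed, "log": combined_log}
```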

> > Anyway: I doubt we'll have the situation where we have idle
> > isotesterN's -- we're rather trying to limit the workload to something
> > they can handle -- so perhaps it's not worth putting too much time
> > into this?
>
> Actually, let's ask the question: what would parallel tests actually do
> for us? Given intrigeri's observation (i.e. "I doubt we'll [...] have
> idle isotesterN's"), it would just speed up individual test suite runs in
> the order they are in the queue, and it would only improve the overall
> waiting time if we have more isotesters than images we want to test. But
> we're already starved in terms of how many images we can test, so as we
> get new hardware we'll just start to test more images for quite a while,
> pushing that break-even point ever further into the future. Hence I think we can
> completely stop thinking about parallelism for quite some time.


Agreed.

> >> The automated test suite MUST be able to accept a threshold of
> >> failures for some features before sending notifications. This can
> >> help if a scenario fails because of network congestion, but other
> >> use cases will probably arise.
> >
> > The current running theory is that the test suite *itself* (as opposed
> > to the way it's being run, e.g. by Jenkins) should handle this, see
> > e.g. https://labs.riseup.net/code/issues/9515. I much prefer that to
> > having Jenkins ignore failures, as it also benefits people who
> > run the test suite outside of Jenkins.
>
> I agree: the test suite itself has the most detailed information about
> errors and flexibility to re-run stuff in a fine-grained way.


Agreed too.
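
As an illustration of handling this inside the test suite, a retry
wrapper could re-run a failing scenario a few times and only report a
failure that persists. This Python sketch is my own assumption of what
such a threshold could look like, not what #9515 actually implements;
run_scenario is a hypothetical callable returning True on success:

```python
def run_with_retries(run_scenario, name, max_attempts=3):
    """Re-run a scenario up to max_attempts times.

    Returns (passed, attempts_used); only a failure on every attempt is
    reported as a failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if run_scenario(name):
            return True, attempt
    return False, max_attempts
```

Returning the number of attempts used lets us log how often retries were
needed, so flaky scenarios don't go unnoticed.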

> > But realistically, surely we'll
> > anyway have transient failures, and I'm not sure what's the best way
> > to deal with it. I doubt it affects much how we design the
> > whole thing, though: it seems to be only about Jenkins publishers
> > configuration, and should not impact the rest, so perhaps we can just
> > postpone this topic (and not call it a MUST) until #9515 and friends
> > are resolved, and an initial deployment makes our actual needs
> > clearer? (See, I'm not *always* in favour of over-engineering
> > things ;)
>
> I personally hope that #9519 ("Make the test suite more deterministic
> through network simulation") will be the final solution for the
> "transient (network) errors" problem.


Let's hope so!


Take care.

bert.