Re: [Tails-dev] Help appreciated to investigate some Jenkins…

Author: intrigeri
To: tails-sysadmins, tails-dev
Subject: Re: [Tails-dev] Help appreciated to investigate some Jenkins performance issue

Here's my summary, after spending some time understanding kibi's
report and the underlying problem:

- Copying artifacts between Jenkins and any of its builder nodes
seems to be capped at 10 MiB/s. That feels pretty low given the
sort of hardware we're running our CI on.

- Apart from the important exception discussed below, the relative
impact of this bottleneck on our CI jobs is generally not huge, but
still significant: in most cases, copying artifacts back to Jenkins
accounts for almost 10% of our CI build jobs' run time.

- There's one exception, which accounts for 30 minutes (and growing) of
waiting time on the critical path of our release process: one
recently introduced job (parallel_collect_IUKs) mostly copies
a bunch of artifacts back and forth¹, so here copying artifacts
accounts for >99% of the total job run time.

I don't know how this problem ranks compared to the other
infrastructure issues that negatively impact RM'ing work (AFAIK,
most are tracked on GitLab with the Release Management label).

I can think of 2 possible approaches:

- Sysadmins investigate and address the underlying performance
bottleneck ⇒ artifacts get copied faster.
Last time I checked, the bottleneck was CPU, but don't take my word
for it.

- RMs/FT optimize the design of parallel IUK builds on Jenkins to
lower the amount of transferred data (in the current
implementation, every built IUK is copied 3 times).
This could be a workaround, but it won't help other CI jobs.
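
As a rough sanity check of the numbers above, here is a back-of-the-envelope
sketch in Python. It assumes that the 3 copies happen one after the other,
that the ~10 MiB/s cap is sustained for the whole transfer, and that "GB"
means 10^9 bytes; none of this is measured, and the function names are made
up for illustration.

```python
# Back-of-the-envelope estimate only; the assumptions are NOT measurements:
#  - every IUK is copied 3 times, serially
#  - each copy runs at the observed ~10 MiB/s cap for its whole duration
#  - "GB" in kibi's report means 10**9 bytes
MIB = 1024 ** 2
GB = 10 ** 9

RATE_BYTES_PER_S = 10 * MIB   # observed cap between Jenkins and its nodes
COPIES = 3                    # copies of each IUK in the current design


def transfer_minutes(size_bytes, copies, rate_bytes_per_s):
    """Minutes needed to copy size_bytes `copies` times at a fixed rate."""
    return size_bytes * copies / rate_bytes_per_s / 60


def implied_bytes_per_copy(minutes, copies, rate_bytes_per_s):
    """Data volume per copy that would explain a given wall-clock time."""
    return minutes * 60 * rate_bytes_per_s / copies


# Upper bound: all IUKs for 4.9 are reported as "below 8 GB".
upper_bound_minutes = transfer_minutes(8 * GB, COPIES, RATE_BYTES_PER_S)

# Reverse direction: what total IUK size would explain the 29 minutes seen for 4.9?
implied_gb = implied_bytes_per_copy(29, COPIES, RATE_BYTES_PER_S) / GB

print(f"upper bound for 8 GB: {upper_bound_minutes:.1f} minutes")
print(f"size implied by 29 minutes: {implied_gb:.2f} GB per copy")
```

Under these assumptions, 8 GB would take about 38 minutes, and the observed
29 minutes corresponds to roughly 6.1 GB per copy, which is consistent with
"below 8 GB". So the cap alone seems sufficient to explain the waiting time,
and either approach (a faster link, or fewer copies) should shorten it
roughly proportionally.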

I encourage these teams to work together on this (and possibly on the
bigger picture of infra issues that affect RM'ing work).
If you feel the need to keep me in the loop, please do.


Fully quoting kibi's original message for more context:

Cyril Brulebois (2020-07-27):
> Instead of building all IUKs serially on a single Jenkins worker, I've
> developed a proof of concept to trigger them all in parallel, across all
> workers. The downside is that it didn't seem obvious how to gather all
> results, so I went for a downstream job; for 4.9, jobs are:
> Having all builds in a single place makes it possible for us to download
> all the things from there, by specifying a single job ID.
> But there seems to be a performance issue here. All IUKs as of Tails 4.9
> are below 8 GB, but require close to 30 minutes to get processed…
> Compared to the 1.5 hours of actual work (*building* those IUKs), that's
> an extra wait I'd be happy to spare… Lots of things to do when preparing
> a release, the shorter we wait, the better!
> Any ideas how to improve the situation?
> I'm tracking numbers in this ticket, and that doesn't seem to be a
> one-time issue:
> (See how it went from 17 minutes for 4.7, to 23 minutes for 4.8, and now
> 29 minutes for 4.9…)
> Thanks already!
> (I do realize our Jenkins instance is behind authentication, and that
> not all tails-dev@ subscribers can reach it, but I wasn't sure how to
> reach all Jenkins-knowledgeable people in an easy way…)