Re: [Tails-dev] [RFC] Design of our freezable APT repository

Author: intrigeri
Date:
To: The Tails public development discussion list

Hi,

anonym wrote (10 Mar 2016 20:06:31 GMT) :
>> Upgrading to a new snapshot

> I expect it to be quite rare that we need to encode a particular
> snapshot in a topic branch, which is both good and bad. Good, because we
> then do not have to deal with the problems it may cause very often; bad,
> because it happens rarely enough that one might not look for the
> problems all the time, and hence let them slip through. :)

> Specifically, I fear that we may have problems with merging topic
> branches that encode some snapshot into a base branch, and then forget
> to remove the encoding (or otherwise deal with it in a sane way) so it
> messes up the base branch.

> Have I missed/misunderstood something?

First of all, such encoding of snapshots is an integral part of the
changes proposed in such a topic branch; it's something one needs to
carefully review when merging, just like any other code change. In the
general case, merging a topic branch that encodes some snapshot into
a base branch means "I want that base branch to use that snapshot",
and most of the time the purpose of such a topic branch will precisely
be to bump snapshot references to newer versions, so in general
we should be good.

Let's look at the scope of potential problems though:

* The devel branch is not affected since it "always uses the freshest
set of APT repository snapshots available" (I'm not 100% sure yet
but I think this will simply be fully automatic, so one can't mess
it up by mistake).

* The testing branch can be affected by this problem, between the time
the faulty merge is done, and the time we release something based on
testing (since "the RM encodes in the `testing` Git branch the fact
that it is not frozen anymore"), that is our code freeze period.
That's the time during which the snapshot references encoded in Git
are most important, and we'll be frozen, so I expect we'll be
careful about how we deal with such information on the
testing branch.

* The handling of the stable branch in this respect is less clearly
specified, but I suspect it'll be quite close to the
testing branch's.

⇒ I'm not too concerned about this problem :)

>> Freeze exceptions
> [...]
>> 2. Pin, in config/chroot_apt/preferences, the upgraded package we
>> have just imported. The aforementioned tool can do this as well.
>>
>> [Our current default APT pinning ranks Tails overlay APT suites over
>> any other APT source, so why the need to add an APT pinning entry? The
>> problem is that it's hard to do the next step (clean up) with this APT
>> pinning, combined with the fact that we can't easily delete a package
>> from an APT suite and see this deletion propagated over suite merges. I
>> (intrigeri) was not able to find a good solution to that problem under
>> these constraints, so [...] this document assumes that we change this,
>> and pin our overlay APT suites at the same level as the APT sources
>> corresponding to the Debian release Tails is currently based on. This
>> implies that we manually pin, in Git, the packages from our overlay APT
>> suites, that we want to override the ones found in other repositories
>> regardless of version numbers.]

> I actually think making the packages we want more explicit is a good
> thing for transparency, and makes it easier, as a developer, to quickly
> see how we diverge from Debian by just looking in Git. In general, I
> believe it is better to have as much as possible encoded in Git,
> as opposed to other external repositories (APT or whatever).

Yes!
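
As a rough illustration of what such an explicit pinning entry could look like, here is a minimal Python sketch that renders one stanza in the APT preferences format (Package / Pin / Pin-Priority). The function name `render_pin`, the origin "Tails", the suite name and the priority 999 are placeholders chosen for this example; they are not taken from the blueprint or from the actual config/chroot_apt/preferences file.

```python
def render_pin(package: str, origin: str, suite: str, priority: int = 999) -> str:
    """Return one APT preferences stanza pinning `package` to a given suite.

    All values passed in are hypothetical; the real origin, suite and
    priority would come from the Tails APT setup.
    """
    return (
        f"Package: {package}\n"
        f"Pin: release o={origin},n={suite}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # Placeholder package and suite names, for illustration only.
    print(render_pin("example-package", "Tails", "example-suite"))
```

Keeping one stanza per overridden package in Git is what would make the divergence from Debian visible by just reading the file.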

> However, the next point:

>> 3. Make it so branches stop using the upgraded package once they
>> have been unfrozen [...]

> indeed exposes the problem. Manually removing the added pinnings feels a
> bit error prone and cumbersome. However, this clean-up will only happen
> when branches are unfrozen, which is only after releasing, so it
> doesn't sound too bad. Right?

Exactly. As long as this only impacts people who anyway need to
understand all these things quite deeply (i.e. RMs), and the toolkit
around it is helpful enough, I think we're good: human carefulness and
cleverness will be needed in small amounts only, and the sensitive
operations will be performed by people who know what they're doing.

> BTW, it would be great to have a linting tool that compared the current
> APT pinnings vs what is available in the current Debian branches used
> given some Tails source checkout.

I'm open to adding ideas of helpful tools to the blueprint.

I'll need help to specify more clearly what problem we want such
tools to solve, and how.

If I got it right, you want to know something like "what would happen
if we dropped our APT pinning", right? Do we want to know that for the
case when we remove APT pinning we have set up to grant freeze
exceptions only, or all APT pinning? The former, I guess, right?
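
To make the question more concrete, here is a minimal Python sketch of one possible shape for such a check. It is only an assumption about what the tool might do: it lists the packages named in a preferences file and reports those whose version in our overlay suite differs from the version in the Debian snapshot. The function names and the two version dictionaries are hypothetical inputs; the sketch does not query any APT repository and does not implement Debian version ordering, so it cannot say which version would actually win without the pin.

```python
def parse_pinned_packages(preferences_text: str) -> list[str]:
    """Return package names found on 'Package:' lines, skipping the '*' wildcard."""
    packages = []
    for line in preferences_text.splitlines():
        if line.startswith("Package:"):
            packages.extend(line.split(":", 1)[1].split())
    return [name for name in packages if name != "*"]


def pinning_drop_report(preferences_text, overlay_versions, snapshot_versions):
    """List (package, overlay_version, snapshot_version) where the two differ.

    Both version mappings are hypothetical inputs (package name -> version
    string); a real tool would have to build them from the APT sources.
    """
    report = []
    for package in parse_pinned_packages(preferences_text):
        overlay = overlay_versions.get(package)
        snapshot = snapshot_versions.get(package)
        if overlay != snapshot:
            report.append((package, overlay, snapshot))
    return report


if __name__ == "__main__":
    prefs = (
        "Package: *\nPin: release o=Debian\nPin-Priority: 700\n\n"
        "Package: foo\nPin: release o=Tails\nPin-Priority: 999\n"
    )
    for entry in pinning_drop_report(prefs, {"foo": "2.0-1"}, {"foo": "1.0-1"}):
        print(entry)
```

Restricting the report to freeze-exception pins would then only be a matter of which preferences entries are fed to it.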

>> Another option, instead of adding/removing temporary APT pinning,
>> would be to backport the package we want to upgrade, and make it so it
>> has a version greater than the one in the time-based snapshot used by
>> the frozen release branch, and lower than the one in more recent
>> time-based snapshots.

> This makes me really unenthusiastic. Please do not underestimate the
> added overhead of having to rebuild packages for trivialities like this.
> I strongly object to this approach.

Agreed ⇒ made it clear on the blueprint that this approach is NACK'ed.

>> Number of distributions
>>
>> ... in reprepro's conf/distributions, for the reprepro instance(s)
>> dedicated to taking snapshots of the regular Debian archive, assuming
>> other mirrored archives such as security.d.o, deb.tpo, etc. each go to
>> their own reprepro instance.

> This makes it sound like the design itself fixes which APT sources are
> possible to use, and that it will be a pain to add new ones. Or will
> some puppet magic automatically set up a new reprepro instance when a
> new source is added in any random branch? If so: crazy! :)

No, we don't plan to have any such magic thing.

Indeed, we do hard-code such things.

> To make the problem a bit more concrete, you later list:

>> torproject: 5 (oldstable, stable, testing, unstable, obfs4proxy)

> which doesn't include the *-experimental branches. How would we deal
> with a Tor-alpha integration branch, for instance? Would we be forced to
> follow the releases manually, and then upload them ourselves to e.g.
> deb.t.b.o?

No. We would need -sysadmins@'s cooperation to add the suite you need
to our snapshotting setup.

Of course a pull request against the puppet-tails repository would be
the preferred way to do it; in this case you would have to modify two
files:

https://git-tails.immerda.ch/puppet-tails/tree/templates/reprepro/snapshots/time_based/torproject/distributions.erb
https://git-tails.immerda.ch/puppet-tails/tree/templates/reprepro/snapshots/time_based/torproject/updates.erb

But just asking "can we please mirror $suite from $apt_repo" would
work too :)

IMO this is a rare use case, and it's good enough to handle it
this way.

> Something I think we still need to support is adding APT sources (to
> {{binary,config}/chroot_sources) that exist outside of the freezable APT
> repo system. I imagine this will remain useful for contributors who do
> not have the ability to upload packages to any of the already added
> ones.

Yes, added to the blueprint in the TODO section.

> Sure, we have config/chroot_local-packages, but it's not so nice
> for contributors if they want to push branches to some Git repo. Imagine
> if someone wanted to contribute grsec kernel integration. They would have
> to push a commit with binary blobs in the order of 100 MiB.

config/chroot_local-packages/README discourages doing that,
rightfully if you ask me.

I don't understand your example, though: linux-grsec is in Debian, so
with the current design one doesn't need to upload anything. I assume
you mean something like "say someone wanted to contribute integration
bits for $large_package_that_is_not_in_Debian_yet", instead. Right?

>> Garbage collection
> [...]
>> To ensure that garbage collection doesn't delete a snapshot we still
>> need, e.g. the one currently referenced in the frozen testing branch,
>> we'll maintain a list of snapshots that need to be kept around.

> To be clear: each topic branch could potentially have encoded a
> different snapshot, correct?

Yes.

> In practice most will just follow their
> base branches,

Yes.

> but my point is that the garbage collector will have to
> check each branch, right?

I think this would be over-engineering it a lot, given what our actual
use cases are.

First of all, note that a time-based snapshot's Valid-Until will be
set to only 10 days after generation, so a snapshot won't be usable
for more than 10 days, unless we bump its expiration date manually.
In practice, topic branches that bump snapshot references will be rare
and short-lived, so I think we'll rarely have to bump the expiration
date and tell the GC about this exception.

Speaking of which, I see two main ways to handle the garbage
collection process:

a. use a manually maintained list of snapshots that need to be
kept around, as the blueprint currently suggests;

b. rely on Valid-Until; i.e. the way to express "I want to keep
a given snapshot around" would be to postpone its expiration date;
I see no reason to differentiate "keep a given snapshot around"
from "keep a given snapshot usable".

I think we should do (b), _and_ have some cronjob warn us if we're
going to have serious problems, e.g. if the snapshot used by a frozen
testing branch is going to expire (and be deleted); this avoids the
need to maintain a list of exceptions.
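
A minimal Python sketch of option (b), under stated assumptions: each snapshot is represented only by a name and its Valid-Until timestamp, garbage collection deletes whatever has expired, and the cronjob check warns about snapshots still referenced by a branch that will expire within some margin. The function names, the snapshot names and the 2-day margin are hypothetical; nothing here reflects the actual implementation.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(valid_until, now):
    """Return the names of snapshots whose Valid-Until has passed (GC candidates)."""
    return sorted(name for name, expiry in valid_until.items() if expiry <= now)


def expiring_soon(valid_until, in_use, now, margin=timedelta(days=2)):
    """Return in-use snapshots that are still valid but expire within `margin`.

    `in_use` is a hypothetical set of snapshot names referenced by, e.g.,
    the frozen testing branch; the margin is an arbitrary example value.
    """
    return sorted(
        name
        for name in in_use
        if name in valid_until and now < valid_until[name] <= now + margin
    )


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2016, 3, 20, tzinfo=utc)
    # Hypothetical snapshot names mapped to their Valid-Until dates.
    valid_until = {
        "2016030901": datetime(2016, 3, 19, tzinfo=utc),
        "2016031201": datetime(2016, 3, 21, tzinfo=utc),
        "2016031801": datetime(2016, 3, 28, tzinfo=utc),
    }
    print("to delete:", expired_snapshots(valid_until, now))
    print("warn about:", expiring_soon(valid_until, {"2016031201", "2016031801"}, now))
```

With this shape, "keep a snapshot around" needs no separate list: postponing its Valid-Until is enough to keep it out of both results.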

Let's discuss separately the two main cases:

* frozen testing branch: we rarely freeze for more than 10 days, so
in the general case there's no problem; and the cronjob check
mentioned above should help us deal with corner cases.

* topic branch that hard-codes snapshot references for some reason:
in the only practical use cases I have in mind for such branches,
they would be short-lived, and quickly merged into testing, and
then we're back to the above case. We can still deal with corner
cases by manually bumping Valid-Until when we know in advance we
want to keep them around for a longer while; and when we forget and
the snapshot expires & is deleted, well, really: in what situation
do we need a topic branch that points to an old snapshot this way?
I can think of special cases that would be nice bonuses, but they
are not part of the design goals of this freezable APT repo
project, and I feel we need to avoid growing its scope too much.

Thoughts?

> I didn't look much past this, since it seemed a bit too
> implementation-focused, and even about things that we will not or may not
> ever have.

I hope I have clarified this above.

> However, I see nothing about how to deal with Debian packages that
> fetch something external at install time (firmwares, sources illegal
> in some jurisdictions). This sounds like a non-trivial problem, and I
> really wonder what your thoughts on solutions are.

Indeed, that's outside of the scope of the current "freezable APT
repository" project. My current best solution for that is to package
all these things as .deb's somewhere (possibly in a very ad-hoc way in
our own overlay APT repo), so we get them handled (snapshotted etc.)
for free just like any other package. What do you think?

> Crazy idea: along with the snapshots we also have a particular cache of
> a caching proxy. For instance, we set `http_proxy` in the build env to
> point to some caching proxy running in our infra, and when we build with
> snapshot X, we set the proxy username to X to tell the proxy that cache
> X is to be used. Cache X contains exactly the cached files used when
> building with snapshot X the first time, because that is when they were
> seeded. Crazy! We'd need some sort of access control for this, though.
> :)

Yes, crazy :) Frankly, I need to focus on stuff that's in the scope
of the project I'm working on right now, so I haven't thought of
it much.

> And I also wonder if the same mechanism can be used to determine the
> "tagged, partial snapshots", instead of the "'build the almost-final
> image' to generate the package manifest" thing I mentioned I didn't like
> above. Think of it as an apt-cacher-ng instance that is seeded when
> doing the canonical release build, and then frozen forever.

Indeed, that might have been another way to implement it.
We're "almost done" with our current implementation so I doubt we want
to switch to this new idea now, though :)

Cheers!
--
intrigeri
