Re: [Tails-dev] [RFC] Design of our freezable APT repository

From: anonym
Date:
To: The Tails public development discussion list
Subject: Re: [Tails-dev] [RFC] Design of our freezable APT repository
intrigeri:
> Hi,
>
> anonym wrote (10 Mar 2016 20:06:31 GMT) :
>>> Upgrading to a new snapshot
>
>> I expect it to be quite rare that we need to encode a particular
>> snapshot in a topic branch, which is both good and bad. Good, because we
>> then do not have to deal with the problems it may cause very often; bad,
>> because it happens rarely enough that one might not look for the
>> problems all the time, and hence let them slip through. :)
>
>> Specifically, I fear that we may have problems with merging topic
>> branches that encode some snapshot into a base branch, and then forget
>> to remove the encoding (or otherwise deal with it in a sane way) so it
>> messes up the base branch.
>
>> Have I missed/misunderstood something?
>
> First of all, such encoding of snapshots is an integral part of
> proposed changes in such a topic branch; it's something one needs to
> carefully review when merging, just like any other code change. In the
> general case, merging a topic branch that encodes some snapshot into
> a base branch means "I want that base branch to use that snapshot",
> and most of the time the purpose of such a topic branch will precisely
> be to bump snapshot references to newer versions, so in general
> we should be good.


Ack, fair enough.

> Let's look at the scope of potential problems though:
>
> * The devel branch is not affected since it "always uses the freshest
> set of APT repository snapshots available" (I'm not 100% sure yet
> but I think this will simply be fully automatic so one can't mess up
> with it by mistake).


In the future, if we become a rolling distro based on Debian Testing,
we'll have to reconsider this, but the design indeed allows us to change
this behaviour easily, so let's forget about it for now.

> * The testing branch can be affected by this problem, between the time
> the faulty merge is done, and the time we release something based on
> testing (since "the RM encodes in the `testing` Git branch the fact
> that it is not frozen anymore"), that is our code freeze period.
> That's the time during which the snapshot references encoded in Git
> are most important, and we'll be frozen, so I expect we'll be
> careful about how we deal with such information on the
> testing branch.
>
> * The handling of the stable branch in this respect is less clearly
> specified, but I suspect it'll be quite close to the
> testing branch's.


Agreed. And for the stable branch we also need to be extra careful since
this is the type of issue one does not want to spend brain power on
resolving in case of an emergency release.

> ⇒ I'm not too concerned about this problem :)


Now I'm not either any more, thanks!

>>> Freeze exceptions

[...]
>> BTW, it would be great to have a linting tool that compared the current
>> APT pinnings vs what is available in the current Debian branches used
>> given some Tails source checkout.
>
> I'm open to adding ideas of helpful tools to the blueprint.
>
> I'll need help to specify more clearly what problem we want desired
> tools to solve, and how.
>
> If I got it right, you want to know something like "what would happen
> if we dropped our APT pinning", right? Do we want to know that for the
> case when we remove APT pinning we have set up to grant freeze
> exceptions only, or all APT pinning? The former, I guess, right?


Well, I'm not sure how it would be determined if a pinning was added for
a freeze exception exactly, and not some other purpose. Anyway, this
tool seems to be useful in the latter case you talk about too, to keep
our pinnings trimmed, especially if we become a rolling distro, and may
have to frequently pin stuff (security updates) from Debian Unstable.
Furthermore, I expect this latter case to be easier to solve, and I
think I'd be happy enough with that one solved -- with informative
enough output it will be easy enough to use it for the first case too.
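
A minimal sketch of what such a linter could look like, for the "all APT
pinning" case. Everything here is an assumption for illustration: the
function names are hypothetical, the per-suite package/version mapping
is assumed to be gathered elsewhere, and versions are only compared for
equality (a real tool would need proper Debian version ordering, e.g.
via `dpkg --compare-versions`).

```python
import re


def parse_preferences(text):
    """Parse APT preferences stanzas into a list of field dicts."""
    pins = []
    for stanza in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in stanza.splitlines():
            if line.startswith("#") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields and "Pin" in fields:
            pins.append(fields)
    return pins


def pinned_suite(pin):
    """Return the suite named by an 'a=' or 'n=' release pin, else None."""
    match = re.search(r"\b[an]=([^,\s]+)", pin)
    return match.group(1) if match else None


def lint_pins(preferences_text, available, base_suite):
    """Report pins that look dangling or possibly droppable.

    `available` maps suite -> {package: version}; how it gets filled
    (e.g. from Packages indices of a snapshot) is left out of this sketch.
    """
    findings = []
    for pin in parse_preferences(preferences_text):
        suite = pinned_suite(pin["Pin"])
        if suite is None:
            continue  # version/origin pins are ignored in this sketch
        for package in pin["Package"].split():
            pinned_version = available.get(suite, {}).get(package)
            base_version = available.get(base_suite, {}).get(package)
            if pinned_version is None:
                findings.append(
                    (package, suite, "not available in pinned suite"))
            elif pinned_version == base_version:
                findings.append(
                    (package, suite,
                     "same version in base suite; pin may be droppable"))
    return findings
```

The output is a plain list of (package, suite, reason) tuples, so it
would be up to a human to decide which of the reported pins were freeze
exceptions.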

>>> Another option, instead of adding/removing temporary APT pinning,
>>> would be to backport the package we want to upgrade, and make it so it
>>> has a version greater than the one in the time-based snapshot used by
>>> the frozen release branch, and lower than the one in more recent
>>> time-based snapshots.
>
>> This makes me really unenthusiastic. Please do not underestimate the
>> added overhead of having to rebuild packages for trivialities like this.
>> I strongly object to this approach.
>
> Agreed ⇒ made it clear on the blueprint that this approach is NACK'ed.


\o/

>>> Number of distributions
>>>
>>> ... in reprepro's conf/distributions, for the reprepro instance(s)
>>> dedicated to taking snapshots of the regular Debian archive, assuming
>>> other mirrored archives such as security.d.o, deb.tpo, etc. each go to
>>> their own reprepro instance.
>
>> This makes it sound like the design itself fixes which APT sources are
>> possible to use, and that it will be a pain to add new ones. Or will
>> some puppet magic automatically set up a new reprepro instance when a
>> new source is added in any random branch? If so: crazy! :)
>
> No, we don't plan to have any such magic thing.
>
> Indeed, we do hard-code such things.


That is what I expected, and, to be clear, it indeed seems sound. :)

>> To make the problem a bit more concrete, you later list:
>
>>> torproject: 5 (oldstable, stable, testing, unstable, obfs4proxy)
>
>> which doesn't include the *-experimental branches. How would we deal
>> with a Tor-alpha integration branch, for instance? Would we be forced to
>> follow the releases manually, and then upload them ourselves to e.g.
>> deb.t.b.o?
>
> No. We would need -sysadmins@'s cooperation to add the suite you need
> to our snapshotting setup.

[...]
> IMO this is a rare use case, and it's good enough to handle it
> this way.


Fair enough.

>> Sure, we have config/chroot_local-packages, but it's not so nice
>> for contributors if they want to push branches to some Git repo. Imagine
>> if someone wanted to contribute grsec kernel integration. They would have
>> to push a commit with binary blobs in the order of 100 MiB.
>
> config/chroot_local-packages/README discourages doing that,
> rightfully if one asks me.
>
> I don't understand your example, though: linux-grsec is in Debian, so
> with the current design one doesn't need to upload anything. I assume
> you mean something like "say someone wanted to contribute integration
> bits for $large_package_that_is_not_in_Debian_yet", instead. Right?


Sorry, I just forgot about its recent inclusion in Debian. Re-read my
statement as if that had not happened, and we considered including e.g.
a package from a repo made by some other project, Subgraph, Mempo or
whatever.

>>> Garbage collection
>> [...]
>> but my point is that the garbage collector will have to
>> check each branch, right?
>
> I think this would be over-engineering it a lot, given what our actual
> use cases are.


Indeed, some minor manual work can work around this, as you point out.

[...]
> Speaking of which, I see two main ways to handle the garbage
> collection process:
>
> a. use a manually maintained list of snapshots that need to be
>    kept around, as the blueprint currently suggests;
>
> b. rely on Valid-Until; i.e. the way to express "I want to keep
>    a given snapshot around" would be to postpone its expiration date;
>    I see no reason to differentiate "keep a given snapshot around"
>    from "keep a given snapshot usable".
>
> I think we should do (b), _and_ have some cronjob warn us if we're
> going to have serious problems, e.g. if the snapshot used by a frozen
> testing branch is going to expire (and be deleted); this avoids the
> need to maintain a list of exceptions.


This sounds reasonable.
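
A minimal sketch of the check such a cronjob could run, assuming the
snapshot's Release file carries a `Valid-Until` field in the usual
format (e.g. "Sat, 19 Mar 2016 12:00:00 UTC"). The function names and
the 3-day warning threshold are hypothetical, and how the Release file
of the snapshot used by a given branch gets located is left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def valid_until(release_text):
    """Return the Valid-Until date of a Release file, or None if absent."""
    for line in release_text.splitlines():
        if line.startswith("Valid-Until:"):
            when = parsedate_to_datetime(line.split(":", 1)[1].strip())
            if when.tzinfo is None:
                # Treat a date without timezone information as UTC.
                when = when.replace(tzinfo=timezone.utc)
            return when
    return None


def expiry_warning(release_text, now, warn_days=3):
    """Return a warning string, or None if the snapshot is safe for now."""
    expires = valid_until(release_text)
    if expires is None:
        return "no Valid-Until field found"
    remaining = expires - now
    if remaining <= timedelta(0):
        return "snapshot has already expired"
    if remaining <= timedelta(days=warn_days):
        return f"snapshot expires in {remaining.days} day(s)"
    return None


if __name__ == "__main__":
    sample = "Origin: Debian\nValid-Until: Sat, 19 Mar 2016 12:00:00 UTC\n"
    print(expiry_warning(sample, datetime.now(timezone.utc)))
```

Printing only when there is something to warn about would let cron's
usual mail-on-output behaviour do the notification.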

> Let's discuss separately the two main cases:
>
>  * frozen testing branch: we rarely freeze for more than 10 days, so
>    in the general case there's no problem; and the cronjob check
>    mentioned above should help us deal with corner cases.


Sure, but it *does* happen. Let's just state in the freeze section of
the release docs that the snapshot expiry must be set to the expected
release date + 5 days, to account for release delays.
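
For illustration, the "expected release date + 5 days" rule boils down
to the following; the function name is hypothetical, and expiring at
midnight UTC is an assumption of this sketch, not something we have
decided.

```python
from datetime import date, datetime, time, timedelta, timezone


def freeze_expiry(expected_release, margin_days=5):
    """Return the snapshot expiry for a freeze: release date + margin."""
    expiry_day = expected_release + timedelta(days=margin_days)
    # Assumption: the snapshot expires at 00:00 UTC on that day.
    return datetime.combine(expiry_day, time.min, tzinfo=timezone.utc)


if __name__ == "__main__":
    print(freeze_expiry(date(2016, 3, 15)).isoformat())
```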

>  * topic branch that hard-codes snapshot references for some reason:
>    in the only practical use cases I have in mind for such branches,
>    they would be short-lived, and quickly merged into testing, and
>    then we're back to the above case. We can still deal with corner
>    cases by manually bumping Valid-Until when we know in advance we
>    want to keep them around for a longer while; and when we forget and
>    the snapshot expires & is deleted, well, really: in what situation
>    do we need a topic branch that points to an old snapshot this way?
>    I can think of special cases that would be nice bonuses, but they
>    are not part of the design goals of this freezable APT repo
>    project, and I feel we need to avoid growing its scope too much.


Sure. Out of context, the "quickly merged" part sounds like something we
could fail on occasionally, but in the context you present it seems like
we've done well historically, at least.

>> I didn't look much past this, since it seemed a bit too
>> implementation-focused, and even about things that we will not or may not
>> ever have.
>
> I hope I have clarified this above.


Yes, to be clear, all of this looks very good! :)

>> However, I see nothing about how to deal with Debian packages that
>> fetch something external at install time (firmware, sources illegal
>> in some jurisdictions). This sounds like a non-trivial problem, and I
>> really wonder what your thoughts on solutions are.
>
> Indeed, that's outside of the scope of the current "freezable APT
> repository" project.


I see. This implies that there is no explicit goal (yet) to make each
Tails release buildable "forever", correct? I sort of assumed it would
be part of this, so it's good if we can clear that up immediately. And
I agree (with the implicit assumption) that we can keep things simpler
for now until we start working on reproducible builds for real.

> My current best solution for that is to package
> all these things as .deb's somewhere (possibly in a very ad-hoc way in
> our own overlay APT repo), so we get them handled (snapshotted etc.)
> for free just like any other package. What do you think?


Isn't the problem that some of these Debian packages fetch these blobs
from static URLs during package installation? How would a .deb
containing the blobs help, then? Is there some Debian packaging
mechanism that all of these use that looks for files in some cache first
where you intend to place them?

>> Crazy idea: along with the snapshots we also have a particular cache of
>> a caching proxy. For instance, we set `http_proxy` in the build env to
>> point to some caching proxy running in our infra, and when we build with
>> snapshot X, we set the proxy username to X to tell the proxy that cache
>> X is to be used. Cache X contains exactly the cached files used when
>> building with snapshot X the first time, because that is when they were
>> seeded. Crazy! We'd need some sort of access control for this, though.
>> :)
>
> Yes, crazy :) Frankly, I need to focus on stuff that's in the scope
> of the project I'm working on right now, so I haven't thought of
> it much.


Ack! It's mostly relevant for the reproducible builds part, so let's not
focus on it until then; but let's keep this future design change in mind
while implementing this design, ok?
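
To make the caching proxy idea a bit more concrete, here is a minimal
sketch of the build-environment side only: the snapshot identifier goes
into the username part of `http_proxy`. The proxy host, port and
function name are hypothetical placeholders, and the proxy-side logic
(one cache per username, access control) is not covered at all.

```python
from urllib.parse import quote, urlsplit


def proxy_env_for_snapshot(snapshot_id,
                           proxy_host="proxy.example.org",
                           proxy_port=3142):
    """Build environment variables selecting the cache for one snapshot.

    The host and port defaults are placeholders, not real infrastructure.
    """
    # Percent-encode the identifier so it is safe in the userinfo part.
    user = quote(snapshot_id, safe="")
    return {"http_proxy": f"http://{user}@{proxy_host}:{proxy_port}/"}


if __name__ == "__main__":
    env = proxy_env_for_snapshot("2016031001")
    # The proxy would read the username back to pick the cache to use.
    print(urlsplit(env["http_proxy"]).username)
```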

>> And I also wonder if the same mechanism can be used to determine the
>> "tagged, partial snapshots", instead of the "'build the almost-final
>> image' to generate the package manifest" thing I mentioned I didn't like
>> above. Think of it as an apt-cacher-ng instance that is seeded when
>> doing the canonical release build, and then frozen forever.
>
> Indeed, that might have been another way to implement it.
> We're "almost done" with our current implementation so I doubt we want
> to switch to this new idea now, though :)


So you send an RFC on a *design* when it is already *implemented* and
cannot be changed? A true RFC (in the IETF sense) indeed! :) (This is
not an actual critique, just a bad joke! :))

Cheers!