Re: [Tails-ux] Clone Persistence option in Tails Installer (…

Author: intrigeri
Date:  
To: Tails user experience & user interface design
Subject: Re: [Tails-ux] Clone Persistence option in Tails Installer (#7049)
Hi,

sajolida:
> intrigeri:
>> sajolida:
>>> - To prevent people confusing their current Tails with their backup
>>> Tails, the backup Tails could be aware that it is a backup Tails and
>>> display some warning when first started. Otherwise, changes made on
>>> the backup Tails would be overwritten when the backup is updated.
>>
>> If we can use a bit in the GPT partition flag space (see the ticket
>> about storing the Tails version number in there for technical
>> details), then this should be reasonably cheap.


> #8862 (Display version number of destination device in Tails Installer)
> is unclear about why a GPT flag is needed. I understand that, in the
> context of #8862, it would allow knowing the version number without
> mounting the system partition of the destination Tails.


Exactly.

> I think that the context of backups is different and doesn't even
> require a GPT flag. I first thought about storing this information in
> the system partition so it could be displayed (and localized) as soon as
> Tails Greeter opens. But I understand that the system partition is
> read-only so maybe that's not an option: the user wouldn't be able to
> confirm that they want to use their backup Tails and erase this flag
> from Tails Greeter.


Indeed.

> But actually, in the context of backups, this information doesn't matter
> until the Persistence is unlocked, so this information can probably be
> stored in the Persistence of the backup Tails and still displayed in
> Tails Greeter as soon as the Persistence gets unlocked and before the
> GNOME session.


Perfect! Case closed, then.

>>> Before the backup Persistence is being created, the user is prompted for
>>> a passphrase.
>>
>> What is this passphrase used for?


> To create the Persistence on the backup Tails. I assumed that we would
> work at the file system level.


Got it! Hopefully that won't be needed:

> I did a bit of research on cryptsetup to see if we could reuse the
> encryption key and the passphrase of the current LUKS volume to create
> the LUKS volume of the backup Tails in a way that would be transparent
> to the user but I don't think that's possible without writing kernel code :)


I was surprised to read this (AFAICT all the info we need is available
from userspace) so I took a quick look.

At first glance, cryptsetup luksHeaderBackup/luksHeaderRestore should
work to create that backup's LUKS volume; and then, to unlock it, one
could dump the master key from memory and pass it to cryptsetup
open --master-key-file. Forgive me if that's what you already
researched and you found out it won't work for some reason :)
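
To make the idea above concrete, here is a rough sketch that only
builds the command lines and does not run anything. The device paths,
file names and mapping name are hypothetical placeholders, extracting
the master key from the running system is deliberately left out, and
I have not tried this sequence on real Tails devices, so treat it as
an assumption to verify rather than a recipe:

```python
import shlex


def clone_luks_commands(src_dev, dst_dev, header_file, key_file, mapping_name):
    """Return the cryptsetup invocations for the header-copy approach.

    All arguments are placeholders chosen by the caller. Nothing is
    executed here: the function only returns argument lists.
    """
    return [
        # Save the LUKS header (key slots included) of the current Persistence.
        ["cryptsetup", "luksHeaderBackup", src_dev,
         "--header-backup-file", header_file],
        # Write that header onto the backup partition, so that both volumes
        # share the same master key and the same passphrase.
        ["cryptsetup", "luksHeaderRestore", dst_dev,
         "--header-backup-file", header_file],
        # Unlock the backup volume with the master key instead of a passphrase.
        # Assumption: key_file already holds the raw master key bytes.
        ["cryptsetup", "open", "--type", "luks",
         "--master-key-file", key_file, dst_dev, mapping_name],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in clone_luks_commands("/dev/sdb2", "/dev/sdc2",
                                   "/run/backup/header.img",
                                   "/run/backup/master.key",
                                   "backup-persistence"):
        print(shlex.join(cmd))
```

Both files would contain key material, so in practice they would have
to live in memory only and be wiped once the backup volume is open.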

>>> Update
>>> ------
>>
>>> Before the backup Persistence is being updated, the user is prompted for
>>> a passphrase.
>>
>> Here as well, what for?


> To unlock the Persistence of the backup Tails.


OK, so same as creation discussed above, it might or might not be
needed, depending on whether we can transparently reuse the same
LUKS key.

>>> ### Open questions
>>
>>> - Is it fine to copy the content of the current Persistence while it is
>>> being used?
>>
>> Doing so creates inconsistent backups: among a set of files that are
>> supposed to go together (be it config, program data, user data), some
>> of them are backup'ed at version N while some others at version N+1.
>>
>> Inconsistent backups ⇒ inconsistent data sets. That can break
>> software's ability to use the data to varying degrees. I can't tell
>> off-hand how bad this can be but it has the potential to make software
>> with a large and complex data store, such as Thunderbird,
>> really unhappy.


> Understood. Then we should prevent users from updating their backups
> while the problematic applications are open.


Yep.

> To polish the experience a bit, it might be worth investigating which
> applications are really problematic in practice, and not only in theory.


This could indeed allow us to increase our confidence in inconsistent
backups. For some applications (e.g. APT) we don't care as the data
can easily be retrieved from the Internet. For some others it might be
easy to prove that there can't be a problem in practice, ever. But for
some complex apps (I'm thinking of Thunderbird here again), we won't
ever be certain that we've tested all the problematic situations that
can happen in practice: it's about race conditions that can arise due
to the ordering of many moving parts (user actions, actions taken by
the application in the background) vs. the various steps of the backup
process itself. I'm wary of reasoning such as "we did not manage to
produce a really problematic race condition, ergo no problematic race
condition can happen in practice". So I'm pretty sure that for *some*
apps, we'll have to err on the safe side and ask the user to close
them. Still, if we can avoid asking the user to close *some* apps
while backing up data, it'll be an improvement, even if we ask them to
close some other apps.

(I know I might be overcautious here. That's because I'm wary of
giving users an unreliable solution that can occasionally create
unusable backups: not being able to access one's data in their
backups, whenever one needs it, sounds like pretty bad UX to me.)

> I assumed that working on the block device level would mean copying the
> whole Persistence partition block by block every time. Doing a dd of
> Tails (1.2 GB) to a USB stick takes around 5 minutes, so doing a full copy
> of a 5.5 GB Persistence would take around 30 minutes. Not fast. Especially
> if you can't use Thunderbird while your backups are updating.


That's of course a no-go.

> But maybe there are more efficient ways of operating at the block level
> that I'm not aware of!


There are more efficient ways as long as:

( comparing blocks between the source and destination device
+ writing only the changed blocks to the destination device
)

… takes less time than copying the full source device to the
destination device.

Given USB sticks are often vastly slower at writing than at reading,
and chances are that most blocks have not changed, my hunch says this
should hold true in most cases. But I guess it'll still be slower than
a solution that operates at the filesystem level and relies on mtime
to identify changed files.
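
For illustration, the compare-then-write idea can be sketched as
follows. This is a minimal sketch, not one of the tools listed below:
the function name and the 1 MiB block size are arbitrary choices of
mine, and it assumes the destination is at least as large as the
source when real block devices are involved.

```python
import os


def sync_changed_blocks(src_path, dst_path, block_size=1024 * 1024):
    """Copy only the blocks that differ from src_path to dst_path.

    Returns (blocks_compared, blocks_written). Works on regular files
    and, given enough privileges, on block devices.
    """
    compared = 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break  # end of source reached
            offset = compared * block_size
            # Reading the destination is cheap compared to writing to it.
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if dst_block != src_block:
                # Only changed blocks cost a (slow) write.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            compared += 1
        # Make sure the data has really reached the device.
        dst.flush()
        os.fsync(dst.fileno())
    return compared, written
```

Comparing the two returned counters on a real Persistence would give
a first idea of how many writes this saves over a full copy.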

Would it be useful to do some measurements, to know if operating at
the block device level is a valid option? I guess it's only useful if
operating at the block device level has substantial benefits over the
filesystem one.

Implementation-wise, a web search suggests a few somewhat relevant
options:

- There are rsync patches floating around, that add a --write-devices
option for rsync to copy only the blocks that are different.
- https://vog.github.io/bscp/
- http://www.bouncybouncy.net/programs/blocksync.py
- https://sourceforge.net/projects/blocksync/

I don't think it's worth investigating them all in depth right now.

>> S3 might be facilitated by working at the block device level (as
>> opposed to filesystem level).


> How so?


> S3 is: I want to cross a border with no data on me.


> The scenario I imagined for S3 would be:


> 1. I cross the border with no Tails device at all with me.
> 2. I install Tails on the other side of the border.
> 3. I get my Tails to fetch all my files online.


I'm curious about the "fetch all my files online" operation (see below
for potential caveats).

> I thought that this might work as well on the file system level.
> At least it's how I do it when I cross borders with no data on me :)


A block device image is inherently one single file that can be
dumped/restored and uploaded/downloaded in one single operation,
without requiring any cleverness wrt. restoring the
contents correctly.

One can of course create/restore and upload/download an archive of the
contents of the filesystem. That requires:

- before restoring: creating the destination LUKS+filesystem if it
does not exist yet; else, unlocking it

- more care to ensure the content is correctly restored (e.g. ACLs,
extended attributes, UID/GID, this sort of thing)

I don't know how much this weighs in the grand scheme of things,
hence I wrote "might be facilitated".

Cheers,
--
intrigeri