Re: [Tails-ux] Clone Persistence option in Tails Installer (…

Author: sajolida
Date:  
To: Tails user experience & user interface design
Subject: Re: [Tails-ux] Clone Persistence option in Tails Installer (#7049)
intrigeri:
> sajolida:
>> intrigeri:
>>> sajolida:
>>>> Before the backup Persistence is created, the user is prompted for
>>>> a passphrase.
>>>
>>> What is this passphrase used for?
>
>> To create the Persistence on the backup Tails. I assumed that we would
>> work at the file system level.
>
> Got it! Hopefully that won't be needed:
>
>> I did a bit of research on cryptsetup to see if we could reuse the
>> encryption key and the passphrase of the current LUKS volume to create
>> the LUKS volume of the backup Tails in a way that would be transparent
>> to the user but I don't think that's possible without writing kernel code :)
>
> I was surprised to read this (AFAICT all the info we need is available
> from userspace) so I took a quick look.
>
> At first glance, cryptsetup luksHeaderBackup/luksHeaderRestore should
> work to create that backup LUKS volume; and then, to unlock it, one
> could dump the master key from memory and pass it to cryptsetup
> open --master-key-file. Forgive me if that's what you already
> researched and you found out it won't work for some reason :)
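For the archives, here's a rough Python sketch of the sequence intrigeri describes. It only builds the cryptsetup command lines instead of running them; the device paths, the header and key file names, and the mapping name are made up for illustration:

```python
from pathlib import Path


def clone_luks_header_commands(
    source_dev: str,
    backup_dev: str,
    header_file: Path,
    key_file: Path,
    mapping_name: str,
) -> list[list[str]]:
    """Return, without running them, the cryptsetup commands for the
    "same header" approach. All arguments are hypothetical examples."""
    return [
        # Save the LUKS header (keyslots included) of the current Persistence.
        ["cryptsetup", "luksHeaderBackup", source_dev,
         "--header-backup-file", str(header_file)],
        # Write that header onto the backup partition.
        ["cryptsetup", "luksHeaderRestore", backup_dev,
         "--header-backup-file", str(header_file)],
        # Unlock the backup with the master key instead of a passphrase.
        ["cryptsetup", "open", "--type", "luks", backup_dev, mapping_name,
         "--master-key-file", str(key_file)],
    ]


if __name__ == "__main__":
    for cmd in clone_luks_header_commands(
        "/dev/sdb2", "/dev/sdc2", Path("/tmp/header.img"),
        Path("/tmp/master.key"), "TailsBackup",
    ):
        print(" ".join(cmd))
```

As far as I understand, after luksHeaderRestore both volumes share the same master key and keyslots, so the user's existing passphrase would also unlock the backup.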


I'm glad that you got excited about the "same header" approach :)

I thought that the hard part would be dumping the master key from
memory. I couldn't find any cryptsetup operation to do that, and thought
that it might be deliberately hard to access.
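One candidate I haven't tried: `dmsetup table --showkeys` on the unlocked Persistence mapping (as root) prints the dm-crypt table including the key. Below is a Python sketch that parses such a line and writes the key to a file for `cryptsetup open --master-key-file`. It assumes the key is shown in hex, which as far as I know is not the case when cryptsetup loads it through the kernel keyring (the LUKS2 default); the sample line and file path are hypothetical:

```python
import os


def master_key_from_dm_table(table_line: str) -> bytes:
    """Extract the volume (master) key from one line of the output of
    `dmsetup table --showkeys <mapping>` for a dm-crypt mapping.

    Assumption: the key is printed in hex. A ":<size>:<type>:<description>"
    field means the key lives in the kernel keyring and is not shown.
    """
    fields = table_line.split()
    # dm-crypt table: start length "crypt" cipher key iv_offset device offset [options]
    if len(fields) < 8 or fields[2] != "crypt":
        raise ValueError("not a dm-crypt table line")
    key_field = fields[4]
    if key_field.startswith(":"):
        raise ValueError("key is held in the kernel keyring, not shown in hex")
    return bytes.fromhex(key_field)


def write_key_file(key: bytes, path: str) -> None:
    """Write the key to a new file readable by its owner only (mode 0600).

    The file should be on a RAM-backed filesystem and deleted right after
    `cryptsetup open` has used it.
    """
    # O_EXCL: refuse to overwrite an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
```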

>>>> ### Open questions
>>>
>>>> - Is it fine to copy the content of the current Persistence while it is
>>>> being used?
>>>
>>> Doing so creates inconsistent backups: among a set of files that are
>>> supposed to go together (be it config, program data, user data), some
>>> of them are backed up at version N and others at version N+1.
>>>
>>> Inconsistent backups ⇒ inconsistent data sets. That can break
>>> software's ability to use the data to various degrees. I can't tell
>>> off-hand how bad this can be but it has the potential to make software
>>> with large and complex data stores, such as Thunderbird,
>>> really unhappy.
>
>> Understood. Then we should prevent users from updating their backups
>> while the problematic applications are open.
>
> Yep.
>
>> To polish the experience a bit, it might be worth investigating which
>> applications are really problematic in practice, and not only in theory.
>
> This could indeed allow us to increase our confidence in inconsistent
> backups. For some applications (e.g. APT) we don't care as the data
> can easily be retrieved from the Internet. For some others it might be
> easy to prove that there can't be a problem in practice, ever. But for
> some complex apps (I'm thinking of Thunderbird here again), we won't
> ever be certain that we've tested all the problematic situations that
> can happen in practice: it's about race conditions that can arise due
> to the ordering of many moving parts (user actions, actions taken by
> the application in the background) vs. the various steps of the backup
> process itself. I'm wary of a reasoning such as "we did not manage to
> produce a really problematic race condition, ergo no problematic race
> condition can happen in practice". So I'm pretty sure that for *some*
> apps, we'll have to err on the safe side and ask the user to close
> them. Still, if we can avoid asking the user to close *some* apps
> while backing up data, it'll be an improvement, even if we ask them to
> close some other apps.
>
> (I know I might be overcautious here. That's because I'm wary of
> giving users an unreliable solution that can occasionally create
> unusable backups: not being able to access one's data in their
> backups, whenever one needs it, sounds like pretty bad UX to me.)
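One cheap safety net, sketched in Python under my own assumptions: snapshot the mtime and size of every file before and after the copy, and flag the backup if anything changed in between. This doesn't make the backup consistent; it only detects that something was written during the copy, so the tool could retry or warn. It would miss changes that keep the same size and mtime (e.g. within the filesystem's timestamp granularity). The function names are hypothetical:

```python
import os
import shutil
from pathlib import Path


def snapshot(root: Path) -> dict[str, tuple[int, int]]:
    """Map each non-directory entry under root to (mtime_ns, size)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.lstat()  # do not follow symlinks
            result[str(path.relative_to(root))] = (st.st_mtime_ns, st.st_size)
    return result


def copy_and_detect_changes(src: Path, dst: Path) -> set[str]:
    """Copy src to dst, then return the relative paths of files that were
    modified, added or removed while the copy was running.
    A non-empty result means the backup may be inconsistent."""
    before = snapshot(src)
    shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)
    after = snapshot(src)
    return {path for path in before.keys() | after.keys()
            if before.get(path) != after.get(path)}
```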


I also want to be cautious while bothering users to the minimum and I
think that we basically agree here.

- Thunderbird sounds like a good example of an application to be careful
about.
- I would guess that backing up bookmarks while Tor Browser is running
wouldn't be so problematic, whereas preventing users from using Tor
Browser for 10 minutes while their backups are updating would be annoying.
- Same for backing up the network connections while the network is
connected :)
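The "close these apps first" check could be as simple as comparing the running process names with a list. A rough Python sketch for Linux; the list itself is made up and would come out of the kind of testing discussed above:

```python
from pathlib import Path

# Hypothetical list of apps that must be closed before backing up.
MUST_CLOSE = {"thunderbird"}


def running_process_names(proc: Path = Path("/proc")) -> set[str]:
    """Return the command names of running processes, read from /proc.

    Note: /proc/<pid>/comm is truncated to 15 characters by the kernel.
    """
    names = set()
    for entry in proc.iterdir():
        if entry.name.isdigit():
            try:
                names.add((entry / "comm").read_text().strip())
            except OSError:
                # The process exited while we were scanning.
                continue
    return names


def apps_blocking_backup(running: set[str],
                         must_close: set[str] = MUST_CLOSE) -> list[str]:
    """Return, sorted, the running apps the user must close first."""
    return sorted(running & must_close)
```

Apps not in the list, like Tor Browser in the example above, would simply be backed up while running.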

>> But maybe there are more efficient ways of operating at the block level
>> that I'm not aware of!
>
> There are more efficient ways as long as:
>
>  (   comparing blocks between the source and destination device
>    + writing only the changed blocks to the destination device
>  )
>
> … takes less time than copying the full source device to the
> destination device.
>
> Given USB sticks are often vastly slower at writing than at reading,
> and chances are that most blocks have not changed, my hunch says this
> should hold true in most cases. But I guess it'll still be slower than
> a solution that operates at the filesystem level and relies on mtime
> to identify changed files.
>
> Would it be useful to do some measurements, to know if operating at
> the block device level is a valid option? I guess it's only useful if
> operating at the block device level has substantial benefits over the
> filesystem one.
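To make the block-level option concrete, here's a minimal Python sketch of "compare blocks, write only the changed ones". It works on any pair of binary file objects, so it could be pointed at the two unlocked devices (or, as far as I understand, at the raw encrypted partitions if they share the same header, since the same key and sector-based IVs should give identical ciphertext for identical data). The block size is an arbitrary assumption:

```python
import io
from typing import BinaryIO

BLOCK_SIZE = 1024 * 1024  # 1 MiB, an arbitrary choice for this sketch


def sync_blocks(src: BinaryIO, dst: BinaryIO, block_size: int = BLOCK_SIZE) -> int:
    """Make dst match src, writing only the blocks that differ.

    dst must be open for reading and writing in binary mode (e.g.
    open(path, "r+b")) and be at least as large as src; anything past
    the end of src is left untouched. Returns the number of blocks written.
    """
    written = 0
    offset = 0
    src.seek(0)
    while True:
        new = src.read(block_size)
        if not new:
            break
        # Reading the destination block is the extra cost of this approach...
        dst.seek(offset)
        old = dst.read(len(new))
        if old != new:
            # ...in exchange for writing only the blocks that changed.
            dst.seek(offset)
            dst.write(new)
            written += 1
        offset += len(new)
    dst.flush()
    return written


if __name__ == "__main__":
    source = io.BytesIO(b"A" * 10 + b"B" * 10 + b"C" * 5)
    backup = io.BytesIO(b"A" * 10 + b"X" * 10 + b"C" * 5)
    print(sync_blocks(source, backup, block_size=10))  # prints 1
```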


I can't think of any, as long as we can tell our backup tool to preserve
EVERYTHING (ACLs, links, etc.).

Funnily enough, anarcat blogged about doing this with rsync last month:

https://anarc.at/blog/2019-07-07-rsync-oneliner/
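For the record, here's how I'd spell "save EVERYTHING" with rsync, as a Python sketch that only builds the command line. These are standard rsync options, but this is not necessarily the command from anarcat's post, and the paths are placeholders:

```python
def rsync_preserve_all(src: str, dst: str) -> list[str]:
    """Build an rsync command that copies src into dst, keeping metadata."""
    return [
        "rsync",
        "--archive",     # -a: recurse; keep permissions, times, owner, group, symlinks, devices
        "--hard-links",  # -H: keep hard links (not included in --archive)
        "--acls",        # -A: keep POSIX ACLs
        "--xattrs",      # -X: keep extended attributes
        "--numeric-ids", # keep UID/GID numbers instead of mapping by name
        "--delete",      # remove files from dst that were deleted from src
        src.rstrip("/") + "/",  # trailing slash: copy the contents of src
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(rsync_preserve_all("/path/to/persistence", "/path/to/backup")))
```

The destination can also be a remote `user@host:path`, which is what fetching everything online could look like.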

> Implementation wise a web search suggests a few somewhat relevant
> options:
>
>  - There are rsync patches floating around that add a --write-devices
>    option for rsync to copy only the blocks that are different.
>  - https://vog.github.io/bscp/
>  - http://www.bouncybouncy.net/programs/blocksync.py
>  - https://sourceforge.net/projects/blocksync/
>
> I don't think it's worth investigating them all in depth right now.


Ack.

>>> S3 might be facilitated by working at the block device level (as
>>> opposed to filesystem level).
>
>> How so?
>
>> S3 is: I want to cross a border with no data on me.
>
>> The scenario I imagined for S3 would be:
>
>> 1. I cross the border with no Tails device at all with me.
>> 2. I install Tails on the other side of the border.
>> 3. I get my Tails to fetch all my files online.
>
> I'm curious about the "fetch all my files online" operation (see below
> for potential caveats).
>
>> I thought that this might work as well on the file system level.
>> At least it's how I do it when I cross borders with no data on me :)
>
> A block device image is inherently one single file that can be
> dumped/restored and uploaded/downloaded in one single operation,
> without requiring any cleverness wrt. restoring the
> contents correctly.
>
> One can of course create/restore and upload/download an archive of the
> contents of the filesystem. That requires:
>
>  - before restoring: creating the destination LUKS+filesystem if it
>    does not exist yet; else, unlocking it
>
>  - more care to ensure the content is correctly restored (e.g. ACLs,
>    extended attributes, UID/GID, that sort of thing)
>
> I don't know how much this weighs in the grand scheme of things,
> hence I wrote "might be facilitated".
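One way to check that "more care" after restoring an archive: compare the metadata of the original and restored trees. A Linux-only Python sketch under my own assumptions; it relies on POSIX ACLs being exposed as `system.posix_acl_*` extended attributes, which I believe is the case on Linux, and the function names are hypothetical:

```python
import os
import stat
from pathlib import Path


def metadata_tree(root: Path) -> dict[str, tuple]:
    """Map each path under root (relative) to the metadata a restore must keep."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            st = os.lstat(path)
            xattrs: tuple = ()
            if hasattr(os, "listxattr"):
                try:
                    xattrs = tuple(sorted(
                        (key, os.getxattr(path, key, follow_symlinks=False))
                        for key in os.listxattr(path, follow_symlinks=False)))
                except OSError:
                    pass  # filesystem without xattr support
            target = os.readlink(path) if stat.S_ISLNK(st.st_mode) else None
            result[str(path.relative_to(root))] = (
                stat.S_IFMT(st.st_mode), stat.S_IMODE(st.st_mode),
                st.st_uid, st.st_gid, xattrs, target)
    return result


def restore_differences(original: Path, restored: Path) -> set[str]:
    """Return the relative paths whose metadata differs (or that are missing)."""
    a, b = metadata_tree(original), metadata_tree(restored)
    return {p for p in a.keys() | b.keys() if a.get(p) != b.get(p)}
```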


Right. On the other hand, if you have a 32 GB Persistence that only has
1 GB of files in it and you want to fetch it online on the block
device level, I understand that you would have to fetch the whole
freaking 32 GB.
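To put numbers on it, a back-of-the-envelope Python calculation; the 10 Mbit/s download speed is an arbitrary assumption, not a measurement of Tor:

```python
def transfer_hours(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to download size_gb gigabytes at bandwidth_mbit_s Mbit/s."""
    bits = size_gb * 8 * 1000**3  # decimal GB, as on USB stick labels
    return bits / (bandwidth_mbit_s * 1000**2) / 3600


if __name__ == "__main__":
    for size in (32, 1):
        print(f"{size} GB: {transfer_hours(size, 10):.1f} h")
    # 32 GB: 7.1 h
    # 1 GB: 0.2 h
```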

If we want to solve S1 at the file system level, we'll already need a
tool that copies EVERYTHING in the file system. Let's hope that the same
tool can be used to fetch EVERYTHING online. I guess rsync would. I
don't know whether borg would, for example.

--
sajolida
Tails — https://tails.boum.org/
UX · Fundraising · Technical Writing