Re: [Tails-dev] Feature #5301 - Clone or Backup Persistent V…

Delete this message

Reply to this message
Author: anonym
Date:  
To: The Tails public development discussion list
Subject: Re: [Tails-dev] Feature #5301 - Clone or Backup Persistent Volume
Andrew Gallagher:
> On 13/12/15 17:46, sajolida wrote:
>>
>> If you're interested in plugging this in the current code [1] that would
>> be awesome. But as we haven't really finished the design decision around
>> the backup scenarios, I can't promise anything regarding it's official
>> acceptance yet.
>
> I have a reasonably stable version of the code available here:
>
> https://github.com/andrewgdotcom/tails-clone-persistent/releases/tag/v0.1.1-beta1


Cool! Thanks for your contribution!

I haven't tested it, but I did have a look at src/tcp-helper.c since the
necessity of memory unsafe code surprised me:

* It has the ".c" extension, but really is C++, so by convention it
should be ".cpp", no?

* BTW, regarding the name: for many of us "tcp" has other associations.
It completely threw me off for a few seconds ("What?! Does this do
TCP/IP networking?"), so I'd rather see something else. I'm a proponent
for less surprises and a super boring experience when dealing with code. :)

* The usage of `fgets()` makes me worried. Of course you use the magical
constant 1000 consistently, so we're ok, but still, using a constant
(`#define LINE_READ_BUFFER_SIZE 1000` or something) is good practice and
ensures consistency if we change the buffer size later. I'd say the same
for other magic constants are bad used in more than once place (for
those used in one place only, an obligatory comment explaining it is ok).

* Manual memory management is hard:

- `tails_free_start()` has a memory leak since you do not `free()` the
`line` variable before returning. Since it's fixed length and not
required to live after this function returns you should put it on the
stack like you did in e.g. `mount_device()`.

- Each use of your string-returning functions introduces a memory leak
since the returned string is not `free()`:d in the calling functions.

- You take the following precaution to use the notoriously dangerous
`strncpy()` safely (well, to ensure that the resulting buffer is a
well-formed C string -- you do nothing to ensure that the last byte that
is overwritten isn't anything important, however):

      strncpy(buffer, line+offset, len);
      // make double sure it's properly null terminated
      buffer[len]='\0';


    However,


      char ... buffer[100];
      ...
      offset=temp.find("End ")
      len=temp.find("Size ")-offset


    and AFAICT nothing guarantees that `len` < 100 (= the size of
`buffer`) so you may write a null outside of the buffer, causing
undefined behaviour. In practice, perhaps this shouldn't happen because
of the format of what is being parsed (but what about corruption?), but
I do not feel convinced without a comment arguing for that, and note
safe without a conditional check or assertion.


* This conditional worries me a lot:

      if((offset=temp.find("End ")) &&
         (len=temp.find("Size ")-offset)) { ...


So in C/C++, everything non-zero is treated as "true", so if we fail
the first `find()` (=> return value std::string::npos = -1, but I guess
since it's a size_t and hence unsigned it will wrap around and become
MAX_INT_32 - 1) the first operand to `&&` is true, which cannot be what
you intend. Essentially, it means that we only treat the `find()` as
failing if we happened to find "End " at the exact beginning of the
buffer. Clearly this is not what you intent.

And now it gets really interesting. The second operand is false if we
manage to calculate the length 0, and given that we cannot find "End"
and "Size" at the same position, it can happens if and only if we find
neither "End" nor "Size", since -1 - -1 = 0. Incidentally, this is
actually what makes sense for this whole conditional, but we definitely
didn't reach there by intention or sound reasoning, right? :)

So this conditional is scaring me because it works by coincidence --
you actually want to explicitly make sure that both `find()`:s found
something, i.e. neither returned `std::string::npos`. In addition, you
also want to check that `len` makes sense by checking that the first
found position is smaller than the second. This is probably what you want:

      size_t end_pos = temp.find("End ");
      size_t size_pos = temp.find("Size ");
      if(end_pos  != std::string::npos &&
         size_pos != std::string::npos &&
         end_pos  < size_pos) {
          len = end_pos - size_pos;
          offset = end_pos;
          ...


Regarding style, assignments in conditionals is often a good way to
ask for subtle bugs and annoy code reviewers, except for a few common,
well-understood patterns, imho.

* The usage of `system()` and `popen()` is dangerous since they invoke
the shell in all its command injection-prone glory. For instance, let's
look at this part of `do_copy()`:

      system((_STR + "/usr/bin/rsync -a --delete
--exclude=gnupg/random_seed --exclude=lost+found " + source_location +
"/ " + mount_point).c_str());


Here `source_location` will be the first parameter sent to the
executable, so if I set it to "; rm -rf /;" the command `rm -rf /` will
be executed in the shell as root. Woo! Well, actually I see that you do
enough input sanitation to prevent shell exploitation but imho that is
the wrong approach. For instance, you disallow perfectly fine path
characters like, for example, ö (and everything else non-ascii), space,
asterisk and backspace (ok, I'm not _so_ sad about the last two). :)
Best practice is to drop the input sanitation and pick alternatives to
`system()`/`popen()` that avoids invoking the shell and calls `exec()`
or similar. I'm too ignorant of C/C++ to recommend you anything.


>From what I gather your program is essentially a wrapper around a few

executables (e.g. cryptsetup, mount, rsync) except that it does some
minor file parsing (`tails_free_start()` reads the block device). While
using complex, memory unsafe languages like C/C++ sometimes is
warranted, I fail to see it in this instance. Above it may seem that I
am excessive in my criticism, but I really want to make it clear how
hard it is to write correct C/C++ code. I would never consider doing it
myself since I don't have 10+ years of day-to-day experience doing it
(wow, isn't that a catch-22! :)). I really do not want to introduce more
C/C++ code into production unless there are really good reasons.

So, I personally would not include this program. Hence I recommend you
to rewriting it in the same scripting language you end up using for the
GUI, so Python 3, it seems. This will ease the integration as well as be
safer (memory wise), a lot shorter and more readable, supporting unicode
path characters (FWIW), architecture independent (and no Makefile
complexity) and *much* easier for us to maintain. Also, you can find
libraries (udisks) for dealing with disk partitioning that can be of
better service to you than `parted`, so you do not need hacks like
`tails_free_start()`. Similarly, there are nice cryptsetup/rsync/fs
formatting/mount handling/etc interfaces through libraries, so you can
probably drop most of your ad-hoc command output parsing too.

Cheers!