Re: [mat-dev] Regarding archives support

Delete this message

Reply to this message
Author: intrigeri
Date:  
To: The metadata anonymisation toolkit mailing list
Subject: Re: [mat-dev] Regarding archives support
Hi,

jvoisin wrote (15 Jan 2014 02:46:43 GMT) :
> I just pushed some related things:
> - zip archives are now supported
> - nested archives are supported, but not from the GUI
> - mtime/atime are now handled for files within archives, and also for
> office documents
> - office document handling has been heavily refactored and simplified.
> As a side effect, the archive's handling code is now a little bit more
> complex.


> As usual, feedback is more than welcome, since I'm planning to do a
> release this weekend.


I'm very glad these features were implemented!

As usual, avoiding to craft many unrelated changes into a commit, and
splitting commits a bit more, even for related changes, would help
reviewing and giving feedback. And please, commit messages that tell
others a bit more (e.g. what was broken before, why the change fixes
it and why it's done this way). Thanks in advance!

Anyway, I gave up reviewing bbe17fd and a few other commits, because
it just seemed to do too much to fit into my mental space, and it
provided far too little explanation for me to do a useful review
without spending two hours putting the entire context back into my
mind. Sorry for the grumpiness: I really wanted to be helpful, and
could not, which is a bit frustrating.

I'm wondering what should be done with the binary files in the test
suite data. I'm not saying we should absolutely generate all of it
from source, and keep these files out of Git (ideally, most of it
should, but well). But perhaps documenting why and how they are
changed in a given commit would be good?

While I'm at it: s/double-clic/double-click/

About archives support: I've run mat (on the command-line) on
a tarball with Perl scripts and a PDF in it, and MAT tells me it was
successfully cleaned, while only the PDF is in the resulting "cleaned"
archive. So I guess there are some missing bits here.

Then I've retried in the GUI with the original archive. There is quite
some room for improvement in the UX:

1. The "those files [...]" message is part of the scrollable area, so
in practice, I was shown a few hundred files, and *then* only,
after scrolling to the end, I get to learn what this list of files
is about. I suggest moving this message a. to the beginning, before
the list; and b. outside of the scrollable area.
2. It's not clear what happens if I select (or if I don't select)
a given file. The "Include" column header is not enough.
3. The message says "MAT will not clean your archive". Really?
Won't it rather clean whatever it can, and additionally include the
unsupported files the user selected?

I suggest the following UI:

MAT is not able to clean the following files, found in the
FILENAME archive.

Select the files you want to include in the cleaned up
archive anyway.

Also, the OK button text should be something more specific, such as
"Scour" (since it's apparently the new terminology; too bad many
non-native speakers likely don't know what this means).

Then, I've selected a file or three in the list, and clicked OK.
The archive is not cleaned, the GUI does not tell me anything about
it, and I see this on the console:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gi/overrides/GLib.py", line 633, in <lambda>
    return (lambda data: callback(*data), user_data)
  File "./mat-gui", line 413, in __mat_clean
    if self.liststore[line][0].file.remove_all(whitelist=list_to_add):
  File "/home/intrigeri/cyber/mat/git/MAT/archive.py", line 197, in remove_all
    tarin.extract(item, self.tempdir)
  File "/usr/lib/python2.7/tarfile.py", line 2084, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib/python2.7/tarfile.py", line 2160, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python2.7/tarfile.py", line 2200, in makefile
    with bltn_open(targetpath, "wb") as target:
IOError: [Errno 13] Permission denied: '/tmp/tmp_U07jm/path/to/blib/lib/Blog.pm'


Indeed, in the archive, that file has permissions 0444. Another corner
case to handle :) I'll let you file a ticket about it.

Cheers,
--
intrigeri
| GnuPG key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
| OTR fingerprint @ https://gaffer.ptitcanardnoir.org/intrigeri/otr.asc