Re: [mat-dev] Regarding archives support

Delete this message

Reply to this message
Autor: jvoisin
Data:  
Para: The metadata anonymisation toolkit mailing list
Assunto: Re: [mat-dev] Regarding archives support


On 01/17/2014 06:53 PM, intrigeri wrote:
> Hi,
>
> jvoisin wrote (15 Jan 2014 02:46:43 GMT) :
>> I just pushed some related things:
>> - zip archives are now supported
>> - nested archives are supported, but not from the GUI
>> - mtime/atime are now handled for files within archives, and also for
>> office documents
>> - office document handling has been heavily refactored and simplified.
>> As a side effect, the archive's handling code is now a little bit more
>> complex.
>
>> As usual, feedback is more than welcome, since I'm planning to do a
>> release this weekend.
>
> I'm very glad these features were implemented!
>
> As usual, avoiding to craft many unrelated changes into a commit, and
> splitting commits a bit more, even for related changes, would help
> reviewing and giving feedback. And please, commit messages that tell
> others a bit more (e.g. what was broken before, why the change fixes
> it and why it's done this way). Thanks in advance!

Noted.
>
> Anyway, I gave up reviewing bbe17fd and a few other commits, because
> it just seemed to do too much to fit into my mental space, and it
> provided far too little explanation for me to do a useful review
> without spending two hours putting the entire context back into my
> mind. Sorry for the grumpiness: I really wanted to be helpful, and
> could not, which is a bit frustrating.

Sorry about this :/
>
> I'm wondering what should be done with the binary files in the test
> suite data. I'm not saying we should absolutely generate all of it
> from source, and keep these files out of Git (ideally, most of it
> should, but well). But perhaps documenting why and how they are
> changed in a given commit would be good?

Noted.
>
> While I'm at it: s/double-clic/double-click/

Fixed
>
> About archives support: I've run mat (on the command-line) on
> a tarball with Perl scripts and a PDF in it, and MAT tells me it was
> successfully cleaned, while only the PDF is in the resulting "cleaned"
> archive. So I guess there are some missing bits here.

This is by design. I just documented it in 9987e61 because it wasn't
obvious at all.
>
> Then I've retried in the GUI with the original archive. There is quite
> some room for improvement in the UX:
>
> 1. The "those files [...]" message is part of the scrollable area, so
>    in practice, I was shown a few hundred files, and *then* only,
>    after scrolling to the end, I get to learn what this list of files
>    is about. I suggest moving this message a. to the beginning, before
>    the list; and b. outside of the scrollable area.

Agreed, fixed.
> 2. It's not clear what happens if I select (or if I don't select)
>    a given file. The "Include" column header is not enough.

Agreed to, I clarified this.
> 3. The message says "MAT will not clean your archive". Really?
>    Won't it rather clean whatever it can, and additionally include the
>    unsupported files the user selected?

This was a remain. Fixed

>
> I suggest the following UI:
>
> MAT is not able to clean the following files, found in the
> FILENAME archive.
>
> Select the files you want to include in the cleaned up
> archive anyway.
>
> Also, the OK button text should be something more specific, such as
> "Scour" (since it's apparently the new terminology; too bad many
> non-native speakers likely don't know what this means).

The reason of the apparition of this new terminology (Apart from
enriching my English vocabulary) is actually to help translators.
The MAT used to use the word "Clean", both as a verb/action, and as a
state. Now, the action is named "Scour", and the state "Clean"
>
> Then, I've selected a file or three in the list, and clicked OK.
> The archive is not cleaned, the GUI does not tell me anything about
> it, and I see this on the console:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/gi/overrides/GLib.py", line 633, in <lambda>
>     return (lambda data: callback(*data), user_data)
>   File "./mat-gui", line 413, in __mat_clean
>     if self.liststore[line][0].file.remove_all(whitelist=list_to_add):
>   File "/home/intrigeri/cyber/mat/git/MAT/archive.py", line 197, in remove_all
>     tarin.extract(item, self.tempdir)
>   File "/usr/lib/python2.7/tarfile.py", line 2084, in extract
>     self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
>   File "/usr/lib/python2.7/tarfile.py", line 2160, in _extract_member
>     self.makefile(tarinfo, targetpath)
>   File "/usr/lib/python2.7/tarfile.py", line 2200, in makefile
>     with bltn_open(targetpath, "wb") as target:
> IOError: [Errno 13] Permission denied: '/tmp/tmp_U07jm/path/to/blib/lib/Blog.pm'

>
> Indeed, in the archive, that file has permissions 0444. Another corner
> case to handle :) I'll let you file a ticket about it.

Arg. This is not cool.
I'll add a bug.

Thank you very much for the feedback.
>
> Cheers,
> --
> intrigeri
> | GnuPG key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
> | OTR fingerprint @ https://gaffer.ptitcanardnoir.org/intrigeri/otr.asc
> _______________________________________________
> mat-dev mailing list
> mat-dev@???
> https://mailman.boum.org/listinfo/mat-dev
>