Re: [T(A)ILS-dev] Another GSoC proposal for the meta-data an…

Author: julien.voisin
Date:  
To: intrigeri
CC: tor-assistants, The T(A)ILS public development discussion list
Subject: Re: [T(A)ILS-dev] Another GSoC proposal for the meta-data anonymizing toolkit
2011/4/7 intrigeri <intrigeri@???>

> Hi Julien, hi fellow Tails developers!
>
> Thanks for your quick reply. I think the next thing to do is to update
> your proposal on the Google Melange website according to the results
> of this discussion... and reply to the next bunch of questions I am
> asking below.
>
> > I haven't thought about secure removal,
> > because it's heavily hardware/filesystem dependent.
> > But I think a binding to srm (Secure Remove) or shred (GNU Core
> > Utilities) would be fine.
>
> I'll let you decide whether you want to make this feature an
> {important, optional} part of your proposal. Anyhow, I'd be glad to
> see it mentioned at least as a desirable future improvement, just to
> make sure it is taken into account when designing the lib and apps.
>
> > But, I don't plan to implement the "Proprietary file needing
> > conversion", since I don't think that it's the job of a
> > "metadata-cleaner".
>
> Fair enough.
>
> >>Tails being based on Debian Squeeze, would you mind making
> >>"compatibility with Debian Squeeze + official squeeze-backports" a
> >>formal goal of your GSoC project?
> >>
> >Since the app will be coded in pure python,
> >it will run on any platform with Python.
> >But yes, I can do more intensive testing to ensure the compatibility.
>
> I see your proposal now explicitly states the "run on Debian Squeeze"
> requirement, which is great. Beware: using "pure Python" won't give
> this for free. Let me explain my point a bit.
>
> You've mentioned you intend to use some libraries, such as Hachoir.
>
> As you do know, both Python and libraries tend to add features in
> every new release. Some newest releases may not be available in the
> Debian Squeeze environment I described => in order to make your
> results usable in Tails, you'll need to make sure you don't use any
> feature that appeared too late to make it into Squeeze.
>
> Also, not every Python library is packaged into Debian. For
> maintainability reasons, I would not like to make Tails dependent on a
> Python library that is not part of Debian => in order to make your
> results usable in Tails, you'll need to make sure you only use
> libraries that are part of Debian Squeeze.
>
> End of explanation. No answer expected.
>
> > I can learn git, it's not a big deal.
>
> Great. Maybe this could happen during the Community Bonding Period so
> that you don't spend too much time getting up to speed with the tools
> during the actual coding time.
>


That's what I was planning to do.


> >>Can we by chance see this code?
>
> >>Wouldn't it be better to add support to Hachoir itself (possibly
> >>using such an external library) for file formats you want to support
> >>but not supported yet?
>
> > It's not exclusive. If someone has already done a lot of work in
> > a library other than Hachoir, this design would make it easy to
> > add it to the tool.
>
> Ok.
>
> >>I would prefer the whole meta-data fieldset to be replaced by a set of
> >>data that would be common to every user of the meta-data anonymizing
> >>toolkit.
> >>
> >>What do you think?
>
> > It makes more sense than my idea: there are too many different fields,
> > and too many interactions between them, to write a nice "fields-analyser".
>
> Ok.
>
> > But I think that the good option would be to let the user choose between
> > "I'm mister nobody"/"I don't want any meta" and a custom scheme.
>
> > I don't really like the principle of "all or none".
>
> > But, after all, the batch-mode "make my meta common" is a priority,
> > and the "make me custom data" is secondary.
>
> Indeed.
>
> IMHO this shall appear clearly in your proposal, and not only in the
> title (which is "Meta-data anonymizing toolkit" rather than "Meta-data
> customization toolkit" for reasons).
>
> >>If you don't mind, using PyGTK would avoid adding a dependency on
> >>pyQT in Tails, that already depends on the former but does not ship
> >>the latter. What do you think?
>
> > I haven't looked much into GUI stuff, so I don't have any
> > preference.
>
> I see you switched your proposal to PyGTK. Thanks.
>
> >>How do you plan to make it easy for us to test your code (say, every
> >>week or two) in Real World conditions, i.e. in Tails? If you have any
> >>experience in Debian packaging, this would be the way to go. Else,
> >>please let us know and we'll deal with this part of the work.
>
> It seems to me you did not answer this question of mine.



I am interested in doing the packaging, but I have heard that it can be
quite tricky, so a little help would be much appreciated.

> >>> Timeline:
> >>
> >>>     * first three weeks :
> >>
> [...]
> >>Ok. Only dark spot: "begin of the cmdline tool" seems quite vague to
> >>me... especially since no other period is scheduled later to finish
> >>this task.
>
> > Since I'm planning to develop the tool feature by feature, I can't
> > guarantee that the cmdline tool will be finished at the end of the
> > first three weeks.
>
> Right. So you need to schedule some time later to finish it, don't
> you?



Yes, I do.


> > "Beginning" is not the right word, "coding the essential features"
> > fits better.
>
> Well, I feel the need to insist a bit. I understand you'll begin your
> implementation with the essential features, which seems fine to me.
> What are the command-line tool features you consider to be essential,
> what are the non-essential ones, then?
>


Essential features:
- reading meta
- deleting meta
- writing meta
- listing all the meta

Non-essential features:
- interactive editing of the meta
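
A minimal sketch of how the essential features could map onto a command
line, assuming Python's argparse module. The program name "metatool", the
option names and the parse_assignments helper are all hypothetical; they
only illustrate the split and are not the final interface.

```python
import argparse


def build_parser():
    """Build a hypothetical parser: one action per run, one or more files."""
    parser = argparse.ArgumentParser(prog="metatool")  # hypothetical name
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument("--list", action="store_true",
                        help="list all the meta of each file")
    action.add_argument("--get", metavar="FIELD",
                        help="read a single meta field")
    action.add_argument("--delete", action="store_true",
                        help="delete all the meta of each file")
    action.add_argument("--set", metavar="KEY=VALUE", action="append",
                        help="write a meta field (may be repeated)")
    parser.add_argument("files", nargs="+", help="files to process")
    return parser


def parse_assignments(pairs):
    """Turn ['author=nobody', ...] into {'author': 'nobody', ...}."""
    fields = {}
    for pair in pairs or []:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected KEY=VALUE, got %r" % pair)
        fields[key] = value
    return fields
```

For example, build_parser().parse_args(["--set", "author=nobody", "a.pdf"])
yields the files to process plus a list that parse_assignments turns into
a dictionary of fields to write. Interactive editing is deliberately left
out, since it is a non-essential feature.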


> > I think I'll focus on :
> > - pdf
> > - mpeg audio
> > - ogg
> > - bmp/gif/jpeg/png
> > - exe
> > - archives (bzip2, zip, tar)
>
> Fine with me.
>
> >>>     * 1 week : emphasis on the unit tests
> >>>           o For such a critical tool (the smallest crack could
> >>>             compromise the user), the testing should be bulletproof!
> >>>             So I’m planning to focus on it for one week.
> >>
> >>I'm not sure about this one, although I do like your emphasis on
> >>robustness and unit testing.
> >>
> >>On the one hand the first three weeks schedule seems to indicate you
> >>intend to implement the tests roughly at the same time as the tested
> >>code ("implement the first tests (for EXIF)"), which I like very much.
> >>On the other hand you schedule one full week dedicated to unit testing
> >>at two-third of the coding period. It seems to me you do not need to
> >>spend one week implementing tests at this point, if they have been
> >>properly written {before, while, soon after} implementing the tested
> >>code. What do you think?
>
> It seems to me you did not answer this question of mine.
>


I’d like to use the unittest lib to implement end-to-end testing.

I’m planning to have a collection of files with dangerous meta for each
type of file, so I can run the tool on them to delete/anonymize the meta,
and then re-run the tool on them to show whether any compromising meta
remains. The final goal is to be able to test the whole tool in only one
or two commands, no matter the OS/environment. Another job of this test
suite is to serve as a regression test suite.
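
A minimal sketch of such an end-to-end test, assuming the unittest module
from the standard library. The list_meta and remove_meta functions are
placeholders working on a toy text format (lines starting with "meta:"
stand in for real metadata); the real tool would provide its own
equivalents for each supported file type.

```python
import os
import shutil
import tempfile
import unittest


# Placeholder back end (assumption): the real tool would replace these
# two functions with per-format metadata handling.
def list_meta(path):
    with open(path) as handle:
        return [line.strip() for line in handle if line.startswith("meta:")]


def remove_meta(path):
    with open(path) as handle:
        kept = [line for line in handle if not line.startswith("meta:")]
    with open(path, "w") as handle:
        handle.writelines(kept)


class EndToEndTest(unittest.TestCase):
    def setUp(self):
        # Work on a throwaway "dirty" sample so reference files stay intact.
        self.workdir = tempfile.mkdtemp()
        self.sample = os.path.join(self.workdir, "dirty.txt")
        with open(self.sample, "w") as handle:
            handle.write("meta:author=alice\nactual content\n")

    def tearDown(self):
        shutil.rmtree(self.workdir)

    def test_dirty_sample_is_detected(self):
        # Guards against a listing function that never finds anything.
        self.assertTrue(list_meta(self.sample))

    def test_nothing_remains_after_cleaning(self):
        remove_meta(self.sample)
        self.assertEqual(list_meta(self.sample), [])

    def test_content_survives_cleaning(self):
        remove_meta(self.sample)
        with open(self.sample) as handle:
            self.assertEqual(handle.read(), "actual content\n")
```

Saved as a test module, this runs with a single "python -m unittest"
command, which matches the "one or two commands" goal. Adding a new dirty
sample for each fixed bug would make the same suite double as the
regression suite.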


>
> >>
> >>>     * Remaining weeks: cleanup, bugfixing, integration work, final
> >>>       documentation

> >>
> >>Three weeks seems quite long for this, but this might be because I
> >>failed to see the exact scope meant behind your words. Mainly,
> >>"integration work" and "final documentation" may require very little
> >>or very much time, depending on what you mean:
> >>
> >>  - end-user and/or design documentation? (I'd rather design
> >>    documentation to be written before/while/soon-after every step.)
> >>  - integration work == ? Packaging for Debian (very useful for Tails)
> >>    and other distributions? Packaging for foreign operating systems
> >>    (read: Windows, OSX; beware that this does not take much more time
> >>    than expected, to the detriment of other planned tasks)? Did you
> >>    mean anything else than that?

>
> > I'd like to keep the three remaining weeks in case of problems,
> > or missing features, so I am sure I'll be able to deliver a
>


"complete product" was the missing end :)


>
> Seems like you did not finish your sentence. I presume that's why
> I can find no answer to my question about integration work.
> I can find your answer about documentation a few lines below, though.
>
> >>> Every Week-end : documentation time!
> >>
> >>End-user documentation and/or design documentation?
>
> > Both.
>
> > - design:
> > I'd like to review my code frequently to document it,
> > and to correct typos and other dumb stuff, in order to produce
> > clean and readable code.
>
> > - end-user:
> > It's easier to document the code after being sure it works (so, not
> > while developing/testing it), but not too late either.
>
> Great. Considering you'll review and cleanup your code + write
> documentation in such an incremental way, it's likely you won't need
> that much time for those tasks at the end of the coding period.
>
> So this is still the part of your schedule that is the most unclear to
> me. I appreciate your wish to plan time "in case of problems, or
> missing features", but then please state it clearly in your proposal,
> rather than listing tasks that will be done already at this time,
> according to the development process you are describing. If needed,
> please clarify what part of these tasks you plan to do incrementally,
> what part you plan to do during these 3 weeks.
>


I have updated my application.


>
>
> >>You need to know I spend most of my time offline, especially in the
> >>summer. Therefore I mostly communicate over asynchronous media such
> >>as email, which I generally read and reply every day => round-trip
> >>time generally less than 24h. Knowing this, how do you see things?
> >>What can be a reasonable way to make your "using irc quite a lot"
> >>and my "mostly using email" fit together joyfully? E.g. we could
> >>additionally formally schedule IRC or XMPP meetings on a regular
> >>basis (say 45min every week) so that we can discuss things more
> >>smoothly.
>
>

Saying that I use IRC quite a lot doesn't mean that I don't like other
communication media: emails are nice too :)
Furthermore, I really like your idea of a "weekly meeting".


>
> >>As a final question, what are your plans for the "Community Bonding
> >>Period" [8] (April 25 - May 23)? Google describes this as "Students
> >>get to know mentors, read documentation, get up to speed to begin
> >>working on their projects."
> >>
> >> [8] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
>
> > I'd like to begin reading some documentation about GUI in Python,
> > and to do some preliminary work:
> > - thinking about problems I may encounter
> > - doing prototypes/testing of my implementation ideas
> > - exchanging some ideas with the community/my mentor
> > - getting in touch with Hachoir lib
>
> Ok. This is a great amount of work you plan to do during this time.
> Maybe you would benefit from prioritizing these items. E.g. I see you
> mention Python GUI programming as something you intend to spend some
> time on during the Community Bonding Period. Reading other parts of
> this discussion, it seems to me this is entirely novel to you, so
> maybe you should consider making it your main technical topic for this
> period, vs. other kind of tasks you already feel quite comfortable
> with. What do you think?
>
>


The tasks I’m planning to do in the bonding period are not ordered by
importance.

Some are more difficult/important than others. I think I’m going to mainly
focus on the GUI, and on Hachoir.


> Other questions:
>
> You mentioned "batch mode to handle a whole directory (or set of
> directories)" in your list of deliverables, but I did not see it
> anywhere in your schedule. Do you still plan to implement this? If you
> do, when?
>
>

I’m planning to implement this during the first month (I have changed the
timeline a little bit).
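
A minimal sketch of what the batch mode could look like, assuming os.walk
from the standard library for the directory traversal. The iter_files and
process_batch names, and the clean callback, are hypothetical; the callback
stands in for whatever function ends up anonymizing a single file.

```python
import os


def iter_files(paths):
    """Yield every regular file found in the given files and directories."""
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, names in os.walk(path):
                for name in sorted(names):
                    yield os.path.join(root, name)
        elif os.path.isfile(path):
            yield path


def process_batch(paths, clean):
    """Apply clean(path) to each file and return the paths that failed."""
    failures = []
    for path in iter_files(paths):
        try:
            clean(path)
        except Exception:
            # Never skip a failure silently: an uncleaned file may still
            # carry compromising meta, so the caller must hear about it.
            failures.append(path)
    return failures
```

Returning the list of failures instead of stopping at the first error lets
a whole directory (or set of directories) be processed in one run, while
still telling the user exactly which files were not cleaned.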


> When you state in your proposal "I prefer to focus on the
> algorithmes/efficacity of the tools and the CLI, instead of the GUI.";
> are you expressing your personal preference in general, or your
> feeling about what is more or less important for this specific
> project?
>
>

I am expressing my personal preference. That said, I now think that I must
provide a solid CLI tool before starting a graphical one.


> In any case: I do agree the backend library's quality is very
> important for any one of the other tools to work properly and be easy
> to maintain. On the other hand, I think the GUI is at least as
> important as the CLI. I think people used to CLI and manpage reading
> are probably *already* able to anonymize meta-data in files, by using
> exiv2, pdftk and friends; a consistent CLI interface for doing this
> would clearly be most welcome, but please consider the case of other
> (!CLI-friends) people who currently have no way to anonymize meta-data
> in files.
>
> > I think I could deliver a finished product after the GSoC.
> > And I'd like to stay around and contribute more after that!
>
> Glad to hear this :)
>
> Bye,
> --
> intrigeri <intrigeri@???>
> | GnuPG key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
> | OTR fingerprint @ https://gaffer.ptitcanardnoir.org/intrigeri/otr.asc
> | Did you exchange a walk on part in the war
> | for a lead role in the cage?
>