Re: [T(A)ILS-dev] Metadata Anonymizing Toolkit for file Publ…

Author: intrigeri
Date:  
To: Antonio Davoli
CC: Damian Johnson, Tor Assistants, The T(A)ILS public development discussion list, Peter Eckersley, tech
Subject: Re: [T(A)ILS-dev] Metadata Anonymizing Toolkit for file Publication - GSoC'11 Proposal
Hi Antonio and all involved people,

(
Upon Damian's request, I'm Cc'ing the rest of the Tor/EFF team
involved in GSoC 2011.

Tor/EFF people: the tails-dev list I am writing to has a public
archive, so be careful when replying.
Antonio, Tails developers: please keep the Cc: list intact.
)

Antonio Davoli wrote (31 Mar 2011 08:34:48 GMT) :
> This is the proposal I wrote for the Google Summer of Code 2011. You
> can find all the details inside the document.


Thank you for your proposal!

> I will look forward to your comments.


Here is the first batch.

> another target to achieve is the compatibily with different
> operating systems as Linux, Windows and Mac OS X.


This would be most welcome but I'd be worried to see this goal make
the implementation too painful; have you checked if the tools and
libraries you plan to use do support Windows and Mac OS X as well as
GNU/Linux? Is "making install easy on supported platforms" part of
your proposal?

About the choice of C++ for the MAT library: I don't know what the
other possible mentors' situation is, but the last time I wrote C++
seriously was ten years ago. I don't think this prevents me from
successfully mentoring your project, but you have to be aware of this
when communicating with us; choosing C++ also implies we'll depend on
you for MAT's future evolution and maintenance much more than if
you picked a language we are more fluent in, e.g. Python, which seems
to have libraries supporting all the planned file types ready to be
used in Debian.

Since you are planning to implement the GUI in Python, why not
implement the MAT library in Python too?

A few questions about supported file formats:

- I suggest you try supporting ZIP archives as well as Tar ones. What
do you think? Do you plan to anonymize metadata in archive headers
only, or files contained in such archives as well?
- What about video files? I expect you to at least make it clear in
your proposal why you put these aside, if you really plan to do so.
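
To make the "archive headers only" option concrete, here is a minimal
sketch, assuming Python's standard tarfile module. The function name
scrub_tar_headers is made up for illustration and is not part of any
existing MAT code. It resets owner and timestamp fields in each member
header and deliberately leaves the contained files untouched, so it
covers only the first half of the question above.

```python
import io
import tarfile


def scrub_tar_headers(src_bytes: bytes) -> bytes:
    """Return a copy of a tar archive with identifying header fields reset.

    Hypothetical helper: only member headers are changed, file contents
    are copied as-is.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r") as src, \
            tarfile.open(fileobj=out, mode="w",
                         format=tarfile.USTAR_FORMAT) as dst:
        for member in src:
            # Drop numeric and symbolic owner information.
            member.uid = member.gid = 0
            member.uname = member.gname = ""
            # Drop the modification time.
            member.mtime = 0
            # Only regular files carry a data stream to copy.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
    return out.getvalue()
```

File names, permissions and any metadata embedded in the archived
files themselves would still need separate handling.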

> For what concerns the command-line utilities, a tool for each type
> of files will be created.


What is the rationale for exposing such details to the user, instead
of a common command-line frontend?
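
To show what I mean by a common frontend, here is a rough sketch,
assuming Python and its standard argparse module. The program name
"mat", the HANDLERS table and the clean_* functions are all
hypothetical placeholders; the real cleaners would do the actual
metadata removal.

```python
import argparse
import os


# Hypothetical per-format cleaners: they only report what they would do.
def clean_png(path):
    return "png cleaner would process %s" % path


def clean_pdf(path):
    return "pdf cleaner would process %s" % path


# Single table mapping file extensions to the matching cleaner.
HANDLERS = {".png": clean_png, ".pdf": clean_pdf}


def dispatch(path):
    """Pick the cleaner from the file extension and run it."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(ext)
    if handler is None:
        raise ValueError("unsupported file type: %s" % path)
    return handler(path)


def main(argv=None):
    parser = argparse.ArgumentParser(
        prog="mat", description="Single entry point for all file types.")
    parser.add_argument("files", nargs="+", help="files to clean")
    args = parser.parse_args(argv)
    for path in args.files:
        print(dispatch(path))
    return 0
```

With this layout the user runs one command, e.g. main(["photo.png",
"report.pdf"]), and supporting a new format means adding one entry to
the table rather than shipping one more tool.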

> In order to implement a secure deletion of files after cleaning
> operations MAT will use the library srm. mostly because it is
> supported by several platforms.


srm is a fairly common name. Could you please point us at the
implementation you are talking about?

Other than that, I've got a bunch of questions. Please read on.

It's not clear if you plan to rely on manual testing or rather on
writing a test-suite {before,while,after} implementing the code. What
are your plans for unit-testing? GUI testing? Test-suite framework?

I'm not sure about your plan to add i18n support as a second step. Why
not write the library and command-line utilities with i18n support
built in right from the start?

Same for the Windows and OSX support: in my humble opinion, if you
really want this requirement to be part of your project, you need to
carefully choose the libs you use with this in mind right from the
start. Starting to port the code to these platforms after 1.5 months
spent writing it is, IMHO, very optimistic.

About "merging with Tails", I think this item has two sides:

  1. In order to be properly installable in Tails, your set of tools
     will need to be packaged for Debian, i.e. we should at least be
     able to prepare custom .deb packages from your code. You do not
     necessarily need to plan preparing the Debian packages yourself,
     especially if you don't know Debian packaging yet (do you?). Nor
     do you need to care that much about having the software uploaded
     to Debian: filing an RFP bug should be enough from your side.
     Well, I'd be delighted if you would integrate these tasks as part
     of your summer schedule if you feel this is realistic, but don't
     worry too much if you feel you can't.


  2. In order to make sure your code can be used in Tails, making sure
     you pick tools and libraries that are available in Debian Squeeze
     (+ squeeze-backports) is a must and should not be a late
     requirement. Also, when do you plan to start testing your code
     inside of Tails? I suggest doing this on a regular basis, e.g.
     once a week or once every two weeks, to make sure you don't go a
     wrong way for too long.


Basically, without invoking heavy words such as "continuous
integration" or "agile development", I'd like you to make it easy for
us at every single step to test your work against our practical
requirements, and even better to try it yourself in a realistic
production setup. How do you see this process?

> Merging with the library and command-line tools.


What do you mean?

About documentation, I read "Starting of a documentation’s draft."
planned for the end of July. Are you talking about end-user documentation
or design documentation here? While I'm comfortable with seeing the
end-user doc written this late, I'd prefer design documentation to be
written at every single step of your work, so that we (mentors,
reviewers) don't need to guess from your code what you are trying to
do.

What VCS do you plan to use? (The Tails team makes heavy use of Git's
offline work and branching features.)

About your code samples:

  - I'll have a look at CAPWAP once I get online.
  - ext3 undelete: do you plan anything about it besides the code dump
    you showed us? Seems like the code in the tarballs has not been
    changed since 2009, and I could find no installation / usage
    instructions.
  - Are you still using and maintaining openVPNServer?


> Because of my studies I can not dedicate all the time to this
> project, but I can assure a commitment of at least four hours a day
> that I’m going to spend entirely on this project.


Do your studies happen to go on until the end of the GSoC coding time?

> I think to accomplish the project within the established schedule,
> but I would realy like to collaborate with the Tor community and
> even within the Tails project that I believe is really useful and
> interesting. Naturally I would like to follow the evolution of my
> project.


Glad to hear this; on the other hand the question was about the
*chances* you will stick around, rather than if you "would like" to.
See what I mean? (Every developer "would like" to maintain the
software s/he writes, but as a matter of fact not all do.)

> Usually I prefer to inform my supervisor (as I do with my thesis
> advisor) weekly. I can do it through a weekly summary about the
> progress or through a IRC session. I think it is quite good to plan
> a series of meeting weekly in order to check together the status of
> the development.


Ok, seems great to me. Weekly reports over email seem perfect to keep
the rest of the Tails community up to date with your progress. Planned
IRC meetings between the mentor(s) and you seem fine to me as well, but
are no replacement for the reports.

> my nickname on IRC is double.


Adding what IRC network you happen to connect to would be helpful.
Please note I mostly work offline and we'll mainly communicate over
asynchronous media. Not that many Tails developers actually hang out
in the #tails IRC channel on freenode. Are you comfortable with this?

> I could also give a hand to Tails Project for the Italian
> translation.


Glad to hear this! Might be a good way to start working a bit together
and see how things go:

       https://tails.boum.org/contribute/how/translate/


> April 26 - May 24 (Pre-SoC): Start to get confidence with all the
> libraries that will be used inside the toolkit and with the toolkit
> GTK+.


Seems nice. I'd suggest getting familiar with the tools used by the
Tails team, e.g. ikiwiki and Git, as well as getting involved a bit in
the Tails community.

I'd like you to propose a more detailed plan with a more precise
specification as well as finer-grained steps for the involved tasks.
Sentences like "Creation of the library and of the command-line
utilities." look a bit sparse to me. Don't you want to integrate more
of the specification I sent you over email in your proposal? This
would make it clearer what you intend to achieve, and what you intend
to leave aside.

A few more questions:

How much GNU/Linux experience have you got, both from a user and
developer point of view? The metadata in your proposal PDF file
claims you produced it using Microsoft Word on Mac OS X, so I
wonder ;)

What are the parts of your planned work that may be harder than
others? Why?

When/how will you document the library API?

Will the GUI you plan to create have drag'n'drop support?

What license do you plan to release your code under? (Tails is using
GPL3+.)

To end with, I think you should have at least one native English
speaker proofread your proposal's text as the last step before the
deadline, once you feel it's ready for prime-time.

Bye,
--
intrigeri <intrigeri@???>
| GnuPG key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
| OTR fingerprint @ https://gaffer.ptitcanardnoir.org/intrigeri/otr.asc
| Do not be trapped by the need to achieve anything.
| This way, you achieve everything.