Re: [T(A)ILS-dev] Metadata Anonymizing Toolkit for file Publ…

Author: Antonio Davoli
Date:  
To: intrigeri
CC: Damian Johnson, Tor Assistants, The T(A)ILS public development discussion list, Peter Eckersley, tech
Subject: Re: [T(A)ILS-dev] Metadata Anonymizing Toolkit for file Publication - GSoC'11 Proposal

Hello all,

I am sorry for the late reply; I had to prepare a talk for this Monday.

On Thu, Mar 31, 2011 at 6:18 PM, intrigeri <intrigeri@???> wrote:
>
> > I will look forward to your comments.
> Here is the first batch.
>


First, I would like to thank you for all the interesting points you have
raised.

> This would be most welcome but I'd be worried to see this goal make
> the implementation too painful; have you checked if the tools and
> libraries you plan to use do support Windows and Mac OS X as well as
> GNU/Linux? Is "making install easy on supported platforms" part of
> your proposal?
>


I had checked them before sending the proposal. It seems that all the
libraries (with the exception of librtf) are supported without problems
on Windows and Mac OS X. However, I am thinking of changing the proposal
so that it no longer includes multi-platform support. It could always be
added in a later version.



> About the choice of C++ for the MAT library: [...] Python that seems to
> have libraries supporting all the planned file types ready to be used
> in Debian.
>


I have considered your points and I agree with switching to Python,
also because C++ seems to be a poor fit both from the project's point
of view and for the interaction with the mentors.


> A few questions about supported file formats:
>
> - I suggest you try supporting ZIP archives as well as Tar ones. What
> do you think? Do you plan to anonymize metadata in archive headers
> only, or files contained in such archives as well?
>


I am planning to add support for ZIP archives, as I did for Tar ones.
I think the idea of also anonymizing the files inside an archive is
quite interesting, and I want to add it to the new version of the
proposal.
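
As a rough illustration of the archive-header part only, here is a
minimal Python sketch, assuming nothing beyond the standard zipfile
module. The scrub_zip name and the neutral timestamp are hypothetical
choices, not part of the proposal. The contained files are copied
unchanged; a fuller tool would also pass each one through its own
per-format cleaner.

```python
import zipfile

# Hypothetical constant: the earliest timestamp the ZIP format can store.
NEUTRAL_DATE = (1980, 1, 1, 0, 0, 0)


def scrub_zip(src, dst):
    """Copy the archive at src to dst, dropping header-level metadata.

    src and dst may be paths or file-like objects. Member contents are
    copied as-is; only timestamps, comments, extra fields and permission
    bits are reset.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # A fresh ZipInfo carries no comment and no extra field.
            clean = zipfile.ZipInfo(filename=info.filename,
                                    date_time=NEUTRAL_DATE)
            clean.compress_type = zipfile.ZIP_DEFLATED
            clean.external_attr = 0  # drop stored permission bits
            data = zin.read(info)
            # Assumption: a per-format cleaner for `data` would be
            # called here in a complete implementation.
            zout.writestr(clean, data)
        # The archive-level comment is not copied, so it stays empty.
```

The same approach should carry over to Tar archives with the tarfile
module, where the owner, group and mtime fields play the same role.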


> - What about video files? I expect you at least make it clear in your
> proposal why you put these aside, if you really plan to do so.
>


I found a Python project for XMP metadata support
(http://code.google.com/p/python-xmp-toolkit/) which is based on the
Adobe SDK and handles a long list of file formats, including several
video formats.


> > For what concerns the command-line utilities, a tool for each type
> > of files will be created.
>
> What is the rationale for exposing such details to the user, instead
> of a common command-line frontend?
>


Maybe it is better to remove this capability and keep only a single,
classic front end for user interaction.
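
To make the idea of one common front end concrete, here is a minimal
Python sketch, assuming argparse from the standard library. The "mat"
program name, the handler functions and the extension table are all
hypothetical placeholders; real handlers would call into the library
instead of returning a string.

```python
import argparse
import os


# Hypothetical per-format cleaners; placeholders for library calls.
def clean_image(path):
    return "image cleaner: " + path


def clean_archive(path):
    return "archive cleaner: " + path


# Hypothetical mapping from file extension to cleaner.
HANDLERS = {
    ".jpg": clean_image,
    ".png": clean_image,
    ".zip": clean_archive,
    ".tar": clean_archive,
}


def dispatch(path):
    """Pick the cleaner for path from its extension and run it."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(ext)
    if handler is None:
        raise ValueError("unsupported file type: " + path)
    return handler(path)


def main(argv=None):
    """Single entry point: the user never chooses a per-format tool."""
    parser = argparse.ArgumentParser(
        prog="mat", description="Remove metadata from files.")
    parser.add_argument("files", nargs="+", help="files to clean")
    args = parser.parse_args(argv)
    for path in args.files:
        print(dispatch(path))
```

Calling main(["photo.jpg", "backup.zip"]) runs the matching cleaner
for each file, so the file type stays an internal detail.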


> srm is a quite well-spread name. Could you please point us at the
> implementation you are talking of?
>


I was thinking of using the wipe implementation
(http://wipe.sourceforge.net/) or the secure-delete package. Both are
available as Debian packages. However, I would like to spend some time
comparing the two implementations.


> It's not clear if you plan to rely on manual testing or rather on
> writing a test-suite {before,while,after} implementing the code. What
> are your plans for unit-testing? GUI testing? Test-suite framework?
>


After the change of language, I plan to implement the whole test suite
with the PyUnit test framework. As for the timing, I plan to write the
tests {after,during} the coding phase. The tests can also be useful to
define the guidelines of the project.
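
A minimal sketch of what such a test could look like, assuming the
unittest module (PyUnit) from the standard library. The strip_metadata
function is a toy stand-in written only for this example; it is not an
existing library call.

```python
import unittest


def strip_metadata(fields):
    """Toy stand-in: drop fields that could identify the author."""
    sensitive = {"author", "creator", "gps"}
    return {key: value for key, value in fields.items()
            if key.lower() not in sensitive}


class StripMetadataTest(unittest.TestCase):
    def test_removes_identifying_fields(self):
        cleaned = strip_metadata({"Author": "someone", "GPS": "0,0"})
        self.assertEqual(cleaned, {})

    def test_keeps_harmless_fields(self):
        cleaned = strip_metadata({"width": 640, "Creator": "tool"})
        self.assertEqual(cleaned, {"width": 640})
```

Saved in a file, the tests can be run with "python -m unittest" on
that module.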


> I'm not sure about your plan to add i18n support as a second step. Why
> not writing the library and command-line utilities with i18n-support
> built-in right from the start?
>


Your point is correct. I am going to change that sentence in the
proposal because it is easy to misunderstand; I should have written it
more clearly.



> Same for the Windows and OSX support: in my humble opinion, if you
> really want this requirement to be part of your project, you need to
> carefully choose the libs you use with this in mind right from the
> start. Starting to port the code to these platforms after 1.5 months
> spent writing it is, IMHO, very much optimistic.
>


As I said before, I prefer to remove this support. It would take a lot
of time that I would rather spend on creating a complete project for
Linux.


>
> About "merging with Tails", I think this item has two sides:
>
>  1. In order to be properly installable in Tails, your set of tools
>     will need to be packaged for Debian, i.e. we should at least be
>     able to prepare custom .deb packages from your code. You do not
>     necessarily need to plan preparing the Debian packages yourself,
>     especially if you don't know Debian packaging yet (do you?). Nor
>     do you need to care that much about having the software uploaded
>     to Debian: filing a RFP bug should be enough from your side.
>     Well, I'd be delighted if you would integrate these tasks as part
>     of your summer schedule if you feel this is realistic, but don't
>     worry too much if you feel you can't.

>


I hope to have enough time to study .deb packaging in the coming days,
in order to add it to the new version of the proposal.

>  2. In order to make sure your code can be used in Tails, making sure
>     you pick tools and libraries that are available in Debian Squeeze
>     (+ squeeze-backports) is a must and should not be a late
>     requirement. Also, when do you plan to start testing your code
>     inside of Tails? I suggest doing this on a regular basis, e.g.
>     once a week or once every two weeks, to make sure you don't go a
>     wrong way for too long.

>


Thank you for the hint; I am going to check all of them.



> Basically, without invoking heavy words such as "continuous
> integration" or "agile development", I'd like you to make it easy for
> us at every single step to test your work against our practical
> requirements, and even better to try it yourself in a realistic
> production setup. How do you see this process?
>


As said above, I plan to create the test suite for the project during
the coding phase.


> > Merging with the library and command-line tools.
>
> What do you mean?
>


This sentence is not clear and I am going to remove it.


> About documentation, I read "Starting of a documentation’s draft."
> planned for the end of July. Are you talking of end-user documentation
> or design documentation here? While I'm comfortable with seeing the
> end-user doc written this late, I'd prefer design documentation to be
> written at every single step of your work, so that we (mentors,
> reviewers) don't need to guess from your code what you are trying to
> do.
>


With that sentence I was referring to the end-user documentation. As I
said before, I am going to write the design documentation as the first
step.


>
> What VCS do you plan to use? (The Tails team makes heavy use of Git
> offline work and branching features.)
>


I currently use Git for the projects that I follow at the university.
However, I am going to spend more time studying its use in a production
environment.


> About your code samples:
>
> - I'll have a look at CAPWAP once I get online.
>


I am not maintaining CAPWAP anymore, but I maintained it for one year.
I had to fix several bugs in the protocol implementation, but my
primary goal was to create a solution for frequency planning in
large-scale wireless networks, and I introduced several new messages in
the implementation. I also created a multithreaded server for the
interaction of the protocol with external applications. I have not yet
uploaded the code of my thesis to my page, but I should do so very
soon. Furthermore, I collaborated on a publication of this work that is
currently in press
(http://linkinghub.elsevier.com/retrieve/pii/S0140366411000107).


>  - ext3 undelete: do you plan anything about it beside the code dump
>    you showed us? Seems like the code in the tarballs has not been
>    changed since 2009, and I could find no installation / usage
>    instructions.

>


Unfortunately the manual of this project is written in Italian (it was
a project that I created for a university exam). I have not had time to
upload it to SourceForge or to write extensive documentation. I am
considering removing it from the proposal if I do not have time to
write a guide/tutorial.


> - Are you still using and maintaining openVPNServer?
>


This year I stopped working on this code because I started my PhD, so
it is not maintained anymore.


>
> > Because of my studies I can not dedicate all the time to this
> > project, but I can assure a commitment of at least four hours a day
> > that I’m going to spend entirely on this project.
>
> Do your studies happen to go on until the end of the GSoC coding time?
>


I am going to be busier until June 15th. However, I am going to be
completely free from July 15th.


> > I think to accomplish the project within the established schedule,
> > but I would realy like to collaborate with the Tor community and
> > even within the Tails project that I believe is really useful and
> > interesting. Naturally I would like to follow the evolution of my
> > project.
>
> Glad to hear this; on the other hand the question was about the
> *chances* you will stick around, rather than if you "would like" to.
> See what I mean? (Every developer "would like" to maintain the
> software s/he writes, but as a matter of fact not all do.)
>


Of course; I was considering this a chance to join an open source
community full time :)


> > Usually I prefer to inform my supervisor (as I do with my thesis
> > advisor) weekly. I can do it through a weekly summary about the
> > progress or through a IRC session. I think it is quite good to plan
> > a series of meeting weekly in order to check together the status of
> > the development.
>
> Ok, seems great to me. Weekly reports over email seem perfect to get
> the rest of the Tails community tuned with your progress. Planned IRC
> meetings between the mentor(s) and you seem fine to me as well, but
> are no replacement for the reports.



Ok, great.


>
> > my nickname on IRC is double.
>
> Adding what IRC network you happen to connect to would be helpful.
> Please note I mostly work offline and we'll mainly communicate over
> async' media. Not that many Tails developers hang out in
> #tails@freenode IRC channel actually. Are you comfortable with this?
>


I am sorry for omitting the network. My nickname is already registered
on the freenode network. However, if developers (and mentors) prefer
to communicate by email, I am completely comfortable with that choice.

> > April 26 - May 24 (Pre-SoC): Start to get confidence with all the
> > libraries that will be used inside the toolkit and with the toolkit
> > GTK+.
>
> Seems nice. I'd suggest getting confidence with the tools used by the
> Tails team e.g. ikiwiki and Git, as well as getting involved a bit in
> the Tails community.
>


I should read about ikiwiki; I have never used it before.


>
> I'd like you to propose a bit more detailed plan with a more precise
> specification as well as finer-grained steps for the involved tasks.
> Sentences like "Creation of the library and of the command-line
> utilities." look a bit scarce to me. Don't you want to integrate more
> of the specification I sent you over email in your proposal? This
> would make it clearer what you intend to achieve, and what you intend
> to let aside.
>


I am going to improve the specification and also the timeline, with
more details about every step.


> A few more questions:
>
> How much GNU/Linux experience have you got, both from a user and
> developer point-of-view? The metadata in your proposal PDF file
> pretends you produced it using Microsoft Word on Mac OS X, so I
> wonder ;)
>


I was late and I wrote the proposal on my laptop with that horrible
version of Word (it is the worst editor that I have ever used). Anyway,
I started to use GNU/Linux about seven years ago and I have had the
chance to develop several applications (mostly for university
projects), including system and network programming.


> What are the parts of your planned work that may be harder than
> others? Why?
>


I think the hardest part will be the starting point and the design of
the structure of the main library. Writing the code should not be so
hard, especially if the design is well done.


> When/how will you document the library API?
>


I am going to study Sphinx, because I want to write the documentation
while I am writing the API functions.


> Will the GUI you plan to create have drag'n'drop support?
>


For the moment I think I am going to leave out this support, unless
you think differently about it.


> What license do you plan to release your code under? (Tails is using
> GPL3+.)
>


I agree to release the code under the GPL3+ license, in line with Tails.


> To end with, I think you should have at least one native English
> speaker proofread your proposal's text as the last step before the
> deadline, once you feel it's ready for prime-time.
>


I have found a couple of speakers who are going to help me with the
final version of the proposal.

Do you want to read the newer version when I finish writing it?
Naturally, I don't want to waste your time.

Cordially,

Antonio Davoli