Author: julien.voisin
Date:
To: tails-dev, tor-assistants
Subject: Re: [T(A)ILS-dev] [tor-assistants] Another GSoC proposal for the meta-data anonymizing toolkit
2011/4/7 Robert Ransom <rransom.8774@???>
> On Wed, 6 Apr 2011 23:41:57 +0200
> "julien.voisin" <julien.voisin@???> wrote:
>
> > >What file formats do you intend to support initially?
> > >I think PDF, images, audio and video files are the most important to
> > >support to start with. What do you think?
>
> > I think I'll focus on :
> > - pdf
> > - mpeg audio
> > - ogg
> > - bmp/gif/jpeg/png
> > - exe
> > - archives (bzip2, zip, tar)
>
> How do you plan to anonymize ‘exe’ files? The machine code in a
> compiled program often contains information about the compiler which
> produced it, so scrubbing off the clearly marked metadata is not
> sufficient.
>
> Similarly, audio, video, and image data captured by a hardware device
> can often be linked to other data captured by the same device, and
> files in formats that use lossy compression are also likely to contain
> subtle fingerprints of the compressor which produced them. Do you plan
> to research and develop tools that try to remove those fingerprints?
> (A tool which simply scrubs off the explicitly provided metadata would
> still be useful to experts, but it would need to warn users that the
> resulting files may still contain fingerprints which could link them to
> the users' devices and/or software configuration and to other
> previously published files which contain ‘personally identifying
> information’ or intact metadata.)
>
The plan is to rely first on the capabilities of Hachoir. I don't know yet
how effective it is in each case; I am planning to run some tests during the
Community Bonding Period. Following the test strategy described earlier, it
will be easy to add a new document to the test set. If I can create a test
file containing metadata that Hachoir does not support (or if someone from the
community provides such a file), we can then decide on the proper action:
propose a patch to Hachoir or use another library for that specific format.
Doing R&D on supporting exotic fields, or improving existing support, will
depend on my progress. As for fingerprints, the subject of the project is
“metadata anonymisation”, not “fingerprint detection”: there is too much
software out there, and too many subtle ways to alter or mark a file (not to
mention steganography).
So, nope, I’m not planning to implement fingerprint detection. Not that it
isn’t interesting; it’s just not realistic to support it correctly (actually,
given that I must first support the metadata, it’s not realistic to support it
at all in such a short time frame). Would you agree that it is better to first
focus on building a working tool that does the job for the known metadata?
Once that is done, if the design is good, we can easily add support for more
exotic fields (or some kind of fingerprinting). I think we should never lose
track of the sad truth: no matter how much effort we spend on this, building a
comprehensive tool capable of detecting every kind of metadata and every
fingerprinting pattern is just not feasible.
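One possible reading of “if the design is good” is a registry of per-format
handlers, so that supporting a new format or field means adding one function.
The sketch below is hypothetical (`register`, `anonymize` and `strip_zip` are
illustrative names, not part of Hachoir or any existing tool) and uses only the
Python standard library. Its ZIP handler, for one of the archive formats listed
above, rewrites every entry with a fixed timestamp and without per-entry
comments or extra fields, and drops the archive comment; it does not touch the
contents of the members. Files in an unknown format raise an error instead of
being passed through as if they were clean.

```python
import io
import zipfile
from typing import Callable, Dict

# Hypothetical registry: file signature (magic bytes) -> stripping function.
Stripper = Callable[[bytes], bytes]
_HANDLERS: Dict[bytes, Stripper] = {}


def register(magic: bytes):
    """Decorator registering a stripper for files that start with `magic`."""
    def decorator(func: Stripper) -> Stripper:
        _HANDLERS[magic] = func
        return func
    return decorator


def anonymize(data: bytes) -> bytes:
    """Dispatch to the handler whose signature matches the file header."""
    for magic, func in _HANDLERS.items():
        if data.startswith(magic):
            return func(data)
    # Refuse rather than silently claim an unsupported file is clean.
    raise NotImplementedError("unsupported format")


@register(b"PK\x03\x04")
def strip_zip(data: bytes) -> bytes:
    """Rewrite a ZIP archive: fixed dates, no comments, no extra fields."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # A fresh ZipInfo has an empty comment and no extra field.
            clean = zipfile.ZipInfo(info.filename, date_time=(1980, 1, 1, 0, 0, 0))
            clean.compress_type = zipfile.ZIP_DEFLATED
            dst.writestr(clean, src.read(info))
    # The new archive is written without any archive-level comment.
    return out.getvalue()
```

With this layout, a handler for another format only needs its signature and a
stripping function; the dispatch code stays unchanged.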