Re: [mat-dev] [Pkg-anonymity-tools] pdf-redact-tools

Author: intrigeri
Date:  
To: Micah Lee
CC: pkg-anonymity-tools, mat-dev
Subject: Re: [mat-dev] [Pkg-anonymity-tools] pdf-redact-tools

Hi,

Micah Lee wrote (25 Feb 2015 01:44:09 GMT) :
> On 02/24/2015 02:31 PM, intrigeri wrote:
>> * Are there any plans to make both tools converge somehow, e.g.
>> by adding the missing features in MAT? Both are implemented in
>> Python, so the starting point doesn't seem that bad :)

> Interesting idea! I don't have any plans to do this, but it would be
> cool to merge the projects. Although they are kind of for different
> things, and I foresee a lot of work going into making this
> fully-functional in a GUI.

> Specifically, pdf-redact-tools splits a PDF into several PNGs that can
> then be edited (with separate software, like GIMP) to draw black boxes
> over it, and then merges them back together again. Obviously it would be
> cool if the drawing-black-boxes functionality could go into the MAT GUI,
> but it would be a lot of work.

Indeed. Julien (author of the MAT), do you think that
pdf-redact-tools' PDF split and merge functionality would be a good
fit for the MAT, e.g. assuming it's fine to make it work from the CLI
only? Admittedly it is a bit far-fetched, but with the big picture in
mind, perhaps adding one slightly off-topic feature would be worth it
to avoid duplicating code and project overhead?
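
For reference, the split-and-merge workflow Micah describes can be
sketched as a pair of ImageMagick invocations. This is a minimal
sketch, not pdf-redact-tools' actual code: the `convert` command
(ImageMagick 6; ImageMagick 7 uses `magick`), the 128 DPI density,
the `page-%d.png` naming scheme and the helper names are all
assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Iterable

# Hypothetical values chosen for illustration only.
DENSITY_DPI = 128
PAGE_PATTERN = "page-%d.png"  # ImageMagick expands %d to the page index
PAGE_NAME_RE = re.compile(r"page-(\d+)\.png")


def split_command(pdf: str, out_dir: str, density: int = DENSITY_DPI) -> list[str]:
    """argv that rasterizes every page of `pdf` into numbered PNG files."""
    # -density goes before the input file so the PDF is read at that DPI.
    return ["convert", "-density", str(density), pdf,
            str(Path(out_dir) / PAGE_PATTERN)]


def order_pages(pngs: Iterable[Path]) -> list[Path]:
    """Sort page-N.png files by N, ignoring any other files."""
    numbered = []
    for png in pngs:
        match = PAGE_NAME_RE.fullmatch(png.name)
        if match:
            numbered.append((int(match.group(1)), png))
    return [png for _, png in sorted(numbered)]


def merge_command(png_dir: str, pdf: str) -> list[str]:
    """argv that joins the (hand-edited) PNGs back into a single PDF."""
    pages = order_pages(Path(png_dir).glob("page-*.png"))
    return ["convert", *(str(p) for p in pages), pdf]
```

Between the two steps, the PNGs are edited by hand (e.g. in GIMP);
each argv list can then be run with `subprocess.run(cmd, check=True)`.
Sorting by the numeric page index matters because a plain lexical sort
would place `page-10.png` before `page-2.png`.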

>> * How does `pdf-redact-tools --sanitize' differ from the seemingly
>> equivalent feature provided by MAT?

> I wasn't aware that MAT provided any sort of sanitizing functionality.
> What pdf-redact-tools does to sanitize a PDF is split the original into
> PNGs and then merge it back into a new PDF (with a much larger file size).

OK, thanks. IIRC the MAT re-renders each PDF page onto a Cairo
surface, and then saves all the resulting surfaces as a shiny new PDF.
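
If the MAT does work as recalled above, the approach looks roughly
like the following sketch, written against the Poppler
GObject-introspection bindings (PyGObject with cairo support) and
pycairo. It is an illustration under those assumptions, not the MAT's
actual code, and `rerender_pdf` / `path_to_uri` are hypothetical
names. Because pages are redrawn as vector content rather than
rasterized, the output is not guaranteed to be free of text data,
which is one of the points a comparison with pdf-redact-tools would
need to check.

```python
import pathlib


def path_to_uri(path: str) -> str:
    """Poppler's new_from_file() expects a file:// URI, not a plain path."""
    return pathlib.Path(path).resolve().as_uri()


def rerender_pdf(src: str, dst: str) -> int:
    """Redraw every page of `src` onto a Cairo PDF surface saved as `dst`.

    Returns the number of pages written. Assumes PyGObject with the
    Poppler 0.18 typelib and pycairo are installed.
    """
    # Imported here so path_to_uri() stays usable without these bindings.
    import cairo
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    document = Poppler.Document.new_from_file(path_to_uri(src), None)
    n_pages = document.get_n_pages()
    surface = None
    for index in range(n_pages):
        page = document.get_page(index)
        width, height = page.get_size()  # in PostScript points
        if surface is None:
            surface = cairo.PDFSurface(dst, width, height)
        else:
            surface.set_size(width, height)  # allowed between pages
        context = cairo.Context(surface)
        page.render_for_printing(context)  # draw the page through Cairo
        context.show_page()  # emit it as a page of the new PDF
    if surface is not None:
        surface.finish()  # flush and close the output file
    return n_pages
```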

I suggest you two (Julien and Micah) compare your differing methods,
figure out which one is best, and then converge on it. The good news
is that the MAT is meant to be usable as a library, so as a first
convergence step, whatever is considered state-of-the-art could be
implemented in the MAT and then used by pdf-redact-tools.
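
One concrete axis for such a comparison is output size. The
back-of-the-envelope sketch below shows why the PNG round trip yields
much larger files: an uncompressed RGB raster grows with the square of
the resolution. The page size and the 128 DPI figure are illustrative
assumptions; real PNG and PDF sizes depend on compression and on page
content, so this is only an uncompressed upper bound per page.

```python
def raw_raster_bytes(width_in: float, height_in: float, dpi: int,
                     bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of one page rasterized at `dpi` (24-bit RGB by default)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # US Letter (8.5 x 11 in) at an assumed 128 DPI: 1088 x 1408 pixels.
    size = raw_raster_bytes(8.5, 11, 128)
    print(f"{size:,} bytes (~{size / 2**20:.1f} MiB) per page before compression")
    # prints: 4,595,712 bytes (~4.4 MiB) per page before compression
```

Doubling the DPI quadruples this figure, which is the basic trade-off
between the legibility of the redacted output and its size.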

Perhaps pkg-anonymity-tools@ can be dropped from Cc for the rest of
this specific discussion. (I read mat-dev@, no need to Cc me.)

>> (OT comment: I'm always concerned by seeing tools pretend they do
>> something "securely". [...] Could you please securely redact that
>> statement? :)

> Quite true! In this case I'm using the term "securely" to mean not
> accidentally including any text data in the final output that was meant
> to be redacted (because the final output doesn't include any text data).
> And although it's a bold claim, I do believe it succeeds at this.

Cool! I think "securely" can then be replaced by something that
expresses this design goal more clearly: this will address my concern
*and* make it clearer to potential users what the tool is meant to
guarantee :)
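
The design goal of having no text data in the output can also be
spot-checked mechanically. The following is a rough heuristic, not a
proof and not part of either tool: it inflates each Flate-compressed
stream and looks for `BT ... ET` text objects, the operators PDF
content streams use to draw text. It ignores other stream filters,
annotations and metadata strings, and it may report false positives on
binary image data. `has_text_objects` is a hypothetical helper name.

```python
import re
import zlib

# Each stream body; the lookbehind skips the "stream" inside "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# In a content stream, text is drawn inside a BT ... ET text object.
TEXT_OBJECT_RE = re.compile(rb"(?:^|\s)BT\s.*?\sET(?:\s|$)", re.DOTALL)


def stream_bodies(pdf_bytes: bytes):
    """Yield each stream's body, inflated when it is Flate-compressed."""
    for raw in STREAM_RE.findall(pdf_bytes):
        try:
            yield zlib.decompress(raw)
        except zlib.error:
            yield raw  # uncompressed or another filter: inspect as-is


def has_text_objects(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream contains a BT ... ET text object."""
    return any(TEXT_OBJECT_RE.search(body) for body in stream_bodies(pdf_bytes))
```

Running it over `open(path, "rb").read()` for a PDF produced by a
raster round trip should return False; a True result is a prompt to
look closer, not proof of a leak.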

> Of course, it's not necessarily secure against malicious PDFs.
> pdf-redact-tools just calls out to ImageMagick to do the tricky work, so
> if the input PDF exploits a bug in ImageMagick, the computer could still
> get hacked.

> So perhaps some rewording/warning is in order.

Indeed.

> And here's another note about this project. Being able to redact
> documents before publishing them is very important for journalists, but
> there don't seem to be *any* other free software solutions to this
> problem, and journalists frequently mess up and make it so you can
> select and copy text under the black boxes they add [1]. There are
> various proprietary tools that let you do this, most notably Adobe
> Acrobat Pro.

> pdf-redact-tools isn't even the best solution -- it requires users to
> use the terminal and a separate graphics editing program, but it's
> better than having to buy Adobe products and give up on Linux. Since
> OS X is such a popular platform among journalists, I also like that
> pdf-redact-tools works great on OS X.

Got that part: it covers important use cases, hence my initial "Yay!" :)

Cheers,
--
intrigeri