Re: [mat-dev] [Pkg-anonymity-tools] pdf-redact-tools

Author: Micah Lee
Date:  
To: intrigeri
CC: pkg-anonymity-tools, mat-dev
Subject: Re: [mat-dev] [Pkg-anonymity-tools] pdf-redact-tools
On 02/24/2015 02:31 PM, intrigeri wrote:
> We're already maintaining MAT [1]. I understand pdf-redact-tools
> covers some needs that the MAT doesn't address yet. Still, I'm
> personally reluctant to see this kind of functionality thinly spread
> over various packages in Debian, mostly for UX reasons, and also
> because IMO there are currently too few people working in this niche
> field for us to really benefit from competing implementations, so:
>
> * Are there any plans to make both tools converge somehow, e.g.
> by adding the missing features in MAT? Both are implemented in
> Python, so the starting point doesn't seem that bad :)


Interesting idea! I don't have any plans to do this, but it would be
cool to merge the projects. They're kind of for different things,
though, and I foresee a lot of work going into making this fully
functional in a GUI.

Specifically, pdf-redact-tools splits a PDF into several PNGs that can
then be edited (with separate software, like GIMP) to draw black boxes
over them, and then merges them back together into a PDF. Obviously it
would be cool if the drawing-black-boxes functionality could go into the
MAT GUI, but it would be a lot of work.

For example, here's a document that was redacted with pdf-redact-tools
(see page 22):

https://s3.amazonaws.com/s3.documentcloud.org/documents/1278905/sharing-communications-metadata-across-the-u-s.pdf

We took the original document, converted it to a PDF, split it into a
bunch of PNGs, edited some of them, and then merged it back into a PDF
for publishing.

> * How does `pdf-redact-tools --sanitize' differ from the seemingly
> equivalent feature provided by MAT?


I wasn't aware that MAT provided any sort of sanitizing functionality.
What pdf-redact-tools does to sanitize a PDF is split the original into
PNGs and then merge it back into a new PDF (with a much larger file size).
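For readers unfamiliar with the approach, here is a minimal sketch of that split-and-merge round trip. It is not pdf-redact-tools' actual code: it assumes ImageMagick's classic `convert` binary is on PATH, and the helper names (`split_command`, `merge_command`, `sorted_pages`, `sanitize`), the `page-N.png` naming, and the density value are hypothetical choices for illustration. Redacting is the same pipeline with a manual editing step between split and merge; sanitizing skips that step.

```python
import glob
import os
import shutil
import subprocess

# Hypothetical setting: rasterization resolution in DPI. Higher values give
# sharper pages and even larger output files.
DENSITY = "128"


def split_command(pdf_path, out_dir):
    """argv that rasterizes every page of pdf_path into numbered PNGs."""
    # -density must come before the input file so it applies when the PDF is read.
    pattern = os.path.join(out_dir, "page-%d.png")
    return ["convert", "-density", DENSITY, pdf_path, pattern]


def merge_command(png_paths, pdf_path):
    """argv that stitches the PNGs (in the given order) into one PDF."""
    return ["convert", *png_paths, pdf_path]


def page_number(path):
    # Parse N out of "page-N.png" so page-10 sorts after page-9.
    name = os.path.basename(path)
    return int(name[len("page-"):-len(".png")])


def sorted_pages(out_dir):
    return sorted(glob.glob(os.path.join(out_dir, "page-*.png")), key=page_number)


def sanitize(pdf_in, pdf_out, work_dir):
    """Split to PNGs, then merge straight back: the output contains only images."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick `convert` not found on PATH")
    subprocess.run(split_command(pdf_in, work_dir), check=True)
    # (For redaction, edit the PNGs in work_dir here before merging.)
    subprocess.run(merge_command(sorted_pages(work_dir), pdf_out), check=True)
```

Because the merged PDF is built only from bitmaps, there is no text layer left to select and copy. The trade-off is the much larger file size. This also still trusts ImageMagick to parse the untrusted input PDF safely.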

> (OT comment: I'm always concerned by seeing tools pretend they do
> something "securely". Not only is that a very bold statement, but
> that's not my main concern. What really bothers me is that it
> reinforces the idea that computer security is a boolean value, that
> tools are either secure or not -- that's a far too common mistaken
> belief already, one that IMO undermines any educational effort aimed
> at conveying the basics of threat modeling and of adjusting one's
> defenses to whatever adversaries they expect. Try doing Tails user
> support for a couple days on IRC, and then it'll become obvious how
> painful it is to constantly struggle against this misconception. Could
> you please securely redact that statement? :)


Quite true! In this case I'm using the term "securely" to mean that the
final output can't accidentally include any of the text that was meant
to be redacted (because the final output doesn't include any text data
at all). And although it's bold, I do believe it succeeds at this.

Of course, it's not necessarily secure against malicious PDFs.
pdf-redact-tools just calls out to ImageMagick to do the tricky work, so
if the input PDF exploits a bug in ImageMagick, the computer could still
get hacked.

So perhaps some rewording/warning is in order.

And here's another note about this project. Being able to redact
documents before publishing them is very important for journalists, but
there don't seem to be *any* other free software solutions to this
problem, and journalists frequently mess up and make it so you can
select and copy text under the black boxes they add [1]. There are
various proprietary tools that let you redact properly, most notably
Adobe Acrobat Pro.

pdf-redact-tools isn't even the best solution -- it requires users to
use the terminal and a separate graphics editing program, but it's
better than having to buy Adobe products and not use Linux. Considering
OS X is such a popular platform for journalists, I also like that
pdf-redact-tools works great on OS X.

[1]
https://www.techdirt.com/articles/20140128/08542126021/new-york-times-suffers-redaction-failure-exposes-name-nsa-agent-targeted-network-uploaded-pdf.shtml

--
Micah Lee