Hi,
> I had a quick look to help a bit anonym.
>
Thanks.
> Alan wrote (01 Mar 2014 23:28:20 GMT) :
> > - Sanitize IPv6 addresses in WhisperBack
> > (https://labs.riseup.net/code/issues/6571)
>
> I'm not following you on the "I don't see why to use such a complex
> solution while addresses in the exemple are already expanded"
> position. The fact the provided example contains expanded IPv6
> addresses does not guarantee that "compressed" (or whatever it's
> called) IPv6 addresses never appear in logs.
>
Sure. But:
- I can't remove something that I don't have examples how it looks like;
- the proposed fix looks too complicated to me and I have no easy way
to test it.
> To clarify: I'm (almost, see below) fine with seeing your proposed
> changes merged, because it improves the current situation, but IMO
> what it does is resolve a (non-existing yet) "Sanitize expanded IPv6
> addresses in WhisperBack reports" subtask, but clearly *not* #6571
> entirely.
>
OK. Created :
- Sanitize expanded IPv6 addresses in WhisperBack
(
https://labs.riseup.net/code/issues/6804) which is ready for QA
- Sanitize "compressed" IPv6 addresses in WhisperBack ;
(
https://labs.riseup.net/code/issues/6805) which is not.
> > + log_string = re.sub(r'[0-9a-fA-F]{4}(:[0-9a-fA-F]{4}){7}',
> > + r'[IP6 REMOVED]',
> > + log_string)
>
> Better make the regexp case insensitive, I think.
>
Why? The above regexp will match uppercase and lowercase.
> Also, there's "IP6" written in a few places. Is this intentional?
>
Yes. What would you prefer?
> > - Whisperback should sanitize DMI
> > (https://labs.riseup.net/code/issues/6797)
>
> I'm unsure about that one. The DMI string often contain useful info
> (e.g. the version of the BIOS), so killing it entirely because it may
> contain serial numbers does not look optimal to me.
>
I think we already have most of the information we want from dmidecode,
which gives it more selectively. I fail to see how keep what could be
useful to us while removing the serial number of the machine in this
very vendor-specific string, so I preferred to be on the safe side.
> > - Whisperback should sanitize more serial numbers
> > (https://labs.riseup.net/code/issues/6798)
>
> The regexp could certainly use some refactoring at some point,
> but that's no blocker yet.
>
Please see #6800 "Whisperback should use a generic regexp to
match serial numbers" for the next step (for 1.1 I think). I just
wanted to have a chance to include the improvements that was already in
my git to land in 0.23 and 1.0.
>
> Also, I'm glad there is a test suite, but the way it works seems
> strange to me: if I understand it correctly, it's testing that
> "[$something REMOVED]" can be found after sanitizing, but it does not
> test that the info that should be removed cannot be found anymore,
> which is what we should really test, no?
It would be better, yes. Want to create a ticket?
Thanks again for this review,
Cheers