Re: [Tails-ux] Report on Piwik prototype

Author: jaster
Date:  
To: Tails user experience & user interface design
Subject: Re: [Tails-ux] Report on Piwik prototype
Once upon a time, webmasters had the same problem with AOL's proxy.
Millions of people shared the same IP. The workaround was to count a new
user session whenever the same IP had been inactive for a few minutes.
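A minimal sketch of that inactivity-timeout heuristic, in Python. It
assumes the logs have already been parsed into (ip, timestamp) pairs,
and the 30-minute threshold is an assumed value standing in for "a few
minutes". `count_sessions` is a hypothetical helper, not part of Piwik.

```python
from datetime import datetime, timedelta

# Assumed threshold standing in for "a few minutes"; tune as needed.
INACTIVITY_TIMEOUT = timedelta(minutes=30)


def count_sessions(hits, timeout=INACTIVITY_TIMEOUT):
    """Count sessions in a list of (ip, timestamp) hits.

    Hypothetical helper: a hit opens a new session when its IP has not
    been seen before, or when the gap since that IP's previous hit is
    longer than `timeout`.
    """
    last_seen = {}  # ip -> timestamp of that IP's most recent hit
    sessions = 0
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous > timeout:
            sessions += 1
        last_seen[ip] = ts
    return sessions


if __name__ == "__main__":
    t0 = datetime(2017, 9, 4, 12, 0)
    hits = [
        ("10.0.0.1", t0),
        ("10.0.0.1", t0 + timedelta(minutes=5)),   # same session
        ("10.0.0.1", t0 + timedelta(minutes=50)),  # 45-minute gap: new session
        ("10.0.0.2", t0 + timedelta(minutes=1)),   # other IP: new session
    ]
    print(count_sessions(hits))  # 3
```

Everyone behind one proxy (or one Tor exit node) who is active at the
same time still collapses into a single session, so this separates
bursts of activity in time rather than counting people.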

One problem with IPs now is that the EU privacy rules seem to require
not identifying users to websites unless they opt in.
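One common mitigation is to truncate addresses before storing them,
similar in spirit to the Piwik IP anonymization feature discussed
further down. A minimal Python sketch; the /24 (IPv4) and /48 (IPv6)
prefix lengths and the `truncate_ip` name are assumptions for
illustration, and whether this alone satisfies the EU rules is a
separate question.

```python
import ipaddress

# Assumed prefix lengths: keep 3 of 4 bytes of an IPv4 address and the
# first 48 bits of an IPv6 address; everything after is zeroed.
IPV4_PREFIX = 24
IPV6_PREFIX = 48


def truncate_ip(address):
    """Return `address` with its host part zeroed (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    prefix = IPV4_PREFIX if ip.version == 4 else IPV6_PREFIX
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(truncate_ip("203.0.113.42"))           # 203.0.113.0
    print(truncate_ip("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```

Truncation makes the AOL problem worse on purpose: more people end up
sharing one stored address.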

IMO, counting people is overrated. The better questions are around why
people come and what they ask for and do. User vocabulary. Page views.
Search terms. You can learn a lot about those things without needing to
identify individuals.

jaster


On 2017-09-04 16:14, sajolida wrote:
> u:
>> sajolida:


>>> a. Ask boum.org to deactivate IP anonymization. We could import the logs
>>> daily and then get rid of the original logs and rely instead on the IP
>>> anonymization feature of Piwik [3]. It's not a hack but serious stuff
>>> built for users with legal requirements, so I expect it to be well
>>> integrated and to do what it should.
>>>
>>> Downsides:
>>>
>>> - I'm not sure boum.org will be able or ready to have IP in their logs.
>>> - Relying on logs of activity done through the Tor network might not
>>> provide a perfect way of singling out people. For example, I expect
>>> people using the same exit node to visit our website to still be
>>> considered as a single visit.
>>> - We might still want to keep and analyze the raw logs for some
>>> data that Piwik wouldn't provide us. For example, so far I haven't
>>> found how to replace our boot statistics, which count the hits on
>>> security/index.en.atom made by libwww-perl only. It's probably
>>> possible but I can't say yet. Or to count the hits on the hash tags
>>> that I used to flag the activity related to the donation campaign.
>>> But for this, Piwik has other mechanisms that could do it even
>>> better next year.
>>>
>>> [3]: https://piwik.org/docs/privacy/
>>>
>>> b. Rely on the JavaScript tracker. Again, we could rely on the IP
>>> anonymization feature of Piwik to keep sleeping at night. It's not
>>> clear to me whether people using the same exit node would be singled
>>> out with this technique (maybe by relying on a cookie).
>>>
>>> Downsides:
>>>
>>> - We won't have analytics from people without JavaScript.
>>> - The JavaScript might not give us all the analytics we need,
>>> for example the hits on the security upgrade feed by Tails Upgrader.
>>>
>>> I'm not sure what's best, and it would involve a more political
>>> discussion anyway about what information we want from our users. I'm
>>> happy to gather impressions and hints on what such a discussion would
>>> imply, but I'm not sure this is the right place to have it.
>>
>> I don't know where to have this discussion, but here is some input of
>> mine. I've looked at Piwik's demo in the meantime and tried to figure
>> out how much information we would get out of non-anonymized logs.
>
> Cool!
>
> As far as I understand, the big point of either switching to
> non-anonymized Apache logs or using the IP anonymization feature of
> Piwik is that we would be able to differentiate *visits* from *hits*,
> and that this is a requirement to know "how many people".
>
> For example, to tell two visits from a single person apart from two
> visits by two different people. Or in this case, to know how many
> people are behind the thousands of hits by Tor Browser.
>
> Being able to identify visits is also the only way for us to study how
> a single person travels through our website. Referrers only give us
> aggregated information from one page to the next, not throughout a
> whole visit of multiple pages.
>
>> - possibly we'd know how many people access the website from Tails or
>> through the Tor network, and how many access it through the "normal"
>> Internet, which could be interesting. However, I suspect that we will
>> already have this information through the user-agent.
>
> Yes. We can already split *hits* by user-agent and know which
> fraction comes from Tor Browser and which does not (leaving aside
> that a probably small fraction of Firefox users will have the same
> user-agent as Tor Browser).
>
> What we're not able to tell yet is how many *people* (i.e. *visits*)
> come through Tor and how many do not.