Re: [Tails-ux] Report on Piwik prototype

Author: sajolida
Date:  
To: Tails user experience & user interface design
Subject: Re: [Tails-ux] Report on Piwik prototype
u:
> sajolida:
>> As part of #12562: "Have a web analytics platform" I started playing
>> with Piwik to do web analytics on the activity on our website.
>
> Cool! I don't read the UX mailing list often enough these times and feel
> ashamed that it took me two months to find out about this great work
> you've been doing. I'll try to improve on this!


:)

>> a. Ask boum.org to deactivate IP anonymization. We could import the logs
>> daily and then get rid of the original logs and rely instead on the IP
>> anonymization feature of Piwik [3]. It's not a hack but serious stuff
>> built for users with legal requirements so I expect it to be well
>> integrated and doing what it should.
>>
>> Downsides:
>>
>> - I'm not sure boum.org will be able or ready to have IP in their logs.
>> - Relying on logs of activity done through the Tor network might not
>> provide a perfect way of singling out people. For example, I expect
>> people using the same exit node to visit our website to still be
>> considered as a single visit.
>> - We might still want to keep and analyze the raw logs for some
>> data that Piwik wouldn't provide us. For example, until now I didn't
>> find how to replace our boot statistics: see the hits on
>> security/index.en.atom only by libwww-perl. It's probably possible
>> but I can't say it yet. Or to count the hits on the hash tags that I
>> used to flag the activity related to the donation campaign. But for
>> this, there are other mechanisms of Piwik to do this even better next
>> year.
>>
>> [3]: https://piwik.org/docs/privacy/
>>
>> b. Rely on the JavaScript. Again we could rely on the IP anonymization
>> feature of Piwik to keep sleeping at night. It's not clear to me whether
>> people using the same exit node would be singled out with this technique
>> (relying on some cookie maybe).
>>
>> Downsides:
>>
>> - We won't have analytics from people without JavaScript.
>> - The JavaScript might not give us all the analytics we need.
>> For example the hits on the security upgrade feed by Tails Upgrader.
>>
>> I'm not sure what's best and it would anyway involve a more political
>> discussion about what information we want from our users. Happy to
>> gather impressions and hints on what such a discussion would imply but
>> I'm not sure here is the right place to have it.
>
> I don't know where to have this discussion, but here is some input of
> mine. I've looked at Piwik's demo in the meantime and tried to figure
> out how much information we would get out of non-anonymized logs.


Cool!

As far as I understand, the big point in either switching to
non-anonymized Apache logs or the IP anonymization feature of Piwik is
that we will be able to differentiate *visits* from *hits* and that this
is a requirement to know "how many people".

For example to differentiate two visits from a single person from two
visits from two different people. Or in this case to know how many
people are behind the thousands of hits by Tor Browser.

Being able to identify visits is also the only way for us to study how a
single person travels through our website. Referrers only give us
aggregated information about moves from one page to the next, not about
a whole visit across multiple pages.
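
To make the hits/visits distinction concrete, here is a minimal sketch
of how hits could be grouped into visits. It is only an illustration,
not how Piwik does it: the `visitor_key` field (for example an
anonymized IP plus user-agent, or a cookie ID), the sample data, and the
30-minute inactivity timeout are all assumptions.

```python
from collections import defaultdict

# Hypothetical hits: (visitor_key, unix_timestamp, path).
# visitor_key stands in for whatever identifies a visitor; it is an
# assumption made for this sketch.
HITS = [
    ("a", 0, "/install"),
    ("a", 60, "/install/win"),
    ("b", 90, "/install"),
    ("a", 4000, "/doc"),
]

VISIT_TIMEOUT = 1800  # seconds of inactivity that end a visit (assumed)


def group_into_visits(hits, timeout=VISIT_TIMEOUT):
    """Return a list of visits, each one being (visitor_key, [paths])."""
    # Collect each visitor's hits in chronological order.
    by_visitor = defaultdict(list)
    for key, ts, path in sorted(hits, key=lambda h: h[1]):
        by_visitor[key].append((ts, path))

    visits = []
    for key, events in by_visitor.items():
        current = []
        last_ts = None
        for ts, path in events:
            # A long enough gap closes the current visit and opens a new one.
            if last_ts is not None and ts - last_ts > timeout:
                visits.append((key, current))
                current = []
            current.append(path)
            last_ts = ts
        visits.append((key, current))
    return visits
```

With the sample data, the four hits collapse into three visits: two for
visitor "a" and one for visitor "b". If everybody behind one exit node
ended up with the same `visitor_key`, they would be merged into a single
visit, which is the limitation discussed above.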

> - possibly we'd know how many people access the website from Tails or
> through the Tor network, and how many access it through the "normal"
> Internet which could be interesting. However, I suspect that through the
> user-agent we will already have this information


Yes. We can already break down *hits* by user-agent and know
which fraction comes from Tor Browser and which does not (putting aside
that a probably small fraction of Firefox users will have the same
user-agent as Tor Browser).

What we're not able to tell yet is how many *people* (i.e. *visits*) come
from Tor or non-Tor.
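
As a rough sketch of that per-hit breakdown: the function below counts
which share of hits carries a given user-agent string. The function name
and the `tor_browser_ua` parameter are hypothetical; the caller has to
supply the actual user-agent string, and regular Firefox users sending
the same string are counted in the same bucket.

```python
from collections import Counter


def hit_share_by_user_agent(user_agents, tor_browser_ua):
    """Return the share of hits whose user-agent equals tor_browser_ua.

    user_agents: iterable of user-agent strings, one per hit.
    tor_browser_ua: the string to match (supplied by the caller).
    """
    counts = Counter(
        "tor-browser-like" if ua == tor_browser_ua else "other"
        for ua in user_agents
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

This only says something about hits. It cannot tell how many people are
behind them.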

> - we'd probably not be able to profit much from the location feature,
> because a lot of traffic will be Tor exit nodes anyway. So having
> non-anonymized logs will still make it hard to differentiate as
> TorBrowser users are supposed to share the same web fingerprint, right?


Right, if we want information about where Tails users are located we
can't use Piwik. I plan to use surveys for that :)

> With or without anonymized logs:
>
> - we'd know which pages/languages get the most hits
> - we'd know at which dates which pages get more hits


Right.

> - we might get to know more about how our donation campaign works


Yes, as soon as we have some "referrer" mechanism that works better than
the one I meant to use last year (see the intro of my analysis and the
failure of my "anchor" mechanism).

> - we might get out more information on referrers: where do people come
> from, what did they look for
> - if we use the JavaScript feature, we will set a cookie on users'
> browsers and we'll be able to better follow them (entry point, exit
> point, path). This will not be possible if we simply use Apache log
> feeding from my understanding. (see
> https://piwik.org/faq/general/faq_18254/ about cookies)


Right.

For example, I'd be interested in doing stats on the installation
funnel: throughout the installation process, at which step do most
people drop out? What success rate do we have overall (people going from
/install until the last step) across different installation scenarios
(Windows, macOS, Ubuntu, etc.)?

This might be possible already with hits only but might be easier with
visits; I'm not sure. Note that in the future, I might split the steps
into different pages (right now all the steps are in a single page).
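
Here is a minimal sketch of what such funnel stats could look like once
visits are available. It assumes each visit is an ordered list of page
paths and that every step has its own page, which is not the case today;
the step paths used in the example are hypothetical.

```python
def funnel_counts(visits, steps):
    """Count how many visits reached each funnel step, in order.

    visits: list of visits, each an ordered list of page paths.
    steps: ordered list of page paths that make up the funnel.
    A step only counts if all previous steps were seen earlier in the
    same visit; unrelated pages in between are ignored.
    """
    counts = [0] * len(steps)
    for pages in visits:
        pos = 0  # index of the next step this visit has to reach
        for page in pages:
            if pos < len(steps) and page == steps[pos]:
                counts[pos] += 1
                pos += 1
    return dict(zip(steps, counts))


# Hypothetical step pages and visits, for illustration only.
STEPS = ["/install", "/install/step2", "/install/done"]
VISITS = [
    ["/install", "/install/step2", "/install/done"],
    ["/install"],
    ["/install", "/doc", "/install/step2"],
    ["/doc"],
]
```

With this sample data, three visits start the funnel, two reach the
second step, and one reaches the end, so the overall success rate would
be one in three. Running the same count per installation scenario would
give the comparison across Windows, macOS, Ubuntu, etc.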

> To me it looks like this is what we want and we would not get it out of
> non anonymized Apache logs. This would require more research, insight
> and discussion obviously.


Yep. I don't think I'll have time to work more on Piwik soon as I'll be
very busy with DAVE 2 and the UX work for OTF, so I'm personally putting
this "on hold" until I get more time, maybe when we have to self-host
our website (#14588), etc. Until then I might still play a bit with my
prototype, for example to see how we could analyze search logs, but
little more.

>> Next steps
>> ==========
>>
>> - I'll try to do useful stuff through Piwik for #12082 "Analyze the
>> results of the donation campaign" over the summer.
>
> Did you use Piwik for this in the end?


Nope. Initially I thought I had to fall back on raw logs to study my
"anchor" mechanism and went this way. In the end the "anchor" mechanism
didn't work so I might have been able to go back to Piwik but I was in
the zone already and didn't think about that :)