[Tails-project] some stats on the website

Author: sajolida
To: Public mailing list about the Tails project
Subject: [Tails-project] some stats on the website

At the request of the Internationalization Lab, who helped us translate
the website into Farsi, I computed some stats on the hits we see on the
website. These are aggregate numbers from April 24 to May 22, so I
thought I might as well publish them here, as they might be of interest
to other people. I'm also documenting my scripts for future reference and
in case I made errors (I often do on these things).

Translation stats

These are the stats we publish in our monthly reports. They have nothing
to do with website hits but, since I'm writing this for the
Internationalization Lab, I thought I'd copy them here as well.

Overall translation of the website

- de: 50% (2615) strings translated, 43% words translated
- fa: 47% (2502) strings translated, 54% words translated
- fr: 63% (3278) strings translated, 63% words translated
- it: 17% ( 949) strings translated, 18% words translated
- pt: 31% (1661) strings translated, 29% words translated

Total original words: 53520

Core pages of the website

See https://tails.boum.org/contribute/l10n_tricks/core_po_files.txt

- de: 79% (1432) strings translated, 79% words translated
- fa: 40% ( 726) strings translated, 42% words translated
- fr: 73% (1330) strings translated, 77% words translated
- it: 49% ( 886) strings translated, 56% words translated
- pt: 55% (1001) strings translated, 55% words translated

Total original words: 14006

Hits per language

for lang in en fa fr de ; do
  echo -n "${lang} "
  grep -E "GET .+\.${lang}\.html HTTP/1\..\" 200" access.log* | wc -l
done

en 1501323 (83.1%)
fa   11468 ( 0.6%)
fr  124823 ( 6.9%)
de  170007 ( 9.4%)
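
The loop above only prints raw counts, so the percentages have to be derived in a separate step. A minimal sketch of one way to do that, assuming the counts are fed in as "lang count" pairs (the helper name `hit_shares` is made up for illustration and is not part of the original scripts):

```shell
# Hypothetical helper: reads "lang count" pairs on stdin and prints each
# count together with its share of the total, to one decimal place.
hit_shares() {
  awk '{ lang[NR] = $1; hits[NR] = $2; total += $2 }
       END {
         for (i = 1; i <= NR; i++)
           printf "%s %7d (%4.1f%%)\n", lang[i], hits[i], 100 * hits[i] / total
       }'
}

# Counts taken from the list above.
printf '%s\n' 'en 1501323' 'fa 11468' 'fr 124823' 'de 170007' | hit_shares
```

Note that the shares are relative to the four languages counted here, not to all hits on the website.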

Top 50 pages in Farsi and their hits

Note that this doesn't mean that these pages are actually translated
into Farsi. For example, the pages ranked 2, 3, 8, 10, and 12 are not
translated into Farsi.

grep -E "GET .+\.fa\.html HTTP/1\..\" 200" /tmp/access.log |
  sed -n -re 's/.* ([^ ]+)\.fa\.html HTTP.*/\1/p' |
  sort | uniq -c | sort -rn | head -n 50

    686 /index
    312 /install
    189 /install/os
    183 /news
    162 /about
    128 /support/faq
    128 /getting_started
    127 /install/win
    126 /doc/anonymous_internet/Tor_Browser
    120 /install/win/usb
    106 /news/version_2.3
    106 /install/win/usb/overview
    105 /doc
    103 /support/known_issues
    103 /doc/first_steps/startup_options/bridge_mode
     85 /doc/about/license
     84 /support
     83 /contribute
     82 /press
     74 /doc/about/warning
     74 /contribute/how/donate
     67 /security
     66 /doc/first_steps/introduction_to_gnome_and_the_tails_desktop
     66 /doc/about/trust
     64 /install/vm
     63 /doc/anonymous_internet/claws_mail_to_icedove
     59 /doc/encryption_and_privacy/secure_deletion
     59 /doc/anonymous_internet/tor_status
     57 /doc/anonymous_internet/icedove
     56 /doc/anonymous_internet/electrum
     55 /doc/first_steps/bug_reporting
     55 /doc/anonymous_internet/pidgin
     55 /doc/about/features
     54 /doc/anonymous_internet/i2p
     53 /doc/first_steps/startup_options/network_configuration
     52 /install/dvd
     51 /security/Numerous_security_holes_in_2.2.1
     51 /install/debian
     50 /doc/introduction
     50 /doc/anonymous_internet/index
     49 /news/version_1.7
     49 /doc/first_steps/upgrade
     49 /doc/first_steps/startup_options/mac_spoofing
     48 /doc/about/openpgp_keys
     48 /doc/about/acknowledgments_and_similar_projects
     47 /news/version_2.2.1
     46 /news/version_2.2
     46 /doc/advanced_topics/virtualization
     45 /doc/first_steps/installation/manual/linux
     44 /install/win/clone/overview
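
To make it easier to check what the pipeline above counts, here is a small sketch that wraps it in a function and runs it on a few invented log lines. The function name `top_pages`, the sample lines, and their exact log layout are assumptions for illustration only. The real server log format is not shown in this message. It uses `sed -E` instead of `-r`, which should behave the same with GNU sed.

```shell
# Hypothetical helper wrapping the pipeline above: reads access-log lines on
# stdin and prints the most requested pages for one language code.
# $1 = language code (e.g. fa), $2 = number of pages to show (default 50).
top_pages() {
  lang="$1"
  limit="${2:-50}"
  # Keep successful (200) GET requests for *.LANG.html, strip everything
  # but the page path, then count and rank the paths.
  grep -E "GET .+\.${lang}\.html HTTP/1\..\" 200" |
    sed -n -E "s/.* ([^ ]+)\.${lang}\.html HTTP.*/\1/p" |
    sort | uniq -c | sort -rn | head -n "$limit"
}

# Invented sample lines: only the two 200 responses for /install and the
# one for /index in Farsi are counted. The 404 and the French hit are not.
top_pages fa 2 <<'EOF'
203.0.113.5 - - [01/May/2016:10:00:00 +0000] "GET /install.fa.html HTTP/1.1" 200 1234
203.0.113.6 - - [01/May/2016:10:00:01 +0000] "GET /install.fa.html HTTP/1.1" 200 1234
203.0.113.7 - - [01/May/2016:10:00:02 +0000] "GET /index.fa.html HTTP/1.1" 200 4321
203.0.113.8 - - [01/May/2016:10:00:03 +0000] "GET /news.fa.html HTTP/1.1" 404 512
203.0.113.9 - - [01/May/2016:10:00:04 +0000] "GET /install.fr.html HTTP/1.1" 200 1234
EOF
```

A hit is counted whenever the server answers 200 for the `.fa.html` URL, which is why pages that are not translated still show up in the list.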

Top 50 pages across all languages

grep -E "GET .+\...\.html HTTP/1\..\" 200" /tmp/access.log |
  sed -n -re 's/.* ([^ ]+)\...\.html HTTP.*/\1/p' |
  sort | uniq -c | sort -rn | head -n 50

 554957 /news
 154156 /install
 146827 /install/os
  99661 /install/win
  70154 /index
  65685 /install/win/usb/overview
  62222 /install/win/usb
  40132 /about
  33440 /install/debian
  28471 /getting_started
  22188 /doc/about/warning
  20882 /install/debian/usb
  20486 /news/version_2.3
  20428 /install/dvd
  20370 /install/debian/usb/overview
  19971 /install/linux
  19889 /doc
  19305 /install/vm
  14709 /install/mac
  12914 /install/win/clone/overview
  12627 /doc/about/features
  11638 /install/clone
  10947 /install/download
  10279 /support/faq
   9732 /doc/first_steps/installation
   9239 /install/linux/usb/overview
   9072 /security/Numerous_security_holes_in_2.2.1
   8242 /doc/first_steps/reset/windows
   8204 /support/known_issues
   8024 /install/linux/usb
   7871 /support
   6729 /doc/first_steps/installation/manual
   6351 /doc/first_steps/startup_options/bridge_mode
   6243 /doc/first_steps/startup_options/administration_password
   6136 /doc/advanced_topics/virtualization/virtualbox
   6088 /install/mac/usb/overview
   5808 /doc/about/license
   5746 /doc/get/verify
   5654 /doc/first_steps/persistence/configure
   5301 /doc/first_steps/startup_options
   5123 /install/expert/usb/overview
   4986 /doc/first_steps/startup_options/mac_spoofing
   4968 /doc/first_steps/start_tails
   4772 /install/mac/usb
   4704 /doc/advanced_topics/virtualization
   4698 /doc/about/fingerprint
   4677 /install/expert/usb
   4386 /doc/first_steps/upgrade
   4333 /doc/first_steps/reset/linux
   4289 /doc/about/requirements

Top 20 user agents on Farsi pages

grep -E "GET .+\.fa\.html HTTP/1\..\" 200" /tmp/access.log |
  sed -e 's/ / /g' | cut -d ' ' -f 17 |
  sort | uniq -c | sort -rn | head -n 20

   7722 "Mozilla/5.0
    731 "Domain
    701 "Mozilla/4.5
    576 "Wget/1.15
    490 "Mozilla/4.0
    274 "Googlebot/2.1
    252 "Riddler
    232 "GigablastOpenSource/1.0"
     99 "ltx71
     61 "PrivateSearch/0.1.0
     61 "eilisabot/1.0.0-beta"
     54 "yacybot
     49 "ResearchBot;
     27 "UserAgent"
     21 "DoCoMo/2.0
     15 "SAMSUNG-SGH-E250/1.0
     14 "Opera/9.80
     11 "-"
     10 "Ruby"
      7 "UCWEB/2.0
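
Because `cut` splits on spaces, the list above only shows the first word of each user agent, and the field number depends on the exact log layout. A possible alternative, sketched here under the assumption that the user agent is the last double-quoted field on each log line (as in the common "combined" log format, which may not be what this server uses), is to split on double quotes instead. The function name `top_user_agents` and the sample lines are invented for illustration.

```shell
# Hypothetical helper: reads access-log lines on stdin and prints the most
# frequent full user-agent strings.
# Assumption: the user agent is the last double-quoted field on each line.
# $1 = number of user agents to show (default 20).
top_user_agents() {
  limit="${1:-20}"
  # With '"' as the field separator, the last quoted string is field NF-1
  # (field NF is whatever follows the closing quote, usually empty).
  awk -F'"' 'NF >= 3 { print $(NF - 1) }' |
    sort | uniq -c | sort -rn | head -n "$limit"
}

# Invented sample lines in a combined-style layout.
top_user_agents 20 <<'EOF'
203.0.113.5 - - [01/May/2016:10:00:00 +0000] "GET /index.fa.html HTTP/1.1" 200 4321 "-" "Mozilla/5.0 (X11) Firefox/45.0"
203.0.113.6 - - [01/May/2016:10:00:01 +0000] "GET /index.fa.html HTTP/1.1" 200 4321 "-" "Mozilla/5.0 (X11) Firefox/45.0"
203.0.113.7 - - [01/May/2016:10:00:02 +0000] "GET /index.fa.html HTTP/1.1" 200 4321 "-" "Wget/1.15 (linux-gnu)"
EOF
```

On the real logs it could be fed from the same filter as above, for example `grep -E "GET .+\.fa\.html HTTP/1\..\" 200" /tmp/access.log | top_user_agents 20`. User agents that themselves contain escaped double quotes would not be split correctly by this sketch.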