Are client-side trackers such as Google Analytics or WordPress SiteStats still reliable?

Long ago we measured traffic to our sites via the log files of our webserver. That was pretty reliable, because you could measure exactly which files were requested by whom: you woke up in the middle of the night because a certain trend in the logs made you apply abrupt changes to your site.

Nowadays these logs are mostly seen as interesting for “technical issues” and not so much for analysing website traffic.

But… a couple of things have happened since then:

  1. The rise of client-side trackers – After a period of web-based “counters”, which simply insert an element in a webpage and count the number of requests along with some meta information (and which we placed on our sites mostly because they were cool as a graphical element), Google Analytics came on the scene and gained a huge following, followed by companies using the same technology, e.g. Clicky, or the site stats inside every WordPress blog using JetPack. It has gotten to the point where these “counts” (page views, uniques), which were just “cool” long ago, are now actually used to determine site value, marketing strategies and so on, and are used by a lot of folks as core metrics: on Flippa, for example, they are often requested by potential buyers. (A minimal sketch of such a counter endpoint follows after this list.)
  2. The rise of client-side anti-trackers – At the same time, users actively started to block these trackers. For example, 15,527,966 users are using Adblock Plus for Firefox, 800,000 use Ghostery and 300,000 use DoNotTrackMe, and there are by now so many blocking products, pretty much installed by default by many users (including blocking certain hosts directly from their hosts files), that I must conclude the only thing client-side trackers are tracking nowadays is the set of users who are not technically able to install plugins, plus visitors that are actually bots. Which Firefox users do not even have the most popular plugins installed? The Firefox users that have no clue what a plugin is!
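As a concrete illustration of item 1, here is a minimal sketch (in Python, with made-up host names) of what such a “counter” boils down to: the page embeds a tiny image, and a remote endpoint counts every request for it along with some meta information such as referrer and user agent. This is only the general idea, not how any particular vendor implements it.

```python
# Hypothetical counter endpoint: the page embeds
# <img src="http://counter.example.com/hit.gif"> and every request
# for that pixel is counted together with some meta information.
from http.server import BaseHTTPRequestHandler, HTTPServer
from collections import Counter

# a 1x1 transparent GIF, the classic tracking pixel
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00"
         b"\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00"
         b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"
         b"\x02\x02D\x01\x00;")

hits = Counter()      # page views per referring page
unique_ips = set()    # crude "uniques"

class HitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        referer = self.headers.get("Referer", "unknown")
        agent = self.headers.get("User-Agent", "unknown")
        hits[referer] += 1
        unique_ips.add(self.client_address[0])
        # the meta information a counter typically stores
        print(f"hit: {referer} | {agent} | uniques so far: {len(unique_ips)}")
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HitHandler).serve_forever()
```

The weakness described in item 2 is obvious from the sketch: if the browser never requests the pixel (or never runs the script that inserts it), the visit simply does not exist for the counter.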
From my own measurements on my own sites I estimate that only 70% of today’s traffic is measured by client-side trackers such as Google Analytics, and I expect this figure to fall. I also boldly state that this 70% of the real traffic is misleading for marketing purposes, because it is the percentage of traffic from “less computer-able” persons: in other words, you miss the “advanced users” who install one of the countless blockers such as Ghostery. The behaviour of those measured persons is typically very different from that of experienced users, so you should definitely not base usability advice or heatmap (“hotspot”) pictures on client-side tracker statistics. On top of that, you won’t get any counts for persons who leave your site before it has completely loaded, or for anyone or anything that does not load your piece of script or img.
As a counter-argument you might say that there is no real alternative, because server-side measurements (or measurements at the DNS/proxy level, such as in CloudFlare) contain a lot of false positives: bots, crawlers, all kinds of malware/threats, etc. However, this is of course exactly the same for client-side tracking… The main difference is that your client-side tracking host maintains a list of known bots and threats that gets filtered out, which is pretty much identical to what you need to do on the server side. So this is not a real argument. The other difference is that the client side registers nothing that does not trigger the client-side JavaScript inserts, so users that only load part of your pages are not counted and that information too is lost in the void. I therefore state that client-side measurement for analytics is now dead, unless you only want to measure that 70% of traffic. You can NOT base value on these stats. You can NOT do usability analysis on these stats. You NEED the server-side information, or even better, the server-side logs combined with the DNS server logs.
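To show that the server-side filtering is no harder than what the tracking vendors do, here is a minimal sketch, assuming a standard Nginx/Apache “combined” access log format and a small hand-maintained list of bot signatures (both are assumptions; a real setup would use a maintained bot/threat list):

```python
# Minimal sketch of server-side counting with bot filtering.
import re
from collections import Counter

LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

BOT_SIGNATURES = ("bot", "crawler", "spider", "curl", "wget")  # illustrative only

def is_bot(agent: str) -> bool:
    agent = agent.lower()
    return any(sig in agent for sig in BOT_SIGNATURES)

def analyse(logfile: str):
    pageviews = Counter()
    uniques = set()
    with open(logfile, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LOG_LINE.match(line)
            if not m or is_bot(m["agent"]):
                continue                 # the same filtering a tracking vendor does
            pageviews[m["path"]] += 1
            uniques.add(m["ip"])         # crude "uniques": ignores NAT and proxies
    return pageviews, uniques

if __name__ == "__main__":
    views, uniques = analyse("access.log")
    print(f"{sum(views.values())} page views, {len(uniques)} unique IPs")
    for path, n in views.most_common(10):
        print(f"{n:8d}  {path}")
```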

What is your estimate of the number of users that have a client-side blocking plugin installed?

p.s. I expect a rise in statistical cloud apps that trigger on server-side calls (so e.g. a request for my page1.aspx triggers, on the server side, a call to the statistics server of whatever cloud app visualizes the stats), versus cloud apps that you sync your webserver logs to in order to run analytics on a non-real-time basis (and, as a note: for “the masses”). So e.g. on an Nginx-based server, some app that monitors ngx_http_log_module output in real time (or an extension of it) and interfaces with a stats service on a remote server, or something one layer higher such as http://code.google.com/p/php-ga/.
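As a sketch of that first model (written here as a generic Python WSGI middleware rather than an actual Nginx module, and with a made-up collector URL), every request the server handles fires a background hit to a remote stats service, so blocked client-side scripts no longer matter:

```python
# Sketch: every server-side request also fires a hit to a remote stats
# service. The collector URL and payload are hypothetical; this shows
# the shape of the idea, not a specific product.
import json
import threading
import urllib.request

STATS_ENDPOINT = "https://stats.example.com/collect"  # assumption, not a real service

def report_hit(environ):
    """Fire-and-forget call to the stats collector with request metadata."""
    payload = json.dumps({
        "path": environ.get("PATH_INFO", ""),
        "referer": environ.get("HTTP_REFERER", ""),
        "agent": environ.get("HTTP_USER_AGENT", ""),
        "ip": environ.get("REMOTE_ADDR", ""),
    }).encode()
    req = urllib.request.Request(
        STATS_ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        urllib.request.urlopen(req, timeout=2)
    except OSError:
        pass  # never let analytics break the actual site

class ServerSideStats:
    """WSGI middleware: counts every request the server handles, blockers or not."""
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # report in a background thread so the visitor is not delayed
        threading.Thread(target=report_hit, args=(dict(environ),), daemon=True).start()
        return self.app(environ, start_response)

# usage (hypothetical): application = ServerSideStats(my_wsgi_app)
```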

In fact… this is what I expect some hip company to land on my cool page sometime soon:

It will give me both real-time info and longer-term info, and it will give me the technical information I often need, which is not all that different from the other essential information I need to understand what is going on. (I did not dive deep enough into OWA to see whether it already does this.)

I notice that a lot of websites, even from large companies, do not use this model, since “long-tail” 404s still exist in all kinds of places. 404s might be a good measure of ‘fail’: the main pages on a domain might have some fancy 404 page, but long-forgotten subdomains that no longer exist are often deserted places. The same goes for places that never existed but might be interesting: when I type http://forum.apple.com I would expect a different reply (of course it says the server does not exist), but what if 1,000 users typed this into their browsers? Would you ever notice that on your own domains?
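Spotting those long-tail 404s is exactly the kind of thing server logs are good at and client-side trackers never see (a missing page usually carries no tracking snippet at all). A minimal sketch, again assuming combined-format access logs and a local file name:

```python
# Sketch of a "long-tail 404" report from combined-format access logs.
# To catch forgotten subdomains you would run this per virtual host
# (or add the host to the log format); file name and format are assumptions.
import re
from collections import Counter

LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def missing_pages(logfile: str) -> Counter:
    misses = Counter()
    with open(logfile, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LINE.match(line)
            if m and m["status"] == "404":
                misses[m["path"]] += 1
    return misses

if __name__ == "__main__":
    # the most-requested pages that do not exist: candidates for 'fail' analysis
    for path, n in missing_pages("access.log").most_common(20):
        print(f"{n:6d}  {path}")
```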
