Stats, stats and more stats (and lies!)

I’m a bit of a web-stat-aholic.  Despite the fact that this is a personal blog with hardly any relevance to the outside world, I still feel the need to see how many people read it.  But then that’s true of all the websites I throw up.  In some ways I just find the stats interesting.  Even if the numbers are really small, it amuses me to see how people find the sites, what search strings they use, and how certain pages get more hits.

I use three stats systems on this site: Google Analytics, the WordPress.com stats plugin, and the CyStats plugin.  Clearly the whole area of ‘what constitutes a visitor’ is murky at best, and when a page is made up of lots of resources that each generate a request to the web server, it gets a little harder to work out how many hits you’ve had.  Still, I’m amused by the difference in information the three systems provide, and by the apparently totally useless WordPress.com stats plugin.

When I moved the blog to WordPress I thought the WordPress.com stats plugin would be a good option, and indeed it looked like it was reasonably accurate when the visitor count was 1 or 2 people a day.  However, as the site gets found by Google and random hits start to increase, the stats look more and more crazy, in particular the ‘top posts and pages’ section.

Here’s the current info from that plugin for pages visited today and yesterday:

[Image: stats1]

So yesterday, apparently the only two pages read on the site were the Watchmen post and the Wii Fit page.  And today, people are only reading the Watchmen post and nothing else.  I kinda find that hard to believe, and in fact, the other two stats systems agree that it’s complete bollocks.  I’ve no idea whatsoever what the WordPress.com stats plugin is doing, but it’s certainly not recording which pages are being viewed.

Total visitors or page views being different I can live with, because how they’re measured is pretty vague, but you would think a stats plugin would know which pages were being read; that is kind of the whole point.  In contrast, this is what CyStats thinks has been read today:

Windows 7 Beta - file sharing                           8   14%
Main page                                               8   14%
of protein and fat and blood sugar                      4   7%
So, what went wrong (or WordPress, Cron and Squid)      3   5%
Lord of the Rings Online - a review - part one          3   5%
Windows 7 Beta in Sun's xVM VirtualBox                  3   5%
Where oh where has my Gallium gone?                     3   5%
/category/politics                                      2   3%
/tag/dvd                                                2   3%
A month with WordPress                                  2   3%
Old photo's                                             2   3%
First real go at non-drybrush skin                      2   3%
Whiskey & Red Bull                                      2   3%
Windows 7 beta + Lord of the Rings Online               2   3%
David Gemmell Legend Award news                         2   3%
About                                                   2   3%
Eating without thinking                                 2   3%
/2006/08                                                2   3%
Archives                                                2   3%

which as you can see is rather more varied (and slightly more believable).  However, the list of visited pages on Google Analytics for today is different again, not just the numbers, but the actual pages, listing some not viewed above and missing out some that were viewed.

Ultimately, I have the logs from my web hosting account (when they work), and that means I can see, for real, which pages are being accessed and how often.  But reading those logs can be a pain, and using tools to interpret them just introduces more interpretation, which leads to yet another set of figures.
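For what it’s worth, here is a rough sketch of the kind of log counting I mean, in Python.  It assumes the logs are in the common Apache "combined" format, which may not match what every host produces.  The names (count_page_views, SAMPLE_LOG, RESOURCE_EXTENSIONS) and the sample log lines are made up for illustration, and the rules about what counts as a page are exactly the sort of interpretation I’m complaining about.

```python
import re
from collections import Counter

# Pulls the path and status code out of the request part of a log line,
# e.g. "GET /about/ HTTP/1.1" 200.  Assumes Apache-style combined logs.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')

# Requests for these are treated as page resources, not pages.
# This list is an assumption; adjust it for your own site.
RESOURCE_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def count_page_views(lines):
    """Count successful GET requests per page, ignoring resources."""
    views = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a GET request, or not a line we understand
        path, status = match.group(1), match.group(2)
        path = path.split("?", 1)[0]  # drop any query string
        if status != "200":
            continue  # skip redirects, errors and so on
        if path.lower().endswith(RESOURCE_EXTENSIONS):
            continue  # stylesheets, images etc. are not page views
        views[path] += 1
    return views


# Made-up sample lines, purely for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2009:10:00:01 +0000] "GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Mar/2009:10:00:02 +0000] "GET /style.css HTTP/1.1" 200 812 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2009:10:05:10 +0000] "GET /about/?ref=feed HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2009:10:06:45 +0000] "GET /2006/08 HTTP/1.1" 200 9001 "-" "Mozilla/5.0"',
    '192.0.2.44 - - [10/Mar/2009:10:09:30 +0000] "GET /missing HTTP/1.1" 404 310 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, hits in count_page_views(SAMPLE_LOG).most_common():
        print(f"{path:<40}{hits}")
```

Note that this makes no attempt to filter out search engine crawlers or repeat visits, so it counts requests rather than readers.  Change any of those choices and you get a different set of numbers, which is rather the point.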

I guess where I’m going with this post is that trusting the stats for your site is impossible, but some tools are clearly more broken than others, and the WordPress.com stats plugin is entirely useless, since it’s clearly unable to work out which page your visitors are reading.  Don’t trust it.