Cygwin and rsync and all things nice

I wrote a little while ago that I was running Linux (Ubuntu in this case) inside a VirtualBox virtual machine, and it was all good.  Before that I’d played with lots of methods of getting my favourite unix utilities (like rsync) working under Windows.  I’ve used Cygwin, pre-compiled Windows versions, stripped-down Cygwin versions, second machines running Linux, and VMs.

One of the main drivers for getting those things working is to back up my websites, held on my hosting account.  I can ssh into my hosting account, and that means if I can get rsync going locally, I can use it with ssh to copy all changes to my local machine.  It’s efficient (rsync only copies changes) and it’s easy.  The pain is always finding a decent compliant version of rsync.

Anyway, I already said that when I started using the Linux VM I ported my script across to that, and along with the VirtualBox shared folders, I could back up my websites and they were visible under XP.  It wasn’t pretty but it worked, although it meant I had to start up the VM.  At the start that wasn’t a problem because I was using it quite a bit, but as the days went on and I stopped launching it, backups became less frequent.

And then today – random disaster.  I crashed the VirtualBox VM image, and after a couple of restarts it eventually stopped booting.  This wasn’t a great problem as I had snapshots of working images, so I just rolled back to one of those with two clicks.  Two clicks which took less time than the following thought took to get from one end of my brain to the other ‘I made the snapshots weeks ago, and since then I’ve written a lot of scripts and downloaded a lot of files and you just erased them all you idiot’.

So, I set about repatching Ubuntu and setting up various settings that I’d lost and made a few more snapshots.  But I needed a more permanent, reliable website backup solution.

Which means I’ve installed Cygwin again.  I know there are Windows binaries for rsync, and I know there are other apps which claim to do the same thing, but you can’t (in my view) beat the simplicity of Cygwin and the unix binaries.   Now I have a working cron daemon, ssh configured, rsync installed, and my little script which does all the work.  The rsync command is pretty simple,

rsync --recursive --links --safe-links --rsh=ssh --stats --human-readable me@mywebhost:/myhomedir/ /path/to/local/copy/

Then I just tar up the resulting files, compress them, make sure the filename has a date in it, and I can be confident I’ve got copies of everything I need.  Since most of my sites rely on mysql for their data, I also run some jobs on my webhost to mysqldump all the data into files three times a week, and I then back those files up locally.  I could mysqldump the content remotely, but it’s a hell of a lot quicker to do it on their system, compress them, and then rsync the compressed files.
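The tar-and-date step could look roughly like the sketch below.  It isn’t the actual script: the function name archive_dir, the websites- prefix and the example paths are all assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only: the paths, host name and archive naming are illustrative assumptions.

# archive_dir SRC_DIR DEST_DIR
# Packs SRC_DIR into DEST_DIR/websites-YYYY-MM-DD.tar.gz and prints the archive path.
archive_dir() {
    src=$1
    dest=$2
    stamp=$(date +%Y-%m-%d)
    archive="$dest/websites-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps the paths inside the archive relative to the parent of SRC_DIR
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# Typical use, once the rsync run has refreshed the local copy:
#   archive_dir /path/to/local/copy /path/to/archives
```

Putting the date in ISO order (year first) means the archives sort chronologically in a plain directory listing, which makes pruning old ones easier.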

Installing ssmtp lets me send mail from the Cygwin command line, so the script can send me a mail when it’s finished, and I’ll schedule it in cron to run once a week or something.  Much better.
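As a rough idea of how the mail and cron pieces might fit together, here is a minimal sketch.  The address, the subject line, the build_report helper and the script path are all placeholders, not the real setup.

```shell
#!/bin/sh
# Sketch only: the address, subject text and script path are placeholders.

# build_report RECIPIENT ARCHIVE
# Prints a minimal mail message (headers, blank line, body) on stdout.
build_report() {
    printf 'To: %s\n' "$1"
    printf 'Subject: Website backup finished\n'
    printf '\n'
    printf 'Backup archive written to %s\n' "$2"
}

# ssmtp reads the message from stdin and takes the recipient as an argument:
#   build_report me@example.com "$archive" | ssmtp me@example.com
#
# A crontab entry (edit with `crontab -e`) to run the script at 03:00 every Sunday:
#   0 3 * * 0 /home/me/bin/backup-sites.sh
```

Keeping the message builder separate from the send means the report can be checked on the command line without actually mailing anything.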

Plus, I get all the fun of vi, grep and awk 🙂

The phpbb website was hacked

The guys who write the phpBB forum software have had their main website hacked.  The whole process looked pretty sophisticated and the hacker had access for a couple of weeks (increasingly deep access during that time).  The bottom line is that the hacker has posted all the e-mail addresses, user IDs and hashed passwords for every account registered on phpbb.com.

In fact, they went one further and just dumped the entire mysql database, and made it available, so it’s got all the fields of information used to register accounts.

Now the passwords are md5 hashes rather than plain text.  However, phpBB v2 used a straight md5 hash, which is easy to brute force, while phpBB v3 salts the hash first and so is harder to brute force.  If you created an account on phpbb.com while it was running v2 and then never logged in again after it upgraded to v3, then lots of people you don’t want having access are currently trying to brute force your password.  If you had a simple password, they’ve already done it; in fact, they broke about 18,000 passwords pretty quickly (all the obvious ones).
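To see why the salt matters, here is a deliberately simplified sketch.  It is not phpBB’s actual hashing scheme: the salts are made-up strings, the md5_hex helper is just for illustration, and it assumes md5sum from GNU coreutils is available (it is under Cygwin and most Linux installs).

```shell
#!/bin/sh
# Simplified illustration only: this is NOT phpBB's actual hashing scheme.

# md5_hex STRING: prints the MD5 digest of STRING as 32 hex characters.
md5_hex() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Unsalted: everyone who picked "password" gets the same hash, so one
# precomputed table cracks all of them at once.
md5_hex 'password'

# Salted: a per-user random string is mixed in, so the same password
# produces a different hash for each user.
md5_hex 'x7Qp2password'
md5_hex 'Lk93dpassword'
```

With a salt, an attacker has to redo the guessing work for each account separately instead of looking every hash up in one shared table.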

Robert Graham has done some basic analysis of the passwords over in this article.  He’s also posted a link to the blogger site which details the hack, which is still there at the moment, although none of the links from that site to the resulting files he published work.  The phpbb.com site is down for maintenance until they make sure it’s safe and that nothing else was changed.

The hack was carried out using a 0-day exploit of PHPlist, a mail manager application, and not directly related to the phpBB software itself.  The hacker had access for a couple of weeks, and the patches to PHPlist were released after he gained access, so patching as soon as they could wouldn’t have helped the phpBB guys.  What would have helped was not upgrading to the latest version of PHPlist straight away, which is possibly a good argument for running at least one level back from the latest level of any software (excluding security patches, of course).  Those two requirements probably conflict too often for it to be perfect advice.

I had an account on phpbb.com so I spent a few hours last night checking what user credentials I’d used and making sure I wasn’t using the same combination anywhere else.  I think I had a reasonably strong password.  It’s never a word in the dictionary, and it’s not even a word in the dictionary with some letters replaced by numbers, so it can only be brute forced by using random combinations of characters.  However, computational power is cheap and getting cheaper, so any password that can be brute forced will eventually be brute forced.  I’m not sure if I logged in to phpbb.com after they moved to v3, but I suspect I did, so my password was probably salted as well.  However, I didn’t take any chances and I changed my passwords on a bunch of services last night.

Does this hack teach us anything?  No, not really, but it reminds us of some stuff we should have already known.  Try not to use the same user id / password combination more than once, and certainly keep stuff you care about (like online banking credentials) totally different to stuff you don’t care about (like message boards you’re going to use once).

The article analysing the passwords reminds us that picking clever passwords is harder than you first think, because with millions of other computer users around the world doing the same thing, passwords can still be very common.  Picking trustno1 (Mulder’s password from X-Files) won’t help you when your hackers are X-Files fans, and joshua isn’t as clever as you imagined when you find out half the hackers in the world watched WarGames as well.    I’m not sure the list on this site is really the top 500 passwords, but it’s a good example of 500 pretty weak passwords, because if they’re that easy to think up, they’re already in a brute force dictionary somewhere.

If you can create your own md5 hashes (various methods depending on your OS) then you can do a simple check to see if the password might be weak.  Search for it on google.  For example, the md5 hash for password is 5f4dcc3b5aa765d61d8327deb882cf99, now check out google and see how many hits you get for it.  If someone used password as their password on phpbb.com then the hackers knew it in about 2 seconds.  And making the o’s into zeros won’t help you.  You don’t even need to be a hacker to go from unsalted md5 hashes to passwords, there are several websites out there, easily found in google, where you put in md5 hashes and they tell you the string used to create that md5 hash.
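The lookup those websites do is, in spirit, nothing more than the sketch below.  The four-word list is a tiny stand-in for a real dictionary, lookup_hash and md5_hex are made-up helper names, and it assumes md5sum from GNU coreutils.

```shell
#!/bin/sh
# Sketch only: the word list here is a tiny stand-in for a real dictionary.

# md5_hex STRING: prints the MD5 digest of STRING as 32 hex characters.
md5_hex() { printf '%s' "$1" | md5sum | cut -d' ' -f1; }

# lookup_hash HASH WORD...
# Prints the first WORD whose unsalted MD5 equals HASH; returns 1 if none match.
lookup_hash() {
    target=$1
    shift
    for word in "$@"; do
        if [ "$(md5_hex "$word")" = "$target" ]; then
            printf '%s\n' "$word"
            return 0
        fi
    done
    return 1
}

lookup_hash 5f4dcc3b5aa765d61d8327deb882cf99 letmein trustno1 password joshua
# prints: password
```

If your own password’s hash falls out of a list like that, it was never going to survive a leaked database.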

Take care with your accounts and your passwords, keep them out of the dictionary.

25 things about me (meme!)

As seen here and Facebook (which I can’t link to because you’re probably not friends with my friends, and if you are, you already saw their stuff). On Facebook you’re supposed to tag 25 people, and they’re supposed to then do their 25 things thing.  Maybe I’ll post this on Facebook as well, who knows.

    1. I’m type two diabetic, through a mixture of genetics and too much pizza, your guess as to which has the greater effect is about as accurate as modern science’s best guess as well.  I was diagnosed in 2005.
    2. During university (or as it was when I started, polytechnic) in Sheffield, I would sometimes drink 7-12 cans of regular ice cold coca cola a day.  This in no way contributed to #1 above, really, no link at all.  During periods of revision or ‘last minute finalisation of essays’ (i.e. writing 3000 words from scratch in 2 hours), cola consumption could exceed this limit.
    3. John Hughes, John Landis, Ivan Reitman and Harold Ramis basically formed my entire view of the universe.  Later in life, Kevin Smith simply confirmed it all to be true.
    4. I prefer savoury tastes to sweet tastes.  This isn’t to say I don’t like sweet stuff, but on the whole I enjoy savoury tastes more.  Despite the claims of the first GP I spoke to after being diagnosed as diabetic, my diet does not contain a lot of sugar, it consists mostly of savoury, sour, and umami tasting foods.  I stopped taking sugar in my tea in my early teens, and while I enjoyed jellies I always preferred the sour ones (clearly off limits now thanks to #1).  Note to America: By jellies, I don’t mean jams, I mean these.
    5. Clearly I enjoy food (as evidenced by my weight and #1), I most enjoy food which has a strong taste and I often find myself craving chilli based foods.  I am not afraid of some heat in a food product, but I prefer there to be a strong flavour to match it.  If I’m making home made chilli it’s not usually hot in a chilli sense, but it is often salty and has a very strong spice flavour.
    6. I spent several years at university associating strongly with Garfield for various reasons, but not least because he liked lasagne.
    7. I have absolutely no idea what blood type I am.
    8. I firmly believe that a technical skill is less important than the ability to understand problems and apply related knowledge.  If you have the ability to look at something, understand it, and apply related knowledge you can learn any technical skill worth knowing.
    9. I can’t sing, I can’t write music, I can’t play back music that I hear on a musical instrument even if I know technically how to play that instrument (keyboard and tenor horn as a child), I can’t dance, but I have exceptional rhythm and can hold a beat like the universe depended on it.
    10. I believe the phrase “sarcasm is the lowest form of wit” is actually irony.
    11. I could stand in front of 1000 people and talk about a topic I picked up 2 days ago and sound confident but I couldn’t start a face to face conversation with any one of those people about something I’ve been doing for 25 years.
    12. The music of my youth was Madness, Queen, Eurythmics and Adam and the Ants.  I still love the first three, but I don’t rate Adam as much any more.
    13. During an assembly at school when I was around 10, our head teacher explained that not everyone could be good at everything.  We were all different and had different strengths and skills.  I decided there and then that it was clear I would never win a race around our school playground with anyone else due to my weight and lack of physical prowess, however, I would be able to beat everyone else by running diagonally across the playground since I knew that distance was shorter than the length of the other two bits of that triangle.  I decided that maths would be where I excelled and was pleased with myself.  Seven years later I gave up A level maths because it was too hard, and discovered I could neither win a foot race, nor beat people at maths.  I resigned myself to knowing how to use an apostrophe (even if these days, lazy fingers and lazy proof reading mean I get them in the wrong place too often).
    14. I have never had a driving lesson and hence can’t drive.
    15. I used to own (when at university) a long black rain coat, which I loved dearly.  I also had a pair of jeans which stopped at the knees, because they had worn out, formed holes, and the bottom sections had fallen off.  When combined (coat and jeans), with the coat buttoned up, the bottom of my legs were visible but naked.  I used to go out dressed like this.  It scared people.
    16. I’m a negative person.  I don’t just believe your glass is half empty, I think you’ll find it has micro-fractures and is currently leaking water.  Soon it will be entirely empty and you won’t have had time to drink any of it.  I don’t always think negativity is bad, if that were the case the electron would really suck.  However it obviously has to be managed.  I find I’m very good at spotting the flaws in solutions, designs, plans, theories and ideas.  This doesn’t always sit well with the owners of those intellectual objects.  It doesn’t really make them any happier to realise that I’m equally negative about my own ideas, theories, plans, designs and solutions.  However, when I learn to channel this skill correctly and couch my comments in happy-talk, it usually leads to better solutions, designs, ideas, theories and plans.  Usually.  If you find me pissing on your bonfire too much, just remind me, and I’ll try and keep it under control.
    17. My bladder capacity appears to be immense.  This is due to a combination of factors.  A lot of cola and beer while at university, a distinct and deep hatred of shared urinals, general laziness and a propensity to end up sitting somewhere that means I have to ask people to move if I want to stand up and go to the toilet.
    18. I was virtually teetotal until I went to university, but it didn’t last long.  The first real drink I had was Newcastle Brown Ale.  It made me so ill I vowed never to drink again.  Until the next evening.  I didn’t really choose to be teetotal until university, I just wasn’t ‘in’ with the kind of kids at school who went out drinking and so never got invited out.  I don’t necessarily regret this, I’m just letting you know.  You know.
    19. If it hadn’t been for the debt of being at university and because I’m basically really lazy, I’d probably still be in higher education in some form or another.  However, the company I worked for in my third year placement on my degree asked me to come back when my degree finished, assuming I passed, and I’m always happy to fall into the easy option.
    20. Despite my mother smoking all the time I lived at home, I don’t smoke.  I did smoke a few cigars during one particular year (1991-1992 I seem to recall), but just stopped one day.  I don’t seem to have the kind of personality that gets addicted to things.
    21. Although I engage heavily in several hobbies which appear initially to demand a superb imagination (roleplaying, live action roleplaying, reading, etc.), my imagination basically sucks.
    22. I love debate.  I find discussing ideas fun.  I totally respect the fact that other people believe different stuff to me, and will happily debate it until it gets dark and then light again with no intention of changing their mind, and every intention of maybe changing mine.  However, because I am somewhat intense when presenting my current belief on any particular topic, most people think I’m trying to change their minds, get frustrated or angry at me and then shut up and move on.
    23. I am generally uncomfortable with physical contact, but try hard to ignore it.  This doesn’t mean I don’t want you to hug me, I do, just that I’ll probably look uncomfortable for the first five seconds and that I probably won’t initiate the contact.  I’m getting better.
    24. My self image is a cross between Neo (The Matrix), Clint Eastwood and Egon Spengler.  The reality is clearly a cross between John Belushi, Bill Hicks and Raymond Stantz.
    25. Despite the fact that I don’t recall ever verbalising to my mother or sister that I love them, I do, in fact, love them.

Stats, stats and more stats (and lies!)

I’m a bit of a web-stat-aholic.  Despite the fact that this is a personal blog with hardly any relevance to the outside world, I still feel the need to see how many people read it.  But then that’s true of all the websites I throw up.  In some ways I find the stats just interesting, even if the numbers are really small, it amuses me how people find the sites, what search strings they use, and how certain pages get more hits.

I use three stats systems on this site: Google Analytics, the WordPress.com stats plugin, and the CyStats plugin.  Clearly the whole area of ‘what constitutes a visitor’ is murky at best, and when a page is made up of lots of resources that each generate a request to the web server, it gets a little harder to work out how many hits you’ve had, but I’m amused by the difference in information the three systems provide, and by the apparently totally useless WordPress.com stats plugin.

When I moved the blog to WordPress I thought the WordPress.com stats plugin would be a good option, and indeed it looked like it was reasonably accurate when the visitor count was 1 or 2 people a day.  However, as the site gets found by google and random hits start to increase, the stats look more and more crazy, in particular the ‘top posts and pages’ section.

Here’s the current info from that plugin for pages visited today and yesterday,

[image: stats1]

So yesterday, apparently the only two pages read on the site were the Watchmen post and the Wii Fit page.  And today, people are only reading the Watchmen post and nothing else.  I kinda find that hard to believe, and in fact, the other two stats systems agree that it’s complete bollocks.  I’ve no idea whatsoever what the WordPress.com stats plugin is doing but it’s certainly not recording which pages are being viewed.

Total visitors or page views being different I can live with because how they’re measured is pretty vague, but you would think a stats plugin would know which pages were being read, that is kind of the whole point.  In contrast, this is what CyStats thinks has been read today,

Windows 7 Beta - file sharing                           8   14%
Main page                                               8   14%
of protein and fat and blood sugar                      4   7%
So, what went wrong (or WordPress, Cron and Squid)      3   5%
Lord of the Rings Online - a review - part one          3   5%
Windows 7 Beta in Sun's xVM VirtualBox                  3   5%
Where oh where has my Gallium gone?                     3   5%
/category/politics                                      2   3%
/tag/dvd                                                2   3%
A month with WordPress                                  2   3%
Old photo's                                             2   3%
First real go at non-drybrush skin                      2   3%
Whiskey & Red Bull                                      2   3%
Windows 7 beta + Lord of the Rings Online               2   3%
David Gemmell Legend Award news                         2   3%
About                                                   2   3%
Eating without thinking                                 2   3%
/2006/08                                                2   3%
Archives                                                2   3%

which as you can see is rather more varied (and slightly more believable).  However, the list of visited pages on Google Analytics for today is different again, not just the numbers, but the actual pages, listing some not viewed above and missing out some that were viewed.

Ultimately, I have the logs from my web hosting account (when they work), and that means I can see, for real, which pages are being accessed and how often, but reading those logs can be a pain, and using tools to interpret them just introduces more interpretation that leads to yet another set of figures.
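For a quick look at the raw logs, something like this sketch would do.  It assumes the host writes Apache “combined” format logs (an assumption, check yours), and count_pages and the log path are made-up names.

```shell
#!/bin/sh
# Sketch only: assumes Apache "combined" format, where the request path is field 7.

# count_pages LOGFILE
# Prints "hits path" for each successfully served GET request, most-requested first.
count_pages() {
    awk '$6 == "\"GET" && $9 == 200 { hits[$7]++ }
         END { for (p in hits) print hits[p], p }' "$1" |
        sort -k1,1nr -k2,2
}

# Typical use (the log path is hypothetical):
#   count_pages /path/to/access.log | head -n 20
```

Note that this counts every successful GET, so images, stylesheets, feed readers and search engine bots all show up as hits.  Deciding which of those to throw away is exactly the interpretation that makes every stats tool produce a different number.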

I guess where I’m going with this post is that trusting the stats for your site is impossible, but some tools are clearly more broken than others, and the WordPress.com stats plugin is entirely useless, since it’s clearly unable to work out which page your visitors are reading.  Don’t trust it.

Downtime, sorry.

Gradwell (my web host) is having some major issues with its infrastructure.  The net result is periods of slowness, no response or broken page loads.  I’m sorry.  It’s not just affecting this site, but the other sites I host, and some of those are hosted for other people.  I apologise to all those people.  I am considering alternative arrangements, but inertia is a bitch, and the middle of Gradwell’s issues isn’t a good time to start trying to transfer things away anyway (unless it goes on).  They are rejigging their kit and making some pretty major changes over this weekend to try and smooth things out, so we’ll see how it goes.

Until then, if you get issues with any of the sites I host, please wait a little while and give it another try.  Thanks for your patience.