Cygwin and rsync and all things nice

I wrote a little while ago that I was running Linux (Ubuntu in this case) inside a VirtualBox virtual machine, and it was all good.  Before that I’d played with lots of methods of getting my favourite unix utilities (like rsync) working under Windows.  I’ve used Cygwin, pre-compiled Windows versions, stripped-down Cygwin versions, second machines running Linux, and VMs.

One of the main drivers for getting those things working is to back up my websites, held on my hosting account.  I can ssh into my hosting account, and that means if I can get rsync going locally, I can use it with ssh to copy all changes to my local machine.  It’s efficient (rsync only copies changes) and it’s easy.  The pain is always finding a decent compliant version of rsync.

Anyway, I already said that when I started using the Linux VM I ported my script across to that, and along with the VirtualBox shared folders, I could back up my websites and they were visible under XP.  It wasn’t pretty but it worked, although it meant I had to start up the VM.  At the start that wasn’t a problem because I was using it quite a bit, but as the days went on and I stopped launching it, backups became less frequent.

And then today – random disaster.  I crashed the VirtualBox VM image, and after a couple of restarts it eventually stopped booting.  This wasn’t a great problem as I had snapshots of working images, so I just rolled back to one of those with two clicks.  Two clicks which took less time than the following thought took to get from one end of my brain to the other ‘I made the snapshots weeks ago, and since then I’ve written a lot of scripts and downloaded a lot of files and you just erased them all you idiot’.

So, I set about repatching Ubuntu and setting up various settings that I’d lost and made a few more snapshots.  But I needed a more permanent, reliable website backup solution.

Which means I’ve installed Cygwin again.  I know there are Windows binaries for rsync, and I know there are other apps which claim to do the same thing, but you can’t (in my view) beat the simplicity of Cygwin and the unix binaries.   Now I have a working cron daemon, ssh configured, rsync installed, and my little script which does all the work.  The rsync command is pretty simple,

rsync --recursive --links --safe-links --rsh=ssh --stats --human-readable me@mywebhost:/myhomedir/ /path/to/local/copy/

Then I just tar up the resulting files, compress them, make sure the filename has a date in it, and I can be confident I’ve got copies of everything I need.  Since most of my sites rely on mysql for their data, I also run some jobs on my webhost to mysqldump all the data into files three times a week, and I then back those files up locally.  I could mysqldump the content remotely, but it’s a hell of a lot quicker to do it on their system, compress them, and then rsync the compressed files.
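For anyone who wants the shape of that script without the shell, here is a rough Python sketch of the sync-then-archive step.  It is not my actual script: the function names, the `websites` prefix and the archive directory are all made up for illustration, and it assumes rsync and ssh are already installed and on the path.

```python
import datetime
import subprocess
import tarfile
from pathlib import Path


def build_rsync_command(remote, local_dir):
    """Return the rsync invocation from the post as an argument list."""
    return [
        "rsync", "--recursive", "--links", "--safe-links", "--rsh=ssh",
        "--stats", "--human-readable", remote, str(local_dir),
    ]


def archive_name(prefix, day):
    """Build a dated archive name such as websites-2009-02-07.tar.gz."""
    return f"{prefix}-{day.isoformat()}.tar.gz"


def make_archive(local_dir, archive_dir, prefix="websites", day=None):
    """Tar and gzip local_dir into archive_dir under a dated filename."""
    day = day or datetime.date.today()
    target = Path(archive_dir) / archive_name(prefix, day)
    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps local absolute paths out of the archive
        tar.add(str(local_dir), arcname=prefix)
    return target


def run_backup(remote, local_dir, archive_dir):
    """Sync from the web host, then archive the local copy."""
    # check=True stops here if rsync fails, so a half-finished sync
    # never gets archived as if it were a good backup
    subprocess.run(build_rsync_command(remote, local_dir), check=True)
    return make_archive(local_dir, archive_dir)
```

Calling `run_backup("me@mywebhost:/myhomedir/", "/path/to/local/copy/", "/path/to/archives")` would do the whole job in one go.  The date in the filename is what lets several weeks of archives sit side by side.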

Installing ssmtp lets me send mail from the Cygwin command line, so the script can send me a mail when it’s finished, and I’ll schedule it in cron to run once a week or something.  Much better.

Plus, I get all the fun of vi, grep and awk 🙂

The phpbb website was hacked

The guys who write the phpBB forum software have had their main website hacked.  The whole process looked pretty sophisticated and the hacker had access for a couple of weeks (increasingly deep access during that time).  The bottom line is that the hacker has posted all the e-mail addresses, user IDs and hashed passwords for every account registered on phpbb.com.

In fact, the hacker went one further and just dumped the entire mysql database and made it available, so it’s got all the fields of information used to register accounts.

Now the passwords are md5 hashes rather than plain text.  However, phpBB v2 used a straight md5 hash, which is easy to brute force.  phpBB v3 salts the hash first, and so is harder to brute force.  If you created an account on phpbb.com while it was running v2 and never logged in again after it upgraded to v3, then lots of people you don’t want having access are currently trying to brute force your password.  If you had a simple password, they’ve already done it; in fact, they broke about 18,000 passwords pretty quickly (all the obvious ones).
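To see why the salt matters, here is a small Python sketch.  It is only an illustration of the idea: the way the salt is generated and joined to the password here is my own assumption, not the scheme phpBB v3 actually uses.

```python
import hashlib
import os


def unsalted_md5(password):
    """Straight md5 of the password, as described for phpBB v2."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def salted_md5(password, salt=None):
    """md5 of salt + password; a random salt is made up if none is given."""
    if salt is None:
        salt = os.urandom(8).hex()
    digest = hashlib.md5((salt + password).encode("utf-8")).hexdigest()
    return salt, digest


# Two users who picked the same password get the same unsalted hash,
# so cracking one entry cracks them both.
alice = unsalted_md5("letmein")
bob = unsalted_md5("letmein")
print(alice == bob)

# With a random salt per account the stored hashes differ, so each
# account has to be attacked separately.
salt_a, hash_a = salted_md5("letmein")
salt_b, hash_b = salted_md5("letmein")
print(hash_a == hash_b)
```

The first comparison is always true.  The second is false unless the two random salts happen to collide, which is the whole point: a precomputed table of hashes for common passwords is no use when every account mixes in its own salt.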

Robert Graham has done some basic analysis of the passwords over in this article.  He’s also posted a link to the blogger site which details the hack.  That site is still there at the moment, although none of its links to the files the hacker published work.  The phpbb.com site is down for maintenance until they make sure it’s safe and that nothing else was changed.

The hack was carried out using a 0-day exploit of PHPlist, a mail manager application, and was not directly related to the phpBB software itself.  The hacker had access for a couple of weeks, and the patches to PHPlist were released after he gained access, so patching as soon as they could wouldn’t have helped the phpBB guys.  What would have helped was not upgrading to the latest version of PHPlist straight away, which is possibly a good argument for running at least one level back from the latest level of any software (excluding security patches, of course).  Those two requirements probably conflict too often for it to be perfect advice.

I had an account on phpbb.com so I spent a few hours last night checking what user credentials I’d used and making sure I wasn’t using the same combination anywhere else.  I think I had a reasonably strong password.  It’s never a word in the dictionary, it’s not even a word in the dictionary with some letters replaced by numbers, so it can only be brute forced by using random combinations of characters.  However, computational power is cheap and getting cheaper, so any password that can be brute forced will eventually be brute forced.  I’m not sure if I logged in to phpbb.com after they moved to v3, but I suspect I did, so my password was probably salted as well.  However, I didn’t take any chances and I changed my passwords on a bunch of services last night.

Does this hack teach us anything?  No, not really, but it reminds us of some stuff we should have already known.  Try not to use the same user id / password combination more than once, and certainly keep stuff you care about (like online banking credentials) totally different to stuff you don’t care about (like message boards you’re going to use once).

The article analysing the passwords reminds us that picking clever passwords is harder than you first think, because with millions of other computer users around the world doing the same thing, passwords can still be very common.  Picking trustno1 (Mulder’s password from X-Files) won’t help you when your hackers are X-Files fans, and joshua isn’t as clever as you imagined when you find out half the hackers in the world watched WarGames as well.    I’m not sure the list on this site is really the top 500 passwords, but it’s a good example of 500 pretty weak passwords, because if they’re that easy to think up, they’re already in a brute force dictionary somewhere.

If you can create your own md5 hashes (various methods depending on your OS) then you can do a simple check to see if the password might be weak.  Search for the hash on Google.  For example, the md5 hash for password is 5f4dcc3b5aa765d61d8327deb882cf99, now check out Google and see how many hits you get for it.  If someone used password as their password on phpbb.com then the hackers knew it in about 2 seconds.  And making the o’s into zeros won’t help you.  You don’t even need to be a hacker to go from unsalted md5 hashes to passwords.  There are several websites out there, easily found in Google, where you put in md5 hashes and they tell you the string used to create that md5 hash.
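If you have Python to hand, creating the hash is a couple of lines, and the same couple of lines show how those lookup websites work.  The word list below is tiny and made up for illustration; I’m assuming the real ones simply hold vastly more entries.

```python
import hashlib


def md5_hex(text):
    """Return the md5 hash of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Map hash -> plaintext for every candidate word."""
    return {md5_hex(word): word for word in candidates}


# A made-up word list, including an o-to-zero variant.
wordlist = ["password", "passw0rd", "trustno1", "joshua"]
lookup = build_lookup(wordlist)

leaked = "5f4dcc3b5aa765d61d8327deb882cf99"
print(lookup.get(leaked))  # password
```

Going from a leaked unsalted hash back to the password is nothing more than a dictionary lookup once somebody has hashed the word list in advance, which is why swapping a letter for a number buys you so little.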

Take care with your accounts and your passwords, keep them out of the dictionary.

Stats, stats and more stats (and lies!)

I’m a bit of a web-stat-aholic.  Despite the fact that this is a personal blog with hardly any relevance to the outside world, I still feel the need to see how many people read it.  But then that’s true of all the websites I throw up.  In some ways I find the stats just interesting, even if the numbers are really small, it amuses me how people find the sites, what search strings they use, and how certain pages get more hits.

I use three stats systems on this site: Google Analytics, the WordPress.com stats plugin, and the CyStats plugin.  Clearly the whole area of ‘what constitutes a visitor’ is murky at best, and when a page is made up of lots of resources that each generate a request to the web server, it gets a little harder to work out how many hits you’ve had.  Even so, I’m amused by the difference in information the three systems provide, and by the apparently totally useless WordPress.com stats plugin.

When I moved the blog to WordPress I thought the WordPress.com stats plugin would be a good option, and indeed it looked like it was reasonably accurate when the visitor count was 1 or 2 people a day.  However, as the site gets found by google and random hits start to increase, the stats look more and more crazy, in particular the ‘top posts and pages’ section.

Here’s the current info from that plugin for pages visited today and yesterday,

[image: stats1]

So yesterday, apparently the only two pages read on the site were the Watchmen post and the Wii Fit page.  And today, people are only reading the Watchmen post and nothing else.  I kinda find that hard to believe, and in fact, the other two stats systems agree that it’s complete bollocks.  I’ve no idea whatsoever what the WordPress.com stats plugin is doing but it’s certainly not recording which pages are being viewed.

Total visitors or page views being different I can live with because how they’re measured is pretty vague, but you would think a stats plugin would know which pages were being read, that is kind of the whole point.  In contrast, this is what CyStats thinks has been read today,

Windows 7 Beta - file sharing                           8   14%
Main page                                               8   14%
of protein and fat and blood sugar                      4   7%
So, what went wrong (or WordPress, Cron and Squid)      3   5%
Lord of the Rings Online - a review - part one          3   5%
Windows 7 Beta in Sun's xVM VirtualBox                  3   5%
Where oh where has my Gallium gone?                     3   5%
/category/politics                                      2   3%
/tag/dvd                                                2   3%
A month with WordPress                                  2   3%
Old photo's                                             2   3%
First real go at non-drybrush skin                      2   3%
Whiskey & Red Bull                                      2   3%
Windows 7 beta + Lord of the Rings Online               2   3%
David Gemmell Legend Award news                         2   3%
About                                                   2   3%
Eating without thinking                                 2   3%
/2006/08                                                2   3%
Archives                                                2   3%

which as you can see is rather more varied (and slightly more believable).  However, the list of visited pages on Google Analytics for today is different again, not just the numbers, but the actual pages, listing some not viewed above and missing out some that were viewed.

Ultimately, I have the logs from my web hosting account (when they work), and that means I can see, for real, which pages are being accessed and how often, but reading those logs can be a pain, and using tools to interpret them just introduces more interpretation that leads to yet another set of figures.
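As an example of how quickly interpretation creeps in, here is a rough Python sketch that counts page requests from raw log lines.  It assumes the logs are in the usual Common or Combined Log Format, which may not match what my host produces.  The sample lines are invented, and the list of file extensions to ignore is my own guess at what counts as page furniture.

```python
import re
from collections import Counter

# Matches the request and status part of a Common/Combined Log Format line:
# host ident user [time] "METHOD path HTTP/x" status size ...
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

# Extensions treated as page furniture rather than pages (an assumption).
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def count_pages(lines):
    """Count successful GET requests per path, ignoring static assets."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        if match.group("method") != "GET" or match.group("status") != "200":
            continue
        # Drop the query string so /?s=bacon counts as the main page
        path = match.group("path").split("?", 1)[0]
        if path.lower().endswith(ASSET_SUFFIXES):
            continue
        counts[path] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    '192.0.2.1 - - [07/Feb/2009:10:00:00 +0000] "GET /about/ HTTP/1.1" 200 5120',
    '192.0.2.1 - - [07/Feb/2009:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900',
    '192.0.2.2 - - [07/Feb/2009:10:05:00 +0000] "GET /about/ HTTP/1.1" 200 5120',
    '192.0.2.3 - - [07/Feb/2009:10:06:00 +0000] "GET /missing HTTP/1.1" 404 300',
    '192.0.2.3 - - [07/Feb/2009:10:07:00 +0000] "GET /?s=bacon HTTP/1.1" 200 4000',
]

for path, hits in count_pages(sample).most_common():
    print(path, hits)
```

Every choice in there (only GETs, only status 200, ignore stylesheets and images, fold query strings into the page) is a decision about what a ‘page view’ is.  Change any one of them and you get a different set of figures from exactly the same log.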

I guess where I’m going with this post is that trusting the stats for your site is impossible, but some tools are clearly more broken than others, and the WordPress.com stats plugin is entirely useless, since it’s clearly unable to work out which page your visitors are reading.  Don’t trust it.

Responsible web sites

Most small websites on the ‘net sit on shared hosting of some kind or another (this is an educated guess).  Shared hosting means that a small number of servers handle all the requests for a large number of web sites.  How that’s achieved varies, but the bottom line is that it’s a shared infrastructure.  It’s a bit like living in shared accommodation.  There’s a single door through which everyone gets into the building, then a number of apartments which have their own doors.  But they all share the same electricity supply and water and other utilities.

With shared web hosting, all the traffic comes into the same web host network and web server cluster, and is then handled by all the different web site configurations.  In the same way that there are people who would like to break into your apartment, there are people who’d like to break into your web site to steal stuff, deface it, or to try and gain further access to the shared infrastructure.

WordPress search sucks

I really like WordPress, I’m glad I moved to it from Blogger.  I think with the right templates it’s pretty flexible, I wish I had a) more time and b) more css/layout skill to do some template work.  However, WordPress search sucks.

It sucks for a few reasons,

  1. results come back in reverse date order (which makes sense for a blog but is too inflexible)
  2. there’s no indication in the search results which words matched the article
  3. the search just takes all the terms and does a basic sql query for any of them, so if you search for ‘i like bacon’ you get posts with the word like and then posts with the word bacon
  4. the standard navigation doesn’t tell you how many pages of results you got, just that you can read the next page

For blogs, I guess it’s ok as a basic tool, but really it should,

  1. return posts by most relevant first
  2. do proper searches based on the phrase you submit
  3. indicate which words matched the post
  4. show how many posts matched
  5. list the posts by just title, or summary or full and allow you to switch
  6. show which page you’re on, if there is more than one page of results
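To give an idea of what I mean by the first four, here is a rough sketch in plain Python.  It has nothing to do with how WordPress or any plugin actually implements search.  The `Post` class, the scoring and the phrase bonus of 10 are all made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    body: str


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def search(posts, query):
    """Return (score, matched_words, post) tuples, most relevant first."""
    terms = tokenize(query)
    phrase = " ".join(terms)
    results = []
    for post in posts:
        words = tokenize(post.title + " " + post.body)
        matched = sorted(set(terms) & set(words))
        if not matched:
            continue
        # One point for every occurrence of every matching term
        score = sum(words.count(term) for term in matched)
        # Padding with spaces stops the phrase matching inside longer words
        if len(terms) > 1 and f" {phrase} " in f" {' '.join(words)} ":
            score += 10  # arbitrary bonus for an exact phrase match
        results.append((score, matched, post))
    results.sort(key=lambda result: result[0], reverse=True)
    return results


def page_info(total, per_page, page):
    """Describe the result count and where the reader is in it."""
    pages = max(1, -(-total // per_page))  # ceiling division
    return f"{total} posts matched, page {page} of {pages}"
```

Searching for ‘i like bacon’ with this puts a post containing that exact phrase above one that merely contains the word like, returns the list of words that matched each post, and `page_info(len(results), 10, 1)` gives the navigation something useful to say.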

I’ve looked at various plugins, but not really had any luck finding one which fixes all the problems.  I’ll keep looking.

Windows 7 beta – again

Played with the Windows 7 beta again last night (I put ear plugs in so the fan noise didn’t get to me).  What can I say, either Microsoft are pulling a fast one and this is a fake version of Windows designed to run really quick, or it’s actually pretty good.  I installed Firefox and OpenOffice on the machine yesterday and both are quick and responsive.  OO starts pretty smartly and dragging windows around is pretty quick (not very scientific, but let’s be honest, the way I use a machine these days I’m all about how responsive the UI is).

The access protection stuff doesn’t really get in my way, there’s a couple of dialogs to click on when installing stuff (‘do you want App_01 to make changes to your system?’) but otherwise it’s pretty seamless.

I don’t like the file dialogs.  I’m a big fan of files and directories, and I much prefer just seeing a real representative list of what the structure looks like, rather than all the ‘fake’ entries, but I guess that’s because I’m a command-line geek at heart and that as things develop that view is going to go away entirely (it’s not really necessary any more I guess).

File sharing works fine, I picked ‘work’ as the network location, popped the machine into a workgroup and I can happily drag files off my XP machine, haven’t tried the other way around yet.

I tried the Kaspersky trial anti-virus thing this time instead of AVG, and the installer is very annoying, with about 14 features I didn’t like the sound of, so I turned them off.  Otherwise it installed and seems fine.

I’ll probably not post anything else about Windows 7 – as far as I care these days it works fine for me.  I used to demand a lot more from the OS and spend a lot of time configuring and tweaking, but these days I’ve turned into much more of a casual OS user.  As long as I can browse, install apps and manage photos I’ll be fine.  Actually, I might try and get Lord of the Rings Online installed and running just to see how it copes, since I guess games are the other thing I do.  If I make any progress or fall foul of that I’ll let you know.  Otherwise, I’m confident that when I have to upgrade from XP, Windows 7 looks like it’ll be fine.

Of course, I don’t intend to get rid of my Linux VMs, I can’t give up my command line just yet.

Working again

Gradwell finally read the ticket and fixed the issue (took 3 minutes from reading the ticket to fixing the issue, shame they took 12 hours to read the ticket).

It’s annoying that I really like the Gradwell setup for managing my domains / web sites and that moving anywhere else will be a seriously huge and annoying pain (~10 domains, lots of mail forwarding rules, multiple sites on the domains, pretty flexible setup for controlling it all at Gradwell).  I don’t want a single domain web hosting solution, I don’t want a web hosting company to own my domains, I don’t necessarily want a ‘reseller’ account but I basically want the config flexibility that goes with reseller accounts.  The NearlyFreeSpeech.net setup is quite nice but they don’t naturally register / host UK domains.

I’d buy a dedicated server or VPS somewhere if I could be bothered to work out how to handle my DNS.

/sigh

So I’ll probably just continue to vacillate and moan about Gradwell (like I have for 3 years).  Better the devil you know …