[HCoop-Discuss] Backups: rsync vs. Amazon Glacier

Clinton Ebadi clinton at unknownlamer.org
Fri Sep 7 17:00:54 EDT 2012


Adam Chlipala <adam at chlipala.net> writes:

> On 09/03/2012 12:23 PM, Steve Killen wrote:
>
>     So we currently do backups with rsync.net for ~$60/mo. I just ran across Amazon Glacier:
>    
>     http://aws.amazon.com/glacier/
>    
>     It's $0.01/GB a month.
>    
>     I'm just spitballing to get the conversation started, but off the
>     cuff it seems worth looking into to reduce our backup costs--how
>     much data are we maintaining with rsync?

Transfer costs additional money, reading files that are less than N
days old costs additional money, deleting files before they are N days
old still incurs storage charges for the remainder of those N days, and
you have to wait 2-3 hours before retrieved data is even available.
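
Back-of-the-envelope, using a made-up 500 GB of backups (I don't know
our actual rsync.net usage offhand):

  storage:  500 GB * $0.01/GB-mo = $5/mo   (vs. ~$60/mo at rsync.net)
  rotation: anything deleted before it is N days old is still billed
            for the remainder of those N days
  restore:  per-GB retrieval fees plus normal AWS transfer-out rates,
            on top of the 2-3 hour wait for the archive to be staged

So the headline price only really holds for data that is written once,
never rotated early, and almost never read back.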

Basically, it's not really useful for the sort of backups we're
making. We keep mostly ephemeral backups (I'd like to keep more, but
the current backup scripts suck) that, if ever needed, need to be
accessible more or less at will.

Additionally, rsync.net supports Free Software development (we get a
discount, and so do any open source developers who ask for one) *and*
uses standard Free technologies so we're not beholden to them. Amazon,
OTOH, pushes DRM and proprietary web APIs and is really unfriendly
toward Free Software.

It's all a moot point anyway: the off-the-shelf backup solution we're
transitioning to requires sftp, and Amazon doesn't offer that.

> I don't even know if a working, reasonable back-up regime is in place
> at this point.  It wouldn't surprise me if that slipped by the wayside
> during various upgrades.
>
> A regular process for testing the integrity of back-up data would be
> great; I don't think we ever had one.

Amazingly, we do have a vaguely working backup regime. AFS volumes and
databases are well backed up, and in theory deleuze gets backed up. The
other machines... not so lucky. It's also pretty terrible in that it
does a complete volume dump every single run, so it takes nearly 72
hours and is responsible for about 80% of HCoop's data use (putting us
dangerously close to 5Mbit/s).
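
(For scale: 5 Mbit/s is about 0.625 MB/s, or roughly 54 GB/day, so a
72-hour run at that rate moves on the order of 160 GB per dump cycle.)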

The justification for doing full dumps (basically, the need to encrypt
them) vaguely made sense when the scripts were first written, but it's
untenable now.

Luckily, obnam <http://liw.fi/obnam/> exists now, and can give us
incremental and secure backups... I'm experimenting with it locally
using my laptops + workstation (I need to back my laptop up to RAID1ed
storage anyway) and expect to get it into production at HCoop once I
finish getting this new Apache machine up. The general idea of the new
backup regime:

 - Each machine has its own repository that a daily cron job pushes to
   (see the sketch after this list)
 - Repository for database dumps (+ daily cron)
 - Repository for AFS backup dumps (+ daily cron)
   - Unfortunately, to preserve AFS attributes we have to do a local
     `vos dump' of the backup volumes (which themselves use near-zero
     disk space). You win some, you lose some.
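
For concreteness, a rough sketch of what those daily jobs could look
like. Everything here is hypothetical: the sftp host, repository
paths, key id, and volume names are placeholders, not our real
configuration (and in practice the three jobs would run on different
machines):

  #!/bin/sh
  # Hypothetical daily backup job; all names are placeholders.
  set -e
  REPO=sftp://hcoop@backup.example.net/~/obnam/$(hostname)
  KEY=0xDEADBEEF   # this machine's entry in the repository keyring

  # Per-machine repository: push the local filesystems.
  obnam backup --repository "$REPO" --encrypt-with "$KEY" /etc /home

  # Database repository (on the db machine): dump, then push the dumps.
  pg_dumpall > /var/backups/db/postgres.sql
  obnam backup --repository "${REPO}-db" --encrypt-with "$KEY" /var/backups/db

  # AFS repository (on a fileserver): `vos dump' the backup volumes
  # locally to preserve AFS attributes, then push the dump files.
  vos dump -id user.example.backup -time 0 -file /var/backups/afs/user.example.dump
  obnam backup --repository "${REPO}-afs" --encrypt-with "$KEY" /var/backups/afs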

Then let obnam handle the rest, initially keeping ~30 days of backups
and seeing how much space that uses. Thankfully obnam does the hard
parts; all I really need to do is manage the repository keyring and
set up a few cron jobs (one more sketch below), and we're good to go...
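
The 30-day window would presumably come from obnam's forget command in
the same cron jobs; a minimal sketch, assuming the same placeholder
repository as above:

  # Keep one generation per day for 30 days; drop everything older.
  obnam forget --repository "$REPO" --keep 30d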

Verification that backups (aside from AFS volume dumps, which are easy)
actually work is a bit more challenging... but now that we're moving to
virtualization servers with the real services running inside VMs, it
will at least be possible to do a disaster-recovery test without
affecting other operations.
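
As a cheaper stopgap between full disaster-recovery tests, something
like obnam's verify mode could be cron'd as a spot-check. Again just a
sketch, with the same placeholder repository as above, and assuming
cron mails us the output:

  # Re-read the newest generation from the repository and compare it
  # against the live filesystem; any mismatch shows up in the output.
  obnam verify --repository "$REPO" /etc /home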


