[HCoop-Discuss] Backups: rsync vs. Amazon Glacier

Jesse Shumway layline at hcoop.net
Mon Sep 17 22:43:19 EDT 2012


Clinton,

I'm not sure which are Adam's comments and which are yours. But in summary, am I right in reading that you see HCoop continuing to use rsync.net as our offsite backup storage, and that you think obnam holds great promise as the management system for these backups?

Thanks,

-- Jesse Shumway   <layline AT hcoop.net>

On Sep 7, 2012, at 5:00 PM, Clinton Ebadi wrote:

> Adam Chlipala <adam at chlipala.net> writes:
> 
>> On 09/03/2012 12:23 PM, Steve Killen wrote:
>> 
>>    So we currently do backups with rsync.net for ~$60/mo. I just ran across Amazon Glacier:
>> 
>>    http://aws.amazon.com/glacier/
>> 
>>    It's $0.01/GB a month.
>> 
>>    I'm just spitballing to get the conversation started, but off the
>>    cuff it seems worth looking into to reduce our backup costs--how
>>    much data are we maintaining with rsync?
> 
> Transfer costs additional money, reading files before N days costs
> additional money, deleting files before N days incurs a cost for those
> N days, and you have to wait 2-3 hours for data to become available.
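> 
> (To put rough, purely illustrative numbers on it: 1 TB at $0.01/GB a
> month is about $10/month in storage, but transfer, early-deletion, and
> retrieval fees all come on top of that, and they depend entirely on how
> often we'd actually need to touch the data.)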
> 
> Basically, it's not really useful for the sort of backups we're
> making. We keep mostly ephemeral backups (I'd like to keep more, but the
> current backup scripts suck) that, if ever needed, need to be accessed
> more or less at will.
> 
> Additionally, rsync.net supports Free Software development (we get a
> discount, and so do any open source developers who ask for one) *and*
> uses standard Free technologies so we're not beholden to them. Amazon,
> OTOH, pushes DRM and proprietary web APIs and is really unfriendly
> toward Free Software.
> 
> It's all a moot point because the off-the-shelf backup solution we're
> transitioning to requires sftp, and Amazon doesn't offer that:
> 
>> I don't even know if a working, reasonable back-up regime is in place
>> at this point.  It wouldn't surprise me if that slipped by the wayside
>> during various upgrades.
>> 
>> A regular process for testing the integrity of back-up data would be
>> great; I don't think we ever had one.
> 
> Amazingly, we do have a vaguely working backup regime. AFS volumes and
> databases are well backed up, and in theory deleuze gets backed up. The
> other machines... not so lucky. It's also pretty terrible in that it
> does a complete volume dump every single run, so it takes nearly 72
> hours and is responsible for about 80% of HCoop's data use (putting us
> dangerously close to 5Mbit/s).
> 
> The justifications for doing full dumps vaguely made sense when they
> were first implemented (basically, the need to encrypt them), but it's
> still untenable.
> 
> Luckily, obnam <http://liw.fi/obnam/> exists now, and can give us
> incremental and secure backups... I'm experimenting with it locally
> using my laptops + workstation (I need to back my laptop up to RAID1ed
> storage anyway) and expect to get it into production at HCoop once I
> finish getting this new Apache machine up. The general idea of the new
> backup regime:
> 
> - Each machine has its own repository that a daily cron job pushes to
> - Repository for database dumps (+ daily cron)
> - Repository for afs backup dumps (+ daily cron)
>   - Unfortunately, to preserve afs attributes we have to do a local `vos
>     dump' of the backup volumes (which use close to zero disk space), as
>     sketched below. You win some, you lose some.
> 
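> For the afs side, the fileserver cron job would look roughly like this
> (volume names, paths, and the repository URL are placeholders; the real
> job would loop over every backup volume):
> 
>     #!/bin/sh
>     # Dump the .backup volumes locally, then push the dumps into the
>     # afs obnam repository.  Dumping a .backup volume doesn't disturb
>     # the live volume, so this can run whenever.
>     set -e
>     DUMPDIR=/var/backups/afs-dumps
>     mkdir -p "$DUMPDIR"
> 
>     for vol in user.example web.example; do
>         vos dump -id "$vol.backup" -file "$DUMPDIR/$vol.dump" -localauth
>     done
> 
>     obnam backup \
>           --repository sftp://hcoop@backup.example.net/~/obnam/afs \
>           "$DUMPDIR"
> 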
> Then, let obnam handle the rest, initially keeping ~30 days of backups
> and seeing how much space that uses. Thankfully obnam does the hard
> parts; all I really need to do is manage the repository keyring and
> set up a few cron jobs, and we're good to go...
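> 
> Roughly, the per-machine daily job would look something like this (the
> hostname, paths, and key id are placeholders, and I haven't pinned down
> the exact obnam options yet, so treat it as a sketch):
> 
>     #!/bin/sh
>     # Daily backup: push this machine's filesystems to its own
>     # repository on rsync.net over sftp, encrypted with a key from the
>     # shared repository keyring, then prune old generations.
>     set -e
>     REPO="sftp://hcoop@backup.example.net/~/obnam/$(hostname)"
>     KEYID="0xDEADBEEF"    # placeholder gpg key id
> 
>     obnam backup --repository "$REPO" --encrypt-with "$KEYID" \
>           --exclude /proc --exclude /sys --exclude /tmp /
> 
>     # Keep ~30 daily generations and expire the rest.
>     obnam forget --repository "$REPO" --encrypt-with "$KEYID" --keep 30d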
> 
> Verification that backups (aside from afs volume dumps, which are easy)
> actually work is a bit more challenging... but now that we're moving to
> having virtualization servers with the real services running inside VMs,
> it will at least be possible to do a disaster-recovery test without
> affecting other operations.
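> 
> The crude first version of that test would be something like the
> following, run in a scratch VM (the repository URL and paths are again
> placeholders):
> 
>     #!/bin/sh
>     # Restore the latest generation into a throwaway directory and spot
>     # check it against the live tree; a real disaster-recovery test
>     # would boot services from the restored data instead.
>     set -e
>     REPO="sftp://hcoop@backup.example.net/~/obnam/deleuze"
>     SCRATCH=$(mktemp -d)
> 
>     obnam restore --repository "$REPO" --to "$SCRATCH" --generation latest
>     diff -r "$SCRATCH/etc" /etc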




