[Hcoop-discuss] Next hardware configuration/Email service

Adam Chlipala adamc at hcoop.net
Wed Feb 1 10:25:17 EST 2006


Justin S. Leitgeb wrote:

>What benefit would we get from rsync'ing the IMAP files to the primary 
>fileserver?  Wouldn't we just want users to access them through the IMAP 
>daemon?  I think that for properly backing up the IMAP server we will 
>have to stop it momentarily anyway, or pull some other kind of trick in 
>order to get a decent snapshot.  There are some threads I have browsed 
>on the internet regarding this but I've never run an IMAP server myself 
>so I can't say for sure.
>
Maybe we have different ideas of what a "proper" backup is.  I think 
it's OK for any IMAP files that are being modified during a backup run 
to be lost or corrupted in that backup image, assuming each message gets 
its own file.  (Most messages are only modified at receipt time, so 
we'll catch them in the next backup.)  We are currently backing up all 
on-disk mailboxes with rsnapshot (a script based on rsync), and I'm not 
aware of any IMAP-specific correctness problems with the scheme.  Are 
you just worried about performance?
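
For illustration only, here is a minimal Python sketch of the hard-link 
snapshot idea that rsnapshot-style backups rely on, assuming one file 
per message.  The function name `snapshot` and the directory layout are 
hypothetical; this is not rsnapshot's code or our actual configuration.  
Files that vanish while the run is in progress are simply skipped, on 
the assumption that a later run will pick up whatever replaced them.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Dict, Optional


def snapshot(source: Path, previous: Optional[Path], target: Path) -> Dict[str, int]:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    Hypothetical helper: a sketch of the technique, not rsnapshot itself.
    """
    stats = {"linked": 0, "copied": 0, "skipped": 0}
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            dst = target / rel / name
            old = previous / rel / name if previous is not None else None
            try:
                # shallow=True treats files with equal size and mtime as equal.
                if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=True):
                    os.link(old, dst)  # unchanged: share the data with the old snapshot
                    stats["linked"] += 1
                else:
                    shutil.copy2(src, dst)  # new or changed: copy data and timestamps
                    stats["copied"] += 1
            except FileNotFoundError:
                # The message was moved or deleted mid-run; leave it for the next run.
                stats["skipped"] += 1
    return stats
```

A run might look like `snapshot(Path("Maildir"), Path("snap.1"), 
Path("snap.0"))`, where `snap.1` is the previous image.  Unchanged 
messages then cost no extra disk space, which is why frequent runs of 
this kind stay cheap.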

As I said before, things are easier if we only have to apply "heavy 
duty" backup techniques to a single filesystem.  This is why it's 
preferable to back up IMAP files to the shared filesystem, if that's not 
their primary home.

Karl Chen wrote:

>I think the first thing to decide is if we want a transparent
>distributed file system like AFS/Coda/etc, or an exposed system
>like you describe with rsync and synchronization commands.
>
>Adam, it sounds like you favor control and performance over
>synchronization and ease of use, which would mean rsync instead of
>AFS.
>
>Personally I think AFS would be easier and tweakable enough for
>performance concerns.
>
No, I'd say I favor ease of use.  I suggested doing everything with a 
shared filesystem, and it was Justin who said that he has some doubts 
that we could achieve acceptable performance that way.  A good 
compromise seems to be using an AFS work-alike as the _logical_ view of 
all our files, while implementing our own rsync-based caching schemes 
for it where appropriate.  Perhaps AFS already provides good enough 
caching support that rsync wouldn't be needed; I don't know enough about 
it to say.

Justin S. Leitgeb wrote:

>The point is that AFS alone is not going to deal with performance 
>issues.  Real-world web sites can stress even large machines by today's 
>standards.  I just think that we need to recognize this out front, and 
>realize that we can't scrimp on the disk configuration of the fileserver 
>-- RAID 10 may be something we should consider if we really want to go 
>down that route, and it won't be cheap.  We will also want loads of 
>memory in the front-end servers so that they can cache rather than 
>introducing additional overhead assembling AFS packets.
>
We could do this in a forward-looking way by putting in the 
infrastructure for this sort of organization, but just not including 
much capacity at first.  Simple evidence that this would work to start 
out with is that we get by fine with our current server specs, and in 
fact we are underutilizing what we have.  Maybe RAID 10 and RAM are not 
cheap, but we wouldn't need much of either to start out with.  We'd 
still be able to gain valuable experience using them in our initially 
undemanding setting.

In any case, it's probably a good idea to pick out a few of the most 
promising hardware configurations, price them out, and see what seems 
worth the cost.



