[Hcoop-discuss] Next hardware configuration/Email service

Justin S. Leitgeb leitgebj at hcoop.net
Mon Jan 30 17:02:50 EST 2006


Adam Chlipala wrote:

>What about having a centralized shared filesystem with shared user 
>accounts, but not accessing it directly wherever that would introduce 
>unacceptable performance penalties?  Instead of relying on the generic 
>caching behavior of our network filesystem drivers, we would use 
>domain-specific caching of the kind you suggested for web clusters.  
>Files used by web and mail servers (for instance) would have primary, 
>"logical" homes on the shared filesystem, but they would mostly be 
>accessed from copies of the relevant directory trees stored on 
>particular servers.  We would rsync the copies with the "real" versions 
>daily.  For web site files, I think this would be rsyncing from shared 
>to cached files; for mail, I think it would go in the other direction.

What benefit would we get from rsyncing the IMAP files to the primary 
fileserver?  Wouldn't we just want users to access them through the IMAP 
daemon?  I think that to back up the IMAP server properly we will have 
to stop it momentarily anyway, or pull some other kind of trick to get a 
consistent snapshot.  I've browsed some threads on this, but I've never 
run an IMAP server myself, so I can't say for sure.

>We would only have to worry about hardcore backup stuff for the shared 
>filesystem.  Most servers would only need some simple level of RAID to 
>prevent loss of the last day's data due to a disk failure.  On the file 
>server, we'd worry about protection against benevolent human error (keep 
>multiple images of all data) and system break-ins (regular copying of 
>all data to an off-network location, where even someone who gains root 
>access can't get to it).
Sounds like it could work.  If we were doing these rsyncs, we might even 
be able to eliminate RAID on the web servers and get by with a single 
disk.  Centralized syslogging could also reduce our dependency on 
front-end webserver disks and let us build out with cheaper, 
interchangeable boxes.  We might even be able to hit the fileserver 
directly, depending on the load on the front-end boxes... it would be 
easy enough to experiment with.  Running out of file descriptors might 
be another problem with that approach, though.

BTW, cfengine is becoming more popular and could be a great way to 
manage our network if we spend some time figuring out how to get it 
running.  We're looking at it for our clusters, and bigger shops like 
Google seem to use it internally.
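From memory, a cfengine 2 cfagent.conf stanza looks roughly like the following, with each box pulling canonical config from a master (the hostnames and paths here are invented):

```
control:
   actionsequence = ( copy )

copy:
   # Pull a canonical config file from the (hypothetical) master host.
   /master/etc/ntp.conf  dest=/etc/ntp.conf
                         server=config.hcoop.net
                         mode=644
```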

> From the perspective of members, they could have the benefit of the 
>unified filesystem view while working on performance-critical stuff.  
>Then they would just need to run a suitable "publish" command to get 
>their important service files cached in the right way.
I agree -- this would be ideal.

>>What about using LDAP or an alternative for managing these user accounts 
>>across servers?  I'm not familiar enough with the applications you've 
>>developed to know for sure how that would work out, but it seems that 
>>there are plenty of tools for account administration, and we could 
>>easily build something ourselves.  And in the setup above, the user 
>>would only need accounts on two machines -- the web host they're 
>>assigned to, as well as an IMAP account on the mail server.
>At CMU, where I went to undergrad, they used AFS with Kerberos to manage 
>a shared filesystem with common user accounts.  This might work for us.

I've heard a lot of good things about AFS and Kerberos as well -- sounds 
like a good plan that should scale well.

Lots of good ideas here.  Anyone up for sketching out a network diagram, 
perhaps using "dia" on Linux?  Then we could post some of these ideas to 
the wiki page as they become more refined.  I might have time to throw 
one together sometime this week if no one else volunteers.

Justin





