[Hcoop-discuss] Next hardware configuration/Email service
Justin S. Leitgeb
leitgebj at hcoop.net
Mon Jan 30 17:02:50 EST 2006
Adam Chlipala wrote:
>What about having a centralized shared filesystem with shared user
>accounts, but not accessing it directly wherever that would introduce
>unacceptable performance penalties? Instead of relying on the generic
>caching behavior of our network filesystem drivers, we would use
>domain-specific caching of the kind you suggested for web clusters.
>Files used by web and mail servers (for instance) would have primary,
>"logical" homes on the shared filesystem, but they would mostly be
>accessed from copies of the relevant directory trees stored on
>particular servers. We would rsync the copies with the "real" versions
>daily. For web site files, I think this would be rsyncing from shared
>to cached files; for mail, I think it would go in the other direction.
What benefit would we get from rsyncing the IMAP files to the primary
fileserver? Wouldn't we just want users to access them through the IMAP
daemon? I think that to back up the IMAP server properly we will have to
stop it momentarily anyway, or pull some other kind of trick to get a
decent snapshot. I've browsed some threads on this on the internet, but
I've never run an IMAP server myself, so I can't say for sure.
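The "stop it momentarily" trick might look something like the following
sketch, which pauses the daemon only long enough to take an LVM snapshot
and then backs up from the snapshot (daemon name, volume names, and the
backup host are all hypothetical, and this needs root):

```shell
# Pause the IMAP daemon just long enough to get a consistent snapshot.
/etc/init.d/courier-imap stop          # or whichever IMAP daemon we pick
lvcreate --snapshot --size 1G --name mailsnap /dev/vg0/mail
/etc/init.d/courier-imap start         # service is down only for seconds

# Back up from the frozen snapshot at leisure, then discard it.
mount -o ro /dev/vg0/mailsnap /mnt/mailsnap
rsync -a /mnt/mailsnap/ backuphost:/backups/mail/
umount /mnt/mailsnap
lvremove -f /dev/vg0/mailsnap
```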
>We would only have to worry about hardcore backup stuff for the shared
>filesystem. Most servers would only need some simple level of RAID to
>prevent loss of the last day's data due to a disk failure. On the file
>server, we'd worry about protection against benevolent human error (keep
>multiple images of all data) and system break-ins (regular copying of
>all data to an off-network location, where even someone who gains root
>access can't get to it).
Sounds like it could work. If we were doing these rsyncs, we might even
be able to eliminate RAID on the web servers and get by with a single
disk. Centralized syslogging could also reduce our dependency on
front-end webserver disks and let us build out with cheaper,
interchangeable boxes. We might even get by with hitting the fileserver
directly, depending on the load on the front-end boxes; it would be easy
enough to experiment with. Running out of file descriptors could be
another problem with that approach, though.
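Centralized syslogging is a two-line change with traditional syslogd
(the loghost name below is made up):

```
# /etc/syslog.conf on each web box: forward everything to the loghost.
*.*    @loghost.hcoop.net
```

On the loghost itself, syslogd has to be started with `-r` so that it
accepts messages from the network.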
BTW, cfengine is becoming more popular and could be a great way to
manage our network, if we spend some time figuring out how to get it
running. We're looking at it for our clusters, and bigger shops like
Google seem to use it internally.
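To give a flavor of what managing the network with it might look like,
here is a hypothetical cfengine 2 fragment that pulls a canonical Apache
config from a central box and reloads Apache only when the file actually
changed (the server name and paths are made up):

```
control:
   actionsequence = ( copy shellcommands )

copy:
   /masterfiles/apache2.conf
      dest=/etc/apache2/apache2.conf
      server=config.hcoop.net
      type=checksum
      define=apache_changed

shellcommands:
   apache_changed::
      "/etc/init.d/apache2 reload"
```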
> From the perspective of members, they could have the benefit of the
>unified filesystem view while working on performance-critical stuff.
>Then they would just need to run a suitable "publish" command to get
>their important service files cached in the right way.
I agree -- this would be ideal.
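The "publish" command itself could be a trivial wrapper; a sketch, with
the web-cluster hostname and cache path as placeholders:

```shell
#!/bin/sh
# Hypothetical "publish" command: a member edits files in their home on
# the shared filesystem, then runs this to push them to the web cache.
rsync -a --delete "$HOME/public_html/" \
    "web1.hcoop.net:/var/cache/sites/$USER/"
```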
>>What about using LDAP or an alternative for managing these user accounts
>>across servers? I'm not familiar enough with the applications you've
>>developed to know for sure how that would work out, but it seems that
>>there are plenty of tools for account administration, and we could
>>easily build something ourselves. And in the setup above, the user
>>would only need accounts on two machines -- the web host they're
>>assigned to, as well as an IMAP account on the mail server.
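For the LDAP route, a shared account would just be one directory entry
that every server sees; a hypothetical LDIF sketch (names and IDs made up):

```
dn: uid=jdoe,ou=People,dc=hcoop,dc=net
objectClass: inetOrgPerson
objectClass: posixAccount
uid: jdoe
cn: Jane Doe
sn: Doe
uidNumber: 10042
gidNumber: 10042
homeDirectory: /home/jdoe
loginShell: /bin/bash
```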
>At CMU, where I went to undergrad, they used AFS with Kerberos to manage
>a shared filesystem with common user accounts. This might work for us.
I've heard a lot of good things about AFS and Kerberos as well -- sounds
like a good plan that would definitely scale well.
Lots of good ideas here. Anyone up for sketching out a network diagram,
perhaps using "dia" in Linux? Then we could post some of these ideas to
the wiki page as they become more refined. I might have time to throw
one together sometime this week if no one else volunteers.
Justin