[Hcoop-discuss] Next hardware configuration/Email service

Justin S. Leitgeb leitgebj at hcoop.net
Tue Jan 31 00:39:22 EST 2006


Karl Chen wrote:

> I think the first thing to decide is if we want a transparent
> distributed file system like AFS/Coda/etc, or an exposed system
> like you describe with rsync and synchronization commands.
>
> Adam, it sounds like you favor control and performance over
> synchronization and ease of use, which would mean rsync instead of
> AFS.
>
> Personally I think AFS would be easier and tweakable enough for
> performance concerns.
>
> We're talking about all servers being hosted in the same place, so
> intra-server bandwidth is free, right?

Intra-server bandwidth is free, and any server we get should have two or 
more gigabit ports available for building LAN(s) between our systems.  
We would probably want/need to buy a separate switch for this at some 
point, as these networks should be used for backups and administrative 
purposes as well.

The problem is that our bottleneck is not going to be just the 
throughput between servers, but that a couple of popular sites can still 
bring the fastest typical rack servers to their knees.  I saw a dual-CPU 
3.2 GHz Dell 1850 with a load average of 110 today, and it was one of a 
cluster of two load-balanced Apache servers serving only local files 
(granted, everything was mod_perl and MySQL content).

The point is that AFS alone is not going to solve our performance 
problems.  Real-world web sites can stress even machines that are large 
by today's standards.  I just think that we need to recognize this up 
front, and realize that we can't scrimp on the disk configuration of the 
fileserver -- RAID 10 may be something we should consider if we really 
want to go down that route, and it won't be cheap.  We will also want 
plenty of memory in the front-end servers so that they can cache files 
rather than incur the additional overhead of assembling AFS packets.
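
For a rough sense of why RAID 10 won't be cheap: every block is mirrored, 
so only half of the raw disk capacity is usable.  A minimal Python sketch, 
assuming identical disks in two-way mirrored pairs (the disk sizes used 
in the example are hypothetical, not quotes for any particular hardware):

```python
import math


def raid10_usable_gb(disk_count: int, disk_size_gb: float) -> float:
    """Usable capacity of a RAID 10 array of identical disks.

    Every block is written to both disks of a mirrored pair, so only half
    of the raw capacity is available for data.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_size_gb / 2


def raid10_disks_needed(target_gb: float, disk_size_gb: float) -> int:
    """Smallest RAID 10 array (in disks) that provides target_gb usable."""
    # Each mirrored pair contributes one disk's worth of usable space.
    pairs = math.ceil(target_gb / disk_size_gb)
    # RAID 10 stripes across at least two mirrored pairs.
    return max(4, 2 * pairs)


if __name__ == "__main__":
    print(raid10_usable_gb(6, 146))       # 438.0
    print(raid10_disks_needed(500, 146))  # 8
```

With hypothetical 146 GB disks, six of them give 438 GB usable, and 
reaching 500 GB usable already takes eight disks.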

Instead of this, I guess I still prefer the idea of building out 
"horizontally", however this is achieved.  I'm a huge fan of systems 
that are as elegant as AFS appears to be, but it could be throwing us 
into a world of much more expensive hardware in order to do it right.

I'm not putting forward a single solution here, and doing so would 
probably be impossible.  I just think that we need to realize that, at 
some point, a single system or performance tweak won't be enough.  The 
architecture (software and hardware) that we develop has to be flexible 
enough to handle different configurations for performance and ease of 
maintenance, depending on the application being run.  Maybe this is 
something that we could write with custom software -- depending on the 
application, a "publish" command could either rsync files to a 
front-end server, or just tell Apache to pick up modified files into its 
cache.  But this may have been what Adam was suggesting...
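
The per-application "publish" idea could look roughly like the Python 
sketch below.  Everything in it is hypothetical: the Site fields, the 
strategy names "rsync" and "refresh", and the host and path names are 
illustrations only.  The rsync flags (-a, --delete) are standard; 
"apachectl graceful" over ssh is just an assumed stand-in for telling 
Apache to notice changed files, and whether it refreshes a given cache 
depends on how caching is configured.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """Hypothetical per-site publishing settings."""
    name: str
    source_dir: str        # where members edit files (e.g. the fileserver)
    dest_dir: str          # document root on each front-end server
    strategy: str          # "rsync" or "refresh" (hypothetical labels)
    frontends: List[str]   # front-end hostnames


def publish_commands(site: Site) -> List[List[str]]:
    """Build (but do not run) the commands a 'publish' step would issue."""
    if site.strategy == "rsync":
        # Trailing slash: copy the *contents* of source_dir into dest_dir,
        # deleting files on the front end that were removed at the source.
        src = site.source_dir.rstrip("/") + "/"
        return [
            ["rsync", "-a", "--delete", src, f"{host}:{site.dest_dir}"]
            for host in site.frontends
        ]
    if site.strategy == "refresh":
        # Assumed stand-in for "tell Apache to pick up modified files":
        # a graceful restart on each front end, run over ssh.
        return [["ssh", host, "apachectl", "graceful"] for host in site.frontends]
    raise ValueError(f"unknown strategy {site.strategy!r} for site {site.name}")


def publish(site: Site, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in publish_commands(site):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    publish(Site("example", "/home/example/www", "/var/www/example",
                 "rsync", ["web1", "web2"]))
```

Keeping the strategy as per-site data means a busy site could be moved 
from one mode to the other without changing the publish command itself; 
the default dry_run=True only prints what would be run.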

Still open to suggestions -- this isn't an easy architecture to plan.

Justin
