[Hcoop-discuss] Next hardware configuration/Email service
Justin S. Leitgeb
leitgebj at hcoop.net
Tue Jan 31 00:39:22 EST 2006
Karl Chen wrote:
> I think the first thing to decide is if we want a transparent
> distributed file system like AFS/Coda/etc, or an exposed system
> like you describe with rsync and synchronization commands.
>
> Adam, it sounds like you favor control and performance over
> synchronization and ease of use, which would mean rsync instead of
> AFS.
>
> Personally I think AFS would be easier and tweakable enough for
> performance concerns.
>
>
> We're talking about all servers being hosted in the same place, so
> intra-server bandwidth is free, right?
>
>
>
Intra-server bandwidth is free, and any server we get should have two or
more gigabit ports available for building LAN(s) between our systems.
We would probably want/need to buy a separate switch for this at some
point, as these networks should be used for backups and administrative
purposes as well.
The problem is that our bottleneck is not going to be just the
throughput between servers, but that a couple of popular sites can still
bring the fastest typical rack servers to their knees. I saw a dual-CPU
3.2 GHz Dell 1850 with a load average of 110 today, and it was in a
cluster of two load-balanced Apache servers with all local files
(granted, everything was mod_perl and MySQL content).
The point is that AFS alone is not going to deal with performance
issues. Real-world web sites can stress even large machines by today's
standards. I just think that we need to recognize this up front, and
realize that we can't scrimp on the disk configuration of the fileserver
-- RAID 10 may be something we should consider if we really want to go
down that route, and it won't be cheap. We will also want loads of
memory in the front-end servers so that they can cache files rather
than incur the additional overhead of assembling AFS packets.
Instead of this, I guess I still prefer the idea of building out
"horizontally", however this is achieved. I'm a huge fan of systems
that are as elegant as AFS appears to be, but it could be throwing us
into a world of much more expensive hardware in order to do it right.
I'm not putting forward a single solution here, and doing so would
probably be impossible. I just think that we need to realize that, at
some point, a single system or performance tweak won't be enough. The
architecture (software and hardware) that we develop has to be
flexible enough to handle different configurations for performance and
ease of maintenance depending on the application being run. Maybe this
is something that we could handle with custom software -- depending on
the application, a "publish" command could either rsync files to a
front-end server, or just tell Apache to pick up modified files into
its cache. But this may have been what Adam was suggesting...
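
To make the "publish" idea a little more concrete, here is a rough Python
sketch of how the per-application choice might be dispatched. Everything in
it is an assumption for illustration: the App record, the mode names "rsync"
and "shared", and the host name in the usage note are all hypothetical, and
"apachectl graceful" is only a stand-in for whatever would actually make
Apache notice modified files. The function only builds the command line; it
does not run anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class App:
    """Hypothetical per-application publishing settings."""
    name: str
    docroot: str        # directory holding the site's files
    mode: str           # "rsync" (push copies) or "shared" (files already visible)
    frontend: str = ""  # front-end host name, only needed for "rsync"


def publish_command(app: App) -> List[str]:
    """Return the argv a 'publish' command might run for this application."""
    if app.mode == "rsync":
        if not app.frontend:
            raise ValueError("rsync mode needs a front-end host")
        # Trailing slash: copy the directory's contents, not the directory itself.
        src = app.docroot.rstrip("/") + "/"
        # -a preserves attributes, -z compresses, --delete removes stale files.
        return ["rsync", "-az", "--delete", src, f"{app.frontend}:{src}"]
    if app.mode == "shared":
        # Assumption: a graceful restart stands in for "tell Apache to pick up
        # modified files"; the real mechanism would depend on the caching setup.
        return ["apachectl", "graceful"]
    raise ValueError(f"unknown publish mode: {app.mode!r}")
```

For example, publish_command(App("blog", "/var/www/blog", "rsync", "web1"))
would build an rsync invocation that pushes /var/www/blog/ to the same path
on web1, while an application in "shared" mode would only get the Apache
command. Keeping the decision in one place is the point: the mode can differ
per application without users having to learn two workflows.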
Still open to suggestions -- this isn't an easy architecture to plan.
Justin