[Hcoop-discuss] Server pricing

ntk at hcoop.net ntk at hcoop.net
Wed Feb 8 15:53:30 EST 2006


>>>We could buy a scaleable system like you're suggesting above, but:
>>>
>>>1) It would be expensive.
>>>2) It's not necessary right now, in terms of space.
>>>3) We would outgrow it at some point and need to buy a completely new
>>> server

I think these objections will always apply to new setups unless we are
really maxing out our current setup, by which point it's a little late to
be planning.  Our current membership *is* underutilizing fyodor, and
redundancy aside (though that is a big issue), these objections would
apply to any new setup.  I think what we are predicating this on is that
there *will* be significant growth, and that potential users and current
users with other applications in mind will benefit from a powerful new
setup.  And when we do make the switch, we want something big and modular
enough that it won't be outgrown for a long time, ideally at least a
couple of years.

>>I think it's entirely clear that we have current and future members who
>>would love to be able to use arbitrarily much disk space.  Keep in mind
>>that many of us now are purposely avoiding using HCoop for purposes that
>>would incur significant disk usage; it's not that we're all only
>>interested in low disk usage services, but rather that we impose low
>>disk quotas ATM.  Can you elaborate on your point 2 in light of this?
>>
>>Can it really be that expensive to have a slow fileserver for which it's
>>relatively easy to add new disks?  I'm fine with bad performance as long
>>as we can get commercial-level reliability and protection against data
>> loss.

I also agree that having a fileserver centrally storing home directories
would make life much easier for us.  There is the question of whether the
server would be able to handle IMAP requests without grinding to a halt.
I suspect that at first it would be fine, but it could become problematic
as we grow.
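
For concreteness, the NFS side of a central /home is pretty simple;
something along these lines (host names and addresses below are made up,
just to show the shape of it):

  # /etc/exports on the file server (hypothetical network)
  /home    192.168.1.0/24(rw,sync,no_subtree_check)

  # /etc/fstab entry on each machine that mounts the home directories
  fileserver:/home    /home    nfs    rw,hard,intr    0 0

The export itself isn't the hard part; the interesting questions are all
on the disk and I/O side.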

My own sysadmin experiences are not completely applicable, but possibly
helpful.  On a 64-node computational cluster with several dozen active
users, we had one fileserver whose home directories were mounted on all 64
nodes via NFS.  The server had a SCSI RAID with 10 drives.  This seemed to
work fine, until users started logging in and filling up the queue with
jobs.  64 nodes hammering one fileserver quickly created a huge
bottleneck: disk reads were starved and performance was abysmal.  I
quickly discovered the problem was NOT the network at all (GigE and
Myrinet in this case), but rather that the physical R/W throughput of the
RAID array was maxed out.

So we changed the runscripts so that nothing was mounted on the nodes; any
data a job needed was copied back and forth (via scp) before it started
and after it finished, and the directory structure of /home was
automatically mirrored to all the nodes from the fileserver by a runscript
on the head node.  Only the head node and a couple of other administrative
nodes mounted the fileserver.
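
Not that we would need anything like it here, but the pattern was simply
stage-in / run / stage-out.  A toy sketch of the idea (Python, with all
host and path names hypothetical -- this is not our actual runscript):

  #!/usr/bin/env python
  # Toy stage-in / run / stage-out wrapper, as described above.
  # Host and path names are hypothetical.
  import os, subprocess, sys

  FILESERVER = "fileserver.example.org"
  job_dir = sys.argv[1]                        # e.g. /home/someuser/job42
  parent  = "/scratch" + os.path.dirname(job_dir)
  workdir = os.path.join(parent, os.path.basename(job_dir))

  if not os.path.isdir(parent):
      os.makedirs(parent)
  # pull the job directory from the file server, run it on local scratch,
  # then push the results back to the file server
  subprocess.check_call(["scp", "-r", "%s:%s" % (FILESERVER, job_dir), parent])
  subprocess.check_call(["./run.sh"], cwd=workdir)
  subprocess.check_call(["scp", "-r", workdir,
                         "%s:%s" % (FILESERVER, os.path.dirname(job_dir))])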

Obviously we're not in a computational environment.  Our needs are
probably not nearly as intense, and we certainly won't have 64 nodes.  But
in any event, the big (and obvious) lesson for me is that you do NOT want
to start bottlenecking on disk I/O, that in a networked environment you
are far more likely to bottleneck on disk I/O than on network I/O, and
that we should keep this in mind when planning the network topology.
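
On that note, when we're load-testing whatever we end up buying, it's easy
to check which side is saturated: iostat (from the sysstat package) shows
per-device throughput and utilization directly, or you can read the same
counters out of /proc/diskstats.  A rough sketch of the latter (Python;
the device name "sda" is an assumption, adjust for the real hardware):

  #!/usr/bin/env python
  # Sample /proc/diskstats twice and report rough throughput and
  # utilization for one disk.  Sketch only -- device name is an assumption.
  import time

  DEV = "sda"
  INTERVAL = 5.0               # seconds between samples
  SECTOR_BYTES = 512           # /proc/diskstats counts 512-byte sectors

  def snapshot(dev):
      for line in open("/proc/diskstats"):
          f = line.split()
          if f[2] == dev:
              # f[5]: sectors read, f[9]: sectors written, f[12]: ms doing I/O
              return int(f[5]), int(f[9]), int(f[12])
      raise ValueError("device %s not found" % dev)

  r1, w1, t1 = snapshot(DEV)
  time.sleep(INTERVAL)
  r2, w2, t2 = snapshot(DEV)

  print("read  %.1f MB/s" % ((r2 - r1) * SECTOR_BYTES / INTERVAL / 2**20))
  print("write %.1f MB/s" % ((w2 - w1) * SECTOR_BYTES / INTERVAL / 2**20))
  print("busy  %.0f%% of the interval" % ((t2 - t1) / (INTERVAL * 10)))

If the disk sits near 100% busy while the network interfaces are mostly
idle, that's exactly the situation I described above.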

-ntk




