<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

These are some very important questions that we will definitely have to

deal with as we grow.&nbsp; Of course, the next network architecture will

initially consist of only three servers, so there will be numerous

single points of failure.&nbsp; Unfortunately, as Nathan pointed out, we

probably can't do a whole lot about that before we get a bit bigger.&nbsp; <br>

<br>

We should really continue to talk about this, though, in order to scale

out as gracefully as possible in the future.&nbsp; Single points of failure

may never be completely eliminated, but we can do a pretty good job of

getting rid of them to increase resource availability.&nbsp; Some data

centers, just to give you some idea of what we can do, have up to 100

load-balanced web servers in a single cluster, along with database

clusters, redundant switches and fiber connections.&nbsp; Furthermore, these

things aren't all prohibitively expensive.&nbsp; I put some preliminary

comments on possibilities for Hcoop's future on the page <a

 href="http://wiki.hcoop.net/wiki/SystemArchitecturePlans">http://wiki.hcoop.net/wiki/SystemArchitecturePlans</a>

(under "Scaling for Redundancy and Performance"). Let's continue to

talk about it there.<br>

<br>

Justin<br>

<br>

Nathan Kennedy wrote:

<blockquote cite="mid44270941.1070505@hcoop.net" type="cite">

  <pre wrap="">Karl Chen wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">The quote request letter looks fine to me.

From the gripes page there's this bullet:

* If one machine goes down, then our services will go down, too.

Is there anything not prohibitively expensive we can actually do

about this?  

DNS can be redundant, mail can be queued, but anything that needs

files is screwed if the file server goes down...

    </pre>

  </blockquote>

  <pre wrap="">Short of clustering, which is beyond our means at this point, I don't 

think there is anything we can do about this.  I think what we want is 

to prevent total hardware failure (with things such as redundant power 

supplies, UPS--very nice if this is provided by the facility--and RAID), 

and rapid on-site hardware support to replace failed hardware in the 

event of outages.  If we do this right, the mean time between failures 

may not be much worse than the lifetime of our hardware between upgrades 

anyway.  In any event the great majority of commercial web hosts out 

there function the same way--if a critical server kicks the bucket, the 

sites it hosts go down until someone fixes it or restores from backup to 

a replacement machine.  Truly clustered hosting is particularly 

difficult to do with the kind of dynamic stuff that many of our users 

run, and it is much more costly and complex to administer.   Once we get 

a whole bunch of servers, say at least half a rack, then it could make 

sense to keep a hot spare server idle on the rack, that could very 

rapidly replace any failing servers.  That could reduce downtime and 

also provide insurance since the probability of any one server failing 

gets much higher as the number of servers increases.

-ntk


  </pre>

</blockquote>

<br>

<br>

</body>

</html>