[HCoop-Discuss] Frank's confusion

Wed Aug 19 12:33:14 EDT 2009

Adam Chlipala <adamc at hcoop.net> writes:

> Adam Megacz wrote:
>> Franklin Gordon Bynum <frank at hcoop.net> writes:
>>   
>>> but the example of krunk going down must seal the issue for good:
>>> for practical reasons, not technical ones, AFS isn't going to work
>>> for us at this point in time as we have it now.
>>>     
>>
>> Hi Frank.  I can understand your confusion, as you yourself have
>> mentioned that you aren't familiar with computers.
>>
>> In computing, we make a distinction between hardware and software.
>>
>> Krunk's defective ethernet controller is what is called a "hardware"
>> failure.  AFS is a piece of software.  Startling as this may sound,
>> software cannot cause hardware failures.
>>   
>
> This seems like a deliberately disingenuous argument.  If we weren't 
> using a network filesystem, then a failure of network hardware wouldn't 
> break the filesystem.  It's that simple.

Yes, it would break everything instead.

Instead, we are merely lacking the read-only copy of our AFS volumes and
a secondary KDC, which makes the underlying issues with deleuze more
noticeable (a transient failure to check credentials now results in a
failure, instead of checking krunk and succeeding).
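
To make the parenthetical concrete, here is a rough sketch in Python of
the fallback behaviour I mean. Only the host names deleuze and krunk
come from our setup; the function names, the exception type, and the
simulated failure are all made up for illustration, and this is not how
Kerberos clients are actually implemented.

```python
# Hypothetical sketch: host names are from this thread, the rest is assumed.
class TransientError(Exception):
    """Raised when a server fails to answer a credential check."""


def check_credentials(servers, check):
    """Try each server in order; return the name of the first that answers."""
    last_error = None
    for name in servers:
        try:
            check(name)
            return name
        except TransientError as err:
            last_error = err  # remember the failure, try the next server
    raise TransientError(f"all servers failed: {last_error}")


def flaky_check(name):
    # Simulated situation: deleuze hiccups, any other server answers.
    if name == "deleuze":
        raise TransientError("deleuze timed out")


# With krunk up, the transient failure on deleuze is masked.
print(check_credentials(["deleuze", "krunk"], flaky_check))

# With krunk down, the same transient failure becomes user-visible.
try:
    check_credentials(["deleuze"], flaky_check)
except TransientError as err:
    print("failure:", err)
```

The point is only that the same hiccup on deleuze is harmless when a
second server is in the list and fatal when it is not.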

Note that I am not defending Adam M's intentionally flamewar-provoking
form of the argument.

> The whole line of discussion is about which architectural choices make 
> sense, given their likely impacts on the services we provide, and given 
> the amount of labor that we're willing to devote to setting things up 
> and keeping them working.  Some problems are inevitable if you use 
> distributed systems; such problems can provide motivation to avoid 
> distribution without casting aspersions on particular systems.

I'd wager that AFS and Kerberos reduce the amount of effort we have to
put into things overall. The main issue we have now is that we have
insufficient admin power in general: *nothing* is getting done *at
all*. I will accept blame for part of this (a minor miscommunication
with docelic followed by me not pushing him to set my account up more
quickly and my apt-get upgrade snafu ... followed by my window of having
infinite free time closing thanks to finding paying work).

It is almost as bad to continually scapegoat one part of our
infrastructure just because it has the most visible problems. 

How about exim? I think we should stop supporting email because we have
had to use hackish scripts to retry delivery and such owing to
load. Sometimes mail sits in the queue for hours. This is fundamentally
broken. 

How about databases? They are slow, there are transient connection
errors frequently, etc. We should stop supporting them because they are
broken and we don't have any professional DBAs around to maintain them.

How about Apache? Do we even know /what/ is broken with Apache? Nope! We
just restart it every now and then to work around the problem. We don't
have an Apache expert around to debug it. So we ought to stop supporting
that too.

It would be ridiculous to accept these arguments, but their form is
identical to the one used against AFS and Kerberos.

The last sentence in your email should instead read:

  Some problems are inevitable if you use computers.

-- 
emacsen: every copy of Emacs comes with a bag of pot and 5 hits of acid
emacsen: and a hotel coffee maker