[HCoop-Help] Wiki down?

Clinton Ebadi clinton at unknownlamer.org
Thu May 22 16:52:48 EDT 2014


Sajith T S <sajith at hcoop.net> writes:

> It can't be just me that's causing an Internal Server Error...
>
> Thanks for looking into this!

The getcwd() bug in afs struck again... 

I burned out a bit a little over a week ago; I've just been rebooting
navajos daily to keep it going. But I think I've finally reintegrated
all of the threads at hcoop (having to use dovecot instead of courier on
the new mail server made my beautiful plans unravel!) and will be
resuming work on getting navajos stable and the new mail server into
production.

A few notes on how I'm going about restabilizing navajos's afs client,
since -sysadmin is pretty much an echo chamber nowadays:

The first step will be upgrading to an openafs 1.6.8 prerelease for the
client on navajos. The developers said it ought to fix the suexec
getcwd() bug. Maybe the rxkad errors and soft lockups in the VFS will go
away then... but probably not. Next comes doing traffic dumps, auditing
which processes are contacting the afs database servers, and then
figuring out *what* to debug to trace the afs client problems.

I tried upgrading navajos from linux 2.6.32 to 3.2, hoping that the old
kernel was causing at least some of the problems. I think it *did*
relieve the rxkad errors, but it replaced them with even worse
lockups... apache was repeatedly hanging in the vfs instead of in the
afs client, and sooner.

I am beginning to suspect it may just be a better idea to snapshot
navajos, upgrade it in place to wheezy, and hope it goes OK. Our config
packages are up to date with wheezy (mccarthy has apache installed too,
and it works), so nothing should go awry. Wheezy was also a fairly minor
upgrade as far as web software goes, and navajos has been running things
like the backported python since day one...

At least then if problems persist we'd be debugging them against a
reasonably modern kernel (using the wheezy-backports kernel). We'd also
get better TLS support out of it...

I've also done some preliminary work on switching from courier to
dovecot. The courier-to-dovecot index conversion script claims that
no mailboxes at hcoop would have errors upon conversion; I've made the
vmail database code amenable to publishing more than one format, and I
have a pretty solid idea of how we need to configure dovecot. The hard
parts remain ;) (mostly making sure dovecot+exim won't start eating your
mail during delivery).

-- 
Jessie: i stuck the phone antenna up the dogs nose and he ignored me