09-23-2004, 09:37 PM   #1
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
Latest Cellar crash better

Holy crap, people.

This one was a killer, but I think the machine is now in much better shape, thankyouverymuch. What follows are the crappy details that I had up here instead of a real Cellar for a few hours:
---


The power supply fan seized. OK, how many times have I mentioned how
important the power supply is? Now there's a very expensive Enermax
running the Cellar machine.

It first appeared that the system was suffering from the same trouble
that hosed it last time - a failing drive. But once I realized how hot
the machine was, and that the power supply was particularly hot, I
figured it might be something more serious. I replaced the power
supply but the system was still failing. And then the motherboard
suddenly died.

The process of rescuing a Linux system that won't boot involves
booting from a "rescue" CD, fixing what's wrong with your system, and
then booting from the system again. This means hitting that DEL key,
going into the BIOS, and telling it to boot from the other device.
After doing that 20 times, suddenly the system just wouldn't boot at
all. I may have blown another BIOS setting, but... the thing wouldn't
even beep.

That was last night, and at that point I had no recourse because the
stores were closed. This morning I bought a new motherboard (and a
new processor just in case). Installed those, they run great...
But what THEN turned up was a slowly failing hard drive or further
filesystem confusion.

So! I added another drive, and moved one filesystem to it, leaving
the root system on the questionable drive. I'll upgrade and move the
whole thing to another new drive over the weekend.

This corrected almost everything wrong, but the Cellar still wouldn't
fly. In fact, starting it would cause the system to crash again!
I found that the database tables that run the Cellar had been really
messed up, requiring very complete rebuilds. The system lived well
through database reconstruction work, which was like a stress test on
everything except the web and database. But that added another couple
hours of downtime for the Cellar. That's where it is now: the tables
aren't working right for some reason...

This stuff is getting expensive and so I'm going to add the tip jar back.
Future tip jar donations will go to devoting an entire system solely to
the Cellar, a move that's well overdue. It used to have its own system
when it was dial-up.

And why did all this happen? One reason I can think of. I washed the
fan filter, and I might have put it back WET. It didn't feel wet but
you know how those things are. Well, in a dusty environment (your
basic house), dust + water = glue. It might have just been too humid.
The thing died a week after replacing that filter so this may be a
reach, but my next house will include a clean room.