Dell Precision Workstation with unusual crashing events


andyofne
08-02-2007, 02:04 PM
In the office I have a Dell PWS 530 running Windows 2000 Pro with a 1.8 GHz Xeon processor, 1 GB of RDRAM, a Matrox G550 dual-head video card, and a SCSI hard drive.

About two weeks ago the system started crashing unexpectedly.

There are no entries for the crashes in the Event Viewer. No message from the BIOS shows up on restart (such as "System shut down was caused by a thermal event"), and there is no real rhyme or reason to what causes the system to shut down.

I've run the Dell Diagnostics CD on the system for literally hours and everything reported back fine. I ran memtest on the RAM and it came up clean.

I've also:

- Switched hard drives out
- Switched processor/heat sinks
- Switched video cards
- Tried to install a clean copy of Windows 2000 on a new hard disk (fails)
- Tried to install RedHat Linux on a new hard disk (fails)
- Tried removing extra devices (nothing really extra installed)
- Ordered a replacement motherboard (refurb) only to find out it was the wrong revision and doesn't support my available processors.

I cannot reproduce the shutdown either. Sometimes it will run fine for a while, and sometimes it shuts down as soon as you double-click a desktop icon.

It seems to be able to stay up and running fine for hours so long as you don't touch it.

It will run in Safe Mode seemingly fine for extended periods of time. However, I can't run the applications I need to use in Safe Mode.

Right now, I'm running a SLAX live linux CD to see how long I can run the system this way.

So, to recap: the system just instantly shuts off and does not report anything, anywhere. There is no indication of what is causing it.

Up to this point, the system has run fine for the last 4 years.

It has a mission-critical, vendor-installed application that I cannot reinstall on another machine without the vendor's support.

My boss decided not to pay the annual maintenance agreement so now it will cost between $5,000 and $10,000 to get the system reinstalled or repaired through the vendor.

Has anyone ever seen a system drop dead like this?

I do not believe this is heat-related, because I created a 'heat' situation and the system shut down BUT posted a message on reboot saying "The system was shut down due to a thermal event". I do not see this message when the system crashes 'normally'.

If you can think of any other tests I can run before I get the replacement motherboard next week, I'll likely try whatever you can suggest.

Thank you, that is all.

*Xx~Vlad~xX*
08-02-2007, 02:36 PM
Have you tried c:\>chkdsk /f from the command prompt to see if there are any bad blocks that the Dell Diag didn't catch? I have seen systems do this when the OS tries to page to a sector on disk with a bad block or two. Sometimes it would just shut down or BSOD while simply opening an application or saving a txt file.

I wonder if Ghosting the old drive onto a new drive would work, although Ghost might present a challenge with SCSI HDs. I have never tried Ghost with SCSI drives, just SATA/EIDE.

It also sounds like a short somewhere on the system board, or the power supply not delivering the right DC voltages to the system board.

Ghanzafar
08-02-2007, 02:52 PM
Also, don't forget to check the power supply and make sure the voltages are correct. Many problems come from a faulty power supply. It might work for a while, but once the circuits heat up, crashes can occur. A lot of shit today is made with cheap labor and poor workmanship; it could well be a cold solder joint. It is hard to test a power supply unless you have a tester and a voltmeter, so try swapping it out and see if that fixes the problem.
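If you do get a voltmeter on the rails, the readings still need something to compare against. The following is a minimal Python sketch assuming the commonly cited ATX12V tolerances (±5% on +3.3 V, +5 V, +12 V and +5 VSB; ±10% on -12 V). The PWS 530's proprietary supply may be specified differently, and `ATX_RAILS` and `check_rails` are hypothetical names. A static reading at idle can also miss a sag that only happens under load.

```python
# Nominal voltage and fractional tolerance per rail.
# Assumption: commonly cited ATX12V tolerances; a proprietary workstation
# supply such as the PWS 530's may use different limits.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rails(readings: dict[str, float]) -> dict[str, tuple[float, float, float]]:
    """Return {rail: (measured, low, high)} for each measured rail outside tolerance.

    Rails missing from `readings` are skipped rather than flagged.
    """
    out_of_spec = {}
    for rail, (nominal, tolerance) in ATX_RAILS.items():
        measured = readings.get(rail)
        if measured is None:
            continue
        # sorted() keeps low < high for the negative -12V rail as well.
        low, high = sorted((nominal * (1 - tolerance), nominal * (1 + tolerance)))
        if not low <= measured <= high:
            out_of_spec[rail] = (measured, low, high)
    return out_of_spec


if __name__ == "__main__":
    # Made-up example readings, not measurements from this thread.
    sample = {"+3.3V": 3.31, "+5V": 4.97, "+12V": 11.2, "-12V": -11.9}
    for rail, (measured, low, high) in check_rails(sample).items():
        print(f"{rail}: measured {measured:.2f} V, expected {low:.2f}..{high:.2f} V")
```

With the sample readings, only the +12 V rail is flagged (11.20 V against an expected 11.40..12.60 V).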

andyofne
08-02-2007, 03:12 PM
Well, I've actually switched SCSI disks to a new, working disk and it still crashes during the installation phase at the point where it tries to save the configuration... seconds before it finishes.

I have made a ghost image but I haven't tried to put it on the new disk because the system doesn't stay up long enough or accept an OS.

andyofne
08-02-2007, 03:14 PM

Agreed.

However, this is a Precision Workstation, which has a special heavy-duty power supply. You can't simply put in another ATX supply.

I may be able to swap that out tomorrow with another PWS 530 but I'm not 100% certain about that. I'm going to give it a shot.

Also, the system runs fine in safe mode and it's been running fine off the linux live CD since my first post.

I don't know what I can do to "stress test" the system with the Live CD, but I'm trying to make it crash.
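Since the crashes seem tied to activity rather than idle time, one rough way to load the machine from a live CD is a checksum loop running on every core. This is a sketch assuming a Python 3 interpreter is available on the live CD (not guaranteed on SLAX); `burn` and `main` are hypothetical names, and it is far less thorough than dedicated tools such as memtest. On a machine that powers off instantly, the script will not get to report anything: the shutdown itself is the result.

```python
import hashlib
import os
import sys
import time
from concurrent.futures import ProcessPoolExecutor


def burn(seconds: float, megabytes: int = 64) -> int:
    """Repeatedly hash a fixed buffer and compare against a reference digest.

    Keeps one CPU core and roughly `megabytes` of RAM busy for about `seconds`.
    Returns the number of completed passes; raises RuntimeError on a digest
    mismatch, which should never happen on healthy hardware.
    """
    block = bytes(range(256)) * (megabytes * 1024 * 1024 // 256)
    expected = hashlib.sha256(block).hexdigest()
    deadline = time.monotonic() + seconds
    passes = 0
    while time.monotonic() < deadline:
        if hashlib.sha256(block).hexdigest() != expected:
            raise RuntimeError(f"digest mismatch after {passes} passes")
        passes += 1
    return passes


def main(seconds: float) -> None:
    """Run one `burn` worker per CPU core, each in its own process."""
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(burn, seconds) for _ in range(workers)]
        for index, future in enumerate(futures):
            print(f"worker {index}: {future.result()} passes, no errors")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(float(sys.argv[1]))
    else:
        print(f"usage: python {sys.argv[0]} SECONDS")
```

Saved as, say, `stress.py`, running `python stress.py 3600` would keep every core busy for about an hour. Separate processes are used because each worker should load its own core independently.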

Ghanzafar
08-02-2007, 03:25 PM
I had the same issue a while back. I replaced everything (memory, hard drive, CPU), and the last thing I replaced was the power supply. Go figure: the cheapest part was the last replacement. I repacked the memory and CPU and shipped them back to TigerDirect but kept the Raptor hard drive.

andyofne
08-02-2007, 04:35 PM
Well, after talking with Reaper on the phone, I took the board out of the case and examined the capacitors with a magnifying glass (I have old eyes) and I found one that looks like it may have 'burst'.

When I've seen blown capacitors in the past, they've been visibly fattened around the middle or actually exploded, with shredded paper and black 'burn' marks. This one looks like it puffed out a bit at the top, but it's still 'sort of' functioning.

http://www.utgmc.com/images/cap1.jpg

Looking directly down on the top, you can clearly see that the capacitor is cracked open. It isn't clear from this angle.

hoser
08-02-2007, 09:36 PM
I told you motherboard two days ago.

Bastage.

andyofne
08-02-2007, 09:46 PM
I ordered a motherboard over a week ago, if you recall, but I got the wrong one.

So there.

hoser
08-02-2007, 10:12 PM
I told you motherboard over a week ago.

andyofne
08-02-2007, 10:14 PM
You were at the beach. Try again!

hoser
08-03-2007, 07:09 AM
Oh yeah. Never mind :)

{2399}Straycat
08-03-2007, 04:53 PM
lol

D_A_M_A_G_E
08-06-2007, 08:05 PM
It's actually pretty clear from that < ;)

andyofne
08-06-2007, 08:15 PM
New board should be on site Wednesday.

Lonestar
08-09-2007, 04:30 PM
Your problem is that it's a Dell. :p

andyofne
08-09-2007, 05:07 PM
Installed the 'new' board with no problems. However, the customized, vendor-installed software has a licensing program that is bound to the MAC address of the onboard NIC.

I had to jump through some hoops with the vendor to get a new license key generated for their shitty software.
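The vendor's actual licensing scheme is unknown. As a rough illustration of why a motherboard swap invalidates a MAC-bound key, here is a hypothetical Python sketch in which the key is an HMAC of the onboard NIC's MAC address; `VENDOR_SECRET`, `license_key_for` and `license_is_valid` are invented names. `uuid.getnode()` is a real standard-library call that returns the hardware address as a 48-bit integer, though it can fall back to a random value if no address can be read.

```python
import hashlib
import hmac
import uuid
from typing import Optional

# Hypothetical secret; a real vendor would keep its own value private.
VENDOR_SECRET = b"hypothetical-vendor-secret"


def format_mac(node: int) -> str:
    """Format a 48-bit hardware address as six colon-separated hex octets."""
    return ":".join(f"{(node >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


def license_key_for(mac: str) -> str:
    """Derive a 20-character key bound to one MAC address (illustrative only)."""
    digest = hmac.new(VENDOR_SECRET, mac.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:20].upper()


def license_is_valid(key: str, node: Optional[int] = None) -> bool:
    """Check a key against this machine's MAC, or an explicitly supplied one."""
    if node is None:
        node = uuid.getnode()
    expected = license_key_for(format_mac(node))
    return hmac.compare_digest(key, expected)


if __name__ == "__main__":
    old_board = 0x020000112233  # made-up MAC standing in for the original board
    new_board = 0x020000445566  # made-up MAC standing in for the replacement
    key = license_key_for(format_mac(old_board))
    print("old board accepts key:", license_is_valid(key, old_board))
    print("new board accepts key:", license_is_valid(key, new_board))
```

Under this scheme the replacement board's NIC has a different MAC, so the old key fails and only the vendor, holding the secret, can issue a new one.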

All is working well again.

On that machine at any rate.

{2399}Reaper
08-09-2007, 10:32 PM
Now, create a VMware image of the system... and I believe you can clone a mac in a VMware session.

hoser
08-10-2007, 07:39 AM
That's the best idea you've had all year.

*Xx~Vlad~xX*
08-10-2007, 07:48 AM
Yes, you can clone the MAC address with VMware :D
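The MAC-cloning idea usually comes down to a few lines in the virtual machine's `.vmx` file. The Python sketch below generates them. The keys `ethernetN.addressType`, `ethernetN.address` and `ethernetN.checkMACAddress` are the commonly cited VMware settings, but verify them against the documentation for your VMware product and version. VMware reserves 00:50:56:00:00:00 through 00:50:56:3F:FF:FF for manually assigned addresses, so cloning a physical NIC's address typically also means disabling the address check. `vmx_static_mac` is a hypothetical helper, and whether the vendor's license checker looks at anything besides the MAC is unknown.

```python
import re

# Six colon-separated hex octets, e.g. 00:11:22:AA:BB:CC.
MAC_PATTERN = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def in_vmware_static_range(mac: str) -> bool:
    """True if `mac` falls in 00:50:56:00:00:00 to 00:50:56:3F:FF:FF."""
    octets = [int(part, 16) for part in mac.split(":")]
    return octets[:3] == [0x00, 0x50, 0x56] and octets[3] <= 0x3F


def vmx_static_mac(mac: str, nic: int = 0) -> list[str]:
    """Return .vmx lines that pin virtual NIC `nic` to a fixed MAC address."""
    if not MAC_PATTERN.fullmatch(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    mac = mac.upper()
    prefix = f"ethernet{nic}"
    lines = [
        f'{prefix}.addressType = "static"',
        f'{prefix}.address = "{mac}"',
    ]
    # A physical NIC's address lies outside VMware's static range, so the
    # address check has to be turned off (assumption: your product honors it).
    if not in_vmware_static_range(mac):
        lines.append(f'{prefix}.checkMACAddress = "FALSE"')
    return lines


if __name__ == "__main__":
    # Made-up address standing in for the onboard NIC's MAC.
    print("\n".join(vmx_static_mac("02:00:00:11:22:33")))
```

For the made-up address this prints the `addressType`, `address` and `checkMACAddress` lines, ready to paste into the `.vmx` file while the VM is powered off.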

{2399}Reaper
08-10-2007, 09:22 PM
Wow, I just reread what I wrote and I am surprised beanie didn't come in here and cream himself... I meant a MAC ADDRESS, not one of those things whose logo is a grey apple with a chunk out of it... hehe

VMware is the bomb... I read someplace that Dell is considering a motherboard that has VMware at the BIOS level, which should mean you don't need a base OS load to get VMware installed; it would be on a chip. I think I read it on Digg somewhere...