FPSNetwork


andyofne 08-02-2007 02:04 PM

Dell Precision Workstation with unusual crashing events
 
In the office I have a Dell PWS 530 running Windows 2000 Pro with a 1.8 GHz Xeon processor, 1 GB of RDRAM, a Matrox G550 dual-head video card, and a SCSI hard drive.

About two weeks ago the system started crashing unexpectedly.

There are no entries for the crashes in the event viewer. No message from the BIOS shows up when you restart (saying things like System shut down was caused by a thermal event), and there is no real rhyme or reason to what causes the system to shut down.

I've run the Dell Diagnostics CD on the system for literally hours and everything reported back fine. I ran memtest on the RAM and it came up clean.

I've also:

- Switched hard drives out
- Switched processor/heat sinks
- Switched video cards
- Tried to install a clean copy of Windows 2000 on a new hard disk (fails)
- Tried to install RedHat Linux on a new hard disk (fails)
- Tried removing extra devices (nothing really extra installed)
- Ordered a replacement motherboard (refurb) only to find out it was the wrong revision and doesn't support my available processors.

I cannot reproduce the shutdown on demand either. Sometimes it will run fine for a while, and sometimes it shuts down as soon as you double-click a desktop icon.

It seems to be able to stay up and running fine for hours so long as you don't touch it.

It will run in Safe Mode seemingly fine for extended periods of time. However, I can't run the applications I need to use in Safe Mode.

Right now, I'm running a SLAX live linux CD to see how long I can run the system this way.

So, to recap: the system just instantly shuts off and does not report anything, anywhere. There is no indication of what is causing it.
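
Since nothing gets logged anywhere, one way to at least pin down when each shutdown happens is a small heartbeat script that appends a timestamp to a file and forces it to disk every few seconds. This is only a sketch, assuming Python is available on the machine. The file name heartbeat.log and the 5-second interval are made up for illustration, and on a live CD the log would have to sit on storage that survives a power-off.

```python
import os
import time


def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    line = time.strftime("%Y-%m-%d %H:%M:%S")
    if note:
        line += " " + note
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device
    return line


def last_heartbeat(path):
    """Return the last line written, or None if there is no log yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        lines = f.read().splitlines()
    return lines[-1] if lines else None


def run(path="heartbeat.log", interval=5, beats=None):
    """Write a heartbeat every `interval` seconds.

    beats=None keeps going until the machine loses power or the
    script is stopped; a number stops after that many heartbeats.
    """
    written = 0
    while beats is None or written < beats:
        write_heartbeat(path)
        written += 1
        time.sleep(interval)
    return written
```

After a crash, last_heartbeat("heartbeat.log") gives the last time the box was known to be alive, to within one interval, which can then be lined up against whatever was being done at that moment.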

Up to this point, the system has run fine for the last 4 years.

It has a mission critical application, vendor installed, that I cannot reinstall on another machine without the vendor's support.

My boss decided not to pay the annual maintenance agreement so now it will cost between $5,000 and $10,000 to get the system reinstalled or repaired through the vendor.

Has anyone ever seen a system drop dead like this?

I do not believe this is heat related because I created a 'heat' situation and the system shut down BUT posted a message on reboot saying "The system was shut down due to a thermal event". I do not see this message when the system crashes 'normally'.

If you can think of any other tests I can run before I get the replacement motherboard next week, I'll likely try whatever you can suggest.

Thank you, that is all.

*Xx~Vlad~xX* 08-02-2007 02:36 PM

Have you tried running c:\>chkdsk /f (or /r, which also scans for bad sectors) from the command prompt to see if there are any bad blocks that the Dell Diag didn't catch? I have seen systems do this when the OS tries to page to a sector on disk w/ a bad block or two. Sometimes it would simply shut down or BSOD while opening an application or saving a txt file.

I wonder if a Ghost of the old drive onto a new drive would work, although Ghosting might present a challenge w/ SCSI HDs. I have never tried Ghost w/ SCSI drives before, just SATA/EIDE.

It also sounds like a short somewhere on the system board, or the power supply not dishing out the right voltages to the system board.

Ghanzafar 08-02-2007 02:52 PM

Also, do not forget to check the power supply and make sure the voltages are correct. Many problems come from a faulty power supply. The power supply might work for a while, but once the circuits heat up crashes can occur. A lot of shit today is made by cheap labor with poor workmanship; it could well be a cold solder joint. It is hard to test the power supply unless you have a tester and a voltmeter (try swapping it out and see if that is the problem).
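
If you do get a meter on it, checking the readings against tolerance is simple arithmetic. Here is a rough sketch, assuming the standard ATX figures (plus or minus 5% on the positive rails, 10% on -12V). The PWS 530 supply is proprietary, so treat those limits as an assumption and check Dell's documentation; the sample readings are invented.

```python
# Nominal voltage and allowed fractional deviation per rail.
# ASSUMPTION: standard ATX tolerances; a proprietary Dell supply may differ.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def rail_limits(name):
    """Return (low, high) allowed voltage for a rail."""
    nominal, tol = ATX_RAILS[name]
    margin = abs(nominal) * tol
    return nominal - margin, nominal + margin


def check_rail(name, measured):
    """True if the measured voltage is inside the allowed window."""
    low, high = rail_limits(name)
    return low <= measured <= high


def check_readings(readings):
    """Map each rail name to True (in range) or False (out of range)."""
    return {name: check_rail(name, volts) for name, volts in readings.items()}


# Hypothetical meter readings, for illustration only.
sample = {"+3.3V": 3.31, "+5V": 4.62, "+12V": 12.20}
for rail, ok in check_readings(sample).items():
    print(rail, "OK" if ok else "OUT OF RANGE")
```

Readings taken at idle can look fine even when the supply sags under load, so measure while the machine is busy if you can.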

andyofne 08-02-2007 03:12 PM

Quote:

Originally Posted by *Xx~Vlad~xX* (Post 3697)
Have you tried running c:\>chkdsk /f (or /r, which also scans for bad sectors) from the command prompt to see if there are any bad blocks that the Dell Diag didn't catch? I have seen systems do this when the OS tries to page to a sector on disk w/ a bad block or two. Sometimes it would simply shut down or BSOD while opening an application or saving a txt file.

I wonder if a Ghost of the old drive onto a new drive would work, although Ghosting might present a challenge w/ SCSI HDs. I have never tried Ghost w/ SCSI drives before, just SATA/EIDE.

It also sounds like a short somewhere on the system board, or the power supply not dishing out the right voltages to the system board.

Well, I've actually switched SCSI disks to a new, working disk and it still crashes during the installation phase at the point where it tries to save the configuration... seconds before it finishes.

I have made a ghost image but I haven't tried to put it on the new disk because the system doesn't stay up long enough or accept an OS.

andyofne 08-02-2007 03:14 PM

Quote:

Originally Posted by Ghanzafar (Post 3700)
Also, do not forget to check the power supply and make sure the voltages are correct. Many problems come from a faulty power supply. The power supply might work for a while, but once the circuits heat up crashes can occur. A lot of shit today is made by cheap labor with poor workmanship; it could well be a cold solder joint. It is hard to test the power supply unless you have a tester and a voltmeter (try swapping it out and see if that is the problem).

Agreed.

However, this is a Precision Workstation that has a special, heavy-duty power supply. You can't simply put in another ATX supply.

I may be able to swap that out tomorrow with another PWS 530 but I'm not 100% certain about that. I'm going to give it a shot.

Also, the system runs fine in safe mode and it's been running fine off the linux live CD since my first post.

I don't know what I can do to "stress test" the system with the Live CD but I'm trying to make it crash.
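
One rough way to load the box from a live CD, assuming a Python interpreter is on the disc (not a given), is to hash large buffers on every core while writing and re-reading a scratch file. The function names below are made up for this sketch, and it is no substitute for a purpose-built burn-in tool.

```python
import hashlib
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def burn_cpu(seconds, block_size=1 << 20):
    """Hash a 1 MiB buffer over and over until time is up; return loop count."""
    data = os.urandom(block_size)
    deadline = time.monotonic() + seconds
    loops = 0
    while time.monotonic() < deadline:
        # Feed each digest back in so every pass does fresh work.
        data = hashlib.sha256(data).digest() * (block_size // 32)
        loops += 1
    return loops


def disk_roundtrip(size=8 << 20):
    """Write random data to a temp file, read it back, compare digests.

    The read may be served from the OS cache rather than the platter,
    so this exercises RAM and the I/O path more than the disk surface.
    """
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
        path = f.name
    try:
        with open(path, "rb") as f:
            readback = f.read()
        return hashlib.sha256(readback).digest() == hashlib.sha256(payload).digest()
    finally:
        os.remove(path)


def run_stress(seconds=60, workers=None):
    """Run one CPU burner per worker plus one disk round trip."""
    workers = workers or os.cpu_count() or 1
    # hashlib releases the GIL on large buffers, so threads can load several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(burn_cpu, seconds) for _ in range(workers)]
        disk_ok = disk_roundtrip()
        loops = sum(f.result() for f in futures)
    return loops, disk_ok
```

Calling run_stress(seconds=600) in a loop keeps the CPU and memory busy for ten minutes at a time. If the machine survives hours of that but still dies on a double-click in Windows, that points away from simple CPU or RAM load.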

Ghanzafar 08-02-2007 03:25 PM

I had the same issue a while back. I replaced everything (memory, hard drive, CPU), and the last thing I replaced was the power supply. Go figure, the cheapest thing was the last replacement. I repacked the memory and CPU and shipped them back to TigerDirect but kept the Raptor hard drive.

andyofne 08-02-2007 04:35 PM

Well, after talking with Reaper on the phone, I took the board out of the case and examined the capacitors with a magnifying glass (I have old eyes) and I found one that looks like it may have 'burst'.

When I've seen blown capacitors in the past, they've been visibly fattened around the middle or actually exploded, with shredded paper and black 'burn' marks. This one looks like it puffed out a bit at the top but it's still 'sort of' functioning.



Looking directly down on the top you can clearly see that the capacitor is cracked open. It isn't clear from this angle.

hoser 08-02-2007 09:36 PM

I told you motherboard two days ago.

Bastage.

andyofne 08-02-2007 09:46 PM

I ordered a motherboard over a week ago, if you recall, but I got the wrong one.

So there.

hoser 08-02-2007 10:12 PM

I told you motherboard over a week ago.


All times are GMT -5.