• Whea_uncontrollable_error

    • This topic has 24 replies, 7 voices, and was last updated 9 years ago.
    #505200

    I acquired this Dell Precision Tower 5810 in Sept 2015 & am running Windows 8.1 Pro 64-bit

    For a while now, I periodically get the error WHEA_UNCONTROLLABLE_ERROR

    In the past it happened seldom & I got past it by just rebooting.

    But now it is happening more often.

    I turn my system off every night & back on every morning.

    I only get this error when turning on the System in the morning.
    It sometimes occurs 3 days in a row & sometimes it is only every other day or every 3rd day.

    I did do some research a month or so ago to try to find out why it was doing this – but never really got anywhere.

    It seems that this error occurs with gamers with overclocked systems (i.e. heat)
    I am not a gamer, and this system is not overclocked.

    Also, this system is in a very clean room, the tower is up on a desk & not on the floor – so dust accumulation is not a problem on this barely 6 month old system.

    But, out of an abundance of caution, I opened the Tower this AM & the inside is super clean.

    Can anyone steer me down a path to solving this problem?

    Viewing 11 reply threads
    • #1559306

      Are you sure that the system is not overclocked?

      A Dell Precision is a very high-end machine. I wonder if it appears to be overclocked?

      Perhaps the fan is bad. Or perhaps someone used poor-quality thermal compound when they attached the heat sink to the CPU.

      Group "L" (Linux Mint)
      with Windows 10 running in a remote session on my file server
      • #1559410

        Are you sure that the system is not overclocked?

        A Dell Precision is a very high-end machine. I wonder if it appears to be overclocked?

        Perhaps the fan is bad. Or perhaps someone used poor-quality thermal compound when they attached the heat sink to the CPU.

        “Appearing” to be overclocked was my initial thought when I previously did the research & saw that it is usually heat-related on gamers’ overclocked systems.

        This Dell Precision Tower 5810 was a warranty replacement for my previous Dell XPS 8700 Special Edition purchased 1-16-15 that had a catastrophic failure.

        The Precision does have an Intel Xeon Processor E5-1630 v3 (Four Core HT, 10 MB Cache, 3.7 GHz Turbo) and an Nvidia Quadro K4200 4 GB Graphics Card, and is considered a Workstation.

    • #1559317

      I assume this is Error 124? A hardware issue.

      No overclock. You have already confirmed that, plus most big box PCs don’t permit OCing the CPU or RAM. No OC on the GPU either?

      Overheating? See what HWMonitor (free) reports as you use the PC, then save the monitoring data.

      Drivers and BIOS up to date?

      Unplug any peripherals that are not critical and see if the error goes away. That is the easiest hardware ruleout.

      Run memtest86 (free). 7 passes without an error = pass.

      Run the SSD and hard drive makers’ diagnostic apps (free) on the drives. Be prepared to jot down any error code generated. It is a good idea to write a log of the testing: drive info, SMART results, short/quick test. DO NOT accept any offer to fix a drive without backing the drive up first. And I’d replace such a drive fast [unless there is a firmware update that fixes the issue].

      Dell Precision Tower 5810
      https://www.dell.com/support/home/us/en/19/product-support/product/precision-t5810-workstation/drivers#SimpleDrivers

      Run a preboot system assessment (ePSA), jot down any error code. Feed any error code into the Diagnostic tab on the Dell link above. https://www.dell.com/support/Article/ai/en/aidhs1/SLN115162/EN

      If nothing points to the hardware issue, then you will have to unplug components one at a time and run the PC until you figure it out. Failing that, swap in one part (preferably known working) at a time, or swap the part into a known working PC to see if you can recreate the error.

      http://www.sevenforums.com/crash-lockup-debug-how/35349-stop-0x124-what-means-what-try.html

      http://social.technet.microsoft.com/Forums/windows/en-US/39713c45-5f99-454d-a408-2a0e64f6e1ab/bsod-0x00000124-please-help?forum=w7itproperf

      http://social.technet.microsoft.com/Forums/windows/en-US/c0ca1dae-66fa-4f1c-b773-d3b3730464ad/windows-bsod-124?forum=w7itprohardware

      http://www.overclock.net/t/1120291/solving-fixing-bsod-124-on-sandybridge-read-op-first

      http://www.carrona.org/bsodindx.html#0x00000124

    • #1559320

      If you have more than one RAM stick installed, then run the memory test on just one stick at a time as you can get a false reading otherwise.

      The trial version of HDSentinel will give you a written report but if the machine is only 6 months old then you would have to be extremely unlucky to have a problematic HDD.

      http://www.hdsentinel.com/ (green download button)

    • #1559321

      You also may want to test your memory with memtest86:
      http://www.memtest86.com/

      Jerry

    • #1559323

      WHEA is a hardware error, usually self-reported, for instance by the CPU. But, it can be down to software, usually a driver.

      If you have some minidumps, copy them to the Desktop, zip them and attach the zip, I’ll take a look.
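      For anyone who would rather script the copy-and-zip step, here is a minimal Python sketch. It assumes the dumps sit in the usual default folder, C:\Windows\Minidump (check your own system, and note that reading that folder may need an elevated prompt). The function name zip_minidumps and the output file name are made up for illustration.

```python
import zipfile
from pathlib import Path


def zip_minidumps(src_dir, dest_zip):
    """Add every *.dmp file found in src_dir to a new zip archive.

    Returns the list of file names that were added (empty if none found).
    """
    added = []
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dump in sorted(Path(src_dir).glob("*.dmp")):
            # Store only the file name so the archive has no folder structure.
            archive.write(dump, arcname=dump.name)
            added.append(dump.name)
    return added


# Example call (assumed paths, adjust to your machine):
# zip_minidumps(r"C:\Windows\Minidump", r"C:\Users\you\Desktop\minidumps.zip")
```

      The resulting zip can then be attached to a reply as described above.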

    • #1559331

      I acquired this Dell Precision Tower 5810 in Sept 2015 ….

      Missed that, Sudo. I read it as an old system rather than a “new” one. But if it is new and still under Dell warranty, it should be Dell’s problem. They have a client app on the Dell link I included that gets info for their TS people.

      • #1559332

        But if it is new and still under Dell warranty, it should be Dell’s problem.

        Yup, if it’s a hardware problem…

      • #1559412

        Missed that, Sudo. I read it as an old system rather than a “new” one. But if it is new and still under Dell warranty, it should be Dell’s problem. They have a client app on the Dell link I included that gets info for their TS people.

        I have intentionally avoided getting Dell involved because of the almost 6 month fiasco I went through with them before they finally agreed to replace the XPS 8700.

        Among numerous other things:
        I had an onsite tech 4 times
        They replaced the CPU twice
        They replaced the entire motherboard twice
        They replaced the SSD once

        But, I am at the point where I will now probably do the following:
        Run the Dell Diagnostics on their website
        Start a Dell Support Ticket

        Except for my very 1st Desktop eons ago (A custom built Everex), I have repeatedly purchased Dells.

        I have gradually seen the decline in Dell Support & Dell Quality.

        This Precision will be my last Dell.

    • #1559413

      Take it back under warranty and get them to sort it out.

      • #1559417

        Take it back under warranty and get them to sort it out.

        That is not as easy as it sounds.

        My previous experience took 6 months & Dell techs made the problem much worse to the point where they had no choice but to replace the system.

        • #1559424

          That is not as easy as it sounds.

          My previous experience took 6 months & Dell techs made the problem much worse to the point where they had no choice but to replace the system.

          Given this appears to be a hardware issue, replacing the system could be a good thing. :rolleyes:

          Jerry

          • #1559659

            Given this appears to be a hardware issue, replacing the system could be a good thing. :rolleyes:

            Jerry

            This 9-15 system is already a replacement for the system I purchased in 1-15

            There is no way Dell will again replace my system.

            Besides, getting yet another system that will probably have other problems may really not be a good thing.

            • #1559660

              If they have supplied a machine that is defective then they’ll have no choice but to repair/replace.

              If you don’t ask then you don’t get.

            • #1559704

              getting yet another system that will probably have other problems may really not be a good thing.

              I think you have just been unlucky here. Dell is usually good stuff. I’ve had a few Dells over the years, and I’ve worked on a ton of Dells. There haven’t been very many bad ones.

              The only problem I’ve seen much of with Dell has been that the battery on Dell laptops often develops a memory rather quickly. But since you don’t have a laptop, that isn’t a problem for you.

              There may be some common factor at your residence, such as electrical problems, which is somehow causing the failures. But it’s not because Dell is bad stuff.

              Group "L" (Linux Mint)
              with Windows 10 running in a remote session on my file server
            • #1564099

              UPDATE

              I had not gotten the WHEA_UNCONTROLLABLE_ERROR again since I 1st posted this thread.
              But, since it had happened randomly before, I decided it was time to try to resolve it.

              Before I contacted Dell Support:

              I ran the 2 Diagnostic Tests on the Dell Website as follows:
              10 Minute Quick Test
              40 Minute Full Test
              All items passed both tests

              But because I was getting the WHEA_UNCONTROLLABLE_ERROR at boot-up, I decided to also run the ePSA (ENHANCED Pre-Boot System Assessment).
              The Network 1 failed with the Error 2000-0620 (See the Screenshot below.)
              [Screenshot: 44494-Network-3-of-3]

              Armed with the above information, I called Dell Support.

              In a Remote Session with Dell on 4-27-16 (using the Dell Command Software already on my System), Dell checked for updates to my System & found 3.

              The following 3 Updates were installed by Dell:
              Dell Command Software Update
              Dell Precision Workstation T5810 BIOS Update (from A7 to A12)
              NVIDIA Quadro Driver 354.13

              But since that did not solve my problem, Dell dispatched a tech to replace my Motherboard on-site.

              The Tech replaced my Motherboard on 4-28-16 & then ran the ePSA Diagnostic which failed with the same Error 2000-0620

              The Tech re-ran the ePSA a 2nd time & this time it passed, so the Tech left.

              After the Tech left, I re-ran the ePSA 6 times (3 times with the Ethernet Cable connected at the back of the Tower and 3 times with the Ethernet Cable disconnected from the back of the Tower).
              The ePSA failed all 6 times.

              On 5-3-16 at the instruction of Dell, I updated the BIOS on the replaced Motherboard from A9 to A12

              On 5-4-16 on the telephone with Dell, I explained how the ePSA passes when run from a Cold Boot, but fails when run from a Re-Start.
              Previously, the ePSA failed every time.
              But, I could not say with any certainty that the ePSA was done every time back then only on Re-Start or whether it was sometimes done from a Cold Boot.
              Dell did not seem concerned about that in any way.

              Thinking that “there may be another component that is causing the motherboard to fail”, Dell again dispatched the Tech to me to replace the Motherboard again & to also replace the Power Supply Unit. I was also instructed by Dell to have a different Ethernet Cable available to swap it out for the current one.

              On 5-5-16 the Tech replaced the Motherboard again & also replaced the Power Supply Unit.
              Same result.
              ePSA passes on Cold Boot but fails on Re-Start.

              On 5-7-16, I updated the BIOS on the 2nd Motherboard from A8 to A12

              Just today (5-16-16), Dell says (Exact Quote from E-Mail sent to me):
              “the ePSA fails with a NIC card failure.
              For this particular system, motherboard, and NIC card ePSA will fail the NIC test because DHCP mode has been disabled.
              Because this is a false positive for the System and the part that would cause this error has been replaced twice, it is definitely not related to the WHEA_UNCONTROLLABLE_ERROR on boot up.”

              On a follow-up E-Mail, Dell also stated:
              The DHCP being disabled “is a setting that has to be changed in the Intel Management Engine BIOS which is separate from the computer’s BIOS settings.
              You may access this by typing Ctrl+P during boot up.
              However, I do not know if changing these settings will negatively impact the system’s performance or usability.
              That’s why I have not encouraged you to change these settings.”

              Below is a Screenshot that suggests the DHCP Mode may in fact be enabled rather than disabled.
              [Screenshot: 44495-DHCP-Mode]

              If the DHCP Mode is in fact disabled in the Intel Management Engine BIOS as Dell suggests, it came that way with the System.

    • #1559415

      THANKS to everyone that replied.

      You all have provided helpful suggestions that I will pursue before I place my Precision in the @%%$^# hands of Dell.

    • #1564107

      What does Event Viewer record?
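      As a rough sketch of pulling the relevant entries without clicking through Event Viewer: the Python below wraps the built-in wevtutil tool and asks the System log for the newest events from the WHEA logger. The provider name Microsoft-Windows-WHEA-Logger is the one Windows normally uses for these hardware error records. The helper names are my own, and this only runs on Windows.

```python
import subprocess

WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_query_command(max_events=10):
    """Build the wevtutil command that lists recent WHEA events, newest first."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",   # query events from the System log
        f"/q:{xpath}",                # keep only events from the WHEA provider
        f"/c:{max_events}",           # limit the number of events returned
        "/rd:true",                   # reverse direction: newest first
        "/f:text",                    # human-readable text output
    ]


def recent_whea_events(max_events=10):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        whea_query_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

      Calling print(recent_whea_events()) from a command prompt on the affected PC should show whatever the WHEA logger recorded around the crash times, if anything.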

      One thing that will confirm DHCP enabled is to enter ipconfig /all on a cmd prompt.
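      If you would rather not read through the whole ipconfig /all listing by eye, here is a small Python sketch that picks out the "DHCP Enabled" line for each adapter. It assumes English-language ipconfig output, and the function name dhcp_status is just illustrative. Keep in mind it reports Windows' own adapter settings, which is not necessarily the same thing as any firmware-level setting.

```python
def dhcp_status(ipconfig_text):
    """Map each adapter heading in `ipconfig /all` output to True/False.

    Adapter headings are the unindented lines ending in ':'; the value is
    taken from the indented 'DHCP Enabled' line that follows each heading.
    """
    status = {}
    adapter = None
    for line in ipconfig_text.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            adapter = line.rstrip()[:-1]
        elif adapter and "DHCP Enabled" in line:
            value = line.rsplit(":", 1)[-1].strip()
            status[adapter] = value.lower() == "yes"
    return status


# Example use on Windows:
# import subprocess
# text = subprocess.run(["ipconfig", "/all"], capture_output=True, text=True).stdout
# print(dhcp_status(text))
```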

      Does the Dell Diagnostics do a proper job of checking the HDD?

      Have you performed a chkdsk /f?

      SeaTools for Windows or the trial version of HDSentinel may have more concise info –

      http://www.seagate.com/gb/en/support/downloads/item/seatools-win-master/

      http://www.hdsentinel.com/

      Hit the green download button and uninstall when done to stop the clock on the 30 day trial – you can then reinstall it from your Downloads folder as and when.

      • #1564257

        ADDITIONAL INFO FROM DELL RE DHCP MODE IS DISABLED

        I requested from Dell a link to support their contentions.

        I received the following by E-Mail:

        I have searched all over, but the information that I retrieved was Dell internal use and I don’t show a customer-facing version of the article.

        A couple of hours later received this from Dell by E-Mail:

        I received approval to copy the internal article on this for you:

        “When DHCP mode is disabled in MEBx, ME is holding a static IP on the LOM. ePSA and LOM tests for the NIC require the subsystem to be in DHCP mode by design in order to run diagnostic routines successfully. This applies to all low level NIC diagnostics and is not a problem with ePSA, it is a design limitation of the Intel LOM module.”

        ME is the Intel Management Engine. LOM is LAN on Motherboard, i.e. the onboard network adapter.

      • #1564281

        One thing that will confirm DHCP enabled is to enter ipconfig /all on a cmd prompt.

        I just did the ipconfig /all – See the Screenshot below:

        [Screenshot: 44505-ipconfig-all]

        It shows that the DHCP is enabled

        I will point that out to Dell.

        Here is the bottom part of that Screenshot

        [Screenshot: 44506-ipconfig-all-2-of-2]

    • #1564401

      Just a tip – to post the output from a cmd window, right-click in the text area, click Select All, then press Enter to copy the selection.

      You can then right click in the reply box and select Paste.

    • #1564430

      I really don’t think that Windows IP/DHCP etc. status has anything to do with the pre-boot environment tests/UEFI/BIOS. In most recent OEM machines on Intel chipsets, Windows can modify/change many aspects of the BIOS/UEFI under certain conditions.

      IMO, Dell/Intel are at the root of your ‘bad’ pre-boot test results.

      I still don’t see how this would lead to a WHEA_UNCONTROLLABLE_ERROR – unless it’s a Dell/Intel pre-boot bug that’s not overridden correctly by Windows (maybe ‘bad’ Intel drivers disallow/block the changes needed for Windows to override, or they’re blocked by poor/restrictive UEFI/BIOS – or MS licensing restrictions for UEFI/EFI boot).

      I’d have it boxed and under the porch waiting for collection and a refund.
