  • LangaList: Should you trust a hard drive after a major error?

Posted on February 18th, 2019 at 05:32 by woody | Comment on the AskWoody Lounge

    Tough question. No easy answers. But there are lots of ways you can diagnose a suddenly disruptive drive.

    Fred Langa with money-saving advice to fix (or accept!) a problem we’ve all encountered.

Out this morning in AskWoody Plus Newsletter 16.6.0. Now available – yes, for free – on AskWoody.



    This topic contains 27 replies, has 13 voices, and was last updated by  anonymous 1 month ago.

    • Author
      Posts
    • #328641 Reply

      woody
      Da Boss

Tough question. No easy answers. But there are lots of ways you can diagnose a suddenly disruptive drive. Fred Langa with money-saving advice to fix (or accept!) a problem we’ve all encountered.
[See the full post at: LangaList: Should you trust a hard drive after a major error?]

      3 users thanked author for this post.
    • #328651 Reply

      anonymous

      In my personal opinion, no. Even one bad sector is enough for me to toss a hard disk.

      • #328727 Reply

        Microfix
        AskWoody MVP

Why not use it as a secondary HDD once the OS knows where the bad sectors or clusters are? Drives like that stood the test of time for a good while longer (with regular offline backups). Although I couldn’t do that these days.. all SSDs.

        | W10 Pro x64 1803 | W8.1 Pro x64 | Linux x64 Hybrids | W7 Pro x64/ XP Pro O/L
          Can't see the wood for the trees? Look again!
        • #328733 Reply

          Canadian Tech
          AskWoody_MVP

          Speaking of SSDs, in my experience, when they begin to fail, they fail quite completely and suddenly. One of the reasons I do not use them.

          CT

          2 users thanked author for this post.
          • #328813 Reply

            warrenrumak
            AskWoody Plus

            You’d prefer to give all your users a worse experience, every day, on the off-chance that maybe one of them will have one failure?

The MTBF for many new SSDs nowadays is 1.5 to 2.0 million hours, which is comparable to HDDs.  Across 150 systems, that’s an average of about 2 failures every 3 years.
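A quick sanity check of that arithmetic, assuming the fleet runs 24/7 and taking the midpoint of the quoted MTBF range:

```python
# Back-of-envelope check of the MTBF claim above (assumes 24/7 operation).
HOURS_PER_YEAR = 24 * 365        # 8760
FLEET = 150                      # systems in service
MTBF_HOURS = 1.75e6              # midpoint of the quoted 1.5M-2.0M hour range

fleet_hours_per_year = FLEET * HOURS_PER_YEAR          # 1,314,000 drive-hours/year
failures_per_year = fleet_hours_per_year / MTBF_HOURS  # expected failures per year
years_per_two_failures = 2 / failures_per_year

print(f"{failures_per_year:.2f} expected failures/year")
print(f"two failures roughly every {years_per_two_failures:.1f} years")
```

That works out to about 0.75 failures per year, or two failures roughly every 2.7 years, which matches the "2 failures every 3 years" figure.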

            Reading material:

            https://www.wepc.com/tips/ssd-reliability/
            https://www.backblaze.com/blog/hdd-vs-ssd-in-data-centers/

            • #328896 Reply

              Canadian Tech
              AskWoody_MVP

              A 7200 rpm rotating drive will equal the performance of an SSD on a modern PC with the exception of the startup. It costs far less.

              CT

            • #329362 Reply

              anonymous

              That heavily depends on workload.  For bare office work with light random seeks and some large contiguous read/writes, they do well. For anything large like CAD you will notice a significant difference jumping from a single HDD to a single SSD.  Going RAID0/10 can offer better speed but even that can have a hard time against a single SSD on a workstation.

              When talking business, keep a NAS or storage server around to handle backups or file hosting.  They should not be relying on a single disk for anything important or critical, regardless of SSD or HDD.  If an SSD dies just restore from backup.

      • #328988 Reply

        Ascaris
        AskWoody_MVP

That would have made things very hard in the earlier days, when low-leveling a drive and manually entering the defect mapping was part of the deal.  All of the hard drives had bad sectors right from the factory, and their locations were often printed on the back of the drive itself for easy reference.  The rule of thumb we had back then was that a drive should not have more than 0.1% bad sectors – so on a 40MB drive, you would want to see 40KB or less in bad sectors.
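The old 0.1% rule of thumb is easy to check in one line:

```python
# The old rule of thumb quoted above: bad sectors should stay under
# one part in a thousand (0.1%) of the drive's capacity.
def max_acceptable_bad_kb(capacity_mb: float, rule: float = 0.001) -> float:
    return capacity_mb * 1024 * rule  # capacity in MB -> allowance in KB

print(max_acceptable_bad_kb(40))  # about 41 KB, matching the "40KB or less" figure
```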

        With IDE drives, low-leveling was already done, and the bad sector remapping was too, but it didn’t mean the bad sectors were not present… they just weren’t visible to the user.  It still works that way now, though acquired errors might be handled in different ways by different manufacturers.

        It’s not the presence of surface defects that raises alarm bells for me.  It’s the increase in their number that is the real danger sign.  A single soft sector that is marked and remapped means it’s important to watch the drive, but I wouldn’t immediately take it out of service for that.  Over time, if no more appear, I would eventually come to trust the drive as much as any other (which isn’t much).

        If more and more bad sectors appear, the odds are that the drive is failing, and replacement should be planned for.

        Even with this said, I’d have to answer the Langa question with a “No.”  I would not trust a hard drive after a major error… because I wouldn’t trust a hard drive before a major error either.  Data on hard drives is inherently ephemeral, and if you have important stuff that is only in one place, you’re courting disaster in any case.  Drives do fail, errors do take place, PCs do get stolen (laptops especially), and data loss does occur.

        That’s why I didn’t initially suggest the backing up above when writing about the single corrected soft sector or the drive showing an increase in bad sectors.  Of course you would want to back up the drive in either case, but that’s a given anyway!  I always advise people who don’t have a backup to always make one at the first sign of hard drive issues, but I advise it just as much for people whose PCs are still working perfectly (it’s just that they tend to ask what to do less often when nothing’s wrong).  You don’t always get a warning before imminent disaster, and even if you do, the warning sign could be the loss of the single most important file on your PC to you, whatever one that may be.

        Group "L" (KDE Neon User Edition 5.15.3 & Kubuntu 18.04).

        3 users thanked author for this post.
    • #328708 Reply

      Canadian Tech
      AskWoody_MVP

      I’ve been supporting something like 150 systems for about 17 years now. Lots of experience. I have a pretty strongly held policy that is based on fixing hundreds of PCs.

      Hard drives come with warranties. The manufacturers insist you run their diagnostic tests on their drives before doing a warranty replacement. That software is what you need to test the HARDWARE of a drive. That is not the same thing as a chkdsk or the like because they test the LOGIC of the data on the drive, not the hardware. I run those tests quite routinely as part of my checkup routine. If I get any kind of a failure registering from those tests (any one of them), I replace the drive.

      Hard drives are really not very expensive and are easy to install.

If it’s a 2.5″ (laptop) drive and near or past 5 years old, I replace it, regardless of whether it shows an error. If it has not failed yet, it will soon. If it is a desktop 3.5″, that number is more like 8 years.

You can spend eons of time trying to diagnose, clean up, and fix a system. After doing this so many times, I finally realized that re-installing the OS is a whole lot faster and delivers a sure thing – usually a vastly improved system that is clean as virgin snow. So, most often I do a re-install, either on the disk that is OK or on a new one.

I am in the process of replacing most of my clients’ hard drives. When I do, I create a system image that I label as final state. That means there will never be another update to it, and even if Microsoft evaporated, you could use that image to re-install a perfectly running system, either on the current drive or a new one.

      CT

      7 users thanked author for this post.
      • #328720 Reply

        Microfix
        AskWoody MVP

I’ve still got the diagnostics floppy disks from Hitachi, Maxtor, Seagate, Samsung, and Quantum, with the oldest manufacturer being Conner. Those used to be the first port of call for me if an HDD started misbehaving. Gee! You hit a nostalgic note for me there @CT..

        | W10 Pro x64 1803 | W8.1 Pro x64 | Linux x64 Hybrids | W7 Pro x64/ XP Pro O/L
          Can't see the wood for the trees? Look again!
      • #328928 Reply

        anonymous

Hello CT, I agree with your ideas on hard drives and SSDs. Steve Gibson gave an explanation of how SSDs work, and they too can suffer from age. I know SSDs are fast, but they can suffer from a “congestive problem” and slow down. Steve Gibson’s SpinRite does a good job of clearing out and forcing a TRIM on SSDs, and they resume being fast (I reference Steve Gibson’s podcasts). I also have heard that when they fail, it is instant, or with very little warning. I usually recommend a regular spinning drive for the average user, unless the person wants maximum speed, like the NVMe drives.

While I do not replace hard drives as quickly as you do, I do run the disk checking utilities to monitor drive health on a regular basis. I have drives that were installed in the early 2000’s and they still are running. They are not run every day, but they still do the job. The hard drive in my daily-use PC started having issues after 10 years. At that time the entire computer was having issues (like an old car), and it was wiped clean and recycled.

        “You cans spend eons of time trying to diagnose, cleanup and fix a system.” This is true. You have a good routine of “preventative replacement” and that works for you and your clients. As I said to you before, I use you as an example of keeping computers running, without updates, and no infections. I will also add, no hardware failures too, since you have your preventative replacement policy.

        I worked for a large corporation in the 1990’s that told us that the computers are to be re-imaged if anything goes wrong, make sure you have backups on the mainframe. The reason was it “took too much time” to diagnose a problem. Just re-image and it is done.

        Thank you CT.

        2 users thanked author for this post.
        • #329087 Reply

          mn–
          AskWoody Lounger

          I worked for a large corporation in the 1990’s that told us that the computers are to be re-imaged if anything goes wrong, make sure you have backups on the mainframe. The reason was it “took too much time” to diagnose a problem. Just re-image and it is done.

          Yeah, I’ve been at such a place too.

It wasn’t really that the time to diagnose was too long; it was that this was an avoidable delay for the end user. Replace the end user’s system right away with a clean-image spare and they’re back working. Then I’d take the one with a problem back to the IT area for diagnosis, hardware repair if necessary/economical (self or a warranty tech), and then, if good, reimage it and put it back on the shelf of spare systems, or else dispose of it.

          That workplace also had lots of shared PCs, what with some desks having 3 or 4 shift changes a day, so it would’ve been a bother to chase down all the people who “might” have left files on any given one.

    • #328819 Reply

      Lars220
      AskWoody Lounger

      If it is a desktop 3.5, that number is more like 8 years.

Thank you, Canadian Tech. I appreciate your sharing experience and advice. I would like to share that just yesterday I helped a 92-year-old neighbor and friend of mine get some photos and videos off a really old Windows XP hard drive that he was storing in an anti-static plastic bag. I was not paying too much attention to the details, but I think it was a ‘Maxtor’ drive with IDE connections. We were able to determine that the drive and files were from 2003, so that is why I am posting. 16 years; what luck. He was just storing it, not using it at all. The handy tool to get IDE into his Windows 7 USB port:

      https://www.sabrent.com/product/USB-DSC8/usb-3-0-sataide-hard-drive-converter-power-supply-led-activity-lights/#description

As per advice elsewhere, I would not trust using any HDD after 8 years or so; it’s just not worth the potential trouble. Just sharing; hope this may be helpful.

      2 users thanked author for this post.
      • #328895 Reply

        Canadian Tech
        AskWoody_MVP

        I have the same device. I use it frequently to salvage data from a drive that won’t boot.

It’s not so much that drives just fail after 8 years. It is simply the fact that changing them is so simple and relatively inexpensive, coupled with a re-install producing such fabulous results. That makes it such a slam dunk in my experience. The work takes 8 to 12 hours, and why invest that in an 8-year-old drive???

        CT

        1 user thanked author for this post.
      • #329226 Reply

        GoneToPlaid
        AskWoody Plus

        That Maxtor drive is guaranteed to suddenly fail, since at the time Maxtor was using a cheaper spindle design which had inherent design flaws. Way back, I had three such drives suddenly fail. Maxtor hid the design flaws in SMART by not reporting sectors that were remapped. Those drives would catastrophically fail very soon after SMART showed the first uncorrectable sector.

    • #328836 Reply

      GreatAndPowerfulTech
      AskWoody Lounger

      Speaking of SSDs, in my experience, when they begin to fail, they fail quite completely and suddenly. One of the reasons I do not use them.

I’ve installed about 300 solid state drives. The oldest is an 80-GB Intel X-25. I avoid the Chinese brands like King Dian. I’ve only had two failures during service and one DOA. That’s a far better success rate than the hard drives I’ve used, mostly WD Black. Hard drives, in my experience, have roughly twice the failure rate of SSDs.

      GreatAndPowerfulTech

      1 user thanked author for this post.
      • #329123 Reply

        Ascaris
        AskWoody_MVP

That matches what I’ve read when researching the relative longevity of hard drives vs. SSDs.  SSDs have a lower failure rate throughout their respective service lives, and the concern that many have about their NAND cells having a finite service life (while true) is not a big concern for the vast majority of users, who will not even be close to the end of life of the NAND cells by the time the drive is too old to be of any practical use.

        That said, SSDs do have one thing to watch out for.  While the rate of failure (as in “He’s dead, Jim”) is lower, they do have higher odds of uncorrectable read errors that can lead to the loss of one or two files at a time rather than everything on the entire drive.  When a hard drive has a weak sector that generates a read error on the first try, the data can often be recovered if the drive keeps trying.  Typical hard drives (those intended for PC/workstation use outside of a RAID array) will try heroically to read a sector, and they often succeed.  The drive will generally do all this without you having to do anything; the drive electronics are programmed to do it all automatically.

        On a SSD, when there is a read error, there’s a bigger chance that the data won’t be readable with additional read attempts.

        When there is an uncorrectable error, if it’s in a sector or NAND block that is in use, the file or files that are partially or wholly within will be corrupted, and it will be necessary to restore them from a backup, which it’s very important to have in any case.

        I’ve had hard drives suddenly quit on me, with the entire drive taking its contents with it into the grave, and I’ve had ones that gave me warning before they failed.  One of the former was a laptop drive that suddenly quit just after (same day) I wrote a glowing review of its durability on Newegg… it was a few months short of 5 years old, with power-on hours around 26,000, and it showed no sign of sickness… it was perfectly healthy in SMART, I’d never had any issues with it, and I was impressed with it.  And then, in the middle of installing Windows 7 on my F8Sn laptop (which had been running XP for years), it just stopped responding.  No data could be written to or read from it, and the Seagate diagnostic test declared it failed immediately.

        At the time I bought the drive, Seagate was giving their drives a 5 year warranty, so I requested a RMA, and I got a replacement.  Not new… it was a refurb, but as I have said before, I don’t really trust any of them all that much anyway.  It’s now the boot drive in my backup server, while the F8Sn the failed drive came out of got a Samsung SSD.  That SSD was moved into my Dell G3 gaming laptop a few months ago (the laptop got the HDD that was in the G3).

        My oldest SSD, my Samsung 840 Pro 128GB, is a bit over five years old now, and it’s seen some heavy use as the primary boot/OS drive of my main desktop PC.  It’s got about 70 terabytes written so far, with 26,340 power-on hours, and its performance is still as good as it ever was, averaging 548 MB/sec on read (sequential) and 319 MB/sec on write (sequential) across the entire drive partition, 0-100%. Edit: I initially missed that it was doing one partition at a time, so I went back and repeated it across all the partitions on the drive.  Performance was the same as in the image in throughput, without the nosedive in performance at 90%, strangely.  Seek times were lower, at 0.05-0.08 seconds.

        On this model, it’s normal for the 128GB model to be slower on writes… larger ones have similar read and write numbers.

[Attachment: 840 Pro benchmark screenshot]

The drive has picked up four reallocated sectors recently, but the SMART normalized value is still at 99 out of 100, so it has a lot of margin left (the threshold is 10, so at four sectors per point, it has about 89 * 4, or 356, more bad blocks to go before it reaches that level).  SSDs are designed for this, with spare blocks held in reserve so that the capacity remains the same up to a certain number of bad blocks, since it’s understood that NAND cells do wear out with use, and I’m not concerned about it.  SMART reports 0 uncorrectable errors, so the drive electronics were able to recover the data safely on all of them.
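The normalized-value arithmetic above can be sketched in a few lines (the four-sectors-per-point rate is inferred from this one drive's history, not a SMART standard):

```python
# Sketch of the SMART normalized-value arithmetic described above:
# the attribute started at 100, sits at 99 after 4 reallocated sectors,
# and the drive flags failure when it drops to the threshold of 10.
start, current, threshold = 100, 99, 10
sectors_so_far = 4

sectors_per_point = sectors_so_far / (start - current)  # 4 sectors cost 1 point
points_left = current - threshold                       # 89 points of margin
sectors_left = points_left * sectors_per_point          # ~356 more bad blocks

print(int(sectors_left))
```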

It’s way too small a sample size to be able to really know anything by a 1:1 comparison, but this drive has lost no data in 5+ years, while the HDD I had before it had lost 100% of its data before its 5-year warranty expired.  Those are by far the oldest drives I’ve had lately, so they’re the only ones I have to compare long-term.  While it does not demonstrate that SSDs are any more reliable than hard drives, to me it’s an illustration that backups aren’t important just for SSDs, but for everything that has data that you want to keep.  Total drive failure is only one of a bunch of threats to your data… a bad update could delete things, malware could encrypt it, or a thief may steal the device with the drive inside.  Better to be safe!

        Group "L" (KDE Neon User Edition 5.15.3 & Kubuntu 18.04).

        1 user thanked author for this post.
    • #328841 Reply

      anonymous

      Open question, with reference to @lars220 ‘s experience shared in #post-328819 . Will an SSD retain memory sitting on a shelf for 16 years? What steps should be taken to increase the odds of data recovery after more than a decade without power?

      • #329106 Reply

        Ascaris
        AskWoody_MVP

That is one area where you would not want to use an SSD.  They are not suitable for long-term storage like that. The drive itself will live, but the data will accumulate more and more errors over time.  It would probably take a couple of years, but sixteen… I would not bet on any of it being usable.

        As far as I understand, the data will last longer on the drive if it is written to when warm and stored cool or cold.  How hot and cold, I don’t know!

        Group "L" (KDE Neon User Edition 5.15.3 & Kubuntu 18.04).

        2 users thanked author for this post.
    • #328916 Reply

      anonymous

GSmartControl is a great tool for looking at a hard drive’s SMART data and running tests (internal drive self-tests).
      https://gsmartcontrol.sourceforge.io

I’ve had good luck with Crucial brand SSDs (and RAM). I’ve had bad luck with Samsung brand SSDs: more than 36 failed out of a batch of fewer than 50, with bad sectors that couldn’t be read or written and more developing over time. A secure erase seemed to clear the problem, but I didn’t test long-term, because faulty hardware doesn’t belong in the computer.

Any fault (fault = losing any of my data, even one sector) means replace. I don’t care if all the sectors can be read now or have been replaced with readable spares.

      • #328949 Reply

        mn–
        AskWoody Lounger

        That’s still a better percentage than one specific Seagate model number back when… (3.5″ hotplug SCA80 scsi, server disks)

      • #329134 Reply

        Ascaris
        AskWoody_MVP

It’s not faulty, any more than a tire is faulty when it only has two-thirds of its usable tread left.  In various tests reported online, SSDs had relocated sectors and still had plenty of usable and reliable life left in them.

        Group "L" (KDE Neon User Edition 5.15.3 & Kubuntu 18.04).

        1 user thanked author for this post.
        • #329147 Reply

          mn–
          AskWoody Lounger

Though, getting read errors on a previously successfully written block, and data loss from that, pretty much does mean it’s faulty… SSD wear occurs on write, so data loss due to a failure during background relocation may mean the fault is in the relocate/rebalance algorithm.

          Also means you can’t trust that any data previously written is still there, so… not much use in such a disk IMHO…

          Firmware problem, maybe.

          (The old Seagate spinning SCSI server disk problem was also supposed to be in the firmware, went through at least 15 different revisions of that before we started getting a different model number in the replacements…)

          1 user thanked author for this post.
          • #329249 Reply

            Ascaris
            AskWoody_MVP

Though, getting read errors on a previously successfully written block, and data loss from that, pretty much does mean it’s faulty… SSD wear occurs on write, so data loss due to a failure during background relocation may mean the fault is in the relocate/rebalance algorithm.

            It means that one specific NAND cell is faulty, and it’s marked unusable and retired from service at that point, so you’re good to go.  It doesn’t mean that the rest of the SSD unit with NAND cells that did not have any issues is itself faulty because of that one defect that was corrected, as the anonymous poster suggested.

Wear-leveling doesn’t mean that the wear will be spread out so evenly that all of the NAND cells will die more or less in the order in which they were first written.  Super-aggressive wear leveling would itself contribute a significant number of R/W cycles to each cell, so the least-used cell cannot always be the next one written to, as it would be if wear-leveling were perfect.

In addition, not every cell rated for (say) 2,000 write cycles will make it to 2,000, while some will last far beyond that.  It only takes a single bit error to throw off a read of a given block, and one bit out of the 4 trillion bits on a 500GB drive is a vanishingly small percentage of failed NAND cells.

            NAND cells wear out… it’s expected, just as much as is a car tire wearing out, and the drive is designed to cope with it by taking the weak cell out of service and allowing the ones that didn’t have a problem to keep doing their thing.  You might have a point if you said that the drive was faulty at the moment that the weak cell was detected (which would also be true of a hard drive with a weak sector), but as soon as the weak cell was removed from service, the drive was no longer faulty.  Even if there was data loss, it doesn’t mean that the drive itself is still faulty even after the problem that led to the data loss was fixed.  The data on the drive and the health of the drive itself are related, but not the same thing.

            It’s also not true that a reallocated sector means there was data loss.  It’s more likely than with a hard drive, but by no means is it a given.  As I posted above, my 5 year old SSD has recently acquired four reallocated sectors, but there hasn’t been any uncorrectable error (the kind that cause data loss) over its lifetime.  Both statistics are part of the SMART profile for the drive.

            As an aside, the last thing I would want to do is try to get the block containing the faulty cell unmarked and returned to service, as anonymous suggested he/she tried.  I would not call that correcting the fault by any means.

             

            Group "L" (KDE Neon User Edition 5.15.3 & Kubuntu 18.04).

            1 user thanked author for this post.
    • #328957 Reply

      deuce120
      AskWoody Plus

I had a hard drive that became inaccessible, meaning it would rotate but allowed no access to the data. After some searching, I found a site that suggested I use FreeOTFEExplorer to access the drive and change one bit. Long story short, it worked, and the drive now sits on the shelf along with numerous others. Should I need to access it, I pop it into a Thermaltake BlacX Duet.

I have found that CrystalDiskInfo works well for hard drives or SSDs.

    • #329219 Reply

      Mr. Natural
      AskWoody Plus

      Hey Fred. Thanks for the wmic command line info. That’s something new to me that I was not aware of. Thanks!

      Red Ruffnsore reporting from the front lines.

    • #329232 Reply

      bbearren
      AskWoody MVP

      All my eggs are not in one basket—or two, or three … from my website:

[Attachment: partition layout from my website]

Task Scheduler makes images of my OS, Programs, and Users logical drives weekly, and runs a couple of Robocopy command lines to triplicate my data files daily.  These go to multiple logical drives/HDDs, my NAS, and OneDrive.
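The triplicate-the-data-files step above could be sketched roughly as follows; the paths here are hypothetical stand-ins, and the real setup uses Robocopy driven by Task Scheduler rather than Python:

```python
import shutil
from pathlib import Path

# Minimal sketch of mirroring one data folder to several destinations,
# in the spirit of the scheduled Robocopy runs described above.
# All paths are hypothetical examples, not the poster's actual layout.
def triplicate(source: str, destinations: list[str]) -> None:
    src = Path(source)
    for dest in destinations:
        # dirs_exist_ok lets repeated runs refresh an existing mirror in place
        shutil.copytree(src, Path(dest) / src.name, dirs_exist_ok=True)

# Example: one local mirror, one NAS share, one cloud-synced folder
# triplicate(r"D:\Data", [r"E:\Mirror", r"\\NAS\backup", r"C:\Users\me\OneDrive"])
```

Note this naive sketch only adds and overwrites files; Robocopy with `/MIR` also deletes files from the mirror that no longer exist in the source.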

      I also make periodic complete drive images (includes all logical drives on the HDD itself).  HDD failure has always been a matter of when, not if.  I’ve had a few, but I’ve never lost anything because of it.  An ounce of prevention is worth a pound of cure, as they say; what is a pound of prevention worth?

      I don’t do fresh installs.  I want to get right back to where I left off; replace a failed HDD, restore its complete drive image, restore any pertinent more recent logical drive images within that HDD, and get right back to where I was when the drive failed.

      When I get logical errors, I use chkdsk /r as Fred suggested.  If chkdsk finds bad sectors, recovers their data, marks them as bad, and pronounces the logical drive as repaired, I’ll make an up-to-date complete drive image, and keep using the drive.

      If that drive throws a few more bad sectors, I’ll replace it with a new drive, restore the complete drive image of the failed drive to the new drive, then update it with any and all more recent logical drive images pertinent to that drive, and keep on truckin’.

My NAS has four 3TB HDDs in a RAID 10 array.  I have an identical, formatted 3TB HDD in its box on a shelf.  If one of the drives in my RAID array fails, I’ll replace it with the spare, and order a new spare to put on the shelf.

      There’s an axiom in worker safety, “Expect the unexpected.”  Might I add, “Prepare for the expected.”

      Create a fresh drive image before making system changes, in case you need to start over!
      "The problem is not the problem. The problem is your attitude about the problem. Savvy?"—Captain Jack Sparrow
      "When you're troubleshooting, start with the simple and proceed to the complex."—M.O. Johns

      "Experience is what you get when you're looking for something else."—Sir Thomas Robert Deware

