Lessons Learned (Volume 1)
Or, rather: Lessons Learned in the least convenient way possible, in the most painful manner possible, with the maximum amount of stress, time pressure, and personal agony.
Chapter I: January
This January, in the middle of composing my Division III (essentially my final project of college), my biggest hard drive, upon which all my archives and stored data resided, blew its servo. I had the data, still, but it was inaccessible- mechanical failure of a bizarrely non-destructive type. Unfortunately, this occurred at the same time that my power supply quite literally exploded, taking the motherboard with it. If you’ve never seen a power supply explode, it’s quite a sight- bright lights start flashing inside it, popping and sizzling sounds are emitted, and everyone in the room reacts according to their experience with said event. If you’ve EVER seen it happen, you practically dive across the room in bullet-time, praying that you’ll be able to yank out the power cord before the PSU fries everything it’s connected to. If you haven’t seen it before, you stand there, thunderstruck by the awesome power of unleashed electricity, much like the people who watched Thomas Edison electrocute an elephant as a graphic demonstration of why he was both more clever and infinitely more of an evil asshole than Nikola Tesla. Where was I? Right, power supplies. So the servo in the drive died, and that was it. I let it sit around for the next eight months while I, in order: lost over a month of worktime due to the aforementioned incident, dealt with (more like ignored, heh) a supremely uncooperative Div III committee, and somehow managed to complete my Div III on time and graduate (!).
Chapter II: Nine Months Later
Nine months later, I’m fully in the Real World, working the same job I’ve had for the last four years, using absolutely nothing I paid to learn in college (in fact, the skills I’m using are the ones I WAS PAID TO LEARN by working for Hampshire IT!). I decided that, hell, since we do data recovery, why not avail myself of ye olde employee-e discounte? Favors are called in and three days later, I have my data back. I copy it to my current storage drive… and hilarity of hilarities, IT DIES. I’ve already called in all the favors that I had, so this one’s my problem. Utilizing some truly space-age, very proprietary software, I manage to discover exactly what’s wrong with the drive- there’s a big ol’ fat patch of horribly corrupted data about 60% of the way through… right in the middle of where I copied my recovery!
So what happened?
During the entire time I’d had this particular drive (since January, actually), I had never actually filled it all the way up. I had about 200-300GB of data sitting on it, so I hadn’t even gotten it half full. But the recovery was around 400GB, which pushed me up to ~800GB out of the terabyte available. This means that the corruption was probably a defect in the actual drive itself- I had just never noticed it because nothing had ever gotten that far into the drive yet.
(I know this isn’t literally how hard drives work, but consider this- if that patch was corrupted and throwing out CRC errors whenever it was read, any writes to that patch would just fail over to a different sector… unless there was no other space left, in which case they would be ‘written’ to the bad patch and promptly disappear into the nether.)
Amazingly, the way that this drive had failed was, actually, not a problem. The data from before the recovery was copied over was 100% fine, and I had that recovered data on another drive, as well. I could copy the recovery from the second drive, and the original data from the corrupted drive, and as long as I had a third drive to merge those two datasets onto, everything would be great.
The Great Copy
New drives in hand, I set out to transfer all my data to the virgin media… and man, it’s taking a long time. Even allowing for CPU overhead, my calculations say that this should be taking half as long as it is. Odd. Nothing to do but wait, though.
The first copy finishes and I shut down, remove the old drive, and reboot… and get this: BOOTMGR MISSING, PRESS CTRL+ALT+DELETE TO RESTART
Well, shit. However, I am an IT professional, so I just put everything back the way it was and reboot again. Everything comes up fine. Into the Disk Management extension!
Lots of people don’t realize that even the ‘user’ versions of Windows come with an astonishing array of cool and useful administrative tools that can fix pretty much any weird problem that crops up for you. The Local Disk Management console lets you see exactly what Windows thinks your disks look like, how they’re partitioned, and where you’re booting from.
It turns out that Windows 7 has some… interesting… ideas about where your bootloader should go. Instead of installing it on the drive where you installed Windows, it seems to take the lowest-indexed drive in the HDD hierarchy, and puts BOOTMGR there. If you’re like me, and are operating under the impression that SATA doesn’t care what order you connect your drives in, this can be VERY BAD.
In my case, I have no IDE drives, so Windows skipped them. But I had three SATA drives connected: User/ Misc, OS, and Storage, in that order. See the problem? User/ Misc was the lowest-index drive, so it got BOOTMGR, and the OS drive got nothing! This is all fine and dandy… until you replace User/ Misc or swap the order of the drives around, which will result in BOOTMGR MISSING.
Quiz time! Can you figure out why this happened? I’ll give you a hint- This is NOT the intended behavior, and I’ve already made some incorrect assumptions. In addition, there’s a GLARINGLY OBVIOUS clue that I missed.
…
…
…
What I missed:
The data transfer rate. Why was it so slow? If I had bothered to do some more math, I would have discovered that the data transfer rate between my two SATA drives was exactly the maximum transfer rate allowed by IDE drives… which meant that my SATA drives were emulating IDE drives! Why? My motherboard, like pretty much every motherboard that has onboard SATA, lets you set the SATA ports to one of three modes- Legacy, Chipset RAID, and AHCI. Mine was set to Legacy, for maximum compatibility. This means that the motherboard presents the SATA drives to the OS as IDE drives, in case the OS can’t use SATA drives properly. Because of this, when I installed Windows, it looked at the drives connected, and saw them as traditional IDE masters and slaves. In the case of IDE drives, connection order is of paramount importance, so placing BOOTMGR on the first Master was a no-brainer. My OS drive was showing as the first slave, so of course it didn’t get BOOTMGR.
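For the curious, here’s roughly the math I should have bothered to do, as a small Python sketch. Everything in it is an assumption for illustration: the function names are made up, the ceilings are the nominal interface figures (real drives often can’t sustain them anyway), and the sample numbers are hypothetical, not my actual copy times.

```python
# Nominal interface ceilings in MB/s (decimal megabytes).
# These are headline figures, not what a given drive will actually sustain.
INTERFACE_CEILINGS_MB_S = {
    "Ultra ATA/100 (IDE)": 100.0,
    "Ultra ATA/133 (IDE)": 133.0,
    "SATA 1.5 Gbit/s": 150.0,
    "SATA 3 Gbit/s": 300.0,
}


def throughput_mb_s(bytes_copied, seconds):
    """Average transfer rate of a finished copy, in MB/s."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_copied / 1_000_000 / seconds


def closest_ceiling(rate_mb_s, tolerance=0.05):
    """Name of the interface whose nominal ceiling is within `tolerance`
    (as a fraction) of the measured rate, or None if nothing is that close."""
    best = None
    for name, ceiling in INTERFACE_CEILINGS_MB_S.items():
        relative_gap = abs(rate_mb_s - ceiling) / ceiling
        if relative_gap <= tolerance and (best is None or relative_gap < best[1]):
            best = (name, relative_gap)
    return best[0] if best else None


if __name__ == "__main__":
    # Hypothetical example: 400 GB copied in 4000 seconds.
    rate = throughput_mb_s(400_000_000_000, 4000)
    print(f"{rate:.1f} MB/s, suspiciously close to: {closest_ceiling(rate)}")
```

If the rate lands right on top of an IDE ceiling while both drives are supposedly SATA, that’s your cue to go look at the SATA mode in the BIOS. If it matches nothing, the bottleneck is probably the drive itself (or something else entirely).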
Fixin’ It:
You’d think that the solution would just be to set the SATA drives to AHCI (which lets them act like REAL SATA drives instead of emulated IDE drives), but unfortunately it isn’t that simple. If you do that, Windows bluescreens on startup, saying STOP 0x0000007B INACCESSIBLE_BOOT_DEVICE.
According to Microsoft KB922976, Windows disables unused storage drivers after the first startup, to speed up boot. So when you switch over to AHCI, Windows can’t read the OS drive anymore, because it hasn’t loaded the right driver. The same article gives you a fix, though. It’s a simple registry tweak that re-enables the AHCI driver. Reboot again after doing that… boom, all fixed.
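If you’d rather script the tweak than click through regedit, here’s a minimal Python sketch of the idea. Big caveats: the key path (the msahci service) and the Start value of 0 are from my memory of the KB article, so check them against the article before trusting this; the function names are mine; it only touches the registry on Windows, from an elevated prompt, when you explicitly pass --apply; and you’d run it while the BIOS is still in Legacy mode, THEN flip to AHCI.

```python
import sys

# Assumption: service key recalled from KB922976. Verify against the article.
AHCI_DRIVER_KEYS = [r"SYSTEM\CurrentControlSet\Services\msahci"]

# A Start value of 0 means "load this driver at boot".
START_AT_BOOT = 0


def plan_ahci_fix(keys=AHCI_DRIVER_KEYS):
    """Return the registry edits as (key path, value name, DWORD) tuples,
    without touching anything. Handy for a dry run."""
    return [(key, "Start", START_AT_BOOT) for key in keys]


def apply_plan(plan):
    """Write the planned edits under HKEY_LOCAL_MACHINE. Windows only."""
    import winreg  # imported here so the dry run works on any OS

    for key_path, value_name, value in plan:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE, key_path, 0, winreg.KEY_SET_VALUE
        ) as key:
            winreg.SetValueEx(key, value_name, 0, winreg.REG_DWORD, value)


if __name__ == "__main__":
    plan = plan_ahci_fix()
    for key_path, value_name, value in plan:
        print(f"HKLM\\{key_path}: {value_name} = {value}")
    if sys.platform == "win32" and "--apply" in sys.argv:
        apply_plan(plan)
        print("Done. Reboot, switch the SATA mode to AHCI, and cross your fingers.")
```

Run it with no arguments first; it just prints what it WOULD change. Back up your registry before the real thing, because I am an IT professional and even I break stuff.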
As to fixing the BOOTMGR problem, I’m not sure that you’d need to after setting the drives to AHCI, but just in case, you can use BCDEdit, like Scott Hanselman and I did (after a little more research, natch).
The Upshot:
Everything is now back in its place and much faster, too! I also got to blow out my case and rewire everything, so it was a good excuse to do some cleaning. I also got to learn more about Windows 7’s internals, and you can be SURE I’m going to have to do something like this again. It all came out of a boneheaded mistake that anyone could make, and it was quite hard to figure out exactly what had gone wrong, because there was no failure per se: It took a completely unrelated problem for me to even notice something was wrong!
I wonder how many people are running their expensive, multi-terabyte SATA drives in IDE emulation mode, and don’t even know it?