This morning, after waking up to lots of thunder and lightning, I got a text message saying my RAID 5 array had failed. Only this time, two of the three drives were missing. Since both of those drives were actually exported from a different physical machine via vblade (ATA over Ethernet), I assumed that the other server had freaked out during a power surge. I quickly rebooted that machine to bring the vblade exports back, but then the trouble started.
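For context, the exports on the other machine were set up with something like this (the shelf/slot numbers match the /dev/etherd/e4.1 and e4.2 device names; the disks and network interface here are just placeholders):

# on the exporting machine: serve two disks as AoE shelf 4, slots 1 and 2
vblade 4 1 eth0 /dev/sdb &
vblade 4 2 eth0 /dev/sdc &

# on the RAID machine: the aoe module picks them up as /dev/etherd/e4.1 and e4.2
modprobe aoe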
At some point, the array had been "started", but with two drives marked as faulty. I tried --remove and --add to remove and re-add the "faulty" drives. That only had the effect of bringing the array back "online" with all of the drives listed as spares.
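If memory serves, that attempt went something like this (one device per command, since the drives were already marked faulty):

mdadm /dev/md0 --remove /dev/etherd/e4.1
mdadm /dev/md0 --remove /dev/etherd/e4.2
mdadm /dev/md0 --add /dev/etherd/e4.1
mdadm /dev/md0 --add /dev/etherd/e4.2

I removed the drives again, and tried the trick I used last time: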
mdadm --assemble -f /dev/md0 /dev/sda2 /dev/etherd/e4.1 /dev/etherd/e4.2
However, this also didn't work. It showed the array with /dev/sda2 and /dev/etherd/e4.2 as spares, and e4.1 was nowhere to be seen. At this point I was more than a little worried that I had done something to trash the array. That's when a Google search led me to this handy command:
mdadm -E /dev/sda2
This prints the RAID superblock stored on the drive. Running it against each member told me that the e4.2 drive hadn't been damaged, since its superblock was still readable, and the UUIDs on all three drives still matched. However, the bottom section of the report differed from drive to drive.
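A quick way to compare the superblocks across all three members is a little loop like this (just shell convenience; running mdadm -E on each device by hand works too):

for d in /dev/sda2 /dev/etherd/e4.1 /dev/etherd/e4.2; do
    echo "== $d =="
    mdadm -E "$d" | grep -i uuid
done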
A few Google searches later, I came across this:
mdadm --create --assume-clean --level=5 --raid-devices=3 /dev/md0 /dev/sda2 /dev/etherd/e4.2 /dev/etherd/e4.1
The --assume-clean flag tells mdadm to skip the initial resync, so it doesn't touch the data blocks or start rebuilding parity. What I didn't realize, though, was that --create still writes fresh superblocks, which resets the UUIDs. That command brought the array back online, at least according to /proc/mdstat, but when I tried to mount it, the kernel couldn't recognize the filesystem.
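In hindsight, the safer way to test a recreated array is a no-op filesystem check or a read-only mount before touching it read-write (the mount point here is made up):

fsck -n /dev/md0
mount -o ro /dev/md0 /mnt/raid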
That's when I realized that the order in which you specify the drives to the --create command actually matters: it determines which slot each drive occupies, so the wrong order scrambles how the data and parity chunks map onto the devices. I re-ran the command like this:
mdadm --create --assume-clean --level=5 --raid-devices=3 /dev/md0 /dev/sda2 /dev/etherd/e4.1 /dev/etherd/e4.2
The array came back online, and I was able to mount it!
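For what it's worth, the superblock records each member's slot, so the original order can be read back with mdadm -E before recreating anything. With old 0.90-style metadata, the line starting with "this" near the bottom of the report shows the device's position; newer 1.x metadata prints a "Device Role" line instead:

mdadm -E /dev/sda2 | grep -Ei 'this|device role'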
So while RAID 5 protects against a single hard drive failing, it does not protect against me running stupid commands on the array. I'm going to have to start backing up my RAID arrays onto other drives...