Apparently RAID arrays don't like it when you kill power to the machine. We've been doing remodeling in the house, and I've been turning breakers on and off. I forgot which breaker the servers were on and accidentally turned it off a few times.
One of the drives really did fail at some point. I accept this; drives fail all the time, and that is exactly why I have a RAID 5 array. No big deal, I thought, I'll just send it in for an RMA. But shortly after, another drive looked like it had failed. I saw this in /proc/mdstat:
...[U__U]
meaning only two of the four drives were still up. Subsequent attempts at rebooting the machine left the RAID volume inaccessible entirely. Other clues that pointed to a drive failure:
From /var/log/messages:
Dec 29 19:29:06 onyx kernel: Buffer I/O error on device md0, logical block 0
Dec 29 19:29:06 onyx kernel: lost page write due to I/O error on md0
Dec 29 19:29:06 onyx kernel: EXT2-fs error (device md0): ext2_readdir: bad page in #2
There was also some relevant output in dmesg, which I found by typing
dmesg | less
(I didn't write it down, though, and dmesg only holds messages since the last boot, which successfully brought up the array with 3/4 drives.)
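Lesson learned: dump the kernel log to a file before rebooting, so the evidence survives. Something along these lines works; the filename here is just an example.
# save the kernel ring buffer before a reboot wipes it
dmesg > /root/dmesg-raid-failure.txt
# syslog usually keeps the older kernel messages around too
grep md0 /var/log/messages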
I was convinced I hadn't actually lost 2/4 drives at the same time, and set out to figure out a way to bring it back.
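A good way to sanity-check that is to ask each member partition what it thinks happened. mdadm --examine reads the md superblock straight off the disk; the device names below match my setup, and the grep is just a quick way to compare the per-drive event counters.
# dump the md superblock from each member partition
mdadm --examine /dev/sd[abcd]1
# the event counters show how far out of date a member is;
# a small gap usually means the drive is fine, it just got kicked out
mdadm --examine /dev/sd[abcd]1 | grep -i events
If a drive only shows a slightly lower event count than the others, the data on it is almost certainly intact, which is exactly what the forced assemble below takes advantage of.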
After several hours of digging through forums and reading the mdadm documentation, I was able to get the array back up and running on 3/4 drives.
I created the configuration file /etc/mdadm.conf:
DEVICE /dev/sd[abcd]1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1
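I typed those two lines by hand, but mdadm can also generate the ARRAY line by scanning the superblocks itself. Something like this should produce an equivalent entry; worth eyeballing the output before appending it.
# print an ARRAY line built from whatever md superblocks mdadm finds
mdadm --examine --scan
# once it looks right, append it to the config
mdadm --examine --scan >> /etc/mdadm.conf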
Then ran:
/sbin/mdadm --assemble -f /dev/md0
mdadm: forcing event count in /dev/sdc1(2) from 1077319 upto 1077330
mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/sdc1
mdadm: /dev/md0 has been started with 3 drives (out of 4).
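Once the replacement drive arrives, the rebuild should just be a matter of partitioning it like the others and adding it back into the array. Something like this, assuming the new drive shows up as /dev/sdb and the disks use ordinary MBR partition tables:
# copy the partition layout from a known-good drive to the new one
sfdisk -d /dev/sda | sfdisk /dev/sdb
# add the new partition to the array; md starts rebuilding onto it automatically
mdadm /dev/md0 --add /dev/sdb1
# watch the resync progress
cat /proc/mdstat
mdadm --detail /dev/md0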
Now I just have to RMA this drive very quickly before another drive actually does fail.
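And so the next dead drive doesn't go unnoticed for days, mdadm can watch the array and send mail when something drops out. A minimal setup, assuming local mail delivery works and root@localhost is where I actually read mail:
# run the monitor as a daemon and mail an alert when a device fails
mdadm --monitor --scan --daemonise --mail root@localhost
# or put "MAILADDR root" in /etc/mdadm.conf and let the distro's
# init script start the monitor instead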