Saturday, November 14, 2009

RAID5 Failure, Again

This time I got (another) annoying RAID5 failure. The CentOS 4.7 server wouldn't boot because it was unable to start the RAID5 array. Yes, this is the second time I've stumbled upon this problem (see this Indonesian-written post). I burned a new CentOS 4.7 DVD (using a new REAL server's DVD writer, no less), booted from it, typed linux rescue at the boot prompt, and tried to follow exactly the same steps I'd done and written about in this blog, but without success: the system complained that the superblock doesn't match.
It seems I had forgotten this server's new RAID5 configuration. I forgot that I had reinstalled it with SAP ERP (NetWeaver), creating two software RAID5 arrays in the process, and of course with different partitions.
The partitions were: sda3, sdb3, sdc1, and sdd2. Together the four partitions made up a 215-megablock (that's about 100 GB, I think) md1 array. Here's the chemistry:
- The kernel wouldn't add the non-fresh member (sdd2) into the array; it kicked it out of the RAID assembly.
- The remaining RAID assembly of three partitions couldn't be started. The cause, which I only found out after forcing the array to run, is that the event counter on sda3 didn't match the others. But the kernel said nothing about this in the dmesg log; it just said 'unable to start degraded array ..'
- I forced the assembly to run. This must be done while the md device is stopped. So: mdadm -S /dev/md1, then mdadm -A --force --run /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc1 /dev/sdd2. It did run, writing error messages about sda3.
- But sdd2 was still kicked out of the array. I had to add it back manually: mdadm -a /dev/md1 /dev/sdd2
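Put together, the diagnosis and recovery steps above look roughly like this from a rescue shell. The device names match this server; on another box the member partitions would differ, and all of these commands need root and a real md array, so treat this as a sketch, not a script to paste blindly:

```shell
# Compare event counters across the members -- a mismatch (as on sda3 here)
# is why the kernel refused to start the degraded array.
mdadm --examine /dev/sda3 /dev/sdb3 /dev/sdc1 /dev/sdd2 | grep -E '^/dev/|Events'

# Stop the half-assembled array first; forced assembly only works
# while the md device is not running.
mdadm -S /dev/md1

# Force-assemble with all four members, overriding the event-count mismatch.
mdadm -A --force --run /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc1 /dev/sdd2

# If a member (sdd2 here) is still left out, re-add it so the rebuild starts.
mdadm -a /dev/md1 /dev/sdd2
```

Note that --force tells mdadm to ignore the stale event counter, so any writes the kicked-out member missed are simply overwritten during the resync.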
Now I'm just waiting for the recovery (recovery status can be read in /proc/mdstat) to finish, so I can boot this system with confidence. I hope nothing else goes wrong.
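If you want to follow the rebuild without re-running cat by hand, something like this works (the 5-second interval is just my preference):

```shell
# Refresh the md status every 5 seconds; the recovery line shows a
# progress bar, percentage done, and an ETA. Ctrl-C to stop watching.
watch -n 5 cat /proc/mdstat
```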