I manage a lot of servers with a simple software RAID1 configuration. One of the disks started to fail, most notably on partition sdb8, so I had to mark all of its other partitions as failed before removing and replacing the disk:
mdadm --manage /dev/md0 --fail /dev/sdb5
mdadm --manage /dev/md3 --fail /dev/sdb7
mdadm --manage /dev/md1 --fail /dev/sdb6
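With more member partitions this gets repetitive, so the same calls can be sketched as a loop. This is a dry run that only echoes the commands (drop the echo to actually run them), and it assumes the md-to-partition mapping used on these servers (md0/sdb5, md1/sdb6, md3/sdb7):

```shell
# Dry run: print the mdadm --fail command for each md/partition pair.
# Assumes md0 pairs with sdb5, md1 with sdb6 and md3 with sdb7, as above.
i=5
for md in md0 md1 md3; do
    echo "mdadm --manage /dev/$md --fail /dev/sdb$i"
    i=$((i + 1))
done
```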
After this my /proc/mdstat looked like this:
cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdb8[1](F) sda8[0]
      284019576 blocks super 1.0 [2/1] [U_]
      bitmap: 3/3 pages [12KB], 65536KB chunk

md0 : active raid1 sda5[0] sdb5[1](F)
      528372 blocks super 1.0 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid1 sda7[0] sdb7[1](F)
      4200436 blocks super 1.0 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1](F)
      4199412 blocks super 1.0 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
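A small helper (my own sketch, not part of mdadm) can pull the degraded arrays out of this output: a missing mirror half shows up as an underscore in the [UU]-style status field, so it is enough to grep for status fields containing an underscore:

```shell
# List the status lines of degraded arrays from an mdstat-style file
# (defaults to /proc/mdstat). [U_] or [_U] means one mirror half is missing.
degraded() {
    grep -E '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}
```

Run against the output above, it would print the four [U_] status lines.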
Normally you would now remove the faulty drive sdb and replace it with a fresh one. Do not forget to properly remove the SCSI device first. I did forget, and ended up with the new disk recognized as sdc instead of sdb, which complicates things a bit, but it can be fixed. I will add a note on how to remove a SCSI disk properly at the end of this post.
Take a look at the partition table on the working drive sda:
fdisk -l /dev/sda

Disk /dev/sda: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000af798

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   585936895   292967424    f  W95 Ext'd (LBA)
/dev/sda5            4096     1060863      528384   fd  Linux raid autodetect
/dev/sda6         1062912     9461759     4199424   fd  Linux raid autodetect
/dev/sda7         9463808    17864703     4200448   fd  Linux raid autodetect
/dev/sda8        17866752   585906175   284019712   fd  Linux raid autodetect
Confirm that your newly inserted drive has no partitions:
fdisk -l /dev/sdc

Disk /dev/sdc: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Force-copy the partition table from disk sda to the new disk sdc:
sfdisk -d /dev/sda | sfdisk /dev/sdc --force

Checking that no-one is using this disk right now ...
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
OK
Disk /dev/sdc: 36472 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdc: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End    #sectors  Id  System
/dev/sdc1   *      2048  585936895  585934848   f  W95 Ext'd (LBA)
/dev/sdc2             0         -           0   0  Empty
/dev/sdc3             0         -           0   0  Empty
/dev/sdc4             0         -           0   0  Empty
/dev/sdc5          4096    1060863     1056768  fd  Linux raid autodetect
/dev/sdc6       1062912    9461759     8398848  fd  Linux raid autodetect
/dev/sdc7       9463808   17864703     8400896  fd  Linux raid autodetect
/dev/sdc8      17866752  585906175   568039424  fd  Linux raid autodetect
Warning: partition 1 does not end at a cylinder boundary
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
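Before touching the arrays it is worth double-checking that the copy really matched. A quick sketch (same_layout is my own helper, not a standard tool): dump both tables again, strip out the device names, and compare:

```shell
# Compare two disks' sfdisk dumps with the device names stripped out;
# returns success only if the partition layouts are identical.
same_layout() {
    a=$(sfdisk -d "$1" | sed "s|$1||g")
    b=$(sfdisk -d "$2" | sed "s|$2||g")
    [ "$a" = "$b" ]
}
# On the server: same_layout /dev/sda /dev/sdc && echo "layouts match"
```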
Check that the partition table was written correctly:
fdisk -l /dev/sdc

Disk /dev/sdc: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *        2048   585936895   292967424    f  W95 Ext'd (LBA)
/dev/sdc5            4096     1060863      528384   fd  Linux raid autodetect
/dev/sdc6         1062912     9461759     4199424   fd  Linux raid autodetect
/dev/sdc7         9463808    17864703     4200448   fd  Linux raid autodetect
/dev/sdc8        17866752   585906175   284019712   fd  Linux raid autodetect
Now it is time to add the new partitions to the RAID1 arrays:
mdadm --manage /dev/md0 --add /dev/sdc5
mdadm --manage /dev/md1 --add /dev/sdc6
mdadm --manage /dev/md3 --add /dev/sdc7
mdadm --manage /dev/md2 --add /dev/sdc8
Synchronization starts immediately when you add a partition back to the array. I usually wait for one synchronization to finish before adding the next partition. You can check the status with a simple cat:
cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdc8[2] sdb8[1](F) sda8[0]
      284019576 blocks super 1.0 [2/1] [U_]
      [>....................]  recovery =  0.0% (264576/284019576) finish=107.2min speed=44096K/sec
      bitmap: 3/3 pages [12KB], 65536KB chunk

md0 : active raid1 sdc5[2] sda5[0] sdb5[1](F)
      528372 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid1 sdc7[2] sda7[0] sdb7[1](F)
      4200436 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdc6[2] sda6[0] sdb6[1](F)
      4199412 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
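Instead of re-running cat by hand, you can let the shell wait for the recovery to finish. A sketch using the md sysfs interface: an array's sync_action attribute reads idle once no resync or recovery is running.

```shell
# Block until the given array (e.g. md2) has finished resync/recovery.
# Polls the sysfs sync_action attribute once a minute.
wait_sync() {
    while [ "$(cat "/sys/block/$1/md/sync_action")" != "idle" ]; do
        sleep 60
    done
}
# On the server: wait_sync md2, then add the next partition.
```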
When it is all finished it should look like this:
cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdc8[2] sdb8[1](F) sda8[0]
      284019576 blocks super 1.0 [2/2] [UU]
      bitmap: 3/3 pages [12KB], 65536KB chunk

md0 : active raid1 sdc5[2] sda5[0] sdb5[1](F)
      528372 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid1 sdc7[2] sda7[0] sdb7[1](F)
      4200436 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdc6[2] sda6[0] sdb6[1](F)
      4199412 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
Now it is time to install GRUB on the newly added disk: if the working disk fails, we need to be able to boot from the new one; otherwise we would have to boot from a rescue CD.
First you should fix your device.map file, which tells GRUB where to find the physical hard drives.
In my case it looks like:
cat /boot/grub/device.map
(hd1)   /dev/disk/by-id/scsi-35000c5003b2fda87
(hd0)   /dev/disk/by-id/scsi-35000c5003b2f452f
To find out the ID of your newly inserted disk:
ls -la /dev/disk/by-id/
...
lrwxrwxrwx 1 root root  9 Jan 13 11:52 scsi-35000c500742c35cf -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 13 11:52 scsi-35000c500742c35cf-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jan 13 11:52 scsi-35000c500742c35cf-part5 -> ../../sdc5
lrwxrwxrwx 1 root root 10 Jan 13 11:52 scsi-35000c500742c35cf-part6 -> ../../sdc6
lrwxrwxrwx 1 root root 10 Jan 13 11:52 scsi-35000c500742c35cf-part7 -> ../../sdc7
lrwxrwxrwx 1 root root 10 Jan 13 11:52 scsi-35000c500742c35cf-part8 -> ../../sdc8
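On a box with many disks this listing gets long. A small helper of my own (hypothetical, assumes readlink -f is available, as on any Linux with coreutils) can pick out just the links that resolve to the new disk:

```shell
# Print every symlink in a directory that resolves to the given device.
find_by_id() {
    for l in "$1"/*; do
        if [ "$(readlink -f "$l")" = "$(readlink -f "$2")" ]; then
            echo "$l"
        fi
    done
}
# On the server: find_by_id /dev/disk/by-id /dev/sdc
```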
So change the ID in device.map to look like this:
(hd1)   /dev/disk/by-id/scsi-35000c500742c35cf
(hd0)   /dev/disk/by-id/scsi-35000c5003b2f452f
Now start GRUB with the grub command and do the following:
GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename. ]

grub> find /boot/grub/stage2
 (hd0,4)
 (hd1,4)
grub> setup --stage2=/boot/grub/stage2 (hd1) (hd1,4)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 17 sectors are embedded.
succeeded
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd1) (hd1)1+17 p (hd1,4)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
This installs the GRUB boot loader on the newly added hd1 (sdc).
Now we come to the part that I missed: properly removing the old sdb disk from the SCSI subsystem. As you can see, sdb still shows up in /proc/mdstat. You need to remove all devices marked as faulty with the following commands:
mdadm --manage /dev/md0 --remove failed
mdadm --manage /dev/md1 --remove failed
mdadm --manage /dev/md2 --remove failed
mdadm --manage /dev/md3 --remove failed
Now your /proc/mdstat will be clear of faulty devices:
cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdc8[2] sda8[0]
      284019576 blocks super 1.0 [2/2] [UU]
      bitmap: 3/3 pages [12KB], 65536KB chunk

md0 : active raid1 sdc5[2] sda5[0]
      528372 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid1 sdc7[2] sda7[0]
      4200436 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdc6[2] sda6[0]
      4199412 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
Notice: Here is what I should have done before removing the hard drive to replace it (the four numbers are the SCSI host, channel, id and lun of the device; you can check yours in /proc/scsi/scsi):
echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi
And after inserting the new drive add it back with:
echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi
This ensures the new drive is properly recognized as sdb again.
In case you forgot, like me, the change from sdc back to sdb will happen the next time you reboot the server.