Software RAID

Useful HOWTOs:

http://unthought.net/Software-RAID.HOWTO/

(OLD) http://www.tldp.org/HOWTO/Software-RAID-HOWTO-4.html

Where to get raidtools:

> cd /root
> mkdir raid
> cd raid/
> wget http://people.redhat.com/mingo/raidtools/raidtools-1.00.3.tar.gz

For software RAID we need a 2.4 or later kernel with RAID support (2.4 kernels
include it; older 2.2 kernels needed separate RAID patches) and the raidtools
package.

To test the kernel (from the FAQ):

  If your system has RAID support, you should have a file called /proc/mdstat. Remember
  it, that file is your friend. If you do not have that file, maybe your kernel does
  not have RAID support. See what the file contains, by doing a cat /proc/mdstat. It should
  tell you that you have the right RAID personality (eg. RAID mode) registered, and
  that no RAID devices are currently active. 
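
The check described above can be scripted. The following is only a minimal
sketch and is not part of raidtools; the function name has_personality and
its optional file argument are made up for illustration.

```shell
# Succeed if an mdstat-style file lists the given RAID personality.
# Usage: has_personality raid1 [file]   (file defaults to /proc/mdstat)
# Returns 0 if listed, 1 if not listed, 2 if the file is unreadable,
# which usually means the kernel has no RAID support loaded.
has_personality() {
    hp_level=$1
    hp_file=${2:-/proc/mdstat}
    [ -r "$hp_file" ] || return 2
    grep '^Personalities' "$hp_file" | grep -q "\[$hp_level\]"
}
```

For the RAID 1 setups in these notes the call would be: has_personality raid1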

Software raid is configured through /etc/raidtab. Here's an example for raid 1:

  raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/sdb6
        raid-disk       0
        device          /dev/sdc5
        raid-disk       1
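
Since every RAID 1 stanza in these notes differs only in the md device and the
two member partitions, the stanza can be generated instead of typed. This is a
sketch only; raid1_stanza is a hypothetical helper name, and the fixed values
(chunk-size 4, no spares, persistent superblock) are simply copied from the
example above.

```shell
# Print a two-disk RAID 1 stanza in the raidtab layout shown above.
# Usage: raid1_stanza MD_DEVICE FIRST_PARTITION SECOND_PARTITION
raid1_stanza() {
    printf 'raiddev %s\n' "$1"
    printf '        raid-level      1\n'
    printf '        nr-raid-disks   2\n'
    printf '        nr-spare-disks  0\n'
    printf '        chunk-size     4\n'
    printf '        persistent-superblock 1\n'
    printf '        device          %s\n' "$2"
    printf '        raid-disk       0\n'
    printf '        device          %s\n' "$3"
    printf '        raid-disk       1\n'
}

# Example: print the stanza from the example above for review.
raid1_stanza /dev/md0 /dev/sdb6 /dev/sdc5
```

Review the output before appending it to /etc/raidtab.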

The mkraid command is used to initialize a new raid array:

  mkraid /dev/md0

Initial State
---------------
Here's the initial setup:

> df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda2             38456340   1462116  35040720   5% /
/dev/hda1                23302      5976     16123  28% /boot
none                    515348         0    515348   0% /dev/shm
/dev/hdc1             39516436     32828  37476280   1% /mnt/drive2

> fdisk /dev/hda

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 4998 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         3     24066   83  Linux
/dev/hda2             4      4867  39070080   83  Linux
/dev/hda3          4868      4998   1052257+  82  Linux swap

> fdisk /dev/hdc

Command (m for help): p

Disk /dev/hdc: 16 heads, 63 sectors, 79656 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1   *         1     79656  40146592+  83  Linux

Test Setup - Standard
---------------------

1. Remove old partitions on /dev/hdc:

> fdisk /dev/hdc

Command (m for help): p

Disk /dev/hdc: 16 heads, 63 sectors, 79656 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1   *         1     79656  40146592+  83  Linux

Command (m for help): d
Partition number (1-4): 1

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

2. Create 2 equal sized partitions for testing:

> fdisk /dev/hdc

Command (m for help): p

Disk /dev/hdc: 16 heads, 63 sectors, 79656 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-79656, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-79656, default 79656): +1024M

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (2082-79656, default 2082):
Using default value 2082
Last cylinder or +size or +sizeM or +sizeK (2082-79656, default 79656): +1024M

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
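
Why the second partition starts at cylinder 2082: with 16 heads and 63 sectors
a cylinder is 16 * 63 * 512 = 516096 bytes, so +1024M needs 2081 cylinders when
rounded up to a whole cylinder. The rounding-up rule is an assumption inferred
from the numbers in these transcripts, not taken from the fdisk documentation,
and cylinders_for_mib is a hypothetical helper name.

```shell
# Number of whole cylinders needed to hold SIZE_MIB mebibytes, rounding up.
# Usage: cylinders_for_mib SIZE_MIB HEADS SECTORS_PER_TRACK
cylinders_for_mib() {
    cyl_bytes=$(( $2 * $3 * 512 ))
    echo $(( ($1 * 1024 * 1024 + cyl_bytes - 1) / cyl_bytes ))
}

cylinders_for_mib 1024 16 63   # +1024M on /dev/hdc, prints 2081
```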

3. Set up /etc/raidtab:

  raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdc1
        raid-disk       0
        device          /dev/hdc2
        raid-disk       1

4. Prepare the partitions with mkraid:

> mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/hdc1, 1048792kB, raid superblock at 1048704kB
disk 1: /dev/hdc2, 1048824kB, raid superblock at 1048704kB

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc2[1] hdc1[0]
      1048704 blocks [2/2] [UU]
      [========>............]  resync = 44.2% (465664/1048704) finish=0.9min speed=10171K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc2[1] hdc1[0]
      1048704 blocks [2/2] [UU]
      [=========>...........]  resync = 48.7% (512512/1048704) finish=0.8min speed=10056K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc2[1] hdc1[0]
      1048704 blocks [2/2] [UU]
      [===============>.....]  resync = 79.7% (836608/1048704) finish=0.3min speed=10148K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc2[1] hdc1[0]
      1048704 blocks [2/2] [UU]

unused devices: <none>
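
Instead of repeating cat /proc/mdstat by hand, the progress figure can be
pulled out of the file and polled. This is a rough sketch that assumes the
mdstat layout shown above; rebuild_percent and wait_for_rebuild are
hypothetical names, and the 10 second poll interval is arbitrary.

```shell
# Print the resync or recovery percentage from an mdstat-style file.
# Prints nothing when no rebuild is in progress.
# Usage: rebuild_percent [file]   (file defaults to /proc/mdstat)
rebuild_percent() {
    awk '/resync|recovery/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%$/, "", $i); print $i }
    }' "${1:-/proc/mdstat}"
}

# Block until no rebuild is reported, polling every 10 seconds.
wait_for_rebuild() {
    while [ -n "$(rebuild_percent "${1:-}")" ]; do
        sleep 10
    done
}
```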

5. Check out the results:

> fdisk /dev/md0

Command (m for help): p

Disk /dev/md0: 2 heads, 4 sectors, 262176 cylinders
Units = cylinders of 8 * 512 bytes

    Device Boot    Start       End    Blocks   Id  System
/dev/md0p1             8  10036530  40146088+  83  Linux
Partition 1 does not end on cylinder boundary:
     phys=(1023, 15, 63) should be (1023, 1, 4)

Command (m for help):

6. Mount up the new partition:

> mkdir /mnt/raid
> mount /dev/md0 /mnt/raid
> df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda2             38456340   1462376  35040460   5% /
/dev/hda1                23302      5976     16123  28% /boot
none                    515348         0    515348   0% /dev/shm
/dev/md0              39516436     32828  37476280   1% /mnt/raid

7. Shut down RAID:

> umount /dev/md0
> raidstop --all /dev/md0

Test Setup - Degraded Mode
--------------------------

1. Create partitions:

> fdisk /dev/hdc

Command (m for help): p

Disk /dev/hdc: 16 heads, 63 sectors, 79656 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1             1      2081   1048792+  83  Linux
/dev/hdc2          2082      4162   1048824   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (4163-79656, default 4163):
Using default value 4163
Last cylinder or +size or +sizeM or +sizeK (4163-79656, default 79656): +1024M

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 4
First cylinder (6244-79656, default 6244):
Using default value 6244
Last cylinder or +size or +sizeM or +sizeK (6244-79656, default 79656): +1024M

Command (m for help): p

Disk /dev/hdc: 16 heads, 63 sectors, 79656 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1             1      2081   1048792+  83  Linux
/dev/hdc2          2082      4162   1048824   83  Linux
/dev/hdc3          4163      6243   1048824   83  Linux
/dev/hdc4          6244      8324   1048824   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

2. Format/mount the first new partition.

> mke2fs /dev/hdc3
mke2fs 1.27 (8-Mar-2002)
warning: 62 blocks unused.

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131328 inodes, 262144 blocks
13110 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
16416 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

> mount /dev/hdc3 /mnt/drive2

3. Set up /etc/raidtab:

  raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdc1
        raid-disk       0
        device          /dev/hdc2
        raid-disk       1

  raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdc4
        raid-disk       0
        device          /dev/hdc3
        failed-disk       1

4. Build the array:

> mkraid /dev/md1
handling MD device /dev/md1
analyzing super-block
disk 0: /dev/hdc4, 1048824kB, raid superblock at 1048704kB
disk 1: /dev/hdc3, failed

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc4[0]
      1048704 blocks [2/1] [U_]

unused devices: <none>
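
The [U_] field is what marks the array as running degraded: one U per working
member and an underscore per missing one. A sketch for reading that field from
a script follows; array_status and is_degraded are hypothetical names, and the
parsing assumes the mdstat layout shown above.

```shell
# Print the member-status field (for example [UU] or [U_]) of one array.
# Usage: array_status md1 [file]   (file defaults to /proc/mdstat)
array_status() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { found = 1; next }
        found && /blocks/      { print $NF; exit }
    ' "${2:-/proc/mdstat}"
}

# Succeed if the array exists and at least one member slot is missing.
is_degraded() {
    case "$(array_status "$1" "${2:-}")" in
        *_*) return 0 ;;
        *)   return 1 ;;
    esac
}
```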

5. Put a filesystem on the array:

> mke2fs /dev/md1
mke2fs 1.27 (8-Mar-2002)
warning: 32 blocks unused.

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131328 inodes, 262144 blocks
13108 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
16416 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

6. Copy contents of failed drive to array:

> mount /dev/md1 /mnt/raid/
> cd /mnt/drive2
> find . -xdev | cpio -pm /mnt/raid
> umount /dev/hdc3

7. Edit /etc/raidtab making failed drive active:

  raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdc1
        raid-disk       0
        device          /dev/hdc2
        raid-disk       1

  raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdc4
        raid-disk       0
        device          /dev/hdc3
        raid-disk       1
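
The only change in this step is the failed-disk keyword for /dev/hdc3. A
sketch of making that change without hand editing is given below;
activate_failed_disk is a hypothetical helper, and it only prints the result
so the original file stays untouched until the output has been reviewed.

```shell
# Print a raidtab with the failed-disk entry for DEVICE turned into a
# raid-disk entry. The input file is not modified.
# Usage: activate_failed_disk DEVICE RAIDTAB_FILE
activate_failed_disk() {
    awk -v dev="$1" '
        $1 == "device" { current = $2 }
        $1 == "failed-disk" && current == dev { sub(/failed-disk/, "raid-disk  ") }
        { print }
    ' "$2"
}
```

For example: activate_failed_disk /dev/hdc3 /etc/raidtab > /tmp/raidtab.new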

8. Hot add the failed drive to the array:

> raidhotadd /dev/md1 /dev/hdc3

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc3[2] hdc4[0]
      1048704 blocks [2/1] [U_]
      [=>...................]  recovery =  8.4% (90048/1048704) finish=1.5min speed=10005K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc3[2] hdc4[0]
      1048704 blocks [2/1] [U_]
      [==========>..........]  recovery = 51.8% (544640/1048704) finish=0.8min speed=10131K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc3[2] hdc4[0]
      1048704 blocks [2/1] [U_]
      [===============>.....]  recovery = 77.7% (816512/1048704) finish=0.4min speed=9480K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc3[1] hdc4[0]
      1048704 blocks [2/2] [UU]

unused devices: <none>

That's it!

Test Setup - Degraded Mode - Real Drives
----------------------------------------

1. Check initial config of primary drive.

> fdisk /dev/hda

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 1870 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         3     24066   83  Linux
/dev/hda2             4        41    305235   82  Linux swap
/dev/hda3            42      1870  14691442+  83  Linux

2. Partition secondary drive.

> fdisk /dev/hdd

Command (m for help): p

Disk /dev/hdd: 16 heads, 63 sectors, 29805 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-29805, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-29805, default 29805): +25M

Command (m for help): p

Disk /dev/hdd: 16 heads, 63 sectors, 29805 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdd1             1        51     25672+  83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (52-29805, default 52):
Using default value 52
Last cylinder or +size or +sizeM or +sizeK (52-29805, default 29805): +300M

Command (m for help): p

Disk /dev/hdd: 16 heads, 63 sectors, 29805 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdd1             1        51     25672+  83  Linux
/dev/hdd2            52       661    307440   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (662-29805, default 662):
Using default value 662
Last cylinder or +size or +sizeM or +sizeK (662-29805, default 29805):
Using default value 29805

Command (m for help): p

Disk /dev/hdd: 16 heads, 63 sectors, 29805 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdd1             1        51     25672+  83  Linux
/dev/hdd2            52       661    307440   83  Linux
/dev/hdd3           662     29805  14688576   83  Linux

Command (m for help): a
Partition number (1-4): 1

Command (m for help): p

Disk /dev/hdd: 16 heads, 63 sectors, 29805 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdd1   *         1        51     25672+  83  Linux
/dev/hdd2            52       661    307440   83  Linux
/dev/hdd3           662     29805  14688576   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

3. Make a filesystem on the /boot partition of the secondary drive
and mount it.

> mke2fs /dev/hdd1
mke2fs 1.27 (8-Mar-2002)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
6432 inodes, 25672 blocks
1283 blocks (5.00%) reserved for the super user
First data block=1
4 block groups
8192 blocks per group, 8192 fragments per group
1608 inodes per group
Superblock backups stored on blocks:
        8193, 24577

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

> mount /dev/hdd1 /mnt/drive2

4. Install grub on the /boot partition of the secondary drive.

> grub-install --root-directory=/mnt/drive2 /dev/hdd
Probing devices to guess BIOS drives. This may take a long time.
Installation finished. No error reported.
This is the contents of the device map /mnt/drive2/boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0)   /dev/fd0
(hd0)   /dev/hda
(hd1)   /dev/hdd

> cd /mnt/drive2
> ln -s ./boot/grub grub

5. Copy kernel files to /boot partition of the secondary drive.

> cp /boot/* /mnt/drive2/
> cp /boot/grub/grub.conf /mnt/drive2/grub
> cd /mnt/drive2/grub
> ln -s ./grub.conf ./menu.lst
> cd ..
> umount /dev/hdd1

6. Make the swap partition on the secondary drive.

> mkswap /dev/hdd2
Setting up swapspace version 1, size = 307436K

7. Set up the /etc/raidtab file in degraded mode:

  raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdd3
        raid-disk       0
        device          /dev/hda3
        failed-disk       1

8. Build the array:

> mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/hdd3, 14688576kB, raid superblock at 14688512kB
disk 1: /dev/hda3, failed
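
The "raid superblock at" figure is consistent with the 0.90 persistent
superblock layout: the device size is rounded down to a multiple of 64 kB and
the superblock takes the last 64 kB of that rounded size. A sketch of the
arithmetic, with a hypothetical function name:

```shell
# Offset in kB of a 0.90-format RAID superblock on a device of SIZE_KB:
# round the size down to a multiple of 64 kB, then step back 64 kB.
superblock_offset_kb() {
    echo $(( $1 / 64 * 64 - 64 ))
}

superblock_offset_kb 14688576   # size of /dev/hdd3, prints 14688512
```

The same number is the usable size of the array, which is why /proc/mdstat
later reports 14688512 blocks for md0.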

9. Put ext3 filesystem on the array:

> mkfs -t ext3 /dev/md0
mke2fs 1.27 (8-Mar-2002)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1836928 inodes, 3672128 blocks
183606 blocks (5.00%) reserved for the super user
First data block=0
113 block groups
32768 blocks per group, 32768 fragments per group
16256 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

10. Copy contents of failed drive to array:

> mkdir /mnt/raid
> mount /dev/md0 /mnt/raid/
> cd /
> find . -xdev | cpio -pm /mnt/raid

11. Create the initrd images:

> mkinitrd --preload raid1 --with=raid1 initrd-2.4.18-3.img 2.4.18-3
> cp initrd-2.4.18-3.img /mnt/drive2/
> cp /boot/initrd-2.4.18-3.img /boot/initrd-2.4.18-3.img.bak
> cp initrd-2.4.18-3.img /boot

12. Set partition type on secondary drive:

> fdisk /dev/hdd

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd

Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

13. Edit /etc/fstab on the array:

Change the root (/) entry to name the device explicitly (/dev/md0) instead of
using a LABEL= reference.
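
Assuming the root entry currently starts with LABEL=/ and the array is
/dev/md0, the change can be sketched as follows. set_root_device is a
hypothetical helper; it prints the new fstab rather than editing in place, and
it re-joins the fields of the changed line with single spaces.

```shell
# Print an fstab with the device field of the root (/) entry replaced.
# The input file is not modified; fields on the changed line are
# re-joined with single spaces.
# Usage: set_root_device DEVICE FSTAB_FILE
set_root_device() {
    awk -v dev="$1" '
        $1 !~ /^#/ && $2 == "/" { $1 = dev }
        { print }
    ' "$2"
}
```

For example: set_root_device /dev/md0 /mnt/raid/etc/fstab > /tmp/fstab.new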

14. REBOOT

15. Edit /etc/raidtab and change failed drive to active.

  raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size     4
        persistent-superblock 1
        device          /dev/hdd3
        raid-disk       0
        device          /dev/hda3
        raid-disk       1

16. Change partition type of failed drive using fdisk.

> fdisk /dev/hda

The number of cylinders for this disk is set to 1870.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 1870 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         3     24066   83  Linux
/dev/hda2             4        41    305235   82  Linux swap
/dev/hda3            42      1870  14691442+  83  Linux

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

17. REBOOT

18. Hot add the failed drive

> raidhotadd /dev/md0 /dev/hda3

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hda3[2] hdd3[0]
      14688512 blocks [2/1] [U_]
      [>....................]  recovery =  0.3% (48704/14688512) finish=161.6min speed=1509K/sec
unused devices: <none>

> cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hda3[1] hdd3[0]
      14688512 blocks [2/2] [UU]

unused devices: <none>

DONE !!!

Boot Considerations
-------------------

Current                             Future
-------                             ---------
/dev/hda1 -> grub boot partition -> /dev/hda1 or /dev/hdc1
/dev/hda2 -> root partition      -> /dev/md0
/dev/hdc1 -> unused
/dev/hdc2 -> unused

/dev/md0 = /dev/hda2 + /dev/hdc2 (RAID 1 Mirror)

Issue 1: RAID is provided as modules but is needed before
the root filesystem is mounted.

 mkinitrd --with=<module> <ramdisk name> <kernel>

 mkinitrd --preload raid5 --with=raid5 raid-ramdisk 2.2.5-22

An initrd line pointing at the new ramdisk must be set up in the
GRUB configuration for this to work.

Issue 2: The BIOS is set to boot from /dev/hda1. How do we boot
from /dev/hdc1 with the root set to /dev/md0?

Solution: The search order for bootable drives in the BIOS should be
set to /dev/hda and then /dev/hdc. This may be automatic.

Current GRUB Config

 # grub.conf generated by anaconda
 #
 # Note that you do not have to rerun grub after making changes to this file
 # NOTICE:  You have a /boot partition.  This means that
 #          all kernel and initrd paths are relative to /boot/, eg.
 #          root (hd0,0)
 #          kernel /vmlinuz-version ro root=/dev/hda2
 #          initrd /initrd-version.img
 #boot=/dev/hda
 default=0
 timeout=10
 splashimage=(hd0,0)/grub/splash.xpm.gz
 title Red Hat Linux (2.4.20-13.7)
        root (hd0,0)
        kernel /vmlinuz-2.4.20-13.7 ro root=/dev/hda2
        initrd /initrd-2.4.20-13.7.img

New GRUB Config

 # grub.conf generated by anaconda
 #
 # Note that you do not have to rerun grub after making changes to this file
 # NOTICE:  You have a /boot partition.  This means that
 #          all kernel and initrd paths are relative to /boot/, eg.
 #          root (hd0,0)
 #          kernel /vmlinuz-version ro root=/dev/hda2
 #          initrd /initrd-version.img
 #boot=/dev/hda
 default=0
 timeout=10
 splashimage=(hd0,0)/grub/splash.xpm.gz
 title Red Hat Linux (2.4.20-13.7) Primary
        root (hd0,0)
        kernel /vmlinuz-2.4.20-13.7 ro root=/dev/md0
        initrd /initrd-2.4.20-13.7.img
 title Red Hat Linux (2.4.20-13.7) Backup
        root (hd1,0)
        kernel /vmlinuz-2.4.20-13.7 ro root=/dev/md0
        initrd /initrd-2.4.20-13.7.img
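
A quick sanity check on the new config is to list the root= argument of every
kernel line and confirm that each entry points at /dev/md0. This is a sketch;
kernel_root_args is a hypothetical helper name.

```shell
# Print the root= argument of every kernel line in a grub.conf.
# Commented-out lines are skipped because their first field is "#".
# Usage: kernel_root_args GRUB_CONF
kernel_root_args() {
    awk '$1 == "kernel" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^root=/) print substr($i, 6)
    }' "$1"
}
```

For example: kernel_root_args /boot/grub/grub.conf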

Setup Plan
----------

1. Configure the /etc/raidtab for a degraded raid 1 mirror using
/dev/hda2 and /dev/hdc2 with /dev/hdc2 as the active device and /dev/hda2
as a failed device.

2. Create new initrd with raid1 support. Reboot and verify that
new initrd is working.

3. Install modified grub on /dev/hdc1 and /dev/hda1.

4. Copy contents of hda2 to hdc2.

5. Reboot and verify that md0/hdc2 is working.

6. Hot add hda2 to the md0 array.

And that should do the trick!