mdadm Command: Tutorial & Examples

Managing RAID Devices

mdadm stands for Multiple Device Administration. It is a powerful command-line tool used in Linux to manage software RAID arrays. RAID, which stands for Redundant Array of Independent Disks, is a technique used to combine multiple physical storage devices into a logical unit for improved performance, data redundancy, or both. With mdadm, you can create, monitor, and manage RAID arrays, ensuring data integrity and availability.

Understanding RAID arrays

Before we dive into mdadm, let's briefly understand RAID arrays. A RAID array consists of two or more physical disks combined together to form a single logical unit. The data is distributed across these disks using different strategies known as RAID levels, such as RAID 0, RAID 1, RAID 5, RAID 6, and more.

RAID 0: Provides increased performance by striping data across multiple disks, but offers no redundancy.
RAID 1: Offers data redundancy by mirroring the data on multiple disks, providing fault tolerance.
RAID 5: Stripes data with distributed parity across at least three disks; the array survives the failure of any one disk.
RAID 6: Similar to RAID 5 but with double parity across at least four disks; the array survives the failure of any two disks.

Understanding these levels helps administrators choose the right configuration for their needs, balancing performance and redundancy.
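The capacity side of this trade-off can be sketched with a little arithmetic. The function below is a hypothetical helper (its name and interface are not part of mdadm); it assumes all disks are the same size and ignores metadata overhead.

```shell
# Usable capacity of an array built from NUM_DISKS equal-sized disks.
# usage: raid_usable_gib LEVEL NUM_DISKS DISK_SIZE_GIB
raid_usable_gib() {
    level=$1 n=$2 size=$3
    case $level in
        0) echo $(( n * size )) ;;          # striping: all capacity is usable
        1) echo "$size" ;;                  # mirroring: one disk's worth
        5) echo $(( (n - 1) * size )) ;;    # one disk's worth goes to parity
        6) echo $(( (n - 2) * size )) ;;    # two disks' worth go to parity
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable_gib 5 4 1000   # prints 3000
```

For example, four 1000 GiB disks give 3000 GiB in RAID 5 but only 2000 GiB in RAID 6, in exchange for tolerating a second disk failure.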

Why is mdadm important?

mdadm is the standard tool for managing software RAID arrays on Linux. It lets you create, assemble, and monitor arrays, whether the machine is a file server, a database server, or a web server.

With mdadm, you can perform various operations, including:

Creating new RAID arrays
Adding or removing disks from an existing array
Monitoring the status and health of RAID devices
Rebuilding an array after a failed disk has been replaced
Reshaping the array for capacity or performance changes
Handling RAID failover and recovery

Technical background

mdadm is a user-space tool that configures the Linux kernel's md (multiple devices) driver, which implements software RAID. Hardware RAID, by contrast, is handled by a dedicated controller and is not managed by mdadm. Because the array is defined in software, it can be created, reshaped, and modified without depending on a particular controller. The kernel reports the current state of all md arrays through the /proc/mdstat virtual file.
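As a rough illustration of reading /proc/mdstat, the sketch below lists arrays with a missing or failed member. The function name degraded_arrays is hypothetical, and the sample text is an illustrative stand-in in the /proc/mdstat layout, not output captured from a real system.

```shell
# Print the name of every array whose member-status field contains "_",
# which marks a missing or failed member (for example [_U] instead of [UU]).
# Reads /proc/mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[^ ]* *:/ { name = $1 }                      # "md0 : active raid1 ..."
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }  # "... [2/1] [_U]"
    '
}

# Illustrative sample (assumed values, two arrays, the second one degraded).
sample='Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[0]
      1046528 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdc2[1]
      2093056 blocks super 1.2 [2/1] [_U]

unused devices: <none>'

printf '%s\n' "$sample" | degraded_arrays   # prints md1
```

On a live system the same function could be fed the real file with degraded_arrays < /proc/mdstat.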

Common problems and pitfalls

While using mdadm can greatly enhance data storage management, several common issues may arise:

Disk failure: If a disk in the array fails, it is crucial to replace it promptly to avoid data loss.
Configuration errors: Incorrect parameters during RAID creation can lead to suboptimal performance or data loss.
Not monitoring: Failing to monitor the RAID status can lead to undetected failures and data integrity issues.
Inconsistent RAID levels: An md array has exactly one RAID level. Combining levels means nesting arrays (for example, striping across mirrors), which adds complexity and makes recovery harder to reason about.

Practical examples

Now, let's explore some practical examples to understand how to use mdadm effectively.

Example 1: Creating a RAID 1 array

To create a RAID 1 array, which provides data redundancy by mirroring, you can use the following command:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

In this example, we are creating a new RAID 1 array named /dev/md0 with two devices (/dev/sdb1 and /dev/sdc1).

Example 2: Monitoring RAID devices

You can use mdadm to check the status of your RAID devices. The following command prints a one-line summary of every active array, in the ARRAY format used by mdadm.conf:

mdadm --detail --scan

For the full state of a single array, including its member devices and any failed or spare disks, run mdadm --detail with the array name, for example mdadm --detail /dev/md0.

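Example 3: Replacing a failed disk and rebuilding the array

When a disk in your RAID array fails, you need to replace it and let the array rebuild. If the failed member is still listed in the array, mark it as faulty and remove it first with mdadm --manage /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1. Assuming /dev/sdb1 has failed, the disk has been replaced, and the new disk carries a partition of at least the same size, add it to the array with the following command:

mdadm --manage /dev/md0 --add /dev/sdb1

This command adds /dev/sdb1 to the RAID array /dev/md0, and the kernel starts rebuilding the array onto it.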
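While the rebuild runs, the kernel reports its progress in /proc/mdstat. The sketch below extracts that progress; sync_progress is a hypothetical helper name, and the sample text is an illustrative stand-in in the /proc/mdstat layout with assumed values.

```shell
# Print "<array> <operation> <percent>" for every array that is currently
# being recovered, resynced, reshaped, or checked.
# Reads /proc/mdstat-formatted text on stdin.
sync_progress() {
    awk '
        /^md[^ ]* *:/ { name = $1 }
        {
            op = ""; pct = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(recovery|resync|reshape|check)$/) op = $i
                # The percentage may be glued to the "=" sign, as in "=100.0%".
                if ($i ~ /^=?[0-9]+\.[0-9]+%$/) { pct = $i; sub(/^=/, "", pct) }
            }
            if (op != "" && pct != "") print name, op, pct
        }
    '
}

# Illustrative sample (assumed values): md0 is rebuilding onto sdb1.
sample='md0 : active raid1 sdb1[2] sdc1[1]
      1046528 blocks super 1.2 [2/1] [_U]
      [==>..................]  recovery = 12.6% (132096/1046528) finish=0.4min speed=33024K/sec'

printf '%s\n' "$sample" | sync_progress   # prints md0 recovery 12.6%
```

On a live system, sync_progress < /proc/mdstat prints nothing once the rebuild has finished.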

Example 4: Scanning devices and starting RAID

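If an existing array was not assembled automatically (for example, /dev/md0 is missing after boot), you can scan all devices for RAID members and start every array that is found: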

mdadm --assemble --scan

Example 5: Resizing a RAID array and adding a partition

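The filesystem on the array must not be mounted during this procedure. If the array holds the root filesystem, boot the server or VM into recovery mode first. The steps shrink the filesystem and the array, then resize the underlying partitions: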

Check the filesystem for errors:
```
e2fsck -f /dev/md2
```
Shrink the filesystem to somewhat less than the target array size, so that shrinking the array cannot cut into it:
```
resize2fs /dev/md2 25G
```
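Resize the RAID device. The --size value is the amount of space used on each member device, given in KiB (33554432 KiB is 32 GiB):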
```
mdadm --grow /dev/md2 --size=33554432
```
Grow the filesystem to fill the resized array:
```
resize2fs /dev/md2
```
Check the filesystem again:
```
e2fsck -f /dev/md2
```
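Mark one partition as failed and remove it from the RAID:
```
# --fail only marks the member as faulty; --remove detaches it from the array
mdadm /dev/md2 --fail /dev/sdb4 --remove /dev/sdb4
```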
Stop the RAID array:
```
mdadm --stop /dev/md2
```
Resize the partitions using fdisk or gdisk, then make the kernel re-read the partition table:
```
partprobe
```

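Add the partition to the RAID array. The array must be running for this, so if it is still stopped, reassemble it first with mdadm --assemble --scan. Zeroing the superblock clears the old RAID metadata from the partition before it is added back:

mdadm --zero-superblock /dev/sdb4
mdadm --manage /dev/md2 --add /dev/sdb4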

Watch the disks resyncing:
```
cat /proc/mdstat
```

Create a new RAID 1:

mdadm --create --verbose /dev/md3 --level=mirror --raid-devices=2 /dev/sda5 /dev/sdb5
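The --size value used in the resize step above is easy to get wrong because it is expressed in KiB. The sketch below shows the conversion and a sanity check that the shrunken filesystem fits inside the resized array. Both function names are hypothetical, and the check treats the 25G passed to resize2fs as 25 GiB and assumes a RAID 1 array, where the array size equals the per-member size.

```shell
# Convert a size in GiB to KiB, the default unit of mdadm --grow --size.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Succeed only if a filesystem of FS_GIB GiB fits inside SIZE_KIB KiB.
# usage: fs_fits FS_GIB SIZE_KIB
fs_fits() {
    [ "$(gib_to_kib "$1")" -le "$2" ]
}

gib_to_kib 32   # prints 33554432

fs_fits 25 33554432 && echo "25 GiB filesystem fits in the resized array"
```

Running a check like this before mdadm --grow helps avoid shrinking the array below the end of the filesystem.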

Example 6: Deleting a RAID volume

To delete a RAID volume, follow these steps:

Unmount the RAID volume:
```
umount /mnt/raidvolume
```
Stop the RAID array:
```
mdadm --stop /dev/md0
```
Zero the superblock on each former member device (repeat the command for every member):
```
mdadm --zero-superblock /dev/sda4
```

After that, remove the volume's entry from /etc/fstab and remove its ARRAY line from the mdadm configuration file (/etc/mdadm/mdadm.conf on Debian-based systems, /etc/mdadm.conf on many others).
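The configuration cleanup can be sketched as a small filter. drop_array_entry is a hypothetical helper, the UUIDs are placeholders, and the sketch assumes the entry is written with the full uppercase ARRAY keyword followed by the device name; it prints the filtered file rather than editing it in place, so the result can be reviewed first.

```shell
# Print CONF_FILE without the ARRAY line that names DEVICE.
# usage: drop_array_entry CONF_FILE DEVICE
drop_array_entry() {
    awk -v dev="$2" '!($1 == "ARRAY" && $2 == dev)' "$1"
}

# Demonstration on a scratch copy with placeholder entries.
conf=$(mktemp)
cat > "$conf" <<'EOF'
MAILADDR root
ARRAY /dev/md0 metadata=1.2 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md2 metadata=1.2 UUID=11111111:11111111:11111111:11111111
EOF

drop_array_entry "$conf" /dev/md0   # prints the MAILADDR line and the md2 entry
rm -f "$conf"
```

If the output looks right, it can be written to a temporary file and moved over the real configuration file.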

Common errors and troubleshooting

When using mdadm, you may encounter some common errors:

Array not found: This can occur if the RAID array has not been assembled. Run mdadm --assemble --scan to assemble every array that mdadm can identify.
Degraded array: If a disk fails, the array will operate in degraded mode. Replace the failed disk and rebuild the array to restore redundancy.
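Checksum errors: These may indicate data corruption. Running e2fsck on an unmounted ext filesystem can identify and repair filesystem-level damage, but it does not check the array itself. To verify that the members of a redundant array agree with each other, start a check by writing check to /sys/block/md0/md/sync_action and then read mismatch_cnt in the same directory.
Read-only array: If the array is in a read-only state, inspect it with mdadm --detail /dev/md0. If nothing else is wrong, mdadm --readwrite /dev/md0 switches it back to read-write; otherwise it may need to be stopped and re-assembled.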
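For array-level consistency problems, the md driver exposes a mismatch counter in sysfs after a check pass. The sketch below reads it; md_mismatch_count is a hypothetical helper, and its optional second argument (an alternative sysfs root) exists only so the function can be tried against a scratch directory.

```shell
# Print the mismatch count recorded for an array by its last check or repair.
# usage: md_mismatch_count ARRAY_NAME [SYSFS_ROOT]
md_mismatch_count() {
    cat "${2:-/sys}/block/$1/md/mismatch_cnt"
}

# On a real system (as root), a check would be started first:
#   echo check > /sys/block/md0/md/sync_action
#   md_mismatch_count md0

# Demonstration against a scratch directory that mimics the sysfs layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/block/md0/md"
echo 0 > "$scratch/block/md0/md/mismatch_cnt"
md_mismatch_count md0 "$scratch"   # prints 0
rm -rf "$scratch"
```

A non-zero count after a completed check is worth investigating further.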