Ubuntu adding a 2nd data drive as a mirror (RAID1)

Over the years I’ve had the expected number of hard drive failures. Some were more catastrophic because I didn’t have a good backup strategy in place; others felt avoidable if I’d paid attention to the warning signs.

My current setup for data duplication is based on SnapRAID, a non-traditional RAID solution. It allows mixed drive sizes, and replication is done by regularly running the sync operation. Mine runs daily: files are sync’d across the drives, and a data validation pass is done from time to time as well. This means that while I might lose up to 24hrs of data if the primary drive fails, I get lower usage of the main parity drive and assurance that file corruption hasn’t happened.

SnapRAID is a poor fit when you have either many small files or frequently changing files; it is ideal for backing up media like photos or movies. To deal with the more rapidly changing data I’ve got an SSD for storage. I haven’t yet had an SSD fail on me, but that is sure to happen at some point, and Backblaze is already seeing failure rate numbers that are concerning. Couple this with the fact that my storage SSD started throwing errors the other day and only a full power cycle of the machine brought it back – it’s fine now, but for how long? Time to set up a mirror.

For this storage I’m going back to traditional RAID. The SSD is a 480GB drive, and thankfully prices have dropped to easily under $70. This additional drive now fills all 6 of the SATA ports on my motherboard, so the next upgrade will need to be a SATA port expansion card. I’ve written about RAID a few times here.

I’ve moved away from specifying drives as /dev/sdbX because these values can change. Even adding this new SSD caused the drive that was at /dev/sdf to move to /dev/sdg, allowing the new drive to take /dev/sdf. My /etc/fstab is now set up using /dev/disk/by-id/xxx because those names are persistent. Most of the disk utilities understand this format just fine, as you can see in this example with fdisk.
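Something like this (the by-id name here is the new Kingston SSD that shows up later in this post; substitute your own drive’s id):

$ sudo fdisk -l /dev/disk/by-id/ata-KINGSTON_SA400S37480G_50026B77841D62E8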

Granted, working with /dev/disk/by-id is a lot more verbose – but that id will not change if you re-organize the SATA cables.

Let’s get going on setting up the new drive as a mirror for the existing one. Here’s the basic set of steps:

  1. Partition the new drive so it is identical to the existing one
  2. Create a RAID1 array in degraded state
  3. Format and mount the array
  4. Copy the data from the existing drive to the new array
  5. Un-mount both the array and the original drive
  6. Mount the array where the original drive was mounted
  7. Make sure things are good – the next step is destructive
  8. Add the original drive to the degraded RAID1 array making it whole

It may seem like a lot of steps, and some of them are scary – but on the other side we’ll have a software RAID protecting the data. The remainder of this post will be the details of those steps above.

Step 1 – Partitioning the new drive

Any time you’re about to partition (or re-partition) a drive, it is important to be careful; it is very easy to target the wrong device and end up with a big problem. Above you can see that running sudo fdisk -l /dev/disk/by-id/ata-KINGSTON_SA400S37480G_50026B77841D62E8 shows the new drive is not yet partitioned, which is a good way to confirm we have the right device. I also want to look at the drive we want to mirror and figure out how it is partitioned.

We want the new drive’s partition layout to match the existing drive’s once we are done.
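The rough shape of the fdisk session, as a sketch (the exact last sector depends on your existing drive’s layout):

$ sudo fdisk /dev/disk/by-id/ata-KINGSTON_SA400S37480G_50026B77841D62E8
# inside fdisk: n to create a new partition, accept the defaults for the
# partition number and first sector, type a last sector that matches the
# existing drive, then w to write the changes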

Note that I selected a non-default last sector to match the existing (smaller) drive. If the situation had been reversed, I’d be re-partitioning the existing drive to match the smaller new one. Net, to keep things sane with the mirror we want the same layout for both. It’s not a bad idea now to compare the partition layouts for the two drives to make sure we got this right.
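Running fdisk -l against both by-id names is an easy way to compare (here I’m assuming the existing drive is the ADATA SU650 that appears in the comments below; use whatever names show up on your system):

$ sudo fdisk -l /dev/disk/by-id/ata-KINGSTON_SA400S37480G_50026B77841D62E8
$ sudo fdisk -l /dev/disk/by-id/ata-ADATA_SU650_2K1220024459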

Step 2 – Create a RAID1 array

We are going to create a RAID1 array, but with a missing drive – thus it will be in a degraded state. For this we need to look in /dev/disk/by-id and select the partition name we just created in step 1. This will be -part1 at the end of the device name we used.

I can use /dev/md0 because this is the first RAID array on this system.
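A sketch of the create command, using the missing keyword to leave the second slot of the mirror empty for now:

$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/disk/by-id/ata-KINGSTON_SA400S37480G_50026B77841D62E8-part1 missing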

We can look at our array via /proc/mdstat — we should also expect to get an email from the system informing us that there is a degraded RAID array.
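Both of these are handy for a quick look (mdadm --detail gives the fuller picture):

$ cat /proc/mdstat
$ sudo mdadm --detail /dev/md0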

Step 3 – Format and mount

We can now treat the new /dev/md0 as a drive partition. This is standard Linux formatting and mounting. I’ll be using ext4 as the filesystem.

And now we mount it on the /mnt endpoint.
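Concretely, something like:

$ sudo mkfs.ext4 /dev/md0
$ sudo mount /dev/md0 /mnt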

Step 4 – Copy the data

For this step, I’m going to use rsync and probably run it multiple times, because right now I have a live workload changing some of those files. I’ll have to shut down all processes that are updating the original volume before doing the final rsync.
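Roughly what I run, assuming the original volume lives at /mounted/original (see step 6); adjust paths and flags to taste:

$ sudo rsync -avxHAX --delete /mounted/original/ /mnt/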

This will run for some time, depending on how much you’ve got on the original disk. Once it is done, shut down anything that might be changing the original drive and run the same rsync command again. Then you can move on to the next step.

Step 5 – Un-mount both the array and the original drive

Un-mounting /mnt was easy, because this was a new mount and my repeated rsync runs were the only thing targeting it.
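Nothing exciting there:

$ sudo umount /mnt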

In the previous step I’d already stopped the docker containers that were using the volume as storage, so I thought it’d be similarly trivial to unmount. I was wrong.

Tracking down what was preventing the unmount required digging through the verbose output of sudo lsof, which shows all open files. It turned out that I had forgotten about the logging agent I have running, which reads some log files that live on this storage. Once I’d stopped that process as well, I was good to go.
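The hunt looked roughly like this: grep for your mount point in the full lsof output, then unmount once nothing shows up.

$ sudo lsof | grep /mounted/original
$ sudo umount /mounted/original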

Step 6 – Mount the array where the original drive was mounted

This should be as easy as modifying /etc/fstab to point to /dev/md0 where we used to point to the physical disk by-id.
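The change is roughly this; the mount options shown here are just illustrative:

# old entry pointing at the physical partition by-id (yours will differ)
# /dev/disk/by-id/ata-XXXX-part1   /mounted/original   ext4   defaults   0   2
# new entry pointing at the RAID array
/dev/md0   /mounted/original   ext4   defaults   0   2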

Once /etc/fstab is fixed, we can just mount /mounted/original and restart everything we shut down.

Step 7 – Make sure things are good

At this point we have a degraded RAID1 array and a full (but aging) copy of the data on a second physical drive. This isn’t a bad place to be, but we should make sure that everything is working as expected. Check that your workloads aren’t generating unusual logs, and do whatever else you can think of to verify that the new copy of the data is good to go.

Step 8 – Complete the RAID1 array

We are now going to add the original drive to the RAID1 array, changing it from degraded to whole. This is a little scary because we are about to destroy the original copy of the data, but the trade-off is that we’ll end up with a resilient mirrored drive setup of the newly mounted /dev/md0 filesystem.

Again, you will notice that I’m using the by-id specification of the original drive partition which ends in -part1 as there is only one partition.
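As a sketch, it has the same shape as the re-add commands that show up in the comments below (here I’m using the ADATA by-id name from those comments; substitute your own original drive):

$ sudo mdadm /dev/md0 --add /dev/disk/by-id/ata-ADATA_SU650_2K1220024459-part1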

Once we’ve done this, we can monitor the progress of the two drives being mirrored (aka: recovering to RAID1 status).
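Either of these will do:

$ cat /proc/mdstat
$ watch -n 30 cat /proc/mdstat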

Once things have completed and the system is stable, a reboot isn’t a bad idea to ensure that everything will start up fine. This is generally a good thing to do whenever you are making changes to the system.

While this isn’t perfect protection from any sort of data loss, it should allow us to gracefully recover when one of the SSD drives stops working. Having a backup plan that you test regularly is a very good thing to add as another layer of data protection.

8 thoughts on “Ubuntu adding a 2nd data drive as a mirror (RAID1)”

  1. Nice write-up!

    I’ve been using e2label and mounting by label, rather than mounting by drive ID. It’s a little less verbose, and lets me use automated scripts with multiple copies of a backup disk that I iterate through (mount --label weeklybackup /mnt).

  2. Yikes – I got an email that indicated “A Fail event had been detected on md device /dev/md0.”

    Inspecting the system, the device that was the failed drive simply did not appear in the /dev/sdX list anymore. A reboot did not recover the drive, however a full power off / power on did.

    Once I could see the device, it was still marked as ‘removed’ from the array. Simply re-adding it resulted in mdadm reporting that the volume was re-added – avoiding a long re-sync.

    $ sudo mdadm /dev/md0 --add /dev/disk/by-id/ata-ADATA_SU650_2K1220024459-part1
    mdadm: re-added /dev/disk/by-id/ata-ADATA_SU650_2K1220024459-part1

    The array looks happy – but you can launch an array check manually

    $ sudo /usr/share/mdadm/checkarray /dev/md0

    And then monitor progress

    $ cat /proc/mdstat

    Still, it might be time to retire the drive – or get a spare.

  3. Hmm. Again the same drive appears to have failed and I get emails indicating the drive array is degraded. Power off / power on of the server and the drive comes back – but the RAID array has removed the device.

    Re-adding it as per above and it re-sync’d just fine.. scary. FWIW this is an ADATA drive – I probably will shy away from this brand for important storage needs in the future.

  4. Boo.. 3rd time this has happened. Same ADATA drive getting stuck. It seems to need a hard power off / on to come back. Then I need to re-add the drive to the array

    $ sudo mdadm /dev/md0 --re-add /dev/disk/by-id/ata-ADATA_SU650_2K1220024459-part1

    I’m not sure if --add or --re-add is the right way.. I’ll have to read up on this. Note: two dashes precede the option.

    Then kick the array to check itself and we’re good to go.

    $ sudo /usr/share/mdadm/checkarray /dev/md0

  5. Oh oh – that pesky ADATA drive failed again.

    This time just a full power off / power on seems to have restored the array to a healthy state. I kicked off a check to make sure all is well.

    $ sudo /usr/share/mdadm/checkarray /dev/md0

  6. Again the ADATA drive took a holiday.

    P.S. The /proc/mdstat file currently contains the following:

    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid1 sde1[0] sdf1[2](F)
    468717568 blocks super 1.2 [2/1] [U_]
    [===================>.] check = 99.9% (468717568/468717568) finish=0.0min speed=15548K/sec
    bitmap: 3/4 pages [12KB], 65536KB chunk

    Again, power off and power on.. and we’re back.. ran a checkarray just in case..

  7. Oops.. not so fast – I just got another degraded array error. Boo.

    Hmm, but the checkarray is still working away.. maybe it’ll just take a while for the mirror to repair itself?

    md0 : active raid1 sdf1[2] sde1[0]
    468717568 blocks super 1.2 [2/2] [UU]
    [>………………..] check = 4.4% (20911552/468717568) finish=91.6min speed=81399K/sec
    bitmap: 2/4 pages [8KB], 65536KB chunk

    yup.. seems the check completed just fine – and my array is all good to go.

  8. Again – one of the drives failed and I got an email “Fail event on /dev/md0 … ”

    My solution this time was to just ‘sudo poweroff’ then go hit the button to start things up once it had shut down. This is probably the simplest/quickest way to recover.. I may just need to replace that drive, SSDs are certainly cheap enough now.

    After the boot – I got an email “DegradedArray event on /dev/md0..” with some details about the recovery

    A DegradedArray event had been detected on md device /dev/md0.

    Faithfully yours, etc.

    P.S. The /proc/mdstat file currently contains the following:

    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid1 sdf1[2] sde1[0]
    468717568 blocks super 1.2 [2/1] [U_]
    [==>………………] recovery = 14.3% (67200192/468717568) finish=30.4min speed=219440K/sec
    bitmap: 4/4 pages [16KB], 65536KB chunk

    unused devices: <none>
