Replacing a ZFS degraded device

It was no surprise that a new RAIDZ array built out of decade old drives was going to have problems, I didn’t expect the problems to happen quite so quickly, but I was not surprised. This drive had 4534 days of power on time, basically 12.5 years. It was also manufactured in Oct 2009, making it 14.5 years old.

I had started to backup some data to this new ZFS volume, and upon one of the first scrub operations ZFS flagged this drive as having problems.

$ zpool status 
  pool: backup
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 908K in 13:29:51 with 0 errors on Tue Apr  9 06:59:00 2024
config:

        NAME                                          STATE     READ WRITE CKSUM
        backup                                        DEGRADED     0     0     0
          raidz1-0                                    DEGRADED     0     0     0
            ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946  ONLINE       0     0     0
            wwn-0x50014ee2b0706857                    ONLINE       0     0     0
            wwn-0x50014ee2adfa14f6                    ONLINE       0     0     0
            wwn-0x50014ee2ae38ab42                    FAULTED     36     0     0  too many errors

errors: No known data errors

$ zpool status

pool: backup

state: DEGRADED

status: One or more devices are faulted in response to persistent errors.

Sufficient replicas exist for the pool to continue functioning in a

degraded state.

action: Replace the faulted device, or use 'zpool clear' to mark the device

repaired.

scan: scrub repaired 908K in 13:29:51 with 0 errors on Tue Apr 9 06:59:00 2024

config:

NAME STATE READ WRITE CKSUM

backup DEGRADED 0 0 0

raidz1-0 DEGRADED 0 0 0

ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946 ONLINE 0 0 0

wwn-0x50014ee2b0706857 ONLINE 0 0 0

wwn-0x50014ee2adfa14f6 ONLINE 0 0 0

wwn-0x50014ee2ae38ab42 FAULTED 36 0 0 too many errors

errors: No known data errors

The degraded device, maps to /dev/sdg – I determined this by looking a the /dev/disk/by-id/wwn-0x50014ee2ae38ab42 link.

On one of my other systems I’m using snapraid.it, which I quite like. It has a SMART check that does a calculation to indicate how likely the drive is to fail. I’ve often wondered how accurate this calculation is.

SnapRAID SMART report:

   Temp  Power   Error   FP Size
      C OnDays   Count        TB  Serial                Device    Disk
 -----------------------------------------------------------------------
     28   4534       0 100%  1.0  WD-WCAV53163713       /dev/sdg  -
     24   3803       0  84%  1.0  WD-WMC0T0683946       /dev/sdd  -
     23   4156       0  84%  1.0  WD-WCAZA6813339       /dev/sde  -
     27   4740       0  97%  1.0  WD-WCAV51778566       /dev/sdf  -

The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.

Probability that at least one disk is going to fail in the next year is 100%.

SnapRAID SMART report:

Temp Power Error FP Size

C OnDays Count TB Serial Device Disk

-----------------------------------------------------------------------

28 4534 0 100% 1.0 WD-WCAV53163713 /dev/sdg -

24 3803 0 84% 1.0 WD-WMC0T0683946 /dev/sdd -

23 4156 0 84% 1.0 WD-WCAZA6813339 /dev/sde -

27 4740 0 97% 1.0 WD-WCAV51778566 /dev/sdf -

The FP column is the estimated probability (in percentage) that the disk

is going to fail in the next year.

Probability that at least one disk is going to fail in the next year is 100%.

The nice thing is you don’t need to be using snapraid to get the SMART check data out, it’s a read only activity based on the devices. In this case it has decided the failing drive has 100% chance of failure, so that seems to check out.

Well, as it happens I had a spare 1TB drive on my desk so it was a matter of swapping some hardware. I found a very useful blog post covering how to do it, and will replicate some of the content here.

As I mentioned above, you first need to figure out which device it is, in this case it is /dev/sdg. I also want to figure out the serial number.

$ sudo smartctl -a /dev/sdg | grep Serial
Serial Number:    WD-WCAV53163713

1 2	$ sudo smartctl -a /dev/sdg \| grep Serial Serial Number: WD-WCAV53163713

Good, so we know the serial number (and the brand of drive), but when you’ve got 4 identical drives, which of the 4 is the right serial number? Of course, I ended up pulling all 4 drives before I found the matching serial number. The blog post gave some very good advice.

Before I configure an array, I like to make sure all drive bays are labelled with the corresponding drive’s serial number, that makes this process much easier!

Every install I make will now follow this advice, at least for ones with many drives. My system now looks like this thanks to my label maker

I’m certain future me will be thankful.

Because the ZFS array had marked this disk as being in a FALTED state, we do not need to mark it ‘offline’ or anything else before pulling the drive. If we were swapping an ‘online’ disk we may need to do more before pulling the drive.

Now that we’ve done the physical swap, we need to get the new disk added to the pool.

The first, very scary thing we need to do is copy the partition from an existing drive in the vdev. The new disk is the TARGET, and an existing disk is SOURCE.

# Check twice, you really don't want to mess this up
# sudo sgdisk --replicate /dev/TARGET /dev/SOURCE

$ sudo sgdisk --replicate /dev/sdg /dev/sdf

# Check twice, you really don't want to mess this up

# sudo sgdisk --replicate /dev/TARGET /dev/SOURCE

$ sudo sgdisk --replicate /dev/sdg /dev/sdf

Once the partition is copied over, we want to randomize the GUIDs as I believe ZFS relies on unique GUIDs for devices.

# Again, taking care that the device is the TARGET (aka: new drive)

$ sudo sgdisk --randomize-guids /dev/sdg

# Again, taking care that the device is the TARGET (aka: new drive)

$ sudo sgdisk --randomize-guids /dev/sdg

This is where my steps deviate from the referenced blog post, but the changes make complete sense. When I created this ZFS RAIDZ array I used the short sdg name for the device. However, as you can see after a reboot the zpool command is showing me the /dev/disk/by-id/ name.

# sudo zpool replace backup OLD NEW

$ sudo zpool replace backup /dev/disk/by-id/wwn-0x50014ee2ae38ab42 /dev/sdg

# sudo zpool replace backup OLD NEW

$ sudo zpool replace backup /dev/disk/by-id/wwn-0x50014ee2ae38ab42 /dev/sdg

This worked fine. I actually had a few miss-steps trying to do this, and zpool gave me very friendly and helpful error messages. More reason to like ZFS as a filesystem.

$ zpool status backup -v
  pool: backup
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 11 09:12:10 2024
        18.6G / 2.15T scanned at 359M/s, 0B / 2.15T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                          STATE     READ WRITE CKSUM
        backup                                        DEGRADED     0     0     0
          raidz1-0                                    DEGRADED     0     0     0
            ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946  ONLINE       0     0     0
            wwn-0x50014ee2b0706857                    ONLINE       0     0     0
            wwn-0x50014ee2adfa14f6                    ONLINE       0     0     0
            replacing-3                               DEGRADED     0     0     0
              wwn-0x50014ee2ae38ab42                  OFFLINE      0     0     0
              sdg                                     ONLINE       0     0     0

$ zpool status backup -v

pool: backup

state: DEGRADED

status: One or more devices is currently being resilvered. The pool will

continue to function, possibly in a degraded state.

action: Wait for the resilver to complete.

scan: resilver in progress since Thu Apr 11 09:12:10 2024

18.6G / 2.15T scanned at 359M/s, 0B / 2.15T issued

0B resilvered, 0.00% done, no estimated completion time

config:

NAME STATE READ WRITE CKSUM

backup DEGRADED 0 0 0

raidz1-0 DEGRADED 0 0 0

ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946 ONLINE 0 0 0

wwn-0x50014ee2b0706857 ONLINE 0 0 0

wwn-0x50014ee2adfa14f6 ONLINE 0 0 0

replacing-3 DEGRADED 0 0 0

wwn-0x50014ee2ae38ab42 OFFLINE 0 0 0

sdg ONLINE 0 0 0

Cool, we can see that ZFS is repairing things with the newly added drive. Interestingly it is shown as sdg currently.

This machine is pretty loud (it has a lot of old fans), so I was pretty wild and powered it down while the ZFS was trying to resilver things. When I rebooted it after relocating it to where it normally lives and the noise won’t bug me, it seems that the device naming has sorted itself out.

$ zpool status backup -v
  pool: backup
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 11 09:12:10 2024
        22.9G / 2.15T scanned at 244M/s, 0B / 2.15T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                          STATE     READ WRITE CKSUM
        backup                                        DEGRADED     0     0     0
          raidz1-0                                    DEGRADED     0     0     0
            ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946  ONLINE       0     0     0
            wwn-0x50014ee2b0706857                    ONLINE       0     0     0
            wwn-0x50014ee2adfa14f6                    ONLINE       0     0     0
            replacing-3                               DEGRADED     0     0    20
              wwn-0x50014ee2ae38ab42                  OFFLINE      0     0     0
              wwn-0x5000cca3a8d3fcdb                  ONLINE       0     0     0

$ zpool status backup -v

pool: backup

state: DEGRADED

status: One or more devices is currently being resilvered. The pool will

continue to function, possibly in a degraded state.

action: Wait for the resilver to complete.

scan: resilver in progress since Thu Apr 11 09:12:10 2024

22.9G / 2.15T scanned at 244M/s, 0B / 2.15T issued

0B resilvered, 0.00% done, no estimated completion time

config:

NAME STATE READ WRITE CKSUM

backup DEGRADED 0 0 0

raidz1-0 DEGRADED 0 0 0

ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946 ONLINE 0 0 0

wwn-0x50014ee2b0706857 ONLINE 0 0 0

wwn-0x50014ee2adfa14f6 ONLINE 0 0 0

replacing-3 DEGRADED 0 0 20

wwn-0x50014ee2ae38ab42 OFFLINE 0 0 0

wwn-0x5000cca3a8d3fcdb ONLINE 0 0 0

The snapraid SMART report now looks a lot better too

$ sudo snapraid smart
[sudo] password for roo: 
SnapRAID SMART report:

   Temp  Power   Error   FP Size
      C OnDays   Count        TB  Serial                Device    Disk
 -----------------------------------------------------------------------
     26   1416       0   4%  1.0  JPW9K0N21DZ2AE        /dev/sdg  -
     23   4158       0  84%  1.0  WD-WCAZA6813339       /dev/sdd  -
     23   3805       0  84%  1.0  WD-WMC0T0683946       /dev/sde  -
     25   4742       0  97%  1.0  WD-WCAV51778566       /dev/sdf  -

$ sudo snapraid smart

[sudo] password for roo:

SnapRAID SMART report:

Temp Power Error FP Size

C OnDays Count TB Serial Device Disk

-----------------------------------------------------------------------

26 1416 0 4% 1.0 JPW9K0N21DZ2AE /dev/sdg -

23 4158 0 84% 1.0 WD-WCAZA6813339 /dev/sdd -

23 3805 0 84% 1.0 WD-WMC0T0683946 /dev/sde -

25 4742 0 97% 1.0 WD-WCAV51778566 /dev/sdf -

It took about 9 hours to finish the resilvering, but then things were happy.

$ zpool status backup -v
  pool: backup
 state: ONLINE
  scan: resilvered 531G in 09:17:03 with 0 errors on Thu Apr 11 18:29:13 2024
config:

        NAME                                          STATE     READ WRITE CKSUM
        backup                                        ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946  ONLINE       0     0     0
            wwn-0x50014ee2b0706857                    ONLINE       0     0     0
            wwn-0x50014ee2adfa14f6                    ONLINE       0     0     0
            wwn-0x5000cca3a8d3fcdb                    ONLINE       0     0     0

errors: No known data errors

$ zpool status backup -v

pool: backup

state: ONLINE

scan: resilvered 531G in 09:17:03 with 0 errors on Thu Apr 11 18:29:13 2024

config:

NAME STATE READ WRITE CKSUM

backup ONLINE 0 0 0

raidz1-0 ONLINE 0 0 0

ata-WDC_WD10EARX-00N0YB0_WD-WMC0T0683946 ONLINE 0 0 0

wwn-0x50014ee2b0706857 ONLINE 0 0 0

wwn-0x50014ee2adfa14f6 ONLINE 0 0 0

wwn-0x5000cca3a8d3fcdb ONLINE 0 0 0

errors: No known data errors

Some folks think that you should not use RAIDZ, but create a pool with a collection of vdevs which are mirrors.

About 2 weeks later, I had a second disk go bad on me. Again, no surprise since these are very old devices. Here is a graph of the errors.

The zfs scrub ran on April 21st, and you can see the spike in errors – but clearly this drive was failing slowly all along as I was using it in this new build. This second failing drive was /dev/sdf – which if you look back at the snapraid SMART report, was at 97% failure percentage. It is worth noting that while ZFS and the snapraid SMART have both decided these drives are bad, I was able to put both drives into a USB enclosure and access them still – I certainly don’t trust these old drives to store data on them, but ZFS stopped using the device before it became unusable.

I managed to grab a used 1TB drive for $10. It is quite old (from 2012) but only has a 1.5yrs of power on time. Hopefully it’ll last, but at the price it’s hard to argue. Swapping that drive in was a matter of following the same steps. Having the drive bay labelled with the serial numbers was very helpful.

Since then, I’ve picked up another $10 1TB drive, and this one is from 2017 with only 70 days of power on time. Given I’ve still got two decade old drives in this RAIDZ, I suspect I’ll be replacing one of them soon. The going used rate for 1TB drives is between $10 and $20 locally, amazing value if you have a redundant layout.

Leave a Reply