When Mirrors Break: RAID1 failure and recovery

A couple of years ago I added a second drive to my server in a RAID1 (mirror) configuration. Originally I was using the single drive for logs, but with a more durable mirror setup I moved more (important) data to it.

RAID is not a backup story; if you really care about the data, you want to back it up. I learned two hard lessons from this recent failure (and my recovery). Two bits of data on this mirrored volume that are valuable to me are email and photoprism storage (but not the photos themselves). Stupidly, I did not have regular backups of either of these. Please learn from my mistake.

The two lessons I hope to learn from this are:

  1. Back up your data, even a bad backup is better than nothing
  2. Do not ignore any signs of problems, replace any suspicious hardware ASAP

If you read the comments on my previous post, you will see a history of minor failures that I clearly willfully ignored. I mean, hey – it’s a mirrored setup and mostly I had 2 drives working fine.. right? Stupid me.

The replacement 500GB SSD drive cost me $56.49 taxes in, and it even has a 5 year manufacturer warranty compared to the 3 year warranty on the failed ADATA drive. Sadly, checking the ADATA warranty shows me it made it just past the 3 year mark (not that a ‘free’ replacement drive would fix my problem).

While ADATA has been mostly reliable for me in the past, I’ll pick other brands for my important data. The ADATA products are often very cheap which is attractive, but at the current cost of SSDs it’s easy to pay for the premium brands.

Here is a brief replay of how the disaster rolled out. The previous day I had noticed that something was not quite right with email, but restarting things seemed to resolve the issue. The next morning email wasn’t flowing, so there was something wrong.

Looking at the logs, I was seeing a lot of “structure needs cleaning” messages – which is an indicator that there is some sort of ext4 filesystem problem and it needs a check run to clean things up. It also appeared that the ADATA half of the mirror had failed in some way. Rebooting the system seemed like a good idea, and everything seemed to come back.

Checking the logs for the mail system showed all was well, but then I checked email on my phone, and there were no messages? Stupidly I then opened up my mail client on my laptop, which then proceeded to synchronize with the mail server and delete all of the email stored on my laptop to mirror the empty mailbox on the server.

What was wrong? It took a while, but I figured out that my RAID1 array had completely failed to initialize, both volumes were marked as ‘spare’.
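A minimal sketch of how this state could be spotted quickly, assuming the usual /proc/mdstat layout where an array that failed to start is listed as "inactive" and its members carry an (S) flag. The function name and its optional file argument are my own, added so it can be run against a saved copy of mdstat.

```shell
# Print the name of every md array that mdstat reports as inactive.
# $1 is an optional (hypothetical) path override; default is /proc/mdstat.
inactive_arrays() {
    # Array lines look like: "md0 : inactive sde1[0](S) sdf1[2](S)"
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }' \
        "${1:-/proc/mdstat}"
}

# Example: warn if anything is inactive
#   [ -n "$(inactive_arrays)" ] && echo "WARNING: inactive md array(s)"
```

Running a check like this after a reboot, before opening any mail client, would have shown the problem before the empty mailbox got synced anywhere.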

Ugh, well that explains what happened. When the system rebooted, the mount failed – and my mail server just created new data directories on the empty mount point (which is on my root volume).
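One way to guard against this failure mode is to refuse to start a service unless its data directory is really a mounted volume. This is only a sketch, assuming mountpoint(1) from util-linux is available; the function name, the path and the start command in the usage comment are placeholders, not my actual setup.

```shell
# Succeed only when the given directory is an active mount point.
require_mount() {
    if mountpoint -q "$1"; then
        return 0
    fi
    echo "refusing to start: $1 is not mounted" >&2
    return 1
}

# Hypothetical usage in a wrapper script:
#   require_mount /srv/mail && exec start-mail-server
```

On a systemd-based system, the RequiresMountsFor= unit setting is another way to express the same dependency.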

At this point I realize I’m in a bad place, having potentially flushed decades of email. Have I mentioned that running your own email is a bad idea?

Time to start capturing things for recovery. I made a copy of each of the two drives using dd.
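A minimal sketch of this kind of dd imaging, not necessarily the exact invocation I used. The function name and the paths in the usage comment are placeholders. The conv=noerror,sync options are standard dd: they keep the copy going past unreadable blocks and pad them with zeros.

```shell
# Copy a block device (or any file) to an image, continuing past read errors.
# conv=noerror keeps going after a failed read; conv=sync pads each short
# or failed block with zeros so offsets in the image match the source.
image_drive() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=65536 conv=noerror,sync
}

# Hypothetical usage (as root, destination on a different disk):
#   image_drive /dev/sde /mnt/backup/sde.img
```

Note that conv=sync pads the final block too, so the image can end up slightly larger than a source that is not a multiple of the block size. For a drive with hard read errors, a purpose-built tool such as GNU ddrescue may be a better fit than plain dd.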

In the process of doing this, it became obvious that sdf (the ADATA drive) had hard read errors, whereas I was able to complete the image creation of sde (a Kingston drive).

Once I had some time to think about the situation, I was able to re-add the good drive to the array to make it become active. This let me mount the volume and make a copy of the email for backup purposes. Once this was done, I unmounted the volume and ran fsck -y /dev/md0 to fix all of the filesystem errors.

I then stopped the currently running mail server, renamed the mount point directory to keep the email that had come into the system while I was doing repairs, and created a new (empty) mount point. Then a reboot.

Sigh of relief as all of my mail appeared back. Sure, I’m running with a degraded RAID1 array and the fsck clearly removed some corrupted files, but at least the bulk of my data is back.

Fixing the broken mirror was relatively straightforward. I bought a new drive. Then I captured the output of ls /dev/disk/by-id/ before powering down the system and physically swapping the bad drive for the new drive. I could then repeat the ls /dev/disk/by-id/ and look at the diffs. This allowed me to see the new drive appear and inspect which drive letter it mapped to.
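The before/after comparison can be sketched as a small helper. The function name and the file names in the usage comment are placeholders; it only assumes each file holds one directory entry per line, which is what ls produces when redirected to a file.

```shell
# Print entries present in the "after" listing but not in the "before" one.
# $1 and $2 are files holding the output of `ls /dev/disk/by-id/`.
new_disk_ids() {
    # -F fixed strings, -x whole-line match, -v invert: keep unmatched lines
    grep -Fxv -f "$1" "$2"
}

# Hypothetical usage:
#   ls /dev/disk/by-id/ > before.txt     # before powering down
#   ls /dev/disk/by-id/ > after.txt      # after swapping the drive
#   new_disk_ids before.txt after.txt
#   readlink -f /dev/disk/by-id/<new-entry>   # shows the /dev/sdX it maps to
```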

Nice, it appears to have slotted in just where the previous ADATA drive was, not important but comforting. I then dumped the fdisk information of the healthy Kingston drive.

We want our new drive to be partitioned the same way, luckily the new SSD is even bigger. Mostly this is accepting defaults with the exception of typing in the last sector to match the Kingston drive.

This is similar to the steps in my original RAID1 creation post, but we can now skip to step 8 and add the new volume.
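A sketch of what adding a partition to an existing array looks like with mdadm, assuming the array is /dev/md0. The partition name is a placeholder, and the DRY_RUN switch is a hypothetical safety helper in this sketch, not an mdadm feature.

```shell
# Echo the command instead of running it when DRY_RUN=1
# (a hypothetical switch for this sketch only).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add a partition to an existing md array; md then rebuilds onto it.
add_to_mirror() {
    array=$1
    partition=$2
    run mdadm --manage "$array" --add "$partition"
}

# Hypothetical usage (as root); double-check the device name first:
#   DRY_RUN=1; add_to_mirror /dev/md0 /dev/sdf1   # just print the command
#   DRY_RUN=0; add_to_mirror /dev/md0 /dev/sdf1   # actually add it
```

Printing the command first is a cheap check that the drive letter is the new drive and not the surviving half of the mirror.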

And that’s it, now we just wait for the mirror to re-sync. It is interesting to note that while I can talk about the device ‘by-id’, mdstat uses the legacy drive letters.
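To keep an eye on the rebuild without reading the whole of mdstat, the percentage can be pulled out with a small helper. This sketch assumes the usual progress line format ("recovery = NN.N% ..." or "resync = NN.N% ..."); the function name and optional file argument are my own.

```shell
# Print the progress percentage of any running md recovery or resync.
# $1 is an optional (hypothetical) path override; default is /proc/mdstat.
resync_progress() {
    awk '{
        # Find the "recovery = 12.6%" or "resync = 12.6%" triple on the line
        for (i = 1; i <= NF - 2; i++)
            if (($i == "recovery" || $i == "resync") && $(i + 1) == "=")
                print $(i + 2)
    }' "${1:-/proc/mdstat}"
}

# Example: poll once a minute until nothing is rebuilding
#   while [ -n "$(resync_progress)" ]; do resync_progress; sleep 60; done
```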

And a short while later, it’s nearly done.

At this point my email appears to be working correctly. The ext4 filesystem corruption I blame on the failing ADATA drive in the mirror, but this is a guess. The corruption caused a few emails to be ‘lost’, but had a bigger impact on the photoprism data, which in part was the mariadb storage. I also noticed that both my prometheus data and mimir data were corrupted; neither of these is critical though.

Backups are good, they don’t have to be perfect – future you will be thankful.

Pixel 4a Screen Protector

The Pixel 4a continues to be my “daily driver”. I still mostly only need to charge it every 2 days, but by the second day the battery is well into the red and I’ve needed to top up to make it through. Using Android Auto in the car (wired) has also changed things a little, as my phone is getting charged while I drive. Still, on a full battery I can go all day.

Of course, battery life is completely related to usage. I have a very modest number of apps, and I spend all day attached to a keyboard so I’m not using the phone very much at all.

I’ve had a screen protector on the phone from day one. My preference, and it seems to be where the industry has gone too, is to have a ‘tempered glass’ screen protector. This particular brand doesn’t even have a selfie camera hole – it’s just a rounded rectangle of glass. I bought these on eBay way back in October 2020 – the listing is still active. I’d recommend this vendor as the product I got was very good, they also carry many other sizes for other phones.

I’ve also got a bumper case on the phone which has saved it from many a drop. I finally got unlucky and dropped it 4 feet onto ceramic tile and the screen protector cracked.

This wasn’t the first tumble onto hard tile, but it finally landed the wrong way and cracked the screen protector. I will say that after being tossed around and living in my pocket for years, the screen protector itself was still in good shape.

As you can see, the damage to the screen protector was pretty obvious.

Since this was a 2 pack of protectors, I had another one waiting to go. Peeling the old one off revealed the pristine 4a screen, exactly what I want to protect.

The protector ships with a couple of generic wipes. After sitting around for a couple of years the wet wipe had dried out. I didn’t need much cleaning power anyways so I just gave it a good wipe down with the dry one.

The screen protector itself has a protective sticker only on one side. This is the screen side. I like to leave it in the foam sleeve until I’m about to install as that helps reduce dust. The install kit comes with a couple of stickers that you use as a ‘hinge’ once you’ve placed the new screen on the phone (with protective sticker still on). Once the hinge is set up, you lift the screen and peel off the protective sticker.

Let the clean screen flop down on the clean phone, and watch the magic ‘bonding’ happen. If you’ve managed to stay dust free, it’ll be a nice clean match up and you’re good to go. I wasn’t so lucky this time.

Yup, a dreaded dust blob between the screen protector and the phone screen. Along with the guide stickers (hinges) you get a dust remover sticker. Gently peel the new screen up; it’ll stay attached due to the sticker hinges. Then dab at the dust blob – in my case it was stuck to the new screen protector, but you can do both sides (gently). The dust remover sticker should pick up the dust and leave a clean surface behind. Re-flop the screen and if you’ve not introduced more dust it should be good to go. Carefully remove the hinge stickers and put the case back on.

Here is a good YouTube video on the hinge method for screen installs.

If you need to put a screen on without the stickers – just use scotch tape. It works exactly the same. You want to avoid touching anything directly with your fingers (which are slightly greasy). I’ve installed many screen protectors, and it does get easier – but even someone doing it the first time can succeed if you go slowly and try to be in a dust free location. One hint would be to do this in the bathroom just after you’ve had a shower – the moisture in the air tends to cut down on dust.

For me, screen protectors work well. I’d rather scratch/crack the screen protector than risk a ding in my phone screen. In the past, I’ve used screen protectors to cover up / mask scratches in the screen of a used phone I’ve bought – so even if you have a scratch, a screen protector can help make your phone seem new.

GL.iNet GL-AR300M16 with OpenWRT 22.03.05

When travelling I usually just deal with the internet situation that is provided. I’ve got wireguard if I want to have ad blocking or reach my home network. The other day I got looking at travel routers, and while TP-Link has some popular ones, the GL.iNet devices seem to have more flash and RAM for basically the same prices.

The GL.iNet AR300M16 was under $40 on amazon.ca, and it shipped (free) in a few days. Look at it, very tiny and cute – but more powerful than the Netgear WNR3500L that I’ve used in the past. The USB power supply I’m using is larger than the router.

Of course, I selected this device with OpenWRT in mind. While the stock firmware has some really nice features as a travel router – I think I can achieve the same things with plain old OpenWRT. The GL.iNet device family apparently uses an OpenWRT base and customizes it. There are a number of GL.iNet devices documented on the OpenWRT site, but nothing specific for the AR300M16. The AR300M is close, but has a different flash module setup.

The first thing I did was just connect to the device, both wireless and wired. I knew that the OpenWRT install was going to require a wired only connection so I wanted to make sure that the laptop I was using was going to be able to successfully connect to the stock firmware over wire.

I was impressed at the quality of the user interface. I may have to give the stock firmware a proper try, but first let’s flash OpenWRT to it.

This turns out to be very easy. The stock firmware ‘local’ upgrade process will accept a .bin file. The OpenWRT firmware selector gives us an easy way to find a compatible firmware for the “GL.iNet GL-AR300M16” device.

I started with the Kernel image. This is the recommended path for moving from stock as it’s a smaller image. The stock firmware was happy to accept this .bin file as an upload, but warned me that I was treading in dangerous waters.

No problem, I know what I’m doing (so I told myself). I hit “Install” and off we went. I did make sure that before I flashed the firmware I was using a quality USB power supply that could deliver more than 2A.

This went smoothly, but the IP address of the router changed from 192.168.8.1 to 192.168.1.1. This is a difference between the stock firmware defaults and the OpenWRT defaults.

I then used the OpenWRT firmware upgrade to flash the sysupgrade image. This went smoothly as well. Now I have a teeny tiny router with OpenWRT installed.

Next I need to figure out how I want to configure this particular device to be my travel router, allowing me to connect my devices to it – and have it use another wifi network as the upstream. Then explore adding some ad blocking and some other nice features.