NixOS with mirrored ZFS boot volume


My past experiments with ZFS were also based on NixOS, a Linux distribution that I am growing increasingly fond of. It has a declarative style of configuration, which means you can more easily reproduce your install. For my new server build-out I will be using NixOS as my base, layering on plenty of Docker containers to run the various services I self-host, and the filesystems will be ZFS with redundancy. In this post I will focus on the base install, specifically the setup of the hardware for the very first NixOS install.

First I needed to learn how to get NixOS installed on a mirrored ZFS boot volume. Recall that my hardware has a pair of M.2 SSDs which will hold the base operating system. There are many possible approaches documented: ZFSBootMenu looked really interesting, but it's not native NixOS; I found a GitHub project that had some nice scripts, but it was a little too script-y for me and hard to understand; the OpenZFS site documents installing NixOS, including support for mirrored boot drives, but I wasn't sure how to mirror the EFI filesystem or recover after a drive failure; the NixOS forum was informative; and many of the other places that talk about this type of install point at a blog post, which I ended up using as my base.

While the original author does the partitioning and filesystem setup as a set of manual steps, I’ve captured mine as an install script. I had a side journey investigating disko which looks great but was one more thing to learn, and I already know how to cook a script.

When editing the /mnt/etc/nixos/configuration.nix file you will need to add some specific sections to ensure that you get a working ZFS-enabled system on the other side.

First we need to change from the systemd-boot EFI boot loader to a grub-based boot. This lets us make use of the grub.mirroredBoots support to keep both EFI partitions updated. We also specify the mirrored boot devices by UUID, and tag both drives as “nofail” to allow the system to boot even if we lose a drive. Of course we also need to indicate that ZFS is supported, and create the unique networking.hostId that ZFS needs.
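To make that concrete, here is a minimal sketch of the relevant configuration.nix fragment. The UUIDs, the hostId value, and the /boot-fallback name are placeholders I've chosen for illustration; substitute the values from your own /dev/disk/by-uuid listing.

    # Sketch only -- UUIDs and hostId below are placeholders.
    boot.supportedFilesystems = [ "zfs" ];
    networking.hostId = "deadbeef";          # any unique 8-hex-digit value

    # Switch from systemd-boot to grub so both EFI partitions can be mirrored.
    boot.loader.systemd-boot.enable = false;
    boot.loader.grub = {
      enable = true;
      efiSupport = true;
      efiInstallAsRemovable = true;
      mirroredBoots = [
        { devices = [ "nodev" ]; path = "/boot";          efiSysMountPoint = "/boot"; }
        { devices = [ "nodev" ]; path = "/boot-fallback"; efiSysMountPoint = "/boot-fallback"; }
      ];
    };

    # Mount both EFI partitions by UUID, tagged nofail so a missing drive
    # does not block the boot.
    fileSystems."/boot" = {
      device = "/dev/disk/by-uuid/AAAA-AAAA";
      fsType = "vfat";
      options = [ "nofail" ];
    };
    fileSystems."/boot-fallback" = {
      device = "/dev/disk/by-uuid/BBBB-BBBB";
      fsType = "vfat";
      options = [ "nofail" ];
    };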

There are other default settings you will probably want to change: set a hostname and timezone, and enable one of the networking options. You may also want to define a user and enable sshd. These are all pretty standard NixOS configuration items; all of the ZFS magic is captured above.

Once you’ve got this configured, you should be able to run nixos-install and reboot into your new system.

One nice trick I learned is that from the full graphical installer you can open a terminal shell and run sudo passwd to set a password for root. This allows ssh access (as root) from another machine, making it much easier to copy in your install script and configuration.
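For example (the IP address and file names below are only illustrations; use whatever the installer reports and whatever you called your files):

    # in a terminal on the graphical installer
    sudo passwd              # set a temporary root password
    ip addr                  # note the installer's IP address

    # from another machine on the network
    scp install.sh configuration.nix root@192.168.1.50:/root/
    ssh root@192.168.1.50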

Details on the install script

While there are comments, let me walk through the install script above. You will need to edit the DISK1 and DISK2 declarations at the top of the script. This assumes you have two identically sized drives you intend to use as a boot volume.

You will notice that, relative to the source blog post, I'm not including encryption in my setup. For me this is an acceptable risk: it removes the need to enter a password on boot, and I need my system to come back automatically after a power failure.

This setup doesn't include swap, and I went down a deep rabbit hole on swap. Did I need it? How much swap should you set up? My original naive NixOS install (without ZFS) using the graphical installer resulted in no swap being set up. The GitHub project above suggests setting up swap on each of the mirrors, but I'm worried about what happens if you lose a drive. I found someone on reddit suggesting a partitioning scheme for ZFS that has no swap unless absolutely necessary. Then I found folks who said you must have swap to avoid problems, and another who claimed that swap helps avoid performance problems. This great post gives some good reasons to consider using swap. I also found a Red Hat post suggesting that some production systems don't use swap, and even some database-related performance posts saying to avoid swap. After all that, while there are some downsides to not having swap, I decided to stick with a no-swap setup for now. I can always toss in a cheap SSD later and use it as swap if I end up needing it.

You may also notice that the partitioning scheme above is very simple. We reserve 1G for the EFI partition, and the remainder is for the ZFS mirror. It turns out that the two M.2 SSDs I bought don't entirely agree on what 1TB means.

Yup, one thinks 1TB is 1024 GB, and the other says it's 1000 GB. Sadly, both are right. The good news is that ZFS seems happy to pair these two unequal partitions together and offer up mirrored storage, since we pass the -f flag to force it, so I'll rely on that rather than trying to partition both drives to exactly the same size.
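For reference, the partitioning portion of my script boils down to something like this sketch (DISK1 and DISK2 are the variables mentioned above; the type codes are the standard EFI and Solaris/ZFS ones, and the p1/p2 suffixes assume NVMe-style device naming):

    # wipe and partition both drives: 1G EFI, remainder for the ZFS mirror
    for disk in "$DISK1" "$DISK2"; do
      sgdisk --zap-all "$disk"
      sgdisk -n1:1M:+1G -t1:EF00 "$disk"   # EFI system partition
      sgdisk -n2:0:0    -t2:BF00 "$disk"   # rest of the drive for ZFS
    done

    # each drive gets its own FAT32 EFI filesystem
    mkfs.vfat -n EFI  "${DISK1}p1"
    mkfs.vfat -n EFI2 "${DISK2}p1"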

The zpool create is where I differ from the source blog. It took a while for me to land on the right set of options, and maybe they don't matter all that much.

ashift=12 This instructs ZFS to assume 4k sectors, and while my SSDs actually report 512-byte sectors it's OK to go bigger. Some NVMe drives can be reformatted to larger sectors, but they need to support it (mine do not). I did find a few posts that convinced me that 4k (ashift=12) was the best choice. If you think about how flash memory works, 4k (or even larger) makes a lot of sense.

autotrim=on I adopted this from the OpenZFS recommendations. After reading a bunch about it, it does seem like a good idea if you are using an SSD. Apparently you should also consider running a regular trim operation. In theory this will extend the lifetime of your SSD, and it has a minimal performance impact.

mountpoint=none Every root-on-ZFS article uses this, and given my past experience with ZFS filesystems auto-mounting, it makes sense. The root filesystem is special and we don't want it mounted in any other way.

relatime=on Some approaches recommend using atime=off for performance reasons. However, the downside is that you can break some software which requires atime (access time) to be correct (an example given was email). The relatime setting is an in-between. It will skip most atime updates, but still keep it mostly correct. This also lines up with the OpenZFS recommendations.

acltype=posixacl This is another setting that many configurations used. I did find a good blog article talking about it.

xattr=sa This is linked to the acltype setting above, but is also generally considered a good performance tweak.
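Putting all of those options together, the pool creation in my script looks roughly like the following (rpool and the p2 partition suffixes follow the partitioning sketch above; the dataset layout is just one common convention, not something the source blog prescribes):

    # -o sets pool properties, -O sets default filesystem properties
    zpool create -f \
      -o ashift=12 \
      -o autotrim=on \
      -O mountpoint=none \
      -O relatime=on \
      -O acltype=posixacl \
      -O xattr=sa \
      rpool mirror "${DISK1}p2" "${DISK2}p2"

    # a root dataset mounted explicitly for the installer
    zfs create -o mountpoint=legacy rpool/root
    mount -t zfs rpool/root /mnt

    # mount both EFI partitions so nixos-install can populate them
    mkdir -p /mnt/boot /mnt/boot-fallback
    mount "${DISK1}p1" /mnt/boot
    mount "${DISK2}p1" /mnt/boot-fallback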

Let’s talk about a few that I didn’t include from the source blog, aside from atime which I have already touched on.

compression=lz4 While you can find articles that recommend specifying this, the default compression setting is ‘on’ and I believe the default algorithm is lz4. I decided to go with the defaults, and if there is a slight performance penalty that's OK.

encryption=aes-256-gcm, keyformat=passphrase Hopefully it's obvious that these are both related to encrypting the data; for my use they would mean entering a passphrase on boot.

If we review the OpenZFS doc on NixOS, they have more options I have not specified. I didn’t go quite as deep on each of these, but when I did a review it seemed that many of those settings were aligned with the defaults.

canmount=off Seems to be almost a duplicate of mountpoint=none.

dnodesize=auto Almost seems like a good idea, especially as it's linked to xattr=sa, which might store lots of attributes needing more space than the default (legacy). This article has some details, and in the end they also elected not to use it.

normalization=formD I found a good post on ZFS settings overall, and this one is specific to unicode filenames. It almost convinced me I should reconsider and add this, but how many weird unicode filenames do I have anyway? And if they don't match, I can deal with the weirdness.

Recovering a broken mirror

While this is a lot of detail, we haven't yet touched on recovery from a failure. I did a lot of experiments using UTM to run a VM on my MacBook. This let me build VMs with dual NVMe drives and quickly iterate on both setup and recovery.

To simulate a failure, I would simply shut the VM down, delete one of the drives, and re-create a new drive. Starting the VM up again resulted in one of two things:

  1. The VM would boot, but take a bit longer than normal. Looking at zpool status would indicate that one of my mirrors was broken.
  2. The VM would dump me into the EFI Shell. This confused me as I hadn't navigated it before, but it was a simple matter of manually selecting the file/drive to boot, and I'd end up in the same state as (1).

Awesome, so my install does result in a system that will survive the loss of one of the mirrored drives. I did stumble quite a bit on fixing the mirror.

The first step is to re-partition the new drive.
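Here is roughly what that looks like; the /dev/nvme1n1 path is only an example, so check which device name the replacement drive actually received:

    DISK=/dev/nvme1n1                      # the replacement drive (example name)
    sgdisk --zap-all "$DISK"
    sgdisk -n1:1M:+1G -t1:EF00 "$DISK"     # recreate the 1G EFI partition
    sgdisk -n2:0:0    -t2:BF00 "$DISK"     # recreate the ZFS partition
    mkfs.vfat -n EFI2 "${DISK}p1"
    mount "${DISK}p1" /boot                # or /boot-fallback, see below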

Of course, the mount point may be /boot-fallback depending on which drive we have lost. Then, similar to my post on doing this with a RAIDZ, we simply run sudo zpool replace rpool (old device) (new device). Because this is a bit complicated, let me walk through an example.

We will start with a working system and look at the zpool status.

Now we will remove one of the drives and boot again. The failed-state boot will take some time (many minutes) and you need to be patient – the amazing thing is that it will eventually boot and everything works.

Now when we look at zpool status we can see the mirror is broken, but we do have a fully running system.

Time to do some re-partitioning using the script above (obviously with changes to address which mount point and drive). Pro-tip: you may need to install some of the partitioning tools: nix-shell -p gptfdisk parted

Now that our disk is partitioned, we can repair the pool.
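In my case that was a single command along these lines; the long number stands in for the GUID that zpool status shows for the missing device, and the device path is the new drive's ZFS partition (both are placeholders here):

    sudo zpool replace rpool 1234567890123456 /dev/nvme1n1p2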

Using zpool status we can verify that things are good again.

Now we just need to fix our boot problem. As I said, this tripped me up for a bit, but the NixOS Discourse was very helpful.

There are two references to the boot drives. One in /etc/nixos/configuration.nix, and one in /etc/nixos/hardware-configuration.nix. Both need to be updated to reflect the new drive.

We can fix the latter by doing the following.
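In my understanding this just means regenerating the file, which rewrites the by-uuid device references in hardware-configuration.nix:

    sudo nixos-generate-config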

Once we edit the /etc/nixos/configuration.nix to correctly reflect the new UUIDs for the drive(s), we can issue a rebuild, then reboot to test.
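Something like the following; the --install-bootloader flag is my addition to make sure grub gets rewritten to both EFI partitions, and it may be redundant depending on your setup:

    sudo nixos-rebuild boot --install-bootloader
    sudo reboot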

This final reboot should be normal speed, and we will see that both /boot and /boot-fallback are mounted automatically and fully populated.

Cool, so we have a basic NixOS install, running on a mirrored ZFS root filesystem – and we have some evidence that we can recover if a bad thing happens.

New Server Build (2025): Assembly

I’ll start the post off with the end result (pictured above). I don’t yet have the 8TB storage drives installed, but you can see that there is plenty of room for them. There are 3×3.5″ bays, and 3×5.25″ bays which can easily be adapted to store more 3.5″ drives.

The first thing I needed to do was disassemble the Thermaltake Versa H21 case. Both side panels have nice thumb screws to release them. The front panel does just pop off, but it was scary; you really feel like you are going to break something. I was able to get the bottom to come free easily, but the last connection at the top was very hard to release. It finally popped off – I hope I don't have to do that often.

Before I mount the motherboard (MB), I need to install the CPU. If you look carefully at the picture above you will see a very faint triangle on the lower left side. Remember that: it's pin 1, and we want to align it with pin 1 of the socket on the MB.

The MB itself is also not well marked, but there was a bit of documentation on where they expected pin 1 to go. They also marked it with a triangle (black on black) so you just have to line up the two triangles.

Above is the CPU in the correct orientation relative to the socket. If you look closely you'll see how this is confusing in person. Yes, you can see the faint triangle on the CPU, and if you zoom in you can see a black triangle on the socket cover. Oh, but the writing on the chip is rotated 180 degrees from the writing on the socket cover — so confusing.

Next we're adding the Thermalright Burst Assassin 120 SE heat sink. There is a large bracket you mount on the underside of the MB. Then you mount some spacers and brackets. I did find both the instructions provided and the packaging to be very clear – even though this heat sink can be used with several different socket types.

The heat sink is huge. I’m pretty sure this will keep things cool. It also seems to provide reasonable clearance for the RAM sockets. The MB also has a CPU fan header just to the right, almost perfectly placed for the fan power connector. If you look very closely, you’ll see that while the MB didn’t come with a lot of documentation, the markings on the board itself are nicely descriptive.

This is showing the order and pairing of the DDR5 modules. If you are installing a single stick, use A2. For a single pair use A2 and B2. I thought that was pretty slick. The Corsair Vengeance RAM clicked in nicely, I’m pretty happy with that selection.

The physical size of the M.2 SSD was surprising to me. It's just so small. The MB only provides a single heat spreader for the first M.2 slot. I suppose I could get an aftermarket one for the second, but I'll wait to see if heat is a problem.

There is reasonable room inside of the case to work. While the case has built in raised mounts for the motherboard, I had to add a few stand-offs (included) to adapt to my motherboard (mATX). There was little to no documentation, but having done this a few times – it’s mostly common sense. The included screws come in a single bag, and there is a mix of sizes / types. Again, if you have no experience doing this it may be mysterious as to which screw is used for which hole. There are at least 3 different threads / sizes provided and they are difficult to identify.

I'm not super happy about how the rear panel that came with the MB fit into the case; it fits and isn't coming out, but it did not really pop in nicely – it's more of a pressure fit. I'm not sure if this is due to the case, the MB, or both. One or two of the screws for mounting the MB felt like they stripped while I was installing things. Again, maybe this was user error, but it may be a lack of precision in the case.

Under the front panel is a filter, which supports a pair of 120mm fans. This is a nice snap in setup and the cables easily route to the side. On the topic of routing cables, I did find it quite easy to snake the various cables around the case and keep them mostly out of the way. The fact that the case isn’t flat on the sides assists here too. Zip ties are provided to keep things neat.

It's always a bit spooky to boot up the first time, but it came up without any drama. I needed to update the BIOS, which was more than a year out of date, and turn on XMP to move my memory speed up from ~4800 to ~5200. It runs nice and cool, and is quiet.

I’ll do a mini review of a subset of the components:

Thermalright Burst Assassin 120 SE [5/5 stars] This is a very reasonably priced air cooler, yet you get good documentation, everything fits like it should, and it feels solid once installed.

Gigabyte B760M DS3H AX [4/5 stars] I haven’t had a chance to really explore all of the options, but I knocked off a star for the minimal documentation provided – and the confusing CPU orientation information. For the price, it feels like it wouldn’t be all that hard to make this a better experience.

Thermaltake Versa H21 [4/5 stars] Given the value you are getting based on the price, and the number of drive bays – this might be the best choice for a home server. It easily fit my large cooler, cable routing options were good, it has great ventilation and is mostly tool free for common things. Negatives were the lack of documentation, janky MB mounts, and the scary front panel removal.

New Server Build (2025): Part Selection

It's been a while since I built a new server; the last one was back in 2016. I'm hoping the transition to a new server will be quicker this time, because it took me forever to migrate to the current one. Much has changed: most of my services now run in Docker containers, and things are much more organized. While there isn't anything wrong with the server today, I'm starting to run out of storage and I could use more compute power.

This new build should solve those problems. It will also let me upgrade my local backup server, which is currently my very old server build from 2009. Somehow I had decided in my head on about a $1000 budget, but things got fuzzy when I started to think about storage, as I was hoping to also fit my new storage drives into that price. I'd initially thought 4TB drives would be the sweet spot, but as time moved on I convinced myself that 8TB was the right choice as they come in at around $200 each.

The current server only has 120G as the boot volume (110G usable). I've got a RAID1 on top of two 480GB SSDs giving me 439G of storage, then a motley collection of storage drives managed by snapraid: 3TB and 6TB drives with an 8TB parity drive. I had done upgrades over time, moving the original 60GB boot volume up to 120GB, adding drives and upgrading sizes as I went along, but upgrades are both a monetary cost and a time investment.

Where do you even start with a home server build? For me, it is about storage. I need a place for my media collection, photos, backups, etc. I knew I wanted to shift to a ZFS-based solution based on some of my recent experiences. As mentioned above, initially I was thinking of a 6x4TB RAIDZ2; this would give me double parity (surviving 2 concurrent drive failures) and 16TB of usable storage. With 4TB drives at around the $120 price point, that seemed feasible. With larger drives, we get increased risk that something will go wrong – there is just so much data all on one device.

Two things happened to change my mind on the drive size. Looking at drive prices and filtering on ‘new’ only, we get the data in the screenshot above. It is very interesting that many of the best prices are for external units; this is why shucking drives is so popular. Keeping in mind that my total usable storage currently is under 11TB, getting a massive drive seems unreasonable. However, we can see 8TB drives in this list, and we have to go a long way down to find 4TB at $28.50/TB. The second reason is that ZFS introduced the ability to expand RAIDZ vdevs, meaning we can add drives later. This larger drive size also means I can have a 4x8TB RAIDZ2 with 16TB of usable storage, and in the future grow to 32TB by adding 2 more drives.

Next was the more difficult problem: building out a system to host these drives. I came across PCPartPicker, which was a great way to start exploring various options and pricing. I did find that you could get better prices by going directly to some stores' websites, but it certainly helped me narrow down my search more quickly. I must also call out PerfectMediaServer, which is another good starting point for inspiration. Did I want to find a used system close to what I wanted, or buy new? It took me a few days to wrangle this wide-open set of choices, focused simply on the CPU/Motherboard (MB), but eventually I landed on new being the better value in the long term. A bit more expensive, but not that much more.

I started out with a bias towards Intel; I was fairly sure I wanted on-board graphics to get QuickSync, which is similar to what I have now. I did seriously look at AMD systems, and there are many more options there. Then I found a good price on a 12th generation Core i5; a comparable-performance AMD system ended up about the same price, and the i5 had more cores. There are still lots of nicely priced AM4 socket setups with good performance (an improvement over what I have), and there are lots of CPU options that are lower cost but lack built-in graphics.

Eventually I landed on the i5 and had to pick a motherboard, which opened up an entirely new can of worms. Did I want DDR4 or DDR5? How many M.2 slots? Some MBs disable one of the SATA ports if you use the second M.2 slot. The 6xx chipset boards are cheaper, but lack capabilities (and might be going away?) – gah! Networking: 1Gbps or 2.5Gbps? How many SATA ports do I want? I will say that the sales folks at ShopRBC were very helpful via email in making suggestions as to what may or may not work well. I finally picked a 7xx chipset motherboard with a 2.5Gbps network port, two M.2 slots, 4 SATA ports, and DDR5 support.

This CPU+MB combination is excessive; I probably could have gotten away with less, or paid nearly the same price and added a lower-end graphics card. I'm still happy with the choices I made, and I landed in the right ballpark, but my choices are by no means the best choices – they are simply choices that will work. The whole Intel vs. AMD debate ends up being a wash too, because market pricing works out such that you get the same performance for the same price.

I still needed RAM, a cooler, and a couple of M2 SSDs, maybe a case? Luckily I have a ‘spare’ 650w power supply I bought from someone parting out a gaming system, so that saves some money.

One of my friends had an old AIO water cooler, but after checking it out, it didn't list the newer LGA 1700 socket, and the Wikipedia article called out that the mounting holes are different enough that it was too big a risk to re-use. As for air coolers, many recommend the dual-tower Thermalright Peerless Assassin 120 – but what a monster of a cooler that is, needing 157mm of clearance. After a lot of searching I landed on the Thermalright Burst Assassin 120 SE – a 6-pipe cooler, but only a single fan / tower. It's still pretty big, but came in at the right price point.

Like every component in this build, I had difficulty deciding how much RAM to put in. 32GB is the new normal, and the MB has 4 slots – but given that last time I did this I started with 16GB and never changed it, maybe I should leap to 64GB right away? While DDR5 is getting close to DDR4 pricing, it is still more expensive, and 64GB lands you at around $250 where 32GB is easily half of that. Then lightning struck and someone locally was selling a 2x32GB DDR5 kit for $150. This quickly solved my RAM dilemma.

Just like the RAM, I couldn't decide if I wanted my boot drive (M.2 form factor) to be 1TB or only 500GB. This boiled down to a choice between $100 and $150, as I plan to run a RAID1 boot volume. In the end it was a case of why the heck not (I'd already blown past my budget anyway), and I got a pair of 1TB drives from two different brands. I tried to stick with brand names vs. some of the ultra-budget options out there, for fear of making a bad reliability choice.

I do have an old case sitting around, but it's literally 25 years old, and cooling needs have changed a lot. I was originally looking very seriously at the Antec VSK4000E U3 case, but it turns out the maximum cooler height is 145mm and my cooler choice needs 148mm. Finding a reasonable price on a ‘modern’ case that has lots of drive bays is tricky. I finally came across the Thermaltake Versa H21.

I could go on about all of the back-and-forth I did trying to decide on each part of the build, and while I’ve done a subset of that here so far, I will spare everyone all of the details. Let’s look at the list of parts:

  • Intel Core i5-12600K 3.7 GHz 10-Core Processor $216.96
  • Thermalright Burst Assassin 120 SE 66.17 CFM CPU Cooler $29.90
  • Gigabyte B760M DS3H AX Micro ATX LGA1700 Motherboard $189.00
  • Corsair Vengeance 64 GB (2 x 32 GB) DDR5-5200 CL40 Memory $150 (cash)
  • Kingston NV3 1 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive $75.95
  • Lexar NQ700 1 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive $79.99
  • Thermaltake Versa H21 ATX Mid Tower Case $74.99
  • be quiet! Pure Wings 3 49.9 CFM 120 mm Fan (x2) $9.99

Most items I bought locally at CanadaComputers or ShopRBC, and for the most part those local stores had the best prices for me (partly because I could avoid shipping costs). The cooler was an Amazon purchase. After tax I ended up with a total cost of $926.05 (ouch), but this is a machine I'll probably use for at least 5 years, possibly closer to 10. It has 4x the memory of the current server, and is easily 2x faster. The same configuration via PCPartPicker is currently $1041.64 before tax, but prices change daily and the big delta is the RAM, which is over $300 right now.

Next up, putting this pile of parts together so I have a working machine.