NixOS + Docker with MacVLAN (IPv4)

I continue to make use of the docker macvlan network support, as it allows me to treat some of my containers as if they were virtual machines (VMs). Using this feature I can assign a container an IP address that is distinct from the host’s, while it remains just a container running on the host. I’ve written about creating one, and expanding it.

As I’m now building out a new server and have selected NixOS as my base, I need to make some changes to how I’ve set up the docker macvlan. This blog post captures those changes.

While NixOS supports the declaration of containers, I’m not doing that right now by choice. It’ll make my migration easier and I can always go back and refactor. Thus there are just two things I need to include in my NixOS configuration:

  1. Enable docker support
  2. Modify the host network to route to the macvlan network

The first (enable docker support) is so very easy with NixOS. You need a single line added to your /etc/nixos/configuration.nix:
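That line is the standard NixOS option for turning on the Docker daemon:

```nix
# Enable the Docker daemon.
virtualisation.docker.enable = true;
```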

You probably want to modify your user to be in the “docker” group allowing direct access to docker commands vs. needing to sudo each time.
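Something along these lines, where the username is a placeholder for your own:

```nix
# Add your user to the "docker" group (username is a placeholder).
users.users.alice.extraGroups = [ "docker" ];
```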

There is a third thing we need to do: create the docker macvlan network. I don’t have this baked into my NixOS configuration because I was too lazy to write an idempotent version of it and figure out where in the startup sequence it should run. It turns out to be just a one-line script:
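A sketch of that one-liner; the subnet, gateway, parent NIC (eno1), and network name (mynet) here are examples you would adjust for your own LAN:

```sh
# Create a macvlan network, reserving 192.168.1.64/30 (4 IPs) for containers.
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  --ip-range=192.168.1.64/30 \
  -o parent=eno1 \
  mynet
```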

Docker will persist this network configuration across reboots.

If you stop here, you will be able to create containers with their own IP addresses. I pass along these two docker command line options to create a container with its own IP:
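They look like this (using the example network name from above; the image name is a placeholder):

```sh
# Attach the container to the macvlan network with a fixed IP.
docker run -d --network mynet --ip 192.168.1.64 some-image
```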

The docker macvlan network I’ve defined has 4 IPs reserved, but you can specify a larger --ip-range if you want when you create the docker macvlan network.

However, if you did stop here, you would not be able to reach the container running on 192.168.1.64 from the host. This is the second change to our Nix configuration (modify the host network to route to the macvlan network). In my original post I used a script to create the route from host to container; since this wasn’t persistent, I needed to run that script after every boot.

One way to do a similar thing in NixOS is to create a systemd service. I explored this and got it working, but it wasn’t the best approach. NixOS has networking.macvlans, which is a more NixOS-y way to solve the problem; the very helpful community led me to it.
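Here is a minimal sketch of what that looks like, assuming the host NIC is eno1 and the container range is the 192.168.1.64/30 from earlier; the shim address 192.168.1.63 is just an unused LAN IP I picked for illustration:

```nix
# A macvlan shim interface on the host, bridged to the physical NIC.
networking.macvlans.macvlan0 = {
  interface = "eno1";   # physical NIC (assumption)
  mode = "bridge";
};

networking.interfaces.macvlan0.ipv4 = {
  # An otherwise-unused host address for the shim (assumption).
  addresses = [ { address = "192.168.1.63"; prefixLength = 32; } ];
  # Route the container ip-range via the shim so host -> container works.
  routes = [ { address = "192.168.1.64"; prefixLength = 30; } ];
};
```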

If you dig into the implementation (createMacvlanDevice, configureAddrs), you can get some insight into how this maps onto basically the same thing my boot-time script did.

This feels like much less of a hack than using a script. Both work, but the networking.macvlans approach is nice and clean. I should probably do the work to declare the docker macvlan inside my NixOS configuration to make this complete, but that’s a task for another day.

NixOS with mirrored ZFS boot volume


My past experiments with ZFS were also based on NixOS, a Linux distribution that I am growing increasingly fond of. It has a declarative style for configuration, which means you can more easily reproduce your install. For my new server build out, I will be using NixOS as my base, layering up plenty of docker containers to operate the various services I self-host – and the filesystems will be ZFS with redundancy. In this post I will focus on the base install, specifically the setup of the hardware for the very first NixOS install.

First I needed to learn how to get NixOS installed on a mirrored ZFS boot volume. Recall that my hardware has a pair of M2 SSDs which will hold the base operating system. There are many possible approaches documented:

  1. ZFSBootMenu looked really interesting, but it’s not native NixOS.
  2. A GitHub project had some nice scripts, but it was a little too script-y for me and was hard to understand.
  3. The OpenZFS site documents installing NixOS, including support for mirrored boot drives, but I wasn’t sure how to mirror the EFI filesystem or recover after a drive failure.
  4. The NixOS forum was informative.
  5. Many of the other places that talk about this type of install point at a blog post, which I ended up using as my base.

While the original author does the partitioning and filesystem setup as a set of manual steps, I’ve captured mine as an install script. I had a side journey investigating disko, which looks great but was one more thing to learn, and I already know how to cook a script.

When editing the /mnt/etc/nixos/configuration.nix file, you will need to add some specific sections to ensure that you get a working ZFS-enabled system on the other side.

First we need to change from the systemd-boot EFI boot loader to a grub-based boot. This lets us make use of the grub.mirroredBoots support to keep both EFI partitions updated. We are also specifying the mirrored boot devices by UUID, and tagging both drives as “nofail” to allow the system to boot even if we lose a drive. Of course we also need to indicate that ZFS is supported, and create a unique networking.hostId which ZFS needs.
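Here is a minimal sketch of those sections; the UUIDs and hostId are placeholders you would replace with your own values:

```nix
boot.supportedFilesystems = [ "zfs" ];
networking.hostId = "8f3cae11";  # placeholder; must be a unique 8-hex-digit id

# Switch from systemd-boot to grub so we can mirror the EFI partitions.
boot.loader.systemd-boot.enable = false;
boot.loader.efi.canTouchEfiVariables = true;
boot.loader.grub = {
  enable = true;
  efiSupport = true;
  mirroredBoots = [
    { devices = [ "nodev" ]; path = "/boot"; }
    { devices = [ "nodev" ]; path = "/boot-fallback";
      efiSysMountPoint = "/boot-fallback"; }
  ];
};

# Mount both EFI partitions by UUID, tagged nofail so a dead drive
# doesn't block booting (UUIDs are placeholders).
fileSystems."/boot" = {
  device = "/dev/disk/by-uuid/AAAA-AAAA";
  fsType = "vfat";
  options = [ "nofail" ];
};
fileSystems."/boot-fallback" = {
  device = "/dev/disk/by-uuid/BBBB-BBBB";
  fsType = "vfat";
  options = [ "nofail" ];
};
```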

There are other default settings you will probably want to change: the hostname, the timezone, and one of the networking options. You may also want to define a user and enable sshd. All of these are pretty standard NixOS configuration items; all of the ZFS magic is captured above.

Once you’ve got this configured, you should be able to run nixos-install and reboot into your new system.

One nice trick I learned: from the full graphical installer you can open up a terminal shell and run sudo passwd to set a password for root. This allows for ssh access (as root) from another machine, making it much easier to copy in your install script and configuration.

Details on the install script

While there are comments, let me walk through the install script above. You will need to edit the DISK1 and DISK2 declarations at the top of the script. This assumes you have two identically sized drives you intend to use as a boot volume.
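The key part of the script looks roughly like this; the device paths are examples, and the destructive sgdisk calls assume the drives are blank or expendable:

```sh
#!/usr/bin/env bash
set -euo pipefail

# Edit these for your hardware (examples only).
DISK1=/dev/nvme0n1
DISK2=/dev/nvme1n1

for DISK in "$DISK1" "$DISK2"; do
  sgdisk --zap-all "$DISK"               # destroy existing partition tables
  sgdisk -n1:1M:+1G -t1:EF00 "$DISK"     # 1G EFI system partition
  sgdisk -n2:0:0    -t2:BF00 "$DISK"     # rest of the disk for ZFS
done

# Each EFI partition gets its own FAT filesystem (not mirrored by ZFS).
mkfs.vfat -F 32 "${DISK1}p1"
mkfs.vfat -F 32 "${DISK2}p1"
```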

You will notice that, relative to the source blog post, I’m not including encryption in my setup. For me this is an acceptable risk: it removes the need to enter a password on boot, and I need my system to come back automatically after a power failure.

This setup doesn’t include swap, and I went down a deep rabbit hole on swap: did I need it? How much swap should you set up? My original naive NixOS install (without ZFS) using the graphical installer resulted in no swap being set up. The GitHub project above suggests setting up swap on each of the mirrors, but I’m worried about what happens if you lose a drive. I found someone on reddit suggesting a partitioning scheme for ZFS that has no swap unless absolutely necessary. Then I found folks who said you must have swap to avoid problems. Another claimed that swap helps avoid performance problems. This great post gives some good reasons to consider using swap. I also found a RedHat post suggesting that some production systems don’t use swap, and even some database-related performance posts saying to avoid swap. After all that, while there are some downsides to not having swap, I decided to stick with a no-swap setup for now. I can always toss in a cheap SSD later and use it as swap if I end up needing it.

You may also notice that the partitioning scheme above is very simple. We reserve 1G for the EFI partition, and the remainder is for the ZFS mirror. It turns out that the two M2 SSDs I bought don’t entirely agree on what 1TB means.

Yup, one thinks 1TB is 1024 GB, and the other says it’s 1000 GB. Sadly, both are right. The good news is that ZFS is happy to pair these two unequal partitions together and offer up mirrored storage, since we are passing the -f flag to force it, so I’ll rely on that rather than trying to partition both drives to exactly the same size.

The zpool create is where I differ from the source blog. This took a while for me to land on the right set of options, and maybe they don’t matter all that much.
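Putting the options discussed below together, the zpool create in my script is roughly:

```sh
# Mirror the second partition of each drive; -f forces the pairing of
# the two not-quite-equal-sized partitions.
zpool create -f \
  -o ashift=12 \
  -o autotrim=on \
  -O mountpoint=none \
  -O relatime=on \
  -O acltype=posixacl \
  -O xattr=sa \
  rpool mirror "${DISK1}p2" "${DISK2}p2"
```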

ashift=12 This instructs ZFS to assume 4k sectors; while my SSDs actually report 512-byte sectors, it’s OK to go bigger. Some NVMe drives can be reformatted to larger sectors, but they need to support them (mine do not). I did find a few posts that convinced me that 4k (ashift=12) was the best choice. If you think about how flash memory works, 4k (or even larger) makes a lot of sense.

autotrim=on I adopted this from the OpenZFS recommendations. After reading a bunch about this, it does seem like a good idea if you are using an SSD. Apparently you should also consider running a regular trim operation as well. In theory this will extend the lifetime of your SSD, with minimal performance impact.

mountpoint=none Every root-on-ZFS article uses this, and given my past experience with ZFS filesystems auto-mounting, it makes sense. The root filesystem is special, and we don’t want it mounted in any other way.

relatime=on Some approaches recommend using atime=off for performance reasons. However, the downside is that you can break software which requires atime (access time) to be correct (an example given was email). The relatime setting is an in-between: it skips most atime updates but still keeps the value mostly correct. This also lines up with the OpenZFS recommendations.

acltype=posixacl This is another setting that many configurations used. I did find a good blog article talking about it.

xattr=sa This is linked to the acltype setting above, but is also generally considered a good performance tweak.

Let’s talk about a few that I didn’t include from the source blog, aside from atime which I have already touched on.

compression=lz4 While you can find articles that recommend specifying this, the default compression setting is ‘on’, and I believe the default algorithm is lz4. I decided to go with the defaults, and if there is a slight performance penalty that’s ok.

encryption=aes-256-gcm, keyformat=passphrase It’s hopefully obvious that these are both related to encryption of the data; for my use they would mean entering a passphrase on boot.

If we review the OpenZFS doc on NixOS, they specify more options that I have not. I didn’t go quite as deep on each of these, but when I did a review it seemed that many of those settings were aligned with the defaults.

canmount=off Seems to be almost a duplicate of mountpoint=none.

dnodesize=auto Almost seems like a good idea, especially as it’s linked to xattr=sa, which might store lots of attributes needing more space than the default legacy setting. This article has some details, and in the end they also elected not to use it.

normalization=formD I found a good post on ZFS settings overall, and this one is specific to unicode filenames. It almost convinced me I should reconsider and add this, but how many weird unicode filenames do I have anyway? And if they don’t match, I can deal with the weirdness.

Recovering a broken mirror

While this is a lot of detail, we haven’t yet touched on recovery from a failure. I did a lot of experiments using UTM to run a VM on my Macbook. This let me build VMs with dual NVME drives and quickly iterate on both setup and recovery.

To simulate a failure, I would simply shut the VM down, delete one of the drives, and re-create a new drive. Starting the VM up again resulted in one of two things:

  1. The VM would boot, but take a bit longer than normal. Looking at zpool status would indicate that one of my mirrors was broken.
  2. The VM would dump me into the EFI shell. This confused me, as I hadn’t navigated it before, but it was a simple matter of manually selecting the file/drive to boot, and I’d end up in the same state as (1).

Awesome, so my install does result in a system that will survive the loss of one of the mirrored drives. I did stumble quite a bit on fixing the mirror.

The first step is to re-partition the new drive:
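A sketch of the re-partitioning, mirroring the install script (the device path is an example for the replacement drive):

```sh
NEWDISK=/dev/nvme1n1   # the replacement drive (example)

sgdisk --zap-all "$NEWDISK"
sgdisk -n1:1M:+1G -t1:EF00 "$NEWDISK"   # EFI partition
sgdisk -n2:0:0    -t2:BF00 "$NEWDISK"   # ZFS partition

mkfs.vfat -F 32 "${NEWDISK}p1"
mount "${NEWDISK}p1" /boot
```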

Of course, the mount point may be /boot-fallback depending on which drive we have lost. Then, similar to my post on doing this with a RAIDZ, we simply do a sudo zpool replace rpool (target) (source). Because this is a bit complicated, let me walk through an example.

We will start with a working system and look at the zpool status

Now we will remove one of the drives and boot again. The failed-state boot will take some time (many minutes), so you need to be patient – the amazing thing is that it will eventually boot and everything works.

Now when we look at zpool status we can see the mirror is broken, but we do have a fully running system.

Time to do some re-partitioning using the script above (obviously with changes to address which mount point and drive). Pro-tip: you may need to install some of the partitioning tools: nix-shell -p gptfdisk parted

Now that our disk is partitioned, we can repair the pool:
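The replace command takes the failed device (as named in zpool status) and the new partition; the identifiers here are examples:

```sh
# <failed-device> is the old device id shown in `zpool status`.
sudo zpool replace rpool <failed-device> /dev/nvme1n1p2
```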

Using zpool status we can verify that things are good again.

Now we just need to fix our boot problem. As I said, this tripped me up for a bit, but the NixOS Discourse was very helpful.

There are two references to the boot drives. One in /etc/nixos/configuration.nix, and one in /etc/nixos/hardware-configuration.nix. Both need to be updated to reflect the new drive.

We can fix the latter by doing the following:
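Regenerating the hardware configuration picks up the new drive automatically; it rewrites /etc/nixos/hardware-configuration.nix based on the currently mounted filesystems (and leaves an existing configuration.nix alone):

```sh
sudo nixos-generate-config
```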

Once we edit /etc/nixos/configuration.nix to correctly represent the new UUIDs for the drive(s), we can issue a rebuild, then reboot to test.
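For example (nixos-rebuild boot makes the new configuration the default for the next boot without switching the running system):

```sh
sudo nixos-rebuild boot
sudo reboot
```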

This final reboot should be normal speed, and we will see that both /boot and /boot-fallback are mounted automatically and fully populated.

Cool, so we have a basic NixOS install, running on a mirrored ZFS root filesystem – and we have some evidence that we can recover if a bad thing happens.

New Server Build (2025): Assembly

I’ll start the post off with the end result (pictured above). I don’t yet have the 8TB storage drives installed, but you can see that there is plenty of room for them. There are 3×3.5″ bays, and 3×5.25″ bays which can easily be adapted to store more 3.5″ drives.

The first thing I needed to do was disassemble the Thermaltake Versa H21 case. Both side panels have nice thumb screws to release them. The front panel does just pop off, but it was scary – you really feel like you are going to break something. I was able to get the bottom to come off easily, but the last connection at the top was very hard to release. It finally popped off – I hope I don’t have to do that often.

Before I mount the motherboard (MB), I need to install the CPU. If you look carefully at the picture above you will see a very faint triangle on the lower left side. Remember that: it’s pin 1, and we want to align it with pin 1 of the socket on the MB.

The MB itself is also not well marked, but there was a bit of documentation on where they expected pin 1 to go. They also marked it with a triangle (black on black), so you just have to line up the two triangles.

Above is the CPU in the correct orientation relative to the socket. If you look closely you’ll see how this is confusing in person. Yes, you can see the faint triangle on the CPU, and if you zoom in you can see a black triangle on the socket cover. Oh, but the writing on the chip is rotated 180° from the writing on the socket cover – so confusing.

Next we’re adding the Thermalright Burst Assassin 120 SE heat sink. There is a large bracket you mount on the underside of the MB. Then you mount some spacers and brackets. I found both the instructions provided and the packaging to be very clear – even though this heat sink can be used with several different socket types.

The heat sink is huge. I’m pretty sure this will keep things cool. It also seems to provide reasonable clearance for the RAM sockets. The MB also has a CPU fan header just to the right, almost perfectly placed for the fan power connector. If you look very closely, you’ll see that while the MB didn’t come with a lot of documentation, the markings on the board itself are nicely descriptive.

This is showing the order and pairing of the DDR5 modules. If you are installing a single stick, use A2. For a single pair use A2 and B2. I thought that was pretty slick. The Corsair Vengeance RAM clicked in nicely, I’m pretty happy with that selection.

The physical size of the M2 SSD was surprising to me. It’s just so small. The MB only provides a single heat spreader for the first M2 slot. I suppose I could get an aftermarket one for the second but I’ll wait to see if heat is a problem.

There is reasonable room inside the case to work. While the case has built-in raised mounts for the motherboard, I had to add a few stand-offs (included) to adapt to my motherboard (mATX). There was little to no documentation, but having done this a few times, it’s mostly common sense. The included screws come in a single bag, and there is a mix of sizes / types. Again, if you have no experience doing this, it may be mysterious which screw is used for which hole. There are at least 3 different threads / sizes provided, and they are difficult to tell apart.

I’m not super happy about how the rear panel that came with the MB fit into the case. It fits and isn’t coming out, but it did not really pop in nicely – it’s more of a pressure fit. I’m not sure if this is due to the case, or the MB, or both. One or two of the screws for mounting the MB felt like they stripped while I was installing things. Again, maybe this was user error – but it may be a lack of precision in the case.

Under the front panel is a filter, which supports a pair of 120mm fans. This is a nice snap in setup and the cables easily route to the side. On the topic of routing cables, I did find it quite easy to snake the various cables around the case and keep them mostly out of the way. The fact that the case isn’t flat on the sides assists here too. Zip ties are provided to keep things neat.

It’s always a bit spooky to boot up the first time, but it came up without any drama. I needed to update the BIOS, which was more than a year out of date, and turn on XMP to move my memory speed up from ~4800 to ~5200. It runs nice and cool, and is quiet.

I’ll do a mini review of a subset of the components:

Thermalright Burst Assassin 120 SE [5/5 stars] This is a very reasonably priced air cooler, but you get good documentation, everything fits like it should and it feels solid once installed.

Gigabyte B760M DS3H AX [4/5 stars] I haven’t had a chance to really explore all of the options, but I knocked off a star for the minimal documentation provided – and the confusing CPU orientation information. For the price, it feels like it wouldn’t be all that hard to make this a better experience.

Thermaltake Versa H21 [4/5 stars] Given the value you are getting based on the price, and the number of drive bays – this might be the best choice for a home server. It easily fit my large cooler, cable routing options were good, it has great ventilation and is mostly tool free for common things. Negatives were the lack of documentation, janky MB mounts, and the scary front panel removal.