NixOS + Docker with MacVLAN (IPv4)

I continue to make use of the docker macvlan network support as it allows me to treat some of my containers as if they were virtual machines (VMs). Using this feature I can assign an IP address that is distinct from my host’s, while still just running a container on the host. I’ve written about creating one and expanding it.

As I’m now building out a new server and have selected NixOS as my base, I need to make some changes to how I’ve set up the docker macvlan. This blog post captures those changes.

While NixOS supports the declaration of containers, I’m not doing that right now by choice. It’ll make my migration easier and I can always go back and refactor. Thus there are just two things I need to include in my NixOS configuration:

  1. Enable docker support
  2. Modify the host network to route to the macvlan network

The first (enable docker support) is very easy with NixOS. You need a single line added to your /etc/nixos/configuration.nix.
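That line is the standard NixOS option for enabling the docker daemon:

```nix
# /etc/nixos/configuration.nix
virtualisation.docker.enable = true;
```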

You probably want to modify your user to be in the “docker” group allowing direct access to docker commands vs. needing to sudo each time.
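A sketch of that change, assuming a user named “mike” (the username is illustrative):

```nix
users.users.mike.extraGroups = [ "docker" ];
```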

There is a third thing we need to do: create the docker macvlan network. I don’t have this baked into my NixOS configuration because I was too lazy to write an idempotent version and figure out where in the start up sequence to run it. This turns out to be just a one line script:
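A sketch of that one-liner; the subnet, gateway, ip-range, parent interface, and network name are all illustrative values, not the ones from my network:

```shell
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  --ip-range=192.168.1.64/30 \
  -o parent=eno1 \
  my-macvlan
```

The /30 ip-range reserves 4 IPs for containers on this network.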

Docker will persist this network configuration across reboots.

If you stop here, you will be able to create containers with their own IP addresses. I pass along these two docker command line options to create a container with its own IP:
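The two options in question, shown with illustrative values (the network name and image are assumptions, not from the post):

```shell
docker run -d --network=my-macvlan --ip=192.168.1.64 some/image
```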

The docker macvlan network I’ve defined has 4 IPs reserved, but you can specify a larger ip-range when you create the docker macvlan network.

However, if you did stop here, you would not be able to reach the container running on 192.168.1.64 from the host. This is the second change to our Nix configuration (modify the host network to route to the macvlan network). In my original post I used a script to create the route from host to container; as this wasn’t persistent, I needed to run that script after every boot.

One way to do a similar thing in NixOS is to create a systemd service, and I did get that approach working. However, it wasn’t the best way to do it. NixOS has networking.macvlans, which is a more NixOS-y way to solve the problem. The very helpful community helped me discover this.
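A sketch of what that looks like, assuming a parent NIC named eno1, a shim interface I’ve called macvlan-shim, and a /30 reserved for containers; all names and addresses here are illustrative:

```nix
# Create a macvlan device on the host so it can reach the container IPs.
networking.macvlans.macvlan-shim = {
  interface = "eno1";
  mode = "bridge";
};
networking.interfaces.macvlan-shim.ipv4 = {
  # Give the shim its own address, and route the reserved range through it.
  addresses = [ { address = "192.168.1.62"; prefixLength = 32; } ];
  routes = [ { address = "192.168.1.64"; prefixLength = 30; } ];
};
```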

If you dig into the implementation (createMacvlanDevice, configureAddrs), you can get some insight into how this maps onto basically the same thing my boot time script did.

This feels like much less of a hack than using a script. Both work, but the networking.macvlans approach is nice and clean. I should probably do the work to declare the docker macvlan inside my NixOS configuration to make this complete, but that’s a task for another day.

NixOS with mirrored ZFS boot volume


My past experiments with ZFS were also based on NixOS, a Linux distribution that I am growing increasingly fond of. It has a declarative style for configuration, which means you can more easily reproduce your install. For my new server build out, I will be using NixOS as my base, layering up plenty of docker containers to operate the various services I self-host – and the filesystems will be ZFS with redundancy. In this post I will focus on the base install, specifically the setup of the hardware for the very first NixOS install.

First I needed to learn how to get NixOS installed on a mirrored ZFS boot volume. Recall that my hardware has a pair of M2 SSDs which will hold the base operating system. There are many possible approaches documented:

  1. ZFSBootMenu looked really interesting, but it’s not native NixOS.
  2. I found a GitHub project that had some nice scripts, but it was a little too script-y for me and was hard to understand.
  3. The OpenZFS site documents installing NixOS, including support for mirrored boot drives, but I wasn’t sure how to mirror the EFI filesystem or recover after a drive failure.
  4. The NixOS forum was informative.
  5. Many of the other places that talk about this type of install point at a blog post, which I ended up using as my base.

While the original author does the partitioning and filesystem setup as a set of manual steps, I’ve captured mine as an install script. I had a side journey investigating disko which looks great but was one more thing to learn, and I already know how to cook a script.

When editing the /mnt/etc/nixos/configuration.nix file you will need to add some specific sections to ensure that you get a working ZFS enabled system on the other side.

First we need to change from the systemd-boot EFI boot loader to a grub based boot. This lets us make use of the grub.mirroredBoots support to keep both EFI partitions updated. We are also specifying the mirrored boot devices by UUID, and tagging both drives as “nofail” to allow the system to boot even if we lose a drive. Of course we also need to indicate that ZFS is supported, and create a unique networking.hostId which ZFS needs.
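Roughly, the relevant sections look like this; the filesystem UUIDs and hostId below are placeholders you would replace with your own values:

```nix
boot.supportedFilesystems = [ "zfs" ];
networking.hostId = "8425e349";           # placeholder; any unique 8-hex-digit value

boot.loader.systemd-boot.enable = false;  # replace the default EFI loader
boot.loader.grub = {
  enable = true;
  efiSupport = true;
  efiInstallAsRemovable = true;
  mirroredBoots = [
    { devices = [ "nodev" ]; path = "/boot"; }
    { devices = [ "nodev" ]; path = "/boot-fallback"; }
  ];
};

fileSystems."/boot" = {
  device = "/dev/disk/by-uuid/AAAA-AAAA";  # placeholder UUID
  fsType = "vfat";
  options = [ "nofail" ];
};
fileSystems."/boot-fallback" = {
  device = "/dev/disk/by-uuid/BBBB-BBBB";  # placeholder UUID
  fsType = "vfat";
  options = [ "nofail" ];
};
```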

There are other defaults you will probably want to change: setting a hostname and timezone, and enabling one of the networking options. You may also want to define a user and enable sshd. All of these are pretty standard NixOS configuration tasks; all of the ZFS magic is captured above.

Once you’ve got this configured, you should be able to run nixos-install and reboot into your new system.

One nice trick I learned is that from the full graphical installer you can open up a terminal shell and then run sudo passwd, entering a password for root. This allows for ssh access (as root) from another machine, making it much easier to copy in your install script and configuration.

Details on the install script

While there are comments, let me walk through the install script above. You will need to edit the DISK1, DISK2 declarations at the top of the script. This assumes you have two identically sized drives you intend to use as a boot volume.
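The shape of the script’s front matter and partitioning looks roughly like this; the device paths are illustrative, not my actual hardware:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Edit these to match your hardware (illustrative paths).
DISK1=/dev/disk/by-id/nvme-FIRST_DRIVE
DISK2=/dev/disk/by-id/nvme-SECOND_DRIVE

for DISK in "$DISK1" "$DISK2"; do
  sgdisk --zap-all "$DISK"
  sgdisk -n1:1M:+1G -t1:EF00 "$DISK"  # 1G EFI system partition
  sgdisk -n2:0:0    -t2:BF00 "$DISK"  # remainder for the ZFS mirror
done
```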

You will notice that, relative to the source blog post, I’m not including encryption in my setup. For me this is an acceptable risk: it removes the need to enter a password on boot, and I need my system to come back automatically after a power failure.

This setup doesn’t include swap, and I went down a deep rabbit hole on swap: did I need it? How much swap should you set up? My original naive NixOS install (without ZFS) using the graphical installer resulted in no swap being set up. The GitHub project above suggests setting up swap on each of the mirrors, but I’m worried about what happens if you lose a drive. I found someone on reddit suggesting a partitioning scheme for ZFS that has no swap unless absolutely necessary. Then I found folks who said you must have swap to avoid problems. Another claimed that swap helps avoid performance problems. This great post gives some good reasons to consider using swap. I also found a RedHat post suggesting that some production systems don’t use swap, and even some database related performance posts saying to avoid swap. After all that, while there are some downsides to not having swap, I decided to stick with a no-swap setup for now. I can always toss in a cheap SSD later and use it as swap if I end up needing it.

You may also notice that the partitioning scheme above is very simple. We reserve 1G for the EFI partition, and the remainder is for the ZFS mirror. It turns out that the two M2 SSDs I bought don’t entirely agree on what 1TB means.

Yup, one thinks 1TB is 1024 GB, and the other says it’s 1000 GB. Sadly, both are right. The good news is that ZFS is happy to pair these two unequal partitions into a mirror (since we pass the -f flag to force it), so I’ll rely on that rather than trying to partition both drives to exactly the same size.
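The gap between the two interpretations is easy to quantify:

```shell
decimal_bytes=$(( 1000 * 10**9 ))  # "1TB" as 1000 GB
binary_bytes=$(( 1024 * 10**9 ))   # "1TB" as 1024 GB
echo $(( binary_bytes - decimal_bytes ))   # prints 24000000000, a 24 GB gap
```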

The zpool create is where I differ from the source blog. This took a while for me to land on the right set of options, and maybe they don’t matter all that much.
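The invocation I landed on looks roughly like this; the pool name comes from later in the post, while the partition paths assume the install script’s layout:

```shell
zpool create -f \
  -o ashift=12 \
  -o autotrim=on \
  -O mountpoint=none \
  -O relatime=on \
  -O acltype=posixacl \
  -O xattr=sa \
  rpool mirror "${DISK1}-part2" "${DISK2}-part2"
```

Note that ashift and autotrim are pool properties (-o), while the rest are filesystem properties (-O) inherited by every dataset in the pool.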

ashift=12 This instructs ZFS to assume 4k sectors, and while my SSDs actually report 512-byte sectors it’s ok to go bigger. Some NVMe drives can be reformatted to larger sectors, but they need to support them (mine do not). I did find a few posts that convinced me that 4k (ashift=12) was the best choice. If you think about how flash memory works, 4k (or even larger) makes a lot of sense.

autotrim=on I adopted this from the OpenZFS recommendations. After reading a bunch about it, it does seem like a good idea if you are using an SSD. Apparently you should also consider running a regular trim operation too. In theory this will extend the lifetime of your SSD, and it has minimal performance impact.

mountpoint=none Every root-on-ZFS article uses this, and given my past experience with ZFS filesystems auto-mounting, it makes sense. The root filesystem is special and we don’t want it mounted in any other way.

relatime=on Some approaches recommend using atime=off for performance reasons. However, the downside is that you can break some software which requires atime (access time) to be correct (an example given was email). The relatime setting is an in-between. It will skip most atime updates, but still keep it mostly correct. This also lines up with the OpenZFS recommendations.

acltype=posixacl This is another setting that many configurations used. I did find a good blog article talking about it.

xattr=sa This is linked to the acltype setting above, and is also generally considered a good performance tweak.

Let’s talk about a few that I didn’t include from the source blog, aside from atime which I have already touched on.

compression=lz4 While you can find articles that recommend specifying this, the default compression setting is ‘on’ and I believe the default algorithm is lz4. I decided to go with the defaults, and if there is a slight performance penalty that’s ok.

encryption=aes-256-gcm, keyformat=passphrase Hopefully obvious that these are both related to encryption of the data, and for my use would mean dealing with entering a pass phrase on boot.

If we review the OpenZFS doc on NixOS, they have more options I have not specified. I didn’t go quite as deep on each of these, but when I did a review it seemed that many of those settings were aligned with the defaults.

canmount=off Seems to be almost a duplicate of mountpoint=none.

dnodesize=auto Almost seems like a good idea, especially as it’s linked to xattr=sa, which might store lots of attributes needing more space than the default (legacy) size. This article has some details, and in the end they also elected not to use it.

normalization=formD I found a good post on ZFS settings overall, and this one specific to unicode filenames. It almost convinced me I should reconsider and add this, but how many weird unicode filenames do I have anyways? And if they don’t match, I can deal with the weirdness.

Recovering a broken mirror

While this is a lot of detail, we haven’t yet touched on recovery from a failure. I did a lot of experiments using UTM to run a VM on my MacBook. This let me build VMs with dual NVME drives and quickly iterate on both setup and recovery.

To simulate a failure, I would simply shut the VM down, delete one of the drives, and re-create a new drive. Starting the VM up again resulted in one of two things:

  1. The VM would boot, but take a bit longer than normal. Looking at zpool status would indicate that one of my mirrors was broken
  2. The VM would dump me into the EFI Shell, this confused me as I hadn’t navigated this previously, but it was a simple matter of manually selecting the file/drive to boot and I’d end up in the same state as (1)

Awesome, so my install does result in a system that will survive the loss of one of the mirrored drives. I did stumble quite a bit on fixing the mirror.

The first step is to re-partition the new drive
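A sketch of that re-partitioning, assuming the replacement drive showed up as /dev/nvme1n1 (device path illustrative):

```shell
NEWDISK=/dev/nvme1n1                   # illustrative device path
sgdisk --zap-all "$NEWDISK"
sgdisk -n1:1M:+1G -t1:EF00 "$NEWDISK"  # recreate the 1G EFI partition
sgdisk -n2:0:0    -t2:BF00 "$NEWDISK"  # remainder for the ZFS mirror
mkfs.vfat "${NEWDISK}p1"               # fresh EFI filesystem
mount "${NEWDISK}p1" /boot             # mount point depends on which drive failed
```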

Of course, the mount point may be /boot-fallback depending on which drive we have lost. Then, similar to my post on doing this on a RAIDZ, we simply do a sudo zpool replace rpool (failed device) (new device). Because this is a bit complicated, let me walk through an example.

We will start with a working system and look at the zpool status

Now we will remove one of the drives and boot again. The failed state boot will take some time (many minutes), you need to be patient – the amazing thing is that it will eventually boot and everything works.

Now when we look at zpool status we can see the mirror is broken, but we do have a fully running system.

Time to do some re-partitioning using the script above (obviously with changes to address which mount point and drive). Pro-tip: you may need to install some of the partitioning tools: nix-shell -p gptfdisk parted

Now that our disk is partitioned, we can repair the pool.
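The replace command, with placeholders; the failed device’s name or guid comes from zpool status, and the new partition path is illustrative:

```shell
sudo zpool replace rpool <failed-device-or-guid> /dev/nvme1n1p2
```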

Using zpool status we can verify that things are good again

Now we just need to fix our boot problem. As I said, this tripped me up for a bit but the NixOS discourse was very helpful.

There are two references to the boot drives. One in /etc/nixos/configuration.nix, and one in /etc/nixos/hardware-configuration.nix. Both need to be updated to reflect the new drive.

We can fix the latter by doing the following
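The usual way is to regenerate it; nixos-generate-config rewrites hardware-configuration.nix from the running system while leaving an existing configuration.nix alone:

```shell
sudo nixos-generate-config   # rewrites /etc/nixos/hardware-configuration.nix
```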

Once we edit the /etc/nixos/configuration.nix to correctly represent the new UUIDs for the drive(s), we can issue a rebuild and then reboot to test.

This final reboot should be normal speed, and we will see that both /boot and /boot-fallback are mounted automatically and fully populated.

Cool, so we have a basic NixOS install, running on a mirrored ZFS root filesystem – and we have some evidence that we can recover if a bad thing happens.

Adventures in 4K – Ripping 4K UHD Blu-Ray

For my birthday a few months ago, I got a copy of The Matrix in 4k. Previously I had only the original DVD that I bought when it first came out. I popped the 4k blu-ray into my blu-ray drive and started up MakeMKV only to discover that my system was unable to read a UHD disc.

Thankfully the 4k blu-ray comes with a normal blu-ray that contains a 1080p copy of the movie, and I was able to rip that to my personal collection. While I do have a 4k capable TV, my primary projection setup is still only 1080p, so having more bits available isn’t actually an improvement. Still, owning a 4k disc and not being able to use it bugged me.

It turns out the MakeMKV folks run a forum, and there are recommendations there for the right drives to buy in order to rip the 4k discs. There is a thread Ultimate UHD Drives Flashing Guide Updated 2024 which is required reading if you want to get started. I also checked out CanadaComputers which is my local go-to computer store, often having better prices than you can find online.

My pick was the LG WH16NS40, which was both low cost and appeared to be well supported by the MakeMKV forum. Of course, it isn’t as simple as buying the drive and ripping 4k media: you need to modify the firmware. The fact that I had to modify the drive to get it to do what I wanted made this a must-have item, so it went on my Christmas list. Thankfully I was on the good list, and when it was time to unwrap gifts I had my hands on a new drive.

Installing the drive into my Linux machine was pretty straightforward. I ended up replacing an older DVD drive I had in there. On the label of my new drive, I could see the model number (WH16NS40) and manufacture date (June 2024). There was also an indication of the ROM version (1.05).

I run MakeMKV in a container, for me this is a great way to encapsulate the right setup and make it easy to repeat. The new drive showed up just fine to MakeMKV – but I didn’t expect it to support 4k UHD discs just yet.

I will summarize things further down, so you can skip to the summary if you want. However, the bulk of this post will be my discovery process on re-flashing the drive.

Time to head off to the guide and read it carefully.

The first thing I took note of was the correct firmware I wanted based on my drive. This was in the “Recommandation” section near the top.

WH16NS40 on any Firmware directly to > WH16NS60 1.02MK

So I want the 1.02MK version, and it seems I can get there with a single flash vs. needing to do multiple steps.

A bit further down in the same guide, I came across

LG 1.04+ / BU40N 1.03 / Asus 3.10+ and similar
The newer OEM firmwares cannot be flashed easily due to the additional downgrade checks implemented by the drive/firmware manufacturer.

Oh oh. So I may have problems? I am pretty sure I have the 1.05 firmware.

As I read on, it seems the recommended flashing tool is for Windows, and while I have a few Windows systems the drive is installed in a box that only has Linux on it. I spent some time reading through various forum posts and searching for other related material.

At this point I have more confidence that yes, my drive is supported – but it’s a question about how exactly to fix this drive (under Linux) to make it go. Bonus points if I manage to do this all inside of a container.

I did find an older thread that discusses flashing things under Linux. It pointed at a standalone flashing tool on github, but it was while reading through this thread that I discovered that MakeMKV itself contains the sdftool and supports the flashing process. This means I already have the tool inside the MakeMKV container.

Here is how I run the container
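A sketch of my invocation, based on the jlesage/makemkv image; the volume paths and optical device nodes are illustrative and will differ on your host:

```shell
docker run -d \
  --name makemkv \
  -p 5800:5800 \
  -v /docker/appdata/makemkv:/config:rw \
  -v /tank/rips:/output:rw \
  --device /dev/sr0 \
  --device /dev/sr1 \
  jlesage/makemkv
```

The web UI on port 5800 is the “browser view” mentioned below.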

For your system you will want to adjust the volume mappings and device mappings to match what is on your host system. This works great for me, and I can access both of the blu-ray drives on my system and write newly ripped files to my host filesystem.

Looking at the browser view of the MakeMKV container I can see that the new drive is recognized, and in the right side panel it even calls out the details for LibreDrive support.

Shelling into the docker container, I can see that sdftool is a link to makemkvcon.

I had read about the possibility of dumping the original firmware as a backup plan in case things go very badly, but this is actually not possible. It seems the manufacturers have made it more complicated in the name of security or something.

I grabbed the “all you need firmware pack” from the guide. This is a very small set of alternative firmwares, only one matches my LG ‘desktop’ sized drive so it was easy to identify the one I wanted to use.

I also needed the SDF.bin file that is hosted on the makemkv site.

In theory, I have all the bits I need. The sdftool, the SDF.bin, and the modified firmware.

At this point, I’m back following the guide. The Mac/Linux portion which walks you through things. I can dump information about the current firmware from my drive

Now I know the existing firmware version. It does not appear to be an exact match to the ones in the list from the guide under “Newer OEM Firmwares and encrypted”. However, the following is a pretty close match:

This drive was made in June 2024 and most probably has a firmware from after 2020 – so a very close match to the list above, and the date of manufacture makes it very likely that my drive has ‘encrypted’ firmware.

Ok – to recap what the plan looks like.

  1. Grab the sdf.bin file
  2. Download the modified firmware(s)
  3. Dump existing firmware versions – determine if you are encrypted or not (likely you are)
  4. Flash the drive

Easy right?

From outside the container we can copy in the firmware we need
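Something along these lines, with an illustrative firmware filename and the container name from my setup:

```shell
docker cp HL-DT-ST-BD-RE_WH16NS60-1.02MK.bin makemkv:/tmp/
```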

And inside the container we can pull down the SDF.bin file.
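From a shell inside the container, something like the following; substitute the SDF.bin URL given in the guide, which I’m deliberately not reproducing here:

```shell
docker exec makemkv sh -c 'cd /tmp && wget <SDF.bin-url-from-the-guide>'
```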

Then we just need to do the very scary flash part.

There is a very long (minutes) pause where the flashing is taking place.. longer than I can hold my breath.. uh.. did I just make a brick?? fuuuuuu….

I can see from another terminal session that it is eating CPU, pegged at 100%.

After 10+ mins of hanging.. I hesitantly CTRL-C’d the thing..

Thankfully, everything seems ok – I’m exactly where I started. Whew.

I found that adding the verbose (-v) flag was probably a good idea, and I found a forum thread indicating that there should be more output from the command. Maybe it’s getting stuck starting up?

I had a few thoughts. Maybe I need to run the container with fewer restrictions? (docker --privileged) No, that didn’t change anything.

Then I found someone having the same problem recently. It seems the solution they used was to just use Windows. I did ponder how I might setup a temporary Windows install to do this. Then I found this thread that discusses MakeMKV hanging after loading the SDF.bin file, this feels like it may be the same problem. In that case the issue is with the most recent version of MakeMKV (1.17.8).

I started looking for an older version of the container I’ve been using, one that has MakeMKV (1.17.7). It turns out that jlesage/makemkv:v24.07.1 is a few tags back, but has that version. Let’s see if using this version will work better.

This seems to be much better, I’m now getting an error message instead of a 100% CPU hang. Also, apparently I need to remove the disc from the drive which is something I can do.

I only very briefly held my breath as I typed in ‘yes’ and let it continue to do the work. It only took a minute or so to flash the drive and report success.

I needed to re-start the makemkv container. Then it was showing me my drive was good to go

As you can see the LibreDrive information now shows

And I can now read 4k UHD blu-ray discs without problem. I was able to rip the 4k version of The Matrix (53GB) without issue. My setup was only showing [4x] speed, but I suspect this is more a limitation of my overall system than of the drive, which can probably go faster. I’m still very pleased to be able to pull the bits.

Summary – the TL;DR version

Recent versions of the LG WH16NS40 can be modified to read 4k UHD blu-ray discs. This can be accomplished under Linux, using the MakeMKV container.

There is a bug in MakeMKV version 1.17.8 which causes it to hang with 100% CPU. Using version 1.17.7 still works as of the date of this post.

Absolutely read the guide.

Start up the MakeMKV container.

I downloaded the firmware bundle, and picked the matching one for my drive. I then copied it from my host filesystem into the container

Then shell into the container and download the SDF.bin file.

Now we issue the flash command

That’s it. We have a modified firmware installed. Time to enjoy 4k goodness.