Update: NixOS + Docker + MacVLAN

This is an updated version of my previous article, NixOS + Docker with MacVLAN (IPv4), and it addresses the ARP Flux that was causing networking problems. Skip to the summary at the bottom if you just want to read the conclusion.

Ever since my new server build-out, I had been having intermittent issues with my MacBook (OSX) randomly losing its connection to the server, but only for a brief time. I noticed it mostly when using mosh, which would indicate that I’d lost connection, but self-hosted web UIs like Grafana also gave me problems.

Ping would also fail, but then it would start to work after a minute or so. What was weird is that while I could not ping my new server (192.168.1.79), I could have a stable mosh connection to another box (192.168.1.140), which could see the new server (192.168.1.79) just fine. This made it seem very much like something was wrong with OSX. I was guilty of mostly ignoring this because, while it did impact me, it did not happen frequently. It was still very annoying.

This is also an exploration of using Google Gemini, which you can use from an incognito tab with no login required. It didn’t immediately get me to the right answer, but we eventually got there.

The first iteration didn’t really get me very far, only a suggestion to check the route on my OSX machine when the problem was happening. This meant I had to wait around for it to break – and catch it in the act.

Unfortunately the output from this didn’t change with broken networking vs. working. At this point I added the detail that this networking problem seemed to happen most often after waking the OSX machine from sleep. Gemini was then able to reason:

In your route output, the flag LLINFO stands for Link Layer Information. This means macOS has a cached MAC address for 192.168.1.79.

When a Mac sleeps, it often hands off its networking responsibilities to a “Sleep Proxy” (like an Apple TV or HomePod) or simply drops its ARP table. When it wakes up, it might be trying to use a stale hardware address, or the Linux machine (myserver.lan) might not be responding to the Mac’s “Who has 192.168.1.79?” request fast enough.

The next suggestion was to check the arp information when networking was broken.

  • If it says (incomplete), the Mac is sending requests but getting no answer.
  • If it shows a MAC address, compare it to the actual MAC address of the Linux box. If they match, the physical path is clear but the packets are being dropped higher up.

This got me looking at my WatchYourLan deployment, a network monitoring tool that I use to alert me to new devices (based on MAC addresses) on my network.

To my surprise – I found another MAC 82:d4:9a:8f:06:40 for myserver.lan, one that I didn’t expect to see. Gemini called this a smoking gun, as it is a Locally Administered Address (LAA) – also known as a generated MAC. Gemini also concludes that this may be a Docker issue, which somewhat confirms what I was thinking. The advice from Gemini is getting more specific now, but I also find that in general this is where AI generated output can go off the rails so caution is required.
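What makes a MAC address "locally administered" is a single bit: the second-least-significant bit (0x02) of the first octet. As a minimal sketch (the function name is my own, not something Gemini produced), this check can be written in a few lines of JavaScript:

```javascript
// Returns true when the "locally administered" bit (0x02 of the first
// octet) is set, which is the case for generated/virtual MAC addresses.
function isLocallyAdministered(mac) {
  const firstOctet = parseInt(mac.split(":")[0], 16);
  if (Number.isNaN(firstOctet)) {
    throw new Error(`not a MAC address: ${mac}`);
  }
  return (firstOctet & 0x02) !== 0;
}

// The unexpected MAC from WatchYourLan: 0x82 has the 0x02 bit set.
console.log(isLocallyAdministered("82:d4:9a:8f:06:40"));
// A made-up vendor-style address for comparison: 0x00 does not.
console.log(isLocallyAdministered("00:11:22:33:44:55"));
```

The second address is purely illustrative; any MAC whose first octet has the 0x02 bit clear would behave the same way.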

The Pattern: You have a “Bridge” problem

Seeing four different randomized MACs for a single IP (192.168.1.79) on a wired Linux machine is a classic symptom of Docker Userland Proxy or Bridge-utils misconfiguration.

When you run containers or virtual machines in “Bridge Mode,” the Linux kernel creates a virtual switch (br0, docker0, etc.).

  • The physical NIC (eth0) has its own hardware MAC.
  • The Bridge (br0) often generates its own MAC (like one of these A6 or 52 ones).
  • Each container gets a virtual interface (veth) with a randomized MAC.

I then remember to add the detail that this Linux machine is a NixOS installation, and Gemini comes back with:

NixOS with a complex Docker setup is the “perfect storm” for this. NixOS tends to be very explicit about interface creation, and Docker’s default bridge behavior often clashes with how NixOS manages its own networking stack.

It suggests a “ghost hunter” command to identify bad MACs.

If these “ghost” MACs are responding to ARP requests on your physical LAN (a phenomenon called ARP Flux), it’s usually because the Linux kernel is being too “helpful” by answering ARP requests for an IP on any interface, even if the request came in on a different one.

It also suggests a change to my configuration.nix file, which in retrospect was pretty close to the final solution, but not quite. At this point I’m pretty impressed with Gemini being able to help with NixOS specifics. It also asks the question:

Does your NixOS config happen to have any bridge interfaces (networking.bridges) or macvlan setups defined manually?

“Why, yes, yes I do,” I reply. Gemini then fires back:

This macvlan-shim is the “smoking gun” for your macOS networking issues.

Why this breaks your Mac’s connection
By default, Linux is very “promiscuous” with ARP. When your Mac sends an ARP request asking “Who is 192.168.1.79?”, the Linux kernel sees that request on enp3s0. Because the macvlan-shim is bridged to that same physical card, the kernel might answer the Mac using the macvlan’s generated MAC address instead of the physical NIC’s hardware address.

This is known as ARP Flux. Your Mac gets confused because it sees multiple MAC addresses (the “ghosts” we found earlier) all claiming to be the same IP or living on the same segment. When the Mac wakes from sleep, it’s a race: if the “shim” MAC wins the ARP race, the Mac sends traffic to a virtual interface that isn’t configured to handle the host’s primary traffic.

It sure likes the term “smoking gun”, but it feels like we’re close to identifying the real root problem in my setup. I had previously made some changes to the way I deployed the various docker containers on my macvlan network to set static MAC addresses, but had not yet figured out how to give the shim IP a static MAC address. The evidence I have and the advice from Gemini are starting to align. It is odd that I got to this configuration from a previous setup using Ubuntu which didn’t have an issue, or maybe that was just luck due to timing.

I asked a lot of clarifying questions about the solutions it was proposing, which both gave me a more complete understanding of the solution and helped generate some options. A minimalist solution to this was suggested, but it felt like a lot of ‘trust in the force’:

If you added this to my previous article on NixOS + Docker + MacVLAN you’d probably be fine. However, here is the more complete solution I ended up using:

Again, if you compare with the original article we can see that I’ve added a specific MAC address for the shim, and used a naming convention to make the last digit match the IP address. The big change is the boot.kernel.sysctl which is very similar to the minimal setup above.

I also use the new MAC address numbering scheme for each of my macvlan containers – assigning them 06:00:00:00:00:XX where XX is the IP. Very handy to see they are the right assignment.
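To make the numbering scheme concrete, here is a small sketch of how the MAC could be derived from an IP address. It assumes my reading of the scheme, namely that the decimal last octet of the IP is written literally as the last MAC byte (so 192.168.1.67 becomes 06:00:00:00:00:67), which only works for last octets from 0 to 99. The helper name is hypothetical.

```javascript
// Derive a 06:00:00:00:00:XX MAC from an IPv4 address, where XX is the
// decimal last octet written literally (assumption: octets 0-99 only).
function macForIp(ip) {
  const lastOctet = Number(ip.split(".")[3]);
  if (!Number.isInteger(lastOctet) || lastOctet < 0 || lastOctet > 99) {
    throw new RangeError(`last octet must be 0-99 for this scheme: ${ip}`);
  }
  return `06:00:00:00:00:${String(lastOctet).padStart(2, "0")}`;
}

console.log(macForIp("192.168.1.67")); // the shim address
console.log(macForIp("192.168.1.65")); // the nginx container
```

The 06 prefix keeps the address unicast and marks it as locally administered, so it cannot collide with a vendor-assigned MAC.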

Now this had some interesting side-effects. The shim IP continued to offer up a stale generated MAC address. I was able to fix this by forcing a recreate (but I suspect a reboot may have solved the problem).

A few other things also broke due to this more restrictive ARP control. Previously a container living on the macvlan network (like my nginx for local web services – on IP 192.168.1.65) could see my server (192.168.1.79). The myNewNet-shim provided visibility from the host to the nginx (192.168.1.65).

In this new no ARP Flux world, I have to use the shim address (192.168.1.67) in order to connect to my server (192.168.1.79) – so everywhere I reference x.x.x.79 I now needed to use x.x.x.67.

I’ve realized that this makes my wireguard setup a bit more annoying because if I use wireguard to connect home from remote, and I want to connect to my server I need to use the shim address instead of the local DNS name that maps to the real address. This was too big a trade off so I went back to Gemini and had a long discussion (with a lot of dead ends) to arrive at a solution that adds back the visibility of the host (192.168.1.79) from the macvlan containers (say 192.168.1.64 for example).

Now using wireguard, I can see the full local network – host included.

Less ARP chaos is a good thing. Gemini did suggest that instead of declaring the route for the shim with a CIDR block that includes the shim address, I could be even more specific and simply have a route per IP address:

The key difference here is using /32 instead of /30. Being specific saves the Linux kernel from having to figure out what to do with the shim address (192.168.1.67), but it seems to do the right thing, so I’ve gone with the simpler declaration. I’ve also got a special route in there for wireguard addresses so my host can see the wireguard clients directly.
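The arithmetic behind /30 versus /32 is easy to check. The sketch below lists the addresses covered by a CIDR block. The block 192.168.1.64/30 is my assumption for illustration (it fits the container and shim addresses mentioned in this post), and the helper names are hypothetical.

```javascript
// Convert dotted-quad IPv4 to an unsigned 32-bit integer and back.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

function intToIp(n) {
  return [24, 16, 8, 0].map((shift) => (n >>> shift) & 0xff).join(".");
}

// List every address inside a CIDR block (intended for small blocks only).
function addressesInCidr(cidr) {
  const [base, bits] = cidr.split("/");
  const prefix = Number(bits);
  if (!Number.isInteger(prefix) || prefix < 16 || prefix > 32) {
    throw new RangeError(`prefix must be between 16 and 32: ${cidr}`);
  }
  const size = 2 ** (32 - prefix);
  const mask = (0xffffffff << (32 - prefix)) >>> 0;
  const network = (ipToInt(base) & mask) >>> 0;
  return Array.from({ length: size }, (_, i) => intToIp(network + i));
}

console.log(addressesInCidr("192.168.1.64/30")); // four addresses, shim included
console.log(addressesInCidr("192.168.1.65/32")); // exactly one address
```

A /30 covers four consecutive addresses, so a block starting at .64 also contains .67, the shim itself. A /32 route names exactly one address, which is why the per-IP variant never overlaps the shim.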

Verifying the fix can be done on the OSX machine

We want to review all of the MAC addresses to make sure we have the expected ones that follow the pattern 06:00:00:00:00:xx. If all goes well, this is the end of the ghosting problem (ARP Flux) that was causing the annoying interruptions in networking from my OSX machine to the server.
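Eyeballing the ARP table works, but the check can also be scripted. This is only a sketch: the sample text is hypothetical and assumes the usual macOS `arp -a` line layout (which prints MAC octets without leading zeros), and the function names are my own. It flags any locally administered MAC that does not follow the 06:00:00:00:00:xx pattern.

```javascript
// Hypothetical sample in the style of macOS `arp -a` output.
const sampleArpOutput = `
myserver.lan (192.168.1.79) at 82:d4:9a:8f:6:40 on en0 ifscope [ethernet]
? (192.168.1.65) at 6:0:0:0:0:65 on en0 ifscope [ethernet]
? (192.168.1.140) at (incomplete) on en0 ifscope [ethernet]
`;

// Extract { ip, mac } pairs, padding each MAC octet to two hex digits.
// Lines without a resolved MAC, such as "(incomplete)", are skipped.
function parseArp(output) {
  const linePattern = /\((\d+\.\d+\.\d+\.\d+)\) at ([0-9a-fA-F:]+) /;
  const entries = [];
  for (const line of output.split("\n")) {
    const match = linePattern.exec(line);
    if (!match) continue;
    const mac = match[2]
      .split(":")
      .map((octet) => octet.padStart(2, "0"))
      .join(":")
      .toLowerCase();
    entries.push({ ip: match[1], mac });
  }
  return entries;
}

// A generated MAC has bit 0x02 set in its first octet. Flag generated
// MACs that do not match the expected 06:00:00:00:00:xx scheme.
function findUnexpectedMacs(entries) {
  const expected = /^06:00:00:00:00:[0-9a-f]{2}$/;
  return entries.filter(({ mac }) => {
    const generated = (parseInt(mac.slice(0, 2), 16) & 0x02) !== 0;
    return generated && !expected.test(mac);
  });
}

console.log(findUnexpectedMacs(parseArp(sampleArpOutput)));
```

With the sample above, only the old ghost MAC on 192.168.1.79 is reported. Hardware MACs on physical NICs are not flagged because they are not locally administered.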

In summary – the full docker macvlan setup on NixOS – this is basically a brief re-telling of the original post, with all of the updates above merged in.

First enable docker support on NixOS. You need a single line added to your /etc/nixos/configuration.nix

Create your docker macvlan network.

Docker will persist this network configuration across reboots.

Now we need to modify /etc/nixos/configuration.nix to fix routing to/from our macvlan network IPs from the host, and to avoid causing ARP Flux.

This was a bit of a journey, but if you were just looking for a clean way to get macvlan networks working with NixOS and Docker, hopefully it is presented in a way that is straightforward to follow. We also touched on using AI to help us explore solutions, and I do encourage you to use it as a tool rather than a magic eight ball. Along the way many wrong answers were presented, but by asking for more details and for ways to test the assumptions and changes, I was able to learn more about the problem and arrive at what I think is a good solution.

Generative AI code assist

I considered using “Vibe Coding” as the title, but it’s just such a buzzword that I decided to go with a more factual title. I’m old school enough to want to distinguish between generative AI and the broader AGI (Artificial General Intelligence). I’ll also state that I consider myself a bit of an AI coding skeptic, but hopefully in a healthy way.

Just like any computer program, it’s garbage-in, garbage-out. The modern buzzword for this is AI-slop. I’ll avoid bashing the technology and focus on how you can use it constructively today, even with some of its limitations. I will also confess that at work I’ve got access to AI for code generation, and it’s been interesting learning a new set of skills. This post will focus on what you can do for free on the web.

Perchance has a no login required, free code generator. I was in the process of setting up karakeep to replace wallabag. Both of these tools perform a similar function, effectively a web based bookmark manager and offline capture of a web resource. A simple list of links + archive.org would solve the same problem, but this is a self-hosted solution and is pretty neat.

The task at hand is to figure out how to export all of my links from wallabag and then import them into karakeep; the more context I can preserve the better. Since there isn’t a common import/export format between the two tools, we’ll use the aforementioned code generator to create something to convert the file.

Luckily both support a JSON based format. I can export from wallabag into JSON. It looks something like this

And karakeep supports both import and export in JSON format.

First we will create a few sample entries in karakeep and do an export to figure out what its format is. It turns out to look something like this:

If you look at the two formats, you can see some obvious mappings. This is good. I started with the perchance code generator and a very simple prompt:

This let me get my feet wet and make sure I had my environment set up to run code. I do have reasonable javascript experience, and that helps me use the code generator as a tool to move quickly. I tend to think of most of these AI solutions as doing pattern matching: they pick the ‘shape’ of your solution and fill in the blanks. This is also where they will make stuff up. If there is a blank and you haven’t given enough context, it’ll just guess at a likely answer.

Once I had the code generator creating code and I was able to test it, things moved along fairly quickly. I iterated forwards, specifying the JSON output format and so on, and I was both a bit amazed and pleased to see that it had decided to use the map capability in nodejs. This made the generated code quite simple.

My final prompt ended up being:

And this is the code it generated

Notice anything curious here? It has, without me saying anything, decided to map title and tags into the output object. Very nice, I’m impressed.

Was there any real smartness here? Well, I would not have arrived at the idea of using map in the javascript code, and it’s the right and elegant solution. A stronger javascript developer would likely have landed here since it is a concise solution to the problem. Maybe I would have found a similar answer on stackoverflow, but the code generator made it easy for me.

The date manipulation is also very slick.

I would have eventually got there, but it just did it for me. A very nice time saver.

This generated javascript let me export / import 376 entries from my wallabag, preserving the original dates, tags and titles.
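For readers who want a feel for the shape of such a converter, here is a minimal sketch of the map-based approach. It is not the generated code. Every field name on both sides (title, url, tags, created_at, createdAt, content, bookmarks) is an assumption for illustration and should be checked against your own wallabag and karakeep exports.

```javascript
// Hypothetical wallabag-style entries; the field names are assumptions.
const wallabagEntries = [
  {
    title: "Example article",
    url: "https://example.com/a",
    tags: ["nixos", "docker"],
    created_at: "2024-01-02T03:04:05+00:00",
  },
  {
    title: "Untagged article",
    url: "https://example.com/b",
    created_at: "2024-02-03T04:05:06+00:00",
  },
];

// Map one source entry to an assumed karakeep-style bookmark object.
function toKarakeepBookmark(entry) {
  return {
    title: entry.title,
    tags: entry.tags ?? [],
    // Date manipulation: ISO 8601 timestamp to whole seconds since the epoch.
    createdAt: Math.floor(new Date(entry.created_at).getTime() / 1000),
    content: { type: "link", url: entry.url },
  };
}

// Array.prototype.map turns the whole export into the new shape in one pass.
const output = { bookmarks: wallabagEntries.map(toKarakeepBookmark) };
console.log(JSON.stringify(output, null, 2));
```

In a real run the input would come from `JSON.parse` on the exported file and the result would be written back out with `JSON.stringify`; the mapping function in the middle stays the same.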

Sometimes working with AI for code is like having a book smart, and very fast, new hire. No experience, lots of enthusiasm, and cranks out code quickly. Does the code work? Not always, maybe not even often. I’ve also had to ‘reset’ the approach being used, when multiple iterations uncovered that it was basically impossible to solve the problem using the approach that I started with. Using test driven development can help provide guide rails for ‘working / not working’, the more context you can provide the better. Learning how to guide the AI, and evaluate if you’re getting what you intended to ask for are the new skills I’ve been growing.

I feel I do need to throw down some caution flags around AI use. If you’re using something that is ‘free’, stop and ask why they are making it free to use. Being open source doesn’t mean that a project is safe. Under the covers, this is all still built out of the same parts, so if you use it to open your network and data to the internet, you’ve got the same security problems.

Interesting times.