Expanding a docker macvlan network

I’ve previously written about using macvlan networks with docker. This has proved to be a great way to make containers more like lightweight VMs, as you can assign each one a unique IP on your network. Unfortunately, when I did this I only allocated 4 IPs to the network, and 1 of those is used to provide a communication path from the host to the macvlan network.

Here is how I’ve used up those 4 IPs:

  1. wireguard – allows clients on wireguard to see other docker services on the host
  2. mqtt broker – used to bridge between my IoT network and the lan network without exposing all of my lan to the IoT network
  3. nginx – a local only webserver, useful for fronting Home Assistant and other web based apps I use
  4. shim – IP allocated to supporting routing from the host to the macvlan network.

If I had known how useful giving a container a unique IP on the network was, I would have allocated more up front. Unfortunately you can’t easily grow a docker network; you need to delete and recreate it.

As an overview here is what we need to do.

  • Stop any docker container that is attached to the macvlan network
  • Undo the shim routing
  • Delete the docker network
  • Recreate the docker network (expanded)
  • Redo the shim routing
  • Recreate the existing containers

This ends up not being too hard, and the only slightly non-obvious step is undoing the shim routing, which is the reverse of the setup.
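The steps above can be sketched as an ordered list of commands. This is a minimal sketch rather than the exact commands from my setup: the function `expansion_plan` and every name and address passed to it are hypothetical, and reserving the shim IP with `--aux-address` is an assumption about how the network is created. It only builds and prints the plan; it does not run anything.

```python
import shlex


def expansion_plan(containers, network, shim, parent, subnet, gateway,
                   old_range, new_range, shim_ip):
    """Return the commands (as argv lists) for growing a macvlan network."""
    plan = []
    # 1. Stop every container attached to the macvlan network.
    plan.append(["docker", "stop", *containers])
    # 2. Undo the shim routing: the setup in reverse (route first, then link).
    plan.append(["ip", "route", "del", old_range, "dev", shim])
    plan.append(["ip", "link", "del", shim])
    # 3. Delete the old docker network.
    plan.append(["docker", "network", "rm", network])
    # 4. Recreate it with the larger IP range. Reserving the shim IP via
    #    --aux-address is an assumption, so docker never hands it out.
    plan.append(["docker", "network", "create", "-d", "macvlan",
                 "--subnet", subnet, "--gateway", gateway,
                 "--ip-range", new_range,
                 "--aux-address", f"shim={shim_ip}",
                 "-o", f"parent={parent}", network])
    # 5. Redo the shim routing against the new range.
    plan.append(["ip", "link", "add", shim, "link", parent,
                 "type", "macvlan", "mode", "bridge"])
    plan.append(["ip", "addr", "add", f"{shim_ip}/32", "dev", shim])
    plan.append(["ip", "link", "set", shim, "up"])
    plan.append(["ip", "route", "add", new_range, "dev", shim])
    # 6. Recreating the containers depends on how each one was originally
    #    created, so it is left out of this sketch.
    return plan


# Hypothetical values, for illustration only.
for command in expansion_plan(
        containers=["wireguard", "mqtt", "nginx"],
        network="macnet", shim="macnet-shim", parent="eth0",
        subnet="192.168.1.0/24", gateway="192.168.1.1",
        old_range="192.168.1.64/30", new_range="192.168.1.64/29",
        shim_ip="192.168.1.67"):
    print(shlex.join(command))
```

Building the plan as data makes the symmetry visible: the two teardown commands in step 2 mirror the last and first commands of step 5.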

The remainder of this post is a walk through of setting up a 4 IP network, then tearing it down and setting up a larger 8 IP network.
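To make the sizes concrete, here is a small sketch of the arithmetic using Python's standard `ipaddress` module. The two ranges are hypothetical examples, not necessarily the ones on my network: a /30 range holds 4 addresses and a /29 holds 8.

```python
import ipaddress

# Hypothetical ranges, for illustration only.
old_range = ipaddress.ip_network("192.168.1.64/30")
new_range = ipaddress.ip_network("192.168.1.64/29")

# Iterating a network yields every address in it, which matches how a
# docker --ip-range is used as a pool inside a larger subnet.
print(old_range.num_addresses, [str(a) for a in old_range])
print(new_range.num_addresses, [str(a) for a in new_range])

# The old range sits inside the new one, so the addresses already in use
# are still valid after the expansion.
print(old_range.subnet_of(new_range))
```

With these example ranges the existing containers and the shim could keep their current addresses, and the 4 new ones are free to hand out.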


Debugging your network

The other day there was something strange happening on my home network. While my Amazon Fire HD8 tablet was happily on the WiFi, my Pixel 4a was ‘bouncing’ between WiFi and mobile data. The animated image at the top of the post is a simulation of what I saw.

The first thing I did was reboot my phone. Maybe this was some weird snafu and a reboot would fix it. This is the classic turn it off and on again.

The problem persisted. At the time I thought this may be just my device, but I found out later that Jenn’s Pixel 6a was doing the same thing, as was an older Pixel 2. The 6a was happy enough delivering internet, but based on the mobile data usage for the day it was clear most of the traffic was mobile data.

Next was a look at the OpenWRT router(s) – and I was a bit surprised that all of them had been up for 70+ days. I was headed out the door and decided to stall since it seemed that computers (and the 6a) were working ok (or so I thought). While working from the office I confirmed that my phone itself was fine: it had solid WiFi all day.

After dinner, it was clear that there was still a problem – but only the Pixel phones seemed to be having the problem. I tried removing the WiFi connection from my phone and re-adding it, maybe that would help? Nope. What if I switch to the guest network? Whaaat? Solid connectivity?!

Now the guest network doesn’t get the benefit of ad blocking from my Pi Hole. Maybe there is something going on there? I start poking around the logs on the Pi Hole, and reviewing logs on the OpenWRT router to see if there is any evidence. Nothing stands out. I try moving my phone to the IoT network – all three of the WiFi networks are hosted on the same hardware – so this helps eliminate the hardware and a good chunk of the software. The IoT network does make use of the Pi Hole, and to my surprise my phone was happy on the WiFi when using the IoT network.

At this point I’m about an hour and a half into debugging this, and I’m starting to run out of ideas. There have not been any configuration changes to my network recently. I don’t believe that any software updates have landed on my phone, and for 3 different Pixel devices to all develop the same problem all of a sudden is really weird. I’ve rebooted both my networking devices and the mobile devices – still the problem persists.

I run with 3 access points (two dumb APs and a main gateway), and each of them advertises the same SSID on two WiFi channels (a total of 6 distinct WiFi channels, all with the same SSID). This is a great setup for me as my devices just seamlessly move from connection to connection based on what is best.

My next idea is to change the SSID of one of the channels from ‘lan wifi’ to ‘hack wifi’ – allowing me to specifically connect to a given radio on a given access point. Now I can connect my phone to this new ‘hack wifi’ and know that configuration changes to this will affect the one device. Unsurprisingly the behaviour is the same, my phone just keeps connecting / reconnecting to this ‘hack wifi’.

I dive into some of the OpenWRT settings, looking for something that will make this WiFi connection more resilient. There are lots of options, but ultimately this is a dead end. I then wonder what would happen if I modify this WiFi connection so that it routes to the ‘iot’ network instead of my ‘lan’ network. Now my connection to ‘hack wifi’ works great... hmm.

What does this tell me? The problem seems to be something weird about my ‘lan’ network itself, rather than something specific about the way my ‘lan wifi’ is configured. This pivots me away from looking for differences in the WiFi configuration of my IoT, Guest, and Lan networks and towards looking more specifically at what’s connected to the lan network.

I grab the list of all devices on the lan network. Eliminate all of the WiFi clients in that list because it’s probably not them (but I’m guessing). Let’s take a closer look at the wired things (of which I have a good number). I start unplugging things, no joy. I turn off my main wired switch and still nothing. Finally I try unplugging my old server.

Boom. That was it. Almost immediately my phone connects to the WiFi network and stays connected. A quick check, and all of the other Pixel phones are happy now too.

This reminds me of a problem I had years ago, but my network was smaller and a little less complicated. A network cable I had built turned out to be bad, but only after months of use. Suddenly one day, my whole network was misbehaving, all devices were having problems. Powering off the machine had no effect, it was only once I removed the network cable that things came back to life. This was similar as the bad machine was connected to the main wired switch that I had powered off, but that wasn’t enough to fix the issue. Removing the cable was the fix.

Throughout this problem, computers on the ‘lan wifi’ seemed fine. Video calls worked fine, no strange drops or slow downs. Still, the impact to the Pixel phones was extreme.

The old server is very old (built in 2009), all of the drives in it are starting to fail and I should probably just power it off, wipe the drives and dispose of the hardware. I just haven’t quite gotten to it yet; this is probably a sign I should.

Edit – Oct 6th

Oh no, it happened again. I haven’t plugged the machine or the cable that I determined was the issue last time back in – so what is triggering this?

This time I went straight to the 24port switch. Last time I powered it off (and on) and that had no effect, so instead I removed it from the network by disconnecting the network cable between it and my OpenWRT router.

My phone immediately stopped bouncing between wifi and LTE, but just sat on LTE. It was then that I noticed that (of course) my pi hole, which runs on a Raspberry Pi 4, is connected to the wired switch.

I plugged the switch back into the network, and my phone happily rejoined the WiFi. The problem was gone, and has stayed gone since then. No reboots of any equipment were needed, only a brief drop from the network for the 24 port switch. Very mysterious.

Edit – Oct 23rd

Ugh, again. This time I tried to be more surgical. I unplugged the pi-hole machine and waited 30 seconds. Plugging it back in might have fixed things? (not a power cycle, just a network drop?) Or maybe it fixed itself? Grr.

Edit – Nov 2nd

Yup, again. I woke up and noticed my phone was bouncing. Unplug the pi-hole from the network, wait, add it back.. magic – all fixed.

My ISP recently started giving me IPv6 (again) — I’m wondering if there is something funny going on there. This is still odd.

Edit – Dec 13th

And again (and I have possibly missed recording one occurrence since the last edit as well). It seems clear that disconnecting the network cable from the Pi, waiting a few (10) seconds, then reconnecting it is all I need to do to resolve this problem. I have changed (and tested) the network cable and changed the port it is connected to on the 24port switch.

Final Edit

I’ve failed to update a few times when I’ve had minor issues – but based on the history above you can see it continued. I’ve now moved the RPi4 (pi-hole) from connecting to the 24port switch, to directly connecting to my OpenWRT (Archer c7). Knock on wood, so far I haven’t seen the problem since.

I lied – one more edit!

The mysterious wifi-cycling happened again, but just to a Pixel2 device. It kept connecting, and flashing that there was no internet and then going through the cycle all over again. I run multiple WiFi networks (and even have unique VLANs), and I could get the pixel2 to stay on the guest network no problem, but not the main lan network. This got me thinking – ok, it is something to do with my pi-hole. However, I’m not seeing any logs on the pi-hole.

The light bulb finally went on. What if the pixel2 is using the IPv6 address of the pi-hole to connect? Oh look – IPv6 address my DHCP server is handing out is not the IPv6 address of my pi-hole!?

My OpenWRT configuration, which hands out the IPv6 addresses and assigns the pi-hole its static addresses, had a configuration error. I had given it an IPv6 ‘suffix’ of 8, and it needed to be corrected to 08 to work.

One more reboot, and my pi-hole now has the correct IPv6 address – and the Pixel2 is happily connected to the wifi. This feels like the true root cause of this problem. It does explain why rebooting things helped: it changed the timing, and the device(s) having problems would either pick the IPv4 address or fail over to it when they couldn’t reach the IPv6 address. The pixel2 obviously has less logic in this space, and helped me find the problem.
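The comparison that finally exposed the problem can be sketched in a few lines. This is only an illustration: the function name `dns_address_is_live` is hypothetical, and the addresses are made up from the IPv6 documentation prefix. They do not reproduce my actual 8 vs 08 mismatch, only the idea of checking the advertised DNS address against the addresses the pi-hole really holds.

```python
import ipaddress


def dns_address_is_live(advertised_dns, host_addresses):
    """True if the advertised DNS address is one the host actually holds."""
    advertised = ipaddress.ip_address(advertised_dns)
    actual = {ipaddress.ip_address(a) for a in host_addresses}
    return advertised in actual


# Hypothetical addresses, for illustration only.
pihole_addresses = ["192.168.1.2", "2001:db8:0:1::1234"]

# The IPv4 address handed out matches, so IPv4-first clients work.
print(dns_address_is_live("192.168.1.2", pihole_addresses))
# The IPv6 address handed out is not on the pi-hole, so clients that
# prefer IPv6 DNS see "no internet".
print(dns_address_is_live("2001:db8:0:1::8", pihole_addresses))
```

In practice the inputs would come from what a client reports as its DNS servers and from the address list on the pi-hole itself.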

OpenWRT Travel Router

I recently posted about my purchase of the GL.iNet GL-AR300M16 which I of course immediately flashed with OpenWRT. As this was intended as a travel router it came along with us on a recent vacation. Above you can see the tiny little GL.iNet device plugged via the WAN port into one of the LAN ports of the internet router of the rental we had.

The GL.iNet isn’t a speedy device – with only a single 2.4GHz wireless connection it wasn’t able to saturate the internet connection (200Mbps symmetric) but I was still getting pretty reasonable speeds (~50Mbps).

I had set up the travel router to have a “travel” SSID, and could associate all of the devices we’d brought (6) to that. Sure this is a setup step, but for future trips I’ll only have to set up the travel router and all the devices will connect to the “travel” SSID.

As an aside, I’ll mention that I’ve started to bring my Roku when we travel, this way I have a familiar movie/show watching experience and I don’t have to remember to clear any passwords when I leave because I take the box with me.

Where it gets more interesting, is that I configured the travel router as a wireguard client. Pretty much following my post on OpenWRT as a wireguard client verbatim. I did set up the allowed_ips as a /24 CIDR block – effectively creating a split VPN – so that traffic targeted at my ‘home’ network would flow over wireguard, but other traffic would go directly to the internet. The benefit to this VPN setup is that if I’m streaming a movie on Netflix, that traffic will bypass the wireguard tunnel and when I want to reach a “local to my home network” service like homeassistant, it just works like at home.
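The split decision is just a prefix match on the destination address. Here is a minimal sketch using Python's standard `ipaddress` module; the home network `192.168.1.0/24` and the function `uses_tunnel` are hypothetical stand-ins, and real wireguard routing happens in the kernel, not in a script like this.

```python
import ipaddress

# Hypothetical home network, standing in for the /24 in allowed_ips.
HOME_NET = ipaddress.ip_network("192.168.1.0/24")


def uses_tunnel(destination, allowed_ips=(HOME_NET,)):
    """True if traffic to destination would be sent over wireguard."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in allowed_ips)


# A service on the home network goes through the tunnel.
print(uses_tunnel("192.168.1.50"))
# A public address (say, a streaming service) goes straight to the internet.
print(uses_tunnel("8.8.8.8"))
# For contrast, allowed_ips of 0.0.0.0/0 would send all IPv4 traffic home.
print(uses_tunnel("8.8.8.8", (ipaddress.ip_network("0.0.0.0/0"),)))
```

Keeping allowed_ips narrow is what stops streaming traffic from taking a round trip through my home connection.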

Then as icing on the cake, I fiddled with the DNS options so that any address handed out by the travel router gets my pi-hole as the DNS server. If you want to do something similar check out my pi-hole setup post that talks about this configuration in OpenWRT. This gives me ad-blocking and my personal block lists. This helps keep the internet a little bit more family friendly, plus no ads!

I did experience a couple of network hangs, 4 over the week long trip, but a quick power cycle of the router and we were back in business. I suspect that this may have been either high load, or heat, that triggered the problem. The limitation of only 2.4GHz networking didn’t seem to be a big deal, and I got reasonable WiFi coverage over 3 floors of a townhome.

This setup was pretty awesome. It gave me an ‘at home’ network experience, while I was away from home. What a great little box.

As a bonus, let’s dive into another travel configuration. I’m writing this post from a hotel room, connected to the travel router. Now the hotel doesn’t have a wired ethernet port, so I need to do something slightly different.

There is an OpenWRT package “travel-mate” that makes this more complicated setup easy. We want to operate in AP+STA (access point + station) mode, where the single wifi radio is doing both jobs. Many routers can do this, and the GL.iNet is one of them.

The travel-mate documentation is a little sparse, but there is a long and fairly active forum thread that provides help. I was able to get it working with a little bit of stumbling around.

Installing two packages: travelmate and luci-app-travelmate will get you going. An OpenWRT menu “Services->Travelmate” will appear in the web UI, allowing you to access the configuration.

A newly installed travel-mate will have blank information, mine is a capture from a running version.

You’ll need to do a one time “Interface Wizard” to get the interfaces set up. This should create some trm_ network interfaces. I did this once and have forgotten about the details; you can probably safely do the same.

When you are ready to connect to the upstream WiFi (say the hotel’s internet) you will want to visit the “Wireless Station” tab and scan for, and select a SSID to connect to.

There is some magic I don’t yet fully understand about configuring a login script to bypass the captive portal that your hotel is likely to have. In my case, my laptop that connected to the travel router was presented with the captive portal webpage and I was able to log in that way (the travel router basically was a proxy for the captive portal). Once logged in, the router was granted access by the hotel WiFi and all other devices connected to the travel router just worked. (yeah, magic)

I’ll just quickly cover the travel-mate General Settings.

The top red circle is the “Enabled” checkbox. This is handy as you don’t want travel-mate to be active if you’re using the travel router in a wired setup like I was in the top part of this post. Leaving it enabled while in a wired setup will possibly cause WiFi drop outs as it tries to scan for available networks to connect to.

The bottom red circle is checked on by default, but for my use I found that I had to disable it. Otherwise it was disabling my wireguard VPN. With the checkbox cleared, my split VPN is working fine and I’m enjoying the “at home networking, while I’m not at home” experience. It was also pretty nice that my phone and my tablet just connected to the “travel” WiFi once it was up and running.

Since we are using a single radio to both handle the clients (my devices) and talk to the host network (the hotel WiFi), you can expect the overall speed to be much lower. I know this is true as I’ve tested travel-mate in this AP+STA mode with my home network, and seen the difference (I was only able to get about 26Mbps when my home net connection is much faster). The good news here is that hotel WiFi, while adequate, isn’t very good, at least not in this hotel.

Here are two speed tests, one via the travel-router, and one directly to the hotel WiFi.

They are basically the same, especially given the variations you’ll see on the hotel WiFi. The key take away here is that using the travel router isn’t imposing any real overhead or limits, and if we had much better hotel WiFi I’d still get acceptable performance.

It is interesting to note that with travel-mate running in AP+STA mode, and only 3 devices and 1 user, it was very stable. I didn’t have any hangs or weird problems once it was set up. I’ll certainly bring it along for future trips.