Docker system prune – not always what you expect

Containers have improved my ‘home lab’ significantly. I’ve run a server at home (exposed to the internet) for many years. Linux has made this both easy to do and fairly secure.

However, in the old – “I’ll just install these packages on my Linux box” – model, you’d end up with package A needing one version of a dependency and package B needing a different version of the same one, and then you’d have version conflicts. It was always something you could resolve, but with enough software you’d have a mess of dependencies to figure out.

Containers solve this by giving you lightweight ‘virtualization’ that isolates each of your packages from the others, AND they are also a very convenient distribution mechanism. This lets you easily get a complete, functional application with all of its dependencies in a single bundle. I’ll point at linuxserver.io as a great place to get curated images from. Also, consider having an update policy to help you keep current: something like Diun, or fully automate with Watchtower.

Watchtower does have the ability to do some cleanup for you, but I’m not using Watchtower (yet). I have included some image cleanup in my makefiles because I was trying to fight filesystem bloat caused by updates. While I don’t want to prematurely delete anything, I also don’t want a lot of old cruft using up my disk space.

I recently became aware of the large number of docker volumes on my system. I didn’t count, but it was well over 50 (the list filled my terminal window). This seemed odd; some of them had a creation date of 2019.

Let’s just remove them with docker volume prune. Yup, it says it will remove all volumes not used by at least one container. Hmm, no – I still have so many. Let’s investigate further.

What? If I sub in a volume id that I know is attached to a container, I do get the container shown to me. This feels like both docker system prune and docker volume prune are broken.
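This kind of cross-check can be automated. Below is a minimal Python sketch, assuming the docker CLI is on the PATH: it lists every volume, then asks docker ps -a with a volume filter whether any container (running or stopped) still mounts it. The helper names are hypothetical, and the runner is injectable so the logic can be tried without touching a real Docker daemon.

```python
import subprocess
from typing import Callable, List

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[List[str]], str]


def run_docker(argv: List[str]) -> str:
    """Run a docker CLI command and return its stdout (raises on failure)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def containers_using(volume: str, run: Runner = run_docker) -> List[str]:
    """IDs of containers, running or stopped, that mount the given volume."""
    out = run(["docker", "ps", "-a", "-q", "--filter", f"volume={volume}"])
    return out.split()


def orphaned_volumes(run: Runner = run_docker) -> List[str]:
    """Volumes that no container, running or stopped, refers to."""
    volumes = run(["docker", "volume", "ls", "-q"]).split()
    return [v for v in volumes if not containers_using(v, run)]
```

Calling orphaned_volumes() on a real host prints nothing by itself; printing its result gives the list of volumes that prune "should" have removed.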

Thankfully the internet is helpful if you know what to search for. Stack Overflow helped me out, and it in turn pointed me at a GitHub issue. Here is what I understand from those.

Docker has both anonymous and named volumes. Unfortunately, many people were naming volumes and treating them like permanent objects, and running docker system prune was removing those named volumes whenever there wasn’t a container associated with them. Losing data sucks, so Docker changed prune to no longer remove named volumes by default.
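As I understand the change (Docker Engine 23.0), prune decides which volumes are anonymous by looking for a label, com.docker.volume.anonymous, that the engine attaches when it creates an anonymous volume; passing --all makes prune consider named volumes too. The toy model below is a sketch of that rule under those assumptions, not Docker's actual code. The Volume class and prune_candidates function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumption: the label Docker 23.0+ puts on volumes it creates anonymously.
ANON_LABEL = "com.docker.volume.anonymous"


@dataclass
class Volume:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def prune_candidates(volumes: List[Volume], in_use: Set[str],
                     all_volumes: bool = False) -> List[str]:
    """Model of which volumes 'docker volume prune' would remove.

    Default: only unused volumes that carry the anonymous label.
    all_volumes=True (like --all): every unused volume, named ones included.
    """
    result = []
    for v in volumes:
        if v.name in in_use:
            continue  # attached to some container: never pruned
        if all_volumes or ANON_LABEL in v.labels:
            result.append(v.name)
    return result
```

In this model, a volume that was created anonymously by an older engine, before the label existed, lands in the "named" bucket and survives a default prune. That is one plausible explanation for unused volumes hanging around since 2019.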

In my case, I had some container images that declared mount points I wasn’t specifying as part of my setup. An example is a /var/log mount: when I create the container, docker creates a volume on my behalf, and as far as prune is concerned it is a named volume. When I recreate that container, I get a new volume and ‘leak’ the old one, which is no longer attached to any container. This explains why I had 50+ volumes hanging out.
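The leak can be illustrated with a toy model (hypothetical function, not Docker's code): every time the container is recreated, any path the image declares as a volume that wasn't explicitly mapped gets a freshly generated volume, and the previous one stays behind.

```python
import itertools
from typing import Dict, List, Set


def leaked_volumes(updates: int, explicit: Dict[str, str], declared: List[str]) -> Set[str]:
    """Recreate a container `updates` times and return the orphaned volumes.

    explicit: container path -> volume name the user mapped (-v name:/path).
    declared: paths the image marks as volumes.
    """
    ids = itertools.count(1)
    every_volume: Set[str] = set()
    current: Set[str] = set()
    for _ in range(updates + 1):  # the initial create plus each update
        mounts = dict(explicit)
        for path in declared:
            if path not in mounts:
                # Unmapped declared path: a brand-new generated volume.
                mounts[path] = f"auto-{next(ids)}"
        # The old container is removed, but its volumes are not.
        current = set(mounts.values())
        every_volume |= current
    return every_volume - current
```

Mapping every declared path explicitly (for example /var/log as well as /config) makes the leak disappear in this model, because the same named volume is simply reused on each recreate.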

You can easily fix this.
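One way to do this, assuming a Docker CLI of version 23.0 or later, is docker volume prune --all, which also removes unused named volumes (add --force to skip the confirmation prompt). Because that deletes every volume not attached to some container, it's worth checking docker volume ls first. A small Python wrapper sketch, with hypothetical function names:

```python
import subprocess
from typing import List


def prune_all_unused_volumes_argv(force: bool = False) -> List[str]:
    """argv for pruning every unused volume, named ones included (Docker 23.0+)."""
    argv = ["docker", "volume", "prune", "--all"]
    if force:
        argv.append("--force")  # skip the interactive "Are you sure?" prompt
    return argv


def prune_all_unused_volumes(force: bool = False) -> None:
    """Run the prune; the prompt and the summary go straight to the terminal."""
    subprocess.run(prune_all_unused_volumes_argv(force), check=True)
```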

Yup, now I have very few docker volumes on my system – the remaining ones are associated with either a running or a stopped container.

Running Selenium testing in a single Docker container

Selenium is a pretty neat bit of kit: a framework that makes it easy to create browser automation for testing and other web-scraping activities. Unfortunately, there seems to be a dependency mess just to get going, and when I hit these types of problems I turn to Docker to contain the mess.

While there are a number of “Selenium + Docker” posts out there, many use more complex multi-container setups. I wanted a very simple single container with Chrome + Selenium + my code to go grab something off the web. This article is close, but doesn’t work out of the box due to various software updates. This blog post covers the changes needed.

First up is the Dockerfile.

The changes needed from the original article are minor. Since Chrome 115, chromedriver is published through Chrome for Testing at a different location, and the zip file layout is slightly different. I also updated it to pull the latest version of Selenium.
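To make the location and layout change concrete, here is a Python sketch of the two steps. It assumes the Chrome for Testing "last-known-good-versions-with-downloads" JSON endpoint and its channels → downloads → chromedriver structure, and it assumes that since 115 the binary sits inside a chromedriver-<platform>/ folder in the zip instead of at the top level. The function names are hypothetical, and only the fetch step touches the network.

```python
import io
import json
import urllib.request
import zipfile
from typing import Any, Dict

# Assumed Chrome for Testing endpoint listing download URLs per channel.
CFT_JSON_URL = ("https://googlechromelabs.github.io/chrome-for-testing/"
                "last-known-good-versions-with-downloads.json")


def fetch_cft_json(url: str = CFT_JSON_URL) -> Dict[str, Any]:
    """Download and parse the Chrome for Testing version listing."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def chromedriver_url(cft: Dict[str, Any], channel: str = "Stable",
                     platform: str = "linux64") -> str:
    """Pick the chromedriver zip URL for a channel and platform."""
    for entry in cft["channels"][channel]["downloads"]["chromedriver"]:
        if entry["platform"] == platform:
            return entry["url"]
    raise LookupError(f"no chromedriver build for {platform} on {channel}")


def extract_chromedriver(zip_bytes: bytes, platform: str = "linux64") -> bytes:
    """Read the (non-Windows) chromedriver binary out of a 115+ style zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # New layout: chromedriver-<platform>/chromedriver, not ./chromedriver
        return zf.read(f"chromedriver-{platform}/chromedriver")
```

In a Dockerfile the same idea usually becomes a curl + unzip step; the point of the sketch is just where the URL comes from and which path inside the archive holds the binary.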

ChromeDriver is a standalone server that implements the W3C WebDriver standard. This is what Selenium will use to control the Chrome browser.
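Because ChromeDriver speaks plain W3C WebDriver over HTTP, the first thing Selenium does is essentially a "New Session" request. The sketch below builds that request body by hand to show what goes over the wire: browserName plus the Chrome-specific goog:chromeOptions capability carrying the browser arguments. It assumes a chromedriver already listening at base_url (chromedriver's default port is 9515); the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def new_session_payload(chrome_args: List[str]) -> Dict[str, Any]:
    """W3C WebDriver 'New Session' body requesting Chrome with extra arguments."""
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "chrome",
                # Vendor-specific capability that ChromeDriver understands.
                "goog:chromeOptions": {"args": list(chrome_args)},
            }
        }
    }


def new_session(base_url: str, chrome_args: List[str]) -> str:
    """POST /session to a running chromedriver and return the session id."""
    body = json.dumps(new_session_payload(chrome_args)).encode()
    req = urllib.request.Request(f"{base_url}/session", data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]["sessionId"]
```

For example, new_session("http://localhost:9515", ["--headless=new"]) would start a headless Chrome, assuming chromedriver is running locally.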

The second part is the Python script tests.py.

Again, only minor changes here to account for changes in the Selenium APIs. This script does some of the key ‘tricks’ needed to make Chrome run inside Docker (passing a few arguments to Chrome).

This is a very basic ‘hello world’ style test case, but it’s a starting point for writing a more complicated web scraper.
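For illustration only (this is not the original tests.py), here is a sketch of the kind of Chrome arguments commonly used inside a container, together with a hello-world check using the Selenium 4 Python API. The specific flag list and the function names are assumptions. Selenium is imported inside the function so that the flag list can be inspected without Selenium installed.

```python
from typing import List

# Assumed set of flags commonly needed to run Chrome inside a container.
DOCKER_CHROME_ARGS: List[str] = [
    "--headless=new",           # no display server inside the container
    "--no-sandbox",             # Chrome refuses to run as root with the sandbox on
    "--disable-dev-shm-usage",  # Docker's default /dev/shm (64MB) is small
]


def make_driver():
    """Create a Selenium Chrome driver configured for a container."""
    from selenium import webdriver  # imported lazily, see above

    options = webdriver.ChromeOptions()
    for arg in DOCKER_CHROME_ARGS:
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


def page_title(url: str) -> str:
    """Hello-world check: load a page and return its <title>."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always shut Chrome down, even if get() fails
```

Inside the container, print(page_title("https://www.python.org")) would print the page title to stdout.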

Building is as simple as:

And then we run it and get output on stdout:
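If you'd rather script the build and run steps, here is a hedged Python sketch. The image tag selenium-demo is hypothetical, since no tag is given here. docker run --rm deletes the container when the script exits, which also keeps stopped containers from piling up.

```python
import subprocess
from typing import List

IMAGE_TAG = "selenium-demo"  # hypothetical tag, pick your own


def build_argv(tag: str = IMAGE_TAG, context: str = ".") -> List[str]:
    """docker build using the Dockerfile in the given context directory."""
    return ["docker", "build", "-t", tag, context]


def run_argv(tag: str = IMAGE_TAG) -> List[str]:
    """docker run that removes the container afterwards; output goes to stdout."""
    return ["docker", "run", "--rm", tag]


def build_and_run(tag: str = IMAGE_TAG) -> None:
    subprocess.run(build_argv(tag), check=True)
    subprocess.run(run_argv(tag), check=True)
```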

Armed with this simple Docker container and the Python Selenium documentation, you can now scrape complex web pages with relative ease.

Update: Managing docker containers with makefiles

I’ve been a bit old school, preferring “just use docker” over “docker-compose”, but the other day I learned that “compose” is now baked into the base docker CLI. This means there is less reason to avoid using the features of docker compose.

Still, as previously discussed, I like using Makefiles to manage my containers. There is a very good argument for adopting Watchtower, despite it not being recommended:

We do not endorse the use of Watchtower as a solution to automated updates of existing Docker containers. In fact we generally discourage automated updates.

Let me go on record: if you’re going to blindly run your update process, you may as well just use Watchtower. This is a good way to automatically break things, of course, so maybe don’t auto-update things that you rely on to work (like WireGuard or your web server).

With that out of the way – let me point out one of the downsides of my previous Makefile approach to managing containers: I’d slowly run out of disk space with many old images. I had gotten into the habit of running

every once in a while. This would get rid of all the containers and images that were no longer in use. It did have the downside of removing images not used by a running container, so you need to be aware of that if you were counting on those images still being around.

Still, my makefile approach saved me a few times, allowing me to easily rename and re-run the -old version of a container and back out of a bad update.

What I needed to do was make sure that the image behind the old, -old container was removed. This was where most of the disk space growth was. This turns out to be easier to do than to explain. Here is my update makefile:
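As a rough illustration of this kind of update flow (a hypothetical sketch in Python, not the actual makefile), the steps can be written as a list of docker commands. The current container is kept as <name>-old for rollback, the previous rollback copy is discarded, and that copy's image is removed once nothing references it. The function name and arguments are assumptions.

```python
from typing import List, Optional


def update_plan(name: str, image: str, run_args: List[str],
                old_image_id: Optional[str] = None) -> List[List[str]]:
    """Hypothetical update sequence keeping one rollback copy as <name>-old."""
    old = f"{name}-old"
    plan = [
        ["docker", "pull", image],
        # Drop the previous rollback copy. This errors on the very first update,
        # when no -old container exists yet (a Makefile would prefix it with '-').
        ["docker", "rm", "-f", old],
    ]
    if old_image_id:
        # Fails if the running container still uses that same image.
        plan.append(["docker", "rmi", old_image_id])
    plan += [
        ["docker", "stop", name],
        ["docker", "rename", name, old],  # current version becomes the rollback copy
        ["docker", "run", "-d", "--name", name, *run_args, image],
    ]
    return plan
```

Backing out of a bad update is then the reverse: stop and remove the new container, rename <name>-old back to <name>, and start it.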

There are only two new lines. The first is:

This is assigning the variable IMAGEID to have the docker image id value from the current -old image.

The second is the obvious removal of the image based on that id.
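The same two steps, sketched in Python under the assumption that docker inspect --format '{{.Image}}' on a container returns the ID of the image it was created from. The first function plays the role of the IMAGEID assignment. The helper names are hypothetical, and the runner is injectable so the sequence can be checked without Docker.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_docker(argv: List[str]) -> str:
    """Run a docker command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def old_image_id(name: str, run: Runner = run_docker) -> str:
    """Image ID behind the <name>-old container."""
    return run(["docker", "inspect", "--format", "{{.Image}}", f"{name}-old"]).strip()


def remove_old_container_and_image(name: str, run: Runner = run_docker) -> str:
    image_id = old_image_id(name, run)  # must be read before the container is gone
    run(["docker", "rm", f"{name}-old"])
    run(["docker", "rmi", image_id])
    return image_id
```

The ordering matters: once the -old container is deleted there is nothing left to inspect, and docker rmi refuses to remove an image that a container still references.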

I believe that these two additional lines result in nearly no filesystem growth as I update images. Very handy as some of the linuxserver.io images pull down ~1GB of data every time, and they are updated weekly.
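To put "very handy" in numbers, here is a rough upper bound for one image that pulls about 1GB every week with no cleanup at all. It assumes no layers are shared between versions, so it overstates the real growth:

```latex
1\ \text{GB/update} \times 52\ \text{updates/year} \approx 52\ \text{GB per image per year}
```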