PSA: DNS servers have no priority order

It is a common misconception that the DNS servers your system uses are consulted in a strict priority order. I had this misunderstanding for years, and I’ve seen many others with it.

The problem comes from router or OS setup screens where you can list a “Primary” and a “Secondary” DNS server. This certainly gives the impression that one is ‘mostly used’ and the other is a ‘backup’ used only if the first one is broken or too slow. This is false, but confusingly it is also sometimes true.

Consider this Stack Exchange question/answer. Or this Server Fault question. If you go searching there are many more questions on this topic.

Neither DNS resolver lists nor NS record sets are intrinsically ordered, so there is no “primary”. Clients are free to query whichever one they want in whichever order they want. For resolvers specifically, clients might default to using the servers in the same order as they were given to the client, but, as you’ve discovered, they also might not.

Let me also assure you from personal experience: there is no guarantee of order. Some systems will always try the “Primary” first, then fall back to the “Secondary”. Others will round-robin queries. Some will detect a single failure and re-order the two servers for all future queries. Some devices (Amazon Fire tablets) will magically use a hard-coded DNS server if the configured ones are not working.
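On Linux, for example, the order of the nameserver lines in /etc/resolv.conf is only a hint, and even that applies just to the classic glibc stub resolver – systemd-resolved and other resolvers have their own behaviour. A sketch (the addresses are placeholders):

```
# /etc/resolv.conf (addresses are placeholders)
nameserver 192.168.1.2    # tried first by default, with fallback on timeout
nameserver 192.168.1.3
options rotate            # this one line makes glibc round-robin instead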

Things get even more confusing because there is the behaviour of the individual clients (like your laptop or phone), and then the layers of DNS servers between you and the authoritative server. DNS is a core part of how the internet works, and there is lots of information on the different parts of DNS out there.

The names “Primary” and “Secondary” come from the server side of DNS. When you are hosting a system and configure the domain name to IP mapping, you set up your DNS records in the “Primary” system. The “Secondary” system is usually an automated replica of that “Primary”. This really has nothing to do with how client devices will use those addresses.

Another pitfall people run into when they assume there is an ordering is setting up a pi-hole for ad-blocking. They will use their new pi-hole installation as the “Primary” and a popular public DNS server as the “Secondary”. This configuration sort of works – at least some of the time, your client machine will hit your pi-hole and ad-blocking will work. Then, unpredictably, an ad will not be blocked – because the client has used the “Secondary”.

Advice: Assume all DNS servers are the same and will return the same answer. There is no ordering.

I personally run two pi-hole installations. My “Primary” handles about 80% of the traffic, and the “Secondary” about 20%. This isn’t because 20% of the time my “Primary” is unavailable or too slow, but simply that about 20% of the client requests are deciding to use the “Secondary” for whatever reason (and that a large amount of my traffic comes from my Ubuntu server machine). Looking deeper at the two pi-hole dashboards, the mix of clients looks about the same, but the “Secondary” has fewer clients – it does seem fairly random.

If your ISP hands out IPv6 addresses, things get even more interesting: clients will also be assigned an IPv6 DNS address, which adds yet another interface to the client device and another potential DNS server (or two) that may be used for name lookups.

Remember, it’s always DNS.

When rate limiting (and firewalling) goes wrong

Recently I experienced a few power failures that lasted hours. This means that when the power is back, all of my infrastructure reboots and reconnects. For the most part this is 100% automatic, but the last time I ran into an interesting problem.

My pi-hole was running with the default rate limiting of 1000/60. This means that each device can make up to 1000 requests per minute, and if it exceeds that it will be put on a deny list for 60 seconds.

It turns out that my main server that runs a bunch of docker containers makes a lot of DNS requests when everything is starting up all at once. This creates a storm of requests to the pi-hole and the server ends up being blocked for DNS requests (responding with REFUSED) due to rate limiting.

Unfortunately the behaviour of enough of the containers is to retry when this happens. This causes more DNS requests to be made as the retry logic runs. These retries cause another wave of requests which cause the server to be blocked again. Some of my containers entered error conditions due to unexpected DNS failures, so these needed to later be restarted but at least they stopped contributing to the problem.
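To see why the retries made things worse rather than better, here is a toy simulation of the feedback loop. This is not pi-hole’s actual rate-limiting algorithm, just an illustration: once the startup storm trips the limit, naive retry logic keeps the client over the threshold indefinitely.

```python
# Toy model of the rate-limit feedback loop described above.
# A client that exceeds max_queries within a one-minute window gets
# REFUSED; naive clients retry, multiplying the next window's load.

def simulate(minutes, queries_per_minute, max_queries=1000, retry_factor=2):
    """Return how many of the given minutes the client spent blocked.

    retry_factor models containers retrying on REFUSED, which inflates
    the next minute's query load instead of letting it drain.
    """
    blocked_minutes = 0
    load = queries_per_minute
    for _ in range(minutes):
        if load > max_queries:
            blocked_minutes += 1        # rate limiter kicks in (REFUSED)
            load = load * retry_factor  # retries pile on even more load
        else:
            load = queries_per_minute   # under the limit: steady state
    return blocked_minutes

# A quiet client never trips the limit...
assert simulate(10, 500) == 0
# ...but a startup storm above the limit stays blocked the whole time.
assert simulate(10, 1500) == 10
```

The only way out of the loop in this model is for the offered load to drop below the limit – which is exactly what the containers that gave up and sat in an error state accidentally achieved.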

My email container was pretty unhappy; it really wants to be able to use DNS, even when receiving email. Since my server had been unavailable for a while, there were external email servers trying to deliver mail that had been queued – this contributed to the load. Additionally, I couldn’t connect any email clients to the server, which left me scratching my head a little; more on that later.

The ‘fix’ was easy enough. Modify the pi-hole DNS rate-limiting setting to 0/0 to remove any rate limiting. This is imperfect, but at one point I saw 30,000 requests in a minute from my struggling server and I think I’d rather have no limit and deal with that problem than hit the limit and run into this denial of service issue.
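On my install this setting lives in the FTL configuration file, and 0/0 disables the limiter entirely:

```
# /etc/pihole/pihole-FTL.conf
RATE_LIMIT=0/0    # 0/0 disables rate limiting entirely
```

followed by restarting the DNS service (pihole restartdns) to pick up the change.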

Now that the pi-hole was happy, I was able to get most of my containers to be happy with a little poking at them. Email was still sad, and this took me a coffee break to realize what was wrong. The email container was receiving email just fine, but I could not connect with a client. This felt like a networking problem, but how could that be?

I had forgotten (again) that the email server has fail2ban running in it. This scans logs looking for suspicious activity and will ban an IP for a period of time by inserting a firewall rule. Furthermore, since I use the domain name to configure my email client, the name resolves to the external IP. That means the client talks to my OpenWRT router, which provides NAT and then redirects/maps that external IP back into my network. The effect is that the originating IP looks like it is my router, not the client machine on the internal network. This process is called NAT reflection, or NAT hairpinning.

NAT reflection is a super handy feature for my OpenWRT router to have: it lets me, from inside my home network, visit a machine I’ve exposed via port mapping to the outside world using the same DNS entry that points at the external IP address. The downside is that services on that machine see my router IP as the client IP. So when any machine in my house has problems connecting to my email server – in this case because the email server was getting DNS REFUSED errors – fail2ban decides the router is a bad client and bans it, blocking all traffic originating from my home network.

This is easy to fix once you understand what is happening: I just needed to unban my router IP and my email clients could connect again.
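For anyone in the same spot, the unban is a one-liner with fail2ban-client. The jail name below is illustrative – fail2ban-client status lists the real ones on your system:

```
# list the active jails, then remove the ban on the router IP
fail2ban-client status
fail2ban-client set postfix unbanip 192.168.1.1
```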

Pi-hole Ubuntu Server (take 2)

My past two posts have been the hardware setup and pi-hole deployment. You rarely get things right the first time and this is no exception.

I’ve already added an update to the first post about my nullmailer configuration. I had forgotten to modify /etc/mailname and thus a lot of the automated email was failing. I happened to notice this because my newly deployed pi-hole was getting a lot of reverse name lookups for my main router, which were triggered by the email system writing the error.

You may also want to look at / clean out the /var/spool/nullmailer/failed directory; it likely has a bunch of emails that were not sent.

Once corrected, I started to get a bunch of emails. They were two different errors, both triggered by visiting the pi-hole web interface.

The first issue is a bit curious. Let’s start by looking at /etc/resolv.conf, which is a generated file based on the DHCP configuration.
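A DHCP-generated resolv.conf of this kind looks something like the following (the addresses here are placeholders for the custom upstream entries):

```
# Generated by dhcpcd from DHCP lease information
nameserver 1.1.1.1
nameserver 9.9.9.9
```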

Next, look at the static configuration that the pi-hole install created in /etc/dhcpcd.conf.
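The installer appends a block along these lines (the interface name and addresses are placeholders):

```
# /etc/dhcpcd.conf – appended by the pi-hole installer
interface eth0
        static ip_address=192.168.1.10/24
        static routers=192.168.1.1
        static domain_name_servers=1.1.1.1 9.9.9.9
```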

Decoding the interaction here: the pi-hole setup script believes we should have a static IP, and it has also assigned the pi-hole host the same DNS servers that I specified as my custom entries for upstream resolution.

This feels like the pi-hole install script intentionally makes the pi-hole host machine unable to do local DNS resolution. I wonder if all pi-hole installations have this challenge. I asked on the forum, and after a long discussion I came to agree that I had fallen into a trap.

A properly configured linux machine will have the name from /etc/hostname also reflected in /etc/hosts (as I have done). My setup was working fine because the DNS service on my OpenWRT router was covering up this gap. While it’s nice that it just worked, the machine should really be able to resolve its own name without relying on DNS.

As my network has IPv6 as well, the pi-hole also gets an IPv6 DNS entry. Since the static configuration doesn’t specify an upstream IPv6 DNS server, the ULA for the pi-hole leaks into the configuration. This is a potentially scary DNS recursion problem, but it doesn’t seem to be getting used to forward the lookup of ‘myhost’ back to the pi-hole, so I’m going to ignore it for now.

An easy fix is to modify /etc/hosts.
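For example, adding the conventional Debian-style loopback alias (the machine’s static IP would also work in place of 127.0.1.1):

```
127.0.1.1       myhost
```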

Where myhost is whatever appears in /etc/hostname.

This fixes the first problem, and it seems the second problem was also caused by the same inability to resolve myhost. I think this is exactly what this post was describing when it saw those error messages. This is another thread with a different solution to a similar problem.

Satisfied that I’d solved that mystery, along the way I’d also created a list of things to ‘fix’:

  1. Ditch rsyslog remote
  2. Run logwatch locally
  3. Run fluentbit locally
  4. Password-less sudo
  5. SSH key only access
  6. Add mosh

The rest of this post is just details on these 6 changes.
