Hoarder – a self-hosted link collection and web archive

I found out about Hoarder via the Self-Hosted podcast. While I don’t always agree with the opinions of the hosts, they’ve helped me discover useful things a few times. I’d certainly recommend checking the podcast out.

The Bookmark Everything App

Quickly save links, notes, and images and hoarder will automatically tag them for you using AI for faster retrieval. Built for the data hoarders out there!

The install is Docker-friendly and based on Compose. It’s a very simple three steps to get a test instance set up (sketched below):

  1. Download the compose yaml.
  2. Create a .env file with a few values.
  3. Then docker compose up.
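Roughly, the whole thing looks like this – the compose URL and variable names are how I remember them from the Hoarder docs, so double-check against the current docs:

  # Grab the compose file (path assumed from the project repo)
  wget https://raw.githubusercontent.com/hoarder-app/hoarder/main/docker/docker-compose.yml

  # .env – the handful of values the docs ask for
  HOARDER_VERSION=release
  NEXTAUTH_URL=http://localhost:3000
  NEXTAUTH_SECRET=change-me   # generate with: openssl rand -base64 36
  MEILI_MASTER_KEY=change-me  # generate with: openssl rand -base64 36

  # Bring it up
  docker compose up -d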

It seems to support open sign-up – if you host this publicly, you may get some spammy sign-ups. That seems like something you’d want to disable (and yes, you can, as I find out below).

After you have created a user, you are greeted with this blank canvas.

I currently run Wallabag, which I landed on after trying a few other choices. It was the best choice for my needs at the time, but it’s also super basic. Wallabag has a mobile app which I find useful, as it makes it easy to share links I find on mobile to my Wallabag install.

Wallabag often struggles to capture a page – but it at least keeps the link. One example is this website, which has some sort of scraper blocker – you just get a page indicating the content is protected.

OK – so how does Hoarder do with a link like https://www.thekitchn.com/instant-pot-bo-kho-recipe-23242169?

For comparison, this is what Wallabag got:

The capture in Hoarder takes a bit of time – not long, but it renders a sort of blank-ish card immediately and then the image fills in.


Let’s take a closer look at the tile that it created for me.

The top of the tile is a picture that links to the original URL (1); the title link (1) goes to the same destination.
The date (2) and expansion arrows (2) both take you to a larger, locally hosted view.
(3) is a menu of options.

Let’s dive deeper into the expanded (locally hosted) view.

The overall capture/rendering of the page from the local version is pretty good. Links in the text haven’t been re-written, but that’s both expected and generally useful.

This view also offers the option to view a screenshot – which is as you expect.

Since I didn’t provide an OpenAI key, nor did I configure Ollama, the fancy “Summarize with AI” button just gives me an error.
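If you do want the AI features, the docs describe inference settings along these lines – I haven’t verified the exact variable names, so treat this as a sketch and check the current docs:

  # Either an OpenAI key...
  OPENAI_API_KEY=sk-...
  # ...or point it at a local Ollama instance
  OLLAMA_BASE_URL=http://ollama:11434
  INFERENCE_TEXT_MODEL=llama3.1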

Looking around – it seems this setup runs three unique containers:

  • ghcr.io/hoarder-app/hoarder:release
  • getmeili/meilisearch:v1.11.1
  • gcr.io/zenika-hub/alpine-chrome:123

But I’m not seeing any storage on the host – this is probably bad, because that means at least one of these containers is stateful (and looking at the compose file, there are two data volumes).

I prefer storing my data on the host filesystem as a volume mapping… maybe I’ll warm up to the whole Docker volume thing, but it always feels like a big hack. (Read on and you’ll find that there is a way to avoid the storage concerns that I have here.)
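If you share that preference, Compose lets you swap the named volumes for host bind mounts. A minimal sketch, assuming the service names and container paths from the stock compose file:

  # docker-compose.override.yml
  services:
    web:
      volumes:
        - ./hoarder-data:/data
    meilisearch:
      volumes:
        - ./meili-data:/meili_data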

The search appears limited to the title only (boo) – tags are supported in search too, but there’s no deep searching within the text of the articles.

Looking more at the docs, persistence is something you can configure:

DATA_DIR – “The path for the persistent data directory. This is where the db and the uploaded assets live.”

And it does appear you can stop signups from happening:

DISABLE_SIGNUPS – “If enabled, no new signups will be allowed and the signup button will be disabled in the UI”

There are also interesting options for the crawler (all disabled by default):

CRAWLER_FULL_PAGE_SCREENSHOT – “Whether to store a screenshot of the full page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, the screenshot will only include the visible part of the page”

CRAWLER_FULL_PAGE_ARCHIVE – “Whether to store a full local copy of the page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, only the readable text of the page is archived.”

CRAWLER_VIDEO_DOWNLOAD – “Whether to download videos from the page or not (using yt-dlp)”
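Putting those together, a .env for how I’d likely run it might look like this (true/false syntax assumed – check the docs for the accepted values):

  DATA_DIR=/data
  DISABLE_SIGNUPS=true
  CRAWLER_FULL_PAGE_ARCHIVE=true
  CRAWLER_FULL_PAGE_SCREENSHOT=false
  CRAWLER_VIDEO_DOWNLOAD=false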

Overall – I’m pretty impressed. I’m not sure I’m quite ready to dump Wallabag, but this might become a project I tackle during the holiday break. That stew recipe is pretty amazing, and absolutely worth trying.

Generating SSH key pairs

Despite some recent excitement, SSH continues to be both a utility and a protocol that I use heavily every day. I also have to shout out mosh, which is a must-have overlay – if you aren’t using it, stop reading this now and go get mosh.

Not often, but every once in a while, I find myself needing to generate a new key pair for use with SSH. GitHub has one of the best articles on doing this, but it’s not quite what I want. I find myself having to re-think the small differences I want each time – clearly it’s time to write up what I do, so I can just visit this post when I need to generate a key.
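Here’s the command – the comment and filename are placeholders to swap for your own:

  ssh-keygen -t ed25519 -C "my-laptop-2024" -f basename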

Yup, that’s it. In the directory where you run this, there will be two files generated: the private key is basename, and the public key is basename.pub. I’m also a fan of the .ssh/config file, which you may want to adopt; this makes it easy to have different keys for different systems.
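A hypothetical ~/.ssh/config, showing how the right key gets offered to the right host (hostnames and paths are examples):

  Host github.com
      IdentityFile ~/.ssh/github
      IdentitiesOnly yes

  Host homeserver
      HostName 192.168.1.50
      User me
      IdentityFile ~/.ssh/basename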

Breaking down the creation command: we are generating a key using the Ed25519 algorithm (-t ed25519), which most modern systems will support. Next up, we are adding a comment (-C), which I find useful to identify what the public key is for. Last is the filename (-f) that the output is written to.

You’ll see that comments often have no whitespace in them; if you want to be risk averse, avoid using spaces and use dashes or something similar.

Cloudflare Managed DNS

Consider this an update to a previous article where I talk about using rebel.ca to manage my DNS records. I still like them as a company, and the support is generally good, but I will no longer recommend them. Without getting into the details, I had a DNS management problem with them that was the last straw for me; it resulted in about 36 hours of downtime for this domain.

The good news is that today there are lots of free managed DNS providers. Really, this isn’t a huge technical challenge. You need two nameserver (NS) records in the delegation managed by your registrar. Ideally those nameservers are hosted on machines that are on different networks and in different locations for good redundancy. As for management of the records, offering a friendly web UI isn’t a huge problem in 2024. These servers will answer DNS requests from the many resolvers out there asking for a name-to-IP mapping. Yes, there are real costs to operating one of these – but for an individual personal domain, the number of queries and amount of data is pretty tiny.
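You can watch that delegation chain yourself with dig (example.com standing in for a real domain):

  # follow the delegation from the root servers down to the authoritative ones
  dig +trace example.com A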

I decided to go with Cloudflare; they offer a generous free plan and are pretty central to the operation of the internet as a whole. Hopefully I can trust them to manage my DNS records, but I do have some reluctance, because the internet being dominated by a few huge tech companies isn’t great. I believe the internet needs to be built on open standards, and we need lots of medium-sized companies providing services.

You can sign up for Cloudflare in minutes. It’s entirely self-service, and email confirmation gives you full access.

Adding one of my domains to be managed by Cloudflare is easy. I click on the ‘Website’ entry in the left navigation bar, then pick ‘+Add a site’ to enter the domain I want them to manage DNS for.

Now we pick the ‘Free’ plan and move to the next step.

Cloudflare does a pretty slick job of sniffing out your existing DNS records (assuming you have some) and populating its configuration. Review these and edit as needed, then continue.

As I mentioned above, I really don’t want Cloudflare messing with things much – so I disabled the “Proxy” option for all of my records and have them set up as DNS-only. If I ever have a problem, I can go in and use some of their free DDoS protection features – but let’s start with just the basics.

To make Cloudflare my DNS provider, I need to change the nameserver records with my domain registrar so they point at the Cloudflare nameservers. Cloudflare monitors this and will make my domain ‘active’ in the dashboard – but they’ve already created the DNS records, so things are good to go. It was a matter of minutes for my domain to become active once I’d updated the nameservers with my registrar.
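A quick way to confirm the switch has taken effect (again with example.com as a stand-in):

  # should return the two Cloudflare-assigned nameservers, e.g.:
  #   foo.ns.cloudflare.com.
  #   bar.ns.cloudflare.com.
  dig NS example.com +short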

I did this a few weeks ago. So far so good – but then again, DNS is pretty boring when it’s not breaking the internet.