Archiving Floppies

I’m slowly getting around to clearing out some of the old office stuff at home, and yes, I appear to still have some 3.5″ floppies. I did in fact have a 3.5″ floppy drive, but it was in an old husk of a former PC. My desktop machine has a modern power supply and didn’t even have the right power connector to hook up the drive (an easy fix with an adapter) – at least the motherboard still had the right connector for the data cable.

I then had to do the right BIOS dance to actually enable the device; once this was done I could see it under Linux as /dev/fd0. Unfortunately, the handful of disks I tried to mount gave errors – it seemed either this drive was faulty or all of my disks had expired. Now, these are floppies from the early 90s – which is, oh my, 35 years ago!

Time to bust out ddrescue and see if I can image any of these disks to pull data. Sadly, my initial attempts were not great – I wasn’t getting much data off of these at all. Maybe this is a huge waste of time. I found the useful-seeming ddrescueview, which gives me a way to look at the status of a rescue attempt.

Let’s cover the basics. My initial attempts looked like this:
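A basic invocation; the image and map filenames here are placeholders of mine:

    sudo ddrescue /dev/fd0 floppy.img floppy.map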

This worked, but I got a lot of errors. Adding the -d flag seemed to help a lot, but later I found out that I needed more flags to get this right.

I found a useful wiki entry from the Archive Team specific to recovering floppies.

Here is the ddrescueview visualization of my initial attempt:

So, not great. Next up is the run where I added the -d flag:
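That is, the same basic command with direct access enabled (same placeholder filenames):

    sudo ddrescue -d /dev/fd0 floppy.img floppy.map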

Better. Of course, as I decided to make sure this was repeatable, I tried removing the -d flag and running it again to confirm it was really that bad. This time I got a completely clean read (fully green). There were 2 errors reported, but it retried them and all was good?

So I start trying various combinations to see if I’m getting repeatable results. Overall it’s random errors and no clean reads again.

Now, the clever thing that ddrescue does is maintain a map file. This captures what was done, and allows you to run another pass to try to have more luck. This is exactly what I need. Referencing the Archive Team advice, I landed on this as the right combination:
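Putting together the flags broken down below (same placeholder filenames as before):

    sudo ddrescue -d -b512 -r 3 --retrim /dev/fd0 floppy.img floppy.map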

Let’s break down the flags:

  • -d : direct disc access, bypassing the kernel cache
  • -b512 : sector size of 512 bytes, important for direct access
  • -r 3 : retry errors 3 times
  • --retrim : allows us to re-run, and re-try the failed blocks recorded in the map

Using this magic, I was able to run the command a second time and get a clean read! So you can either be lucky, or use the map file and try a few times with the right settings.

Now I can mount the image under Linux:
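A read-only loopback mount does the trick; /mnt/floppy is just an empty directory to use as a mount point:

    sudo mount -o loop,ro floppy.img /mnt/floppy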

This particular floppy apparently contained a few rescue tools (NDD.EXE ring any bells?). Well, glad I got those bits back – guess I’ll toss it on the pile and move on to the next one.

Now that I have things sorted out, I’m finding a couple that read clean, which is pretty cool given the files are from 1991. Amazing how little fits on these floppies, when it used to seem like so much.

I did manage to ‘crash’ the floppy drive with bad disks or something, because it would get into a state that rebooting the machine would not fix. Powering it off for a minute or two and a full cold boot seemed to get things back on track. When it was busted I’d get errors like:

I ran into more problems just like this, and I really don’t understand what was wrong or how to get the drive to behave again. Very frustrating. I just had to keep trying cold boots and different floppies. Looking at dmesg, I see:

Since I was hitting my head against the wall with the errors above, I picked up a used USB floppy drive locally. It was only $15, and it gave me a secondary device to try some of these floppies with.

The USB floppy appears on my system as the drive /dev/sdc, but I can just use that device in place of /dev/fd0 and the same commands work. Hopefully resetting its state will be easier, as I can just unplug the USB drive and try again. We’ll see if it gets into a similar busted state (which appears to be triggered by bad reads). So far it seems much more stable overall, and I’m working my way through my old floppies.

The USB floppy drive worked really well. It is starting to seem like that old 3.5″ floppy drive I installed in my machine was maybe not so stable. Some floppies that had many errors read just fine with the USB floppy drive.

To speed things up, I adopted a two-phased approach: an optimistic version which would fail out quickly, followed by the more aggressive 3-retry version above if I determined I wanted to get as much data as possible. This is the quick version:
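Something like the following, with no retries and -n to skip ddrescue’s slow scraping pass (treat the exact flag choice here as a sketch):

    sudo ddrescue -d -b512 -n /dev/fd0 floppy.img floppy.map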

As a bonus for anyone who’s hung on this far into the post, let me share some of the output you get from the ddrescue tool showing the progress it makes:

You can see above that it did in fact get to 100%, but slowly, and it required a second run to finish.

This was certainly a trip down memory lane, and I’m glad I persisted in trying to read the data. There were a few files I wanted to keep out of the pile of floppies, and now I’ve got them archived with my other files to keep.

Hoarder – a self-hosted link collection and web archive

I found out about Hoarder via the Self-Hosted podcast. While I don’t always agree with the opinions of the hosts, they’ve helped me discover useful things a few times. I’d certainly recommend checking the podcast out.

The Bookmark Everything App

Quickly save links, notes, and images and hoarder will automatically tag them for you using AI for faster retrieval. Built for the data hoarders out there!

The install is Docker friendly and based on compose. It’s a very simple 3 steps to get a test instance set up (sketched after the list):

  1. Download the compose yaml.
  2. Create a .env file with a few values.
  3. Then docker compose up.
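The whole flow is roughly this (the compose file path and the .env variable names are from my reading of the project docs, so verify them against the current README):

    # grab the shipped compose file
    wget https://raw.githubusercontent.com/hoarder-app/hoarder/main/docker/docker-compose.yml

    # minimal .env; the two secrets can be generated with e.g. openssl rand -base64 36
    cat > .env <<'EOF'
    HOARDER_VERSION=release
    NEXTAUTH_SECRET=change-me
    MEILI_MASTER_KEY=change-me
    NEXTAUTH_URL=http://localhost:3000
    EOF

    docker compose up -d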

Seems like it supports “sign up” – if you host this visible externally you may get some spammy sign-ups. This may be something you can disable (yes, you can, as I find out below).

After you have created a user, you are greeted with this blank canvas

I currently run Wallabag, which I landed on after trying a few other choices. It was the best choice for my needs at the time, but it’s also super basic. Wallabag has a mobile app which I find useful, as it makes it easy to share links I find on mobile to my Wallabag install.

Wallabag often struggles to capture a page, but it at least keeps the link. One example is this website, which has some sort of scraper blocker; all you get is a page indicating the site is protected by it.

Ok, so how does Hoarder do with a link like https://www.thekitchn.com/instant-pot-bo-kho-recipe-23242169?

For comparison, this is what Wallabag got:

The capture in Hoarder takes a bit of time – not long, but it renders a sort of blank-ish card immediately and then the image fills in.


Let’s take a closer look at the tile that it created for me

The top of the tile is a picture that links to the original URL (1); the title link (1) goes to the same destination.
The date (2) and the expansion arrows (2) both take you to a larger, locally hosted view.
The menu (3) holds the remaining options.

Let’s dive deeper into the expanded (locally hosted) view

The overall capture/rendering of the page from the local version is pretty good. Links in the text haven’t been re-written, but that’s both expected and generally useful.

This view also offers the option to view a screenshot, which is just what you’d expect.

Since I didn’t provide an OpenAI key, nor did I configure Ollama, the fancy “Summarize with AI” button just gives me an error.

Looking around, it seems this setup uses 3 unique containers:

  • ghcr.io/hoarder-app/hoarder:release
  • getmeili/meilisearch:v1.11.1
  • gcr.io/zenika-hub/alpine-chrome:123

But I’m not seeing any storage on the host. This is probably bad, because that means at least one of these containers is stateful (and looking at the compose file, there are two data volumes).

I have a preference for storing my data on the host filesystem as a volume mapping… maybe I’ll warm up to the whole Docker volume thing, but it always feels like a big hack. (Read on and you’ll find that there is a way to avoid the storage concerns I have here.)

The search appears limited to the title only (boo). Tags are supported in search too, but there’s no deep searching within the text of the articles.

Looking more at the docs, persistence is something you can configure:

DATA_DIR – “The path for the persistent data directory. This is where the db and the uploaded assets live.”
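For example, a compose override along these lines would keep the data on the host filesystem (the service name and container path are assumptions of mine; match them to the shipped compose file):

    # docker-compose.override.yml (sketch)
    services:
      web:
        environment:
          DATA_DIR: /data
        volumes:
          - ./hoarder-data:/data   # bind mount instead of a named volume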

And it does appear you can stop signups from happening:

DISABLE_SIGNUPS – “If enabled, no new signups will be allowed and the signup button will be disabled in the UI”

There are also interesting options for the crawler (disabled by default):

CRAWLER_FULL_PAGE_SCREENSHOT – “Whether to store a screenshot of the full page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, the screenshot will only include the visible part of the page”

CRAWLER_FULL_PAGE_ARCHIVE – “Whether to store a full local copy of the page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, only the readable text of the page is archived.”

CRAWLER_VIDEO_DOWNLOAD – “Whether to download videos from the page or not (using yt-dlp)”

Overall, I’m pretty impressed. I’m not sure I’m quite ready to dump Wallabag, but this might become a project I tackle during the holiday break. That stew recipe is pretty amazing, absolutely worth trying.

Generating SSH key pairs

Despite having had some excitement recently, SSH continues to be both a utility and a protocol that I use heavily every day. I also have to shout out mosh, which is a must-have overlay; if you aren’t using it, stop reading this now and go get mosh.

Not often, but every once in a while I find myself needing to generate a new key pair for use with SSH. GitHub has one of the best articles on doing this, but it’s not quite what I want. I find myself having to re-think the small differences I want each time; clearly it’s time to write up what I do so I can just visit this post when I need to generate a key.
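The comment and the output basename are the only bits you’d swap for your own:

    ssh-keygen -t ed25519 -C my-laptop-to-homeserver -f basename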

Yup, that’s it. In the directory where you run this, two files will be generated: the private key is basename, and the public key is basename.pub. I’m also a fan of the .ssh/config file, which you may want to adopt; it makes it easy to use different keys for different systems (a sketch follows below).
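A minimal entry might look like this; the host alias, hostname, and user are made up for illustration:

    Host homeserver
        HostName homeserver.example.com
        User me
        IdentityFile ~/.ssh/basename
        IdentitiesOnly yes

IdentitiesOnly keeps ssh from offering every loaded key before the one you actually want.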

Breaking down the creation command: we are generating a key using the Ed25519 algorithm, which most modern systems support. Next up, we are adding a comment, which I find useful to identify what the public key is for. Last is the base filename we want the output written to.

You’ll see that comments often have no whitespace in them; if you want to be risk averse, avoid using spaces and use dashes or something.