Re-encoding/transcoding video for space savings

There are lots of good reasons to run a media server at home to host your own content. There are also plenty of legitimate ways to get content, but also many that are less so. I think it is important to pay creators for their efforts, but I also really like “owning” the media I enjoy and not paying repeatedly to re-watch things.

Sometimes you’ll end up with a version of the content (say, captured OTA) that is high quality but huge. An hour-long 1080p HDTV capture from my HD Homerun lands at around 7.7GB. A more efficient encoding can drop that significantly; of course, these encodings are lossy, meaning the quality can suffer. The trick is to figure out how to pick an encoding of the video that retains most of the quality, but is significantly smaller.

Different codecs have different approaches, all with trade-offs, but sometimes it’s really hard to see the differences. Many content creators face the same dilemma: they have high bit-rate master copies of the content and need to squeeze it down to fit on a DVD or Blu-ray.

Let’s dive into how we do this. For this example we’ll take a 45-minute TV show that starts at 1.9GB. It is encoded in AVC1/H.264 at 6000 kb/s. The image quality is very good, but the file is also fairly big, and maybe you don’t care that much about this particular TV series but you do want to retain a copy.

FFmpeg can fix this for us. We’ll be moving to H.265 as the encoding, burning a bunch of CPU cycles to get there, and ideally getting a video that is substantially the same but uses less storage.
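A representative command looks something like this (the file names are placeholders, and the crf value is the knob you may want to tune):

```sh
ffmpeg -i input.mkv -c:v libx265 -crf 23 -c:a copy output.mkv
```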

You can see we’ve told it to re-encode to the H.265 codec, but to simply copy the audio. The crf flag is important for maintaining quality: lower values give higher quality, higher values less. If you don’t supply the flag, a default of 28 is used, resulting in more space savings but, to my eyes, a softness in the output.

That is a significant space savings: the new file is only 27% of the original – more than 3 times smaller! Sure, it’s small, but is it any good?

Let’s look at some stats. While we have maintained the 1920×1080 resolution, we’ve dropped the bitrate to 1510 kb/s. This explains a lot of the saving: we’ve reduced the number of bits used to encode each frame of the image.
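If you want to check these numbers yourself, ffprobe will report the stream details; a minimal invocation (file name is a placeholder) looks like:

```sh
ffprobe -hide_banner output.mkv
```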

We will use ffmpeg to extract a few frames from the two videos for comparison. The use of the .png format will give us “lossless” images (but recall the video itself has been encoded in a lossy format).
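Something along these lines grabs a single frame from each file at the 4:10 mark (file names are placeholders):

```sh
ffmpeg -ss 00:04:10 -i original.mkv -frames:v 1 original_0410.png
ffmpeg -ss 00:04:10 -i smaller.mkv  -frames:v 1 smaller_0410.png
```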

Above are the frames from 4:10; the first/top is from the original, and the second/bottom is from the smaller version. You can click through to view the original 1920×1080 image. Visually they seem identical.

Using the idiff tool we can create a difference image and get data about how different these two images are from each other.
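idiff is part of OpenImageIO; a basic comparison that also writes out a difference image looks roughly like this (file names are placeholders), and it prints summary stats such as the mean/RMS error and how many pixels differ:

```sh
idiff -o diff_0410.png original_0410.png smaller_0410.png
```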

Ok, so they are different but we knew that. What does the difference image look like?

Yup, just a big black box with not a lot of obvious differences, which agrees with what we can see ourselves. Let’s tweak the idiff command to magnify the differences by a scale of 20.
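Roughly like this (flags from memory: -abs takes the absolute difference, and -scale multiplies it before the image is written):

```sh
idiff -abs -scale 20 -o diff_0410_x20.png original_0410.png smaller_0410.png
```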

Now we can see the small differences magnified, and it’s in areas you’d expect, around the edges of objects and in the small details.

Let’s look at another frame using the same approach.

Again, the original is the first, and the smaller is the second. Let’s do the diff, but only grab the scaled version.

Again, in the scaled version of the diff image we see that the very small differences are in the fine details. Try opening the full images, one in each tab, and toggling between them – can you see any changes? Yes, we know there are differences – the smaller file is 1/3 of the size – but is this enough to matter? Or even notice?

I’m going to continue to use MakeMKV to rip Blu-rays of movies I consider ‘reference material’ – like Dune – because I want all the bits. However, if it’s just some TV show I captured OTA and I’m going to watch once, or at most a handful of times, I’ll take the space savings.

Generative AI code assist

I considered using “Vibe Coding” as the title, but it’s just such a buzzword that I decided to go with a more factual title. I’m old school enough to want to distinguish between generative AI and the broader AGI (Artificial General Intelligence). I’ll also state that I consider myself a bit of an AI coding skeptic, but hopefully in a healthy way.

Just like any computer program: garbage in, garbage out. The modern buzzword for this is AI slop. I’ll avoid bashing the technology and focus on how you can use it constructively today, even with some of its limitations. I will also confess that at work I’ve got access to AI for code generation and it’s been interesting learning a new set of skills; this post will focus on what you can do for free on the web.

Perchance has a free, no-login-required code generator. I was in the process of setting up karakeep to replace wallabag. Both of these tools perform a similar function: effectively a web-based bookmark manager and offline capture of a web resource. A simple list of links + archive.org would solve the same problem, but this is a self-hosted solution and is pretty neat.

The task at hand is to figure out how to export all of my links from wallabag and then import them into karakeep; the more context I can preserve the better. Since there isn’t a common import/export format between the two tools, we’ll use the aforementioned code generator to create something to convert the file.

Luckily both support a JSON based format. I can export from wallabag into JSON. It looks something like this

And karakeep supports both import and export in a JSON format.

First we will create a few sample entries in karakeep and do an export to figure out what its format is. It turns out to look something like this:

If you look at the two formats, you can see some obvious mappings. This is good. I started with the perchance code generator and a very simple prompt:

This let me get my feet wet, and make sure I had my environment set up to run code, etc. I do have reasonable javascript experience, and that will help me use the code generator as a tool to move quickly. I tend to think of most of these AI solutions as doing pattern matching: they pick the ‘shape’ of your solution and fill in the blanks. This is also where they will make stuff up – if there is a blank and you haven’t given enough context, they’ll just guess at a likely answer.

Once I had the code generator creating code, and I was able to test it, things moved along fairly quickly. I iterated forwards, specifying the output format (JSON) etc., and I was both a bit amazed and pleased to see that it had decided to use the map capability in nodejs. This made the generated code quite simple.

My final prompt ended up being:

And this is the code it generated
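(The exact generated output isn’t reproduced here, but a sketch in the same spirit looks roughly like this; the wallabag and karakeep field names below are illustrative assumptions and may not match your exports exactly.)

```js
// convert.js – a sketch along the lines of the generated code; the exact
// wallabag/karakeep field names are illustrative assumptions.
const fs = require('fs');

// wallabag export: an array of entries
const entries = JSON.parse(fs.readFileSync('wallabag-export.json', 'utf8'));

// Reshape each wallabag entry into a karakeep-style bookmark using map()
const bookmarks = entries.map((entry) => ({
  title: entry.title,
  content: { type: 'link', url: entry.url },
  tags: entry.tags,
  // wallabag exports an ISO date string; convert it to a unix timestamp
  createdAt: Math.floor(new Date(entry.created_at).getTime() / 1000),
}));

fs.writeFileSync('karakeep-import.json', JSON.stringify({ bookmarks }, null, 2));
console.log(`Converted ${bookmarks.length} entries`);
```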

Notice anything curious here? It has, without me saying anything, decided to map title and tags into the output object. Very nice – I’m impressed.

Was there any real smartness here? Well, I would not have arrived at the idea of using map in the javascript code – it’s the right and elegant solution. A stronger javascript developer would have likely landed here, since it is a concise solution to the problem. Maybe I would have found a similar answer on stackoverflow, but the code generator made it easy for me.

The date manipulation is also very slick.

I would have eventually got there, but it just did it for me. A very nice time saver.

This generated javascript let me export / import 376 entries from my wallabag — preserving the original dates, tags and titles.

Sometimes working with AI for code is like having a book-smart, and very fast, new hire. No experience, lots of enthusiasm, and it cranks out code quickly. Does the code work? Not always, maybe not even often. I’ve also had to ‘reset’ the approach being used, when multiple iterations uncovered that it was basically impossible to solve the problem using the approach that I started with. Using test-driven development can help provide guide rails for ‘working / not working’; the more context you can provide the better. Learning how to guide the AI, and evaluate whether you’re getting what you intended to ask for, are the new skills I’ve been growing.

I feel I do need to throw down some caution flags around AI use. If you’re using something that is ‘free’, think again about why they are making it free to use. Open source doesn’t automatically mean it’s safe. Under the covers, this is all still built out of the same parts – so if you use it to open your network and data to the internet, you’ve got the same security problems.

Interesting times.

Forgejo – self hosted git server

Source control is really great, and very accessible today. Many developers start with just a pile of files and no source control, but you soon have a disaster and wish you had a better story than a bunch of copies of your files. In the early days of computing, source control was a lot more gnarly to get set up – now we have things like GitHub that make it easy.

Forgejo brings a GitHub-like experience to your homelab. I can’t talk about Forgejo without mentioning Gitea. I’ve been running Gitea for about the last 7 years; it’s fairly lightweight and gives you a nice web UI that feels a lot like GitHub, but it’s self-hosted. Unfortunately there was a break in the community back in 2022, when Gitea Ltd took control – this did not go well and Forgejo was born.

The funny thing is that Gitea originally came out of Gogs. The wikipedia page indicates Gogs was controlled by a single maintainer, and that Gitea, as a fork, opened up more community contribution. It’s unfortunate that open source projects often struggle either due to commercial influences or to the various parties involved in the project. Writing code can be hard, but working with other people is harder.

For a while Forgejo was fully Gitea compatible; this changed in early 2024 when they stopped maintaining compatibility. I only became aware of Forgejo in late 2024, but decided Gitea was still an acceptable solution for my needs. It was only recently that I was reminded about Forgejo and re-evaluated whether it was finally time to move (yes, it was).

Forgejo has a few installation options; I unsurprisingly selected the docker path. I opted to use the rootless docker image, which may limit some future expansion such as supporting actions, but I have basic source control needs and I can always change things later if I need to.

My docker-compose.yml uses the built-in sqlite3 DB but, as mentioned above, uses the rootless version, which is a bit more secure.
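A sketch of the compose file, based on the rootless image documentation (the image tag, host paths and host ports here are illustrative):

```yaml
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:11-rootless
    # uid:gid – on my NixOS host the users group is gid 100
    user: "1000:100"
    restart: unless-stopped
    volumes:
      - ./data:/var/lib/gitea
      - ./config:/etc/gitea
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      # Gitea already owns 3000 on this host, so map the web UI to 3001
      - "3001:3000"
      # ssh for git operations (the rootless image listens on 2222)
      - "2222:2222"
```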

As I’m based on NixOS, my gid is 100, not the typical 1000. I also had to modify my port mapping to avoid a conflict with Gitea, which is already using port 3000.

Now, the next thing I need (ok, want) to do is configure my nginx as a reverse proxy so I can give my new source control system a hostname instead of having to deal with port numbers. I actually run two nginx containers – one based on swag for the internet-visible web, and another nginx for internal systems. With a more complex configuration I could use just one, but having a hard separation gives me peace of mind that I haven’t accidentally exposed an internal system to the internet.

I configured a hostname (forge.lan) in my openwrt router, which I use for local DNS. My local nginx is running with a unique IP address thanks to macvlan magic. If I map forge.lan to the IP of my nginx (192.168.1.66), then the nginx configuration is fairly simple; I treat it like a site, creating a file config/nginx/site-confs/forge.conf that looks like:
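A minimal version of that file, assuming the container’s web UI is published on host port 3001 (as in the compose sketch above):

```nginx
server {
    listen 80;
    server_name forge.lan;

    location / {
        client_max_body_size 512M;
        proxy_pass http://192.168.1.79:3001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```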

Most of this is directly from the forgejo doc on reverse proxies. When my nginx gets traffic on port 80 for a server named forge.lan, it will proxy the connection to my main server (192.168.1.79) running the forgejo container.

With this setup, we can now start the docker container
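From the directory holding the docker-compose.yml:

```sh
docker compose up -d
```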

And visit http://forge.lan to be greeted by the bootstrap setup screen. At this point we can mostly just accept all the defaults, because it should self-detect everything correctly.

When we interact with this new self hosted git server, at least some of the time we’ll be on the command line. This means we’ll be wanting to use an ssh connection so we can do things like this:
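For example, cloning a repository over ssh (the user name here is a placeholder):

```sh
git clone git@forge.lan:myuser/nix-configs.git
```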

There is a problem here. If you recall, the webserver (192.168.1.66) is not on the same IP as the host (192.168.1.79) running my forgejo container. Since I want the hostname forge.lan to map to the webserver IP, I’ve introduced a challenge for myself.

When I hit this problem with Gitea, my solution was simply to switch to using my swag-based public-facing webserver (which runs on my main host IP) and use a deny list to prevent anyone from getting to gitea unless they were on my local network. This works, but it meant I had some worry that one day I’d mess that up and expose my self hosted git server to the internet. It turns out there is a better way: nginx knows how to proxy ssh connections.

This stackexchange post pointed me in the right direction, but it’s simply a matter of adding a stream configuration to your main nginx.conf file.
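A minimal sketch of that stream block, assuming the forgejo container’s ssh port is published on 2222 of the main server (as in the compose sketch above):

```nginx
stream {
    upstream forgejo-ssh {
        server 192.168.1.79:2222;
    }

    server {
        # nginx owns 192.168.1.66, so it can listen on the standard ssh port
        listen 22;
        proxy_pass forgejo-ssh;
    }
}
```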

After restarting nginx, I can now perform ssh connections to forgejo. This feels pretty slick.

I then proceeded to clone my git repos from my gitea server to my new forgejo server. This is a bit repetitive, and to avoid too many mistakes I cooked up a script based on this pattern. Oh yeah, and I did need to enable push-to-create on forgejo to make this work.
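The script follows roughly this pattern (the forgejo hostname and user name are placeholders; adjust to taste):

```sh
#!/bin/sh
# Usage: ./migrate.sh https://gitea.lan/myuser/some-project.git
SRC_URL="$1"

# The project name is the last path component of the source URL
PROJECT=$(basename "$SRC_URL" .git)

# --mirror grabs everything: all branches, tags and other refs
git clone --mirror "$SRC_URL" "$PROJECT.git"
cd "$PROJECT.git"

# Push the mirror to the new forgejo server (push-to-create must be enabled)
git push --mirror "git@forge.lan:myuser/$PROJECT.git"

cd ..
rm -rf "$PROJECT.git"
```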

This script takes a single parameter: the URL of the repository on the source (Gitea) server. It then strips off the project name, which is also the directory name of the project. We are using the --mirror option to tell git we want everything, not just the main branch.

Using this I was able to quickly move all 29 repositories I had. My full commit history came along just fine. I did lose the fun commit graph, but apparently if you aren’t afraid to do some light DB hacking you can move it over from Gitea, as the formats are basically the same. The action table is the one you want to migrate. I’m ok with my commit graph being reset.

You also don’t get anything migrated outside of the git repos; this means issues, for example, will be lost. For me this isn’t a big deal – I had 3 issues, all created 4 years ago. If you want a more complete migration, you might investigate this set of scripts.

The last thing for me is to work my way through any place I’ve got any of those repositories checked out and change the origin URL. For example, consider my nix-configs project:
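Updating the remote is a one-liner per checkout (again, the user name is a placeholder):

```sh
cd nix-configs
git remote set-url origin git@forge.lan:myuser/nix-configs.git
git remote -v   # confirm the origin now points at forgejo
```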

At this point I’m fully migrated, and can shut down the old gitea server.

If you’re interested in a Forgejo setup that has actions configured and is set up for a bit more scale, check out this write-up.