
Early Mid April infra bits 2025

[Image: Scrye into the crystal ball]

Another week has gone by, and here are a few more things I'd like to highlight from the last week.

Datacenter Move

I wrote up a draft blog post with updates for the community. Hopefully it will be up early next week, and I will also send a devel-announce list post and discussion thread.

We had a bit of a snafu around network cards: we missed getting 10G NICs for the new aarch64 boxes, so we are working to acquire those soon. The plan in the new datacenter is to have everything on dual 10G NICs connected to different switches, so the networking folks can update a switch without causing us any outages.

Some new Power10 machines have arrived. I'm hopeful we might be able to switch to them as part of the move. We will know more once we are able to get in and start configuring them.

Next week I am hoping to get out-of-band management access to our new hardware in the new datacenter. This should allow us to start configuring firmware and storage, and possibly do initial installs to start bootstrapping things up.

Exciting times. I hope we have enough time to get everything lined up before the June switcharoo date. :)

Fun with databases

We have been having a few applications crash/loop and others behave somewhat sluggishly of late, so I finally took a good look at our main postgres database server (hereafter called db01). It's always been somewhat busy, as it has a number of things using it, but once I looked at i/o: yikes. (htop's i/o tab or iotop are very handy for this sort of thing.) It showed that a mailman process was using vast amounts of i/o and basically keeping the machine pegged at 100% all the time.

A while back I had set db01 to log slow queries, and that log showed what it was doing: searching the mailman.bounceevents table for all entries where 'processed' was 'f'. That table is 50GB, holding bounce events going back at least 5 or 6 years. Searching around, I found a 7 year old bug filed by my co-worker Aurélien: https://gitlab.com/mailman/mailman/-/issues/343
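A minimal sketch of what that looks like, for anyone curious (the 1 second threshold and the exact query text here are illustrative, not the real values from db01)::

    -- Log any query slower than 1s (threshold here is an assumption)
    ALTER SYSTEM SET log_min_duration_statement = '1s';
    SELECT pg_reload_conf();

    -- The shape of the offending query: a sequential scan of the
    -- ~50GB bounceevents table hunting for unprocessed bounce events
    SELECT * FROM bounceevents WHERE processed = 'f';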

That was fixed! Bounces are processed. However, nothing ever cleans up this table, at least currently. So, I proposed we just truncate the table, but others made a good case that the less invasive change (we are in freeze, after all) would be to just add an index.

So, I did some testing in staging and then made the change in production. The queries went from ~300 seconds to pretty much 0. i/o was still high, but now in the 20-30% range most of the time.
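The exact index definition isn't in this post, so treat this as a sketch (the index name is made up)::

    -- Index the boolean column so the query above becomes an index
    -- lookup instead of a sequential scan over the whole 50GB table.
    -- CONCURRENTLY builds it without blocking writes, handy in freeze.
    CREATE INDEX CONCURRENTLY bounceevents_processed_idx
        ON bounceevents (processed);

A partial index (adding WHERE processed = 'f') would be even smaller, since nearly every row is already processed, but even the plain index gets rid of the full table scan.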

It's amazing what indexes will do.
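If you want to see it for yourself, EXPLAIN shows the plan change (output abbreviated and illustrative, not captured from db01)::

    EXPLAIN SELECT * FROM bounceevents WHERE processed = 'f';
    -- before: Seq Scan on bounceevents
    --           Filter: (processed = false)
    -- after:  Index Scan using bounceevents_processed_idx on bounceevents
    --           Index Cond: (processed = false)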

Fedora 42 go for next week!

Amazingly, we made a first RC for Fedora 42 and... it was GO! I think we have done this only once before in all of Fedora history; it's sure pretty rare. So, look for the new release out Tuesday.

I am a bit sad that there's a bug/issue around the Xfce spin and initial-setup not working. Xfce isn't a blocking deliverable, so we just have to work around it: https://bugzilla.redhat.com/show_bug.cgi?id=2358688 I am not sure what's going on with it, but you can probably avoid it by making sure to create a user / set up root in the installer.

I upgraded my machines here at home and... nothing at all broke. I didn't even have anything to look at.

comments? additions? reactions?

As always, comment on Mastodon: posts/2025/04/12/early-mid-april-infra-bits-2025.rst