A story of a Distributed Denial of Service

We had a DDoS hit our DNS servers a few weeks ago, so I thought I would write up what happened for anyone interested.

First, a bit of background: Why do we (Fedora Infrastructure) run DNS servers? Well, we run them to provide users resolution of our domains. It's worth noting that we don't provide recursive servers that will answer queries for any domain, just authoritative servers for the domains we manage. Doing this allows us to quickly update things (which we depend on to take proxy servers in and out of rotation) as well as make sure we have DNSSEC working and other configuration in place. If we were setting this up these days, we might very well go with a trusted 3rd party provider, but we predate those really existing and for the most part it's worked fine for us. We have a number of DNS servers, 2 of them in our main IAD2 datacenter and the rest spread out to various other places we have presence.

So, what's a DDoS (Distributed Denial of Service)? Basically it's flooding something with requests, not from one point/ip/subnet, but from a distributed network of machines. You can't simply block requests from an ip/subnet, because the requests are coming from everywhere!

The DDoS that hit us started pretty sharply. We started getting alerts from our monitoring, and then... couldn't reach any of our machines in our main datacenter. Luckily we have close contact with the networking folks there and they were able to see that we had hit a connection limit in our firewall there (10 million connections). They were all connections going to our DNS servers, so we had the networking folks clear the connections and block DNS from our main datacenter until we got a handle on things. That got most everything back working, but DNS was still getting flooded, leaving some users unable to resolve our domains.

I shifted to working with the remaining DNS servers outside our main datacenter. Looking at the requests that were coming in, it seemed that many or most were from legit looking nameservers or ips. So, this attack was in fact an indirect one: the attackers were querying recursive DNS servers, which in turn queried us for domains we controlled. Blocking would not do anything but block legit users mixed in with the recursive servers. So, I took several tacks: First, I increased limits we had in place to allow those servers to process a lot more connections at once (from 1000 or so to 10000). Secondly, I set up some limits per zone. All the queries were for our old fedorahosted.org domain, which had a wildcard (ie, anything.fedorahosted.org). Limiting per zone (which bind can now do) helped other more important domains get replied to while the queries against fedorahosted.org were limited. Finally I removed the wildcard on fedorahosted.org (we haven't used that domain in many years, so nothing should be using it now).
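
To illustrate why limiting per zone protects the other domains, here is a minimal token bucket sketch in Python. It is not how bind implements its limits, and the class name, rates and zone budget are all made up for illustration. The point is only that each zone draws from its own budget, so a flood against one zone can't starve the rest.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneLimiter:
    """One token bucket per zone (hypothetical illustration, not bind's code)."""
    rates: dict                      # zone name -> allowed queries per second
    default_rate: float = 1000.0     # assumed budget for zones not listed
    _tokens: dict = field(default_factory=dict)
    _last: dict = field(default_factory=dict)

    def allow(self, zone: str, now: float) -> bool:
        rate = self.rates.get(zone, self.default_rate)
        last = self._last.get(zone, now)
        tokens = self._tokens.get(zone, rate)
        # Refill in proportion to elapsed time, capped at one second's worth.
        tokens = min(rate, tokens + (now - last) * rate)
        self._last[zone] = now
        if tokens >= 1.0:
            self._tokens[zone] = tokens - 1.0
            return True
        self._tokens[zone] = tokens
        return False


limiter = ZoneLimiter(rates={"fedorahosted.org": 2.0})
flood = [limiter.allow("fedorahosted.org", now=0.0) for _ in range(3)]
other = limiter.allow("fedoraproject.org", now=0.0)
print(flood, other)  # [True, True, False] True
```

The flooded zone runs out of tokens after two queries in the same instant, while a query for a different zone is still answered.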

After doing those things (or perhaps coincidence?) the attack stopped, just as our servers were fully able to handle the queries coming in. So, hopefully we are better set if this happens again. If it becomes too common we may have to look at moving DNS off to a 3rd party that specializes in handling this kind of thing.

Finally, several people asked me what motive the people doing this might have. I have no idea, we are a humble Linux distribution here. I don't think we will ever know more.

That new car smell

After about 9 months on a waiting list, we finally got our new car last Tuesday. It's a RAV4 Prime XSE with premium package (ie, the highest end one they make).

A short thing at the top here for those interested in Open Source/Linux. The manual for the car points to a site that shows 53 components as having source available in one place via LG and another site for the informational display. It's got all the usual things you might expect: kernel patches, busybox, glibc, systemd, util-linux, etc. It's of course unclear how it all builds or works, but that's typical. At least there's a ton of open source here (like there is everywhere).

Now, read on for long boring car stuff. :)

So why did I decide to go for this make/model? Well, it's pretty ideal for us here. We live about 15mi from the grocery store/shops, so when we go into town for shopping or appointments it's about a 30-40mi round trip (depending on which side of town the thing we are going to is on). The RAV4 Prime has an advertised 42 mile all electric range, so 95% of the time we should be able to be all EV and burn no gas at all. On those occasions when we go further (trip to the coast and back: ~90mi, trip to the portland airport: ~230mi, longer road trips, etc) we can just run in hybrid mode and get great mileage and not have to worry about recharging or range. The Prime has a 14.5gal tank, although toyota is apparently really concerned with anyone running out of gas, so when it says "0 miles" and the fuel gauge reads empty, there's still about 3 gal in the tank (~100+ miles). That gives it a range of around ~500 miles plus ~100mi reserve. I also really really liked the engineering involved in this car. They really did a very clever job using motor/generators and planetary gears. It's nice to know I never need to worry about: belts (there are none), starters (there isn't one, it uses a motor/generator and the battery to start the engine), alternators (there's a dc/dc converter, no alternator), or driveshaft (the back 2 wheels have their own motor/generator). The climate system is all heat pump based, moving hot/cold from where it is to where it's desired.

Some first impressions: Loving it.

  • Yesterday went to the store and back in 100% EV mode, round trip 26.4 miles, and used 36% of the battery.
  • The magnetic grey metallic color just looks super sharp.
  • The "wave your foot under the back and it opens/closes the back hatch" is awesome.
  • It's going to take some getting used to just opening the door and having it know you have the key and unlock for you.
  • Thursday went to the portland airport and back in HV (Hybrid vehicle) mode. 232 miles. It claims I got 51mpg, which I suppose is possible. There was a lot of stop and go traffic when the engine just turned off and I think the fastest I got going was 65mph. Average speed was like 38mph. Still have about 2/3rds of a tank of gas.
  • Radar cruise control with lane assist is pretty cool, but will take some getting used to also. The lane assist seems to want you right in the middle of the lane, but I found that my habit was to be more toward the outside of the road (to stay further from crazy drivers). The radar cruise control wanted to brake later than I would have, ie, I saw brake lights ahead and would have started slowing, but it waited until the car in front was slowing a lot before it started braking.
  • The 2023 infotainment thing (they redid it completely for 2023) is pretty great. Voice commands seem to work as well as those things always do, navigation was solid, music via my phone bluetooth was great, phone calls and texts worked great. One interesting thing I found out is that my phone won't do android auto. This is because I'm running GrapheneOS, and android auto requires google play to have a lot of privs that GrapheneOS doesn't let it have in the sandbox it runs it in. As far as I can tell right now though, I don't care. The regular phone via bluetooth integration seems to do everything I normally would want. Perhaps I'm missing some great android auto thing I am unaware of though.
  • Ventilated seats are awesome. Really makes things cooler and more pleasant.
  • I had to really punch it a few times on the airport trip in HV and... man does it go. (302hp)
  • The cameras for parking/backing up are outstanding. We put a doormat in the garage and when I pull the car in I can clearly see looking down where the mat is so I know where to position the car.
  • Blind spot monitoring is great. No more craning around to see if you can switch lanes. It just lights a light on the side mirrors when someone is there.
  • Digital rearview mirror took a little getting used to also (this is basically showing you behind the car with the camera, so you don't need to worry about people's heads in the way, etc). I found it hard to refocus on it at first, but after a while I really liked it. It's much clearer than the non digital one since it doesn't have to look through the 'privacy glass' (ie, tint) in the back and can gather light much better. I will probably use this all the time now.
  • The heads up display is awesome. Being able to see speed, charge / eco / power, lane assist/radar cruise, speed limit and songs when they change is great.
  • Sign detection is awesome. It basically looks for signs (speed limit, stop, yield, etc) and shows them on the panel/HUD. I can't count the number of times when driving in the past I said to myself "Hey, what was the speed limit here?" Now you know. It does put the sign in red/yellow when you speed, but it doesn't beep or care otherwise.
  • There's some gamification on 'eco driving' that I haven't really paid much attention to. Basically it gives you a score on starts / stops / etc from 1-100. My score was jumping around, but I was mostly ignoring it to play with the car, perhaps I'll look at it more in coming years.
  • The charging schedule stuff seems to work fine (if you fully plug it in, I didn't have it actually fully plugged in the other day and surprisingly it didn't charge until I did). So you can tell it you are departing at 2pm on friday and it will start charging so that it's done just before 2pm friday. You can even have it run the a/c before you leave to get the interior all ready.
  • The android app is... functional, but has a lot of quirks. But you can lock/unlock, start the car (and turn on a/c), see range for hv/ev, plugged in or not, charging or not, how long until charged. It also sends you notices anytime the doors lock/unlock, charging starts, etc. It's really chatty with notices and I don't see a way to adjust them, but that's mostly fine. It also has a little 'Drive pulse and trips' thing where it tells you about your trips (about a day after you made them). You can opt out of this, but I am leaving it on for now. This all requires a subscription of course, but new cars come with 1 year included. I guess I'll see in a year if I want to renew.
  • There is of course an OBD-II port and... wow... there's a lot of data there. I will have to play around with it and see if any of it is useful to collect.
  • Panoramic moon roof is awesome. Lets in tons of light and looks really nice.

A few annoyances / things to sort out:

  • When you stop and open the driver's door it gives you a cool little report about your trip: time, mpg or kWh/mile, miles, etc. If you have a charging schedule set it shows you a thing about that and asks if you want to charge now, then shows the trip report and then... it disappears. I see no way to get it back, so it's annoying that you can't look back at your last trip and see the stats.
  • Similarly for mpg/kWh per mile you can get a graph that shows your 'best mpg' and bars for your trips, but mpg is pretty worthless when you are in ev mode (it shows 99.9mpg), and there's no way to see more details about any of the trips (at least from the infotainment/display).
  • The "guess-o-meter", as other folks call the mileage estimates, is pretty off for ev mode especially. I went 26.4 miles and used 36% of battery, but at 100% full the guess-o-meter said I had 33 miles of range. I understand that it has to learn/calibrate. Will be interesting to see how well it adjusts in the next month or so.
  • As folks noted on the net, the pedestrian warning noise at low speeds in ev mode is pretty weird. Many people call it 'space ship like'. Moving forward it's just weird, but not too annoying, but backing up... wow. It's really quite loud. Will have to see if I get used to it, or want to do something about it.
  • Amusingly, insurance was about the same as the 2013 gas RAV4 we traded in on this one. I guess because of all the crazy safety features.
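
For anyone curious how far off the guess-o-meter was, here is a quick back of the envelope sketch in Python. It assumes the battery percentage shown is linear in usable energy and that the whole trip is representative, neither of which the car promises, so treat the result as a rough projection only. The function name is made up for this example.

```python
def projected_full_range(miles_driven: float, battery_fraction_used: float) -> float:
    """Linearly extrapolate a trip to a full battery (a rough assumption)."""
    if not 0 < battery_fraction_used <= 1:
        raise ValueError("battery_fraction_used must be in (0, 1]")
    return miles_driven / battery_fraction_used


# Numbers from the store trip: 26.4 miles on 36% of the battery.
estimate = projected_full_range(26.4, 0.36)
print(f"{estimate:.1f} miles")  # 73.3 miles
```

That projection is well above both the 33 miles the display guessed and the 42 miles advertised, which is the sense in which the estimate looks "pretty off" for now.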

Not yet tested / need to look into it more:

  • Will be interesting to drive at night. I think the HUD will look really sharp and nice at night. Will have to see how the headlights do.
  • Also living in the pacific northwest, will be interesting to see how the wipers do, and how it handles in rain. The OEM tires are... not great, but I am planning on just using them until they wear out.
  • Longer trips in HV at higher speeds will be interesting. I am sure mileage will be less (hopefully around the 36-40mpg that's advertised). Still will likely double the mpg we would get in our other car.
  • Didn't play around with satellite radio at all. I'm happy with my music/podcasts.
  • Didn't play around yet with the wifi hotspot. This service only has a 1 month trial included, so I'll have to decide if it's worth getting. My phone does just fine as a hotspot most of the time, but it might be nice to have just because it's a different provider (AT&T) and will have different coverage.
  • I avoided the dealer extended warranty and ceramic coating sell. I might get these things down the road, but I think I can do cheaper/better than the dealer on them.
  • Speaking of, considering a ceramic wax, but there's a ton of choices out there.
  • It came with roof cross bars, which I had the dealer take off. We don't usually carry things on the roof, and we can always put them back on if we do.
  • I'd like to try a trip sitting in the back. It looks really quite nice, but would be interesting to see how comfortable it is. So far I have always been in the front.

Anyhow, that's enough rambling. Overall I am very happy and enjoying playing around with it.

Tracking down a bug

I thought I'd share a fun hour or so of my afternoon and how I approached tracking down a bug in the Fedora Xfce spin.

The bug is https://bugzilla.redhat.com/show_bug.cgi?id=2170682 "Updates cause Xfce to lose all menu icons". A curious bug: on updating to the latest version, icons in menus were no longer enabled (although the user didn't disable them and they are enabled by default).

Since the report happily had the update where it started, I went and looked first upstream (which turned out to be a mistake, but you never know). I looked through the recent commits for xfce4-settings to see if anything stood out as possibly being related to this. Nothing really did. All the changes seemed minor and unrelated.

Next I grabbed the latest Fedora 38 prebeta/branched Xfce live media and fired it up in a vm. Sure enough, the icons were not there at all. I checked the live user's xsettings.xml vs the default one and they both seemed fine from a quick glance. Then, I saved off the user xsettings.xml file and reset the menu icons option. Sure enough, that one option was added, but the file was otherwise completely the same.

Next I checked the user journal. Bingo. There is a message saying the system xml file is broken/not parsable. That's the problem! So where does the offending line come from? It's from the patch we have to set the default theme... the xml wasn't well-formed and thus xfsettingsd just gave up reading it and never got to the default for menu icons. :)
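
A check like the following would have flagged the problem before the patch shipped. This is just a sketch using Python's standard xml parser; the sample snippets are made up for illustration and are not the real Fedora xsettings.xml, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_problem(text: str) -> Optional[str]:
    """Return the parser's complaint if text is not well-formed XML, else None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical samples, loosely shaped like an xfconf channel file.
good = '<channel name="xsettings"><property name="ThemeName" type="string" value="Adwaita"/></channel>'
broken = '<channel name="xsettings"><property name="ThemeName" type="string" value="Adwaita"></channel>'

print(xml_problem(good))    # None
print(xml_problem(broken))  # a "mismatched tag" style message with line/column
```

Running something like this over a patched config file in the package's test step is cheap, and it catches exactly the class of mistake described above.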

Knowing when something broke really helps you narrow down what to look at for the problem. In some ideal world I would have checked the user journal for errors first, but I did get there eventually.

So if you hit a bug: try and figure out when it started, use that to bound your search for _what changed_ and that will a lot of the time find the culprit.

error: rpmdbNextIterator: skipping in Fedora 38+

I've seen this question enough times recently to decide to just write up a blog post on it and point people here. :) 

If you are running Fedora 38 (currently rawhide, but will be branching off soon) and you are getting errors like this from dnf and/or rpm:

Running transaction check
error: rpmdbNextIterator: skipping h#   47749 
Header V4 DSA/SHA1 Signature, key ID 7fac5991: BAD
Header SHA256 digest: OK
Header SHA1 digest: OK
error: rpmdbNextIterator: skipping h#   47749 
Header V4 DSA/SHA1 Signature, key ID 7fac5991: BAD
Header SHA256 digest: OK
Header SHA1 digest: OK

This post is for you. What is going on here? And how can you get around the problem?

Well, what happened is that rpm used to have internal code to handle signatures on packages. This code was really something rpm didn't want to have to maintain, so they switched recently to using sequoia, which is a newer OpenPGP implementation written in rust. With this switch, sequoia actually honors the site wide fedora crypto policy (which the internal old rpm version did not).

Back in Fedora 33, the distro wide crypto policy was updated to disallow SHA-1 as a signature algorithm. See https://fedoraproject.org/wiki/Changes/StrongCryptoSettings2 for more information.

You might wonder: if Fedora changed the distro wide crypto policy to disallow SHA-1 in signatures, why didn't they update things so nothing used SHA-1 now? Well, the short answer is: they did. No rpms that Fedora produces now use SHA-1 signatures. However, some third-party rpms do. One of the big ones that many people are hitting is google's "chrome" web browser. There are probably others.

Now that we know _why_ this is happening, what can you do? Well, first you need to do something so dnf and rpm allow you to remove/update/change your package set. You can do this by (temporarily!) allowing SHA-1 signatures with:

sudo update-crypto-policies --set DEFAULT:SHA1

rpm and dnf should now work for you again. You might remove packages that have SHA-1 signatures and switch to alternatives. Or wait until google updates their signing key (the current one is from 2007). Once you have done what you need to do you can set the policy back to its sane default:

sudo update-crypto-policies --set DEFAULT
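
If you want to know which signing keys are the problem before you start removing things, you can pull the key IDs out of the error text. Here is a small Python sketch that does that; it only assumes the message format shown earlier in this post (nothing more about rpm's output is guaranteed), and the function name is made up.

```python
import re

# Matches lines such as: Header V4 DSA/SHA1 Signature, key ID 7fac5991: BAD
BAD_SIGNATURE = re.compile(
    r"Header (?P<kind>V\d+ \S+) Signature, key ID (?P<key>[0-9a-fA-F]+): BAD"
)


def bad_signature_keys(output: str) -> dict:
    """Map each key ID reported as BAD to the signature type shown next to it."""
    keys = {}
    for line in output.splitlines():
        match = BAD_SIGNATURE.search(line)
        if match:
            keys[match.group("key")] = match.group("kind")
    return keys


sample = """\
Running transaction check
error: rpmdbNextIterator: skipping h#   47749
Header V4 DSA/SHA1 Signature, key ID 7fac5991: BAD
Header SHA256 digest: OK
Header SHA1 digest: OK
"""
print(bad_signature_keys(sample))  # {'7fac5991': 'V4 DSA/SHA1'}
```

You would feed it the saved output of the failing dnf or rpm command. Each key ID it reports then points at a vendor whose packages are still signed with SHA-1.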

Ryzen Powerhouse PC detailed Linux review

As I noted a few weeks ago, I was pondering replacing my old trusty 10 year old server with something new and quieter. I finally settled on a https://silentpc.com/powerhouse-pcs/ryzen-powerhouse build from silentpc.com. Here's probably more than anyone wants to hear about it.

First, I could easily have built up a new machine myself, researching and ordering parts, carefully fitting them together and cabling things, but... I have done that a number of times in my life, and now mostly find it tedious. If that's the sort of thing you enjoy, then do go for it, but personally, I'm happy to pay overhead to a place like silentpc.com to let them source all the parts, get them all working nicely and cabled well.

I went with this machine for a number of reasons:

  • I wanted something with enough drive bays for 3.5" spinning rust so I could just move the 4 from my old machine. That would simplify migration and let me have some 'slow' storage for things like backups and such. This machine has 10 bays total. 1 used by a dvd drive, the others open.
  • I wanted to go with an AMD cpu. I've heard a lot about the current gen being fast and power efficient.
  • I wanted to start with 2 nvme drives, then add later a 4 nvme expander pci card. This MB should support that.
  • I wanted a quieter machine.

Ordering was easy. The sales folks were quick to reply to my questions about nvme expanders and video cards. I definitely did not want an nvidia card, so they arranged to add a low end, but Linux supported, radeon card. They estimated a 6-10 day build process before shipping, but it was less than that, and around christmas time too! They really did a banner job on packaging. The machine came in a big box with a small box of cables/etc and another box for the server itself. They carefully filled the interior of the machine with bubble wrap so nothing moved around in shipping. The case is super heavy and has noise dampening material on it. There are 3 fans, but they are all large and slowly moving (500-600rpm). Even after adding 5 3.5" spinning drives, the fan in my computer closet is much louder than the server. The CPU heat sink is massive, but that's great, because it means it doesn't need to run the fan at vast speeds. The power supply (an HX850) actually powers its fan off when things are mostly idle!

Let's take a look at some stats from the old server:

  • Dell C1100 "cloud" node
  • 72GB memory (1066MHz/DDR3)
  • 4 3TB 3.5" 7200 rpm SATA drives (3Gb/sec) in a raid5
  • 2 Xeon L5639 @ 2.13GHz (so 12 cores, 24 threads)

Against the new server:

  • Powerhouse Ryzen PC
  • 64GB memory (3200MHz/DDR4)
  • 2 Samsung SSD 990 PRO 2TB (currently in raid1)
  • 5 3TB 3.5" 7200rpm SATA drives (6Gb/sec. I had one spare, might as well add it in) in a raid5 (converting to raid6 as I write this)
  • AMD Ryzen 9 5950X 16-Core at up to 4.9GHz

So, really the only thing the old server has any more of is memory, and it's old/slow memory anyhow. :)

I did need to order a few spare parts, but those were pretty easy. Got another network card (I use my main server as the firewall/gateway to my network) and some sata cables.

Like any good sysadmin, I got the new box all installed (Fedora 37 from netinstall) and configured and all ready, then set a downtime to migrate things. Everything worked fine out of the box on Linux, no problems with network cards, wireless, or anything. I did have to enable SVM in the bios to get kvm support, but that was easy enough to do. As part of this I cleaned up my computer closet, re-ran some cables and got some old UPS batteries replaced. The downtime then was just:

  • Take down old server
  • Move network cables to new server for wireless/dsl
  • Move SATA drives from old to new server and connect
  • Add spare SATA drive and connect
  • Bring up and fix any issues

Things seemingly went fine, but then I hit something disturbing: My borg backup jobs that ran after the switch (to backup my laptop and main vm) failed. A borg check showed things were not happy. I wondered if it could be a borg bug (say that 5 times real fast), so I tried using restic. restic did manage to back up fine, but a check on its repo afterward showed weird corruption. Looking at raid I saw that there were a number of mismatches on the SATA raid, so I wondered if those drives just were giving up after so many years. However, I soon noticed that there were mismatches on the raid1 on the 2 brand new nvmes! My next guess was a bad memory stick, but why was it only affecting the backups/raid? After a bit more digging and looking, I finally realized what it might be, and indeed a reboot brought things back to normal. The problem was that I had been playing with powertop and had done a 'powertop --auto-tune', and some power saving setting on some chipset device was causing all the issues. After a clean reboot and some repairs/checks, the mismatch count on both raid arrays was 0.
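
If you want to keep an eye on this yourself, here is a small Python sketch that reads the mismatch counters. It assumes Linux md software raid, which exposes a mismatch_cnt file per array under /sys/block/mdN/md/ (the post doesn't spell out the raid stack, so that part is an assumption), and the function name is made up. The counter only reflects the most recent check or repair pass.

```python
from pathlib import Path


def mismatch_counts(sys_block: Path = Path("/sys/block")) -> dict:
    """Return {array name: mismatch count} for every md array found."""
    counts = {}
    for counter in sorted(sys_block.glob("md*/md/mismatch_cnt")):
        array = counter.parent.parent.name      # e.g. "md0"
        counts[array] = int(counter.read_text().strip())
    return counts


if __name__ == "__main__":
    for array, count in mismatch_counts().items():
        status = "ok" if count == 0 else "MISMATCHES"
        print(f"{array}: {count} ({status})")
```

On a machine with no md arrays it simply prints nothing. Hooking the result into whatever monitoring you already have would catch this kind of silent corruption sooner than a failed backup does.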

Overall I am quite happy with the box. It's super fast and super quiet. I should be able to do some expanding on it over time (bump memory to 128GB, add in the nvme card, there's even space on the back of the MB for 2 2.5" ssd sata drives if desired). There's space for 3 more 3.5" drives also (although I think after I add the nvme card I might drop all the spinning drives back to the old server). I would recommend silentpc.com and would buy from them again.

On generators and physics

On Tuesday here we had a massive wind storm come through. 25-35mph winds with gusts up to 75mph or more, along with heavy rains and near freezing temps. It's a pretty crazy thing to experience in the forest. 250-300ft douglas fir trees swaying 30 or 40 degrees in the winds, branches coming off and flying by. A large gust came through and broke 3 of our large trees off and pulled another one out by its roots. Very sad to see these big trees go.

But to bring things back to a tech focus, also on Tuesday due to the storm, we lost power for about 16 hours. We had gotten a generator to handle just this situation and had run it happily for a spring outage earlier in the year. We were running on 20lb propane tanks (since they are so easy to get). In the spring we were getting about 5-6 hours of use out of a tank.

So, I fired up the generator and we got about 3 hours out of a partial tank, at which point I switched to our last full tank and headed into town to stock up. I exchanged the empty tank and got 3 more. That last full one was out when I got home, so I swapped another full one in and... it stopped after an hour. At first we were wondering if we had some heavy load on the generator we didn't realize, but on checking that wasn't the case. We looked for leaks in propane, but wait... the tank is all covered in frost and all frozen up. Here we realized the real problem.

Propane in tanks is in a liquid state, under pressure. When you open the valve, propane comes out and changes into a gas to power your whatever. This state change requires energy. Usually this is just pulled from the tank shell and everything is fine. However, when it's cold, the tank gets colder and colder and can't supply that energy anymore and with no gas flowing anymore, the regulator cuts off the generator.
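
To put a rough number on that energy, here is a small Python sketch. The latent heat value (about 184 BTU per pound near propane's boiling point) is an approximate textbook figure I'm plugging in, not something measured here, and it assumes the full 20lb of propane gets used at a steady rate.

```python
# Approximate latent heat of vaporization for propane (assumed textbook value).
BTU_PER_LB = 184.0
WATTS_PER_BTU_PER_HOUR = 0.29307


def vaporization_watts(propane_lb: float, hours: float) -> float:
    """Average heat the tank must absorb to boil off propane at a steady rate."""
    pounds_per_hour = propane_lb / hours
    return pounds_per_hour * BTU_PER_LB * WATTS_PER_BTU_PER_HOUR


# Spring usage: a 20lb tank lasting roughly 5 to 6 hours.
for hours in (5.0, 5.5, 6.0):
    print(f"{hours} h per tank -> about {vaporization_watts(20, hours):.0f} W")
```

So the tank has to soak up something on the order of 200 watts from its surroundings the whole time the generator runs. In mild weather the air can supply that; near freezing, with ice forming on the shell, it apparently can't keep up.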

So, at first we tried putting the tank in the garage instead of outside, wrapping it in blankets, etc. But that didn't really help too much. So, then we got a tub and filled it with water and put the tank in there. That helped quite a lot, we managed to get about 4 hours out of a tank then. At the end the tank had a 1 inch or so 'wrapper' of ice, but it did get the tank lasting longer.

So, what's the solution here? Well, I think getting just one larger tank (say a 100lb one) would help a lot as it will have a ton more surface area. Not as easy to move around, but oh well. Also, apparently they make powered heating blankets for the smaller tanks. I'll probably pick up one of those in the short term. So, lesson learned: temperature has a lot to do with how well your propane generator will work.

Some thoughts on a new home server

I've been spending some of my time off in the last few days pondering replacing my old reliable home server with something new and shiny. I figured this might be a good time to write up some thoughts around this.

So, the first question that I am sure leaps to mind for people is: Home server? why on earth do you want one of those! Move it to "The Cloud"! Of course doing so would indeed have a number of advantages:

  • Better bandwidth
  • No need to hassle with hardware, someone else would do that
  • Less noise and power usage at home
  • Depending on how deep in the clouds you go: less hassle running services

On the other hand it has real disadvantages to me:

  • No "real life" home setup to test/try/figure things out.
  • Never really 100% sure who has/owns/can do things with your data.
  • No ability to mess with hardware, which can be kind of fun.
  • I have a small list of close friends who I provide services to. It's fun to keep in touch with them that way and have something I can do for them.
  • Less ability to mess with running a bunch of services, which can be kind of fun.
  • Paying a cloud provider recurring fees, when I could just buy something once and not pay for it over and over again. Buying once seems like it could be a win, depending on the fees.

Someday I might give up and move things, but it's not come fully to that yet. Email has been slowly getting more difficult to run on a non gigantic domain, but I've managed to overcome so far, so I will keep going until that becomes completely untenable. I really like having my data close by and knowing that I can go fix some problem when it happens. It's also been a while, but I want to look at spinning up a home OpenShift instance so I can dig into it more and learn more about the low level parts of it. Might need to use OKD or k3s or something instead of OpenShift, but it should let me find out more about how k8s works.

All that background said, let's look at my current home server. It's a Dell PowerEdge C1100/CS24-TY. I got it from https://deepdiscountservers.com long long long ago, along with another identical server. You can really get pretty great stuff there. It's basically all the old compute that cloud companies have aged out. So, they are usually older, but have tons of memory and disk and cpu. These ones I got have 72GB memory, 24 cpu threads, and 4 3.5" hot swap drive bays in the front. The second one I got I used for a long time as a test machine, but it has a slightly too old cpu to do power management, so it's really really loud. The main server does do some power management, but it's pretty loud too. In my current house I have a closet for computer stuff, but even with the door closed I am near enough to it that I can hear the server running. Of course I can also usually hear the fridge in the kitchen running too. The drives I currently have are 3TB 7200 rpm Hitachis, which have also been quite reliable. The server has a pci card in it for some more network ports. It serves as my main firewall / virthost / storage server.

So, why replace it? Well, it was made in the fall of 2012. Yes, that's 10 years old now. That's ages in computer hardware. It's slow. The cpu is pretty slow and the storage is super slow. It's running the 7200rpm spinning disks on a 3Gb/sec sata bus (they can do 6Gb/sec). Taking backups or moving a bunch of things or running a postgresql vacuum just takes ages. It's also loud. Not earthshatteringly so, and like I mentioned our fridge is also kinda loud, but there's a lot of times when the fridge compressor is off and I can hear the server distinctly. Finally, it's fun to look at things and then install and assemble them. Computer geeks gotta geek. Also, this is perhaps a chance for me to play with some things I haven't yet, like perhaps moving over to an AMD cpu instead of intel, or raid on nvme, etc.

So, my first thought was to just get another rackmount from deepdiscountservers, which would work fine, but it would be intel based, basically just a newer version of what I have now with more memory and cpus. The cpus would be intel and while newer servers are likely to do throttling better, I don't think the noise would be all that much lower. rack mount servers are just not designed to be quiet.

Next, I poked around on the net and ran across silentpc.com, which has some interesting computers on offer. I focused in on the "Powerhouse Ryzen PC" box. It's a tower case, which is not ideal, but I'm sure I can fit it in somewhere. It's a Ryzen cpu, a power supply that can power completely off if things are idle, super quiet fans, etc. It's got enough room so I can move my existing 4 drives over to it (and add in a 5th that I was keeping as a spare). Only 2 NVME slots available, but... that takes me into an aside I had:

Most motherboards these days I have seen have just a few NVME slots on them. However, they make PCIe cards that have NVME slots (one, two, or four). The four slot cards are interesting. You need to have a motherboard that supports "PCIe bifurcation" on the slot you are putting it in. If you don't, you can only see one drive and that's it. If you do, the motherboard takes the x16 slot and carves it into 4 x4 slots and you see all the drives.
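
The rule of thumb above can be written down as a tiny Python sketch. It assumes a simple passive card (no PCIe switch chip of its own) where each NVME drive wants 4 lanes; the function and parameter names are made up for illustration.

```python
def visible_drives(
    drives_on_card: int,
    slot_lanes: int = 16,
    bifurcation: bool = False,
    lanes_per_drive: int = 4,
) -> int:
    """How many drives on a passive multi-NVME card the system can see."""
    if drives_on_card <= 0:
        return 0
    if not bifurcation:
        # The slot is treated as one link, so only the first drive shows up.
        return 1
    # With bifurcation the slot is carved into slot_lanes // lanes_per_drive links.
    return min(drives_on_card, slot_lanes // lanes_per_drive)


print(visible_drives(4))                                  # 1
print(visible_drives(4, bifurcation=True))                # 4
print(visible_drives(4, slot_lanes=8, bifurcation=True))  # 2
```

The last case is why it matters which physical slot the card goes in: an x8 slot can only be carved into two x4 links even when bifurcation is supported.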

From what I have been able to gather the Powerhouse Ryzen PC has a motherboard that has 1 pcie slot that can do bifurcation (but I asked them in email to make sure). If so, then I can get it with 2 NVME's and raid1 them for now, move the 3.5" drives over with most of my data, and then down the road I can get a PCIe 4 NVME card and stick 4 NVME's in there and raid 6 them with the 2 on the MB and then perhaps retire the spinning drives. :) Sadly, their web interface seems to only offer nvidia cards (which I really don't want), but I asked them in email and they can indeed do other cards. So, waiting to hear back, but I think this might work out nicely for a new box. If it does, I'm also thinking about moving the existing 1U boxes out to the garage and see if I can set them up with a wake on lan or the like so I can use them if I need to test something.

Looking forward to tinkering with it (or looking more if this one doesn't pan out).

New phone, who dis?

In October, I went on 2 trips and after that several things became very clear: First, covid is still out there and you can still get it (as I did), and second, my trusty phone that I had been using for the last 7 years finally needed to be replaced.

The old phone is a oneplus 3t and it's been a great phone. I have been running /e/ on it (with no google) and it's been fine. Unfortunately, the years have now taken their toll on the battery. When traveling I had to put it on super battery save just to have any battery left after a day away from a charger. On super battery save it's slow slow slow, bordering on unusable. The sad thing is, if I could replace the battery I could probably be fine with this phone for more years. It's true it doesn't take super great pictures, only has 64GB space, and has a bunch of scratches now. But alas, the battery is non replaceable, so I decided I needed to do something.

Of course the first thing I looked at was just using one of the 2 pinephones or 1 pinephone pro I already have. Sadly, they just aren't good enough for a daily driver for me. They are slow, the battery life is also bad (on the pro at least), and... the biggest problem: The camera is just not good at all. I end up taking a lot of pictures and I really need them to be viewable. Finally, despite lots of work, there's still a bunch of things non upstreamed, so I would have to run a custom kernel and a bunch of other non upstreamed parts.

Next, there are now some phones you can buy with /e/ pre-installed. There's the Murena One and the Teracube 2e. The Murena One has more storage space, but otherwise they both have less ram than my oneplus 3t. :) This was a tempting option, but /e/ hadn't been impressing me of late. Updates were few and far between, and I had to do a bunch of tinkering to get on the latest stream (based on android 11).

The fairphone3/4 are interesting, but don't seem to be available in the US at all. Of course I am sure I could get one, but likely it wouldn't work with US telcos.

I wanted to get something that would have a nice long life of updates also.

Finally I ended up deciding on just getting an unlocked pixel 7 and going with grapheneos on it. I deliberately chose the 7 over the 7 pro because it's slightly smaller (but still big) and from all the reviews I read it had better battery life. The pixel 7 gets 5 years of updates, which is about as good as you can do these days.

Why grapheneos? Well, I did not want to go back to being tied to google if I could help it and /e/ doesn't really support any very modern phones. The free phone os'es (postmarketos, mobian, etc) don't support very new hardware either. However, I figure after a few years there's a good chance something like a pixel will be supported by more of those and I can choose to jump to one of them if I want. grapheneos is basically AOSP (the "upstream" android) with a bunch of security enhancements added to it. The install process was pretty painless, but I did hit one problem where I tried to install with the web installer using firefox, then switched to chrome and got an error. I finally figured out that I forgot to close the firefox tab and it was keeping the webusb device locked so the other browser couldn't install. ;)

Install went fine after that and I installed f-droid and got my applications and data moved over to the new phone. The biggest headache was of course signal. I had been using it for sms/mms after my last /e/ re-install, but they are dropping support for that now, so I had to export my sms's back out of it to get them moved over. The export function doesn't tell you that it doesn't handle duplicates and that you should wipe your sms db first, so I ended up with 2x my sms messages. Finally got that transferred over and signal deleted. signal could have been a great app, but they seem determined to make decisions that will drive them into irrelevance now. It's sad.

Anyhow, I hope the pixel 7 will last me a few more years until I can get a modern phone and put Fedora on it. :)

Onlykey DUO

Last year, I backed the onlykey DUO on kickstarter: https://www.kickstarter.com/projects/timsteiner/onlykey-duo-portable-protection-for-all-of-your-devices It seemed like an interesting device and I like that it's fully opensource, unlike modern yubikeys.

The device finally arrived last month, and I've had a chance to play around with it some. Sadly, I don't think it's going to replace my yubikey anytime soon.

On the good side: The device itself is nicely constructed. It has a multicolored led on it that indicates which profile is in use (There are 4: green, blue, yellow, purple). It's got 2 buttons on the end, so you can press one or the other or both at the same time and long or short presses for different slots. That means each profile has 6 'slots' for a total of 24 in all 4 profiles. You can set a pin to lock the key which you have to enter before using it, along with a 'self destruct' pin that will wipe all configuration when entered.
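If the slot math isn't obvious, here's a quick sketch that just enumerates the combinations. The button and press labels are my own made-up names for illustration, not anything from the onlykey software.

```python
from itertools import product

# Labels below are illustrative only, not names used by the onlykey tools.
PROFILES = ["green", "blue", "yellow", "purple"]
BUTTONS = ["button 1", "button 2", "both buttons"]
PRESSES = ["short", "long"]

# Every (button combination, press length) pair is one slot in a profile.
slots_per_profile = list(product(BUTTONS, PRESSES))

# Every profile gets its own copy of those slots.
all_slots = list(product(PROFILES, BUTTONS, PRESSES))

print(len(slots_per_profile), "slots per profile")
print(len(all_slots), "slots in total")
```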

On the bad side however, there's a fair bit. The software to manage the onlykey is provided as either an Ubuntu .deb or a snap. I tried to get the snap working with no luck at all, and ended up unpacking the deb to get things working. I looked into making a Fedora package but it's a node app and has a pile of deps.

Next, I tried to enroll an OTP for our Fedora account system, but found that the TOTP secret wouldn't work. Further investigation showed that the onlykey DUO only supports SHA1 for TOTP secrets and our account system uses SHA512. ;( There's an old closed ticket about this on the onlykey firmware repo: https://github.com/trustcrypto/OnlyKey-Firmware/issues/101
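To show why the hash matters, here's a minimal TOTP sketch following RFC 6238, using only the Python standard library. This is not the onlykey firmware or our account system code, and the secret and timestamp are made up. The point is just that the hash is part of the calculation, so a device that can only do SHA1 will compute different codes than a server expecting SHA512, even when both hold the same secret.

```python
import hmac
import struct


def totp(secret: bytes, unix_time: int, digest: str = "sha1",
         digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP: an HMAC over the count of `step`-second intervals."""
    counter = struct.pack(">Q", unix_time // step)
    mac = hmac.new(secret, counter, digest).digest()
    # Dynamic truncation from RFC 4226: the low nibble of the last byte
    # picks which 4 bytes of the MAC become the code.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Made-up secret and time, purely for illustration.
secret = b"12345678901234567890"
now = 59

sha1_code = totp(secret, now, "sha1")      # what a SHA1-only device computes
sha512_code = totp(secret, now, "sha512")  # what a SHA512 server expects
print("device:", sha1_code, "server:", sha512_code)
```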

There's also no way to generate a ssh private key on the device (like you can using the opensc support on a yubikey). You can generate ecdsa sk openssh keys, which is great, but not too useful to me yet as RHEL7 and RHEL8 don't support them.

So, at this point I would not recommend these devices if you need to interact with the Fedora account system or want to use the device with a Fedora linux install.

Another tale of a rawhide compose bug

Astute observers will have noticed recently that Fedora rawhide composes stopped after 2022-04-13 and didn't resume until 2022-04-19. A particularly odd bug was to blame. This is the story of that bug and my investigations of it.

A bit of background first on how nightly images are made currently. There's a cron job that calls a script (called nightly.sh) in the pungi-fedora pagure.io git repo. It does a number of small associated things (like sending message bus messages of status, sending report emails, copying results around, etc) but the big thing it does is to call pungi. pungi in turn looks at its config (in that same pungi-fedora repo) and does all the heavy lifting of the compose, calling mostly out to koji to do things, but also doing some things locally. For images, pungi calls koji with some parameters (use this repo and name for kickstart file, use these compose repos for packages, etc). koji then calls different tools depending on what the image is. For livemedia the tool is livemedia-creator (in the lorax package), for qcow2/raw images it calls ImageFactory/oz to do the actual image build on a builder.

The compose on the 14th failed making an aarch64 Cloud-Base image (https://koji.fedoraproject.org/koji/taskinfo?taskID=85654099) with an odd traceback at the very very end of the build:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 17480: invalid start byte

So, my first go-to on these sorts of things is to look at what packages changed between the successful compose on the 13th and the failed one on the 14th. That day of changes was pretty small, being a Thursday in a week when everyone was focused on Fedora 36 final testing. The only thing that seemed like it could affect booting at all was a grub2 update. That update only added a 'read' module though, which normally I wouldn't think could cause any issue, but perhaps it was an odd aarch64 toolchain issue? So, I tried some composes with grub2 untagged. No luck, still failed the same way.

Next I tried to get some more information about exactly what was not utf-8 that it was complaining about. I did this by doing some koji scratch image builds. But of course my scratch builds never seemed to fail, they all worked fine.

Of course then it was Easter weekend, I got busy around the house and didn't really dig back into it until the following Monday. This time I did some more scratch builds and finally managed to get it to fail correctly, with some added debug in oz on the builder to print out the data it was trying to convert to utf-8. At the very end of making the image (while it's shutting down), it was getting a bunch of:

cpio: Malformed number <bunch of junk encoded weird stuff>

This of course was not something python wanted to convert to utf-8. It looked like a totally different encoding in there, so oz marked it a failure.
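The failure is easy to reproduce in isolation. This is just a sketch with made-up bytes standing in for the cpio junk, not the actual console data or oz's code: a strict utf-8 decode raises as soon as it hits a byte like 0x85, while passing errors="replace" swaps the bad bytes for a replacement character and carries on.

```python
# Made-up console output: readable text followed by binary garbage.
raw = b"cpio: Malformed number \x85\xff\x00junk\n"

try:
    raw.decode("utf-8")  # strict decoding is the default
    strict_ok = True
except UnicodeDecodeError as err:
    strict_ok = False
    print(f"strict decode failed: {err}")

# A lenient decode keeps the readable part and marks the bad bytes.
text = raw.decode("utf-8", errors="replace")
print(repr(text))
```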

A sidebar here on how oz works. oz takes its inputs and tells libvirt to fire off a vm for it, running the installer with the kickstart provided, and with the serial console pointed to a socket. It then starts polling (by default every 10 seconds). If there's disk or network activity on the vm, it gets data from the console and waits. If it hits the total timeout it kills the vm and returns a failure. If it doesn't see any disk or network activity for a while (300 seconds by default) it assumes the install has stalled/failed, kills the vm and returns a failure. If the vm has shut down, it assumes the install was successful and returns. The output of the console is returned back to koji so you can see the log file in the koji task. However, as Adam Williamson pointed out: That last 10 second window never actually gets collected by oz for converting/adding to logs. If the install shuts down right after oz polls, then the next time it polls the vm is already shut down and it just returns.
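Here's a very simplified model of that polling loop, to make the dropped last window easier to see. This is not oz's actual code: the function name, the list-of-intervals input and the status strings are all made up for illustration, and only the 10 and 300 second defaults come from the description above.

```python
def watch_install(intervals, poll_interval=10, inactivity_timeout=300,
                  total_timeout=3600):
    """Simplified model of an oz-style poll loop (hypothetical, not oz code).

    `intervals` is a list of (had_activity, console_output) tuples, one per
    poll interval. The vm is assumed to shut down during the last interval.
    """
    log = []
    idle = 0
    for n, (had_activity, console_output) in enumerate(intervals, start=1):
        if n == len(intervals):
            # The vm is already off at this poll: report success without
            # ever reading what it printed during its final interval.
            return "success", log
        log.append(console_output)
        idle = 0 if had_activity else idle + poll_interval
        if n * poll_interval >= total_timeout:
            return "failed: total timeout", log
        if idle >= inactivity_timeout:
            return "failed: stalled", log
    return "failed: nothing to watch", log


# Shutdown spew lands in the final window, so it is silently dropped.
quick = [(True, "installing\n"), (True, "done\n"), (True, "cpio: junk\n")]
# Shutdown takes one poll longer, so the spew gets collected.
slow = [(True, "installing\n"), (True, "cpio: junk\n"), (True, "")]

print(watch_install(quick))
print(watch_install(slow))
```

In this model, the same console output is either dropped or collected depending only on which side of a poll the shutdown lands.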

So, what is this cpio output? Well, when you shut down a Fedora system, dracut calls "/usr/lib/dracut/dracut-initramfs-restore". This script runs at the very end of the shutdown cycle and is supposed to copy out the initramfs to memory and then pivot your install to using that as / so it can unmount your real root directory cleanly. The script uses a wrapper called 'skipcpio' that's shipped with dracut. This wrapper skips past the first part of the initramfs, as this is just a small archive with the microcode that gets loaded first thing when you boot. The next archive is the actual initramfs that does the heavy lifting. The script has no idea how you have compressed your initramfs (if you have), so it just blindly tries: not compressed? gzipped? bzipped? xz? lz4? lzop? zstd? The first try (not compressed) is the thing that results in the error spew. cpio bravely tries to unpack the zstd compressed initramfs and... has problems.
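The "just try everything in order" approach looks roughly like the sketch below. The real script is shell and pipes into cpio, so this Python version is only an analogy with made-up helper names. It also uses xz as a stand-in for zstd, since older Python standard libraries don't include zstd, and it politely checks the cpio magic where the real cpio just spews errors.

```python
import bz2
import gzip
import lzma
import zlib

CPIO_MAGIC = b"070701"  # header magic of a "newc" format cpio archive


def as_plain_cpio(blob: bytes) -> bytes:
    """The 'not compressed?' attempt: accept only data that looks like cpio."""
    if not blob.startswith(CPIO_MAGIC):
        raise ValueError("not an uncompressed cpio archive")
    return blob


def unpack_initramfs(blob: bytes, attempts):
    """Try each (name, decoder) in order; return the winner and the failures."""
    failures = []
    for name, decode in attempts:
        try:
            return name, decode(blob), failures
        except (OSError, EOFError, ValueError, zlib.error, lzma.LZMAError):
            failures.append(name)  # every entry here is wasted work and noise
    raise ValueError("unknown compression")


OLD_ORDER = [("cat", as_plain_cpio), ("gzip", gzip.decompress),
             ("bzip2", bz2.decompress), ("xz", lzma.decompress)]
# Same decoders, but with the format actually in use tried first.
NEW_ORDER = [OLD_ORDER[3]] + OLD_ORDER[:3]

blob = lzma.compress(CPIO_MAGIC + b"fake initramfs contents")
print(unpack_initramfs(blob, OLD_ORDER)[2])  # attempts that failed first
print(unpack_initramfs(blob, NEW_ORDER)[2])
```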

So, the flow is:

  • nightly.sh calls pungi.
  • pungi calls koji to make Cloud-Base image.
  • Oz calls libvirt to do install, starts polling.
  • Install boots, installs and finishes.
  • On shutdown dracut-initramfs-restore runs and because it tries uncompressed first, spews cpio errors
  • Oz polls again after the errors start, but before the vm is completely off.
  • Oz tries to convert the log to utf-8 and blows up.

Upstream dracut, as it turns out, had just merged a PR to change the order it tries these things in (moving zstd to be the first one), because the old order made shutdowns several seconds slower. So, I just added that to the rawhide dracut package and everything started composing again.

So, what caused this to start happening now? Well, think back to oz not getting the last bit of console output. My theory is that this has been happening for a long time, it's just that since it was at the very end of the output, it always got dropped on the floor and oz never saw it. For some reason we are now shutting down faster or slower, and the output is getting caught in a last poll that does have data.

Always fun digging into these sorts of things. :)