
OpenWrt One - a short review

Recently the OpenWrt One was announced for sale. This is a wireless access point/router built by Banana Pi and designed by the OpenWrt project. Additionally, $10 from every device sold goes to the Software Freedom Conservancy to help fund OpenWrt efforts.

The device was available on AliExpress, which is a bit weird for us here in the West, but I had no trouble ordering it there and the cost was pretty reasonable. It arrived last week.

OpenWrt One box

The design is pretty nice. There's a NAND/NOR switch; for normal operation it stays in the NAND position. If something goes wrong, you can hold down the button on the front while powering on and you should get a rescue image. If somehow even that image doesn't work, you can flip the switch to NOR mode and it will boot a full recovery from a USB drive. So, pretty unbrickable.

Initial setup was easy. Just screw on the 3 antennas, connect ethernet and USB-C power, and everything came up fine. I was a bit confused about what password to use, but then I realized just hitting return would take me to the 'hey, please set a password' screen. A small note might be nice there.

Since I was already using OpenWrt on my existing Linksys E8450, it was pretty simple to configure the new access point in a similar manner. Upgrading was pretty easy once I realized that I needed to pick 24.10.0-rcN or snapshot in the firmware selector, as there are no stable images for the One yet.

I then spent a lot of time playing with the channel_analysis page. This page scans for other access points and shows you which channels are in heavy use or open. On 5 GHz there was basically nothing else, so no problems there. However, on 2.4 GHz there were an astonishing number of APs. I live pretty far out from town, but there are still a LOT of them. Of course some were coming from 'inside the house', like some Roku devices or the like. Finally I decided channel 9 was the best bet.

Switching things over was a bit of a dance. I connected to the OpenWrt wireless network, logged in and changed the wired network config, then powered off the old AP and swapped the network cable to the new one. Then I rejoined the wireless network and changed the name/password so all the existing devices would just keep working.

I do notice faster connection rates, on my main laptop at least. The access point is also really responsive, whether via web (LuCI) or ssh. I may look at adding some more duties to this device over time. It does have an NVMe slot, so I could do some caching or perhaps some other setup. I also want to play with the USB-C console port, and perhaps at some point upgrade my home switch so I can power it via PoE.

All in all, a pretty great device. It seems to currently be sold out, but if you are looking for a nice, unbrickable AP that is very open source, this might just be the ticket for you.

OpenWrt One up and routing away

Hello from Nikola

Hello again everyone.

After using WordPress for more than 20 years, I finally decided it was time to move off of it. I'm not really happy about the recent turmoil from the upstream WordPress folks, and I didn't see much value in staying over just moving to a static generator, as so many have before me.

I did some looking around and decided to just go with Nikola. It's written in Python and seems pretty well used. It also has a WordPress import plugin, which I hoped to use.

The first problem I ran into was that the 'nikola plugin' command didn't work. I couldn't see that I had done anything to break it, and some poking around showed that this was a bug in 8.3.0 (which is what the current Fedora rpm version is) that was fixed in 8.3.1 (released early this year). There is already a PR to update it:

https://src.fedoraproject.org/rpms/python-nikola/pull-request/6

So, I built the new version locally and the plugin command was back in business.

The wordpress_import plugin worked somewhat, but there were a few issues I hit there too. It tracebacked if I passed '--one-file' to use the new one-file format (instead of a separate content and metadata file). I looked at it a bit, but couldn't figure out where it was failing. I also had to tweak a bit of the WordPress export, mostly for mistakes that WordPress ignored, like posts with the same tag on them multiple times, etc.

I looked a bit at comments. I have 81 comments across all my posts over the last 21 years, but none in the last 5 years. There is a 'static_comments' plugin that lets you serve the old static comments, which looked promising, but it was not very clear to me how to insert it into the theme I picked ('hack'). The docs have jinja2 examples, and just an 'adjust accordingly for mako templates'. I didn't want to spend a bunch of time learning Mako templates, so for now I am just going to drop the comments. If I get time, or someone wants to help me get static_comments working, let me know.

Builds are quite fast and it's an easy rsync to my main server. Hopefully all this will make me blog a bit more now.
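
For the curious, deployment in Nikola is just a list of shell commands in conf.py; here's a minimal sketch of the kind of thing I mean (the host and destination path are placeholders, not my real setup):

```python
# In Nikola's conf.py: "nikola deploy" runs each command in the chosen
# preset, in order. The host and destination path are placeholders.
DEPLOY_COMMANDS = {
    'default': [
        "rsync -avz --delete output/ user@example.com:/var/www/blog/",
    ]
}
```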

This post will likely cause aggregators (like fedoraplanet.org) to see all my recent posts again. Sorry about that.

fun with laptops

So, rewind to earlier this year: There were 2 laptop announcements of interest to me.

First was the Snapdragon X ARM laptops that were going to come out. Qualcomm was touting that they would have great Linux support and that they were already working on merging things upstream. Nothing is ever that rosy, but I did pick up a Lenovo Yoga Slim 7x that I have been playing with. Look for a more detailed review and status on that one in a bit. The short summary is that it's a pretty cool laptop and mainstream Linux support is coming along, but it's not yet ready to be a daily laptop IMHO.

The second was Framework announcing that a new batch of laptops would be coming out with some nice upgrades, so I pre-ordered one of the Ryzen ones. But reader, you may ask: "Don't you already have a Framework Ryzen laptop? And aren't they supposed to be upgradable? So why would you order another one?" To which I answer: yes, and yes, and... because I wanted so many new things that it seemed easier to just order a new one and get a spare/second laptop out of it.

I have one of the very first generation Framework 13 laptops. It was originally ordered with an Intel 11th gen CPU/mainboard and shipped in July of 2021, almost 3.5 years ago. So what's in the newer/latest version that I wanted?

  • Better hinges (the old ones are kinda weak and you can cause the display to 'flop' if you carry it by that).
  • New top cover. The old one is the old multi-part design; the new ones have a one-piece aluminum cover.
  • New camera that's supposedly better.
  • New battery (ok, I replaced the battery in my old one a while back, but always nice to have a new battery)
  • Replacement input cover (the thing with the keyboard/touchpad). After hammering on mine for 3.5 years, the tab and/or alt keys stick, which makes moving between windows frustrating. Also, the new one has no Windows key, just a 'super' key.
  • Higher resolution / refresh rate display (120 Hz, 2880x1920, matte vs 60 Hz, 2256x1504, glossy). In particular the glossy one is very annoying in highly reflective areas.

So, I could have replaced all those things, but at that point it seemed like it would be easier to just move to a new chassis and have a spare.

Of course things didn't go as planned. The laptop arrived, I swapped my memory and NVMe drive over to it, and... it didn't boot. I spent a fair bit of time going back and forth with Framework support. They wanted videos/pictures of most everything and had me do a bunch of things to isolate the problem. They decided it was a bad monitor/display cable and input cover. So, they shipped those replacements to me (they had to replace the display because the cable is attached to it). Unfortunately, they shipped them USPS, so it took about 9 days, and because we don't get USPS delivery here I had to go rescue the package from the local post office before they sent it back. Today I swapped in the display and input cover and everything worked like a charm. A quick switch of memory and NVMe later, I am now booted on the new laptop.

Infrastructure happenings, second half of aug - first half of sept 2024

So, I was going to try and do these posts more regularly, but of course that's hard to do. After Flock there were a bunch of things I wanted to post about, then a bunch of fires, and so things got behind. Such is life, so here are a few things from the last month or so that I wanted to talk about in more detail. As always, I still post on Mastodon daily; I'm happy to answer questions or comments there as things happen, and to expand on things in posts like this.

Fedora 41 branched off rawhide! This, I think, went much more smoothly than the last cycle. I like to hope it's because we documented all the things that didn't go right last time and did them this time. There were still a few things to adjust and it wasn't perfect, but it was much better!

We upgraded our OpenShift clusters from 4.15 to 4.16. I continue to be very happy with how smooth OpenShift upgrades are. Not 100% seamless, but pretty good. This time we had some storage issues that kept the upgrade from finishing, but it wasn't too hard to work around. So much nicer than the old 3.x days.

We landed a bunch of koji/kiwi changes before Beta freeze. Kudos to Neal Gompa and Adam Williamson for working through all those. It was nice to mostly get everything lined up before the freeze so we didn't have to do a lot of churn during it. We got everything working in rawhide first, then merged the f41 changes.

We had a really annoying IPA outage. I was running our main playbook (which runs over everything) on a thursday night, just to make sure everything was in sync for the freeze, and... the playbook decided all our IPA servers were not configured right and tried to uninstall and resync them all. Luckily the server that was the CA master refused to uninstall, so we were still up on one server. From that we were able to reinstall/resync the other two and get things back up and working. I am still not sure why the playbook saw no directory server running on the servers (and thus thought they were unconfigured). We are going to adjust that playbook to definitely not try to do that, and instead move setting up a replica into a manual playbook only run by humans as needed.

Thanks to a bunch of work from Stephen Gallagher and Carl George, eln and epel10 are now doing composes just like we do for rawhide and branched. This should allow us to retire our old ODCS (on demand compose service) setup, as it's not really maintained upstream anymore and is running on EOL OS versions. It's great to get things all running the same way, but of course we will probably change everything next year or something.

We managed to sign off on Fedora 41 Beta being released next week. I was pretty amazed, as it didn't seem like we had enough time to really shake out all the bugs, but testing coverage ended up being pretty good. Looking forward to the Beta next week and the end of the Beta freeze.

Infra and Releng workshop at flock 2024

Last friday at flock, we had an Infrastructure and Release Engineering workshop/hackfest. It ran from 9am to 1pm, so 4 hours, and we used them all. We did take a couple of breaks, but overall we powered through the entire agenda.

Before the workshop we brainstormed a bunch of discussion items at: https://discussion.fedoraproject.org/t/planning-for-infra-and-releng-hackfest-at-flock-2024/110244 and created a HackMD document to record notes in: https://hackmd.io/HxpzTNpITfu0OYmOGRApiw

I'm going to list each topic here, with some notes about it and then any action items that came out of it.

  • "Standards for OpenShift app deployments" - We run, but don't develop a number of applications in our OpenShift cluster. Right now the deployment methods are all over the map. Some apps use a source2image setup with production and staging branches, others just pull an image from quay.io where it's unclear how that image is made or could be adjusted, still others build local images, still others do even more different things. This makes it hard for us to debug or know what base images are in use. Also, some playbooks automatically fire off builds or deployments and they shouldn't. We should split this out to manual playbooks if we need it, but normally OpenShift will just do whatever is needed.
    • ACTION: create comments in each app playbook that explain how it's deployed
    • ACTION: with OpenShift 4.16 we will need to move all our apps that still have deploymentconfig to use deployment.
    • ACTION: Look at deploying ACS (advanced cluster security) to gain more visibility when we have out of date or vulnerable images.
    • ACTION: create a "best practices" guide (next to our development guide) doc that explains the way we consider best to deploy apps in our clusters. All of humaton, zlopez, smiller, dkirwan, abompard, lachmanfrantisek, lsm5, mohanboddu expressed interest in helping on this.
  • "Infra SIG packages" - We have a packaging group called "infra-sig" that maintains a bunch of packages that we use (or used to use). The group doesn't have too many active members these days and we really need to look at what packages are in it and orphan ones we don't use/need/want anymore.
    • ACTION: Find someone(s) to propose packages to orphan / add
    • ACTION: Onboard them with Packit to help us reduce maintenance. We can get the Packit folks a list and they can mass onboard them for us.
    • ACTION: look at the list of folks in the SIG and remove those who are no longer around/interested.
  • "Discuss Releng packages"
    • ACTION: come up with list of releng packages that are owned directly by release engineering and add them to infra sig
  • "Discuss proxy network: move to nginx? change things? or keep?" - We had a bit of discussion about moving away from httpd to nginx or gunicorn. In the end we didn't really come to much consensus on this one, needs further discussion. We do have a lot of ansible playbooks that are apache dependent and things are broadly working ok with the setup we have. HTTP/3 support would be nice as would better perf, but not a requirement.
  • "Discuss making aws more ansiblized/managed, or not?" - We didn't really come to much conclusion on this one either. One problem is that our main amazon account is a subaccount of the amazon community account, so we can't divide it anymore and lots of groups use that, so we can't fully manage it very easily anyhow. This one also needs more thought I think.
  • "Discuss onboarding, what we can do to make it better" - we had a pretty nice discussion on this one, including some folks that are not involved right now with some great perspectives.
    • ACTION: kevin to post outline of docs changes and submit WIP PR for them for people to add to.
    • ACTION: after docs are in better shape, look at marketing to potential contributors
    • ACTION: after each release look at having a 'Hello' day where new folks can join and ask questions and learn about the setup.
  • OpenShift apps deployment info - Did a quick tutorial on how we deploy apps for all those present. Should be folded into the docs above.
  • "Look ahead: gitforge, bugzilla, matrix server" - This was just a discussion on all these things that are coming in the next year. It's going to be a ton of work.
  • "Retire wiki pages / migrate to docs" - We talked about where end user docs might live over contributor/member docs. We talked about all the wiki pages that we want to migrate _somewhere_
  • "Datagrepper access" - This was a discussion about the commops team wanting to do database queries on datagrepper for community metrics. It's logistically difficult to get access to the actual database from anywhere the tools they want to run are. So, after a bit of gathering requirements, we brainstormed a solution: Setup a database in AWS using RDS, load a recent dump from datagrepper to it and then setup some datanommer instances in communishift (or wherever) that listen to our message bus and just insert new messages as they come. This was it should be up to date, but cause no load for the main datagrepper instance (it would be completely seperate!). We now have tickets pending to do this work for them.
    • ACTION: infra folks to work tickets to get things setup alongside commops folks
    • ACTION: commops to install and use whatever frontends they want to query the RDS db.
  • ARA in infra - This would be nice reporting for us, although there was some discussion that if we get AWX set up it would have much of the same reporting in it. We left this, I think, as sort of an 'if someone has time and wants to look at setting it up, they can'.
  • AWX deployment - We talked about issues/roadblocks on AWX. It isn't really set up to handle the way our ansible repo is structured (with a public and a private repo). We should be able to move it forward to a proof of concept though, and can then decide how we want to redo our repos, or whether we want to at all. Reworking things to be more standard would also allow us to have example values for secrets, so people could test/deploy/use our playbooks more easily in CI or other places.
    • ACTION: kevin to check on status and see if we can stand up the POC
    • ACTION: once that's in place, discuss redoing things or other options.
  • "zabbix checkin/testing/planning" - We have a zabbix setup thats pretty far along, we want to move it forward so we can retire nagios. Talked about the current status and ideas on moving things forward.
    • ACTION: Set up a bot channel that sends Zabbix alerts so we can see what it's alerting on and adjust settings.
    • ACTION: adjust alerts based on the above, and on cases where Nagios alerts and Zabbix doesn't.
    • ACTION: see about moving to the next LTS version, which has some improvements.
  • We then moved on to looking at our GitHub repos in the fedora-infra group. We archived a bunch of old projects; a great way to end things!

I do wish we had had a way for remote folks to interact with the workshop. We tried a Google Meet, but the hotel network was not kind to us on friday. So, there are a lot of actions above and we need to find people to match to them! Let us know if you are interested in helping us out.

All in all a great workshop and we used all our time and had some great discussions!

Flock 2024!

Beware, this is going to be pretty long. I split these up by day in the past, but somehow this time I just kept adding to one post. We start two days before the conference and wrap up with some general thoughts.

Day -2 (monday, 2024-08-05): travel day. When I originally booked my travel I had a nice set of two flights in the afternoon/evening and all was fine, but then they canceled my first flight and rebooked me on a much earlier one. So, I got up around 4am, showered, grabbed coffee, and headed off to the airport. I had left a bunch of room in case traffic in Portland was bad, but of course it turned out to be fine and I had lots of time. Then to MDW for a 5 hour layover; I had a beer and a chicken sandwich and caught up on email a bit there. Then my second (and last) flight, from MDW to ROC. This was supposed to be just over an hour, but it turned out we had to wait about 45 minutes for a connecting flight to arrive, and then when we were almost ready to land there was a bunch of rain, so they had to circle around for another 30 minutes or so. Then a quick taxi to the hotel and I crashed hard.

Day -1 (tuesday, 2024-08-06): I had planned a day beforehand to recover from travel and try to get used to the time zone difference. I did manage to sleep in a bit and then met up with ab and ngompa for breakfast. We discussed all kinds of things and had a very nice time, I think. I then went over to the coffee shop off the lobby for a bit of hacking and met up with a few more Fedorans who were arriving. Then off to Dinosaur BBQ for lunch. It was quite good! After lunch I got together with some folks to talk about git forge requirements and added some to the investigation. Then off to the leadership dinner with a bunch of other folks. Sadly a number of folks arriving that day had travel problems (there were a bunch of really big rainstorms on the east coast of the US). Some of them had to take a train from NYC and would only arrive the afternoon of the next day. By the end the schedule had 20 revisions.

Day 1 - (wed - 2024-08-07): Flock begins! After a quick breakfast, off to...

  • the opening "state of fedora" talk from mattdm. A few charts and graphs, but some good things to think about too.
  • Next up was the FESCo roundtable. We started out with questions seeded from Aoife and then got a number of good ones from the audience as well. There was talk about recent decisions FESCo made, a look at all the possible upcoming changes, and the future as well. I think it was great and we got some good questions, though we kind of ran out of time at the end.
  • A quick break and then on to the Council town hall. Again some great discussions and questions.
  • I wanted to go to the Infrastructure projects talk after that, but I got sidetracked by the hallway track, talking to several folks I hadn't seen in person in a while.
  • A lunch break and then on to introducing Konflux. This was only a 25 minute talk, but I also hung around afterwards and asked some more questions. If everything pans out the way it's envisioned, I think Konflux could be really great for us. It would in theory allow us to replace koji, bodhi, compose hosts, signing hosts, autosign hosts, some CI infra, a bunch of scripting around uploading and syncing things, and likely more. It's still super early days of course, but I think it's got a lot of promise. There's a test instance set up now to allow maintainers to test builds and see how things look. I have a tab open to play with this after I get back.
  • Next, to continue the 'big changes' theme, I went to the Git Forge replacement talk. There are example services set up here too, to allow folks to look at Forgejo or GitLab. I'm hoping we can have a pretty good timeline to get the evaluations in, make a decision, and look at deployment.
  • Another quick coffee break and off to the Lean Coffee session. It was interesting. We broke into two groups and everyone around each table wrote a topic on a card. Then we each voted for our top two items and, starting with the one with the most votes, discussed each for 5 minutes; if the majority wanted to continue, we did another 5 minutes. We had a bunch of varied topics, including: how to recognize contributors more, how to consolidate docs or make contributing to them easier, and how to better handle SIGs and communication between them and the rest of the project. Some good thoughts.
  • After that was my talk: Matrix: the red pill and the blue pill. I was worried that I wouldn't be able to fill up the time, but I almost ran out of it. Hopefully folks now have a better idea of how Matrix is set up and the limitations and advantages it has. I will be uploading my slides next week for anyone who wants them. Basically the first part of the talk was the things you need to know as a user who is just trying to use Matrix, and the second half was about how things work and more 'geeky' details.
  • Another round of hallway track talking to lots of different folks about lots of different things.
  • The evening event was a board game/candy swap/karaoke night in the hotel. The candy swap was super fun; it gets bigger and bigger every year. Lots of candy/snacks from all over the world and lots of great stories about them from all the Fedorans there. I had even more good conversations about books, package signing, vegetables, and more. I called it a night after the karaoke.

Day 2 - (thursday - 2024-08-08): The next morning started out with 2 great talks:

  • "It's OK to not know things" was great advice in any field, but definitely in software/operating systems. I'd suggest you go watch the recording of this one as soon as it's available.
  • Next was "How (not) to get into tech" and was a lovely history of a great progression through various roles. I suspect many of us didn't get computer science degrees and just 'happened' into what we are doing today. Also there were tons of cute dog pictures.
  • Next was the Fedora mentored projects showcase. Some great work from lots of people. One great takeaway here is that when you mentor someone, and then they mentor a few people, soon you have helped an entire tree of people.
  • I stuck to the large room for the Lenovo updates. Super glad Lenovo is shipping Fedora on some of their machines, and using pretty much exactly what we ship. It's sad that it's not so easy to find the models you can get with Fedora, but at least they are there. Lots of new models and newly supported things coming up.
  • I wanted to go to the Framework talk, but I got caught up in the hallway track talking to people. I'll try and catch it once the videos are up. After that was lunch and more discussions and talking with folks.
  • I joined the RISC-V talk already in progress (another one to watch later) and chimed in with info about the new RISC-V koji hub we want to set up (the hardware is there, it just needs racking and setup).
  • Dan Walsh then did a great intro to bootc talk. The entire bootc setup is very interesting and it's going to make things so much nicer down the road once we bootc all the things. Looking forward to it.
  • There were several talks I wanted to go to then, but again I went to the hallway track. I also poked a bit at the mass resigning of rawhide with the new f42 key for next week.
  • The talk sessions ended with a Q&A from Mike McGrath. A number of questions around AI/ML things and discussion of open source and how things might look in the next few years. Surprisingly, not many questions about source code or rebuilds.
  • The day ended with a dinner at the Strong Museum of Play. We went there at the last Flock that was in Rochester and I remember it being fun. This time was no different; it was awesome to talk to yet more folks and then play some classic video games I remember from long ago. Gauntlet Legends was fun, and Rampage, Ghostbusters, and Centipede were all there. I used to be great at Centipede, but I was really bad now. I just need to get one at home.

Day 3 - (friday - 2024-08-09):

Friday morning was all about the Infrastructure and Release Engineering workshop/hackfest that I organized. We started pretty close to 9am and kept working away, with a few breaks, until 1pm when lunch was ready. We had gathered a list of topics we wanted to discuss beforehand and went through them one by one. We actually did manage to at least touch on all of them. Notes were collected in our HackMD doc; I'm planning on reading through them this week and filing tickets for things, as well as writing posts about the plans we made. I was very happy that there were a few folks there who aren't normally involved in Infra and Releng. They chimed in on various topics like GitLab migrations, OpenShift configurations, ARA setup, what still needs some of the old packages we wanted to get rid of, and more. It was great to get some outside perspectives on things. I'd like to thank everyone who came! Toward the end we managed to archive a bunch of old GitHub projects in our fedora-infra space. We also came up with a plan to get commops access to datagrepper data for analytics, and much more.

In the afternoon I met up with some folks and we managed to figure out the AWS permissions issue that was blocking us from replacing fedimg. Hurray. There was also a lot of discussion around sigul lockup debugging and the secure boot chain. I wanted to go to the epel10 workshop/hackfest, but it was more important to fix up those things while I had the people involved right there to look at them.

Dinner friday night was a really nice team dinner with a bunch of co-workers. It was a bit annoying that as the evening went on the place filled up with people and the base volume got higher and higher; after a point I couldn't hear anyone at all. Some of us did move away to a far corner and it was much better there, but oh well, I guess that's how it goes on a friday night.

Day 4 - (saturday - 2024-08-10):

Saturday was the mentor summit. I started out being involved in that, but then some folks had fires/blockers, so I went and helped out where I could. For some reason the openQA test cluster in AWS was all stopped. I restarted it and will be looking into what could have happened to it, or at least how we can log what happens to it in the future. I then dealt with some signing issues around the mass resigning and sigul lockups. Then there was a lot of great hallway track discussion on all kinds of topics.

The conference wrapped up with a readout from a lot of folks, which I think is a great tradition. Lots of perspectives on what happened and what got discussed. I tossed in my few cents.

Dinner ended up being 4 of us at a pretty nice ramen place. Good food and conversations.

Day +1 - (sunday - 2024-08-11):

Some folks went on a group trip to Niagara Falls on sunday, but I wanted to get on home, so sunday was my travel day. Sadly, my flight was super early and I was carpooling to the airport with some other folks who left even earlier, so I had to get up at about 3:45am to meet in the lobby at 4:30am and catch my 6:45am flight. Troy was on my flight too, so we had some breakfast after we landed at Midway. It was good to get a bit more discussion in, since I hadn't had much time with Troy during the conference. I ended up landing about an hour late. Then the 2 hour drive home, and finally I was able to crash.

Some general thoughts on the conference in no particular order:

  • I wish I had had a chance to get a picture with my boss, his boss, his boss and his boss that were all there at various times. Would have made an amusing org chart thing.
  • For whatever reason I seem to have spent a lot of time with Jeremy, David, and AB, but that was great as they are all wonderful humans.
  • To me it seemed like there was a lot of energy around all the changes that might be coming in Fedora: Konflux, new git forge, bugzilla replacement, and more. Of course you can't predict the future, but I am pretty hopeful of all these changes.
  • I was sad that a number of folks couldn't make it this time: pjones, jforbes, ausil, pbrobinson, a bunch of folks from my work team, and more. Just bad luck/timing I think; hopefully I will see many of them next year.
  • Flock always leaves me weary in body, but energized in spirit.

Look for a post on the discussion from the infra and releng workshop later this week.

Fedora Infrastructure musings for the first week of aug 2024

Last week was a busy one (but aren't they all?). I did a fair bit of moving things around, and managed to quash several more rhel7 instances: I was finally able to get into the management interface on a virthost I couldn't reach before, so I could reinstall it, after moving a noc vm off it once we got that vm the bridge to the mgmt network it needed. We finally took down pdc as well. There are still a few loose ends with it, but keeping it up wouldn't have helped with them, since it wasn't getting populated anymore. Hopefully we will get those last few things done soon. They are:

  • The toddler that syncs components from src.fedoraproject.org to bugzilla. We have a fixed version of that plugin; it just needs to pass testing and then we can deploy it. Until that's in place, new packages will not get bugzilla components and changes of ownership won't be reflected. Sorry for the trouble.
  • A way to save compose metadata off so things like fedfind can query what packages are in composes. We are likely going to do this with just a small script that saves the data off to a filesystem, since we already have it from pungi in the composes; we just need to save it.
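
To give a rough idea of what such a script could look like, here's a minimal sketch under my own assumptions (the paths are made up for illustration, not our real layout); Pungi composes keep their metadata as JSON under compose/metadata/, so "saving it off" can be little more than copying those files somewhere queryable:

```python
#!/usr/bin/env python3
"""Illustrative sketch only: copy a finished compose's metadata JSON
somewhere it can be queried later. Paths are made up, not our layout."""
import shutil
import sys
from pathlib import Path

COMPOSE_ROOT = Path("/mnt/koji/compose/rawhide")   # assumption
ARCHIVE_ROOT = Path("/srv/compose-metadata")       # assumption


def save_metadata(compose_id: str) -> None:
    # Pungi writes composeinfo.json, rpms.json, etc. under compose/metadata/.
    src = COMPOSE_ROOT / compose_id / "compose" / "metadata"
    dest = ARCHIVE_ROOT / compose_id
    dest.mkdir(parents=True, exist_ok=True)
    for meta in src.glob("*.json"):
        shutil.copy2(meta, dest / meta.name)


if __name__ == "__main__":
    save_metadata(sys.argv[1])
```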

I spent a lot of time trying to get this new Lenovo Slim 7x booting. I got to the point where the kernel boots and hands off, but then at the end of booting up something happens and the laptop resets. Pretty frustrating, since I know others have it booting and largely working. I'm debating taking it with me to Flock. It would be another thing to carry, but someone there might know what the boot issue is or how to debug it. Failing that, things are rapidly landing upstream, so it could just be something that will get fixed soon. I'm really looking forward to it: the screen is really nice, and battery life should be super great.

Flock is next week! I will be there and am looking forward to talking with everyone. I'll be traveling basically all day on monday, trying to recover on tuesday and then flocking from wed->sat. I head home sunday. Safe travels to everyone heading to flock!

Fedora Infrastructure musings for the 3rd week of july 2024

The week started with the f41 mass rebuild finishing up (it actually mostly finished over the weekend, with just a few straggling builds left). For those who don't know much about it, it's basically when we (mostly) rebuild every package in Fedora rawhide. This picks up a number of fixes (like improved compilers/tooling) and is also a chance to confirm that packages actually still build and aren't sitting there broken, waiting for someone to need to quickly update them only to find they cannot. Things were a bit slower this time because we have fewer (but larger/faster) s390x builders, so when, say, 8 ghc versions are all building at once, smaller things pile up behind them, but it all worked out fine in the end. We again hit a problem that seems to happen every time now: someone commits a change to a package in git and then either doesn't build it, or does but it fails gating or is otherwise untagged as broken, and then the mass rebuild comes along and builds it again. We have talked about some ideas to handle this better; hopefully we will get to implementing something.

Looking around, I found that Linux support for the new Snapdragon X laptops has been moving very quickly over the last few weeks. Lots of things now work, but of course it's super early days and you really need to be willing to poke and prod to get things working. That describes me pretty well, so I ordered a Lenovo Yoga Slim 7x to play around with. Look for some reviews and info about what can be made to work in Fedora in the short term. Should be a lot of fun.

Did some more flock prep work this week. See the discussion thread if you have thoughts on what we should discuss or any input beforehand on anything.

I thought I would share a bit about how I handle keeping track of tasks. It's definitely not perfect, but it mostly works for me. I've tried a bunch of todo apps/task trackers and they have always felt like they have a lot of overhead. I used taskwarrior for a long while, but adjusting things there was just a lot of work (especially recurring meetings/events). So, I went back to something I was using long ago: a text file. Each week I make a new text file. It lists out the days, so I can add tasks that I actually did on those days as I do them, and add things I plan to do/meetings/whatever with a '-' in front of them. I then have a longer list of todos/pending items below that, divided into ready / post freeze / planning, and I put things under there to pick from when I have cycles. I also have some todos that are moved from day to day until I do them, and the entire thing is copied over to the next week and adjusted. The '-' in front of things allows me to grep those out and output all the done tasks for reports or whatever. It works reasonably well, but the ready list is long and not as well organized as it could be. Anyhow, there's always room for improvement.
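
To make that concrete, here's a tiny illustrative sketch of the "filter out the dashed lines" step; the file name and layout are made up, and in practice a plain grep does the same job:

```python
#!/usr/bin/env python3
"""Illustrative sketch: print the completed tasks from a weekly notes file.
The file name and layout are made up; a plain grep does the same job."""
from pathlib import Path

for line in Path("week-2024-07-15.txt").read_text().splitlines():
    stripped = line.strip()
    # Lines starting with '-' are planned items/meetings; skipping them
    # (and blank lines) leaves the day headings and the tasks actually done.
    if stripped and not stripped.startswith("-"):
        print(stripped)
```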

Fedora Infra musings for the third week of july

Another week has raced by (time flies when you're having fun?). Flock to Fedora is coming up really fast now: it's Aug 7th to 10th in Rochester, NY. Looking forward to meeting up with everyone there and having some great discussions. I have a talk (which I still need to write up) on Matrix, which should be fun, and then an Infrastructure and Release Engineering hackfest which I need to work on organizing a bit more. Look for more info on Fedora Discussion.

On monday I managed to get updated firmware for our aarch64 emags. I got them all updated, reinstalled, and re-added as builders just barely in time for the mass rebuild. This sort of thing takes a really tremendous amount of time, and I'd like to explain why for those who haven't had this sort of fun before. There are a lot of parts of this process where you need to wait for something to happen and then react to it: for example, wait for one firmware (there are 3 on these aarch64 machines) to finish updating, then reload and upload the next one. For some reason I couldn't force them to PXE boot in all cases, so that meant: log in to the serial console, watch the server boot, and when it gets to a specific point hit esc-shift-1 to PXE boot. If you miss it, you have to start over. You might think you could do other things while this is happening, but... when you do, you always miss the window to hit the key and have to keep doing it over and over. The next bit of fun with these was that they have 4 interfaces, and for some unknown reason they are all active on various vlans, and which one gets the default route is somewhat random. If it's not the actual builder network, the machine can't reach resources and the install fails. Sometimes one or more of the interfaces wouldn't come up, with a cryptic error; if that was the main network, you had to reboot and try again. Once they were PXE booted, the kickstart install and ansiblizing was easy. Hopefully they will keep working now until we retire them.

Our resultsdb app's pods have been restarting, and it's not super clear why. They hit max threads, the health check fails, and then they restart, but the reason for the max threads isn't clear. Is it somehow getting blocked on database writes so requests pile up? Or is it just getting too many requests at once to handle properly? I looked at it some, but the resultsdb image that we use was made by the factory2 team (which no longer exists), and it's not very easy to enable debugging in. I will look at it more next week.

Overall this week I didn't feel like I got much done. Too many things that are difficult/require a lot of time and it's hard to feel progress on them. Hopefully next week will go better!

Fedora Infra musings for the Second week of july 2024

This week started out fun with some oral surgery on monday. Luckily it all went very well: I went to sleep, woke up when they were done, and had a bunch of pain medication on board. I'm getting pretty sick of 'soft' foods, however.

On tuesday our log server hit 100% full. It turns out a toddler (a thing that takes actions based on message bus messages) was crashing in a loop. When it does this it puts the message back on the queue and tries again. This works fine if it's some kind of transitory error and it can process the message after a short while, but it doesn't work very well at all if it needs intervention. So, 350GB of syslog later, we disabled it until we can fix it. We did have some discussion about this problem, and it seems like the way to go might be to have the entire pod crash on these things. That way it would alert us and require intervention instead of looping on something it can't ever process. Also, right now the toddlers are just generic pods that run all the handlers, but we are looking at a 'poddlers' setup where each handler has its own pod. That way a crash of one won't block all the rest. Interesting stuff.
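
To sketch the "crash instead of looping" idea (this is illustrative only, not the actual toddler code): fedora-messaging callbacks can raise Nack to requeue a message or HaltConsumer to stop the consumer entirely, which in a pod means it exits, gets restarted, and gets noticed. The handler logic and error class below are made up.

```python
"""Illustrative sketch of the 'crash instead of looping' idea using
fedora-messaging's callback exceptions; the handler logic and error
class are made up, not the actual toddler code."""
from fedora_messaging import exceptions


class TransitoryError(Exception):
    """Placeholder for an error we expect to clear up on its own."""


def process(message):
    """Placeholder for the real handler logic."""


def handle_message(message):
    try:
        process(message)
    except TransitoryError:
        # Temporary problem: put the message back on the queue and retry.
        # This is the behavior that loops forever when the error is
        # actually permanent.
        raise exceptions.Nack()
    except Exception as err:
        # Anything else needs a human: halt the consumer so the pod exits,
        # gets restarted, and alerts us instead of spinning on one message.
        raise exceptions.HaltConsumer() from err
```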

Our new, updated Mailman instance has been having memory pressure problems. We were finally able to track it down to the full text search causing memory spikes in gunicorn workers. It's rebuilding its indexes, but it hasn't been able to finish doing so yet, and without those the search is really memory intensive. So, we are going to disable it until the indexing is all caught up. This seems to have really helped. Fingers crossed.

This week was a mass update/reboot cycle. We try to do these every few months to pick up non-security updates (security updates get applied daily). So, on tuesday I did all the staging hosts and various other hosts that I could do without causing any outages for users/maintainers. Wednesday was the big event, when all the rest were done. Ansible does make this pretty reasonable to do, but of course there are always things that don't apply right, don't reboot right, or break somehow. Here's this time's share of those:

  • All of our old Lenovo emag aarch64 buildhw machines wouldn't reboot (see below).
  • The koji hub's fedora-messaging plugin wasn't working. It turns out the hardening in the f40 httpd service file prevented it from working. I've overridden that for now, but we should fix it so it doesn't need that override.
  • Our staging OpenShift cluster had a node with a disk that died. That disk was used for storage, so the upgrade couldn't continue. I finally got it to delete that and continue today.
  • Flatpak builds were broken because moving to f40 builders meant switching to createrepo_c 1.0, and thus zstd by default. The Flatpak SIG folks have fixes in the pipeline.
  • epel8 builds were broken by f40's dnf no longer downloading filelists. rhel8 has requirements on /usr/libexec/platform-python that wouldn't resolve anymore, so no builds. I've just added platform-python to the koji epel8 build groups for now. Perhaps there will be a larger fix in mock.

So, we have a number of old Lenovo emags. They have been our primary aarch64 builders for ages (since about 2019 or so). They are no longer under warranty, and we have slowly been replacing them with newer boxes. They now will no longer boot at all. It seems like it has to be a shim or grub problem, but I can't really seem to get it working even with older versions, so I am now thinking it might be a firmware problem. There is actually a (slightly) newer firmware, if I can get a copy. Failing that, we may have to accelerate retirement of these. They really served long and well, and are actually pretty nice hardware, but all things must end. Anyhow, I'm looking for the new firmware to try before giving up.

I've been dealing with this bug in rawhide kernels lately. The last two days I have come in in the morning to find my laptop completely unresponsive. A few other times I have hit the kswapd storm, and backups have been taking many hours. I sure hope the fix lands soon; I might go back to f40 kernels if the upstream fix doesn't land soon. I know I could just build my own kernel, but... I've done that enough in my life.

Till next week, be kind to others!