ansible 2.0 going stable in f23/f22/epel7/epel6

Just a quick heads up: ansible-2.0.1.0 is heading out to the stable updates repos for Fedora 23, Fedora 22, EPEL7 and EPEL6. There are a few outstanding issues folks might hit, but we figured it was finally time to push to stable. If you use includes for handlers you might hit ( https://github.com/ansible/ansible/issues/13485 ). This one will be fixed in 2.1 by adding a config option to set includes as static or dynamic. Also, you might see intermittent issues with some hosts saying "unable to resolve remote temp directory" ( https://github.com/ansible/ansible/issues/13876 ). Hopefully this will be fixed in 2.0.2.0, and there are some workarounds in the issue. If you run into another issue, please do check whether it's reported upstream, and if not, report it. We have also pushed out an ansible1.9 compat package for those folks who need to stick with 1.9 for now for whatever reason. You can simply remove the ansible package and install ansible1.9 to use that version. It's currently at 1.9.4, but there's also a 1.9.6 update pending in updates-testing now. Happy ansiblizing!

Rawhide: notes from the trail, 2016-04-03 edition

Well, it's been a while again, so I thought I would update everyone on recent goings on in Rawhide, Fedora's rolling development release. Since my last post we finally moved over to having Rawhide and Branched (what will be Fedora 24) composed by the new pungi4. There are sadly lots of little issues that have been cropping up and getting fixed by Release Engineering, so it's been a bit of a bumpy ride. As of this post, the last Rawhide compose that finished was the 2016-03-31 one; all the ones since then have failed. Hopefully this will get sorted out tomorrow. The current cause of problems is adding atomic/ostree to the compose (which we very much need/want). It's just been more difficult than we first hoped. Once things settle down we will really be in a better place than before, with full composes (tested by openqa!) every night.

There was a bit of a delay on signing after the Branching event, but last week we got back on track and (almost) everything in rawhide should be signed by the Fedora 25 key now. With compose issues and general breakage in the install path it's been difficult to install rawhide recently. Again, once we get the normal composes back on track this should clear up. As usual, once you have a rawhide install there hasn't really been much breakage in the day to day use path.

Astute observers would have noticed that the rawhide-source mirrormanager repo wasn't working after the switch to pungi4. I fixed that up the other day and rawhide-source should be working normally again.

How long does it take for that Fedora ssh key to sync?

There's been some confusion about how long it takes between changing your ssh key in the Fedora Account System (FAS) and being able to use it to access things like fedorapeople.org, pkgs.fedoraproject.org or fedorahosted.org. The method also changed a while back, so I thought I would write up this short post on it.

TLDR version: for ssh access to fedorapeople, pkgs or fedorahosted, it should take about 2 to 17 minutes (usually 2). For access to other machines (if you are in a sysadmin group), it should take 15 to 30 minutes (usually 15). So, you would be safe saying "within 30 minutes".

In the past, we simply had a cron job on every machine that would run every 30 minutes or so and sync everything. This was a massive waste of resources. We had to size our fas servers to handle the load of every machine hitting them within a few minutes of each other, and for the most part absolutely nothing had changed. Then along came fedmsg, and we now had easily available data to know when something has changed and what it might affect.

The new process is run from a fedmsg listener. It waits for ssh key changes to be mentioned on the fedmsg bus, then looks to see whether they are from someone in a sysadmin group or not. It then waits for about 30 seconds (if there are more changes coming in, it might as well do them all at once). If the change is not from a sysadmin, it runs an ansible playbook that updates just those machines the user might have access to. If it is from a sysadmin, it uses a different playbook that updates all the machines in turn.

The non-sysadmin playbook takes about a minute to run, so the vast majority of these changes will appear in about 2 minutes. The sysadmin playbook takes much longer (there are a LOT more machines in this playbook): 15 minutes or so. The vast majority of changes here will appear in 15 minutes (but it depends on which machine you are waiting for access on, of course).

As a side note, this applies to ssh keys and mail aliases, but not koji certificates. Those are active as soon as they are issued (and your old certs are revoked). Since the script will only run one playbook at a time, the worst case is that you are yourself a sysadmin and your change just misses a sysadmin playbook run. Then you may have to wait 30 minutes. For the case most people will hit, it should just be a minute or two.
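To make the flow above a bit more concrete, here is a minimal sketch of the batching and playbook selection in Python. It is not the actual Fedora Infrastructure script: the playbook names and the `KeyChange` type are made up for illustration, there is no real fedmsg or ansible call in it, and it assumes the 30 second window starts at the first change in a batch (the post doesn't say whether the timer resets).

```python
from dataclasses import dataclass

# From the post: wait about 30 seconds so several changes can be handled at once.
BATCH_WAIT_SECONDS = 30

# Hypothetical playbook names, purely for illustration.
USER_MACHINES_PLAYBOOK = "sync_user_facing_machines.yml"
ALL_MACHINES_PLAYBOOK = "sync_all_machines.yml"


@dataclass(frozen=True)
class KeyChange:
    """One ssh key change seen on the message bus (hypothetical shape)."""
    username: str
    is_sysadmin: bool


def batch_changes(events, wait=BATCH_WAIT_SECONDS):
    """Group (timestamp_seconds, KeyChange) pairs into batches.

    Assumption: a batch collects everything that arrives within `wait`
    seconds of the first change in that batch.
    """
    batches = []
    current = []
    window_start = None
    for timestamp, change in sorted(events, key=lambda event: event[0]):
        if window_start is None or timestamp - window_start > wait:
            if current:
                batches.append(current)
            current = []
            window_start = timestamp
        current.append(change)
    if current:
        batches.append(current)
    return batches


def pick_playbook(batch):
    """Use the slow all-machines playbook only if a sysadmin changed a key."""
    if any(change.is_sysadmin for change in batch):
        return ALL_MACHINES_PLAYBOOK
    return USER_MACHINES_PLAYBOOK
```

With this sketch, two changes arriving 10 seconds apart end up in one batch, and that batch uses the all-machines playbook if either user is a sysadmin. A change arriving a minute later starts a new batch with its own playbook choice.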

taskwarrior tips and notes

A year or so ago some co-workers pointed me at taskwarrior ( https://taskwarrior.org/ ) as a neat way to do todos and task tracking. At the time I tried to dive into it and it just didn't seem to work out for various reasons (I didn't give myself free time to really dig into it, tried to make it do time tracking as well, etc). Last week and over this last long weekend I gave it another go and I think it may stick this time. I thought this might be a good time to share the tips and workflows I came up with in case anyone out there wants to give taskwarrior a try.

First a bit of background: I'm a sysadmin by trade, so my todos and tasks are often tons of short tasks and reminders along with meetings and longer term planning/thinking items. For many years my workflow has been to use my mail client inbox along with a text file for things. If it's in my inbox it means I need to take some action or reply to something. If it's in the text file it's something I have already done (to keep a record) or some short term todo that doesn't have email associated. This setup works pretty well, but it's bad at followups (where I replied/did something and the other party didn't), annoying for repeat items (I have to use a template for the text file or copy and paste), and bad at searching for particular things.

Enter taskwarrior: the new workflow I've settled on pretty much replaces my text file with taskwarrior, so I still have my mail inbox, but with taskwarrior helping it out. There are extensive taskwarrior docs on the web and the man page is pretty nice too. It's very nice that taskwarrior is flexible: it can be used a bunch of different ways with different workflows and methods, so you can set things up to use it the way you want to. Above and beyond those things:

  • Setting up meetings confused me, because I don't want to see them until near the meeting time/day, until I realized I can use wait to make them wait until I want to see them. For example, here's the EPEL meeting that's every Wed at 18UTC: 'task add "EPEL meeting" project:work recur:weekly due:2016-03-30-12:00 wait:soww' (soww is start of work week). So, I won't see this on my task next list until Monday when I am looking at my upcoming week.
  • Tags are very handy. For example I have been adding +nagios to any task where I am looking at or acting on an alert. That way I can go back after a while and see if there are any trends or things we really want to try and fix because they re-occur. 'task completed +nagios' will show all these completed nagios tagged tasks. (I do likewise for ansible commits, bugzilla, infratrac and internal only tickets.)
  • 'task edit <id|uid>' is pretty handy to see what data task has on a task, or to tweak it if you added it wrong.
  • If you really do want to wait forever for something, you can use 'wait:someday' or 'wait:later' and it will set it to 2038. :)
  • You can also add 'until' on meetings for example for just after they would end or the like, and task will delete that task if it reaches the until time. This is handy for example if you miss a meeting, you didn't do it, it should just be removed, but you don't even have to think about it if you add it with an until.
  • Another cool one from Paul Frields: You can use 'task log ...' to create a task that's already marked done. This is handy if you already did something and didn't have a chance to make a task beforehand. Saves you adding a new task only to mark it done.
  • Out of the box, task doesn't keep track of how much time you spent on any given task. You can use some hooks to record this, but it doesn't do it by default. I think the last time I tried it out this might have confused me, but this time thinking about it, I am not sure I care exactly how many minutes I spend on each task.
  • I've found it handy to add text as much as I can: when adding the task, 'task add ...' (describe it as simply and completely as you can); when starting the task, 'task N start ...' (usually a link here if it's a ticket or bugzilla thing where I know the url when starting); and when completing the task, 'task N done ...' (here a commit url or the like with the work done). Or 'task N annotate ...' anytime you made some progress and it's an interesting or noteworthy thing to add.
Taskwarrior does have a server you can sync to (taskd). It's pretty easy to set up and uses TLS certs to authenticate clients. There's also an android app, which is kind of painful to configure, but seems to work otherwise. I've not done all that much with it yet, but it might be handy if you are at an event or the like and want to record a task but don't have your laptop out.

There's also bugwarrior, which lets you pull from trac/bugzilla/pagure/etc into task. I did that the last time I tried things out, but I think it's not a good workflow for me. When I do work on a ticket or bug I can add that task manually, and leave the many that are not waiting on me to sit in their own space.

Reports are quite handy for seeing how much you have done and what you have done in a time period, and they are pretty easy to set up. Just add them to your .taskrc with what to filter on and what columns to show. I set up one for what I have completed in the current work week and one for what I completed last work week (useful in weekly reports to management). Still getting the hang of things, but I am hopeful that I'll stick with taskwarrior, at least for now.

encrypt all the things: blogs

So, with ssl certificates pretty easily available these days from letsencrypt.org it's more and more worth looking at making sure you are using https instead of http for everything you can. There are however some corner cases that are... difficult. One of those is blog aggregators. A while back we moved our planet.fedoraproject.org to fedoraplanet.org. This was to get it out of the fedoraproject.org domain because we send HSTS headers for fedoraproject.org telling browsers they should always contact that domain with https. Blog aggregators are in a tough spot, because they simply pull content from a bunch of different sites and put it in one place and link to it. Unless 100% of the blogs that the site is aggregating are also https, if the site itself uses https most browsers will show you a nasty warning about mixed encrypted/non encrypted content. So, for now, since most of the blogs on fedoraplanet.org are http, we are leaving fedoraplanet.org itself as http. However, we would love to get to the point where all the blogs we aggregate are https. I don't think it's an impossible journey. Here's what you can do if your blog is listed in fedoraplanet.org:

  • Check to see if your blog site already supports https and has a valid cert. If so, you simply need to log in to fedorapeople.org and edit your .planet file to use the https link instead of http. Done.
  • If your blog site doesn't support https (yet), ask your blog provider about adding it. They should be able to add a letsencrypt.org cert pretty easily.
  • If your blog site doesn't support https (yet) and you run your own blog, why not add https support?
If we can get a critical mass of blogs using https, we can look at switching the site over too. Help us out. :)

A Fedora Distribution download primer

With the fresh news of a compromise in the Linux Mint distribution images, I thought I would take a few minutes to explain how Fedora handles image downloads and what you can do as an end user to make sure you have the correct and official Fedora images. First, let's take a look at what happens in each step if you open your browser to getfedora.org (our install images download site):

  • You type 'getfedora.org' in your browser.
  • First, your operating system asks your dns servers for the IP address of getfedora.org. If your OS is using dnssec, then it will get a cryptographically signed answer. If not, it will get whatever answer your dns servers give it.
  • Next your browser may try to connect to getfedora.org via http. We have getfedora.org set up to redirect all http requests to https, so this would get you a redirect.
  • On the first https connection to getfedora.org, we send an HSTS header. This tells your browser (if it supports HSTS) that it should ALWAYS use https to talk to this site. Even if you enter http://getfedora.org, it would just correct that and connect on https.
  • Once connected you can download distro images by clicking on the download link for the image you like. Once you click on a download (unless you have completely disabled javascript), there's a screen describing how to verify your download: https://getfedora.org/verify
  • Once you have downloaded your image, you need to do two things to make sure it's the valid and official image. First, check the gpg signature of the checksum file. Official checksum files in Fedora are always signed. You can get the gpg key for that Fedora release from getfedora.org, most any keyserver, or from the fedora-repos package if you already have a Fedora install. Additionally, if you import this key and then refresh (gpg2 --refresh-keys) you can see the signatories of that key and decide based on all that if you trust it. Second, if the signature is correct, you can use sha256sum to check the checksum of the image. YOU SHOULD ALWAYS DO THESE CHECKS. :)
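As a small illustration of the HSTS step in the list above, here is a rough Python sketch of what a browser conceptually does with the header. The header value used in the usage note is only an example, not necessarily what getfedora.org actually sends, and real browsers also track expiry time and subdomains, which this sketch skips.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        directive = directive.strip()
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        if name == "max-age":
            policy["max_age"] = int(value)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def upgrade_url(url, policy):
    """Rewrite http:// to https:// when a policy with a positive max-age is stored.

    Simplification: expiry of the stored policy is not tracked here.
    """
    if policy["max_age"] and url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url
```

For example, after `policy = parse_hsts("max-age=31536000; includeSubDomains")`, calling `upgrade_url("http://getfedora.org/", policy)` gives back the https version of the url without ever making a plain http request.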
So, we have dnssec, hsts and signed checksum files. Would that have helped us any if we suffered an attack similar to the one the Linux Mint folks suffered? In that attack, their download machines were compromised and intruders replaced the checksums and download links with ones for their own version. If that happened to Fedora, the only step above that would protect people would have been the gpg signature check (which sadly many folks never do, and for good reason: it's hard, annoying and manual). In Fedora 24 the workstation and server editions will be moving to preferring the usb media creator application instead of preferring direct downloads of images. We will need to make sure it's as secure as we can make it, but there may well be a manual step of checking the application after you download it (unless you already have Fedora and install it as a normal package). In its current form it already downloads checksum files from the Fedora master mirror via https and uses them to check downloads, but more can be done.
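For the checksum half of the verification, here is a minimal Python sketch of what sha256sum does against a checksum file. It deliberately does NOT check the gpg signature, so it only means something if you have already verified the signature on the checksum file (for example with gpg). The function names are made up for this example, and the two line formats it accepts ("SHA256 (name) = hash" and "hash  name") are assumed to be the common ones rather than a complete list.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256(checksum_text, image_name):
    """Look up the hash listed for image_name in the text of a checksum file.

    Returns None if the image is not listed.
    """
    bsd_prefix = f"SHA256 ({image_name}) = "
    for line in checksum_text.splitlines():
        line = line.strip()
        if line.startswith(bsd_prefix):
            return line[len(bsd_prefix):].strip().lower()
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].lstrip("*") == image_name:
            return parts[0].lower()
    return None


def image_matches(image_path, checksum_text, image_name):
    """True only if the image is listed and its hash matches the listed one."""
    expected = expected_sha256(checksum_text, image_name)
    return expected is not None and sha256_of(image_path) == expected
```

Note that an image which is simply not listed in the checksum file counts as a failure here, not a pass. That is the safer default for a check like this.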

Rawhide: notes from the trail, 2016-02-21 edition

Once again I have been busy and haven't gotten a notes from the trail out recently. Time to correct that! Just a bit over two weeks ago we had a mass rebuild event in rawhide. This one was for gcc6, which just landed in rawhide. The actual rebuilding took (much like the last one we did) about 36 hours, but that's just the tip of the iceberg. From the devel-announce post about the mass rebuild completing: "16129 builds have been tagged into f24, there's currently 1155 failed builds that need to be addressed by the package maintainers." Dealing with those failed builds is always the intensive part of any mass rebuild cycle, since it takes maintainer time (and often upstream communication) to sort out what's going on and fix it.

In just a few days we will be branching off F24 from rawhide. Do make sure when this happens on Tuesday that you follow the direction you want: either stay on rawhide or follow Fedora 24 until its release in June.

With the mass rebuild and a bunch of other changes people are landing before the branch event, there have been some issues with the rawhide install path again. If you need to install rawhide, do check https://openqa.fedoraproject.org/ for the last image that passed install tests and save yourself time and energy before downloading.

Finally, we are close to landing the changes in the compose process around pungi4. This will make every day's rawhide just like a release compose. It will have all the trees, images, checksums, metadata and such every day. This will also be used for the Fedora 24 cycle composes, so everything should be a good deal more consistent.

A tale of that most hated hardware: printers

Ask any sysadmin which hardware is the worst to deal with and I think they will universally tell you it's printers. There are a lot of reasons for this, but I think the biggest ones are:

  • They use / misuse consumable items like ink, paper, etc.
  • They have a ton of moving parts to jam or break or get blocked.
  • They tend to be in places where end users are asked to service them and often put in incorrect paper, jam things or misrefill ink, etc.
  • Recent trends have made printers the lowest-margin items. If your printer doesn't work, throw it away and get a new one; it's cheaper than fixing or refilling your existing one.
A few years ago I got an HP multifunction printer (when its predecessor broke and it turned out to be cheaper to just buy a new one). The scanner part of it works great and always has. The printer part on the other hand has been nothing but horror. I don't tend to print much at all, but a few times a year for some reason or another I need to print a document or directions to something or whatever, and every single time this printer has failed me on the first print attempt. Reasons for its breakage have included:
  • It's been so long since I printed that the ink "dried up". Once I was able to get it working by cleaning it with a q-tip, once by making the printer run its clean print heads cycle about 20 times, once by buying a new cartridge (almost as much as a new printer).
  • Once it decided to just not turn on at all. Unplugging it completely, waiting 5 minutes and replugging it "fixed" it.
  • Once the color ink cartridge was low so it wouldn't print anything (even just using the black cart that was full!) which required a new color cartridge.
Over the holidays I cleaned out a lot of my computer room, so I could see all the printers I still had that I hadn't managed to throw away. 2 old HP inkjets, an old epson, and... what is that at the back of the room? Why, it's my old HP laserjet 4P. I used that printer extensively during college and after, and finally retired it when the toner ran out and I wanted to move to a new shiny color inkjet printer. So that got me thinking. The laserjet 4P only has a parallel port, but a quick check found a parallel to usb cable for $8. At the time I retired it, a toner cartridge for the 4P cost around $100, but looking now, I found one for $12. A quick amazon order and a few days later the items arrived. I hooked up the printer to my main server, put in the new toner cart, added it to cups and printed a test page. A lovely 600dpi test page. So, hopefully this laserjet 4P will now print reliably for me for the rest of my life and I can avoid the inkjet "buy a new one" dance every time I need to print something.

A visit to a Fedora datacenter

I just got back from a visit to the largest of the datacenters used by Fedora Infrastructure, and I thought I would share a bit about why we do such visits and what we do on them. We have a number of sites where Fedora Infrastructure has at least one machine, but we have one main datacenter (sponsored by our main sponsor, Red Hat) that hosts all our build system machines, our fedorainfracloud openstack cloud, and most of our main servers. This site does have some on site folks who can handle most of our day to day hands on needs, but sometimes we have things that need a lot of coordination, or extra time to complete, so we try and 'save up' those things for one of our on-site visits. (Side note: If you are interested in donating machines/colocation space for the Fedora Project, please do mail admin@fedoraproject.org. We would love to hear from you!) On this particular visit last week we got quite a few things done:

  • Package maintainers will be happy to hear we racked a new HP Moonshot chassis. This will hopefully soon allow us to run armv7 builder VMs on aarch64 virthosts, which will vastly decrease build times.
  • We moved 5 machines to a new rack to be used for testing storage solutions. We want to see if we can move to more gluster or open source storage solutions. These machines will be used to test out those solutions as they mature.
  • We added memory to a number of cloud compute nodes, allowing us to run more and larger cloud instances (and have more copr builders).
  • We added a new equallogic storage device to our cloud network. This is a smaller one than we have now, but it will allow us to test things and also give us some more cloud storage room.
  • We fixed some firmware on some power supplies (yes, power supplies have firmware too!). This involved an elaborate dance of moving power supplies around in various machines to get them all upgraded (which would have been very tedious remotely).
  • Moved an old remote kvm and added a new kvm. We don't tend to use these all that much, but they are very handy for machines that don't have proper out of band management.
  • Pulled tons and tons of unused cables out of all our racks. On site folks don't tend to want to remove cables, only add new cables for new devices. There were a bunch of old cables that were only connected to a switch or serial port or power device and were no longer needed.
  • Set up a power7 machine (many thanks to IBM) in our cloud network. As soon as we have it installed and set up we should be able to have ppc64 and ppc64le cloud instances and also do copr builds of those arches locally instead of against a remote server.
  • Updated our records of which servers were on which serial, power and switch ports. This information is very important when making remote requests or power cycling servers so you don't accidentally get the wrong one.
  • Pulled out a few older servers we were no longer using.
  • Racked a few new servers we will be using for releng/buildsystem stuff.
  • Checked all the servers for any error lights or alarms. Cleared a few we had already fixed, and fixed a few we didn't know about yet.
All in all a good week's work that should set us up well for the next few months. I would like to thank Steven Smoogen (for being on-site and getting all the above done with me) and Patrick Uiterwijk (who held down the Fedora Infrastructure fort while we were off at the datacenter and helped get all the above done from the remote side) as well as Matthew Galgoci (Red Hat networking guru extraordinaire) and our on-site folks, Jesse Iashie and Pedro Munoz. It's a pleasure to work with you all.

Rawhide year in review

Looking back over 2015, I see I only managed 7 rawhide posts. Will see if I can do better for 2016. In any case, looking back at these posts and emails I have sent to mailing lists with rawhide topics I thought I would do a bit of a year end roundup. Overall we have made some great improvements:

  • We now have openqa running on rawhide images every day to let us know when the install path breaks so we can fix it and so people know when doing a rawhide install will work for them.
  • Rawhide is "mostly" signed now. When a large set of rpms lands right before the compose the autosigner sometimes doesn't sign them all before the compose, but for the most part all the rpms are signed. (For example, all rawhide composes since 2015-12-29 are fully signed, that one had a qt5 build that landed right before compose).
  • Slowly it seems like more people are using rawhide day to day and helping report issues or problems. Rawhide is not for everyone, but it's good to have a pool of folks using things and testing and reporting bugs so they can get fixed.
Coming up in 2016:
  • releng hopes to soon change the rawhide compose process to use pungi4. This would be the same process then used for real Alpha/Beta/RC/Final composes. It would mean we would have a full compose of all things every night we could test and use.
  • We are working through bugs with sigul batch signing mode (we have already made some great progress fixing several issues), and once those are all solved the autosigner should be able to handle things much quicker.
  • Hopefully we can start doing some more automated testing on composes as well.
Looking forward to another fun year running rawhide and making it better for everyone!