2016 Holiday hacking (so far)

nirik

2016-12-27 18:10

I hope everyone is having a lovely holiday season and doing the things they find fun or recharging. I've finally gotten to a few hacking projects that I thought might be interesting to share with others:

I finally cleaned up my local ansible repo a bit and set it up as a pagure project: ( https://pagure.io/scrye-ansible/ ). It still needs a lot more cleanup (these playbooks were some of the first I ever wrote, many years ago), but they should help anyone who wants to set up test machines like the ones I have here, or serve as more ansible examples.
For 4+ years I have had a CentOS 6 DigitalOcean droplet that I have been using as a secondary nameserver for my domains. It actually predates IPv6 support at DigitalOcean and had a manually set up he.net IPv6 tunnel on it. I finally looked at the ansible digital_ocean module and set up a shiny new Fedora 25 droplet, created entirely via ansible, along with my secondary nsd configuration. This is all now in the above scrye-ansible repo. I may well be using more DigitalOcean droplets now that I see how easy it is via ansible.
I looked around for a new ROM for my phone (Nexus 6) and didn't find much that filled me with joy. So, for now I just moved to the latest CM nightly and will see if the new fork ( http://lineageos.org/ ) pans out. The other ROMs out there look really unprofessional: the homepage is an XDA forum post, binaries have no checksums and download from bigfiles.com or the like, the last commits to the source repos were years ago, etc.
One big issue I'd been having with my phone lately is it shutting off with 30-40% battery left and suddenly reporting 0%. I found that you can reset the battery calibration (boot while holding power and volume down, select 'bootloader logs', then hold power for 10+ seconds until the phone reboots). That seems to have made things a good deal happier, but time will tell. Unfortunately, the battery on the Nexus 6 is not easily replaceable.
I got all my home machines rebooted into the latest updated kernel. I sometimes delay rebooting my firewall/gateway machine because it means a short outage, but everything is updated now.
Pulled all the old backups off my phone and freed up a fair pile of space. There's no reason to store them there; I can copy them back if I need to restore something from them (which I never really do after the initial restores).
Set up usbguard on my home machines. The chance of someone plugging an evil USB device into them is small, but there's no harm in setting it up and denying all devices not known to me. I've been using it on my laptop for a long while now.
Updated my drop-keys (GNOME) and xscreensaver-keys (Xfce) scripts and pushed them to the pagure repo: https://pagure.io/scrye-ansible/c/37610c85918125d92bedff6673b58310605e90dd?branch=master . Basically, these scripts watch for the screen to lock and then drop all your unlocked ssh keys. For GNOME there's an upstream bug that might get this into gnome-keyring itself: https://bugzilla.gnome.org/show_bug.cgi?id=735149
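The idea behind the Xfce variant fits in a few lines. This is a minimal illustrative sketch, not the script in the repo: it assumes `xscreensaver-command -watch` is available (it prints one line per state change, starting with a keyword such as BLANK, UNBLANK or LOCK) and that `ssh-add -D` is how you remove every identity from the running ssh-agent. The function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable


def drop_ssh_keys() -> None:
    """Remove all identities from the running ssh-agent (ssh-add -D)."""
    subprocess.run(["ssh-add", "-D"], check=False)


def watch_for_lock(lines: Iterable[str],
                   on_lock: Callable[[], None] = drop_ssh_keys) -> int:
    """Call on_lock for every LOCK event; return how many locks were seen.

    `lines` is assumed to be the output of `xscreensaver-command -watch`,
    where each event line begins with a keyword like BLANK, UNBLANK or LOCK.
    """
    locks = 0
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "LOCK":
            on_lock()
            locks += 1
    return locks


def main() -> None:
    # Stream events from xscreensaver for as long as it keeps running.
    proc = subprocess.Popen(["xscreensaver-command", "-watch"],
                            stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    watch_for_lock(proc.stdout)
```

Calling `main()` from an Xfce autostart entry would keep it running for the session. Passing the event stream in as an iterable keeps the lock-detection logic testable without a screensaver running.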

In the next few days I hope to catch up on some packaging backlog and perhaps get some armv7 installs done. I hope everyone is having an enjoyable, relaxing holiday!

Rawhide notes from the trail, the 2016-12-18 edition

nirik

2016-12-18 13:05

Hello from the Rawhide trail. With the recent Flag day (on Dec 12th), we switched all rawhide builds over to a flow that lets us sign (and hopefully eventually test) all packages. Here's how it works:

Your rawhide build used to just be tagged into the f26 (currently rawhide) tag. Now, it tags into the f26-pending tag instead.
The autosigner sees the build, signs it and moves it to the f26 tag for the next rawhide compose.
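The tag flow above can be modelled in a few lines. This is a toy sketch, not koji's API or the real autosigner: the `Koji` class, the `autosign` function and the optional `qa_ok` hook are all hypothetical, and the hook only stands in for the automated QA that isn't in place yet.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Build:
    nvr: str              # name-version-release, e.g. "foo-1.0-1.fc26"
    signed: bool = False


@dataclass
class Koji:
    """Toy stand-in for koji tags (hypothetical, not the koji API)."""
    tags: Dict[str, List[Build]] = field(default_factory=dict)

    def tag(self, tag: str, build: Build) -> None:
        self.tags.setdefault(tag, []).append(build)

    def untag(self, tag: str, build: Build) -> None:
        self.tags[tag].remove(build)


def autosign(koji: Koji, pending: str, target: str,
             qa_ok: Callable[[Build], bool] = lambda b: True) -> List[str]:
    """Sign each build in `pending`, then move it to `target` if QA passes."""
    moved = []
    for build in list(koji.tags.get(pending, [])):
        build.signed = True          # stands in for the real signing step
        if not qa_ok(build):         # hypothetical QA gate
            continue                 # obviously broken builds stay in pending
        koji.untag(pending, build)
        koji.tag(target, build)      # now visible to the next rawhide compose
        moved.append(build.nvr)
    return moved
```

With the default `qa_ok`, every build moves straight through after signing, which matches the current setup; a real check could later be plugged in without changing where maintainers' builds land.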

Unfortunately, there was a backlog of packages we needed to sign, many from the ppc{le} bringup, so that's why there haven't been any rawhide composes the last few days. Today's compose should be out later today, however, and we should be back on track from there. For now we are just signing things in the $release-pending tag, but we would like to start doing some automated QA there at some point: nothing that will hold up builds for too long, but something that will keep obviously broken builds from landing. Now that everything else is in place, we can start figuring out what we want to run there. Also coming soon to rawhide is the first rebuild of python packages for the upcoming python 3.6. Hopefully that will all be smooth sailing, but a number of package updates will land for it soon.

Rawhide notes from the trail, the 2016-12-03 edition

nirik

2016-12-03 14:26

Well, it's been a while, but I thought I would share news from rawhide, both from the previous month or so and upcoming. With the release of Fedora 25, a number of developers and testers are switching back to rawhide in the run-up to the Fedora 26 branching event. Welcome back to the trail! In the last month there have been various small rawhide issues:

openssl 1.1.0c came out and had a number of issues. In particular, it broke authentication to koji, so package maintainers had to downgrade. It also broke a bunch of tests in python, which has delayed the landing of python 3.6 in rawhide. A patched version of 1.1.0c landed last week, so everyone should be OK to update openssl again.
The latest systemd version landed, but then had to be untagged again because it broke a number of parts of compose. Hopefully those will get worked through and fixed.
More recently (the last week), a kf5/qt5 update has been in progress and isn't fully finished. Rawhide users will notice a number of packages that dnf says it's skipping due to broken dependencies. This should get fixed up soon, so just keep skipping those for now.
Finally, in the last few days we have run into an odd problem that looks like it might be related to the RHEL 7.3 squid (which is what's on kojipkgs.fedoraproject.org and is used for package downloads for builds). Sometimes packages just don't download and that breaks the compose, but when tested later they download fine. We downgraded squid back to the 7.2 version for now and composes seem to be working again.

Finally, in exciting upcoming news: there's a "flag day" on December 12th (2016-12-12) for a bunch of changes that release engineering has wanted to make for a long time. One of these changes will finally allow us to do the last thing we need to get a 100% signed and gated rawhide. A change in fedpkg (along with some koji server changes) will make rawhide builds land in an 'f26-pending' tag instead of going directly to 'f26'. Then our autosign process will sign them and retag them to f26. We can also add automated QA at this step to decide whether a build is ready to be tagged in. I hope next year we can add a bunch of tests here and make rawhide more and more stable.

Security Score Card For Fedora Infrastructure

nirik

2016-10-24 11:41

Josh is asking folks to send him their security score card via Twitter. Since I've been trying to blog more and like pontificating, I thought I would respond here in a blog post. ;) There are 4 parts to the scorecard:

Number of staff
Number of "systems"
Lines of code
Number of security people

For Fedora Infrastructure, some of these are pretty hard to answer, but here are some attempts:

Fedora Infrastructure is an open organization. People who show up and start doing things are granted more and more permissions based on their merit. Sometimes people drift away to other things; sometimes new people show up. Some people are employed by Fedora's primary sponsor, Red Hat, specifically to work on Fedora. Those account for 3.5 sysadmins, 5 application developers, 2 release engineers, and 2 design folks (12.5 in all). Specific areas may have many more community folks working on them. So, answer: 13-130?
This one is easier to quantify. We have (almost) everything in ansible, so right now our ansible inventory + some misc non ansible hosts is around 616 hosts.
This is another one that's difficult. We have a lot of applications (see https://apps.fedoraproject.org/). Some of them are just upstream projects we run instances of (mediawiki, askbot, etc). Others are things we are the primary developers of (fedocal, pagure, etc). It would be a fun project to look at all these and count up the lines of code. Answer: dunno. ;(
If this means full-time security people working only on security issues, then 0. We do have an excellent security officer in Patrick, who is super smart and good at auditing and looking for issues before they bite us, but he's not doing that full time. Others on the sysadmin team do security updates, monitor lists/errata, and watch logs for out-of-the-ordinary behavior, but that's also not full time. So, answer: 0 or 1 or 3?

So, from this I think it would be nice to have a better picture of our applications (not lines of code, but where we keep track of them and who knows each one). It would be awesome to get some full-time security folks, but I am not sure that's in the cards. I'd like to thank Josh for bringing up the discussion... it's an interesting one for sure.

Another Fedora cycle, another painless Fedora upgrade

nirik

2016-10-23 13:06

As we near the release of Fedora 25, as always I upgrade my main servers to the new release before it comes out, so I can share any problems I hit and possibly get them fixed before the official release. As with the last few cycles, there were almost no problems. The one annoying item I hit was a configuration change in squid. In the past squid allowed you to have an 'http_port NNNN intercept' and no regular http_port defined; however, the Fedora 25 version ( squid-4.0.11-1.fc25 ) fails to start with a cryptic "mimeLoadIcon: cannot parse internal URL: http://myhost:0/squid-internal-static/icons/..." error. It took me a while to find out that I now also need a plain 'http_port NNNN'. Otherwise everything went just fine. Many thanks to our excellent QA and upgrade developers.
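If you have several squid boxes to upgrade, a quick pre-flight check for this situation could look like the sketch below. It is a hypothetical helper, not part of squid: it only encodes the rule described above (at least one plain http_port is needed alongside any 'intercept' port), treats lines starting with '#' as comments, and ignores other http_port options such as tproxy.

```python
from typing import Iterable, List


def http_port_problems(conf_lines: Iterable[str]) -> List[str]:
    """Return warnings if squid.conf defines only intercept-mode http_ports.

    Assumption from the upgrade notes above: squid 4 needs at least one
    plain (forward proxy) http_port in addition to any 'intercept' port.
    """
    plain: List[str] = []
    intercept: List[str] = []
    for raw in conf_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        fields = line.split()
        if fields[0] != "http_port":
            continue
        if "intercept" in fields[2:]:     # options come after the port
            intercept.append(line)
        else:
            plain.append(line)
    if intercept and not plain:
        return ["only intercept http_port lines found; "
                "add a plain 'http_port NNNN' as well"]
    return []
```

Feeding it `open("/etc/squid/squid.conf")` before upgrading would flag the configuration that tripped me up; the port numbers in any test config are just examples.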

some IOT ideas

nirik

2016-10-22 12:23

With the massive denial of service against a large DNS provider (in turn causing a lot of other things to have outages) on Friday using a network of insecure IOT (Internet of things) devices (mostly cameras), a lot of folks are thinking about how to address IOT problems. There are a lot of problems and no easy answers, but I thought I would throw out a few ideas here and see if any resonate with others. First, the problems: IOT devices are insecure and easily taken over for denial of service or other attacks; there is little to no economic incentive to make them more secure; consumers largely don't care as long as the device still does its main job; and devices seldom get updates, and when they do, only for a short time before the company that made them moves on to something else. Some ideas:

Make things vastly cheaper. Yeah, that's right, cheaper. Cheap enough that you can throw away or recycle the device that just went out of support or was found to be part of some botnet. Or 3D print a new device. Of course, this is not going to happen for a long while.
Make updates very reliable. I think we can do this with something like ostree/atomic + some wrapper hardware. Upgrade, reboot, if things don't come back in X minutes, reboot back to the last working version and send for help.
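The "upgrade, reboot, roll back if it doesn't come back" loop can be sketched as below. This is only a model of the idea, not how ostree/atomic implements it: the `Device` record, the `healthy` check and the `alert` callback are hypothetical, and "booting" is just changing a field. The clock and sleep functions are injectable so the timeout logic can be exercised without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    booted: str       # deployment currently running
    last_good: str    # deployment known to work, used for rollback


def upgrade_with_rollback(dev: Device, new: str,
                          healthy: Callable[[str], bool],
                          timeout_s: float = 300.0,
                          poll_s: float = 5.0,
                          alert: Callable[[str], None] = print,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Boot `new`; keep it if it turns healthy in time, else roll back."""
    dev.booted = new                      # "reboot" into the new deployment
    deadline = clock() + timeout_s
    while clock() < deadline:
        if healthy(new):
            dev.last_good = new           # commit: this becomes the fallback
            return new
        sleep(poll_s)
    dev.booted = dev.last_good            # reboot back to the last good version
    alert(f"update to {new} failed health check; "
          f"rolled back to {dev.last_good}")   # "send for help"
    return dev.last_good
```

The key design point is that the fallback only advances after the new version has proven itself, so a bad update can never become the thing you roll back to.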
Get IOT devices to use mainstream open Linux distros. These would provide source and upgrades, and it wouldn't matter so much when device makers stopped supporting things. This would probably take legislation and a big push for Linux distros to cater to IOT devices, and there would likely still be some hardware-specific code (but it could be open and maintained in distros).
Require Internet insurance for users of ISPs. ISPs that police botnets and other harmful devices would pay lower rates than ones that didn't care, and the money raised could be used to shut down harmful devices.

It's a pretty difficult problem sadly, and there's no good answer, but we are going to have to start doing something or start getting used to large DDOSes taking out a bunch of things we use often.

xfce4-terminal: lots of progress recently

nirik

2016-10-21 19:57

Most Linux desktops have their own terminal programs, and Xfce is no exception. There have been a number of releases of xfce4-terminal recently, so I thought I would share some of the changes with people who perhaps haven't tried it lately. The biggest change is that it's been ported to gtk3 and vte291. This is great news, as it means it's no longer using the ancient and completely unsupported version of vte. Unlimited scrollback and wayland support have been added, along with tons of bug fixes, translation updates, and quashed memory leaks. More specifics are in the NEWS file: https://git.xfce.org/apps/xfce4-terminal/tree/NEWS Give it a try today and see if it handles all your terminal needs.

keepalived: Simple HA

nirik

2016-10-20 17:07

We have been using keepalived in Fedora Infrastructure for a while now. It's an easy to use, simple way to do some basic HA. Keepalived keeps track of which machine is "master" for an IP address and can quickly fail that IP address over and back. You can also run scripts on state changes. Keepalived uses VRRP and handles updating ARP tables when IP addresses move around. It also supports weighting, so you can prefer one server or another to "normally" hold the master IP/scripts. Right now we are using keepalived on our main koji server pair, koji01 and koji02. Normally 01 is primary/master and has the application IP address on it, so all traffic goes to it. If for some reason it was turned off, keepalived on 02 would see that, take over the IP address, and run a script to become master. If 01 came back up, 02 would see that and transfer back to it. Right now the scripts set up a bunch of cron jobs (garbage collection) and kojira (the process that regenerates build roots) on the secondary server when it takes over. We are also using keepalived on some new paired postgresql instances we are working on; more on that in a later blog post. If you need simple HA with an IP address and script(s), keepalived does a banner job.
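The failover and failback behaviour described above boils down to "the highest-priority node that is alive holds the IP, and state changes trigger scripts". Here is a toy model of that, assuming preemption is on (so 01 takes the IP back when it returns). It is not keepalived's code or config: the `Node`/`VirtualIP` classes, the priority numbers and the `notify` callback (standing in for the state-change scripts) are hypothetical, and VRRP details such as advertisements and IP-based tie-breaking are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    priority: int        # higher wins, like a VRRP priority
    alive: bool = True


class VirtualIP:
    """Toy model of VRRP-style IP ownership with preemption."""

    def __init__(self, nodes: List[Node],
                 notify: Callable[[str, str], None]) -> None:
        self.nodes: Dict[str, Node] = {n.name: n for n in nodes}
        self.notify = notify              # called as notify(node, new_state)
        self.master: Optional[str] = None
        self.elect()

    def elect(self) -> Optional[str]:
        live = [n for n in self.nodes.values() if n.alive]
        winner = max(live, key=lambda n: n.priority).name if live else None
        if winner != self.master:
            old = self.master
            if old is not None and self.nodes[old].alive:
                self.notify(old, "BACKUP")     # still up, steps down
            if winner is not None:
                self.notify(winner, "MASTER")  # e.g. start cron jobs, kojira
            self.master = winner
        return winner
```

In the koji case, the MASTER notification is where the secondary would start its cron jobs and kojira, and the BACKUP notification is where it would stop them again on failback.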

pass - The standard unix password manager

nirik

2016-10-19 15:56

In my line of work (sysadmin), I have to deal with a LOT of passwords. For a number of years I was a fan of keepassx, but the upgrade to version 2.0 didn't thrill me (it dropped some features I liked and in general seemed less nice), so I looked around and was pointed to pass. Pass (sometimes called password-store) can be found at https://www.passwordstore.org/ and in most major Linux distributions' package sets (it's just called pass in Fedora). It's a simple command line tool that uses gnupg and, if you like, git. Each of your passwords/sites is a gnupg-encrypted file, set up in a tree under ~/.password-store. You can also tell pass to use git there, so every change you make is a git commit of encrypted files. That way you should be able to find any old passwords you changed in the git history if needed. Setup is pretty simple:

pass init yourgpgid@example.com - this sets up the base directory for pass. Note that you can later add additional gpg key IDs to encrypt files to, if you have a team or the like.
pass git init - this sets up the git repo

After setup you can use 'pass insert path/to/name' to add your own password, 'pass generate /path/to/name' to have pass generate a random password for you (using pwgen), 'pass ls' to list the tree of sites, and 'pass -c /path/to/name' to copy the password to your clipboard for easy pasting into another application or website (by default it stays in the clipboard for 45 seconds or 1 paste and then vanishes). Note that you never even need to know the password here; you just get it from pass, paste it in, and are done. If for some reason you do need to see it (a broken app that won't let you paste, for example), you can just do 'pass /path/to/name' and it will output the password. You can do 'pass edit /path/to/name' to edit a password, and you can add whatever you like to the pass file: pass -c will use the first line as the password, but you can add more lines with security questions and random answers, usernames, or any other notes you like about the site. 'pass git <git-command-args>' will let you run any arbitrary git command on your pass git repo, if you want to look at history or go back to some previous commit. There is an Android app that is supposed to work with pass files, but I have not tried it. It requires you to copy your gnupg private key to your phone, which is not something I really want to do. Since I have my laptop with me almost all the time anyhow, it's not a big deal.
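If you script around pass, the "first line is the password, anything goes after that" convention is easy to consume. The sketch below is a hypothetical helper, not part of pass: pass itself only treats the first line specially, so reading later 'key: value' lines as fields (and everything else as notes) is an assumption about how you lay out your own entries. `show()` just shells out to 'pass show NAME'.

```python
import subprocess
from typing import Dict, List, Tuple


def parse_entry(text: str) -> Tuple[str, Dict[str, str]]:
    """Split a decrypted pass entry into (password, extra fields).

    Line 1 is the password. Later 'key: value' lines become fields
    (an assumed personal convention); other non-blank lines go in 'notes'.
    """
    lines = text.splitlines()
    password = lines[0] if lines else ""
    fields: Dict[str, str] = {}
    notes: List[str] = []
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
        elif line.strip():
            notes.append(line)
    if notes:
        fields["notes"] = "\n".join(notes)
    return password, fields


def show(name: str) -> str:
    """Decrypt an entry with the pass CLI (may prompt for your gpg passphrase)."""
    return subprocess.run(["pass", "show", name], check=True,
                          capture_output=True, text=True).stdout
```

For example, `parse_entry(show("path/to/name"))[1].get("username")` would pull a stored username without ever printing the password.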

debugging koji build failures

nirik

2016-10-18 17:05

From time to time builds fail in koji (the Fedora build system), and it's good to know where to look for the reason. Koji has a central hub that manages jobs and a bunch of builders that actually do the builds. When someone initiates a build, you are talking to the hub and either uploading a src.rpm (for a scratch build) or telling it to use a particular git hash/repo for an official Fedora build. For official builds, koji will first generate a job to build the src.rpm from git and the package's lookaside cache with the source. This job will run on some builder that's ready and has capacity for it. Once the src.rpm is generated (or if you are providing it for a scratch build), the hub generates build tasks for all the arches that are set in the target tag you are building to. Each one of those goes out to a builder of the right arch type that is enabled, has capacity, etc. If any of these fail, the entire build fails. Once you have gotten notice of a failure, the first thing to do is go to the link for the build in the koji web interface and then click on the task link to show the build tasks. There you will see one (or more) links in red that failed. Don't click on any of them yet, though; instead click on the "show result" link. That will tell you exactly which task's failure failed the build. When one failure happens koji cancels the other builds, but sometimes they fail at nearly the same time and you can end up looking in the wrong place for the real failure. Sometimes all the links are green and the reason for the failure is only visible in the "show result" area (for example, if you have an archful package with noarch subpackages that differ between arches, that's a failure; or if you don't have permission to build the thing you are building, that's a tagging failure). Next click on the task that you saw in the "show result" link (if you haven't already seen the error as above), and again click on the "show result" link.
You should see something like:

"BuildError: error building package (arch i686), mock exited with status 1; 
see build.log for more information"

Or it could refer you to the root.log. Go to that log and look toward the end for the error. If the error is in the root.log, it means there was a problem setting up the mock chroot. That could be that a package you BuildRequire is not installable, or that some package in the build root is completely broken. If the issue is in the build.log, that's the actual output from rpmbuild, so this is where to look for your general compiler issues, etc. If you ask someone to help you out and look at the error(s) you are getting, it's important to give them the top level link to the task instead of just a link to the root.log or build.log. If you only have those links it's impossible to back up and look at the higher levels, but with the top level task you can look at the entire thing. Aside from normal rpm package builds, koji also builds a bunch of other things like livemedia and other images. Finding issues there is very similar: always check the "show result" area for which logs to look at for the actual failure.
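The triage steps above (find the task that really failed rather than one koji merely canceled, then pick root.log or build.log from its result) can be expressed over a simplified task tree. This is a sketch on a hypothetical data model, not the koji client API: the `Task` class and state strings are made up for illustration, and the method names are just examples. The regex only matches the "see root.log/build.log for more information" wording quoted above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Simplified koji task (hypothetical model, not the koji API)."""
    task_id: int
    method: str                 # e.g. "build", "buildArch"
    state: str                  # "CLOSED", "FAILED" or "CANCELED"
    result: str = ""            # text shown by "show result"
    children: List["Task"] = field(default_factory=list)


def real_failures(task: Task) -> List[Task]:
    """Failed tasks that explain the build, ignoring ones merely canceled.

    If no child failed but the task itself did (e.g. a tagging failure
    with all links green), the task itself is the answer.
    """
    failed = [t for c in task.children for t in real_failures(c)]
    if not failed and task.state == "FAILED":
        return [task]
    return failed


LOG_RE = re.compile(r"see (root|build)\.log for more information")

HINTS = {
    "root.log": "mock chroot setup failed: uninstallable BuildRequires "
                "or a broken buildroot package",
    "build.log": "rpmbuild itself failed: look near the end for compiler errors",
}


def which_log(result: str) -> Optional[str]:
    m = LOG_RE.search(result)
    return m.group(1) + ".log" if m else None


def triage(top: Task) -> List[str]:
    """One summary line per real failure: task, log to read, likely cause."""
    lines = []
    for t in real_failures(top):
        log = which_log(t.result)
        hint = HINTS.get(log or "", "read the 'show result' text itself")
        lines.append(f"task {t.task_id} ({t.method}): "
                     f"{log or 'no log named'}: {hint}")
    return lines
```

Starting from the top-level task, not from a single log, is what lets this (and a human helper) tell the real failure apart from the builds koji canceled afterwards.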