Kevin's musings

Fudcon day 1

by on Jan.13, 2012, under fedora, linux

Fudcon day 1 started with having to get up at 7:30am to get ready for the 9am starting time. That’s 5:30am my time, so that’s my excuse for being groggy this morning. ;)
Had no problem getting to the venue and getting my badge and t-shirt. Then, after some logistics we started in on the first session of the day:

Fixing Staging in Fedora Infrastructure.

Some background: Currently we have some ‘staging’ machines that are supposed to be copies of production instances that we can use to test and integrate new things with. We have a separate git branch in puppet that handles the staging instances, which seems neat, but turns out to be an annoyance in several ways.

There was a lot of information and debate on what production, dev, staging, integration, or the like meant, how we could set up puppet, whether we could make a staging instance (or a subset of them) on demand, how the process should work, and how we could go on from here.

We came up with a plan of attack and some things to consider:

  • Drop the ‘staging’ git branch. Everything is in the same git repo, i.e., all machines are production.
  • Try and make our apps more able to be ‘containers’, i.e., reduce dependence on other parts of Infrastructure so things can be tested in containers more easily.
  • Look at ways to build containers or integration staging machines on the fly as needed.

After a quick lunch (man the wind was nasty to/from lunch), it was time for a 2 factor auth session.

We’ve been talking about finishing off yubikey as a true two factor authentication method in fedora infrastructure. We had a lot of good input here and hashed out a plan here too:

Short term:

  • Fork linotp’s pam module to a new project. This would be just the pam module, and we would enhance it to require a valid ssl cert from the server it’s talking to before sending it anything, to prompt for pin and pass, and to add other enhancements.
  • First target is going to be sudo for all sysadmin-main users.
  • Create a CGI that the pam module can talk to and send auth info to and return ok, bad, broken
  • CGI will likely run on fas servers so it can talk to fas and yubikey
  • Some quick and dirty way to query pin
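
As a rough illustration of the CGI piece of that plan, here is a minimal Python sketch of the decision logic that could sit behind it. Everything in it is hypothetical and not Fedora code: the function name `check_auth`, the plain `pins` and `key_ids` dictionaries standing in for FAS lookups, and the `verify_otp` callback standing in for whatever ends up validating the yubikey OTP. It assumes the default Yubico OTP layout (44 modhex characters, the first 12 identifying the key).

```python
import hmac

# Assumptions: default Yubico OTP layout; all names below are illustrative.
MODHEX = set("cbdefghijklnrtuv")  # alphabet used by yubikey OTPs
OTP_LEN = 44                      # default OTP length
PUBLIC_ID_LEN = 12                # leading characters identify the key


def check_auth(user, pin, otp, pins, key_ids, verify_otp):
    """Return 'ok', 'bad', or 'broken' for one authentication attempt.

    pins and key_ids are dicts standing in for FAS lookups;
    verify_otp is a callable standing in for the real OTP validation.
    """
    # Reject anything that does not even look like an OTP.
    if len(otp) != OTP_LEN or not set(otp) <= MODHEX:
        return "bad"
    try:
        expected_pin = pins.get(user)
        expected_id = key_ids.get(user)
        if expected_pin is None or expected_id is None:
            return "bad"
        # Constant-time comparison of the pin.
        if not hmac.compare_digest(pin.encode(), expected_pin.encode()):
            return "bad"
        # The OTP must come from the key registered to this user.
        if otp[:PUBLIC_ID_LEN] != expected_id:
            return "bad"
        return "ok" if verify_otp(otp) else "bad"
    except Exception:
        # A backend failure is reported as 'broken', not as a bad login.
        return "broken"
```

Keeping ‘broken’ separate from ‘bad’ lets the pam module tell a wrong pin or key apart from a backend that is down, which matters once sudo depends on it.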

Longer term:

  • FAS changes to store and set/reset pin
  • Add google auth or OATH to the CGI
  • Increase the parts that are covered/where 2 factor is required

All in all some great sessions today. I think we have some lovely plans in fedora infrastructure, ready to dig in and get working in the coming days and weeks.

Fudcon Day 0

by on Jan.13, 2012, under fedora, linux

Thursday (Fudcon Blacksburg 2012 day 0) was a travel day for me. Had to get up at the unreasonable hour of 4:45 to catch the shuttle to the airport, then on to my first flight to Chicago. That went off fine, but unfortunately my plane from Chicago to Roanoke was delayed quite a while. ;(
On the plus side I got to hang out with 3 other fudcon bound folks and we did some chatting there in the airport. Finally we got a plane and got to Blacksburg, where the magic Spot van picked up about 15 of us and took us to the hotel.
The hallway track was well in progress there, and I ended up staying up later than I had thought I could talking to people I usually only see on IRC. Finally got to bed around midnight after catching up on some emails.

Looking forward to a great fudcon!

Fedora Infrastructure at fudcon 2012

by on Jan.06, 2012, under fedora, linux

Next weekend is Fudcon Blacksburg 2012 and a number of Fedora Infrastructure folks should be there (including me).

I’ve signed up to give/facilitate 3 workshops/hackfest sessions:

  • Fedora Infrastructure: Revamping our staging server setup, then implementing it. This will hopefully be Friday and Sunday. We can try and hash out a plan on Friday and look at starting implementation, or at least a timeline for implementing things, on Sunday.
  • Fedora Infrastructure: 2 factor authentication brainstorming. This will hopefully be Saturday. I’d like to try and get full 2 factor auth set up for yubikeys and/or a clear plan to do so. Ideally we will set up a framework where we can also optionally use other 2 factor setups (google authenticator, etc). I will also have some yubikeys with me to distribute.
  • Fedora Infrastructure: Getting started/apprentices/Getting involved. This will be an intro session for folks that would like to get involved with infrastructure. We can answer questions, try and find out what would be a good fit for people starting out and work on our apprentice program to try and make it easier for people to join us.

In addition to those, Toshio is going to be running some sessions on Python programming and software releases using our infrastructure as the example. Should be great stuff.

Of course as always in addition to the talks and hackfests, there’s a lively “hallway” track: A chance to meet up with folks and discuss things in a high bandwidth way. I hope to see lots of folks I know, and lots of new faces as well. Happy to discuss any fedora related issues or items there.

If you’re unable to attend this Fudcon, do follow along in #fedora-fudcon on irc.freenode.net. There should be pointers there to what’s happening and what’s being discussed, and a way to give remote input. Hopefully we will have irc transcribers in various rooms as well.

Another fedorahosted.org trac plugin

by on Jan.04, 2012, under fedora, linux

I’ve just updated in epel and installed on fedorahosted.org another trac plugin: PeerReviewPlugin

This is a nifty little plugin that lets you do code reviews. You can set users to have CODEREVIEWDEV (developer who can submit reviews) and CODEREVIEWMGR (can approve reviews). You can enable the plugin in the admin section of your trac instance, and should then see a “Peer Review” button.

Enjoy.

EDITED TO ADD(2012-01-06): We ran into a problem with the plugin that I missed in my testing of it. ;( I’ve disabled it for now… if anyone has ideas on fixing http://trac-hacks.org/ticket/7034 please chime in there. ;)

Some new fedorahosted.org plugins

by on Dec.28, 2011, under fedora, linux

Thanks to some quick packaging work by Andrea Veri, we have added some new plugins to trac on fedorahosted.org for your tracking pleasure:

  • trac-workflowadmin-plugin: This plugin lets you change the ticket workflow on your project. You can setup what states tickets must pass through and in what order. It even gives you a nice little graph of workflow.
  • trac-navadd-plugin: This isn’t too visible yet, but I am going to hopefully use it to add handy ‘register’ and ‘forgot password’ links to all projects. This should help with end user ticket reporting.
  • trac-watchlist-plugin: This allows any logged in user to watch tickets or wiki pages without having to CC themselves on the ticket or otherwise disturb it.
  • trac-tocmacro-plugin: This was enabled in the old fedorahosted, but wasn’t packaged up correctly and has been now. This allows you to use a TOC macro to provide table of contents on wiki pages.

As always, these plugins are available in the EPEL repos (currently in testing) for all to use, improve and share. If you need some specific plugin or macro in fedorahosted, just let us know!

fedorahosted.org changes and enhancements

by on Dec.18, 2011, under fedora, linux

I just thought I would shoot out a quick post about some changes and enhancements at fedorahosted.org (an excellent place to host your open source project).

We’ve moved fedorahosted to RHEL6 and a newer set of hardware. Everything should be faster and newer. You will probably notice the upgrade to trac most (from 0.10 to 0.12), but mailman, the SCMs and all other software should also be newer. The backend is now a cluster of two nodes using drbd. We are going to look at decentralizing things much further in the coming year.

Speaking of trac, there’s a ton of new items in the new version along with some new plugins we have enabled:

  • The MasterTickets plugin is now installed, so you can have tickets block or be blocked by other tickets, and get a nice graph of the dependency tree.
  • Multiple repository support. You can now have your project list and show multiple source repos, using the same or different SCMs.
  • A great deal of configuration is available to trac admins on notification emails and permissions.
  • Unicode and i18n support is vastly improved
  • With a bit of adjustment, the PrivateTickets plugin works fine with the new trac. This allows you to have tickets only some groups can read/see if needed.
  • Configurable work flows for tickets. You can set things to go through specific states before being fixed.

Mailing lists (lists.fedorahosted.org) have some small but nice improvements, like properly handling bounce messages from inactive google.com accounts and better utf8 handling.

Our review board instance is still running along, and should be a bit faster now.

We have great plans for 2012 for fedorahosted. If you’re interested in helping us out, take a gander at the Fedora Infrastructure Getting Started page and jump in and join us.

See the fedorahosted.org FAQ for more information about Fedora Hosted and how to host your next open source project there.

Fedora NFS server outage retrospective

by on Dec.10, 2011, under fedora, linux

As you may have seen if you are on the fedora announce list, we had an outage the other day of our main build system NFS storage. This meant that no builds could be made and also data could not be downloaded from koji (rpms, build info, etc). I thought I would share here what happened so we can learn from it and try to prevent or mitigate this happening again.

First, a bit of background on the setup: We have a storage device that exports raw storage as iSCSI. This is then consumed/used by our main nfs server (nfs01). It’s using the device with lvm2, and has an ext4 filesystem on it. It’s around 12TB in size. This data is then exported to various other machines to use, including builders, the kojipkgs squid frontend for packages, koji hubs and release engineering boxes that push updates. We also have a backup nfs server (bnfs01) that has its own separate storage with a backup copy of the primary data.

On the morning of December 8th, the connection between the iSCSI backend and nfs01 had a hiccup. It retried the writes that were in progress, and then decided it could resume ok and kept going. The filesystem had “Errors behavior: Continue” set so it kept going (although no actual fs errors were logged, so that may not matter). Shortly after this, NFS locks started failing and builds were getting I/O errors.

An lvm snapshot was made and a fsck run on that snapshot, which completed after around 2 hours. A fsck was then run on the actual volume itself, but that took around 8 hours and showed a great deal more corruption than the snapshot had. In order to get things into a good state, we then did a rsync of the snapshot off to our backup storage (which took around 8 hours), and merged that snapshot back as the master fs on the volume (which took around 30min to complete).

Then, a reboot and we were back up ok, but there were a small number of builds that had been made after the issue started. We purged them from the koji database and re-ran them with the current/valid/repaired filesystem. After that builders were brought back on-line, the queued up builds were processed and things were back to normal.
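
For anyone who wants to picture the snapshot, fsck, rsync and merge sequence, here is a small Python sketch that only builds the command lines and does not run anything. It is a reconstruction under assumptions, not the exact commands we typed: the volume group and volume names, the snapshot name and size, the mount point and the backup path are all made-up placeholders, and the e2fsck flags are one plausible choice.

```python
def recovery_plan(vg, lv, snap="fsck-snap", snap_size="100G",
                  mount_point="/mnt/snap", backup="/backup/"):
    """Build (but do not run) the commands for a snapshot-based recovery.

    All names and sizes are placeholders; adjust them for a real system.
    """
    origin = f"{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # 1. Take an lvm snapshot of the damaged volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, origin],
        # 2. Force a full check and repair of the snapshot, not the origin.
        ["e2fsck", "-f", "-y", snap_dev],
        # 3. Copy the repaired data off to backup storage.
        ["mount", "-o", "ro", snap_dev, mount_point],
        ["rsync", "-a", f"{mount_point}/", backup],
        ["umount", mount_point],
        # 4. Merge the repaired snapshot back into the origin volume.
        ["lvconvert", "--merge", f"{vg}/{snap}"],
    ]
```

Printing each entry with `shlex.join` gives a checklist to read through before running anything by hand, which is worth doing when every step takes hours on 12TB.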

So, some lessons/ideas here, in no particular order:

  • 12TB means most anything you decide to do will take a while to finish. On the plus side that gives you lots of time to think about the next step.
  • We should change the default on-error behavior to at least ‘read-only’ on that volume. Then errors would at least stop further corruption, and at best prevent the need for a lengthy fsck. It’s not entirely clear, however, whether the iSCSI errors would have made the fs hit an error condition or not.
  • We could do better about more regular backups of this data. A daily snapshot and an rsync of that snapshot off to backup storage could save us time in the event of another backup sync being needed. We would also then have the snapshot to go back to if needed.
  • Down the road some of the cluster fses might be a good thing to investigate and transition to. If we can spread the backend storage around and have enough nodes, the failure of any one might not be as much impact.
  • Perhaps we could add monitoring for iscsi errors and note and react to them quicker
  • lvm and its snapshots and ability to merge a snapshot back in as primary really helped us out here.
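
Two of the ideas above are easy to sketch in Python. The first helper builds the `tune2fs -e remount-ro` command that switches an ext filesystem’s on-error behavior from continue to read-only (the device path is whatever you pass in). The second is a very naive log filter for iSCSI trouble: the regular expressions are assumptions about what such log lines look like, not a tested rule set, so treat it as a starting point for a real monitoring check.

```python
import re

# Assumed patterns: a line has to mention iSCSI (or an iSCSI connection id)
# and also contain a word that suggests trouble.
ISCSI_HINT = re.compile(r"iscsi|connection\d+:\d+", re.IGNORECASE)
PROBLEM_HINT = re.compile(r"error|timed out|fail", re.IGNORECASE)


def errors_behavior_command(device):
    """Build the tune2fs command that sets on-error behavior to read-only."""
    return ["tune2fs", "-e", "remount-ro", device]


def iscsi_problems(lines):
    """Return the log lines that look like iSCSI errors."""
    return [
        line.rstrip("\n")
        for line in lines
        if ISCSI_HINT.search(line) and PROBLEM_HINT.search(line)
    ]
```

Feeding `iscsi_problems` the tail of the system log from a cron job or a nagios-style check would be enough to get a page when the connection hiccups, rather than finding out from failing builds.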

Feel free to chime in with other thoughts or ideas. Hopefully it will be quite some time before we have another outage like this one.

Lessons from password/key changes

by on Dec.01, 2011, under fedora, linux

Recently, Fedora infrastructure requested everyone change their password and upload a new SSH public key. We gave folks almost 2 months to make the change. I’d just like to note here the reactions we had, divided into 4 ‘groups’ of users. Of course some people were between two of these groups or some probably had slightly different reactions, but this is what my reading of the feedback leads me to:

  1. Group one sees the announcement, skims it, goes and changes their password, generates and uploads a new ssh key, goes back to what they were doing.
  2. Group two sees the announcement, reads it, reads the links off it. Adjusts their firewall, checks their backups, re-enables selinux or applies updates, then changes their password and uploads a new key
  3. Group three sees the announcement. Complains that their private ssh key is safe and always has been, and they know all about passwords and ssh and encryption and this change is unneeded.
  4. Group four doesn’t even see the announcement. They are no longer involved, too busy to read it, or just don’t care

Group one spends about 5 minutes. The advantage to Fedora Infrastructure isn’t great here, but they do have a new password that meets the guidelines and a new ssh key in case they were careless with the old one. Of course they didn’t learn much, so they could be careless with the new one as well.

Group two are the very people we are trying to reach most, and here’s the most advantage to this plan. These people will learn how to improve their security some, how better to handle their ssh private keys and hopefully prevent compromise on their personal machines. They may spend a good deal of time following the links and learning about best practices, but it’s all time well spent.

Group three are the vocal minority. While they already know best practices and keep their ssh private keys safe, they don’t realize we have no way of telling them apart from group four (below). Time spent here is large for both them and Infrastructure folks, but advantage is low because it amounts to “I know I am fine and shouldn’t have to do this” vs “We have no way of knowing that”. There is likely a less vocal subset of these folks that shows up in group one (just do it and grumble and move on).

Group four is another group where there is good advantage for infrastructure. These are folks that are gone, don’t pay attention to their Fedora work, or are too busy to spend a few minutes on it. Packages with these folks as maintainers would be better off being orphaned and reassigned to people who use and care for them. Sysadmin groups with these folks in them are better off not counting on someone who they think is involved, but who really doesn’t have time to be.

So, in the end I think there is still good advantage to us having done this. I hope group two is large and its members learn from our documents and best practices. I hope the people in group three realize that this isn’t just about them, it’s about the community, and I hope pruning the people from group four helps improve the health and activity of our community.

Finally, as a side note, the deadline was last night, but we are still assessing exactly how we want to mark inactive folks who haven’t yet changed their password and uploaded a new key. You still have time to go do it now. ;)

Reminder: Fedora password and ssh key change deadline looms (2 days left!)

by on Nov.28, 2011, under fedora, linux

Just a friendly reminder: If you are a Fedora account system account holder, and haven’t changed your password and uploaded a new ssh key since we announced the mandatory change, you best do so NOW. The deadline is 2011-11-30 (only 2 days away).

If you don’t, you may no longer have access to groups you currently do (like packager, or sysadmin or ambassador).

Go take a few minutes, read the announcement and security information linked to it, and change your password and upload a new ssh public key.

If you aren’t a Fedora contributor, the information linked in our announcement is still a great read and may just help you be more secure on your machines. :)

Bugside manners

by on Nov.02, 2011, under fedora, linux

Looking at a bug report today, I saw some pretty poor “bugside manners” on the part of a few people (including the upstream developer). There are tons of examples out there of good bug report interactions and poor ones. I’d like to urge everyone to take a minute and think before posting that next bug comment.

Good bugside:

  • Ask for facts. Program output, versions, exact behaviors
  • Provide info if asked for it by others
  • Try and see the issue from the reporters view
  • If you find yourself unable to say something nice or move the bug along, perhaps you should refrain from saying anything at all

Avoiding bad bugside:

  • Don’t pile on unrelated bugs. Open a new bug for a new issue.
  • Avoid talking about “big picture”/design/philosophy. Those should go to mailing lists or the like, bug trackers are to track and fix bugs
  • If you don’t have things to add about the specific bug at hand, don’t post

I’m sure there are other good/bad rules. Anyone run across any other common good or bad bugside manners they want to share?
