
Another fedorahosted.org trac plugin

I've just updated in EPEL and installed on fedorahosted.org another trac plugin: PeerReviewPlugin. This is a nifty little plugin that lets you do code reviews. You can set users to have CODEREVIEWDEV (a developer who can submit reviews) and CODEREVIEWMGR (can approve reviews). You can enable the plugin in the admin section of your trac instance, and should then see a "Peer Review" button. Enjoy.

EDITED TO ADD (2012-01-06): We ran into a problem with the plugin that I missed in my testing of it. ;( I've disabled it for now... if anyone has ideas on fixing http://trac-hacks.org/ticket/7034 please chime in there. ;)

Some new fedorahosted.org plugins

Thanks to some quick packaging work by Andrea Veri, we have added some new plugins to trac on fedorahosted.org for your tracking pleasure:

  • trac-workflowadmin-plugin: This plugin lets you change the ticket workflow on your project. You can set up what states tickets must pass through and in what order. It even gives you a nice little graph of the workflow.
  • trac-navadd-plugin: This isn't too visible yet, but I am going to hopefully use it to add handy 'register' and 'forgot password' links to all projects. This should help with end user ticket reporting.
  • trac-watchlist-plugin: This allows any logged in user to watch tickets or wiki pages without having to CC themselves on the ticket or otherwise disturb it.
  • trac-tocmacro-plugin: This was enabled in the old fedorahosted, but wasn't packaged up correctly and has been now. This allows you to use a TOC macro to provide table of contents on wiki pages.
As always, these plugins are available in the EPEL repos (currently in testing) for all to use, improve and share. If you need some specific plugin or macro in fedorahosted, just let us know!
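To make the "states tickets must pass through" idea a bit more concrete, here is a minimal sketch of how a workflow like that can be read and queried. It assumes the `[ticket-workflow]` layout that trac.ini uses (`action = old_states -> new_state`); the particular actions and states in the sample are made up for illustration and are not what fedorahosted.org actually runs.

```python
import configparser

# Illustrative workflow only: the actions and states below are examples,
# not the configuration used on fedorahosted.org.
SAMPLE_INI = """
[ticket-workflow]
accept = new,reopened -> assigned
accept.permissions = TICKET_MODIFY
review = assigned -> in_review
resolve = in_review -> closed
resolve.operations = set_resolution
reopen = closed -> reopened
"""


def load_transitions(ini_text):
    """Return {action: (set_of_old_states, new_state)} from a workflow section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    transitions = {}
    for key, value in parser["ticket-workflow"].items():
        if "." in key:
            # Attributes such as accept.permissions are ignored in this sketch.
            continue
        old, new = value.split("->")
        old_states = {state.strip() for state in old.split(",")}
        transitions[key] = (old_states, new.strip())
    return transitions


def allowed_actions(transitions, state):
    """List the actions available to a ticket in `state`, sorted by name."""
    return sorted(
        action
        for action, (old_states, _new_state) in transitions.items()
        if state in old_states or "*" in old_states
    )


if __name__ == "__main__":
    workflow = load_transitions(SAMPLE_INI)
    for state in ("new", "assigned", "in_review", "closed"):
        print(state, "->", allowed_actions(workflow, state))
```

With this sample, a ticket in the `new` state can only be accepted; it cannot jump straight to `closed`, which is the kind of ordering the workflow admin plugin lets you set up from the web interface.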

fedorahosted.org changes and enhancements

I just thought I would shoot out a quick post about some changes and enhancements at fedorahosted.org (an excellent place to host your open source project). We've moved fedorahosted to RHEL6 and a newer set of hardware. Everything should be faster and newer. You will probably notice the upgrade to trac most (from 0.10 to 0.12), but mailman, the SCMs and all other software should also be newer. The backend is now a cluster of two nodes using DRBD. We are going to look at decentralizing things much further in the coming year.

Speaking of trac, there's a ton of new items in the new version along with some new plugins we have enabled:

  • The MasterTickets plugin is now installed, so you can have tickets block or be blocked by other tickets, and get a nice graph of the dependency tree.
  • Multiple repository support. You can now have your project list and show multiple source repos, using the same or different SCMs.
  • A great deal of configuration is available to trac admins on notification emails and permissions.
  • Unicode and i18n support is vastly improved
  • With a bit of adjustment, the PrivateTickets plugin works fine with the new trac. This allows you to have tickets only some groups can read/see if needed.
  • Configurable workflows for tickets. You can set things to go through specific states before being fixed.
Mailing lists (lists.fedorahosted.org) have some small but nice improvements, like properly handling bounce messages from inactive google.com accounts and better utf8 handling. Our review board instance is still running along, and should be a bit faster now. We have great plans for 2012 for fedorahosted. If you're interested in helping us out, take a gander at the Fedora Infrastructure Getting Started page and jump in and join us. See the fedorahosted.org FAQ for more information about Fedora Hosted and how to host your next open source project there.

Fedora NFS server outage retrospective

As you may have seen if you are on the Fedora announce list, we had an outage the other day of our main build system NFS storage. This meant that no builds could be made, and data could not be downloaded from koji (rpms, build info, etc). I thought I would share what happened here so we can learn from it and try to prevent or mitigate this happening again.

First, a bit of background on the setup: We have a storage device that exports raw storage as iSCSI. This is then used by our main NFS server (nfs01), which uses the device with lvm2 and has an ext4 filesystem on it. It's around 12TB in size. This data is then exported to various other machines, including builders, the kojipkgs squid frontend for packages, koji hubs and release engineering boxes that push updates. We also have a backup NFS server (bnfs01) that has its own separate storage with a backup copy of the primary data.

On the morning of December 8th, the connection between the iSCSI backend and nfs01 had a hiccup. It retried the writes that were in progress, then decided it could resume ok and kept going. The filesystem had "Errors behavior: Continue" set so it kept going (although no actual fs errors were logged, so that may not matter). Shortly after this, NFS locks started failing and builds were getting I/O errors.

An lvm snapshot was made and a fsck run on that snapshot, which completed after around 2 hours. A fsck was then run on the actual volume itself, but that took around 8 hours and showed a great deal more corruption than the snapshot had. In order to get things into a good state, we then did an rsync of the snapshot off to our backup storage (which took around 8 hours), and merged that snapshot back as the master fs on the volume (which took around 30 minutes to complete). Then, a reboot and we were back up ok, but a small number of builds had been made after the issue started. We purged them from the koji database and re-ran them on the repaired filesystem. After that, builders were brought back online, queued builds were processed and things were back to normal.

So, some lessons/ideas here, in no particular order:

  • 12TB means most anything you decide to do will take a while to finish. On the plus side that gives you lots of time to think about the next step.
  • We should change the default on-error behavior to at least 'read-only' on that volume. Then errors would at least stop further corruption, and at best prevent the need for a lengthy fsck. It's not entirely clear whether the iSCSI errors would have made the fs hit an error condition, however.
  • We could do better at taking more regular backups of this data. A daily snapshot and rsync off of that snapshot to backup storage could save us time in the event of another backup sync being needed. We would also then have the snapshot to go back to if needed.
  • Down the road some of the cluster filesystems might be a good thing to investigate and transition to. If we can spread the backend storage around and have enough nodes, the failure of any one might not have as much impact.
  • Perhaps we could add monitoring for iSCSI errors, so we notice and react to them more quickly.
  • lvm and its snapshots, and the ability to merge a snapshot back in as primary, really helped us out here.
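For anyone who hasn't used snapshot merging before, the sequence described above (snapshot, fsck the snapshot, rsync it to backup storage, merge it back) can be sketched roughly as follows. This is not the exact set of commands we ran: the volume group, volume, snapshot size, mount point and backup path are all hypothetical placeholders, and the function only builds the command lines without executing anything.

```python
import shlex


def recovery_plan(vg, lv, snap, snap_size, mountpoint, backup_dest):
    """Return the recovery commands, in order, as argument lists.

    Nothing is executed here. All names passed in are placeholders chosen
    by the caller, not the real names used on nfs01.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap}"
    return [
        # 1. Take a snapshot of the damaged volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, origin],
        # 2. Check and repair the snapshot rather than the origin.
        ["e2fsck", "-f", "-y", snapshot],
        # 3. Mount the repaired snapshot read-only and copy it to backup storage.
        ["mount", "-o", "ro", snapshot, mountpoint],
        ["rsync", "-a", f"{mountpoint}/", backup_dest],
        ["umount", mountpoint],
        # 4. Merge the snapshot back into the origin volume.
        ["lvconvert", "--merge", f"{vg}/{snap}"],
    ]


if __name__ == "__main__":
    plan = recovery_plan(
        vg="vg_example",
        lv="lv_data",
        snap="data_snap",
        snap_size="500G",
        mountpoint="/mnt/data_snap",
        backup_dest="bnfs01:/srv/backup/data/",
    )
    for command in plan:
        print(shlex.join(command))
```

One thing to keep in mind with the last step: if the origin volume is in use, lvm defers the merge until the volume is next activated, which fits with needing a reboot before things were back up.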
Feel free to chime in with other thoughts or ideas. Hopefully it will be quite some time before we have another outage like this one.
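On the on-error behavior point in the list above, a small check like the following could flag filesystems that are still set to continue on errors. It is only a sketch: it assumes you feed it the text printed by `tune2fs -l` for the device (which includes an "Errors behavior:" line), and the device path in the usage example is a placeholder.

```python
import re


def errors_behavior(tune2fs_output):
    """Extract the 'Errors behavior' value from `tune2fs -l` output, or None."""
    match = re.search(
        r"^Errors behavior:[ \t]*(\S.*?)[ \t]*$", tune2fs_output, re.MULTILINE
    )
    return match.group(1) if match else None


def suggested_fix(device, tune2fs_output):
    """Return a tune2fs command that switches to remount-ro, or None.

    A command is suggested only when the current behavior is 'Continue'.
    """
    behavior = errors_behavior(tune2fs_output)
    if behavior is not None and behavior.lower() == "continue":
        return ["tune2fs", "-e", "remount-ro", device]
    return None


if __name__ == "__main__":
    import subprocess
    import sys

    # Placeholder default; pass the real device as the first argument.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/vg_example/lv_data"
    output = subprocess.run(
        ["tune2fs", "-l", device], capture_output=True, text=True, check=True
    ).stdout
    print(suggested_fix(device, output))
```

The function only suggests the command; actually changing the setting is left as a deliberate manual step.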

Lessons from password/key changes

Recently, Fedora infrastructure requested everyone change their password and upload a new SSH public key. We gave folks almost 2 months to make the change. I'd just like to note here the reactions we had, divided into 4 'groups' of users. Of course some people were between two of these groups or some probably had slightly different reactions, but this is what my reading of the feedback leads me to:

  1. Group one sees the announcement, skims it, goes and changes their password, generates and uploads a new ssh key, goes back to what they were doing.
  2. Group two sees the announcement, reads it, reads the links off it. Adjusts their firewall, checks their backups, re-enables selinux or applies updates, then changes their password and uploads a new key
  3. Group three sees the announcement. Complains that their private ssh key is safe and always has been, and they know all about passwords and ssh and encryption and this change is unneeded.
  4. Group four doesn't even see the announcement. They are no longer involved, too busy to read it, or just don't care
Group one spends about 5 minutes. The advantage to Fedora Infrastructure isn't great here, but they do have a new password that meets the guidelines and a new ssh key in case they were careless with the old one. Of course they didn't learn much, so they could be careless with the new one as well.

Group two are the very people we are trying to reach most, and this is where the plan has the most advantage. These people will learn how to improve their security some, how better to handle their ssh private keys and hopefully prevent compromise on their personal machines. They may spend a good deal of time following the links and learning about best practices, but it's all time well spent.

Group three are the vocal minority. While they already know best practices and keep their ssh private keys safe, they don't realize that we don't have any way of telling them apart from group four (below). Time spent here is large for both them and Infrastructure folks, but the advantage is low because it amounts to "I know I am fine and shouldn't have to do this" vs "We have no way of knowing that". There is likely a less vocal subset of these folks that shows up in group one (just do it, grumble and move on).

Group four is another group where there is good advantage for infrastructure. These are folks that are gone, don't pay attention to their Fedora work, or are too busy to spend a few minutes on it. Packages with these folks as maintainers would be better off being orphaned and reassigned to people who use and care for them. Sysadmin groups with these folks are better off not counting on someone they think is involved, but who really doesn't have time to be.

So, in the end I think there is still good advantage to us having done this. I hope group two is large and learns from our documents and best practices. I hope the people in group three realize that this isn't just about them, it's about the community, and I hope pruning the people from group four helps improve the health and activity of our community.

Finally, as a side note, the deadline was last night, but we are still assessing exactly how we want to mark inactive folks who haven't yet changed their password and uploaded a new key. You still have time to go do it now. ;)

Reminder: Fedora password and ssh key change deadline looms (2 days left!)

Just a friendly reminder: If you are a Fedora account system account holder, and haven't changed your password and uploaded a new ssh key since we announced the mandatory change, you best do so NOW. The deadline is 2011-11-30 (only 2 days away). If you don't, you may no longer have access to groups you currently do (like packager, or sysadmin or ambassador). Go take a few minutes, read the announcement and security information linked to it, and change your password and upload a new ssh public key. If you aren't a Fedora contributor, the information linked in our announcement is still a great read and may just help you be more secure on your machines. :)

Bugside manners

Looking at a bug report today, I saw some pretty poor "bugside manners" on the part of a few people (including the upstream developer). There's tons of examples out there of good bug report interactions and poor ones. I'd like to urge everyone to take a minute and think before posting that next bug comment. Good bugside:

  • Ask for facts. Program output, versions, exact behaviors
  • Provide info if asked for it by others
  • Try and see the issue from the reporter's view
  • If you find yourself unable to say something nice or move the bug along, perhaps you should refrain from saying anything at all
Bad bugside:
  • Don't pile on unrelated bugs. Open a new bug for a new issue.
  • Avoid talking about "big picture"/design/philosophy. Those should go to mailing lists or the like; bug trackers are for tracking and fixing bugs
  • If you don't have things to add about the specific bug at hand, don't post
I'm sure there are other good/bad rules. Anyone run across any other common good or bad bugside manners they want to share?

Dear IBM

Your blade centers are pretty nifty... however: a) Isn't there any way the remote console could just be vnc directly instead of requiring a java applet? The java applet has various issues, and there are some very nice direct vnc clients available. It would be less hassle for you even. ;) and b) Could you pretty please have servers pop up a list of boot options/items right away, even if it takes 10 minutes to finish booting to any of them? Having to spend 10 minutes staring at a java applet waiting for the 10 seconds I can press "1" to get to the SMS menu is sure a drag on my time management. Especially when it doesn't take the keypress and I have to just do it again. Thanks!

Passwords and Keys and changing them

This morning, we announced that we are requiring Fedora contributors to change their passwords and upload new ssh public keys by the end of next month. There's no breach or cause for alarm. We just thought that, with all the high profile hacking happening out there, it would be a good time for everyone to go look at their security practices and create new keys and passwords (see the announcement for the full list). Please do go and change your password and create a new ssh key at your convenience (but before 2011-11-30). I'm sorry that the ssh key requirement has caused stress for some contributors, but realize we are not singling anyone out here. There are good reasons to ask for this change now, when it's not urgent or triggered by outside events. Just a few of them:

  • Allow you to revisit your security process and policies and read about best practices
  • Allow you to see how to make the changes, and on which machines and in which places you need to make them, in case you ever have to make a more hurried change
  • Allow you to set up a separate ssh key for Fedora matters. This separates out some risk, at the cost of another passphrase and possibly hitting ssh server limits (most allow you to try only 6 keys).
While many of the more savvy Fedora contributors already know these things, and have good practices, we hope that everyone will learn something from this or at least not let it inconvenience them for too long.
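If you do go the separate-key route, one way to avoid running into the server's limit on authentication attempts is to tell ssh which key to offer for which hosts. The sketch below just generates a stanza for `~/.ssh/config` using the standard `IdentityFile` and `IdentitiesOnly` options; the host pattern, key file name and user name in the example are placeholders, so adjust them to your own setup.

```python
def ssh_config_stanza(host_pattern, identity_file, user=None):
    """Build an ssh_config stanza that offers only one key to matching hosts."""
    lines = [f"Host {host_pattern}"]
    if user:
        lines.append(f"    User {user}")
    lines.append(f"    IdentityFile {identity_file}")
    # Offer only the key named above, so other keys loaded in the agent
    # do not use up the server's allowed authentication attempts.
    lines.append("    IdentitiesOnly yes")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(ssh_config_stanza("*.fedoraproject.org", "~/.ssh/id_rsa_fedora", user="example"))
```

Append the printed stanza to your ssh config and ssh will present just that key when connecting to hosts matching the pattern.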

laptops and mail flow

Recently, the filesystem on my trusty Dell D820 laptop started having some problems. It's using btrfs and was installed back when btrfs first showed up, so I am not too surprised that something happened with it. Side note: Josef (btrfs maintainer extraordinaire) has been absolutely great in helping track down and fix it, see https://bugzilla.redhat.com/show_bug.cgi?id=726814 for the nitty-gritty.

So, I decided at least for now to move to another laptop here (Thinkpad T510). First order of business was to add memory and replace the 1600x900 screen with a nice 1920x1080 one. This went very smoothly. The lcd replacement was easy and made a gigantic difference. My D820 has a 1920x1200 screen in it, and I really didn't want to move to a much smaller resolution. The higher resolution screen on the T510 also looks MUCH nicer: brighter colors, crisper and all around more pleasant.

Since the filesystem corruption was preventing me from just mass copying everything off the old machine, I just synced over a small list of things I needed: ssh keys, keepassx db, xchat logs/settings, midori browser history. This gave me a great chance to (re)set up my mail and its filtering. I'm a long time claws-mail user, and adding filters is pretty easy. So easy that I had added tons of them over the years. Seeing my full flood of email to my main mbox gave me a chance to look at things and ask: do I read this? Should I unsubscribe from this list? Do I need to save logwatch emails from my home machines? Do I care when something is auto-discarded from mailman? After just a few days I pretty much had everything filtering away, and in much better shape than I had things before. So, this might actually end up being something I try and do once a year or so: Drop all my filters and re-check my incoming emails for what really matters.

F16 prerelease is running great on the Thinkpad. Everything works out of the box. In fact, aside from an annoying glibc resolver bug (which I am running a patched glibc for), this release is looking quite stable and boring so far. ;)