Kevin's musings

Archive for December, 2011

Some new fedorahosted.org plugins

Posted on Dec. 28, 2011, under fedora, linux

Thanks to some quick packaging work by Andrea Veri, we have added some new plugins to trac on fedorahosted.org for your tracking pleasure:

  • trac-workflowadmin-plugin: This plugin lets you change the ticket workflow on your project. You can set up which states tickets must pass through and in what order. It even gives you a nice little graph of the workflow.
  • trac-navadd-plugin: This isn’t too visible yet, but I hope to use it to add handy ‘register’ and ‘forgot password’ links to all projects. This should help with end user ticket reporting.
  • trac-watchlist-plugin: This allows any logged in user to watch tickets or wiki pages without having to add themselves to the ticket’s CC list or otherwise disturb it.
  • trac-tocmacro-plugin: This was enabled in the old fedorahosted, but wasn’t packaged up correctly; it has been now. It allows you to use a TOC macro to provide a table of contents on wiki pages.

As always, these plugins are available in the EPEL repos (currently in testing) for all to use, improve and share. If you need some specific plugin or macro in fedorahosted, just let us know!


fedorahosted.org changes and enhancements

Posted on Dec. 18, 2011, under fedora, linux

I just thought I would shoot out a quick post about some changes and enhancements at fedorahosted.org (an excellent place to host your open source project).

We’ve moved fedorahosted to RHEL6 and a newer set of hardware. Everything should be faster and newer. You will probably notice the upgrade to trac (from 0.10 to 0.12) the most, but mailman, the SCMs and all other software should also be newer. The backend is now a cluster of two nodes using DRBD. We are going to look at decentralizing things much further in the coming year.

Speaking of trac, there’s a ton of new items in the new version along with some new plugins we have enabled:

  • The MasterTickets plugin is now installed, so you can have tickets block or be blocked by other tickets, and get a nice graph of the dependency tree.
  • Multiple repository support. You can now have your project list and show multiple source repos, using the same or different SCMs.
  • A great deal of configuration is available to trac admins on notification emails and permissions.
  • Unicode and i18n support is vastly improved.
  • With a bit of adjustment, the PrivateTickets plugin works fine with the new trac. This allows you to have tickets that only some groups can read/see if needed.
  • Configurable workflows for tickets. You can require tickets to go through specific states before being fixed.
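
To make the idea of a configurable workflow concrete, here is a minimal sketch in Python of a ticket workflow as a small state machine. It is not trac's implementation or its configuration format; the state and action names (`in_review`, `review`, and so on) are made up for illustration.

```python
# A hypothetical ticket workflow: each action maps a set of allowed
# starting states to the state the ticket ends up in.
# All state and action names are illustrative assumptions.
WORKFLOW = {
    "accept": ({"new"}, "accepted"),
    "review": ({"accepted"}, "in_review"),
    "resolve": ({"in_review"}, "closed"),
    "reopen": ({"closed"}, "new"),
}


def apply_action(state, action):
    """Return the new state, or raise ValueError if the action is not allowed."""
    if action not in WORKFLOW:
        raise ValueError(f"unknown action: {action}")
    allowed_from, new_state = WORKFLOW[action]
    if state not in allowed_from:
        raise ValueError(f"cannot {action} a ticket in state {state!r}")
    return new_state


def run(actions, state="new"):
    """Apply a sequence of actions to a ticket, starting from `state`."""
    for action in actions:
        state = apply_action(state, action)
    return state
```

With this table, a ticket can only be closed after it has been accepted and reviewed: `run(["accept", "review", "resolve"])` ends in `closed`, while `run(["resolve"])` raises `ValueError` because a new ticket cannot be resolved directly.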

Mailing lists (lists.fedorahosted.org) have some small but nice improvements, like properly handling bounce messages from inactive google.com accounts and better UTF-8 handling.

Our review board instance is still running along, and should be a bit faster now.

We have great plans for 2012 for fedorahosted. If you’re interested in helping us out, take a gander at the Fedora Infrastructure Getting Started page and jump in and join us.

See the fedorahosted.org FAQ for more information about Fedora Hosted and how to host your next open source project there.


Fedora NFS server outage retrospective

Posted on Dec. 10, 2011, under fedora, linux

As you may have seen if you are on the fedora announce list, we had an outage the other day of our main build system NFS storage. This meant that no builds could be made and data could not be downloaded from koji (rpms, build info, etc). I thought I would share here what happened so we can learn from it and try to prevent or mitigate this happening again.

First, a bit of background on the setup: We have a storage device that exports raw storage as iSCSI. This is then consumed/used by our main NFS server (nfs01). It’s using the device with lvm2, and has an ext4 filesystem on it. It’s around 12TB in size. This data is then exported to various other machines to use, including builders, the kojipkgs squid frontend for packages, koji hubs and release engineering boxes that push updates. We also have a backup NFS server (bnfs01) that has its own separate storage with a backup copy of the primary data.

On the morning of December 8th, the connection between the iSCSI backend and nfs01 had a hiccup. The writes in progress were retried, and the system then decided it could resume and kept going. The filesystem had “Errors behavior: Continue” set, so it kept going (although no actual filesystem errors were logged, so that may not have mattered). Shortly after this, NFS locks started failing and builds started getting I/O errors. An LVM snapshot was made and an fsck run on that snapshot, which completed after around 2 hours. An fsck was then run on the actual volume itself, but that took around 8 hours and showed a great deal more corruption than the snapshot had. To get things into a good state, we then rsynced the snapshot off to our backup storage (which took around 8 hours) and merged that snapshot back as the master filesystem on the volume (which took around 30 minutes). After a reboot we were back up, but a small number of builds had been made after the issue started. We purged them from the koji database and re-ran them on the repaired filesystem. After that, the builders were brought back online, the queued builds were processed, and things were back to normal.
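
As a rough back-of-the-envelope check, the approximate durations above can simply be added up. The sketch below assumes the four long steps ran strictly one after another, which the account above does not state, and it leaves out the reboot, the purge of bad builds and any time spent deciding what to do, so treat the total as a lower bound rather than the actual length of the outage.

```python
# Approximate durations (in hours) taken from the description above.
# Assumption: the steps ran back to back with no overlap or gaps.
STEPS = [
    ("fsck of the LVM snapshot", 2.0),
    ("fsck of the actual volume", 8.0),
    ("rsync of the snapshot to backup storage", 8.0),
    ("merge of the snapshot back into the volume", 0.5),
]


def total_hours(steps):
    """Sum the duration of every step."""
    return sum(hours for _, hours in steps)


def timeline(steps):
    """Return (step name, hours elapsed once that step finishes) pairs."""
    elapsed = 0.0
    result = []
    for name, hours in steps:
        elapsed += hours
        result.append((name, elapsed))
    return result


def report(steps):
    """Print the cumulative timeline followed by the total."""
    for name, elapsed in timeline(steps):
        print(f"{elapsed:5.1f}h  {name}")
    print(f"total: {total_hours(steps)} hours")
```

Under those assumptions the long-running steps alone come to about 18.5 hours, with the two 8 hour steps accounting for most of it.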

So, some lessons/ideas here, in no particular order:

  • 12TB means most anything you decide to do will take a while to finish. On the plus side that gives you lots of time to think about the next step.
  • We should change the default on-error behavior to at least ‘read-only’ on that volume. Then errors would at least stop further corruption, and at best avoid the need for a lengthy fsck. It’s not entirely clear, however, whether the iSCSI errors would have made the filesystem hit an error condition.
  • We could do better about more regular backups of this data. A daily snapshot, plus an rsync of that snapshot to backup storage, could save us time if another backup sync is ever needed. We would also then have the snapshot to go back to if needed.
  • Down the road, some of the cluster filesystems might be a good thing to investigate and transition to. If we can spread the backend storage around and have enough nodes, the failure of any one might not have as much impact.
  • Perhaps we could add monitoring for iSCSI errors so we notice and react to them quicker.
  • LVM, with its snapshots and the ability to merge a snapshot back in as primary, really helped us out here.
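
For the on-error behavior point, a small check along these lines could flag volumes that are still set to continue on errors. This is only a sketch: it assumes the `Errors behavior:` line that `tune2fs -l` prints for ext filesystems, the sample text is a hand-written excerpt rather than output from our servers, and any device path passed to `read_superblock` is up to the caller.

```python
import subprocess

# Hand-written excerpt in the style of `tune2fs -l` output (illustrative only).
SAMPLE_OUTPUT = """\
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
"""


def read_superblock(device):
    """Run `tune2fs -l` on a device and return its text output (needs root)."""
    result = subprocess.run(
        ["tune2fs", "-l", device], capture_output=True, text=True, check=True
    )
    return result.stdout


def errors_behavior(tune2fs_output):
    """Return the value of the 'Errors behavior' field, or None if absent."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Errors behavior":
            return value.strip()
    return None


def should_change(behavior):
    """True if the filesystem would keep going after an error."""
    return behavior is not None and behavior.lower() == "continue"
```

Here `errors_behavior(SAMPLE_OUTPUT)` gives `Continue`, so `should_change` returns `True`. The setting itself is changed with `tune2fs -e`, where the value for the ‘read-only’ behavior is `remount-ro`.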

Feel free to chime in with other thoughts or ideas. Hopefully it will be quite some time before we have another outage like this one.


Lessons from password/key changes

Posted on Dec. 01, 2011, under fedora, linux

Recently, Fedora infrastructure requested everyone change their password and upload a new SSH public key. We gave folks almost 2 months to make the change. I’d just like to note here the reactions we had, divided into 4 ‘groups’ of users. Of course some people were between two of these groups or some probably had slightly different reactions, but this is what my reading of the feedback leads me to:

  1. Group one sees the announcement, skims it, goes and changes their password, generates and uploads a new ssh key, goes back to what they were doing.
  2. Group two sees the announcement, reads it, and reads the links off it. They adjust their firewall, check their backups, re-enable selinux or apply updates, then change their password and upload a new key.
  3. Group three sees the announcement. They complain that their private ssh key is safe and always has been, that they know all about passwords and ssh and encryption, and that this change is unneeded.
  4. Group four doesn’t even see the announcement. They are no longer involved, too busy to read it, or just don’t care.

Group one spends about 5 minutes. The advantage to Fedora Infrastructure isn’t great here, but they do have a new password that meets the guidelines and a new ssh key in case they were careless with the old one. Of course they didn’t learn much, so they could be careless with the new one as well.

Group two are the very people we are trying to reach most, and this is where the plan pays off most. These people will learn how to improve their security, how to better handle their ssh private keys, and hopefully how to prevent compromise on their personal machines. They may spend a good deal of time following the links and learning about best practices, but it’s all time well spent.

Group three are the vocal minority. While they already know best practices and keep their ssh private keys safe, they don’t realize we have no way of telling them apart from group four (below). Time spent here is large for both them and Infrastructure folks, but the advantage is low because it amounts to “I know I am fine and shouldn’t have to do this” vs “We have no way of knowing that”. There is likely a less vocal subset of these folks that shows up in group one (just do it, grumble, and move on).

Group four is another group where there is good advantage for infrastructure. These are folks that are gone, don’t pay attention to their Fedora work, or are too busy to spend a few minutes on it. Packages with these folks as maintainers would be better off being orphaned and reassigned to people who use and care for them. Sysadmin groups are better off not having members who they think are involved, but who really don’t have time to be.

So, in the end I think there is still good advantage to us having done this. I hope group two is large and that the folks in it learn from our documents and best practices. I hope the people in group three realize that this isn’t just about them, it’s about the community, and I hope pruning the people from group four helps improve the health and activity of our community.

Finally, as a side note, the deadline was last night, but we are still assessing exactly how we want to mark inactive folks who haven’t yet changed their password and uploaded a new key. You still have time to go do it now. ;)
