
On reboots

Rebooting machines is an interesting study in the varied opinions of the Linux community. On one end, there are folks who will use ksplice or simply avoid rebooting for any reason short of a hardware failure. On the other, you have desktop users who reboot their machines daily. I'm somewhere in the middle: for servers, if there is a security update to the kernel or glibc that applies, the server should be rebooted. For my laptop, I usually reboot when there's a reason (I want to test something related to the boot process, there's a security update, etc). Rebooting servers regularly (and it seems doing so for security updates accomplishes this) has several other advantages:

  • You can schedule your rebooting. Sometimes power cycling or rebooting a machine puts some stress on the hardware. If it fails during a scheduled reboot, you are on hand to call for service, etc. If it happens at 2am on Sunday morning when you have 9 to 5 business day coverage, you are in much worse shape.
  • Even with configuration management, there are times when things are added to a server but not set to start on boot, or need some config change to start properly. Better to fix those in a maintenance window than to not have them come up after an unattended reboot.
  • You can find out a lot about your servers, how they interrelate, and where the points of failure are by rebooting them.
  • Do you recall how to get to that serial console / IPMI / pdu / kvm for that server? Scheduling a reboot is a great time to make sure your console access works and shows the boot process.
Fedora Infrastructure has about 150 or so machine instances to manage currently. Until recently the mass reboot process was pretty much just scheduling a block of time and powering through rebooting things. Often we would go over our window as that is a lot of machines to reboot and confirm are back up nicely. So, I worked on changing our process for mass reboots. Now, all hosts are put in 3 different buckets:
  • The "C" group. These are machines that only infrastructure folks will notice are down, or machines where there are redundant resources, so they can be rebooted anytime as long as failover or DNS changes are made first so no live traffic is still hitting them.
  • The "B" group. These are machines associated with Fedora contributors and package maintainers. End users won't notice these being down, but contributors/package maintainers will. These reboots need to be scheduled, but since there are few machines in this group, the outage is small.
  • The "A" group. These are machines that end users may notice being down or slow to respond. Database servers or mailing list hubs would be in this group. Again the number is very small, so outages would be very short.
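As a rough illustration of the bucketing above, here is a minimal Python sketch. It is not the actual Fedora Infrastructure tooling: the `Host` fields, the `reboot_group` and `plan` helpers, and the host names are all hypothetical, and the rules are just my reading of the three group descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str                  # hypothetical host name
    user_facing: bool          # end users may notice it being down or slow
    contributor_facing: bool   # contributors/package maintainers would notice
    redundant: bool            # another host can take over after failover/DNS changes


def reboot_group(host: Host) -> str:
    """Return "A", "B" or "C" following the group descriptions above."""
    if host.redundant:
        return "C"  # redundant resources: reboot anytime once traffic is moved away
    if host.user_facing:
        return "A"  # end users notice: needs a (short) scheduled outage
    if host.contributor_facing:
        return "B"  # only contributors notice: scheduled, small outage
    return "C"      # only infrastructure folks would notice


def plan(hosts):
    """Sort host names into the three buckets."""
    groups = {"A": [], "B": [], "C": []}
    for host in hosts:
        groups[reboot_group(host)].append(host.name)
    return groups


# Made-up inventory, purely for illustration.
hosts = [
    Host("db01", user_facing=True, contributor_facing=True, redundant=False),
    Host("pkgs01", user_facing=False, contributor_facing=True, redundant=False),
    Host("proxy01", user_facing=True, contributor_facing=True, redundant=True),
    Host("log01", user_facing=False, contributor_facing=False, redundant=False),
]
print(plan(hosts))
```

Checking redundancy first reflects the idea that adding failover is what moves a host out of "A" or "B" and into "C".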
Over time, we are working on moving all the hosts in the "A" and "B" groups into "C", which would leave us able to reboot things as time permits with no need for any scheduled outage. Even now, with the above setup, outages are much shorter and less frantic. For the last set of reboots we used the new method and planning, and I think it went much more smoothly. We also discovered some additional points of failure:
  • A nameserver reboot caused some internal machines to stop processing, which allowed us to revamp our nameserver setup and make sure all machines were set up to fail over properly to another nameserver.
  • An NFS server reboot in the "B" group affected some web servers in the "A" group. This allowed us to revisit why there's a dependent NFS mount on those web servers.
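Surprises like the NFS one could in principle be caught before a reboot by comparing a dependency list against the groups. The sketch below is only an assumption about how one might do that: the `GROUP_RANK` ordering, the `risky_dependencies` helper, the host names and the dependency map are all made up for illustration and are not part of our SOP.

```python
# Higher rank = more visible outage. This ordering is an assumption.
GROUP_RANK = {"C": 0, "B": 1, "A": 2}


def risky_dependencies(groups, depends_on):
    """Return (host, dependency) pairs where a host relies on a machine
    from a less critical group, so rebooting the dependency could take
    down something more visible than expected."""
    found = []
    for host, deps in depends_on.items():
        for dep in deps:
            if GROUP_RANK[groups[host]] > GROUP_RANK[groups[dep]]:
                found.append((host, dep))
    return sorted(found)


# Hypothetical data modelled on the NFS example above.
groups = {"web01": "A", "web02": "A", "db01": "A", "nfs01": "B"}
depends_on = {
    "web01": ["nfs01", "db01"],  # NFS mount plus a database
    "web02": ["nfs01"],
}

for host, dep in risky_dependencies(groups, depends_on):
    print(f"{host} (group {groups[host]}) depends on {dep} (group {groups[dep]})")
```

Each pair it reports is a candidate for either removing the dependency or moving the dependency into the stricter group.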
True to the Fedora way, the details are available at: Mass_Upgrade_Infrastructure_SOP.

Fedora FPCA deadline rapidly approaching...

If you have not yet had a chance to sign in to your Fedora account and sign the Fedora Project Contributor Agreement, please do so soon! The deadline is this Thursday. After that point you will be removed from any groups that require "cla_done", for example the packager group, any of the various sysadmin groups, or access to fedorapeople. You can of course be re-added to any groups later, but this could be an annoying and time consuming process. Take a few minutes to look it over today! EDITED TO ADD: We decided to give another week for a variety of reasons. So, the new cut off is 2011-06-23. We are going to inform group admins and post a list of packages that will potentially be without owners soon.

Fedora 15 Party!

The release of Fedora 15 is just around the corner (this Tuesday, 2011-05-24)! As always, we will be having an on-line release party over in #fedora-social on irc.freenode.net. :) Please join us in celebrating the release...

Fedora Election Season

It's that time again: Fedora election season. The Fedora Board and the Fedora Engineering Steering Committee are open to nominations until tomorrow. I have no desire to serve on the Fedora Board, but I've been on FESCo since it was the Fedora Extras Steering Committee. So, thinking about running again brought up many thoughts:

  • I think I do a good job keeping FESCo on track and productive
  • I think I do a good job keeping FESCo open and operating in the light
  • I do think new blood, energy, ideas would be welcome.
  • I for the most part enjoy working on FESCo matters
  • I have perhaps less time than I have in the past, what with the new job
Given all that, I have decided to run again, but possibly see if I can pawn off the chairpersonship on someone else if I am re-elected. I will let the voters decide if they would like to see me continue or would prefer some new folks joining in. As always, I am happy to talk with folks about anything. Feel free to comment here, shoot me an email or catch me on IRC. I'm sure I will also be participating in the town halls, so feel free to save up questions for that as well. If you have time, energy and new ideas for FESCo, do consider running. The Board race this time should be interesting. There are a number of new faces and ideas, and folks I know and have worked with. I look forward to talking with Board candidates as we move forward in the elections. Good luck to everyone who is willing to step up and make things better!

Merlin update - 2011-05-08

Merlin is very slowly improving. He can stand up on his own, and sometimes he can lie down on his own too. A few days ago he started eating small amounts (we have been giving him canned puppy food mixed with water), although we are still feeding him through his stomach tube. His poop is almost back to normal as well, which is great news! I'm looking forward to the time we can leave him alone for the night so we can both get some sleep. ;) He also really really needs to gain weight. He is down to about 50lbs, when he should be in the upper 60s or lower 70s. ;(

The drive-by review

I've lately noticed an increase in what I call "drive-by reviews". They say that "first impressions are lasting impressions" and I can understand that. Everyone wants to make a good first impression and it takes a lot to overcome a bad one. However, I've seen more folks of late instead taking a first impression of something as the only impression, or over-generalizing based on one first impression. A good example of this happens all the time to restaurants. How often have you heard someone say: "I went to that new restaurant and I am never going to go there again"? When asked for more details, how often do they say:

  • The line was long and we didn't want to wait
  • The service was slow
  • The waiter carded me! (asked for ID for an adult beverage)
Of course there are many great reasons for not wanting to go back, but thinking logically about some of the above makes me at least think that I would give the place another try. Long line? It was just opening and you have no data at all on what the place was like; you never even went in. Slow service? Just opening and getting the hang of things. Carded? Perhaps there is a law they are following. Bad mood? Stomach ache? Ordered the wrong thing? A host of issues could well be solved if you give the place another try in a few weeks. Another trend seems to be "I went to $foo and didn't like it, so I am never going to another $foo again". Sure, some things would be the same or very similar in other restaurants in a chain, but others would not be. Is your reasoning related to one of those variables? Which brings me to the Fedora/Linux tie-in here. Every few months I see a sad tale of someone who tried the Fedora {mailing lists|forum|IRC channel} and had a bad first impression, which leads to an "I am never going to use {Linux|Fedora} again!". Please take a few moments to think logically and not judge an entire Linux distribution or operating system based on one forum post, email or 5 minutes in an IRC channel. Do some research, work on explaining your problem better or in a different way, try a different support channel, or at the very least note that your impression is based on only one single drive-by. It's hard to overcome a bad first impression, but do consider giving more than a single chance.

Fedora Talk: First static, then silence (talk.fedoraproject.org closing up 2011-05-05)

For many years Fedora infrastructure has been running an asterisk server at talk.fedoraproject.org. This allows contributors to talk to each other, send voice mails, etc. However, it gets very little usage and has no one really maintaining it or fixing issues with it. In the last 130 days there have been a total of 95 calls using the server. There are a number of outstanding infrastructure tickets on the service that no one has dealt with. The server is running an outdated OS version and asterisk version. So, we are going to retire the service on 2011-05-05. Now, some of you might say: "But I want to use it! What can I do to save it?" First, as it is now, it's a waste of a server instance to run something 24x7 that processes less than one call a day on average. Any plan to continue the service would HAVE to include some way to increase usage of the service to make it worth doing. There would also need to be a group of folks who would help fix issues, enhance the service, help with updates and so on. A single interested person would be a single point of failure. If such a group forms and has a plan, please bring it to the infrastructure list/meetings.

NEEDINFO: infrastructure tickets

<begin PSA> If you've filed a ticket in the Fedora infrastructure trac at some point in the past (you can quickly check this by logging in here and then going here), can you please take a minute to:

  • See if the ticket is waiting for information from you => Please provide it.
  • See if it has had no activity at all => Perhaps you could elaborate on your request, or add details.
  • See if it is no longer needed/wanted => Please close it.
  • See if it could be better filed somewhere else. In particular, bodhi, fas, websites and pkgdb all have their own trac instances. You're likely to get more attention from their developers there than in the infrastructure trac => Refile and close the infrastructure ticket.
Thanks for your consideration. <end PSA>

hug your pets today...

We have had a recent unfortunate string of pet health issues here. ;(

  1. Winter the kitty: Losing weight. Bloodwork was fine. X-rays fine. We insisted on an ultrasound and they found that she has lymphoma. Luckily it's the "best" kind, which means it should hopefully be treatable with a steroid and a low level oral chemo drug. She might well live out her natural span.
  2. Legolas (legs) the greyhound: Limping and having trouble getting around. X-rays fine. Bloodwork fine. On to the MRI. This showed that his old racing injury on his spine is pressing on his spinal column, causing problems. Spinal surgery (including 6 days in the hospital recovering). Happily he's doing well and the prognosis is good. We just need to keep him from falling for a few more weeks and then he can do stairs again and go on (short) walks.
  3. Merlin the greyhound: Not eating. Vomiting. Really bad D. Bloodwork ok except he was dehydrated. Started on IV fluids and antibiotics. Still not getting any better, refusing all food. More x-rays. Nothing. Finally an ultrasound found that he has a thymoma that's pressing on his esophagus, causing him to be unable to eat. Currently he's in the hospital, where they are trying to get enough food in him so he gets enough strength back for surgery to remove the thymoma. Prognosis is unknown. If he can get enough energy and the thymoma is well defined, they can remove it all and he will be good, but we're not sure either of those will happen.
So, hug your cats and dogs and be happy when they are healthy. ;)

Starting a new adventure with some traveling...

Yesterday was my last day with Tummy.com, and next week I start in on my new job with Red Hat. (In case folks missed it, I will be taking over as Fedora Infrastructure Lead.) Some big shoes (and/or sideburns! :) to fill, but it should be a great adventure. The adventure begins with a trip down to Raleigh for Monday and Tuesday, then up to Boston for Wednesday, Thursday and Friday, then back home. So, if you are looking for me on IRC or mailing lists, I may be scarce next week. If you are in Raleigh or Boston and want to meet up, drop me an email and I can see what I can do schedule-wise (although no promises, as I might be pretty busy).