jonEbird

February 9, 2010

Deciphering Caught Signals

Filed under: adminstration, linux, python — jonEbird @ 6:49 pm

Have you ever wondered which signal handlers a particular process has registered? A friend of mine was observing different behavior when spawning a new process from his Python script vs. invoking the command in the shell. Actually, he was consulting me about finding the best way to shut down the process after spawning it from his Python script. You see, the program is actually just a shell wrapper which then kicks off the real program. His script would learn the process id (pid) of the wrapper, and sending a kill signal to that pid was effectively terminating the wrapper and leaving the actual program running. By comparison, I asked him what happens in the shell when he tries to kill the program. Unlike when it was spawned from the Python script, this time the program and wrapper together would shut down cleanly. My initial question was, “Are different signals being caught between the two scenarios?” He wasn’t sure and our dialog afterwards is what I’d like to explain to you now.

A pretty straightforward way to query which signals a process is catching is to use “ps”. Let’s use my shell as an example:

$ ps -o pid,user,comm,caught -p $$
  PID USER     COMMAND                   CAUGHT
 3508 jon      bash            000000004b813efb

My shell is currently catching the signals represented by the signal mask 0x000000004b813efb. Pretty straightforward, right? Yeah, unless you haven’t done much C programming, like my friend. He was not used to seeing hexadecimal numbers where each bit represents an on/off flag for each available signal. To follow along, make sure you understand binary representation of numbers first and learn that our number 0x000000004b813efb is represented in binary as 01001011100000010011111011111011. Now, viewing that number and reading from right (least significant bit) to left, note whether each nth bit is a one or not. You can see that the 1st, 2nd, 4th, 5th, etc. bits are set. Now all we have to do is associate those positions with the signals they represent. The easiest way to see which numeric values are assigned to which signals is to use the “kill” command:

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGSTKFLT
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU
25) SIGXFSZ     26) SIGVTALRM   27) SIGPROF     28) SIGWINCH
29) SIGIO       30) SIGPWR      31) SIGSYS      34) SIGRTMIN
35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3  38) SIGRTMIN+4
39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

Armed with this knowledge, you can now provide a human-readable report of which signals my shell is catching: it has signal handlers set up for SIGHUP(1), SIGINT(2), SIGILL(4), SIGTRAP(5), etc.
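If you would rather let the computer do the counting, here is a minimal Python sketch of that bit-walk. It assumes only what was described above, namely that bit n (counting from 1 at the right) stands for signal number n; the function name caught_numbers is made up for this example.

```python
def caught_numbers(mask):
    """Return the signal numbers whose bits are set in a caught-signal mask."""
    numbers = []
    signum = 1
    while mask:
        if mask & 1:          # lowest bit set => this signal number is caught
            numbers.append(signum)
        mask >>= 1            # shift the next bit into the lowest position
        signum += 1
    return numbers

# The mask ps reported for my shell
print(caught_numbers(0x4b813efb))
```

The numbers it prints can be looked up in the “kill -l” table by hand.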

A quick note about signal handlers. A signal handler is basically a location for your program to jump to after receiving a particular signal. Think of it as an asynchronous function call, or more succinctly as a callback. That is, your program’s execution will jump to the function you’ve registered as your signal handler immediately upon receiving said signal, no matter where in its execution your program currently is. Since the call is asynchronous, a lot of people will have a signal handler merely toggle a global flag, let their program resume its processing, and check that flag at a more convenient time.
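To make the “toggle a flag” pattern concrete, here is a small Python sketch. It is only an illustration: the names shutdown_requested, request_shutdown and run are invented for it, SIGUSR1 is an arbitrary choice of signal, and the sleep stands in for real work. It assumes a Unix-like system and that it runs in the main thread.

```python
import os
import signal
import time

shutdown_requested = False  # global flag the handler toggles

def request_shutdown(signum, frame):
    """Signal handler: do as little as possible, just record the request."""
    global shutdown_requested
    shutdown_requested = True

def run(max_loops=50):
    """Pretend main loop: works in small steps and checks the flag between them."""
    loops = 0
    while not shutdown_requested and loops < max_loops:
        time.sleep(0.01)      # stand-in for one unit of real work
        loops += 1
    return loops

signal.signal(signal.SIGUSR1, request_shutdown)  # register the handler
os.kill(os.getpid(), signal.SIGUSR1)             # send ourselves the signal
loops_done = run()
print("stopped after %d loops, flag=%s" % (loops_done, shutdown_requested))
```

The handler itself never touches the program’s real work; the main loop decides when it is safe to stop.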

Now that we know how to see which signals are being caught by a program, and what signal handlers are, let’s create a new signal handler for my shell and note the changed signal mask. Again, reviewing my currently caught signals, I notice I’m not doing anything for the 3rd signal of SIGQUIT. I want to assign a signal handler on this signal so we can see the changed signal mask. I’m going to have the shell execute a simple function upon receipt of the SIGQUIT signal.

$ function sayhi { echo "hi there"; }
$ trap sayhi 3
$ trap sayhi SIGQUIT # same thing as the number 3
$ kill -QUIT $$
hi there

Now, how about our signal mask. Has it changed?

$ ps -o pid,user,comm,caught -p $$
  PID USER     COMMAND                   CAUGHT
 3508 jon      bash            000000004b813eff

The signal mask has changed from 0x000000004b813efb to 0x000000004b813eff. The new signal mask, converted from hexadecimal to binary, is 1001011100000010011111011111111. Notice how the 3rd bit from the right is now a “1” where before it was “0”.
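You can also let Python spot the changed bit instead of comparing the binary strings by eye. This is just a sketch using the two masks from above; XOR keeps only the bits that differ between them.

```python
import signal

before = 0x4b813efb   # mask before the trap
after = 0x4b813eff    # mask after the trap

changed = before ^ after          # XOR keeps only the bits that differ
# Only one bit differs here, so bit_length() gives its position counted from 1.
bit = changed.bit_length()
print(hex(changed), bit, bit == signal.SIGQUIT)   # prints: 0x4 3 True
```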

Understanding how the signal masks are represented is good, but it’s still a pain if you want to quickly compare the signals being caught between two different processes. To that end, I created a little Python script to do the work for me:

#!/usr/bin/env python

import sys, signal

def dec2bin(N):
    binary = ''
    while N:
        N, r = divmod(N,2)
        binary = str(r) + binary
    return binary

def sigmask(binary):
    """Take a string representation of a binary number and return the signals associated with each bit.
       E.g. '10101' => ['SIGHUP','SIGQUIT','SIGTRAP']
            This is because SIGHUP is 1, SIGQUIT is 3 and SIGTRAP is 5
    """
    sigmap = dict([ (getattr(signal, sig), sig) for sig in dir(signal) if (sig.startswith('SIG') and '_' not in sig) ])
    signals = [ sigmap.get(n+1,str(n+1)) for n, bit in enumerate(reversed(binary)) if bit == '1' ]
    return signals

if __name__ == '__main__':

    if sys.argv[1].startswith('0x'):
        N = int(sys.argv[1], 16)
    else:
        N = int(sys.argv[1])

    binstr = dec2bin(N)
    print('"%s" (0x%x,%d) => %s; %s' % (sys.argv[1], N, N, binstr, ','.join(sigmask(binstr))))

To use my signals.py program, copy it to a file, make it executable and run it, passing the signal mask of your program.

$ wget -O ~/bin/signals.py http://jonebird.com/signals.py
$ chmod 755 ~/bin/signals.py # assuming ~/bin is in your PATH
$ signals.py "0x$(ps --no-headers -o caught -p $$)"
"0x000000004b813eff" (0x4b813eff,1266761471) => 1001011100000010011111011111111;
 SIGHUP,SIGINT,SIGQUIT,SIGILL,SIGTRAP,SIGIOT,SIGBUS,SIGFPE,SIGUSR1,SIGSEGV,SIGUSR2,
 SIGPIPE,SIGALRM,SIGCLD,SIGXCPU,SIGXFSZ,SIGVTALRM,SIGWINCH,SIGSYS
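If you would rather not shell out to ps, Linux also exposes the caught mask as the SigCgt line of /proc/<pid>/status. Here is a hedged sketch of reading it from Python; the function names are my own, the sample text is a canned excerpt rather than real output, and none of this works on systems without a Linux-style /proc.

```python
def caught_mask_from_status(status_text):
    """Pull the SigCgt value (hex, no 0x prefix) out of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            return int(line.split(":", 1)[1].strip(), 16)
    return None   # no SigCgt line found

def caught_mask(pid="self"):
    """Read the caught-signal mask of a process from /proc (Linux only)."""
    with open("/proc/%s/status" % pid) as handle:
        return caught_mask_from_status(handle.read())

# Canned excerpt so the parsing can be tried without /proc
sample = "Name:\tbash\nSigIgn:\t0000000000384004\nSigCgt:\t000000004b813eff\n"
print(hex(caught_mask_from_status(sample)))   # prints: 0x4b813eff
```

The resulting integer can be fed straight into the same bit-by-bit decoding that signals.py does.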

Now back to my friend and his program problem. I asked him to fire off the program both from his Python script and then again directly from the shell. Each time I asked him to check on the caught signal mask of both the wrapper program and the actual binary and report the signal masks to me. As for the wrapper, it was consistently catching only SIGINT and SIGCLD, but the story was not as clear for the binary.
When kicked off via Python, the binary was catching the following signals:

  SIGQUIT,SIGBUS,SIGFPE,SIGSEGV,SIGTERM

whereas when invoked directly from the shell, the binary was catching:

  SIGINT,SIGQUIT,SIGBUS,SIGFPE,SIGSEGV,SIGTERM

Initially, I thought, “Ah ha, see, it’s catching SIGINT in addition to the other signals when invoked from the shell!”, but I quelled my excitement as I realized it didn’t help explain why the wrapper and binary were both shutting down in the shell. If you send a SIGINT to the wrapper via “kill -INT <wrapperpid>”, nothing happens. Any other signal that the wrapper was not catching, such as SIGTERM (which is the default sent by “kill” when you do not specify a signal), would cause the wrapper to terminate and orphan the binary, which would remain running.

The explanation lies in how the interactive shell runs its jobs. We went through the various cases and when it wasn’t explained by the wrapper handling some signal and shutting down the binary, I was left presuming the interactive shell was doing something unique. I initially observed this by running strace against the binary and seeing the SIGINT arrive, and later confirmed the behavior by consulting the bash source code. When you hit control-c in the shell, a SIGINT is delivered to both processes because they are in the same process group (pgrp). I literally downloaded the bash source code to confirm this and, quoting from a comment in the source code, “keyboard signals are sent to process groups”.* That means a SIGINT is sent to both the wrapper and the binary. When that happens, the wrapper does nothing, as seen from prior experiments, but the binary catches it and does a clean shutdown, which then allows the wrapper to complete and exit as well.
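The same idea suggests a way out on the Python side: signal the whole process group rather than just the wrapper’s pid. What follows is a minimal sketch, assuming Python 3 on a Unix-like system. The sh one-liner is a hypothetical stand-in for my friend’s wrapper and binary, not his actual program, and SIGTERM is just one reasonable choice of signal.

```python
import os
import signal
import subprocess

# Stand-in wrapper: a shell that starts a long-running child and waits for it.
wrapper = subprocess.Popen(
    ["sh", "-c", "sleep 30 & wait"],
    start_new_session=True,   # new session => the wrapper leads its own process group
)

pgid = os.getpgid(wrapper.pid)        # group shared by the wrapper and its child
os.killpg(pgid, signal.SIGTERM)       # deliver SIGTERM to every process in that group
wrapper.wait()
print("wrapper exit status:", wrapper.returncode)
```

Putting the wrapper in its own group keeps the signal away from the Python script itself, which would otherwise share a group with its children.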

– Jon Miller

* How to efficiently root through source code is a subject for another blog. Within the bash-3.2.48.tar.gz source bundle, look at line 3230 in jobs.c.

September 27, 2009

Presenting at Inaugural CoPUG

Filed under: hadoop, python — jonEbird @ 8:34 pm

Tomorrow I will be presenting an Introduction to Hadoop: Driven by Python for the inaugural Central Ohio Python Users Group or just CoPUG for short.

I have high hopes for CoPUG. The organizer, Eric Floehr, appears to be a well-organized, competent individual, although we have only exchanged emails and have yet to meet in person. While in Atlanta last year for PyWorks, I learned of the very strong PyAtl group led by none other than the current editor of the Python Magazine, Brandon Rhodes. Although I am not sure, I wonder if their Python group has something to do with PyCon coming to Atlanta in 2010. Can I dream of PyCon someday coming to Columbus?

My Introduction to Hadoop: Driven by Python slides are provided under the Creative Commons Attribution 3.0 United States License.

August 10, 2009

Hadoop Elephant Makes a Big Splash

Filed under: blogging, hadoop, python — jonEbird @ 5:27 pm

Big news in the world of Hadoop today. My Running Large Python Tasks With Hadoop is published in the July Edition of Python Magazine. This marks my second article with the magazine and I had a lot of fun doing it. My interest in the anti-rdbms will continue as I continue to find interesting ways to organize data in the enterprise.

While providing a gentle introduction to Hadoop, my article also introduces readers to my HadoopCalculator, which you can install a couple of different ways. The first way is via git, where you can pull my HadoopUtils repo from github via:

git clone git://github.com/jonEbird/Hadoop-Utils.git


That will bring a few more scripts than just my HadoopCalculator. The second way to install is to use the Python setuptools utility easy_install or pull down the source package from the Cheese Shop.

Thank you for reading this far. I lied. The big news today in the Hadoop world is Doug Cutting joining Cloudera. Had you going, didn’t I? Recently, while Doug was still with Yahoo!, the Microsoft and Yahoo Partnership had people wondering what impact that would have on the Hadoop ecosystem. Today, Yahoo! is the largest Hadoop user and for obvious reasons contributed a lot to the community. Cloudera was already a well known player in the Hadoop community but their stock has risen immensely with the addition of Doug Cutting. If they were selling stock, I’d buy.

November 16, 2008

Pyworks In Summation

Filed under: PHP, blogging, python — jonEbird @ 7:10 pm

I sit in the Atlanta Airport reminiscing over the events of PyWorks ‘08. This was the first year for PyWorks but MTA combined the conference with PHP Architect and I believe everyone was happy with the combination. At a minimum, people had engaging conversations between the groups and a significant number of them cross-attended the sessions. I attended two PHP sessions and one neutral session and then the rest Python. Some people were a bit disappointed in the lack of Python attendees and it is true that we didn’t make up a large part of the total 148 attendees of the conference. But with the quality of talks staying superbly high, not having a full room wasn’t a bad thing.

The quality of the talks was superb, indeed. Probably over half of the presenters are either principal developers on high profile projects or they have written a book or own their own consulting company. On day zero, where there were 3hr long tutorial sessions, I spent the morning in Mark Ramm’s TurboGears but then I switched over to the PHP side in the afternoon to catch Scott MacVicar and Helgi Þormar Þorbjörnsson’s Caching for Cash.

At the start of day one, the first day of the normal sessions, I think everyone was expecting a lot more people. There were, in fact, more people but not as many as I was expecting, but again that’s perfectly okay. This day was a full one, starting off with the keynote by Kevin Dangoor about Growing your Community. After a break I then attended Decorators are Fun by Matt Wilson and learned that he is not that far away from me in Cleveland. Next I attended another Mark Ramm talk about WSGI where he was explaining how easy it was to build a web framework. It was given a bit “tongue in cheek” since he is the primary maintainer of TurboGears. Following that, I attended a middle track session about Distributed version control with GIT by Travis Swicegood. Travis had just finished writing a book about using GIT called Pragmatic Version Control Using Git and not surprisingly gave an authoritative explanation of using GIT. Following lunch, I attended another PHP track presentation but it could have been in the neutral middle track. The talk was Map, Filter, Reduce In the Small and in the Cloud by Sebastian Bergmann where he explained the popular functional programming techniques popularized by Google for computing large quantities of data. Sebastian gave me another reason to check out Hadoop and in fact I’m now thinking of another Python Magazine article about using hadoop with Jython. For the last session of the day I decided to attend Michael Foord’s talk about IronPython. I didn’t think I’d ever check out IronPython on my own, so I thought I’d get a crash course from Michael who also just finished work on his book IronPython in Action.

Still not done with day one. After all of the normal presentations concluded, we had happy hour while gearing up for the Pecha Kucha competition sessions. Pecha Kucha is where you provide 20 slides and set them to auto switch every 20 seconds, making your session a little over six minutes. Apparently people have found that you can get the same quality bits of information in that format as compared to a full hour session. At least that is what the Japanese have concluded. As for PHP/PyWorks, we mostly had fun with the sessions. There were talks about web security, general ranting, LOLCode, and many others which I’m having a problem remembering. At the end, our judges awarded the prize, an Xbox 360 gaming system, to the LOLCode talk, and if you’d really like to see what was going on, you may be able to watch streamed video captured by Travis Swicegood’s iPhone. Before I went to bed, I rehearsed my presentation one more time.

By the time day two started, it felt like I had been there a full week and yet we still had a full day of presentations again. I started the morning in Chris Perkins’s talk about the Sphinx Documentation System. We all understand the importance of documentation and it’s not always fun, but again I thought investing 45min catching up on some of the Python “best practices” for documentation would be well worth the time. Afterwards, I stayed in the same room for Jacob Taylor’s talk about Exploring Artificial Intelligence with Python. Jacob didn’t get around to showing any Python code but he had good attendance for being a founder of SugarCRM. Next, the highlight of the conference, my presentation about LDAP and Python. The number of attendees for my presentation was average for the Python sessions and by this point I felt like I knew everyone, which removed any pressure or nervousness. We’ll see how interested people were by seeing who downloads my configparser.py and/or ldapconfig.py scripts. After lunch, I attended Kevin Dangoor’s Paver talk where he explained the motivations for Paver and showed numerous examples of what pain points it solves. Finally, the last session I attended at PyWorks was Jonathan LaCour’s talk about Elixir, the Python module which makes introduction into SQLAlchemy an easy one. Elixir helps kick start your DB code by simplifying SQLAlchemy, making a lot of sane choices for you as well as providing other conveniences. Jonathan had to work hard to get all of his content into his hour, mostly because he gave a decent overview of SQLAlchemy and then his Elixir module.

As with the previous day, this day concluded with another happy hour while waiting for our closing keynote. The closing keynote was given by Jay Pipes about “living in the gray areas” and not sticking to extreme black and white of our technologies. He praised the joint efforts being made by the PHP and Python folks and criticized people who are too biased to learn from the other communities. Jay is working on Drizzle, while working for Sun, where they are challenging all of the preconceived notions being made by the MySQL community. Drizzle is basically a fork of MySQL and their goals are to provide a much more streamlined version of a database. Jay explained that forks are good (as well as “sporks”) because they keep people on their toes and keep the level of competition up. Finally, Jay’s last point was that we need to spend more time listening to other people and less time preaching our biased opinions.

I overheard PHP and Python people echoing Jay’s message after the keynote. I’m glad to have participated in such a successful conference where I truly believe boundaries were crossed. With as much time as I spent with the PHP folks, I was repeatedly asked, “So, you coming over to the PHP side?” I think the last time I was asked that was in the hotel pool where again I was playing the role of the “token Python guy” amongst the PHP folks. To be honest, those PHP folks know how to have fun, and if my criterion for choosing a programming language was the amount of fun the community had I would be doing PHP development. I definitely want to attend next year’s PyWorks and PHP conference and I have an entire year to come up with my presentation proposals.

October 25, 2008

PyWorks Stuff

Filed under: adminstration, python, usability — jonEbird @ 12:00 am

For the 2008 PyWorks convention, I will be presenting about LDAP and Python. The presentation is really about demystifying LDAP and encouraging people to use and extend LDAP for their config file needs. In an effort to make my point, the last half of my presentation will be a time for a demo. This entry is your basic landing point where you can download the scripts, presuming you are looking for a copy of the scripts and/or slides after seeing my presentation? (oh! nevermind, your google search landed you here)


For the demo, I will be leveraging the fail2ban project. It is a Python-based application which scans typical application logs for security failures and bans IPs from being able to connect again. It also uses the built-in ConfigParser module for reading its 30+ config files, which is why I have chosen to use it. For the demo, I have created two scripts:

The first one, configparser2ldap.py is used to process a set of config files and automatically generate LDAP schema as well as LDIF data.

Next, I have my ldapconfig.py module where I extended the ConfigParser module to support making queries to LDAP. I am basically overriding the read() method only and leaving the rest of the module alone. This way the only modifications to the fail2ban application are how it is instantiating the ConfigParser and I won’t have to become a full time fail2ban developer if I want to centralize the configuration data in LDAP.
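To give a feel for the “override read() only” idea, here is a rough sketch. It is not the actual ldapconfig.py: it uses Python 3’s configparser module name, and fetch_settings is a hypothetical stand-in that returns canned data where a real LDAP query would go. The section and option names are made up for illustration.

```python
import configparser

def fetch_settings(source):
    """Hypothetical stand-in for an LDAP query; returns {section: {option: value}}."""
    return {"ssh": {"enabled": "true", "maxretry": "5"}}

class CentralConfigParser(configparser.ConfigParser):
    """ConfigParser whose read() loads from a central store instead of local files."""

    def read(self, filenames, encoding=None):
        if isinstance(filenames, str):
            filenames = [filenames]
        loaded = []
        for name in filenames:
            # Feed the fetched data through the stock parser machinery.
            self.read_dict(fetch_settings(name), source=name)
            loaded.append(name)
        return loaded   # like the stock read(): the list of sources that were loaded

config = CentralConfigParser()
config.read("jail.conf")                  # illustrative file name
print(config.getint("ssh", "maxretry"))   # prints: 5
```

Because get(), getint() and friends are inherited untouched, calling code only has to change which class it instantiates.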

And that is really the main point of my presentation: The power of centralizing your configuration data and how it can drastically change how you administer your large scale server farm.

Downloads

LDAP + Python Slides.

configparser2ldap.py script to auto-generate LDAP schema and LDIF from ConfigParser compatible config files.

ldapconfig.py python module which inherits the ConfigParser and supports optionally pulling config data from LDAP.