February 9, 2010

Deciphering Caught Signals

Filed under: adminstration,linux,python — jonEbird @ 6:49 pm

Have you ever wondered which signal handlers a particular process has registered? A friend of mine was observing different behavior when spawning a new process from his Python script vs. invoking the command in the shell. Actually, he was consulting me about finding the best way to shutdown the process after spawning it from his Python script. You see, the program is actually just a shell wrapper which then kicks off the real program. His program would learn the process id (pid) of the wrapper and trying to send a kill signal to that was effectively terminating the wrapper and leaving the actual program running. By comparison, I asked him what happens in the shell when he tries to kill the program. Unlike being spawned in the Python script, this time the program and wrapper together would shutdown cleanly. My initial question was, “Are there different signal handlers being caught between the two scenarios?” He wasn’t sure and our dialog afterwards is what I’d like to explain to you now.

A pretty straight forward way to query what signal handlers a process has is to use “ps”. Let’s use my shell as an example:

$ ps -o pid,user,comm,caught -p $$
  PID USER     COMMAND                   CAUGHT
 3508 jon      bash            000000004b813efb

My shell is currently catching the signals being represented by the signal mask of 0x000000004b813efb. Pretty straight forward, right? Yeah, unless you havn’t done much C programming like my friend. He was not used to seeing hexadecimal numbers where each bit represents a on/off flag for each available signal. To follow along, make sure you understand binary representation of numbers first and learn that our number 0x000000004b813efb is represented in binary as 01001011100000010011111011111011. Now viewing that number and reading from right (least significant bit) to left, note which nth bit has a one or not. You can see that it is the 1st, 2nd, 4th, 5th, etc. Now all we have to do is associate those place holders with the signals they represent. Easiest way to see which numeric values are assigned to which signals is to use the “kill” command:

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGSTKFLT
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU
29) SIGIO       30) SIGPWR      31) SIGSYS      34) SIGRTMIN

Armed with this knowledge, you can now provide a human readable report for which signals my shell is capturing: It has signal handlers setup for SIGHUP(1), SIGINT(2), SIGILL(4), SIGTRAP(5), etc.

A quick note about signal handlers. A signal handler is basically a jump location for your program to goto after receiving a particular signal. Think of it as an asynchronous function call, or more succinctly as a callback. That is, your program’s execution will jump to the function you’ve registered for your signal handler immediately upon receiving said signal and it does not matter where in your program’s execution you are currently at. Since the call is asynchronous, a lot of people will have a signal handler merely toggle a global flag and let their program resume it’s processing and check on that flag at a more convenient time.

Now that we know how to see which signals are being caught by a program, and what signal handlers are, let’s create a new signal handler for my shell and note the changed signal mask. Again, reviewing my currently caught signals, I notice I’m not doing anything for the 3rd signal of SIGQUIT. I want to assign a signal handler on this signal so we can see the changed signal mask. I’m going to have the shell execute a simple function upon receipt of the SIGQUIT signal.

$ function sayhi { echo "hi there"; }
$ trap sayhi 3
$ trap sayhi SIGQUIT # same thing as the number 3
$ kill -QUIT $$
hi there

Now, how about our signal mask. Has it changed?

$ ps -o pid,user,comm,caught -p $$
  PID USER     COMMAND                   CAUGHT
 3508 jon      bash            000000004b813eff

The signal mask has changed from 0x000000004b813efb to 0x000000004b813eff. The new signal mask, converting from hexadecimal to binary, is 1001011100000010011111011111111. Notice how our 3rd bit from the right is now a “1″ and before it was “0″.

Understanding how the signal masks are represented is good, but it’s still a pain if you want to quickly compare the signals being caught between two different processes. Per that point, I created a little Python script to do the work for me:

#!/bin/env python

import sys, signal

def dec2bin(N):
    binary = ''
    while N:
        N, r = divmod(N,2)
        binary = str(r) + binary
    return binary

def sigmask(binary):
    """Take a string representation of a binary number and return the signals associated with each bit.
       E.g. '10101' => ['SIGHUP','SIGQUIT','SIGTRAP']
            This is because SIGHUP is 1, SIGQUIT is 3 and SIGTRAP is 5
    sigmap = dict([ (getattr(signal, sig), sig) for sig in dir(signal) if (sig.startswith('SIG') and '_' not in sig) ])
    signals = [ sigmap.get(n+1,str(n+1)) for n, bit in enumerate(reversed(binary)) if bit == '1' ]
    return signals

if __name__ == '__main__':

    if sys.argv[1].startswith('0x'):
        N = int(sys.argv[1], 16)
        N = int(sys.argv[1])

    binstr = dec2bin(N)
    print '"%s" (0x%x,%d) => %s; %s' % (sys.argv[1], N, N, binstr, ','.join(sigmask(binstr)) )

To use the my program, copy it to a file, make it executable and run it passing the signal mask of your program.

$ wget -O ~/bin/
$ chmod 755 ~/bin/ # assuming ~/bin is in your PATH
$ "0x$(ps --no-headers -o caught -p $$)"
"0x000000004b813eff" (0x4b813eff,1266761471) => 1001011100000010011111011111111;

Now back to my friend and his program problem. I asked him to fire off the program both from his Python script and then again directly from the shell. Each time I asked him to check on the caught signal mask of both the wrapper program and the actual binary and report the signal masks to me. As for the wrapper, it was consistently catching only SIGINT and SIGCLD, but the story was not as clear for the binary.
When kicked off via Python, the binary was catching the following signals:


whereas when invoked directly from the shell, the binary was catching:


Initially, I thought, “Ah ha, see it’s catching SIGINT in addition to the other signals when invoked from the shell!”, but quelled my excitement as I realized it didn’t help to explain why both wrapper and binary were both shutting down in the shell. If you sent a SIGINT to the wrapper via “kill -INT <wrapperpid>” nothing happens. Any other signal that the wrapper was not catching, such as SIGTERM (which is the default send via “kill” when you do not specifiy a signal), would cause the wrapper to terminate and orphan the binary to remain running.

The explanation lies within the shell code. We went through the various cases and when it wasn’t explained by the wrapper handling some signal and shutting down the binary, I was left with presuming the interactive shell was doing something unique. I initially observed this by running a strace against the binary and seeing the SIGINT interrupt and then later confirmed the behavior by consulting the bash source code. When you hit control-c in the shell, the shell will send a SIGINT to both processes because they are in the same process group (pgrp). I literally downloaded the bash source code to confirm this and quoting from a comment in the source code, “keyboard signals are sent to process groups”* That means a SIGINT is sent to both the wrapper and the binary. When that happens, the wrapper does nothing, as seen from prior experiments, but the binary catches it and does a clean shutdown which then allows the wrapper to complete and exit as well.

– Jon Miller

* How to efficiently root through source code is a subject for another blog. Within the bash-3.2.48.tar.gz source bundle, look at line 3230 in jobs.c.

Viewing 2 Comments

    • ^
    • v

    He wasn’t sure and our dialog afterwards is what I’d like to explain to you now.

    • ^
    • v

    There's a lot of good information here, good post!

    It reminds me that end users are going to Ctrl-C the wrappers we create (and as you point out Systems Engineers are going to create scripts that send kill signals as well). I recall once sending out emails pleading, 'please do a kill -15 before kill -9' because my wrappers where doing database locks, transactions, etc., at the time. Eventually I learned to trap and add handlers for SIGKILL and SIGTERM as a necessity, and, as I recall, made a deal with the SysAdmins to provide SIGINT handlers to do other things as well.

    I think it is worth noting, and it's my understanding (though I have not tested this lately) that when a wrapper traps SIGINT, etc., and it's handler gracefully shuts down child processes, etc. (which is binary progress, or others in the process group?) then the shell might not begin sending SIGINT to others in process group until after the wrapper has responded (and hopfully did not leave any others in process group). In other words, a wrapper that traps SIGINT might stop interactive shell from sending SIGINT to others in process group, or that the shell might only look for others in the process group after wrapper has responded to SIGINT.

    You definately got me thinking :-)

close Reblog this comment
blog comments powered by Disqus