jonEbird

November 13, 2007

Reverse Engineering Buddy

Filed under: adminstration, linux, usability — jonEbird @ 10:34 pm

An Idea for a helpful Admin Tool

What if you got a page or a ticket about a particular service on an obscure server? The problem is that your environment is huge, you’re still relatively new to the company, your co-workers are not there to help you and you have never heard of this server. When logging in, you’re hoping that someone left a nice RC script under /etc/init.d/, that you can find the app via “lsof -i:<port>”, find the application’s home and locate some log files. But what if the application install was not that nice and did not conform to the norms that you are used to?

To either a small or very large degree, you will be reverse engineering this application. If you’re really unlucky, the person who supports the application also has no idea about it and knows nothing about Unix-like machines. So, what if there was an application which, upon your logging into the server, told you, “In case you are looking for the application binX, which typically listens on port XX, it was most likely started last time by issuing the script /path/to/funky/path/binX.sh”? I’m guessing it would freak you out and immediately flood you with confusion, gratitude and curiosity.

So, would such an application be difficult to write?

  • Poll for read/write/access events under key dirs, such as /etc/init.d/, /etc/*conf? (use the inotify API, introduced in Linux kernel 2.6.13)
  • Track users logging into the system (could correlate later)
  • Watch for any new ports being listened on, then record the binary name.
  • Reverse engineer this application to automatically collect interesting data on it.
  • Intelligently parse an strace (note to self, check out: http://subterfugue.org/)
  • Utilize systemtap for Linux and DTrace for Solaris. pseudo code { observe new socket being opened, so show me the last 10 files opened and executed. correlate application with startup script }
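As a rough illustration of the port-watching bullet above, here is a minimal Python sketch. It assumes a Linux host where /proc/net/tcp is readable. The function names and the sample data are made up for this example. Mapping a port back to its binary would still need something like lsof.

```python
def listening_ports(proc_net_tcp_text):
    """Return the sorted local TCP ports that are in the LISTEN state."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is the kernel's code for TCP_LISTEN
            # local_address looks like HEXIP:HEXPORT
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return sorted(ports)


def new_listeners(previous_ports, current_ports):
    """Ports that are listening now but were not in the previous poll."""
    return sorted(set(current_ports) - set(previous_ports))


# Made-up sample in the layout of /proc/net/tcp (trailing columns trimmed).
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 00000000:0016 00000000:0000 0A 00000000:00000000\n"
    "   1: 0100007F:270F 00000000:0000 0A 00000000:00000000\n"
    "   2: 0100007F:0016 0100007F:C350 01 00000000:00000000\n"
)

if __name__ == "__main__":
    print(listening_ports(SAMPLE))
```

On a real machine you would pass in open("/proc/net/tcp").read() on each poll and feed two consecutive results to new_listeners.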

Now, if your data was collected in an easily usable format, you could collect similar data from other machines and start to make broader correlations.

The whole idea is really about automating the reverse engineering of an application. I do that a lot. I believe others would like an application which aided or performed the entire reverse engineering for them.

June 21, 2007

Stripping Another Process of its Signal Handler

Filed under: adminstration, linux — jonEbird @ 11:38 pm

Have you ever wanted to send a signal, which normally produces a core file, but the process has one of those annoying signal handlers set up to catch the signal you’re sending? The nerve of that application trying to intelligently handle signals! I actually have a real need to remove the signal handler of a process which I’ll describe shortly. Normally, it is a bad idea to remove another process’s signal handler and under normal circumstances I do not suggest following the procedure which I am going to describe.

I have been struggling with a production issue at work with a process which has been less than cooperative. You see, I have a java process which gets crazy and starts consuming CPU cycles. When you run strace against the process, the only system call you will see is sched_yield(). The java thread is most likely stuck on a spinlock in user space and the process/thread which owns the lock has died or something else, but all my runaway process cares about is checking for its lock and yielding execution back to the kernel to schedule another task. Of course, it just gets the CPU again and continues to pound it.

My company pays a lot of money for support and we have had a case open for eight months now. The problem is we are unable to gather sufficient data for their level2 and level3 support teams. They would like a javacore to be generated, which can be done by sending a signal 3 to the java process. In addition to a javacore, they recommend sending a signal 11 (SEGV) to the process to prompt the generation of a normal binary core file. Either one would be invaluable for the support team in ascertaining what is going wrong. Unfortunately, it seems that once the process is stuck in this tight sched_yield() loop, any signals we send to it are ignored. In short, that is my problem.

During my Linux Kernel Internals training with RedHat, I had the idea of writing a kernel module to strip the signal handler from the java process so I can finally generate that elusive core file. The kernel module sets up an entry under /proc named stripsignal_pid. If you read the value, it will give you a quick one-liner about using this interface. To use the module, you write a process ID into that file and that process’s SIGABRT signal handler will be reset to SIG_DFL. At this point, if you send a SIGABRT signal to the process, it will write out its core file.
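One way to check whether a process still has a handler installed for a given signal is to read the SigCgt line of /proc/<pid>/status, which is a hex bitmask where bit N-1 stands for signal N. The Python sketch below is only an illustration of that check and is not part of the stripsignal module. The function names and the sample status text are invented here.

```python
def caught_signals(status_text):
    """Return the signal numbers that the SigCgt bitmask marks as caught."""
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            mask = int(line.split()[1], 16)
            # Bit (n - 1) of the mask corresponds to signal number n.
            return [n for n in range(1, 65) if mask & (1 << (n - 1))]
    return []


def catches(status_text, signo):
    """True if the process has a handler installed for signal number signo."""
    return signo in caught_signals(status_text)


# Invented sample: mask 0x24 has bits 2 and 5 set, meaning signals 3 and 6.
SAMPLE_STATUS = (
    "Name:\tjava\n"
    "SigIgn:\t0000000000000000\n"
    "SigCgt:\t0000000000000024\n"
)

if __name__ == "__main__":
    print(caught_signals(SAMPLE_STATUS))
```

On Linux, signal 6 is SIGABRT, so running catches() on the real status text before and after writing the PID into the /proc entry should show whether the handler is gone.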

Download the source along with a helper test program here: stripsignal.tar.gz.

But if all you are interested in is reviewing the short source code, then feel free to browse the stripsignal.c source online.

I tell you what, the best thing I learned from the class was just familiarizing myself with the source code and actually learning some new emacs tricks for navigating large source code projects. Next writeup will be about my experiences with the GNU global tagging system.

July 31, 2006

Asynchronous Network Programming from an Admin Perspective

Filed under: adminstration, linux — jonEbird @ 9:08 pm

Asynchronous programming is commonplace for developers but it can often be a mysterious thing to system administrators who merely know enough programming to get by. Since the vast majority of material you will find on the subject is geared toward developers, it can easily go right over the heads of many administrators. This is for those administrators.

As a system admin myself, I tend to break applications down to the system call level while troubleshooting problems. That is where we live, day in day out, while keeping closed source applications up and running. The only window we have into the nature of proprietary applications are the system calls they make.

To me, asynchronous programming is nothing more than an exercise in using the select system call. You hand select three sets of file handles: those you want to read from, those you want to write to, and those to watch for exceptional conditions. It then blocks your program until at least one of them is ready, so you do not waste CPU cycles polling.
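To see select in isolation, here is a small Python sketch using a connected socket pair instead of a real network connection. The helper name readable and the timeouts are just choices for this example.

```python
import select
import socket


def readable(socks, timeout=0.5):
    """Return the sockets from socks that have data waiting to be read."""
    ready_to_read, _, _ = select.select(socks, [], [], timeout)
    return ready_to_read


# Two connected sockets, standing in for a client and a server.
a, b = socket.socketpair()

idle = readable([a], timeout=0)   # nothing has been written yet
b.sendall(b"ping")                # now give "a" something to read
ready = readable([a])
ready_count = len(ready)
data = a.recv(1024)

a.close()
b.close()
```

The first call returns an empty list because nothing is waiting. The second returns the socket once the other end has written to it.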

There are a handful of libraries out there to make asynchronous programming easy and painless. These libraries typically use the notion of registering a callback function to be executed when data is available for a particular file handle. This design lets you focus on the problem you are trying to solve while keeping your program readable. For you python coders, you should check out Twisted if you haven’t already. They make this sort of thing pathetically simple.

But what kind of tutorial would this be if I simply used a one-liner call from Twisted? No. Instead I’ll create my own version of the Twisted echo server, which is really what I wanted to demonstrate anyway.
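The callback idea can be sketched with nothing but Python’s standard selectors module. This is not how Twisted is used and is only meant to show the register-a-callback pattern. The names on_readable and received are invented for the example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []


def on_readable(sock):
    """Callback: runs when the registered socket has data waiting."""
    received.append(sock.recv(1024))


a, b = socket.socketpair()

# Register the callback as the "data" attached to this socket.
sel.register(a, selectors.EVENT_READ, on_readable)

b.sendall(b"hello")

# One pass of an event loop: look up and call the callback for each ready socket.
for key, _events in sel.select(timeout=1):
    callback = key.data
    callback(key.fileobj)

sel.unregister(a)
a.close()
b.close()
sel.close()
```

A real program would wrap the select call in a while loop. The point is that the loop itself never needs to know what each socket is for.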

Download asyn_echo.py


#!/usr/bin/env python

import select, socket

HOST = ''    # assumed: the value was lost from this listing; '' means all interfaces
PORT = 9999

client_fd = []     # every descriptor we hand to select()
fd_to_conn = {}    # descriptor -> connected socket object

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_fd = s.fileno()
client_fd.append(listen_fd)
s.bind((HOST, PORT))
s.listen(100)
s.settimeout(5)

while True:
    # Block until at least one descriptor has something to read
    r_fds, w_fds, e_fds = select.select(client_fd, [], [])
    for fd in r_fds:
        if fd == listen_fd:
            # The listening socket is readable: a new client is waiting
            conn, addr = s.accept()
            print('Accepted connection from %s:%d [fd: %d]' %
                  (addr[0], addr[1], conn.fileno()))
            fd_to_conn[conn.fileno()] = conn
            client_fd.append(conn.fileno())
        else:
            data = fd_to_conn[fd].recv(1024)
            if not data:
                # closed connection
                print('Goodbye')
                fd_to_conn[fd].close()
                del fd_to_conn[fd]
                client_fd.remove(fd)
            else:
                # sendall keeps writing until the whole buffer is sent
                fd_to_conn[fd].sendall(data)

I actually pitted my version against the Twisted version. I created a shell script which recursively called itself in a fork bomb style and then finally sent a 1500k PostScript file via netcat to the echo server. The most I sent at once was 64 netcat processes totaling approximately 60 megabytes. For the most part the two performed nearly equally. More importantly, they rely on the same select system call to efficiently process the data.

April 18, 2006

getopts morphed into a gui

Filed under: linux, usability — jonEbird @ 10:02 pm

Lately I’ve found myself trying to further learn web technology and have been
evaluating various web frameworks. The motivation is to become much more
proficient with the most widely accepted user interface: The web browser.

Being a traditional Unix/Linux user, when I think of widely understood,
standardized interfaces, I think of the good ‘ole shell. Just
run your unknown utility with a ‘-h’ or a ‘--help’ to get the impatient
user’s guide. That is really nice. A key piece of that consistency is how the options
are typically processed via the get-opts library.

Most people are familiar with the standard get-opts routines that are readily
available in your language of choice… from C to python. For those of you
unfamiliar with the get-opts routine, it is the library which makes all
command line options to a program standardized. That is why passing ‘-fp’ to
ps is the same as ‘-f -p’. What I admire about this library is
how successful it has been in unifying all sorts of utilities on Unix/Linux
for years. Heck, even tar came around after those early years!

So why don’t we take get-opts to the next level of convenience, usability and
power? I’ve already seen this trend start with python’s replacement of the
stock get-opts with optparse (http://www.python.org/doc/2.4/lib/module-optparse.html).
Can you guess where I’m going with this?
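For anyone who has not seen the clustering behavior in action, here is a quick Python sketch with the standard getopt module. The option string "fp:" (a bare -f flag plus a -p that takes a value) and the PID 1234 are made up for illustration and are not ps’s real option table.

```python
import getopt

# "fp:" declares -f as a plain flag and -p as an option that needs an argument.
OPTSTRING = "fp:"

# Clustered form, as in "-fp 1234"
clustered, _ = getopt.getopt(["-fp", "1234"], OPTSTRING)

# Spelled-out form, as in "-f -p 1234"
separate, _ = getopt.getopt(["-f", "-p", "1234"], OPTSTRING)

same_result = (clustered == separate)
```

Both calls parse to the same list of option and value pairs, which is exactly the consistency the library buys you.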

What if we supplied a bit more information than merely whether or not an
option takes an argument, and then used that extra information for
maximum power and usability? Imagine developing your latest utility and testing it via the command line, but then turning around and invoking it via a web browser, GTK window, TK window, etc. And why not?

In the simplest of implementations, a web page could provide radio buttons for
toggling various options as well as text fields for any additional options. I
envision a metamorphosis of information and interface options while staying easy to
use. After we get the basics working, we can start implementing more advanced
features, such as: grouping like options together in sections, allowing the
user to toggle the layout to alphabetical, searching options, keeping advanced options
hidden by default, remembering the last options used and much more! Think of what that’ll do for the Linux newbies!
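To make the idea a little more concrete, here is a rough Python sketch in which one table of option descriptions drives both a command line parser and a bare-bones HTML form. Everything in it is hypothetical: the OPTIONS table, its keys, the tool name mytool and the two functions are invented for this example, and it uses checkboxes where the text talks about radio buttons.

```python
import argparse
from html import escape

# Hypothetical option table: the "extra information" lives in kind and group.
OPTIONS = [
    {"flag": "--verbose", "kind": "toggle", "help": "print extra detail", "group": "Output"},
    {"flag": "--port", "kind": "text", "help": "port to inspect", "group": "Network"},
]


def build_parser(options):
    """Build an ordinary command line parser from the option table."""
    parser = argparse.ArgumentParser(prog="mytool")
    for opt in options:
        if opt["kind"] == "toggle":
            parser.add_argument(opt["flag"], action="store_true", help=opt["help"])
        else:
            parser.add_argument(opt["flag"], help=opt["help"])
    return parser


def build_form(options):
    """Build a minimal HTML form from the very same option table."""
    rows = []
    for opt in options:
        name = opt["flag"].lstrip("-")
        input_type = "checkbox" if opt["kind"] == "toggle" else "text"
        rows.append('<label>%s <input type="%s" name="%s"></label>'
                    % (escape(opt["help"]), input_type, escape(name)))
    return "<form>\n%s\n</form>" % "\n".join(rows)


if __name__ == "__main__":
    print(build_parser(OPTIONS).parse_args(["--verbose", "--port", "22"]))
    print(build_form(OPTIONS))
```

The group key is not used yet. It is there to show where features like sectioning or hiding advanced options could hook in later.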

I cannot currently
call myself a web developer and honestly I am not anxious to become one
either, yet I want to make some of my utilities available via the web. As it stands, I’d spend as much time exposing a utility via the web as I spent developing it in the first place! Are you
the same? Care to develop the next generation of get-opts?