November 13, 2007

Reverse Engineering Buddy

Filed under: adminstration,linux,usability — jonEbird @ 10:34 pm

An Idea for a helpful Admin Tool

What if you got a page and/or ticket for an obscure server’s particular service? The unique problem is that your environment is huge, you’re still relatively new to the company, co-workers are not there to help you and you have never heard of this server. When logging in, you’re hoping that the person has a nice RC script under /etc/init.d/, that you can find the app via a “lsof -i:<port>”, find the application’s home and locate some log files. But what if the application install was not that nice and did not conform to the norms that you are used to?

To either a small or very large degree, you will be reverse engineering this application. If you’re really unlucky, the application team who supports it also has no idea about it, nor knows anything about Unix-like machines. So, what if there was an application which had been polling the system and, upon your logging into the server, told you, “In case you are looking for the application binX, which typically listens on port XX, it was most likely started last time by issuing the script /path/to/funky/path/”? I’m guessing it would freak you out and immediately flood your emotions with confusion, gratitude and curiosity.

So, would such an application be difficult to write?

  • Watch for read/write/access events under key dirs, such as /etc/init.d/ and /etc/*conf (use the inotify interface, introduced in Linux kernel 2.6.13)
  • Track users logging into the system (could correlate later)
  • Watch for any new ports being listened on, then record the binary name.
  • Reverse engineer this application to automatically collect interesting data on it.
  • Intelligently parse an strace (note to self, checkout:
  • Utilize systemtap for Linux and DTrace for Solaris. pseudo code { observe new socket being opened, so show me the last 10 files opened and executed. correlate application with startup script }
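As a rough illustration of the "watch for new ports" item, here is a minimal sketch that assumes a Linux host and polls the text of /proc/net/tcp (IPv4 TCP only). The function names are made up for this example, and mapping a port back to its binary would still need something like lsof.

```python
TCP_LISTEN = "0A"  # state code the kernel prints for LISTEN in /proc/net/tcp


def listening_ports(proc_net_tcp_text):
    """Return the set of local TCP ports that are in the LISTEN state."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        # local_address looks like "0100007F:1F90" (hex address : hex port)
        ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def new_listeners(before, after):
    """Ports present in the second snapshot but not in the first."""
    return sorted(after - before)
```

A poller would read /proc/net/tcp every few seconds, call listening_ports() on the contents, and record whatever new_listeners() reports between two consecutive snapshots.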

Now, if your data was collected in an easily usable format, you can collect similar data from other machines and start to make broader correlations.

The whole idea is really about automating the process of reverse engineering an application. I do that a lot. I believe others would like an application which aided or performed the entire reverse engineering for them.

June 21, 2007

Stripping Another Process of its Signal Handler

Filed under: adminstration,linux — jonEbird @ 11:38 pm

Have you ever wanted to send a signal, which normally produces a core file, but the process has one of those annoying signal handlers set up to catch the signal you’re sending? The nerve of that application trying to intelligently handle signals! I actually have a real need to remove the signal handler of a process, which I’ll describe shortly. Normally, it is a bad idea to remove another process’s signal handler and under normal circumstances I do not suggest following the procedure which I am going to describe.

I have been struggling with a production issue at work with a process which has been less than cooperative. You see, I have a java process which goes crazy and starts consuming CPU cycles. When you run an strace against the process, the only system call you will see is sched_yield(). The java thread is most likely stuck on a spinlock in user space and the process/thread which owns the lock has died or something else, but all my runaway process cares about is checking for its lock and yielding execution back to the kernel to schedule another task. Of course, it just gets the CPU again and continues to pound it.

My company pays a lot of money for support and we have had a case open for eight months now. The problem is we are unable to gather sufficient data for their level2 and level3 support teams. They would like a javacore to be generated, which can be done by sending a signal 3 (SIGQUIT) to the java process. In addition to a javacore, they recommend sending a signal 11 (SIGSEGV) to the process to prompt the generation of a normal binary core file. Either one would be invaluable for the support team in ascertaining what is going wrong. Unfortunately, it seems that once the process is stuck in this tight sched_yield() loop, any signals we send to it are ignored. In short, that is my problem.

During my Linux Kernel Internals training with RedHat, I had the idea of writing a kernel module to strip the signal handler from the java process so I can finally generate that elusive core file. The kernel module sets up an entry under /proc named stripsignal_pid. If you read the value, it will give you a quick one-liner about using this interface. To use the module, you write a process ID into that file and that process’s SIGABRT signal handler will be reset to SIG_DFL. At this point, if you send a SIGABRT signal to the process, it will write out its core file.
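Driving that interface from a script might look roughly like the sketch below. It assumes the module is already loaded, that it accepts the PID in the same form `echo <pid>` would write it (with a trailing newline), and that you are running as root. The helper name strip_and_abort is invented for illustration and is not part of the module.

```python
import os
import signal

PROC_ENTRY = "/proc/stripsignal_pid"  # entry name as described in the post


def strip_and_abort(pid, proc_entry=PROC_ENTRY, kill=os.kill):
    """Ask the module to reset the target's SIGABRT handler, then send SIGABRT."""
    # Assumption: the module parses a decimal PID followed by a newline.
    with open(proc_entry, "w") as entry:
        entry.write("%d\n" % pid)
    # With the handler back at SIG_DFL, SIGABRT should end in a core dump
    # (subject to the usual core file size limits for that process).
    kill(pid, signal.SIGABRT)
```

The proc_entry and kill parameters exist only so the helper can be exercised against a scratch file without touching a real process.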

Download the source along with a helper test program here: stripsignal.tar.gz.

But if all you are interested in is reviewing the short source code, then feel free to browse the stripsignal.c source online.

I tell you what, the best thing I learned from the class was just familiarizing myself with the source code and actually learning some new emacs tricks for navigating large source code projects. Next writeup will be about my experiences with the GNU global tagging system.

January 15, 2007

Python Admin vs. Java Developers

Filed under: adminstration,usability — jonEbird @ 5:31 pm

What is the best programming language for a system administrator? Cue the language war, please. The typical arguments are “your language can’t do this”, “this library doesn’t have a consistent naming convention”, well “my language is faster”, yeah and “your syntax is hideous to read much less use”, blah blah blah. No, I’m not a professional developer but I do spend a significant amount of time doing development as a systems administrator. My programs are not huge year-long projects, will probably never reach a million lines of code and usually never need superb speed. For administrators, the most important aspects of the language of choice are productivity and maintainability.

When choosing your language, I recommend picking one that has a decent user community, is available on numerous platforms, has had significant time to mature and prove itself, and has extensive module/library support. Meeting these requirements will leave you using a language that should keep you efficiently producing solutions to your administrative tasks.

First let’s eliminate some languages based on maintainability. Goodbye Haskell, Lisp, Scheme, Erlang and any other functional languages you have used or know of. I’d venture to say that less than 2% of system administrators are comfortable using any one of those languages. And you obviously cannot choose a language which only you are going to be able to maintain. Aside from staying away from the obscure, the program should be intuitive to read. People can argue on the virtues of their favorite language and why it lends itself to writing maintainable code, but writing maintainable code is truly a skill. You can write obfuscated code in any language. It takes practice and a conscious effort to keep your code clean and well organized. Here, “practice makes perfect” is the key.

Secondly, and in my opinion the most important aspect of the language of choice, is staying efficient. Ideally, each program should be succinct and to the point. I no longer use C/C++ regularly, even though that’s the language I started with, because you simply have to write much more code for something another language can do in half the work or less. Try looking at the ‘P’s of the LAMP stack and see which one fits you better and which you can see yourself being productive in. That is, evaluate Python, Perl, PHP and Ruby (okay, not a ‘P’ but whatever). Don’t use a language that doesn’t make sense to you. Don’t waste your time.

And finally, time to explain this title and tell a little story where some customer data was delayed during a production incident. One day, messages were accidentally dequeued from an IBM WebSphere MQSeries queue. A tool which was used to grab just one message dequeued all of the messages. To top it all off, the same tool kept seg faulting while trying to requeue the same messages. The solution left to us was to manually parse out each of the discrete messages into separate files. Once in that state, we had another known tool which could upload the messages separately. There were three developers and myself on the phone and we were all racing to the solution. My language of choice was Python and the developers used the language that they use professionally, Java. So who reached the solution first? Well I wouldn’t be writing this if I hadn’t won, would I? For me, Python makes sense and I can efficiently write code which I like to think other people will be able to understand and update. That is what is most important for your language of choice.

[ As un-entertaining as it is, you can view the Python solution. ]

November 6, 2006

Scripting Best Practices

Filed under: adminstration,usability — jonEbird @ 6:35 pm

Nothing too fancy here. Just a list of the most common things I find desirable while writing shell scripts.

  1. Use meaningful variable names.
    This point is strictly for the sake of readability. Too often when trying to read somebody’s script I’ll actually do various search & replaces of their variables because they used names like “w”, “w2”, “w3”. It was quick and dirty for the author, but the inheritor of that script would appreciate it if you had used more meaningful variable names.

  2. Comment your code.
    This goes without saying, really…

  3. Visually separate any optional settings sections.
    Don’t know about you, but sometimes I get lazy and don’t feel like using getopts. Instead, I’ll put what would be optional arguments as hard-coded variables at the top of my script. I think this is fine, but you’ll want to visually segregate these optional variables from the rest of the script.
    I like to use a dashed line of about 50-70 characters and even put the words “do not modify beyond this point” to further emphasize what you’re encouraged to change and what shouldn’t normally be touched.

  4. Use relative pathing for accessing files relative to the script.
    Never assume the user’s cwd is the directory containing the script, and never rely on “./” to run or source another file. I’m actually surprised at how often this happens. I like to set a variable REL_DIR=`dirname "$0"` and use it to reference the directory where the script itself lives.
    E.g. You have a functions script you’d like to source, then with that REL_DIR variable you would “. ${REL_DIR}/<some-file>”.

  5. Always print a usage statement for improper usage and/or when the -h option is used.
    My code excerpt typically looks like:

    USAGE="Usage: `basename $0` <my options here>"
    if [ -z "$SOME_ARG" ]; then
        echo "$USAGE" 1>&2
        exit 1
    fi

  6. Conscientiously use STDOUT vs. STDERR in different scenarios.
    Not a script faux pas really, but it can help during the development process. Use STDOUT only for informational messages and/or optional debugging info. Then STDERR would only be used for errors. That way, when running the script you can optionally discard stdout (1>/dev/null; closing it with 1>&- can itself trigger write errors) and easily check that nothing was printed to STDERR. When the output is mixed you’ll have a greater chance of missing the error.
    One example of this technique in action is when using the tar command. Try leaving out the verbose (‘v’) option when creating or extracting your archive, then you can easily see when you might have had a permissions issue or something else related.

  7. Keep all required variables defined in the script.
    Define the required variables at the top of the script. Even mention that they are REQUIRED. A good example of this is scripts that use Sybase’s isql utility. Anytime I run isql, I like to set something like:

    # required variables for isql

    What you want to avoid is a situation where the script works only because the required variable happens to be set in your env by one of your dot files.

  8. Cron’ed Scripts.
    Two common principles I like to emphasize here:
    1. Keep all required variable/env settings in the script! cron does NOT source your dot files.
    2. Redirect stdout, but leave stderr unmanaged. This is a cheap technique, but whenever I don’t have time to test for all possible errors I simply set up my .forward file and let cron email me the output produced by the cron script. Though, to be complete, you should really manage your stderr in other fashions.

  9. Keep your exit/return codes categorized.
    Not always important for small scripts, but a good practice.
    For any sort of error checking your script might perform, use a unique exit code for each situation in which you decide to exit the shell script. That will make invocations of your script more manageable.

  10. Avoid “Magic” Numbers.
    Anywhere you are comparing a value to some seemingly arbitrary number, go ahead and assign that value to a meaningfully named variable. Then your comparison reads a lot better.
    Using “$CURRENT_VALUE -gt $THRESHOLD” is much better than finding “$CURRENT_VALUE -gt 83” buried in some script and not having any clue what the number 83 signifies aside from the surrounding code.

  11. Use unique temporary files.
    Never do this: /path/to/some/command -option > command.out.
    You are assuming that you are sitting in a directory where you have permissions to create a temporary file and secondly that no one will ever be running the same script at the same time you are.
    Many systems provide a mktemp command that makes creating temporary files easy. I typically employ a convention where I define my temporary file space as “TEMP=/tmp/.myshellname$$_”. Then let’s say I need a temp file to capture the output from ps. I might redirect it to ${TEMP}raw_ps.
    And finally, at the end of the script, or defined in a shell function, you can clean up each temporary file with one line: rm -f ${TEMP}*.
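Most of these habits carry over when a shell script grows into a Python one. The sketch below is only an illustration under assumptions: the exit codes, the THRESHOLD of 83 and the check_value name are invented for this example. It combines a usage message on STDERR, categorized exit codes, a named threshold and a unique temporary work area that is always cleaned up.

```python
import os
import shutil
import sys
import tempfile

# Categorized exit codes (values chosen for this example only).
EXIT_OK = 0
EXIT_USAGE = 1
EXIT_OVER_THRESHOLD = 2

THRESHOLD = 83  # hypothetical limit, named instead of buried as a magic number

USAGE = "Usage: %s <current-value>" % os.path.basename(sys.argv[0])


def check_value(argv):
    """Return an exit code; info goes to stdout, errors go to stderr."""
    if len(argv) != 1 or not argv[0].isdigit():
        sys.stderr.write(USAGE + "\n")
        return EXIT_USAGE
    current_value = int(argv[0])
    # mkdtemp() creates a uniquely named private directory, so two
    # concurrent runs never share scratch files.
    workdir = tempfile.mkdtemp(prefix="check_value_")
    try:
        with open(os.path.join(workdir, "raw_value"), "w") as scratch:
            scratch.write("%d\n" % current_value)
        sys.stdout.write("recorded value %d\n" % current_value)
        if current_value > THRESHOLD:
            sys.stderr.write("value %d exceeds threshold %d\n"
                             % (current_value, THRESHOLD))
            return EXIT_OVER_THRESHOLD
        return EXIT_OK
    finally:
        shutil.rmtree(workdir)  # one-line cleanup, even on errors
```

A caller would finish with sys.exit(check_value(sys.argv[1:])) so the categorized code reaches whoever invoked the script.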

In general, well written code/scripts should read well and be organized well. Every principle discussed above has one purpose: maintainability.

October 19, 2006

Automation in Three Phases

Filed under: adminstration — jonEbird @ 6:05 pm

My previous post talked about my passion for automation in administration work. My claim was that once you started automating your tasks you wouldn’t look back. So now, here is my proposal for how you might develop those wonderful habits. You develop your automation in three phases:

Phase I.

  This is your initial attempt at installing the software or conducting some other procedure. Keep an editor open and keep track of every explicit step you take… each change directory, file edit, useradd, chmod, limits update, kernel parameter. The finished document can double as an appendix section to your Disaster Recovery (DR) documentation detailing every explicit step and detail.

Phase II.

  The next phase doesn’t come along until you need to repeat the same procedure again. At this point, you retrieve your notes from the initial work. Chances are, there are slight differences with this iteration of the work, such as executing on another machine, using a different account, installing into a different directory, or using a different database. What you realize is that these differences are merely cosmetic and don’t really pertain to the piece of software, per se, but are instead mere environmental changes. When you notice this you suddenly realize that you can take that first document and create a script out of it. There is not much need to get fancy at this point. Start by creating variables for all of those environmental values and move those definitions to the top of your script. E.g. If in your first doc you executed a “useradd myapp1”, you’ll change it to setting “username=myapp1” and update the command to be “useradd $username”. That’s it. Again, no need to get fancy at this point.

Phase III.

  Just like the last phase, this phase doesn’t begin until you yet again need to perform the same task. This time you retrieve the script used in phase II, but before executing it you might observe some key improvements that can be made to the script. It is the same principle as in writing papers; you need to give yourself some time after writing your first draft before you’ll see the problems with the paper. In this case, hopefully you see some key areas which could use improvement, such as error checking or even abstracting the process further.

I have used this technique for nearly every application I have to install from the simplistic Apache to the very error prone Oracle RAC install. In each case, I have a script which performs each preparation step as well as the final install. The beauty of approaching your automation in three steps is that you can ease your way into it. The time between iterations can also help you realize key points in the final resultant script. My first Oracle RAC installation took days. Now, I can install a multi-node Oracle RAC cluster in a couple hours. Aside from the obvious speed benefit, I am also getting a consistent result. Consistency leads to predictability and predictability leads to easier, brain dead administration which is what we are really trying to accomplish. Right?

October 15, 2006

The Automation Mentality

Filed under: adminstration — jonEbird @ 5:11 pm

I am a lazy perfectionist who happens to make a living as a system administrator. I could use more adjectives to describe myself, but those two in particular are what drive me to automate as many tasks as I can. The lazy part of me gets pissed off when I have to do some mundane task against 100 machines and would rather seek a much more efficient method. And the perfectionist part of me gets annoyed when human error creeps in and contributes to inconsistent results across all of those 100 machines. Now, after several years of advocating autonomous procedures, I have become passionate about it.

Let’s explore human error further. The two most common cases of human error that I have witnessed are the accidental error, such as the classic fat-finger, and the incorrect interpretation of a procedure. I typically avoid the classic fat-finger problem by preparing my commands ahead of time and merely copy & pasting them into the terminal where I’m working. The incorrect interpretation of an instruction is a bit harder to avoid. At school, the professors would dictate the lab assignment in mathematical terminology. Many students complained about this and insisted that the professor explain the assignment in plain English. The professor explained that they didn’t write it out in plain English because the English language left too much to interpretation where the mathematical notation did not. I always thought that was quite slick, and after joining the corporate world, when I had to write documentation on how I configured a particular application, I would dictate explicit shell commands instead of explaining it in English, just like the professor. That way, my co-worker could not interpret my instructions any other way.

Aside from avoiding unnecessary errors, automating your work can make you more efficient, more consistent and an overall better collaborator in the team. Creating habits of automation starts with humble beginnings. Start with documenting your work with explicit accuracy. I invest a full hour of documenting my work when it would normally take only thirty minutes. But next time, the same task will take only 15 to 20 minutes. As soon as the total number of times you perform this task equals five or more, you are now saving time. It’s all about efficiency but with the truly great side effect of being consistent as well. Again, the trick is to make it a habit. Once you’ve acted through it a few times and seen the benefits of doing so, I doubt that you’ll go back to your old ways. Say you need to edit a config file to update a particular value. Don’t edit it with vi!
alias vi='echo "Hmmm... Lets try sed to edit"'
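For comparison, a scripted edit in Python could look something like the sketch below. It assumes a simple "key = value" style file, one setting per line, and set_config_value is a made-up helper name. A sed substitution would do the same job in one line.

```python
import re


def set_config_value(text, key, value):
    """Return text with every 'key = ...' line rewritten to use value."""
    # [ \t]* instead of \s* keeps the match from spilling onto the next line.
    pattern = re.compile(r"^([ \t]*%s[ \t]*=[ \t]*).*$" % re.escape(key),
                         re.MULTILINE)
    # A function replacement avoids backslash surprises in the new value.
    new_text, count = pattern.subn(lambda match: match.group(1) + value, text)
    if count == 0:
        raise KeyError("no setting named %r found" % key)
    return new_text
```

Because the edit is a function of the file's text, the same call can be replayed against every machine that needs the change, and a missing key fails loudly instead of being silently skipped.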

As a critic, you might suggest that simply executing a script removes the focus that is necessary to ensure a quality procedure. I disagree. Any oversights and/or mistakes can actually be easier to handle with an autonomous approach. First you fix your script to handle the oversight, then you create a mini-script with that fix/update and run it against every previous instance that you used this script against. There is little need to audit the previous instances since you know that your script was used in each iteration and therefore they all need the update. All in all, you again are saving time.

July 31, 2006

Asynchronous Network Programming from an Admin Perspective

Filed under: adminstration,linux — jonEbird @ 9:08 pm

Asynchronous programming is commonplace for developers but it can often be a mysterious thing to system administrators who merely know enough programming to get by. Since the vast majority of material you will find on the subject is geared towards developers, it can easily go right over the heads of many administrators. This is for those administrators.

As a system admin myself, I tend to break applications down to the system call level while troubleshooting problems. That is where we live, day in, day out, while keeping closed source applications up and running. The only window we have into the nature of proprietary applications is the system calls they make.

To me, asynchronous programming is nothing more than an exercise in using the select system call. You hand select three arrays of file handles: those you want to read from, those you want to write to, and those you want watched for exceptional conditions. It then blocks execution of your program until at least one of them is ready, so you do not waste CPU cycles polling, and tells you which ones are ready.
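To see that behavior in isolation, here is a small sketch using Python's select module and two connected socket pairs. Only one pair has data pending, so only that socket comes back as readable. The function names are invented for this example.

```python
import select
import socket


def readable_sockets(watched, timeout=1.0):
    """Block (up to timeout seconds) and return the sockets ready for reading."""
    readable, _writable, _errored = select.select(watched, [], [], timeout)
    return readable


def demo():
    quiet_a, quiet_b = socket.socketpair()
    busy_a, busy_b = socket.socketpair()
    try:
        busy_b.sendall(b"ping")  # only busy_a now has something to read
        ready = readable_sockets([quiet_a, busy_a])
        return [sock.recv(1024) for sock in ready]
    finally:
        for sock in (quiet_a, quiet_b, busy_a, busy_b):
            sock.close()
```

Calling demo() reads from busy_a only. quiet_a is never touched because select did not report it, which is what lets a single process juggle many connections without blocking on an idle one.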

There are a handful of libraries out there to make asynchronous programming easy and painless. These libraries typically use the notion of registering a callback function to be executed when data is available for a particular file handle. This design allows the programmer to focus on the problem they are trying to solve and keeps the program readable. For you python coders, you should check out Twisted if you haven’t already. They make this sort of thing pathetically simple.
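The callback-registration idea can be sketched in a few lines with Python's standard selectors module. This is not Twisted's API: MiniReactor and its methods are hypothetical names for illustration, and DefaultSelector may pick epoll or kqueue rather than select depending on the platform.

```python
import selectors
import socket


class MiniReactor:
    """Toy dispatcher: call a registered function when its socket is readable."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register(self, sock, callback):
        # The callback rides along as the 'data' attached to the registration.
        self._selector.register(sock, selectors.EVENT_READ, data=callback)

    def unregister(self, sock):
        self._selector.unregister(sock)

    def run_once(self, timeout=1.0):
        """Wait for events, dispatch each one, return how many fired."""
        events = self._selector.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)  # invoke the callback with its socket
        return len(events)
```

The program's own logic then lives entirely in the callbacks, for example a function that calls recv() on the socket it is handed and echoes the data back, while the loop that waits on file handles is written once.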

But what kind of tutorial would this be if I simply used a one-liner call from Twisted? No. Instead I’ll create my own version of the Twisted echo server which is really what I wanted to demonstrate anyways.


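#!/usr/bin/env python

import socket, select, sys

HOST = ''      # assumed: all interfaces (the definition is missing from the listing)
PORT = 9999

client_fd = []      # file descriptors of connected clients
fd_to_conn = {}     # maps a file descriptor back to its socket object

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_fd = s.fileno()
s.bind((HOST, PORT))
s.listen(5)

while True:
    # block until the listening socket or any client is ready to be read
    r_fds, w_fds, e_fds = select.select([listen_fd] + client_fd, [], [])
    for fd in r_fds:
        if fd == listen_fd:
            conn, addr = s.accept()
            print('Accepted connection from %s:%d [fd: %d]' %
                  (addr[0], addr[1], conn.fileno()))
            fd_to_conn[conn.fileno()] = conn
            client_fd.append(conn.fileno())
        else:
            data = fd_to_conn[fd].recv(1024)
            if not data:
                # closed connection
                print('Goodbye')
                fd_to_conn[fd].close()
                del fd_to_conn[fd]
                client_fd.remove(fd)
            else:
                # echo the data straight back to the sender
                fd_to_conn[fd].sendall(data)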

I actually pitted my version against the Twisted version. I created a shell script which recursively called itself in a fork bomb style, with each process finally sending a 1500k postscript file via netcat to the echo server. The most I sent at once was 64 netcat processes totaling approximately 60 megabytes. For the most part they performed nearly equally. More importantly, they rely on the same select system call to efficiently process the data.
