Virgin Post
Time will tell what this site will become… But I promise you this much: I won’t write crap just to fill in content.
Instead, this will merely be a news heading forum for all things being and to be hosted here.
Lately I’ve found myself trying to further learn web technology and have been
evaluating various web frameworks. The motivation is to become much more
proficient with the most widely accepted user interface: The web browser.
Being a traditional Unix/Linux user, when I think of widely understood,
standardized interfaces, I think of the good ‘ole shell. Just
run your unknown utility with a ‘-h’ or a ‘--help’ to get the impatient
user’s guide. That is really nice. A key piece of that consistency is how the options
are typically processed via the get-opts library.
Most people are familiar with the standard get-opts routines that are readily
available in your language of choice… from C to Python. For those of you
unfamiliar with the get-opts routine, it is the library which makes all
command line options to a program standardized. That is why passing ‘-fp’ to
ps is the same as ‘-f -p’. What I admire about this library is
how successful it has been in unifying all sorts of utilities on Unix/Linux
for years. Heck, even tar came around after those early years!
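To make the ‘-fp’ versus ‘-f -p’ point concrete, here is a minimal sketch using Python’s standard getopt module. The option string is an assumption made for illustration (a flag -f and an option -p that takes an argument); it is not taken from the real ps.

```python
import getopt

# Hypothetical option string: -f is a plain flag, -p expects an argument.
OPTSTRING = "fp:"

def parse(argv):
    """Return (options, remaining_args) for a list of command-line words."""
    return getopt.getopt(argv, OPTSTRING)

# The clustered form and the separated form parse to the same result.
clustered, _ = parse(["-fp", "1234"])
separate, _ = parse(["-f", "-p", "1234"])

print(clustered)
print(separate)
```

Both calls should yield the same list of (option, value) pairs, which is the consistency praised above.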
So why don’t we take get-opts to the next level of convenience, usability and
power? I’ve already seen this trend start with Python’s replacement of the
stock get-opts with optparse
(http://www.python.org/doc/2.4/lib/module-optparse.html).
Can you guess where I’m going with this?
What if we supplied a bit more information than merely whether or not an
option takes an argument, and then used that extra information for
maximum power and usability? Imagine developing your latest utility and testing it via the command line, but then turning around and invoking it via a web browser, GTK window, Tk window, etc. And why not?
In the simplest of implementations, a web page could provide radio buttons for
toggling various options as well as text fields for any additional options. I
envision a metamorphosis of information and interface options that stays easy to
use. After we get the basics working, we can start implementing more advanced
features, such as: grouping like options together in sections, allowing the
user to toggle the layout to alphabetical, searching options, keeping advanced options
hidden by default, remembering the last options used and much more! Think of what that’ll do for the Linux newbies!
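As a rough sketch of the idea, the snippet below describes each option once and then derives both a command-line parser and a bare HTML form from that one description. Everything here is an assumption for illustration: the Option class, the option names and the "myutil" program name are hypothetical, and checkboxes are used for the on/off options where the text mentions radio buttons.

```python
import argparse
import html
from dataclasses import dataclass

@dataclass
class Option:
    flag: str                  # e.g. "--verbose"
    description: str           # shown in --help and as the form label
    takes_value: bool = False  # False means a simple on/off switch

# Hypothetical options for an imaginary utility.
OPTIONS = [
    Option("--verbose", "Print extra detail"),
    Option("--output", "File to write results to", takes_value=True),
]

def build_parser(options):
    """Command-line front end generated from the option descriptions."""
    parser = argparse.ArgumentParser(prog="myutil")
    for opt in options:
        if opt.takes_value:
            parser.add_argument(opt.flag, help=opt.description)
        else:
            parser.add_argument(opt.flag, help=opt.description, action="store_true")
    return parser

def build_form(options):
    """Web front end generated from the very same descriptions."""
    rows = []
    for opt in options:
        name = html.escape(opt.flag.lstrip("-"))
        label = html.escape(opt.description)
        kind = "text" if opt.takes_value else "checkbox"
        rows.append(f'<label>{label} <input type="{kind}" name="{name}"></label>')
    return "<form>\n" + "\n".join(rows) + "\n</form>"

args = build_parser(OPTIONS).parse_args(["--verbose", "--output", "out.txt"])
form = build_form(OPTIONS)
print(args.verbose, args.output)
print(form)
```

Grouping, search and remembered defaults would just be more fields on the description; the point is that the extra metadata lives in one place.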
I cannot currently
call myself a web developer and honestly I am not anxious to become one
either, yet I want to make some of my utilities available via the web. As it stands, I’d spend as much time exposing a utility via the web as I spent developing it in the first place! Are you
the same? Care to develop the next generation of get-opts?
Today I turn 28 and I just received some well timed news from my place of work. Yesterday I was promoted to a Senior level administrator after three years of service.
I started with the company in June of 2000 while still attending Ohio State University and I worked as an intern for over three years before I was finally hired full time. Although my company is not huge, it is profitable and has been in the computer industry for over 30 years. I always thought I would have to move around to truly gain broad experience for my resume, but instead my company is remarkable in how much it changes. As a result, I get a stable working position while helping myself to some welcome experience.
As an intern, I was doing development work primarily in C/C++, SQL and shell scripting. I also had a side job at the University helping people out in the lab, which helped fuel my continued interest in administration. When I decided to shift to an administration role, I spent quite a bit of time just reading books at the bookstore, buying some, checking some out from the library, but mostly just reading them while enjoying some coffee in the cafe. The career development goal was simple: learn as much as you can in the field of administration and the operating systems you are supporting. Now that I’ve seemingly achieved that goal, as seen by my promotion, what should my development goal be now?
What does a lead administrator do that a senior administrator just isn’t up for? I am sure I can read the official job description from my HR department, but the last time I looked it was severely out of date. The road from entry level or associate to Senior is one that is primarily technical, at least in my field, where you are merely demonstrating that you are proficient in handling challenges, communicate well, work well with others and make your deadlines. The next level, I suspect, will require me to venture out into the areas where nerds are not comfortable going.
A lead administrator is involved in budget discussions, aids in design decisions along with the architects, and understands the business justifications and ramifications of technical decisions. Honestly, not all of these aspects are that appealing to me. Instead, when I have free time of my own to explore interesting, work related topics, it almost always involves some sort of programming. If I’m not creating some mini utility that I’ve really wanted to have, I’m reading some book or blog about programming, checking out the latest language war threads or even evaluating a popular language I’ve yet to mess around with.
My ideal job is one where I do not lose my root access, get to play with really cool technology, am looked to for technical and architectural decisions, and get to frequently write supporting programs to further our administration group, monitoring teams, database administrators and even the development staff. The latest utility I developed at work abstracts all of our add-on programs’ init scripts into a new schema in LDAP so they can be centrally administered. The problem is I do not know of an official job that matches my 50/50 mix of administration and development.
In conclusion, I’m going to take it easy while celebrating my b-day and just revel in my promotion for a while. As for the coming years, I like to flirt with the idea of launching a new startup company. I can develop my own software and still handle the administration of our machines. Now if I could just grow the balls to do so…
Asynchronous programming is commonplace for developers but it can often be a mysterious thing to system administrators who merely know enough programming to get by. Since the vast majority of material you will find on the subject is geared towards developers, it can easily go right over the heads of many administrators. This is for those administrators.
As a system admin myself, I tend to break applications down to the system call level while troubleshooting problems. That is where we live, day in day out, while keeping closed source applications up and running. The only window we have into the nature of proprietary applications is the system calls they make.
To me, asynchronous programming is nothing more than an exercise in using the select system call. The select system call takes lists of file handles to watch and reports which of them are ready to be read from, ready to be written to, or have an exceptional condition pending. It also blocks execution of your program until one of them is ready, so as not to waste CPU cycles.
There are a handful of libraries out there to make asynchronous programming easy and painless. These libraries typically use the notion of registering a callback function to be executed when data is available for a particular file handle. This design allows the programmer to focus on the problem they are trying to solve while keeping the program readable. For you Python coders, you should check out Twisted if you haven’t already. They make this sort of thing pathetically simple.
But what kind of tutorial would this be if I simply used a one-liner call from Twisted? No. Instead I’ll create my own version of the Twisted echo server which is really what I wanted to demonstrate anyways.
#!/usr/bin/env python
import select
import socket

HOST = ''    # listen on all interfaces
PORT = 9999

client_fd = []     # every descriptor we hand to select()
fd_to_conn = {}    # descriptor -> connected socket object

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(100)
s.settimeout(5)
listen_fd = s.fileno()
client_fd.append(listen_fd)

while True:
    # block here until at least one descriptor is readable
    r_fds, w_fds, e_fds = select.select(client_fd, [], [])
    for fd in r_fds:
        if fd == listen_fd:
            # the listening socket is readable: a new client is waiting
            conn, addr = s.accept()
            print('Accepted connection from %s:%d [fd: %d]' %
                  (addr[0], addr[1], conn.fileno()))
            fd_to_conn[conn.fileno()] = conn
            client_fd.append(conn.fileno())
        else:
            data = fd_to_conn[fd].recv(1024)
            if not data:
                # closed connection
                print('Goodbye')
                fd_to_conn[fd].close()
                del fd_to_conn[fd]
                client_fd.remove(fd)
            else:
                # echo the data straight back to the sender
                fd_to_conn[fd].sendall(data)
I actually pitted my version against the Twisted version. I created a shell script which recursively called itself in a fork bomb style, with each process finally sending a 1500k PostScript file via netcat to the echo server. The most I sent at once was 64 netcat processes totaling approximately 60 megabytes. For the most part the two performed nearly equally. More importantly, they rely on the same select system call to efficiently process the data.
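For comparison, the callback-registration style mentioned earlier can be sketched with nothing but Python’s standard selectors module. This is not Twisted’s API, just a small illustration of the same idea; the on_readable function and the use of a socket pair in place of a real network client are assumptions made to keep the example self-contained. Note that DefaultSelector picks whatever mechanism the platform offers, which may be epoll or kqueue rather than select itself.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []

def on_readable(conn):
    """Callback: echo whatever arrived on conn back to the sender."""
    data = conn.recv(1024)
    if data:
        received.append(data)
        conn.sendall(data)
    else:
        # empty read means the peer closed the connection
        sel.unregister(conn)
        conn.close()

# A connected socket pair stands in for a real client/server connection.
server_side, client_side = socket.socketpair()

# Register the callback to run whenever server_side has data to read.
sel.register(server_side, selectors.EVENT_READ, data=on_readable)

client_side.sendall(b"hello")

# One turn of the event loop: block until something is ready, run its callback.
for key, _events in sel.select(timeout=1):
    key.data(key.fileobj)

echoed = client_side.recv(1024)
print(echoed)

sel.unregister(server_side)
server_side.close()
client_side.close()
sel.close()
```

A real server would wrap the sel.select() call in a while loop and register the listening socket with an accept callback, exactly mirroring the hand-written version above.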
I am a lazy perfectionist who happens to make a living as a system administrator. I could use more adjectives to describe myself, but those two in particular are what drive me to automate as many tasks as I can. The lazy part of me gets pissed off when I have to do some mundane task against 100 machines and seeks a much more efficient method instead. And the perfectionist part of me gets annoyed when human error creeps in and contributes to inconsistent results across all of those 100 machines. Now, after several years of advocating automated procedures, I have become passionate about it.
Let’s explore human error further. The two most common cases of human error that I have witnessed are the accidental error, such as the classic fat-finger, and the incorrect interpretation of a procedure. I typically avoid the classic fat-finger problem by preparing my commands ahead of time and merely copying and pasting them into the terminal where I’m working. The incorrect interpretation of an instruction is a bit harder to avoid. At school, the professors would dictate the lab assignment in mathematical terminology. Many students complained about this and insisted that the professor explain the assignment in plain English. The professor explained that they didn’t write it out in plain English because the English language left too much to interpretation where the mathematical notation did not. I always thought that was quite slick, and after joining the corporate world, whenever I had to write documentation on how I configured a particular application, I would dictate explicit shell commands instead of explaining it in English, just like the professor. That way, my co-worker could not interpret my instructions any other way.
Aside from avoiding unnecessary errors, automating your work can make you more efficient, more consistent and an overall better collaborator in the team. Creating habits of automation starts with humble beginnings. Start with documenting your work with explicit accuracy. I invest a full hour in doing and documenting a task that would normally take only thirty minutes. But next time, the same task will take only 15 to 20 minutes. As soon as the total number of times you perform this task equals five or more, you are saving time. It’s all about efficiency, but with the truly great side effect of being consistent as well. Again, the trick is to make it a habit. Once you’ve acted through it a few times and seen the benefits of doing so, I doubt that you’ll go back to your old ways. Say you need to edit a config file to update a particular value. Don’t edit it with vi!
alias vi='echo "Hmmm… Lets try sed to edit"'
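The text reaches for sed here; purely as an illustration of the same "script the edit instead of typing it" habit, below is a hedged Python sketch. The set_config_value helper, the key=value file format and the setting names are all assumptions, not anything from a real application.

```python
import os
import re
import tempfile

def set_config_value(path, key, value):
    """Rewrite 'key = ...' lines in a config file, leaving other lines alone."""
    pattern = re.compile(r"^(\s*" + re.escape(key) + r"\s*=\s*).*$")
    with open(path) as handle:
        lines = handle.read().splitlines()
    # Keep everything up to and including the '=' and swap in the new value.
    updated = [pattern.sub(lambda m: m.group(1) + value, line) for line in lines]
    with open(path, "w") as handle:
        handle.write("\n".join(updated) + "\n")
    return updated

# Demonstration against a throwaway file with made-up settings.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as demo:
    demo.write("max_connections = 100\nlog_level = info\n")

result = set_config_value(demo.name, "max_connections", "250")
os.remove(demo.name)
print(result)
```

Because the change is a function call rather than keystrokes, the identical edit can be replayed on every machine that needs it.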
As a critic, you might suggest that simply executing a script removes the focus that might be necessary to ensure a quality procedure. I disagree. Any oversights and/or mistakes can actually be easier to handle with an automated approach. First you fix your script to handle the oversight, then you create a mini-script with that fix/update and run it against every previous instance that you used this script against. There is little need to audit the previous instances since you know that your script was used in each iteration and therefore they all need the update. All in all, you again are saving time.
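The earlier break-even claim (an hour up front, then 15 to 20 minutes per repeat, versus thirty minutes by hand every time) can be checked with a few lines of arithmetic. The sketch assumes the pessimistic end of the range, 20 minutes per scripted repeat.

```python
DOCUMENTED_FIRST_RUN = 60   # minutes: doing the task once while documenting it
MANUAL_RUN = 30             # minutes: doing the task by hand, every time
SCRIPTED_RUN = 20           # minutes: assumed upper end of the 15 to 20 range

def manual_total(runs):
    """Total minutes if the task is always done by hand."""
    return MANUAL_RUN * runs

def documented_total(runs):
    """Total minutes if the first run is documented and later runs reuse it."""
    return DOCUMENTED_FIRST_RUN + SCRIPTED_RUN * (runs - 1)

for runs in range(1, 7):
    print(runs, manual_total(runs), documented_total(runs))

# First run count at which the documented approach is strictly cheaper.
break_even = next(n for n in range(1, 100)
                  if documented_total(n) < manual_total(n))
print(break_even)
```

With these numbers the two approaches tie at four runs and the documented one pulls ahead on the fifth; at 15 minutes per repeat it pulls ahead one run sooner.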
My previous post talked about my passion for automation in administration work. My claim was that once you started automating your tasks you wouldn’t look back. So now, here is my proposal for how you might develop those wonderful habits. You develop your automation in three phases:
Phase I. This is your initial attempt at installing the software or conducting some other procedure. Keep an editor open and keep track of every explicit step you take… each change directory, file edit, useradd, chmod, limits update, kernel parameter. The finished document can double as an appendix section to your Disaster Recovery (DR) documentation detailing every explicit step and detail.
Phase II. The next phase doesn’t come along until you need to repeat the same procedure again. At this point, you retrieve your notes from the initial work. Chances are, there are slight differences with this iteration of the work, such as executing on another machine, using a different account, installing into a different directory, or using a different database. What you realize is that these differences are merely cosmetic and don’t really pertain to the piece of software, per se, but are instead mere environmental changes. When you notice this you suddenly realize that you can take that first document and create a script out of it. There is not much need to get fancy at this point. Start by creating variables for all of those environmental values and move those definitions to the top of your script. E.g. if in your first doc you executed a "useradd myapp1", you’ll change it to setting "username=myapp1" and update the command to be "useradd $username". That’s it. Again, no need to get fancy at this point.
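The move described above is a shell one, but the same hoisting of environmental values can be sketched in Python. The names, paths and the three commands below are hypothetical stand-ins for whatever your own notes contain, and the sketch only prints the commands rather than running them.

```python
# Environmental values, hoisted to the top so the next install only edits these.
# Both values are hypothetical.
USERNAME = "myapp1"
INSTALL_DIR = "/opt/myapp1"

def build_commands(username, install_dir):
    """Return the documented steps as argument lists with the values filled in."""
    return [
        ["useradd", username],
        ["mkdir", "-p", install_dir],
        ["chown", username, install_dir],
    ]

commands = build_commands(USERNAME, INSTALL_DIR)
for command in commands:
    # Dry run only: print each step. To really execute, one could swap in
    # subprocess.run(command, check=True) once the output looks right.
    print(" ".join(command))
```

Repeating the procedure for another account or directory then means changing two lines at the top and nothing else.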
Phase III. Just like the last phase, this phase doesn’t begin until you yet again need to perform the same task. This time you retrieve the script used in phase II, but before executing it you might observe some key improvements that can be made to the script. It is the same principle as in writing papers; you need to give yourself some time after writing your first draft before you’ll see the problems with the paper. In this case, hopefully you see some key areas which can use some improvements, such as error checking or even abstracting the process even further.
I have used this technique for nearly every application I have to install, from the simple Apache to the very error prone Oracle RAC install. In each case, I have a script which performs each preparation step as well as the final install. The beauty of approaching your automation in three steps is that you can ease your way into it. The time between iterations can also help you realize key points in the final resultant script. My first Oracle RAC installation took days. Now, I can install a multi-node Oracle RAC cluster in a couple of hours. Aside from the obvious speed benefit, I am also getting a consistent result. Consistency leads to predictability and predictability leads to easier, brain dead administration which is what we are really trying to accomplish. Right?
Nothing too fancy here. Just a list of the most common things I find desirable while writing shell scripts.
This point is strictly for the sake of readability. Too often when trying to read somebody’s script I’ll actually do various search & replaces of their variables because they used variables like "w", "w2", "w3". It was quick and dirty for the author, but the inheritor of that script would appreciate it if you had used more meaningful variable names.
This goes without saying, really…
Don’t know about you, but sometimes I get lazy and don’t feel like using getopts. Instead, I’ll put what would be optional arguments into hard coded variables at the top of my script. I think this is fine, but you’ll want to visually segregate these optional variables from the rest of the script.
I like to use a dashed line (---) of about 50-70 characters and even put the words "do not modify beyond this point" to further emphasize what you’re encouraged to change and what shouldn’t normally be touched.
Never assume the user’s cwd is the script’s directory and then use "./" to run or source another file. I like to set a variable REL_DIR=`dirname "$0"` and use it to reference the directory where the script itself lives.
E.g. if you have a functions script you’d like to source, then with that REL_DIR variable you would ". ${REL_DIR}/<some-file>".
I’m actually surprised at how often this happens.
My code excerpt typically looks like:
USAGE="Usage: `basename $0` <my options here>"
if [ -z "$SOME_ARG" ]; then
    echo "$USAGE" 1>&2
    exit 1
fi
Not a script faux pas really, but it can help during the development process. Use STDOUT only for informational messages and/or optional debugging info. Then STDERR would only be used for errors. That way, when running the script you can optionally turn off stdout (1>&-) and easily check that nothing was printed to STDERR. When the output is mixed you’ll have a greater chance of missing the error.
One example of this technique in action is when using the tar command. Try leaving out the verbose ('v') option when creating or extracting your archive, then you can easily see when you might have had a permissions issue or something else related.
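The same discipline carries over to other languages. Here is a small Python sketch of it; the info, error and process functions and the sample items are hypothetical and exist only to show informational output and errors going to separate streams.

```python
import sys

def info(message):
    """Informational output goes to stdout."""
    print(message, file=sys.stdout)

def error(message):
    """Errors go to stderr, and only to stderr."""
    print("ERROR: " + message, file=sys.stderr)

def process(items):
    """Hypothetical task: report each item, flag empty ones as errors."""
    failures = 0
    for item in items:
        if item:
            info("processed " + item)
        else:
            error("empty item skipped")
            failures += 1
    return failures

failures = process(["alpha", "", "beta"])
```

Running such a script with stdout redirected to /dev/null leaves only the error lines on the terminal, which is the quick check described above.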
Define the required variables at the top of the script. Even mention that they are REQUIRED. A good example of this is scripts that use Sybase’s isql utility. Anytime I run isql, I like to set something like:
# required variables for isql
SYBASE=/some/path/to/sybase
LD_LIBRARY_PATH=$SYBASE/lib
What you want to avoid is a situation where the script works because you’ve got the required variable set in your env, but only because it’s set in one of your dot files.
Two common principles I like to emphasize here:
1. Keep all required variable/env settings in the script! cron does NOT source your dot files.
2. Redirect stdout, but leave stderr unmanaged. This is a cheap technique, but whenever I don’t have time to test for all possible errors I simply set up my .forward file and let cron email me the output produced by the cron script. Though, to be complete, you should really manage your stderr in other fashions.
Not always important for small scripts, but a good practice.
For any sort of error checking your script might perform, use a unique error code for each situation that you decide to exit the shell script. That will make invocations of your script more manageable.
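A minimal sketch of the one-exit-code-per-failure idea, written in Python rather than shell. The specific codes and the check function are hypothetical; what matters is only that each failure maps to its own value so a caller can tell them apart.

```python
import os

# Hypothetical exit codes: one distinct value per reason for giving up.
EXIT_OK = 0
EXIT_USAGE = 1
EXIT_MISSING_FILE = 2
EXIT_NOT_READABLE = 3

def check(argv):
    """Return the exit code this script would finish with for these arguments."""
    if len(argv) != 1:
        return EXIT_USAGE
    path = argv[0]
    if not os.path.exists(path):
        return EXIT_MISSING_FILE
    if not os.access(path, os.R_OK):
        return EXIT_NOT_READABLE
    return EXIT_OK

# In a real script the result would be handed to sys.exit().
print(check([]))
print(check(["/no/such/path/for/this/example"]))
```

A wrapper script or a cron job can then branch on the code instead of parsing error text.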
Anywhere you are comparing a value to some seemingly arbitrary number, go ahead and assign that number to a variable with a meaningful name. Then your comparison reads a lot better.
Using “$CURRENT_VALUE -gt $THRESHOLD” is much better than finding “$CURRENT_VALUE -gt 83” buried in some script and not having any clue what the number 83 signifies aside from the surrounding code.
Never do this: /path/to/some/command -option > command.out.
You are assuming that you are sitting in a directory where you have permissions to create a temporary file and secondly that no one will ever be running the same script at the same time you are.
Some systems make creating temporary files easy with commands such as mktemp. I typically employ a convention where I define my temporary file space as "TEMP=/tmp/.myshellname$$_". Then let’s say I need a temp file to capture the output from ps. I might redirect it to ${TEMP}raw_ps.
And finally, at the end of the script, or defined in a shell function, you can clean up each temporary file with one line: rm -f ${TEMP}*.
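For scripts written in Python, the standard tempfile module covers the same ground as the convention above. The sketch below is only an analogue: the "myshellname." prefix and the raw_ps file name are borrowed from the example, and the placeholder text stands in for real command output.

```python
import os
import shutil
import tempfile

# A private scratch directory with an unpredictable name, so two runs of the
# same script at the same time cannot step on each other's files.
scratch = tempfile.mkdtemp(prefix="myshellname.")
try:
    raw_ps = os.path.join(scratch, "raw_ps")
    with open(raw_ps, "w") as handle:
        # placeholder for output captured from a command such as ps
        handle.write("captured output would go here\n")
    created = os.path.exists(raw_ps)
finally:
    # one call removes every temporary file, even if the work above failed
    shutil.rmtree(scratch)

print(created, os.path.exists(scratch))
```

Putting the cleanup in a finally block plays the role of the single rm -f line at the end of the shell script.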
In general, well written code/scripts should read well and be organized well. Every principle discussed above has one purpose: maintainability.
What is the best programming language for a system administrator? Cue the language war, please. The typical arguments are “your language can’t do this”, “this library doesn’t have a consistent naming convention”, well “my language is faster”, yeah and “your syntax is hideous to read much less use”, blah blah blah. No, I’m not a professional developer but I do spend a significant amount of time doing development as a systems administrator. My programs are not huge year long projects, will probably never reach a million lines of code and usually never need superb speed. For administrators, the most important aspects of the language of choice are productivity and maintainability.
When choosing your language, I recommend picking one that has a decent user community, is available on numerous platforms, has had significant time to mature and prove itself, and has extensive module/library support. Meeting these requirements will leave you using a language that should keep you efficiently producing solutions to your administrative tasks.
First let’s eliminate some languages based on maintainability. Goodbye Haskell, Lisp, Scheme, Erlang and any other functional languages you have used or know of. I’d venture to say that less than 2% of system administrators are comfortable using any one of those languages. And you obviously cannot choose a language which only you are going to be able to maintain. Aside from staying away from the obscure, the program should be intuitive to read. People can argue on the virtues of their favorite language and why it lends itself to writing maintainable code, but writing maintainable code is truly a skill. You can write obfuscated code in any language. It takes practice and a conscious effort to keep your code clean and well organized. Here, “practice makes perfect” is the key.
Secondly, and in my opinion the most important aspect of the language of choice, is staying efficient. Ideally, each program should be succinct and to the point. I no longer use C/C++ regularly, even though that’s the language I started with, because you simply have to write much more code for something another language can do in half the work or less. Try looking at the ‘P’s of the LAMP stack and see which fits you better and which you can see yourself being productive in. That is, evaluate Python, Perl, PHP and Ruby (okay, not a ‘P’ but whatever). Don’t use a language that doesn’t make sense to you. Don’t waste your time.
And finally, time to explain this title and tell a little story where some customer data was delayed during one day’s production incident. One day, we had a production issue where messages were accidentally dequeued from an IBM WebSphere MQSeries queue. A tool which was used to grab just one message dequeued all of the messages. To top it all off, the same tool kept seg faulting while trying to requeue the same messages. The solution left to us was to manually parse out each of the discrete messages into separate files. Once in that state, we had another known tool which could upload the messages separately. There were three developers and myself on the phone and we were all racing to the solution. My language of choice was Python and the developers used the language that they use professionally, Java. So who reached the solution first? Well I wouldn’t be writing this if I hadn’t won, would I? For me, Python makes sense and I can efficiently write code which I like to think other people will be able to understand and update. That is what is most important for your language of choice.
[ As un-entertaining as it is, you can view the Python solution. ]
Have you ever wanted to send a signal which normally produces a core file, but the process has one of those annoying signal handlers set up to catch the signal you’re sending? The nerve of that application trying to intelligently handle signals! I actually have a real need to remove the signal handler of a process, which I’ll describe shortly. Normally, it is a bad idea to remove another process’s signal handler and under normal circumstances I do not suggest following the procedure which I am going to describe.
I have been struggling with a production issue at work with a process which has been less than cooperative. You see, I have a Java process which goes crazy and starts consuming CPU cycles. When you run strace against the process the only system call you will see is a sched_yield() call. The Java thread is most likely stuck on a spinlock in user space and the process/thread which owns the lock has died or something else, but all my runaway process cares about is checking for its lock and yielding execution back to the kernel to schedule another task. Of course, it just gets the CPU again and continues to pound it.
My company pays a lot of money for support and we have actually had a case open for eight months now. The problem is we are unable to gather sufficient data for their level2 and level3 support teams. They would like a javacore to be generated, which can be done by sending a signal 3 to the Java process. In addition to a javacore, they recommend sending a signal 11 (SEGV) to the process to prompt the generation of a normal binary core file. Either one would be invaluable for the support team in ascertaining what is going wrong. Unfortunately, it seems that once the process is stuck in this tight sched_yield() loop, any signals we send to it are ignored. In short, that is my problem.
During my Linux Kernel Internals training with RedHat, I had the idea of writing a kernel module to strip the signal handler from the Java process so I can finally generate that elusive core file. The kernel module sets up an entry under /proc named stripsignal_pid. If you read the value, it will give you a quick one-liner about using this interface. To use the module, you write a process ID into that file and that process’s SIGABRT signal handler will be reset to SIG_DFL. At this point, if you send a SIGABRT signal to the process, the process will write out its core file.
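To show what "reset to SIG_DFL" means, here is a small Python sketch. It is only an in-process analogue: the kernel module described above changes another process’s handler from the outside, which this snippet does not attempt. The noisy_handler function is hypothetical.

```python
import signal

def noisy_handler(signum, frame):
    """A handler of the kind that swallows the signal instead of dumping core."""
    print("caught signal", signum)

# Install a custom handler, as the uncooperative application would.
signal.signal(signal.SIGABRT, noisy_handler)
installed = signal.getsignal(signal.SIGABRT)

# Put the default disposition back. For SIGABRT the default action is to
# terminate the process, producing a core file where core dumps are enabled.
signal.signal(signal.SIGABRT, signal.SIG_DFL)
restored = signal.getsignal(signal.SIGABRT)

print(installed is noisy_handler, restored == signal.SIG_DFL)
```

After the second call a SIGABRT sent to this process would no longer reach noisy_handler, which is the state the module forces on its target.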
Download the source along with a helper test program here: stripsignal.tar.gz.
But if all you are interested in is reviewing the short source code, then feel free to browse the stripsignal.c source online.
I tell you what, the best thing I learned from the class was just familiarizing myself with the source code and actually learning some new emacs tricks for navigating large source code projects. Next writeup will be about my experiences with the GNU global tagging system.
An Idea for a helpful Admin Tool
What if you got a page and/or ticket for a particular service on an obscure server? The unique problem is that your environment is huge, you’re still relatively new to the company, co-workers are not there to help you and you have never heard of this server. When logging in, you’re hoping that the person has a nice RC script under /etc/init.d/, that you can find the app via an “lsof -i:<port>”, find the application’s home and locate some log files. But what if the application install was not that nice and did not conform to the norms that you are used to?
To either a small or very large degree, you will be reverse engineering this application. If you’re really unlucky, the person who supports it also has no idea about it and knows nothing about Unix-like machines. So, what if there was an application which, upon your logging into the server, told you, “In case you are looking for the application binX, which typically listens on port XX, it was most likely started last time by issuing the script /path/to/funky/path/binX.sh”? I’m guessing it would freak you out and immediately flood your emotions with confusion, gratitude and curiosity.
So, would such an application be difficult to write?
Now, if your data was collected in an easily usable format, you can collect similar data from other machines and start to make broader correlations.
The whole process is really about automating the reverse engineering of an application. I do that a lot. I believe others would like an application which aided with or performed the entire reverse engineering for them.