jonEbird

July 31, 2006

Asynchronous Network Programming from an Admin Perspective

Filed under: administration, linux | jonEbird @ 9:08 pm

Asynchronous programming is commonplace for developers, but it can be a mysterious thing to system administrators who know just enough programming to get by. Because most of the material on the subject is geared toward developers, it can easily go right over administrators' heads. This is for those administrators.

As a system admin myself, I tend to break applications down to the system call level while troubleshooting. That is where we live, day in and day out, keeping closed-source applications up and running: the only window we have into a proprietary application is the set of system calls it makes.

To me, asynchronous programming is nothing more than an exercise in using the select system call. select takes three sets of file handles to watch: those you want to read from, those you want to write to, and those that might have an exceptional condition. It blocks your program until at least one of them is ready (or an optional timeout expires), so waiting doesn't waste CPU cycles.
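Here is a small sketch of those semantics in Python. It uses socket.socketpair() (two sockets already connected to each other) as a stand-in for a real client connection, an assumption made only so the example runs without a network peer. The first select call has nothing to report and returns empty lists once its 0.1-second timeout expires; after the other end writes, the same socket shows up as readable.

```python
import select
import socket

# A connected socket pair stands in for a client connection.
a, b = socket.socketpair()

# Nothing has been sent yet, so select() waits out its 0.1 s timeout
# and returns three empty lists.
readable, writable, exceptional = select.select([b], [], [], 0.1)
print(readable, writable, exceptional)  # [] [] []

# After the other end sends, b is reported as readable right away.
a.sendall(b"ping")
readable, _, _ = select.select([b], [], [], 0.1)
print(readable == [b])  # True

data = b.recv(1024)
print(data)             # b'ping'

a.close()
b.close()
```

Omitting the timeout argument makes select wait indefinitely, which is how the echo server uses it.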

There are a handful of libraries out there that make asynchronous programming easy and painless. They typically let you register a callback function to be run when data is available on a particular file handle. This design lets you focus on the problem you are trying to solve and keeps your program readable. For you Python coders, check out Twisted if you haven't already; it makes this sort of thing pathetically simple.
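The register-a-callback idea can be sketched on top of select in a few dozen lines. The class MiniReactor and its methods add_reader, remove_reader, and run_once are hypothetical names invented for this illustration; it is a toy showing the general pattern, not Twisted's API or implementation. A socket pair again stands in for a real client.

```python
import select
import socket


class MiniReactor:
    """Hypothetical toy reactor: maps sockets to read callbacks and
    dispatches them from a single select() call (not Twisted's API)."""

    def __init__(self):
        self.readers = {}  # socket -> callback taking that socket

    def add_reader(self, sock, callback):
        self.readers[sock] = callback

    def remove_reader(self, sock):
        self.readers.pop(sock, None)

    def run_once(self, timeout=None):
        """Wait for readable sockets, then invoke each one's callback."""
        if not self.readers:
            return
        ready, _, _ = select.select(list(self.readers), [], [], timeout)
        for sock in ready:
            callback = self.readers.get(sock)
            if callback is not None:  # an earlier callback may have removed it
                callback(sock)


# Usage: a handler that records incoming data and unregisters on EOF.
received = []


def on_readable(sock):
    data = sock.recv(1024)
    if data:
        received.append(data)
    else:  # peer closed its end
        reactor.remove_reader(sock)
        sock.close()


reactor = MiniReactor()
a, b = socket.socketpair()
reactor.add_reader(b, on_readable)

a.sendall(b"hello")
reactor.run_once(timeout=1.0)
print(received)         # [b'hello']

a.close()               # hang up; b now reads as end-of-file
reactor.run_once(timeout=1.0)
print(reactor.readers)  # {}
```

A real framework would call run_once in an endless loop, watch for writability as well, and guard against exceptions raised by callbacks; the sketch leaves all of that out.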

But what kind of tutorial would this be if I simply used a one-liner from Twisted? No. Instead, I'll write my own version of the Twisted echo server, which is really what I wanted to demonstrate anyway.

Download asyn_echo.py


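#!/usr/bin/env python
# asyn_echo.py: a single-threaded echo server driven by select()

import select
import socket

HOST = ''        # empty string: listen on all available interfaces
PORT = 9999

client_fd = []   # descriptors select() watches: the listener plus every client
fd_to_conn = {}  # maps a client's descriptor back to its socket object

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow quick restarts without waiting for old connections in TIME_WAIT.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_fd = s.fileno()
client_fd.append(listen_fd)
s.bind((HOST, PORT))
s.listen(100)
# accept() gives up after 5 s rather than hanging; select() should
# already guarantee a connection is pending when we call it.
s.settimeout(5)

while True:
    # Block until the listener or at least one client has something to read.
    r_fds, w_fds, e_fds = select.select(client_fd, [], [])
    for fd in r_fds:
        if fd == listen_fd:
            # A readable listening socket means a new connection is waiting.
            conn, addr = s.accept()
            print('Accepted connection from %s:%d [fd: %d]' %
                  (addr[0], addr[1], conn.fileno()))
            fd_to_conn[conn.fileno()] = conn
            client_fd.append(conn.fileno())
        else:
            conn = fd_to_conn[fd]
            try:
                data = conn.recv(1024)
            except socket.error:
                data = b''  # treat a reset connection like a normal close
            if not data:
                # recv() returning nothing means the client closed the connection.
                print('Goodbye')
                conn.close()
                del fd_to_conn[fd]
                client_fd.remove(fd)
            else:
                # sendall() keeps writing until every byte is sent;
                # a bare send() may write only part of the buffer.
                conn.sendall(data)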

I actually pitted my version against the Twisted version. I wrote a shell script that recursively called itself, fork-bomb style, until it finally sent a 1500k PostScript file to the echo server via netcat. The most I sent at once was 64 netcat processes totaling approximately 60 megabytes. For the most part, the two servers performed about equally. More importantly, both rely on the same select system call to process the data efficiently.
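The original test used a self-forking shell script and netcat. A rough Python equivalent is sketched below under some assumptions: it uses threads instead of forked processes, the payload is a placeholder of repeated bytes rather than the PostScript file, and echo_check and load_test are hypothetical helper names. Each client sends from a helper thread while reading the echo on its own thread; because the echo server writes each chunk back as it arrives, a client that sent a large payload before reading anything could leave both sides blocked on full socket buffers.

```python
import socket
import threading


def echo_check(host, port, payload):
    """Send payload to an echo server; return True if it comes back intact."""
    with socket.create_connection((host, port)) as sock:
        def sender():
            sock.sendall(payload)
            # Half-close: tell the server we are done sending.
            sock.shutdown(socket.SHUT_WR)

        # Send on a helper thread so this thread can read the echo
        # concurrently; otherwise both sides may block on full buffers.
        t = threading.Thread(target=sender)
        t.start()
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # server closed after echoing everything
                break
            chunks.append(chunk)
        t.join()
    return b"".join(chunks) == payload


def load_test(host, port, clients, payload):
    """Run `clients` echo checks at once; return how many succeeded."""
    results = []
    lock = threading.Lock()

    def worker():
        try:
            ok = echo_check(host, port, payload)
        except OSError:  # refused, reset, etc. counts as a failure
            ok = False
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    # Assumes asyn_echo.py is already listening on port 9999 locally.
    payload = b"x" * (1500 * 1024)  # placeholder for the 1500k test file
    ok = load_test("127.0.0.1", 9999, 64, payload)
    print("%d of 64 clients got their data back" % ok)
```

The SHUT_WR half-close is what tells each client when it is finished: the server sees recv() return nothing, closes the connection, and the client's read loop reaches end-of-file.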