jonEbird

February 7, 2012

Python Memoize Decorator with TTL Argument

Filed under: blogging,linux,python — jonEbird @ 8:51 pm

I have been working on a troubleshooting effort on and off over the past couple of weeks. In the process of troubleshooting, I ended up writing a script to query and report on un-read buffered data for sockets across the system and today I wanted to correlate the sockets to a particular set of processes. That turned out to be not so bad knowing I could look down /proc/<pid>/fd/ for a list of the current file descriptors the particular process has open. Here is my winning Python implementation.

def get_pid_socket_nodes(pid):
    """Return a list of socket device numbers for the given process ID (pid)
    pid may be an integer or a string
    Akin to: ls -l /proc/
/fd/ | sed -n 's/^.*socket:\[\([0-9]*\)\]$/\1/p'
    """
    nodes = []
    for fd in os.listdir('/proc/%s/fd' % pid):
        link = os.readlink('/proc/%s/fd/%s' % (pid, fd))
        if link.startswith('socket:['):
            nodes.append(link[8:-1])
    return nodes

What I haven’t told you is the overall program may be scanning /proc/net/{tcp,udp} a lot and with each scan I’ll want to correlate values found with a particular process’s sockets. That means a straight forward implementation could mean calling my helper function get_socket_devices() at each interval for each process of interest. Also, the interval is currently controlled by a sleep interval specified by the caller. If the caller wants to monitor for socket information at an 0.1s sleep interval for five processes, I’ll be scanning /proc/<pid>/fd/ entries 50 times a second. Sure, it won’t crash the machine but it’s certainly a waste of resources assuming the processes you are interested in aren’t closing and opening sockets at an alarming rate.

I wanted to reduce the amount of /proc/<pid>/fd/ scans but I also did not want to clutter the code. “Ah ha, what I want is a memoize decorator”, I told myself, but I want to be able to specify a time-to-live (ttl) value to my decorator. Admittedly, I have only ever needed to write decorators without arguments, so this was a first for me. I also don’t write enough decorators to be able to write one correctly without looking at a sample implementation for a refresher. My implementation is basically a merge between Memoize and Cached Properties from the Python Decorator Library wiki page. (I also spent some time re-reading Bruce Eckel’s Decorator Arguments writeup as well as Elf Sternberg’s Decorators With Arguments writeup.)

class memoized_ttl(object):
    """Decorator that caches a function's return value each time it is called within a TTL
    If called within the TTL and the same arguments, the cached value is returned,
    If called outside the TTL or a different value, a fresh value is returned.
    """
    def __init__(self, ttl):
        self.cache = {}
        self.ttl = ttl
    def __call__(self, f):
        def wrapped_f(*args):
            now = time.time()
            try:
                value, last_update = self.cache[args]
                if self.ttl > 0 and now - last_update > self.ttl:
                    raise AttributeError
                #print 'DEBUG: cached value'
                return value
            except (KeyError, AttributeError):
                value = f(*args)
                self.cache[args] = (value, now)
                #print 'DEBUG: fresh value'
                return value
            except TypeError:
                # uncachable -- for instance, passing a list as an argument.
                # Better to not cache than to blow up entirely.
                return f(*args)
        return wrapped_f

Now let’s put the decorator into use and test it. (My testing is uncommenting the “print DEBUG” lines from the memoized_ttl decorator.) To use the decorator, you use the standard decorator calling syntax and specify your desired ttl. I am restricting the /proc/<pid>/fd/ scans to a minute interval which, based on my experience, will drop the load of the script down to below the radar.

@memoized_ttl(60)
def get_pid_socket_nodes(pid):
    """Return a list of socket device numbers for the given process ID (pid)
    pid may be an integer or a string
    Akin to: ls -l /proc/
/fd/ | sed -n 's/^.*socket:\[\([0-9]*\)\]$/\1/p'
    """
    nodes = []
    for fd in os.listdir('/proc/%s/fd' % pid):
        link = os.readlink('/proc/%s/fd/%s' % (pid, fd))
        if link.startswith('socket:['):
            nodes.append(link[8:-1])
    return nodes

And finally, here is what it looks like looking at a couple of processes of mine with open sockets:

>>> from buffered_sockets import *
>>> pid = 23283
>>> pid2 = 23279
>>> get_socket_devices(pid)
DEBUG: fresh value
['46287188']
>>> get_socket_devices(pid)
DEBUG: cached value
['46287188']
>>> get_socket_devices(pid2)
DEBUG: fresh value
['46287165']
>>> get_socket_devices(pid2)
DEBUG: cached value
['46287165']
>>> time.sleep(60)
>>> get_socket_devices(pid)
DEBUG: fresh value
['46287188']
>>> get_socket_devices(pid)
DEBUG: cached value
['46287188']
>>>

A handy decorator to keep around which I suspect will get a lot of mileage from other scripts I end up authoring. Aside from the helpful blog posts listed above, I also found Will McGugan’s Timed Caching Decorator page after writing this entire blog post. Apparently I need to improve my google searching skills because I certainly didn’t want to re-invent what others have already completed. On the other hand, if you spend this much time creating something for the first time, you tend to remember the lessons better. I’ll take that.

 
close Reblog this comment
blog comments powered by Disqus