October 19, 2006

Automation in Three Phases

Filed under: administration — jonEbird @ 6:05 pm

My previous post talked about my passion for automation in administration work. My claim was that once you started automating your tasks you wouldn’t look back. So now, here is my proposal for how you might develop those wonderful habits. You develop your automation in three phases:

Phase I.

  This is your initial attempt at installing the software or conducting some other procedure. Keep an editor open and record every explicit step you take: each directory change, file edit, useradd, chmod, limits update, and kernel parameter. The finished document can double as an appendix to your Disaster Recovery (DR) documentation, since it details every explicit step.
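As a sketch, a Phase I notes file might look like the following. The app name and paths here are hypothetical, and the one root-only command is left commented out; the point is that every step is the exact command that was run, so the notes replay as written.

```shell
# Phase I notes for a hypothetical "myapp1" install; every line is the
# literal command that was executed, recorded as it happened.
mkdir -p /tmp/demo/opt/myapp1        # create the install directory
cd /tmp/demo/opt/myapp1
# useradd myapp1                     # recorded verbatim; needs root, so
                                     # commented out in this sketch
printf 'myapp1 soft nofile 4096\n' > limits.note   # the limits update
chmod 750 /tmp/demo/opt/myapp1
```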

Phase II.

  The next phase doesn't come along until you need to repeat the same procedure. At this point, you retrieve your notes from the initial work. Chances are there are slight differences in this iteration of the work: executing on another machine, using a different account, installing into a different directory, using a different database. What you realize is that these differences are merely cosmetic and don't really pertain to the piece of software, per se; they are mere environmental changes. Once you notice this, you suddenly realize you can turn that first document into a script. There is no need to get fancy at this point. Start by creating variables for all of those environmental values and move the definitions to the top of your script. E.g., if your first doc had you execute "useradd myapp1", you'll change it to set "username=myapp1" and update the command to "useradd $username". That's it. Again, no need to get fancy at this point.
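A minimal Phase II sketch, assuming a hypothetical myapp1-style install: the same steps as the notes, only now the environment-specific values are variables defined up top.

```shell
#!/bin/sh
# Phase II: the Phase I notes turned into a script. The only real change
# is that environment-specific values live in variables at the top.
username=myapp1
basedir=/tmp/demo/opt/$username

mkdir -p "$basedir"
# useradd "$username"               # needs root; kept verbatim from the notes
printf '%s soft nofile 4096\n' "$username" > "$basedir/limits.note"
chmod 750 "$basedir"
```

Repeating the procedure on another machine or for another account is now a matter of editing two lines, not re-reading the whole document.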

Phase III.

  Just like the last phase, this phase doesn't begin until you yet again need to perform the same task. This time you retrieve the script from Phase II, but before executing it you may observe some key improvements that can be made. It is the same principle as writing papers: you need to give yourself some time after the first draft before you'll see its problems. In this case, hopefully you see some key areas that could use improvement, such as error checking or abstracting the process even further.
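One possible Phase III refinement, sketched with hypothetical names: the Phase II-style script abstracted into a function with basic error checking, so the same script can prepare any account and directory combination.

```shell
#!/bin/sh
# Phase III sketch: the procedure wrapped in a function with error
# checking; names and paths are illustrative.
prepare_app() {
    username=$1
    basedir=${2:-/tmp/demo/opt/$username}   # default base if none given

    if [ -z "$username" ]; then
        echo "usage: prepare_app <username> [basedir]" >&2
        return 1
    fi
    if [ -e "$basedir" ]; then
        echo "error: $basedir already exists, refusing to clobber" >&2
        return 1
    fi

    mkdir -p "$basedir" || return 1
    chmod 750 "$basedir" || return 1
    echo "prepared $basedir for $username"
}

prepare_app myapp2 /tmp/demo/opt/myapp2
```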

I have used this technique for nearly every application I have to install, from the simple Apache to the very error-prone Oracle RAC install. In each case, I have a script which performs each preparation step as well as the final install. The beauty of approaching your automation in three phases is that you can ease your way into it. The time between iterations also helps you recognize key points for the final script. My first Oracle RAC installation took days. Now I can install a multi-node Oracle RAC cluster in a couple of hours. Aside from the obvious speed benefit, I also get a consistent result. Consistency leads to predictability, and predictability leads to easier, brain-dead administration, which is what we are really trying to accomplish. Right?

October 15, 2006

The Automation Mentality

Filed under: administration — jonEbird @ 5:11 pm

I am a lazy perfectionist who happens to make a living as a system administrator. I could use more adjectives to describe myself, but those two in particular are what drive me to automate as many tasks as I can. The lazy part of me gets pissed off when I have to do some mundane task against 100 machines and seeks a much more efficient method. And the perfectionist part of me gets annoyed when human error creeps in and contributes to inconsistent results across all of those 100 machines. Now, after several years of advocating automated procedures, I have become passionate about it.

Let’s explore human error further. The two most common cases of human error that I have witnessed are the accidental error, such as the classic fat-finger, and the incorrect interpretation of a procedure. I typically avoid the classic fat-finger problem by preparing my commands ahead of time and merely copying & pasting them into the terminal where I’m working. The incorrect interpretation of an instruction is a bit harder to avoid. At school, the professors would dictate the lab assignments in mathematical terminology. Many students complained about this and insisted that the professor explain the assignments in plain English. The professor explained that he didn’t write them out in plain English because the English language left too much to interpretation where the mathematical notation did not. I always thought that was quite slick, and after joining the corporate world, when I had to write documentation on how I configured a particular application, I would dictate explicit shell commands rather than explain it in English, just like the professor. That way, my co-workers could not interpret my instructions any other way.

Aside from avoiding unnecessary errors, automating your work can make you more efficient, more consistent, and an overall better collaborator on the team. Creating habits of automation starts with humble beginnings. Start by documenting your work with explicit accuracy. I will invest a full hour documenting my work when the task would normally take only thirty minutes. But next time, the same task will take only 15 to 20 minutes. As soon as the total number of times you perform the task reaches five or more, you are saving time. It’s all about efficiency, but with the truly great side effect of being consistent as well. Again, the trick is to make it a habit. Once you’ve acted on it a few times and seen the benefits, I doubt you’ll go back to your old ways. Say you need to edit a config file to update a particular value. Don’t edit it with vi!
alias vi='echo "Hmmm... Lets try sed to edit"'
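For example, here is a sketch of that idea, assuming a hypothetical key=value config file. (The in-place `-i` flag is GNU sed; BSD sed wants `-i ''`.)

```shell
# Update a value in a made-up key=value config file. Unlike a vi
# session, this exact command can be replayed on 100 machines.
conf=/tmp/demo/app.conf
mkdir -p /tmp/demo
printf 'max_connections=100\nlog_level=info\n' > "$conf"

sed -i 's/^max_connections=.*/max_connections=250/' "$conf"   # GNU sed
grep '^max_connections' "$conf"   # -> max_connections=250
```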

As a critic, you might suggest that simply executing a script removes the focus necessary to ensure a quality procedure. I disagree. Any oversights and/or mistakes can actually be easier to handle with an automated approach. First, you fix your script to handle the oversight; then you create a mini-script with that fix and run it against every previous instance where the original script was used. There is little need to audit those previous instances, since you know your script was used in each iteration and therefore they all need the update. All in all, you are again saving time.
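As a sketch with made-up hostnames, that mini-script might be nothing more than a loop over every host the original script ran against; the real remote call is left commented out here.

```shell
#!/bin/sh
# Hypothetical follow-up fix: the install script forgot to tighten
# permissions, so replay just that one fix on every previous instance.
mkdir -p /tmp/demo
for host in web1 web2 web3; do
    echo "fixing $host: chmod 750 /opt/myapp1"
    # ssh "$host" 'chmod 750 /opt/myapp1'   # the real fix, one hop per host
done | tee /tmp/demo/fixes.log
```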