

Perl Practicum: A Plea for Clarity

by Hal Pomeranz

After a brief hiatus, welcome back to the second year of Perl Practicum. This month, Rob asked me to write a little piece about writing readable Perl. I started out with every intention of doing just that, but the article evolved instead into a piece on writing maintainable Perl. This dismayed me for a bit, until I realized that maintainability was the primary driving force for clarity. If, having been written, one's code never had to be looked at again, the motivation for writing clear code would be greatly reduced (note: not eradicated). That bit of philosophy done with, let us proceed with what amounts to little more than a collection of useful tips I have collected through the years.

A Little About Loops

The first rule is "say what you mean." Compare the following two mechanisms for doing the same thing:
        grep($array{$_}++, @list);		# BAD!

        for (@list) {			# OK 	
             $array{$_} = 1;
        }

I claim the second form is to be preferred since it makes clear what is going on: we are iterating over @list and setting values in %array to be non-zero (the assignment, as opposed to the auto-increment operator, is significant). There are several other, more verbose, ways to rewrite the above "for" loop: decide which form you are most comfortable with, but avoid grep() as a loop operator. I generally prefer to optimize for readability over performance.
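To make the parenthetical about assignment versus auto-increment concrete, here is a minimal sketch. The list contents and the names %seen and %count are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input list; note the duplicate entry.
my @list = ('alpha', 'beta', 'alpha');

# Explicit loop with assignment: every key ends up with the value 1.
my %seen;
for my $item (@list) {
    $seen{$item} = 1;
}

# Auto-increment counts occurrences instead, which is a different intent.
my %count;
for my $item (@list) {
    $count{$item}++;
}

print "seen{alpha}  = $seen{alpha}\n";     # seen{alpha}  = 1
print "count{alpha} = $count{alpha}\n";    # count{alpha} = 2
```

If all you want is a membership test, the assignment form says so; if you want a tally, the increment form says that.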

As another example, consider the following two infinite loop constructs:

        while (1) {			# BAD! 	
        ...
        }

        for ( ;; ){			# OK 	
        ... 	
        }

The second form tends to be more visually arresting: it alerts the reader that something important is happening.

As long as we are on the subject of loops, let us examine another rule for clear communication, "say it succinctly." The goto statement and multi-level break commands are both to be abhorred because they hamper the reader's ability to conceptualize the program flow at a glance. I went looking for some "before and after" examples of these constructs and did not find any that would easily fit the space boundaries imposed upon this series. Enough said, I think.

Conditional Expressions

Along the lines of "saying what you mean," remember that you can always use until instead of while and unless instead of if:
        &usage() unless (@ARGV);
        until ($value > $LIMIT) {
             ... 	
        }

Avoiding extra negation in conditional expressions can be a great aid to clarity. Perl can read like clear prose if you are careful and use informative symbolic names.
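As a rough illustration of the point about negation, compare the commented-out while test with the until loop in the sketch below. The variable names and the limit of 100 are assumptions chosen only for the example.

```perl
use strict;
use warnings;

my $LIMIT = 100;   # hypothetical threshold
my $value = 1;

# Double negation: the reader has to invert the test mentally.
# while (!($value > $LIMIT)) { ... }

# The same loop stated positively.
until ($value > $LIMIT) {
    $value *= 2;
}

print "stopped at $value\n";   # stopped at 128
```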

With the postfix conditional operators, be careful to put the most important part of the statement up front. This is why we write:

        open(...) || die ... ;	   	# recommended

rather than
        die ... unless open(...);		# EVIL!

The purpose of the statement is to associate a file handle with a file or process. The die() operation is merely a case of exception handling.
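Here is a self-contained sketch of the recommended form. It assumes a reasonably modern Perl (lexical file handles and the three-argument open, neither of which the fragments above use) and creates its own scratch file with the File::Temp module so that it can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh "first line\n";
close($tmp_fh) || die "cannot close $path: $!";

# The open comes first; the die is only the exception path.
# The parentheses matter: without them, || would bind to the last argument.
open(my $in, '<', $path) || die "cannot open $path: $!";
my $line = <$in>;
close($in) || die "cannot close $path: $!";

print $line;   # first line
```

Reading the open statement left to right, the file operation is what you see first, and the error handling trails behind it.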

Similarly, avoid overloading conditional expressions with operations which actually manipulate program data or have other side effects. Evaluate an expression to take a logical branch in the program flow and then perform your operations.
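One way to read this advice is sketched below. The queue contents and variable names are hypothetical; the commented-out line shows the overloaded style being argued against.

```perl
use strict;
use warnings;

my @queue = ('job1', 'job2');   # hypothetical work queue
my @done;

# Harder to follow: the test itself removes an element from @queue.
# while (defined(my $job = shift(@queue))) { ... }

# Clearer: branch on a plain test, then manipulate the data.
while (@queue) {
    my $job = shift(@queue);
    push(@done, $job);
}

print "processed: @done\n";   # processed: job1 job2
```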

Parentheses, Functions, and Others

Always err on the side of extra parentheses, though of course too many can cause problems as well. In conditional expressions, "extra" parentheses will help the reader parse the expression. They also help protect the application from maintenance by programmers with a poorer grasp of operator precedence than the author. Perl is, of course, extremely forgiving about parentheses around function argument lists, but always parenthesize function arguments anyway. The classic example of the importance of this rule is taken directly from the Camel book:
        print (1+2)*3, "\n";	# INCORRECT!

This prints the value of the expression in parentheses, i.e., the number three without a newline. The statement is syntactically correct (points to you if you figure out exactly what happens in the rest of the line) and the Perl interpreter will not complain, but the output is wildly different from:
        print((1+2)*3, "\n");	# CORRECT

which is probably what the author of the code intended.
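If you want to see the difference for yourself, the following sketch captures what each statement prints. The capture() helper is my own scaffolding, not part of the Camel book example, and it assumes a Perl new enough to open a file handle on a scalar.

```perl
use strict;
use warnings;

# Helper (an assumption for this demo): run a code block with print
# redirected into an in-memory buffer, and return whatever was printed.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) || die "cannot open in-memory handle: $!";
    my $previous = select($fh);
    $code->();
    select($previous);
    close($fh) || die "cannot close in-memory handle: $!";
    return $buffer;
}

my $wrong = capture(sub {
    no warnings;               # perl would otherwise warn about this line
    print (1+2)*3, "\n";       # parsed as (print(1+2)) * 3, "\n"
});

my $right = capture(sub {
    print((1+2)*3, "\n");      # the whole list is handed to print
});

print "without outer parentheses: '$wrong'\n";   # '3', and no newline
print "with outer parentheses:    '$right'\n";   # '9' followed by a newline
```

Note that a Perl run with warnings enabled does flag the first form, which is one more argument for turning warnings on.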

If you only need a few scattered values out of a list value returned by a function, please avoid assignment to dummy variables. In other words, do:

        ($login, $name, $home) = (getpwent)[0,6,7];	# GOOD

rather than:
        ($login, $dummy, $dummy, $dummy, $dummy, $dummy, $name, $home) = getpwent; # BAD

Aside from wasted typing, the second form obscures precisely which information you are interested in manipulating.
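The slice idiom works on any list-returning function. The sketch below uses a made-up sample_pwent() in place of getpwent so that it does not depend on the local password file; the field values are invented, though the nine-field layout mirrors what getpwent returns.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getpwent, with the same nine-field layout:
# name, passwd, uid, gid, quota, comment, gcos, dir, shell.
sub sample_pwent {
    return ('jdoe', 'x', 1001, 100, '', '', 'J. Doe', '/home/jdoe', '/bin/sh');
}

# The slice names exactly the fields of interest: login, full name, home.
my ($login, $name, $home) = (sample_pwent())[0, 6, 7];

print "$login ($name) lives in $home\n";   # jdoe (J. Doe) lives in /home/jdoe
```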

Odds and Ends

It is a good general principle when writing clear code never to rely on default behaviors. Explicitly undef your variables or assign them zero values before using them for the first time. This helps to avoid errors introduced through later modifications.
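A small sketch of what explicit initialization looks like in practice follows. The data and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

my @sizes = (10, 20, 30);   # hypothetical sample data

# Explicit starting values, rather than relying on undef acting as 0 or ''.
my $total  = 0;
my $report = '';

for my $size (@sizes) {
    $total  += $size;
    $report .= "$size ";
}

print "total = $total\n";   # total = 60
```

If somebody later moves the loop or reuses the variables, the starting values travel with the declarations instead of depending on what happened earlier in the program.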

Function defaults are a trickier issue. Many Perl built-in functions operate on $_ when given no arguments (and a few, such as shift inside a subroutine, on @_). This is a nice feature and I use it all the time (too convenient to give up, I suppose). It does, however, make Perl code less than clear to the uninitiated reader, and I have had occasions where something unexpected has cropped up because $_ did not contain what I thought it did.

On a more trivial issue, I would like to make a plea for explicitly using the "<" character when opening a file for reading, even though this is the default behavior for open().

Never hard-code pathnames or other constants into the body of your program. Assign these values to variables AT THE TOP of your program. For example, here are the first few lines of an application I wrote to manipulate a remote optical jukebox:

        #!/usr/bin/perl

        $jukehost = 'gator';
        $nfsjukedir = '/rd/juke';
        $realjukedir = '/export/jb/jb0';
        $localjukedir = '/jukebox';
        $remotecmd = '/usr/local/etc/jbadm';

When the code is written in this fashion, maintenance becomes a breeze.

Always explicitly close file and directory handles as soon as you finish processing the data. You avoid potential resource shortages (such as running out of file descriptors), protect your code from interesting side effects caused by later modification, and make your code clearer to the hypothetical external viewer.
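As a sketch of this habit, the example below reads a directory, closes the handle immediately, and only then goes on to work with the results. It builds its own scratch directory with File::Temp so that it is self-contained; the file names are invented, and the lexical handles assume a reasonably modern Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with two files, so the example is self-contained.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt') {
    open(my $out, '>', "$dir/$name") || die "cannot create $dir/$name: $!";
    print $out "data\n";
    close($out) || die "cannot close $dir/$name: $!";
}

# Read everything needed, then release the handle before doing more work.
opendir(my $dh, $dir) || die "cannot open directory $dir: $!";
my @entries = grep { !/^\.\.?$/ } readdir($dh);   # skip . and ..
closedir($dh);

@entries = sort(@entries);
print "found: @entries\n";   # found: a.txt b.txt
```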

Issues of Convention

The careful reader will note that I have been discussing issues of clarity related to program syntax. Equally important are issues which are not dictated by the language definition, such as your indentation scheme, variable naming conventions, and commenting scheme. These are also the areas where you run into the most religious warfare.

To avoid this morass (for example, everybody I know hates my bracing style), I suggest only one simple rule. Pick a site standard that everybody can live with and stick to it. Even a bad standard is better than no standard at all. If you are forced to maintain code that is developed and used externally to your organization, then maintain whatever conventions pertain to the code as you received it.

For a good starting point, there is a document available on the Internet entitled Recommended C Style and Coding Standards (originally from a document prepared by committee at Bell Labs, but modified by Henry Spencer, David Keppel, and Mark Brader). Obtain /pub/cstyle.tar.Z from ftp.cs.washington.edu.

Further Study

Please note that everything said above applies pretty well to any language you choose to program in. Certain constructs may or may not be available to you, but clarity should be syntax-independent. For a more in-depth treatment of this material, start with a good C style guide and then follow up through any bibliographic information provided in it.

Reproduced from ;login: Vol. 19 No. 6, December 1994.


Last changed: May 24, 1997 pc