

Source Code

Source code can be processed like plain text, but with a parser for the programming language, source code can be queried much more easily. LAPIS includes a Java parser, so the examples that follow are in Java.

Unlike other systems for querying and processing source code, TC operates on regions in the source text, not on an abstract syntax tree. At the text level, the user can achieve substantial mileage knowing only a few general types of regions identified by the parser, such as Statement, Comment, Expression, and Method, and using text constraints to specialize them. For example, our parser identifies Comment regions, but does not specially distinguish the "documentation comments" that can be automatically extracted by the javadoc utility. Figure 8 shows a Java method preceded by a documentation comment.


/**
 * Convert a local filename to a URL.
 * For example, if the filename is "C:\FOO\BAR\BAZ",
 * the resulting URL is "file:/C:/FOO/BAR/BAZ".
 * @param file File to convert
 * @return URL corresponding to file
 */
public static URL FileToURL (File file) throws MalformedURLException {
    return new URL ("file:" + toURLDelimiters (file.getAbsolutePath ()));
}

Figure 8: A Java method with a documentation comment.

The user can find the documentation comments by constraining Comment with a text-level expression:

DocComment = Comment starts with "/**";
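
For readers more familiar with Java than with text constraints, the effect of this pattern can be approximated as a prefix test over comment regions. This is only a sketch: it assumes the comments have already been extracted as strings, and the class and method names (DocCommentFilter, docComments) are hypothetical and not part of LAPIS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocCommentFilter {
    /**
     * Keeps only the comments whose text begins with the doc-comment opener.
     * Hypothetical helper; the input stands in for the parser's Comment regions.
     */
    public static List<String> docComments(List<String> comments) {
        List<String> result = new ArrayList<>();
        for (String comment : comments) {
            if (comment.startsWith("/**")) {
                result.add(comment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> comments = Arrays.asList(
            "/** Convert a local filename to a URL. */",
            "/* internal note */",
            "// line comment");
        System.out.println(docComments(comments));
    }
}
```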

A similar technique can be used to distinguish public methods from non-public methods:

PublicMethod = Method starts with "public";

In this case, however, the accuracy of the pattern depends on programmer convention, since attributes like public may appear in any order in a method declaration, not necessarily first. All of the following method declarations are equivalent in Java:

public static synchronized void f ()
static public synchronized void f ()
synchronized static public void f ()

If necessary, the user can deal with this problem by adjusting the pattern (e.g., Method starts with Line contains "public") or by relying on the Java parser to identify attribute regions (e.g., Method contains Attribute contains "public"). In practice, however, it is often more convenient to rely on typographic conventions, such as public always appearing first, than to modify the parser for every contingency. Since text constraints can express such conventions, they might also be used to enforce them, if desired.
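
The order-independence problem described above can be illustrated in plain Java. The sketch below scans the leading keywords of a declaration and reports whether public occurs among them, in any position. It is an approximation under stated assumptions: the declaration is given as a single string, annotations are not handled, and the names ModifierCheck and isPublic are hypothetical rather than anything LAPIS provides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ModifierCheck {
    // Keywords treated as method attributes in this sketch.
    private static final Set<String> MODIFIERS = new HashSet<>(Arrays.asList(
        "public", "protected", "private", "static", "final", "abstract",
        "synchronized", "native", "strictfp"));

    /** True if "public" occurs among the leading attribute keywords, in any order. */
    public static boolean isPublic(String declaration) {
        for (String token : declaration.trim().split("\\s+")) {
            if (!MODIFIERS.contains(token)) {
                break; // reached the return type; no more attributes follow
            }
            if (token.equals("public")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPublic("public static synchronized void f ()"));
        System.out.println(isPublic("static public synchronized void f ()"));
        System.out.println(isPublic("synchronized static public void f ()"));
    }
}
```

All three equivalent declarations from the text are accepted, whereas a simple prefix test would accept only the first.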

We can use DocComment and PublicMethod to find public methods that need documentation:

PublicMethod but not just after DocComment;
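
To make the combination concrete, the following sketch expresses the same query over an ordered list of regions. It assumes that "just after" means the doc comment is the immediately preceding region, which simplifies the TC operator. The Region class and the method names are hypothetical stand-ins for what the parser would supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UndocumentedMethods {
    /** Hypothetical stand-in for a parser region: a kind plus its source text. */
    public static final class Region {
        final String kind;
        final String text;

        public Region(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // DocComment = Comment starts with "/**"
    static boolean isDocComment(Region r) {
        return r.kind.equals("Comment") && r.text.startsWith("/**");
    }

    // PublicMethod = Method starts with "public"
    static boolean isPublicMethod(Region r) {
        return r.kind.equals("Method") && r.text.startsWith("public");
    }

    /** Public methods whose immediately preceding region is not a doc comment. */
    public static List<Region> undocumented(List<Region> regions) {
        List<Region> result = new ArrayList<>();
        for (int i = 0; i < regions.size(); i++) {
            Region r = regions.get(i);
            boolean documented = i > 0 && isDocComment(regions.get(i - 1));
            if (isPublicMethod(r) && !documented) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("Comment", "/** Documented. */"),
            new Region("Method", "public void a () {}"),
            new Region("Comment", "/* not a doc comment */"),
            new Region("Method", "public void b () {}"));
        for (Region r : undocumented(regions)) {
            System.out.println(r.text);
        }
    }
}
```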

Text constraints are also useful for defining custom structure inside source code. Java documentation comments can include various kinds of fields, such as @param to describe method parameters, @return to describe the return value, and @exception to describe exceptional return conditions. These fields can be described by text constraint expressions:

DocField = starts with delimiter "@", in DocComment;
ParamDoc = DocField, starts with "@param";
ReturnDoc = DocField, starts with "@return";
ExceptionDoc = DocField, starts with "@exception";

Using this structure, we can find methods whose documentation is incomplete in various ways. For example, this expression finds methods with parameters but no parameter documentation:

PublicMethod contains FormalParameter,
  just after (DocComment but not contains ParamDoc);
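
The same check can be sketched in Java for a single method and its preceding comment. The sketch assumes the doc comment and the method header are available as strings, treats any non-blank text between the first pair of parentheses as a formal parameter, and recognizes a field only when it begins a comment line. ParamDocCheck and its methods are hypothetical names chosen for illustration.

```java
public class ParamDocCheck {
    /** True if the header has non-blank text between its first '(' and the next ')'. */
    public static boolean hasParameters(String header) {
        int open = header.indexOf('(');
        int close = header.indexOf(')', open + 1);
        return open >= 0 && close > open
            && !header.substring(open + 1, close).trim().isEmpty();
    }

    /** True if some line of the comment begins with an "@param" field. */
    public static boolean hasParamDoc(String docComment) {
        for (String line : docComment.split("\n")) {
            // Strip leading whitespace and the comment decoration ("/**" or "*").
            String body = line.trim().replaceFirst("^/?\\*+\\s*", "");
            if (body.startsWith("@param")) {
                return true;
            }
        }
        return false;
    }

    /** Method has parameters, but its doc comment has no parameter documentation. */
    public static boolean missingParamDoc(String docComment, String header) {
        return hasParameters(header) && !hasParamDoc(docComment);
    }

    public static void main(String[] args) {
        String doc = "/**\n * Convert a local filename to a URL.\n * @return URL corresponding to file\n */";
        String header = "public static URL FileToURL (File file) throws MalformedURLException";
        System.out.println(missingParamDoc(doc, header));
    }
}
```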






Robert C. Miller and Brad A. Myers
Mon Apr 26 11:34:19 EDT 1999