Future Work

This work is part of the first author's PhD thesis research, and continues to evolve. This section describes some of the directions in which the work will be taken in the coming months.

LAPIS will be extended with new matchers, parsers, and tools. A more useful matcher for literals would optionally ignore alphabetic case, optionally match only full words, match spaces in the literal expression against any background character, and optionally do simple stemming. Parser support would be improved by allowing parsers to operate on limited parts of the document, for example by applying an HTML parser only to Java documentation comments, which may contain HTML tags. Useful new tools would include computing statistics on region sets (such as counts, sums, and averages) and reformatting text by template substitution.
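
As a rough illustration of the literal matcher described above, the following Python sketch compiles a literal into a regular expression with optional case folding and whole-word matching. It is not LAPIS code: the names `literal_pattern` and `match_regions` are hypothetical, "background character" is assumed here to mean whitespace, and stemming is left out.

```python
import re


def literal_pattern(literal, ignore_case=False, whole_word=False):
    """Compile a literal into a regex (hypothetical helper, not part of LAPIS).

    Each run of spaces in the literal matches any run of whitespace in the
    text, which stands in for "background" characters in this sketch.
    """
    words = literal.split()
    if not words:
        raise ValueError("literal must contain at least one non-space character")
    body = r"\s+".join(re.escape(word) for word in words)
    if whole_word:
        # Reject matches that begin or end in the middle of a word.
        body = r"(?<!\w)" + body + r"(?!\w)"
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(body, flags)


def match_regions(literal, text, **options):
    """Return the matches as a list of (start, end) character offsets."""
    return [m.span() for m in literal_pattern(literal, **options).finditer(text)]
```

With these assumptions, `match_regions("cat", "concatenate cat", whole_word=True)` reports only the second occurrence, while the default options report both.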

Another fruitful area for research is integration of lightweight structured text processing into other applications, in particular an extensible text editor such as EMACS. Integration with a text editor poses at least two challenges: the interface problem of using named region sets fluidly in direct-manipulation text editing, and the implementation problem of updating region sets cheaply as the user edits.
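
To make the implementation problem concrete, here is a deliberately naive Python sketch of keeping a region set consistent across edits. It assumes regions are stored as (start, end) character offsets and visits every region on each edit, so it shows the bookkeeping involved rather than a cheap solution. The function names and the boundary policy are assumptions made for this example.

```python
def apply_insert(regions, pos, length):
    """Adjust regions after `length` characters are inserted at offset `pos`.

    Assumed policy: text inserted strictly inside a region grows it, text
    inserted at a region's start pushes the region right, and text inserted
    at a region's end leaves it unchanged.
    """
    updated = []
    for start, end in regions:
        if pos <= start:
            updated.append((start + length, end + length))
        elif pos < end:
            updated.append((start, end + length))
        else:
            updated.append((start, end))
    return updated


def apply_delete(regions, pos, length):
    """Adjust regions after `length` characters are deleted starting at `pos`."""
    cut_end = pos + length

    def shift(offset):
        if offset <= pos:
            return offset           # before the deleted span
        if offset >= cut_end:
            return offset - length  # after the deleted span
        return pos                  # inside the span: collapse to its start

    return [(shift(start), shift(end)) for start, end in regions]
```

A region that is deleted entirely collapses to an empty region at the deletion point in this sketch; an editor integration would have to decide whether to keep or discard such regions.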

The text constraint language has room for improvement. It should be possible to count (e.g., 2nd Line in Table) and to use numeric operators (e.g., Toolkit contains Price < 100). Constraint systems should support recursive or mutually recursive definitions. It would also be useful to precede a constraint expression by a fuzzy qualifier, such as always, usually, rarely, or never. A fuzzy qualifier describes how important it is for a matching region to satisfy the constraint. Finally, it will be important to determine the conditions under which our text constraints implementation (tandem tree intersection) runs in linear time.
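
The counting and numeric constraints mentioned above might behave as in the following Python sketch. It only illustrates the intended semantics over regions stored as (start, end) offsets. It does not use the text constraint language or tandem tree intersection, and all function names are hypothetical.

```python
def contained_in(regions, container):
    """Regions lying entirely inside `container`, in document order."""
    c_start, c_end = container
    return sorted(r for r in regions if c_start <= r[0] and r[1] <= c_end)


def nth_in(regions, container, n):
    """Counting, as in "2nd Line in Table": the n-th region (1-based) or None."""
    inside = contained_in(regions, container)
    return inside[n - 1] if 1 <= n <= len(inside) else None


def containing_value_below(outer, inner, text, limit):
    """Numeric test, as in "Toolkit contains Price < 100".

    Keeps each outer region that contains at least one inner region whose
    text parses as a number smaller than `limit`.
    """
    kept = []
    for region in outer:
        values = [float(text[s:e]) for s, e in contained_in(inner, region)]
        if any(value < limit for value in values):
            kept.append(region)
    return kept


# Toy document: one toolkit per line, each followed by its price.
text = "Tk 250\nQt 80\n"
lines = [(0, 6), (7, 12)]
prices = [(3, 6), (10, 12)]
second_line = nth_in(lines, (0, len(text)), 2)
cheap = containing_value_below(lines, prices, text, 100)
```

In the toy document, `second_line` and `cheap` both pick out the "Qt 80" line.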

Robert C. Miller and Brad A. Myers
Mon Apr 26 11:34:19 EDT 1999