Web Pages

Many web pages display data in a custom format, using HTML markup to set off important parts of the text typographically or spatially. Figure 6 shows part of a page describing user interface toolkits [17].

Cumulus Technology Corp.,
1007 Elwell Court,
Palo Alto, CA, 94303,
(415) 960-1200,
Unix, Discontinued, Alpha-numeric terminal windows, Window System

Altia Design, Altia,
5030 Corporate Plaza Dr #300, Colorado Springs, CO, 80919,
(800)653-9957 or (719)598-4299,
UNIX or Windows, IB

Brad Myers,
Human-Computer Interaction Institute, Carnegie Mellon Univ, Pittsburgh, PA, 15213,
(412) 268-5150,,
X or MS Windows,
portable toolkit, UIMS

Figure 6: Excerpt from a web page describing user interface toolkits.

The page describes over 100 toolkits with various properties: some are free, some are commercial; some run on Unix, others on Microsoft Windows, others on Macintosh, and others are cross-platform. To browse the page conveniently, we might want to restrict the display to show only toolkits matching certain requirements, for example toolkits running under both Unix and Microsoft Windows, sorted by price.

Each toolkit on this page is contained in a single paragraph (<P> element in HTML). So we might start by describing the toolkit as the Paragraph element, which is identified by the built-in HTML parser:

Toolkit = Paragraph;

Finding the prices is straightforward using Number, a region set identified by the built-in USEnglish parser:

Price = ("\$" then Number | "FREE")
        in Toolkit;
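
For readers without the pattern language at hand, the Price constraint can be approximated with a regular expression. This is only a rough sketch, not how the system itself works: the function name `find_price`, the accepted number format (optional thousands separators and cents), and the choice to treat "FREE" as a price of zero are all assumptions made for illustration.

```python
import re

# Assumption: a price is a dollar sign followed by a number (optionally
# with thousands separators and cents), or the literal word "FREE".
PRICE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)|\bFREE\b")

def find_price(toolkit_text):
    """Return the first price found in one toolkit paragraph, or None."""
    match = PRICE.search(toolkit_text)
    if match is None:
        return None
    if match.group(1) is None:
        # The "FREE" alternative matched; treat it as a price of zero.
        return 0.0
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))
```

Unlike the constraint above, this sketch has to be applied to each paragraph separately, since a regular expression has no notion of "in Toolkit".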
Finding toolkits that run under Macintosh is easy (Toolkit contains "Mac"), since the page refers consistently to Macintosh as "Mac". But Unix platforms are sometimes described as "X", "X Windows", or "Motif", and Microsoft Windows is also called "MS Windows" or just plain "Windows". We deal with these problems by defining a constraint for each kind of platform that specifies all these possibilities and further constrains the matched literal to be a full Word (not just part of a word):
Macintosh = Word, "Mac";
Unix = Word, ("Unix" | "X" | "Motif");
MSWindows = Word, ("PC" |
      "Windows" but not just after "X");
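
The same three constraints can be approximated with regular expressions, which may help make the Word restriction and the "but not just after" clause concrete. This is a sketch under assumptions: `\b` word boundaries stand in for the Word region set, a negative lookbehind stands in for "not just after", matching is case-sensitive, and the helper name `platforms` is invented for illustration.

```python
import re

# \b on both sides requires the literal to be a whole word.
MACINTOSH = re.compile(r"\bMac\b")
UNIX = re.compile(r"\b(?:Unix|X|Motif)\b")
# "Windows" counts only when it is not immediately preceded by the word "X",
# so "X Windows" is not mistaken for Microsoft Windows.
MS_WINDOWS = re.compile(r"\bPC\b|(?<!\bX\s)\bWindows\b")

def platforms(toolkit_text):
    """Return the set of platform names mentioned in one toolkit paragraph."""
    patterns = {
        "Macintosh": MACINTOSH,
        "Unix": UNIX,
        "MSWindows": MS_WINDOWS,
    }
    return {name for name, pattern in patterns.items()
            if pattern.search(toolkit_text)}
```

Applied to the last entry in Figure 6, `platforms("X or MS Windows")` reports both Unix and MSWindows, while `platforms("X Windows and Motif")` reports Unix only.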
Using these definitions, we can readily filter the web page for toolkits matching certain requirements (Toolkit, contains Unix, contains MSWindows) and sort them according to Price.
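
The filter-and-sort step can be sketched as follows, assuming each toolkit paragraph has already been reduced to a record of its platforms and price. The `Toolkit` record, the function `select_and_sort`, and the decision to list unpriced toolkits last are all hypothetical choices for illustration; the toolkit names in the usage note are placeholders, not entries from the page.

```python
from typing import FrozenSet, List, NamedTuple, Optional

class Toolkit(NamedTuple):
    name: str
    platforms: FrozenSet[str]
    price: Optional[float]  # None when no price was found

def select_and_sort(toolkits: List[Toolkit],
                    required: FrozenSet[str]) -> List[Toolkit]:
    """Keep toolkits that support every required platform, cheapest first."""
    matching = [t for t in toolkits if required <= t.platforms]
    # Toolkits with no known price sort after all priced ones.
    return sorted(matching, key=lambda t: (t.price is None, t.price or 0.0))

# Placeholder records, not taken from the page.
SAMPLE = [
    Toolkit("A", frozenset({"Unix", "MSWindows"}), 500.0),
    Toolkit("B", frozenset({"Unix"}), 0.0),
    Toolkit("C", frozenset({"Unix", "MSWindows", "Macintosh"}), 0.0),
    Toolkit("D", frozenset({"Unix", "MSWindows"}), None),
]
```

Calling `select_and_sort(SAMPLE, frozenset({"Unix", "MSWindows"}))` drops B, which lacks Microsoft Windows support, and orders the rest by price with the unpriced record last.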

Robert C. Miller and Brad A. Myers
Mon Apr 26 11:34:19 EDT 1999