
Streams, Basic Operators, and Simple Queries

Every Tribeca query has a single source stream and at least one result stream. A stream declaration associates a name with the external source (for a source stream) or destination (for a result stream) of the stream data, such as a disk or tape file. The source_stream statement must also declare the stream's data type. The data type of a result stream is derived from the operator that writes to it.

Tribeca supports three kinds of simple operators: qualifications, projections, and aggregates. A Tribeca query can combine operators using pipes and transform the stream in several stages. Note that while Tribeca borrows the Unix term "pipe", these are not Unix pipes: Tribeca operators are not implemented as separate processes. As the example below illustrates, a Tribeca pipe statement names a stream of data and allows users to express data flow from one Tribeca operator to another.

Qualification operators filter data in a stream. Tribeca's qualification statement specifies a source stream, a result stream, and a list of qualification operators to be applied to the source. Records from the source stream that pass the qualifications are placed on the result stream. While the list of operators is implicitly a conjunction, Tribeca supports the usual complex qualifications involving AND, OR and NOT.
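The filtering behaviour described above can be modelled in a few lines of Python. This is only a sketch of the semantics, not Tribeca syntax or its implementation: the names `qualify`, `any_of`, and `negate`, and the dictionary record layout, are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, int]
Predicate = Callable[[Record], bool]


def qualify(source: Iterable[Record], quals: List[Predicate]) -> Iterator[Record]:
    """Yield the records that satisfy every predicate (an implicit AND)."""
    for record in source:
        if all(q(record) for q in quals):
            yield record


def any_of(*preds: Predicate) -> Predicate:
    """Hypothetical helper: OR of several predicates."""
    return lambda r: any(p(r) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Hypothetical helper: NOT of a predicate."""
    return lambda r: not pred(r)


# Made-up records with a single 'vci' field.
records = [{"vci": 3}, {"vci": 7}, {"vci": 49}, {"vci": 50}]

# Two predicates in one list behave as a conjunction: 5 < vci < 50.
passed = list(qualify(records, [lambda r: r["vci"] > 5,
                                lambda r: r["vci"] < 50]))

# A single compound predicate expresses the complement with OR.
outside = list(qualify(records, [any_of(lambda r: r["vci"] <= 5,
                                        lambda r: r["vci"] >= 50)]))
```

Keeping the list as a conjunction and pushing OR and NOT into individual predicates mirrors the description above, in which the list is implicitly ANDed while complex qualifications remain expressible.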

A projection selects one or more fields from each record in the source stream, assembles the fields into a new record, and puts the record onto the result stream. The projection statement may also apply a function to a field during the projection operation. The projection statement exists mainly to let users construct simple, readable queries. As explained in Section 3, the stream data model allows most projections to be eliminated during compilation. Intermediate tuples are never materialized in Tribeca unless they are used as hash keys or written to external storage.
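As a rough illustration of what a projection computes, the Python sketch below selects named fields from each record and optionally applies a function to a field on the way through. The `project` function, the `FieldSpec` pairs, and the sample field `pti` are hypothetical; the sketch says nothing about how Tribeca compiles projections away.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Sequence, Tuple

Record = Dict[str, int]
# (field name, optional function applied to that field's value)
FieldSpec = Tuple[str, Optional[Callable[[int], int]]]


def project(source: Iterable[Record], specs: Sequence[FieldSpec]) -> Iterator[Record]:
    """Build a new record from the selected fields of each source record."""
    for record in source:
        out: Record = {}
        for name, fn in specs:
            value = record[name]
            out[name] = fn(value) if fn is not None else value
        yield out


# Made-up input records; 'pti' is dropped by the projection.
cells = [{"ts": 1500, "vci": 7, "pti": 0},
         {"ts": 2500, "vci": 9, "pti": 1}]

# Keep (ts, vci), converting ts to whole thousands as an example function.
pairs = list(project(cells, [("ts", lambda t: t // 1000), ("vci", None)]))
```

Because `project` is a generator, records are produced one at a time as a consumer asks for them, which loosely echoes the point that intermediate tuples need not be stored.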

An aggregate operator is applied to all of the values in a stream and produces a single value. While aggregates of basic Tribeca streams are sometimes useful, queries usually produce streams of related aggregates using the demultiplex and window operations described below.
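An aggregate of this kind can be pictured as a single pass over the stream that keeps only a running result. The following Python sketch assumes a hypothetical `aggregate` helper and is not drawn from Tribeca's implementation.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def aggregate(source: Iterable[T], combine: Callable[[T, T], T]) -> Optional[T]:
    """Fold a stream into one value; returns None for an empty stream."""
    result: Optional[T] = None
    seen = False
    for value in source:
        # The first value seeds the result; later values are combined into it.
        result = value if not seen else combine(result, value)
        seen = True
    return result


# A one-shot iterator stands in for a stream that can be read only once.
timestamps = iter([900, 300, 700])
earliest = aggregate(timestamps, min)
```

Only the running result is held in memory, so the sketch works on inputs too large to store, as long as the combining function (here `min`) needs no history.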

The simple query below uses all operators introduced so far:

source_stream s1 is {tape sample1 AtmTrace}
result_stream r1 is {file res1}
result_stream r2 is {file res2}
stream_pipe p1 p2
stream_proj {{s1.atm.ts s1.atm.vci}} p1
stream_qual {{p1.ts.lte 1000000}} r1
stream_qual {{p1.vci.gt 5} {p1.vci.lt 50}} p2
stream_agg {p2.ts.min} r2

The query reads a source stream of type AtmTrace from tape. It uses a projection to create a stream of (time stamp, VCI) pairs (an ad-hoc composite type). That stream is then passed to two different qualifications. The first saves into a file all (ts, VCI) pairs with a time stamp of at most 1 million (the lte operator). The second finds all (ts, VCI) pairs in which the VCI is greater than 5 and less than 50. Finally, the aggregate finds the minimum time stamp among the pairs that pass the second qualification.
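To make the data flow of the query concrete, here is a Python sketch that evaluates the same logic in one pass over the source. It is an interpretation rather than Tribeca's execution strategy: the nested-dictionary layout of an AtmTrace record, the function name `run_query`, and the reading of `lte`, `gt`, and `lt` as `<=`, `>`, and `<` are all assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed record shape: {"atm": {"ts": ..., "vci": ...}}
AtmRecord = Dict[str, Dict[str, int]]


def run_query(s1: Iterable[AtmRecord]) -> Tuple[List[Tuple[int, int]], Optional[int]]:
    """Return (contents of r1, value of r2) for the example query."""
    r1: List[Tuple[int, int]] = []
    r2: Optional[int] = None
    for record in s1:
        # stream_proj: keep only (ts, vci); this pair plays the role of p1.
        ts, vci = record["atm"]["ts"], record["atm"]["vci"]
        # stream_qual into r1: p1.ts.lte 1000000
        if ts <= 1_000_000:
            r1.append((ts, vci))
        # stream_qual into p2: p1.vci.gt 5 AND p1.vci.lt 50
        if 5 < vci < 50:
            # stream_agg into r2: running minimum of p2.ts
            r2 = ts if r2 is None else min(r2, ts)
    return r1, r2


# Made-up trace data for illustration.
trace = [
    {"atm": {"ts": 500, "vci": 4}},
    {"atm": {"ts": 900_000, "vci": 10}},
    {"atm": {"ts": 1_000_000, "vci": 60}},
    {"atm": {"ts": 2_000_000, "vci": 49}},
]
r1, r2 = run_query(trace)
```

Both qualifications read the same projected pair inside one loop, so the source is scanned once even though it feeds two branches of the query.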

As described so far, Tribeca queries are trees of stream operations. The source stream and any intermediate stream may feed any number of operators. Each operator writes to a pipe or a result stream. An intermediate or result stream derives its data type from the operator that writes it. None of the simple stream operations introduced so far takes its input from more than one stream, but we will introduce operators for combining streams in the next few sections.

