The Tribeca Type System

Tribeca has both a data description language (DDL) and an extensible type system. As in other extensible database managers, the core data management software is type-independent. Data types and operators may be added to the system to support new applications. The procedure for creating extensions in Tribeca is similar to that of Postgres [15], so the details are omitted here. An extension type declaration defines the operators and data representations (ASCII, host byte order, network byte order) associated with the type. Tribeca allows operator overloading, so the same operator name can be used for different data types.
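The relationship between an extension type, its representations, and overloaded operators can be illustrated with a small Python sketch. This is not Tribeca's extension interface, whose details are omitted above; the names UInt16Type, register_operator, and call_operator are hypothetical and exist only for this illustration.

```python
import struct


class UInt16Type:
    """Hypothetical extension type with three data representations."""
    name = "uint16"

    @staticmethod
    def decode(data: bytes, representation: str) -> int:
        if representation == "ascii":
            return int(data.decode("ascii"))      # decimal text
        if representation == "host":
            return struct.unpack("=H", data)[0]   # native byte order
        if representation == "network":
            return struct.unpack("!H", data)[0]   # big-endian
        raise ValueError(f"unknown representation: {representation}")


# Operators are keyed by (operator name, type name), so one operator
# name can have a different implementation for each data type.
OPERATORS = {}


def register_operator(op_name, type_name, func):
    OPERATORS[(op_name, type_name)] = func


def call_operator(op_name, type_name, *args):
    return OPERATORS[(op_name, type_name)](*args)


register_operator("max", "uint16", max)
register_operator("max", "string", lambda a, b: max(a, b, key=len))
```

Keying the operator table on both names is one simple way to model overloading; a real system would typically resolve the overload from the argument types rather than from an explicit type name.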

The DDL allows users to create composite types from the compiled-in extension types. The DDL has a simple inheritance mechanism that allows users to describe the kinds of layered packet headers that are commonly found in network traffic data (for instance, the UDP/IP and TCP/IP types both inherit from the IP type). In addition to inheritance, the DDL has built-in support for bit fields of arbitrary offset and width, since network protocols often include bit fields. The DDL also has an enumerated type provision so that queries can refer to ID fields by name instead of by number (e.g., the field that determines that a routed frame relay packet is transporting IP data has the value 0xCC, but this value can be referred to in a Tribeca query as "IP"). Note that ad-hoc, unnamed composite types can also be created in queries.
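The three DDL features described above (inheritance, bit fields, and enumerated types) can be sketched in Python. This is an illustration of the ideas only, not Tribeca DDL syntax. The class and function names are hypothetical, and the field positions assume the standard IPv4 and UDP header layouts.

```python
from dataclasses import dataclass
from enum import IntEnum


class Nlpid(IntEnum):
    """Hypothetical enumerated type: names for protocol ID values."""
    IP = 0xCC  # value given in the text for routed frame relay carrying IP


def bit_field(data: bytes, bit_offset: int, width: int) -> int:
    """Return `width` bits starting `bit_offset` bits into `data` (MSB first)."""
    total_bits = len(data) * 8
    if bit_offset < 0 or width <= 0 or bit_offset + width > total_bits:
        raise ValueError("bit field lies outside the buffer")
    as_int = int.from_bytes(data, "big")
    shift = total_bits - bit_offset - width
    return (as_int >> shift) & ((1 << width) - 1)


@dataclass
class IPHeader:
    """Parent type: fields common to every IP packet."""
    raw: bytes

    @property
    def version(self) -> int:
        return bit_field(self.raw, 0, 4)    # upper 4 bits of byte 0

    @property
    def header_len_words(self) -> int:
        return bit_field(self.raw, 4, 4)    # lower 4 bits of byte 0

    @property
    def protocol(self) -> int:
        return bit_field(self.raw, 72, 8)   # byte 9


@dataclass
class UDPOverIP(IPHeader):
    """Child type: inherits the IP fields and adds a UDP field."""

    @property
    def source_port(self) -> int:
        start = self.header_len_words * 4 * 8  # UDP header follows the IP header
        return bit_field(self.raw, start, 16)
```

With these definitions, a query-like expression such as `Nlpid(0xCC).name` yields the symbolic name, and a `UDPOverIP` value exposes both its own fields and those inherited from `IPHeader`.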

For traffic analysis, support for both an extensible type system and a DDL is crucial. Extensions are needed because some hardware-generated time stamp fields and some fields of network protocols are difficult to describe in a data description language (for example, the DLCI field of a frame relay packet takes several bits from two different bytes of the packet header and combines them into a short int). Extensions are also used to incorporate into Tribeca the exotic statistical estimators used by the traffic analysts. The analysts also want control over the implementation of less exotic estimators, like mean, to ensure that the operator will be numerically stable for their workload. The DDL is important because it allows our non-programmer users to retarget their queries at new networks or at higher levels of the protocol stack without implementing extensions.
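Both kinds of extension mentioned above can be sketched in a few lines of Python. The DLCI function assumes the common two-byte frame relay address format, in which the upper six bits of the first byte and the upper four bits of the second byte form a 10-bit DLCI. The running mean uses an incremental update, which is one common way to avoid accumulating a very large sum; the text does not say which formulation the analysts actually chose. The names dlci and RunningMean are hypothetical.

```python
def dlci(address: bytes) -> int:
    """Combine bits from two header bytes into a 10-bit DLCI.

    Assumes the two-byte frame relay address format:
      byte 0: DLCI bits 9..4 | C/R | EA
      byte 1: DLCI bits 3..0 | FECN | BECN | DE | EA
    """
    if len(address) < 2:
        raise ValueError("need at least two address bytes")
    high_six = address[0] >> 2   # drop the C/R and EA bits
    low_four = address[1] >> 4   # drop FECN, BECN, DE, and EA
    return (high_six << 4) | low_four


class RunningMean:
    """Incremental mean: updates the estimate one value at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Move the mean toward the new value by 1/count of the gap,
        # instead of keeping a running sum that can grow very large.
        self.mean += (value - self.mean) / self.count
        return self.mean
```

The DLCI case shows why such a field is awkward to declare as a single offset-and-width bit field: its bits are not contiguous in the header.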

The inheritance mechanism and ad-hoc types also help Tribeca queries handle the diversity of higher-level protocols used in networks. While at the lowest level all packets on the same network use the same protocol (e.g., a frame relay network carries only frame relay packets), higher-level protocols can be quite diverse and their packets interleaved in complex ways. A Tribeca query examines each packet to find out what higher-level protocol it uses, coerces the packet to the appropriate child type, and extracts the fields of interest from the higher-level part of the packet. This extraction creates an ad-hoc type that can be examined by later parts of the query. We'll see an example of this in the section on multiplexing.
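The examine, coerce, and extract pattern described above can be sketched in Python as follows. This is not Tribeca query syntax. The class names, the CHILD_TYPES table, and the PortPair record are hypothetical, and the field positions assume the standard IPv4, TCP, and UDP header layouts.

```python
from collections import namedtuple

PROTO_TCP = 6    # IP protocol numbers
PROTO_UDP = 17


class IPPacket:
    """Parent type: the part of the packet every IP packet shares."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    @property
    def protocol(self) -> int:
        return self.raw[9]

    @property
    def payload_offset(self) -> int:
        return (self.raw[0] & 0x0F) * 4   # header length in bytes


class PortedPacket(IPPacket):
    """Shared helper: TCP and UDP both start with source and destination ports."""

    @property
    def ports(self):
        o = self.payload_offset
        src = int.from_bytes(self.raw[o:o + 2], "big")
        dst = int.from_bytes(self.raw[o + 2:o + 4], "big")
        return src, dst


class TCPPacket(PortedPacket):
    pass


class UDPPacket(PortedPacket):
    pass


CHILD_TYPES = {PROTO_TCP: TCPPacket, PROTO_UDP: UDPPacket}

# Stand-in for an ad-hoc composite type built inside a query.
PortPair = namedtuple("PortPair", "protocol src_port dst_port")


def extract_ports(packets):
    """Examine each packet, coerce it to a child type, extract fields."""
    for raw in packets:
        parent = IPPacket(raw)
        child_type = CHILD_TYPES.get(parent.protocol)
        if child_type is None:
            continue                      # protocol this query ignores
        child = child_type(raw)           # coercion to the child type
        src, dst = child.ports
        yield PortPair(parent.protocol, src, dst)
```

Later stages of a query would then operate on the small `PortPair` records rather than on the full packets.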
