

Data types and constants

 

Atomic types. Bro supports several types familiar to users of traditional languages: bool for booleans, int for integers, count for non-negative integers (``unsigned'' in C), double for double-precision floating point, and string for a series of bytes. The first four of these (all but string) are termed arithmetic types, and mixing them in expressions promotes bool to count, count to int, and int to double.
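The promotion chain can be modeled as a simple ranking. The following is a Python sketch, not Bro code; the names RANK and promoted_type are hypothetical, and it assumes that a mixed expression is carried out in the higher-ranked of its two operand types, which the text implies but does not spell out.

```python
# Promotion order taken from the text: bool -> count -> int -> double.
# string is deliberately absent: it is not an arithmetic type.
RANK = {"bool": 0, "count": 1, "int": 2, "double": 3}

def promoted_type(left: str, right: str) -> str:
    """Return the type a mixed arithmetic expression would be carried out in.

    Assumption: the result is the higher-ranked operand type.
    """
    for name in (left, right):
        if name not in RANK:
            raise TypeError(f"{name} is not an arithmetic type")
    return max(left, right, key=RANK.__getitem__)

print(promoted_type("bool", "int"))      # int
print(promoted_type("count", "double"))  # double
```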

Bro provides T and F as bool constants for true and false; a series of digits for count constants; and C-style constants for double and string.

Unlike in C, however, Bro strings are represented internally as a count and a vector of bytes, rather than a NUL-terminated series of bytes. This difference is important because NULs can easily be introduced into strings derived from network traffic, either by the nature of the application, inadvertently, or maliciously by an attacker attempting to subvert the monitor. An example of the latter is sending the following to an FTP server:

    USER nice\0USER root
where ``\0'' represents a NUL. Depending on how it is written, the FTP application receiving this text might well interpret it as two separate commands, ``USER nice'' followed by ``USER root''. But if the monitoring program uses NUL-terminated strings, then it will effectively see only ``USER nice'' and have no opportunity to detect the subversive action.

Similarly, it is important that when Bro logs such strings, or prints them as text to a file, it expands embedded NULs into visible escape sequences to flag their appearance.
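The difference between the two representations can be seen in a few lines of Python, whose bytes type also stores a length plus raw bytes. This is an illustrative sketch only: c_string_view and visible are hypothetical helpers, and the \xNN escape format is an assumption, since the text does not say which escape sequences Bro emits.

```python
def c_string_view(data: bytes) -> bytes:
    """What a NUL-terminated representation sees: bytes up to the first NUL."""
    return data.split(b"\0", 1)[0]

def visible(data: bytes) -> str:
    """Render bytes as text, expanding NULs and other non-printable bytes.

    Backslash is escaped too, so the output stays unambiguous.
    The \\xNN format is an assumption, not Bro's documented output.
    """
    out = []
    for b in data:
        if 0x20 <= b < 0x7F and b != 0x5C:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)

line = b"USER nice\0USER root"
print(len(line))            # 19: the counted form keeps every byte
print(c_string_view(line))  # b'USER nice'
print(visible(line))        # USER nice\x00USER root
```

A monitor working from the truncated view never sees the second command, while the counted form preserves it and the escaped rendering makes the NUL obvious in a log.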

Bro also includes a number of non-traditional types, geared towards its specific problem domain. A value of type time reflects an absolute time, and interval a difference in time. Subtracting two time values yields an interval; adding or subtracting an interval to a time yields a time; adding two time values is an error. There are presently no time constants, but interval constants can be specified using a numeric (possibly floating-point) value followed by a unit of time, such as ``30 min'' for thirty minutes.
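Python's datetime and timedelta types follow the same algebra and make a convenient model of these rules. In the sketch below, parse_interval and the UNITS table are hypothetical: only ``min'' appears in the text, and the other unit names are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical unit table; only "min" is attested in the text.
UNITS = {"sec": 1.0, "min": 60.0, "hr": 3600.0, "day": 86400.0}

def parse_interval(text: str) -> timedelta:
    """Parse an interval constant such as '30 min' or '1.5 min'."""
    value, unit = text.split()
    return timedelta(seconds=float(value) * UNITS[unit])

start = datetime(1997, 12, 6, 1, 0, 0)
end = datetime(1997, 12, 6, 1, 30, 0)

elapsed = end - start                     # time - time     -> interval
later = start + parse_interval("30 min")  # time + interval -> time

try:
    start + end                           # time + time is rejected
except TypeError:
    print("adding two time values is an error")
```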

The port type corresponds to a TCP or UDP port number. TCP and UDP ports are distinct (internally, Bro distinguishes between the two, both of which are 16-bit quantities, by storing port values in a 32-bit integer and setting bit 17 for UDP ports). Thus, a variable of type port can hold either a TCP or a UDP port, but at any given time it is holding exactly one of these.
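The encoding described in the parenthetical might look as follows. This Python sketch assumes that ``bit 17'' is counted from 1, so that the flag sits just above the 16-bit port number (mask 0x10000); the function and constant names are hypothetical.

```python
UDP_FLAG = 1 << 16   # "bit 17", counting from 1 (numbering is an assumption)
PORT_MASK = 0xFFFF   # low 16 bits hold the port number itself

def make_port(number: int, proto: str) -> int:
    """Pack a port number and protocol into one integer value."""
    if not 0 <= number <= PORT_MASK:
        raise ValueError("port numbers are 16-bit quantities")
    if proto == "tcp":
        return number
    if proto == "udp":
        return number | UDP_FLAG
    raise ValueError(f"unknown protocol {proto!r}")

def is_udp(port: int) -> bool:
    return bool(port & UDP_FLAG)

def port_number(port: int) -> int:
    return port & PORT_MASK

tcp53 = make_port(53, "tcp")
udp53 = make_port(53, "udp")
print(tcp53 != udp53)                            # True: the two are distinct
print(port_number(tcp53) == port_number(udp53))  # True: same 16-bit number
```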

There are two forms of port constants. The first consists of an unsigned integer followed by either ``/tcp'' or ``/udp''. So, for example, ``80/tcp'' corresponds to TCP port 80 (the HTTP protocol used by the World Wide Web). The second form of constant is specified using an identifier that matches one of the services known to the getservbyname library routine. (Probably these service names should instead be built directly into Bro, to avoid problems when porting Bro scripts between operating systems.) So, for example, ``telnet'' is a Bro constant equivalent to ``23/tcp''.

This second form of port constant, while highly convenient and readable, brings with it a subtle problem. Some names, such as ``domain'', on many systems correspond to two different ports; in this example, to 53/tcp and 53/udp. Therefore, the type of ``domain'' is not a simple port value, but instead a list of port values. Accordingly, a constant like ``domain'' cannot be used in Bro expressions (such as ``dst_port == domain''), because it is ambiguous which value is intended. We return to this point shortly.
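The ambiguity can be illustrated with a small stand-in for the services database. The SERVICES table below is a hypothetical excerpt hard-coded for the example (a real system would consult getservbyname), and the helper names are likewise assumptions.

```python
# Hypothetical excerpt of a services database, standing in for getservbyname.
SERVICES = {
    "telnet": ["23/tcp"],
    "domain": ["53/tcp", "53/udp"],
}

def port_constant(name: str) -> list:
    """A service name always denotes a list of ports, possibly of length one."""
    return SERVICES[name]

def as_single_port(name: str) -> str:
    """Use a service name where exactly one port value is required."""
    ports = port_constant(name)
    if len(ports) != 1:
        raise TypeError(f"{name!r} is ambiguous: {ports}")
    return ports[0]

print(as_single_port("telnet"))  # 23/tcp
try:
    as_single_port("domain")
except TypeError as err:
    print(err)                   # reports both candidate ports
```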

Values of type port may be compared for equality or ordering (for example, ``20/tcp < telnet'' yields true), but otherwise cannot be operated on.

Another networking type provided by Bro is addr, corresponding to an IP address. These are represented internally as unsigned, 32-bit integers, but in Bro scripts the only operations that can be performed on them are comparisons for equality or inequality (also, a built-in function provides masking, as discussed below). Constants of type addr have the familiar ``dotted quad'' format, $A_1.A_2.A_3.A_4$, where the $A_i$ all lie between 0 and 255.
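A dotted-quad constant maps onto a 32-bit integer in a straightforward way. The following Python sketch assumes the conventional ordering, with the first component in the most significant byte; parse_addr is a hypothetical name, and the text does not state how Bro orders the bytes internally.

```python
def parse_addr(text: str) -> int:
    """Convert a dotted-quad constant into an unsigned 32-bit integer.

    Assumption: the first component occupies the most significant byte.
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated components")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"component {octet} is outside 0..255")
        value = (value << 8) | octet
    return value

a = parse_addr("128.3.0.1")
b = parse_addr("128.3.0.1")
print(hex(a))   # 0x80030001
print(a == b)   # True: equality is the comparison scripts may perform
```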

More interesting are hostname constants. There is no Bro type corresponding to Internet hostnames, because hostnames can correspond to multiple IP addresses, so one quickly runs into ambiguities if comparing one hostname with another. Bro does, however, support hostnames as constants. Any series of two or more identifiers delimited by dots forms a hostname constant, so, for example, ``lbl.gov'' and ``www.microsoft.com'' are both hostname constants (the latter, as of this writing, corresponds to 13 distinct IP addresses). The value of a hostname constant is a list of addr containing one or more elements. These lists (as with the lists associated with certain port constants, discussed above) cannot be used in Bro expressions; but they play a central role in initializing Bro tables and sets, discussed in § 3.3 below.

Aggregate types. Bro also supports a number of aggregate types. A record is a collection of elements of arbitrary type. For example, the predefined conn_id type, used to hold connection identifiers, is defined in the Bro run-time initialization file as:

    type conn_id: record {
        orig_h: addr;
        orig_p: port;
        resp_h: addr;
        resp_p: port;
    };
The orig_h and resp_h elements (or ``fields'') have type addr and hold the connection originator's and responder's IP addresses. Similarly, orig_p and resp_p hold the originator and responder ports. Record fields are accessed using the ``$'' operator.

For specifying security policies, a particularly useful Bro type is table. Bro tables have two components, a set of indices and a yield type. The indices may be of any atomic (non-aggregate) type, and/or any record types that, when (recursively) expanded into all of their elements, consist only of atomic types. (Thus, Bro tables provide a form of associative array.) So, for example,

    table[port] of string
can be indexed by a port value, yielding a string, and:

    table[conn_id] of ftp_session_info
is indexed by a conn_id record--or, equivalently, by an addr, a port, another addr, and another port--and yields an ftp_session_info record as a result.
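The equivalence between indexing by a record and indexing by its expanded fields has a close analogue in Python, where a named tuple compares and hashes like the plain tuple of its fields. The sketch below is a model rather than Bro code: addresses and ports are held as strings for brevity, and the single user field of FtpSessionInfo is hypothetical, since the text does not list that record's contents.

```python
from typing import NamedTuple

class ConnId(NamedTuple):
    orig_h: str   # originator address
    orig_p: str   # originator port
    resp_h: str   # responder address
    resp_p: str   # responder port

class FtpSessionInfo(NamedTuple):
    user: str     # hypothetical field, for illustration only

# Model of: table[conn_id] of ftp_session_info
sessions = {}

cid = ConnId("10.0.0.1", "1234/tcp", "10.0.0.2", "21/tcp")
sessions[cid] = FtpSessionInfo(user="anonymous")

# Field access, corresponding to cid$resp_p in Bro.
print(cid.resp_p)   # 21/tcp

# Indexing by the four expanded components finds the same entry.
print(sessions[("10.0.0.1", "1234/tcp", "10.0.0.2", "21/tcp")].user)  # anonymous
```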

Closely related to table types are set types. These are simply table types that do not yield a value. Their purpose is to maintain collections of tuples, expressed in terms of the set's indices. The examples in § 3.3 clarify how this is useful.
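A set of tuples can be modeled in Python in the same spirit. The sketch below assumes a set indexed by an address and a port, both held as strings; the names allowed and is_allowed, and the entries themselves, are hypothetical.

```python
# Model of a set indexed by (addr, port): membership only, no yield value.
allowed = {
    ("10.0.0.2", "21/tcp"),
    ("10.0.0.3", "80/tcp"),
}

def is_allowed(host: str, port: str) -> bool:
    """Test whether the (host, port) tuple is a member of the set."""
    return (host, port) in allowed

print(is_allowed("10.0.0.2", "21/tcp"))  # True
print(is_allowed("10.0.0.2", "23/tcp"))  # False
```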

Another aggregate type supported is file. Support for files is presently crude: a script can open files for writing or appending, and can pass the resulting file variable to the print command to specify where it should write, but that is all. Also, these files are simple ASCII. In the future, we plan to extend files to support reading, ASCII parsing, and binary (typed) reading and writing.

We also note that a key type missing from Bro is that of pattern, for supporting regular expression matching against text. We plan to add patterns in the near future.

Finally, above we alluded to the list type, which holds zero or more instances of a value. Currently, this type is not directly available to the Bro script writer, other than implicitly when using port or hostname constants. Since its present use is primarily internal to the script interpreter (when initializing variables, per § 3.3), we do not describe it further.



Vern Paxson
Sat Dec 6 01:53:24 PST 1997