Just as a word in a sentence occurs in a context, a character
in a string can be considered to occur in a context. For example,
in the string ``**object**'' the character ``**t**'' is said to occur
within the context ``**objec**''. However, we can also say that ``**t**''
occurs within the contexts ``**c**'', ``**ec**'', ``**jec**'', and
``**bjec**''. The length of a context is termed its *order*.
In the example string, ``**jec**'' would be considered a
third-order context for ``**t**''. In text compression these contexts are
used to model which characters are likely to be seen next. For
example, given that we have seen the context ``**object**'', it is
likely that the next character will be a space, or possibly an ``**i**'', but
it is unlikely that the next character is an ``**h**''. On the other
hand, if we consider only the first-order context, ``**t**'', then
``**h**'' is not so unlikely. Techniques that track multiple contexts of
varying orders are termed *Multi-Order Context Models* [2].
To prevent the model from quickly growing beyond available resources,
most implementations of a multi-order context model limit the highest
order tracked to some finite number (**m**), hence the term
*Finite Multi-Order Context Model*.
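To make this concrete, the following short sketch (illustrative code, not from the paper) extracts the order-**0** through order-**m** contexts that model the next character of a string:

```python
def contexts(s, m):
    """Return the order-0 through order-m contexts for the next
    character after the string s (fewer if s is shorter than m)."""
    return [s[len(s) - k:] for k in range(min(m, len(s)) + 1)]

# For the example string "objec" with m = 3:
print(contexts("objec", 3))  # ['', 'c', 'ec', 'jec']
```

Here the empty string stands for the order-0 (null) context, written ø in the text.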

At every point in the string, the next character
can be modeled by the last seen contexts (a set of orders **0** through
**m**). For example, take the input string ``**objec**'' and limit our
model to third order (**m=3**). The next character
can now be described by
four contexts { ø, ``**c**'', ``**ec**'', ``**jec**'' }. This set of
contexts can be thought of as the current state of whatever we are
modeling, be it a character input stream or a sequence of file system events.
With each new event, the set
of new contexts is generated by appending the newly seen event to the end
of the contexts that previously modeled our state. If the above
set was our current state at time **t**, and at time **t+1** we see the
character ``**t**'', our new state is described by
the set { ø, ``**t**'', ``**ct**'', ``**ect**'' }. The nature of a context model,
where one set of contexts is built from the previous set, makes it
well suited for a
trie [9],
where the children of each node indicate the events that have
followed the sequence represented by that node. A resulting
property of this trie is that the frequency count for each current
context is equal to the sum of its children's counts plus
one. It is
from this property that we derive our probability estimate in §2.3.
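As a sketch of how such a trie might be maintained (hypothetical code, not the paper's implementation; the actual probability estimate is derived in §2.3), each node stores a frequency count and its children, and the state is updated by extending every current context of order less than **m** with the new event:

```python
class Node:
    def __init__(self, context=""):
        self.context = context   # the sequence this node represents (for display)
        self.count = 1           # frequency count for this context
        self.children = {}       # event -> child Node

class ContextModel:
    """Finite multi-order context model over a trie (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.root = Node()
        self.state = [self.root]      # current contexts, orders 0..m

    def update(self, event):
        self.root.count += 1          # the order-0 (null) context is always current
        new_state = [self.root]
        # Append the new event to every context of order < m; extending an
        # order-m context would exceed the model's limit, so it is dropped.
        for node in self.state[:self.m]:
            child = node.children.get(event)
            if child is None:
                child = node.children[event] = Node(node.context + event)
            else:
                child.count += 1
            new_state.append(child)
        self.state = new_state

model = ContextModel(m=3)
for ch in "objec":
    model.update(ch)
print([n.context for n in model.state])   # ['', 'c', 'ec', 'jec']
model.update("t")
print([n.context for n in model.state])   # ['', 't', 'ct', 'ect']
# Each current context's count equals the sum of its children's counts plus one.
assert all(n.count == sum(c.count for c in n.children.values()) + 1
           for n in model.state)
```

The final assertion checks the property noted above: for every context in the current state, the node's count exceeds the sum of its children's counts by exactly one, since the most recent occurrence of that context has not yet been followed by anything.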

Tom M. Kroeger Tue Nov 14 00:27:35 PST 1995