Conference on Domain-Specific Languages, 1997
Modeling Interactive 3D and Multimedia Animation with an Embedded Language
While interactive multimedia animation is a very compelling medium, few people are able to express themselves in it. There are too many low-level details that have to do not with the desired content--e.g., shapes, appearance and behavior--but rather how to get a computer to present the content. For instance, behaviors like motion and growth are generally gradual, continuous phenomena. Moreover, many such behaviors go on simultaneously. Computers, on the other hand, cannot directly accommodate either of these basic properties, because they do their work in discrete steps rather than continuously, and they only do one thing at a time. Graphics programmers have to spend much of their effort bridging the gap between what an animation is and how to present it on a computer.
We propose that this situation can be improved by a change of language, and present Fran, synthesized by complementing an existing declarative host language, Haskell, with an embedded domain-specific vocabulary for modeled animation. As demonstrated in a collection of examples, the resulting animation descriptions are not only relatively easy to write, but also highly composable.
Any language makes some ideas easy to express and other ideas difficult. As we will argue in this paper, today's mainstream programming languages are ill-suited for expressing multimedia animation (3D, 2D and sound), both in their basic paradigm and their vocabulary. These languages support what we call "presentation-oriented" programming, in which the essential nature of an animation, i.e., what an animation is, becomes lost in details of how to present it. We consider the question of what kind of language is suitable for capturing just the essence of an animation, and present one such language, Fran, synthesized by complementing an existing declarative "host language", Haskell, with an embedded domain-specific vocabulary.
We propose an alternative to "presentation-oriented" programming, namely "modeling", in which a model of the animation is described, leaving presentation as a separate task, to be done automatically. This idea of modeling has been applied fruitfully in the area of non-animated 3D graphics as discussed below, and is now widely, though not universally, accepted. Our contribution is to extend this idea in a uniform style to encompass sound and 2D images as well, and across the time dimension, in order to model animations over a broad range of types. For brevity, this paper concentrates on 3D animation, but it is really the uniform integration of different types that gives rise to great expressive power. (See Elliott and Hudak  for 2D examples.)
While imperative programming languages are suited to presentation-oriented programming, the modeling approach requires a different kind of language. Unfortunately, bringing a useful new language into being is quite a daunting task, requiring design of semantics and syntax, implementation of compilers and environment tools, and writing of educational material. However, as Peter Landin taught us thirty years ago, we can logically separate a language into (a) a domain-specific vocabulary and (b) a domain-independent way of composing more complex things from simpler ones. In other words, a language is a combination of "host language" and a "domain-specific embedded language" (DSEL). By reusing the same host language for several different vocabularies, we can amortize the cost of its creation over more uses. In fact, unlike thirty years ago, we are now fortunate enough to have various candidate languages from which to choose. In this paper, we examine various features of a candidate host language to see which are helpful and which are not helpful for modeled animation. We find that Haskell is a fairly good fit, requiring only a few compromises.
The rest of this paper is organized as follows. Section 2 starts with a few examples of modeled animations. Section 3 introduces the notions of presentation and modeling for non-animated 3D graphics, and looks at some concrete benefits. Section 4 extends the idea and benefits of modeling to a variety of types besides 3D geometry, including sound and 2D images, and across the time dimension. Section 5 considers the pragmatics of creating a new domain specific language (DSL), and motivates the DSEL approach. Section 6 examines the usefulness of host language features in some detail. The remainder of the paper looks at related work and describes some directions for future work on modeled animation.
In this section we present a handful of modeled animations, in order to make later discussion more concrete.
To start, we import a simple 3D model of a sphere from "X file" format.
sphere :: GeometryB
sphere = importX "sphere1.x"
The type GeometryB represents 3D geometry animations. (This "animation" happens to be a "static", i.e., not time-varying, one.) Similarly, we import a teapot model. However, the teapot is in an awkward orientation, so we adjust it after importing, rotating around the X axis by an angle of -pi/2:
teapot :: GeometryB
teapot = rotate3 xVector3 (-pi/2) **% importX "tpot2.x"
Note that in Haskell, function application binds more tightly than all infix operators. Here are the types of the modeling vocabulary we used:
xVector3 :: Vector3B
rotate3 :: Vector3B -> RealB -> Transform3B
(**%) :: Transform3B -> GeometryB -> GeometryB
The constant xVector3 is the unit vector pointing in the positive X direction. rotate3 takes an axis vector and a number and yields a 3D transform. The operator **% applies a 3D transform.
Although types like GeometryB and Vector3B are potentially animated, the example so far uses static animations. Next we will color the teapot red and make it spin around the Y axis.
redSpinningPot = rotate3 yVector3 time **% withColorG red teapot
The new features are "time", the unit Y vector and application of a color to a geometric model:
time :: RealB
yVector3 :: Vector3B
withColorG :: ColorB -> GeometryB -> GeometryB
The use of time here deserves special attention. It is a primitive number-valued animation (hence the type RealB) representing the flow of time. Note that time is not a mutable real value, but a fixed animation. Animations are essentially functions of time, with time being the identity function, and operations like rotate3, withColorG, and **% being combinators that map functions of time to functions of time.
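The view of animations as functions of time can be sketched in a few lines of plain Haskell. This is a toy model for illustration only and is not Fran's actual representation; the names Time, Behavior, at, lift1, lift2 and spinAngle are assumptions introduced here, and the local time and constantB merely play the role of the Fran primitives of the same name.

```haskell
-- A toy model of behaviors as functions of continuous time.
-- Not Fran's implementation; all names here are illustrative.
type Time = Double

newtype Behavior a = Behavior (Time -> a)

-- Sample a behavior at a given time.
at :: Behavior a -> Time -> a
at (Behavior f) = f

-- The flow of time itself: the identity function.
time :: Behavior Time
time = Behavior id

-- A value that never changes.
constantB :: a -> Behavior a
constantB x = Behavior (const x)

-- Lift ordinary functions to work on behaviors, pointwise in time.
lift1 :: (a -> b) -> Behavior a -> Behavior b
lift1 f b = Behavior (\t -> f (b `at` t))

lift2 :: (a -> b -> c) -> Behavior a -> Behavior b -> Behavior c
lift2 f a b = Behavior (\t -> f (a `at` t) (b `at` t))

-- An angle that grows at two radians per second.
spinAngle :: Behavior Double
spinAngle = lift2 (*) (constantB 2) time
```

In this model nothing is ever mutated: spinAngle is a fixed value that can be sampled at any time, in any order, which is the sense in which time is "not a mutable real value".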
Next, we generalize this simple spinning teapot, so that its color and rotation angle are parameters.
spinPot :: ColorB -> RealB -> GeometryB
spinPot potColor potAngle = rotate3 yVector3 potAngle **% withColorG potColor teapot
We will make use of the spinPot function in a series of three interactive 2D animations.
spin1, spin2 :: User -> ImageB
spin1 = withSpin potSpin1
spin2 = withSpin potSpin2
When an animation is interactive, its type is a function from the user supplying input. Hence the type above. Yet to be defined are withSpin, potSpin1, and potSpin2. First, we will give their types and an informal description of their purpose.
potSpin1, potSpin2 :: RealB -> User -> GeometryB
withSpin :: (RealB -> User -> GeometryB) -> User -> ImageB
The two potSpin functions take as arguments an animated number, which will be related to the rotation angle passed to spinPot, and a user from which to get input. In the simplest case, just ignore the user, use red for the pot color, and pass on the angle argument unchanged:
potSpin1 angle u = spinPot red angle
The withSpin function takes one of these geometry producers and renders it together with some textual instructions.
withSpin f u = growHowTo u `over` renderGeometry (f (grow u) u) defaultCamera
The function grow will be defined below. Its job is to turn user input into an animated angle, which gets passed to the geometry producer. The produced geometry is rendered with a default camera to produce a 2D animation, which is combined with the instruction text image. The function renderGeometry takes geometry and camera (animated as always), and yields a 2D animation:
renderGeometry :: GeometryB -> Transform3B -> ImageB
A more interesting pot spinner
Before looking into the definition of grow, we will see the second pot-spinning geometry producer, which adds a few new features:
potSpin2 potAngleSpeed u = spinPot potColor potAngle `unionG` light
  where
    light = rotate3 yVector3 (pi/4) **%
            translate3 (vector3Spherical 2 time 0) **%
            uscale3 0.1 **%
            withColorG white (sphere `unionG` pointLightG)
    potColor = colorHSL (sin time * 180) 0.5 0.5
    potAngle = integral potAngleSpeed u
Note the expression "sin time * 180" used in defining the teapot's color. The meanings of sin and "*" here are not the usual ones, operating on numbers, but rather counterparts "lifted" to consume and produce number-valued animations (of type RealB). Even the numeric literal 180 is taken to mean an unchanging number-valued animation (having type RealB). Haskell's overloading ability, based on type classes, is responsible for this great syntactic convenience. Several dozen functions have been lifted in this way, so that, for instance, sin and "*" not only have the usual types
sin :: Float -> Float
(*) :: Float -> Float -> Float
but also the lifted types
sin :: RealB -> RealB
(*) :: RealB -> RealB -> RealB
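How type classes make this lifting possible can be sketched with a toy behavior type. The sketch below is an assumption-laden illustration rather than Fran's source: Behavior, at, liftB and hue are names invented here, and Fran's real instances cover more operations. The point is only that instance declarations for the standard numeric classes let sin, "*" and numeric literals be reused at behavior types.

```haskell
-- Toy model: overloading numeric operations on behaviors via type classes.
-- Illustrative only; not Fran's actual instance declarations.
type Time = Double

newtype Behavior a = Behavior (Time -> a)

at :: Behavior a -> Time -> a
at (Behavior f) = f

time :: Behavior Time
time = Behavior id

-- Apply a function pointwise in time.
liftB :: (a -> b) -> Behavior a -> Behavior b
liftB g (Behavior f) = Behavior (g . f)

instance Num a => Num (Behavior a) where
  Behavior f + Behavior g = Behavior (\t -> f t + g t)
  Behavior f * Behavior g = Behavior (\t -> f t * g t)
  negate = liftB negate
  abs    = liftB abs
  signum = liftB signum
  -- Numeric literals such as 180 become constant behaviors.
  fromInteger n = Behavior (const (fromInteger n))

instance Fractional a => Fractional (Behavior a) where
  Behavior f / Behavior g = Behavior (\t -> f t / g t)
  fromRational r = Behavior (const (fromRational r))

instance Floating a => Floating (Behavior a) where
  pi    = Behavior (const pi)
  exp   = liftB exp
  log   = liftB log
  sin   = liftB sin
  cos   = liftB cos
  asin  = liftB asin
  acos  = liftB acos
  atan  = liftB atan
  sinh  = liftB sinh
  cosh  = liftB cosh
  asinh = liftB asinh
  acosh = liftB acosh
  atanh = liftB atanh

-- The expression from the text now type checks at the behavior level.
hue :: Behavior Double
hue = sin time * 180
```

With these instances in scope, the compiler chooses the number-level or behavior-level meaning of sin and "*" from the types alone, so the model author writes ordinary arithmetic.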
Now we turn to grow, which converts user input to a time-varying angle (of type RealB). It is defined as the integral of the value generated by bSign, defined below, which produces an animated number that has value zero when no mouse buttons are pressed, but switches to negative one or positive one while the user is holding down the left or right mouse button. The angle value produced by grow is thus growing while the right button is pressed, shrinking while the left is pressed, and constant when neither button is pressed.
grow :: User -> RealB
grow u = integral (bSign u) u
(The reason that even integral takes a user argument is that integration is done numerically, and must somehow know how hard to work on the approximation.)
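To make the remark about numerical integration concrete, here is a minimal sketch over a toy behavior type. It is not how Fran implements integral: the function integralFrom0, its explicit step-size parameter (standing in for the "how hard to work" information that Fran takes from the User argument), and the choice of the midpoint rule are all assumptions of this sketch.

```haskell
-- Toy model of numerical integration of a behavior.
-- Illustrative only; Fran's integral has a different type and implementation.
type Time = Double

newtype Behavior a = Behavior (Time -> a)

at :: Behavior a -> Time -> a
at (Behavior f) = f

time :: Behavior Time
time = Behavior id

-- Approximate the integral of a behavior from time 0 with the midpoint rule.
-- The first argument is the largest step size allowed and must be positive:
-- a smaller step means more work and a better approximation.
integralFrom0 :: Time -> Behavior Double -> Behavior Double
integralFrom0 dt b = Behavior area
  where
    area t =
      let n    = max 1 (ceiling (abs t / dt)) :: Int
          h    = t / fromIntegral n
          mids = [ (fromIntegral i + 0.5) * h | i <- [0 .. n - 1] ]
      in  h * sum [ b `at` m | m <- mids ]

-- Example: a constant angular speed of 2 integrates to the angle 2 * t.
angle :: Behavior Double
angle = integralFrom0 0.01 (Behavior (const 2))
```

This version recomputes the whole sum on every sample, which keeps the sketch short but would be too slow for real use; an incremental scheme that carries the running total from frame to frame would be the natural refinement.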
The bSign function is itself defined in terms of a more general function selectLeftRight, which switches between three values, depending on the left and right button states.
bSign :: User -> RealB
bSign u = selectLeftRight 0 (-1) 1 u

selectLeftRight :: a -> a -> a -> User -> Behavior a
selectLeftRight none left right u =
  condB (leftButton u) (constantB left)
        (condB (rightButton u) (constantB right) (constantB none))
Some explanation: the use of a lower-case type name ("a") above means that selectLeftRight is polymorphic, applying to any type of argument. The function condB is a behavior-level conditional, taking an animated boolean and two animated values, and choosing between the two continuously. The Fran primitive constantB turns a regular "static" value into a constant animated value (as required here by condB). The leftButton and rightButton functions tell whether the mouse buttons are pressed.
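The behavior-level conditional can be tried out in a toy setting. In the sketch below, Behavior and at are illustrative names, the User record is a hypothetical stand-in that carries only the two button states (Fran's User is richer), and scriptedUser is an invented test input; only the shape of selectLeftRight and bSign follows the definitions above.

```haskell
-- Toy model of condB, constantB and selectLeftRight.
-- Illustrative only; the User type here is a hypothetical stand-in.
type Time = Double

newtype Behavior a = Behavior (Time -> a)

at :: Behavior a -> Time -> a
at (Behavior f) = f

constantB :: a -> Behavior a
constantB x = Behavior (const x)

-- Choose between two behaviors, moment by moment.
condB :: Behavior Bool -> Behavior a -> Behavior a -> Behavior a
condB c a b = Behavior (\t -> if c `at` t then a `at` t else b `at` t)

-- Stand-in for Fran's User: just the two button states.
data User = User
  { leftButton  :: Behavior Bool
  , rightButton :: Behavior Bool
  }

selectLeftRight :: a -> a -> a -> User -> Behavior a
selectLeftRight none left right u =
  condB (leftButton u) (constantB left)
        (condB (rightButton u) (constantB right) (constantB none))

bSign :: User -> Behavior Double
bSign = selectLeftRight 0 (-1) 1

-- A scripted user: left button held during [1,2), right during [3,4).
scriptedUser :: User
scriptedUser = User
  { leftButton  = Behavior (\t -> t >= 1 && t < 2)
  , rightButton = Behavior (\t -> t >= 3 && t < 4)
  }
```

Sampling bSign scriptedUser at times 0.5, 1.5 and 3.5 gives 0, -1 and 1 respectively, matching the informal description of bSign.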
It is easy to define these two button state functions in terms of a toggling function that takes two events, which tell when to switch to true and when to switch to false.
leftButton, rightButton :: User -> BoolB
leftButton u = toggle (lbp u) (lbr u)
rightButton u = toggle (rbp u) (rbr u)

toggle :: Event a -> Event b -> BoolB
toggle go stop = stepper False (go -=> True .|. stop -=> False)
The functions lbp, lbr, rbp, and rbr yield left and right button press and release events.
lbp, rbp, lbr, rbr :: User -> Event ()
The stepper function takes an initial value v and an event e, and yields a piecewise-constant behavior that starts out as v and switches to the values associated with occurrences of e. In the definition of toggle, the event is constructed from the go and stop argument events, using the event handling operator "-=>" and the event merging operator ".|.". As a result, the constructed event occurs with value True whenever go occurs and with value False whenever stop occurs. (Note: the event operators are described in Elliott and Hudak , but their semantics have changed since that publication, and now consist of a sequence of occurrences, not just a single one. Also, the button press events and mouse motion behavior are functions of a User rather than a start time.)
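The interplay of stepper, "-=>" and ".|." can be sketched by modeling an event as a time-ordered list of occurrences. This is a simplified illustration and not Fran's implementation: the list representation, the fixity declarations, and the choice that an occurrence takes effect at its own time stamp are all assumptions of this sketch.

```haskell
import Data.List (sortOn)

-- Toy model of events, stepper and toggle.
-- Illustrative only; not Fran's representation of events.
type Time = Double

newtype Behavior a = Behavior (Time -> a)

at :: Behavior a -> Time -> a
at (Behavior f) = f

-- An event as a time-ordered list of occurrences.
newtype Event a = Event [(Time, a)]

-- Fixities chosen (as an assumption) so that
-- go -=> True .|. stop -=> False groups as (go -=> True) .|. (stop -=> False).
infixl 3 -=>
infixl 0 .|.

-- Replace the value carried by every occurrence.
(-=>) :: Event a -> b -> Event b
Event occs -=> x = Event [ (t, x) | (t, _) <- occs ]

-- Merge two events, keeping the occurrences in time order.
(.|.) :: Event a -> Event a -> Event a
Event xs .|. Event ys = Event (sortOn fst (xs ++ ys))

-- Piecewise-constant behavior: the value of the latest occurrence
-- at or before the sample time, or the initial value if there is none.
stepper :: a -> Event a -> Behavior a
stepper v0 (Event occs) = Behavior valueAt
  where
    valueAt t = case [ v | (te, v) <- occs, te <= t ] of
                  [] -> v0
                  vs -> last vs

toggle :: Event a -> Event b -> Behavior Bool
toggle go stop = stepper False (go -=> True .|. stop -=> False)
```

For example, with presses at times 1 and 3 and releases at times 2 and 4, the behavior produced by toggle is False before time 1, True on [1,2), False on [2,3), True on [3,4), and False afterwards.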
Finally, to produce instructions and user feedback, we define growHowTo, which produces a rendered string, colored yellow and moved down to be out of the way. The text gives instructions when neither button is pressed, says "left" while the left button is pressed, and "right" while the right button is pressed. Its definition involves 2D versions of vectors, transform formation and application, and coloring, plus the polymorphic function selectLeftRight, defined above.
growHowTo :: User -> ImageB
growHowTo u = moveXY 0 (-1) (
                withColor yellow (
                  simpleTextImage messageB ))
  where
    messageB = selectLeftRight "Use mouse buttons to control pot's spin"
                               "left" "right" u
Many more examples of functional animation may be found in Elliott and Hudak , Elliott , and Daniels . See also the user's manual (Peterson and Ling ), which contains precise types and informal meanings of the embedded animation modeling vocabulary and still more examples.
With the given examples in mind, we step back from our chosen approach to expressing interactive animation, and consider the history, the benefits of "modeling", and of language embedding.
Presentation vs. modeling for 3D geometry
The practice of 3D graphics programming has made tremendous progress over the past three decades. Originally, if you wanted your program to display some graphics you had to work at the level of pixel generation. You had to master scan-line conversion of lines, polygons, and curved surfaces, hidden surface elimination, and lighting and shading models--rather complex tasks. A significant advancement was the distillation of this expertise into rendering libraries (and of course underlying hardware). With a rendering library, such as GL by Silicon Graphics, you could express yourself at the level of triangles and transformation matrices. While an advancement, these libraries presented a view of a somewhat complex state machine containing registers such as the current material properties and the current local or global transformation matrices. You had to drive this state machine, push register values onto a stack, change them, instruct the library to display a collection of triangles, and restore the registers at the right time.
The next major advancement was to further factor out common chores of graphics presentation into libraries that presented complex structured models, as exemplified in such systems as PHIGS, SGI's Inventor and Performer, VRML, and Microsoft's Direct3D RM (retained mode). The paradigm shift from presentation to modeling for geometry has had several practical benefits:
In spite of the benefits listed above, not everyone has made the shift from presentation to modeling of geometry. The primary source of resistance to this paradigm shift has been that it entails a loss of low level control of execution, and hence efficiency. As mentioned above, handing over low level execution control from the application to the presentation sub-system actually benefits execution efficiency where authors lack the significant resources and expertise required to implement, optimize, and port their programs for all required platforms. In other cases, as in the case of current state-of-the-art commercial video games, the resources and expertise are available and well worth the considerable investment. An example is Doom, which would have been a failure at the time if implemented on top of a general-purpose presentation library. On the other hand, even Doom and its successors really adopt the modeling paradigm, in that they consist of a rendering engine paired with a modeling representation. In addition to the loss of direct control of efficiency, modeling tends to eliminate some flexibility in the form of presentation-level tricks that do not correspond to any expressible model. In our experience, these tricks tend not to scale well and are not composable, and in cases where they do scale, they are achievable through model extensibility.
There have been many other similar paradigm shifts, generally embodied in specialized languages sometimes with corresponding tools that generate the language. Examples include dialog box languages and editors; grammar languages and parser generators; page layout languages and desktop publishing programs; and high-level programming languages and compilers.
Modeling vs. presentation for animation
The conventional approach to constructing richly interactive animated content is much like the old days of graphics rendering, as described briefly above; that is, one must write sequential, imperative programs. (Much animation is in fact modeled rather than programmed, because it comes from animation authoring tools, but interaction is severely limited, for instance to hyper-linking.) These programs must explicitly manage common implementation chores that have nothing to do with the content of animation itself, but rather its presentation on a digital computer. These implementation chores include:
The essence of modeled animation is to carry the presentation/modeling paradigm shift beyond static (non-time-varying) 3D geometry, and thus more broadly reap the kind of benefits described in the previous section. The extensions to static geometric modeling embodied in modeled animation include the following:
By extending modeling from static 3D to other types and to animation, we also extend the modeling benefits listed in the previous section. Most of these benefits translate in straightforward ways, but some possible non-obvious extensions are as follows:
So far, we have used the term "modeling language" loosely. In this section, we make a more precise examination of the different possible notions of "language" and some of their pros and cons for practical use.
A language may be thought of as the combination of two complementary aspects. One aspect is domain-generic, and contains fundamental syntactic and semantic notions like definition and use of names for values and types, construction and application of functions or procedures, control flow, and typing rules. The other language aspect is a domain-specific vocabulary, describing, e.g., math operations on floating point numbers, string manipulation, lists and trees, and in our context, geometry, imagery, sound and animation.
Holding these two language aspects in mind, there are two strategies we could adopt in making concrete the idea of an animation modeling language, or any DSL, which we will call "integrated" and "embedded" respectively. In the integrated approach, the DSL combines both language aspects. In the embedded approach, the domain-specific vocabulary is introduced into an existing "host" programming language. While these two strategies may be similar in spirit, the pragmatics of carrying them out differ considerably.
The chief advantage of integration is that one can have a perfectly suited language, semantically and syntactically, while the embedded approach requires toleration of compromises made to accommodate a broad range of domains. In return for this toleration, the embedded DSL approach allows us to use already existing language infrastructure.
To be useful in practice, not just a toy or a research experiment, a complete DSL needs several components, well designed and well executed:
Given this list, we have ample incentive to try to make the embedded DSL approach work, if we can find a sufficiently suitable existing host language. We now take a closer look at the question of what features constitute suitability.
Choosing a host language for modeled animation
We have found a variety of host language features to be helpful for animation modeling, while others were harmful. The helpful features include the following. Some of these features are obvious from a programming language perspective, but are in fact missing or very weakly present in popular model formats for geometry and animation.
Laziness also plays a role complementary to garbage collection, for efficient use of memory. Laziness delays consumption of memory until just before an animation component is needed, while garbage collection frees the memory when an animation component is no longer needed.
Imperative programming languages, such as C, C++, Java and Visual Basic, have statements in addition to expressions, and in fact, emphasize statements over expressions. For example, in these languages, it is possible to introduce a scoped variable in a statement, but not in an expression. Also, the "if" construct works on statements, though C has its ternary ?: expression operator. While expressions are primarily for denoting values, statements are for denoting changes to an internal or external state. State changes certainly occur during presentation of a model, but are not appropriate in the model itself, as they interfere with composability, optimizability, and multithreaded, parallel and distributed execution. Common language features that are statement-oriented, and which thus are not useful for modeled animation, include the following:
Given the language requirements and non-requirements above, we now return to the "integrated-vs-embedded" question, keeping in mind that design and implementation of a new programming language and development tools, and creation of required educational material are formidable tasks, not to be undertaken unless genuinely necessary. Fortunately, there are well-suited existing languages, the so-called "statically typed, higher-order, purely functional" languages. Of those languages, Haskell (Hudak et al [1992b], Hudak and Fasel [1992a]) has the largest following, has an international standard (Haskell 1.4) and is undergoing considerable development. For these reasons, we have chosen Haskell for our own implementation of the ideals of modeled animation. Other languages can be used as well, with varying tradeoffs. For example, Java is more popular than Haskell, and while predominately statement-oriented, it does support garbage collection.
While neither the current development tools and educational material for Haskell programming, nor the size of the Haskell programming community, is impressive compared to those of mainstream languages, we believe that both are sufficient to act as a seed, with which to generate initial compelling applications. We hope that these initial applications will inspire curiosity and creativity of a somewhat larger set of programmers, leading to better development tools and written materials, yet more compelling applications, and so on, in a positive feedback cycle.
Aside from issues of familiarity, there will always be an important role for imperative computation in the construction of complete applications, which is best described using statement-oriented programming languages. One then could throw such features into a modeling language, or even try to force imperative programming languages to also serve as modeling languages. We prefer the approach of multi-lingual integration, which is to support construction of application modules in a variety of languages and then combine the parts, generally in compiled form, with a language neutral tool.
The idea of a "domain-specific embedded language" is, we believe, the central message in Landin's seminal "700" paper:
Arya  used a lazy functional language to model non-interactive 2D animation as lazy lists of pictures, constructed using list combinators. This work was the original inspiration for our own; we have extended it to interactivity, continuous time, and many other types besides images.
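The difference between a lazy list of pictures and a function of continuous time can be illustrated with a small sketch. The names Frames and sample below are invented for this illustration and are not taken from Arya's system or from Fran; the sketch only shows that a discrete frame list can be derived from a continuous behavior at any chosen frame rate, while the reverse direction would require interpolation.

```haskell
-- Toy comparison of discrete frame lists and continuous-time behaviors.
-- Illustrative only; neither Arya's nor Fran's actual interface.
type Time = Double

newtype Behavior a = Behavior (Time -> a)

-- Discrete view: an animation as a lazy, possibly infinite list of frames.
type Frames a = [a]

-- Sample a continuous behavior at a fixed rate (frames per second).
-- The result is infinite; laziness ensures only demanded frames are computed.
sample :: Double -> Behavior a -> Frames a
sample fps (Behavior f) = [ f (n / fps) | n <- [0 ..] ]
```

Because the behavior does not commit to a frame rate, the same model can be sampled at 2 or at 60 frames per second, for instance with take 3 (sample 2 b) to obtain the values at times 0, 0.5 and 1.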
TBAG modeled animations over various types as functions over continuous time (Elliott et al , Schechter et al ). It also used the idea of lifting functions on static values into functions on animations, which we adopted for Fran. Unlike Fran, however, reactivity was handled imperatively, through constraint assertion and retraction, performed by an application program. Like Fran, TBAG was an embedded language, but it used C++ as its host language, in an attempt to appeal to a wider audience. The C++ template facility was adequate for parametric polymorphism. The notation was in some ways even more malleable than in Haskell, because C++ overloading is genuinely ad hoc. On the other hand, unlike Haskell, C++ only admits a small fixed set of infix operators. The greatest failings of C++ (or Java) as a host language for a modeling language are its lack of an expression-level "let", and the absence of higher-order functions. The latter may be simulated with objects, but without a notational equivalent to lambda expressions.
Obliq-3D is another 3D animation system embedded in a more general purpose programming language (Najork and Brown ). However, its host language is primarily imperative and object-oriented, rather than functional. Accordingly, Obliq-3D's models are initially constructed, and then modified, by means of side-effects. In this way it is reminiscent of Inventor (Strauss ).
Direct Animation is a library developed at Microsoft to support interactive animation (Microsoft ). It is designed to be used from mainstream imperative languages such as Java, and mixes the functional and imperative approaches. Fran and Direct Animation both grew out of an earlier design called ActiveVRML (Elliott ), which was an "integrated" DSL.
There are also several languages designed around a synchronous data-flow notion of computation, including Signal (Gautier et al ) and Lustre (Caspi et al ), which were specifically designed for control of real-time systems. In Signal, the most fundamental idea is that of a signal, a time-ordered sequence of values. Unlike Fran, however, time is not a value, but rather is implicit in the ordering of values in a signal. By its very nature time is thus discrete rather than continuous, with emphasis on the relative ordering of values in a data-flow-like framework. The designers of Signal have also developed a clock calculus with which one can reason about Signal programs. Lustre is a language similar to Signal, rooted again in the notion of a sequence, and owing much of its nature to Lucid (Wadge and Ashcroft ).
Traditionally the programming of interactive 3D and multimedia animations has been a complex and tedious task. We have argued that one source of difficulty is that the languages used are suited to describe how to present animations, and in such descriptions the essential nature of an animation, i.e., what an animation is, becomes lost in details of how to present it. Focusing on the "what" of animation, i.e., modeling, rather than the "how" of its presentation, yields a much simpler and more composable programming style. The modeling approach requires a new language, but this new language can be synthesized by adding a domain-dependent vocabulary to an existing domain-independent host language. We have found Haskell quite well-suited, as demonstrated in a collection of sample animation definitions.
A running theme of this paper has been economy of scale. We recommend making choices that amortize effort required over several uses of the fruits of that effort. The alternatives are poor quality or impractically high cost. Specifically:
A notable exception to the necessity of modeling, embedding and composability for high quality interactive animation is in software that can sell in huge quantity, which then exploits an end-user economy of scale. The unfortunate consequence of this exception, however, is a kind of mainstreaming of the content, as in violent video games. Fortunately, even these games are often implemented using the modeling approach, and allow consumers to create new characters and worlds for them.
There are ample opportunities for future work in modeled animation, including the following.
My thoughts on "domain-specific embedded language" have been greatly influenced by Paul Hudak. Philip Wadler pointed out the connection to Landin's "700" paper. Todd Knoblock and Jim Kajiya helped to explore the basic ideas of modeled animation. Sigbjorn Finne helped with the implementation during a summer research internship. Alastair Reid made many implementation improvements. Paul Hudak, Alastair Reid, and John Peterson at Yale provided many helpful discussions about functional animation, how to use Haskell well, and lazy functional programming in general. Gary Shu Ling helped get Fran running under GHC. Byron Cook gave many helpful comments on an earlier draft to improve readability.
Fran runs under Windows 95 and NT 4.0, and is freely available at http://www.research.microsoft.com/~conal/Fran/.
Anthony Daniels , "Fran in Action!", in preparation, http://www.cs.nott.ac.uk/~acd/action.ps
Conal Elliott [February 1996], "A Brief Introduction to ActiveVRML", Technical Report MSR-TR-96-05, Microsoft Research, ftp://ftp.research.microsoft.com/pub/tr/tr-96-05.ps
Conal Elliott , "Composing Reactive Animations", To appear in Dr. Dobb's Journal, http://www.research.microsoft.com/~conal/fran/tutorial.htm .
Conal Elliott, Greg Schechter, Ricky Yeung and Salim Abi-Ezzi [July 1994], "TBAG: a High Level Framework for Interactive, Animated 3D Graphics Applications", in Andrew Glassner, editor, Proceedings of SIGGRAPH '94 (Orlando, Florida), pages 421-434. ACM Press, http://www.research.microsoft.com/~conal/tbag/papers/siggraph94.ps
Conal Elliott and Paul Hudak [June 1997], "Functional Reactive Animation", in Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming, http://www.research.microsoft.com/~conal/papers/icfp97.ps
Thierry Gautier, Paul Le Guernic, and Loic Besnard , "Signal: A Declarative Language for Synchronous Programming of Real-Time Systems", in Gilles Kahn, editor, Functional Programming Languages and Computer Architecture, volume 274 of Lecture Notes in Computer Science, edited by G. Goos and J. Hartmanis, pages 257-277. Springer-Verlag, 1987.
Paul Hudak and Joseph H. Fasel [May 1992a], "A Gentle Introduction to Haskell". SIGPLAN Notices, 27(5). See http://haskell.org/tutorial/index.html for latest version.
Paul Hudak and Simon L. Peyton Jones and Philip Wadler (editors) [March 1992b], "Report on the Programming Language Haskell, A Non-strict Purely Functional Language (Version 1.2)", SIGPLAN Notices. See http://haskell.org/report/index.html for latest version.
Microsoft , DirectAnimation, in the Microsoft DirectX web page, http://www.microsoft.com/directx.
John Peterson and Gary Shu Ling , "Fran User's Manual", http://www.haskell.org/fran/fran.html
Simon Peyton Jones and Andre Santos , "Compiling Haskell by Program Transformation: a Report from the Trenches", ESOP '96: 6th European Symposium on Programming, Linkoping Sweden, April 22--24, 1996, Lecture Notes in Computer Science, Vol. 1058, Springer-Verlag Inc. http://www.dcs.gla.ac.uk/fp/authors/Simon_Peyton_Jones/comp-by-trans.ps.gz
Greg Schechter, Conal Elliott, Ricky Yeung and Salim Abi-Ezzi , "Functional 3D Graphics in C++ - with an Object-Oriented, Multiple Dispatching Implementation", in Proceedings of the 1994 Eurographics Object-Oriented Graphics Workshop. Springer Verlag, http://www.research.microsoft.com/~conal/papers/eoog94.ps
This paper was originally published in the
Proceedings of the Conference on Domain-Specific Languages,
October 15-17, 1997,
Santa Barbara, California, USA