Check out the new USENIX Web site.

Home About USENIX Events Membership Publications Students
WORLDS '04 Paper    [WORLDS '04 Technical Program]



Lessons from E-speak

Alan H. Karp
Hewlett-Packard Laboratories
alan.karp@hp.com, https://hpl.hp.com/personal/Alan_Karp

Abstract

E-speak was the technology base for HP's E-services initiative, which was announced in 1999. It was designed to be a scalable distributed system, and it met all of its design goals. Although it's no longer around as a supported product, the lessons we learned, both positive and painful, may be helpful to others.

1  Introduction

The basic idea behind e-speak was to improve interoperability in distributed systems that crossed administrative domains by turning everything into a service. Today this approach is called Web Services or the Service Oriented Architecture. For that reason, e-speak has been called ``the industry's first web services platform.'' [5] and ``web services before there were web services''. The most succinct definition is ``E-speak is roughly what you'd get if you crossed CORBA with LDAP and simplified the resulting mess a bit.'' [6]. Of course, e-speak preceded SOAP, WSDL, and even the widespread acceptance of XML, but all the essential elements of web services, and more, were there.

At one point, over 60 companies were evaluating or using e-speak [5]. Nevertheless, work on e-speak stopped when HP dropped its entire suite of middleware products in 2002. At that time, there were four major users of e-speak, several of whom continued to use their e-speak platforms for a year or more.

We did some things right, and we did some things wrong. After briefly describing what problems e-speak was intended to solve, I'll enumerate some of the lessons we learned. It has been said that a fool learns from his own mistakes, a wise man learns from the mistakes of others. At best we fell into the former category. The goal of this paper is to help you land in the latter one.

2  Architecture

The name e-speak was applied to two, somewhat different architectures. The first was built to be a single system image for the Internet [1]; the second was a B2B platform [2]. Because of their different goals, they used somewhat different mechanisms for authorizing access. However, they were both based on the same set of assumptions.

Large scale: E-speak was designed for a million machines. Hence, it did not have anything centralized and couldn't rely on ever being in a consistent state.

Dynamic: Something is always changing. It is important that we not require developers to deal with such a dynamic environment; it's hard enough to write applications in a static world. The e-speak platform hid many of the changes from the applications.

Heterogeneous: The world is heterogeneous and getting more so, and not just in hardware platform or operating system, but in device capability as well. E-speak's distribution model allowed devices to implement as much or as little of the protocol and environment as appropriate.

Hostile: As we know, there are bad people out there. Some are bad for financial gain; others merely for the challenge of breaking things. Security is critical, but it can't make the system too rigid. E-speak's security mechanisms allowed distrustful parties who implemented completely different security policies to interact while controlling their risk.

Many fiefdoms: There are a variety of organizations that want to use such systems, but getting them to change how they do things is difficult. Getting them all to agree on a single way is nearly impossible, and once you've got agreement, making changes is even worse. E-speak allowed interoperation even across organizations with incompatible policies.

3  Dos

We did some things right; we did some things wrong. This section summarizes some of the lessons from what we did right.

Don't put policy into the architecture. Too many systems build specific policies into the architecture. For example, mandating a specific digital certificate format for carrying information requires everyone to use this format, even if they already use a different format internally. Basing access control on identity requires a unified identity scheme across the entire environment. Most military systems build a particular version of multi-level security into the architecture, which makes it hard for them to interact with those with a different definition or number of security levels. E-speak's flexible mechanisms allowed us to implement a wide variety of policies.

People who never interact should not have to agree. Too many of our distributed systems require global agreement. In the best of these systems, it is only the version of the protocol that everyone must use. Even this level of agreement is too much, since it requires synchronous upgrades, which is clearly untenable in a large scale system. More commonly, numerous policies are hardwired.

The most common problems concern naming and ontologies. Too many systems require the entire system use a single name system and a single, global ontology. Since global agreement is needed, updates take too long. People either abandon the system or implement updates in their own communities, fragmenting the system into incompatible parts.

Nearly everything in e-speak was pairwise. We also decided that it was all right if two parties could not communicate. This decision meant that many problems of upgrading components could be left to individual policies instead of being specified in the architecture.

Think about security early and often. Everyone says that you've got to include security from the beginning, but few do, at least in a meaningful way. The first question to be answered is what you mean by security. You must identify what assets you're protecting and the threats you're protecting them from. Only then can you define your security mechanism. Be careful, though, it is easy to architect policy instead of just mechanisms.

In e-speak everything was a service, so it made sense to control access to the methods provided by the service. Because of the large scale and different administrative domains, basing access control on identity was clearly untenable. Hence, we settled on a capability-like approach [4]. Limiting ourselves to mechanisms proved its value when we found that we could enforce such disparate policies as Unix-style security, multiple security levels, and compartments without any change to the mechanisms.

Think about naming early and often. Designers of distributed systems invariably assume that the name space can be partitioned. However, in a dynamic environment with hostile participants, this assumption is unwarranted. The problem with names is that they reside in many places -- programs, data files, even people's heads. The goal is to build a name system in which applications don't break when someone renames something.

It has been shown that no single naming system can be human meaningful, securely collision free, and globally context free [11]. A human meaningful name has meaning in some particular context. URLs have this property to some extent, but they lack other desirable properties. Securely collision free means that names can't be spoofed. Clearly, return addresses on email do not have this property. The usual approach is to use a private key in a private/public key system to construct the name. Globally context free means that the name doesn't depend on the location of the namer or the named. Sufficiently large random numbers have this property.

It is also important that names in a large scale distributed system have both spatial and temporal integrity. Spatial integrity means that the name used for something doesn't change when the namer or the named changes locations, as it does today when moving from inside to outside a corporate firewall. Temporal integrity means that the name shouldn't change because the passage of time caused some external factor to change, as it does today when companies merge or when a private key used to construct a name must be changed.

E-speak was based on path based names. Each client, sort of like a process, had a private name space. Pairwise translation was used to move the request between the namer and the named. The advantage was that any pair could change the name used without affecting anyone else along the path. The disadvantage was that names had no meaning out of band. Also, if an intermediary was unavailable, the service was unreachable. However, paths could be shortened by introduction.

Avoid special cases. Special cases are an architectural nightmare, a development nightmare, a maintenance nightmare. We went to considerable effort to avoid special cases in e-speak, and it paid benefits. The service engine, analogous to an operating system kernel, was relatively small and only had about a dozen distinct resource types to deal with.

Sticking to this policy also had an unexpected benefit. E-speak provided for service discovery with constraint based search using vocabularies, an ontology representation based on attribute value pairs. Since everything in e-speak was a service, so were vocabularies. That meant a vocabulary could be advertised as would any other service. A search might turn up services and new vocabularies that extended the descriptions used to find them. The result was that e-speak provided a dynamically extensible ontology framework.

Plan for delegation/Plan for revocation. Most of the systems we use today depend on identification or authentication to determine access. There are many problems with such an approach, such as confused deputy attacks [3], but the biggest problem is the inability to designate a delegate. The unfortunate result is that people tend to share their identities.

Delegation also simplifies management when access crosses administrative domains. In today's world, I get a list of employees (or roles, it doesn't much matter) in your company who are authorized to use my service. When someone in your company changes jobs or a role changes responsibility, you tell me, and I update my list. The problem is that we each have thousands of partners and spend all our time updating our respective lists. With easy delegation, I give your company a delegatable right to use my service. How you manage it is up to you. If the right is easily revoked, you can delegate it to the appropriate subset of your employees.

Support Voluntary Oblivious Compliance. There will be people who want to break the rules. Not just strangers, but people in your organization, too. Unfortunately, there's nothing you can do to prevent misuse of a legitimate authority. Don't even try. In other words, DPWYCP (Don't Prohibit What You Can't Prevent). However, those who want to follow the rules need some help. The rules are complex, and they frequently change. If your system requires that everyone know these rules in order for them to be enforced, they won't be.

E-speak supported what we now call ``Voluntary Oblivious Compliance'' (VOC). Let's say you ask me for access to a service. Should I give it to you? I could certainly expend some effort to find out, but there's a race condition. Your access might be revoked just after I ask. E-speak took a different approach. I'd just send you a reference to the service, and the platform would prevent you from using it if you shouldn't have gotten it. The specific mechanism, negative permissions from split capabilities [4], isn't as important as the ability to support the concept.

Design for Consistency Under Merge. People will build private copies of your distributed system. At some later date they will want to merge these copies. If you're not careful, one side or the other will have to go through painful modifications in order to eliminate conflicts between the systems.

You can't rely on a partitionable name space to help you, either. A major supercomputer center spent several weeks trying to merge two large clusters until they discovered that two ethernet cards had the same MAC address. In fact, AOL at one time assigned the same MAC address to every dial-in user. Several attempts at merging private UDDI repositories failed due to conflicting GUIDs, even though the GUID algorithm supposedly generates unique strings. E-speak was consistent under merge because of the name system and the distribution model.

Don't authenticate when you want to authorize. Too many times our systems ask ``Who are you?'' when they want to know if your request should be honored. Most times knowing who you are doesn't carry enough information to make an informed decision. If my access is granted because I work for a business partner, you don't want to know who I am. You only want know that business partner has authorized me to make the request.

Relying on identity also makes delegation difficult. The unfortunate result is that people share their passwords, which loses a valuable use for identity, audit trails. A final problem is that it isn't a person making the request; it's software. There is no assurance that the software is acting in the user's best interest. A virus certainly doesn't. If access is controlled by identity, then the software necessarily runs with the user's privileges. E-speak properly separated identification, authentication, authorization, and access control.

4  Questionable Decisions

Some decisions we made weren't obviously right or wrong. We had good reasons for doing what we did, but perhaps those reasons didn't justify the decisions.

We listened to the experts even when we knew their conclusion was based on a misconception. Shortly before the release of the Beta version, we commissioned a well-known security firm to do a review of the architecture. The rules they insisted on were that we give them any documentation we had, and they would prepare a report. No interaction with them was allowed during the review process, and their conclusion was final. No response from us would affect their report.

They stated that the security model was flawed because it was based on ``name hiding''. We knew this statement was due to a misconception on their part, but there was no procedure for correcting it. Nevertheless, this review was an important factor in the decision to change the basic access control mechanism.

It was too late to change the architecture for the Beta release, but the decision was made to change the architecture for Release 1.0. I argued for keeping split capabilities in spite of the flawed review. The main opposing argument was that we could hardly go to market with a version that a well-known security company said wasn't secure. I felt that we could if we showed the flaw in their understanding of the system. Needless to say, I lost that argument. Split capabilities were replaced with Simple Public Key Infrastructure (SPKI) attribute certificates.

The change to SPKI certificates had a number of effects. First of all, the architecture was better suited to a B2B environment. It made certain end-to-end guarantees easier to enforce and, equally importantly, easier to explain. SPKI, because of its PKI heritage, was more familiar to our customer base, even though our use of the certificates was quite different from their conventional use. Increasing marketability in B2B was an attractive proposition considering the projections of a market size of $1,600B by 2004 [10].

The downside was that we no longer had a general purpose, low latency base. When the B2B boom failed to materialize, we couldn't easily build other kinds of systems. The very features that made the e-speak product attractive to our B2B customers, particularly the heavy reliance on expensive cryptographic protocols, made it hard to go after other businesses. For example, e-speak could have made a good platform for building a distributed collaboration system, but the SPKI operations made the latencies too large for interactive use.

We sacrificed an important architectural principle. We wanted the core (analogous to an OS kernel) to handle one request in its entirety before starting work the next one. Doing so would avoid any possible race conditions since message handling would be logically atomic. Of course, that meant that our throughput was controlled by the slowest commands. Discovering locally available resources used an SQL-like repository lookup, which was very slow when the repository was large.

We tried to find a way to make the repository a client of the core, but we didn't like the idea that this client would be able to use all system resources. Subverting this client would give an attacker full control over the system. We ultimately decided to add additional threads to the core solely to handle lookup requests. The result was a considerable increase in the core's complexity. We might have done better by taking our chances with an external lookup service.

We invented something new before thinking hard about possible optimizations. The original prototype for e-speak used capability lists (c-lists) for access control. Each client had a list of resources it could name. No attack was possible against a resource not in this list. Each entry in the c-list included the specific permission, such as read or write, it granted. Each separate permission required a separate c-list entry. The problem was that each entry had a considerable amount of metadata, typically 100-1,000 bytes. It was this extra metadata that concerned us.

Since most of the metadata was the same on all the entries for a given resource, we thought we could come up with an optimization. However, we decided to implement split capabilities [4] instead. Split capabilities solved the metadata problem and made configuring a variety of security policies quite straightforward. However, by separating designation from authorization we opened up the system to some attacks. More importantly, adding something completely new made it even harder for people to understand the system.

We didn't control the urge to add new features. Part of the reason e-speak was hard to understand was that we had too many features. They were all useful in the sense that every feature was used in least one deployment, but it took a trained eye to sort through the features to find the relevant ones for a particular service. Paraphrasing Einstein, ``E-speak needed to be as simple as possible but no simpler''. It wasn't. Even if the architecture included all the features, we would have done better had we turned on only a basic set for the earlier releases. Once developers got used to thinking the e-speak way, we could have made the additional features available in steps. The hard part, of course, is deciding which features are really needed.

5  Don'ts

Flaws in an architecture are usually subtle and depend on specific details. After all, if they were obvious, they wouldn't have made it into the release. Understanding these flaws, then, necessarily requires a reasonably deep understanding of the architecture. Space does not permit a sufficiently detailed description of e-speak here. Still, it's important to document what was wrong. See the relevant architecture documents [1, 2] if you want to understand this section.

Don't let the implementation drive the architecture. Our first implementation was dreadfully slow. Some investigation showed that each name resolution required an average of 500 hash table lookups. Some people argued that we should fix the problem by abandoning our naming scheme. Here we held firm. After some tuning, we reduced the name resolution to an average of two hash table lookups.

We weren't so fortunate in two other cases. The initial architecture called for the explicit rights being requested, e.g., read or write, to be associated with the corresponding resource. Somewhere along the line, this feature got optimized so that there was only one place to specify the rights for all the resources named in a request. This change made the system susceptible to a confused deputy attack [3].

The second failing was the handling of incoming messages. The original architecture specification was unclear on whether all messages went to one inbox or there was a separate inbox for each message. The latter is more secure since there's less chance of mixing the authorities from different senders. However, the former had an even worse problem. Since there was no way for the system to know when the client no longer needed an entry, they just kept piling up until the JVM ran out of memory. This choice also meant that the sender could not choose the name the receiver would use for the resource, a definite change in the architecture.

Don't make the easy way the insecure way. We wanted to be able to delegate revocable authorities, so we introduced the idea of key rings. Each user also had a mandatory key ring, which was needed to support negative permissions [4]. Unfortunately, we let users add keys to their mandatory key ring. The result was that users tended to put all their keys on this key ring, making them susceptible to confused deputy attacks.

Even had we not given users access to their mandatory key rings, having such a thing in the system encouraged grouping large collections of authorities. Later in the process we realized that we could clone keys to make revocable authorities, obviating the original reason for key rings.

Don't ignore end-to-end issues. The Beta release was path based; all requests passed through a series of intermediaries unless the end points were introduced explicitly. While the message payloads could be encrypted, the headers were necessarily visible to the relaying machines. The original architecture did not attempt to protect the headers from prying or tampering. By the time the problem was identified, signing the headers to prevent tampering was a problem because the naming system required a change on each hop. A proposal to implement tunneling (e-speak in e-speak) was never implemented because the decision was made to change the architecture.

Don't forget about performance. The Beta release was designed to be sort of like an operating system, so we worried quite a bit about performance. Any interaction between machines that exceeded a few tens of ms was a target for optimization. The e-speak product was built for a different environment, B2B systems. A B2B interaction often takes minutes or even days to complete. That being the case, we didn't worry too much about performance. The unfortunate consequence was that setting up a connection took several seconds of CPU time, and each interaction involved a reasonably expensive cryptographic calculation. Although the latency of these steps wasn't an issue, the CPU load was. As a consequence, our customers needed more machines than they would have liked to support their business partners. Planned optimizations never got into the system.

6  Why E-speak Died

HP abandoned E-speak in 2002 for a number of reasons. Had we been able to build a larger user base in the time we had, e-speak might still be in use. Unfortunately, we made some mistakes that slowed e-speak's spread.

We didn't do a good job explaining the system. When we described all of what e-speak could do, customers told us it was too much to grasp. When we explained only the part relevant to their particular environment, they told us it was just CORBA or DCE or ...(fill in your favorite environment). We never did find the middle ground.

Although e-speak has fewer than 20 components that need to be understood, the system was quite flexible because of the way they could be combined. For example, events are based on e-speak vocabularies and split capabilities. That meant that events could have a variety of personalities depending on how these pieces were used. We kept trying to show users how flexible the system was, which only served to confuse them. We would have been better had we settled on a small number of use patterns to present.

We made people change their mental models too much. We thought we were starting a revolution, but businesses want evolution. Kannan Govindarajan, one of the e-speak architects, has said that we needed to ``evolve users, not revolve them''. Indeed, some of them told us that we made their heads spin.

Once we got past that barrier and started showing people how to use it, we ran into another problem; developers had to change their way of thinking. It isn't difficult for people to think of a print service as a service, but we also wanted them to think of the file being printed as a service. Doing so had a lot of advantages, such as not accidentally printing a confidential document on the printer in the lobby. Once users adopted this way of thinking, they found their problems were easier to solve. We just didn't have a good way to get them across that barrier.

We focused on the technology, not the business problem. We're technologists, and we developed some pretty neat technology. Unfortunately, we turned off some customers who wanted us to help them solve their problems, not demonstrate our cool stuff. The customers we ended up with were the ones who couldn't see any other way to build their business, so they listened to our techno-babble. We might have built a big enough customer base to avoid being shut down had we focused more on the customer's problems.

We didn't give adequate attention to the development environment. It's one thing to have a good solution to people's problems, but it's got to be something they want to use. If the only debugging tool is ``System.println'', you're not going to attract many developers. It's a credit to e-speak's potential that we had as many as we did. We would have been better off devoting a bigger part of our budget to building a good development environment and a smaller part to adding more features.

We didn't devote enough resources to the Open Source effort. We knew that keeping e-speak proprietary to HP would doom it, so we released it under the GNU Public Licenses. Unfortunately, we didn't realize until too late that this was not enough. We needed to devote substantial resources to building a community of developers. Without that push on our part, we never developed a critical mass that could convince potential customers that e-speak wasn't just an HP product.

We worked in the wrong industry. E-speak was software. Even worse, it was middleware, which is software that's supposed to be invisible. HP at that time was largely a hardware company. In fact, we started in the business unit that sold HP-UX servers. Everything about software is different from hardware. It's made differently; it's sold differently; it's procurement cycle is different; it's support structure is different. All these differences made it hard both for HP management to know how to deal with it and for customers to deal with HP in this new way. HP could certainly make a wonderful refrigerator, but it would be hard to break into the market because refrigerators are so far from customer's expectations of HP. In 2000, middleware was farther from HP's core business than are refrigerators today.

7  Conclusions

E-speak is no longer a supported product, but aspects of it still live on. NTT still has a web site describing the services platform it built with e-speak [9]. More significantly, various aspects of e-speak have influenced the development of the E language for secure distributed computing [8]. The e-speak vocabulary system [7] has taken on a life of its own because of the way it solves the problems people try to address with global ontologies. Finally, some groups trying to build scalable systems with the web services standards and finding them inadequate are taking a look at e-speak. Maybe our biggest mistake was being too early.

8  Acknowledgements

Guillermo Rozas, Arindam Banerji, Rajiv Gupta, and I are mainly responsible for any failings in the original architecture. Nigel Edwards and Michael Wray made important contributions to the product version.

9  Availability

E-speak was released under GNU Public Licenses and is available for download from the author's web site.

References

[1]
Hewlett-Packard Company. E-speak Architecture Specification, September 1999. https://www.hpl.hp.com/personal/Alan_Karp/espeak/version2.2/Architecture_2.2.pdf.

[2]
Hewlett-Packard Company. E-speak Architectural Specification, Release A.0, January 2001. https://www.hpl.hp.com/personal/Alan_Karp/espeak/version3.14/Architecture_3.14.pdf.

[3]
Norm Hardy. The confused deputy. Operating Systems Reviews, 22(4), 1988. https://www.cap-lore.com/CapTheory/ConfusedDeputy.html.

[4]
Alan H. Karp, Guillermo Rosas, Arindam Banerji, and Rajiv Gupta. Using split capabilities for access control. IEEE Software, 20(1):42--49, January 2003. https://www.hpl.hp.com/techreports/2001/HPL-2001-164R1.html.

[5]
Jim Kerstetter and Peter Burrows. HP's e-speak: Good products, botched marketing. Businessweek Online, July 3 2000. https://www.businessweek.com/2000/00_27/b3688173.htm.

[6]
Eric Kidd. CustomDNS. https://customdns.sourceforge.net/internals.php.

[7]
Wooyoung Kim and Alan H. Karp. Customizable description and dynamic discover for web services. ACM Conference on Electronic Commerce (ACM EC'04), 2004. https://www.hpl.hp.com/techreports/2004/HPL-2004-45.html.

[8]
Mark Miller. Open source distributed capabilities. https://erights.org.

[9]
NTT. TeaTray. https://www.nttcom.co.jp/teatray/english/base/.

[10]
Rob Rosenthal. The Internet commerce market model: B2B versus B2C around the world. Technical Report 22745, IDC, July 2000.

[11]
Bryce Wilcox-O'Hearn. Names: Decentralized, secure, human-meaningful: Choose two. https://zooko.com/distnames.html, September 2003.

This document was translated from LATEX by HEVEA.

This paper was originally published in the Proceedings of the First Workshop on Real, Large Distributed Systems,
December 5, 2004, San Francisco, CA

Last changed: 23 Nov. 2004 ch
WORLDS '04 Technical Program
WORLDS '04 Home
USENIX home