Towards a Dependable Architecture for
Internet-scale Sensing
Rohan Narayana Murty and Matt Welsh
Division of Engineering and Applied Sciences
Harvard University
{rohan,mdw}@eecs.harvard.edu
Abstract
The convergence of embedded sensors and pervasive high-
performance networking is giving rise to a new class of distributed
applications, which we refer to as Internet-scale sensing (ISS).
ISS systems consist of a large number of geographically dis-
tributed data sources tied into a framework for collecting, filter-
ing, and processing potentially large volumes of real-time data. In
this paper, we discuss the issues involved in building dependable
ISS systems. ISS systems differ from conventional distributed
systems in a number of respects, including the number of data
sources, differing data quality requirements, and the necessity to continue operating despite intermittent link and node failures. Such
failures should result in graceful degradation of the quality of the
results returned by the system, rather than loss of results.
In this paper, we argue that conventional approaches to achiev-
ing consistency do not scale to the requirements of ISS systems.
We outline a lightweight approach to dependability based on a set of metrics that reflect the quality of the answers returned by the system. We argue that answers returned by an ISS system should include a measure of the harvest and freshness of the data sources participating in the result, and that these metrics in turn can be used to drive fault-tolerance mechanisms in the system. We also pro-
pose three simple techniques to achieve scalability and graceful
degradation in the face of failure.
1 Introduction
The convergence of embedded sensors and pervasive high-
performance networking is giving rise to a new class of dis-
tributed applications, which we refer to as Internet-scale
sensing (ISS). An Internet-scale sensing system consists
of a number of geographically distributed data sources
tied into a networked framework for collecting, filtering,
and processing potentially large volumes of real-time data.
Data sources include telescopes, satellites, seismometers, and weather stations; corresponding scientific applications
include whole-sky surveys [10], automated pulsar detec-
tion [8], earthquake detection and characterization [5], and
environmental monitoring of large ecosystems [6]. In the
distributed systems community, ISS systems are being de-
veloped to support network performance monitoring [4] and distributed virus and worm detection [7, 21]. The common
theme across these systems is the acquisition and process-
ing of real-time data, yielding a macroscopic view of many
disparate data sources.
In this paper, we argue that ISS applications have funda-
mentally different data quality and reliability requirements
than conventional distributed systems. In particular, for ISS
systems to scale to vast numbers of data sources and si-
multaneous users, it is simply infeasible to expect the sys-
tem to achieve “reliability” in the conventional sense. Data
sources, network hosts, and network links experience fre-
quent and intermittent failures; yet, the system must con-
tinue to produce answers, possibly based on incomplete in-
put data. Failure (and concomitant quality degradation) is
the norm, rather than the exception, in such systems.
We claim that ISS systems require a new approach to de-
pendability that eschews the use of complex, heavyweight
consistency protocols in favor of leaner mechanisms that scale well with increasing numbers of data sources and recognize the predominance of transient loss and failure. The real-time nature of stream data processing dif-
fers greatly from distributed systems concerned with per-
sistent state. Rather than strive to achieve data consistency,
an ISS platform should provide feedback to end users on
the fidelity and coverage of the results returned by the sys-
tem. Such feedback can be tied to the accuracy of results,
and used to drive fault-tolerance mechanisms within the
system.
In this paper, we outline broad design principles for ISS
systems, and propose three simple techniques to achieve
scalability and graceful degradation in the face of failure.
These include:
• Structured operator replication, which replicates
data-processing operators relative to their importance
in the query;
• Free-running operator state, designed to support op-
erators with a bounded temporal dependence on their
input data; and
• Best-guess reconciliation, which filters or combines
results across multiple, possibly divergent, operator
replicas.
In the remainder of this paper, we outline the requirements for ISS ap-
plications, describe previous approaches to building such
systems, and explain why these approaches fail to meet the
requirements. We then describe a range of data quality met-
rics for ISS systems, contrasting them to more conventional
distributed systems goals. Finally, we outline an approach
to architecting a scalable, robust ISS platform that supports
these metrics, representing a significant departure from the
classic distributed systems focus on data consistency and
availability.
2 Background: Internet-Scale Sensing
A broad range of Internet-scale sensing (ISS) systems are
currently under development in varying scientific domains.
Some important examples of ISS systems currently under
development include:
eScience applications, including the EarthScope [5] and
NEON [6] initiatives. EarthScope plans to deploy
thousands of GPS receiving stations and seismometers
across the North American continent to study plate
tectonics and earthquakes at unprecedented scales.
NEON is deploying networked arrays of mobile and
fixed embedded sensors, cameras, and weather sta-
tions for monitoring large ecosystems. The increased
activity in wireless sensor networks [28, 33] is one
driver of this area.
Network monitoring involving real-time data sources on the Internet. Examples include network telescopes [22], distributed intrusion detection systems [3], and surveillance of blogs, RSS feeds, and chat room activity for law enforcement [1].
Distributed surveillance using networked cameras [15],
microphone arrays (e.g., for localizing gunshots) [27],
and other sensors. For example, a global array of seis-
mometers, microphones, and radionuclide stations is
used to monitor compliance with the Comprehensive
Nuclear Test Ban Treaty [2] and locate sources of nu-
clear explosions.
Existing ISS systems are tied to very specific applica-
tions, although each faces similar challenges in terms of
scalability, robustness, and performance. To date, solutions
have been developed in an ad hoc manner, typically by
users outside of the distributed systems community. For ex-
ample, geodetic data from EarthScope is currently hosted
at a single FTP site in Colorado, from which users must
manually download datasets of interest. Apart from the ob-
vious scalability and reliability concerns, such an approach
requires users to download large volumes of data for local
processing, which is impractical for real-time applications
supporting many simultaneous users.
We believe that the development of a flexible, general-
purpose infrastructure to support ISS applications is an ex-
citing research agenda, and one that will increase in impor-
tance over time as more scientific disciplines tap into the
ability to link multiple data sources over the Internet. Such
an infrastructure could support multiple disparate applica-
tions in different domains, simplifying application design
through a common set of interfaces for data acquisition and
in-network processing. By pushing computation on the raw
data closer to its sources, network bandwidth can be con-
served. In many applications, intermediate results can be
shared across multiple users, offering a multiplicative re-
duction in network load.
Related work
A number of existing systems represent steps towards an
Internet-scale sensing platform. In many cases, such as the
various astronomical virtual observatory efforts [9, 11], the
infrastructure is highly specialized to a given application
domain. Many systems are currently centralized, as is the
case with EarthScope and the nuclear test ban verification
network. Numerous research efforts in wireless sensor net-
works [28, 33, 29, 32] are focused on application-specific
deployments focused on data collection, although the po-
tential to link these into a larger network infrastructure has
been raised [16, 15, 25]. The network monitoring commu-
nity has developed a range of distributed systems that can
be seen as prototypical ISS platforms, including distributed
honeypots [31], network telescopes [22], and intrusion de-
tection systems [3].
Extensive work on streaming database systems, such as Borealis [12], PIER [20], and HiFi [16], seeks to apply the relational query model to real-time streaming data. However, few of these systems have been concerned with scaling up to large numbers of data sources or simultaneous users. For example, Borealis has mainly been evaluated on small configurations of fewer than six machines. While
PIER [20] is focused on scale, its approach to mapping
query operators to hosts using a P2P overlay ignores the
impact of increased latency and network load. Moreover,
as discussed in the next section, these systems do little to
address reliability in environments where failures of hosts
and network links are commonplace.
3 Issues and Challenges for Internet-Scale
Sensing
In this section, we outline a typical ISS system and con-
sider the challenges in designing such a system. We then
discuss to what extent failures affect results returned by the
system, and why previously proposed approaches fail to
address these problems. We argue that an ISS system must
provide feedback on its internal operation that can be used
to inform the end user of the quality of the end results, as
well as drive fault tolerance mechanisms within the system.
Let us first sketch a concrete ISS system that per-
forms large-scale network monitoring. Such an application
would consist of two components: (1) a large number of
data sources producing live network data, and (2) multi-
ple queries on the input data. Possible sources of network
data include router packet traces, honeypots, network tele-
scopes, BGP feeds, and firewall reports. The system could
support queries from a large number of users that mine this
rich, real-time data set to understand the network’s opera-
tion, as well as to detect anomalies, attacks, and virus/worm
propagation. Queries are processed by pulling data from
the data sources in real time, performing various filtering
and aggregation operations, and pushing periodic results
to end users. An example of a network monitoring query
might be “return the top 1000 destination IP addresses for
all packets leaving subnet a.b.c.d once every 5 minutes.”
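To make the example concrete, the following sketch (ours, not part of any deployed system) shows how such a query might be expressed as a simple windowed aggregation over a packet stream; the subnet, window length, and function names are illustrative placeholders.

```python
# Minimal sketch (not from the paper) of the example query: every 5 minutes,
# report the top 1000 destination IPs for packets leaving a monitored subnet.
from collections import Counter
import ipaddress
import time

SUBNET = ipaddress.ip_network("10.0.0.0/24")   # stands in for "a.b.c.d"
WINDOW_SEC = 5 * 60
TOP_K = 1000

counts = Counter()
window_start = time.time()

def emit(top_destinations):
    # In a real ISS deployment this would be pushed up the query tree;
    # here we simply print the periodic result.
    print(f"top {len(top_destinations)} destinations:", top_destinations[:5], "...")

def process_packet(src_ip: str, dst_ip: str) -> None:
    """Count packets whose source lies inside the monitored subnet."""
    global counts, window_start
    if ipaddress.ip_address(src_ip) in SUBNET:
        counts[dst_ip] += 1
    # Emit a result once per window, then reset the operator's state.
    if time.time() - window_start >= WINDOW_SEC:
        emit(counts.most_common(TOP_K))
        counts = Counter()
        window_start = time.time()

# Example: feed a few packets through the operator.
process_packet("10.0.0.5", "93.184.216.34")
process_packet("192.168.1.9", "93.184.216.34")   # outside the subnet, ignored
```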
3.1 ISS system architecture
Given the large number of data sources and potentially
large number of concurrent queries in an ISS system, it
is infeasible for each user to download all data of inter-
est for local processing. To support such massive scale, a
number of systems have investigated the use of an overlay
network of hosts that collect, process, and deliver real-time
data [12, 15, 25]. While the details of these systems differ,
their high-level architectures are very similar: a user con-
structs a query that is typically realized as a tree of oper-
ators that filter and aggregate streaming data. The query pulls data from multiple sources; the data flows up the query tree until results are delivered to the end user.
By leveraging an overlay network and pushing query
processing closer to the data sources, the total network
load consumed by a set of queries can be greatly reduced.
This is especially important for data sources or users that
have low-bandwidth connections to the Internet (e.g., seis-
mic sensors connected via expensive and low-bitrate satel-
lite telemetry). In addition, numerous optimizations can
be performed within the overlay network. For example,
queries with similar data requirements and processing
can be merged, reducing computational load. Likewise, the
placement of operators within the overlay network can be
tuned based on latency, bandwidth, or load [25].
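As an illustration of this architecture, the sketch below models a query as a tree of operators in which leaves stand in for data sources and interior nodes filter and aggregate. The class and function names are our own and do not correspond to the API of any cited system.

```python
# Illustrative sketch of a query realized as a tree of operators: leaves pull
# from data sources, interior nodes filter/aggregate, and the root delivers
# results to the user.
from typing import Callable, Iterable, List

class Operator:
    def __init__(self, fn: Callable[[List], List], children: Iterable["Operator"] = ()):
        self.fn = fn                    # filtering/aggregation applied to child output
        self.children = list(children)

    def pull(self) -> List:
        """Pull tuples from children (or a source at the leaves) and process them."""
        inputs = [t for child in self.children for t in child.pull()]
        return self.fn(inputs)

# Example: two leaf sources feeding a filter, which feeds an aggregating root.
leaf_a = Operator(lambda _: [("a", 3), ("b", 1)])
leaf_b = Operator(lambda _: [("a", 2), ("c", 5)])
filt   = Operator(lambda ts: [t for t in ts if t[1] > 1], [leaf_a, leaf_b])
root   = Operator(lambda ts: [("sum", sum(v for _, v in ts))], [filt])

print(root.pull())   # -> [('sum', 10)]
```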
3.2 ISS challenges
The massive scale, data fidelity requirements, and real-time
nature of ISS systems give rise to a unique set of challenges
that differ somewhat from more traditional client/server or
peer-to-peer distributed systems. We outline these chal-
lenges below.
Scaling to support large numbers of data sources and
users: Harnessing data from vast numbers of real-time
sources is made difficult by intermittent failures of sources
and network links, as well as the difficulty of diagnosing
such failures (e.g., determining whether a source is offline,
or whether a link is faulty). Varying data rates, link laten-
cies, and bandwidth capacities raise concerns of operator
placement and link saturation. Scaling up to support many
simultaneous users and queries raises questions of effective
load balancing and bandwidth sharing across queries.
Failures and their effect on query results: The fail-
ure of an overlay host will affect the ability of an operator
to consume input data and produce results to push up the
query tree, leading to gaps or delays in the result stream.
Failures also affect the internal state of an operator. A state-
ful operator (e.g., one maintaining an average value over
some time window) may lose its state following a failure,
leading to errors in future query results after it recovers.
Likewise, network partitions lead to data loss even if the
overlay hosts themselves are reliable. Moreover, failures in
an ISS system have a cascading effect in that the inputs to
downstream operators in a query are impacted. As an ex-
ample, the failure of a node in the query tree can knock out
an entire subtree of the query, deeply affecting the quality
of results.
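The following sketch (our illustration, with a made-up tree) quantifies this cascading effect: failing a single interior operator removes its entire subtree of data sources from the result, directly reducing the harvest metric discussed in Section 4.

```python
# Sketch of the cascading effect: failing one interior node of a query tree
# removes every data source beneath it from the result.
def leaves_under(tree: dict, node: str) -> set:
    """Return the set of leaf data sources reachable from `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves_under(tree, c) for c in children))

# A small query tree: root aggregates two regional operators over five sources.
tree = {"root": ["west", "east"],
        "west": ["s1", "s2", "s3"],
        "east": ["s4", "s5"]}

all_sources = leaves_under(tree, "root")
lost = leaves_under(tree, "west")             # the failed operator's subtree
harvest = 1 - len(lost) / len(all_sources)
print(f"harvest after failure of 'west': {harvest:.0%}")   # 40%
```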
3.3 Previous approaches
A natural starting point for increasing robustness in an ISS
query is to replicate query operators across multiple phys-
ical hosts, using geographic diversity to avoid outages due
to network link failures. However, replication alone does
not address the problem of state divergence, which can
cause different replicas to report different results. An alter-
nate approach is to make use of state replication schemes, which attempt to keep replicas in a consistent state. This
is the classic approach in managing replicated state in dis-
tributed systems, though we argue it is inappropriate for
ISS systems due to its high overhead, potentially requiring
message exchange on every tuple arriving at an operator.
While there exists a great deal of prior work on repli-
cation techniques and consistency protocols, we advo-
cate the use of lightweight techniques for the purposes
of achieving fault tolerance in ISS systems. The Tandem
system [14, 18] proposed using passive standby involv-
ing primary-backup pairs to insure against system failures.
Stronger versions of this approach, which checkpoint primary-backup state, have since been proposed in the literature [26]. We believe that, for the purposes of an ISS system, this approach suffers from low availability and high overhead.
In an active replication approach, all replicas receive the
same set of inputs (with ordering guarantees) and compute
the same state. This approach does not require replicas to
synchronize state. However, it hinges on the assumption
that all incoming data are delivered in order to all repli-
cas. Such in-order delivery is difficult to guarantee in an ISS system with intermittent node failures, asymmetric link failures (possibly caused by routing anomalies), network partitions, processing delays, and variable network latency. We therefore explore the use of active replication to provide high availability at relatively low overhead: in particular, we relax the requirement of in-order delivery at all replicas, which reduces overhead but can lead to state divergence.
The Borealis [12, 13] system makes use of replication
but leverages eventual consistency semantics: an operator
recovering from a failure continues to report answers that
are marked as “tentative” until its input state is brought up
to date with its replica group. Each upstream operator must
buffer its history of output tuples in order to replay them
against downstream operators that recover from a failure.
The Borealis approach replaces sophisticated consensus protocols with a buffer-and-replay strategy following failures. There are several problems with this approach in
an ISS context. First, the history of output tuples that must
be stored by an operator could be arbitrarily long, and must
be recorded in persistent storage in case of failure. Second,
this approach assumes that failures are infrequent, since each recovery requires an expensive replay of past tuples against the recovering operator. In a system with frequent failures and high churn,
we expect such an approach will not scale well. Third,
Borealis does not attempt to bound the error for so-called
“tentative” tuples, so users have little information about the
quality of these results.
In TRAPP [23], the authors explore the tradeoffs be-
tween precision and availability. Similar in spirit to our ar-
gument, the system permits the user to specify constraints
on the desired correctness of the end answer as well as
the availability of the system. Based on the set of con-
straints, the system attempts to mix and match cached data
as well as fresh data from the sources. This approach has
been studied assuming a fixed number of operator replicas.
Also, it is unclear how well TRAPP would scale to meet the needs of an ISS system, given the large number of data
sources involved. TRAPP requires data sources to provide
upper and lower bounds on the numerical values produced
and these values are cached at replicas in the system. With a large number of data sources, the resulting volume of state that must be stored and processed poses a further challenge to scalability.
Bayou [24] deals with weakly connected replicas that
are kept consistent using epidemic protocols. Similarly,
Astrolabe [30] makes use of gossip protocols to achieve even-
tual consistency among replicas. TACT [19] explores a
continuous consistency model that allows one to study the
effects of protocols trading off availability for consistency.
Similar techniques could potentially be used in an ISS sys-
tem and we address this in the discussion section.
4 New Dependability Approaches for ISS
Internet-scale sensing systems demand new approaches to
dependability that take into account the scale and data-
quality requirements of end-user applications. We claim
that building an ISS system to conform to the traditional
metrics of availability and data consistency is neither feasi-
ble nor desirable. Failures and intermittent outages are the
norm, rather than the exception. Instead, the focus in ISS should be on scaling to increasing numbers of data sources and end users, and on graceful degradation of query results as failures occur. Rather than strive to always report the “correct” answer to a query, the ISS platform should provide the user with feedback on the fidelity of the query result. Every
query result returned by the ISS system should carry with
it a set of quality metrics.
The quality metrics exposed by an ISS platform can be
divided into two categories: data-centric and operational.
Data-centric metrics are those that directly represent the
accuracy, timeliness, or certainty of a given result. In con-
trast, operational metrics represent the internal operation
and behavior of the ISS system in processing the input data,
such as the network latency experienced by tuples flowing
over a query tree.
A wide range of metrics could be supported by an ISS
system. There is a clear tension between exposing many
low-level details of the query’s operation (which may have
little meaning to an end user) and providing high-level
feedback on data quality (that may be too abstract to be use-
ful). Depending on the query semantics, it may be possible
for the system to directly estimate confidence intervals or
an error envelope, although in general this requires exten-
sive knowledge of the data sources and query operators.
Two specific quality metrics that we believe will be use-
ful in a wide range of circumstances are defined below:
Harvest: The fraction of data sources represented in an
answer to a query [17]. Harvest represents the cover-
age of a query result and is negatively affected by fail-
ures or loss within the query tree. Note that harvest is
not directly related to correctness, since a result with
a harvest of 100% can still experience errors due to
state divergence of operators. However, a high value
for harvest indicates good confidence that the answer
represents all of the input data sources.
Freshness: The age of the input data tuples represented in
the answer. Freshness is negatively affected by net-
work latency, system load, and buffer-induced delays
within the query tree. Freshness is bounded below by
the network diameter. In addition to reporting the age of the oldest tuple, the distribution of input tuple ages can provide feedback on the timing variance of operators in the query.
While these metrics are primarily operational, they are
straightforward to measure online and somewhat intuitive.
Using an appropriate model of the data sources and query
semantics, it is possible to translate these values into data-
centric quality metrics such as error bounds.
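As a concrete (and deliberately simplified) illustration, the sketch below computes harvest and freshness from per-result metadata; the field names and the choice of age statistics are our assumptions rather than a prescribed interface.

```python
# Sketch of attaching harvest and freshness metadata to a query result,
# as argued for in Section 4. Field names are illustrative assumptions.
import time

def harvest(reporting_sources: set, all_sources: set) -> float:
    """Fraction of data sources represented in the answer [17]."""
    return len(reporting_sources & all_sources) / len(all_sources)

def freshness(input_timestamps: list, now=None) -> dict:
    """Age statistics of the input tuples contributing to the answer."""
    now = time.time() if now is None else now
    ages = [now - ts for ts in input_timestamps]
    return {"oldest": max(ages), "newest": min(ages),
            "mean": sum(ages) / len(ages)}

# A result covering 7 of 10 sources, with inputs between 2 and 20 seconds old.
now = 1_000_000.0
print(harvest({f"s{i}" for i in range(7)}, {f"s{i}" for i in range(10)}))  # 0.7
print(freshness([now - 2, now - 11, now - 20], now))  # oldest=20, mean=11
```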
4.1 Towards a new ISS system design
Through initial experiments with a large-scale network
monitoring application running on PlanetLab, we have de-
rived a simple set of design principles for architecting a
scalable, robust ISS platform. The techniques described
here are intended to sidestep the complexity of traditional
replication and consistency mechanisms, in recognition of
the differing data needs of ISS applications. We are cur-
rently developing an ISS platform based on these design
principles.
Structured operator replication
The first technique that we employ is structured operator
replication, in which each operator in the query tree can
be replicated either proactively (in anticipation of a fail-
ure) or reactively (in response to a failure). Replication can
increase harvest and freshness considerably, though at the
cost of increased resource consumption (of overlay hosts
and network bandwidth). Increased replication also has the
potential to aggravate the effects of state divergence.
We propose an approach to structured replication that
recognizes the varying impact of failures of different opera-
tors in an ISS system. In particular, the height of an operator in the query tree is directly related to its potential impact on harvest: failures of operators at the leaves of the tree affect only a few sources, while failures higher in the tree can greatly reduce harvest. In our approach, operators higher
in the tree are given preference for replication, striking a
balance between resource requirements and resulting data
quality. Operators at the highest levels of the tree are repli-
cated proactively, while lower-level operators may only be
replicated following a failure.
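One possible realization of this policy, sketched below with illustrative thresholds of our own choosing, assigns more proactive replicas to operators near the root and falls back to a single, reactively replicated copy at the leaves.

```python
# Illustrative policy for structured operator replication: operators near the
# root get more proactive replicas; leaf-level operators are only replicated
# reactively, after a failure is detected. Thresholds are ours.
def replica_count(depth: int, tree_height: int, max_replicas: int = 3) -> int:
    """Proactive replicas for an operator at `depth` (root = 0)."""
    if depth >= tree_height - 1:      # leaf-level operators: reactive only
        return 1
    # Linearly decrease replication with depth, never below a single copy.
    frac = 1 - depth / max(tree_height - 1, 1)
    return max(1, round(1 + (max_replicas - 1) * frac))

height = 4
for d in range(height):
    print(f"depth {d}: {replica_count(d, height)} replica(s)")
# depth 0: 3, depth 1: 2, depth 2: 2, depth 3: 1
```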
Free-running operators
Rather than maintaining strong consistency between oper-
ator replicas, we advocate allowing operators to “free run,”
updating internal state independently as incoming tuples ar-
rive. This approach obviates the need for expensive consis-
tency protocols, although each operator must push its out-
put tuples to all of its replicated downstream operators, in-
creasing bandwidth usage. It also simplifies
operator startup in case of failure: the recovering operator
starts with an empty internal state.
The downside to this approach is that an operator’s state
may diverge from its replica peers due to missing or de-
layed inputs and operator failure. However, we observe
that in ISS queries, typical operators have a finite (and often short) causality window that defines the set of past input tuples that affect their internal state. For example, an opera-
tor performing a windowed average of the last 30 sec of
data has a causality window of only 30 sec. For typical
streaming query operators, the causality window is usually
the size of the operator’s input window, although it may
be a more complex function of the operator semantics. In
general, an operator with a causality window of w sec will
become consistent with its replica peers after running for
w sec, assuming all replicas receive all input tuples and no
additional failures occur during this window. Therefore, the ISS system achieves a form of eventual consistency across operator state without explicit synchronization.
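The sketch below illustrates a free-running operator with a 30-second causality window; because its state depends only on tuples within the window, a replica restarted with empty state converges with its peers after w seconds of input. The class is our illustration, not a prescribed API.

```python
# Sketch of a free-running operator with a bounded causality window: its state
# depends only on the last `w` seconds of input, so a replica restarted with
# empty state converges with its peers after running for w seconds.
from collections import deque

class WindowedAverage:
    def __init__(self, w_sec: float = 30.0):
        self.w = w_sec
        self.buf = deque()            # (timestamp, value) pairs within the window

    def push(self, ts: float, value: float) -> float:
        """Ingest a tuple, expire anything older than w seconds, return the average."""
        self.buf.append((ts, value))
        while self.buf and self.buf[0][0] < ts - self.w:
            self.buf.popleft()
        return sum(v for _, v in self.buf) / len(self.buf)

# A fresh replica produces the same answer as a long-running one once both
# have seen the same most recent 30 seconds of tuples.
op = WindowedAverage(30.0)
for t, v in [(100, 1.0), (110, 2.0), (125, 3.0), (140, 4.0)]:
    print(t, op.push(t, v))
# At t=140, the tuple from t=100 has expired: average of 2.0, 3.0, 4.0 = 3.0
```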
Best-guess reconciliation
A set of replicated operators will experience occasional
failures and restarts. As described above, eventual con-
sistency is achieved across the replicas because of the
bounded causality window. However, when replicas do di-
verge, we face the problem of determining which of the
results from the replica set represents the “best” answer to
push downstream. Our approach is to make use of a rec-
onciliation mechanism that filters the values from multiple
replicas before passing them into the next operator in the
query tree. We are currently investigating a range of recon-
ciliation policies, including:
• Choose the result from the replica with the longest up-
time. The intuition here is that as long as the node’s
uptime exceeds the causality window w, the operator
should be reporting the “correct” result, although in-
termittent losses can still result in errors.
• Choose the result from the replica reporting the high-
est harvest. This mechanism favors replicas culling
input data from a greater number of data sources. A
challenge arises in breaking ties if two replicas have
similar harvest but non-identical input sets (say, due
to network partitions).
• A voting scheme could be used that chooses the result
from the majority of replicas in agreement with each
other. This approach only works if there happens to be
a majority set of replicas reporting identical (or very
similar) results. A failure affecting multiple replicas
can remove this condition.
• Finally, results from multiple replicas can be com-
bined into a single result, if it makes sense to do so
given the semantics of the operator in question. For
example, for certain operators, it may be appropriate
to take the average or median value across all replicas.
Reconciliation pushes the results reported by a repli-
cated set of operators towards convergence. However, as
the number of replicas increases, so does the state diver-
gence across them. A useful operational metric for mea-
suring the effectiveness of reconciliation is group spread,
which we define as the degree to which replicas differ in
their results. A large group spread indicates that the repli-
cas as highly divergent and lowers confidence that the rec-
onciliation is choosing “correct” values. Low group spread
implies that replicas are reporting consistent results.
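The sketch below illustrates two of these reconciliation policies alongside a simple group-spread measure for numeric results; the per-replica report format is our assumption and would depend on the semantics of the operator in question.

```python
# Sketch of best-guess reconciliation over replica outputs and a simple
# group-spread measure for numeric results. The report fields are assumptions.
from statistics import median, pstdev, mean

# Each replica reports (value, harvest, uptime_sec).
replicas = [
    {"value": 10.1, "harvest": 0.9, "uptime": 600},
    {"value": 10.4, "harvest": 0.7, "uptime": 4000},
    {"value": 13.0, "harvest": 0.5, "uptime": 120},
]

def reconcile_highest_harvest(reports):
    """Policy: forward the result from the replica covering the most sources."""
    return max(reports, key=lambda r: r["harvest"])["value"]

def reconcile_median(reports):
    """Policy: combine replicas, appropriate when the operator is an aggregate."""
    return median(r["value"] for r in reports)

def group_spread(reports):
    """Relative dispersion of replica results; high spread lowers confidence."""
    values = [r["value"] for r in reports]
    return pstdev(values) / abs(mean(values))

print(reconcile_highest_harvest(replicas))   # 10.1
print(reconcile_median(replicas))            # 10.4
print(f"group spread: {group_spread(replicas):.2f}")
```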
5 Discussion and Conclusion
Collecting and processing vast amounts of real-time data is
an important direction for distributed systems research. At
the same time, Internet-scale sensing raises new challenges
for dependability. The transient and real-time nature of
stream data processing differs greatly from distributed sys-
tems dealing with persistent state. We argue that ISS sys-
tems should be designed to offer feedback to end users on
the fidelity and coverage of the results returned by the sys-
tem, and make use of simple, lightweight replication tech-
niques. This is in stark contrast to current work on repli-
cated distributed systems that rely on heavyweight proto-
cols to achieve consistency.
This paper explores one end of the spectrum of operator
replication in stream-processing systems. We believe that
given the scale of ISS systems, overall system availability
should be achieved by tolerating graceful degradation in the
fidelity of the answers produced. This focus motivates the need for lightweight consistency mechanisms. We have
proposed the use of active replication, using free-running
operators coupled with a reconciliation mechanism that can
be generalized across various application domains. The
current proposal for replication does not synchronize the internal state of the replicas, relying instead upon the bounded
causality window to achieve eventual consistency. One
possible refinement is to introduce periodic synchroniza-
tion across operators, which is particularly attractive when
causality windows are large.
We are currently developing an ISS platform based
on our previous work on stream-based overlay networks [25]. While many of the underlying techniques for
data collection, processing, and query optimization have
been explored by our work and others [12, 15, 20, 16], to
date no system has been proposed for Internet-scale sensor
networking that is designed to handle the scale and robust-
ness requirements outlined in this paper. This is a fertile
area for future research and requires taking a broad view
of data quality metrics, fault tolerance strategies, and dis-
tributed state management.
References
[1] Chatguard. https://www.chatprotection.com/.
[2] Comprehensive Test Ban Treaty Organization (CTBTO). https://www.ctbto.org/.
[3] Distributed intrusion detection system. https://www.dshield.org/.
[4] Distributed Monitoring Framework (DMF). https://dsd.lbl.gov/DMF/.
[5] EarthScope. https://www.earthscope.org/.
[6] National Ecological Observatory Network. https://www.neoninc.org/.
[7] Netbait. https://netbait.planet-lab.org/.
[8] Rapid Telescope for Optical Response. https://www.raptor.lanl.gov/.
[9] SkyView: The Internet's virtual telescope. https://skyview.gsfc.nasa.gov/.
[10] Sloan Digital Sky Survey. https://www.sdss.org/.
[11] US National Virtual Observatory. https://www.us-vo.org/.
[12] D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherni-
ack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvk-
ina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis
Stream Processing Engine. In Second Biennial Conference on Inno-
vative Data Systems Research (CIDR 2005), Asilomar, CA, January
2005.
[13] M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker.
Fault-tolerance in the borealis distributed stream processing system.
In ACM SIGMOD Conf., Baltimore, MD, June 2005.
[14] J. Bartlett, J. Gray, and B. Horst. Fault tolerance in tandem computer
systems. In A. Avizienis, H. Kopetz, and J.-C. Laprie, editors, The
Evolution of Fault-Tolerant Systems, pages 55–76. Springer-Verlag,
Vienna, Austria, 1987.
[15] J. Campbell, P. B. Gibbons, S. Nath, P. Pillai, S. Seshan, and R. Suk-
thankar. Irisnet: an internet-scale architecture for multimedia sen-
sors. In Proc. the 13th annual ACM international conference on
Multimedia, November 2005.
[16] O. Cooper, A. Edakkunni, M. J. Franklin, W. Hong, S. R. Jeffery,
S. Krishnamurthy, F. Reiss, and E. Wu. Hifi: A unified architecture
for high fan-in systems. In Proc. the 30th International Conference
on Very Large Data Bases, August 2004.
[17] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant sys-
tems. In Proc. the 1999 Workshop on Hot Topics in Operating Sys-
tems, Rio Rico, Arizona, March 1999.
[18] J. Gray. Why do computers stop and what can be done about it?
In Symposium on Reliability in Distributed Software and Database
Systems, pages 3–12, 1986.
[19] H. Yu and A. Vahdat. The Costs and Limits of Availability for Replicated Services. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), October 2001.
[20] R. Huebsch, B. Chun, J. M. Hellerstein, B. T. Loo, P. Maniatis,
T. Roscoe, S. Shenker, I. Stoica, and A. R. Yumerefendi. The ar-
chitecture of pier: an internet-scale query processor. In Proc. the
Second Biennial Conference on Innovative Data Systems Research,
January 2005.
[21] H.-A. Kim and B. Karp. Autograph: Toward Automated, Distributed Worm Signature Detection. In 13th USENIX Security Symposium (Security 2004), August 2004.
[22] D. Moore, C. Shannon, G. M. Voelker, and S. Savage. Network
telescopes: Technical report. Technical Report UCB/CSD-00-1096,
Cooperative Association for Internet Data Analysis, April 2004.
[23] C. Olston and J. Widom. Offering a precision-performance tradeoff
for aggregation queries over replicated data. In The VLDB Journal,
pages 144–155, 2000.
[24] K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and A. J.
Demers. Flexible update propagation for weakly consistent repli-
cation. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP-16), Saint Malo, France, 1997.
[25] P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh,
and M. Seltzer. Network-aware operator placement for stream-
processing systems. In Proc. the 22nd International Conference on
Data Engineering (ICDE’06), August 2006.
[26] A. Ray. Oracle data guard: Ensuring disaster recovery for the enter-
prise. Technical report, An Oracle white paper, March 2002.
[27] G. Simon et al. Sensor network-based countersniper system. In
Proc. ACM SenSys ’04, November 2004.
[28] R. Szewczyk, A. Mainwaring, J. Polastre, and D. Culler. An analy-
sis of a large scale habitat monitoring application. In Proc. Second
ACM Conference on Embedded Networked Sensor Systems (Sen-
Sys), 2004.
[29] G. Tolle, J. Polastre, R. Szewczyk, D. Culler, N. Turner, K. Tu,
S. Burgess, T. Dawson, P. Buonadonna, D. Gay, and W. Hong. A
macroscope in the redwoods. In Proc. the Third ACM Conference
on Embedded Networked Sensor Systems (SenSys 2005), November
2005.
[30] R. van Renesse, K. Birman, and W. Vogels. Astrolabe: A robust
and scalable technology for distributed system monitoring, manage-
ment, and data mining. In ACM Transactions on Computer Systems,
volume 21, pages 164–206, May 2003.
[31] M. Vrable, J. Ma, J. Chen, D. Moore, E. Vandekieft, A. C. Snoeren,
G. M. Voelker, and S. Savage. Scalability, fidelity, and containment
in the Potemkin virtual honeyfarm. In SOSP ’05: Proceedings of the
twentieth ACM symposium on Operating systems principles, pages
148–162, New York, NY, USA, 2005. ACM Press.
[32] H. Wang, D. Estrin, and L. Girod. Preprocessing in a tiered sensor
network for habitat monitoring, 2002.
[33] G. Werner-Allen, K. Lorincz, M. Ruiz, O. Marcillo, J. Johnson,
J. Lees, and M. Welsh. Deploying a wireless sensor network on an
active volcano. IEEE Internet Computing, Special Issue on Data-
Driven Applications in Sensor Networks, March/April 2006.