| ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
MobiSys '03 Paper   
[MobiSys '03 Tech Program Index]
Energy-Conserving Data
Placement and Asynchronous Multicast in Wireless Sensor Networks
Sagnik
Bhattacharya, Hyung Kim, Shashi
Prabh, Tarek Abdelzaher
Department of Computer Science
University of Virginia
Charlottesville,
VA 22904
Abstract In
recent years, large distributed sensor networks have emerged as a new
fast-growing application domain for wireless computing. In this paper, we
present a distributed application-layer service for data placement and asynchronous
multicast whose purpose is power conservation. Since the dominant traffic in a
sensor network is that of data retrieval, (i) caching mutable data at locations
that minimize the sum of request and update traffic, and (ii) asynchronously
multicasting updates from sensors to observers can significantly reduce the
total number of packet transmissions in the network. Our simulation results
show that our service subsequently reduces network energy consumption while
maintaining the desired data consistency semantics. Sensor
networks are ad hoc wireless networks made of large numbers of small, cheap
devices with limited sensing, computation, actuation, and wireless
communication capabilities. Such a network, for example, can be dropped from the
sky on a disaster area to form collaborative teams of programmable nodes that
help with rescue operations. Sensor networks are made possible by advances in
processor, memory and radio communication technology, which enable low-cost
mass-production of sensor-equipped wireless computing nodes. The sensor network paradigm is motivated by applications such as guiding
rescue efforts in disaster areas, monitoring poorly accessible or dangerous
environments, collecting military intelligence, tracking wild-life, or
protecting equipment and personnel in unfriendly terrains. In such environments, it is usually
impractical to build fixed infrastructures of powerful and expensive nodes.
Instead, the sensor networks philosophy advocates the use of myriads of inexpensive
nodes strewn arbitrarily in the environment and left largely unattended. The primary function of sensor networks is the collection and delivery of
sensory data. Power is identified as one of the most expensive resources. Due
to the difficulty of battery recharging of thousands of devices in the remote
or hostile environment, maximizing battery lifetime by conserving power is a
matter of great importance. In this paper, we develop a distributed framework that improves power
conservation by application-layer sensor data caching and asynchronous update
multicast. The goal of the framework is to reduce the total power expended on
the primary network function; namely, data collection and delivery. The importance of optimizing communication cost is also supported by
measured data from recent prototypes of sensor network devices, which show that
the main power sink in the network is, indeed, wireless communication. For
example, the Berkeley motes [15] consume 1 mJ for transmitting and 0.5 mJ for receiving a single bit, while the CPU
can execute 208 cycles (roughly 100 instructions) with 0.8 mJ. Assuming full load, CPU power consumption is about 10mW, compared to
50mW for the radio. The high power cost of communication makes it a prime
candidate for optimization. The remainder of this paper is organized as follows. Section
2 presents the service model and the formulation of the power minimization
problem. Section 3 presents the details of the data
placement middleware and its API. Section 4 presents an
evaluation using experimental as well as simulation results. Section
5 reviews the related work. The paper concludes with Section
6. 2.
Service Model Consider
a dense ad hoc wireless sensor network with multiple observers, spread
over a large monitored area. At any given time, the observers’ attention is
directed to a relatively limited number of key locales in the network, where
important events or activities are taking place. We call them focus locales.
For example, in a disaster area scenario, rescue team members may be interested
in monitoring survivors. The locations of found survivors therefore represent
the focus locales of this application. The total number of sensor nodes is
assumed to be much larger than the number of focus locales at any given time. Sensor nodes at each focus locale elect a local representative for
communication with the rest of the world. Distributed leader election
algorithms may be borrowed for this purpose from previous literature and are
not the goal of this paper. Our service adopts a publish-subscribe model, as
shown in Figure 1. In this model, each representative
publishes sensory data about its focus locale to observers who subscribe to a
corresponding multicast group to receive such data. The size of the published
update stream originating at a given locale is time-varying, depending on the
volatility of the environment and the type of sensors involved. An environment,
which changes frequently, will generate more update traffic than a quiescent
environment. Similarly, sound sensors (microphones) will generate more traffic
than temperature sensors. Contrary to previous multicast frameworks for sensor networks, update
traffic is multicast from focus locales to receivers in an asynchronous
manner. Data caches are created at the nodes of the multicast tree. A lazy
algorithm is used for propagating data updates among neighboring caches along
the tree in the direction of the receivers. These receivers may be wireless
hand-held devices or laptops, for example, in the possession of rescue team
members operating in a disaster area. We assume that receivers do not move, or
move slowly compared to communication delays in the network.
While in this paper we do not consider streaming multimedia, an argument
in favor of addressing such traffic in sensor networks is that more powerful
devices may become available in the foreseeable future at an affordable price.
We argue, however, that advances in sensor network technology are most likely
occur in two directions: developing more powerful devices of the same
form factor, and developing smaller devices of the same processing and
communication capacity. Research reported in this paper is more relevant to the
latter direction. In our model, observers who join the asynchronous multicast tree specify
a period at which the requested data should be reported. Flurries of changes in
the environment need not be individually reported if they occur at time-scales
smaller than this period. Different observers may specify different period
requirements for the same measurement. For example, an observer who is close to
the measured activity may request a higher reporting rate than a distant
observer. Our middleware achieves four main functions; (i) it determines the number
of data caches for each focus locale, (ii) it chooses the best location for
each cache such that communication energy is minimized, (iii) it maintains each
cache consistent with its data source at the corresponding focus locale, and
(iv) it feeds data to observers from the most suitable cache instead of the
original sources. A key difference between this problem and the problem of caching in an
Internet context is that in the latter case, the topology of the network
restricts the choice of cache locations. In contrast, we assume a sensor
network that is dense enough such that a data cache can be placed at any
arbitrary physical location in the monitored region, offering new degrees of
freedom to the data placement algorithm. Another key difference is that the
number of Internet proxy caches is typically much smaller than the number of
different web sites. Hence, such caches are centralized powerful machines,
which gather and retain content from a large number of distributed sources. In
contrast, in this paper, we consider a middleware caching service, which runs
on every sensor node. Since the number of sensor nodes is larger than
the number of focus locales, the storage requirements of this service on any
single node are very small. We assume that sensor nodes know their location. Algorithms for estimating
geographic or logical coordinates have been explored at length in the sensor
network research [5][6]. These efforts address
the problem of location awareness using algorithms that do not require high
cost devices such as GPS on every node. Classical ad hoc wireless routing
protocols like AODV[8], and DSDV [9]
may be used along each unicast edge of our data dissemination tree. These
protocols, however, are not location-aware which may affect performance.
Several more recent adaptations such as Location-aware routing (LAR) [7] and geographical forwarding [4] make use of the location
information. These routing algorithms would be a natural choice for the network
layer underneath our service. We now formulate our data placement problem
mathematically. 2.1.
Problem formulation Consider
a sensor network that is monitoring a set of focus locales at which events of
interest occur. Given a locale (X,Y) in a sensor network, let BS= {BS1, BS2,….., BSM} be a set of M
observers that request data from that locale with rates Rreq={R1,
R2,…., RM}. Let sensor updates at (X,Y) occur at an average
rate Rupdate. A tree of copies is created for the sensor
as shown in Figure 2.
We define the cost of message transfer between two nodes in the tree as
the power expended on a packet’s transfer on the shortest route multiplied by
the packet rate. Consider the case of placing a single data copy to minimize
cost as defined above. Let the data copy be placed at a distance ni
hops from the ith observer and at a distance nsens
hops from the sensor node serving the data. In a densely populated network, the
hop counts will be large. The cost of sending a single packet is proportional
to the hop count. Hence, the net cost
of serving all observers is: T = nsens . Rupdate
+ å1£i£M ni . Ri (1) To place the copy at the optimal
location, T has to be minimized. Figure 3 shows the
situation with three observers. We can reduce this problem to the following
geometric optimization. Given N points, where point i is at
location (Xi,Yi), find a point (x,y) such
that D = å1£i£N (di . wi)
is minimum, where, di is the distance of the ith
point from (x,y), and wi is the weight of the edge
from the ith point to (x,y). This is illustrated in Figure 4. A heuristic solution to this problem is to place (x,y)
at the center of gravity of the N input points in question, i.e.: x = å1£i£N xiwi /å1£i£N wi (2) y = å1£i£N yiwi /å1£i£N wi (3) Hence, in a
minimum-cost tree with multiple copies (i.e., multiple internal vertices), each
copy (x,y) should be at the center of gravity of those vertices to which
it is connected. The objective of our algorithm is to find such a tree. In the following,
we compare our formulation to other popular variants of content placement
problems described in prior literature. If the number of copies in the tree is
known in advance, a popular variation of the problem is expressed as a Minimum K-median
problem, stated as follows. Given n points (possible copy locations), select K
of them to host data copies, and feed each observer from a copy such that total
communication cost D is minimized, where: D = å1£j£K å1£i£N cij . yij (4) cij is the cost of the edge
from i to j and yij is 1 if the jth
copy serves the ith observer, and 0 otherwise. Many Internet-based content placement algorithms adopt this model.
In this case, the possible locations of the caches are fixed. Hence, cij
is fixed for the given
network topology. The problem is NP-hard, but heuristic solutions are
possible, e.g., [10] and [11]. If the cache locations are
specified, a minimum spanning tree can be constructed to disseminate
information from senders to receivers at the
lowest cost.
Our model differs
in that copy locations are not known a priori. In a dense sensor
network, the number of nodes n approaches infinity. Copies can essentially be
placed anywhere in the Euclidean plane without restrictions. In this case, the problem
is that of constructing a minimum-cost weighted Steiner tree, which connects
the sensor node to the observers. The Steiner tree
formulation differs from the K-median and spanning tree problems in that it
allows one to create new nodes in the tree as opposed to having to choose from
a pre-specified set of possible node locations. This difference separates our
paper from similar work in web caching and content distribution literature. Note that Rupdate in
our algorithm is not a fixed sampling rate, but rather refers to the average
rate of change of the environment. Hence, it may vary dynamically with
environmental conditions. For example, it may decrease when the environment is
quiet. An advantage of such dynamic adaptation is that no energy is wasted when
no updates occur. A disadvantage is that an application is unable to tell when
it has missed an update (e.g., due to message loss), since it does not expect
updates to arrive at particular time intervals. This problem can be solved in
several ways. First, we may let Rupdate be a fixed sampling rate. The formulation of our algorithm remains
the same. In this case, if a sample does not arrive in time, the application
can tell. Alternatively, the origin sensor may number the updates. If a gap
occurs in the received update numbers, the application is aware that a previous
update was lost. The occasional loss may be acceptable since we assume that
only the latest update is relevant at any given time. A potential problem with
the latter approach is that in the absence of subsequent environmental changes,
an important update may be lost, unbeknown to the application, indefinitely.
One solution is to enforce an upper bound, B, on the update period.
Hence, when the environment is quiet a message is expected at least once every B
seconds. Otherwise, the application is
aware of a problem. In the rest of the paper, we shall not address the issue
update loss any further. 3.
Data Placement Upon
perturbation, distributed physical systems such as weights interconnected by
strings settle into an equilibrium position, which represents a minimum energy
state. Our data placement algorithm is inspired by such systems. Assuming
environmental conditions don’t change, each step of the algorithm reduces a
measure of total energy until a minimum energy tree is found. More
specifically, we use a distributed greedy heuristic that iteratively places
each node at the center of gravity of its neighbors. Note that, while in a
physical system, energy has a direct meaning, in our system energy is an
abstract mathematical quantity. We call the
depth of the copy in the distribution tree rooted at the origin sensor, the copy
level. The original data at the sensor is referred to as the level-0
copy. A heuristic is used to add or remove copies in the tree. The algorithm is described in more detail next. 3.1 The Algorithm Each
node on the multicast tree rooted at the sensor maintains a location pointer to
its parent as well as a location pointer to each of its children. One can think
of these pointers as an application-layer routing table. For each child, the
node maintains the maximum propagation rate, which is the maximum of all
requested update rates of all observers served by that child. A node never
forwards updates to a child at a rate higher than the child’s maximum
propagation rate. This way, flurries of environmental updates that exceed some
receivers’ requested rates are not propagated unnecessarily to those receivers.
3.1.1
Joining the Multicast Tree An
observer, k, joins a multicast tree by sending a join() message to the
location of the origin sensor, i.e., to the level-0 copy. The message
indicates the location of the observer and its desired update rate Rk.
The origin sensor forwards the message along the multicast tree in the
direction of the new observer as follows. Each level-i copy (starting
with the origin sensor), upon receipt of the join message, determines if the
new observer is closer to any of its children than to itself. If so, it
forwards the join message to the corresponding child, i.e., to a level-(i+1)
copy. If the maximum propagation rate for that child is lower than Rk
it is changed to Rk. This recursive forwarding terminates
when a node is found with no children that are closer to the observer. We call
this copy the nearest neighbor. The nearest neighbor adds the observer
to the set of its children. The maximum propagation rate for the observer is
initialized to its requested update rate. Figure 5
illustrates the message exchange in the join process. 3.1.2
Copy Creation and Migration For
the purposes of creation of new cache copies, nodes are differentiated into
fixed and migratory. The origin
sensor and observers are fixed nodes. Other nodes are migratory
nodes that can move to better
locations of fork off new copies. When a newly joined observer is connected to its nearest neighbor N,
node N computes the center of gravity of itself and all its neighbors.
It then computes the savings, if any, resulting from creating a new copy at
that center of gravity. If the savings from creating the copy
exceed a threshold, the option of creating this copy is deemed viable.
Before we proceed further, let us look more closely at how the copy may be
created.
Fig. 5: Joining the
Multicast Tree If N (the nearest neighbor) is the origin sensor, the new copy can
only be created downstream from it. The copy would be fed from N
and in turn feed N’s children as shown in Figure 6-a.
Otherwise, if N is not the origin sensor, the new copy can in principle
be created either downstream or upstream from N. An upstream copy
would be fed from N’s parent and would feed both N and N’s
children as shown in Figure 6-b. A downstream copy would
be created as described above (Figure 6-a). Observe that,
if N is not a fixed copy, a third option is also possible. Namely, it is
possible to simply move N to a new position. This is called copy migration. In copy
migration, when a newly joined observer is connected to a
migratory nearest neighbor N, the node computes the center of gravity of all its neighbors
(including the new observer), and evaluates the savings that would arise if it
moves to the computed position. If the difference is larger than a fixed
threshold the option of migration is deemed viable. This is illustrated
in Figure 6-c. A viable option with the
maximum savings among three data
placement options described above is executed. It is
easy to show that no new copies are created unless there are three nodes in the
system, and that at most one copy is created for every newly joined member.
Hence, the algorithm creates at most m-2 copies where m is the
total number of observers. 3.1.3
Leaving the Multicast Tree An
observer, k, leaves the multicast tree by sending a leave() message to its
parent N. The parent stops forwarding messages to the departed observer.
If k had the highest maximum forwarding rate among N’s children, N
resets its own maximum forwarding rate to that of the next-highest rate child.
If N is a migratory node, it computes the center of gravity of all
remaining neighbors, computes the savings that result from moving to that
center, and moves there if the savings exceed a threshold. If there is only one
child left for the migratory node, the node is deleted and its parent takes
over its child.
(a) Nearest neighbor creates downstream copy
(b) Nearest neighbor creates upstream copy
Fig. 6: Copy Creation and
Migration Rules 3.2 Sampling Rupdate To
perform center of gravity computations, nodes must know not only the requested
observer rates, but also the environmental sensor update rate, Rupdate.
There are two simple approaches towards the measurement of that rate. One
approach is to measure the number of updates over the last n seconds. A
disadvantage of this approach is that it has a fixed time horizon after which
it forgets the past. It may be more advantages to adapt the horizon to the
current rate of updates itself, such that system agility is increased when
activity is high. An approach for calculating Rupdate, which
has the aforementioned adaptive property, is to take the inverse of the average
of the last k inter-arrival times. More complicated methods like
predictive modeling could have been used but this would be limited by the
computation and storage resource constraints of the nodes. For our simulation
purposes we have used a simple model where Rupdate is
calculated as the inverse of the average of the last five inter-arrival times.
The number five is selected as the sample size to reflect that we expect any
five consecutive updates to be strongly correlated, though a larger or smaller
number could be chosen according to how volatile we expect the environment to
be. 4.
Evaluation Our current service implementation utilized Berkeley motes [15] as the underlying distributed platform. These are tiny computing devices, which run a microthreaded operating system called TinyOS [16]. Each node has up to three sensors. It runs on an 8-bit 4 MHz micro-controller and has 128KB of program memory and 4KB of data memory. However, the number of motes available to us at present is insufficient for large-scale experiments. Hence, in the first set of experiments, we use the motes only to derive communication and power consumption characteristics that are then fed to a simulator. Accordingly, we also implemented our data placement middleware in the ns-2 simulator [20]. Our goal in simulating the data placement algorithm is to test whether it actually conserves power as expected given the power and communication characteristics of the motes. Our simulation model is a network of (200£ N £ 500) nodes in a 200m x 200m grid each having a radio range of 20m. The packet sizes are kept as 30 bytes, as used by the Berkeley motes running TinyOS. The number of base-stations is roughly 5% the number of nodes in the network. Since we have to model a location-aware network, we assume that each node knows it own location. We implemented a wrapper, which works on top of the simulated routing protocol in ns-2, and translates the destination location of packets into actual destination node addresses. |