|
||||||||||||||
|
![]()
![]()
![]()
Lei Guo
,
Songqing Chen
,
Zhen Xiao
, Enhua
Tan
, Xiaoning Ding
, and Xiaodong Zhang![]()
|
|
|
|
|
|
|
|
|
|
|
|
|
{lguo, etan, dingxn, zhang}@cs.wm.edu |
sqchen@cs.gmu.edu |
xiao@research.att.com |
Existing studies on BitTorrent systems are single-torrent based, while more than 85% of all peers participate in multiple torrents according to our trace analysis. In addition, these studies are not sufficiently insightful and accurate even for single-torrent models, due to some unrealistic assumptions. Our analysis of representative BitTorrent traffic provides several new findings regarding the limitations of BitTorrent systems: (1) Due to the exponentially decreasing peer arrival rate in reality, service availability in such systems becomes poor quickly, after which it is difficult for the file to be located and downloaded. (2) Client performance in the BitTorrent-like systems is unstable, and fluctuates widely with the peer population. (3) Existing systems could provide unfair services to peers, where peers with high downloading speed tend to download more and upload less. In this paper, we study these limitations on torrent evolution in realistic environments. Motivated by the analysis and modeling results, we further build a graph based multi-torrent model to study inter-torrent collaboration. Our model quantitatively provides strong motivation for inter-torrent collaboration instead of directly stimulating seeds to stay longer. We also discuss a system design to show the feasibility of multi-torrent collaboration.
BitTorrent [8] is a new generation of Peer-to-Peer (P2P) system that has become very popular recently. According to a recent CNN report, BitTorrent traffic represents 53% of all P2P traffic on the Internet in June 2004 [16]. Unlike traditional P2P systems such as Gnutella [1], KaZaa [2], and eDonkey/eMule/Overnet [3], in which peers sharing different files are organized together and exchange their desired files with each other, BitTorrent organizes peers sharing the same file into a P2P network and focuses on fast and efficient replication to distribute the file. In BitTorrent, a file is divided into small chunks, and a peer can download multiple chunks of the file in parallel. Peers with different file chunks are stimulated to exchange with each other through a ``tit-for-tat'' incentive mechanism, which enables peers with high uploading bandwidth to have corresponding high downloading bandwidth. In this way, BitTorrent prevents free riding effectively, which is very common in early P2P systems [5]. P2P systems for exchanging different files such as KaZaa and eMule use participation levels or credit/reputation systems to track the contribution of each peer, and encourage peers to contribute by giving higher service priority to those peers with more contribution. However, such systems are either too complex and unrealistic or very easy to be circumvented [4,6]. Compared to these systems, the direct ``tit-for-tat'' mechanism of BitTorrent is very simple and effective. In practice, BitTorrent-like systems scale fairly well during flash crowd period and are now widely used for various purposes, such as for distributing large software packages [7,14].
Research has been conducted to study the effectiveness of BitTorrent-like systems [7,14,17,18,23]. The most recent work shows the stability of BitTorrent-like systems through a fluid model, and verifies the effectiveness of the current incentive mechanism [18]. However, this fluid model assumes a Poisson arrival model for the requests, which has been shown to be unrealistic during a long period (eight months) of trace study [17]. Consequently, the model can only characterize the performance of the BitTorrent system under stable conditions. In reality, as shown by our trace analysis, the stable period is very short. In addition, all existing studies on BitTorrent-like systems focus on the behaviors of single-torrent systems only, while our traces show that most peers ( > 85%) participate in multiple torrents.
In this work, we present an extensive study of BitTorrent-like P2P systems through measurements, trace analysis, and modeling. We first study the evolution of BitTorrent systems based on realistic assumptions analyzed from traces. We find that although the existing system is effective to address the ``flash crowd'' problem upon the debut of a new file, it has the following limitations:
Motivated by the results for the single-torrent system, we further study the multi-torrent system through trace analysis and modeling. Although it was generally understood that collaboration among multiple torrents might overcome some of the limitations of the single-torrent system, to our best knowledge, our work is the first to quantitatively and comprehensively analyze the multi-torrent system. In detail, we (1) characterize the peer request pattern in multiple torrents; (2) study the service potentials a torrent can provide to and get from other torrents; (3) demonstrate the benefit of inter-torrent collaboration. In addition, we discuss a new architecture to facilitate inter-torrent collaboration and show the feasibility and compatibility to the current BitTorrent systems.
Our contributions in this work are:
The remainder of the paper is organized as follows. Section 2 presents related work. We demonstrate the limitations of existing BitTorrent-like systems through measurements, trace analysis, and modeling in Section 3, and present our multi-torrent model in Section 4. Section 5 proposes and discusses an architecture for inter-torrent collaboration. We make concluding remarks in Section 6.
The amount of P2P traffic and the population of P2P users on the Internet keeps increasing. A lot of studies have been performed on the measurements, modeling, and algorithms of different P2P systems.
Measurement studies [19,20] characterize the P2P traffic over the Internet, including Napster, Gnutella, and KaZaa systems. Study [12] analyzes the popularity of P2P content over the Internet and characterizes the ``download at most once" property of P2P clients. Extensive measurements and traffic analysis on BitTorrent systems have also been conducted recently. Study [14] analyzes a five-month workload of a single BitTorrent system for software distribution that involved thousands of peers, and assesses the performance of BitTorrent at the flash crowd period. In [7], authors analyze the BitTorrent traffic of thousands of torrents over a two-month period, with respect to file characteristics and client access characteristics. Work [17] presents the current infrastructure of BitTorrent file sharing systems, including the Web servers/mirrors for directory service, meta-data distribution, and P2P content sharing. The authors also find that the arrival, abort, and departure processes of downloaders do not follow a Poisson distribution in the eight-month trace they collected, which was assumed in the previous modeling study [18].
A queuing model for P2P file sharing systems is proposed in [11]. Study [23] analyzes the service capacity of BitTorrent-like systems, and finds that multi-part downloading helps P2P systems to improve performance during flash crowd period. Study [18] further characterizes the overall performance of BitTorrent-like systems using a simple fluid model, and analyzes the effectiveness of BitTorrent incentive mechanism using game theory. Study [15] introduces a probabilistic model of coupon replication systems, and analyzes the performance under an environment where neither altruistic user behaviors nor load balancing strategies (such as rarest first in BitTorrent) are supported.
Study [22] proposes an interest-based content location approach for P2P systems. By self-organizing into small groups, peers with the same interest can collaborate more efficiently, which is similar to the BitTorrent networks, where all peers share the same file. In [21], a P2P protocol is proposed for bulk data transfer, which aims to improve client performance and to reduce server load, by using enhanced algorithms over BitTorrent systems.
Different from all studies above, our modeling and trace analysis provide an understanding of torrent evolution in the BitTorrent systems and the relation among multiple torrents over the Internet. Furthermore, our results reveal three limitations in current BitTorrent systems, and propose an innovative architecture to facilitate inter-torrent collaboration, which represents the first step towards making the current BitTorrent-like system a reliable and efficient content delivery vehicle.
In a BitTorrent system, the content provider creates a meta file (with the .torrent suffix name) for the torrent file it wants to share, and publishes the meta file on a Web site. Then the content provider starts a BitTorrent client with a full copy of the torrent file as the original seed. For each torrent file, there is a tracker site, whose URL is encoded in the meta file, to help peers find each other to exchange the file chunks. A user starts a BitTorrent client as a downloader at the beginning to download file chunks from other peers or seeds in parallel. A peer that has downloaded the file completely also becomes a seed that could in turn provide downloading service to other peers. All peers in the system, including both downloaders and seeds, self-organize into a P2P network, known as a torrent. The initial seed can leave the torrent when there are other seeds available, and content availability and system performance in the future depend on the arrival and departure of downloaders and other seeds.
Previous research has studied BitTorrent-like systems through trace analysis and modeling, and verified its effectiveness during flash crowds, which normally happen soon upon the debut of a new file [18]. However, no existing work has characterized overall client performance in the lifetime of a torrent when the file popularity changes. This is particularly important for BitTorrent-like systems where service availability relies purely on the voluntary participation of peers. This is in contrast to a client-server model where a permanent site (i.e., a server) can provide persistent service.
In this section, we study torrent evolution, downloading service availability, client performance fluctuation, and service fairness in BitTorrent-like systems based on torrent popularity characterization. We propose an evolution model for BitTorrent-like systems and analyze the torrent lifespan, ratio of failed peers, and the service policy of seeds, based on both the modeling and trace analysis.
In this study, we analyze and model BitTorrent
traffic based on two kinds of traces. The first one contains the statistics
collected from two popular dedicated tracker sites (although each torrent can
have its own tracker site, there are many dedicated tracker sites on the
Internet providing persistent service, each of which may host thousands of
torrents), sampled every half an hour for 48 days from 2003-10-23 to 2003-12-10.
This trace was collected by
In order to better understand BitTorrent traffic over the Internet, we also collected the BitTorrent meta file downloading trace from a large commercial server farm hosted by a major ISP and a large group of home users connected to the Internet via a well-known cable company, using the Gigascope appliance [10], from 2004-09-28 to 2004-10-07. The server farm trace includes about 50 tracker sites hosting hundreds of torrents, and the cable network trace includes about 3,000 BitTorrent users (by IP addresses) requesting thousands of torrents on the Internet. Both traces include the first IP packets of all HTTP downloading of the .torrent files, with the timestamp when the packet is captured (the downloading time of the .torrent file). This timestamp represents the peer arrival time to the torrent. We also extract the timestamp encoded in each .torrent file, which is the creation time of the meta file and represents the torrent birth time.
|
|
|
|
|
(a) Tracker trace |
(b) Server farm trace |
(c) Cable network trace |
|
Figure 1: The complementary CDF distribution of peer arrival time (time of a peer's first request to a torrent or time when a meta file was downloaded) after torrent birth for three BitTorrent traces (y-axis is in log scale) |
||
Figure 1(a) shows the complementary CDF (CCDF) distribution of the time after torrent birth for the requests to all fully-traced torrents in the tracker trace. For peers downloading the file in multiple sessions, only the first requests are considered. The y-axis at time t denotes the total number of requests for all torrents in the trace minus the cumulative number of requests for all torrents after time t since they are born. Figures 1(b) and 1(c) show the CCDF distribution of the time when a .torrent file was downloaded after torrent birth in the server farm and in the cable network, respectively. Note that y-axis is in log scale in the three figures.
All three curves can be fitted with straight lines. This
consistent trend strongly suggests that after a torrent is born, the number of
peer arrivals to the torrent decreases exponentially with time in general. To
validate that this conjecture holds for individual torrents as well, we use the
least square method to fit the logarithm of the complementary of the number of
peer arrivals for each torrent in the tracker trace. We define the relative
deviation of the fitting for the number of requests at a time instant as
, where
is the actual complementary value of the
number of requests and
is the fitting result. Figure 2 shows the distribution of average fitting
deviation for each fully-traced torrent that has at least 20 peers during its
lifetime. In this figure, each point in the x-axis denotes a torrent,
sorted in non-ascending order of torrent population during the entire lifetime,
and the corresponding value in y-axis denotes the average of relative fitting
deviation of this torrent. We can see the fitting is more accurate for torrents
with larger population, and the overall average relative deviation is only
about 6%. We do not fit the curve for individual torrents in the server farm
and cable network trace, because the data collection duration is short so that
they do not cover the whole lifespans of torrents. In
the remainder part of this paper, we only use the tracker trace for modeling
and analysis.
We define the torrent popularity at a time instant
as the peer arrival rate of the torrent at that time, which is the derivative
of the peer arrival time distribution of that torrent. Since the derivative of
an exponential function is also an exponential function, we assume that the
peer arrival rate of a torrent follows an exponential decreasing rule with time
![]()
where
is the initial arrival rate when the torrent
starts, and
is the attenuation parameter of the torrent
evolution. In Section 3.3, we will use a fluid
model to evaluate our assumption again.
We define the torrent lifespan as the duration from the birth of the torrent to the time after which there is no complete copy of the file in the system, and the new arriving peers cannot complete downloading. To simplify the modeling, we assume that the initial seed exits the system as soon as a downloader has downloaded the file completely. In practice, the initial seed may stay online in the system for a longer time, and some seeds may return to the system to serve the content.
The inter-arrival time between two successive
arriving peers
can be approximated as
. Denote the rate at which seeds leave
the system as
, then the average
service time of a seed can be approximated as
. As shown in Figure 3, peer
and peer
+1 are the
-th and (
+1)-th arriving peers in the torrent, at the time
and
, respectively. The inter-arrival
time between peer
and peer
+1 can be estimated as
. Peer
downloads the file with speed
and then stays in the torrent for a time duration
. Peer
+1 downloads at speed
. According to the exponential
decrease of peer arrival rate, the inter-arrival time
of peers will grow exponentially, and finally there will be only one seed at a
time. When the peer arrival rate
is small enough
(
is large), peer
+1 can only be served by peer
, and we have
. Thus, when
, peer
cannot complete downloading before peer
leaves, and the torrent is dead. Using Equation 3.1, we get the torrent lifespan
Equation 3.2 shows the
expectation of the real torrent lifespan. To verify Equation 3.2, we compute the initial peer arrival rate
and the torrent attenuation parameter
for fully
traced torrents in the tracker trace. From Equation 3.1,
we have
|
|
(3.3) |
Both
and
for each peer
arrival can be extracted from the trace and we get
and
using linear regression. We also compute
the seed leaving rate
as the the
reciprocal of the average seed service time, which is extracted from the trace,
too. Figure 4 shows the comparison of torrent
lifespan computed from the tracker trace (indicated by trace) and that
from the Equation 3.2 (indicated by model).
In this figure, each point in x-axis denotes a torrent, while each point
in y-axis denotes the measurement result or the modeling result of
torrent lifespan. The torrents in the x-axis are sorted in non-ascending
order of the modeling results of torrent lifespans.
As shown in the figure, our model fits the real torrent lifespan very well. The
average lifespan of torrents is about 8.89 days based on the trace analysis and
8.34 days based on our model. The lifespans of most
torrents are between 30 - 300 hours, and there are only a small number of
torrents with extremely short or extremely long lifespans.
The total population of a torrent (in the number of
peers) is
Among them, some peers may not be able to complete
downloading due to lack of seeds, which we call failed peers, denoted
as follows:
Thus, the downloading failure ratio of the torrent
is
|
|
|
|
|
(a) The comparison of torrent population: modeling and trace
analysis (in log-log scale) |
(b) The comparison of downloading failure ratio: modeling and
trace analysis (y-axis is in log scale) |
(c) The relation between torrent population and downloading
failure ratio (y-axis is in log scale) |
|
Figure 5: Torrent population and downloading failure ratio for all fully-traced torrents |
||
Figure 5(a) shows the comparison of the torrent population computed from the tracker trace with that from our model for each individual fully-traced torrent. In this figure, each point in x-axis denotes a torrent, while each point in y-axis denotes the measurement result or the modeling result of the total population of the torrent during its entire lifespan. The torrents in the x-axis are sorted in non-ascending order of the modeling results of torrent populations. As evidenced by the figure, the modeling result and trace analysis are consistent. In addition, we can see that the distribution of the torrent population is heavily skewed: although there are several large torrents, most torrents are very small, and the average population of torrents is only about 102 peers.
Figure 5(b) shows the downloading failure ratio based on trace analysis and on our model (plotted in the similar manner as that of Figure 5(a)). The real failure ratio of torrents is slightly lower than what our model predicts, because there are some altruistic peers that serve the torrent voluntarily. That also explains why the torrent lifespan in the trace analysis (8.89 days) is slightly higher than that in our model (8.34 days). Furthermore, there are some torrents that have no failed peers in the trace because the seeds leave after the downloaders finish, but cannot be shown in the log scale plot. However, the average downloading failure ratio based on the trace analysis is still about 10%, which is non-trivial for a content distribution system.
Equation 3.5 implies that the
number of failed peers in a torrent is independent of the initial peer arrival rate.
Instead, the number of failed peers depends on the speed of torrent evolution
(the attenuation exponent of peer arrival rate) and the seed departure rate.
Figure 5(c) shows downloading failure
ratios of torrents and their corresponding populations (plotted in the similar
manner as that of Figure 5(a) and 5(b)). As reflected in the figure and indicated by
Equation 3.5, the larger the torrent population, the
lower the downloading failure ratio. It is interesting to note that the
population of torrents, sorted in non-ascending order of their corresponding
downloading failure ratios, forms several clear curves, each of which
represents those torrents with similar evolution patterns (the popularity
attenuation parameter
). On the
right side of the figure, the failure ratio of the torrents is 0 due to the
existence of some altruistic seeds, which always stay until the last downloader
completes.
|
|
|
|
(a) The downloading speed distribution (complementary CDF, in
log-log scale) |
(b) The downloading progress distribution (complementary CDF) |
|
Figure 6: The peers abort downloading voluntarily |
|
In the above analysis, we assume that peers always complete their downloading unless they cannot. We do not consider peers that abort downloading voluntarily when seeds are still available in the torrent. A peer may abort downloading due to (1) loss of interest to the torrent file; (2) slow downloading speed or small downloading progress. Figure 6(a) shows the distribution of the average downloading speed of peers that voluntarily abort and peers that download the file completely. Figure 6(b) shows the distribution of downloading progress (the percentage of the entire file that has been downloaded) when peers abort downloading voluntarily. The figures indicate that the probability for a peer to abort downloading voluntarily is almost independent of its downloading speed and the current downloading progress. Hence, the voluntary aborting of some downloaders does not affect our analysis above.
Study [18] proposed a fluid model
for BitTorrent-like systems with constant peer
arrival rate. We follow the idea of the fluid model, but using the evolution of
peer arrival rate described in Equation 3.1. The
basic ODE (ordinary differential equation) set for the fluid model is
where the meanings of the parameters in our fluid model are listed in Table 1. These notations are adopted from work [18,23].
When the ODE set has two different real eigenvalues
, the
resolution can be expressed as:
|
|
(3.8) |
where
are
constant. The value of these constants and the detailed resolution of the fluid
model can be found in our technical report [13].
The average downloading speed of peers at time
is
|
|
|
|
(a) Torrent evolution |
(b) Downloading speed |
|
Figure 7: Torrent evolution under the fluid model |
|
We use the tracker trace to validate the torrent evolution
model. Similar to the peer arrival rate, the modeling results fit the trace
better for torrents with larger populations. Figure 7(a) shows the torrent evolution by both our
fluid model and the analysis results of a typical torrent in the trace. The
figure shows that the number of downloaders increases
exponentially in a short period of time after the torrent's birth (the flash
crowd period), and then decreases exponentially, but at a slower rate. The
number of seeds also increases exponentially at first, and then decreases
exponentially at a slower rate. The peak time of the number of seeds lags
behind that of the number of downloaders. As a
result,
increases until the torrent is dead, and the
resources of seeds cannot increase in proportion to service demand.
Furthermore, due to the random arrival of downloaders
and the random departure of seeds, average downloading performance fluctuates
significantly when the number of peers in the torrent is small, as shown in
Figure 7(b).
|
|
|
|
|
|
(a) The downloading speed in the lifetime of a typical torrent |
(b) The downloading speed (in log scale) and the number of downloaders/seeds for each torrent at a time instant |
(c) The average downloading speed (in log scale) and the total number of downloaders/seeds for all torrents in the system |
|
|
Figure 8: Performance variations in BitTorrent systems |
|||
Figure 8(a) shows the performance variations of the torrent under two kinds of granularities. The instant speed represents the mean downloading speed of all peers in the torrent at that time instant, sampled every half an hour. The average speed represents the average value of the instant speed over the typical downloading time (the average downloading time of all peers). The figure shows that the client downloading speed at different time stages is highly diverse and can affect client downloading time significantly. The reason is that seeds play an important role in the client downloading performance. However, the generation of seeds is the same as the completeness of peer downloading, so the random fluctuation of downloading speed cannot be smoothed in the scale of typical downloading time when the number of peers is small.
Figure 8(b) shows the
number of downloaders and seeds (a stack figure), and
the average downloading speed for each torrent in the trace at
Figure 8(c) shows the total number of peers in all torrents (a stack figure) and the average downloading speed of all downloaders in the trace at different time stages. The average downloading speed of all torrents is shown to be much more stable than that of one torrent. The reason is that the downloader/seed ratio is much more stable due to the large population of the system. This motivates us to balance the service load among different torrents, so that each torrent can provide relatively stable downloading performance to clients in its lifespan.
In a BitTorrent system, the service policy of seeds favors peers with high downloading speed, in order to improve the seed production rate in the system, i.e., to have these high speed downloaders complete downloading as soon as possible and wish they will then serve other downloaders. In this subsection, we investigate whether this wish comes true in practice.
|
|
|
|
(a) The peer downloading speed and contribution ratio |
(b) The the number of torrents that each
peer involves and the corresponding contribution ratio |
|
Figure 9: Fairness of seed service policy in BitTorrent systems (y-axis is in log scale) |
|
We define the contribution ratio of a peer as the total uploaded bytes over the total downloaded bytes of the peer. Figure 9(a) shows the peer downloading speed and the corresponding contribution ratio extracted from the trace. In this figure, each point in the x-axis denotes a peer, while the left y-axis denotes the contribution ratio of this peer, and the right y-axis denotes the average downloading speed of this peer. On the x-axis, peers are sorted in non-ascending order of their contribution ratios. The figure shows the rough trend that the peer contribution ratio increases when the downloading speed decreases. That is, the higher the downloading performance peers have, the less uploading service they actually contribute. This indicates that peers with high speed finish downloading quickly and then quit the system soon, which defeats the design purpose of the seed service policy.
Figure 9(b) shows the number of torrents that each peer involves and its corresponding contribution ratio (plotted in the similar way as that of Figure 9(a)). The figure shows no distinguishable correlation between the two, indicating that the main reason for seeds to leave old torrents is not to start new downloading tasks.
In summary, we observe that the BitTorrent's biased seed service policy in favor of high speed downloaders really affects the fairness to peers in downloading, and an incentive mechanism is needed to encourage seeds to contribute.
In the previous section, we have shown that client performance fluctuates significantly in single-torrent systems, but is very stable when aggregated over multiple torrents. Based on this observation, in this section, we study the correlation among multiple torrents through modeling and trace analysis, aiming to look for solutions to enable inter-torrent collaboration.
Although different torrents are independent from each other in the current BitTorrent systems, they are inherently related by peers that request multiple torrent files. A peer may download a torrent file, serve as a seed for that torrent for a while, and then go offline to sleep for some period of time. The peer may return sometime later and repeat the activities above. Thus, a peer's lifecyle consists of a sequence of downloading, seeding, and sleeping activities. If a peer stops using BitTorrent for a long time that is much longer than its typical sleeping time, we consider the peer as dead.
In the current BitTorrent systems, a peer is encouraged to exchange file chunks with other peers that are downloading the same file instead of serving old torrent files it has downloaded. Thus, in our model, we assume each peer joins (downloading and seeding) each torrent at most once, and joins one torrent at a time. Having these assumptions, we start to characterize peers in multiple torrents.
|
|
|
|
|
|
(a) Torrent birth |
(b) Request arrival of all peers over all torrents |
(c) Peer birth |
|
|
Figure 10: The CDFs of torrent birth, peer request arrival, and peer birth over the trace collection time |
|||
In the multi-torrent environment, both torrents and peers
are born and die continuously. Figure 10(a) shows
the CDF of torrent birth in the trace (indicated by raw data) and our
linear fit. The average torrent birth rate (denoted as
in the following context) is about 0.9454 torrent per hour. Figure
10(b) shows the CDF of torrent request arrivals
(for all peers over all torrents) and our linear fit. We define the torrent
request rate as the number of downloading requests for all torrents per
unit time in the multi-torrent system, denoted as
in the
following context. Although the peer arrival rate of a single-torrent system
decreases exponentially as shown in Figure 1, the
torrent request rate in the multi-torrent system is almost a constant, about
133.39 per hour.
Since both the torrent birth rate and torrent request rate
are almost constant, it is natural to assume that the peer birth rate
(denoted as
in the following context) is also a constant. A
peer is born when it appears in the system for the first time.
However, as shown in Figure 10(c), the peer birth
rate is high at the beginning of the trace collection duration, and then
converges to a constant rate asymptotically. The reason is that peers appear in
the trace for the first time may actually be born before the trace collection,
and the number of such peers decreases quickly after the trace collection
starts. Thus, we take the asymptotic birth rate as the real birth rate of
peers, which is about 19.37 per hour.
The constant peer birth rate and torrent request rate
indicate that each peer only joins a limited number of torrents. However, the
request rate of a peer might still change over time. We define the peer
request rate as the number of requests a peer submits for different
torrents per unit time. Assume the peer request rate can be expressed as
where
is the time duration after the peer is born,
is the initial request rate, and
is the attenuation
parameter of the request rate. When
, the peer has a constant request rate;
when
, the peer has an increasing request rate.
The inter-arrival time between two successive requests of a
peer
is
. Thus, we have
|
|
(4.11) |
|
|
|
|
|
(a) The attenuation of peers' requesting rates and number of
torrents peers request (y-axis is in log scale) |
(b) The inter-arrival time of peers' requests and number of
torrents they join (in log-log scale) |
(c) The downloading speed and number of torrents peers join
(y-axis is in log scale) |
|
Figure 11: The request pattern of peers |
||
We extract
and
from the trace for each peer requesting multiple torrents, and use
linear regression to compute
and
. Figure 11(a) shows the number of torrents that
each peer requests and the corresponding
. In this figure, each point in the x-axis denotes a peer,
while the left y-axis denotes the
value of this peer,
and the right y-axis denotes the number of torrents this peer
participates. In x-axis, peers are sorted in non-ascending order of the
number of torrents they join. As shown in the figure, the value of parameter
in Equation 4.10 is very large,
with the mean value of about 77 years, which implies that the average request
rates of peers do not change significantly over time. Further,
is
independent of the number of torrents that peers join. Thus, we can assume that
the request processes of peers are Poisson-like processes with constant average
request rates.
Figure 11(b) shows
the average inter-arrival time of torrent requests for peers requesting
multiple torrent files (plotted in the similar manner as that of Figure 11(a)). As shown in the figure, it is
intuitive to find that the upper bound of the number of torrents each peer
requests increases with the decrease of inter-arrival time. However, for peers
with similar request rates, the number of torrents
they request are very diverse, since they stay in the system for different time
durations. Figure 11(c) further plots the
downloading speed versus the number of torrents peers join (plotted in the
similar manner as that of Figure 11(a)).
There is no strong correlation between the two for peers with downloading speed
1 KB per second. This implies that for peers whose downloading
speed is large enough, the numbers of torrent files different peers request do
not depend on their request rates and their downloading speed.
Thus, we assume that a peer joins a new torrent with probability
. For
peers
in the system, during their whole lifecycles, there are
peers
that request at least
torrents. Ranking peers in non-ascending
order of the number of torrents they join, the number of torrents that a peer
ranked
joins is
In addition, a peer has the probability
to
download exactly 1 file, probability
to download exactly
2 files, and probability
to download exactly
files. So the mean number of torrents that a peer joins is:
|
|
(4.13) |
|
|
|
|
(a) For all peers in the trace |
(b) For peers born in the middle of trace collection time |
|
Figure 12: Torrent involvement of peers (x-axis is in log scale) |
|
Figure 12(a) shows the
distribution of number of torrent files that each peer downloads in the trace.
The curve in the figure is a little convex, deviating from what Equation 4.12 predicts (a straight line when
-axis
is in log scale). The reason is that the number of torrents joined by peers
born before the trace collection is under-estimated, since some of these
requests cannot be recorded in the trace. A similar situation exists for peers
that are not dead before the end of trace collection.
Figure 12(b) shows
the distribution of number of torrents joined by each peer that was born in the
middle of the trace collection duration (indicated by raw data) and
our linear fit. The curve fits Equation 4.12 very
well, and we estimate from the analysis that
, while the average number of torrents
each peer joins is about
.
To verify the probability model we use in the above
analysis, we estimate
in another way as follows. Assuming the
peer birth rate is
and the torrent request rate is
, since each peer joins
torrents during its lifetime in average, we
have
|
|
(4.14) |
Based on the peer request arrival rate and the peer birth rate
we derived before (see Figure 10(b) and 10(c)), we have
. This is
very close to the value we got from Equation 4.12,
, meaning that there are more than
85% peers joining multiple torrents.
|
|
|
|
|
(a) The probability distribution of seeding time |
(b) The probability distribution of sleeping time |
|
|
Figure 13: The seeding time and sleeping time of peers (y-axis is in log scale) |
||
Having characterized the torrent request pattern of peers,
finally we consider the distribution of the seeding time and the sleeping time
of peers. According to our fluid model,
represents the average seeding time.
Figure 13(a) and 13(b)
show the probability distribution functions of the peer seeding time and the peer
sleeping time in the system. Note that the
-axis is in log scale.
Both the peer seeding time and sleeping time roughly follow the exponential
distribution with probability density function
, and
, respectively. Based on the trace analysis, we estimate
hours, and
hours.
In this part we study how different torrents are connected through peers that download multiple files, based on our previously verified assumptions.
For simplification, we consider a homogeneous multi-torrent
environment where all torrents and peers have the same
,
,
,
,
,
, and average sleeping time. We denote each torrent in the system
as torrent
with birth time
(
). For any two torrents that are born
successively, torrent
first born and torrent
born
next, we have
and
.
Assume the probability that a peer selects torrent
at time
as its
-th torrent is
,
when
. We also
denote
as
for simplicity.
Without loss of generality, we assume that the most recently born torrent by
time
is torrent 1, and
satisfies
|
|
(4.15) |
where
,
. Thus, we have
|
|
(4.16) |
For a peer that requests its
-th torrent file, the peer does not
select the torrents that it has requested. Assuming
the peer arrival rate of a torrent can be expressed as