6.1 Response Time Metrics

We use the following functions to denote the critical timestamps for connection conn and request r:

$t_{syn}(conn)$ -- time of the first SYN packet received from the client on connection conn;
$t_{req}^{start}(r)$ -- time when the first byte of the request r is received;
$t_{resp}^{start}(r)$ -- time when the first byte of the response to r is sent;
$t_{resp}^{end}(r)$ -- time when the last byte of the response to r is sent;
$t_{resp}^{ack}(r)$ -- time when the ACK for the last byte of the response to r is received.
Additionally, for a web page P, we use the following variables: N -- the number of connections used to retrieve P, denoted $conn_1, ..., conn_N$, where connection $conn_k$ carries the requests $r_1^k, ..., r_{n_k}^k$.

The extended version of HTTP 1.0 and the later version HTTP 1.1 [9] introduce the concepts of persistent connections and pipelining. Persistent connections enable reuse of a single TCP connection for multiple object retrievals from the same IP address. Pipelining allows a client to issue a series of requests on a persistent connection without waiting for the previous responses to complete (the server must, however, return the responses in the same order as the requests were sent).

We consider the requests $r_i^k, ..., r_n^k$ to belong to the same pipelining group (denoted $PipeGr = \{r_i^k, ..., r_n^k\}$) if for any $j$ such that $i \leq j-1 < j \leq n$, $t_{req}^{start}(r_{j}^{k}) \leq t_{resp}^{end}(r_{j-1}^{k})$.

Thus, for all the requests on the same connection $conn_k$: $r_1^k, ..., r_{n_k}^k$, we define the maximal pipelining groups in such a way that they do not intersect:

\begin{displaymath}\underbrace{r_{1}^{k}, ... , r_{i}^{k}}_{PipeGr_1},
\underbrace{r_{i+1}^{k}, ... }_{PipeGr_2},
... , \underbrace{ ... , r_{n_k}^{k}}_{PipeGr_l}.\end{displaymath}
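As a sketch (not part of the paper), the grouping rule above can be implemented by a single pass over a connection's requests in order; the `Request` type and field names are illustrative:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    """Timestamps (seconds) for one request/response pair; names are illustrative."""
    req_start: float   # t_req^start(r): first byte of the request received
    resp_start: float  # t_resp^start(r): first byte of the response sent
    resp_end: float    # t_resp^end(r): last byte of the response sent

def pipelining_groups(requests: List[Request]) -> List[List[Request]]:
    """Split one connection's requests (in order) into maximal pipelining
    groups: a request joins the current group if it was sent no later than
    the completion of the previous response."""
    groups: List[List[Request]] = []
    for r in requests:
        if groups and r.req_start <= groups[-1][-1].resp_end:
            groups[-1].append(r)   # pipelined with the previous request
        else:
            groups.append([r])     # previous response finished first: new group
    return groups
```

Because each request is compared only against its immediate predecessor, the resulting groups are maximal and do not intersect, matching the partition above.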

For each of the pipelining groups, we define three portions of response time: total response time (Total), network-related portion (Network), and lower-bound estimate of the server processing time (Server).

Let us consider the following example. For convenience, let us denote $PipeGr_1 = \{r_1^k,...,r_i^k\}$.

Then

\begin{displaymath}Total(PipeGr_1) = t_{resp}^{end}(r_{i}^{k}) - t_{req}^{start}(r_{1}^{k}),\end{displaymath}


\begin{displaymath}Network(PipeGr_1) = \sum_{j=1}^{i} {(t_{resp}^{end}(r_{j}^{k}) - t_{resp}^{start}(r_{j}^{k}))},\end{displaymath}


\begin{displaymath}Server(PipeGr_1) = Total(PipeGr_1) - Network(PipeGr_1).\end{displaymath}

If no pipelining exists, a pipelining group consists of only one request. In this case, the computed server time represents precisely the server processing time for the given request-response pair. If a connection uses pipelining, the ``real'' server processing time might be larger than the computed server time because it can partially overlap the network transfer time, and it is difficult to estimate the exact server processing time from packet-level information. However, we are still interested in estimating the ``non-overlapping'' server processing time, as this is the portion of the server time that lies on the critical path of the overall end-to-end response time. Thus, we use as an estimate the lower-bound server processing time, which is explicitly exposed in the overall end-to-end response.
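The three formulas above translate directly into code. The following sketch (our illustration, not from the paper) represents each request as a `(req_start, resp_start, resp_end)` timestamp triple:

```python
def group_breakdown(group):
    """Breakdown for one pipelining group, given as a list of
    (req_start, resp_start, resp_end) timestamp triples in request order.
    Server is a lower bound: with pipelining, the real server time may
    overlap the network transfer time."""
    # Total(PipeGr) = t_resp^end of the last request - t_req^start of the first
    total = group[-1][2] - group[0][0]
    # Network(PipeGr) = sum of response-transfer times over the group
    network = sum(resp_end - resp_start for _, resp_start, resp_end in group)
    # Server(PipeGr) = Total - Network (lower-bound server processing time)
    server = total - network
    return total, network, server
```

For a group of one request (no pipelining), `server` reduces exactly to the gap between receiving the request and starting the response.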

If connection $conn_k$ is a newly established connection to retrieve a web page, we observe additional connection setup time:

\begin{displaymath}Setup(conn_k) = t_{req}^{start}(r_{1}^{k}) - t_{syn}(conn_k);\end{displaymath}

otherwise, the setup time is 0. Additionally, we define $t^{start}(conn_k) = t_{syn}(conn_k)$ for a newly established connection; otherwise, $t^{start}(conn_k) = t_{req}^{start}(r_{1}^{k})$.

Similarly, we define the breakdown for a given connection connk:

\begin{displaymath}Total(conn_k) = Setup(conn_k) + t_{resp}^{end}(r_{n_k}^{k}) - t_{req}^{start}(r_{1}^{k}),\end{displaymath}


\begin{displaymath}Network(conn_k) = Setup(conn_k) + \sum_{j=1}^{l} {Network(PipeGr_j)},\end{displaymath}


\begin{displaymath}Server(conn_k) = \sum_{j=1}^{l} {Server(PipeGr_j)}.\end{displaymath}
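The connection-level breakdown can be sketched as follows (again our illustration; each pipelining group is a list of `(req_start, resp_start, resp_end)` triples, and `syn_time` is $t_{syn}(conn_k)$ for a newly established connection, `None` for a reused one):

```python
def connection_breakdown(groups, syn_time=None):
    """Total/Network/Server breakdown for one connection, given its
    pipelining groups in order. Setup(conn_k) is nonzero only for a
    newly established connection."""
    first_req = groups[0][0][0]    # t_req^start of the first request
    last_resp = groups[-1][-1][2]  # t_resp^end of the last request
    setup = (first_req - syn_time) if syn_time is not None else 0.0
    # Network(conn_k) = Setup + sum over groups of response-transfer times
    network = setup + sum(re - rs for g in groups for _, rs, re in g)
    # Server(conn_k) = sum over groups of (Total(PipeGr) - Network(PipeGr))
    server = sum((g[-1][2] - g[0][0]) - sum(re - rs for _, rs, re in g)
                 for g in groups)
    total = setup + last_resp - first_req
    return total, network, server
```

Note that `total` equals `network + server` plus any idle gaps between pipelining groups on the connection.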

Now, we define similar latencies for a given page P:

\begin{displaymath}Total(P) = \max_{j \leq N}{t_{resp}^{end}(r_{n_j}^{j})} - \min_{j \leq N}{ t^{start}(conn_j)},\end{displaymath}


\begin{displaymath}CumNetwork(P) = \sum_{j=1}^{N} {Network(conn_j)},\end{displaymath}


\begin{displaymath}CumServer(P) = \sum_{j=1}^{N} {Server(conn_j)}.\end{displaymath}

For the rest of the paper, we will use the term EtE time interchangeably with Total(P) time.

All the above formulae use $t_{resp}^{end}(r)$ to calculate response time. An alternative is to use as the end of a transaction the time $t_{resp}^{ack}(r)$ when the ACK for the last byte of the response is received by the server. Figure 4 shows an example of a simplified scenario where a 1-object page is downloaded by the client: it shows the communication protocol for connection setup between the client and the server as well as the set of major timestamps collected by the EtE monitor on the server side. The connection setup time measured on the server side is the time between the client SYN packet and the first byte of the client request. This is a close approximation of the original client setup time (we present more detail on this point in Section 8 when reporting our validation experiments).

  
Figure 4: An example of a 1-object page download by the client: major timestamps collected by the EtE monitor on the server side.
If the ACK for the last byte of the client response is not delayed or lost, trespack(r) is a more accurate approximation of the end-to-end response time observed by the client: it ``compensates'' for the latency of the first client SYN packet that is not measured on the server side. The difference between the two methods, i.e. EtE time (last byte) and EtE time (ack), is only a round trip time, which is on the scale of milliseconds. Since the overall response time is on the scale of seconds, we consider this deviation an acceptably close approximation. However, to avoid the problems with delayed or lost ACKs, EtE monitor determines the end of a transaction as the time when the last byte of a response is sent by a server.

The metrics introduced in this section account for packet retransmission. However, if retransmission happens during connection establishment (e.g., due to dropped SYN packets), EtE monitor cannot account for it.

The functions CumNetwork(P) and CumServer(P) give the sum of all the network-related and server processing portions of the response time over all connections used to retrieve the web page. However, the connections can be opened concurrently by the browser. To evaluate the concurrency impact, we introduce the page concurrency coefficient ConcurrencyCoef(P):

\begin{displaymath}ConcurrencyCoef(P) = { \sum_{j=1}^{N}{Total(conn_j)} \over {Total(P)}} .\end{displaymath}

Using the page concurrency coefficient, we finally compute the network-related and the server-related portions of response time for a particular page P:

\begin{displaymath}Network(P) = CumNetwork(P) / ConcurrencyCoef(P),\end{displaymath}

\begin{displaymath}Server(P) = CumServer(P) / ConcurrencyCoef(P).\end{displaymath}
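Putting the page-level formulas together, a sketch of the computation (our illustration; the per-connection dict keys are assumed names, not from the paper) might look like:

```python
def page_breakdown(connections):
    """Page-level metrics from per-connection values. Each connection is a
    dict with keys 'start' (t^start(conn)), 'end' (t_resp^end of its last
    response), 'total', 'network', 'server'."""
    # Total(P): from the earliest connection start to the latest response end
    total_p = (max(c['end'] for c in connections)
               - min(c['start'] for c in connections))
    cum_network = sum(c['network'] for c in connections)  # CumNetwork(P)
    cum_server = sum(c['server'] for c in connections)    # CumServer(P)
    # ConcurrencyCoef(P): how much connection time overlapped in wall-clock time
    concurrency = sum(c['total'] for c in connections) / total_p
    return {
        'Total': total_p,
        'Network': cum_network / concurrency,
        'Server': cum_server / concurrency,
    }
```

With fully sequential connections the coefficient is close to 1 and the cumulative sums pass through unchanged; with N fully overlapped connections it approaches N, scaling the sums down to wall-clock proportions.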

EtE monitor can distinguish requests sent to a web server from clients behind proxies by checking the HTTP Via fields. If a client's page accesses are handled via the same proxy (which is typically the case, especially when persistent connections are used), EtE monitor provides correct measurements for end-to-end response time and the other metrics, as well as interesting statistics on the percentage of client requests coming from proxies. Clearly, this percentage is an approximation, since not all proxies set the Via fields in their requests. Finally, EtE monitor can only measure the response time to a proxy, rather than to the actual client behind it.

