
Implementation Considerations and Applicability

The idea advocated in this paper is to consider disconnecting users only if a modem pool is fully occupied. Following this approach, we studied different algorithms for picking users to disconnect. Unfortunately, neither modem servers nor modems have native support for the policies we describe. Fortunately, however, the approach is very easy to implement.

The most straightforward implementation would be one that does not strictly perform replacement but keeps a fixed number of modems unoccupied, as long as users idle for more than a threshold T exist. That is, for a small number n, if the number of available modems drops below n and there are users idle for more than T seconds, the system will disconnect one of these users and repeat the process. If there are no users idle for more than T seconds, then all modems can be occupied and further connections will get a busy signal. This implementation works well because it does not penalize the system under heavy load that cannot be alleviated with disconnections (all modems can be used). Under a load that could be lightened with disconnections, it only penalizes the system by keeping n modems available. The number n can be quite small relative to the pool size. For the Telesys pool of over 3,000 modems, a value of n = 20 would be reasonable (see below for the rate of connections and disconnections in the Telesys traces).
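The "keep n modems free" policy above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the session representation and the `enforce_free_pool` name are our own, and removing a session from the list stands in for the actual disconnection command.

```python
# Sketch of the "keep n modems free" policy described above.
# Each session is a dict with a user id and an idle time in seconds;
# these names are illustrative, not taken from the paper.

def enforce_free_pool(sessions, pool_size, n, idle_threshold):
    """Disconnect the longest-idle eligible users until either n modems
    are free or no user has been idle longer than idle_threshold
    seconds. Returns the disconnected sessions in order."""
    disconnected = []
    # Users eligible for disconnection: idle longer than the threshold,
    # considered longest-idle first.
    eligible = sorted(
        (s for s in sessions if s["idle"] > idle_threshold),
        key=lambda s: s["idle"],
        reverse=True,
    )
    free = pool_size - len(sessions)
    while free < n and eligible:
        victim = eligible.pop(0)
        sessions.remove(victim)   # stands in for the real disconnect
        disconnected.append(victim)
        free += 1
    return disconnected
```

Note that when no user exceeds the threshold, the loop never runs and all modems stay occupied, matching the behavior described above.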

Additionally, what makes a sophisticated modem disconnection policy easy to implement is that the data volumes and decision costs involved are trivial for modern machines. For the Telesys modem pool (which is among the largest unified pools encountered in practice) one has to manage up to a few thousand modems at any time and a total number of users in the low tens of thousands. Handling replacement policy data structures with this many entries is a simple matter. Even for CIRG and LRU, the more ``costly'' policies among the ones we studied, updating the data structures and selecting a user to disconnect took at most a few milliseconds. The total memory required was less than 2MB for CIRG and less than 100KB for LRU at any time. Furthermore, the input data change at human-time rates: typically, 5 to 10 connections or disconnections per minute were observed in the Telesys trace. As we saw in our experiments, our polling interval of 2 minutes was sufficient to obtain data from which accurate predictions can be made.

In fact, the problem is computationally simple enough that even a centralized remote implementation is sufficient. (This is certainly not the only option, but we discuss it here because of its simplicity.) That is, a remote workstation can periodically poll all the terminal servers and send messages that initiate user disconnections. Many modern communications servers support SNMP (see [CFSD90] and [MR91] for the protocol and the relevant MIB entries), so both the polling and the disconnection commands can be sent remotely over the Internet. Alternatively, a centralized implementation with small proxies that perform the disconnections at each server is another simple option.
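One control cycle of such a centralized controller might look as follows. This is a minimal sketch under stated assumptions: `poll_server` and `disconnect_user` are hypothetical stand-ins for the SNMP queries and commands (or per-server proxies) mentioned above, not a real SNMP API.

```python
# Sketch of one cycle of a hypothetical centralized controller.
# poll_server and disconnect_user are injected stand-ins for the
# actual SNMP operations or per-server proxies.

def control_cycle(servers, n, idle_threshold, poll_server, disconnect_user):
    """Poll every terminal server, then free modems until n are
    available or no sufficiently idle user remains."""
    sessions = []
    free = 0
    for server in servers:
        status = poll_server(server)   # e.g. read the server's session table
        free += status["free_modems"]
        sessions.extend((server, u) for u in status["sessions"])
    # Candidates for disconnection, longest-idle first.
    idle = sorted(
        ((srv, u) for srv, u in sessions if u["idle"] > idle_threshold),
        key=lambda item: item[1]["idle"],
        reverse=True,
    )
    while free < n and idle:
        server, user = idle.pop(0)
        disconnect_user(server, user)  # e.g. command the server to drop the line
        free += 1
```

Because the controller only needs a coarse, periodic view of session state, it tolerates stale data between polls; at the human-time rates discussed above, a 2-minute cycle is ample.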

To see how feasible this is, during our trace collection, we polled the over 100 Telesys terminal servers remotely over the Internet using the ``finger'' command (which uses the Finger user information protocol [Zim91]). This method is clearly inefficient because the protocol is not optimized for periodic polling and because the information we needed was less than 5% of the total transmitted data. Nevertheless, our polling took around 50 seconds when done serially and around 15 seconds when done with one process per terminal server (the vast majority of processes finished within 3 seconds, but a couple took longer). Although we have no way of analyzing the delay, it is reasonable to assume that it is primarily due to processing delay at the terminal server and secondarily due to network latency. The former can be minimized with a more efficient polling protocol. The latter could be reduced if the machine storing the trace were in closer network proximity to the servers. Nevertheless, even the 15 seconds taken for a remote, inefficient poll are perfectly acceptable--user statistics will not have changed significantly in this time.
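The serial-versus-parallel polling comparison above comes down to querying all servers concurrently, so total time tracks the slowest server rather than the sum over all servers. A minimal sketch, with `fetch_status` as a hypothetical stand-in for the actual finger query:

```python
# Sketch of concurrent polling with one worker per server, analogous
# to the one-process-per-server measurement above. fetch_status is a
# hypothetical stand-in for the actual finger query to one server.
from concurrent.futures import ThreadPoolExecutor

def poll_all(servers, fetch_status):
    """Query every server concurrently and collect the results.
    Wall-clock time is roughly that of the slowest single server."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        return dict(zip(servers, pool.map(fetch_status, servers)))
```

With per-server queries mostly finishing within 3 seconds, this is how a 50-second serial sweep can collapse to roughly the 15 seconds observed.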


Yannis Smaragdakis
Tue Apr 25 15:09:47 EDT 2000