Performance

Table 2 shows the results of three benchmarks comparing the performance of Linux kernels running with LOMAC (``LOMAC v1.1.0'') and without it (``No LOMAC''). The benchmarks tested version 1.1.0 of LOMAC with run-time assertions disabled. The first entry in the table measures the time to perform the ``make'' portion of the Linux 2.2.5 kernel build procedure on a 450MHz Intel Pentium II-based RedHat 6.0 system. Each result is the average of 10 trials, which followed an initial uncounted trial run to prime the caches. Although this macro-benchmark tends to hide LOMAC's additional kernel overhead, it gives an impression of how a user might perceive LOMAC's performance on a real workload.

The second and third table entries show the latency and throughput performance of the Apache/1.3.9 web server running on a 133MHz Intel Pentium-based RedHat 6.1 system. This web server was connected via a 10Mbit crossover (uplink) Ethernet cable to a Sun Microsystems Ultra 5 workstation running Solaris 2.6. To produce each result, the workstation ran a series of 47 ten-minute trials of the WebStone 2.5b4 web server benchmark, using 32 test clients to apply the standard WebStone static workload to the web server. The apparent small improvement in latency is spurious; the performance impact of LOMAC is much smaller than the variance in the WebStone benchmark's results.

Table 2: Benchmark Results

Kernel Build Elapsed Time (s)
	mean	std. dev.	penalty
No LOMAC	269.61	0.03	-
LOMAC v1.1.0	278.05	0.03	3.1%
WebStone Latency (s)
	mean	std. dev.	penalty
No LOMAC	0.569	0.003	-
LOMAC v1.1.0	0.567	0.003	-0.2%
WebStone Throughput (Mbit/s)
	mean	std. dev.	penalty
No LOMAC	8.327	0.058	-
LOMAC v1.1.0	8.305	0.063	0.3%
UB Execl Throughput (loops/s)
	mean	std. dev.	penalty
No LOMAC	642.4	23.7	-
LOMAC v1.1.0	537.0	21.7	16.4%
UB File Copy 256 Byte buffers (KByte/s)
	mean	std. dev.	penalty
No LOMAC	34393	289	-
LOMAC v1.1.0	31131	222	9.5%
UB File Copy 1024 Byte buffers (KByte/s)
	mean	std. dev.	penalty
No LOMAC	69672	385	-
LOMAC v1.1.0	66155	573	5.0%
UB File Copy 4096 Byte buffers (KByte/s)
	mean	std. dev.	penalty
No LOMAC	81379	547	-
LOMAC v1.1.0	79078	775	2.8%
UB Pipe Throughput (loops/s)
	mean	std. dev.	penalty
No LOMAC	263124	1679	-
LOMAC v1.1.0	234225	4289	11.0%
UB Pipe-based Context Switch (loops/s)
	mean	std. dev.	penalty
No LOMAC	139917	1827	-
LOMAC v1.1.0	116993	1510	16.4%
UB Process Creation (loops/s)
	mean	std. dev.	penalty
No LOMAC	3811	20	-
LOMAC v1.1.0	3830	24	-0.5%
UB System Call Overhead (loops/s)
	mean	std. dev.	penalty
No LOMAC	249414	332	-
LOMAC v1.1.0	249356	303	0.2%
UB 8 Shell Script Load (loops/minute)
	mean	std. dev.	penalty
No LOMAC	144.2	3.0	-
LOMAC v1.1.0	129.0	3.1	10.5%

The remaining table entries show the results of the BYTE UNIX benchmarks performed with the UnixBench 4.1.0 software on the same system used for the kernel-build benchmark. Each result is the average of 21 trials. The table omits the largely computational Dhrystone and Whetstone components of the benchmark; the presence of LOMAC did not significantly affect these components. The apparent small improvement in process creation performance is also spurious; the performance impact of LOMAC is smaller than the variance in the Process Creation portion of the UnixBench benchmark.
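The text does not state how the penalty column of Table 2 was derived. The following minimal sketch assumes it is the relative change from the ``No LOMAC'' mean, signed so that a positive value means LOMAC is slower. The function names and the one-standard-deviation noise check are illustrative assumptions, not the authors' method. Because the table prints rounded means, recomputing from them may not reproduce every printed penalty exactly.

```python
def penalty_percent(baseline, with_lomac, higher_is_better):
    """Relative slowdown in percent; positive means LOMAC is slower.

    Assumed formula: the paper does not spell out its calculation.
    """
    if higher_is_better:
        # Throughput-style metric (loops/s, KByte/s): a lower value is worse.
        change = baseline - with_lomac
    else:
        # Time-style metric (seconds): a higher value is worse.
        change = with_lomac - baseline
    return 100.0 * change / baseline


def within_noise(baseline, with_lomac, std_dev):
    """Rough heuristic: is the difference in means at most one std. dev.?"""
    return abs(with_lomac - baseline) <= std_dev


# Means copied from Table 2.
kernel_build = penalty_percent(269.61, 278.05, higher_is_better=False)
execl = penalty_percent(642.4, 537.0, higher_is_better=True)
print(round(kernel_build, 1))  # 3.1
print(round(execl, 1))         # 16.4

# The two "spurious improvement" rows: WebStone latency and Process Creation.
latency_is_noise = within_noise(0.569, 0.567, 0.003)
creation_is_noise = within_noise(3811, 3830, 20)
print(latency_is_noise, creation_is_noise)  # True True
```

Under this heuristic, both apparent improvements fall within one standard deviation of the baseline, which is consistent with treating them as measurement noise.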

LOMAC's performance is comparable to that of interposition-based general kernel extension mechanisms such as Generic Software Wrappers [10] and SLIC [11]. For example, the SLIC prototype reported performance penalties ranging from 0% to 5% on an emacs-building benchmark, depending on how many security extensions were loaded at the time. The Generic Software Wrappers prototype reported penalties ranging from 3.5% to 6.5% on a kernel-building benchmark, up to 1.4% for WebStone latency, and up to 3.3% for WebStone throughput, again depending on how many security extensions were loaded.

LOMAC has not yet been optimized for performance; there are several areas of its implementation that trade performance for simplicity in order to support the rapid development of new features. For example, when a process opens or executes a file, LOMAC consults the PLM to determine the file's level and the level of its parent directory. LOMAC saves these levels in memory for the benefit of its read and write mediation functions. However, LOMAC makes no attempt to skip the PLM lookup on subsequent opens, even for files and directories that already have their levels stored in memory. The PLM implementation is presently based on a simple but inefficient sequential search. Lookups on short, common directories such as ``/bin'' and ``/usr/bin'' require 25 string comparisons. This inefficiency is reflected in the high penalty shown by the UnixBench Execl Throughput benchmark. Considerable time could be saved by avoiding redundant PLM lookups, and by improving the PLM's search algorithm.
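The cost of redundant PLM lookups can be illustrated with a minimal sketch. It models the PLM as an ordered list of path prefixes searched sequentially and wraps it in a cache that skips the search on repeated lookups. The rule table, level values, class names, and prefix-matching rule are all hypothetical; they are not taken from the LOMAC source, and the sketch ignores cache invalidation.

```python
HIGH, LOW = 2, 1  # hypothetical numeric encoding of the two levels

# Hypothetical rule table; the real PLM contents and order differ.
RULES = [
    ("/tmp", LOW),
    ("/bin", HIGH),
    ("/usr/bin", HIGH),
    ("/", HIGH),
]


def matches(path, prefix):
    """True if path equals prefix or lies beneath it in the tree."""
    return prefix == "/" or path == prefix or path.startswith(prefix + "/")


class SequentialPLM:
    """Sequential search that counts string comparisons."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.comparisons = 0

    def lookup(self, path):
        for prefix, level in self.rules:
            self.comparisons += 1
            if matches(path, prefix):
                return level
        raise KeyError(path)


class CachedPLM:
    """Remembers levels so repeated opens skip the sequential search."""

    def __init__(self, plm):
        self.plm = plm
        self.cache = {}

    def lookup(self, path):
        if path not in self.cache:
            self.cache[path] = self.plm.lookup(path)
        return self.cache[path]


plm = SequentialPLM(RULES)
cached = CachedPLM(plm)
first = cached.lookup("/usr/bin/make")
after_first = plm.comparisons
second = cached.lookup("/usr/bin/make")
after_second = plm.comparisons
print(after_first, after_second)  # 3 3
```

In this toy table the first lookup costs three comparisons and the second costs none. Replacing the sequential search itself with a hash table or a trie keyed on path components would address the other inefficiency the text describes.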

At a higher level, LOMAC might save time by not mediating the actions of high-level processes, since LOMAC always allows high-level processes to do as they wish. Similarly, LOMAC might save time by not considering low-level processes for demotion, since low-level processes are already running at the lowest integrity level. These optimizations have the potential to reduce the overhead of read and write operations shown in the three UnixBench File Copy benchmarks. As LOMAC nears its goals for features, an increasing share of development resources will be allocated to improving performance.
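The two shortcuts described above can be sketched as early returns placed ahead of the general checks. The sketch assumes a two-level low water-mark policy in which a write is allowed only when the process level is at least the file level, and a read lowers the process to the level of the data it reads. The function names and numeric levels are hypothetical and do not come from the LOMAC source.

```python
HIGH, LOW = 2, 1  # hypothetical numeric encoding of the two levels


def may_write_general(process_level, file_level):
    """General write mediation: no writing up to a higher level."""
    return process_level >= file_level


def level_after_read_general(process_level, file_level):
    """General demotion rule: reading lower-level data lowers the process."""
    return min(process_level, file_level)


def may_write_fast(process_level, file_level):
    # Shortcut: a high-level process is always allowed to write,
    # so the general check (and any lookups it needs) can be skipped.
    if process_level == HIGH:
        return True
    return may_write_general(process_level, file_level)


def level_after_read_fast(process_level, file_level):
    # Shortcut: a low-level process cannot be demoted any further.
    if process_level == LOW:
        return LOW
    return level_after_read_general(process_level, file_level)


# The shortcuts must never change the outcome, only the work done.
for p in (HIGH, LOW):
    for f in (HIGH, LOW):
        assert may_write_fast(p, f) == may_write_general(p, f)
        assert level_after_read_fast(p, f) == level_after_read_general(p, f)
```

The closing loop checks every combination of process and file level, confirming that under these assumed rules the shortcuts are pure optimizations that leave the policy's decisions unchanged.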