

	 The following paper was originally published in the
      Proceedings of the USENIX 1996 Annual Technical Conference
		San Diego,  California,  January 1996


	For more information about USENIX Association contact:

		   1. Phone:	510 528-8649
		   2. FAX:	510 548-5738
		   3. Email:	office@usenix.org
		   4. WWW URL:  https://www.usenix.org


		     Implementation of IPv6 in 4.4 BSD


	   Randall J. Atkinson, Daniel L. McDonald, Bao G. Phan,

		     Craig W. Metz*, & Kenneth C. Chin


	 Information Technology Division, Naval Research Laboratory


			      November 9, 1995


				  Abstract

        The widespread availability of the TCP/IP protocols in early versions
     of BSD UNIX fostered the current widespread use of those protocols in
     commercial products. Rapid depletion of the IPv4 address space has
     caused the Internet Engineering Task Force to design version 6 of the
     Internet Protocol (IPv6). IPv6 has some similarities with IPv4, but it
     also has many differences, most notably in address size. This paper
     describes our experience creating a freely distributable implementation
     of IPv6 inside 4.4 BSD, focusing on the areas that changed between the
     IPv4 and IPv6 implementations.


1    Introduction


During the past decade, the worldwide Internet has grown at exponential
rates, not only in North America but also in Europe and Asia [Lot92]. This
growth, combined with suboptimal address allocation practices, has led to
increasing depletion of the IP version 4 (IPv4) address space. One direct
result of this depletion was that the Internet Engineering Task Force (IETF)
began working to create a revised version of the Internet Protocol (IP).
This effort is called Next-Generation IP (IPng), and the resulting protocol
is IP version 6 (IPv6). When the IPng effort began, there were several
contenders, but in July 1994 the SIPP proposal became the primary basis for
IPv6.

   The widespread availability of TCP/IPv4 in early versions of BSD UNIX was
crucial to the success and deployment of Internet technologies. To help make
Next-Generation IP equally widely available, the authors began working with
the Simple Internet Protocol (SIP) Working Group of the IETF in 1992 [Dee93].
As SIP evolved into SIPP [Hin94] and then into IPv6, the authors began
prototyping, initially in BSD Net/2 and currently in 4.4 BSD. Our primary
development systems were Sun SPARC workstations and i486 systems running
4.4 BSD.1

____________________________

* Although Craig W. Metz is with Kaman Sciences Corporation, he may be reached
 at NRL.

   Implementation issues, rather than the details of the IPv6 protocol, are
the focus of this paper. A number of implementation issues arose with IPv6
and have since been resolved. Obvious issues, such as supporting 128-bit
addresses instead of 32-bit addresses, are discussed alongside less obvious
ones, such as how to implement IPv6 security inside a BSD kernel. We assume
that the reader is somewhat familiar with the IPv6 protocol [DH95] and the
4.4 BSD-Lite implementation of IPv4. Figure 1 shows a rough overview of 4.4
BSD-Lite's Internet implementation, along with some of the new modules for
IPv6. Adding a new version of IP required modifying many of the surrounding
modules as well.


        Figure 1: Simple Overview of 4.4 BSD-Lite Internet Modules [diagram not reproduced]


2    Changes in Basic IP Functions


2.1   Differences in packet format

Perhaps the most obvious difference between IPv6 and its predecessor is the
packet format. Although some in the Internet community felt that 64-bit
addresses were sufficiently large, others insisted that 128-bit addresses
were needed so that plug-and-play address assignment similar to ISO ES-IS
could be supported. Many of the IPv4 header fields that were unused in
practice (Figure 2) were eliminated or moved to options, making the IPv6
base header (Figure 3) more streamlined. One significant addition to the
header is the Flow Identifier, which is an important hook for the resource
reservation techniques [ZBE+93] currently being developed within the IETF.

____________________________

1 The systems running 4.4 BSD (encumbered) have had the 4.4 BSD-Lite
networking changes incorporated into them. Some call this a BSD Net/3
system.

   The sparse IPv6 header is optimized for minimal processing. An IPv6
router needs only to verify the version number, inspect the destination
address, decrement the Hop Limit, and process hop-by-hop options if they are
present. (The flow label can be used to optimize this process further.) An
IPv4 router has to do everything an IPv6 router does, and must also verify
and recompute the header checksum and, if needed, fragment the datagram
further. An IPv6 destination host initially only has to check the validity
of the version and destination address. If there are optional headers, they
are daisy-chained and indicated by the Next Header field; otherwise, the
Next Header field identifies a higher-level protocol (e.g., TCP) as the next
header to be processed. An IPv4 destination host has to verify not only the
version and destination address, but the IP header checksum as well.
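To make the contrast concrete, here is a minimal C sketch of the per-packet header work each kind of router does, using only the field offsets of Figures 2 and 3. The function names are hypothetical, and hop-by-hop option processing, ICMP error generation, and IPv4 fragmentation are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum: one's-complement sum of 16-bit big-endian words. */
static uint16_t in_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)(p[len - 1] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* IPv4 router: verify the header checksum, decrement TTL, and recompute
 * the checksum.  Returns 0 to forward, -1 to drop. */
int ipv4_forward(uint8_t *h, size_t len)
{
    if (len < 20 || (h[0] >> 4) != 4)
        return -1;
    size_t ihl = (size_t)(h[0] & 0x0f) * 4;    /* header length in octets */
    if (ihl < 20 || ihl > len || in_cksum(h, ihl) != 0)
        return -1;                             /* malformed or corrupted */
    if (h[8] <= 1)
        return -1;                             /* TTL expired */
    h[8]--;
    h[10] = h[11] = 0;                         /* recompute from scratch */
    uint16_t c = in_cksum(h, ihl);
    h[10] = (uint8_t)(c >> 8);
    h[11] = (uint8_t)c;
    return 0;
}

/* IPv6 router: check the version and decrement the Hop Limit (octet 7).
 * There is no header checksum to verify or recompute. */
int ipv6_forward(uint8_t *h, size_t len)
{
    if (len < 40 || (h[0] >> 4) != 6)
        return -1;
    if (h[7] <= 1)
        return -1;                             /* hop limit exhausted */
    h[7]--;
    return 0;
}
```

The IPv4 path touches the checksum twice per hop; the IPv6 path is a version test and a single decrement.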


   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                     Figure 2: IPv4 Packet Format

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Prio. |                   Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        |  Next Header  |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                         Source Address                        +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                      Destination Address                      +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                     Figure 3: IPv6 Packet Format
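A sketch that decodes the base header exactly as drawn in Figure 3, with a 4-bit priority and a 24-bit flow label. The struct and function names are hypothetical, and later IPv6 specifications split the first 32-bit word differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of the 40-octet IPv6 base header laid out as in Figure 3. */
struct ipv6_hdr_fields {
    uint8_t  version;       /* 4 bits */
    uint8_t  priority;      /* 4 bits */
    uint32_t flow_label;    /* 24 bits */
    uint16_t payload_len;   /* octets following the base header */
    uint8_t  next_header;   /* type of the next header in the chain */
    uint8_t  hop_limit;
    uint8_t  src[16];
    uint8_t  dst[16];
};

/* Decode a base header from network byte order; -1 if too short. */
int ipv6_parse(const uint8_t *p, size_t len, struct ipv6_hdr_fields *h)
{
    if (len < 40)
        return -1;
    h->version     = p[0] >> 4;
    h->priority    = p[0] & 0x0f;
    h->flow_label  = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    h->payload_len = (uint16_t)((p[4] << 8) | p[5]);
    h->next_header = p[6];
    h->hop_limit   = p[7];
    memcpy(h->src, p + 8, 16);     /* octets 8..23 */
    memcpy(h->dst, p + 24, 16);    /* octets 24..39 */
    return 0;
}
```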




2.2   Protocol Processing

A number of the more recently developed optional IPv4 features are mandatory
in IPv6. Other features, such as cryptographic security, are new with
IPv6.2 These have caused a number of changes in IP protocol processing.

   IPv6 daisy-chains optional headers after the base header. Our
implementation pre-parses an IP packet into its constituent headers and
upper-layer protocol data as part of the initial IPv6 input processing.
Although this does degrade performance, it has simplified the processing of
optional IPv6 headers. We plan to create a fast path around the pre-parsing
code for packets containing no optional headers.
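The pre-parsing step can be sketched as a walk along the Next Header chain that records where each header starts. This is an illustration, not the implementation's code: the slot array and function name are hypothetical, the Next Header values (0, 43, 44, 60) are those assigned in RFC 1883, and AH, ESP, or an upper-layer protocol simply ends the walk here.

```c
#include <stddef.h>
#include <stdint.h>

/* Next Header values as assigned in RFC 1883. */
enum { NH_HOPOPTS = 0, NH_ROUTING = 43, NH_FRAGMENT = 44, NH_DSTOPTS = 60 };

struct hdr_slot { uint8_t type; size_t off; };

/* Record the type and offset of each header after the 40-octet base
 * header.  The first header it cannot skip (upper-layer protocol, AH,
 * ESP, No Next Header, ...) is recorded as the final slot.  Returns the
 * number of slots filled (at most max), or -1 if the packet is truncated. */
int ipv6_preparse(const uint8_t *pkt, size_t len,
                  struct hdr_slot *slots, int max)
{
    if (len < 40 || max < 1)
        return -1;
    uint8_t nh = pkt[6];                 /* Next Header of the base header */
    size_t off = 40;
    int n = 0;
    while (n < max) {
        slots[n].type = nh;
        slots[n].off = off;
        n++;
        size_t hlen;
        if (nh == NH_HOPOPTS || nh == NH_ROUTING || nh == NH_DSTOPTS) {
            if (off + 2 > len)
                return -1;
            hlen = ((size_t)pkt[off + 1] + 1) * 8;   /* 8-octet units */
        } else if (nh == NH_FRAGMENT) {
            hlen = 8;                                /* fixed size */
        } else {
            break;                                   /* end of the chain */
        }
        if (off + hlen > len)
            return -1;
        nh = pkt[off];          /* every extension header starts with Next Header */
        off += hlen;
    }
    return n;
}
```

A fast path of the kind mentioned above would skip this walk entirely when the base header's Next Header already names an upper-layer protocol.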

   The Path MTU Discovery [MD90] technique for avoiding IP fragmentation in
routers is mandatory for IPv6. IPv6 does not perform any fragmentation in
intermediate routers and instead relies on Path MTU Discovery and end-to-end
fragmentation. Our implementation stores Path MTU information in host
routes, which are automatically created for IP communications originating on
the local machine. Storing this information in the routing table makes it
available to TCP, UDP, and ICMP. IPv6 requires a minimum MTU of 576 bytes,
much larger than the 68-byte minimum MTU of IPv4. However, even this larger
size might be too small if certain IPv6 options, such as the Hop-by-Hop
Options Header (which can be up to 2048 octets), are used. In such cases,
end-to-end fragmentation will be required.
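A sketch of the Path MTU bookkeeping, assuming a hypothetical host-route structure with a pmtu field. It shrinks the cached value when a Message Too Big arrives, never going below the 576-byte minimum quoted above, and computes how many fragments end-to-end fragmentation needs when each non-final fragment carries a multiple of 8 octets behind a 40-octet base header and an 8-octet fragment header.

```c
#include <stdint.h>

#define IPV6_MIN_MTU   576   /* minimum given in the text (later RFCs use 1280) */
#define IPV6_HDR_LEN    40
#define FRAG_HDR_LEN     8

/* Hypothetical host-route entry caching the path MTU to one destination. */
struct host_route {
    uint8_t  dst[16];
    uint32_t pmtu;
};

/* Apply the MTU reported by an ICMPv6 Message Too Big: only ever shrink
 * the cached value, and never below the IPv6 minimum. */
void pmtu_update(struct host_route *rt, uint32_t reported)
{
    if (reported < IPV6_MIN_MTU)
        reported = IPV6_MIN_MTU;
    if (reported < rt->pmtu)
        rt->pmtu = reported;
}

/* Fragments needed for `len` payload octets, assuming pmtu >= 576 and
 * no other extension headers.  Non-final fragments carry a multiple of 8. */
uint32_t frag_count(uint32_t pmtu, uint32_t len)
{
    if (IPV6_HDR_LEN + len <= pmtu)
        return 1;                                   /* fits unfragmented */
    uint32_t per = (pmtu - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;
    return (len + per - 1) / per;
}
```

For example, with a 576-byte path MTU each fragment carries 528 payload octets, so a 2000-octet payload needs four fragments.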


3    Security Processing


Two cryptographic security mechanisms have been defined for IPv6 [Atk95c].
The first, known as the Authentication Header (AH), provides authentication
without confidentiality [Atk95a]. The second, known as the Encapsulating
Security Payload (ESP), provides confidentiality through encryption of
packet contents [Atk95b]. ESP has two modes. The first, known as
Transport-mode, encrypts only the upper-layer header and data (such as TCP,
UDP, or ICMP) and leaves the IP header in the clear. The second, known as
Tunnel-mode, encrypts an entire IP datagram and prepends an additional
cleartext IP header outside the encrypted datagram so that the packet can be
routed. The implementation of these mechanisms broke new ground within the
BSD kernel. In addition to implementing the Authentication Header and both
modes of ESP, we also implemented the kernel support required to manage
network security associations, including the cryptographic keys.

   The IPv6 security mechanisms can use any appropriate encryption or
authentication algorithm. The mandatory algorithms for a compliant
implementation are keyed MD5 [MKS95b] for authentication and DES-CBC
[MKS95a] for encryption; both are included in this implementation. To add a
new ESP or AH algorithm, the kernel must be recompiled with support for the
new algorithm in place. Other algorithms, such as triple-DES, are being
implemented by others. Later in this paper, we discuss why it is
straightforward to add support for additional cryptographic algorithms.

____________________________

2 The cryptographic security recently standardised for IPv4 and IPv6 was
  originally designed for use with IPv6 and later adapted for use with IPv4.

   Both ESP Transport-mode and Authentication Header processing are normally
performed immediately before any fragmentation of outgoing packets and
immediately after reassembly of incoming packets. They are done this way
because, apart from fragmentation, they need to operate on the packet as it
will appear on the wire. For example, the source address of a packet from a
multi-homed system must be known before encryption or authentication can
take place.


3.1   Security Associations

A fundamental concept behind IP security is the Security Association. A
Security Association contains all of the configuration data for a particular
secure session between two or more systems communicating via IP. For
example, the security services in use (AH or ESP), the cryptographic
algorithm(s) in use, the cryptographic key(s) in use, the key lifetimes, the
Security Parameters Index (SPI), and the sensitivity level (e.g.,
Unclassified, Secret) of the session are all components of a Security
Association. To support multicast as well as unicast, all Security
Associations are one-way, from source to destination, so a typical telnet
session needs two Security Associations, one in each direction.
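One way to picture a Security Association is as a plain C record of the components listed above. The layout, field names, and helper below are hypothetical; the helper shows why the telnet example needs two entries, one per direction, each with its own SPI.

```c
#include <stdint.h>
#include <string.h>

enum sa_service { SA_AH = 1, SA_ESP = 2 };

/* Hypothetical layout of one unidirectional Security Association. */
struct security_assoc {
    uint32_t spi;              /* Security Parameters Index */
    uint8_t  src[16], dst[16]; /* one direction only: src -> dst */
    int      service;          /* SA_AH or SA_ESP */
    int      algorithm;        /* index into an algorithm switch */
    uint8_t  key[32];
    uint8_t  key_len;
    uint32_t lifetime_secs;    /* key lifetime */
    int      sensitivity;      /* e.g. 0 = Unclassified, higher = more sensitive */
};

/* Build the pair of SAs for a session between a and b protected with AH:
 * out[0] covers a -> b and out[1] covers b -> a. */
void make_session_pair(const uint8_t a[16], const uint8_t b[16],
                       uint32_t spi_ab, uint32_t spi_ba,
                       struct security_assoc out[2])
{
    memset(out, 0, 2 * sizeof out[0]);
    out[0].spi = spi_ab;
    memcpy(out[0].src, a, 16);
    memcpy(out[0].dst, b, 16);
    out[1].spi = spi_ba;
    memcpy(out[1].src, b, 16);
    memcpy(out[1].dst, a, 16);
    out[0].service = out[1].service = SA_AH;
}
```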

   Security associations are stored in a table inside the kernel. A module
called the Key Engine controls access to the table. The Key Engine allows
kernel services, such as the IPv6 module, to obtain security associations
for inbound and outbound packets. The Key Engine also communicates with
user-level key management programs so that key management can be carried
out by those programs. The relationship between the Key Engine and
user-level key management programs is similar to the relationship between
the routing socket [Skl91] and programs such as gated(8).


3.2   Security Processing Structure

The authentication processing function is split into three major parts. The
first, a keyed message digest function, is selected on a per-association
basis through an algorithm switch that calls the appropriate computation
function. The second, the header processing routines, finds the appropriate
security association and policy actions for the packet and either builds or
parses the actual authentication header. The third part is the core of the
authentication function. This routine walks the packet, header by header,
zeroing header fields that vary unpredictably end-to-end and passing the
other header fields and the packet data into the keyed message digest
function. The resulting message digest is either inserted into the outgoing
header or, for an incoming packet, compared with the one in the header. The
AH calculation function treats the keyed message digest functions as stream
operations; any necessary blocking and padding must be handled by the
implementation of the keyed message digest functions.
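The three-part structure can be sketched with a small algorithm switch of stream-style init/update/final functions. The digest here is a TOY FNV-1a hash standing in for keyed MD5 and offers no security; all names are hypothetical, and only the Hop Limit is zeroed as a mutable field for brevity, whereas the AH specification defines the full set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stream-style interface every keyed digest in the switch provides. */
struct digest_alg {
    const char *name;
    void     (*init)(uint64_t *st, const uint8_t *key, size_t klen);
    void     (*update)(uint64_t *st, const uint8_t *p, size_t n);
    uint64_t (*final)(uint64_t *st);
};

/* TOY keyed hash: FNV-1a over key || data.  NOT cryptographically secure;
 * it only stands in for keyed MD5 to show the switch mechanics. */
static void toy_update(uint64_t *st, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *st ^= p[i];
        *st *= 1099511628211ULL;            /* FNV 64-bit prime */
    }
}
static void toy_init(uint64_t *st, const uint8_t *key, size_t klen)
{
    *st = 14695981039346656037ULL;          /* FNV 64-bit offset basis */
    toy_update(st, key, klen);
}
static uint64_t toy_final(uint64_t *st) { return *st; }

/* The algorithm switch; a Security Association would store the index. */
static const struct digest_alg alg_switch[] = {
    { "toy-fnv", toy_init, toy_update, toy_final },
};

/* Digest an IPv6 packet.  The base header is copied and its Hop Limit
 * (changed by every router) zeroed before being fed to the digest; the
 * rest of the packet is passed through unchanged.  `alg` must be valid. */
uint64_t ah_digest(int alg, const uint8_t *key, size_t klen,
                   const uint8_t *pkt, size_t len)
{
    const struct digest_alg *a = &alg_switch[alg];
    uint8_t hdr[40];
    uint64_t st;

    if (len < sizeof hdr)
        return 0;
    memcpy(hdr, pkt, sizeof hdr);
    hdr[7] = 0;                             /* mutable in transit */
    a->init(&st, key, klen);
    a->update(&st, hdr, sizeof hdr);
    a->update(&st, pkt + sizeof hdr, len - sizeof hdr);
    return a->final(&st);
}
```

Because the digest is consumed as a stream, any internal blocking stays inside the digest implementation, as described above.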

   The encryption processing function is split into similar parts. The
first, an encryption/decryption function, and the second, a transform header
construction and parsing function, are both selected on a per-association
basis through an algorithm switch. Because almost all of the header format
can vary depending on which cryptographic transform is being used, both the
cryptographic functions and the header processing functions must be
switchable. A generic reblocking function runs a specified encryption or
decryption function over the data while arranging it into properly sized
blocks; this is necessary because block-oriented encryption and decryption
functions require the data to be an integral number of cryptographic blocks.
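A sketch of such a reblocking function, assuming the packet is represented as an array of (pointer, length) segments loosely modelled on an mbuf chain. It gathers each 8-octet block across segment boundaries, runs the supplied block function, and scatters the result back in place. The XOR "cipher" is a toy placeholder; real DES-CBC would also keep chaining state in ctx.

```c
#include <stddef.h>
#include <stdint.h>

#define BLK 8   /* block size of a 64-bit block cipher such as DES */

typedef void (*block_fn)(uint8_t blk[BLK], const void *ctx);

/* A packet as a chain of buffers, loosely like an mbuf chain. */
struct seg { uint8_t *data; size_t len; };

/* Run `fn` over every BLK-octet block of the chain, gathering each block
 * across segment boundaries and scattering the result back in place.
 * Returns -1 (and changes nothing) if the total is not a block multiple. */
int reblock(struct seg *segs, int nseg, block_fn fn, const void *ctx)
{
    size_t total = 0;
    for (int i = 0; i < nseg; i++)
        total += segs[i].len;
    if (total % BLK)
        return -1;

    uint8_t blk[BLK];
    uint8_t *where[BLK];          /* origin of each octet in blk */
    int fill = 0;
    for (int i = 0; i < nseg; i++) {
        for (size_t j = 0; j < segs[i].len; j++) {
            where[fill] = &segs[i].data[j];
            blk[fill++] = segs[i].data[j];
            if (fill == BLK) {    /* full block: transform and scatter */
                fn(blk, ctx);
                for (int k = 0; k < BLK; k++)
                    *where[k] = blk[k];
                fill = 0;
            }
        }
    }
    return 0;
}

/* Octets of padding needed to reach a whole number of blocks. */
size_t pad_len(size_t n) { return (BLK - n % BLK) % BLK; }

/* TOY block transform (XOR with an 8-octet key), for demonstration only. */
void xor_block(uint8_t blk[BLK], const void *ctx)
{
    const uint8_t *key = ctx;
    for (int k = 0; k < BLK; k++)
        blk[k] ^= key[k];
}
```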


3.3   Output Security Processing

Immediately before IP fragmentation is performed, ipv6_output() calls an IP
security output policy function, ipsec_output_policy(), to determine whether
this packet needs security.  This function examines the system security
level configured by the administrator and the socket security level
requested by the process on the socket. The function is able to examine the
socket security level because each outgoing packet data chain now contains a
back pointer to the socket that sent the packet. The security output policy
function then examines the system-wide security policy and the
socket-requested security policy and applies the more paranoid of these
policies to the outgoing packet.
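Combining the two policies can be as simple as taking, for each service, the stricter of the two settings. The level names and struct below are hypothetical, not the implementation's own.

```c
/* Hypothetical security levels, ordered from least to most paranoid. */
enum sec_level { LVL_NONE = 0, LVL_IF_AVAILABLE = 1, LVL_REQUIRE = 2 };

/* Per-service levels for authentication and the two ESP modes. */
struct sec_policy {
    enum sec_level auth;
    enum sec_level esp_transport;
    enum sec_level esp_tunnel;
};

static enum sec_level stricter(enum sec_level a, enum sec_level b)
{
    return a > b ? a : b;
}

/* Merge system-wide and socket-requested policy service by service,
 * keeping the more paranoid setting each time. */
struct sec_policy output_policy(struct sec_policy sys, struct sec_policy sock)
{
    struct sec_policy p;
    p.auth          = stricter(sys.auth, sock.auth);
    p.esp_transport = stricter(sys.esp_transport, sock.esp_transport);
    p.esp_tunnel    = stricter(sys.esp_tunnel, sock.esp_tunnel);
    return p;
}
```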

   The ipsec_output_policy() function is also responsible for making the
getassocbysocket() call into the Key Engine to obtain Security Association
data for the outgoing packet. If the Key Engine has the appropriate Security
Associations, it provides access to them. If no appropriate Security
Association exists and a key management daemon is running, then the Key
Engine sends a Request message to that daemon and informs the output policy
function that the Security Association has been delayed. If no appropriate
Security Association exists and no key management daemon is running, then
the Key Engine returns an error to ipsec_output_policy(). If this error
occurs, it will eventually be presented to the user as the newly defined
IP Security processing error, EIPSEC.
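The three outcomes of the outbound lookup can be sketched as follows. The engine state, the function names, and the numeric value given to EIPSEC are all assumptions made for illustration; the text does not say what happens to a packet while its Security Association is delayed.

```c
/* Possible results of an outbound Security Association lookup. */
enum sa_result { SA_FOUND, SA_DELAYED, SA_ERROR };

/* Hypothetical Key Engine state for the sketch. */
struct key_engine {
    int have_sa;          /* matching SA already in the kernel table? */
    int daemon_running;   /* key management daemon registered? */
    int requests_sent;    /* Request messages sent to the daemon */
};

/* Mirrors the three cases described in the text. */
enum sa_result key_lookup_outbound(struct key_engine *ke)
{
    if (ke->have_sa)
        return SA_FOUND;            /* SA available now */
    if (ke->daemon_running) {
        ke->requests_sent++;        /* ask the daemon to create the SA */
        return SA_DELAYED;
    }
    return SA_ERROR;                /* nobody can supply keys */
}

/* Numeric value is arbitrary; the text only names the error. */
#define EIPSEC 90

/* Error reported to the user, or 0 if none. */
int output_error(enum sa_result r)
{
    return r == SA_ERROR ? EIPSEC : 0;
}
```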

   If IP security is needed and all appropriate security information is
available for the outgoing packet, then the output security policy function
will return both an indication of which services are needed and pointers to
the appropriate Security Associations. The IP Output function then makes the
appropriate calls to apply outgoing security services and then sends the
packet out. If any errors occur during security output processing, the
packet will be dropped and the user will be given the EIPSEC error mentioned
above. In the future, we might enhance the getassocbysocket() call to
provide the user identification or uid associated with the network socket so
that the Key Engine can provide finer granularity of keying.  The current
implementation does support both shared (i.e. host-oriented) keys and also
unique (i.e. socket-oriented) keys.




3.4   Input Security Processing

For incoming packets, the task is significantly easier. When an
Authentication Header or Encapsulating Security Payload header is
encountered, it is processed by calling the appropriate IP security input
function (either ipsec_ah_input() or ipsec_esp_input()). That function
reads the Security Parameters Index (SPI) contained in the cleartext portion
of the received packet and makes a getassocbyspi() call into the Key Engine
to obtain the correct Security Association for the received packet. If this
call succeeds, the security input processing is performed and the
appropriate security-related flag is set. The packet data chain has two new
flags, both initially cleared on input, called M_AUTHENTIC and M_DECRYPTED.
These flags indicate that the packet passed authentication processing and
decryption processing, respectively. If any security input processing
fails, the packet is dropped and the appropriate kernel statistics counters
are incremented. A modified netstat(8) is supplied that can display these
statistics for the system administrator. If more than one form of security
has been applied, the packet will go through more than one security input
processing function.
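A sketch of the input path, assuming a linear SA table in place of the Key Engine and a hypothetical check_ok field standing in for the actual digest verification or decryption. The flag names M_AUTHENTIC and M_DECRYPTED come from the text; their values, and all other names, are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define M_AUTHENTIC 0x1     /* names from the text; values assumed */
#define M_DECRYPTED 0x2

struct sa_entry { uint32_t spi; int is_esp; };

/* Minimal stand-in for a received packet. */
struct pkt { int flags; uint32_t spi; int check_ok; };

/* Linear search standing in for the Key Engine's getassocbyspi(). */
static const struct sa_entry *lookup_spi(const struct sa_entry *t, size_t n,
                                         uint32_t spi)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].spi == spi)
            return &t[i];
    return NULL;
}

/* Process one AH or ESP header.  On success, set the matching flag and
 * return 0; otherwise count the drop and return -1. */
int ipsec_input(struct pkt *m, const struct sa_entry *t, size_t n,
                unsigned long *stat_drops)
{
    const struct sa_entry *sa = lookup_spi(t, n, m->spi);
    if (sa == NULL || !m->check_ok) {   /* unknown SPI or failed check */
        (*stat_drops)++;
        return -1;
    }
    m->flags |= sa->is_esp ? M_DECRYPTED : M_AUTHENTIC;
    return 0;
}
```

A packet carrying both AH and ESP passes through the function twice and ends up with both flags set.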

   The input security processing code also performs special checks comparing
the outer IP source address with the (previously encrypted) inner IP source
address when an IP datagram is tunnelled inside another IP datagram and
either the Authentication Header or the Encapsulating Security Payload is
present. These checks are intended to prevent an adversary from
encapsulating a forged packet inside an authenticated or encrypted
legitimate packet and tricking the receiving system into believing that the
forged packet is authentic. If these source address checks fail, any
M_AUTHENTIC or M_DECRYPTED flag on the received packet data chain is
cleared.
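The tunnel check can be sketched as a comparison that strips the security flags on a mismatch. This assumes the simplest rule, that inner and outer source addresses must be identical; the implementation's actual checks may be more elaborate, and the flag values are assumed.

```c
#include <stdint.h>
#include <string.h>

#define M_AUTHENTIC 0x1   /* names from the text; values assumed */
#define M_DECRYPTED 0x2

/* Return the packet flags after the check: if the inner (decapsulated)
 * source differs from the outer source, clear the security flags and
 * leave every other flag untouched. */
int tunnel_src_check(const uint8_t outer_src[16], const uint8_t inner_src[16],
                     int flags)
{
    if (memcmp(outer_src, inner_src, 16) != 0)
        flags &= ~(M_AUTHENTIC | M_DECRYPTED);
    return flags;
}
```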

   After security input processing is completed, normal input processing
resumes. Once the packet reaches the transport layer, the transport layer's
input function (for example, tcp_input()) calls ipsec_input_policy() to
perform an input security policy check. The incoming packet is dropped if it
does not meet the authentication or encryption requirements of its
destination socket. Because ipsec_input_policy() checks not only the socket
security requirements but also the system-wide security requirements, the
system administrator can mandate a minimum security level for all normal
network connections.
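A sketch of the input policy check, expressing each requirement as the set of flags a packet must carry; the function name and encoding are hypothetical. Because the system and socket requirements are OR-ed together, the administrator's minimum always applies.

```c
#define M_AUTHENTIC 0x1   /* names from the text; values assumed */
#define M_DECRYPTED 0x2

/* sys_need and sock_need are the flags required by the administrator and
 * by the destination socket.  Returns 1 to accept, 0 to drop. */
int input_policy_ok(int pkt_flags, int sys_need, int sock_need)
{
    int need = sys_need | sock_need;    /* both requirements must hold */
    return (pkt_flags & need) == need;
}
```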


3.5   Policy Separation

The separation of the policy engine from the mechanisms allows per-socket
security selections and administrative security selections to be combined in
sophisticated ways. For instance, to protect the system from potential
abuses, an administrator could require that packets arriving on a certain
range of privileged ports come from a privileged port and be authentic. The
current policy engine implements only simple system-wide decisions (e.g.,
drop all non-authentic packets, or always use authentication if a security
association exists that will facilitate it) in conjunction with
application-requested socket security. Enhancements to the security policy
engine are planned.


3.6   Algorithm-independence

Care was taken to provide multiple levels of indirection to take advantage
of the algorithm-independent nature of the Authentication Header and
Encapsulating Security Payload (ESP) specifications. Both implementations
use an algorithm switch, indexed by a value in the security association, to
support multiple algorithms concurrently and to allow easy addition of new
message digest and encryption functions. This switch is more complex for
ESP, because almost all of the ESP header format can change as a function of
the transform in use. For ESP, the switch allows implementors to specify the
header processing code and the encryption code separately for greater
flexibility. For instance, someone wanting to substitute the IDEA algorithm
[LM91] for the default DES-CBC algorithm [MKS95a] while keeping the same
basic header format could create a new algorithm switch entry that uses the
same header processing functions as DES-CBC but calls the IDEA encryption
functions instead. Although different algorithms have different performance
costs, supporting multiple algorithms in the kernel does not itself exact a
significant performance penalty.
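The IDEA example can be sketched as a switch whose entries pair a header builder with a cipher. Everything here is hypothetical: the header layout (a 4-octet SPI followed by an 8-octet IV) is assumed for illustration, and the "ciphers" are one-line stubs rather than DES or IDEA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header builder and cipher occupy separate slots, so a new cipher can
 * reuse an existing header format. */
struct esp_xform {
    const char *name;
    size_t (*build_hdr)(uint8_t *out, uint32_t spi, const uint8_t iv[8]);
    void   (*encrypt_block)(uint8_t blk[8], const uint8_t *key);
};

/* Assumed header layout for this sketch: 4-octet SPI, then 8-octet IV. */
static size_t spi_iv_hdr(uint8_t *out, uint32_t spi, const uint8_t iv[8])
{
    out[0] = (uint8_t)(spi >> 24); out[1] = (uint8_t)(spi >> 16);
    out[2] = (uint8_t)(spi >> 8);  out[3] = (uint8_t)spi;
    memcpy(out + 4, iv, 8);
    return 12;
}

/* Placeholder ciphers; real DES and IDEA code would go here. */
static void des_stub(uint8_t blk[8], const uint8_t *key)  { blk[0] ^= key[0]; }
static void idea_stub(uint8_t blk[8], const uint8_t *key) { blk[0] += key[0]; }

static const struct esp_xform esp_switch[] = {
    { "des-cbc",  spi_iv_hdr, des_stub  },
    { "idea-cbc", spi_iv_hdr, idea_stub },  /* same header, new cipher */
};

/* The Security Association stores only an index into the switch. */
const struct esp_xform *esp_lookup(int alg_index)
{
    int n = (int)(sizeof esp_switch / sizeof esp_switch[0]);
    return (alg_index >= 0 && alg_index < n) ? &esp_switch[alg_index] : NULL;
}
```

Adding the IDEA entry touched no header code, which is the flexibility the separate slots are meant to provide.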


4    Changes to ICMP and IGMP


The Internet Control Message Protocol (ICMP) is perhaps not as widely known
as TCP or UDP, but it performs a critical function in keeping the network
operating smoothly. The Internet Group Management Protocol (IGMP) is
integral to IP multicasting. ICMP for IPv6 is sufficiently different from
ICMP for IPv4 that it is now sometimes referred to as ICMPv6 [Pos81][DC95].

   Despite having similar header syntax, ICMPv6 differs from ICMP for IPv4
in four major ways. First, ICMPv6, like TCP and UDP, includes a
pseudo-header in its checksum calculation. Second, the distinction between
informational messages (e.g., Echo) and error messages (e.g., Port
Unreachable) is now indicated by the high-order bit of the ICMPv6 message
type. Third, ICMPv6 absorbs the functions of the formerly separate IGMP
[Dee89], ARP [Plu82][FMMT84], Proxy ARP, and ICMP Router Discovery [Dee91]
protocols. Finally, ICMPv6 adds support for stateless address
auto-configuration. Because ICMP runs above the IP layer, all of these
functions can now be authenticated and/or encrypted using the IP security
mechanisms, as long as appropriate security associations exist. Sites that
wish to bootstrap securely can now do so.
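Both the pseudo-header checksum and the type-bit test are small enough to sketch. This assumes the pseudo-header layout later fixed in RFC 1883 (source address, destination address, 32-bit upper-layer length, three zero octets, Next Header value 58); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NH_ICMPV6 58

/* Add a buffer's 16-bit big-endian words to a running sum. */
static uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)(p[len - 1] << 8);
    return sum;
}

/* Checksum over the pseudo-header plus the ICMPv6 message.  When sending,
 * the message's checksum field (octets 2-3) must be zero; on receipt, a
 * correct message yields 0. */
uint16_t icmp6_cksum(const uint8_t src[16], const uint8_t dst[16],
                     const uint8_t *msg, uint32_t len)
{
    uint8_t tail[8] = {
        (uint8_t)(len >> 24), (uint8_t)(len >> 16),
        (uint8_t)(len >> 8),  (uint8_t)len,
        0, 0, 0, NH_ICMPV6
    };
    uint32_t sum = 0;
    sum = sum16(sum, src, 16);
    sum = sum16(sum, dst, 16);
    sum = sum16(sum, tail, 8);
    sum = sum16(sum, msg, len);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* High-order type bit clear = error message; set = informational. */
int icmp6_is_error(uint8_t type) { return (type & 0x80) == 0; }
```

An Echo Request (type 128) has the high bit set and is informational; Destination Unreachable (type 1) is an error.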


4.1   Traditional ICMP and IGMP

ICMPv6 retains the functions traditionally performed by ICMP and IGMP. The
Echo and Echo-Reply messages, used by ping(8), are still part of ICMPv6.
Unreachability of various forms is indicated by the ICMPv6 Unreachable
message type. Extensions have been added to indicate unreachable on-link
neighbors, as well as errors with strict source routing. A Message Too Big
message indicates that an IPv6 datagram is too large for a link on its path.
Path MTU discovery [MD90], a requirement for IPv6, is implemented using
these messages. Parameter Problem messages indicate invalid IPv6 option
fields, as they do in IPv4's ICMP. Time Exceeded messages indicate either
that a hop limit has decremented to zero or that an IPv6 reassembly has
timed out.3

   ICMPv6 has three additional informational messages: Group Report, Group
Query, and Group Terminate. The first two behave just like the IGMP Report
and Query messages. The Group Terminate message is an optimization so that
routers can be informed more quickly about hosts leaving multicast groups.


4.2   Address Auto-Configuration and Router Discovery

The Internet community mandated that IPv6 support simple address
auto-configuration for hosts. IPv6 has two solutions to this problem. The
first is to use an optional configuration protocol, such as DHCPv6, which is
beyond the scope of this paper. The second, known as stateless address
auto-configuration, is required and is implemented in ICMPv6 [TN95].


4.2.1  Link-local Addresses

When an interface is configured for IPv6, it must have a link-local
address. A link-local address is formed by placing the link-local prefix
fe80:: in front of a token, usually the interface's MAC address. In our
implementation, this is done by ifconfig(8), which places this address on an
interface before any other addresses are placed on the same interface.
Implementations must be able to detect whether their link-local address has
been duplicated on the same link (e.g., Ethernet) [NNS95]. Our planned
approach to this collision detection is discussed in the Neighbor Discovery
section. Once the link-local address is verified as unique on a link, the
first phase of stateless address auto-configuration is complete. The IPv6
node can then send out ICMPv6 Router Solicit messages to locate a router and
begin the second phase of address auto-configuration.
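A sketch of link-local address formation as described, with the 48-bit MAC address used directly as the token in the low-order octets. This follows the description above; later specifications expand the MAC into a 64-bit interface identifier instead. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build fe80:: followed by the MAC address in the last six octets. */
void make_linklocal(uint8_t addr[16], const uint8_t mac[6])
{
    memset(addr, 0, 16);
    addr[0] = 0xfe;               /* link-local prefix fe80:: */
    addr[1] = 0x80;
    memcpy(addr + 10, mac, 6);    /* token in the low-order 48 bits */
}
```

For the MAC address 08:00:20:12:34:56 this yields fe80::800:2012:3456.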


4.2.2  Router Discovery

IPv6 routers send out periodic Router Advertisement messages to the
all-nodes multicast address, and also send Router Advertisement messages in
response to Router Solicit messages. Besides performing the traditional jobs
of IPv4 router advertisements, IPv6 router advertisements also advertise
parameters relating to Neighbor Discovery: suggested MTUs on variable-MTU
links, suggested maximum hop limits, and on-link prefixes.

____________________________

3 This implementation cannot send Time Exceeded messages for IPv6 reassembly
  timeouts; the "offending packet" needed for the ICMPv6 message is no
  longer available for transmission because reassembly is occurring.

   It is the advertisement of on-link prefixes that completes stateless
address auto-configuration. If the Router Advertisement message indicates
that stateless configuration is to be performed, the message will also
contain the globally routable address prefix used on the link. The node then
takes the token from its link-local address and prepends the advertised
prefix to form an automatically configured, globally routable address. The
internal code that handles such advertisements also handles manual address
configuration requests from programs such as ifconfig(8).
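Prepending the advertised prefix to the token amounts to a bitwise merge: the high-order plen bits come from the prefix and the rest from the link-local address. A sketch with a hypothetical function name, handling prefix lengths that are not octet-aligned as well.

```c
#include <stdint.h>

/* out = high-order `plen` bits of prefix | remaining bits of linklocal. */
void make_global(uint8_t out[16], const uint8_t prefix[16], int plen,
                 const uint8_t linklocal[16])
{
    for (int i = 0; i < 16; i++) {
        int bits = plen - 8 * i;            /* prefix bits left for octet i */
        uint8_t mask = bits >= 8 ? 0xff
                     : bits <= 0 ? 0x00
                     : (uint8_t)(0xff << (8 - bits));
        out[i] = (uint8_t)((prefix[i] & mask) | (linklocal[i] & ~mask));
    }
}
```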

   Unlike IPv4 addresses, IPv6 addresses can have lifetimes. In concert with
stateless address auto-configuration, lifetimes provide a way for relatively
rapid IPv6 address renumbering to occur. Provider-oriented addressing is one
of the address schemes that will be used with IPv6 [RLH+95]. With
provider-oriented addressing, the ability to rapidly renumber many systems
at a site is essential if that site ever wants to change network service
providers. Hence, IPv6 interface addresses in the kernel now contain
lifetime fields.
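A sketch of an interface address carrying a lifetime. The structure, the use of absolute expiry times, and the convention that 0 means "never expires" are assumptions made for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical kernel interface-address record with a lifetime. */
struct if6_addr {
    uint8_t addr[16];
    time_t  expires;        /* absolute expiry time; 0 = never expires */
};

/* Refresh the lifetime from an advertisement carrying `secs` seconds. */
void addr_set_lifetime(struct if6_addr *a, time_t now, uint32_t secs)
{
    a->expires = now + (time_t)secs;
}

/* An address remains usable until its lifetime runs out. */
int addr_valid(const struct if6_addr *a, time_t now)
{
    return a->expires == 0 || now < a->expires;
}
```

Renumbering then amounts to advertising a short lifetime for the old prefix, letting addresses under it expire, while addresses under the new prefix are configured alongside them.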


4.3   Neighbor Discovery

IPv6 does not use ARP.4 Instead, IPv6 uses multicasting and ICMPv6 to
discover the addresses of on-link neighbors [NNS95]. Our implementation uses
host routes for on-link neighbors and keeps link-layer information inside
the route, much as 4.4 BSD implements ARP entries. As with ARP, IPv6
neighbor discovery has the route's gateway address point to a data-link
socket address, for example an Ethernet MAC address.

   IPv6 Neighbor Discovery is responsible for finding the link address
information for the host route entries. If an IPv6 destination is
determined to be on-link, either by matching an on-link prefix (represented
as a cloning network route, as in IPv4) or by determining that there is no
other way to reach the destination, a Neighbor Solicit is sent to a special
multicast address, formed by prepending the multicast prefix ff02::1: to the
low 32 bits of the solicited neighbor's address. All nodes automatically
join the Solicited Nodes multicast group appropriate for their own
addresses. Broadcast does not exist in IPv6; multicast replaces all uses for
broadcast.5 When the solicited node hears a Neighbor Solicit, it has enough
information to send a unicast Neighbor Advertisement to the solicitor, and
once that advertisement arrives, the soliciting node knows that the neighbor
is reachable. Although the solicited node has enough information to return
the unicast Neighbor Advertisement, reachability in the opposite direction
is not yet confirmed. After initial reachability is established, unicast
solicit and advertisement messages confirm the continued reachability of the
neighbor. Upper-level protocols (e.g., TCP) can also be used to provide
reachability confirmation.6

____________________________

  4 Hence, ARP-related broadcast storm problems will not be present with IPv6.
  5 Hence, broadcast storms will not exist with IPv6.
  6 We are still experimenting with the best way for TCP to update
    reachability without impairing performance.
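The solicited-node group described in the Neighbor Discovery paragraph can be computed as follows. This follows the text's form, ff02::1: plus the low 32 bits of the target address; later specifications changed it to ff02::1:ff00:0/104 with only the low 24 bits. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* group = ff02:0:0:0:0:1 followed by the low 32 bits of target. */
void solicited_node(uint8_t group[16], const uint8_t target[16])
{
    memset(group, 0, 16);
    group[0] = 0xff;                   /* ff02: link-local scope multicast */
    group[1] = 0x02;
    group[11] = 0x01;                  /* sixth 16-bit group is 0001 */
    memcpy(group + 12, target + 12, 4);
}
```

For the target fe80::800:2012:3456 the group is ff02::1:2012:3456.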


   Users can run netstat -r to examine the state of currently reachable and
recently reachable neighbor systems. This neighbor reachability information
is kept as part of the routing table in the kernel, so reachability updates
for one session to a neighbor also refresh reachability for other sessions
to the same neighbor. Neighbors that have become unreachable linger in the
routing table and are eventually marked with the RTF_REJECT flag. This is
similar to the way ARP is handled in 4.4 BSD-Lite.
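A sketch of the shared reachability state kept in a neighbor's host route. The structure and the idle-timeout policy are hypothetical, and NBR_REJECT merely stands in for the RTF_REJECT route flag.

```c
#include <time.h>

#define NBR_REJECT 0x8   /* stand-in for RTF_REJECT; value assumed */

/* Hypothetical per-neighbor host-route state. */
struct nbr_route {
    int    flags;
    time_t confirmed;    /* last time reachability was confirmed */
};

/* A confirmation from any session refreshes the one shared entry. */
void nbr_confirm(struct nbr_route *rt, time_t now)
{
    rt->confirmed = now;
    rt->flags &= ~NBR_REJECT;
}

/* Periodic sweep: an entry unconfirmed for longer than max_idle is
 * marked reject but left in the table. */
void nbr_age(struct nbr_route *rt, time_t now, time_t max_idle)
{
    if (now - rt->confirmed > max_idle)
        rt->flags |= NBR_REJECT;
}
```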

   Neighbor discovery can be used to verify the uniqueness of a link-local
address. After a link-local address is configured, the node sends a
multicast Neighbor Solicit for its proposed link-local address. If no
neighbor responds with a Neighbor Advertisement, the link-local address is
unique on the link. The alpha release does not yet implement this collision
detection, because it is difficult to decide where the functionality should
reside. If it is done in the kernel, a user process may be blocked in the
ioctl(2) call for a long time while collision detection takes place. If it
is done in user space, multiple calls will have to be made into the kernel.


5    Transport Layer Changes


Both the UDP and TCP protocols remain unchanged for IPv6. However, the BSD
implementations required modification to provide concurrent support for IPv4
and IPv6.  The main difficulties arose due to the different sizes of the
IPv4 header and the IPv6 header. Because the TCP and UDP implementations are
shared between IPv4 and IPv6, we designed a modified Protocol Control Block
(PCB) structure that supports both versions of IP. Had the original BSD
implementation of TCP, UDP, and IP not been so closely coupled, it would
have been easier to add IPv6 support into the kernel.


5.1   Protocol Control Block

Since TCP and UDP do not change between IPv4 and IPv6, both protocols use
the modified Protocol Control Block structures (PCBs) in the same way. To
accommodate IPv6's larger address space, the PCBs were modified to hold
either IPv4 or IPv6 addresses and to record which kind is actually in use.
To support both protocols, new unions were devised. To make these changes
invisible to existing code, #defines were added that silently select the
appropriate member of the union. Figure 4 shows an example of a new union
and its corresponding #defines.

   The IPv4-IPv6 transition specification [GN95] makes it easier to support
both protocols in a single PCB by allocating a portion of the IPv6 address
space for use as "IPv4-mapped" addresses, which cannot be used as addresses
in IPv6 datagrams. Additionally, if a session intends to send IPv6
datagrams, a bit in the flags field of the session's PCB is set to indicate
this; if that bit is not set, IPv4 is in use. The route, IP header template,
and multicast options elements now use unions so that either IPv4 or IPv6
can be used with the PCB.


		  union {
			struct route ru_route;
			struct route6 ru_route6;
			} inp_ru;

#define inp_route inp_ru.ru_route
#define inp_route6 inp_ru.ru_route6


             Figure 4: Route union used in new PCB structure
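   The following minimal sketch makes the IPv4-mapped convention and the
PCB flag concrete. It uses a greatly reduced, hypothetical PCB (struct
hyp_pcb) and a hypothetical flag bit, HYP_INP_IPV6, because the text does
not give the real flag name. For illustration only, it also assumes that
the bit is derived from whether the foreign address is IPv4-mapped. The
mapped-address format itself is the standard one: 80 zero bits, 16 one
bits, then the IPv4 address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; the text does not name the real PCB flag. */
#define HYP_INP_IPV6 0x1

/* Hypothetical, heavily reduced PCB holding only what this sketch needs. */
struct hyp_pcb {
    int     flags;
    uint8_t faddr[16];    /* foreign address, always kept in 128-bit form */
};

/* An IPv4-mapped address is 80 zero bits, 16 one bits, then the 32-bit
   IPv4 address (::ffff:a.b.c.d). */
static int is_v4mapped(const uint8_t a[16])
{
    static const uint8_t prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                                        0xff, 0xff };
    return memcmp(a, prefix, sizeof prefix) == 0;
}

/* Store the foreign address and set or clear the "send IPv6" bit:
   a mapped address means the session will actually use IPv4. */
static void hyp_pcb_set_faddr(struct hyp_pcb *pcb, const uint8_t a[16])
{
    memcpy(pcb->faddr, a, 16);
    if (is_v4mapped(a))
        pcb->flags &= ~HYP_INP_IPV6;   /* bit clear: IPv4 is in use */
    else
        pcb->flags |= HYP_INP_IPV6;    /* bit set: send IPv6 datagrams */
}
```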

   New PCB functions were written to support bind, connect, and notify
functions on PF_INET6 sockets. Because such a socket can be used to send and
receive either IPv4 or IPv6 traffic, these functions needed to be separate
from the equivalent IPv4 functions and also needed to handle both versions
of IP. In the near future we intend to enhance these functions to fully
support the IPv6 Flow Identifier field so that real-time and predictive
services can be provided to applications. The in6_pcbnotify() function also
calls the input security policy function to determine whether a particular
error can be passed up to the application or whether doing so would violate
security policy, in which case the error is not delivered.


5.2   Changes in UDP

The UDP protocol remains unchanged for IPv6, but the BSD implementation
needed to be modified to support both versions of IP. The majority of the
changes to the UDP code resulted from the need to support the different
address format. The changes are minimal and are isolated to the following
functions: udp_input(), udp_output(), udp_ctlinput(), and udp_usrreq().
Almost all changes occur in the input and output processing of UDP
datagrams, handled by udp_input() and udp_output(), respectively.

   Incoming UDP datagrams, regardless of whether they are transported over
IPv4 or IPv6, are processed by udp_input(). Where the code needs to access
elements of the IP header, different code paths are executed for IPv4 and
IPv6 datagrams. The function relies on a local variable, which it sets on
entry to the function, to determine which code path to follow. An example
of a code path specific to IPv6 is the processing of an IPv4 packet destined
for an IPv6 socket. The IPv6 BSD Sockets API specification [GTB95] allows
an application to receive both IPv4 and IPv6 datagrams using an IPv6
socket, and code has been added to allow udp_input() to handle this special
case.
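   Both header formats carry the IP version in the high-order four bits of
the first byte. That gives a shared input routine one plausible way to
choose its code path. The following sketch uses the hypothetical names
ip_version() and source_as_ipv6() and works on a raw byte buffer rather
than an mbuf. It shows the special case described above: an IPv4 source
address is presented to a PF_INET6 socket as an IPv4-mapped address. This
is an illustration under those assumptions, not the code in udp_input().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 4 or 6 from the version nibble, or 0 if the packet is empty or
   carries an unknown version. */
static int ip_version(const uint8_t *pkt, size_t len)
{
    if (len < 1)
        return 0;
    switch (pkt[0] >> 4) {
    case 4:  return 4;
    case 6:  return 6;
    default: return 0;
    }
}

/* Hypothetical helper: produce the 128-bit source address a PF_INET6
   socket would see. An IPv4 source is presented as ::ffff:a.b.c.d.
   Returns 0 on success, -1 if the packet is too short or unknown. */
static int source_as_ipv6(const uint8_t *pkt, size_t len, uint8_t out[16])
{
    switch (ip_version(pkt, len)) {
    case 4:
        if (len < 20)
            return -1;
        memset(out, 0, 10);              /* 80 zero bits */
        out[10] = 0xff;                  /* 16 one bits */
        out[11] = 0xff;
        memcpy(out + 12, pkt + 12, 4);   /* IPv4 source address field */
        return 0;
    case 6:
        if (len < 40)
            return -1;
        memcpy(out, pkt + 8, 16);        /* IPv6 source address field */
        return 0;
    default:
        return -1;
    }
}
```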

   The udp_input() function now calls the input security policy function
before processing an incoming packet. This ensures compliance with both
socket and system security requirements. If an incoming packet should not
be delivered for security policy reasons, then it is silently dropped. This
check exacts a performance penalty on each received packet, but we have not
yet found a better way to handle input security policy checks.

   The function udp_output() is called to create and send a UDP datagram.
It determines whether to create an IPv4 or IPv6 datagram by looking at the
protocol control block for the socket originating the datagram. If the
socket's protocol family is PF_INET6 and the socket's PCB indicates that the
destination is a native IPv6 address, an IPv6 UDP datagram is composed and
sent down to the IP layer via the ipv6_output() function. If the protocol
family is PF_INET, ip_output() is called instead of ipv6_output(). A
significant change in udp_output() from its IPv4 version involves the
calculation of the UDP checksum. In IPv4, calculation of the UDP checksum is
optional and is controlled by the global variable udpcksum. Because IPv6 has
no IP-layer header checksum, the UDP checksum is mandatory and must be
calculated for all IPv6 UDP packets. The UDP checksum covers the source and
destination addresses. Making it mandatory therefore provides the integrity
protection for those addresses that the IPv6 header itself no longer
provides.
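   The UDP checksum protects the addresses because it is computed over a
pseudo-header that contains them. The following sketch follows the IPv6
pseudo-header layout published in RFC 1883: source address, destination
address, 32-bit upper-layer length, 24 zero bits, and the next-header value
17 for UDP. The June 1995 draft cited here as [DH95] may differ in detail.
The function names are hypothetical, and the code operates on flat byte
buffers rather than mbuf chains.

```c
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a 32-bit accumulator as big-endian 16-bit words; an odd
   trailing byte is padded on the right with zero. */
static uint32_t sum16(uint32_t acc, const uint8_t *p, size_t len)
{
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)p[0] << 8;
    return acc;
}

/* One's-complement sum of the IPv6 pseudo-header plus the UDP datagram,
   folded to 16 bits. */
static uint16_t udp6_sum(const uint8_t src[16], const uint8_t dst[16],
                         const uint8_t *udp, uint32_t udp_len)
{
    uint32_t acc = 0;
    acc = sum16(acc, src, 16);        /* source address */
    acc = sum16(acc, dst, 16);        /* destination address */
    acc += udp_len >> 16;             /* 32-bit upper-layer length */
    acc += udp_len & 0xffff;
    acc += 17;                        /* 24 zero bits, next header = UDP (17) */
    acc = sum16(acc, udp, udp_len);   /* UDP header and data */
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Checksum to place in the UDP header; the checksum field (bytes 6-7) must
   be zero while computing it. A result of zero is sent as 0xffff, because
   zero would mean "no checksum", which IPv6 does not permit. */
static uint16_t udp6_cksum(const uint8_t src[16], const uint8_t dst[16],
                           const uint8_t *udp, uint32_t udp_len)
{
    uint16_t c = (uint16_t)~udp6_sum(src, dst, udp, udp_len);
    return c == 0 ? 0xffff : c;
}

/* Receiver check: with the checksum in place, the folded sum is 0xffff. */
static int udp6_cksum_ok(const uint8_t src[16], const uint8_t dst[16],
                         const uint8_t *udp, uint32_t udp_len)
{
    return udp6_sum(src, dst, udp, udp_len) == 0xffff;
}
```

A receiver can run udp6_cksum_ok() over the datagram as received. When a
correct checksum is in place, the one's-complement sum of everything is
0xffff.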

   The remaining changes in udp_ctlinput() and udp_usrreq() are minor
changes to call IPv6 versions of certain IPv4 functions or to initialize
IPv6 specific variables in the protocol control block.  Overall, the
modifications of UDP code to work with both IPv4 and IPv6 are
straightforward.


5.3   Changes in TCP

The TCP protocol also remains unchanged for IPv6, but its BSD implementation
was modified to support both versions of IP.

   One change was to add a new member, pf, to the TCP control block
structure, struct tcpcb. This new member stores the Protocol Family, either
PF_INET for IPv4 or PF_INET6 for IPv6, in use for each TCP session.  This is
used in several parts of the TCP code to help select the correct IP-specific
code branch.

   The beginning of the tcp_input() function has a small amount of
IP-related processing. This was broken into two code paths, one for IPv4 and
one for IPv6, at the cost of an extra if test and a slight increase in code
size.

   The main difficulty with the 4.4 BSD-Lite TCP implementation was its
reliance on a single pointer, struct tcpiphdr *ti, that pointed to a
structure containing both the IPv4 overlay header (Figure 5) and the TCP
header of received segments. The tcp_input() and tcp_reass() functions used
this combined structure for most of the data references relating to a given
TCP segment, and there were also other uses of this structure within the TCP
implementation. Because the IPv4 and IPv6 headers differ in size, the TCP
header starts at a different offset from the start of the structure,
depending on which IP header is present. The solution to this problem was to
create a new pointer, struct tcphdr *th, which is calculated separately for
IPv4 and IPv6 but always points to the TCP header. References to TCP header
data that had previously used *ti now use *th instead.


             Figure 5: Format of struct ipovly IPv4 Overlay


             Figure 6: Format of struct ipv6ovly IPv6 Overlay


   However, use of the *th pointer did not solve all of the problems. The
older struct tcpiphdr contains an element, ti->ti_len, that holds the
packet's length. There is no room to store such a field in the struct
tcpipv6hdr, which uses a struct ipv6ovly (Figure 6). Fortunately, an
existing local variable in tcp_input(), tlen, is used instead. Most of the
references to IP data elements are made at the very beginning of the
tcp_input() function and so were easily handled.
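   The separate calculation of th, and the use of tlen, can be sketched on a
raw byte buffer. In the following minimal sketch, tcp_locate() is a
hypothetical helper. It returns the offset at which the TCP header begins
(the kernel would add this offset to the start of the packet to obtain th)
and fills in tlen from the IP length field. The sketch assumes that the
IPv6 packet carries no extension headers, which a real implementation must
also handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Locate the TCP header in a packet that starts with either an IPv4 or an
   IPv6 header, and compute tlen (TCP header plus data) from the IP length
   field. Returns the byte offset of the TCP header, or -1 on error.
   Assumes IPv6 packets carry no extension headers. */
static long tcp_locate(const uint8_t *pkt, size_t pktlen, uint32_t *tlen)
{
    if (pktlen < 1)
        return -1;
    switch (pkt[0] >> 4) {                          /* version nibble */
    case 4: {
        size_t hlen = (size_t)(pkt[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
        if (hlen < 20 || pktlen < hlen || pkt[9] != 6)  /* 6 = TCP */
            return -1;
        uint32_t total = ((uint32_t)pkt[2] << 8) | pkt[3];  /* total length */
        if (total < hlen)
            return -1;
        *tlen = total - hlen;         /* IPv4 length includes the IP header */
        return (long)hlen;
    }
    case 6:
        if (pktlen < 40 || pkt[6] != 6)   /* next header must be TCP */
            return -1;
        /* IPv6 payload length already excludes the 40-byte fixed header */
        *tlen = ((uint32_t)pkt[4] << 8) | pkt[5];
        return 40;
    default:
        return -1;
    }
}
```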

   The tcp_reass() function was not amenable to supporting both versions of
IP at the same time, so our implementation increases code size by adding a
new tcpv6_reass() function that uses struct tcpipv6hdr in lieu of the struct
tcpiphdr used by the original tcp_reass().

   The tcp_input() function now calls the input security policy function
before processing an incoming TCP segment. This ensures compliance with both
socket and system security requirements. If an incoming segment should not
be processed for security policy reasons, then it is silently dropped. If
the system security policy requires authentication on all received packets,
then attempts to open an unauthenticated TCP connection, or to send an
unauthenticated ping, will silently fail as if the destination system were
not reachable at all. As with the UDP implementation, this check exacts a
performance penalty.

   One benefit of our changes has been to isolate the network-layer code
more cleanly. This might make it easier to modify TCP further to support TCP
over other network-layer protocols, for example Novell's IPX. We are
concerned about the adverse performance impact of the IPv6 changes, so we
are examining methods of improving the performance of our implementation.
We have not found anything in the IPv6 specifications that inherently
reduces TCP performance.




#include <sys/types.h>
#include <sys/socket.h>
#include <netinet6/in6.h>
/* ... */
struct sockaddr_in6 addr6;
int s;
/* ... */
s = socket(PF_INET6, SOCK_DGRAM, 0);
addr6.sin6_len = sizeof(addr6);
addr6.sin6_family = AF_INET6;
addr6.sin6_port = htons(7);       /* UDP echo service */
addr6.sin6_flowinfo = 0;
(void) ascii2addr(AF_INET6,
                  "FE80::800:dead:beef",
                  &addr6.sin6_addr);
sendto(s, "hello", 6, 0, (struct sockaddr *)&addr6,
       sizeof(addr6));            /* 6 bytes: "hello" plus its NUL */
/* ... */


         Figure 7: Code fragment illustrating use of UDP over IPv6


6    Changes to Applications


6.1   Network Socket Enhancements

Although the IETF does not standardise application programming interfaces,
some members of the IPng Working Group did create an Informational RFC
describing how IPv6 might be used in conjunction with BSD Sockets [GTB95].
Some changes in 4.4-Lite BSD were needed to comply with that specification.
Fortunately, most of the changes involved adding protocol switch tables and
entries to those tables [LMKQ89]. Other sockets changes were implemented at
lower levels, most notably the aforementioned PCB code. One can use a
PF_INET6 socket to communicate using IPv4 or IPv6, which makes it easier to
transition applications to the new version of IP.

   More extensive changes were needed to permit applications to request
security services from IPv6. Several new socket options were defined and
implemented, including SO_SECURITY_ENCRYPTION_TRANSPORT,
SO_SECURITY_ENCRYPTION_TUNNEL, and SO_SECURITY_AUTHENTICATION. An
application uses these new socket options to request that ESP in transport
mode, ESP in tunnel mode, or the Authentication Header be used with this
network session. Each also has an associated Security Level parameter.
There are currently 4 security levels implemented. Level 0 does not use
security on outbound packets and does not require it on inbound packets.
Level 1 uses security on outbound packets if it is available but does not
require it on inbound packets. Level 2 requires security both outbound and
inbound. Level 3 is the same as level 2 except that outbound packets use a
security association unique to this socket. A planned enhancement is to
also permit an application to request that its session be provided with a
new security association to replace the one in use. We consider our new
security-related socket options experimental and may alter them somewhat as
we gain more experience with application issues.

   Our kernel implementation permits a system administrator to define a
default or minimum level of security. The default security will be used for
all sessions provided with a valid Security Association. Applications may
also request security services via the above sockets extensions. The system
security is configured using the same matrix of 3 protocols and 4 security
levels that we described earlier for socket-requested security. We plan to
enhance the flexibility of our security policy engine in the future so that
the system administrator can specify more sophisticated policies than are
currently supported.
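   The level semantics above can be summarized as two small decision
functions. This is a sketch, not the kernel's policy engine:

- The enum and function names are hypothetical.
- Level 3's per-socket security association is not modeled.
- effective_level() assumes that the system-wide setting acts as a minimum,
  which a socket may raise but not lower.

One such level would apply independently to each of the three protocols.

```c
/* Security levels as described in the text; per protocol (AH, ESP
   transport mode, ESP tunnel mode) the system and each socket hold one. */
enum sec_level {
    SEC_LEVEL_NONE    = 0,  /* no outbound security, none required inbound */
    SEC_LEVEL_USE     = 1,  /* outbound if available, none required inbound */
    SEC_LEVEL_REQUIRE = 2,  /* required outbound and inbound */
    SEC_LEVEL_UNIQUE  = 3   /* as 2, plus a per-socket association (not modeled) */
};

/* Assumption: the effective level is the stricter of the system minimum
   and the socket's request. */
static enum sec_level effective_level(enum sec_level system_min,
                                      enum sec_level socket_req)
{
    return socket_req > system_min ? socket_req : system_min;
}

/* Input check: an unprotected packet is dropped if the level requires
   security; otherwise it is accepted. */
static int input_permitted(enum sec_level lvl, int packet_protected)
{
    return lvl < SEC_LEVEL_REQUIRE || packet_protected;
}

/* Output decision: 1 = apply security, 0 = send unprotected,
   -1 = fail because security is required but no association exists. */
static int output_decision(enum sec_level lvl, int sa_available)
{
    switch (lvl) {
    case SEC_LEVEL_NONE:
        return 0;
    case SEC_LEVEL_USE:
        return sa_available ? 1 : 0;
    default:                          /* levels 2 and 3 */
        return sa_available ? 1 : -1;
    }
}
```

A return value of -1 from output_decision() corresponds to a security
processing error, which the implementation reports to applications as
EIPSEC.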


6.2   Key Management Socket

We have also defined a new protocol family, called PF_KEY, for the Sockets
application programming interface. This extension to Sockets provides a
generic interface between security association management applications, such
as a Photuris daemon [KS95], and the kernel's network security data
structures [PAM95].

   This new generic key management interface is modeled upon the existing
routing socket, PF_ROUTE [Skl91]. This enhancement permits the key
management system to be completely decoupled from the IP security
implementation. Multiple key management schemes can be supported
concurrently if desired. It also makes it easy to change from one key
management algorithm or protocol to another. To make such a change, only a
new daemon needs to be installed; no kernel modification or rebuilding is
necessary. Many published key management protocols have had flaws
discovered years after initial publication [NS78][DS81]. Hence it is
important to be able to change the key management protocol used by the
system easily. Our alpha release includes an application, key(8), that the
system administrator can use to manage keys and security associations in the
kernel. Any key management scheme, whether automatic key management such as
Photuris or manual key management such as key(8), can use the PF_KEY
interface.


6.3   An Example Application: telnet

Most applications will need a small amount of modification to take advantage
of IPv6 and its unique features. Even with these modifications, the
applications will continue to support IPv4.  Most of these modifications are
in the socket code, allowing the use of the new AF_INET6 address family, new
data structures, and the corresponding network functions.

   We have modified several applications to use IPv6. We describe the
modifications required for telnet in the following paragraphs. The telnet
application was also enhanced with command-line options to set the socket
security level.7

   The telnet client first parses the command line and options. If the user
has requested IP security services, then the appropriate socket options are
set using setsockopt(). Telnet then uses the new hostname2addr() and
ascii2addr() functions to seek an IPv6 address for the specified hostname or
text representation of an address. If an IPv6 address is returned, telnet
opens a PF_INET6 socket and begins communicating. The IP security
implementation inside the kernel applies the requested security services
automatically. If an IP security processing error occurs (for example, a
security association is needed but none can be found), then the EIPSEC
error is returned to telnet so that the user can be informed of the
problem.

   The IPv4 library functions inet_ntoa(), inet_aton(), gethostbyname(), and
gethostbyaddr() have been superseded by the new library functions
addr2ascii(), ascii2addr(), hostname2addr(), and addr2hostname() [GTB95].8
These new library functions work equally well for IPv4 and IPv6, making it
easier for applications to support both versions of IP.
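   The draft functions named above are not found on most present-day
systems. For comparison only, the following sketch uses the POSIX
inet_pton() and inet_ntop() functions. These later came to fill the
text-to-address and address-to-text roles of ascii2addr() and addr2ascii()
for both address families. The helper parse_addr() is hypothetical. It
tries IPv6 first and then falls back to IPv4, loosely mirroring the telnet
client's preference for an IPv6 address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a textual address, trying IPv6 first and then IPv4.
   Returns AF_INET6 or AF_INET for the family that parsed, storing the
   result in *a6 or *a4 respectively, or -1 if neither form matches. */
static int parse_addr(const char *text, struct in6_addr *a6,
                      struct in_addr *a4)
{
    if (inet_pton(AF_INET6, text, a6) == 1)
        return AF_INET6;
    if (inet_pton(AF_INET, text, a4) == 1)
        return AF_INET;
    return -1;
}
```

For example, parse_addr("FE80::800:dead:beef", ...) returns AF_INET6.
Passing the result to inet_ntop() renders it in compressed lowercase form
as fe80::800:dead:beef.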

   In the future, we plan to add a privileged socket option to permit
applications that need to bypass IP security (for example, a Photuris
daemon) to do so. This socket option would fail if the effective user-id of
the process connected to the socket were not 0, so that ordinary user
applications could not bypass system security. Key management applications
need such a bypass so that they can create the initial security
associations. Certain other applications that have application-layer
security, for example a secured Domain Name Service daemon, might also need
to bypass IP security services. Although this has not been implemented yet,
we believe it will be straightforward to implement, and we have already put
some of the hooks in place.


7    Performance


Throughput and round-trip latency were measured using Rick Jones' NetPerf
tool.[Jon95] NetPerf has more accuracy and reproducibility than some older
tools.[Jef95] Except for Table 5, these measurements are for traffic that is
neither authenticated nor encrypted, though the security policy checks are
still performed.

   In our alpha release, IPv6 performance is somewhat worse than IPv4
performance. UDP latency, shown in Table 2, and TCP latency, shown in
Table 1, both increased for IPv6. The increased latency, shown in Figure 8,
arises in both inbound and outbound protocol processing. The major
contributors are comparing longer addresses (four 32-bit words vs. a single
32-bit word) and preparsing optional headers. We plan to add a fast-path
bypass around the preparsing code in the future. The lower IPv6
throughput, shown in Table 3 and Table 4, is due to the increased latency
and the larger packet size. The fixed IPv6 header is 40 bytes, versus 20
bytes for an IPv4 header without options.


                Figure 8: UDP and TCP Latency Graphs

____________________________

7 Although 4.4 BSD's telnet includes an encryption option, a fatal
  implementation flaw limits its practical value.
8 These new functions were originally suggested by Craig Partridge in an
  email note to the IETF's IPng mailing list.

   The 4.4-Lite BSD implementation of TCP/IPv4 has had years of
optimisation, whilst our alpha release has had none. We believe that an
optimised IPv6 implementation will perform at least as well as a similarly
optimised IPv4 implementation.

   NetPerf has not yet been modified to use the security socket options,
and making such modifications to NetPerf does not appear trivial. The older
ttcp(8) testing tool was easily modified to use the security socket options.
Table 5 shows throughput (measured with ttcp(8)) using authentication,
transport-mode encryption, and both, versus no security at all. We have less
confidence in the absolute values for ttcp(8) than for NetPerf. However, we
believe the relative performance degradation shown by ttcp(8) is
meaningful. Our security implementations have not been optimised at all. We
believe that we can noticeably improve our encryption performance by
encrypting and decrypting in place and removing memory copies. Hardware
implementations of DES that run at 1 Gbps exist [Sch94]; implementations
seeking high performance should probably use such encryption hardware.




 __________________________________________
| Number of | IPv4    | IPv6    | Percent  |
| bytes     | (msec)  | (msec)  | increase |
|___________|_________|_________|__________|
|         1 |    1.27 |    1.54 |    +21%  |
|        64 |    1.45 |    1.83 |    +26%  |
|      1024 |    3.12 |    3.62 |    +16%  |
|      2048 |    5.34 |    6.01 |    +12%  |
|      4096 |   10.4  |   11.9  |    +14%  |
|      8192 |   19.0  |   22.1  |    +16%  |
|___________|_________|_________|__________|

                         Table 1: TCP Latency

 __________________________________________
| Number of | IPv4    | IPv6    | Percent  |
| bytes     | (msec)  | (msec)  | increase |
|___________|_________|_________|__________|
|         1 |    0.93 |    1.08 |    +17%  |
|        64 |    1.13 |    1.30 |    +15%  |
|      1024 |    2.82 |    3.06 |     +8%  |
|      2048 |    5.00 |    5.77 |    +15%  |
|      4096 |    8.89 |    9.90 |    +11%  |
|      8192 |   17.0  |   20.2  |    +19%  |
|___________|_________|_________|__________|

                         Table 2: UDP Latency

 ____________________________________________________
| Data  | Socket      | IPv4     | IPv6     | Perf.  |
| size  | buffer size | (KB/sec) | (KB/sec) | drop   |
|_______|_____________|__________|__________|________|
|  4096 |       57344 |      780 |      731 |  6.26% |
|  8192 |       57344 |      778 |      729 |  6.28% |
| 32768 |       57344 |      776 |      730 |  5.97% |
|  4096 |       32768 |      807 |      763 |  5.45% |
|  8192 |       32768 |      806 |      758 |  5.91% |
| 32768 |       32768 |      811 |      762 |  6.02% |
|  4096 |        8192 |      861 |      775 |  9.93% |
|  8192 |        8192 |      858 |      784 |  8.68% |
| 32768 |        8192 |      863 |      784 |  9.19% |
|_______|_____________|__________|__________|________|

                       Table 3: TCP Throughput

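 ____________________________________________________
| Data  | Socket      | IPv4     | IPv6     | Perf.  |
| size  | buffer size | (KB/sec) | (KB/sec) | drop   |
|_______|_____________|__________|__________|________|
|    64 |       32767 |      537 |      500 |  6.82% |
|  1024 |       32767 |     1144 |     1125 |  1.60% |
|_______|_____________|__________|__________|________|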

                       Table 4: UDP Throughput


8    Summary


This paper has described a freely distributable prototype implementation of
IPv6 based on 4.4 BSD-Lite. There are a number of implementation differences
between IPv4 and IPv6, due to both packet format differences and protocol
differences. Some of the assumptions made and techniques used by the IPv4
implementation are no longer valid for IPv6. Because the implementation
includes the cryptographic security mechanisms mandatory for IPv6, any
networked application can now obtain the security it desires without having
to implement it at the application layer. We have also compared the
performance of TCP/IPv4 and TCP/IPv6.


9    Acknowledgments


This work has been funded by the Information Security Program Office (PD71E)
of the US Space & Naval Warfare Systems Command since 1992 and also by the
Computer Systems Technology Office of the Advanced Research Projects Agency
(ARPA/CSTO) since 1995. We are grateful for their support.


References


[Atk95a]   Randall Atkinson. IP Authentication Header, August 1995. RFC-1826.

[Atk95b]   Randall Atkinson. IP Encapsulating Security Payload (ESP), August
           1995. RFC-1827.

[Atk95c]   Randall Atkinson. IP Security Architecture, August 1995. RFC-1825.

[DC95]     Steve Deering and Alex Conta.  ICMP for the Internet Protocol
           version 6, June 1995. Work in Progress.

[Dee89]    Steve Deering. Host extensions for IP Multicasting, August 1989.
           RFC-1112.

[Dee91]    Steve Deering. ICMP Router Discovery Messages, September 1991.
           RFC-1256.

[Dee93]    Stephen E. Deering. SIP: Simple Internet Protocol. IEEE Networks,
           7(3):16-28, May 1993.

[DH95]     Steve Deering and Bob Hinden. IPv6 specification, June 1995. Work
           in Progress.

[DS81]     D.E. Denning and G.M. Sacco.  Timestamps in key distribution
           protocols.  Communications of the ACM, 24(8):533-536, August
           1981.




                     _____________________________
                    | Security       | Throughput |
                    | features       | (KB/sec)   |
                    |________________|____________|
                    | None           |    ~775    |
                    | Authentication |    ~345    |
                    | Encryption     |    ~192    |
                    | Both           |    ~153    |
                    |________________|____________|

             Table 5: Impact of IPv6 Security On Throughput.


[FMMT84]   R. Finlayson, T. Mann, J. Mogul, and M. Theimer. Reverse address
           resolution protocol, June 1984. RFC-903.

[GN95]     Robert E. Gilligan and Erik Nordmark. Transition Mechanisms for
           IPv6 Hosts and Routers, May 1995. Work in Progress.

[GTB95]    Robert Gilligan, Susan Thomson, and Jim Bound. IPv6 Program
           Interfaces for BSD Systems, July 1995. Work in Progress.

[Hin94]    Robert Hinden. Simple Internet Protocol Plus white paper, October
           1994. RFC-1710.

[Jef95]    Jeffrey D. Chung, C. Brendan S. Traw, and Jonathan M. Smith.
           Event-Signaling within Higher Performance Network Subsystems. In
           Proceedings, High Performance Communications Subsystems, Mystic,
           CT, August 1995.

[Jon95]    Rick A. Jones. NetPerf: A Network Performance Benchmark (Revision
           2.0), February 1995. Technical Report.

[KS95]     Phil Karn and William Simpson. The Photuris Session Key Management
           Protocol, October 1995. Work in Progress.

[LM91]     X. Lai and J. Massey. A Proposal for a New Block Encryption
           Standard. In Advances in Cryptology - EUROCRYPT '90 Proceedings,
           pages 389-404, Berlin, 1991. Springer-Verlag.

[LMKQ89]   Samuel J. Leffler, Marshall Kirk McKusick, Michael J. Karels, and
           John S. Quarterman. The Design and Implementation of the 4.3
           BSD UNIX Operating System. Addison-Wesley, New York, NY,
           1989.

[Lot92]    Mark Lottor. Internet Growth (1981-1991), January 1992. RFC-1296.

[MD90]     Jeff Mogul and Steve Deering. Path MTU Discovery, November
           1990. RFC-1191.

[MKS95a]   Perry Metzger, Phil Karn, and William Simpson. The ESP DES-CBC
           transform, August 1995. RFC-1829.




[MKS95b]   Perry Metzger, Phil Karn, and William Simpson. IP Authentication
           using Keyed MD5, August 1995. RFC-1828.

[NNS95]    Erik Nordmark, Thomas Narten, and William Simpson. Neighbor
           Discovery for IP Version 6, September 1995. Work in Progress.

[NS78]     R.M. Needham and M.D. Schroeder. Using Encryption for
           Authentication in Large Networks of Computers. Communications of
           the ACM, 21(12):993-999, December 1978.

[PAM95]    Bao G. Phan, Randall J. Atkinson, and Daniel L. McDonald.
           PF_KEY: Key Management Support inside 4.4 BSD Unix, December
           1995. Technical Report.

[Plu82]    D. Plummer. Ethernet address resolution protocol, November 1982.
           RFC-826.

[Pos81]    Jon Postel.  Internet Control Message Protocol, September 1981.
           RFC-792.

[RLH+95]   Yakov Rekhter, Peter Lothberg, Robert Hinden, Steve Deering, and
           Jon Postel. An IPv6 Provider-Based Unicast Address Format,
           August 1995. Work in Progress.

[Sch94]    Bruce Schneier.  Applied Cryptography.  John Wiley & Sons, New
           York, NY, 1994.

[Skl91]    Keith Sklower. A Tree-Based Packet Routing Table for Berkeley
           UNIX. In Proceedings of the Winter '91 USENIX Conference,
           Dallas, TX, January 1991. USENIX Association.

[TN95]     Susan Thomson and Thomas Narten. IPv6 Stateless Address
           Autoconfiguration, October 1995. Work in Progress.

[ZBE+93]   L. Zhang, R. Braden, D. Estrin, S. Shenker, and D. Zappala. RSVP:
           A New Resource ReSerVation Protocol. IEEE Networks, September
           1993.

