################################################ # # # ## ## ###### ####### ## ## ## ## ## # # ## ## ## ## ## ### ## ## ## ## # # ## ## ## ## #### ## ## ## ## # # ## ## ###### ###### ## ## ## ## ### # # ## ## ## ## ## #### ## ## ## # # ## ## ## ## ## ## ### ## ## ## # # ####### ###### ####### ## ## ## ## ## # # # ################################################ The following paper was originally published in the Proceedings of the Sixth USENIX UNIX Security Symposium San Jose California, July, 1996. For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: office@usenix.org 4. WWW URL: https://www.usenix.org A Revocable Backup System (extended abstract) Dan Boneh Richard Lipton dabo@cs.princeton.edu rjl@cs.princeton.edu Department of Computer Science Princeton University Princeton, NJ 08544 Abstract We present a system which enables a user to remove a file from both the file system and all the backup tapes on which the file is stored. The ability to remove files from all backup tapes is desirable in many cases. Our system erases information from the backup tape without actually writing on the tape. This is achieved by applying cryptography in a new way: a block cipher is used to enable the system to ``forget'' information rather than protect it. Our system is easy to install and is transparent to the end user. Further, it introduces no slowdown in system performance and little slowdown in the backup procedure. Introduction On many systems the remove-file command misleads the user into thinking that his file has been permanently removed. Usually the file is still available on a backup tape. This is an important feature used to protect against accidental file erasure or system crashes. However, it has the draw back that the user is unable to completely remove his files. In many scenarios it is desirable on the part of the user to erase all copies of a certain file. This is frequently the case with personal mail messages. Similarly, a user may wish to remove the history and cache files of his web browser. Other examples include a patient removing his medical files from the hospital's system upon checkout and financial data that should be erased after a short period of time. Our goal is to make it so that the sensitive data becomes inaccessible to anyone (including the data's owner). A naive solution is to erase the data from the file system. Then mount the backup tapes one by one and erase the sensitive data from them. This method is impractical for several reasons. First it inconveniences the user. To implement this scheme the user must call a computer operator whenever such an erasure is to take place. Furthermore, this method is quite complicated considering the backup policy which many institutes employ. Usually a backup of the entire file system is done once every time period, e.g. every month. The backup tape is then stored forever in a ``cellar'' or sometimes off site. This backup policy enables users to restore their entire directory structure to any point in the past. Clearly the naive approach is doomed to fail whenever this backup policy is used. The computer operator has to remove the data from many backup tapes. This procedure is painstaking, but is also insecure since the operator might ``forget'' to remove the data from one of the old backup tapes. We propose a system which avoids the problems mentioned above. Our system will enable the user to remove a file from the file system and all backup tapes without ever mounting a single tape. At first this seems impossible: it is not possible to remove data from a tape without physically erasing the tape. Fortunately, cryptography enables us to do just that. The basic idea can be described as follows: when a file is backed up to tape it is first encrypted using a randomly generated key. The encrypted version of the file is the one written to tape. When the user wishes to remove the file from the backup tape he instructs the system to ``forget'' the key used to encrypt the file. The act of forgetting the encryption key renders the data on the tape useless. No one, including the file owner, can ever access the file again. In effect, the file has been erased from the tape. Notice that the encrypt-before-backup is completely transparent to the user. It is crucial that no one but the operating system know which key was used to encrypt the file during backup. This ensures that when the operating system is instructed to forget the encryption key, the tape data becomes unreadable. The rest of the paper is devoted to describing an implementation of the above idea. The tricky part of the implementation is managing the encryption keys. Clearly the encryption keys have to be backed up as well. We have devised several methods for backuping the encryption keys while still preserving the fundamental properties described above. It is interesting to compare our system to the cryptographic file system implemented by Blaze [2]. In a cryptographic file system the files are always stored in an encrypted form. Whenever a file needs to be accessed it is decrypted on the fly using the owner's key. The backup tape is a direct dump of the file system, i.e. the tape contains the encrypted version of every file. Clearly the owner of the file is the only person who has access to the backed up version of the file. This does not guarantee that his backed up files will not be accessed by an unauthorized party. The owner might be forced to reveal his key, e.g. due to a court order. Thus, the cryptographic file system does not conform to our requirements: it does not guarantee that the backup tape data will not be accessed. On the other hand, in our scheme the backed up data becomes inaccessible to everyone once it is ``erased'' from the tape. In addition it is important to point out that our system is very easy to install. Only the backup system has to be modified. The cryptographic file system requires far more extensive modifications. Key management In this section we give a detailed description of our backup system. In our system the user can specify a collection of files that are to be encrypted during backup. Each such file has an encryption key associated with it. The encryption key has a life time which is specified by the user. When the key expires the system generates a new key for the file and adds the expired key to a list of old keys. The maximum number of expired keys that the system remembers is a parameter specified by the user. When the list of keys is at its maximal size the oldest expired key in the list is erased to make room for the new key. The above mechanism is very flexible. We give some examples to illustrate its use. Suppose a user is preparing a document and he wants to make sure that old and invalid copies of his draft are inaccessible to anyone. He could set the key-life for that document to be one month and instruct the system to store 12 expired keys for this document. This means that a new key is generated once a month causing a year old key to be erased. The result is that once a month all backup copies older than a year are revoked. To revoke all copies that are more than 6 month old, the user can manually instruct the system to remove all keys that expired more than 6 month ago. Of course the user can decide to delete the file altogether by instructing the system to remove all keys (including the current key) for the file. In case of disk crash the most recent version of the document can be restored from tape using the current key. More generally, institutions may wish to adopt backup revocation policies. For instance, to prevent law suits regarding old data, an institution may decide to revoke all backup tapes that are more than 3 years old. This is done by instructing the system to delete all keys that expired more than 3 years ago. We point out once more that the revocation process has the effect of removing files from all old backup tapes without ever mounting a single tape. The most important component of our system is the key management. All encryption keys are stored in one file which we refer to as the key-file. This file should be protected meaning that only privileged processes should be allowed to read it. The key-file contains one record per each file that has ever been encrypted during back up. There are two types of entries in the key-file: directory and file entries. The structure of a single entry in the file is described in Figure 1. The fields in a file entry contain the file-name, the maximum number of expired keys that the system should store, the life time of a single key and a list of keys. The list of keys includes the current key (as the first entry in the list) followed by the expired keys ordered chronologically. When a new key is generated the keys in the list are shifted and the last (oldest) entry is lost. The fields in a directory entry specify the directory path and indicate whether all files in the directory and subdirectories should be encrypted during backup. When the cont\_flag is turned on the directory is scanned and all new files in it are added to the key-file. The num_keys and key_life fields for these new file entries are set to the values taken from the corresponding fields in the directory entry. A directory entry must always precede a file entry in order to specify the path to the file. union keyfile_entry { struct file_entry { /* Structure of file entry */ char *filename; char *keys[]; /* Keys used to encrypt file */ int num_keys; /* Number of keys to be saved */ time_t key_life; /* Life time of a single key */ time_t date_key_issued; /* Date current encryption key */ /* for file was issued */ time_t last_backup_date; /* Date file was last backed up */ } file struct dir_entry { /* Structure of directory entry */ char *dirpath; /* Directory path */ char cont_flag; /* Indicate if files in dir */ /* should be added to key-file */ int num_keys; /* Default num keys to be saved */ time_t key_life; /* Default life time of a key */ } dir; } Figure 1: key-file structure The key-file is permanently stored on the file system. This file is extremely important since without it the backup tapes are useless. For this reason the key-file must be backed up as well. However, the file can not be written to tape as is. If it were written to tape the system would not be able to permanently erase keys from the key-file. This would defeat the purpose of our system. Our solution is to generate a new master-key during every backup. The key-file will be written to tape after it has been encrypted using this new master-key. The master-key itself is not written to tape. Notice that in case of a disk crash, the master-key is crucial for recovering the key-file. Without the master-key the key-file can not be recovered and as a result all backup tapes using our system become useless. For this reason the master-key must be handled with care. We discuss methods for storing the master-key in the next section. To make sure that the key-file is backed up to tape we treat it as any other file which is to be encrypted during backup. This means that the key-file is stored on the file system for which it used. Further, the key-file contains an entry which corresponds to the key-file itself. The file-name field in this entry contains the key-file name and the key field contains the master-key. Hence, the master-key is actually stored in the key-file and is used to encrypt the key-file during backup. To make sure that a new master-key is generated during every backup the key-life field is set to zero. Similarly, to make sure that only the current master-key is stored, the num-keys field is set to 1. This has the effect of revoking the old copy of the key-file during every backup. load_key_file(); scan_sub_dir(); generate_new_keys(); touch_key_file(); /* key-file will be stored */ /* during incremental backup */ Loop on all files in file-system { file_entry = get_key_file_entry(current_file); if (file_entry != NULL) { /* If file has an entry in */ key = get_key(file_entry); /* key-file then get */ else /* encryption key. Otherwise, */ key = NULL; /* don't encrypt file. */ if (is_key_file(current_file)) /* Erase current master-key */ erase_master_key_from_disk(); /* before writing key-file. */ Loop on blocks in file { /* Write file blocks to tape */ /* encrypting them if */ if (key != NULL) /* necessary. */ encrypt(current_block, key); dump(current_block); /* Write block to tape. */ } if (file_entry != NULL) /* Update backup date field */ file_entry->last_backup_date = current_date; } write_key_file(); /* Write key-file back to disk. */ store_master_key(); /* Take extra care to store the new master key. */ Figure 2: Backup operation We can now describe the details of the backup and restore operations. The schematics of the backup procedure are described in Figure 2. We briefly sketch these steps here. [Initialize] When the backup process is initialized the entire key-file is loaded into memory. Then all directories in the key-file which have the cont_flag turned on are scanned and their contents is added to the memory image of the key file. The touch_key_file() routine will update the last modification date of the key-file. This guarantees that the key-file is written to tape even during an incremental backup( An incremental backup of a certain level only backups files that were modified since the last backup of that level). [New keys] Next, new keys are generated for all files who need them. A new key is generated for all files for which date_key_issued + key_life < current_date The date_key_issued field is updated for each new generated key. [Write all files to tape] Loop on all files in the file system. Each file is dumped to tape, after encryption if necessary. The old master-key is erased from the key-file just before the key-file itself is dumped to tape. This ensures that the old master-key is not written to tape when the key-file is backed up. Had this not been done, old master-keys could be read from the backup tape. This would be disastrous for our system. [Terminate] Finally, write the key-file back to disk and store the new master-key using the methods described in the next section. The modifications to the restore operation are even simpler. During restore we use the key-file to find the appropriate key for each file. The system first reads the date written on the backup tape. This date is the date on which the backup was done. For each file to be restored the system retrieves the key used to encrypt the file on the backup date. This key is then used to decrypt the file when it is read from the tape. In case of a disk crash the first thing that has to be recovered is the key-file. This is done by creating a temporary key-file containing a single entry. This entry contains the key-file as its file name and the master-key as the key. The operator can now use restore to recover and decrypt the most recent backed up version of the key-file. Now a full restore of the file system is possible. Master key management As was mentioned above, the master-key is a crucial component of the system. Without it, all the backup tapes are useless. We first describe the properties that a master-key storage system must satisfy. Observe that the master-key can not be written to tape in the clear. If it was, then the key-file on the tape would be accessible and with it all the files on the tape. This will again defeat the purpose of our system. Furthermore, wherever we choose to store the master-keys we must make sure that only the most recent key is accessible. As before, an old master-key will enable access to the contents of old tapes. Hence, any system which stores master keys must ``forget'' all but the most recent one. We propose two solutions which can be used in conjunction with one another. The first one is the simplest; at the end of each backup the computer operator is asked to write down (on paper or on a floppy disk) the current master-key. He then destroys his copy of the previous master-key. The second method involves an internet server which is used for master-key storage. Suppose a site performs daily incremental backups. This is common in many sites. The master-key-storage-server generates a public/private key pair on a daily basis. Every day all sites performing backups encrypt their daily master-key using the server's current public-key and write the resulting string on their backup tape. In case of a disk crash a site can recover its daily master-key by performing the following steps: the local computer operator first reads the encryption of the master-key written on his tape. He then sends the encrypted master-key to the storage-server. The storage-server, after authenticating the identity of the sender, decrypts the master-key and sends the result back to the computer operator. The operator can now restore his file system. The same public/private key pair can be used by all sites in the world which use daily (incremental) backups. Hence, the master-key-storage-server simply provides the service of generating a public/private key pair on a daily basis. The above description is not much different than a standard key escrow system. The new twist is that to make sure that old master-keys are inaccessible the master-key-storage-server must erase all but its most recent private-key. Otherwise old master-keys can be recovered from old tapes. These master-keys then enable anyone to read the contents of old tapes. Since the storage-server is providing a commercial service it is in its best interest to be trustworthy and indeed ``forget'' all old private keys. To increase the security and reliability of the scheme one can use k out of l secret sharing techniques [4]. This means that a given site will employ l storage-servers and exactly k of them are required to recover the daily master-key. Now an old master-key can not be recovered even if k-1 of the storage-severs are untrustworthy. Similarly, even if l-k storage-servers crash and lose their current private-key a site can still recover its current master-key. The parameters l and k can be fine tuned to the site's needs. For increased reliability some sites may wish to be able to access a small number (e.g. three) of old master-keys. For instance, if the most recent backup tape is lost, the previous one can be used if the corresponding master-key is still accessible. As a result some sites may want the storage-server to save a small number of its most recent private-keys. To accommodate this need the storage server can offer k (where k<10) types of public/private key pairs. For type 1 only the most recent private key is available. For type 2 the two most recent private keys are available, etc. This arrangement requires the storage-server to generate k new public/private key pairs daily. A site who wishes to have access to its three most recent master-keys may use a type 3 public key published by the storage server. The user interface Two utilities enable a user to interact with our system. The first utility enables a user to declare that a file or directory has the revocable-backup attribute. The second enables a user to revoke backup copies of his files. These two utilities comprise the most basic interface. Naturally utilities for obtaining the status of a given file are also provided. [mkrvcbl] The make-revocable command will add a file (or directory) to the key-file. The user can specify the key-life and num-keys parameters on the command line. The default values for these parameters is infinity for the key-life and one for num-keys. This means that there is a single key associated with the file throughout the life of the file. This enables the user to revoke all backup copies of the file when he wishes to completely remove the file from the system. Executing mkrvcbl dir-name will add the directory to the key-file with the cont_flag turned on. During backup all files in the directory and its subdirectories will be added to the key-file. Only the owner of the file (or directory) is permitted to apply mkrvcbl to the file (or directory). [rvkbck] The revoke-backup command is activated as rvkbck file-name [date]. Only the file owner is permitted to apply rvkbck to a file. The command removes all keys that expired before date from the file's entry in the key-file. If the date parameter is left blank the entire file's record is removed from the key-file. In doing so, all keys used to decrypt the file from the backup tapes are lost. As a result, the file can no longer be restored from the backup tapes. Unfortunately this is not quite true. Recall that the key-file itself is also backed up on tape. Hence, the removed entry can still be found in the backup version of the key-file. However, at the next backup the master-key will change making the old backed up version of the key-file useless. Therefore, the entry is permanently removed from the key-file after the next backup operation. For example, in a system where an incremental backup takes place daily the rvkbck file-name command will permanently remove the file within 24 hours of the time the command is issued. The two utilities described above enable a user to manipulate the key-file according to his needs. By definition, the key-file contains an entry for every file that has ever been encrypted during backup (and has not been revoked). This could make the key-file large. Once every time period, e.g. once a year, the computer operator may wish to prune the key-file by removing all entries which correspond to files which are no longer present on the file system. This is done using the prune-key-file utility. [prunekeyfile] The command prunekeyfile date will remove all entries in the key-file which correspond to files which no longer exist on the file system and whose last backup date is prior to `date'. The utility will write the removed entries to a file called key-file.`date'. This file will be backed up to tape as a regular file (without encryption) and then removed from the file system. Notice that by doing so the user is no longer able to revoke the backed up copy of the files corresponding to the removed entries. This is fine since these files are old files which are no longer present on the file system. Summary and future work We described a system which enables a user to permanently remove a file from the file system and all backup tapes. The ability to revoke backup copies of files is important and may be of interest to many institutions. Our system is very easy to install and provides a simple user interface. Our scheme applies cryptography in a new way. Here cryptography is used to erase information rather than protect it. Since the backed up files are stored for extended periods of time it is desirable that the block cipher used to encrypt the files be extremely secure. Hence we suggest using block ciphers with longer keys than are usually used. For instance one could apply triple DES twice to obtain a 224 bit key. To simplify the installation of our system we chose not to modify the existing file system. Our implementation has the drawback that the backup key information does not become a part of the file. Thus, when the file is moved or copied the resulting file will not be securely backed up. A full-scale implementation of our scheme could embed a new file attribute in the file header. This attribute would indicate that the file is to be securely backed up. This way when the file is moved or copied, the new file will have the same attributes. The size of the key-file can be reduced by incorporating the ideas used in Lamport's one-time password scheme [3, pp. 230-232]. We thank Steven Bellovin [1] for pointing this out. Let f be a one way permutation. The idea is to only store the oldest accessible key in the key-file. The more recent keys can be obtained by repeatedly applying f to this key. When the oldest key k expires it is simply replaced by f(k). Since given f(k) it is hard to compute k we may say that k has been ``forgotten''. For efficiency it is desirable to store both the oldest and most recent keys in the key-file. Using this approach we can make do with storing only two keys per key-file entry. As a final note we point out that standard UNIX backup utilities, e.g. dump and tar, do not enable a user to specify a collection of files that should not be backed up. There are several types of files for which this option is important due to privacy considerations. The typical example is the history and cache files of the user's web browser. On the one hand, the information is usually not important to the user. On the other hand the information might be private and the user may not want extra copies of it floating around. We suggest that this feature be incorporated into commercial backup systems. Acknowledgments We thank Jim Roberts and Matt Norcross for a very helpful explanation of our local backup system. Bibliography [1] S. Bellovin, private communications. [2] M. Blaze, ``A Cryptographic File System for Unix'', available at https://www.cert-kr.or.kr/doc/Crypto-File-System.ps.asc.html [3] C. Kaufman, R. Perlman, M. Speciner, ``Network Security'', Prentice Hall, 1995. [4] A. Shamir, ``How to share a secret'', CACM, Vol. 22, Nov. 1979, pp. 612--613.