Next: Background Up: Toward Speech-Generated Cryptographic Keys Previous: Toward Speech-Generated Cryptographic Keys

Introduction

Futuristic mobile computing platforms will offer, and in some cases will require, methods of user input other than a keyboard, mouse or joystick. This is especially true for head-mounted displays and other wearable computers (e.g., see [22]). For such futuristic devices, and even for next-generation PDAs and programmable mobile phones, voice is a leading contender for the dominant user input medium.

We argue that if voice prevails in this way, it poses a challenge for securing data on these devices. On the one hand, if our experience with laptop computers and mobile phones is any indication, these devices will be stolen frequently: laptop theft is already the second leading quantifiable cost to enterprises from IT-related security threats [19]. Similarly, mobile phones are the object of theft in four of every ten personal robberies in several cities in the United Kingdom, and these areas logged a fourfold increase in personal robberies involving mobile phones between 1998-99 and 2000-01.⁶ These trends suggest that encrypting any sensitive data on such devices is prudent. On the other hand, presuming that these devices will not be tamper-resistant, the cryptographic key used for such encryption would need to be derived from the user's voice input, presumably some form of spoken passphrase. Unfortunately, spoken passphrases are likely to have far less entropy than typed ones, both because they must be pronounceable and because a spoken representation loses other information (e.g., capitalization and punctuation).
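The information-loss argument can be made concrete with a small count. The Python sketch below is only an illustration under stated assumptions: the toy passphrase space, the uniform choice of passphrase, and the hypothetical `spoken_form` function (which models speech as simply discarding capitalization and punctuation) are ours, not the paper's model of speech. It measures how many bits survive when many distinct typed passphrases collapse onto the same spoken phrase.

```python
import math
import string
from collections import Counter
from itertools import product


def shannon_entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution over `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def spoken_form(typed):
    """Hypothetical model of speech: drop punctuation and case, normalize spaces."""
    no_punct = typed.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def case_variants(word):
    """Three capitalizations a typist could choose for one word."""
    return [word.lower(), word.capitalize(), word.upper()]


# Hypothetical passphrase space, each phrase equally likely: an adjective and
# a noun, each in one of three capitalizations, plus an optional final mark.
ADJECTIVES = ["red", "blue"]
NOUNS = ["cat", "dog"]
MARKS = ["", "!", "?", "."]

typed = [
    f"{a} {n}{mark}"
    for adj, noun, mark in product(ADJECTIVES, NOUNS, MARKS)
    for a in case_variants(adj)
    for n in case_variants(noun)
]
spoken = [spoken_form(t) for t in typed]

print(f"typed:  {len(set(typed))} phrases, {shannon_entropy(typed):.2f} bits")
print(f"spoken: {len(set(spoken))} phrases, {shannon_entropy(spoken):.2f} bits")
```

In this toy space, 144 equally likely typed passphrases (about 7.17 bits) collapse onto 4 spoken phrases (2.00 bits). Real passphrase spaces are of course far larger; the point is only the direction of the loss.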

In this paper we describe an implementation of an approach for deriving a repeatable cryptographic key from spoken user input, in which the entropy of the key is drawn both from the passphrase that is spoken and from the user's speech patterns while speaking it. In this way, even if the passphrase space has little entropy, variability across users' vocal tracts poses an additional obstacle to cryptanalysis of the key. Moreover, our approach uses techniques designed to withstand an attacker who captures and reverse-engineers the device on which key generation takes place (but not while it is taking place), i.e., an attacker with full access to the device's stable storage. Our general approach was first described in [16], but only as an initial step toward this goal: that work evaluated the approach solely for generating 46-bit keys, using only utterances recorded over phone calls, and without regard for the difficulties of implementing it on resource-constrained devices. Here we provide a somewhat more realistic evaluation, using a full implementation on an off-the-shelf PDA (the Compaq iPAQ), data recorded on that PDA, and a target key length of 60 bits. We detail the numerous changes and refinements needed to make the approach viable on this platform. We also give evidence suggesting that the adversary gains little by knowing the user's passphrase, and that additionally recording the user saying phrases other than the passphrase helps the adversary less than one might expect.
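The claim that the key draws entropy from both the passphrase and the speaker can be stated with the chain rule for entropy. As a sketch, write $P$ for the randomly chosen passphrase, $S$ for the speaker-dependent characteristics of the utterance, and $K$ for the derived key; these symbols are our own illustrative notation, and we assume $K$ is computed deterministically from $(P, S)$:

```latex
\begin{aligned}
H(K) &\le H(P, S) = H(P) + H(S \mid P), \\
H(K \mid P) &\le H(S \mid P).
\end{aligned}
```

The first line shows where a small passphrase space can be supplemented: the term $H(S \mid P)$ captures vocal variability beyond the words themselves. The second line bounds what remains for an attacker who already knows the passphrase. Both are upper bounds only; they say nothing about how much of this entropy a particular key-generation method actually captures.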

We caution the reader, however, about several limitations of our analysis and our approach. First, though we demonstrate reliable re-generation of a key from a fixed-length bit-string characterization of the user's utterance, we do not claim that the resulting key necessarily has as many bits of entropy as that characterization has bits. Indeed, few conclusions about key entropy can be drawn from the limited user studies [16,17] that we have been able to perform. That said, our studies suggest that the technique we have implemented does draw significant entropy from passphrase utterances and, at the very least, should offer greater entropy than the passphrase space alone. Second, because our approach strives to re-generate a cryptographic key whenever the legitimate user utters her passphrase, it is necessarily vulnerable to an attacker who can obtain both the user's device and a high-quality recording of the user uttering her passphrase; after all, such an attacker holds all of the user's keying material. While any biometric is vulnerable to such an attack, we raise this point to emphasize the primary attacker with which we are concerned: one who captures the device but does not have access to the user. That said, we do show, somewhat surprisingly, that knowledge of the user's passphrase seems to help the attacker little, and that even recordings of the user saying other phrases help only marginally.
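Why a key re-generated from an m-bit characterization need not have m bits of entropy can be illustrated numerically. The Python sketch below is a hedged illustration, not an analysis of the paper's data: the descriptor length `M` (chosen to match the 60-bit key target), the 90% bias, and the 8-bit correlated example are all hypothetical. It compares the ideal case with populations whose bits are biased or correlated across speakers.

```python
import math
from itertools import product


def binary_entropy(p):
    """Entropy, in bits, of one bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def independent_bits_entropy(probs):
    """Entropy of a descriptor whose bits are independent; bit i is 1 w.p. probs[i]."""
    return sum(binary_entropy(p) for p in probs)


def uniform_choice_entropy(descriptors):
    """Entropy of a descriptor drawn uniformly from the distinct values given."""
    return math.log2(len(set(descriptors)))


M = 60  # hypothetical descriptor length, matching the 60-bit key target

# Ideal case: every bit is an independent, unbiased coin flip across speakers.
ideal = independent_bits_entropy([0.5] * M)

# Hypothetical biased case: each bit takes the value 1 for 90% of speakers.
biased = independent_bits_entropy([0.9] * M)

# Hypothetical correlated case: an 8-bit descriptor whose second half always
# repeats its first half, so only 2**4 = 16 of the 256 values can occur.
correlated = uniform_choice_entropy(half + half for half in product((0, 1), repeat=4))

print(f"ideal:      {ideal:.2f} of {M} bits")
print(f"biased:     {biased:.2f} of {M} bits")
print(f"correlated: {correlated:.2f} of 8 bits")
```

Under these assumptions the script reports 60.00, about 28.14, and 4.00 bits respectively: the descriptor length stays fixed while its entropy depends entirely on how speakers' features are distributed, which is exactly what small user studies cannot pin down.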


fabian 2002-08-28