Next: Evaluation of IPAQ recordings Up: Toward Speech-Generated Cryptographic Keys Previous: Security


Empirical Results

In this section we empirically evaluate the security of our technique using two different data sets. In these evaluations we attempt to conservatively characterize the security of our technique against an attacker who captures the device. It is clear from Section 4.4 that the number $d$ of distinguishing features is central to the security of our scheme, in that if $d$ is small, then our scheme is vulnerable to a key recovery attack via exhaustive search (see Figure 3). Therefore, in order to demonstrate that our approach is plausibly secure, it is necessary to demonstrate that a high number of distinguishing features can be achieved using our techniques. In addition, we also attempt to characterize the degree to which additional knowledge aids the attacker's quest for the key, in the form of either knowing the passphrase said by the user or having recordings of the user saying phrases other than her passphrase.

We remind the reader that large $d$ is not sufficient for strong security. For example, if all features are distinguishing ($d = m$) for all users, but all users' feature descriptors are identical (and the attacker knows this), then an attacker who captures a user's device can trivially determine the key. Therefore, it is equally important that users' feature descriptors vary widely, or more precisely, that they are drawn from a distribution with high entropy. An entropy evaluation of users' utterances, based on phone recordings of users saying the same passphrase, is described in [16,17], and these studies suggest that the entropy available in user utterances is substantial even when users say the same passphrase. As already noted, however, since that evaluation involves only recordings taken over phone lines, and since it is limited to $m=46$ features, it is insufficient in several ways. Unfortunately, the data sets with which we are presently working (see Sections 5.1 and 5.2) include too few users to enable meaningful measurements of the entropy of users' feature descriptors, and so here we report results for distinguishing features only.
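The degenerate case described above can be illustrated with a small sketch. It assumes, purely for illustration, that each user's feature descriptor can be represented as a tuple of bits, and it computes the Shannon entropy of the empirical distribution over a population of such descriptors. The function name `empirical_entropy_bits` and the sample data are hypothetical. As noted in the text, an empirical estimate like this is not meaningful for small populations; the sketch only shows why identical descriptors yield no entropy regardless of $d$.

```python
from collections import Counter
from math import log2


def empirical_entropy_bits(descriptors):
    """Shannon entropy (in bits) of the empirical distribution of descriptors.

    descriptors: one sequence per user (assumed representation: a tuple of bits).
    """
    counts = Counter(tuple(d) for d in descriptors)
    n = len(descriptors)
    # Sum of p * log2(1/p) over the observed descriptor values.
    return sum((c / n) * log2(n / c) for c in counts.values())


# Every user has the same descriptor: an attacker who knows this learns
# the descriptor without any search, and the entropy is zero.
identical = [(0, 1, 1)] * 4

# Four users with four different descriptors: two bits of entropy.
varied = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

entropy_identical = empirical_entropy_bits(identical)
entropy_varied = empirical_entropy_bits(varied)
```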

In order to calculate the average number of distinguishing features per user, it is of course necessary to define when a feature is distinguishing. Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of feature $\phi_i$ over the recent history of successful logins (see footnote 9). Then we say that the $i$-th feature is distinguishing if $\vert\mu_i - \tau_i\vert > k\sigma_i$ for some parameter $k > 0$. Note that if feature $i$ is distinguishing, then either $\tau_i > \mu_i + k\sigma_i$ and so usually $b(i) = 0$ for the user (see Equation (1)), or $\tau_i < \mu_i - k\sigma_i$ and so usually $b(i) = 1$ for the user. Intuitively, the parameter $k$ tunes the ``sensitivity'' of the scheme, in that a small $k$ implies more distinguishing features, and a large $k$ implies fewer. Obviously $k$ must be tuned to balance achieving a high number of distinguishing features against enabling the user to regenerate her key reliably, since a higher number of distinguishing features is advantageous for security but also requires increasingly similar utterances to regenerate the key. The parameter $k$ will play a central role in our evaluation.
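The definition above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in our evaluation: the function name `distinguishing_features`, the data layout (one list of feature values per successful login), and the sample numbers are all hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def distinguishing_features(history, thresholds, k):
    """Map each distinguishing feature index i to its usual bit b(i).

    history:    feature vectors phi from recent successful logins (assumed layout)
    thresholds: the per-feature thresholds tau_i
    k:          sensitivity parameter, k > 0
    """
    result = {}
    for i, tau in enumerate(thresholds):
        values = [phi[i] for phi in history]
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        # Feature i is distinguishing if |mu_i - tau_i| > k * sigma_i.
        if abs(mu - tau) > k * sigma:
            # tau_i above the user's range: usually b(i) = 0; below: usually b(i) = 1.
            result[i] = 0 if tau > mu else 1
    return result


# Hypothetical history of three logins with m = 3 features.
history = [
    [1.0, 5.0, 9.0],
    [1.2, 7.0, 9.1],
    [0.8, 3.0, 8.9],
]
thresholds = [5.0, 5.0, 5.0]

sensitive = distinguishing_features(history, thresholds, k=2)
strict = distinguishing_features(history, thresholds, k=30)
d_sensitive = len(sensitive)
d_strict = len(strict)
```

In this toy data the second feature straddles its threshold and is never distinguishing, while raising $k$ from 2 to 30 also drops the first feature, matching the observation that a larger $k$ yields fewer distinguishing features.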

The features $\phi_i$ that we use in the balance of this paper are described in [16, Section 3.2]. Each is defined by comparing the position of a vector characterizing a segment of the utterance to a fixed plane. This plane is a parameter of our scheme, and though we will rarely mention it below, the reader should be aware that the results we present are based on a plane that was selected, using our data, to optimize our measures in certain ways. On the one hand, this means that our results show what could be achieved with a good selection of this plane, and are thus optimistic in this regard. On the other hand, since this plane is selected by searching through a small set of candidate planes, (infinitely) many planes are omitted from this search. Consequently, it is likely that planes yielding better measures exist. The experimentation we have conducted thus far does not permit us to conclude how to select this plane in general, and this continues to be an area of our ongoing work.


fabian 2002-08-28