Michael K. Reiter
The inability of humans to generate and remember strong secrets makes it difficult for people to manage cryptographic keys. To address this problem, numerous proposals have been suggested to enable a human to repeatably generate a cryptographic key from her biometrics, where the strength of the key rests on the assumption that the measured biometrics have high entropy across the population. In this paper we show that, despite the fact that several researchers have examined the security of BKGs, the common techniques used to argue the security of practical systems are lacking. To address this issue we reexamine two well known, yet sometimes misunderstood, security requirements. We also present another that we believe has not received adequate attention in the literature, but is essential for practical biometric key generators. To demonstrate that each requirement has significant importance, we analyze three published schemes, and point out deficiencies in each. For example, in one case we show that failing to meet a requirement results in a construction where an attacker has a 22% chance of finding ostensibly 43-bit keys on her first guess. In another we show how an attacker who compromises a user's cryptographic key can then infer that user's biometric, thus revealing any other key generated using that biometric. We hope that by examining the pitfalls that occur continuously in the literature, we enable researchers and practitioners to more accurately analyze proposed constructions.
While cryptographic applications vary widely in terms of assumptions, constructions, and goals, all require cryptographic keys. In cases where a computer should not be trusted to protect cryptographic keys--as in laptop file encryption, where keeping the key on the laptop obviates the utility of the file encryption--the key must be input by its human operator. It is well known, however, that humans have difficulty choosing and remembering strong secrets (e.g., [2,14]). As a result, researchers have devoted significant effort to finding input that has sufficient unpredictability to be used in cryptographic applications, but that remains easy for humans to regenerate reliably. One of the more promising suggestions in this direction are biometrics--characteristics of human physiology or behavior. Biometrics are attractive as a means for key generation as they are easily reproducible by the legitimate user, yet potentially difficult for an adversary to guess.
There have been numerous proposals for generating cryptographic keys from biometrics (e.g., [33,34,28]). At a high level, these Biometric Cryptographic Key Generators, or BKGs, follow a similar design: during an enrollment phase, biometric samples from a user are collected; statistical functions, or features, are applied to the samples; and some representation of the output of these features is stored in a data structure called a biometric template. Later, the same user can present another sample, which is processed with the stored template to reproduce a key. A different user, however, should be unable to produce that key. Since the template itself is generally stored where the key is used (e.g., in a laptop file encryption application, on the laptop), a template must not leak any information about the key that it is used to reconstruct. That is, the threat model admits the capture of the template by the adversary; otherwise the template could be the cryptographic key itself, and biometrics would not be needed to reconstruct the key at all.
Generally, one measures the strength of a cryptographic key by its entropy, which quantifies the amount of uncertainty in the key from an adversary's point of view. If one regards a key generator as drawing an element uniformly at random from a large set, then the entropy of the keys can be easily computed as the base-two logarithm of the size of the set. Computing the entropy of keys output by a concrete instantiation of a key generator, however, is non-trivial because "choosing uniformly at random" is difficult to achieve in practice. This is in part due to the fact that the key generator's source of randomness may be based on information that is leaked by external sources. For instance, an oft-cited flaw in Kerberos version 4 allowed adversaries to guess ostensibly 56-bit DES session keys in only guesses . The problem stemmed from the fact that the seeding information input to the key generator was related to information that could be easily inferred by an adversary. In other words, this auxiliary information greatly reduced the entropy of the key space.
In the case of biometric key generators, where the randomness used to generate the keys comes from a user's biometric and is a function of the particular features used by the system, the aforementioned problems are compounded by several factors. For instance, in the case of certain biometric modalities, it is known that population statistics can be strong indicators of a specific user's biometric [10,36,3]. In other words, depending on the type of biometric and the set of features used by the BKG, access to population statistics can greatly reduce the entropy of a user's biometric, and consequently, reduce the entropy of her key. Moreover, templates could also leak information about the key. To complicate matters, in the context of biometric key generation, in addition to evaluating the strength of the key, one must also consider the privacy implications associated with using biometrics. Indeed, the protection of a user's biometric information is crucial, not only to preserve privacy, but also to enable that user to reuse the biometric key generator to manage a new key. We argue that this concern for privacy mandates not only that the template protect the biometric, but also that the keys output by a BKG not leak information about the biometric. Otherwise, the compromise of a key might render the user's biometric unusable for key generation thereafter.
The goal of this work is to distill the seemingly intertwined and complex security requirements of biometric key generators into a small set of requirements that facilitate practical security analyses by designers.
Specifically, the contributions of this paper are:
Throughout this paper we focus on the importance of considering adversaries who have access to public information, such as templates, when performing security evaluations. We hope that our observations will promote critical analyses of BKGs and temper the spread of flawed (or incorrectly evaluated) proposals.
To our knowledge, Soutar and Tomko  were the first to propose biometric key generation. Davida et al.  proposed an approach that uses iris codes, which are believed to have the highest entropy of all commonly-used biometrics. However, iris code collection can be considered somewhat invasive and the use of majority-decoding for error correction--a central ingredient of the Davida et al. approach--has been argued to have limited use in practice .
Monrose et al. proposed the first practical system that exploits behavioral (versus physiological) biometrics for key generation . Their technique uses keystroke latencies to increase the entropy of standard passwords. Their construction yields a key at least as secure as the password alone, and an empirical analysis showed that in some instances their approach increases the workload of an attacker by a multiplicative factor of . Later, a similar approach was used to generate cryptographic keys from voice [28,27]. Many constructions followed those of Monrose et al., using biometrics such as face , fingerprints [33,39], handwriting [40,17] and iris codes [16,45]. Unfortunately, many are susceptible to attacks. Hill-climbing attacks have been leveraged against fingerprint, face, and handwriting-based biometric systems [1,37,43] by exploiting information leaked during the reconstruction of the key from the biometric template.
There has also been an emergence of generative attacks against biometrics [5,23], which use auxiliary information such as population statistics along with limited information about a target user's biometric. The attacks we present in this paper are different from generative attacks because we assume that adversaries only have access to templates and auxiliary information. Our attacks, therefore, capture much more limited, and arguably more realistic, adversaries. Despite such limited information, we show how an attacker can recover a target user's key with high likelihood.
There has also been recent theoretical work to formalize particular aspects of biometric key generators. The idea of fuzzy cryptography was first introduced by Juels and Wattenberg , who describe a commitment scheme that supports noise-tolerant decommitments. In Section 7 we provide a concrete analysis of a published construction that highlights the pitfalls of using fuzzy commitments as biometric key generators. Further work included a fuzzy vault , which was later analyzed as an instance of a secure-sketch that can be used to build fuzzy extractors [11,6,12,22]. Fuzzy extractors treat biometric features as non-uniformly distributed, error-prone sources and apply error-correction algorithms and randomness extractors [18,30] to generate random strings.
Fuzzy cryptography has made important contributions by specifying formal security definitions with which BKGs can be analyzed. Nevertheless, there remains a gap between theoretical soundness and practical systems. For instance, while fuzzy extractors can be effectively used as a component in a larger biometric key generation system, they do not capture all the practical requirements of a BKG. In particular, it is unclear whether known constructions can correct the kinds of errors typically generated by humans, especially in the case of behavioral biometrics. Moreover, fuzzy extractors require biometric inputs with high min-entropy but do not address how to select features that achieve this requisite level of entropy. Since this is an inherently empirical question, much of our work is concerned with how to experimentally evaluate the entropy available in a biometric.
Lastly, Jain et al. enumerate possible attacks against biometric templates and discuss several practical approaches that increase template security . Similarly, Mansfield and Wayman discuss a set of best practices that may be used to measure the security and usability of biometric systems . While these works describe specific attacks and defenses against systems, they do not address biometric key generators and the unique requirements they demand.
Before we can argue about how to accurately assess biometric key generators (BKGs), we first define the algorithms and components associated with a BKG. These definitions are general enough to encompass most proposed BKGs.
BKGs are generally composed of two algorithms, an enrollment algorithm ( ) and a key-generation algorithm ( ):
The enrollment algorithm estimates the variation inherent to a particular user's biometric reading and computes information needed to error-correct a new sample that is sufficiently close to the enrollment samples. encodes this information into a template and outputs the template and the associated key. The key-generation algorithm uses the template output by the enrollment algorithm and a new biometric sample to output a key. If the provided sample is sufficiently similar to those provided during enrollment, then and output the same keys.
Generally speaking, there are four classes of information associated with a BKG.
For the remainder of the paper if a component of a BKG is associated with a specific user, then we subscript the information with the user's unique identifier. So, for example, , is 's biometric and is auxiliary information derived from all users .
In the context of biometric key generation, security is not as easily defined as correctness. Loosely speaking, a secure BKG outputs a key that "looks random" to any adversary that cannot guess the biometric. In addition, the templates and keys derived by the BKG should not leak any information about the biometric that was used to create them. We enumerate a set of three security requirements for biometric key generators, and examine the components that should be analyzed mathematically (i.e., the template and key) and empirically (i.e., the biometric and auxiliary information). While the necessity of the first two requirements has been understood to some degree, we will highlight and analyze how previous evaluations of these requirements are lacking. Additionally, we discuss a requirement that is often overlooked in the practical literature, but one which we believe is necessary for a secure and practical BKG.
We consider a BKG secure if it meets the following three requirements for each enrollable user in a population:
The necessity of and is well known, and indeed many proposals make some sort of effort to argue security along these lines (see, e.g., [29,11]). However, many different approaches are used to make these arguments. Some take a cryptographically formal approach, whereas others provide an empirical evaluation aimed at demonstrating that the biometrics and the generated keys have high entropy. Unfortunately, the level of rigor can vary between works, and differences in the ways and are typically argued make it difficult to compare approaches. Also, it is not always clear that the empirical assumptions required by the cryptographic algorithms of the BKG can be met in practice.
Even more problematic is that many approaches for demonstrating biometric security merely provide some sort of measure of entropy of a biometric (or key) based on variation across a population. For example, one common approach is to compute biometric features for each user in a population, and compute the entropy over the output of these features. However, such analyses are generally lacking on two counts. For one, if the correlation between features is not accounted for, the reported entropy of the scheme being evaluated could be much higher than what an adversary must overcome in practice. Second, such techniques fail to compute entropy as a function of the biometric templates, which we argue should be assumed to be publicly available. Consequently, such calculations would declare a BKG "secure" even if, say, the template leaked information about the derived key. For example, suppose that a BKG uses only one feature and simply quantizes the feature space, outputting as a key the region of the feature space that contains the majority of the measurements of a specific user's feature. The quantization is likely to vary between users, and so the partitioning information would need to be stored in each user's template. Possession of the template thus reduces the set of possible keys, as it defines how the feature space is partitioned.
As far as we know, the notion of Strong Biometric Privacy ( ) has only been considered recently, and only in a theoretical setting . Even the original definitions of fuzzy extractors [11, Definition 3] do not explicitly address this requirement. Unfortunately, has also largely been ignored by the designers of practical systems. Perhaps this oversight is due to lack of perceived practical motivation--it is not immediately clear that a key could be used to reveal a user's biometric. Indeed, to our knowledge, there have been few, if any, concrete attacks that have used keys and templates to infer a user's biometric. We observe, however, that it is precisely practical situations that motivate such a requirement; keys output by a BKG could be revealed for any number of reasons in real systems (e.g., side-channel attacks against encryption keys, or the verification of a MAC). If a key can be used to derive the biometric that was used to generate the key, then key recovery poses a severe privacy concern. Moreover, key compromise would then preclude a user from using this biometric in a BKG ever again, as the adversary would be able to recreate any key the user makes thereafter. Therefore, in Section 7 we provide specific practical motivation for this requirement by describing an attack against a well-accepted BKG. The attack combines the key and a template to infer a user's biometric.
In what follows, we provide practical motivation for the importance of each of our three requirements by analyzing three published BKGs. It is not our goal to fault specific constructions, but instead to critique evaluation techniques that have become standard practice in the field. We chose to analyze these specific BKGs because each was argued to be secure using "standard" techniques. However, we show that since these techniques do not address important requirements, each of these constructions exhibit significant weaknesses despite security arguments to the contrary.
Before continuing further, we note that analyzing the security of a biometric key generator is a challenging task. A comprehensive approach to biometric security should consider sources of auxiliary information, as well as the impact of human forgers. Though it may seem impractical to consider the latter as a potential threat to a standard key generator, skilled humans can be used to generate initial forgeries that an algorithmic approach can then leverage to undermine the security of the BKG.
To this point, research has accepted this "adversarial multiplicity" without examining the consequences in great detail. Many works (e.g., [33,29,17,15,40,16]) report both False Accept Rates (i.e., how often a human can forge a biometric) and an estimate of key entropy (i.e., the supposed difficulty an algorithm must overcome in order to guess a key) without specifically identifying the intended adversary. In this work, we focus on algorithmic adversaries given their importance in offline guessing attacks, and because we have already addressed the importance of considering human-aided forgeries [4,5]. While our previous work did not address biometric key generators specifically, those lessons apply equally to this case.
The security of biometric key generators in the face of algorithmic adversaries has been argued in several different ways, and each approach has advantages and disadvantages. Theoretical approaches (e.g., [11,6]) begin by assuming that the biometrics have high adversarial min-entropy (i.e., conditioned on all the auxiliary information available to an adversary, the entropy of the biometric is still high) and then proceed to distill this entropy into a key that is statistically close to uniform. However, in practice, it is not always clear how to estimate the uncertainty of a biometric. In more practical settings, guessing entropy  has been used to measure the strength of keys (e.g., [29,27,10]), as it is easily computed from empirical data. Unfortunately, as we demonstrate shortly, guessing entropy is a summary statistic and can thus yield misleading results when computed over skewed distributions. Yet another common approach (e.g., [31,7,16,41,17]), which has lead to somewhat misleading views on security, is to argue key strength by computing the Shannon entropy of the key distribution over a population. More precisely, if we consider a BKG that assigns the key to a user in a population , then it is considered "secure" if the entropy of the distribution is high. We note, however, that the entropy of the previous distribution measures only key uniqueness and says nothing about how difficult it is for an adversary to guess the key. In fact, it is not difficult to design BKGs that output keys with maximum entropy in the previous sense, but whose keys are easy for an adversary to guess; setting is a trivial example.
To address these issues, we present a new measure that is easy to compute empirically and that estimates the difficulty an adversary will have in guessing the output of a distribution given some related auxiliary distribution. It can be used to empirically estimate the entropy of a biometric for any adversary that assumes the biometric is distributed similarly to the auxiliary distribution. Our proposition, Guessing Distance, involves determining the number of guesses that an adversary must make to identify a biometric or key, and how the number of guesses are reduced in light of various forms of auxiliary information.
We assume that a specific user induces a distribution over a finite, -element set . We also assume that an adversary has access to population statistics that also induce a distribution, , over . could be computed from the distributions of other users . We seek to quantify how useful is at predicting . The specification of varies depending on the BKG being analyzed; could be a set of biometrics, a set of possible feature outputs, or a set of keys. It is up to system designers to use the specification of that would most likely be used by an adversary. For instance, if the output of features are easier to guess than a biometric, then should be defined as the set of possible feature outputs. Although at this point we keep the definition of and abstract, it is important when assessing the security of a construction to take as much auxiliary information as possible into account when estimating . We return in Section 5 with an example of such an analysis.
We desire a measure that estimates the number of guesses that an adversary will make to find the high-probability elements of , when guessing by enumerating starting with the most likely elements as prescribed by . That is, our measure need not precisely capture the distance between and (as might, say, -distance or Relative Entropy), but rather must capture simply 's ability to predict the most likely elements as described by 1. Given a user's distribution , and two (potentially different) population distributions and , we would like the distance between and and and to be the same if and only if and prescribe the same guessing strategy for a random variable distributed according to . For example, consider the distributions , and , and the element such that , , and . Here, an adversary with access to would require the same number of guesses to find as an adversary with access to (one). Thus, we would like the distance between and and between and to be the same.
Guessing Distance measures the number of guesses that an adversary who assumes that makes before guessing the most likely element as prescribed by (that is, )2. We take the average over and as it may be the case that several elements may have similar probability masses under . In such a situation, the ordering of may be ambiguous, so we report an average measure across all equivalent orderings. As and will typically be empirical estimates, we use a tolerance to offset small measurement errors when grouping elements of similar probability masses. The subscript is ignored if .
An important characteristic of is that it compares two probability distributions. This allows for a more fine-tuned evaluation as one can compute for each user in the population. To see the overall strength of a proposed approach, one might report a CDF of the 's for each user, or report the minimum over all 's in the population.
Guessing Distance is superficially similar to Guessing Entropy , which is commonly used to compute the expected number of guesses it takes to find an average element in a set assuming an optimal guessing strategy (i.e., first guessing the element with the highest likelihood, followed by guessing the element with the second highest likelihood, etc.) Indeed, one might view Guessing Distance as an extension of Guessing Entropy (see Appendix A); however, we prefer Guessing Distance as a measure of security as it provides more information about non-uniform distributions over a key space. For such distributions, Guessing Entropy is increased by the elements that have a low probability, and thus might not provide as conservative an estimate of security as desired. Guessing Distance, on the other hand, can be computed for each user, which brings to light the insecurity afforded by a non-uniform distribution. We provide a concrete example of such a case in Appendix A.
We now show why templates play a crucial role in the computation of key entropy ( from Section 3). Our analysis brings to light two points: first that templates, and in particular, error-correction information, can indeed leak a substantial amount of information about a key, and thus must be considered when computing key entropy. Second, we show how standard approaches to computing key entropy, even if they were to take templates into account, must be conducted with care to avoid common pitfalls. Through our analysis we demonstrate the flexibility and utility of Guessing Distance. While we focus here on a specific proposal by Vielhauer and Steinmetz [40,41], we argue that our results are generally applicable to a host of similar proposals (see, e.g., [44,7,35,17]) that use per-user feature-space quantization for error correction. This complicates the calculation of entropy and brings to light common pitfalls.
The construction works as follows. Given 50 features  that map biometric samples to the set of non-negative integers, and enrollment samples , let be the difference between the minimum and maximum value of , expanded by a small tolerance. The scheme partitions the output range of into -length segments. The key is derived by letting be the smallest integer in the segment that contains the user's samples, computing , and setting the key element . The key is , and the template is composed of . To later extract given a biometric sample , and a template , set and output . We refer the reader to  for details on correctness.
As is the case in many other proposals, Vielhauer et al. perform an analysis that addresses requirement by arguing that given that the template leaks only error correcting information (i.e., the partitioning of the feature space) it does not indicate the values . To support this argument, they conduct an empirical evaluation to measure the Shannon entropy of each . For each user they derive from and , then compute the entropy of each element across all users. This analysis is a standard estimate of entropy. To see why this is inaccurate, consider two different users and such that outputs consistent values on feature and does not. Then the partitioning over 's range differs for each user. Thus, even if the mean value of is the same when measured over both 's and 's samples, this mean will be mapped to different partitions in the feature space, and thus, a different key. This implies that computing entropy over the overestimates security because the mapping induced by the templates artificially amplifies the entropy of the biometrics. A more realistic estimate of the utility afforded an adversary by auxiliary information can be achieved by fixing a user's template, and using that template to error-correct every other user's samples to generate a list of keys, then measuring how close those keys are to the target user's key. By conditioning the estimate on the target users template we are able to eliminate the artificial inflation of entropy and provide a better estimate of the security afforded by the construction.
We implemented the construction and tested the technique using all of the passphrases in the data set we collected in , which consists of over 9,000 writing samples from 47 users. Each user wrote the same five passphrases 10-20 times. In our analysis we follow the standard approach to isolate the entropy associated with the biometric: we compute various entropy measures using each user's rendering of the same passphrase [29,5,36] (this approach is justified as user selected passphrases are assumed to have low entropy). Tolerance values were set such that the approach achieved a False Reject Rate (FRR) of (as reported in ) and all outliers and samples from users who failed to enroll  were removed.
Figure 1 shows three different measures of key uncertainty. The first measure, denoted Standard, is the common measure of interpersonal variation as reported in the literature (e.g., [17,7]) using the data from our experiments. Namely, if the key element has entropy across the entire population, then the entropy of the key space is computed as . We also show two estimates of guessing distance, the first ( , plotted as GD-P) does not take a target user's template into account and is just the distribution over all other users's keys in the population (the techniques we use to compute these estimates are described in Appendix B). The second ( , plotted as GD-U) takes the user's template into account, computing by taking the biometrics from all other users in the population, and generating keys using , then computing the distribution over these keys.
Figure 1 shows the CDF of the number of guesses that one would expect an adversary to make to find each user's key. There are several important points to take away from these results. The first is the common pitfalls associated with computing key entropy. The difference between and the standard measurement indicates that the standard measurement of entropy ( bits in this case) is clearly misleading--under this view one would expect an adversary to make guesses on average before finding a key. However, from it is clear that an adversary can do substantially better than this. The difference in estimates is due to the fact that takes into account conditional information between features whereas a more standard measure does not.
The second point is the impact of a user's template on computing . We can see by examining that if we take the usual approach of just computing entropy over the keys, and ignore each user's template, we would assume only a small probability of guessing a key in fewer than attempts. On the other hand, since the templates reduce the possible key space for each user, the estimate provides a more realistic measurement. In fact, an adversary with access to population statistics has a chance of guessing a user's key in fewer than attempts, and chance guessing a key in a single attempt!
These results also shed light on another pitfall worth mentioning--namely, that of reporting an average case estimate of key strength. If we take the target user's template into account in the current construction, of the keys can be guessed in one attempt despite the estimated Guessing Entropy being approximately . In summary, this analysis highlights the importance of conditioning entropy estimates on publicly available templates, and how several common entropy measures can result in misleading estimates of security.
Recall that a scheme that achieves Weak Biometric Privacy uses templates that do not leak information about the biometrics input during enrollment. A standard approach to arguing that a scheme achieves is to show (1) auxiliary information leaks little useful information about the biometrics, and (2) templates do not leak information about a biometric. This can be problematic as the two steps are generally performed in isolation. In our description of , however, we argue that step (2) should actually show that an adversary with access to both templates and auxiliary information should learn no information about the biometric. The key difference here is that auxiliary information is used in both steps (1) and (2). This is essential as it is not difficult to create templates that are secure when considered in isolation, but are insecure once we consider knowledge derived from other users (e.g., population-wide statistics). In what follows we shed light on this important consideration by examining the scheme of Hao and Wah . While our analysis focuses on their construction, it is pertinent to any BKG that stores partial information about the biometric in the template [43,26].
For completeness, we briefly review the construction. The BKG generates DSA signing keys from dynamic features associated with handwriting (e.g., pen tip velocity or writing time). The range of each feature is quantized based on a user's natural variation over the feature. Each partition of a feature's range is assigned a unique integer; let be the integer that corresponds to the partition containing the output of feature when applied to the user's biometric. The signing key is computed as . The template stores information that describes the partitions for each feature, as well as the coordinates that define the pen strokes of the enrollment samples, and the verification key corresponding to . The coordinates of the enrollment samples are used as input to the Dynamic Time Warping  algorithm during subsequent key generation; if the provided sample diverges too greatly from the original samples, it is immediately rejected and key generation aborted.
Hao et al. performed a typical analysis of . First, they compute the entropy of the features over the entire population of users to show that auxiliary information leaks little information that could be used to discern the biometric. Second, the template is argued to be secure by making the following three observations. First, since the template only specifies the partitioning of the range of each feature, the template only leaks the variation in each feature, not the output. Second, for a computationally bound adversary, a DSA verification key leaks no information about the DSA signing key. Third, since the BKG employs only dynamic features, the static coordinates leak no relevant information. Note that in this analysis the template is analyzed without considering auxiliary information. Unfortunately, while by themselves the auxiliary information and the templates seem to be of little use to an adversary, when taken together, the biometric can be easily recovered.
To empirically evaluate our attack, we used the same data set as in Section 5. Our implementation of the BKG had a FRR of 29.2% and a False Accept Rate (FAR) of , which is inline with the FRR/FAR of 28%/1.2% reported in . Moreover, if we follow the computation of inter-personal variation as described in , then we would incorrectly conclude that the scheme creates keys with over 40 bits of entropy with our data set, which is the same estimate provided in . However, this is not the case (see Figure 2). In particular, the fact that the template leaks information about the biometric enables an attack that successfully recreates the key of the time on the first try; approximately of the keys are correctly identified after making fewer than guesses. In summary, the significance of this analysis does not lie in the effectiveness of the described attack, but more so in the fact that the original analysis failed to take auxiliary information into consideration when evaluating the security of the template.
Lastly, we highlight the importance of quantifying the privacy of a user's biometric against adversaries who have access to the cryptographic key (i.e., from Section 3). We examine a BKG proposed by Hao et al. .4 The construction generates a random key and then "locks" it with a user's iris code. The construction uses a cryptographic hash function and a "concatenated" error correction code consisting of an encoding algorithm , and the corresponding decoding algorithm . This error correction code is the composition of a Reed-Solomon and Hadamard code [16, Section 3]. Iris codes are elements in .
The BKG works as follows: given a user's iris code , select a random string , and derive the template , and output and . To later derive the key given an iris code and the template , compute . If , then output , otherwise, fail. If and are "close" to one another, then is "close" to , perhaps differing in only a few bits. The error correcting code handles these errors, yielding .
Hao et al. provide a security analysis arguing requirement using both cryptographic reasoning and a standard estimate of entropy of the input biometric. That is, they provide empirical evidence that auxiliary information cannot be used to guess a target user's biometric, and a cryptographic argument that, assuming the former, the template and auxiliary information cannot be used to guess a key. They conservatively estimate the entropy of to be 44 bits. Moreover, the authors note that if the key is ever compromised, the system can be used to "lock" a new key, since is selected at random and is not a function of the biometric.
Unfortunately, given the current construction, compromise of , in addition to the public information , allows one to completely reconstruct . Thus, even if a user were to create a new template and key pair, an adversary could use the old template and key to derive the biometric, and then use the biometric to unlock the new key. The significance of this is worth restating: because this BKG fails to meet , the privacy of a user's biometric is completely undermined once any key for that user is ever compromised.
To underscore the practical significance of each of these requirements, we present analyses of three published systems. While we point out weaknesses in specific constructions, it is not our goal to fault the those specific works. Instead, we aim to bring to light flaws in the standard approaches that were followed in each setting. In one case we exploit auxiliary information to show that an attacker can guess of the keys on her first attempt. In another case, we highlight the importance of ensuring biometric privacy by exploiting the information leaked by templates to yield a chance of guessing a user's key in one attempt. Lastly, we show that subtleties in BKG design can lead to flaws that allow an adversary to derive a user's biometric given a compromised key and template, thereby completely undermining the security of the scheme.
We hope that our work encourages designers and evaluators to analyze BKGs with a degree of skepticism, and to question claims of security that overlook the requirements presented herein. To facilitate this type of approach, we not only ensure that our requirements can be applied to real systems, but also introduce Guessing Distance--a heuristic measure that estimates the uncertainty of the outputs of a BKG given access to population statistics.
Guessing Entropy  is commonly used for measuring the expected number of guesses it takes to find an average element in a set assuming an optimal guessing strategy (i.e., first guessing the element with the highest likelihood, followed by guessing the element with the second highest likelihood, etc.). Given a distribution over and the convention that , Guessing Entropy is computed as .
Guessing Entropy is commonly used to determine how many guesses an adversary will take to guess a key. At first, Guessing Entropy and Guessing Distance appear to be quite similar. However, there is one important difference: Guessing Entropy is a summary statistic and Guessing Distance is not. While Guessing Entropy provides an intuitive and accurate estimate over distributions that are close to uniform, the fact that there is one measure of strength for all users in the population may result in somewhat misleading results when Guessing Entropy is computed over skewed distributions.
To see why this is the case, consider the following distribution: let be defined over as , and for . That is, one element (or key) is output by 50% of the users and the remaining elements are output with equal likelihood. The Guessing Entropy of is:
Thus, although the expected number of guesses to correctly select is approximately , over half of the population's keys are correctly guessed on the first attempt following the optimal strategy. To contrast this, consider an analysis of Guessing Distance with threshold . (Assume for exposition that distributions are estimated from a population of users.) To do so, evaluate each user in the population independently. Given a population of users, first remove a user to compute and use the remaining users to compute . Repeat this process for the entire population.
In the case of our pathological distribution, we may consider only two users without loss of generality: a user with distribution who outputs key , and user with distribution who outputs key . In the first case, we have , because the majority of the mass according to is assigned to , which is the most likely element according to . For , we have and , and thus . Taking the minimum value (or even reporting a CDF) shows that for a large proportion of the population (all users with distribution ), this distribution offers no security--a fact that is immediately lost if we only consider a summary statistic. However, it is comforting to note, that if we compute the average of over all users, we obtain estimates that are identical to that of guessing entropy for sets that are sufficiently large:
As noted in Section 5, it is difficult to obtain a meaningful estimate of probability distributions over large sets, e.g., . In order to quantify the security defined by a system, it is necessary to find techniques to derive meaningful estimates. This Appendix discusses how we estimate . The estimate also implicitely defines an algorithm that can be used to guess keys.
For convenience we use to denote both a biometric feature and the random variable that is defined using population statistics over (taken over the set ). If a distribution is not subscripted, it is understood to be taken over the key space . Our estimate uses of several tools from information theory:
We use the notation to denote the mutual information between the random variable and the random variable defined by the joint distribution between the random variables and .
Let be the guessing distance between the user's and population's distribution over conditioned on the even that . In particular, let be the elements of ordered such that
As before, let , and and be the smallest and largest indexes such that
Then, . In other words, if an adversary assumes that a target user is distributed according to the population and fixes the values of certain features, this is the number of guesses she will need to make to guess another feature. Unfortunately, this quantity is also infeasible to compute in light of data constraints so we endeavor to find an easily computable estimate. To this end, define the weight ( ) of an element in as:
The weights of elements that are more likely to occur given the values of other features will be larger than the weights that are less likely to occur. Intuitively, each of the values ( ) has an influence on and those values that correspond to features that have a higher correlation with have more influence. We also note that we only use two levels of conditional probabilities, which are relatively easy to compute, instead of conditioning over the entire space. Now, we use the weights to estimate the probability distributions as:
Note that while this technique may not provide a perfect estimate of each probability, our goal is to discover the relative magnitude of the probabilities because they will be used to estimate Guessing Distance. We believe that this approach achieves this goal.
We are almost ready to provide an estimate of . First, we specify an ordering for the features. The ordering will be according to an ordering measure ( ) such that features with a larger measure have a low entropy (and are therefore easier to guess) and have a high correlation with other features. An adversary could then use this ordering to reduce the number of guesses in a search by first guessing features with a higher measure. Define the feature-ordering measure for as:
Finally, we reindex the features such that for all , and estimate the guessing distance for a specific user with as:
This estimate helps in modeling an adversary that performs a brute-force search over all of the features by starting with the features that are easiest to guess and using those features to reduce the uncertainty about features that are more difficult to guess. For each feature, the adversary will need to make guesses to find the correct value. Since each incorrect guess ( of them) will cause a fruitless enumeration of the rest of the features, we multiply the number of incorrect guesses by the sizes of the ranges of the remaining features. Finally, we take the log to represent the number of guesses as bits.
Section 5 uses this estimation technique to measure of a user versus the population ( ), and for a user versus the population conditioned on the user's template ( ). The only way in which the estimation technique differs between the two settings is the definition of . In the case of , is computed by measuring the key element for every other user in the population. In the case of , is computed using all of the other user's samples in conjuction with the target user's template to derive a set of keys and taking the distribution over the th element of the keys.