

Password distribution

In this section we describe how we approximately compute $\Pr\left[{p}^{(k)} \leftarrow {\cal S}\right]$ for any ${p}^{(k)}$, i.e., the probability that the scheme yields the password ${p}^{(k)}$. This probability is taken with respect to both random choices by the password selection algorithm and user choices.

We compute this probability inductively as follows. Suppose ${p}^{(\ell+1)} = {q}^{(\ell)}{r}^{(1)}$. Then

\begin{displaymath}
\Pr\left[{p}^{(\ell+1)} \leftarrow {\cal S}\right] =
\Pr\left[{q}^{(\ell)} \leftarrow {\cal S}\right] \cdot
\Pr\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\mid
{q}^{(\ell)} \leftarrow {\cal S}\right]
\end{displaymath} (1)

if ${p}^{(\ell+1)}$ is valid for ${\cal S}$ and zero otherwise, where $\Pr\left[{q}^{(0)} \leftarrow {\cal S}\right] {\:\stackrel{{\scriptscriptstyle\rm def}}{=}\:}1$. Here, ${p}^{(\ell+1)}$ is valid iff $\ell < k$ and, for the Story scheme, ${p}^{(\ell+1)}$ does not contain any category more than once. The second factor $\Pr\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\mid {q}^{(\ell)} \leftarrow {\cal S}\right]$ should be understood to mean the probability that the user selects ${r}^{(1)}$ after having already selected ${q}^{(\ell)}$ according to scheme ${\cal S}$. If the dataset contains sufficiently many observations, then this can be approximated by
\begin{displaymath}
\Pr\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\mid
{q}^{(\ell)} \leftarrow {\cal S}\right] \approx
\frac{\#\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\right]}
{\#\left[{q}^{(\ell)} \leftarrow {\cal S}\right]}
\end{displaymath} (2)

i.e., using maximum likelihood estimation, where $\#\left[{x}^{(\ell)} \leftarrow {\cal S}\right]$ denotes the number of occurrences of ${x}^{(\ell)} \leftarrow {\cal S}$ in our dataset, and where $\#\left[{x}^{(0)} \leftarrow {\cal S}\right]$ is defined to be the number of passwords for scheme ${\cal S}$ in our dataset.
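To make (1) and (2) concrete, here is a minimal Python sketch (not part of the original analysis) of the counting-based estimate. The names prefix_counts and mle_probability are hypothetical, the toy data is invented, and the validity check on ${p}^{(\ell+1)}$ (e.g., the no-repeated-category rule in Story) is omitted for brevity.

\begin{verbatim}
from collections import Counter
from typing import Sequence, Tuple

def prefix_counts(passwords: Sequence[Tuple[str, ...]]) -> Counter:
    """Count #[x^(l) <- S] for every nonempty prefix of every observed password."""
    counts = Counter()
    for pw in passwords:
        for l in range(1, len(pw) + 1):
            counts[pw[:l]] += 1
    return counts

def mle_probability(pw, passwords):
    """Approximate Pr[p^(k) <- S] as the product (1) of conditional
    probabilities, each estimated by the count ratio (2)."""
    counts = prefix_counts(passwords)
    prob = 1.0
    for l in range(len(pw)):
        prefix, extended = pw[:l], pw[:l + 1]
        # #[q^(0) <- S] is defined as the number of passwords in the dataset.
        denom = counts[prefix] if l > 0 else len(passwords)
        if denom == 0 or counts[extended] == 0:
            return 0.0   # no support in the data for this choice
        prob *= counts[extended] / denom
    return prob

# Toy usage with hypothetical two-round passwords:
data = [("cat_a", "cat_b"), ("cat_a", "cat_c"), ("cat_a", "cat_b")]
print(mle_probability(("cat_a", "cat_b"), data))   # (3/3) * (2/3) = 0.667
\end{verbatim}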

A necessary condition for the denominator of (2) to be nonzero for every possible ${q}^{(k-1)}$ is that the dataset contain $N^{k-1}$ samples for scheme ${\cal S}$, where $N \ge n$ denotes the number of image categories for ${\cal S}$ ($N=12$ in Face, and $N=9$ in Story). $N^{k-1}$ is over 1700 in the Face scheme, for example. And, of course, to use (2) directly to perform a meaningful approximation, significantly more samples would be required. Thus, we introduce a simplifying Markov assumption: a user's next decision is influenced only by her immediately prior decision(s) (e.g., see [16]). In other words, rather than conditioning on all of the previous choices made in a password (${q}^{(\ell)}$), only the last few choices are taken into account. Let $\ldots {x}^{(\ell)} \leftarrow {\cal S}$ denote the selection of an $\ell'$-tuple, $\ell' \ge \ell$, for which the most recent $\ell$ selections are ${x}^{(\ell)}$.

Assumption 4.1   There exists a constant $\hat{\ell} \ge 0$ such that if $\ell \ge \hat{\ell}$ then

\begin{displaymath}
\Pr\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\mid
{q}^{(\ell)} \leftarrow {\cal S}\right] \approx
\Pr\left[\ldots{s}^{(\hat{\ell})}{r}^{(1)} \leftarrow {\cal S}\mid
\ldots{s}^{(\hat{\ell})} \leftarrow {\cal S}\right]
\end{displaymath} (3)

where ${s}^{(\hat{\ell})}$ is the $\hat{\ell}$-length suffix of ${q}^{(\ell)}$. We denote probabilities under this assumption by $\Pr_{\hat{\ell}}[\cdot]$.

In other words, we assume that if $\ell \ge \hat{\ell}$, then the user's next selection ${r}^{(1)}$ is influenced only by her last $\hat{\ell}$ choices. This appears to be a reasonable assumption, which is anecdotally supported by certain survey answers, such as the following from a user of the Face scheme.
``To start, I chose a face that stood out from the group, and then I picked the closest face that seemed to match.''
While this user's intention may have been to choose a selection similar to the first image she selected, we conjecture that the most recent image she selected, being most freshly on her mind, influenced her next choice at least as much as the first one did. Assumption 4.1 also seems reasonable for the Story scheme on the whole, since users who selected passwords by choosing a story were presumably trying to continue a story based on what they previously selected.

Assumption 4.1 permits us to replace (2) by

\begin{displaymath}
{\textstyle\Pr_{\hat{\ell}}}\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\mid
{q}^{(\ell)} \leftarrow {\cal S}\right] \approx
\frac{\#\left[\ldots{s}^{(\hat{\ell})}{r}^{(1)} \leftarrow {\cal S}\right]}
{\#\left[\ldots{s}^{(\hat{\ell})} \leftarrow {\cal S}\right]}
\end{displaymath} (4)

where ${s}^{(\hat{\ell})}$ is the $\hat{\ell}$-length suffix of ${q}^{(\ell)}$ and we define $\#\left[\ldots{s}^{(0)} \leftarrow {\cal S}\right]$ to be the total number of category choices ($k$ times the number of passwords) in our dataset for scheme ${\cal S}$. Here, the necessary condition for the denominator of (4) to be nonzero for each ${s}^{(\hat{\ell})}$ is that the dataset for ${\cal S}$ contain $N^{\hat{\ell}}$ samples, e.g., in the Face scheme, twelve for $\hat{\ell} = 1$, and so on.
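The suffix-based estimate (4) can be sketched in the same hypothetical Python style. The names suffix_counts and markov_conditional are illustrative only, and the code assumes $\ell \ge \hat{\ell}$, as in Assumption 4.1.

\begin{verbatim}
from collections import Counter
from typing import Sequence, Tuple

def suffix_counts(passwords: Sequence[Tuple[str, ...]], l_hat: int) -> Counter:
    """Count #[...s^(l_hat) <- S]: observed prefixes whose most recent
    l_hat selections equal s^(l_hat), i.e., sliding windows of length
    l_hat over each password."""
    counts = Counter()
    for pw in passwords:
        for end in range(l_hat, len(pw) + 1):
            counts[pw[end - l_hat:end]] += 1
    return counts

def markov_conditional(q, r, passwords, l_hat):
    """Approximate Pr_{l_hat}[q r <- S | q <- S] via (4), conditioning
    only on the l_hat-length suffix s of q."""
    s = q[len(q) - l_hat:] if l_hat > 0 else ()
    num = suffix_counts(passwords, l_hat + 1)[s + (r,)]
    # #[...s^(0) <- S] is the total number of category choices in the dataset.
    den = (suffix_counts(passwords, l_hat)[s] if l_hat > 0
           else sum(len(pw) for pw in passwords))
    return num / den if den > 0 else 0.0
\end{verbatim}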

We further augment the above approach with smoothing in order to compensate for gaps in the data (cf. [16]). Specifically, we replace (4) with

\begin{displaymath}
{\textstyle\Pr_{\hat{\ell}}}\left[{q}^{(\ell)}{r}^{(1)} \leftarrow {\cal S}\mid
{q}^{(\ell)} \leftarrow {\cal S}\right] \approx
\frac{\#\left[\ldots{s}^{(\hat{\ell})}{r}^{(1)} \leftarrow {\cal S}\right]
+ \lambda_{\hat{\ell}} \Psi_{\hat{\ell}-1}}
{\#\left[\ldots{s}^{(\hat{\ell})} \leftarrow {\cal S}\right]
+ \lambda_{\hat{\ell}}}
\end{displaymath} (5)

where ${s}^{(\hat{\ell})}$ is the $\hat{\ell}$-length suffix of ${q}^{(\ell)}$; $\lambda_{\hat{\ell}} > 0$ is a real-valued parameter; and where if $\hat{\ell} > 0$ then

\begin{displaymath}
\Psi_{\hat{\ell}-1} =
{\textstyle \Pr_{\hat{\ell}-1}}\left[{q}^{(\ell)}{r}^{(1)}
\leftarrow {\cal S}\mid {q}^{(\ell)} \leftarrow {\cal S}\right]
\end{displaymath}

and $\Psi_{\hat{\ell}-1} = 1/N$ otherwise. Note that as $\lambda_{\hat{\ell}}$ is reduced toward $0$, (5) converges toward (4). And, as $\lambda_{\hat{\ell}}$ is increased, (5) converges toward $\Psi_{\hat{\ell}-1}$, i.e., a probability under Assumption 4.1 for $\hat{\ell}-1$, a stronger assumption. So, with sufficient data, we can use a small $\lambda_{\hat{\ell}}$ and thus a weaker assumption. Otherwise, using a small $\lambda_{\hat{\ell}}$ risks relying too heavily on a small number of occurrences of $\ldots{s}^{(\hat{\ell})} \leftarrow {\cal S}$, and so we use a large $\lambda_{\hat{\ell}}$ and thus the stronger assumption.
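As a sketch of the smoothing step (5), the following reuses the hypothetical suffix_counts helper from the previous sketch. One natural reading, adopted here, is that $\Psi_{\hat{\ell}-1}$ is itself the smoothed estimate at $\hat{\ell}-1$, computed recursively and bottoming out at $1/N$; a single $\lambda$ is also used at every level for simplicity, whereas the text allows a separate $\lambda_{\hat{\ell}}$ per level.

\begin{verbatim}
def smoothed_conditional(q, r, passwords, l_hat, lam, n_categories):
    """Approximate Pr_{l_hat}[q r <- S | q <- S] via (5): the raw counts
    of (4) are smoothed toward Psi_{l_hat-1}, computed here by recursing
    on a shorter history and bottoming out at the uniform value 1/N."""
    if l_hat > 0:
        psi = smoothed_conditional(q, r, passwords, l_hat - 1,
                                   lam, n_categories)
    else:
        psi = 1.0 / n_categories          # Psi_{-1} = 1/N
    s = q[len(q) - l_hat:] if l_hat > 0 else ()
    num = suffix_counts(passwords, l_hat + 1)[s + (r,)]
    den = (suffix_counts(passwords, l_hat)[s] if l_hat > 0
           else sum(len(pw) for pw in passwords))
    return (num + lam * psi) / (den + lam)
\end{verbatim}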

