November 18, 2023
Honeyword is a representative “honey” technique for detecting intruders by luring them with decoy data. Techniques of this kind blend a primary object (drawn from distribution \(P\)) with decoy samples (drawn from distribution \(Q\)). In this work, we focus on two key Honeyword security metrics: (1) the flatness function (\(\epsilon_{\mathcal{A}}(i)\)), the probability that a distinguishing attacker \(\mathcal{A}\) (a probabilistic Turing machine) succeeds in picking the real password out of a user's honey objects within \(i\) attempts, and (2) the success-number function (\(\lambda_U^{\mathcal{A}}(i)\)), the expected success number of \(\mathcal{A}\) (the number of real passwords it submits) in a Honeyword system of \(U\) users before its failure number (the number of honeywords it submits) reaches a threshold \(i\). Previous researchers have relied on experimental methods to estimate these values. We derive theoretical formulas for both metrics under the strongest attacker \(\mathcal{A}\), who follows the optimal guessing strategy, a first in the field.
The mathematical structures of these metrics are intriguing: the flatness function can be expressed as \(\epsilon(i)=\sum_{j=1}^{i}\int_{0}^{+\infty}\tbinom{k-1}{j-1} f(x)G^{k-j}(x)(1-G(x))^{j-1}dx\). In particular, the most important value, \(\epsilon(1)\), equals \(\frac{1}{k}(M-\int_{0}^{M}G^k(x)dx)+b\), where \(M=\max_{x: Q(x)\neq 0}\frac{P(x)}{Q(x)}\), \(b=\sum_{x: Q(x)=0}P(x)\), and \(G\) is a cumulative distribution function derived from \(P\) and \(Q\). This formula yields a criterion for comparing honey distributions: the one with smaller \(M\) and \(b\) is preferable. The mathematical structure of the success-number function is a series of convolutions with beta distribution kernels: \(\lambda_U(i)=U\sum_{j=1}^{i}\int_{\frac{1}{k}}^{1} \frac{\phi(x)}{1-\phi(x)} \tbinom{U-1}{j-1} x^{U-j}(1-x)^{j-1}dx\), where \(U\) is the number of users in the system and \(\phi(x)\) is a monotonically increasing function. To illustrate, we carry out several representative calculations. Our findings offer insights into security assessments for Honeyword and similar honey techniques, contributing to enhanced security measures in these systems.
Honeyword, Security analysis, Flatness, Success-number
Honeyword is a simple but creative scheme designed to detect password database breaches [1]. The main idea of Honeyword is to store a user's actual password together with many plausible decoy passwords (named “honeywords”) on the server. A typical Honeyword system has three parameters: \(k, T_1, T_2\). For each account, its “sweetword” list consists of the real password and \((k-1)\) honeywords generated by the system. Thus, even if an attacker obtains the password file, it cannot easily recognize the actual password. Moreover, if someone logs in with a honeyword, the server is alerted. The classic detection strategy is as follows [2]: an account is locked out if a certain number of its honeywords (\(\ge T_1\), typically \(T_1=1\)) are submitted within a short period of time; if a large number of honeywords across the entire system are submitted in a short amount of time (\(\ge T_2\), typically \(T_2=10^4\)), the database may have been breached.
So far, researchers have developed two security metrics of Honeyword against distinguishing attacks. The first is “flatness”. Here we give a definition based on a game with a security parameter \(n\). The attacker \(\mathcal{A}\) considered below is any probabilistic Turing machine.
Definition 1. \(FlatGame^{P,Q,k}_{\mathcal{A}}(n)\): Given a sweetword list consisting of one password (sampled from \(P\)) and \((k-1)\) honeywords (sampled from \(Q\)), the attacker \(\mathcal{A}\) produces \(n\) outputs (“guesses”). If the password lies among the outputs, the attacker wins the game; otherwise, the attacker fails.
When \(n=1\), \(\max_{\mathcal{A}}\Pr [FlatGame^{P,Q,k}_{\mathcal{A}}(1)=1]\) (the highest success probability of any attacker) is defined as the “flatness” (\(\epsilon\)) by Juels and Rivest [1]. More generally, we define \(\epsilon_{\mathcal{A}}(i)=\Pr[FlatGame^{P,Q,k}_{\mathcal{A}}(i)=1]\) and \(\epsilon(i)=\max_{\mathcal{A}}\epsilon_{\mathcal{A}}(i)\), where \(1 \le i \le k\). In the following, if \(\epsilon\) appears alone, it means \(\epsilon(1)\).
Another security metric of Honeyword is the “success-number”. For simplicity of calculation, this paper deals only with the typical case \(T_1=1\) when discussing the success-number; in other words, the attacker has only one chance per account. Here we give a definition based on a game with a security parameter \(t\). It aims to gauge how many vulnerable “low-hanging fruit” accounts exist in the face of an attacker \(\mathcal{A}\).
Definition 2. \(SNGame^{P,Q,k,U}_{\mathcal{A}}(t)\): Given the sweetword lists of \(U\) users (generated i.i.d. as in the flatness game defined above), the attacker \(\mathcal{A}\) outputs \(U\) “guesses” of the form \((x, pw^*)\), where \(1\le x\le U\) and the indices \(x\) are pairwise distinct. Each guess is then checked in order. If \(pw^*\) is the password of the \(x\)-th user, the success number increases by one; otherwise, the failure number increases by one. The game outputs the success number when the failure number reaches \(t\), or after all \(U\) guesses have been checked.
We care about the expectation of the output of the game. Define \(\lambda_{U}^{\mathcal{A}}(t)=\mathbb{E}[SNGame^{P,Q,k,U}_{\mathcal{A}}(t)]\) and \(\lambda_{U}(t)=\max_{\mathcal{A}}\lambda_{U}^{\mathcal{A}}(t)\). We call \(\lambda_U(t)\) the success-number function.
The formalization of the above two definitions is shown below, where GenSWL (generate sweetword list) is a helper function.
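As a stand-in for that formalization, the following is a minimal Python sketch of GenSWL and the two games under the definitions above. The `sample()` methods on \(P\) and \(Q\), the callable attacker interface, and the modeling of guesses as positions in the sweetword list (rather than raw password strings) are hypothetical conveniences for illustration, not part of the original definitions.

```python
import random

def gen_swl(P, Q, k):
    # GenSWL: one real password (drawn from P) plus (k-1) honeywords
    # (drawn from Q), shuffled; returns the list and the hidden index
    # of the real password.
    sweetwords = [P.sample()] + [Q.sample() for _ in range(k - 1)]
    order = list(range(k))
    random.shuffle(order)
    return [sweetwords[i] for i in order], order.index(0)

def flat_game(attacker, P, Q, k, n):
    # FlatGame: the attacker wins iff the real password's position is
    # among its n guesses (indices into the sweetword list).
    swl, pw_pos = gen_swl(P, Q, k)
    return int(pw_pos in attacker(swl, n))

def sn_game(attacker, P, Q, k, U, t):
    # SNGame (T1 = 1): the attacker emits U guesses (account, position),
    # one per account; output the success count when failures reach t.
    accounts = [gen_swl(P, Q, k) for _ in range(U)]
    successes = failures = 0
    for x, pos in attacker([swl for swl, _ in accounts]):
        if pos == accounts[x][1]:
            successes += 1
        else:
            failures += 1
            if failures >= t:
                break
    return successes
```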
The research on Honeyword started in 2013 [1] and took shape in 2017 [2]. Juels and Rivest first proposed the Honeyword scheme and designed a number of honeyword generation techniques [1]. They also pointed out attacks on Honeyword systems, including DoS attacks and distinguishing attacks. For the security analysis of distinguishing attacks, they introduced the concept of “\(\epsilon\)-flatness” and used the Bayesian formula to determine the optimal attacking strategy, later formally stated by Wang et al. [2]. Moreover, Wang et al. proposed two security metrics for the Honeyword system: the flatness graph and the success-number graph, which are essentially \(\epsilon_{\mathcal{A}}(i)\) and \(SNGame^{P,Q,k,U}_{\mathcal{A}}(t)\) in our definitions, though they aimed to capture the upper bounds (\(\epsilon(i)\) and \(\lambda_U(t)\) in our definitions). They designed many distinguishing attackers \(\mathcal{A}\) and simulated their attacks on instantiated Honeyword systems to obtain flatness and success-number graphs, running extensive experiments on different honeyword generation techniques with various datasets.
Unfortunately, every success-number graph they plotted is actually the \(SNGame^{P,Q,k,U}_{\mathcal{A}}(t)\) of one attacker \(\mathcal{A}\) in one instantiated game, rather than its expectation \(\lambda_{U}^{\mathcal{A}}(t)\), let alone the strongest attacker's success-number \(\lambda_{U}(t)\). In fact, experimental methods only measure the ability of the attacker simulated in the experiment, not the ability of the strongest attacker in our definition.
This paper picks up where these explorations left off. Our contributions are listed below:
We formalize the security of the Honeyword system with game-style definitions of the flatness and success-number metrics. Moreover, we provide mathematical formulations of the two key security metrics under the strongest attacker. With this knowledge, system administrators can make informed decisions regarding appropriate settings, such as the choice of \(k\), \(T_1\), and \(T_2\), to effectively attain their desired security levels.
Furthermore, we have conducted calculations for several representative scenarios. These results support or challenge the assertions and hypotheses put forth by previous researchers, thereby contributing valuable insights to the field.
Surprisingly, the mathematical structures of the resulting expressions are quite interesting. \(\epsilon(1)\) has the closed form \(\frac{1}{k}(M-\int_{0}^{M}G^k(x)dx)+b\), where \(M\) and \(b\) are constants and \(G\) is a cumulative distribution function over \([0, M]\). The success-number function is a series of convolutions with beta distribution kernels. This newfound understanding holds the potential to inspire further exploration.
This section reviews the existing results of the security analysis theory of Honeyword.
Juels and Rivest gave the definition of flatness and used the Bayesian formula to obtain the attacker's optimal guessing strategy [1], which was formally stated by Wang et al. [2]. Under the assumption that each user's password is independent and identically distributed, the conditional probability that the \(m\)-th sweetword (\(sw_{im}\)) of the \(i\)-th user is the true password, given a password file \(F\), can be calculated by considering only the sweetwords of the \(i\)-th user: \[\Pr[sw_{im}=pw_i \vert F]=\Pr[sw_{im}=pw_i \vert sw_{i1},sw_{i2},\dots,sw_{ik}] .\]
Assume that each honeyword of a user is also sampled i.i.d. Denote the true password distribution by \(P\) and the honeyword distribution by \(Q\). The conditional probability that the \(m\)-th sweetword \(sw_{im}\) of the \(i\)-th user is the true password can be calculated as:
\[\begin{align} \Pr[sw_{im}=pw_i|sw_{i1},\dots,sw_{ik}]&=\frac{P(sw_{im})\prod_{j=1,j\neq m}^{k}Q(sw_{ij})}{\sum_{l=1}^{k}P(sw_{il})\prod_{j=1,j\neq l}^{k}Q(sw_{ij})} \\ &=\frac{\frac{P(sw_{im})}{Q(sw_{im})}}{\sum_{j=1}^{k}\frac{P(sw_{ij})}{Q(sw_{ij})}}. \label{f1} \end{align}\tag{1}\]
This result implies that a user's password is most likely the sweetword with the largest \(P/Q\) ratio.
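As a concrete illustration of formula (1), here is a small helper sketch; \(P\) and \(Q\) are assumed to be callables returning probabilities, which is a hypothetical interface for illustration only.

```python
def posteriors(swl, P, Q):
    # Formula (1): Pr[sw_m is the password | list] is the normalized
    # P/Q ratio of sw_m (assumes Q(sw) > 0 for every sweetword).
    ratios = [P(sw) / Q(sw) for sw in swl]
    total = sum(ratios)
    return [r / total for r in ratios]

def optimal_first_guess(swl, P, Q):
    # The optimal attacker targets the sweetword with the largest P/Q,
    # which is also the sweetword with the largest posterior.
    post = posteriors(swl, P, Q)
    return max(range(len(swl)), key=post.__getitem__)
```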
According to Kerckhoffs's principle, we assume the attacker has full knowledge of the honeyword distribution \(Q\). To calculate \(\epsilon(i)\), we need to consider the strongest attacker, who has full knowledge of \(P\) and \(Q\). In the following, we treat the continuous case and the discrete case separately, according to whether \(P\) and \(Q\) are probability distributions of continuous or discrete random variables. The password space is of course discrete in reality, but the theoretical calculation in the continuous case is natural and applies to other scenarios.
Consider two cumulative probability distributions defined as \[F(x)=\Pr_{pw \gets P}\left[\frac{P(pw)}{Q(pw)}\le x\right], \tag{2}\] and \[G(x)=\Pr_{pw \gets Q}\left[\frac{P(pw)}{Q(pw)}\le x\right]. \tag{3}\]
Denote the corresponding probability density functions (mass functions in the discrete case) by \(f(x)\) and \(g(x)\).
Define \(f(+\infty):= \Pr_{pw \gets P}[Q(pw)=0]=b\) and \(M=\max_{x: Q(x)\neq 0}\frac{P(x)}{Q(x)}\).
Recall that the optimal strategy for an attacker is to choose the sweetword with the largest \(P/Q\) as the first guess. The distribution of the \(P/Q\) value of a password sampled from \(P\) is exactly the \(f\) defined above. For example, if the \(P/Q\) value of the user's real password is 1.4, then the attacker succeeds in the first guess if and only if the \(P/Q\) values of all \((k-1)\) honeywords are less than \(1.4\); this case contributes probability density \(f(1.4)\cdot G^{k-1}(1.4)\). If \(b=0\), integrating over all possible \(P/Q\) values from \(0\) to \(M\) yields formula (4):
\[\begin{align} \epsilon(1)=\int_{0}^{M} f(x)G^{k-1}(x)dx \label{epsilon11} \end{align}\tag{4}\] where \(G\) is the cumulative distribution function of \(g\) and \(M\) is the maximum value of \(\frac{P(pw)}{Q(pw)}, pw \in \mathcal{PW}\).
In some cases \(f(+\infty)= b >0\); then \(\epsilon(1)=\int_{0}^{M} f(x)G^{k-1}(x)dx +b\). We first discuss the case \(f(+\infty)=0\) and defer the case \(b>0\) to Section 3.3.
Similarly, \(\epsilon(i)\) can be calculated. By the definition of \(\epsilon(i)\) and the optimal attack strategy, the strongest attacker wins \(FlatGame^{P,Q,k}_{\mathcal{A}}(i)\) if and only if the number of honeywords whose \(P/Q\) value exceeds that of the true password is at most \((i-1)\).
Theorem 1. When \(f(+\infty)=0\), the flatness function of Definition 1, in the continuous case, has the formula: \[\epsilon(i)=\sum_{j=1}^{i}\int_{0}^{M}\tbinom{k-1}{j-1} f(x)G^{k-j}(x)\left(1-G(x)\right)^{j-1}dx.\]
The following property plays an important part in the analysis of \(\epsilon(i)\):
Proposition 1. \(f\) and \(g\) defined above satisfy the relationship: \[f(x)=x\cdot g(x). \label{f2}\]
The relationship in Proposition 1 was demonstrated in a 2022 paper [3]. To illustrate the idea, we provide a calculation in the discrete case. Recall the definitions of the two probability distributions \(f\) and \(g\), and suppose there are \(m\) passwords \(pw_1,pw_2,\cdots,pw_m\) whose \(P/Q\) value is \(x\). Then \[f(x)=\sum_{i=1}^m P(pw_i)=\sum_{i=1}^m x\cdot Q(pw_i)=x\sum_{i=1}^m Q(pw_i)=x\cdot g(x).\]
Theorem 2. When \(f(+\infty)=0\), \(\epsilon(1)\) defined above has the formula: \[\epsilon(1)=\frac{1}{k}\left( M-\int_{0}^{M}G^k(x)dx \right). \label{f3}\]
Proof. \[\begin{align} \epsilon(1)&=\int_{0}^{M} f(x)G^{k-1}(x)dx\\ &=\int_{0}^{M} x\cdot g(x)\cdot G^{k-1}(x)dx\\ &=\frac{1}{k}\int_{0}^{M} x \,dG^{k}(x)\\ &=\frac{1}{k}\left(M-\int_{0}^{M}G^k(x)dx \right), \end{align}\] where the last step integrates by parts, using \(G(M)=1\). ◻
The formula in Theorem 2 reveals clearly how the \(\epsilon(1)\) of a Honeyword system evolves with \(k\). In fact, \(\epsilon(1)=\Theta(\frac{M}{k})\); the correction term is an integral of \(G^k(x)\), where \(G\) is a cumulative distribution function.
Example 1. \(P(x)=x+0.5, 0\le x \le 1\) and \(Q(x)=1, 0 \le x \le 1\).
In this case, \(f(x)=x\) and \(g(x)=1\) for \(0.5\le x \le 1.5\). Thus \[\begin{align} \epsilon(1)&=\frac{1}{k}\left(M-\int_{0}^{M}G^k(x)dx\right)=\frac{1}{k}(1.5-\frac{1}{k+1})\\ &=\frac{1.5}{k}-\frac{1}{k(k+1)}. \end{align}\] The flatness \(\epsilon(1)\) is \(\Theta(\frac{1.5}{k})\). We can also calculate all \(\epsilon(i), 1 \le i\le k\): \[\epsilon(i)=\frac{3k+2}{2k(k+1)}\cdot i -\frac{i^2}{2k(k+1)}.\] It is a quadratic function of \(i\); its graph for \(k=20\) is shown in Figure 3.
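The closed form of Example 1 is easy to check by simulation. The sketch below is a Monte Carlo experiment under the example's distributions (the sample sizes are illustrative): it samples the password from \(P\) by inverse-CDF and exploits the fact that \(P/Q=x+0.5\) is increasing, so the optimal attacker simply guesses the largest sweetword.

```python
import numpy as np

rng = np.random.default_rng(0)
k, trials = 20, 200_000

# Inverse-CDF sampling from P(x) = x + 0.5 on [0,1]:
# F_P(x) = (x^2 + x)/2, so x = (-1 + sqrt(1 + 8u))/2 for u ~ Uniform[0,1].
u = rng.random(trials)
pw = (np.sqrt(1.0 + 8.0 * u) - 1.0) / 2.0

# (k-1) honeywords per account from Q = Uniform[0,1].
hw = rng.random((trials, k - 1))

# P/Q = x + 0.5 is increasing, so the optimal first guess is the largest
# sweetword: the attacker wins iff the password beats every honeyword.
empirical = (pw[:, None] > hw).all(axis=1).mean()
theory = 1.5 / k - 1.0 / (k * (k + 1))
print(f"empirical {empirical:.5f} vs theory {theory:.5f}")  # ~0.0726 at k=20
```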
By viewing integration as summation, we can extend the results of the continuous case to the discrete case. However, there is a special case: a sweetword list may contain \(i\) sweetwords whose \(P/Q\) values are all equal. In this scenario, an optimal guessing strategy is to pick one of these \(i\) sweetwords uniformly at random.
Theorem 3. The flatness defined in Definition 1, in the discrete case, has the formula: \[\epsilon(1)=\sum_{x}\frac{1}{k}x[G^{k}(x)-G^{k}(x^-)].\]
Proof. It can be calculated as follows:
\[\begin{align} \epsilon(1)&=\sum_{x}\sum_{i=1}^{k}\frac{1}{i}f(x)\tbinom{k-1}{i-1}G^{k-i}(x^-)g^{i-1}(x)\\ &=\sum_{x}\sum_{i=1}^{k}\frac{1}{k}f(x)\tbinom{k}{i}G^{k-i}(x^-)g^{i-1}(x)\\ &=\sum_{x}\sum_{i=1}^{k}\frac{1}{k}xg(x)\tbinom{k}{i}G^{k-i}(x^-)g^{i-1}(x)\\ &=\sum_{x}\frac{1}{k}x[(G(x^-)+g(x))^k-G^k(x^-)]\\ &=\sum_{x}\frac{1}{k}x[G^{k}(x)-G^{k}(x^-)], \end{align}\] where \(G(x_0^-)=\Pr_{x \gets g}[x<x_0]\). ◻
This is analogous to the result in the continuous case: \(\frac{1}{k}\int_0^M xdG^{k}(x)\).
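Theorem 3 is straightforward to evaluate numerically. The sketch below uses a small made-up discrete \(P\) and \(Q\) (the probabilities are illustrative only), groups mass by \(P/Q\) ratio to build \(g\), \(G\), and \(G(x^-)\), and cross-checks the closed form against a direct simulation of the optimal attacker with uniform tie-breaking.

```python
import numpy as np
from collections import defaultdict

# Toy discrete distributions over a 5-password space (illustrative numbers).
P = {"pw1": 0.40, "pw2": 0.30, "pw3": 0.15, "pw4": 0.10, "pw5": 0.05}
Q = {pw: 0.2 for pw in P}
k = 4

# Group probability mass by P/Q ratio to obtain g, then G and G(x^-).
g = defaultdict(float)
for pw in P:
    g[P[pw] / Q[pw]] += Q[pw]
ratios = sorted(g)
G = np.cumsum([g[r] for r in ratios])
G_minus = np.concatenate(([0.0], G[:-1]))

# Theorem 3: eps(1) = sum_x (x/k) [G^k(x) - G^k(x^-)].
eps1 = sum(r / k * (G[i] ** k - G_minus[i] ** k) for i, r in enumerate(ratios))

# Direct simulation: honeywords in slots 0..k-2, password in slot k-1;
# the attacker picks a maximal-ratio slot uniformly at random.
rng = np.random.default_rng(0)
names = list(P)
pvals = np.array([P[n] for n in names])
qvals = np.array([Q[n] for n in names])
wins, T = 0, 200_000
for _ in range(T):
    idx = np.append(rng.choice(len(names), size=k - 1, p=qvals),
                    rng.choice(len(names), p=pvals))
    r = pvals[idx] / qvals[idx]
    best = np.flatnonzero(r == r.max())
    wins += rng.choice(best) == k - 1
print(eps1, wins / T)  # the two estimates should agree closely
```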
Before moving on to the next example, we need to make some assumptions about the distribution \(P\) of the true password. Human-generated passwords are far from random, and there is a large body of work describing their distribution. Many empirical studies have shown that human passwords roughly follow the Zipf distribution [4], [5]. In the Zipf distribution model, the \(r\)-th largest probability value is \(\frac{r^{-\alpha}}{\sum_{i=1}^n i^{-\alpha}}\), where \(\alpha\) is a parameter (\(0<\alpha<1\)) and \(n\) is the size of the password space \(\mathcal{PW}\) (i.e., \(n=|\mathcal{PW}|\)). It implies that a few top-ranked passwords account for a large share of the probability mass.
Example 2. Consider \(P\) a Zipf(\(\alpha,n\)) distribution over the password space, i.e., the \(i\)-th largest probability is \(P(pw_i)=\frac{i^{-\alpha}}{\sum_{j=1}^n j^{-\alpha}}\), and \(Q\) the uniform distribution over the password space. Denote \(S=\sum_{j=1}^n j^{-\alpha} \approx \frac{1}{1-\alpha}n^{1-\alpha}\). The flatness can be calculated as: \[\begin{align} \epsilon(1)&=\sum_{x}\frac{1}{k}x[G^{k}(x)-G^{k}(x^-)]\\ &=\frac{1}{k}\sum_{i=1}^{n}n\frac{(n-i+1)^{-\alpha}}{S}[(\frac{i}{n})^k -(\frac{i-1}{n})^k ]\\ &\approx \frac{n^{1-\alpha}}{S}\sum_{i=1}^{n} \frac{1}{n} (1-\frac{i-1}{n})^{-\alpha}(\frac{i}{n})^{k-1}\\ &\approx \frac{n^{1-\alpha}}{S} \int_0^1 (1-x)^{-\alpha}x^{k-1} dx \\ &=\frac{n^{1-\alpha}}{S} B(1-\alpha,k)\\ &\approx (1-\alpha)B(1-\alpha,k), \end{align}\] where \(B\) is the beta function.
Supposing \(\alpha=0.9\) [6], the \(\epsilon(1)\)–\(k\) graph is shown in Fig. 4.
Similarly, we can calculate \(\epsilon(i)=\sum_{j=1}^{i}(1-\alpha)\tbinom{k-1}{j-1}B(j-\alpha,k+1-j)\). The graph of the flatness function for \(k=20\) is shown in Fig. 5.
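These Zipf flatness values are easy to evaluate with standard special functions; a short sketch (parameter values as in the example) is:

```python
from scipy.special import beta, comb

alpha, k = 0.9, 20

def eps(i):
    # eps(i) = sum_{j=1}^{i} (1 - alpha) C(k-1, j-1) B(j - alpha, k+1-j)
    return sum((1 - alpha) * comb(k - 1, j - 1) * beta(j - alpha, k + 1 - j)
               for j in range(1, i + 1))

print(eps(1))                                      # ~ (1 - alpha) B(1 - alpha, k)
print([round(eps(i), 3) for i in (1, 5, 10, 20)])  # eps(k) should come out near 1
```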
By contrast, Wang et al. conjectured an empirical formula for the flatness, \(\epsilon=ak^{-c}+b\), where \(a,b,c\) are constants [6]. Our analysis shows that \(c\) equals \(1\) in the asymptotic sense, because \(\epsilon=\Theta(\frac{M}{k})+f(+\infty)\). When \(k\) is small, however, this asymptotic formula may be misleading in some cases. For instance, the graph above shows the flatness decreasing slowly with \(k\) when \(k\) is small, because \(M\) is very large in this case:
\[M=\frac{1/S}{1/n}\approx (1-\alpha)n^{\alpha}=0.1n^{0.9},\] where \(n\) is the size of the password space (typically \(n\ge 10^6\)).
Consider a password \(pw\) with \(P(pw)>0\) and \(Q(pw)=0\). When such a password occurs in a sweetword list, it is surely the true password, because the honeyword generating technique can never generate it. Recall that \(f(+\infty):=\Pr_{pw\gets P}[P(pw)>0,Q(pw)=0]\). In the case \(f(+\infty)>0\),
\[\begin{align} \epsilon(1)&=\int_0^M f(x)G^{k-1}(x)dx +f(+\infty)\\ &=\frac{1}{k}\left(M-\int_0^M G^{k}(x)dx\right)+f(+\infty), \label{f4} \end{align}\tag{5}\] with a property:
\[\begin{align} \int_0^M G(x)dx &=xG(x)\vert_0^M -\int_0^M xdG(x)\\ &=M-\int_0^M f(x)dx \\ &=M-1+f(+\infty).\\ \end{align}\]
It is clear from formula (5) that \(\epsilon > f(+\infty)\). Some honeyword generating techniques have large \(f(+\infty)\); the List model is a typical example.
Example 3. Analysis of \(f(+\infty)\) for the List model.
A “List model” directly uses the empirical distribution of a training dataset: the honeyword probability distribution given by the List model is exactly the empirical distribution of the sample set \(S\). In other words, a List model trained on a password set \(S\) generates honeyword \(hw\) with probability \(\Pr[hw]=\frac{count(hw)}{\vert S \vert}\). We suppose the set \(S\) is sampled i.i.d. from the human password distribution.
If \(P\) is the uniform password distribution, a classical result is that when \(\vert S \vert=\vert \mathcal{PW} \vert =n\), \(\mathbb{E}[f(+\infty)] \approx \frac{1}{e}\). It can be calculated as:
\[\begin{align} \mathbb{E}[f(+\infty)]&=\frac{1}{n}\sum_{pw \in \mathcal{PW}}\mathbb{E}[1_{pw \notin S}]\\ &=\mathbb{E}[1_{pw \notin S}]\\ &=(1-\frac{1}{n})^n\\ &\approx \frac{1}{e}. \end{align}\]
In the same way, we obtain the approximation \(\mathbb{E}[f(+\infty)]\approx e^{-\frac{\vert S \vert}{n}}\): the expectation of \(f(+\infty)\) decays exponentially with the size of the sample set, with attenuation constant \(\frac{1}{n}\).
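A quick simulation (at a toy scale; the parameters are illustrative) confirms this decay for the uniform case by measuring the unseen probability mass directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000                             # toy password-space size
for m in (n // 2, n, 2 * n, 3 * n):    # sample-set sizes |S|
    # For uniform P, f(+inf) is simply the fraction of unseen passwords.
    misses = [1 - len(set(rng.integers(0, n, size=m))) / n
              for _ in range(50)]
    print(f"|S|/n = {m/n:.1f}: simulated {np.mean(misses):.4f}, "
          f"e^(-|S|/n) = {np.exp(-m/n):.4f}")
```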
What about \(f(+\infty)\) when \(P\) is a Zipf password distribution?
Fortunately, it can be calculated with the help of the Taylor series of \(e^{-x}\). Denote the \(i\)-th largest probability in a Zipf distribution as \(p_i\), then:
\[\begin{align} \mathbb{E}[f(+\infty)]&=\sum_{i=1}^{n}p_i\mathbb{E}[1_{pw_i \notin S}]\\ &=\sum_{i=1}^{n}p_i(1-p_i)^{\vert S \vert}\\ &\approx \sum_{i=1}^{n}p_i\cdot e^{-p_i\vert S \vert }\\ &=\sum_{i=1}^{n}p_i\cdot (1-p_i\vert S \vert +\frac{1}{2!}(p_i\vert S \vert)^2-...)\\ &=\sum_{i=1}^{n} \sum_{j=0}^{+\infty} (-1)^{j}\frac{1}{j!} p_i^{j+1} \vert S \vert^j\\ &=\sum_{j=0}^{+\infty} (-1)^{j}\frac{1}{j!}\vert S \vert^j \sum_{i=1}^{n} p_i^{j+1} . \end{align}\]
Since \(p_i=\frac{i^{-\alpha}}{A}\), where \(A=\sum_{k=1}^{n}k^{-\alpha}\), we have \(\sum_{i=1}^{n}p_i=1\) and \(\sum_{i=1}^{n}p_i^j=\frac{1}{A^j}\sum_{i=1}^{n}i^{-j\alpha} \approx \frac{1}{A^j} \zeta(j\alpha)\) when \(j\alpha >1\), where \(\zeta\) is the Riemann zeta function. Taking \(\alpha=0.9\) [6], we get
\[\mathbb{E}[f(+\infty)]\approx 1+\sum_{j=1}^{+\infty} (-1)^{j}\frac{1}{j!} \frac{\vert S \vert^j}{A^{j+1}} \zeta(0.9(j+1)).\]
Thus \(f(+\infty)\) also decays approximately exponentially with the size of the sample set in this case, with attenuation constant \(\Theta(\frac{1}{A})\).
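As a sanity check on this series (with illustrative parameters, and scipy's zeta for the Riemann zeta function), one can compare it against the direct sum \(\sum_i p_i(1-p_i)^{\vert S \vert}\):

```python
import numpy as np
from scipy.special import zeta, factorial

alpha, n, S_size = 0.9, 100_000, 100
ranks = np.arange(1, n + 1)
A = np.sum(ranks ** -alpha)
p = ranks ** -alpha / A

direct = np.sum(p * (1 - p) ** S_size)   # E[f(+inf)] from the definition

series = 1.0
for j in range(1, 60):                   # truncated alternating series
    series += ((-1) ** j / factorial(j)
               * S_size ** j / A ** (j + 1) * zeta(alpha * (j + 1)))
print(direct, series)   # should agree up to the e^{-x} approximation
```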
Recall that for the sweetword list \([sw_{i1},sw_{i2},\dots,sw_{ik}]\) of the \(i\)-th user, the maximum success probability of an attack can be calculated by the Bayesian formula. Under the condition \(T_1=1\), the attacker has only one opportunity to submit a guess for each account, so \[w_i=\max_{1\le m\le k}\frac{P(sw_{im})/Q(sw_{im})}{\sum_{j=1}^{k}P(sw_{ij})/Q(sw_{ij})}\] is exactly the probability that the strongest attacker cracks this account. The strongest attacker calculates \(w_i\) for each account using formula (1), then targets the account with the largest \(w\), followed by the account with the second largest \(w\), and so on.
Observe that \(w_i\) is itself a random variable; the randomness comes from \(pw \gets P\) and \(hw \gets Q\) in the generation of the sweetword list. Denote the probability density of \(w\) by \(a(t)\), where \(0\le t \le 1\).
Theorem 4. The success-number function \(\lambda_U\) defined in Definition 2 has the formula: \[\lambda_U(i)=U\sum_{j=1}^{i}\int_{\frac{1}{k}}^{1}t\cdot a(t)\tbinom{U-1}{j-1}(\mathbb{E}[v_t])^{U-j}(1-\mathbb{E}[v_t])^{j-1}dt ,\] where \[v_t(x)=\begin{cases} x, & x \ge t, \\ 1, & x < t, \end{cases}\] and \[\mathbb{E}[v_t]=\int_0^t a(x)dx +\int_t^1 x\cdot a(x) dx .\]
Here we will give a toy proof of \(\lambda_U(1)\) and leave the full proof of \(\lambda_U(i), 1\le i \le U\) in the Appendix (9).
Proof (toy). Recall the definition of \(\lambda_U(1)\). It is the expectation of the success-number of the strongest attacker before its first failure. Order all \(w_i, 1\le i\le U\) into \(W_1 \ge W_2 \ge \cdots \ge W_U\). In this case, the expectation of the success-number can be calculated as: \[\begin{align} &W_1(1-W_2)+2W_1W_2(1-W_3)+\cdots\\ &\;\;\;\;\;\;\; +(U-1)W_1\cdots W_{U-1}(1-W_U) +U\cdot W_1W_2\cdots W_U\\ &=W_1+W_1W_2+\cdots +W_1W_2\cdots W_U\\ &=\sum_{i=1}^{U}\prod_{j=1}^{i}W_j. \end{align}\]
Taking expectations on both sides gives the expression for \(\lambda_U(1)\): \[\begin{align} \lambda_U(1)&=\mathbb{E}[\sum_{i=1}^{U}\prod_{j=1}^{i}W_j]\\ &=\sum_{i=1}^{U}\mathbb{E}[\prod_{j=1}^{i}W_j], \end{align}\] where \(W_j\) is the \(j\)-th largest \(w\) among all \(U\) users.
The next task is to calculate \(\mathbb{E}[\prod_{j=1}^{i}W_j]\). Fortunately, a conditional expectation of it is feasible to calculate (not rigorously):
\[\begin{align} &\mathbb{E}[\prod_{j=1}^{i}W_j\vert W_i=t]\\ &=\tbinom{U}{1}\tbinom{U-1}{i-1}\int_t^1 x_1x_2\cdots x_{i-1}\cdot t \cdot a(x_1)a(x_2)\cdots a(x_{i-1}) \cdot \\ &\;\;\;\;\;\;\left(\int_0^t a(x)dx\right)^{U-i} dx_1dx_2\cdots dx_{i-1}\\ &=t\tbinom{U}{1}\tbinom{U-1}{i-1}\left(\int_t^1 xa(x)dx\right)^{i-1} \left(\int_0^t a(x)dx\right)^{U-i}. \end{align}\]
This conditional expectation can be understood by the following process. First, we select one of the \(U\) users whose \(w\) is the \(i\)-th largest, equal to \(t\). Next, we select \((i-1)\) users whose \(w\) values \(x_1,x_2,\cdots,x_{i-1}\) are at least \(t\); the remaining users' \(w\) values are less than \(t\). In this case \(\prod_{j=1}^{i}W_j=x_1x_2\cdots x_{i-1}\cdot t\), and the probability can be calculated with the density \(a(w)\) of \(w\).
With this expression of conditional expectation, we can calculate the summation:
\[\begin{align} &\sum_{i=1}^{U}\mathbb{E}[\prod_{j=1}^{i}W_j\vert W_i=t]\\ &=\sum_{i=1}^{U}t\tbinom{U}{1}\tbinom{U-1}{i-1}\left(\int_t^1 xa(x)dx\right)^{i-1} \left(\int_0^t a(x)dx\right)^{U-i}\\ &=tU\left(\int_0^t a(x)dx+\int_t^1 xa(x)dx\right)^{U-1}\\ &=tU(\mathbb{E}[v_t])^{U-1}. \end{align}\]
Everything is now in place to obtain \(\lambda_U(1)\):
\[\begin{align} \lambda_U(1) &=U\int_{\frac{1}{k}}^1 ta(t)\left(\mathbb{E}[v_t]\right)^{U-1}dt . \end{align}\] ◻
The above calculation is not rigorous, but it conveys the key ideas. For example, it is meaningless to condition on the exact value of a continuous random variable at a point; this loophole can be closed by considering an interval (\(\frac{1}{k} \le W_i\le t\)), at the cost of a more cumbersome calculation.
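To sanity-check Theorem 4 for \(i=1\), one can pick a toy density \(a\) for which everything is computable and compare the integral with a direct simulation via the identity \(\sum_{i}\prod_{j\le i}W_j\) from the toy proof. The sketch below assumes \(w \sim \mathrm{Uniform}[1/k, 1]\), an arbitrary choice for illustration rather than a density arising from any particular \(P\) and \(Q\):

```python
import numpy as np
from scipy.integrate import quad

k, U = 20, 50
lo = 1.0 / k
a = lambda t: 1.0 / (1.0 - lo)        # toy density of w on [1/k, 1]

def Ev(t):
    # E[v_t] = Pr[w < t] + E[w ; w >= t] for the toy density.
    below, _ = quad(a, lo, t)
    above, _ = quad(lambda x: x * a(x), t, 1.0)
    return below + above

# Theorem 4 with i = 1: lambda_U(1) = U * int t a(t) E[v_t]^(U-1) dt.
theory, _ = quad(lambda t: U * t * a(t) * Ev(t) ** (U - 1), lo, 1.0)

rng = np.random.default_rng(2)
sims = []
for _ in range(20_000):
    w = np.sort(rng.uniform(lo, 1.0, U))[::-1]   # W_1 >= ... >= W_U
    sims.append(np.cumprod(w).sum())             # sum_i prod_{j<=i} W_j
print(theory, np.mean(sims))   # the two values should be close
```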
Reaching \(\lambda_U(i)\) for \(i > 1\) requires more machinery; the rigorous proof is in the Appendix (9).
This result allows us to see the theoretical success-number graph for the first time, as opposed to some experimental curves resulting from numerous experiments.
Like \(\epsilon(1)\), the mathematical expression of the success-number also reveals how \(\lambda_U(i)\) evolves with \(U\). The answer is that \(U\) enters through the convolution kernel applied to a certain function.
Theorem 5. \(\lambda_U\) defined in Definition 2 has a convolution formula:
\[\lambda_U(i)= U\sum_{j=1}^{i}\int_{\frac{1}{k}}^{1} \frac{\phi(x)}{1-\phi(x)} \tbinom{U-1}{j-1} x^{U-j}(1-x)^{j-1}dx.\]
This formula can be viewed as a series of convolution kernels \(\tbinom{U-1}{j-1} x^{U-j}(1-x)^{j-1}\) applied to the monotonically increasing function \(\frac{\phi(x)}{1-\phi(x)}\).
Proof. The key is to let \(\phi(x)\) be the inverse function of \(\mathbb{E}[v_t]\), and notice that \(\frac{\mathrm{d} \mathbb{E}[v_t]}{\mathrm{d} t}=(1-t)a(t)\), so
\[\begin{align} \lambda_U(i)&=U\sum_{j=1}^{i}\int_{\frac{1}{k}}^{1} \frac{t}{1-t} \cdot a(t)\tbinom{U-1}{j-1}(\mathbb{E}[v_t])^{U-j}(1-\mathbb{E}[v_t])^{j-1}d(\mathbb{E}[v_t])\\ &=U\sum_{j=1}^{i}\int_{\frac{1}{k}}^{1} \frac{\phi(x)}{1-\phi(x)} \tbinom{U-1}{j-1} x^{U-j}(1-x)^{j-1}dx . \end{align}\] ◻
As \(U\) grows, the peak point \(x_0=\frac{U-j}{U-1}\) of the convolution kernel moves toward 1, and the kernel becomes sharper: \(U\tbinom{U-1}{j-1} x^{U-j}(1-x)^{j-1}\) is exactly the Beta(\(U-j+1,j\)) density, which concentrates around \(x_0\) with standard deviation \(\Theta(\sqrt{j}/U)\). In the usual case where the number of users is about \(10^7\), this kernel is so sharp that we may treat it as \(\delta(x-\frac{U-j}{U-1})\), giving the estimate \(\lambda_U(i)\approx\sum_{j=1}^{i} \frac{\phi(\frac{U-j}{U-1})}{1-\phi(\frac{U-j}{U-1})}\).
The rigorous solution of the success-number enables us to select \(T_2\) to achieve the desired security level. The convolution structure also invites more detailed analysis in future research.
One shortcoming of the flatness calculation is that in reality the \(k\) sweetwords are required to be distinct. In the following, we estimate the probability that at least two sweetwords are sampled equal (Event A). Let Event B be the event that a honeyword and the password are sampled equal, and Event C the event that two honeywords are sampled equal. Using the union bound, we get:
\[\begin{align} \Pr[A]&\le \Pr[B]+\Pr[C]\\ &\le \sum_{pw}P(pw)\{1-[1-Q(pw)]^{k-1}\} +\tbinom{k-1}{2} \sum_{pw}Q^2(pw) . \end{align}\]
We conclude that the flatness under the distinct-sweetword requirement is at most the flatness calculated above plus \(\Pr[A]\).
Fortunately, \(\Pr[A]\) is small in most cases, so our flatness calculation is satisfactory. For example, if \(Q\) is the uniform distribution over the password space, \(Q(pw)=\frac{1}{n}\) for \(pw \in \mathcal{PW}\), then \(\Pr[A]\le 1-(1-\frac{1}{n})^{k-1}+\frac{k^2}{2n}\le \frac{k^2+2k-2}{2n}\). Since the typical size of a password space exceeds \(10^6\), this probability is about \(5\times 10^{-3}\) or less when \(k\le 100\).
The success-number function is computed via the probability distribution of \(w\) (i.e., \(a(x)\)). Unfortunately, \(a(x)\) rarely admits a neat closed form, even in simple cases where \(P\) and \(Q\) have clean expressions.
This section reviews some typical honeyword generation techniques (HGTs) proposed by researchers. From our theoretical perspective, different HGTs give different honeyword distributions \(Q\).
Juels and Rivest distinguish two classes of HGTs: “with legacy-UI procedures” and “with modified-UI procedures” [1]. In modified-UI procedures, the password-change UI is altered to allow for better password/honeyword generation; a typical instance is “take-a-tail” [1], which requests a password from the user and returns a new, computer-generated password that the user has to remember. In legacy-UI procedures, users choose their passwords just as they would in traditional password-based authentication (PBA), unaware of the honeyword creation process. Modified-UI procedures therefore impose a greater memory burden on users than legacy-UI procedures.
Due to their usability, legacy-UI procedures have attracted the attention of most researchers. Juels and Rivest put forward several legacy-UI generation techniques, such as “chaffing by tweaking”, “chaffing-with-a-password-model”, and “modeling syntax” [1].
“Chaffing by tweaking” perturbs selected character positions of the password to generate the honeywords. Here is an example from [1]: if the user's true password is “BG+7y45”, then the sweetword list might be ["BG+7q03", "BG+7m55", "BG+7y45", "BG+7o92"] (for tail-tweaking with \(t = 3\) and \(k = 4\)).
By contrast, generation techniques based on password probability models are alluring and promising [7]: they generate honeywords simply by sampling from a password probability model, which is concise and easy to analyze.
There are three typical password probability models: the “List model”, the “Markov model”, and the “PCFG model” [8]. Each can be trained on a password dataset. The model's primary goal is to estimate each password's probability, according to which the sampling procedure is determined.
A “List model” directly uses the empirical distribution of the training dataset. That is, the probability of a password \(\Pr[pw]\) is its frequency in the training set.
A “Markov model” regards the password generation process as a Markov process. That is, the probability of a password is \[\Pr[pw]=\prod_{i}\Pr [c_i\vert c_1 c_2\dots c_{i-1}]=\prod_{i} \Pr[c_i\vert c_{i-m}c_{i-m+1}\dots c_{i-1}],\] where \(m\) is the order of the Markov process. Each conditional probability is set to its frequency in the training set.
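To make this concrete, here is a minimal sketch of an order-\(m\) character Markov model with raw (unsmoothed) counts; the padding symbols and the tiny training set are illustrative assumptions, not taken from the cited papers.

```python
from collections import defaultdict

ORDER = 2  # m = 2: each character conditioned on the previous two

def train(passwords):
    # Count (context of ORDER chars -> next char) transitions, padding
    # with '^' at the start and terminating with '$'.
    counts = defaultdict(lambda: defaultdict(int))
    for pw in passwords:
        s = "^" * ORDER + pw + "$"
        for i in range(ORDER, len(s)):
            counts[s[i - ORDER:i]][s[i]] += 1
    return counts

def prob(counts, pw):
    # Pr[pw] = prod_i Pr[c_i | c_{i-m} ... c_{i-1}], with conditional
    # probabilities set to their training-set frequencies.
    p = 1.0
    s = "^" * ORDER + pw + "$"
    for i in range(ORDER, len(s)):
        ctx, c = s[i - ORDER:i], s[i]
        total = sum(counts[ctx].values())
        if total == 0 or counts[ctx][c] == 0:
            return 0.0     # unseen transition under raw counts
        p *= counts[ctx][c] / total
    return p

model = train(["password1", "password123", "passw0rd"])
print(prob(model, "password1"))
```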
A “PCFG model” parses the password into a sequence of “tokens”. The probability of a password is the product of the probability of its “syntax” and the probabilities of the fills of its tokens [9]. For example, the probability of the password “mice3blind” is calculated as \(\Pr[pw]=\Pr[W_4 D_1 W_5]\cdot\Pr[W_4\to \text{“mice”}]\cdot \Pr[D_1\to \text{“3”}]\cdot \Pr[W_5\to \text{“blind”}]\), where \(W_4 D_1 W_5\) is the password's parsed syntax.
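And a comparable sketch of a toy PCFG: passwords are parsed into letter/digit/symbol runs, and the probability is the syntax probability times the fill probabilities. The token classes and the tiny training set here are illustrative simplifications of the model in [9].

```python
import re
from collections import Counter

def parse(pw):
    # "mice3blind" -> [("W4", "mice"), ("D1", "3"), ("W5", "blind")]
    runs = re.findall(r"[a-zA-Z]+|[0-9]+|[^a-zA-Z0-9]+", pw)
    kind = lambda r: "W" if r[0].isalpha() else "D" if r[0].isdigit() else "S"
    return [(f"{kind(r)}{len(r)}", r) for r in runs]

def train(passwords):
    syntax = Counter()   # whole structures, e.g. "W4|D1|W5"
    fills = Counter()    # (token class, concrete string) pairs
    classes = Counter()  # token classes alone
    for pw in passwords:
        tokens = parse(pw)
        syntax["|".join(t for t, _ in tokens)] += 1
        fills.update(tokens)
        classes.update(t for t, _ in tokens)
    return syntax, fills, classes

def prob(model, pw):
    syntax, fills, classes = model
    tokens = parse(pw)
    p = syntax["|".join(t for t, _ in tokens)] / sum(syntax.values())
    for tok in tokens:                      # Pr[fill | token class]
        if fills[tok] == 0:
            return 0.0
        p *= fills[tok] / classes[tok[0]]
    return p

model = train(["mice3blind", "dogs4blind", "mice3!"])
print(prob(model, "mice3blind"))
```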
This paper gives the first theoretical expressions for two key security metrics of the Honeyword system and reveals their mathematical structure. These results exhibit the theoretical values of a Honeyword system for the first time. The rigorous expression of the flatness leads to a full understanding of how the probability of a successful attack evolves with the system parameter \(k\). The mathematical structure of the success-number is a series of convolutions. We believe these theoretical results will inspire further research on Honeyword and related questions.
This research is supported by the National Key R&D Program of China (2020YFB1805400) and the National Natural Science Foundation of China (62072010, 62202012).
Here we give another method to calculate \(\lambda_U(1)\); this method generalizes conveniently to all \(\lambda_U(i)\).
Consider two random processes: the first draws \(U\) samples \(\{w_i\}_{i \in \{1,2,...,U\}}\) from the distribution of \(w\), whose probability density is \(a(x),0\le x \le 1\); the second attacks each account and records the results. Denoting a successful attack by 1 and a failure by 0, we get a \(U\)-bit string \(b_1b_2...b_U \in \{0,1\}^U\).
Consider a random variable \(A_i\) defined as:
\[A_i(b_1b_2...b_U)=b_i\prod_{j \neq i}1_{\{w_j\le w_i\} \cup \{b_j=1\} }=b_i\prod_{j \neq i} b_j^{1_\{w_j>w_i\}} .\]
When \(A_i=1\), all accounts guessed before the \(i\)-th account were cracked, so the \(i\)-th account is guessed successfully before the first failure. Conversely, \(A_i=0\) means the \(i\)-th account is guessed unsuccessfully or after the first failure. By symmetry, all \(A_i\) have the same expectation, so \[\lambda_U(1)=\sum_{i=1}^U \mathbb{E}[A_i]=U\cdot\mathbb{E}[A_i] .\]
Recall the definition of \(v_t\):
\[v_t(x)=x^{1_{\{x\ge t\}}}=\begin{cases} x, & x \ge t, \\ 1, & x < t . \end{cases}\]
Now we calculate \(\mathbb{E}[A_i]\):
\[\begin{align} \mathbb{E}[A_i]&=\mathbb{E}_1\mathbb{E}_2[A_i]\\ &=\mathbb{E}_1\mathbb{E}_2[b_i\prod_{j \neq i} b_j^{1_\{w_j>w_i\}}] \\ &=\mathbb{E}_1[\mathbb{E}_2[b_i]\prod_{j \neq i} \mathbb{E}_2[b_j^{1_\{w_j>w_i\}}] ] \\ &=\mathbb{E}_1[ w_i \prod_{j \neq i} w_j^{1_\{w_j>w_i\}} ]\\ &=\int_0^1 t [\prod_{j \neq i} \mathbb{E}_1[v_t(w_j)]]a(t)dt\\ &=\int_0^1 \left(\mathbb{E}[v_t(w_j)]\right)^{U-1}ta(t)dt, \end{align}\] where \[\mathbb{E}[v_t(w_j)] =\int_0^t a(x)dx +\int_t^1 x\,a(x)dx .\]
Thus
\[\lambda_U(1)=U\cdot \mathbb{E}[A_i]=U\cdot \int_0^1 (\mathbb{E}[v_t(w_j)])^{U-1}ta(t)dt .\]
Analogously, consider a random variable defined below:
\[\begin{align} B_i^S(b_1b_2...b_U)&=b_i\prod_{j \notin S \cup \{i\} }1_{ \{w_j \le w_i \} \cup \{b_j=1\} } \prod_{j \in S}1_{ \{w_j>w_i\} \cap \{b_j=0\} } \\ &=b_i\prod_{j \notin S \cup \{i\} }b_j^{1_{ \{w_j > w_i \} } }\prod_{j \in S}(1-b_j^{1_{ \{w_j>w_i\} } } ) , \end{align}\] where \(S=\{i_1,i_2,...,i_{m-1}\}\). When \(B_i^S=1\), the attacker fails in the guesses of the accounts in \(S\) and succeeds in guessing all other accounts whose \(w\) is larger than \(w_i\). This means the \(i\)-th account is successfully guessed between the \((m-1)\)-th failure and the \(m\)-th failure. Thus
\[\lambda_U(m)-\lambda_U(m-1)=\sum_{i=1}^{U}\sum_{S\subset [U]\setminus\{i\},\vert S \vert =m-1} \mathbb{E}[B_i^S] .\]
Similar to the above calculation:
\[\begin{align} \mathbb{E}[B_i^S]=\mathbb{E}_1[\mathbb{E}_2 B_i^S]=\int_0^1 [\mathbb{E}[v_t(w)]]^{U-m}[1-\mathbb{E}[v_t(w)]]^{m-1}\, t\,a(t)\,dt . \end{align}\]
So we arrive at the final answer:
\[\begin{align} \lambda_U(m)-\lambda_U(m-1)&=\sum_{i=1}^{U}\sum_{S\subset [U]\setminus\{i\},\vert S \vert =m-1} \mathbb{E}[B_i^S] \\ &=U\tbinom{U-1}{m-1}\int_0^1 [\mathbb{E}[v_t(w)]]^{U-m}[1-\mathbb{E}[v_t(w)]]^{m-1}\, t\,a(t)\,dt ,\end{align}\]
where \(\tbinom{U-1}{m-1}\) is the number of possible sets \(S\).
To sum up, the formula for \(\lambda_U(i)\) is:
\[\begin{align} \lambda_U(i)=U\sum_{j=1}^{i}\int_{\frac{1}{k}}^{1}t\cdot a(t)\tbinom{U-1}{j-1}(\mathbb{E}[v_t])^{U-j}(1-\mathbb{E}[v_t])^{j-1}dt . \end{align}\]