Optimal recovery by maximum and integrated conditional likelihood in the general Stochastic Block Model


Abstract

In this paper, we obtain new results on the weak and strong consistency of the maximum and integrated conditional likelihood estimators for the community detection problem in the Stochastic Block Model with \(k\) communities. In particular, we show that the maximum conditional likelihood estimator achieves the known optimal threshold for exact recovery in the logarithmic degree regime. For the integrated conditional likelihood, we obtain a sub-optimal constant in the same regime. Both methods are shown to be weakly consistent in the divergent degree regime. These results confirm the optimality of maximum likelihood for the task of community detection, a question that had remained open until now.

Keywords: SBM, profile likelihood estimator, model selection

1 Introduction↩︎

Network analysis has recently been applied to describe and model random interactions between pairs of actors in a population. The network’s nodes represent the actors, while the edges represent their interactions. Network analysis is used in several fields, such as the social sciences, biology, computer science, and neuroscience, among many others. Community detection is an important task in network analysis: it consists of identifying groups of nodes in an observed network based on their connectivity patterns.

The Stochastic Block Model (SBM) proposed by [1] is a statistical model for networks that incorporates the community structure of the nodes. In an SBM, the nodes are partitioned randomly into communities, and the existence of an edge between pairs of nodes depends on their community membership. More precisely, the existence of the edges in the network is modeled by independent Bernoulli random variables, conditionally on the nodes’ memberships, with parameters depending only on the community assignments. There are extensions of the SBM that take into account distinct aspects of networks, such as heterogeneity of the nodes’ degrees [2]–[4], nodes belonging to more than one community [5], [6], and networks that evolve over time [7], [8].

Notwithstanding these several extensions, much recent research has focused on recovering the communities in the classical SBM. An incomplete list includes Newman–Girvan modularity [9], maximum likelihood-based methods [10], [11], spectral clustering [12], [13], semidefinite programming relaxations of maximum likelihood [14] and Bayesian estimators [15].

Given the large number of community recovery methods proposed in the literature, it is natural to inquire about necessary and sufficient conditions guaranteeing the recovery of the true community assignments. The recovery of the communities has been studied in two main aspects: partial and exact recovery. Partial recovery (also called weak consistency) means that an increasing fraction of the nodes’ communities is recovered, with high probability, as the network size increases. In contrast, exact recovery (strong consistency) means that all the nodes’ communities are recovered, with high probability, as the network size increases. These notions are defined up to a permutation of the true labels, as exact determination of the nodes’ labels is impossible.

In recent works, [14] and [13] established theoretical conditions and thresholds for partial and exact community recovery in the two symmetric communities model, also known as the planted bisection model, and proposed algorithms attaining these thresholds. The planted bisection model is a simpler version of an SBM, with two equal-sized communities and two parameters for the within- and between-communities edge probabilities. They assumed the model’s parameters are known and studied the semi-sparse regime where the edge probabilities decrease to zero at a rate \(\rho_n\), where \(n\) denotes the network size. For partial recovery, the necessary and sufficient condition is that \(n\rho_n\to\infty\) as \(n\to\infty\) [13]. For exact recovery, both works established a precise phase transition threshold when \(\rho_n = \log n/n\), the minimal degree regime in which strong consistency can be obtained: exact recovery is possible provided the parameters satisfy \(\frac{1}{2}(\sqrt{s_1}-\sqrt{s_2})^2>1\), where \(s_1\rho_n\) and \(s_2\rho_n\) are the within- and between-communities edge probabilities, respectively. Additionally, [13] showed that strong consistency is also possible at the phase transition constant \(\frac{1}{2}(\sqrt{s_1}-\sqrt{s_2})^2=1\).

For the general SBM with unknown parameters, [16] also proved a phase transition phenomenon under \(\rho_n = \log n/n\) and a condition on the parameters, and proposed an algorithm attaining this threshold. However, concerning the maximum likelihood and Bayesian estimators, the strong consistency results known up to this moment were weaker. [10] proved the strong consistency of the maximum likelihood estimator in the regime \(\rho_n \gg \log n/n\). From the Bayesian point of view, [15] proved that the posterior mode is a strongly consistent estimator of the communities when \(\rho_n \gg (\log n)^2/n\). Even though the latter work imposes a stronger condition than [10], its authors pointed out that the former work assumed a global Lipschitz condition that does not generally hold. For that reason, and by the relation between both estimators, up to this moment the strong consistency of the maximum likelihood and Bayesian estimators had only been verified in the regime \(\rho_n \gg (\log n)^2/ n\).

In this work, we study the maximum and integrated conditional likelihood estimators for the general SBM with \(k\) communities not necessarily of the same size and without assuming known parameters. The integrated conditional likelihood estimator is a modified version of the estimator proposed in [15], attaining an almost optimal rate of convergence. We prove that weak consistency can be obtained for both methods under the same necessary regime \(n\rho_n\to\infty\) as \(n\to\infty\) established by [13]. Moreover, we prove that the maximum and integrated conditional likelihood estimators are strongly consistent under the regime \(\rho_n\geq \log n/n\), with a condition on the parameters. In the case \(\rho_n=\log n/n\), we obtain results as sharp as those of [16], proving moreover that strong consistency holds even at the phase transition constant. For the integrated conditional likelihood estimator, we obtain a sub-optimal constant when \(\rho_n = \log n/n\). The proof of the strong consistency of the estimators is derived using a new concentration result based on the general Chernoff bound. With these results, we generalize and prove the weak and strong consistency, at the optimal regimes, of the maximum and integrated likelihood estimators, extending the results of [10] and [15] on these classes of estimators.

This paper is organized as follows. In Section 2 we introduce the main definitions of the model and of the estimators, and state a basic lemma relating both estimators. In Section 3 we state the main results concerning weak and strong consistency, Section 4 illustrates the estimators on simulated networks, and in Section 5 we present the proofs of the main results. Section 6 presents a brief discussion, and in the Appendix we state and prove several auxiliary results, including the new concentration result based on the Chernoff bound.

2 Community detection in Stochastic Block Models↩︎

The SBM with \(n\) nodes and \(k\) communities can be described by the pair of random structures \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\), where \({\boldsymbol{Z}_n}\) denotes the community assignment of the nodes and \(\boldsymbol{A}_{n\times n}\) is the adjacency matrix representing the network. The random vector \({\boldsymbol{Z}_n}=(Z_1,Z_2,\dots,Z_n)\) of community assignments is composed of independent and identically distributed random variables with distribution \(\pi\) over \([k]=\{1,2,\dots,k\}\). Given \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\), the law of the adjacency matrix \(\boldsymbol{A}_{n\times n}\) is a product of Bernoulli random variables whose parameters depend only on the nodes’ labels. More formally, there exists a symmetric probability matrix \(P \in [0,1]^{k\times k}\) such that the conditional distribution of \(\boldsymbol{A}_{n\times n}=\boldsymbol{a}_{n\times n}\) given \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\) can be written as \[\label{def-prob} \mathbb{P}(\boldsymbol{a}_{n\times n}|\boldsymbol{z}_n) = \prod_{1\leq a, b\leq k} \!\!P_{ab}^{o_{ab}/2} (1-P_{ab})^{(n_{ab}-o_{ab})/2}\,,\tag{1}\] where the counters \(n_a=n_a(\boldsymbol{z}_n)\), \(n_{ab}=n_{ab}(\boldsymbol{z}_n)\) and \(o_{ab}=o_{ab}(\boldsymbol{z}_n,\boldsymbol{a}_{n\times n})\) are given by \[\begin{align} n_a(\boldsymbol{z}_n) &= \sum\limits_{i=1}^n \mathbb{1}\{z_i=a\}\, , \qquad\quad\;\, 1 \leq a \leq k\,,\\ n_{ab}(\boldsymbol{z}_n) &=\begin{cases} n_a(\boldsymbol{z}_n)n_b(\boldsymbol{z}_n)\, ,& 1 \leq a < b \leq k,\\ n_a(\boldsymbol{z}_n)(n_a(\boldsymbol{z}_n)-1)\,, & 1 \leq a=b \leq k,\, \end{cases} \end{align}\] and \[o_{ab}(\boldsymbol{z}_n,\boldsymbol{a}_{n\times n}) = \sum\limits_{1\leq i, j\leq n} \mathbb{1}\{z_i=a,z_j=b\}a_{ij}\,.\]
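For concreteness, the generative model and the counters above can be sketched in a few lines of Python. This is an illustration only; the names `sample_sbm` and `counters` are ours and not part of the paper. Note that \(o_{ab}\) sums over ordered pairs \((i,j)\), so each within-community edge is counted twice.

```python
import random

def sample_sbm(n, pi, P, rng):
    """Draw (z, A) from an SBM: i.i.d. labels with law pi, then
    independent Bernoulli edges with parameter P[z_i][z_j]."""
    k = len(pi)
    z = rng.choices(range(k), weights=pi, k=n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[z[i]][z[j]]:
                A[i][j] = A[j][i] = 1
    return z, A

def counters(z, A, k):
    """The statistics n_a, n_ab and o_ab appearing in the likelihood (1)."""
    n = len(z)
    n_a = [sum(1 for zi in z if zi == a) for a in range(k)]
    n_ab = [[n_a[a] * n_a[b] if a != b else n_a[a] * (n_a[a] - 1)
             for b in range(k)] for a in range(k)]
    # o_ab sums A[i][j] over ordered pairs, matching the definition in the text
    o_ab = [[sum(A[i][j] for i in range(n) for j in range(n)
                 if z[i] == a and z[j] == b) for b in range(k)]
            for a in range(k)]
    return n_a, n_ab, o_ab
```
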

As is usual in the definition of likelihood functions, we adopt the convention \(0^0=1\) in 1 when some of the parameters are 0.

For any \(a\in[k]\), the expected number of nodes in community \(a\) is given by \(n\pi_a\). When \(\pi_a=1/k\) for all \(a\in[k]\), we say that the network is balanced. But as \({\boldsymbol{Z}_n}\) is randomly generated, even in the balanced case, the communities can have different sizes. Given a matrix \(P\) and a node in community \(a\), its expected degree is given by \(n\sum_{b}\pi_bP_{ba}\). Then, assuming \(k\) and \(\pi\) fixed, the expected degree of a node is of the order of \(nP\). When \(P\) is fixed with bounded entries, the expected degree grows linearly with \(n\) and the network is dense. In general, however, observed networks do not have too many edges, so the interesting regimes are those where \(P\) decreases with the number of nodes \(n\) and the expected degree is of order \(nP \ll n\). To emphasize the dependence on \(n\), we reparametrize \(P\) by writing \(P = \rho_n S\) for some matrix \(S\) with all entries strictly positive. For simplicity in the presentation of the results, we assume all entries in \(P\) have the same asymptotic order \(\rho_n\), so we can define the values \[s_{\min}= \min_{a,b} S_{ab}\quad \text{ and }\quad s_{\max}= \max_{a,b} S_{ab}\] and take them as strictly positive constants not depending on \(n\). As we are mainly interested in sparse networks, we will assume that the sequence \(\{\rho_n\}_{n\in\mathbb{N}}\) satisfies \(\rho_n\to 0\) as \(n\to\infty\). Other scenarios are also possible, but they are outside the scope of this article.

In this paper, we obtain conditions on \((\pi, \rho_nS)\) that guarantee the weak and strong consistency of the maximum and integrated conditional likelihood estimators. To guarantee the identifiability of the model, we assume that all entries of \(\pi\) are positive and that the matrix \(S\) does not contain two identical columns, as is usually assumed in the literature.

Given the community labels \(\boldsymbol{z}_n\), the maximum likelihood estimators for \(P=\rho_nS\) can be easily obtained by maximizing 1 . Using these estimates we can write the likelihood modularity (ML) for any \(\boldsymbol{z}_n\) as \[\label{def-P-hat} Q_{\mathrm{\small ml}}(\boldsymbol{z}_n) = \frac{1}{2n^2}\sum_{1\leq a, b\leq k} n_{ab}(\boldsymbol{z}_n) \tau\Bigl( \dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\Bigr)\,\tag{2}\] with \(\tau(x) = x\log x + (1-x)\log (1-x)\). Given the observed graph \(\boldsymbol{a}_{n\times n}\), to estimate the community labels in \([k]^n\) we define \[\label{zstar} {\widehat{\boldsymbol{z}}_n}^{\mathrm{\small ml}}= \mathop{\mathrm{arg\,max}}\limits_{\boldsymbol{z}_n\in \mathcal{F}(n,\alpha)}\left\lbrace Q_{\mathrm{\small ml}}(\boldsymbol{z}_n) \right\rbrace\,,\tag{3}\] where, for some \(\alpha>0\) \[\label{calF} \mathcal{F}(n,\alpha) =\{\boldsymbol{z}_n\in [k]^n\colon n_a(\boldsymbol{z}_n)\geq \alpha n \text{ for all }a\in [k]\}\,.\tag{4}\]
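The likelihood modularity 2 is straightforward to evaluate on a given labeled graph. The following sketch (illustrative; the names `tau` and `Q_ml` are ours) computes it directly from the definition.

```python
import math

def tau(x):
    """tau(x) = x log x + (1-x) log(1-x), with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x * math.log(x) + (1 - x) * math.log(1 - x)

def Q_ml(z, A, k):
    """Likelihood modularity (2): plug the MLE o_ab / n_ab into the
    Bernoulli log-likelihood and normalize by 2 n^2."""
    n = len(z)
    n_a = [sum(zi == a for zi in z) for a in range(k)]
    total = 0.0
    for a in range(k):
        for b in range(k):
            n_ab = n_a[a] * n_a[b] if a != b else n_a[a] * (n_a[a] - 1)
            if n_ab == 0:
                continue
            o_ab = sum(A[i][j] for i in range(n) for j in range(n)
                       if z[i] == a and z[j] == b)
            total += n_ab * tau(o_ab / n_ab)
    return total / (2 * n * n)
```

Since \(\tau\leq 0\), the modularity is maximal (equal to zero) exactly when every block is either complete or empty, as for two disjoint cliques split by their true labels.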

The set \(\mathcal{F}(n,\alpha)\) contains all community assignments such that the smallest community has size at least \(\alpha n\). This allows us to consider the community detection problem in the case of unbalanced networks. For balanced symmetric networks we have \(\alpha=1/k\), but in this work we assume \(\alpha\) is positive with the only restriction that \(\alpha < \min_a \pi_a\). This also allows us to consider a diverging number of communities. It is worth noting that requiring linear-size communities does not seem too restrictive, as sublinear communities would eventually “disappear” and could not provide a fixed \(k\)-communities model. On the other hand, this restriction appears as a necessary condition when considering the logarithmic degree regime, even for weak consistency, see Theorem 5.

Another related way to estimate the communities, from a Bayesian perspective, is to consider the integrated conditional likelihood approach, which is obtained by integrating out the parameter \(P\) from the conditional distribution 1 , that is \[\mathbb{Q}(\boldsymbol{a}_{n\times n}\mid \boldsymbol{z}_n) = \int_{[0,1]^{k(k+1)/2}}\mathbb{P}(\boldsymbol{a}_{n\times n}\mid \boldsymbol{z}_n)\nu(P)d P\,,\] where \(\nu(P)\) denotes a prior distribution on the matrix \(P\). For convenience in the computations (see below), we assume that the prior distribution of \(P\) is given by \[\begin{align} P_{ab} &\;\overset{i.i.d.}{\sim}\; \text{Beta}(1/2,1/2), \quad 1\leq a \leq b \leq k\,. \end{align}\] Rewriting the likelihood 1 by taking the product only over the communities \(a\) and \(b\) such that \(a \leq b\) and using the conjugacy between the Bernoulli and Beta distributions, we obtain that \[\mathbb{Q}(\boldsymbol{a}_{n\times n}\mid \boldsymbol{z}_n) = \prod_{1\leq a\leq b\leq k} \frac{ B( \tilde{o}_{ab}(\boldsymbol{z}_n)+1/2, \tilde{n}_{ab}(\boldsymbol{z}_n)-\tilde{o}_{ab}(\boldsymbol{z}_n)+1/2)}{B(1/2,1/2)}\,,\] where \(\tilde{o}_{ab}(\boldsymbol{z}_n)=o_{ab}(\boldsymbol{z}_n)/2\) if \(a= b\) and \(\tilde{o}_{ab}(\boldsymbol{z}_n)= o_{ab}(\boldsymbol{z}_n)\) if \(a\neq b\); similarly, \(\tilde{n}_{ab}(\boldsymbol{z}_n)= n_{ab}(\boldsymbol{z}_n)/2\) if \(a=b\) and \(\tilde{n}_{ab}(\boldsymbol{z}_n)=n_{ab}(\boldsymbol{z}_n)\) if \(a\neq b\), with \(B(x,y)\) denoting the beta function. 
Thus, the integrated conditional likelihood modularity can be defined as \[\label{def-P-hat2} Q_{\mathrm{\small icl}}(\boldsymbol{z}_n) = \frac{1}{n^2}\sum_{1\leq a \leq b\leq k} \log \frac{B( \tilde{o}_{ab}(\boldsymbol{z}_n)+1/2, \tilde{n}_{ab}(\boldsymbol{z}_n)-\tilde{o}_{ab}(\boldsymbol{z}_n)+1/2)}{B(1/2,1/2)}\,.\tag{5}\] Given the observed graph \(\boldsymbol{a}_{n\times n}\), the estimator of the communities by the maximum integrated conditional likelihood (ICL) is given by \[\label{zstar2} {\widehat{\boldsymbol{z}}_n}^{\mathrm{\small icl}}\;=\; \mathop{\mathrm{arg\,max}}\limits_{\boldsymbol{z}_n\in \mathcal{F}(n,\alpha)} Q_{\mathrm{\small icl}}(\boldsymbol{z}_n) \,,\tag{6}\] where \(\mathcal{F}(n,\alpha)\) is defined, for some \(\alpha>0\), by 4 .
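Using the identity \(\log B(x,y) = \log\Gamma(x) + \log\Gamma(y) - \log\Gamma(x+y)\), the modularity 5 can be evaluated stably via log-gamma functions. A minimal sketch follows (the names `log_beta` and `Q_icl` are ours).

```python
import math

def log_beta(x, y):
    # log B(x, y) via log-gamma, numerically stable for large arguments
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def Q_icl(z, A, k):
    """Integrated conditional likelihood modularity (5) under
    i.i.d. Beta(1/2, 1/2) priors on the entries of P."""
    n = len(z)
    n_a = [sum(zi == a for zi in z) for a in range(k)]
    total = 0.0
    for a in range(k):
        for b in range(a, k):
            o = sum(A[i][j] for i in range(n) for j in range(n)
                    if z[i] == a and z[j] == b)
            # tilde-o and tilde-n as defined in the text
            o_t = o / 2 if a == b else o
            n_t = n_a[a] * (n_a[a] - 1) / 2 if a == b else n_a[a] * n_a[b]
            total += (log_beta(o_t + 0.5, n_t - o_t + 0.5)
                      - log_beta(0.5, 0.5))
    return total / (n * n)
```

By symmetry of the formula, the value is invariant under a permutation of the community labels, matching the identifiability discussion above.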

[15] was the first to propose a Bayesian community detection approach and to establish its strong consistency, under the condition \(\rho_n \gg \log^2 n/n\). They defined prior distributions on \(\pi\) and \(P\) and derived the joint distribution of \(\boldsymbol{A}_{n\times n}\) and \({\boldsymbol{Z}_n}\) by integrating out the parameters. The estimator of the communities was defined as the posterior mode of the distribution of \({\boldsymbol{Z}_n}\) given \(\boldsymbol{A}_{n\times n}\). In this work, we consider the integrated conditional likelihood approach, which is obtained by integrating out the parameter \(P\) from the conditional distribution of \(\boldsymbol{A}_{n\times n}\) given \({\boldsymbol{Z}_n}\), and defining the estimator of the communities according to this distribution. This minor modification allows us to prove the consistency of this Bayesian estimator in the regime \(\rho_n \asymp \log n/n\).

A basic known result relating the integrated conditional likelihood and the maximum likelihood modularities, which allows us to treat both estimators with the same techniques, is stated in the following lemma.

Lemma 1. For all \(n\) we have that \[\max_{\boldsymbol{e}_n\in[k]^n} \bigl| \;Q_{\mathrm{\small icl}}(\boldsymbol{e}_n) - Q_{\mathrm{\small ml}}(\boldsymbol{e}_n) \;\bigr| \;\leq\; \frac{k^2(\log n+2)}{n^2}\,.\]
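The bound of Lemma 1 can be checked numerically on small graphs. The sketch below uses our own compact implementation of both modularities (the per-pair form of 2 and 5 over \(a\leq b\), with \(\tilde o_{ab}\) and \(\tilde n_{ab}\) as above); all names are ours.

```python
import math
import random

def tau(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else x*math.log(x) + (1-x)*math.log(1-x)

def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def modularities(z, A, k):
    """Return (Q_ml, Q_icl) for labels z and adjacency A."""
    n = len(z)
    n_a = [sum(zi == a for zi in z) for a in range(k)]
    q_ml = q_icl = 0.0
    for a in range(k):
        for b in range(a, k):
            o = sum(A[i][j] for i in range(n) for j in range(n)
                    if z[i] == a and z[j] == b)
            o_t = o / 2 if a == b else o
            n_t = n_a[a]*(n_a[a]-1)/2 if a == b else n_a[a]*n_a[b]
            if n_t > 0:
                q_ml += n_t * tau(o_t / n_t)
            q_icl += (log_beta(o_t + 0.5, n_t - o_t + 0.5)
                      - log_beta(0.5, 0.5))
    return q_ml / n**2, q_icl / n**2
```

On a random graph with \(n=40\) and \(k=2\), the difference between the two modularities indeed stays below \(k^2(\log n + 2)/n^2\).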

The proof of this basic result is included in the Supplementary Material to this article.

3 Weak and strong consistency↩︎

Denote the discrepancy between any estimated community vector \({\widehat{\boldsymbol{z}}_n}\) and the true community labels \({\boldsymbol{Z}_n}\) by the function \[m({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n}) \;=\; \min_{\sigma}\, \Bigl\{\, \sum_{i=1}^n \mathbf{1}\{\hat{z}_i \neq \sigma(Z_i)\} \,\Bigr\}\] where \(\sigma\) denotes any permutation of the labels, \(\sigma\colon [k]\to[k]\).
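For small \(k\), the discrepancy \(m\) can be computed by brute force over the \(k!\) label permutations; a short sketch (the name `discrepancy` is ours):

```python
from itertools import permutations

def discrepancy(z_hat, z, k):
    """m(z_hat, z): Hamming distance minimized over label permutations."""
    return min(sum(zh != sigma[zi] for zh, zi in zip(z_hat, z))
               for sigma in permutations(range(k)))
```
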

We say that an estimator is weakly consistent if the fraction of misclassified nodes, up to a permutation, converges to zero; that is, if for all \(\epsilon >0\) we have \[\mathbb{P}(m({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n}) < \epsilon n) \to 1\quad \text{ as } n\to\infty\,.\] This is also called partial recovery of the communities. A stronger notion of consistency, exact recovery of the communities, requires that asymptotically no errors occur, that is, \[\mathbb{P}(m({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n}) =0 ) \to 1\quad \text{ as } n\to\infty\,.\] An estimator satisfying the above condition is called strongly consistent.

Our first result refers to the weak consistency of the maximum and integrated conditional likelihood estimators.

Theorem 1. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from an SBM with parameters \((k,\pi, \rho_nS)\). Then if \(\alpha<\min_{a} \pi_a\) and \(n\rho_n\to\infty\) as \(n\to\infty\), we have that both \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\) and \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\) are weakly consistent. Moreover, \(\alpha\) can be taken as decreasing with \(n\), at a rate \(\alpha \gg (n\rho_n)^{-1/2}\).

The result above shows that partial recovery can be obtained when \(n\rho_n\to\infty\). This condition was proved to be necessary and sufficient for the two-community symmetric model by [13]. Here we generalize the result to any number of communities \(k\) and to models with unknown parameters.

We now state the strong consistency of the estimators \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\) and \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\) for \(\rho_n = \Omega(\log n/n)\) or, equivalently, \(\rho_n^{-1} = O(n/\log n)\). When \(\rho_n\gg \log n/n\), any pair \((\pi,S)\) fulfills the requirements for strong consistency. However, in the special case \(\rho_n=\log n/n\), there is a precise phase transition threshold, depending on \((\pi, S)\), above which \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\) is strongly consistent.

The explicit constant that plays an essential role in the results is given by \[\label{constantC} C(\pi, S) \;=\; \min_{b,b'} \max_{t\in [0,1]} \;\sum_{a} \pi_a H_t(S_{ab} \| S_{ab'})\tag{7}\] with \[H_t(p\| q) \;=\; (1-t) p + t q - p^{1-t}q^t\,.\] For the maximum likelihood estimator \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\) when \(\rho_n=\log n/n\) we obtain the optimal condition \(C(\pi,S)\geq1\), showing that strong consistency holds even at the phase transition threshold \(C(\pi,S)=1\). This generalizes the results of [16] and [13]. In the case of the integrated conditional likelihood estimator, we obtain a slightly stronger condition depending on \(k\), namely \(C(\pi,S)\geq 1 + k^2\). It remains an open question whether this condition can be relaxed for \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\).
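The constant 7 can be approximated numerically by a grid search over \(t\in[0,1]\), using \(H_t(p\|q)=(1-t)p+tq-p^{1-t}q^t\) as in the proof of Theorem 2. The sketch below (the name `C_const` and the grid size are ours) evaluates it; the outer minimum is taken over ordered pairs \(b\neq b'\).

```python
def C_const(pi, S, grid=2001):
    """Chernoff-type constant (7): min over ordered pairs b != b' of
    max over t in [0, 1] of sum_a pi_a H_t(S_ab || S_ab')."""
    def H_t(t, p, q):
        return (1 - t) * p + t * q - p ** (1 - t) * q ** t
    k = len(pi)
    best = float("inf")
    for b in range(k):
        for bp in range(k):
            if b == bp:
                continue
            inner = max(sum(pi[a] * H_t(i / (grid - 1), S[a][b], S[a][bp])
                            for a in range(k)) for i in range(grid))
            best = min(best, inner)
    return best
```

For the planted bisection model with \(s_1=9\), \(s_2=1\), the maximum is attained at \(t=1/2\) and the value is \(\frac{1}{2}(\sqrt{9}-\sqrt{1})^2=2\), matching Remark 3.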

Theorem 2. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from an SBM with parameters \((\pi, \rho_nS)\). If \(\rho_n \to 0\) with \(\rho_n\gg \log n/n\), or if \(\rho_n=\log n/n\) and \(C(\pi,S)\geq 1\), then \[\mathbb{P}\Bigl(\, m({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n}) \, = \,0\, \Bigr) \;\to\; 1\] as \(n\to\infty\).

Remark 3. In the two symmetric communities model with \(\rho_n=\log n / n\) and \[S = \begin{pmatrix} s_1 & s_2\\ s_2 & s_1 \end{pmatrix}\] we have that \[\begin{align} C(\pi,S)& = \frac{1}{2}(\sqrt{s_1} - \sqrt{s_2})^2 \end{align}\] so the condition \(C(\pi,S)\geq 1\) is the same necessary and sufficient condition obtained by [13], [14] in the planted bisection model. In the balanced model with \(k\) communities, that is when \(\pi_i=1/k\) for all \(i\) and \(S_{ij} = s_1\) for \(i=j\) and \(S_{ij} = s_2\) for \(i\neq j\) we have that \[\begin{align} C(\pi,S)& = \frac{1}{k} (\sqrt{s_1} - \sqrt{s_2})^2\,. \end{align}\]
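The closed forms in Remark 3 can be verified numerically against definition 7. The sketch below repeats the grid evaluation of \(C(\pi,S)\) (our naming) and checks the balanced \(k\)-community formula \(C(\pi,S)=\frac{1}{k}(\sqrt{s_1}-\sqrt{s_2})^2\) for \(k=3\), \(s_1=16\), \(s_2=4\).

```python
def C_const(pi, S, grid=2001):
    """Grid evaluation of the constant (7), min over ordered pairs b != b'."""
    def H_t(t, p, q):
        return (1 - t) * p + t * q - p ** (1 - t) * q ** t
    k = len(pi)
    best = float("inf")
    for b in range(k):
        for bp in range(k):
            if b == bp:
                continue
            inner = max(sum(pi[a] * H_t(i / (grid - 1), S[a][b], S[a][bp])
                            for a in range(k)) for i in range(grid))
            best = min(best, inner)
    return best

# balanced model: pi uniform, S with s1 on the diagonal and s2 elsewhere
k, s1, s2 = 3, 16.0, 4.0
pi = [1.0 / k] * k
S = [[s1 if i == j else s2 for j in range(k)] for i in range(k)]
```

Only the diagonal term \(H_t(s_1\|s_2)\) and its mirror \(H_t(s_2\|s_1)\) contribute, the maximum over \(t\) is at \(t=1/2\), and the value collapses to \(\frac{1}{k}(\sqrt{s_1}-\sqrt{s_2})^2\).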

In the general model with \(k\) communities, our constant \(C(\pi,S)\) coincides with that obtained by [16]. They obtained consistency under the condition \(C(\pi,S)> 1\) and showed that consistency is impossible for \(C(\pi,S)<1\). Here we also prove consistency in this general setting at the phase transition threshold \(C(\pi,S)= 1\), completing all the possible scenarios.

In the case of the estimator \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\), we obtain a similar result with a slightly suboptimal constant in the case \(\rho_n = \log n/n\), but extending to the optimal sparse regime the consistency result for the Bayesian estimator obtained by [15], which was proved in the regime \(\rho_n\gg \log^2 n/n\).

Theorem 4. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from an SBM with parameters \((\pi, \rho_nS)\). If \(\rho_n \to 0\) with \(\rho_n\gg \log n/n\), or if \(\rho_n=\log n/n\) and \(C(\pi,S)\geq 1+k^2\), then \[\mathbb{P}\Bigl(\, m({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl},{\boldsymbol{Z}_n}) \, = \,0\, \Bigr) \;\to\; 1\] as \(n\to\infty\).

4 Empirical analysis↩︎

The ML and ICL estimators given in 3 and 6 , respectively, involve a maximization over all possible community assignments, making them computationally demanding. In practice, the solution of the ML estimator can be approximated by the solution of the variational EM algorithm, as proposed in [17]. In the case of the ICL estimator, [18] proposed a greedy inference algorithm based on the integrated likelihood. While this algorithm differs slightly from the ICL estimator in 6 , as it jointly estimates the number of communities and the community assignments, we use it to illustrate the practical application of the ICL estimator. To measure the discrepancy between the estimated and the true communities, we employ the Normalized Mutual Information (NMI), which is well defined even for clusterings with different numbers of communities [19], as can be the case when computing the ICL estimator. NMI values range from 0 to 1, where 0 indicates no mutual information and 1 indicates perfect agreement between the two community assignments.
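For reference, the NMI can be computed from two label vectors with elementary entropy calculations. The sketch below (our naming) uses the \(\sqrt{H(x)H(y)}\) normalization, one of several common variants; the convention for degenerate (constant) clusterings is our own choice.

```python
import math
from collections import Counter

def nmi(x, y):
    """Normalized mutual information between two label vectors,
    normalized here by sqrt(H(x) H(y))."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    mi = sum(c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
             for (a, b), c in pxy.items())
    if hx == 0.0 or hy == 0.0:
        # degenerate case: a constant clustering; convention chosen here
        return 1.0 if hx == hy else 0.0
    return mi / math.sqrt(hx * hy)
```

A relabeled but identical partition has NMI 1, while independent partitions have NMI 0, matching the interpretation given above.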

We first study the impact of the difference between the within-community (\(\rho_n s_1\)) and between-community (\(\rho_ns_2\)) connection probabilities, in terms of \((\sqrt{s_1}-\sqrt{s_2})^2\), for networks generated from the SBM with balanced communities and \(\rho_n=\log n / n\). Figure 1 shows that the NMI increases to one as the quantity \((\sqrt{s_1}-\sqrt{s_2})^2\) approaches \(k\) (the dashed vertical line), in agreement with Theorem 2.

a
b

Figure 1: Mean NMI between the estimated and true community memberships over 50 simulated balanced networks with \(n=200\) and \(\rho_n=\log n/ n\), for the approximate solutions of the ML and ICL estimators. The dashed vertical line shows the value of the constant at which the phase transition occurs. a — \(k=2\). b — \(k=3\).

To investigate the impact of the sparsity parameter \(\rho_n\) on the NMI between the estimated and true community assignments, we set \(s_1\) and \(s_2\) such that \((\sqrt{s_1}-\sqrt{s_2})^2 = 2.10\) and vary \(\rho_n\) for fixed \(n=200\). Figure 2 shows that for \(\rho_n=1/n=0.005\) the NMI is close to zero, and that it approaches one as \(\rho_n\) gets closer to \(\log n / n\).

a
b

Figure 2: Mean NMI between the estimated and true community memberships over 50 simulated balanced networks with \(n=200\) and \((\sqrt{s_1}-\sqrt{s_2})^2=2.10\), for the approximate solutions of the ML and ICL estimators. The dashed line is set at \(\rho_n=\log n / n\). a — \(k=2\). b — \(k=3\).

5 Proofs of main results↩︎

Denote the confusion matrix between two community assignments \(\boldsymbol{e}_n,\boldsymbol{z}_n\in [k]^n\) by \(R(\boldsymbol{e}_n,\boldsymbol{z}_n) \in \mathbb{R}_{\geq0}^{k\times k}\) with entries \[\label{conf95mat} R_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n) \;=\; {\frac{1}{n}}\sum_{i=1}^n \mathbb{1}\{e_i=a, z_i=b\},\quad a,b\in [k]\,.\tag{8}\] Observe that the row and column sums of the confusion matrix satisfy \(n_{a}(\boldsymbol{e}_n) = n [R(\boldsymbol{e}_n,\boldsymbol{z}_n)\mathbf{1}]_a\) and \(n_{a}(\boldsymbol{z}_n) = n [R(\boldsymbol{e}_n,\boldsymbol{z}_n)^T\mathbf{1}]_a\), respectively, where \(\mathbf{1}\) is a \(k\times 1\) vector with entries equal to one. Given a vector \(v \in \mathbb{R}^k\) we denote by \(\text{Diag}(v)\) the \(k\times k\) diagonal matrix with diagonal elements given by \(v\). An important observation is that \(\text{Diag}(R(\boldsymbol{e}_n,\boldsymbol{z}_n)^T\mathbf{1}) = R(\boldsymbol{z}_n,\boldsymbol{z}_n)\), for all \(\boldsymbol{e}_n,\boldsymbol{z}_n\in [k]^n\).
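The confusion matrix 8 and the marginal identities noted above can be checked directly; a minimal sketch (the name `confusion` is ours):

```python
def confusion(e, z, k):
    """Confusion matrix (8): R[a][b] = (1/n) #{i : e_i = a, z_i = b}."""
    n = len(e)
    R = [[0.0] * k for _ in range(k)]
    for ei, zi in zip(e, z):
        R[ei][zi] += 1.0 / n
    return R
```

Row sums give \(n_a(\boldsymbol{e}_n)/n\), column sums give \(n_a(\boldsymbol{z}_n)/n\), and \(R(\boldsymbol{z}_n,\boldsymbol{z}_n)\) is the diagonal matrix of column sums, as stated.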

Given any matrix \(R\in \mathbb{R}_{\geq0}^{k\times k}\) and \(a,b\in[k]\) we define the value \[\label{pmixt} [P_R]_{ab} = \frac{[R P R^T]_{ab}-\delta_{ab} \sum_{i=1}^{k}P_{ii} R_{ai}/n}{ [R\mathbf{1}]_{a}( [R\mathbf{1}]_{b} - \delta_{ab}/n)}\,,\tag{9}\] where \(\delta_{ab}=\mathbf{1}\{a=b\}\). Observe that for the model 1 , for any \(\boldsymbol{e}_n,\boldsymbol{z}_n\in [k]^n\) we have that \[\label{cond95exp95oab} \mathbb{E}(o_{ab}(\boldsymbol{e}_n)\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;=\; n_{ab}(\boldsymbol{e}_n) [P_{R(\boldsymbol{e}_n,\boldsymbol{z}_n)}]_{ab}\,.\tag{10}\]
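Identity 10 states that, conditionally on \(\boldsymbol{z}_n\), the expectation of \(o_{ab}(\boldsymbol{e}_n)\) equals \(n_{ab}(\boldsymbol{e}_n)[P_{R}]_{ab}\). Since this conditional expectation is a finite sum of entries of \(P\), both sides can be computed exactly; the sketch below (our naming) does so.

```python
def confusion(e, z, k):
    n = len(e)
    R = [[0.0] * k for _ in range(k)]
    for ei, zi in zip(e, z):
        R[ei][zi] += 1.0 / n
    return R

def P_R(R, P, n, a, b):
    """Entry (a, b) of the mixture matrix in (9)."""
    k = len(R)
    d = 1.0 if a == b else 0.0
    rpr = sum(R[a][c] * P[c][dd] * R[b][dd]
              for c in range(k) for dd in range(k))
    num = rpr - d * sum(P[i][i] * R[a][i] for i in range(k)) / n
    return num / (sum(R[a]) * (sum(R[b]) - d / n))

def expected_o(e, z, P, a, b):
    """E[o_ab(e) | Z = z]: sum of P[z_i][z_j] over ordered pairs i != j
    with e_i = a and e_j = b."""
    n = len(e)
    return sum(P[z[i]][z[j]] for i in range(n) for j in range(n)
               if i != j and e[i] == a and e[j] == b)
```

The diagonal correction term in 9 removes exactly the \(i=j\) contributions, so the two computations agree for every pair \((a,b)\).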

Now define for any confusion matrix \(R \in \mathbb{R}_{\geq0}^{k\times k}\) the function \[\begin{align} H_{P,n}(R) &\;=\; \frac{1}{2} \sum_{1\leq a,b\leq k} [R\mathbf{1}]_a([R\mathbf{1}]_b-\delta_{ab}/n)\, \tau\bigl( [ P_{R}]_{ab} \bigr) \end{align}\] where \(P_R\) is defined in 9 and \(\tau(x)=x\log(x) + (1-x)\log(1-x)\). Observe that the function \(H_{P,n}(R)\) is the likelihood modularity 2 with the argument in the function \(\tau\) substituted by its conditional expectation, given by 10 .

We also define, for two vectors \(\boldsymbol{e}_n,\boldsymbol{z}_n\in [k]^n\), the function \(X\colon [k]^n\times [k]^n \to \mathbb{R}\) by \[\label{Xmatrix} X(\boldsymbol{e}_n,\boldsymbol{z}_n)\;=\; \frac{1}{2n^2}\sum_{1\leq a,b\leq k}\,n_{ab}(\boldsymbol{e}_n) \Bigl[ \tau\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr) - \tau\bigl( [ P_{R(\boldsymbol{e}_n,\boldsymbol{z}_n)}]_{ab} \bigr)\Bigr]\,.\tag{11}\] Observe that with this definition we have that \[\label{Xmatrix2} X(\boldsymbol{e}_n,\boldsymbol{z}_n)\;=\; Q_{\mathrm{\small ml}}(\boldsymbol{e}_n) - H_{P,n}(R(\boldsymbol{e}_n,\boldsymbol{z}_n))\,.\tag{12}\]
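Identity 12 is an exact algebraic fact, since the weights \([R\mathbf{1}]_a([R\mathbf{1}]_b-\delta_{ab}/n)\) equal \(n_{ab}(\boldsymbol{e}_n)/n^2\). The following sketch (our naming throughout) evaluates 11 and the right-hand side of 12 independently and confirms they agree.

```python
import math

def tau(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else x*math.log(x) + (1-x)*math.log(1-x)

def confusion(e, z, k):
    n = len(e)
    R = [[0.0]*k for _ in range(k)]
    for ei, zi in zip(e, z):
        R[ei][zi] += 1.0/n
    return R

def P_R(R, P, n, a, b):
    k = len(R)
    d = 1.0 if a == b else 0.0
    rpr = sum(R[a][c]*P[c][e_]*R[b][e_] for c in range(k) for e_ in range(k))
    num = rpr - d*sum(P[i][i]*R[a][i] for i in range(k))/n
    return num / (sum(R[a]) * (sum(R[b]) - d/n))

def Q_ml(e, A, k):
    n = len(e)
    n_e = [e.count(a) for a in range(k)]
    tot = 0.0
    for a in range(k):
        for b in range(k):
            n_ab = n_e[a]*n_e[b] if a != b else n_e[a]*(n_e[a]-1)
            if n_ab == 0:
                continue
            o = sum(A[i][j] for i in range(n) for j in range(n)
                    if e[i] == a and e[j] == b)
            tot += n_ab * tau(o/n_ab)
    return tot/(2*n*n)

def H_Pn(R, P, n):
    """H_{P,n}(R): likelihood modularity with o_ab/n_ab replaced by [P_R]_ab."""
    k = len(R)
    tot = 0.0
    for a in range(k):
        for b in range(k):
            d = 1.0 if a == b else 0.0
            w = sum(R[a])*(sum(R[b]) - d/n)
            if w > 0:
                tot += w * tau(P_R(R, P, n, a, b))
    return tot/2

def X(e, z, A, P, k):
    """X(e, z) as defined in (11)."""
    n = len(e)
    R = confusion(e, z, k)
    n_e = [e.count(a) for a in range(k)]
    tot = 0.0
    for a in range(k):
        for b in range(k):
            n_ab = n_e[a]*n_e[b] if a != b else n_e[a]*(n_e[a]-1)
            if n_ab == 0:
                continue
            o = sum(A[i][j] for i in range(n) for j in range(n)
                    if e[i] == a and e[j] == b)
            tot += n_ab*(tau(o/n_ab) - tau(P_R(R, P, n, a, b)))
    return tot/(2*n*n)
```
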

Let \(\|M\|_1\) denote the entry-wise 1-norm of a matrix \(M\), defined as the sum of the absolute values of all entries of \(M\).

We will now prove the weak consistency of the maximum and integrated likelihood estimators as stated in Theorem 1.

Proof of Theorem 1. Observe that by Lemma 2 we have that \[m({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n}) \;=\; \frac{n}{2}\,\|R({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n})-R({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})\|_1\,.\] Then it is enough to prove that for all \(\epsilon>0\) \[\|R({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n})-R({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})\|_1<\epsilon\] eventually almost surely as \(n\to\infty\). Let \(\mathcal{R}_\epsilon\) be the set of probability matrices \(R\) with \[\min_{\sigma} \|D_\sigma R - \text{Diag}(R^T\mathbf{1})\|_1\geq \epsilon\quad\text{ and }\quad \min_{a\colon \pi_a>0}( [R^T\mathbf{1}]_a) \geq \epsilon\,,\] where \(D_\sigma\) denotes a permutation matrix in the \(k\) labels. That is, \(\mathcal{R}_\epsilon\) consists of confusion matrices \(R\) that contain a minimal fraction of nonzero elements in each of their columns and that cannot be converted into a diagonal matrix through row permutations. Let \[\eta_n = \inf_{R\in \mathcal{R}_\epsilon} \{ \,H_{\rho_nS,n}(\text{Diag}(R^T\mathbf{1}))-H_{\rho_nS,n}(R)\,\}\,.\] By Lemma 3, we can see that \(\eta_n\geq c\rho_n\) for all \(n\) sufficiently large and for a given constant \(c>0\). Then, for every \(R\in\mathcal{R}_\epsilon\), \[\label{qml1} H_{\rho_nS,n}(\text{Diag}(R^T\mathbf{1}))-H_{\rho_nS,n}(R) \;\geq\; c\rho_n\tag{13}\] and, by the definition of \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\), we also have that \[\label{qml2} Q_{\mathrm{\small ml}}({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}) - Q_{\mathrm{\small ml}}({\boldsymbol{Z}_n}) \;\geq\; 0\tag{14}\] eventually almost surely as \(n\to\infty\), provided \(\alpha< \min_{a}\pi_a\). 
Therefore, if \(R({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n})\in \mathcal{R}_\epsilon\), then \[X({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n}) \;\geq \; c\rho_n\,.\] By Theorem 6 we have for all \(\delta>0\) that \[|X({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})|\;<\; \delta\rho_n\] which is a contradiction for \(\delta<c\). Then we must have that \(R({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n})\not\in \mathcal{R}_\epsilon\), and since the second condition in the definition of \(\mathcal{R}_\epsilon\) certainly holds for a sufficiently small \(\epsilon\), eventually almost surely as \(n\to\infty\), this implies \[\|R({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml},{\boldsymbol{Z}_n})-R({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})\|_1<\epsilon\,.\] This concludes the proof of Theorem 1 for the case of the maximum conditional likelihood estimator \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\). In the case of \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\), by definition we obtain that \[Q_{\mathrm{\small icl}}({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}) - Q_{\mathrm{\small icl}}({\boldsymbol{Z}_n}) \;\geq\; 0\,,\] so if \(R({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl},{\boldsymbol{Z}_n})\in \mathcal{R}_\epsilon\), then by 13 and Lemma 1 we have that \[\begin{align} X({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n}) &\;= \; H_{\rho_nS,n}(\text{Diag}(R^T\mathbf{1}))- Q_{\mathrm{\small icl}}({\boldsymbol{Z}_n}) + Q_{\mathrm{\small icl}}({\boldsymbol{Z}_n}) - Q_{\mathrm{\small ml}}({\boldsymbol{Z}_n}) \\ &\quad\; -H_{\rho_nS,n}(R) + Q_{\mathrm{\small icl}}({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}) - Q_{\mathrm{\small icl}}({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}) + Q_{\mathrm{\small ml}}({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}) \\ & \;\geq \; c\rho_n - \frac{2k^2 (\log n+2)}{n^2}\\ & \;\geq \; \frac{c}{2}\rho_n \end{align}\] for sufficiently large \(n\). 
But we also have by Theorem 6 that \[|X({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})|\;<\; \delta\rho_n\] which is a contradiction for \(\delta<c/2\). ◻

Proof of Theorem 2. Let \({\widehat{\boldsymbol{z}}_n}={\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\) and write \(m=m({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n})\) and \(R = R({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n})\). Fix \(\epsilon>0\) and consider the event \[m\;=\; \frac{n}{2}\,\|\text{Diag}(R^T\mathbf{1}) - R\|_1 \;\leq \; \epsilon n\,.\] By Theorem 1 this event holds with probability converging to one, so it is enough to prove that \(m\) does not belong to the interval \((0, \epsilon n]\), with probability converging to 1. By the Strong Law of Large Numbers we also have that \[\|\text{Diag}(R^T\mathbf{1}) - \text{Diag}(\pi)\|_1 \;\leq \; \epsilon\] eventually almost surely as \(n\to\infty\), therefore the above facts imply that \[\|R - \text{Diag}(\pi)\|_1 \;\leq \; 2\epsilon\] with probability converging to 1 as \(n\to\infty\). If \(\epsilon\) is taken sufficiently small and \(m\geq 1\), by Lemma 4 we have that \[\label{mainineq} \begin{align} H_{\rho_n S,n}(\text{Diag}(R^T\mathbf{1})) - H_{\rho_nS,n}(R) \;&\geq \; \rho_n\widetilde{C}(R,S) - \frac{c\rho_n m^2}{n^2} - \frac{c'\rho_n^2 m}{n} \end{align}\tag{15}\] with \[\widetilde{C}(R, S) = \sum_{a}\sum_{b\neq b'} [R^T\mathbf{1}]_{a} K_1(S_{ab}\| S_{ab'})R_{bb'} \,,\] \(K_1(p\|q) = p\log(p/q) + q-p\) and \(c,c'\) constants depending only on \(S\). Moreover, we have that \[\label{eq378} Q_{\mathrm{\small ml}}({\widehat{\boldsymbol{z}}_n}) - Q_{\mathrm{\small ml}}({\boldsymbol{Z}_n}) \;\geq\; 0\tag{16}\] eventually almost surely as \(n\to\infty\), provided \(\alpha< \min_{a}\pi_a\). Then by summing 15 and 16 and by 12 we must have that \[\label{eq238} X({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n}) \;\geq \; \rho_n\widetilde{C}(R,S) - \frac{c\rho_nm^2}{n^2}- \frac{c'\rho_n^2 m}{n}\tag{17}\] eventually almost surely as \(n\to\infty\). 
We will prove that \[\mathbb{P}\Bigl(\, X({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n}) \; >\; \rho_n\widetilde{C}(R,S) - \frac{c\rho_nm^2}{n^2}- \frac{c'\rho_n^2 m}{n} \Bigr) \;\longrightarrow\; 0\] as \(n\to\infty\), implying that \(m({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n})=0\) with probability converging to 1 as \(n\to\infty\). By the Law of Large Numbers we know that the event \(\{n_{a}({\boldsymbol{Z}_n})\geq \frac{\pi_a}{2} n \text{ for all }a \}\) will hold eventually with probability one. Then by Theorem 9 we will have for almost all \(\boldsymbol{z}_n\) and all \(\boldsymbol{e}_n\in[k]^n\) with \(m(\boldsymbol{z}_n,\boldsymbol{e}_n) \leq \min_a \frac{1}{16}\pi_a n\) that \[\begin{align} \mathbb{P}( X(\boldsymbol{e}_n,\boldsymbol{z}_n) - &X(\boldsymbol{z}_n,\boldsymbol{z}_n) \; >\; \, \rho_n \widetilde{C}(R,S) - \frac{c\rho_nm^2}{n^2}- \frac{c'\rho_n^2 m}{n} \,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n)\\ &\;\leq\; \exp\Bigl[ - \sup_{t\in(0,1]} \Bigl\{ t \widetilde{C}(R,S) - C_t(R,S)\Bigr\} \rho_n n^2 + D(m,n) \Bigr] \end{align}\] with \(C_t(R,\rho_nS)\) given by ?? and \[D(m, n) \;=\; D \Bigl( \rho_nm^2 + \rho_n^2mn + \delta \rho_n (n+ m\sqrt{n}) \Bigr)\,,\] where \(D\) is a constant only depending on \(S\) and \(\delta>0\) is arbitrarily small. 
As Hoeffding’s inequality implies that \(|[R^T\mathbf{1}]_{a} - \pi_a| \leq \sqrt{\log n/n}\) for all \(a\) and for almost all \(\boldsymbol{z}_n\) when \(n\to\infty\), we obtain that \[\begin{align} t\widetilde{C}(R,S)-C_t(R, S) &\geq\; \sum_{a}\sum_{b\neq b'} [R^T\mathbf{1}]_{a} H_t(S_{ab} \| S_{ab'})R_{bb'}\\ &\geq \; C(\pi,S) \frac{m}{n} - \frac{c''m \sqrt{\log n}}{n^{3/2}} \end{align}\] with \[H_t(p\| q) \;=\; (1-t) p + t q - p^{1-t}q^t\,,\] \[C(\pi, S) \;=\; \min_{b,b'} \max_{t\in [0,1]} \;\sum_{a} \pi_a H_t(S_{ab} \| S_{ab'})\,,\] and \[c''= \max_{a,b, b'} \max_{t\in[0,1]} H_t(S_{ab} \| S_{ab'})\,.\] Now we use a union bound over \(m=1,\dots, \epsilon n\) and \(\boldsymbol{e}_n\) with \(m(\boldsymbol{e}_n,\boldsymbol{z}_n)=m\). We first observe that it suffices to bound the union over the different confusion matrices \(R\) with \(n\sum_{b\neq b'} R_{bb'} =m\). A counting argument shows that this number is at most \[\binom{m+k(k-1)-1}{m}\,,\] as we can think of it as the number of ways of putting \(m\) balls into \(k(k-1)\) boxes. This bound is useful for \(m\in (0,\rho_nn)\). 
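The stars-and-bars count used above can be checked by brute force for small values. The sketch below (plain Python, with hypothetical toy values of \(k\) and \(m\)) enumerates the nonnegative integer fillings of the \(k(k-1)\) off-diagonal boxes and compares the result with the binomial formula.

```python
from itertools import product
from math import comb

def count_fillings(m, boxes):
    # brute-force count of nonnegative integer vectors of length `boxes` summing to m
    return sum(1 for x in product(range(m + 1), repeat=boxes) if sum(x) == m)

k, m = 3, 4                        # hypothetical toy values
boxes = k * (k - 1)                # off-diagonal entries of the confusion matrix
brute = count_fillings(m, boxes)
formula = comb(m + boxes - 1, m)   # stars and bars: m balls into k(k-1) boxes
print(brute, formula)              # both equal comb(9, 4) = 126
```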
Then, denoting by \[x = \rho_n\widetilde{C}(R,S) - \frac{c\rho_nm^2}{n^2}- \frac{c'\rho_n^2 m}{n}\] and using the usual bound \[\binom{M}{m} \;\leq \; \biggl(\frac{eM}{m} \biggr)^m\] we obtain that \[\begin{align} \mathbb{P}\Bigl(\, &\sup_{m\in (0,\rho_nn)} \sup_{\boldsymbol{e}_n\colon m(\boldsymbol{e}_n,\boldsymbol{z}_n)=m}X(\boldsymbol{e}_n,\boldsymbol{z}_n) - X(\boldsymbol{z}_n,\boldsymbol{z}_n) \, >\, x \,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\,\Bigr) \\[2mm] &\leq\; \sum_{m=1}^{\rho_n n} \binom{m+k(k-1)-1}{m}\exp\Bigl[ - C(\pi,S)\rho_n n m +\widetilde{D}(m,n) \Bigr] \\ &\leq\; \sum_{m=1}^{\rho_n n} \exp\bigl[ - C(\pi,S)\rho_n n m + \widetilde{D}(m,n)+ m\log e(m+k(k-1)) \bigr]\\ &\leq\; \sum_{m=1}^{\rho_n n} \exp\Bigl[ - \rho_n n m \Bigl( C(\pi,S) -\frac{\widetilde{D}(m, n)}{\rho_n n m}- \frac{\log e(\rho_n n+k(k-1))}{\rho_n n} \Bigr) \Bigr]\,, \end{align}\] with \(\widetilde{D}(m, n)=D(m, n) + c'' \rho_nm\sqrt{n\log n}\). Then as \[\frac{\widetilde{D}(m, n)}{\rho_n n m}+ \frac{\log e(\rho_nn+k(k-1))}{\rho_n n}\] is arbitrarily small for \(n\) sufficiently large, we can bound the last sum by \[\rho_n n \;\exp\Bigl[ - \frac{1}{2} C(\pi,S) \rho_nn \Bigr]\] provided \(C(\pi,S)>0\) and \(n\) is sufficiently large. Then the probability above converges to 0 as \(n\to\infty\). 
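The elementary estimate \(\binom{M}{m}\leq (eM/m)^m\) invoked here can be verified numerically over a grid; a quick sketch:

```python
from math import comb, e

# check comb(M, m) <= (e*M/m)**m over a grid of values
ok = all(comb(M, m) <= (e * M / m) ** m
         for M in range(1, 200) for m in range(1, M + 1))
print(ok)
```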
Now, for \(m\in(\rho_nn, \epsilon n)\), we bound the number of \(\boldsymbol{e}_n\) with \(m(\boldsymbol{e}_n,\boldsymbol{z}_n)=m\) by the usual bound \[\binom{n}{m}(k-1)^m \;\leq \; \biggl(\frac{en(k-1)}{m} \biggr)^m\] and we obtain that \[\label{main-display} \begin{align} \mathbb{P}\Bigl(\, &\sup_{m\in (\rho_nn,\epsilon n)} \sup_{\boldsymbol{e}_n\colon m(\boldsymbol{e}_n,\boldsymbol{z}_n)=m}X(\boldsymbol{e}_n,\boldsymbol{z}_n) - X(\boldsymbol{z}_n,\boldsymbol{z}_n) \, >\, x \,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\,\Bigr) \\[2mm] &\leq\; \sum_{m={\rho_n n}}^{\epsilon n} \biggl(\frac{en(k-1)}{m} \biggr)^m\exp\Bigl[ - C(\pi,S)\rho_n n m +\widetilde{D}(m,n) \Bigr] \\ &=\; \sum_{m={\rho_n n}}^{\epsilon n} \exp\bigl[ - C(\pi,S)\rho_n n m + \widetilde{D}(m,n)+ m\log e(k-1)n - m\log m\bigr]\\ & =\; \sum_{m={\rho_n n}}^{\epsilon n} \exp\Bigl[ - \rho_n n m \Bigl( C(\pi,S) - \frac{\widetilde{D}(m,n)}{\rho_nmn}- \frac{\log e(k-1)n}{\rho_n n}+\frac{\log m}{\rho_n n} \Bigr) \Bigr] \,. \end{align}\tag{18}\] Now, for \(\rho_n\gg \log n/n\), the negative terms inside the parentheses can be made arbitrarily small, so if \(C(\pi,S)>0\), the sum converges to 0 as \(n\to\infty\). It remains to consider the case \(\rho_n=\log n/n\). Here the last display of 18 equals \[\begin{align} &\sum_{m=\rho_n n}^{\epsilon n} \exp\Bigl[ - m \log n\Bigl( C(\pi,S) -\frac{\widetilde{D}(m,n)}{\rho_nmn}- \frac{\log e(k-1)}{\log n} - 1+\frac{\log \log n}{\log n} \Bigr) \Bigr] \,. 
\end{align}\] In this case we have that \[\begin{align} \frac{\widetilde{D}(m,n)}{\rho_nmn} &= \frac{D \bigl( \rho_nm^2 + \rho_n^2mn + \delta \rho_n (n+ m\sqrt{n}) \bigr) + c''\rho_nm\sqrt{n\log n}}{\rho_nmn}\\ &= D \Bigl( \frac{m}{n} + \rho_n + \delta \bigl(\frac{1}{m}+ \frac{1}{\sqrt{n}}\bigr) \Bigr) + c''\sqrt{\frac{\log n}{n}}\;\ll\; \frac{\log\log n}{\log n} \end{align}\] and also \[\frac{\log e(k-1)}{\log n} \;\ll\; \frac{\log\log n}{\log n}\,.\] Then if \(C(\pi,S)\geq 1\), the sum can be bounded above by \[\begin{align} 2 \exp\bigl( - \frac{1}{2} \log n \log\log n \bigr) \end{align}\] and so converges to 0 as \(n\to\infty\). This concludes the proof of Theorem 2. ◻
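To illustrate the threshold constant numerically: the sketch below evaluates \(C(\pi,S)\) by a plain grid search over \(t\), with hypothetical parameter values. For the symmetric planted bisection with \(\pi=(1/2,1/2)\), within-block rate \(a\) and between-block rate \(b\), the maximum is attained at \(t=1/2\) and \(C(\pi,S)\) reduces to the familiar \((\sqrt{a}-\sqrt{b})^2/2\).

```python
from math import sqrt

def H(t, p, q):
    # H_t(p||q) = (1-t)*p + t*q - p^(1-t) * q^t
    return (1 - t) * p + t * q - p ** (1 - t) * q ** t

def C(pi, S):
    # C(pi,S) = min over ordered pairs b != b' of
    #           max_{t in [0,1]} sum_a pi_a * H_t(S_ab || S_ab')
    k = len(pi)
    ts = [i / 1000 for i in range(1001)]
    vals = [max(sum(pi[a] * H(t, S[a][b], S[a][bp]) for a in range(k))
                for t in ts)
            for b in range(k) for bp in range(k) if b != bp]
    return min(vals)

within_rate, between_rate = 9.0, 1.0          # hypothetical planted-bisection rates
pi = [0.5, 0.5]
S = [[within_rate, between_rate], [between_rate, within_rate]]
print(C(pi, S))   # → 2.0, matching (sqrt(9) - sqrt(1))**2 / 2
```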

Proof of Theorem 4. In the case of \({\widehat{\boldsymbol{z}}_n}={\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\), the proof is analogous to that of Theorem 2 with some minor modifications, but we obtain a suboptimal constant. Observe that, since \[Q_{\mathrm{\small icl}}({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}) - Q_{\mathrm{\small icl}}({\boldsymbol{Z}_n}) \;\geq\; 0\,,\] by Lemma 1 we now have, instead of 17, that \[\label{eq232} \begin{align} X({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n}) - X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n}) \;&>\; \rho_n\widetilde{C}(R,S) - \frac{c\rho_nm^2}{n^2}- \frac{c'\rho_n^2 m}{n}- \frac{k^2 \log n}{n^2} - \frac{2k^2}{n^2}\,. \end{align}\tag{19}\] The proof follows as before, substituting \(\widetilde{D}(m,n)\) with \[\bar D(m,n) = \widetilde{D}(m,n) + k^2\log n + 2k^2\,.\] As before, if \(\rho_nn \gg \log n\), then \[\frac{\bar D(m,n)}{\rho_nmn}\; \longrightarrow\; 0\] as \(n\to\infty\), and the strong consistency of \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\) is guaranteed under the condition \(C(\pi,S)>0\). If \(\rho_n n = \log n\), the sum in the last display of 18 becomes \[\begin{align} \sum_{m=\log n}^{\epsilon n} \exp\Bigl\{ - m \Bigl[ \bigl(C(\pi,S)-1-k^2\bigr) \log n + \frac{1}{2}\log\log n\Bigr]\Bigr\} \end{align}\] so, to obtain the strong consistency of \({\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\), we need a slightly stronger condition, namely \(C(\pi,S)\geq 1+k^2\) for \(\rho_n=\log n/n\). ◻

6 Discussion↩︎

In this paper we proved that the maximum conditional likelihood estimator of the communities in a general stochastic block model with unknown parameters and sparsity regime \(\rho_n\geq\log n/n\) can exactly recover the communities above and at the phase transition threshold, namely \(\rho_n=\log n/n\) and \(C(\pi,S)\geq 1\). Moreover, the integrated likelihood estimator is also strongly consistent in the same sparsity regime, but with a slightly bigger constant, namely \(C(\pi,S)\geq 1+k^2\). It remains open whether this condition can be relaxed for this estimator. Concerning almost exact recovery, we prove that a sufficient condition is \(n\rho_n\to\infty\) as \(n\to\infty\), generalizing the results of [13], which proved the sufficiency and necessity of this condition in the planted bisection model with known parameters. These constitute new results showing that maximum conditional likelihood is optimal for community detection. As is usual in the literature, in this paper we assume a known number of communities; yet its estimation is an interesting problem in itself. It remains an open question whether the estimation of \(k\) can be achieved in the regime \(n\rho_n\to\infty\) for penalized maximum likelihood, as shown for a penalized complete likelihood estimator in [20].

Acknowledgments↩︎

This work was produced as part of the activities of the Research, Innovation and Dissemination Center for Neuromathematics (FAPESP 2013/07699-0). It was also supported by FAPESP projects (grants 2017/10555-0 and 2023/05857-9) and CNPq project (grant 441884/2023-7). FL is partially supported by a CNPq’s research fellowship, grant 311763/2020-0.

7 Appendix↩︎

In this section we present some auxiliary results that are necessary to prove the weak and strong consistency of the proposed estimators.

We begin by proving Lemma 1, which relates the maximum and integrated conditional likelihood modularities.

Proof of Lemma 1. Observe that for any \(\boldsymbol{e}_n\in [k]^n\) we have \[\label{razao95a124z} Q_{\mathrm{\small ml}}(\boldsymbol{e}_n)-Q_{\mathrm{\small icl}}(\boldsymbol{e}_n) \;=\; \frac{1}{n^2}\sum\limits_{1\leq a \leq b \leq k} \log Q(\tilde{o}_{ab}(\boldsymbol{e}_n), \tilde{n}_{ab}(\boldsymbol{e}_n))\tag{20}\] with \[Q(\tilde{o}_{ab}(\boldsymbol{e}_n), \tilde{n}_{ab}(\boldsymbol{e}_n)) \;=\; \dfrac{ \Gamma\bigl(\frac{1}{2}\bigr)^2\Gamma\Bigl(\tilde{n}_{ab}+1\Bigr)\Bigl( \dfrac{\tilde{o}_{ab}}{ \tilde{n}_{ab}} \Bigr)^{ \tilde{o}_{ab}}\Bigl(1-\dfrac{\tilde{o}_{ab}}{\tilde{n}_{ab}}\Bigr)^{ \tilde{n}_{ab}-\tilde{o}_{ab}}}{\Gamma\Bigl(\tilde{o}_{ab}+\frac{1}{2}\Bigr)\Gamma\Bigl( \tilde{n}_{ab}-\tilde{o}_{ab}+\frac{1}{2}\Bigr)}\,,\] where, for simplicity in the notation, we have omitted the argument \(\boldsymbol{e}_n\) from the counters. For each pair \(a,b\), \(1\leq a,b \leq k\), using [4], we get \[\begin{align} \dfrac{\left( \dfrac{\tilde{o}_{ab}}{ \tilde{n}_{ab}} \right)^{ \tilde{o}_{ab}}\left(1-\dfrac{\tilde{o}_{ab}}{\tilde{n}_{ab}}\right)^{ \tilde{n}_{ab}-\tilde{o}_{ab}}}{\Gamma\left(\tilde{o}_{ab}+\frac{1}{2}\right)\Gamma\left( \tilde{n}_{ab}-\tilde{o}_{ab}+\frac{1}{2}\right)}\;\leq\; \frac{1}{\Gamma\left( \tilde{n}_{ab}+\frac{1}{2}\right)\Gamma\left(\frac{1}{2}\right)}\,. \end{align}\] Then \[\begin{align} & Q_{\mathrm{\small ml}}(\boldsymbol{e}_n)-Q_{\mathrm{\small icl}}(\boldsymbol{e}_n)\; \leq\; \frac{1}{n^2}\sum\limits_{1\leq a \leq b \leq k} \log\biggl( \frac{\Gamma(\frac{1}{2})\Gamma\left( \tilde{n}_{ab}+1\right)}{\Gamma\left( \tilde{n}_{ab}+\frac{1}{2}\right)} \biggr)\,. 
\end{align}\] Using the standard bound for the Gamma function \[x^{x-\frac{1}{2}}e^{-x}\sqrt{2\pi} \;\leq\; \Gamma(x)\; \leq\; x^{x-\frac{1}{2}}e^{-x}\sqrt{2\pi} e^{\frac{1}{12x}}\] valid for \(x>0\) we have that \[\begin{align} &\log\biggl( \frac{\Gamma(\frac{1}{2})\Gamma\left( \tilde{n}_{ab}+1\right)}{\Gamma\left( \tilde{n}_{ab}+\frac{1}{2}\right)} \biggr)\; \leq\; (\tilde{n}_{ab}+\textstyle{\frac{1}{2}})\log\left(\tilde{n}_{ab}+1\right) +\frac{1}{12(\tilde{n}_{ab}+1)} -\tilde{n}_{ab}\log\left(\tilde{n}_{ab}+\frac{1}{2}\right)+\frac{1}{2}\\ &\qquad\quad\leq \; \textstyle\frac{1}{2} \log \tilde{n}_{ab} + (\tilde{n}_{ab}+\textstyle\frac{1}{2})\log(1+\frac{1}{\tilde{n}_{ab}})-\tilde{n}_{ab}\log(1+\frac{1}{2\tilde{n}_{ab}})+\textstyle{\frac{1}{2}+\frac{1}{12 \tilde{n}_{ab}}} \\ & \qquad\quad\leq\; \log n + 2\,, \end{align}\] where the last inequality follows by \(1-\frac{1}{x} \leq \log x \leq x-1\) and \(1\leq \tilde{n}_{ab} \leq n^2\). As \[Q_{\mathrm{\small ml}}(\boldsymbol{e}_n)-Q_{\mathrm{\small icl}}(\boldsymbol{e}_n) \;\geq\; 0\] we conclude that \[|\, Q_{\mathrm{\small ml}}(\boldsymbol{e}_n)-Q_{\mathrm{\small icl}}(\boldsymbol{e}_n) \,| \;\leq\; \frac{k(k+1)(\log n+2)}{2n^2} \;\leq\; \frac{k^2(\log n+2)}{n^2}\,.\qedhere\] ◻
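The final inequality in the proof can be spot-checked numerically via `lgamma`. Since \(\tilde{n}_{ab}\leq n^2\), the stated bound \(\log n + 2\) follows once \(\log\bigl(\Gamma(\tfrac12)\Gamma(\tilde{n}+1)/\Gamma(\tilde{n}+\tfrac12)\bigr)\leq \tfrac12\log \tilde{n} + 2\); a quick sketch:

```python
from math import lgamma, log

def log_ratio(nt):
    # log( Gamma(1/2) * Gamma(nt + 1) / Gamma(nt + 1/2) )
    return lgamma(0.5) + lgamma(nt + 1) - lgamma(nt + 0.5)

# spot-check the bound log_ratio(nt) <= 0.5*log(nt) + 2 over a wide range
ok = all(log_ratio(nt) <= 0.5 * log(nt) + 2 for nt in range(1, 10 ** 6, 97))
print(ok)
```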

7.1 Basic results↩︎

The first two lemmas were stated and proved in [15]. We include them here for completeness, with some adaptations for our purposes.

Lemma 2. For every \(\boldsymbol{e}_n,\boldsymbol{z}_n\in[k]^n\) we have that \[\frac{1}{n}\sum_{i=1}^n \mathbf{1}\{e_i\neq z_i\} \;=\; \frac{1}{2}\|\text{Diag}(R(\boldsymbol{e}_n,\boldsymbol{z}_n)^T\mathbf{1}) - R(\boldsymbol{e}_n,\boldsymbol{z}_n)\|_1\,,\] where \(\text{Diag}(R(\boldsymbol{e}_n,\boldsymbol{z}_n)^T\mathbf{1})\) is a \(k\times k\) diagonal matrix with entries \([R(\boldsymbol{e}_n,\boldsymbol{z}_n)^T\mathbf{1}]_a\) on the diagonal for all \(a\in [k]\).

Proof. See [15]. ◻
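Lemma 2 can be verified directly on random labelings. The sketch below uses the normalized confusion matrix \(R(\boldsymbol{e}_n,\boldsymbol{z}_n)_{ab} = n^{-1}\#\{i\colon e_i=a,\, z_i=b\}\) (our reading of the paper's definition) and compares both sides of the identity.

```python
import random

def confusion_identity(e, z, k):
    # returns (misclassification fraction, 0.5 * ||Diag(R^T 1) - R||_1)
    n = len(e)
    # normalized confusion matrix R_{ab} = #{i: e_i = a, z_i = b} / n
    R = [[sum(1 for i in range(n) if e[i] == a and z[i] == b) / n
          for b in range(k)] for a in range(k)]
    col = [sum(R[a][b] for a in range(k)) for b in range(k)]   # [R^T 1]_b
    # entrywise L1 norm of Diag(R^T 1) - R
    l1 = sum(abs((col[a] if a == b else 0.0) - R[a][b])
             for a in range(k) for b in range(k))
    frac = sum(1 for i in range(n) if e[i] != z[i]) / n
    return frac, 0.5 * l1

random.seed(0)
k, n = 3, 60
e = [random.randrange(k) for _ in range(n)]
z = [random.randrange(k) for _ in range(n)]
print(confusion_identity(e, z, k))   # the two coordinates agree
```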

Lemma 3. For any fixed matrix \(S\), uniformly in probability matrices \(R\), as \(\rho_n\to0\) \[\label{sma} \frac{1}{\rho_n} \biggl( H_{\rho_nS,n}(\text{Diag}(R^T\mathbf{1})) - H_{\rho_nS,n}(R)\biggr)\;\rightarrow\;G_{S}(\text{Diag}(R^T\mathbf{1})) - G_{S}(R)\,,\tag{21}\] with \[G_S(R) \;=\; \sum_{a,b} [RSR^T]_{ab} \log\frac{ [RSR^T]_{ab}}{ [R\mathbf{1}]_a [R\mathbf{1}]_b} \,.\] Furthermore, if \((S,\pi)\) is identifiable and the columns of \(R\) corresponding to positive coordinates of \(\pi\) are not identically zero, then the right side is strictly positive unless \(D_\sigma R\) is a diagonal matrix for some permutation matrix \(D_\sigma\).

Proof. The proof given in [15] is for a matrix \(S\) with values in \([0,1]\). But the statement is valid for any bounded matrix \(S\): it suffices to add and subtract the value \[\sum_{a, b}[RSR^T]_{ab} \log \frac{1}{s_{\max}}\] in 21 and to normalize both sides by \(s_{\max}\). ◻

Lemma 4. Let \(S\) be fixed and symmetric, \(S>0\) coordinatewise and \((\pi,S)\) identifiable. Then there exist constants \(c,c'>0\) only depending on \(S\) such that \[\label{vdpeq1} \begin{align} H_{\rho_n S,n}(\text{Diag}(R^T\mathbf{1})) - H_{\rho_nS,n}(R) \;\geq \;& \sum_{a}\sum_{b\neq b'} [R^T\mathbf{1}]_{a} K_1(\rho_n S_{ab}\| \rho_n S_{ab'})R_{bb'} \\ & - c\rho_n \Bigl( \sum_{b\neq b'} R_{bb'}\Bigr)^2 - c'\rho_n^2\sum_{b\neq b'} R_{bb'} \end{align}\tag{22}\] for all \(R\) with \(\|R-\text{Diag}(\pi)\|_1 < \epsilon\) for \(\epsilon\) sufficiently small, where \(K_1(p\|q) = p \log (p/q) + q-p\).

Proof of Lemma 4. By a lengthy calculation, [15] proved that the left hand side of 22 equals \[- \nabla G(0)^T \lambda - \int_0^1 ( \nabla G(s\lambda) - \nabla G(0))^T ds\, \lambda\] with \(\lambda\) a vector defined by \(\lambda_{bb'}= R_{bb'}\) for \(b\neq b'\) and \[G(\lambda)\;=\; H_{P,n}\Bigl( \text{Diag}(R^T\mathbf{1}) + \sum_{b\neq b'}\lambda_{bb'}\Delta_{bb'} \Bigr)\,\] for some given matrices \(\Delta_{bb'}\). Then the authors proved that \[\frac{\partial}{\partial \lambda_{bb'}} G(\lambda)|_{\lambda=0} \;=\; - \sum_{a} [R^T\mathbf{1}]_{a} KL(P_{ab'}\| P_{ab}) + \frac{1}{2n}KL(P_{b'b'}\|P_{bb})\,\] for \(KL(s\|t) = s\log (s/t) + (1-s)\log((1-s)/(1-t))\). Then \[\begin{align} - \nabla G(0)^T \lambda \;&=\; \sum_{a}\sum_{b\neq b'} \Bigl( [R^T\mathbf{1}]_{a}KL(P_{ab'}\| P_{ab})R_{bb'} - \frac{1}{2n}KL(P_{b'b'}\|P_{bb})R_{bb'}\Bigr)\\ &\geq\; \sum_{a}\sum_{b\neq b'} [R^T\mathbf{1}]_{a}KL(P_{ab'}\| P_{ab})R_{bb'} - \frac{c_1\rho_n}{n} \sum_{b\neq b'} R_{bb'} \end{align}\] where \(c_1\) is a constant only depending on \(S\). On the other hand, \[\begin{align} \int_0^1 ( \nabla G(s\lambda) - \nabla G(0))^T ds\, \lambda \;&\leq\; \max_{b\neq b'} \int_0^1 \bigl| ( \nabla G(s\lambda) - \nabla G(0))_{bb'}\bigr| ds \, \|\lambda\|_1 \\ &\;\leq\;c_2 \rho_n \|\lambda\|_1^2 \end{align}\] as \(( \nabla G(s\lambda) - \nabla G(0))_{bb'}\) has uniformly continuous derivatives in a neighborhood of 0. Moreover, for \(p,q\) bounded we have that \[KL(\rho_n p\| \rho_n q) \;\geq \; \rho_n K_1(p\|q) - c_3 \rho_n^2\] with \(K_1(p\|q)=p\log(p/q) + q - p\). Therefore \[\begin{align} H_{\rho_n S,n}(\text{Diag}(R^T\mathbf{1})) - H_{\rho_nS,n}(R) \;\geq \; &\sum_{a}\sum_{b\neq b'} [R^T\mathbf{1}]_{a} K_1(\rho_n S_{ab}\|\rho_n S_{ab'})R_{bb'}\\ & - c\rho_n \Bigl( \sum_{b\neq b'} R_{bb'}\Bigr)^2 - c'\rho_n^2 \sum_{b\neq b'} R_{bb'} \,, \end{align}\] with \(c\) and \(c'\) positive constants only depending on \(S\). ◻
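The sparse-regime comparison \(KL(\rho_n p\|\rho_n q)\geq \rho_n K_1(p\|q) - c_3\rho_n^2\) used at the end of the proof can be illustrated numerically. The sketch below checks it on a grid of rates with the hypothetical choice \(c_3=10\), which is more than adequate for rates bounded away from 0 and 1 and \(\rho_n\) small.

```python
from math import log

def kl_bern(s, t):
    # Bernoulli Kullback-Leibler divergence KL(s || t)
    return s * log(s / t) + (1 - s) * log((1 - s) / (1 - t))

def k1(p, q):
    # Poisson-type divergence K_1(p || q) = p*log(p/q) + q - p
    return p * log(p / q) + q - p

rho, c3 = 1e-3, 10.0                  # hypothetical sparsity level and constant
grid = [0.1 * i for i in range(1, 10)]
ok = all(kl_bern(rho * p, rho * q) >= rho * k1(p, q) - c3 * rho ** 2
         for p in grid for q in grid)
print(ok)
```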

7.2 Auxiliary results for weak consistency↩︎

For simplicity in the notation, from now on we write \(R(\boldsymbol{e}_n,{\boldsymbol{Z}_n})=R(\boldsymbol{e}_n)\) for all \(\boldsymbol{e}_n\in[k]^n\). Observe that for fixed sets \(I,J\subset [n]\) and conditionally on \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\), the random variables \[W_{ij} = \mathbb{1}\{i\in I, j\in J\}( A_{ij} - P_{z_iz_j})\,,\quad i,j=1,\dots, n\] have zero mean and are independent. Moreover, the conditional variance is given by \[\text{Var}(W_{ij}\,|\,Z_i=z_i,Z_j=z_j) \;=\; \mathbb{1}\{i\in I, j\in J\}P_{z_i,z_j}(1 - P_{z_i,z_j})\,.\] By the assumptions on \(P\) we have that \[\rho_n s_{\min}(1-\rho_ns_{\max}) \;\leq\; \text{Var}(W_{ij}|Z_i=z_i,Z_j=z_j) \;\leq\; \rho_ns_{\max}\,.\] We are interested in a concentration result for the empirical probabilities given by \[\widehat P_{ab}(\boldsymbol{A}_{n\times n}|{\widehat{\boldsymbol{z}}_n}) \;=\; \frac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{n_{ab}({\widehat{\boldsymbol{z}}_n})}\,,\qquad 1\leq a,b\leq k\,,\] where \({\widehat{\boldsymbol{z}}_n}\) is any of the estimators defined by 3 or 6 . We do not know in principle what \({\widehat{\boldsymbol{z}}_n}\) is, so we control these probabilities over all possible vectors \({\widehat{\boldsymbol{z}}_n}\).

Theorem 5. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from a SBM with parameters \((\pi, \rho_nS)\), with \(n\rho_n\to\infty\). Then for all \(\delta>0\) \[\label{evento1} \sup_{a,b \in [k]} \; \bigl| \,\widehat P_{ab}(\boldsymbol{A}_{n\times n}|{\boldsymbol{Z}_n}) - \rho_nS_{ab}\, \bigr| \;<\; \frac{\sqrt{\delta\rho_n \log n}}{n}\,\tag{23}\] with probability converging to 1 as \(n\to\infty\). Moreover, for all \(\alpha>0\) and all \(\delta<s_{\min}/2\) and for \[n\rho_n \;>\; \frac{4s_{\max}}{\alpha^2\delta^2}\] we have that \[\label{evento3} \sup_{a,b \in [k]}\sup_{\boldsymbol{e}_n\in \mathcal{F}(n,\alpha)} \bigl| \,\widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n)}]_{ab}\, \bigr| \;<\; \delta\rho_n \,,\tag{24}\] eventually almost surely as \(n\to\infty\).

Proof. For any \(\boldsymbol{e}_n\in [k]^n\) and \(a,b\in [k]\) fixed consider the event \[B_{n}(\boldsymbol{e}_n) \; = \;\bigl\{\, \bigl|\, \widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n)}]_{ab}\, \bigr| \,>\, x \bigr\}\,,\] with \([P_{R(\boldsymbol{e}_n)}]_{ab}=[P_{R(\boldsymbol{e}_n,{\boldsymbol{Z}_n})}]_{ab}\) given by 9 . Then \[\begin{align} \mathbb{P}\Bigl(\;\sup_{a,b \in [k]} \;\bigl|\,&\widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n)}]_{ab} \,\bigr| \;>\; x\;\Bigr) \\[2mm] &\leq\; \sum_{\boldsymbol{z}_n} \sum_{a,b} \;\mathbb{P}( B_{n}(\boldsymbol{e}_n) \,|\, {\boldsymbol{Z}_n}=\boldsymbol{z}_n)\,\mathbb{P}({\boldsymbol{Z}_n}=\boldsymbol{z}_n)\,. \end{align}\] Now, for any \(a\in [k]\) define the set \(\mathcal{N}_a(\boldsymbol{e}_n) = \{i\colon e_i=a\}\). Given that \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\) and for \(\mathcal{N}_a(\boldsymbol{e}_n)=I\) and \(\mathcal{N}_b(\boldsymbol{e}_n)=J\) we have that \[o_{ab}(\boldsymbol{e}_n) - n_{ab}(\boldsymbol{e}_n) [P_{R(\boldsymbol{e}_n)}]_{ab}\] is a sum of \(n_{ab}(\boldsymbol{e}_n)=|I||J|\) zero mean independent random variables with variance bounded above by \(\rho_ns_{\max}\) for all \(n\) and below by \(\rho_ns_{\min}/2\), for sufficiently large \(n\), and the variables are also bounded in absolute value by 1. Here \(|I|\) denotes the cardinality of the set \(I\). If \(a=b\) then we have \(n_{aa}(\boldsymbol{e}_n) = |I|(|I|-1)/2\). The analysis is analogous using \(|J| = (|I|-1)/2\) below. Then by Bennett’s inequality we have that \[\begin{align} \mathbb{P}( B_{n}(\boldsymbol{e}_n)\,|\, {\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;&\leq\; 2\exp\Bigl(- \sigma^2 h\bigl(\frac{x|I||J|}{\sigma^2}\bigr)\Bigr) \end{align}\] with \(h(x) = (1+x)\log(1+x)-x\). 
Then using the bounds \[\frac{1}{2}|I||J|\rho_ns_{\min}\;\leq\; \sigma^2 \;\leq\; |I||J| \rho_n s_{\max}\] and \[h(y) \;\geq\; \log(2)\frac{y^2}{2}\] valid for \(y\in [0,1]\) we obtain, for \(x<\frac{1}{2}\rho_ns_{\min}\) that \[\label{bennet1} \begin{align} \mathbb{P}( B_{n}(\boldsymbol{e}_n)\,|\, {\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;&\leq \;2\exp\Bigl(- \frac{\log(2)x^2 |I||J|}{2 \rho_n s_{\max}} \Bigr)\,. \end{align}\tag{25}\] To prove 23 it is enough to take \(\boldsymbol{e}_n={\boldsymbol{Z}_n}\) and \(x=\sqrt{\delta\rho_n\log n}/n\) in 25 , and as in this case \(|I| \geq \pi_a n/2\) for all \(a\), eventually almost surely as \(n\to\infty\), we obtain that \[\begin{align} \mathbb{P}( B_{n}({\boldsymbol{Z}_n})) \;&\leq\; 2\exp\Bigl(- \delta \log n \Bigr)\,, \end{align}\] with \(\delta\) arbitrarily small. Then for all \(\delta>0\) \[\begin{align} \mathbb{P}\Bigl(\;\sup_{a,b \in [k]} \;\bigl|\,&\widehat P_{ab}(\boldsymbol{A}_{n\times n}|{\boldsymbol{Z}_n}) - [P_{R({\boldsymbol{Z}_n}})]_{ab} \,\bigr| \;>\; \frac{\sqrt{\delta\rho_n\log n}}{n}\;\Bigr) \;\leq\; \frac{2 k^2}{n^\delta}\;\to \;0 \end{align}\] as \(n\to\infty\). In order to prove 24 , we use a union bound over possible sets \(I\) and \(J\), as different \(\boldsymbol{e}_n\) with the same \(\mathcal{N}_a(\boldsymbol{e}_n)\) and \(\mathcal{N}_b(\boldsymbol{e}_n)\) have the same value for \(\widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n)}]_{ab}\). 
Then \[\label{conditional95prob} \begin{align} \mathbb{P}\Bigl(\;\sup_{a,b \in [k]}\sup_{\boldsymbol{e}_n\in \mathcal{F}(n,\alpha)} \;\bigl|\,&\widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n})]_{ab} \,\bigr| \;>\; x\;\Bigr) \\[2mm] &\leq\; \sum_{\boldsymbol{z}_n} \sum_{a,b}\sum_{I,J} \;\mathbb{P}( B_{n}(I,J) \,|\, {\boldsymbol{Z}_n}=\boldsymbol{z}_n)\,\mathbb{P}({\boldsymbol{Z}_n}=\boldsymbol{z}_n) \end{align}\tag{26}\] with \[B_{n}(I,J) \; = \; \bigcup_{\boldsymbol{e}_n\in C(I,J)}\;\bigl\{\, \bigl|\, \widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n)}]_{ab}\, \bigr| \,>\, x\,\bigr\}\] and \[C(I,J) \;=\; \{\boldsymbol{e}_n\colon \mathcal{N}_a(\boldsymbol{e}_n)=I \text{ and }\mathcal{N}_b(\boldsymbol{e}_n)=J \}\,.\] Now we need to compute the number of different sets \(I\), \(J\). Let \(|I|=n_1\), \(|J|=n_2\). If \(a\neq b\) and \(k\geq 3\), the remaining vertices are distributed into \(k-2\) communities, that is \(n_3=n-n_1-n_2\). If \(k=2\), we simply assume \(n_3=0\). And if \(a=b\) we only consider \(n_1\) and \(n_3\), as \(I=J\) and \(n_2=n_1\). In any case we have \[\binom{n}{n_1\,n_2\,n_3}\] choices for \(I\) and \(J\) with \(|I|=n_1\) and \(|J|=n_2\). Therefore taking \(x= \delta\rho_n\) in 25 we have that \[\begin{align} \sum_{I,J}\, \mathbb{P}( B_{n}(I,J) \,|\, {\boldsymbol{Z}_n}=\boldsymbol{z}_n)\; &\leq\; 2\sum_{n_1,n_2} \binom{n}{n_1\,n_2\,n_3} \exp\Bigl[- \frac{\log(2)\delta^2 \rho_n n_1n_2}{2s_{\max}}\Bigr]\\ &\leq\; 2 \exp\Bigl[-\frac{\log(2)\delta^2\alpha^2 n^2\rho_n}{2s_{\max}} \Bigr] \sum_{n_1,n_2} \binom{n}{n_1\,n_2\,n_3} \\ &\leq\; 2 \exp\Bigl[-\log(2) n \Bigl( \frac{\delta^2\alpha^2 n\rho_n}{2s_{\max}} - 2 \Bigr) \Bigr]\,. 
\end{align}\] and therefore \[\begin{align} \mathbb{P}\Bigl(\;\sup_{a,b \in [k]}\sup_{\boldsymbol{e}_n\in \mathcal{F}(n,\alpha)} \;\bigl|\,&\widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n})]_{ab} \,\bigr| \;>\; x\;\Bigr) \leq\; 2 k^2\exp\Bigl[-\log(2) n \Bigl( \frac{\delta^2\alpha^2 n\rho_n}{2s_{\max}} - 2 \Bigr) \Bigr]\,. \end{align}\] This bound goes to zero with \(n\) (and in fact is summable in \(n\)) if \[\frac{\delta^2\alpha^2 n\rho_n}{2s_{\max}} - 2 \;>\; 0\] for all sufficiently large \(n\), or equivalently if \[n\rho_n \;>\; \frac{4s_{\max}}{\alpha^2\delta^2}\,.\] The strict inequality is satisfied eventually for \(n\rho_n\to\infty\) as \(n\to\infty\), provided \(\alpha\) and \(\delta\) are fixed constants or decrease sufficiently slowly to zero. ◻
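The elementary bound \(h(y)\geq \log(2)\,y^2/2\) on \([0,1]\) for Bennett's function \(h(y)=(1+y)\log(1+y)-y\), used in the proof above, can be checked numerically:

```python
from math import log

def h(y):
    # Bennett's function h(y) = (1+y)*log(1+y) - y
    return (1 + y) * log(1 + y) - y

# verify h(y) >= log(2) * y**2 / 2 on a fine grid of [0, 1]
ok = all(h(i / 1000) >= log(2) * (i / 1000) ** 2 / 2 for i in range(1001))
print(ok)
```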

The next corollary is a key technical result that allows us to approximate the function \(\tau\) in 2 by a Taylor expansion and ensures that its argument belongs to a bounded interval, where \(\tau\) has bounded derivatives and is therefore locally Lipschitz. This is in contrast with the work of [10], which assumes \(\tau\) is globally Lipschitz, a property that does not hold in general; see [15].

Corollary 1. Under the same hypotheses of Theorem 5 we have that \[\frac{o_{ab}(\boldsymbol{e}_n)}{\rho_n\,n_{ab}(\boldsymbol{e}_n)} \;\in \; (s_{\min}- \delta, s_{\max}+\delta)\] simultaneously for all \(a,b \in [k]\) and \(\boldsymbol{e}_n\in [k]^n\) satisfying \(n_a(\boldsymbol{e}_n)\geq \alpha n\) for all \(a\), eventually almost surely as \(n\to\infty\).

Proof. By Theorem 5 we have that \[\label{evento} \Bigl| \,\widehat P_{ab}(\boldsymbol{A}_{n\times n}|\boldsymbol{e}_n) - [P_{R(\boldsymbol{e}_n)}]_{ab}\, \Bigr| \;<\; \delta\rho_n\,,\tag{27}\] simultaneously for all \(a,b\in [k]\) and \(\boldsymbol{e}_n\in[k]^n\) such that \(n_a(\boldsymbol{e}_n)\geq \alpha n\), eventually almost surely as \(n\to\infty\). By hypothesis we have that \[\begin{align} n_{ab}(\boldsymbol{e}_n) [P_{R(\boldsymbol{e}_n,\boldsymbol{z}_n)}]_{ab} &= \mathbb{E}(o_{ab}(\boldsymbol{e}_n)\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \\ & = \sum\limits_{a',b'} P_{a'b'} \sum\limits_{1 \leq i,j \leq n}\mathbb{1}\{z_i=a',z_j=b'\}\mathbb{1}\{e_i=a,e_j=b\}\\ & \geq \rho_n\,s_{\min}\,n_{ab}(\boldsymbol{e}_n) \,. \end{align}\] Analogously, \[n_{ab}(\boldsymbol{e}_n) [P_{R(\boldsymbol{e}_n,\boldsymbol{z}_n)}]_{ab}\; \leq\; \rho_n\,s_{\max}\,n_{ab}(\boldsymbol{e}_n) \,.\] Then under the hypotheses of Theorem 5 we have that \[\frac{o_{ab}(\boldsymbol{e}_n)}{\rho_n\,n_{ab}(\boldsymbol{e}_n)} \;\in \; (s_{\min}- \delta,\, s_{\max}+\delta)\] simultaneously for all \(a,b\in [k]\) and \(\boldsymbol{e}_n\in[k]^n\) such that \(n_a(\boldsymbol{e}_n)\geq \alpha n\), eventually almost surely as \(n\to\infty\). ◻

Now, based on Corollary 1, we can perform a Taylor expansion of \(\tau\) and prove the following result.

Theorem 6. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from a SBM with parameters \((\pi, \rho_nS)\), with \(\rho_n\to0\) and \(n\rho_n\to\infty\) as \(n\to\infty\). Then for all \(\alpha < \min_a \pi_a\) and all \(\delta > 0\) we have that \[|X({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n})- X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})| \; <\; \delta\rho_n\] eventually almost surely as \(n\to\infty\), for \({\widehat{\boldsymbol{z}}_n}={\widehat{\boldsymbol{z}}_n}^\mathrm{\small ml}\) and \({\widehat{\boldsymbol{z}}_n}={\widehat{\boldsymbol{z}}_n}^\mathrm{\small icl}\). Moreover, \(\alpha\) can be taken as a decreasing sequence in \(n\), provided \[n\rho_n \;>\; \frac{4s_{\max}}{\alpha^2\delta^2}\] for \(n\) sufficiently large.

Proof. For simplicity in the notation we write \(X({\widehat{\boldsymbol{z}}_n})=X({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n})\), \(X({\boldsymbol{Z}_n})=X({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})\), \(R({\widehat{\boldsymbol{z}}_n},{\boldsymbol{Z}_n})=R({\widehat{\boldsymbol{z}}_n})\) and \(R({\boldsymbol{Z}_n},{\boldsymbol{Z}_n})=R({\boldsymbol{Z}_n})\). By the Taylor expansion of the function \(\tau\) when \(\rho_n\to0\) we have that \[\label{taylor95tau} \tau(\rho_n x) \;=\; \rho_n \gamma(x) + x \rho_n \log\rho_n + O\bigl(\rho_n^2x^2\bigr)\tag{28}\] with \[\gamma(x) \;=\; x\log x - x\,.\] Then \[\label{second95term} \begin{align} \tau\Bigl(\dfrac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{n_{ab}({\widehat{\boldsymbol{z}}_n})}\Bigr) \;&=\; \tau\Bigl(\rho_n \dfrac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{\rho_nn_{ab}({\widehat{\boldsymbol{z}}_n})}\Bigr)\\ &=\; \rho_n \gamma\Bigl(\dfrac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{\rho_nn_{ab}({\widehat{\boldsymbol{z}}_n})}\Bigr) + \dfrac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{\rho_nn_{ab}({\widehat{\boldsymbol{z}}_n})} \rho_n\log \rho_n + O\Bigl( \bigl(\dfrac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{n_{ab}({\widehat{\boldsymbol{z}}_n})}\bigr)^2 \Bigr) \end{align}\tag{29}\] and a similar expression for \(\dfrac{o_{ab}({\boldsymbol{Z}_n})}{n_{ab}({\boldsymbol{Z}_n})}\), \([ P_{R({\widehat{\boldsymbol{z}}_n})}]_{ab}\) and \([ P_{R({\boldsymbol{Z}_n})}]_{ab}\). 
Therefore we have that \[\label{taylor-exp} \begin{align} X({\widehat{\boldsymbol{z}}_n}) - X({\boldsymbol{Z}_n}) \;= \; & \frac{1}{2}\; \Biggl( \; \sum_{a,b\in[k]}\,\frac{ \rho_n n_{ab}({\widehat{\boldsymbol{z}}_n})}{n^2} \Bigl[ \gamma\Bigl(\dfrac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{\rho_n n_{ab}({\widehat{\boldsymbol{z}}_n})}\Bigr) - \gamma\bigl( [ P_{R({\widehat{\boldsymbol{z}}_n})}]_{ab}/\rho_n \bigr)\Bigr] \\ &- \sum_{a,b\in[k]}\,\frac{ \rho_n n_{ab}({\boldsymbol{Z}_n})}{n^2} \Bigl[ \gamma\Bigl(\dfrac{o_{ab}({\boldsymbol{Z}_n})}{\rho_nn_{ab}({\boldsymbol{Z}_n})}\Bigr) - \gamma\bigl( [ P_{R({\boldsymbol{Z}_n})}]_{ab}/\rho_n \bigr)\Bigr]\Biggr)\\ & + O\biggl( \Bigl(\dfrac{ko_{ab}({\widehat{\boldsymbol{z}}_n})}{n_{ab}({\widehat{\boldsymbol{z}}_n})}\Bigr)^2 \biggr) + O((k\rho_n)^2) \end{align}\tag{30}\] as the second term in 29 cancels in the difference \(X({\widehat{\boldsymbol{z}}_n}) - X({\boldsymbol{Z}_n})\), because \[\sum_{a,b} o_{ab}({\widehat{\boldsymbol{z}}_n}) \;=\; \sum_{a,b} o_{ab}({\boldsymbol{Z}_n})\] and \[\sum_{a,b} R({\widehat{\boldsymbol{z}}_n})PR({\widehat{\boldsymbol{z}}_n})^T \;=\; \sum_{a,b} R({\boldsymbol{Z}_n})PR({\boldsymbol{Z}_n})^T\,.\] Now, the function \(\gamma\) has bounded derivatives in any bounded interval, and we know by Corollary 1 that \[\frac{o_{ab}(\boldsymbol{e}_n)}{\rho_n\,n_{ab}(\boldsymbol{e}_n)} \;\in \; \Bigl[ \frac{s_{\min}}{2}, \frac{3s_{\max}}{2}\Bigr]\] simultaneously for all \(a,b \in [k]\) and \(\boldsymbol{e}_n\in [k]^n\) satisfying \(n_a(\boldsymbol{e}_n)\geq \alpha n\) for all \(a\), eventually almost surely as \(n\to\infty\). 
Call \(M\) the upper bound on the absolute value of the derivative of \(\gamma\) on this interval, then \[\begin{align} |X({\widehat{\boldsymbol{z}}_n})- X({\boldsymbol{Z}_n})| \;\leq\; &\;\frac{1}{2} M\Biggl( \sum_{1\leq a, b\leq k}\; \Bigl|\frac{o_{ab}({\widehat{\boldsymbol{z}}_n})}{n^2} -[R({\widehat{\boldsymbol{z}}_n})PR({\widehat{\boldsymbol{z}}_n})^T]_{ab} \Bigr| \\ &+ \sum_{1\leq a, b\leq k}\; \Bigl|\frac{o_{ab}({\boldsymbol{Z}_n})}{n^2} -[R({\boldsymbol{Z}_n})PR({\boldsymbol{Z}_n})^T]_{ab} \Bigr|\Biggr)\\ &+ O((k\rho_n)^2)\,. \end{align}\] Observe that for all \(\alpha<\min_a \pi_a\) we must have \({\boldsymbol{Z}_n}\) satisfying \(n_a({\boldsymbol{Z}_n})\geq \alpha n\) eventually almost surely as \(n\to\infty\). Then by 24 in Theorem 5, for all \(0<\delta'<s_{\min}/2\), the right-hand side can be bounded above by \[2\delta' Mk^2\rho_n\] eventually almost surely as \(n\to\infty\). Moreover, \(\alpha\) and \(\delta'\) can decrease to zero as \(n\to\infty\), with appropriate rates. Then taking \(\delta' < \delta/(2Mk^2)\), we finish the proof of Theorem 6. ◻
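To make the expansion 28 concrete: writing \(\tau\) for the Bernoulli entropy-type function \(\tau(y)=y\log y + (1-y)\log(1-y)\) (our reading of the definition in 2), the remainder is \((1-\rho_n x)\log(1-\rho_n x)+\rho_n x = O(\rho_n^2x^2)\), which the sketch below checks numerically.

```python
from math import log

def tau(y):
    # assumed form of the paper's tau: Bernoulli entropy-type function
    return y * log(y) + (1 - y) * log(1 - y)

def gam(x):
    # gamma(x) = x*log(x) - x, the leading term of the expansion
    return x * log(x) - x

rho = 1e-3
ok = True
for x in [0.1 * i for i in range(1, 10)]:
    remainder = tau(rho * x) - rho * gam(x) - x * rho * log(rho)
    # remainder equals (1 - rho*x)*log(1 - rho*x) + rho*x, of order (rho*x)^2/2
    ok = ok and 0.0 <= remainder <= (rho * x) ** 2
print(ok)
```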

7.3 Auxiliary results for strong consistency↩︎

We now prove a concentration result for the difference in the number of edges in the network corresponding to two different labelings of the vertices.

Proposition 7. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from a SBM with parameters \((\pi, \rho_nS)\). For \(\boldsymbol{z}_n, \boldsymbol{e}_n\in [k]^n\) and \(a,b\in [k]\), define \[W_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n) = \dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{e}_n))}{n^2}- \dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}+ \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{z}_n))}{n^2} \,.\] Then \[\begin{align} \mathbb{P}\Bigl(\, & |W_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n)| \; >\; x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\, \Bigr) \;\leq\;2\exp\bigl(- xn^2 + 6\rho_n s_{\max}m(\boldsymbol{e}_n,\boldsymbol{z}_n)n\bigr)\,. \end{align}\] Moreover, for \(\rho_n\geq \log n/n\), there exists a constant \(c>0\) such that, given \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\) \[|W_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n)|\; \leq\; \frac{c \rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}\] simultaneously for all \(\boldsymbol{e}_n\in[k]^n\).

Proof of Proposition 7. Let \[Y^{ab} = \sum_{1\leq i<j\leq n} Y^{ab}_{ij}\] with \[\begin{align} Y_{ij}^{ab} = \frac{1}{n^2}&\Bigl( \mathbb{1}\{e_i=a,e_j=b\}+ \mathbb{1}\{e_i=b,e_j=a\} \\ &- \mathbb{1}\{z_i=a,z_j=b\} - \mathbb{1}\{z_i=b,z_j=a\} \Bigr)A_{ij}\,. \end{align}\] Conditioned on \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\), the variables \(Y_{ij}^{ab}\), for \(1\leq i<j\leq n\), are independent. Then applying Chernoff’s bound we obtain that \[\begin{align} \mathbb{P}( \,Y^{ab} - \mathbb{E}(Y^{ab}) > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;&\leq\; \inf_{t>0} \;\exp\bigl(- xt \bigr)\prod_{1\leq i<j\leq n}\mathbb{E}\bigl( \exp[ t(Y_{ij}^{ab}-\mathbb{E}(Y_{ij}^{ab}))]\bigr)\,. \end{align}\] Let us analyze the expected values. Define \[\begin{align} \mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n) \;=\; & \mathbb{1}\{e_i=a,e_j=b\}+ \mathbb{1}\{e_i=b,e_j=a\} \\ &- \mathbb{1}\{z_i=a,z_j=b\} - \mathbb{1}\{z_i=b,z_j=a\}\,. \end{align}\] We have \[\mathbb{E}(Y_{ij}^{ab}) = \frac{\rho_nS_{z_iz_j}\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n^2}\,.\] On the other hand, \[\begin{align} \mathbb{E}(\exp(tY_{ij}^{ab})) \;&=\; \rho_nS_{z_iz_j} \exp\Bigl(\frac{t\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n^2}\Bigr)+ 1 - \rho_nS_{z_iz_j}\\ &=\; 1 + \rho_nS_{z_iz_j} \Bigl[ \exp\Bigl(\frac{t\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n^2}\Bigr) - 1\Bigr]\\ &\leq \; \exp\Bigl( \rho_nS_{z_iz_j} \Bigl[\exp\Bigl(\frac{t\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n^2}\Bigr) - 1\Bigr]\Bigr)\,. \end{align}\] Then \[\begin{align} \mathbb{E}\bigl(\exp[&t(Y_{ij}^{ab}- \mathbb{E}(Y_{ij}^{ab}))]\bigr) \;\leq\;\exp\Bigl( \rho_nS_{z_iz_j} L\Bigl( \frac{t\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n^2}\Bigr) \Bigr) \end{align}\] with \(L(x)=e^x-x-1\). 
Then, taking \(t=n^2\) and using that \(e^x-x-1\leq 3|x|\) for \(x\in[-2,2]\), we have that \[\begin{align} \mathbb{P}( &Y^{ab} - \mathbb{E}(Y^{ab}) > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;\\ &\leq\; \exp \bigl(- xn^2\bigr)\prod_{1\leq i<j\leq n}\exp\Bigl[3\rho_nS_{z_iz_j}|\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)|\Bigr]\\ &=\; \exp \bigl(- xn^2\bigr)\exp\Bigl[3\rho_n \sum_{1\leq i<j\leq n}S_{z_iz_j} |\mathbb{1}_{ij}(\boldsymbol{e}_n,\boldsymbol{z}_n)|\Bigr]\\ &\leq\; \exp \bigl(- xn^2 + 6\rho_n s_{\max}m(\boldsymbol{e}_n,\boldsymbol{z}_n)n \bigr)\,. \end{align}\] The same bound holds for left deviations. Therefore \[\mathbb{P}( |W_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n)| > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;\leq\; 2\exp\bigl(- xn^2 + 6\rho_n s_{\max}m(\boldsymbol{e}_n,\boldsymbol{z}_n)n\bigr)\,.\] To prove the second part of the proposition, we use a union bound over \(m\) and over the \(\boldsymbol{e}_n\) with \(m(\boldsymbol{e}_n,\boldsymbol{z}_n)=m\), together with the usual bound \[\binom{n}{m} \;\leq \; \biggl(\frac{en}{m} \biggr)^m\] to obtain that \[\begin{align} \mathbb{P}( \; \sup_{m,\boldsymbol{e}_n} |W_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n)| > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;&\leq\; 2\sum_{m=1}^n \binom{n}{m} (k-1)^m \exp\bigl(- xn^2 + 6\rho_n s_{\max}mn\bigr)\\ &\leq\; 2\sum_{m=1}^n\exp\bigl(- xn^2 + 6\rho_n s_{\max}mn+ m\log en\bigr)\,. \end{align}\] Then, for \(x = c\rho_n m/n\) and \(\rho_n\geq \log n/n\), we can bound the probability above by \[2\sum_{m=1}^n\exp( -m(c- 6s_{\max}-1)\log n)\,.\] This bound goes to zero as \(n\to\infty\) if \(c\) is sufficiently large, namely \(c> 6s_{\max}+1\). ◻
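The elementary inequalities invoked in this proof can be checked numerically. The following Python snippet is a sanity check only, not part of the proof: it verifies \(e^x-x-1\leq 3|x|\) on \([-2,2]\), the bound \(1+y\leq e^y\) used on the conditional moment generating function, and the counting estimate \(\binom{n}{m}\leq (en/m)^m\) used in the union bound.

```python
import math

# L(x) = e^x - x - 1 satisfies L(x) <= 3|x| on [-2, 2]
for i in range(-200, 201):
    x = i / 100
    assert math.exp(x) - x - 1 <= 3 * abs(x) + 1e-12

# 1 + y <= e^y, applied with y = p(e^u - 1) to bound the MGF of a Bernoulli term
for p in [0.0, 0.1, 0.5, 1.0]:
    for u in [-1.5, -0.3, 0.0, 0.7, 2.0]:
        y = p * (math.exp(u) - 1)
        assert 1 + y <= math.exp(y) + 1e-12

# binom(n, m) <= (e*n/m)^m, the union-bound counting estimate
for n in [10, 50, 200]:
    for m in range(1, n + 1):
        assert math.comb(n, m) <= (math.e * n / m) ** m

print("all elementary inequalities verified on the grid")
```

The grids above are arbitrary; the inequalities themselves hold on the full stated ranges.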

Corollary 2. For \(\rho_n \geq \log n/n\), there exist constants \(c_i >0\), \(i=1,2,3\), such that, given \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\), we have

  1. \(\Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} - \dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr| \;\leq\; \displaystyle\frac{c_1\rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}\)

  2. \(\Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)} - \dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\Bigr| \;\leq\; \displaystyle \frac{c_2\rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)n}{n_{ab}(\boldsymbol{z}_n)-2nm(\boldsymbol{e}_n,\boldsymbol{z}_n)}\)

  3. \(\Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{e}_n))}{n^2}\Bigr| \;\leq\; \displaystyle \frac{\sqrt{\delta\rho_n\log n}}{n} +\frac{c_3\rho_nm(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}\)

simultaneously for all \(\boldsymbol{e}_n\in[k]^n\), where \(\delta\) is an arbitrarily small positive constant.

Proof. The first inequality follows directly from Proposition 7 and the fact that \[\Bigl| \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{e}_n))}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{z}_n))}{n^2}\Bigr| \;\leq\; \frac{2s_{\max}\rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n} \,.\] To prove 2, observe that \[\begin{align} \Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)} - \dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\Bigr| &\leq \frac{1}{n_{ab}(\boldsymbol{e}_n)}\Bigl| o_{ab}(\boldsymbol{e}_n)- o_{ab}(\boldsymbol{z}_n)\Bigr| + \frac{ o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)n_{ab}(\boldsymbol{e}_n)}\Bigl| n_{ab}(\boldsymbol{z}_n)- n_{ab}(\boldsymbol{e}_n)\Bigr|\,. \end{align}\] By Corollary 1 and part 1, we can bound the right-hand side of the inequality above by \[\begin{align} \frac{1}{n_{ab}(\boldsymbol{e}_n)}& \bigl(\bigl| o_{ab}(\boldsymbol{e}_n)- o_{ab}(\boldsymbol{z}_n)\bigr| + 2\rho_n s_{\max}\bigl| n_{ab}(\boldsymbol{z}_n)- n_{ab}(\boldsymbol{e}_n)\bigr|\bigr)\\ &\leq \; \frac{1}{n_{ab}(\boldsymbol{z}_n)-2nm(\boldsymbol{e}_n,\boldsymbol{z}_n)}\bigl( c_1 \rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)n + 4s_{\max}\rho_n m (\boldsymbol{e}_n,\boldsymbol{z}_n)n\bigr)\,. \end{align}\] Finally, to prove 3, we use Proposition 7 and Theorem 5 to obtain \[\begin{align} \Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{e}_n))}{n^2}\Bigr| \;&\leq\; |W_{ab}(\boldsymbol{e}_n,\boldsymbol{z}_n)| + \Bigl| \dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{z}_n))}{n^2}\Bigr|\\ &\leq\; \frac{c_3\rho_nm(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}+ \frac{\sqrt{\delta\rho_n\log n}}{n} \end{align}\] simultaneously for all \(m=1,\dots, n\) and all \(\boldsymbol{e}_n\in[k]^n\) such that \(n_a(\boldsymbol{e}_n)\geq \alpha n\) for all \(a\). ◻
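The ratio-splitting step in the proof of part 2 is the elementary identity \(\frac{A}{B}-\frac{C}{D} = \frac{A-C}{B} + \frac{C(D-B)}{BD}\), applied with \(A=o_{ab}(\boldsymbol{e}_n)\), \(B=n_{ab}(\boldsymbol{e}_n)\), \(C=o_{ab}(\boldsymbol{z}_n)\), \(D=n_{ab}(\boldsymbol{z}_n)\). A quick numerical check of the resulting triangle-inequality bound on random positive inputs (illustrative only):

```python
import random

random.seed(0)
for _ in range(1000):
    # A, C play the role of edge counts; B, D the role of pair counts
    A, C = random.uniform(0, 10), random.uniform(0, 10)
    B, D = random.uniform(0.1, 10), random.uniform(0.1, 10)
    lhs = abs(A / B - C / D)
    # |A/B - C/D| <= |A-C|/B + (C/(B*D)) |D-B|
    rhs = abs(A - C) / B + (C / (B * D)) * abs(D - B)
    assert lhs <= rhs + 1e-9

print("ratio-difference bound verified on 1000 random instances")
```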

In the following proposition, we use a Chernoff bound for a specific two-valued discrete distribution to obtain an optimal concentration result.

Proposition 8. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from an SBM with parameters \((\pi, \rho_nS)\), with \(S>0\) fixed and \(\rho_n\to0\) as \(n\to\infty\). Then, for all \(\boldsymbol{z}_n, \boldsymbol{e}_n\in [k]^n\) and for \[W(\boldsymbol{e}_n,\boldsymbol{z}_n) = \frac{1}{2}\sum_{1\leq a, b\leq k}\, \Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{e}_n))}{n^2}- \dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}+ \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{z}_n))}{n^2} \Bigr)\log S_{ab}\] we have that \[\begin{align} \mathbb{P}\Bigl(\, & W(\boldsymbol{e}_n,\boldsymbol{z}_n) \; >\; x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\, \Bigr) \;\leq\; \exp\Bigl[ - \sup_{t\in (0,1]} \Bigl\{ n^2 xt- \rho_n n^2 C_t(R,S)\Bigr\} + D\rho_nm^2\Bigr]\,, \end{align}\] with \(m=m(\boldsymbol{e}_n,\boldsymbol{z}_n)\), \(R=R(\boldsymbol{e}_n,\boldsymbol{z}_n)\), \[\label{tildeC} C_t(R,S) = \sum_{a}\sum_{b\neq b'} [R^T\mathbf{1}]_{a} K_t(S_{ab} \| S_{ab'})R_{b'b} \,,\qquad{(1)}\] \(K_t(p\|q) = p^{1-t}q^t + t p \log (p/q) - p\) and \(D\) a constant depending only on \(S\).

Proof. Observe that we can write the sum inside the probability above as \[\begin{align} W = \frac{1}{2n^2} \sum_{1\leq a, b\leq k}\,\,\sum_{1\leq i,j\leq n} [ Y_{ij}^{ab} - \mathbb{E}(Y_{ij}^{ab}) ] \end{align}\] with \[Y_{ij}^{ab} = \log S_{ab}\Bigl( \mathbb{1}\{e_i=a,e_j=b\}A_{ij}\, - \mathbb{1}\{z_i=a,z_j=b\}A_{ij} \Bigr)\,.\] Setting \[Y_{ij} = \sum_{1\leq a, b\leq k}Y_{ij}^{ab}\,,\] we can write \[Y_{ij} = \log(S_{e_ie_j}/S_{z_iz_j}) A_{ij}\,.\] Conditioned on \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\), the variables \(Y_{ij}\), for \(1\leq i<j\leq n\), are independent, \(Y_{ji} = Y_{ij}\) and \(Y_{ii}=0\) for all \(i=1,\dots,n\). Then applying Chernoff’s bound, we obtain that \[\begin{align} \mathbb{P}( \,W(\boldsymbol{e}_n,\boldsymbol{z}_n) > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n)\;&=\; \mathbb{P}\Bigl( \,\sum_{1\leq i<j\leq n} Y_{ij} - \mathbb{E}(Y_{ij}) > n^2x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\Bigr) \\ &\leq\; \inf_{t\in (0,1]} \;\Bigl\{ \exp\bigl(- n^2xt \bigr)\prod_{1\leq i<j\leq n}\mathbb{E}\bigl( \exp[ t(Y_{ij}-\mathbb{E}(Y_{ij}))]\bigr)\Bigr\} \end{align}\] where the last expectation is taken given \({\boldsymbol{Z}_n}=\boldsymbol{z}_n\). A straightforward calculation shows that \[\begin{align} \mathbb{E}(\exp[t (Y_{ij} - \mathbb{E}(Y_{ij}))]) \;&\leq\; \exp\Bigl[ \rho_nS_{z_iz_j} L\Bigl(t\log \frac{S_{e_ie_j}}{S_{z_iz_j}}\Bigr)\Bigr]\,, \end{align}\] with \(L(x) = e^x-x-1\). Therefore \[\begin{align} \mathbb{P}( \,W(\boldsymbol{e}_n,\boldsymbol{z}_n) > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;&\leq\; \;\inf_{t\in (0,1]} \Bigl\{ \;\exp\Bigl( - n^2x t + \sum_{1\leq i<j\leq n} \rho_nS_{z_iz_j} L\Bigl(t\log \frac{S_{e_ie_j}}{S_{z_iz_j}}\Bigr)\Bigr)\Bigr\}\\ &= \;\exp\Bigl( - \sup_{t\in(0,1]} \; \Bigl\{ n^2x t - \rho_n\sum_{1\leq i<j\leq n} S_{z_iz_j} L\Bigl(t\log \frac{S_{e_ie_j}}{S_{z_iz_j}}\Bigr)\Bigr\}\Bigr)\,. 
\end{align}\] Observe that for all \(1\leq i<j\leq n\) we have \[S_{z_iz_j} L\Bigl(t\log \frac{S_{e_ie_j}}{S_{z_iz_j}}\Bigr) \;=\; S_{z_iz_j}^{1-t}S_{e_ie_j}^t+tS_{z_iz_j}\log \frac{S_{z_iz_j}}{S_{e_ie_j}}- S_{z_iz_j}\,.\] Then define, for \(t \in [0,1]\) and \(p,q>0\), the function \[\label{Kt} K_t(p \|q) \;=\; p^{1-t} q^{t} + tp\log (p/q) - p.\tag{31}\] Using that the matrix \(S\) is symmetric and denoting \(R=R(\boldsymbol{e}_n,\boldsymbol{z}_n)\), we can write \[\begin{align} \sum_{1\leq i<j\leq n}K_t(S_{z_iz_j} \| S_{e_ie_j})\;&=\; \frac{n^2}{2}\;\sum_{a',a}\sum_{b',b} R_{a'a}K_t(S_{ab} \| S_{a'b'})R_{b'b}\\ &=\; n^2\;\sum_{b\neq b'}\sum_{a} R_{aa}K_t(S_{ab} \| S_{ab'})R_{b'b} \\ &\quad + \frac{n^2}{2}\;\sum_{a\neq a'}\sum_{b\neq b'} R_{a'a}K_t(S_{ab} \| S_{a'b'})R_{b'b}\\ &\leq\;n^2\; \sum_{a} \sum_{b\neq b'} [R^T\mathbf{1}]_{a} K_t(S_{ab} \| S_{ab'})R_{b'b} + Dm^2(\boldsymbol{e}_n,\boldsymbol{z}_n)\,, \end{align}\] with \[\label{D} D = \max_{t\in[0,1]}\max\limits_{a\neq a',b\neq b'} \{ K_t(S_{ab} \| S_{a'b'}) \}\,,\tag{32}\] observing that \(K_0(p\|q) = 0\) for all \(p,q>0\). Then \[\mathbb{P}\Bigl(W(\boldsymbol{e}_n,\boldsymbol{z}_n) >x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\Bigr) \;\leq\; \exp\Bigl[ - \sup_{t\in (0,1]} \Bigl\{n^2 xt- \rho_n n^2 C_t(R,S)\Bigr\} + D\rho_nm^2 \Bigr]\] where \(C_t(R,S)\) is given by (1) and \(D\) is a constant depending only on \(S\). ◻
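The function \(K_t\) in 31 behaves like a tilted divergence: \(K_0(p\|q)=0\), \(K_t(p\|q)\geq 0\) for \(t\in[0,1]\), and \(K_1(p\|q)=q-p+p\log(p/q)\) is the usual Poisson-type divergence. The following short numerical sketch (not part of the proof) checks these three properties on a small grid:

```python
import math

def K(t, p, q):
    # K_t(p||q) = p^{1-t} q^t + t p log(p/q) - p, as in (31)
    return p ** (1 - t) * q ** t + t * p * math.log(p / q) - p

ps = [0.2, 0.5, 0.9, 1.3]
qs = [0.1, 0.4, 0.8, 2.0]
ts = [i / 20 for i in range(21)]
for p in ps:
    for q in qs:
        assert abs(K(0.0, p, q)) < 1e-12            # K_0 vanishes identically
        for t in ts:
            assert K(t, p, q) >= -1e-12             # nonnegative on t in [0,1]
        # K_1 is the divergence q - p + p log(p/q)
        assert abs(K(1.0, p, q) - (q - p + p * math.log(p / q))) < 1e-12

print("K_t properties verified on the grid")
```

Nonnegativity follows because \(t\mapsto K_t(p\|q)\) is nondecreasing with \(K_0=0\); the grid values above are arbitrary.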

The following result is the basis for proving the strong consistency of the maximum and integrated conditional likelihood estimators.

Theorem 9. Let \(({\boldsymbol{Z}_n},\boldsymbol{A}_{n\times n})\) be generated from an SBM with parameters \((\pi, \rho_nS)\), with \(S>0\) fixed, \(\rho_n \geq \log n/n\) and \(\rho_n\to0\) as \(n\to\infty\). Then for all \(\boldsymbol{z}_n\in [k]^n\) such that \(n_{a}(\boldsymbol{z}_n)\geq \frac{\pi_a}{2} n\) for all \(a\), and all \(\boldsymbol{e}_n\in[k]^n\) with \(m(\boldsymbol{z}_n,\boldsymbol{e}_n) \leq \frac{1}{16}\min_a \pi_a n\), we have that \[\begin{align} \mathbb{P}( X(\boldsymbol{e}_n) - &\,X(\boldsymbol{z}_n) > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \;\leq\; \\[2mm] &\exp\Bigl[ - \sup_{t\in(0,1]} \Bigl\{ x n^2t - \rho_n n^2 C_t(R,S)\Bigr\} +F\widetilde{\phi}(n,m)\Bigr]\,, \end{align}\] with \(m=m(\boldsymbol{e}_n,\boldsymbol{z}_n)\), \(C_t(R,S)\) given by (1), \[\label{phin2} \widetilde{\phi}(n,m) = \rho_nm^2 + \rho_n^2mn + \delta \rho_n (n+ m\sqrt{n})\,,\tag{33}\] \(F\) a positive constant depending only on \(S\), and \(\delta\) an arbitrarily small positive constant.

Proof. Observe that \[\begin{align} X(\boldsymbol{e}_n)& - X(\boldsymbol{z}_n)\;= \; \frac{1}{2}\Biggl(\sum_{1\leq a, b\leq k}\, \dfrac{n_{ab}(\boldsymbol{e}_n)}{n^2} \Bigl[\tau\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr) - \tau\Bigl([ P_{R(\boldsymbol{e}_n)}]_{ab}\Bigr) \Bigr]\\ &\qquad\qquad\qquad\qquad\qquad + \dfrac{n_{ab}(\boldsymbol{z}_n)}{n^2} \Bigl[\tau\Bigl([ P_{R(\boldsymbol{z}_n)}]_{ab}\Bigr) - \tau\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{ n_{ab}(\boldsymbol{z}_n)}\Bigr) \Bigr]\Biggl)\,, \end{align}\] with \[\tau(x) = x\log x + (1-x) \log(1-x)\,.\] By the Gibbs inequality, for \(x,y\in (0,1)\) we have \[\tau(x) \geq x\log y + (1-x) \log(1-y)\,.\] Applying this inequality with \(x=[ P_{R(\boldsymbol{e}_n)}]_{ab}\) and \(y=\dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\), we obtain \[\tau\Bigl([ P_{R(\boldsymbol{e}_n)}]_{ab}\Bigr) \;\geq\; [ P_{R(\boldsymbol{e}_n)}]_{ab}\log \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)} + \Bigl(1 - [ P_{R(\boldsymbol{e}_n)}]_{ab}\Bigr)\log \Bigl(1-\dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr)\,.\] Applying it with \(x=\dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\) and \(y= \rho_n S_{ab}\), we obtain \[\tau\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\Bigr) \;\geq\; \dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\log\rho_n S_{ab} + \Bigl(1 - \dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)}\Bigr)\log \Bigl(1-\rho_n S_{ab}\Bigr)\,.\] We also have that \[[ P_{R(\boldsymbol{z}_n)}]_{ab} = \rho_nS_{ab}\,.\] Then we obtain the bound \[\label{XX} \begin{align} X(\boldsymbol{e}_n) - X(\boldsymbol{z}_n)\;\leq\;& \frac{1}{2}\sum_{1\leq a, b\leq k}\, \Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}- \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr) \Bigr) \log \Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr) \\ &+ \Bigl(\mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr) - 
\frac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr) \log\Bigl( 1 - \frac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr)\\ &+ \Bigl(\mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr)- \dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr) \log \rho_nS_{ab}\\ &+ \Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2} - \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr)\Bigr)\log ( 1- \rho_nS_{ab})\,, \end{align}\tag{34}\] where, by an abuse of notation, we write \(\mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr) = \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\Bigr)\). Observe that we can write \[\begin{align} \log \Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr) \;=\; \log \Bigl[ \rho_nS_{ab} \Bigl(1 + \frac{ o_{ab}(\boldsymbol{e}_n)/n_{ab}(\boldsymbol{e}_n) - \rho_nS_{ab}}{\rho_nS_{ab}}\Bigr)\Bigr] \end{align}\] and similarly \[\begin{align} \log \Bigl(1 - \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)}\Bigr) \;=\; \log \Bigl[ \bigl(1-\rho_nS_{ab}\bigr) \Bigl(1 + \frac{ \rho_nS_{ab} - o_{ab}(\boldsymbol{e}_n)/n_{ab}(\boldsymbol{e}_n)}{1-\rho_nS_{ab}}\Bigr)\Bigr] \end{align}\] Then as \(\log(1+x) \leq |x|\) for all \(x> -1\), assuming \(\rho_nS_{ab} < 1-\rho_nS_{ab}\) for all \(a,b\), the difference \(X(\boldsymbol{e}_n) - X(\boldsymbol{z}_n)\) in 34 can be upper bounded by \[\label{XXX} \begin{align} \frac{1}{2}\sum_{1\leq a, b\leq k}\, &\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}- \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr) - \frac{o_{ab}(\boldsymbol{z}_n)}{n^2} + \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr)\Bigr) \log S_{ab}\\ &+ \frac{1}{2}\sum_{1\leq a, b\leq k} \; \Bigl|\mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr)- \dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} + \frac{o_{ab}(\boldsymbol{z}_n)}{n^2} - \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr)\Bigr| \rho_nS_{ab} \\ & + \frac{1}{2}\sum_{1\leq a, b\leq k} \frac{1}{S_{ab} 
\rho_n} \; \Bigl|\mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr) - \frac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr|\, \Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)} - \rho_nS_{ab}\Bigr|\,. \end{align}\tag{35}\] By Proposition 7, we have \[\max_{ab} \; \Bigl|\mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2}\Bigr)- \dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} + \frac{o_{ab}(\boldsymbol{z}_n)}{n^2} - \mathbb{E}\Bigl(\dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}\Bigr)\Bigr| \;\leq\; \frac{c\rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}\] with \(c\) a constant depending only on \(S\). On the other hand, by the triangle inequality, Corollary 2 and Theorem 5, we have that \[\begin{align} \Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)} - \rho_nS_{ab}\Bigr| \;&\leq\; \Bigl| \dfrac{o_{ab}(\boldsymbol{e}_n)}{n_{ab}(\boldsymbol{e}_n)} -\dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)} \Bigr| + \Bigl| \dfrac{o_{ab}(\boldsymbol{z}_n)}{n_{ab}(\boldsymbol{z}_n)} - \rho_nS_{ab}\Bigr|\\ &\leq\; \frac{c_2\rho_n m(\boldsymbol{e}_n,\boldsymbol{z}_n)n}{n_{ab}(\boldsymbol{z}_n)-2nm(\boldsymbol{e}_n,\boldsymbol{z}_n)}+ \frac{\sqrt{\delta\rho_n\log n}}{n}\\ &\leq\; \rho_n \biggl(\frac{c_2m(\boldsymbol{e}_n,\boldsymbol{z}_n)n}{n_{ab}(\boldsymbol{z}_n)-2nm(\boldsymbol{e}_n,\boldsymbol{z}_n)}+ \sqrt{\frac{\delta\log n}{\rho_nn^2}}\biggr)\\ &\leq\; \rho_n \biggl(\frac{c m(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}+ \sqrt{\frac{\delta\log n}{\rho_nn^2}}\biggr)\,. 
\end{align}\] Then, the last two terms of 35 can be bounded above by \(E\phi(n,m)\), for \(E>0\) a constant depending only on \(S\) and \[\label{phin} \phi(n,m) = \rho_n\biggl( \frac{cm(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}+ \sqrt{\frac{\delta\log n}{\rho_nn^2}}\biggr)^2+ \rho_n^2\frac{m(\boldsymbol{e}_n,\boldsymbol{z}_n)}{n}\,.\tag{36}\] Therefore \[\begin{align} \mathbb{P}\Bigl( X(\boldsymbol{e}_n) - X(\boldsymbol{z}_n) > x\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n\Bigr) \;\leq\; \mathbb{P}\Bigl(W(\boldsymbol{e}_n,\boldsymbol{z}_n) > x - E \phi(n,m)\,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n \Bigr) \end{align}\] with \[W(\boldsymbol{e}_n,\boldsymbol{z}_n) = \frac{1}{2}\sum_{1\leq a, b\leq k}\, \Bigl(\dfrac{o_{ab}(\boldsymbol{e}_n)}{n^2} - \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{e}_n))}{n^2}- \dfrac{o_{ab}(\boldsymbol{z}_n)}{n^2}+ \dfrac{\mathbb{E}(o_{ab}(\boldsymbol{z}_n))}{n^2} \Bigr)\log S_{ab}\,.\] Then, by Proposition 8, we have that \[\begin{align} \mathbb{P}\bigl( W&(\boldsymbol{e}_n,\boldsymbol{z}_n)\; > x - E \phi(n,m) \,|\,{\boldsymbol{Z}_n}=\boldsymbol{z}_n) \\[2mm] &\leq\; \exp\Bigl[ - \sup_{t\in (0,1]} \Bigl\{ n^2 xt- \rho_n n^2 C_t(R,S)\Bigr\} + D\rho_nm^2 + En^2 \phi(n,m) \Bigr]\,, \end{align}\] with \(D\) and \(E\) constants depending only on \(S\). Observing that \(D\rho_nm^2 + En^2\phi(n,m) \leq F\widetilde{\phi}(n,m)\), with \(\widetilde{\phi}(n,m)\) given by 33 and \(F\) a constant depending only on \(S\), the result follows. ◻
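The proof above rests on two elementary facts: the Gibbs inequality \(\tau(x)\geq x\log y + (1-x)\log(1-y)\) for \(x,y\in(0,1)\), with equality at \(y=x\), and the bound \(\log(1+x)\leq|x|\) for \(x>-1\). A grid-based numerical check of both (illustrative only, not part of the proof):

```python
import math

def tau(x):
    # tau(x) = x log x + (1-x) log(1-x), the negative Bernoulli entropy
    return x * math.log(x) + (1 - x) * math.log(1 - x)

grid = [i / 50 for i in range(1, 50)]
for x in grid:
    for y in grid:
        # Gibbs: tau(x) >= x log y + (1-x) log(1-y)
        assert tau(x) >= x * math.log(y) + (1 - x) * math.log(1 - y) - 1e-12

# log(1+x) <= |x| for x > -1
for i in range(-99, 300):
    x = i / 100
    assert math.log(1 + x) <= abs(x) + 1e-12

print("Gibbs and logarithm inequalities verified on the grid")
```

The Gibbs inequality is just nonnegativity of the Kullback-Leibler divergence between Bernoulli\((x)\) and Bernoulli\((y)\), which is why equality holds exactly at \(y=x\).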

References↩︎

[1]
Holland, Paul W, Laskey, Kathryn Blackmond, & Leinhardt, Samuel. 1983. Stochastic blockmodels: First steps. Social networks, 5(2), 109–137.
[2]
Karrer, Brian, & Newman, Mark EJ. 2011. Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.
[3]
Gao, Chao, Ma, Zongming, Zhang, Anderson Y., & Zhou, Harrison H. 2018. Community detection in degree-corrected block models. The Annals of Statistics, 46(5), 2153–2185.
[4]
Cerqueira, Andressa, Gallo, Sandro, Leonardi, Florencia, & Vera, Cristel. 2024. Consistent model selection for the Degree Corrected Stochastic Blockmodel. ALEA, Lat. Am. J. Probab. Math. Stat, 21, 267–292.
[5]
Latouche, Pierre, Birmelé, Etienne, Ambroise, Christophe, et al. 2011. Overlapping stochastic block models with application to the french political blogosphere. The Annals of Applied Statistics, 5(1), 309–336.
[6]
Airoldi, Edoardo M., Blei, David M., Fienberg, Stephen E., & Xing, Eric P. 2008. Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9(65), 1981–2014.
[7]
Xu, Kevin S., & Hero, Alfred O. 2014. Dynamic Stochastic Blockmodels for Time-Evolving Social Networks. IEEE Journal of Selected Topics in Signal Processing, 8(4), 552–562.
[8]
Matias, Catherine, & Miele, Vincent. 2017. Statistical clustering of temporal networks through a dynamic stochastic block model. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 79(4), 1119–1141.
[9]
Girvan, M., & Newman, M. E. J. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
[10]
Bickel, Peter J, & Chen, Aiyou. 2009. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50), 21068–21073.
[11]
Amini, Arash A, Chen, Aiyou, Bickel, Peter J, Levina, Elizaveta, et al. 2013. Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4), 2097–2122.
[12]
Lei, Jing, & Rinaldo, Alessandro. 2015. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1), 215–237.
[13]
Mossel, Elchanan, Neeman, Joe, & Sly, Allan. 2016. Consistency thresholds for the planted bisection model. Electronic Journal of Probability, 21, 1–24.
[14]
Abbe, Emmanuel, Bandeira, Afonso S, & Hall, Georgina. 2015. Exact recovery in the stochastic block model. IEEE Transactions on information theory, 62(1), 471–487.
[15]
van der Pas, S.L., & van der Vaart, A.W. 2018. Bayesian Community Detection. Bayesian Analysis, 13(3), 767–796.
[16]
Abbe, Emmanuel, & Sandon, Colin. 2015. Community Detection in General Stochastic Block Models: Fundamental Limits and Efficient Algorithms for Recovery. Pages 670–688 of: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.
[17]
Daudin, J-J, Picard, Franck, & Robin, Stéphane. 2008. A mixture model for random graphs. Statistics and computing, 18(2), 173–183.
[18]
Côme, Etienne, & Latouche, Pierre. 2015. Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Statistical Modelling, 15(6), 564–589.
[19]
Yao, Y. Y. 2003. Information-Theoretic Measures for Knowledge Discovery and Data Mining. Berlin, Heidelberg: Springer Berlin Heidelberg. Pages 115–136.
[20]
Cerqueira, A., & Leonardi, F. 2020. Estimation of the Number of Communities in the Stochastic Block Model. IEEE Transactions on Information Theory, 66(10), 6403–6412.