Robust Matrix Completion for Discrete Rating-Scale Data: Coping with Fake Profiles in Recommender Systems


Abstract

Recommender systems are essential tools in the digital landscape for connecting users with content that aligns with their preferences. Matrix completion is a widely used statistical framework for such systems, aiming to predict a user’s preferences for items they have not yet rated by leveraging the observed ratings in a partially filled user–item rating matrix. Realistic applications of matrix completion in recommender systems must address several challenges that are too often neglected: (i) the discrete nature of rating-scale data, (ii) the presence of malicious users who manipulate the system to their advantage through the creation of fake profiles, and (iii) missing-not-at-random patterns, where users are more likely to rate items they expect to enjoy. Our goal in this paper is twofold. First, we propose a novel matrix completion method, robust discrete matrix completion (RDMC), designed specifically to handle the discrete nature of sparse rating-scale data and to remain reliable in the presence of adversarial manipulation. Second, we evaluate RDMC through carefully designed experiments and realistic case studies, thereby offering a statistically sound blueprint for how future studies can evaluate matrix completion methods for recommender systems under realistic scenarios.

Keywords: adversarial attacks, imputation, latent low-rank regularization, missing data, outliers, recommendation engines

Funding information

This work is supported by grants of the Dutch Research Council (NWO), research program Vidi, grant numbers VI.Vidi.195.141 (Andreas Alfons) and VI.Vidi.211.032 (Ines Wilms).

1 Introduction↩︎

Recommender systems play a critical role in modern digital platforms, particularly in online retail and advertising, where they help users discover relevant products efficiently and drive substantial commercial value. Even marginal improvements in recommendation accuracy can yield significant financial gains in industries such as e-commerce and computational advertising [1]. However, malicious agents may launch so-called attacks with fake user profiles to deliberately manipulate recommendations to their advantage (e.g., [2]; see also [3] and [4] for overviews). For instance, as early as the 2000s, platforms such as Amazon and eBay reported repeated attempts to subvert their recommender systems through deceptive practices such as fake reviews and purchased ratings [5]. Today, these concerns remain more relevant than ever, as highlighted by [6], and they are drawing increasing attention from regulatory authorities such as the [7], the [8], or the [9]. For instance, the [10] recently introduced new measures and regulations aimed at protecting consumers, which include urging online platforms to proactively detect and address fake profiles and suspicious behavioral patterns, such as those generated by artificial intelligence. This paper introduces a novel matrix completion method that can withstand adversarial manipulation in recommender systems; our experiments offer a statistically sound yet realistic blueprint for future studies to evaluate performance.

From a statistical perspective, recommender systems are often framed as matrix completion problems for a sparse user-item rating matrix (e.g., ranging from one star to five stars), with predictions being used to recommend new items to users. A prominent example is the famous Netflix Prize competition [11]. While early work on matrix completion dates back to, among others, [12], [13], and [14], it has been actively studied since [15] and [16]. For overviews, we refer to, e.g., [17], [18], and [1].

A multitude of methods for matrix completion exist, including weighted low-rank matrix approximation through singular value decomposition [19]; matrix factorization (e.g., [13], [14]; or [20], for recent computational advances); rank minimization, among which nuclear norm minimization [15], [16], [21], singular value thresholding [22][24], iteratively reweighted least squares minimization [25]; combinations of the former [26], [27]; or dynamic approaches over time [28].

We contribute to the literature, first methodologically, by presenting a novel matrix completion procedure called robust discrete matrix completion (RDMC) that is tailored towards the discrete nature of incomplete rating-scale data while being robust to adversarial manipulation. We assess RDMC against the widely-used procedure Soft-Impute [21], [26], as well as a variant that we adjusted with a post-hoc discretization step to fit the discrete nature of rating-scale data. We study these methods in empirical case studies and extensive simulations. These experiments are carefully designed to be more realistic than previous studies, thereby providing an important second, conceptual contribution to the literature: our experiments may serve as realistic yet statistically sound blueprints for future studies on matrix completion and recommender systems, further boosting transparency, reproducibility, and generalizability. With these contributions, we address in particular the following issues.

First, the vast majority of existing methods for matrix completion are formulated over the real number domain and produce continuous predictions, despite recommender systems applications featuring (discrete and bounded) rating-scale data. Exceptions that include discreteness or box constraints can be found in [29], [30], [31], [32], [33], or [34]. Although the observed ratings may be interpreted as discrete measurements of a latent continuous sentiment, the underlying continuous distributions are not identified without additional assumptions [35], invalidating comparisons of predictions across different items. The proposed procedure RDMC therefore restricts the predictions to the given rating scale via a discreteness constraint.

Second, only a relatively small subset of the literature on matrix completion considers the presence of outliers or corrupted/manipulated observations, such as fake profiles in recommender systems. From the perspective of robust statistics, there are primarily two approaches for handling outliers to avoid biased analyses. One approach involves explicitly identifying and removing outliers, while the generally favored approach focuses on robust methods that can accommodate the presence of outliers without compromising performance [36]. In the context of matrix completion and recommender systems, an example of the first strategy is given in [37], while existing procedures for the latter are commonly based on robust nuclear norm minimization [29], [38][42] or robust matrix factorization [43][47]. Among these, only [29] address discrete data. Moreover, [6] cautions against the prevalent business practice of merely identifying and removing fraudulent reviews, as their study demonstrates that such an approach can result in persistent, long-term negative effects. This highlights the need for a robust statistical approach to protect against adversarial manipulation in matrix completion for discrete rating-scale data, which RDMC achieves by incorporating a robust loss function.

Third, much of the literature does not investigate the effect of missing data mechanisms other than missing completely at random (MCAR) on matrix completion procedures, with some notable exceptions being [48] for missing at random (MAR) as well as [49] and [50] for missing not at random (MNAR). Yet missing values in recommender systems applications are not MCAR in practice [1]: users predominantly rate items that they have consumed or purchased and thus expected to like, intrinsically linking the probability of missingness to their preferences. Hence, we study the performance of the proposed method under realistic MNAR settings.

For transparency and reproducibility—key concerns in the current scientific landscape as discussed in [1]—we provide implementations of the methodology in package RMCLab [51] for the statistical computing environment R [52], which is available from https://CRAN.R-project.org/package=RMCLab. The main part of the code is written in C++ to improve computational efficiency. Furthermore, replication files of all analyses will be made publicly available upon acceptance of this manuscript.

The remainder of the paper is structured as follows. Section 2 introduces the proposed procedure RDMC for robust matrix completion with discrete rating-scale data. In Section 3, we introduce a simulation design that mimics recommender systems applications, and we investigate the performance of RDMC. Section 4 then presents two empirical case studies. The final Section 5 provides a concluding discussion.

2 A robust procedure for discrete matrix completion↩︎

We start by formulating the regularized optimization problem for matrix completion in Section 2.1 and presenting the corresponding algorithm in Section 2.2. Section 2.3 provides further algorithmic details, while Section 2.4 introduces different choices of robust loss functions. Finally, the selection of the regularization parameter is discussed in Section 2.5.

2.1 Problem formulation↩︎

Suppose that we observe an incomplete rating matrix \(\boldsymbol{R}\) of \(n\) rows (representing individuals providing the ratings) and \(p\) columns (representing the rated items) with elements \(R_{ij} \in \{1, 2, \dots, K\}\). For practical reasons (see Section 2.3 for further discussion), we consider a column-centered matrix \(\boldsymbol{X}\) with elements \(X_{ij} \in \mathcal{C}_{j} = \{c_{1}^{j}, \dots, c_{K}^{j}\} \subset \mathbb{R}\), where \(c_{1}^{j} < \dots < c_{K}^{j}\) for \(j = 1, \dots, p\). Note that the values of the rating categories may vary between columns of \(\boldsymbol{X}\). Hence, the assumption that the original rating matrix \(\boldsymbol{R}\) takes values in \(\{1, 2, \dots, K\}\) is purely for simplicity and without loss of generality, and the proposed procedure extends in a straightforward manner to settings where even the number of rating categories may differ per column.

Let \(\Omega\) denote the index set of observed entries in \(\boldsymbol{X}\) and define the projection \(P_{\Omega}\) to be the \((n \times p)\)-dimensional matrix whose elements are given by \[\left( P_\Omega(\boldsymbol{X}) \right)_{ij} = \begin{cases} X_{ij} & \text{ if } (i,j) \in \Omega, \\ 0 & \text{ otherwise.} \end{cases}\] Matrix completion is often based on minimizing the mean squared error for the observed elements subject to a nuclear norm constraint as a relaxation of a low-rank constraint. Specifically, the widely-used procedure Soft-Impute [21], [26] solves the optimization problem (in Lagrangian form) \[\label{eq:SI} \min_{\boldsymbol{L}} \frac{1}{2} \| P_{\Omega}(\boldsymbol{X}) - P_{\Omega}(\boldsymbol{L}) \|_{F}^{2} + \lambda \| \boldsymbol{L} \|_{*},\tag{1}\] where \(\| \cdot \|_{F}\) denotes the Frobenius norm and \(\| \cdot \|_{*}\) the nuclear norm, while \(\lambda \geq 0\) is a regularization parameter.
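For intuition, problem 1 can be solved by iteratively filling the unobserved cells with the current low-rank estimate and soft-thresholding the singular values [21], [26]. The following NumPy sketch is illustrative only; the function name and interface are our own, not those of an existing package:

```python
import numpy as np

def soft_impute(X, observed, lam, max_iter=200, tol=1e-4):
    """Minimal Soft-Impute sketch for problem (1).

    X: matrix with arbitrary values in unobserved cells,
    observed: boolean mask of observed entries, lam: regularization parameter.
    """
    L = np.zeros_like(X, dtype=float)
    for _ in range(max_iter):
        # complete the matrix: P_Omega(X) plus current estimate elsewhere
        filled = np.where(observed, X, L)
        U, d, Vt = np.linalg.svd(filled, full_matrices=False)
        # soft-threshold the singular values by lam
        L_new = (U * np.maximum(d - lam, 0.0)) @ Vt
        if np.linalg.norm(L_new - L) <= tol * max(np.linalg.norm(L), 1.0):
            return L_new
        L = L_new
    return L
```

With a small λ and an (approximately) low-rank matrix, the missing entries are recovered from the observed ones.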

In order to preserve the discrete nature of the ratings, one could add a discreteness constraint to 1. However, requiring the elements of \(\boldsymbol{L}\) to be discrete may not yield a solution, since such a discreteness constraint and a rank constraint are unlikely to be fulfilled simultaneously. To resolve this issue, an ancillary continuous matrix \(\boldsymbol{Z}\) can be introduced into problem 1 such that the discreteness constraint operates on \(\boldsymbol{L}\) and the nuclear norm constraint on \(\boldsymbol{Z}\), while ensuring that \(\boldsymbol{L}\) and \(\boldsymbol{Z}\) remain as close as possible. In addition, the squared Frobenius norm in 1 does not protect against corrupted observations (such as fake profiles in recommender systems). Indeed, this norm penalizes large errors quadratically, implying that corrupted observations with large errors may dominate the loss function and significantly influence the recommender system. For increased protection, we replace it with a robust loss function that puts less emphasis on large errors, thereby preventing them from disproportionately affecting the recommender system. More specifically, we replace it with a (pseudo-)norm \(\| \boldsymbol{Y} \|_{\rho} = \sum_{i,j} \rho(Y_{ij})\) based on a robust loss function \(\rho\) [46]. 
Putting all of this together, we obtain the following optimization problem in augmented Lagrangian form: \[\label{eq:RDMC} \begin{array}{r l} {\displaystyle \min_{\boldsymbol{L}, \boldsymbol{Z}}} & {\displaystyle \| P_{\Omega}(\boldsymbol{X}) - P_{\Omega}(\boldsymbol{L}) \|_{\rho} + \lambda \| \boldsymbol{Z} \|_{*} + \langle \boldsymbol{\Theta}, \boldsymbol{L} - \boldsymbol{Z} \rangle_{F} + \frac{\mu}{2} \| \boldsymbol{L} - \boldsymbol{Z} \|_{F}^{2}} \\ \text{subject to} & L_{ij} \in \mathcal{C}_{j}, \quad i = 1, \dots, n, \; j = 1, \dots, p, \end{array}\tag{2}\] where \(\langle \cdot,\cdot \rangle_{F}\) denotes the Frobenius inner product, \(\boldsymbol{\Theta}\) is a multiplier adjusting for the discrepancy between \(\boldsymbol{L}\) and \(\boldsymbol{Z}\), and \(\mu\) is an additional regularization parameter. It should be noted that 2 generalizes the optimization problem formulated in [29] to a wider class of robust loss functions and by allowing different values of the rating categories between columns of the (column-centered) rating matrix.

2.2 Algorithm↩︎

The formulation of the optimization problem 2 lends itself to an alternating direction method of multipliers (ADMM) algorithm [53], which follows along similar lines as that of [29]. For a given value of the regularization parameter \(\lambda\), the following steps are iterated until convergence (after initialization as described in Section 2.3).

First, consider \(\boldsymbol{L}\) fixed and solve 2 for \(\boldsymbol{Z}\). Combining the terms for the Frobenius inner product and Frobenius norm, dropping constant terms, and dividing by \(\mu\) yields the equivalent minimization problem \[\min_{\boldsymbol{Z}} {\textstyle \frac{1}{2} \| (\boldsymbol{L} + \frac{1}{\mu} \boldsymbol{\Theta}) - \boldsymbol{Z} \|_{F}^{2} + \frac{\lambda}{\mu} \| \boldsymbol{Z} \|_{*} }.\] The solution is given by the soft-thresholded singular value decomposition (SVD) of \(\boldsymbol{L} + \frac{1}{\mu} \boldsymbol{\Theta}\) [22]. That is, with \(\boldsymbol{L} + \frac{1}{\mu} \boldsymbol{\Theta} = \boldsymbol{U} \boldsymbol{D} \boldsymbol{V}^{\top}\), where \(\boldsymbol{U}\) is \((n \times q)\)-dimensional, \(\boldsymbol{V}\) is \((p \times q)\)-dimensional, \(\boldsymbol{D}\) is a \((q \times q)\)-dimensional diagonal matrix whose diagonal elements are denoted by \(d_{1}, \dots, d_{q}\), and \(q \leq \min(n, p)\) denotes the rank, we obtain \[\boldsymbol{Z} = \boldsymbol{U} S(\boldsymbol{D}) \boldsymbol{V}^{\top}, \label{eq:Zupdate}\tag{3}\] where \(S(\boldsymbol{D}) = \text{diag} \left( (d_{1} - \frac{\lambda}{\mu})_{+}, \dots, (d_{q} - \frac{\lambda}{\mu})_{+} \right)\) with \((y)_{+} = \max(y, 0)\).
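This update is compact in code; a NumPy sketch (illustrative, with a function name of our own choosing):

```python
import numpy as np

def update_Z(L, Theta, lam, mu):
    """Z-update (eq. 3): soft-thresholded SVD of L + Theta / mu."""
    U, d, Vt = np.linalg.svd(L + Theta / mu, full_matrices=False)
    # multiply the columns of U by the thresholded singular values: U S(D) V^T
    return (U * np.maximum(d - lam / mu, 0.0)) @ Vt
```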

Second, consider \(\boldsymbol{Z}\) fixed and solve 2 for \(\boldsymbol{L}\). Using similar operations as described above but without rescaling, we arrive at the equivalent minimization problem \[\begin{array}{r l} {\displaystyle \min_{\boldsymbol{L}}} & \| P_{\Omega}(\boldsymbol{X}) - P_{\Omega}(\boldsymbol{L}) \|_{\rho} + \frac{\mu}{2} \| \boldsymbol{L} - \boldsymbol{Z} + \frac{1}{\mu} \boldsymbol{\Theta} \|_{F}^{2} \\ \text{subject to} & L_{ij} \in \mathcal{C}_{j}, \quad i = 1, \dots, n, j = 1, \dots, p. \end{array}\] This can be solved element-wise, with the solution being given by \[L_{ij} = \begin{cases} {\displaystyle \mathop{\mathrm{arg\,min}}_{c_{k} \in \mathcal{C}_{j}}} \; \rho(c_{k}- X_{ij}) + \frac{\mu}{2} (c_{k} - Z_{ij} + \frac{1}{\mu} \Theta_{ij})^2 & \text{ for } (i,j) \in \Omega, \\ {\displaystyle \mathop{\mathrm{arg\,min}}_{c_{k} \in \mathcal{C}_{j}}} \; (c_{k} - Z_{ij} + \frac{1}{\mu} \Theta_{ij})^2 & \text{ for } (i,j) \notin \Omega. \end{cases} \label{eq:updateL}\tag{4}\]
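A sketch of this element-wise update (illustrative; `rho` is any vectorized loss function, and `categories[j]` holds the allowed centered values of column \(j\)):

```python
import numpy as np

def update_L(X, observed, Z, Theta, mu, categories, rho):
    """L-update (eq. 4): per cell, pick the best value among the allowed
    (centered) rating categories of the column."""
    n, p = X.shape
    L = np.empty((n, p))
    for j in range(p):
        cats = categories[j]
        for i in range(n):
            quad = (cats - Z[i, j] + Theta[i, j] / mu) ** 2
            if observed[i, j]:
                obj = rho(cats - X[i, j]) + (mu / 2) * quad
            else:
                obj = quad
            L[i, j] = cats[np.argmin(obj)]
    return L
```

For unobserved cells, the update simply snaps \(Z_{ij} - \Theta_{ij}/\mu\) to the nearest allowed category.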

Third, update the discrepancy parameter \(\boldsymbol{\Theta} \leftarrow \boldsymbol{\Theta} + \mu (\boldsymbol{L} - \boldsymbol{Z})\) and the regularization parameter \(\mu \leftarrow \delta \mu\) with \(\delta > 1\). Hence, \(\mu\) grows exponentially in the number of iterations to speed up convergence.

Pseudo-code for the algorithm is presented in Algorithm 1. Following [29] and [46], we initialize \(\mu = 0.1\) and set \(\delta = 1.05\). As convergence criterion, we take the relative change in the objective function from 2 falling below a given threshold, which we set to \(\varepsilon_\text{tol} = 10^{-4}\). Furthermore, we set the maximum number of iterations to \(t_\text{max}=100\), as this sufficed for convergence in all our numerical experiments.

Algorithm 1: Robust matrix completion for discrete rating-scale data
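For concreteness, the three update steps can be assembled into a self-contained NumPy sketch of the loop in Algorithm 1. This is illustrative only: it uses a crude stopping rule (no change in the discrete matrix) instead of the relative-change criterion on the objective, and it omits warm starts across the λ grid:

```python
import numpy as np

def rdmc_fit(X, observed, categories, lam, rho=np.abs,
             mu=0.1, delta=1.05, t_max=100):
    """Sketch of the RDMC ADMM iterations for one value of lam.

    X: median-centered incomplete ratings, observed: boolean mask,
    categories[j]: allowed centered values in column j.
    """
    n, p = X.shape
    L = np.where(observed, X, 0.0)   # start from the median-imputed matrix
    Theta = np.zeros((n, p))
    for _ in range(t_max):
        # Z-update: soft-thresholded SVD of L + Theta / mu
        U, d, Vt = np.linalg.svd(L + Theta / mu, full_matrices=False)
        Z = (U * np.maximum(d - lam / mu, 0.0)) @ Vt
        # L-update: element-wise search over the rating categories
        L_old = L.copy()
        for j in range(p):
            cats = categories[j]
            for i in range(n):
                quad = (cats - Z[i, j] + Theta[i, j] / mu) ** 2
                obj = rho(cats - X[i, j]) + (mu / 2) * quad if observed[i, j] else quad
                L[i, j] = cats[np.argmin(obj)]
        # multiplier update and exponentially growing penalty
        Theta += mu * (L - Z)
        mu *= delta
        if np.array_equal(L, L_old):  # crude stopping rule for this sketch
            break
    return L
```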

2.3 Column centering and initialization↩︎

Since the algorithm contains a soft-thresholded SVD step, it is important that the observed (incomplete) rating matrix \(\boldsymbol{R}\) is centered. One possibility is to center each column by the midpoint of the rating scale (i.e., the mean of the minimum and maximum rating category). While this can be expected to work well if the rating distributions of the columns are symmetric around the midpoint, such a setting is unrealistic in practical applications. In recommender systems, the distributions of popular items, for instance, are typically skewed towards higher ratings, with the maximum rating often being the most frequent. Hence, we center the \(j\)th column in the given data matrix \(\boldsymbol{R}\) by the median \(M_j\) of its observed cells, i.e., we obtain \(\boldsymbol{X}\) by setting \(X_{ij} = R_{ij} - M_{j}\), \(i = 1, \dots, n\), \(j = 1, \dots, p\). Accordingly, we transform the set of rating categories to \(\mathcal{C}_{j} = \{c_{1}^{j}, \dots, c_{K}^{j}\}\) with \(c_{k}^{j} = k - M_{j}\) for \(k = 1, \dots, K\). This highlights the need for a procedure that allows for different rating categories in different columns, a feature that our proposed procedure RDMC accommodates.
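A sketch of this centering step (illustrative helper; in practice the unobserved cells would be stored as missing rather than centered):

```python
import numpy as np

def median_center(R, observed, K):
    """Center each column of the incomplete rating matrix R (values 1..K) by
    the median of its observed entries; also return the per-column centered
    category sets C_j."""
    n, p = R.shape
    X = np.array(R, dtype=float)
    categories = []
    for j in range(p):
        M_j = np.median(R[observed[:, j], j])  # median of observed ratings only
        X[:, j] -= M_j
        categories.append(np.arange(1, K + 1) - M_j)
    return X, categories
```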

Using the median-centered matrix \(\boldsymbol{X}\), we initialize the matrix \(\boldsymbol{L} = P_{\Omega}(\boldsymbol{X})\) (corresponding to median imputation) and the discrepancy parameter \(\boldsymbol{\Theta}\) as an \((n \times p)\)-dimensional matrix of zeros. However, the algorithm is typically applied for a grid of values for \(\lambda\) (see Section 2.5). Then the obtained solutions for \(\boldsymbol{L}\) and \(\boldsymbol{\Theta}\) for a given value of \(\lambda\) are used as starting values for the next value of \(\lambda\). As \(\boldsymbol{\Theta}\) adjusts for the discrepancy between \(\boldsymbol{L}\) and \(\boldsymbol{Z}\) (which should increase with increasing \(\lambda\)), the values of \(\lambda\) should thereby be sorted in ascending order.

2.4 Loss functions↩︎

In order to reduce the influence of large errors due to corrupted observations such as fake profiles, we consider the following robust loss functions for the (pseudo-)norm \(\|\cdot\|_{\rho}\) in 2 , which are common choices in the literature on robust methods:

  • The pseudo-Huber loss \(\rho(y) = \tau^2 (\sqrt{1 + (y / \tau)^2} - 1)\) with parameter \(\tau\). The inclusion of this loss function is motivated by its successful application to rating-scale data in a different context, namely in autoencoder neural networks for the detection of careless responding in surveys from the behavioral sciences [54], [55]. Here, we suggest linking the parameter \(\tau\) to the step size between rating categories by setting \(\tau = 1\).

  • The absolute loss \(\rho(y) = |y|\).

  • A truncated variant of the absolute loss given by \(\rho(y) = \min (|y|, \tau)\). We suggest setting \(\tau = (K - 1) / 2\), i.e., the loss is truncated at half the range of the rating categories.
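These three losses are straightforward to implement; a vectorized sketch:

```python
import numpy as np

def pseudo_huber(y, tau=1.0):
    """Pseudo-Huber loss: approximately quadratic near zero, linear for large |y|."""
    return tau**2 * (np.sqrt(1.0 + (y / tau)**2) - 1.0)

def absolute(y):
    """Absolute loss."""
    return np.abs(y)

def truncated_absolute(y, tau):
    """Truncated absolute loss: large errors are capped at tau."""
    return np.minimum(np.abs(y), tau)
```

For a 5-point scale, for instance, the truncation point would be \(\tau = (5 - 1)/2 = 2\).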

Since the update step for \(\boldsymbol{L}\) yields separable problems for its elements (see Section 2.2), the use of a nonconvex function such as the truncated absolute loss comes at no cost to computational complexity. A nonconvex loss may yield greater robustness in selecting the regularization parameter \(\lambda\) when minimizing the loss on a test set, which is discussed next.

2.5 Selection of the regularization parameter↩︎

To select the regularization parameter \(\lambda\) from a grid of candidate values based on out-of-sample prediction performance, some of the observed elements of \(\boldsymbol{R}\) can be set to missing values for subsequent use as a validation set. For measuring the prediction error on the validation set, a natural choice is to apply the same loss function \(\rho\) that is used for fitting the algorithm on the training set. It is possible to apply a cross-validation scheme whereby the observed elements are randomly divided into blocks, with each block being used as validation set once. However, even if the rows are independent, elements within the same row are not. It is therefore unclear if cross-validation would reduce the correlations among prediction errors on the different validation sets, compared to repeated holdout validation whereby a proportion of the observed elements are randomly selected in each replication to form the validation set. Hence, we prefer repeated holdout validation, as the proportion of observations in the validation set and the number of replications can be chosen independently.
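Repeated holdout over the observed cells can be sketched as follows (illustrative helper, not part of RMCLab's documented interface):

```python
import numpy as np

def holdout_splits(observed, frac=0.1, reps=5, seed=0):
    """Yield (train_mask, val_cells) pairs: in each replication, a random
    fraction of the observed cells is held out for validation."""
    rng = np.random.default_rng(seed)
    cells = np.argwhere(observed)
    n_val = max(1, int(frac * len(cells)))
    for _ in range(reps):
        val = cells[rng.choice(len(cells), size=n_val, replace=False)]
        train_mask = observed.copy()
        train_mask[val[:, 0], val[:, 1]] = False
        yield train_mask, val
```

Each candidate \(\lambda\) would then be fitted on `train_mask` and scored with the loss \(\rho\) on the held-out cells, averaging over replications.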

3 Simulations↩︎

Before applying the proposed procedure in empirical case studies, we thoroughly evaluate it via simulations. Crucially, our simulation design closely mimics recommender systems applications so that it may serve as blueprint for future studies. We perform 100 replications.

3.1 Data generation↩︎

We simulate data of \(n = 300\) user ratings on \(p = 200\) items. We start by simulating latent continuous data from the low-rank matrix factorization model \(\boldsymbol{Z} = \boldsymbol{A} \boldsymbol{B}^{\top} + \boldsymbol{\mathcal{E}}\), where \(\boldsymbol{A}\) is of dimension \(n \times q\), \(\boldsymbol{B}\) of dimension \(p \times q\), and \(\boldsymbol{\mathcal{E}}\) of dimension \(n \times p\), with rank \(q = 20\) and the elements of \(\boldsymbol{A}\), \(\boldsymbol{B}\), and \(\boldsymbol{\mathcal{E}}\) being independent and standard normally distributed. Subsequently, we rescale \(\boldsymbol{Z}^{*} = \boldsymbol{Z} / \sqrt{q+1}\) so that the elements of \(\boldsymbol{Z}^{*}\) have variance 1.

To create popular items (in general higher ratings) and unpopular items (in general lower ratings), we add random mean shifts \(s_{1}, \dots, s_{p}\) to the respective columns of \(\boldsymbol{Z}^{*}\) before discretization into \(K \in \{3, 5, 10\}\) ordinal rating categories encoded as values \(\{1, \dots, K\}\). These mean shifts are randomly drawn from the interval \([-s_{\text{max}}, s_{\text{max}}]\), where \(s_{\text{max}}\) depends on the number of categories and breakpoints in the discretization. Specifically, \(s_{\text{max}}\) is chosen so that the corresponding mean shift results in 40% of the ratings being expected in the maximum rating category.
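The latent data-generating step can be sketched as follows (the mean shifts are added afterwards):

```python
import numpy as np

def latent_scores(n=300, p=200, q=20, seed=0):
    """Draw Z = A B^T + E with independent standard normal entries and
    rescale so that each element of Z* has unit variance."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, q))
    B = rng.standard_normal((p, q))
    E = rng.standard_normal((n, p))
    # each element of A B^T has variance q, plus variance 1 from the noise
    return (A @ B.T + E) / np.sqrt(q + 1)
```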

For discretizing the resulting continuous data matrix in order to obtain a rating matrix \(\boldsymbol{R}\), we use the following breakpoints depending on the number of rating categories:

  • \(K = 3\): inspired by Netflix’s asymmetric rating scale (“Not for me”, “I like this”, “Love this”), we set the breakpoints between categories at 0 and 1.5.

  • \(K = 5\): to simulate common 1 to 5 star ratings, we set the breakpoints at \(-1.5\), \(-0.5\), \(0.5\), and \(1.5\).

  • \(K = 10\): to simulate common 1 to 10 star ratings (or, equivalently, ratings up to 5 stars but allowing for half-stars), we set the breakpoints at \(-2\), \(-1.5\), \(-1\), \(-0.5\), \(0\), \(0.5\), \(1\), \(1.5\), and \(2\).
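Discretization via breakpoints is a one-liner with `np.digitize`; for example, for the five-category scale:

```python
import numpy as np

def discretize(Z, breaks):
    """Map latent continuous scores to categories 1..K using breakpoints."""
    return np.digitize(Z, breaks) + 1

# breakpoints for the K = 5 setting
breaks_K5 = [-1.5, -0.5, 0.5, 1.5]
```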

3.2 Missing values↩︎

We generate missing values in the rating matrix \(\boldsymbol{R}\) using two different settings:

  • Missing not at random (MNAR): the negated mean shifts \(-s_{1}, \dots, -s_{p}\) are mapped to the interval \([0.4, 0.99]\) in order to obtain the proportion of observations in each item to be replaced by missing values. For instance, the most popular item (large positive mean shift, high ratings) contains 40% missing values, while the most unpopular item (large negative mean shift, low ratings) contains 99% missing values. Hence, low ratings are much more likely to be missing than high ratings.

  • Missing completely at random (MCAR): a proportion \(\gamma = 0.7\) of cells in \(\boldsymbol{R}\) is randomly selected and replaced by missing values. Here, \(\gamma\) is chosen so that the overall proportion of missing values is similar to the MNAR setting. Note that this setting is included for reference purposes since MCAR is unrealistic in recommender systems applications.
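The MNAR setting can be sketched as follows. Note that the exact mapping of the negated mean shifts to \([0.4, 0.99]\) is an assumption here (a linear min–max mapping); the function name is our own:

```python
import numpy as np

def mnar_mask(n, shifts, lo=0.40, hi=0.99, seed=0):
    """MNAR sketch: map the negated mean shifts linearly to per-item
    missingness proportions in [lo, hi] (assumed mapping) and draw a
    Bernoulli mask per cell; True marks a missing rating."""
    rng = np.random.default_rng(seed)
    neg = -np.asarray(shifts, dtype=float)
    props = lo + (hi - lo) * (neg - neg.min()) / (neg.max() - neg.min())
    return rng.random((n, len(shifts))) < props
```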

3.3 Attacks↩︎

To investigate the robustness of the methods against adversarial manipulation with profile injection attacks (also known as shilling attacks), we focus on so-called nuke attacks aimed at demoting a certain target item, i.e., decreasing the probability of it being recommended. We apply three efficient attack schemes, denoted by average, reverse bandwagon, and love/hate (see [57] for a detailed overview of common attack schemes).

To select the target item, we first make a pre-selection of items with a mean shift larger than \(0.9 \cdot s_{\text{max}}\). Although it requires knowledge of unobserved information, this pre-selection reflects that the attacker has some prior notion of popular items. Among those, the item with the highest average observed rating is targeted, with fake profiles assigning the minimum rating to this item. To keep the effect of the attacks comparable across missing data settings, the number of fake profiles is determined as a proportion \(\varepsilon = 0.2\) of the number of observed ratings in the target item.

Table 1: Overview of the selected nuke attack schemes, following the formalization provided by [57].

| Attack type | Selection (selected items) | Fraction (selected) | Rating (selected) | Selection (filler items) | Fraction (filler) | Rating (filler) |
|---|---|---|---|---|---|---|
| Average | Not used | – | – | Random | 0.1 | Item mode |
| Reverse bandwagon | Unpopular | 0.1 | \(c_{\min} = 1\) | Random | 0.1 | \(c_{\min} = 1\) |
| Love/hate | Not used | – | – | Random | 0.1 | \(c_{\max} = K\) |

Table 1 summarizes the strategy in the three attack schemes regarding so-called selected items (which are chosen by the attacker based on specific characteristics) and filler items (which are randomly chosen). In the average attack, filler items are assigned ratings based on the item mode. The idea is that the fake profiles have typical ratings in other items but dislike the target item. We thereby use the item mode instead of the item average, following the recommendation of [58] for discrete ratings. The reverse bandwagon attack is intended to associate the target item with disliked items. Hence, it uses unpopular items as selected items and assigns the minimum rating to both the selected and filler items. As unpopular items, we take the items with the lowest average observed ratings, provided that at least 20 ratings are observed. In the love/hate attack, the filler items receive the maximum rating so that the fake profiles love any other item while hating the target item.
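As an illustration of what such fake profiles look like, the love/hate scheme can be sketched as follows (hypothetical helper; NaN marks unrated cells):

```python
import numpy as np

def love_hate_profiles(n_fake, p, target, filler_frac=0.1, K=5, seed=0):
    """Love/hate nuke attack sketch: each fake profile gives the target item
    the minimum rating and a random fraction of the other items the maximum."""
    rng = np.random.default_rng(seed)
    profiles = np.full((n_fake, p), np.nan)   # NaN = item not rated
    profiles[:, target] = 1                   # hate the target item
    others = np.delete(np.arange(p), target)
    n_filler = int(filler_frac * p)
    for i in range(n_fake):
        fillers = rng.choice(others, size=n_filler, replace=False)
        profiles[i, fillers] = K              # love the filler items
    return profiles
```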

3.4 Methods↩︎

For the proposed method RDMC, we consider the three loss functions described in Section 2.4 (denoted by pHuber for the pseudo-Huber loss, absolute for the absolute loss, and truncated for the truncated absolute loss) and set other parameters as described in Section 2.2.

We compare RDMC with the popular procedure Soft-Impute (SI) [21], [26]. Since this method yields continuous predictions, we introduce a variant that discretizes the obtained predictions to the given rating scale via the mapping function \(m_{j}(y) = \min(\max([y], c_{\min}), c_{\max})\), where \(c_{\min}\) and \(c_{\max}\) are the minimum and maximum rating, respectively, and \([\cdot]\) denotes rounding to the nearest integer (SI-discretized). We use the SVD-based algorithm due to its theoretical convergence guarantees and set the same convergence threshold as for RDMC.
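The mapping \(m_j\) is simply round-then-clip; a minimal sketch:

```python
import numpy as np

def discretize_predictions(Y, c_min=1, c_max=5):
    """SI-discretized mapping: round to the nearest integer (note that
    np.rint rounds ties to even), then clip to the rating scale."""
    return np.clip(np.rint(Y), c_min, c_max)
```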

For selecting the regularization parameter \(\lambda\), both RDMC and SI use repeated holdout validation with 5 replications and 10% of the observed cells to be randomly selected as validation set. The candidate values are thereby given by a logarithmic grid of ten values between 0.01 and 1, which are scaled by the largest singular value of \(P_{\Omega}(\boldsymbol{X}_{\text{train}})\), with \(\boldsymbol{X}_{\text{train}}\) denoting the median-centered (for RDMC) or mean-centered (for SI) training data.

Additional benchmark methods are median imputation (denoted by median), a discretized variant of median imputation (median-discretized; in case the median of a column falls in between rating categories, the predictions are randomly sampled from the corresponding two categories), and mode imputation (mode; in case of a column with multiple modes, one of them is selected at random for each missing cell).

3.5 Evaluation criteria↩︎

In the absence of an attack, the methods are evaluated using the mean absolute error (MAE) over the predictions of all missing cells, i.e., \[\label{eq:MAE} MAE = \frac{1}{|\Gamma|} \sum_{(i,j) \in \Gamma} |R_{ij} - \widehat{R}_{ij}|,\tag{5}\] where \(\Gamma\) denotes the index set of missing cells in \(\boldsymbol{R}\), and \(\widehat{R}_{ij}\) is the prediction of the cell \(R_{ij}\). For the different attack settings, we evaluate the methods via the mean prediction shift (MPS) in the target variable [57], i.e., \[MPS = \frac{1}{|\Gamma_{\text{target}}|} \sum_{i \in \Gamma_{\text{target}}} \left (\widehat{R}_{ij_{\text{target}}}^{\text{after}} - \widehat{R}_{ij_{\text{target}}}^{\text{before}} \right),\] where \(\Gamma_{\text{target}}\) is the index set of missing cells in the target variable \(j_{\text{target}}\) of \(\boldsymbol{R}\), while \(\widehat{R}_{ij}^{\text{before}}\) and \(\widehat{R}_{ij}^{\text{after}}\) denote the prediction of the cell \(R_{ij}\) before and after the attack, respectively.
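Both criteria are simple averages over the relevant cells; a sketch (in the simulations, the true ratings of the missing cells are known):

```python
import numpy as np

def mae(R, R_hat, missing):
    """Mean absolute error (eq. 5) over the predictions of the missing cells."""
    return np.mean(np.abs(R[missing] - R_hat[missing]))

def mean_prediction_shift(pred_before, pred_after):
    """Mean prediction shift in the target item: average change of the
    predictions for its missing cells after the attack."""
    return np.mean(np.asarray(pred_after) - np.asarray(pred_before))
```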

3.6 Results↩︎

Figure 2: Results for the simulated recommender system without an attack. The rows correspond to different missing data mechanisms and the columns to different numbers of rating categories.

For RDMC, we focus on the pseudo-Huber loss, as preliminary simulations indicate that results for the absolute loss and the truncated absolute loss are similar (see Appendix 6). Figure 2 displays box plots of the MAE for the setting without an attack. In case of three rating categories, SI without the discretization step does not improve upon median and mode imputation. The discretized variant, on the other hand, yields a clear improvement in MAE, with RDMC falling in between. For five rating categories, SI and RDMC considerably outperform the benchmark methods, with RDMC being in between the two SI approaches for the MNAR setting, but performing equally well as SI-discretized in the MCAR setting. For ten rating categories, RDMC yields the best performance in terms of MAE, while both SI approaches exhibit large variability in the MCAR setting.

Figure 3: Results for the simulated recommender system with attacks in the MNAR setting. The rows correspond to different attacks and the columns to different numbers of rating categories. Results for mode imputation were unstable and are therefore omitted.

We now turn to stability of the predictions under adversarial manipulation, with Figure 3 containing box plots of the MPS for the MNAR setting. Results for the MCAR setting are qualitatively similar and can be found in Appendix 6. Both SI approaches are strongly influenced by the attacks, with the average attack having the biggest impact. The MPS is thereby large enough that the target item would be recommended to far fewer users—for instance, the predictions on average drop by at least one full rating category in the setting with ten categories. RDMC is far more stable in the presence of attacks, with an MPS much closer to zero in all settings. Furthermore, while median imputation is stable for three and five rating categories, it yields an MPS of \(-1\) in most instances with ten rating categories.

On a final note, we also included a variant of RDMC with the pseudo-Huber loss that we do not iterate until convergence. Instead, we apply a liberal stopping criterion that limits the procedure to a maximum of 10 iterations (denoted by RDMC-pHuber-liberal in the figures). Interestingly, the MAE in the absence of an attack is often smaller than when iterating until convergence, whereas the MPS in the presence of an attack is slightly larger in absolute value in the case of three rating categories.

4 Empirical case studies↩︎

We consider two publicly available datasets in the context of recommender systems. The first, in Section 4.1, is used to investigate the impact of attacks on empirical data (rather than stylized simulated data), while the second, in Section 4.2, allows us to compare two missing data mechanisms (MNAR and MCAR). In both case studies, the observed ratings are split randomly into subsets of 80% for training the algorithms and 20% for testing prediction performance.

Since computational burden is an important practical consideration in many recommender system applications, we also analyze the stability of RDMC and SI with respect to their stopping criteria, in order to assess whether reducing the computational burden comes at the cost of accuracy. We use two stopping criteria for each method, denoted by liberal and strict. For RDMC, the liberal criterion stops after a maximum of 10 iterations, whereas the strict criterion iterates until convergence (cf. our simulations from Section 3). For SI, on the other hand, we follow the strategy of [26] by setting a larger convergence threshold of \(10^{-3}\) for the liberal criterion, while the strict criterion uses the threshold \(10^{-4}\) (as in the simulations). All other parameters are kept the same as in Section 3, except that the number of replications in repeated holdout sampling for selecting the regularization parameter is increased from 5 to 10.
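The two stopping rules can be illustrated generically. In the sketch below (names and the toy update are ours, not the actual implementation), `step` stands in for one update of a matrix completion algorithm, and a relative Frobenius-norm change below `tol` plays the role of the convergence criterion:

```python
import numpy as np

def run_until_stopped(step, X0, tol=1e-4, max_iter=None):
    """Generic iteration loop with the two stopping rules: a 'liberal'
    cap on the number of iterations (max_iter) and a 'strict' relative
    Frobenius-norm convergence threshold (tol)."""
    X = X0
    it = 0
    while max_iter is None or it < max_iter:
        X_new = step(X)
        rel_change = np.linalg.norm(X_new - X) / max(np.linalg.norm(X), 1e-12)
        X = X_new
        it += 1
        if rel_change < tol:
            break
    return X, it

# Toy update that converges geometrically toward a fixed point.
T = np.ones((2, 2))
step = lambda X: 0.5 * (X + T)
X0 = np.zeros((2, 2))
_, it_strict = run_until_stopped(step, X0, tol=1e-4)                # to convergence
_, it_liberal = run_until_stopped(step, X0, tol=1e-4, max_iter=10)  # capped at 10
print(it_liberal, it_strict)
```

With this toy contraction, the liberal run stops at the iteration cap while the strict run continues until the relative change falls below the threshold.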

4.1 MovieLens 100K data: The impact of attacks↩︎

The famous MovieLens 100K dataset [56] has been extensively analyzed in previous studies on recommender systems [46], [48], [59]. It consists of 100,000 movie ratings, on a scale from 1 to 5, given by \(n = 943\) users on 1,682 movies. We restrict our analysis to the \(p = 939\) movies that are rated at least 20 times. The observed ratings follow a left-skewed distribution with a majority of medium to high ratings, as shown in Figure 4.

Figure 4: Overall distribution of observed ratings in the MovieLens 100K data.

First, Figure 5 (left) displays the prediction performance of the different methods in terms of the MAE on the test set. RDMC (with any of the three loss functions) and SI clearly outperform median and mode imputation with an MAE of around 0.725, while SI-discretized achieves an even smaller MAE of around 0.69. Applying the liberal stopping criterion hardly affects the prediction performance of RDMC and SI compared to the strict criterion. However, Figure 5 (right) shows that it yields a substantial reduction in computation time for RDMC—by more than a factor of three—resulting in a computation time similar to that of SI.


Figure 5: Prediction performance (left panel) and computation time in seconds (right panel) for the MovieLens 100K data. Computation time is averaged over a grid of ten values for \(\lambda\) and in case of RDMC also across the three considered loss functions.

Figure 6: Stability of predictions for the MovieLens 100K data in the presence of attacks. The rows correspond to different attacks and the columns to different attack sizes.

Second, we investigate the sensitivity of the methods to adversarial manipulation. To this end, we use the three nuke attack schemes described in Section 3.3. We additionally vary the number of fake profiles, based on a proportion \(\varepsilon \in \{0.10,0.15,0.20\}\) of the number of observed ratings in the target variable. Figure 6 displays the results for the MPS. All RDMC approaches outperform both SI approaches by a considerable margin, for all attack schemes. Moreover, RDMC provides effective protection against the attacks with an MPS close to 0 in almost all cases, irrespective of the loss function. In the presence of the love/hate attack, however, the strict stopping criterion is necessary for RDMC to achieve the best result. As expected, the MPS of the two SI approaches deteriorates as the size of the attack increases. Interestingly, SI-discretized falls behind SI when the attack size exceeds 15%.
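For intuition, the three nuke attack schemes can be sketched as follows. This is an illustrative implementation based on the standard definitions of random, average, and love/hate attacks in the shilling-attack literature; the function, its parameters, and the filler-item handling are our assumptions, not the exact setup of Section 3.3:

```python
import numpy as np

rng = np.random.default_rng(1)

def nuke_attack(R, target, scheme, n_fake, n_filler=20, K=5):
    """Append fake user profiles that 'nuke' the item in column `target`.
    Schemes follow the shilling-attack literature: 'random' fills filler
    items with uniform ratings, 'average' with the items' observed mean
    ratings, and 'love/hate' with the maximum rating K. The target item
    always receives the lowest rating 1."""
    n, p = R.shape
    item_means = np.nanmean(R, axis=0)
    profiles = np.full((n_fake, p), np.nan)
    other = np.array([j for j in range(p) if j != target])
    for i in range(n_fake):
        fillers = rng.choice(other, size=n_filler, replace=False)
        if scheme == "random":
            profiles[i, fillers] = rng.integers(1, K + 1, size=n_filler)
        elif scheme == "average":
            profiles[i, fillers] = np.clip(np.round(item_means[fillers]), 1, K)
        else:  # love/hate
            profiles[i, fillers] = K
        profiles[i, target] = 1  # nuke the target item
    return np.vstack([R, profiles])

# Toy example: 3 genuine users, 25 items, 2 fake profiles.
R = rng.integers(1, 6, size=(3, 25)).astype(float)
R_attacked = nuke_attack(R, target=0, scheme="love/hate", n_fake=2)
print(R_attacked.shape)   # (5, 25)
print(R_attacked[3:, 0])  # [1. 1.]
```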

4.2 Yahoo! Music data: The impact of missingness mechanisms↩︎

The Yahoo! Music ratings data comprise two datasets in which users rate songs on a scale of 1 to 5. In the first dataset (User Selected), the users choose the songs they want to rate, whereas in the second dataset (Randomly Selected), 10 songs from a pool of \(p_{2} = 1,000\) are randomly assigned to each user. Hence, the former constitutes an MNAR mechanism while the latter represents an MCAR mechanism. To ensure a fair comparison across datasets, we restrict our analysis to the \(n = 2,361\) users who are present in both datasets and rated at least 20 songs in the first dataset. Furthermore, we limit the first dataset to the \(p_{1} = 998\) songs with at least 20 ratings. Both datasets contain more than 95% missing values (96% for MNAR and 99% for MCAR). Figure 7 displays the distributions of the observed ratings in the two datasets. The missing data mechanism clearly leads to different rating distributions. When the users can choose the songs they want to evaluate (MNAR, left plot), the distribution of the ratings is almost bimodal, with 1-star ratings being the most frequent, followed by 5-star ratings. When the songs to be rated are randomly assigned (MCAR, right plot), the distribution is highly right-skewed, with decreasing frequency for increasing ratings and very few high ratings.

Figure 7: Overall distribution of observed ratings in the Yahoo! Music datasets.
Figure 8: Prediction performance for the Yahoo datasets. The columns correspond to different missing data mechanisms.
Figure 9: Computation time in seconds for the Yahoo datasets, averaged over a grid of ten values for \(\lambda\) and in case of RDMC also across the three considered loss functions. The columns correspond to different missing data mechanisms.

Figure 8 shows the MAE on the test set for both missing data mechanisms. As expected, the MAE of all methods is lower in the MCAR case than in the MNAR case. Overall, RDMC outperforms the SI approaches in both scenarios. For RDMC, the loss function and the number of iterations only seem to matter in the MNAR scenario, with a small advantage for the absolute loss and perhaps a minor improvement with the liberal stopping criterion. For SI, the discretized variant yields only a minimal improvement over the standard algorithm, but the stopping criterion has a major impact in the MNAR scenario: the strict stopping criterion is required to achieve performance similar to RDMC. Nevertheless, as illustrated in Figure 9, RDMC with the liberal stopping criterion remains competitive with SI in terms of computation time.

5 Conclusions and discussion↩︎

In a recent review paper, [1] conclude that too few statisticians are engaged in research on recommender systems, despite their potential to make consequential contributions. To fulfill this potential, it is vital that recommender systems are studied under more realistic scenarios. Our simulations and case studies may thereby provide a blueprint—or at least useful starting values—for designing such studies.

Moreover, we present a novel method for robust matrix completion with discrete rating-scale data (RDMC), as well as an extension of the popular Soft-Impute (SI) method with a post-hoc discretization step (SI-discretized). RDMC combines a robust loss function on the errors of the observed ratings with a discreteness constraint on the predictions and a low-rank constraint on an ancillary continuous matrix. Extensive simulations and two empirical case studies, focused on attacks with fake user profiles and missing not at random (MNAR) mechanisms, demonstrate that RDMC protects well against adversarial manipulation, while paying only a small price in performance compared to SI-discretized in the absence of corrupted observations. The choice among the three considered loss functions does not yield considerable differences in performance; we recommend the pseudo-Huber loss overall because it generally requires fewer iterations until convergence. Moreover, performing only a small number of iterations (rather than iterating until convergence) can reduce computational effort, in many cases without a detrimental effect on performance. Regarding Soft-Impute, the discretization step yields an improvement over the standard variant in the absence of corrupted observations, in particular in settings with few rating categories (five or fewer), but it remains equally vulnerable to fake profiles.

Our method shows promise for broader applicability across diverse domains. For instance, empirical research in social and behavioral sciences relies heavily on rating-scale surveys, but listwise deletion remains the predominant method of handling missing data in some fields [60]. Preliminary simulations suggest that RDMC represents a notable advancement in robust imputation of rating-scale data (see Appendix 7), even in the presence of inattentive respondents or bot respondents—a critical issue in online surveys. Furthermore, our approach holds potential for contexts involving so-called strategic missing data situations [61]—such as financial reporting, college admissions, job applications, and marketing—where individuals may intentionally withhold information to achieve favorable outcomes. Finally, recent research has shown a notable rise in matrix completion methods for panel data [49], [62], [63], providing a promising direction for future adaptation of our robust methodology.

Data availability↩︎

The MovieLens 100K data are publicly available from https://grouplens.org/datasets/movielens/100k/. The Yahoo! Music data are available through the Yahoo! Webscope data sharing program at https://webscope.sandbox.yahoo.com/catalog.php?datatype=r. Various datasets are available through this link; we use the dataset titled “R3 - Yahoo! Music ratings for User Selected and Randomly Selected songs, version 1.0”, downloaded on September 9, 2024.

6 Additional simulation results↩︎

Figure 10 contains preliminary results for RDMC with different loss functions from 50 replications of the simulated recommender system with five rating categories, where the left plot displays the MAE for the setting without an attack and the right plot presents the MPS for different attack schemes. All three loss functions yield similar results, while the pseudo-Huber loss typically requires fewer iterations to converge (see Figure 11).


Figure 10: Preliminary results for RDMC with different loss functions for the simulated recommender system with five rating categories. The left plot shows the mean absolute error in the setting without an attack, while the right plot displays the mean prediction shift for different attacks. The rows correspond to different missing data mechanisms.

Figure 11: Number of iterations for RDMC in preliminary simulations of a recommender system with five rating categories. The rows correspond to different missing data mechanisms and the columns to different attacks.

Figure 12 shows the MPS under different attacks for the simulated recommender system in the MCAR setting. The results are qualitatively similar to those for the MNAR setting, but a notable difference is that there is a small number of instances in which RDMC yields a relatively large negative prediction shift. A possible explanation for why this occurs in the MCAR setting but not in the MNAR setting is that, in the former, all variables including the target variable contain equally sparse information. In practice, however, this is a rather unrealistic setting.

Figure 12: Results for the simulated recommender system with attacks in the MCAR setting. The rows correspond to different attacks and the columns to different numbers of rating categories. Results for mode imputation were unstable and are therefore omitted.

7 Rating-scale surveys↩︎

7.1 Introduction↩︎

Another relevant application of rating-scale data, although rarely considered in the literature on matrix completion, is surveys in the social and behavioral sciences. In such surveys, researchers measure each latent construct of interest (e.g., personality traits) by multiple rating-scale items (e.g., respondents may be asked how accurately certain adjectives describe their personality, with response categories ranging from 1 = “very inaccurate” to 5 = “very accurate”; [64]). Missing values commonly occur due to nonresponse in parts of the survey, with nonresponse often being missing not at random (MNAR). For instance, participants may abandon the survey, yielding a higher probability of missingness for later survey items, or they may not respond to items they find, e.g., confusing or inappropriate. In addition, careless responding (i.e., responses not based on item content due to inattention or misunderstanding; e.g., [65]) and bot responding [66] are common occurrences in online surveys, and a prevalence as low as 5% can invalidate research findings [64], [67]–[69].

In some fields, removing participants with missing responses is still the most common method of handling missing responses in survey data [60]. Yet this discards valuable information (the observed responses of the removed participants) and generally results in biased analyses [70]. Furthermore, the common presence of careless respondents or bot respondents calls for robust imputation methods designed for discrete rating-scale data. We therefore investigate the use of the proposed matrix completion method RDMC in preliminary simulations motivated by surveys in the social and behavioral sciences.

7.2 Simulations↩︎

7.2.1 Data generation↩︎

We simulate rating-scale responses of \(n = 300\) participants to a survey on \(q = 10\) latent constructs, with each construct being measured by \(r \in \{4, 8\}\) items such that the total number of items is \(p \in \{40, 80\}\). For this purpose, we first simulate the underlying latent sentiments \(\boldsymbol{Z}\) from a multivariate normal distribution \(N(\boldsymbol{0}, \boldsymbol{\Sigma})\), where \(\boldsymbol{\Sigma}\) follows a block-Toeplitz structure with 1 on the diagonal and the remaining elements being given by \(\rho_{kl} = 0.6^{|k-l|+1}\) with \(k = 1, \dots, q\) and \(l = 1, \dots, q\) denoting the row and column indices of the blocks in \(\boldsymbol{\Sigma}\).
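A sketch of this data-generating step (illustrative Python with numpy; function and variable names are ours):

```python
import numpy as np

def block_toeplitz_sigma(q, r, rho=0.6):
    """Correlation matrix with block-Toeplitz structure: items in
    constructs k and l have correlation rho**(|k - l| + 1), with 1 on
    the diagonal."""
    block_idx = np.repeat(np.arange(q), r)  # construct index of each item
    K, L = np.meshgrid(block_idx, block_idx, indexing="ij")
    sigma = rho ** (np.abs(K - L) + 1.0)
    np.fill_diagonal(sigma, 1.0)
    return sigma

rng = np.random.default_rng(0)
n, q, r = 300, 10, 4
sigma = block_toeplitz_sigma(q, r)
# Latent sentiments Z ~ N(0, Sigma).
Z = rng.multivariate_normal(np.zeros(q * r), sigma, size=n)
print(Z.shape)  # (300, 40)
```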

In survey-based research, skewness towards one side of the rating scale is common, for instance, in organizational and management research [71], [72]. Hence, we generate items with skewed distributions by adding random mean shifts to the columns of \(\boldsymbol{Z}\)—with all items within the same construct receiving the same mean shift—before discretization into \(K \in \{5, 7, 9\}\) answer categories interpreted as values \(\{1, \dots, K\}\). Similar to the previous simulations, the mean shifts are drawn from the interval \([0, s_{\text{max}}]\), where \(s_{\text{max}}\) is chosen to result in 40% of the responses being expected in the highest answer category.

The resulting continuous data matrix is then discretized using equispaced breakpoints \(-K/2 + 1, -K/2 + 2, \dots, K/2 -2, K/2 -1\). To simulate reverse-keyed items, which are commonly used in practice to help detect response inattention, half the items of each construct are reversed by reassigning answer category 1 to \(K\), 2 to \(K-1\), and so on. Note that this implies that the data contains both left-skewed and right-skewed items. As is common in the behavioral sciences, we randomize the order of the items in the survey (with the same order being used for all participants), resulting in the final rating-scale data matrix \(\boldsymbol{R}\).
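The discretization and reverse-keying steps described above can be sketched as follows (illustrative Python; function names are ours):

```python
import numpy as np

def discretize(Z, K):
    """Discretize continuous scores into categories {1, ..., K} using
    the equispaced breakpoints -K/2 + 1, ..., K/2 - 1."""
    breaks = np.arange(-K / 2 + 1, K / 2)  # the K - 1 breakpoints
    return np.digitize(Z, breaks) + 1

def reverse_key(R, items, K):
    """Reverse-key the given item columns: category 1 -> K, 2 -> K - 1, ..."""
    R = R.copy()
    R[:, items] = K + 1 - R[:, items]
    return R

K = 5
Z = np.array([[-2.3, -1.0, 0.1, 1.0, 2.5]])
R = discretize(Z, K)
print(R)                       # [[1 2 3 4 5]]
print(reverse_key(R, [0], K))  # [[5 2 3 4 5]]
```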

7.2.2 Missing values and careless responding↩︎

While online data collection tools can be set up to require participants to respond to individual questions in order to progress in the survey, participants may abandon the survey altogether. We therefore inject missing values into the data matrix \(\boldsymbol{R}\) by simulating respondents who stop responding at some point in the survey. We set the proportion of respondents who abandon the survey to \(\gamma \in \{0.2, 0.4, 0.6\}\). For each of those respondents, we randomly select the item at which they stop responding, with all responses from this item onwards being replaced by missing values. We emphasize that \(\gamma\) does not denote the proportion of missing cells in \(\boldsymbol{R}\), but the proportion of rows that contain missing values.
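This abandonment mechanism can be sketched as follows (illustrative Python; names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def inject_abandonment(R, gamma):
    """A proportion `gamma` of respondents stop at a randomly chosen
    item; their responses from that item onward become missing (NaN)."""
    R = R.astype(float).copy()
    n, p = R.shape
    dropouts = rng.choice(n, size=int(gamma * n), replace=False)
    for i in dropouts:
        stop = rng.integers(1, p)  # always answer at least the first item
        R[i, stop:] = np.nan
    return R

R_miss = inject_abandonment(np.ones((300, 40)), gamma=0.2)
rows_with_na = int(np.isnan(R_miss).any(axis=1).sum())
print(rows_with_na)  # 60
```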

To simulate careless respondents, a proportion \(\varepsilon \in \{0, 0.1, 0.2\}\) of the rows in \(\boldsymbol{R}\) are replaced with careless respondents who randomly select from the two extreme answer categories \(\{1, K\}\). Although this is a common careless response style [54], [73], it may not be realistic that all careless respondents behave in this way. We nevertheless chose this response style because the high leverage of such responses should be particularly influential for matrix completion methods. Note that our simulation allows for the situation that a careless respondent abandons the survey, such that a certain row may contain both careless responses and missing values.
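A sketch of this contamination step (illustrative Python; names are ours; in the actual simulation a careless row may additionally contain missing values due to abandonment):

```python
import numpy as np

rng = np.random.default_rng(3)

def inject_careless(R, eps, K):
    """Replace a proportion `eps` of rows with careless respondents who
    pick randomly from the two extreme answer categories {1, K}."""
    R = R.copy()
    n, p = R.shape
    careless = rng.choice(n, size=int(eps * n), replace=False)
    R[careless] = rng.choice([1, K], size=(careless.size, p))
    return R, careless

R_c, idx = inject_careless(np.full((300, 40), 3), eps=0.1, K=5)
print(len(idx))                               # 30
print(bool(np.isin(R_c[idx], [1, 5]).all()))  # True
```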

7.2.3 Evaluation criteria↩︎

We compare the methods by computing the mean absolute error (MAE) over the predictions of missing cells as in (5), but excluding missing cells in rows corresponding to careless respondents. The reason for excluding the latter is that, in a practical setting, careless respondents should be handled by detection and removal, or better, by applying robust methods that downweight such outlying observations [54].
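This evaluation criterion can be sketched as follows (illustrative Python; names are ours):

```python
import numpy as np

def mae_missing_cells(R_true, R_pred, miss_mask, careless_rows):
    """MAE over predicted missing cells, excluding rows that correspond
    to careless respondents."""
    keep = np.ones(R_true.shape[0], dtype=bool)
    keep[careless_rows] = False
    cells = miss_mask & keep[:, None]
    return float(np.mean(np.abs(R_true[cells] - R_pred[cells])))

# Toy check: two missing cells in the clean row, off by 1 and 0.
R_true = np.array([[3, 4], [1, 5]])
R_pred = np.array([[2, 4], [5, 1]])
miss = np.array([[True, True], [True, True]])
print(mae_missing_cells(R_true, R_pred, miss, careless_rows=[1]))  # 0.5
```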

7.2.4 Results↩︎

We focus on the results in Figures 13 and 14 for the settings with 20% of respondents who abandon the survey, as the results for 40% and 60% are similar (see Figures 15–18).

Box plots for the MAE in the simulated survey with \(p=40\) items are shown in Figure 13. In general, the differences between the methods are smaller than in the simulations for recommender systems. On the other hand, we now see more pronounced differences among the RDMC variants with different loss functions. Most notably, RDMC with the pseudo-Huber loss remains close to SI-discretized in the absence of careless responding, while RDMC with the truncated absolute loss yields the best results for 20% careless respondents. In addition, the robustness benefits of RDMC over SI become more apparent with a higher number of answer categories. For instance, with 10% careless respondents, RDMC and SI-discretized perform similarly for five answer categories, whereas all three variants of RDMC outperform SI-discretized in the case of seven or nine answer categories.

Figure 13: Results for the simulated survey with p = 40 variables and 20% of respondents who abandon the survey. The rows correspond to different numbers of answer categories and the columns to different proportions of careless respondents.

Figure 14 contains the corresponding box plots of the MAE when the number of items is increased to \(p=80\). We now observe bigger differences between the methods. For all numbers of answer categories, the MAE of RDMC remains between that of the two SI approaches in the case of no careless responding, while RDMC yields better performance in the presence of careless respondents. As before, the pseudo-Huber loss is preferable in the absence of careless responding (with an MAE close to that of SI-discretized), while the truncated absolute loss is the most robust for a larger prevalence of careless respondents.

Figure 14: Results for the simulated survey with p = 80 variables and 20% of respondents who abandon the survey. The rows correspond to different numbers of answer categories and the columns to different proportions of careless respondents.

Finally, we included variants of RDMC with the three loss functions for which we perform at most 10 iterations. Here, the MAE is at best similar to, but in most cases slightly higher than, that obtained when iterating until convergence. For simplicity, we thus omit these variants from the figures.

In summary, the choice of loss function plays a more important role in a survey context than in our simulations of recommender systems (see Section 3). The pseudo-Huber loss is recommended if one expects relatively few careless respondents, whereas the truncated absolute loss is recommended if one expects a sizable number of them.

Figure 15: Results for the simulated survey with p = 40 variables and 40% of respondents who abandon the survey. The rows correspond to different numbers of answer categories and the columns to different proportions of careless respondents.
Figure 16: Results for the simulated survey with p = 80 variables and 40% of respondents who abandon the survey. The rows correspond to different numbers of answer categories and the columns to different proportions of careless respondents.
Figure 17: Results for the simulated survey with p = 40 variables and 60% of respondents who abandon the survey. The rows correspond to different numbers of answer categories and the columns to different proportions of careless respondents.
Figure 18: Results for the simulated survey with p = 80 variables and 60% of respondents who abandon the survey. The rows correspond to different numbers of answer categories and the columns to different proportions of careless respondents.

References↩︎

[1]
LeBlanc, P. M., Banks, D., Fu, L., Li, M., Tang, Z., & Wu, Q. (2024). Recommender systems: A review. Journal of the American Statistical Association, 119(545), 773–785. https://doi.org/10.1080/01621459.2023.2279695.
[2]
Van Roy, B. & Yan, X. (2010). Manipulation robustness of collaborative filtering. Management Science, 56(11), 1911–1929. https://doi.org/10.1287/mnsc.1100.1232.
[3]
Gunes, I., Kaleli, C., Bilge, A., & Polat, H. (2014). Shilling attacks against recommender systems: A comprehensive survey. Artificial Intelligence Review, 42(4), 767–799. https://doi.org/10.1007/s10462-012-9364-9.
[4]
Si, M. & Li, Q. (2020). Shilling attacks against collaborative recommender systems: A review. Artificial Intelligence Review, 53(1), 291–319. https://doi.org/10.1007/s10462-018-9655-x.
[5]
Lam, S. K. & Riedl, J. (2004). Shilling recommender systems for fun and profit. Proceedings of the 13th International Conference on World Wide Web, 393–402. https://doi.org/10.1145/988672.988726.
[6]
Adamopoulos, P. (2024). The spillover effect of fraudulent reviews on product recommendations. Management Science, 70(12), 8818–8832. https://doi.org/10.1287/mnsc.2021.02612.
[7]
Federal Trade Commission (2019). FTC brings first case challenging fake paid reviews on an independent retail website. https://www.ftc.gov/news-events/news/press-releases/2019/02/ftc-brings-first-case-challenging-fake-paid-reviews-independent-retail-website. Accessed May 26, 2025.
[8]
Competition and Markets Authority (2021). CMA to investigate Amazon and Google over fake reviews. https://www.gov.uk/government/news/cma-to-investigate-amazon-and-google-over-fake-reviews. Accessed May 26, 2025.
[9]
European Commission (2022). Digital services act: Commission welcomes political agreement on rules ensuring a safe and accountable online environment. https://ec.europa.eu/commission/presscorner/detail/en/ip_22_2545. Accessed May 26, 2025.
[10]
Federal Trade Commission (2024). Trade regulation rule on the use of consumer reviews and testimonials - 16 CFR part 465 - RIN 3084–AB76. Federal Register, 89(163), 68034–68079. https://www.federalregister.gov/documents/2024/08/22/2024-18519/trade-regulation-rule-on-the-use-of-consumer-reviews-and-testimonials.
[11]
Bennett, J. & Lanning, S. (2007). The Netflix prize. Proceedings of KDD Cup and Workshop.
[12]
Achlioptas, D. & McSherry, F. (2001). Fast computation of low rank matrix approximations. STOC ’01: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, 611–618. https://doi.org/10.1145/380752.380858.
[13]
Srebro, N., Rennie, J., & Jaakkola, T. (2004). Maximum-margin matrix factorization. Advances in Neural Information Processing Systems, volume 17.
[14]
Rennie, J. D. & Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, 713–719. https://doi.org/10.1145/1102351.1102441.
[15]
Candès, E. & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772. https://doi.org/10.1007/s10208-009-9045-5.
[16]
Candès, E. J. & Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5), 2053–2080. https://doi.org/10.1109/TIT.2010.2044061.
[17]
Davenport, M. A. & Romberg, J. (2016). An overview of low-rank matrix recovery from incomplete observations. IEEE Journal of Selected Topics in Signal Processing, 10(4), 608–622. https://doi.org/10.1109/JSTSP.2016.2539100.
[18]
Nguyen, L. T., Kim, J., & Shim, B. (2019). Low-rank matrix completion: A contemporary survey. IEEE Access, 7, 94215–94237. https://doi.org/10.1109/ACCESS.2019.2928130.
[19]
Srebro, N. & Jaakkola, T. (2003). Weighted low-rank approximations. ICML ’03: Proceedings of the 20th International Conference on Machine Learning, 720–727.
[20]
Schiavon, L., Nipoti, B., & Canale, A. (2024). Accelerated structured matrix factorization. Journal of Computational and Graphical Statistics, 33(3), 917–927. https://doi.org/10.1080/10618600.2023.2301072.
[21]
Mazumder, R., Hastie, T., & Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11(80), 2287–2322.
[22]
Cai, J.-F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982. https://doi.org/10.1137/080738970.
[23]
Chatterjee, S. (2015). Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1), 177–214. https://doi.org/10.1214/14-AOS1272.
[24]
Lei, Y. & Zhou, D.-X. (2019). Analysis of singular value thresholding algorithm for matrix completion. Journal of Fourier Analysis and Applications, 25(6), 2957–2972. https://doi.org/10.1007/s00041-019-09688-8.
[25]
Mohan, K. & Fazel, M. (2012). Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research, 13(110), 3441–3473.
[26]
Hastie, T., Mazumder, R., Lee, J. D., & Zadeh, R. (2015). Matrix completion and low-rank SVD via fast alternating least squares. Journal of Machine Learning Research, 16(104), 3367–3402.
[27]
Cho, J., Kim, D., & Rohe, K. (2019). Intelligent initialization and adaptive thresholding for iterative matrix completion: Some statistical and algorithmic theory for adaptive-impute. Journal of Computational and Graphical Statistics, 28(2), 323–333. https://doi.org/10.1080/10618600.2018.1518238.
[28]
Chen, Z., Yang, Y., & Yao, F. (2024). Dynamic matrix recovery. Journal of the American Statistical Association, 119(548), 2996–3007. https://doi.org/10.1080/01621459.2023.2297468.
[29]
Huang, J., Nie, F., & Huang, H. (2013). Robust discrete matrix completion. Proceedings of the AAAI Conference on Artificial Intelligence, volume 27, 424–430. https://doi.org/10.1609/aaai.v27i1.8675.
[30]
Huo, Z., Liu, J., & Huang, H. (2016). Optimal discrete matrix completion. Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 1687–1693. https://doi.org/10.1609/aaai.v30i1.10300.
[31]
Nguyen, D. M., Tsiligianni, E., & Deligiannis, N. (2018). Learning discrete matrix factorization models. IEEE Signal Processing Letters, 25(5), 720–724. https://doi.org/10.1109/LSP.2018.2823268.
[32]
Tatsukawa, M. & Tanaka, M. (2018). Box constrained low-rank matrix approximation with missing values. Proceedings of the 7th International Conference on Operations Research and Enterprise Systems - ICORES, 78–84. https://doi.org/10.5220/0006612100780084.
[33]
Iimori, H., de Abreu, G. T. F., Taghizadeh, O., & Ishibashi, K. (2020). Discrete-aware matrix completion via proximal gradient. 2020 54th Asilomar Conference on Signals, Systems, and Computers, 1327–1332. https://doi.org/10.1109/IEEECONF51394.2020.9443404.
[34]
Bertsimas, D. & Li, M. L. (2023). Interpretable matrix completion: A discrete optimization approach. INFORMS Journal on Computing, 35(5), 952–965. https://doi.org/10.1287/ijoc.2022.0022.
[35]
Bond, T. N. & Lang, K. (2019). The sad truth about happiness scales. Journal of Political Economy, 127(4), 1629–1640. https://doi.org/10.1086/701679.
[36]
Avella Medina, M. & Ronchetti, E. (2015). Robust statistics: A selective overview and new directions. WIREs Computational Statistics, 7(6), 372–393. https://doi.org/10.1002/wics.1363.
[37]
Lee, J.-S. & Zhu, D. (2012). Shilling attack detection—a new approach for a trustworthy recommender system. INFORMS Journal on Computing, 24(1), 117–131. https://doi.org/10.1287/ijoc.1100.0440.
[38]
Chen, Y., Jalali, A., Sanghavi, S., & Caramanis, C. (2013). Low-rank matrix recovery from errors and erasures. IEEE Transactions on Information Theory, 59(7), 4324–4337. https://doi.org/10.1109/TIT.2013.2249572.
[39]
Nie, F., Wang, H., Huang, H., & Ding, C. (2015). Joint Schatten \(p\)-norm and \(\ell_p\)-norm robust matrix completion for missing value recovery. Knowledge and Information Systems, 42(3), 525–544. https://doi.org/10.1007/s10115-013-0713-z.
[40]
Cambier, L. & Absil, P.-A. (2016). Robust low-rank matrix completion by Riemannian optimization. SIAM Journal on Scientific Computing, 38(5), S440–S460. https://doi.org/10.1137/15M1025153.
[41]
Elsener, A. & van de Geer, S. (2018). Robust low-rank matrix estimation. The Annals of Statistics, 46(6B), 3481–3509. https://doi.org/10.1214/17-AOS1666.
[42]
Shang, P. & Kong, L. (2021). Regularization parameter selection for the low rank matrix recovery. Journal of Optimization Theory and Applications, 189(3), 772–792. https://doi.org/10.1007/s10957-021-01852-9.
[43]
Zhao, L., Babu, P., & Palomar, D. P. (2016). Efficient algorithms on robust low-rank matrix completion against outliers. IEEE Transactions on Signal Processing, 64(18), 4767–4780. https://doi.org/10.1109/TSP.2016.2572049.
[44]
Zhang, F., Lu, Y., Chen, J., Liu, S., & Ling, Z. (2017). Robust collaborative filtering based on non-negative matrix factorization and \(R_1\)-norm. Knowledge-Based Systems, 118, 177–190. https://doi.org/10.1016/j.knosys.2016.11.021.
[45]
Ruppel, F., Muma, M., & Zoubir, A. M. (2020). Globally optimal robust matrix completion based on M-estimation. 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. https://doi.org/10.1109/MLSP49062.2020.9231573.
[46]
Tang, L. & Guan, W. (2020). Robust matrix completion with complex noise. Multimedia Tools and Applications, 79(3–4), 2703–2717. https://doi.org/10.1007/s11042-019-08430-2.
[47]
Wang, B. & Fan, J. (2025). Robust matrix completion with heavy-tailed noise. Journal of the American Statistical Association, 120(550), 922–934. https://doi.org/10.1080/01621459.2024.2375037.
[48]
Mao, X., Chen, S. X., & Wong, R. K. (2019). Matrix completion with covariate information. Journal of the American Statistical Association, 114(525), 198–210. https://doi.org/10.1080/01621459.2017.1389740.
[49]
Choi, J. & Yuan, M. (2024). Matrix completion when missing is not at random and its applications in causal panel data models. Journal of the American Statistical Association. Forthcoming. https://doi.org/10.1080/01621459.2024.2380105.
[50]
Xu, Y., Zhuang, F., Wang, E., Li, C., & Wu, J. (2025). Learning without missing-at-random prior propensity: A generative approach for recommender systems. IEEE Transactions on Knowledge and Data Engineering, 37(2), 754–765. https://doi.org/10.1109/TKDE.2024.3490593.
[51]
Alfons, A. & Archimbaud, A. (2025). RMCLab: Lab for Matrix Completion and Imputation of Discrete Rating Data. R package version 0.1.0. https://CRAN.R-project.org/package=RMCLab.
[52]
R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
[53]
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122. https://doi.org/10.1561/2200000016.
[54]
Alfons, A. & Welz, M. (2024). Open science perspectives on machine learning for the identification of careless responding: A new hope or phantom menace? Social and Personality Psychology Compass, 18(2), e12941. https://doi.org/10.1111/spc3.12941.
[55]
Welz, M. & Alfons, A. (2025). When respondents don’t care anymore: Identifying the onset of careless responding. arXiv:2303.07167. https://doi.org/10.48550/arXiv.2303.07167.
[56]
Harper, F. M. & Konstan, J. A. (2015). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), 19. https://doi.org/10.1145/2827872.
[57]
Mobasher, B., Burke, R., Bhaumik, R., & Williams, C. (2007). Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Transactions on Internet Technology (TOIT), 7(4), 23–es. https://doi.org/10.1145/1278366.1278372.
[58]
Turk, A. M. & Bilge, A. (2019). Robustness analysis of multi-criteria collaborative filtering algorithms against shilling attacks. Expert Systems with Applications, 115, 386–402. https://doi.org/10.1016/j.eswa.2018.08.001.
[59]
Mazumder, R., Saldana, D., & Weng, H. (2020). Matrix completion with nonconvex regularization: Spectral operators and scalable algorithms. Statistics and Computing, 30(4), 1113–1138. https://doi.org/10.1007/s11222-020-09939-5.
[60]
Peng, J., Hahn, J., & Huang, K.-W. (2023). Handling missing values in information systems research: A review of methods and assumptions. Information Systems Research, 34(1), 5–26. https://doi.org/10.1287/isre.2022.1104.
[61]
Zhang, S. & Wang, M. (2019). Correction of corrupted columns through fast robust hankel matrix completion. IEEE Transactions on Signal Processing, 67(10), 2580–2594. https://doi.org/10.1109/TSP.2019.2904021.
[62]
Athey, S., Bayati, M., Doudchenko, N., Imbens, G., & Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, 116(536), 1716–1730. https://doi.org/10.1080/01621459.2021.1891924.
[63]
Bai, J. & Ng, S. (2021). Matrix completion, counterfactuals, and factor analysis of missing data. Journal of the American Statistical Association, 116(536), 1746–1763. https://doi.org/10.1080/01621459.2021.1967163.
[64]
Arias, V. B., Garrido, L., Jenaro, C., Martinez-Molina, A., & Arias, B. (2020). A little garbage in, lots of garbage out: Assessing the impact of careless responding in personality survey data. Behavior Research Methods, 52(6), 2489–2505. https://doi.org/10.3758/s13428-020-01401-8.
[65]
Ward, M. & Meade, A. W. (2023). Dealing with careless responding in survey data: Prevention, identification, and recommended best practices. Annual Review of Psychology, 74(1), 577–596. https://doi.org/10.1146/annurev-psych-040422-045007.
[66]
Storozuk, A., Ashley, M., Delage, V., & Maloney, E. A. (2020). Got bots? Practical recommendations to protect online survey data from bot attacks. The Quantitative Methods for Psychology, 16(5), 472–481. https://doi.org/10.20982/tqmp.16.5.p472.
[67]
Credé, M. (2010). Random responding as a threat to the validity of effect size estimates in correlational research. Educational and Psychological Measurement, 70(4), 596–612. https://doi.org/10.1177/0013164410366686.
[68]
Welz, M., Archimbaud, A., & Alfons, A. (2024a). How much carelessness is too much? Quantifying the impact of careless responding. PsyArXiv. https://doi.org/10.31234/osf.io/8fj6p.
[69]
Welz, M., Mair, P., & Alfons, A. (2024b). Robust estimation of polychoric correlation. arXiv:2407.18835. https://doi.org/10.48550/arXiv.2407.18835.
[70]
Little, R. J. & Rubin, D. B. (2019). Statistical Analysis with Missing Data. John Wiley & Sons.
[71]
Becker, T. E., Robertson, M. M., & Vandenberg, R. J. (2019). Nonlinear transformations in organizational research: Possible problems and potential solutions. Organizational Research Methods, 22(4), 831–866. https://doi.org/10.1177/1094428118775205.
[72]
Alfons, A., Ateş, N., & Groenen, P. (2022). A robust bootstrap test for mediation analysis. Organizational Research Methods, 25(3), 591–617. https://doi.org/10.1177/1094428121999096.
[73]
Baumgartner, H. & Steenkamp, J.-B. E. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38(2), 143–156. https://doi.org/10.1509/jmkr.38.2.143.18840.

  1. Parts of this research were done while Aurore Archimbaud was at Erasmus University Rotterdam.↩︎

  2. These choices are motivated by the MovieLens 100K data [56] used in Section 4.1, in which the most complete item contains \(38.2\%\) missing values and the most incomplete item contains more than \(99\%\) missing values.↩︎