December 10, 2024
The interplay between cognition and gaming, notably through educational games that enhance cognitive skills, has garnered significant attention in recent years. This research introduces the CogSimulator, a novel algorithm for simulating user cognition in small-group settings with minimal data, exemplified by the educational game Wordle. The CogSimulator employs the Wasserstein-1 distance and coordinate search optimization for hyperparameter tuning, enabling precise few-shot predictions in new game scenarios.
Comparative experiments with the Wordle dataset illustrate that our model surpasses most conventional machine learning models in mean Wasserstein-1 distance, mean squared error, and mean accuracy, showcasing its efficacy in cognitive enhancement through
tailored game design.
Keywords: Artificial Intelligence, Education, Group Behaviour, Skill acquisition and learning, Computational Modeling.
In recent years, the application of artificial intelligence has expanded across various fields such as music [1], gaming [2], and healthcare [3],
demonstrating its broad impact and potential. Concurrently, the relationship between cognition and games has become a hot topic, particularly in research focused on using games to assess cognitive abilities and explore whether games can enhance cognitive
skills [4]. At the same time, the relationship between cognitive level and education has also drawn attention: cognitive ability is considered a crucial predictor of educational and socioeconomic achievement, especially regarding strategies and methods in student learning, individual differences, and related factors [5], [6].
In terms of using games to help measure cognitive ability, games can test different aspects of cognitive function, including memory and attention, which can assist in diagnosing disease [7]. Games are also thought to be more effective in assessing cognitive abilities, and can even improve fairness and user experience [8]. Games are therefore closely linked to cognitive ability, and given the importance of cognitive ability in education, making better use of games to
contribute to the development of cognitive ability is a significant agenda. However, given the heterogeneous attributes of participants, a fixed game algorithm may not enhance the cognitive abilities of all individuals consistently [9]. Conventional approaches, encompassing behavior trees [10] and data-driven methodologies [11], attempt to tailor game modes to diverse cognitive profiles of users yet
necessitate substantial data acquisition. Technologies related to artificial intelligence, including transfer learning and convolutional neural networks (CNNs), are also challenged by overfitting, which diminishes the precision of predictions of user cognitive patterns [12].
This work develops the CogSimulator model, which is intended to capture and simulate the user’s cognitive level from a small amount of data. Further, by using less data to simulate user groups, designers can implement a more targeted educational game for specific users. This paper uses the word game “Wordle” to demonstrate how to evaluate and improve cognitive ability with a limited number of users, making the game a more acceptable form of learning for teenagers [13]. It is important to highlight that, owing to the CogSimulator’s efficient use of data, encompassing common word-frequency metrics pervasive across numerous
word-based games, it is well-suited for application to newly developed games or those aimed at niche user groups. By simulating individual player behaviors, the model offers game designers a valuable tool for tailoring game difficulty to match specific
cognitive profiles.
Cognition includes the mental processes of acquiring knowledge and understanding through thought, experience, and the senses and plays a fundamental role in human development and interaction with the environment [14]. This area, crucial for critical thinking, problem-solving skills, and the effective processing and interpretation of information, has attracted
increasing attention in contemporary research, particularly at the intersection of cognition and play [15], [16]. Research on educational games dates back to 1981, when Malone investigated in his seminal paper how to use the captivating effects of computer games to make learning fun and
interesting [17]. In the mid-1980s, research examined the link between video game play and cognitive performance, and this correlation was
gradually confirmed [18]. However, even using AI-based game-based educational technologies, such as AI applications for learning new skills or
knowledge, may only improve learners’ cognition if they motivate long-term use [19]. Therefore, the challenge is producing a game
design that suits the needs and preferences of the players to ensure the gamers’ cognitive enhancement.
Many academic studies have shown that positive motivation is crucial in enhancing human engagement, which in turn helps promote cognitive improvement [20]–[22]. In this context, participation motivation can be broken down into two categories: intrinsic motivation (stemming from the game’s intrinsic design elements) and extrinsic motivation (stemming from the game’s reward and punishment system) [19]. Although intrinsically motivated engagement is remarkably persistent, intrinsic motivation depends on various factors, including player type, specific educational needs, and personal interests [9]. Therefore, if the game mode cannot adapt to different individuals, the degree of cognitive enhancement may vary widely between
individuals. This variability poses a significant challenge to developing fixed game mechanics that can universally meet the diverse needs of all players [9].
Traditional approaches to creating adaptive and responsive gaming environments, designed to cater to players’ individual needs, have primarily relied on rule-based systems such as finite state machines and behavior trees [23]. These systems provide an easy way to build simple agents that provide different feedback based on user actions [10]. However, while this approach effectively creates a baseline interactive experience, it cannot dynamically adapt to individual players’ subtle and changing
preferences or capabilities. An alternative solution lies in data-driven approaches [11]. This approach entails collecting extensive gameplay data,
such as average completion times, to assess game difficulty and subsequently recommend games of varying difficulty levels to different players. However, this approach relies on substantial data accumulation for accurate predictions, making it challenging
to apply effectively in scenarios requiring rapid adaptation to new tasks. As for artificial intelligence technology, most gamification research from 2010 to 2020 failed to provide a structured overview of game elements such as NPCs powered by artificial intelligence [24]. Not until 2023 did Generative Agents demonstrate the simulation of human behavior based on large language models (LLMs), even in zero-shot scenarios [23]. However, the need for extensive resources poses a
significant obstacle, especially for small educational games in their embryonic stages [25]. For these emerging games, the
high threshold of data and computing requirements makes it challenging to fully exploit the potential of AI-driven interactive elements. Likewise, models such as CNNs may overfit when data volume is insufficient. Therefore, while advances in artificial intelligence technology offer promising avenues for enhancing the realism and engagement of game environments, their applicability remains limited in resource-constrained settings.
This work will elucidate and test the model using Wordle as an illustrative case. Wordle is a word-guessing game that epitomizes its players’ diverse needs and individual characteristics, reflecting the unique responses and strategies each person brings to
the game. The goal of Wordle is to guess a five-letter word within six attempts. After each guess, each letter is colored as feedback: green (the correct letter in the right spot), yellow (a correct letter in the wrong position), or grey (a letter not in the word) [26]. Unaided players guess words mainly through word recall, and are largely limited by their vocabulary. Consequently, the CogSimulator was developed to
emulate user cognition utilizing a limited dataset of game records, thereby facilitating the design of game difficulties optimized for cognitive enhancement.
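Under the rules just described, the per-letter feedback can be computed as follows. This is an illustrative sketch of Wordle’s standard coloring convention (greens assigned first, then yellows capped by the remaining letter counts), not the paper’s own code:

```python
from collections import Counter

def wordle_feedback(guess: str, target: str) -> str:
    """Return per-letter feedback: G (green), Y (yellow), _ (grey)."""
    feedback = ["_"] * 5
    # Target letters not matched exactly; greens consume letters first.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = "G"
        elif remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)

print(wordle_feedback("crane", "train"))  # prints "_GGY_"
```

The two-pass structure matters for duplicate letters: a repeated letter in the guess earns at most as many yellows as unmatched copies in the target.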
Traditional machine learning models often face severe overfitting due to insufficient sample sizes, limiting their effectiveness. In contrast, our model excels in explaining word difficulty and result distribution, offering a more robust solution, as detailed in Table [tab:my_label]. Attributes such as word frequency, prevalent across various word-based games, provide a solid foundation for applying our model to different gaming scenarios. To address these challenges, we have developed a novel sampling simulator that better reflects the gameplay dynamics of the broader population, as illustrated in Figure 1.
The CogSimulator operates through a process analogous to Markov chain Monte Carlo (MCMC) as described by Geyer [27], where parameters progressively converge to a steady-state distribution and achieve detailed balance. The model incorporates hyperparameters that capture players’ cognitive processes at each stage of their guessing attempts, alongside the stochastic variations observed in each trial. Unlike traditional optimization methods that require derivatives of the cost function, our approach uses coordinate search optimization [28], which optimizes parameters coordinate-wise within the hyperparameter space at each iteration, eventually allowing the model to align with the actual trial distributions observed in the data. To measure the deviation
between our training set and the outputs generated by the simulation, we employ the Wasserstein-1 distance [29], whose convex properties aid in
steering the algorithm towards the global optimum. This setup ensures our model accurately estimates word difficulty and adeptly classifies the distribution of players’ guessing attempts. Once optimized, the sampling simulator effectively replicates
samples that reflect the average performance of the simulated player population.
Further refining our approach, we estimate the difficulty of newly introduced words by comparing their Wasserstein distance with that of the most straightforward word identified in the training dataset. This comparison uses the generated Wasserstein Metric
to provide a relative measure of word complexity. Consequently, when a new word is introduced to the model, it predicts not only the difficulty of the word but also the expected number of guesses by players. This prediction is based on the collective
cognitive profile of the player group derived from the training data, thereby aiding in selecting words that are optimally challenging yet engaging for players.
As the core component of CogSimulator, the simulator uses a coordinate search algorithm to tune hyperparameters automatically to match human performance. The simulator takes a single word as input and generates a distribution of the number of guesses users required, containing the percentage of attempts for each of the seven trial types. Other factors are incorporated into the neural network alongside the word-guess frequency distribution. This integration occurs through 5-fold cross-validation to ensure a better fit to various real-world factors, thereby yielding the most accurate predictions. To determine whether the player selects a specific word at a step, given a qualified dictionary from which to choose, Algorithms 1 and 2 output the words sampled in a round.
In simulating Wordle player choices, the CogSimulator posits that players are more inclined to guess words they encounter more frequently. However, the human capacity to recall words is finite, and the probability of recalling a specific word is not strictly proportional to its word frequency. Thus, we introduce two parameters to adjust for this. The first parameter, \(K\), represents the cognitive limit, or the maximum number of words a person can typically remember at one time. The simulator only considers up to \(K\) most frequent words as viable options for player guesses. The second parameter, \(T\), represents a scaling factor for the frequency of the most common word, providing a baseline for comparison. To calculate the selection probability for each word, we multiply its frequency by \(T\) and then normalize by dividing over the sum of the scaled frequencies for all \(K\) words considered. This method ensures that while the likelihood of selecting highly frequent words is amplified, less common words maintain a non-zero probability of selection proportional to their relative frequency. This nuanced approach balances the natural human tendency to favor familiar words and the game’s challenge to recall less frequent words.
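The selection rule above can be sketched as follows. This is an illustrative reading rather than the paper’s exact code: it treats \(T\) as a temperature-like exponent on relative frequency, since a uniform scale factor would cancel under normalization; that interpretation is our assumption.

```python
import numpy as np

def selection_probs(freqs: np.ndarray, K: int, T: float) -> np.ndarray:
    """Selection probabilities over the K most frequent candidate words.

    freqs holds corpus frequencies of the candidate words. K caps how many
    words a player can recall at once; T scales frequencies relative to the
    most common word (interpreted here as an exponent -- our assumption)."""
    order = np.argsort(freqs)[::-1][:K]                 # K most frequent words
    scaled = (freqs[order] / freqs[order].max()) ** T   # relative, T-scaled
    probs = np.zeros_like(freqs, dtype=float)
    probs[order] = scaled / scaled.sum()                # normalize over the K words
    return probs
```

With this rule, frequent words dominate but every recallable word retains a non-zero probability, matching the balance described above.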
On each trial, the simulator randomly samples words based on their probability of selection [30]. After 1000 samples, a trial distribution of words is generated. At the same time, a coordinate search algorithm automatically adjusts the hyperparameters to improve the simulator’s fitting accuracy. The simulator then produces a probability distribution of word guesses containing the percentages for the seven categories (1, 2, 3, 4, 5, 6, X). Finally, the model can predict the difficulty of a word for users of a specific cognitive profile through the distribution and the Wasserstein metric.
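The sampling loop can be sketched end-to-end as a self-contained toy. The five-word dictionary and its frequencies are made up, and the rule that players keep only candidates consistent with past feedback is our reading of the “qualified dictionary” above:

```python
import random
from collections import Counter

def feedback(guess: str, target: str) -> str:
    """Wordle coloring: G = right spot, Y = wrong spot, _ = absent."""
    fb = ["_"] * len(guess)
    left = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            fb[i] = "G"
        elif left[g] > 0:
            fb[i] = "Y"
            left[g] -= 1
    return "".join(fb)

def trial_distribution(target, freqs, K=4, n_samples=1000, seed=0):
    """Monte Carlo estimate of the 7-bin trial distribution (guesses 1-6, 'X').

    Players recall only the K most frequent words and pick among candidates
    still consistent with past feedback, weighted by corpus frequency."""
    rng = random.Random(seed)
    recallable = sorted(freqs, key=freqs.get, reverse=True)[:K]
    bins = Counter()
    for _ in range(n_samples):
        candidates = list(recallable)
        for attempt in range(1, 7):
            guess = rng.choices(candidates, weights=[freqs[w] for w in candidates])[0]
            if guess == target:
                bins[str(attempt)] += 1
                break
            fb = feedback(guess, target)
            # keep only words that would have produced the same coloring
            candidates = [w for w in candidates if feedback(guess, w) == fb]
        else:
            bins["X"] += 1  # failed all six attempts
    return {b: bins[b] / n_samples for b in ["1", "2", "3", "4", "5", "6", "X"]}

# Hypothetical mini-dictionary with made-up relative frequencies.
freqs = {"train": 3.0, "crane": 5.0, "slate": 4.0, "brain": 2.0, "grain": 1.0}
dist = trial_distribution("train", freqs)
```

The resulting `dist` is exactly the seven-category object the simulator fits against the observed Wordle records.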
To test the difficulty of a word for a particular cognitive group, we first obtain the distribution of word-guessing attempts from this cognitive group’s best record in past games. Then, by comparing the distance between the distribution of a new input word and the distribution of the best record, the difficulty of the word for this cognitive group is judged. For this purpose, we propose using the Wasserstein-1 distance to evaluate the discrepancy between two trial distributions. Let \(p, q\) be two probability distributions on compact spaces. Denote \(\Pi(p, q)\) as the set of all distributions \(\pi(\omega, \omega')\) on \(\mathcal{X} \times \mathcal{X}'\) whose marginals are \(p(\omega)\) and \(q(\omega')\), respectively. Then the Wasserstein-1 distance between \(p\) and \(q\) is \[\begin{align} \label{W1} W_1(p, q)&=\inf _{\pi \in \Pi(p, q)}\int_{\mathcal{X} \times \mathcal{X}'}\|\omega-\omega'\| d\pi(\omega, \omega')\\&=\inf _{\pi \in \Pi(p, q)} \underset{(\omega, \omega') \sim \pi}{\mathbb{E}}\left[\|\omega-\omega'\|\right]. \end{align}\tag{1}\]
When (1) is applied to discrete sample spaces, assume \(\mathcal{X}=\left\{\omega_i\right\}_{i=1}^m\) and \(\mathcal{X}'=\left\{\omega_i^{\prime}\right\}_{i=1}^{m'}\), where \(p\) and \(q\) are trial distributions of words \(A\) and \(A^*\), respectively. The distance between \(p\) and \(q\) can be obtained by solving the following linear programming problem
\[\begin{align} \label{LP} \mathcal{W}(A, A^*) &= \inf_{\{\gamma_{i, j}\}} \left\{ \sum_{i=1}^m \sum_{j=1}^{m'} \gamma_{i, j}|\omega_i-\omega_j^{\prime}|: \right. \\ &\left. \sum_{i=1}^m \gamma_{i, j}=q_j, \sum_{j=1}^{m'} \gamma_{i, j}=p_i, \gamma_{i, j} \geq 0 \right\}. \end{align}\tag{2}\]
In the special case where the supports of the trial distributions coincide, i.e. \(m=m'\) with uniform weights, it can easily be shown that the Wasserstein-1 distance has a nice closed form \(\frac{1}{m} \sum_{i=1}^m|\omega_{\eta(i)}-\omega'_{\vartheta(i)}|\), where \(\eta\) is a sorting permutation of \(\omega_i\) and \(\vartheta\) is a sorting permutation of \(\omega'_j\). The difficulty of word \(A\) is determined by calculating its Wasserstein-1 distance with respect to the trial distribution of the ‘easiest’ word in the dataset, which we assume is the word “train”. Therefore, we propose the Wasserstein-1 difficulty measure of the form
\[\label{Diff} \mathcal{W}^*(A):= \mathcal{W}(p_{\text{``train"}}, p_{A}).\tag{3}\]
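On the seven ordinal bins (treating ‘X’ as a seventh, unit-spaced support point, which is our simplifying assumption), the one-dimensional Wasserstein-1 distance reduces to the sum of absolute CDF differences. A minimal sketch with hypothetical trial distributions:

```python
import numpy as np

def w1_discrete(p: np.ndarray, q: np.ndarray) -> float:
    """Wasserstein-1 distance between two distributions on the bins 1..7
    (guesses 1-6 plus 'X'), via the 1-D identity W1 = sum |CDF_p - CDF_q|
    over unit-spaced support points."""
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

# Hypothetical trial distributions: mass shifted right means a harder word.
p_train = np.array([0.02, 0.20, 0.38, 0.27, 0.10, 0.03, 0.00])  # 'easiest' word
p_hard  = np.array([0.00, 0.03, 0.12, 0.28, 0.30, 0.20, 0.07])

difficulty = w1_discrete(p_train, p_hard)  # W*(A) relative to "train"
```

Because the support is fixed and one-dimensional, no linear program is needed at evaluation time; the LP form in (2) is only required for general supports.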
To obtain parameters to simulate the cognition of the target population, we used Coordinate Search Optimization. The algorithm is as follows:
By limiting the set of search directions to the axes of the input space, the coordinate search/descent technique is an alternative zero-order local approach that addresses the scaling issue seen in traditional local search. The idea is intuitive [31]: the search minimizes the mean W-1 discrepancy over all of its parameters simultaneously, which takes the form \[f(T,K)=\frac{1}{355}\sum_{i=1}^{355}\mathcal{W}(f_{A_i},f^*_{A_i}),\] where \(f_{A_i}\) and \(f^*_{A_i}\) are the trial distributions of target word \(A_i\) from the ground truth and the simulated result, respectively. Note that the distribution \(f^*_{A_i}\) is a realization from our Wordle simulator given hyperparameters \(K\) and \(T\). A coordinate-wise algorithm minimizes this function over one coordinate or weight at a time or, more generally, over a subset of coordinates or weights at a time while holding the other
coordinates or weights constant. Despite this limitation, these algorithms are much more versatile than random search (in fact, they can solve even medium-sized machine learning problems efficiently in practice) [32], even though they require additional steps to establish approximate minima and restrict the number of descent directions that can be discovered. These algorithms also serve as precursors to a whole line of higher-order coordinate descent techniques, just as random search does.
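A minimal sketch of zero-order coordinate search over \((T, K)\), using a toy quadratic cost in place of the real mean W-1 objective (the step sizes, halving schedule, and toy cost are our illustrative choices):

```python
def coordinate_search(f, x0, step_sizes, n_iters=50):
    """Zero-order coordinate search: probe +/- one step along each axis in
    turn, accept any improving move, and halve the steps when no axis
    improves. f is the cost (here it would be the mean W-1 discrepancy
    f(T, K)); no derivatives are required."""
    x = list(x0)
    best = f(*x)
    for _ in range(n_iters):
        improved = False
        for axis, step in enumerate(step_sizes):
            for delta in (+step, -step):
                cand = list(x)
                cand[axis] += delta
                val = f(*cand)
                if val < best:
                    x, best, improved = cand, val, True
        if not improved:
            step_sizes = [s / 2 for s in step_sizes]  # refine the search scale
    return x, best

# Toy cost standing in for f(T, K): a smooth bowl with minimum at (2, 500).
cost = lambda T, K: (T - 2.0) ** 2 + ((K - 500.0) / 100.0) ** 2
x_opt, f_opt = coordinate_search(cost, x0=[1.0, 300.0], step_sizes=[0.5, 50.0])
```

Each accepted move changes only one coordinate, which is exactly the axis-aligned restriction described above.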
The model employed in this study utilized Wordle results sourced from Wordle Stats over the full year from January 7, 2022, to December 31, 2022. This one-year span was chosen to capture longitudinal data, reflecting genuine player interactions throughout different seasons and stages of player development, thereby minimizing pre-selection bias and providing a comprehensive basis for understanding group behavior in word-guessing activities. The results include the distribution of the number of trials it took players to succeed, a critical measure of the game’s difficulty and player engagement. These details can be found at https://shorturl.at/adeO6. Additionally, to construct a robust model, we integrated a dictionary database comprising five-letter English words from the Google Books Ngram Corpus from 1970 to 2019. This corpus provided the foundational data for calculating each word’s usage frequency, a key determinant of user selection within our predictive framework. We define word frequency as the relative occurrence of a word in the corpus, which is a direct measure of its commonality and presumed familiarity to players, reflecting the likely cognitive effort required for players to guess the word correctly [33].
This algorithm uses the Wasserstein-1 distance to evaluate the difference between two trial distributions. Here, the trial distribution refers to the number of attempts the user requires to complete the task in the Wordle game. The model aims to determine the difficulty of word \(A\) relative to word \(A^*\), computed as the Wasserstein-1 distance of word \(A\)’s trial distribution relative to the overall attempt distribution. Figure 5 shows the difficulty distribution of the 355 ground-truth words. It shows that the proposed metric is sufficiently consistent, representative of the difficulty realized by the records, and mediates between other quantifiers. Clearly, ‘easier’ target words exhibit a trial distribution shifted towards the left on the x-axis, while ‘harder’ target words (colored yellow) have a right-shifted trial distribution.
To assess the efficacy of the coordinate search optimization algorithm, it is essential to consider the broader statistical profile rather than just a single winning result. This algorithm’s utility lies in its ability to iteratively explore and optimize a multi-dimensional space by adjusting one coordinate at a time. It is particularly suited to problems with a complex objective function or without an analytical gradient. The evaluation of this optimization technique is predicated on its capacity to train generators that accurately output a discrete target trial distribution. For a robust evaluation, we invoke the same number of generative realizations as in fixed-length real data batches. This approach ensures a fair comparison between
the model’s output and the empirical ground truth. Consistency with the ground truth is then assessed by visualizing the distribution of densities using the same projection of functional PCA [34], as depicted in Figure 6 (a). Further, we solidify our statistical analysis by constructing an empirical 95% confidence interval for the optimization outcomes,
with the results presented in Figure 6 (b) indicating that the sampling algorithm is robust, as evidenced by the slight variance observed. The predictions and summaries derived from this approach are systematically tabulated
for further scrutiny.
The model is benchmarked against several other machine learning algorithms that estimate complete trial distributions. Note that we feed the same attributes as input; in regression-type benchmarks, the machine learning algorithms output a \(7\)-dimensional vector of floating-point loadings, whereas in classification tasks, the benchmarks output a category. We train these machine learning algorithms under canonical parameter settings.
Representative methods, namely linear regression, decision tree regression, random forest regression, and multilayer perceptron (MLP) regression, are selected for the experiments, and the actual performance of the algorithms is compared by studying the training and validation performance among these algorithms. For the sake of fairness, hyperparameters and settings for each method are left at their defaults.
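A sketch of this benchmark setup using scikit-learn defaults; the feature and target arrays below are synthetic stand-ins for the study’s 355 words and their 7-bin trial distributions, not the real data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-ins: 3 word attributes (e.g., log frequency) per word,
# mapped to 7-bin trial distributions whose rows sum to 1.
X = rng.normal(size=(355, 3))
y_raw = rng.random(size=(355, 7))
y = y_raw / y_raw.sum(axis=1, keepdims=True)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(random_state=0),
    "mlp": MLPRegressor(random_state=0, max_iter=500),
}
mse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)  # default hyperparameters, as in the experiments
    mse[name] = float(np.mean((model.predict(X_te) - y_te) ** 2))
```

All four estimators support multi-output regression natively, so each predicts the full 7-dimensional loading vector at once.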
The simulator has proven a successful method for predicting the difficulty and distribution of words in Wordle. The model achieves an accuracy of 87\(\%\), outperforming the other machine learning algorithms we tried. The results demonstrate that the simulator is effective in predicting the distribution of future reports as well as the difficulty and distribution of specific words in Wordle.
Overall, the simulator significantly improves the prediction of word difficulty and distribution in Wordle. Its superior performance compared to other machine learning algorithms highlights the effectiveness of training under the Wasserstein distance. This work proposes a robust model for predicting the difficulty of educational games tailored to specific cognitive groups, ultimately enhancing overall game playability and engagement. The results of the different models are shown in Table [tab:my_label].
This study introduces the CogSimulator, designed to simulate cognitive distributions in contexts with limited sample sizes, exemplified by its application to Wordle. The model is compared against machine learning baselines on metrics such as Wasserstein-1 distance, mean squared error, and mean accuracy. Leveraging the universal relevance of word-frequency attributes, the CogSimulator shows promising generalization across word-based games, suggesting significant potential impact on cognitive game development for niche user groups. Despite its strengths, the model tends to represent an “average” player, which may not reflect the diversity of player strategies and cognitive processes, particularly where data distributions are multimodal.
Future work will enhance this model by integrating clustering algorithms to detect and model unique player profiles, thereby better capturing the diversity of players. Plans also include integrating player feedback as a dynamic reward mechanism and
developing an advanced parameter optimization method for dynamic loss functions to suit simulation tasks better and extend applicability to a broader range of educational games.
\(\dagger\) Corresponding author. Email: aobo.wang@nus.edu.sg↩︎