Estimation of Resistance Training RPE using
Inertial Sensors and Electromyography


Abstract

Accurate estimation of rating of perceived exertion (RPE) can enhance resistance training through personalized feedback and injury prevention. This study investigates the application of machine learning models to estimate RPE during single-arm dumbbell bicep curls, using data from wearable inertial and electromyography (EMG) sensors. A custom dataset of 69 sets and over 1000 repetitions was collected, with statistical features extracted for model training. Among the models evaluated, a random forest classifier achieved the highest performance, with 41.4% exact accuracy and 85.9% \(\pm1\) RPE accuracy. While the inclusion of EMG data slightly improved model accuracy over inertial sensors alone, its utility may have been limited by factors such as data quality and placement sensitivity. Feature analysis highlighted eccentric repetition time as the strongest RPE predictor. The results demonstrate the feasibility of wearable-sensor-based RPE estimation and identify key challenges for improving model generalizability.

Keywords: rating of perceived exertion (RPE), electromyography (EMG), inertial sensors.

1 Introduction

Over the last decade, science-supported approaches to strength training optimization have advanced considerably [1], which, in turn, has increased the demand for research on resistance training. In parallel, the awareness and application of machine learning (ML) and artificial intelligence in sports have expanded significantly, with demonstrated use cases including exercise classification, rehabilitation monitoring, and performance assessment [2]. The convergence of these two areas presents a valuable research opportunity. In this study, we investigate ML methods for using inertial measurements to estimate the rating of perceived exertion (RPE), while also examining the potential role of electromyography (EMG) data during the training phase.

RPE is a key measure in resistance training, quantifying the perceived intensity of exercise. The term “perceived” is central, as it implies inherent uncertainty and subjectivity. Several scales exist to measure RPE, such as the Borg Scale (6–20) [3]; however, one of the most intuitive and widely used scales is the Borg CR10 scale [3]. Defined by Gunnar Borg in 1982, this scale ranges from 1 to 10, where 1 indicates no exertion and 10 represents absolute failure. Generally, values above 6 are recognized as representing difficult exercise [4]. Given its relevance and widespread adoption in resistance training, this paper uses the Borg CR10 scale.

Intensity is widely regarded as a key factor influencing muscular hypertrophy [5], defined as an increase in the cross-sectional area of muscle [6]. In bodybuilding, training intensity is critical for maximizing muscle growth. However, intensity is equally important in strength-focused training, such as powerlifting, as higher intensities promote neural adaptations, improvements in the rate of force development, and strength gains. A detailed understanding of intensity is therefore essential for designing effective training programs that optimize both hypertrophy and strength outcomes. If trainees misunderstand the scale, they could be at risk of either poor training results or significant and lasting injury [7]. An estimation system mitigates these risks by removing a degree of ambiguity surrounding the measure. This is particularly important in the current digital era, where personal training is increasingly delivered online [8]. In such settings, the ability to remotely monitor and regulate effort has become critical, making an automated system for accurately estimating RPE especially valuable.

A range of studies have investigated the use of ML for estimating RPE. However, most have focused on cardiovascular exercise rather than resistance training. For example, Carey et al. explored exertion estimation in Australian football players using wearable accelerometers, GPS receivers, and heart rate monitors [9]. Whilst numerous studies have investigated exertion estimation using inertial or physiological sensors, research specifically focused on estimating RPE directly from EMG signals, particularly using machine learning and wearable EMG systems, remains limited. This gap is, for example, evident in the PERSIST dataset [10], which integrates inertial sensors, heart rate monitors, and electrocardiography sensors for resistance training, but does not include EMG data.

In this study, wearable inertial measurement units (IMUs) and surface electromyography (SEMG) sensors are employed to capture movement and muscle activity during resistance training. These sensors were selected due to their non-invasive nature, ease of use, and, in the case of IMUs, their ubiquity in modern wearable devices, making them well-suited for real-world and long-term applications. A novel EMG- and IMU-based dataset of resistance training repetitions is presented, and multiple ML models are evaluated for RPE estimation. EMG data is used only during the training phase to generate labels and inform feature selection. Specifically, extracted EMG features are used to encode labels via dimensionality reduction techniques, and models are trained to estimate these labels using IMU data. During testing, only IMU data is provided as input, reflecting real-world deployment conditions where EMG data is typically not available. Our contributions are threefold:

  1. The first investigation of using EMG signals during training for IMU-based RPE estimation is provided, along with benchmarking of multiple ML models for this task;

  2. Key informative features and limitations are identified, offering insights for the design of future wearable-sensor-based exertion monitoring systems;

  3. A novel EMG- and IMU-based resistance training dataset is made publicly available at
    https://doi.org/10.5281/zenodo.17259403, to support reproducibility and future research.

2 Data Collection

Figure 1: Delsys sensor units attached to a participant.

Data was collected from five participants, all male, aged 18–25, each with a minimum of two years of resistance training experience and at least one year of familiarity with the Borg CR10 scale to ensure accurate RPE reporting. Additional inclusion criteria required the absence of recent injuries or physical impairments, ensuring consistency in biomechanical movement during data collection. To preserve anonymity, participants were assigned pseudo-anonymised IDs, each created by generating a random letter followed by a random three-digit number. This ID was tied to data records so that data could be identified and removed if a participant requested it. Further details of participants are presented in Table [tab:participant-info].

Participant information with anonymised IDs

Figure 2: Repetition count per participant.
Figure 3: Distribution of collected RPE values.
Figure 4: Data processing pipeline.

Data was collected from the single-arm dumbbell bicep curl exercise. This exercise was selected because it is widely familiar, making it easy for participants to perform consistently and correctly. The bicep curl also produces a large and easily distinguishable range of motion at the wrist, where wearable exercise trackers are typically worn. EMG and IMU data was recorded using the Delsys Trigno Wireless EMG System [11], which sampled EMG at 2148.1 Hz and IMU data at 370.4 Hz. Two sensor units were attached to the participant’s bicep and wrist using non-invasive double-sided tape, as shown in Figure 1. The wrist unit collected IMU data, as a consumer wearable device would. The bicep unit collected EMG data, since the bicep was the muscle being targeted. The wrist sensor was consistently placed on the outer wrist, positioned horizontally midway between the distal ends of the ulna and radius bones. The bicep sensor was placed on the belly of the biceps muscle, along the midline. During data collection, participants performed repetitions of the exercise and verbally reported their RPE after every repetition. The RPE values were recorded in a spreadsheet and matched to repetitions by a Python script using unique set IDs. Each set ID follows the format userID_weight_setnum; for example, the 9th recorded set with a 15 kg weight for participant A321 was stored as A321_15_9.
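To make the ID convention concrete, the following minimal sketch shows how such set IDs could be constructed and parsed; the helper names are illustrative and are not taken from the study's scripts.

```python
# Illustrative helpers for the userID_weight_setnum convention described above;
# these names are our own, not from the study code.

def make_set_id(user_id: str, weight_kg: int, set_num: int) -> str:
    """Build a unique set ID, e.g. make_set_id('A321', 15, 9) -> 'A321_15_9'."""
    return f"{user_id}_{weight_kg}_{set_num}"

def parse_set_id(set_id: str) -> tuple[str, int, int]:
    """Split a set ID back into (user_id, weight_kg, set_num)."""
    user_id, weight, set_num = set_id.split("_")
    return user_id, int(weight), int(set_num)
```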

Participants performed natural sets of bicep curls, completing as many or as few repetitions as they preferred. This approach allowed the data to reflect typical exercise behavior, ensuring a variety of repetition ranges. Dumbbells of 5 kg, 10 kg, and 15 kg were made available, providing a range of difficulties to elicit different levels of exertion. A wide spectrum of RPE values was targeted; this occurred naturally, as the reported values of perceived exertion were approximately normally distributed. In total, 69 sets of exercise were collected, comprising 1003 repetitions. The number of repetitions collected for each participant is shown in Figure 2, and the distribution of RPEs is shown in Figure 3.

3 Methods

3.1 Preprocessing

Figure 4 illustrates the process by which raw sensor data is incorporated into the rep-wise dataset. Each set is segmented into individual repetitions, relevant features are extracted, and the resulting data is stored.

3.1.1 Sampling Rate Matching

The sampling rates were dynamically determined for each set to ensure accuracy in case of any misconfiguration. To match the lower IMU sampling rate, the EMG signal was first low-pass filtered using a Butterworth filter and then resampled using polynomial interpolation [12]. This ensures that the EMG and IMU signals are aligned, so that each sample corresponds to the same point in time across both sensors, simplifying subsequent processing. Following this, each sensor signal was individually smoothed using a rolling average window [13].
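As a concrete illustration of this step, the sketch below implements anti-alias filtering, polyphase resampling, and rolling-average smoothing with SciPy and NumPy; the filter order, window size, and rational-ratio approximation are assumptions rather than the study's exact configuration.

```python
import numpy as np
from fractions import Fraction
from scipy.signal import butter, filtfilt, resample_poly

EMG_FS, IMU_FS = 2148.1, 370.4  # nominal sampling rates from Section 2

def match_emg_to_imu(emg: np.ndarray, n_imu_samples: int) -> np.ndarray:
    # Low-pass filter below the new Nyquist frequency to avoid aliasing.
    b, a = butter(4, (IMU_FS / 2) / (EMG_FS / 2), btype="low")
    emg_filt = filtfilt(b, a, emg)
    # Polyphase resampling using a rational approximation of IMU_FS / EMG_FS.
    ratio = Fraction(int(IMU_FS * 10), int(EMG_FS * 10)).limit_denominator(1000)
    resampled = resample_poly(emg_filt, ratio.numerator, ratio.denominator)
    return resampled[:n_imu_samples]  # trim so samples align with the IMU stream

def smooth(signal: np.ndarray, window: int = 9) -> np.ndarray:
    # Rolling-average smoothing with a uniform kernel.
    return np.convolve(signal, np.ones(window) / window, mode="same")
```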

3.1.2 Repetition Segmentation

Once the sampling rates had been aligned and the signals smoothed, each set of exercises was segmented into individual repetitions. To do this, we first differentiated the accelerometer signal along the axis extending outward from the palm to obtain the jerk, and then identified repetition boundaries and midpoints as instances of zero-crossings in the jerk signal; see Figure 5. This procedure can be viewed as a form of peak detection applied to the accelerometer data. A minimum peak distance was imposed to reduce false positives. To ensure accuracy, repetition midpoints were counted and compared against the ground-truth RPE annotations.
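A minimal sketch of this segmentation logic is given below, assuming the palm-axis accelerometer signal has already been smoothed; the minimum-gap threshold is an illustrative value, not the tuned one.

```python
import numpy as np

def segment_reps(accel: np.ndarray, fs: float, min_gap_s: float = 0.5) -> list:
    """Return candidate boundary/midpoint indices for one set."""
    jerk = np.gradient(accel) * fs                        # differentiate acceleration
    crossings = np.where(np.diff(np.sign(jerk)) != 0)[0]  # jerk zero-crossings
    # Enforce a minimum spacing between events, analogous to a minimum
    # peak distance, to suppress false positives.
    kept, last = [], -np.inf
    for c in crossings:
        if c - last >= min_gap_s * fs:
            kept.append(int(c))
            last = c
    return kept  # alternating entries act as rep boundaries and midpoints
```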

It was decided that the end point of one rep would mark the starting point of the next, regardless of whether the trainee paused between reps. The duration and movement characteristics of these breaks can indicate fatigue, with longer breaks suggesting greater fatigue and, in turn, a higher RPE. These breaks were therefore included in repetitions. Each rep was then assigned a unique ID, formed by appending the rep number to its set ID; this aids rep identification when testing RNN models. Once each rep had been given an ID, the corresponding RPE was extracted from the RPE file.

Figure 5: Example of a set marked with repetition boundaries and midpoints.

3.1.3 IMU Feature Extraction

Following the rep segmentation, features were extracted from the sensor signals for model training. The six IMU data streams enabled the extraction of a wide range of features. Time-based features were computed, including the duration of the concentric (upward) and eccentric (downward) phases of each repetition. For each sensor axis, statistical features such as mean, standard deviation, range, minimum, and maximum were calculated for both concentric and eccentric movements. To quantify the “smoothness” of a repetition, we fitted a polynomial regression to the signal and extracted the coefficient of determination (\(R^2\)) as a feature. Additionally, jerk was computed, from which statistical features were also derived. Gyroscope features were calculated to capture rotational trends, including mean, standard deviation, and \(R^2\). Previous studies have suggested that gyroscope data may be less informative for activity analysis than accelerometer data [14], reducing the need for extensive gyroscope feature extraction. In total, 55 IMU features were extracted.
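The per-phase statistics can be computed with a small helper like the one below, shown for a single axis and movement phase; the polynomial degree used for the \(R^2\) smoothness feature is an assumption.

```python
import numpy as np

def phase_features(x: np.ndarray, prefix: str, degree: int = 3) -> dict:
    """Statistical features for one sensor axis over one movement phase."""
    t = np.arange(len(x))
    resid = x - np.polyval(np.polyfit(t, x, degree), t)
    ss_tot = np.sum((x - x.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot if ss_tot > 0 else 0.0  # "smoothness"
    return {
        f"{prefix}_mean": x.mean(),
        f"{prefix}_std": x.std(),
        f"{prefix}_range": np.ptp(x),
        f"{prefix}_min": x.min(),
        f"{prefix}_max": x.max(),
        f"{prefix}_r2": r2,
    }

# Example: features of the concentric phase of one accelerometer axis.
# feats = phase_features(accel_axis[rep_start:rep_mid], "acc_x_con")
```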

3.1.4 EMG Feature Extraction

EMG features were similarly extracted using statistical metrics, such as mean and root mean square (RMS) to reflect the intensity and power of muscle activation, and variance to capture the variability in muscle power output. The number of zero crossings was computed, as fewer zero crossings can indicate fatigue due to a shift toward lower-frequency activity. Peak amplitude was extracted to represent bursts of muscle activation. In total, nine EMG features were extracted. These features were used solely for generating training labels, which served as targets for ML models using IMU features. In practical applications, IMU data can then be used to estimate these EMG-derived labels, which, in turn, may aid the estimation of RPE.
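The sketch below computes a representative subset of these nine features; exact definitions (e.g. whether the zero-crossing count uses a noise threshold) are our assumptions.

```python
import numpy as np

def emg_features(emg: np.ndarray) -> dict:
    """Summary features for one repetition's EMG signal."""
    return {
        "mean": emg.mean(),
        "rms": np.sqrt(np.mean(emg ** 2)),            # activation intensity/power
        "variance": emg.var(),                        # variability of power output
        "zero_crossings": int(np.sum(np.diff(np.sign(emg)) != 0)),  # fewer -> fatigue
        "peak_amplitude": float(np.max(np.abs(emg))), # bursts of activation
    }
```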

3.2 EMG Labeling

To incorporate EMG training data into a real-world RPE estimation pipeline, models were developed to encode EMG data into labels. Instead of treating the full set of EMG features as ground truth targets—which would require a separate estimation model for each feature—dimensionality reduction was applied to condense the EMG features into a compact label space. These EMG-derived labels capture the essential structure of the muscle activity while limiting the number of models required, as illustrated in Figure 6.

Figure 6: EMG labeling pipeline.

3.2.1 Dimensionality Reduction

Three parallel methods for dimensionality reduction were applied: principal component analysis (PCA) [15], t-distributed stochastic neighbour embedding (t-SNE) [16], and uniform manifold approximation and projection (UMAP) [17]. PCA produces explicit numerical components that capture the variance in the data. The first two components (PC1 and PC2) were selected as labels and can be used directly as continuous targets in regression models. By contrast, t-SNE and UMAP aim to preserve neighborhood structure in the data rather than producing interpretable axes. Their embeddings were therefore used as inputs to a clustering algorithm, with the resulting cluster assignments serving as categorical labels. In this way, PCA yielded continuous features, while t-SNE and UMAP produced categorical features after clustering.
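A sketch of the three parallel reductions is shown below, using scikit-learn and the umap-learn package; hyperparameters are left at defaults rather than set to the tuned values.

```python
import numpy as np
import umap  # umap-learn package
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

def reduce_emg(features: np.ndarray):
    """features: (n_reps, n_emg_features) array of per-rep EMG features."""
    X = StandardScaler().fit_transform(features)
    pcs = PCA(n_components=2).fit_transform(X)        # PC1/PC2: continuous labels
    tsne_emb = TSNE(n_components=2).fit_transform(X)  # embedding for clustering
    umap_emb = umap.UMAP(n_components=2).fit_transform(X)
    return pcs, tsne_emb, umap_emb
```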

3.2.2 Clustering Methods

To transform the t-SNE and UMAP embeddings into discrete labels, clustering was performed using both k-means (KM) and density-based spatial clustering of applications with noise (DBSCAN). DBSCAN did not provide satisfactory results despite hyperparameter tuning, so development proceeded with KM. The number of clusters \(k\) was selected as \(k=4\) based on optimization of the silhouette score [18]. To address the class imbalance in the clustered labels, the synthetic minority over-sampling technique (SMOTE) [19] was applied during preprocessing. SMOTE generates synthetic samples of minority classes rather than duplicating existing data, reducing overfitting while preserving class diversity. This step was particularly important given the relatively small dataset, where under-sampling would have further reduced the available training data.
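The sketch below shows silhouette-based selection of \(k\) followed by SMOTE balancing, using scikit-learn and imbalanced-learn; the candidate range for \(k\) is an assumption.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_labels(embedding: np.ndarray, k_range=range(2, 9)) -> np.ndarray:
    # Pick the k that maximizes the silhouette score, then return cluster labels.
    best_k = max(
        k_range,
        key=lambda k: silhouette_score(
            embedding, KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
        ),
    )
    return KMeans(n_clusters=best_k, n_init=10).fit_predict(embedding)

# Balance IMU features X against the clustered labels y before training:
# X_bal, y_bal = SMOTE().fit_resample(X, y)
```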

3.3 EMG Estimation Models

Once reps had been assigned EMG labels, models were trained to estimate these labels. For PCA, separate regression models were developed to estimate PC1 and PC2 using random forest (RF), support vector regression (SVR), extreme gradient boosting (XGB), and artificial neural network (ANN) architectures. Labels produced by clustering algorithms were discrete categorical labels, and hence, a classification approach was required. To this end, we used RF, logistic regression (LR), support vector machine (SVM), XGB, and ANN architectures.

3.4 RPE Estimation Models

RPE is a discrete categorical label, ranging from 1 to 10. Thus, RPE estimation is naturally a classification problem. However, since the labels are ordinal, regression can also be applied. To assess the contribution of EMG features, models will be trained both with and without EMG features, with the latter serving as a baseline. For RPE classification, we will use RF, LR, SVM, XGB, and ANN [9], [20], [21]. For RPE regression, we will use RF, SVR, lasso, elastic net, and ridge regression, as well as ANN and XGB.

The approach described above aims to estimate RPE individually at each rep. This approach may overlook longer-term dependencies within a set of exercises. Given the presence of sequence-related dependencies, we will also consider the two main recurrent neural network (RNN) architectures: long short-term memory networks (LSTM) and gated recurrent units (GRU). Since GRUs have fewer gates, they generally train faster than LSTMs, though they may sacrifice some of the representational capacity that LSTMs can offer [22]. If GRUs achieve accuracy comparable to LSTMs, they may therefore be the preferable choice. In real-world applications, RPE estimation would likely need to be performed in real time, making the efficiency advantage of GRUs particularly valuable. These architectures can handle both classification and regression outputs, so both approaches will be evaluated during development.

During preprocessing, input samples comprising a fixed number of contiguous repetitions were created. The number of repetitions in each sample was treated as a hyperparameter and optimized during training. The final RPE value within each sample was selected as the representative label for that sample. To mitigate the risk of overfitting caused by overlapping input sequences, a jittering strategy was applied, whereby small amounts of random noise were added to duplicated samples. While this augmentation may slightly reduce accuracy on the training set, it helps improve generalization to unseen data.
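The windowing and jitter augmentation can be sketched as follows; the window length and noise scale are illustrative stand-ins for the tuned hyperparameters.

```python
import numpy as np

def make_sequences(rep_features, rpe, window=5, noise_std=0.01, seed=0):
    """Build overlapping fixed-length rep sequences labeled with the final RPE."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for start in range(len(rep_features) - window + 1):
        seq = np.asarray(rep_features[start:start + window], dtype=float)
        # Jitter the overlapping (partially duplicated) windows to curb overfitting.
        X.append(seq + rng.normal(0.0, noise_std, seq.shape))
        y.append(rpe[start + window - 1])  # label = RPE of the last rep in the window
    return np.stack(X), np.asarray(y)
```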

3.5 Data Processing Pipeline

Figure 7 shows the proposed pipeline for estimating the RPE of a new sample. The diagram shows the flow of data, with an optional sub-path for EMG labeling if these features are deemed beneficial. Depending on the final model that is selected, the RPE estimator may accept rep-wise or set-wise input.

Figure 7: Data processing pipeline for RPE estimation.

3.6 Model Evaluation Methodology

A number of evaluation metrics were used. The F1-score was used to evaluate classification model architectures. Accuracy was calculated using a \(\pm1\) tolerance, indicating how often the estimated RPE values were within one unit of the ground truth. The \(R^2\) score was also used. Root mean square error (RMSE) was the primary metric for evaluating the regression architectures. Throughout, models were evaluated using 4-fold cross-validation. For rep-wise models, folds were created by shuffling individual reps. For RNN models, which operated on overlapping sequences of reps, folds were grouped by set to avoid duplicated reps leaking across train and test splits.
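A sketch of the \(\pm1\)-tolerance accuracy and the set-grouped folds is given below; variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def accuracy_within_one(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Fraction of estimates within one RPE unit of the ground truth.
    return float(np.mean(np.abs(y_true - y_pred) <= 1))

# For sequence models, group folds by set ID so overlapping windows from one
# set never appear in both the train and test splits:
# for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=set_ids):
#     ...
```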

3.7 Hyperparameter Selection

Bayesian optimization, that is, iteratively selecting promising parameter combinations to efficiently improve model performance using Bayesian methods [23], was employed for most hyperparameter tuning. In cases where integrating Bayesian optimization into a model’s workflow proved difficult, random search was used instead.
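As an illustration, a search of this kind could be run with Optuna, whose default TPE sampler is a Bayesian-style method; the model and search space below are assumptions, not the study's exact setup.

```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial, X, y):
    # Each trial proposes a promising hyperparameter combination.
    model = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 100, 800),
        max_depth=trial.suggest_int("max_depth", 3, 30),
        min_samples_leaf=trial.suggest_int("min_samples_leaf", 1, 10),
    )
    return cross_val_score(model, X, y, cv=4, scoring="f1_weighted").mean()

# study = optuna.create_study(direction="maximize")
# study.optimize(lambda t: objective(t, X, y), n_trials=50)
```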

4 Experimental Results

4.1 EMG Estimation Models

Tables 1 and 2 display the performance of the regression models estimating EMG PCA components 1 and 2, respectively. The results for PC1 show that the XGBoost regressor achieved the lowest RMSE of 0.6627; the XGB model was therefore selected for producing the estimated labels for RPE model training. The SVR model had a marginally better MAE score; however, the lower RMSE is favoured due to its penalisation of larger errors. RF performed better on the \(R^2\) metric, but the difference of 0.003 is negligible. For PC2 estimation, the ANN model performed best on all metrics, with a final RMSE of 0.2230 and a very strong \(R^2\) of 0.9446. The superior performance observed on PC2 compared with PC1 is likely due to the lower variation in PC2.

Table 1: Performance on EMG PC1
Model MAE MSE RMSE \(R^2\)
ANN 0.3689 0.5157 0.6922 0.5311
RF 0.3257 0.4789 0.6648 0.5693
SVR 0.3250 0.5414 0.6991 0.5274
XGB 0.3309 0.4672 0.6627 0.5664
Table 2: Performance on EMG PC2
Model MAE MSE RMSE \(R^2\)
ANN 0.1677 0.0499 0.2230 0.9446
RF 0.1813 0.1194 0.3314 0.8927
SVR 0.1711 0.0539 0.2318 0.9405
XGB 0.1830 0.1391 0.3412 0.8843

Tables 3 and 4 show the performance of the EMG classification models estimating the KM-generated labels for the t-SNE and UMAP representations, respectively. Poorer-performing models have been omitted for brevity. As seen in Table 3, the best model architecture for t-SNE was XGBoost, which performed best across all metrics, albeit by small margins. Similarly, Table 4 shows that RF performed best for UMAP, closely followed by XGB. F1-scores above 0.8, comparable to the accuracy scores, show that both models handle class imbalance well, demonstrating effective use of SMOTE.

Table 3: Performance on EMG t-SNE Labels
Model Accuracy Precision Recall F1
RF 0.8088 0.8125 0.8088 0.8086
XGB 0.8147 0.8174 0.8147 0.8147
ANN 0.8069 0.8156 0.8069 0.8059
Table 4: Performance on EMG UMAP Labels
Model Accuracy Precision Recall F1
RF 0.8396 0.8438 0.8396 0.8400
XGB 0.8312 0.8341 0.8312 0.8313
ANN 0.7882 0.8082 0.7882 0.7905

4.2 RPE Estimation Models

4.2.1 Classification Models

A subset of cross-validation results from the classification model evaluation is presented in Table 5; models with weaker performance have been omitted for brevity. The RF model with EMG features achieved the highest accuracy. Its \(\pm1\) accuracy of 85.9% (95% CI: 83.1–88.8%) demonstrates that the model consistently estimates RPE within an acceptable range. Further, its F1-score being marginally higher than its accuracy indicates a balanced trade-off between precision and recall, as well as reasonable performance on minority classes, as supported by the confusion matrices in Figures 8 and 9. The RMSE of RF without EMG was marginally lower than that of RF with EMG, but the majority of metrics favour the EMG variant. Since it achieved the best performance on five of six metrics, the RF model with EMG features is considered the most effective classification model for RPE estimation.

Table 5: Performance of Models for RPE Classification
Model EMG? MAE RMSE Accuracy Acc. (\(\pm1\)) F1 \(R^2\)
ANN Y 0.9891 1.4896 0.3729 0.7886 0.4074 0.3764
ANN N 0.9762 1.4510 0.3758 0.7806 0.4059 0.3794
RF Y 0.8116 1.2482 0.4138 0.8594 0.4424 0.4249
RF N 0.8225 1.2350 0.3968 0.8564 0.4196 0.4085
XGB Y 0.8335 1.2361 0.4028 0.8355 0.4270 0.4103
XGB N 0.8385 1.2676 0.4118 0.8315 0.4234 0.4224
Figure 8: Confusion matrix for the RF classifier with EMG features.
Figure 9: Confusion matrix for the RF classifier without EMG features.

4.2.2 Regression Models

The results from the regression models are presented in Table 6, with weaker-performing architectures omitted for brevity. When calculating classification metrics for regression outputs, the estimates were rounded to the nearest integer to ensure consistent metrics across models for fair comparison. However, the \(\pm1\) accuracy metric was computed on unrounded estimates, so the tolerance spans only two RPE units, whereas in the classification setting \(\pm1\) effectively spans three classes; this explains the notably lower \(\pm1\) accuracy values here. Overall, the XGB regression models achieved the strongest performance, each ranking best on three metrics. The XGB model with EMG features obtained the lowest MAE and RMSE scores. While rounding slightly improved the classification-style metrics, the regression-specific metrics were prioritized, and the XGB model with EMG features is identified as the best-performing regression model. Nonetheless, these regression models perform considerably worse than the classification models on traditional accuracy metrics. Therefore, a classification-based approach to RPE estimation is generally preferred, although regression may be advantageous if capturing subtle differences in exertion is important.

Table 6: Performance of Models for RPE Regression
Model EMG? MAE RMSE Accuracy Acc. (\(\pm1\)) F1 \(R^2\)
ANN Y 0.8236 1.0708 0.3769 0.6860 0.3344 0.3908
ANN N 0.8622 1.1350 0.3758 0.6580 0.3492 0.3985
SVR Y 0.8142 1.0654 0.3799 0.7049 0.3611 0.4083
SVR N 0.8265 1.0682 0.3808 0.6819 0.3375 0.4051
XGB Y 0.7879 1.0439 0.4058 0.7059 0.3709 0.4464
XGB N 0.7927 1.0540 0.4177 0.7039 0.3807 0.4554

4.2.3 RNN Models

The performance of the RNN classification models is presented in Table 7. Interestingly, for both regression and classification tasks, the best-performing models did not include EMG features. Overall, the RNNs underperformed compared to the rep-wise models, with the regression variants performing particularly poorly; these were therefore omitted from the table. This weak performance is likely attributable to the jitter introduced during preprocessing, which may have impaired the integrity of the data. The consistently better performance of classification compared with regression further supports the use of a classification-based approach for RPE estimation. Although the accuracy and F1-scores are substantially lower than those of the rep-wise models, the \(\pm1\) accuracy is comparatively less affected, suggesting that even when estimates are incorrect, they tend to fall within the correct region.

4.3 Feature Importance for RF RPE Classification

Overall, the best-performing model for RPE estimation was the RF classifier using EMG features. Figure 10 shows feature importance for this model, computed as the average decrease in impurity each feature provides across all trees. Time-based features were the most influential, followed by jerk features. This indicates that one of the primary determinants of RPE is the time taken to complete a repetition. Interestingly, eccentric duration appears more important than concentric, suggesting that pauses at the end of a rep may serve as a key marker of perceived difficulty. This does not imply that longer repetitions are inherently more difficult; rather, it likely reflects that higher RPEs tend to result in longer repetition times. The association is further supported by a Pearson correlation coefficient (PCC) of \(r=0.541\) between rep length and RPE.
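For reference, the impurity-based importances plotted in Figure 10 correspond to scikit-learn's feature_importances_ attribute; a minimal sketch, assuming a fitted forest rf and a matching feature_names list:

```python
import numpy as np

def top_importances(rf, feature_names, k=10):
    # feature_importances_ holds the mean decrease in impurity per feature,
    # averaged across all trees in the fitted forest.
    order = np.argsort(rf.feature_importances_)[::-1][:k]
    return [(feature_names[i], float(rf.feature_importances_[i])) for i in order]
```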

Table 7: Performance of RNN Models for RPE Classification
Model EMG? MAE RMSE Accuracy Acc. (\(\pm1\)) F1 \(R^2\)
LSTM Y 1.0747 1.4853 0.3359 0.7386 0.2662 0.3366
LSTM N 1.0453 1.4453 0.3360 0.7444 0.2125 0.3025
GRU Y 1.0514 1.4687 0.3327 0.7503 0.2903 0.3366
GRU N 1.0914 1.5420 0.3396 0.7289 0.2702 0.3351
Figure 10: Feature importance for the RF classifier with EMG features.

5 Discussion

The results presented in Section 4 show that classification models, particularly RFs with EMG features, provide the strongest performance for RPE estimation; however, it remains unclear why the inclusion of EMG features yields only modest improvements despite their theoretical relevance to perceived exertion.

5.1 Impact of EMG Features

Table 8 shows the differences in model performance metrics, computed by subtracting each model's metrics without EMG features from those of the corresponding model with EMG features. It should be noted that the architectures were not identical, as different hyperparameters were deemed optimal for different models. In the table, a positive mean indicates that models with EMG tended to perform better, whereas a negative mean indicates that models without EMG performed better. The standard deviations, inherently positive, measure the spread of the differences, i.e., the consistency of the performance gaps. Most metrics have positive mean and median values, suggesting that EMG features generally improve performance, albeit marginally. However, the relatively high standard deviations compared to the means highlight substantial variability in these differences, which is further reflected in the minimum and maximum values for each metric. Notably, for the macro-averaged metrics, the maximum gains exceed the worst losses in magnitude, indicating that potential gains can outweigh potential losses. Accuracy, recall (weighted), F1 (weighted), and \(R^2\) all have negative mean values. This pattern suggests that EMG features slightly degrade performance on the majority classes, while the contrasting positive macro averages point to gains on minority classes. Noise in the EMG data may also contribute to uncertain estimates, particularly for under-represented classes. Overall, the predominance of positive averages supports the use of EMG features as a valid approach to RPE estimation. This conclusion is reinforced by the best EMG-based model achieving a higher F1-score (+0.02) than the best-performing non-EMG model (XGB classifier).

Table 8: Impact of EMG Features on Model Performance
Metric Mean Median Std. Dev. Max Min
Accuracy -0.0040 -0.0015 0.0150 0.0170 -0.0530
Acc. (\(\pm1\)) 0.0072 0.0070 0.0196 0.0296 -0.0566
Precision (macro) 0.0087 0.0065 0.0315 0.0727 -0.0653
Recall (macro) 0.0073 0.0035 0.0195 0.0401 -0.0380
F1 (macro) 0.0057 0.0042 0.0216 0.0537 -0.0411
Precision (weighted) 0.0004 0.0009 0.0187 0.0341 -0.0565
Recall (weighted) -0.0040 -0.0015 0.0150 0.0170 -0.0530
F1 (weighted) -0.0025 -0.0015 0.0164 0.0205 -0.0535
\(R^2\) -0.0115 0.0008 0.0544 0.0504 -0.1856
Figure 11: Scatter plot of PC1 values from IMU and EMG feature sets.

5.2 Correlations and Physiological Considerations

The modest performance improvements gained from incorporating EMG training features have several possible interpretations. One possibility is that IMU and EMG data are strongly correlated, such that including EMG provides little additional information beyond what is already captured by the IMU. While the strong estimation performance in Tables 1 to 4 might initially support this, further analysis suggests otherwise. Figure 11 shows the PC1s from the IMU and EMG feature sets, with a PCC of 0.219, indicating a positive but weak correlation. This implies that the EMG data likely contains additional complexity not captured by the IMU.

Another possible explanation for the limited benefit is issues with the underlying EMG data. The selected EMG features may not encompass all relevant information in the raw signals, and irreducible noise could introduce inconsistencies, thereby affecting model performance. This is further supported by the PCC between EMG PC1 and RPE, which is \(r=-0.128\), a very weak negative correlation. Since EMG measures muscle activation, it might be expected to correlate strongly with perceived exertion. One potential reason this assumption does not hold is the inherent ambiguity of RPE: individuals with different pain tolerances and muscle conditioning may perceive exertion differently.

Another factor may be limitations in data quality. Sensors were intended to be placed consistently on the belly of the biceps; however, due to natural variations in muscle anatomy, placement may not have been optimal, potentially resulting in misread signal amplitudes. Additionally, participants with particularly strong forearm or brachialis muscles may have performed a disproportionate amount of work with these muscles rather than the biceps [24], further skewing EMG readings. Since prior research indicates a correlation between EMG and RPE [25], the weak correlation observed in this study is likely attributable to limitations in data quality and quantity.

5.3 EMG Feature Extraction

EMG feature extraction and selection is an area that warrants further exploration. The extracted features were used to encode labels via dimensionality reduction techniques, after which models were developed to estimate these labels from inertial measurements. The EMG estimation models achieved reasonable accuracy, demonstrating links between IMU data and EMG encodings. The limited benefit of EMG features in the RPE models is unlikely to be due to high EMG estimation errors, as estimation performance was adequate, with low RMSE for PCA labels and classification label accuracy exceeding 80%. The high feature importance of both PC1 and PC2 in Figure 10 further demonstrates the practical effectiveness of these EMG-derived labels. Supervised dimensionality reduction methods, such as linear discriminant analysis (LDA), could potentially enhance the correlation between EMG encodings and RPE. The lower feature importance of the k-means labels, derived from UMAP and t-SNE embeddings, may reflect that these dimensionality reduction methods provided less informative encodings of the EMG data. To investigate this, the best RF architecture was trained using ground-truth EMG features instead of estimated labels. This increased model performance to a \(\pm1\) accuracy of 86.4% (+0.5%) and an exact accuracy of 41.8% (+0.4%). Although these gains are modest, they correspond to improvements of 0.8% and 2.1% over the RF model without EMG features, a meaningful increase over the baseline. These results suggest that future applications could benefit from estimation models trained directly on EMG features rather than on encoded labels.

6 Conclusion

In summary, this project explored the novel use of EMG training data for estimating RPE from inertial sensor input. A variety of models were implemented and evaluated, with a random forest classifier using imputed EMG labels emerging as the best-performing model for RPE estimation. The model achieved a best F1-score of 0.443, indicating reasonable performance given the class imbalance, and a \(\pm1\) accuracy of 85.9%, demonstrating strong estimation performance within an acceptable margin. On average, models incorporating EMG features outperformed the baselines, although these gains were marginal.

The modest benefits observed from EMG training data highlight the need for larger and more diverse datasets, ideally on the order of 10,000–20,000 repetitions, with broader participant representation to capture variability in exertion responses. Future work should also investigate alternative strategies for incorporating EMG data, such as focusing on feature selection and extraction of informative raw features rather than reduced encodings. Finally, while rep-wise models outperformed recurrent architectures, the presence of long-term dependencies suggests that RNN-based approaches may prove valuable if trained on larger datasets without reliance on augmentation techniques that compromise temporal integrity.

References

[1]
J. Fisher, J. Steele, S. Bruce-Low, and D. Smith, “Evidence based resistance training recommendations,” Medicina Sportiva, vol. 15, no. 3, pp. 147–162, 2011.
[2]
F. Frangoudes, M. Matsangidou, E. C. Schiza, K. Neokleous, and C. S. Pattichis, “Assessing human motion during exercise using machine learning: A literature review,” IEEE Access, vol. 10, pp. 86874–86903, 2022.
[3]
G. A. Borg, “Psychophysical bases of perceived exertion.” Medicine and Science in Sports and Exercise, vol. 14, no. 5, pp. 377–381, 1982.
[4]
M. L. Day, M. R. McGuigan, G. Brice, and C. Foster, “Monitoring exercise intensity during resistance training using the session RPE scale,” The Journal of Strength & Conditioning Research, vol. 18, no. 2, pp. 353–358, 2004.
[5]
T. Lasevicius, C. Ugrinowitsch, B. J. Schoenfeld, H. Roschel, L. D. Tavares, E. O. De Souza, G. Laurentino, and V. Tricoli, “Effects of different intensities of resistance training with equated volume load on muscle strength and hypertrophy,” European Journal of Sport Science, vol. 18, no. 6, pp. 772–780, 2018.
[6]
B. J. Schoenfeld, “The mechanisms of muscle hypertrophy and their application to resistance training,” The Journal of Strength & Conditioning Research, vol. 24, no. 10, pp. 2857–2872, 2010.
[7]
R. E. Vetter and M. L. Symonds, “Correlations between injury, training intensity, and physical and mental exhaustion among college athletes,” The Journal of Strength & Conditioning Research, vol. 24, no. 3, pp. 587–596, 2010.
[8]
J. Passmore, Q. Liu, D. Tee, and S. Tewald, “The impact of covid-19 on coaching practice: results from a global coach survey,” Coaching: An International Journal of Theory, Research and Practice, vol. 16, no. 2, pp. 173–189, 2023.
[9]
D. L. Carey, K. Ong, M. E. Morris, J. Crow, and K. M. Crossley, “Predicting ratings of perceived exertion in Australian football players: methods for live estimation,” International Journal of Computer Science in Sport, vol. 15, no. 2, pp. 64–77, 2016.
[10]
J. A. Albert, A. Herdick, C. M. Brahms, U. Granacher, and B. Arnrich, “Persist: A multimodal dataset for the prediction of perceived exertion during resistance training,” Data, vol. 8, no. 1, p. 9, 2022.
[11]
Delsys Inc., “Trigno wireless biofeedback system,” 2024, accessed: 2025-04-28. [Online]. Available: https://delsys.com/trigno/.
[12]
C. J. De Luca, L. D. Gilmore, M. Kuznetsov, and S. H. Roy, “Filtering the surface EMG signal: Movement artifact and baseline noise contamination,” Journal of Biomechanics, vol. 43, no. 8, pp. 1573–1579, 2010.
[13]
G. G. Redhyka, D. Setiawan, and D. Soetraprawata, “Embedded sensor fusion and moving-average filter for inertial measurement unit (IMU) on the microcontroller-based stabilized platform,” in International Conference on Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology. IEEE, 2015, pp. 72–77.
[14]
M. Alanazi, R. S. Aldahr, and M. Ilyas, “Effectiveness of machine learning on human activity recognition using accelerometer and gyroscope sensors: A survey,” in Proceedings of the 26th World Multi-Conference on Systemics, Cybernetics and Informatics, 2022, pp. 12–15.
[15]
H. Abdi and L. J. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
[16]
L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[17]
L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform manifold approximation and projection for dimension reduction,” arXiv preprint arXiv:1802.03426, 2018.
[18]
T. M. Kodinariya, P. R. Makwana et al., “Review on determining number of cluster in k-means clustering,” International Journal of Advance Research in Computer Science and Management Studies, vol. 1, no. 6, pp. 90–95, 2013.
[19]
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[20]
P. Davidson, P. Düking, C. Zinner, B. Sperlich, and A. Hotho, “Smartwatch-derived data and machine learning algorithms estimate classes of ratings of perceived exertion in runners: A pilot study,” Sensors, vol. 20, no. 9, p. 2637, 2020.
[21]
J. A. Albert, A. Herdick, C. M. Brahms, U. Granacher, and B. Arnrich, “Using machine learning to predict perceived exertion during resistance training with wearable heart rate and movement sensors,” in IEEE International Conference on Bioinformatics and Biomedicine. IEEE, 2021, pp. 801–808.
[22]
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[23]
J. Wu, X.-Y. Chen, H. Zhang, L.-D. Xiong, H. Lei, and S.-H. Deng, “Hyperparameter optimization for machine learning models based on Bayesian optimization,” Journal of Electronic Science and Technology, vol. 17, no. 1, pp. 26–40, 2019.
[24]
G. Coratella, G. Tornatore, S. Longo, N. Toninelli, R. Padovan, F. Esposito, and E. Cè, “Biceps brachii and brachioradialis excitation in biceps curl exercise: different handgrips, different synergy,” Sports, vol. 11, no. 3, p. 64, 2023.
[25]
K. M. Lagally, R. J. Robertson, K. I. Gallagher, F. L. Goss, J. M. Jakicic, S. M. Lephart, S. T. McCaw, and B. Goodpaster, “Perceived exertion, electromyography, and blood lactate during acute bouts of resistance exercise,” Medicine & Science in Sports & Exercise, vol. 34, no. 3, pp. 552–559, 2002.