SSL-SE-EEG: A Framework for Robust Learning from Unlabeled EEG Data with Self-Supervised Learning and Squeeze-Excitation Networks


Abstract

Electroencephalography (EEG) plays a crucial role in brain-computer interfaces (BCIs) and neurological diagnostics, but its real-world deployment faces challenges due to noise artifacts, missing data, and high annotation costs. We introduce SSL-SE-EEG, a framework that integrates Self-Supervised Learning (SSL) with Squeeze-and-Excitation Networks (SE-Nets) to enhance feature extraction, improve noise robustness, and reduce reliance on labeled data. Unlike conventional EEG processing techniques, SSL-SE-EEG transforms EEG signals into structured 2D image representations suitable for deep learning. Experimental validation on the MindBigData, TUH-AB, SEED-IV, and BCI-IV datasets demonstrates state-of-the-art accuracy (91% on MindBigData, 85% on TUH-AB), making it well-suited for real-time BCI applications. By enabling low-power, scalable EEG processing, SSL-SE-EEG presents a promising solution for biomedical signal analysis, neural engineering, and next-generation BCIs.

EEG, Self-supervised learning, Squeeze-and-Excitation Network, Power-efficient BCI

1 Introduction

Electroencephalography (EEG) is a vital biopotential signal used to measure brain activity in applications such as brain-computer interfaces, cognitive monitoring, and the diagnosis of neurological disorders [1]. Despite its importance, real-world EEG applications face significant challenges due to noise, motion artifacts, and incomplete data from missing or corrupted channels, often resulting from electrode displacement [2], [3].

Figure 1: (a) EEG workflow highlighting challenges in data labeling and processing. (b) Trade-off between model accuracy and labeling cost across ML paradigms, showing how SSL-SE-EEG achieves high performance with reduced labeled data.

The traditional EEG workflow is illustrated in Fig. 1 (a). It begins with data collection and labor-intensive labeling, which is subject-specific, task-specific, and highly variable across acquisition conditions. Subsequently, the collected data is processed either manually by domain experts or through automated methods. Automated approaches typically leverage machine learning (ML) techniques, which can be broadly categorized into two groups. Traditional ML techniques include artificial neural networks (ANN), support vector machines (SVM), and principal component analysis (PCA) [4]. More recently, deep learning architectures, particularly convolutional neural networks (CNNs), have shown superior performance, largely due to their ability to effectively extract features from EEG signals [5].

While traditional ML techniques and most CNN-based methods have demonstrated success in EEG processing, they predominantly rely on the supervised learning (SL) paradigm, which requires extensive labeled datasets to uncover meaningful patterns [6]. This dependence on manual annotation renders data preparation expensive, time-consuming, and subject to strict human research constraints [7]–[9]. Moreover, even with ample labeled data, achieving robust generalization across subjects and sessions remains a persistent and unresolved challenge [10]. Unsupervised learning, which operates solely on unlabeled data, alleviates the burden of annotation but often struggles to learn features that reliably separate signal from noise [1]. In response to these limitations, self-supervised learning (SSL) has emerged as a promising middle ground. SSL frameworks leverage large volumes of unlabeled data to learn rich, transferable representations through carefully designed pretext tasks, requiring only a small fraction of labeled data for downstream fine-tuning [11]. As shown in Fig. 1 (b), SSL offers an attractive trade-off, achieving high performance with substantially reduced labeling costs.

In this paper, we introduce SSL-SE-EEG, a novel framework that combines SSL with CNNs and Squeeze-and-Excitation Networks (SE-Nets) to enhance feature extraction and improve robustness to noise in EEG-based image representations. Our approach first transforms EEG signals into structured 2D RGB image representations, preserving critical temporal and amplitude information, thereby enabling compatibility with CNN-based encoders and facilitating diverse view generation for contrastive learning. SE-Nets further refine feature learning by dynamically recalibrating channel-wise responses, allowing the model to focus on the most informative aspects of EEG signals. As highlighted in Fig. 1 (b), SSL-SE-EEG delivers high classification accuracy while significantly minimizing reliance on labeled data.

We validate our framework across four public EEG datasets, demonstrating strong generalization across subjects and tasks. Additionally, we show that SE integration introduces minimal power overhead, making SSL-SE-EEG particularly well-suited for deployment in low-power, wearable EEG systems [12]–[14].

We summarize our main contributions as follows:

  • SSL-SE-EEG Framework: We propose a novel framework that integrates self-supervised learning with Squeeze-and-Excitation Networks to enhance feature extraction, improve generalization, and reduce dependency on labeled EEG data.

  • EEG Representation as 2D Images: We transform EEG signals into 2D RGB image representations that preserve temporal and amplitude characteristics, enabling effective feature learning through CNNs.

  • Accuracy and Energy Efficiency with SE-Nets: We validate SSL-SE-EEG on multiple public EEG datasets, achieving high classification accuracy while maintaining lightweight SE-Net integration, ensuring suitability for energy-efficient, wearable EEG applications.

2 Background

Figure 2: Overview of the proposed SSL-SE-EEG pipeline, which consists of two steps. Step 1 involves preprocessing EEG signals into 2D image representations. Step 2 integrates self-supervised learning with SE-Nets through a two-stage process: Stage 1 applies contrastive learning using a modified encoder (BE+SE) and a projection head (PH) to maximize feature diversity; Stage 2 fine-tunes the network on a smaller labeled dataset for classification, leveraging learned representations for robust inference.

2.1 Self-Supervised Learning (SSL)

SSL is a machine learning paradigm in which models are trained on unlabeled data by exploiting inherent structure or by generating supervisory signals through pretext tasks [11]. In short, SSL methods design auxiliary tasks, such as predicting missing segments, contrasting different views of the same data, or reconstructing inputs, to force the model to learn valuable representations without relying on manual annotations [8]. Recent advances include techniques such as contrastive learning (e.g., SimCLR [15], MoCo [16]) and masked autoencoders, which have significantly improved performance in areas such as computer vision [17]–[19] and natural language processing [20]–[22]. SSL has shown promising results on biomedical datasets such as knee MRI, SARS-CoV-2 CT, and TissueMNIST [23]–[25]. However, studies combining EEG and SSL remain limited [26], [27].

2.2 Squeeze-and-Excitation Network (SE-Net)

SE-Net is a neural network architecture designed to improve feature representations by explicitly modeling the interdependencies between channels [28]. It works by "squeezing" global spatial information into a channel descriptor through a global pooling layer and then "exciting," or recalibrating, these channels via a gating mechanism built from fully connected layers, allowing the network to focus on the most informative features. Recent work has integrated SE-Net modules into various architectures to improve performance in tasks such as image classification, object detection, and semantic segmentation [29]–[31]. Moreover, its adaptability has shown promise in the biomedical domain, where channel-wise recalibration helps to better capture and emphasize critical information [32]–[35].

Despite their success in other domains, SE-Nets have seen limited use in EEG analysis. This work integrates SSL and SE-Nets to create a label-efficient framework that adapts to missing channels and efficiently learns salient EEG features. The following section details our methodology, outlining how SSL and SE-Nets improve feature learning and generalization in EEG processing.

Figure 3: Illustration of EEG-to-2D image conversion: (a) Raw EEG signals are segmented into fixed-duration windows (5 seconds by default), where short temporal segments (e.g., 50 ms) are flattened into 1D columns to construct a 2D image. (b) Examples from the TUH-AB and BCI-IV datasets demonstrate the effectiveness of this transformation for CNN-based feature extraction and classification.

3 Proposed Framework: SSL-SE-EEG

To reduce reliance on labeled data while enhancing robustness and generalization, we introduce Self-Supervised Learning with Squeeze-and-Excitation Networks for EEG (SSL-SE-EEG). This framework enables robust feature extraction from unlabeled EEG signals while dynamically enhancing relevant patterns and suppressing noise. Additionally, it leverages a CNN-friendly 2D image representation of EEG data, making it well-suited for deep learning architectures. Fig. 2 outlines the workflow: it starts with data processing, followed by the modified SSL implementation with SE-Nets. Each step is described below.

3.1 Data Processing

Raw EEG signals are time-series waveforms that may not be directly compatible with many deep learning architectures, which are predominantly designed for image-based inputs. To bridge this gap, we introduce a procedure that transforms EEG waveforms into a structured 2D image representation, preserving temporal and amplitude information in a format that CNNs can effectively process. Transforming raw signal data into image representations for CNN processing has previously been explored in other contexts, such as RF signal analysis [36], and finds use here for EEG signals. The process is illustrated in Fig. 3 (a).

We begin by segmenting the raw EEG into 5-second windows, ensuring that each generated image encapsulates a fixed duration of EEG activity. Each 5-second window is further divided into 50 ms segments, and each segment is flattened into a 1D vector forming a single column of the resulting 2D matrix. This yields a matrix with 100 columns (one per 50 ms segment), while the number of rows depends on the sampling frequency of the EEG signal; the signal in Fig. 3 (a) is sampled at 500 Hz. Each resulting 2D image therefore represents precisely 5 seconds of EEG data, maintaining both spatial and temporal continuity. To ensure compatibility with standard CNN architectures, the 2D matrix is reshaped into a 224\(\times\)224\(\times\)3 RGB image, a structured and information-rich input format for feature extraction. Each 50 ms segment vector is normalized to [0, 1] and mapped to RGB channels using a perceptually uniform colormap (e.g., 'viridis' in Matplotlib), with 256 discrete color levels mapping EEG values to 8-bit RGB intensities. Note that, depending on data availability, 2-second windows with 20 ms segments (and other colormaps) may be used to produce the same 100-column image structure while maintaining compatibility with the framework.
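A minimal sketch of this transformation is shown below, assuming a single-channel signal at 500 Hz with 5-second windows and 50 ms segments. Per-segment normalization and the 256-level 'viridis' mapping follow the description above; the use of OpenCV for the final 224\(\times\)224 resize is our assumption rather than a detail fixed by the framework.

```python
import numpy as np
import cv2
import matplotlib.pyplot as plt

def eeg_to_image(window, fs=500, window_s=5, seg_ms=50):
    """Convert one EEG window (1D array, window_s seconds at fs Hz)
    into a 224x224x3 RGB image, one column per seg_ms segment."""
    seg_len = int(fs * seg_ms / 1000)       # samples per segment (25 at 500 Hz)
    n_cols = int(window_s * 1000 / seg_ms)  # columns in the 2D matrix (100)
    matrix = window[: seg_len * n_cols].reshape(n_cols, seg_len).T  # (25, 100)
    # Normalize each 50 ms segment (column) to [0, 1]
    norm = (matrix - matrix.min(axis=0)) / (np.ptp(matrix, axis=0) + 1e-8)
    # Map through a perceptually uniform colormap with 256 levels -> 8-bit RGB
    rgb = (plt.get_cmap("viridis", 256)(norm)[..., :3] * 255).astype(np.uint8)
    # Resize to the 224x224 input expected by standard CNN encoders
    return cv2.resize(rgb, (224, 224), interpolation=cv2.INTER_LINEAR)

# Example: one 5 s window of synthetic EEG sampled at 500 Hz
image = eeg_to_image(np.random.randn(2500))
print(image.shape)  # (224, 224, 3)
```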

Fig. 3 (b) shows examples of raw EEG waveforms and the corresponding 2D images for various classes in the BCI-IV and TUH-AB datasets. These image representations reveal class-specific distinctions more clearly than the raw waveforms alone. With this image-based representation as input, the SSL-SE-EEG framework can extract meaningful features more effectively.

3.2 Modified SSL Implementation with SE-Net

Our SSL-SE-EEG framework is inspired by SimCLR, a widely adopted contrastive self-supervised learning method [15], and comprises two main components: a pretraining block and a validation block, as illustrated in Fig. 2.

3.2.1 Stage 1: Pretraining

In the pretraining stage, we first augment each input image to generate multiple views of the same EEG signal using transformations such as rotation and blur. Next, we use a modified ResNet as the base encoder. Unlike conventional ResNet architectures, we integrate Squeeze-and-Excitation (SE) blocks after each convolutional layer. SE blocks enhance feature learning by adaptively recalibrating channel-wise representations, selectively emphasizing informative channels while suppressing less relevant ones, a key advantage over standard convolutions that treat all input channels uniformly.
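As an illustration, a sketch of such a view-generation step using Keras preprocessing layers is given below. The text names rotation and blur; the remaining transformations (translation, contrast, and noise as a blur-like corruption) are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stochastic augmentation pipeline producing varied views of an EEG image.
augment = tf.keras.Sequential([
    layers.RandomRotation(0.05),           # small random rotations
    layers.RandomTranslation(0.05, 0.05),  # slight spatial shifts
    layers.RandomContrast(0.2),            # amplitude/contrast perturbation
    layers.GaussianNoise(0.01),            # noise injection (blur-like corruption)
])

def two_views(batch):
    """Two independently augmented views of the same batch of EEG images."""
    return augment(batch, training=True), augment(batch, training=True)
```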

Figure 4: Modified Base Encoder with SE-Net. SE-blocks enhance EEG feature sensitivity via global pooling and fully connected layers.

Fig. 4 illustrates the SE block architecture, where each SE block operates in three stages as follows:

  • Squeeze: A global pooling operation aggregates information across the spatial dimensions, capturing the overall feature distribution.

  • Excitation: Two fully connected layers learn the relative importance of each channel, highlighting discriminative features.

  • Recalibration: The learned channel weights rescale the original feature maps, suppressing noise while enhancing relevant EEG signals.

By integrating SE blocks with SSL, the framework automatically refines EEG representations, improving robustness against sensor dropouts and signal artifacts.
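A compact Keras sketch of an SE block and its placement inside a residual unit is shown below. The reduction ratio of 16 follows the original SE-Net paper [28]; the surrounding residual structure is illustrative rather than the exact modified ResNet used here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=16):
    """Squeeze-and-Excitation: squeeze -> excitation -> recalibration."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                         # squeeze
    e = layers.Dense(channels // reduction, activation="relu")(s)  # excitation (FC 1)
    e = layers.Dense(channels, activation="sigmoid")(e)            # excitation (FC 2)
    e = layers.Reshape((1, 1, channels))(e)
    return layers.Multiply()([x, e])                               # recalibration

def se_residual_block(x, filters):
    """Residual unit with an SE stage after its convolutions."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = se_block(y)  # reweight channels before the residual addition
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1)(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))
```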

Following the encoder, a projection head (implemented as a multi-layer perceptron) maps features into a lower-dimensional space. This step facilitates contrastive learning by ensuring that different transformations of the same EEG segment are mapped closer together, while representations of distinct EEG segments are pushed apart.

The framework is trained using the NT-Xent loss (Normalized Temperature-scaled Cross Entropy), a contrastive loss function shown in Eq. 1 [15]. This objective maximizes similarity between augmented versions of the same EEG signal while ensuring separation from other samples. \[\label{eq_loss} \mathbb{L}_{i,j} = -\log \frac{\exp(\text{sim}(\mathbf{z}_i, \mathbf{z}_j)/\tau)}{\sum\limits_{k=1}^{2N} \mathbf{1}_{[k \neq i]} \exp(\text{sim}(\mathbf{z}_i, \mathbf{z}_k)/\tau)}\tag{1}\] where \(\mathbf{z}_i\) and \(\mathbf{z}_j\) denote latent embeddings of EEG samples, \(\text{sim}(\cdot,\cdot)\) is cosine similarity, \(\tau\) is the temperature parameter, and \(N\) is the batch size.
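For concreteness, a minimal TensorFlow sketch of Eq. 1 over a batch of positive pairs is given below, assuming `z_i` and `z_j` are the projection-head outputs of the two augmented views; the temperature value is an illustrative default, not the paper's setting.

```python
import tensorflow as tf

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent loss (Eq. 1) for a batch of N positive pairs."""
    n = tf.shape(z_i)[0]
    z = tf.math.l2_normalize(tf.concat([z_i, z_j], axis=0), axis=1)  # [2N, d]
    sim = tf.matmul(z, z, transpose_b=True) / temperature  # cosine similarities
    sim += tf.eye(2 * n) * -1e9  # mask self-similarity, i.e. the 1[k != i] term
    # The positive for sample i is its other view at index i +/- N
    labels = tf.concat([tf.range(n) + n, tf.range(n)], axis=0)
    loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, sim, from_logits=True)
    return tf.reduce_mean(loss)
```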

Through contrastive learning and channel-wise recalibration, SSL-SE-EEG enables noise-tolerant feature learning. This approach significantly enhances model robustness and improves classification performance on downstream EEG tasks. By leveraging SSL pretraining, our framework effectively learns discriminative EEG representations, even in scenarios with limited labeled data.

3.2.2 Stage 2: Validation

After pretraining comes the validation phase, in which the encoder's weights remain frozen. The learned feature representations are then used for classification on a labeled EEG dataset: a classification head is added on top of the frozen encoder and trained using categorical cross-entropy loss to assess the model's performance.
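A sketch of this validation step is shown below; `encoder` stands for the pretrained BE+SE backbone (a hypothetical handle, since the exact model object is implementation-specific).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(encoder, num_classes):
    """Frozen pretrained encoder + trainable classification head."""
    encoder.trainable = False                   # freeze the learned weights
    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = encoder(inputs, training=False)  # reuse SSL representations
    outputs = layers.Dense(num_classes, activation="softmax")(features)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # as described above
                  metrics=["accuracy"])
    return model
```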

4 Experimental Setup

We evaluate SSL-SE-EEG on multiple public EEG datasets using NVIDIA L40 GPUs. Data preprocessing was performed with Python libraries such as Pandas, NumPy, and OpenCV, while model training and evaluation were conducted with TensorFlow. Model performance was evaluated using standard metrics, accuracy and F1 score, computed with Scikit-learn. Accuracy measures the proportion of correctly predicted samples, providing a general indicator of model performance; however, on imbalanced datasets, accuracy alone may be misleading. The F1 score, the harmonic mean of precision and recall, offers a more reliable performance measure across all classes, as shown in Eq. 2.

\[\label{eq_acc} \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad F1 = \frac{2TP}{2TP + FP + FN}\tag{2}\]
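These metrics reduce to two Scikit-learn calls; the macro averaging shown here is an assumption (reasonable for the imbalanced datasets), as the averaging mode is not specified above.

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 1, 2, 0, 1]   # hypothetical ground-truth labels
y_pred = [0, 1, 0, 2, 0, 1]   # hypothetical model predictions

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")  # every class weighted equally
print(f"Accuracy: {acc:.2f}, F1: {f1:.2f}")
```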

We use the following datasets:

  1. MindBigData (MBD). This dataset comprises 2-second EEG recordings collected from commercial headsets (NeuroSky MindWave, Emotiv EPOC, Interaxon Muse, and Emotiv Insight). Participants viewed and mentally processed the digits (0–9); together with a random (no-digit) category, this forms an 11-class classification task [37].

  2. TUH EEG Abnormal Corpus (TUH-AB). This is one of the most extensive publicly available clinical EEG datasets. It contains manually labeled recordings, leading to a two-class classification task in which the classes represent normal and abnormal EEG patterns. Each recording lasts approximately 20 minutes and is captured from 21 EEG channels [38].

  3. SEED-IV. This dataset comprises EEG recordings of 15 participants watching videos designed to evoke emotions. The recordings are categorized into four distinct emotions: happy, sad, neutral, and fear [39].

  4. BCI-IV. This dataset includes EEG recordings from healthy participants engaged in motor imagery tasks. Data was continuously recorded using 59 channels of an Ag/AgCl electrode cap, as subjects envisioned moving their left hand, right hand, or foot, forming a 3-class classification task [40].

5 Evaluation and Results

In this section, we evaluate our proposed framework SSL-SE-EEG to address the following:

  1. Performance of SSL-SE-EEG: How accurately does the framework classify EEG signals on public datasets?

  2. Accuracy benefit of SE-Net: How does incorporating SE-Net enhance feature representation and improve model performance?

  3. Impact on Power Consumption with SE-Net: Does including SE-Net significantly impact power consumption, and how does this affect the feasibility of ultra-low-power wearable EEG systems?

This section is divided into three parts, each addressing one of the above questions and offering an analysis of the results and insights gained.

Table 1: Comparison with State-of-the-Art Methods on MindBigData and TUH-AB

| Paper | Preprocessing | Dataset | Architecture | Accuracy (%) |
|---|---|---|---|---|
| SSL-SE-EEG (Ours) | 2D Image Representation | MindBigData | SSL-SE-EEG | 91 |
| [41] | Spectrogram | MindBigData | CNN | 91 |
| [42] | Raw EEG (Time Series) | MindBigData | DWT + BiLSTM | 71 |
| [43] | Spectrogram | MindBigData | CNN | 86 |
| SSL-SE-EEG (Ours) | 2D Image Representation | TUH-AB | SSL-SE-EEG | 85.18 |
| [44] | Raw EEG (Time Series) | TUH-AB | 1D-CNN-RNN | 82.27 |
| [45] | Raw EEG (Time Series) | TUH-AB | SSL | 84.26 |
| [46] | Raw EEG (Time Series) | TUH-AB | LSTM + Attention | 74 |

5.1 Prediction Accuracy of SSL-SE-EEG

We evaluate the performance of our proposed framework, SSL-SE-EEG, using a two-phase approach: pretraining followed by downstream fine-tuning, as described in Section 3. We conduct experiments on two distinct EEG datasets introduced in Section 4: MBD [37], which is imbalanced with 11 classes, and TUH-AB [38], which is balanced with two classes.

Experiment 1: Pretraining on MBD. We pretrained SSL-SE-EEG on 50,000 images from the imbalanced MBD dataset for 50 epochs, then fine-tuned it on two tasks: (1) an unseen 2,500-image subset from MBD, and (2) 2,500 images from TUH-AB. The model achieved 91.12% and 89.24% accuracy, respectively, demonstrating robust feature learning from imbalanced data and across datasets.

Experiment 2: Pretraining on TUH-AB. Next, we pretrained on 50,000 images from the balanced TUH-AB dataset, then fine-tuned the model on an unseen subset of 2,500 TUH-AB images and 2,500 MBD images. This approach yielded slightly lower accuracy: 86.43% on TUH-AB and 85.18% on MBD, suggesting that despite good performance, further adaptation is needed to more effectively address class imbalance.

Insights. These experiments reveal two key insights. First, SSL-SE-EEG generalizes well across EEG datasets with varying class distributions, confirming the robustness and transferability of its learned representations. Second, even when fine-tuned on limited data, the framework achieves performance comparable to the state-of-the-art methods, as shown in Table 1. These findings underscore the efficiency and broad applicability of SSL-SE-EEG in EEG analysis tasks.

Table 2: Comparison of classification performance with and without SE using both Supervised and SSL methods.

| Dataset | Type | # Classes | Supervised, w/o SE (Acc % / F1) | Supervised, w/ SE (Acc % / F1) | SSL, w/o SE (Acc % / F1) | SSL, w/ SE (Acc % / F1) |
|---|---|---|---|---|---|---|
| MindBigData | Imbalanced | 11 | 57 / 0.45 | 72 / 0.69 | 71 / 0.64 | 84 / 0.92 |
| BCI IV | Balanced | 3 | 55 / 0.53 | 62 / 0.61 | 64 / 0.64 | 72 / 0.76 |
| SEED IV | Imbalanced | 4 | 68 / 0.65 | 78 / 0.70 | 73 / 0.72 | 83 / 0.78 |
| TUH-AB | Balanced | 2 | 62 / 0.60 | 70 / 0.69 | 68 / 0.67 | 75 / 0.75 |

Figure 5: Generalization performance of SSL-SE-EEG across datasets. The model is pretrained on MBD or TUH-AB and tested on unseen subsets, showing strong cross-dataset transfer with minimal accuracy drop.

5.2 Accuracy Benefit of SE-Net

To assess the benefit of incorporating the SE module within SSL-SE-EEG, we performed experiments on four public EEG datasets: MindBigData [37], BCI IV [40], SEED IV [39], and TUH-AB [38]. For each dataset, we used an 80%/20% split for pretraining and downstream validation, respectively. To ensure a fair comparison, all models were pretrained for 10 epochs using the 2D image representation described in Section 3, and performance was evaluated using accuracy and F1 score (Eq. 2).

Table 2 compares models with and without the SE module under both SSL-pretrained and SL conditions. Integrating the SE module consistently boosts performance across all datasets. For example, on the imbalanced MBD dataset, the SSL-pretrained model's accuracy improved from 71% to 84%, and its F1 score increased from 0.64 to 0.92 with SE. Similar improvements were observed on the balanced BCI IV and TUH-AB datasets and the imbalanced SEED IV dataset.

Insights. Performance gains can be attributed to the SE module’s ability to dynamically recalibrate channel-wise feature responses, effectively allowing the network to focus on the most informative and discriminative features. On imbalanced datasets like MindBigData and SEED IV, where minority classes are often underrepresented, this channel reweighting leads to more balanced attention across classes, thereby improving both accuracy and F1 score. In contrast, balanced datasets such as BCI IV and TUH-AB benefit from the SE module through enhanced feature extraction and noise suppression, which results in more robust classification even under varying conditions. These improvements are evident regardless of whether SSL pre-training is used, underscoring the broad applicability of the SE module and its crucial role in enhancing EEG signal analysis.

5.3 Impact of SE-Net on Power Consumption

Figure 6: GPU power and inference time comparison for SSL-SE-EEG. SE-Net slightly increases power (\(\leq\)0.4%) and inference time (\(\leq\)1 ms) while enhancing feature extraction.

Next, we examine how adding the SE module affects GPU power consumption and energy use. We evaluated a single datapoint from the TUH-AB dataset on one L40 GPU and estimated power using NVIDIA's NVML library. Fig. 6 illustrates the average GPU power consumption and total energy use under two learning paradigms, SSL-pretrained and SL, comparing setups with and without SE-Net.
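A minimal sketch of this measurement via the pynvml bindings is shown below; the sampling loop around inference and the averaging strategy are omitted and would be implementation-specific.

```python
import pynvml  # Python bindings for NVIDIA's NVML library

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # the single L40 GPU
# Instantaneous board power draw, reported in milliwatts; in practice this
# is sampled repeatedly during inference and averaged.
power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)
print(f"GPU power draw: {power_mw / 1000:.1f} W")
pynvml.nvmlShutdown()
```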

In the SL setup, adding SE-Nets leads to a 0.15% increase in average GPU power; with SSL pretraining, adding SE raises the average GPU power by 0.4%. These slight increases are attributable to the lightweight nature of global average pooling and simple channel-wise recalibration. The difference in power consumption between SL and SSL stems mainly from their underlying computational methodology rather than the SE module itself: in SL, each sample is processed once through a single ResNet-based CNN, whereas SSL processes multiple augmented views of the same image (see Fig. 2), increasing the number of forward passes per sample. As a result, the same SE module causes a relatively larger increase in power consumption in SSL due to the higher number of computations per input.

Insights. From a broader perspective, the minor increase in power consumption and runtime relative to non-SE implementations shows that SE-Nets add little overhead while improving feature extraction, yielding more robust representations, more accurate predictions, and better generalization. When deployed on ML-optimized hardware such as [47], which operates at an efficiency of 0.3–2.6 TOPS/W, our proposed SSL-SE-EEG framework would consume as little as 2.96 mW in the best case and up to 25.67 mW in the worst case, well within the power constraints of mobile and wearable devices. Furthermore, quantization and dedicated hardware can further optimize power efficiency while maintaining performance [12]. Future work will explore these hardware-aware optimizations to enhance the practicality of SSL-SE-EEG in real-world low-power applications.
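For reference, the quoted endpoints are consistent with a fixed inference workload of roughly 7.7 GOP/s (a value we infer from the numbers above rather than one stated explicitly), with power scaling inversely with hardware efficiency:

\[P = \frac{\text{throughput}}{\text{efficiency}}: \quad \frac{7.7\times10^{9}\ \text{OP/s}}{2.6\times10^{12}\ \text{OP/s/W}} \approx 2.96\ \text{mW}, \qquad \frac{7.7\times10^{9}\ \text{OP/s}}{0.3\times10^{12}\ \text{OP/s/W}} \approx 25.67\ \text{mW}\]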

6 Conclusion

This work introduces SSL-SE-EEG, a novel framework that leverages SSL to reduce dependency on labeled data and uses SE-Nets to enhance feature selection and noise suppression. Our approach achieves 91% accuracy on MindBigData and 85.18% on TUH-AB. We also show experimentally that SE-Net integration improves classification accuracy by up to 15% across public datasets while keeping the power overhead low (\(\leq\)0.4% relative to no SE-Net). These results demonstrate that SSL-SE-EEG enables robust, scalable EEG-based interfaces, making it well-suited for real-time cognitive state monitoring, BCIs, and neurorehabilitation applications. Future work will focus on further optimizing real-time deployment for low-power wearable EEG systems.

References

[1]
M.-P. Hosseini, A. Hosseini, and K. Ahi, “A review on machine learning for eeg signal processing in bioengineering,” IEEE reviews in biomedical engineering, vol. 14, pp. 204–218, 2020.
[2]
A. Puce and M. S. Hämäläinen, “A review of issues related to data acquisition and analysis in eeg/meg studies,” Brain sciences, vol. 7, no. 6, p. 58, 2017.
[3]
C. Gondran, E. Siebert, S. Yacoub, and E. Novakov, “Noise of surface bio-potential electrodes based on nasicon ceramic and ag- agcl,” Medical and Biological Engineering and Computing, vol. 34, pp. 460–466, 1996.
[4]
M. C. Guerrero, J. S. Parada, and H. E. Espitia, “Eeg signal analysis using classification techniques: Logistic regression, artificial neural networks, support vector machines, and convolutional neural networks,” Heliyon, vol. 7, no. 6, 2021.
[5]
M. Saeidi, W. Karwowski, F. V. Farahani, K. Fiok, R. Taiar, P. A. Hancock, and A. Al-Juaid, “Neural decoding of eeg signals with machine learning: a systematic review,” Brain sciences, vol. 11, no. 11, p. 1525, 2021.
[6]
M. S. Nafea and Z. H. Ismail, “Supervised machine learning and deep learning techniques for epileptic seizure recognition using eeg signals—a systematic literature review,” Bioengineering, vol. 9, no. 12, p. 781, 2022.
[7]
Bitbrain Team, “How deep learning is changing machine learning AI in EEG data processing,” bitbrain.com, 2020.
[8]
A. Jaiswal, A. R. Babu, M. Z. Zadeh, D. Banerjee, and F. Makedon, “A survey on contrastive self-supervised learning,” Technologies, vol. 9, no. 1, p. 2, 2020.
[9]
Y. Liu and G. Fu, “Emotion recognition by deeply learned multi-channel textual and eeg features,” Future Generation Computer Systems, vol. 119, pp. 1–6, 2021.
[10]
M. D. Kohan, A. M. Nasrabadi, M. B. Shamsollahi, et al., “Interview based connectivity analysis of eeg in order to detect deception,” Medical hypotheses, vol. 136, p. 109517, 2020.
[11]
J. Gui, T. Chen, J. Zhang, Q. Cao, Z. Sun, H. Luo, and D. Tao, “A survey on self-supervised learning: Algorithms, applications, and future trends,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[12]
M. R. Chowdhury, A. Ghosh, M. F. Bari, and S. Sen, “Leveraging ultra-low-power wearables using distributed neural networks,” in 2024 IEEE 20th International Conference on Body Sensor Networks (BSN), pp. 1–4, IEEE, 2024.
[13]
B. Chatterjee, P. Mohseni, and S. Sen, “Bioelectronic sensor nodes for the internet of bodies,” Annual Review of Biomedical Engineering, vol. 25, no. 1, pp. 101–129, 2023.
[14]
S. Sen and A. Datta, “Human-inspired distributed wearable ai,” in Proceedings of the 61st ACM/IEEE Design Automation Conference, pp. 1–4, 2024.
[15]
T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International conference on machine learning, pp. 1597–1607, PMLR, 2020.
[16]
K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738, 2020.
[17]
K. Ohri and M. Kumar, “Review on self-supervised image recognition using deep neural networks,” Knowledge-Based Systems, vol. 224, p. 107090, 2021.
[18]
S. Ramesh, V. Srivastav, D. Alapatt, T. Yu, A. Murali, L. Sestini, C. I. Nwoye, I. Hamoud, S. Sharma, A. Fleurentin, et al., “Dissecting self-supervised learning methods for surgical computer vision,” Medical Image Analysis, vol. 88, p. 102844, 2023.
[19]
H. Hojjati, T. K. K. Ho, and N. Armanfard, “Self-supervised anomaly detection in computer vision and beyond: A survey and outlook,” Neural Networks, vol. 172, p. 106106, 2024.
[20]
A. Elnaggar, M. Heinzinger, C. Dallago, G. Rehawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger, et al., “Prottrans: Toward understanding the language of life through self-supervised learning,” IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 10, pp. 7112–7127, 2021.
[21]
A. Baevski, A. Babu, W.-N. Hsu, and M. Auli, “Efficient self-supervised learning with contextualized target representations for vision, speech and language,” in International Conference on Machine Learning, pp. 1416–1429, PMLR, 2023.
[22]
E. Morais, R. Hoory, W. Zhu, I. Gat, M. Damasceno, and H. Aronowitz, “Speech emotion recognition using self-supervised features,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6922–6926, IEEE, 2022.
[23]
S. Atito, S. M. Anwar, M. Awais, and J. Kittler, “Sb-ssl: Slice-based self-supervised transformers for knee abnormality classification from mri,” in Workshop on Medical Image Learning with Limited and Noisy Data, pp. 86–95, Springer, 2022.
[24]
Z. Tan, Y. Yu, J. Meng, S. Liu, and W. Li, “Self-supervised learning with self-distillation on covid-19 medical image classification,” Computer Methods and Programs in Biomedicine, vol. 243, p. 107876, 2024.
[25]
Z. Huang, R. Jiang, S. Aeron, and M. C. Hughes, “Systematic comparison of semi-supervised and self-supervised learning for medical image classification,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22282–22293, 2024.
[26]
M. N. Mohsenvand, M. R. Izadi, and P. Maes, “Contrastive representation learning for electroencephalogram classification,” in Machine Learning for Health, pp. 238–253, PMLR, 2020.
[27]
H. Banville, O. Chehab, A. Hyvärinen, D.-A. Engemann, and A. Gramfort, “Uncovering the structure of clinical eeg signals with self-supervised learning,” Journal of Neural Engineering, vol. 18, no. 4, p. 046020, 2021.
[28]
J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141, 2018.
[29]
P. Sun, X. Niu, P. Sun, and K. Xu, “Squeeze-and-excitation network-based radar object detection with weighted location fusion,” in Proceedings of the 2021 International Conference on Multimedia Retrieval, pp. 545–552, 2021.
[30]
Z. Xu, X. Hong, T. Chen, Z. Yang, and Y. Shi, “Scale-aware squeeze-and-excitation for lightweight object detection,” IEEE Robotics and Automation Letters, vol. 8, no. 1, pp. 49–56, 2022.
[31]
J. Wang, Z. Luan, Z. Yu, J. Ren, J. Gao, K. Yuan, and H. Xu, “Superpixel segmentation with squeeze-and-excitation networks,” Signal, Image and Video Processing, pp. 1–8, 2022.
[32]
R. Ge, T. Shen, Y. Zhou, C. Liu, L. Zhang, B. Yang, Y. Yan, J.-L. Coatrieux, and Y. Chen, “Convolutional squeeze-and-excitation network for ecg arrhythmia detection,” Artificial Intelligence in Medicine, vol. 121, p. 102181, 2021.
[33]
L. Xiong, C. Yi, Q. Xiong, and S. Jiang, “Sea-net: medical image segmentation network based on spiral squeeze-and-excitation and attention modules,” BMC Medical Imaging, vol. 24, no. 1, p. 17, 2024.
[34]
X. Li, Y. Wei, L. Wang, S. Fu, and C. Wang, “Msgse-net: Multi-scale guided squeeze-and-excitation network for subcortical brain structure segmentation,” Neurocomputing, vol. 461, pp. 228–243, 2021.
[35]
M. Hayat, “Squeeze & excitation joint with combined channel and spatial attention for pathology image super-resolution,” Franklin Open, vol. 8, p. 100170, 2024.
[36]
M. F. Bari, B. Chatterjee, L. Duncan, and S. Sen, “Rf-psf: A cnn-based process distinction method using inadvertent rf signatures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 11, pp. 4233–4245, 2023.
[37]
“MindBigData: The MNIST of brain digits.”
[38]
I. Obeid and J. Picone, “The temple university hospital eeg data corpus,” Frontiers in neuroscience, vol. 10, p. 196, 2016.
[39]
“SEED dataset.”
[40]
B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, and G. Curio, “The non-invasive berlin brain–computer interface: fast acquisition of effective performance in untrained subjects,” NeuroImage, vol. 37, no. 2, pp. 539–550, 2007.
[41]
N. Kumari, S. Anwar, and V. Bhattacharjee, “Convolutional neural network-based visually evoked eeg classification model on mindbigdata,” in Proceedings of Research and Applications in Artificial Intelligence: RAAI 2020, pp. 233–241, Springer, 2021.
[42]
N. C. Mahapatra and P. Bhuyan, “Eeg-based classification of imagined digits using a recurrent neural network,” Journal of Neural Engineering, vol. 20, no. 2, p. 026040, 2023.
[43]
S. Falciglia, F. Betello, S. Russo, and C. Napoli, “Learning visual stimulus-evoked eeg manifold for neural image classification,” Neurocomputing, vol. 588, p. 127654, 2024.
[44]
S. Roy, I. Kiral-Kornek, and S. Harrer, “Deep learning enabled automatic abnormal eeg identification,” in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2756–2759, IEEE, 2018.
[45]
S. H. Kamsvåg and O. Størmer, “Exploring eeg self-supervised learning through channel grouping,” Master’s thesis, NTNU, 2023.
[46]
L. Soccol, “Attention-based eeg classification,” Master’s thesis, Padua, 2022.
[47]
B. Moons and M. Verhelst, “A 0.3–2.6 tops/w precision-scalable processor for real-time large-scale convnets,” in 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), pp. 1–2, IEEE, 2016.

  1. Meghna Roy Chowdhury, Yi Ding, and Shreyas Sen are with the Elmore Family School of Electrical and Computer Engineering, Purdue University, USA. {mroycho,yiding,shreyas}@purdue.edu