Pan-cancer classification using transcriptomic (RNA-Seq) data can inform tumor subtyping and therapy selection, but is challenging due to extremely high dimensionality and limited sample sizes. In this study, we propose a novel deep learning framework that uses a class-conditional variational autoencoder (cVAE) to augment training data for pan-cancer gene expression classification. Using 801 tumor RNA-Seq samples spanning 5 cancer types from The Cancer Genome Atlas (TCGA), we first perform feature selection to reduce 20,531 gene expression features to the 500 most variably expressed genes. A cVAE is then trained on this data to learn a latent representation of gene expression conditioned on cancer type, enabling the generation of synthetic gene expression samples for each tumor class. We augment the training set with these cVAE-generated samples (doubling the dataset size) to mitigate overfitting and class imbalance. A two-layer multilayer perceptron (MLP) classifier is subsequently trained on the augmented dataset to predict tumor type. The augmented framework achieves high classification accuracy (~98%) on a held-out test set, substantially outperforming a classifier trained on the original data alone. We present detailed experimental results, including VAE training curves, classifier performance metrics (ROC curves and confusion matrix), and architecture diagrams to illustrate the approach. The results demonstrate that cVAE-based synthetic augmentation can significantly improve pan-cancer prediction performance, especially for underrepresented cancer classes.
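As an illustration of the augmentation pipeline described above, the following minimal PyTorch-style sketch shows a class-conditional VAE that generates synthetic expression profiles and a two-layer MLP classifier. The layer widths, latent dimension, and names (CVAE, N_GENES, LATENT) are illustrative assumptions, not the exact architecture used in the study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_GENES, N_CLASSES, LATENT = 500, 5, 32  # assumed dimensions

class CVAE(nn.Module):
    """Class-conditional VAE: encoder and decoder both see a one-hot cancer-type label."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(N_GENES + N_CLASSES, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT)
        self.logvar = nn.Linear(256, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT + N_CLASSES, 256), nn.ReLU(),
                                 nn.Linear(256, N_GENES))

    def forward(self, x, y_onehot):
        h = self.enc(torch.cat([x, y_onehot], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(torch.cat([z, y_onehot], dim=1)), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction error plus KL divergence to the standard-normal prior.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld

def generate(model, y_onehot, n):
    # y_onehot has shape (1, N_CLASSES); decode n latent samples into synthetic profiles.
    z = torch.randn(n, LATENT)
    return model.dec(torch.cat([z, y_onehot.expand(n, -1)], dim=1)).detach()

# Two-layer MLP trained on real + synthetic samples to predict the tumor type.
classifier = nn.Sequential(nn.Linear(N_GENES, 128), nn.ReLU(),
                           nn.Linear(128, N_CLASSES))
```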
The human brain is organized as a complex network, where connections between regions are characterized by both functional connectivity (FC) and structural connectivity (SC). While previous studies have primarily focused on network-level FC-SC correlations (i.e., the correlation between FC and SC across all edges within a predefined network), edge-level correlations (i.e., the correlation between FC and SC across subjects at each edge) have received comparatively little attention. In this study, we systematically analyze both network-level and edge-level FC-SC correlations, demonstrating that they lead to divergent conclusions about the strength of brain function-structure association. To explain these discrepancies, we introduce new random effects models that decompose FC and SC variability into different sources: subject effects, edge effects, and their interactions. Our results reveal that network-level and edge-level FC-SC correlations are influenced by different effects, each contributing differently to the total variability in FC and SC. This modeling framework provides the first statistical approach for disentangling and quantitatively assessing different sources of FC and SC variability and yields new insights into the relationship between functional and structural brain networks.
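For concreteness, one plausible form of the kind of variance decomposition described above (a sketch in our notation, not necessarily the authors' exact specification) is

\[
\mathrm{FC}_{ie} = \mu^{F} + a^{F}_{i} + b^{F}_{e} + c^{F}_{ie} + \varepsilon^{F}_{ie},
\qquad
\mathrm{SC}_{ie} = \mu^{S} + a^{S}_{i} + b^{S}_{e} + c^{S}_{ie} + \varepsilon^{S}_{ie},
\]

where \(i\) indexes subjects and \(e\) indexes edges, \(a\) denotes random subject effects, \(b\) random edge effects, \(c\) subject-by-edge interactions, and \(\varepsilon\) residual noise. Network-level and edge-level FC-SC correlations then depend on different subsets of these variance components, which is why the two measures can diverge.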
This paper introduces a nonstandard finite difference (NSFD) approach to a reaction-diffusion SEIQR epidemiological model, which captures the spatiotemporal dynamics of infectious disease transmission. Formulated as a system of semilinear parabolic partial differential equations (PDEs), the model extends classical compartmental models by incorporating spatial diffusion to account for population movement and spatial heterogeneity. The proposed NSFD discretization is designed to preserve the continuous model's essential qualitative features, such as positivity, boundedness, and stability, which are often compromised by standard finite difference methods. We rigorously analyze the model's well-posedness, construct a structure-preserving NSFD scheme for the PDE system, and study its convergence and local truncation error. Numerical simulations validate the theoretical findings and demonstrate the scheme's effectiveness in preserving biologically consistent dynamics.
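To illustrate the flavor of such a construction (a generic positivity-preserving NSFD update in our own notation, not the specific scheme developed in the paper), consider a single compartment obeying \(u_t = D u_{xx} + f^{+}(u) - g(u)\,u\) with production \(f^{+}\ge 0\) and loss rate \(g\ge 0\). A typical NSFD update reads

\[
\frac{u^{n+1}_{j} - u^{n}_{j}}{\varphi(\Delta t)}
= D\,\frac{u^{n}_{j+1} - 2u^{n+1}_{j} + u^{n}_{j-1}}{\psi(\Delta x)^{2}}
+ f^{+}\!\left(u^{n}_{j}\right) - g\!\left(u^{n}_{j}\right) u^{n+1}_{j},
\]

with denominator functions \(\varphi(\Delta t) = \Delta t + O(\Delta t^{2})\) and \(\psi(\Delta x)^{2} = \Delta x^{2} + O(\Delta x^{4})\), and with the loss term evaluated at the new time level. Solving for \(u^{n+1}_{j}\) yields a ratio of non-negative quantities, so positivity of the numerical solution is preserved for any step size, which standard explicit finite differences cannot guarantee.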
Comprehending the nature of action potentials is fundamental to our understanding of the functioning of nervous systems in general. The ionic mechanisms underlying action potentials in the squid giant axon were first described by Hodgkin and Huxley in 1952, and their findings have formed our orthodox view of how the physiological action potential functions. However, substantial evidence has now accumulated to show that the action potential is accompanied by a synchronized coupled soliton pressure pulse in the cell membrane, the action potential pulse (APPulse), which we have recently shown to have an essential function in computation. Here we explore the interactions between the soliton and the ionic mechanisms known to be associated with the action potential. Computational models of the action potential usually describe it as a binary event, but we have shown that it must be a quantum ternary event known as the computational action potential (CAP), whose temporal fixed point is the threshold of the soliton, rather than the more plastic action potential peak used in other models, to facilitate meaningful computation. We have demonstrated this type of frequency computation for the retina in detail, and have also provided an extensive analysis of computation for other brain neural networks. The CAP accompanies the APPulse and the physiological action potential. Therefore, we conclude that nerve impulses appear to be an ensemble of three inseparable, interdependent, concurrent states: the physiological action potential, the APPulse and the CAP. However, while the physiological action potential is important in terms of neural connectivity, it is irrelevant to computational processes, as these are always facilitated by the soliton part of the APPulse.
A common assumption in evolutionary thought is that adaptation drives an increase in biological complexity. However, the rules governing the evolution of complexity appear more nuanced. Evolution is deeply connected to learning, where complexity is much better understood, with established results on the optimal complexity appropriate for a given learning task. In this work, we suggest a mathematical framework for studying the relationship between evolved organismal complexity and environmental complexity by leveraging a mathematical isomorphism between evolutionary dynamics and learning theory, namely between the replicator equation and sequential Bayesian learning, with evolving types corresponding to competing hypotheses and fitness in a given environment to the likelihood of observed evidence. In Bayesian learning, implicit regularization prevents overfitting and drives the inference of hypotheses whose complexity matches the learning challenge. We show how these results naturally carry over to the evolutionary setting, where they are interpreted as organism complexity evolving to match the complexity of the environment, with too complex or too simple organisms suffering from overfitness and underfitness, respectively. Other aspects, peculiar to evolution and not to learning, reveal additional trends. One such trend is that frequently changing environments decrease selected complexity, a result with potential implications for both evolution and learning. Together, our results suggest that the balance between over-adaptation to transient environmental features and insufficient flexibility in responding to environmental challenges drives the emergence of optimal complexity, reflecting environmental structure. This framework offers new ways of thinking about biological complexity, suggesting new potential causes for it to increase or decrease in different environments.
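The isomorphism invoked here is the standard correspondence between discrete-time replicator dynamics and Bayes' rule, written below in our notation:

\[
x_i(t+1) = \frac{f_i\, x_i(t)}{\sum_j f_j\, x_j(t)}
\qquad\longleftrightarrow\qquad
P(h_i \mid D) = \frac{P(D \mid h_i)\, P(h_i)}{\sum_j P(D \mid h_j)\, P(h_j)},
\]

so that type frequencies \(x_i\) play the role of posterior probabilities over hypotheses \(h_i\), and fitness \(f_i\) in the current environment plays the role of the likelihood of the observed evidence \(D\); iterating the update over generations corresponds to sequential Bayesian learning from a stream of observations.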
Polyphenols and proteins are essential biomolecules that influence food functionality and, by extension, human health. Their interactions -- hereafter referred to as PhPIs (polyphenol-protein interactions) -- affect key processes such as nutrient bioavailability, antioxidant activity, and therapeutic efficacy. However, these interactions remain challenging to characterize due to the structural diversity of polyphenols and the dynamic nature of protein binding. Traditional experimental techniques like nuclear magnetic resonance (NMR) and mass spectrometry (MS), along with computational tools such as molecular docking and molecular dynamics (MD), have offered important insights but face constraints in scalability, throughput, and reproducibility. This review explores how deep learning (DL) is reshaping the study of PhPIs by enabling efficient prediction of binding sites, interaction affinities, and MD-scale dynamics using high-dimensional bio- and chem-informatics data. While DL enhances prediction accuracy and reduces experimental redundancy, its effectiveness remains limited by data availability, quality, and representativeness, particularly in the context of natural products. We critically assess current DL frameworks for PhPI analysis and outline future directions, including multimodal data integration, improved model generalizability, and the development of domain-specific benchmark datasets. This synthesis offers guidance for researchers aiming to apply DL in unraveling structure-function relationships of polyphenols, accelerating discovery in nutritional science and therapeutic development.
We present a mathematical model to study the transmission dynamics of soil-transmitted helminth (STH) infections and to assess the impact of community-based water, sanitation, and hygiene (WASH) program interventions. STH infections are a pressing public health issue in vulnerable populations, impairing children's growth and development. Our model explicitly incorporates WASH coverage and effectiveness as dynamic parameters, enabling analysis of their effects on the basic and effective reproduction numbers and the stability of disease-free and endemic equilibria. Through saddle-node bifurcation analysis, we identify the critical thresholds in the intervention parameters necessary for infection elimination. Numerical simulations illustrate these thresholds and delineate the conditions under which WASH interventions alone may or may not suffice to eliminate transmission, even under widespread coverage. Our findings provide a mathematical framework to optimize helminth control strategies and offer evidence-based insights for public health policies aligned with the World Health Organization's Global Strategy for the elimination of STH infections by 2030.
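As a purely schematic illustration of how such thresholds arise (a simple relation in our own notation, not the model's actual expression), if WASH coverage \(c\) and per-contact effectiveness \(\varepsilon\) scaled transmission down multiplicatively, the effective reproduction number would take the form

\[
\mathcal{R}_{\mathrm{eff}} = (1 - \varepsilon c)\, \mathcal{R}_{0},
\]

and elimination would require \(\mathcal{R}_{\mathrm{eff}} < 1\), i.e. \(\varepsilon c > 1 - 1/\mathcal{R}_{0}\); when \(\mathcal{R}_{0} > 1/(1-\varepsilon)\), even full coverage (\(c = 1\)) cannot cross the threshold, which is the sense in which WASH interventions alone may not suffice.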
We use unmanned aerial vehicles (drones) to estimate wildlife density in southeastern Austria and compare these estimates to camera trap data. Traditional methods like capture-recapture, distance sampling, or camera traps are well-established but labour-intensive or spatially constrained. Using thermal (IR) and RGB imagery, drones enable efficient, non-intrusive animal counting. Our surveys were conducted during the leafless period on single days in October and November 2024 in three areas of a sub-Illyrian hill and terrace landscape. Flight transects were based on predefined launch points using a 350 m grid and an algorithm that defined the direction of systematically randomized transects. This setup allowed surveying large areas in one day using multiple drones, minimizing double counts. Flight altitude was set at 60 m to avoid disturbing roe deer (Capreolus capreolus) while ensuring detection. Animals were manually annotated in the recorded imagery and extrapolated to densities per square kilometer. We applied three extrapolation methods with increasing complexity: naive area-based extrapolation, bootstrapping, and zero-inflated negative binomial modelling. For comparison, a Random Encounter Model (REM) estimate was calculated using camera trap data from the flight period. The drone-based methods yielded similar results, generally showing higher densities than REM, except in one area in October. We hypothesize that drone-based density reflects daytime activity in open and forested areas, while REM estimates average activity over longer periods within forested zones. Although both approaches estimate density, they offer different perspectives on wildlife presence. Our results show that drones offer a promising, scalable method for wildlife density estimation.
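For reference, the REM density estimator used for the camera-trap comparison has the standard form (assuming the usual parameterization; implementation details may differ):

\[
\hat{D} = \frac{y}{t}\cdot\frac{\pi}{v\, r\,(2 + \theta)},
\]

where \(y\) is the number of independent detections, \(t\) the camera-trap survey effort, \(v\) the animals' average speed of movement (day range), \(r\) the effective detection radius, and \(\theta\) the detection arc angle in radians.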
Smart agriculture applications, integrating technologies like the Internet of Things and machine learning/artificial intelligence (ML/AI) into agriculture, hold promise to address modern challenges of rising food demand, environmental pollution, and water scarcity. Alongside the concept of the phytobiome, which defines the area including the plant, its environment, and associated organisms, and the recent emergence of molecular communication (MC), there exists an important opportunity to advance agricultural science and practice using communication theory. In this article, we motivate to use the communication engineering perspective for developing a holistic understanding of the phytobiome communication and bridge the gap between the phytobiome communication and smart agriculture. Firstly, an overview of phytobiome communication via molecular and electrophysiological signals is presented and a multi-scale framework modeling the phytobiome as a communication network is conceptualized. Then, how this framework is used to model electrophysiological signals is demonstrated with plant experiments. Furthermore, possible smart agriculture applications, such as smart irrigation and targeted delivery of agrochemicals, through engineering the phytobiome communication are proposed. These applications merge ML/AI methods with the Internet of Bio-Nano-Things enabled by MC and pave the way towards more efficient, sustainable, and eco-friendly agricultural production. Finally, the implementation challenges, open research issues, and industrial outlook for these applications are discussed.
Extensive research has been done on feature selection (FS) algorithms for high-dimensional datasets, aiming to improve model performance, reduce computational cost, and identify features of interest. We test the null hypothesis that randomly selected features perform as well as features selected by FS algorithms, in order to validate the performance of the latter. Our results show that FS on high-dimensional datasets (in particular gene expression) in classification tasks is not useful. We find that (1) models trained on small subsets (0.02%-1% of all features) of randomly selected features almost always perform comparably to those trained on all features, and (2) a "typical"-sized random subset provides comparable or superior performance to that of the top-k features selected in various published studies. Thus, our work challenges many feature selection results on high-dimensional datasets, particularly in computational genomics. It raises serious concerns about studies that propose drug design or targeted interventions based on computationally selected genes, without further validation in a wet lab.
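The random-baseline comparison can be reproduced in a few lines of code; the sketch below (with placeholder data, an illustrative classifier, and assumed parameter values, not the study's exact protocol) repeatedly trains on a small random subset of features and reports the mean cross-validated accuracy to be compared against an all-features reference.

```python
# Minimal sketch of the random-feature baseline (illustrative parameters).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def random_subset_score(X, y, frac=0.01, n_repeats=20, seed=0):
    """Mean 5-fold CV accuracy over classifiers trained on random subsets of frac*d features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(frac * d))
    scores = []
    for _ in range(n_repeats):
        cols = rng.choice(d, size=k, replace=False)  # a random feature subset
        clf = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(clf, X[:, cols], y, cv=5).mean())
    return float(np.mean(scores))

# Reference trained on all features, for comparison:
# ref = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
```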
Biological evolution is realised through the same mechanisms of birth and death that underlie change in population density. The deep interdependence between ecology and evolution is well-established, and recent models focus on integrating eco-evolutionary dynamics to demonstrate how ecological and evolutionary processes interact and feed back upon each other. Nevertheless, a gap remains between the logical foundations of ecology and evolution. Population ecology and evolution have fundamental equations that define how the size of a population (ecology) and the average characteristic within a population (evolution) change over time. These fundamental equations are a complete and exact description of change for any closed population, but how they are formally linked remains unclear. We link the fundamental equations of population ecology and evolution with an equation that sums how individual characteristics interact with individual fitness in a population. From this equation, we derive the fundamental equations of population ecology and evolutionary biology (the Price equation). We thereby identify an overlooked bridge between ecology and biological evolution. Our unification formally recovers the equivalence between mean population growth rate and evolutionary fitness and links this change to ecosystem function. We outline how our framework can be used to further develop eco-evolutionary theory.
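The fundamental evolutionary equation referred to here is the Price equation, which in its standard form reads

\[
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left[w_i\,\Delta z_i\right],
\]

where \(z_i\) and \(w_i\) are the character value and fitness of type \(i\), \(\bar{z}\) and \(\bar{w}\) the population means, and \(\Delta\) the change between parent and offspring generations. Since mean fitness \(\bar{w} = N_{t+1}/N_t\) is exactly the per-generation population growth rate, the same individual-level bookkeeping that yields the Price equation also yields the fundamental equation of population ecology, which is the equivalence the unification above makes explicit.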
In the context of clinical research, computational models have received increasing attention over the past decades. In this systematic review, we aimed to provide an overview of the role of so-called in silico clinical trials (ISCTs) in medical applications. As an example from the broad field of clinical medicine, we focused on in silico (IS) methods applied in drug development, sometimes also referred to as model-informed drug development (MIDD). We searched PubMed and this http URL for published articles and registered clinical trials related to ISCTs. We identified 202 articles and 48 trials, and of these, 76 articles and 19 trials were directly linked to drug development. We extracted information from all 202 articles and 48 clinical trials and conducted a more detailed review of the methods used in the 76 articles connected to drug development. Regarding application, most articles and trials focused on cancer and imaging-related research, while rare and pediatric diseases were addressed in only 14 articles and 5 trials, respectively. While some models were informed by combining mechanistic knowledge with clinical or preclinical (in vivo or in vitro) data, the majority of models were fully data-driven, illustrating that clinical data are a crucial part of the process of generating synthetic data in ISCTs. Regarding reproducibility, a more detailed analysis revealed that only 24% (18 out of 76) of the articles provided an open-source implementation of the applied models, and in only 20% of the articles were the generated synthetic data publicly available. Despite the widely expressed interest, we also found that it is still uncommon for ISCTs to be part of a registered clinical trial, and that their application is restricted to specific diseases, leaving the potential benefits of ISCTs not fully exploited.
The aim of this paper is to present an admittedly somewhat subjective bird's-eye view of the mathematical theory concerning the spread of an infectious disease in a susceptible host population with static structure, culminating in a future-oriented description of various modelling challenges.
Understanding the stoichiometry and associated stability of virus-like particles (VLPs) is crucial for optimizing their assembly efficiency and immunogenic properties, which are essential for advancing biotechnology, vaccine design, and drug delivery. However, current experimental methods for determining VLP stoichiometry are labor-intensive and time-consuming. Machine learning approaches have hardly been applied to the study of VLPs. To address this challenge, we introduce a novel persistent Laplacian-based machine learning (PLML) model that leverages both harmonic and non-harmonic spectra to capture intricate topological and geometric features of VLP structures. This approach achieves superior performance on the VLP200 dataset compared to existing methods. To further assess robustness and generalizability, we collected a new dataset, VLP706, containing 706 VLP samples with expanded stoichiometry diversity. Our PLML model maintains strong predictive accuracy on VLP706. Additionally, through random sequence perturbative mutation analysis, we found that 60-mers and 180-mers exhibit greater stability than 240-mers and 420-mers.
Biological systems are remarkably susceptible to relatively small temperature changes. The most obvious example is fever, when a modest rise in body temperature of only a few kelvin has strong effects on our immune system and how it fights pathogens. Another very important example is climate change, where even smaller temperature changes lead to dramatic shifts in ecosystems. Although it is generally accepted that the main effect of an increase in temperature is the acceleration of biochemical reactions according to the Arrhenius equation, it is not clear how it affects large biochemical networks with complicated architectures. For developmental systems like fly and frog, it has been shown that the system response to temperature deviates in a characteristic manner from the linear Arrhenius plot of single reactions, but a rigorous explanation has not been given yet. Here we use a graph-theoretical interpretation of the mean first-passage times of a biochemical master equation to give a statistical description. We find that in the limit of large system size, and if the network has a bias towards a target state, the Arrhenius plot is generically quadratic, in excellent agreement with numerical simulations for large networks as well as with experimental data for developmental times in fly and frog. We also discuss under which conditions this generic response can be violated, for example for linear chains, which have only one spanning tree.
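Schematically, and in our notation: a single reaction follows the Arrhenius law, so its rate is log-linear in inverse temperature, whereas the network-level result stated above means that the Arrhenius plot of, e.g., a developmental time (a mean first-passage time) acquires a quadratic correction,

\[
k(T) = A\, e^{-E_a/k_B T}
\quad\Longrightarrow\quad
\ln \tau(T) \;\approx\; c_0 + \frac{c_1}{k_B T} + \frac{c_2}{(k_B T)^{2}},
\]

i.e. \(\ln k\) of an individual reaction is linear in \(1/T\), while \(\ln \tau\) of the full network is generically quadratic in \(1/T\).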
Collective turns in starling flocks propagate linearly with negligible attenuation, indicating the existence of an underdamped sector in the dispersion relation. Besides granting linear propagation of the phase perturbations, the real part of the frequency should also yield a spin-wave form of the unperturbed correlation function. However, new high-resolution experiments on real flocks show that underdamped traveling waves coexist with an overdamped Lorentzian correlation. Theory and experiments are reconciled once we add to the dynamics a Fermi-Pasta-Ulam-Tsingou term.