Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an accuracy penalty because of precision loss. To mitigate this penalty, we present a novel AMS-compatible adaptive block floating-point (ABFP) number representation. We also introduce amplification (or gain) as a method for increasing the accuracy of the number representation without increasing the bit precision of the output. We evaluate the effectiveness of ABFP on the DNNs in the MLPerf datacenter inference benchmark -- realizing less than $1\%$ loss in accuracy compared to FLOAT32. We also propose a novel method of finetuning for AMS devices, Differential Noise Finetuning (DNF), which samples device noise to speed up finetuning compared to conventional Quantization-Aware Training.
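
The two key ideas can be illustrated with a small numerical sketch (our own simplification, not the paper's exact ABFP scheme): a block of values shares one exponent while keeping fixed-point mantissas, and applying gain before a fixed-range quantizer (modeling the ADC) shrinks the effective quantization step for small signals without adding output bits.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Block floating point: one shared exponent per block, fixed-point mantissas."""
    block = np.asarray(block, dtype=float)
    amax = np.max(np.abs(block))
    if amax == 0:
        return block.copy()
    exp = np.floor(np.log2(amax)) + 1            # shared exponent covering the block
    scale = 2.0 ** exp / 2 ** (mantissa_bits - 1)
    q = np.clip(np.round(block / scale),
                -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1)
    return q * scale

def adc_quantize(x, bits=8, full_scale=1.0, gain=1.0):
    """Model a fixed-range ADC: amplify by `gain`, quantize, divide the gain out."""
    step = 2 * full_scale / 2 ** bits
    y = np.clip(np.asarray(x, dtype=float) * gain, -full_scale, full_scale - step)
    return np.round(y / step) * step / gain
```

For a signal occupying a tenth of the ADC range, a gain of 4 cuts the worst-case quantization error by roughly a factor of 4, which mirrors the role amplification plays in ABFP.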

User reviews contain a wealth of information, yet this source has been largely neglected by existing recommendation systems, despite its potential to alleviate the sparsity problem and improve recommendation quality. This work presents a deep model that jointly learns item properties and user behaviour from review text. The proposed model, Deep Cooperative Neural Networks (DeepCoNN), consists of two parallel neural networks coupled in their final layers. One network learns user behaviour from the reviews written by the user, while the other learns item properties from the reviews written for the item. A shared layer is added on top to connect the two networks. Similar to factorization machine approaches, the shared layer lets the latent factors learned for users and items interact with each other. Experimental results show that DeepCoNN outperforms all baseline recommendation systems on a number of datasets.
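
A minimal sketch of such a shared layer (our own simplification; the paper's exact parameterization may differ) is a second-order factorization machine applied to the concatenated user and item latent factors:

```python
import numpy as np

def fm_shared_layer(z_user, z_item, w0, w, V):
    """Factorization-machine interaction over concatenated user/item latent
    factors. V holds one row of k factor weights per input dimension."""
    z = np.concatenate([z_user, z_item])
    linear = w0 + w @ z
    # Pairwise interactions in O(n*k) via the standard FM identity.
    pairwise = 0.5 * np.sum((V.T @ z) ** 2 - (V.T ** 2) @ (z ** 2))
    return linear + pairwise
```

The FM identity computes all pairwise factor interactions without an explicit double loop, which is what lets the user and item factors interact in the final layer.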

We propose an algorithm for next-query recommendation in interactive data exploration settings, such as knowledge discovery for information gathering. State-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. We propose to augment transformer-based causal language models for query recommendation to adapt to immediate user feedback using the multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm achieves substantially lower cumulative regret than state-of-the-art transformer-based query recommendation models, which do not make use of immediate user feedback. Our data model and source code are available at ~\url{https://anonymous.4open.science/r/exp3_ss-9985/}.
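
The repository name suggests an EXP3-style bandit. As a generic illustration (not necessarily the paper's exact variant), EXP3 maintains exponential weights over arms, mixes in uniform exploration, and uses importance-weighted reward updates:

```python
import math
import random

def exp3(n_arms, reward_fn, T, gamma=0.1, seed=0):
    """EXP3 for adversarial bandits with rewards in [0, 1]."""
    rng = random.Random(seed)
    w = [1.0] * n_arms
    total = 0.0
    for _ in range(T):
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        r = reward_fn(arm)   # e.g., whether the user clicked the recommended query
        total += r
        # Importance weighting keeps the reward estimator unbiased.
        w[arm] *= math.exp(gamma * r / (probs[arm] * n_arms))
    return total, w
```

In a query recommendation setting, each arm would correspond to a candidate next query produced by the language model, and the reward to the observed user feedback.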

In this paper we address mobile manipulation planning problems in the presence of sensing and environmental uncertainty. In particular, we consider mobile sensing manipulators operating in environments with unknown geometry and uncertain movable objects, while being responsible for accomplishing tasks requiring grasping and releasing objects in a logical fashion. Existing algorithms either do not scale well or neglect sensing and/or environmental uncertainty. To address these challenges, we propose a hybrid control architecture, where a symbolic controller generates high-level manipulation commands (e.g., grasp an object) based on environmental feedback, an informative planner designs paths to actively decrease the uncertainty of objects of interest, and a continuous reactive controller tracks the sparse waypoints comprising the informative paths while avoiding a priori unknown obstacles. The overall architecture can handle environmental and sensing uncertainty online, as the robot explores its workspace. Using numerical simulations, we show that the proposed architecture can handle tasks of increased complexity while responding to unanticipated adverse configurations.

Evaluation of biases in language models is often limited to synthetically generated datasets. This dependence traces back to the need for a prompt-style dataset to trigger specific behaviors of language models. In this paper, we address this gap by creating a prompt dataset with respect to occupations collected from real-world natural sentences present in Wikipedia. We aim to understand the differences between using template-based prompts and natural sentence prompts when studying gender-occupation biases in language models. We find bias evaluations are very sensitive to the design choices of template prompts, and we propose using natural sentence prompts for systematic evaluations to step away from design choices that could introduce bias in the observations.

Deep generative models like StyleGAN hold the promise of semantic image editing: modifying images by their content, rather than their pixel values. Unfortunately, working with arbitrary images requires inverting the StyleGAN generator, which has remained challenging so far. Existing inversion approaches obtain promising yet imperfect results, having to trade off between reconstruction quality and downstream editability. To improve quality, these approaches must resort to various techniques that extend the model latent space after training. Taking a step back, we observe that these methods essentially all propose, in one way or another, to increase the number of free parameters. This suggests that inversion might be difficult because it is underconstrained. In this work, we address this directly and dramatically overparameterize the latent space, before training, with simple changes to the original StyleGAN architecture. Our overparameterization increases the available degrees of freedom, which in turn facilitates inversion. We show that this allows us to obtain near-perfect image reconstruction without the need for encoders or for altering the latent space after training. Our approach also retains editability, which we demonstrate by realistically interpolating between images.

Augmented reality applications have rapidly spread across online platforms, allowing consumers to virtually try on a variety of products, such as makeup, hair dye, or shoes. However, parametrizing a renderer to synthesize realistic images of a given product remains a challenging task that requires expert knowledge. While recent work has introduced neural rendering methods for virtual try-on from example images, current approaches are based on large generative models that cannot be used in real-time on mobile devices. This calls for a hybrid method that combines the advantages of computer graphics and neural rendering approaches. In this paper we propose a novel framework based on deep learning to build a real-time inverse graphics encoder that learns to map a single example image into the parameter space of a given augmented reality rendering engine. Our method leverages self-supervised learning and does not require labeled training data, which makes it extendable to many virtual try-on applications. Furthermore, most augmented reality renderers are not differentiable in practice, due to algorithmic choices or implementation constraints needed to reach real-time performance on portable devices. To relax the need for a graphics-based differentiable renderer in inverse graphics problems, we introduce a trainable imitator module. Our imitator is a generative network that learns to accurately reproduce the behavior of a given non-differentiable renderer. We propose a novel rendering sensitivity loss to train the imitator, which ensures that the network learns an accurate and continuous representation for each rendering parameter. Our framework enables novel applications where consumers can virtually try on a new, unseen product from an inspirational reference image on social media. It can also be used by graphics artists to automatically create realistic renderings from a reference product image.

Deep reinforcement learning (RL) has shown promising results in the motion planning of manipulators. However, no existing method guarantees safety around highly dynamic obstacles, such as humans, in RL-based manipulator control. This lack of formal safety assurances prevents the application of RL to manipulators in real-world human environments. Therefore, we propose a shielding mechanism that ensures ISO-verified human safety while training and deploying RL algorithms on manipulators. We utilize a fast reachability analysis of humans and manipulators to guarantee that the manipulator comes to a complete stop before a human is within its range. Our proposed method guarantees safety and significantly improves RL performance by preventing episode-ending collisions. We demonstrate the performance of our proposed method in simulation using human motion capture data.
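
A one-dimensional caricature of the reachability check (purely illustrative; the paper uses full reachable-set computations for humans and manipulators) reads: the manipulator may continue only if it can brake to a complete stop before the human could close the current gap.

```python
def is_safe_to_continue(gap, robot_speed, human_speed_max, robot_decel_max, dt):
    """Conservative 1-D shielding check (sketch): compare the robot's stopping
    distance plus the human's worst-case travel against the current gap."""
    stop_time = robot_speed / robot_decel_max
    robot_travel = robot_speed * stop_time - 0.5 * robot_decel_max * stop_time ** 2
    human_travel = human_speed_max * (stop_time + dt)  # include one control delay
    return gap > robot_travel + human_travel
```

When the check fails, a shield would override the RL action with a braking command, which is the mechanism that prevents episode-ending collisions during both training and deployment.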

We present a novel and simple solution to Atomic Broadcast (AB). We reduce AB to two subproblems. One of them is Reliable Broadcast (RB). We also introduce a subproblem we call Weakly-terminating Binary Agreement (WBA). WBA relaxes Binary Agreement (BA) protocols by not always terminating. WBA admits much simpler solutions than BA. We discuss concrete solutions to RB and WBA. We prove safety, liveness, and censorship resilience of our new AB protocol.

Transmission power systems usually consist of interconnected sub-grids that are operated relatively independently. When a failure happens, it is desirable to localize its impact within the sub-grid where the failure occurs. This paper introduces three interface networks to connect sub-grids, achieving better failure localization while maintaining robust network connectivity. The proposed interface networks are validated with numerical experiments on the IEEE 118-bus test network under both DC and AC power flow models.

Humans can flexibly extend word usages across different grammatical classes, a phenomenon known as word class conversion. Noun-to-verb conversion, or denominal verb (e.g., to Google a cheap flight), is one of the most prevalent forms of word class conversion. However, existing natural language processing systems are impoverished in interpreting and generating novel denominal verb usages. Previous work has suggested that novel denominal verb usages are comprehensible if the listener can compute the intended meaning based on shared knowledge with the speaker. Here we explore a computational formalism for this proposal couched in frame semantics. We present a formal framework, Noun2Verb, that simulates the production and comprehension of novel denominal verb usages by modeling shared knowledge of speaker and listener in semantic frames. We evaluate an incremental set of probabilistic models that learn to interpret and generate novel denominal verb usages via paraphrasing. We show that a model where the speaker and listener cooperatively learn the joint distribution over semantic frame elements better explains the empirical denominal verb usages than state-of-the-art language models, evaluated against data from 1) contemporary English in both adult and child speech, 2) contemporary Mandarin Chinese, and 3) the historical development of English. Our work grounds word class conversion in probabilistic frame semantics and bridges the gap between natural language processing systems and humans in lexical creativity.

The ubiquity of distributed agreement protocols, such as consensus, has galvanized interest in verifying such protocols as well as applications built on top of them. The complexity and unboundedness of such systems, however, make their verification onerous in general, and particularly prohibitive for full automation. An exciting recent breakthrough reveals that, through careful modeling, verification of interesting distributed agreement-based (DAB) systems that are unbounded in the number of processes can be reduced to model checking of small, finite-state systems. It is an open question whether such reductions are also possible for DAB systems that are doubly unbounded, in particular DAB systems that additionally have unbounded data domains. In this work we answer this question in the affirmative for models of DAB systems, thereby broadening the class of DAB systems that can be automatically verified. We present a new symmetry-based reduction and develop a tool, Venus, that can efficiently verify sophisticated DAB system models.

A modular version of the baskets queue of Hoffman, Shalev and Shavit is presented. It manipulates the head and tail using a novel object called load-link/increment-conditional, which can be implemented using only READ/WRITE instructions, and admits implementations that spread contention. This suggests that there might be an alternative to the seemingly inherent bottleneck in previous queue implementations that manipulate the head and the tail using read-modify-write instructions over a single shared register.

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm captures the effect of environment misclassification and highlights the benefits over learning each task separately or meta-learning without recognition of the distinct mixture components.
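
One natural way to realize such regularization (a sketch under the assumption that the meta-learned prior enters as a ridge bias; the paper's algorithm and confidence sets are more involved) is a least-squares estimate pulled toward the environment's mean parameter:

```python
import numpy as np

def biased_ridge(X, y, theta_bar, lam=1.0):
    """Least squares biased toward a meta-learned mean theta_bar:
    argmin ||X theta - y||^2 + lam * ||theta - theta_bar||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * theta_bar)
```

With no observations the estimate is exactly `theta_bar`, and as task data accumulates it converges to the task's own parameter, which is the interpolation behaviour a meta-learned OFUL-style algorithm needs.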

Large disk arrays are organized into storage nodes -- SNs, or bricks -- each with its own cached RAID controller for multiple disks. Erasure coding at the SN level is attained via parity or Reed-Solomon codes. Hierarchical RAID -- HRAID -- provides an additional level of coding across SNs, e.g., check strips P and Q at the intra-SN level and R at the inter-SN level. Failed disks and SNs are not replaced; rebuild is accomplished by restriping, e.g., overwriting P and Q for disk failures and R for an SN failure. For a given total redundancy level, we use an approximate reliability analysis method and Monte-Carlo simulation to explore the best apportionment of check blocks between intra- and inter-SN redundancy. Our study indicates that a higher MTTDL -- Mean-Time-to-Data-Loss -- is attained by placing the higher redundancy at the intra-SN level rather than the inter-SN level, which is contrary to the conclusion of an IBM study.

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order $\mathcal{O}\Big(\sqrt{\frac{T}{N \log(1/|\lambda_2|)}}\cdot (\log T)^2\Big)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.
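
The consensus step can be sketched as repeated multiplication by a doubly stochastic gossip matrix $W$: each agent replaces its value with a weighted average of its neighbours' values, and the deviation from the network mean contracts at rate $|\lambda_2(W)|$ per round, which is where the $\log(1/|\lambda_2|)$ factor in the bound comes from.

```python
import numpy as np

def consensus_average(values, W, rounds):
    """Iterate x <- W x. For a doubly stochastic W on a connected graph,
    x converges to the mean of the initial values at every node."""
    x = np.array(values, dtype=float)
    for _ in range(rounds):
        x = W @ x
    return x
```

An accelerated consensus procedure, as used in the paper, reaches a given accuracy in fewer communication rounds than this plain iteration, but the role of $|\lambda_2|$ is the same.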

Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most of the current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task that are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks as they fail to capture the complexity of scenes with many components. In this paper, we explore the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object agnostic techniques as well as methods trained on raw RGB images. Our results show a 20 percent increase in performance in low data regimes (1000 trajectories) in policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines for the task of object localization in multi-object scenes.

Current changes in society and the education system, combined with the accelerated development of new technologies, entail inherent changes in the educational process. Numerous studies have shown that the pandemic forced a rapid transformation of higher education: where digital technologies once supplemented the learning process, they are now the main means of learning delivery. In addition, as previous research has shown, new pedagogical strategies and new ways of teaching and learning are needed for the current generation of students, the so-called Generation Z, to acquire new knowledge and develop new skills. In this necessary evolution of the educational process, one possible way to increase the effectiveness of learning for Generation Z students is to use microlearning to extend the traditional ways of learning. Many studies have shown that microlearning, aligned with how today's students learn and memorize, facilitates learning, and recent years have seen a growing trend in its use in the educational process. To be effective, however, this approach must support individual knowledge building by indicating an optimal path toward the proposed learning objectives. We propose a system for personalized learning using microlearning, which provides support and guidance to students based on their individual needs, in order to increase their interest in learning and to compensate for various gaps in their educational background. We also present a case study from the higher education sector. Feedback from students, together with data collected during the semester through analysis of the students' behaviour and their real learning motivations, will be used to improve the proposed system.

The web has become a dominant epistemic environment, influencing people's beliefs at a global scale. However, online epistemic environments are increasingly polluted, impairing societies' ability to coordinate effectively in the face of global crises. We argue that centralized platforms are a main source of epistemic pollution, and that healthier environments require redesigning how we collectively govern attention. Inspired by the decentralization and open source software movements, we propose Open Source Attention, a socio-technical framework for "freeing" human attention from control by platforms, through a decentralized ecosystem for creating, storing and querying stigmergic markers: the digital traces of human attention.

Borrowing ideas from {\em production functions} in microeconomics, in this paper we introduce a framework to systematically evaluate the performance and cost trade-offs between machine-translated and manually-created labelled data for task-specific fine-tuning of massively multilingual language models. We illustrate the effectiveness of our framework through a case study on the TyDIQA-GoldP dataset. One of the interesting conclusions of the study is that, whenever the cost of machine translation is greater than zero, the optimal performance at a given cost is achieved with at least some, and possibly only, manually-created data. To our knowledge, this is the first attempt to extend the concept of production functions to the study of data collection strategies for training multilingual models, and it can serve as a valuable tool for analyzing other similar cost-versus-data trade-offs in NLP.

Typical deep learning approaches to modeling high-dimensional data often result in complex models that do not easily reveal a new understanding of the data. Research in the deep learning field is very actively pursuing new methods to interpret deep neural networks and to reduce their complexity. An approach is described here that starts with linear models and incrementally adds complexity only as supported by the data. An application is shown in which models that map global temperature and precipitation to years are trained to investigate patterns associated with changes in climate.

Developer experience is an important concern for software organizations as enhancing developer experience improves productivity, satisfaction, engagement and retention. We set out to understand what affects developer experience through semi-structured interviews with 21 developers from industry, which we transcribed and iteratively coded. Our findings elucidate factors that affect developer experience and characteristics that influence their respective importance to individual developers. We also identify strategies employed by individuals and teams to improve developer experience and the barriers that stand in their way. Lastly, we describe the coping mechanisms of developers when developer experience cannot be sufficiently improved. Our findings result in the DX Framework, an actionable conceptual framework for understanding and improving developer experience. The DX Framework provides a go-to reference for organizations that want to enable more productive and effective work environments for their developers.

Neural architecture search (NAS) has shown great promise in the field of automated machine learning (AutoML). NAS has outperformed hand-designed networks and made a significant step forward in automating the design of deep neural networks, thus further reducing the need for human expertise. However, most research targets a single specific task, leaving NAS methods that span multiple tasks mostly overlooked. Generally, there are two popular ways to find an architecture for a novel task: searching from scratch, which is inefficient by design, or transferring architectures discovered on other tasks, which provides no performance guarantees and is likely suboptimal. In this work, we present a meta-learning framework to warm-start Differentiable Architecture Search (DARTS), a NAS method that can be initialized with a transferred architecture and is able to quickly adapt to new tasks. A task similarity measure is used to determine which transfer architecture is selected, as transfer architectures found on similar tasks will likely perform better. Additionally, we employ a simple meta-transfer architecture that was learned over multiple tasks. Experiments show that warm-started DARTS is able to find competitively performing architectures while reducing search costs on average by 60%.

Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 100 languages, most existing multilingual NLP benchmarks provide evaluation data in only a handful of these languages, with little linguistic diversity. We argue that this makes existing practices in multilingual evaluation unreliable and fails to provide a full picture of MMLM performance across the linguistic landscape. We propose that recent work on performance prediction for NLP tasks can serve as a potential solution for fixing benchmarking in multilingual NLP, by utilizing features related to data and language typology to estimate the performance of an MMLM on different languages. We compare performance prediction with translating test data in a case study on four different multilingual datasets, and observe that these methods can provide reliable performance estimates that are often on par with translation-based approaches, without incurring any additional translation or evaluation costs.
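
At its simplest, performance prediction is a regression from data and typology features to task scores (a deliberately minimal sketch; the cited work uses richer feature sets and models):

```python
import numpy as np

def fit_performance_predictor(features, scores):
    """Least-squares map from per-language features (e.g., pretraining data
    size, typological similarity to the source language) to expected scores."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add a bias column
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return lambda f: float(np.hstack([f, 1.0]) @ w)
```

Fitted on the handful of languages that do have evaluation data, such a predictor can then estimate scores for the remaining languages without collecting or translating any test sets.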

We present a decision support system for managing water quality in prawn ponds. The system uses various sources of data and deep learning models in a novel way to provide 24-hour forecasting and anomaly detection of water quality parameters. It provides prawn farmers with tools to proactively avoid a poor growing environment, thereby optimising growth and reducing the risk of losing stock. This is a major shift for farmers, who otherwise are forced to manage ponds reactively by correcting poor water quality conditions. To our knowledge, we are the first to apply Transformer as an anomaly detection model, and the first to apply anomaly detection in general to this aquaculture problem. Our technical contributions include adapting ForecastNet for multivariate data and adapting Transformer and the Attention model to incorporate weather forecast data into their decoders. We attain an average mean absolute percentage error of 12% for dissolved oxygen forecasts and we demonstrate two anomaly detection case studies. The system is successfully running in its second year of deployment on a commercial prawn farm.
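
The reported error metric, and a generic residual-based detector in the same spirit, can be sketched as follows (the detector is a simplification for illustration; the paper's anomaly model is Transformer-based):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    return 100.0 * np.mean(np.abs((actual - np.asarray(forecast, float)) / actual))

def flag_anomalies(actual, forecast, k=3.0):
    """Flag points whose forecast residual exceeds k robust standard deviations
    (median absolute deviation scaled to match a normal distribution)."""
    resid = np.asarray(actual, float) - np.asarray(forecast, float)
    med = np.median(resid)
    sigma = 1.4826 * np.median(np.abs(resid - med))
    return np.abs(resid - med) > k * sigma
```

A residual-based detector of this shape treats the forecaster as a model of normal pond behaviour, so sensor readings that depart sharply from the 24-hour forecast are surfaced to the farmer.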

We present a new technique for automatically inferring inductive invariants of parameterized distributed protocols specified in TLA+. Ours is the first such invariant inference technique to work directly on TLA+, an expressive, high level specification language. To achieve this, we present a new algorithm for invariant inference that is based around a core procedure for generating plain, potentially non-inductive lemma invariants that are used as candidate conjuncts of an overall inductive invariant. We couple this with a greedy lemma invariant selection procedure that selects lemmas that eliminate the largest number of counterexamples to induction at each round of our inference procedure. We have implemented our algorithm in a tool, endive, and evaluate it on a diverse set of distributed protocol benchmarks, demonstrating competitive performance and ability to uniquely solve an industrial scale reconfiguration protocol.

In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them consider security as a priority for in-storage computing. Specifically, since modern SSD controllers do not have a trusted execution environment, an offloaded (malicious) program could steal, modify, and even destroy the data stored in the SSD. In this paper, we first investigate the attacks that could be conducted by offloaded in-storage programs. To defend against these attacks, we build IceClave, a lightweight trusted execution environment for in-storage computing. IceClave enables security isolation between in-storage programs and flash management functions. IceClave also achieves security isolation among in-storage programs, and enforces memory encryption and integrity verification of in-storage DRAM with low overhead. To protect data loaded from flash chips, IceClave develops a lightweight data encryption/decryption mechanism in flash controllers. We develop IceClave with a full system simulator and evaluate it with a variety of data-intensive applications. Compared to state-of-the-art in-storage computing approaches, IceClave introduces only 7.6% performance overhead, while enforcing security isolation in the SSD controller with minimal hardware cost. IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.31$\times$ better performance than the conventional host-based trusted computing approach.

Fractional-step methods are a popular and powerful divide-and-conquer approach for the numerical solution of differential equations. When the integrators of the fractional steps are Runge--Kutta methods, such methods can be written as generalized additive Runge--Kutta (GARK) methods, and thus the representation and analysis of such methods can be done through the GARK framework. We show how the general Butcher tableau representation and linear stability of such methods are related to the coefficients of the splitting method, the individual sub-integrators, and the order in which they are applied. We use this framework to explain some observations in the literature about fractional-step methods such as the choice of sub-integrators, the order in which they are applied, and the role played by negative splitting coefficients in the stability of the method.

A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machine-learning platforms often update models over time, using data that may shift in distribution, giving the attacker more information. This paper proposes new attacks that take advantage of one or more model updates to improve MI. A key part of our approach is to leverage rich information from standalone MI attacks mounted separately against the original and updated models, and to combine this information in specific ways to improve attack effectiveness. We propose a set of combination functions and tuning methods for each, and present both analytical and quantitative justification for various options. Our results on four public datasets show that our attacks effectively use update information to give the adversary a significant advantage not only over attacks on standalone models, but also over a prior MI attack that exploits model updates in the related machine-unlearning setting. We perform the first measurements of the impact of distribution shift on MI attacks with model updates, and show that a more drastic distribution shift results in significantly higher MI risk than a gradual shift. Our code is available at https://www.github.com/stanleykywu/model-updates.

In this paper we propose, analyze, and test numerically a pressure-robust stabilized finite element method for a linearization of the Navier--Stokes equations in the high-Reynolds-number regime, also known as the Oseen problem. Stabilization terms are defined by jumps of different combinations of derivatives of the convective term over the element faces of the triangulation of the domain. With the help of these stabilizing terms, and the fact that the finite element space is assumed to provide a point-wise divergence-free velocity, an $\mathcal O\big(h^{k+\frac12}\big)$ error estimate in the $L^2$-norm is proved for the method in the convection-dominated regime, along with optimal-order estimates in the remaining norms of the error. Numerical results supporting the theoretical findings are provided.
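
For reference, in common notation the Oseen problem seeks a velocity $u$ and pressure $p$ such that (the precise setting in the paper, e.g. boundary conditions, may differ):

```latex
\begin{aligned}
\sigma u + (\beta\cdot\nabla)u - \nu\,\Delta u + \nabla p &= f && \text{in } \Omega,\\
\nabla\cdot u &= 0 && \text{in } \Omega,\\
u &= 0 && \text{on } \partial\Omega,
\end{aligned}
```

where $\beta$ is a given divergence-free advective field and the convection-dominated regime corresponds to small viscosity $\nu$ relative to $|\beta|$ and the mesh size $h$.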

Neural networks have been criticised for their inability to perform continual learning due to catastrophic forgetting and rapid unlearning of a past concept when a new concept is introduced. Catastrophic forgetting can be alleviated by specifically designed models and training techniques. This paper outlines a novel Spline Additive Model (SAM). SAM exhibits intrinsic memory retention with sufficient expressive power for many practical tasks, but is not a universal function approximator. SAM is extended with the Kolmogorov-Arnold representation theorem to a novel universal function approximator, called the Kolmogorov-Arnold Spline Additive Model - KASAM. The memory retention, expressive power and limitations of SAM and KASAM are illustrated analytically and empirically. SAM exhibited robust but imperfect memory retention, with small regions of overlapping interference in sequential learning tasks. KASAM exhibited greater susceptibility to catastrophic forgetting. KASAM in combination with pseudo-rehearsal training techniques exhibited superior performance in regression tasks and memory retention.
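
The locality behind SAM-style memory retention can be seen in a minimal one-dimensional sketch (our own construction, not the paper's model): with compactly supported spline basis functions, a gradient step on one input only touches the neighbouring coefficients, so training in one region leaves distant regions untouched.

```python
import numpy as np

class Spline1D:
    """Piecewise-linear spline on [0, 1] with uniform knots; an SGD step at a
    point x updates only the two knot coefficients whose hat functions cover x."""
    def __init__(self, n_knots=11):
        self.knots = np.linspace(0.0, 1.0, n_knots)
        self.c = np.zeros(n_knots)

    def __call__(self, x):
        return float(np.interp(x, self.knots, self.c))

    def sgd_step(self, x, y, lr=0.5):
        h = self.knots[1] - self.knots[0]
        i = min(int(x / h), len(self.knots) - 2)
        t = (x - self.knots[i]) / h
        err = self(x) - y
        self.c[i] -= lr * err * (1 - t)      # only local coefficients move
        self.c[i + 1] -= lr * err * t
```

Training on one interval and then on a disjoint interval leaves predictions in the first region unchanged in this sketch, whereas a dense network trained the same way would typically overwrite them.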

Dependency injection (DI) is generally known to improve maintainability by keeping application classes separate from the libraries they depend on. Particularly within the Java ecosystem, many applications apply the principles of DI with the aim of improving maintainability. Some prior work hints at the impact of DI on maintainability, but no conclusive evidence has been provided. The absence of publicly available tools for quantifying DI makes such evidence more difficult to produce. In this paper, we propose a novel metric, DCBO, to measure the proportion of DI in a project based on weighted couplings. We describe how DCBO can serve as a more meaningful metric in computing maintainability when DI is also considered. The metric is implemented in CKJM-Analyzer, an extension of the CKJM tool that uses static code analysis to detect DI. We discuss the algorithmic approach behind the static analysis and validate the soundness of the tool on a set of open-source Java projects.
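
The abstract does not define DCBO concretely; purely as a hypothetical sketch of what a "proportion of DI based on weighted couplings" could look like (the function name, weights, and formula below are assumptions for illustration, not the paper's definition):

```python
def dcbo(di_couplings: int, other_couplings: int,
         w_di: float = 1.0, w_other: float = 1.0) -> float:
    """Hypothetical DI-coupling ratio: weighted share of couplings that are
    dependency-injected. NOT the paper's actual DCBO formula."""
    total = w_di * di_couplings + w_other * other_couplings
    return (w_di * di_couplings) / total if total else 0.0

# a project with 3 DI-based couplings out of 10 total couplings
ratio = dcbo(3, 7)  # 0.3
```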

In this paper we explore a context of application of Cob, a recently introduced Byzantine Fault Tolerant consensus protocol. Cob is a leaderless consensus protocol that carries out the consensus process in parallel on each component of a list of events to be observed and recorded. We show how Cob can be used to define a consensus layer for scalable and sustainable blockchains. This layer is used to design consensus protocols based on sharding as a means to achieve scalability, and on the fragmentation of time into time-slots (which get assigned to nodes that are instructed to create new blocks) as a means to reduce the amount of computation and communication necessary for the maintenance of the distributed ledger. We explain why Cob is a viable candidate to implement such a consensus layer through the introduction of an auxiliary blockchain that we name the Synchronization Chain.

Modal logics allow reasoning about various modes of truth: for example, what it means for something to be possibly true, or to know that something is true as opposed to merely believing it. This report describes embeddings of propositional and quantified modal logic in the PVS verification system. The resources of PVS allow this to be done in an attractive way that supports much of the standard syntax of modal logic, while providing effective automation. The report introduces and formally specifies and verifies several standard topics in modal logic such as relationships between the standard modal axioms and properties of the accessibility relation, and attributes of the Barcan Formula and its converse in both constant and varying domains.

There have been several successful implementations of bio-inspired legged robots that can trot, walk, and hop robustly even in the presence of significant unplanned disturbances. Despite all of these accomplishments, practical control and high-level decision-making algorithms for multi-modal legged systems are overlooked. In nature, animals such as birds impressively showcase multiple modes of mobility, including legged and aerial locomotion. They are capable of performing robust locomotion over large walls and through tight spaces, and can recover from unpredictable situations such as sudden gusts or slippery surfaces. Inspired by these animals' versatility and ability to combine legged and aerial mobility to negotiate their environment, our main goal is to design and control legged robots that integrate two completely different forms of locomotion, ground and aerial mobility, in a single platform. Our robot, the Husky Carbon, is being developed to integrate aerial and legged locomotion and to transform between legged and aerial mobility. This work utilizes a Reference Governor (RG) based on low-level control of Husky's dynamical model to maintain the efficiency of legged locomotion, and uses Probabilistic Road Maps (PRM) and 3D A* algorithms to generate an optimal path based on the energetic cost of transport for legged and aerial mobility.

We prove a two-way correspondence between the min-max optimization of general CPE loss function GANs and the minimization of associated $f$-divergences. We then focus on $\alpha$-GAN, defined via the $\alpha$-loss, which interpolates several GANs (Hellinger, vanilla, Total Variation) and corresponds to the minimization of the Arimoto divergence. We show that the Arimoto divergences induced by $\alpha$-GAN equivalently converge, for all $\alpha\in \mathbb{R}_{>0}\cup\{\infty\}$. However, under restricted learning models and finite samples, we provide estimation bounds which indicate diverse GAN behavior as a function of $\alpha$. Finally, we present empirical results on a toy dataset that highlight the practical utility of tuning the $\alpha$ hyperparameter.
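
For context (drawn from the $\alpha$-loss literature rather than from this summary, so the notation below is an assumption), the $\alpha$-loss acting on a predicted probability $p$ is commonly written as $\ell_\alpha(p) = \frac{\alpha}{\alpha-1}\big(1 - p^{(\alpha-1)/\alpha}\big)$ for $\alpha \in (0,1) \cup (1,\infty)$, with $\ell_\alpha(p) \to -\log p$ as $\alpha \to 1$ (log-loss, recovering the vanilla GAN) and $\ell_\alpha(p) \to 1 - p$ as $\alpha \to \infty$.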

Bats' dynamically morphing wings are known to be extremely high-dimensional, and bats employ a combination of inertial dynamics and aerodynamic manipulation to showcase extremely agile maneuvers. Bats heavily rely on their highly flexible wings, dynamically morphing them to adjust the aerodynamic and inertial forces applied to the wing and perform sharp banking turns. There are technical hardware and control challenges in replicating the morphing-wing flight capabilities of flying animals. This work focuses primarily on the modeling and control aspects of stable, tail-less, morphing-wing flight. A classical control approach using bang-bang control is proposed to stabilize a bio-inspired morphing-wing robot called Aerobat. Robot-environment interactions based on horseshoe vortex shedding and Wagner functions are derived to realistically evaluate the feasibility of the bang-bang control, which is then implemented on the robot in experiments to demonstrate the first closed-loop stable flights of Aerobat.

The Spotted Lanternfly (SLF) is an invasive planthopper that threatens the local biodiversity and agricultural economy of regions such as the Northeastern United States and Japan. As researchers scramble to study the insect, there is a great potential for computer vision tasks such as detection, pose estimation, and accurate identification to have important downstream implications in containing the SLF. However, there is currently no publicly available dataset for training such AI models. To enable computer vision applications and motivate advancements to challenge the invasive SLF problem, we propose LANTERN-RD, the first curated image dataset of the spotted lanternfly and its look-alikes, featuring images with varied lighting conditions, diverse backgrounds, and subjects in assorted poses. A VGG16-based baseline CNN validates the potential of this dataset for stimulating fresh computer vision applications to accelerate invasive SLF research. Additionally, we implement the trained model in a simple mobile classification application in order to directly empower responsible public mitigation efforts. The overarching mission of this work is to introduce a novel SLF image dataset and release a classification framework that enables computer vision applications, boosting studies surrounding the invasive SLF and assisting in minimizing its agricultural and economic damage.

Contrastive learning pre-trains an image encoder using a large amount of unlabeled data such that the image encoder can be used as a general-purpose feature extractor for various downstream tasks. In this work, we propose PoisonedEncoder, a data poisoning attack against contrastive learning. In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data, such that the downstream classifiers built on the poisoned encoder for multiple target downstream tasks simultaneously classify attacker-chosen, arbitrary clean inputs as attacker-chosen, arbitrary classes. We formulate our data poisoning attack as a bilevel optimization problem, whose solution is the set of poisoning inputs, and we propose a contrastive-learning-tailored method to approximately solve it. Our evaluation on multiple datasets shows that PoisonedEncoder achieves high attack success rates while maintaining the testing accuracy of the downstream classifiers built upon the poisoned encoder for non-attacker-chosen inputs. We also evaluate five defenses against PoisonedEncoder: one pre-processing, three in-processing, and one post-processing defense. Our results show that these defenses can decrease the attack success rate of PoisonedEncoder, but they also sacrifice the utility of the encoder or require a large clean pre-training dataset.

We consider a cell-free massive multiple-input multiple-output (MIMO) network operating in a virtually full-duplex (vFD) mode, where access points (APs) in a downlink (DL) mode and those in an uplink (UL) mode simultaneously serve DL and UL users (UEs). In order to maximize the sum spectral efficiency (SE) of the DL and UL transmissions, we formulate a mixed-integer optimization problem to jointly design the AP mode assignment and power control. This problem is subject to minimum per-UE SE requirements, per-AP power constraints, and per-UL-UE power constraints. By employing the successive convex approximation technique, we propose an algorithm to obtain a stationary solution of the formulated problem. Numerical results show that, in terms of the $95\%$-likely sum SE, the proposed vFD approach provides a sum SE $2.5$ and $1.5$ times larger than the traditional half-duplex and a heuristic baseline scheme, respectively.

We introduce a conditional compression problem and propose a fast framework for tackling it. The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts, e.g. a context involving only a subset of classes or a context where only limited compute resource is available. To solve this, we propose an efficient Bayesian framework to compress a given large network into much smaller size tailored to meet each contextual requirement. We employ a hypernetwork to parameterize the posterior distribution of weights given conditional inputs and minimize a variational objective of this Bayesian neural network. To further reduce the network sizes, we propose a new input-output group sparsity factorization of weights to encourage more sparseness in the generated weights. Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
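
As a minimal illustration of the hypernetwork idea named above (a conditioning vector is mapped to the weights of a target layer), and emphatically not the paper's actual architecture or its variational objective, one might write:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical shapes: a 4-dim context vector conditions a 3x5 target layer
ctx_dim, out_shape = 4, (3, 5)
H = rng.standard_normal((ctx_dim, out_shape[0] * out_shape[1])) * 0.1

def hypernetwork(context):
    """Linear hypernetwork: maps a conditioning vector to target-layer weights."""
    return (context @ H).reshape(out_shape)

def target_layer(x, context):
    """Apply the context-generated weights to an input batch."""
    return x @ hypernetwork(context).T

x = rng.standard_normal((2, 5))
y = target_layer(x, np.array([1.0, 0.0, 0.0, 0.0]))  # shape (2, 3)
```

Different contexts (e.g. different class subsets or compute budgets) thus yield different compressed weights from one shared generator.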

Owing to the rapid development of sensor technology, hyperspectral (HS) remote sensing (RS) imaging has provided a significant amount of spatial and spectral information for the observation and analysis of the Earth's surface at a distance, via data acquisition devices such as aircraft, spacecraft, and satellites. The recent advancement, and even revolution, of the HS RS technique offers opportunities to realize the full potential of various applications, while confronting new challenges in efficiently processing and analyzing the enormous volume of HS acquisition data. Because it preserves the inherent 3-D structure of HS data, tensor decomposition has attracted widespread interest and research in HS data processing tasks over the past decades. In this article, we present a comprehensive overview of tensor decomposition, specifically contextualizing five broad topics in HS data processing: HS restoration, compressed sensing, anomaly detection, super-resolution, and spectral unmixing. For each topic, we elaborate on the remarkable achievements of tensor decomposition models for HS RS, with a pivotal description of the existing methodologies and a representative exhibition of the experimental results. The remaining challenges and follow-up research directions are then outlined and discussed from the perspective of real HS RS practice and of tensor decomposition merged with advanced priors and even with deep neural networks. This article summarizes different tensor decomposition-based HS data processing methods and categorizes them into different classes, from simple adoptions to complex combinations with other priors, for algorithm beginners. We also expect this survey to provide new investigations and development trends for experienced researchers who understand tensor decomposition and HS RS to some extent.

Natural language processing (NLP) is the field that attempts to make human language accessible to computers, and it relies on applying a mathematical model to express the meaning of symbolic language. One such model, DisCoCat, defines how to express both the meaning of individual words and their compositional nature. This model can be naturally implemented on quantum computers, leading to the field of quantum NLP (QNLP). Recent experimental work used quantum machine learning techniques to map from text to class label using the expectation value of the quantum-encoded sentence. Theoretical work has been done on computing the similarity of sentences but relies on an unrealized quantum memory store. The main goal of this thesis is to leverage the DisCoCat model to design a quantum-based kernel function that can be used by a support vector machine (SVM) for NLP tasks. Two similarity measures were studied: (i) the transition amplitude approach and (ii) the SWAP test. A simple NLP meaning classification task from previous work was used to train the word embeddings and evaluate the performance of both models. The Python module lambeq and its related software stack were used for the implementation. The explicit model from previous work was used to train word embeddings and achieved a testing accuracy of $93.09 \pm 0.01$%. It was shown that both SVM variants achieved higher testing accuracies: $95.72 \pm 0.01$% for approach (i) and $97.14 \pm 0.01$% for approach (ii). The SWAP test was then simulated under a noise model defined by the real quantum device, ibmq_guadalupe. The explicit model achieved an accuracy of $91.94 \pm 0.01$% while the SWAP test SVM achieved 96.7% on the testing dataset, suggesting that the kernelized classifiers are resilient to noise. These are encouraging results and motivate further investigation of our proposed kernelized QNLP paradigm.
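
As a sketch of the quantity both similarity measures target: the SWAP test estimates the fidelity $|\langle\psi|\phi\rangle|^2$ between two encoded states, which can serve as a kernel value. A classical simulation of that kernel (plain NumPy, not the lambeq stack used in the thesis) might look like:

```python
import numpy as np

def normalize(v):
    """Return the unit-norm version of a state vector."""
    return v / np.linalg.norm(v)

def fidelity_kernel(psi, phi):
    """Kernel value |<psi|phi>|^2 -- the quantity a SWAP test estimates."""
    return abs(np.vdot(normalize(psi), normalize(phi))) ** 2

psi = np.array([1.0, 0.0])                  # |0>
phi = np.array([1.0, 1.0]) / np.sqrt(2.0)   # |+>
k = fidelity_kernel(psi, phi)               # 0.5
```

A Gram matrix of such values over sentence encodings is what an SVM would consume as its kernel.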

The Gaussian multiple-input multiple-output (MIMO) multi-receiver wiretap channel is studied in this paper. The base station broadcasts confidential messages to K intended users while keeping the messages secret from an eavesdropper. The capacity of this channel has already been characterized by applying dirty-paper coding and stochastic encoding. However, K factorial encoding orders may need to be enumerated for that, which makes the problem intractable. We prove that there exists one optimal encoding order, reducing the K factorial candidate orders to a single encoding. The optimal encoding order is derived by formulating a secrecy weighted sum rate (WSR) maximization problem. The optimal order is the same as that for the MIMO broadcast channel without the secrecy constraint; that is, the weights of the users' rates in the WSR maximization problem determine the optimal encoding order. Numerical results verify the optimal encoding order.

Trusted execution environments (TEEs) such as \intelsgx facilitate the secure execution of an application on untrusted machines. Sadly, such environments suffer from serious limitations and performance overheads in terms of writing back data to the main memory, their interaction with the OS, and the ability to issue I/O instructions. There is thus a plethora of work that focuses on improving the performance of such environments -- this necessitates a standard, widely accepted benchmark suite (something similar to SPEC and PARSEC). To the best of our knowledge, such a suite does not exist. Our suite, SGXGauge, contains a diverse set of workloads such as blockchain codes, secure machine learning algorithms, lightweight web servers, secure key-value stores, etc. We thoroughly characterize the behavior of the benchmark suite on a native platform and on a platform that uses a library-OS-based shimming layer (GrapheneSGX). We observe that the most important metrics of interest are performance counters related to paging, memory, and TLB accesses. There is an abrupt change in performance when the memory footprint starts to exceed the EPC size in Intel SGX, and the library OS does not add a significant overhead (approximately ±10%).

Purpose: The objective of this investigation is to provide a comprehensive analysis of state-of-the-art methods for video-based assessment of surgical skill in the operating room. Methods: Using a data set of 99 videos of capsulorhexis, a critical step in cataract surgery, we evaluate feature based methods previously developed for surgical skill assessment mostly under benchtop settings. In addition, we present and validate two deep learning methods that directly assess skill using RGB videos. In the first method, we predict instrument tips as keypoints, and learn surgical skill using temporal convolutional neural networks. In the second method, we propose a novel architecture for surgical skill assessment that includes a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network), both of which are augmented by visual attention mechanisms. We report the area under the receiver operating characteristic curve, sensitivity, specificity, and predictive values with each method through 5-fold cross-validation. Results: For the task of binary skill classification (expert vs. novice), deep neural network based methods exhibit higher AUC than the classical spatiotemporal interest point based methods. The neural network approach using attention mechanisms also showed high sensitivity and specificity. Conclusion: Deep learning methods are necessary for video-based assessment of surgical skill in the operating room. Our findings of internal validity of a network using attention mechanisms to assess skill directly using RGB videos should be evaluated for external validity in other data sets.

In this work, we propose a joint system combining a talking face generation system with a text-to-speech system that can generate multilingual talking face videos from only the text input. Our system can synthesize natural multilingual speeches while maintaining the vocal identity of the speaker, as well as lip movements synchronized to the synthesized speech. We demonstrate the generalization capabilities of our system by selecting four languages (Korean, English, Japanese, and Chinese) each from a different language family. We also compare the outputs of our talking face generation model to outputs of a prior work that claims multilingual support. For our demo, we add a translation API to the preprocessing stage and present it in the form of a neural dubber so that users can utilize the multilingual property of our system more easily.

In this paper, we evaluate the performance of the multilevel Monte Carlo method (MLMC) for deterministic and uncertain hyperbolic systems, where randomness is introduced either in the modeling parameters or in the approximation algorithms. MLMC is a well-known variance reduction method widely used to accelerate Monte Carlo (MC) sampling. However, we demonstrate in this paper that, for hyperbolic systems, whether MLMC can achieve a real boost turns out to be delicate. The computational costs of MLMC and MC depend on the interplay among the accuracy (bias) and the computational cost of the numerical method for a single sample, as well as the variances of the sampled MLMC corrections or MC solutions. We characterize three regimes for the MLMC and MC performances using those parameters, and show that MLMC may not accelerate MC and can even have a higher cost when the variances of MC solutions and MLMC corrections are of the same order. Our studies are carried out on a few prototype hyperbolic systems: a linear scalar equation, the Euler and shallow water equations, and a linear relaxation model. The above statements are proved analytically in some cases, and demonstrated numerically for stochastic hyperbolic equations driven by white-noise parameters and for Glimm's random choice method applied to deterministic hyperbolic equations.
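
For readers unfamiliar with MLMC, the estimator is a telescoping sum of level-wise corrections, with most samples taken on cheap coarse levels. A toy sketch (the "solvers" here are stand-ins with artificial $2^{-\ell}$ bias, not the paper's hyperbolic systems):

```python
import numpy as np

rng = np.random.default_rng(0)

def level_samples(l, n):
    """Level-l MLMC correction samples for a toy problem: estimate E[exp(Z)],
    with a fake level-l 'solver' whose bias shrinks like 2^-l."""
    z = rng.standard_normal(n)
    fine = np.exp(z) * (1.0 + 2.0 ** (-l))
    if l == 0:
        return fine
    coarse = np.exp(z) * (1.0 + 2.0 ** (-(l - 1)))
    return fine - coarse          # same random input -> small correction variance

def mlmc_estimate(max_level, n_per_level):
    """Telescoping sum: E[P_L] ~= sum_l mean(P_l - P_{l-1})."""
    return sum(level_samples(l, n).mean()
               for l, n in zip(range(max_level + 1), n_per_level))

est = mlmc_estimate(3, [10000, 4000, 1000, 400])  # ~ (1 + 1/8) * e^0.5 ~ 1.85
```

The paper's point is precisely that this telescoping only pays off when the correction variances decay fast enough relative to the per-sample cost.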

Robotic Information Gathering (RIG) relies on the uncertainty of a probabilistic model to identify critical areas for efficient data collection. Gaussian processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data typically does not satisfy the assumption of stationarity, where different locations are assumed to have the same degree of variability. As a result, the prediction uncertainty does not accurately capture prediction error, limiting the success of RIG algorithms. We propose a novel family of nonstationary kernels, named the Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a nonstationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification over the commonly used RBF kernel and other popular nonstationary kernels. The improved uncertainty quantification guides the downstream RIG planner to collect more valuable data around the high-error area, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with high spatial variations, enabling the model to characterize the salient environmental features.
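
To illustrate what nonstationarity means here, consider the well-known Gibbs kernel, in which a location-dependent lengthscale makes correlation decay faster in high-variability regions (this is only an illustration; the Attentive Kernel itself is a different construction not reproduced here):

```python
import numpy as np

def rbf(x1, x2, ls=1.0):
    """Stationary RBF kernel: one global lengthscale everywhere."""
    return np.exp(-0.5 * (x1 - x2) ** 2 / ls ** 2)

def gibbs(x1, x2, lengthscale):
    """Gibbs nonstationary kernel: the lengthscale varies with input location."""
    l1, l2 = lengthscale(x1), lengthscale(x2)
    prefactor = np.sqrt(2.0 * l1 * l2 / (l1 ** 2 + l2 ** 2))
    return prefactor * np.exp(-((x1 - x2) ** 2) / (l1 ** 2 + l2 ** 2))

ls = lambda x: 0.1 + 0.9 * abs(x)    # short lengthscale near x=0 (high variability)
k_smooth = gibbs(2.0, 2.1, ls)       # far from 0: long lengthscale, high correlation
k_rough = gibbs(0.0, 0.1, ls)        # near 0: short lengthscale, lower correlation
```

A stationary RBF would assign the same correlation to both pairs; the nonstationary kernel correctly reports more uncertainty where the field varies quickly.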

The topic of generalizing machine learning models learned on a collection of source domains to unknown target domains is challenging. While many domain generalization (DG) methods have achieved promising results, they primarily rely on the source domains at train-time without manipulating the target domains at test-time. Thus, it is still possible that those methods can overfit to source domains and perform poorly on target domains. Driven by the observation that domains are strongly related to styles, we argue that reducing the gap between source and target styles can boost models' generalizability. To solve the dilemma of having no access to the target domain during training, we introduce Test-time Fourier Style Calibration (TF-Cal) for calibrating the target domain style on the fly during testing. To access styles, we utilize Fourier transformation to decompose features into amplitude (style) features and phase (semantic) features. Furthermore, we present an effective technique to Augment Amplitude Features (AAF) to complement TF-Cal. Extensive experiments on several popular DG benchmarks and a segmentation dataset for medical images demonstrate that our method outperforms state-of-the-art methods.
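
The amplitude/phase decomposition at the heart of TF-Cal can be sketched with a plain FFT (the actual calibration of target amplitudes is learned in the paper; the direct swap below is only illustrative):

```python
import numpy as np

def decompose(feat):
    """Split a 2-D feature map into amplitude (style) and phase (semantics)."""
    f = np.fft.fft2(feat)
    return np.abs(f), np.angle(f)

def recompose(amplitude, phase):
    """Rebuild a feature map from (possibly calibrated) amplitude and phase."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

rng = np.random.default_rng(0)
source = rng.random((8, 8))          # stand-in for a source-domain feature map
target = rng.random((8, 8))          # stand-in for an estimated target style
amp_src, pha_src = decompose(source)
amp_tgt, _ = decompose(target)
# style-calibrated feature: target-style amplitude with source semantics (phase)
calibrated = recompose(amp_tgt, pha_src)
```

Because amplitude and phase fully determine the signal, recomposing a map from its own decomposition recovers it exactly, which is what makes the swap a clean style operation.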

Based on the observation that $\mathbb{Q}^{(p-1) \times (p-1)}$ is isomorphic to a quotient skew polynomial ring, we propose a new method for $(p-1)\times (p-1)$ matrix multiplication over $\mathbb{Q}$, where $p$ is a prime number. The main feature of our method is the acceleration for matrix multiplication if the product is skew-sparse. Based on the new method, we design a deterministic algorithm with complexity $O(T^{\omega-2} p^2)$, where $T\le p-1$ is a parameter determined by the skew-sparsity of input matrices and $\omega$ is the asymptotic exponent of matrix multiplication. Moreover, by introducing randomness, we also propose a probabilistic algorithm with complexity $O^\thicksim(t^{\omega-2}p^2+p^2\log\frac{1}{\nu})$, where $t\le p-1$ is the skew-sparsity of the product and $\nu$ is the probability parameter.

Recently, the structural reading comprehension (SRC) task on web pages has attracted increasing research interests. Although previous SRC work has leveraged extra information such as HTML tags or XPaths, the informative topology of web pages is not effectively exploited. In this work, we propose a Topological Information Enhanced model (TIE), which transforms the token-level task into a tag-level task by introducing a two-stage process (i.e. node locating and answer refining). Based on that, TIE integrates Graph Attention Network (GAT) and Pre-trained Language Model (PLM) to leverage the topological information of both logical structures and spatial structures. Experimental results demonstrate that our model outperforms strong baselines and achieves state-of-the-art performances on the web-based SRC benchmark WebSRC at the time of writing. The code of TIE will be publicly available at https://github.com/X-LANCE/TIE.

Though widely used in industry, traditional task-oriented dialogue systems suffer from three bottlenecks: (i) difficult ontology construction (e.g., intents and slots); (ii) poor controllability and interpretability; and (iii) heavy annotation requirements. In this paper, we propose to represent utterances with a simpler concept named Dialogue Action, upon which we construct a tree-structured TaskFlow and further build a task-oriented chatbot with the TaskFlow as its core component. A framework is presented to automatically construct a TaskFlow from large-scale dialogues and deploy it online. Our experiments on real-world after-sale customer services show that TaskFlow can satisfy the major needs, as well as effectively reduce the developer burden.

This paper proposes Impala, a new cryptographic protocol for private inference in the client-cloud setting. Impala builds upon recent solutions that combine the complementary strengths of homomorphic encryption (HE) and secure multi-party computation (MPC). A series of protocol optimizations are developed to reduce both communication and performance bottlenecks. First, we remove MPC's overwhelmingly high communication cost from the client by introducing a proxy server and developing a low-overhead key switching technique. Key switching reduces the client's bandwidth by multiple orders of magnitude; however, the communication between the proxy and the cloud is still excessive. Second, we develop an optimized garbled circuit that leverages truncated secret shares for faster evaluation and less proxy-cloud communication. Finally, we propose sparse HE convolution to reduce the computational bottleneck of using HE. Compared to the state-of-the-art, these optimizations provide a bandwidth savings of over 3X and a speedup of 4X for private deep learning inference.

Due to the labor-intensive nature of manual test oracle construction, various automated testing techniques have been proposed to enhance the reliability of Natural Language Processing (NLP) software. In theory, these techniques mutate an existing test case (e.g., a sentence with its label) and assume the generated one preserves an equivalent or similar semantic meaning and thus, the same label. However, in practice, many of the generated test cases fail to preserve similar semantic meaning and are unnatural (e.g., grammar errors), which leads to a high false alarm rate and unnatural test cases. Our evaluation study finds that 44% of the test cases generated by the state-of-the-art (SOTA) approaches are false alarms. These test cases require extensive manual checking effort, and instead of improving NLP software, they can even degrade NLP software when utilized in model training. To address this problem, we propose AEON for Automatic Evaluation Of NLP test cases. For each generated test case, it outputs scores based on semantic similarity and language naturalness. We employ AEON to evaluate test cases generated by four popular testing techniques on five datasets across three typical NLP tasks. The results show that AEON aligns the best with human judgment. In particular, AEON achieves the best average precision in detecting semantic inconsistent test cases, outperforming the best baseline metric by 10%. In addition, AEON also has the highest average precision of finding unnatural test cases, surpassing the baselines by more than 15%. Moreover, model training with test cases prioritized by AEON leads to models that are more accurate and robust, demonstrating AEON's potential in improving NLP software.
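
To see why model-based semantic scoring is needed at all, consider a minimal surface-overlap measure (a stand-in for illustration; AEON's actual scorers are learned models): token-level cosine similarity stays high even when a mutation flips the meaning and should have flipped the label.

```python
from collections import Counter
import math

def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between token-count vectors (pure surface overlap)."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

original = "the movie was great".split()
mutated = "the movie was awful".split()   # meaning flipped; label should change
score = cosine_similarity(original, mutated)   # 0.75: high despite the flip
```

Any evaluator relying only on such overlap would accept this mutant as label-preserving, which is exactly the false-alarm failure mode described above.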

Cross-Domain Recommendation (CDR) has been popularly studied as a way to utilize knowledge from different domains to solve the cold-start problem in recommender systems. Most existing CDR models assume that the source and target domains share the same overlapped user set for knowledge transfer. However, only a small proportion of users are simultaneously active on both the source and target domains in practical CDR tasks. In this paper, we focus on the Partially Overlapped Cross-Domain Recommendation (POCDR) problem, that is, how to leverage the information of both the overlapped and non-overlapped users to improve recommendation performance. Existing approaches cannot fully utilize the useful knowledge behind the non-overlapped users across domains, which limits model performance when the majority of users turn out to be non-overlapped. To address this issue, we propose the Variational Domain-invariant Embedding Alignment (VDEA) model, an end-to-end dual-autoencoder cross-domain recommendation framework for the POCDR problem, which utilizes dual variational autoencoders with both local and global embedding alignment to exploit domain-invariant user embeddings. VDEA first adopts variational inference to capture collaborative user preferences, and then utilizes Gromov-Wasserstein distribution co-clustering optimal transport to cluster users with similar rating interaction behaviors. Our empirical studies on Douban and Amazon datasets demonstrate that VDEA significantly outperforms state-of-the-art models, especially under the POCDR setting.

Byte-addressable, non-volatile memory (NVM) is emerging as a promising technology. To facilitate its wide adoption, employing NVM in managed runtimes like the JVM has proven to be an effective approach (i.e., managed NVM). However, such an approach is runtime-specific and lacks a generic abstraction across different managed languages. Similar to the well-known filesystem primitives that allow diverse programs to access the same files via the block I/O interface, managed NVM deserves the same system-wide property for persistent objects across managed runtimes with low overhead. In this paper, we present UniHeap, a new NVM framework for managing persistent objects. It proposes a unified persistent object model that supports various managed languages, and manages NVM within a shared heap that enables cross-language persistent object sharing. UniHeap reduces the object persistence overhead by managing the shared heap in a log-structured manner and coalescing object updates during garbage collection. We implement UniHeap as a generic framework and extend it to different managed runtimes, including the HotSpot JVM, cPython, and the JavaScript engine SpiderMonkey. We evaluate UniHeap with a variety of applications, such as a key-value store and a transactional database. Our evaluation shows that UniHeap significantly outperforms state-of-the-art object sharing approaches, while introducing negligible overhead to the managed runtimes.

Living systems process sensory data to facilitate adaptive behaviour. A given sensor can be stimulated as the result of internally driven activity, or by purely external (environmental) sources. It is clear that these inputs are processed differently - have you ever tried tickling yourself? The canonical explanation of this difference is that when the brain sends a signal that would result in motor activity, it uses a copy of that signal to predict the sensory consequences of the resulting motor activity. The predicted sensory input is then subtracted from the actual sensory input, resulting in attenuation of the stimuli. To critically evaluate this idea, and investigate when non-predictive solutions may be viable, we implement a computational model of a simple embodied system with self-caused sensorimotor dynamics, and analyse how controllers successfully accomplish tasks in this model. We find that in these simple systems, solutions that regulate behaviour to control self-caused sensory inputs tend to emerge, rather than solutions which predict and filter out self-caused inputs. In some cases, solutions depend on the presence of these self-caused inputs.
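
The canonical comparator model described above can be sketched in a few lines (with a toy forward model; this is not the paper's embodied simulation):

```python
def attenuated(sensed, motor_command, forward_model):
    """Comparator: subtract the predicted self-caused input (efference copy)."""
    return sensed - forward_model(motor_command)

# toy forward model: a motor command of strength u yields sensory input 2*u
forward = lambda u: 2.0 * u

external = 0.3                       # environmental stimulus
self_caused = forward(0.5)           # self-generated stimulus (1.0)
residual = attenuated(external + self_caused, 0.5, forward)
# residual ~ 0.3: only the externally caused component survives attenuation
```

The paper's finding is that evolved controllers in simple embodied systems often bypass this predict-and-subtract scheme, instead regulating behaviour so that self-caused inputs are themselves controlled.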

Image harmonization aims to produce a more realistic appearance of the foreground and background in a composite image. Existing methods apply the same harmonization process to the entire foreground, even though the implanted foreground typically contains several distinct appearance patterns; ignoring the differences between color blocks loses region-specific details. We therefore propose a novel global-local two-stage framework for Fine-grained Region-aware Image Harmonization (FRIH), which is trained end-to-end. In the first stage, the whole input foreground mask is used to perform a global, coarse-grained harmonization. In the second stage, we adaptively cluster the input foreground mask into several submasks according to the corresponding pixel RGB values in the composite image. Each submask is concatenated with the coarsely adjusted image and fed into a lightweight cascaded module that refines the global harmonization according to region-aware local features. Moreover, we design a fusion prediction module that fuses features from all the cascaded decoder layers to generate the final result, comprehensively exploiting the different degrees of harmonization. Without bells and whistles, our FRIH algorithm achieves the best performance on the iHarmony4 dataset (38.19 dB PSNR) with a lightweight model of only 11.98 M parameters, far below existing methods.

Modularity is essential to many well-performing structured systems, as it is a useful means of managing complexity [8]. An analysis of modularity in neural networks produced by machine learning algorithms can offer valuable insight into the workings of such algorithms and into how modularity can be leveraged to improve performance. However, this property is often overlooked in the neuroevolutionary literature, so the modular nature of many learning algorithms remains unknown. We assess this property for the popular algorithm "NeuroEvolution of Augmenting Topologies" (NEAT), chosen for its ability to optimise network topology, on standard simulation benchmark control problems. This paper shows that NEAT networks tend to increase rapidly in modularity over time, with the rate and convergence depending on the problem. Interestingly, NEAT tends towards increasingly modular networks even after network fitness has converged. We also show that the ideal level of network modularity in the explored parameter space is highly dependent on other network variables, dispelling theories that modularity has a straightforward relationship to network performance. This is further supported by demonstrating that rewarding modularity directly did not improve fitness.
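As a concrete reference point, network modularity is commonly quantified with Newman's $Q$ for a given partition of the network into modules (the paper's exact metric is an assumption on our part). A minimal pure-Python sketch:

```python
# Newman's modularity Q for an undirected graph and a fixed partition:
# Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)

def modularity(edges, community):
    """edges: list of (u, v) pairs; community: dict node -> module id."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for u, v in edges:                 # fraction of edges inside a module
        if community[u] == community[v]:
            q += 1.0 / m
    # subtract the expectation under the configuration null model
    for i in deg:
        for j in deg:
            if community[i] == community[j]:
                q -= deg[i] * deg[j] / (4.0 * m * m)
    return q
```

For two disjoint triangles partitioned into their natural modules, this yields the textbook value $Q = 0.5$.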

This paper shows, in the framework of the logical method, the unsolvability of the $k$-set agreement task by devising a suitable formula of epistemic logic. The unsolvability of $k$-set agreement is a well-known fact and a direct consequence of Sperner's lemma, a classic result from combinatorial topology. However, Sperner's lemma does not provide a good intuition for the unsolvability, hiding it behind the elegance of its combinatorial statement. The logical method has the merit that it can account for the reason of unsolvability by a concrete formula, but no epistemic formula for the general unsolvability result for $k$-set agreement has been presented so far. We employ a variant of epistemic $\mu$-calculus, which extends standard epistemic logic with distributed knowledge operators and propositional fixpoints, as the formal language. With these extensions, we can provide an epistemic $\mu$-calculus formula that mentions higher-dimensional connectivity, which is essential in the original proof of Sperner's lemma, and thereby show that $k$-set agreement tasks are not solvable even by multi-round protocols. Furthermore, we show that the same formula establishes the unsolvability for $k$-concurrency, a submodel of the 2-round protocol.

Systematicity, i.e., the ability to recombine known parts and rules to form new sequences while reasoning over relational data, is critical to machine intelligence. A model with strong systematicity is able to train on small-scale tasks and generalize to large-scale tasks. In this paper, we propose R5, a relational reasoning framework based on reinforcement learning that reasons over relational graph data and explicitly mines underlying compositional logical rules from observations. R5 has strong systematicity and is robust to noisy data. It consists of a policy value network equipped with Monte Carlo Tree Search to perform recurrent relational prediction, and a backtrack rewriting mechanism for rule mining. By alternately applying the two components, R5 progressively learns a set of explicit rules from data and performs explainable and generalizable relation prediction. We conduct extensive evaluations on multiple datasets. Experimental results show that R5 outperforms various embedding-based and rule induction baselines on relation prediction tasks while achieving a high recall rate in discovering ground-truth rules.

Relational graph neural networks have garnered particular attention for encoding graph context in knowledge graphs (KGs). Although they achieve competitive performance on small KGs, how to efficiently and effectively utilize graph context for large KGs remains an open problem. To this end, we propose the Relation-based Embedding Propagation (REP) method, a post-processing technique that adapts pre-trained KG embeddings with graph context. As relations in KGs are directional, we model the incoming head context and the outgoing tail context separately, and we design relational context functions with no external parameters. In addition, we use averaging to aggregate context information, making REP more computation-efficient. We theoretically prove that these designs avoid information distortion during propagation. Extensive experiments also demonstrate that REP scales well while improving or maintaining prediction quality. Notably, it brings an average of about 10% relative improvement to triplet-based embedding methods on OGBL-WikiKG2 and takes only 5%-83% of the time to achieve results comparable to the state-of-the-art GC-OTE.
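The propagation step can be sketched in a few lines. This is a minimal illustration under assumptions: TransE-style, parameter-free context functions ($t - r$ for outgoing tail context, $h + r$ for incoming head context) and an interpolation weight `alpha` are our choices, not necessarily the paper's exact functions.

```python
# Hypothetical sketch of relation-based embedding propagation: post-process
# pre-trained entity embeddings by averaging directional relational context.

def rep_step(emb, triples, alpha=0.5):
    """One propagation step over (head, relation_vector, tail) triples.

    emb: dict entity -> list[float]; relation vectors live in the same space.
    Returns updated copies; the pre-trained embeddings stay intact.
    """
    dim = len(next(iter(emb.values())))
    ctx = {e: [0.0] * dim for e in emb}
    cnt = {e: 0 for e in emb}
    for h, r, t in triples:
        for i in range(dim):
            ctx[h][i] += emb[t][i] - r[i]   # outgoing tail context for h
            ctx[t][i] += emb[h][i] + r[i]   # incoming head context for t
        cnt[h] += 1
        cnt[t] += 1
    out = {}
    for e, v in emb.items():
        if cnt[e]:                           # average, then interpolate
            out[e] = [(1 - alpha) * v[i] + alpha * ctx[e][i] / cnt[e]
                      for i in range(dim)]
        else:
            out[e] = list(v)
    return out
```

Because the context functions carry no parameters and aggregation is a plain average, one step costs a single pass over the triples, which is what makes this kind of post-processing cheap on large KGs.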

We present ViT5, a pretrained Transformer-based encoder-decoder model for the Vietnamese language. With T5-style self-supervised pretraining, ViT5 is trained on a large corpus of high-quality and diverse Vietnamese texts. We benchmark ViT5 on two downstream text generation tasks, Abstractive Text Summarization and Named Entity Recognition. Although Abstractive Text Summarization has been widely studied for the English language thanks to its rich and large source of data, there has been minimal research into the same task in Vietnamese, a much lower resource language. In this work, we perform exhaustive experiments on both Vietnamese Abstractive Summarization and Named Entity Recognition, validating the performance of ViT5 against many other pretrained Transformer-based encoder-decoder models. Our experiments show that ViT5 significantly outperforms existing models and achieves state-of-the-art results on Vietnamese Text Summarization. On the task of Named Entity Recognition, ViT5 is competitive against previous best results from pretrained encoder-based Transformer models. Further analysis shows the importance of context length during the self-supervised pretraining on downstream performance across different settings.

Technologies and lifestyles have become increasingly geared toward consumerism in recent years. Accordingly, price and delivery time matter most to the ultimate customers of commercial enterprises, so the importance of an optimal delivery time is increasingly evident. Scheduling can be used to optimize supply chains and production systems, which is one practical method for lowering costs and boosting productivity. This paper proposes a multi-objective scheduling model for hierarchical hub structures (HHS) with three levels of service. The factory and the central hub are on the first level, where the factory has a Flexible Flow Shop (FFS) environment; the non-central hubs (third level) are responsible for delivering the products made in the factory to customers. Customer nodes and factories are connected separately to the second level, and the non-central hubs are connected to the third level. The model's objective is to minimize transportation and production costs as well as product arrival times. To validate and evaluate the model, small instances were solved and analyzed in detail with the weighted-sum and epsilon-constraint methods, and the two methods were compared on the designed instances using the mean ideal distance (MID) metric. Because the problem is NP-hard and the previously proposed methods become time-consuming on large-scale problems, a meta-heuristic method was developed to solve the large-scale problem.

We present an approach to generating 3D human models from images. The key to our framework is that we predict double-sided orthographic depth maps and color images from a single perspective projected image. Our framework consists of three networks. The first network predicts normal maps to recover geometric details such as wrinkles in the clothes and facial regions. The second network predicts shade-removed images for the front and back views by utilizing the predicted normal maps. The last multi-headed network takes both normal maps and shade-free images and predicts depth maps while selectively fusing photometric and geometric information through multi-headed attention gates. Experimental results demonstrate that our method shows visually plausible results and competitive performance in terms of various evaluation metrics over state-of-the-art methods.

Machine Learning (ML) has made unprecedented progress in the past several decades. However, because models memorize their training data, ML is susceptible to various attacks, especially Membership Inference Attacks (MIAs), which aim to infer whether a given sample was part of a model's training data. So far, most membership inference attacks against ML classifiers leverage a shadow model with the same structure as the target model. However, empirical results show that these attacks can be easily mitigated if the adversary does not know the network structure of the target model. In this paper, we present attacks based on black-box access to the target model, which we name \textbf{l-Leaks}. l-Leaks follows the intuition that if an established shadow model is similar enough to the target model, the adversary can leverage the shadow model's information to predict a target sample's membership. The logits of the trained target model contain valuable sample knowledge, so we build the shadow model by learning the logits of the target model, making the shadow model more similar to the target. The shadow model will then have sufficient confidence in the member samples of the target model. We also discuss the effect of the shadow model's network structure on attack results. Experiments over different networks and datasets demonstrate that our attacks achieve strong performance.
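The core idea, fitting a shadow model to the target's *logits* rather than only its hard labels, can be sketched with a deliberately tiny example. Everything here is illustrative: a 1-D linear shadow model and plain gradient descent stand in for the paper's neural shadow models, and `query_target` is a stand-in for the black-box logit API.

```python
# Minimal sketch of logit-matching shadow-model training (l-Leaks-style):
# query the black-box target for logits, then fit the shadow to reproduce
# them by minimizing the squared logit-matching loss.

def fit_shadow_to_logits(query_target, xs, lr=0.1, steps=200):
    """Fit a 1-D linear shadow model s(x) = w*x + b to target logits."""
    w, b = 0.0, 0.0
    ys = [query_target(x) for x in xs]   # black-box logit queries
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            gw += 2 * err * x / n        # d/dw of mean squared error
            gb += 2 * err / n            # d/db of mean squared error
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Once the shadow reproduces the target's logits on the query set, its confidence scores can be thresholded to predict membership of candidate samples.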

Recently, some mixed-alphabet rings have been used to construct few-Lee-weight codes with optimal or minimal Gray images using suitable defining sets or down-sets. Inspired by these works, we choose the mixed-alphabet ring $\mathbb{Z}_2\mathbb{Z}_2[u]$ to construct a special class of linear codes $C_L$ over $\mathbb{Z}_2[u]$ with $u^2=0$ by employing simplicial complexes generated by a single maximal element. We show that $C_L$ has few Lee weights by determining the Lee weight distribution of $C_L$. We also obtain an infinite family of minimal codes over $\mathbb{Z}_2$ via the Gray map, which can be used in secret sharing schemes.

We consider the problem of estimating an upper bound on the capacity of a memoryless channel with unknown channel law and continuous output alphabet. A novel data-driven algorithm is proposed that exploits the dual representation of capacity where the maximization over the input distribution is replaced with a minimization over a reference distribution on the channel output. To efficiently compute the required divergence maximization between the conditional channel and the reference distribution, we use a modified mutual information neural estimator that takes the channel input as an additional parameter. We evaluate our approach on different memoryless channels and show that the estimated upper bounds closely converge either to the channel capacity or to best-known lower bounds.
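The dual representation referred to above can be stated concretely. For a memoryless channel without input constraints, capacity admits a well-known minimax (dual) form in which the maximization over input distributions is replaced by a minimization over reference distributions on the output:

```latex
C \;=\; \max_{p_X} I(X;Y)
  \;=\; \min_{q_Y}\, \max_{x}\, D\!\left(P_{Y\mid X=x}\,\middle\|\,q_Y\right),
```

so any fixed reference distribution $q_Y$ on the channel output yields the upper bound $C \le \max_{x} D\!\left(P_{Y\mid X=x}\,\middle\|\,q_Y\right)$. The data-driven algorithm described in the abstract approximates the inner divergence maximization with a neural estimator that takes the channel input as an additional parameter.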

In this paper, a cooperative communication network based on two energy-harvesting (EH) decode-and-forward (DF) relays, which harvest energy from the ambience using buffers with a harvest-store-use (HSU) architecture, is considered. To improve data delivery in this network, an opportunistic routing (OR) algorithm considering channel state information (CSI), location, and the energy buffer status of the relays is proposed. To derive the theoretical expressions for the limiting distribution of the energy stored in the buffers using a discrete-time continuous-state space Markov chain (DCSMC) model, the probabilities that the packet to be forwarded exists in one or more transmitting nodes are obtained from the state transition matrix (STM). Moreover, closed-form expressions for the outage probability and throughput of the network, based on the CSI and the limiting distributions of the energy stored in the buffers, are presented. Numerous experiments have been performed to verify the derived theoretical expressions.

As we discussed in Part I of this topic, there is a clear desire to model and comprehend human behavior. Given the popular presupposition of human reasoning as the standard for learning and decision-making, there have been significant efforts and a growing trend in research to replicate these innate human abilities in artificial systems. In Part I, we discussed learning methods that generate a model of behavior from exploration of the system and feedback based on the exhibited behavior, as well as topics relating to the use of, or accounting for, beliefs with respect to applicable skills or the mental states of others. In this work, we continue the discussion from the perspective of methods that focus on the assumed cognitive abilities, limitations, and biases demonstrated in human reasoning. We arrange these topics as follows: (i) methods, such as cognitive architectures and cognitive heuristics, that assume limitations on cognitive resources and examine how this impacts decisions, and (ii) methods that generate and utilize representations of bias or uncertainty to model human decision-making or the future outcomes of decisions.

Supply Chain (SC) modeling is essential to understand and influence SC behavior, especially for increasingly globalized and complex SCs. Existing models address various SC notions, e.g., processes, tiers, and production, in an isolated manner, limiting the enriched analysis granted by integrated information systems. Moreover, the scarcity of real-world data prevents benchmarking of overall SC performance in different circumstances, especially with respect to resilience during disruption. We present SENS, an ontology-based Knowledge Graph (KG) equipped with SPARQL implementations of KPIs to incorporate an end-to-end perspective of the SC, including standardized SCOR processes and metrics. Further, we propose SENS-GEN, a highly configurable data generator that leverages SENS to create synthetic semantic SC data under multiple scenario configurations for comprehensive analysis and benchmarking applications. The evaluation shows that the significantly improved simulation and analysis capabilities enabled by SENS facilitate grasping, controlling, and ultimately enhancing SC behavior, thereby increasing resilience in disruptive scenarios.

There is a clear desire to model and comprehend human behavior. Trends in research covering this topic show a clear assumption that many view human reasoning as the presupposed standard in artificial reasoning. As such, topics such as game theory, theory of mind, and machine learning all integrate concepts that are assumed components of human reasoning. These serve as techniques to both replicate and understand the behaviors of humans. In addition, next-generation autonomous and adaptive systems will largely include AI agents and humans working together as teams. To make this possible, autonomous agents will require the ability to embed practical models of human behavior, which allow them not only to replicate human models as a technique to "learn", but also to understand the actions of users and anticipate their behavior, so as to truly operate in symbiosis with them. The main objective of this paper is to provide a succinct yet systematic review of the most important approaches in two areas dealing with quantitative models of human behavior. Specifically, we focus on (i) techniques that learn a model or policy of behavior through exploration and feedback, such as Reinforcement Learning, and (ii) techniques that directly model mechanisms of human reasoning, such as beliefs and bias, without necessarily learning via trial-and-error.

Online federated learning (OFL) is a promising framework for collaboratively learning a sequence of non-linear functions (or models) from distributed streaming data arriving at multiple clients while keeping their local data private. In this framework, we first construct a vanilla method (named OFedAvg) by incorporating online gradient descent (OGD) into the de facto aggregation method (named FedAvg). Despite its optimal asymptotic performance, OFedAvg suffers from heavy communication overhead and long learning delay. To tackle these shortcomings, we propose a communication-efficient OFL algorithm (named OFedQIT) by means of stochastic quantization and intermittent transmission. Our major contribution is to theoretically prove that OFedQIT over $T$ time slots can achieve an optimal sublinear regret bound $\mathcal{O}(\sqrt{T})$ for any real data (including non-IID data) while significantly reducing the communication overhead. Furthermore, this optimality is still guaranteed even when only a small fraction of clients (those with faster processing times and high-quality communication channels) in the network participate at any given time. Our analysis reveals that OFedQIT successfully addresses the drawbacks of OFedAvg while maintaining superior learning accuracy. Experiments with real datasets demonstrate the effectiveness of our algorithm on various online classification and regression tasks.
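Stochastic quantization of the kind used in such communication-efficient schemes can be sketched as follows. This is an illustrative unbiased stochastic-rounding quantizer, not necessarily OFedQIT's exact scheme; the level count and grid layout are our assumptions.

```python
import random

# Illustrative unbiased stochastic quantizer: each coordinate is snapped
# to one of `levels` evenly spaced points in [-m, m] (m = max |v_i|),
# rounding up or down at random so that E[Q(v)] = v.

def stochastic_quantize(v, levels=4, rng=random):
    m = max(abs(x) for x in v) or 1.0
    step = 2 * m / (levels - 1)
    out = []
    for x in v:
        k = (x + m) / step               # position in units of the step
        lo = int(k)
        p = k - lo                        # probability of rounding up
        q = lo + (1 if rng.random() < p else 0)
        out.append(-m + q * step)
    return out
```

Each coordinate can then be transmitted as a small integer index plus the shared scale $m$, which is where the communication savings come from; unbiasedness is what lets the regret analysis go through despite the added quantization noise.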

The analytic deep prior (ADP) approach was recently introduced for the theoretical analysis of deep image prior (DIP) methods with special network architectures. In this paper, we prove that ADP is in fact equivalent to classical variational Ivanov methods for solving ill-posed inverse problems. Besides, we propose a new variant which incorporates the strategy of early stopping into the ADP model. For both variants, we show how classical regularization properties (existence, stability, convergence) can be obtained under common assumptions.

Gaussian process regression (GPR) is a well-known machine learning method for various applications such as uncertainty quantification (UQ). However, GPR is inherently data-driven and requires a sufficiently large dataset. If appropriate physics constraints (e.g., those expressed as partial differential equations) can be incorporated, the amount of data can be greatly reduced and the accuracy further improved. In this work, we propose a hybrid data-driven, physics-constrained Gaussian process regression framework. We encode the physics knowledge with a Boltzmann-Gibbs distribution and derive our model through a maximum likelihood (ML) approach, applying the deep kernel learning method. The proposed model learns from both data and physics constraints through the training of a deep neural network, which serves as part of the covariance function in GPR. The proposed model achieves good results in high-dimensional problems and correctly propagates the uncertainty, even with very limited labelled data.
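The deep-kernel construction mentioned above can be illustrated in miniature: a learned feature map is applied before a standard kernel, so the GPR covariance becomes $k(x, x') = k_{\mathrm{base}}(\phi(x), \phi(x'))$. The toy one-unit "network" `phi` below is purely illustrative; in deep kernel learning, $\phi$ is a trained neural network.

```python
import math

# Sketch of a deep kernel: an RBF base kernel composed with a feature map.

def phi(x, w=(1.0, 0.5)):
    # toy 1-layer "network": tanh feature of a scalar input
    return math.tanh(w[0] * x + w[1])

def deep_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * (phi(x) - phi(y))^2): a valid covariance,
    since composing a kernel with any deterministic map preserves PSD-ness."""
    d = phi(x) - phi(y)
    return math.exp(-gamma * d * d)
```

Training the weights of `phi` jointly with the GPR marginal likelihood is what lets the covariance absorb both the data and, in this work, the physics-informed Boltzmann-Gibbs structure.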

Caching multimedia contents at the network edge is a key solution to decongest the amount of traffic in the backhaul link. In this paper, we extend and analyze the coded caching technique [1] in an unexplored scenario, i.e. at the edge of two-tier heterogeneous networks with an arbitrary number of users. We characterize the performance of such scheme by deriving a closed-form expression of the average backhaul load and reveal a significant gain compared to other benchmark caching schemes proposed in the literature.

Work on Local Dynamic Maps (LDM) implementation is still in its early stages, as the LDM standards only define how information shall be structured in databases, while the mechanism to fuse or link information across different layers is left undefined. A working LDM component, implemented as a real-time database inside the vehicle, is an attractive solution for multi-ADAS systems, in which multiple ADAS sources may feed a real-time LDM database that serves as a central point of information inside the vehicle, exposing fused and structured information to other components (e.g., decision-making systems). In this paper we describe our approach to implementing a real-time LDM component using the RTMaps middleware, deployed as a database in a vehicle but also at road-side units (RSU), making use of the three pillars that guide a successful fusion strategy: utilisation of standards (with conversions between domains), middlewares to unify multiple ADAS sources, and linkage of data via semantic concepts.

We survey what is known about Fekete points/optimal designs for a simplex in $\mathbb{R}^d$. Several new results are included. The notion of a Fej\'er exponent for a set of interpolation points is introduced.

Supply Chains (SCs) are subject to disruptive events that can hinder operational performance. The Disruption Management Process (DMP) relies on the analysis of integrated heterogeneous data sources, such as production scheduling, order management, and logistics, to evaluate the impact of disruptions on the SC. Existing approaches are limited because they address DMP process steps and the corresponding data sources in a rather isolated manner, which hinders the systematic handling of a disruption originating anywhere in the SC. We therefore propose MARE, a semantic disruption management and resilience evaluation framework that integrates the data sources involved in all DMP steps, i.e., Monitor/Model, Assess, Recover, and Evaluate. MARE leverages semantic technologies, i.e., ontologies, knowledge graphs, and SPARQL queries, to model and reproduce SC behavior under disruptive scenarios. MARE also includes an evaluation framework to examine the restoration performance of a SC applying various recovery strategies. The semantic SC DMP put forward by MARE allows stakeholders to identify measures to enhance SC integration, increase the resilience of supply networks, and ultimately facilitate digitalization.

Reinforcement learning (RL) is highly suitable for devising control strategies in the context of dynamical systems. A prominent instance of such a dynamical system is the system of equations governing fluid dynamics. Recent research results indicate that RL-augmented computational fluid dynamics (CFD) solvers can exceed the current state of the art, for example in the field of turbulence modeling. However, while in supervised learning the training data can be generated a priori in an offline manner, RL requires constant run-time interaction and data exchange with the CFD solver during training. In order to leverage the potential of RL-enhanced CFD, the interaction between the CFD solver and the RL algorithm thus has to be implemented efficiently on high-performance computing (HPC) hardware. To this end, we present Relexi, a scalable RL framework that bridges the gap between machine learning workflows and modern CFD solvers on HPC systems, providing both components with their specialized hardware. Relexi is built with modularity in mind and allows easy integration of various HPC solvers by means of the in-memory data transfer provided by the SmartSim library. Here, we demonstrate that the Relexi framework can scale up to hundreds of parallel environments on thousands of cores. This makes it possible to leverage modern HPC resources to either tackle larger problems or achieve faster turnaround times. Finally, we demonstrate the potential of an RL-augmented CFD solver by finding a control strategy for optimal eddy viscosity selection in large eddy simulations.

Cloud service providers have launched Machine-Learning-as-a-Service (MLaaS) platforms to allow users to access large-scale cloud-based models via APIs. In addition to prediction outputs, these APIs can also provide other information in a more human-understandable way, such as counterfactual explanations (CFs). However, such extra information inevitably makes the cloud models more vulnerable to extraction attacks, which aim to steal the internal functionality of models in the cloud. Due to the black-box nature of cloud models, existing attack strategies require a vast number of queries before the substitute model achieves high fidelity. In this paper, we propose a novel, simple yet efficient querying strategy that greatly enhances the querying efficiency of stealing a classification model. It is motivated by our observation that current querying strategies suffer from a decision-boundary-shift issue induced by taking far-distant queries and close-to-boundary CFs into substitute model training. We therefore propose the DualCF strategy to circumvent this issue: we take not only the CF but also the counterfactual explanation of the CF (CCF) as pairs of training samples for the substitute model. Extensive and comprehensive experimental evaluations are conducted on both synthetic and real-world datasets. The experimental results show that DualCF can efficiently and effectively produce a high-fidelity model with fewer queries.
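The CF/CCF pairing can be illustrated on a toy 1-D classifier. All details here are assumed for illustration: `cf_oracle` stands in for the cloud API's counterfactual endpoint, returning the nearest point just past a fixed decision boundary.

```python
# Toy illustration of the DualCF pairing idea: query the CF of each sample,
# then query the CF of that CF (the CCF). The resulting (CF, CCF) pair
# straddles the decision boundary at a matched distance on both sides,
# which mitigates the decision-boundary-shift issue when training a
# substitute model.

def cf_oracle(x, boundary=2.0, margin=0.25):
    """Stand-in for the cloud API: nearest point past the boundary."""
    return boundary + margin if x <= boundary else boundary - margin

def dualcf_pairs(queries, oracle=cf_oracle):
    pairs = []
    for x in queries:
        cf = oracle(x)        # first API call: counterfactual of x
        ccf = oracle(cf)      # second API call: counterfactual of the CF
        pairs.append((cf, ccf))
    return pairs
```

Notice that, regardless of how far the original query `x` sits from the boundary, the training pair (CF, CCF) is always balanced around it, unlike (x, CF) pairs, which mix far-distant and close-to-boundary samples.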

The pharmaceutical industry can better leverage its data assets to virtualize drug discovery through a collaborative machine learning platform. On the other hand, there are non-negligible risks stemming from the unintended leakage of participants' training data; hence, it is essential for such a platform to be secure and privacy-preserving. This paper describes a privacy risk assessment for collaborative modeling in the preclinical phase of drug discovery to accelerate the selection of promising drug candidates. After a short taxonomy of state-of-the-art inference attacks, we adopt and customize several of them to the underlying scenario. Finally, we describe experiments with a handful of relevant privacy protection techniques to mitigate such attacks.

The notion of concept drift refers to the phenomenon that the data-generating distribution changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. In this paper we consider the problem of detecting those change points in unsupervised learning. Many unsupervised approaches rely on the discrepancy between the sample distributions of two time windows. This procedure is noisy for small windows, hence prone to inducing false positives, and cannot deal with more than one drift event in a window. In this paper we instead rely on structural properties of drift-induced signals, based on spectral properties of kernel embeddings of distributions. Building on these, we derive a new unsupervised drift detection algorithm, investigate its mathematical properties, and demonstrate its usefulness in several experiments.
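The two-window baseline that this line of work improves on can be made concrete. The sketch below implements the standard discrepancy test, comparing the kernel mean embeddings of two windows via the (biased) squared Maximum Mean Discrepancy with an RBF kernel; it is an assumed baseline, not the paper's spectral method, and the threshold is a free parameter.

```python
import math

# Two-window drift check via squared MMD with an RBF kernel on 1-D samples.

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def mmd2(win_a, win_b, gamma=1.0):
    """Biased estimate of squared MMD between two sample windows."""
    def avg_k(u, v):
        return sum(rbf(x, y, gamma) for x in u for y in v) / (len(u) * len(v))
    return avg_k(win_a, win_a) + avg_k(win_b, win_b) - 2 * avg_k(win_a, win_b)

def drift_detected(win_a, win_b, threshold=0.1, gamma=1.0):
    return mmd2(win_a, win_b, gamma) > threshold
```

The weaknesses the abstract names are visible here: with small windows the estimate is noisy (so the threshold trips spuriously), and a single window containing two drift events yields one blurred discrepancy rather than two change points.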

Font synthesis has been a very active topic in recent years because manual font design requires domain expertise and is labor-intensive and time-consuming. While remarkably successful, existing methods for font synthesis have major shortcomings: they require finetuning on large sets of reference images for unobserved font styles, and the recent few-shot font synthesis methods are either designed for specific language systems or operate on low-resolution images, which limits their use. In this paper, we tackle the font synthesis problem by learning the font style in an embedding space. To this end, we propose a model, called FontNet, that simultaneously learns to separate font styles in an embedding space, where distances directly correspond to a measure of font similarity, and to translate input images into a given observed or unobserved font style. Additionally, we design a network architecture and training procedure that can be adopted for any language system and can produce high-resolution font images. Thanks to this approach, our proposed method outperforms the existing state-of-the-art font generation methods in both qualitative and quantitative experiments.

Current search interfaces of digital libraries cannot directly satisfy complex or convoluted information needs, such as "Find authors who only recently started working on a topic"; they can surface this information only through extensive user interaction. We present SchenQL, a web interface for a domain-specific query language on bibliographic metadata, which offers information search and exploration through query formulation and navigation in the system. Our system focuses on supporting aggregation of data and providing specialised, domain-dependent functions while being suitable for domain experts as well as casual users of digital libraries.

Spike sorting algorithms are used to separate extracellular recordings of neuronal populations into single-unit spike activities. The development of customized hardware implementing spike sorting algorithms is burgeoning. However, there is a lack of a systematic approach and of standardized evaluation criteria to facilitate direct comparison of both software and hardware implementations. In this paper, we formalize a set of standardized criteria and a publicly available synthetic dataset entitled Synthetic Simulations Of Extracellular Recordings (SSOER), which was constructed by aggregating existing synthetic datasets with varying Signal-To-Noise Ratios (SNRs). Furthermore, we present a benchmark for future comparison and use our criteria to evaluate a simulated Resistive Random-Access Memory (RRAM) In-Memory Computing (IMC) system that uses the Discrete Wavelet Transform (DWT) for feature extraction. Our system consumes approximately 10.72 mW per channel and occupies an area of 0.66 mm$^2$ in a 22 nm FDSOI Complementary Metal-Oxide-Semiconductor (CMOS) process.

In this paper, we study the information transmission problem under the distributed learning framework, where each worker node is permitted to transmit only an $m$-dimensional statistic to improve the learning results of the target node. Specifically, we evaluate the corresponding expected population risk (EPR) in the regime of large sample sizes. We prove that the performance can be enhanced since the transmitted statistics contribute to estimating the underlying distribution under the mean square error measured by the EPR norm matrix. Accordingly, the transmitted statistics correspond to eigenvectors of this matrix, and the desired transmission allocates these eigenvectors among the statistics so that the EPR is minimized. Moreover, we provide the analytical solution of the desired statistics for single-node and two-node transmission, where a geometrical interpretation is given to explain the eigenvector selection. For the general case, an efficient algorithm that can output the allocation solution is developed based on node partitions.

In this paper we discuss different transmission operators for the non-overlapping Schwarz method which are suited for solving the time-harmonic Helmholtz equation in cavities (i.e. closed domains which do not feature an outgoing wave condition). Such problems are heavily impacted by back-propagating waves which are often neglected when devising optimized transmission operators for the Schwarz method. This work explores new operators taking into account those back-propagating waves and compares them with well-established operators neglecting these contributions. Notably, this paper focuses on the case of rectangular cavities, as the optimal (non-local) transmission operator can be easily determined. Nonetheless, deviations from this ideal geometry are considered as well. In particular, computations of the acoustic noise in a three-dimensional model of the helium vessel of a beamline cryostat with optimized Schwarz schemes are discussed. These computations show a 46% reduction in iteration count when comparing an operator optimized for cavities with one optimized for unbounded problems.

As the amount of audio-visual content increases, developing automatic captioning and subtitling solutions that match the expectations of a growing international audience is the only viable way to boost throughput and lower the related post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme to achieve a strong coupling between these two tasks and show how adequacy and consistency are increased, with virtually no additional cost in terms of model size and training complexity.

The identification capacities of randomized and deterministic identification were proved to exceed the channel capacity for Gaussian channels \emph{with} channel side information (CSI). In this work, we extend deterministic identification to block fading channels without CSI by applying identification codes for both channel estimation and user identification. We prove that the identification capacity is asymptotically higher than the transmission capacity even in the absence of CSI. We also analyze the finite-length performance both theoretically and numerically. The simulation results verify the feasibility of the proposed blind deterministic identification in the finite-blocklength regime.

The Whole-Body Locomotion Framework (WoLF) is an end-to-end software suite devoted to the loco-manipulation of quadruped robots. WoLF abstracts the complexity of planning and control of quadrupedal robot hardware into simple-to-use and robust software that can be connected through multiple tele-operation devices to different quadruped robot models. Furthermore, WoLF allows controlling mounted devices, such as arms or pan-tilt cameras, jointly with the quadrupedal platform. In this short paper, we introduce the main features of WoLF and its overall software architecture.

A key challenge in video question answering is how to realize the cross-modal semantic alignment between textual concepts and corresponding visual objects. Existing methods mostly seek to align the word representations with the video regions. However, word representations are often not able to convey a complete description of textual concepts, which are in general described by the compositions of certain words. To address this issue, we propose to first build a syntactic dependency tree for each question with an off-the-shelf tool and use it to guide the extraction of meaningful word compositions. Based on the extracted compositions, a hypergraph is further built by viewing the words as nodes and the compositions as hyperedges. Hypergraph convolutional networks (HCN) are then employed to learn the initial representations of word compositions. Afterwards, an optimal-transport-based method is proposed to perform cross-modal semantic alignment between the textual and visual semantic spaces. To reflect the cross-modal influences, the cross-modal information is incorporated into the initial representations, leading to a model named cross-modality-aware syntactic HCN. Experimental results on three benchmarks show that our method outperforms all strong baselines. Further analyses demonstrate the effectiveness of each component, and show that our model is good at modeling different levels of semantic compositions and filtering out irrelevant information.
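
The optimal-transport alignment step can be sketched with entropy-regularized Sinkhorn iterations between a few composition embeddings and video-region embeddings. The cost values and dimensions below are invented for illustration; the model's actual cost is computed from learned features.

```python
# Sketch: Sinkhorn-Knopp iterations for entropy-regularized optimal transport
# with uniform marginals, the standard machinery behind OT-based alignment.

import math

def sinkhorn(cost, reg=0.1, iters=200):
    """Return an OT plan (list of rows) for an n x m cost matrix."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two text compositions vs. two video regions; low cost = good semantic match.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost)  # mass concentrates on the matching (low-cost) pairs
```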

Fog networks offer computing resources with varying capacities at different distances from end users. A Fog Node (FN) closer to the network edge may have less powerful computing resources compared to the cloud, but processing of computational tasks in an FN limits long-distance transmission. How should the tasks be distributed between fog and cloud nodes? We formulate a universal non-convex Mixed-Integer Nonlinear Programming (MINLP) problem minimizing task transmission- and processing-related energy with delay constraints to answer this question. It is transformed with Successive Convex Approximation (SCA) and decomposed using the primal and dual decomposition techniques. Two practical algorithms called Energy-EFFicient Resource Allocation (EEFFRA) and Low-Complexity (LC)-EEFFRA are proposed. They allow for successful distribution of network requests between FNs and the cloud in various scenarios, significantly reducing the average energy cost and the number of computational requests with unmet delay requirements.

In recommender systems, some features directly affect whether an interaction happens, so the observed interactions do not necessarily indicate user preference. For instance, short videos are objectively easier to finish even if the user does not like the video. We term such a feature a confounding feature; video length is a confounding feature in video recommendation. If we fit a model on such interaction data, as most data-driven recommender systems do, the model will be biased toward recommending short videos and deviate from users' actual preferences. This work formulates and addresses the problem from the causal perspective. Assuming there are some factors affecting both the confounding feature and other item features, e.g., the video creator, we find the confounding feature opens a backdoor path behind user-item matching and introduces spurious correlation. To remove the effect of the backdoor path, we propose a framework named Deconfounding Causal Recommendation (DCR), which performs intervened inference with do-calculus. Nevertheless, evaluating do-calculus requires summing over the prediction for all possible values of the confounding feature, significantly increasing the time cost. To address this efficiency challenge, we further propose a mixture-of-experts (MoE) model architecture, modeling each value of the confounding feature with a separate expert module. In this way, we retain the model expressiveness with little additional cost. We demonstrate DCR on the backbone model of the neural factorization machine (NFM), showing that DCR leads to more accurate prediction of user preference with a small inference-time cost.
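
The backdoor adjustment at the heart of the intervened inference can be sketched on a toy discrete example: the interventional probability sums the conditional prediction over all confounder values, weighted by their marginals. All probabilities below are invented for illustration, not learned from data.

```python
# Sketch: backdoor adjustment via do-calculus,
#   P(click=1 | do(length)) = sum_c P(click=1 | length, c) * P(c),
# which is the summation over confounder values that the MoE architecture
# is designed to accelerate.

# P(c): marginal over a latent confounder, e.g. a creator type.
p_c = {"casual": 0.6, "pro": 0.4}

# P(click=1 | length, c): a toy conditional preference model.
p_click = {("short", "casual"): 0.5, ("short", "pro"): 0.7,
           ("long", "casual"): 0.4, ("long", "pro"): 0.8}

def p_do(length):
    """Interventional click probability under do(length)."""
    return sum(p_click[(length, c)] * p_c[c] for c in p_c)

short_effect = p_do("short")  # 0.5*0.6 + 0.7*0.4 = 0.58
long_effect = p_do("long")    # 0.4*0.6 + 0.8*0.4 = 0.56
```

Note that the naive conditional P(click | length) would mix in the confounder's effect on length itself; the adjusted quantities above isolate the causal effect of the feature.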

Internet of Things (IoT) is a growing technology that relies on connected 'things' that gather data from peer devices and send data to servers via APIs (Application Programming Interfaces). The design quality of those APIs has a direct impact on their understandability and reusability. This study focuses on the linguistic design quality of REST APIs for IoT applications and assesses their linguistic quality by performing the detection of linguistic patterns and antipatterns in REST APIs for IoT applications. Linguistic antipatterns are poor practices in the naming, documentation, and choice of identifiers. In contrast, linguistic patterns represent best practices in API design. The linguistic patterns and their corresponding antipatterns are hence contrasting pairs. We propose the SARAv2 (Semantic Analysis of REST APIs version two) approach to perform syntactic and semantic analyses of REST APIs for IoT applications. Based on the SARAv2 approach, we develop the REST-Ling tool and empirically validate the detection results of nine linguistic antipatterns. We analyse 19 REST APIs for IoT applications. Our detection results show that the linguistic antipatterns are prevalent and the REST-Ling tool can detect linguistic patterns and antipatterns in REST APIs for IoT applications with an average accuracy of over 80%. Moreover, the tool performs the detection of linguistic antipatterns on average in the order of seconds, i.e., 8.396 seconds. We found that APIs generally follow good linguistic practices, although poor practices remain prevalent.

Choreographic models support a correctness-by-construction principle in distributed programming. Also, they enable the automatic generation of correct message-based communication patterns from a global specification of the desired system behaviour. In this paper we extend the theory of choreography automata, a choreographic model based on finite-state automata, with two key features. First, we allow participants to act only in some of the scenarios described by the choreography automaton. While this seems natural, many choreographic approaches in the literature, and choreography automata in particular, forbid this behaviour. Second, we equip communications with assertions constraining the values that can be communicated, enabling a design-by-contract approach. We provide a toolchain that exploits the theory above to generate APIs for TypeScript web programming. Programs communicating via the generated APIs follow, by construction, the prescribed communication pattern and are free from communication errors such as deadlocks.

Neural code synthesis has reached a point where snippet generation is accurate enough to be considered for integration into human software development workflows. Commercial products aim to increase programmers' productivity, without being able to measure it directly. In this case study, we asked users of GitHub Copilot about its impact on their productivity, and sought to find a reflection of their perception in directly measurable user data. We find that the rate at which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers' perception of productivity.

This work is dedicated to the review and perspective of a new direction that we call the "Neuropunk revolution", resembling the cultural phenomenon of cyberpunk. This new phenomenon has its foundations in advances in neuromorphic technologies, including memristive and bio-plausible simulations, brain-computer interfaces (BCIs), and neurointerfaces, as well as unconventional approaches to AI and computing in general. We review the current state of the art and present our vision of the near-future development of scientific approaches and technologies. By the "Neuropunk revolution" we mean the set of trends that, in our view, provide the necessary background for a new generation of approaches and technologies to integrate cybernetic objects with biological tissues in closed-loop systems, as well as robotic systems inspired by biological processes and again integrated with biological objects. We see bio-plausible simulations implemented by digital computers or spiking-network memristive hardware as a promising bridge or middleware between the digital and (neuro)biological domains.

We develop a discrete extension operator for trimmed spline spaces consisting of piecewise polynomial functions of degree $p$ with $k$ continuous derivatives. The construction is based on polynomial extension from neighboring elements together with projection back into the spline space. We prove stability and approximation results for the extension operator. Finally, we illustrate how we can use the extension operator to construct a stable cut isogeometric method for an elliptic model problem.

Many software systems, such as online social networks, enable users to share information about themselves. While the action of sharing is simple, it requires an elaborate thought process on privacy: what to share, with whom to share, and for what purposes. Thinking about these for each piece of content to be shared is tedious. Recent approaches to tackle this problem build personal assistants that can help users by learning what is private over time and recommending privacy labels such as private or public for individual content that a user considers sharing. However, privacy is inherently ambiguous and highly personal. Existing approaches to recommend privacy decisions do not address these aspects of privacy sufficiently. Ideally, a personal assistant should be able to adjust its recommendation based on a given user, considering that user's privacy understanding. Moreover, the personal assistant should be able to assess when its recommendation would be uncertain and let the user make the decision on her own. Accordingly, this paper proposes a personal assistant that uses evidential deep learning to classify content based on its privacy label. An important characteristic of the personal assistant is that it can model its uncertainty in its decisions explicitly, determine that it does not know the answer, and abstain from making a recommendation when its uncertainty is high. By factoring in the user's own understanding of privacy, such as risk factors or own labels, the personal assistant can personalize its recommendations per user. We evaluate our proposed personal assistant using a well-known data set. Our results show that our personal assistant can accurately identify uncertain cases, personalize its recommendations to its user's needs, and thus help users preserve their privacy well.
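
The uncertainty mass that drives the abstain-or-recommend decision can be sketched with the standard subjective-logic formulation used in evidential deep learning: from per-class evidence $e_k$, the Dirichlet parameters are $\alpha_k = e_k + 1$, belief is $b_k = e_k / S$ and uncertainty is $u = K / S$ with $S = \sum_k \alpha_k$. The evidence values below are illustrative, not outputs of the paper's model.

```python
# Sketch: belief and uncertainty from class evidence, subjective-logic style.

def evidential_uncertainty(evidence):
    """Return (per-class beliefs, uncertainty mass) from non-negative evidence."""
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)
    beliefs = [e / S for e in evidence]
    return beliefs, K / S

# Confident case: strong evidence for "private" -> low uncertainty, recommend.
_, u_confident = evidential_uncertainty([18.0, 0.0])
# Ambiguous case: almost no evidence -> high uncertainty, delegate to the user.
_, u_unsure = evidential_uncertainty([0.5, 0.5])
```

A simple policy then thresholds the uncertainty mass: recommend the top-belief label when u is small, otherwise let the user decide.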

An important limitation to the development of AI-based solutions for In Vitro Fertilization (IVF) is the black-box nature of most state-of-the-art models, due to the complexity of deep learning architectures, which raises potential bias and fairness issues. The need for interpretable AI has risen not only in the IVF field but also in the deep learning community in general. This has started a trend in the literature where authors focus on designing objective metrics to evaluate generic explanation methods. In this paper, we study the behavior of recently proposed objective faithfulness metrics applied to the problem of embryo stage identification. We benchmark attention models and post-hoc methods using these metrics and further show empirically that (1) the metrics produce low overall agreement on the model ranking and (2) depending on the metric approach, either post-hoc methods or attention models are favored. We conclude with general remarks about the difficulty of defining faithfulness and the necessity of understanding its relationship with the type of approach that is favored.

We present a deep learning model for finding human-understandable connections between input features. Our approach uses a parameterized, differentiable activation function, based on the theoretical background of nilpotent fuzzy logic and multi-criteria decision-making (MCDM). The learnable parameter has a semantic meaning indicating the level of compensation between input features. The neural network determines the parameters using gradient descent to find human-understandable relationships between input features. We demonstrate the utility and effectiveness of the model by successfully applying it to classification problems from the UCI Machine Learning Repository.
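
To make the idea of a compensation parameter concrete, here is a simplified stand-in for such a nilpotent operator, built from the cutting function $[x] = \min(1, \max(0, x))$. The parameterization below is an illustrative assumption, not the paper's exact activation: with $\nu = 1$ it reduces to the Lukasiewicz t-norm (a strict AND) and with $\nu = 0$ to the bounded sum (an OR), and intermediate values interpolate between the two.

```python
# Sketch: a parameterized nilpotent operator on unit-interval inputs, where
# the parameter nu plays the role of the learnable compensation level.

def cut(x):
    """Cutting function: clamp to the unit interval [0, 1]."""
    return min(1.0, max(0.0, x))

def nilpotent_op(x, y, nu):
    """Compensative combination of two fuzzy truth values x, y in [0, 1]."""
    return cut(x + y - nu)

and_like = nilpotent_op(0.7, 0.6, 1.0)  # Lukasiewicz t-norm: max(0, x + y - 1)
or_like = nilpotent_op(0.7, 0.6, 0.0)   # bounded sum: min(1, x + y)
```

In the network, nu would be a trainable parameter per neuron, so gradient descent can position each unit anywhere on the AND-OR spectrum, and the learned value is directly interpretable.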

Although deep face recognition has achieved impressive progress in recent years, controversy has arisen regarding discrimination based on skin tone, questioning the deployment of such systems in real-world scenarios. In this paper, we aim to systematically and scientifically study this bias from both the data and algorithm aspects. First, using the dermatologist-approved Fitzpatrick Skin Type classification system and the Individual Typology Angle, we contribute a benchmark called the Identity Shades (IDS) database, which effectively quantifies the degree of bias with respect to skin tone in existing face recognition algorithms and commercial APIs. Further, we provide two skin-tone-aware training datasets, called the BUPT-Globalface dataset and the BUPT-Balancedface dataset, to remove bias in training data. Finally, to mitigate the algorithmic bias, we propose a novel meta-learning algorithm, called Meta Balanced Network (MBN), which learns adaptive margins in a large-margin loss such that the model optimized by this loss can perform fairly across people with different skin tones. To determine the margins, our method optimizes a meta skewness loss on a clean and unbiased meta set and utilizes backward-on-backward automatic differentiation to perform a second-order gradient descent step on the current margins. Extensive experiments show that MBN successfully mitigates bias and learns more balanced performance for people with different skin tones in face recognition. The proposed datasets are available at this http URL

Oracle bone script is the earliest-known Chinese writing system, dating to the Shang dynasty, and is precious to archeology and philology. However, real-world scanned oracle data are rare and few experts are available for annotation, which makes the automatic recognition of scanned oracle characters a challenging task. Therefore, we aim to explore unsupervised domain adaptation to transfer knowledge from handprinted oracle data, which are easy to acquire, to the scanned domain. We propose a structure-texture separation network (STSN), an end-to-end learning framework for joint disentanglement, transformation, adaptation and recognition. First, STSN disentangles features into structure (glyph) and texture (noise) components with generative models, and then aligns handprinted and scanned data in the structure feature space such that the negative influence caused by severe noise can be avoided during adaptation. Second, transformation is achieved by swapping the learned textures across domains, and a classifier is trained to predict the labels of the transformed scanned characters. This not only guarantees the absolute separation, but also enhances the discriminative ability of the learned features. Extensive experiments on the Oracle-241 dataset show that STSN outperforms other adaptation methods and successfully improves recognition performance on scanned data even when they are contaminated by long burial and careless excavation.

Efficiently utilizing discriminative features is crucial for convolutional neural networks to achieve remarkable performance in medical image segmentation, and is also important for model generalization across multiple domains, where enabling the model to distinguish domain-specific from domain-invariant information among multi-site datasets is a reasonable strategy for domain generalization. Unfortunately, most recent disentanglement networks are not directly applicable to unseen-domain datasets because of the limitations of the available data distributions. To tackle this deficiency, we propose the Contrastive Domain Disentangle (CDD) network for generalizable medical image segmentation. We first introduce a disentanglement network to decompose medical images into an anatomical representation factor and a modality representation factor. Then, a style contrastive loss is proposed to encourage modality representations from the same domain to distribute as closely as possible while different domains are estranged from each other. Finally, we propose a domain augmentation strategy that can randomly generate new domains for model generalization training. Experimental results on multi-site fundus image datasets for optic cup and disc segmentation show that CDD generalizes well. Our proposed CDD outperforms several state-of-the-art methods in domain-generalizable segmentation.

Strategies that include the generation of synthetic data are beginning to be viable, as obtaining real data can be logistically complicated, very expensive, or slow. Not only can the capture of the data lead to complications, but so can its annotation. To achieve high-fidelity data for training intelligent systems, we have built a 3D scenario and set-up that resembles reality as closely as possible. With our approach, it is possible to configure and vary parameters to add randomness to the scene and, in this way, allow variation in the data, which is so important in the construction of a dataset. Besides, the annotation task is already included in the data generation exercise, rather than being a post-capture task, which can save a lot of resources. We present the process and concept of synthetic data generation in an automotive context, specifically for driver and passenger monitoring purposes, as an alternative to real data capture.

In the 2-choice allocation problem, $m$ balls are placed into $n$ bins, and each ball must choose between two random bins $i, j \in [n]$ that it has been assigned to. It has been known for more than two decades that if each ball follows the Greedy strategy (i.e., always pick the less-full bin), then the maximum load will be $m/n + O(\log \log n)$ with high probability in $n$ (and $m / n + O(\log m)$ with high probability in $m$). It has remained open whether the same bounds hold in the dynamic version of the same game, where balls are inserted and deleted with up to $m$ balls present at a time. We show that these bounds do not hold in the dynamic setting: already on $4$ bins, there exists a sequence of insertions/deletions that causes Greedy to incur a maximum load of $m/4 + \Omega(\sqrt{m})$ with probability $\Omega(1)$ -- the same bound as if each ball were simply assigned to a random bin! This raises the question of whether any 2-choice allocation strategy can offer a strong bound in the dynamic setting. Our second result answers this question in the affirmative: we present a new strategy, called ModulatedGreedy, that guarantees a maximum load of $m / n + O(\log m)$, at any given moment, with high probability in $m$. Generalizing ModulatedGreedy, we obtain dynamic guarantees for the $(1 + \beta)$-choice setting, and for the setting of balls-and-bins on a graph. Finally, we consider a setting in which balls can be reinserted after they are deleted, and where the pair $i, j$ that a given ball uses is consistent across insertions. This seemingly small modification renders tight load balancing impossible: on 4 bins, any strategy that is oblivious to the specific identities of balls must allow for a maximum load of $m/4 + \mathrm{poly}(m)$ at some point in the first $\mathrm{poly}(m)$ insertions/deletions, with high probability in $m$.
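
The classic Greedy 2-choice rule discussed above can be sketched as a short simulation; note this only illustrates the insertion-only process, whereas the lower bound concerns adversarial insertion/deletion sequences.

```python
# Sketch: Greedy 2-choice insertion -- each ball samples two random bins and
# goes to the less-full one; we then measure the overload above the m/n average.

import random

def greedy_two_choice(m, n, rng):
    """Insert m balls into n bins with the Greedy 2-choice rule; return loads."""
    loads = [0] * n
    for _ in range(m):
        i, j = rng.randrange(n), rng.randrange(n)
        # Greedy: always place the ball in the less-full of the two sampled bins.
        loads[i if loads[i] <= loads[j] else j] += 1
    return loads

rng = random.Random(0)
loads = greedy_two_choice(4000, 4, rng)
gap = max(loads) - 4000 // 4  # overload above the average load m/n
```

In the static setting this gap stays logarithmic in m with high probability; the paper's first result shows that dynamic insertions/deletions can force a much larger polynomial gap out of the same rule.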

Knowledge graph embedding research has mainly focused on learning continuous representations of entities and relations tailored towards the link prediction problem. Recent results indicate an ever-increasing predictive ability of current approaches on benchmark datasets. However, this effectiveness often comes at the cost of over-parameterization and increased computational complexity. The former induces extensive hyperparameter optimization to mitigate malicious overfitting. The latter magnifies the importance of winning the hardware lottery. Here, we investigate a remedy for the first problem. We propose a technique based on Kronecker decomposition to reduce the number of parameters in a knowledge graph embedding model, while retaining its expressiveness. Through Kronecker decomposition, large embedding matrices are split into smaller embedding matrices during the training process. Hence, embeddings of knowledge graphs are not plainly retrieved but reconstructed on the fly. The decomposition ensures that the elementwise interactions between three embedding vectors are extended with interactions within each embedding vector. This implicitly reduces redundancy in embedding vectors and encourages feature reuse. To quantify the impact of applying Kronecker decomposition on embedding matrices, we conduct a series of experiments on benchmark datasets. Our experiments suggest that applying Kronecker decomposition on embedding matrices leads to improved parameter efficiency on all benchmark datasets. Moreover, empirical evidence suggests that reconstructed embeddings entail robustness against noise in the input knowledge graph. To foster reproducible research, we provide an open-source implementation of our approach, including training and evaluation scripts as well as pre-trained models, in our knowledge graph embedding framework (https://github.com/dice-group/dice-embeddings).
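
The on-the-fly reconstruction idea can be sketched with a Kronecker product of two small vectors standing in for an entity embedding: a d-dimensional vector is rebuilt from two factors of roughly sqrt(d) entries each. This toy example only illustrates the parameter saving, not how the paper integrates the decomposition into a scoring function.

```python
# Sketch: reconstructing a 16-dimensional embedding from two stored 4-vectors
# (8 parameters instead of 16) via the Kronecker product.

def kron(a, b):
    """Kronecker product of two vectors; returns a len(a) * len(b) vector."""
    return [x * y for x in a for y in b]

a = [0.1, 0.2, 0.3, 0.4]   # 4 stored parameters
b = [1.0, -1.0, 0.5, 2.0]  # 4 stored parameters
e = kron(a, b)             # 16-dimensional embedding reconstructed on the fly
```

Every entry of e is a product of one entry of a and one entry of b, which is the source of the within-vector interactions (and the implicit parameter tying) described above.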

Stress prediction in porous materials and structures is challenging due to the high computational cost associated with direct numerical simulations. Convolutional Neural Network (CNN) based architectures have recently been proposed as surrogates to approximate and extrapolate the solution of such multiscale simulations. These methodologies are usually limited to 2D problems due to the high computational cost of 3D voxel-based CNNs. We propose a novel geometric learning approach based on a Graph Neural Network (GNN) that efficiently deals with three-dimensional problems by performing convolutions over 2D surfaces only. Following our previous developments using pixel-based CNNs, we train the GNN to automatically add local fine-scale stress corrections to an inexpensively computed coarse stress prediction in the porous structure of interest. Our method is Bayesian and generates densities of stress fields, from which credible intervals may be extracted. As a second scientific contribution, we propose to improve the extrapolation ability of our network by deploying a strategy of online physics-based corrections. Specifically, we condition the posterior predictions of our probabilistic model to satisfy partial equilibrium at the microscale, at the inference stage. This is done using an Ensemble Kalman algorithm to ensure tractability of the Bayesian conditioning operation. We show that this methodology allows us to alleviate the effect of undesirable biases observed in the outputs of the uncorrected GNN, and improves the accuracy of the predictions in general.

This paper introduces a draft open standard for the robot equivalent of an aircraft flight data recorder, which we call an ethical black box. This is a device, or software module, capable of securely recording operational data (sensor, actuator and control decisions) for a social robot, in order to support the investigation of accidents or near-miss incidents. The open standard, presented as an annex to this paper, is offered as a first draft for discussion within the robot ethics community. Our intention is to publish further drafts following feedback, in the hope that the standard will become a useful reference for social robot designers, operators and robot accident/incident investigators.

Millimeter-wave radar systems are one of the core components of the safety-critical Advanced Driver Assistance System (ADAS) of a modern vehicle. Due to their ability to operate efficiently despite bad weather conditions and poor visibility, they are often the only reliable sensor a car has to detect and evaluate potential dangers in the surrounding environment. In this paper, we propose several attacks against automotive radars to assess their reliability in real-world scenarios. Using COTS hardware, we are able to successfully interfere with automotive-grade FMCW radars operating in the commonly used 77GHz frequency band, deployed in real-world, truly wireless environments. Our strongest type of interference is able to trick the victim into detecting virtual (moving) objects. We also extend this attack with a novel method that leverages noise to remove real-world objects, thus complementing the aforementioned object spoofing attack. We evaluate the viability of our attacks in two ways. First, we establish a baseline by implementing and evaluating an unrealistically powerful adversary which requires synchronization to the victim in a limited setup that uses wire-based chirp synchronization. Later, we implement, for the first time, a truly wireless attack that evaluates a weaker but realistic adversary which is non-synchronized and does not require any adjustment feedback from the victim. Finally, we provide theoretical fundamentals for our findings, and discuss the efficiency of potential countermeasures against the proposed attacks. We plan to release our software as open-source.

Recently, anomaly detection and localization in multimedia data have received significant attention among the machine learning community. In real-world applications such as medical diagnosis and industrial defect detection, anomalies are present in only a fraction of the images. To extend the reconstruction-based anomaly detection architecture to localized anomalies, we propose a self-supervised learning approach through random masking and then restoring, named Self-Supervised Masking (SSM), for unsupervised anomaly detection and localization. SSM not only enhances the training of the inpainting network but also leads to great improvement in the efficiency of mask prediction at inference. Through random masking, each image is augmented into a diverse set of training triplets, thus enabling the autoencoder to learn to reconstruct with masks of various sizes and shapes during training. To improve the efficiency and effectiveness of anomaly detection and localization at inference, we propose a novel progressive mask refinement approach that progressively uncovers the normal regions and finally locates the anomalous regions. The proposed SSM method outperforms several state-of-the-art methods in both anomaly detection and anomaly localization, achieving 98.3% AUC on Retinal-OCT and 93.9% AUC on MVTec AD.
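
The random-masking augmentation can be sketched as punching random rectangular holes into an image so an inpainting autoencoder must restore them. Hole counts and sizes below are illustrative assumptions; SSM's actual masking schedule differs.

```python
# Sketch: generate a random binary mask (1 keeps a pixel, 0 hides it) of the
# kind used to turn each training image into many masked inpainting targets.

import random

def random_mask(h, w, num_holes, max_size, rng):
    """Return an h x w mask with num_holes random rectangles zeroed out."""
    mask = [[1] * w for _ in range(h)]
    for _ in range(num_holes):
        hh, ww = rng.randint(1, max_size), rng.randint(1, max_size)
        top, left = rng.randrange(h - hh + 1), rng.randrange(w - ww + 1)
        for r in range(top, top + hh):
            for c in range(left, left + ww):
                mask[r][c] = 0
    return mask

rng = random.Random(0)
mask = random_mask(8, 8, num_holes=2, max_size=3, rng=rng)
hidden = sum(row.count(0) for row in mask)  # pixels the network must restore
```

Sampling fresh masks per image per epoch yields the diverse set of training triplets (image, mask, target) described above.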

Deep neural networks, as a powerful system for representing high-dimensional complex functions, play a key role in deep learning. Convergence of deep neural networks is a fundamental issue in building the mathematical foundation of deep learning. We investigated the convergence of deep ReLU networks and deep convolutional neural networks in two recent studies (arXiv:2107.12530, 2109.13542). Only the Rectified Linear Unit (ReLU) activation was studied therein, and the important pooling strategy was not considered. In the present work, we study the convergence of deep neural networks, as the depth tends to infinity, for two other important activation functions: the leaky ReLU and the sigmoid function. Pooling is also studied. As a result, we prove that the sufficient condition established in arXiv:2107.12530 and 2109.13542 is still sufficient for leaky ReLU networks. For contractive activation functions such as the sigmoid function, we establish a weaker sufficient condition for uniform convergence of deep neural networks.

Various powerful deep neural network architectures have made great contributions to the exciting successes of deep learning in the past two decades. Among them, deep Residual Networks (ResNets) are of particular importance because they demonstrated great usefulness in computer vision by winning first place in many deep learning competitions. Also, ResNets were the first class of neural networks in the development history of deep learning that were really deep. It is of mathematical interest and practical significance to understand the convergence of deep ResNets. We aim at characterizing the convergence of deep ResNets, as the depth tends to infinity, in terms of the parameters of the networks. Toward this purpose, we first give a matrix-vector description of general deep neural networks with shortcut connections and formulate an explicit expression for the networks by using the notions of activation domains and activation matrices. The convergence is then reduced to the convergence of two series involving infinite products of non-square matrices. By studying the two series, we establish a sufficient condition for pointwise convergence of ResNets. Our result is able to give justification for the design of ResNets. We also conduct experiments on benchmark machine learning data to verify our results.

Existing approaches to Question Answering over Knowledge Graphs (KGQA) have weak generalizability, often due to the standard i.i.d. assumption on the underlying dataset. Recently, three levels of generalization for KGQA were defined: i.i.d., compositional, and zero-shot. We analyze 25 well-known KGQA datasets over 5 different Knowledge Graphs (KGs). We show that, according to this definition, many existing and publicly available KGQA datasets are either not suited to training a generalizable KGQA system or based on discontinued and outdated KGs. Generating new datasets is a costly process and thus not an option for smaller research groups and companies. In this work, we propose a mitigation method that re-splits available KGQA datasets to make them applicable to evaluating generalization, without any cost or manual effort. We test our hypothesis on three KGQA datasets (LC-QuAD, LC-QuAD 2.0, and QALD-9). Experiments on the re-split KGQA datasets demonstrate the method's effectiveness towards generalizability. The code and a unified way to access 18 available datasets are online at https://github.com/semantic-systems/KGQA-datasets and https://github.com/semantic-systems/KGQA-datasets-generalization.

Real-time transient stability assessment (TSA) plays a critical role in the secure operation of the power system. Although the classic numerical integration method, \textit{i.e.,} time-domain simulation (TDS), has been widely used in industry practice, it inevitably incurs high computational complexity due to the high dimensionality and sophistication of the power system. In this work, a data-driven power system estimation method is proposed to quickly predict the stability of the power system before TDS reaches the end of the simulation time window, which reduces the average simulation time of stability assessment without loss of accuracy. As the topology of the power system takes the form of a graph, graph neural network based representation learning is naturally suitable for learning the status of the power system. Motivated by the observation that the distribution of active and reactive power over the power system's bus nodes is informative, we propose a distribution-aware learning~(DAL) module to extract an informative graph representation vector describing the status of a power system. TSA is then re-defined as a binary classification task, and the stability of the system is determined directly from the resulting graph representation without numerical integration. Finally, we apply our method to the online TSA task. Case studies on the IEEE 39-bus system and the Polish 2383-bus system demonstrate the effectiveness of the proposed method.

Today's social networks continuously generate massive streams of data, which provide a valuable starting point for the detection of rumours as soon as they start to propagate. However, rumour detection faces tight latency bounds, which cannot be met by contemporary algorithms, given the sheer volume of high-velocity streaming data emitted by social networks. Hence, in this paper, we argue for best-effort rumour detection that detects most rumours quickly rather than all rumours with a high delay. To this end, we combine techniques for efficient, graph-based matching of rumour patterns with effective load shedding that discards some of the input data while minimising the loss in accuracy. Experiments with large-scale real-world datasets illustrate the robustness of our approach in terms of runtime performance and detection accuracy under diverse streaming conditions.

We present a Hoare logic that extends program specifications with regular expressions capturing behaviors as sequences of events that arise during execution. The idea is similar to session types or process-like behavioral contracts, two currently popular research directions. The approach presented here strikes a particular balance between expressiveness and proof automation; notably, it can capture interesting sequential behavior across multiple iterations of loops. The approach is modular and integrates well with auto-active deductive verification tools. We describe and demonstrate our prototype implementation in SecC using two case studies: a matcher for e-mail addresses and a specification of the game steps in the VerifyThis Casino challenge.

Previous efforts to make Satisfiability (SAT) solving fit for high performance computing (HPC) have led to super-linear speedups on particular formulae, but for most inputs they cannot make efficient use of a large number of processors. Moreover, the long latencies (minutes to days) of job scheduling make large-scale SAT solving on demand impractical for most applications. We address both issues with Mallob, a framework for job scheduling in the context of SAT solving that exploits malleability, i.e., the ability to add or remove processing power from a job during its computation. Mallob includes a massively parallel, distributed, and malleable SAT solving engine based on Hordesat, with a more succinct and communication-efficient approach to clause sharing and numerous further improvements over its precursor. For example, Mallob on 640 cores outperforms an updated and improved configuration of Hordesat on 2560 cores. Moreover, Mallob can solve many formulae in parallel while dynamically adapting the assigned resources, and jobs arriving in the system are usually initiated within a fraction of a second.

Blind inpainting algorithms based on deep learning architectures have shown a remarkable performance in recent years, typically outperforming model-based methods both in terms of image quality and run time. However, neural network strategies typically lack a theoretical explanation, which contrasts with the well-understood theory underlying model-based methods. In this work, we leverage the advantages of both approaches by integrating theoretically founded concepts from transform domain methods and sparse approximations into a CNN-based approach for blind image inpainting. To this end, we present a novel strategy to learn convolutional kernels that applies a specifically designed filter dictionary whose elements are linearly combined with trainable weights. Numerical experiments demonstrate the competitiveness of this approach. Our results show not only an improved inpainting quality compared to conventional CNNs but also significantly faster network convergence within a lightweight network design.

Though achieving impressive results on many NLP tasks, BERT-like masked language models (MLMs) encounter a discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representations of pre-training and inference from the perspective of word probability distributions. We discover that BERT risks neglecting contextual word similarity during pre-training. To tackle this issue, we propose adding an auxiliary gloss regularizer module to BERT pre-training (GR-BERT) to enhance word semantic similarity. By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, word similarity can be explicitly modeled. We design two architectures for GR-BERT and evaluate our model on downstream tasks. Experimental results show that the gloss regularizer benefits BERT in both word-level and sentence-level semantic representation. GR-BERT achieves a new state of the art in the lexical substitution task and greatly improves BERT sentence representations in both unsupervised and supervised STS tasks.

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt which combines the document itself and "this article is talking about [MASK]." A masked language model can generate words for the [MASK] token, and the generated words, which summarize the content of a document, can be utilized as supervision signals. We propose a latent variable model to jointly learn a word distribution learner, which associates generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method outperforms baselines by 2%, 4%, and 3%, respectively.

Despite recent success in conditional image synthesis, prevalent input conditions such as semantics and edges are not expressive enough to convey `Linear (Ridges)' and `Planar (Scale)' representations. To address this problem, we propose a novel framework, StyLandGAN, which synthesizes desired landscape images using a depth map, which has higher expressive power. Our StyLandGAN is extended from an unconditional generation model to accept input conditions. We also propose a '2-phase inference' pipeline which generates diverse depth maps and shifts local parts so that it can easily reflect the user's intent. For comparison, we modified existing semantic image synthesis models to accept a depth map as well. Experimental results show that our method is superior to existing methods in quality, diversity, and depth accuracy.

This paper studies the synchronization of stochastic linear systems subject to a general class of noises, in the sense that the noises are bounded in covariance but may be correlated with the states of agents and among each other. We propose an event-based control protocol for achieving synchronization among agents in the mean-square sense and theoretically analyze its performance using a stochastic Lyapunov function, where the stability of $c$-martingales is developed specifically to handle the challenges brought by the general noise model and the event-triggering mechanism. The proposed event-based synchronization algorithm is then applied to the problem of distributed estimation in sensor networks. Specifically, by losslessly decomposing the optimal Kalman filter, it is shown that the problem of distributed estimation can be resolved using the algorithms designed for synchronization of stochastic linear systems. As such, an event-based distributed estimation algorithm is developed, in which each sensor performs local filtering solely using its own measurement, together with the proposed event-based synchronization algorithm to fuse the local estimates of neighboring nodes. With reduced communication frequency, the designed estimator is proved to be stable under minimal requirements of network connectivity and collective system observability.

Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of Neural Machine Translation models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with respect to automatic quality metrics in prior work, we show that it can fail to select the right set of output words, particularly for semantically non-compositional linguistic phenomena such as idiomatic expressions, leading to reduced translation quality as perceived by humans. Trading off latency for quality by increasing the size of the allowed set is often not an option in real-world scenarios. We propose a model of vocabulary selection, integrated into the neural translation model, that predicts the set of allowed output words from contextualized encoder representations. This restores translation quality of an unconstrained system, as measured by human evaluations on WMT newstest2020 and idiomatic expressions, at an inference latency competitive with alignment-based selection using aggressive thresholds, thereby removing the dependency on separately trained alignment models.

Matrix factorization, one of the most popular methods in machine learning, has recently benefited from introducing non-linearity into prediction tasks via the tropical semiring. The non-linearity enables a better fit to extreme values and distributions, thus discovering high-variance patterns that differ from those found by standard linear algebra. However, the optimization process of various tropical matrix factorization methods is slow. In our work, we propose a new method, FastSTMF, based on Sparse Tropical Matrix Factorization (STMF), which introduces a novel strategy for updating factor matrices that results in efficient computational performance. We evaluated the efficiency of FastSTMF on synthetic and real gene expression data from the TCGA database, and the results show that FastSTMF outperforms STMF in both accuracy and running time. Compared to NMF, we show that FastSTMF performs better on some datasets and is not as prone to overfitting as NMF. This work sets the basis for developing other matrix factorization techniques based on many other semirings using the newly proposed optimization process.
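
As a minimal illustration of the underlying algebra (not the paper's optimization procedure; the matrices and sizes here are made up), a tropical matrix product over the max-plus semiring replaces multiplication with addition and addition with maximum:

```python
import numpy as np

def maxplus_matmul(U, V):
    """Tropical (max-plus) matrix product: (U*V)[i,j] = max_k (U[i,k] + V[k,j])."""
    # Broadcast U[i,k] + V[k,j] over k, then take the maximum over k.
    return np.max(U[:, :, None] + V[None, :, :], axis=1)

rng = np.random.default_rng(1)
U = rng.uniform(0, 1, (5, 2))   # factor matrices of tropical rank 2
V = rng.uniform(0, 1, (2, 4))
X = maxplus_matmul(U, V)        # a rank-2 tropical product
print(X.shape)  # (5, 4)
```

A tropical factorization of a data matrix seeks factors $U$, $V$ such that this product approximates it; the max operation is what lets the model latch onto extreme values rather than averages.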

To tackle the rising phenomenon of hate speech, efforts have been made towards data curation and analysis. When it comes to the analysis of bias, previous work has focused predominantly on race. In our work, we further investigate bias in hate speech datasets along racial, gender, and intersectional axes. We identify strong bias against African American English (AAE), masculine, and AAE+masculine tweets, which are annotated as disproportionately more hateful and offensive than tweets from other demographics. We provide evidence that BERT-based models propagate this bias and show that balancing the training data for these protected attributes can lead to fairer models with regard to gender, but not race.

A spanning tree of a network or a graph is a subgraph connecting all the nodes with the minimum number of edges. A spanning tree retains the connectivity of a network, and possibly other structural properties, and is one of the simplest techniques for network simplification or sampling, and for revealing its backbone or skeleton. Prim's algorithm and Kruskal's algorithm are well-known algorithms for computing a spanning tree of a weighted network. In this paper, we study the performance of these algorithms on unweighted networks and compare them to different priority-first search algorithms. We show that the distances between the nodes and the diameter of a network are best preserved by an algorithm based on breadth-first search node traversal. The algorithm computes a spanning tree with the properties of a balanced tree and a power-law node degree distribution. We support our results by experiments on synthetic graphs and more than a thousand real networks, and demonstrate different practical applications of the computed spanning trees. We conclude that, if a spanning tree is supposed to retain the distances between the nodes or the diameter of an unweighted network, then the breadth-first search algorithm should be the preferred choice.
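
The breadth-first search spanning tree can be sketched in a few lines (a toy graph, not one of the paper's datasets); a BFS tree preserves shortest-path distances from the root exactly in an unweighted graph, which is the intuition behind the result above:

```python
from collections import deque

def bfs_spanning_tree(adj, root):
    """Return the tree edges of a BFS spanning tree rooted at `root`.

    In an unweighted graph, the path from the root to any node through
    these edges is a shortest path, so root distances are preserved.
    """
    parent = {root: None}
    queue = deque([root])
    edges = set()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                edges.add((u, v))
                queue.append(v)
    return edges

# Toy unweighted graph: a 4-cycle.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
tree = bfs_spanning_tree(adj, 0)
print(tree)  # 3 edges spanning 4 nodes
```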

Permutation decoding has gained recent interest as it can exploit the symmetries of a code in a parallel fashion. Moreover, it has been shown that by viewing permuted polar codes as polar subcodes, the set of usable permutations in permutation decoding can be increased. We extend this idea to pre-transformed polar codes, such as cyclic redundancy check (CRC)-aided polar codes, which previously could not be decoded using permutations due to their lack of automorphisms. Using belief propagation (BP)-based subdecoders, we demonstrate performance close to CRC-aided successive cancellation list (CA-SCL) decoding. The proposed algorithm outperforms the previously best-performing iterative CRC-aided belief propagation list (CA-BPL) decoder in both error-rate performance and decoding latency.

It is widely known how the human ability to cooperate has influenced the thriving of our species. However, as we move towards a hybrid human-machine future, it is still unclear how the introduction of AI agents in our social interactions will affect this cooperative capacity. Within the context of the one-shot collective risk dilemma, where enough members of a group must cooperate in order to avoid a collective disaster, we study the evolutionary dynamics of cooperation in a hybrid population made of both adaptive and fixed-behavior agents. Specifically, we show how the former learn to adapt their behavior to compensate for the behavior of the latter. The less the (artificially) fixed agents cooperate, the more the adaptive population is motivated to cooperate, and vice versa, especially when the risk is higher. By pinpointing how adaptive agents avoid their share of costly cooperation when the fixed-behavior agents implement a cooperative policy, our work hints at an unbalanced hybrid world. On the one hand, this means that introducing cooperative AI agents within our society might unburden human efforts. Nevertheless, costless artificial cooperation might not be realistic, and rather than deploying AI systems that carry the cooperative effort, we must focus on mechanisms that nudge shared cooperation among all members of the hybrid system.

Roadside units (RSUs), which have strong computing capability and are close to vehicle nodes, have been widely used to process the delay- and computation-intensive tasks of vehicle nodes. However, due to their high mobility, vehicles may drive out of the coverage of RSUs before receiving the task processing results. In this paper, we propose a mobile edge computing-assisted vehicular network, in which vehicles can offload their tasks to a nearby vehicle via a vehicle-to-vehicle (V2V) link or to a nearby RSU via a vehicle-to-infrastructure link. These tasks can also be migrated over a V2V link or an infrastructure-to-infrastructure (I2I) link to avoid the scenario where vehicles cannot receive the processed tasks from the RSUs. Considering mutual interference on the same link between offloading tasks and migrating tasks, we construct a vehicle offloading decision-based game to minimize the computation overhead. We prove that the game always achieves Nash equilibrium and convergence by exploiting the finite improvement property. We then propose a task migration (TM) algorithm that includes three task-processing methods and two task-migration methods. Based on the TM algorithm, a computation overhead minimization offloading (COMO) algorithm is presented. Extensive simulation results show that the proposed TM and COMO algorithms reduce the computation overhead and increase the success rate of task processing.

Lash is a higher-order automated theorem prover created as a fork of the theorem prover Satallax. The basic underlying calculus of Satallax is a ground tableau calculus whose rules use only shallow information about the terms and formulas taking part in the rule. Lash uses new, efficient C representations of vital structures and operations. Most importantly, Lash uses a C representation of (normal) terms with perfect sharing, along with a C implementation of normalizing substitutions. We describe the ways in which Lash differs from Satallax and the performance improvement of Lash over Satallax when used with analogous flag settings. With a 10s timeout, Lash outperforms Satallax on a collection of TH0 problems from the TPTP. We conclude with ideas for continuing the development of Lash.

Sensors embedded in mobile smart devices can monitor users' activity with high accuracy to provide a variety of services to end-users, ranging from precise geolocation and health monitoring to handwritten word recognition. However, this carries the risk of apps accessing, and potentially disclosing, sensitive information of individuals, which may lead to privacy breaches. In this paper, we aim to minimize privacy leakage that may lead to user identification on mobile devices through user tracking and distinguishability, while preserving the functionality of apps and services. We propose a privacy-preserving mechanism that effectively handles sensor data fluctuations (e.g., inconsistent sensor readings while walking, sitting, and running at different times) by formulating the data as time-series modeling and forecasting. The proposed mechanism also uses the notion of a correlated noise series to defend against noise filtering attacks, in which an adversary attempts to filter out the noise from the perturbed data to re-identify the original data. Unlike existing solutions, our mechanism runs in isolation, without interaction with a user or a service provider. We perform rigorous experiments on benchmark datasets and show that our proposed mechanism limits user tracking and distinguishability threats to a significant extent compared to the original data, while maintaining a reasonable level of utility. Overall, our obfuscation mechanism reduces the user trackability threat by 60\% across all the datasets while keeping the utility loss below 0.5 Mean Absolute Error (MAE). We also observe that our mechanism is more effective on large datasets; for example, with the Swipes dataset, the distinguishability risk is reduced by 60\% on average while the utility loss stays below 0.5 MAE.
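
The correlated-noise idea can be sketched as follows (a toy illustration with a synthetic signal and made-up parameters, not the paper's mechanism): an AR(1) noise process is temporally correlated, so simple smoothing cannot cleanly separate it from the signal, while the utility cost of the perturbation can be tracked with MAE:

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(500)
signal = np.sin(2 * np.pi * t / 50)        # stand-in for a sensor reading

# Correlated noise series via an AR(1) process: n_t = rho * n_{t-1} + e_t.
rho, scale = 0.9, 0.1
noise = np.zeros_like(signal)
e = rng.normal(0, scale, size=signal.size)
for i in range(1, signal.size):
    noise[i] = rho * noise[i - 1] + e[i]

perturbed = signal + noise
mae = np.mean(np.abs(perturbed - signal))
print(f"utility loss (MAE): {mae:.3f}")
```

With `rho` close to 1 the noise drifts slowly like a genuine sensor fluctuation; with `rho = 0` it degenerates to white noise, which a filtering adversary could remove far more easily.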

This paper describes the University of Maryland's submission to the Special Task on Formality Control for Spoken Language Translation at \iwslt, which evaluates translation from English into 6 languages with diverse grammatical formality markers. We investigate to what extent this problem can be addressed with a \textit{single multilingual model}, simultaneously controlling its output for target language and formality. Results show that this strategy can approach the translation quality and formality control achieved by dedicated translation models. However, the nature of the underlying pre-trained language model and of the finetuning samples greatly impact results.

We develop innovative algorithms for solving the strong-constraint formulation of four-dimensional variational data assimilation in large-scale applications. We present a space-time decomposition approach that employs domain decomposition along both the spatial and temporal directions in the overlapping case and involves partitioning of both the solution and the operators. Starting from the global functional defined on the entire domain, we obtain a type of regularized local functionals on the set of subdomains providing the order reduction of both the predictive and the data assimilation models. We analyze the algorithm convergence and its performance in terms of reduction of time complexity and algorithmic scalability. The numerical experiments are carried out on the shallow water equation on the sphere according to the setup available at the Ocean Synthesis/Reanalysis Directory provided by Hamburg University.

The present paper studies mathematical models for representing, imaging, and analyzing polycrystalline materials. We introduce various techniques for converting grain maps into diagram or tessellation representations that rely on constrained clustering. In particular, we show how to significantly accelerate the generalized balanced power diagram method from [1] and how to extend it to allow for optimization over all relevant parameters. A comparison of the accuracies of the proposed approaches is given based on a 3D real-world data set of $339\times 339 \times 599$ voxels.

Modern Automatic Speech Recognition (ASR) systems often use a portfolio of domain-specific models in order to get high accuracy for distinct user utterance types across different devices. In this paper, we propose an innovative approach that integrates the different per-domain per-device models into a unified model, using a combination of domain embedding, domain experts, mixture of experts and adversarial training. We run careful ablation studies to show the benefit of each of these innovations in contributing to the accuracy of the overall unified model. Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models, giving relative gains of up to 10% over a baseline model with negligible increase in the number of parameters.

In this paper, a new weighted first-order formulation is proposed for solving anisotropic diffusion equations with deep neural networks. For many numerical schemes, accurate approximation of the anisotropic heat flux is crucial for overall accuracy. In this work, the heat flux is first decomposed into two components along the two eigenvectors of the diffusion tensor, so the anisotropic heat flux approximation is converted into the approximation of two isotropic components. Moreover, to handle a possible jump of the diffusion tensor across an interface, the weighted first-order formulation is obtained by multiplying the first-order formulation by a weight function. By the decaying property of the weight function, the weighted first-order formulation is always well-defined pointwise. Finally, the weighted first-order formulation is solved with a deep neural network approximation. Compared to the neural network approximation of the original second-order elliptic formulation, the proposed method significantly improves accuracy, especially for discontinuous anisotropic diffusion problems.
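
The flux decomposition step can be illustrated numerically (the tensor and gradient below are made-up values): decomposing $q = -K\nabla u$ along the eigenvectors of $K$ yields two isotropic components that recombine exactly:

```python
import numpy as np

# Illustrative anisotropic diffusion tensor (symmetric positive definite).
K = np.array([[10.0, 3.0],
              [3.0, 2.0]])
grad_u = np.array([1.0, -2.0])          # gradient of the solution at a point

# Eigendecomposition: K = sum_i lam_i v_i v_i^T.
lam, V = np.linalg.eigh(K)

# Anisotropic heat flux q = -K grad(u), decomposed into two components
# along the eigenvector directions, each acting like isotropic diffusion.
q_components = [-lam[i] * (V[:, i] @ grad_u) * V[:, i] for i in range(2)]
q = sum(q_components)

assert np.allclose(q, -K @ grad_u)      # the components recombine exactly
print(q)
```

Each component involves only a scalar eigenvalue times a directional derivative, which is the sense in which the anisotropic approximation problem splits into two isotropic ones.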

In this paper, we rigorously analyze the energy, momentum, and magnetic moment behaviour of two splitting methods for solving charged-particle dynamics. Near-conservation of these invariants is established for the system under a constant magnetic field or a quadratic electric potential. Using backward error analysis, we derive the modified equations and modified invariants of the splitting methods, based on which the near-conservation over long times is proved. Numerical experiments are presented to demonstrate these long-time behaviours.

Federated Learning (FL) has recently been receiving increasing consideration from the cybersecurity community as a way to collaboratively train deep learning models with distributed profiles of cyberthreats, with no disclosure of training data. Nevertheless, the adoption of FL in cybersecurity is still in its infancy, and a range of practical aspects have not been properly addressed yet. Indeed, the Federated Averaging algorithm at the core of the FL concept requires the availability of test data to control the FL process. Although this might be feasible in some domains, test network traffic of newly discovered attacks cannot always be shared without disclosing sensitive information. In this paper, we address the convergence of the FL process in dynamic cybersecurity scenarios, where the trained model must be frequently updated with new attack profiles to empower all members of the federation with the latest detection features. To this aim, we propose FLAD (adaptive Federated Learning Approach to DDoS attack detection), an FL solution for cybersecurity applications based on an adaptive mechanism that orchestrates the FL process by dynamically assigning more computation to those members whose attack profiles are harder to learn, without the need to share any test data to monitor the performance of the trained model. Using a recent dataset of DDoS attacks, we demonstrate that FLAD outperforms the original FL algorithm in terms of convergence time and accuracy across a range of unbalanced datasets of heterogeneous DDoS attacks. We also show the robustness of our approach in a realistic scenario in which we retrain the deep learning model multiple times to introduce the profiles of new attacks into a pre-trained model.

We propose a new approach for unsupervised learning of hyperelastic constitutive laws with physics-consistent deep neural networks. In contrast to supervised learning, which assumes the availability of stress-strain pairs, the approach uses only realistically measurable full-field displacement and global reaction force data; it thus lies within the scope of our recent framework for Efficient Unsupervised Constitutive Law Identification and Discovery (EUCLID), and we denote it as NN-EUCLID. The absence of stress labels is compensated for by a physics-motivated loss function based on the conservation of linear momentum that guides the learning process. The constitutive model is based on input-convex neural networks, which are capable of learning a function that is convex with respect to its inputs. By employing a specially designed neural network architecture, multiple physical and thermodynamic constraints for hyperelastic constitutive laws, such as material frame indifference, (poly-)convexity, and a stress-free reference configuration, are automatically satisfied. We demonstrate the ability of the approach to accurately learn several hidden isotropic and anisotropic hyperelastic constitutive laws, including, e.g., the Mooney-Rivlin, Arruda-Boyce, Ogden, and Holzapfel models, without using stress data. For anisotropic hyperelasticity, the unknown anisotropic fiber directions are automatically discovered jointly with the constitutive model. The neural network-based constitutive models show good generalization capability beyond the strain states observed during training and are readily deployable in a general finite element framework for simulating complex mechanical boundary value problems with good accuracy.
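
The input-convexity mechanism can be sketched generically (this is a toy input-convex network with random weights, not the NN-EUCLID architecture): nonnegative hidden-to-hidden weights combined with convex, nondecreasing activations guarantee that the scalar output is convex in the input:

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus, a convex and nondecreasing activation.
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0)

rng = np.random.default_rng(3)
W0 = rng.standard_normal((8, 2))            # input layer: unconstrained
Wz = np.abs(rng.standard_normal((8, 8)))    # hidden-to-hidden: nonnegative
Wx = rng.standard_normal((8, 2))            # passthrough from input
w_out = np.abs(rng.standard_normal(8))      # output weights: nonnegative

def icnn(x):
    """Scalar output convex in x: nonnegative weights compose convex,
    nondecreasing activations, which preserves convexity layer by layer."""
    z = softplus(W0 @ x)
    z = softplus(Wz @ z + Wx @ x)
    return w_out @ z

# Midpoint convexity check at random points.
x, y = rng.standard_normal(2), rng.standard_normal(2)
mid = icnn(0.5 * (x + y))
assert mid <= 0.5 * (icnn(x) + icnn(y)) + 1e-12
```

In the constitutive setting, convexity of the learned energy with respect to its inputs is what makes constraints such as (poly-)convexity enforceable by architecture rather than by penalty.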

We present fully abstract encodings of the call-by-name and call-by-value $\lambda$-calculus into HOcore, a minimal higher-order process calculus with no name restriction. We consider several equivalences on the $\lambda$-calculus side -- normal-form bisimilarity, applicative bisimilarity, and contextual equivalence -- that we internalize into abstract machines in order to prove full abstraction of the encodings. We also demonstrate that this technique scales to the $\lambda\mu$-calculus, i.e., a standard extension of the $\lambda$-calculus with control operators.

In under a year, the European Commission has launched some of the most important regulatory proposals to date on platform governance. The Commission's goals behind this cross-sectoral regulation include the protection of markets and democracies alike. While all these acts propose sophisticated rules for setting up new enforcement institutions and procedures, one aspect remains highly unclear: how digital enforcement will actually take place in practice. Focusing on the Digital Services Act (DSA), this discussion paper critically addresses issues around social media data access for the purpose of digital enforcement and proposes the use of a legal compliance application programming interface (API) as a means to facilitate compliance with the DSA and complementary European and national regulation. To contextualize this discussion, the paper pursues two scenarios that exemplify the harms arising out of content monetization affecting a particularly vulnerable category of social media users: children. The two scenarios are used to further reflect upon essential issues surrounding data access and legal compliance with the DSA and further applicable legal standards in the fields of labour and consumer law.

In this work, a 2D contour generation algorithm is proposed for irregular regions. The contour of the physical domain is approximated by mesh segments using the known coordinates of the contour. The algorithm iterates over the known irregular contour coordinates: it computes the slope of the line defined by each known contour point and the neighboring vertices, calculates the points of that line and their distances to the closest known mesh nodes, and thereby obtains the points of the approximate contour. This process is repeated until the complete approximate contour is obtained. The resulting algorithm, which builds an approximate contour from known mesh nodes, is suitable for describing meshes involving geometries with irregular contours and for finite-difference computations in numerical simulations. The contour is evaluated on three geometries in terms of the difference between the areas delimited by the given contour and the approximate contour, the number of nodes, and the number of internal points. Increasing geometric complexity requires a larger number of contour nodes, producing more refined meshes that reach area differences below 2%.
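
A minimal sketch of the snap-to-mesh idea (using a circle and a uniform grid as stand-ins for an irregular contour; the spacing is illustrative, and this is a simplification of the algorithm described above): each known contour point is replaced by its nearest mesh node, and the area difference is measured with the shoelace formula:

```python
import numpy as np

def shoelace_area(pts):
    """Polygon area from vertex coordinates via the shoelace formula."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

h = 0.05                                   # mesh spacing
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
contour = np.column_stack([np.cos(theta), np.sin(theta)])  # unit circle

# Snap each known contour point to its closest mesh node, then drop
# consecutive duplicates to obtain the approximate polygonal contour.
snapped = np.round(contour / h) * h
keep = np.any(snapped != np.roll(snapped, 1, axis=0), axis=1)
approx = snapped[keep]

exact = np.pi
diff_pct = 100 * abs(shoelace_area(approx) - exact) / exact
print(f"area difference: {diff_pct:.2f}%")
```

Refining the mesh (smaller `h`) shrinks the area difference, mirroring the observation that more complex geometries need more contour nodes and finer meshes to stay below a given area-difference threshold.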

We revisit the problem of determining the independent domination number of hypercubes, for which the known upper bound is still not tight for general dimensions. We present a constructive method to build an independent dominating set $S_n$ for the $n$-dimensional hypercube $Q_n$, where $n=2p+1$ for a positive integer $p$, provided an independent dominating set $S_p$ for the $p$-dimensional hypercube $Q_p$ is known. The procedure also yields the minimum independent dominating set for all $n=2^k-1$, $k>1$. Finally, we establish that the independent domination number $\alpha_n\leq 3 \times 2^{n-k-2}$ for $7\times 2^{k-2}-1\leq n<2^{k+1}-1$, $k>1$, an improved upper bound for this range compared to earlier work.
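
For small dimensions such quantities can be checked by brute force. The sketch below (an illustration only, not the paper's construction) enumerates vertex subsets of $Q_3$, encoded as 3-bit integers, and confirms that its independent domination number is 2:

```python
from itertools import combinations

n = 3                      # dimension of the hypercube Q_n
V = range(1 << n)          # vertices encoded as n-bit integers

def dominates(S):
    # every vertex must lie in the closed neighbourhood of S
    covered = set()
    for s in S:
        covered.add(s)
        covered.update(s ^ (1 << i) for i in range(n))
    return len(covered) == 1 << n

def independent(S):
    # no two chosen vertices may be adjacent (Hamming distance 1)
    return all(bin(u ^ v).count("1") > 1 for u, v in combinations(S, 2))

i_n = next(k for k in range(1, (1 << n) + 1)
           if any(independent(S) and dominates(S) for S in combinations(V, k)))
print(i_n)  # 2, attained e.g. by S = {000, 111}
```

This agrees with the $n=2^k-1$ case above: for $n=3$ the set $\{000, 111\}$ is a perfect code and hence a minimum independent dominating set.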

Classical multiple instance learning (MIL) methods are often based on the assumption that instances are independent and identically distributed (i.i.d.), hence neglecting the potentially rich contextual information beyond individual entities. On the other hand, Transformers with global self-attention modules have been proposed to model the interdependencies among all instances. However, in this paper we question: Is global relation modeling using self-attention necessary, or can we appropriately restrict self-attention calculations to local regimes in large-scale whole slide images (WSIs)? We propose a general-purpose local attention graph-based Transformer for MIL (LA-MIL), introducing an inductive bias by explicitly contextualizing instances in adaptive local regimes of arbitrary size. Additionally, an efficiently adapted loss function enables our approach to learn expressive WSI embeddings for the joint analysis of multiple biomarkers. We demonstrate that LA-MIL achieves state-of-the-art results in mutation prediction for gastrointestinal cancer, outperforming existing models on important biomarkers such as microsatellite instability for colorectal cancer. This suggests that local self-attention sufficiently models dependencies on par with global modules. Our implementation will be published.

Existing protocols for multilateral negotiation require a full consensus among the negotiating parties. In contrast, we propose a protocol for multilateral negotiation that allows partial consensus, wherein only a subset of the negotiating parties can reach an agreement. We motivate problems that require such a protocol and describe the protocol formally.

AI-synthesized faces are visually challenging to discern from real ones. They have been used as profile images for fake social media accounts, leading to significant negative social impact. Although progress has been made in developing automatic methods to detect AI-synthesized faces, there is no open platform for studying how well humans detect them. In this work, we develop an online platform called Open-eye to study human performance in AI-synthesized face detection, and we describe its design and workflow in this paper.

Deepfakes are a form of synthetic image generation used to create fake videos of individuals for malicious purposes. The resulting videos may be used to spread misinformation, reduce trust in media, or as a form of blackmail. These threats necessitate automated methods of deepfake video detection. This paper investigates whether temporal information can improve the deepfake detection performance of deep learning models. To investigate this, we propose a framework that classifies new and existing approaches by two defining characteristics: the type of feature extraction (automatic or manual) and the temporal relationship between frames (dependent or independent). We apply this framework to investigate the effect of temporal dependency on a model's deepfake detection performance. We find that temporal dependency produces a statistically significant (p < 0.05) increase in performance in classifying real images for the model using automatic feature extraction, demonstrating that spatio-temporal information can improve the performance of deepfake video detection models.

The Sinkhorn operator has recently experienced a surge of popularity in computer vision and related fields. One major reason is its ease of integration into deep learning frameworks. To allow for an efficient training of respective neural networks, we propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation. In comparison to prior work, our framework is based on the most general formulation of the Sinkhorn operator. It allows for any type of loss function, while both the target capacities and cost matrices are differentiated jointly. We further construct error bounds of the resulting algorithm for approximate inputs. Finally, we demonstrate that for a number of applications, simply replacing automatic differentiation with our algorithm directly improves the stability and accuracy of the obtained gradients. Moreover, we show that it is computationally more efficient, particularly when resources like GPU memory are scarce.
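
For background, the forward Sinkhorn operator itself can be sketched in a few lines of NumPy (an illustration only; the paper's contribution, the analytical gradient via implicit differentiation, is not shown here): starting from the kernel $K=\exp(-C/\varepsilon)$, the iterations alternately rescale rows and columns until the transport plan matches the prescribed marginals.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=2000):
    """Sinkhorn iterations: alternately rescale the rows and columns of
    K = exp(-C/eps) until the transport plan has marginals a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # match the column marginals
        u = a / (K @ v)     # match the row marginals
    return u[:, None] * K * v[None, :]

# Toy problem: couple two uniform marginals under a random cost matrix.
rng = np.random.default_rng(0)
C = rng.random((4, 5))
a, b = np.full(4, 1 / 4), np.full(5, 1 / 5)
P = sinkhorn(C, a, b)
print(P.sum(axis=1), P.sum(axis=0))  # marginals a and b, up to tolerance
```

Differentiating through these iterations by automatic differentiation stores every intermediate step, which is exactly the cost that implicit differentiation of the converged fixed point avoids.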

We present the first shared task on semantic change discovery and detection in Spanish, and create the first dataset of Spanish words manually annotated for semantic change using the DURel framework (Schlechtweg et al., 2018). The task is divided into two phases: 1) Graded Change Discovery, and 2) Binary Change Detection. In addition to introducing a new language, the main novelty with respect to previous tasks consists in predicting and evaluating changes for all vocabulary words in the corpus. Six teams participated in phase 1 and seven teams in phase 2 of the shared task; the best system obtained a Spearman rank correlation of 0.735 for phase 1 and an F1 score of 0.716 for phase 2. We describe the systems developed by the competing teams, highlighting the techniques that were particularly useful, and discuss the limits of these approaches.

A Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clustering, termed DRBM-ClustNet, is proposed. Its core clustering engine consists of a Deep Restricted Boltzmann Machine (DRBM) that processes unlabeled data by creating new features that are mutually uncorrelated and have large variance. Next, the number of clusters is predicted using the Bayesian Information Criterion (BIC), followed by a Kohonen network-based clustering layer. The processing of unlabeled data is done in three stages for efficient clustering of non-linearly separable datasets. In the first stage, the DRBM performs non-linear feature extraction, capturing highly complex data representations by projecting feature vectors of $d$ dimensions into $n$ dimensions. Since most clustering algorithms require the number of clusters to be decided a priori, in the second stage we use the BIC to automate this choice. In the third stage, the number of clusters derived from the BIC forms the input for the Kohonen network, which clusters the feature-extracted data obtained from the DRBM. This method overcomes common drawbacks of clustering algorithms, such as prior specification of the number of clusters, convergence to local optima, and poor clustering accuracy on non-linear datasets. We analyze DRBM-ClustNet on two synthetic datasets, fifteen benchmark datasets from the UCI Machine Learning Repository, and four image datasets. The proposed framework is evaluated on clustering accuracy and ranked against other state-of-the-art clustering methods. The obtained results demonstrate that DRBM-ClustNet outperforms state-of-the-art clustering algorithms.
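
The BIC-based model selection used in the second stage can be illustrated in isolation. The sketch below is a hypothetical stand-in, not the DRBM-ClustNet pipeline: it fits one-dimensional Gaussian mixtures of increasing size with EM and selects the number of components minimizing the Bayesian Information Criterion:

```python
import numpy as np

def gmm_bic(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture with EM and return its BIC."""
    n = len(x)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # deterministic init
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log responsibilities, normalised with log-sum-exp
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1)[:, None])
        # M-step: update weights, means and variances
        nk = r.sum(axis=0)
        w, mu = nk / n, (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
             - 0.5 * (x[:, None] - mu) ** 2 / var)
    loglik = np.logaddexp.reduce(log_p, axis=1).sum()
    return (3 * k - 1) * np.log(n) - 2 * loglik  # k means, k variances, k-1 weights

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(c, 0.5, 100) for c in (0.0, 10.0, 20.0)])
best_k = min(range(1, 6), key=lambda k: gmm_bic(x, k))
print(best_k)
```

On well-separated data the BIC penalty outweighs the marginal likelihood gain of extra components, so the true cluster count is recovered.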

Existing knowledge distillation methods mostly focus on distillation of the teacher's prediction and intermediate activations. However, the structured representation, which arguably is one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel {\em \modelname{}} ({\bf\em \shortname{}}) method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student. The key idea is to leverage the teacher's classifier as a semantic critic for evaluating the representations of both teacher and student, and to distill the semantic knowledge with high-order structured information over all feature dimensions. This is accomplished by introducing a notion of cross-network logit, computed by passing the student's representation into the teacher's classifier. Further, considering the set of seen classes as a basis for the semantic space in a combinatorial perspective, we scale \shortname{} to unseen classes, enabling effective exploitation of largely available, arbitrary unlabeled training data. At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL). Extensive experiments show that our \shortname{} significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks, as well as on the less studied yet practically crucial task of binary network distillation. Under the more realistic open-set SSL settings we introduce, we reveal that knowledge distillation is generally more effective than existing Out-Of-Distribution (OOD) sample detection, and our proposed \shortname{} is superior to both previous distillation and SSL competitors. The source code is available at \url{https://github.com/jingyang2017/SRD\_ossl}.

During the past decade, neural network models have made tremendous progress on in-domain semantic role labeling (SRL). However, performance drops dramatically under the out-of-domain setting. In order to facilitate research on cross-domain SRL, this paper presents MuCPAD, a multi-domain Chinese predicate-argument dataset, which consists of 30,897 sentences and 92,051 predicates from six different domains. MuCPAD exhibits three important features. 1) Based on a frame-free annotation methodology, we avoid writing complex frames for new predicates. 2) We explicitly annotate omitted core arguments to recover more complete semantic structure, considering that omission of content words is ubiquitous in multi-domain Chinese texts. 3) We compile 53 pages of annotation guidelines and adopt strict double annotation for improving data quality. This paper describes in detail the annotation methodology and annotation process of MuCPAD, and presents in-depth data analysis. We also give benchmark results on cross-domain SRL based on MuCPAD.

We consider physics-informed neural networks (PINNs) [Raissi et al., J. Comput. Phys. 378 (2019) 686-707] for forward physical problems. In order to find an optimal PINN configuration, we introduce a hyper-parameter tuning procedure via Gaussian process-based Bayesian optimization. We apply the procedure to Helmholtz problems on bounded domains and conduct a thorough study, focusing on: (i) performance, (ii) the collocation point density $r$, and (iii) the frequency $\kappa$, confirming the applicability and necessity of the method. Numerical experiments are performed in two and three dimensions, including comparisons to finite element methods.

We characterize the capacity for the discrete-time arbitrarily varying channel with discrete inputs, outputs, and states when (a) the encoder and decoder do not share common randomness, (b) the input and state are subject to cost constraints, (c) the transition matrix of the channel is deterministic given the state, and (d) at each time step the adversary can only observe the current and past channel inputs when choosing the state at that time. The achievable strategy involves stochastic encoding together with list decoding and a disambiguation step. The converse uses a two-phase "babble-and-push" strategy where the adversary chooses the state randomly in the first phase, list decodes the output, and then chooses state inputs to symmetrize the channel in the second phase. These results generalize prior work on specific channel models (additive, erasure) to general discrete alphabets and models.

Robotic cloth manipulation is challenging due to its deformability, which makes determining its full state infeasible. However, for cloth folding, it suffices to know the position of a few semantic keypoints. Convolutional neural networks (CNN) can be used to detect these keypoints, but require large amounts of annotated data, which is expensive to collect. To overcome this, we propose to learn these keypoint detectors purely from synthetic data, enabling low-cost data collection. In this paper, we procedurally generate images of towels and use them to train a CNN. We evaluate the performance of this detector for folding towels on a unimanual robot setup and find that the grasp and fold success rates are 77% and 53%, respectively. We conclude that learning keypoint detectors from synthetic data for cloth folding and related tasks is a promising research direction, discuss some failures and relate them to future work. A video of the system, as well as the codebase, more details on the CNN architecture and the training setup can be found at https://github.com/tlpss/workshop-icra-2022-cloth-keypoints.git.

A fundamental problem in the field of unsupervised machine learning is the detection of anomalies corresponding to rare and unusual observations of interest, whether for their rejection, accommodation or further investigation. Anomalies are intuitively understood to be something unusual or inconsistent, whose occurrence sparks immediate attention. More formally, anomalies are those observations, under appropriate random variable modelling, whose expectation of occurrence with respect to a grouping of prior interest is less than one; this definition and understanding have been used to develop the parameter-free perception anomaly detection algorithm. The present work seeks to establish important and practical connections between the approach used by the perception algorithm and prior decades of research in neurophysiology and computational neuroscience, particularly that of information processing in the retina and visual cortex. The algorithm is conceptualised as a neuron model which forms the kernel of an unsupervised neural network that learns to signal unexpected observations as anomalies. Both the network and the neuron display properties observed in biological processes, including immediate intelligence, parallel processing, redundancy, global degradation, contrast invariance, parameter-free computation, dynamic thresholds and non-linear processing. A robust and accurate model for anomaly detection in univariate and multivariate data is built using this network as a concrete application.
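
The expectation-based reading of anomalies can be made concrete with a toy example (an illustration of the definition only, not the perception algorithm itself): a value is flagged when its expected number of occurrences in a reference group falls below one.

```python
from collections import Counter

# 50 observations: "c" occurs once, the other values are common.
observations = ["a"] * 40 + ["b"] * 9 + ["c"]
counts = Counter(observations)
n = len(observations)

# Expected number of occurrences of each value within a reference group
# of prior interest of size n_group; values expected to occur less than
# once in that group are flagged as anomalies.
n_group = 20
anomalies = [v for v, c in counts.items() if n_group * c / n < 1.0]
print(anomalies)  # ['c']: expected 20 * 1/50 = 0.4 < 1 occurrences
```

Note that no threshold parameter is tuned here; the cutoff of one expected occurrence follows directly from the definition, which is what makes the approach parameter-free.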

We present equivalent conditions and asymptotic models for the diffraction problem of elastic and acoustic waves in a solid medium surrounded by a thin layer of fluid medium. Due to the thinness of the layer with respect to the wavelength, this problem is well suited to the notion of equivalent conditions, and the effect of the fluid medium on the solid is, as a first approximation, local. We derive and validate equivalent conditions up to the fourth order for the elastic displacement. These conditions approximate the acoustic waves which propagate in the fluid region, so that only the elastic equations need to be solved. The construction of the equivalent conditions is based on a multiscale expansion in power series of the thickness of the layer for the solution of the transmission problem.

Analyzing third-party software such as malware or firmware is a crucial task for security analysts. Although various approaches for automatic analysis exist and are the subject of ongoing research, analysts often have to resort to manual static analysis to get a deep understanding of a given binary sample. Since the source code of encountered samples is rarely available, analysts regularly employ decompilers for easier and faster comprehension than analyzing a binary's disassembly. In this paper, we introduce our decompilation approach dewolf. We developed a variety of improvements over the previous academic state-of-the-art decompiler and some novel algorithms to enhance readability and comprehension, focusing on manual analysis. To evaluate our approach and to obtain a better insight into the analysts' needs, we conducted three user surveys. The results indicate that dewolf is suitable for malware comprehension and that its output quality noticeably exceeds Ghidra and Hex-Rays in certain aspects. Furthermore, our results imply that decompilers aiming at manual analysis should be highly configurable to respect individual user preferences. Additionally, future decompilers should not necessarily follow the unwritten rule to stick to the code-structure dictated by the assembly in order to produce readable output. In fact, the few cases where dewolf already cracks this rule lead to its results considerably exceeding other decompilers. We publish a prototype implementation of dewolf and all survey results on GitHub.

We study a pitfall in the typical workflow for differentially private machine learning. The use of differentially private learning algorithms in a "drop-in" fashion -- without accounting for the impact of differential privacy (DP) noise when choosing what feature engineering operations to use, what features to select, or what neural network architecture to use -- yields overly complex and poorly performing models. In other words, by anticipating the impact of DP noise, a simpler and more accurate alternative model could have been trained for the same privacy guarantee. We systematically study this phenomenon through theory and experiments. On the theory front, we provide an explanatory framework and prove that the phenomenon arises naturally from the addition of noise to satisfy differential privacy. On the experimental front, we demonstrate how the phenomenon manifests in practice using various datasets, types of models, tasks, and neural network architectures. We also analyze the factors that contribute to the problem and distill our experimental insights into concrete takeaways that practitioners can follow when training models with differential privacy. Finally, we propose privacy-aware algorithms for feature selection and neural network architecture search. We analyze their differential privacy properties and evaluate them empirically.

Video frame interpolation involves the synthesis of new frames from existing ones. Convolutional neural networks (CNNs) have been at the forefront of recent advances in this field. One popular CNN-based approach applies generated kernels to the input frames to obtain an interpolated frame. Despite the benefits such interpolation methods offer, many of these networks require a large number of parameters, and more parameters mean a heavier computational burden. Reducing the size of a model typically degrades its performance. This paper presents a parameter reduction method for a popular flow-less kernel-based network (Adaptive Collaboration of Flows). By removing the layers that require the most parameters and replacing them with smaller encoders, we reduce the number of parameters of the network and even achieve better performance than the original method. This is achieved by applying rotation to force each individual encoder to learn different features from the input images. Ablations are conducted to justify our design choices, and we present an evaluation of how our method performs on full-length videos.

Planning the defossilization of energy systems by facilitating high penetration of renewables and maintaining access to abundant and affordable primary energy resources is a nontrivial multi-objective problem encompassing economic, technical, environmental, and social aspects. However, so far, most long-term policies to decrease the carbon footprint of our societies consider the cost of the system as the leading indicator in energy system models. To address this gap, we developed a new approach by adding the energy return on investment (EROI) to a whole-energy system model, and we built the corresponding database with the EROI of all technologies and resources considered. This novel model is applied to the Belgian energy system in 2035 for several greenhouse gas emissions targets. However, moving away from fossil-based to carbon-neutral energy systems raises the issue of the uncertainty of low-carbon technologies and resource data. Thus, we conduct a global sensitivity analysis to identify the main parameters driving the variations in the EROI of the system. In this case study, the main results are threefold: (i) the EROI of the system decreases from 8.9 to 3.9 when greenhouse gas emissions are reduced by 5; (ii) the renewable fuels - mainly imported renewable gas - represent the largest share of the system primary energy mix due to the lack of endogenous renewable resources such as wind and solar; (iii) in the sensitivity analysis, the renewable fuels drive 67% of the variation of the EROI of the system for low greenhouse gas emissions scenarios. The decrease in the EROI of the system raises questions about meeting the climate targets without adverse socio-economic impact. Thus, accounting for criteria beyond cost in energy planning models is essential to guide policy-makers in addressing the challenges of the energy transition.

Federated learning systems facilitate training of global models in settings where potentially heterogeneous data is distributed across a large number of clients. Such systems operate in settings with intermittent client availability and/or time-varying communication constraints. As a result, the global models trained by federated learning systems may be biased towards clients with higher availability. We propose F3AST, an unbiased algorithm that dynamically learns an availability-dependent client selection strategy which asymptotically minimizes the impact of client-sampling variance on the global model's convergence, enhancing the performance of federated learning. The proposed algorithm is tested in a variety of settings with intermittently available clients under communication constraints, and its efficacy is demonstrated on synthetic data and realistic federated benchmark experiments using the CIFAR100 and Shakespeare datasets. We show up to 186% and 8% accuracy improvements over FedAvg, and 8% and 7% over FedAdam, on CIFAR100 and Shakespeare, respectively.
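
The availability bias that motivates this line of work, together with the classical inverse-probability correction of it, can be simulated with a toy scalar model (an illustration of the problem setting only, not the F3AST algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_rounds = 50, 4000
values = rng.normal(size=n_clients)                      # scalar stand-ins for client updates
avail = np.clip(0.5 + 0.4 * np.tanh(values), 0.1, 0.9)   # availability correlated with value

naive, weighted = [], []
for _ in range(n_rounds):
    up = rng.random(n_clients) < avail                   # clients present this round
    if up.any():
        naive.append(values[up].mean())                  # biased towards available clients
        weighted.append((values[up] / avail[up]).sum() / n_clients)  # inverse-probability weighting

true_mean = values.mean()
print(abs(np.mean(naive) - true_mean), abs(np.mean(weighted) - true_mean))
```

When availability correlates with a client's data, the naive average drifts away from the population mean while the inverse-probability estimator remains unbiased; learning a good selection strategy under such dynamics is the harder problem the paper addresses.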

The quasi-static multiple network poroelastic theory (MPET) model, first introduced in the context of geomechanics, has recently found new applications in medicine. In practice, the parameters in the MPET equations can vary over several orders of magnitude which makes their stable discretization and fast solution a challenging task. Here, a new efficient parameter-robust hybridized discontinuous Galerkin method, which also features fluid mass conservation, is proposed for the MPET model. Its stability analysis which is crucial for the well-posedness of the discrete problem is performed and cost-efficient fast parameter-robust preconditioners are derived. We present a series of numerical computations for a 4-network MPET model of a human brain which support the performance of the new algorithms.

State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require reasoning over numbers. Recent work sees two main reasons for this: (1) popular tokenisation algorithms are optimized for common words, and therefore have limited expressiveness for numbers, and (2) common pretraining objectives do not target numerical reasoning or understanding numbers at all. Recent approaches usually address them separately and mostly by proposing architectural changes or pretraining models from scratch. In this paper, we propose a new extended pretraining approach called reasoning-aware pretraining to jointly address both shortcomings without requiring architectural changes or pretraining from scratch. Using contrastive learning, our approach incorporates an alternative number representation into an already pretrained model, while improving its numerical reasoning skills by training on a novel pretraining objective called inferable number prediction task. We evaluate our approach on three different tasks that require numerical reasoning, including (a) reading comprehension in the DROP dataset, (b) inference-on-tables in the InfoTabs dataset, and (c) table-to-text generation in WikiBio and SciGen datasets. Our results on DROP and InfoTabs show that our approach improves the accuracy by 9.6 and 33.9 points on these datasets, respectively. Our human evaluation on SciGen and WikiBio shows that our approach improves the factual correctness on all datasets.

A matrix $A$ is said to have the $\ell_p$-Restricted Isometry Property ($\ell_p$-RIP) if for all vectors $x$ of up to some sparsity $k$, $\|Ax\|_p$ is roughly proportional to $\|x\|_p$. It is known that with high probability, random dense $m\times n$ matrices (e.g., with i.i.d.\ $\pm 1$ entries) are $\ell_2$-RIP with $k \approx m/\log n$, and sparse random matrices are $\ell_p$-RIP for $p \in [1,2)$ when $k, m = \Theta(n)$. However, when $m = \Theta(n)$, sparse random matrices are known to \emph{not} be $\ell_2$-RIP with high probability. With this backdrop, we show that there are no sparse matrices with $\pm 1$ entries that are $\ell_2$-RIP. On the other hand, for $p \neq 2$, we show that any $\ell_p$-RIP matrix \emph{must} be sparse. Thus, sparsity is incompatible with $\ell_2$-RIP, but necessary for $\ell_p$-RIP for $p \neq 2$.
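
The $\ell_2$ near-isometry for dense random sign matrices is easy to observe empirically. The sketch below (the sizes $m$, $n$ and sparsity $k$ are arbitrary choices for illustration) normalises a dense $\pm 1$ matrix by $\sqrt{m}$ and checks that it roughly preserves the $\ell_2$ norm of random $k$-sparse vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 400, 10
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)  # normalised dense +/-1 matrix

ratios = []
for _ in range(100):
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)     # random k-sparse vector
    x[support] = rng.normal(size=k)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
print(min(ratios), max(ratios))  # concentrated around 1
```

The result above says this concentration is unavailable to any sparse $\pm 1$ matrix, whereas for $p \neq 2$ the situation reverses.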

Let $\mathcal{H}(k,n,p)$ be the distribution on $k$-uniform hypergraphs where every subset of $[n]$ of size $k$ is included as a hyperedge with probability $p$ independently. In this work, we design and analyze a simple spectral algorithm that certifies a bound on the size of the largest clique, $\omega(H)$, in hypergraphs $H \sim \mathcal{H}(k,n,p)$. For example, for any constant $p$, with high probability over the choice of the hypergraph, our spectral algorithm certifies a bound of $\tilde{O}(\sqrt{n})$ on the clique number in polynomial time. This matches, up to $\textrm{polylog}(n)$ factors, the best known certificate for the clique number in random graphs, which is the special case of $k = 2$. Prior to our work, the best known refutation algorithms [CGL04, AOW15] rely on a reduction to the problem of refuting random $k$-XOR via Feige's XOR trick [Fei02], and yield a polynomially worse bound of $\tilde{O}(n^{3/4})$ on the clique number when $p = O(1)$. Our algorithm bypasses the XOR trick and relies instead on a natural generalization of the Lovász theta semidefinite programming relaxation for cliques in hypergraphs.

Recognition of text on word or line images, without the need for sub-word segmentation, has become the mainstream of research and development of text recognition for Indian languages. Modelling unsegmented sequences using Connectionist Temporal Classification (CTC) is the most commonly used approach for segmentation-free OCR. In this work we present a comprehensive empirical study of various neural network models that use CTC for transcribing step-wise predictions in the neural network output to a Unicode sequence. The study is conducted for 13 Indian languages, using an internal dataset that has around 1000 pages per language. We study the choice of line vs. word as the recognition unit, and the use of synthetic data to train the models. We compare our models with popular publicly available OCR tools for end-to-end document image recognition. Our end-to-end pipeline, which employs our recognition models and existing text segmentation tools, outperforms these public OCR tools for 8 out of the 13 languages. We also introduce a new public dataset called Mozhi for word and line recognition in Indian languages. The dataset contains more than 1.2 million annotated word images (120 thousand text lines) across 13 Indian languages. Our code, trained models and the Mozhi dataset will be made available at this http URL

Learning from limited and imbalanced data is a challenging problem in the Artificial Intelligence community. Real-time scenarios demand decision-making about rare events for which the data are typically imbalanced. Such situations commonly arise in medical applications, cybersecurity, catastrophic predictions, etc. This motivates the development of learning algorithms capable of learning from imbalanced data, something the human brain does effortlessly. Inspired by the chaotic neuronal firing in the human brain, a novel learning algorithm, \emph{Neurochaos Learning} (NL), was recently proposed. NL comprises three blocks: Feature Transformation, Neurochaos Feature Extraction (CFX), and Classification. In this work, the efficacy of neurochaos feature transformation and extraction for classification in imbalanced learning is studied. We propose a unique combination of neurochaos-based feature transformation and extraction with traditional ML algorithms. The datasets explored in this study revolve around medical diagnosis, banknote fraud detection, environmental applications and spoken-digit classification. Experiments are performed in both the high and the low training sample regime. In the former, five out of nine datasets show a performance boost in terms of macro F1-score after using CFX features. The highest performance boost obtained is $\textbf{25.97\%}$ for the {\it Statlog (Heart)} dataset using CFX+Decision Tree. In the low training sample regime (from just one to nine training samples per class), the highest performance boost of $\textbf{144.38\%}$ is obtained for the {\it Haberman's Survival} dataset using CFX+Random Forest. NL offers enormous flexibility in combining CFX with any ML classifier to boost its performance, especially for learning tasks with limited and imbalanced data.

Few-shot learning (FSL) has emerged as an effective learning method and shows great potential. Despite recent creative works tackling FSL tasks, learning valid information rapidly from just a few or even zero samples remains a serious challenge. In this context, we extensively investigated more than 200 recent papers on FSL published in the past three years, aiming to present a timely and comprehensive overview of the most recent advances in FSL along with impartial comparisons of the strengths and weaknesses of existing works. To avoid conceptual confusion, we first elaborate and compare a set of similar concepts, including few-shot learning, transfer learning, and meta-learning. Furthermore, we propose a novel taxonomy that classifies the existing work by the level of abstraction of knowledge, in accordance with the challenges of FSL. To enrich this survey, we provide in each subsection an in-depth analysis and insightful discussion of recent advances on these topics. Moreover, taking computer vision as an example, we highlight important applications of FSL covering various research hotspots. Finally, we conclude the survey with unique insights into technology evolution trends together with potential future research opportunities, in the hope of providing guidance for follow-up research.

In order to address the limitations of gestural capabilities in physical robots, researchers in Virtual, Augmented, and Mixed Reality Human-Robot Interaction (VAM-HRI) have been using augmented-reality visualizations that increase robot expressivity and improve user perception (e.g., social presence). While a multitude of virtual robot deictic gestures (e.g., pointing to an object) have been implemented to improve interactions within VAM-HRI, such systems are often reported to involve tradeoffs between functional and social user perceptions of robots, creating a need for a unified approach that considers both attributes. We performed a literature analysis to select factors noted to significantly influence either user perception or task efficiency, and we propose a set of design considerations and recommendations that address these factors by combining anthropomorphic and non-anthropomorphic virtual gestures based on the motivation of the interaction, the visibility of the target and robot, the salience of the target, and the distance between the target and robot. The proposed recommendations provide the VAM-HRI community with starting points for selecting appropriate gesture types for a multitude of interaction contexts.

In this paper, we describe the magnetic potential in the vicinity of a corner of a conducting body embedded in a dielectric medium in a two-dimensional setting. We make explicit the corner asymptotic expansion for this potential as the distance to the corner goes to zero. This expansion involves singular functions and singular coefficients. We introduce a method for calculating the singular functions near the corner and provide two methods to compute the singular coefficients: the method of moments and the method of quasi-dual singular functions. Estimates for the convergence of both approximate methods are proven. Finally, we illustrate the theoretical results with finite element computations. The specific non-standard feature of this problem lies in the structure of its singular functions: they have the form of series whose first terms are harmonic polynomials and whose further terms are genuine non-smooth functions generated by the piecewise constant zeroth-order term of the operator.

Ensuring the safety of reinforcement learning (RL) algorithms is crucial for many real-world tasks. However, vanilla RL does not guarantee safety for an agent. In recent years, several methods have been proposed to provide safety guarantees for RL. To the best of our knowledge, there is no comprehensive comparison of these provably safe RL methods. We therefore introduce a categorization of existing provably safe RL methods and present the theoretical foundations for both continuous and discrete action spaces. Additionally, we evaluate provably safe RL on an inverted pendulum. The experiments show that indeed only the provably safe RL methods guarantee safety.

Recent work has shown that systems for speech translation (ST) -- similarly to automatic speech recognition (ASR) -- poorly handle person names. This shortcoming not only leads to errors that can seriously distort the meaning of the input, but also hinders the adoption of such systems in application scenarios (like computer-assisted interpreting) where the translation of named entities, like person names, is crucial. In this paper, we first analyse the outputs of ASR/ST systems to identify the reasons for failures in person name transcription/translation. Besides frequency in the training data, we pinpoint the nationality of the referred person as a key factor. We then mitigate the problem by creating multilingual models, and further improve our ST systems by forcing them to jointly generate transcripts and translations, prioritising the former over the latter. Overall, our solutions result in a relative improvement in token-level person name accuracy of 47.8% on average for three language pairs (en->es,fr,it).

Explanation is important for text classification tasks. One prevalent type of explanation is rationales: text snippets of the input that suffice to yield the prediction and are meaningful to humans. Much research on rationalization has been based on the selective rationalization framework, which has recently been shown to be problematic due to interlocking dynamics. In this paper, we address the interlocking problem in the multi-aspect setting, where we aim to generate multiple rationales for multiple outputs. More specifically, we propose a multi-stage training method incorporating an additional self-supervised contrastive loss that helps generate more semantically diverse rationales. Empirical results on the beer review dataset show that our method significantly improves rationalization performance.
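The abstract leaves the exact contrastive loss unspecified; as an illustration of the general idea, a generic InfoNCE-style objective over paired rationale embeddings might look like the following sketch (the function name, batch layout, and temperature value are assumptions, not details from the paper):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss: row i of z1 and row i of z2 form a
    positive pair; all other rows in the batch serve as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # unit-normalize rows
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # -log p(positive | anchor)
```

Well-aligned pairs drive the loss toward zero, while mismatched pairs are penalized; it is this pressure that pushes the learned rationale embeddings toward semantic diversity.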

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.

This paper investigates the structure-property relations of thin-walled lattices under dynamic longitudinal compression, characterized by their cross-sections and heights. These relations elucidate how different geometric features of a design interact to shape the mechanical response, including energy absorption. We propose a combinatorial, key-based design system to generate different lattice designs and use the finite element method with the Johnson-Cook material model to simulate their response. Using an autoencoder, we encode the cross-sectional images of the lattices into latent design feature vectors, which are supplied to a neural network model to generate predictions. The trained models can accurately predict lattice energy absorption curves within the key-based design system and can be extended to new designs outside the system via transfer learning.

Objectives: We describe here the interviews we conducted in late 2021 with 19 researchers at the Department of Classical Philology and Italian Studies at the University of Bologna. The purpose has been to shed light on the definition of the word "data" in the humanities domain, as far as FAIR data management practices are concerned, and on what researchers think of the term. Methods: We invited one researcher for each of the disciplinary areas represented within the department, and all 19 accepted to participate in the study. We divided participants into 5 main research areas: philology and literary criticism, language and linguistics, history of art, computer science, and archival studies. The interviews were transcribed and analysed using a grounded theory approach. Results: A list of 13 research data types in the humanities was compiled from the information collected from participants; publications emerge as the most important type. The term "data" does not seem to be especially problematic, contrary to what has been reported elsewhere. Regarding current research and data management practices, methodologies and teamwork appear more central than previously reported. Conclusions: "Data" in the FAIR framework need to include all types of inputs and outputs that humanities researchers work with, including publications. Humanities researchers appear ready for a discussion around making their data FAIR: they do not find the terminology particularly problematic, and they rely on precise and recognised methodologies, as well as on sharing and collaboration. Future studies should include more disciplines to paint a more precise picture.

The ability to detect whether an object is 2D or 3D is extremely important in autonomous driving, since a detection error can have life-threatening consequences, endangering the safety of the driver, passengers, pedestrians, and others on the road. Methods proposed to distinguish between 2D and 3D objects (e.g., liveness detection methods) are not suitable for autonomous driving, because they are object dependent or do not consider the constraints associated with autonomous driving (e.g., the need for real-time decision-making while the vehicle is moving). In this paper, we present EyeDAS, a novel few-shot learning-based method aimed at securing an object detector (OD) against the threat posed by the stereoblindness syndrome (i.e., the inability to distinguish between 2D and 3D objects). We evaluate EyeDAS's real-time performance using 2,000 objects extracted from seven YouTube video recordings of street views taken by a dash cam from the driver's seat perspective. When applying EyeDAS to seven state-of-the-art ODs as a countermeasure, EyeDAS was able to reduce the 2D misclassification rate from 71.42-100% to 2.4% with a 3D misclassification rate of 0% (TPR of 1.0). We also show that EyeDAS outperforms the baseline method and achieves an AUC of over 0.999 and a TPR of 1.0 with an FPR of 0.024.

Smart contracts show a high potential for ensuring that Supply Chain Management strategies make a qualitative leap toward higher levels of optimality, not only in terms of efficiency and profitability but also in the aggregation of skills aimed at creating the best products and services to bring to the market. In this article, we illustrate an architecture that employs smart contracts to implement various algorithmic versions of the Income Sharing principle between companies participating in a supply chain. We implement our approach on Hyperledger Fabric, the most widespread platform for private and consortium distributed ledgers, and discuss its suitability to our purposes by comparing this design choice with the alternative given by public blockchains, with particular attention to Ethereum.

This article presents new PEM fuel cell models with hexagonal and pentagonal designs. After observing cell performance improvements in these models, we optimized them. Inlet pressure and temperature were used as input parameters, and consumption and output power were the targets of the multi-objective optimization. We used artificial intelligence techniques, including deep neural networks and polynomial regression, to model the data, and then employed the Response Surface Method (RSM) to derive the target functions. Furthermore, we applied the NSGA-II multi-objective genetic algorithm to optimize the targets. Compared to the base (cubic) model, the optimized pentagonal and hexagonal models increase the output current density by an average of 21.819% and 39.931%, respectively.

The Gravitational Search Algorithm (GSA) is an optimization algorithm based on Newton's law of gravity and laws of motion. Introduced in 2009, the GSA already has several versions and applications. However, its performance depends on the values of its parameters, which are determined empirically. Hence, its generality is compromised, because the parameters that are suitable for a particular application are not necessarily suitable for another. This paper proposes the Gravitational Search Algorithm with Normalized Gravitational Constant (GSA-NGC), which defines a new heuristic to determine the initial gravitational constant of the GSA. The new heuristic is grounded in the Brans-Dicke theory of gravitation and takes into consideration the multiple dimensions of the search space of the application. It aims to improve the final solution and to reduce the number of iterations and premature convergences of the GSA. The GSA-NGC is validated experimentally, proving to be suitable for various applications and significantly improving the generality, performance, and efficiency of the GSA.
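The paper's contribution is the normalized gravitational constant; the base GSA update that the new heuristic plugs into can be sketched as a single step for minimization (the function signature, parameter values, and the two-agent test scenario below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def gsa_step(X, V, fitness, G, rng):
    """One base-GSA update for minimization: fitness-derived masses attract
    agents toward better solutions under gravitational constant G."""
    worst, best = fitness.max(), fitness.min()
    m = (worst - fitness) / (worst - best + 1e-12)  # raw masses in [0, 1]
    M = m / m.sum()                                 # normalized masses
    N, d = X.shape
    A = np.zeros((N, d))                            # accelerations
    for i in range(N):
        for j in range(N):
            if i != j:
                R = np.linalg.norm(X[i] - X[j])     # Euclidean distance
                # agent i's own mass cancels when dividing force by mass
                A[i] += rng.random() * G * M[j] * (X[j] - X[i]) / (R + 1e-12)
    V_new = rng.random((N, 1)) * V + A              # stochastic velocity memory
    return X + V_new, V_new
```

In this single step a worse agent is pulled toward a better one; GSA-NGC would replace the empirically tuned G with the proposed normalized initial gravitational constant.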

Information-theoretic fitness functions are becoming increasingly popular to produce generally useful, task-independent behaviors. One such universal function, dubbed empowerment, measures the amount of control an agent exerts on its environment via its sensorimotor system. Specifically, empowerment attempts to maximize the mutual information between an agent's actions and its received sensor states at a later point in time. Traditionally, empowerment has been applied to a conventional sensorimotor apparatus, such as a robot. Here, we expand the approach to a distributed, multi-agent sensorimotor system embodied by a neural cellular automaton (NCA). We show that the addition of empowerment as a secondary objective in the evolution of NCA to perform the task of morphogenesis, growing and maintaining a pre-specified shape, results in higher fitness compared to evolving for morphogenesis alone. Results suggest there may be a synergistic relationship between morphogenesis and empowerment. That is, indirectly selecting for coordination between neighboring cells over the duration of development is beneficial to the developmental process itself. Such a finding may have applications in developmental biology by providing potential mechanisms of communication between cells during growth from a single cell to a multicellular, target morphology. Source code for the experiments in this paper can be found at: \url{https://github.com/caitlingrasso/empowered-nca}.
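Empowerment is the channel capacity from actions to later sensor states; for a fixed action distribution it reduces to a mutual-information computation. A minimal sketch on a toy discrete channel follows (the uniform policy is an assumption for illustration; empowerment proper maximizes over action distributions, e.g. via the Blahut-Arimoto algorithm):

```python
import numpy as np

def mutual_information(p_a, p_s_given_a):
    """I(A; S') in bits, for action distribution p_a (shape [n_actions]) and
    channel p(s'|a) (shape [n_actions, n_states])."""
    p_joint = p_a[:, None] * p_s_given_a          # joint p(a, s')
    p_s = p_joint.sum(axis=0)                     # marginal p(s')
    mask = p_joint > 0                            # skip zero-probability terms
    ratio = p_joint[mask] / (p_a[:, None] * p_s[None, :])[mask]
    return float((p_joint[mask] * np.log2(ratio)).sum())
```

For a deterministic channel in which each of four actions produces a distinct sensor state, the uniform policy is already optimal, so the value equals the empowerment of log2(4) = 2 bits; a channel whose actions all produce the same sensor state yields 0 bits, i.e. no control.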

Hardware accelerators, such as those based on GPUs and FPGAs, offer an excellent opportunity to efficiently parallelize functionalities. Recently, modern embedded platforms have started being equipped with such accelerators, making them a compelling choice for emerging, highly computationally intensive workloads, like those required by next-generation autonomous driving systems. Alongside the need for computational efficiency, such workloads are commonly characterized by real-time requirements, which must be satisfied to guarantee the safe and correct behavior of the system. To this end, this paper proposes a holistic framework to help designers partition real-time applications on heterogeneous platforms with hardware accelerators. The proposed model is inspired by a realistic setup of an advanced driving assistance system presented in the WATERS 2019 Challenge by Bosch, further generalized to encompass a broader range of heterogeneous architectures. The resulting analysis is linearized and used to encode an optimization problem that jointly (i) guarantees timing constraints, (ii) finds a suitable task-to-core mapping, (iii) assigns a priority to each task, and (iv) selects which computations to accelerate, seeking the most convenient trade-off between the smaller worst-case execution times provided by accelerators and the associated synchronization and queuing delays.

The use case of C-V2X for road safety requires real-time network connectivity and information exchange between vehicles. To improve the reliability and safety of the system, intelligent networked vehicles need to move cooperatively to achieve network optimization. In this paper, we use the C-V2X sidelink mode 4 abstraction and the regression results of a C-V2X network-level simulation to formulate the optimization of the packet reception rate (PRR) with fairness in the road safety scenario. Under this optimization framework, we design a controlled mobility algorithm for the transmission node to adaptively adjust its position to maximize the aggregated PRR using only one-hop information. Simulation results show that the algorithm converges and improves the aggregated PRR and fairness for C-V2X mode 4 broadcast messages.

Being able to dynamically control the transmitted-beam divergence can bring important advantages in free-space optical communications. Specifically, this technique can help optimize the overall communications performance when the optimum laser-beam divergence is not fixed or known. This is the case in most realistic space laser communication systems, since the optimum beam divergence depends on multiple factors that can vary with time, such as the link distance, or cannot be accurately known, such as the actual pointing accuracy. Dynamic beam-divergence control allows the link performance to be optimized for every platform, scenario, and condition. NICT is currently working towards the development of a series of versatile lasercom terminals that can fit a variety of conditions, for which adaptive control of the transmitted beam divergence is a key element. This manuscript presents a prototype of a beam-divergence control system designed and developed by NICT and Tamron to evaluate this technique and to be later integrated into the lasercom terminals. The basic design of the prototype is introduced, as well as the first validation tests, which demonstrate its performance.

Recently, weakly-supervised image segmentation using weak annotations like scribbles has gained great attention, since such annotations are much easier to obtain than time-consuming and labor-intensive labeling at the pixel/voxel level. However, because scribbles lack structural information about the region of interest (ROI), existing scribble-based methods suffer from poor boundary localization. Furthermore, most current methods are designed for 2D image segmentation and do not fully leverage volumetric information if applied directly to image slices. In this paper, we propose Scribble2D5, a scribble-based volumetric image segmentation method that tackles 3D anisotropic images and improves boundary prediction. To achieve this, we augment a 2.5D attention UNet with a proposed label propagation module to extend semantic information from scribbles, and with a combination of static and active boundary prediction to learn the ROI's boundary and regularize its shape. Extensive experiments on three public datasets demonstrate that Scribble2D5 significantly outperforms current scribble-based methods and approaches the performance of fully-supervised ones. Our code is available online.

Building sound and precise static call graphs for real-world JavaScript applications poses an enormous challenge, due to many hard-to-analyze language features. Further, the relative importance of these features may vary depending on the call graph algorithm being used and the class of applications being analyzed. In this paper, we present a technique to automatically quantify the relative importance of different root causes of call graph unsoundness for a set of target applications. The technique works by identifying the dynamic function data flows relevant to each call edge missed by the static analysis, correctly handling cases with multiple root causes and inter-dependent calls. We apply our approach to perform a detailed study of the recall of a state-of-the-art call graph construction technique on a set of framework-based web applications. The study yielded a number of useful insights. We found that while dynamic property accesses were the most common root cause of missed edges across the benchmarks, other root causes varied in importance depending on the benchmark, potentially useful information for an analysis designer. Further, with our approach, we could quickly identify and fix a recall issue in the call graph builder we studied, and also quickly assess whether a recent analysis technique for Node.js-based applications would be helpful for browser-based code. All of our code and data are publicly available, and many components of our technique can be re-used to facilitate future studies.

This work deals with error correction for non-volatile memories that are partially defective at some levels. Such memory cells can only store incomplete information, since some of their levels cannot be utilized entirely due to, e.g., wearout. In addition, this paper corrects $t\geq 1$ random errors that could occur among $u$ partially defective cells while preserving their constraints. First, we show that the probability of violating the partially defective cells' restriction due to random errors is not trivial. Next, we update the models in [1] such that the coefficients of the output encoded vector plus the error vector at the partially defective positions are nonzero. Lastly, we state a simple theorem for masking the partial defects using a code with minimum distance $d$ such that $d\geq (u+t)+1$. "Masking" means selecting a word whose entries correspond to writable levels in the (partially) defective positions. A comparison shows that, for a certain BCH code, masking $u$ cells by this theorem is as good as using the more complicated coding scheme proven in [1, Theorem 1].

Dual embodied-symbolic concept representations are the foundation for deep learning and symbolic AI integration. We discuss the use of dual embodied-symbolic concept representations for molecular graph representation learning, specifically with exemplar-based contrastive self-supervised learning (SSL). The embodied representations are learned from molecular graphs, and the symbolic representations are learned from the corresponding chemical knowledge graph (KG). We use the chemical KG to enhance molecular graphs with symbolic (semantic) knowledge and to generate augmented molecular graphs. We treat a molecular graph and its semantically augmented counterpart as exemplars of the same semantic class, and use such pairs as positive pairs in exemplar-based contrastive SSL.

The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images, given only a subset of them during training and no prior on the unseen compositions. In this setting, models operate on a huge output space containing all possible state-object compositions. While previous works tackle the problem by learning embeddings for the compositions jointly, here we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently. To ensure that the model develops primitive-specific features, we equip the state and object classifiers with separate, non-linear feature extractors. Moreover, we estimate the feasibility of each composition through external knowledge, using this prior to remove unfeasible compositions from the output space. Finally, we propose a new setting, CZSL under partial supervision (pCZSL), where only object or only state labels are available during training, and we can use our prior to estimate the missing labels. Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves the state of the art in both OW-CZSL and pCZSL, surpassing most recent competitors even when coupled with semi-supervised learning techniques. Code is available at: https://github.com/ExplainableML/KG-SP.

Space missions are characterized by long distances, difficult or unavailable communication, and high operating costs. Moreover, their complexity has been constantly increasing in recent years. For these reasons, improving the autonomy of space operators is an attractive goal for increasing the mission reward at lower costs. This paper proposes an onboard scheduler that is integrated into an onboard software framework for mission autonomy. Given a set of activities, the scheduler is responsible for determining the starting time of each activity according to its priority, order constraints, and resource consumption. The presented scheduler is based on integer linear programming and relies on a branch-and-cut solver. The technology has been tested on an Earth observation scenario, comparing its performance against a state-of-the-art scheduling technology.

Distributed storage systems must store large amounts of data over long periods of time. To avoid data loss due to device failures, an $[n,k]$ erasure code is used to encode $k$ data symbols into a codeword of $n$ symbols that are stored across different devices. However, device failure rates change throughout the life of the data, and tuning $n$ and $k$ according to these changes has been shown to save significant storage space. Code conversion is the process of converting multiple codewords of an initial $[n^I,k^I]$ code into codewords of a final $[n^F,k^F]$ code that decode to the same set of data symbols. In this paper, we study conversion bandwidth, defined as the total amount of data transferred between nodes during conversion. In particular, we consider the case where the initial and final codes are MDS and a single initial codeword is split into several final codewords ($k^I=\lambda^F k^F$ for integer $\lambda^F \geq 2$), called the split regime. We derive lower bounds on the conversion bandwidth in the split regime and propose constructions that significantly reduce conversion bandwidth and are optimal for certain parameters.

The generalization performance of kernel ridge regression (KRR) exhibits a multi-phased pattern that crucially depends on the scaling relationship between the sample size $n$ and the underlying dimension $d$. This phenomenon is due to the fact that KRR sequentially learns functions of increasing complexity as the sample size increases; when $d^{k-1}\ll n\ll d^{k}$, only polynomials with degree less than $k$ are learned. In this paper, we present sharp asymptotic characterization of the performance of KRR at the critical transition regions with $n \asymp d^k$, for $k\in\mathbb{Z}^{+}$. Our asymptotic characterization provides a precise picture of the whole learning process and clarifies the impact of various parameters (including the choice of the kernel function) on the generalization performance. In particular, we show that the learning curves of KRR can have a delicate "double descent" behavior due to specific bias-variance trade-offs at different polynomial scaling regimes.
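For reference, the KRR estimator whose learning curves are analyzed has the standard closed form $\hat f(x) = k(x, X)(K + \lambda I)^{-1} y$; a minimal sketch with an RBF kernel follows (the kernel choice and parameter values are illustrative, not those of the paper):

```python
import numpy as np

def krr_fit_predict(X_train, y_train, X_test, gamma=1.0, lam=1e-3):
    """Kernel ridge regression with RBF kernel k(x, x') = exp(-gamma ||x - x'||^2)."""
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq)
    K = rbf(X_train, X_train)
    n = len(X_train)
    alpha = np.linalg.solve(K + lam * np.eye(n), y_train)  # (K + λI) α = y
    return rbf(X_test, X_train) @ alpha                    # f(x) = k(x, X) α
```

With a small ridge parameter the estimator nearly interpolates smooth targets at the training points; the paper's analysis characterizes how this behavior changes as $n$ scales polynomially in $d$.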

The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: in the Vocalisations and Stuttering Sub-Challenges, human non-verbal vocalisations and speech have to be classified; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComParE and BoAW features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, we add end-to-end sequential modelling and a log-mel-128-BNN.

We examine the problem of transmission control, i.e., when to transmit, in distributed wireless communications networks through the lens of multi-agent reinforcement learning. Most other works using reinforcement learning to control or schedule transmissions use some centralized control mechanism, whereas our approach is fully distributed. Each transmitter node is an independent reinforcement learning agent and does not have direct knowledge of the actions taken by other agents. We consider the case where only a subset of agents can successfully transmit at a time, so each agent must learn to act cooperatively with other agents. An agent may decide to transmit a certain number of steps into the future, but this decision is not communicated to the other agents, so it is the task of the individual agents to attempt to transmit at appropriate times. We achieve this collaborative behavior by studying the effects of different action spaces. We are agnostic to the physical layer, which makes our approach applicable to many types of networks. We submit that approaches similar to ours may be useful in other domains that use multi-agent reinforcement learning with independent agents.

Social media contains useful information about people and society that could help advance research in many different areas (e.g., by applying opinion mining, emotion/sentiment analysis, and statistical analysis), such as business and finance, health, socio-economic inequality, and gender vulnerability. User demographics provide rich information that could help study the subject further. However, user demographics such as gender are considered private and are not freely available. In this study, we propose a model based on transformers to predict a user's gender from their images and tweets. We fine-tune a model based on Vision Transformers (ViT) to classify female and male images. Next, we fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize a user's gender from their tweets. This is highly beneficial, because not all users provide an image that indicates their gender; the gender of such users can still be detected from their tweets. The combined model improves the accuracy of the image and text classification models by 6.98% and 4.43%, respectively. This shows that the image and text classification models are capable of complementing each other by providing additional information to one another. We apply our method to the PAN-2018 dataset and obtain an accuracy of 85.52%.

In the new era of internet systems and applications, detecting distinct topics in huge amounts of text has gained a lot of attention. These methods use numerical representations of text, called embeddings, to imitate human judgments of semantic similarity between words. In this study, we perform a fuzzy-based analysis of various vector representations of words, i.e., word embeddings. We also introduce new fuzzy clustering methods based on hybridizing fuzzy clustering with an evolutionary algorithm called Forest Optimization. We apply two popular fuzzy clustering algorithms to count-based word embeddings with different methods and dimensionalities. COVID-related words from a Kaggle dataset were gathered, converted into vectors, and clustered. The results indicate that fuzzy clustering algorithms are very sensitive to high-dimensional data and that parameter tuning can dramatically change their performance. We evaluate the experimental results with various clustering validity indexes to compare the accuracy of different algorithm variants with different embeddings.
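The abstract does not name the two fuzzy clustering algorithms used; fuzzy c-means is the canonical choice, and its alternating updates (soft memberships from inverse distances, centers from membership-weighted means) can be sketched as follows (data, parameters, and initialization below are illustrative assumptions):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: returns soft memberships U (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                # each row sums to 1
    for _ in range(iters):
        Um = U ** m                                   # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))                 # u_ij ∝ d_ij^(-2/(m-1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers
```

On word embeddings the same updates apply unchanged, with each row of X an embedding vector; the study's observation about sensitivity to high dimensionality shows up here through the distance computation, which becomes less discriminative as the dimension grows.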

Although generative facial priors and geometric priors have recently demonstrated high-quality results for blind face restoration, producing fine-grained facial details faithful to the input remains a challenging problem. Motivated by classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method, VQFR. VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces and can thus help recover realistic facial details. However, the simple application of a VQ codebook cannot achieve good results with faithful details and identity preservation. Therefore, we further introduce two special network designs. 1) We first investigate the compression patch size in the VQ codebook and find that a VQ codebook designed with a proper compression patch size is crucial for balancing quality and fidelity. 2) To further fuse low-level features from inputs without "contaminating" the realistic details generated from the VQ codebook, we propose a parallel decoder consisting of a texture decoder and a main decoder. The two decoders then interact through a texture warping module with deformable convolution. Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR largely enhances the restored quality of facial details while keeping fidelity comparable to previous methods. Code will be available at https://github.com/TencentARC/VQFR.
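The core VQ operation referenced throughout is a nearest-neighbor codebook lookup that replaces each degraded feature vector with its closest high-quality code entry; a minimal sketch follows (the toy codebook is illustrative, not VQFR's learned one):

```python
import numpy as np

def vq_lookup(features, codebook):
    """Replace each feature vector by its nearest codebook entry (L2 distance)."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)            # index of the closest code per feature
    return codebook[idx], idx
```

Because the output is drawn entirely from the high-quality codebook, details look realistic by construction; the paper's parallel decoder exists precisely because this lookup alone discards input-specific information needed for fidelity.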

We give a self-contained randomized algorithm based on shifted inverse iteration which provably computes the eigenvalues of an arbitrary matrix $M\in\mathbb{C}^{n\times n}$ up to backward error $\delta\|M\|$ in $O(n^4+n^3\log^2(n/\delta)+\log(n/\delta)^2\log\log(n/\delta))$ floating point operations using $O(\log^2(n/\delta))$ bits of precision. While the $O(n^4)$ complexity is prohibitive for large matrices, the algorithm is simple and may be useful for provably computing the eigenvalues of small matrices using controlled precision, in particular for computing Ritz values in shifted QR algorithms as in (Banks, Garza-Vargas, Srivastava, 2022).
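The building block of the algorithm, shifted inverse iteration, repeatedly solves $(M - sI)x_{k+1} = x_k$ and renormalizes, converging to the eigenvector whose eigenvalue lies nearest the shift $s$. A textbook sketch, without the paper's randomization analysis and controlled-precision arithmetic:

```python
import numpy as np

def shifted_inverse_iteration(M, shift, iters=50, seed=0):
    """Approximate the eigenpair of M whose eigenvalue is nearest `shift`."""
    n = M.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # random start
    x /= np.linalg.norm(x)
    A = M - shift * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(A, x)   # amplifies the component nearest the shift
        x /= np.linalg.norm(x)
    lam = x.conj() @ M @ x          # Rayleigh quotient estimate
    return lam, x
```

Each solve amplifies the eigencomponent nearest the shift by the ratio of distances to the competing eigenvalues, so convergence is geometric when the shift is well separated.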

Inspired by cellular growth and self-organization, Neural Cellular Automata (NCAs) have been capable of "growing" artificial cells into images, 3D structures, and even functional machines. NCAs are flexible and robust computational systems but -- similarly to many other self-organizing systems -- inherently uncontrollable during and after their growth process. We present an approach to control these types of systems, called Goal-Guided Neural Cellular Automata (GoalNCA), which leverages goal encodings to control cell behavior dynamically at every step of cellular growth. This approach enables the NCA to continually change behavior, and in some cases, generalize its behavior to unseen scenarios. We also demonstrate the robustness of the NCA with its ability to preserve task performance, even when only a portion of cells receive goal information.

Symbolic Regression searches for a functional form that approximates a dataset, often using Genetic Programming. Since there is usually no restriction on the form the function can take, Genetic Programming may return a hard-to-understand model due to non-linear function chaining or long expressions. A novel representation called Interaction-Transformation was recently proposed to alleviate this problem. In this representation, the function form is restricted to an affine combination of terms generated as the application of a single univariate function to the interaction of selected variables. This representation obtained competitive solutions on standard benchmarks. Despite the initial success, a broader set of benchmark functions revealed the limitations of the constrained representation. In this paper we propose an extension to this representation, called the Transformation-Interaction-Rational representation, which defines a new functional form as the ratio of two Interaction-Transformation functions. Additionally, the target variable can also be transformed with a univariate function. The main goal is to improve the approximation power while still constraining the overall complexity of the expression. We tested this representation with standard Genetic Programming using crossover and mutation. The results show a great improvement over its predecessor and state-of-the-art performance on a large benchmark.
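To make the representation concrete, here is a small NumPy sketch of evaluating Interaction-Transformation (IT) and Transformation-Interaction-Rational expressions. The `1 +` term in the denominator is our own choice to keep the ratio well-defined when all denominator weights vanish; the paper's exact parameterisation may differ.

```python
import numpy as np

def it_eval(x, weights, exponents, funcs):
    """Evaluate an Interaction-Transformation expression
        f(x) = sum_i w_i * g_i(prod_j x_j ** k_ij),
    a weighted sum of univariate functions g_i applied to interactions
    (monomials) of the selected variables.
    x: (n_samples, n_vars); exponents: one integer row per term."""
    terms = np.stack([g(np.prod(x ** np.asarray(k), axis=1))
                      for g, k in zip(funcs, exponents)], axis=1)
    return terms @ np.asarray(weights, dtype=float)

def tir_eval(x, numerator, denominator):
    """Transformation-Interaction-Rational: a ratio of two IT expressions.
    The `1 +` below is our choice to keep the ratio defined when the
    denominator weights vanish; the paper's exact form may differ."""
    return it_eval(x, *numerator) / (1.0 + it_eval(x, *denominator))

# Example: f(x) = 2 * sin(x1 * x2**2) as a single-term IT expression.
x = np.array([[1.0, 2.0], [0.5, 3.0]])
y = it_eval(x, [2.0], [[1, 2]], [np.sin])
```

The exponent matrix is what bounds the expression's complexity: each term is fully described by one integer row plus one univariate function, which is what keeps the search space tractable for Genetic Programming.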

We develop a framework for proving rapid convergence of shifted QR algorithms which use Ritz values as shifts, in finite arithmetic. Our key contribution is a dichotomy result which addresses the known forward-instability issues surrounding the shifted QR iteration [Parlett and Le 1993]: we give a procedure which provably either computes a set of approximate Ritz values of a Hessenberg matrix with good forward stability properties, or leads to early decoupling of the matrix via a small number of QR steps. Using this framework, we show that the shifting strategy introduced in Part I of this series [Banks, Garza-Vargas, and Srivastava 2021] converges rapidly in finite arithmetic with a polylogarithmic bound on the number of bits of precision required, when invoked on matrices of controlled eigenvector condition number and minimum eigenvalue gap.

We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited: they are either computationally inefficient, require a strong assumption on the corruption, or incur regret at least $C$ times worse than the regret without corruption. In this paper, to overcome these limitations, we propose a new algorithm based on the principle of optimism in the face of uncertainty. At the core of our algorithm is a weighted ridge regression where the weight of each chosen action depends on its confidence up to some threshold. We show that for both the known-$C$ and unknown-$C$ cases, our algorithm with a proper choice of hyperparameter achieves a regret that nearly matches the lower bounds. Thus, our algorithm is nearly optimal up to logarithmic factors for both cases. Notably, our algorithm achieves the near-optimal regret for both the corrupted and uncorrupted cases ($C=0$) simultaneously.
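The confidence-capped weighting can be sketched compactly. The rule `w_t = min(1, alpha / ||x_t||_{Sigma^{-1}})` below is an illustrative instance of the idea, down-weighting high-uncertainty (easily corrupted) rounds; the hyperparameter `alpha` and the exact weight formula are our assumptions, not necessarily the paper's.

```python
import numpy as np

def weighted_ridge(contexts, rewards, alpha, lam=1.0):
    """Weighted ridge regression in which each observation's weight is
    capped by its confidence, w_t = min(1, alpha / ||x_t||_{Sigma^{-1}}),
    so that rounds where the estimator is most uncertain contribute less.
    The weight rule and `alpha` are illustrative."""
    d = contexts.shape[1]
    Sigma = lam * np.eye(d)   # regularised design matrix
    b = np.zeros(d)
    for x, r in zip(contexts, rewards):
        conf = np.sqrt(x @ np.linalg.solve(Sigma, x))  # ||x||_{Sigma^{-1}}
        w = min(1.0, alpha / conf)
        Sigma += w * np.outer(x, x)
        b += w * r * x
    return np.linalg.solve(Sigma, b)

# Example: noise-free linear rewards; the weighted estimator recovers theta.
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])
X = rng.standard_normal((500, 2))
theta_hat = weighted_ridge(X, X @ theta, alpha=1.0, lam=0.01)
```

Because the weights are bounded by 1, an adversary corrupting a round can perturb the estimator by at most a bounded amount, which is what decouples the regret from the corruption level $C$.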

Consider the relationship between the FDA (the principal) and a pharmaceutical company (the agent). The pharmaceutical company wishes to sell a product to make a profit, and the FDA wishes to ensure that only efficacious drugs are released to the public. The efficacy of the drug is not known to the FDA, so the pharmaceutical company must run a costly trial to prove efficacy to the FDA. Critically, the statistical protocol used to establish efficacy affects the behavior of a strategic, self-interested pharmaceutical company; a lower standard of statistical evidence incentivizes the pharmaceutical company to run more trials for drugs that are less likely to be effective, since the drug may pass the trial by chance, resulting in large profits. The interaction between the statistical protocol and the incentives of the pharmaceutical company is crucial to understanding this system and designing protocols with high social utility. In this work, we discuss how the principal and agent can enter into a contract with payoffs based on statistical evidence. When there is stronger evidence for the quality of the product, the principal allows the agent to make a larger profit. We show how to design contracts that are robust to an agent's strategic actions, and derive the optimal contract in the presence of strategic behavior.

Predicting part quality for additive manufacturing (AM) processes requires high-fidelity numerical simulation of partial differential equations (PDEs) governing process multiphysics on a scale of minimum manufacturable features. This makes part-scale predictions computationally demanding, especially when they require many small-scale simulations. We consider drop-on-demand liquid metal jetting (LMJ) as an illustrative example of such computational complexity. A model describing droplet coalescence for LMJ may include coupled incompressible fluid flow, heat transfer, and phase change equations. Numerically solving these equations becomes prohibitively expensive when simulating the build process for a full part consisting of thousands to millions of droplets. Reduced-order models (ROMs) based on neural networks (NN) or k-nearest neighbor (kNN) algorithms have been built to replace the original physics-based solver and are computationally tractable for part-level simulations. However, their quick inference capabilities often come at the expense of accuracy, robustness, and generalizability. We apply an operator learning (OL) approach to learn a mapping between initial and final states of the droplet coalescence process for enabling rapid and accurate part-scale build simulation. Preliminary results suggest that OL requires order-of-magnitude fewer data points than a kNN approach and is generalizable beyond the training set while achieving similar prediction error.

Can a micron-sized sack of interacting molecules understand and adapt to a constantly fluctuating environment? Cellular life provides an existence proof in the affirmative, but the principles that allow for life's existence are far from being proven. One challenge in engineering and understanding biochemical computation is the intrinsic noise due to chemical fluctuations. In this paper, we draw insights from machine learning theory, chemical reaction network theory, and statistical physics to show that the broad and biologically relevant class of detailed balanced chemical reaction networks is capable of representing and conditioning complex distributions. These results illustrate how a biochemical computer can use intrinsic chemical noise to perform complex computations. Furthermore, we use our explicit physical model to derive thermodynamic costs of inference.

Ptychography is a popular microscopic imaging modality for many scientific discoveries and sets the record for the highest image resolution. Unfortunately, the high image resolution of ptychographic reconstruction requires a significant amount of memory and computation, forcing many applications to compromise their image resolution in exchange for a smaller memory footprint and a shorter reconstruction time. In this paper, we propose a novel image gradient decomposition method that significantly reduces the memory footprint of ptychographic reconstruction by tessellating image gradients and diffraction measurements into tiles. In addition, we propose a parallel image gradient decomposition method that enables asynchronous point-to-point communication and parallel pipelining with minimal overhead on a large number of GPUs. Our experiments on a Titanate material dataset (PbTiO3) with 16632 probe locations show that our Gradient Decomposition algorithm reduces the memory footprint by 51 times. In addition, it achieves time-to-solution within 2.2 minutes by scaling to 4158 GPUs with a super-linear speedup at 364% efficiency. This performance is 2.7 times more memory efficient, 9 times more scalable, and 86 times faster than the state-of-the-art algorithm.

We develop a framework for generalized variational inference in infinite-dimensional function spaces and use it to construct a method termed Gaussian Wasserstein inference (GWI). GWI leverages the Wasserstein distance between Gaussian measures on the Hilbert space of square-integrable functions in order to determine a variational posterior using a tractable optimisation criterion and avoids pathologies arising in standard variational function space inference. An exciting application of GWI is the ability to use deep neural networks in the variational parametrisation of GWI, combining their superior predictive performance with the principled uncertainty quantification analogous to that of Gaussian processes. The proposed method obtains state-of-the-art performance on several benchmark datasets.

As an alternative to entanglement entropies, the capacity of entanglement has emerged as a promising candidate for probing and estimating the degree of entanglement of quantum bipartite systems. In this work, we study the typical behavior of entanglement capacity over major models of random states. In particular, exact and asymptotic formulas for the average capacity have been derived under the Hilbert-Schmidt and Bures-Hall ensembles. The obtained formulas generalize some partial results on average capacity computed recently in the literature. As a key ingredient in deriving the results, we make use of recent advances in random matrix theory pertaining to the underlying orthogonal polynomials and special functions. A numerical study has been performed to illustrate the usefulness of average capacity as an entanglement indicator.

Quantum models of computation are widely believed to be more powerful than classical ones. Efforts center on proving that, for a given problem, quantum algorithms are more resource-efficient than any classical one. All this, however, assumes a standard predictive paradigm of reasoning where, given initial conditions, the future holds the answer. How about bringing information from the future to the present and exploiting it to one's advantage? This is a radically new approach to reasoning, so-called Retrodictive Computation, that benefits from the specific form of the computed functions. We demonstrate how to use tools of symbolic computation to realize retrodictive quantum computing at scale and exploit it to efficiently, and classically, solve instances of the quantum Deutsch-Jozsa, Bernstein-Vazirani, Simon, Grover, and Shor algorithms.

Reconfigurable intelligent surface (RIS) is capable of intelligently manipulating the phases of the incident electromagnetic wave to improve the wireless propagation environment between the base-station (BS) and the users. This paper addresses the joint user scheduling, RIS configuration, and BS beamforming problem in an RIS-assisted downlink network with limited pilot overhead. We show that graph neural networks (GNN) with permutation invariant and equivariant properties can be used to appropriately schedule users and to design RIS configurations to achieve high overall throughput while accounting for fairness among the users. As compared to the conventional methodology of first estimating the channels then optimizing the user schedule, RIS configuration and the beamformers, this paper shows that an optimized user schedule can be obtained directly from a very short set of pilots using a GNN, then the RIS configuration can be optimized using a second GNN, and finally the BS beamformers can be designed based on the overall effective channel. Numerical results show that the proposed approach can utilize the received pilots more efficiently than the conventional channel estimation based approach, and can generalize to systems with an arbitrary number of users.

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. It is difficult to collect large quantities of such data for ASR system development due to the mobility issues often found among these users. To this end, data augmentation techniques play a vital role. In contrast to existing data augmentation techniques that only modify the speaking rate or overall shape of the spectral contour, fine-grained spectro-temporal differences between dysarthric, elderly and normal speech are modelled using a novel set of speaker dependent (SD) generative adversarial network (GAN) based data augmentation approaches in this paper. These flexibly allow both: a) temporal or speed perturbed normal speech spectra to be modified and brought closer to those of an impaired speaker when parallel speech data is available; and b) for non-parallel data, the SVD-decomposed normal speech spectral basis features to be transformed into those of a target elderly speaker before being re-composed with the temporal bases to produce the augmented data for state-of-the-art TDNN and Conformer ASR system training. Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora, and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets. The proposed GAN based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute (9.61% and 6.4% relative) WER reduction on the TORGO and DementiaBank data respectively. Consistent performance improvements are retained after applying LHUC based speaker adaptation.

Diffusion magnetic resonance imaging (dMRI) is an important tool for characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by the superior performance of the Transformer, in this work we present a learning-based framework based on the Transformer, namely a Microstructure Estimation Transformer with Sparse Coding (METSC), for dMRI-based microstructure estimation with downsampled q-space data. To take advantage of the Transformer while addressing its limitation of large training data requirements, we explicitly introduce an inductive bias (a model bias) into the Transformer using a sparse coding technique to facilitate the training process. Thus, the METSC is composed of three stages: an embedding stage, a sparse representation stage, and a mapping stage. The embedding stage is a Transformer-based structure that encodes the signal to ensure the voxel is represented effectively. In the sparse representation stage, a dictionary is constructed by solving a sparse reconstruction problem that unfolds the Iterative Hard Thresholding (IHT) process. The mapping stage is essentially a decoder that computes the microstructural parameters from the output of the second stage, based on the weighted sum of normalized dictionary coefficients, where the weights are also learned. We tested our framework on two dMRI models with downsampled q-space data, including the intravoxel incoherent motion (IVIM) model and the neurite orientation dispersion and density imaging (NODDI) model. The proposed method achieved up to an 11.25-fold acceleration in scan time and outperformed the other state-of-the-art learning-based methods.
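The sparse representation stage unfolds Iterative Hard Thresholding. A generic NumPy sketch of the plain IHT recursion it is built on follows (this is textbook IHT, not the paper's learned, unfolded network; the matrix sizes, sparsity level, and step-size rule are illustrative):

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def iht(A, y, s, iters=500):
    """Iterative Hard Thresholding for s-sparse recovery from y = A @ x:
        x <- H_s(x + mu * A.T @ (y - A @ x)),  mu = 1 / ||A||_2^2,
    where the conservative step size keeps the objective non-increasing."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + mu * A.T @ (y - A @ x), s)
    return x

# Example: recover a 3-sparse signal from 50 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[[5, 17, 80]] = [1.0, -2.0, 0.5]
y = A @ x_true
x_hat = iht(A, y, s=3)
```

Unfolding replaces this fixed iteration with a small number of layers whose thresholds and linear operators are learned, which is how the sparse stage injects a model bias into the Transformer pipeline.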

We show that every simple planar near-triangulation with minimum degree at least three contains two disjoint total dominating sets. The class includes all simple planar triangulations other than the triangle. This affirms a conjecture of Goddard and Henning [Thoroughly dispersed colorings, J. Graph Theory, 88 (2018) 174-191].

Atrial fibrillation (AF) is a cardiovascular disease identified as one of the main risk factors for stroke. The majority of strokes due to AF are caused by clots originating in the left atrial appendage (LAA). LAA occlusion is an effective procedure for reducing stroke risk. Planning the procedure using pre-procedural imaging and analysis has shown benefits. The analysis is commonly done by manually segmenting the appendage on 2D slices. Automatic LAA segmentation methods could save an expert's time and provide insightful 3D visualizations and accurate automatic measurements to aid in medical procedures. Several semi- and fully-automatic methods for segmenting the appendage have been proposed. This paper provides a review of automatic LAA segmentation methods on 3D and 4D medical images, including CT, MRI, and echocardiogram images. We classify methods into heuristic and model-based methods, as well as into semi- and fully-automatic methods. We summarize and compare the proposed methods, evaluate their effectiveness, and present current challenges in the field and approaches to overcome them.

We compute the whole asymptotic expansion of the probability that a large uniform labeled graph is connected, and of the probability that a large uniform labeled tournament is irreducible. In both cases, we provide a combinatorial interpretation of the involved coefficients.

The high data-rate demand of the next generation of wireless communication could be met by the Non-Orthogonal Multiple Access (NOMA) approach in the millimetre-wave (mmW) frequency band. Joint power allocation and beamforming are therefore essential in mmW-NOMA systems and can be addressed with optimization approaches. To this end, we exploit a Deep Reinforcement Learning (DRL) approach, whose learned policy yields an optimized sum-rate for the users. An actor-critic architecture is utilized to measure the immediate reward and provide the new action that maximizes the overall Q-value of the network. The immediate reward is defined as the sum of the two users' rates, with the minimum guaranteed rate for each user and the total consumed power as the constraints. The simulation results demonstrate the superiority of the proposed approach over Time-Division Multiple Access (TDMA) and another optimized NOMA strategy in terms of the users' sum-rate.

In the context of increasing interest in reducing the number of traffic accidents and associated victims, communication-based vehicle safety applications have emerged as one of the best solutions to enhance road safety. In this area, visible light communications (VLC) have great application potential due to their relatively simple design for basic functioning, efficiency, and large geographical distribution. Vehicular Visible Light Communication (VVLC) is preferred as a vehicle-to-everything (V2X) communication scheme due to its highly secure, low-complexity, and radio-frequency (RF) interference-free characteristics, exploiting the line-of-sight (LoS) propagation of visible light and the vehicles' existing light-emitting diodes (LEDs). This research addresses the application of the Non-Orthogonal Multiple Access (NOMA) technique in VLC-based Vehicle-to-Vehicle (V2V) communication. The proposed system is simulated under near-realistic conditions and its performance is analyzed under different scenarios.

Recent work in mathematical neuroscience has calculated the directed graph homology of the directed simplicial complex given by the brain's sparse adjacency graph, the so-called connectome. These biological connectomes show an abundance of both high-dimensional directed simplices and Betti numbers in all viable dimensions - in contrast to Erd\H{o}s-R\'enyi graphs of comparable size and density. An analysis of synthetically trained connectomes reveals similar findings, raising questions about the graphs' comparability and the nature of origin of the simplices. We present a new method capable of delivering insight into the emergence of simplices and thus simplicial abundance. Our approach allows us to easily distinguish simplex-rich connectomes of different origin. The method relies on the novel concept of an almost-d-simplex, that is, a simplex missing exactly one edge, and consequently the almost-d-simplex closing probability by dimension. We also describe a fast algorithm to identify almost-d-simplices in a given graph. Applying this method to biological and artificial data allows us to identify a mechanism responsible for simplex emergence, and suggests this mechanism is responsible for the simplex signature of the excitatory subnetwork of a statistical reconstruction of the mouse primary visual cortex. Our highly optimised code for this new method is publicly available.
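For intuition, here is a small pure-Python sketch that counts 2-simplices (transitive triangles) and almost-2-simplices in a simple digraph without reciprocal edges. The `ratio` returned below is only a crude proxy for a closing probability; the paper defines closing probabilities by dimension and uses a much faster algorithm for general d.

```python
from itertools import combinations

def simplex_stats(edges):
    """For a simple digraph with no reciprocal edges, count directed
    2-simplices (transitive triangles, where one vertex points to both
    others and one of those points to the third) and almost-2-simplices
    (triples with exactly two of the three possible connections, hence
    one directed edge short of a triangle).
    Returns (n_simplices, n_almost, ratio)."""
    E = set(edges)
    nodes = sorted({v for e in edges for v in e})
    n_full = n_almost = 0
    for t in combinations(nodes, 3):
        pairs = list(combinations(t, 2))
        conn = sum(1 for u, v in pairs if (u, v) in E or (v, u) in E)
        if conn == 2:
            n_almost += 1
        elif conn == 3:
            # A 3-tournament is a simplex iff it is transitive,
            # i.e. some vertex has out-degree 2 within the triple.
            outdeg = {v: 0 for v in t}
            for u, v in pairs:
                outdeg[u if (u, v) in E else v] += 1
            if max(outdeg.values()) == 2:
                n_full += 1
    ratio = n_full / (n_full + n_almost) if n_full + n_almost else 0.0
    return n_full, n_almost, ratio

# Example: one transitive triangle plus a dangling edge gives one simplex
# and two almost-simplices.
stats = simplex_stats([(0, 1), (0, 2), (1, 2), (1, 3)])
```

Note that a cyclic triangle (0→1→2→0) has all three connections but is not a simplex, which is exactly what the transitivity check filters out.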

Objective: During cardiac arrest treatment, a reliable detection of spontaneous circulation, usually performed by manual pulse checks, is both vital for patient survival and practically challenging. Methods: We developed a machine learning algorithm to automatically predict the circulatory state during cardiac arrest treatment from 4-second-long snippets of accelerometry and electrocardiogram data from real-world defibrillator records. The algorithm was trained based on 917 cases from the German Resuscitation Registry, for which ground truth labels were created by a manual annotation of physicians. It uses a kernelized Support Vector Machine classifier based on 14 features, which partially reflect the correlation between accelerometry and electrocardiogram data. Results: On a test data set, the proposed algorithm exhibits an accuracy of 94.4 (93.6, 95.2)%, a sensitivity of 95.0 (93.9, 96.1)%, and a specificity of 93.9 (92.7, 95.1)%. Conclusion and significance: In application, the algorithm may be used to simplify retrospective annotation for quality management and, moreover, to support clinicians to assess circulatory state during cardiac arrest treatment.

Game comonads have brought forth a new approach to studying finite model theory categorically. By representing model comparison games semantically as comonads, they allow important logical and combinatorial properties to be expressed in terms of their Eilenberg-Moore coalgebras. As a result, a number of results from finite model theory, such as preservation theorems and homomorphism counting theorems, have been formalised and parameterised by comonads, giving rise to new results simply by varying the comonad. In this paper we study the limits of the comonadic approach in the combinatorial and homomorphism-counting aspect of the theory, regardless of whether any model comparison games are involved. We show that any standard graph parameter has a corresponding comonad, classifying the same class. This comonad is constructed via a simple Kan extension formula, making it the initial solution to this problem and, furthermore, automatically admitting a homomorphism-counting theorem.

Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching performance. This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments, similar to certain well-known traditional RL algorithms. Here we show that for a specific episodic UDRL algorithm (eUDRL, including GCSL), this is not the case, and give the causes of this limitation. To do so, we first introduce a helpful rewrite of eUDRL as a recursive policy update. This formulation helps to disprove its convergence to the optimal policy for a wide class of stochastic environments. Finally, we provide a concrete example of a very simple environment where eUDRL diverges. Since the primary aim of this paper is to present a negative result, and the best counterexamples are the simplest ones, we restrict all discussions to finite (discrete) environments, ignoring issues of function approximation and limited sample size.

The rapid progress of machine learning interatomic potentials over the past couple of years has produced a number of new architectures. Particularly notable among these are the Atomic Cluster Expansion (ACE), which unified many of the earlier ideas around atom-density-based descriptors, and Neural Equivariant Interatomic Potentials (NequIP), a message passing neural network with equivariant features that showed state-of-the-art accuracy. In this work, we construct a mathematical framework that unifies these models: ACE is generalised so that it can be recast as one layer of a multi-layer architecture. From another point of view, the linearised version of NequIP is understood as a particular sparsification of a much larger polynomial model. Our framework also provides a practical tool for systematically probing different choices in the unified design space. We demonstrate this by an ablation study of NequIP via a set of experiments looking at in- and out-of-domain accuracy and smooth extrapolation very far from the training data, and shed some light on which design choices are critical for achieving high accuracy. Finally, we present BOTNet (Body-Ordered-Tensor-Network), a much-simplified version of NequIP, which has an interpretable architecture and maintains accuracy on benchmark datasets.

Designing robust and accurate prediction models has been a viable research area for a long time. While proponents of a well-functioning market believe that it is difficult to accurately predict market prices, many scholars disagree. Robust and accurate prediction systems are helpful not only to businesses but also to individuals in making their financial investments. This paper presents an LSTM model with two different input approaches for predicting the short-term stock prices of two Indian companies, Reliance Industries and Infosys Ltd. Ten years of historical data (2012-2021) are taken from the Yahoo Finance website to carry out the analysis of the proposed approaches. In the first approach, the closing prices of the two selected companies are applied directly to a univariate LSTM model. In the second approach, technical-indicator values are calculated from the closing prices and then collectively applied to a multivariate LSTM model. Short-term market behaviour for the upcoming days is evaluated. Experimental outcomes reveal that the first approach is useful for determining the future trend, but the multivariate LSTM model with technical indicators is found to predict future price behaviour more accurately.

This paper discusses how to crawl the data of financial forums such as stock bars and conduct sentiment analysis combined with a deep learning model. We use the BERT model to train on the financial corpus and predict the Shenzhen Stock Index. Through a comparative study based on the maximal information coefficient (MIC), we find that the sentiment features obtained by applying the BERT model to the financial corpus are reflected in the fluctuation of the stock market, which effectively improves the prediction accuracy. At the same time, this paper combines deep learning with financial texts to further explore the mechanism by which investor sentiment affects the stock market, which will help national regulatory authorities and policy departments formulate more reasonable policies and guidelines for maintaining stock market stability.

Ultrasound (US) is one of the most common medical imaging modalities since it is radiation-free, low-cost, and real-time. In freehand US examinations, sonographers often navigate a US probe to visualize standard examination planes with rich diagnostic information. However, the reproducibility and stability of the resulting images often suffer from intra- and inter-operator variation. Reinforcement learning (RL), as an interaction-based learning method, has demonstrated its effectiveness in visual navigation tasks; however, RL is limited in terms of generalization. To address this challenge, we propose a simulation-based RL framework for real-world navigation of US probes towards the standard longitudinal views of vessels. A UNet is used to provide binary masks from US images; thereby, the RL agent trained on simulated binary vessel images can be applied in real scenarios without further training. To accurately characterize actual states, a multi-modality state representation structure is introduced to facilitate the understanding of environments. Moreover, considering the characteristics of vessels, a novel standard view recognition approach based on the minimum bounding rectangle is proposed to terminate the searching process. To evaluate the effectiveness of the proposed method, the trained policy is validated virtually on 3D volumes of a volunteer's in-vivo carotid artery, and physically on custom-designed gel phantoms using robotic US. The results demonstrate that the proposed approach can effectively and accurately navigate the probe towards the longitudinal view of vessels.

Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to `multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in modern machine learning applications. In this paper, we study the emergence of heavy tails in decentralized stochastic gradient descent (DE-SGD), and investigate the effect of decentralization on the tail behavior. We first show that, when the loss function at each computational node is twice continuously differentiable and strongly convex outside a compact region, the law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails. To have more explicit control on the tail exponent, we then consider the case where the loss at each node is quadratic, and show that the tail-index can be estimated as a function of the step-size, batch-size, and the topological properties of the network of computational nodes. Then, we provide theoretical and empirical results showing that DE-SGD has heavier tails than centralized SGD. We also compare DE-SGD to disconnected SGD, where nodes distribute the data but do not communicate. Our theory uncovers an interesting interplay between the tails and the network structure: we identify two regimes of parameters (stepsize and network size) where DE-SGD can have lighter or heavier tails than disconnected SGD depending on the regime. Finally, to support our theoretical results, we provide numerical experiments conducted on both synthetic data and neural networks.
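Tail heaviness in such analyses is summarised by a tail index (smaller index means heavier tails). A minimal sketch of the classical Hill estimator, sanity-checked on Pareto samples with a known index, follows; the choice of estimator and of `k` here are ours, not necessarily the paper's.

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimator of the tail index alpha from the k largest order
    statistics: the reciprocal of the mean log-excess over the
    (k+1)-th largest observation."""
    x = np.sort(np.abs(samples))[::-1]
    logs = np.log(x[:k]) - np.log(x[k])
    return 1.0 / np.mean(logs)

# Sanity check on Pareto(alpha = 2) samples via inverse-CDF sampling:
# if U ~ Uniform(0, 1), then (1 - U)**(-1/2) has tail P(X > x) = x**(-2).
rng = np.random.default_rng(0)
pareto = (1.0 - rng.uniform(size=100_000)) ** (-1.0 / 2.0)
alpha_hat = hill_estimator(pareto, k=1000)
```

Applied to the stationary iterates of an SGD run, such an estimator is one standard way to compare empirical tail heaviness across step-sizes, batch-sizes, and network topologies.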

The simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) has recently emerged as a prominent technology that exploits the transmissive property of RIS to mitigate the half-space coverage limitation of conventional RIS operating on millimeter-wave (mmWave) bands. In this paper, we study a downlink STAR-RIS-based multi-user multiple-input single-output (MU-MISO) mmWave hybrid non-orthogonal multiple access (H-NOMA) wireless network, for which we formulate a sum-rate maximization problem. The design of active and passive beamforming vectors, together with time and power allocation for H-NOMA, is a highly coupled non-convex problem. To handle it, we propose an optimization framework based on alternating optimization (AO) that iteratively solves the active and passive beamforming sub-problems. Channel correlation- and channel strength-based techniques are proposed for the specific case of two-user optimal clustering and decoding order assignment, respectively, for which analytical solutions to joint power and time allocation for H-NOMA have also been derived. Simulation results show that: 1) the proposed framework leveraging H-NOMA outperforms conventional OMA and NOMA in maximizing the achievable sum-rate; 2) using the proposed framework, the supported number of clusters for the given design constraints can be increased considerably; 3) through STAR-RIS, the number of elements can be significantly reduced as compared to conventional RIS while ensuring a similar quality-of-service (QoS).
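The AO strategy above, solving one block of variables in closed form while the other is held fixed, can be shown on a toy biconvex surrogate. The sketch below is not the beamforming problem itself (which would require channel models and unit-modulus constraints); it alternates exact sub-problem solutions for the rank-1 approximation problem min over x, y of ||A - x y^T||_F^2, so each sweep provably does not increase the objective, which is the same monotonicity that makes the paper's AO iterations converge.

```python
import numpy as np

def alternating_optimization(A, n_iter=50, seed=0):
    """Generic AO loop on a biconvex problem: min_{x,y} ||A - x y^T||_F^2.
    Each sub-problem is solved in closed form with the other block fixed,
    analogous to alternating the active and passive beamforming designs."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    y = rng.standard_normal(n)
    errors = []
    for _ in range(n_iter):
        x = A @ y / (y @ y)          # optimal x for fixed y
        y = A.T @ x / (x @ x)        # optimal y for fixed x
        errors.append(np.linalg.norm(A - np.outer(x, y)))
    return x, y, errors
```

Because each update is the exact minimizer of its sub-problem, the recorded error sequence is non-increasing; the iterates converge to the dominant singular pair of A.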

Gromov-Wasserstein (GW) distances are generalizations of Gromov-Hausdorff and Wasserstein distances. Due to their invariance under certain distance-preserving transformations, they are well suited for many practical applications. In this paper, we introduce a concept of multi-marginal GW transport as well as its regularized and unbalanced versions. Then we generalize a bi-convex relaxation of the GW transport to our multi-marginal setting which is tight if the cost function is conditionally negative definite in a certain sense. The minimization of this relaxed model can be done by an alternating algorithm, where each step can be performed by a Sinkhorn scheme for a multi-marginal transport problem. We show a relation of our multi-marginal GW problem for a tree-structured cost function to an (unbalanced) GW barycenter problem and present different proof-of-concept numerical results.
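The Sinkhorn scheme for a multi-marginal transport problem mentioned above has a compact form: exponentiate the (negative, scaled) cost tensor and alternately rescale it along each mode until every marginal matches its target. The three-marginal sketch below uses an arbitrary cost tensor and a fixed iteration count for illustration; it implements only the balanced entropic Sinkhorn sub-step, not the full bi-convex GW relaxation.

```python
import numpy as np

def multimarginal_sinkhorn(C, mus, eps=0.1, n_iter=200):
    """Sinkhorn scheme for a 3-marginal entropic transport problem:
    scale K = exp(-C/eps) by vectors u_i until each marginal of the
    plan matches the prescribed measure mu_i."""
    K = np.exp(-C / eps)
    u = [np.ones_like(m) for m in mus]
    for _ in range(n_iter):
        u[0] = mus[0] / np.einsum('ijk,j,k->i', K, u[1], u[2])
        u[1] = mus[1] / np.einsum('ijk,i,k->j', K, u[0], u[2])
        u[2] = mus[2] / np.einsum('ijk,i,j->k', K, u[0], u[1])
    # Transport plan with (approximately) the prescribed marginals.
    return np.einsum('ijk,i,j,k->ijk', K, u[0], u[1], u[2])
```

In the alternating algorithm for the relaxed GW model, a scheme of this kind is called once per outer step, with the cost tensor re-derived from the current iterate.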

Neural video compression has emerged as a novel paradigm combining trainable multilayer neural networks and machine learning, achieving competitive rate-distortion (RD) performance, but it remains impractical due to heavy neural architectures with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce the memory and computation requirements, without harming RD performance. In this paper, we propose a slimmable video codec (SlimVC) by integrating a slimmable temporal entropy model in a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all of which are important requirements for practical video compression.
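The core mechanism of a slimmable codec, one set of weights from which narrower sub-networks are obtained by slicing, can be sketched for a single dense layer. This is a generic illustration of slimmable layers in general, not SlimVC's architecture; the class name, width-ratio interface, and initialization are assumptions.

```python
import numpy as np

class SlimmableDense:
    """A dense layer whose active width can be switched at inference time.
    One full weight matrix is stored; a narrower sub-network simply slices
    its leading rows, trading quality for memory and compute."""
    def __init__(self, in_features, out_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_features, in_features)) * 0.1
        self.b = np.zeros(out_features)

    def __call__(self, x, width_ratio=1.0):
        out = int(self.W.shape[0] * width_ratio)  # active output channels
        return x @ self.W[:out].T + self.b[:out]
```

Stacking such layers (and, in SlimVC's case, a slimmable temporal entropy model) yields a codec where choosing `width_ratio` at inference time jointly controls rate, memory footprint, computational cost, and latency.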

Due to the latest advances in technology, telescopes with significant sky coverage will produce millions of astronomical alerts per night that must be classified both rapidly and automatically. Currently, classification consists of supervised machine learning algorithms whose performance is limited by the number of existing annotations of astronomical objects and their highly imbalanced class distributions. In this work, we propose a data augmentation methodology based on Generative Adversarial Networks (GANs) to generate a variety of synthetic light curves from variable stars. Our novel contributions, consisting of a resampling technique and an evaluation metric, can assess the quality of generative models in unbalanced datasets and identify GAN-overfitting cases that the Fr\'echet Inception Distance does not reveal. We applied our proposed model to two datasets taken from the Catalina and Zwicky Transient Facility surveys. The classification accuracy of variable stars is improved significantly when training with synthetic data and testing with real data with respect to the case of using only real data.
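For context on the Fréchet Inception Distance mentioned above: FID is the Fréchet distance between two Gaussians fitted to real and generated feature sets. The sketch below computes that quantity directly from feature arrays; it does not reproduce the paper's resampling technique or its GAN-overfitting diagnostic, and in a real FID pipeline the features would come from a pretrained network rather than raw data.

```python
import numpy as np

def _sqrtm_psd(S):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two feature sets
    (the quantity behind FID):
    |m1 - m2|^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    m1, m2 = x.mean(axis=0), y.mean(axis=0)
    S1 = np.cov(x, rowvar=False)
    S2 = np.cov(y, rowvar=False)
    s1h = _sqrtm_psd(S1)
    # Tr (S1 S2)^{1/2} equals the trace of this symmetric PSD root.
    cross = _sqrtm_psd(s1h @ S2 @ s1h)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))
```

Because the distance depends only on the first two moments, it is blind to a generator that memorizes training samples, which is precisely the kind of GAN-overfitting case the paper's proposed metric is designed to expose.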

Observational studies are regarded as economic alternatives to randomized trials, often used in their stead to investigate and determine treatment efficacy. Due to limited sample sizes, observational studies commonly combine data from multiple sources or different sites/centers. Despite the benefits of an increased sample size, a naive combination of multicenter data may result in incongruities stemming from center-specific protocols for generating cohorts or reactions towards treatments distinct to a given center, among other things. These issues arise in a variety of other contexts, including capturing a treatment effect related to an individual's unique biological characteristics. Existing methods for estimating heterogeneous treatment effects have not adequately addressed the multicenter context, but rather treat it simply as a means to obtain a sufficient sample size. Additionally, previous approaches to estimating treatment effects do not straightforwardly generalize to the multicenter design, especially when required to provide treatment insights for patients from a new, unobserved center. To address these shortcomings, we propose Multiple Domain Causal Networks (MDCN), an approach that simultaneously strengthens the information sharing between similar centers while addressing the selection bias in treatment assignment through learning of a new feature embedding. In empirical evaluations, MDCN is consistently more accurate when estimating the heterogeneous treatment effect in new centers compared to benchmarks that adjust solely based on treatment imbalance or general center differences. Finally, we justify our approach by providing theoretical analyses demonstrating that MDCN improves on the generalization bound of the new, unobserved target center.