New articles on Computer Science


[1] 2106.09021

On the training of sparse and dense deep neural networks: less parameters, same performance

Deep neural networks can be trained in reciprocal space by acting on the eigenvalues and eigenvectors of suitable transfer operators in direct space. Adjusting the eigenvalues while freezing the eigenvectors yields a substantial compression of the parameter space, which by definition scales with the number of computing neurons. The classification scores, as measured by the reported accuracy, are however inferior to those attained when learning is carried out in direct space for an identical architecture with the full set of trainable parameters (which depends quadratically on the size of neighboring layers). In this Letter, we propose a variant of the spectral learning method introduced in Giambagli et al., Nat. Commun., 2021, which leverages two sets of eigenvalues for each mapping between adjacent layers. The eigenvalues act as veritable knobs that can be freely tuned to (i) enhance, or alternatively silence, the contribution of the input nodes, and (ii) modulate the excitability of the receiving nodes through a mechanism that we interpret as the artificial analogue of homeostatic plasticity. The number of trainable parameters is still a linear function of the network size, but the performance of the trained device gets much closer to that obtained via conventional algorithms, which, however, come at a considerably heavier computational cost. The residual gap between conventional and spectral training can eventually be closed by employing a suitable decomposition for the non-trivial block of the eigenvector matrix. Each spectral parameter reflects back on the whole set of inter-node weights, an attribute that we effectively exploit to yield sparse networks with stunning classification abilities compared to their counterparts trained with conventional means.
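As a rough illustration of the spectral idea, here is a minimal numpy sketch, assuming the effective inter-layer weight factorizes as the difference between an input-node eigenvalue and an output-node eigenvalue, multiplied by a frozen eigenvector entry; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 784, 100

# Frozen eigenvector block and two trainable sets of eigenvalues.
phi = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)   # fixed during training
lam_in = rng.standard_normal(n_in)                          # trainable, one per input node
lam_out = rng.standard_normal(n_out)                        # trainable, one per output node

def spectral_weights(lam_in, lam_out, phi):
    """Effective inter-layer weights: every entry is modulated by the eigenvalue of its
    input node and of its output node (n_in + n_out parameters instead of n_in * n_out)."""
    return (lam_in[None, :] - lam_out[:, None]) * phi

W = spectral_weights(lam_in, lam_out, phi)   # shape (n_out, n_in)
x = rng.standard_normal(n_in)
h = np.tanh(W @ x)                           # forward pass through the spectral layer
```

Tuning a single eigenvalue changes an entire row or column of W at once, which is the sense in which each spectral parameter "reflects back on the whole set of inter-node weights".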


[2] 2106.09022

A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection

Mahalanobis distance (MD) is a simple and popular post-processing method for detecting out-of-distribution (OOD) inputs in neural networks. We analyze its failure modes for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice. On a wide selection of challenging vision, language, and biology OOD benchmarks (CIFAR-100 vs CIFAR-10, CLINC OOD intent detection, Genomics OOD), we show that RMD meaningfully improves upon MD performance (by up to 15% AUROC on genomics OOD).
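A minimal sketch of the idea, assuming (as the abstract suggests) that RMD subtracts the Mahalanobis distance to a single "background" Gaussian fit on all training data from the usual class-conditional Mahalanobis distance; details such as covariance sharing follow the paper and may differ here.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    d = x - mean
    return float(d @ np.linalg.solve(cov, d))

def relative_mahalanobis(x, class_means, shared_cov, bg_mean, bg_cov):
    """RMD score: class-conditional MD (over the closest class) minus the MD
    to a background Gaussian fit on all pooled training features."""
    md_k = min(mahalanobis(x, m, shared_cov) for m in class_means)
    md_0 = mahalanobis(x, bg_mean, bg_cov)
    return md_k - md_0          # larger values look more OOD-like

# Usage sketch: fit per-class means and a shared covariance on in-distribution
# features, plus one Gaussian on all features pooled together, then threshold
# relative_mahalanobis(...) to flag near-OOD inputs.
```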


[3] 2106.09024

Disentangling Online Chats with DAG-Structured LSTMs

Many modern messaging systems allow fast and synchronous textual communication among many users. The resulting sequence of messages hides a more complicated structure in which independent sub-conversations are interwoven with one another. This poses a challenge for any task aiming to understand the content of the chat logs or gather information from them. The ability to disentangle these conversations is therefore crucial to the success of many downstream tasks such as summarization and question answering. Structured information accompanying the text, such as user turns, user mentions, and timestamps, is used as a cue by the participants themselves, who need to follow the conversation, and has been shown to be important for disentanglement. DAG-LSTMs, a generalization of Tree-LSTMs that can handle directed acyclic dependencies, are a natural way to incorporate such information and its non-sequential nature. In this paper, we apply DAG-LSTMs to the conversation disentanglement task. We perform our experiments on the Ubuntu IRC dataset. We show that the novel model we propose achieves state-of-the-art performance on the task of recovering reply-to relations and is competitive on other disentanglement metrics.


[4] 2106.09035

Regularization of Mixture Models for Robust Principal Graph Learning

A regularized version of Mixture Models is proposed to learn a principal graph from a distribution of $D$-dimensional data points. In the particular case of manifold learning for ridge detection, we assume that the underlying manifold can be modeled as a graph structure acting as a topological prior for the Gaussian clusters, turning the problem into a maximum a posteriori estimation. Parameters of the model are iteratively estimated through an Expectation-Maximization procedure, making the learning of the structure computationally efficient with guaranteed convergence for any graph prior in polynomial time. We also embed in the formalism a natural way to make the algorithm robust to outliers in the pattern and to heteroscedasticity of the manifold sampling, coherently with the graph structure. The method uses a graph prior given by the minimum spanning tree, which we extend using random sub-samplings of the dataset to take into account cycles that can be observed in the spatial distribution.


[5] 2106.09051

Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure

Our goal in this work is to generate realistic videos given just one initial frame as input. Existing unsupervised approaches to this task do not consider the fact that a video typically shows a 3D environment, and that this should remain coherent from frame to frame even as the camera and objects move. We address this by developing a model that first estimates the latent 3D structure of the scene, including the segmentation of any moving objects. It then predicts future frames by simulating the object and camera dynamics and rendering the resulting views. Importantly, it is trained end-to-end using only the unsupervised objective of predicting future frames, without any 3D information or segmentation annotations. Experiments on two challenging datasets of natural videos show that our model can estimate 3D structure and motion segmentation from a single frame, and hence generate plausible and varied predictions.


[6] 2106.09060

On the stability of the $L^{2}$ projection and the quasiinterpolant in the space of smooth periodic splines

In this paper we derive stability estimates in $L^{2}$- and $L^{\infty}$-based Sobolev spaces for the $L^{2}$ projection and a family of quasiinterpolants in the space of smooth, 1-periodic, polynomial splines defined on a uniform mesh in $[0,1]$. As a result of the assumed periodicity and the uniform mesh, cyclic matrix techniques and suitable decay estimates of the elements of the inverse of a Gram matrix associated with the standard basis of the space of splines are used to establish the stability results.


[7] 2106.09063

Specializing Multilingual Language Models: An Empirical Study

Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks in many different languages, but the success of this approach is far from universal. For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data, motivating additional model adaptations to achieve reasonably strong performance. In this work, we study the performance, extensibility, and interaction of two such adaptations for this low-resource setting: vocabulary augmentation and script transliteration. Our evaluations on a set of three tasks in nine diverse low-resource languages yield a mixed result, upholding the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.


[8] 2106.09064

Automatic Main Character Recognition for Photographic Studies

Main characters in images are the most important humans that catch the viewer's attention upon first look, and they are emphasized by properties such as size, position, color saturation, and sharpness of focus. Identifying the main character in images plays an important role in traditional photographic studies and media analysis, but the task is performed manually and can be slow and laborious. Furthermore, the selection of main characters can sometimes be subjective. In this paper, we analyze the feasibility of automatically solving the main character recognition needed for photographic studies and propose a method for identifying the main characters. The proposed method uses machine learning based human pose estimation along with traditional computer vision approaches for this task. We approach the task as a binary classification problem where each detected human is classified either as a main character or not. To evaluate both the subjectivity of the task and the performance of our method, we collected a dataset of 300 varying images from multiple sources and asked five people, a photographic researcher and four others, to annotate the main characters. Our analysis showed a relatively high agreement between different annotators. The proposed method achieved a promising F1 score of 0.83 on the full image set and 0.96 on a subset evaluated as the most clear and important cases by the photographic researcher.


[9] 2106.09065

SPeCiaL: Self-Supervised Pretraining for Continual Learning

This paper presents SPeCiaL: a method for unsupervised pretraining of representations tailored for continual learning. Our approach devises a meta-learning objective that differentiates through a sequential learning process. Specifically, we train a linear model over the representations to match different augmented views of the same image together, each view presented sequentially. The linear model is then evaluated on both its ability to classify images it just saw, and also on images from previous iterations. This gives rise to representations that favor quick knowledge retention with minimal forgetting. We evaluate SPeCiaL in the Continual Few-Shot Learning setting, and show that it can match or outperform other supervised pretraining approaches.


[10] 2106.09069

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables, and shed light on the limits of current generation models.


[11] 2106.09070

Identifiability-Guaranteed Simplex-Structured Post-Nonlinear Mixture Learning via Autoencoder

This work focuses on the problem of unraveling nonlinearly mixed latent components in an unsupervised manner. The latent components are assumed to reside in the probability simplex, and are transformed by an unknown post-nonlinear mixing system. This problem finds various applications in signal and data analytics, e.g., nonlinear hyperspectral unmixing, image embedding, and nonlinear clustering. Linear mixture learning problems are already ill-posed, as identifiability of the target latent components is hard to establish in general. With unknown nonlinearity involved, the problem is even more challenging. Prior work offered a function equation-based formulation for provable latent component identification. However, the identifiability conditions are somewhat stringent and unrealistic. In addition, the identifiability analysis is based on the infinite sample (i.e., population) case, while the understanding of practical finite sample cases has been elusive. Moreover, the algorithm in the prior work trades model expressiveness for computational convenience, which often hinders the learning performance. Our contribution is threefold. First, new identifiability conditions are derived under largely relaxed assumptions. Second, comprehensive sample complexity results are presented -- which are the first of their kind. Third, a constrained autoencoder-based algorithmic framework is proposed for implementation, which effectively circumvents the challenges in the existing algorithm. Synthetic and real experiments corroborate our theoretical analyses.


[12] 2106.09074

Robust a posteriori error analysis for rotation-based formulations of the elasticity/poroelasticity coupling

We develop the \textit{a posteriori} error analysis of three mixed finite element formulations for rotation-based equations in elasticity, poroelasticity, and interfacial elasticity-poroelasticity. The discretisations use $H^1$-conforming finite elements of degree $k+1$ for displacement and fluid pressure, and discontinuous piecewise polynomials of degree $k$ for the rotation vector, total pressure, and elastic pressure. Residual-based estimators are constructed, and upper and lower bounds (up to data oscillations) for all global estimators are rigorously derived. The methods are all robust with respect to the model parameters (in particular, the Lam\'e constants), are valid in 2D and 3D, and hold for arbitrary polynomial degree $k\geq 0$. The error behaviour predicted by the theoretical analysis is then demonstrated numerically on a set of computational examples including different geometries, on which we perform adaptive mesh refinement guided by the \textit{a posteriori} error estimators.


[13] 2106.09076

Deformation Driven Seq2Seq Longitudinal Tumor and Organs-at-Risk Prediction for Radiotherapy

Purpose: Radiotherapy presents unique challenges and clinical requirements for longitudinal tumor and organ-at-risk (OAR) prediction during treatment. The challenges include tumor inflammation/edema and radiation-induced changes in organ geometry, whereas the clinical requirements demand flexibility in input/output sequence timepoints to update the predictions on a rolling basis and the grounding of all predictions in relationship to the pre-treatment imaging information for response and toxicity assessment in adaptive radiotherapy. Methods: To deal with the aforementioned challenges and to comply with the clinical requirements, we present a novel 3D sequence-to-sequence model based on Convolution Long Short Term Memory (ConvLSTM) that makes use of a series of deformation vector fields (DVFs) between individual timepoints and reference pre-treatment/planning CTs to predict future anatomical deformations and changes in gross tumor volume as well as critical OARs. High-quality DVF training data are created by employing hyper-parameter optimization on a subset of the training data, using the DICE coefficient and mutual information as metrics. We validated our model on two radiotherapy datasets: a publicly available head-and-neck dataset (28 patients with manually contoured pre-, mid-, and post-treatment CTs), and an internal non-small cell lung cancer dataset (63 patients with manually contoured planning CT and 6 weekly CBCTs). Results: The use of the DVF representation and skip connections overcomes the blurring issue of ConvLSTM prediction with the traditional image representation. The mean and standard deviation of DICE for predictions of lung GTV at weeks 4, 5, and 6 were 0.83$\pm$0.09, 0.82$\pm$0.08, and 0.81$\pm$0.10, respectively, and for post-treatment ipsilateral and contralateral parotids, were 0.81$\pm$0.06 and 0.85$\pm$0.02.


[14] 2106.09078

Towards a Rigorous Theoretical Analysis and Evaluation of GNN Explanations

As Graph Neural Networks (GNNs) are increasingly employed in real-world applications, it becomes critical to ensure that the stakeholders understand the rationale behind their predictions. While several GNN explanation methods have been proposed recently, there has been little to no work on theoretically analyzing the behavior of these methods or systematically evaluating their effectiveness. Here, we introduce the first axiomatic framework for theoretically analyzing, evaluating, and comparing state-of-the-art GNN explanation methods. We outline and formalize the key desirable properties that all GNN explanation methods should satisfy in order to generate reliable explanations, namely, faithfulness, stability, and fairness. We leverage these properties to present the first ever theoretical analysis of the effectiveness of state-of-the-art GNN explanation methods. Our analysis establishes upper bounds on all the aforementioned properties for popular GNN explanation methods. We also leverage our framework to empirically evaluate these methods on multiple real-world datasets from diverse domains. Our empirical results demonstrate that some popular GNN explanation methods (e.g., gradient-based methods) perform no better than a random baseline and that methods which leverage the graph structure are more effective than those that solely rely on the node features.


[15] 2106.09086

Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games. However, one major limitation of prior search approaches for partially observable environments is that the computational cost scales poorly with the amount of hidden information. In this paper we present \emph{Learned Belief Search} (LBS), a computationally efficient search procedure for partially observable environments. Rather than maintaining an exact belief distribution, LBS uses an approximate auto-regressive counterfactual belief that is learned as a supervised task. In multi-agent settings, LBS uses a novel public-private model architecture for underlying policies in order to efficiently evaluate these policies during rollouts. In the benchmark domain of Hanabi, LBS can obtain 55% ~ 91% of the benefit of exact search while reducing compute requirements by $35.8 \times$ ~ $4.6 \times$, allowing it to scale to larger settings that were inaccessible to previous search methods.


[16] 2106.09094

Impact of Communication Loss on MPC based Cooperative Adaptive Cruise Control and Platooning

Cooperative driving, enabled by communication between automated vehicle systems, is expected to significantly contribute to transportation safety and efficiency. Cooperative Adaptive Cruise Control (CACC) and platooning are two of the main cooperative driving applications that are currently under study. These applications offer significant improvements over current advanced driver assistance systems such as adaptive cruise control (ACC). The primary motivation of CACC and platooning is to reduce traffic congestion and improve traffic flow, traffic throughput, and highway capacity. These applications need an efficient controller to account for the computational cost and to ensure driving comfort and high responsiveness. The advantage of Model Predictive Control (MPC) is that it can realize high control performance, since all constraints for these applications can be dealt with explicitly by solving an optimization problem. These applications highly depend on information updates and communication reliability for their safety and stability. In this paper, we propose an MPC-based approach for CACC and platooning, and examine the impact of communication loss on the performance and robustness of the control scheme. The results show an improvement in response time and string stability, demonstrating the potential of cooperation to attenuate disturbances and improve traffic flow.


[17] 2106.09104

Improving Inference Lifetime of Neuromorphic Systems via Intelligent Synapse Mapping

Non-Volatile Memories (NVMs) such as Resistive RAM (RRAM) are used in neuromorphic systems to implement high-density and low-power analog synaptic weights. Unfortunately, an RRAM cell can switch its state after reading its content a certain number of times. Such behavior challenges the integrity and program-once-read-many-times philosophy of implementing machine learning inference on neuromorphic systems, impacting the Quality-of-Service (QoS). Elevated temperatures and frequent usage can significantly shorten the number of times an RRAM cell can be reliably read before it becomes absolutely necessary to reprogram. We propose an architectural solution to extend the read endurance of RRAM-based neuromorphic systems. We make two key contributions. First, we formulate the read endurance of an RRAM cell as a function of the programmed synaptic weight and its activation within a machine learning workload. Second, we propose an intelligent workload mapping strategy incorporating the endurance formulation to place the synapses of a machine learning model onto the RRAM cells of the hardware. The objective is to extend the inference lifetime, defined as the number of times the model can be used to generate output (inference) before the trained weights need to be reprogrammed on the RRAM cells of the system. We evaluate our architectural solution with machine learning workloads on a cycle-accurate simulator of an RRAM-based neuromorphic system. Our results demonstrate a significant increase in inference lifetime with only a minimal performance impact.


[18] 2106.09106

Explainable AI for Natural Adversarial Images

Adversarial images highlight how vulnerable modern image classifiers are to perturbations outside of their training set. Human oversight might mitigate this weakness, but depends on humans understanding the AI well enough to predict when it is likely to make a mistake. In previous work we have found that humans tend to assume that the AI's decision process mirrors their own. Here we evaluate if methods from explainable AI can disrupt this assumption to help participants predict AI classifications for adversarial and standard images. We find that both saliency maps and examples facilitate catching AI errors, but their effects are not additive, and saliency maps are more effective than examples.


[19] 2106.09109

QuantumFed: A Federated Learning Framework for Collaborative Quantum Training

With the fast development of quantum computing and deep learning, quantum neural networks have attracted great attention recently. By leveraging the power of quantum computing, deep neural networks can potentially overcome computational power limitations in classic machine learning. However, when multiple quantum machines wish to train a global model using the local data on each machine, it may be very difficult to copy the data onto one machine and train the model there. Therefore, a collaborative quantum neural network framework is necessary. In this article, we borrow the core idea of federated learning to propose QuantumFed, a quantum federated learning framework in which multiple quantum nodes with local quantum data train a model together. Our experiments show the feasibility and robustness of our framework.


[20] 2106.09110

Safe Reinforcement Learning Using Advantage-Based Intervention

Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.
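A rough sketch of the advantage-based intervention idea, using hypothetical helper names; this is an illustrative reading of the abstract, not the authors' implementation. The agent's proposed action is executed only if a safety critic's advantage estimate stays below a threshold; otherwise a known-safe backup action is taken.

```python
# Illustrative sketch of an advantage-based safety intervention during a rollout.
# `safety_advantage`, `backup_policy`, and `threshold` are hypothetical stand-ins.

def intervened_step(env, state, policy, safety_advantage, backup_policy, threshold):
    action = policy(state)
    # If the safety critic says this action raises the violation risk too much,
    # override it with the backup (known-safe) policy's action.
    if safety_advantage(state, action) > threshold:
        action = backup_policy(state)
        intervened = True
    else:
        intervened = False
    next_state, reward, done, info = env.step(action)
    return next_state, reward, done, intervened
```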


[21] 2106.09111

An Imprecise SHAP as a Tool for Explaining the Class Probability Distributions under Limited Training Data

One of the most popular methods for explaining machine learning predictions is the SHapley Additive exPlanations method (SHAP). An imprecise SHAP, a modification of the original SHAP, is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions. The first idea behind the imprecise SHAP is a new approach for computing the marginal contribution of a feature which fulfils the important efficiency property of Shapley values. The second idea is an attempt to consider a general approach to calculating and reducing interval-valued Shapley values, which is similar to the idea of reachable probability intervals in imprecise probability theory. A simple special implementation of the general approach in the form of linear optimization problems is proposed, which is based on using the Kolmogorov-Smirnov distance and imprecise contamination models. Numerical examples with synthetic and real data illustrate the imprecise SHAP.
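For orientation, a small sketch of exact Shapley values together with a crude interval version obtained by taking element-wise bounds over a finite set of admissible value functions. The paper instead reduces the problem to linear optimization using the Kolmogorov-Smirnov distance and contamination models; the enumeration below is only an assumption for illustration.

```python
import itertools
import math

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function (exponential in n_features)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for s in itertools.combinations(others, r):
                w = (math.factorial(len(s)) * math.factorial(n_features - len(s) - 1)
                     / math.factorial(n_features))
                phi[i] += w * (value_fn(frozenset(s) | {i}) - value_fn(frozenset(s)))
    return phi

def interval_shapley(value_fns, n_features):
    """Interval-valued Shapley values: element-wise min/max over a set of value functions."""
    all_phi = [shapley_values(v, n_features) for v in value_fns]
    lower = [min(p[i] for p in all_phi) for i in range(n_features)]
    upper = [max(p[i] for p in all_phi) for i in range(n_features)]
    return lower, upper
```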


[22] 2106.09117

DeepSplit: Scalable Verification of Deep Neural Networks via Operator Splitting

Analyzing the worst-case performance of deep neural networks against input perturbations amounts to solving a large-scale non-convex optimization problem, for which several past works have proposed convex relaxations as a promising alternative. However, even for reasonably-sized neural networks, these relaxations are not tractable, and so must be replaced by even weaker relaxations in practice. In this work, we propose a novel operator splitting method that can directly solve a convex relaxation of the problem to high accuracy, by splitting it into smaller sub-problems that often have analytical solutions. The method is modular and scales to problem instances that were previously impossible to solve exactly due to their size. Furthermore, the solver operations are amenable to fast parallelization with GPU acceleration. We demonstrate our method in obtaining tighter bounds on the worst-case performance of large convolutional networks in image classification and reinforcement learning settings.


[23] 2106.09119

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL

Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions. Extracting policies from diverse offline datasets has the potential to expand the range of applicability of RL by making the training process safer, faster, and more streamlined. We investigate how to improve the performance of offline RL algorithms, their robustness to the quality of offline data, and their generalization capabilities. To this end, we introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE). Our algorithm is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary. When combined together, they substantially improve the performance and generalization of offline RL policies. In the widely studied D4RL offline RL benchmark, we find that MABE achieves higher average performance compared to prior model-free and model-based algorithms. In experiments that require cross-domain generalization, we find that MABE outperforms prior methods. Our website is available at https://sites.google.com/berkeley.edu/mabe .


[24] 2106.09121

Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary Framework

Enforcing orthogonality in neural networks is an antidote to vanishing/exploding gradients and to sensitivity to adversarial perturbations, and it helps bound generalization errors. However, many previous approaches are heuristic, and the orthogonality of convolutional layers has not been systematically studied: some of these designs are not exactly orthogonal, while others only consider standard convolutional layers and propose specific classes of their realizations. To address this problem, we propose a theoretical framework for orthogonal convolutional layers, which establishes the equivalence between various orthogonal convolutional layers in the spatial domain and paraunitary systems in the spectral domain. Since there exists a complete spectral factorization of paraunitary systems, any orthogonal convolution layer can be parameterized as convolutions of spatial filters. Our framework endows high expressive power to various convolutional layers while maintaining their exact orthogonality. Furthermore, our layers are memory and computationally efficient for deep networks compared to previous designs. Our versatile framework, for the first time, enables the study of architecture designs for deep orthogonal networks, such as choices of skip connection, initialization, stride, and dilation. Consequently, we scale up orthogonal networks to deep architectures, including ResNet, WideResNet, and ShuffleNet, substantially increasing the performance over the traditional shallow orthogonal networks.


[25] 2106.09127

Planning on a (Risk) Budget: Safe Non-Conservative Planning in Probabilistic Dynamic Environments

Planning in environments with other agents whose future actions are uncertain often requires compromise between safety and performance. Here our goal is to design efficient planning algorithms with guaranteed bounds on the probability of safety violation, which nonetheless achieve non-conservative performance. To quantify a system's risk, we define a natural criterion called interval risk bounds (IRBs), which provide a parametric upper bound on the probability of safety violation over a given time interval or task. We present a novel receding horizon algorithm, and prove that it can satisfy a desired IRB. Our algorithm maintains a dynamic risk budget which constrains the allowable risk at each iteration, and guarantees recursive feasibility by requiring a safe set to be reachable by a contingency plan within the budget. We empirically demonstrate that our algorithm is both safer and less conservative than strong baselines in two simulated autonomous driving experiments in scenarios involving collision avoidance with other vehicles, and additionally demonstrate our algorithm running on an autonomous class 8 truck.
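A schematic sketch of the risk-budget bookkeeping described above, with hypothetical helper names and a simplified environment interface; the actual algorithm additionally requires that a contingency plan reach a safe set within the remaining budget.

```python
# Schematic receding-horizon loop with a dynamic risk budget.
# `plan_with_risk_cap`, `estimate_plan_risk`, and `contingency_plan` are hypothetical.

def receding_horizon_with_budget(env, state, horizon_steps, total_risk_budget):
    remaining_budget = total_risk_budget
    for t in range(horizon_steps):
        # Plan a trajectory whose estimated violation probability stays within budget.
        plan = plan_with_risk_cap(state, risk_cap=remaining_budget)
        if plan is None:
            # Fall back to a contingency plan that reaches a safe set.
            plan = contingency_plan(state)
        # Spend the risk incurred by the executed step, never letting the budget go negative.
        remaining_budget = max(0.0, remaining_budget - estimate_plan_risk(plan, steps=1))
        state = env.step(plan.first_action())
    return state
```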


[26] 2106.09129

A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness

Two crucial requirements for a successful adoption of deep learning (DL) in the wild are: (1) robustness to distributional shifts, and (2) model compactness for achieving efficiency. Unfortunately, efforts towards simultaneously achieving Out-of-Distribution (OOD) robustness and extreme model compactness without sacrificing accuracy have mostly been unsuccessful. This raises an important question: "Is the inability to create compact, accurate, and robust deep neural networks (CARDs) fundamental?" To answer this question, we perform a large-scale analysis for a range of popular model compression techniques which uncovers several intriguing patterns. Notably, in contrast to traditional pruning approaches (e.g., fine tuning and gradual magnitude pruning), we find that "lottery ticket-style" pruning approaches can surprisingly be used to create high performing CARDs. Specifically, we are able to create extremely compact CARDs that are dramatically more robust than their significantly larger and full-precision counterparts while matching (or beating) their test accuracy, simply by pruning and/or quantizing. To better understand these differences, we perform sensitivity analysis in the Fourier domain for CARDs trained using different data augmentation methods. Motivated by our analysis, we develop a simple domain-adaptive test-time ensembling approach (CARD-Deck) that uses a gating module to dynamically select an appropriate CARD from the CARD-Deck based on their spectral similarity with test samples. By leveraging complementary frequency biases of different compressed models, the proposed approach builds a "winning hand" of CARDs that establishes a new state-of-the-art on CIFAR-10-C accuracies (i.e., 96.8% clean and 92.75% robust) with dramatically better memory usage than their non-compressed counterparts. We also present theoretical evidence supporting our empirical findings.


[27] 2106.09135

EEG-GNN: Graph Neural Networks for Classification of Electroencephalogram (EEG) Signals

Convolutional neural networks (CNNs) have been frequently used to extract subject-invariant features from electroencephalogram (EEG) signals for classification tasks. This approach holds the underlying assumption that electrodes are equidistant, analogous to pixels of an image, and hence fails to explore/exploit the complex functional neural connectivity between different electrode sites. We overcome this limitation by tailoring the concepts of convolution and pooling applied to 2D grid-like inputs to the functional network of electrode sites. Furthermore, we develop various graph neural network (GNN) models that project electrodes onto the nodes of a graph, where the node features are represented as EEG channel samples collected over a trial, and nodes can be connected by weighted/unweighted edges according to a flexible policy formulated by a neuroscientist. The empirical evaluations show that our proposed GNN-based framework outperforms standard CNN classifiers across the ErrP and RSVP datasets, while bringing neuroscientific interpretability and explainability to deep learning methods tailored to EEG-related classification problems. Another practical advantage of our GNN-based framework is that it can be used for EEG channel selection, which is critical for reducing computational cost and designing portable EEG headsets.


[28] 2106.09140

Human-AI Interactions Through A Gricean Lens

Grice's Cooperative Principle (1975) describes the implicit maxims that guide conversation between humans. As humans begin to interact with non-human dialogue systems more frequently and in a broader scope, an important question emerges: what principles govern those interactions? The present study addresses this question by evaluating human-AI interactions using Grice's four maxims; we demonstrate that humans do, indeed, apply these maxims to interactions with AI, even making explicit references to the AI's performance through a Gricean lens. Twenty-three participants interacted with an American English-speaking Alexa and rated and discussed their experience with an in-lab researcher. Researchers then reviewed each exchange, identifying those that might relate to Grice's maxims: Quantity, Quality, Manner, and Relevance. Many instances of explicit user frustration stemmed from violations of Grice's maxims. Quantity violations were noted for too little but not too much information, while Quality violations were rare, indicating trust in Alexa's responses. Manner violations focused on speed and humanness. Relevance violations were the most frequent, and they appear to be the most frustrating. While the maxims help describe many of the issues participants encountered, other issues do not fit neatly into Grice's framework. Participants were particularly averse to Alexa initiating exchanges or making unsolicited suggestions. To address this gap, we propose the addition of human Priority to describe human-AI interaction. Humans and AIs are not conversational equals, and human initiative takes priority. We suggest that the application of Grice's Cooperative Principles to human-AI interactions is beneficial both from an AI development perspective and as a tool for describing an emerging form of interaction.


[29] 2106.09141

Probing Image-Language Transformers for Verb Understanding

Multimodal image-language transformers have achieved impressive results on a variety of tasks that rely on fine-tuning (e.g., visual question answering and image retrieval). We are interested in shedding light on the quality of their pretrained representations -- in particular, whether these models can distinguish different types of verbs or whether they rely solely on nouns in a given sentence. To do so, we collect a dataset of image-sentence pairs (in English) consisting of 421 verbs that are either visual or commonly found in the pretraining data (i.e., the Conceptual Captions dataset). We use this dataset to evaluate pretrained image-language transformers and find that they fail more in situations that require verb understanding compared to other parts of speech. We also investigate which categories of verbs are particularly challenging.


[30] 2106.09144

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

Recent works demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog-domain in-situ matrix-vector multiplication -- the intensive and key computation in DNNs. With weights stored in the ReRAM crossbar cells as conductance, when the input vector is applied to the word lines, the matrix-vector multiplication results can be generated as the current in the bit lines. A key problem is that a weight can be either positive or negative, but the in-situ computation assumes all cells on each crossbar column have the same sign. Current architectures either use two ReRAM crossbars for positive and negative weights, or add an offset to the weights so that all values become positive. Neither solution is ideal: they either double the cost of crossbars or incur extra offset circuitry. To better solve this problem, this paper proposes FORMS, a fine-grained ReRAM-based DNN accelerator with polarized weights. Instead of trying to represent the positive/negative weights, our key design principle is to enforce exactly what is assumed in the in-situ computation -- ensuring that all weights in the same column of a crossbar have the same sign. This naturally avoids the cost of an additional crossbar. Such weights can be nicely generated using alternating direction method of multipliers (ADMM) regularized optimization, which can exactly enforce certain patterns in DNN weights. To achieve high accuracy, we propose to use fine-grained sub-array columns, which provide a unique opportunity for input zero-skipping, significantly avoiding unnecessary computations. It also makes the hardware much easier to implement. Putting it all together, with the same optimized models, FORMS achieves significant throughput improvement and speed-up in frames per second over ISAAC with a similar area cost.
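To make the "same sign per crossbar column" constraint concrete, here is a crude projection step in numpy. It is a hedged stand-in for the ADMM-regularized optimization mentioned above, which enforces the pattern during training rather than by post-hoc clipping.

```python
import numpy as np

def polarize_columns(W):
    """Project a weight matrix so every column has a uniform sign:
    keep the dominant-sign entries of each column and zero out the rest."""
    W = W.copy()
    pos_mass = np.clip(W, 0, None).sum(axis=0)
    neg_mass = -np.clip(W, None, 0).sum(axis=0)
    keep_positive = pos_mass >= neg_mass                 # dominant sign per column
    W[:, keep_positive] = np.clip(W[:, keep_positive], 0, None)
    W[:, ~keep_positive] = np.clip(W[:, ~keep_positive], None, 0)
    return W

W = np.random.randn(8, 4)
W_polarized = polarize_columns(W)
assert all((W_polarized[:, j] >= 0).all() or (W_polarized[:, j] <= 0).all()
           for j in range(W.shape[1]))
```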


[31] 2106.09146

Contrastive Reinforcement Learning of Symbolic Reasoning Domains

Abstract symbolic reasoning, as required in domains such as mathematics and logic, is a key component of human intelligence. Solvers for these domains have important applications, especially to computer-assisted education. But learning to solve symbolic problems is challenging for machine learning algorithms. Existing models either learn from human solutions or use hand-engineered features, making them expensive to apply in new domains. In this paper, we instead consider symbolic domains as simple environments where states and actions are given as unstructured text, and binary rewards indicate whether a problem is solved. This flexible setup makes it easy to specify new domains, but search and planning become challenging. We introduce four environments inspired by the Mathematics Common Core Curriculum, and observe that existing Reinforcement Learning baselines perform poorly. We then present a novel learning algorithm, Contrastive Policy Learning (ConPoLe) that explicitly optimizes the InfoNCE loss, which lower bounds the mutual information between the current state and next states that continue on a path to the solution. ConPoLe successfully solves all four domains. Moreover, problem representations learned by ConPoLe enable accurate prediction of the categories of problems in a real mathematics curriculum. Our results suggest new directions for reinforcement learning in symbolic domains, as well as applications to mathematics education.


[32] 2106.09151

Recovery Guarantees for Time-varying Pairwise Comparison Matrices with Non-transitivity

Pairwise comparison matrices have received substantial attention in a variety of applications, especially in rank aggregation, the task of flattening items into a one-dimensional (and thus transitive) ranking. However, non-transitive preference cycles can arise in practice due to the fact that making a decision often requires a complex evaluation of multiple factors. In some applications, it may be important to identify and preserve information about the inherent non-transitivity, either in the pairwise comparison data itself or in the latent feature space. In this work, we develop structured models for non-transitive pairwise comparison matrices that can be exploited to recover such matrices from incomplete noisy data and thus allow the detection of non-transitivity. Considering that individuals' tastes and items' latent features may change over time, we formulate time-varying pairwise comparison matrix recovery as a dynamic skew-symmetric matrix recovery problem by modeling changes in the low-rank factors of the pairwise comparison matrix. We provide theoretical guarantees for the recovery and numerically test the proposed theory with both synthetic and real-world data.
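One standard way to parameterize the low-rank skew-symmetric structure referenced above is sketched below, under the assumption that the pairwise comparison matrix is modeled as $C = UV^\top - VU^\top$; the paper's time-varying model and recovery algorithm are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, rank = 50, 3

# Low-rank factors; C[i, j] encodes how strongly item i beats item j.
U = rng.standard_normal((n_items, rank))
V = rng.standard_normal((n_items, rank))
C = U @ V.T - V @ U.T                      # skew-symmetric by construction: C.T == -C

assert np.allclose(C, -C.T)

# Non-transitivity check on a triple (i, j, k): i beats j, j beats k, yet k beats i.
i, j, k = 0, 1, 2
cyclic = (C[i, j] > 0) and (C[j, k] > 0) and (C[k, i] > 0)
print("preference cycle on (0, 1, 2):", cyclic)
```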


[33] 2106.09153

Selecting for Selection: Learning To Balance Adaptive and Diversifying Pressures in Evolutionary Search

Inspired by natural evolution, evolutionary search algorithms have proven remarkably capable due to their dual abilities to radiantly explore through diverse populations and to converge to adaptive pressures. A large part of this behavior comes from the selection function of an evolutionary algorithm, which is a metric for deciding which individuals survive to the next generation. In deceptive or hard-to-search fitness landscapes, greedy selection often fails, thus it is critical that selection functions strike the correct balance between gradient-exploiting adaptation and exploratory diversification. This paper introduces Sel4Sel, or Selecting for Selection, an algorithm that searches for high-performing neural-network-based selection functions through a meta-evolutionary loop. Results on three distinct bitstring domains indicate that Sel4Sel networks consistently match or exceed the performance of both fitness-based selection and benchmarks explicitly designed to encourage diversity. Analysis of the strongest Sel4Sel networks reveals a general tendency to favor highly novel individuals early on, with a gradual shift towards fitness-based selection as deceptive local optima are bypassed.


[34] 2106.09157

Positional Contrastive Learning for Volumetric Medical Image Segmentation

The success of deep learning heavily depends on the availability of large labeled training sets. However, it is hard to obtain large labeled datasets in the medical imaging domain because of strict privacy concerns and costly labeling efforts. Contrastive learning, an unsupervised learning technique, has proven powerful in learning image-level representations from unlabeled data. The learned encoder can then be transferred or fine-tuned to improve the performance of downstream tasks with limited labels. A critical step in contrastive learning is the generation of contrastive data pairs, which is relatively simple for natural image classification but quite challenging for medical image segmentation due to the existence of the same tissue or organ across the dataset. As a result, when applied to medical image segmentation, most state-of-the-art contrastive learning frameworks inevitably introduce a lot of false-negative pairs and result in degraded segmentation quality. To address this issue, we propose a novel positional contrastive learning (PCL) framework to generate contrastive data pairs by leveraging the position information in volumetric medical images. Experimental results on CT and MRI datasets demonstrate that the proposed PCL method can substantially improve the segmentation performance compared to existing methods in both the semi-supervised and transfer learning settings.
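A minimal numpy sketch of position-based pair construction for a contrastive loss, assuming slices whose normalized positions in their volumes are close are treated as positives; the exact pairing rule and loss in the paper may differ.

```python
import numpy as np

def positional_contrastive_loss(embeddings, positions, tau=0.1, pos_thresh=0.05):
    """InfoNCE-style loss where positives are slices with nearby normalized positions."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau                         # scaled cosine similarities
    n = len(positions)
    total, counted = 0.0, 0
    for i in range(n):
        positive = np.abs(positions - positions[i]) < pos_thresh
        positive[i] = False                     # exclude self
        if not positive.any():
            continue
        logits = np.delete(sim[i], i)
        labels = np.delete(positive, i)
        log_prob = logits - np.log(np.exp(logits).sum())
        total += -log_prob[labels].mean()
        counted += 1
    return total / max(counted, 1)

# positions: slice index divided by number of slices for each embedded 2D slice.
```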


[35] 2106.09159

Automatic Curricula via Expert Demonstrations

We propose Automatic Curricula via Expert Demonstrations (ACED), a reinforcement learning (RL) approach that combines the ideas of imitation learning and curriculum learning in order to solve challenging robotic manipulation tasks with sparse reward functions. Curriculum learning solves complicated RL tasks by introducing a sequence of auxiliary tasks with increasing difficulty, yet how to automatically design effective and generalizable curricula remains a challenging research problem. ACED extracts curricula from a small amount of expert demonstration trajectories by dividing demonstrations into sections and initializing training episodes to states sampled from different sections of demonstrations. Through moving the reset states from the end to the beginning of demonstrations as the learning agent improves its performance, ACED not only learns challenging manipulation tasks with unseen initializations and goals, but also discovers novel solutions that are distinct from the demonstrations. In addition, ACED can be naturally combined with other imitation learning methods to utilize expert demonstrations in a more efficient manner, and we show that a combination of ACED with behavior cloning allows pick-and-place tasks to be learned with as few as 1 demonstration and block stacking tasks to be learned with 20 demonstrations.
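The reverse-curriculum reset scheme can be illustrated with a short sketch using hypothetical helpers: the curriculum advances the reset section from the end of the demonstrations toward the beginning as the agent's success rate improves.

```python
import random

def sample_reset_state(demonstrations, curriculum_stage, n_sections):
    """Pick a reset state from the demo section selected by the current curriculum stage.
    Stage 0 resets near the end of the demos (easy); the last stage resets near the start."""
    demo = random.choice(demonstrations)              # each demo is a list of states
    section_len = max(1, len(demo) // n_sections)
    # Section index counted from the end of the demonstration.
    start = max(0, len(demo) - (curriculum_stage + 1) * section_len)
    end = len(demo) - curriculum_stage * section_len
    return random.choice(demo[start:end])

def maybe_advance(curriculum_stage, recent_success_rate, n_sections, threshold=0.8):
    """Move the reset window one section earlier once the agent is reliably succeeding."""
    if recent_success_rate >= threshold and curriculum_stage < n_sections - 1:
        return curriculum_stage + 1
    return curriculum_stage
```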


[36] 2106.09161

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie ($\href{https://plv.colorado.edu/mungojerrie/}{plv.colorado.edu/mungojerrie}$) is a tool for testing reward schemes for $\omega$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $\omega$-automata specified in HOA.


[37] 2106.09164

mPyPl: Python Monadic Pipeline Library for Complex Functional Data Processing

In this paper, we present a new Python library called mPyPl, which is intended to simplify complex data processing tasks using a functional approach. This library defines operations on lazy data streams of named dictionaries represented as generators (so-called multi-field datastreams), and allows enriching those data streams with more 'fields' in the process of data preparation and feature extraction. Thus, most data preparation tasks can be expressed in the form of a neat linear 'pipeline', similar in syntax to UNIX pipes or the |> functional composition operator in F#. We define basic operations on multi-field data streams, which resemble classical monadic operations, and show the similarity of the proposed approach to monads in functional programming. We also show how the library was used in complex deep learning tasks of event detection in video, and discuss different evaluation strategies that allow for different compromises in terms of memory and performance.
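To convey the flavor of multi-field datastreams without relying on mPyPl's actual API (which is not reproduced here), a plain-Python sketch of lazy streams of dictionaries enriched field by field:

```python
# Plain-Python illustration of lazy 'multi-field' streams of dictionaries.
# These helpers are generic stand-ins, not mPyPl's real functions.

def source(paths):
    for p in paths:
        yield {"filename": p}                       # each item is a named dictionary

def apply_field(stream, src_field, dst_field, fn):
    """Lazily enrich every item with a new field computed from an existing one."""
    for item in stream:
        yield dict(item, **{dst_field: fn(item[src_field])})

def pipeline(stream, *stages):
    """Compose stages left to right, UNIX-pipe style."""
    for stage in stages:
        stream = stage(stream)
    return stream

items = pipeline(
    source(["a.mp4", "b.mp4"]),
    lambda s: apply_field(s, "filename", "size_hint", len),
    lambda s: apply_field(s, "size_hint", "label", lambda n: "long" if n > 5 else "short"),
)
print(list(items))   # nothing is computed until the stream is consumed here
```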


[38] 2106.09166

Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially given the recent increase in DNN model size and complexity. Although various optimization approaches have been proven effective for many DNNs on edge devices, most state-of-the-art work focuses on ad-hoc optimizations, and there is no thorough study that comprehensively reveals the potentials and constraints of different edge devices when considering different optimizations. In this paper, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-based DNN executions, and provide detailed analysis.


[39] 2106.09170

A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning); or assume some labels will be available on request (active learning). The first approach is the simplest, yet the amount of labelled data available will limit the predictive performance. The second relies on finding and exploiting the underlying characteristics of the data distribution. The third depends on an external agent to provide the required labels in a timely fashion. This survey pays special attention to methods that leverage unlabelled data in a semi-supervised setting. We also discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss the learning guarantees and existing methods, and explain the differences between related problem settings. Finally, we review the current benchmarking practices and propose adaptations to enhance them.


[40] 2106.09171

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning. Recent works have focused on each of these modalities separately, while others have attempted to model both simultaneously in a cross-modal fashion. However, comparatively little attention has been given to leveraging one modality as a training objective to learn from the other. In this work, we propose Learning visual speech Representations from Audio via self-supervision (LiRA). Specifically, we train a ResNet+Conformer model to predict acoustic features from unlabelled visual speech. We find that this pre-trained model can be leveraged towards word-level and sentence-level lip-reading through feature extraction and fine-tuning experiments. We show that our approach significantly outperforms other self-supervised methods on the Lip Reading in the Wild (LRW) dataset and achieves state-of-the-art performance on Lip Reading Sentences 2 (LRS2) using only a fraction of the total labelled data.


[41] 2106.09173

Cross-Language Code Search using Static and Dynamic Analyses

As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches such as the comparison of tokens and abstract syntax trees (AST) to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets of 43,146 Java and Python files and 55,499 Java files and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective compared to single or weighted measures; and 2) COSAL has better precision and recall compared to state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search.


[42] 2106.09174

Can I Be of Further Assistance? Using Unstructured Knowledge Access to Improve Task-oriented Conversational Modeling

Most prior work on task-oriented dialogue systems is restricted to limited coverage of domain APIs. However, users oftentimes have requests that are out of the scope of these APIs. This work focuses on responding to these beyond-API-coverage user turns by incorporating external, unstructured knowledge sources. Our approach works in a pipelined manner with knowledge-seeking turn detection, knowledge selection, and response generation in sequence. We introduce novel data augmentation methods for the first two steps and demonstrate that the use of information extracted from dialogue context improves the knowledge selection and end-to-end performances. Through experiments, we achieve state-of-the-art performance for both automatic and human evaluation metrics on the DSTC9 Track 1 benchmark dataset, validating the effectiveness of our contributions.


[43] 2106.09175

Efficient and accurate KAM tori construction for the dissipative spin-orbit problem using a map reduction

We consider the dissipative spin-orbit problem in Celestial Mechanics, which describes the rotational motion of a triaxial satellite moving on a Keplerian orbit subject to tidal forcing and "drift". Our goal is to construct quasi-periodic solutions with fixed frequency, satisfying appropriate conditions. With the goal of applying rigorous KAM theory, we compute such quasi-periodic solutions with very high precision. To this end, we have developed a very efficient algorithm. The first step is to compute very accurately the return map to a surface of section (using a high-order Taylor method with extended precision). Then, we find an invariant curve for the return map using recent algorithms that take advantage of the geometric features of the problem. This method is based on a rapidly convergent Newton method which is guaranteed to converge if the initial error is small enough, so it is very suitable for a continuation algorithm. The resulting algorithm is quite efficient. We only need to deal with a one-dimensional function. If this function is discretized in $N$ points, the algorithm requires $O(N \log N)$ operations and $O(N)$ storage. The most costly step (the numerical integration of the equation along a turn) is trivial to parallelize. The main goal of the paper is to present the algorithms, implementation details, and several sample results of runs. We also present both a rigorous and a numerical comparison of the results of averaged and non-averaged models.
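For readers unfamiliar with the return-map step, here is a generic double-precision sketch of computing a stroboscopic return map for a periodically forced ODE using scipy. The paper uses a high-order Taylor method with extended precision and the specific spin-orbit vector field, neither of which is reproduced here; the pendulum-like forcing below is purely illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

OMEGA = 2.0 * np.pi          # forcing frequency: the section is taken once per forcing period
EPS, ETA = 0.1, 0.01         # illustrative forcing amplitude and dissipation

def vector_field(t, y):
    """Toy dissipative, periodically forced oscillator standing in for the spin-orbit ODE."""
    theta, v = y
    return [v, -EPS * np.sin(theta - OMEGA * t) - ETA * v]

def return_map(theta0, v0):
    """Integrate over one forcing period: the stroboscopic (return) map of the flow."""
    period = 2.0 * np.pi / OMEGA
    sol = solve_ivp(vector_field, (0.0, period), [theta0, v0], rtol=1e-12, atol=1e-12)
    theta1, v1 = sol.y[:, -1]
    return theta1 % (2.0 * np.pi), v1

# Discretize an approximate invariant curve in N points and push it through the map;
# a Newton-type correction on these N points is then the next step of the method.
N = 128
curve = [return_map(th, 0.0) for th in np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)]
```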


[44] 2106.09177

Insights into Data through Model Behaviour: An Explainability-driven Strategy for Data Auditing for Responsible Computer Vision Applications

In this study, we take a departure and explore an explainability-driven strategy to data auditing, where actionable insights into the data at hand are discovered through the eyes of quantitative explainability on the behaviour of a dummy model prototype when exposed to data. We demonstrate this strategy by auditing two popular medical benchmark datasets, and discover hidden data quality issues that lead deep learning models to make predictions for the wrong reasons. The actionable insights gained from this explainability-driven data auditing strategy are then leveraged to address the discovered issues to enable the creation of high-performing deep learning models with appropriate prediction behaviour. The hope is that such an explainability-driven strategy can be complementary to data-driven strategies to facilitate more responsible development of machine learning algorithms for computer vision applications.


[45] 2106.09178

The Fishnet Open Images Database: A Dataset for Fish Detection and Fine-Grained Categorization in Fisheries

Camera-based electronic monitoring (EM) systems are increasingly being deployed onboard commercial fishing vessels to collect essential data for fisheries management and regulation. These systems generate large quantities of video data which must be reviewed on land by human experts. Computer vision can assist this process by automatically detecting and classifying fish species; however, the lack of existing public data in this domain has hindered progress. To address this, we present the Fishnet Open Images Database, a large dataset of EM imagery for fish detection and fine-grained categorization onboard commercial fishing vessels. The dataset consists of 86,029 images containing 34 object classes, making it the largest and most diverse public dataset of fisheries EM imagery to date. It includes many of the characteristic challenges of EM data: visual similarity between species, skewed class distributions, harsh weather conditions, and chaotic crew activity. We evaluate the performance of existing detection and classification algorithms and demonstrate that the dataset can serve as a challenging benchmark for development of computer vision algorithms in fisheries. The dataset is available at https://www.fishnet.ai/.


[46] 2106.09179

Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation

With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. Although methods have been proposed to speed up tuning via knowledge transfer, they typically rely on the final performance of hyperparameter configurations and do not make use of low-fidelity information. This common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in the additional observations and the need for performance forecasting. Therefore, we conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.


[47] 2106.09180

RHNAS: Realizable Hardware and Neural Architecture Search

The rapidly evolving field of Artificial Intelligence necessitates automated approaches to co-design neural network architectures and neural accelerators to maximize system efficiency and address productivity challenges. To enable joint optimization of this vast space, there has been growing interest in differentiable NN-HW co-design. Fully differentiable co-design has reduced the resource requirements for discovering optimized NN-HW configurations, but fails to adapt to general hardware accelerator search spaces. This is due to the existence of non-synthesizable (invalid) designs in the search space of many hardware accelerators. To enable efficient and realizable co-design of configurable hardware accelerators with arbitrary neural network search spaces, we introduce RHNAS, a method that combines reinforcement learning for hardware optimization with differentiable neural architecture search. RHNAS discovers realizable NN-HW designs with 1.84x lower latency and 1.86x lower energy-delay product (EDP) on ImageNet, and 2.81x lower latency and 3.30x lower EDP on CIFAR-10, over the default hardware accelerator design.


[48] 2106.09184

A fourth-order compact time-splitting method for the Dirac equation with time-dependent potentials

In this paper, we present an approach to deal with the dynamics of the Dirac equation with time-dependent electromagnetic potentials using the fourth-order compact time-splitting method ($S_\text{4c}$). To this purpose, the time-ordering technique for time-dependent Hamiltonians is introduced, so that the influence of the time-dependence could be limited to certain steps which are easy to treat. Actually, in the case of the Dirac equation, it turns out that only those steps involving potentials need to be amended, and the scheme remains efficient, accurate, as well as easy to implement. Numerical examples in 1D and 2D are given to validate the scheme.


[49] 2106.09191

The Biot-Stokes coupling using total pressure: formulation, analysis and application to interfacial flow in the eye

We consider a multiphysics model for the flow of Newtonian fluid coupled with Biot consolidation equations through an interface, and incorporating total pressure as an unknown in the poroelastic region. A new mixed-primal finite element scheme is proposed solving for the pairs fluid velocity - pressure and displacement - total poroelastic pressure using Stokes-stable elements, and where the formulation does not require Lagrange multipliers to set up the usual transmission conditions on the interface. The stability and well-posedness of the continuous and semi-discrete problems are analysed in detail. Our numerical study is framed in the context of different interfacial flow regimes in Cartesian and axisymmetric coordinates that could eventually help describe early morphologic changes associated with glaucoma development in canine species.


[50] 2106.09196

CoreUI: Interactive Core Training System with 3D Human Shape

In this paper, we present an interactive core training system that uses a monocular camera image as input. Capturing human pose with depth cameras or multiple cameras using conventional approaches is commonly expensive. To address this issue, we employ the skinned multi-person linear model of human shape to recover the 3D human pose from 2D images using pose estimation and human mesh recovery approaches. In order to support the user in maintaining the correct postures from target poses during training, we adopt 3D human shape estimation for both the target image and the input camera video. We propose CoreUI, a user interface that provides visual guidance showing the differences between the estimated target and current human shapes in core training, visualized by markers at ten body parts with color changes. Our user studies show that the proposed core training system is effective and convenient compared with conventional guidance based on 2D skeletons.


[51] 2106.09198

Learning Perceptual Manifold of Fonts

Alongside the rapid development of deep learning techniques in generative models, it is becoming an urgent issue to combine machine intelligence with human intelligence to solve practical applications. Motivated by this methodology, this work aims to adjust machine-generated character fonts based on the effort of human workers in a perception study. Although numerous fonts are available online for public use, it remains challenging to generate and explore a font that meets the preferences of ordinary users. To address this issue, we propose the perceptual manifold of fonts to visualize perceptual adjustment in the latent space of a generative model of fonts. In our framework, we adopt a variational autoencoder network for font generation. Then, we conduct a perceptual study on the fonts generated from the multi-dimensional latent space of the generative model. After obtaining the distribution data of specific preferences, we utilize a manifold learning approach to visualize the font distribution. Compared with a conventional user interface in our user study, the proposed font-exploration user interface is efficient and helpful for expressing the designated user preferences.


[52] 2106.09199

A Two-stage Multi-modal Affect Analysis Framework for Children with Autism Spectrum Disorder

Autism spectrum disorder (ASD) is a developmental disorder that influences the communication and social behavior of a person in such a way that those on the spectrum have difficulty perceiving other people's facial expressions, as well as presenting and communicating emotions and affect via their own faces and bodies. Some efforts have been made to predict and improve the affect states of children with ASD in play therapy, a common method to improve children's social skills via play and games. However, many previous works only used models pre-trained on benchmark emotion datasets and failed to consider the distinction in emotion between typically developing children and children with autism. In this paper, we present an open-source two-stage multi-modal approach leveraging acoustic and visual cues to predict three main affect states (positive, negative, and neutral) of children with ASD in real-world play therapy scenarios, achieving an overall accuracy of 72.40%. This work presents a novel way to combine human expertise and machine intelligence for ASD affect recognition by proposing a two-stage schema.


[53] 2106.09201

Trilateral Attention Network for Real-time Medical Image Segmentation

Accurate segmentation of medical images into anatomically meaningful regions is critical for the extraction of quantitative indices or biomarkers. The common pipeline for segmentation comprises a region-of-interest detection stage and a segmentation stage, which are independent of each other and typically performed using separate deep learning networks. The performance of the segmentation stage highly relies on the extracted set of spatial features and the receptive fields. In this work, we propose an end-to-end network, called Trilateral Attention Network (TaNet), for real-time detection and segmentation in medical images. TaNet has a module for region localization and three segmentation pathways: 1) a handcrafted pathway with hand-designed convolutional kernels, 2) a detail pathway with regular convolutional kernels, and 3) a global pathway to enlarge the receptive field. The first two pathways encode rich handcrafted and low-level features extracted by hand-designed and regular kernels, while the global pathway encodes high-level context information. By jointly training the network for localization and segmentation using different sets of features, TaNet achieves superior performance, in terms of accuracy and speed, when evaluated on an echocardiography dataset for cardiac segmentation. The code and models will be made publicly available on the TaNet GitHub page.


[54] 2106.09203

Learning from Demonstration without Demonstrations

State-of-the-art reinforcement learning (RL) algorithms suffer from high sample complexity, particularly in the sparse reward case. A popular strategy for mitigating this problem is to learn control policies by imitating a set of expert demonstrations. The drawback of such approaches is that an expert needs to produce demonstrations, which may be costly in practice. To address this shortcoming, we propose Probabilistic Planning for Demonstration Discovery (P2D2), a technique for automatically discovering demonstrations without access to an expert. We formulate discovering demonstrations as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Trees to find demonstration trajectories. These demonstrations are used to initialize a policy, which is then refined by a generic RL algorithm. We provide theoretical guarantees that P2D2 finds successful trajectories, as well as bounds for its sampling complexity. We experimentally demonstrate that the method outperforms classic and intrinsic exploration RL techniques in a range of classic control and robotics tasks, requiring only a fraction of the exploration samples and achieving better asymptotic performance.


[55] 2106.09204

An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models

The performance of fine-tuning pre-trained language models largely depends on the hyperparameter configuration. In this paper, we investigate the performance of modern hyperparameter optimization (HPO) methods on fine-tuning pre-trained language models. First, we study and report the performance of three HPO algorithms on fine-tuning two state-of-the-art language models on the GLUE dataset. We find that, using the same time budget, HPO often fails to outperform grid search for two reasons: insufficient time budget and overfitting. We propose two general strategies and an experimental procedure to systematically troubleshoot HPO's failure cases. By applying the procedure, we observe that HPO can succeed with more appropriate settings for the search space and time budget; however, in certain cases overfitting remains. Finally, we make suggestions for future work. Our implementation can be found at https://github.com/microsoft/FLAML/tree/main/flaml/nlp/.


[56] 2106.09206

QWin: Enforcing Tail Latency SLO at Shared Storage Backend

Consolidating latency-critical (LC) and best-effort (BE) tenants at the storage backend helps to increase resource utilization. Even if tenants use dedicated queues and threads to achieve performance isolation, threads still contend for CPU cores. Therefore, we argue that it is necessary to partition cores between LC and BE tenants, with each core dedicated to running a single thread. Besides frequently changing bursty load, fluctuating service times at the storage backend also drastically change the need for cores. In order to guarantee tail latency service level objectives (SLOs), abrupt changes in the need for cores must be satisfied immediately; otherwise, tail latency SLO violations happen. Unfortunately, partitioning-based approaches lack the ability to react to the changing need for cores, resulting in extreme latency spikes and SLO violations. In this paper, we present QWin, a tail-latency-SLO-aware core allocation scheme that enforces tail latency SLOs at a shared storage backend. QWin consists of an SLO-to-core calculation model that accurately calculates the number of cores by combining the SLO with the definitive runtime load determined by a flexible request-based window, and an autonomous core allocation mechanism that adjusts cores at an adaptive frequency by dynamically changing core policies. When consolidating multiple LC and BE tenants, QWin outperforms state-of-the-art approaches in guaranteeing tail latency SLOs for LC tenants while increasing the bandwidth of BE tenants by up to 31x.


[57] 2106.09207

On the Power of Preconditioning in Sparse Linear Regression

Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian $N(0,\Sigma)$ with $\Sigma : n \times n$, and seek estimators $\hat{w}$ minimizing $(\hat{w}-w^*)^T\Sigma(\hat{w}-w^*)$, where $w^*$ is the $k$-sparse ground truth. Information theoretically, one can achieve strong error bounds with $O(k \log n)$ samples for arbitrary $\Sigma$ and $w^*$; however, no efficient algorithms are known to match these guarantees even with $o(n)$ samples, without further assumptions on $\Sigma$ or $w^*$. As far as hardness, computational lower bounds are only known with worst-case design matrices. Random-design instances are known which are hard for the Lasso, but these instances can generally be solved by Lasso after a simple change-of-basis (i.e. preconditioning). In this work, we give upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if $\Sigma$ is highly ill-conditioned. Second, we construct (for the first time) random-design instances which are provably hard for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires $\Omega(t^{1/20})$ samples to recover $O(\log n)$-sparse signals when covariates are drawn from this model.
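
To make the idea of preconditioning concrete, here is a minimal generic sketch (not the paper's specific construction): the covariates are transformed by a preconditioner S before running the Lasso, and the estimate is mapped back to the original basis. The whitening choice of S below is only one illustrative option; which preconditioners make hard instances easy is exactly the question the paper studies.

    import numpy as np
    from sklearn.linear_model import Lasso

    def preconditioned_lasso(X, y, Sigma, alpha=0.1):
        """Generic preconditioned-Lasso recipe (illustrative only).

        Runs the Lasso on the transformed design X @ S and maps the
        solution back, where S is built here from the covariance Sigma.
        """
        # One simple preconditioner: inverse (transposed) Cholesky factor of
        # Sigma, so that the transformed covariates are whitened.
        L = np.linalg.cholesky(Sigma)
        S = np.linalg.inv(L).T
        X_tilde = X @ S

        lasso = Lasso(alpha=alpha, fit_intercept=False)
        lasso.fit(X_tilde, y)
        v_hat = lasso.coef_

        # Map back to the original coordinate system (w = S v).
        return S @ v_hat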


[58] 2106.09208

External Service Sensing (ESS): Research Framework, Challenges and Opportunities

The flourishing of web-based services gave birth to the research area of \textit{services computing}, an academic community that has been expanding rapidly for nearly 20 years. Consensus has been reached on a set of representative research problems in services computing, such as service selection, service composition, service recommendation, and service quality prediction. An obvious fact is that most services change constantly to adapt in a timely manner to changes in the external business/technical environment and in internal development strategies. However, traditional services computing research does not consider such changes sufficiently. Many works regard services as \textit{static} entities; this leads to the situation that some proposed models/algorithms do not work in the real world. Sensing various types of service changes is of great significance to the practicability and rationality of services computing research. In this paper, a new research problem, \textit{External Service Sensing} (ESS), is defined to cope with various changes in services, and a research framework for ESS is presented to elaborate the scope and boundary of ESS. This framework is composed of four orthogonal dimensions: sensing objects, sensing contents, sensing channels, and sensing techniques. Each concrete ESS problem is defined by combining different values in these dimensions, and existing research work related to service changes can be well adapted to this framework. Real-world case studies demonstrate the soundness of ESS and its framework. Finally, some challenges and opportunities in ESS research are listed for researchers in the services computing community. To the best of our knowledge, this is the first work to systematically define service change-related research as a standard services computing problem, thus broadening the research scope of services computing.


[59] 2106.09211

Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery

We propose a new framework -- Square Root Principal Component Pursuit -- for low-rank matrix recovery from observations corrupted with noise and outliers. Inspired by the square root Lasso, this new formulation does not require prior knowledge of the noise level. We show that a single, universal choice of the regularization parameter suffices to achieve reconstruction error proportional to the (a priori unknown) noise level. In comparison, previous formulations such as stable PCP rely on noise-dependent parameters to achieve similar performance, and are therefore challenging to deploy in applications where the noise level is unknown. We validate the effectiveness of our new method through experiments on simulated and real datasets. Our simulations corroborate the claim that a universal choice of the regularization parameter yields near optimal performance across a range of noise levels, indicating that the proposed method outperforms the (somewhat loose) bound proved here.
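
Schematically (our paraphrase, by analogy with the square-root Lasso, with M the observed matrix, L the low-rank part and S the sparse part), the square-root variant replaces the squared Frobenius data-fit term of stable PCP with its unsquared counterpart:

$$ \min_{L,\,S}\ \|L\|_* + \lambda \|S\|_1 + \mu \|M - L - S\|_F , $$

whereas stable PCP uses $\|M - L - S\|_F^2$, whose optimal weight scales with the (unknown) noise level; removing the square is what allows a single universal choice of the regularization parameter.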


[60] 2106.09212

Long-Short Temporal Contrastive Learning of Video Transformers

Video transformers have recently emerged as a competitive alternative to 3D CNNs for video understanding. However, due to their large number of parameters and reduced inductive biases, these models require supervised pretraining on large-scale image datasets to achieve top performance. In this paper, we empirically demonstrate that self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results that are on par or better than those obtained with supervised pretraining on large-scale image datasets, even massive ones such as ImageNet-21K. Since transformer-based models are effective at capturing dependencies over extended temporal spans, we propose a simple learning procedure that forces the model to match a long-term view to a short-term view of the same video. Our approach, named Long-Short Temporal Contrastive Learning (LSTCL), enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent. To demonstrate the generality of our findings, we implement and validate our approach under three different self-supervised contrastive learning frameworks (MoCo v3, BYOL, SimSiam) using two distinct video-transformer architectures, including an improved variant of the Swin Transformer augmented with space-time attention. We conduct a thorough ablation study and show that LSTCL achieves competitive performance on multiple video benchmarks and represents a convincing alternative to supervised image-based pretraining.
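
The core learning signal described above is a contrastive (InfoNCE-style) loss between embeddings of a short clip and a longer clip sampled from the same video; the PyTorch-style sketch below conveys the idea only and does not reproduce the paper's exact MoCo v3/BYOL/SimSiam instantiations.

    import torch
    import torch.nn.functional as F

    def long_short_infonce(z_short, z_long, temperature=0.1):
        """Contrastive loss matching short-view and long-view embeddings.

        z_short, z_long: (batch, dim) embeddings of a short clip and a longer
        clip from the same video; row i of each tensor comes from video i.
        """
        z_short = F.normalize(z_short, dim=1)
        z_long = F.normalize(z_long, dim=1)

        # Similarity of every short clip against every long clip in the batch.
        logits = z_short @ z_long.t() / temperature

        # The positive for short clip i is long clip i (same video).
        targets = torch.arange(z_short.size(0), device=z_short.device)
        return F.cross_entropy(logits, targets)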


[61] 2106.09218

DroidMorph: Are We Ready to Stop the Attack of Android Malware Clones?

The number of Android malware variants (clones) is on the rise and, to stop this attack of clones, we need to develop new methods and techniques for analysing and detecting them. As a first step, we need to study how these malware clones are generated, which will help us better anticipate and recognize them. In this paper we present a new tool named DroidMorph, which provides morphing of Android applications (APKs) at different levels of abstraction and can be used to create Android application (malware/benign) clones. As a case study, we test and evaluate the resilience of current commercial anti-malware products against attacks by the Android malware clones generated by DroidMorph. We found that 8 out of 17 leading commercial anti-malware programs were not able to detect any of the morphed APKs. We hope that DroidMorph will be used in future research to improve Android malware clone analysis and detection, and help stop them.


[62] 2106.09219

Decentralised Intelligence, Surveillance, and Reconnaissance in Unknown Environments with Heterogeneous Multi-Robot Systems

We present the design and implementation of a decentralised, heterogeneous multi-robot system for performing intelligence, surveillance and reconnaissance (ISR) in an unknown environment. The team consists of functionally specialised robots that gather information and others that perform a mission-specific task, and is coordinated to achieve simultaneous exploration and exploitation in the unknown environment. We present a practical implementation of such a system, including decentralised inter-robot localisation, mapping, data fusion and coordination. The system is demonstrated in an efficient distributed simulation. We also describe a UAS platform for hardware experiments, and the ongoing progress.


[63] 2106.09223

Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks

To evaluate the robustness gain of Bayesian neural networks on image classification tasks, we apply input perturbations and adversarial attacks to state-of-the-art Bayesian neural networks, with a benchmark CNN model as reference. The attacks are selected to simulate signal interference and cyberattacks against CNN-based machine learning systems. The results show that a Bayesian neural network achieves significantly higher robustness against adversarial attacks generated against a deterministic neural network model, without adversarial training. The Bayesian posterior can act as a safety precursor of ongoing malicious activities. Furthermore, we show that a stochastic classifier placed after a deterministic CNN feature extractor provides sufficient robustness enhancement, rather than requiring a stochastic feature extractor before the classifier. This advises utilizing stochastic layers in building decision-making pipelines within safety-critical domains.


[64] 2106.09225

On the Capabilities of Pointer Networks for Deep Deductive Reasoning

The importance of building neural networks that can learn to reason has been well recognized in the neuro-symbolic community. In this paper, we apply neural pointer networks to conduct reasoning over symbolic knowledge bases. In doing so, we explore the benefits and limitations of encoder-decoder architectures in general, and pointer networks in particular, for developing accurate, generalizable and robust neuro-symbolic reasoners. Based on our experimental results, pointer networks perform remarkably well across multiple reasoning tasks while outperforming the previously reported state of the art by a significant margin. We observe that pointer networks preserve their performance even when challenged with knowledge graphs over domains/vocabularies they have never encountered before. To the best of our knowledge, this is the first study on neuro-symbolic reasoning using pointer networks. We hope our results on these reasoning problems will encourage broader exploration of pointer networks' capabilities for reasoning over more complex logics and for other neuro-symbolic problems.


[65] 2106.09226

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.


[66] 2106.09227

Current Challenges and Future Directions in Podcast Information Access

Podcasts are spoken documents across a wide-range of genres and styles, with growing listenership across the world, and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we highlight the many differences between podcasts and other media, and discuss our perspective on challenges and future research directions in the domain of podcast information access.


[67] 2106.09229

An Evaluation of Self-Supervised Pre-Training for Skin-Lesion Analysis

Self-supervised pre-training appears to be an advantageous alternative to supervised pre-training for transfer learning. By synthesizing annotations on pretext tasks, self-supervision allows models to be pre-trained on large amounts of pseudo-labels before fine-tuning them on the target task. In this work, we assess self-supervision for the diagnosis of skin lesions, comparing three self-supervised pipelines to a challenging supervised baseline, on five test datasets comprising in- and out-of-distribution samples. Our results show that self-supervision is competitive both in improving accuracies and in reducing the variability of outcomes. Self-supervision proves particularly useful for low training data scenarios ($<1\,500$ and $<150$ samples), where its ability to stabilize the outcomes is essential to provide sound results.


[68] 2106.09230

JSI at the FinSim-2 task: Ontology-Augmented Financial Concept Classification

Ontologies have been increasingly used for machine reasoning over the last few years. They can provide explanations of concepts or be used for concept classification if there exists a mapping from the desired labels to the relevant ontology. Another advantage of using ontologies is that they do not need a learning process, meaning that we do not need training data or training time before using them. This paper presents a practical use of an ontology for a classification problem from the financial domain. It first transforms a given ontology into a graph and proceeds with generalization with the aim of finding common semantic descriptions of the input sets of financial concepts. We present a solution to the shared task on Learning Semantic Similarities for the Financial Domain (FinSim-2 task). The task is to design a system that can automatically classify concepts from the financial domain into the most relevant hypernym concept in an external ontology - the Financial Industry Business Ontology. We propose a method that maps given concepts to the mentioned ontology and performs a graph search for the most relevant hypernyms. We also employ a word vectorization method and a machine learning classifier to supplement the method with a ranked list of labels for each concept.
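
The graph-search step described above can be illustrated with a small, hypothetical sketch: the ontology is treated as a directed graph whose edges point from a class to its hypernyms (parents), the input concept is mapped to a start node, and a breadth-first walk returns candidate hypernyms in order of distance. The node names and mapping are placeholders, not actual FIBO identifiers.

    from collections import deque

    def ranked_hypernyms(graph, start_node, allowed_labels):
        """Breadth-first search over hypernym edges.

        graph: dict mapping a node to the list of its direct hypernyms.
        allowed_labels: the label set of the shared task; the first hits
        found while walking upward are the most relevant candidates.
        """
        seen, queue, ranked = {start_node}, deque([start_node]), []
        while queue:
            node = queue.popleft()
            if node in allowed_labels:
                ranked.append(node)
            for parent in graph.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return ranked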


[69] 2106.09231

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases

Previous literature shows that pre-trained masked language models (MLMs) such as BERT can achieve competitive factual knowledge extraction performance on some datasets, indicating that MLMs can potentially be a reliable knowledge source. In this paper, we conduct a rigorous study to explore the underlying predicting mechanisms of MLMs over different extraction paradigms. By investigating the behaviors of MLMs, we find that previous decent performance mainly owes to biased prompts which overfit dataset artifacts. Furthermore, incorporating illustrative cases and external contexts improves knowledge prediction mainly due to entity type guidance and golden answer leakage. Our findings shed light on the underlying predicting mechanisms of MLMs, and strongly question the previous conclusion that current MLMs can potentially serve as reliable factual knowledge bases.


[70] 2106.09232

Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction

Event extraction is challenging due to the complex structure of event records and the semantic gap between text and event. Traditional methods usually extract event records by decomposing the complex structure prediction task into multiple subtasks. In this paper, we propose Text2Event, a sequence-to-structure generation paradigm that can directly extract events from the text in an end-to-end manner. Specifically, we design a sequence-to-structure network for unified event extraction, a constrained decoding algorithm for event knowledge injection during inference, and a curriculum learning algorithm for efficient model learning. Experimental results show that, by uniformly modeling all tasks in a single model and universally predicting different labels, our method can achieve competitive performance using only record-level annotations in both supervised learning and transfer learning settings.


[71] 2106.09233

De-biasing Distantly Supervised Named Entity Recognition via Causal Intervention

Distant supervision tackles the data bottleneck in NER by automatically generating training instances via dictionary matching. Unfortunately, the learning of DS-NER is severely dictionary-biased, which suffers from spurious correlations and therefore undermines the effectiveness and the robustness of the learned models. In this paper, we fundamentally explain the dictionary bias via a Structural Causal Model (SCM), categorize the bias into intra-dictionary and inter-dictionary biases, and identify their causes. Based on the SCM, we learn de-biased DS-NER via causal interventions. For intra-dictionary bias, we conduct backdoor adjustment to remove the spurious correlations introduced by the dictionary confounder. For inter-dictionary bias, we propose a causal invariance regularizer which will make DS-NER models more robust to the perturbation of dictionaries. Experiments on four datasets and three DS-NER models show that our method can significantly improve the performance of DS-NER.


[72] 2106.09234

Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model

Denoising is an essential step for distant supervision based named entity recognition. Previous denoising methods are mostly based on instance-level confidence statistics, which ignore the variety of the underlying noise distribution on different datasets and entity types. This makes them difficult to adapt to high noise rate settings. In this paper, we propose Hypergeometric Learning (HGL), a denoising algorithm for distantly supervised NER that takes both the noise distribution and instance-level confidence into consideration. Specifically, during neural network training, we naturally model the noisy samples in each batch as following a hypergeometric distribution parameterized by the noise rate. Then each instance in the batch is regarded as either correct or noisy according to its label confidence derived from the previous training step, as well as the noise distribution in the sampled batch. Experiments show that HGL can effectively denoise the weakly-labeled data retrieved from distant supervision, and therefore results in significant improvements on the trained models.
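
To make the batch-level noise modelling concrete, here is a small illustrative sketch (not the authors' code): given an assumed noise rate, the number of noisy instances in a sampled batch follows a hypergeometric distribution, and the lowest-confidence instances in the batch are then treated as the noisy ones.

    import numpy as np

    rng = np.random.default_rng(0)

    def split_batch_by_noise(confidences, dataset_size, noise_rate):
        """Mark the likely-noisy instances in one batch (illustrative).

        confidences: per-instance label confidences from the previous
        training step (higher = more likely correct).
        """
        batch_size = len(confidences)
        n_noisy_total = int(round(noise_rate * dataset_size))
        n_clean_total = dataset_size - n_noisy_total

        # The number of noisy instances drawn into this batch is hypergeometric:
        # batch_size items sampled without replacement from a population with
        # n_noisy_total noisy and n_clean_total clean items.
        k_noisy = rng.hypergeometric(n_noisy_total, n_clean_total, batch_size)

        # Treat the k_noisy least-confident instances as noisy.
        order = np.argsort(confidences)          # ascending confidence
        noisy_idx = order[:k_noisy]
        clean_idx = order[k_noisy:]
        return clean_idx, noisy_idx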


[73] 2106.09236

Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition

End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, Transformer and Conformer have achieved state-of-the-art recognition accuracy, in which self-attention plays a vital role in capturing important global information. However, the time and memory complexity of self-attention increases quadratically with the length of the sentence. In this paper, a prob-sparse self-attention mechanism is introduced into the Conformer to sparsify the computation of self-attention in order to accelerate inference and reduce memory consumption. Specifically, we adopt a Kullback-Leibler divergence based sparsity measurement for each query to decide whether we compute the attention function on this query. By using the prob-sparse attention mechanism, we achieve an impressive 8% to 45% inference speed-up and 15% to 45% memory usage reduction of the self-attention module of the Conformer Transducer while maintaining the same level of error rate.
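
The selection mechanism can be illustrated with a small NumPy sketch: each query gets a cheap sparsity measurement (here the common max-minus-mean surrogate for the KL-based criterion), only the top-u queries attend normally, and the remaining queries fall back to the mean of the values. This is a generic Informer-style sketch, not the exact Conformer-Transducer implementation.

    import numpy as np

    def prob_sparse_attention(Q, K, V, u):
        """Compute full attention only for the u most 'active' queries.

        Q, K, V: (n, d) arrays. Returns an (n, d) output where non-selected
        queries receive the mean of V as a cheap approximation.
        """
        d = Q.shape[1]
        scores = Q @ K.T / np.sqrt(d)                      # (n, n)

        # Max-mean sparsity measurement per query (surrogate for the
        # KL-divergence-based criterion).
        M = scores.max(axis=1) - scores.mean(axis=1)
        top = np.argsort(-M)[:u]                           # indices of top-u queries

        out = np.tile(V.mean(axis=0), (Q.shape[0], 1))     # default: mean of values
        sel = scores[top]                                  # (u, n)
        sel = sel - sel.max(axis=1, keepdims=True)         # stable softmax
        weights = np.exp(sel)
        weights /= weights.sum(axis=1, keepdims=True)
        out[top] = weights @ V
        return out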


[74] 2106.09237

Towards Assurance-Driven Architectural Decomposition of Software Systems

Computer systems are so complex that they are usually designed and analyzed in terms of layers of abstraction. Complexity is still a challenge facing logical reasoning tools that are used to find software design flaws and implementation bugs. Abstraction is also a common technique for scaling those tools to more complex systems. However, the abstractions used in the design phase of systems are in many cases different from those used for assurance. In this paper we argue that different software quality assurance techniques operate on different aspects of software systems. To facilitate assurance, and for a smooth integration of assurance tools into the Software Development Lifecycle (SDLC), we present a 4-dimensional meta-architecture that separates computational, coordination, and stateful software artifacts early on in the design stage. We enumerate some of the design and assurance challenges that can be addressed by this meta-architecture, and demonstrate it on the high-level design of a simple file system.


[75] 2106.09241

CoANE: Modeling Context Co-occurrence for Attributed Network Embedding

Attributed network embedding (ANE) aims to learn low-dimensional vectors such that not only the network structure but also node attributes are preserved in the embedding space. Existing ANE models do not consider the specific combination of graph structure and attributes. While each node has its structural characteristics, such as highly-interconnected neighbors along with certain patterns of attribute distribution, each node's neighborhood should not only be depicted by multi-hop nodes, but also reflect certain clusters or social circles. To model such information, in this paper, we propose a novel ANE model, Context Co-occurrence-aware Attributed Network Embedding (CoANE). The basic idea of CoANE is to model the context attributes that capture the diverse patterns each node is involved in, and to apply a convolutional mechanism to encode positional information by treating each attribute as a channel. The learning of context co-occurrence can capture the latent social circles of each node. To better encode structural and semantic knowledge of nodes, we devise a three-way objective function, consisting of positive graph likelihood, contextual negative sampling, and attribute reconstruction. We conduct experiments on five real datasets on the tasks of link prediction, node label classification, and node clustering. The results show that CoANE significantly outperforms state-of-the-art ANE models.


[76] 2106.09242

CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing

Deep learning-based code processing models have shown good performance on tasks such as predicting method names, summarizing programs, and comment generation. However, despite the tremendous progress, deep learning models are often prone to adversarial attacks, which can significantly threaten the robustness and generalizability of these models by leading them to misclassify unexpected inputs. To address this issue, many deep learning testing approaches have been proposed; however, these approaches mainly focus on testing deep learning applications in the domains of image, audio, and text analysis, and cannot be directly applied to neural models for code due to the unique properties of programs. In this paper, we propose a coverage-based fuzzing framework, CoCoFuzzing, for testing deep learning-based code processing models. In particular, we first propose ten mutation operators to automatically generate valid and semantics-preserving source code examples as tests; then we propose a neuron coverage-based approach to guide the generation of tests. We investigate the performance of CoCoFuzzing on three state-of-the-art neural code models, i.e., NeuralCodeSum, CODE2SEQ, and CODE2VEC. Our experimental results demonstrate that CoCoFuzzing can generate valid and semantics-preserving source code examples for testing the robustness and generalizability of these models and for improving neuron coverage. Moreover, these tests can be used to improve the performance of the target neural code models through adversarial retraining.
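
The overall procedure follows the familiar coverage-guided fuzzing pattern; the sketch below shows only the control flow, with mutate_ops (the ten semantics-preserving operators) and neuron_coverage standing in for components described in the paper.

    import random

    def coverage_guided_fuzz(seeds, mutate_ops, model, neuron_coverage,
                             max_iters=1000):
        """Generic coverage-guided fuzzing loop (illustrative sketch).

        seeds: initial source-code examples.
        mutate_ops: list of semantics-preserving mutation operators.
        neuron_coverage: callable returning the set of activated neurons
        for a given input, so coverage gain can be measured.
        """
        corpus = list(seeds)
        covered = set()
        for x in corpus:
            covered |= neuron_coverage(model, x)

        tests = []
        for _ in range(max_iters):
            parent = random.choice(corpus)
            op = random.choice(mutate_ops)
            child = op(parent)                       # still valid, same semantics

            new_cov = neuron_coverage(model, child)
            if not new_cov <= covered:               # coverage increased
                covered |= new_cov
                corpus.append(child)
                tests.append(child)
        return tests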


[77] 2106.09244

Deep Contrastive Graph Representation via Adaptive Homotopy Learning

The homotopy model is an excellent tool exploited by diverse research works in the field of machine learning. However, its flexibility is limited by a lack of adaptiveness, i.e., the appropriate homotopy coefficients must be fixed or tuned manually. To address this problem, we propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed, such that the homotopy parameters can be obtained adaptively. Accordingly, the proposed AH can be widely utilized to enhance homotopy-based algorithms. In particular, in this paper, we apply AH to contrastive learning (AHCL) such that it can be effectively transferred from weakly-supervised learning (with label priors) to unsupervised learning, where the soft labels of contrastive learning are directly and adaptively learned. Accordingly, AHCL has the adaptive ability to extract deep features without any prior information. Consequently, the affinity matrix formulated by the resulting adaptive labels can be constructed as a deep Laplacian graph that incorporates the topology of deep representations of the inputs. Finally, extensive experiments on benchmark datasets validate the superiority of our method.


[78] 2106.09245

A Material Mask Overlay Strategy for Close to Binary Design-dependent Pressure-loaded Optimized Topologies

This paper presents a Material Mask Overlay Strategy topology optimization approach with improved material assignment at the element level for achieving close to black-and-white designs for pressure-loaded problems. Hexagonal elements are employed to parametrize the design domain as this tessellation provides nonsingular local connectivity. Elliptical negative masks are used to find the optimized material layout. The material dilation and material erosion variables of each mask are systematically varied in association with a gray-scale measure constraint to achieve designs close to 0-1. Darcy's law in association with a drainage term is used to formulate the pressure field. The obtained pressure field is converted into consistent nodal forces using Wachspress shape functions. Sensitivities of the objective and pressure load are evaluated using the adjoint-variable method. The approach is demonstrated by solving various pressure-loaded structures and pressure-actuated compliant mechanisms. Compliance is minimized for load-bearing structures, whereas a multicriteria objective is minimized for mechanism designs.


[79] 2106.09246

Federated CycleGAN for Privacy-Preserving Image-to-Image Translation

Unsupervised image-to-image translation methods such as CycleGAN learn to convert images from one domain to another using unpaired training data sets from different domains. Unfortunately, these approaches still require centrally collected unpaired records, which potentially raises privacy and security issues. Although recent federated learning (FL) allows a neural network to be trained without data exchange, the basic assumption of FL is that all clients have their own training data from a similar domain, which differs from our image-to-image translation scenario in which each client has images from its unique domain and the goal is to learn image translation between different domains without accessing the target domain data. To address this, here we propose a novel federated CycleGAN architecture that can learn image translation in an unsupervised manner while maintaining data privacy. Specifically, our approach arises from a novel observation that the CycleGAN loss can be decomposed into a sum of client-specific local objectives that can be evaluated using only their own data. This local objective decomposition allows multiple clients to participate in federated CycleGAN training without sacrificing performance. Furthermore, our method employs a novel switchable generator and discriminator architecture using Adaptive Instance Normalization (AdaIN) that significantly reduces the bandwidth requirement of federated learning. Our experimental results on various unsupervised image translation tasks show that our federated CycleGAN provides comparable performance to the non-federated counterpart.


[80] 2106.09248

X-FACT: A New Benchmark Dataset for Multilingual Fact Checking

In this work, we introduce X-FACT: the largest publicly available multilingual dataset for factual verification of naturally existing real-world claims. The dataset contains short statements in 25 languages and is labeled for veracity by expert fact-checkers. The dataset includes a multilingual evaluation benchmark that measures both out-of-domain generalization, and zero-shot capabilities of the multilingual models. Using state-of-the-art multilingual transformer-based models, we develop several automated fact-checking models that, along with textual claims, make use of additional metadata and evidence from news stories retrieved using a search engine. Empirically, our best model attains an F-score of around 40%, suggesting that our dataset is a challenging benchmark for evaluation of multilingual fact-checking models.


[81] 2106.09249

Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks

In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite various prior studies on its security issues, all of them only consider attacks on camera- or LiDAR-based AD perception alone. However, production AD systems today predominantly adopt a Multi-Sensor Fusion (MSF) based design, which in principle can be more robust against these attacks under the assumption that not all fusion sources are (or can be) attacked at the same time. In this paper, we present the first study of security issues of MSF-based perception in AD systems. We directly challenge the basic MSF design assumption above by exploring the possibility of attacking all fusion sources simultaneously. This allows us for the first time to understand how much security guarantee MSF can fundamentally provide as a general defense strategy for AD perception. We formulate the attack as an optimization problem to generate a physically-realizable, adversarial 3D-printed object that misleads an AD system to fail in detecting it and thus crash into it. We propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. We evaluate our attack on MSF included in representative open-source industry-grade AD systems in real-world driving scenarios. Our results show that the attack achieves over 90% success rate across different object types and MSF. Our attack is also found stealthy, robust to victim positions, transferable across MSF algorithms, and physical-world realizable after being 3D-printed and captured by LiDAR and camera devices. To concretely assess the end-to-end safety impact, we further perform simulation evaluation and show that it can cause a 100% vehicle collision rate for an industry-grade AD system.


[82] 2106.09251

Optical Mouse: 3D Mouse Pose From Single-View Video

We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos. Many human clinical conditions and their corresponding animal models result in abnormal motion, and accurately measuring 3D motion at scale offers insights into health. The 3D poses improve classification of health-related attributes over 2D representations. The inferred poses are accurate enough to estimate stride length even when the feet are mostly occluded. This method could be applied as part of a continuous monitoring system to non-invasively measure animal health.


[83] 2106.09252

Temporal Logic Planning for Minimum-Time Positioning of Multiple Threat-Seduction Decoys

Reusable decoys offer a cost-effective alternative to the single-use hardware commonly applied to protect surface assets from threats. Such decoys portray fake assets to lure threats away from the true asset. To deceive a threat, a decoy first has to position itself such that it can break the radar lock. Considering multiple simultaneous threats, this paper introduces an approach for controlling multiple decoys to minimise the time required to break the locks of all the threats. The method includes the optimal allocation of one decoy to every threat with an assignment procedure that provides local position constraints to guarantee collision avoidance and thereby decouples the control of the decoys. A crude model of a decoy with uncertainty is considered for motion planning. The task of a decoy reaching a state in which the lock of the assigned threat can be broken is formulated as a temporal logic specification. To this end, the requirements to complete the task are modelled as time-varying set-membership constraints. The temporal and logical combination of the constraints is encoded in a mixed-integer optimisation problem. To demonstrate the results a simulated case study is provided.


[84] 2106.09256

Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations

In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation creates significant obstacles for existing imitation learning approaches, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging the expert's occupancy measures and the learner's dynamically changing occupancy measures under the different observation spaces. In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming vision-based demonstrations to random-access memory (RAM)-based policies in the Atari domain.


[85] 2106.09257

Learning Robot Exploration Strategy with 4D Point-Clouds-like Information as Observations

Being able to explore unknown environments is a requirement for fully autonomous robots. Many learning-based methods have been proposed to learn an exploration strategy. In frontier-based exploration, learning algorithms tend to learn the optimal or near-optimal frontier to explore. Most of these methods represent the environment as fixed-size images and take these as inputs to neural networks. However, the size of the environment is usually unknown, which makes these methods fail to generalize to real-world scenarios. To address this issue, we present a novel state representation method based on 4D point-clouds-like information, including location, frontier, and distance information. We also design a neural network that can process this 4D point-clouds-like information and generate an estimated value for each frontier. This neural network is then trained using a typical reinforcement learning framework. We test the performance of our proposed method by comparing it with five other methods and test its scalability on a map that is much larger than the maps in the training set. The experimental results demonstrate that our proposed method needs shorter average traveling distances to explore whole environments and can be adopted in maps of arbitrary size.


[86] 2106.09258

Knowledge Graphs and Machine Learning in biased C4I applications

This paper introduces our position on the critical issue of bias that has recently appeared in AI applications. Specifically, we discuss the combination of current technologies used in AI applications, i.e., Machine Learning and Knowledge Graphs, and point to their involvement in (de)biased applications of the C4I domain. Although this is a wider problem that currently emerges from different application domains, bias appears more critical in C4I than in others due to its security-related nature. While proposing certain actions to be taken towards debiasing C4I applications, we acknowledge the immature state of this topic within the Knowledge Graph and Semantic Web communities.


[87] 2106.09259

A Random CNN Sees Objects: One Inductive Bias of CNN and Its Applications

This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named as Tobias (``The object is at sight'') in this paper. This empirical inductive bias is further analyzed and successfully applied to self-supervised learning. A CNN is encouraged to learn representations that focus on the foreground object, by transforming every image into various versions with different backgrounds, where the foreground and background separation is guided by Tobias. Experimental results show that the proposed Tobias significantly improves downstream tasks, especially for object detection. This paper also shows that Tobias has consistent improvements on training sets of different sizes, and is more resilient to changes in image augmentations. Our codes will be available at https://github.com/CupidJay/Tobias.
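
A minimal experiment in the spirit of this finding (our own sketch, not the paper's code) is to pass an image through a randomly initialized convolutional stack, average the feature maps over channels, and threshold the resulting activation map at its mean to obtain a rough foreground mask:

    import torch
    import torch.nn as nn

    # Small randomly initialized CNN; no training takes place anywhere below.
    random_cnn = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    )

    @torch.no_grad()
    def rough_foreground_mask(image):
        """image: (3, H, W) tensor in [0, 1]. Returns a low-resolution mask."""
        feats = random_cnn(image.unsqueeze(0))       # (1, 256, h, w)
        act = feats.mean(dim=1).squeeze(0)           # channel-averaged activation map
        return act > act.mean()                      # True where activation is high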


[88] 2106.09260

Joining datasets via data augmentation in the label space for neural networks

Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that serve similar purposes. Unlike previously published works that ubiquitously conduct dataset joining in the uninterpretable latent vector space, the core of our method is an augmentation procedure in the label space. The primary challenge in addressing the label space for dataset joining is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularities or hierarchies, etc. Notably, we propose a new technique leveraging an artificially created knowledge graph, recurrent neural networks, and policy gradients that successfully achieves dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.


[89] 2106.09261

Coded Federated Learning Framework for AI-Based Mobile Application Services with Privacy-Awareness

By encoding computing tasks, coded computing can not only mitigate straggling problems in federated learning (FL), but also preserve the privacy of sensitive data uploaded/contributed by participating mobile users (MUs) to the centralized server owned by a mobile application provider (MAP). However, these advantages come with extra coding cost/complexity and communication overhead (referred to as \emph{privacy cost}) that must be considered given the limited computing/communications resources at MUs/MAP, and the rationality and incentive competition among MUs in contributing data to the MAP. This article proposes a novel coded FL-based framework for a privacy-aware mobile application service to address these challenges. In particular, the MAP first determines a set of the best MUs for the FL process based on MUs' provided information/features. Then, each selected MU can propose a contract to the MAP according to its expected trainable local data and privacy-protected coded data. To find the optimal contracts that can maximize the utilities of the MAP and all the participating MUs while maintaining high learning quality of the whole system, we first develop a multi-principal one-agent contract-based problem leveraging coded FL-based multiple utility functions under the MUs' privacy cost, the MAP's limited computing resources, and asymmetric information between the MAP and MUs. Then, we transform the problem into an equivalent low-complexity problem and develop an iterative algorithm to solve it. Experiments with a real-world dataset show that our framework can speed up training by up to 49% and improve prediction accuracy by up to 4.6 times while enhancing the network's social welfare, i.e., the total utility of all participating entities, by up to 114% under the privacy cost consideration, compared with baseline methods.


[90] 2106.09269

Pruning Randomly Initialized Neural Networks with Iterative Randomization

Pruning the weights of randomly initialized neural networks plays an important role in the context of lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that only pruning the weights can achieve remarkable performance instead of optimizing the weight values. However, to achieve the same level of performance as the weight optimization, the pruning approach requires more parameters in the networks before pruning and thus more memory space. To overcome this parameter inefficiency, we introduce a novel framework to prune randomly initialized neural networks with iteratively randomizing weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective to reduce the required number of the parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet.
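
Conceptually, the procedure interleaves score-based pruning of a frozen random network with periodic re-randomization of the pruned weights; the following sketch captures that interleaving for a single weight tensor and is only a schematic of the idea, not the released implementation.

    import torch

    def iterative_randomization_step(weight, scores, keep_ratio, rerandomize):
        """One conceptual IteRand-style step on a frozen weight tensor.

        weight: random (untrained) weights; scores: learnable pruning scores
        of the same shape; keep_ratio: fraction of weights kept.
        If rerandomize is True, the currently pruned weights are resampled.
        """
        k = int(keep_ratio * weight.numel())
        threshold = scores.flatten().kthvalue(weight.numel() - k + 1).values
        mask = (scores >= threshold).float()

        if rerandomize:
            with torch.no_grad():
                # Resample only the pruned entries from a fresh random draw.
                fresh = torch.randn_like(weight) * weight.std()
                weight.copy_(mask * weight + (1 - mask) * fresh)

        # The effective (pruned) weights used in the forward pass.
        return weight * mask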


[91] 2106.09274

Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

With the development of 5G and the Internet of Things, large numbers of wireless devices need to share the limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the inefficient spectrum utilization brought about by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed multi-user DSA problem in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a centralized off-line training and distributed on-line execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy that maximizes the sum throughput of the cognitive radio network in a distributed fashion without coordination information exchange between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. The simulation results show that the proposed algorithm converges quickly and achieves almost optimal performance.
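
For readers unfamiliar with DRQNs, the PyTorch sketch below shows one plausible shape of a recurrent Q-network that a single cognitive user could run on its partial channel observations. The layer sizes, the choice of a GRU, and the action encoding are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Recurrent Q-network: maps a partial observation sequence to Q-values
    over channel-access actions (illustrative sizes, not the paper's)."""
    def __init__(self, obs_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim)
        x = torch.relu(self.encoder(obs_seq))
        out, hidden = self.rnn(x, hidden)
        return self.q_head(out), hidden   # Q-values per time step

# Example: one agent observing 4 channels, choosing among "stay idle" + 4 channels.
agent = DRQN(obs_dim=4, num_actions=5)
q_values, h = agent(torch.randn(1, 10, 4))
action = q_values[0, -1].argmax().item()   # greedy action at the latest step
```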


[92] 2106.09279

Field trial on Ocean Estimation for Multi-Vessel Multi-Float-based Active perception

Marine vehicles have been used for various scientific missions where information over features of interest is collected. In order to maximise efficiency in collecting information over a large search space, we should be able to deploy a large number of autonomous vehicles that make a decision based on the latest understanding of the target feature in the environment. In our previous work, we have presented a hierarchical framework for the multi-vessel multi-float (MVMF) problem where surface vessels drop and pick up underactuated floats in a time-minimal way. In this paper, we present the field trial results using the framework with a number of drifters and floats. We discovered a number of important aspects that need to be considered in the proposed framework, and present the potential approaches to address the challenges.


[93] 2106.09281

MatES: Web-based Forward Chaining Expert System for Maternal Care

The solutions that prevent maternal complications are known, and the complications themselves are preventable by trained health professionals. But in countries like Ethiopia, where the physician-to-patient ratio is 1 doctor to 1000 patients, maternal mortality and morbidity rates are high. To fill the gap in highly trained health professionals, Ethiopia introduced health extension programs. Task shifting to health extension workers (HEWs) contributed to decreasing mortality and morbidity rates in Ethiopia. The knowledge gap has been one of the major challenges for HEWs: training is not given on a regular basis, there are no midwives, gynecologists, or doctors nearby for consultation, and all guidelines are paper-based and easily exposed to damage. In this paper, we describe the design and implementation of a web-based expert system for maternal care. We target only the 10 major diseases and complications of maternal health seen in Sub-Saharan Africa. The expert system can be accessed through web browsers from computers as well as smartphones. A forward-chaining rule-based expert system is used to give suggestions and derive new knowledge from the knowledge base. This expert system can be used to train HEWs in the field of maternal health. Keywords: expert system, maternal care, forward-chaining, rule-based expert system, PHLIPS
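
As a concrete illustration of how forward chaining works, here is a minimal rule engine in Python. The rules and facts are invented placeholders, not the MatES knowledge base.

```python
# Minimal forward-chaining engine: repeatedly fire rules whose conditions
# are satisfied by the current fact base until no new facts can be derived.
rules = [
    # (set of required facts, fact to conclude) -- illustrative only
    ({"severe_headache", "blurred_vision", "pregnant"}, "suspect_preeclampsia"),
    ({"suspect_preeclampsia"}, "refer_to_health_center"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"pregnant", "severe_headache", "blurred_vision"}, rules))
# -> includes 'suspect_preeclampsia' and 'refer_to_health_center'
```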


[94] 2106.09282

Smart Contract Vulnerability Detection: From Pure Neural Network to Interpretable Graph Feature and Expert Pattern Fusion

Smart contracts hold digital coins worth billions of dollars, and their security issues have drawn extensive attention in the past years. For smart contract vulnerability detection, conventional methods heavily rely on fixed expert rules, leading to low accuracy and poor scalability. Recent deep learning approaches alleviate this issue but fail to encode useful expert knowledge. In this paper, we explore combining deep learning with expert patterns in an explainable fashion. Specifically, we develop automatic tools to extract expert patterns from the source code. We then cast the code into a semantic graph to extract deep graph features. Thereafter, the global graph feature and the local expert patterns are fused to jointly produce the final prediction, while yielding interpretable weights. Experiments are conducted on all available smart contracts with source code on two platforms, Ethereum and VNT Chain. Empirically, our system significantly outperforms state-of-the-art methods. Our code is released.


[95] 2106.09289

MHNF: Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning

The attention mechanism enables graph neural networks (GNNs) to learn attention weights between the target node and its one-hop neighbors, which further improves performance. However, most existing GNNs are oriented to homogeneous graphs, and each layer can only aggregate the information of one-hop neighbors. Stacking multiple layers introduces a lot of noise and easily leads to over-smoothing. We propose a Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning method (MHNF). Specifically, we first propose a hybrid metapath autonomous extraction model to efficiently extract multi-hop hybrid neighbors. Then, we propose a hop-level heterogeneous information aggregation model, which selectively aggregates different-hop neighborhood information within the same hybrid metapath. Finally, a hierarchical semantic attention fusion model (HSAF) is proposed, which can efficiently integrate different-hop and different-path neighborhood information, respectively. This approach solves the problem of aggregating multi-hop neighborhood information and learns hybrid metapaths for the target task, reducing the need to manually specify metapaths. In addition, HSAF can extract the internal node information of the metapaths and better integrate the semantic information of different levels. Experimental results on real datasets show that MHNF is superior to state-of-the-art methods in node classification and clustering tasks (10.94% - 69.09% and 11.58% - 394.93% relative improvement on average, respectively).


[96] 2106.09291

Towards Understanding Deep Learning from Noisy Labels with Small-Loss Criterion

Deep neural networks need large amounts of labeled data to achieve good performance. In real-world applications, labels are usually collected from non-experts such as crowdsourcing to save cost and thus are noisy. In the past few years, deep learning methods for dealing with noisy labels have been developed, many of which are based on the small-loss criterion. However, there are few theoretical analyses to explain why these methods could learn well from noisy labels. In this paper, we theoretically explain why the widely-used small-loss criterion works. Based on the explanation, we reformalize the vanilla small-loss criterion to better tackle noisy labels. The experimental results verify our theoretical explanation and also demonstrate the effectiveness of the reformalization.
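
The small-loss criterion itself is simple to state in code: treat the fraction of samples with the smallest loss in each mini-batch as probably clean and train only on them. The sketch below shows this vanilla selection step; the remember rate and the example batch are illustrative.

```python
import numpy as np

def select_small_loss(losses, remember_rate):
    """Vanilla small-loss criterion: keep the `remember_rate` fraction of
    samples with the smallest loss and treat them as (probably) clean."""
    losses = np.asarray(losses)
    num_keep = int(remember_rate * len(losses))
    return np.argsort(losses)[:num_keep]   # indices of the selected samples

# Example: per-sample losses of a mini-batch with some noisy labels.
batch_losses = [0.2, 3.1, 0.4, 2.7, 0.1, 0.3]
clean_idx = select_small_loss(batch_losses, remember_rate=0.5)
print(clean_idx)   # -> indices of the three smallest-loss samples
```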


[97] 2106.09292

CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing

We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations. We propose two particular types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards. Specifically, we develop a local smoothing algorithm which uses a policy derived from Q-functions smoothed with Gaussian noise over each encountered state to guarantee the robustness of actions taken along this trajectory. Next, we develop a global smoothing algorithm for certifying the robustness of a finite-horizon cumulative reward under adversarial state perturbations. Finally, we propose a local smoothing approach which makes use of adaptive search in order to obtain tight certification bounds for reward. We use the proposed RL robustness certification framework to evaluate six methods that have previously been shown to yield empirically robust RL, including adversarial training and several forms of regularization, on two representative Atari games. We show that RegPGD, RegCVX, and RadialRL achieve high certified robustness among these. Furthermore, we demonstrate that our certifications are often tight by evaluating these algorithms against adversarial attacks.
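
To give a feel for the local smoothing idea, the sketch below averages Q-values over Gaussian-perturbed copies of a state before acting greedily. It is a schematic of policy smoothing with assumed names and a toy Q-function, not the paper's certification procedure, which additionally derives robustness bounds from this construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_action(q_function, state, sigma=0.1, num_samples=100):
    """Pick the greedy action of a Gaussian-smoothed Q-function:
    average Q-values over noisy copies of the state (illustrative sketch)."""
    noisy_states = state + sigma * rng.normal(size=(num_samples, state.shape[0]))
    q_values = np.mean([q_function(s) for s in noisy_states], axis=0)
    return int(np.argmax(q_values))

# Toy Q-function over 3 actions, standing in for a trained network.
toy_q = lambda s: np.array([s.sum(), -s.sum(), 0.0])
print(smoothed_action(toy_q, np.array([0.5, -0.2])))
```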


[98] 2106.09296

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper, we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification, through input transformation learning and output label mapping. Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%. We further provide a theoretical justification of V2S by proving its population risk is upper bounded by the source risk and a Wasserstein distance accounting for feature alignment via reprogramming. Our results offer new and effective means to time series classification.
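
The following PyTorch sketch illustrates the two ingredients named in the abstract: an input transformation (zero-padding plus a learnable additive perturbation) and a many-to-one label mapping on top of a frozen model. The toy "acoustic model", the padding scheme, and the mapping rule are stand-ins of our own, not the released V2S code.

```python
import torch
import torch.nn as nn

class Reprogram(nn.Module):
    """Sketch of reprogramming: pad a univariate time series to the acoustic
    model's input length, add a learnable perturbation, then aggregate the
    source (speech-command) classes onto the target classes."""
    def __init__(self, ts_len, speech_len, num_src_classes, num_tgt_classes, frozen_model):
        super().__init__()
        self.ts_len, self.speech_len = ts_len, speech_len
        self.delta = nn.Parameter(torch.zeros(speech_len))   # learnable input transform
        self.frozen_model = frozen_model                     # pre-trained, not updated
        # Fixed many-to-one label mapping: each target class owns a block of source classes.
        self.register_buffer("mapping", torch.arange(num_src_classes) % num_tgt_classes)
        self.num_tgt_classes = num_tgt_classes

    def forward(self, x):                     # x: (batch, ts_len)
        padded = torch.zeros(x.shape[0], self.speech_len)
        padded[:, : self.ts_len] = x
        logits_src = self.frozen_model(padded + self.delta)  # (batch, num_src_classes)
        probs_src = logits_src.softmax(dim=-1)
        probs_tgt = torch.zeros(x.shape[0], self.num_tgt_classes)
        probs_tgt.index_add_(1, self.mapping, probs_src)     # sum mapped probabilities
        return probs_tgt

# Toy frozen "acoustic model" standing in for a real pre-trained network.
frozen = nn.Sequential(nn.Linear(1000, 35))
model = Reprogram(ts_len=96, speech_len=1000, num_src_classes=35,
                  num_tgt_classes=5, frozen_model=frozen)
out = model(torch.randn(4, 96))   # (4, 5) target-class probabilities
```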


[99] 2106.09297

Embedding-based Product Retrieval in Taobao Search

Nowadays, the product search service of e-commerce platforms has become a vital shopping channel in people's life. The retrieval phase of products determines the search system's quality and gradually attracts researchers' attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded due to its: (1) low relevance with a given query and (2) discrepancy between the training and inference phases. Therefore, we propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then use the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. Two efficient methods are further proposed to improve retrieval relevance, including smoothing noisy training data and generating relevance-improving hard negative samples without requiring extra knowledge and training procedures. We evaluate MGDSPR on Taobao Product Search with significant metrics gains observed in offline experiments and online A/B tests. MGDSPR has been successfully deployed to the existing multi-channel retrieval system in Taobao Search. We also introduce the online deployment scheme and share practical lessons of our retrieval system to contribute to the community.


[100] 2106.09300

Multi-level Motion Attention for Human Motion Prediction

Human motion prediction aims to forecast future human poses given a historical motion. Whether based on recurrent or feed-forward neural networks, existing learning based methods fail to model the observation that human motion tends to repeat itself, even for complex sports actions and cooking activities. Here, we introduce an attention based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. In this context, we study the use of different types of attention, computed at joint, body part, and full pose levels. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict the future poses. Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions. Thanks to our attention model, it yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.


[101] 2106.09302

How can we learn (more) from challenges? A statistical approach to driving future algorithm development

Challenges have become the state-of-the-art approach to benchmark image analysis algorithms in a comparative manner. While the validation on identical data sets was a great step forward, results analysis is often restricted to pure ranking tables, leaving relevant questions unanswered. Specifically, little effort has been put into the systematic investigation of what characterizes images on which state-of-the-art algorithms fail. To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Our framework relies on the semantic meta data annotation of images, which serves as the foundation for a General Linear Mixed Models (GLMM) analysis. Based on 51,542 meta data annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge 2019 and revealed underexposure, motion and occlusion of instruments, as well as the presence of smoke or other objects in the background, as major sources of algorithm failure. Our subsequent method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in the processing of images in which previous methods tended to fail, including the segmentation of small, crossing, moving and transparent instruments and instrument parts. Due to the objectivity and generic applicability of our approach, it could become a valuable tool for validation in the field of medical image analysis and beyond.


[102] 2106.09305

Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Time series is a special type of sequence data: a set of observations collected at even intervals of time and ordered chronologically. Existing deep learning techniques use generic sequence models (e.g., recurrent neural networks, Transformer models, or temporal convolutional networks) for time series analysis, which ignore some of its unique properties. For example, downsampling of time series data often preserves most of the information in the data, while this is not true for general sequence data such as text sequences and DNA sequences. Motivated by the above, in this paper, we propose a novel neural network architecture and apply it to the time series forecasting problem, wherein we conduct sample convolution and interaction at multiple resolutions for temporal modeling. The proposed architecture, namely SCINet, facilitates extracting features with enhanced predictability. Experimental results show that SCINet achieves significant prediction accuracy improvements over existing solutions across various real-world time series forecasting datasets. In particular, it can achieve high forecasting accuracy for temporal-spatial datasets without using sophisticated spatial modeling techniques. Our code and data are presented in the supplemental material.
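
A minimal PyTorch sketch of the downsample-and-interact idea is given below: the series is split into even and odd sub-sequences that modulate each other. The operators and sizes are illustrative assumptions and omit SCINet's tree-structured stacking.

```python
import torch
import torch.nn as nn

class SCIBlockSketch(nn.Module):
    """Sketch of sample convolution and interaction: split a series into
    even/odd sub-sequences, process each, and let them exchange information
    (operators and sizes are illustrative, not SCINet's exact ones)."""
    def __init__(self, channels, hidden=16, kernel=3):
        super().__init__()
        conv = lambda: nn.Sequential(
            nn.Conv1d(channels, hidden, kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel, padding=kernel // 2),
        )
        self.phi, self.psi = conv(), conv()

    def forward(self, x):            # x: (batch, channels, time), time even
        even, odd = x[..., ::2], x[..., 1::2]      # downsample into two halves
        # Interaction: each half is modulated by features of the other half.
        odd_out = odd + self.phi(even)
        even_out = even + self.psi(odd)
        return even_out, odd_out

block = SCIBlockSketch(channels=1)
even, odd = block(torch.randn(8, 1, 48))   # two (8, 1, 24) sub-series
```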


[103] 2106.09306

PEN4Rec: Preference Evolution Networks for Session-based Recommendation

Session-based recommendation aims to predict the user's next action based on historical behaviors in an anonymous session. For better recommendations, it is vital to capture user preferences as well as their dynamics. Moreover, user preferences evolve over time dynamically and each preference has its own evolving track. However, most previous works neglect the evolving trend of preferences and can be easily disturbed by the effect of preference drifting. In this paper, we propose novel Preference Evolution Networks for session-based Recommendation (PEN4Rec) to model the preference evolving process by a two-stage retrieval from historical contexts. Specifically, the first-stage process integrates relevant behaviors according to recent items. Then, the second-stage process models the preference evolving trajectory over time dynamically and infers rich preferences. This process strengthens the effect of relevant sequential behaviors during the preference evolution and weakens the disturbance from preference drifting. Extensive experiments on three public datasets demonstrate the effectiveness and superiority of the proposed model.


[104] 2106.09307

Design of a prototypical platform for autonomous and connected vehicles

Self-driving technology is expected to revolutionize different sectors and is seen as the natural evolution of road vehicles. In recent years, real-world validation of designed and virtually tested solutions has been growing in importance, since simulated environments will never fully replicate all the aspects that can affect results in the real world. To this end, this paper presents our prototype platform for experimental research on connected and autonomous driving projects. In detail, the paper presents the overall architecture of the vehicle, focusing both on mechanical aspects, related to remote actuation and sensor set-up, and on software aspects, by means of a comprehensive description of the main algorithms required for autonomous driving, such as ego-localization, environment perception, motion planning, and actuation. Finally, experimental tests conducted in an urban-like environment are reported to validate and assess the performance of the overall system.


[105] 2106.09308

Characterization and Mitigation of Electromigration Effects in TSV-Based Power Delivery Network Enabled 3D-Stacked DRAMs

With 3D-stacked DRAM architectures becoming more prevalent, it has become important to find ways to characterize and mitigate the adverse effects that can hinder their inherent access parallelism and throughput. One example of such adversities is the electromigration (EM) effects in the through-silicon vias (TSVs) of the power delivery network (PDN) of 3D-stacked DRAM architectures. Several prior works have addressed the effects of EM in TSVs of 3D integrated circuits. However, no prior work has addressed the effects of EM in the PDN TSVs on the performance and lifetime of 3D-stacked DRAMs. In this paper, we characterize the effects of EM in PDN TSVs on a Hybrid Memory Cube (HMC) architecture employing the conventional PDN design with clustered layout of power and ground TSVs. We then present a new PDN design with a distributed layout of power and ground TSVs and show that it can mitigate the adverse effects of EM on the HMC architecture performance without requiring additional power and ground pins. Our benchmark-driven simulation-based analysis shows that compared to the clustered PDN layout, our proposed distributed PDN layout improves the EM-affected lifetime of the HMC architecture by up to 10 years. During this useful lifetime, the HMC architecture yields up to 1.51 times less energy-delay product (EDP).


[106] 2106.09309

Layer Folding: Neural Network Depth Reduction using Activation Linearization

Despite the increasing prevalence of deep neural networks, their applicability in resource-constrained devices is limited due to their computational load. While modern devices exhibit a high level of parallelism, real-time latency is still highly dependent on networks' depth. Although recent works show that below a certain depth, the width of shallower networks must grow exponentially, we presume that neural networks typically exceed this minimal depth to accelerate convergence and incrementally increase accuracy. This motivates us to transform pre-trained deep networks that already exploit such advantages into shallower forms. We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one. We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth. Finally, we use our method to provide more efficient alternatives to the MobileNetV2 and EfficientNet-Lite architectures on the ImageNet classification task.
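
The folding step itself is plain linear algebra: once the activation between two layers is (learned to be) the identity, the layers collapse exactly into one. A small NumPy check of this arithmetic, with made-up layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two consecutive fully connected layers y = W2 @ (W1 @ x + b1) + b2.
# If the activation between them behaves as the identity, they collapse
# into a single layer y = W_folded @ x + b_folded.
W1, b1 = rng.normal(size=(32, 16)), rng.normal(size=32)
W2, b2 = rng.normal(size=(8, 32)), rng.normal(size=8)

W_folded = W2 @ W1            # (8, 16)
b_folded = W2 @ b1 + b2       # (8,)

x = rng.normal(size=16)
unfolded = W2 @ (W1 @ x + b1) + b2
folded = W_folded @ x + b_folded
assert np.allclose(unfolded, folded)
```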


[107] 2106.09315

Fast evaluation of some p-adic transcendental functions

We design algorithms for computing values of many p-adic elementary and special functions, including logarithms, exponentials, polylogarithms, and hypergeometric functions. All our algorithms feature a quasi-linear complexity with respect to the target precision, and most of them are based on an adaptation to the p-adic setting of the binary splitting and bit-burst strategies.


[108] 2106.09316

Optimized Power Control Design for Over-the-Air Federated Edge Learning

This paper investigates the transmission power control in over-the-air federated edge learning (Air-FEEL) system. Different from conventional power control designs (e.g., to minimize the individual mean squared error (MSE) of the over-the-air aggregation at each round), we consider a new power control design aiming at directly maximizing the convergence speed. Towards this end, we first analyze the convergence behavior of Air-FEEL (in terms of the optimality gap) subject to aggregation errors at different communication rounds. It is revealed that if the aggregation estimates are unbiased, then the training algorithm would converge exactly to the optimal point with mild conditions; while if they are biased, then the algorithm would converge with an error floor determined by the accumulated estimate bias over communication rounds. Next, building upon the convergence results, we optimize the power control to directly minimize the derived optimality gaps under both biased and unbiased aggregations, subject to a set of average and maximum power constraints at individual edge devices. We transform both problems into convex forms, and obtain their structured optimal solutions, both appearing in a form of regularized channel inversion, by using the Lagrangian duality method. Finally, numerical results show that the proposed power control policies achieve significantly faster convergence for Air-FEEL, as compared with benchmark policies with fixed power transmission or conventional MSE minimization.


[109] 2106.09317

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

Recently, there has been increasing interest in neural speech synthesis. While deep neural networks achieve state-of-the-art results in text-to-speech (TTS) tasks, generating more emotional and more expressive speech is becoming a new challenge for researchers, due to the scarcity of high-quality emotion speech datasets and the lack of advanced emotional TTS models. In this paper, we first briefly introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and human-labeled emotion annotations. After that, we propose a simple but efficient architecture for emotional speech synthesis called EMSpeech. Unlike models which need additional reference audio as input, our model can predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding. In the experiment phase, we first validate the effectiveness of our dataset with an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations. Finally, by showing comparable performance on the emotional speech synthesis task, we successfully demonstrate the ability of the proposed model.


[110] 2106.09320

Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification

In far-field speaker verification, the performance of speaker embeddings is susceptible to degradation when there is a mismatch between the conditions of the enrollment and test speech. To solve this problem, we propose feature-level and instance-level transfer learning in the teacher-student framework to learn a domain-invariant embedding space. For the feature-level knowledge transfer, we develop a contrastive loss to transfer knowledge from the teacher model to the student model, which can not only decrease the intra-class distance, but also enlarge the inter-class distance. Moreover, we propose an instance-level pairwise distance transfer method to force the student model to preserve the pairwise distances between instances from the well-optimized embedding space of the teacher model. On the FFSVC 2020 evaluation set, our EER on Full-eval trials is relatively reduced by 13.9% compared with the fusion system result on Partial-eval trials of Task2. On Task1, compared with the winner's DenseNet result on Partial-eval trials, our minDCF on Full-eval trials is relatively reduced by 6.3%. On Task3, the EER and minDCF of our proposed method on Full-eval trials are very close to the results of the fusion system on Partial-eval trials. Our results also outperform other competitive domain adaptation methods.


[111] 2106.09323

Quantum Software Development Lifecycle

With recent advances in the development of more powerful quantum computers, the research area of quantum software engineering is emerging, having the goal to provide concepts, principles, and guidelines to develop high-quality quantum applications. In classical software engineering, lifecycles are used to document the process of designing, implementing, maintaining, analyzing, and adapting software. Such lifecycles provide a common understanding of how to develop and operate an application, which is especially important due to the interdisciplinary nature of quantum computing. Since today's quantum applications are, in most cases, hybrid, consisting of quantum and classical programs, the lifecycle for quantum applications must involve the development of both kinds of programs. However, the existing lifecycles only target the development of quantum or classical programs in isolation. Additionally, the various programs must be orchestrated, e.g., using workflows. Thus, the development of quantum applications also incorporates the workflow lifecycle. In this chapter, we analyze the software artifacts usually comprising a quantum application and present their corresponding lifecycles. Furthermore, we identify the points of connection between the various lifecycles and integrate them into the overall quantum software development lifecycle. Therefore, the integrated lifecycle serves as a basis for the development and execution of hybrid quantum applications.


[112] 2106.09325

Central Kurdish machine translation: First large scale parallel corpus and experiments

While the computational processing of Kurdish has experienced a relative increase, the machine translation of this language seems to be lacking a considerable body of scientific work. This is in part due to the lack of resources especially curated for this task. In this paper, we present the first large-scale parallel corpus of Central Kurdish-English, Awta, containing 229,222 pairs of manually aligned translations. Our corpus is collected from different text genres and domains in an attempt to build more robust and real-world applications of machine translation. We make a portion of this corpus publicly available in order to foster research in this area. Further, we build several neural machine translation models in order to benchmark the task of Kurdish machine translation. Additionally, we perform an extensive experimental analysis of the results in order to identify the major challenges that Central Kurdish machine translation faces. These challenges include language-dependent and language-independent ones, as categorized in this paper; the first group reflects Central Kurdish linguistic properties at different morphological, syntactic and semantic levels. Our best performing systems achieve 22.72 and 16.81 in BLEU score for Ku$\rightarrow$En and En$\rightarrow$Ku, respectively.


[113] 2106.09326

Towards bio-inspired unsupervised representation learning for indoor aerial navigation

Aerial navigation in GPS-denied, indoor environments, is still an open challenge. Drones can perceive the environment from a richer set of viewpoints, while having more stringent compute and energy constraints than other autonomous platforms. To tackle that problem, this research displays a biologically inspired deep-learning algorithm for simultaneous localization and mapping (SLAM) and its application in a drone navigation system. We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors, that mitigates the sensitivity to perceptual aliasing, and works on power-efficient, embedded hardware. The designed algorithm is evaluated on a dataset collected in an indoor warehouse environment, and initial results show the feasibility for robust indoor aerial navigation.


[114] 2106.09329

Network Science, Homophily and Who Reviews Who in the Linux Kernel?

In this research, we investigate peer review in the development of Linux by drawing on network theory and network analysis. We frame an analytical model which integrates the sociological principle of homophily (i.e., the relational tendency of individuals to establish relationships with similar others) with prior research on peer review in general and open-source software in particular. We found a relatively strong homophily tendency for maintainers to review other maintainers, but a comparable tendency is surprisingly absent with respect to developers' organizational affiliation. These results mirror the documented norms, beliefs, values, processes, policies, and social hierarchies that characterize Linux kernel development. Our results underline the power of generative mechanisms from network theory to explain the evolution of peer review networks. Regarding practitioners' concern over the Linux commercialization trend, no relational bias in peer review was found despite the increasing involvement of firms.


[115] 2106.09330

A Simple Generative Network

Generative neural networks are able to mimic intricate probability distributions such as those of handwritten text, natural images, etc. Since their inception, several models have been proposed. The most successful of these, based on adversarial (GAN), auto-encoding (VAE) and maximum mean discrepancy (MMD) approaches, rely on relatively complex architectures and training schemes. Surprisingly, a very simple architecture (a single feed-forward neural network) in conjunction with an obvious optimization goal (the Kullback-Leibler divergence) was apparently overlooked. This paper demonstrates that such a model (denoted SGN for its simplicity) is able to generate samples that are visually and quantitatively competitive with the aforementioned state-of-the-art methods.


[116] 2106.09331

A new robotic hand based on the design of fingers with spatial motions

This article presents a new hand architecture with three under-actuated fingers. Each finger performs spatial movements to achieve more complex and varied grasping than the existing planar-movement fingers. The purpose of this hand is to grasp complex-shaped workpieces as they leave the machining centres. Among the taxonomy of grips, cylindrical and spherical grips are often used to grasp heavy objects. A combination of these two modes makes it possible to capture most of the workpieces machined with 5-axis machines. However, the change in grasping mode requires the fingers to reconfigure themselves to perform spatial movements. This solution requires the addition of two or three actuators to change the position of the fingers and requires sensors to recognize the shape of the workpiece and determine the type of grasp to be used. This article proposes to extend the notion of under-actuated fingers to spatial movements. After a presentation of the kinematics of the fingers, the problem of stability is discussed as well as the transmission of forces in this mechanism. The complete approach for calculating the stability conditions is presented from the study of Jacobian force transmission matrices. CAD representations of the hand and its behavior in spherical and cylindrical grips are presented.


[117] 2106.09336

THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers

We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images. Key to our methodology is an intermediate 3d marker representation, where we aim to combine the predictive power of model-free-output architectures and the regularizing, anthropometrically-preserving properties of a statistical human surface model like GHUM -- a recently introduced, expressive full body statistical 3d human model, trained end-to-end. Our novel transformer-based prediction pipeline can focus on image regions relevant to the task, supports self-supervised regimes, and ensures that solutions are consistent with human anthropometry. We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models, for the task of inferring 3d human shape, joint positions, and global translation. Moreover, we observe very solid 3d reconstruction performance for difficult human poses collected in the wild.


[118] 2106.09338

Investigating Misinformation Dissemination on Social Media in Pakistan

Fake news and misinformation are among the most significant challenges brought about by advances in communication technologies. We chose to research the spread of fake news in Pakistan because of some unfortunate incidents that took place during 2020. These included the downplaying of the severity of the COVID-19 pandemic and protests by right-wing political movements. We observed that fake news and misinformation contributed significantly to these events and especially affected low-literate and low-income populations. We conducted a cross-platform comparison of misinformation on WhatsApp, Twitter and YouTube, with a primary focus on messages shared in public WhatsApp groups, and analysed the characteristics of misinformation, the techniques used to make it believable, and how users respond to it. To the best of our knowledge, this is the first attempt to compare misinformation on all three platforms in Pakistan. Data collected over a span of eight months helped us identify fake news and misinformation related to politics, religion and health, among other categories. Common elements which were used by fake news creators in Pakistan to make false content seem believable included: appeals to emotion, conspiracy theories, political and religious polarization, incorrect facts and impersonation of credible sources.


[119] 2106.09339

Optimal explicit stabilized postprocessed $τ$-leap method for the simulation of chemical kinetics

The simulation of chemical kinetics involving multiple scales constitutes a modeling challenge (from ordinary differential equations to Markov chains) and a computational challenge (multiple scales, large dynamical systems, time step restrictions). In this paper we propose a new discrete stochastic simulation algorithm: the postprocessed second kind stabilized orthogonal $\tau$-leap Runge-Kutta method (PSK-$\tau$-ROCK). In the context of chemical kinetics this method can be seen as a stabilization of Gillespie's explicit $\tau$-leap combined with a postprocessor. The stabilization allows simulating problems with multiple scales (stiff problems), while the postprocessing allows approximating the invariant measure (e.g. mean and variance) of ergodic stochastic dynamical systems. We prove stability and accuracy of the PSK-$\tau$-ROCK. Numerical experiments illustrate the high reliability and efficiency of the scheme when compared to other $\tau$-leap methods.
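
For orientation, the sketch below implements the plain explicit tau-leap that the proposed scheme stabilizes and postprocesses: each reaction fires a Poisson number of times per step. It is the baseline method on a toy birth-death model, not the PSK-$\tau$-ROCK scheme itself, and the clamping of negative populations is a naive safeguard of our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def tau_leap_step(x, stoichiometry, propensities, tau):
    """One explicit tau-leap step: each reaction j fires a Poisson(a_j(x)*tau)
    number of times, and the state is updated by the stoichiometric vectors."""
    a = propensities(x)
    firings = rng.poisson(a * tau)
    # Clamp to avoid negative populations in this naive sketch.
    return np.maximum(x + stoichiometry.T @ firings, 0.0)

# Toy birth-death process: 0 -> S (rate k1), S -> 0 (rate k2 * S).
k1, k2 = 10.0, 0.1
stoich = np.array([[+1], [-1]])                 # one species, two reactions
prop = lambda x: np.array([k1, k2 * x[0]])

x = np.array([0.0])
for _ in range(200):
    x = tau_leap_step(x, stoich, prop, tau=0.1)
print(x)   # fluctuates around the stationary mean k1 / k2 = 100
```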


[120] 2106.09343

Lost in Interpreting: Speech Translation from Source or Interpreter?

Interpreters facilitate multi-lingual meetings but the affordable set of languages is often smaller than what is needed. Automatic simultaneous speech translation can extend the set of provided languages. We investigate if such an automatic system should rather follow the original speaker, or an interpreter to achieve better translation quality at the cost of increased delay. To answer the question, we release Europarl Simultaneous Interpreting Corpus (ESIC), 10 hours of recordings and transcripts of European Parliament speeches in English, with simultaneous interpreting into Czech and German. We evaluate quality and latency of speaker-based and interpreter-based spoken translation systems from English to Czech. We study the differences in implicit simplification and summarization of the human interpreter compared to a machine translation system trained to shorten the output to some extent. Finally, we perform human evaluation to measure information loss of each of these approaches.


[121] 2106.09344

Virtual Reality based Digital Twin System for remote laboratories and online practical learning

The current pandemic has demonstrated the need for remote learning and virtual learning applications such as virtual reality (VR) and tablet-based solutions. Creating complex learning scenarios by developers is highly time-consuming and can take over a year. There is a need to provide a simple method to enable lecturers to create their own content for their laboratory tutorials. Research is currently being undertaken into developing generic models to enable the semi-automatic creation of a virtual learning application. A case study describing the creation of a virtual learning application for an electrical laboratory tutorial is presented.


[122] 2106.09348

Hybrid high-order methods. A primer with application to solid mechanics

This book is organized into eight chapters. The first three gently introduce the basic principles of hybrid high-order methods on a linear diffusion problem, the key ideas underlying the mathematical analysis, and some useful variants of the method as well as links to other methods from the literature. The following four present various challenging applications to solid mechanics, including linear elasticity and hyperelasticity, elastodynamics, contact/friction, and plasticity. The last chapter reviews implementation aspects. This book is primarily intended for graduate students, researchers (in applied mathematics, numerical analysis, and computational mechanics), and engineers working in related fields of application. Basic knowledge of the devising and analysis of finite element methods is assumed. Special effort was made to streamline the presentation so as to pinpoint the essential ideas, address key mathematical aspects, present examples, and provide bibliographic pointers.


[123] 2106.09349

Blockchain Oracle Design Patterns

Blockchain is a form of distributed ledger technology (DLT) in which data is shared among users connected over the internet. Transactions are data state changes on the blockchain that are permanently recorded in a secure and transparent way without the need for a third party. Besides, the introduction of smart contracts has added programmability to the blockchain and revolutionized the software ecosystem, leading toward decentralized applications (DApps) and attracting businesses and organizations to employ this technology. Although promising, blockchains and smart contracts have no access to external (i.e., off-chain) systems where real-world data and events reside; consequently, the usability of smart contracts in terms of performance and programmability is limited to on-chain data. Hence, \emph{blockchain oracles} are introduced to mitigate this issue; they are defined as trusted third-party services that send and verify external information (i.e., feedback) and submit it to smart contracts to trigger state changes in the blockchain. In this paper, we study and analyze blockchain oracles with regard to how they provide feedback to the blockchain and smart contracts. We classify blockchain oracle techniques into two major groups: voting-based strategies and reputation-based ones. The former mainly relies on participants' stakes for outcome finalization, while the latter considers reputation in conjunction with authenticity proof mechanisms for data correctness and integrity. We then provide a structured description of patterns in detail for each classification and discuss research directions in the end.


[124] 2106.09350

Identifiability of AMP chain graph models

We study identifiability of Andersson-Madigan-Perlman (AMP) chain graph models, which are a common generalization of linear structural equation models and Gaussian graphical models. AMP models are described by DAGs on chain components which themselves are undirected graphs. For a known chain component decomposition, we show that the DAG on the chain components is identifiable if the determinants of the residual covariance matrices of the chain components are monotone non-decreasing in topological order. This condition extends the equal variance identifiability criterion for Bayes nets, and it can be generalized from determinants to any super-additive function on positive semidefinite matrices. When the component decomposition is unknown, we describe conditions that allow recovery of the full structure using a polynomial time algorithm based on submodular function minimization. We also conduct experiments comparing our algorithm's performance against existing baselines.


[125] 2106.09352

Large Scale Private Learning via Low-rank Reparametrization

We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks, namely 1) the huge memory cost of storing individual gradients and 2) the added noise, which suffers from a notorious dependence on the dimensionality. Specifically, we reparametrize each weight matrix with two \emph{gradient-carrier} matrices of small dimension and a \emph{residual weight} matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design \emph{reparametrized gradient perturbation (RGP)}, which perturbs the gradients on the gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified with deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first to apply differential privacy to the BERT model and achieve an average accuracy of $83.9\%$ on four downstream tasks with $\epsilon=8$, which is within a $5\%$ loss compared to the non-private baseline but enjoys much lower privacy leakage risk.
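
The reparametrization arithmetic can be sketched compactly: project the weight gradient onto two small carrier matrices, clip and noise those projections, and map them back to a full-size update. The NumPy sketch below uses a single aggregated gradient and invented shapes; the actual method clips per example and chooses the carriers from historical updates, so this is only a schematic of the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(0)

def rgp_update(grad_W, L, R, clip=1.0, sigma=1.0):
    """Sketch of reparametrized gradient perturbation: project the weight
    gradient onto low-rank carrier matrices, clip and noise the projected
    gradients, then reconstruct an update for the full weight matrix."""
    grad_L = grad_W @ R.T          # (d, r) gradient on the left carrier
    grad_R = L.T @ grad_W          # (r, k) gradient on the right carrier

    def clip_and_noise(g):
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        return g + sigma * clip * rng.normal(size=g.shape)

    grad_L, grad_R = clip_and_noise(grad_L), clip_and_noise(grad_R)
    # Reconstruct a full-size update from the two small noisy gradients.
    return grad_L @ R + L @ grad_R

d, k, r = 128, 64, 4               # weight is d x k, carriers have rank r
W = rng.normal(size=(d, k))
L, R = rng.normal(size=(d, r)), rng.normal(size=(r, k))
grad_W = rng.normal(size=(d, k))   # stand-in for a back-propagated gradient
W -= 0.01 * rgp_update(grad_W, L, R)
```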


[126] 2106.09354

Retrospective Analysis of Controversial Subtopics on COVID-19 in Japan

For efficient political decision-making in an emergency situation, a thorough recognition and understanding of the polarized topics is crucial. The cost of unmitigated polarization would be extremely high for the society; therefore, it is desirable to identify the polarizing issues before they become serious. With this in mind, we conducted a retrospective analysis of the polarized subtopics of COVID-19 to obtain insights for future policymaking. To this end, we first propose a framework to comprehensively search for controversial subtopics. We then retrospectively analyze subtopics on COVID-19 using the proposed framework, with data obtained via Twitter in Japan. The results show that the proposed framework can effectively detect controversial subtopics that reflect current reality. Controversial subtopics tend to be about the government, medical matters, economy, and education; moreover, the controversy score had a low correlation with the traditional indicators--scale and sentiment of the subtopics--which suggests that the controversy score is a potentially important indicator to be obtained. We also discussed the difference between subtopics that became highly controversial and ones that did not despite their large scale.


[127] 2106.09357

Cat-like Jumping and Landing of Legged Robots in Low-gravity Using Deep Reinforcement Learning

In this article, we show that learned policies can be applied to solve legged locomotion control tasks with extensive flight phases, such as those encountered in space exploration. Using an off-the-shelf deep reinforcement learning algorithm, we trained a neural network to control a jumping quadruped robot while solely using its limbs for attitude control. We present tasks of increasing complexity leading to a combination of three-dimensional (re-)orientation and landing locomotion behaviors of a quadruped robot traversing simulated low-gravity celestial bodies. We show that our approach easily generalizes across these tasks and successfully trains policies for each case. Using sim-to-real transfer, we deploy trained policies in the real world on the SpaceBok robot placed on an experimental testbed designed for two-dimensional micro-gravity experiments. The experimental results demonstrate that repetitive, controlled jumping and landing with natural agility is possible.


[128] 2106.09358

ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks

Deep neural networks have enormous representational power, which leads them to overfit on most datasets. Thus, regularizing them is important in order to reduce overfitting and enhance their generalization capabilities. Recently, the channel shuffle operation has been introduced for mixing channels in group convolutions in resource-efficient networks in order to reduce memory and computation. This paper studies the channel shuffle operation as a regularization technique in deep convolutional networks. We show that while randomly shuffling entire channels during training drastically reduces performance, randomly shuffling small patches between channels significantly improves it. The patches to be shuffled are picked from the same spatial locations in the feature maps, such that a patch, when transferred from one channel to another, acts as structured noise for the latter channel. We call this method "ShuffleBlock". The proposed ShuffleBlock module is easy to implement and improves the performance of several baseline networks on the task of image classification on the CIFAR and ImageNet datasets. It also achieves comparable, and in many cases better, performance than many other regularization methods. We provide several ablation studies on selecting various hyperparameters of the ShuffleBlock module and propose a new scheduling method that further enhances its performance.
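
A rough NumPy sketch of the patch-swapping operation described above follows; the patch size, number of swaps and sampling scheme are assumptions of ours, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_block(features, patch=4, num_swaps=8):
    """Training-time regularizer sketch: pick random spatial patches and swap
    them between two randomly chosen channels of the same feature map, so the
    receiving channel sees structured noise from a sibling channel."""
    x = features.copy()                      # (channels, height, width)
    c, h, w = x.shape
    for _ in range(num_swaps):
        c1, c2 = rng.choice(c, size=2, replace=False)
        i = rng.integers(0, h - patch + 1)
        j = rng.integers(0, w - patch + 1)
        tmp = x[c1, i:i + patch, j:j + patch].copy()
        x[c1, i:i + patch, j:j + patch] = x[c2, i:i + patch, j:j + patch]
        x[c2, i:i + patch, j:j + patch] = tmp
    return x

fmap = rng.normal(size=(32, 16, 16))   # toy feature map
augmented = shuffle_block(fmap)
```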


[129] 2106.09362

Frustratingly Easy Transferability Estimation

Transferability estimation has been an essential tool in selecting a pre-trained model and the layers in it to transfer, so as to maximize the performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. We propose a simple, efficient, and effective transferability measure named TransRate. With a single pass through the target data, TransRate measures the transferability as the mutual information between the features of target examples extracted by a pre-trained model and their labels. We overcome the challenge of efficient mutual information estimation by resorting to the coding rate, which serves as an effective alternative to entropy. TransRate is theoretically analyzed to be closely related to the performance after transfer learning. Despite its extraordinary simplicity in 10 lines of code, TransRate performs remarkably well in extensive evaluations on 22 pre-trained models and 16 downstream tasks.
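
One plausible reading of the coding-rate construction is sketched below: estimate the mutual-information-like score as the coding rate of all target features minus the class-weighted coding rate of per-class features. The exact normalization and the value of the distortion parameter are assumptions on our part, not necessarily the paper's 10-line implementation.

```python
import numpy as np

def coding_rate(Z, eps=1e-4):
    """Rate-distortion style coding rate of features Z (n samples x d dims):
    R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z)."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z.T @ Z)
    return 0.5 * logdet

def transrate_like(Z, y, eps=1e-4):
    """TransRate-style score: coding rate of all features minus the
    class-weighted coding rate of per-class features."""
    n = len(y)
    rate_all = coding_rate(Z, eps)
    rate_per_class = sum(
        (np.sum(y == c) / n) * coding_rate(Z[y == c], eps) for c in np.unique(y))
    return rate_all - rate_per_class

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 16))       # pre-trained features of target data
y = rng.integers(0, 4, size=200)     # target labels
print(transrate_like(Z, y))
```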


[130] 2106.09369

Wavelet-Packet Powered Deepfake Image Detection

As neural networks become more able to generate realistic artificial images, they have the potential to improve movies, music, video games and make the internet an even more creative and inspiring place. Yet, at the same time, the latest technology potentially enables new digital ways to lie. In response, the need for a diverse and reliable toolbox arises to identify artificial images and other content. Previous work primarily relies on pixel-space CNN or the Fourier transform. To the best of our knowledge, wavelet-based gan analysis and detection methods have been absent thus far. This paper aims to fill this gap and describes a wavelet-based approach to gan-generated image analysis and detection. We evaluate our method on FFHQ, CelebA, and LSUN source identification problems and find improved or competitive performance.


[131] 2106.09370

Deep generative modeling for probabilistic forecasting in power systems

Greater direct electrification of end-use sectors with a higher share of renewables is one of the pillars to power a carbon-neutral society by 2050. This study uses a recent deep learning technique, the normalizing flows, to produce accurate probabilistic forecasts that are crucial for decision-makers to face the new challenges in power systems applications. Through comprehensive empirical evaluations using the open data of the Global Energy Forecasting Competition 2014, we demonstrate that our methodology is competitive with other state-of-the-art deep learning generative models: generative adversarial networks and variational autoencoders. The models producing weather-based wind, solar power, and load scenarios are properly compared both in terms of forecast value, by considering the case study of an energy retailer, and quality using several complementary metrics.


[132] 2106.09373

Unsupervised Path Representation Learning with Curriculum Negative Sampling

Path representations are critical in a variety of transportation applications, such as estimating path ranking in path recommendation systems and estimating path travel time in navigation systems. Existing studies often learn task-specific path representations in a supervised manner, which require a large amount of labeled training data and generalize poorly to other tasks. We propose an unsupervised learning framework Path InfoMax (PIM) to learn generic path representations that work for different downstream tasks. We first propose a curriculum negative sampling method, for each input path, to generate a small amount of negative paths, by following the principles of curriculum learning. Next, \emph{PIM} employs mutual information maximization to learn path representations from both a global and a local view. In the global view, PIM distinguishes the representations of the input paths from those of the negative paths. In the local view, \emph{PIM} distinguishes the input path representations from the representations of the nodes that appear only in the negative paths. This enables the learned path representations to encode both global and local information at different scales. Extensive experiments on two downstream tasks, ranking score estimation and travel time estimation, using two road network datasets suggest that PIM significantly outperforms other unsupervised methods and is also able to be used as a pre-training method to enhance supervised path representation learning.


[133] 2106.09375

Recovery under Side Constraints

This paper addresses sparse signal reconstruction under various types of structural side constraints with applications in multi-antenna systems. Side constraints may result from prior information on the measurement system and the sparse signal structure. They may involve the structure of the sensing matrix, the structure of the non-zero support values, the temporal structure of the sparse representation vector, and the nonlinear measurement structure. First, we demonstrate how a priori information in the form of structural side constraints influences recovery guarantees (null space properties) using L1-minimization. Furthermore, for constant modulus signals, signals with row-, block- and rank-sparsity, as well as non-circular signals, we illustrate how structural prior information can be used to devise efficient algorithms with improved recovery performance and reduced computational complexity. Finally, we address the measurement system design for linear and nonlinear measurements of sparse signals. Moreover, we discuss the linear mixing matrix design based on coherence minimization. Then we extend our focus to nonlinear measurement systems, where we design parallel optimization algorithms to efficiently compute stationary points in the sparse phase retrieval problem with and without dictionary learning.


[134] 2106.09377

A New Dissipativity Condition for Asymptotic Stability of Discounted Economic MPC

Economic Model Predictive Control has recently gained popularity due to its ability to directly optimize a given performance criterion, while enforcing constraint satisfaction for nonlinear systems. Recent research has developed both numerical algorithms and stability analysis for the undiscounted case. The introduction of a discount factor in the cost, however, can be desirable in some cases of interest, e.g., economics, stochastically terminating processes, Markov decision processes, etc. Unfortunately, the stability theory in this case is still not fully developed. In this paper we propose a new dissipativity condition to prove asymptotic stability in the infinite horizon case and we connect our results with existing ones in the literature on discounted economic optimal control. Numerical examples are provided to illustrate the theoretical results.


[135] 2106.09380

Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems

The incremental diffusion of machine learning algorithms in support of cybersecurity is creating novel defensive opportunities but also new types of risks. Multiple studies have shown that machine learning methods are vulnerable to adversarial attacks that create tiny perturbations aimed at decreasing the effectiveness of detecting threats. We observe that existing literature assumes threat models that are inappropriate for realistic cybersecurity scenarios, because they consider opponents with complete knowledge of the cyber detector or who can freely interact with the target systems. By focusing on Network Intrusion Detection Systems based on machine learning, we identify and model the real capabilities and circumstances required by attackers to carry out feasible and successful adversarial attacks. We then apply our model to several adversarial attacks proposed in the literature and highlight the limits and merits that can result in actual adversarial attacks. The contributions of this paper can help harden defensive systems by letting cyber defenders address the most critical and real issues, and can benefit researchers by allowing them to devise novel forms of adversarial attacks based on realistic threat models.


[136] 2106.09383

Area Optimisation of Two Stage Miller Compensated Op-Amp in 65 nm Using Hybrid PSO

Analog circuit design can be formulated as a non-linear constrained optimisation problem that can be solved using any suitable optimisation algorithms. Different optimisation techniques have been reported to reduce the design time of analog circuits. A hybrid particle swarm optimisation algorithm with linearly decreasing inertia weight for the optimisation of analog circuit design is proposed in this study. The proposed method is used to design a two-stage operational amplifier circuit with Miller compensation. The results show that the proposed optimisation method can substantially reduce the design time needed for analog circuits.
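
The linearly decreasing inertia weight mentioned above is easy to illustrate in a plain PSO loop. The sketch below minimizes a toy objective standing in for the op-amp area/constraint cost and omits the hybridization details of the proposed algorithm; all parameter values are generic textbook choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_min(f, dim, bounds, n_particles=30, iters=200,
            w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
    """Plain PSO with a linearly decreasing inertia weight (a generic sketch of
    the ingredient highlighted in the abstract, not the full hybrid method)."""
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for t in range(iters):
        w = w_max - (w_max - w_min) * t / (iters - 1)   # inertia decreases linearly
        r1, r2 = rng.uniform(size=x.shape), rng.uniform(size=x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Toy objective standing in for the circuit-area cost under design constraints.
sphere = lambda p: float(np.sum(p**2))
best_x, best_val = pso_min(sphere, dim=5, bounds=(-5.0, 5.0))
print(best_val)   # close to 0
```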


[137] 2106.09385

On the Dark Side of Calibration for Modern Neural Networks

Modern neural networks are highly uncalibrated, which poses a significant challenge for safety-critical systems seeking to utilise deep neural networks (DNNs) reliably. Many recently proposed approaches have demonstrated substantial progress in improving DNN calibration. However, they hardly touch upon refinement, which historically has been an essential aspect of calibration. Refinement indicates the separability of a network's correct and incorrect predictions. This paper presents a theoretically and empirically supported exposition for reviewing a model's calibration and refinement. Firstly, we show the breakdown of expected calibration error (ECE) into predicted confidence and refinement. Connecting with this result, we highlight that regularisation-based calibration only focuses on naively reducing a model's confidence, which logically comes at a severe cost to the model's refinement. We support our claims through rigorous empirical evaluations of many state-of-the-art calibration approaches on standard datasets. We find that many calibration approaches, such as label smoothing and mixup, lower the utility of a DNN by degrading its refinement. Even under natural data shift, this calibration-refinement trade-off holds for the majority of calibration methods. These findings call for an urgent retrospective into some popular pathways taken for modern DNN calibration.
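
For reference, expected calibration error is commonly estimated by binning predictions by confidence; the sketch below shows this standard estimator on hypothetical model outputs (the paper's exact decomposition into confidence and refinement is not reproduced here).

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Standard binned ECE: weighted average of |accuracy - confidence| per confidence bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.sum() == 0:
            continue
        acc = (predictions[in_bin] == labels[in_bin]).mean()
        conf = confidences[in_bin].mean()
        ece += in_bin.mean() * abs(acc - conf)   # bin weight times calibration gap
    return ece

# Hypothetical predictions from a 3-class model (random stand-ins for softmax outputs).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=1000)
labels = rng.integers(0, 3, size=1000)
print(expected_calibration_error(probs.max(1), probs.argmax(1), labels))
```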


[138] 2106.09388

Deep Subdomain Adaptation Network for Image Classification

For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. Previous deep domain adaptation methods mainly learn a global domain shift, i.e., align the global source and target distributions without considering the relationships between two subdomains within the same category of different domains, leading to unsatisfactory transfer learning performance because fine-grained information is not captured. Recently, more and more researchers have paid attention to Subdomain Adaptation, which focuses on accurately aligning the distributions of the relevant subdomains. However, most of them are adversarial methods which contain several loss functions and converge slowly. Based on this, we present Deep Subdomain Adaptation Network (DSAN) which learns a transfer network by aligning the relevant subdomain distributions of domain-specific layer activations across different domains based on a local maximum mean discrepancy (LMMD). Our DSAN is simple yet effective: it does not need adversarial training and converges quickly. The adaptation can be achieved easily with most feed-forward network models by extending them with the LMMD loss, which can be trained efficiently via back-propagation. Experiments demonstrate that DSAN can achieve remarkable results on both object recognition tasks and digit classification tasks. Our code will be available at: https://github.com/easezyc/deep-transfer-learning


[139] 2106.09393

Using Multiple Losses for Accurate Facial Age Estimation

Age estimation is an essential challenge in computer vision. With the advances of convolutional neural networks, the performance of age estimation has been dramatically improved. Existing approaches usually treat age estimation as a classification problem. However, age labels are ambiguous, which makes the classification task difficult. In this paper, we propose a simple yet effective approach for age estimation, which improves the performance compared to classification-based methods. The method combines four classification losses and one regression loss representing different class granularities together, and we name it Age-Granularity-Net. We validate the Age-Granularity-Net framework on the CVPR Chalearn 2016 dataset, and extensive experiments show that the proposed approach can reduce the prediction error compared to any individual loss. The source code link is https://github.com/yipersevere/age-estimation.
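
A minimal sketch of the loss combination the abstract describes, assuming four coarse-to-fine age granularities plus an L1 regression term; the specific bin sizes and equal weighting are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def age_granularity_loss(logits_per_level, age_pred, age_true, bin_sizes=(20, 10, 5, 1)):
    """Sum of cross-entropy losses at coarse-to-fine age granularities plus an L1 regression loss.

    logits_per_level: list of tensors, one per granularity, each of shape (B, 100 // bin_size).
    age_pred: regressed age, shape (B,); age_true: ground-truth age in years, shape (B,).
    """
    loss = F.l1_loss(age_pred, age_true)                      # regression term
    for logits, bin_size in zip(logits_per_level, bin_sizes):
        targets = (age_true // bin_size).long().clamp(max=logits.shape[1] - 1)
        loss = loss + F.cross_entropy(logits, targets)        # one classification term per granularity
    return loss

# Toy usage with random tensors standing in for network outputs.
B = 8
age_true = torch.randint(0, 100, (B,)).float()
logits = [torch.randn(B, 100 // b) for b in (20, 10, 5, 1)]
age_pred = torch.rand(B) * 100
print(age_granularity_loss(logits, age_pred, age_true))
```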


[140] 2106.09394

On temporal homogenization in the numerical simulation of atherosclerotic plaque growth

A temporal homogenization approach for the numerical simulation of atherosclerotic plaque growth is extended to fully coupled fluid-structure interaction (FSI) simulations. The numerical results indicate that the two-scale approach yields significantly different results compared to a simple heuristic averaging, where only stationary long-scale FSI problems are solved, confirming the importance of incorporating stress variations on small time-scales. In the homogenization approach, a fine-scale problem, periodic with respect to the heart beat, has to be solved for each long-scale time step. Even if no exact initial conditions are available, periodicity can be achieved within only 2-3 heart beats by simple time-stepping.


[141] 2106.09395

A Self-supervised Method for Entity Alignment

Entity alignment, aiming to identify equivalent entities across different knowledge graphs (KGs), is a fundamental problem for constructing large-scale KGs. Over the course of its development, supervision has been considered necessary for accurate alignments. Inspired by the recent progress of self-supervised learning, we explore the extent to which we can get rid of supervision for entity alignment. Existing supervised methods for this task focus on pulling each pair of positive (labeled) entities close to each other. However, our analysis suggests that the learning of entity alignment can actually benefit more from pushing sampled (unlabeled) negatives far away than from pulling positive aligned pairs close. We present SelfKG, which leverages this discovery to design a contrastive learning strategy across two KGs. Extensive experiments on benchmark datasets demonstrate that SelfKG without supervision can match or closely approach the results of state-of-the-art supervised baselines. The performance of SelfKG demonstrates that self-supervised learning offers great potential for entity alignment in KGs.


[142] 2106.09397

Quantized Federated Learning under Transmission Delay and Outage Constraints

Federated learning (FL) has been recognized as a viable distributed learning paradigm which trains a machine learning model collaboratively with massive mobile devices in the wireless edge while protecting user privacy. Although various communication schemes have been proposed to expedite the FL process, most of them have assumed ideal wireless channels which provide reliable and lossless communication links between the server and mobile clients. Unfortunately, in practical systems with limited radio resources, such as constraints on the training latency, transmission power, and bandwidth, transmission of a large number of model parameters inevitably suffers from quantization errors (QE) and transmission outage (TO). In this paper, we consider such non-ideal wireless channels, and carry out the first analysis showing that the FL convergence can be severely jeopardized by TO and QE, but intriguingly can be alleviated if the clients have uniform outage probabilities. These insightful results motivate us to propose a robust FL scheme, named FedTOE, which performs joint allocation of wireless resources and quantization bits across the clients to minimize the QE while making the clients have the same TO probability. Extensive experimental results are presented to show the superior performance of FedTOE for a deep learning-based classification task with transmission latency constraints.
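
As background for the quantization-error term, the sketch below shows unbiased uniform stochastic quantization of a parameter vector to a fixed number of bits, a common building block in quantized FL; FedTOE's joint bit and resource allocation is not modelled here.

```python
import numpy as np

def quantize(update, n_bits=4, rng=None):
    """Uniform stochastic quantization of a parameter vector to 2**n_bits levels (unbiased)."""
    rng = rng if rng is not None else np.random.default_rng()
    lo, hi = update.min(), update.max()
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (update - lo) / scale                 # values in [0, levels]
    floor = np.floor(normalized)
    # Round up with probability equal to the fractional part, so the quantizer is unbiased.
    q = floor + (rng.random(update.shape) < (normalized - floor))
    return lo + q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)                        # stand-in for a local model update
w_q = quantize(w, n_bits=4, rng=rng)
print("mean absolute quantization error:", np.abs(w - w_q).mean())
```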


[143] 2106.09398

Episode Adaptive Embedding Networks for Few-shot Learning

Few-shot learning aims to learn a classifier using a few labelled instances for each class. Metric-learning approaches for few-shot learning embed instances into a high-dimensional space and conduct classification based on distances among instance embeddings. However, such instance embeddings are usually shared across all episodes and thus lack the discriminative power to generalize classifiers according to episode-specific features. In this paper, we propose a novel approach, namely \emph{Episode Adaptive Embedding Network} (EAEN), to learn episode-specific embeddings of instances. By leveraging the probability distributions of all instances in an episode at each channel-pixel embedding dimension, EAEN can not only alleviate the overfitting issue encountered in few-shot learning tasks, but also capture discriminative features specific to an episode. To empirically verify the effectiveness and robustness of EAEN, we have conducted extensive experiments on three widely used benchmark datasets, under various combinations of different generic embedding backbones and different classifiers. The results show that EAEN significantly improves classification accuracy by about $10\%$ to $20\%$ in different settings over the state-of-the-art methods.


[144] 2106.09399

Data sharing of computer scientists: an analysis of current research information system data

Without sufficient information about researchers' data sharing, there is a risk of mismatching FAIR data service efforts with the needs of researchers. This study describes a methodology where departmental publications are used to analyse the ways in which computer scientists share research data. All journal articles published by researchers in the computer science department of the case study's university during 2019 were extracted for scrutiny from the current research information system. For these 193 articles, a coding framework was developed to capture the key elements of acquiring and sharing research data. Furthermore, a rudimentary classification of the main study types exhibited in the investigated articles was developed to accommodate the multidisciplinary nature of the case department's research agenda. Human interaction and intervention studies often collected original data, whereas research on novel computational methods and life sciences more frequently used openly available data. Articles that made data available for reuse were most often in life science studies, whereas data sharing was least frequent in human interaction studies. The use of open code was most frequent in life science studies and novel computational methods. The findings highlight that multidisciplinary research organisations may include diverse subfields that have their own cultures of data sharing, and suggest that research information system-based methods may be valuable additions to the questionnaire and interview methodologies for eliciting insight into researchers' data sharing. The collected data and coding framework are provided as open data to facilitate future research.


[145] 2106.09402

Class Balancing GAN with a Classifier in the Loop

Generative Adversarial Networks (GANs) have swiftly evolved to imitate increasingly complex image distributions. However, the majority of developments focus on the performance of GANs on balanced datasets. We find that the existing GANs and their training regimes which work well on balanced datasets fail to be effective in the case of imbalanced (i.e. long-tailed) datasets. In this work we introduce a novel theoretically motivated Class Balancing regularizer for training GANs. Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset. This is achieved via modelling the effective class frequency based on the exponential forgetting observed in neural networks and encouraging the GAN to focus on underrepresented classes. We demonstrate the utility of our regularizer in learning representations for long-tailed distributions via achieving better performance than existing approaches over multiple datasets. Specifically, when applied to an unconditional GAN, it improves the FID from $13.03$ to $9.01$ on the long-tailed iNaturalist-$2019$ dataset.
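
A hedged sketch of one way to maintain an exponentially discounted class-frequency estimate and convert it into class weights; the decay value and the inverse-frequency weighting are assumptions for illustration, not the paper's exact regularizer.

```python
import numpy as np

class EffectiveClassFrequency:
    """Exponentially discounted running count of how often each class has been seen."""

    def __init__(self, n_classes, decay=0.99):
        self.decay = decay
        self.counts = np.zeros(n_classes)

    def update(self, batch_labels):
        batch_counts = np.bincount(batch_labels, minlength=len(self.counts))
        self.counts = self.decay * self.counts + batch_counts
        return self.counts

    def class_weights(self, eps=1e-8):
        # Encourage under-represented classes by weighting inversely to effective frequency.
        inv = 1.0 / (self.counts + eps)
        return inv / inv.sum()

rng = np.random.default_rng(0)
freq = EffectiveClassFrequency(n_classes=5)
for _ in range(100):
    # Long-tailed batches: class 0 dominates.
    labels = rng.choice(5, size=64, p=[0.6, 0.2, 0.1, 0.07, 0.03])
    freq.update(labels)
print(freq.class_weights())
```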


[146] 2106.09403

Density of Free Modules over Finite Chain Rings

In this paper we focus on modules over a finite chain ring $\mathcal{R}$ of size $q^s$. We compute the density of free modules of $\mathcal{R}^n$, where we separately treat the asymptotics in $n,q$ and $s$. In particular, we focus on two cases: one where we fix the type of the module and one where we fix the rank of the module. In both cases, the density results can be bounded by the Andrews-Gordon identities. We also study the asymptotic behaviour of modules generated by random matrices over $\mathcal{R}$. Since linear codes over $\mathcal{R}$ are submodules of $\mathcal{R}^n$ we get direct implications for coding theory. For example, we show that random codes achieve the Gilbert-Varshamov bound with high probability.


[147] 2106.09407

A Fait Accompli? An Empirical Study into the Absence of Consent to Third-Party Tracking in Android Apps

Third-party tracking allows companies to collect users' behavioural data and track their activity across digital devices. This can put deep insights into users' private lives into the hands of strangers, and often happens without users' awareness or explicit consent. EU and UK data protection law, however, requires consent, both 1) to access and store information on users' devices and 2) to legitimate the processing of personal data as part of third-party tracking, as we analyse in this paper. This paper further investigates whether and to what extent consent is implemented in mobile apps. First, we analyse a representative sample of apps from the Google Play Store. We find that most apps engage in third-party tracking, but few obtained consent before doing so, indicating potentially widespread violations of EU and UK privacy law. Second, we examine the most common third-party tracking libraries in detail. While most acknowledge that they rely on app developers to obtain consent on their behalf, they typically fail to put in place robust measures to ensure this: disclosure of consent requirements is limited; default consent implementations are lacking; and compliance guidance is difficult to find, hard to read, and poorly maintained.


[148] 2106.09408

Predicting cognitive scores with graph neural networks through sample selection learning

Analyzing the relation between intelligence and neural activity is of the utmost importance in understanding the working principles of the human brain in health and disease. In existing literature, functional brain connectomes have been used successfully to predict cognitive measures such as intelligence quotient (IQ) scores in both healthy and disordered cohorts using machine learning models. However, existing methods resort to flattening the brain connectome (i.e., graph) through vectorization which overlooks its topological properties. To address this limitation, and inspired by the emerging graph neural networks (GNNs), we design a novel regression GNN model (namely RegGNN) for predicting IQ scores from brain connectivity. On top of that, we introduce a novel, fully modular sample selection method to select the best samples to learn from for our target prediction task. However, since such deep learning architectures are computationally expensive to train, we further propose a \emph{learning-based sample selection} method that learns how to choose the training samples with the highest expected predictive power on unseen samples. For this, we capitalize on the fact that connectomes (i.e., their adjacency matrices) lie in the symmetric positive definite (SPD) matrix cone. Our results on full-scale and verbal IQ prediction outperform comparison methods in autism spectrum disorder cohorts and achieve competitive performance for neurotypical subjects using 3-fold cross-validation. Furthermore, we show that our sample selection approach generalizes to other learning-based methods, which shows its usefulness beyond our GNN architecture.


[149] 2106.09420

Remote Health-Monitoring of First Responders over TETRA Links

In this paper, we present a system for remote health-monitoring of first responders over TETRA radio links. The system features a smart garment that periodically records and sends physiological parameters of first responders to a remote agent, which processes the recordings and feeds back the health-status notifications and warnings in the form of electrotactile stimuli. The choice of TETRA as the connectivity solution is driven by its routine use by first responders all over the world, thus representing a convenient and economically-effective connectivity basis. Although the support for data communications in TETRA is rather limited and in practice reduced to the Short Data Service, we show that TETRA can serve the intended purpose in the considered scenario, achieving tolerable delay and message-loss performance. Moreover, when the system is examined and optimized in terms of the peak Age-of-Information, a metric suitable to characterize the quasi-periodic nature of the monitoring process, its performance becomes rather favorable, enabling timely monitoring of the first responders' health status.
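
For readers unfamiliar with the metric, the sketch below computes peak Age-of-Information from generation and delivery timestamps of quasi-periodic updates; the timing values are hypothetical and unrelated to TETRA measurements.

```python
import numpy as np

def peak_aoi(generation_times, delivery_times):
    """Peak Age-of-Information: the age at the instant just before each new update arrives.

    For update i delivered at d_i, the peak age is d_i - g_{i-1}, i.e. the time elapsed
    since the previously delivered update was generated.
    """
    g = np.asarray(generation_times, dtype=float)
    d = np.asarray(delivery_times, dtype=float)
    return d[1:] - g[:-1]

# Quasi-periodic monitoring updates generated every 10 s with random network delays.
rng = np.random.default_rng(0)
gen = np.arange(0, 200, 10.0)
delivery = gen + rng.uniform(0.5, 3.0, size=gen.size)
print("mean peak AoI:", peak_aoi(gen, delivery).mean(), "s")
```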


[150] 2106.09421

State Estimation with Model Reduction and Shape Variability. Application to biomedical problems

We develop a mathematical and numerical framework to solve state estimation problems for applications that present variations in the shape of the spatial domain. This situation arises typically in a biomedical context where inverse problems are posed on certain organs or portions of the body which inevitably involve morphological variations. If one wants to provide fast reconstruction methods, the algorithms must take into account the geometric variability. We develop and analyze a method which makes it possible to take this variability into account without needing any a priori knowledge of a parametrization of the geometrical variations. For this, we rely on morphometric techniques involving Multidimensional Scaling, and couple them with reconstruction algorithms that make use of reduced model spaces pre-computed on a database of geometries. We demonstrate the potential of the method on a synthetic test problem inspired by the reconstruction of blood flows and quantities of medical interest with Doppler ultrasound imaging.
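
A minimal sketch of the morphometric step built on Multidimensional Scaling, using scikit-learn on a hypothetical precomputed dissimilarity matrix between geometries; the shape distances and dimensions are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical pairwise shape dissimilarities between 6 geometries in a database.
rng = np.random.default_rng(0)
pts = rng.standard_normal((6, 3))
dissimilarity = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Embed the geometries into a low-dimensional "shape space" coordinate system.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)
print(coords)
```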


[151] 2106.09422

CRIL: Continual Robot Imitation Learning via Generative and Prediction Model

Imitation learning (IL) algorithms have shown promising results for robots to learn skills from expert demonstrations. However, for today's versatile robots that need to learn diverse tasks, providing and learning multi-task demonstrations all at once are both difficult. To solve this problem, in this work we study how to realize a continual imitation learning ability that empowers robots to continually learn new tasks one by one, thus reducing the burden of multi-task IL and accelerating the process of new task learning at the same time. We propose a novel trajectory generation model that employs both a generative adversarial network and a dynamics prediction model to generate pseudo trajectories from all learned tasks in the new task learning process to achieve continual imitation learning ability. Our experiments on both simulation and real world manipulation tasks demonstrate the effectiveness of our method.


[152] 2106.09424

Interpretable Machine Learning Classifiers for Brain Tumour Survival Prediction

Prediction of survival in patients diagnosed with a brain tumour is challenging because of heterogeneous tumour behaviours and responses to treatment. Better estimations of prognosis would support treatment planning and patient support. Advances in machine learning have informed development of clinical predictive models, but their integration into clinical practice is almost non-existent. One reason for this is the lack of interpretability of the models. In this paper, we use a novel brain tumour dataset to compare two interpretable rule list models against popular machine learning approaches for brain tumour survival prediction. All models are quantitatively evaluated using standard performance metrics. The rule lists are also qualitatively assessed for their interpretability and clinical utility. The interpretability of the black box machine learning models is evaluated using two post-hoc explanation techniques, LIME and SHAP. Our results show that the rule lists were only slightly outperformed by the black box models. We demonstrate that rule list algorithms produced simple decision lists that align with clinical expertise. By comparison, post-hoc interpretability methods applied to black box models may produce unreliable explanations of local model predictions. Model interpretability is essential for understanding differences in predictive performance and for integration into clinical practice.


[153] 2106.09431

NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed forward pass, a smooth interpolation and point-to-point correspondences between them. The interpolation, expressed as a deformation field, changes the pose of the source shape to resemble the target, but leaves the object identity unchanged. NeuroMorph uses an elegant architecture combining graph convolutions with global feature pooling to extract local features. During training, the model is incentivized to create realistic deformations by approximating geodesics on the underlying shape space manifold. This strong geometric prior allows us to train our model end-to-end and in a fully unsupervised manner without requiring any manual correspondence annotations. NeuroMorph works well for a large variety of input shapes, including non-isometric pairs from different object categories. It obtains state-of-the-art results for both shape correspondence and interpolation tasks, matching or surpassing the performance of recent unsupervised and supervised methods on multiple benchmarks.


[154] 2106.09432

Unsupervised Training Data Generation of Handwritten Formulas using Generative Adversarial Networks with Self-Attention

The recognition of handwritten mathematical expressions in images and video frames remains a difficult and unsolved problem. Deep convolutional neural networks are in principle a promising approach, but typically require a large amount of labeled training data. However, such a large training dataset does not exist for the task of handwritten formula recognition. In this paper, we introduce a system that creates a large set of synthesized training examples of mathematical expressions which are derived from LaTeX documents. For this purpose, we propose a novel attention-based generative adversarial network to translate rendered equations to handwritten formulas. The datasets generated by this approach contain hundreds of thousands of formulas, making it ideal for pretraining or the design of more complex models. We evaluate our synthesized dataset and the recognition approach on the CROHME 2014 benchmark dataset. Experimental results demonstrate the feasibility of the approach.


[155] 2106.09433

Towards Heterogeneous Clients with Elastic Federated Learning

Federated learning involves training machine learning models over devices or data silos, such as edge processors or data warehouses, while keeping the data local. Training in heterogeneous and potentially massive networks introduces bias into the system, which originates from the non-IID data and the low participation rates encountered in practice. In this paper, we propose Elastic Federated Learning (EFL), an unbiased algorithm to tackle the heterogeneity in the system, which makes the most informative parameters less volatile during training, and utilizes the incomplete local updates. It is an efficient and effective algorithm that compresses both upstream and downstream communications. Theoretically, the algorithm has a convergence guarantee when training on non-IID data at a low participation rate. Empirical experiments corroborate the competitive performance of the EFL framework in terms of robustness and efficiency.


[156] 2106.09435

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.


[157] 2106.09436

Semi-Autoregressive Transformer for Image Captioning

Current state-of-the-art image captioning models adopt autoregressive decoders, i.e., they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. To tackle this issue, non-autoregressive image captioning models have recently been proposed to significantly accelerate the speed of inference by generating all words in parallel. However, these non-autoregressive models inevitably suffer from large generation quality degradation since they excessively remove word dependencies. To make a better trade-off between speed and quality, we introduce a semi-autoregressive model for image captioning (dubbed SATIC), which keeps the autoregressive property globally while generating words in parallel locally. Based on Transformer, only a few modifications are needed to implement SATIC. Extensive experiments on the MSCOCO image captioning benchmark show that SATIC can achieve a better trade-off without bells and whistles. Code is available at https://github.com/YuanEZhou/satic.


[158] 2106.09440

ÐArcher: Detecting On-Chain-Off-Chain Synchronization Bugs in Decentralized Applications

Since the emergence of Ethereum, blockchain-based decentralized applications (DApps) have become increasingly popular and important. To balance the security, performance, and costs, a DApp typically consists of two layers: an on-chain layer to execute transactions and store crucial data on the blockchain and an off-chain layer to interact with users. A DApp needs to synchronize its off-chain layer with the on-chain layer proactively. Otherwise, the inconsistent data in the off-chain layer could mislead users and cause undesirable consequences, e.g., loss of transaction fees. However, transactions sent to the blockchain are not guaranteed to be executed and could even be reversed after execution due to chain reorganization. Such non-determinism in the transaction execution is unique to blockchain. DApp developers may fail to perform the on-chain-off-chain synchronization accurately due to their lack of familiarity with the complex transaction lifecycle. In this work, we investigate the challenges of synchronizing on-chain and off-chain data in Ethereum-based DApps. We present two types of bugs that could result in inconsistencies between the on-chain and off-chain layers. To help detect such on-chain-off-chain synchronization bugs, we introduce a state transition model to guide the testing of DApps and propose two effective oracles to facilitate the automatic identification of bugs. We build the first testing framework, DArcher, to detect on-chain-off-chain synchronization bugs in DApps. We have evaluated DArcher on 11 popular real-world DApps. DArcher achieves high precision (99.3%), recall (87.6%), and accuracy (89.4%) in bug detection and significantly outperforms the baseline methods. It has found 15 real bugs in the 11 DApps. So far, six of the 15 bugs have been confirmed by the developers, and three have been fixed. These promising results demonstrate the usefulness of DArcher.


[159] 2106.09442

Dynamic Metasurface Antennas for Energy Efficient Massive MIMO Uplink Communications

Future wireless communications are largely inclined to deploy a massive number of antennas at the base stations (BS) by exploiting energy-efficient and environmentally friendly technologies. An emerging technology called dynamic metasurface antennas (DMAs) is promising to realize such massive antenna arrays with reduced physical size, hardware cost, and power consumption. This paper aims to optimize the energy efficiency (EE) performance of DMAs-assisted massive MIMO uplink communications. We propose an algorithmic framework for designing the transmit precoding of each multi-antenna user and the DMAs tuning strategy at the BS to maximize the EE performance, considering the availability of the instantaneous and statistical channel state information (CSI), respectively. Specifically, the proposed framework includes Dinkelbach's transform, alternating optimization, and deterministic equivalent methods. In addition, we obtain a closed-form solution to the optimal transmit signal directions for the statistical CSI case, which simplifies the corresponding transmission design. The numerical results show good convergence performance of our proposed algorithms as well as considerable EE performance gains of the DMAs-assisted massive MIMO uplink communications over the baseline schemes.
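
Dinkelbach's transform, the first ingredient of the proposed framework, reduces a ratio maximization to a sequence of parametrized subproblems; the sketch below applies it to a toy rate-over-power ratio and is not the paper's precoding or DMA tuning design.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dinkelbach(f, g, solve_inner, lam0=0.0, tol=1e-8, max_iter=100):
    """Maximize f(x)/g(x) with g > 0 by iteratively maximizing f(x) - lam * g(x)."""
    lam = lam0
    for _ in range(max_iter):
        x = solve_inner(lam)                  # argmax_x f(x) - lam * g(x)
        F = f(x) - lam * g(x)
        if abs(F) < tol:
            return x, lam                     # at convergence, lam equals the optimal ratio
        lam = f(x) / g(x)
    return x, lam

# Toy energy-efficiency-style ratio: a rate-like numerator over a power-like denominator.
f = lambda p: np.log2(1.0 + 4.0 * p)          # "achievable rate" as a function of power p
g = lambda p: p + 0.5                         # transmit power plus fixed circuit power

def solve_inner(lam):
    res = minimize_scalar(lambda p: -(f(p) - lam * g(p)), bounds=(0.0, 10.0), method="bounded")
    return res.x

p_star, ee = dinkelbach(f, g, solve_inner)
print(f"optimal power {p_star:.3f}, energy efficiency {ee:.3f}")
```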


[160] 2106.09445

A structure-preserving surrogate model for the closure of the moment system of the Boltzmann equation using convex deep neural networks

Direct simulation of physical processes on a kinetic level is prohibitively expensive in aerospace applications due to the extremely high dimension of the solution spaces. In this paper, we consider the moment system of the Boltzmann equation, which projects the kinetic physics onto the hydrodynamic scale. The unclosed moment system can be solved in conjunction with the entropy closure strategy. Using an entropy closure provides structural benefits to the physical system of partial differential equations. Computing this closure usually accounts for the majority of the total computational cost, since one needs to solve an ill-conditioned constrained optimization problem. Therefore, we build a neural network surrogate model to close the moment system, which preserves the structural properties of the system by design, but reduces the computational cost significantly. Numerical experiments are conducted to illustrate the performance of the current method in comparison to the traditional closure.


[161] 2106.09449

DocNLI: A Large-scale Dataset for Document-level Natural Language Inference

Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems such as relation extraction, question answering, summarization, etc. It has been studied intensively in the past few years thanks to the availability of large-scale labeled datasets. However, most existing studies focus on merely sentence-level inference, which limits the scope of NLI's application in downstream NLP problems. This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI. DocNLI is transformed from a broad range of NLP problems and covers multiple genres of text. The premises always stay in the document granularity, whereas the hypotheses vary in length from single sentences to passages with hundreds of words. Additionally, DocNLI contains very few of the artifacts that unfortunately are widespread in some popular sentence-level NLI datasets. Our experiments demonstrate that, even without fine-tuning, a model pretrained on DocNLI shows promising performance on popular sentence-level benchmarks, and generalizes well to out-of-domain NLP tasks that rely on inference at document granularity. Task-specific fine-tuning can bring further improvements. Data, code, and pretrained models can be found at https://github.com/salesforce/DocNLI.


[162] 2106.09450

Simultaneous Transmission and Reflection Reconfigurable Intelligent Surface Assisted MIMO Systems

In this work, we investigate a novel simultaneous transmission and reflection reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output downlink system, where three practical transmission protocols, namely, energy splitting (ES), mode selection (MS), and time splitting (TS), are studied. For the system under consideration, we maximize the weighted sum rate with multiple coupled variables. To solve this optimization problem, a block coordinate descent algorithm is proposed to reformulate this problem and design the precoding matrices and the transmitting and reflecting coefficients (TARCs) in an alternate manner. Specifically, for the ES scheme, the precoding matrices are solved using the Lagrange dual method, while the TARCs are obtained using the penalty concave-convex method. Additionally, the proposed method is extended to the MS scheme by solving a mixed-integer problem. Moreover, we solve the formulated problem for the TS scheme using a one-dimensional search and the Majorization-Minimization technique. Our simulation results reveal that: 1) Simultaneous transmission and reflection RIS (STAR-RIS) can achieve better performance than reflecting-only RIS; 2) In unicast communication, TS scheme outperforms the ES and MS schemes, while in broadcast communication, ES scheme outperforms the TS and MS schemes.


[163] 2106.09453

Learning to Associate Every Segment for Video Panoptic Segmentation

Temporal correspondence - linking pixels or objects across frames - is a fundamental supervisory signal for video models. For the panoptic understanding of dynamic scenes, we further extend this concept to every segment. Specifically, we aim to learn coarse segment-level matching and fine pixel-level matching together. We implement this idea by designing two novel learning objectives. To validate our proposals, we adopt a deep siamese model and train the model to learn the temporal correspondence on two different levels (i.e., segment and pixel) along with the target task. At inference time, the model processes each frame independently without any extra computation and post-processing. We show that our per-frame inference model can achieve new state-of-the-art results on the Cityscapes-VPS and VIPER datasets. Moreover, due to its high efficiency, the model runs roughly 3x faster than the previous state-of-the-art approach.


[164] 2106.09455

Conference proceedings KI4Industry AI for SMEs -- the online congress for practical entry into AI for SMEs

The Institute of Materials and Processes, IMP, of the University of Applied Sciences in Karlsruhe, Germany, in cooperation with VDI Verein Deutscher Ingenieure e.V, AEN Automotive Engineering Network and their cooperation partners, present their competences in AI-based solution approaches in the field of production engineering. The online congress KI 4 Industry on November 12 and 13, 2020, showed what opportunities the use of artificial intelligence offers for medium-sized manufacturing companies, SMEs, and where potential fields of application lie. The main purpose of KI 4 Industry is to increase the transfer of knowledge, research and technology from universities to small and medium-sized enterprises, to demystify the term AI and to encourage companies to use AI-based solutions in their own value chain or in their products.


[165] 2106.09460

DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text

This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments. The dataset was annotated for sentiment analysis and offensive language identification for a total of more than 60,000 YouTube comments. The dataset consists of around 44,000 comments in Tamil-English, around 7,000 comments in Kannada-English, and around 20,000 comments in Malayalam-English. The data was manually annotated by volunteer annotators and has high inter-annotator agreement as measured by Krippendorff's alpha. The dataset contains all types of code-mixing phenomena since it comprises user-generated content from a multilingual country. We also present baseline experiments to establish benchmarks on the dataset using machine learning methods. The dataset is available on GitHub (https://github.com/bharathichezhiyan/DravidianCodeMix-Dataset) and Zenodo (https://zenodo.org/record/4750858#.YJtw0SYo_0M).


[166] 2106.09461

Modelling resource allocation in uncertain system environment through deep reinforcement learning

Reinforcement learning has applications in mechatronics, robotics, and other resource-constrained control systems. The problem of resource allocation is primarily solved using traditional predefined techniques and modern deep learning methods. The drawback of predefined and most deep learning methods for resource allocation is that they fail to meet the requirements in cases of uncertain system environments. The problem of resource allocation in an uncertain system environment, subject to certain criteria, can be approached using deep reinforcement learning. Moreover, reinforcement learning has the ability to adapt to new, uncertain environments over prolonged periods of time. The paper provides a detailed comparative analysis of various deep reinforcement learning methods by applying different components that modify the architecture of reinforcement learning, using noisy layers, prioritized replay, bagging, duelling networks, and related combinations, to obtain improvements in performance and reductions in computational cost. The paper finds that the problem of resource allocation in an uncertain environment can be effectively solved using a Noisy Bagging duelling double deep Q network, achieving an efficiency of 97.7% by maximizing reward with significant exploration in the given simulated environment for resource allocation.
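
As an example of one of the listed components, the sketch below implements a dueling Q-network head, where Q-values are reassembled from separate value and advantage streams; the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: separate value and advantage streams combined into Q-values."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state):
        h = self.features(state)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), which makes the decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

q_net = DuelingQNetwork(state_dim=8, n_actions=4)
print(q_net(torch.randn(2, 8)).shape)   # torch.Size([2, 4])
```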


[167] 2106.09462

pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks

Extracting opinions from texts has gathered a lot of interest in recent years, as we are experiencing an unprecedented volume of user-generated content in social networks and other places. A problem that social researchers find in using opinion mining tools is that they are usually behind commercial APIs and unavailable for languages other than English. To address these issues, we present pysentimiento, a multilingual Python toolkit for Sentiment Analysis and other Social NLP tasks. This open-source library brings state-of-the-art models for Spanish and English in a black-box fashion, allowing researchers to easily access these techniques.


[168] 2106.09467

Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation

Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and ethnic groups. Given the importance of bias mitigation in machine learning, the topic leads to contentious debates on how to ensure fairness in practice (data bias versus algorithmic bias). Distributionally Robust Optimization (DRO) seemingly addresses this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset. The results cover finite and infinite numbers of training distributions, as well as convex and non-convex loss functions. We show that neither DRO nor curating the training set should be construed as a complete solution for bias mitigation: in the same way that there is no universally robust training set, there is no universal way to set up a DRO problem and ensure a socially acceptable set of results. We then leverage these insights to provide a minimal set of practical recommendations for addressing bias with DRO. Finally, we discuss ramifications of our results in other related applications of DRO, using an example of adversarial robustness. Our results show that there is merit to both the algorithm-focused and the data-focused side of the bias debate, as long as arguments in favor of these positions are precisely qualified and backed by relevant mathematics known today.
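
To make the contrast concrete, the sketch below compares the average risk with the worst-group risk that (group) DRO minimizes, on hypothetical per-example losses with an under-represented group.

```python
import numpy as np

def average_risk(losses):
    """Standard empirical risk: the mean loss over all examples."""
    return losses.mean()

def worst_group_risk(losses, groups):
    """The objective group DRO minimizes: the largest per-group average loss."""
    return max(losses[groups == g].mean() for g in np.unique(groups))

# Hypothetical per-example losses with an under-represented group (group 1).
rng = np.random.default_rng(0)
groups = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])
losses = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(0.9, 0.05, 100)])

print("average risk:    ", average_risk(losses))              # looks acceptable
print("worst-group risk:", worst_group_risk(losses, groups))  # exposes the minority group
```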


[169] 2106.09469

A posteriori estimator for the adaptive solution of a quasi-static fracture phase-field model with irreversibility constraints

Within this article, we develop a residual type a posteriori error estimator for a time discrete quasi-static phase-field fracture model. Particular emphasis is given to the robustness of the error estimator for the variational inequality governing the phase-field evolution with respect to the phase-field regularization parameter $\epsilon$. The article concludes with numerical examples highlighting the performance of the proposed a posteriori error estimators on three standard test cases: the single edge notched tension and shear tests as well as the L-shaped panel test.


[170] 2106.09475

Backward Gradient Normalization in Deep Neural Networks

We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify the backpropagation equations to permit a well-scaled gradient flow that reaches the deepest network layers without vanishing or exploding. Results on tests with very deep neural networks show that the new technique effectively controls the gradient norm, allowing the update of weights in the deepest layers and improving network accuracy under several experimental conditions.
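
A minimal sketch of the idea of a backward-only normalization node in PyTorch: identity in the forward pass, gradient rescaling in the backward pass; the unit-norm rescaling used here is an assumption and may differ from the paper's normalization.

```python
import torch

class GradNorm(torch.autograd.Function):
    """Identity in the forward pass; rescales the incoming gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Normalize the gradient to unit L2 norm, leaving its direction intact.
        return grad_output / (grad_output.norm() + 1e-12)

x = torch.randn(4, 10, requires_grad=True)
y = GradNorm.apply(x) * 1000.0   # an operation that would otherwise inflate the gradient
y.sum().backward()
print(x.grad.norm())             # ~1.0 regardless of the 1000x scaling downstream
```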


[171] 2106.09482

Elicitation of Adaptive Requirements Using Creativity Triggers: A Controlled Experiment

Adaptive systems react to changes in their environment by changing their behavior. Identifying these needed adaptations is very difficult, but central to requirements elicitation for adaptive systems. As the necessary or potential adaptations are typically not obvious to the stakeholders, the problem is how to effectively elicit adaptation-relevant information. One approach is to use creativity techniques to support the systematic identification and elicitation of adaptation requirements. In particular, here, we analyze a set of creativity triggers defined for systematic exploration of potential adaptation requirements. We compare these triggers with brainstorming as a baseline in a controlled experiment with 85 master students. The results indicate that the proposed triggers are suitable for the efficient elicitation of adaptive requirements and that the 15 trigger questions produce significantly more requirements fragments than solo brainstorming.


[172] 2106.09485

Secure Multi-Function Computation with Private Remote Sources

We consider a distributed function computation problem in which parties observing noisy versions of a remote source facilitate the computation of a function of their observations at a fusion center through public communication. The distributed function computation is subject to constraints, including not only reliability and storage but also privacy and secrecy. Specifically, 1) the remote source should remain private from an eavesdropper and the fusion center, measured in terms of the information leaked about the remote source; 2) the function computed should remain secret from the eavesdropper, measured in terms of the information leaked about the arguments of the function, to ensure secrecy regardless of the exact function used. We derive the exact rate regions for lossless and lossy single-function computation and illustrate the lossy single-function computation rate region for an information bottleneck example, in which the optimal auxiliary random variables are characterized for binary-input symmetric-output channels. We extend the approach to lossless and lossy asynchronous multiple-function computations with joint secrecy and privacy constraints, in which case inner and outer bounds for the rate regions differing only in the Markov chain conditions imposed are characterized.


[173] 2106.09486

Deep HDR Hallucination for Inverse Tone Mapping

Inverse Tone Mapping (ITM) methods attempt to reconstruct High Dynamic Range (HDR) information from Low Dynamic Range (LDR) image content. The dynamic range of well-exposed areas must be expanded and any missing information due to over/under-exposure must be recovered (hallucinated). The majority of methods focus on the former and are relatively successful, while most attempts on the latter are not of sufficient quality, even ones based on Convolutional Neural Networks (CNNs). A major factor for the reduced inpainting quality in some works is the choice of loss function. Work based on Generative Adversarial Networks (GANs) shows promising results for image synthesis and LDR inpainting, suggesting that GAN losses can improve inverse tone mapping results. This work presents a GAN-based method that hallucinates missing information from badly exposed areas in LDR images and compares its efficacy with alternative variations. The proposed method is quantitatively competitive with state-of-the-art inverse tone mapping methods, providing good dynamic range expansion for well-exposed areas and plausible hallucinations for saturated and under-exposed areas. A density-based normalisation method, targeted for HDR content, is also proposed, as well as an HDR data augmentation method targeted for HDR hallucination.


[174] 2106.09487

Synthesizing Modular Manipulators For Tasks With Time, Obstacle, And Torque Constraints

Modular robots can be tailored to achieve specific tasks and rearranged to achieve previously infeasible ones. The challenge is choosing an appropriate design from a large search space. In this work, we describe a framework that automatically synthesizes the design and controls for a serial chain modular manipulator given a task description. The task includes points to be reached in the 3D space, time constraints, a load to be sustained at the end-effector, and obstacles to be avoided in the environment. These specifications are encoded as a constrained optimization in the robot's kinematics and dynamics and, if a solution is found, the formulation returns the specific design and controls to perform the task. Finally, we demonstrate our approach on a complex specification in which the robot navigates a constrained environment while holding an object.


[175] 2106.09493

Scalable Approach for Normalizing E-commerce Text Attributes (SANTA)

In this paper, we present SANTA, a scalable framework to automatically normalize E-commerce attribute values (e.g. "Win 10 Pro") to a fixed set of pre-defined canonical values (e.g. "Windows 10"). Earlier works on attribute normalization focused on fuzzy string matching (also referred to as syntactic matching in this paper). In this work, we first perform an extensive study of nine syntactic matching algorithms and establish that 'cosine' similarity leads to the best results, showing a 2.7% improvement over the commonly used Jaccard index. Next, we argue that string similarity alone is not sufficient for attribute normalization as many surface forms require going beyond syntactic matching (e.g. "720p" and "HD" are synonyms). While semantic techniques like unsupervised embeddings (e.g. word2vec/fastText) have shown good results in word similarity tasks, we observed that they perform poorly at distinguishing between close canonical forms, as these close forms often occur in similar contexts. We propose to learn token embeddings using a twin network with triplet loss. We propose an embedding learning task leveraging raw attribute values and product titles to learn these embeddings in a self-supervised fashion. We show that providing supervision using our proposed task improves over both syntactic and unsupervised embedding-based techniques for attribute normalization. Experiments on a real-world attribute normalization dataset of 50 attributes show that the embeddings trained using our proposed approach obtain a 2.3% improvement over the best string matching and a 19.3% improvement over the best unsupervised embeddings.
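
To illustrate the syntactic-matching baselines, the sketch below compares character n-gram Jaccard and cosine similarity on an illustrative attribute value and canonical form; the n-gram size and example strings are assumptions, not SANTA's implementation.

```python
from collections import Counter
import math

def ngrams(s, n=3):
    """Lower-cased character n-grams of a string (the whole string if shorter than n)."""
    s = s.lower()
    return [s[i:i + n] for i in range(max(len(s) - n + 1, 1))]

def jaccard(a, b, n=3):
    A, B = set(ngrams(a, n)), set(ngrams(b, n))
    return len(A & B) / len(A | B)

def cosine(a, b, n=3):
    A, B = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    dot = sum(A[g] * B[g] for g in A)
    norm_a = math.sqrt(sum(v * v for v in A.values()))
    norm_b = math.sqrt(sum(v * v for v in B.values()))
    return dot / (norm_a * norm_b)

raw, canonical = "Win 10 Pro", "Windows 10"
print("Jaccard:", round(jaccard(raw, canonical), 3))
print("Cosine :", round(cosine(raw, canonical), 3))
```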


[176] 2106.09500

Making Sense of Complex Sensor Data Streams

This concept paper draws from our previous research on individual grip force data collected from biosensors placed on specific anatomical locations in the dominant and non-dominant hands of operators performing a robot-assisted precision grip task for minimally invasive endoscopic surgery. The specificity of the robotic system on the one hand, and that of the 2D image-guided task performed in a real-world 3D space on the other, constrain the individual hand and finger movements during task performance in a unique way. Our previous work showed task-specific characteristics of operator expertise in terms of specific grip force profiles, which we were able to detect in thousands of highly variable individual data. This concept paper is focused on two complementary data analysis strategies that make it possible to achieve such a goal. In contrast with other sensor data analysis strategies aimed at minimizing variance in the data, it is here necessary to decipher the meaning of the full extent of intra- and inter-individual variance in the sensor data by using the appropriate statistical analyses, as shown in the first part of this paper. Then, it is explained how the computation of individual spatio-temporal grip force profiles permits the detection of expertise-specific differences between individual users. It is concluded that these two analytic strategies are complementary. They enable drawing meaning from thousands of biosensor data reflecting human grip performance and its evolution with training, while fully taking into account their considerable inter- and intra-individual variability.


[177] 2106.09501

DeepInsight: Interpretability Assisting Detection of Adversarial Samples on Graphs

With the rapid development of artificial intelligence, a series of machine learning algorithms, e.g., graph neural networks, have been proposed to facilitate network analysis or graph data mining. Unfortunately, recent studies indicate that such advanced methods may suffer from adversarial attacks, i.e., they may lose effectiveness when only a small fraction of links are purposely changed. However, little is known about how adversarial nodes differ from clean nodes, or about the preferences of each attack method, in terms of network structure. In this paper, we theoretically investigate three well-known adversarial attack methods, i.e., Nettack, Meta Attack, and GradArgmax, and find that different attack methods have their own specific preferences when changing the network structure. Such attack patterns are further validated by the experimental results on real-world networks, i.e., generally the top 4 most important network attributes for detecting adversarial samples are sufficient to explain the preference of each attack method. Based on these findings, we further utilize the network attributes to design machine learning models for adversarial sample detection and attack method recognition, achieving outstanding performance.


[178] 2106.09502

Biomedical Interpretable Entity Representations

Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important domains such as biomedicine. There has been recent work on general interpretable representation learning (Onoe and Durrett, 2020), but these domain-agnostic representations do not readily transfer to the important domain of biomedicine. In this paper, we create a new entity type system and training set from a large corpus of biomedical texts by mapping entities to concepts in a medical ontology, and from these to Wikipedia pages whose categories are our types. From this mapping we derive Biomedical Interpretable Entity Representations (BIERs), in which dimensions correspond to fine-grained entity types, and values are predicted probabilities that a given entity is of the corresponding type. We propose a novel method that exploits BIER's final sparse and intermediate dense representations to facilitate model and entity type debugging. We show that BIERs achieve strong performance in biomedical tasks including named entity disambiguation and entity label classification, and we provide error analysis to highlight the utility of their interpretability, particularly in low-supervision settings. Finally, we provide our induced 68K biomedical type system, the corresponding 37 million triples of derived data used to train BIER models and our best performing model.


[179] 2106.09503

The Parity Ray Regularizer for Pacing in Auction Markets

Budget-management systems are one of the key components of modern auction markets. Internet advertising platforms typically offer advertisers the possibility to pace the rate at which their budget is depleted, through budget-pacing mechanisms. We focus on multiplicative pacing mechanisms in an online setting in which a bidder is repeatedly confronted with a series of advertising opportunities. After collecting bids, each item is then allocated through a single-item, second-price auction. If there were no budgetary constraints, bidding truthfully would be an optimal choice for the advertiser. However, since their budget is limited, the advertiser may want to shade their bid downwards in order to preserve their budget for future opportunities, and to spread expenditures evenly over time. The literature on online pacing problems mostly focuses on the setting in which the bidder optimizes an additive separable objective, such as the total click-through rate or the revenue of the allocation. In many settings, however, bidders may also care about other objectives which oftentimes are non-separable, and therefore not amenable to traditional online learning techniques. Building on recent work, we study the frequent case in which advertisers seek to reach a certain distribution of impressions over a target population of users. We introduce a novel regularizer to achieve this desideratum, and show how to integrate it into an online mirror descent scheme attaining the optimal order of sub-linear regret compared to the optimal allocation in hindsight when inputs are drawn independently, from an unknown distribution. Moreover, we show that our approach can easily be incorporated in standard existing pacing systems that are not usually built for this objective. The effectiveness of our algorithm in internet advertising applications is confirmed by numerical experiments on real-world data.


[180] 2106.09508

KIT Bus: A Shuttle Model for CARLA Simulator

With the continuous development of science and technology, self-driving vehicles will surely change the nature of transportation and realize the automotive industry's transformation in the future. Compared with self-driving cars, self-driving buses are more efficient in carrying passengers and more environmentally friendly in terms of energy consumption. Therefore, it is speculated that in the future, self-driving buses will become more and more important. As a simulator for autonomous driving research, the CARLA simulator can help people accumulate experience in autonomous driving technology faster and more safely. However, a shortcoming is that there is no modern bus model in the CARLA simulator. Consequently, people cannot simulate autonomous driving on buses or scenarios involving interaction with buses. Therefore, we built a bus model in 3ds Max software and imported it into CARLA to fill this gap. Our model, namely KIT bus, is shown to work in CARLA by testing it with the autopilot simulation. A video demo is available on our YouTube channel.


[181] 2106.09509

Resurrect3D: An Open and Customizable Platform for Visualizing and Analyzing Cultural Heritage Artifacts

Art and culture, at their best, lie in the act of discovery and exploration. This paper describes Resurrect3D, an open visualization platform for both casual users and domain experts to explore cultural artifacts. To that end, Resurrect3D takes two steps. First, it provides an interactive cultural heritage toolbox, providing not only commonly used tools in cultural heritage such as relighting and material editing, but also the ability for users to create an interactive "story": a saved session with annotations and visualizations others can later replay. Second, Resurrect3D exposes a set of programming interfaces to extend the toolbox. Domain experts can develop custom tools that perform artifact-specific visualization and analysis.


[182] 2106.09513

The promise of energy-efficient battery-powered urban aircraft

Improvements in rechargeable batteries are enabling several electric urban air mobility (UAM) aircraft designs with up to 300 miles of range and payload equivalents of up to 7 passengers. We find that novel UAM aircraft consume between 130 Wh/passenger-mile and ~1,200 Wh/passenger-mile depending on the design and utilization, relative to an expected consumption of over 220 Wh/passenger-mile for terrestrial electric vehicles and 1,000 Wh/passenger-mile for combustion engine vehicles. We also find that several UAM aircraft designs are approaching technological viability with current Li-ion batteries, based on their specific power and energy, while rechargeability and lifetime performance remain uncertain. These aspects highlight the technological readiness of a new segment of transportation.


[183] 2106.09516

Transductive Few-Shot Learning: Clustering is All You Need?

We investigate a general formulation for clustering and transductive few-shot learning, which integrates prototype-based objectives, Laplacian regularization and supervision constraints from a few labeled data points. We propose a concave-convex relaxation of the problem, and derive a computationally efficient block-coordinate bound optimizer, with a convergence guarantee. At each iteration, our optimizer computes independent (parallel) updates for each point-to-cluster assignment. Therefore, it could be trivially distributed for large-scale clustering and few-shot tasks. Furthermore, we provide a thorough convergence analysis based on point-to-set maps. We report comprehensive clustering and few-shot learning experiments over various data sets, showing that our method yields competitive performance, in terms of accuracy and optimization quality, while scaling up to large problems. Using standard training on the base classes, without resorting to complex meta-learning and episodic-training strategies, our approach outperforms state-of-the-art few-shot methods by significant margins, across various models, settings and data sets. Surprisingly, we found that even standard clustering procedures (e.g., K-means), which correspond to particular, non-regularized cases of our general model, already achieve competitive performance in comparison to the state-of-the-art in few-shot learning. These surprising results point to the limitations of the current few-shot benchmarks, and question the viability of a large body of convoluted few-shot learning techniques in the recent literature.
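
To make the K-means baseline mentioned above concrete, here is a minimal sketch (assuming pre-extracted support and query features; all names are hypothetical) that clusters the query features of a few-shot episode and labels each cluster by its nearest class prototype. It is not the paper's regularized formulation, only the non-regularized baseline it compares against.

```python
# Minimal K-means transductive baseline: cluster the query features of a
# few-shot episode, then assign each cluster the label of the nearest support
# prototype. Features are assumed to be pre-extracted; sketch only.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_transductive(support_feats, support_labels, query_feats, n_classes):
    # class prototypes from the labeled support set
    prototypes = np.stack([support_feats[support_labels == c].mean(axis=0)
                           for c in range(n_classes)])
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(query_feats)
    # map each cluster centroid to the closest class prototype
    dists = np.linalg.norm(km.cluster_centers_[:, None, :] - prototypes[None], axis=-1)
    cluster_to_class = dists.argmin(axis=1)   # collisions possible; sketch only
    return cluster_to_class[km.labels_]
```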


[184] 2106.09517

Dynamic Knowledge Distillation with A Single Stream Structure for RGB-D Salient Object Detection

RGB-D salient object detection (SOD) demonstrates its superiority in detection in complex environments due to the additional depth information introduced in the data. Inevitably, an independent stream is introduced to extract features from depth images, leading to extra computation and parameters. This methodology, which sacrifices model size to improve detection accuracy, may impede the practical application of SOD models. To tackle this dilemma, we propose a dynamic distillation method along with a lightweight framework, which significantly reduces the parameters. This method considers the factors of both teacher and student performance within the training stage and dynamically assigns the distillation weight instead of applying a fixed weight on the student model. Extensive experiments are conducted on five public datasets to demonstrate that our method can achieve competitive performance compared to 10 prior methods with a 78.2 MB lightweight structure.
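
As a rough, hedged sketch of a dynamically assigned distillation weight: the specific weighting rule below, based on the teacher/student loss gap, is hypothetical and is not the paper's formula; it only illustrates adapting the weight during training instead of fixing it.

```python
# Sketch of a dynamically weighted knowledge-distillation loss. The weighting
# rule (sigmoid of the teacher/student loss gap) is hypothetical and only
# illustrates the idea of a non-fixed distillation weight.
import torch
import torch.nn.functional as F

def dynamic_kd_loss(student_logits, teacher_logits, labels, temperature=4.0):
    task_loss = F.cross_entropy(student_logits, labels)
    with torch.no_grad():
        teacher_task_loss = F.cross_entropy(teacher_logits, labels)
        # trust the teacher more when it is doing much better than the student
        gap = torch.clamp(task_loss - teacher_task_loss, min=0.0)
        w = torch.sigmoid(gap)            # dynamic distillation weight in (0, 1)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - w) * task_loss + w * kd
```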


[185] 2106.09518

Multi-Layered Blockchain Governance Game

This paper deals with the design of an integrated secure Blockchain network framework to prevent damage from attackers. The multi-layer concept, which can handle multiple networks, is adopted on top of the Blockchain Governance Game framework. This new integrated theoretical model is designed to find the best strategies for preventing whole-network malfunction caused by attackers, and it is developed by combining the Blockchain Governance Game and the Strategic Alliance for Blockchain Governance Game. Analytically tractable results for executing a safety mode are fully obtained, and simulation results are presented to obtain the optimal values of the hyperparameters of a Blockchain-based security network. This research helps those who are constructing a multi-layer network by enhancing security features through a multi-layer framework in a decentralized network.


[186] 2106.09524

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous time version, namely stochastic gradient flow. We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than that of gradient flow. Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias. To complete our analysis, we provide convergence guarantees for the dynamics. We also give experimental results which support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and they help explain the better performance of stochastic gradient descent over gradient descent observed in practice.
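
For readers unfamiliar with the model class, one common two-layer diagonal linear network parametrization (the notation here is assumed for illustration and is not taken from the abstract) is:

```latex
% Illustrative two-layer diagonal linear network: the effective linear
% predictor is an elementwise (Hadamard) product of two trainable vectors.
\begin{equation}
  f_{u,v}(x) \;=\; \langle u \odot v,\, x \rangle ,
  \qquad \beta_{\mathrm{eff}} \;=\; u \odot v \in \mathbb{R}^{d} .
\end{equation}
```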


[187] 2106.09526

Exploring the Properties and Evolution of Neural Network Eigenspaces during Training

In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given task. We further show that the observed effects are independent of previously reported pathological patterns like the ``tail pattern'' described in \cite{featurespace_saturation}. Finally, we show that saturation patterns converge early during training, allowing for a quicker cycle time during analysis.


[188] 2106.09527

Federated Learning for Intrusion Detection System: Concepts, Challenges and Future Directions

The rapid development of the Internet and smart devices has triggered a surge in network traffic, making its infrastructure more complex and heterogeneous. The predominant usage of mobile phones, wearable devices and autonomous vehicles are examples of distributed networks which generate huge amounts of data every day. The computational power of these devices has also seen steady progression, which has created the need to transmit information, store data locally and drive network computations towards edge devices. Intrusion detection systems play a significant role in ensuring the security and privacy of such devices. Machine Learning and Deep Learning with Intrusion Detection Systems have gained great momentum due to their achievement of high classification accuracy. However, the privacy and security aspects potentially get jeopardised due to the need to store and communicate data to a centralized server. On the contrary, federated learning (FL) fits in appropriately as a privacy-preserving decentralized learning technique that does not transfer data but trains models locally and transfers the parameters to the centralized server. The present paper aims to present an extensive and exhaustive review of the use of FL in intrusion detection systems. In order to establish the need for FL, various types of IDS, relevant ML approaches and their associated issues are discussed. The paper presents a detailed overview of the implementation of FL in various aspects of anomaly detection. The allied challenges of FL implementations are also identified, which provides an idea of the scope of future directions of research. The paper finally presents plausible solutions associated with the identified challenges in FL-based intrusion detection system implementation, acting as a baseline for prospective research.
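
The aggregation step described above (local training, with only parameters sent to a central server) is, in its canonical FedAvg form, a dataset-size-weighted parameter average. The sketch below is generic and not tied to any particular IDS system covered by the survey; the function and variable names are illustrative.

```python
# Generic FedAvg-style aggregation: clients train locally and send parameters;
# the server averages them weighted by local dataset size. Sketch only.
import numpy as np

def fedavg(client_params, client_sizes):
    """client_params: list of dicts {name: np.ndarray}; client_sizes: list of ints."""
    total = float(sum(client_sizes))
    aggregated = {}
    for name in client_params[0]:
        aggregated[name] = sum((n / total) * p[name]
                               for p, n in zip(client_params, client_sizes))
    return aggregated

if __name__ == "__main__":
    a = {"w": np.ones(3)}
    b = {"w": np.zeros(3)}
    print(fedavg([a, b], [30, 10]))   # -> w = 0.75 * ones(3)
```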


[189] 2106.09533

Author Clustering and Topic Estimation for Short Texts

Analysis of short text, such as social media posts, is extremely difficult because it relies on observing many document-level word co-occurrence pairs. Beyond topic distributions, a common downstream task of the modeling is grouping the authors of these documents for subsequent analyses. Traditional models estimate the document groupings and identify user clusters with an independent procedure. We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document, with user-level topic distributions. We also simultaneously cluster users, removing the need for post-hoc cluster estimation and improving topic estimation by shrinking noisy user-level topic distributions towards typical values. Our method performs as well as -- or better than -- traditional approaches to problems arising in short text, and we demonstrate its usefulness on a dataset of tweets from United States Senators, recovering both meaningful topics and clusters that reflect partisan ideology.


[190] 2106.09534

Adversarial Visual Robustness by Causal Intervention

Adversarial training is the de facto most promising defense against adversarial examples. Yet, its passive nature inevitably prevents it from being immune to unknown attackers. To achieve a proactive defense, we need a more fundamental understanding of adversarial examples, beyond the popular bounded threat model. In this paper, we provide a causal viewpoint of adversarial vulnerability: the cause is the confounder ubiquitously existing in learning, where attackers are precisely exploiting the confounding effect. Therefore, a fundamental solution for adversarial robustness is causal intervention. As the confounder is unobserved in general, we propose to use the instrumental variable that achieves intervention without the need for confounder observation. We term our robust training method as Causal intervention by instrumental Variable (CiiV). It has a differentiable retinotopic sampling layer and a consistency loss, which is stable and guaranteed not to suffer from gradient obfuscation. Extensive experiments on a wide spectrum of attackers and settings applied in MNIST, CIFAR-10, and mini-ImageNet datasets empirically demonstrate that CiiV is robust to adaptive attacks.


[191] 2106.09536

Single Event Transient Fault Analysis of ELEPHANT cipher

In this paper, we propose a novel fault attack termed Single Event Transient Fault Analysis (SETFA), which is well suited for hardware implementations. The proposed approach pinpoints hotspots in the cipher's Sbox combinational logic circuit that significantly reduce the key entropy when subjected to faults. ELEPHANT is a parallel authenticated encryption with associated data (AEAD) scheme targeted at hardware implementations and a finalist in the Lightweight Cryptography (LWC) competition launched by NIST. In this work, we investigate the vulnerabilities of ELEPHANT against fault analysis. We observe that the use of a 128-bit random nonce makes it resistant to many cryptanalysis techniques, such as differential and linear attacks and their variants. However, the relaxed nature of Statistical Fault Analysis (SFA) methods makes them widely applicable in restrictive environments. We propose a SETFA-based key recovery attack on Elephant. We performed experiments with random plaintexts and keys on Dumbo, a Sponge-based instance of the Elephant-AEAD scheme. Our proposed approach could recover the secret key using 85-250 ciphertexts. In essence, this work investigates new vulnerabilities to fault analysis that may need to be addressed to ensure secure computations and communications in IoT scenarios.


[192] 2106.09538

Exploring deterministic frequency deviations with explainable AI

Deterministic frequency deviations (DFDs) critically affect power grid frequency quality and power system stability. A better understanding of these events is urgently needed as frequency deviations have been growing in the European grid in recent years. DFDs are partially explained by the rapid adjustment of power generation following the intervals of electricity trading, but this intuitive picture fails especially before and around noonday. In this article, we provide a detailed analysis of DFDs and their relation to external features using methods from explainable Artificial Intelligence. We establish a machine learning model that well describes the daily cycle of DFDs and elucidate key interdependencies using SHapley Additive exPlanations (SHAP). Thereby, we identify solar ramps as critical to explain patterns in the Rate of Change of Frequency (RoCoF).


[193] 2106.09543

Future mobility as a bio-inspired collaborative system

The current trends towards vehicle-sharing, electrification, and autonomy are predicted to transform mobility. Combined appropriately, they have the potential of significantly improving urban mobility. However, what will come after most vehicles are shared, electric, and autonomous remains an open question, especially regarding the interactions between vehicles and how these interactions will impact system-level behaviour. Inspired by nature and supported by swarm robotics and vehicle platooning models, this paper proposes a future mobility in which shared, electric, and autonomous vehicles behave as a bio-inspired collaborative system. The collaboration between vehicles will lead to a system-level behaviour analogous to natural swarms. Natural swarms can divide tasks, cluster, build together, or transport cooperatively. In this future mobility, vehicles will cluster by connecting either physically or virtually, which will enable the possibility of sharing energy, data or computational power, provide services or transfer cargo, among others. Vehicles will collaborate either with vehicles that are part of the same fleet, or with any other vehicle on the road, by finding mutualistic relationships that benefit both parties. The field of swarm robotics has already translated some of the behaviours from natural swarms to artificial systems and, if we further translate these concepts into urban mobility, exciting ideas emerge. Within mobility-related research, the coordinated movement proposed in vehicle platooning models can be seen as a first step towards collaborative mobility. This paper contributes with the proposal of a framework for future mobility that integrates current research and mobility trends in a novel and unique way.


[194] 2106.09547

Machine Learning for Postprocessing Ensemble Streamflow Forecasts

Skillful streamflow forecasting informs decisions in various areas of water policy and management. We integrate dynamical modeling with machine learning to demonstrate the enhanced quality of streamflow forecasts at short- to medium-range timescales (1 - 7 days). Dynamical modeling generates ensemble streamflow forecasts by forcing a hydrological model with numerical weather prediction model outputs. We employ a Long Short-Term Memory (LSTM) neural network to correct forecast biases in raw ensemble streamflow forecasts obtained from dynamical modeling. For forecast verification, we use different metrics such as skill score and reliability diagram conditioned upon the lead time, flow threshold, and season. The verification results show that the LSTM can improve streamflow forecasts relative to climatological, temporal persistence, deterministic, and raw ensemble forecasts. The LSTM demonstrates improvement across all lead times, flow thresholds, and seasons. As compared to the raw ensembles, the relative gain in forecast skill from the LSTM is generally higher at medium-range timescales than at the initial lead time; for high flows than for low-to-moderate flows; and in the warm season than in the cool season. Overall, our results highlight the benefits of LSTM for improving both the skill and reliability of streamflow forecasts.


[195] 2106.09548

Scale-Consistent Fusion: from Heterogeneous Local Sampling to Global Immersive Rendering

Image-based geometric modeling and novel view synthesis based on sparse, large-baseline samplings are challenging but important tasks for emerging multimedia applications such as virtual reality and immersive telepresence. Existing methods fail to produce satisfactory results due to the limitation on inferring reliable depth information over such challenging reference conditions. With the popularization of commercial light field (LF) cameras, capturing LF images (LFIs) is as convenient as taking regular photos, and geometry information can be reliably inferred. This inspires us to use a sparse set of LF captures to render high-quality novel views globally. However, fusion of LF captures from multiple angles is challenging due to the scale inconsistency caused by various capture settings. To overcome this challenge, we propose a novel scale-consistent volume rescaling algorithm that robustly aligns the disparity probability volumes (DPV) among different captures for scale-consistent global geometry fusion. Based on the fused DPV projected to the target camera frustum, novel learning-based modules have been proposed (i.e., the attention-guided multi-scale residual fusion module, and the disparity field guided deep re-regularization module) which comprehensively regularize noisy observations from heterogeneous captures for high-quality rendering of novel LFIs. Both quantitative and qualitative experiments over the Stanford Lytro Multi-view LF dataset show that the proposed method outperforms state-of-the-art methods significantly under different experiment settings for disparity inference and LF synthesis.


[196] 2106.09553

Do Large Scale Molecular Language Representations Capture Important Structural Information?

Predicting chemical properties from the structure of a molecule is of great importance in many applications including drug discovery and material design. Machine learning based molecular property prediction holds the promise of enabling accurate predictions at much less complexity, when compared to, for example, Density Functional Theory (DFT) calculations. Features extracted from molecular graphs, using graph neural nets in a supervised manner, have emerged as strong baselines for such tasks. However, the vast chemical space together with the limited availability of labels makes supervised learning challenging, calling for learning a general-purpose molecular representation. Recently, pre-trained transformer-based language models (PTLMs) on large unlabeled corpora have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, here we present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer. This model employs a linear attention mechanism and was trained with high parallelism on 1D SMILES sequences of 1.1 billion unlabeled molecules from the PubChem and ZINC datasets. Experiments show that the learned molecular representation performs competitively, when compared to existing graph-based and fingerprint-based supervised learning baselines, on the challenging tasks of predicting properties of QM8 and QM9 molecules. Further task-specific fine-tuning of the MoLFormer representation improves performance on several of those property prediction benchmarks. These results provide encouraging evidence that large-scale molecular language models can capture sufficient structural information to be able to accurately predict quantum chemical properties and beyond.


[197] 2106.09558

Element Intervention for Open Relation Extraction

Open relation extraction aims to cluster relation instances referring to the same underlying relation, which is a critical step for general relation extraction. Current OpenRE models are commonly trained on the datasets generated from distant supervision, which often results in instability and makes the model easily collapsed. In this paper, we revisit the procedure of OpenRE from a causal view. By formulating OpenRE using a structural causal model, we identify that the above-mentioned problems stem from the spurious correlations from entities and context to the relation type. To address this issue, we conduct \emph{Element Intervention}, which intervenes on the context and entities respectively to obtain the underlying causal effects of them. We also provide two specific implementations of the interventions based on entity ranking and context contrasting. Experimental results on unsupervised relation extraction datasets show that our methods outperform previous state-of-the-art methods and are robust across different datasets.


[198] 2106.09563

On Anytime Learning at Macroscale

Classical machine learning frameworks assume access to a possibly large dataset in order to train a predictive model. In many practical applications however, data does not arrive all at once, but in batches over time. This creates a natural trade-off between accuracy of a model and time to obtain such a model. A greedy predictor could produce non-trivial predictions by immediately training on batches as soon as these become available, but it may also make sub-optimal use of future data. On the other hand, a tardy predictor could wait for a long time to aggregate several batches into a larger dataset, but ultimately deliver a much better performance. In this work, we consider such a streaming learning setting, which we dub {\em anytime learning at macroscale} (ALMA). It is an instance of anytime learning applied not at the level of a single chunk of data, but at the level of the entire sequence of large batches. We first formalize this learning setting, we then introduce metrics to assess how well learners perform on the given task for a given memory and compute budget, and finally we test several baseline approaches on standard benchmarks repurposed for anytime learning at macroscale. The general finding is that bigger models always generalize better. In particular, it is important to grow model capacity over time if the initial model is relatively small. Moreover, updating the model at an intermediate rate strikes the best trade-off between accuracy and time to obtain a useful predictor.


[199] 2106.09564

Knowledge distillation from multi-modal to mono-modal segmentation networks

The joint use of multiple imaging modalities for medical image segmentation has been widely studied in recent years. The fusion of information from different modalities has been shown to improve the segmentation accuracy, with respect to mono-modal segmentations, in several applications. However, acquiring multiple modalities is usually not possible in a clinical setting due to a limited number of physicians and scanners, and to limit costs and scan time. Most of the time, only one modality is acquired. In this paper, we propose KD-Net, a framework to transfer knowledge from a trained multi-modal network (teacher) to a mono-modal one (student). The proposed method is an adaptation of the generalized distillation framework where the student network is trained on a subset (1 modality) of the teacher's inputs (n modalities). We illustrate the effectiveness of the proposed framework in brain tumor segmentation with the BraTS 2018 dataset. Using different architectures, we show that the student network effectively learns from the teacher and always outperforms the baseline mono-modal network in terms of segmentation accuracy.


[200] 2106.09565

Interval Privacy: A Framework for Data Collection

The emerging public awareness and government regulations of data privacy motivate new paradigms for collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy and corresponding data formats, mechanisms, and tradeoffs for privatizing data during data collection. The privacy notion, named Interval Privacy, requires the conditional distribution of the raw data given the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism will record each data value as a random interval containing it. The proposed interval privacy mechanisms can be easily deployed through most existing survey-based data collection paradigms, e.g., by asking a respondent whether its data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not distort it. The way of using a narrowed range to convey information is complementary to the popular paradigm of perturbing data. Also, the interval mechanisms can generate progressively refined information at the discretion of individual respondents. We study different theoretical aspects of the proposed privacy. In the context of supervised learning, we also offer a method such that existing supervised learning algorithms designed for point-valued data can be directly applied to learning from interval-valued data.
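
A toy sketch of the survey-style mechanism described above: report a random interval containing the value, optionally refined by further random splits. The domain bounds, refinement count, and all names are illustrative assumptions, not the paper's exact mechanism.

```python
# Toy interval-privacy mechanism: instead of the raw value, record a random
# interval that contains it (here, by repeated random splits of the domain).
# Illustrative sketch of the idea described in the abstract only.
import random

def interval_report(x, low=0.0, high=100.0, refinements=1):
    lo, hi = low, high
    for _ in range(refinements):          # respondent may refine progressively
        split = random.uniform(lo, hi)    # randomly generated threshold
        if x <= split:                    # answer: "is x within [lo, split]?"
            hi = split
        else:
            lo = split
    return (lo, hi)                       # interval containing x, not x itself

if __name__ == "__main__":
    random.seed(1)
    print(interval_report(42.0, refinements=2))
```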


[201] 2106.09572

Topic Modeling and Progression of American Digital News Media During the Onset of the COVID-19 Pandemic

Currently, the world is in the midst of a severe global pandemic, which has affected all aspects of people's lives. As a result, there is a deluge of COVID-related digital media articles published in the United States, due to the disparate effects of the pandemic. This large volume of information is difficult to consume by the audience in a reasonable amount of time. In this paper, we develop a Natural Language Processing (NLP) pipeline that is capable of automatically distilling various digital articles into manageable pieces of information, while also modelling the progression of topics discussed over time in order to aid readers in rapidly gaining holistic perspectives on pressing issues (i.e., the COVID-19 pandemic) from a diverse array of sources. We achieve these goals by first collecting a large corpus of COVID-related articles during the onset of the pandemic. Afterwards, we apply unsupervised and semi-supervised learning procedures to summarize articles, then cluster them based on their similarities using community detection methods. Next, we identify the topic of each cluster of articles using the BART algorithm. Finally, we provide a detailed digital media analysis based on the NLP-pipeline outputs and show how the conversation surrounding COVID-19 evolved over time.


[202] 2106.09575

Rotation Invariant Graph Neural Networks using Spin Convolutions

Progress towards the energy breakthroughs needed to combat climate change can be significantly accelerated through the efficient simulation of atomic systems. Simulation techniques based on first principles, such as Density Functional Theory (DFT), are limited in their practical use due to their high computational expense. Machine learning approaches have the potential to approximate DFT in a computationally efficient manner, which could dramatically increase the impact of computational simulations on real-world problems. Approximating DFT poses several challenges. These include accurately modeling the subtle changes in the relative positions and angles between atoms, and enforcing constraints such as rotation invariance or energy conservation. We introduce a novel approach to modeling angular information between sets of neighboring atoms in a graph neural network. Rotation invariance is achieved for the network's edge messages through the use of a per-edge local coordinate frame and a novel spin convolution over the remaining degree of freedom. Two model variants are proposed for the applications of structure relaxation and molecular dynamics. State-of-the-art results are demonstrated on the large-scale Open Catalyst 2020 dataset. Comparisons are also performed on the MD17 and QM9 datasets.


[203] 2106.09578

Modeling Worlds in Text

We provide a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive narratives. Interactive narratives -- or text-adventure games -- are partially observable environments structured as long puzzles or quests in which an agent perceives and interacts with the world purely through textual natural language. Each individual game typically contains hundreds of locations, characters, and objects -- each with their own unique descriptions -- providing an opportunity to study the problem of giving language-based agents the structured memory necessary to operate in such worlds. Our dataset provides 24198 mappings between rich natural language observations and: (1) knowledge graphs that reflect the world state in the form of a map; (2) natural language actions that are guaranteed to cause a change in that particular world state. The training data is collected across 27 games in multiple genres and contains a further 7836 heldout instances over 9 additional games in the test set. We further provide baseline models using rules-based, question-answering, and sequence learning approaches in addition to an analysis of the data and corresponding learning tasks.


[204] 2106.09580

Optimality and Stability in Federated Learning: A Game-theoretic Approach

Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also to provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players). First, we provide and prove the correctness of an efficient algorithm to calculate an optimal (error minimizing) arrangement of players. Next, we analyze the relationship between the stability and optimality of an arrangement. We show that for some regions of parameter space, all stable arrangements are optimal (Price of Anarchy equal to 1). However, we show this is not true for all settings: there exist examples of stable arrangements with higher cost than optimal (Price of Anarchy greater than 1). Finally, we give the first constant-factor bound on the performance gap between stability and optimality, proving that the total error of the worst stable solution can be no higher than 9 times the total error of an optimal solution (Price of Anarchy bound of 9).
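
For concreteness, the Price of Anarchy referred to above is the usual worst-stable-to-optimal cost ratio; writing the cost as the total error of the federating agents (the err notation is assumed here), the stated bound reads:

```latex
% Price of Anarchy: cost of the worst stable arrangement relative to optimal.
\begin{equation}
  \mathrm{PoA}
  \;=\;
  \frac{\max_{\pi \in \Pi_{\mathrm{stable}}} \mathrm{err}(\pi)}
       {\min_{\pi} \mathrm{err}(\pi)}
  \;\le\; 9 .
\end{equation}
```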


[205] 2106.09584

SIFT Matching by Context Exposed

This paper investigates how to step up local image descriptor matching by exploiting matching context information. Two main contexts are identified, originated respectively from the descriptor space and from the keypoint space. The former is generally used to design the actual matching strategy while the latter to filter matches according to the local spatial consistency. On this basis, a new matching strategy and a novel local spatial filter, named respectively blob matching and Delaunay Triangulation Matching (DTM), are devised. Blob matching provides a general matching framework by merging together several strategies, including pre-filtering as well as many-to-many and symmetric matching, enabling a global improvement upon each individual strategy. DTM alternates between Delaunay triangulation contractions and expansions to figure out and adjust keypoint neighborhood consistency. Experimental evaluation shows that DTM is comparable or better than the state-of-the-art in terms of matching accuracy and robustness, especially for non-planar scenes. Evaluation is carried out according to a new benchmark devised for analyzing the matching pipeline in terms of correct correspondences on both planar and non-planar scenes, including state-of-the-art methods as well as the common SIFT matching approach for reference. This evaluation can be of assistance for future research in this field.


[206] 2106.09586

Prevalence and Propagation of Fake News

In recent years, scholars have raised concerns on the effects that unreliable news, or "fake news," has on our political sphere, and our democracy as a whole. For example, the propagation of fake news on social media is widely believed to have influenced the outcome of national elections, including the 2016 U.S. Presidential Election, as well as the public response to the 2020 COVID-19 pandemic. What drives the propagation of fake news on an individual level, and which interventions could effectively reduce the propagation rate? Our model disentangles bias from truthfulness of an article and examines the relationship between these two parameters and a reader's own beliefs. Using the model, we create policy recommendations for both social media platforms and individual social media users to reduce the spread of untruthful or highly biased news. We recommend that platforms sponsor unbiased truthful news, focus fact-checking efforts on mild to moderately biased news, recommend friend suggestions across the political spectrum, and provide users with reports about the political alignment of their feed. We recommend that individual social media users fact check news that strongly aligns with their political bias and read articles of opposing political bias.


[207] 2106.09588

End-to-End Cross-Domain Text-to-SQL Semantic Parsing with Auxiliary Task

In this work, we focus on two crucial components in the cross-domain text-to-SQL semantic parsing task: schema linking and value filling. To encourage the model to learn better encoding ability, we propose a column selection auxiliary task to empower the encoder with the relevance matching capability by using explicit learning targets. Furthermore, we propose two value filling methods to build the bridge from the existing zero-shot semantic parsers to real-world applications, considering that most existing parsers ignore value filling in the synthesized SQL. With experiments on Spider, our proposed framework improves over the baselines on execution accuracy and exact set match accuracy when database contents are unavailable, and a detailed analysis sheds light on future work.


[208] 2106.09589

Classifying vaccine sentiment tweets by modelling domain-specific representation and commonsense knowledge into context-aware attentive GRU

Vaccines are an important public health measure, but vaccine hesitancy and refusal can create clusters of low vaccine coverage and reduce the effectiveness of vaccination programs. Social media provides an opportunity to estimate emerging risks to vaccine acceptance by including geographical location and detailing vaccine-related concerns. Methods for classifying social media posts, such as vaccine-related tweets, use language models (LMs) trained on general domain text. However, challenges to measuring vaccine sentiment at scale arise from the absence of tonal stress and gestural cues, and from the fact that additional information about the user, e.g., past tweets or social connections, may not always be available. Another challenge for LMs is the lack of commonsense knowledge that is apparent in user metadata, i.e., emoticons, positive and negative words, etc. In this study, to classify vaccine sentiment tweets with limited information, we present a novel end-to-end framework consisting of interconnected components that use a domain-specific LM trained on vaccine-related tweets and model commonsense knowledge in a bidirectional gated recurrent network (CK-BiGRU) with context-aware attention. We further leverage syntactical, user metadata and sentiment information to capture the sentiment of a tweet. We experimented using two popular vaccine-related Twitter datasets and demonstrate that our proposed approach outperforms state-of-the-art models in identifying pro-vaccine, anti-vaccine and neutral tweets.


[209] 2106.09590

Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

This paper presents a framework for assessing data and metadata quality within Open Data portals. Although a few benchmark frameworks already exist for this purpose, they are not yet detailed enough in both breadth and depth to make valid statements about the actual discoverability and accessibility of publicly available data collections. To address this research gap, we have designed a quality framework that is able to evaluate data quality in Open Data portals on dedicated and fine-grained dimensions, such as interoperability, findability, uniqueness or completeness. Additionally, we propose quality measures that allow for valid assessments regarding cross-portal findability and uniqueness of dataset descriptions. We have validated our novel quality framework for the German Open Data landscape and found that metadata often still lacks meaningful descriptions and is not yet extensively connected to the Semantic Web.


[210] 2106.09592

Data lake concept and systems: a survey

Although big data has been discussed for some years, it still poses many research challenges, especially regarding the variety of data, which makes it very difficult to efficiently integrate, access, and query the large volume of diverse data in information silos with traditional 'schema-on-write' approaches such as data warehouses. Data lakes have been proposed as a solution to this problem. They are repositories storing raw data in its original formats and providing a common access interface. This survey reviews the development, definition, and architectures of data lakes. We provide a comprehensive overview of research questions for designing and building data lakes. We classify the existing data lake systems based on their provided functions, which makes this survey a useful technical reference for designing, implementing and applying data lakes. We hope that the thorough comparison of existing solutions and the discussion of open research challenges in this survey will motivate the future development of data lake research and practice.


[211] 2106.09599

On PQC Migration and Crypto-Agility

Besides the development of PQC algorithms, the actual migration of IT systems to such new schemes has to be considered, best by utilizing or establishing crypto-agility. Much work in this respect is currently conducted all over the world, making it hard to keep track of the many individual challenges and respective solutions that have been identified. In consequence, it is difficult to judge for both individual application scenarios and on a global scale, whether all (known) challenges have been addressed respectively or what their current state is. We provide a literature survey and a snapshot of the discovered challenges and solutions categorized in different areas. We use this as starting point for a community project to keep track of the ongoing efforts and the state of the art in this field. Thereby we offer a single entry-point into the subject reflecting the current state in a timely manner.


[212] 2106.09608

Learning Knowledge Graph-based World Models of Textual Environments

World models improve a learning agent's ability to efficiently operate in interactive and situated environments. This work focuses on the task of building world models of text-based game environments. Text-based games, or interactive narratives, are reinforcement learning environments in which agents perceive and interact with the world using textual natural language. These environments contain long, multi-step puzzles or quests woven through a world that is filled with hundreds of characters, locations, and objects. Our world model learns to simultaneously: (1) predict changes in the world caused by an agent's actions when representing the world as a knowledge graph; and (2) generate the set of contextually relevant natural language actions required to operate in the world. We frame this task as a Set of Sequences generation problem by exploiting the inherent structure of knowledge graphs and actions and introduce both a transformer-based multi-task architecture and a loss function to train it. A zero-shot ablation study on never-before-seen textual worlds shows that our methodology significantly outperforms existing textual world modeling techniques and demonstrates the importance of each of our contributions.


[213] 2106.09613

Meta-Calibration: Meta-Learning of Model Calibration Using Differentiable Expected Calibration Error

Calibration of neural networks is a topical problem that is becoming increasingly important for real-world use of neural networks. The problem is especially noticeable when using modern neural networks, for which there is a significant difference between the model confidence and the confidence it should have. Various strategies have been successfully proposed, yet there is still room for improvement. We propose a novel approach that introduces a differentiable metric for expected calibration error and successfully uses it as an objective for meta-learning, achieving competitive results with state-of-the-art approaches. Our approach presents a new direction of using meta-learning to directly optimize model calibration, which we believe will inspire further work in this promising and new direction.
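
As background, the standard binned expected calibration error that the paper replaces with a differentiable surrogate is usually written as below, where S_b is the set of samples whose confidence falls in bin b, B is the number of bins, and N is the total sample count (notation assumed here):

```latex
% Binned expected calibration error; the paper's contribution is a
% differentiable surrogate of this quantity used as a meta-learning objective.
\begin{equation}
  \mathrm{ECE} \;=\; \sum_{b=1}^{B} \frac{|S_b|}{N}
  \,\bigl|\, \mathrm{acc}(S_b) - \mathrm{conf}(S_b) \,\bigr| .
\end{equation}
```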


[214] 2106.09614

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

3D face reconstruction from a single image is challenging due to its ill-posed nature. Model-based face autoencoders address this issue effectively by fitting a face model to the target image in a weakly supervised manner. However, in unconstrained environments occlusions distort the face reconstruction because the model often erroneously tries to adapt to occluded face regions. Supervised occlusion segmentation is a viable solution to avoid the fitting of occluded face regions, but it requires a large amount of annotated training data. In this work, we enable model-based face autoencoders to segment occluders accurately without requiring any additional supervision during training, and this separates regions where the model will be fitted from those where it will not be fitted. To achieve this, we extend face autoencoders with a segmentation network. The segmentation network decides which regions the model should adapt to by balancing a trade-off between including pixels and adapting the model to them, and excluding pixels so that the model fitting is not negatively affected and reaches higher overall reconstruction accuracy on the pixels showing the face. This leads to a synergistic effect, in which the occlusion segmentation guides the training of the face autoencoder to constrain the fitting in the non-occluded regions, while the improved fitting enables the segmentation model to better predict the occluded face regions. Qualitative and quantitative experiments on the CelebA-HQ database and the AR database verify the effectiveness of our model in improving 3D face reconstruction under occlusions and in enabling accurate occlusion segmentation from weak supervision only. Code available at https://github.com/unibas-gravis/Occlusion-Robust-MoFA.


[215] 2106.09621

Privacy-Preserving Eye-tracking Using Deep Learning

The expanding usage of complex machine learning methods like deep learning has led to an explosion in human activity recognition, particularly applied to health. In particular, as part of a larger body sensor network system, face and full-body analysis is becoming increasingly common for evaluating health status. However, complex models, which handle private and sometimes protected data, raise concerns about the potential leakage of identifiable data. In this work, we focus on the case of a deep network model trained on images of individual faces. Full-face video recordings taken from 493 individuals undergoing an eye-tracking based evaluation of neurological function were used. Outputs, gradients, intermediate layer outputs, loss, and labels were used as inputs for a deep network with an added support vector machine emission layer to recognize membership in the training data. The inference attack method and associated mathematical analysis indicate that there is a low likelihood of unintended memorization of facial features in the deep learning model. In this study, it is shown that the named model preserves the integrity of the training data with reasonable confidence. The same process can be implemented in similar conditions for different models.


[216] 2106.09623

Towards Explainable Student Group Collaboration Assessment Models Using Temporal Representations of Individual Student Roles

Collaboration is identified as a required and necessary skill for students to be successful in the fields of Science, Technology, Engineering and Mathematics (STEM). However, due to a growing student population and limited teaching staff, it is difficult for teachers to provide constructive feedback and instill collaborative skills using instructional methods. Development of simple and easily explainable machine-learning-based automated systems can help address this problem. Improving upon our previous work, in this paper we propose using simple temporal-CNN deep-learning models to assess student group collaboration that take in temporal representations of individual student roles as input. We check the applicability of dynamically changing feature representations for student group collaboration assessment and how they impact the overall performance. We also use Grad-CAM visualizations to better understand and interpret the important temporal indices that led to the deep-learning model's decision.


[217] 2106.09624

Probabilistic Stability Assessment for Active Distribution Grids

This paper demonstrates the concept of probabilistic stability assessment on large-signal stability in the use case of short circuits in an active distribution grid. Here, the concept of survivability is applied, which extends classical stability assessments by evaluating the stability and operational limits during transients for a wide range of operating points and failures. For this purpose, a free, open-source, and computationally efficient environment (Julia) for dynamic simulation of power grids is used to demonstrate its capabilities. The model implementation is validated against established commercial software and deviations are minimal with respect to power flow and dynamic simulations. The results of a large-scale survivability analysis reveal i) a broad field of application for probabilistic stability analysis and ii) that new non-intuitive stability correlations can be obtained. Hence, the proposed method shows strong potential to efficiently conduct power system stability analysis in active distribution grids.


[218] 2106.09627

Towards Prevention of Sportsmen Burnout: Formal Analysis of Sub-Optimal Tournament Scheduling

Scheduling a sports tournament is a complex optimization problem, which requires satisfying a large number of hard constraints. Despite the availability of several such constraints in the literature, there remains a gap since most of the new sports events pose their own unique set of requirements, and demand novel constraints. For strictly time-bound events in particular, ensuring fairness between the different teams in terms of their rest days, traveling, and the number of successive games they play is a difficult task to resolve and demands attention. In this work, we present a similar situation with a recently played sports event, where a suboptimal schedule favored some of the sides more than the others. We introduce various competitive parameters to draw a fairness comparison between the sides and propose a weighting criterion to point out the sides that enjoyed this schedule more than the others. Furthermore, we use the root mean squared error between an ideal schedule and the actual one for each side to determine unfairness in the distribution of rest days across their entire schedules. The latter is crucial, since successively playing a large number of games may lead to sportsmen burnout, which must be prevented.
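
Written out, the per-side rest-day deviation described above is an ordinary root mean squared error between the ideal and actual schedules; with r_i and \hat{r}_i denoting the ideal and actual rest days before game i out of G games (notation assumed here), it reads:

```latex
% Per-side unfairness measure: RMSE between ideal and actual rest-day sequences.
\begin{equation}
  \mathrm{RMSE} \;=\; \sqrt{\frac{1}{G} \sum_{i=1}^{G} \bigl( r_i - \hat{r}_i \bigr)^{2}} .
\end{equation}
```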


[219] 2106.09636

Multi-Modal Prototype Learning for Interpretable Multivariable Time Series Classification

Multivariable time series classification problems are increasing in prevalence and complexity in a variety of domains, such as biology and finance. While deep learning methods are an effective tool for these problems, they often lack interpretability. In this work, we propose a novel modular prototype learning framework for multivariable time series classification. In the first stage of our framework, encoders extract features from each variable independently. Prototype layers identify single-variable prototypes in the resulting feature spaces. The next stage of our framework represents the multivariable time series sample points in terms of their similarity to these single-variable prototypes. This results in an inherently interpretable representation of multivariable patterns, on which prototype learning is applied to extract representative examples i.e. multivariable prototypes. Our framework is thus able to explicitly identify both informative patterns in the individual variables, as well as the relationships between the variables. We validate our framework on a simulated dataset with embedded patterns, as well as a real human activity recognition problem. Our framework attains comparable or superior classification performance to existing time series classification methods on these tasks. On the simulated dataset, we find that our model returns interpretations consistent with the embedded patterns. Moreover, the interpretations learned on the activity recognition dataset align with domain knowledge.


[220] 2106.09637

AttDLNet: Attention-based DL Network for 3D LiDAR Place Recognition

Deep networks have been progressively adapted to new sensor modalities, namely to 3D LiDAR, which led to unprecedented achievements in autonomous vehicle-related applications such as place recognition. One of the main challenges of deep models in place recognition is to extract efficient and descriptive feature representations that relate places based on their similarity. To address the problem of place recognition using LiDAR data, this paper proposes a novel 3D LiDAR-based deep learning network (named AttDLNet) that comprises an encoder network and exploits an attention mechanism to selectively focus on long-range context and inter-feature relationships. The proposed network is trained and validated on the KITTI dataset, using the cosine loss for training and a retrieval-based place recognition pipeline for validation. Additionally, an ablation study is presented to assess the best network configuration. Results show that the encoder network features are already very descriptive, but adding attention to the network further improves performance. From the ablation study, results indicate that the middle encoder layers have the highest mean performance, while deeper layers are more robust to orientation change. The code is publicly available on the project website: https://github.com/Cybonic/AttDLNet


[221] 2106.09640

Microgrid Resilience: A Holistic and Context-Aware Resilience Metric

Microgrids present an effective solution for the coordinated deployment of various distributed energy resources and furthermore provide myriad additional benefits such as resilience, decreased carbon footprint, and reliability to energy consumers and the energy system as a whole. Boosting the resilience of distribution systems is another major benefit of microgrids. This is because they can also serve as a backup power source when the utility grid operations are interrupted due to either high-probability low-impact events like a component failure or low-probability high-impact events - be it a natural disaster or a planned cyberattack. However, the degree to which any particular system can defend, adapt, and restore normal operation depends on various factors including the type and severity of events to which a microgrid is subjected. These factors, in turn, are dependent on the geographical location of the deployed microgrid as well as the cyber risk profile of the site where the microgrid is operating. Therefore, in this work, we attempt to capture this multi-dimensional interplay of various factors in quantifying the ability of the microgrid to be resilient in these varying aspects. This paper thus proposes a customized, site-specific quantification of an individual microgrid's capability to absorb, restore, and adapt to changing circumstances in order to sustain the critical load when a low-probability high-impact event occurs, termed the context-aware resilience metric. We also present a case study to illustrate the key elements of our integrated analytical approach.


[222] 2106.09643

MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data

Class-imbalanced data, in which some classes contain far more samples than others, is ubiquitous in real-world applications. Standard techniques for handling class-imbalance usually work by training on a re-weighted loss or on re-balanced data. Unfortunately, training overparameterized neural networks on such objectives causes rapid memorization of minority class data. To avoid this trap, we harness meta-learning, which uses both an ''outer-loop'' and an ''inner-loop'' loss, each of which may be balanced using different strategies. We evaluate our method, MetaBalance, on image classification, credit-card fraud detection, loan default prediction, and facial recognition tasks with severely imbalanced data, and we find that MetaBalance outperforms a wide array of popular re-sampling strategies.


[223] 2106.09645

Prototypical Graph Contrastive Learning

Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. But in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task which pulls together positive pairs (augmentation pairs of the same graph) and pushes away negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since for a query, its negatives are uniformly sampled from all graphs, existing methods suffer from the critical sampling bias issue, i.e., the negatives likely have the same semantic structure as the query, leading to performance degradation. To mitigate this sampling bias issue, in this paper, we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data via clustering semantically similar graphs into the same group, and simultaneously encourages the clustering consistency for different augmentations of the same graph. Then given a query, it performs negative sampling via drawing the graphs from those clusters that differ from the cluster of query, which ensures the semantic difference between query and its negative samples. Moreover, for a query, PGCL further reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query prototype such that those negatives having moderate prototype distance enjoy relatively large weights. This reweighting strategy is proved to be more effective than uniform sampling. Experimental results on various graph benchmarks testify to the advantages of our PGCL over state-of-the-art methods.


[224] 2106.09647

Deep Learning Through the Lens of Example Difficulty

Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth. Our extensive investigation reveals surprising yet simple relationships between the prediction depth of a given input and the model's uncertainty, confidence, accuracy and speed of learning for that data point. We further categorize difficult examples into three interpretable groups, demonstrate how these groups are processed differently inside deep models and showcase how this understanding allows us to improve prediction accuracy. Insights from our study lead to a coherent view of a number of separately reported phenomena in the literature: early layers generalize while later layers memorize; early layers converge faster and networks learn easy data and simple functions first.


[225] 2106.09650

Multi-head or Single-head? An Empirical Comparison for Transformer Training

Multi-head attention plays a crucial role in the recent success of Transformer models, which leads to consistent performance improvements over conventional attention in various applications. The popular belief is that this effectiveness stems from the ability of jointly attending multiple positions. In this paper, we first demonstrate that jointly attending to multiple positions is not a unique feature of multi-head attention, as multi-layer single-head attention also attends to multiple positions and is more effective. We then suggest that the main advantage of multi-head attention is training stability, since it requires fewer layers than single-head attention when attending to the same number of positions. For example, a 24-layer 16-head Transformer (BERT-large) and a 384-layer single-head Transformer have the same total number of attention heads and roughly the same model size, while the multi-head one is significantly shallower. Meanwhile, we show that, with recent advances in deep learning, we can successfully stabilize the training of the 384-layer Transformer. As training difficulty is no longer a bottleneck, the substantially deeper single-head Transformer achieves consistent performance improvements without tuning hyper-parameters.


[226] 2106.09658

Non-intrusive Nonlinear Model Reduction via Machine Learning Approximations to Low-dimensional Operators

Although projection-based reduced-order models (ROMs) for parameterized nonlinear dynamical systems have demonstrated exciting results across a range of applications, their broad adoption has been limited by their intrusivity: implementing such a reduced-order model typically requires significant modifications to the underlying simulation code. To address this, we propose a method that enables traditionally intrusive reduced-order models to be accurately approximated in a non-intrusive manner. Specifically, the approach approximates the low-dimensional operators associated with projection-based ROMs using modern machine-learning regression techniques. The only requirement on the simulation code is the ability to export the velocity given the state and parameters, as this functionality is used to train the approximated low-dimensional operators. In addition to enabling non-intrusivity, we demonstrate that the approach also leads to very low computational complexity, achieving up to a $1000\times$ reduction in run time. We demonstrate the effectiveness of the proposed technique on two types of PDEs.


[227] 2106.09659

Robustness and Consistency in Linear Quadratic Control with Predictions

We study the problem of learning-augmented predictive linear quadratic control. Our goal is to design a controller that balances consistency, which measures the competitive ratio when predictions are accurate, and robustness, which bounds the competitive ratio when predictions are inaccurate. We propose a novel $\lambda$-confident controller and prove that it maintains a competitive ratio upper bound of $1+\min\{O(\lambda^2\varepsilon)+ O(1-\lambda)^2,O(1)+O(\lambda^2)\}$ where $\lambda\in [0,1]$ is a trust parameter set based on the confidence in the predictions, and $\varepsilon$ is the prediction error. Further, we design a self-tuning policy that adaptively learns the trust parameter $\lambda$ with a regret that depends on $\varepsilon$ and the variation of perturbations and predictions.


[228] 2106.09665

Understanding the Effectiveness of Reviews in E-commerce Top-N Recommendation

Modern E-commerce websites contain heterogeneous sources of information, such as numerical ratings, textual reviews and images. This information can be utilized to assist recommendation. Through textual reviews, a user explicitly expresses her affinity towards an item. Previous researchers have found that using the information extracted from these reviews allows better profiling of users' explicit preferences as well as item features, leading to improved recommendation performance. However, most previous algorithms only utilized review information for the explicit-feedback problem, i.e., rating prediction, and for implicit-feedback ranking problems such as top-N recommendation, the usage of review information has not been fully explored. Seeing this gap, in this work we investigate the effectiveness of textual review information for top-N recommendation under E-commerce settings. We adapt several SOTA review-based rating prediction models to top-N recommendation tasks and compare them with existing top-N recommendation models in terms of both performance and efficiency. We find that models utilizing only review information cannot achieve better performance than a vanilla implicit-feedback matrix factorization method. When review information is used as a regularizer or as auxiliary information, the performance of the implicit-feedback matrix factorization method can be further improved. However, the optimal model structure for utilizing textual reviews in E-commerce top-N recommendation is yet to be determined.


[229] 2106.09667

Poisoning and Backdooring Contrastive Learning

Contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.005% of a dataset (e.g., just 150 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier, requiring control of less than 0.0001% of the dataset (e.g., just two out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.


[230] 2106.09668

Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech

This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge, which aims to develop methods that can assist in the automated prediction of the severity of Alzheimer's Disease from speech data. We focus on acoustic and natural language features for cognitive impairment detection in spontaneous speech in the context of Alzheimer's Disease diagnosis and mini-mental state examination (MMSE) score prediction. We propose a model that obtains unimodal decisions from different LSTMs, one for each modality of text and audio, and then combines them using a gating mechanism for the final prediction. We focus on sequential modelling of text and audio and investigate whether the disfluencies present in individuals' speech relate to the extent of their cognitive impairment. Our results show that the proposed classification and regression schemes obtain very promising results on both the development and test sets. This suggests Alzheimer's Disease can be detected successfully with sequence modeling of the speech data of medical sessions.


[231] 2106.09669

Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous work on audio-visual on-screen sound separation, including the simplicity and coarse resolution of spatio-temporal attention, and poor convergence of the audio separation model. Our proposed model addresses these issues using cross-modal and self-attention modules that capture audio-visual dependencies at a finer resolution over time, and by unsupervised pre-training of the audio separation model. These improvements allow the model to generalize to a much wider set of unseen videos. For evaluation and semi-supervised training, we collected human annotations of on-screen audio from a large database of in-the-wild videos (YFCC100M). Our results show marked improvements in on-screen separation performance, in more general conditions than previous methods.


[232] 2106.09670

Indian Masked Faces in the Wild Dataset

Due to the COVID-19 pandemic, wearing face masks has become mandatory in public places worldwide. Face masks occlude a significant portion of the facial region. Additionally, people wear different types of masks, from simple ones to ones with graphics and prints. These pose new challenges to face recognition algorithms. Researchers have recently proposed a few masked face datasets for designing algorithms to overcome the challenges of masked face recognition. However, existing datasets lack cultural diversity and collection in unrestricted settings. In a country like India, with its diversity of attire, people are not limited to conventional masks; they also cover their faces with clothing such as a thin printed cotton towel (locally called a ``gamcha''), ``stoles'', and ``handkerchiefs''. In this paper, we present a novel \textbf{Indian Masked Faces in the Wild (IMFW)} dataset which contains images with variations in pose, illumination, resolution, and the variety of masks worn by the subjects. We have also benchmarked the performance of existing face recognition models on the proposed IMFW dataset. Experimental results demonstrate the limitations of existing algorithms in the presence of such diverse conditions.


[233] 2106.09671

An Attract-Repel Decomposition of Undirected Networks

Dot product latent space embedding is a common form of representation learning in undirected graphs (e.g. social networks, co-occurrence networks). We show that such models have problems dealing with 'intransitive' situations where A is linked to B, B is linked to C but A is not linked to C. Such situations occur in social networks when opposites attract (heterophily) and in co-occurrence networks when there are substitute nodes (e.g. the presence of Pepsi or Coke, but rarely both, in otherwise similar purchase baskets). We present a simple expansion which we call the attract-repel (AR) decomposition: a set of latent attributes on which similar nodes attract and another set of latent attributes on which similar nodes repel. We demonstrate the AR decomposition in real social networks and show that it can be used to measure the amount of latent homophily and heterophily. In addition, it can be applied to co-occurrence networks to discover roles in teams and find substitutable ingredients in recipes.
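
A minimal numpy sketch of the scoring rule implied by the attract-repel (AR) decomposition described above: each node carries an attract vector and a repel vector, and a pair's affinity is the attract dot product minus the repel dot product. The training procedure that fits these vectors is not shown, and the toy vectors below are illustrative.

    import numpy as np

    def ar_score(attract_i, repel_i, attract_j, repel_j):
        """Affinity under an attract-repel embedding: similar attract vectors
        raise the score, similar repel vectors lower it."""
        return attract_i @ attract_j - repel_i @ repel_j

    # Hand-built intransitive triple: A-B and B-C score high, A-C scores low,
    # which a pure dot-product embedding cannot represent.
    attract = {"A": np.array([1.0, 0.0]), "B": np.array([1.0, 0.0]), "C": np.array([1.0, 0.0])}
    repel   = {"A": np.array([2.0]),      "B": np.array([0.0]),      "C": np.array([2.0])}
    for i, j in [("A", "B"), ("B", "C"), ("A", "C")]:
        print(i, j, ar_score(attract[i], repel[i], attract[j], repel[j]))
    # A B 1.0, B C 1.0, A C -3.0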


[234] 2106.09672

The 2021 Image Similarity Dataset and Challenge

This paper introduces a new benchmark for large-scale image similarity detection. This benchmark is used for the Image Similarity Challenge at NeurIPS'21 (ISC2021). The goal is to determine whether a query image is a modified copy of any image in a reference corpus of size 1~million. The benchmark features a variety of image transformations such as automated transformations, hand-crafted image edits and machine-learning based manipulations. This mimics real-life cases appearing in social media, for example for integrity-related problems dealing with misinformation and objectionable content. The strength of the image manipulations, and therefore the difficulty of the benchmark, is calibrated according to the performance of a set of baseline approaches. Both the query and reference set contain a majority of ``distractor'' images that do not match, which corresponds to a real-life needle-in-haystack setting, and the evaluation metric reflects that. We expect the DISC21 benchmark to promote image copy detection as an important and challenging computer vision task and refresh the state of the art.


[235] 2106.09675

Gone Fishing: Neural Active Learning with Fisher Embeddings

There is an increasing need for effective active learning algorithms that are compatible with deep neural networks. While there are many classic, well-studied sample selection methods, the non-convexity and varying internal representation of neural models make it unclear how to extend these approaches. This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks that addresses these concerns. BAIT draws inspiration from the theoretical analysis of maximum likelihood estimators (MLE) for parametric models. It selects batches of samples by optimizing a bound on the MLE error in terms of the Fisher information, which we show can be implemented efficiently at scale by exploiting linear-algebraic structure especially amenable to execution on modern hardware. Our experiments show that BAIT outperforms the previous state of the art on both classification and regression problems, and is flexible enough to be used with a variety of model architectures.


[236] 2106.09677

Adaptive Low-Rank Regularization with Damping Sequences to Restrict Lazy Weights in Deep Networks

Overfitting is one of the critical problems in deep neural networks. Many regularization schemes try to prevent overfitting blindly, but they decrease the convergence speed of training algorithms. Adaptive regularization schemes can address overfitting more intelligently, as they usually do not affect the entire network's weights. This paper detects a subset of the weight layers that cause overfitting, recognized via matrix and tensor condition numbers. We propose an adaptive regularization scheme, entitled Adaptive Low-Rank (ALR), that drives this subset of weight layers toward their Low-Rank Factorization (LRF) by minimizing a new Tikhonov-based loss function. ALR also encourages lazy weights to contribute to the regularization as training progresses, using a damping sequence to increase the layer-selection likelihood in the later epochs. Thus, before training accuracy starts to fall, ALR suppresses the lazy weights and regularizes the network substantially. The experimental results show that ALR regularizes deep networks well, with high training speed and low resource usage.
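
The abstract does not give the loss in closed form; the sketch below shows one plausible reading, under stated assumptions, of a Tikhonov-style penalty that pulls a selected weight matrix toward its rank-k truncation, with a damping sequence that grows over epochs. The rank, the quadratic ramp, and the layer-selection comment are placeholders, not the paper's choices.

    import torch

    def low_rank_penalty(W, rank):
        """Squared Frobenius distance from W to its best rank-`rank` approximation
        (target detached so the gradient only pulls W toward the truncation)."""
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        W_k = (U[:, :rank] * S[:rank]) @ Vh[:rank, :]
        return ((W - W_k.detach()) ** 2).sum()

    def damping(epoch, total_epochs, lam_max=1e-3):
        """Placeholder damping sequence: the penalty weight ramps up over training."""
        return lam_max * (epoch / total_epochs) ** 2

    # Inside a training step, for layers flagged for regularization (e.g., by a
    # condition-number test, as the abstract suggests):
    # loss = task_loss + damping(epoch, total_epochs) * sum(
    #     low_rank_penalty(layer.weight, rank=8) for layer in selected_layers)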


[237] 2106.09678

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code release and video are available at https://linxifan.github.io/secant-site/.


[238] 2106.09679

JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting

The task of unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks. While early works concentrated on specific object priors such as a human face or body, recent work considered the unsupervised case. When the source and target videos, however, are of different shapes, current methods fail. To alleviate this problem, we introduce JOKR - a JOint Keypoint Representation that captures the motion common to both the source and target videos, without requiring any object prior or data collection. By employing a domain confusion term, we enforce the unsupervised keypoint representations of both videos to be indistinguishable. This encourages disentanglement between the parts of the motion that are common to the two domains, and their distinctive appearance and motion, enabling the generation of videos that capture the motion of the one while depicting the style of the other. To enable cases where the objects are of different proportions or orientations, we apply a learned affine transformation between the JOKRs. This augments the representation to be affine invariant, and in practice broadens the variety of possible retargeting pairs. This geometry-driven representation enables further intuitive control, such as temporal coherence and manual editing. Through comprehensive experimentation, we demonstrate the applicability of our method to different challenging cross-domain video pairs. We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans. We also demonstrate superior temporal coherency and visual quality compared to state-of-the-art alternatives, through statistical metrics and a user study. Source code and videos can be found at https://rmokady.github.io/JOKR/ .


[239] 2106.09680

Accuracy, Interpretability, and Differential Privacy via Explainable Boosting

We show that adding differential privacy to Explainable Boosting Machines (EBMs), a recent method for training interpretable ML models, yields state-of-the-art accuracy while protecting privacy. Our experiments on multiple classification and regression datasets show that DP-EBM models suffer surprisingly little accuracy loss even with strong differential privacy guarantees. In addition to high accuracy, two other benefits of applying DP to EBMs are: a) trained models provide exact global and local interpretability, which is often important in settings where differential privacy is needed; and b) the models can be edited after training without loss of privacy to correct errors which DP noise may have introduced.


[240] 2106.09681

XCiT: Cross-Covariance Image Transformers

Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e., words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic complexity in time and memory, hindering application to long sequences and high-resolution images. We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries. The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images. Our cross-covariance image transformer (XCiT) is built upon XCA. It combines the accuracy of conventional transformers with the scalability of convolutional architectures. We validate the effectiveness and generality of XCiT by reporting excellent results on multiple vision benchmarks, including image classification and self-supervised feature learning on ImageNet-1k, object detection and instance segmentation on COCO, and semantic segmentation on ADE20k.
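
A compact sketch of the cross-covariance attention idea summarized above: queries and keys are normalized along the token axis and attention becomes a d x d map over feature channels, so the cost is linear in the number of tokens. This follows the abstract's description rather than the released code; the single head, the fixed temperature, and the choice of softmax axis are simplifications.

    import torch
    import torch.nn.functional as F

    def cross_covariance_attention(x, Wq, Wk, Wv, tau=1.0):
        """x: (N, d) tokens. Attention acts on the d x d cross-covariance of keys
        and queries instead of the N x N token-token matrix."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv            # each (N, d)
        q = F.normalize(q, dim=0)                   # normalize along the token axis
        k = F.normalize(k, dim=0)
        attn = F.softmax((k.transpose(0, 1) @ q) / tau, dim=0)   # (d, d)
        return v @ attn                             # (N, d): mixes feature channels

    N, d = 196, 64
    x = torch.randn(N, d)
    Wq, Wk, Wv = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
    out = cross_covariance_attention(x, Wq, Wk, Wv)
    print(out.shape)  # torch.Size([196, 64])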


[241] 2106.09683

PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classes

We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and ghost sample. This allows us to get nontrivial PAC-Bayes and MI-style bounds for general VC classes, something recently shown to be impossible with standard PAC-Bayesian/MI bounds. Second, it allows us to get faster rates of order $O \left(({\text{KL}}/n)^{\gamma}\right)$ for $\gamma > 1/2$ if a Bernstein condition holds and for exp-concave losses (with $\gamma=1$), which is impossible with both standard PAC-Bayes generalization and MI bounds. Our work extends the recent work by Steinke and Zakynthinou [2020] who handle MI with VC but neither PAC-Bayes nor fast rates, the recent work of Hellstr\"om and Durisi [2020] who extend the latter to the PAC-Bayes setting via a unifying exponential inequality, and Mhammedi et al. [2019] who initiated fast rate PAC-Bayes generalization error bounds but handle neither MI nor general VC classes.


[242] 2106.09685

LoRA: Low-Rank Adaptation of Large Language Models

The dominant paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, conventional fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying many independent instances of fine-tuned models, each with 175B parameters, is extremely expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning. LoRA performs on-par or better than fine-tuning in model quality on both GPT-3 and GPT-2, despite having fewer trainable parameters, a higher training throughput, and no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptations, which sheds light on the efficacy of LoRA. We release our implementation in GPT-2 at https://github.com/microsoft/LoRA .
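
A minimal sketch of the low-rank update described above, wrapping a linear layer so that the pre-trained weight stays frozen while a trainable product of two low-rank matrices is added to its output. The scaling convention and initialization are common conventions assumed here; the official implementation at the linked repository differs in detail.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """y = x W0^T + (alpha / r) * x A^T B^T, with W0 frozen and only A, B trained."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False        # freeze the pre-trained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            # B starts at zero, so training begins exactly at the pre-trained model.
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(768, 768), r=4)
    print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
    # 6144: only A and B are trainable, versus 590592 for full fine-tuning of this layer.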


[243] 2106.09686

How Low Can We Go: Trading Memory for Error in Low-Precision Training

Low-precision arithmetic trains deep learning models using less energy, less memory and less time. However, we pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error. As applications proliferate, users must choose which precision to use to train a new model, and chip manufacturers must decide which precisions to manufacture. We view these precision choices as a hyperparameter tuning problem, and borrow ideas from meta-learning to learn the tradeoff between memory and error. In this paper, we introduce Pareto Estimation to Pick the Perfect Precision (PEPPP). We use matrix factorization to find non-dominated configurations (the Pareto frontier) with a limited number of network evaluations. For any given memory budget, the precision that minimizes error is a point on this frontier. Practitioners can use the frontier to trade memory for error and choose the best precision for their goals.
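
As a small illustration of the final selection step described above, the sketch below computes the non-dominated set of (memory, error) configurations and picks the lowest-error point within a memory budget; the matrix-factorization step that PEPPP uses to estimate these points from few evaluations is omitted, and the numbers are made up.

    def pareto_frontier(configs):
        """configs: list of (memory, error) pairs. Returns the non-dominated subset,
        i.e., points for which no other point has both lower memory and lower error."""
        frontier = []
        for m, e in sorted(configs):               # sort by memory, then error
            if not frontier or e < frontier[-1][1]:
                frontier.append((m, e))
        return frontier

    def best_under_budget(frontier, memory_budget):
        feasible = [(m, e) for m, e in frontier if m <= memory_budget]
        return min(feasible, key=lambda p: p[1]) if feasible else None

    configs = [(1.0, 0.30), (2.0, 0.12), (2.5, 0.20), (4.0, 0.08), (8.0, 0.09)]
    print(pareto_frontier(configs))                              # [(1.0, 0.3), (2.0, 0.12), (4.0, 0.08)]
    print(best_under_budget(pareto_frontier(configs), 3.0))      # (2.0, 0.12)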


[244] 2106.09689

Statistical Query Lower Bounds for List-Decodable Linear Regression

We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0< \alpha <1/2$ such that an $\alpha$-fraction of the points in $T$ are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining $(1-\alpha)$-fraction of the points are drawn from an arbitrary noise distribution. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.


[245] 2106.09691

Interactive Change Point Detection using optimisation approach and Bayesian statistics applied to real world applications

Change point detection becomes more and more important as datasets increase in size, and unsupervised detection algorithms can help users process the data. A number of unsupervised change point detection algorithms have been developed, based on different principles. One approach is to define an optimisation problem and minimise a cost function together with a penalty function; in this approach, the choice of the cost function affects the predictions made by the algorithm. Extending existing studies, a new type of cost function using Tikhonov regularisation is introduced. Another approach uses Bayesian statistics to calculate the posterior probability distribution of a specific point being a change point. It uses a priori knowledge of the distance between consecutive change points and a likelihood function with information about the segments. The optimisation and Bayesian approaches for offline change point detection are studied and applied to simulated datasets as well as a real-world multi-phase dataset. The approaches have previously been studied separately, and a novelty lies in comparing the predictions made by the two approaches in a specific setting consisting of simulated datasets and a real-world example. The study finds that the performance of the change point detection algorithms is affected by the features in the data.


[246] 2106.09692

Hi-Phy: A Benchmark for Hierarchical Physical Reasoning

Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds. Humans are very experienced in physical reasoning while it remains a major challenge for AI. To facilitate research addressing this problem, several benchmarks have been proposed recently. However, these benchmarks do not enable us to measure an agent's granular physical reasoning capabilities when solving a complex reasoning task. In this paper, we propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities. Inspired by how humans acquire these capabilities, we propose a general hierarchy of physical reasoning capabilities with increasing complexity. Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds. This benchmark enables us to conduct a comprehensive agent evaluation by measuring the agent's granular physical reasoning capabilities. We conduct an evaluation with human players, learning agents, and heuristic agents and determine their capabilities. Our evaluation shows that learning agents, with good local generalization ability, still struggle to learn the underlying physical reasoning capabilities and perform worse than current state-of-the-art heuristic agents and humans. We believe that this benchmark will encourage researchers to develop intelligent agents with advanced, human-like physical reasoning capabilities. URL: https://github.com/Cheng-Xue/Hi-Phy


[247] 2106.09693

Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks

We propose orthogonal-Pad\'e activation functions, which are trainable activation functions, and show that they have faster learning capability and improve accuracy on standard deep learning datasets and models. Based on our experiments, we identify the two best candidates out of six orthogonal-Pad\'e activations, which we call safe Hermite-Pad\'e (HP) activation functions, namely HP-1 and HP-2. Compared to ReLU, HP-1 and HP-2 increase top-1 accuracy by 5.06% and 4.63% respectively with PreActResNet-34 and by 3.02% and 2.75% respectively with MobileNet V2 on the CIFAR100 dataset, while on the CIFAR10 dataset top-1 accuracy increases by 2.02% and 1.78% respectively with PreActResNet-34, by 2.24% and 2.06% respectively with LeNet, and by 2.15% and 2.03% respectively with EfficientNet B0.
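
A sketch of a trainable "safe" Padé-style activation in the general form this line of work uses: a ratio of polynomials with learnable coefficients whose denominator is kept away from zero. The orthogonal (e.g., Hermite) basis parameterization of the paper is not reproduced here, and the degrees and initialization are placeholders.

    import torch
    import torch.nn as nn

    class SafePadeActivation(nn.Module):
        """f(x) = P(x) / (1 + |Q(x)|) with trainable polynomial coefficients.
        Degrees and initialization are illustrative placeholders."""
        def __init__(self, num_degree=5, den_degree=4):
            super().__init__()
            self.a = nn.Parameter(torch.zeros(num_degree + 1))  # numerator coefficients
            self.b = nn.Parameter(torch.zeros(den_degree))      # denominator (no constant term)
            with torch.no_grad():
                self.a[1] = 1.0                                  # start close to the identity

        def forward(self, x):
            num = sum(self.a[i] * x ** i for i in range(len(self.a)))
            den = 1.0 + torch.abs(sum(self.b[j] * x ** (j + 1) for j in range(len(self.b))))
            return num / den

    act = SafePadeActivation()
    print(act(torch.linspace(-2, 2, 5)))  # equals the input at initialization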


[248] 2106.09694

Simulation study on the fleet performance of shared autonomous bicycles

Rethinking cities is now more imperative than ever, as society faces challenges such as population growth and climate change. The design of cities cannot be abstracted from the design of their mobility systems, and, therefore, efficient solutions must be found to transport people and goods throughout the city in an ecological way. An autonomous bicycle-sharing system that combines the benefits of vehicle sharing, electrification, autonomy, and micro-mobility could increase the efficiency and convenience of bicycle-sharing systems, incentivizing more people to bike and enjoy their cities in an environmentally friendly way. Due to the uniqueness and radical novelty of introducing autonomous driving technology into bicycle-sharing systems, and the inherent complexity of these systems, there is a need to quantify the potential impact of autonomy on fleet performance and user experience. This paper presents an ad-hoc agent-based, discrete event simulator that provides an in-depth understanding of the fleet behavior of autonomous bicycle-sharing systems in the most realistic possible scenarios, including a rebalancing system based on demand prediction. In addition, this work quantifies the extent to which an autonomous system would outperform current bicycle-sharing schemes and describes the impact of different parameters on system efficiency and service quality. This research shows that, with a fleet size three and a half times smaller than a station-based system and eight times smaller than a dockless system, an autonomous system can provide overall improved performance and user experience even with no rebalancing. These findings indicate that the remarkable efficiency of an autonomous bicycle-sharing system could compensate for the additional cost of autonomous bicycles.


[249] 2106.09696

BABEL: Bodies, Action and Behavior with English Labels

Understanding the semantics of human movement -- the what, how and why of the movement -- is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of action labels for about 43 hours of mocap sequences from AMASS. Action labels are at two levels of abstraction -- sequence labels describe the overall action in the sequence, and frame labels describe all actions in every frame of the sequence. Each frame label is precisely aligned with the duration of the corresponding action in the mocap sequence, and multiple actions can overlap. There are over 28k sequence labels, and 63k frame labels in BABEL, which belong to over 250 unique action categories. Labels from BABEL can be leveraged for tasks like action recognition, temporal action localization, motion synthesis, etc. To demonstrate the value of BABEL as a benchmark, we evaluate the performance of models on 3D action recognition. We demonstrate that BABEL poses interesting learning challenges that are applicable to real-world scenarios, and can serve as a useful benchmark of progress in 3D action recognition. The dataset, baseline method, and evaluation code are made available and supported for academic research purposes at https://babel.is.tue.mpg.de/.


[250] 2106.09700

Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study

Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug design and repurposing. Recent work has shown that general-domain language models (LMs) can serve as "soft" KGs, and that they can be fine-tuned for the task of KG completion. In this work, we study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction. We evaluate several domain-specific LMs, fine-tuning them on datasets centered on drugs and diseases that we represent as KGs and enrich with textual entity descriptions. We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance. Finally, we demonstrate the advantage of LM models in the inductive setting with novel scientific entities. Our datasets and code are made publicly available.


[251] 2106.09701

Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning

Modern computer vision applications suffer from catastrophic forgetting when incrementally learning new concepts over time. The most successful approaches to alleviate this forgetting require extensive replay of previously seen data, which is problematic when memory constraints or data legality concerns exist. In this work, we consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL), where an incremental learning agent must learn new concepts over time without storing generators or training data from past tasks. One approach for DFCIL is to replay synthetic images produced by inverting a frozen copy of the learner's classification model, but we show this approach fails for common class-incremental benchmarks when using standard distillation strategies. We diagnose the cause of this failure and propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation, and show that our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks. Our method even outperforms several standard replay based methods which store a coreset of images.


[252] 2106.09703

MoDist: Motion Distillation for Self-supervised Video Representation Learning

We present MoDist as a novel method to explicitly distill motion information into self-supervised video representations. Compared to previous video representation learning methods that mostly focus on learning motion cues implicitly from RGB inputs, we show that the representation learned with our MoDist method focuses more on foreground motion regions and thus generalizes better to downstream tasks. To achieve this, MoDist enriches standard contrastive learning objectives for RGB video clips with a cross-modal learning objective between a Motion pathway and a Visual pathway. We evaluate MoDist on several datasets for both action recognition (UCF101/HMDB51/SSv2) as well as action detection (AVA), and demonstrate state-of-the-art self-supervised performance on all datasets. Furthermore, we show that the MoDist representation can be as effective as (in some cases even better than) representations learned with full supervision. Given its simplicity, we hope MoDist could serve as a strong baseline for future research in self-supervised video representation learning.


[253] 2106.09707

Learning to Predict Visual Attributes in the Wild

Visual attributes constitute a large portion of the information contained in a scene. Objects can be described using a wide variety of attributes which portray their visual appearance (color, texture), geometry (shape, size, posture), and other intrinsic properties (state, action). Existing work is mostly limited to the study of attribute prediction in specific domains. In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances. Formally, object attribute prediction is a multi-label classification problem where all attributes that apply to an object must be predicted. Our dataset poses significant challenges to existing methods due to the large number of attributes, label sparsity, data imbalance, and object occlusion. To this end, we propose several techniques that systematically tackle these challenges, including a base model that utilizes both low- and high-level CNN features with multi-hop attention, reweighting and resampling techniques, a novel negative label expansion scheme, and a novel supervised attribute-aware contrastive learning algorithm. Using these techniques, we achieve improvements of nearly 3.7 mAP and 5.7 overall F1 points over the current state of the art. Further details about the VAW dataset can be found at this http URL


[254] 2106.09708

Multi-Label Learning from Single Positive Labels

Predicting all applicable labels for a given image is known as multi-label classification. Compared to the standard multi-class case (where each image has only one label), it is considerably more challenging to annotate training data for multi-label classification. When the number of potential labels is large, human annotators find it difficult to mention all applicable labels for each training image. Furthermore, in some settings detection is intrinsically difficult, e.g., finding small object instances in high resolution images. As a result, multi-label training data is often plagued by false negatives. We consider the hardest version of this problem, where annotators provide only one relevant label for each image. As a result, training sets will have only one positive label per image and no confirmed negatives. We explore this special case of learning from missing labels across four different multi-label image classification datasets for both linear classifiers and end-to-end fine-tuned deep networks. We extend existing multi-label losses to this setting and propose novel variants that constrain the number of expected positive labels during training. Surprisingly, we show that in some cases it is possible to approach the performance of fully labeled classifiers despite training with significantly fewer confirmed labels.
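
A hedged sketch of the kind of loss variant mentioned above: unobserved labels are treated as negatives, and a regularizer keeps the expected number of predicted positives per image near a target. The exact losses in the paper differ, and the target count and weights below are placeholders.

    import torch
    import torch.nn.functional as F

    def single_positive_loss(logits, pos_index, expected_positives=3.0, reg_weight=0.1):
        """logits: (L,) scores for L labels; pos_index: the single observed positive.
        Unobserved labels are treated as negatives, plus a penalty that pushes the
        expected count of predicted positives toward `expected_positives`."""
        targets = torch.zeros_like(logits)
        targets[pos_index] = 1.0
        bce = F.binary_cross_entropy_with_logits(logits, targets)   # "assume negative" term
        expected_count = torch.sigmoid(logits).sum()
        reg = (expected_count - expected_positives) ** 2            # expected-positives regularizer
        return bce + reg_weight * reg

    loss = single_positive_loss(torch.randn(80), pos_index=5)
    print(loss)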


[255] 2106.09711

Visual Correspondence Hallucination: Towards Geometric Reasoning

Given a pair of partially overlapping source and target images and a keypoint in the source image, the keypoint's correspondent in the target image can be either visible, occluded or outside the field of view. Local feature matching methods are only able to identify the correspondent's location when it is visible, while humans can also hallucinate its location when it is occluded or outside the field of view through geometric reasoning. In this paper, we bridge this gap by training a network to output a peaked probability distribution over the correspondent's location, regardless of this correspondent being visible, occluded, or outside the field of view. We experimentally demonstrate that this network is indeed able to hallucinate correspondences on unseen pairs of images. We also apply this network to a camera pose estimation problem and find it is significantly more robust than state-of-the-art local feature matching-based competitors.


[256] 2106.09712

IFCNet: A Benchmark Dataset for IFC Entity Classification

Enhancing interoperability and information exchange between domain-specific software products for BIM is an important aspect in the Architecture, Engineering, Construction and Operations industry. Recent research started investigating methods from the areas of machine and deep learning for semantic enrichment of BIM models. However, training and evaluation of these machine learning algorithms requires sufficiently large and comprehensive datasets. This work presents IFCNet, a dataset of single-entity IFC files spanning a broad range of IFC classes containing both geometric and semantic information. Using only the geometric information of objects, the experiments show that three different deep learning models are able to achieve good classification performance.


[257] 2106.09714

No-frills Dynamic Planning using Static Planners

In this paper, we address the task of interacting with dynamic environments where the changes in the environment are independent of the agent. We study this through the context of trapping a moving ball with a UR5 robotic arm. Our key contribution is an approach to utilize a static planner for dynamic tasks using a Dynamic Planning add-on; that is, if we can successfully solve a task with a static target, then our approach can solve the same task when the target is moving. Our approach has three key components: an off-the-shelf static planner, a trajectory forecasting network, and a network to predict the robot's estimated time of arrival at any location. We demonstrate the generalization of our approach across environments. More information and videos at https://mlevy2525.github.io/DynamicAddOn.


[258] 2106.09028

Exponential Error Convergence in Data Classification with Optimized Random Features: Acceleration by Quantum Machine Learning

Random features are a central technique for scalable learning algorithms based on kernel methods. A recent work has shown that an algorithm for machine learning by quantum computer, quantum machine learning (QML), can exponentially speed up sampling of optimized random features, even without imposing restrictive assumptions on sparsity and low-rankness of matrices that had limited the applicability of conventional QML algorithms; this QML algorithm makes it possible to significantly reduce and provably minimize the required number of features for regression tasks. However, a major interest in the field of QML is how widely the advantages of quantum computation can be exploited, not only in regression tasks. We here construct a QML algorithm for a classification task accelerated by the optimized random features. We prove that the QML algorithm for sampling optimized random features, combined with stochastic gradient descent (SGD), can achieve a state-of-the-art exponential convergence speed in reducing the classification error under a low-noise condition; at the same time, our algorithm with optimized random features can take advantage of the significant reduction in the required number of features to accelerate each SGD iteration and the evaluation of the resulting classifier. These results reveal a promising application of QML: significant acceleration of the leading classification algorithm based on kernel methods, without ruining its applicability to a practical class of data sets or its exponential error-convergence speed.


[259] 2106.09068

Extracting real social interactions from social media: a debate of COVID-19 policies in Mexico

A study of the dynamical formation of networks of friends and enemies in social media, in this case Twitter, is presented. We characterise single-node properties of such networks, such as the clustering coefficient and the degree, to investigate the structure of links. The results indicate that the network is made of three kinds of nodes: one with a high clustering coefficient but very small degree, a second group with zero clustering coefficient and variable degree, and finally a third group in which the clustering coefficient as a function of the degree decays as a power law. This third group represents $\sim2\%$ of the nodes and is characteristic of dynamical networks with feedback. This part of the lattice seemingly represents strongly interacting friends in a real social network.


[260] 2106.09082

Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization

Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random reshuffling-based gradient free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algorithms for convex minimization problems. We further specialize the algorithm to solve distributionally robust, decision-dependent learning problems, where gradient information is not readily available. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature.


[261] 2106.09093

A Hands-on Comparison of DNNs for Dialog Separation Using Transfer Learning from Music Source Separation

This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the considered broadcast content, and vocals separation in music is considered as a parallel for dialog separation in the target application domain. These similarities are assumed to enable transfer learning between the tasks. Three models pre-trained on music (Open-Unmix, Spleeter, and Conv-TasNet) are considered in the experiments, and fine-tuned with real broadcast data. The performance of the models is evaluated before and after fine-tuning with computational evaluation metrics (SI-SIRi, SI-SDRi, 2f-model), as well as with a listening test simulating an application where the non-speech signal is partially attenuated, e.g., for better speech intelligibility. The evaluations include two reference systems specifically developed for dialog separation. The results indicate that pre-trained music source separation models can be used for dialog separation to some degree, and that they benefit from the fine-tuning, reaching a performance close to task-specific solutions.


[262] 2106.09125

Convex Optimization for Trajectory Generation

Reliable and efficient trajectory generation methods are a fundamental need for autonomous dynamical systems of tomorrow. The goal of this article is to provide a comprehensive tutorial of three major convex optimization-based trajectory generation methods: lossless convexification (LCvx), and two sequential convex programming algorithms known as SCvx and GuSTO. In this article, trajectory generation is the computation of a dynamically feasible state and control signal that satisfies a set of constraints while optimizing key mission objectives. The trajectory generation problem is almost always nonconvex, which typically means that it is not readily amenable to efficient and reliable solution onboard an autonomous vehicle. The three algorithms that we discuss use problem reformulation and a systematic algorithmic strategy to nonetheless solve nonconvex trajectory generation tasks through the use of a convex optimizer. The theoretical guarantees and computational speed offered by convex optimization have made the algorithms popular in both research and industry circles. To date, the list of applications includes rocket landing, spacecraft hypersonic reentry, spacecraft rendezvous and docking, aerial motion planning for fixed-wing and quadrotor vehicles, robot motion planning, and more. Among these applications are high-profile rocket flights conducted by organizations like NASA, Masten Space Systems, SpaceX, and Blue Origin. This article aims to give the reader the tools and understanding necessary to work with each algorithm, and to know what each method can and cannot do. A publicly available source code repository supports the provided numerical examples. By the end of the article, the reader should be ready to use the methods, to extend them, and to contribute to their many exciting modern applications.


[263] 2106.09132

Multivariate Pair Trading by Volatility & Model Adaption Trade-off

Pair trading is one of the most discussed topics in financial research. Despite a growing body of work, portfolio management for multivariate time series is rarely discussed, and most research focuses on refining strategy rules instead of finding the optimal portfolio weights. In this paper, we propose a simple yet profitable strategy called Volatility & Model Adaption Trade-off (VMAT) to address these issues. Experimental studies show its superior profit performance over baselines.


[264] 2106.09149

Fréchet derivatives of expected functionals of solutions to stochastic differential equations

In the analysis of stochastic dynamical systems described by stochastic differential equations (SDEs), it is often of interest to analyse the sensitivity of the expected value of a functional of the solution of the SDE with respect to perturbations in the SDE parameters. In this paper, we consider path functionals that depend on the solution of the SDE up to a stopping time. We derive formulas for Fr\'{e}chet derivatives of the expected values of these functionals with respect to bounded perturbations of the drift, using the Cameron-Martin-Girsanov theorem for the change of measure. Using these derivatives, we construct an example to show that the map that sends the change of drift to the corresponding relative entropy is not in general convex. We then analyse the existence and uniqueness of solutions to stochastic optimal control problems defined on possibly random time intervals, as well as gradient-based numerical methods for solving such problems.


[265] 2106.09188

Physics-informed CoKriging model of a redox flow battery

Redox flow batteries (RFBs) offer the capability to store large amounts of energy cheaply and efficiently; however, there is a need for fast and accurate models of the charge-discharge curve of an RFB to potentially improve battery capacity and performance. We develop a multifidelity model for predicting the charge-discharge curve of an RFB. In the multifidelity model, we use the Physics-informed CoKriging (CoPhIK) machine learning method, which is trained on experimental data and constrained by the so-called "zero-dimensional" physics-based model. We demonstrate that the model shows good agreement with experimental results and significant improvements over existing zero-dimensional models. We show that the proposed model is robust, as it is not sensitive to the input parameters of the zero-dimensional model. We also show that only a small amount of high-fidelity experimental data is needed for accurate predictions over the range of considered input parameters, which include current density, flow rate, and initial concentrations.


[266] 2106.09215

Optimum-statistical collaboration towards efficient black-box optimization

With increasingly more hyperparameters involved in their training, machine learning systems demand a better understanding of hyperparameter tuning automation. This has raised interest in studies of provable black-box optimization, which is made more practical by better exploration mechanisms implemented in algorithm design that manage the flux of both optimization and statistical errors. Prior efforts focus on delineating optimization errors, but this is deficient: black-box optimization algorithms can be inefficient without considering heterogeneity among reward samples. In this paper, we make the key delineation of the role of statistical uncertainty in black-box optimization, guiding a more efficient algorithm design. We introduce \textit{optimum-statistical collaboration}, a framework for managing the interaction between the optimization error flux and the statistical error flux evolving in the optimization process. Inspired by this framework, we propose the \texttt{VHCT} algorithms for objective functions with only local-smoothness assumptions. In theory, we prove that our algorithm enjoys rate-optimal regret bounds; in experiments, we show that the algorithm outperforms prior efforts in extensive settings.


[267] 2106.09216

Layer Pruning on Demand with Intermediate CTC

Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is a challenging task, since the device computational power and energy consumption requirements change dynamically in practice. To overcome this issue, we present a training and pruning method for ASR based on the connectionist temporal classification (CTC) which allows reduction of model depth at run-time without any extra fine-tuning. To achieve this goal, we adopt two regularization methods, intermediate CTC and stochastic depth, to train a model whose performance does not degrade much after pruning. We present an in-depth analysis of layer behaviors using singular vector canonical correlation analysis (SVCCA), and efficient strategies for finding layers which are safe to prune. Using the proposed method, we show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU, while each pruned sub-model maintains the accuracy of an individually trained model of the same depth.


[268] 2106.09222

Localized Uncertainty Attacks

The susceptibility of deep learning models to adversarial perturbations has stirred renewed attention to adversarial examples, resulting in a number of attacks. However, most of these attacks fail to encompass a large spectrum of adversarial perturbations that are imperceptible to humans. In this paper, we present localized uncertainty attacks, a novel class of threat models against deterministic and stochastic classifiers. Under this threat model, we create adversarial examples by perturbing only regions of the inputs where a classifier is uncertain. To find such regions, we utilize the predictive uncertainty of the classifier when the classifier is stochastic, or we learn a surrogate model to amortize the uncertainty when it is deterministic. Unlike $\ell_p$ ball or functional attacks which perturb inputs indiscriminately, our targeted changes can be less perceptible. When considered under our threat model, these attacks still produce strong adversarial examples, while the examples retain a greater degree of similarity with the inputs.


[269] 2106.09276

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum l1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.


[270] 2106.09286

Sub-linear convergence of a tamed stochastic gradient descent method in Hilbert space

In this paper, we introduce the tamed stochastic gradient descent method (TSGD) for optimization problems. Inspired by the tamed Euler scheme, which is a commonly used method within the context of stochastic differential equations, TSGD is an explicit scheme that exhibits stability properties similar to those of implicit schemes. As its computational cost is essentially equivalent to that of the well-known stochastic gradient descent method (SGD), it constitutes a very competitive alternative to such methods. We rigorously prove (optimal) sub-linear convergence of the scheme for strongly convex objective functions on an abstract Hilbert space. The analysis only requires very mild step size restrictions, which illustrates the good stability properties. The analysis is based on a priori estimates more frequently encountered in a time integration context than in optimization, and this alternative approach provides a different perspective also on the convergence of SGD. Finally, we demonstrate the usability of the scheme on a problem arising in a context of supervised learning.
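
The abstract does not spell out the update, but by analogy with the tamed Euler scheme it is natural to read TSGD as a step of the following shape, in which the stochastic gradient is damped by its own norm; this is a guess at the general form, not the paper's exact scheme.

    import numpy as np

    def tsgd_step(x, grad, step_size):
        """One tamed stochastic gradient step: an explicit SGD step whose magnitude
        is capped by dividing by (1 + step_size * ||grad||), mimicking the taming
        used for stiff SDEs (illustrative form only)."""
        return x - step_size * grad / (1.0 + step_size * np.linalg.norm(grad))

    # The taming leaves small gradients nearly untouched but bounds the step
    # length by roughly 1 when gradients blow up:
    x = np.array([10.0])
    print(tsgd_step(x, grad=np.array([0.01]), step_size=0.1))   # ~ plain SGD step
    print(tsgd_step(x, grad=np.array([1e6]),  step_size=0.1))   # step length capped near 1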


[271] 2106.09303

A Multi-task convolutional neural network for blind stereoscopic image quality assessment using naturalness analysis

This paper addresses the problem of blind stereoscopic image quality assessment (NR-SIQA) using a new multi-task deep learning based method. In the field of stereoscopic vision, information is distributed between the left and right views as well as through the binocular phenomenon. In this work, we propose to integrate these characteristics to estimate the quality of stereoscopic images without reference through a convolutional neural network. Our method is based on two main tasks: the first task predicts naturalness-analysis-based features adapted to stereo images, while the second task predicts the quality of such images. The former, the so-called auxiliary task, aims to find more robust and relevant features to improve the quality prediction. To do this, we compute naturalness-based features using a Natural Scene Statistics (NSS) model in the complex wavelet domain. This allows us to capture the statistical dependency between the paired stereoscopic images. Experiments are conducted on the well-known LIVE PHASE I and LIVE PHASE II databases. The results show the relevance of our method when compared with the state of the art. Our code is available online at \url{https://github.com/Bourbia-Salima/multitask-cnn-nrsiqa_2021}.
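
A minimal PyTorch sketch of the two-task design described above: a shared trunk over the stereo pair, an auxiliary head regressing NSS-based naturalness features, and a main head predicting the quality score. Layer sizes, the feature dimension, and the loss weighting are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskSIQA(nn.Module):
    def __init__(self, n_nss_features=16):
        super().__init__()
        # Shared trunk applied to a stereo pair stacked along the channel axis
        self.trunk = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.nss_head = nn.Linear(64, n_nss_features)  # auxiliary task
        self.quality_head = nn.Linear(64, 1)           # main task

    def forward(self, left, right):
        h = self.trunk(torch.cat([left, right], dim=1))
        return self.quality_head(h), self.nss_head(h)

def joint_loss(q_pred, q_true, nss_pred, nss_true, aux_weight=0.5):
    # Weighted sum of the two regression losses; aux_weight is a placeholder.
    return nn.functional.mse_loss(q_pred.squeeze(-1), q_true) + \
           aux_weight * nn.functional.mse_loss(nss_pred, nss_true)
```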


[272] 2106.09311

Controllable Confidence-Based Image Denoising

Image denoising is a classic restoration problem. Yet, current deep learning methods are subject to problems of generalization and interpretability. To mitigate these problems, in this project, we present a framework that is capable of controllable, confidence-based noise removal. The framework is based on the fusion of two different denoised images, both derived from the same noisy input. One of the two is denoised using generic algorithms (e.g. Gaussian filtering), which make few assumptions about the input images and therefore generalize across scenarios. The other is denoised using deep learning, which performs well on seen datasets. We introduce a set of techniques to fuse the two components smoothly in the frequency domain. Beyond that, we estimate the confidence of the deep learning denoiser to allow users to interpret the output, and provide a fusion strategy that safeguards them against out-of-distribution inputs. Through experiments, we demonstrate the effectiveness of the proposed framework in different use cases.
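
A minimal NumPy sketch of the fusion step described above, blending the two denoised results in the frequency domain with a confidence weight; the simple global weighting shown here stands in for the paper's full set of fusion techniques.

```python
import numpy as np

def fuse_in_frequency(deep_denoised, generic_denoised, confidence):
    """Blend two denoised images of the same shape in the Fourier domain.

    `confidence` is a scalar in [0, 1] (or a broadcastable per-frequency map)
    expressing how much the deep-learning result is trusted; low confidence
    (e.g. an out-of-distribution input) falls back to the generic result.
    """
    f_deep = np.fft.fft2(deep_denoised)
    f_generic = np.fft.fft2(generic_denoised)
    fused = confidence * f_deep + (1.0 - confidence) * f_generic
    return np.real(np.fft.ifft2(fused))

# Usage: fused = fuse_in_frequency(dl_output, gaussian_output, confidence=0.3)
```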


[273] 2106.09363

Similarity of particle systems using an invariant root mean square deviation measure

Determining whether two particle systems are similar is a common problem in particle simulations. When the comparison should be invariant under permutations, orthogonal transformations, and translations of the systems, special techniques are needed. We present an algorithm that can test particle systems of finite size for similarity and, if they are similar, can find the optimal alignment between them. Our approach is based on an invariant version of the root mean square deviation (RMSD) measure and is capable of finding the globally optimal solution in $O(n^3)$ operations where $n$ is the number of three-dimensional particles.
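
The paper's globally optimal O(n^3) algorithm is not reproduced here; the sketch below only illustrates the two ingredients an invariant RMSD must combine, namely an optimal assignment of particles for a fixed orientation (Hungarian algorithm) and an optimal rotation for a fixed assignment (Kabsch via SVD), composed in the naive order as a heuristic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kabsch_rmsd(A, B):
    """RMSD between n x 3 point sets after optimal translation and rotation
    (the matching between rows of A and B is assumed fixed)."""
    A0, B0 = A - A.mean(axis=0), B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(B0.T @ A0)
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt        # proper rotation (no reflection)
    return np.sqrt(np.mean(np.sum((B0 @ R - A0) ** 2, axis=1)))

def heuristic_invariant_rmsd(A, B):
    """Match particles by a nearest-cost assignment, then align and score.
    This is a heuristic illustration, not the paper's globally optimal method."""
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)
    return kabsch_rmsd(A, B[cols])
```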


[274] 2106.09374

Quantum algorithm for Dyck Language with Multiple Types of Brackets

We consider the recognition problem for the Dyck language generalized to multiple types of brackets. We provide an algorithm with quantum query complexity $O(\sqrt{n}(\log n)^{0.5k})$, where $n$ is the length of the input and $k$ is the maximal nesting depth of brackets. Additionally, we show a lower bound of $\Omega(\sqrt{n}c^{k})$ for this problem, for some constant $c$. Interestingly, classical algorithms solving the Dyck language for multiple types of brackets differ substantially from the algorithm solving the original Dyck language. At the same time, quantum algorithms for solving both kinds of the Dyck language are similar in nature and requirements.
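
For context, the classical membership test for the Dyck language with multiple bracket types is the familiar linear-time stack scan shown below (the paper's contribution concerns the quantum query complexity of this problem, not this classical routine).

```python
def is_dyck(word, pairs=(("(", ")"), ("[", "]"), ("{", "}"))):
    """Classical O(n) check that `word` is a balanced string over several
    bracket types, i.e. belongs to the generalized Dyck language."""
    open_to_close = dict(pairs)
    closers = set(open_to_close.values())
    stack = []
    for ch in word:
        if ch in open_to_close:
            stack.append(open_to_close[ch])      # remember the required closer
        elif ch in closers:
            if not stack or stack.pop() != ch:   # wrong type, or nothing to close
                return False
    return not stack                             # everything opened was closed

# is_dyck("([]{})") -> True,  is_dyck("([)]") -> False
```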


[275] 2106.09376

Differentially Private Hamiltonian Monte Carlo

Markov chain Monte Carlo (MCMC) algorithms have long been the main workhorses of Bayesian inference. Among them, Hamiltonian Monte Carlo (HMC) has recently become very popular due to its efficiency resulting from effective use of the gradients of the target distribution. In privacy-preserving machine learning, differential privacy (DP) has become the gold standard in ensuring that the privacy of data subjects is not violated. Existing DP MCMC algorithms either use random-walk proposals, or do not use the Metropolis--Hastings (MH) acceptance test to ensure convergence without decreasing their step size to zero. We present a DP variant of HMC using the MH acceptance test that builds on a recently proposed DP MCMC algorithm called the penalty algorithm, and adds noise to the gradient evaluations of HMC. We prove that the resulting algorithm converges to the correct distribution, and is ergodic. We compare DP-HMC with the existing penalty, DP-SGLD and DP-SGNHT algorithms, and find that DP-HMC performs at least as well as the penalty algorithm, and more consistently than DP-SGLD or DP-SGNHT.


[276] 2106.09391

Convergence of Dynamic Programming on the Semidefinite Cone

The goal of this paper is to present a new and simple convergence analysis of dynamic programming for the linear quadratic regulator problem of discrete-time linear time-invariant systems. In particular, bounds on errors are given in terms of both matrix inequalities and matrix norms. Under a mild assumption on the initial parameter, we prove that the Q-value iteration converges exponentially to the optimal solution. A global asymptotic convergence result is also presented. These results are then extended to policy iteration. We prove that, in contrast to the Q-value iteration, policy iteration always converges exponentially fast. An example is given to illustrate the results.
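
The paper's specific error bounds are not reproduced here; the sketch below is simply the textbook Riccati-type value iteration for the discrete-time LQR problem that the convergence analysis concerns, with a placeholder double-integrator system.

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, iters=200):
    """Dynamic programming for discrete-time LQR: iterate the Riccati map
    P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, starting from P = Q."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain at this iterate
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return P, K  # cost matrix and gain of the control law u = -K x

# Placeholder system: a double integrator with unit weights
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
P, K = lqr_value_iteration(A, B, Q=np.eye(2), R=np.eye(1))
```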


[277] 2106.09415

Trainable Discrete Feature Embeddings for Variational Quantum Classifier

Quantum classifiers provide sophisticated embeddings of input data in Hilbert space, promising quantum advantage. The advantage stems from quantum feature maps encoding the inputs into quantum states with variational quantum circuits. A recent work shows how to map discrete features with fewer quantum bits using Quantum Random Access Coding (QRAC), an important primitive for encoding binary strings into quantum states. We propose a new method to embed discrete features with trainable quantum circuits by combining QRAC with a recently proposed strategy for training quantum feature maps called quantum metric learning. We show that the proposed trainable embedding not only requires as few qubits as QRAC but also overcomes the limitations of QRAC in classifying inputs whose classes are based on hard Boolean functions. We numerically demonstrate its use in variational quantum classifiers, achieving better performance in classifying real-world datasets and thus showing its potential to leverage near-term quantum computers for quantum machine learning.


[278] 2106.09473

Importance measures derived from random forests: characterisation and extension

Nowadays new technologies, and especially artificial intelligence, are more and more established in our society. Big data analysis and machine learning, two sub-fields of artificial intelligence, are at the core of many recent breakthroughs in many application fields (e.g., medicine, communication, finance, ...), including some that are strongly related to our day-to-day life (e.g., social networks, computers, smartphones, ...). In machine learning, significant improvements are usually achieved at the price of an increasing computational complexity and thanks to bigger datasets. Currently, cutting-edge models built by the most advanced machine learning algorithms have typically become very efficient and profitable but also extremely complex. Their complexity has reached such an extent that these models are commonly seen as black-boxes providing a prediction or a decision which cannot be interpreted or justified. Nevertheless, whether these models are used autonomously or as a simple decision-making support tool, they are already being used in machine learning applications where health and human life are at stake. Therefore, it appears to be an obvious necessity not to blindly believe everything coming out of those models without a detailed understanding of their predictions or decisions. Accordingly, this thesis aims at improving the interpretability of models built by a specific family of machine learning algorithms, the so-called tree-based methods. Several mechanisms have been proposed to interpret these models, and throughout this thesis we aim to improve their understanding, study their properties, and define their limitations.


[279] 2106.09481

Stochastic Bias-Reduced Gradient Methods

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $\delta$, variance $O(\log(1/\delta))$, and an expected sampling cost of $O(\log(1/\delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.
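
A rough sketch of the Blanchet-Glynn-style multilevel construction the abstract invokes: sample a level from a geometric-like distribution, run a coupled pair of SGD trajectories to consecutive accuracy levels, and reweight the telescoping difference. The step size, the level distribution, the inner solver, and the truncation level are placeholders, not the paper's tuned estimator.

```python
import numpy as np

def sgd_prefix(grad_fn, x0, n_steps, lr, rng):
    """Run SGD for n_steps; return the iterates after n_steps // 2 and n_steps."""
    x, x_half = np.array(x0, dtype=float), None
    for t in range(n_steps):
        x = x - lr * grad_fn(x, rng)
        if t + 1 == n_steps // 2:
            x_half = x.copy()
    return x_half, x

def bias_reduced_minimizer(grad_fn, x0, lr=0.05, j_max=10, rng=None):
    """Multilevel Monte-Carlo estimate of the minimizer (illustrative sketch):
    Y_0 + (Y_{J+1} - Y_J) / P(J), with Y_j the iterate after 2^j SGD steps."""
    rng = rng or np.random.default_rng()
    probs = 2.0 ** -np.arange(j_max + 1)
    probs /= probs.sum()
    j = rng.choice(j_max + 1, p=probs)
    y0, _ = sgd_prefix(grad_fn, x0, 2, lr, rng)                    # level-0 estimate (one step)
    coarse, fine = sgd_prefix(grad_fn, x0, 2 ** (j + 1), lr, rng)  # coupled pair at level j
    return y0 + (fine - coarse) / probs[j]
```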


[280] 2106.09488

Scaling Laws for Acoustic Models

There is a recent trend in machine learning to increase model quality by growing models to sizes previously thought to be unreasonable. Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships, or scaling laws, that predict model quality from model size, training set size, and the available compute budget. These scaling laws allow one to choose nearly optimal hyper-parameters given constraints on available training data, model parameter count, or training computation budget. In this paper, we demonstrate that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws. We extend previous work to jointly predict loss due to model size, to training set size, and to the inherent "irreducible loss" of the task. We find that the scaling laws accurately match model performance over two orders of magnitude in both model size and training set size, and make predictions about the limits of model performance.
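
A minimal sketch of how such a law is typically fit in practice: regress held-out loss against model size with a power law plus an irreducible-loss offset. The data points and initial guesses below are placeholders, not the paper's measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, irreducible, a, alpha):
    """loss(N) = L_inf + a * N^(-alpha): a power law plus an irreducible loss."""
    return irreducible + a * np.power(n_params, -alpha)

# Placeholder (model size, validation loss) pairs; real values would come from a sweep.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = np.array([2.10, 1.85, 1.62, 1.48, 1.39])

params, _ = curve_fit(scaling_law, sizes, losses, p0=[1.0, 10.0, 0.2], maxfev=10000)
irreducible, a, alpha = params
print(f"irreducible loss ~ {irreducible:.2f}, exponent ~ {alpha:.2f}")
```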


[281] 2106.09512

Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison

Postprocessing ensemble weather predictions to correct systematic errors has become a standard practice in research and operations. However, only a few recent studies have focused on ensemble postprocessing of wind gust forecasts, despite its importance for severe weather warnings. Here, we provide a comprehensive review and systematic comparison of eight statistical and machine learning methods for probabilistic wind gust forecasting via ensemble postprocessing, which can be divided into three groups: state-of-the-art postprocessing techniques from statistics (ensemble model output statistics (EMOS), member-by-member postprocessing, isotonic distributional regression), established machine learning methods (gradient-boosting extended EMOS, quantile regression forests), and neural network-based approaches (distributional regression network, Bernstein quantile network, histogram estimation network). The methods are systematically compared using six years of data from a high-resolution, convection-permitting ensemble prediction system that was run operationally at the German weather service, and hourly observations at 175 surface weather stations in Germany. While all postprocessing methods yield calibrated forecasts and are able to correct the systematic errors of the raw ensemble predictions, incorporating information from additional meteorological predictor variables beyond wind gusts leads to significant improvements in forecast skill. In particular, we propose a flexible framework of locally adaptive neural networks with different probabilistic forecast types as output, which not only significantly outperform all benchmark postprocessing methods but also learn physically consistent relations associated with the diurnal cycle, especially the evening transition of the planetary boundary layer.


[282] 2106.09532

ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

Automatic Speech Recognition (ASR) robustness toward slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech. In this paper, we investigate various techniques to improve contextualization, content word robustness and domain adaptation of a Transformer-XL neural language model (NLM) to rescore ASR N-best hypotheses. To improve contextualization, we utilize turn-level dialogue acts along with cross-utterance context carry-over. Additionally, to adapt our domain-general NLM towards e-commerce on the fly, we use embeddings derived from a masked LM finetuned on in-domain data. Finally, to improve robustness towards in-domain content words, we propose a multi-task model that can jointly perform content word detection and language modeling tasks. Compared to a non-contextual LSTM LM baseline, our best performing NLM rescorer results in a content WER reduction of 19.2% on an e-commerce audio test set and a slot labeling F1 improvement of 6.4%.


[283] 2106.09539

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques to deploy an SER system in a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for a binary classification for valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.


[284] 2106.09545

STAN: A stuttering therapy analysis helper

Stuttering is a complex speech disorder identified by repetitions, prolongations of sounds, syllables or words, and blocks while speaking. Specific stuttering behaviour differs strongly, thus needing personalized therapy. Therapy sessions require a high level of concentration by the therapist. We introduce STAN, a system to aid speech therapists in stuttering therapy sessions. Such an automated feedback system can lower the cognitive load on the therapist and thereby enable a more consistent therapy, as well as allowing analysis of stuttering over the span of multiple therapy sessions.


[285] 2106.09556

A Deep Reinforcement Learning Approach towards Pendulum Swing-up Problem based on TF-Agents

Adapting the idea of training CartPole with a Deep Q-learning agent, we obtain a promising result that prevents the pole from falling down. The capacity of reinforcement learning (RL) to learn from the interaction between the environment and the agent provides an optimal control strategy. In this paper, we aim to solve the classic pendulum swing-up problem, i.e., bringing the learned pendulum into an upright, balanced position. The Deep Deterministic Policy Gradient algorithm is introduced to operate over the continuous action domain of this problem. Salient results for the optimal pendulum are demonstrated by an increasing average return, a decreasing loss, and a live video in the code part.


[286] 2106.09574

Localization based on enhanced low frequency interaural level difference

The processing of low-frequency interaural time differences is found to be problematic among hearing-impaired people. The current generation of beamformers does not consider this deficiency. In an attempt to tackle this issue, we propose to replace the inaudible interaural time differences in the low-frequency region with interaural level differences. In addition, a beamformer is introduced and analyzed which enhances the low-frequency interaural level differences of the sound sources using a near-field transformation. The proposed beamforming problem is relaxed to a convex problem using semi-definite relaxation. The instrumental analysis suggests that the low-frequency interaural level differences are enhanced without hindering the provided intelligibility. A psychoacoustic localization test is conducted via a listening experiment, which suggests that the replacement of time differences with level differences improves the localization performance of normal-hearing listeners for an anechoic scene but not for a reverberant scene.


[287] 2106.09606

Cardinality Minimization, Constraints, and Regularization: A Survey

We survey optimization problems that involve the cardinality of variable vectors in constraints or the objective function. We provide a unified viewpoint on the general problem classes and models, and give concrete examples from diverse application fields such as signal and image processing, portfolio selection, or machine learning. The paper discusses general-purpose modeling techniques and broadly applicable as well as problem-specific exact and heuristic solution approaches. While our perspective is that of mathematical optimization, a main goal of this work is to reach out to and build bridges between the different communities in which cardinality optimization problems are frequently encountered. In particular, we highlight that modern mixed-integer programming, which is often regarded as impractical due to commonly unsatisfactory behavior of black-box solvers applied to generic problem formulations, can in fact produce provably high-quality or even optimal solutions for cardinality optimization problems, even in large-scale real-world settings. Achieving such performance typically draws on the merits of problem-specific knowledge that may stem from different fields of application and, e.g., shed light on structural properties of a model or its solutions, or lead to the development of efficient heuristics; we also provide some illustrative examples.
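
As a concrete instance of the mixed-integer modeling the survey argues for, a textbook big-M reformulation of a cardinality constraint $\|x\|_0 \le k$ (not specific to the survey) reads:

```latex
% Big-M reformulation of a cardinality constraint; assumes a valid bound |x_i| <= M is known
\begin{aligned}
\min_{x,\, z} \quad & f(x) \\
\text{s.t.} \quad & -M z_i \le x_i \le M z_i, && i = 1, \dots, n, \\
                  & \textstyle\sum_{i=1}^{n} z_i \le k, \\
                  & z_i \in \{0, 1\}, && i = 1, \dots, n.
\end{aligned}
```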


[288] 2106.09620

Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA

We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models for a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies. In particular, we establish the major result that identifiability for this framework holds even in the presence of noise of unknown distribution. The SNICA setting therefore subsumes all the existing nonlinear ICA models for time-series and also allows for new much richer identifiable models. Finally, as an example of our framework's flexibility, we introduce the first nonlinear ICA model for time-series that combines the following very useful properties: it accounts for both nonstationarity and autocorrelation in a fully unsupervised setting; performs dimensionality reduction; models hidden states; and enables principled estimation and inference by variational maximum-likelihood.


[289] 2106.09622

Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model

Decoding the speech signal that a person is listening to from the human brain via electroencephalography (EEG) can help us understand how our auditory system works. Linear models have been used to reconstruct the EEG from speech or vice versa. Recently, Artificial Neural Networks (ANNs) such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures have outperformed linear models in modeling the relation between EEG and speech. Before attempting to use these models in real-world applications such as hearing tests or (second) language comprehension assessment, we need to know what level of speech information is being utilized by these models. In this study, we aim to analyze the performance of an LSTM-based model using different levels of speech features. The task of the model is to determine which of two given speech segments is matched with the recorded EEG. We used low- and high-level speech features, including the envelope, mel spectrogram, voice activity, phoneme identity, and word embeddings. Our results suggest that the model exploits information about silences, intensity, and broad phonetic classes from the EEG. Furthermore, the mel spectrogram, which contains all this information, yields the highest accuracy (84%) among all the features.


[290] 2106.09646

Impurity-induced increase in the thermal quantum correlations and teleportation in an Ising-XXZ diamond chain

In this work we analyze the quantum correlations in a spin-1/2 Ising-XXZ diamond chain with one plaquette distorted impurity. We have shown that the introduction of impurity into the chain can significantly increase entanglement as well as quantum correlations compared to the original model without impurity. Due to the great flexibility in the choice of impurity parameters, the model presented is very general, and this fact can be very useful for future experimental measurements. In addition to entanglement and quantum coherence, we studied quantum teleportation through a quantum channel composed of a pair of Heisenberg dimers with a distorted impurity in an Ising-XXZ diamond chain, as well as the fidelity of teleportation. Our analysis shows that the appropriate choice of parameters can greatly increase all the measures analyzed. For comparison purposes, we present all our results together with the results of the measurements made for the original model, without impurity, studied in previous works.


[291] 2106.09660

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

This paper introduces WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis. WaveGrad 2 is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence. The model takes an input phoneme sequence and, through an iterative refinement process, generates an audio waveform. This contrasts with the original WaveGrad vocoder, which conditions on mel-spectrogram features generated by a separate model. The iterative refinement process starts from Gaussian noise and, through a series of refinement steps (e.g., 50 steps), progressively recovers the audio sequence. WaveGrad 2 offers a natural way to trade off inference speed and sample quality by adjusting the number of refinement steps. Experiments show that the model can generate high fidelity audio, approaching the performance of a state-of-the-art neural TTS system. We also report various ablation studies over different model configurations. Audio samples are available at https://wavegrad.github.io/v2.


[292] 2106.09662

Automatic Segmentation of the Prostate on 3D Trans-rectal Ultrasound Images using Statistical Shape Models and Convolutional Neural Networks

In this work we propose to segment the prostate on a challenging dataset of trans-rectal ultrasound (TRUS) images using convolutional neural networks (CNNs) and statistical shape models (SSMs). TRUS is commonly used for a number of image-guided interventions on the prostate. Fast and accurate segmentation of the organ in these images is crucial to planning and fusion with other modalities such as magnetic resonance images (MRIs). However, TRUS has limited soft-tissue contrast and signal-to-noise ratio, which makes the task of segmenting the prostate challenging and subject to inter-observer and intra-observer variability. This is especially problematic at the base and apex, where the gland boundary is hard to define. In this paper, we aim to tackle this problem by taking advantage of shape priors learnt on an MR dataset, which has higher soft-tissue contrast, allowing the prostate to be contoured more accurately. We use this shape prior in combination with a prostate tissue probability map computed by a CNN for segmentation.


[293] 2106.09663

A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization

In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., ICML'21). Then we provide a simple and clean convergence analysis of PAGE for achieving the optimal convergence rates. Moreover, PAGE and its analysis can be easily adopted and generalized to other works. We hope that this note provides useful insights and is helpful for future works.
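
For readers who want the estimator in front of them, the PAGE gradient estimator of Li et al. (ICML'21) takes the form below, with $x^{t+1} = x^t - \eta g^t$ and $I, I'$ denoting minibatches; see the note itself for the precise batch sizes and the choice of $p_t$.

```latex
% PAGE (ProbAbilistic Gradient Estimator); the iterate update is x^{t+1} = x^t - \eta g^t
g^{t+1} =
\begin{cases}
\nabla f_{I}(x^{t+1}) & \text{with probability } p_t, \\
g^{t} + \nabla f_{I'}(x^{t+1}) - \nabla f_{I'}(x^{t}) & \text{with probability } 1 - p_t.
\end{cases}
```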


[294] 2106.09664

Design and Analysis of Robust Deep Learning Models for Stock Price Prediction

Building predictive models for robust and accurate prediction of stock prices and stock price movement is a challenging research problem. The well-known efficient market hypothesis posits the impossibility of accurately predicting future stock prices in an efficient stock market, as stock prices are assumed to be purely stochastic. However, numerous works proposed by researchers have demonstrated that it is possible to predict future stock prices with a high level of precision using sophisticated algorithms, model architectures, and the selection of appropriate variables in the models. This chapter proposes a collection of predictive regression models built on deep learning architectures for robust and precise prediction of the future prices of a stock listed in the diversified sectors of the National Stock Exchange (NSE) of India. The Metastock tool is used to download the historical stock prices over a period of two years (2013-2014) at 5-minute intervals. While the records for the first year are used to train the models, testing is carried out using the remaining records. The design approaches of all the models and their performance results are presented in detail. The models are also compared based on their execution time and accuracy of prediction.


[295] 2106.09702

Spectral goodness-of-fit tests for complete and partial network data

Networks describe the often complex relationships between individual actors. In this work, we address the question of how to determine whether a parametric model, such as a stochastic block model or latent space model, fits a dataset well and will extrapolate to similar data. We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data. We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters in a number of commonly used network models. For example, we show how to select the dimension of the latent space in latent space models. Unlike other network goodness-of-fit methods, our general approach does not require simulating from a candidate parametric model, which can be cumbersome with large graphs, and eliminates the need to choose a particular set of statistics on the graph for comparison. It also allows us to perform goodness-of-fit tests on partial network data, such as Aggregated Relational Data. We show with simulations that our method performs well in many situations of interest. We analyze several empirically relevant networks and show that our method leads to improved community detection algorithms. R code to implement our method is available on Github.