New articles on cs

[1] 2010.15120

Raw Audio for Depression Detection Can Be More Robust Against Gender Imbalance than Mel-Spectrogram Features

Depression is a large-scale mental health problem, and its detection is a challenging task for machine learning researchers. Datasets such as the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) have been created to aid research in this area. However, on top of the challenges inherent in accurately detecting depression, biases in datasets may result in skewed classification performance. In this paper we examine gender bias in the DAIC-WOZ dataset using audio-based deep neural networks. We show that gender biases in DAIC-WOZ can lead to an overreporting of performance, which has been overlooked in the past because the same gender biases are present in the test set. By using raw audio and concepts from Fair Machine Learning, such as data re-distribution, we can mitigate the harmful effects of bias.

[2] 2010.15138

papaya2: 2D Irreducible Minkowski Tensor computation

A common challenge in scientific and technical domains is the quantitative description of geometries and shapes, e.g. in the analysis of microscope imagery or astronomical observation data. Frequently, it is desirable to go beyond scalar shape metrics such as porosity and surface-to-volume ratios because the samples are anisotropic or because direction-dependent quantities such as conductances or elasticity are of interest. Minkowski Tensors are a systematic family of versatile and robust higher-order shape descriptors that allow for shape characterization of arbitrary order and promise a path to systematic structure-function relationships for direction-dependent properties. Papaya2 is software for calculating 2D higher-order shape metrics, with a library interface, support for Irreducible Minkowski Tensors, and interpolated marching squares. Extensions for Matlab, JavaScript and Python are provided as well. While the tensor of inertia is computed by many tools, we are not aware of other open-source software which provides higher-rank shape characterization in 2D.
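As a rough illustration of what an Irreducible Minkowski Tensor measures, the s-fold boundary anisotropy |q_s| of a polygon can be computed from its edge lengths and normal angles via the standard formula |q_s| = |Σ_k L_k e^{i s θ_k}| / Σ_k L_k. This is a textbook sketch, not papaya2's actual API; the function name below is invented for the example.

```python
import numpy as np

def irreducible_minkowski_q(vertices, s):
    """|q_s|: magnitude of the s-th irreducible Minkowski tensor of a
    2D polygon boundary, computed from edge lengths and normal angles.
    vertices: (N, 2) polygon vertices in counter-clockwise order."""
    v = np.asarray(vertices, dtype=float)
    edges = np.roll(v, -1, axis=0) - v            # edge vectors
    lengths = np.linalg.norm(edges, axis=1)       # edge lengths L_k
    theta = np.arctan2(edges[:, 1], edges[:, 0]) - np.pi / 2  # outward normal angles
    q_s = np.sum(lengths * np.exp(1j * s * theta)) / np.sum(lengths)
    return abs(q_s)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(irreducible_minkowski_q(square, 4))  # close to 1: full four-fold symmetry
print(irreducible_minkowski_q(square, 2))  # close to 0: no two-fold anisotropy
```

A square thus registers as maximally four-fold symmetric but isotropic at rank two, which is exactly the kind of direction-dependent information scalar metrics like porosity miss.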

[3] 2010.15149

DeSMOG: Detecting Stance in Media On Global Warming

Citing opinions is a powerful yet understudied strategy in argumentation. For example, an environmental activist might say, "Leading scientists agree that global warming is a serious concern," framing a clause which affirms their own stance ("that global warming is serious") as an opinion endorsed ("[scientists] agree") by a reputable source ("leading"). In contrast, a global warming denier might frame the same clause as the opinion of an untrustworthy source with a predicate connoting doubt: "Mistaken scientists claim [...]." Our work studies opinion-framing in the global warming (GW) debate, an increasingly partisan issue that has received little attention in NLP. We introduce DeSMOG, a dataset of stance-labeled GW sentences, and train a BERT classifier to study novel aspects of argumentation in how different sides of a debate represent their own and each other's opinions. From 56K news articles, we find that similar linguistic devices for self-affirming and opponent-doubting discourse are used across GW-accepting and GW-skeptical media, though GW-skeptical media shows more opponent-doubt. We also find that authors often characterize sources as hypocritical, by ascribing opinions expressing the author's own view to source entities known to publicly endorse the opposing view. We release our stance dataset, model, and lexicons of framing devices for future work on opinion-framing and the automatic detection of GW stance.

[4] 2010.15155

Kernel Aggregated Fast Multipole Method: Efficient summation of Laplace and Stokes kernel functions

Many different simulation methods for Stokes flow problems involve a common computationally intensive task: the summation of a kernel function over $O(N^2)$ pairs of points. One popular technique is the Kernel Independent Fast Multipole Method (KIFMM), which constructs an adaptive spatial octree, places a small number of equivalent multipole and local points around each octree box, and completes the kernel sum with $O(N)$ performance. However, the KIFMM cannot be used directly with nonlinear kernels, can be inefficient for complicated linear kernels, and in general is difficult to implement compared to less-efficient alternatives such as Ewald-type methods. Here we present the Kernel Aggregated Fast Multipole Method (KAFMM), which overcomes these drawbacks by allowing different kernel functions to be used for specific stages of octree traversal. In many cases a simpler linear kernel suffices during the most extensive stage of octree traversal, even for nonlinear kernel summation problems. The KAFMM thereby improves computational efficiency in general and also allows efficient evaluation of some nonlinear kernel functions such as the regularized Stokeslet. We have implemented our method as an open-source software library STKFMM with support for Laplace kernels, the Stokeslet, regularized Stokeslet, Rotne-Prager-Yamakawa (RPY) tensor, and the Stokes double-layer and traction operators. Open and periodic boundary conditions are supported for all kernels, and the no-slip wall boundary condition is supported for the Stokeslet and RPY tensor. The package is designed to be ready-to-use as well as being readily extensible to additional kernels. Massive parallelism is supported with mixed OpenMP and MPI.

[5] 2010.15157

Panoster: End-to-end Panoptic Segmentation of LiDAR Point Clouds

Panoptic segmentation has recently unified semantic and instance segmentation, previously addressed separately, thus taking a step further towards creating more comprehensive and efficient perception systems. In this paper, we present Panoster, a novel proposal-free panoptic segmentation method for point clouds. Unlike previous approaches relying on several steps to group pixels or points into objects, Panoster proposes a simplified framework incorporating a learning-based clustering solution to identify instances. At inference time, this acts as a class-agnostic semantic segmentation, allowing Panoster to be fast, while outperforming prior methods in terms of accuracy. Additionally, we showcase how our approach can be flexibly and effectively applied on diverse existing semantic architectures to deliver panoptic predictions.

[6] 2010.15158

CNN Profiler on Polar Coordinate Images for Tropical Cyclone Structure Analysis

Convolutional neural networks (CNNs) have achieved great success in analyzing tropical cyclones (TCs) with satellite images in several tasks, such as TC intensity estimation. In contrast, TC structure, which is conventionally described by a few parameters estimated subjectively by meteorology specialists, is still hard to profile objectively and routinely. This study applies CNNs to satellite images to create entire TC structure profiles, covering all the structural parameters. By utilizing meteorological domain knowledge to construct TC wind profiles based on historical structure parameters, we provide valuable labels for training in our newly released benchmark dataset. With this dataset, we hope to attract more attention to this crucial issue among data scientists. Meanwhile, a baseline is established with a specialized convolutional model operating in polar coordinates. We discovered that it is more feasible and physically reasonable to extract structural information in polar coordinates, rather than Cartesian coordinates, given a TC's rotational and spiral nature. Experimental results on the released benchmark dataset verified the robustness of the proposed model and demonstrated the potential of applying deep learning techniques to this barely explored yet important topic.

[7] 2010.15162

Sizeless: Predicting the optimal size of serverless functions

Serverless functions are a cloud computing paradigm that reduces operational overheads for developers, because the cloud provider takes care of resource management tasks such as resource provisioning, deployment, and auto-scaling. The only resource management task that developers are still in charge of is resource sizing, that is, selecting the amount of resources allocated to each worker instance. However, due to the challenging nature of resource sizing, developers often neglect it despite its significant cost and performance benefits. Existing approaches aiming to automate serverless function resource sizing require dedicated performance tests, which are time consuming to implement and maintain. In this paper, we introduce Sizeless, an approach to predict the optimal resource size of a serverless function using monitoring data from a single resource size. As our approach requires only production monitoring data, developers no longer need to implement and maintain representative performance tests. Furthermore, it enables cloud providers, which cannot engage in testing the performance of user functions, to implement resource sizing on a platform level and automate the last resource management task associated with serverless functions. In our evaluation, Sizeless was able to predict the execution time of the serverless functions of a realistic serverless application with a median prediction accuracy of 93.1%. Using Sizeless to optimize the memory size of this application results in a speedup of 16.7% while simultaneously decreasing costs by 2.5%.
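Once per-size execution times are predicted, the sizing decision itself reduces to a cost/latency trade-off. The sketch below illustrates that downstream step only; the prices, the `pick_size` helper, and the example predictions are our own illustrative assumptions, not part of Sizeless.

```python
# Illustrative prices loosely modeled on typical per-GB-second serverless billing.
PRICE_PER_GB_SECOND = 0.0000166667  # hypothetical unit price
PRICE_PER_REQUEST = 0.0000002

def request_cost(memory_mb, duration_s):
    """Cost of one invocation at a given memory size and execution time."""
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

def pick_size(predicted, max_latency_s=None):
    """predicted: dict mapping memory size (MB) -> predicted execution time (s).
    Returns the cheapest size that meets the optional latency bound."""
    feasible = {m: t for m, t in predicted.items()
                if max_latency_s is None or t <= max_latency_s}
    return min(feasible, key=lambda m: request_cost(m, feasible[m]))

predicted = {128: 2.4, 256: 1.1, 512: 0.6, 1024: 0.55}
print(pick_size(predicted))                      # cheapest overall
print(pick_size(predicted, max_latency_s=0.6))   # cheapest under a latency bound
```

Because larger sizes run faster but bill more per second, the cheapest configuration is often a mid-range size rather than the smallest one, which is why neglecting sizing leaves both money and performance on the table.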

[8] 2010.15169

Semi-Grant-Free NOMA: Ergodic Rates Analysis with Randomly Deployed Users

Semi-grant-free (Semi-GF) non-orthogonal multiple access (NOMA) enables grant-free (GF) and grant-based (GB) users to share the same resource blocks, thereby balancing the connectivity and stability of communications. This letter analyzes the ergodic rates of Semi-GF NOMA systems. First, it exploits a Semi-GF protocol, denoted as the dynamic protocol, for admitting GF users into the occupied GB channels based on the GB user's instantaneous received power. Under this protocol, closed-form analytical and approximated expressions for the ergodic rates are derived. The numerical results illustrate that the GF user (weak NOMA user) has a performance upper limit, while the ergodic rate of the GB user (strong NOMA user) increases linearly with the transmit signal-to-noise ratio.

[9] 2010.15171

Slicing a single wireless collision channel among throughput- and timeliness-sensitive services

The fifth generation (5G) wireless system takes a platform-driven approach, aiming to support heterogeneous connections with very diverse requirements. The shared wireless resources should be sliced in a way that each user perceives that its requirement has been met. Heterogeneity challenges the traditional notion of resource efficiency, as the resource usage has to cater for, e.g., rate maximization for one user and a timeliness requirement for another. This paper treats a model for the radio access network (RAN) uplink, where a throughput-demanding broadband user shares wireless resources with an intermittently active user that wants to optimize timeliness, expressed in terms of latency-reliability or Age of Information (AoI). We evaluate the trade-offs between throughput and timeliness for Orthogonal Multiple Access (OMA) as well as Non-Orthogonal Multiple Access (NOMA) with successive interference cancellation (SIC). We observe that NOMA with SIC, in a conservative scenario with destructive collisions, is only slightly inferior to OMA, which indicates that it may offer significant benefits in practical deployments where the capture effect is frequently encountered. On the other hand, finding the optimal configuration of NOMA with SIC depends on the activity pattern of the intermittent user, to which OMA is insensitive.

[10] 2010.15174

Improving Perceptual Quality by Phone-Fortified Perceptual Loss for Speech Enhancement

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g. phones and syllables. In this study, we took phonetic characteristics into account in the SE training process. Hence, we designed a phone-fortified perceptual (PFP) loss, and the training of our SE model was guided by this loss. In the PFP loss, phonetic characteristics are extracted by wav2vec, an unsupervised learning model based on the contrastive predictive coding (CPC) criterion. Different from previous deep-feature-based approaches, the proposed approach explicitly uses the phonetic information in the deep feature extraction process to guide the SE model training. To test the proposed approach, we first confirmed that the wav2vec representations carried clear phonetic information using a t-distributed stochastic neighbor embedding (t-SNE) analysis. Next, we observed that the proposed PFP loss was more strongly correlated with the perceptual evaluation metrics than point-wise and signal-level losses, achieving higher scores on standardized quality and intelligibility evaluation metrics for the Voice Bank--DEMAND dataset.

[11] 2010.15187

A Study on Efficiency in Continual Learning Inspired by Human Learning

Humans are efficient continual learning systems; we continually learn new skills from birth with finite cells and resources. Our learning is highly optimized both in terms of capacity and time while not suffering from catastrophic forgetting. In this work we study the efficiency of continual learning systems, taking inspiration from human learning. In particular, inspired by the mechanisms of sleep, we evaluate popular pruning-based continual learning algorithms, using PackNet as a case study. First, we identify that weight freezing, which is used in continual learning without biological justification, can result in over $2\times$ as many weights being used for a given level of performance. Second, we note the similarity of human day-time and night-time behaviors to the training and pruning phases of PackNet, respectively. We study a setting where the pruning phase is given a time budget, and identify connections between iterative pruning and multiple sleep cycles in humans. We show that there exists an optimal choice of iterations vs. epochs for different tasks.
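The mask-based iterative magnitude pruning being studied can be sketched as follows. This is our own simplification loosely modeled on PackNet, not the paper's code; in the real algorithm the surviving weights are retrained between rounds, and the number of rounds plays the role of the sleep-cycle budget.

```python
import numpy as np

def iterative_prune(weights, free_mask, keep_frac, n_iters):
    """PackNet-style iterative magnitude pruning (sketch).
    weights: flat weight array; free_mask: True where weights are not yet
    frozen by earlier tasks; keep_frac: overall fraction of free weights
    kept for the current task after all pruning rounds."""
    mask = free_mask.copy()
    # spread the pruning over n_iters rounds (cf. multiple sleep cycles)
    per_iter_keep = keep_frac ** (1.0 / n_iters)
    for _ in range(n_iters):
        free = np.flatnonzero(mask)
        n_keep = int(round(len(free) * per_iter_keep))
        order = np.argsort(np.abs(weights[free]))[::-1]  # largest magnitude first
        new_mask = np.zeros_like(mask)
        new_mask[free[order[:n_keep]]] = True
        mask = new_mask
        # (in PackNet, the kept weights would be retrained here)
    return mask  # weights kept, and later frozen, for this task

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
mask = iterative_prune(w, np.ones(1000, dtype=bool), keep_frac=0.5, n_iters=2)
```

Giving the pruning phase a time budget then amounts to trading off `n_iters` (more rounds, gentler cuts, more retraining) against epochs spent per round.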

[12] 2010.15193

Explicit stabilized multirate method for stiff stochastic differential equations

Stabilized explicit methods are particularly efficient for large systems of stiff stochastic differential equations (SDEs) due to their extended stability domain. However, they lose their efficiency when severe stiffness is induced by very few "fast" degrees of freedom, as the stiff and nonstiff terms are evaluated concurrently. Therefore, inspired by [A. Abdulle, M. J. Grote, and G. Rosilho de Souza, Preprint (2020), arXiv:2006.00744], we introduce a stochastic modified equation whose stiffness depends solely on the "slow" terms. By integrating this modified equation with a stabilized explicit scheme we devise a multirate method which overcomes the bottleneck caused by a few severely stiff terms and recovers the efficiency of stabilized schemes for large systems of nonlinear SDEs. The scheme is not based on any scale separation assumption of the SDE and is therefore employable for problems stemming from the spatial discretization of stochastic parabolic partial differential equations on locally refined grids. The multirate scheme has strong order 1/2 and weak order 1, and its stability is proved on a model problem. Numerical experiments confirm the efficiency and accuracy of the scheme.

[13] 2010.15195

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments

First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor virtual home-environment pose significant sample-efficiency challenges for reinforcement learning (RL) agents learning from sparse task rewards. To alleviate these challenges, prior work has provided extensive supervision via a combination of reward-shaping, ground-truth object-information, and expert demonstrations. In this work, we show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task during task learning with an object-centric relational RL agent. Our key insight is that learning an object-model that incorporates object-attention into forward prediction provides a dense learning signal for unsupervised representation learning of both objects and their relationships. This, in turn, enables faster policy learning for an object-centric relational RL agent. We demonstrate our agent by introducing a set of challenging object-interaction tasks in the AI2Thor environment where learning with our attentive object-model is key to strong performance. Specifically, we compare our agent and relational RL agents with alternative auxiliary tasks to a relational RL agent equipped with ground-truth object-information, and show that learning with our object-model best closes the performance gap in terms of both learning speed and maximum success rate. Additionally, we find that incorporating object-attention into an object-model's forward predictions is key to learning representations which capture object-category and object-state.

[14] 2010.15196

A fast and scalable computational framework for large-scale and high-dimensional Bayesian optimal experimental design

We develop a fast and scalable computational framework to solve large-scale and high-dimensional Bayesian optimal experimental design problems. In particular, we consider the problem of optimal observation sensor placement for Bayesian inference of high-dimensional parameters governed by partial differential equations (PDEs), which is formulated as an optimization problem that seeks to maximize an expected information gain (EIG). Such optimization problems are particularly challenging due to the curse of dimensionality for high-dimensional parameters and the expensive solution of large-scale PDEs. To address these challenges, we exploit two essential properties of such problems: the low-rank structure of the Jacobian of the parameter-to-observable map, used to extract the intrinsically low-dimensional data-informed subspace, and the high correlation of the approximate EIGs obtained by a series of approximations, used to reduce the number of PDE solves. We propose an efficient offline-online decomposition of the optimization problem: an offline stage that computes all the quantities requiring a limited number of PDE solves, independent of the parameter and data dimensions, and an online stage that optimizes the sensor placement without requiring any PDE solve. For the online optimization, we propose a swapping greedy algorithm that first constructs an initial set of sensors using leverage scores and then swaps the chosen sensors with other candidates until certain convergence criteria are met. We demonstrate the efficiency and scalability of the proposed computational framework on a linear inverse problem of inferring the initial condition of an advection-diffusion equation, and a nonlinear inverse problem of inferring the diffusion coefficient of a log-normal diffusion equation, with both the parameter and data dimensions ranging from a few tens to a few thousand.
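The flavor of the swapping greedy step can be sketched as follows, using the log-determinant of the Gram matrix of selected Jacobian rows as a stand-in objective (a standard linear-Gaussian proxy for information gain). This is our simplification for illustration, not the authors' implementation or their EIG approximation.

```python
import numpy as np

def leverage_scores(J):
    """Leverage score of each candidate sensor (row of the Jacobian J)."""
    U, _, _ = np.linalg.svd(J, full_matrices=False)
    return np.sum(U**2, axis=1)

def select_sensors(J, k):
    """Swapping greedy sketch: initialize with the k highest-leverage rows,
    then swap chosen rows for unchosen candidates while the log-determinant
    of J_S^T J_S keeps improving."""
    def logdet(S):
        sign, val = np.linalg.slogdet(J[S].T @ J[S])
        return val if sign > 0 else -np.inf

    chosen = list(np.argsort(leverage_scores(J))[::-1][:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for c in range(J.shape[0]):
                if c in chosen:
                    continue
                trial = chosen[:i] + [c] + chosen[i + 1:]
                if logdet(trial) > logdet(chosen) + 1e-12:
                    chosen, improved = trial, True
    return sorted(int(s) for s in chosen)

# Two strongly informative orthogonal candidates and two weak ones:
J = np.array([[10.0, 0.0], [0.0, 10.0], [0.1, 0.1], [0.2, 0.0]])
print(select_sensors(J, 2))
```

The key property exploited is that the swap loop never calls back into a PDE solver: all candidate evaluations reduce to small linear-algebra operations on the precomputed (low-rank) Jacobian, which is what makes the online stage PDE-solve-free.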

[15] 2010.15201

Forecasting Hamiltonian dynamics without canonical coordinates

Conventional neural networks are universal function approximators, but because they are unaware of underlying symmetries or physical laws, they may need impractically many training data to approximate nonlinear dynamics. Recently introduced Hamiltonian neural networks can efficiently learn and forecast dynamical systems that conserve energy, but they require special inputs called canonical coordinates, which may be hard to infer from data. Here we significantly expand the scope of such networks by demonstrating a simple way to train them with any set of generalised coordinates, including easily observable ones.
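The structure such networks exploit is Hamilton's equations themselves. As a minimal sketch (with a known closed-form H standing in for the learned network, and central finite differences standing in for its automatic gradients), a symplectic integration of the learned H yields forecasts whose energy drift stays bounded:

```python
def hamiltonian_flow(H, q, p, dt, steps, eps=1e-6):
    """Integrate Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq with
    symplectic Euler, using finite differences for the gradients. In a
    Hamiltonian neural network, H would be the trained network."""
    traj = [(q, p)]
    for _ in range(steps):
        dHdq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
        p = p - dt * dHdq
        dHdp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
        q = q + dt * dHdp
        traj.append((q, p))
    return traj

H = lambda q, p: 0.5 * (q**2 + p**2)   # harmonic oscillator
traj = hamiltonian_flow(H, 1.0, 0.0, dt=0.01, steps=1000)
energies = [H(q, p) for q, p in traj]
# the energy along the trajectory stays close to its initial value
```

The paper's point is that the coordinates (q, p) fed to the learned H need not be canonical: any easily observed generalized coordinates can be used for training.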

[16] 2010.15203

Micromobility in Smart Cities: A Closer Look at Shared Dockless E-Scooters via Big Social Data

Micromobility is shaping first- and last-mile travel in urban areas. Recently, shared dockless electric scooters (e-scooters) have emerged as a daily alternative to driving for short-distance commuters in large cities due to their affordability, easy accessibility via an app, and zero emissions. Meanwhile, e-scooters come with challenges in city management, such as traffic rules, public safety, parking regulations, and liability issues. In this paper, we collected and investigated 5.8 million scooter-tagged tweets and 144,197 images, generated by 2.7 million users from October 2018 to March 2020, to take a closer look at shared e-scooters via crowdsourced data analytics. We profiled e-scooter usage from spatiotemporal perspectives, explored different business roles (i.e., riders, gig workers, and ridesharing companies), examined operation patterns (e.g., injury types and parking behaviors), and conducted sentiment analysis. To the best of our knowledge, this paper is the first large-scale systematic study of shared e-scooters using big social data.

[17] 2010.15206

Rosella: A Self-Driving Distributed Scheduler for Heterogeneous Clusters

Large-scale interactive web services and advanced AI applications make sophisticated decisions in real-time, based on executing a massive amount of computation tasks on thousands of servers. Task schedulers, which often operate in heterogeneous and volatile environments, require high throughput, i.e., scheduling millions of tasks per second, and low latency, i.e., incurring minimal scheduling delays for millisecond-level tasks. Scheduling is further complicated by other users' workloads in a shared system, other background activities, and the diverse hardware configurations inside datacenters. We present Rosella, a new self-driving, distributed approach for task scheduling in heterogeneous clusters. Our system automatically learns the compute environment and adjusts its scheduling policy in real-time. The solution provides high throughput and low latency simultaneously, because it runs in parallel on multiple machines with minimal coordination and performs only simple operations for each scheduling decision. Our learning module monitors the total system load and uses this information to dynamically determine the optimal estimation strategy for the backends' compute power. Our scheduling policy generalizes power-of-two-choices algorithms to handle heterogeneous workers, reducing the max queue length of $O(\log n)$ obtained by prior algorithms to $O(\log \log n)$. We implement a Rosella prototype and evaluate it with a variety of workloads. Experimental results show that Rosella significantly reduces task response times, and adapts to environment changes quickly.
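The power-of-two-choices rule at the heart of such policies can be sketched in a few lines. Normalizing queue length by estimated compute power is our guess at the natural generalization to heterogeneous workers; Rosella's actual load estimator is learned and more sophisticated.

```python
import random

def schedule(queues, speeds, rng=random):
    """Power-of-two-choices for heterogeneous workers (sketch): probe two
    random workers and enqueue on the one with the smaller expected wait,
    i.e. queue length divided by estimated compute power."""
    a, b = rng.sample(range(len(queues)), 2)
    return a if queues[a] / speeds[a] <= queues[b] / speeds[b] else b

# A fast worker with the same backlog drains it sooner, so it wins the probe:
print(schedule([4, 4], [1.0, 2.0]))
```

Because each decision touches only two queues, many scheduler instances can run this rule in parallel with minimal coordination, which is what enables the throughput/latency combination the abstract describes.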

[18] 2010.15210

On Linearizability and the Termination of Randomized Algorithms

We study the question of whether the "termination with probability 1" property of a randomized algorithm is preserved when one replaces the atomic registers that the algorithm uses with linearizable (implementations of) registers. We show that in general this is not so: roughly speaking, every randomized algorithm A has a corresponding algorithm A' that solves the same problem if the registers that it uses are atomic or strongly-linearizable, but does not terminate if these registers are replaced with "merely" linearizable ones. Together with a previous result shown in [15], this implies that one cannot use the well-known ABD implementation of registers in message-passing systems to automatically transform any randomized algorithm that works in shared-memory systems into a randomized algorithm that works in message-passing systems: with a strong adversary the resulting algorithm may not terminate.

[19] 2010.15211

Safety-Aware Cascade Controller Tuning Using Constrained Bayesian Optimization

This paper presents an automated, model-free, data-driven method for the safe tuning of PID cascade controller gains based on Bayesian optimization. The optimization objective is composed of data-driven performance metrics and modeled using Gaussian processes. We further introduce a data-driven constraint that captures the stability requirements from system data. Numerical evaluation shows that the proposed approach outperforms relay feedback autotuning and quickly converges to the global optimum, thanks to a tailored stopping criterion. We demonstrate the performance of the method in simulations and experiments on a linear axis drive of a grinding machine. For experimental implementation, in addition to the introduced safety constraint, we integrate a method for automatic detection of the critical gains and extend the optimization objective with a penalty depending on the proximity of the current candidate points to the critical gains. The resulting automated tuning method optimizes system performance while ensuring stability and standardization.

[20] 2010.15217

Away from Trolley Problems and Toward Risk Management

As automated vehicles receive more attention from the media, there has been an equivalent increase in the coverage of the ethical choices a vehicle may be forced to make in certain crash situations with no clear safe outcome. Much of this coverage has focused on a philosophical thought experiment known as the "trolley problem," substituting an automated vehicle for the trolley and the car's software for the bystander. While this is a stark and straightforward example of ethical decision making for an automated vehicle, it risks marginalizing the entire field if it becomes the only ethical problem in the public's mind. In this chapter, I discuss the shortcomings of the trolley problem, and introduce more nuanced examples that involve crash risk and uncertainty. Risk management is introduced as an alternative approach, and its ethical dimensions are discussed.

[21] 2010.15218

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate the generated architectures on an FPGA testbed, demonstrating the highest single-device and multi-device performance recorded for stencil programs on FPGAs to date, then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.

[22] 2010.15222

Exploring complex networks with the ICON R package

We introduce ICON, an R package that contains 1075 complex network datasets in a standard edgelist format. All provided datasets have associated citations and have been indexed by the Colorado Index of Complex Networks - also referred to as ICON. In addition to supplying a large and diverse corpus of useful real-world networks, ICON also implements an S3 generic to work with the network and ggnetwork R packages for network analysis and visualization, respectively. Sample code in this report also demonstrates how ICON can be used in conjunction with the igraph package. Currently, the Comprehensive R Archive Network hosts ICON v0.4.0. We hope that ICON will serve as a standard corpus for complex network research and prevent redundant work that would be otherwise necessary by individual research groups. The open source code for ICON and for this reproducible report can be found at

[23] 2010.15225

A Visuospatial Dataset for Naturalistic Verb Learning

We introduce a new dataset for training and evaluating grounded language models. Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access: That is, naturalistic, spontaneous speech paired with richly grounded visuospatial context. We use the collected data to compare several distributional semantics models for verb learning. We evaluate neural models based on 2D (pixel) features as well as feature-engineered models based on 3D (symbolic, spatial) features, and show that neither modeling approach achieves satisfactory performance. Our results are consistent with evidence from child language acquisition that emphasizes the difficulty of learning verbs from naive distributional data. We discuss avenues for future work on cognitively-inspired grounded language learning, and release our corpus with the intent of facilitating research on the topic.

[24] 2010.15229

Speech-Based Emotion Recognition using Neural Networks and Information Visualization

Emotion recognition is commonly employed for health assessment. However, the typical metric for evaluation in therapy is based on patient-doctor appraisal. This process can fall into the issue of subjectivity, while also requiring healthcare professionals to deal with copious amounts of information. Thus, machine learning algorithms can be a useful tool for the classification of emotions. While several models have been developed in this domain, there is a lack of user-friendly representations of emotion classification systems for therapy. We propose a tool which enables users to take speech samples and identify a range of emotions (happy, sad, angry, surprised, neutral, calm, disgust, and fear) from audio elements through a machine learning model. The dashboard is designed based on local therapists' needs for intuitive representations of speech data in order to gain insights and informative analyses of their sessions with their patients.

[25] 2010.15232

Construction Payment Automation Using Blockchain-Enabled Smart Contracts and Reality Capture Technologies

This paper presents a smart contract-based solution for autonomous administration of construction progress payments. It bridges the gap between payments (cash flow) and the progress assessments at job sites (product flow) enabled by reality capture technologies and building information modeling (BIM). The approach eliminates the reliance on the centralized and heavily intermediated mechanisms of existing payment applications. The construction progress is stored in a distributed manner using content addressable file sharing; it is broadcasted to a smart contract which automates the on-chain payment settlements and the transfer of lien rights. The method was successfully used for processing payments to 7 subcontractors in two commercial construction projects where progress monitoring was performed using a camera-equipped unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV) equipped with a laser scanner. The results show promise for the method's potential for increasing the frequency, granularity, and transparency of payments. The paper is concluded with a discussion of implications for project management, introducing a new model of the project as a singleton state machine.

[26] 2010.15234

Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions

Recently, invariant risk minimization (IRM) (Arjovsky et al.) was proposed as a promising solution to address out-of-distribution (OOD) generalization. In Ahuja et al., it was shown that solving for the Nash equilibria of a new class of "ensemble-games" is equivalent to solving IRM. In this work, we extend the framework of Ahuja et al. for linear regressions by projecting the ensemble-game onto an $\ell_{\infty}$ ball. We show that such projections help achieve non-trivial OOD guarantees despite not achieving perfect invariance. For linear models with confounders, we prove that the Nash equilibria of these games are closer to the ideal OOD solutions than standard empirical risk minimization (ERM), and we also provide learning algorithms that provably converge to these Nash equilibria. Empirical comparisons of the proposed approach with the state of the art show consistent gains in achieving OOD solutions in several settings involving anti-causal variables and confounders.
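The $\ell_{\infty}$-ball projection used here is particularly simple because the constraint is coordinate-separable, so the Euclidean projection is just a per-coordinate clip:

```python
import numpy as np

def project_linf(w, radius):
    """Euclidean projection onto the l-infinity ball {w : max_i |w_i| <= radius}.
    The constraint decouples across coordinates, so projecting is clipping."""
    return np.clip(w, -radius, radius)

print(project_linf(np.array([2.0, -0.5, -3.0]), 1.0))
```

This makes the projected gradient-play updates in the ensemble-game essentially free compared to the regression losses themselves.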

[27] 2010.15236

SD-Access: Practical Experiences in Designing and Deploying Software Defined Enterprise Networks

Enterprise networks have become more and more complex over the years, trying to keep up with new requirements that challenge traditional solutions. To mention just one of many possible examples, technologies such as Virtual LANs (VLANs) struggle to address the scalability and operational requirements introduced by Internet of Things (IoT) use cases. To keep up with these challenges we have identified four main requirements that are common across modern enterprise networks: (i) scalable mobility, (ii) endpoint segmentation, (iii) simplified administration, and (iv) resource optimization. To address these challenges we designed SDA (Software Defined Access), a solution for modern enterprise networks that leverages Software-Defined Networking (SDN) and other state-of-the-art techniques. In this paper we present the design, implementation and evaluation of SDA. Specifically, SDA: (i) leverages a combination of an overlay approach with an event-driven protocol (LISP) to dynamically adapt to traffic and mobility patterns while preserving resources, and (ii) enforces dynamic endpoint groups for scalable segmentation with low operational burden. We present our experience with deploying SDA in two real-life scenarios: an enterprise campus, and a large warehouse with mobile robots. Our evaluation shows that SDA, when compared with traditional enterprise networks, can (i) reduce overall data plane forwarding state by up to 70% thanks to a reactive protocol using a centralized routing server, and (ii) reduce handover delays by an order of magnitude in scenarios of massive mobility, with respect to other approaches. Finally, we discuss lessons learned while deploying and operating SDA, and possible optimizations regarding the use of an event-driven protocol and group-based segmentation.

[28] 2010.15237

Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility

The demand for seamless Internet access under extreme user mobility, such as on high-speed trains and vehicles, has become a norm rather than an exception. However, the 4G/5G mobile network is not always reliable to meet this demand, with non-negligible failures during the handover between base stations. A fundamental challenge of reliability is to balance the exploration of more measurements for satisfactory handover, and exploitation for timely handover (before the fast-moving user leaves the serving base station's radio coverage). This paper formulates this trade-off in extreme mobility as a composition of two distinct multi-armed bandit problems. We propose Bandit and Threshold Tuning (BATT) to minimize the regret of handover failures in extreme mobility. BATT uses $\epsilon$-binary-search to optimize the threshold of the serving cell's signal strength to initiate the handover procedure with $\mathcal{O}(\log J \log T)$ regret. It further devises opportunistic Thompson sampling, which optimizes the sequence of the target cells to measure for reliable handover with $\mathcal{O}(\log T)$ regret. Our experiment over a real LTE dataset from Chinese high-speed rails validates significant regret reduction and a 29.1% handover failure reduction.
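The Thompson-sampling half of the trade-off can be illustrated with a generic Bernoulli bandit sketch. The paper's opportunistic variant orders candidate target cells for measurement; here the arms' success probabilities are illustrative stand-ins for per-cell handover success rates:

```python
import random

# Generic Bernoulli Thompson sampling sketch (not the paper's opportunistic
# variant). Each arm keeps a Beta posterior over its success probability; at
# every round we sample from each posterior and pull the arm with the highest
# sample, which balances exploration and exploitation automatically.

def thompson_sampling(success_probs, rounds, seed=0):
    rng = random.Random(seed)
    wins = [1] * len(success_probs)    # Beta(1, 1) uniform priors
    losses = [1] * len(success_probs)
    pulls = [0] * len(success_probs)
    for _ in range(rounds):
        samples = [rng.betavariate(wins[i], losses[i])
                   for i in range(len(success_probs))]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        if rng.random() < success_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls
```

With illustrative success rates `[0.9, 0.5, 0.3]` and a few thousand rounds, the pulls concentrate on the best arm, which is the behavior that yields logarithmic regret.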

[29] 2010.15239

Cloud-Based Dynamic Programming for an Electric City Bus Energy Management Considering Real-Time Passenger Load Prediction

Electric city buses have gained popularity in recent years for their low greenhouse gas emissions, low noise levels, etc. Unlike a passenger car, the weight of a city bus varies significantly with the number of onboard passengers, which is not well studied in the existing literature. This study proposes a passenger load prediction model using day-of-week, time-of-day, weather, temperature, wind level, and holiday information as inputs. An average model, Regression Tree, Gradient Boost Decision Tree, and Neural Network models are compared for passenger load prediction. The Gradient Boost Decision Tree model is selected for its best accuracy and high stability. Given the predicted passenger load, a dynamic programming algorithm determines the optimal power demand for the supercapacitor and battery by optimizing battery aging and energy usage in the cloud. Rule extraction is then conducted on the dynamic programming results, and the extracted rules are loaded in real time onto the onboard controllers of the vehicles. The proposed cloud-based dynamic programming and rule extraction framework with passenger load prediction reduces bus operating costs by 4% and 11% in off-peak and peak hours, respectively. The operating cost under the proposed framework is less than 1% higher than that of dynamic programming with true passenger load information.
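The battery/supercapacitor power-split optimization can be sketched as a tiny backward dynamic program. The state grid, cost weights, and the lack of supercapacitor recharging below are illustrative simplifications, not the paper's actual cost model:

```python
# Minimal backward DP sketch for splitting a power demand between a battery
# and a supercapacitor over discrete time steps. battery_cost > sc_cost stands
# in for the paper's battery-aging penalty; all numbers are illustrative.

def optimal_split(demands, sc_levels=(0, 1, 2), battery_cost=3.0, sc_cost=1.0):
    # State: remaining supercapacitor charge level (no recharging modeled).
    # Action: power drawn from the supercapacitor; the battery covers the rest.
    INF = float("inf")
    cost_to_go = {lvl: 0.0 for lvl in sc_levels}  # terminal cost is zero
    for demand in reversed(demands):
        new_cost = {}
        for lvl in sc_levels:
            best = INF
            for sc_power in range(0, min(lvl, demand) + 1):
                batt_power = demand - sc_power
                step = battery_cost * batt_power + sc_cost * sc_power
                best = min(best, step + cost_to_go[lvl - sc_power])
            new_cost[lvl] = best
        cost_to_go = new_cost
    # cost_to_go[lvl] = minimal total cost starting with SC charge level lvl
    return cost_to_go
```

For two unit demands, starting with a full supercapacitor (level 2) serves both steps cheaply, while an empty one forces the battery to cover everything.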

[30] 2010.15240

Test Set Optimization by Machine Learning Algorithms

Diagnosis results are highly dependent on the volume of the test set. To derive the most efficient test set, we propose several machine learning based methods to predict the minimum amount of test data that produces relatively accurate diagnosis. By collecting outputs from failing circuits, a feature matrix and label vector are generated, which encode information for inferring the test termination point. We then develop a prediction model to fit the data and determine when to terminate testing. The considered methods include LASSO and Support Vector Machine (SVM), where the relationship between the goal (label) and the predictors (feature matrix) is considered to be linear in LASSO and nonlinear in SVM. Numerical results show that SVM reaches a diagnosis accuracy of 90.4% while reducing the volume of the test set by 35.24%.
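The linear branch of this idea can be sketched with plain least squares (a stand-in for LASSO without the sparsity penalty): fit a line mapping a scalar circuit feature to a test-termination point. The feature and label values in the usage below are illustrative:

```python
# Ordinary least-squares line fit as a minimal stand-in for the paper's linear
# (LASSO) model: predict a test-termination point from one scalar feature.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept
```

LASSO adds an $\ell_1$ penalty on the slope coefficients, which drives uninformative features to exactly zero; the SVM variant replaces the linear map with a kernelized one.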

[31] 2010.15250

Semantic video segmentation for autonomous driving

We aim to solve semantic video segmentation in autonomous driving, namely road detection in real-time video, using techniques discussed in (Shelhamer et al., 2016a). While the fully convolutional network gives good results, we show that its runtime can be halved while preserving the accuracy. The test dataset used is KITTI, which consists of real footage from Germany's streets.

[32] 2010.15251

Fusion Models for Improved Visual Captioning

Visual captioning aims to generate textual descriptions given images. Traditionally, captioning models are trained on human-annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This limitation hinders the generalization capabilities of these models and often causes them to make mistakes. Language models can, however, be trained on vast amounts of freely available unlabelled data and have recently emerged as successful language encoders and coherent text generators. Meanwhile, several unimodal and multimodal fusion techniques have been proven to work well for natural language generation and automatic speech recognition. Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work in this paper is two-fold: First, we propose a generic multimodal model fusion framework for caption generation as well as emendation, where we utilize different fusion strategies to integrate a pretrained Auxiliary Language Model (AuxLM) within the traditional encoder-decoder visual captioning frameworks. Next, we employ the same fusion strategies to integrate a pretrained Masked Language Model (MLM), namely BERT, with a visual captioning model, viz. Show, Attend, and Tell, for emending both syntactic and semantic errors in captions. Our caption emendation experiments on three benchmark image captioning datasets, viz. Flickr8k, Flickr30k, and MSCOCO, show improvements over the baseline, indicating the usefulness of our proposed multimodal fusion strategies. Further, we perform a preliminary qualitative analysis of the emended captions and identify error categories based on the type of corrections.

[33] 2010.15255

Model Minimization For Online Predictability

For humans in a teaming scenario, context switching between reasoning about a teammate's behavior and thinking about their own task can slow us down, especially if the cognitive cost of predicting the teammate's actions is high. So if we can make the prediction of a robot-teammate's actions quicker, then the human can be more productive. In this paper we present an approach to constrain the actions of a robot so as to increase predictability (specifically online predictability) while keeping the plan costs of the robot within acceptable limits. Existing works on human-robot interaction do not consider the computational cost for predictability, which we consider in our approach. We approach this problem from the perspective of directed graph minimization, and we connect the concept of predictability to the out-degree of vertices. We present an algorithm to minimize graphs for predictability, and contrast this with minimization for legibility (goal inference) and optimality.

[34] 2010.15258

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

Human subjective evaluation is the gold standard to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community. One of the biggest use cases of these perceptual objective metrics is to evaluate noise suppression algorithms. This paper introduces a multi-stage self-teaching based perceptual objective metric that is designed to evaluate noise suppressors. The proposed method generalizes well in challenging test conditions with a high correlation to human ratings.

[35] 2010.15260

Object sieving and morphological closing to reduce false detections in wide-area aerial imagery

For object detection in wide-area aerial imagery, post-processing is usually needed to reduce false detections. We propose a two-stage post-processing scheme which comprises an area-thresholding sieving process and a morphological closing operation. We use two wide-area aerial videos to compare the performance of five object detection algorithms in the absence and in the presence of our post-processing scheme. The automatic detection results are compared with the ground-truth objects. Several metrics are used for performance comparison.
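The two-stage post-processing can be sketched on a small binary detection mask: an area-thresholding sieve removes connected components smaller than a minimum area, then a 3x3 morphological closing (dilation followed by erosion) fills small gaps. The grid values and threshold below are illustrative, and the structuring element is an assumption:

```python
# Stage 1: area-thresholding sieve. Collect each 4-connected component of the
# binary mask and keep it only if its pixel count reaches min_area.

def sieve(mask, min_area):
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                stack, comp = [(i, j)], []
                seen[i][j] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) >= min_area:
                    for y, x in comp:
                        out[y][x] = 1
    return out

# Stage 2: morphological closing with a 3x3 structuring element
# (clipped at the image border): dilation followed by erosion.

def _neighborhood(m, i, j):
    h, w = len(m), len(m[0])
    return [m[y][x] for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]

def dilate(mask):
    h, w = len(mask), len(mask[0])
    return [[int(any(_neighborhood(mask, i, j))) for j in range(w)] for i in range(h)]

def erode(mask):
    h, w = len(mask), len(mask[0])
    return [[int(all(_neighborhood(mask, i, j))) for j in range(w)] for i in range(h)]

def closing(mask):
    return erode(dilate(mask))
```

The sieve suppresses isolated false detections, while closing merges fragmented detections of a single object, for example bridging a one-pixel gap in a row.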

[36] 2010.15261

Deep Shells: Unsupervised Shape Correspondence with Optimal Transport

We propose a novel unsupervised learning approach to 3D shape correspondence that builds a multiscale matching pipeline into a deep neural network. This approach is based on smooth shells, the current state-of-the-art axiomatic correspondence method, which requires an a priori stochastic search over the space of initial poses. Our goal is to replace this costly preprocessing step by directly learning good initializations from the input surfaces. To that end, we systematically derive a fully differentiable, hierarchical matching pipeline from entropy regularized optimal transport. This allows us to combine it with a local feature extractor based on smooth, truncated spectral convolution filters. Finally, we show that the proposed unsupervised method significantly improves over the state-of-the-art on multiple datasets, even in comparison to the most recent supervised methods. Moreover, we demonstrate compelling generalization results by applying our learned filters to examples that significantly deviate from the training set.

[37] 2010.15266

CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models

Copy mechanisms are employed in sequence to sequence models (seq2seq) to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.

[38] 2010.15268

Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning

Despite empirical success, the theory of reinforcement learning (RL) with value function approximation remains fundamentally incomplete. Prior work has identified a variety of pathological behaviours that arise in RL algorithms that combine approximate on-policy evaluation and greedification. One prominent example is policy oscillation, wherein an algorithm may cycle indefinitely between policies, rather than converging to a fixed point. What is not well understood however is the quality of the policies in the region of oscillation. In this paper we present simple examples illustrating that in addition to policy oscillation and multiple fixed points -- the same basic issue can lead to convergence to the worst possible policy for a given approximation. Such behaviours can arise when algorithms optimize evaluation accuracy weighted by the distribution of states that occur under the current policy, but greedify based on the value of states which are rare or nonexistent under this distribution. This means the values used for greedification are unreliable and can steer the policy in undesirable directions. Our observation that this can lead to the worst possible policy shows that in a general sense such algorithms are unreliable. The existence of such examples helps to narrow the kind of theoretical guarantees that are possible and the kind of algorithmic ideas that are likely to be helpful. We demonstrate analytically and experimentally that such pathological behaviours can impact a wide range of RL and dynamic programming algorithms; such behaviours can arise both with and without bootstrapping, and with linear function approximation as well as with more complex parameterized functions like neural networks.

[39] 2010.15274

Representation learning for improved interpretability and classification accuracy of clinical factors from EEG

Despite extensive standardization, diagnostic interviews for mental health disorders encompass substantial subjective judgment. Previous studies have demonstrated that EEG-based neural measures can function as reliable objective correlates of depression, or even predictors of depression and its course. However, their clinical utility has not been fully realized because of 1) the lack of automated ways to deal with the inherent noise associated with EEG data at scale, and 2) the lack of knowledge of which aspects of the EEG signal may be markers of a clinical disorder. Here we adapt an unsupervised pipeline from the recent deep representation learning literature to address these problems by 1) learning a disentangled representation using $\beta$-VAE to denoise the signal, and 2) extracting interpretable features associated with a sparse set of clinical labels using a Symbol-Concept Association Network (SCAN). We demonstrate that our method is able to outperform the canonical hand-engineered baseline classification method on a number of factors, including participant age and depression diagnosis. Furthermore, our method recovers a representation that can be used to automatically extract denoised Event Related Potentials (ERPs) from novel, single EEG trajectories, and supports fast supervised re-mapping to various clinical labels, allowing clinicians to re-use a single EEG representation regardless of updates to the standardized diagnostic system. Finally, single factors of the learned disentangled representations often correspond to meaningful markers of clinical factors, as automatically detected by SCAN, allowing for human interpretability and post-hoc expert analysis of the recommendations made by the model.

[40] 2010.15275

A direct method for solving inverse Sturm-Liouville problems

We consider two main inverse Sturm-Liouville problems: the problem of recovery of the potential and the boundary conditions from two spectra or from a spectral density function. A simple method for practical solution of such problems is developed, based on the transmutation operator approach, new Neumann series of Bessel functions representations for solutions and the Gelfand-Levitan equation. The method allows one to reduce the inverse Sturm-Liouville problem directly to a system of linear algebraic equations, such that the potential is recovered from the first element of the solution vector. We prove the stability of the method and show its numerical efficiency with several numerical examples.

[41] 2010.15277

Class-incremental learning: survey and performance evaluation

For future learning systems incremental learning is desirable, because it allows for: efficient resource usage by eliminating the need to retrain from scratch at the arrival of new data; reduced memory usage by preventing or limiting the amount of data required to be stored -- also important when privacy limitations are imposed; and learning that more closely resembles human learning. The main challenge for incremental learning is catastrophic forgetting, which refers to the precipitous drop in performance on previously learned tasks after learning a new one. Incremental learning of deep neural networks has seen explosive growth in recent years. Initial work focused on task incremental learning, where a task-ID is provided at inference time. Recently we have seen a shift towards class-incremental learning where the learner must classify at inference time between all classes seen in previous tasks without recourse to a task-ID. In this paper, we provide a complete survey of existing methods for incremental learning, and in particular we perform an extensive experimental evaluation on twelve class-incremental methods. We consider several new experimental scenarios, including a comparison of class-incremental methods on multiple large-scale datasets, investigation into small and large domain shifts, and comparison on various network architectures.

[42] 2010.15280

Specification description and verification of multitask hybrid systems in the OTS/CafeOBJ method

To develop IoT and/or CPS systems, we need to consider both continuous data from the physical world and discrete data in computer systems. Such a system is called a hybrid system. Because of the density of continuous data, it is not easy to ensure the reliability of hybrid systems through software testing. Moreover, the size of the state space increases exponentially for multitask systems. Formal descriptions of hybrid systems may help us verify desired properties of a given system formally with computer support. In this paper, we propose a way to describe a formal specification of a given multitask hybrid system as an observational transition system in the CafeOBJ algebraic specification language and verify it by the proof score method, based on equational reasoning, implemented in the CafeOBJ interpreter.

[43] 2010.15283

GENs: Generative Encoding Networks

Mapping data from and/or onto a known family of distributions has become an important topic in machine learning and data analysis. Deep generative models (e.g., generative adversarial networks) have been used effectively to match known and unknown distributions. Nonetheless, when the form of the target distribution is known, analytical methods are advantageous in providing robust results with provable properties. In this paper, we propose and analyze the use of nonparametric density methods to estimate the Jensen-Shannon divergence for matching unknown data distributions to known target distributions, such as Gaussians or mixtures of Gaussians, in latent spaces. This analytical method has several advantages: better behavior when the training sample quantity is low, provable convergence properties, and relatively few parameters, which can be derived analytically. Using the proposed method, we enforce the latent representation of an autoencoder to match a target distribution in a learning framework that we call a {\em generative encoding network}. Here, we present the numerical methods; derive the expected distribution of the data in the latent space; evaluate the properties of the latent space, sample reconstruction, and generated samples; show the advantages over the adversarial counterpart; and demonstrate the application of the method on real-world data.
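The Jensen-Shannon divergence at the heart of this matching objective can be computed exactly for discrete (histogram) distributions, which serves as a minimal stand-in for the paper's nonparametric density estimates. The probability vectors below are illustrative:

```python
import math

# Jensen-Shannon divergence between two discrete distributions over shared
# bins: JSD(p, q) = (KL(p || m) + KL(q || m)) / 2 with m the midpoint mixture.
# It is symmetric, bounded by log 2 (in nats), and zero iff p == q.

def js_divergence(p, q):
    def kl(a, b):
        # KL divergence; terms with zero mass contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give a divergence of 0, while fully disjoint ones attain the maximum of log 2; in the paper's setting, `p` would come from a kernel density estimate of the latent codes and `q` from the known target density.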

[44] 2010.15288

Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks

Semantically-aligned $(speech, image)$ datasets can be used to explore "visually-grounded speech". In a majority of existing investigations, features of an image signal are extracted using neural networks "pre-trained" on other tasks (e.g., classification on ImageNet). In still others, pre-trained networks are used to extract audio features prior to semantic embedding. Without "transfer learning" through pre-trained initialization or pre-trained feature extraction, previous results have tended to show low rates of recall in $speech \rightarrow image$ and $image \rightarrow speech$ queries. By choosing appropriate neural architectures for the encoders in the speech and image branches and using large datasets, one can obtain competitive recall rates without any reliance on pre-trained initialization or feature extraction: $(speech,image)$ semantic alignment and $speech \rightarrow image$ and $image \rightarrow speech$ retrieval are canonical tasks worthy of independent investigation in their own right, and allow one to explore other questions---e.g., the size of the audio embedder can be reduced significantly with little loss of recall rates in $speech \rightarrow image$ and $image \rightarrow speech$ queries.

[45] 2010.15296

Fact or Factitious? Contextualized Opinion Spam Detection

In this paper we perform an analytic comparison of a number of techniques used to detect fake and deceptive online reviews. We apply a number of machine learning approaches found to be effective, and introduce our own approach by fine-tuning state-of-the-art contextualised embeddings. The results we obtain show the potential of contextualised embeddings for fake review detection, and lay the groundwork for future research in this area.

[46] 2010.15297

Analysis of Chorin-Type Projection Methods for the Stochastic Stokes Equations with General Multiplicative Noises

This paper is concerned with numerical analysis of two fully discrete Chorin-type projection methods for the stochastic Stokes equations with general non-solenoidal multiplicative noise. The first scheme is the standard Chorin scheme and the second one is a modified Chorin scheme which is designed by employing the Helmholtz decomposition on the noise function at each time step to produce a projected divergence-free noise and a "pseudo pressure" after combining the original pressure and the curl-free part of the decomposition. Optimal order rates of the convergence are proved for both velocity and pressure approximations of these two (semi-discrete) Chorin schemes. It is crucial to measure the errors in appropriate norms. The fully discrete finite element methods are formulated by discretizing both semi-discrete Chorin schemes in space by the standard finite element method. Suboptimal order error estimates are derived for both fully discrete methods. It is proved that all spatial error constants contain a growth factor $k^{-1/2}$, where $k$ denotes the time step size, which explains the deteriorating performance of the standard Chorin scheme when $k\to 0$ and the space mesh size is fixed as observed earlier in the numerical tests of [9]. Numerical results are also provided to gauge the performance of the proposed numerical methods and to validate the sharpness of the theoretical error estimates.
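The per-step Helmholtz decomposition of the noise can be sketched as follows, with illustrative notation (not necessarily the paper's symbols) on a domain $D$:

```latex
% Decompose the noise coefficient g into a divergence-free part \eta
% and a curl-free part \nabla \xi (Helmholtz/Leray decomposition):
\mathbf{g} = \boldsymbol{\eta} + \nabla \xi,
\qquad \nabla \cdot \boldsymbol{\eta} = 0 \ \text{in } D,
\qquad \boldsymbol{\eta} \cdot \mathbf{n} = 0 \ \text{on } \partial D.
```

The divergence-free part $\boldsymbol{\eta}$ then drives the velocity update of the modified scheme, while the curl-free potential $\xi$ is absorbed into the pressure variable, yielding the "pseudo pressure" mentioned above.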

[47] 2010.15300

Uncovering Latent Biases in Text: Method and Application to Peer Review

Quantifying systematic disparities in numerical quantities such as employment rates and wages between population subgroups provides compelling evidence for the existence of societal biases. However, biases in the text written for members of different subgroups (such as in recommendation letters for male and non-male candidates), though widely reported anecdotally, remain challenging to quantify. In this work, we introduce a novel framework to quantify bias in text caused by the visibility of subgroup membership indicators. We develop a nonparametric estimation and inference procedure to estimate this bias. We then formalize an identification strategy to causally link the estimated bias to the visibility of subgroup membership indicators, provided observations from time periods both before and after an identity-hiding policy change. We identify an application wherein "ground truth" bias can be inferred to evaluate our framework, instead of relying on synthetic or secondary data. Specifically, we apply our framework to quantify biases in the text of peer reviews from a reputed machine learning conference before and after the conference adopted a double-blind reviewing policy. We show evidence of biases in the review ratings that serves as "ground truth", and show that our proposed framework accurately detects these biases from the review text without having access to the review ratings.

[48] 2010.15302

Point Cloud Attribute Compression via Successive Subspace Graph Transform

Inspired by the recently proposed successive subspace learning (SSL) principles, we develop a successive subspace graph transform (SSGT) to address point cloud attribute compression in this work. The octree geometry structure is utilized to partition the point cloud, where every node of the octree represents a point cloud subspace with a certain spatial size. We design a weighted graph with self-loops to describe the subspace and define a graph Fourier transform based on the normalized graph Laplacian. The transforms are applied to large point clouds from the leaf nodes to the root node of the octree recursively, while the represented subspace is expanded from the smallest one to the whole point cloud successively. Experimental results show that the proposed SSGT method offers better rate-distortion (R-D) performance than the previous Region Adaptive Haar Transform (RAHT) method.
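The graph Fourier transform underlying SSGT can be sketched on the smallest nontrivial graph: two nodes joined by a unit-weight edge. Its normalized Laplacian is [[1, -1], [-1, 1]], with eigenvectors $(1,1)/\sqrt{2}$ (frequency 0) and $(1,-1)/\sqrt{2}$ (frequency 2). The attribute values below are illustrative stand-ins for per-point colors in an octree node:

```python
import math

# Graph Fourier transform on a two-node, one-edge graph: project the signal
# onto the normalized-Laplacian eigenvectors (1,1)/sqrt(2) and (1,-1)/sqrt(2).

def gft_2node(signal):
    s = 1 / math.sqrt(2)
    low = s * (signal[0] + signal[1])   # smooth (frequency-0) component
    high = s * (signal[0] - signal[1])  # oscillating (frequency-2) component
    return low, high

def igft_2node(coeffs):
    # Inverse transform: the eigenvector basis is orthonormal, so the
    # inverse is the transpose of the same projection.
    s = 1 / math.sqrt(2)
    low, high = coeffs
    return [s * (low + high), s * (low - high)]
```

A constant attribute over the two points has zero high-frequency energy, which is what makes such transforms compress smooth point cloud attributes well; the full SSGT applies weighted versions of this idea recursively from octree leaves to the root.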

[49] 2010.15303

Automatic joint damage quantification using computer vision and deep learning

Joint raveled or spalled damage (henceforth called joint damage) can affect the safety and long-term performance of concrete pavements. It is important to assess and quantify the joint damage over time to assist in building action plans for maintenance, predicting maintenance costs, and maximizing the concrete pavement service life. A framework for the accurate, autonomous, and rapid quantification of joint damage with a low-cost camera is proposed using a computer vision technique with a deep learning (DL) algorithm. The DL model is trained on 263 images of sawcuts with joint damage. The trained DL model is used for pixel-wise color-masking of joint damage in a series of query 2D images, which are used to reconstruct a 3D image using an open-source structure-from-motion algorithm. Another damage quantification algorithm using a color threshold is applied to detect and compute the surface area of the damage in the 3D reconstructed image. The effectiveness of the framework was validated by inspecting joint damage at four transverse contraction joints in Illinois, USA, including three joints deemed acceptable and one deemed unacceptable by visual inspection. The results show the framework achieves 76% recall and 10% error.

[50] 2010.15313

"where is this relationship going?": Understanding Relationship Trajectories in Narrative Text

We examine a new commonsense reasoning task: given a narrative describing a social interaction that centers on two protagonists, systems make inferences about the underlying relationship trajectory. Specifically, we propose two evaluation tasks: Relationship Outlook Prediction MCQ and Resolution Prediction MCQ. In Relationship Outlook Prediction, a system maps an interaction to a relationship outlook that captures how the interaction is expected to change the relationship. In Resolution Prediction, a system attributes a given relationship outlook to a particular resolution that explains the outcome. These two tasks parallel two real-life questions that people frequently ponder upon as they navigate different social situations: "where is this relationship going?" and "how did we end up here?". To facilitate the investigation of human social relationships through these two tasks, we construct a new dataset, Social Narrative Tree, which consists of 1250 stories documenting a variety of daily social interactions. The narratives encode a multitude of social elements that interweave to give rise to rich commonsense knowledge of how relationships evolve with respect to social interactions. We establish baseline performances using language models and the accuracies are significantly lower than human performance. The results demonstrate that models need to look beyond syntactic and semantic signals to comprehend complex human relationships.

[51] 2010.15314

Recurrent neural circuits for contour detection

We introduce a deep recurrent neural network architecture that approximates visual cortical circuits. We show that this architecture, which we refer to as the gamma-net, learns to solve contour detection tasks with better sample efficiency than state-of-the-art feedforward networks, while also exhibiting a classic perceptual illusion, known as the orientation-tilt illusion. Correcting this illusion significantly reduces gamma-net contour detection accuracy by driving it to prefer low-level edges over high-level object boundary contours. Overall, our study suggests that the orientation-tilt illusion is a byproduct of neural circuits that help biological visual systems achieve robust and efficient contour detection, and that incorporating these circuits in artificial neural networks can improve computer vision.

[52] 2010.15315

Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

Accurate scanning transmission electron microscopy (STEM) image simulation methods require large computation times that can make them infeasible for simulating many images. Other simulation methods based on linear imaging models, such as the convolution method, are much faster but are too inaccurate to be used in application. In this paper, we explore deep learning models that attempt to translate a STEM image produced by the convolution method into a prediction of the high-accuracy multislice image. We then compare our results to those of regression methods. We find that the Generative Adversarial Network (GAN) model provides the best results and performs at a similar accuracy level to previous regression models on the same dataset. Code and data for this project can be found in this GitHub repository.

[53] 2010.15316

Multiple Sclerosis Severity Classification From Clinical Text

Multiple Sclerosis (MS) is a chronic, inflammatory and degenerative neurological disease, which is monitored by a specialist using the Expanded Disability Status Scale (EDSS) and recorded in unstructured text in the form of a neurology consult note. An EDSS measurement contains an overall "EDSS" score and several functional subscores. Typically, expert knowledge is required to interpret consult notes and generate these scores. Previous approaches used limited context length Word2Vec embeddings and keyword searches to predict scores given a consult note, but often failed when scores were not explicitly stated. In this work, we present MS-BERT, the first publicly available transformer model trained on real clinical data other than MIMIC. Next, we present MSBC, a classifier that applies MS-BERT to generate embeddings and predict EDSS and functional subscores. Lastly, we explore combining MSBC with other models through the use of Snorkel to generate scores for unlabelled consult notes. MSBC achieves state-of-the-art performance on all metrics and prediction tasks and outperforms the models generated from the Snorkel ensemble. We improve Macro-F1 by 0.12 (to 0.88) for predicting EDSS and on average by 0.29 (to 0.63) for predicting functional subscores over previous Word2Vec CNN and rule-based approaches.

[54] 2010.15317

The IQIYI System for Voice Conversion Challenge 2020

This paper presents the IQIYI voice conversion system (T24) for the Voice Conversion Challenge 2020. In the competition, each target speaker has 70 sentences. We built an end-to-end voice conversion system based on PPG. First, the ASR acoustic model computes the BN feature, which represents the content-related information in the speech. Then the Mel feature is computed through an improved prosody Tacotron model. Finally, the Mel spectrum is converted to a waveform through an improved LPCNet. The evaluation results show that this system achieves good voice conversion quality. Despite using 16 kHz rather than 24 kHz sampling-rate audio, the conversion results are relatively good in naturalness and similarity. Our best results are in the similarity evaluation of Task 2: 2nd in the ASV-based objective evaluation and 5th in the subjective evaluation.

[55] 2010.15320

Gaussian Processes Model-based Control of Underactuated Balance Robots

Control of underactuated balance robots, ranging from cart-pole systems and autonomous bicycles to bipedal robots, aims to achieve both external (actuated) subsystem trajectory tracking and internal (unactuated) subsystem balancing with limited actuation authority. This paper proposes a learning model-based control framework for underactuated balance robots. The key idea for simultaneously achieving the tracking and balancing tasks is to design control strategies on slow and fast time scales, respectively. On the slow time scale, model predictive control (MPC) is used to generate the desired internal subsystem trajectory, which encodes the external subsystem tracking performance and control input. On the fast time scale, the actual internal trajectory is stabilized to the desired internal trajectory using an inverse dynamics controller. The coupling effects between the external and internal subsystems are captured through the planned internal trajectory profile and the dual structural properties of the robotic systems. The control design is based on Gaussian process (GP) regression models that are learned from experiments, with no need for a priori knowledge of the robot dynamics or a successful balance demonstration. The GPs provide estimates of the modeling uncertainties of the robotic systems, and these uncertainty estimates are incorporated into the MPC design to enhance control robustness to modeling errors. The learning-based control design is analyzed with guaranteed stability and performance. The proposed design is demonstrated by experiments on a Furuta pendulum and an autonomous bikebot.
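The GP regression at the core of such a framework can be sketched generically. The following is a textbook squared-exponential GP on toy 1-D data, not the paper's learned dynamics model; the kernel length scale, noise level, and data are assumptions for illustration:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_train, y_train, X_test, noise=1e-6):
    # Standard GP regression: posterior mean and per-point variance.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    Kss = rbf_kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# Toy "dynamics residual" learned from 10 samples.
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X).ravel()
mu, var = gp_posterior(X, y, X)
```

The returned variance is what an MPC layer could consume as an uncertainty estimate, as the abstract describes.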

[56] 2010.15327

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of the effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger-capacity (wider or deeper) models. We demonstrate that this block structure arises when model capacity is large relative to the size of the training set, and is indicative of the underlying layers preserving and propagating the dominant principal component of their representations. This discovery has important ramifications for the features learned by different models: representations outside the block structure are often similar across architectures with varying widths and depths, but the block structure is unique to each model. We analyze the output predictions of different model architectures, finding that even when overall accuracy is similar, wide and deep models exhibit distinctive error patterns and variations across classes.
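Representation comparisons of this kind are commonly made with linear centered kernel alignment (CKA), a similarity index between two layers' activation matrices. A minimal sketch on toy activations (the data is illustrative, not the paper's models):

```python
import numpy as np

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between two representation
    # matrices (examples x features); invariant to rotation and
    # isotropic scaling of either representation.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, 'fro') ** 2
    return num / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))

rng = np.random.default_rng(0)
layer_a = rng.standard_normal((50, 8))        # activations of one layer
same = linear_cka(layer_a, 3.0 * layer_a)     # scale-invariant: close to 1
diff = linear_cka(layer_a, rng.standard_normal((50, 8)))
```

Computing this for every pair of layers in a network yields the kind of similarity heatmap in which the "block structure" described above appears.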

[57] 2010.15329

Scalable Attack-Resistant Obfuscation of Logic Circuits

Hardware IP protection has been one of the most critical areas of research in recent years. Recently, attacks on hardware IPs (such as reverse engineering or cloning) have evolved as attackers have developed sophisticated techniques. Therefore, hardware obfuscation has been introduced as a powerful tool to protect IPs against piracy attacks. However, many recent attempts to break existing obfuscation methods have been successful in unlocking the IP and restoring its functionality. In this paper, we propose SARO, a Scalable Attack-Resistant Obfuscation that provides a robust functional and structural design transformation process. SARO treats the target circuit as a graph, and performs a partitioning algorithm to produce a set of sub-graphs, then applies our novel Truth Table Transformation (T3) process to each partition. We also propose the $T3_{metric}$, which is developed to quantify the structural and functional design transformation level caused by the obfuscation process. We evaluate SARO on ISCAS85 and EPFL benchmarks, and provide full security and performance analysis of our proposed framework.

[58] 2010.15335

Learning Sampling Distributions Using Local 3D Workspace Decompositions for Motion Planning in High Dimensions

Earlier work has shown that reusing experience from prior motion planning problems can improve the efficiency of similar, future motion planning queries. However, for robots with many degrees of freedom, these methods exhibit poor generalization across different environments and often require large datasets that are impractical to gather. We present SPARK and FLAME, two experience-based frameworks for sampling-based planning applicable to complex manipulators in 3D environments. Both combine samplers associated with features from a workspace decomposition into a global biased sampling distribution. SPARK decomposes the environment based on exact geometry, while FLAME is more general and uses an octree-based decomposition obtained from sensor data. We demonstrate the effectiveness of SPARK and FLAME on a Fetch robot tasked with challenging pick-and-place manipulation problems. Our approaches can be trained incrementally and significantly improve performance with only a handful of examples, generalizing better over diverse tasks and environments than prior approaches.

[59] 2010.15336

SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching

This paper presents a study of automatic design of neural network architectures for skeleton-based action recognition. Specifically, we encode a skeleton-based action instance into a tensor and carefully define a set of operations to build two types of network cells: normal cells and reduction cells. The recently developed DARTS (Differentiable Architecture Search) is adopted to search for an effective network architecture that is built upon the two types of cells. All operations are 2D based in order to reduce the overall computation and search space. Experiments on the challenging NTU RGB+D and Kinetics datasets have verified that most of the networks developed to date for skeleton-based action recognition are likely not compact and efficient. The proposed method provides an approach to search for such a compact network that achieves comparable or even better performance than the state-of-the-art methods.

[60] 2010.15338

A New "Model-Free" Method Combined with Neural Network for MIMO Systems

In this brief, a model-free adaptive predictive control (MFAPC) method is proposed. It outperforms current model-free adaptive control (MFAC) by not only solving the time-delay problem in multiple-input multiple-output (MIMO) systems but also relaxing the current rigorous assumptions for the sake of a wider applicable range. The most attractive merit of the proposed controller is that its design, performance analysis, and application are easy for engineers to realize. Furthermore, the problem of how to choose the matrix {\lambda} is addressed by analyzing the closed-loop poles rather than by the previous contraction mapping method. Additionally, in view of the nonlinear modeling capability and adaptability of neural networks (NNs), we combine these two classes of algorithms. The feasibility and several interesting results of the proposed method are shown in simulations.

[61] 2010.15343

Identifying safe intersection design through unsupervised feature extraction from satellite imagery

The World Health Organization has listed the design of safer intersections as a key intervention to reduce global road trauma. This article presents the first study to systematically analyze the design of all intersections in a large country, based on aerial imagery and deep learning. Approximately 900,000 satellite images were downloaded for all intersections in Australia and customized computer vision techniques emphasized the road infrastructure. A deep autoencoder extracted high-level features, including the intersection's type, size, shape, lane markings, and complexity, which were used to cluster similar designs. An Australian telematics data set linked infrastructure design to driving behaviors captured during 66 million kilometers of driving. This showed more frequent hard acceleration events (per vehicle) at four- than three-way intersections, relatively low hard deceleration frequencies at T-intersections, and consistently low average speeds on roundabouts. Overall, domain-specific feature extraction enabled the identification of infrastructure improvements that could result in safer driving behaviors, potentially reducing road trauma.

[62] 2010.15344

Sea-Net: Squeeze-And-Excitation Attention Net For Diabetic Retinopathy Grading

Diabetes is one of the most common diseases in individuals. \textit{Diabetic retinopathy} (DR) is a complication of diabetes that can lead to blindness. Automatic DR grading based on retinal images provides great diagnostic and prognostic value for treatment planning. However, the subtle differences among severity levels make it difficult to capture important features using conventional methods. To alleviate this problem, we propose a new deep learning architecture for robust DR grading, referred to as SEA-Net, in which spatial attention and channel attention are alternately carried out and boost each other, improving classification performance. In addition, a hybrid loss function is proposed to further maximize the inter-class distance and reduce the intra-class variability. Experimental results show the effectiveness of the proposed architecture.
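The channel-attention half of such a design typically follows the squeeze-and-excitation pattern named in the title: squeeze by global average pooling, excite through a small bottleneck, then rescale each channel. A minimal numpy sketch (the weights and shapes are illustrative, not the paper's network):

```python
import numpy as np

def se_block(feature_map, W1, W2):
    # Squeeze-and-Excitation channel attention on an (H, W, C) map.
    z = feature_map.mean(axis=(0, 1))           # squeeze: (C,) channel stats
    s = np.maximum(z @ W1, 0.0)                 # excite: reduction + ReLU
    gate = 1.0 / (1.0 + np.exp(-(s @ W2)))      # restore + sigmoid gate: (C,)
    return feature_map * gate                   # channel-wise rescaling

rng = np.random.default_rng(1)
C, r = 16, 4                                    # channels, reduction ratio
x = rng.standard_normal((8, 8, C))
out = se_block(x, rng.standard_normal((C, C // r)), rng.standard_normal((C // r, C)))
```

Because the gate is a sigmoid, each channel is attenuated rather than amplified, which is what lets the block emphasize informative channels relative to the rest.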

[63] 2010.15346

Developing Augmented Reality based Gaming Model to Teach Ethical Education in Primary Schools

The education sector is adopting new technologies for both teaching and learning pedagogy. Augmented reality (AR) is a technology that can be used in educational pedagogy to enhance engagement with students: students interact with AR-based educational material for richer visualization and explanation. The use of AR in education is therefore becoming more popular. However, most research describes the use of AR technologies in English, maths, science, culture, arts, and history education, while ethical education is conspicuously absent. In this paper, we design the system and develop an AR-based mobile game model for ethical education for pre-primary students. Pre-primary students require more interactive lessons than theoretical concepts, so we use AR technology to develop a game that offers interactive procedures where students can learn with fun and engage with the context. Finally, we develop a prototype that meets our research objective. We conclude with future work.

[64] 2010.15350

A Hybrid Position/Force Controller for Joint Robots

In this paper, we present a hybrid position/force controller for operating joint robots. The hybrid controller has two goals---motion tracking and force regulation. As long as these two goals are not mutually exclusive, they can be decoupled. In this work, we make use of the smooth and invertible mapping from joint space to task space to decouple the two control goals and design the controllers separately. A traditional task-space motion controller is used for motion control, while the force controller regulates the force indirectly by manipulating the desired trajectory. Two case studies---contour tracking/surface polishing and grabbing boxes with two robotic arms---are presented to show the efficacy of the hybrid controller, and simulations with physics engines are carried out to validate the proposed method.

[65] 2010.15353

Domain decomposition and partitioning methods for mixed finite element discretizations of the Biot system of poroelasticity

We develop non-overlapping domain decomposition methods for the Biot system of poroelasticity in a mixed form. The solid deformation is modeled with a mixed three-field formulation with weak stress symmetry. The fluid flow is modeled with a mixed Darcy formulation. We introduce displacement and pressure Lagrange multipliers on the subdomain interfaces to impose weakly continuity of normal stress and normal velocity, respectively. The global problem is reduced to an interface problem for the Lagrange multipliers, which is solved by a Krylov space iterative method. We study both monolithic and split methods. In the monolithic method, a coupled displacement-pressure interface problem is solved, with each iteration requiring the solution of local Biot problems. We show that the resulting interface operator is positive definite and analyze the convergence of the iteration. We further study drained split and fixed stress Biot splittings, in which case we solve separate interface problems requiring elasticity and Darcy solves. We analyze the stability of the split formulations. Numerical experiments are presented to illustrate the convergence of the domain decomposition methods and compare their accuracy and efficiency.

[66] 2010.15354

Reconfigurable Intelligent Surface Aided Secure Transmission: Outage-Constrained Energy-Efficiency Maximization

Reconfigurable intelligent surface (RIS) has the potential to significantly enhance the network secure transmission performance by reconfiguring the wireless propagation environment. However, due to the passive nature of eavesdroppers and the cascaded channel brought by the RIS, the eavesdroppers' channel state information is imperfectly obtained at the base station. Under this channel uncertainty, the optimal phase-shift, power allocation, and transmission rate design for secure transmission is currently unknown due to the difficulty of handling the probabilistic constraint with coupled variables. To fill this gap, this paper formulates an energy-efficient secure transmission design problem that incorporates the probabilistic constraint. By transforming the probabilistic constraint and decoupling the variables, the secure energy efficiency maximization problem can be solved by alternately executing difference-of-convex programming and the semidefinite relaxation technique. To scale the solution to scenarios with massive numbers of antennas and reflecting elements, a fast first-order algorithm with low complexity is further proposed. Simulation results show that the proposed first-order algorithm achieves identical performance to the conventional method but saves at least two orders of magnitude in computation time. Moreover, the resulting RIS aided secure transmission significantly improves the energy efficiency compared to baseline schemes of random phase-shift, fixed phase-shift, and RIS ignoring CSI uncertainty.

[67] 2010.15356

Financial ticket intelligent recognition system based on deep learning

Facing the rapid growth in the issuance of financial tickets (bills, invoices, etc.), traditional manual invoice reimbursement and financial accounting systems impose an increasing burden on financial accountants and consume excessive manpower. To solve this problem, we propose an iterative self-learning Framework of Financial Ticket intelligent Recognition System (FFTRS), which supports fast iterative updating and extensibility of the algorithm model, fundamental requirements for a practical financial accounting system. In addition, a simple yet efficient Financial Ticket Faster Detection network (FTFDNet) and an intelligent data warehouse of financial tickets are designed to strengthen its efficiency and performance. At present, the system can recognize 194 kinds of financial tickets and has an automatic iterative optimization mechanism, meaning that as application time increases, the types of tickets supported by the system will continue to grow and the recognition accuracy will continue to improve. Experimental results show that the average recognition accuracy of the system is 97.07%, and the average running time for a single ticket is 175.67 ms. The practical value of the system has been tested in a commercial application, making a beneficial attempt at applying deep learning technology to financial accounting work.

[68] 2010.15360

Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection

Most existing approaches to disfluency detection heavily rely on human-annotated corpora, which are expensive to obtain in practice. There have been several proposals to alleviate this issue with, for instance, self-supervised learning techniques, but they still require human-annotated corpora. In this work, we explore the unsupervised learning paradigm, which can potentially work with unlabeled text corpora that are cheaper and easier to obtain. Our model builds upon the recent work on Noisy Student Training, a semi-supervised learning approach that extends the idea of self-training. Experimental results on the commonly used English Switchboard test set show that our approach achieves competitive performance compared to the previous state-of-the-art supervised systems using contextualized word embeddings (e.g. BERT and ELECTRA).

[69] 2010.15363

Model-Agnostic Counterfactual Reasoning for Eliminating Popularity Bias in Recommender System

The general aim of a recommender system is to provide personalized suggestions to users, as opposed to simply suggesting popular items. However, the normal training paradigm, i.e., fitting a recommender model to recover the user behavior data with a pointwise or pairwise loss, makes the model biased towards popular items. This results in the Matthew effect: popular items are recommended even more frequently and become even more popular. Existing work addresses this issue with Inverse Propensity Weighting (IPW), which decreases the impact of popular items on training and increases the impact of long-tail items. Although theoretically sound, IPW methods are highly sensitive to the weighting strategy, which is notoriously difficult to tune. In this work, we explore the popularity bias issue from a novel and fundamental perspective: cause and effect. We identify that popularity bias lies in the direct effect from the item node to the ranking score, such that an item's intrinsic property is the cause of it mistakenly being assigned a higher ranking score. To eliminate popularity bias, it is essential to answer the counterfactual question of what the ranking score would be if the model used only the item property. To this end, we formulate a causal graph describing the important cause-effect relations in the recommendation process. During training, we perform multi-task learning to estimate the contribution of each cause; during testing, we perform counterfactual inference to remove the effect of item popularity. Remarkably, our solution amends the learning process of recommendation and is agnostic to a wide range of models. We demonstrate it on Matrix Factorization (MF) and LightGCN, which are representative of conventional and state-of-the-art collaborative filtering models, respectively. Experiments on five real-world datasets demonstrate the effectiveness of our method.
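A toy numeric sketch of the counterfactual idea (the scoring function, the sigmoid item branch, and the constant c are invented for illustration and are not the paper's model): the observed score mixes user-item matching with a direct popularity effect, and at inference the score the model would produce from popularity alone is subtracted out.

```python
import math

def ranking_score(match, item_pop, c=0.0):
    # Observed score: matching modulated by a popularity branch
    # sigma(item_pop). Counterfactual inference subtracts the score
    # obtained with matching fixed at a reference value c, leaving
    # (match - c) * sigma(item_pop).
    sigma = 1.0 / (1.0 + math.exp(-item_pop))
    return (match - c) * sigma

# Popular item with weak matching vs. niche item with strong matching.
biased_pop    = ranking_score(0.4,  3.0)           # c = 0: no adjustment
biased_niche  = ranking_score(0.8, -1.0)
debiased_pop   = ranking_score(0.4,  3.0, c=0.3)
debiased_niche = ranking_score(0.8, -1.0, c=0.3)
```

Without the adjustment the popular item outranks the better-matched niche item; with it, the ranking flips, which is the qualitative behavior the abstract describes.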

[70] 2010.15364

Online State-Time Trajectory Planning Using Timed-ESDF in Highly Dynamic Environments

Online state-time trajectory planning in highly dynamic environments remains an unsolved problem due to the unpredictable motions of moving obstacles and the curse of dimensionality of the state-time space. Existing state-time planners are typically implemented based on randomized sampling approaches or path searching on a discretized state graph. The smoothness, path clearance, and planning efficiency of these planners are usually not satisfactory. In this work, we propose a gradient-based planner over the state-time space for online trajectory generation in highly dynamic environments. To enable gradient-based optimization, we propose a Timed-ESDT that supports distance and gradient queries with state-time keys. Based on the Timed-ESDT, we also define a smooth prior and an obstacle likelihood function compatible with the state-time space. The trajectory planning is then formulated as a maximum a posteriori (MAP) problem and solved by an efficient numerical optimizer. Moreover, to improve the optimality of the planner, we define a state-time graph and conduct path searching on it to find a better initialization for the optimizer. By integrating the graph searching, the planning quality is significantly improved. Experiment results on simulated and benchmark datasets show that our planner outperforms the state-of-the-art methods, demonstrating its significant advantages over traditional ones.

[71] 2010.15365

Infinite Time Solutions of Numerical Schemes for Advection Problems

This paper addresses the question whether there are numerical schemes for constant-coefficient advection problems that can yield convergent solutions for an infinite time horizon. The motivation is that such methods may serve as building blocks for long-time accurate solutions in more complex advection-dominated problems. After establishing a new notion of convergence in an infinite time limit of numerical methods, we first show that linear methods cannot meet this convergence criterion. Then we present a new numerical methodology, based on a nonlinear jet scheme framework. We show that these methods do satisfy the new convergence criterion, thus establishing that numerical methods exist that converge on an infinite time horizon, and demonstrate the long-time accuracy gains incurred by this property.

[72] 2010.15366

Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Speech separation has been well developed, but some problems remain unsolved. The main problem we focus on in this paper is the frequent label permutation switching of permutation invariant training (PIT). For N-speaker separation, there are N! possible label permutations. How to stably select correct label permutations is a long-standing problem. In this paper, we utilize self-supervised pre-training to stabilize the label permutations. Among several types of self-supervised tasks, speech enhancement based pre-training tasks show significant effectiveness in our experiments. When using off-the-shelf pre-trained models, training duration can be shortened to one-third to two-thirds. Furthermore, even taking pre-training time into account, the entire training process can still be shorter without a performance drop when using a larger batch size.
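The PIT objective mentioned above can be written in a few lines: evaluate the loss under every speaker assignment and keep the best one. A generic sketch with toy signals, not the paper's training code:

```python
from itertools import permutations

import numpy as np

def pit_loss(estimates, targets):
    # Permutation invariant training: compute the pairwise MSE between
    # every (estimate, target) pair, then keep the speaker assignment
    # (one of N! permutations) with the lowest total loss.
    n = len(targets)
    pair = np.array([[np.mean((e - t) ** 2) for t in targets] for e in estimates])
    return min(sum(pair[i, p[i]] for i in range(n))
               for p in permutations(range(n))) / n

rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(100), rng.standard_normal(100)
# Perfect outputs in swapped order: PIT still recognises the match.
swapped = pit_loss([s2, s1], [s1, s2])
noisy = pit_loss([s1 + 0.5, s2], [s1, s2])
```

Because the argmin over permutations can change from batch to batch early in training, the assignment "switches" frequently; this is the instability that the pre-training in this paper aims to damp.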

[73] 2010.15371

Learning Centric Wireless Resource Allocation for Edge Computing: Algorithm and Experiment

Edge intelligence is an emerging network architecture that integrates sensing, communication, and computing components and supports various machine learning applications, where a fundamental communication question is: how should the limited wireless resources (such as time and energy) be allocated to the simultaneous model training of heterogeneous learning tasks? Existing methods ignore two important facts: 1) different models have heterogeneous demands on training data; 2) there is a mismatch between the simulated environment and the real-world environment. As a result, they can lead to low learning performance in practice. This paper proposes the learning centric wireless resource allocation (LCWRA) scheme that maximizes the worst learning performance of multiple classification tasks. Analysis shows that the optimal transmission time has an inverse power relationship with respect to the classification error. Finally, both simulation and experimental results are provided to verify the performance of the proposed LCWRA scheme and its robustness in real implementation.
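The inverse power relationship can be used directly for allocation: if error decays as a power law in transmission time, inverting it gives the time each task needs. A toy sketch (the fitted constants a and b are hypothetical, not values from the paper):

```python
def time_for_error(a, b, target_err):
    # If classification error follows an inverse power law in the
    # allocated transmission time, err(t) = a * t**(-b), then reaching
    # a target error requires t = (a / target_err) ** (1 / b).
    return (a / target_err) ** (1.0 / b)

# A data-hungry task (small exponent b) needs far more time than an
# easy task to reach the same target error.
t_hard = time_for_error(a=0.5, b=0.3, target_err=0.05)
t_easy = time_for_error(a=0.5, b=0.9, target_err=0.05)
```

Maximizing the worst-case learning performance then amounts to balancing these per-task time demands against the total budget.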

[74] 2010.15372

Learning Personalized Discretionary Lane-Change Initiation for Fully Autonomous Driving Based on Reinforcement Learning

In this article, the authors present a novel method to learn the personalized tactic of discretionary lane-change initiation for fully autonomous vehicles through human-computer interactions. Instead of learning from human-driving demonstrations, a reinforcement learning technique is employed to learn how to initiate lane changes from traffic context, the action of a self-driving vehicle, and in-vehicle user feedback. The proposed offline algorithm rewards the action-selection strategy when the user gives positive feedback and penalizes it when the feedback is negative. Also, a multi-dimensional driving scenario is considered to represent a more realistic lane-change trade-off. The results show that the lane-change initiation model obtained by this method can reproduce the personal lane-change tactic, and the performance of the customized models (average accuracy 86.1%) is much better than that of the non-customized models (average accuracy 75.7%). This method allows continuous improvement of customization for users during fully autonomous driving even without human-driving experience, which will significantly enhance the user acceptance of high-level autonomy of self-driving vehicles.

[75] 2010.15377

Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: an application to rugby union

Given a set of sequences comprised of time-ordered events, sequential pattern mining is useful to identify frequent sub-sequences from different sequences or within the same sequence. However, in sport, these techniques cannot determine the importance of particular patterns of play to good or bad outcomes, which is often of greater interest to coaches. In this study, we apply a supervised sequential pattern mining algorithm called safe pattern pruning (SPP) to 490 labelled event sequences representing passages of play from one rugby team's matches from the 2018 Japan Top League, and then evaluate the importance of the obtained sub-sequences to points-scoring outcomes. Linebreaks, successful lineouts, regained kicks in play, repeated phase-breakdown play, and failed opposition exit plays were identified as important patterns of play for the team scoring. When sequences were labelled with points scoring outcomes for the opposition teams, opposition team linebreaks, errors made by the team, opposition team lineouts, and repeated phase-breakdown play by the opposition team were identified as important patterns of play for the opposition team scoring. By virtue of its supervised nature and pruning properties, SPP obtained a greater variety of generally more sophisticated patterns than the well-known unsupervised PrefixSpan algorithm.
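The support counting at the heart of such pattern mining can be illustrated in a few lines. The toy event labels below are invented for illustration; the actual SPP algorithm additionally prunes candidates with its supervised safe-screening rule:

```python
def is_subsequence(pattern, sequence):
    # True if the events in `pattern` occur in order (not necessarily
    # contiguously) within `sequence`.
    it = iter(sequence)
    return all(event in it for event in pattern)

def support(pattern, sequences):
    # Fraction of event sequences that contain the pattern.
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)

plays = [
    ["lineout", "phase", "linebreak", "try"],
    ["kick", "regain", "phase", "try"],
    ["lineout", "phase", "turnover"],
]
```

Unsupervised miners like PrefixSpan keep every pattern above a support threshold; a supervised method like SPP instead scores each pattern by how well its presence separates scoring from non-scoring passages of play.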

[76] 2010.15378

Collaborative Method for Incremental Learning on Classification and Generation

Although well-trained deep neural networks have shown remarkable performance on numerous tasks, they rapidly forget what they have learned when they begin learning from additional data without the previous data being provided. In this paper, we introduce a novel algorithm, Incremental Class Learning with Attribute Sharing (ICLAS), for incremental class learning with deep neural networks. As one of its components, we also introduce a generative model, incGAN, which can generate images with greater variety than the training data. Under the challenging condition of data deficiency, ICLAS incrementally trains the classification and generation networks. Since ICLAS trains both networks, our algorithm can perform incremental class learning multiple times. Experiments on the MNIST dataset demonstrate the advantages of our algorithm.

[77] 2010.15382

Learning to Actively Learn: A Robust Approach

This work proposes a procedure for designing algorithms for specific adaptive data collection tasks like active learning and pure-exploration multi-armed bandits. Unlike the design of traditional adaptive algorithms that rely on concentration of measure and careful analysis to justify the correctness and sample complexity of the procedure, our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information theoretic lower bounds. In particular, a single adaptive learning algorithm is learned that competes with the best adaptive algorithm learned for each equivalence class. Our procedure takes as input just the available queries, set of hypotheses, loss function, and total query budget. This is in contrast to existing meta-learning work that learns an adaptive algorithm relative to an explicit, user-defined subset or prior distribution over problems which can be challenging to define and be mismatched to the instance encountered at test time. This work is particularly focused on the regime when the total query budget is very small, such as a few dozen, which is much smaller than those budgets typically considered by theoretically derived algorithms. We perform synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluate the method on tasks derived from real data including a noisy 20 Questions game and a joke recommendation task.

[78] 2010.15388

Prediction-Based Power Oversubscription in Cloud Platforms

Datacenter designers rely on conservative estimates of IT equipment power draw to provision resources. This leaves resources underutilized and requires more datacenters to be built. Prior work has used power capping to shave the rare power peaks and add more servers to the datacenter, thereby oversubscribing its resources and lowering capital costs. This works well when the workloads and their server placements are known. Unfortunately, these factors are unknown in public clouds, forcing providers to limit the oversubscription so that performance is never impacted. In this paper, we argue that providers can use predictions of workload performance criticality and virtual machine (VM) resource utilization to increase oversubscription. This poses many challenges, such as identifying the performance-critical workloads from black-box VMs, creating support for criticality-aware power management, and increasing oversubscription while limiting the impact of capping. We address these challenges for the hardware and software infrastructures of Microsoft Azure. The results show that we enable a 2x increase in oversubscription with minimum impact to critical workloads.

[79] 2010.15389

Learning Audio Embeddings with User Listening Data for Content-based Music Recommendation

Personalized recommendation on new track releases has always been a challenging problem in the music industry. To combat this problem, we first explore user listening history and demographics to construct a user embedding representing the user's music preference. With the user embedding and audio data from users' liked and disliked tracks, an audio embedding can be obtained for each track using metric learning with Siamese networks. For a new track, we can decide the best group of users to recommend by computing the similarity between the track's audio embedding and different user embeddings, respectively. The proposed system yields state-of-the-art performance on content-based music recommendation tested with millions of users and tracks. Also, we extract audio embeddings as features for music genre classification tasks. The results show the generalization ability of our audio embeddings.
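The metric-learning objective behind such Siamese training is commonly a contrastive loss: pull liked (track, user) embedding pairs together and push disliked pairs at least a margin apart. A minimal sketch (the embeddings and margin are illustrative, not the paper's model):

```python
import numpy as np

def contrastive_loss(z_track, z_user, liked, margin=1.0):
    # Siamese-style metric learning objective: liked pairs are pulled
    # together (squared distance), disliked pairs are pushed at least
    # `margin` apart (squared hinge on the remaining gap).
    d = np.linalg.norm(z_track - z_user)
    return d ** 2 if liked else max(0.0, margin - d) ** 2

u = np.array([0.2, 0.4])            # a toy user embedding
```

At recommendation time, the learned track embedding is compared to each user embedding by the same distance, so new tracks can be routed to users without any listening history for that track.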

[80] 2010.15390

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.

[81] 2010.15391

Robustifying Binary Classification to Adversarial Perturbation

Despite the enormous success of machine learning models in various applications, most of these models lack resilience to (even small) perturbations in their input data. Hence, new methods to robustify machine learning models are essential. To this end, in this paper we consider the problem of binary classification with adversarial perturbations. Investigating the solution to a min-max optimization (which considers the worst-case loss in the presence of adversarial perturbations), we introduce a generalization of the max-margin classifier that takes into account the power of the adversary in manipulating the data. We refer to this classifier as the "Robust Max-margin" (RM) classifier. Under some mild assumptions on the loss function, we theoretically show that the gradient descent iterates (with sufficiently small step size) converge to the RM classifier in direction. Therefore, the RM classifier can be studied to compute various performance measures (e.g., generalization error) of binary classification with adversarial perturbations.

[82] 2010.15392

Off-Policy Interval Estimation with Lipschitz Value Iteration

Off-policy evaluation provides an essential tool for evaluating the effects of different policies or treatments using only observed data. When applied to high-stakes scenarios such as medical diagnosis or financial decision-making, it is crucial to provide provably correct upper and lower bounds of the expected reward, not just a classical single point estimate, to the end-users, as executing a poor policy can be very costly. In this work, we propose a provably correct method for obtaining interval bounds for off-policy evaluation in a general continuous setting. The idea is to search for the maximum and minimum values of the expected reward among all the Lipschitz Q-functions that are consistent with the observations, which amounts to solving a constrained optimization problem on a Lipschitz function space. We go on to introduce a Lipschitz value iteration method to monotonically tighten the interval, which is simple yet efficient and provably convergent. We demonstrate the practical efficiency of our method on a range of benchmarks.
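
The constraint set at the heart of this idea, all Lipschitz functions consistent with the observations, admits a closed-form envelope in one dimension. The sketch below illustrates that envelope; the paper's actual method applies the idea to Q-functions through a value-iteration scheme, not this 1-D form:

```python
import numpy as np

def lipschitz_envelope(x_obs, y_obs, L, x_query):
    """Tightest upper/lower bounds at x_query over all L-Lipschitz functions
    consistent with the observations (x_obs, y_obs).

    Any consistent f satisfies |f(x) - y_i| <= L * |x - x_i| for each i,
    so the pointwise min/max of the resulting cones bound f exactly.
    """
    x_obs, y_obs = np.asarray(x_obs), np.asarray(y_obs)
    d = np.abs(x_query - x_obs)
    upper = np.min(y_obs + L * d)
    lower = np.max(y_obs - L * d)
    return lower, upper

# Two observations, Lipschitz constant 1: bounds at the midpoint
lo, hi = lipschitz_envelope([0.0, 1.0], [0.0, 0.5], L=1.0, x_query=0.5)
print(lo, hi)
```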

[83] 2010.15393

Discovery and classification of Twitter bots

A very large number of people use Online Social Networks daily. Such platforms thus become attractive targets for agents that seek to gain access to the attention of large audiences, and influence perceptions or opinions. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and to create an illusion of community behavior, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyzed a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users for a period of 36 months. We detected over a million events where seemingly unrelated accounts tweeted nearly identical content at nearly the same time. We filtered these concurrent content injection events and detected a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or in part controlled and orchestrated by the same software. We found botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between bot accounts and human users, as well as botnet interaction with user communities and Twitter trending topics.
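
The detection pattern described, seemingly unrelated accounts posting near-identical content at nearly the same time, can be sketched by bucketing posts on content and a coarse time slot; the window size and grouping rule below are illustrative assumptions, not the paper's exact filter:

```python
from collections import defaultdict

def concurrent_injections(tweets, window=10):
    """Find groups of accounts posting identical text within `window` seconds.

    tweets: list of (account, text, timestamp) tuples.
    Returns account sets that shared the same text in one time bucket.
    Real pipelines would also normalise text and use overlapping windows.
    """
    buckets = defaultdict(set)
    for account, text, ts in tweets:
        buckets[(text, int(ts // window))].add(account)
    return [accts for accts in buckets.values() if len(accts) > 1]

events = [("a", "buy now!", 100), ("b", "buy now!", 103),
          ("c", "hello", 101), ("d", "buy now!", 250)]
print(concurrent_injections(events))
```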

[84] 2010.15394

Smart Homes: Security Challenges and Privacy Concerns

The development and growth of Internet of Things (IoT) technology has increased exponentially over the last 10 years since its inception, and as a result has directly influenced the popularity and size of smart homes. In this article we present the main technologies and applications that constitute a smart home, identify the main security and privacy challenges that smart homes face, and provide good practices to mitigate those threats.

[85] 2010.15396

Channel Estimation and Equalization for CP-OFDM-based OTFS in Fractional Doppler Channels

Orthogonal time frequency and space (OTFS) modulation is a promising technology that satisfies high Doppler requirements for future mobile systems. OTFS modulation encodes information symbols and pilot symbols into the two-dimensional (2D) delay-Doppler (DD) domain. The received symbols suffer from inter-Doppler interference (IDI) in the fading channels with fractional Doppler shifts that are sampled at noninteger indices in the DD domain. IDI has been treated as an unavoidable effect because the fractional Doppler shifts cannot be obtained directly from the received pilot symbols. In this paper, we provide a solution to channel estimation for fractional Doppler channels. The proposed estimation provides new insight into the OTFS input-output relation in the DD domain as a 2D circular convolution with a small approximation. According to the input-output relation, we also provide a low-complexity channel equalization method using the estimated channel information. We demonstrate the error performance of the proposed channel estimation and equalization in several channels by simulations. The simulation results show that in high-mobility environments, the total system utilizing the proposed methods outperforms orthogonal frequency division multiplexing (OFDM) with ideal channel estimation and a conventional channel estimation method using a pseudo sequence.

[86] 2010.15399

Free-boundary conformal parameterization of point clouds

With the advancement in 3D scanning technology, there has been a surge of interest in the use of point clouds in science and engineering. To facilitate the computations and analyses of point clouds, prior works have considered parameterizing them onto some simple planar domains with a fixed boundary shape such as a unit circle or a rectangle. However, the geometry of the fixed shape may lead to some undesirable distortion in the parameterization. It is therefore more natural to consider free-boundary conformal parameterizations of point clouds, which minimize the local geometric distortion of the mapping without constraining the overall shape. In this work, we propose a novel approximation scheme of the Laplace--Beltrami operator on point clouds and utilize it for developing a free-boundary conformal parameterization method for disk-type point clouds. With the aid of the free-boundary conformal parameterization, high-quality point cloud meshing can be easily achieved. Furthermore, we show that using the idea of conformal welding in complex analysis, the point cloud conformal parameterization can be computed in a divide-and-conquer manner. Experimental results are presented to demonstrate the effectiveness of the proposed method.

[87] 2010.15404

On Efficient and Scalable Time-Continuous Spatial Crowdsourcing -- Full Version

The proliferation of advanced mobile terminals opened up a new crowdsourcing avenue, spatial crowdsourcing, to utilize the crowd potential to perform real-world tasks. In this work, we study a new type of spatial crowdsourcing, called time-continuous spatial crowdsourcing (TCSC in short). It supports broad applications for long-term continuous spatial data acquisition, ranging from environmental monitoring to traffic surveillance in citizen science and crowdsourcing projects. However, due to limited budgets and limited availability of workers in practice, the data collected is often incomplete, incurring a data deficiency problem. To tackle this, we first propose an entropy-based quality metric, which captures the joint effects of incompletion in data acquisition and imprecision in data interpolation. Based on that, we investigate quality-aware task assignment methods for both single- and multi-task scenarios. We show the NP-hardness of the single-task case, and design polynomial-time algorithms with guaranteed approximation ratios. We study novel indexing and pruning techniques for further enhancing the performance in practice. Then, we extend the solution to multi-task scenarios and devise a parallel framework for speeding up the process of optimization. We conduct extensive experiments on both real and synthetic datasets to show the effectiveness of our proposals.

[88] 2010.15411

Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management

Task-oriented dialogue systems typically rely on large amounts of high-quality training data or require complex handcrafted rules. However, existing datasets are often limited in size considering the complexity of the dialogues. Additionally, conventional training signal inference is not suitable for non-deterministic agent behaviour, i.e. considering multiple actions as valid in identical dialogue states. We propose the Conversation Graph (ConvGraph), a graph-based representation of dialogues that can be exploited for data augmentation, multi-reference training and evaluation of non-deterministic agents. ConvGraph generates novel dialogue paths to augment data volume and diversity. Intrinsic and extrinsic evaluation across three datasets shows that data augmentation and/or multi-reference training with ConvGraph can improve dialogue success rates by up to 6.4%.

[89] 2010.15413

Measuring and Harnessing Transference in Multi-Task Learning

Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, na\"ive formulations often degrade performance and in particular, identifying the tasks that would benefit from co-training remains a challenging design question. In this paper, we analyze the dynamics of information transfer, or transference, across tasks throughout training. Specifically, we develop a similarity measure that can quantify transference among tasks and use this quantity to both better understand the optimization dynamics of multi-task learning as well as improve overall learning performance. In the latter case, we propose two methods to leverage our transference metric. The first operates at a macro-level by selecting which tasks should train together while the second functions at a micro-level by determining how to combine task gradients at each training step. We find these methods can lead to significant improvement over prior work on three supervised multi-task learning benchmarks and one multi-task reinforcement learning paradigm.

[90] 2010.15415

A Novel Anomaly Detection Algorithm for Hybrid Production Systems based on Deep Learning and Timed Automata

Performing anomaly detection in hybrid systems is a challenging task since it requires analysis of timing behavior and mutual dependencies of both discrete and continuous signals. Typically, it requires modeling system behavior, which is often accomplished manually by human engineers. Using machine learning to create a behavioral model from observations has advantages, such as lower development costs and fewer requirements for specific knowledge about the system. The paper presents DAD:DeepAnomalyDetection, a new approach for automatic model learning and anomaly detection in hybrid production systems. It combines deep learning and timed automata to create a behavioral model from observations. The ability of deep belief nets to extract binary features from real-valued inputs is used to transform continuous signals into discrete ones. These signals, together with the original discrete signals, are then handled in an identical way. Anomaly detection is performed by comparing actual and predicted system behavior. The algorithm has been applied to several datasets, including two from real systems, and has shown promising results.

[91] 2010.15421

Scalable Graph Neural Networks via Bidirectional Propagation

Graph Neural Networks (GNNs) are an emerging approach for learning on non-Euclidean data. Recently, there has been increased interest in designing GNNs that scale to large graphs. Most existing methods use "graph sampling" or "layer-wise sampling" techniques to reduce training time. However, these methods still suffer from degrading performance and scalability problems when applied to graphs with billions of edges. This paper presents GBP, a scalable GNN that utilizes a localized bidirectional propagation process from both the feature vectors and the training/testing nodes. Theoretical analysis shows that GBP is the first method that achieves sub-linear time complexity for both the precomputation and the training phases. An extensive empirical study demonstrates that GBP achieves state-of-the-art performance with significantly less training/testing time. Most notably, GBP can deliver superior performance on a graph with over 60 million nodes and 1.8 billion edges in less than half an hour on a single machine.

[92] 2010.15423

Tilde at WMT 2020: News Task Systems

This paper describes Tilde's submission to the WMT2020 shared task on news translation for both directions of the English-Polish language pair in both the constrained and the unconstrained tracks. We follow our submissions from the previous years and build our baseline systems to be morphologically motivated sub-word unit-based Transformer base models that we train using the Marian machine translation toolkit. Additionally, we experiment with different parallel and monolingual data selection schemes, as well as sampled back-translation. Our final models are ensembles of Transformer base and Transformer big models that feature right-to-left re-ranking.

[93] 2010.15426

Physics-informed deep learning for flow and deformation in poroelastic media

A physics-informed neural network is presented for poroelastic problems with coupled flow and deformation processes. The governing equilibrium and mass balance equations are discussed and specific derivations for two-dimensional cases are presented. A fully-connected deep neural network is used for training. Barry and Mercer's source problem with time-dependent fluid injection/extraction in an idealized poroelastic medium, which has an exact analytical solution, is used as a numerical example. A random sample from the analytical solution is used as training data and the performance of the model is tested by predicting the solution on the entire domain after training. The deep learning model predicts the horizontal and vertical deformations well, while the error in the predicted pore pressure is slightly higher because of the sparsity of the pore pressure values.

[94] 2010.15434

Self-paced Data Augmentation for Training Neural Networks

Data augmentation is widely used for machine learning; however, an effective method to apply data augmentation has not been established, even though it involves several factors that should be tuned carefully. One such factor is sample suitability, which involves selecting samples that are suitable for data augmentation. A typical method that applies data augmentation to all training samples disregards sample suitability, which may reduce classifier performance. To address this problem, we propose self-paced augmentation (SPA) to automatically and dynamically select suitable samples for data augmentation when training a neural network. The proposed method mitigates the deterioration of generalization performance caused by ineffective data augmentation. We discuss two reasons why the proposed SPA works, relating to curriculum learning and to desirable changes in loss function instability. Experimental results demonstrate that the proposed SPA can improve generalization performance, particularly when the number of training samples is small. In addition, the proposed SPA outperforms the state-of-the-art RandAugment method.
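
As a rough illustration of loss-based sample selection in the spirit of self-paced learning, one could gate augmentation on per-sample loss; the paper's actual selection rule and its threshold schedule may well differ from this sketch:

```python
import numpy as np

def select_for_augmentation(losses, threshold):
    """Pick samples whose current training loss is below `threshold`.

    A minimal sketch of dynamic, loss-based sample selection: only
    "easy enough" samples receive augmentation this step. The threshold
    would typically be raised over the course of training.
    """
    losses = np.asarray(losses)
    return np.flatnonzero(losses < threshold)

# Per-sample losses from the current epoch (toy values)
losses = [0.1, 2.3, 0.4, 1.7]
print(select_for_augmentation(losses, threshold=1.0))
```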

[95] 2010.15435

Group-Harmonic and Group-Closeness Maximization -- Approximation and Engineering

Centrality measures characterize important nodes in networks. Efficiently computing such nodes has received a lot of attention. When considering the generalization of computing central groups of nodes, challenging optimization problems occur. In this work, we study two such problems, group-harmonic maximization and group-closeness maximization, both from a theoretical and from an algorithm engineering perspective. On the theoretical side, we obtain the following results. For group-harmonic maximization, unless $P=NP$, there is no polynomial-time algorithm that achieves an approximation factor better than $1-1/e$ (directed) and $1-1/(4e)$ (undirected), even for unweighted graphs. On the positive side, we show that a greedy algorithm achieves an approximation factor of $\lambda(1-2/e)$ (directed) and $\lambda(1-1/e)/2$ (undirected), where $\lambda$ is the ratio of minimal and maximal edge weights. For group-closeness maximization, the undirected case is $NP$-hard to approximate within a factor better than $1-1/(e+1)$, and a constant approximation factor is achieved by a local-search algorithm. For the directed case, however, we show that, for any $\epsilon<1/2$, the problem is $NP$-hard to approximate within a factor of $4|V|^{-\epsilon}$. From the algorithm engineering perspective, we provide efficient implementations of the above greedy and local search algorithms. In our experimental study we show that, on small instances where an optimum solution can be computed in reasonable time, the quality of both the greedy and the local search algorithms comes very close to the optimum. On larger instances, our local search algorithms yield results with superior quality compared to existing greedy and local search solutions, at the cost of additional running time. We thus advocate local search for scenarios where solution quality is of highest concern.
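
The greedy algorithm analyzed above repeatedly adds the node with the largest marginal gain in group-harmonic centrality. A minimal sketch for small unweighted graphs follows; the paper's engineered implementations are far more efficient than this naive recomputation:

```python
from collections import deque

def dist_to_set(adj, sources):
    """Multi-source BFS distances from the node set `sources` (unweighted)."""
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def group_harmonic(adj, group):
    """Group-harmonic centrality: sum of 1/d(v, group) over reachable v."""
    d = dist_to_set(adj, group)
    return sum(1.0 / d[v] for v in adj if v not in group and v in d)

def greedy_group(adj, k):
    """Greedily grow the group by the node with the largest marginal gain."""
    group = set()
    for _ in range(k):
        best = max((v for v in adj if v not in group),
                   key=lambda v: group_harmonic(adj, group | {v}))
        group.add(best)
    return group

# Toy path graph 0-1-2-3-4: the center node maximizes harmonic centrality
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_group(adj, 1))
```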

[96] 2010.15436

Affordance-Aware Handovers with Human Arm Mobility Constraints

Reasoning about object handover configurations allows an assistive agent to estimate the appropriateness of a handover for a receiver with different arm mobility capacities. While there are existing approaches to estimating the effectiveness of handovers, their findings are limited to users without arm mobility impairments and to specific objects. Therefore, current state-of-the-art approaches are unable to hand over novel objects to receivers with different arm mobility capacities. We propose a method that generalises handover behaviours to previously unseen objects, subject to the constraint of a user's arm mobility level and the task context. We propose a heuristic-guided hierarchically optimised cost whose optimisation adapts object configurations for receivers with low arm mobility. This also ensures that the robot's grasps consider the context of the user's upcoming task, i.e., the usage of the object. To understand preferences over handover configurations, we report on the findings of an online study, wherein we presented different handover methods, including ours, to $259$ users with different levels of arm mobility. We encapsulate these preferences in an SRL that is able to reason about the most suitable handover configuration given a receiver's arm mobility and upcoming task. We find that people's preferences over handover methods are correlated with their arm mobility capacities. In experiments with a PR2 robotic platform, we obtained an average handover accuracy of $90.8\%$ when generalising handovers to novel objects.

[97] 2010.15437

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model

This paper presents a novel fusion method for integrating an external language model (LM) into the Transformer based sequence-to-sequence (seq2seq) model. While paired data are basically required to train the seq2seq model, the external LM can be trained with only unpaired data. Thus, it is important to leverage memorized knowledge in the external LM for building the seq2seq model, since it is hard to prepare a large amount of paired data. However, the existing fusion methods assume that the LM is integrated with recurrent neural network-based seq2seq models instead of the Transformer. Therefore, this paper proposes a fusion method that can explicitly utilize network structures in the Transformer. The proposed method, called {\bf memory attentive fusion}, leverages the Transformer-style attention mechanism that repeats source-target attention in a multi-hop manner for reading the memorized knowledge in the LM. Our experiments on two text-style conversion tasks demonstrate that the proposed method performs better than conventional fusion methods.

[98] 2010.15441

Self-awareness in intelligent vehicles: Feature based dynamic Bayesian models for abnormality detection

The evolution of Intelligent Transportation Systems in recent times necessitates the development of self-awareness in agents. Before the intensive use of Machine Learning, the detection of abnormalities was manually programmed by checking every variable and creating huge nested conditions that are very difficult to track. This paper introduces a novel method to develop self-awareness in autonomous vehicles that mainly focuses on detecting abnormal situations around the considered agents. Multi-sensory time-series data from the vehicles are used to develop the data-driven Dynamic Bayesian Network (DBN) models used for future state prediction and the detection of dynamic abnormalities. Moreover, an initial-level collective awareness model that can perform joint anomaly detection in co-operative tasks is proposed. The GNG algorithm learns the DBN models' discrete node variables; probabilistic transition links connect the node variables. A Markov Jump Particle Filter (MJPF) is applied to predict future states and detect when the vehicle is potentially misbehaving, using the learned DBNs as filter parameters. In this paper, datasets from real experiments of autonomous vehicles performing various tasks are used to learn and test a set of switching DBN models.

[99] 2010.15444

Advanced Python Performance Monitoring with Score-P

Within the last years, Python became more prominent in the scientific community and is now used for simulations, machine learning, and data analysis. All these tasks profit from additional compute power offered by parallelism and offloading. In the domain of High Performance Computing (HPC), we can look back on decades of experience exploiting different levels of parallelism on the core, node or inter-node level, as well as utilising accelerators. By using performance analysis tools to investigate all these levels of parallelism, we can tune applications for unprecedented performance. Unfortunately, standard Python performance analysis tools cannot cope with highly parallel programs. Since the development of such software is complex and error-prone, we demonstrate an easy-to-use solution based on an existing tool infrastructure for performance analysis. In this paper, we describe how to apply the established instrumentation framework Score-P to trace Python applications. We finish with a study of the overhead that users can expect when instrumenting their applications.

[100] 2010.15453

Capacity-achieving codes: a review on double transitivity

Recently it was proved that if a linear code is invariant under the action of a doubly transitive permutation group, it achieves the capacity of the erasure channel. It is therefore of significant interest to classify all codes invariant under such permutation groups. We take a step in this direction and give a review of all suitable groups and the known results on codes invariant under these groups. It turns out that there are capacity-achieving families of algebraic geometric codes.

[101] 2010.15454

Scalable Federated Learning over Passive Optical Networks

Two-step aggregation is introduced to facilitate scalable federated learning (SFL) over passive optical networks (PONs). Results reveal that the SFL keeps the required PON upstream bandwidth constant regardless of the number of involved clients, while bringing ~10% learning accuracy improvement.

[102] 2010.15455

Optimal Sharing and Fair Cost Allocation of Community Energy Storage

This paper studies an energy storage (ES) sharing model where multiple buildings cooperatively invest in and share a community ES (CES) to harness economic benefits from on-site renewable integration and utility price arbitrage. In particular, we formulate a problem that integrates optimal ES sizing, operation and cost allocation as a coalition game; these aspects are generally addressed separately in the literature. We also address fair ex-post cost allocation, which has not been well studied. To overcome the computational challenge of computing the entire explicit characteristic function, which takes exponential time, we propose a fair cost allocation based on the nucleolus, employing a constraint generation technique. We study the fairness and computational efficiency of the method through a number of case studies. The numerical results imply that the proposed method outperforms the Shapley approach and the proportional method in either computational efficiency or fairness. Notably, for the proposed method, only a small fraction of characteristic function values (2.54%) is computed to achieve the cost allocation, versus the entire information required by the Shapley approach. With the proposed cost allocation, we investigate the enhanced economic benefits of the CES model for individual buildings over individual ES (IES) installation. The CES model provides a higher cost reduction to each committed building, and the value of storage is notably improved (by about 1.83 times) with the CES model over the IES model.
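
To see why computing the full characteristic function is prohibitive, consider exact Shapley cost shares, which require coalition costs across all orderings of the players; the toy three-building game below is a hypothetical illustration, not the paper's model:

```python
import math
from itertools import permutations

def shapley(players, cost):
    """Exact Shapley cost shares: average marginal cost over all orderings.

    Evaluating `cost` on every coalition is exponential in the number of
    players -- the blow-up that nucleolus-based constraint generation
    aims to sidestep.
    """
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = []
        for p in order:
            before = cost(frozenset(coalition))
            coalition.append(p)
            shares[p] += cost(frozenset(coalition)) - before
    n_orders = math.factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}

# Hypothetical 3-building game with economies of scale (sublinear cost)
players = ["b1", "b2", "b3"]
cost = lambda S: 10 * len(S) ** 0.7
shares = shapley(players, cost)
print(shares)
```

By construction the shares are efficient (they sum to the grand-coalition cost), and identical players receive identical shares.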

[103] 2010.15456

Multilayer Clustered Graph Learning

Multilayer graphs are appealing mathematical tools for modeling multiple types of relationship in the data. In this paper, we aim at analyzing multilayer graphs by properly combining the information provided by individual layers, while preserving the specific structure that allows us to eventually identify communities or clusters that are crucial in the analysis of graph data. To do so, we learn a clustered representative graph by solving an optimization problem that involves a data fidelity term to the observed layers, and a regularization pushing for a sparse and community-aware graph. We use the contrastive loss as a data fidelity term, in order to properly aggregate the observed layers into a representative graph. The regularization is based on a measure of graph sparsification called "effective resistance", coupled with a penalization of the first few eigenvalues of the representative graph Laplacian matrix to favor the formation of communities. The proposed optimization problem is nonconvex but fully differentiable, and thus can be solved via the projected gradient method. Experiments show that our method leads to a significant improvement w.r.t. state-of-the-art multilayer graph learning algorithms for solving clustering problems.

[104] 2010.15457

FiGLearn: Filter and Graph Learning using Optimal Transport

In many applications, a dataset can be considered as a set of observed signals that live on an unknown underlying graph structure. Some of these signals may be seen as white noise that has been filtered on the graph topology by a graph filter. Hence, the knowledge of the filter and the graph provides valuable information about the underlying data generation process and the complex interactions that arise in the dataset. We hence introduce a novel graph signal processing framework for jointly learning the graph and its generating filter from signal observations. We cast a new optimisation problem that minimises the Wasserstein distance between the distribution of the signal observations and the filtered signal distribution model. Our proposed method outperforms state-of-the-art graph learning frameworks on synthetic data. We then apply our method to a temperature anomaly dataset, and further show how this framework can be used to infer missing values if only very little information is available.

[105] 2010.15458

Named Entity Recognition for Social Media Texts with Semantic Augmentation

Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Given that rich semantic information is implicitly preserved in pre-trained word embeddings, they are potential ideal resources for semantic augmentation. In this paper, we propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate such information, respectively. Extensive experiments are performed on three benchmark datasets collected from English and Chinese social media platforms, where the results demonstrate the superiority of our approach to previous studies across all three datasets.

[106] 2010.15461

Concatenated Codes for Recovery From Multiple Reads of DNA Sequences

Decoding sequences that stem from multiple transmissions of a codeword over an insertion, deletion, and substitution channel is a critical component of efficient deoxyribonucleic acid (DNA) data storage systems. In this paper, we consider a concatenated coding scheme with an outer low-density parity-check code and either an inner convolutional code or a block code. We propose two new decoding algorithms for inference from multiple received sequences, both combining the inner code and channel to a joint hidden Markov model to infer symbolwise a posteriori probabilities (APPs). The first decoder computes the exact APPs by jointly decoding the received sequences, whereas the second decoder approximates the APPs by combining the results of separately decoded received sequences. Using the proposed algorithms, we evaluate the performance of decoding multiple received sequences by means of achievable information rates and Monte-Carlo simulations. We show significant performance gains compared to a single received sequence.

[107] 2010.15464

Self-Supervised Video Representation Using Pretext-Contrastive Learning

Pretext tasks and contrastive learning have been successful in self-supervised learning for video retrieval and recognition. In this study, we analyze their optimization targets and utilize the hyper-sphere feature space to explore the connections between them, indicating the compatibility and consistency of these two different learning methods. Based on the analysis, we propose a self-supervised training method, referred to as Pretext-Contrastive Learning (PCL), to learn video representations. Extensive experiments based on different combinations of pretext task baselines and contrastive losses confirm the strong agreement with their self-supervised learning targets, demonstrating the effectiveness and the generality of PCL. The combination of pretext tasks and contrastive losses shows significant improvements in both video retrieval and recognition over the corresponding baselines, and our approach also outperforms current state-of-the-art methods. Further, PCL is flexible and can be applied to almost all existing pretext task methods.

[108] 2010.15466

Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information

Named entity recognition (NER) is highly sensitive to sentential syntactic and semantic properties, since entities may be extracted according to how they are used and placed in the running text. To model such properties, one can rely on existing resources to provide helpful knowledge to the NER task; some existing studies have proved the effectiveness of doing so, yet they are limited in appropriately leveraging that knowledge, e.g., in distinguishing which pieces matter for a particular context. In this paper, we improve NER by leveraging different types of syntactic information through attentive ensemble, realized by the proposed key-value memory networks, syntax attention, and gate mechanism for encoding, weighting, and aggregating such syntactic information, respectively. Experimental results on six English and Chinese benchmark datasets demonstrate the effectiveness of the proposed model, which outperforms previous studies on all experimental datasets.
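
The weighting-and-aggregation step can be pictured as standard key-value attention. The sketch below is a generic plain-Python illustration of that mechanism, not the authors' model; the query/key/value vectors and dot-product scoring are assumptions for illustration:

```python
import math

def attentive_ensemble(query, keys, values):
    """Generic key-value attention: weight each value vector by the
    softmax similarity between the query and its key, then sum."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

In the paper's setting, the values would hold embeddings of different syntactic cues, so the attention weights decide which cue matters for the current context.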

[109] 2010.15469

Emergence of Spatial Coordinates via Exploration

Spatial knowledge is a fundamental building block for the development of advanced perceptive and cognitive abilities. Traditionally, in robotics, the Euclidean (x,y,z) coordinate system and the agent's forward model are defined a priori. We show that a naive agent can autonomously build an internal coordinate system, with the same dimension and metric regularity as the external space, simply by learning to predict the outcome of sensorimotor transitions in a self-supervised way.

[110] 2010.15470

Hybrid mimetic finite-difference and virtual element formulation for coupled poromechanics

We present a hybrid mimetic finite-difference and virtual element formulation for coupled single-phase poromechanics on unstructured meshes. The key advantage of the scheme is that it is convergent on complex meshes containing highly distorted cells with arbitrary shapes. We use a local pressure-jump stabilization method based on unstructured macro-elements to prevent the development of spurious pressure modes in incompressible problems approaching undrained conditions. A scalable linear solution strategy is obtained using a block-triangular preconditioner designed specifically for the saddle-point systems arising from the proposed discretization. The accuracy and efficiency of our approach are demonstrated numerically on two-dimensional benchmark problems.

[111] 2010.15476

Iteratively reweighted greedy set cover

We empirically analyze a simple heuristic for large sparse set cover problems. It uses the weighted greedy algorithm as a basic building block. By multiplicative updates of the weights attached to the elements, the greedy solution is iteratively improved. The implementation of this algorithm is trivial and the algorithm is essentially free of parameters that would require tuning. More iterations can only improve the solution. This set of features makes the approach attractive for practical problems.
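
As a concrete illustration, here is a minimal Python sketch of the scheme described above: weighted greedy as the building block, iterated under multiplicative weight updates while keeping the best cover found. The specific update rule below (boosting elements the current solution covers only once) is an assumption for illustration; the paper's exact rule may differ:

```python
def greedy_cover(sets, universe, weight):
    """Weighted greedy: repeatedly pick the set covering the largest
    total weight of still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)),
                   key=lambda i: sum(weight[e] for e in sets[i] & uncovered))
        gain = sets[best] & uncovered
        if not gain:
            raise ValueError("instance is infeasible")
        chosen.append(best)
        uncovered -= gain
    return chosen

def iterated_greedy(sets, universe, rounds=20, boost=1.5):
    """Iteratively reweighted greedy: rerun greedy under multiplicatively
    updated element weights, keeping the best cover found so far."""
    weight = {e: 1.0 for e in universe}
    best = greedy_cover(sets, universe, weight)
    for _ in range(rounds):
        sol = greedy_cover(sets, universe, weight)
        if len(sol) < len(best):
            best = sol
        # Multiplicative update (illustrative rule): boost elements that
        # the current solution covers only once, so alternative sets
        # containing them look more attractive in the next round.
        cover_count = {e: 0 for e in universe}
        for i in sol:
            for e in sets[i]:
                if e in cover_count:
                    cover_count[e] += 1
        for e, c in cover_count.items():
            if c == 1:
                weight[e] *= boost
    return best
```

As the abstract notes, more iterations can only improve the solution, because the best cover seen so far is always retained.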

[112] 2010.15479

Learned infinite elements

We study the numerical solution of scalar time-harmonic wave equations on unbounded domains which can be split into a bounded interior domain of primary interest and an exterior domain with separable geometry. To compute the solution in the interior domain, approximations to the Dirichlet-to-Neumann (DtN) map of the exterior domain have to be imposed as transparent boundary conditions on the artificial coupling boundary. Although the DtN map can be computed by separation of variables, it is a nonlocal operator with dense matrix representations, and hence computationally inefficient. Therefore, approximations of DtN maps by sparse matrices, usually involving additional degrees of freedom, have been studied intensively in the literature using a variety of approaches including different types of infinite elements, local non-reflecting boundary conditions, and perfectly matched layers. The entries of these sparse matrices are derived analytically, e.g. from transformations or asymptotic expansions of solutions to the differential equation in the exterior domain. In contrast, in this paper we propose to `learn' the matrix entries from the DtN map in its separated form by solving an optimization problem as a preprocessing step. Theoretical considerations suggest that the approximation quality of learned infinite elements improves exponentially with increasing number of infinite element degrees of freedom, which is confirmed in numerical experiments. These numerical studies also show that learned infinite elements outperform state-of-the-art methods for the Helmholtz equation. At the same time, learned infinite elements are much more flexible than traditional methods as they, e.g., work similarly well for exterior domains involving strong reflections, for example, for the atmosphere of the Sun, which is strongly inhomogeneous and exhibits reflections at the corona.

[113] 2010.15482

Convergence of Constrained Anderson Acceleration

We prove non-asymptotic linear convergence rates for the constrained Anderson acceleration extrapolation scheme. These guarantees come from new upper bounds on the constrained Chebyshev problem, which consists of minimizing the maximum absolute value of a polynomial on a bounded real interval with $l_1$ constraints on its coefficient vector. Constrained Anderson Acceleration has a numerical cost comparable to that of the original scheme.
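
In the notation suggested by the abstract, the constrained Chebyshev problem reads (a paraphrase of the description above; the paper's exact normalization of the polynomial may differ):

$$\min_{\substack{p(x)=\sum_{i=0}^{n} c_i x^i \\ \|c\|_1 \le C}} \; \max_{x \in [a,b]} \, |p(x)|$$

where $c = (c_0, \dots, c_n)$ is the coefficient vector and $C$ the $l_1$ budget.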

[114] 2010.15487

Beyond cross-entropy: learning highly separable feature distributions for robust and accurate classification

Deep learning has shown outstanding performance in several applications including image classification. However, deep classifiers are known to be highly vulnerable to adversarial attacks, in that a minor perturbation of the input can easily lead to an error. Providing robustness to adversarial attacks is a very challenging task especially in problems involving a large number of classes, as it typically comes at the expense of an accuracy decrease. In this work, we propose the Gaussian class-conditional simplex (GCCS) loss: a novel approach for training deep robust multiclass classifiers that provides adversarial robustness while at the same time achieving or even surpassing the classification accuracy of state-of-the-art methods. Unlike other frameworks, the proposed method learns a mapping of the input classes onto target distributions in a latent space such that the classes are linearly separable. Instead of maximizing the likelihood of target labels for individual samples, our objective function pushes the network to produce feature distributions yielding high inter-class separation. The mean values of the distributions are centered on the vertices of a simplex such that each class is at the same distance from every other class. We show that the regularization of the latent space based on our approach yields excellent classification accuracy and inherently provides robustness to multiple adversarial attacks, both targeted and untargeted, outperforming state-of-the-art approaches over challenging datasets.
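
A small self-contained sketch of the simplex placement described above: $K$ class means in $R^K$, centered and equidistant ($v_i = e_i - \frac{1}{K}\mathbf{1}$, so every pair sits at distance $\sqrt{2}$). This is one standard construction, not necessarily the authors' exact parameterization:

```python
def simplex_vertices(k):
    """k points in R^k with equal pairwise distances (a regular simplex),
    centered at the origin: v_i = e_i - (1/k) * ones."""
    return [[(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
            for i in range(k)]

def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Centering by the centroid preserves pairwise distances, so each class mean is at distance $\sqrt{2}$ from every other, as the equidistance requirement demands.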

[115] 2010.15492

"What, not how" -- Solving an under-actuated insertion task from scratch

Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most Reinforcement Learning (RL) approaches in robotics study tasks which actually consist only of a single manipulation skill, such as grasping an object or inserting a pre-grasped object. As a result, the skill ('how' to solve the task) but not the actual goal of a complete manipulation ('what' to solve) is specified. In contrast, we study a complex manipulation goal that requires an agent to learn and combine diverse manipulation skills. We propose a challenging, highly under-actuated peg-in-hole task with a free, rotationally asymmetric peg, requiring a broad range of manipulation skills. While correct peg (re-)orientation is a requirement for successful insertion, there is no reward associated with it. Hence, an agent needs to understand this pre-condition and learn the skill to fulfil it. The final insertion reward is sparse, allowing freedom in the solution and leading to complex emerging behaviour not envisioned during the task design. We tackle the problem in a multi-task RL framework using Scheduled Auxiliary Control (SAC-X) combined with Regularized Hierarchical Policy Optimization (RHPO), which successfully solves the task in simulation and from scratch on a single robot where data is severely limited.

[116] 2010.15502

Enhancing Vulnerable Road User Safety: A Survey of Existing Practices and Consideration for Using Mobile Devices for V2X Connections

Vulnerable road users (VRUs) such as pedestrians, cyclists and motorcyclists are at the highest risk in the road traffic environment. Globally, over half of road traffic deaths are vulnerable road users. Although substantial efforts are being made to improve VRU safety, from engineering solutions to law enforcement, the death toll of VRUs continues to rise. The emerging Cooperative Intelligent Transportation System (C-ITS) technology has proven potential to enhance road safety by enabling wireless communication to exchange information among road users. Such exchanged information is utilized to create situational awareness and to detect potential collisions in advance, so that necessary measures can be taken to avoid road casualties. The current state-of-the-art C-ITS solutions for VRU safety, however, are limited to unidirectional communication, where VRUs are only responsible for alerting drivers to their presence with the intention of avoiding collisions. This one-way interaction substantially limits the enormous potential of C-ITS, which could otherwise be employed to devise a more effective solution for VRU safety in which VRUs are equipped with bidirectional communication and full C-ITS functionalities. To address these problems and to explore better C-ITS solutions for VRUs, this paper reviews and evaluates the technologies and safety methods proposed for VRU safety over the period 2007-2020. It then presents the design considerations for a cellular-based Vehicle-to-VRU (V2VRU) communication system, along with potential challenges of a cellular-based approach, and provides the necessary recommendations.

[117] 2010.15504

A stochastic $\theta$-SEIHRD model: adding randomness to the COVID-19 spread

In this article we extend the deterministic model developed in [10] to a stochastic setting. More precisely, we incorporate randomness in some coefficients by assuming that they follow a prescribed stochastic dynamics. In this way, the model variables are now represented by stochastic processes, which can be simulated by appropriately solving the system of stochastic differential equations. Thus, the model becomes more complete and flexible than its deterministic analogue, as it incorporates additional uncertainties which are present in more realistic situations. In particular, confidence intervals for the main variables and worst-case scenarios can be computed.
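
To make the idea concrete, the sketch below simulates one randomized coefficient with the Euler-Maruyama scheme. The mean-reverting (Ornstein-Uhlenbeck) dynamics are an illustrative assumption; the paper prescribes its own stochastic dynamics for the coefficients:

```python
import math
import random

def simulate_ou(beta0, mu, theta, sigma, dt, n, rng):
    """Euler-Maruyama discretization of a mean-reverting coefficient:
    d(beta) = theta * (mu - beta) dt + sigma dW."""
    path = [beta0]
    b = beta0
    for _ in range(n):
        b = b + theta * (mu - b) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(b)
    return path
```

Simulating many such paths through the epidemic model yields an ensemble of trajectories, from which confidence intervals and worst-case scenarios can be read off, as described above.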

[118] 2010.15506

Dynamic Formation Reshaping Based on Point Set Registration in a Swarm of Drones

This work focuses on optimized formation reshaping in an autonomous swarm of drones. Here, the two main problems are: 1) how to break and reshape the initial formation in an optimal manner, and 2) how to do such reformation while minimizing the overall deviation of the drones and the overall time, i.e., without slowing down. To address the first problem, we introduce a set of routines for the drones/agents to follow while reshaping into a secondary formation shape. The second problem is resolved by utilizing the temperature function reduction technique, originally used in the point set registration process. The goal is to dynamically reform the shape of a multi-agent swarm in a near-optimal manner while passing through narrow openings between, for instance, obstacles, and then to bring the agents back to their original shape after passing through the narrow passage using the point set registration technique.

[119] 2010.15507

Dynamic Resource-aware Corner Detection for Bio-inspired Vision Sensors

Event-based cameras are vision devices that transmit only brightness changes, with low latency and ultra-low power consumption. Such characteristics make event-based cameras attractive in the field of localization and object tracking in resource-constrained systems. Since the number of events generated by such cameras is huge, selecting and filtering the incoming events is beneficial both for increasing the accuracy of the features and for reducing the computational load. In this paper, we present an algorithm to detect asynchronous corners from a stream of events in real-time on embedded systems. The algorithm, called the Three Layer Filtering-Harris or TLF-Harris algorithm, is based on an event-filtering strategy whose purpose is 1) to increase the accuracy by deliberately eliminating some incoming events, i.e., noise, and 2) to improve the real-time performance of the system, i.e., preserving a constant throughput in terms of input events per second, by discarding unnecessary events with a limited accuracy loss. An approximation of the Harris algorithm, in turn, is used to exploit its high-quality detection capability with a low-complexity implementation and enable seamless real-time performance on embedded computing platforms. The proposed algorithm is capable of selecting the best corner candidate among neighbors and achieves an average execution time saving of 59% compared with the conventional Harris score. Moreover, our approach outperforms competing methods, such as eFAST, eHarris, and FA-Harris, in terms of real-time performance, and surpasses Arc* in terms of accuracy.

[120] 2010.15509

Night vision obstacle detection and avoidance based on Bio-Inspired Vision Sensors

Moving towards autonomy, unmanned vehicles rely heavily on state-of-the-art collision avoidance systems (CAS). However, the detection of obstacles, especially during night-time, is still a challenging task, since the lighting conditions are not sufficient for traditional cameras to function properly. Therefore, we exploit the powerful attributes of event-based cameras to perform obstacle detection in low lighting conditions. Event cameras trigger events asynchronously at a high output temporal rate with a high dynamic range of up to 120 $dB$. The algorithm filters background activity noise and extracts objects using a robust Hough transform technique. The depth of each detected object is computed by triangulating 2D features extracted utilising LC-Harris. Finally, an asynchronous adaptive collision avoidance (AACA) algorithm is applied for effective avoidance. A qualitative evaluation compares results obtained with the event camera and a traditional camera.

[121] 2010.15510

Asynchronous Corner Tracking Algorithm based on Lifetime of Events for DAVIS Cameras

Event cameras, such as the Dynamic and Active-pixel Vision Sensor (DAVIS), capture intensity changes in the scene and generate a stream of events in an asynchronous fashion. The output rate of such cameras can reach up to 10 million events per second in highly dynamic environments. DAVIS cameras use novel vision sensors that mimic human eyes. Their attractive attributes, such as high output rate, High Dynamic Range (HDR), and high pixel bandwidth, make them an ideal solution for applications that require high-frequency tracking. Moreover, applications that operate in challenging lighting scenarios can exploit the high HDR of event cameras, i.e., 140 dB compared to 60 dB of traditional cameras. In this paper, a novel asynchronous corner tracking method is proposed that uses both events and intensity images captured by a DAVIS camera. The Harris algorithm is used to extract features, i.e., frame-corners, from keyframes, i.e., intensity images. Afterward, a matching algorithm is used to extract event-corners from the stream of events. Events are solely used to perform asynchronous tracking until the next keyframe is captured. Neighboring events, within a window of 5x5 pixels around the event-corner, are used to calculate the velocity and direction of extracted event-corners by fitting a 2D plane using a randomized Hough transform algorithm. Experimental evaluation shows that our approach is able to update the location of the extracted corners up to 100 times during the blind time of traditional cameras, i.e., between two consecutive intensity images.

[122] 2010.15524

A brief overview of swarm intelligence-based algorithms for numerical association rule mining

Numerical Association Rule Mining is a popular variant of Association Rule Mining in which numerical attributes are handled without discretization. This means that algorithms for this problem can operate directly not only with categorical but also with numerical attributes. Until recently, a large portion of these algorithms were based on a stochastic nature-inspired population-based paradigm, and evolutionary and swarm intelligence-based algorithms have proved very efficient at dealing with the problem. In line with this, the main mission of this chapter is to give a historical overview of swarm intelligence-based algorithms for Numerical Association Rule Mining and to present the main features of these algorithms for the observed problem. A taxonomy of the algorithms is proposed on the basis of the features found in this overview. The chapter concludes with a discussion of open challenges.

[123] 2010.15525

Self-Learning Threshold-Based Load Balancing

We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The dispatcher uses a threshold for balancing the load and keeping the maximum number of concurrent tasks across server pools low. We demonstrate that such a policy is optimal on the fluid and diffusion scales for a suitable threshold value, while only involving a small communication overhead. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be uncertain or even time-varying. For that purpose, we design a control rule for tuning the threshold in an online manner. We provide conditions which guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens.
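
A minimal sketch of such a threshold policy follows. The tie-breaking below (first pool found under the threshold) is an assumption for illustration; the paper analyzes the policy on fluid and diffusion scales rather than prescribing this exact rule:

```python
def dispatch(loads, threshold):
    """Threshold dispatching sketch: send the incoming task to a server
    pool with fewer than `threshold` concurrent tasks if one exists;
    otherwise fall back to a least-loaded pool. Returns the pool index."""
    below = [i for i, load in enumerate(loads) if load < threshold]
    if below:
        return below[0]
    return min(range(len(loads)), key=lambda i: loads[i])
```

The adaptive part of the paper then amounts to tuning `threshold` online as the (possibly uncertain or time-varying) load is learned.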

[124] 2010.15528

An End to End Network Architecture for Fundamental Matrix Estimation

In this paper, we present a novel end-to-end network architecture to estimate the fundamental matrix directly from stereo images. To establish a complete working pipeline, different deep neural networks, in charge of finding correspondences in images, performing outlier rejection, and calculating the fundamental matrix, are integrated into an end-to-end network architecture. To train the network well and preserve the geometric properties of the fundamental matrix, a new loss function is introduced. To evaluate the accuracy of the estimated fundamental matrix more reasonably, we design a new evaluation metric which is highly consistent with the visualization result. Experiments conducted on both outdoor and indoor datasets show that this network outperforms traditional methods as well as previous deep learning based methods on various metrics and achieves significant performance improvements.

[125] 2010.15530

Probabilistic interval predictor based on dissimilarity functions

This work presents a new method to obtain probabilistic interval predictions of a dynamical system. The method uses stored past system measurements to estimate the future evolution of the system. The proposed method relies on the use of dissimilarity functions to estimate the conditional probability density function of the outputs. A family of empirical probability density functions, parameterized by means of two parameters, is introduced. It is shown that the proposed family encompasses the multivariable normal probability density function as a particular case. We show that the proposed method constitutes a generalization of classical estimation methods. A cross-validation scheme is used to tune the two parameters on which the methodology relies. In order to prove the effectiveness of the methodology presented, some numerical examples and comparisons are provided.

[126] 2010.15531

Coordinated Formation Control for Intelligent and Connected Vehicles in Multiple Traffic Scenarios

In this paper, a unified multi-vehicle formation control framework for Intelligent and Connected Vehicles (ICVs) that applies to multiple traffic scenarios is proposed. In the one-dimensional scenario, different formation geometries are analyzed and the interlaced structure is mathematically modeled to improve driving safety while making full use of the lane capacity. The assignment problem for vehicles and target positions is solved using the Hungarian Algorithm to improve the flexibility of the method in multiple scenarios. In the two-dimensional scenario, an improved virtual platoon method is proposed to transfer the complex two-dimensional passing problem to the one-dimensional formation control problem based on the idea of rotation projection. Besides, a vehicle regrouping method is proposed to connect the two scenarios. Simulation results show that the proposed multi-vehicle formation control framework applies to multiple typical scenarios and performs better than existing methods.
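
The vehicle-to-slot assignment step can be illustrated with a tiny cost-matrix example. The exhaustive search below is only a sketch for small formations; the paper uses the Hungarian Algorithm, which solves the same problem in polynomial time:

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one assignment of n vehicles to n target
    positions by brute force over permutations (fine for small n).
    cost[i][j] could hold, e.g., the distance vehicle i must travel
    to reach target position j."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = list(perm), c
    return best_perm, best_cost
```

Swapping in a Hungarian-algorithm implementation changes only the solver, not the interface: both return the permutation that minimizes the total reshaping cost.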

[127] 2010.15533

How do Offline Measures for Exploration in Reinforcement Learning behave?

Sufficient exploration is paramount for the success of a reinforcement learning agent. Yet, exploration is rarely assessed in an algorithm-independent way. We compare the behavior of three data-based, offline exploration metrics described in the literature on intuitive simple distributions and highlight problems to be aware of when using them. We propose a fourth metric, uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator, highlighting that the implementation choices have a profound impact on these measures.

[128] 2010.15534

Poster: Benchmarking Financial Data Feed Systems

Data-driven solutions for the investment industry require event-based backend systems to process high-volume financial data feeds with low latency, high throughput, and guaranteed delivery modes. At vwd we process an average of 18 billion incoming event notifications from 500+ data sources for 30 million symbols per day and peak rates of 1+ million notifications per second using custom-built platforms that keep audit logs of every event. We currently assess modern open source event-processing platforms such as Kafka, NATS, Redis, Flink or Storm for the use in our ticker plant to reduce the maintenance effort for cross-cutting concerns and leverage hybrid deployment models. For comparability and repeatability we benchmark candidates with a standardized workload we derived from our real data feeds. We have enhanced an existing light-weight open source benchmarking tool in its processing, logging, and reporting capabilities to cope with our workloads. The resulting tool wrench can simulate workloads or replay snapshots in volume and dynamics like those we process in our ticker plant. We provide the tool as open source. As part of ongoing work we contribute details on (a) our workload and requirements for benchmarking candidate platforms for financial feed processing; (b) the current state of the tool wrench.

[129] 2010.15535

Unbabel's Participation in the WMT20 Metrics Shared Task

We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate on the segment-level, document-level and system-level tracks on all language pairs, as well as the 'QE as a Metric' track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: We train several estimator models to regress on different human-generated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.

[130] 2010.15545

Systematic literature review protocol Identification and classification of feature modeling errors

Context: The importance of feature modeling languages for software product lines and the planning stage for a systematic literature review. Objective: A protocol for carrying out a systematic literature review of the evidence on identifying and classifying errors in feature modeling languages. Method: The definition of a protocol to conduct a systematic literature review according to the guidelines of B. Kitchenham. Results: A validated protocol to conduct a systematic literature review. Conclusions: A protocol for a systematic literature review on the identification and classification of errors in feature modeling was defined. Initial results indicate that further study of the effects of these errors, and of methods for resolving them, is needed.

[131] 2010.15549

Multi-Constitutive Neural Network for Large Deformation Poromechanics Problem

In this paper, we study the problem of large-strain consolidation in poromechanics with deep neural networks. Given different material properties and different loading conditions, the goal is to predict pore pressure and settlement. We propose a novel method, the "multi-constitutive neural network" (MCNN), in which one model can solve several different constitutive laws. We introduce a one-hot encoding vector as an additional input, used to label the constitutive law we wish to solve. We then build a DNN which takes (X, t) along with a constitutive-model label as input and outputs the corresponding solution. To our knowledge, this is the first time multiple constitutive laws can be evaluated through a single training process while still obtaining good accuracy. We find that an MCNN trained to solve multiple PDEs outperforms individual neural network solvers trained on a single PDE.
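
The one-hot conditioning described above is simple to sketch (the function and variable names below are illustrative, not the authors'):

```python
def mcnn_input(x, t, law_index, num_laws):
    """Build the conditioned input: coordinates (x, t) concatenated with
    a one-hot vector labeling which constitutive law to solve."""
    one_hot = [1.0 if i == law_index else 0.0 for i in range(num_laws)]
    return [x, t] + one_hot
```

A single network then maps this augmented input to the pore pressure and settlement for the selected constitutive law.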

[132] 2010.15550

ADABOOK & MULTIBOOK: Adaptive Boosting with Chance Correction

There has been considerable interest in boosting and bagging, including the combination of the adaptive techniques of AdaBoost with the random selection with replacement techniques of Bagging. At the same time there has been a revisiting of the way we evaluate, with chance-corrected measures like Kappa, Informedness, Correlation or ROC AUC being advocated. This leads to the question of whether learning algorithms can do better by optimizing an appropriate chance-corrected measure. Indeed, it is possible for a weak learner to optimize Accuracy to the detriment of the more realistic chance-corrected measures, and when this happens the booster can give up too early. This phenomenon is known to occur with conventional Accuracy-based AdaBoost, and the MultiBoost algorithm has been developed to overcome such problems using restart techniques based on bagging. This paper thus complements the theoretical work showing the necessity of using chance-corrected measures for evaluation, with empirical work showing how use of a chance-corrected measure can improve boosting. We show that the early surrender problem occurs in MultiBoost too, in multiclass situations, so that chance-corrected AdaBook and Multibook can beat standard Multiboost or AdaBoost, and we further identify which chance-corrected measures to use when.
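
For reference, the two chance-corrected measures most relevant here are straightforward to compute from a binary confusion matrix. The sketch below uses the standard definitions (Cohen's Kappa; Informedness as TPR + TNR - 1), not code from the paper:

```python
def kappa_and_informedness(tp, fp, fn, tn):
    """Cohen's Kappa and Informedness (Youden's J) from a binary
    confusion matrix."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement (accuracy)
    # expected agreement under chance, from the prediction/truth marginals
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (po - pe) / (1 - pe)
    informedness = tp / (tp + fn) + tn / (tn + fp) - 1  # TPR + TNR - 1
    return kappa, informedness
```

Both measures are 0 for chance-level prediction and 1 for perfect prediction, which is exactly why a weak learner can look good on Accuracy while scoring 0 on them.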

[133] 2010.15552

Successive Halving Top-k Operator

We propose a differentiable successive halving method of relaxing the top-k operator, rendering gradient-based optimization possible. The need to perform softmax iteratively on the entire vector of scores is avoided by using a tournament-style selection. As a result, a much better approximation of top-k with lower computational cost is achieved compared to the previous approach.
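
The tournament idea can be sketched for the top-1 case: instead of one softmax over all scores, elements meet in pairwise softmax "matches" and soft selection weights are propagated up the bracket. This is a simplified illustration of the principle, not the paper's top-k operator:

```python
import math

def soft_tournament_argmax(scores, temp=0.1):
    """Soft argmax via a tournament of pairwise two-way softmaxes.
    Returns selection weights over the input positions; as temp -> 0
    the weights approach a hard one-hot at the maximum."""
    players = [([1.0 if j == i else 0.0 for j in range(len(scores))], s)
               for i, s in enumerate(scores)]
    while len(players) > 1:
        merged = []
        for a in range(0, len(players) - 1, 2):
            (pa, sa), (pb, sb) = players[a], players[a + 1]
            w = 1.0 / (1.0 + math.exp(-(sa - sb) / temp))  # 2-way softmax
            merged.append(([w * x + (1.0 - w) * y for x, y in zip(pa, pb)],
                           w * sa + (1.0 - w) * sb))
        if len(players) % 2:
            merged.append(players[-1])  # bye for the odd player out
        players = merged
    return players[0][0]
```

Every merge is a convex combination, so the weights always sum to one and remain differentiable in the scores, while each round only ever applies a softmax over two entries rather than the whole vector.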

[134] 2010.15556

Modulation Pattern Detection Using Complex Convolutions in Deep Learning

Transceivers used for telecommunications transmit and receive specific modulation patterns that are represented as sequences of complex numbers. Classifying modulation patterns is challenging because noise and channel impairments affect the signals in complicated ways such that the received signal bears little resemblance to the transmitted signal. Although deep learning approaches have shown great promise over statistical methods in this problem space, deep learning frameworks continue to lag in support for complex-valued data. To address this gap, we study the implementation and use of complex convolutions in a series of convolutional neural network architectures. Replacing the data structure and convolution operations by their complex generalizations in an architecture improves performance, with statistical significance, at recognizing modulation patterns in complex-valued signals with high SNR after being trained on low SNR signals. This suggests that complex-valued convolutions enable networks to learn more meaningful representations. We investigate this hypothesis by comparing the features learned in each experiment, visualizing the inputs that result in one-hot modulation pattern classification for each network.
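
The core trick is that one complex convolution decomposes into four real ones via $(a+ib)\ast(c+id) = (a\ast c - b\ast d) + i\,(a\ast d + b\ast c)$. A minimal 1D sketch of that decomposition (a generic illustration, not the paper's architecture):

```python
def conv1d(x, w):
    """Real 'valid' 1D convolution (sliding dot product, as used in
    deep-learning frameworks)."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution built from four real convolutions:
    (a+ib)*(c+id) = (ac - bd) + i(ad + bc)."""
    out_re = [p - q for p, q in zip(conv1d(x_re, w_re), conv1d(x_im, w_im))]
    out_im = [p + q for p, q in zip(conv1d(x_re, w_im), conv1d(x_im, w_re))]
    return out_re, out_im
```

In a framework without native complex tensors, the real and imaginary parts are simply carried as two real channels and combined this way in every convolutional layer.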

[135] 2010.15559

Quantum Computing: A Taxonomy, Systematic Review and Future Directions

Quantum computing is an emerging paradigm with the potential to offer significant computational advantage over conventional classical computing by exploiting quantum-mechanical principles such as entanglement and superposition. It is anticipated that this computational advantage of quantum computing will help to solve many complex and computationally intractable problems in several areas of research such as drug design, data science, clean energy, finance, industrial chemical development, secure communications, and quantum chemistry, among others. In recent years, tremendous progress in both quantum hardware development and quantum software/algorithms has brought quantum computing much closer to reality. As quantum devices are expected to steadily scale up in the next few years, quantum decoherence and qubit interconnectivity are two of the major challenges to achieving quantum advantage in the NISQ era. Quantum computing is a highly topical and fast-moving field of research with significant ongoing progress in all facets. A systematic review of the existing literature on quantum computing will be invaluable to understand the current status of this emerging field and identify open challenges for the quantum computing community in the coming years. This review article presents a comprehensive review of the quantum computing literature and a taxonomy of quantum computing. Further, the proposed taxonomy is used to map various related studies to identify research gaps. A detailed overview of quantum software tools and technologies, post-quantum cryptography, and quantum computer hardware development is provided to document the current state-of-the-art in the respective areas. We finish the article by highlighting open challenges and promising future directions for research.

[136] 2010.15561

Federated Transfer Learning: concept and applications

Development of Artificial Intelligence (AI) is inherently tied to the development of data. However, in most industries data exists in the form of isolated islands, with limited scope for sharing between different organizations. This is a hindrance to the further development of AI. Federated learning has emerged in the last few years as a possible solution to this problem that does not compromise user privacy. Among the different variants of federated learning, noteworthy is federated transfer learning (FTL), which allows knowledge to be transferred across domains that do not have many overlapping features and users. In this work we provide a comprehensive survey of the existing works on this topic. In more detail, we study the background of FTL and its different existing applications. We further analyze FTL from the privacy and machine learning perspectives.

[137] 2010.15562

Limitations of the recall capabilities in delay based reservoir computing systems

We analyze the memory capacity of a delay-based reservoir computer with a Hopf normal form as the nonlinearity and numerically compute the linear as well as the higher-order recall capabilities. A possible physical realisation could be a laser with an external cavity, for which the information is fed in via electrical injection. A task-independent quantification of the computational capability of the reservoir system is performed via a complete orthonormal set of basis functions. Our results suggest that even for a constant readout dimension, the total memory capacity depends on the ratio between the information input period, also called the clock cycle, and the time delay in the system. Optimal performance is found for a time delay of about 1.6 times the clock cycle.

[138] 2010.15571

Overcoming The Limitations of Neural Networks in Composite-Pattern Learning with Architopes

The effectiveness of neural networks in solving complex problems is well recognized; however, little is known about their limitations. We demonstrate that the feed-forward architecture, for most commonly used activation functions, is incapable of approximating functions composed of multiple sub-patterns while simultaneously respecting their composite-pattern structure. We overcome this bottleneck with a simple architecture modification that reallocates the neurons of any single feed-forward network across several smaller sub-networks, each specialized on a distinct part of the input space. The modified architecture, called an Architope, is more expressive on two fronts. First, it is dense in an associated space of piecewise continuous functions in which the feed-forward architecture is not dense. Second, it achieves the same approximation rate as feed-forward networks while requiring $\mathscr{O}(N^{-1})$ fewer parameters in its hidden layers. Moreover, the architecture achieves these approximation improvements while preserving the target's composite-pattern structure.

[139] 2010.15572

Experimental Analysis of Communication Relaying Delay in Low-Energy Ad-hoc Networks

In recent years, more and more applications use ad-hoc networks for local M2M communications, but in some cases, such as when using WSNs, the software processing delay induced by packet relaying may not be negligible. In this paper, we planned and carried out a delay measurement experiment using the Raspberry Pi Zero W. The results demonstrate that, in low-energy ad-hoc networks, the processing delay of the application is always too large to ignore; it is at least ten times greater than that of kernel routing and corresponds to 30% of the transmission delay. Furthermore, if the task is CPU-intensive, such as packet encryption, the processing delay can be greater than the transmission delay, and its behavior is represented by a simple linear model. Our findings indicate that the key factor for achieving QoS in ad-hoc networks is an appropriate node-to-node load balancing that takes into account the CPU performance and the amount of traffic passing through each node.

[140] 2010.15577

Import test questions into Moodle LMS

The purpose of the study is to highlight the theoretical and methodological aspects of preparing test questions of the most common types as text files for import into the Moodle learning management system (LMS). The subject of the research is the automated filling of the Moodle LMS test database. The objectives of the study are: to analyze the import file formats for test questions, with their advantages and disadvantages; and to develop guidelines for preparing test questions of common types as text files for import into Moodle LMS. The article offers action algorithms for importing questions and instructions for preparing question files in formats such as Aiken, GIFT, and Moodle XML, covering "True/False" questions, "Multiple Choice" (one of many and many of many), "Matching", open-answer "Numerical" or "Short answer" questions, and "Essay" questions. The question formats, examples of their design, and the developed questions are demonstrated in view mode in Moodle LMS.
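As an illustration of one of these import formats, several of the question types listed above might look as follows in GIFT syntax (a hypothetical example file written from the standard GIFT conventions, not taken from the article):

```
// Multiple Choice (one of many)
::MC::Which question type allows exactly one correct answer? {
  =Multiple Choice
  ~Essay
  ~Matching
}

// True/False
::TF::Moodle can import questions from plain text files. {T}

// Matching
::Match::Match each format to its description. {
  =GIFT -> text file
  =Moodle XML -> XML file
  =Aiken -> simple multiple choice
}

// Numerical (answer 42, tolerance 2) and Short answer
::Num::What is 6 x 7? {#42:2}
::Short::Name the LMS discussed in this article. {=Moodle}

// Essay (empty braces; graded manually)
::Essay::Discuss the advantages of automated question import. {}
```

Such a file can be uploaded through Moodle's question bank import page with the GIFT format selected.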

[141] 2010.15578

Exploring the Nuances of Designing (with/for) Artificial Intelligence

Solutions relying on artificial intelligence are devised to predict data patterns and answer questions that are clearly defined, involve an enumerable set of solutions, clear rules, and inherently binary decision mechanisms. Yet, as they are increasingly deployed in our daily activities, they begin to transcend these initial boundaries and to affect the larger sociotechnical system in which they are situated. In this arrangement, a solution is under pressure to surpass true-or-false criteria and move to an ethical evaluation of right and wrong. Neither algorithmic solutions nor purely humanistic ones will be enough to fully mitigate undesirable outcomes in the narrow state of AI or its future incarnations. We must take a holistic view. In this paper we explore the construct of infrastructure as a means to simultaneously address algorithmic and societal issues when designing AI.

[142] 2010.15579

Modeling biomedical breathing signals with convolutional deep probabilistic autoencoders

One of the main problems with biomedical signals is the limited amount of patient-specific data and the significant amount of time needed to record a sufficient number of samples for diagnostic and treatment purposes. We explore the use of Variational Autoencoder (VAE) and Adversarial Autoencoder (AAE) algorithms based on one-dimensional convolutional neural networks to build generative models able to capture and represent the variability of a set of unlabeled quasi-periodic signals using as few as 10 parameters. Furthermore, we introduce a modified AAE architecture that allows simultaneous semi-supervised classification and generation of different types of signals. Our study is based on physical breathing signals, i.e. time series describing the position of chest markers, generally used to describe respiratory motion. The time series are discretized into a vector of periods, with each period containing 6 time and position values. These vectors can be transformed back into time series through an additional reconstruction neural network; this simplifies the modeling task and makes it possible to generate extended signals. The obtained models can be used to generate realistic breathing realizations from patient or population data and to classify new recordings. We show that by incorporating the labels from around 10-15% of the dataset during training, the model can be guided to group data according to the patient it belongs to, or based on the presence of different types of breathing irregularities such as baseline shifts. Our specific motivation is to model breathing motion during radiotherapy lung cancer treatments, for which the developed model serves as an efficient tool to robustify plans against breathing uncertainties. However, the same methodology can in principle be applied to any other kind of quasi-periodic biomedical signal, making it a generically applicable tool.

[143] 2010.15581

The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research

Increasingly, modern Artificial Intelligence (AI) research has become more computationally intensive. However, a growing concern is that, due to unequal access to computing power, only certain firms and elite universities have advantages in modern AI research. Using a novel dataset of 171394 papers from 57 prestigious computer science conferences, we document that firms, in particular large technology firms, and elite universities have increased participation in major AI conferences since deep learning's unanticipated rise in 2012. The effect is concentrated among elite universities, those ranked 1-50 in the QS World University Rankings. Further, we find two strategies through which firms increased their presence in AI research: first, they have increased firm-only publications; and second, firms are collaborating primarily with elite universities. Consequently, this increased presence of firms and elite universities in AI research has crowded out mid-tier (QS ranked 201-300) and lower-tier (QS ranked 301-500) universities. To provide causal evidence that deep learning's unanticipated rise resulted in this divergence, we leverage the generalized synthetic control method, a data-driven counterfactual estimator. Using machine-learning-based text analysis methods, we provide additional evidence that the divergence between these two groups - large firms and non-elite universities - is driven by access to computing power, or compute, which we term the "compute divide". This compute divide between large firms and non-elite universities increases concerns around bias and fairness within AI technology, and presents an obstacle towards "democratizing" AI. These results suggest that a lack of access to specialized equipment such as compute can de-democratize knowledge production.

[144] 2010.15582

Improving Accuracy of Federated Learning in Non-IID Settings

Federated Learning (FL) is a decentralized machine learning protocol that allows a set of participating agents to collaboratively train a model without sharing their data. This makes FL particularly suitable for settings where data privacy is desired. However, it has been observed that the performance of FL is closely tied to the local data distributions of the agents. Particularly, in settings where local data distributions vastly differ among agents, FL performs rather poorly compared to centralized training. To address this problem, we hypothesize the reasons behind the performance degradation and develop techniques to address them. In this work, we identify four simple techniques that can improve the performance of trained models without incurring any additional communication overhead in FL, at the cost of some light computation overhead on either the client side or the server side. In our experimental analysis, the combination of our techniques improved the validation accuracy of a model trained via FL by more than 12% with respect to our baseline, which is about 5% less than the accuracy of the same model trained on centralized data.
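For context, the baseline protocol such work builds on is Federated Averaging: each client runs a few local gradient steps from the current global model, and the server averages the resulting models weighted by client data size. A minimal sketch with linear-regression clients (our illustration; the paper's four accuracy-improving techniques are not shown):

```python
import numpy as np

def federated_averaging(global_w, client_data, local_steps=5, lr=0.1):
    """One FedAvg round (McMahan et al.): clients run local SGD from the
    global model; the server averages results weighted by data size.
    Linear-regression clients are used purely for illustration."""
    new_ws, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):            # local gradient steps on this client
            w -= lr * X.T @ (X @ w - y) / len(y)
        new_ws.append(w)
        sizes.append(len(y))
    # size-weighted server-side aggregation
    return np.average(new_ws, axis=0, weights=np.asarray(sizes, float))

# demo: three clients whose data share one underlying linear model
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(100):  # 100 communication rounds
    w = federated_averaging(w, clients)
```

After enough rounds, `w` converges to the shared model; non-IID settings, where client data distributions differ, are exactly where this plain averaging degrades.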

[145] 2010.15583

Probabilistic Transformers

We show that Transformers are Maximum Posterior Probability estimators for Mixtures of Gaussian Models. This brings a probabilistic point of view to Transformers and suggests extensions to other probabilistic cases.
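The claimed correspondence can be checked numerically in a toy setting: for equal-weight Gaussian components with unit-norm means and a shared isotropic covariance, the posterior responsibilities reduce to a softmax over dot products, i.e. standard dot-product attention weights. The following sketch is our illustration, not the paper's code:

```python
import numpy as np

def gmm_responsibilities(q, K, sigma2=1.0):
    # posterior p(component j | q) for equal-weight Gaussians with
    # means K[j] and shared isotropic covariance sigma2 * I
    logits = -np.sum((q - K) ** 2, axis=1) / (2.0 * sigma2)
    w = np.exp(logits - logits.max())
    return w / w.sum()

def attention_weights(q, K, tau=1.0):
    # standard softmax dot-product attention of query q over keys K
    logits = K @ q / tau
    w = np.exp(logits - logits.max())
    return w / w.sum()

rng = np.random.default_rng(0)
K = rng.normal(size=(5, 8))
K /= np.linalg.norm(K, axis=1, keepdims=True)  # equal-norm "keys"/means
q = rng.normal(size=8)

# With equal-norm means, -||q - k||^2 / (2s) = q.k / s + const,
# so both softmaxes coincide (here s = tau = 1).
print(np.allclose(gmm_responsibilities(q, K), attention_weights(q, K)))  # prints True
```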

[146] 2010.15584

Future Directions of the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Program

The CSSI 2019 workshop was held on October 28-29, 2019, in Austin, Texas. The main objectives of this workshop were to (1) understand the impact of the CSSI program on the community over the last 9 years, (2) engage workshop participants in identifying gaps and opportunities in the current CSSI landscape, (3) gather ideas on the cyberinfrastructure needs and expectations of the community with respect to the CSSI program, and (4) prepare a report summarizing the feedback gathered from the community that can inform future solicitations of the CSSI program. The workshop brought together different stakeholders interested in provisioning sustainable cyberinfrastructure that can power discoveries impacting the various fields of science and technology and maintain the nation's competitiveness in areas such as scientific software, HPC, networking, cybersecurity, and data/information science. The workshop served as a venue for gathering community feedback on the current state of the CSSI program and its future directions.

[147] 2010.15585

Panel: Economic Policy and Governance during Pandemics using AI

The global food supply chain (starting at farms and ending with consumers) has been seriously disrupted by many outlier events such as trade wars, the China demand shock, natural disasters, and pandemics. Outlier events create uncertainty along the entire supply chain, in addition to prompting policy responses intended to mitigate their adverse effects. Artificial Intelligence (AI) methods (i.e. machine/reinforcement/deep learning) provide an opportunity to better understand outcomes during outlier events by identifying regular, irregular, and contextual components. Employing AI can provide guidance to decision makers along the supply chain (suppliers, farmers, processors, wholesalers, and retailers) and to policy makers, facilitating welfare-improving outcomes. This panel discusses these issues.

[148] 2010.15588

Impact of (SARS-CoV-2) COVID 19 on the indigenous language-speaking population in Mexico

The importance of this working document is that it allows analysis of the information and status of cases associated with (SARS-CoV-2) COVID-19 as open data at the municipal, state, and national levels, with a daily record of patients according to age, sex, and comorbidities, and with COVID-19 status recorded as: a) Positive, b) Negative, c) Suspected. It also presents information identifying outpatient and/or hospitalized patients and tracking their medical course as: a) Recovered, b) Deceased, or c) Active, during Phase 3 and Phase 4, in the five main indigenous-language-speaking population areas of the State of Veracruz, Mexico. The data analysis is carried out by applying a data-mining algorithm that provides the fast and timely information required to estimate COVID-19 medical care scenarios, as well as to assess the impact on the indigenous-language-speaking population in Mexico.

[149] 2010.15590

Enjeux éthiques de l'IA en santé : une humanisation du parcours de soin par l'intelligence artificielle ?

Considering the use of artificial intelligence for greater personalization of patient care and better management of human and material resources may seem like an opportunity not to be missed. In order to offer a better humanization of the care pathway, artificial intelligence is a tool that decision-makers in the hospital sector must appropriate while attending to the new ethical issues and conflicts of values that this technology generates.

[150] 2010.15594

Shared Space Transfer Learning for analyzing multi-site fMRI data

Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data, for distinguishing when subjects are performing different cognitive tasks -- e.g., watching movies or making decisions. MVPA works best with a well-designed feature set and an adequate sample size. However, most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes. Further, training a robust, generalized predictive model that can analyze homogeneous cognitive tasks provided by multi-site fMRI datasets has additional challenges. This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning (TL) approach that can functionally align homogeneous multi-site fMRI datasets, and so improve the prediction performance in every site. SSTL first extracts a set of common features for all subjects in each site. It then uses TL to map these site-specific features to a site-independent shared space in order to improve the performance of the MVPA. SSTL uses a scalable optimization procedure that works effectively for high-dimensional fMRI datasets. The optimization procedure extracts the common features for each site by using a single-iteration algorithm and maps these site-specific common features to the site-independent shared space. We evaluate the effectiveness of the proposed method for transferring between various cognitive tasks. Our comprehensive experiments validate that SSTL achieves superior performance to other state-of-the-art analysis techniques.

[151] 2010.15596

Verification of Patterns

Software patterns provide building blocks for the design and implementation of software systems and help software engineering progress from experience toward science. Software patterns became famous with the introduction of design patterns, after which patterns were researched and developed widely and rapidly. The series of books on pattern-oriented software architecture marks a milestone in this development. As noted in those books, a formalization of patterns and an intermediate pattern language are needed and should be developed. In this book, we therefore formalize software patterns according to the categories of the pattern-oriented software architecture series and verify the correctness of patterns based on truly concurrent process algebra. On the one hand, patterns are formalized and verified; on the other hand, truly concurrent process algebra can serve as an intermediate pattern language thanks to its rigorous theory.

[152] 2010.15597

Enhancing reinforcement learning by a finite reward response filter with a case study in intelligent structural control

In many reinforcement learning (RL) problems, it takes some time until an action taken by the agent reaches its maximum effect on the environment, so the agent receives the reward corresponding to that action after a delay called the action-effect delay. Such delays reduce the performance of the learning algorithm and increase computational cost, as the RL agent values immediate rewards more than the future rewards that are more related to the taken action. This paper addresses this issue by introducing an applicable enhanced Q-learning method in which, at the beginning of the learning phase, the agent takes a single action and builds a function that reflects the environment's response to that action, called the reflexive $\gamma$-function. During the training phase, the agent uses this reflexive $\gamma$-function to update the Q-values. We apply the developed method to a structural control problem in which the goal of the agent is to reduce the vibrations of a building subjected to earthquake excitations with a specified delay. Seismic control problems are considered a complex task in structural engineering because of the stochastic and unpredictable nature of earthquakes and the complex behavior of the structure. Three scenarios are presented to study the effects of zero, medium, and long action-effect delays, and the performance of the enhanced method is compared to standard Q-learning. Both RL methods use neural networks to estimate the state-action value function used to control the structure. The results show that the enhanced method significantly outperforms the original method in all cases and also improves the stability of the algorithm in dealing with action-effect delays.

[153] 2010.15598

May I Ask Who's Calling? Named Entity Recognition on Call Center Transcripts for Privacy Law Compliance

We investigate using Named Entity Recognition on a new type of user-generated text: a call center conversation. These conversations combine problems from spontaneous speech with problems novel to conversational Automated Speech Recognition, including incorrect recognition, alongside other common problems from noisy user-generated text. Using our own corpus with new annotations, training custom contextual string embeddings, and applying a BiLSTM-CRF, we match state-of-the-art results on our novel task.

[154] 2010.15599

Expert Selection in High-Dimensional Markov Decision Processes

In this work we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best performing expert using a variant of the classical upper confidence bound algorithm, thus ensuring low regret in the overall performance of the system. This is useful in applications where several expert policies may be available, and one needs to be selected at run-time for the underlying environment.
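The selection mechanism described, a variant of the classical upper confidence bound algorithm over a finite set of expert policies, can be sketched as follows. The expert policies and the reward model here are hypothetical stand-ins, not the paper's implementation:

```python
import math
import random

def select_expert_ucb(n_experts, get_reward, rounds, c=2.0):
    """Classical UCB1 over candidate experts: play each once, then repeatedly
    pick the expert maximizing empirical mean + exploration bonus.
    get_reward(i) runs expert i for one episode and returns a reward in [0, 1]."""
    counts = [0] * n_experts
    means = [0.0] * n_experts
    for t in range(1, rounds + 1):
        if t <= n_experts:                       # initialization: try each expert once
            i = t - 1
        else:                                    # maximize the UCB index
            i = max(range(n_experts),
                    key=lambda j: means[j] + math.sqrt(c * math.log(t) / counts[j]))
        r = get_reward(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # incremental mean update
    return max(range(n_experts), key=lambda j: means[j])

# demo: three hypothetical experts with Bernoulli episode rewards; expert 1 is best
random.seed(0)
success = [0.2, 0.8, 0.5]
best = select_expert_ucb(3, lambda i: float(random.random() < success[i]), 2000)
```

Because the bonus shrinks as an expert is played more often, regret grows only logarithmically in the number of episodes, which is what makes run-time selection among several candidate policies cheap.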

[155] 2010.15600

Three computational models and its equivalence

The study of computability has its origin in Hilbert's address of 1900, where a question adjacent to the ones he posed is to give a precise description of the notion of algorithm. In the search for a good definition, three independent theories arose: Turing and the Turing machines, G\"odel and the recursive functions, and Church and the Lambda Calculus. Kleene later established that these classic models of computation are equivalent. This fact is widely accepted in many textbooks, where the proof is omitted as tedious and unreadable. We intend to fill this gap by presenting the proof in a modern way, without omitting the mathematical details.

[156] 2010.15601

Using a Binary Classification Model to Predict the Likelihood of Enrolment to the Undergraduate Program of a Philippine University

With the recent implementation of the K to 12 Program, academic institutions, specifically colleges and universities in the Philippines, have been faced with difficulties in determining projected freshmen enrollees vis-a-vis decision-making factors for efficient resource management. Enrollment targets directly impact the success factors of Higher Education Institutions. This study covers an analysis of various characteristics of freshmen applicants affecting their admission status in a Philippine university. A predictive model was developed using Logistic Regression to evaluate the probability that an admitted student will enroll in the institution or not. The dataset used was acquired from the University Admissions Office, which designed an online application form to capture applicants' details. The online form was distributed to all student applicants, and students often provided incomplete information. Despite this, student characteristics, as well as geographic and demographic data based on the students' location, are significant predictors of the enrollment decision. The results of the study show that, given limited information about prospective students, Higher Education Institutions can implement machine learning techniques to supplement management decisions and provide estimates of class sizes, allowing the institution to optimize the allocation of resources and gain better control over net tuition revenue.
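The model type used here can be illustrated with a small numpy-only sketch on synthetic applicant data. The features, coefficients, and enrolment rule below are invented for illustration and are not taken from the university's dataset:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 800
# hypothetical applicant features (not the paper's actual fields):
# scaled distance from campus, standardized entrance score, scholarship flag
X = np.column_stack([
    rng.uniform(0.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
    rng.integers(0, 2, n).astype(float),
])
true_w = np.array([-2.0, 1.0, 1.5])        # assumed ground-truth effects
p = 1.0 / (1.0 + np.exp(-(X @ true_w)))
y = (rng.random(n) < p).astype(float)      # 1 = admitted student enrols

# fit logistic regression by batch gradient descent on the log-loss
w = np.zeros(3)
for _ in range(3000):
    pred = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (pred - y) / n

enrol_prob = 1.0 / (1.0 + np.exp(-(X @ w)))  # likelihood each admit enrols
```

Summing the per-applicant probabilities gives an expected class size, which is the kind of estimate the abstract says institutions can use for resource planning.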

[157] 2010.15602

Designing learning experiences for online teaching and learning

Teaching is about constantly innovating strategies, ways, and means to engage diverse students in active and meaningful learning. In line with this, SUTD adopts various student-centric teaching and learning methods and approaches. This means that our graduate/undergraduate instructors have to be ready to teach using these student-centric pedagogies. In this article, I share my experience of redesigning this teaching course, typically conducted face-to-face, as a synchronous online course, and also invite one of the participants in the course to reflect on his experience as a student.

[158] 2010.15603

Suppressing Mislabeled Data via Grouping and Self-Attention

Deep networks achieve excellent results on large-scale clean data but degrade significantly when learning from noisy labels. To suppress the impact of mislabeled data, this paper proposes a conceptually simple yet efficient training block, termed Attentive Feature Mixup (AFM), which pays more attention to clean samples and less to mislabeled ones via sample interactions in small groups. Specifically, this plug-and-play AFM first leverages a \textit{group-to-attend} module to construct groups and assign attention weights to group-wise samples, and then uses a \textit{mixup} module with the attention weights to interpolate massive noise-suppressed samples. AFM has several appealing benefits for noise-robust deep learning. (i) It does not rely on any assumptions or an extra clean subset. (ii) With massive interpolations, the ratio of useless samples is reduced dramatically compared to the original noise ratio. (iii) It jointly optimizes the interpolation weights with the classifiers, suppressing the influence of mislabeled data via low attention weights. (iv) It partially inherits the vicinal risk minimization of mixup to alleviate over-fitting, while improving on it by sampling fewer feature-target vectors around mislabeled data from the mixup vicinal distribution. Extensive experiments demonstrate that AFM yields state-of-the-art results on two challenging real-world noisy datasets: Food101N and Clothing1M. The code will be available at
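The mixup operation that AFM builds on interpolates sample pairs and their labels. A minimal version of that underlying mechanism (vanilla mixup, without AFM's learned group attention weights) looks like:

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Vanilla mixup: convex combinations of shuffled sample pairs and their
    one-hot labels. AFM replaces the Beta-sampled coefficient below with
    learned per-group attention weights (not shown here)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    idx = rng.permutation(len(x))         # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[idx]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[idx]
    return x_mix, y_mix

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 4))
y = np.eye(3)[rng.integers(0, 3, 8)]      # one-hot labels for 3 classes
x_mix, y_mix = mixup_batch(x, y, rng=rng)
```

Training on such interpolated pairs dilutes the contribution of any single mislabeled sample, which is the intuition AFM sharpens by learning which group members deserve low weight.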

[159] 2010.15604

Autoregressive Asymmetric Linear Gaussian Hidden Markov Models

In a real-life process evolving over time, the relationships between its relevant variables may change. Therefore, it is advantageous to have different inference models for each state of the process. Asymmetric hidden Markov models fulfil this dynamical requirement and provide a framework where the trend of the process can be expressed as a latent variable. In this paper, we modify these recent asymmetric hidden Markov models to have an asymmetric autoregressive component, allowing the model to choose the order of autoregression that maximizes its penalized likelihood for a given training set. Additionally, we show how inference, hidden-state decoding and parameter learning must be adapted to fit the proposed model. Finally, we run experiments with synthetic and real data to show the capabilities of this new model.

[160] 2010.15605

Manifold learning-based feature extraction for structural defect reconstruction

Data-driven quantitative defect reconstruction using ultrasonic guided waves has recently demonstrated great potential in non-destructive testing. In this paper, we develop an efficient deep-learning-based defect reconstruction framework, called NetInv, which recasts the inverse guided wave scattering problem as a data-driven supervised learning process that realizes a mapping between reflection coefficients in the wavenumber domain and defect profiles in the spatial domain. The superiority of the proposed NetInv over conventional reconstruction methods is demonstrated by several examples. Results show that NetInv achieves higher-quality defect profiles with remarkable efficiency and provides valuable insight into the development of effective data-driven structural health monitoring and defect reconstruction using machine learning.

[161] 2010.15606

Design and Evaluation of Electric Bus Systems for Metropolitan Cities

Over the past decade, most of the metropolitan cities across the world have been witnessing a degrading trend in air quality index. Exhaust emission data observations show that promotion of public transport could be a potential way out of this gridlock. Due to environmental concerns, numerous public transport authorities harbor a great interest in introducing zero-emission electric buses. A shift from conventional diesel buses to electric buses comes with several benefits in terms of reduction in local pollution, noise, and fuel consumption. This paper proposes the relevant vehicle technologies, powertrain, and charging systems which, in combination, provide a comprehensive methodology to design an electric bus that can be deployed in metropolitan cities to mitigate emission concerns.

[162] 2010.15607

CRICTRS: Embeddings based Statistical and Semi Supervised Cricket Team Recommendation System

Team recommendation has always been a challenging aspect of team sports. Such systems aim to recommend the player combination best suited against the opposition, resulting in an optimal outcome. In this paper, we propose a semi-supervised statistical approach to building a team recommendation system for cricket by modelling players as embeddings. To build these embeddings, we design a qualitative and quantitative rating system that also considers the strength of the opposition when evaluating player performance. The embeddings obtained describe the strengths and weaknesses of the players based on their past performances. We also address a critical aspect of team composition: the number of batsmen and bowlers in the team. The team composition changes over time depending on factors that are hard to predict, so we take this as input from the user and use the player embeddings to decide the best possible team combination for the given composition.

[163] 2010.15614

An Overview Of 3D Object Detection

Point cloud 3D object detection has recently received major attention and has become an active research topic in the 3D computer vision community. However, recognizing 3D objects in LiDAR (Light Detection and Ranging) data is still a challenge due to the complexity of point clouds. Objects such as pedestrians, cyclists, or traffic cones are usually represented by quite sparse points, which makes detection using only the point cloud quite complex. In this project, we propose a framework that uses both RGB and point cloud data to perform multiclass object recognition. We use existing 2D detection models to localize the region of interest (ROI) on the RGB image, followed by a pixel mapping strategy in the point cloud, and finally lift the initial 2D bounding box to 3D space. We use the recently released nuScenes dataset (a large-scale dataset that contains many data formats) to train and evaluate our proposed architecture.

[164] 2010.15620

CAFE: Coarse-to-Fine Neural Symbolic Reasoning for Explainable Recommendation

Recent research explores incorporating knowledge graphs (KG) into e-commerce recommender systems, not only to achieve better recommendation performance, but more importantly to generate explanations of why particular decisions are made. This can be achieved by explicit KG reasoning, where a model starts from a user node, sequentially determines the next step, and walks towards an item node of potential interest to the user. However, this is challenging due to the huge search space, unknown destination, and sparse signals over the KG, so informative and effective guidance is needed to achieve a satisfactory recommendation quality. To this end, we propose a CoArse-to-FinE neural symbolic reasoning approach (CAFE). It first generates user profiles as coarse sketches of user behaviors, which subsequently guide a path-finding process to derive reasoning paths for recommendations as fine-grained predictions. User profiles can capture prominent user behaviors from the history, and provide valuable signals about which kinds of path patterns are more likely to lead to potential items of interest for the user. To better exploit the user profiles, an improved path-finding algorithm called Profile-guided Path Reasoning (PPR) is also developed, which leverages an inventory of neural symbolic reasoning modules to effectively and efficiently find a batch of paths over a large-scale KG. We extensively experiment on four real-world benchmarks and observe substantial gains in the recommendation performance compared with state-of-the-art methods.

[165] 2010.15638

Abstract Value Iteration for Hierarchical Reinforcement Learning

We propose a novel hierarchical reinforcement learning framework for control with continuous state and action spaces. In our framework, the user specifies subgoal regions which are subsets of states; then, we (i) learn options that serve as transitions between these subgoal regions, and (ii) construct a high-level plan in the resulting abstract decision process (ADP). A key challenge is that the ADP may not be Markov, which we address by proposing two algorithms for planning in the ADP. Our first algorithm is conservative, allowing us to prove theoretical guarantees on its performance, which help inform the design of subgoal regions. Our second algorithm is a practical one that interweaves planning at the abstract level and learning at the concrete level. In our experiments, we demonstrate that our approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.

[166] 2010.15643

Free-Form Image Inpainting via Contrastive Attention Network

Most deep learning based image inpainting approaches adopt an autoencoder or its variants to fill missing regions in images. Encoders are usually utilized to learn powerful representational spaces, which are important for dealing with sophisticated learning tasks. Specifically, in image inpainting tasks, masks of any shape can appear anywhere in images (i.e., free-form masks), forming complex patterns. It is difficult for encoders to capture such powerful representations in this complex situation. To tackle this problem, we propose a self-supervised Siamese inference network to improve robustness and generalization. It can encode contextual semantics from full-resolution images and obtain more discriminative representations. We further propose a multi-scale decoder with a novel dual attention fusion (DAF) module, which can combine both the restored and known regions in a smooth way. This multi-scale architecture is beneficial for decoding the discriminative representations learned by encoders into images layer by layer. In this way, unknown regions are filled naturally from outside to inside. Qualitative and quantitative experiments on multiple datasets, including facial and natural datasets (i.e., CelebA-HQ, Paris Street View, Places2 and ImageNet), demonstrate that our proposed method outperforms state-of-the-art methods in generating high-quality inpainting results.

[167] 2010.15651

Reliable Graph Neural Networks via Robust Aggregation

Perturbations targeting the graph structure have proven to be extremely effective in reducing the performance of Graph Neural Networks (GNNs), and traditional defenses such as adversarial training do not seem able to improve robustness. This work is motivated by the observation that adversarially injected edges can effectively be viewed as additional samples in a node's neighborhood aggregation function, which results in distorted aggregations accumulating over the layers. Conventional GNN aggregation functions, such as a sum or mean, can be distorted arbitrarily by a single outlier. We propose a robust aggregation function motivated by the field of robust statistics. Our approach exhibits the largest possible breakdown point of 0.5, which means that the bias of the aggregation is bounded as long as the fraction of adversarial edges of a node is less than 50%. Our novel aggregation function, Soft Medoid, is a fully differentiable generalization of the Medoid and therefore lends itself well to end-to-end deep learning. Equipping a GNN with our aggregation improves robustness with respect to structure perturbations on Cora ML by a factor of 3 (and 5.5 on Citeseer), and by a factor of 8 for low-degree nodes.
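
The weighted-medoid idea behind such a robust aggregation can be sketched in a few lines. The following is a minimal NumPy illustration (not the paper's exact formulation, which also accounts for GNN edge weights): each point receives a softmax weight based on the negative sum of its distances to all other points, so outliers get vanishing weight while the result stays differentiable.

```python
import numpy as np

def soft_medoid(X, T=1.0):
    """Differentiable soft medoid of the rows of X (n x d), temperature T."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    c = D.sum(axis=1)                # total distance of each point to all others
    s = np.exp(-(c - c.min()) / T)   # softmax weights, numerically stabilized
    s /= s.sum()
    return s @ X                     # weighted average dominated by central points

# a single extreme outlier barely moves the soft medoid
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [100.0, 100.0]])
print(soft_medoid(X))  # stays near the three clustered points
```

Unlike the arithmetic mean, which here would be pulled out to roughly (25, 25) by the outlier, the soft medoid stays near the dense cluster, which is what makes this style of aggregation usable inside an end-to-end trained GNN layer.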

[168] 2010.15653

Semi-Supervised Speech Recognition via Graph-based Temporal Classification

Semi-supervised learning has demonstrated promising results in automatic speech recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for unlabeled data. The effectiveness of this approach largely relies on the pseudo-label accuracy, for which typically only the 1-best ASR hypothesis is used. However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model. In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training targets. The newly proposed graph-based temporal classification (GTC) objective is applied for self-training with WFST-based supervision, which is generated from an N-best list of pseudo-labels. In this setup, GTC is used to learn not only a temporal alignment, similarly to CTC, but also a label alignment to obtain the optimal pseudo-label sequence from the weighted graph. Results show that this approach can effectively exploit an N-best list of pseudo-labels with associated scores, outperforming standard pseudo-labeling by a large margin, with ASR results close to an oracle experiment in which the best hypotheses of the N-best lists are selected manually.

[169] 2010.15665

Machine Ethics and Automated Vehicles

Road vehicle travel at a reasonable speed involves some risk, even when using computer-controlled driving with failure-free hardware and perfect sensing. A fully-automated vehicle must continuously decide how to allocate this risk without a human driver's oversight. These are ethical decisions, particularly in instances where an automated vehicle cannot avoid crashing. In this chapter, I introduce the concept of moral behavior for an automated vehicle, argue the need for research in this area through responses to anticipated critiques, and discuss relevant applications from machine ethics and moral modeling research.

[170] 2010.15668

PeopleXploit -- A hybrid tool to collect public data

This paper introduces the concept of Open Source Intelligence (OSINT) as an important application in intelligent profiling of individuals. With a variety of tools available, significant data can be obtained about an individual by analyzing his/her internet presence, but all of this comes at the cost of low relevance. To increase the relevance score in profiling, PeopleXploit is introduced. PeopleXploit is a hybrid tool which helps in collecting publicly available information that is reliable and relevant to the given input. The tool tracks and traces a given target through digital footprints such as name, email, phone number, and user IDs, then scans and searches for other associated data in publicly available records on the internet and creates a summary report on the target. PeopleXploit profiles a person using authorship analysis and finds the best matching guess. Also, the type of analysis performed (professional/matrimonial/criminal entity) varies with the requirements of the user.

[171] 2010.15669

Using Twitter to Analyze Political Polarization During National Crises

Democrats and Republicans have seemed to grow apart over the past three decades. Since the United States as we know it today is undeniably bipartisan, this phenomenon will not surprise most observers. However, there are triggers which can cause spikes in disagreement between Democrats and Republicans beyond the rate at which the two parties have been gradually growing apart over time. This study analyzes the idea that national events which are generally detrimental to all individuals can be one of those triggers. By testing polarization before and after three events (Hurricane Sandy [2012], N. Korea Missile Test Surge [2019], COVID-19 [2020]) using Twitter data, we show that a measurable spike in polarization occurs between the Democratic and Republican parties. To measure polarization, the sentiments of Twitter users aligned with the Democratic and Republican parties are compared on identical entities (events, people, locations, etc.). Using hundreds of thousands of data samples, a 2.8% increase in polarization was measured during times of crisis compared to times when no crisis was occurring. Whatever the reasons that the gap between political parties can widen so much during times of suffering and stress, it is alarming to see that, among other aspects of life, the partisan gap worsens during detrimental national events.

[172] 2010.15670

Detecting Individuals with Depressive Disorder from Personal Google Search and YouTube History Logs

Depressive disorder is one of the most prevalent mental illnesses in the global population. However, traditional screening methods require exacting in-person interviews and may fail to provide immediate interventions. In this work, we leverage ubiquitous personal longitudinal Google Search and YouTube engagement logs to detect individuals with depressive disorder. We collected Google Search and YouTube history data and clinical depression evaluation results from $212$ participants ($99$ of whom suffered from moderate to severe depression). We then propose a personalized framework for classifying individuals with and without depression symptoms based on a mutually-exciting point process that captures both the temporal and semantic aspects of online activities. Our best model achieved an average F1 score of $0.77 \pm 0.04$ and an AUC ROC of $0.81 \pm 0.02$.
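
As a hedged illustration of the modeling idea (not the authors' exact model), the core of a mutually-exciting (Hawkes-style) point process is a conditional intensity that jumps after each past event and decays exponentially, so that bursts of online activity raise the short-term likelihood of further events:

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity mu + sum_i alpha * exp(-beta * (t - t_i)) over past events."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([1.0, 2.0, 2.5])          # hypothetical activity timestamps
print(hawkes_intensity(3.0, events))        # elevated shortly after a burst of events
print(hawkes_intensity(30.0, events))       # decays back toward the base rate mu
```

In a mutually-exciting (multivariate) version, events in one stream (e.g. Search) would also contribute excitation terms to the intensity of another stream (e.g. YouTube), which is what lets such models capture cross-platform temporal dynamics.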

[173] 2010.15671

Computing Crisp Bisimulations for Fuzzy Structures

Fuzzy structures such as fuzzy automata, fuzzy transition systems, weighted social networks and fuzzy interpretations in fuzzy description logics have been widely studied. For such structures, bisimulation is a natural notion for characterizing indiscernibility between states or individuals. There are two kinds of bisimulations for fuzzy structures: crisp bisimulations and fuzzy bisimulations. While the latter fit the fuzzy paradigm, the former have also attracted attention due to the applicability of crisp equivalence relations, for example, in minimizing structures. Bisimulations can be formulated for fuzzy labeled graphs and then adapted to other fuzzy structures. In this article, we present an efficient algorithm for computing the partition corresponding to the largest crisp bisimulation of a given finite fuzzy labeled graph. Its complexity is of order $O((m\log{l} + n)\log{n})$, where $n$, $m$ and $l$ are the number of vertices, the number of nonzero edges and the number of different fuzzy degrees of edges of the input graph, respectively. We also study a similar problem for the setting with counting successors, which corresponds to the case with qualified number restrictions in description logics and graded modalities in modal logics. In particular, we provide an efficient algorithm with the complexity $O((m\log{m} + n)\log{n})$ for the considered problem in that setting.
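
The notion of computing the coarsest crisp bisimulation by partition refinement can be sketched naively: start from the partition induced by vertex labels, then repeatedly split blocks by each vertex's signature of (fuzzy degree, target block) pairs until the partition stabilizes. This fixpoint sketch costs roughly O(n·m) per round, not the $O((m\log{l} + n)\log{n})$ of the paper's algorithm, and the data layout here is an assumption for illustration.

```python
def coarsest_partition(labels, edges):
    """labels: vertex -> label; edges: vertex -> list of (fuzzy_degree, target)."""
    part = {v: labels[v] for v in labels}       # initial partition: group by label
    while True:
        # signature: own label plus the set of (degree, target-block) pairs
        sig = {v: (labels[v],
                   frozenset((d, part[t]) for d, t in edges.get(v, ())))
               for v in labels}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values()), key=repr))}
        new = {v: ids[sig[v]] for v in labels}
        if len(set(new.values())) == len(set(part.values())):
            return new                           # no block was split: stable
        part = new

# two vertices with the same label and the same outgoing fuzzy edge are bisimilar
labels = {1: 'a', 2: 'a', 3: 'b'}
edges = {1: [(0.5, 3)], 2: [(0.5, 3)], 3: []}
p = coarsest_partition(labels, edges)
print(p[1] == p[2], p[1] != p[3])  # True True
```

Since refinement only ever splits blocks, the loop terminates after at most n rounds; the paper's contribution is doing this splitting with a far better asymptotic bound.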

[174] 2010.15673

Machine Learning Based Demand Modelling for On-Demand Transit Services: A Case Study of Belleville, Ontario

The use of mobile apps and GPS services on smartphones for transportation management has enabled the new "on-demand mobility" service, where the transportation supply follows the users' schedules and routes. In September 2018, the City of Belleville in Canada and Pantonium operationalized the same idea for the city's public transit service, developing an on-demand transit (ODT) service. An existing fixed-route (RT 11) public transit service was converted into an on-demand service at night as a pilot project, to achieve higher demand sensitivity and operating cost efficiency per trip. In this study, Random Forest (RF), Bagging, Artificial Neural Network (ANN), and Deep Neural Network (DNN) machine learning algorithms were adopted to develop a pickup demand model (trip generation) and a trip demand model (trip distribution) for the Belleville ODT service, based on the dissemination areas' demographic characteristics and the existing trip characteristics. The developed models aim to explain the demand behavior, investigate the main factors affecting the trip pattern and their relative importance, and predict the number of trips generated from any dissemination area as well as between any two dissemination areas. The results indicate that the developed models can predict 63% and 70% of the pickup and trip demand levels, respectively. Both models are most affected by the month-of-the-year and day-of-the-week variables. In addition, population density has a higher impact on ODT service pickup demand than the other demographic characteristics, followed by working-age percentage and median income. The distribution of trips, in contrast, depends on the demographic characteristics of the destination area more than those of the origin area.

[175] 2010.15674

Analyzing Societal Impact of COVID-19: A Study During the Early Days of the Pandemic

In this paper, we collect and study Twitter communications to understand the societal impact of COVID-19 in the United States during the early days of the pandemic. With infections soaring rapidly, users took to Twitter asking people to self-isolate and quarantine themselves. Users also demanded the closure of schools, bars, and restaurants as well as the lockdown of cities and states. We methodically collect tweets by identifying and tracking trending COVID-related hashtags. We first manually group the hashtags into six main categories, namely, 1) General COVID, 2) Quarantine, 3) Panic Buying, 4) School Closures, 5) Lockdowns, and 6) Frustration and Hope, and study the temporal evolution of tweets in these hashtags. We conduct a linguistic analysis of words common to all hashtag groups and specific to each hashtag group, and identify the chief concerns of people as the pandemic gripped the nation (e.g., exploring bidets as an alternative to toilet paper). We conduct sentiment analysis, and our investigation reveals that people reacted positively to school closures and negatively to the lack of availability of essential goods due to panic buying. We adopt a state-of-the-art semantic role labeling approach to identify the action words and then leverage an LSTM-based dependency parsing model to analyze the context of action words (e.g., the verb "deal" is accompanied by nouns such as "anxiety", "stress", and "crisis"). Finally, we develop a scalable seeded topic modeling approach to automatically categorize and isolate tweets into hashtag groups, and experimentally validate that our topic model provides a grouping similar to our manual grouping. Our study presents a systematic way to construct an aggregated picture of people's response to the pandemic and lays the groundwork for future fine-grained linguistic and behavioral analysis.

[176] 2010.15675

Deep DA for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labeled Videos

Automatic estimation of pain intensity from facial expressions in videos has immense potential in health care applications. However, domain adaptation (DA) is needed to alleviate the problem of domain shifts that typically occur between video data captured in source and target domains. Given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning (WSL) is gaining attention in such applications. Yet, most state-of-the-art WSL models are typically formulated as regression problems, and leverage neither the ordinal relation between intensity levels nor the temporal coherence of multiple consecutive frames. This paper introduces a new deep learning model for weakly-supervised DA with ordinal regression (WSDA-OR), where videos in the target domain have coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among the intensity levels assigned to the target sequences, and associates multiple relevant frames to sequence-level labels (instead of a single frame). In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated using the RECOLA video dataset as the fully-labeled source domain and the UNBC-McMaster video data as the weakly-labeled target domain. We have also validated WSDA-OR on the BIOVID and Fatigue (private) datasets for sequence-level estimation. Experimental results indicate that our approach can provide a significant improvement over state-of-the-art models, allowing it to achieve greater localization accuracy.

[177] 2010.15676

Optimization Fabrics for Behavioral Design

Second-order differential equations define smooth system behavior. In general, there is no guarantee that a system will behave well when forced by a potential function, but in some cases it does, and may exhibit smooth optimization properties such as convergence to a local minimum of the potential. Such a property is desirable in system design since it is inherently linked to asymptotic stability. This paper presents a comprehensive theory of optimization fabrics, which are second-order differential equations that encode nominal behaviors on a space and are guaranteed to optimize when forced away from those nominal trajectories by a potential function. Optimization fabrics, or fabrics for short, can encode commonalities among optimization problems that reflect the structure of the space itself, enabling smooth optimization processes to intelligently navigate each problem even when the potential function is simple and relatively naive. Importantly, optimization over a fabric is asymptotically stable, so optimization fabrics constitute a building block for provably stable system design.

[178] 2010.15678

On the Failure of the Smart Approach of the GPT Cryptosystem

This paper describes a new algorithm for breaking the smart approach of the GPT cryptosystem. We show that by puncturing the public code several times at specific positions, we obtain a public code on which applying the Frobenius operator appropriately allows us to build an alternative secret key.

[179] 2010.15680

LSTM for Model-Based Anomaly Detection in Cyber-Physical Systems

Anomaly detection is the task of detecting data which differ from the normal behaviour of a system in a given context. To approach this problem, data-driven models can be learned to predict current or future observations. Oftentimes, anomalous behaviour depends on the internal dynamics of the system and looks normal in a static context. To address this problem, the model should also operate depending on state. Long Short-Term Memory (LSTM) neural networks have been shown to be particularly useful for learning time sequences with varying lengths of temporal dependencies and are therefore an interesting general-purpose approach to learn the behaviour of arbitrarily complex Cyber-Physical Systems. To perform anomaly detection, we slightly modify the standard 2-norm error to incorporate an estimate of model uncertainty. We analyse the approach on artificial and real data.
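
A minimal sketch of the scoring idea (the function and parameter names are hypothetical, and the paper's exact modification may differ): the squared prediction error of the model is down-weighted where the model is known to be uncertain, e.g. by an estimate of the per-feature prediction variance, so only errors the model should not make count as anomalous.

```python
import numpy as np

def anomaly_score(x, x_pred, sigma):
    """Squared prediction error scaled by per-feature model uncertainty sigma."""
    return float(np.sum((x - x_pred) ** 2 / sigma ** 2))

x      = np.array([1.0, 2.0])   # observed state
x_pred = np.array([1.5, 2.0])   # model (e.g. LSTM) prediction

print(anomaly_score(x, x_pred, sigma=np.array([0.1, 0.1])))  # confident model: large score
print(anomaly_score(x, x_pred, sigma=np.array([1.0, 1.0])))  # uncertain model: small score
```

The same residual thus raises an alarm only when the model was confident about its prediction, which reduces false positives in regions of the state space the model has learned poorly.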

[180] 2010.15683

Resilient Energy Efficient Healthcare Monitoring Infrastructure with Server and Network Protection

In this paper, a 1+1 server protection scheme is considered where two servers, a primary and a secondary processing server, are used to serve ECG monitoring applications concurrently. The infrastructure is designed to be resilient against server failure under two scenarios related to the geographic locations of the primary and secondary servers, and resilient against both server and network failures. A Mixed Integer Linear Programming (MILP) model is used to optimise the number and locations of both primary and secondary processing servers so that the energy consumption of the networking equipment and processing is minimised. The results show that the server-protection scenario without geographical constraints, compared to the non-resilient scenario, incurs both a network and a processing energy penalty, as the traffic is doubled. The results also reveal that increasing the level of resilience to include geographical constraints results in a higher network energy penalty when demand is low, as more nodes are utilised to place the processing servers under the geographic constraints. Increasing the level of resilience further to include network protection with link- and node-disjoint selection results in a low network energy penalty at high demands, since a large part of the network is active in any case to serve the demands. However, the results show that the network energy penalty is reduced as the number of processing servers at each candidate node increases. Meanwhile, the same processing energy is consumed regardless of the level of resilience, as the same number of processing servers is utilised. A heuristic is developed for each resilience scenario for real-time implementation; the results show that the performance of the heuristics approaches that of the MILP model.

[181] 2010.15684

Governance & Autonomy: Towards a Governance-based Analysis of Autonomy in Cyber-Physical Systems-of-Systems

One of the main challenges in integrating Cyber-Physical Systems-of-Systems (CPSoS) to function as a single unified system is the autonomy of their constituent Cyber-Physical Systems (CPSs), which may lead to a lack of coordination among CPSs and result in various kinds of conflicts. We advocate that to efficiently integrate CPSs within a CPSoS, we may need to adjust the autonomy of some CPSs in a way that allows them to coordinate their activities and avoid potential conflicts with one another. To achieve that, we need to incorporate the notion of governance into the design of CPSoS, defining rules that clearly specify who can adjust the autonomy of a CPS and how. In this paper, we tackle this problem by proposing a new conceptual model that can be used for performing a governance-based analysis of autonomy for CPSs within a CPSoS. We illustrate the utility of the model with an example from the automotive domain.

[182] 2010.15689

Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration

Recently, convolutional neural networks (CNNs) have demonstrated significant success on image restoration (IR) tasks (e.g., image super-resolution, image deblurring, rain streak removal, and dehazing). However, existing CNN-based models are commonly implemented as a single-path stream that enriches feature representations from the low-quality (LQ) input space for final predictions, which fails to fully incorporate preceding low-level contexts into later high-level features within the network, thereby producing inferior results. In this paper, we present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction. The proposed DIN follows a multi-path and multi-branch pattern allowing multiple interconnected branches to interleave and fuse at different states. In this way, shallow information can guide the prediction of deep representative features to enhance the feature expression ability. Furthermore, we propose an asymmetric co-attention (AsyCA) module which is attached at each interleaved node to model feature dependencies. AsyCA can not only adaptively emphasize the informative features from different states, but also improve the discriminative ability of the network. Our presented DIN can be trained end-to-end and applied to various IR tasks. Comprehensive evaluations on public benchmarks and real-world datasets demonstrate that the proposed DIN performs favorably against state-of-the-art methods, both quantitatively and qualitatively.

[183] 2010.15690

Analyzing the tree-layer structure of Deep Forests

Random forests on the one hand, and neural networks on the other, have met great success in the machine learning community for their predictive performance. Combinations of both have been proposed in the literature, notably leading to the so-called deep forests (DF) [25]. In this paper, we investigate the mechanisms at work in DF and show that DF architectures can generally be simplified into simpler and more computationally efficient shallow forest networks. Despite some instability, the latter may outperform standard predictive tree-based methods. In order to precisely quantify the improvement achieved by these light network configurations over standard tree learners, we theoretically study the performance of a shallow tree network made of two layers, each composed of a single centered tree. We provide tight theoretical lower and upper bounds on its excess risk. These theoretical results show the interest of tree-network architectures for well-structured data, provided that the first layer, acting as a data encoder, is rich enough.

[184] 2010.15692

Unveiling process insights from refactoring practices

Context: Software comprehension and maintenance activities, such as refactoring, are said to be negatively impacted by software complexity. The methods used to measure the complexity of software products and processes have been thoroughly debated in the literature. However, understanding the possible links between these two dimensions, particularly the benefits of using the process perspective, still has a long way to go. Objective: To improve the understanding of the link between developers' activities and software complexity within a refactoring task, namely by evaluating whether process metrics gathered from the IDE, using process mining methods and tools, are suitable to accurately classify different refactoring practices and the resulting software complexity. Method: We mined source code metrics from a software product after a quality improvement task was given in parallel to 117 software developers, organized in 71 teams. Simultaneously, we collected events from their 320 IDE work sessions and used process mining to model their processes and extract the corresponding metrics. Results: Most teams using a plugin for refactoring (JDeodorant) reduced software complexity more effectively and with simpler processes than those that performed refactoring using only Eclipse native features. We found moderate correlations (43%) between software cyclomatic complexity and process cyclomatic complexity. The best models found for predicting the refactoring method and the cyclomatic complexity level had accuracies of 92.95% and 94.36%, respectively. Conclusions: Our approach is agnostic to programming languages, geographic location, and development practices. Initial findings are encouraging, and lead us to suggest that practitioners may use our method in other development tasks, such as defect analysis and unit or integration testing.

[185] 2010.15697

Generalized Insider Attack Detection Implementation using NetFlow Data

Insider attack detection in commercial networks is a critical problem that does not have any good solutions at present. The problem is challenging due to the lack of visibility into live networks and the lack of a standard feature set to distinguish between different attacks. In this paper, we study an approach centered on using network data to identify attacks. Our work builds on unsupervised machine learning techniques such as One-Class SVM and bi-clustering as weak indicators of insider network attacks. We combine these techniques to limit the number of false positives to an acceptable level required for real-world deployments, by using One-Class SVM to check for anomalies detected by the proposed bi-clustering algorithm. We present a prototype implementation in Python and associated results for two different real-world representative data sets. We show that our approach is a promising tool for insider attack detection in realistic settings.

[186] 2010.15698

Constrained Online Learning to Mitigate Distortion Effects in Pulse-Agile Cognitive Radar

Pulse-agile radar systems have demonstrated favorable performance in dynamic electromagnetic scenarios. However, the use of non-identical waveforms within a radar's coherent processing interval may lead to harmful distortion effects when pulse-Doppler processing is used. This paper presents an online learning framework to optimize detection performance while mitigating harmful sidelobe levels. The radar waveform selection process is formulated as a linear contextual bandit problem, within which waveform adaptations which exceed a tolerable level of expected distortion are eliminated. The constrained online learning approach is effective and computationally feasible, evidenced by simulations in a radar-communication coexistence scenario and in the presence of intentional adaptive jamming. This approach is applied to both stochastic and adversarial contextual bandit learning models and the detection performance in dynamic scenarios is evaluated.
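
The constrained selection step can be sketched as a LinUCB-style linear contextual bandit in which waveforms whose expected distortion exceeds a tolerance are removed from the candidate set before the upper-confidence maximization. This is a generic sketch under assumed names (`ConstrainedLinUCB`, `max_distortion`), not the paper's exact algorithm.

```python
import numpy as np

class ConstrainedLinUCB:
    """Linear contextual bandit; arms failing the distortion constraint are excluded."""
    def __init__(self, n_arms, d, alpha=1.0):
        self.A = [np.eye(d) for _ in range(n_arms)]      # per-arm design matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]
        self.alpha = alpha                               # exploration strength

    def select(self, x, distortion, max_distortion):
        # constraint step: drop waveforms with intolerable expected distortion
        allowed = [a for a in range(len(self.A)) if distortion[a] <= max_distortion]
        def ucb(a):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                    # ridge-regression estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(allowed, key=ucb)

    def update(self, a, x, r):
        self.A[a] += np.outer(x, x)
        self.b[a] += r * x

bandit = ConstrainedLinUCB(n_arms=3, d=2)
x = np.array([1.0, 0.5])                                 # context (e.g. sensed spectrum)
# arm 2 is predicted to cause too much sidelobe distortion, so it is never offered
arm = bandit.select(x, distortion=[0.1, 0.2, 0.9], max_distortion=0.5)
bandit.update(arm, x, r=1.0)                             # reward, e.g. detection metric
```

Filtering the arm set before the argmax is what makes the constraint hard rather than a soft penalty: a high-reward but high-distortion waveform can never be chosen.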

[187] 2010.15703

Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks

Compressing large neural networks is an important step for their deployment in resource-constrained computational platforms. In this context, vector quantization is an appealing framework that expresses multiple parameters using a single code, and has recently achieved state-of-the-art network compression on a range of core vision and natural language processing tasks. Key to the success of vector quantization is deciding which parameter groups should be compressed together. Previous work has relied on heuristics that group the spatial dimension of individual convolutional filters, but a general solution remains unaddressed. This is desirable for pointwise convolutions (which dominate modern architectures), linear layers (which have no notion of spatial dimension), and convolutions (when more than one filter is compressed to the same codeword). In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function. We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress. Finally, we rely on an annealed quantization algorithm to better compress the network and achieve higher final accuracy. We show results on image classification, object detection, and segmentation, reducing the gap with the uncompressed model by 40 to 70% with respect to the current state of the art.
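
The key observation, that permuting the output channels of one layer together with the input channels of the next leaves the network function unchanged, is easy to verify numerically. A minimal two-layer ReLU sketch (illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # layer 1: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # layer 2: 4 hidden units -> 2 outputs
x = rng.normal(size=3)
perm = rng.permutation(4)      # any permutation of the hidden units

y      = W2 @ np.maximum(W1 @ x, 0.0)
# permute layer-1 rows and layer-2 columns consistently
y_perm = W2[:, perm] @ np.maximum(W1[perm] @ x, 0.0)
print(np.allclose(y, y_perm))  # True: the permuted network computes the same function
```

This works because ReLU acts elementwise and so commutes with the permutation. The permutation is therefore free structure that a compression method can search over to find weight groupings that quantize with lower distortion.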

[188] 2010.15711

5W1H-based Expression for the Effective Sharing of Information in Digital Forensic Investigations

Digital forensic investigation is used in various areas related to digital devices, including cyber crime. It is an investigative process using many techniques, which have been implemented as tools. The types of files covered by digital forensic investigation are wide and varied; however, there is no standardized format in which to express the results. Standards differ by type of device, file system, or application. Different outputs make it time-consuming and difficult to share information and to implement integration, and can also weaken cyber security. Thus, it is important to define a normalization and to present data in the same format. In this paper, a 5W1H-based expression for information sharing in effective digital forensic investigation is proposed, which analyzes digital forensic information using six questions: what, who, where, when, why and how. Based on the 5W1H-based expression, digital information from different types of files is converted and represented in the same output format. As 5W1H is a basic writing principle, application of the 5W1H-based expression to case studies shows that this expression enhances clarity and correctness for information sharing. Furthermore, in the case of security incidents, this expression has the advantage of being compatible with STIX.
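
As a purely illustrative sketch (the field values and path below are hypothetical, not taken from the paper), a forensic artifact converted to a 5W1H record might look like:

```python
# one artifact, normalized to the six 5W1H questions
record = {
    "what":  "file deletion",
    "who":   "user account 'alice'",           # hypothetical actor
    "where": "/home/alice/documents",          # hypothetical path
    "when":  "2020-10-28T14:03:00Z",
    "why":   "inferred anti-forensic cleanup",  # analyst's inference
    "how":   "shell command",
}
# every artifact type maps onto the same six keys, easing sharing and STIX export
print(sorted(record))
```

Because every tool's output is reduced to the same six keys, records from different devices, file systems, and applications become directly comparable and mergeable.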

[189] 2010.15716

Playing a Part: Speaker Verification at the Movies

The goal of this work is to investigate the performance of popular speaker recognition models on speech segments from movies, where often actors intentionally disguise their voice to play a character. We make the following three contributions: (i) We collect a novel, challenging speaker recognition dataset called VoxMovies, with speech for 856 identities from almost 4000 movie clips. VoxMovies contains utterances with varying emotion, accents and background noise, and therefore comprises an entirely different domain to the interview-style, emotionally calm utterances in current speaker recognition datasets such as VoxCeleb; (ii) We provide a number of domain adaptation evaluation sets, and benchmark the performance of state-of-the-art speaker recognition models on these evaluation pairs. We demonstrate that both speaker verification and identification performance drops steeply on this new data, showing the challenge in transferring models across domains; and finally (iii) We show that simple domain adaptation paradigms improve performance, but there is still large room for improvement.

[190] 2010.15718

What can we learn from gradients?

Recent work (\cite{zhu2019deep}) has shown that it is possible to reconstruct the input (image) from the gradient of a neural network. In this paper, our aim is to better understand the limits of reconstruction and to speed up image reconstruction by imposing prior image information and improved initialization. Firstly, we show that for a \textbf{non-linear} neural network, gradient-based reconstruction amounts to solving a high-dimensional system of \textbf{linear} equations, for both fully-connected and convolutional neural networks. Exploring the theoretical limits of input reconstruction, we show that a fully-connected neural network with \textbf{one} hidden node is enough to reconstruct a \textbf{single} input image, regardless of the number of nodes in the output layer. We then generalize this result to a gradient averaged over mini-batches of size $B$. In this case, the full mini-batch can be reconstructed in a fully-connected network if the number of hidden units exceeds $B$. For a convolutional neural network, the required number of filters in the first convolutional layer is again determined by the batch size $B$; however, in this case the input width $d$ and the width after filtering $d^{'}$ also play a role through $h=(\frac{d}{d^{'}})^2BC$, where $C$ is the number of input channels. Finally, we validate and underpin our theoretical analysis on bio-medical data (fMRI, ECG signals, and cell images) and on benchmark data (MNIST, CIFAR100, and face images).
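The linear-equation view has a particularly simple instance for a single fully-connected layer with a bias: the weight gradient is the outer product of the bias gradient and the input, so dividing any row of the weight gradient by the corresponding bias-gradient entry recovers the input exactly. A minimal pure-Python sketch with toy numbers and a squared loss (not the paper's experimental setup):

```python
# Sketch: recovering the input x of a linear layer y = W x + b from its
# gradients. For any loss L, dL/dW[i][j] = dL/dy[i] * x[j] and
# dL/db[i] = dL/dy[i], so x[j] = dL/dW[i][j] / dL/db[i] for any row i
# with a non-zero bias gradient.
x = [1.0, 2.0, -0.5]                      # the "private" input
W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]  # 2x3 weights
b = [0.1, -0.3]

y = [sum(W[i][j] * x[j] for j in range(3)) + b[i] for i in range(2)]
target = [0.0, 1.0]
dL_dy = [2 * (y[i] - target[i]) for i in range(2)]  # d/dy of sum (y - t)^2

dL_dW = [[dL_dy[i] * x[j] for j in range(3)] for i in range(2)]
dL_db = dL_dy[:]

# Reconstruction: pick any row with a non-zero bias gradient.
i = next(i for i in range(2) if abs(dL_db[i]) > 1e-12)
x_rec = [dL_dW[i][j] / dL_db[i] for j in range(3)]
print(x_rec)  # recovers [1.0, 2.0, -0.5] up to floating-point error
```

The same division trick underlies why a single hidden node suffices for one image; mini-batches require solving a larger linear system, as the abstract discusses.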

[191] 2010.15728

Explainable Automated Coding of Clinical Notes using Hierarchical Label-wise Attention Networks and Label Embedding Initialisation

Diagnostic or procedural coding of clinical notes aims to derive a coded summary of disease-related information about patients. Such coding is usually done manually in hospitals but could potentially be automated to improve the efficiency and accuracy of medical coding. Recent studies on deep learning for automated medical coding have achieved promising performance. However, the explainability of these models is usually poor, preventing them from being used confidently in supporting clinical practice. Another limitation is that these models mostly assume independence among labels, ignoring the complex correlations among medical codes which could potentially be exploited to improve performance. We propose a Hierarchical Label-wise Attention Network (HLAN), which aims to interpret the model by quantifying the importance (as attention weights) of words and sentences related to each label. Secondly, we propose to enhance the major deep learning models with a label embedding (LE) initialisation approach, which learns a dense, continuous vector representation and then injects the representation into the final layers and the label-wise attention layers of the models. We evaluated the methods using three settings on the MIMIC-III discharge summaries: full codes, top-50 codes, and the UK NHS COVID-19 shielding codes. Experiments were conducted to compare HLAN and LE initialisation to state-of-the-art neural network based methods. HLAN achieved the best Micro-level AUC and $F_1$ on top-50 code prediction, and results comparable to those of other models on NHS COVID-19 shielding code prediction. By highlighting the most salient words and sentences for each label, HLAN showed more meaningful and comprehensive model interpretation compared to its downgraded baselines and the CNN-based models. LE initialisation consistently boosted most deep learning models for automated medical coding.

[192] 2010.15738

The Agile Coach Role: Coaching for Agile Performance Impact

It is increasingly common to introduce agile coaches to help gain speed and advantage in agile companies. Following the success of Spotify, the role of the agile coach has branched out in terms of tasks and responsibilities, but little research has been conducted to examine how this role is practiced. This paper examines the role of the agile coach through 19 semi-structured interviews with agile coaches from ten different companies. We describe the role in terms of the tasks the coach has in agile projects, valuable traits, skills, tools, and the enablers of agile coaching. Our findings indicate that agile coaches perform at both the team and organizational levels. They affect the effort, strategies, knowledge, and skills of agile teams. The most essential traits of an agile coach are being empathetic, people-oriented, able to listen, diplomatic, and persistent. We offer empirically based advice for agile coaching, for example, giving agile coaches the authority to implement the required organizational changes within and outside the teams.

[193] 2010.15740

Recurrent Neural Networks for video object detection

There is a large body of scientific work on object detection in images. For many applications, such as autonomous driving, the data on which classification must actually be performed are videos. This work compares different methods, especially those using Recurrent Neural Networks, for detecting objects in videos. We distinguish between feature-based methods, which feed feature maps of different frames into the recurrent units; box-level methods, which feed bounding boxes with class probabilities into the recurrent units; and methods that use flow networks. This study identifies common findings across the compared methods, such as the benefit of including temporal context in object detection, and states conclusions and guidelines for video object detection networks.

[194] 2010.15745

Causal variables from reinforcement learning using generalized Bellman equations

Many open problems in machine learning are intrinsically related to causality; however, the use of causal analysis in machine learning is still in its early stages. Within a general reinforcement learning setting, we consider the problem of building a general reinforcement learning agent which uses experience to construct a causal graph of the environment and uses this graph to inform its policy. Our approach has three characteristics: First, we learn a simple, coarse-grained causal graph, in which the variables reflect states at many time instances, and the interventions happen at the level of policies, rather than individual actions. Secondly, we use mediation analysis to obtain an optimization target. By minimizing this target, we define the causal variables. Thirdly, our approach relies on estimating conditional expectations rather than the familiar expected return from reinforcement learning, and we therefore apply a generalization of Bellman's equations. We show that the method can learn a plausible causal graph in a grid-world environment, and that the agent obtains an improvement in performance when using the causally informed policy. To our knowledge, this is the first attempt to apply causal analysis in a reinforcement learning setting without strict restrictions on the number of states. We have observed that mediation analysis provides a promising avenue for transforming the problem of causal acquisition into one of cost-function minimization, but importantly one which involves estimating conditional expectations. This is a new challenge, and we think that causal reinforcement learning will involve developing methods suited for online estimation of such conditional expectations. Finally, a benefit of our approach is the use of very simple causal models, which are arguably a more natural model of human causal understanding.

[195] 2010.15750

Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.

[196] 2010.15755

A more Pragmatic Implementation of the Lock-free, Ordered, Linked List

The lock-free, ordered, linked list is an important, standard example of a concurrent data structure. An obvious, practical drawback of textbook implementations is that failed compare-and-swap (CAS) operations lead to retraversal of the entire list (retries), which is particularly harmful for such a linear-time data structure. We alleviate this drawback by first observing that failed CAS operations do not, under some conditions, require a full retry, and second by maintaining approximate backwards pointers that are used to find a closer starting position in the list for the operation retry. Experiments with both a worst-case deterministic benchmark and a standard, randomized, mixed-operation throughput benchmark on three shared-memory systems (Intel Xeon, AMD EPYC, SPARC-T5) show practical improvements ranging from significant to dramatic (several orders of magnitude).

[197] 2010.15760

Identifying Transition States of Chemical Kinetic Systems using Network Embedding Techniques

Using random walk sampling methods for feature learning on networks, we develop a method for generating low-dimensional node embeddings for directed graphs and identifying transition states of stochastic chemical reacting systems. We modify objective functions adopted in existing random-walk-based network embedding methods to handle directed graphs and neighbors of different degrees. Through optimization via gradient ascent, we embed the weighted graph vertices into a low-dimensional vector space $R^d$ while preserving the neighborhood of each node. We then demonstrate the effectiveness of the method for dimension reduction through several examples involving the identification of transition states of chemical reactions, especially for entropic systems.

[198] 2010.15770

Recursive Random Contraction Revisited

In this note, we revisit the recursive random contraction algorithm of Karger and Stein for finding a minimum cut in a graph. Our revisit is occasioned by a paper of Fox, Panigrahi, and Zhang which gives an extension of the Karger-Stein algorithm to minimum cuts and minimum $k$-cuts in hypergraphs. When specialized to the case of graphs, the algorithm is somewhat different from the original Karger-Stein algorithm. We show that the analysis becomes particularly clean in this case: we can prove that the probability that a fixed minimum cut in an $n$-node graph is returned by the algorithm is bounded below by $1/(2H_n-2)$, where $H_n$ is the $n$th harmonic number. We also consider other similar variants of the algorithm, and show that no such algorithm can achieve an asymptotically better probability of finding a fixed minimum cut.
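The primitive that both the Karger-Stein and Fox-Panigrahi-Zhang algorithms build on is plain random contraction: repeatedly contract a uniformly random edge until two super-nodes remain, and return the surviving edges as a cut. A self-contained sketch of that primitive (without the recursion; the union-find bookkeeping is illustrative):

```python
import random

# Random-contraction primitive: contract uniformly random edges until
# two super-nodes remain; the edges still crossing between them form a
# cut of the original graph.
def random_contraction(edges, n, rng):
    parent = list(range(n))
    def find(v):                       # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    components = n
    live = list(edges)                 # edges still crossing components
    while components > 2:
        u, v = rng.choice(live)
        parent[find(u)] = find(v)
        components -= 1
        live = [(a, b) for (a, b) in live if find(a) != find(b)]
    return [(a, b) for (a, b) in edges if find(a) != find(b)]

rng = random.Random(0)
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle: minimum cut = 2
cut = min((random_contraction(cycle, 4, rng) for _ in range(20)), key=len)
print(len(cut))  # -> 2
```

On a cycle every contraction outcome is an arc boundary of size 2, so each run already finds a minimum cut; on general graphs one repeats (or, as in Karger-Stein, recurses) to boost the success probability analyzed in the note.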

[199] 2010.15772

GANs & Reels: Creating Irish Music using a Generative Adversarial Network

In this paper we present a method for algorithmic melody generation using a generative adversarial network without recurrent components. Music generation has been done successfully using recurrent neural networks, where the model learns sequence information that can help create authentic-sounding melodies. Here, we use a DC-GAN architecture with dilated convolutions and towers to capture sequential information as spatial image information and to learn long-range dependencies in fixed-length melody forms such as the Irish traditional reel.

[200] 2010.15773

WaveTransform: Crafting Adversarial Examples via Input Decomposition

Frequency spectrum has played a significant role in learning unique and discriminating features for object recognition. Both low and high frequency information present in images have been extracted and learnt by a host of representation learning techniques, including deep learning. Inspired by this observation, we introduce a novel class of adversarial attacks, namely `WaveTransform', that creates adversarial noise corresponding to low-frequency and high-frequency subbands, separately (or in combination). The frequency subbands are analyzed using wavelet decomposition; the subbands are corrupted and then used to construct an adversarial example. Experiments are performed using multiple databases and CNN models to establish the effectiveness of the proposed WaveTransform attack and analyze the importance of a particular frequency component. The robustness of the proposed attack is also evaluated through its transferability and resiliency against a recent adversarial defense algorithm. Experiments show that the proposed attack is effective against the defense algorithm and is also transferable across CNNs.

[201] 2010.15775

Understanding the Failure Modes of Out-of-Distribution Generalization

Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy during test-time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way {\em even} in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained linear classifiers on some easy-to-learn tasks, we uncover two complementary failure modes. These modes arise from how spurious correlations induce two kinds of skews in the data: one geometric in nature, and another, statistical in nature. Finally, we construct natural modifications of image classification datasets to understand when these failure modes can arise in practice. We also design experiments to isolate the two failure modes when training modern neural networks on these datasets.

[202] 2010.15778

Contextual BERT: Conditioning the Language Model Using a Global State

BERT is a popular language model whose main pre-training task is to fill in the blank, i.e., predicting a word that was masked out of a sentence, based on the remaining words. In some applications, however, having an additional context can help the model make the right prediction, e.g., by taking the domain or the time of writing into account. This motivates us to advance the BERT architecture by adding a global state for conditioning on a fixed-size context. We present two novel approaches and apply them to an industry use case, where we complete fashion outfits with missing articles, conditioned on a specific customer. An experimental comparison to other methods from the literature shows that our methods improve personalization significantly.

[203] 2010.15784

Stable and efficient Petrov-Galerkin methods for a kinetic Fokker-Planck equation

We propose a stable Petrov-Galerkin discretization of a kinetic Fokker-Planck equation constructed in such a way that uniform inf-sup stability can be inferred directly from the variational formulation. Inspired by well-posedness results for parabolic equations, we derive a lower bound for the dual inf-sup constant of the Fokker-Planck bilinear form by means of stable pairs of trial and test functions. The trial function of such a pair is constructed by applying the kinetic transport operator and the inverse velocity Laplace-Beltrami operator to a given test function. For the Petrov-Galerkin projection we choose an arbitrary discrete test space and then define the discrete trial space using the same application of transport and inverse Laplace-Beltrami operator. As a result, the spaces replicate the stable pairs of the continuous level and we obtain a well-posed numerical method with a discrete inf-sup constant identical to the inf-sup constant of the continuous problem independently of the mesh size. We show how the specific basis functions can be efficiently computed by low-dimensional elliptic problems, and confirm the practicability and performance of the method for a numerical example.

[204] 2010.15785

Quickest detection of false data injection attack in remote state estimation

In this paper, quickest detection of false data injection attack on remote state estimation is considered. A set of $N$ sensors make noisy linear observations of a discrete-time linear process with Gaussian noise, and report the observations to a remote estimator. The challenge is the presence of a few potentially malicious sensors which can start strategically manipulating their observations at a random time in order to skew the estimates. The quickest attack detection problem for a known linear attack scheme is posed as a constrained Markov decision process in order to minimise the expected detection delay subject to a false alarm constraint, with the state involving the probability belief at the estimator that the system is under attack. State transition probabilities are derived in terms of system parameters, and the structure of the optimal policy is derived analytically. It turns out that the optimal policy amounts to checking whether the probability belief exceeds a threshold. Numerical results demonstrate significant performance gain under the proposed algorithm against competing algorithms.
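The threshold structure of the optimal policy can be illustrated with a standard Shiryaev-style belief recursion; the simple scalar change-point model below (Gaussian observations, geometric change time) is a generic sketch, not the paper's exact attack model:

```python
import math

# Shiryaev-style quickest change detection: observations are N(0,1)
# before the change and N(1,1) after; the change time is geometric
# with parameter rho. An alarm is raised once the posterior belief
# that the change has occurred crosses a threshold. Illustrative
# sketch only, not the paper's exact attack model.
def gauss(y, mu):
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

def first_alarm(observations, rho=0.1, threshold=0.9):
    p = 0.0
    for k, y in enumerate(observations):
        prior = p + (1 - p) * rho              # belief before seeing y
        num = prior * gauss(y, 1.0)
        p = num / (num + (1 - prior) * gauss(y, 0.0))
        if p > threshold:
            return k                           # declare the attack
    return None

# Ten pre-change samples (all 0.0), then ten strongly post-change
# samples (all 2.0): the alarm fires shortly after the change at
# index 10, since the belief p ratchets up once the data shift.
alarm = first_alarm([0.0] * 10 + [2.0] * 10)
print(alarm)
```

The key point mirrored from the abstract is that the policy reduces to comparing a scalar belief against a threshold; the constrained-MDP machinery in the paper is what justifies that this simple rule is optimal.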

[205] 2010.15786

Light-Weight DDoS Mitigation at Network Edge with Limited Resources

The Internet of Things (IoT) has been growing rapidly in recent years. With the appearance of 5G, it is expected to become even more indispensable to people's lives. In accordance with the increase of Distributed Denial-of-Service (DDoS) attacks from IoT devices, DDoS defense has become a hot research topic. DDoS detection mechanisms executed on routers and in SDN environments have been intensely studied. However, these methods have the disadvantage of requiring costly, high-performance devices. In addition, there is no existing DDoS mitigation algorithm at the network edge that can run on low-cost, low-performance equipment. Therefore, this paper proposes a light-weight DDoS mitigation scheme at the network edge using the limited resources of inexpensive devices such as home gateways. The goal of the proposed scheme is to simply detect and mitigate flooding attacks. It utilizes unused queue resources to detect malicious flows by randomly shuffling queue allocation and discarding the packets of the detected flows. The performance of the proposed scheme was confirmed via theoretical analysis and computer simulation. The simulation results match the theoretical results, and the proposed algorithm can efficiently detect malicious flows using limited resources.

[206] 2010.15792

A Framework for Learning Predator-prey Agents from Simulation to Real World

In this paper, we propose an evolutionary predator-prey robot system which can be implemented all the way from simulation to the real world. We design a closed-loop robot system with camera and infrared sensors as controller inputs. Both the predators and the prey are co-evolved using NeuroEvolution of Augmenting Topologies (NEAT) to learn the expected behaviours. We design a framework that integrates OpenAI Gym, the Robot Operating System (ROS), and Gazebo. In this framework, users only need to focus on algorithms, without worrying about the details of manipulating robots in either simulation or the real world. Combining simulations, real-world evolution, and robustness analysis, the framework can be applied to develop solutions for predator-prey tasks. For the convenience of users, the source code and videos of the simulated and real-world experiments are published on GitHub.

[207] 2010.15793

A computational periporomechanics model for localized failure in unsaturated porous media

We implement a computational periporomechanics model for simulating localized failure in unsaturated porous media. The coupled periporomechanics model is based on the peridynamic state concept and the effective force state concept. The coupled governing equations are integro-differential equations that do not assume continuity of the solid displacement and fluid pressures. The fluid flow and effective force states are determined by nonlocal fluid pressure and deformation gradients through the recently formulated multiphase constitutive correspondence principle. The coupled periporomechanics model is implemented numerically for high-performance computing by an implicit multiphase meshfree method utilizing the message passing interface. The numerical implementation is validated by simulating classical poromechanics problems and comparing the numerical results with analytical solutions and experimental data. Numerical examples are presented to demonstrate the robustness of the fully coupled periporomechanics model in modeling localized failures in unsaturated porous media.

[208] 2010.15794

Eccentricity queries and beyond using Hub Labels

Hub labeling schemes are popular methods for computing distances on road networks and other large complex networks, often answering a query within a few microseconds for graphs with millions of edges. In this work, we study their algorithmic applications beyond distance queries. We focus on eccentricity queries and distance-sum queries, for several versions of these problems on directed weighted graphs, motivated in part by their importance in facility location problems. On the negative side, we show conditional lower bounds for the above problems on unweighted undirected sparse graphs, via standard constructions from "fine-grained" complexity. However, things take a different turn when the hub labels have sublogarithmic size. Indeed, given a hub labeling of maximum label size $\leq k$, after pre-processing the labels in total $2^{{O}(k)} \cdot |V|^{1+o(1)}$ time, we can compute both the eccentricity and the distance-sum of any vertex in $2^{{O}(k)} \cdot |V|^{o(1)}$ time. Our approach can also be applied to the fast global computation of some topological indices. Finally, as a by-product of our approach, on any fixed class of unweighted graphs with bounded expansion, we can decide whether the diameter of an $n$-vertex graph in the class is at most $k$ in $f(k) \cdot n^{1+o(1)}$ time, for some "explicit" function $f$.
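The query mechanism behind hub labels is simple: each vertex stores distances to a small set of hubs, and a distance query minimizes over the common hubs. The sketch below uses tiny hand-built labels for a 4-vertex path graph, just to illustrate distance queries and the naive eccentricity query the paper speeds up; real labelings come from a preprocessing step:

```python
# 2-hop (hub) labeling sketch: L[v] maps each hub h to dist(v, h),
# and dist(u, v) = min over common hubs h of L[u][h] + L[v][h].
# Labels below are hand-built for the unit-weight path 0-1-2-3 and
# satisfy the cover property (every pair shares a hub on a shortest
# path); they are illustrative only.
L = {
    0: {0: 0, 1: 1},
    1: {1: 0},
    2: {1: 1, 2: 0},
    3: {1: 2, 2: 1, 3: 0},
}

def dist(u, v):
    common = L[u].keys() & L[v].keys()
    return min(L[u][h] + L[v][h] for h in common)

def eccentricity(u, vertices):
    # Naive eccentricity via |V| label queries; the paper's point is
    # that sublogarithmic label sizes allow much faster answers.
    return max(dist(u, v) for v in vertices)

print(dist(0, 3))                 # -> 3
print(eccentricity(1, L.keys()))  # -> 2
```

The naive eccentricity loop above costs one distance query per vertex; the paper's $2^{O(k)} \cdot |V|^{o(1)}$-time queries avoid exactly this linear scan.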

[209] 2010.15803

Isometric embeddings in trees and their use in the diameter problem

We prove that given a discrete space with $n$ points which is either embedded in a system of $k$ trees, or the Cartesian product of $k$ trees, we can compute all eccentricities in ${\cal O}(2^{{\cal O}(k\log{k})}(N+n)^{1+o(1)})$ time, where $N$ is the cumulative total order over all these $k$ trees. This is near optimal under the Strong Exponential-Time Hypothesis, even in the very special case of an $n$-vertex graph embedded in a system of $\omega(\log{n})$ spanning trees. However, given such an embedding in the strong product of $k$ trees, there is a much faster ${\cal O}(N + kn)$-time algorithm for this problem. All our positive results can be turned into approximation algorithms for the graphs and finite spaces with a quasi-isometric embedding in trees, if such an embedding is given as input, where the approximation factor (resp., the approximation constant) depends on the distortion of the embedding (resp., of its stretch). The existence of embeddings in the Cartesian product of finitely many trees has been thoroughly investigated for cube-free median graphs. We give the first known quasi-linear-time algorithm for computing the diameter within this graph class. It does not require an embedding in a product of trees to be given as part of the input. On our way, being given an $n$-node tree $T$, we propose a data structure with ${\cal O}(n\log{n})$ pre-processing time in order to compute in ${\cal O}(k\log^2{n})$ time the eccentricity of any subset of $k$ nodes. We combine the latter technical contribution, of independent interest, with a recent distance-labeling scheme that was designed for cube-free median graphs.

[210] 2010.15805

A Local Search Framework for Experimental Design

We present a local search framework to design and analyze both combinatorial algorithms and rounding algorithms for experimental design problems. This framework provides a unifying approach to match and improve all known results in D/A/E-design and to obtain new results in previously unknown settings. For combinatorial algorithms, we provide a new analysis of the classical Fedorov's exchange method. We prove that this simple local search algorithm works well as long as there exists an almost optimal solution with good condition number. Moreover, we design a new combinatorial local search algorithm for E-design using the regret minimization framework. For rounding algorithms, we provide a unified randomized exchange algorithm to match and improve previous results for D/A/E-design. Furthermore, the algorithm works in the more general setting to approximately satisfy multiple knapsack constraints, which can be used for weighted experimental design and for incorporating fairness constraints into experimental design.
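Fedorov's exchange method, the classical local search the framework re-analyzes, is easy to sketch: starting from any feasible design, repeatedly swap one chosen point for one unchosen point whenever the swap increases the determinant of the information matrix. A tiny 2-D D-design example with exact 2x2 determinants (illustrative only, not the paper's analysis):

```python
import itertools

# Fedorov-style exchange heuristic for D-optimal design: greedily swap
# a chosen design point for an unchosen candidate whenever the swap
# increases det(X^T X). Integer 2-D candidates keep the determinants
# exact; this is a sketch of the classical method, not the paper's
# randomized exchange algorithm.
points = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (3, 3)]

def info_det(design):
    a = sum(x * x for x, _ in design)
    b = sum(x * y for x, y in design)
    c = sum(y * y for _, y in design)
    return a * c - b * b            # det of the 2x2 information matrix

def fedorov(points, k):
    design = list(points[:k])
    improved = True
    while improved:                 # terminates: det strictly increases
        improved = False
        for i, out in itertools.product(range(k), points):
            if out in design:
                continue
            trial = design[:i] + [out] + design[i + 1:]
            if info_det(trial) > info_det(design):
                design, improved = trial, True
    return design

best = fedorov(points, 3)
print(sorted(best), info_det(best))
```

The abstract's key claim is about when this simple loop provably works well: whenever an almost optimal solution with good condition number exists, local swaps cannot get stuck far from optimal.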

[211] 2010.15809

The ins and outs of speaker recognition: lessons from VoxSRC 2020

The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 offers a challenging evaluation for speaker recognition systems, which includes celebrities playing different parts in movies. The goal of this work is robust speaker recognition of utterances recorded in these challenging environments. We utilise variants of the popular ResNet architecture for speaker recognition and perform extensive experiments using a range of loss functions and training parameters. To this end, we optimise an efficient training framework that allows powerful models to be trained with limited time and resources. Our trained models demonstrate improvements over most existing works with lighter models and a simple pipeline. The paper shares the lessons learned from our participation in the challenge.

[212] 2010.15814

Around the diameter of AT-free graphs

A graph algorithm is truly subquadratic if it runs in ${\cal O}(m^b)$ time on connected $m$-edge graphs, for some positive $b < 2$. Roditty and Vassilevska Williams (STOC'13) proved that under plausible complexity assumptions, there is no truly subquadratic algorithm for computing the diameter of general graphs. In this work, we present positive and negative results on the existence of such algorithms for computing the diameter on some special graph classes. Specifically, three vertices in a graph form an asteroidal triple (AT) if between any two of them there exists a path that avoids the closed neighbourhood of the third one. We call a graph AT-free if it does not contain an AT. We first prove that for all $m$-edge AT-free graphs, one can compute all the eccentricities in truly subquadratic ${\cal O}(m^{3/2})$ time. Then, we extend our study to several subclasses of chordal graphs -- all of them generalizing interval graphs in various ways --, as an attempt to understand which of the properties of AT-free graphs, or natural generalizations of the latter, can help in the design of fast algorithms for the diameter problem on broader graph classes. For instance, for all chordal graphs with a dominating shortest path, there is a linear-time algorithm for computing a diametral pair if the diameter is at least four. However, already for split graphs with a dominating edge, under plausible complexity assumptions, there is no truly subquadratic algorithm for deciding whether the diameter is either $2$ or $3$.

[213] 2010.15820

Down the bot hole: actionable insights from a 1-year analysis of bots activity on Twitter

Nowadays, social media represent persuasive tools that have been progressively weaponized to affect people's beliefs, spread manipulative narratives, and sow conflict along divergent factions. Software-controlled accounts (i.e., bots) are among the main actors associated with manipulation campaigns, especially in the political context. Uncovering the strategies behind bots' activities is of paramount importance to detect and curb such campaigns. In this paper, we present a long-term (one-year) analysis of bot activity on Twitter in the run-up to the 2018 U.S. Midterm Elections. We identify different classes of accounts based on their nature (bot vs. human) and engagement within the online discussion, and we observe that hyperactive bots played a pivotal role in the dissemination of conspiratorial narratives, while dominating the political debate since the year before the election. Our analysis, on the horizon of the upcoming U.S. 2020 Presidential Election, reveals both alarming findings on humans' susceptibility to bots and actionable insights that can contribute to curbing coordinated campaigns.

[214] 2010.15821

Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency, i.e., insufficient training of subnetworks in the hypernetwork. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. We directly select the most promising one from the prioritized paths as the final architecture, without using other complex search methods, such as reinforcement learning or evolutionary algorithms. The experiments on ImageNet verify that such a path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boost the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned settings. Moreover, the experiments on object detection and a more challenging search space show the generality and robustness of the proposed method. Code and models are available at

[215] 2010.15823

Black-Box Optimization of Object Detector Scales

Object detectors have improved considerably in the last years by using advanced CNN architectures. However, many detector hyper-parameters are generally tuned manually, or they are used with the values set by the detector authors. Automatic hyper-parameter optimization has not been explored for improving the hyper-parameters of CNN-based object detectors. In this work, we propose the use of black-box optimization methods to tune the prior/default box scales in Faster R-CNN and SSD, using Bayesian Optimization, SMAC, and CMA-ES. We show that by tuning the input image size and prior box anchor scale, mAP increases by 2% on PASCAL VOC 2007 with Faster R-CNN, and by 3% with SSD. On the COCO dataset with SSD, there are mAP improvements for medium and large objects, but mAP decreases by 1% for small objects. We also perform a regression analysis to find the most significant hyper-parameters to tune.

[216] 2010.15824

Passport-aware Normalization for Deep Model Protection

Despite tremendous success in many application scenarios, deep learning faces serious intellectual property (IP) infringement threats. Considering the cost of designing and training a good model, infringements significantly damage the interests of the original model owner. Recently, many impressive works have emerged for deep model IP protection. However, they are either vulnerable to ambiguity attacks, or require changes to the target network structure by replacing its original normalization layers, which causes significant performance drops. To this end, we propose a new passport-aware normalization formulation that is generally applicable to most existing normalization layers and only needs an additional passport-aware branch for IP protection. This new branch is jointly trained with the target model but discarded at inference, so it causes no structural change in the target model. Only when the model IP is suspected of being stolen is the private passport-aware branch added back for ownership verification. Through extensive experiments, we verify its effectiveness on both image and 3D point recognition models. It is demonstrated to be robust not only to common attack techniques such as fine-tuning and model compression, but also to ambiguity attacks. By further combining it with trigger-set based methods, both black-box and white-box verification can be achieved, enhancing the security of deep learning models deployed in real systems. Code can be found at
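
A minimal sketch of the idea, assuming a simple mapping from a secret passport to the affine parameters (the real formulation learns this branch jointly with the target model):

```python
import numpy as np

def passport_norm(x, passport, eps=1e-5, verify=False):
    """Illustrative passport-aware normalization (numpy sketch).
    Normal branch: plain standardization, no structural change.
    Passport branch: affine parameters derived from a secret passport,
    attached only when ownership verification is required."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    if not verify:                    # inference: passport branch discarded
        return x_hat
    gamma = passport.mean(axis=0)     # assumed passport-to-affine mapping
    beta = passport.std(axis=0)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
passport = rng.normal(loc=2.0, size=(16, 8))
y_plain = passport_norm(x, passport)                 # deployed model path
y_verify = passport_norm(x, passport, verify=True)   # verification path
```

A forged passport would yield wrong affine parameters and degrade accuracy, which is the basis of the ownership test.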

[217] 2010.15831

RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

Existing object detection frameworks are usually built on a single format of object/part representation, i.e., anchor/proposal rectangle boxes in RetinaNet and Faster R-CNN, center points in FCOS and RepPoints, and corner points in CornerNet. While these different representations usually drive the frameworks to perform well in different aspects, e.g., better classification or finer localization, it is in general difficult to combine these representations in a single framework to make good use of each strength, due to the heterogeneous or non-grid feature extraction by different representations. This paper presents an attention-based decoder module, similar to that in the Transformer~\cite{vaswani2017attention}, to bridge other representations into a typical object detector built on a single representation format, in an end-to-end fashion. The other representations act as a set of \emph{key} instances to strengthen the main \emph{query} representation features in the vanilla detectors. Novel techniques are proposed towards efficient computation of the decoder module, including a \emph{key sampling} approach and a \emph{shared location embedding} approach. The proposed module is named \emph{bridging visual representations} (BVR). It can operate in-place, and we demonstrate its broad effectiveness in bridging other representations into prevalent object detection frameworks, including RetinaNet, Faster R-CNN, FCOS and ATSS, where about $1.5\sim3.0$ AP improvements are achieved. In particular, we improve a state-of-the-art framework with a strong backbone by about $2.0$ AP, reaching $52.7$ AP on COCO test-dev. The resulting network is named RelationNet++. The code will be available at

[218] 2010.15832

Proceedings 9th International Workshop on Theorem Proving Components for Educational Software

The 9th International Workshop on Theorem-Proving Components for Educational Software (ThEdu'20) was scheduled for June 29 as a satellite of the IJCAR-FSCD 2020 joint meeting in Paris. The COVID-19 pandemic came as a surprise, though, and the main conference was virtualised. Fearing that an online meeting would not allow our community to fully reproduce the usual face-to-face networking opportunities of the ThEdu initiative, the Steering Committee of ThEdu decided to cancel the workshop. Given that many of us had already planned and worked towards that moment, we decided that ThEdu'20 could still live on in the form of an EPTCS volume. The EPTCS concurred, recognising this very singular situation, and accepted our proposal to organise a special issue with papers submitted to ThEdu'20. An open call for papers was then issued and attracted five submissions, all of which were accepted by our reviewers, who produced three careful reports on each contribution. The resulting revised papers are collected in the present volume. We, the volume editors, hope that this collection of papers will help further promote the development of theorem-proving-based software, and contribute to improving the mutual understanding between computer mathematicians and stakeholders in education. With some luck, the very special circumstances created by the worst sanitary crisis in a century may even reinforce the need for certified components and verification methods in the production of educational software that remains available even when traditional on-site learning turns out not to be advisable.

[219] 2010.14544

The fundamental equations of change in statistical ensembles and biological populations

A recent article in Nature Physics unified key results from thermodynamics, statistics, and information theory. The unification arose from a general equation for the rate of change in the information content of a system. The general equation describes the change in the moments of an observable quantity over a probability distribution. One term in the equation describes the change in the probability distribution. The other term describes the change in the observable values for a given state. We show the equivalence of this general equation for moment dynamics with the widely known Price equation from evolutionary theory, named after George Price. We introduce the Price equation from its biological roots, review a mathematically abstract form of the equation, and discuss the potential for this equation to unify diverse mathematical theories from different disciplines. The new work in Nature Physics and many applications in biology show that this equation also provides the basis for deriving many novel theoretical results within each discipline.
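
For reference, the abstract form of the Price equation discussed above can be written as follows (a standard statement; here $z_i$ is the observable, $w_i$ the fitness, and $\bar{w}$ the mean fitness):

```latex
% Price equation: total change in the population mean of a trait z.
\Delta\bar{z}
  \;=\;
  \underbrace{\frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}}}_{\substack{\text{change in the}\\ \text{probability distribution}}}
  \;+\;
  \underbrace{\frac{\operatorname{E}\!\left[\, w_i \,\Delta z_i \,\right]}{\bar{w}}}_{\substack{\text{change in the}\\ \text{observable values}}}
```

The two terms correspond exactly to the two terms of the general moment-dynamics equation described in the abstract: the first captures the change in the probability distribution, the second the change in the observable values for a given state.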

[220] 2010.14734

Generalized eigen, singular value, and partial least squares decompositions: The GSVD package

The generalized singular value decomposition (GSVD, a.k.a. "SVD triplet", "duality diagram" approach) provides a unified strategy and basis to perform nearly all of the most common multivariate analyses (e.g., principal components, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares). Though the GSVD is ubiquitous, powerful, and flexible, it has very few implementations. Here I introduce the GSVD package for R. The general goal of GSVD is to provide a small set of accessible functions to perform the GSVD and two other related decompositions (generalized eigenvalue decomposition, generalized partial least squares-singular value decomposition). Furthermore, GSVD helps provide a more unified conceptual approach and nomenclature to many techniques. I first introduce the concept of the GSVD, followed by a formal definition of the generalized decompositions. Next I describe some key decisions made during development, and then give a number of examples of how to use GSVD to implement various statistical techniques. These examples also illustrate one of the goals of GSVD: how others can (or should) build analysis packages that depend on GSVD. Finally, I discuss the possible future of GSVD.
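
The diagonal-constraints case of the GSVD can be sketched in a few lines: with row constraints $M$ and column constraints $W$, take the plain SVD of $M^{1/2} X W^{1/2}$ and rescale the singular vectors (numpy used here for illustration; the GSVD package itself is written in R):

```python
import numpy as np

def gsvd(X, M, W):
    """GSVD sketch for diagonal constraint matrices M (rows) and W (cols):
    X = P @ diag(d) @ Q.T with P.T @ M @ P = I and Q.T @ W @ Q = I."""
    Mh, Wh = np.sqrt(M), np.sqrt(W)                 # diagonal square roots
    U, d, Vt = np.linalg.svd(Mh @ X @ Wh, full_matrices=False)
    P = np.linalg.inv(Mh) @ U                       # generalized left vectors
    Q = np.linalg.inv(Wh) @ Vt.T                    # generalized right vectors
    return P, d, Q

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
M = np.diag(rng.uniform(0.5, 2.0, size=6))
W = np.diag(rng.uniform(0.5, 2.0, size=4))
P, d, Q = gsvd(X, M, W)
```

Different choices of $M$ and $W$ recover different techniques, e.g. row/column mass matrices give correspondence analysis.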

[221] 2010.14746

Continuous Chaotic Nonlinear System and Lyapunov controller Optimization using Deep Learning

The introduction of unexpected system disturbances and new system dynamics means that initially selected static system and controller parameters cannot guarantee continued system stability and performance. In this research we present a novel approach that detects early failure indicators of highly chaotic non-linear systems and accordingly predicts the best parameter calibrations to offset such instability, using a deep-learning regression model. The proposed approach continuously monitors the system and controller signals. Re-calibration of the system and controller parameters is triggered according to a set of conditions designed to maintain system stability without compromising system speed, intended outcome, or required processing power. The deep neural model predicts the parameter values that best counteract the expected system instability. To demonstrate its effectiveness, the proposed approach is applied to a non-linear complex combination of Duffing and Van der Pol oscillators. The approach is also tested under scenarios in which the system and controller parameters are initially chosen incorrectly, the system parameters are changed at runtime, or new system dynamics are introduced at runtime, to measure effectiveness and reaction time.

[222] 2010.15153

On the Optimality and Convergence Properties of the Learning Model Predictive Controller

In this technical note we analyse the performance improvement and optimality properties of the Learning Model Predictive Control (LMPC) strategy for linear deterministic systems. The LMPC framework is a policy iteration scheme where closed-loop trajectories are used to update the control policy for the next execution of the control task. We show that, when a Linear Independence Constraint Qualification (LICQ) condition holds, the LMPC scheme guarantees strict iterative performance improvement and optimality, meaning that the closed-loop cost evaluated over the entire task converges asymptotically to the optimal cost of the infinite-horizon control problem. Compared to previous works this sufficient LICQ condition can be easily checked, it holds for a larger class of systems and it can be used to adaptively select the prediction horizon of the controller, as demonstrated by a numerical example.

[223] 2010.15156

Diagnostic data integration using deep neural networks for real-time plasma analysis

Recent advances in acquisition equipment are providing experiments with growing numbers of precise yet affordable sensors. At the same time, improved computational power from new hardware resources (GPU, FPGA, ACAP) has become available at relatively low cost. This led us to explore the possibility of completely renewing the acquisition chain of a fusion experiment, where many high-rate data sources from different diagnostics can be combined in a broad framework of algorithms. If, on the one hand, adding data sources from different diagnostics enriches our knowledge of physical aspects, on the other hand the dimensions of the overall model grow, making relations among variables more and more opaque. A new approach for the integration of such heterogeneous diagnostics, based on the composition of deep \textit{variational autoencoders}, could ease this problem, acting as a structural sparse regularizer. This has been applied to RFX-mod experiment data, integrating the soft X-ray linear images of plasma temperature with the magnetic state. However, to ensure real-time signal analysis, these algorithmic techniques must be adapted to run on well-suited hardware. In particular, we show that by quantizing the neuron transfer functions, such models can be converted into embedded firmware. This firmware, approximating the deep inference model with a set of simple operations, fits well with the simple logic units that are abundant in FPGAs. This is the key factor that permits running complex deep neural topologies on affordable hardware in real time.

[224] 2010.15166

Polymer Informatics with Multi-Task Learning

Modern data-driven tools are transforming application-specific polymer development cycles. Surrogate models that can be trained to predict the properties of new polymers are becoming commonplace. Nevertheless, these models do not utilize the full breadth of the knowledge available in datasets, which are oftentimes sparse; inherent correlations between different property datasets are disregarded. Here, we demonstrate the potency of multi-task learning approaches that exploit such inherent correlations effectively, particularly when some property dataset sizes are small. Data pertaining to 36 different properties of over $13,000$ polymers (corresponding to over $23,000$ data points) are coalesced and supplied to deep-learning multi-task architectures. Compared to conventional single-task learning models (that are trained on individual property datasets independently), the multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available. Moreover, these models are interpretable. Chemical rules that explain how certain features control trends in specific property values emerge from the present work, paving the way for the rational design of application-specific polymers meeting desired property or performance objectives.
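
The core mechanism, a shared representation trained with a loss that simply skips missing property entries, can be sketched as follows. The linear trunk/head model and hyper-parameters are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def masked_mse_grad_step(Wt, Wh, X, Y, lr=0.1):
    """One gradient step for a shared trunk Wt and per-property heads Wh,
    skipping missing (NaN) property entries -- the mechanism that lets
    sparse property datasets share a common representation (sketch)."""
    H = X @ Wt                        # shared representation
    pred = H @ Wh                     # all property heads at once
    mask = ~np.isnan(Y)
    err = np.where(mask, pred - np.nan_to_num(Y), 0.0)
    n = mask.sum()
    gWh = H.T @ err / n
    gWt = X.T @ (err @ Wh.T) / n
    return Wt - lr * gWt, Wh - lr * gWh, (err ** 2).sum() / n

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
Y = X @ rng.normal(size=(5, 3))            # three correlated "properties"
Y[rng.random(Y.shape) < 0.6] = np.nan      # sparse labels, as in real datasets
Wt = 0.1 * rng.normal(size=(5, 4))
Wh = 0.1 * rng.normal(size=(4, 3))
losses = []
for _ in range(200):
    Wt, Wh, loss = masked_mse_grad_step(Wt, Wh, X, Y)
    losses.append(loss)
```

Because every property shares the trunk `Wt`, the densely labelled properties regularize the sparsely labelled ones.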

[225] 2010.15209

Ground Roll Suppression using Convolutional Neural Networks

Seismic data processing plays a major role in seismic exploration, as it conditions much of the seismic interpretation performance. In this context, generating reliable post-stack seismic data also depends on having an efficient pre-stack noise attenuation tool. Here we tackle ground roll noise, one of the most challenging and common noises observed in pre-stack seismic data. Since ground roll is characterized by relatively low frequencies and high amplitudes, the most commonly used approaches for its suppression are based on frequency-amplitude filters for the characteristic ground roll bands. However, when signal and noise share the same frequency ranges, these methods usually also suppress signal or leave residual noise. In this paper we take advantage of the highly non-linear features of convolutional neural networks, and propose different architectures to detect ground roll in shot gathers and ultimately to suppress it using conditional generative adversarial networks. Additionally, we propose metrics to evaluate ground roll suppression, and report strong results compared to expert filtering. Finally, we discuss the generalization of trained models to similar and different geologies, to better understand the feasibility of our proposal in real applications.

[226] 2010.15221

Geometric Sampling of Networks

Motivated by the methods and results of manifold sampling based on Ricci curvature, we propose a similar approach for networks. To this end we appeal to three types of discrete curvature, namely the graph Forman, full Forman, and Haantjes-Ricci curvatures, for edge-based and node-based sampling. We present the results of experiments on real-life networks, as well as on square grids arising in image processing. Moreover, we consider fitting Ricci flows and employ them for the detection of network backbones. We also develop embedding kernels related to the Forman-Ricci curvatures and employ them for detecting the coarse structure of networks, as well as for network visualization with applications to SVM. The relation between the Ricci curvature of the original manifold and that of a Ricci-curvature-driven discretization is also studied.
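
For the simplest of these notions, the combinatorial graph Forman-Ricci curvature of an edge in an unweighted graph (ignoring higher-order augmentations) reduces to $F(u,v) = 4 - \deg(u) - \deg(v)$, which is easy to compute:

```python
def forman_curvature(edges):
    """Combinatorial (graph) Forman-Ricci curvature of each edge in an
    unweighted graph, without higher-order terms: F(u,v) = 4 - deg(u) - deg(v)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return {(u, v): 4 - deg[u] - deg[v] for u, v in edges}

# A 4-cycle (one cell of a square grid): every vertex has degree 2,
# so every edge has curvature 4 - 2 - 2 = 0.
square = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
curv = forman_curvature(square)
```

Edge-based sampling then keeps or discards edges according to such curvature values; the weighted and node-based variants used in the paper extend this formula.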

[227] 2010.15233

Accurate Prostate Cancer Detection and Segmentation on Biparametric MRI using Non-local Mask R-CNN with Histopathological Ground Truth

Purpose: We aimed to develop deep learning (DL) models to improve the detection and segmentation of intraprostatic lesions (IL) on bp-MRI by using whole mount prostatectomy specimen-based delineations. We also aimed to investigate whether transfer learning and self-training would improve results with small amounts of labelled data. Methods: 158 patients had suspicious lesions delineated on MRI based on bp-MRI, 64 patients had ILs delineated on MRI based on whole mount prostatectomy specimen sections, and 40 patients were unlabelled. A non-local Mask R-CNN was proposed to improve the segmentation accuracy. Transfer learning was investigated by fine-tuning a model trained using MRI-based delineations with prostatectomy-based delineations. Two label-selection strategies were investigated for self-training. The performance of the models was evaluated by 3D detection rate, Dice similarity coefficient (DSC), 95th-percentile Hausdorff distance (95 HD, mm), and true positive ratio (TPR). Results: With prostatectomy-based delineations, the non-local Mask R-CNN with fine-tuning and self-training significantly improved all evaluation metrics. For the model with the highest detection rate and DSC, 80.5% (33/41) of lesions in all Gleason Grade Groups (GGG) were detected with DSC of 0.548[0.165], 95 HD of 5.72[3.17] and TPR of 0.613[0.193]. Among them, 94.7% (18/19) of lesions with GGG > 2 were detected with DSC of 0.604[0.135], 95 HD of 6.26[3.44] and TPR of 0.580[0.190]. Conclusion: DL models can achieve high prostate cancer detection and segmentation accuracy on bp-MRI based on annotations from histologic images. To further improve performance, more data with annotations of both MRI and whole mount prostatectomy specimens are required.

[228] 2010.15245

A marine radioisotope gamma-ray spectrum analysis method based on Monte Carlo simulation and MLP neural network

A multilayer perceptron (MLP) neural network is built to analyze the Cs-137 concentration in seawater via gamma-ray spectra measured by a LaBr3 detector. The MLP is trained and tested on a large data set generated by combining measured and Monte Carlo simulated spectra, under the assumption that all the measured spectra have zero Cs-137 concentration. The performance of the MLP is evaluated and compared with the traditional net-peak-area method. The results show an improvement of 7% in accuracy and 0.036 in the area under the ROC curve compared to the net-peak-area method. The influence of the assumed Cs-137 concentration in the training data set on the classification performance of the MLP is also evaluated.
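
The data-generation idea, adding a scaled simulated source response onto a measured background spectrum, can be sketched as below; the spectral shapes, channel numbers, and Poisson noise model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_spectrum(background, template, activity):
    """Synthesize a labelled spectrum: measured background (assumed to
    contain no Cs-137) plus a Monte Carlo response template scaled by the
    desired activity, with Poisson counting noise (illustrative sketch)."""
    expected = background + activity * template
    return rng.poisson(expected).astype(float)

channels = np.arange(256)
background = 50.0 * np.exp(-channels / 120.0)              # smooth continuum
template = np.exp(-0.5 * ((channels - 170) / 4.0) ** 2)    # photopeak stand-in
spec = make_training_spectrum(background, template, activity=300.0)
```

Sweeping `activity` over a range of values yields a labelled training set without requiring measured spectra of known concentration.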

[229] 2010.15269

GloFlow: Global Image Alignment for Creation of Whole Slide Images for Pathology from Video

The application of deep learning to pathology assumes the existence of digital whole slide images of pathology slides. However, slide digitization is bottlenecked by the high cost of precise motor stages in slide scanners that are needed for position information used for slide stitching. We propose GloFlow, a two-stage method for creating a whole slide image using optical flow-based image registration with global alignment using a computationally tractable graph-pruning approach. In the first stage, we train an optical flow predictor to predict pairwise translations between successive video frames to approximate a stitch. In the second stage, this approximate stitch is used to create a neighborhood graph to produce a corrected stitch. On a simulated dataset of video scans of WSIs, we find that our method outperforms known approaches to slide-stitching, and stitches WSIs resembling those produced by slide scanners.

[230] 2010.15271

A globally convergent modified Newton method for the direct minimization of the Ohta-Kawasaki energy with application to the directed self-assembly of diblock copolymers

We propose a fast and robust scheme for the direct minimization of the Ohta-Kawasaki energy that characterizes the microphase separation of diblock copolymer melts. The scheme employs a globally convergent modified Newton method with line search which is shown to be mass-conservative, energy-descending, asymptotically quadratically convergent, and three orders of magnitude more efficient than the commonly-used gradient flow approach. The regularity and the first-order condition of minimizers are analyzed. A numerical study of the chemical substrate guided directed self-assembly of diblock copolymer melts, based on a novel polymer-substrate interaction model and the proposed scheme, is provided.

[231] 2010.15272

The distribution of inhibitory neurons in the C. elegans connectome facilitates self-optimization of coordinated neural activity

The nervous system of the nematode soil worm Caenorhabditis elegans exhibits remarkable complexity despite the worm's small size. A general challenge is to better understand the relationship between neural organization and neural activity at the system level, including the functional roles of inhibitory connections. Here we implemented an abstract simulation model of the C. elegans connectome that approximates the neurotransmitter identity of each neuron, and we explored the functional role of these physiological differences for neural activity. In particular, we created a Hopfield neural network in which all of the worm's neurons characterized by inhibitory neurotransmitters are assigned inhibitory outgoing connections. Then, we created a control condition in which the same number of inhibitory connections are arbitrarily distributed across the network. A comparison of these two conditions revealed that the biological distribution of inhibitory connections facilitates the self-optimization of coordinated neural activity compared with an arbitrary distribution of inhibitory connections.
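
The inhibitory-assignment manipulation can be sketched on a generic Hopfield-style network: force every outgoing weight of the designated inhibitory units to be negative, then iterate threshold updates. The random weights and synchronous sign-update rule are illustrative simplifications of the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_inhib = 20, 6

# Random symmetric weights, then make all outgoing connections of the
# first n_inhib units negative -- an illustrative analogue of assigning
# inhibitory neurotransmitter identity to those neurons.
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
W[:n_inhib, :] = -np.abs(W[:n_inhib, :])   # rows = outgoing connections

def step(state, W):
    """Synchronous sign update: input to unit j is sum_i W[i, j] * s_i."""
    return np.where(W.T @ state >= 0, 1.0, -1.0)

state = rng.choice([-1.0, 1.0], size=n)
for _ in range(10):
    state = step(state, W)
```

The control condition in the paper instead scatters the same number of negative connections arbitrarily across `W`, which is what the comparison isolates.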

[232] 2010.15289

Link inference of noisy delay-coupled networks: Machine learning and opto-electronic experimental tests

We devise a machine learning technique to solve the general problem of inferring network links that have time-delays. The goal is to do this purely from time-series data of the network nodal states. This task has applications in fields ranging from applied physics and engineering to neuroscience and biology. To achieve this, we first train a type of machine learning system known as reservoir computing to mimic the dynamics of the unknown network. We formulate and test a technique that uses the trained parameters of the reservoir system output layer to deduce an estimate of the unknown network structure. Our technique, by its nature, is non-invasive, but is motivated by the widely-used invasive network inference method whereby the responses to active perturbations applied to the network are observed and employed to infer network links (e.g., knocking down genes to infer gene regulatory networks). We test this technique on experimental and simulated data from delay-coupled opto-electronic oscillator networks. We show that the technique often yields very good results particularly if the system does not exhibit synchrony. We also find that the presence of dynamical noise can strikingly enhance the accuracy and ability of our technique, especially in networks that exhibit synchrony.

[233] 2010.15306

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target. The two-branch representation with a single network has to decide how to balance the two objectives during optimization, while using two networks dedicated to each task increases system complexity and network size. To address these problems, we propose an activity-coupled Cartesian DOA (ACCDOA) representation, which assigns a sound event activity to the length of the corresponding Cartesian DOA vector. The ACCDOA representation enables us to solve a SELD task with a single target and has two advantages: it avoids the need to balance the objectives and avoids an increase in model size. In experimental evaluations on the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation on SELD metrics with a smaller network size. The ACCDOA-based SELD system also performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
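
The representation itself is simple to state in code: the event activity is the length of the Cartesian DOA vector, so detection reduces to a norm threshold and localization to the normalized direction. A minimal encode/decode sketch (the 0.5 threshold is an assumed value; in practice the network regresses the vector directly):

```python
import numpy as np

def accdoa_encode(active, doa_unit):
    """ACCDOA target: event activity encoded as the length of the DOA vector."""
    return (1.0 if active else 0.0) * np.asarray(doa_unit, dtype=float)

def accdoa_decode(vec, threshold=0.5):
    """Recover activity (vector length vs. threshold) and DOA (direction)."""
    norm = np.linalg.norm(vec)
    active = norm > threshold
    doa = vec / norm if active else None
    return active, doa

v = accdoa_encode(True, [0.0, 0.6, 0.8])   # unit vector toward (0, 0.6, 0.8)
active, doa = accdoa_decode(v)
```

A full system predicts one such vector per sound class per frame, so SED and DOA share a single regression target.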

[234] 2010.15311

DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech

With the number of smart devices increasing, the demand for on-device text-to-speech (TTS) is growing rapidly. In recent years, many prominent end-to-end TTS methods have been proposed and have greatly improved the quality of synthesized speech. However, to ensure qualified speech, most TTS systems depend on large and complex neural network models, which are hard to deploy on-device. In this paper, a small-footprint, fast, stable network for on-device TTS is proposed, named DeviceTTS. DeviceTTS uses a duration predictor as a bridge between encoder and decoder to avoid the word skipping and repetition problems of Tacotron. Since model size is a key factor for on-device TTS, DeviceTTS uses the Deep Feedforward Sequential Memory Network (DFSMN) as its basic component. Moreover, to speed up inference, a mixed-resolution decoder is proposed to balance inference speed and speech quality. Experiments are conducted with the WORLD and LPCNet vocoders. With only 1.4 million model parameters and 0.099 GFLOPS, DeviceTTS achieves performance comparable with Tacotron and FastSpeech, and can meet the needs of most devices in practical applications.

[235] 2010.15322

Improvement of EAST Data Acquisition Configuration Management

The data acquisition console is an important component of the EAST data acquisition system, which provides unified data acquisition and long-term data storage for diagnostics. The console is used to manage the data acquisition configuration information and control the data acquisition workflow. It has been in development for many years, and with the increasing number of data acquisition nodes and the emergence of new control nodes, its configuration management function has become inadequate and is being updated. The upgraded data acquisition console, based on LabVIEW, is oriented to the data acquisition administrator and provides functions for managing data acquisition nodes, managing control nodes, setting and publishing configuration parameters, batch management, database backup, monitoring the status of data acquisition nodes, controlling the data acquisition workflow, and shot-simulation data acquisition tests. The upgraded console has been designed and is currently under testing.

[236] 2010.15347

Distance Invariant Sparse Autoencoder for Wireless Signal Strength Mapping

Wireless signal strength based localization can enable robust localization for robots using inexpensive sensors. For this, a location-to-signal-strength map has to be learned for each access point in the environment. Due to the ubiquity of wireless networks in most environments, this can result in tens or hundreds of maps. To reduce the dimensionality of this problem, we employ autoencoders, which are a popular unsupervised approach for feature extraction and data compression. In particular, we propose the use of sparse autoencoders that learn latent spaces preserving the relative distances between inputs. Distance invariance between input and latent spaces allows our system to learn compact representations that allow precise data reconstruction but also have a low impact on localization performance when using maps from the latent space rather than the input space. We demonstrate the feasibility of our approach by performing experiments in outdoor environments.
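
One plausible form of such a training objective combines reconstruction error, a sparsity penalty, and a pairwise-distance preservation term; the specific penalties and weights below are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def pairwise_dists(Z):
    """Euclidean distance matrix between the rows of Z."""
    d = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((d ** 2).sum(-1))

def distance_invariant_loss(x, z, x_rec, lam=1.0, rho=0.01):
    """Sketch of a distance-invariant sparse autoencoder objective:
    reconstruction error + sparsity penalty on the code + a term
    penalizing mismatches between input-space and latent-space
    pairwise distances (weights lam, rho assumed)."""
    recon = ((x - x_rec) ** 2).mean()
    sparsity = np.abs(z).mean()
    dist = ((pairwise_dists(x) - pairwise_dists(z)) ** 2).mean()
    return recon + rho * sparsity + lam * dist

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 32))          # e.g. signal-strength map patches
z = x[:, :4]                           # stand-in for an encoder output
x_rec = x + 0.1 * rng.normal(size=x.shape)
loss = distance_invariant_loss(x, z, x_rec)
```

The distance term is what lets latent-space maps substitute for input-space maps in the localization pipeline with little accuracy loss.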

[237] 2010.15352

An automated and multi-parametric algorithm for objective analysis of meibography images

Meibography is a non-contact imaging technique used by ophthalmologists to assist in the evaluation and diagnosis of meibomian gland dysfunction (MGD). Manual qualitative analysis of meibography images suffers from low repeatability and efficiency, and multi-parametric analysis is needed to offer more comprehensive information for discovering subtle changes in the meibomian glands during MGD progression. We therefore developed an automated, multi-parametric algorithm for objective and quantitative analysis of meibography images. The algorithm comprises three steps: (1) segmentation of the tarsal conjunctiva area as the region of interest (ROI); (2) segmentation and identification of glands within the ROI; and (3) quantitative multi-parametric analysis, including the newly defined gland diameter deformation index (DI), gland tortuosity index (TI), and gland signal index (SI). To evaluate the performance of the automated algorithm, the similarity index (k) and the segmentation errors, namely the false positive rate (r_P) and the false negative rate (r_N), are calculated between the manually defined ground truth and the automatic segmentations of both the ROI and the meibomian glands for 15 typical meibography images. The feasibility of the algorithm is demonstrated by analyzing typical meibography images.

[238] 2010.15358

A stochastic optimization algorithm for analyzing planar central and balanced configurations in the $n$-body problem

A stochastic optimization algorithm for analyzing planar central and balanced configurations in the $n$-body problem is presented. We find a comprehensive list of equal mass central configurations satisfying the Morse equality up to $n=12$. We show some exemplary balanced configurations in the case $n=5$, as well as some balanced configurations without any axis of symmetry in the cases $n=4$ and $n=10$.

[239] 2010.15376

Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth

Sparse signal recovery problems from noisy linear measurements appear in many areas of wireless communications. In recent years, deep learning (DL) based approaches have attracted the interest of researchers in solving the sparse linear inverse problem by unfolding iterative algorithms as neural networks. Typically, research concerning DL assumes a fixed number of network layers. However, this ignores a key characteristic of traditional iterative algorithms: the number of iterations required for convergence changes with varying sparsity levels. By investigating projected gradient descent, we unveil the drawbacks of existing DL methods with fixed depth. We then propose an end-to-end trainable DL architecture that computes an extra halting score at each layer. The proposed method thus learns how many layers to execute before emitting an output, and the network depth is dynamically adjusted for each task in the inference phase. We conduct experiments using both synthetic data and applications including random access in massive MTC and massive MIMO channel estimation, and the results demonstrate the improved efficiency of the proposed approach.
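
The adaptive-depth idea can be sketched with an unfolded ISTA-style iteration that halts per input; here a simple update-norm test stands in for the learned halting score of the paper:

```python
import numpy as np

def ista_adaptive(y, A, lam=0.1, max_layers=50, eps=1e-4):
    """Unfolded-ISTA-style sketch with a per-layer halting test: stop as
    soon as the iterate stops changing, so easier (sparser) problems use
    fewer layers. A learned halting score would replace the norm test."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / spectral norm squared
    x = np.zeros(A.shape[1])
    for layer in range(1, max_layers + 1):
        g = x - step * A.T @ (A @ x - y)                        # gradient step
        x_new = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # shrink
        if np.linalg.norm(x_new - x) < eps:                     # halting test
            return x_new, layer
        x = x_new
    return x, max_layers

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[:3] = [1.0, -2.0, 1.5]        # sparse ground truth
y = A @ x_true
x_hat, depth = ista_adaptive(y, A)
```

In the trained architecture each layer has its own parameters and emits a halting probability, but the control flow is the same: depth varies with the input.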

[240] 2010.15379

The Performance Analysis of Generalized Margin Maximizer (GMM) on Separable Data

Logistic models are commonly used for binary classification tasks. The success of such models has often been attributed to their connection to maximum-likelihood estimators. It has been shown that the gradient descent algorithm, when applied to the logistic loss, converges to the max-margin classifier (a.k.a. hard-margin SVM). The performance of the max-margin classifier has been recently analyzed. Inspired by these results, in this paper, we present and study a more general setting, where the underlying parameters of the logistic model possess certain structures (sparse, block-sparse, low-rank, etc.) and introduce a more general framework (which is referred to as "Generalized Margin Maximizer", GMM). While classical max-margin classifiers minimize the $2$-norm of the parameter vector subject to linearly separating the data, GMM minimizes any arbitrary convex function of the parameter vector. We provide a precise analysis of the performance of GMM via the solution of a system of nonlinear equations. We also provide a detailed study for three special cases: ($1$) $\ell_2$-GMM that is the max-margin classifier, ($2$) $\ell_1$-GMM which encourages sparsity, and ($3$) $\ell_{\infty}$-GMM which is often used when the parameter vector has binary entries. Our theoretical results are validated by extensive simulation results across a range of parameter values, problem instances, and model structures.
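
For the $\ell_1$-GMM special case, the optimization is a linear program: minimize $\|w\|_1$ subject to $y_i \langle x_i, w\rangle \ge 1$ on separable data. A sketch using the standard positive/negative split $w = p - q$ (scipy's LP solver assumed available):

```python
import numpy as np
from scipy.optimize import linprog

def l1_gmm(X, y):
    """l1-GMM sketch: minimize ||w||_1 subject to y_i <x_i, w> >= 1 on
    linearly separable data, via the LP split w = p - q with p, q >= 0."""
    n, d = X.shape
    c = np.ones(2 * d)                       # ||w||_1 = sum(p) + sum(q)
    Ayx = y[:, None] * X
    A_ub = np.hstack([-Ayx, Ayx])            # -(y_i x_i).p + (y_i x_i).q <= -1
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    p, q = res.x[:d], res.x[d:]
    return p - q

rng = np.random.default_rng(0)
w_star = np.array([2.0, 0.0, 0.0, -1.0])     # sparse ground truth
X = rng.normal(size=(40, 4))
y = np.sign(X @ w_star)                      # separable labels by construction
w = l1_gmm(X, y)
```

Replacing the $\ell_1$ objective with $\|w\|_2^2$ or $\|w\|_\infty$ recovers the other two special cases analyzed in the paper.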

[241] 2010.15417

ProCAN: Progressive Growing Channel Attentive Non-Local Network for Lung Nodule Classification

Lung cancer classification in screening computed tomography (CT) scans is one of the most crucial tasks for early detection of this disease. Many lives can be saved if we are able to accurately classify malignant/cancerous lung nodules. Consequently, several deep learning based models have been proposed recently to classify lung nodules as malignant or benign. Nevertheless, the large variation in the size and heterogeneous appearance of the nodules makes this task an extremely challenging one. We propose a new Progressive Growing Channel Attentive Non-Local (ProCAN) network for lung nodule classification. The proposed method addresses this challenge from three different aspects. First, we enrich the Non-Local network by adding channel-wise attention capability to it. Second, we apply Curriculum Learning principles, whereby we first train our model on easy examples before hard/difficult ones. Third, as the classification task gets harder during Curriculum Learning, our model is progressively grown to increase its capability of handling the task at hand. We examined our proposed method on two different public datasets and compared its performance with state-of-the-art methods in the literature. The results show that the ProCAN model outperforms state-of-the-art methods and achieves an AUC of 98.05% and an accuracy of 95.28% on the LIDC-IDRI dataset. Moreover, we conducted extensive ablation studies to analyze the contribution and effects of each new component of our proposed method.

[242] 2010.15425

Detection of asteroid trails in Hubble Space Telescope images using Deep Learning

We present an application of Deep Learning for the image recognition of asteroid trails in single-exposure photos taken by the Hubble Space Telescope. Using algorithms based on multi-layered deep Convolutional Neural Networks, we report accuracies of above 80% on the validation set. Our project was motivated by the Hubble Asteroid Hunter project on Zooniverse, which focused on identifying these objects in order to localize and better characterize them. We aim to demonstrate that Machine Learning techniques can be very useful in trying to solve problems that are closely related to Astronomy and Astrophysics, but that they are still not developed enough for very specific tasks.

[243] 2010.15427

Sparse Signal Reconstruction for Nonlinear Models via Piecewise Rational Optimization

We propose a method to reconstruct sparse signals degraded by a nonlinear distortion and acquired at a limited sampling rate. Our method formulates the reconstruction problem as a nonconvex minimization of the sum of a data fitting term and a penalization term. In contrast with most previous works, which settle for approximate local solutions, we seek a global solution to the obtained challenging nonconvex problem. Our global approach relies on the so-called Lasserre relaxation of polynomial optimization. Here we specifically include in our approach the case of piecewise rational functions, which makes it possible to address a wide class of nonconvex exact and continuous relaxations of the $\ell_0$ penalization function. Additionally, we study the complexity of the optimization problem. It is shown how to use the structure of the problem to lighten the computational burden efficiently. Finally, numerical simulations illustrate the benefits of our method in terms of both global optimality and signal reconstruction.

[244] 2010.15438

Modeling and Control of COVID-19 Epidemic through Testing Policies

Testing for infected cases is one of the most important mechanisms for controlling an epidemic. It enables the isolation of detected infected individuals, thereby limiting disease transmission to the susceptible population. However, despite the significance of testing policies, the recent literature on the subject lacks a control-theoretic perspective. In this work, an epidemic model that incorporates the testing rate as a control input is presented. The proposed model differentiates undetected infected cases from detected infected cases, which are assumed to be removed from the disease spreading process in the population. First, the model is estimated and validated on COVID-19 data from France. Then, two testing policies are proposed: the so-called best-effort strategy for testing (BEST) and the constant optimal strategy for testing (COST). The BEST policy is a suppression strategy that provides a lower bound on the testing rate such that the epidemic switches from a spreading to a non-spreading state. The COST policy is a mitigation strategy that provides an optimal value of the testing rate that minimizes the peak value of the infected population when the total stockpile of tests is limited. Both testing policies are evaluated by predicting the number of active intensive care unit (ICU) cases and the cumulative number of deaths due to COVID-19.
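The qualitative effect of a testing rate as a control input can be illustrated with a toy compartmental model in the spirit of the abstract: susceptible S, undetected infected I, detected (isolated) infected D, and removed R, where only undetected cases transmit. All parameter values below are illustrative, not the estimates fitted to the French data in the paper.

```python
import numpy as np

def peak_undetected(beta=0.3, gamma=0.1, test_rate=0.0, days=300, dt=0.1):
    """Euler simulation of a toy S/I/D/R model with a testing control input.

    test_rate moves undetected infected (I) into the detected, isolated
    compartment (D), so only I drives new infections. Returns the peak
    fraction of undetected infected over the horizon.
    """
    S, I, D, R = 0.99, 0.01, 0.0, 0.0
    peak = I
    for _ in range(int(days / dt)):
        dS = -beta * S * I
        dI = beta * S * I - gamma * I - test_rate * I
        dD = test_rate * I - gamma * D
        dR = gamma * (I + D)
        S += dt * dS
        I += dt * dI
        D += dt * dD
        R += dt * dR
        peak = max(peak, I)
    return peak
```

Raising `test_rate` lowers the peak of the undetected infected population, which is the mechanism the COST policy exploits when minimizing the epidemic peak under a limited test stockpile.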

[245] 2010.15440

FlatNet: Towards Photorealistic Scene Reconstruction from Lensless Measurements

Lensless imaging has emerged as a potential solution towards realizing ultra-miniature cameras by eschewing the bulky lens in a traditional camera. Without a focusing lens, the lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, the current iterative-optimization-based reconstruction algorithms produce noisier and perceptually poorer images. In this work, we propose a non-iterative deep learning based reconstruction approach that results in orders of magnitude improvement in image quality for lensless reconstructions. Our approach, called $\textit{FlatNet}$, lays down a framework for reconstructing high-quality photorealistic images from mask-based lensless cameras, where the camera's forward model formulation is known. FlatNet consists of two stages: (1) an inversion stage that maps the measurement into a space of intermediate reconstruction by learning parameters within the forward model formulation, and (2) a perceptual enhancement stage that improves the perceptual quality of this intermediate reconstruction. These stages are trained together in an end-to-end manner. We show high-quality reconstructions by performing extensive experiments on real and challenging scenes using two different types of lensless prototypes: one which uses a separable forward model and another, which uses a more general non-separable cropped-convolution model. Our end-to-end approach is fast, produces photorealistic reconstructions, and is easy to adopt for other mask-based lensless cameras.

[246] 2010.15446

Progressive Voice Trigger Detection: Accuracy vs Latency

We present an architecture for voice trigger detection for virtual assistants. The main idea in this work is to exploit information in words that immediately follow the trigger phrase. We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision. However, waiting to listen to more audio each time incurs a latency increase. Progressive Voice Trigger Detection allows us to trade-off latency and accuracy by accepting clear trigger candidates quickly, but waiting for more context to decide whether to accept more marginal examples. Using a two-stage architecture, we show that by delaying the decision for just 3% of detected true triggers in the test set, we are able to obtain a relative improvement of 66% in false rejection rate, while incurring only a negligible increase in latency.

[247] 2010.15490

Linearizing Combinators

In 2017, Bauer, Johnson, Osborne, Riehl, and Tebbe (BJORT) showed that the Abelian functor calculus provides an example of a Cartesian differential category. The definition of a Cartesian differential category is based on a differential combinator which directly formalizes the total derivative from multivariable calculus. However, in the aforementioned work the authors used techniques from Goodwillie's functor calculus to establish a linearization process from which they then derived a differential combinator. This raised the question of what the precise relationship between linearization and having a differential combinator might be. In this paper, we introduce the notion of a linearizing combinator which abstracts linearization in the Abelian functor calculus. We then use it to provide an alternative axiomatization of a Cartesian differential category. Every Cartesian differential category comes equipped with a canonical linearizing combinator obtained by differentiation at zero. Conversely, a differential combinator can be constructed \`a la BJORT when one has a system of partial linearizing combinators in each context. Thus, while linearizing combinators do provide an alternative axiomatization of Cartesian differential categories, an explicit notion of partial linearization is required. This is in contrast to the situation for differential combinators where partial differentiation is automatic in the presence of total differentiation. The ability to form a system of partial linearizing combinators from a total linearizing combinator, while not being possible in general, is possible when the setting is Cartesian closed.

[248] 2010.15491

A Novel Fast 3D Single Image Super-Resolution Algorithm

This paper introduces a novel computationally efficient method of solving the 3D single image super-resolution (SR) problem, i.e., reconstruction of a high-resolution volume from its low-resolution counterpart. The main contribution lies in the original way of handling simultaneously the associated decimation and blurring operators, based on their underlying properties in the frequency domain. In particular, the proposed decomposition technique of the 3D decimation operator allows a straightforward implementation for Tikhonov regularization, and can be further used to take into consideration other regularization functions such as the total variation, enabling the computational cost of state-of-the-art algorithms to be considerably decreased. Numerical experiments carried out showed that the proposed approach outperforms existing 3D SR methods.

[249] 2010.15508

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Here, full-band and sub-band refer to models that take as input the full-band and sub-band noisy spectral features and output the full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently: its input consists of one frequency and several context frequencies, and its output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and the long-distance cross-band dependencies, but it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model is just the opposite. In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of these two types of models. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary, and that FullSubNet can effectively integrate them. Moreover, the performance of FullSubNet exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
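The sub-band input described above (one frequency bin plus several context bins on each side) can be sketched as a simple preprocessing step. This is a minimal illustration of the input layout, not the paper's implementation; the zero-padding at the spectrum edges is an assumption.

```python
import numpy as np

def subband_inputs(spec, n_ctx=2):
    """Build per-frequency sub-band inputs from a (freq, time) spectrogram.

    For each frequency bin, stack the bin itself with n_ctx neighbouring
    bins on each side (edge bins are zero-padded). Output shape:
    (freq, 2 * n_ctx + 1, time) -- one local band per frequency.
    """
    F, T = spec.shape
    padded = np.pad(spec, ((n_ctx, n_ctx), (0, 0)))
    return np.stack([padded[f:f + 2 * n_ctx + 1] for f in range(F)])
```

Each of the `F` slices is what a sub-band model would process independently, while a full-band model sees the whole `(F, T)` spectrogram at once.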

[250] 2010.15511

An Exact Solution Path Algorithm for SLOPE and Quasi-Spherical OSCAR

Sorted $L_1$ penalization estimator (SLOPE) is a regularization technique for sorted absolute coefficients in high-dimensional regression. By arbitrarily setting its regularization weights $\lambda$ under the monotonicity constraint, SLOPE can have various feature selection and clustering properties. On weight tuning, the selected features and their clusters are very sensitive to the tuning parameters. Moreover, the exhaustive tracking of their changes is difficult using grid search methods. This study presents a solution path algorithm that provides the complete and exact path of solutions for SLOPE in fine-tuning regularization weights. A simple optimality condition for SLOPE is derived and used to specify the next splitting point of the solution path. This study also proposes a new design of a regularization sequence $\lambda$ for feature clustering, which is called the quasi-spherical and octagonal shrinkage and clustering algorithm for regression (QS-OSCAR). QS-OSCAR is designed with a contour surface of the regularization terms most similar to a sphere. Among several regularization sequence designs, sparsity and clustering performance are compared through simulation studies. The numerical observations show that QS-OSCAR performs feature clustering more efficiently than other designs.
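The sorted-$L_1$ penalty at the heart of SLOPE is easy to state concretely: the absolute coefficients are sorted in decreasing order and paired with a non-increasing weight sequence $\lambda$. A minimal sketch (the monotonicity check mirrors the constraint mentioned in the abstract):

```python
import numpy as np

def slope_penalty(beta, lam):
    """SLOPE penalty: sum_i lam_i * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ...

    lam must be non-increasing; with all weights equal, the penalty reduces
    to an ordinary lasso (l1) penalty.
    """
    abs_sorted = np.sort(np.abs(beta))[::-1]
    lam = np.asarray(lam, dtype=float)
    assert np.all(np.diff(lam) <= 0), "weights must be non-increasing"
    return float(abs_sorted @ lam)
```

Because the largest coefficients receive the largest weights, distinct choices of the sequence $\lambda$ (such as the QS-OSCAR design proposed in the paper) yield different clustering behaviour among the coefficients.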

[251] 2010.15521

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

Speech enhancement at extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem and rarely investigated in previous works. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to deal with this problem. This approach consists of a generator network and a discriminator network, which operate directly in the time domain. The generator network adopts a U-Net-like structure and employs dilated convolution in its bottleneck. We evaluate the performance of UNetGAN at low SNR conditions (down to -20 dB) on the public benchmark. The result demonstrates that it significantly improves the speech quality and substantially outperforms the representative deep learning models, including SEGAN, cGAN for SE, Bidirectional LSTM using the phase-sensitive spectrum approximation cost function (PSA-BLSTM) and Wave-U-Net, regarding Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ).

[252] 2010.15526

A comparison of automatic multi-tissue segmentation methods of the human fetal brain using the FeTA Dataset

It is critical to quantitatively analyse the developing human fetal brain in order to fully understand neurodevelopment in both normal fetuses and those with congenital disorders. To facilitate this analysis, automatic multi-tissue fetal brain segmentation algorithms are needed, which in turn requires open databases of segmented fetal brains. Here we introduce a publicly available database of 50 manually segmented pathological and non-pathological fetal magnetic resonance brain volume reconstructions across a range of gestational ages (20 to 33 weeks) into 7 different tissue categories (external cerebrospinal fluid, grey matter, white matter, ventricles, cerebellum, deep grey matter, brainstem/spinal cord). In addition, we quantitatively evaluate the accuracy of several automatic multi-tissue segmentation algorithms of the developing human fetal brain. Four research groups participated, submitting a total of 10 algorithms, demonstrating the benefits of the database for the development of automatic algorithms.

[253] 2010.15527

On the robustness of kernel-based pairwise learning

It is shown that many results on the statistical robustness of kernel-based pairwise learning can be derived under basically no assumptions on the input and output spaces. In particular neither moment conditions on the conditional distribution of Y given X = x nor the boundedness of the output space is needed. We obtain results on the existence and boundedness of the influence function and show qualitative robustness of the kernel-based estimator. The present paper generalizes results by Christmann and Zhou (2016) by allowing the prediction function to take two arguments and can thus be applied in a variety of situations such as ranking.

[254] 2010.15538

Matern Gaussian Processes on Graphs

Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Mat\'{e}rn Gaussian processes - a widely-used model class in the Euclidean setting - to study their analog for undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Mat\'{e}rn Gaussian processes to be employed in mini-batch and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.
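The SPDE characterization mentioned above leads to a concrete recipe for a graph Matérn covariance: apply a spectral function of the graph Laplacian. A minimal sketch under the assumption that the covariance takes the form $(2\nu/\kappa^2 I + L)^{-\nu}$, with $\nu$ and $\kappa$ as illustrative hyperparameters:

```python
import numpy as np

def graph_matern_kernel(L, nu=2.0, kappa=1.0):
    """Graph Matern covariance via the Laplacian's eigendecomposition.

    L is the (symmetric) graph Laplacian. Each Laplacian eigenvalue lam is
    mapped to (2 * nu / kappa**2 + lam) ** (-nu), so smoother eigenvectors
    (small lam) contribute more variance, by analogy with the Euclidean case.
    """
    evals, evecs = np.linalg.eigh(L)
    transformed = (2.0 * nu / kappa**2 + evals) ** (-nu)
    return evecs @ np.diag(transformed) @ evecs.T
```

The resulting matrix is symmetric positive definite, so it can be dropped into standard GP machinery (including the inducing-point and mini-batch schemes the abstract mentions).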

[255] 2010.15541

Micromagnetics of thin films in the presence of Dzyaloshinskii-Moriya interaction

In this paper, we study the thin-film limit of the micromagnetic energy functional in the presence of bulk Dzyaloshinskii-Moriya interaction (DMI). Our analysis includes both a stationary $\Gamma$-convergence result for the micromagnetic energy, as well as the identification of the asymptotic behavior of the associated Landau-Lifshitz-Gilbert equation. In particular, we prove that, in the limiting model, part of the DMI term behaves like the projection of the magnetic moment onto the normal to the film, contributing this way to an increase in the shape anisotropy arising from the magnetostatic self-energy. Finally, we discuss a convergent finite element approach for the approximation of the time-dependent case and use it to numerically compare the original three-dimensional model with the two-dimensional thin-film limit.

[256] 2010.15551

Investigating the Robustness of Artificial Intelligent Algorithms with Mixture Experiments

Artificial intelligence (AI) algorithms, such as deep learning and XGBoost, are used in numerous applications including computer vision, autonomous driving, and medical diagnostics. The robustness of these AI algorithms is of great interest, as inaccurate predictions could result in safety concerns and limit the adoption of AI systems. In this paper, we propose a framework based on design of experiments to systematically investigate the robustness of AI classification algorithms. A robust classification algorithm is expected to have high accuracy and low variability under different application scenarios. The robustness can be affected by a wide range of factors such as the imbalance of class labels in the training dataset, the chosen prediction algorithm, the chosen dataset of the application, and a change of distribution in the training and test datasets. To investigate the robustness of AI classification algorithms, we conduct a comprehensive set of mixture experiments to collect prediction performance results. Then statistical analyses are conducted to understand how various factors affect the robustness of AI classification algorithms. We summarize our findings and provide suggestions to practitioners in AI applications.

[257] 2010.15560

Genetic U-Net: Automatically Designing Lightweight U-shaped CNN Architectures Using the Genetic Algorithm for Retinal Vessel Segmentation

Many previous works based on deep learning for retinal vessel segmentation have achieved promising performance by manually designing U-shaped convolutional neural networks (CNNs). However, the manual design of these CNNs is time-consuming and requires extensive empirical knowledge. To address this problem, we propose a novel method using genetic algorithms (GAs) to automatically design a lightweight U-shaped CNN for retinal vessel segmentation, called Genetic U-Net. We first design a special search space containing the structure of U-Net and its corresponding operations, and then use a genetic algorithm to search for superior architectures in this search space. Experimental results show that the proposed method outperforms the existing methods on three public datasets: DRIVE, CHASE_DB1 and STARE. In addition, the architectures obtained by the proposed method are more lightweight but more robust than the state-of-the-art models.

[258] 2010.15586

Event-Driven Learning of Systematic Behaviours in Stock Markets

It is reported that financial news, especially financial events expressed in news, informs investors' long/short decisions and influences the movements of stock markets. Motivated by this, we leverage financial event streams to train a classification neural network that detects latent event-stock linkages and stock markets' systematic behaviours in the U.S. stock market. Our proposed pipeline includes (1) a combined event extraction method that utilizes Open Information Extraction and neural co-reference resolution, (2) a BERT/ALBERT enhanced representation of events, and (3) an extended hierarchical attention network that includes attentions on event, news and temporal levels. Our pipeline achieves significantly better accuracies and higher simulated annualized returns than state-of-the-art models when applied to predicting the Standard & Poor's 500, Dow Jones and Nasdaq indices and 10 individual stocks.

[259] 2010.15618

Sampling and Reconstruction of Sparse Signals in Shift-Invariant Spaces: Generalized Shannon's Theorem Meets Compressive Sensing

This paper introduces a novel framework and corresponding methods for sampling and reconstruction of sparse signals in shift-invariant (SI) spaces. We reinterpret the random demodulator, a system that acquires sparse bandlimited signals, as a system for acquisition of linear combinations of the samples in the SI setting with the box function as the sampling kernel. The sparsity assumption is exploited by compressive sensing (CS) framework for recovery of the SI samples from a reduced set of measurements. The samples are subsequently filtered by a discrete-time correction filter in order to reconstruct expansion coefficients of an observed signal. Furthermore, we offer a generalization of the proposed framework to other sampling kernels that lie in arbitrary SI spaces. The generalized method embeds the correction filter in a CS optimization problem which directly reconstructs expansion coefficients of the signal. Both approaches recast an inherently infinite-dimensional inverse problem as a finite-dimensional CS problem in an exact way. Finally, we conduct numerical experiments on signals in B-spline spaces whose expansion coefficients are assumed to be sparse in a certain transform domain. The coefficients can be regarded as parametric models of an underlying continuous signal, obtained from a reduced set of measurements. Such continuous signal representations are particularly suitable for signal processing without converting them into samples.

[260] 2010.15622

Low-Variance Policy Gradient Estimation with World Models

In this paper, we propose World Model Policy Gradient (WMPG), an approach to reduce the variance of policy gradient estimates using learned world models (WMs). In WMPG, a WM is trained online and used to imagine trajectories. The imagined trajectories are used in two ways: first, to calculate a without-replacement estimator of the policy gradient; second, their return is used as an informed baseline. We compare the proposed approach with AC and MAC on a set of environments of increasing complexity (CartPole, LunarLander and Pong) and find that WMPG has better sample efficiency. Based on these results, we conclude that WMPG can yield increased sample efficiency in cases where a robust latent representation of the environment can be learned.

[261] 2010.15623

Fast Minimal Presentations of Bi-graded Persistence Modules

Multi-parameter persistent homology is a recent branch of topological data analysis. In this area, data sets are investigated through the lens of homology with respect to two or more scale parameters. The high computational cost of many algorithms calls for a preprocessing step to reduce the input size. In general, a minimal presentation is the smallest possible representation of a persistence module. Lesnick and Wright proposed recently an algorithm (the LW-algorithm) for computing minimal presentations based on matrix reduction. In this work, we propose, implement and benchmark several improvements over the LW-algorithm. Most notably, we propose the use of priority queues to avoid extensive scanning of the matrix columns, which constitutes the computational bottleneck in the LW-algorithm, and we combine their algorithm with ideas from the multi-parameter chunk algorithm by Fugacci and Kerber. Our extensive experiments show that our algorithm outperforms the LW-algorithm and computes the minimal presentation for data sets with millions of simplices within a few seconds. Our software is publicly available.

[262] 2010.15639

Teaching a GAN What Not to Learn

Generative adversarial networks (GANs) were originally envisioned as unsupervised generative models that learn to follow a target distribution. Variants such as conditional GANs, auxiliary-classifier GANs (ACGANs) project GANs on to supervised and semi-supervised learning frameworks by providing labelled data and using multi-class discriminators. In this paper, we approach the supervised GAN problem from a different perspective, one that is motivated by the philosophy of the famous Persian poet Rumi who said, "The art of knowing is knowing what to ignore." In the GAN framework, we not only provide the GAN positive data that it must learn to model, but also present it with so-called negative samples that it must learn to avoid - we call this "The Rumi Framework." This formulation allows the discriminator to represent the underlying target distribution better by learning to penalize generated samples that are undesirable - we show that this capability accelerates the learning process of the generator. We present a reformulation of the standard GAN (SGAN) and least-squares GAN (LSGAN) within the Rumi setting. The advantage of the reformulation is demonstrated by means of experiments conducted on MNIST, Fashion MNIST, CelebA, and CIFAR-10 datasets. Finally, we consider an application of the proposed formulation to address the important problem of learning an under-represented class in an unbalanced dataset. The Rumi approach results in substantially lower FID scores than the standard GAN frameworks while possessing better generalization capability.
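The "learning what to avoid" idea can be made concrete as a modification of the standard GAN discriminator loss: real positive samples are pushed toward 1, while both generated samples and the real negative samples are pushed toward 0. The sketch below is a plausible SGAN-style instantiation; the exact weighting of the terms in the paper may differ.

```python
import numpy as np

def rumi_discriminator_loss(d_pos, d_neg, d_fake, eps=1e-8):
    """Rumi-style SGAN discriminator loss (sketch).

    d_pos: discriminator sigmoid outputs on real positive samples (target 1).
    d_neg: outputs on real negative samples the generator must avoid (target 0).
    d_fake: outputs on generated samples (target 0).
    """
    return float(
        -np.mean(np.log(d_pos + eps))
        - np.mean(np.log(1.0 - d_neg + eps))
        - np.mean(np.log(1.0 - d_fake + eps))
    )
```

Because negative samples are penalized like fakes, the discriminator's gradient actively steers the generator away from the undesirable region of the data distribution, which is the acceleration effect described above.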

[263] 2010.15647

Brain Tumor Segmentation Network Using Attention-based Fusion and Spatial Relationship Constraint

Delineating the brain tumor from magnetic resonance (MR) images is critical for the treatment of gliomas. However, automatic delineation is challenging due to the complex appearance and ambiguous outlines of tumors. Considering that multi-modal MR images can reflect different tumor biological properties, we develop a novel multi-modal tumor segmentation network (MMTSN) to robustly segment brain tumors based on multi-modal MR images. The MMTSN is composed of three sub-branches and a main branch. Specifically, the sub-branches are used to capture different tumor features from multi-modal images, while in the main branch, we design a spatial-channel fusion block (SCFB) to effectively aggregate multi-modal features. Additionally, inspired by the fact that the spatial relationship between sub-regions of the tumor is relatively fixed, e.g., the enhancing tumor is always in the tumor core, we propose a spatial loss to constrain the relationship between different sub-regions of the tumor. We evaluate our method on the test set of the multi-modal brain tumor segmentation challenge 2020 (BraTS 2020). The method achieves Dice scores of 0.8764, 0.8243 and 0.773 for the whole tumor, tumor core and enhancing tumor, respectively.

[264] 2010.15654

Identification of complex mixtures for Raman spectroscopy using a novel scheme based on a new multi-label deep neural network

With the noisy environment caused by fluorescence and additive white noise as well as complicated spectrum fingerprints, the identification of complex mixture materials remains a major challenge in Raman spectroscopy applications. In this paper, we propose a new scheme based on a continuous wavelet transform (CWT) and a deep network for classifying complex mixtures. The scheme first transforms the noisy Raman spectrum to a two-dimensional scale map using the CWT. A multi-label deep neural network model (MDNN) is then applied for classifying materials. The proposed model accelerates the feature extraction and expands the feature graph using the global averaging pooling layer. The Sigmoid function is implemented in the last layer of the model. The MDNN model was trained, validated and tested with data collected from samples prepared from substances in palm oil. During the training and validation process, data augmentation is applied to overcome the imbalance of the data and enrich the diversity of the Raman spectra. The test results show that the MDNN model outperforms previously proposed deep neural network models in terms of Hamming loss, one error, coverage, ranking loss, average precision, F1 macro averaging and F1 micro averaging. The average detection time obtained from our model is 5.31 s, which is much faster than the detection time of the previously proposed models.

[265] 2010.15658

Generalization bounds for deep thresholding networks

We consider compressive sensing in the scenario where the sparsity basis (dictionary) is not known in advance, but needs to be learned from examples. Motivated by the well-known iterative soft thresholding algorithm for the reconstruction, we define deep networks parametrized by the dictionary, which we call deep thresholding networks. Based on training samples, we aim at learning the optimal sparsifying dictionary and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. The dictionary learning is performed via minimizing the empirical risk. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks. We obtain estimates of the sample complexity that depend only linearly on the dimensions and on the depth.

[266] 2010.15662

Independence Tests Without Ground Truth for Noisy Learners

Exact ground truth invariant polynomial systems can be written for arbitrarily correlated binary classifiers. Their solutions give estimates for sample statistics that require knowledge of the ground truth of the correct labels in the sample. Of these polynomial systems, only a few have been solved in closed form. Here we discuss the exact solution for independent binary classifiers - resolving an outstanding problem that has been presented at this conference and others. Its practical applicability is hampered by its sole remaining assumption - the classifiers need to be independent in their sample errors. We discuss how to use the closed form solution to create a self-consistent test that can validate the independence assumption itself absent the correct labels ground truth. It can be cast as an algebraic geometry conjecture for binary classifiers that remains unsolved. A similar conjecture for the ground truth invariant algebraic system for scalar regressors is solvable, and we present the solution here. We also discuss experiments on the Penn ML Benchmark classification tasks that provide further evidence that the conjecture may be true for the polynomial system of binary classifiers.

[267] 2010.15672

FD Cell-Free mMIMO: Analysis and Optimization

We consider a full-duplex cell-free massive multiple-input-multiple-output system with limited-capacity fronthaul links. We derive its downlink/uplink closed-form spectral efficiency (SE) lower bounds with maximum-ratio transmission/maximum-ratio combining and optimal uniform quantization. To reduce the carbon footprint, this paper maximizes the non-convex weighted sum energy efficiency (WSEE) via downlink and uplink power control within a successive convex approximation framework. We show that with low fronthaul capacity, the system requires a higher number of fronthaul quantization bits to achieve high SE and WSEE. For high fronthaul capacity, however, a higher number of bits achieves high SE but a reduced WSEE.

[268] 2010.15679

Lie-Trotter Splitting for the Nonlinear Stochastic Manakov System

This article analyses the convergence of the Lie-Trotter splitting scheme for the stochastic Manakov equation, a system arising in the study of pulse propagation in randomly birefringent optical fibers. First, we prove that the strong order of the numerical approximation is 1/2 if the nonlinear term in the system is globally Lipschitz. Then, we show that the splitting scheme has convergence order 1/2 in probability and almost sure order 1/2- in the case of a cubic nonlinearity. We provide several numerical experiments illustrating the aforementioned results and the efficiency of the Lie-Trotter splitting scheme. Finally, we numerically investigate the possible blowup of solutions for some power-law nonlinearities.
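For readers unfamiliar with the scheme, the sketch below shows Lie-Trotter splitting on a deterministic linear toy problem du/dt = (A + B)u, where both sub-flows are solved exactly; this is a stand-in only, since the Manakov system's sub-flows are a stochastic linear step and a cubic nonlinear step.

```python
import numpy as np

def expm(M):
    # Matrix exponential via eigendecomposition (fine for small diagonalizable M).
    w, V = np.linalg.eig(M)
    return (V * np.exp(w)) @ np.linalg.inv(V)

def lie_trotter(A, B, u0, T, n):
    """n Lie-Trotter steps for du/dt = (A + B) u: compose the exact sub-flows
    e^{hB} e^{hA}; the local error is O(h^2 ||[A, B]||), giving order 1 overall."""
    h = T / n
    EA, EB = expm(h * A), expm(h * B)
    u = u0.astype(complex)
    for _ in range(n):
        u = EB @ (EA @ u)
    return u
```

When A and B do not commute, halving the step size roughly halves the error, the deterministic analogue of the order-1/2 strong convergence proved in the stochastic setting.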

[269] 2010.15682

Maximum a posteriori signal recovery for optical coherence tomography angiography image generation and denoising

Optical coherence tomography angiography (OCTA) is a novel and clinically promising imaging modality to image retinal and sub-retinal vasculature. Based on repeated optical coherence tomography (OCT) scans, intensity changes are observed over time and used to compute OCTA image data. OCTA data are prone to noise and artifacts caused by variations in flow speed and patient movement. We propose a novel iterative maximum a posteriori signal recovery algorithm in order to generate OCTA volumes with reduced noise and increased image quality. This algorithm is based on previous work on probabilistic OCTA signal models and maximum likelihood estimates. Reconstruction results using total variation minimization and wavelet shrinkage for regularization were compared against an OCTA ground truth volume, merged from six co-registered single OCTA volumes. The results show a significant improvement in peak signal-to-noise ratio and structural similarity. The presented algorithm brings together OCTA image generation and Bayesian statistics and can be developed into new OCTA image generation and denoising algorithms.
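The paper's MAP estimator builds on a specific probabilistic OCTA signal model; as a generic stand-in, the sketch below performs MAP denoising under a Gaussian likelihood with a smoothed total-variation prior (one of the two regularizers mentioned) via plain gradient descent. The parameters and boundary handling are illustrative assumptions.

```python
import numpy as np

def tv_map_denoise(y, lam=0.1, eps=1e-3, lr=0.2, iters=200):
    """Gradient descent on 0.5*||x - y||^2 + lam * sum sqrt(|grad x|^2 + eps)."""
    x = y.copy()
    for _ in range(iters):
        # Forward differences with a zero at the last row/column.
        dx = np.diff(x, axis=0, append=x[-1:, :])
        dy = np.diff(x, axis=1, append=x[:, -1:])
        mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
        px, py = dx / mag, dy / mag
        # Discrete divergence: adjoint of the forward-difference operator.
        div = px.copy()
        div[1:, :] -= px[:-1, :]
        divy = py.copy()
        divy[:, 1:] -= py[:, :-1]
        div += divy
        x -= lr * ((x - y) - lam * div)
    return x
```

Swapping the quadratic data term for the log-likelihood of an OCTA signal model turns this generic denoiser into a MAP recovery scheme of the kind the paper proposes.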

[270] 2010.15687

Deep Autofocus for Synthetic Aperture Sonar

Synthetic aperture sonar (SAS) requires precise positional and environmental information to produce well-focused output during the image reconstruction step. However, errors in these measurements are commonly present, resulting in defocused imagery. To overcome these issues, an \emph{autofocus} algorithm is employed as a post-processing step after image reconstruction for the purpose of improving image quality using the image content itself. These algorithms are usually iterative and metric-based in that they seek to optimize an image sharpness metric. In this letter, we demonstrate the potential of machine learning, specifically deep learning, to address the autofocus problem. We formulate the problem as a self-supervised, phase error estimation task using a deep network we call Deep Autofocus. Our formulation has the advantages of being non-iterative (and thus fast) and not requiring ground truth focused-defocused image pairs as often required by other deblurring deep learning methods. We compare our technique against a set of common sharpness metrics optimized using gradient descent over a real-world dataset. Our results demonstrate that Deep Autofocus can produce imagery that is perceptually as good as benchmark iterative techniques but at a substantially lower computational cost. We conclude that our proposed Deep Autofocus can provide a more favorable cost-quality trade-off than state-of-the-art alternatives with significant potential for future research.
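As context for the metric-based baselines, here is a toy sketch of sharpness-driven autofocus: a quadratic phase error is applied in the aperture (Fourier) domain, and a grid search maximizes an intensity-squared sharpness metric. Deep Autofocus instead regresses the phase error in a single forward pass; the model and parameters here are purely illustrative.

```python
import numpy as np

def apply_phase(img, phi):
    # Apply a 1-D phase screen along the aperture (row) dimension in k-space.
    K = np.fft.fft(img, axis=0)
    return np.fft.ifft(K * np.exp(1j * phi)[:, None], axis=0)

def sharpness(img):
    # Classic autofocus metric: sum of squared intensities (higher = sharper).
    return np.sum(np.abs(img) ** 4)

def autofocus_quadratic(img, coeffs):
    """Grid-search a quadratic phase-error coefficient that maximizes sharpness."""
    x = np.linspace(-1, 1, img.shape[0])
    best = max(coeffs, key=lambda c: sharpness(apply_phase(img, -c * x ** 2)))
    return apply_phase(img, -best * x ** 2), best
```

Each candidate coefficient requires a full correct-and-score pass, which is why such metric-based searches are slow compared to a single network evaluation.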

[271] 2010.15694

Learning interaction kernels in mean-field equations of 1st-order systems of interacting particles

We introduce a nonparametric algorithm to learn interaction kernels of mean-field equations for 1st-order systems of interacting particles. The data consist of discrete space-time observations of the solution. By least squares with regularization, the algorithm learns the kernel on data-adaptive hypothesis spaces efficiently. A key ingredient is a probabilistic error functional derived from the likelihood of the mean-field equation's diffusion process. The estimator converges, in a reproducing kernel Hilbert space and an L2 space under an identifiability condition, at a rate optimal in the sense that it equals the numerical integrator's order. We demonstrate our algorithm on three typical examples: opinion dynamics with a piecewise linear kernel, the granular media model with a quadratic kernel, and the aggregation-diffusion equation with a repulsive-attractive kernel.
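The paper's algorithm operates on the mean-field PDE; as a simplified finite-particle analogue, the sketch below fits a 1-D interaction kernel on a piecewise-constant hypothesis space by regularized least squares from observed velocities. The model form, names, and regularization here are assumptions for illustration only.

```python
import numpy as np

def fit_kernel(X, V, knots, lam=1e-6):
    """Least-squares fit of phi on piecewise-constant bins from positions X (T, N)
    and velocities V (T, N), assuming the 1st-order model
    dx_i/dt = (1/N) * sum_j phi(|x_j - x_i|) * (x_j - x_i)."""
    T, N = X.shape
    nb = len(knots) - 1
    G, y = [], []
    for t in range(T):
        for i in range(N):
            d = X[t] - X[t, i]            # displacements x_j - x_i (zero for j = i)
            r = np.abs(d)
            row = np.zeros(nb)
            for b in range(nb):
                mask = (r >= knots[b]) & (r < knots[b + 1])
                row[b] = np.sum(d[mask]) / N
            G.append(row)
            y.append(V[t, i])
    G, y = np.asarray(G), np.asarray(y)
    # Ridge-regularized normal equations on the binned design matrix.
    return np.linalg.solve(G.T @ G + lam * np.eye(nb), G.T @ y)
```

The paper's estimator replaces this crude binning with data-adaptive hypothesis spaces and an error functional derived from the mean-field likelihood.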

[272] 2010.15727

Attentive Clustering Processes

Amortized approaches to clustering have recently received renewed attention thanks to novel objective functions that exploit the expressiveness of deep learning models. In this work we revisit a recent proposal for fast amortized probabilistic clustering, the Clusterwise Clustering Process (CCP), which yields samples from the posterior distribution of cluster labels for sets of arbitrary size using only O(K) forward network evaluations, where K is an arbitrary number of clusters. While adequate on simple datasets, we show that the model can severely underfit complex datasets, and hypothesize that this limitation can be traced back to the implicit assumption that the probability of a point joining a cluster is equally sensitive to all the points available to join the same cluster. We propose an improved model, the Attentive Clustering Process (ACP), that selectively pays more attention to relevant points while preserving the invariance properties of the generative model. We illustrate the advantages of the new model in applications to spike-sorting in multi-electrode arrays and community discovery in networks. The latter case combines the ACP model with graph convolutional networks, and to our knowledge is the first deep learning model that handles an arbitrary number of communities.

[273] 2010.15729

Fundamental limitations to key distillation from Gaussian states with Gaussian operations

We establish fundamental upper bounds on the amount of secret key that can be extracted from continuous variable quantum Gaussian states by using only local Gaussian operations, local classical processing, and public communication. For one-way communication, we prove that the key is bounded by the R\'enyi-$2$ Gaussian entanglement of formation $E_{F,2}^{\mathrm{\scriptscriptstyle G}}$, with the inequality being saturated for pure Gaussian states. The same is true if two-way public communication is allowed but Alice and Bob employ protocols that start with destructive local Gaussian measurements. In the most general setting of two-way communication and arbitrary interactive protocols, we argue that $2 E_{F,2}^{\mathrm{\scriptscriptstyle G}}$ is still a bound on the extractable key, although we conjecture that the factor of $2$ is superfluous. Finally, for a wide class of Gaussian states that includes all two-mode states, we prove a recently proposed conjecture on the equality between $E_{F,2}^{\mathrm{\scriptscriptstyle G}}$ and the Gaussian intrinsic entanglement, thus endowing both measures with a more solid operational meaning.

[274] 2010.15761

A Helmholtz equation solver using unsupervised learning: Application to transcranial ultrasound

Transcranial ultrasound therapy is increasingly used for the non-invasive treatment of brain disorders. However, conventional numerical wave solvers are currently too computationally expensive to be used online during treatments to predict the acoustic field passing through the skull (e.g., to account for subject-specific dose and targeting variations). As a step towards real-time predictions, in the current work, a fast iterative solver for the heterogeneous Helmholtz equation in 2D is developed using a fully-learned optimizer. The lightweight network architecture is based on a modified UNet that includes a learned hidden state. The network is trained using a physics-based loss function and a set of idealized sound speed distributions with fully unsupervised training (no knowledge of the true solution is required). The learned optimizer shows excellent performance on the test set, and is capable of generalizing well outside the training examples, including to much larger computational domains, and more complex source and sound speed distributions, for example, those derived from x-ray computed tomography images of the skull.
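The physics-based-loss idea can be illustrated without the network: the sketch below minimizes the residual loss ||Au - f||^2 for a 1-D finite-difference Helmholtz operator by plain gradient descent, never using a reference solution. The paper replaces this hand-coded update with a learned recurrent update in 2D; everything below is an illustrative stand-in.

```python
import numpy as np

def helmholtz_matrix(n, k, h):
    # 1-D finite-difference Helmholtz operator u'' + k^2 u with Dirichlet BCs.
    A = (np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / h ** 2
    return A + (k ** 2) * np.eye(n)

def solve_unsupervised(A, f, iters=20000):
    """Minimize the physics-based residual loss ||Au - f||^2 by gradient descent;
    no knowledge of the true solution is used, mirroring unsupervised training."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # safe step size for the normal equations
    u = np.zeros_like(f)
    for _ in range(iters):
        u = u - lr * (A.T @ (A @ u - f))   # gradient of 0.5*||Au - f||^2
    return u
```

The slow convergence of this fixed update rule on the ill-conditioned Helmholtz operator is precisely what a learned optimizer with a hidden state aims to overcome.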

[275] 2010.15764

Domain adaptation under structural causal models

Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model is different from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariates and label distributions are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when parts of the assumptions of our theory are violated.

[276] 2010.15768

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

The nonconvex-concave min-max problem arises in many machine learning applications, including minimizing a pointwise maximum of a set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solve this problem is the gradient descent-ascent (GDA) algorithm, which unfortunately can exhibit oscillation in the nonconvex setting. In this paper, we introduce a "smoothing" scheme which can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions. Moreover, the smoothed GDA algorithm achieves an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems. Extensions of this stabilized GDA algorithm to multi-block cases are presented. To the best of our knowledge, this is the first algorithm to achieve $O(1/\epsilon^2)$ for a class of nonconvex-concave problems. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training.
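A minimal sketch of the smoothing idea: descend on x for the proximally smoothed objective f(x, y) + (c/2)||x - z||^2, ascend on y, and let an anchor z track x. The step sizes, the anchor-update rule, and the test problem below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def smoothed_gda(grad_x, grad_y, x0, y0, steps=2000, c=0.05, lr=0.05, beta=0.5):
    """Smoothed GDA: gradient descent on x for f(x, y) + (c/2)||x - z||^2,
    gradient ascent on y, then an exponential-average update of the anchor z."""
    x, y, z = x0.astype(float), y0.astype(float), x0.astype(float)
    for _ in range(steps):
        x = x - lr * (grad_x(x, y) + c * (x - z))  # smoothed descent step
        y = y + lr * grad_y(x, y)                  # ascent step
        z = z + beta * (x - z)                     # anchor tracks x
    return x, y
```

The extra proximal term c*(x - z) damps the oscillation that plain GDA can exhibit, at the cost of a slight bias controlled by how fast z tracks x.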

[277] 2010.15776

Quantum advantage for differential equation analysis

Quantum algorithms for both differential equation solving and for machine learning potentially offer an exponential speedup over all known classical algorithms. However, there also exist obstacles to obtaining this potential speedup in useful problem instances. The essential obstacle for quantum differential equation solving is that outputting useful information may require difficult post-processing, and the essential obstacle for quantum machine learning is that inputting the training set is a difficult task just by itself. In this paper, we demonstrate that, when combined, these difficulties can solve one another. We show how the output of quantum differential equation solving can serve as the input for quantum machine learning, allowing dynamical analysis in terms of principal components, power spectra, and wavelet decompositions. To illustrate this, we consider continuous time Markov processes on epidemiological and social networks. These quantum algorithms provide an exponential advantage over existing classical Monte Carlo methods.

[278] 2010.15801

Ray-marching Thurston geometries

We describe algorithms that produce accurate real-time interactive in-space views of the eight Thurston geometries using ray-marching. We give a theoretical framework for our algorithms, independent of the geometry involved. In addition to scenes within a geometry $X$, we also consider scenes within quotient manifolds and orbifolds $X / \Gamma$. We adapt the Phong lighting model to non-Euclidean geometries. The most difficult part of this is the calculation of light intensity, which relates to the area density of geodesic spheres. We also give extensive practical details for each geometry.
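For orientation, here is the standard Euclidean sphere-tracing loop that ray-marching generalizes; in the paper, straight rays become geodesics of the geometry $X$ and the signed distance function is computed in $X$ (or a quotient $X/\Gamma$). This Euclidean sketch is background, not the paper's algorithm.

```python
import numpy as np

def ray_march(origin, direction, sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    """Euclidean sphere tracing: advance along the ray by the signed distance
    to the nearest scene surface, which can never overshoot that surface."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t          # hit: distance along the ray to the surface
        t += d
        if t > max_dist:
            break
    return None               # miss
```

The non-Euclidean versions keep this structure but must march along geodesics and bound step sizes using distance estimates valid in the target geometry.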

[279] 2010.15811

Algorithmic pure states for the negative spherical perceptron

We consider the spherical perceptron with Gaussian disorder. This is the set $S$ of points $\sigma \in \mathbb{R}^N$ on the sphere of radius $\sqrt{N}$ satisfying $\langle g_a , \sigma \rangle \ge \kappa\sqrt{N}\,$ for all $1 \le a \le M$, where $(g_a)_{a=1}^M$ are independent standard Gaussian vectors and $\kappa \in \mathbb{R}$ is fixed. Various characteristics of $S$, such as its surface measure and the largest $M$ for which it is non-empty, were computed heuristically in statistical physics in the asymptotic regime $N \to \infty$, $M/N \to \alpha$. The case $\kappa<0$ is of special interest as $S$ is conjectured to exhibit a hierarchical tree-like geometry known as "full replica-symmetry breaking" (FRSB) close to the satisfiability threshold $\alpha_{\text{SAT}}(\kappa)$, and whose characteristics are captured by a Parisi variational principle akin to the one appearing in the Sherrington-Kirkpatrick model. In this paper we design an efficient algorithm which, given oracle access to the solution of the Parisi variational principle, exploits this conjectured FRSB structure for $\kappa<0$ and outputs a vector $\hat{\sigma}$ satisfying $\langle g_a , \hat{\sigma}\rangle \ge \kappa \sqrt{N}$ for all $1\le a \le M$ and lying on a sphere of non-trivial radius $\sqrt{\bar{q} N}$, where $\bar{q} \in (0,1)$ is the right-end of the support of the associated Parisi measure. We expect $\hat{\sigma}$ to be approximately the barycenter of a pure state of the spherical perceptron. Moreover we expect that $\bar{q} \to 1$ as $\alpha \to \alpha_{\text{SAT}}(\kappa)$, so that $\big\langle g_a,\hat{\sigma}/|\hat{\sigma}|\big\rangle \geq (\kappa-o(1))\sqrt{N}$ near criticality.

[280] 2010.15819

Tensor Completion via Tensor Networks with a Tucker Wrapper

In recent years, low-rank tensor completion (LRTC) has received considerable attention due to its applications in image/video inpainting, hyperspectral data recovery, etc. With different notions of tensor rank (e.g., CP, Tucker, tensor train/ring, etc.), various optimization-based numerical methods have been proposed for LRTC. However, tensor-network-based methods have not been proposed yet. In this paper, we propose to solve LRTC via tensor networks with a Tucker wrapper. Here by "Tucker wrapper" we mean that the outermost factor matrices of the tensor network are all orthonormal. We formulate LRTC as a problem of solving a system of nonlinear equations, rather than a constrained optimization problem. A two-level alternating least squares method is then employed to update the unknown factors. The computation of the method is dominated by tensor-matrix multiplications and can be efficiently performed. Also, under proper assumptions, it is shown that with high probability, the method converges to the exact solution at a linear rate. Numerical simulations show that the proposed algorithm is comparable with state-of-the-art methods.