New articles on eess


[1] 2011.07149

Satisfaction of linear temporal logic specifications through recurrence tools for hybrid systems

In this work we formulate the problem of satisfying a linear temporal logic formula on a linear plant with output feedback, through a recent hybrid systems formalism. We relate this problem to the notion of recurrence introduced for the considered formalism, and we then extend Lyapunov-like conditions for recurrence of an open, unbounded set. One of the proposed relaxed conditions allows certifying recurrence of a suitable set, and this guarantees that the high-level evolution of the plant satisfies the formula, without relying on discretizations of the plant. Simulations illustrate the proposed approach.


[2] 2011.07170

The balanced truncation bound is tight for SISO systems when the truncated system is state-space symmetric

Balanced truncation model reduction for linear dynamical systems yields a reduced-order model that satisfies a well-known error bound involving the system's Hankel singular values. We identify a new class of single-input, single-output systems for which this bound holds with equality; in this class the truncated systems exhibit a certain state-space symmetry. This result extends to singular perturbation balancing. We illustrate the result with an example from power-system modeling.


[3] 2011.07183

Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects

This paper presents a method to design a min-norm Control Lyapunov Function (CLF)-based stabilizing controller for a control-affine system with uncertain dynamics using Gaussian Process (GP) regression. We propose a novel compound kernel that captures the control-affine nature of the problem, which permits the estimation of both state and input-dependent model uncertainty in a single GP regression problem. Furthermore, we provide probabilistic guarantees of convergence through GP Upper Confidence Bound analysis and the formulation of a CLF-based stability chance constraint which can be incorporated in a min-norm optimization problem. We show that the resulting optimization problem is convex and call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP). The data-collection process and the training of the GP regression model are carried out in an episodic learning fashion. We validate the proposed algorithm and controller in numerical simulations of an inverted pendulum and a kinematic bicycle model, resulting in stable trajectories that are very similar to those obtained when the true plant dynamics are known.
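
As a rough illustration of the compound-kernel idea, the sketch below builds a Gram matrix of the form k((x,u),(x',u')) = k_b(x,x') + sum_i u_i u'_i k_{a,i}(x,x'), which models an uncertainty of the control-affine form d(x,u) = b(x) + a(x)u. The base kernels, hyperparameters, and function names are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def rbf(X, Y, lengthscale=1.0, variance=1.0):
        # Squared-exponential base kernel on states.
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    def compound_kernel(X, U, X2, U2, params):
        # k((x,u),(x',u')) = k_b(x,x') + sum_i u_i u'_i k_{a,i}(x,x'):
        # one base kernel for the drift term, one per input channel.
        K = rbf(X, X2, *params["b"])
        for i in range(U.shape[1]):
            K += np.outer(U[:, i], U2[:, i]) * rbf(X, X2, *params["a"][i])
        return K

    # Toy usage: 1-D state, scalar input.
    X = np.random.randn(20, 1); U = np.random.randn(20, 1)
    params = {"b": (1.0, 1.0), "a": [(1.0, 0.5)]}
    K = compound_kernel(X, U, X, U, params)
    print(K.shape)  # (20, 20); a sum of valid PSD kernels is PSD
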


[4] 2011.07184

A needle-based deep-neural-network camera

We experimentally demonstrate a camera whose primary optic is a cannula (diameter = 0.22 mm, length = 12.5 mm) that acts as a lightpipe transporting light intensity from an object plane (35 cm away) to its opposite end. Deep neural networks (DNNs) are used to reconstruct color and grayscale images with a field of view of 180° and angular resolution of ~0.40°. When trained on images with depth information, the DNN can create depth maps. Finally, we show DNN-based classification of the EMNIST dataset without and with image reconstruction. The former could be useful for imaging with enhanced privacy.


[5] 2011.07206

Synchronization in dynamical systems coupled via multiple directed networks

We study synchronization in a group of dynamical systems coupled via multiple directed networks. We show that even though the coupling in a single network may not be sufficient to synchronize the systems, a combination of multiple networks can contribute to synchronization. We illustrate how the effectiveness of a collection of networks in synchronizing the coupled systems depends on the graph topology. In particular, we show that if the graph sum is a directed graph whose reversal contains a spanning directed tree, then the network synchronizes if the coupling is strong enough. This is intuitive, as there is a root node that influences every other node via edges each belonging to one of the networks.
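
The spanning-directed-tree condition on the reversal of the graph sum is easy to check directly. A minimal sketch using networkx (connectivity only, edge weights ignored; the two toy coupling graphs are hypothetical):

    import networkx as nx

    def has_spanning_directed_tree(G):
        # A digraph contains a spanning directed tree iff some root
        # node reaches every other node.
        return any(len(nx.descendants(G, r)) == len(G) - 1 for r in G)

    # Two directed coupling networks on nodes 0..2, neither sufficient alone.
    G1 = nx.DiGraph([(0, 1)])
    G2 = nx.DiGraph([(1, 2)])
    G_sum = nx.compose(G1, G2)   # graph sum (union of the edge sets)
    print(has_spanning_directed_tree(G_sum.reverse()))  # True
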


[6] 2011.07232

Visual Tool for Assessing Stability of DER Configurations on Three-Phase Radial Networks

We present a method for evaluating the placement of Distributed Energy Resources (DER) on distribution circuits in order to control voltages and power flows. Our previous work described Phasor-Based Control (PBC), a novel control framework where DERs inject real and reactive power to track voltage magnitude and phase angle targets. Here, we employ linearized power flow equations and integral controllers to develop a linear state space model for PBC acting on a three-phase unbalanced network. We use this model to evaluate whether a given inverter-based DER configuration admits a stable set of controller gains, which cannot be done by analyzing controllability or by using the Lyapunov equation. Instead, we sample over a parameter space to identify a stable set of controller gains. Our stability analysis requires only a line impedance model and does not entail simulating the system or solving an optimization problem. We incorporate this assessment into a publicly available visualization tool and demonstrate three processes for evaluating many control configurations on the IEEE 123-node test feeder (123NF).
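
A minimal sketch of the sampling idea: draw candidate gain vectors from a box and keep those whose closed-loop state matrix is Hurwitz (all eigenvalues in the open left half-plane). The closed-loop matrix constructor and gain bounds below are toy assumptions, not the paper's PBC model.

    import numpy as np

    def is_hurwitz(A):
        return np.all(np.linalg.eigvals(A).real < 0)

    def sample_stable_gains(build_A, bounds, n_samples=1000, seed=0):
        # Randomly sample gain vectors in a box, keep the stabilizing ones.
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(bounds).T
        gains = rng.uniform(lo, hi, size=(n_samples, len(lo)))
        return [k for k in gains if is_hurwitz(build_A(k))]

    # Toy closed loop: double integrator with two gains (illustrative only).
    build_A = lambda k: np.array([[0.0, 1.0], [-k[0], -k[1]]])
    stable = sample_stable_gains(build_A, bounds=[(0.1, 10.0), (0.1, 10.0)])
    print(f"{len(stable)} of 1000 sampled gain pairs are stabilizing")
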


[7] 2011.07265

Channel Estimation for Large Intelligent Surface Aided MISO Communications: From LMMSE to Deep Learning Solutions

We consider multi-antenna wireless systems aided by large intelligent surfaces (LIS). LIS presents a new physical layer technology for improving coverage and energy efficiency by intelligently controlling the propagation environment. In practice, however, achieving the anticipated gains of LIS requires accurate channel estimation. Recent attempts to solve this problem have considered the least-squares (LS) approach, which is simple but also sub-optimal. The optimal channel estimator, based on the minimum mean-squared-error (MMSE) criterion, is challenging to obtain and is non-linear due to the non-Gaussianity of the effective channel seen at the receiver. Here we present approaches to approximate the optimal MMSE channel estimator. As a first approach, we analytically develop the best linear estimator, the LMMSE, together with a corresponding majorization-minimization based algorithm designed to optimize the LIS phase shift matrix during the training phase. This estimator is shown to yield improved accuracy over the LS approach by exploiting second-order statistical properties of the wireless channel and the noise. To further improve performance and better approximate the globally-optimal MMSE channel estimator, we propose data-driven non-linear solutions based on deep learning. Specifically, by posing the MMSE channel estimation problem as an image denoising problem, we propose two convolutional neural network (CNN) based methods to perform the denoising and approximate the optimal MMSE channel estimation solution. Our numerical results show that these CNN-based estimators give superior performance compared with linear estimation approaches. They also have low computational complexity requirements, thereby motivating their potential use in future LIS-aided wireless communication systems.
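
For reference, the generic LMMSE form that any best linear estimator builds on is h_hat = mu_h + C_hy C_yy^{-1} (y - mu_y). A toy numpy sketch under an assumed linear observation model (not the paper's LIS pilot model):

    import numpy as np

    def lmmse_estimate(y, mu_h, mu_y, C_hy, C_yy):
        # Generic LMMSE estimator: h_hat = mu_h + C_hy C_yy^{-1} (y - mu_y).
        return mu_h + C_hy @ np.linalg.solve(C_yy, y - mu_y)

    # Toy observation y = A h + n with known second-order statistics.
    rng = np.random.default_rng(1)
    n, m = 4, 6
    A = rng.standard_normal((m, n))
    C_h = np.eye(n); C_n = 0.1 * np.eye(m)
    h = rng.standard_normal(n)
    y = A @ h + rng.multivariate_normal(np.zeros(m), C_n)
    C_hy = C_h @ A.T                 # cross-covariance of h and y
    C_yy = A @ C_h @ A.T + C_n       # covariance of y
    h_hat = lmmse_estimate(y, np.zeros(n), np.zeros(m), C_hy, C_yy)
    print(np.linalg.norm(h - h_hat))
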


[8] 2011.07274

On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low-pass filter when training and subsequently testing the network. For two different state-of-the-art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably, and under some training conditions the result is a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low-pass filters during training and leads to improved generalization to unseen filtering conditions at test time.
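
A minimal sketch of such an augmentation, drawing a random filter family, order, and cutoff per training example with scipy; the parameter ranges are illustrative assumptions rather than the paper's settings.

    import numpy as np
    from scipy import signal

    def random_lowpass(x, sr, rng):
        # Draw a random low-pass filter per example: family, order, cutoff.
        family = rng.choice(["butter", "cheby1", "ellip"])
        order = int(rng.integers(2, 9))
        cutoff = float(rng.uniform(2000.0, 7000.0))  # Hz, illustrative range
        kwargs = {"N": order, "Wn": cutoff, "btype": "low", "fs": sr,
                  "output": "sos"}
        if family == "butter":
            sos = signal.butter(**kwargs)
        elif family == "cheby1":
            sos = signal.cheby1(rp=1.0, **kwargs)
        else:
            sos = signal.ellip(rp=1.0, rs=40.0, **kwargs)
        return signal.sosfiltfilt(sos, x)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)            # 1 s of audio at 16 kHz
    x_lp = random_lowpass(x, sr=16000, rng=rng)
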


[9] 2011.07294

Pose-dependent weights and Domain Randomization for fully automatic X-ray to CT Registration

Fully automatic X-ray to CT registration requires a solid initialization to provide an initial alignment within the capture range of existing intensity-based registrations. This work addresses that need by providing a novel automatic initialization, which enables end-to-end registration. First, a neural network is trained once to detect a set of anatomical landmarks on simulated X-rays. A domain randomization scheme is proposed to enable the network to overcome the challenge of being trained purely on simulated data and run inference on real X-rays. Then, for each patient CT, a patient-specific landmark extraction scheme is used. It is based on backprojecting and clustering the previously trained network's predictions on a set of simulated X-rays. Next, the network is retrained to detect the new landmarks. Finally, the combination of network and 3D landmark locations is used to compute the initialization using a perspective-n-point algorithm. During the computation of the pose, a weighting scheme is introduced to incorporate the confidence of the network in detecting the landmarks. The algorithm is evaluated on the pelvis using both real and simulated X-rays. The mean (± standard deviation) target registration error in millimetres is 4.1 ± 4.3 for simulated X-rays with a success rate of 92% and 4.2 ± 3.9 for real X-rays with a success rate of 86.8%, where a success is defined as a translation error of less than 30 mm.


[10] 2011.07320

Co-optimisation and Settlement of Power-Gas Coupled System in Day-ahead Market under Multiple Uncertainties

The interdependency of power systems and natural gas systems is being reinforced by emerging power-to-gas facilities (PtGs) and existing gas-fired generators. To jointly improve efficiency and security under the diverse uncertainties from renewable energy resources and load demands, it is essential to co-optimise these two energy systems for day-ahead market clearance. In this paper, a data-driven integrated electricity-gas system stochastic co-optimisation model is proposed. The model is accurately approximated by sequential mixed integer second-order cone programming, which can then be solved in a parallel and decentralised manner by leveraging generalised Benders decomposition. Since the price formation and settlement issues have rarely been investigated for integrated electricity-gas systems under uncertainty, a novel concept of expected locational marginal value is proposed to credit the flexibility of PtGs that helps hedge against uncertainty. By comparing with a deterministic model and a distributionally robust model, the advantage of the proposed stochastic model and the efficiency of the proposed solution method are validated. Detailed results of pricing and settlement for PtGs are presented, showing that the expected locational marginal value can fairly credit the contribution of PtGs and reflect the system's deficiency in capturing uncertainties.


[11] 2011.07338

Distortion-controlled Training for End-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss

The performance of speech enhancement and separation systems in anechoic environments has been significantly advanced by recent progress in end-to-end neural network architectures. However, the performance of such systems in reverberant environments is yet to be fully explored. A core problem in reverberant speech separation concerns the training and evaluation metrics. Standard time-domain metrics may introduce unexpected distortions during training and fail to properly evaluate the separation performance due to the presence of reverberation. In this paper, we first introduce the "equal-valued contour" problem in reverberant separation, where multiple outputs can lead to the same performance measured by the common metrics. We then investigate how "better" outputs with lower target-specific distortions can be selected by auxiliary autoencoding training (A2T). A2T assumes that the separation is done by a linear operation on the mixture signal, and it adds a loss term on the autoencoding of the direct-path target signals to ensure that the distortion introduced on the direct-path signals is controlled during separation. Evaluations of separation signal quality and speech recognition accuracy show that A2T is able to control the distortion on the direct-path signals and improve the recognition accuracy.


[12] 2011.07353

Pneumothorax and chest tube classification on chest x-rays for detection of missed pneumothorax

Chest x-ray imaging is widely used for the diagnosis of pneumothorax, and there has been significant interest in developing automated methods to assist in image interpretation. We present an image classification pipeline which detects pneumothorax as well as the various types of chest tubes that are commonly used to treat pneumothorax. Our multi-stage algorithm is based on lung segmentation followed by pneumothorax classification, including classification of patches that are most likely to contain pneumothorax. This algorithm achieves state-of-the-art performance for pneumothorax classification on an open-source benchmark dataset. Unlike previous work, this algorithm shows comparable performance on data with and without chest tubes and thus has improved clinical utility. To evaluate these algorithms in a realistic clinical scenario, we demonstrate the ability to identify real cases of missed pneumothorax in a large dataset of chest x-ray studies.


[13] 2011.07445

Data-Driven Scenario Optimization for Automated Controller Tuning with Probabilistic Performance Guarantees

Systematic design and verification of advanced control strategies for complex systems under uncertainty largely remains an open problem. Despite the promise of black-box optimization methods for automated controller tuning, they generally lack formal guarantees on the solution quality, which is especially important in the control of safety-critical systems. This paper focuses on obtaining closed-loop performance guarantees for automated controller tuning, which can be formulated as a black-box optimization problem under uncertainty. We use recent advances in non-convex scenario theory to provide a distribution-free bound on the probability of the closed-loop performance measures. To mitigate the computational complexity of the data-driven scenario optimization method, we restrict ourselves to a discrete set of candidate tuning parameters. We propose to generate these candidates using constrained Bayesian optimization run multiple times from different random seed points. We apply the proposed method to the tuning of an economic nonlinear model predictive controller for a semibatch reactor modeled by seven highly nonlinear differential equations.


[14] 2011.07458

Deep-RLS: A Model-Inspired Deep Learning Approach to Nonlinear PCA

In this work, we consider the application of model-based deep learning in nonlinear principal component analysis (PCA). Inspired by the deep unfolding methodology, we propose a task-based deep learning approach, referred to as Deep-RLS, that unfolds the iterations of the well-known recursive least squares (RLS) algorithm into the layers of a deep neural network in order to perform nonlinear PCA. In particular, we formulate the nonlinear PCA for the blind source separation (BSS) problem and show through numerical analysis that Deep-RLS results in a significant improvement in the accuracy of recovering the source signals in BSS when compared to the traditional RLS algorithm.
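
A schematic sketch of the unfolding idea: each "layer" carries out one RLS recursion with its own learnable forgetting factor, with a fixed nonlinearity standing in for the nonlinear PCA step. The update form and architecture details are assumptions for illustration, not the paper's exact Deep-RLS.

    import torch
    import torch.nn as nn

    class DeepRLS(nn.Module):
        # Unfolds RLS iterations; each layer owns a learnable
        # forgetting factor (one trainable scalar per layer).
        def __init__(self, dim, n_layers):
            super().__init__()
            self.log_lams = nn.Parameter(torch.zeros(n_layers))
            self.dim = dim

        def forward(self, X):
            # X: (T, dim) stream of observations; returns weight estimate.
            w = torch.zeros(self.dim)
            P = torch.eye(self.dim)
            for t, x in enumerate(X):
                lam = torch.sigmoid(self.log_lams[t % len(self.log_lams)])
                y = w @ x
                d = torch.tanh(y)                # stand-in nonlinearity
                k = (P @ x) / (lam + x @ P @ x)  # RLS gain vector
                w = w + k * (d - y)              # weight update
                P = (P - torch.outer(k, x @ P)) / lam
            return w

    model = DeepRLS(dim=4, n_layers=8)
    w = model(torch.randn(32, 4))   # forgetting factors are trainable
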


[15] 2011.07462

Nonlinearity Characteristic of High Impedance Fault at Resonant Distribution Networks: Theoretical Basis to Identify the Faulty Feeder

Feeder identification is indispensable for distribution networks to locate faults at a specific feeder, especially when measuring devices are insufficient for precise locations. For the high impedance fault (HIF), feeder identification is much more complicated, and related approaches are still at an early stage. This paper thoroughly and theoretically reveals the features of different feeders when a HIF happens in a resonant grounded neutral (RGN) network, which is the most challenging condition for feeder identification. Firstly, the diversity of nonlinearity existing in HIFs is explained from the aspect of energy. Then, the differences in the nonlinearities of zero-sequence currents between healthy and faulty feeders are deduced theoretically. Variations of the detuning index and damping ratio encountered in industrial systems are both considered. Afterward, these theoretical conclusions are verified by HIF cases from experiments in a real 10 kV system. Finally, based on the theory, we discuss why the existing approaches are not reliable enough and suggest some improvements.


[16] 2011.07514

Assessing the Economic Value of Renewable Resource Complementarity for Power Systems: an ENTSO-E Study

Spatiotemporal complementarity between variable renewable energy sources (RES) has received a great deal of attention in recent years. However, its value for power systems is still not properly understood. This research gap is tackled in the current work by evaluating the benefits of siting RES assets according to resource complementarity criteria. To this end, a two-stage method is employed. First, the complementarity between RES is assessed and the location sets that maximize it are selected using an integer programming model. Subsequently, the outcome of the first stage is used within an expansion planning framework which identifies the optimal system design and serves as a basis for assessing the economic value of RES complementarity for power systems. The analysis is conducted on a realistic case study targeting the deployment of 450 GW of offshore wind in Europe. Results show that siting based on RES complementarity is particularly attractive when the power density of wind developments is relatively high and when the inter-annual variability of the underlying resource is accounted for. More specifically, such a siting strategy leads to yearly savings between 0.3 and 1.2 billion EUR compared with conventional schemes seeking to deploy generation capacity at the most productive locations.


[17] 2011.07529

Full Attitude Intelligent Controller Design of a Heliquad under Complete Failure of an Actuator

In this paper, we design a reliable Heliquad and develop an intelligent controller to handle the complete failure of one actuator. A Heliquad is a multi-copter similar to a quadcopter, with four actuators diagonally symmetric about the center. Each actuator has two control inputs: the first changes the propeller blades' collective pitch (also called variable pitch), and the other changes the rotation speed. For reliable operation and the high-torque characteristics required for yaw control, a cambered airfoil is used to design the propeller blades. A neural-network-based control allocation is designed to provide complete control authority even under the complete loss of one actuator. The controller design combines nonlinear quaternion-based outer-loop position control, a proportional-derivative inner loop for attitude control, and the neural-network-based control allocation. The performance of the proposed controller and Heliquad design is evaluated using a software-in-the-loop simulation to track a position reference command under failure. The results clearly indicate that the Heliquad with the intelligent controller provides the necessary tracking performance even under the complete loss of one actuator.


[18] 2011.07534

SAG-GAN: Semi-Supervised Attention-Guided GANs for Data Augmentation on Medical Images

Recently, deep learning methods, in particular convolutional neural networks (CNNs), have led to massive breakthroughs across the field of computer vision. Large-scale annotated datasets are also the essential key to a successful training procedure. However, obtaining such datasets in the medical domain is a huge challenge. Towards this, we present a data augmentation method for generating synthetic medical images using cycle-consistent Generative Adversarial Networks (GANs). We add semi-supervised attention modules to generate images with convincing details. We treat tumor images and normal images as two domains. The proposed GAN-based model can generate a tumor image from a normal image, and in turn, it can also generate a normal image from a tumor image. Furthermore, we show that the generated medical images can be used to improve the performance of ResNet18 for medical image classification. Our model is applied to three limited datasets of tumor MRI images. We first generate MRI images on the limited datasets, then train three popular classification models to find the best model for tumor classification. Finally, we train the classification model using real images with classic data augmentation methods, and classification models using synthetic images. The classification results between those trained models show that the proposed SAG-GAN data augmentation method can boost accuracy and AUC compared with classic data augmentation methods. We believe the proposed data augmentation method can be applied to other medical image domains and improve the accuracy of computer-assisted diagnosis.


[19] 2011.07545

Automatic dysarthric speech detection exploiting pairwise distance-based convolutional neural networks

Automatic dysarthric speech detection can provide reliable and cost-effective computer-aided tools to assist the clinical diagnosis and management of dysarthria. In this paper we propose a novel automatic dysarthric speech detection approach based on analyses of pairwise distance matrices using convolutional neural networks (CNNs). We represent utterances through articulatory posteriors and consider pairs of phonetically-balanced representations, with one representation from a healthy speaker (i.e., the reference representation) and the other representation from the test speaker (i.e., test representation). Given such pairs of reference and test representations, features are first extracted using a feature extraction front-end, a frame-level distance matrix is computed, and the obtained distance matrix is considered as an image by a CNN-based binary classifier. The feature extraction, distance matrix computation, and CNN-based classifier are jointly optimized in an end-to-end framework. Experimental results on two databases of healthy and dysarthric speakers for different languages and pathologies show that the proposed approach yields a high dysarthric speech detection performance, outperforming other CNN-based baseline approaches.
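
A minimal sketch of the core construction: a frame-level distance matrix between reference and test feature sequences is treated as a one-channel image by a small CNN binary classifier. The feature dimensions and architecture below are illustrative assumptions.

    import torch
    import torch.nn as nn

    def distance_matrix(ref, test):
        # ref: (N, d), test: (M, d) frame-level features -> (N, M) distances.
        return torch.cdist(ref[None], test[None])[0]

    classifier = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
        nn.Flatten(), nn.Linear(8 * 8 * 8, 1),  # logit: dysarthric vs. healthy
    )

    ref = torch.randn(120, 40)    # healthy reference utterance features
    test = torch.randn(95, 40)    # test utterance features
    D = distance_matrix(ref, test)
    logit = classifier(D[None, None])   # add batch and channel dimensions
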


[20] 2011.07547

Multi-task single channel speech enhancement using speech presence probability as a secondary task training target

To cope with reverberation and noise in single channel acoustic scenarios, typical supervised deep neural network (DNN)-based techniques learn a mapping from reverberant and noisy input features to a user-defined target. Commonly used targets are the desired signal magnitude, a time-frequency mask such as the Wiener gain, or the interference power spectral density and signal-to-interference ratio that can be used to compute a time-frequency mask. In this paper, we propose to incorporate multi-task learning in such DNN-based enhancement techniques by using speech presence probability (SPP) estimation as a secondary task assisting the target estimation in the main task. The advantage of multi-task learning lies in sharing domain-specific information between the two tasks (i.e., target and SPP estimation) and learning more generalizable and robust representations. To simultaneously learn both tasks, we propose to use the adaptive weighting method of losses derived from the homoscedastic uncertainty of tasks. Simulation results show that the dereverberation and noise reduction performance of a single-task DNN trained to directly estimate the Wiener gain is higher than the performance of single-task DNNs trained to estimate the desired signal magnitude, the interference power spectral density, or the signal-to-interference ratio. Incorporating the proposed multi-task learning scheme to jointly estimate the Wiener gain and the SPP increases the dereverberation and noise reduction further.
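
The adaptive loss weighting derived from homoscedastic task uncertainty is commonly implemented with one learnable log-variance per task; a minimal sketch (the exact parametrization used in the paper may differ):

    import torch
    import torch.nn as nn

    class HomoscedasticWeighting(nn.Module):
        # Learns one log-variance per task; the total loss is
        #   L = sum_i exp(-s_i) * L_i + s_i,  with s_i = log sigma_i^2.
        def __init__(self, n_tasks=2):
            super().__init__()
            self.log_vars = nn.Parameter(torch.zeros(n_tasks))

        def forward(self, losses):
            s = self.log_vars
            return sum(torch.exp(-s[i]) * L + s[i]
                       for i, L in enumerate(losses))

    weighting = HomoscedasticWeighting(n_tasks=2)
    loss_gain = torch.tensor(0.8)   # main task: Wiener-gain estimation error
    loss_spp = torch.tensor(0.3)    # secondary task: SPP estimation error
    total = weighting([loss_gain, loss_spp])
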


[21] 2011.07549

Learning-Assisted User Clustering in Cell-Free Massive MIMO-NOMA Networks

The superior spectral efficiency (SE) and user fairness of non-orthogonal multiple access (NOMA) systems are achieved by exploiting user clustering (UC) efficiently. However, random UC results in a suboptimal solution, while an exhaustive search method comes at the cost of high complexity, especially for systems of medium-to-large size. To address this problem, we develop two efficient unsupervised machine learning (ML) based UC algorithms, namely k-means++ and improved k-means++, to effectively cluster users into disjoint clusters in cell-free massive multiple-input multiple-output (CFmMIMO) systems. Using full-pilot zero-forcing at access points, we derive the sum SE in closed-form expression taking into account the impact of intra-cluster pilot contamination, inter-cluster interference, and imperfect successive interference cancellation. To comprehensively assess the system performance, we formulate the sum SE optimization problem, and then develop a simple yet efficient iterative algorithm for its solution. In addition, the performance of the collocated massive MIMO-NOMA (COmMIMO-NOMA) system is also characterized. Numerical results are provided to show the superior performance of the proposed UC algorithms compared to other baseline schemes. The effectiveness of applying NOMA in CFmMIMO and COmMIMO systems is also validated.
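
As a baseline illustration of the k-means++ component (not the paper's improved variant), users can be clustered with scikit-learn's k-means++ initialization; the per-user features here are a hypothetical stand-in for large-scale fading coefficients.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Hypothetical per-user features, e.g. large-scale fading coefficients
    # to the access points (users x APs), in dB.
    features = rng.normal(-100.0, 10.0, size=(40, 16))

    km = KMeans(n_clusters=8, init="k-means++", n_init=10, random_state=0)
    clusters = km.fit_predict(features)      # NOMA cluster index per user
    print(np.bincount(clusters))             # users per cluster
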


[22] 2011.07564

Generalized Short Circuit Ratio for Grid Strength Assessment in Inhomogeneous Multi-infeed LCC-HVDC Systems

Generalized short circuit ratio (gSCR) for grid strength assessment of multi-infeed high voltage direct current (MIDC) systems is a rigorous theoretical extension of the traditional short circuit ratio (SCR), which allows the considerable experience of using SCR to be extended to MIDC systems. However, gSCR was originally derived based on the assumption of homogeneous MIDC systems, where all HVDC converters have an identical control configuration, which poses challenges to the application of gSCR to inhomogeneous MIDC systems. To weaken this assumption, this letter applies modal perturbation theory to explore the possibility of applying gSCR to inhomogeneous MIDC systems. Results of numerical experiments show that, in inhomogeneous MIDC systems, the previously proposed gSCR can still be used without modification, but the critical gSCR (CgSCR) needs to be redefined by considering the characteristics of HVDC converter control configurations. Accordingly, the difference between gSCR and the redefined CgSCR can effectively quantify the pertinent ac grid strength in terms of static voltage stability margin. The performance of our proposed method is demonstrated on a triple-infeed inhomogeneous LCC-HVDC system.


[23] 2011.07567

Structure Preserving Model Order Reduction by Parameter Optimization

Model order reduction (MOR) methods that are designed to preserve structural features of a given full order model (FOM) often suffer from lower accuracy compared to their non-structure-preserving counterparts. In this paper, we present a framework for MOR based on direct parameter optimization. This means that the elements of the system matrices are iteratively varied to minimize an objective functional that measures the difference between the FOM and the reduced order model (ROM). Structural constraints are encoded in the parametrization of the ROM. The method only depends on frequency response data and can thus be applied to a wide range of dynamical systems. We illustrate the effectiveness of our method on a port-Hamiltonian and on a symmetric second-order system in a comparison with other structure-preserving MOR algorithms.
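
A minimal sketch of the direct parameter optimization idea: the entries of a small (A, b, c) ROM are varied to minimize the squared mismatch to FOM frequency-response samples. Structural constraints would enter through the parametrization; here the ROM is left unconstrained for brevity, and the optimizer choice is illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def rom_response(theta, w, r):
        # Unpack a dense (A, b, c) ROM of order r from the parameter vector.
        A = theta[: r * r].reshape(r, r)
        b = theta[r * r : r * r + r]
        c = theta[r * r + r :]
        I = np.eye(r)
        return np.array([c @ np.linalg.solve(1j * wi * I - A, b) for wi in w])

    def fit_rom(w, H_fom, r, seed=0):
        # Minimize the squared frequency-response mismatch over ROM entries.
        rng = np.random.default_rng(seed)
        theta0 = rng.standard_normal(r * r + 2 * r) * 0.1
        obj = lambda th: np.sum(np.abs(rom_response(th, w, r) - H_fom) ** 2)
        return minimize(obj, theta0, method="Nelder-Mead",
                        options={"maxiter": 20000, "fatol": 1e-10})

    # FOM: order-3 SISO system; fit an order-2 ROM to 30 response samples.
    w = np.logspace(-1, 2, 30)
    A3 = np.diag([-1.0, -5.0, -30.0]); b3 = np.ones(3); c3 = np.ones(3)
    H = np.array([c3 @ np.linalg.solve(1j * wi * np.eye(3) - A3, b3)
                  for wi in w])
    res = fit_rom(w, H, r=2)
    print(res.fun)   # remaining frequency-response mismatch
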


[24] 2011.07569

Networked Multi-Virus Spread with a Shared Resource: Analysis and Mitigation Strategies

The paper studies multi-competitive continuous-time epidemic processes in the presence of a shared resource. We consider the setting where multiple viruses are simultaneously prevalent in the population, and the spread occurs not only due to individual-to-individual interaction but also due to individual-to-resource interaction. In such a setting, an individual is either not affected by any of the viruses or infected by one and exactly one of the multiple viruses. We classify the equilibria into three classes: a) the healthy state (all viruses are eradicated), b) single-virus endemic equilibria (all but one virus are eradicated), and c) coexisting equilibria (multiple viruses simultaneously infect separate fractions of the population). We provide i) a sufficient condition for exponential (resp. asymptotic) eradication of a virus; ii) a sufficient condition for the existence, uniqueness and asymptotic stability of a single-virus endemic equilibrium; iii) a necessary and sufficient condition for the healthy state to be the unique equilibrium; and iv) for the bi-virus setting (i.e., two competing viruses), a sufficient condition and a necessary condition for the existence of a coexisting equilibrium. Building on these analytical results, we provide two mitigation strategies: a technique that guarantees convergence to the healthy state; and, in a bi-virus setup, a scheme that employs one virus to ensure that the other virus is eradicated. The results are illustrated in a numerical study of a spread scenario in Stockholm city.


[25] 2011.07590

MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models

We present a novel compression algorithm for reducing the storage of LiDAR sensor data streams. Our model exploits spatio-temporal relationships across multiple LiDAR sweeps to reduce the bitrate of both geometry and intensity values. Towards this goal, we propose a novel conditional entropy model that models the probabilities of the octree symbols by considering both coarse level geometry and previous sweeps' geometric and intensity information. We then use the learned probability to encode the full data stream into a compact one. Our experiments demonstrate that our method significantly reduces the joint geometry and intensity bitrate over prior state-of-the-art LiDAR compression methods, with a reduction of 7-17% and 15-35% on the UrbanCity and SemanticKITTI datasets respectively.


[26] 2011.07592

Studying Robustness of Semantic Segmentation under Domain Shift in cardiac MRI

Cardiac magnetic resonance imaging (cMRI) is an integral part of diagnosis in many heart-related diseases. Recently, deep neural networks have demonstrated successful automatic segmentation, thus alleviating the burden of time-consuming manual contouring of cardiac structures. Moreover, frameworks such as nnU-Net provide entirely automatic model configuration for unseen datasets, enabling out-of-the-box application even by non-experts. However, current studies commonly neglect the clinically realistic scenario in which a trained network is applied to data from a different domain, such as deviating scanners or imaging protocols. This potentially leads to unexpected performance drops of deep learning models in real-life applications. In this work, we systematically study the challenges and opportunities of domain transfer across images from multiple clinical centres and scanner vendors. In order to maintain out-of-the-box usability, we build upon a fixed U-Net architecture configured by the nnU-Net framework to investigate various data augmentation techniques and batch normalization layers as easy-to-customize pipeline components, and provide general guidelines on how to improve the domain generalization of existing deep learning methods. Our proposed method ranked first at the Multi-Centre, Multi-Vendor & Multi-Disease Cardiac Image Segmentation Challenge (M&Ms).


[27] 2011.07626

Stability Analysis of Complementarity Systems with Neural Network Controllers

Complementarity problems, a class of mathematical optimization problems with orthogonality constraints, are widely used in many robotics tasks, such as locomotion and manipulation, due to their ability to model non-smooth phenomena (e.g., contact dynamics). In this paper, we propose a method to analyze the stability of complementarity systems with neural network controllers. First, we introduce a method to represent neural networks with rectified linear unit (ReLU) activations as the solution to a linear complementarity problem. Then, we show that systems with ReLU network controllers have an equivalent linear complementarity system (LCS) description. Using the LCS representation, we turn the stability verification problem into a linear matrix inequality (LMI) feasibility problem. We demonstrate the approach on several examples, including multi-contact problems and friction models with non-unique solutions.
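
As a simplified illustration of the final step, the sketch below searches for a common quadratic Lyapunov function across the linear modes of a piecewise-linear system via an LMI feasibility problem in cvxpy. The paper's full LCS conditions (e.g. S-procedure terms handling the complementarity constraints) are not reproduced; the mode matrices are toy data.

    import numpy as np
    import cvxpy as cp

    def common_lyapunov(modes, eps=1e-3):
        # Find P > 0 with A_i^T P + P A_i < 0 for every mode, if one exists.
        n = modes[0].shape[0]
        P = cp.Variable((n, n), symmetric=True)
        cons = [P >> eps * np.eye(n)]
        cons += [A.T @ P + P @ A << -eps * np.eye(n) for A in modes]
        prob = cp.Problem(cp.Minimize(0), cons)
        prob.solve(solver=cp.SCS)
        return P.value if prob.status == cp.OPTIMAL else None

    # Two stable modes of a toy piecewise-linear closed loop.
    A1 = np.array([[-1.0, 0.5], [0.0, -1.0]])
    A2 = np.array([[-1.0, 0.0], [0.5, -1.0]])
    print(common_lyapunov([A1, A2]))   # a valid P certifies stability
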


[28] 2011.07649

Adaptive Step Size Incremental Conductance Based Maximum Power Point Tracking (MPPT)

Extracting the maximum power available from photovoltaic arrays requires operating the system at the maximum power point (MPP). Therefore, finding the MPP is necessary for efficient operation of PV arrays. The MPP changes with multiple environmental factors, mainly temperature and irradiance. Traditionally, the incremental conductance technique with a fixed step size was used to find the MPP, which suffers from a trade-off between speed of convergence and accuracy. In this work, we propose an incremental conductance maximum power point tracking (MPPT) algorithm with a variable step size, which adaptively changes the step size after each iteration based on how far the current operating point is from the new MPP. This mitigates the aforementioned trade-off drastically by achieving faster convergence without loss of accuracy. A series of simulations involving variations in temperature and irradiance were performed using MATLAB, and the speed of convergence and accuracy were compared with those of the traditional IC technique.
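
A minimal sketch of a variable-step incremental conductance update, with the step scaled by |dP/dV| so that it shrinks as the operating point approaches the MPP. This is a common variable-step form; the paper's exact adaptation law and constants may differ.

    def adaptive_ic_mppt(v, i, v_prev, i_prev, v_ref,
                         scaling=0.05, min_step=0.01):
        # At the MPP, dP/dV = 0 (equivalently dI/dV = -I/V); the step is
        # scaled by |dP/dV|, so it shrinks near the MPP.
        dv, di = v - v_prev, i - i_prev
        if dv == 0:                     # voltage unchanged: use current only
            if di > 0:
                v_ref += min_step
            elif di < 0:
                v_ref -= min_step
            return v_ref
        dp_dv = i + v * di / dv         # dP/dV = I + V * dI/dV
        step = max(scaling * abs(dp_dv), min_step)
        if dp_dv > 0:                   # left of the MPP: raise the voltage
            v_ref += step
        elif dp_dv < 0:                 # right of the MPP: lower the voltage
            v_ref -= step
        return v_ref

    # One tracking step from measured panel voltage/current samples.
    v_ref = adaptive_ic_mppt(v=30.1, i=7.9, v_prev=30.0, i_prev=8.0,
                             v_ref=30.1)
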


[29] 2011.07673

Spatiotemporal Characteristics of Ride-sourcing Operation in Urban Area

The emergence of ride-sourcing platforms has brought an innovative alternative in transportation, radically changed travel behaviors, and suggested new directions for transportation planners and operators. This paper provides an exploratory analysis of the operations of a ride-sourcing service using large-scale data on service performance. Observations over multiple days in Singapore suggest reproducible demand patterns and provide empirical estimates of fleet operations over time and space. During peak periods, we observe significant increases in the service rate along with surge price multipliers. We perform an in-depth analysis of fleet utilization rates and are able to explain daily patterns based on drivers' behavior in terms of the number of shifts, shift duration, and shift start and end time choices. We also evaluate metrics of user experience, namely waiting and travel time distributions, and explain our empirical findings with distance metrics from driver trajectory analysis and congestion patterns. Our empirical observations of actual service in Singapore can help in understanding the spatiotemporal characteristics of ride-sourcing services and provide important insights for transportation planning and operations.


[30] 2011.07684

Can a wearable device detect airway obstruction?

Objective: Lung health monitoring may enable early detection and treatment of exacerbations in respiratory diseases such as chronic obstructive pulmonary disease (COPD). Our objective is to explore the feasibility of using a wearable device to continuously monitor lung health. Towards that goal, this work studies the relationship between wearable device measurements of tidal breathing and spirometer lung function parameters. Methods: Data were collected during a single visit from 25 adult volunteers with a confirmed or suspected diagnosis of COPD. A respiratory chest belt was used to measure the fractional inspiratory time, respiratory rate and tidal volume of subjects during quiet breathing. The subjects also performed standard spirometry in a pulmonary function testing laboratory under the supervision of trained clinical staff to produce forced expiratory volume in one second (FEV1), forced vital capacity (FVC) and the ratio FEV1/FVC. Two classification models were built and trained: one to detect the presence of lung airway obstruction, and another to stratify the severity of airway obstruction. Results: The classifier detected airway obstruction with a sensitivity of 95% and a specificity of 80%. Severity of airway obstruction was classified as either mild/moderate or severe/very severe with a sensitivity of 91% and a specificity of 78%. Conclusion: Tidal breathing parameters that are measured with a wearable device can be used to detect airway obstruction with a level of accuracy that is comparable to that of conventional screening tests used in primary care settings. In addition, these parameters can reliably distinguish between severe and mild airway obstruction.


[31] 2011.07755

Audio-visual Multi-channel Integration and Recognition of Overlapped Speech

Automatic speech recognition (ASR) technologies have advanced significantly in the past few decades. However, recognition of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in current ASR systems. Motivated by the invariance of the visual modality to acoustic signal corruption and the additional cues it provides for separating the target speaker from interfering sound sources, this paper presents an audio-visual multi-channel recognition system for overlapped speech. It benefits from a tight integration between a speech separation front-end and a recognition back-end, both of which incorporate additional video input. A series of audio-visual multi-channel speech separation front-end components based on TF masking, Filter&Sum and mask-based MVDR neural channel integration approaches are developed. To reduce the error cost mismatch between the separation and recognition components, the entire system is jointly fine-tuned using a multi-task criterion interpolating the scale-invariant signal-to-noise ratio (SI-SNR) with either the connectionist temporal classification (CTC) or lattice-free maximum mutual information (LF-MMI) loss function. Experiments suggest that the proposed audio-visual multi-channel recognition system outperforms the baseline audio-only multi-channel ASR system by up to 8.04% (31.68% relative) and 22.86% (58.51% relative) absolute WER reduction on overlapped speech constructed using either simulation or replay of the LRS2 dataset, respectively. Consistent performance improvements are also obtained using the proposed audio-visual multi-channel recognition system when using occluded video input with the face region randomly covered up to 60%.
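
For reference, the scale-invariant SNR used in the interpolated training criterion is commonly computed as follows (a standard formulation, not necessarily the authors' exact code):

    import torch

    def si_snr(est, ref, eps=1e-8):
        # Scale-invariant SNR in dB between estimated and reference signals.
        est = est - est.mean(dim=-1, keepdim=True)
        ref = ref - ref.mean(dim=-1, keepdim=True)
        # Project the estimate onto the reference: the target component.
        s_target = (est * ref).sum(-1, keepdim=True) * ref \
                   / ((ref ** 2).sum(-1, keepdim=True) + eps)
        e_noise = est - s_target
        ratio = (s_target ** 2).sum(-1) / ((e_noise ** 2).sum(-1) + eps)
        return 10 * torch.log10(ratio + eps)

    est, ref = torch.randn(2, 16000), torch.randn(2, 16000)
    print(si_snr(est, ref))     # one value per utterance in the batch
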


[32] 2011.07782

Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment

There has been a growing interest in developing data-driven and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational effort, less channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment where parameters such as CSIs keep changing. This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment. Specifically, we consider an "episodically dynamic" setting where the environment changes in "episodes", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into the modeling process of learning wireless systems, so that the learning model can incrementally adapt to the new episodes, without forgetting knowledge learned from the previous episodes. Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples. We demonstrate the effectiveness of the CL approach by customizing it to two popular DNN based models (one for power control and one for beamforming), and testing using both synthetic and real data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it maintains high performance over the previously encountered scenarios as well.


[33] 2011.07791

Block-Online Guided Source Separation

We propose a block-online algorithm for guided source separation (GSS). GSS is a speech separation method that uses diarization information to update the parameters of the generative model of the observation signals. Previous studies have shown that GSS performs well in multi-talker scenarios. However, it requires a large amount of calculation time, which is an obstacle to the deployment of online applications. Another problem is that offline GSS is an utterance-wise algorithm, so its latency grows with the length of the utterance. With the proposed algorithm, block-wise input samples and the corresponding time annotations are concatenated with those in the preceding context and used to update the parameters. Using the context enables the algorithm to estimate time-frequency masks accurately from only one iteration of optimization per block, and its latency depends not on the utterance length but on the predetermined block length. It also reduces the calculation cost by updating only the parameters of active speakers in each block and its context. Evaluation on the CHiME-6 corpus and a meeting corpus showed that the proposed algorithm achieved almost the same performance as the conventional offline GSS algorithm but with 32x faster calculation, which is sufficient for real-time applications.


[34] 2011.07795

Deep learning in magnetic resonance prostate segmentation: A review and a new perspective

Prostate radiotherapy is a well-established curative oncology modality, which in the future will use Magnetic Resonance Imaging (MRI)-based radiotherapy for daily adaptive radiotherapy target definition. However, accurately delineating the prostate from MRI data is a time-consuming process. Deep learning has been identified as a potential new technology for the delivery of precision radiotherapy in prostate cancer, where accurate prostate segmentation helps in cancer detection and therapy. However, trained models can be limited in their application to the clinical setting due to differing acquisition protocols and the relatively small size of publicly available datasets. Therefore, to explore the field of prostate segmentation and to discover a generalisable solution, we review the state-of-the-art deep learning algorithms in MR prostate segmentation; provide insights to the field by discussing their limitations and strengths; and propose an optimised 2D U-Net for MR prostate segmentation. We evaluate the performance on four publicly available datasets using the Dice Similarity Coefficient (DSC) as the performance metric. Our experiments include within-dataset evaluation and cross-dataset evaluation. The best result is achieved by composite evaluation (DSC of 0.9427 on the Decathlon test set) and the poorest result by cross-dataset evaluation (DSC of 0.5892; Prostate X training set, Promise 12 testing set). We outline the challenges and provide recommendations for future work. Our research provides a new perspective on MR prostate segmentation and, more importantly, we provide standardised experiment settings for researchers to evaluate their algorithms. Our code is available at https://github.com/AIEMMU/MRI_Prostate.
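
The DSC metric used throughout the evaluation is straightforward to compute on binary masks; a minimal sketch:

    import numpy as np

    def dice_coefficient(pred, target, eps=1e-7):
        # DSC = 2|X intersect Y| / (|X| + |Y|) for binary masks.
        pred, target = pred.astype(bool), target.astype(bool)
        inter = np.logical_and(pred, target).sum()
        return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    pred = np.zeros((128, 128)); pred[30:90, 40:100] = 1  # predicted mask
    gt = np.zeros((128, 128)); gt[35:95, 45:105] = 1      # reference contour
    print(round(dice_coefficient(pred, gt), 4))
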


[35] 2011.07859

A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network

Polyphonic sound event detection and localization (SELD) is a challenging task because it is difficult to jointly optimize sound event detection (SED) and direction-of-arrival (DOA) estimation in the same network. We propose a general network architecture for SELD in which the SELD network comprises sub-networks that are pretrained to solve SED and DOA estimation independently, and a recurrent layer that combines the SED and DOA estimation outputs into SELD outputs. The recurrent layer performs the alignment between the sound classes and DOAs of sound events while being unaware of how these outputs are produced by the upstream SED and DOA estimation algorithms. This simple network architecture is compatible with different existing SED and DOA estimation algorithms. It is highly practical since the sub-networks can be improved independently. The experimental results using the DCASE 2020 SELD dataset show that the performance of our proposed network architecture using different SED and DOA estimation algorithms and different audio formats is competitive with other state-of-the-art SELD algorithms. The source code for the proposed SELD network architecture is available on GitHub.


[36] 2011.07950

Comprehensive evaluation of no-reference image quality assessment algorithms on authentic distortions

Objective image quality assessment deals with the prediction of digital images' perceptual quality. No-reference image quality assessment predicts the quality of a given input image without any knowledge or information about its pristine (distortion-free) counterpart. Machine learning algorithms are heavily used in no-reference image quality assessment because it is very complicated to model the human visual system's quality perception. Moreover, no-reference image quality assessment algorithms are evaluated on publicly available benchmark databases, which contain images with their corresponding quality scores. In this study, we evaluate several machine learning based NR-IQA methods and one opinion-unaware method on databases consisting of authentic distortions. Specifically, the LIVE In the Wild and KonIQ-10k databases were applied to evaluate the state-of-the-art. For the machine learning based methods, approximately 80% of the images were used for training and the remaining 20% for testing. Furthermore, average PLCC, SROCC, and KROCC values were reported over 100 random train-test splits, and the statistics of these values were also reported using boxplots. Our evaluation results may be helpful in obtaining a clear understanding of the status of state-of-the-art no-reference image quality assessment methods.
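
A minimal sketch of the evaluation protocol: average PLCC, SROCC, and KROCC over 100 random 80/20 splits using scipy.stats. The train-and-predict routine here is a hypothetical placeholder.

    import numpy as np
    from scipy import stats

    def evaluate_splits(preds_fn, scores, n_splits=100, train_frac=0.8):
        # Average PLCC/SROCC/KROCC over random train-test splits;
        # preds_fn(train_idx, test_idx) is a hypothetical routine that
        # trains a model and returns predictions on the test indices.
        rng = np.random.default_rng(0)
        n = len(scores); results = []
        for _ in range(n_splits):
            idx = rng.permutation(n)
            tr, te = idx[: int(train_frac * n)], idx[int(train_frac * n):]
            pred = preds_fn(tr, te)
            results.append([stats.pearsonr(pred, scores[te])[0],
                            stats.spearmanr(pred, scores[te])[0],
                            stats.kendalltau(pred, scores[te])[0]])
        return np.mean(results, axis=0)   # mean PLCC, SROCC, KROCC

    scores = np.random.default_rng(1).uniform(1, 5, 200)   # toy MOS values
    noisy = lambda tr, te: scores[te] + \
        np.random.default_rng(2).normal(0, 0.3, len(te))   # dummy predictor
    print(evaluate_splits(noisy, scores))
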


[37] 2011.07952

Multi-channel MR Reconstruction (MC-MRRec) Challenge -- Comparing Accelerated MR Reconstruction Models and Assessing Their Generalizability to Datasets Collected with Different Coils

The 2020 Multi-channel Magnetic Resonance Reconstruction (MC-MRRec) Challenge had two primary goals: 1) compare different MR image reconstruction models on a large dataset and 2) assess the generalizability of these models to datasets acquired with a different number of receiver coils (i.e., multiple channels). The challenge had two tracks: Track 01 focused on assessing models trained and tested with 12-channel data. Track 02 focused on assessing models trained with 12-channel data and tested on both 12-channel and 32-channel data. While the challenge is ongoing, here we describe the first edition of the challenge and summarise submissions received prior to 5 September 2020. Track 01 had five baseline models and received four independent submissions. Track 02 had two baseline models and received two independent submissions. This manuscript provides relevant comparative information on the current state-of-the-art of MR reconstruction and highlights the challenges of obtaining generalizable models that are required prior to clinical adoption. Both challenge tracks remain open and will provide an objective performance assessment for future submissions. Subsequent editions of the challenge are proposed to investigate new concepts and strategies, such as the integration of potentially available longitudinal information during the MR reconstruction process. An outline of the proposed second edition of the challenge is presented in this manuscript.


[38] 2011.07981

Resilient Identification of Distribution Network Topology

Network topology identification (TI) is an essential function for distributed energy resources management systems (DERMS) to organize and operate widespread distributed energy resources (DERs). In this paper, discriminant analysis (DA) is deployed to develop a network TI function that relies only on the measurements available to DERMS. The proposed method is able to identify the network switching configuration as well as the status of protective devices. Next, to improve the TI resiliency against the interruption of communication channels, a quadratic programming optimization approach is proposed to recover the missing signals. By deploying the proposed data recovery approach and Bayes' theorem together, a benchmark is then developed to identify anomalous measurements. This benchmark can make the TI function resilient against cyber-attacks. Having a low computational burden, this approach is fast and can be applied in real-time applications. Sensitivity analysis is performed to assess the contribution of different measurements and the impact of the system load type and loading level on the performance of the proposed approach.


[39] 2011.07995

Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model

Breast cancer screening is one of the most common radiological tasks, with over 39 million exams performed each year. While breast cancer screening has been one of the most studied medical imaging applications of artificial intelligence, the development and evaluation of such algorithms are hindered by the lack of well-annotated, large-scale publicly available datasets. This is particularly an issue for digital breast tomosynthesis (DBT), which is a relatively new breast cancer screening modality. We have curated and made publicly available a large-scale dataset of digital breast tomosynthesis images. It contains 22,032 reconstructed DBT volumes belonging to 5,610 studies from 5,060 patients. These comprised four groups: (1) 5,129 normal studies, (2) 280 studies where additional imaging was needed but no biopsy was performed, (3) 112 benign biopsied studies, and (4) 89 studies with cancer. Our dataset includes masses and architectural distortions that were annotated by two experienced radiologists. Additionally, we developed a single-phase deep learning detection model and tested it on our dataset to serve as a baseline for future research. Our model reached a sensitivity of 65% at 2 false positives per breast. Our large, diverse, and highly-curated dataset will facilitate the development and evaluation of AI algorithms for breast cancer screening by providing data for training as well as a common set of cases for model validation. The performance of the model developed in our study shows that the task remains challenging; it will serve as a baseline for future model development.


[40] 2011.08001

Deep-LIBRA: Artificial intelligence method for robust quantification of breast density with independent validation in breast cancer risk assessment

Breast density is an important risk factor for breast cancer that also affects the specificity and sensitivity of screening mammography. Current federal legislation mandates reporting of breast density for all women undergoing breast screening. Clinically, breast density is assessed visually using the American College of Radiology Breast Imaging Reporting And Data System (BI-RADS) scale. Here, we introduce an artificial intelligence (AI) method to estimate breast percentage density (PD) from digital mammograms. Our method leverages deep learning (DL) using two convolutional neural network architectures to accurately segment the breast area. A machine-learning algorithm combining superpixel generation, texture feature analysis, and support vector machine is then applied to differentiate dense from non-dense tissue regions, from which PD is estimated. Our method has been trained and validated on a multi-ethnic, multi-institutional dataset of 15,661 images (4,437 women), and then tested on an independent dataset of 6,368 digital mammograms (1,702 women; cases=414) for both PD estimation and discrimination of breast cancer. On the independent dataset, PD estimates from Deep-LIBRA and an expert reader were strongly correlated (Spearman correlation coefficient = 0.90). Moreover, Deep-LIBRA yielded a higher breast cancer discrimination performance (area under the ROC curve, AUC = 0.611 [95% confidence interval (CI): 0.583, 0.639]) compared to four other widely-used research and commercial PD assessment methods (AUCs = 0.528 to 0.588). Our results suggest a strong agreement of PD estimates between Deep-LIBRA and gold-standard assessment by an expert reader, as well as improved performance in breast cancer risk assessment over state-of-the-art open-source and commercial methods.


[41] 2011.08129

Tissue characterization based on the analysis on i3DUS data for diagnosis support in neurosurgery

Brain shift makes pre-operative MRI navigation highly inaccurate, hence intraoperative modalities are adopted in the surgical theatre. Due to its excellent economy and portability, ultrasound imaging is used at our collaborating hospital, Charing Cross Hospital, Imperial College London, UK. However, intraoperative diagnosis on ultrasound images is not straightforward or consistent, even for very experienced clinical experts. Hence, there is a demand for a computer-aided diagnosis (CAD) system to provide a robust second opinion to help the surgeons. The proposed CAD system, based on a "Mixed-Attention Res-U-net with asymmetric loss function", achieves state-of-the-art results against the ground truth by direct pixel-level classification, and it also outperforms all current mainstream pixel-level classification methods (e.g. U-net, FCN) in all evaluation metrics.


[42] 2011.08146

Temporal Dynamic Model for Resting State fMRI Data: A Neural Ordinary Differential Equation approach

The objective of this paper is to provide a temporal dynamic model for resting-state functional Magnetic Resonance Imaging (fMRI) trajectories that predicts future brain images based on a given sequence. To this end, we propose a model that takes advantage of representation learning and Neural Ordinary Differential Equations (Neural ODEs) to compress the fMRI image data into a latent representation and learn to predict the trajectory following a differential equation. The latent space was analyzed with a Gaussian mixture model. The learned fMRI trajectory embedding can be used to explain the variance of the trajectory and predict human traits for each subject. This method achieves an average spatial correlation of 0.5 for the whole predicted trajectory, and provides trained ODE parameters for further analysis.
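
A minimal sketch of the latent Neural ODE component using the torchdiffeq package; the encoder/decoder, latent dimension, and dynamics network are illustrative assumptions rather than the paper's architecture.

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint  # pip install torchdiffeq

    class LatentDynamics(nn.Module):
        # Learns dz/dt = f_theta(z) in the latent space of an fMRI encoder.
        def __init__(self, latent_dim=16, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(latent_dim, hidden),
                                     nn.Tanh(),
                                     nn.Linear(hidden, latent_dim))

        def forward(self, t, z):
            return self.net(z)

    # z0: latent code of the first frame from a (hypothetical) encoder.
    f = LatentDynamics()
    z0 = torch.randn(1, 16)
    t = torch.linspace(0.0, 1.0, 20)      # 20 future time points
    z_traj = odeint(f, z0, t)             # (20, 1, 16) latent trajectory
    # A (hypothetical) decoder would map z_traj to predicted fMRI frames.
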


[43] 2011.06092

40 Gbps Readout interface STARE for the AGATA Project

The Advanced GAmma Tracking Array (AGATA) multi-detector spectrometer will provide precise information for the study of the properties of exotic nuclear matter (very unbalanced proton (Z) and neutron (N) numbers) along the proton and neutron drip lines and of super-heavy nuclei. This is done using the latest particle accelerator technology. The AGATA spectrometer consists of 180 high-purity germanium detectors. Each detector is segmented into 38 segments. The very demanding project requirements are to measure gamma-ray energies with very high resolution (< 1×10^-3) at a high detector counting rate (50 kevents/s/crystal). This results in a very high data transfer rate per crystal (5 to 8 Gbps). The 38 segments are sampled at 100 MHz with 14 bits of resolution. The samples are continuously transferred to the CAP module, which reduces the data rate from 64 Gbps to 5 Gbps. The CAP module also adds continuous monitoring data, resulting in a total outgoing data rate of 10 Gbps. The STARE module is designed to fit between the CAP module and the computer farm. It packages the data from the CAP module and transmits it to the server farm using a 10 Gbps UDP connection, with a delivery insurance mechanism implemented to ensure that all data is transferred.


[44] 2011.06748

Safe and Robust Motion Planning for Dynamic Robotics via Control Barrier Functions

Control Barrier Functions (CBFs) are widely used to enforce safety-critical constraints on nonlinear systems. Recently, these functions have been incorporated into path planning frameworks to design safety-critical path planners. However, existing methods fall short of providing realistic paths when both run-time complexity and safety-critical constraints are considered. This paper proposes a novel motion planning approach based on the Rapidly-exploring Random Trees (RRT) algorithm that enforces robust CBF and kinodynamic constraints to generate a safety-critical, obstacle-free path while accounting for model uncertainty from both the robot dynamics and perception. Our analysis indicates that the proposed method outperforms various conventional RRT-based path planners, guaranteeing a safety-critical path with reduced computational overhead. We present numerical validation of the algorithm on the Hamster V7 robot car, a micro autonomous Unmanned Ground Vehicle, performing dynamic navigation on an obstacle-ridden path under various uncertainties in perception noise and robot dynamics.
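
As a hedged illustration of how a CBF can vet candidate edges in a sampling-based planner (the paper's robust formulation is more involved), one can discretize an edge and check a discrete-time version of the standard condition h_dot(x) + alpha*h(x) >= 0:

    # Hypothetical sketch: reject RRT edges violating a discrete-time CBF condition
    # h(x_{k+1}) - h(x_k) + alpha * h(x_k) >= margin, with {h >= 0} the safe set.
    import numpy as np

    def h(x, obstacle=np.array([2.0, 2.0]), radius=0.5):
        return np.linalg.norm(x - obstacle) ** 2 - radius ** 2   # safe iff h >= 0

    def edge_is_safe(x_from, x_to, alpha=0.2, n_steps=20, margin=0.0):
        pts = np.linspace(x_from, x_to, n_steps + 1)
        for xk, xk1 in zip(pts[:-1], pts[1:]):
            if h(xk1) - h(xk) + alpha * h(xk) < margin:   # margin > 0 adds robustness
                return False
        return True

    # edge_is_safe(np.zeros(2), np.array([4.0, 4.0]))  -> False: edge grazes obstacle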


[45] 2011.07068

Fast and Robust Cascade Model for Multiple Degradation Single Image Super-Resolution

Single Image Super-Resolution (SISR) is one of the low-level computer vision problems that has received increased attention in the last few years. Current approaches are primarily based on harnessing the power of deep learning models and optimization techniques to reverse the degradation model. Owing to the hardness of the problem, mainly isotropic blurring or Gaussian kernels with small anisotropic deformations have been considered so far. Here, we widen this scenario by including the large non-Gaussian blurs that arise in real camera movements. Our approach leverages the degradation model and proposes a new formulation of the Convolutional Neural Network (CNN) cascade model, where each network sub-module is constrained to solve a specific degradation: deblurring or upsampling. A new densely connected CNN architecture is proposed where the output of each sub-module is restricted using external knowledge to focus it on its specific task. To the best of our knowledge, this module-level use of domain knowledge is a novelty in SISR. To refine the final estimate, a last sub-module takes care of the residual errors propagated by the previous sub-modules. We evaluate our model on three state-of-the-art (SOTA) SISR datasets and compare the results with the SOTA models. The results show that our model is the only one able to manage our wider set of deformations. Furthermore, our model outperforms all current SOTA methods on a standard set of deformations. In terms of computational load, our model is also more efficient than its two closest competitors. Although the approach is non-blind and requires an estimation of the blur kernel, it is robust to blur kernel estimation errors, making it a good alternative to blind models.
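
As an illustration of the module-constrained cascade idea, here is a minimal PyTorch sketch (architecture details are our assumptions, not the paper's): one sub-module is constrained to deblur via a residual correction, the next to upsample.

    # Sketch of a two-stage cascade: deblur (residual) followed by upsampling.
    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

    class DeblurModule(nn.Module):
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(conv_block(1, 32), conv_block(32, 32),
                                      nn.Conv2d(32, 1, 3, padding=1))
        def forward(self, x):
            return x + self.body(x)        # residual: predict only the blur correction

    class UpsampleModule(nn.Module):
        def __init__(self, scale=2):
            super().__init__()
            self.body = nn.Sequential(conv_block(1, 32),
                                      nn.Conv2d(32, scale * scale, 3, padding=1),
                                      nn.PixelShuffle(scale))
        def forward(self, x):
            return self.body(x)

    sr = nn.Sequential(DeblurModule(), UpsampleModule())
    y = sr(torch.randn(1, 1, 32, 32))      # -> (1, 1, 64, 64)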


[46] 2011.07089

Robust Quadruped Jumping via Deep Reinforcement Learning

In this paper we consider the general task of jumping over varying distances and heights for a quadrupedal robot in noisy environments, such as off of uneven terrain and with variable robot dynamics parameters. To jump accurately in such conditions, we propose a framework using deep reinforcement learning to leverage the complex solution of nonlinear trajectory optimization for quadrupedal jumping. While the standalone optimization limits jumping to take-off from flat ground and requires accurate assumptions about the robot dynamics, our proposed approach improves robustness, allowing jumping off of significantly uneven terrain with variable robot dynamics parameters. Through our method, the quadruped is able to jump distances of up to 1 m and heights of up to 0.4 m, while being robust to environment noise in the form of foot disturbances of up to 0.1 m in height, as well as 5% variability in its body mass and inertia. This behavior is learned through just a few thousand simulated jumps, and video results can be found at https://youtu.be/WVoImmxImL8.


[47] 2011.07104

Trajectory Optimization for High-Dimensional Nonlinear Systems under STL Specifications

Signal Temporal Logic (STL) has gained popularity in recent years as a specification language for cyber-physical systems, especially in robotics. Beyond being expressive and easy to understand, STL is appealing because the synthesis problem---generating a trajectory that satisfies a given specification---can be formulated as a trajectory optimization problem. Unfortunately, the associated cost function is nonsmooth and non-convex. As a result, existing synthesis methods scale poorly to high-dimensional nonlinear systems. In this letter, we present a new trajectory optimization approach for STL synthesis based on Differential Dynamic Programming (DDP). It is well known that DDP scales well to extremely high-dimensional nonlinear systems like robotic quadrupeds and humanoids: we show that these advantages can be harnessed for STL synthesis. We prove the soundness of our proposed approach, demonstrate order-of-magnitude speed improvements over the state-of-the-art on several benchmark problems, and demonstrate the scalability of our approach to the full nonlinear dynamics of a 7 degree-of-freedom robot arm.
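
The nonsmoothness stems from the min/max operators in the STL robustness measure. A standard smoothing device (a plausible ingredient here, though not necessarily the authors' exact construction) replaces them with log-sum-exp,

\[
\min(a_1,\dots,a_m) \approx -\tfrac{1}{k}\log\sum_{i=1}^{m} e^{-k a_i}, \qquad
\max(a_1,\dots,a_m) \approx \tfrac{1}{k}\log\sum_{i=1}^{m} e^{k a_i},
\]

which recovers the exact min/max as $k \to \infty$ and yields the differentiable cost that DDP requires.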


[48] 2011.07124

Survey2Survey: A deep learning generative model approach for cross-survey image mapping

During the last decade, there has been an explosive growth in survey data and deep learning techniques, both of which have enabled great advances for astronomy. The amount of data from various surveys from multiple epochs with a wide range of wavelengths and vast sky coverage, albeit with varying brightness and quality, is overwhelming, and leveraging information from overlapping observations from different surveys has limitless potential in understanding galaxy formation and evolution. Synthetic galaxy image generation using physical models has been an important tool for survey data analysis, while using deep learning generative models shows great promise. In this paper, we present a novel approach for robustly expanding and improving survey data through cross-survey feature translation. We trained two types of generative neural networks to map images from the Sloan Digital Sky Survey (SDSS) into corresponding images from the Dark Energy Survey (DES), increasing the brightness and S/N of the fainter, lower quality source images without losing important morphological information. We demonstrate the robustness of our method by generating DES representations of SDSS images from outside the overlapping region, showing that the brightness and quality are improved even when the source images are of lower quality than the training images. Finally, we highlight several images in which the reconstruction process appears to have removed large artifacts from SDSS images. While only an initial application, our method shows promise as a method for robustly expanding and improving the quality of optical survey data and provides a potential avenue for cross-band reconstruction.


[49] 2011.07168

Expertise and confidence explain how social influence evolves along intellective tasks

Discovering the antecedents of individuals' influence in collaborative environments is an important, practical, and challenging problem. In this paper, we study interpersonal influence in small groups of individuals who collectively execute a sequence of intellective tasks. We observe that along an issue sequence with feedback, individuals with higher expertise and social confidence are accorded higher interpersonal influence. We also observe that low-performing individuals tend to underestimate their high-performing teammate's expertise. Based on these observations, we introduce three hypotheses and present empirical and theoretical support for their validity. We report empirical evidence on longstanding theories of transactive memory systems, social comparison, and confidence heuristics on the origins of social influence. We propose a cognitive dynamical model inspired by these theories to describe the process by which individuals adjust interpersonal influences over time. We demonstrate the model's accuracy in predicting individuals' influence and provide analytical results on its asymptotic behavior for the case with identically performing individuals. Lastly, we propose a novel approach using deep neural networks on a pre-trained text embedding model for predicting the influence of individuals. Using message contents, message times, and individual correctness collected during tasks, we are able to accurately predict individuals' self-reported influence over time. Extensive experiments verify the accuracy of the proposed models compared to baselines such as structural balance and reflected appraisal model. While the neural networks model is the most accurate, the dynamical model is the most interpretable for influence prediction.


[50] 2011.07210

Energy-Efficient Resource Allocation in UAV-Enabled Detection and Communication Systems

This paper investigates the problem of resource allocation for unmanned aerial vehicle (UAV) enabled joint radar and communication systems with rate-splitting multiple access (RSMA). In the considered system, the UAV serves as an aerial base station that simultaneously communicates with multiple users and transmits probing signals to targets of interest. By virtue of using linearly precoded rate splitting at the transmitter and successive interference cancellation at the receivers, RSMA is introduced as a promising paradigm to manage interference as well as enhance spectrum and energy efficiency. To maximize the energy efficiency of UAV networks, the deployment location and the beamforming matrix are jointly optimized under constraints on the power budget, transmission rate, and approximation error. To solve the formulated non-convex problem (P1) efficiently, we decompose it into the UAV deployment subproblem (P2) and the beamforming optimization subproblem (P3). Then, we invoke successive convex approximation, difference-of-convex programming, and Dinkelbach's method to transform the intractable subproblems P2 and P3 into convex ones at each iteration. Next, an alternating algorithm is designed to solve the non-linear and non-convex problem (P1) in an efficient manner, and the corresponding complexity is analyzed as well. Finally, simulation results reveal that the proposed RSMA-based algorithm is superior to orthogonal multiple access and power-domain non-orthogonal multiple access in terms of power consumption and energy efficiency.
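
Dinkelbach's method, which the paper invokes for the fractional energy-efficiency objective, reduces ratio maximization to a sequence of parametric subproblems; a generic sketch with a toy inner solver (all functions below are placeholders):

    # Generic Dinkelbach iteration for maximizing a ratio R(x)/P(x), e.g.
    # energy efficiency = rate / power. The grid search stands in for the
    # convexified subproblems solved in the paper.
    import numpy as np

    def dinkelbach(R, P, candidates, tol=1e-6, max_iter=50):
        lam = 0.0
        for _ in range(max_iter):
            # Parametric subproblem: maximize R(x) - lam * P(x)
            vals = [R(x) - lam * P(x) for x in candidates]
            x_star = candidates[int(np.argmax(vals))]
            if abs(R(x_star) - lam * P(x_star)) < tol:   # F(lam) ~ 0 -> optimal ratio
                return x_star, lam
            lam = R(x_star) / P(x_star)                  # update with the new ratio
        return x_star, lam

    R = lambda x: np.log2(1.0 + 10.0 * x)    # toy rate
    P = lambda x: 1.0 + x                    # toy power
    x_opt, ee = dinkelbach(R, P, np.linspace(0.01, 1.0, 1000))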


[51] 2011.07242

Deep Learning for Joint Channel Estimation and Feedback in Massive MIMO Systems

The great potential of massive multiple-input multiple-output (MIMO) in frequency division duplex (FDD) mode can be fully exploited when the downlink channel state information (CSI) is available at base stations, which is difficult due to the large feedback overhead caused by the massive number of antennas. In this paper, we propose a deep-learning-based joint channel estimation and feedback framework, which comprehensively realizes the estimation, compression, and reconstruction of downlink channels in FDD massive MIMO systems. Two networks are constructed to estimate and feed back channels explicitly and implicitly. The explicit channel estimation and feedback network adopts a multiple-signal-to-noise-ratio (SNR) training technique to obtain a single trained channel estimation subnet that works well across different SNRs and employs a deep residual network to reconstruct the channels, while the implicit channel estimation and feedback network directly compresses pilots and sends them back to reduce network parameters. A quantization module is also designed to generate data-bearing bitstreams. Simulation results show that the two proposed networks exhibit excellent reconstruction performance and are robust to different environments and quantization errors.


[52] 2011.07348

Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays

When performing multi-channel speech enhancement with a wireless acoustic sensor network, streaming information from all sensors can be prohibitive in terms of communication costs. However, not all sensors are necessary to achieve good performance, which presents an opportunity to reduce communication costs. We propose a data-driven technique to exploit these opportunities by jointly learning a speech enhancement network and a data-request network. Our model is trained with a task-performance/communication-cost trade-off. While working within the trade-off, our method can intelligently stream from more microphones in lower-SNR scenes and fewer microphones in higher-SNR scenes. We evaluate the model in a complex echoic acoustic scene with moving sources and show that it matches the performance of a baseline model while streaming less data.


[53] 2011.07406

Using Convolutional Variational Autoencoders to Predict Post-Trauma Health Outcomes from Actigraphy Data

Depression and post-traumatic stress disorder (PTSD) are psychiatric conditions commonly associated with experiencing a traumatic event. Estimating mental health status through non-invasive techniques such as activity-based algorithms can help to identify successful early interventions. In this work, we used locomotor activity captured from 1113 individuals who wore a research-grade smartwatch post-trauma. A convolutional variational autoencoder (VAE) architecture was used for unsupervised feature extraction from four weeks of actigraphy data. Using the VAE latent variables and the participant's pre-trauma physical health status as features, a logistic regression classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.64 for estimating mental health outcomes. The results indicate that the VAE model is a promising approach to actigraphy data analysis for mental health outcomes in long-term studies.


[54] 2011.07424

Intention-Based Lane Changing and Lane Keeping Haptic Guidance Steering System

Haptic guidance in a shared steering assistance system has drawn significant attention in the intelligent vehicle field, owing to its mutual communication ability for vehicle control. By exerting continuous torque on the steering wheel, both the driver and the support system can share lateral control of the vehicle. However, current haptic guidance steering systems show deficiencies in assisting lane changing. This study explored a new steering interaction method, including the design and evaluation of an intention-based haptic shared steering system. Such an intention-based method can support both lane keeping and lane changing assistance by detecting the driver's lane-change intention. Using a deep learning-based method to model the driver's decision timing regarding lane crossing, an adaptive gain control method was proposed for realizing the steering control system. An intention consistency method was proposed to detect whether the driver and the system were acting towards the same target trajectories and to accurately capture the driver's intention. A driving simulator experiment was conducted to test the system performance. Participants were required to perform six trials with assistive methods and one trial without assistance. The results demonstrated that the supporting system decreased the lane departure risk in the lane keeping tasks and could support a fast and stable lane changing maneuver.


[55] 2011.07430

Audio-Visual Event Recognition through the lens of Adversary

As audio/visual classification models are widely deployed for sensitive tasks like content filtering at scale, it is critical to understand their robustness in addition to improving their accuracy. This work studies several key questions related to multimodal learning through the lens of adversarial noise: 1) How does the choice of early/middle/late fusion affect robustness and accuracy? 2) How do different frequency/time-domain features contribute to robustness? 3) How do different neural modules contribute to susceptibility to adversarial noise? In our experiments, we construct adversarial examples to attack state-of-the-art neural models trained on Google AudioSet. We compare how much adversarial perturbation, measured by its size $\epsilon$ under different $L_p$ norms, is needed to "deactivate" the victim model. Using adversarial noise to ablate multimodal models, we are able to provide insights into the best potential fusion strategy for balancing the model-parameter/accuracy and robustness trade-off, and to distinguish the robust features from the non-robust features that various neural network models tend to learn.
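
A standard way to construct such bounded perturbations is projected gradient descent under an L_inf budget; a hedged sketch (the paper's exact attack setup may differ):

    # PGD attack sketch: ascend the loss, project back into the eps-ball each step.
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=10):
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()     # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)         # project to L_inf ball
            x_adv = x_adv.clamp(0.0, 1.0)                    # keep valid input range
        return x_adv.detach()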


[56] 2011.07442

Speech enhancement guided by contextual articulatory information

Previous studies have confirmed the effectiveness of leveraging articulatory information to attain improved speech enhancement (SE) performance. By augmenting the original acoustic features with the place/manner of articulation features, the SE process can be guided to consider the articulatory properties of the input speech when performing enhancement. Hence, we believe that contextual information about articulatory attributes can further benefit SE. In this study, we propose an SE system that incorporates contextual articulatory information, obtained using broad-phone-class (BPC) end-to-end automatic speech recognition (ASR). Two training strategies are developed to train the SE system based on the BPC-based ASR: a multitask-learning strategy and a deep-feature training strategy. Experimental results on the TIMIT dataset confirm that the contextual articulatory information helps an SE system achieve better results. Moreover, in contrast to another SE system trained with monophonic ASR, the BPC-based ASR (providing contextual articulatory information) improves SE performance more effectively under different signal-to-noise ratios (SNRs).


[57] 2011.07470

An efficient label-free analyte detection algorithm for time-resolved spectroscopy

Time-resolved spectral techniques are an important analysis tool in many contexts, from physical chemistry to biomedicine. Customarily, the label-free detection of analytes is performed manually by experts with the aid of classic dimensionality-reduction methods, such as Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). This fundamental reliance on expert analysis for unknown analyte detection severely hinders the applicability and throughput of such techniques. For this reason, in this paper, we formulate this detection problem as an unsupervised learning problem and propose a novel machine learning algorithm for label-free analyte detection. To show the effectiveness of the proposed solution, we consider the problem of detecting amino acids in liquid chromatography coupled with Raman spectroscopy (LC-Raman).
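
For context, the classic expert-driven baseline decomposes the time-resolved spectral matrix with NMF and manually inspects the components; a minimal sketch with placeholder data:

    # NMF baseline: factor a (time frames x spectral bins) matrix into
    # concentration-like profiles and component spectra.
    import numpy as np
    from sklearn.decomposition import NMF

    X = np.random.rand(200, 512)                # placeholder: 200 frames, 512 bins
    model = NMF(n_components=4, init="nndsvd", max_iter=500)
    C = model.fit_transform(X)                  # (200, 4) temporal concentration profiles
    S = model.components_                       # (4, 512) component spectra
    # An expert would match rows of S against known analyte signatures;
    # the paper's unsupervised algorithm aims to automate that step.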


[58] 2011.07482

Towards Trainable Saliency Maps in Medical Imaging

While the success of Deep Learning (DL) in automated diagnosis can be transformative to medical practice, especially for people with little or no access to doctors, its widespread acceptance is severely limited by inherent black-box decision making and unsafe failure modes. While saliency methods attempt to tackle this problem in non-medical contexts, their a priori explanations do not transfer well to medical use cases. With this study we validate a model design element that is agnostic to both architecture complexity and model task, and show how introducing this element yields an inherently self-explanatory model. We compare our results with state-of-the-art non-trainable saliency maps on the RSNA Pneumonia Dataset and demonstrate much higher localization efficacy using our adopted technique. We also compare against a fully supervised baseline and provide a reasonable alternative to its high data-labelling overhead. We further investigate the validity of our claims through qualitative evaluation by an expert reader.


[59] 2011.07491

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.


[60] 2011.07511

Wide-field Decodable Orthogonal Fingerprints of Single Nanoparticles Unlock Multiplexed Digital Assays

The control in optical uniformity of single nanoparticles and tuning their diversity in orthogonal dimensions, dot to dot, holds the key to unlock nanoscience and applications. Here we report that the time-domain emissive profile from single upconversion nanoparticle, including the rising, decay and peak moment of the excited state population (T2 profile), can be arbitrarily tuned by upconversion schemes, including interfacial energy migration, concentration dependency, energy transfer, and isolation of surface quenchers. This allows us to significantly increase the coding capacity at the nanoscale. We further implement both time-resolved wide-field imaging and deep-learning techniques to decode these fingerprints, showing high accuracies at high throughput. These high-dimensional optical fingerprints provide a new horizon for applications spanning from sub-diffraction-limit data storage, security inks, to high-throughput single-molecule digital assays and super-resolution imaging.


[61] 2011.07515

Nonlinear Cooperative Control of Double Drone-Bar Transportation System

Due to the limitation of a drone's load capacity, various tasks need to be accomplished by multiple drones in collaboration. In some transportation tasks, two drones are required to lift the load together, which brings even more significant challenges to the control problem because the transportation system is underactuated and contains very complex dynamic coupling. When transporting bar-shaped objects, the load's attitude, the ropes' swing motion, and the distance between the drones should be carefully considered to ensure the safety of the system. So far, few works on double-drone transportation systems guarantee transportation performance, especially in the aforementioned respects. In this paper, a nonlinear cooperative control method is proposed, with both rigorous stability analysis and experimental results demonstrating its performance. Without needing to distinguish leader and follower roles, the proposed method realizes effective control of the two drones separately, owing mainly to a deep analysis of the system dynamics and an elaborate design of the control law. By utilizing Lyapunov techniques, the proposed controller achieves simultaneous positioning and mutual distance control of the drones while efficiently eliminating the swing of the load. Flight experiments are presented to demonstrate the performance of the proposed nonlinear cooperative control strategy.


[62] 2011.07542

Automatic and perceptual discrimination between dysarthria, apraxia of speech, and neurotypical speech

Automatic techniques in the context of motor speech disorders (MSDs) are typically two-class techniques aiming to discriminate between dysarthria and neurotypical speech or between dysarthria and apraxia of speech (AoS). Further, although such techniques are proposed to support the perceptual assessment of clinicians, the automatic and perceptual classification accuracy have never been compared. In this paper, we propose a three-class automatic technique and a set of handcrafted features for the discrimination of dysarthria, AoS and neurotypical speech. Instead of following the commonly used One-versus-One or One-versus-Rest approaches for multi-class classification, a hierarchical approach is proposed. Further, a perceptual study is conducted where speech and language pathologists are asked to listen to recordings of dysarthria, AoS, and neurotypical speech and decide which class the recordings belong to. The proposed automatic technique is evaluated on the same recordings and the automatic and perceptual classification performance are compared. The presented results show that the proposed hierarchical classification approach yields a higher classification accuracy than baseline One-versus-One and One-versus-Rest approaches. Further, the presented results show that the proposed approach yields a higher classification accuracy than the perceptual assessment of speech and language pathologists, demonstrating the potential advantages of integrating automatic tools in clinical practice.
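
The hierarchical scheme is described only at a high level; one plausible realization (the ordering and classifiers below are our assumptions) first separates neurotypical from disordered speech and then splits the disordered branch:

    # Hypothetical two-stage hierarchy: neurotypical vs. disordered, then
    # dysarthria vs. AoS on the disordered branch (features are placeholders).
    import numpy as np
    from sklearn.svm import SVC

    def fit_hierarchy(X, y):            # y in {"neurotypical", "dysarthria", "aos"}
        y = np.asarray(y)
        top = SVC().fit(X, y == "neurotypical")       # stage 1
        sub_mask = y != "neurotypical"
        sub = SVC().fit(X[sub_mask], y[sub_mask])     # stage 2: dysarthria vs. aos
        return top, sub

    def predict_hierarchy(top, sub, X):
        return np.where(top.predict(X), "neurotypical", sub.predict(X))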


[63] 2011.07546

Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst remaining adaptable to different domains.


[64] 2011.07584

Pix2Streams: Dynamic Hydrology Maps from Satellite-LiDAR Fusion

Where are the Earth's streams flowing right now? Inland surface waters expand with floods and contract with droughts, so there is no one map of our streams. Current satellite approaches are limited to monthly observations that map only the widest streams. These are fed by smaller tributaries that make up much of the dendritic surface network but whose flow is unobserved. A complete map of our daily waters can give us an early warning for where droughts are born: the receding tips of the flowing network. Mapping them over years can give us a map of impermanence of our waters, showing where to expect water, and where not to. To that end, we feed the latest high-res sensor data to multiple deep learning models in order to map these flowing networks every day, stacking the time-series maps over many years. Specifically, i) we enhance water segmentation to 50 cm/pixel resolution, a 60x improvement over previous state-of-the-art results. Our U-Net trained on 30-40 cm WorldView-3 images can detect streams as narrow as 1-3 m (a 30-60x improvement over SOTA). Our multi-sensor, multi-res variant, WasserNetz, fuses a multi-day window of 3 m PlanetScope imagery with 1 m LiDAR data to detect streams 5-7 m wide. Both U-Nets produce a water probability map at the pixel level. ii) We integrate this water map over a DEM-derived synthetic valley network map to produce a snapshot of flow at the stream level. iii) We apply this pipeline, which we call Pix2Streams, to a 2-year daily PlanetScope time series of three watersheds in the US to produce the first high-fidelity dynamic map of stream flow frequency. The end result is a new map that, if applied at the national scale, could fundamentally improve how we manage our water resources around the world.


[65] 2011.07595

Accelerating Distributed SGD for Linear Least-Squares Problem

This paper considers the multi-agent distributed linear least-squares problem. The system comprises multiple agents, each agent with a locally observed set of data points, and a common server with whom the agents can interact. The agents' goal is to compute a linear model that best fits the collective data points observed by all the agents. In the server-based distributed settings, the server cannot access the data points held by the agents. The recently proposed Iteratively Pre-conditioned Gradient-descent (IPG) method has been shown to converge faster than other existing distributed algorithms that solve this problem. In the IPG algorithm, the server and the agents perform numerous iterative computations. Each of these iterations relies on the entire batch of data points observed by the agents for updating the current estimate of the solution. Here, we extend the idea of iterative pre-conditioning to the stochastic settings, where the server updates the estimate and the iterative pre-conditioning matrix based on a single randomly selected data point at every iteration. We show that our proposed Iteratively Pre-conditioned Stochastic Gradient-descent (IPSG) method converges linearly in expectation to a proximity of the solution. Importantly, we empirically show that the proposed IPSG method's convergence rate compares favorably to prominent stochastic algorithms for solving the linear least-squares problem in server-based networks.


[66] 2011.07600

Resource Allocation of Dual-Hop VLC/RF Systems with Light Energy Harvesting

In this paper, we study the time allocation optimization problem of maximizing the sum throughput in a dual-hop heterogeneous visible light communication (VLC)/radio frequency (RF) communication system. Two scenarios are investigated. In the first scenario, we consider an optical wireless powered communication network (WPCN) in which all users harvest energy from the received lightwave over the downlink (DL) and then use the harvested energy to transmit information signals over the uplink (UL) channels based on a time division multiple access (TDMA) scheme. The optimal UL time allocation is obtained to maximize the sum throughput of all users. In the second scenario, we consider time-switching simultaneous lightwave information and power transfer (TS-SLIPT) for the dual-hop VLC/RF system, in which the LED transmits information and power simultaneously over the first-hop DL (i.e., the VLC link). The energy harvested at the relay is used to transmit information signals over the UL in the second hop (i.e., the RF link). We propose a multi-objective optimization problem (MOOP) to study the trade-off between UL and DL sum-rate maximization. The non-convex MOOP framework is then transformed into an equivalent form, which yields a set of Pareto-optimal resource allocation policies. We also illustrate the effectiveness of the proposed approaches through numerical results.


[67] 2011.07616

Unsupervised Contrastive Learning of Sound Event Representations

Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data---a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily via mixing of training examples with unrelated backgrounds, followed by other data augmentations. We analyze the main components of our method via ablation experiments. We evaluate the learned representations using linear evaluation, and in two in-domain downstream sound event classification tasks, namely, using limited manually labeled data, and using noisy labeled data. Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels, outperforming supervised baselines.
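
One plausible reading of the view-generation step (the exact recipe is the paper's; the details below are our assumptions) is to superimpose the same sound event on different unrelated backgrounds at random levels, so that the two mixtures form a positive pair:

    # Hypothetical mixing augmentation: add an unrelated background at a random
    # event-to-background SNR, then peak-normalize. Assumes background is longer
    # than the event clip.
    import numpy as np

    def mixed_view(event, background, snr_db_range=(0.0, 15.0), rng=np.random):
        snr_db = rng.uniform(*snr_db_range)
        gain = np.sqrt((event ** 2).mean()
                       / ((background ** 2).mean() * 10 ** (snr_db / 10)))
        start = rng.randint(0, len(background) - len(event) + 1)
        mix = event + gain * background[start:start + len(event)]
        return mix / (np.abs(mix).max() + 1e-9)

    # view_a, view_b = mixed_view(x, bg1), mixed_view(x, bg2)   # a positive pair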


[68] 2011.07643

Advances in the training, pruning and enforcement of shape constraints of Morphological Neural Networks using Tropical Algebra

In this paper we study an emerging class of neural networks based on the morphological operators of dilation and erosion. We explore these networks mathematically from a tropical geometry perspective as well as mathematical morphology. Our contributions are threefold. First, we examine the training of morphological networks via Difference-of-Convex programming methods and extend a binary morphological classifier to multiclass tasks. Second, we focus on the sparsity of dense morphological networks trained via gradient descent algorithms and compare their performance to their linear counterparts under heavy pruning, showing that the morphological networks cope far better and are characterized with superior compression capabilities. Our approach incorporates the effect of the training optimizer used and offers quantitative and qualitative explanations. Finally, we study how the architectural structure of a morphological network can affect shape constraints, focusing on monotonicity. Via Maslov Dequantization, we obtain a softened version of a known architecture and show how this approach can improve training convergence and performance.
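
For concreteness, a morphological (max-plus) dilation layer replaces the weighted sum of a dense layer with a tropical sum; a minimal PyTorch sketch:

    # Dilation layer: y_j = max_i (x_i + w_ij); erosion would use min(x_i - w_ij).
    import torch
    import torch.nn as nn

    class DilationLayer(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.w = nn.Parameter(torch.randn(in_features, out_features))
        def forward(self, x):                         # x: (batch, in_features)
            return (x.unsqueeze(2) + self.w.unsqueeze(0)).max(dim=1).values

    layer = DilationLayer(8, 4)
    y = layer(torch.randn(32, 8))                     # -> (32, 4)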


[69] 2011.07706

Mode Penalty Generative Adversarial Network with adapted Auto-encoder

Generative Adversarial Networks (GANs) are trained to generate samples from a distribution of interest. To this end, the generator network of a GAN learns the implicit distribution of the real data set through classification against candidate generated samples. Recently, various GANs have suggested novel ideas for stably optimizing their networks. However, in real implementations, they sometimes still represent only a narrow part of the true distribution or fail to converge. We assume this ill-posed behavior comes from poor gradients of the discriminator's objective function, which easily trap the generator in a bad situation. To address this problem, we propose a mode penalty GAN combined with a pre-trained autoencoder for explicit representation of generated and real data samples in the encoded space. In this space, we make the generator manifold follow the real manifold by finding all modes of the target distribution. In addition, a penalty for uncovered modes of the target distribution is imposed on the generator, encouraging it to cover the whole target distribution. We demonstrate that applying the proposed method to GANs makes the generator's optimization more stable and accelerates convergence, as shown through experimental evaluations.


[70] 2011.07735

iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering

Most prior art in visual understanding relies solely on analyzing the "what" (e.g., event recognition) and "where" (e.g., event localization), which in some cases, fails to describe correct contextual relationships between events or leads to incorrect underlying visual attention. Part of what defines us as human and fundamentally different from machines is our instinct to seek causality behind any association, say an event Y that happened as a direct result of event X. To this end, we propose iPerceive, a framework capable of understanding the "why" between events in a video by building a common-sense knowledge base using contextual cues to infer causal relationships between objects in the video. We demonstrate the effectiveness of our technique using the dense video captioning (DVC) and video question answering (VideoQA) tasks. Furthermore, while most prior work in DVC and VideoQA relies solely on visual information, other modalities such as audio and speech are vital for a human observer's perception of an environment. We formulate DVC and VideoQA tasks as machine translation problems that utilize multiple modalities. By evaluating the performance of iPerceive DVC and iPerceive VideoQA on the ActivityNet Captions and TVQA datasets respectively, we show that our approach furthers the state-of-the-art. Code and samples are available at: iperceive.amanchadha.com.


[71] 2011.07738

Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

The principle of Reward-Biased Maximum Likelihood Estimate based adaptive control (RBMLE), proposed in Kumar and Becker (1982), is an alternative to the Upper Confidence Bound (UCB) approach (Lai and Robbins, 1985) for employing the principle now known as "optimism in the face of uncertainty" (Auer et al., 2002). It utilizes a modified maximum likelihood estimate with a bias towards those Markov Decision Process (MDP) models that yield a higher average reward. However, its regret performance had not previously been analyzed for reinforcement learning (RL; Sutton et al., 1998) tasks that involve the optimal control of unknown MDPs. We show that it has a learning regret of $O(\log T)$, where $T$ is the time horizon, matching state-of-the-art algorithms. It thus provides an alternative general-purpose method for solving RL problems.
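
In generic form (our paraphrase; the notation is ours), RBMLE replaces the plain maximum-likelihood estimate with a reward-biased one,

\[
\hat{\theta}_t \in \arg\max_{\theta}\ \big\{ \log L_t(\theta) + \alpha(t)\, J(\theta) \big\},
\]

where $L_t(\theta)$ is the likelihood of the observed history under MDP model $\theta$, $J(\theta)$ is the optimal average reward attainable in model $\theta$, and $\alpha(t)$ is a slowly growing bias weight; the bias implements optimism by favoring models that promise higher reward.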


[72] 2011.07754

Deep Shallow Fusion for RNN-T Personalization

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks. However, these models are more challenging to personalize compared to traditional hybrid systems due to the lack of external language models and difficulties in recognizing rare long-tail words, specifically entity names. In this work, we present novel techniques to improve RNN-T's ability to model rare WordPieces, infuse extra information into the encoder, enable the use of alternative graphemic pronunciations, and perform deep fusion with personalized language models for more robust biasing. We show that these combined techniques result in 15.4%-34.5% relative Word Error Rate improvement compared to a strong RNN-T baseline which uses shallow fusion and text-to-speech augmentation. Our work helps push the boundary of RNN-T personalization and close the gap with hybrid systems on use cases where biasing and entity recognition are crucial.
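
For orientation, the shallow-fusion baseline combines scores only at decoding time,

\[
y^{*} = \arg\max_{y}\ \big[ \log P_{\text{RNN-T}}(y \mid x) + \lambda \log P_{\text{LM}}(y) \big],
\]

whereas deep fusion (roughly speaking, and as pursued here for personalization) injects the personalized language model's representations into the network itself rather than only interpolating output scores.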


[73] 2011.07771

Indoor Positioning System based on Visible Light Communication for Mobile Robot in Nuclear Power Plant

Visible light positioning (VLP) is widely believed to be a cost-effective answer to the growing demand for robot indoor positioning. Considering that some extreme environments require robots to be equipped with a precise and radiation-resistant indoor positioning system for difficult work, a novel high-accuracy VLP system is proposed to enable long-duration inspection and intervention in radiation environments. The proposed system, with sufficient radiation tolerance, is critical for operational inspection, maintenance, and intervention tasks in nuclear facilities. Firstly, we designed an intelligent LED lamp with visible light communication (VLC) functionality to dynamically create an indoor GPS-like tracking system. By installing the proposed lamps, which replace standard lighting at key locations in the nuclear power plant, the proposed system can strengthen the safety of the mobile robot and support efficient inspection across large-scale fields. Secondly, to enhance the radiation tolerance and multi-scenario applicability of the proposed system, we propose a shielding protection method for the camera mounted vertically on the robot, which ensures that the camera's image sensor, and hence the captured VLP information, is not affected by radiation. Besides, with an optimized visible light positioning algorithm based on a dispersion calibration method, the proposed VLP system achieves an average positioning accuracy of 0.82 cm and ensures that 90% of positioning errors are less than 1.417 cm. Therefore, the proposed system not only has sufficient radiation tolerance but also achieves state-of-the-art accuracy in the visible light positioning field.


[74] 2011.07792

Training Strategies and Data Augmentations in CNN-based DeepFake Video Detection

The fast and continuous growth in number and quality of deepfake videos calls for the development of reliable detection systems capable of automatically warning users on social media and on the Internet about the potential untruthfulness of such contents. While algorithms, software, and smartphone apps are getting better every day in generating manipulated videos and swapping faces, the accuracy of automated systems for face forgery detection in videos is still quite limited and generally biased toward the dataset used to design and train a specific detection system. In this paper we analyze how different training strategies and data augmentation techniques affect CNN-based deepfake detectors when training and testing on the same dataset or across different datasets.


[75] 2011.07833

Data-driven stabilization of nonlinear polynomial systems with noisy data

In a recent paper we have shown how to learn controllers for unknown linear systems using finite-sized noisy data by solving linear matrix inequalities. In this note we extend this approach to deal with unknown nonlinear polynomial systems by formulating stability certificates in the form of data-dependent sum of squares programs, whose solution directly provides a stabilizing controller and a Lyapunov function. We then derive variations of this result that lead to more advantageous controller designs. The results also reveal connections to the problem of designing a controller starting from a least-square estimate of the polynomial system.


[76] 2011.07939

Modeling, Reduction, and Control of a Helically Actuated Inertial Soft Robotic Arm via the Koopman Operator

Soft robots promise improved safety and capability over rigid robots when deployed in complex, delicate, and dynamic environments. However, the infinite degrees of freedom and highly nonlinear dynamics of these systems severely complicate their modeling and control. As a step toward addressing this open challenge, we apply the data-driven, Hankel Dynamic Mode Decomposition (HDMD) with time delay observables to the model identification of a highly inertial, helical soft robotic arm with a high number of underactuated degrees of freedom. The resulting model is linear and hence amenable to control via a Linear Quadratic Regulator (LQR). Using our test bed device, a dynamic, lightweight pneumatic fabric arm with an inertial mass at the tip, we show that the combination of HDMD and LQR allows us to command our robot to achieve arbitrary poses using only open loop control. We further show that Koopman spectral analysis gives us a dimensionally reduced basis of modes which decreases computational complexity without sacrificing predictive power.
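
The HDMD step itself is standard; a hedged numpy sketch (observable choice and dimensions are placeholders) stacks time-delayed measurements into a Hankel matrix and fits a linear operator by least squares:

    # Hankel DMD with time-delay observables: stack d delayed copies of the
    # measurements, then fit z_{k+1} ~ A z_k on the delay-embedded snapshots.
    import numpy as np

    def hankel_dmd(Y, d):
        # Y: (n_states, n_snapshots); returns A acting on delay vectors
        n, m = Y.shape
        Z = np.vstack([Y[:, i:m - d + 1 + i] for i in range(d)])   # (n*d, m-d+1)
        X0, X1 = Z[:, :-1], Z[:, 1:]
        A = X1 @ np.linalg.pinv(X0)
        return A

    Y = np.random.randn(3, 200)      # placeholder trajectory data
    A = hankel_dmd(Y, d=10)          # linear model, usable for LQR design downstream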


[77] 2011.07945

Do not trust the neighbors! Adversarial Metric Learning for Self-Supervised Scene Flow Estimation

Scene flow is the task of estimating 3D motion vectors for individual points of a dynamic 3D scene. Motion vectors have been shown to be beneficial for downstream tasks such as action classification and collision avoidance. However, data collected via LiDAR sensors and stereo cameras is computation- and labor-intensive to annotate precisely for scene flow. We address this annotation bottleneck on two ends. We propose a 3D scene flow benchmark and a novel self-supervised setup for training flow models. The benchmark consists of datasets designed to study individual aspects of flow estimation in progressive order of complexity, from a single object in motion to real-world scenes. Furthermore, we introduce Adversarial Metric Learning for self-supervised flow estimation. The flow model is fed with sequences of point clouds to perform flow estimation. A second model learns a latent metric to distinguish between the points translated by the flow estimates and the target point cloud. This latent metric is learned via a Multi-Scale Triplet loss, which uses intermediary feature vectors for the loss calculation. We use our proposed benchmark to draw insights about the performance of the baselines and of different models when trained using our setup. We find that our setup is able to maintain motion coherence and preserve local geometries, which many self-supervised baselines fail to grasp. Dealing with occlusions, on the other hand, is still an open challenge.


[78] 2011.07953

Shimon the Robot Film Composer and DeepScore: An LSTM for Generation of Film Scores based on Visual Analysis

Composing for a film requires developing an understanding of the film, its characters and the film aesthetic choices made by the director. We propose using existing visual analysis systems as a core technology for film music generation. We extract film features including main characters and their emotions to develop a computer understanding of the film's narrative arc. This arc is combined with visually analyzed director aesthetic choices including pacing and levels of movement. Two systems are presented, the first using a robotic film composer and marimbist to generate film scores in real-time performance. The second software-based system builds on the results from the robot film composer to create narrative driven film scores.


[79] 2011.08024

Local power estimation of neuromodulations using point process modeling

Extracellular electrical potentials (EEPs) recorded from the brain are an active manifestation of all cellular processes that propagate within a volume of brain tissue. A standard approach for their quantification is power spectral analysis, which reflects the global distribution of signal power over frequency. However, these methods rely on analysis windows to achieve locality and are therefore limited by the inherent trade-off between time and frequency resolution. In this paper, we present a novel approach to estimate local power more precisely, at a resolution as high as the sampling frequency. Our methods are well grounded in the established neurophysiology of these bio-signals: we model EEPs as comprising two components, neuromodulations and background activity. A local measure of power, which we call the Marked Point Process (MPP) spectrogram, is then derived as a power-weighted intensity function of the point process of neuromodulations. We demonstrate our results on two datasets: 1) local field potentials recorded from the prefrontal cortex of 3 rats performing a working memory task and 2) EEPs recorded via electroencephalography from the visual cortex of human subjects performing a conditioned stimulus task. A detailed analysis of the power-specific marked features of neuromodulations confirms a high correlation between the power spectral density and the power in neuromodulations, establishing the aptness of the MPP spectrogram as a finer measure of power that tracks local variations while preserving the global structure of the signal power distribution.


[80] 2011.08061

FRDet: Balanced and Lightweight Object Detector based on Fire-Residual Modules for Embedded Processor of Autonomous Driving

For deployment on an embedded processor for autonomous driving, an object detection network must satisfy requirements on accuracy, real-time inference, and model size simultaneously. Conventional deep CNN-based detectors aim for high accuracy, making their models too heavy for an embedded system with limited memory. In contrast, lightweight object detectors are greatly compressed but at a significant sacrifice of accuracy. Therefore, we propose FRDet, a lightweight one-stage object detector that balances the constraints of accuracy, model size, and real-time processing on an embedded GPU processor for autonomous driving applications. Our network aims to maximize model compression while achieving or surpassing the accuracy of YOLOv3. This paper proposes the Fire-Residual (FR) module to design a lightweight network with low accuracy loss by adapting fire modules with residual skip connections. In addition, Gaussian uncertainty modeling of the bounding box is applied to further enhance localization accuracy. Experiments on the KITTI dataset showed that FRDet reduced the memory size by 50.8% while achieving 1.12% higher mAP than YOLOv3. Moreover, the real-time detection speed reached 31.3 FPS on an embedded GPU board (NVIDIA Xavier). The proposed network achieves higher compression with comparable accuracy to other deep CNN object detectors, while showing improved accuracy over lightweight detector baselines. Therefore, the proposed FRDet is a well-balanced and efficient object detector for practical autonomous driving applications, satisfying all the criteria of accuracy, real-time inference, and light model size.
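
The abstract describes the FR module only as fire modules with residual skip connections; under that reading (a speculative sketch in the SqueezeNet sense, not the paper's exact block):

    # Speculative Fire-Residual block: squeeze (1x1) then parallel 1x1/3x3 expand
    # convolutions, concatenated and wrapped in a residual skip connection.
    import torch
    import torch.nn as nn

    class FireResidual(nn.Module):
        def __init__(self, channels, squeeze):
            super().__init__()
            self.squeeze = nn.Conv2d(channels, squeeze, 1)
            self.expand1 = nn.Conv2d(squeeze, channels // 2, 1)
            self.expand3 = nn.Conv2d(squeeze, channels // 2, 3, padding=1)
            self.act = nn.ReLU(inplace=True)
        def forward(self, x):
            s = self.act(self.squeeze(x))
            out = torch.cat([self.expand1(s), self.expand3(s)], dim=1)
            return self.act(out + x)       # skip connection keeps the channel count

    y = FireResidual(64, squeeze=16)(torch.randn(1, 64, 56, 56))   # -> same shape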


[81] 2011.08062

Multiclass Yeast Segmentation in Microstructured Environments with Deep Learning

Cell segmentation is a major bottleneck in extracting quantitative single-cell information from microscopy data. The challenge is exacerbated in the setting of microstructured environments. While deep learning approaches have proven useful for general cell segmentation tasks, existing segmentation tools for the yeast-microstructure setting rely on traditional machine learning approaches. Here we present convolutional neural networks trained for multiclass segmentation of individual yeast cells and for discerning these from cell-similar microstructures. We give an overview of the datasets recorded for training, validating, and testing the networks, as well as a typical use case. We showcase the method's contribution to segmenting yeast in microstructured environments with a typical synthetic biology application in mind. The models achieve robust segmentation results, outperforming the previous state of the art in both accuracy and speed. The combination of fast and accurate segmentation is not only beneficial for a posteriori data processing; it also makes online monitoring of thousands of trapped cells and closed-loop optimal experimental design feasible from an image processing perspective.


[82] 2011.08064

Research Needed in Computational Social Science for Power System Reliability, Resilience, and Restoration

In the literature, smart grids are modeled as cyber-physical power systems without considering the computational social aspects. However, end-users play a key role in their operation and response to disturbances via demand response and distributed energy resources. Therefore, due to the critical role of active and passive end-users and the intermittency of renewable energy, smart grids must be planned and operated by considering the computational social aspects in addition to the technical ones. The level of cooperation, flexibility, and other social features of the various stakeholders, including consumers, prosumers, and microgrids, affect the system's efficiency, reliability, and resilience. In this paper, we design an artificial society simulating the interaction between power systems and the social communities they serve via agent-based modeling inspired by Barsade's theory of emotional contagion. The simulation results show a decline in consumers' and prosumers' satisfaction levels induced by a shortage of electricity. They also show the effects of social diffusion via the Internet and mass media on the satisfaction level. In view of the importance of computational social science for power system applications and the limited number of publications devoted to it, we provide a list of research topics that need to be addressed to enhance the reliability and resilience of power system operation and planning.


[83] 2011.08065

Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy

Gastrointestinal (GI) pathologies are periodically screened, biopsied, and resected using surgical tools. Usually the procedures and the treated or resected areas are not specifically tracked or analysed during or after colonoscopies. Information regarding disease borders, development, and the amount and size of the resected area gets lost. This can lead to poor follow-up and bothersome reassessment difficulties post-treatment. To improve the current standard and to foster more research on the topic, we have released the "Kvasir-Instrument" dataset, which consists of 590 annotated frames containing GI procedure tools such as snares, balloons, and biopsy forceps. Besides the images, the dataset includes ground-truth masks and bounding boxes and has been verified by two expert GI endoscopists. Additionally, we provide a baseline for the segmentation of the GI tools to promote research and algorithm development. We obtained a Dice coefficient of 0.9158 and a Jaccard index of 0.8578 using a classical U-Net architecture; a similar Dice coefficient was observed for DoubleUNet. The qualitative results showed that the models failed on images with specularity and on frames with multiple instruments, while both methods performed best on all other types of images. Both qualitative and quantitative results show that the models perform reasonably well, but there is large potential for further improvement. Benchmarking on the dataset provides an opportunity for researchers to contribute to the field of automatic endoscopic diagnostic and therapeutic tool segmentation for GI endoscopy.
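
For reference, the two reported overlap metrics can be computed from binary masks as follows (a minimal sketch; pred and gt are hypothetical mask arrays):

    # Dice = 2|A AND B| / (|A| + |B|); Jaccard = |A AND B| / |A OR B|.
    import numpy as np

    def dice_and_jaccard(pred, gt):
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter = np.logical_and(pred, gt).sum()
        dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-9)
        jaccard = inter / (np.logical_or(pred, gt).sum() + 1e-9)
        return dice, jaccard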


[84] 2011.08068

Smartphone-Based Test and Predictive Models for Rapid, Non-Invasive, and Point-of-Care Monitoring of Ocular and Cardiovascular Complications Related to Diabetes

Among the most impactful diabetic complications are diabetic retinopathy, the leading cause of blindness among working-age adults, and cardiovascular disease, the leading cause of death worldwide. This study describes the development of improved machine-learning-based screening for these conditions. First, a random forest model was developed by retrospectively analyzing the influence of various risk factors (obtained quickly and non-invasively) on cardiovascular risk. Next, a deep learning model was developed to predict diabetic retinopathy from retinal fundus images using a modified and re-trained InceptionV3 image classification model. The input was simplified by automatically segmenting the blood vessels in the retinal image. Transfer learning enables the model to capitalize on existing infrastructure on the target device, allowing more versatile deployment, which is especially helpful in low-resource settings. The models were integrated into a smartphone-based device, combined with an inexpensive 3D-printed retinal imaging attachment. Accuracy scores, as well as the receiver operating characteristic curve, the learning curve, and other gauges, were promising. This test is much cheaper and faster, enabling continuous monitoring for two damaging complications of diabetes. It has the potential to replace the manual methods of diagnosing both diabetic retinopathy and cardiovascular risk, which are time-consuming and costly processes performed only by medical professionals away from the point of care, and to prevent irreversible blindness and heart-related complications through faster, cheaper, and safer monitoring of diabetic complications. In addition, tracking cardiovascular and ocular complications of diabetes can enable improved detection of other diabetic complications, leading to earlier and more efficient treatment on a global scale.