New articles on eess

[1] 2007.01866

Selecting Regions of Interest in Large Multi-Scale Images for Cancer Pathology

Recent breakthroughs in object detection and image classification using Convolutional Neural Networks (CNNs) are revolutionizing the state of the art in medical imaging, and microscopy in particular presents abundant opportunities for computer vision algorithms to assist medical professionals in diagnosis of diseases ranging from malaria to cancer. High resolution scans of microscopy slides called Whole Slide Images (WSIs) offer enough information for a cancer pathologist to come to a conclusion regarding cancer presence, subtype, and severity based on measurements of features within the slide image at multiple scales and resolutions. WSIs' extremely high resolutions and feature scales ranging from gross anatomical structures down to cell nuclei preclude the use of standard CNN models for object detection and classification, which have typically been designed for images with dimensions in the hundreds of pixels and with objects on the order of the size of the image itself. We explore parallel approaches based on Reinforcement Learning and Beam Search to learn to progressively zoom into the WSI to detect Regions of Interest (ROIs) in liver pathology slides containing one of two types of liver cancer, namely Hepatocellular Carcinoma (HCC) and Cholangiocarcinoma (CC). These ROIs can then be presented directly to the pathologist to aid in measurement and diagnosis or be used for automated classification of tumor subtype.
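
The progressive zoom described in this abstract can be illustrated with a small beam search over a quadtree of slide regions. This is only a sketch under stated assumptions: the quadrant split, the `beam_search_rois` function, and the `toy_score` stand-in (which simply prefers regions near a made-up target point) are hypothetical and are not taken from the paper, where the scoring would come from a learned model.

```python
from typing import Callable, List, Tuple

# A region is (x, y, size): top-left corner and side length in full-resolution pixels.
Region = Tuple[int, int, int]


def split(region: Region) -> List[Region]:
    """Split a square region into its four quadrants (one zoom step)."""
    x, y, size = region
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]


def beam_search_rois(root: Region,
                     score: Callable[[Region], float],
                     beam_width: int,
                     min_size: int) -> List[Region]:
    """Zoom in level by level, keeping only the `beam_width` best-scoring regions."""
    beam = [root]
    # All regions in the beam share one size, so checking the first is enough.
    while beam[0][2] > min_size:
        candidates = [child for region in beam for child in split(region)]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return beam


def toy_score(region: Region) -> float:
    """Hypothetical score: higher when the region centre is near (700, 300)."""
    x, y, size = region
    cx, cy = x + size / 2, y + size / 2
    return -((cx - 700) ** 2 + (cy - 300) ** 2)


# Search a 1024 x 1024 toy "slide" down to 256 x 256 regions, best region first.
rois = beam_search_rois((0, 0, 1024), toy_score, beam_width=2, min_size=256)
```

Keeping more than one region per level (a beam width above 1) lets the search recover when the best-looking coarse region does not contain the best fine-scale region.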

[2] 2007.01904

Adaptive Background Compensation of FI-DACs with Application to Coherent Optical Transceivers

This work proposes a novel adaptive background compensation scheme for frequency interleaved digital-to-analog converters (FI-DACs). The technique is applicable to high speed digital transceivers such as those used in coherent optical communications. Although compensation of FI-DACs has been discussed before in the technical literature, adaptive background techniques have not yet been reported. The importance of the latter lies in their capability to automatically compensate errors caused by process, voltage, and temperature variations in the technology (e.g., CMOS, SiGe, etc.) implementations of the data converters, and therefore ensure high manufacturing yield. The key ingredients of the proposed technique are a multiple-input multiple-output (MIMO) equalizer and the backpropagation algorithm used to adapt the coefficients of the aforementioned equalizer. Simulations show that the impairments of the analog signal path are accurately compensated and their effect is essentially eliminated, resulting in a high performance transmitter system.

[3] 2007.01938

Coherent Free-Space Optical Communication Using Non-mode-Selective Photonic Lantern

A coherent free-space optical communication system based on a non-mode-selective photonic lantern is studied. Based on simulation of the photon distribution, the power distribution at the single-mode fiber end of the photonic lantern is quantitatively described as a truncated Gaussian distribution over a simplex. The signal-to-noise ratio and the outage probability are analyzed for the communication system using a photonic lantern based receiver with equal-gain combining, and they are compared with those of the single-mode fiber receiver and the multimode fiber receiver. The scope of application of the communication system is provided. It is shown that the signal-to-noise ratio gain of the photonic lantern based receiver over the single-mode fiber receiver and the multimode fiber receiver can be greater than 7 dB. The integral solution, series lower bound solution and asymptotic solution are presented for the bit-error rate of the photonic lantern based receiver, single-mode fiber receiver and multimode fiber receiver over Gamma-Gamma atmospheric turbulence channels. Simulation results show that for the considered system the power distribution of the photonic lantern has limited influence on the outage probability and the bit-error rate performance.

[4] 2007.01942

A model-free tuning method for proportional-multi-resonant controllers

Resonant controllers are widely used in applications involving reference tracking and disturbance rejection of periodic signals. The controller design is typically performed by a trial-and-error approach or by means of time- and resource-consuming analytic methods that require an accurate plant model, intricate mathematics and sophisticated tools. In this paper, we propose an easily implementable, model-free method for tuning a proportional-multi-resonant controller applicable to general linear time-invariant causal plants. Just like the Ziegler-Nichols methods, the proposed methodology consists in identifying one specific point of the plant's frequency response -- which is easily obtained in a relay with adjustable phase experiment -- and then designing the controller with simple tuning formulas and tables. The method is analyzed in detail for three examples, showing its practical appeal and wide applicability.

[5] 2007.01944

Consistency of Muscle Synergies Extracted via Higher-Order Tensor Decomposition Towards Myoelectric Control

In recent years, muscle synergies have been proposed for proportional myoelectric control. Synergies were extracted using matrix factorisation techniques (mainly non-negative matrix factorisation, NMF), which requires identification of synergies to tasks or movements. In addition, NMF methods were viable only with a task dimension of 2 degrees of freedom (DoFs). Here, the potential use of a higher-order tensor model for myoelectric control is explored. We assess the ability of a constrained Tucker tensor decomposition to estimate consistent synergies when the task dimensionality is increased up to 3 DoFs. Synergies extracted from 3rd-order tensors of 1 and 3 DoFs were compared. Results showed that muscle synergies extracted via constrained Tucker decomposition were consistent with the increase of task dimension. Hence, these results support the consideration of proportional 3-DoF myoelectric control based on tensor decompositions.

[6] 2007.01949

On the use of higher-order tensors to model muscle synergies

The muscle synergy concept provides the best framework to understand motor control and it has been recently utilised in many applications such as prosthesis control. The current muscle synergy model relies on decomposing multi-channel surface Electromyography (EMG) signals into a synergy matrix (spatial mode) and its weighting function (temporal mode). This is done using several matrix factorisation techniques, with Non-negative matrix factorisation (NMF) being the most prominent method. Here, we introduce a 4th-order tensor muscle synergy model that extends the current state of the art by taking spectral information and repetitions (movements) into account. This adds more depth to the model and provides more synergistic information. In particular, we illustrate a proof-of-concept study where the Tucker3 tensor decomposition model was applied to a subset of wrist movements from the Ninapro database. The results showed the potential of Tucker3 tensor factorisation in finding patterns of muscle synergies with information about the movements and highlighted the differences between the current and proposed models.

[7] 2007.01960

Dynamic Weight-Based Collaborative Optimization for Power Grid Voltage Regulation

Power distribution grids with high PV generation are exposed to voltage disturbances due to the unpredictable nature of renewable resources. Smart PV inverters, if controlled in coordination with each other and continuously adapted to the real-time conditions of the generation and load, can effectively regulate nodal voltages across the feeder. This is a fairly new concept and requires communication and a distributed control logic to realize a fair utilization of reactive power across all PV systems. In this paper, a collaborative reactive power optimization is proposed to minimize voltage deviation under changing feeder conditions. The weight matrix of the collaborative optimization is updated based on the reactive power availability of each PV system, which changes over time depending on the cloud conditions and feeder loading. The proposed updates allow PV systems with higher reactive power availability to help other PV systems regulate their nodal voltage. Proof-of-concept simulations on a modified IEEE 123-node test feeder are performed to show the effectiveness of the proposed method in comparison with four common reactive power control methods.

[8] 2007.01975

Interpretation of Disease Evidence for Medical Images Using Adversarial Deformation Fields

The high complexity of deep learning models is associated with the difficulty of explaining what evidence they recognize as correlating with specific disease labels. This information is critical for building trust in models and finding their biases. Until now, automated deep learning visualization solutions have identified regions of images used by classifiers, but these solutions are too coarse, too noisy, or have a limited representation of the way images can change. We propose a novel method for formulating and presenting spatial explanations of disease evidence, called deformation field interpretation with generative adversarial networks (DeFI-GAN). An adversarially trained generator produces deformation fields that modify images of diseased patients to resemble images of healthy patients. We validate the method studying chronic obstructive pulmonary disease (COPD) evidence in chest x-rays (CXRs) and Alzheimer's disease (AD) evidence in brain MRIs. When extracting disease evidence in longitudinal data, we show compelling results against a baseline producing difference maps. DeFI-GAN also highlights disease biomarkers not found by previous methods and potential biases that may help in investigations of the dataset and of the adopted learning methods.

[9] 2007.02018

Deep Bilateral Retinex for Low-Light Image Enhancement

Low-light images, i.e. the images captured in low-light conditions, suffer from very poor visibility caused by low contrast, color distortion and significant measurement noise. Low-light image enhancement is about improving the visibility of low-light images. As the measurement noise in low-light images is usually significant yet complex with spatially-varying characteristic, how to handle the noise effectively is an important yet challenging problem in low-light image enhancement. Based on the Retinex decomposition of natural images, this paper proposes a deep learning method for low-light image enhancement with a particular focus on handling the measurement noise. The basic idea is to train a neural network to generate a set of pixel-wise operators for simultaneously predicting the noise and the illumination layer, where the operators are defined in the bilateral space. Such an integrated approach allows us to have an accurate prediction of the reflectance layer in the presence of significant spatially-varying measurement noise. Extensive experiments on several benchmark datasets have shown that the proposed method is very competitive to the state-of-the-art methods, and has significant advantage over others when processing images captured in extremely low lighting conditions.

[10] 2007.02029

Deconvolved Image Restoration from Autocorrelations

Recovering a signal from auto-correlations or, equivalently, retrieving the phase linked to a given Fourier modulus, is a wide-spread problem in imaging. This problem has been tackled in a number of experimental situations, from optical microscopy to adaptive astronomy, making use of assumptions based on constraints and prior information about the recovered object. In a similar fashion, deconvolution is another common problem in imaging, in particular within the optical community, allowing high-resolution reconstruction of blurred images. Here we address the mixed problem of performing the auto-correlation inversion while, at the same time, deconvolving its current estimation. To this end, we propose an I-divergence optimization, driving our formalism into a widely used iterative scheme, inspired by Bayesian-based approaches. We demonstrate the method recovering the signal from blurred auto-correlations, further analysing the cases of blurred objects and band-limited Fourier measurements.

[11] 2007.02044

Three Dimensional Moving Path Following Control for Robotic Vehicles with Minimum Positive Forward Speed

This paper addresses the problem of steering a robotic vehicle along a geometric path specified with respect to a reference frame moving in three dimensions, termed the Moving Path Following (MPF) motion control problem. The MPF motion control problem is solved for a large class of robotic vehicles that require a minimum positive forward speed to operate, which poses additional constraints, and is developed using geometric concepts, wherein the attitude control problem is formulated on the Special Orthogonal group SO(3). Furthermore, the proposed control law is derived from a novel MPF error model formulation that removes the conservative constraints on the initial position of the vehicle with respect to the reference path by enabling the explicit control of the progression of a virtual point moving along the reference path. The task of the MPF control law is then to steer the vehicle towards the moving path and converge to the virtual point. Formal stability and convergence guarantees are provided using the Input-to-State Stability concept. In particular, we show that the proposed controller is robust to imperfect tracking by the autopilot and to wind gusts. Simulation results are presented to illustrate the efficacy of the proposed MPF control law.

[12] 2007.02052

Choosing a sampling frequency for ECG QRS detection using convolutional networks

Automated QRS detection methods depend on ECG data sampled at a certain frequency, whether they are traditional filter-based methods or convolutional network (CNN) based deep learning methods. These methods require the sampling frequency at which they operate to be selected at the very outset. When working with two datasets sampled at different frequencies, the data from both often need to be resampled to a common target frequency, which may be the frequency of either dataset or a different one. However, choosing data sampled at a certain frequency may have an impact on the model's generalisation capacity and complexity. Some studies investigate the effects of ECG sample frequencies on traditional filter-based methods; however, an extensive study of the effect of ECG sample frequency on deep learning-based models (convolutional networks), exploring their generalisability and complexity, has yet to be carried out. This experimental research investigates the impact of six different sample frequencies (50, 100, 250, 500, 1000, and 2000 Hz) on the generalisability and complexity of four different convolutional network-based models in order to form a basis for deciding on an appropriate sample frequency for the QRS detection task for a particular performance requirement. Intra-database tests report an accuracy improvement of no more than approximately 0.6% from 100 Hz to 250 Hz and a shorter interquartile range for those two frequencies for all CNN-based models. The findings reveal that convolutional network-based deep learning models are capable of achieving higher detection accuracy on ECG signals sampled at frequencies as low as 100 Hz or 250 Hz while maintaining lower model complexity (number of trainable parameters and training time).
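
Bringing two datasets to a common target frequency, as this abstract describes, can be sketched with plain linear interpolation. The abstract does not say which resampling method the authors used, so `resample_linear` below is a hypothetical helper and an assumption, not their procedure. When downsampling real ECG, a low-pass (anti-aliasing) filter would normally be applied first; this sketch omits it.

```python
from typing import List, Sequence


def resample_linear(signal: Sequence[float], fs_in: float, fs_out: float) -> List[float]:
    """Resample `signal` from fs_in to fs_out (both in Hz) by linear interpolation."""
    if not signal:
        return []
    last = len(signal) - 1
    duration = last / fs_in                      # time of the last input sample, in seconds
    # Number of output samples that fit in [0, duration]; the small epsilon
    # guards against floating-point round-off just below an integer.
    n_out = int(duration * fs_out + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out                 # fractional index into the input
        i = min(int(pos), last)
        j = min(i + 1, last)
        frac = pos - i
        out.append((1.0 - frac) * signal[i] + frac * signal[j])
    return out


# A ramp recorded at 500 Hz, brought down to 250 Hz (every other sample survives).
down = resample_linear([0, 1, 2, 3, 4, 5, 6, 7, 8], fs_in=500, fs_out=250)
# Two samples recorded at 100 Hz, brought up to 200 Hz (one midpoint is inserted).
up = resample_linear([0, 2], fs_in=100, fs_out=200)
```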

[13] 2007.02064

Monitoring Depression in Bipolar Disorder using Circadian Measures from Smartphone Accelerometers

Current management of bipolar disorder relies on self-reported questionnaires and interviews with clinicians. The development of objective measures of deteriorating mood may also allow for early interventions to take place to avoid transitions into depressive states. The objective of this study was to use acceleration data recorded from smartphones to predict levels of depression in a population of participants diagnosed with bipolar disorder. Data were collected from 52 participants, with a mean of 37 weeks of acceleration data with a corresponding depression score recorded per participant. Time varying hidden Markov models were used to extract weekly features of activity, sleep and circadian rhythms. Personalised regression achieved mean absolute errors of 1.00(0.57) from a possible scale of 0 to 27 and was able to classify depression with an accuracy of 0.84(0.16). The results demonstrate features derived from smartphone accelerometers are able to provide objective markers of depression. Low barriers for uptake exist due to the widespread use of smartphones, with personalised models able to account for differences in the behaviour of individuals and provide accurate predictions of depression.

[14] 2007.02070

Continuous-time finite-horizon ADP for automated vehicle controller design with high efficiency

The design of an automated vehicle controller can be generally formulated into an optimal control problem. This paper proposes a continuous-time finite-horizon approximate dynamic programming (ADP) method, which can synthesize an off-line near-optimal control policy with analytical vehicle dynamics. Building on the general Policy Iteration framework, it employs value and policy neural networks to approximate the mappings from the system states to the value function and control inputs, respectively. The proposed method can converge to the near-optimal solution of the finite-horizon Hamilton-Jacobi-Bellman (HJB) equation. We further applied our algorithm to the simulation of automated vehicle control for the path tracking maneuver. The results suggest that the proposed ADP method can obtain the near-optimal policy with 1% error and less calculation time. What is more, the proposed ADP algorithm is also suitable for nonlinear control systems, where ADP is almost 500 times faster than the nonlinear MPC ipopt solver.

[15] 2007.02074

A Linear Branch Flow Model for Radial Distribution Networks and its Application to Reactive Power Optimization and Network Reconfiguration

This paper presents a cold-start linear branch flow model named modified DistFlow. In modified DistFlow, the active and reactive power are replaced by their ratios to voltage magnitude as state variables, so that errors introduced by conventional branch flow linearization approaches due to their complete ignoring of the quadratic term are reduced. Based on the path-branch incidence matrix, branch power flows and nodal voltage magnitudes can be obtained in a non-iterative and explicit manner. Subsequently, the proposed modified DistFlow model is applied to the problem of reactive power optimization and network reconfiguration, transforming it into a mixed-integer quadratic programming (MIQP). Simulations show that the proposed modified DistFlow has a better accuracy than existing cold-start linear branch flow models for distribution networks, and the resulting MIQP model for reactive power optimization and network reconfiguration is much more computationally efficient than existing benchmarks.

[16] 2007.02075

Speckle2Void: Deep Self-Supervised SAR Despeckling with Blind-Spot Convolutional Neural Networks

Information extraction from synthetic aperture radar (SAR) images is heavily impaired by speckle noise, hence despeckling is a crucial preliminary step in scene analysis algorithms. The recent success of deep learning envisions a new generation of despeckling techniques that could outperform classical model-based methods. However, current deep learning approaches to despeckling require supervision for training, whereas clean SAR images are impossible to obtain. In the literature, this issue is tackled by resorting to either synthetically speckled optical images, which exhibit different properties with respect to true SAR images, or multi-temporal SAR images, which are difficult to acquire or fuse accurately. In this paper, inspired by recent works on blind-spot denoising networks, we propose a self-supervised Bayesian despeckling method. The proposed method is trained employing only noisy SAR images and can therefore learn features of real SAR images rather than synthetic data. Experiments show that the performance of the proposed approach is very close to the supervised training approach on synthetic data and superior on real data in both quantitative and visual assessments.

[17] 2007.02078

Registration of Histopathology Images Using Structural Information From Fine Grained Feature Maps

Registration is an important part of many clinical workflows, and including information about structures of interest improves registration performance. We propose a novel approach of combining segmentation information in a registration framework using self-supervised segmentation feature maps extracted using a pre-trained segmentation network followed by clustering. Using self-supervised feature maps enables us to use segmentation information despite the unavailability of manual segmentations. Experimental results show our approach effectively replaces manual segmentation maps and demonstrate the possibility of obtaining state-of-the-art registration performance in real-world cases where manual segmentation maps are unavailable.

[18] 2007.02096

Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of the major limitations is that learning-based methods may suffer from the multi-site issue, that is, models trained on a dataset from one site may not be applicable to datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, the iSeg-2019 challenge provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. At the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers to the multi-site issue.

[19] 2007.02165

CardioLearn: A Cloud Deep Learning Service for Cardiac Disease Detection from Electrocardiogram

The electrocardiogram (ECG) is one of the most convenient and non-invasive tools for monitoring people's heart condition, and it can be used for diagnosing a wide range of heart diseases, including Cardiac Arrhythmia, Acute Coronary Syndrome, etc. However, traditional ECG disease detection models show substantial rates of misdiagnosis due to the limitations of the extracted features. Recent deep learning methods have shown significant advantages, but they do not provide publicly available services for those who have no training data or computational resources. In this paper, we demonstrate our work on building, training, and serving such an out-of-the-box cloud deep learning service for cardiac disease detection from ECG, named CardioLearn. The analytic ability of any other ECG recording device can be enhanced by connecting to the Internet and invoking our open API. As a practical example, we also design a portable smart hardware device along with an interactive mobile program, which can collect ECG and detect potential cardiac diseases anytime and anywhere.

[20] 2007.02180

A Weakly Supervised Consistency-based Learning Method for COVID-19 Segmentation in CT Images

Acquiring count annotations generally requires less human effort than point-level and bounding box annotations. Thus, we propose the novel problem setup of localizing objects in dense scenes under this weaker supervision. We propose LOOC, a method to Localize Overlapping Objects with Count supervision. We train LOOC by alternating between two stages. In the first stage, LOOC learns to generate pseudo point-level annotations in a semi-supervised manner. In the second stage, LOOC uses a fully-supervised localization method that trains on these pseudo labels. The localization method is used to progressively improve the quality of the pseudo labels. We conducted experiments on popular counting datasets. For localization, LOOC achieves a strong new baseline in the novel problem setup where only count supervision is available. For counting, LOOC outperforms current state-of-the-art methods that only use count as their supervision. Code is available at:

[21] 2007.02214

A Nested Decomposition Method and Its Application for Coordinated Operation of Hierarchical Electrical Power Grids

Multilevel, multiarea, and hierarchically interconnected electrical power grids confront substantial challenges with the increasing integration of many volatile energy resources. The traditional isolated operation of interconnected power grids is uneconomical due to a lack of coordination; it may result in severe accidents that affect operational safety. However, the centralized operation of interconnected power grids is impractical, considering the operational independence and information privacy of each power grid. This paper proposes a nested decomposition method for the coordinated operation of hierarchical electrical power grids, which can achieve global optimization by iterating among upper- and lower-level power grids with exchange of boundary information alone. During each iteration, a projection function, which embodies the optimal objective value of a lower-level power grid projected onto its boundary variable space, is computed with second-order exactness. Thus, the proposed method can be applied widely to nonlinear continuous optimizations and can converge much more rapidly than existing decomposition methods. We conducted numerical tests of coordinated operation examples with a trilevel power grid that demonstrate the validity and performance of the proposed method.

[22] 2007.02219

A deep learning framework based on Koopman operator for data-driven modeling of vehicle dynamics

Autonomous vehicles and driving technologies have received notable attention in the past decades. In autonomous driving systems, information about the vehicle dynamics is required in most cases for designing motion planning and control algorithms. However, identifying a global model of vehicle dynamics is nontrivial due to the existence of strong non-linearity and uncertainty. Many efforts have resorted to machine learning techniques for building data-driven models, but these may suffer from poor interpretability and result in a complex nonlinear representation. In this paper, we propose a deep learning framework relying on an interpretable Koopman operator to build a data-driven predictor of the vehicle dynamics. The main idea is to use the Koopman operator for representing the nonlinear dynamics in a linear lifted feature space. The approach results in a global model that integrates the dynamics in both longitudinal and lateral directions. As the core contribution, we propose a deep learning-based extended dynamic mode decomposition (Deep EDMD) algorithm to learn a finite approximation of the Koopman operator. Different from other machine learning-based approaches, deep neural networks play the role of learning feature representations for EDMD in the framework of the Koopman operator. Simulation results in a high-fidelity CarSim environment are reported, which show the capability of the Deep EDMD approach in multi-step prediction of vehicle dynamics over a wide operating range. Also, the proposed approach outperforms the EDMD method, the multi-layer perceptron (MLP) method, and the Extreme Learning Machines-based EDMD (ELM-EDMD) method in terms of modeling performance. Finally, we design a linear MPC with Deep EDMD (DE-MPC) for realizing reference tracking and test the controller in the CarSim environment.

[23] 2007.02232

An Integer Approximation Method for Discrete Sinusoidal Transforms

Approximate methods have been considered as a means to the evaluation of discrete transforms. In this work, we propose and analyze a class of integer transforms for the discrete Fourier, Hartley, and cosine transforms (DFT, DHT, and DCT), based on simple dyadic rational approximation methods. The introduced method is general, applicable to several block-lengths, whereas existing approaches are usually dedicated to specific transform sizes. The suggested approximate transforms enjoy low multiplicative complexity and the orthogonality property is achievable via matrix polar decomposition. We show that the obtained transforms are competitive with archived methods in literature. New 8-point square wave approximate transforms for the DFT, DHT, and DCT are also introduced as particular cases of the introduced methodology.
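
The idea of a dyadic rational approximation can be made concrete with a small example on the 8-point DCT. This is a sketch under assumptions: rounding each entry to the nearest multiple of 2^-bits is just one simple dyadic scheme, and the function names are hypothetical. It does not reproduce the specific approximations proposed in the paper, nor the polar-decomposition step the abstract mentions for restoring orthogonality.

```python
import math
from typing import List


def dct_matrix(n: int) -> List[List[float]]:
    """Exact n-point DCT-II matrix with orthonormal scaling."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dyadic_approx(matrix: List[List[float]], bits: int) -> List[List[float]]:
    """Round every entry to the nearest dyadic rational m / 2**bits (m an integer)."""
    step = 2 ** bits
    return [[round(v * step) / step for v in row] for row in matrix]


exact = dct_matrix(8)
approx = dyadic_approx(exact, bits=3)    # entries become multiples of 1/8

# Largest entry-wise deviation from the exact transform; it cannot exceed 2**-(bits + 1).
max_error = max(abs(a - e)
                for row_a, row_e in zip(approx, exact)
                for a, e in zip(row_a, row_e))
```

Because every entry is an integer divided by a power of two, multiplying by the approximate matrix needs only integer multiplications (or shifts and additions) followed by a final bit shift.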

[24] 2007.02246

Blind Inverse Gamma Correction with Maximized Differential Entropy

Unwanted nonlinear gamma distortion frequently occurs in a great diversity of images during the procedures of image acquisition, processing, and/or display, and it often varies with changes in capture setup and luminance. Blind inverse gamma correction, which automatically determines a proper restoration gamma value from a given image, is of paramount importance to attenuate the distortion. For blind inverse gamma correction, an adaptive gamma transformation method (AGT-ME) is proposed directly from a maximized differential entropy model. The corresponding optimization has a mathematically concise closed-form solution, resulting in efficient implementation and accurate gamma restoration of AGT-ME. Considering that the human eye has a non-linear perception sensitivity, a modified version, AGT-ME-VISUAL, is also proposed to achieve better visual performance. Tested on various datasets, AGT-ME could obtain an accurate estimation of a large range of gamma distortion (0.1 to 3.0), outperforming the state-of-the-art methods. In addition, the proposed AGT-ME and AGT-ME-VISUAL were applied to three typical applications, including automatic gamma adjustment, natural/medical image contrast enhancement, and fringe projection profilometry image restoration. Furthermore, AGT-ME/AGT-ME-VISUAL is general and can be seamlessly extended to masked images, multi-channel (color or spectrum) images, or multi-frame video, and is free of arbitrary tuning parameters. The corresponding Python code is also provided for interested users.
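
A closed-form gamma estimate of the kind this abstract mentions can be derived from the change-of-variables rule for differential entropy. For intensities u in (0, 1) and v = u^g, h(v) = h(u) + ln g + (g - 1) E[ln u], which is maximized at g = -1 / E[ln u]. The sketch below implements that formula; whether it coincides exactly with the paper's AGT-ME solution is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def estimate_restoration_gamma(pixels: Sequence[float], eps: float = 1e-12) -> float:
    """Gamma g that maximizes the differential entropy of pixels**g.

    Pixels are expected in (0, 1); values are clamped to [eps, 1 - eps]
    so that the logarithm stays finite and strictly negative.
    """
    logs = [math.log(min(max(p, eps), 1.0 - eps)) for p in pixels]
    mean_log = sum(logs) / len(logs)
    return -1.0 / mean_log


def apply_gamma(pixels: Sequence[float], gamma: float) -> List[float]:
    """Apply the power-law transform p -> p**gamma to every pixel."""
    return [p ** gamma for p in pixels]


# Toy check: a uniformly spread "image" distorted with gamma 2.2.
n = 10000
clean = [(i + 0.5) / n for i in range(n)]
distorted = apply_gamma(clean, 2.2)

gamma = estimate_restoration_gamma(distorted)   # close to 1 / 2.2 for this toy input
restored = apply_gamma(distorted, gamma)
```

The toy input is uniformly distributed, which is the maximum-entropy case on (0, 1), so the estimate recovers the inverse of the applied gamma almost exactly; for real images the estimate depends on how far the undistorted histogram is from that ideal.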

[25] 2007.02252

Spatial-Angular Attention Network for Light Field Reconstruction

Learning-based light field reconstruction methods require a large receptive field, constructed by deepening the network, to capture correspondences between input views. In this paper, we propose a spatial-angular attention network to perceive correspondences in the light field non-locally, and reconstruct a high angular resolution light field in an end-to-end manner. Motivated by the non-local attention mechanism, a spatial-angular attention module designed specifically for high-dimensional light field data is introduced to compute the responses from all the positions in the epipolar plane for each pixel in the light field, and generate an attention map that captures correspondences along the angular dimension. We then propose a multi-scale reconstruction structure to efficiently implement the non-local attention at the low spatial scale, while also preserving the high frequency components at the high spatial scales. Extensive experiments demonstrate the superior performance of the proposed spatial-angular attention network for reconstructing sparsely-sampled light fields with non-Lambertian effects.

[26] 2007.02271

Temporal Logic Trees for Model Checking and Control Synthesis of Uncertain Discrete-time Systems

We propose algorithms for performing model checking and control synthesis for discrete-time uncertain systems under linear temporal logic (LTL) specifications. We construct temporal logic trees (TLT) from LTL formulae via reachability analysis. In contrast to automaton-based methods, the construction of the TLT is abstraction-free for infinite systems, that is, we do not construct discrete abstractions of the infinite systems. Moreover, for a given transition system and an LTL formula, we prove that there exist both a universal TLT and an existential TLT via minimal and maximal reachability analysis, respectively. We show that the universal TLT is an underapproximation for the LTL formula and the existential TLT is an overapproximation. We provide sufficient conditions and necessary conditions to verify whether a transition system satisfies an LTL formula by using the TLT approximations. As a major contribution of this work, for a controlled transition system and an LTL formula, we prove that a controlled TLT can be constructed from the LTL formula via control-dependent reachability analysis. Based on the controlled TLT, we design an online control synthesis algorithm, under which a set of feasible control inputs can be generated at each time step. We also prove that this algorithm is recursively feasible. We illustrate the proposed methods for both finite and infinite systems and highlight the generality and online scalability with two simulated examples.

[27] 2007.02294

Ultra-Wideband Antenna with MIMO Diversity for 5G Wireless Communication

An eight-element, compact ultra-wideband multiple-input multiple-output (UWB-MIMO) antenna is presented that can provide high data rates for future fifth-generation (5G) terminal equipment while also supplying the bandwidth needed for third-generation (3G) and fourth-generation (4G) communications. Band rejection from 4.85 to 6.35 GHz is accomplished by deploying an inductor-capacitor (LC) stub on the ground plane. The incorporated stub also provides flexibility to reject any selected band as well as bandwidth control. The orthogonal placement of the printed monopoles permits polarization diversity and provides high isolation. In the proposed eight-element UWB-MIMO/diversity antenna, monopole pair 3-4 is a 180° mirrored transform of monopole pair 1-2, located on the opposite corners of a planar 50 x 50 mm² substrate. Four additional monopoles are then placed perpendicular to the same board, leading to a total size of only 50 x 50 x 25 mm³. The simulated results are validated against measurements of a fabricated prototype. The design meets the target specifications over the entire bandwidth of 2 to 12 GHz, with a reflection coefficient better than -10 dB (except in the rejected band), isolation of more than 17 dB, low envelope correlation, low gain variation, a stable radiation pattern, and strong rejection of signals in the Wireless Local Area Network (WLAN) band. Overall, the compactness and reduced complexity of the proposed eight-element architecture strengthen its practical viability for diversity applications in future 5G terminal equipment compared with other MIMO antenna designs in the literature.

[28] 2007.02358

Contour-based Bone Axis Detection for X-Ray Guided Surgery on the Knee

The anatomical axis of long bones is an important reference line for guiding fracture reduction and assisting in the correct placement of guide pins, screws, and implants in orthopedics and trauma surgery. This study investigates an automatic approach for detection of such axes on X-ray images based on the segmentation contour of the bone. For this purpose, we use the medically established two-line method and translate it into a learning-based approach. The proposed method is evaluated on 38 clinical test images of the femoral and tibial bone and achieves a median angulation error of 0.19° and 0.33°, respectively. An inter-rater study with three trauma surgery experts confirms reliability of the method and recommends further clinical application.

[29] 2007.02361

Self-supervised Depth Estimation to Regularise Semantic Segmentation in Knee Arthroscopy

Intra-operative automatic semantic segmentation of knee joint structures can assist surgeons during knee arthroscopy in terms of situational awareness. However, due to poor imaging conditions (e.g., low texture and overexposure), automatic semantic segmentation is a challenging scenario, which explains the scarce literature on this topic. In this paper, we propose a novel self-supervised monocular depth estimation to regularise the training of the semantic segmentation in knee arthroscopy. To further regularise the depth estimation, we propose the use of clean training images captured by the stereo arthroscope of routine objects (presenting none of the poor imaging conditions and with rich texture information) to pre-train the model. We fine-tune this model to produce both the semantic segmentation and self-supervised monocular depth using stereo arthroscopic images taken from inside the knee. Using a data set containing 3868 arthroscopic images captured during cadaveric knee arthroscopy with semantic segmentation annotations, 2000 stereo image pairs of cadaveric knee arthroscopy, and 2150 stereo image pairs of routine objects, we show that our semantic segmentation regularised by self-supervised depth estimation produces a more accurate segmentation than a state-of-the-art semantic segmentation approach modeled exclusively with semantic segmentation annotation.

[30] 2007.02367

GanglionNet: Objectively Assess the Density and Distribution of Ganglion Cells With NABLA-N Network

Hirschsprung's disease (HD) is a birth defect which is diagnosed and managed by multiple medical specialties such as pediatric gastroenterology, surgery, radiology, and pathology. HD is characterized by the absence of ganglion cells in the distal intestinal tract with a gradual normalization of ganglion cell numbers in the adjacent upstream bowel, termed the transition zone (TZ). Definitive surgical management to remove the abnormal bowel requires accurate assessment of ganglion cell density in histological sections from the TZ, which is difficult, time-consuming and prone to operator error. We present an automated method to detect and count immunostained ganglion cells using a new NABLA-N network-based deep learning (DL) approach, called GanglionNet. Morphological image analysis methods are applied to refine the regions for cell counting and to define ganglia regions (sets of ganglion cells) from the predicted masks. The proposed model is trained with samples annotated with single points by an expert pathologist. GanglionNet is tested on ten completely new High Power Field (HPF) images with dimensions of 2560x1920 pixels, and the outputs are compared against the manual counting results of the expert pathologist. The proposed method achieves a detection accuracy of 97.49% for ganglion cells when compared to counts by the expert pathologist, which demonstrates the robustness of GanglionNet. The proposed DL-based ganglion cell detection and counting method will simplify and standardize TZ diagnosis for HD patients.

[31] 2007.02438

DepthNet: Real-Time LiDAR Point Cloud Depth Completion for Autonomous Vehicles

Autonomous vehicles rely heavily on sensors such as cameras and LiDAR, which provide real-time information about their surroundings for the tasks of perception, planning and control. Typically, a LiDAR can only provide a sparse point cloud owing to a limited number of scanning lines. By employing depth completion, a dense depth map can be generated by assigning each camera pixel a corresponding depth value. However, the existing depth completion convolutional neural networks are so complex that they require high-end GPUs for processing, and thus they are not applicable to real-time autonomous driving. In this paper, a light-weight network is proposed for the task of LiDAR point cloud depth completion. With an astonishing 96.2% reduction in the number of parameters, it still achieves comparable performance (9.3% better in MAE but 3.9% worse in RMSE) to the state-of-the-art network. For real-time embedded platforms, the depthwise separable technique is applied to both convolution and deconvolution operations, and the number of parameters decreases further by a factor of 7.3, with only a small percentage increase in RMSE and MAE. Moreover, a system-on-chip architecture for depth completion is developed on a PYNQ-based FPGA platform that achieves real-time processing for the HDL-64E LiDAR at a speed of 11.1 frames per second.
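
To illustrate why the depthwise separable technique shrinks a network, the sketch below counts weights for a single standard convolution and for its depthwise separable replacement (a per-channel k x k filter followed by a 1 x 1 pointwise convolution). The layer sizes are hypothetical and biases are ignored, so the resulting ratio is not the factor of 7.3 reported for the full network.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k filter per input channel plus a 1 x 1 pointwise mix."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)                   # 73728 weights
separable = depthwise_separable_params(3, 64, 128)   # 576 + 8192 = 8768 weights
reduction = standard / separable                     # roughly 8.4x fewer weights
```

The saving grows with the kernel size and the number of output channels, since the ratio equals 1 / (1/c_out + 1/k²).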

[32] 2007.02441

GAN-based Hyperspectral Anomaly Detection

In this paper, we propose a generative adversarial network (GAN)-based hyperspectral anomaly detection algorithm. In the proposed algorithm, we train a GAN model to generate a synthetic background image that is as close as possible to the original background image. By subtracting the synthetic image from the original one, we are able to remove the background from the hyperspectral image. Anomaly detection is performed by applying the Reed-Xiaoli (RX) anomaly detector (AD) to the spectral difference image. In the experimental part, we compare our proposed method with the classical RX, Weighted-RX (WRX), and support vector data description (SVDD)-based anomaly detectors and the deep autoencoder anomaly detection (DAEAD) method on synthetic and real hyperspectral images. The detection results show that our proposed algorithm outperforms the other methods in the benchmark.
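
The RX detector mentioned above scores each pixel by its squared Mahalanobis distance from the global mean spectrum. The following is a minimal pure-Python sketch of the global RX score only; it does not include the GAN background subtraction, and the two-band pixel values are hypothetical.

```python
def mean_vector(pixels):
    """Per-band mean of a list of spectral vectors."""
    n = len(pixels)
    return [sum(p[b] for p in pixels) / n for b in range(len(pixels[0]))]

def covariance(pixels, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(pixels), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in pixels:
        diff = [p[b] - mu[b] for b in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, d))
        x[r] = (m[r][d] - s) / m[r][r]
    return x

def rx_scores(pixels):
    """Squared Mahalanobis distance of every pixel from the global mean."""
    mu = mean_vector(pixels)
    cov = covariance(pixels, mu)
    scores = []
    for p in pixels:
        diff = [p[b] - mu[b] for b in range(len(mu))]
        w = solve(cov, diff)  # w = cov^-1 @ diff without forming the inverse
        scores.append(sum(di * wi for di, wi in zip(diff, w)))
    return scores

# Hypothetical two-band data: six background pixels and one anomaly (last entry).
pixels = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1),
          (1.1, 1.9), (0.9, 2.0), (5.0, 0.5)]
scores = rx_scores(pixels)
most_anomalous = scores.index(max(scores))
```

In a pipeline like the one described, the input vectors would be the spectral difference pixels rather than raw spectra, and a threshold on the score would produce the detection map.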

[33] 2007.02457

Using Capsule Neural Network to predict Tuberculosis in lens-free microscopic images

Tuberculosis, caused by a bacterium called Mycobacterium tuberculosis, is one of the most serious public health problems worldwide. This work seeks to facilitate and automate the prediction of tuberculosis by the MODS method using lens-free microscopy, which is easy for untrained personnel to use. We apply the CapsNet architecture to our collected dataset and show that it achieves better accuracy than traditional CNN architectures.

[34] 2007.02462

Compressible Latent-Space Invertible Networks for Generative Model-Constrained Image Reconstruction

There remains an important need for the development of image reconstruction methods that can produce diagnostically useful images from undersampled measurements. In magnetic resonance imaging (MRI), for example, such methods can facilitate reductions in data-acquisition times. Deep learning-based methods hold potential for learning object priors or constraints that can serve to mitigate the effects of data-incompleteness on image reconstruction. One line of emerging research involves formulating an optimization-based reconstruction method in the latent space of a generative deep neural network. However, when generative adversarial networks (GANs) are employed, such methods can result in image reconstruction errors if the sought-after solution does not reside within the range of the GAN. To circumvent this problem, in this work, a framework for reconstructing images from incomplete measurements is proposed that is formulated in the latent space of invertible neural network-based generative models. A novel regularization strategy is introduced that takes advantage of the multiscale architecture of certain invertible neural networks, which can result in improved reconstruction performance over classical methods in terms of traditional metrics. The proposed method is investigated for reconstructing images from undersampled MRI data. The method is shown to achieve comparable performance to a state-of-the-art generative model-based reconstruction method while benefiting from a deterministic reconstruction procedure and easier control over regularization parameters.

[35] 2007.02466

Hybrid RF/VLC Systems: A Comprehensive Survey on Network Topologies, Performance Analyses, Applications, and Future Directions

Wireless communications refer to data transmissions in unguided propagation media through the use of wireless carriers such as radio frequency (RF) and visible light (VL) waves. The rising demand for high data rates, especially in indoor scenarios, overloads conventional RF technologies. Therefore, technologies such as millimeter waves (mmWave) and cognitive radios have been adopted as possible solutions to overcome the spectrum scarcity and capacity limitations of conventional RF systems. In parallel, visible light communication (VLC) has been proposed as an alternative solution, where a light source is used for both illumination and data transmission. In comparison to RF links, VLC links offer a very high bandwidth that allows much higher data rates. VLC is also immune to interference from electromagnetic sources, uses unlicensed channels, consumes very little power, and poses no health hazard. VLC is appealing for a wide range of applications, including those that require reliable, low-latency communication such as vehicle safety communication. Despite the major advantages of VLC technology and the variety of its applications, its use has been hampered by drawbacks such as its dependence on line-of-sight connectivity. Recently, hybrid RF/VLC systems were proposed to take advantage of the high capacity of VLC links and the better connectivity of RF links. Thus, hybrid RF/VLC systems are envisioned as a key enabler to improve user rates and mobility on the one hand and to optimize the capacity, interference, and power consumption of the overall network on the other hand. This paper provides a detailed survey of hybrid RF/VLC systems, presenting an overview of current developments, benefits, and limitations for both newcomers and expert researchers.

[36] 2007.02471

Can Un-trained Neural Networks Compete with Trained Neural Networks at Image Reconstruction?

Convolutional Neural Networks (CNNs) are highly effective for image reconstruction problems. Typically, CNNs are trained on large amounts of training images. Recently, however, un-trained neural networks such as the Deep Image Prior and Deep Decoder have achieved excellent image reconstruction performance for standard image reconstruction problems such as image denoising and image inpainting, without using any training data. This success raises the question whether un-trained neural networks can compete with trained ones for practical imaging tasks. To address this question, we consider accelerated magnetic resonance imaging (MRI), an important medical imaging problem, which has received significant attention from the deep-learning community, and for which a dedicated training set exists. We study and optimize un-trained architectures, and as a result, propose a variation of the architectures of the deep image prior and deep decoder. We show that the resulting convolutional decoder out-performs other un-trained methods and---most importantly---achieves on-par performance with a standard trained baseline, the U-net, on the FastMRI dataset, a new dataset for benchmarking deep learning based reconstruction methods. Besides achieving on-par reconstruction performance relative to trained methods, we demonstrate that a key advantage over trained methods is robustness to out-of-distribution examples.

[37] 2007.02480

ResNeXt and Res2Net Structure for Speaker Verification

ResNet-based architectures have been widely adopted as the speaker embedding extractor in speaker verification systems. Their standard topology and modularized design ease the human effort of hyperparameter tuning. Therefore, width and depth are left as the two major dimensions for further improving ResNet's representation power. However, simply increasing width or depth is not efficient. In this paper, we investigate the effectiveness of two new structures, i.e., ResNeXt and Res2Net, for the speaker verification task. They introduce two more effective dimensions to improve the model's representation capacity, called cardinality and scale, respectively. Experimental results on VoxCeleb data demonstrate that increasing these two dimensions is more efficient than going deeper or wider. Experiments on two internal test sets with mismatched acoustic conditions also demonstrate the generalization of the ResNeXt and Res2Net architectures. In particular, with the Res2Net structure, our best model achieved state-of-the-art performance on the VoxCeleb1 test set by reducing the EER by 18.5% relative. In addition, our system's modeling power for short utterances has been largely improved as a result of the Res2Net module's multi-scale feature representation ability.

[38] 2007.02482

Automatic semantic segmentation for prediction of tuberculosis using lens-free microscopy images

Tuberculosis (TB), caused by a germ called Mycobacterium tuberculosis, is one of the most serious public health problems in Peru and the world. This project seeks to facilitate and automate the diagnosis of tuberculosis by the MODS method using lens-free microscopy, because lens-free microscopes are easier to calibrate and easier to use (by untrained personnel) than lens-based microscopes. We apply a U-Net network to our collected dataset to perform automatic segmentation of the TB cords in order to predict tuberculosis. Our initial results show promising evidence for automatic segmentation of TB cords.

[39] 2007.02494

Data Based Linearization: Least-Squares Based Approximation

Linearization of power flow is an important topic in power system analysis. The computational burden can be greatly reduced under a linear power flow model, while model error is the main concern. Therefore, various linear power flow models have been proposed in the literature in search of the optimal approximation. Most linear power flow models are based on some kind of transformation, simplification, or Taylor expansion of the AC power flow equations and fail to be accurate in cold-start mode. Surprisingly, data-based linearization methods have not yet been fully investigated. In this paper, the performance of a data-based least-squares approximation method is investigated. The resulting cold-start sensitivity factors are named least-squares distribution factors (LSDF). Compared with the traditional power transfer distribution factors (PTDF), the LSDF work very well for systems with large load variation, and the average error of LSDF is only about 1% of the average error of PTDF. Comprehensive numerical testing is performed, and the results show that LSDF has attractive performance in all studied cases and has great application potential in situations requiring only cold-start linear power flow models.
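
The contrast between a Taylor-style linearization and a data-based least-squares fit can be sketched on a toy case. The example assumes a single lossless line with unit voltage magnitudes and a hypothetical reactance of 0.1 p.u., so the exact flow is sin(θ)/X; it is not the LSDF formulation of the paper, only an illustration of the fitting idea.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

X = 0.1  # hypothetical line reactance in per unit

# Sampled operating points with large angle differences (heavy loading).
thetas = [0.2 + 0.4 * i / 40 for i in range(41)]
flows = [math.sin(t) / X for t in thetas]  # exact flow on the toy line

# Data-based linear model fitted to the samples.
slope, intercept = fit_line(thetas, flows)
ls_error = sum(abs(slope * t + intercept - p)
               for t, p in zip(thetas, flows)) / len(thetas)

# First-order Taylor expansion around theta = 0 (the usual DC approximation).
taylor_error = sum(abs(t / X - p)
                   for t, p in zip(thetas, flows)) / len(thetas)
```

Because the fit is tuned to the sampled operating range rather than to a single expansion point, its mean absolute error over that range is smaller than that of the Taylor model in this toy setting.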

[40] 2007.02565

$S^2$-$cGAN$: Self-Supervised Adversarial Representation Learning for Binary Change Detection in Multispectral Images

Deep Neural Networks have recently demonstrated promising performance in binary change detection (CD) problems in remote sensing (RS), but they require a large amount of labeled multitemporal training samples. Since collecting such data is time-consuming and costly, most of the existing methods rely on networks pre-trained on publicly available computer vision (CV) datasets. However, because of the differences in image characteristics in CV and RS, this approach limits the performance of the existing CD methods. To address this problem, we propose a self-supervised conditional Generative Adversarial Network ($S^2$-$cGAN$). The proposed $S^2$-$cGAN$ is trained to generate only the distribution of unchanged samples. To this end, the proposed method consists of two main steps: 1) generating a reconstructed version of the input image as an unchanged image; and 2) learning the distribution of unchanged samples through an adversarial game. Unlike the existing GAN-based methods (which only use the discriminator during the adversarial training to supervise the generator), the $S^2$-$cGAN$ directly exploits the discriminator likelihood to solve the binary CD task. Experimental results show the effectiveness of the proposed $S^2$-$cGAN$ when compared to state-of-the-art CD methods.

[41] 2007.02593

Data-Driven Multi-Objective Controller Optimization for a Magnetically-Levitated Nanopositioning System

The performance achieved with traditional model-based control system design approaches typically relies heavily upon accurate modeling of the motion dynamics. However, modeling the true dynamics of present-day increasingly complex systems can be an extremely challenging task; and the usually necessary practical approximations often render the automation system to operate in a non-optimal condition. This problem can be greatly aggravated in the case of a multi-axis magnetically-levitated nanopositioning system where the fully floating behavior and multi-axis coupling make extremely accurate identification of the motion dynamics largely impossible. On the other hand, in many related industrial automation applications, e.g., the scanning process with the maglev system, repetitive motions are involved which could generate a large amount of motion data under non-optimal conditions. These motion data essentially contain rich information; therefore, the possibility exists to develop an intelligent automation system to learn from these motion data and to drive the system to operate towards optimality in a data-driven manner. Along this line then, this paper proposes a data-driven controller optimization approach that learns from the past non-optimal motion data to iteratively improve the motion control performance. Specifically, a novel data-driven multi-objective optimization approach is proposed that is able to automatically estimate the gradient and Hessian purely based on the measured motion data; the multi-objective cost function is suitably designed to take into account both smooth and accurate trajectory tracking. Experiments are then conducted on the maglev nanopositioning system to demonstrate the effectiveness of the proposed method, and the results show rather clearly the practical appeal of our methodology for related complex robotic systems with no accurate model available.

[42] 2007.02606

A Convolutional Approach to Vertebrae Detection and Labelling in Whole Spine MRI

We propose a novel convolutional method for the detection and identification of vertebrae in whole spine MRIs. This involves using a learnt vector field to group detected vertebrae corners together into individual vertebral bodies and convolutional image-to-image translation followed by beam search to label vertebral levels in a self-consistent manner. The method can be applied without modification to lumbar, cervical and thoracic-only scans across a range of different MR sequences. The resulting system achieves 98.1% detection rate and 96.5% identification rate on a challenging clinical dataset of whole spine scans and matches or exceeds the performance of previous systems on lumbar-only scans. Finally, we demonstrate the clinical applicability of this method, using it for automated scoliosis detection in both lumbar and whole spine MR scans.

[43] 2007.02663

An Elastic Interaction-Based Loss Function for Medical Image Segmentation

Deep learning techniques have shown their success in medical image segmentation since they are easy to manipulate and robust to various types of datasets. The commonly used loss functions in the deep segmentation task are pixel-wise loss functions. This results in a bottleneck for these models to achieve high precision for complicated structures in biomedical images. For example, the predicted small blood vessels in retinal images are often disconnected or even missed under the supervision of the pixel-wise losses. This paper addresses this problem by introducing a long-range elastic interaction-based training strategy. In this strategy, a convolutional neural network (CNN) learns the target region under the guidance of the elastic interaction energy between the boundary of the predicted region and that of the actual object. Under the supervision of the proposed loss, the boundary of the predicted region is attracted strongly by the object boundary and tends to stay connected. Experimental results show that our method achieves considerable improvements compared to commonly used pixel-wise loss functions (cross-entropy and Dice loss) and other recent loss functions on three retinal vessel segmentation datasets: DRIVE, STARE and CHASEDB1.

[44] 2007.02675

Towards Distributed Accommodation of Covert Attacks in Interconnected Systems

The problem of mitigating maliciously injected signals in interconnected systems is dealt with in this paper. We consider the class of covert attacks, as they are stealthy and cannot be detected by conventional means in centralized settings. Distributed architectures can be leveraged for revealing such stealthy attacks by exploiting communication and local model knowledge. We show how such detection schemes can be improved to estimate the action of an attacker and we propose an accommodation scheme in order to mitigate or neutralize abnormal behavior of a system under attack.

[45] 2007.02676

Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning

Audio captioning is the task of automatically creating a textual description for the contents of a general audio signal. Typical audio captioning methods rely on deep neural networks (DNNs), where the target of the DNN is to map the input audio sequence to an output sequence of words, i.e. the caption. However, the length of the textual description is considerably less than the length of the audio signal, for example 10 words versus some thousands of audio feature vectors. This clearly indicates that an output word corresponds to multiple input feature vectors. In this work we present an approach that focuses on explicitly taking advantage of this difference in lengths between sequences, by applying temporal sub-sampling to the audio input sequence. We employ a sequence-to-sequence method, which uses a fixed-length vector as an output from the encoder, and we apply temporal sub-sampling between the RNNs of the encoder. We evaluate the benefit of our approach by employing the freely available dataset Clotho, and we evaluate the impact of different factors of temporal sub-sampling. Our results show an improvement in all considered metrics.
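
As a rough illustration of temporal sub-sampling between encoder layers, the sketch below keeps every n-th feature vector of a sequence. The selection rule (plain striding), the factor of 2, and the placeholder comments standing in for RNN layers are assumptions made for illustration; the abstract does not specify them.

```python
def temporal_subsample(sequence, factor):
    """Keep every `factor`-th time step of a sequence of feature vectors."""
    return sequence[::factor]

# Hypothetical input: 12 time steps of 2-dimensional audio features.
frames = [[float(t), 0.5 * t] for t in range(12)]

# An RNN layer would process `frames` here; it is omitted in this sketch.
after_first = temporal_subsample(frames, 2)        # 12 -> 6 time steps
# A second RNN layer would process `after_first` here.
after_second = temporal_subsample(after_first, 2)  # 6 -> 3 time steps
```

Each sub-sampling stage shortens the sequence the next layer has to process, which moves the encoder output length closer to the length of a caption.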

[46] 2007.02681

Large-scale Analysis and Simulation of Traffic Flow using Markov Models

Modeling and simulating the movement of vehicles in established transportation infrastructures, especially in large urban road networks, is an important task. It helps with understanding and handling traffic problems, optimizing traffic regulations, and adapting traffic management in real time to unexpected disaster events. A mathematically rigorous stochastic model that can be used for traffic analysis, based on an interplay between graph and Markov chain theories, was proposed earlier by other researchers. This model provides a transition probability matrix which describes the traffic dynamics, with its unique stationary distribution of the vehicles on the road network. In this paper, a new parametrization is presented for this model by introducing the concept of a two-dimensional stationary distribution, which can handle the traffic dynamics together with the vehicles' distribution. In addition, the weighted least squares estimation method is applied for estimating this new parameter matrix using trajectory data. In a case study, we apply our method to the Taxi Trajectory Prediction dataset and road network data from the OpenStreetMap project, both publicly available. To test our approach, we have implemented the proposed model in software. We have run simulations at medium and large scales, and both the model and the estimation procedure, based on artificial and real datasets, have proved satisfactory. In a real application, we have unfolded a stationary distribution on the map graph of Porto, based on the dataset. The approach described here combines techniques whose use together to analyze traffic on large road networks has not previously been reported.

[47] 2007.02683

Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation

Recent approaches for music source separation are almost exclusively based on deep neural networks, mostly employing recurrent neural networks (RNNs). Although RNNs are in many cases superior to other types of deep neural networks for sequence processing, they are known to have specific difficulties in training and parallelization, especially for the typically long sequences encountered in music source separation. In this paper we present a use case of replacing RNNs with depth-wise separable (DWS) convolutions, which are a lightweight and faster variant of the typical convolutions. We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with DWS convolutions (DWS-CNNs). We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance, using the standard metrics of signal-to-artifacts, signal-to-interference, and signal-to-distortion ratio. Our results show that replacing RNNs with DWS-CNNs yields an improvement of 1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the number of parameters of the RNN architecture.

[48] 2007.02703

Self-Triggered Output-Feedback Control of LTI Systems Subject to Disturbances and Noise

Self-triggered control (STC) and periodic event-triggered control (PETC) are aperiodic sampling techniques aiming at reducing control data communication when compared to periodic sampling. In both techniques, the effects of measurement noise in continuous-time systems with output feedback are unaddressed. In this work we prove that additive noise does not hinder stability of output-feedback PETC of linear time-invariant (LTI) systems. Then we build an STC strategy that estimates PETC's worst-case triggering times. To accomplish this, we use set-based methods, more specifically ellipsoidal sets, which describe uncertainties on state, disturbances and noise. Ellipsoidal reachability is thus used to predict worst-case triggering condition violations, ultimately determining the next communication time. The ellipsoidal state estimate is recursively updated using Guaranteed State Estimation (GSE) methods. The proposed STC is designed to be computationally tractable at the expense of some added conservatism. It is expected to be a practical STC implementation for a broad range of applications.

[49] 2007.02711

Perceptually Optimizing Deep Image Compression

Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks due to their simplicity and analytical properties. However, when used to assess visual information loss, these simple norms are not highly consistent with human perception. Here, we propose a different proxy approach to optimize image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of a modern deep image compression model, we are able to demonstrate an averaged bitrate reduction of $28.7\%$ over MSE optimization, given a specified perceptual quality (VMAF) level.

[50] 2007.02761

A New Model-Free Method for MIMO Systems and Discussion on Model-Free or Model-Based

Current model-free adaptive control (MFAC) can hardly deal with the time delay problem in multiple-input multiple-output (MIMO) systems. To solve this problem, a novel model-free adaptive predictive control (MFAPC) method is proposed. Compared to the current MFAC, i) the proposed method is based on a prediction model derived from the equivalent-dynamic-linearization model (EDLM) modified with a stochastic disturbance; ii) the previous assumptions and the applicable range of MFAPC are extended, in that the leading coefficient of the control input vector in the system description is no longer restricted to a diagonally dominant square matrix; iii) the performance analysis and the choice of the matrix $\lambda$ are handled in a simple manner by analyzing the function of the closed-loop poles, whereas neither problem may be solvable by the previous contraction mapping method.

[51] 2007.02764

Information Theoretic Data Injection Attacks with Sparsity Constraints

Information theoretic sparse attacks that minimize simultaneously the information obtained by the operator and the probability of detection are studied in a Bayesian state estimation setting. The attack construction is formulated as an optimization problem that aims to minimize the mutual information between the state variables and the observations while guaranteeing the stealth of the attack. Stealth is described in terms of the Kullback-Leibler (KL) divergence between the distributions of the observations under attack and without attack. To overcome the difficulty posed by the combinatorial nature of a sparse attack construction, the attack case in which only one sensor is compromised is analytically solved first. The insight generated in this case is then used to propose a greedy algorithm that constructs random sparse attacks. The performance of the proposed attack is evaluated in the IEEE 30 Bus Test Case.

[52] 2007.02790

Adversarial Uni- and Multi-modal Stream Networks for Multimodal Image Registration

Deformable image registration between Computed Tomography (CT) images and Magnetic Resonance (MR) imaging is essential for many image-guided therapies. In this paper, we propose a novel translation-based unsupervised deformable image registration method. Distinct from other translation-based methods that attempt to convert the multimodal problem (e.g., CT-to-MR) into a unimodal problem (e.g., MR-to-MR) via image-to-image translation, our method leverages the deformation fields estimated from both: (i) the translated MR image and (ii) the original CT image in a dual-stream fashion, and automatically learns how to fuse them to achieve better registration performance. The multimodal registration network can be effectively trained by computationally efficient similarity metrics without any ground-truth deformation. Our method has been evaluated on two clinical datasets and demonstrates promising results compared to state-of-the-art traditional and learning-based methods.

[53] 2007.02818

Framework for Studying Stability of Switching Max-Plus Linear Systems

We propose a framework for studying the stability of discrete-event systems modelled as switching max-plus linear systems. In this framework, we propose a set of notions of stability for generic discrete-event systems in the max-plus algebra. Then we show the loss of equivalence of these notions for switching max-plus linear systems due to the lack of global monotonicity and the accompanying difficulty in rigorous analysis. This serves as a motivation to relax the assumption on monotonicity of the dynamics to positive invariance of max-plus cones. Then we proceed to generalise the notions of stability when the dynamics is restricted to such cones. The stability analysis approach presented in this paper serves as a first step to study the stability of a general class of switching max-plus linear systems.

[54] 2007.02906

Compact representation of temporal processes in echosounder time series via matrix decomposition

Echosounders are high-frequency sonar systems widely used to observe mid-trophic level animals in the ocean. The recent deluge of echosounder data from diverse ocean observing platforms has created unprecedented opportunities to study the marine ecosystems at broad scales. However, there is a critical lack of methods capable of automatic and adaptive extraction of ecologically relevant spatio-temporal structures from echosounder observation, limiting effective and wider use of these rich datasets in marine ecological research. Here we present a data-driven methodology based on matrix decomposition that builds a compact representation of long-term echosounder time series using intrinsic features in the data, and demonstrate its utility by analyzing an example multi-frequency dataset from the northeast Pacific Ocean. We show that Principal Component Pursuit (PCP) successfully removes noise interference from the data, and that a temporally smooth Nonnegative Matrix Factorization (tsNMF) automatically discovers a small number of distinct daily echogram patterns, whose time-varying linear combination (activation) reconstructs the dominant structures in the original time series. This low-rank representation is more tractable and interpretable than the original time series. It is also suitable for visualization and systematic analysis with other ocean variables such as currents. Unlike existing echo analysis methods that rely on fixed, handcrafted rules, the data-driven and thus adaptable nature of our methodology is well-suited for analyzing data collected from unfamiliar ecosystems or ecosystems undergoing rapid changes in the changing climate. Future developments and applications based on this work will catalyze advancements in marine ecology by providing robust time series analytics for large-scale, acoustics-based biological observation in the ocean.

[55] 2007.02928

Multiperiod Stochastic Peak Shaving Using Storage

We present an online stochastic model predictive control framework for demand charge management for a grid-connected consumer with attached electrical energy storage. The consumer we consider must satisfy an inflexible but stochastic electricity demand, and also receives a stochastic electricity inflow. The optimization problem formulated solves a stochastic cost minimization problem, with given weather forecast scenarios converted into forecast demand and inflow. We introduce a novel weighting scheme to account for cases where the optimization horizon spans multiple demand charge periods. The optimization scheme is tested in a setting with building demand and photovoltaic array inflow data from a real office building. The simulation study allows us to compare various design and modeling alternatives, ultimately proposing a policy based on causal affine decision rules.

[56] 2007.01857

Deep Learning Models for Visual Inspection on Automotive Assembling Line

Automotive manufacturing assembly tasks are built upon visual inspections such as scratch identification on machined surfaces, part identification and selection, etc., which guarantee product and process quality. These tasks can be related to more than one type of vehicle that is produced within the same manufacturing line. Visual inspection was essentially human-led but has recently been supplemented by the artificial perception provided by computer vision systems (CVSs). Despite their relevance, the accuracy of CVSs varies according to environmental settings such as lighting, enclosure and quality of image acquisition. These issues entail costly solutions and override part of the benefits introduced by computer vision systems, mainly when they interfere with the operating cycle time of the factory. In this sense, this paper proposes the use of deep learning-based methodologies to assist in visual inspection tasks while leaving a very small footprint in the manufacturing environment, and explores them as an end-to-end tool to ease CVS setup. The proposed approach is illustrated by four proofs of concept in a real automotive assembly line based on models for object detection, semantic segmentation, and anomaly detection.

[57] 2007.01867

TLIO: Tight Learned Inertial Odometry

In this work we propose a tightly-coupled Extended Kalman Filter framework for IMU-only state estimation. Strap-down IMU measurements provide relative state estimates based on the IMU kinematic motion model. However, the integration of measurements is sensitive to sensor bias and noise, causing significant drift within seconds. Recent research by Yan et al. (RoNIN) and Chen et al. (IONet) showed the capability of using trained neural networks to obtain accurate 2D displacement estimates from segments of IMU data and obtained good position estimates from concatenating them. This paper demonstrates a network that regresses 3D displacement estimates and their uncertainty, giving us the ability to tightly fuse the relative state measurement into a stochastic cloning EKF to solve for pose, velocity and sensor biases. We show that our network, trained with pedestrian data from a headset, can produce statistically consistent measurements and uncertainties to be used as the update step in the filter, and that the tightly-coupled system outperforms velocity integration approaches in position estimates, and an AHRS attitude filter in orientation estimates.

[58] 2007.01921

Human-Robot Team Coordination with Dynamic and Latent Human Task Proficiencies: Scheduling with Learning Curves

As robots become ubiquitous in the workforce, it is essential that human-robot collaboration be both intuitive and adaptive. A robot's quality improves based on its ability to explicitly reason about the time-varying (i.e. learning curves) and stochastic capabilities of its human counterparts, and adjust the joint workload to improve efficiency while factoring human preferences. We introduce a novel resource coordination algorithm that enables robots to explore the relative strengths and learning abilities of their human teammates, by constructing schedules that are robust to stochastic and time-varying human task performance. We first validate our algorithmic approach using data we collected from a user study (n = 20), showing we can quickly generate and evaluate a robust schedule while discovering the latest individual worker proficiency. Second, we conduct a between-subjects experiment (n = 90) to validate the efficacy of our coordinating algorithm. Results from the human-subjects experiment indicate that scheduling strategies favoring exploration tend to be beneficial for human-robot collaboration as it improves team fluency (p = 0.0438), while also maximizing team efficiency (p < 0.001).

[59] 2007.01926

Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control

Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, these approaches fail to handle the case when coordinates are embedded in high-dimensional data such as images. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.

[60] 2007.01929

A Coupled Manifold Optimization Framework to Jointly Model the Functional Connectomics and Behavioral Data Spaces

The problem of linking functional connectomics to behavior is extremely challenging due to the complex interactions between the two distinct, but related, data domains. We propose a coupled manifold optimization framework which projects fMRI data onto a low dimensional matrix manifold common to the cohort. The patient specific loadings simultaneously map onto a behavioral measure of interest via a second, non-linear, manifold. By leveraging the kernel trick, we can optimize over a potentially infinite dimensional space without explicitly computing the embeddings. As opposed to conventional manifold learning, which assumes a fixed input representation, our framework directly optimizes for embedding directions that predict behavior. Our optimization algorithm combines proximal gradient descent with the trust region method, which has good convergence guarantees. We validate our framework on resting state fMRI from fifty-eight patients with Autism Spectrum Disorder using three distinct measures of clinical severity. Our method outperforms traditional representation learning techniques in a cross validated setting, thus demonstrating the predictive power of our coupled objective.

[61] 2007.01931

A Deep-Generative Hybrid Model to Integrate Multimodal and Dynamic Connectivity for Predicting Spectrum-Level Deficits in Autism

We propose an integrated deep-generative framework that jointly models complementary information from resting-state functional MRI (rs-fMRI) connectivity and diffusion tensor imaging (DTI) tractography to extract predictive biomarkers of a disease. The generative part of our framework is a structurally-regularized Dynamic Dictionary Learning (sr-DDL) model that decomposes the dynamic rs-fMRI correlation matrices into a collection of shared basis networks and time varying patient-specific loadings. This matrix factorization is guided by the DTI tractography matrices to learn anatomically informed connectivity profiles. The deep part of our framework is an LSTM-ANN block, which models the temporal evolution of the patient sr-DDL loadings to predict multidimensional clinical severity. Our coupled optimization procedure collectively estimates the basis networks, the patient-specific dynamic loadings, and the neural network weights. We validate our framework on a multi-score prediction task in 57 patients diagnosed with Autism Spectrum Disorder (ASD). Our hybrid model outperforms state-of-the-art baselines in a five-fold cross validated setting and extracts interpretable multimodal neural signatures of brain dysfunction in ASD.

[62] 2007.01940

Feedback Neural Network based Super-resolution of DEM for generating high fidelity features

High resolution Digital Elevation Models (DEMs) are an important requirement for many applications such as modelling water flow, landslides, and avalanches. Yet publicly available DEMs have low resolution for most parts of the world. Despite the tremendous success of deep learning solutions in the image super-resolution task, very few works have used these powerful systems on DEMs to generate high resolution DEMs (HRDEMs). Motivated by feedback neural networks, we propose a novel neural network architecture that learns to add high frequency details iteratively to a low resolution DEM, turning it into a high resolution DEM without compromising its fidelity. Our experiments confirm that without any additional modality such as aerial images (RGB), our network DSRFB achieves RMSEs of 0.59 to 1.27 across 4 different datasets.

[63] 2007.01950

Ultra-high spatial resolution BOLD fMRI in humans using combined segmented-accelerated VFA-FLEET with a recursive RF pulse design

Purpose: To alleviate the spatial encoding limitations of single-shot EPI by developing multi-shot segmented EPI for ultra-high-resolution fMRI with reduced ghosting artifacts from subject motion and respiration. Methods: Segmented EPI can reduce readout duration and reduce acceleration factors; however, the time elapsed between segment acquisitions (on the order of seconds) can result in intermittent ghosting, limiting its use for fMRI. Here, "FLEET" segment ordering--where segments are looped over before slices--was combined with a variable flip angle progression (VFA-FLEET) to improve inter-segment fidelity and maximize signal for fMRI. Scaling a sinc pulse's flip angle for each segment (VFA-FLEET-Sinc) produced inconsistent slice profiles and ghosting; therefore, a recursive Shinnar-Le Roux (SLR) RF pulse design was developed (VFA-FLEET-SLR) to generate unique pulses for every segment that together produce consistent slice profiles and signals. Results: The temporal stability of VFA-FLEET-SLR was compared against conventional-segmented EPI and VFA-FLEET-Sinc at 3 T and 7 T. VFA-FLEET-SLR showed reductions in both intermittent and stable ghosting compared to conventional-segmented and VFA-FLEET-Sinc, resulting in improved image quality with a minor trade-off in temporal SNR. Combining VFA-FLEET-SLR with acceleration, we achieved a 0.6-mm isotropic acquisition at 7 T--without zoomed imaging or partial Fourier--demonstrating reliable detection of BOLD responses to a visual stimulus. To counteract the increased repetition time from segmentation, simultaneous multi-slice VFA-FLEET-SLR was demonstrated using RF-encoded controlled aliasing. Conclusions: VFA-FLEET with a recursive RF pulse design supports acquisitions with low levels of artifact and spatial blur, enabling fMRI at previously inaccessible spatial resolutions with a "full-brain" field of view.

[64] 2007.02017

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce the model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardware supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During the optimization, the bit-width of each layer/kernel in the model is in a fractional state between two consecutive bit-widths, which can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during the quantization-aware training, which results in an optimized mixed precision model. Further, our method can be naturally combined with channel pruning for better computation cost allocation. Our final models achieve comparable or better performance than previous mixed precision quantization methods on MobilenetV1/V2 and ResNet18 under different resource constraints on the ImageNet dataset.
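
The idea of a bit-width sitting between two consecutive integers can be illustrated with a minimal sketch. It assumes, purely for illustration, a uniform quantizer on [0, 1] and a linear interpolation between the results at the two neighbouring integer bit-widths; the helper names `quantize` and `fractional_quantize` are hypothetical, and the abstract does not confirm that this is the exact formulation used.

```python
import math


def quantize(x, bits):
    """Uniformly quantize x (clipped to [0, 1]) to 2**bits levels."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


def fractional_quantize(x, bits):
    """Blend the quantizations at the two integer bit-widths around `bits`.

    Assumption: linear interpolation weighted by the fractional part of
    `bits`; requires bits >= 1.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    low = math.floor(bits)
    weight_high = bits - low
    if weight_high == 0:
        return quantize(x, low)
    return (1 - weight_high) * quantize(x, low) + weight_high * quantize(x, low + 1)


if __name__ == "__main__":
    for b in (2.0, 2.25, 2.5, 2.75, 3.0):
        print(b, fractional_quantize(0.3, b))
```

Because the output varies continuously with `bits`, a fractional bit-width of this kind could in principle be adjusted gradually during training and rounded to an integer afterwards.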

[65] 2007.02022

SAXSDOG: open software for real-time azimuthal integration of 2D scattering images

In-situ small- and wide-angle scattering experiments at synchrotrons often produce massive amounts of data within seconds. Especially during such beamtimes, processing the acquired data online, i.e., without appreciable delay, is key to obtaining feedback on the failure or success of the experiment. We thus developed SAXSDOG, a Python-based environment for real-time azimuthal integration of large-area scattering images. The software is primarily designed for dedicated data pipelines: once a scattering image is transferred from the detector onto the storage unit, it is automatically integrated and pre-evaluated using integral parameters within milliseconds. The control and configuration of the underlying server-based processes is done via a graphical user interface, SAXSLEASH, which visualizes the resulting 1D data together with integral classifiers in real time. SAXSDOG further includes a portable 'take-home' version for users that runs on standalone computers, enabling its use in labs or at the preferred workspace.

[66] 2007.02025

Robust Prediction of Punctuation and Truecasing for Medical ASR

Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or "exclamation point", while truecasing enhances user readability and improves the performance of downstream NLP tasks. This paper proposes a conditional joint modeling framework for prediction of punctuation and truecasing using pretrained masked language models such as BERT, BioBERT and RoBERTa. We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data. Finally, we improve the robustness of the model against common errors made in ASR by performing data augmentation. Experiments performed on dictation and conversational style corpora show that our proposed model achieves ~5% absolute improvement on ground truth text and ~10% improvement on ASR outputs over baseline models under F1 metric.

[67] 2007.02042

Single Image Brightening via Multi-Scale Exposure Fusion with Hybrid Learning

A small ISO and a short exposure time are usually used to capture an image in backlit or low-light conditions, which results in an image with negligible motion blur and little noise but which looks dark. In this paper, a single image brightening algorithm is introduced to brighten such an image. The proposed algorithm includes a unique hybrid learning framework to generate two virtual images with large exposure times. The virtual images are first generated via intensity mapping functions (IMFs), which are computed using camera response functions (CRFs); this is a model-driven approach. Both virtual images are then enhanced using a data-driven approach, i.e., a residual convolutional neural network, to approach the ground truth images. The model-driven approach and the data-driven one complement each other in the proposed hybrid learning framework. The final brightened image is obtained by fusing the original image and the two virtual images via a multi-scale exposure fusion algorithm with properly defined weights. Experimental results show that the proposed brightening algorithm outperforms existing algorithms in terms of the MEF-SSIM metric.

[68] 2007.02067

Sensor-Based Control for Collaborative Robots: Fundamentals, Challenges and Opportunities

The objective of this paper is to present a systematic review of existing sensor-based control methodologies for applications that involve direct interaction between humans and robots, in the form of either physical collaboration or safe coexistence. To this end, we first introduce the basic formulation of the sensor-servo problem, then present the most common approaches: vision-based, touch-based, audio-based, and distance-based control. Afterwards, we discuss and formalize the methods that integrate heterogeneous sensors at the control level. The surveyed body of literature is classified according to the type of sensor, to the way multiple measurements are combined, and to the target objectives and applications. Finally, we discuss open problems, potential applications, and future research directions.

[69] 2007.02091

Semantic Segmentation Using Deep Learning to Extract Total Extraocular Muscles and Optic Nerve from Orbital Computed Tomography Images

Objectives: Precise segmentation of total extraocular muscles (EOM) and optic nerve (ON) is essential to assess anatomical development and progression of thyroid-associated ophthalmopathy (TAO). We aim to develop a semantic segmentation method based on deep learning to extract the total EOM and ON from orbital CT images in patients with suspected TAO. Materials and Methods: A total of 7,879 images obtained from 97 subjects who underwent orbit CT scans due to suspected TAO were enrolled in this study. Eighty-eight patients were randomly selected into the training/validation dataset, and the rest were put into the test dataset. Contours of the total EOM and ON in all the patients were manually delineated by experienced radiologists as the ground truth. A three-dimensional (3D) end-to-end fully convolutional neural network called semantic V-net (SV-net) was developed for our segmentation task. Intersection over Union (IoU) was measured to evaluate the accuracy of the segmentation results, and Pearson correlation analysis was used to evaluate the volumes measured from our segmentation results against those from the ground truth. Results: Our model in the test dataset achieved an overall IoU of 0.8207; the IoU was 0.7599 for the superior rectus muscle, 0.8183 for the lateral rectus muscle, 0.8481 for the medial rectus muscle, 0.8436 for the inferior rectus muscle and 0.8337 for the optic nerve. The volumes measured from our segmentation results agreed well with those from the ground truth (all R>0.98, P<0.0001). Conclusion: The qualitative and quantitative evaluations demonstrate excellent performance of our method in automatically extracting the total EOM and ON and measuring their volumes in orbital CT images. There is a great promise for clinical application to assess these anatomical structures for the diagnosis and prognosis of TAO.
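
For readers unfamiliar with the metric, Intersection over Union can be computed as in the following minimal sketch. It assumes binary masks flattened into equal-length sequences of 0/1 values; the function name `iou` and the toy masks are illustrative only and are not taken from the paper.

```python
def iou(pred, truth):
    """Intersection over Union of two equally sized binary masks.

    Masks are flat sequences of 0/1 (or False/True) voxel labels.
    Two empty masks are treated as a perfect match (IoU = 1.0).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return intersection / union if union else 1.0


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]
    reference = [0, 0, 1, 1, 1, 0]
    # Two voxels overlap and four voxels are covered by either mask.
    print(iou(predicted, reference))
```

For a 3D volume, the same function applies after flattening the voxel grid of each structure into a single sequence.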

[70] 2007.02106

Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey

Automatic Target Recognition (ATR) for military applications is one of the core processes towards enhancing intelligence and autonomously operating military platforms. Spurred by this and given that Synthetic Aperture Radar (SAR) presents several advantages over its counterpart data domains, this paper surveys and assesses current SAR ATR architectures that employ the most popular dataset for the SAR domain, namely the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. Based on the current methodology trends, we propose a taxonomy for the SAR ATR architectures, along with a direct comparison of the strengths and weaknesses of each method under both standard and extended operational conditions. Additionally, despite MSTAR being the standard SAR ATR benchmarking dataset, we also highlight its weaknesses and suggest future research directions.

[71] 2007.02126

Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as in conversational automatic speech recognition (ASR), as the percepts (if they are modelled as graphs indicating relationships among utterances) are supposed to be innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transformation of these percept graphs for acoustic modeling. Our approach is able to successfully infer relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.

[72] 2007.02147

A Dynamized Power Flow Method based on Differential Transformation

This paper proposes a novel method for solving and tracing power flow solutions with changes of a loading parameter. Different from the conventional continuation power flow method, which repeatedly solves static AC power flow equations, the proposed method extends the power flow model into a fictitious dynamic system by adding a differential equation on the loading parameter. As a result, the original solution curve tracing problem is converted to solving the time domain trajectories of the reformulated dynamic system. A non-iterative algorithm based on differential transformation is proposed to analytically solve the aforementioned dynamized model in the form of a power series in time. This paper proves that the nonlinear power flow equations in the time domain are converted to formally linear equations in the domain of the power series order after the differential transformation, thus avoiding numerical iterations. Case studies on several test systems including a 2383-bus system show the merits of the proposed method.
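
How a differential transformation removes the need for iteration can be seen on a toy problem. The sketch below is not the power flow model; it assumes the scalar ODE dx/dt = x^2 with x(0) = x0, chosen only because its exact solution x0 / (1 - x0 t) is known. The product x^2 transforms into a discrete convolution of the series coefficients, so each new coefficient follows explicitly from the lower-order ones. The function names are hypothetical.

```python
def dt_coefficients(x0, order):
    """Power-series coefficients X[0..order] of the solution to dx/dt = x**2.

    Differential transformation turns the ODE into the recursion
        (k + 1) * X[k + 1] = sum_{m=0..k} X[m] * X[k - m],
    which is explicit in the unknown X[k + 1], so no iteration is needed.
    """
    coeffs = [x0]
    for k in range(order):
        convolution = sum(coeffs[m] * coeffs[k - m] for m in range(k + 1))
        coeffs.append(convolution / (k + 1))
    return coeffs


def evaluate(coeffs, t):
    """Evaluate the truncated power series sum_k coeffs[k] * t**k."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    series = dt_coefficients(1.0, 20)
    t = 0.1
    print(evaluate(series, t), 1.0 / (1.0 - t))  # series value vs exact solution
```

The truncated series is only accurate inside its radius of convergence (|t| < 1 for this toy case), which is why series-based solvers typically advance in multiple time steps.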

[73] 2007.02190

BézierSketch: A generative model for scalable vector sketches

The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and a failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.

[74] 2007.02191

Coded Distributed Computing with Partial Recovery

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed distributed computation scheme with partial recovery in terms of the trade-off between the computation accuracy and latency.

[75] 2007.02200

Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches

We analyze the effect of offline and online triplet mining for a colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches with respect to a given anchor, both in online and offline mining. While many works focus solely on how to select the triplets online (batch-wise), we also study the effect of extreme distances and neighbor patches before training in an offline fashion. We analyze the impacts of extreme cases for offline versus online mining, including easy positive, batch semi-hard, and batch hard triplet mining as well as the neighborhood component analysis loss, its proxy version, and distance weighted sampling. We also investigate online approaches based on extreme distance, comprehensively compare the performance of offline and online mining based on the data patterns, and explain offline mining as a tractable generalization of online mining with a large mini-batch size. We also discuss the relations of different colorectal tissue types in terms of extreme distances. We found that offline mining can generate a better statistical representation of the population by working on the whole dataset.
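
As an illustration of selecting extreme patches for an anchor, the following sketch implements batch hard triplet mining: for each anchor it picks the farthest same-class sample and the nearest other-class sample. Euclidean distance, the function name `batch_hard_triplets`, and the toy data are assumptions made for this example; the paper's own embeddings and distance are not specified in the abstract.

```python
import math


def batch_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets for one batch.

    For every anchor, the positive is the farthest sample with the same
    label and the negative is the nearest sample with a different label.
    Anchors lacking a positive or a negative are skipped.
    """
    triplets = []
    for a, (anchor_vec, anchor_label) in enumerate(zip(embeddings, labels)):
        positives = [i for i, lab in enumerate(labels) if lab == anchor_label and i != a]
        negatives = [i for i, lab in enumerate(labels) if lab != anchor_label]
        if not positives or not negatives:
            continue
        hardest_pos = max(positives, key=lambda i: math.dist(anchor_vec, embeddings[i]))
        hardest_neg = min(negatives, key=lambda i: math.dist(anchor_vec, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (3, 0), (4, 0), (10, 0)]
    classes = ["a", "a", "a", "b", "b"]
    print(batch_hard_triplets(points, classes))
```

Running the same selection over the whole dataset rather than one mini-batch corresponds to the offline setting described above, at the cost of computing many more pairwise distances.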

[76] 2007.02248

CIDMP: Completely Interpretable Detection of Malaria Parasite in Red Blood Cells using Lower-dimensional Feature Space

Predicting whether red blood cells (RBC) are infected with the malaria parasite is an important problem in pathology. Recently, supervised machine learning approaches have been used for this problem, and they have had reasonable success. In particular, state-of-the-art methods such as Convolutional Neural Networks automatically extract increasingly complex feature hierarchies from the image pixels. While such generalized automatic feature extraction methods have significantly reduced the burden of feature engineering in many domains, for niche tasks such as the one we consider in this paper, they result in two major problems. First, they use a very large number of features (that may or may not be relevant) and therefore training such models is computationally expensive. Second, and more importantly, the large feature-space makes it very hard to interpret which features are truly important for predictions. Thus, a criticism of such methods is that learning algorithms present opaque black boxes to their users, in this case, medical experts. The recommendation of such algorithms can be understood easily, but the reason for their recommendation is not clear. This is the problem of non-interpretability of the model, and the best-performing algorithms are usually the least interpretable. To address these issues, in this paper, we propose an approach to extract a very small number of aggregated features that are easy to interpret and compute, and empirically show that we obtain high prediction accuracy even with a significantly reduced feature-space.

[77] 2007.02277

Weakly Supervised Domain Adaptation for Built-up Region Segmentation in Aerial and Satellite Imagery

This paper proposes a novel domain adaptation algorithm to handle the challenges posed by satellite and aerial imagery, and demonstrates its effectiveness on the built-up region segmentation problem. Built-up area estimation is an important component in understanding the human impact on the environment, the effect of public policy, and general urban population analysis. The diverse nature of aerial and satellite imagery and the lack of labeled data covering this diversity make it difficult for machine learning algorithms to generalize for such tasks, especially across multiple domains. On the other hand, due to the lack of strong spatial context and structure in comparison to ground imagery, the application of existing unsupervised domain adaptation methods results in sub-optimal adaptation. We thoroughly study the limitations of existing domain adaptation methods and propose a weakly-supervised adaptation strategy where we assume image-level labels are available for the target domain. More specifically, we design a built-up area segmentation network (as an encoder-decoder), with an image classification head added to guide the adaptation. The devised system is able to address the problem of visual differences in multiple satellite and aerial imagery datasets, ranging from high resolution (HR) to very high resolution (VHR). A realistic and challenging HR dataset is created by hand-tagging 73.4 sq-km of Rwanda, capturing a variety of built-up structures over different terrain. The developed dataset is spatially rich compared to existing datasets and covers diverse built-up scenarios including built-up areas in forests and deserts, mud houses, tin, and colored rooftops. Extensive experiments are performed by adapting from a single source domain to segment the target domain. We achieve high gains, ranging from 11.6% to 52% in IoU, over the existing state-of-the-art methods.
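
The IoU (intersection over union) figure reported in this abstract can be illustrated for binary segmentation masks as follows. This is a minimal sketch assuming masks are given as nested lists of 0/1 values; the function name `binary_iou` and the convention of returning 1.0 when both masks are empty are assumptions made here, not details taken from the paper.

```python
def binary_iou(pred, target):
    """Intersection over union of two same-shaped binary masks (lists of rows)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    if union == 0:
        return 1.0
    return intersection / union


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = binary_iou(predicted, reference)  # 2 shared pixels out of 4 in the union
```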

[78] 2007.02397

High space-bandwidth in quantitative phase imaging using partially spatially coherent optical coherence microscopy and deep neural network

Quantitative phase microscopy (QPM) is a label-free technique that enables monitoring of morphological changes at the subcellular level. The performance of a QPM system in terms of spatial sensitivity and resolution depends on the coherence properties of the light source and the numerical aperture (NA) of the objective lenses. Here, we propose high space-bandwidth QPM using partially spatially coherent optical coherence microscopy (PSC-OCM) assisted by a deep neural network. The PSC source is synthesized to improve the spatial sensitivity of the reconstructed phase map from the interferometric images. Further, a compatible generative adversarial network (GAN) is used and trained with paired low-resolution (LR) and high-resolution (HR) datasets acquired from the PSC-OCM system. The training of the network is performed on two different types of samples, i.e., mostly homogeneous human red blood cells (RBC) and highly heterogeneous macrophages. The performance is evaluated by predicting the HR images from the datasets captured with a low-NA lens and comparing them with the actual HR phase images. An improvement of 9 times in space-bandwidth product is demonstrated for both the RBC and macrophage datasets. We believe that the PSC-OCM+GAN approach would be applicable in single-shot label-free tissue imaging, disease classification, and other high-resolution tomography applications by utilizing the longitudinal spatial coherence properties of the light source.

[79] 2007.02532

ModeNet: Mode Selection Network For Learned Video Coding

In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet's purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component which can be generalized to other systems to allow competition between different coding tools. The benefit of ModeNet is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.

[80] 2007.02534

Tensor Convolutional Sparse Coding with Low-Rank activations, an application to EEG analysis

Recently, there has been growing interest in the analysis of spectrograms of ElectroEncephaloGram (EEG), particularly to study the neural correlates of (un)-consciousness during General Anesthesia (GA). Indeed, it has been shown that order-three tensors (channels x frequencies x times) are a natural and useful representation of these signals. However, this encoding entails significant difficulties, especially for convolutional sparse coding (CSC), as existing methods do not take advantage of the particularities of tensor representation, such as rank structures, and are vulnerable to the high level of noise and perturbations that are inherent to EEG during medical acts. To address this issue, in this paper we introduce a new CSC model, named Kruskal CSC (K-CSC), that uses the Kruskal decomposition of the activation tensors to leverage the intrinsic low-rank nature of these representations in order to extract relevant and interpretable encodings. Our main contribution, TC-FISTA, uses multiple tools to efficiently solve the resulting optimization problem despite the increasing complexity induced by the tensor representation. We then evaluate TC-FISTA on both a synthetic dataset and real EEG recorded during GA. The results show that TC-FISTA is robust to noise and perturbations, resulting in accurate, sparse and interpretable encoding of the signals.

[81] 2007.02578

Learning Graph-Convolutional Representations for Point Cloud Denoising

Point clouds are an increasingly relevant data type, but they are often corrupted by noise. We propose a deep neural network based on graph-convolutional layers that can elegantly deal with the permutation-invariance problem encountered by learning-based point cloud processing methods. The network is fully convolutional and can build complex hierarchies of features by dynamically constructing neighborhood graphs from similarity among the high-dimensional feature representations of the points. When coupled with a loss promoting proximity to the ideal surface, the proposed approach significantly outperforms state-of-the-art methods on a variety of metrics. In particular, it improves both the Chamfer measure and the quality of the surface normals that can be estimated from the denoised data. We also show that it is especially robust both at high noise levels and in the presence of structured noise such as that encountered in real LiDAR scans.
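
For readers unfamiliar with the Chamfer measure named in this abstract, the sketch below computes one common variant between two point sets: the mean nearest-neighbor distance from each set to the other, summed over both directions. The exact definition used in the paper may differ (for example, squared distances or a different normalization), so the function name `chamfer_distance` and the unsquared Euclidean form are assumptions for illustration only.

```python
import math


def _mean_nearest(source, target):
    """Mean over `source` points of the distance to the closest `target` point."""
    total = 0.0
    for p in source:
        total += min(math.dist(p, q) for q in target)
    return total / len(source)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance (unsquared variant) between two point sets.

    Brute force, O(len(a) * len(b)); fine for a sketch, too slow for large scans.
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return _mean_nearest(points_a, points_b) + _mean_nearest(points_b, points_a)


clean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
noisy = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0)]
gap = chamfer_distance(noisy, clean)  # smaller values mean the sets are closer
```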

[82] 2007.02684

On the Influence of Ageing on Face Morph Attacks: Vulnerability and Detection

Face morphing attacks have raised critical concerns as they demonstrate a new vulnerability of Face Recognition Systems (FRS), which are widely deployed in border control applications. The face morphing process uses the images from multiple data subjects and performs an image blending operation to generate a morphed image of high quality. The generated morphed image exhibits visual characteristics similar to the biometric characteristics of the data subjects that contributed to the composite image, thus making it difficult for both humans and FRS to detect such attacks. In this paper, we report a systematic investigation of the vulnerability of Commercial-Off-The-Shelf (COTS) FRS when morphed images under the influence of ageing are presented. To this end, we have introduced a new morphed face dataset with ageing derived from the publicly available MORPH II face dataset, which we refer to as the MorphAge dataset. The dataset has two bins based on age intervals: the first bin, the MorphAge-I dataset, has 1002 unique data subjects with an age variation of 1 year to 2 years, while the MorphAge-II dataset consists of 516 data subjects whose age intervals are from 2 years to 5 years. To evaluate the vulnerability to morphing attacks, we also introduce a new evaluation metric, namely the Fully Mated Morphed Presentation Match Rate (FMMPMR), to quantify the vulnerability effectively in a realistic scenario. Extensive experiments are carried out using two different COTS FRS (COTS I - Cognitec and COTS II - Neurotechnology) to quantify the vulnerability with ageing. Further, we also evaluate five different Morph Attack Detection (MAD) techniques to benchmark their detection performance with ageing.

[83] 2007.02721

Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On

The 2D virtual try-on task has recently attracted great interest from the research community, for its direct potential applications in online shopping as well as for its inherent and unaddressed scientific challenges. This task requires fitting an in-shop cloth image on the image of a person, which is highly challenging because it involves cloth warping, image compositing, and synthesizing. Casting virtual try-on into a supervised task faces a difficulty: available datasets are composed of pairs of pictures (cloth, person wearing the cloth). Thus, we have no access to ground truth when the cloth on the person changes. State-of-the-art models solve this by masking the cloth information on the person with both a human parser and a pose estimator. Then, image synthesis modules are trained to reconstruct the person image from the masked person image and the cloth image. This procedure has several caveats: firstly, human parsers are prone to errors; secondly, it is a costly pre-processing step, which also has to be applied at inference time; finally, it makes the task harder than it is, since the mask covers information that should be kept, such as hands or accessories. In this paper, we propose a novel student-teacher paradigm where the teacher is trained in the standard way (reconstruction) before guiding the student to focus on the initial task (changing the cloth). The student additionally learns from an adversarial loss, which pushes it to follow the distribution of the real images. Consequently, the student exploits information that is masked to the teacher. A student trained without the adversarial loss would not use this information. Also, getting rid of both the human parser and the pose estimator at inference time allows obtaining a real-time virtual try-on.

[84] 2007.02780

Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances

In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation. We build upon a previously proposed method for learning representations of time-domain music signals with a re-parameterized denoising autoencoder, extending it by using the family of Sinkhorn distances with entropic regularization. We evaluate our method on the freely available MUSDB18 dataset of professionally produced music recordings, and our results show that Sinkhorn distances with a small strength of entropic regularization marginally improve the performance of informed singing voice separation. By increasing the strength of the entropic regularization, the learned representations of the mixture signal consist of almost perfectly additive and distinctly structured sources.
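
To make the role of the entropic regularization strength concrete, the sketch below computes a Sinkhorn distance between two discrete distributions using the standard alternating scaling iterations. It is a generic pure-Python illustration, not the training objective from the paper: the function name `sinkhorn`, the parameter names `eps` and `n_iters`, and the toy cost matrix are assumptions. Very small `eps` values can underflow `exp(-cost / eps)` to zero in this naive form.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    a, b   : non-negative weights that each sum to 1
    cost   : len(a) x len(b) matrix of ground costs
    eps    : strength of the entropic regularization
    Returns (transport cost under the regularized plan, the plan itself).
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


weights = [0.5, 0.5]
ground_cost = [[0.0, 1.0],
               [1.0, 0.0]]
sharp_cost, _ = sinkhorn(weights, weights, ground_cost, eps=0.1)
smooth_cost, smooth_plan = sinkhorn(weights, weights, ground_cost, eps=10.0)
```

With small `eps` the plan stays close to the unregularized optimum (almost all mass on the zero-cost diagonal), while large `eps` spreads mass more evenly across the plan, which raises the transport cost.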

[85] 2007.02811

Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method

Automated human action recognition is one of the most attractive and practical research fields in computer vision, in spite of its high computational costs. In such systems, the human action labelling is based on the appearance and patterns of the motions in the video sequences; however, the conventional methodologies and classic neural networks cannot use temporal information for action recognition prediction in the upcoming frames in a video sequence. On the other hand, the computational cost of the preprocessing stage is high. In this paper, we address the challenges of the preprocessing phase by an automated selection of representative frames among the input sequences. Furthermore, we extract the key features of the representative frame rather than the entire features. We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and a skeletal modelling method. The combination of a CNN and the LSTM recursive network is considered for feature selection and maintaining the previous information, and finally, a Softmax-KNN classifier is used for labelling human activities. We name our model the Feature Reduction & Deep Learning based action recognition method, or FR-DL for short. To evaluate the proposed method, we use the UCF dataset for benchmarking, which is widely used among researchers in action recognition research. The dataset includes 101 complicated activities in the wild. Experimental results show a significant improvement in terms of accuracy and speed in comparison with six state-of-the-art articles.

[86] 2007.02895

Coronary Heart Disease Diagnosis Based on Improved Ensemble Learning

Accurate diagnosis is required before performing proper treatments for coronary heart disease. Machine learning based approaches have been proposed by many researchers to improve the accuracy of coronary heart disease diagnosis. Ensemble learning and cascade generalization are among the methods which can be used to improve the generalization ability of learning algorithms. The objective of this study is to develop a heart disease diagnosis method based on ensemble learning and cascade generalization. A cascade generalization method with a loose coupling strategy is proposed in this study. The C4.5 and RIPPER algorithms were used as meta-level algorithms and Naive Bayes was used as the base-level algorithm. Bagging and Random Subspace were evaluated for constructing the ensemble. The hybrid cascade ensemble methods are compared with the learning algorithms in non-ensemble mode and non-cascade mode. The methods are also compared with Rotation Forest. Based on the evaluation result, the hybrid cascade ensemble method demonstrated the best result for the given heart disease diagnosis case. Accuracy and diversity evaluation was performed to analyze the impact of the cascade strategy. Based on the result, the accuracy of the classifiers in the ensemble is increased but the diversity is decreased.