New articles on cs


[1] 2008.02815

IEEE 802.11be: Wi-Fi 7 Strikes Back

As hordes of data-hungry devices challenge its current capabilities, Wi-Fi strikes back with 802.11be, alias Wi-Fi 7. This brand-new amendment promises a (r)evolution of unlicensed wireless connectivity as we know it. With its standardisation process being consolidated, we provide an updated digest of 802.11be essential features, vouching for multi-AP coordination as a must-have for critical and latency-sensitive applications. We then get down to the nitty-gritty of one of its most enticing implementations, coordinated beamforming, for which our standard-compliant simulations confirm near-tenfold reductions in worst-case delays.


[2] 2008.02834

Integration of 3D Knowledge for On-Board UAV Visual Tracking

Visual tracking from an unmanned aerial vehicle (UAV) poses challenges such as occlusions or background clutter. In order to achieve more robust on-board UAV visual tracking, a pipeline is introduced that combines information extracted from a visual tracker with a sparse 3D reconstruction of the static environment. The 3D reconstruction is based on an image-based structure-from-motion (SfM) component and thus makes it possible to utilize a state estimator in a pseudo-3D space. This yields improved handling of occlusions and background clutter. Evaluation is done on prototypical image sequences captured from a UAV with low-altitude oblique views. The experimental results demonstrate the benefit of the proposed approach compared to relying only on visual cues or using state estimation in the image space.


[3] 2008.02837

aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning

We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task, the consistency between nested spans, repetitions, and labels from similar spans in training. We achieved sizable improvements over baseline fine-tuned RoBERTa models, and the official evaluation ranked our system 3rd (almost tied with the 2nd) out of 36 teams on the span identification subtask with an F1 score of 0.491, and 2nd (almost tied with the 1st) out of 31 teams on the technique classification subtask with an F1 score of 0.62.


[4] 2008.02839

Learned convex regularizers for inverse problems

We consider the variational reconstruction framework for inverse problems and propose to learn a data-adaptive input-convex neural network (ICNN) as the regularization functional. The ICNN-based convex regularizer is trained adversarially to discern ground-truth images from unregularized reconstructions. Convexity of the regularizer is attractive since (i) one can establish analytical convergence guarantees for the corresponding variational reconstruction problem and (ii) devise efficient and provable algorithms for reconstruction. In particular, we show that the optimal solution to the variational problem converges to the ground-truth if the penalty parameter decays sub-linearly with respect to the norm of the noise. Further, we prove the existence of a subgradient-based algorithm that leads to monotonically decreasing error in the parameter space with iterations. To demonstrate the performance of our approach for solving inverse problems, we consider the tasks of deblurring natural images and reconstructing images in computed tomography (CT), and show that the proposed convex regularizer is at least competitive with and sometimes superior to state-of-the-art data-driven techniques for inverse problems.
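
To make the architectural constraint concrete, here is a minimal PyTorch sketch (not the authors' code) of an input-convex neural network: the output is convex in the input as long as the hidden-to-hidden weights stay non-negative and the activations are convex and non-decreasing. Layer sizes and the clamping scheme are illustrative assumptions.

```python
# A minimal sketch of an input-convex neural network (ICNN): convexity in x
# holds because softplus is convex and non-decreasing and the z-path weights
# are kept non-negative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, in_dim, hidden=64, n_layers=3):
        super().__init__()
        # direct x-path layers (no sign constraint needed)
        self.Wx = nn.ModuleList(
            [nn.Linear(in_dim, hidden)] +
            [nn.Linear(in_dim, hidden, bias=False) for _ in range(n_layers - 1)] +
            [nn.Linear(in_dim, 1, bias=False)])
        # z-path weights; kept non-negative to preserve convexity
        self.Wz = nn.ModuleList(
            [nn.Linear(hidden, hidden, bias=False) for _ in range(n_layers - 1)] +
            [nn.Linear(hidden, 1, bias=False)])

    def forward(self, x):
        z = F.softplus(self.Wx[0](x))
        for Wx, Wz in zip(self.Wx[1:-1], self.Wz[:-1]):
            z = F.softplus(Wx(x) + Wz(z))
        return self.Wx[-1](x) + self.Wz[-1](z)  # scalar regularizer value

    def clamp_weights(self):
        # call after each optimizer step to enforce non-negativity
        for Wz in self.Wz:
            Wz.weight.data.clamp_(min=0)
```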


[5] 2008.02840

Assisted Perception: Optimizing Observations to Communicate State

We aim to help users estimate the state of the world in tasks like robotic teleoperation and navigation with visual impairments, where users may have systematic biases that lead to suboptimal behavior: they might struggle to process observations from multiple sensors simultaneously, receive delayed observations, or overestimate distances to obstacles. While we cannot directly change the user's internal beliefs or their internal state estimation process, our insight is that we can still assist them by modifying the user's observations. Instead of showing the user their true observations, we synthesize new observations that lead to more accurate internal state estimates when processed by the user. We refer to this method as assistive state estimation (ASE): an automated assistant uses the true observations to infer the state of the world, then generates a modified observation for the user to consume (e.g., through an augmented reality interface), and optimizes the modification to induce the user's new beliefs to match the assistant's current beliefs. We evaluate ASE in a user study with 12 participants who each perform four tasks: two tasks with known user biases -- bandwidth-limited image classification and a driving video game with observation delay -- and two with unknown biases that our method has to learn -- guided 2D navigation and a lunar lander teleoperation video game. A different assistance strategy emerges in each domain, such as quickly revealing informative pixels to speed up image classification, using a dynamics model to undo observation delay in driving, identifying nearby landmarks for navigation, and exaggerating a visual indicator of tilt in the lander game. The results show that ASE substantially improves the task performance of users with bandwidth constraints, observation delay, and other unknown biases.


[6] 2008.02846

Motion Planning and Control for On-Orbit Assembly using LQR-RRT* and Nonlinear MPC

Deploying large, complex space structures is of great interest to the modern scientific world, as it can provide new capabilities in obtaining scientific, communicative, and observational information. However, many theoretical mission designs contain complexities that must be constrained by the requirements of the launch vehicle, such as volume and mass. To mitigate such constraints, on-orbit additive manufacturing and robotic assembly allow for the flexibility of building large complex structures including telescopes, space stations, and communication satellites. The contribution of this work is to develop motion planning and control algorithms using the linear quadratic regulator and rapidly-exploring random trees (LQR-RRT*), path smoothing, and trajectory tracking with a closed-loop nonlinear receding horizon control optimizer for a robotic Astrobee free-flyer. By obtaining controlled trajectories that consider obstacle avoidance and the dynamics of the vehicle and manipulator, the free-flyer can rapidly plan the construction of space structures. The approach generalizes naturally to repairing, refueling, and re-provisioning space structure components while providing optimal collision-free trajectories during operation.
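
As a rough illustration of the LQR-RRT* ingredient, the sketch below (assumptions: a double-integrator stand-in for the free-flyer's translational dynamics; not the paper's implementation) uses the LQR cost-to-go as the tree's distance metric and the LQR gain for steering.

```python
# A minimal sketch of the LQR-RRT* core: linearize the dynamics, solve the
# Riccati equation once, and use the cost-to-go (x - x_goal)^T P (x - x_goal)
# instead of Euclidean distance when extending the tree.
import numpy as np
from scipy.linalg import solve_continuous_are

# double-integrator translation model (illustrative only)
A = np.block([[np.zeros((3, 3)), np.eye(3)], [np.zeros((3, 6))]])
B = np.vstack([np.zeros((3, 3)), np.eye(3)])
Q, R = np.eye(6), 0.1 * np.eye(3)

P = solve_continuous_are(A, B, Q, R)   # Riccati solution
K = np.linalg.solve(R, B.T @ P)        # optimal feedback gain

def lqr_distance(x, x_goal):
    """Cost-to-go used as the tree's distance metric."""
    e = x - x_goal
    return float(e @ P @ e)

def lqr_steer(x, x_goal):
    """Locally optimal control action for extending toward x_goal."""
    return -K @ (x - x_goal)
```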


[7] 2008.02848

Grid-aware Distributed Model Predictive Control of Heterogeneous Resources in a Distribution Network: Theory and Experimental Validation

In this paper, we propose and experimentally validate a scheduling and control framework for distributed energy resources (DERs) that tracks a day-ahead dispatch plan of a distribution network hosting controllable and stochastic heterogeneous resources while respecting the local grid constraints on nodal voltages and line ampacities. The framework consists of two algorithmic layers. In the first one (day-ahead scheduling), we determine an aggregated dispatch plan. In the second layer (real-time control), a distributed model predictive control (MPC) determines the active and reactive power set-points of the DERs so that their aggregated contribution tracks the dispatch plan while obeying the DERs' operational constraints as well as the grid's. The proposed framework is experimentally validated on a real-scale microgrid that reproduces the network specifications of the CIGRE microgrid benchmark system.
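
A minimal sketch of the real-time layer's idea, using cvxpy: an MPC that chooses DER set-points so their aggregate tracks the dispatch plan, with toy storage dynamics standing in for the actual resource models; voltage and ampacity constraints are omitted for brevity.

```python
# A minimal MPC sketch (illustrative, not the experimental code): steer the
# aggregated DER power to track the dispatch plan subject to simple
# battery-like resource constraints.
import cvxpy as cp
import numpy as np

T, n = 12, 3                        # horizon steps, number of DERs
plan = np.ones(T) * 50.0            # dispatch plan to track [kW]
p = cp.Variable((n, T))             # DER active-power set-points
soc = cp.Variable((n, T + 1))       # state of charge of each resource

constraints = [soc[:, 0] == 0.5,
               soc[:, 1:] == soc[:, :-1] - 0.01 * p,   # toy storage dynamics
               soc >= 0.1, soc <= 0.9,
               p >= -30, p <= 30]
objective = cp.Minimize(cp.sum_squares(cp.sum(p, axis=0) - plan))
cp.Problem(objective, constraints).solve()
print("tracking error:", objective.value)
```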


[8] 2008.02849

A Multiperiod Workforce Scheduling and Routing Problem with Dependent Tasks

In this paper, we study a new Workforce Scheduling and Routing Problem, denoted Multiperiod Workforce Scheduling and Routing Problem with Dependent Tasks. In this problem, customers request services from a company. Each service is composed of dependent tasks, which are executed by teams of varying skills along one or more days. Tasks belonging to a service may be executed by different teams, and customers may be visited more than once a day, as long as precedences are not violated. The objective is to schedule and route teams so that the makespan is minimized, i.e., all services are completed in the minimum number of days. In order to solve this problem, we propose a Mixed-Integer Programming model, a constructive algorithm and heuristic algorithms based on the Ant Colony Optimization (ACO) metaheuristic. The presence of precedence constraints makes it difficult to develop efficient local search algorithms. This motivates the choice of the ACO metaheuristic, which is effective in guiding the construction process towards good solutions. Computational results show that the model is capable of consistently solving problems with up to about 20 customers and 60 tasks. In most cases, the best performing ACO algorithm was able to match the best solution provided by the model in a fraction of its computational time.
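
A minimal sketch of how an ant might construct a precedence-feasible schedule (illustrative assumptions: `preds` maps each task to its set of prerequisite tasks, and the heuristic array is given; this is not the paper's exact algorithm):

```python
# A minimal ACO construction-step sketch: the next task is drawn only from
# precedence-feasible candidates, with probability proportional to
# pheromone^alpha * heuristic^beta.
import numpy as np

def choose_task(feasible, pheromone, heuristic, alpha=1.0, beta=2.0,
                rng=np.random.default_rng()):
    """Pick the next task among precedence-feasible candidates."""
    weights = (pheromone[feasible] ** alpha) * (heuristic[feasible] ** beta)
    return rng.choice(feasible, p=weights / weights.sum())

def construct_schedule(tasks, preds, pheromone, heuristic):
    """Build one ant's task ordering, never violating precedences."""
    done, order = set(), []
    while len(order) < len(tasks):
        feasible = [t for t in tasks if t not in done and preds[t] <= done]
        t = choose_task(np.array(feasible), pheromone, heuristic)
        done.add(int(t)); order.append(int(t))
    return order
```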


[9] 2008.02851

A no-phone/no-app contact tracing hardware token

We report the development of an open-source, hardware-based contact tracer, made from readily available parts, costing less than US$20. This work was motivated by the need for a technology-assisted contact tracer that avoids the privacy issues found with those involving a mobile phone. Contact tracing is done here without the use of a mobile phone or an app at all. Instead, contact tracing is implemented using Bluetooth Low Energy on an ESP32 micro-controller. The ESP32 is used both to broadcast health information to, and receive it from, others in close proximity, forming a strictly peer-to-peer contact tracer. The contact tracer can be assembled by an individual and configured for use within minutes.
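
For flavor, here is a minimal MicroPython sketch of the peer-to-peer idea on ESP32-class hardware: advertise a payload over BLE while scanning for nearby advertisers. The payload layout and logging are illustrative assumptions, not the project's firmware.

```python
# A minimal MicroPython BLE sketch: simultaneously advertise a health payload
# and scan for other devices doing the same (strictly peer-to-peer).
import bluetooth

_IRQ_SCAN_RESULT = 5                 # constant from the MicroPython BLE docs

ble = bluetooth.BLE()
ble.active(True)

def on_event(event, data):
    if event == _IRQ_SCAN_RESULT:
        addr_type, addr, adv_type, rssi, adv_data = data
        # a real tracer would log (hashed) payloads of close contacts here
        print("seen:", bytes(addr).hex(), rssi, bytes(adv_data).hex())

ble.irq(on_event)
payload = b"\x02\x01\x06" + b"\x05\xffHLTH"   # flags + vendor health payload
ble.gap_advertise(100_000, adv_data=payload)   # advertise every 100 ms
ble.gap_scan(0, 30_000, 30_000)                # scan indefinitely
```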


[10] 2008.02857

Bisimulation and bisimilarity for fuzzy description logics under the Gödel semantics

Description logics (DLs) are a suitable formalism for representing knowledge about domains in which objects are described not only by attributes but also by binary relations between objects. Fuzzy extensions of DLs can be used for such domains when data and knowledge about them are vague and imprecise. One of the possible ways to specify classes of objects in such domains is to use concepts in fuzzy DLs. As DLs are variants of modal logics, indiscernibility in DLs is characterized by bisimilarity. The bisimilarity relation of an interpretation is the largest auto-bisimulation of that interpretation. In DLs and their fuzzy extensions, such equivalence relations can be used for concept learning. In this paper, we define and study fuzzy bisimulation and bisimilarity for fuzzy DLs under the G\"odel semantics, as well as crisp bisimulation and strong bisimilarity for such logics extended with involutive negation. The considered logics are fuzzy extensions of the DL $\mathcal{ALC}_{reg}$ (a variant of PDL) with additional features among inverse roles, nominals, (qualified or unqualified) number restrictions, the universal role, local reflexivity of a role and involutive negation. We formulate and prove results on invariance of concepts under fuzzy (resp. crisp) bisimulation, conditional invariance of fuzzy TBoxes/ABoxes under bisimilarity (resp. strong bisimilarity), and the Hennessy-Milner property of fuzzy (resp. crisp) bisimulation for fuzzy DLs without (resp. with) involutive negation under the G\"odel semantics. Apart from these fundamental results, we also provide results on using fuzzy bisimulation to separate the expressive powers of fuzzy DLs, as well as results on using strong bisimilarity to minimize fuzzy interpretations.
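
For orientation, a schematic of what such conditions typically look like (a generic shape, not the paper's exact statement): under the G\"odel semantics the t-norm is $\otimes = \min$, and a fuzzy bisimulation $Z$ between interpretations $I$ and $I'$ is constrained by atomic and back/forth conditions of roughly the following form.

```latex
% Schematic (illustrative) conditions on a fuzzy bisimulation
% Z : \Delta^I \times \Delta^{I'} \to [0,1], with \otimes = \min and
% \Leftrightarrow the G\"odel bi-residuum:
\begin{align*}
  &Z(x,x') \le \bigl(A^I(x) \Leftrightarrow A^{I'}(x')\bigr)
     && \text{for concept names } A,\\
  &Z(x,x') \otimes r^I(x,y) \le \sup_{y'} \bigl(Z(y,y') \otimes r^{I'}(x',y')\bigr)
     && \text{(forth, for role names } r\text{)},\\
  &Z(x,x') \otimes r^{I'}(x',y') \le \sup_{y} \bigl(Z(y,y') \otimes r^{I}(x,y)\bigr)
     && \text{(back).}
\end{align*}
```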


[11] 2008.02858

Semantic Complexity in End-to-End Spoken Language Understanding

End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied. We introduce empirical measures of dataset semantic complexity to quantify the difficulty of the SLU tasks. We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets that have low semantic complexity values. We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease. Our results show that it is important to contextualize an STI model's performance with the complexity values of its training dataset to reveal the scope of its applicability.


[12] 2008.02862

Data-driven reduced-order models via regularized operator inference for a single-injector combustion process

This paper derives predictive reduced-order models for rocket engine combustion dynamics via a scientific machine learning approach, based on Operator Inference, that blends data-driven learning with physics-based modeling. The non-intrusive nature of the approach enables variable transformations that expose system structure. The specific contribution of this paper is to advance the formulation robustness and algorithmic scalability of the approach. Regularization is introduced to the formulation to avoid over-fitting. The task of determining an optimal regularization is posed as an optimization problem that balances training error and stability of long-time integration dynamics. A scalable algorithm and open-source implementation are presented, then demonstrated for a single-injector rocket combustion example. This example exhibits rich dynamics that are difficult to capture with state-of-the-art reduced models. With appropriate regularization and an informed selection of learning variables, the reduced-order models exhibit high accuracy in re-predicting the training regime and acceptable accuracy in predicting future dynamics, while achieving close to a million times speedup in computational cost. When compared to a state-of-the-art model reduction method, the Operator Inference models provide the same or better accuracy at approximately one thousandth of the computational cost.
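
The regularized learning step reduces to a Tikhonov-regularized least-squares problem; a minimal numpy sketch (matrix names illustrative; the paper additionally optimizes the regularization weight for stable long-time integration):

```python
# A minimal sketch of regularized Operator Inference: fit reduced operators by
# min ||D O^T - R||^2 + lam ||O||^2, solved via the normal equations.
import numpy as np

def operator_inference(D, R, lam):
    """D: (k, d) data matrix of reduced states/inputs; R: (k, r) time
    derivatives. Returns O with R ≈ D @ O.T."""
    d = D.shape[1]
    O_T = np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ R)
    return O_T.T

# toy usage: recover A_hat for x' = A x from snapshot data
rng = np.random.default_rng(0)
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
X = rng.standard_normal((200, 2))
Xdot = X @ A.T
print(operator_inference(X, Xdot, lam=1e-8))   # ≈ A
```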


[13] 2008.02865

Numerical Methods for a Diffusive Class of Nonlocal Operators

In this paper we develop a numerical scheme based on quadratures to approximate solutions of integro-differential equations involving convolution kernels, $\nu$, of diffusive type. In particular, we assume $\nu$ is symmetric and exponentially decaying at infinity. We consider problems posed in bounded domains and in $\mathbb{R}$. In the case of bounded domains with nonlocal Dirichlet boundary conditions, we show the convergence of the scheme for kernels that have positive tails, but that can take on negative values. When the equations are posed on all of $\mathbb{R}$, we show that our scheme converges for nonnegative kernels. Since nonlocal Neumann boundary conditions lead to an equivalent formulation as in the unbounded case, we show that these last results also apply to the Neumann problem.
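
A minimal sketch of the quadrature idea in 1D (illustrative only): approximate $(\mathcal{L}u)(x_i) = \int \nu(x_i - y)\,(u(y) - u(x_i))\,dy$ with a trapezoidal sum and an exponentially decaying symmetric kernel, as assumed above.

```python
# A minimal quadrature sketch for a nonlocal diffusion operator on a uniform
# grid: the integral is replaced by a trapezoidal sum over grid nodes.
import numpy as np

def nonlocal_operator(u, x, nu):
    h = x[1] - x[0]
    w = np.full_like(x, h); w[0] = w[-1] = h / 2      # trapezoidal weights
    Lu = np.empty_like(u)
    for i, xi in enumerate(x):
        Lu[i] = np.sum(w * nu(xi - x) * (u - u[i]))
    return Lu

x = np.linspace(-1.0, 1.0, 401)
nu = lambda z: 0.5 * np.exp(-np.abs(z))               # symmetric, exp. decaying
u = x**2
Lu = nonlocal_operator(u, x, nu)
```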


[14] 2008.02866

Improving Explainability of Image Classification in Scenarios with Class Overlap: Application to COVID-19 and Pneumonia

Trust in predictions made by machine learning models is increased if the model generalizes well on previously unseen samples and when inference is accompanied by cogent explanations of the reasoning behind predictions. In the image classification domain, generalization can also be assessed through accuracy, sensitivity, and specificity, and one measure of explainability is how well the model localizes the object of interest within an image. However, in multi-class settings, both generalization and explanation through localization are degraded when the available training data contain features with significant overlap between classes. We propose a method to enhance the explainability of image classification through better localization by mitigating the model uncertainty induced by class overlap. Our technique performs discriminative localization on images that contain features with significant class overlap, without explicitly training for localization. Our method is particularly promising in real-world class overlap scenarios, such as COVID-19 vs. pneumonia, where expertly labeled data for localization is not available. This can be useful for early, rapid, and trustworthy screening for COVID-19.


[15] 2008.02867

Efficient multiscale algorithms for simulating nonlocal optical response of metallic nanostructure arrays

In this paper, we consider numerical simulations of the nonlocal optical response of metallic nanostructure arrays inside a dielectric host, which is of particular interest to the nanoplasmonics community due to many unusual properties and potential applications. Mathematically, it is described by Maxwell's equations with discontinuous coefficients coupled with a set of Helmholtz-type equations defined only on the domains of the metallic nanostructures. To solve this challenging problem, we develop an efficient multiscale method consisting of three steps. First, we extend the system into the domain occupied by the dielectric medium in a novel way, resulting in a coupled system with rapidly oscillating coefficients. A rigorous analysis of the error between the solutions of the original system and the extended system is given. Second, we derive the homogenized system and define the multiscale approximate solution for the extended system by using the multiscale asymptotic method. Third, to fix the inaccuracy of the multiscale asymptotic method inside the metallic nanostructures, we solve the original system in each metallic nanostructure separately with boundary conditions given by the multiscale approximate solution. A fast algorithm based on the $LU$ decomposition is proposed for solving the resulting linear systems. By applying the multiscale method, we obtain results that are in good agreement with those obtained by solving the original system directly, at a much lower computational cost. Numerical examples are provided to validate the efficiency and accuracy of the proposed method.


[16] 2008.02868

Performance of Underwater Wireless Optical Communications in the Presence of Cascaded Mixture Exponential-Generalized Gamma Turbulence

Underwater wireless optical communication (UWOC) is one of the critical technologies for buoy-based high-speed cross-sea surface communication, where the communication nodes are vertically deployed. Due to the vertically inhomogeneous nature of the underwater environment, seawater is usually vertically divided into multiple layers with different parameters that reflect the real environment. In this work, we consider a generalized UWOC channel model that contains $N$ layers. To capture the effects of air bubbles and temperature gradients on channel statistics, we model each layer by a mixture Exponential-Generalized Gamma (EGG) distribution. We derive the PDF and CDF of the end-to-end SNR in exact closed form. Then, unified BER and outage expressions using OOK and BPSK are also derived. The performance and behavior of common vertical underwater optical communication scenarios are thoroughly analyzed through the appropriate selection of parameters. All the derived expressions are verified via Monte Carlo simulations.
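
A minimal sketch of the kind of Monte Carlo check mentioned at the end: sample each layer from a mixture Exponential-Generalized Gamma law and multiply across layers for the cascaded channel. Parameter names follow the usual EGG convention $(\omega, \lambda; a, b, c)$ and the values are illustrative, not the paper's.

```python
# A minimal Monte Carlo sketch for the cascaded EGG channel: each layer's gain
# is Exponential(lambda) with probability omega, else generalized Gamma,
# sampled via X = b * G**(1/c) with G ~ Gamma(a, 1).
import numpy as np

rng = np.random.default_rng(0)

def sample_egg(omega, lam, a, b, c, size):
    exp_part = rng.exponential(lam, size)
    gg_part = b * rng.gamma(a, 1.0, size) ** (1.0 / c)
    return np.where(rng.random(size) < omega, exp_part, gg_part)

layers = [dict(omega=0.21, lam=0.33, a=1.4, b=1.1, c=17.0),   # example values
          dict(omega=0.05, lam=0.30, a=1.2, b=1.0, c=20.0)]
h = np.prod([sample_egg(size=10**6, **p) for p in layers], axis=0)
print("mean end-to-end gain:", h.mean())
```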


[17] 2008.02870

NewsTweet: A Dataset of Social Media Embedding in Online Journalism

The inclusion of social media posts---tweets, in particular---in digital news stories, both as commentary and increasingly as news sources, has become commonplace in recent years. In order to study this phenomenon with sufficient depth, robust large-scale data collection from both news publishers and social media platforms is necessary. This work describes the construction of such a data pipeline. In the data collected from Google News, 13% of all stories were found to include embedded tweets, with sports and entertainment news containing the largest volumes of them. Public figures and celebrities are found to dominate these stories; however, relatively unknown users have also been found to achieve newsworthiness. The collected data set, NewsTweet, and the associated pipeline for acquisition stand to engender a wave of new inquiries into social content embedding from multiple research communities.


[18] 2008.02871

Fatigue Assessment using ECG and Actigraphy Sensors

Fatigue is one of the key factors in the loss of work efficiency and health-related quality of life, and most fatigue assessment methods are based on self-reporting, which may suffer from many factors such as recall bias. To address this issue, we developed an automated system using wearable sensing and machine learning techniques for objective fatigue assessment. ECG/actigraphy data were collected from subjects in free-living environments. Preprocessing and feature engineering methods were applied before an interpretable solution and a deep learning solution were introduced. Specifically, for the interpretable solution, we proposed a feature selection approach which can select less correlated and highly informative features for a better understanding of the system's decision-making process. For the deep learning solution, we used a state-of-the-art self-attention model, based on which we further proposed a consistency self-attention (CSA) mechanism for fatigue assessment. Extensive experiments were conducted, and very promising results were achieved.


[19] 2008.02872

Quality Classification of Defective Parts from Injection Moulding

This report examines machine learning algorithms for detecting short forming and weaving in plastic parts produced by injection moulding. Transfer learning was implemented by using pretrained models and fine-tuning them on our dataset of 494 images of 150x150 pixels. The models tested were Xception, InceptionV3 and ResNet-50. Xception showed the highest overall accuracy (86.66%), followed by InceptionV3 (82.47%) and ResNet-50 (80.41%). Short forming was the easiest fault to identify, with the highest F1 score for each model.
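
A minimal Keras sketch of this transfer-learning setup (not the report's exact code; the three-class head is an assumption):

```python
# A minimal transfer-learning sketch: a frozen pretrained Xception base with a
# small classification head fine-tuned on the 150x150 defect images.
import tensorflow as tf

base = tf.keras.applications.Xception(weights="imagenet", include_top=False,
                                      input_shape=(150, 150, 3))
base.trainable = False                      # freeze pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g. good/short/weave
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```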


[20] 2008.02874

Investigating Coordinated 'Social' Targeting of High-Profile Twitter Accounts

Following the 2016 US presidential election, there has been an increased focus on politically-motivated manipulation of mass-user behavior on social media platforms. Since a large volume of political discussion occurs on these platforms, identifying malicious activity and coordinated campaigns is essential to ensuring a robust democratic environment. Twitter has become a critical communication channel for politicians and other public figures, enabling them to maintain a direct relationship with supporters. However, the platform has been fertile ground for large-scale malicious activity. As the 2020 U.S. presidential election approaches, we have developed tools to monitor the follower dynamics of some of the most prominent Twitter users, including U.S. presidential candidates. We investigate numerous strange phenomena, such as dramatic spike and saw-tooth waveforms in follower-count charts; cohorts of user accounts which 'circulate', i.e., re-follow high-profile accounts numerous times; and other 'resurrected' accounts, which have recently re-engaged on Twitter after years of inactivity. Through various analyses in these contexts, we reveal multiple coordinated 'social' targeting campaigns aimed at affecting the outcomes of socially critical events through the use of networks of social automations (bots), often optimizing their social capital through 'compromised' accounts, which have--unbeknownst to the greater world--been hijacked.


[21] 2008.02878

A Multilingual Neural Machine Translation Model for Biomedical Data

We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain. The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English. It is trained with large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near state-of-the-art both on news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will help the large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, economy, and healthcare policies. We also release a test set of biomedical text for Korean-English. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.


[22] 2008.02879

Efficient Neural Query Auto Completion

Query Auto Completion (QAC), as the starting point of information retrieval tasks, is critical to user experience. Generally it has two steps: generating completed query candidates according to query prefixes, and ranking them based on extracted features. Three major challenges are observed for a query auto completion system: (1) QAC has a strict online latency requirement. For each keystroke, results must be returned within tens of milliseconds, which poses a significant challenge in designing sophisticated language models for it. (2) For unseen queries, generated candidates are of poor quality as contextual information is not fully utilized. (3) Traditional QAC systems heavily rely on handcrafted features such as the query candidate frequency in search logs, lacking sufficient semantic understanding of the candidate. In this paper, we propose an efficient neural QAC system with effective context modeling to overcome these challenges. On the candidate generation side, this system uses as much information as possible in unseen prefixes to generate relevant candidates, increasing the recall by a large margin. On the candidate ranking side, an unnormalized language model is proposed, which effectively captures deep semantics of queries. This approach presents better ranking performance over state-of-the-art neural ranking methods and reduces latency by $\sim$95\% compared to neural language modeling methods. The empirical results on public datasets show that our model achieves a good balance between accuracy and efficiency. This system is served in LinkedIn job search with significant product impact observed.


[23] 2008.02880

Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning

Zero-shot learning (ZSL) makes object recognition in images possible in the absence of visual training data for a part of the classes in a dataset. When the number of classes is large, classes are usually represented by semantic class prototypes learned automatically from unannotated text collections. This typically leads to much lower performance than with manually designed semantic prototypes such as attributes. While most ZSL works focus on the visual aspect and reuse standard semantic prototypes learned from generic text collections, we focus on the problem of semantic class prototype design for large-scale ZSL. More specifically, we investigate the use of noisy textual metadata associated with photos as text collections, as we hypothesize they are likely to provide more plausible semantic embeddings for visual classes if exploited appropriately. We thus make use of a source-based voting strategy to improve the robustness of semantic prototypes. Evaluation on the large-scale ImageNet dataset shows a significant improvement in ZSL performance over two strong baselines, and over the usual semantic embeddings used in previous works. We show that this improvement is obtained for several embedding methods, leading to state-of-the-art results when one uses automatically created visual and text features.


[24] 2008.02881

Parts-Based Articulated Object Localization in Clutter Using Belief Propagation

Robots working in human environments must be able to perceive and act on challenging objects with articulations, such as a pile of tools. Articulated objects increase the dimensionality of the pose estimation problem, and partial observations under clutter create additional challenges. To address this problem, we present a generative-discriminative parts-based recognition and localization method for articulated objects in clutter. We formulate the problem of articulated object pose estimation as a Markov Random Field (MRF). Hidden nodes in this MRF express the pose of the object parts, and edges express the articulation constraints between parts. Localization is performed within the MRF using an efficient belief propagation method. The method is informed by both part segmentation heatmaps over the observation, generated by a neural network, and the articulation constraints between object parts. Our generative-discriminative approach allows the proposed method to function in cluttered environments by inferring the pose of occluded parts using hypotheses from the visible parts. We demonstrate the efficacy of our methods in a tabletop environment for recognizing and localizing hand tools in uncluttered and cluttered configurations.
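
A minimal sketch of the inference step for a chain of parts (illustrative: real articulated objects may have tree-structured MRFs, and the paper uses an efficient belief propagation variant): max-product message passing over discretized part poses, with unaries from the network's heatmaps and pairwise terms from articulation constraints.

```python
# A minimal max-product (Viterbi) sketch on a chain-structured MRF in the
# log domain: unary[i] scores part i's K candidate poses; pairwise[i] scores
# articulation compatibility between parts i and i+1.
import numpy as np

def chain_max_product(unary, pairwise):
    """unary: (n_parts, K); pairwise: (n_parts-1, K, K).
    Returns the jointly most likely pose index per part."""
    n, K = unary.shape
    msg = np.zeros((n, K))               # forward messages
    back = np.zeros((n, K), dtype=int)   # argmax bookkeeping
    for i in range(1, n):
        scores = (unary[i - 1] + msg[i - 1])[:, None] + pairwise[i - 1]
        msg[i] = scores.max(axis=0)
        back[i] = scores.argmax(axis=0)
    poses = np.empty(n, dtype=int)
    poses[-1] = int(np.argmax(unary[-1] + msg[-1]))
    for i in range(n - 2, -1, -1):       # backtrack
        poses[i] = back[i + 1][poses[i + 1]]
    return poses
```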


[25] 2008.02883

Stronger and Faster Wasserstein Adversarial Attacks

Deep models, while being extremely flexible and accurate, are surprisingly vulnerable to "small, imperceptible" perturbations known as adversarial attacks. While the majority of existing attacks focus on measuring perturbations under the $\ell_p$ metric, Wasserstein distance, which takes geometry in pixel space into account, has long been known to be a suitable metric for measuring image quality and has recently risen as a compelling alternative to the $\ell_p$ metric in adversarial attacks. However, constructing an effective attack under the Wasserstein metric is computationally much more challenging and calls for better optimization algorithms. We address this gap in two ways: (a) we develop an exact yet efficient projection operator to enable a stronger projected gradient attack; (b) we show that the Frank-Wolfe method equipped with a suitable linear minimization oracle works extremely fast under Wasserstein constraints. Our algorithms not only converge faster but also generate much stronger attacks. For instance, we decrease the accuracy of a residual network on CIFAR-10 to $3.4\%$ within a Wasserstein perturbation ball of radius $0.005$, in contrast to $65.6\%$ using the previous Wasserstein attack based on an \emph{approximate} projection operator. Furthermore, employing our stronger attacks in adversarial training significantly improves the robustness of adversarially trained models.


[26] 2008.02888

Evaluating computational models of infant phonetic learning across languages

In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development.


[27] 2008.02890

Diagnosis of Autism in Children using Facial Analysis and Deep Learning

In this paper, we introduce a deep learning model that classifies children as either healthy or potentially autistic with 94.6% accuracy. Autistic patients struggle with social skills, repetitive behaviors, and communication, both verbal and nonverbal. Although the condition is considered to be genetic, the highest rates of accurate diagnosis occur when the child is tested on behavioral characteristics and facial features. Patients have a common pattern of distinct facial deformities, allowing researchers to analyze only an image of the child to determine whether the child has the condition. While there are other techniques and models used for facial analysis and autism classification on their own, our proposal bridges these two ideas, allowing classification in a cheaper, more efficient method. Our deep learning model uses MobileNet and two dense layers in order to perform feature extraction and image classification. The model is trained and tested using 3,014 images, evenly split between children with autism and children without it. 90% of the data is used for training, and 10% is used for testing. Based on our accuracy, we propose that the diagnosis of autism can be done effectively using only a picture. Additionally, there may be other conditions that are similarly diagnosable.


[28] 2008.02891

Mesh sampling and weighting for the hyperreduction of nonlinear Petrov-Galerkin reduced-order models with local reduced-order bases

The energy-conserving sampling and weighting (ECSW) method is a hyperreduction method originally developed for accelerating the performance of Galerkin projection-based reduced-order models (PROMs) associated with large-scale finite element models, when the underlying projected operators need to be frequently recomputed as in parametric and/or nonlinear problems. In this paper, this hyperreduction method is extended to Petrov-Galerkin PROMs where the underlying high-dimensional models can be associated with arbitrary finite element, finite volume, and finite difference semi-discretization methods. Its scope is also extended to cover local PROMs based on piecewise-affine approximation subspaces, such as those designed for mitigating the Kolmogorov $n$-width barrier issue associated with convection-dominated flow problems. The resulting ECSW method is shown in this paper to be robust and accurate. In particular, its offline phase is shown to be fast and parallelizable, and the potential of its online phase for large-scale applications of industrial relevance is demonstrated for turbulent flow problems with $O(10^7)$ and $O(10^8)$ degrees of freedom. For such problems, the online part of the ECSW method proposed in this paper for Petrov-Galerkin PROMs is shown to enable wall-clock time and CPU time speedup factors of several orders of magnitude while delivering exceptional accuracy.
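
The core of ECSW training can be sketched as a sparse non-negative least-squares fit: select few mesh elements with weights $\xi \ge 0$ so that their weighted contributions reproduce the full assembly over training snapshots. A minimal illustration (random data; production codes use an NNLS solver with an early-termination tolerance):

```python
# A minimal ECSW-style sketch: non-negative least squares picks a sparse set
# of mesh elements whose weighted contributions G @ xi match the full
# assembly b over the training snapshots.
import numpy as np
from scipy.optimize import nnls

n_elems, n_rows = 500, 60
rng = np.random.default_rng(1)
G = rng.standard_normal((n_rows, n_elems))   # per-element training contributions
b = G @ np.ones(n_elems)                     # full-mesh assembly

xi, residual = nnls(G, b)
sampled = np.flatnonzero(xi > 1e-10)         # the reduced mesh
print(f"kept {sampled.size}/{n_elems} elements, residual {residual:.2e}")
```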


[29] 2008.02897

Iterative Compression of End-to-End ASR Model using AutoML

Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that an AutoML-based Low Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selection approaches. However, we show that current AutoML-based search techniques only work up to a certain compression level, beyond which they fail to produce compressed models with acceptable word error rates (WER). In this work, we propose an iterative AutoML-based LRF approach that achieves over 5x compression without degrading the WER, thereby advancing the state-of-the-art in ASR compression.
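
The LRF building block is a truncated SVD of each weight matrix; a minimal numpy sketch (the AutoML search chooses the rank per layer, and the iterative scheme re-trains between compression rounds):

```python
# A minimal low-rank factorization sketch: truncate the SVD of a weight matrix
# and replace one dense layer by two thinner ones.
import numpy as np

def low_rank_factorize(W, r):
    """W: (m, n) weight matrix -> (W1, W2) with W ≈ W1 @ W2, rank r."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = U[:, :r] * s[:r]          # (m, r)
    W2 = Vt[:r]                    # (r, n)
    return W1, W2

W = np.random.default_rng(0).standard_normal((512, 512))
W1, W2 = low_rank_factorize(W, r=64)
# parameter count drops from 512*512 to 512*64 + 64*512 (4x fewer here)
print(np.linalg.norm(W - W1 @ W2) / np.linalg.norm(W))
```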


[30] 2008.02912

Predicting Visual Importance Across Graphic Design Types

This paper introduces a Unified Model of Saliency and Importance (UMSI), which learns to predict visual importance in input graphic designs, and saliency in natural images, along with a new dataset and applications. Previous methods for predicting saliency or visual importance are trained individually on specialized datasets, making them limited in application and leading to poor generalization on novel image classes, while requiring a user to know which model to apply to which input. UMSI is a deep learning-based model simultaneously trained on images from different design classes, including posters, infographics, mobile UIs, as well as natural images, and includes an automatic classification module to classify the input. This allows the model to work more effectively without requiring a user to label the input. We also introduce Imp1k, a new dataset of designs annotated with importance information. We demonstrate two new design interfaces that use importance prediction, including a tool for adjusting the relative importance of design elements, and a tool for reflowing designs to new aspect ratios while preserving visual importance. The model, code, and importance dataset are available at https://predimportance.mit.edu .


[31] 2008.02916

An Indexing Scheme and Descriptor for 3D Object Retrieval Based on Local Shape Querying

A binary descriptor indexing scheme based on Hamming distance, called the Hamming tree, is presented for local shape queries. A new clutter-resistant binary descriptor named Quick Intersection Count Change Image (QUICCI) is also introduced. This local shape descriptor is extremely small and fast to compare. Additionally, a novel distance function called Weighted Hamming, applicable to QUICCI images, is proposed for retrieval applications. The effectiveness of the indexing scheme and QUICCI is demonstrated on 828 million QUICCI images derived from the SHREC2017 dataset, while the clutter resistance of QUICCI is shown using the clutterbox experiment.
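
A minimal sketch of distance computation over such binary descriptors: plain Hamming distance on packed bits, plus one plausible weighted variant that penalizes losing a set bit more than gaining a clutter bit (the paper's exact Weighted Hamming definition may differ):

```python
# A minimal sketch of (weighted) Hamming distances over packed binary
# descriptors stored as uint8 arrays.
import numpy as np

def hamming(a, b):
    """Plain Hamming distance between packed binary descriptors."""
    return int(np.unpackbits(a ^ b).sum())

def weighted_hamming(a, b, w_set=1.0, w_unset=0.1):
    """Illustrative weighting: set-bit losses cost more than clutter bits."""
    bits_a, bits_b = np.unpackbits(a), np.unpackbits(b)
    miss_set = np.sum((bits_a == 1) & (bits_b == 0))    # query bit lost
    miss_unset = np.sum((bits_a == 0) & (bits_b == 1))  # clutter bit added
    return w_set * miss_set + w_unset * miss_unset

a = np.random.default_rng(0).integers(0, 256, 128, dtype=np.uint8)
b = a.copy(); b[0] ^= 0b1010                       # flip a couple of bits
print(hamming(a, b), weighted_hamming(a, b))
```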


[32] 2008.02918

Polysemy Deciphering Network for Robust Human-Object Interaction Detection

Human-Object Interaction (HOI) detection is important to human-centric scene understanding tasks. Existing works tend to assume that the same verb has similar visual characteristics in different HOI categories, an approach that ignores the diverse semantic meanings of the verb. To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net) that decodes the visual polysemy of verbs for HOI detection in three distinct ways. First, we refine features for HOI detection to be polysemy-aware through the use of two novel modules: namely, Language Prior-guided Channel Attention (LPCA) and Language Prior-based Feature Augmentation (LPFA). LPCA highlights important elements in human and object appearance features for each HOI category to be identified; moreover, LPFA augments human pose and spatial features for HOI detection using language priors, enabling the verb classifiers to receive language hints that reduce intra-class variation for the same verb. Second, we introduce a novel Polysemy-Aware Modal Fusion module (PAMF), which guides PD-Net to make decisions based on feature types deemed more important according to the language priors. Third, we propose to relieve the verb polysemy problem through sharing verb classifiers for semantically similar HOI categories. Furthermore, to expedite research on the verb polysemy problem, we build a new benchmark dataset named HOI-VerbPolysemy (HOIVP), which includes common verbs (predicates) that have diverse semantic meanings in the real world. Finally, through deciphering the visual polysemy of verbs, our approach is demonstrated to outperform state-of-the-art methods by significant margins on the HICO-DET, V-COCO, and HOI-VP databases. Code and data in this paper will be released at https://github.com/MuchHair/PD-Net.


[33] 2008.02919

Can Smartphone Co-locations Detect Friendship? It Depends How You Model It

We present a study to detect friendship, its strength, and its change from smartphone location data collected among members of a fraternity. We extract a rich set of co-location features and build classifiers that detect friendships and close friendship at 30% above a random baseline. We design cross-validation schema to test our model performance in specific application settings, finding it robust to seeing new dyads and to temporal variance.


[34] 2008.02924

A Sub-linear Time Algorithm for Approximating k-Nearest-Neighbor with Full Quality Guarantee

In this paper we propose an algorithm for the approximate k-Nearest-Neighbors problem. In the existing research there are two kinds of approximation criteria: the distance criterion and the recall criterion. Previous algorithms offer no theoretical guarantees for either criterion. The algorithm proposed in this paper unifies the two kinds of approximation criteria and has full theoretical guarantees. Furthermore, the query time of the algorithm is sub-linear. As far as we know, it is the first algorithm that achieves both sub-linear query time and a full theoretical approximation guarantee.


[35] 2008.02927

A Historical Account of My Early Research Interests

This paper presents a brief account of some of my early research interests. This historical account starts from my laurea thesis on Signal Theory and my master thesis on Computation Theory. It recalls some results in Combinatory Logic and Term Rewriting Systems. Some other results concern Program Transformation, Parallel Computation, Theory of Concurrency, and Proof of Program Properties. My early research activity was mainly done in cooperation with Andrzej Skowron, Anna Labella, and Maurizio Proietti.


[36] 2008.02928

Privacy-Preserved Collaborative Estimation for Networked Vehicles with Application to Road Anomaly Detection

Road information such as road profile and traffic density has been widely used in intelligent vehicle systems to improve road safety, ride comfort, and fuel economy. However, vehicle heterogeneity and parameter uncertainty make it extremely difficult for a single vehicle to accurately and reliably measure such information. In this work, we propose a unified framework for learning-based collaborative estimation to fuse local road estimations from a fleet of connected heterogeneous vehicles. The collaborative estimation scheme exploits the sequential measurements made by multiple vehicles traversing the same road segment and lets these vehicles relay a learning signal to iteratively refine local estimations. Given that the privacy of individual vehicles' identities must be protected in collaborative estimation, we directly incorporate privacy-protection design into the collaborative estimation design and establish a unified framework for privacy-preserving collaborative estimation. Unlike patching in conventional privacy mechanisms such as differential privacy, which compromises algorithmic accuracy, or homomorphic encryption, which incurs heavy communication/computational overhead, we leverage the dynamical properties of collective estimation to enable inherent privacy protection without sacrificing accuracy or significantly increasing communication/computation overhead. Numerical simulations confirm the effectiveness and efficiency of our proposed framework.


[37] 2008.02929

From Well Structured Transition Systems to Program Verification

We describe the use of the theory of WSTS for verifying programs.


[38] 2008.02930

Zero-Shot Heterogeneous Transfer Learning from Recommender Systems to Cold-Start Search Retrieval

Many recent advances in neural information retrieval models, which predict top-K items given a query, learn directly from a large training set of (query, item) pairs. However, they are often insufficient when there are many previously unseen (query, item) combinations, often referred to as the cold start problem. Furthermore, the search system can be biased towards items that are frequently shown to a query previously, also known as the 'rich get richer' (a.k.a. feedback loop) problem. In light of these problems, we observed that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation. In this paper, we propose a new Zero-Shot Heterogeneous Transfer Learning framework that transfers learned knowledge from the recommender system component to improve the search component of a content platform. First, it learns representations of items and their natural-language features by predicting (item, item) correlation graphs derived from the recommender system as an auxiliary task. Then, the learned representations are transferred to solve the target search retrieval task, performing query-to-item prediction without having seen any (query, item) pairs in training. We conduct online and offline experiments on one of the world's largest search and recommender systems from Google, and present the results and lessons learned. We demonstrate that the proposed approach can achieve high performance on offline search retrieval tasks, and more importantly, achieved significant improvements on relevance and user interactions over the highly-optimized production system in online experiments.


[39] 2008.02931

From Big-Step to Small-Step Semantics and Back with Interpreter Specialisation

We investigate representations of imperative programs as constrained Horn clauses. Starting from operational semantics transition rules, we proceed by writing interpreters as constrained Horn clause programs directly encoding the rules. We then specialise an interpreter with respect to a given source program to achieve a compilation of the source language to Horn clauses (an instance of the first Futamura projection). The process is described in detail for an interpreter for a subset of C, directly encoding the rules of big-step operational semantics for C. A similar translation based on small-step semantics could be carried out, but we show an approach to obtaining a small-step representation using a linear interpreter for big-step Horn clauses. This interpreter is again specialised to achieve the translation from big-step to small-step style. The linear small-step program can be transformed back to a big-step non-linear program using a third interpreter. A regular path expression is computed for the linear program using Tarjan's algorithm, and this regular expression then guides an interpreter to compute a program path. The transformation is realised by specialisation of the path interpreter. In all of the transformation phases, we use an established partial evaluator and exploit standard logic program transformation to remove redundant data structures and arguments in predicates and rename predicates to make clear their link to statements in the original source program.


[40] 2008.02932

Cons-free Programs and Complexity Classes between LOGSPACE and PTIME

Programming language concepts are used to give some new perspectives on a long-standing open problem: is LOGSPACE = PTIME?


[41] 2008.02933

Prolog for Verification, Analysis and Transformation Tools

This article examines the use of the Prolog language for writing verification, analysis and transformation tools. Guided by experience in teaching and the development of verification tools like ProB or specialisation tools like ECCE and LOGEN, the article presents an assessment of various aspects of Prolog and provides guidelines for using them. The article shows the usefulness of a few key Prolog features. In particular, it discusses how to deal with negation at the level of the object programs being verified or analysed.


[42] 2008.02934

Transformational Verification of Quicksort

Many transformation techniques developed for constraint logic programs, also known as constrained Horn clauses (CHCs), have found new useful applications in the field of program verification. In this paper, we work out a nontrivial case study through the transformation-based verification approach. We consider the familiar Quicksort program for sorting lists, written in a functional programming language, and we verify the pre/postconditions that specify the intended correctness properties of the functions defined in the program. We verify these properties by: (1) translating them into CHCs, (2) transforming the CHCs by removing all list occurrences, and (3) checking the satisfiability of the transformed CHCs by using the Eldarica solver over booleans and integers. The transformation mentioned at Point (2) requires an extension of the algorithms for the elimination of inductively defined data structures presented in previous work, because during one stage of the transformation we use as lemmas some properties that have been proved at previous stages.


[43] 2008.02935

Generating Distributed Programs from Event-B Models

Distributed algorithms offer challenges in checking that they meet their specifications. Verification techniques can be extended to deal with the verification of safety properties of distributed algorithms. In this paper, we present an approach for combining correct-by-construction approaches and transformations of formal models (Event-B) into programs (DistAlgo) to address the design of verified distributed programs. We define a subset LB (Local Event-B) of the Event-B modelling language restricted to events modelling the classical actions of distributed programs: internal or local computations, sending messages and receiving messages. We then define transformations of the various elements of the LB language into DistAlgo programs. The general methodology consists of starting from a statement of the problem to program and then progressively producing an LB model obtained after several refinement steps of the initial LB model. The derivation of the LB model is not described in the current paper and has already been addressed in other works. The transformation of LB models into DistAlgo programs is illustrated through a simple example. The refinement process and the soundness of the transformation allow one to produce correct-by-construction distributed programs.


[44] 2008.02936

Distilling Programs to Prove Termination

The problem of determining whether or not any program terminates was shown to be undecidable by Turing, but recent advances in the area have allowed this information to be determined for a large class of programs. The classic method for deciding whether a program terminates dates back to Turing himself and involves finding a ranking function that maps a program state to a well-order, and then proving that the result of this function decreases for every possible program transition. More recent approaches to proving termination have involved moving away from the search for a single ranking function and toward a search for a set of ranking functions; such a set constitutes a choice of ranking functions, and a disjunctive termination argument is used. In this paper, we describe a new technique for determining whether programs terminate. Our technique is applied to the output of the distillation program transformation that converts programs into a simplified form called distilled form. Programs in distilled form are converted into a corresponding labelled transition system, and termination can be demonstrated by showing that all possible infinite traces through this labelled transition system would result in an infinite descent of well-founded data values. We demonstrate our technique on a number of examples, and compare it to previous work.


[45] 2008.02937

An Experiment Combining Specialization with Abstract Interpretation

It was previously shown that control-flow refinement can be achieved by a program specializer incorporating property-based abstraction, to improve termination and complexity analysis tools. We now show that this purpose-built specializer can be reconstructed in a more modular way, and that the previous results can be achieved using an off-the-shelf partial evaluation tool, applied to an abstract interpreter. The key feature of the abstract interpreter is the abstract domain, which is the product of the property-based abstract domain with the concrete domain. This language-independent framework provides a practical approach to implementing a variety of powerful specializers, and contributes to a stream of research on using interpreters and specialization to achieve program transformations.


[46] 2008.02938

A Deeper Look at Salient Object Detection: Bi-stream Network with a Small Training Dataset

Compared with conventional hand-crafted approaches, deep learning based methods have achieved tremendous performance improvements by training exquisitely crafted networks over large-scale training sets. However, do we really need a large-scale training set for salient object detection (SOD)? In this paper, we provide a deeper insight into the interrelationship between SOD performance and training sets. To alleviate the conventional demands for large-scale training data, we provide a feasible way to construct a novel small-scale training set, which contains only 4K images. Moreover, we propose a novel bi-stream network to take full advantage of our proposed small training set, which consists of two feature backbones with different structures, achieving complementary semantic saliency fusion via the proposed gate control unit. To the best of our knowledge, this is the first attempt to use a small-scale training set to outperform state-of-the-art models trained on large-scale training sets; nevertheless, our method still achieves leading state-of-the-art performance on five benchmark datasets.


[47] 2008.02939

Competition Report: CHC-COMP-20

CHC-COMP-20 is the third competition of solvers for Constrained Horn Clauses. This year, 9 solvers participated in the competition and were evaluated in four separate tracks on problems in linear integer arithmetic, linear real arithmetic, and arrays. The competition was run in the first week of May 2020 using the StarExec computing cluster. This report gives an overview of the competition design, explains the organisation of the competition, and presents the competition results.


[48] 2008.02940

Mean Field Game and Decentralized Intelligent Adaptive Pursuit Evasion Strategy for Massive Multi-Agent System under Uncertain Environment

In this paper, a novel decentralized intelligent adaptive optimal strategy is developed to solve the pursuit-evasion game for massive Multi-Agent Systems (MAS) under an uncertain environment. Existing strategies for pursuit-evasion games are neither efficient nor practical for large-population multi-agent systems due to the notorious "curse of dimensionality" and communication limits when the agent population is large. To overcome these challenges, the emerging mean field game theory is adopted and further integrated with reinforcement learning to develop a novel decentralized intelligent adaptive strategy, with a new type of adaptive dynamic programming architecture named the Actor-Critic-Mass (ACM). By approximating the solution of the coupled mean field equations online, the developed strategy can obtain the optimal pursuit-evasion policy even for massive MAS under an uncertain environment. In the proposed ACM learning based strategy, each agent maintains five neural networks: 1) a critic neural network to approximate the solution of the HJI equation for each individual agent; 2) a mass neural network to estimate the population density function (i.e., mass) of the group; 3) an actor neural network to approximate the decentralized optimal strategy; and 4)-5) two further neural networks to estimate the opponents' group mass and the optimal cost function. Finally, a comprehensive numerical simulation is provided to demonstrate the effectiveness of the designed strategy.


[49] 2008.02944

Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair

A large body of the automated program repair literature develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although validated by the oracle, may actually be incorrect. While the state of the art explores research directions that require dynamic information or rely on manually-crafted heuristics, we study the benefit of learning code representations to derive deep features that may encode the properties of patch correctness. Our work mainly investigates different representation learning approaches for code changes to derive embeddings that are amenable to similarity computations. We report on findings based on embeddings produced by pre-trained and re-trained neural networks. Experimental results demonstrate the potential of embeddings to empower learning algorithms in reasoning about patch correctness: a machine learning predictor with BERT transformer-based embeddings associated with logistic regression yielded an AUC value of about 0.8 in predicting patch correctness on a deduplicated dataset of 1000 labeled patches. Our study shows that learned representations can lead to reasonable performance when compared against the state-of-the-art, PATCH-SIM, which relies on dynamic information. These representations may further be complementary to features that were carefully (manually) engineered in the literature.
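
A minimal sketch of the strongest reported configuration: embed each patch with a BERT-style encoder and classify with logistic regression. The checkpoint name and [CLS] pooling are illustrative assumptions, not necessarily the paper's choices.

```python
# A minimal sketch: BERT-style embeddings of patch text, fed to a logistic
# regression classifier of patch correctness, scored by AUC.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(snippets):
    batch = tok(snippets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch)
    return out.last_hidden_state[:, 0].numpy()   # [CLS] vectors

# patches: list of diff texts; labels: 1 = correct, 0 = incorrect
def train_and_score(train_patches, y_train, test_patches, y_test):
    clf = LogisticRegression(max_iter=1000).fit(embed(train_patches), y_train)
    return roc_auc_score(y_test, clf.predict_proba(embed(test_patches))[:, 1])
```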


[50] 2008.02951

Fast Identification of Saturated Cut-sets using Graph Search Techniques

When multiple outages occur in rapid succession, it is important to know quickly whether the power transfer capability of different interconnections (or cut-sets) of the power network is limited. The algorithm developed in this paper identifies such limited cut-sets very quickly, thereby enhancing the real-time situational awareness of power system operators. The significance of the proposed approach is described using the IEEE 39-bus test system, while its computational benefits are demonstrated using relatively large test cases containing thousands of buses. The results indicate that the proposed network analysis can quickly estimate the impact of an outage on any cut-set of the system and screen out the cut-set that gets saturated by the largest margin.


[51] 2008.02952

Few Shot Learning Framework to Reduce Inter-observer Variability in Medical Images

Most computer-aided pathology detection systems rely on large volumes of quality-annotated data to aid diagnostics and follow-up procedures. However, quality-assuring large volumes of annotated medical image data can be subjective and expensive. In this work we present a novel standardization framework that implements three few-shot learning (FSL) models that can be iteratively trained with at most 5 images per 3D stack to generate multiple regional proposals (RPs) per test image. These FSL models include a novel parallel echo state network (ParESN) framework and an augmented U-net model. Additionally, we propose a novel target label selection algorithm (TLSA) that measures relative agreeability between RPs and the manually annotated target labels to detect the "best" quality annotation per image. Using the FSL models, our system achieves 0.28-0.64 Dice coefficient across vendor image stacks for intra-retinal cyst segmentation. Additionally, the TLSA is capable of automatically classifying high-quality target labels from their noisy counterparts for 60-97% of the images while ensuring manual supervision on the remaining images. Moreover, the proposed framework with the ParESN model minimizes manual annotation checking to 12-28% of the total number of images. The TLSA metrics further provide confidence scores for automated annotation quality assurance. Thus, the proposed framework can flexibly be extended to curate quality annotations for other image stacks as well.
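
For reference, the Dice coefficient reported above compares a predicted binary mask against a target mask; a minimal NumPy version:

    import numpy as np

    def dice(pred, target, eps=1e-8):
        # Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).
        pred, target = pred.astype(bool), target.astype(bool)
        inter = np.logical_and(pred, target).sum()
        return 2.0 * inter / (pred.sum() + target.sum() + eps)

    a = np.zeros((64, 64), dtype=np.uint8); a[10:30, 10:30] = 1
    b = np.zeros((64, 64), dtype=np.uint8); b[15:35, 15:35] = 1
    print(dice(a, b))  # 0.5625 for these two overlapping squares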


[52] 2008.02953

Neural Complexity Measures

While various complexity measures for diverse model classes have been proposed, specifying an appropriate measure capable of predicting and explaining generalization in deep networks has proven to be challenging. We propose Neural Complexity (NC), an alternative data-driven approach that meta-learns a scalar complexity measure through interactions with a large number of heterogeneous tasks. The trained NC model can be added to the standard training loss to regularize any task learner under standard learning frameworks. We contrast NC's approach against existing manually-designed complexity measures and also against other meta-learning models, and validate NC's performance on multiple regression and classification tasks.
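
A minimal sketch of how a trained NC model might be added to a task learner's loss, assuming (our assumption, not the paper's exact interface) that NC consumes predictions and targets and returns a scalar complexity estimate:

    import torch
    import torch.nn as nn

    learner = nn.Linear(10, 1)                     # the task learner
    nc_model = nn.Sequential(nn.Linear(2, 32),     # stand-in for a trained
                             nn.ReLU(),            # Neural Complexity network
                             nn.Linear(32, 1))

    x, y = torch.randn(16, 10), torch.randn(16, 1)
    pred = learner(x)
    task_loss = nn.functional.mse_loss(pred, y)
    # Add the meta-learned complexity estimate as a regularizer.
    loss = task_loss + nc_model(torch.cat([pred, y], dim=-1)).mean()
    loss.backward()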


[53] 2008.02954

Deep Active Learning with Crowdsourcing Data for Privacy Policy Classification

Privacy policies are statements that notify users of services' data practices. However, few users are willing to read through policy texts due to their length and complexity. While automated tools based on machine learning exist for privacy policy analysis, achieving high classification accuracy requires classifiers to be trained on a large labeled dataset. Most existing policy corpora are labeled by skilled human annotators, requiring a significant amount of labor hours and effort. In this paper, we leverage active learning and crowdsourcing techniques to develop an automated classification tool named Calpric (Crowdsourcing Active Learning PRIvacy Policy Classifier), which is able to produce annotations equivalent to those done by skilled human annotators with high accuracy while minimizing the labeling cost. Specifically, active learning allows classifiers to proactively select the most informative segments to be labeled. On average, our model is able to achieve the same F1 score using only 62% of the original labeling effort. Calpric's use of active learning also addresses the naturally occurring class imbalance in unlabeled privacy policy datasets, as there are many more statements asserting the collection of private information than asserting the absence of collection. By selecting samples from the minority class for labeling, Calpric automatically creates a more balanced training set.
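
One common way to realize "select the most informative segments" is uncertainty sampling; a sketch of that acquisition strategy (not necessarily Calpric's exact one):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X_pool = rng.normal(size=(500, 20))   # featurized unlabeled segments
    X_lab = rng.normal(size=(20, 20))     # small labeled seed set
    y_lab = rng.integers(0, 2, size=20)

    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    # Uncertainty sampling: query the segments whose predicted probability
    # is closest to 0.5, i.e., the ones the current model is least sure about.
    proba = clf.predict_proba(X_pool)[:, 1]
    query_idx = np.argsort(np.abs(proba - 0.5))[:10]
    print("segments to send to crowd workers:", query_idx)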


[54] 2008.02956

Bootstrapping Neural Processes

Unlike traditional statistical modeling, in which a user typically hand-specifies a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, an NP learns the stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely on the assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits flexibility. To this end, we propose the Bootstrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap. The bootstrap is a classical data-driven technique for estimating uncertainty, which allows BNP to learn the stochasticity in NPs without assuming a particular form. We demonstrate the efficacy of BNP on various types of data and its robustness in the presence of model-data mismatch.
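
The core bootstrap step can be illustrated in a few lines: resample the context set with replacement, so that each replica induces a slightly different hypothesis and the spread across replicas reflects uncertainty (a sketch of the idea, not the full BNP model):

    import numpy as np

    def bootstrap_contexts(x_ctx, y_ctx, n_boot=8, seed=0):
        # Draw n_boot resampled context sets (with replacement).
        rng = np.random.default_rng(seed)
        idx = rng.integers(0, len(x_ctx), size=(n_boot, len(x_ctx)))
        return x_ctx[idx], y_ctx[idx]

    x = np.linspace(0, 1, 30)[:, None]
    y = np.sin(6 * x) + 0.1 * np.random.default_rng(0).normal(size=x.shape)
    xb, yb = bootstrap_contexts(x, y)
    print(xb.shape, yb.shape)  # (8, 30, 1) (8, 30, 1): one NP pass per replica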


[55] 2008.02961

From Connectomic to Task-evoked Fingerprints: Individualized Prediction of Task Contrasts from Resting-state Functional Connectivity

Resting-state functional MRI (rsfMRI) yields functional connectomes that can serve as cognitive fingerprints of individuals. Connectomic fingerprints have proven useful in many machine learning tasks, such as predicting subject-specific behavioral traits or task-evoked activity. In this work, we propose a surface-based convolutional neural network (BrainSurfCNN) model to predict individual task contrasts from their resting-state fingerprints. We introduce a reconstructive-contrastive loss that enforces subject-specificity of model outputs while minimizing predictive error. The proposed approach significantly improves the accuracy of predicted contrasts over a well-established baseline. Furthermore, BrainSurfCNN's prediction also surpasses the test-retest benchmark in a subject identification task.
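
Our reading of the reconstructive-contrastive idea, as a hedged PyTorch sketch (the paper's exact loss may differ): pull the predicted contrast map toward its own subject's map while pushing it away from other subjects' maps.

    import torch
    import torch.nn.functional as F

    def recon_contrastive_loss(pred, target, others, alpha=0.5):
        # pred, target: (D,) predicted / true contrast maps for one subject;
        # others: (K, D) true maps of other subjects.
        recon = F.mse_loss(pred, target)                    # predictive error
        contrast = F.mse_loss(pred.expand_as(others), others)
        return recon - alpha * contrast                     # subject-specificity

    pred = torch.randn(100, requires_grad=True)
    loss = recon_contrastive_loss(pred, torch.randn(100), torch.randn(5, 100))
    loss.backward()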


[56] 2008.02964

Which Kind Is Better in Open-domain Multi-turn Dialog, Hierarchical or Non-hierarchical Models? An Empirical Study

Currently, open-domain generative dialog systems have attracted considerable attention in academia and industry. Despite the success of single-turn dialog generation, multi-turn dialog generation is still a big challenge. So far, there are two kinds of models for open-domain multi-turn dialog generation: hierarchical and non-hierarchical models. Recently, some works have shown that hierarchical models are better than non-hierarchical models under their experimental settings, while other works demonstrate the opposite conclusion. Due to the lack of adequate comparisons, it is not clear which kind of model is better in open-domain multi-turn dialog generation. Thus, in this paper, we systematically measure nearly all representative hierarchical and non-hierarchical models over the same experimental settings to check which kind is better. Through extensive experiments, we reach the following three important conclusions: (1) Nearly all hierarchical models are worse than non-hierarchical models in open-domain multi-turn dialog generation, except for the HRAN model; further analysis shows that the excellent performance of HRAN depends mainly on its word-level attention mechanism. (2) The performance of other hierarchical models also obtains a great improvement if the word-level attention mechanism is integrated into these models; the modified hierarchical models even significantly outperform the non-hierarchical models. (3) The reason the word-level attention mechanism is so powerful for hierarchical models is that it leverages context information more effectively, especially fine-grained information. Besides, we have implemented all of the models and released the code.


[57] 2008.02965

Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice to regularize the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling functions. As a result of homogeneity, the functions specified by such networks are invariant to shifts of weight scales between layers. The ineffective regularizers are sensitive to such shifts and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound on the input gradient of the network, so minimizing it also benefits adversarial robustness. Residual connections are also considered, and we show that our regularizer likewise forms an upper bound on the input gradients of such residual networks. We demonstrate the efficacy of our proposed regularizer on various datasets and neural network architectures at improving generalization and adversarial robustness.
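
The scale-shifting phenomenon is easy to verify numerically: for a two-layer ReLU network, rescaling the layers by a and 1/a leaves the function unchanged, changes the weight-decay penalty, but leaves a product-form penalty invariant (the paper's actual regularizer is derived differently; this only demonstrates the invariance argument).

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))
    x = rng.normal(size=(4,))

    def f(W1, W2, x):
        return W2 @ np.maximum(W1 @ x, 0.0)   # two-layer ReLU network

    a = 10.0  # shift weight scale from layer 2 to layer 1
    print(np.allclose(f(W1, W2, x), f(a * W1, W2 / a, x)))   # True: same function
    print((W1**2).sum() + (W2**2).sum(),                     # weight decay changes
          ((a * W1)**2).sum() + ((W2 / a)**2).sum())
    print((W1**2).sum() * (W2**2).sum(),                     # product is invariant
          ((a * W1)**2).sum() * ((W2 / a)**2).sum())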


[58] 2008.02966

A Novel Video Salient Object Detection Method via Semi-supervised Motion Quality Perception

Previous video salient object detection (VSOD) approaches have mainly focused on designing fancy networks to achieve performance improvements. However, with the recent slow-down in the development of deep learning techniques, it may become more and more difficult to anticipate another breakthrough via fancy networks alone. To this end, this paper proposes a universal learning scheme to obtain a further 3% performance improvement for all state-of-the-art (SOTA) methods. The major highlight of our method is that we resort to "motion quality"---a brand-new concept---to select a sub-group of video frames from the original testing set to construct a new training set. The selected frames in this new training set should all contain high-quality motions, in which the salient objects have a high probability of being successfully detected by the "target SOTA method"---the one we want to improve. Consequently, we can achieve a significant performance improvement by using this new training set to start a new round of network training, during which the VSOD results of the target SOTA method are applied as the pseudo training objectives. Our novel learning scheme is simple yet effective, and its semi-supervised methodology may have great potential to inspire the VSOD community in the future.


[59] 2008.02973

Exploring Rich and Efficient Spatial Temporal Interactions for Real Time Video Salient Object Detection

Current mainstream methods formulate video saliency from two independent venues, i.e., spatial and temporal branches. As a complementary component, the main task of the temporal branch is to intermittently focus the spatial branch on regions with salient movements. Thus, even though the overall video saliency quality depends heavily on the spatial branch, the performance of the temporal branch still matters, and the key to improving the overall video saliency is to boost the performance of both branches efficiently. In this paper, we propose a novel spatiotemporal network to achieve such improvement in a fully interactive fashion. We integrate a lightweight temporal model into the spatial branch to coarsely locate spatially salient regions that are correlated with trustworthy salient movements. Meanwhile, the spatial branch itself is able to recurrently refine the temporal model in a multi-scale manner. In this way, the spatial and temporal branches interact with each other, achieving mutual performance improvement. Our method is easy to implement yet effective, achieving high-quality video saliency detection at real-time speed (50 FPS).


[60] 2008.02974

MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction

Click-through rate (CTR) prediction is a critical task in online advertising systems. Existing works mainly address the single-domain CTR prediction problem and model aspects such as feature interaction, user behavior history and contextual information. Nevertheless, ads are usually displayed with natural content, which offers an opportunity for cross-domain CTR prediction. In this paper, we address this problem and leverage auxiliary data from a source domain to improve the CTR prediction performance of a target domain. Our study is based on UC Toutiao (a news feed service integrated with the UC Browser App, serving hundreds of millions of users daily), where the source domain is the news and the target domain is the ad. In order to effectively leverage news data for predicting CTRs of ads, we propose the Mixed Interest Network (MiNet) which jointly models three types of user interest: 1) long-term interest across domains, 2) short-term interest from the source domain and 3) short-term interest in the target domain. MiNet contains two levels of attention, where the item-level attention can adaptively distill useful information from clicked news/ads and the interest-level attention can adaptively fuse different interest representations. Offline experiments show that MiNet outperforms several state-of-the-art methods for CTR prediction. We have deployed MiNet in UC Toutiao and the A/B test results show that the online CTR is also improved substantially. MiNet now serves the main ad traffic in UC Toutiao.


[61] 2008.02976

Data Weighted Training Strategies for Grammatical Error Correction

Recent progress in the task of Grammatical Error Correction (GEC) has been driven by addressing data sparsity, both through new methods for generating large and noisy pretraining data and through the publication of small and higher-quality finetuning data in the BEA-2019 shared task. Building upon recent work in Neural Machine Translation (NMT), we make use of both kinds of data by deriving example-level scores on our large pretraining data based on a smaller, higher-quality dataset. In this work, we perform an empirical study to discover how to best incorporate delta-log-perplexity, a type of example scoring, into a training schedule for GEC. In doing so, we perform experiments that shed light on the function and applicability of delta-log-perplexity. Models trained on scored data achieve state-of-the-art results on common GEC test sets.
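
A sketch of delta-log-perplexity scoring under one plausible reading: score each pretraining example by how much its per-token log-perplexity drops between a base model and a model fine-tuned on the higher-quality data (the scorers below are toy stand-ins for two trained language models).

    import math

    def log_perplexity(logprob, tokens):
        # Average negative log-likelihood per token under a given model.
        return -sum(logprob(t) for t in tokens) / len(tokens)

    def delta_log_perplexity(tokens, base_lp, finetuned_lp):
        # Positive scores mark examples the fine-tuned model finds easier;
        # such examples can be up-weighted in the training schedule.
        return log_perplexity(base_lp, tokens) - log_perplexity(finetuned_lp, tokens)

    base = lambda t: math.log(0.1)        # toy per-token log-probabilities
    finetuned = lambda t: math.log(0.3)
    print(delta_log_perplexity(["a", "b", "c"], base, finetuned))  # > 0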


[62] 2008.02977

A Channel Model of Transceivers for Multiterminal Secret Key Agreement

Information theoretic secret key agreement is impossible without making initial assumptions. One type of initial assumption is correlated random variables that are generated by using a noisy channel that connects the terminals. Terminals use the correlated random variables and communication over a reliable public channel to arrive at a shared secret key. Previous channel models assume that each terminal either controls one input to the channel, or receives one output variable of the channel. In this paper, we propose a new channel model of transceivers where each terminal simultaneously controls an input variable and observes an output variable of the (noisy) channel. We give upper and lower bounds for the secret key capacity (i.e., highest achievable key rate) of this transceiver model, and prove the secret key capacity under the conditions that the public communication is noninteractive and input variables of the noisy channel are independent.


[63] 2008.02979

Role-Based Deception in Enterprise Networks

Historically, enterprise network reconnaissance has been an active process, often involving port scanning. However, as routers and switches become more complex, they also become more susceptible to compromise. From this vantage point, an attacker can passively identify high-value hosts such as the workstations of IT administrators, C-suite executives, and finance personnel. The goal of this paper is to develop a technique to deceive and dissuade such adversaries. We propose HoneyRoles, which uses honey connections to build metaphorical haystacks around the network traffic of client hosts belonging to high-value organizational roles. The honey connections also act as network canaries to signal network compromise, thereby dissuading the adversary from acting on information observed in network flows. We design a prototype implementation of HoneyRoles using an OpenFlow SDN controller and evaluate its security using the PRISM probabilistic model checker. Our performance evaluation shows that HoneyRoles has a small effect on network request completion time, and our security analysis demonstrates that once an alert is raised, HoneyRoles can quickly identify the compromised switch with high probability. In doing so, we show that role-based network deception is a promising approach for defending against adversaries that have compromised network devices.


[64] 2008.02980

Textual Description for Mathematical Equations

Reading mathematical expressions or equations in document images is very challenging due to the large variability of mathematical symbols and expressions. In this paper, we pose the reading of a mathematical equation as the task of generating a textual description that interprets the internal meaning of the equation. Inspired by the natural image captioning problem in computer vision, we present a mathematical equation description (MED) model, a novel end-to-end trainable deep neural network based approach that learns to generate a textual description for reading mathematical equation images. Our MED model consists of a convolutional neural network as an encoder, which extracts features of input mathematical equation images, and a recurrent neural network with an attention mechanism, which generates descriptions related to the input images. Due to the unavailability of datasets of mathematical equation images with textual descriptions, we generate two datasets for experimental purposes. To validate the effectiveness of our MED model, we conduct a real-world experiment to see whether students are able to write equations by only reading or listening to their textual descriptions. The experiments conclude that the students are able to write most of the equations correctly by reading their textual descriptions only.


[65] 2008.02986

Global Context Aware Convolutions for 3D Point Cloud Understanding

Recent advances in deep learning for 3D point clouds have shown great promise in scene understanding tasks thanks to the introduction of convolution operators that consume 3D point clouds directly in a neural network. Point cloud data, however, can have arbitrary rotations, especially when acquired from 3D scanning. Recent works show that it is possible to design point cloud convolutions with a rotation invariance property, but such methods generally do not perform as well as translation-invariant-only convolutions. We found that a key reason is that, compared to point coordinates, the rotation-invariant features consumed by point cloud convolutions are not as distinctive. To address this problem, we propose a novel convolution operator that enhances feature distinction by integrating global context information from the input point cloud into the convolution. To this end, a globally weighted local reference frame is constructed in each point neighborhood, in which the local point set is decomposed into bins. Anchor points are generated in each bin to represent global shape features. A convolution can then be performed to transform the points and anchor features into final rotation-invariant features. We conduct several experiments on point cloud classification, part segmentation, shape retrieval, and normal estimation to evaluate our convolution, which achieves state-of-the-art accuracy under challenging rotations.


[66] 2008.02987

Why are Developers Struggling to Put GDPR into Practice when Developing Privacy-Preserving Software Systems?

The use of software applications is inevitable as they provide different services to users. Software applications collect and store users' data, and sometimes share it with third parties, even without user consent. One can argue that software developers do not implement privacy into the software applications they develop, or do not take the GDPR (General Data Protection Regulation) into account. Failing to do this may lead to software applications that open up privacy breaches (e.g., data breaches). The GDPR provides a set of guidelines for developers and organizations on how to protect user data when users are interacting with software applications. Previous research has attempted to investigate what hinders developers from embedding privacy into software systems. However, there has been no detailed investigation of why they cannot develop privacy-preserving systems taking GDPR into consideration, which is imperative for developing software applications that preserve privacy. Therefore, this paper investigates the issues that hinder software developers from implementing software applications taking the GDPR on board. Our study findings reveal that developers are not familiar with GDPR principles; even those who are lack knowledge of the GDPR principles and the techniques to use when developing privacy-preserving software systems.


[67] 2008.02988

An Analytical Framework for Delay Optimal Mobile Edge Deployment in Wireless Networks

The emerging edge caching provides an effective way to reduce service delay for mobile users. However, due to high deployment cost of edge hosts, a practical problem is how to achieve minimum delay under a proper edge deployment strategy. In this letter, we provide an analytical framework for delay optimal mobile edge deployment in a partially connected wireless network, where the request files can be cached at the edge hosts and cooperatively transmitted through multiple base stations. In order to deal with the heterogeneous transmission requirements, we separate the entire transmission into backhaul and wireless phases, and propose average user normalized delivery time (AUNDT) as the performance metric. On top of that, we characterize the trade-off relations between the proposed AUNDT and other network deployment parameters. Using the proposed analytical framework, we are able to provide the optimal mobile edge deployment strategy in terms of AUNDT, which provides a useful guideline for future mobile edge deployment.


[68] 2008.02992

Leveraging Localization for Multi-camera Association

We present McAssoc, a deep learning approach to the association of detection bounding boxes in different views of a multi-camera system. The vast majority of academia has been developing single-camera computer vision algorithms; however, little research attention has been directed to incorporating them into a multi-camera system. In this paper, we designed a 3-branch architecture that leverages direct association and additional cross-localization information. A new metric, image-pair association accuracy (IPAA), is designed specifically for performance evaluation of cross-camera detection association. We show in the experiments that localization information is critical to successful cross-camera association, especially when similar-looking objects are present. This paper is an experimental work prior to MessyTable, which is a large-scale benchmark for instance association in multiple cameras.


[69] 2008.02993

Joint Uplink-and-Downlink Optimization of 3D UAV Swarm Deployment for Wireless-Powered NB-IoT Networks

This paper investigates a full-duplex orthogonal-frequency-division multiple access (OFDMA) based wireless-powered Internet-of-Things (IoT) network enabled by multiple unmanned aerial vehicles (UAVs). In this paper, a swarm of UAVs is first deployed in three dimensions (3D) to simultaneously charge all devices, i.e., a downlink (DL) charging period, and then flies to new locations within this area to collect information from scheduled devices in several epochs via OFDMA, due to the potentially limited number of channels available in Narrowband IoT (NB-IoT), i.e., an uplink (UL) communication period. To maximize the UL throughput of the IoT devices, we jointly optimize the UL-and-DL 3D deployment of the UAV swarm, including the device-UAV association, the scheduling order, and the UL-DL time allocation. In particular, the DL energy harvesting (EH) threshold of devices and the UL signal decoding threshold of UAVs are taken into consideration when studying the problem. Besides, both line-of-sight (LoS) and non-line-of-sight (NLoS) channel models are studied, depending on the positions of sensors and UAVs. The influence of the potentially limited number of channels in NB-IoT is also considered by studying the IoT scheduling policy. Two scheduling policies, a near-first (NF) policy and a far-first (FF) policy, are studied. It is shown that the NF scheme outperforms the FF scheme in terms of sum-throughput maximization, whereas the FF scheme outperforms the NF scheme in terms of system fairness.


[70] 2008.02999

Single-stage intake gesture detection using CTC loss and extended prefix beam search

Accurate detection of individual intake gestures is a key step towards automatic dietary monitoring. Both inertial sensor data of wrist movements and video data depicting the upper body have been used for this purpose. The most advanced approaches to date use a two-stage approach, in which (i) frame-level intake probabilities are learned from the sensor data using a deep neural network, and then (ii) sparse intake events are detected by finding the maxima of the frame-level probabilities. In this study, we propose a single-stage approach which directly decodes the probabilities learned from sensor data into sparse intake detections. This is achieved by weakly supervised training using Connectionist Temporal Classification (CTC) loss, and decoding using a novel extended prefix beam search decoding algorithm. Benefits of this approach include (i) end-to-end training for detections, (ii) consistency with the fuzzy nature of intake gestures, and (iii) avoidance of hard-coded rules. Across two separate datasets, we quantify these benefits by showing relative $F_1$ score improvements between 2.0% and 6.2% over the two-stage approach for intake detection and eating vs. drinking recognition tasks, for both video and inertial sensors.
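
The paper's extended prefix beam search is more involved, but the basic step from frame-level probabilities to sparse detections can be illustrated with the standard greedy CTC decode: take the argmax path, collapse repeats, and drop blanks.

    import numpy as np

    def greedy_ctc_decode(frame_probs, blank=0):
        # frame_probs: (T, C) per-frame probabilities over C classes
        # (class 0 is the CTC blank). Each surviving symbol after the
        # collapse corresponds to one sparse gesture detection.
        path = frame_probs.argmax(axis=1)
        out, prev = [], blank
        for p in path:
            if p != blank and p != prev:
                out.append(int(p))
            prev = p
        return out

    T, C = 12, 3                       # classes: 0=blank, 1=eat, 2=drink
    probs = np.full((T, C), 0.1); probs[:, 0] = 0.8
    probs[3:5] = [0.1, 0.8, 0.1]       # an "eat" gesture over frames 3-4
    probs[8:10] = [0.1, 0.1, 0.8]      # a "drink" gesture over frames 8-9
    print(greedy_ctc_decode(probs))    # [1, 2]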


[71] 2008.03007

Modelling Multi-Agent Epistemic Planning in ASP. Theory and Practice of Logic Programming

Designing agents that reason and act upon the world has always been one of the main objectives of the Artificial Intelligence community. While for planning in "simple" domains the agents can solely rely on facts about the world, in several contexts, e.g., economy, security, justice and politics, the mere knowledge of the world could be insufficient to reach a desired goal. In these scenarios, epistemic reasoning, i.e., reasoning about agents' beliefs about themselves and about other agents' beliefs, is essential to design winning strategies. This paper addresses the problem of reasoning in multi-agent epistemic settings exploiting declarative programming techniques. In particular, the paper presents an actual implementation of a multi-shot Answer Set Programming-based planner that can reason in multi-agent epistemic settings, called PLATO (ePistemic muLti-agent Answer seT programming sOlver). The ASP paradigm enables a concise and elegant design of the planner, w.r.t. other imperative implementations, facilitating the development of formal verification of correctness. The paper shows how the planner, exploiting an ad-hoc epistemic state representation and the efficiency of ASP solvers, has competitive performance results on benchmarks collected from the literature. It is under consideration for acceptance in TPLP.


[72] 2008.03014

A Multi-Task Learning Approach for Human Action Detection and Ergonomics Risk Assessment

We propose a new approach to Human Action Evaluation (HAE) in long videos using graph-based multi-task modeling. Previous works in activity assessment either directly compute a metric using a detected skeleton or use the scene information to regress the activity score. These approaches are insufficient for accurate activity assessment since they only compute an average score over a clip, and do not consider the correlation between the joints and body dynamics. Moreover, they are highly scene-dependent which makes the generalizability of these methods questionable. We propose a novel multi-task framework for HAE that utilizes a Graph Convolutional Network backbone to embed the interconnection between human joints in the features. In this framework, we solve the Human Action Detection (HAD) problem as an auxiliary task to improve activity assessment. The HAD head is powered by an Encoder-Decoder Temporal Convolutional Network to detect activities in long videos and HAE uses a Long-Short-Term-Memory-based architecture. We evaluate our method on the UW-IOM and TUM Kitchen datasets and discuss the success and failure cases on these two datasets.


[73] 2008.03020

A Context-based Disambiguation Model for Sentiment Concepts Using a Bag-of-concepts Approach

With the widespread dissemination of user-generated content on social networks and online consumer systems such as Amazon, the quantity of opinionated information available on the Internet has increased. One of the main tasks of sentiment analysis is to detect polarity within a text. Existing polarity detection methods mainly focus on keywords and their naive frequency counts, but they pay less attention to the meanings and implicit dimensions of natural concepts. Although background knowledge plays a critical role in determining the polarity of concepts, it has been disregarded in polarity detection methods. This study presents a context-based model to disambiguate ambiguous sentiment concepts using commonsense knowledge. First, a model is presented to generate a source of ambiguous sentiment concepts based on SenticNet by computing the probability distribution. The model then uses a bag-of-concepts approach to remove ambiguities, together with semantic augmentation through ConceptNet to overcome lost knowledge. ConceptNet is a large-scale semantic network with a large number of commonsense concepts. In this paper, the pointwise mutual information (PMI) measure is used to select contextual concepts that have strong relationships with ambiguous concepts. The polarity of ambiguous concepts is then detected using positive/negative contextual concepts and the relationships of concepts in the semantic knowledge base. The text representation scheme is semantically enriched using Numberbatch, a word embedding model based on the concepts of the ConceptNet semantic network. The proposed model is evaluated on a corpus of product reviews from SemEval. The experimental results show an accuracy of 82.07%, demonstrating the effectiveness of the proposed model.
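
For reference, PMI between an ambiguous concept x and a candidate contextual concept y is log p(x, y) / (p(x) p(y)); a positive value means the two co-occur more often than chance (the counts below are made up):

    import math

    def pmi(p_xy, p_x, p_y):
        # Pointwise mutual information of two events.
        return math.log(p_xy / (p_x * p_y))

    N = 1000                            # corpus segments
    count_x, count_y, count_xy = 50, 80, 20
    print(pmi(count_xy / N, count_x / N, count_y / N))  # log(5) ≈ 1.61 > 0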


[74] 2008.03030

Deep Robust Clustering by Contrastive Learning

Recently, many unsupervised deep learning methods have been proposed to learn clustering with unlabelled data. By introducing data augmentation, most of the latest methods look at deep clustering from the perspective that the original image and its transformation should share a similar semantic clustering assignment. However, the representation features before the softmax activation function can be quite different even when the assignment probabilities are very similar, since softmax is only sensitive to the maximum value. This may result in high intra-class diversity in the representation feature space, which leads to unstable local optima and harms clustering performance. By investigating the internal relationship between mutual information and contrastive learning, we summarize a general framework that can turn any maximization of mutual information into the minimization of a contrastive loss. We apply it to both the semantic clustering assignments and the representation features and propose a novel method named Deep Robust Clustering by Contrastive Learning (DRC). Unlike existing methods, DRC aims to increase inter-class diversity and decrease intra-class diversity simultaneously, achieving more robust clustering results. Extensive experiments on six widely-adopted deep clustering benchmarks demonstrate the superiority of DRC in both stability and accuracy, e.g., attaining 71.6% mean accuracy on CIFAR-10, which is 7.1% higher than state-of-the-art results.
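
The contrastive side of such frameworks is typically an InfoNCE-style loss, whose minimization maximizes a lower bound on the mutual information between two views; a generic sketch (not DRC's exact loss):

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, tau=0.5):
        # z1, z2: (N, D) features of two augmentations of the same N images.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / tau       # (N, N) similarity matrix
        labels = torch.arange(len(z1))   # positives sit on the diagonal
        return F.cross_entropy(logits, labels)

    print(info_nce(torch.randn(32, 128), torch.randn(32, 128)).item())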


[75] 2008.03034

An Empirical Study on Developing Secure Mobile Health Apps: The Developers Perspective

Mobile apps exploit the embedded sensors and wireless connectivity of a device to empower users with portable computations, context-aware communication, and enhanced interaction. In particular, mobile health apps (mHealth apps for short) are becoming an integral part of mobile and pervasive computing to improve the availability and quality of healthcare services. Despite the offered benefits, mHealth apps face a critical challenge: the security of the health-critical data that is produced and consumed by the app. Several studies have revealed that security-specific issues of mHealth apps have not been adequately addressed. The objectives of this study are to empirically (a) investigate the challenges that hinder the development of secure mHealth apps, (b) identify practices to develop secure apps, and (c) explore the motivating factors that influence secure development. We conducted this study by collecting responses from 97 developers from 25 countries, across six continents, working in diverse teams and roles to develop mHealth apps for the Android, iOS, and Windows platforms. Qualitative analysis of the survey data yields (i) 8 critical challenges, (ii) a taxonomy of best practices to ensure security, and (iii) 6 motivating factors that impact secure mHealth apps. This research provides empirical evidence of practitioners' views and guidelines to develop the emerging and next generation of secure mHealth apps.


[76] 2008.03039

A boosted outlier detection method based on the spectrum of the Laplacian matrix of a graph

This paper explores a new outlier detection algorithm based on the spectrum of the Laplacian matrix of a graph, taking advantage of boosting together with sparse-data-based learners. The sparsity of the Laplacian matrix significantly decreases the computational burden, enabling a spectrum-based outlier detection method to be applied to larger datasets compared to spectral clustering. The method is competitive on synthetic datasets with commonly used outlier detection algorithms such as Isolation Forest and Local Outlier Factor.


[77] 2008.03042

PSCS: A Path-based Neural Model for Semantic Code Search

To obtain code snippets for reuse, programmers prefer to search for related documents, e.g., blogs or Q&A posts, instead of the code itself. The major reason is the semantic diversity and mismatch between queries and code snippets. Deep learning models have been proposed to address this challenge. Compared with approaches using information retrieval techniques, deep learning models do not suffer from the information loss caused by refining user intention into keywords. However, the performance of previous works is not satisfactory because they ignore the importance of code structure. When the semantics of code (e.g., identifier names, APIs) are ambiguous, code structure may be the only feature the model can utilize. In that case, previous works relearn the structural information from the lexical tokens of code, which is extremely difficult for a model without any domain knowledge. In this work, we propose PSCS, a path-based neural model for semantic code search. Our model encodes both the semantics and the structure of code represented by AST paths. We train and evaluate our model on 330k and 19k query-function pairs, respectively. The evaluation results demonstrate that PSCS achieves a SuccessRate of 47.6% and a Mean Reciprocal Rank (MRR) of 30.4% when considering the top-10 results with a match. The proposed approach significantly outperforms both DeepCS, the first approach to apply deep learning to the code search task, and CARLCS, a state-of-the-art approach that introduces a co-attentive representation learning model on the basis of DeepCS. The importance of code structure is demonstrated with an ablation study on code features, which offers insights for model design in further studies.


[78] 2008.03043

Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems

Multispectral pedestrian detection is capable of adapting to insufficient illumination conditions by leveraging color-thermal modalities. On the other hand, in-depth insights on how to fuse the two modalities effectively are still lacking. Compared with traditional pedestrian detection, we find that multispectral pedestrian detection suffers from modality imbalance problems, which hinder the optimization process of the dual-modality network and depress the performance of the detector. Inspired by this observation, we propose the Modality Balance Network (MBNet), which facilitates the optimization process in a much more flexible and balanced manner. Firstly, we design a novel Differential Modality Aware Fusion (DMAF) module to make the two modalities complement each other. Secondly, an illumination-aware feature alignment module selects complementary features according to the illumination conditions and aligns the two modality features adaptively. Extensive experimental results demonstrate that MBNet outperforms the state of the art on both the challenging KAIST and CVC-14 multispectral pedestrian datasets in terms of accuracy and computational efficiency. Code is available at https://github.com/CalayZhou/MBNet.


[79] 2008.03044

Energy Communities: From European Law to Numerical Modeling

In 2019, the European Union introduced two new actors in the European energy system: Renewable and Citizen Energy Communities (RECs and CECs). Modelling these two new actors and their effects on the energy system is crucial when implementing the European Legislation, incorporating energy communities (ECs) into the electric grid, planning ECs, and conducting academic research. This paper aims to bridge the gap between the letter of the law and numerical models of ECs. After introducing RECs and CECs, we list elements of the law to be considered by regulators, distribution system operators, EC planners, researchers, and other stakeholders when modelling ECs. Finally, we provide three case studies of EC models that explicitly include elements of the European Law.


[80] 2008.03046

Towards Using Probabilistic Models to Design Software Systems with Inherent Uncertainty

The adoption of machine learning (ML) components in software systems raises new engineering challenges. In particular, the inherent uncertainty regarding functional suitability and the operation environment makes architecture evaluation and trade-off analysis difficult. We propose a software architecture evaluation method called Modeling Uncertainty During Design (MUDD) that explicitly models the uncertainty associated with ML components and evaluates how it propagates through a system. The method supports reasoning over how architectural patterns can mitigate uncertainty and enables comparison of different architectures focused on the interplay between ML and classical software components. While our approach is domain-agnostic and suitable for any system where uncertainty plays a central role, we demonstrate it using a perception system for autonomous driving as an example.


[81] 2008.03050

A General Framework for Stable Roommates Problems using Answer Set Programming

The Stable Roommates problem (SR) is characterized by the preferences of agents over other agents as roommates: each agent ranks all others in strict order of preference. A solution to SR is then a partition of the agents into pairs so that each pair shares a room, and there is no pair of agents who would block this matching (i.e., who prefer each other to their roommates in the matching). There are interesting variations of SR that are motivated by applications (e.g., the preference lists may be incomplete (SRI) and involve ties (SRTI)), and that try to find a more fair solution (e.g., Egalitarian SR). Unlike the Stable Marriage problem, not every SR instance is guaranteed to have a solution. For that reason, there are also variations of SR that try to find a good-enough solution (e.g., Almost SR). Most of these variations are NP-hard. We introduce a formal framework, called SRTI-ASP, utilizing the logic programming paradigm Answer Set Programming, that is provable and general enough to solve many such variations of SR. Our empirical analysis shows that SRTI-ASP is also promising for applications. This paper is under consideration for acceptance in TPLP.


[82] 2008.03053

Ontology-based Graph Visualization for Summarized View

Data summarization, which presents a small subset of a dataset to users, has been widely applied in numerous applications and systems. Many datasets are coded with hierarchical terminologies, e.g., the International Classification of Diseases-9, Medical Subject Headings, and the Gene Ontology, to name a few. In this paper, we study the problem of selecting a diverse set of k elements to summarize an input dataset with hierarchical terminologies, and of visualizing the summary in an ontology structure. We propose an efficient greedy algorithm to solve the problem with a (1-1/e) ≈ 63% approximation guarantee. Although this greedy solution achieves approximate, quality-guaranteed answers, it is still not optimal. To tackle the problem optimally, we further develop a dynamic programming algorithm, called OVDO, to obtain optimal answers for graph visualization of log-data using ontology terminologies. The complexity and correctness of OVDO are theoretically analyzed. In addition, we propose a useful tree-reduction optimization that removes useless nodes with zero weight and shrinks the tree into a smaller one, which accelerates OVDO in many real-world applications. Extensive experimental results on real-world datasets show the effectiveness and efficiency of our proposed approximate and exact algorithms for tree data summarization.
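
The (1-1/e) guarantee comes from greedily maximizing a monotone submodular objective; a generic sketch with a toy coverage function (the paper's actual objective over ontology terminologies will differ):

    def greedy_summary(elements, k, coverage):
        # coverage(S) must be monotone submodular; picking the element with
        # the largest marginal gain k times gives a (1 - 1/e) guarantee.
        chosen = []
        for _ in range(k):
            best = max((e for e in elements if e not in chosen),
                       key=lambda e: coverage(chosen + [e]) - coverage(chosen))
            chosen.append(best)
        return chosen

    covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5, 6}}
    coverage = lambda S: len(set().union(*(covers[e] for e in S))) if S else 0
    print(greedy_summary(list(covers), 2, coverage))  # ['a', 'd']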


[83] 2008.03061

Hierarchical Clusterings of Unweighted Graphs

We study the complexity of finding an optimal hierarchical clustering of an unweighted similarity graph under the recently introduced Dasgupta objective function. We introduce a proof technique, called the normalization procedure, that takes any such clustering of a graph $G$ and iteratively improves it until a desired target clustering of $G$ is reached. We use this technique to show both a negative and a positive complexity result. Firstly, we show that in general the problem is NP-complete. Secondly, we consider min-well-behaved graphs, which are graphs $H$ having the property that for any $k$, the graph $H(k)$, the join of $k$ copies of $H$, has an optimal hierarchical clustering that splits each copy of $H$ in the same optimal way. To optimally cluster such a graph $H(k)$ we thus only need to optimally cluster the smaller graph $H$. Co-bipartite graphs are min-well-behaved, but otherwise they seem to be scarce. We use the normalization procedure to show that the cycle on 6 vertices is also min-well-behaved.
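
For an unweighted graph, Dasgupta's objective charges each edge the number of leaves below the lowest common ancestor of its endpoints, and an optimal hierarchical clustering minimizes the total charge; a small reference implementation:

    def dasgupta_cost(edges, hierarchy):
        # hierarchy: nested tuples over leaf ids, e.g. ((0, 1), (2, 3)).
        def leaves(t):
            return {t} if not isinstance(t, tuple) else set().union(*map(leaves, t))
        def lca_size(t, u, v):
            for child in (t if isinstance(t, tuple) else ()):
                l = leaves(child)
                if u in l and v in l:
                    return lca_size(child, u, v)
            return len(leaves(t))
        return sum(lca_size(hierarchy, u, v) for u, v in edges)

    # A 4-cycle clustered as two pairs: cost 2 + 4 + 2 + 4 = 12.
    print(dasgupta_cost([(0, 1), (1, 2), (2, 3), (3, 0)], ((0, 1), (2, 3))))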


[84] 2008.03064

A Surgery of the Neural Architecture Evaluators

Neural architecture search (NAS) has recently received extensive attention due to its effectiveness in automatically designing effective neural architectures. A major challenge in NAS is to conduct a fast and accurate evaluation of neural architectures. Commonly used fast architecture evaluators include one-shot evaluators (including weight-sharing and hypernet-based ones) and predictor-based evaluators. Despite their high evaluation efficiency, the evaluation correlation of these evaluators is still questionable. In this paper, we conduct an extensive assessment of both one-shot and predictor-based evaluators on the NAS-Bench-201 benchmark search space, and break down how and why different factors influence the evaluation correlation and other NAS-oriented criteria.


[85] 2008.03066

A Game-Theoretic Drone-as-a-Service Composition for Delivery

We propose a novel game-theoretic approach for drone service composition considering recharging constraints. We design a non-cooperative game model for drone services. We propose a non-cooperative game algorithm for the selection and composition of optimal drone services. We conduct several experiments on a real drone dataset to demonstrate the efficiency of our proposed approach.


[86] 2008.03069

Spacecraft Collision Avoidance Challenge: design and results of a machine learning competition

Spacecraft collision avoidance procedures have become an essential part of satellite operations. Complex and constantly updated estimates of the collision risk between orbiting objects inform the various operators who can then plan risk mitigation measures. Such measures could be aided by the development of suitable machine learning models predicting, for example, the evolution of the collision risk in time. In an attempt to study this opportunity, the European Space Agency released, in October 2019, a large curated dataset containing information about close approach events, in the form of Conjunction Data Messages (CDMs), collected from 2015 to 2019. This dataset was used in the Spacecraft Collision Avoidance Challenge, a machine learning competition where participants had to build models to predict the final collision risk between orbiting objects. This paper describes the design and results of the competition and discusses the challenges and lessons learned when applying machine learning methods to this problem domain.


[87] 2008.03071

Oversampling Adversarial Network for Class-Imbalanced Fault Diagnosis

The data collected from industrial machines are often imbalanced, which has a negative effect on learning algorithms. The problem becomes more challenging for mixed types of data or when classes overlap. The class-imbalance problem requires a robust learning system that can predict and classify the data in a timely manner. We propose a new adversarial network for simultaneous classification and fault detection. In particular, we restore the balance in the imbalanced dataset by generating faulty samples from the proposed mixture of data distributions. We design the discriminator of our model to handle the generated faulty samples so as to prevent outliers and overfitting. We empirically demonstrate that (i) the discriminator, trained with a generator that produces samples from a mixture of normal and faulty data distributions, can be considered a fault detector, and (ii) the quality of the generated faulty samples surpasses that of other synthetic resampling techniques. Experimental results show that the proposed model performs well compared to other fault diagnosis methods across several evaluation metrics; in particular, the combination of a generative adversarial network (GAN) and a feature matching function is effective at recognizing faulty samples.


[88] 2008.03072

Optimizing Information Loss Towards Robust Neural Networks

Neural Networks (NNs) are vulnerable to adversarial examples. Such inputs differ only slightly from their benign counterparts yet provoke misclassifications by the attacked NNs. The perturbations required to craft the examples are often negligible and even human-imperceptible. To protect deep-learning-based systems from such attacks, several countermeasures have been proposed, with adversarial training still being considered the most effective. Here, NNs are iteratively retrained using adversarial examples, forming a computationally expensive and time-consuming process that often leads to a performance decrease. To overcome the downsides of adversarial training while still providing a high level of security, we present a new training approach we call entropic retraining. Based on an information-theoretic analysis, entropic retraining mimics the effects of adversarial training without the need for the laborious generation of adversarial examples. We empirically show that entropic retraining leads to a significant increase in NNs' security and robustness while relying only on the given original data. With our prototype implementation we validate and show the effectiveness of our approach for various NN architectures and data sets.


[89] 2008.03075

On Packet Reordering in Time-Sensitive Networks

In time-sensitive networks (as in the context of IEEE TSN and IETF DetNet), some limited amount of packet reordering may be acceptable. Re-sequencing buffers are then used to provide in-order delivery of packets; they depend on critical parameters (timeout value and buffer size) and may affect the worst-case delay and delay jitter. However, there is no precise understanding of per-flow reordering metrics nor of the dimensioning of re-sequencing buffers in order to provide worst-case guarantees, as required in time-sensitive networks. We close this gap as follows. First, we show that a previously proposed per-flow reordering metric, namely, reordering late time offset (RTO) determines the timeout value. If the network can be assumed to be lossless, another previously defined metric, the reordering byte offset (RBO), determines the required buffer size. However, if packet losses cannot be ignored, the required buffer may be larger than the RBO, and the exact value depends on a combination of the delay jitter between the source and the re-sequencing buffer, an arrival curve of the flow at its source, and the timeout. Then we develop a method to compute the RTO for a flow path; the method uses a novel relation with delay jitter and arrival curve, together with a decomposition of the path into non order-preserving and order-preserving elements. We also analyze the effect of re-sequencing buffers on worst-case delay, delay jitter and arrival curve propagation. We show in particular that, in a lossless (but non order-preserving) network, re-sequencing is "for free", in the sense that it does not increase worst-case delay nor delay jitter, whereas in a lossy network, re-sequencing increases the worst-case delay and delay jitter. We apply the analysis to evaluate the performance impact of placing re-sequencing buffers at intermediate points and illustrate the results on two industrial test cases.
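
A toy model of the re-sequencing buffer under discussion, with a deliberately simplified timeout policy (a gap at the head of the buffer is declared lost once the stored packet behind it has waited for the timeout); the RTO/RBO dimensioning analyzed above is not reproduced here.

    import heapq

    class ResequencingBuffer:
        def __init__(self, timeout):
            self.timeout, self.next_seq, self.heap = timeout, 0, []

        def push(self, seq, now):
            heapq.heappush(self.heap, (seq, now))
            return self._drain(now)

        def _drain(self, now):
            out = []
            while self.heap:
                seq, arrived = self.heap[0]
                if seq == self.next_seq:              # in-order head: release
                    out.append(heapq.heappop(self.heap)[0])
                    self.next_seq += 1
                elif now - arrived >= self.timeout:   # timed out: declare the
                    self.next_seq = seq               # missing packets lost
                else:
                    break
            return out

    buf = ResequencingBuffer(timeout=5)
    print(buf.push(1, 0.0))   # []: waiting for packet 0
    print(buf.push(0, 1.0))   # [0, 1]: gap filled, both released in order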


[90] 2008.03077

Deep Ordinal Regression Forests

Ordinal regression is a type of regression technique used for predicting an ordinal variable. Recent methods formulate an ordinal regression problem as a series of binary classification problems. Such methods cannot ensure that the global ordinal relationship is preserved, since the relationships among the different binary classifiers are neglected. We propose a novel ordinal regression approach called Deep Ordinal Regression Forests (DORFs), which is constructed with differentiable decision trees to obtain precise and stable global ordinal relationships. The advantages of the proposed DORFs are twofold. First, instead of learning a series of binary classifiers independently, the proposed method learns an ordinal distribution for ordinal regression. Second, the differentiable decision trees can be trained together with the ordinal distribution in an end-to-end manner. The effectiveness of the proposed DORFs is verified on two ordinal regression tasks, i.e., facial age estimation and image aesthetic assessment, showing significant improvements and better stability over state-of-the-art ordinal regression methods.
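
For contrast, the series-of-binary-classifiers formulation that DORFs improve on can be written in a few lines: classifier k answers "is the label greater than k?", and the predicted rank is the number of positive answers; trained independently, the answers need not be monotone, which is the inconsistency noted above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    # Toy ordinal labels 0..6, derived from the first feature.
    y = np.digitize(X[:, 0], np.quantile(X[:, 0], np.linspace(1/7, 6/7, 6)))

    K = int(y.max()) + 1
    clfs = [LogisticRegression(max_iter=1000).fit(X, (y > k).astype(int))
            for k in range(K - 1)]
    pred = sum(c.predict(X) for c in clfs)   # rank = #{k : predicted y > k}
    print("MAE:", np.abs(pred - y).mean())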


[91] 2008.03082

Perception Score, A Learned Metric for Open-ended Text Generation Evaluation

Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show a low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score. The method measures the overall quality of the generation and scores holistically instead of focusing on a single evaluation criterion, such as word overlap. Moreover, it also reports the amount of uncertainty about its evaluation result. By incorporating this uncertainty, Perception Score gives a more accurate evaluation of a generation system. Perception Score achieves state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.


[92] 2008.03085

SimPatch: A Nearest Neighbor Similarity Match between Image Patches

Measuring the similarity between patches in images is a fundamental building block of various tasks. Naturally, the patch size has a major impact on the matching quality and on the consequent application performance. We use large patches instead of relatively small ones so that each patch contains more information. We use different feature extraction mechanisms to extract the features of each individual image patch, which together form a feature matrix, and find the nearest-neighbor patches in the image. For a query patch in a given image, the nearest patches are computed using two different nearest-neighbor algorithms, and the results are demonstrated in this paper.
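
A minimal version of the pipeline using raw pixels as features and one of the standard nearest-neighbor algorithms from scikit-learn (the paper's feature extractors and second algorithm are not reproduced here):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.feature_extraction.image import extract_patches_2d

    image = np.random.default_rng(0).random((64, 64))
    patches = extract_patches_2d(image, (16, 16))   # all overlapping patches
    feats = patches.reshape(len(patches), -1)       # raw pixels as features

    nn = NearestNeighbors(n_neighbors=5).fit(feats)
    dist, idx = nn.kneighbors(feats[100:101])       # patch 100 as the query
    print(idx[0])   # indices of the 5 closest patches (the query comes first)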


[93] 2008.03087

Cascade Graph Neural Networks for RGB-D Salient Object Detection

In this paper, we study the problem of salient object detection (SOD) for RGB-D images using both color and depth information. A major technical challenge in performing salient object detection from RGB-D images is how to fully leverage the two complementary data sources. Current works either simply distill prior knowledge from the corresponding depth map for handling the RGB image, or blindly fuse color and geometric information to generate coarse depth-aware representations, hindering the performance of RGB-D saliency detectors. In this work, we introduce Cascade Graph Neural Networks (Cas-Gnn), a unified framework which is capable of comprehensively distilling and reasoning about the mutual benefits between these two data sources through a set of cascade graphs, to learn powerful representations for RGB-D salient object detection. Cas-Gnn processes the two data sources individually and employs a novel Cascade Graph Reasoning (CGR) module to learn powerful dense feature embeddings, from which the saliency map can be easily inferred. In contrast to previous approaches, the explicit modeling and reasoning of high-level relations between complementary data sources allows us to better overcome challenges such as occlusions and ambiguities. Extensive experiments demonstrate that Cas-Gnn achieves significantly better performance than all existing RGB-D SOD approaches on several widely-used benchmarks.


[94] 2008.03091

Low-Congestion Shortcuts for Graphs Excluding Dense Minors

We prove that any $n$-node graph $G$ with diameter $D$ admits shortcuts with congestion $O(\delta D \log n)$ and dilation $O(\delta D)$, where $\delta$ is the maximum edge-density of any minor of $G$. Our proof is simple, elementary, and constructive - featuring a $\tilde{\Theta}(\delta D)$-round distributed construction algorithm. Our results are tight up to $\tilde{O}(1)$ factors and generalize, simplify, unify, and strengthen several prior results. For example, for graphs excluding a fixed minor, i.e., graphs with constant $\delta$, only a $\tilde{O}(D^2)$ bound was known based on a very technical proof that relies on the Robertson-Seymour Graph Structure Theorem. A direct consequence of our result is that many graph families, including any minor-excluded ones, have near-optimal $\tilde{\Theta}(D)$-round distributed algorithms for many fundamental communication primitives and optimization problems including minimum spanning tree, minimum cut, and shortest-path approximations.


[95] 2008.03095

Boosting Parallel Influence-Maximization Kernels for Undirected Networks with Fusing and Vectorization

Influence maximization (IM) is the problem of finding a seed vertex set which is expected to incur the maximum influence spread on a graph. It has various applications in practice, such as devising an effective and efficient approach to disseminate information, news, or ads within a social network. The problem is shown to be NP-hard, and approximation algorithms with provable quality guarantees exist in the literature. However, these algorithms are computationally expensive even for medium-scale graphs. Furthermore, graph algorithms usually suffer from spatial and temporal irregularities during memory accesses, and this adds an extra cost on top of the already expensive IM kernels. In this work, we leverage fused sampling, memoization, and vectorization to restructure and parallelize these kernels and boost their performance on undirected networks. The proposed approach employs a pseudo-random function and performs multiple Monte-Carlo simulations in parallel to exploit the SIMD lanes effectively and efficiently. Besides, it significantly reduces the number of edge traversals, hence the amount of data brought from memory, which is critical for almost all memory-bound graph kernels. We apply the proposed approach to the traditional MixGreedy algorithm and propose Infuser, which is more than 3000 times faster than traditional greedy approaches and can run on large graphs that have been considered too large in the literature.
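
The fused, vectorized kernels of Infuser are hardware-specific, but the core idea of running many Monte-Carlo cascades in lockstep can be sketched in a few lines of numpy; the toy graph, edge probability, and lane count below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def ic_spread(adj, seeds, p=0.1, sims=64, rng=None):
    """Estimate independent-cascade influence spread: `sims` Monte-Carlo runs
    advance together, one boolean column per run (a poor man's SIMD lane)."""
    rng = rng or np.random.default_rng(0)
    n = len(adj)
    active = np.zeros((n, sims), dtype=bool)
    frontier = np.zeros((n, sims), dtype=bool)
    active[seeds] = frontier[seeds] = True
    while frontier.any():
        new_frontier = np.zeros_like(frontier)
        for u in np.nonzero(frontier.any(axis=1))[0]:
            for v in adj[u]:
                # the edge (u, v) fires independently in each simulation lane
                fires = frontier[u] & (rng.random(sims) < p) & ~active[v]
                new_frontier[v] |= fires
        active |= new_frontier
        frontier = new_frontier
    return active.sum(axis=0).mean()       # expected number of activated nodes

adj = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}   # tiny directed test graph
print(ic_spread(adj, seeds=[0]))
```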


[96] 2008.03100

Conflict Generalisation in ASP: Learning Correct and Effective Non-Ground Constraints

Generalising and re-using knowledge learned while solving one problem instance has been neglected by state-of-the-art answer set solvers. We suggest a new approach that generalises learned nogoods for re-use to speed up the solving of future problem instances. Our solution combines well-known ASP solving techniques with deductive logic-based machine learning. Solving performance can be improved by adding learned non-ground constraints to the original program. We demonstrate the effects of our method by means of realistic examples, showing that our approach requires low computational cost to learn constraints that yield significant performance benefits in our test cases. These benefits can be seen with ground-and-solve systems as well as lazy-grounding systems. However, ground-and-solve systems suffer from additional grounding overheads, induced by the additional constraints in some cases. By means of conflict minimization, non-minimal learned constraints can be reduced. This can result in significant reductions of grounding and solving efforts, as our experiments show. (Under consideration for acceptance in TPLP.)


[97] 2008.03101

Privacy Guarantees for De-identifying Text Transformations

Machine Learning approaches to Natural Language Processing tasks benefit from a comprehensive collection of real-life user data. At the same time, there is a clear need for protecting the privacy of the users whose data is collected and processed. For text collections, such as transcripts of voice interactions or patient records, replacing sensitive parts with benign alternatives can provide de-identification. However, how much privacy is actually guaranteed by such text transformations, and are the resulting texts still useful for machine learning? In this paper, we derive formal privacy guarantees for general text transformation-based de-identification methods on the basis of Differential Privacy. We also measure the effect that different ways of masking private information in dialog transcripts have on a subsequent machine learning task. To this end, we formulate different masking strategies and compare their privacy-utility trade-offs. In particular, we compare a simple redact approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks such as named entity recognition, intent detection, and dialog act classification. We find that only word-by-word replacement is robust against performance drops in various tasks.
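
A toy illustration of the two masking strategies being compared (in the paper the replacements come from deep learning models; the entity dictionary and surrogates below are hypothetical stand-ins):

```python
SENSITIVE = {"alice": "PERSON", "berlin": "CITY", "4711": "NUMBER"}

def redact(text):
    """Simple redact: every sensitive token collapses to one opaque tag."""
    return " ".join("[REDACTED]" if t.lower() in SENSITIVE else t
                    for t in text.split())

def replace_word_by_word(text):
    """Word-by-word replacement: swap each sensitive token for a benign
    surrogate of the same type (a real system would generate these)."""
    surrogates = {"PERSON": "Dana", "CITY": "Springfield", "NUMBER": "1234"}
    return " ".join(surrogates[SENSITIVE[t.lower()]] if t.lower() in SENSITIVE
                    else t for t in text.split())

utterance = "Alice flew to Berlin on flight 4711"
print(redact(utterance))                 # [REDACTED] flew to [REDACTED] ...
print(replace_word_by_word(utterance))   # Dana flew to Springfield ...
```

Redaction destroys sentence structure that downstream NLU models rely on, which is one intuition for why word-by-word replacement preserves task performance better.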


[98] 2008.03103

Gluing resource proof-structures: inhabitation and inverting the Taylor expansion

A Multiplicative-Exponential Linear Logic (MELL) proof-structure can be expanded into a set of resource proof-structures: its Taylor expansion. We introduce a new criterion characterizing those sets of resource proof-structures that are part of the Taylor expansion of some MELL proof-structure, through a rewriting system acting both on resource and MELL proof-structures. As a consequence, we also prove the semi-decidability of the type inhabitation problem for MELL proof-structures.


[99] 2008.03107

Helix: Algorithm/Architecture Co-design for Accelerating Nanopore Genome Base-calling

Nanopore genome sequencing is the key to enabling personalized medicine, global food security, and virus surveillance. The state-of-the-art base-callers adopt deep neural networks (DNNs) to translate electrical signals generated by nanopore sequencers to digital DNA symbols. A DNN-based base-caller consumes $44.5\%$ of the total execution time of a nanopore sequencing pipeline. However, it is difficult to quantize a base-caller and build a power-efficient processing-in-memory (PIM) to run the quantized base-caller. In this paper, we propose a novel algorithm/architecture co-designed PIM, Helix, to power-efficiently and accurately accelerate nanopore base-calling. From the algorithm perspective, we present systematic error aware training to minimize the number of systematic errors in a quantized base-caller. From the architecture perspective, we propose a low-power SOT-MRAM-based ADC array to process analog-to-digital conversion operations and improve the power efficiency of prior DNN PIMs. Moreover, we revise a traditional NVM-based dot-product engine to accelerate CTC decoding operations and create a SOT-MRAM binary comparator array to process read voting. Compared to state-of-the-art PIMs, Helix improves base-calling throughput by $6\times$, throughput per Watt by $11.9\times$ and per $mm^2$ by $7.5\times$ without degrading base-calling accuracy.


[100] 2008.03108

On the Distribution of the Sum of Málaga-$\mathcal{M}$ Random Variables and Applications

In this paper, a very accurate approximation method for the statistics of the sum of M\'{a}laga-$\mathcal{M}$ random variates with pointing error (MRVs) is proposed. In particular, the probability density function of an MRV is approximated by a Fox's $H$-function through the moment-based approach. Then, the respective moment-generating function of the sum of $N$ MRVs is provided, based on which the average symbol error rate is evaluated for an $N$-branch maximal-ratio combining (MRC) receiver. The retrieved results show that the proposed approximate results match the exact simulated ones accurately. Additionally, the results show that the achievable diversity order increases as a function of the number of MRC diversity branches.
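
For background, the MGF machinery underlying such an analysis is standard textbook material (not the paper's Fox's $H$-function expressions): for independent branch SNRs the MGF of the combined MRC SNR factorizes, and the average SER of, e.g., M-PSK then follows from the MGF approach:

```latex
M_{\gamma_\Sigma}(s) = \prod_{i=1}^{N} M_{\gamma_i}(s),
\qquad
\bar{P}_s = \frac{1}{\pi} \int_{0}^{(M-1)\pi/M}
  M_{\gamma_\Sigma}\!\left(-\frac{\sin^{2}(\pi/M)}{\sin^{2}\theta}\right) d\theta .
```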


[101] 2008.03110

A Technique for Determining Relevance Scores of Process Activities using Graph-based Neural Networks

Process models generated through process mining depict the as-is state of a process. Through annotations with metrics such as the frequency or duration of activities, these models provide generic information to the process analyst. To improve business processes with respect to performance measures, process analysts require further guidance from the process model. In this study, we design Graph Relevance Miner (GRM), a technique based on graph neural networks, to determine the relevance scores for process activities with respect to performance measures. Annotating process models with such relevance scores facilitates a problem-focused analysis of the business process, placing these problems at the centre of the analysis. We quantitatively evaluate the predictive quality of our technique using four datasets from different domains, to demonstrate the faithfulness of the relevance scores. Furthermore, we present the results of a case study, which highlight the utility of the technique for organisations. Our work has important implications both for research and business applications, because process model-based analyses feature shortcomings that need to be urgently addressed to realise successful process mining at an enterprise level.


[102] 2008.03111

Associative Partial Domain Adaptation

Partial domain adaptation (PDA) addresses a practical scenario in which the target domain contains only a subset of the classes in the source domain. While PDA should take both class-level and sample-level information into account to mitigate negative transfer, current approaches mostly rely on only one of them. In this paper, we propose a novel approach to fully exploit multi-level associations that can arise in PDA. Our Associative Partial Domain Adaptation (APDA) utilizes intra-domain association to actively filter out non-trivial anomaly samples in each source-private class that sample-level weighting cannot handle. Additionally, our method considers inter-domain association to encourage positive transfer by mapping between nearby target samples and source samples with high label-commonness. For this, we exploit feature propagation in a proposed label space consisting of source ground-truth labels and target probabilistic labels. We further propose a geometric guidance loss based on the label-commonness of each source class to encourage positive transfer. Our APDA consistently achieves state-of-the-art performance across public datasets.


[103] 2008.03115

Approximating Constraint Satisfaction Problems Symmetrically

This thesis investigates the extent to which the optimal value of a constraint satisfaction problem (CSP) can be approximated by some sentence of fixed point logic with counting (FPC). It is known that, assuming $\mathsf{P} \neq \mathsf{NP}$ and the Unique Games Conjecture, the best polynomial time approximation algorithm for any CSP is given by solving and rounding a specific semidefinite programming relaxation. We prove an analogue of this result for algorithms that are definable as FPC-interpretations, which holds without the assumption that $\mathsf{P} \neq \mathsf{NP}$. While we are not able to drop (an FPC-version of) the Unique Games Conjecture as an assumption, we do present some partial results toward proving it. Specifically, we give a novel construction which shows that, for all $\alpha > 0$, there exists a positive integer $q = \text{poly}(\frac{1}{\alpha})$ such that there is no FPC-interpretation giving an $\alpha$-approximation of Unique Games on a label set of size $q$.


[104] 2008.03118

Super-relaxation of space-time-quantized ensemble of energy loads

Ensembles of thermostatically controlled loads (TCLs) provide a significant demand-response reserve for the system operator to balance power grids. However, this also results in the parasitic synchronization of individual devices within the ensemble, leading to long post-demand-response oscillations in the integrated energy consumption of the ensemble. The synchronization is eventually destroyed by fluctuations, thus leading to the (pre-demand-response) steady state; however, this natural desynchronization, or relaxation to a statistically steady state, takes too long. A resolution of this problem consists in measuring the ensemble's instantaneous consumption and using it as a feedback to stochastic switching of the ensemble's devices between on- and off-states. It was recently shown with a simplified continuous-time model that carefully tuned nonlinear feedback results in a fast relaxation of the ensemble energy consumption -- coined super-relaxation. Since both state information and control signals are discrete, the actual operation of TCL devices is space-time quantized, and this must be accounted for in realistic TCL ensemble modelling. Here, assuming that states are characterized by a temperature (quantifying comfort) and the air conditioner regime (on, off), we construct a discrete model based on the probabilistic description of state transitions. We demonstrate that super-relaxation holds in this more realistic setting, and that while it is stable against randomness in the stochastic matrix of the quantized model, it remains sensitive to the time-discretization scheme. Aiming to achieve a balance between super-relaxation and customer comfort, we analyze the dependence of super-relaxation on the details of the space-time quantization, and provide a simple analytical criterion to avoid undesirable oscillations in consumption.


[105] 2008.03122

On the Decryption of Enigma -- How an 18th-Century Reverend Contributed to the Defeat of the German U-boats During the Second World War

This article, written in Italian, explores the contribution of Bayes' rule and of the ideas of subjective probability to the work at Bletchley Park that led to cracking Enigma-ciphered messages during WWII.


[106] 2008.03124

Design Space Exploration of Power Delivery For Advanced Packaging Technologies

In this paper, a design space exploration of power delivery networks is performed for multi-chip 2.5-D and 3-D IC technologies. The focus of the paper is the effective placement of the voltage regulator modules (VRMs) for power supply noise (PSN) suppression. Multiple on-package VRM configurations have been analyzed and compared. Additionally, 3D IC chip-on-VRM and backside-of-the-package VRM configurations are studied. From the PSN perspective, the 3D IC chip-on-VRM case suppresses the PSN the most, even with high current density hotspots. The paper also studies the impact on the PSN of different parameters such as the VRM-chip distance on the package, the on-chip decoupling capacitor density, etc.


[107] 2008.03128

Revisiting Mid-Level Patterns for Distant-Domain Few-Shot Recognition

Cross-domain few-shot learning (FSL) was recently proposed to transfer knowledge from general-domain known classes (e.g., ImageNet) to novel classes in other domains, and to recognize novel classes with only a few training samples. In this paper, we go further and define a more challenging scenario that transfers knowledge from general-domain known classes to novel classes in distant domains which are significantly different from the general domain, e.g., medical data. To solve this challenging problem, we propose to exploit mid-level features, which are more transferable yet under-explored in recent mainstream FSL works. To boost the discriminability of mid-level features, we propose a residual-prediction task for training on known classes. In this task, we view the current training sample as a sample from a pseudo-novel class, so as to provide simulated novel-class data. However, this simulated data is from the same domain as the known classes and shares high-level patterns with other known classes. Therefore, we then use high-level patterns from other known classes to represent it and remove this high-level representation from the simulated data, outputting a residual term containing discriminative information that cannot be represented by high-level patterns from other known classes. Then, mid-level features from multiple mid-layers are dynamically weighted to predict this residual term, which encourages the mid-level features to be discriminative. Notably, our method can be applied both to the regular in-domain FSL setting, by emphasizing high-level and transformed mid-level features, and to the distant-domain FSL setting, by emphasizing mid-level features. Experiments under both settings on six public datasets (including two challenging medical datasets) validate the rationale of the proposed method, demonstrating state-of-the-art performance in both settings.


[108] 2008.03129

Brain Drain and Brain Gain in Russia: Analyzing International Migration of Researchers by Discipline using Scopus Bibliometric Data 1996-2020

We study international mobility in academia with a focus on migration of researchers to and from Russia. Using all Scopus publications from 1996 to 2020, we analyze bibliometric data from over half a million researchers who have published with a Russian affiliation address at some point in their careers. Migration of researchers is observed through the changes in their affiliation addresses. For the first time, we analyze origins and destinations of migrant researchers with respect to their fields and performance and compute net migration rates based on incoming and outgoing flows. Our results indicate that while Russia has been a donor country in the late 1990s and early 2000s, it has experienced a relatively symmetric circulation of researchers in more recent years. Using subject categories of publications, we quantify the impact of migration on each field of scholarship. Our analysis shows that Russia has suffered a net loss in almost all disciplines and more so in neuroscience, decision sciences, dentistry, biochemistry, and mathematics. For economics and environmental science, there is a relatively balanced circulation of researchers to and from Russia. Our substantive results reveal new aspects of international mobility in academia and its impact on a national science system which speak directly to policy development. Methodologically, our new approach of handling big data can be adopted as a framework of analysis for studying scholarly migration in other countries.


[109] 2008.03130

Convolutional Complex Knowledge Graph Embeddings

In this paper, we study the problem of learning continuous vector representations of knowledge graphs for predicting missing links. We present a new approach called ConEx, which infers missing links by leveraging the composition of a 2D convolution with a Hermitian inner product of complex-valued embedding vectors. We evaluate ConEx against state-of-the-art approaches on the WN18RR, FB15K-237, KINSHIP and UMLS benchmark datasets. Our experimental results show that ConEx achieves a performance superior to that of state-of-the-art approaches such as RotatE, QuatE and TuckER on the link prediction task on all datasets while requiring at least 8 times fewer parameters. We ensure the reproducibility of our results by providing an open-source implementation which includes training and evaluation scripts along with pre-trained models at https://github.com/conex-kge/ConEx.


[110] 2008.03131

A $2^{O(k)}n$ algorithm for $k$-cycle in minor-closed graph families

Let ${\mathcal C}$ be a proper minor-closed family of graphs. We present a randomized algorithm that, given a graph $G \in {\mathcal C}$ with $n$ vertices, finds a simple cycle of size $k$ in $G$ (if one exists) in $2^{O(k)}n$ time. The algorithm applies to both directed and undirected graphs. In previous linear-time algorithms for this problem, the runtime dependence on $k$ is super-exponential. The algorithm can be derandomized, yielding a $2^{O(k)}n\log n$ time algorithm.


[111] 2008.03134

Transistors: A Network Science-Based Historical Perspective

The development of modern electronics was to a large extent related to the advent and popularization of bipolar junction technology. The present work applies science-of-science concepts and methodologies in order to develop a relatively systematic, quantitative study of the development of electronics from a bipolar-junction-centered perspective. First, we searched the adopted dataset (Microsoft Academic Graph) for entries related to "bipolar junction transistor". Community detection was then applied in order to derive sub-areas, which were tentatively labeled into 10 overall groups. This modular graph was then studied from several perspectives, including topological measurements and time evolution. A number of interesting results are reported, including a good level of thematic coherence within each identified area, as well as the identification of distinct periods along the time evolution, including the onset and coming of age of bipolar junction technology and related areas. A particularly surprising result was the verification of a stable interrelationship between the identified areas over time.


[112] 2008.03138

On the Potential of Extending Aircraft Service Time Using a Fatigue Damage Index

Aircraft structures experience various kinds of loads over their entire lifetime, leading to fatigue and ultimately structural failure. In order to avoid structural failures during operation, the maximum number of flight cycles and flight hours is regulated by laws ensuring continued airworthiness. However, since every flight impacts the aircraft differently, not all airframes have been equally stressed at the time of decommissioning. Therefore, a new retirement criterion based on the fatigue damage index (FDI) is proposed. The criterion takes into account that aircraft are operated differently and thus enables individual decommissioning of aircraft without compromising safety. Based on aircraft sample data covering 95% of the Airbus A320 fleet over two years, the enhanced decommissioning criterion is estimated to significantly extend the average aircraft service life. The impact varies within the fleet, depending on the experienced seat load factors, cruise altitudes, and taxi times considered for the individual aircraft during operation. While seat load factors and flight altitudes significantly affect the defined FDI, the influence of taxi times is only minor. Based on the estimated increase in aircraft service life, the paper at hand argues that, for service life extensions, the FDI should be considered as the limit of validity in the regulatory framework governing the decommissioning of aircraft.


[113] 2008.03140

Mathematical model of LoRaWAN channel access with capture effect

LoRaWAN is a promising low-power long-range wireless communications technology for the Internet of Things. An important feature of LoRaWAN gateways is the so-called capture effect: under some conditions the gateway may correctly receive a frame even if it overlaps with other ones. In this paper, we develop a pioneering mathematical model of a LoRaWAN network which allows one to find the network capacity and transmission reliability, taking into account the capture effect.


[114] 2008.03148

A note on the asymptotic stability of the Semi-Discrete method for Stochastic Differential Equations

We study the asymptotic stability of the semi-discrete (SD) numerical method for the approximation of stochastic differential equations. Recently, we examined the order of $\mathcal L^2$-convergence of the truncated SD method and showed that it can be arbitrarily close to $1/2$; see \textit{Stamatiou, Halidias (2019), Convergence rates of the Semi-Discrete method for stochastic differential equations, Theory of Stochastic Processes, 24(40)}. We show that the truncated SD method is able to preserve the asymptotic stability of the underlying SDE. Motivated by a numerical example, we also propose a different SD scheme, applying the Lamperti transformation to the original SDE, which we call the Lamperti semi-discrete (LSD) method. Numerical simulations support our theoretical findings.


[115] 2008.03154

Decomposition of Longitudinal Deformations via Beltrami Descriptors

We present a mathematical model to decompose a longitudinal deformation into normal and abnormal components. The goal is to detect and extract subtle quivers from periodic motions in a video sequence. It has important applications in medical image analysis. To achieve this goal, we consider a representation of the longitudinal deformation, called the Beltrami descriptor, based on quasiconformal theories. The Beltrami descriptor is a complex-valued matrix. Each longitudinal deformation is associated to a Beltrami descriptor and vice versa. To decompose the longitudinal deformation, we propose to carry out the low rank and sparse decomposition of the Beltrami descriptor. The low rank component corresponds to the periodic motions, whereas the sparse part corresponds to the abnormal motions of a longitudinal deformation. Experiments have been carried out on both synthetic and real video sequences. Results demonstrate the efficacy of our proposed model to decompose a longitudinal deformation into regular and irregular components.
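
A minimal numerical sketch of the low-rank plus sparse split described above (a simple alternating heuristic, not the paper's optimization scheme); the rank, threshold, and the toy complex-valued descriptor are assumptions for illustration.

```python
import numpy as np

def lowrank_sparse_split(D, rank=1, lam=0.1, iters=50):
    """Alternate a truncated SVD (low-rank, periodic part) with complex
    soft-thresholding of the residual (sparse, abnormal part)."""
    S = np.zeros_like(D)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank update
        R = D - L
        mag = np.abs(R)                                   # complex shrinkage
        S = np.where(mag > lam, (1 - lam / np.maximum(mag, 1e-12)) * R, 0)
    return L, S

# Toy complex "descriptor": one periodic mode plus an isolated spike.
t = np.linspace(0, 2 * np.pi, 100)
D = np.outer(np.exp(1j * t), np.ones(20))
D[40, 5] += 3.0                                           # the abnormal quiver
L, S = lowrank_sparse_split(D)
print(np.unravel_index(np.abs(S).argmax(), S.shape))      # -> (40, 5)
```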


[116] 2008.03156

Better Fine-Tuning by Reducing Representational Collapse

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampled from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models maintain more generalizable representations every time they are fine-tuned.
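
A minimal PyTorch sketch of the core idea, assuming a classification head: perturb the input embeddings with parametric noise and penalize the symmetric KL divergence between clean and noisy predictions. Function and parameter names are hypothetical, and this simplifies the paper's actual method.

```python
import torch
import torch.nn.functional as F

def noise_consistency_loss(model, embeds, labels, eps=1e-5, lam=1.0):
    """One step's loss: task loss plus a symmetric-KL penalty between
    predictions on clean and noise-perturbed input embeddings."""
    logits = model(embeds)
    noise = torch.empty_like(embeds).uniform_(-eps, eps)   # or normal noise
    logits_noisy = model(embeds + noise)
    task = F.cross_entropy(logits, labels)
    p, q = F.log_softmax(logits, -1), F.log_softmax(logits_noisy, -1)
    sym_kl = (F.kl_div(p, q, log_target=True, reduction="batchmean")
              + F.kl_div(q, p, log_target=True, reduction="batchmean"))
    return task + lam * sym_kl

# Toy usage: a linear "model" over 16-dim embeddings, batch of 8.
model = torch.nn.Linear(16, 3)
embeds, labels = torch.randn(8, 16), torch.randint(0, 3, (8,))
loss = noise_consistency_loss(model, embeds, labels)
loss.backward()
```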


[117] 2008.03164

IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection

We present the results of our system for SemEval-2020 Task 1 that exploits a commonly used lexical semantic change detection model based on Skip-Gram with Negative Sampling. Our system focuses on Vector Initialization (VI) alignment, compares VI to the currently top-ranking models for Subtask 2 and demonstrates that these can be outperformed if we optimize VI dimensionality. We demonstrate that differences in performance can largely be attributed to model-specific sources of noise, and we reveal a strong relationship between dimensionality and frequency-induced noise in VI alignment. Our results suggest that lexical semantic change models integrating vector space alignment should pay more attention to the role of the dimensionality parameter.


[118] 2008.03172

Orthologics for Cones

In applications that use knowledge representation (KR) techniques, in particular those that combine data-driven and logic methods, the domain of objects is not an abstract unstructured domain, but it exhibits a dedicated, deep structure of geometric objects. One example is the class of convex sets used to model natural concepts in conceptual spaces, which also links via convex optimization techniques to machine learning. In this paper we study logics for such geometric structures. Using the machinery of lattice theory, we describe an extension of minimal orthologic with a partial modularity rule that holds for closed convex cones. This logic combines a feasible data structure (exploiting convexity/conicity) with sufficient expressivity, including full orthonegation (exploiting conicity).


[119] 2008.03174

Adapting Nielsen's Usability Heuristics to the Context of Mobile Augmented Reality

Augmented reality (AR) has been an emerging technology in mobile app design in recent years. However, usability challenges in these apps are prominent. There are currently no established guidelines for designing and evaluating interactions in AR as there are for traditional user interfaces. In this work, we aimed to examine the usability of current mobile AR applications and to interpret classic usability heuristics in the context of mobile AR. In particular, we focused on AR home design apps because of their popularity and their ability to incorporate important mobile AR interaction schemas. Our findings indicate that it is important for designers to consider the unfamiliarity of AR technology to the vast majority of users and to take technological limitations into consideration when designing mobile AR apps. Our work serves as a first step toward establishing more general heuristics and guidelines for mobile AR.


[120] 2008.03182

Privacy-Preserving Dynamic Average Consensus via State Decomposition: Case Study on Multi-Robot Formation Control

In this paper, the problem of privacy preservation in the continuous-time dynamic average consensus is addressed by using a state decomposition scheme. We first show that for a conventional dynamic average consensus algorithm, the external eavesdropper can successfully wiretap the reference signals of each local agent. Then, to provide privacy protection against the eavesdropper, a state decomposition scheme is proposed. The main idea of the proposed scheme is to decompose the original state of each agent into two sub-states. One of the two sub-states succeeds the role of the original state in inter-node interactions, while the other sub-state is invisible to other neighboring agents and only communicates with the first sub-state of the same agent. The new reference signals for the two sub-states can be constructed randomly under certain constraints, which ensures that the convergence properties of the consensus algorithm can be retained. Theoretical analysis shows that under the state decomposition scheme, the eavesdropper cannot discover the private reference signals of each agent with any guaranteed accuracy. Moreover, the proposed privacy-preserving consensus algorithm is successfully applied to solve a formation control problem for multiple nonholonomic mobile robots. Numerical simulation is provided to demonstrate the effectiveness of the proposed approach.
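
A toy discrete-time illustration of the state-decomposition idea for the simpler static average-consensus case (the paper treats the continuous-time dynamic problem with reference signals): each agent transmits only the sub-state x_alpha, while x_beta and the random coupling stay private, yet both sub-states still converge to the true average. All numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])              # undirected triangle graph
x0 = np.array([1.0, 5.0, 9.0])            # private values; average is 5.0

# Split each state into two sub-states with x_a + x_b = 2 * x0: only x_a is
# ever transmitted, so an eavesdropper never sees x0 or the private coupling.
x_a = x0 + rng.uniform(-2.0, 2.0, 3)
x_b = 2.0 * x0 - x_a
coup = rng.uniform(0.5, 1.0, 3)           # private alpha-beta coupling gains
eps = 0.1
for _ in range(3000):
    lap = A @ x_a - A.sum(axis=1) * x_a   # ordinary consensus term on x_a
    x_a, x_b = (x_a + eps * (lap + coup * (x_b - x_a)),
                x_b + eps * coup * (x_a - x_b))
print(x_a.round(3), x_b.round(3))         # both sub-states -> 5.0
```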


[121] 2008.03185

Analysis of the second order BDF scheme with variable steps for the molecular beam epitaxial model without slope selection

In this work, we are concerned with the stability and convergence analysis of the second order BDF (BDF2) scheme with variable steps for the molecular beam epitaxial model without slope selection. We first show that the variable-step BDF2 scheme is convex and uniquely solvable under a weak time-step constraint. Then we show that it preserves an energy dissipation law if the adjacent time-step ratios satisfy $r_k:=\tau_k/\tau_{k-1}<3.561$. Moreover, with a novel discrete orthogonal convolution kernels argument and some new discrete convolutional inequalities, the $L^2$ norm stability and rigorous error estimates are established under the same step-ratio constraint that ensures the energy stability, i.e., $0<r_k<3.561$. This is known to be the best result in the literature. We finally adopt an adaptive time-stepping strategy to accelerate the computation of the steady-state solution and confirm our theoretical findings by numerical examples.
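
For reference, the variable-step BDF2 difference operator used in such analyses (standard material, not quoted from the paper) reads, with step ratio $r_k = \tau_k/\tau_{k-1}$:

```latex
D_2 u^k \;=\; \frac{1+2r_k}{1+r_k}\,\frac{u^k-u^{k-1}}{\tau_k}
\;-\; \frac{r_k^{2}}{1+r_k}\,\frac{u^{k-1}-u^{k-2}}{\tau_k},
```

which reduces to the familiar $\frac{3u^k - 4u^{k-1} + u^{k-2}}{2\tau}$ when $r_k = 1$.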


[122] 2008.03186

An integrated numerical model for coupled poro-hydro-mechanics and fracture propagation using embedded meshes

Integrated models for fluid-driven fracture propagation and general multiphase flow in porous media are valuable for the study and engineering of several systems, including hydraulic fracturing, underground disposal of waste, and geohazard mitigation. This work extends the coupled multiphase flow and poromechanics model of \cite{ren2018embedded} to admit fracture propagation (FP). The coupled XFEM-EDFM scheme utilizes a separate fracture mesh that is embedded in a static background mesh. The onset and dynamics of FP are governed by the equivalent stress intensity factor (SIF) criterion. A domain-integral method (J-integral) is applied to compute this information. An adaptive time-marching scheme is proposed to rapidly restrict and grow the temporal resolution to match the underlying time scales. The proposed model is verified against analytical solutions, and shows the capability to accurately and adaptively co-simulate fluid transport and deformation as well as the propagation of multiple fractures.


[123] 2008.03195

A Study on Visual Perception of Light Field Content

The effective design of visual computing systems depends heavily on the anticipation of visual attention, or saliency. While visual attention is well investigated for conventional 2D images and video, it is nevertheless a very active research area for emerging immersive media. In particular, visual attention of light fields (light rays of a scene captured by a grid of cameras or micro-lenses) has only recently become a focus of research. As they may be rendered and consumed in various ways, a primary challenge that arises is the definition of what visual perception of light field content should be. In this work, we present a visual attention study on light field content. We conducted perception experiments, displaying light field content to users in various ways, and collected the corresponding visual attention data. Our analysis highlights characteristics of user behaviour in light field imaging applications. The light field data set and attention data are provided with this paper.


[124] 2008.03201

Convolutional neural network based deep-learning architecture for intraprostatic tumour contouring on PSMA PET images in patients with primary prostate cancer

Accurate delineation of the intraprostatic gross tumour volume (GTV) is a prerequisite for treatment approaches in patients with primary prostate cancer (PCa). Prostate-specific membrane antigen positron emission tomography (PSMA-PET) may outperform MRI in GTV detection. However, visual GTV delineation is subject to interobserver heterogeneity and is time-consuming. The aim of this study was to develop a convolutional neural network (CNN) for automated segmentation of intraprostatic tumour (GTV-CNN) in PSMA-PET. Methods: The CNN (3D U-Net) was trained on [68Ga]PSMA-PET images of 152 patients from two different institutions and the training labels were generated manually using a validated technique. The CNN was tested on two independent internal (cohort 1: [68Ga]PSMA-PET, n=18 and cohort 2: [18F]PSMA-PET, n=19) and one external (cohort 3: [68Ga]PSMA-PET, n=20) test-datasets. Accordance between manual contours and GTV-CNN was assessed with the Dice-S{\o}rensen coefficient (DSC). Sensitivity and specificity were calculated for the two internal test-datasets by using whole-mount histology. Results: Median DSCs for cohorts 1-3 were 0.84 (range: 0.32-0.95), 0.81 (range: 0.28-0.93) and 0.83 (range: 0.32-0.93), respectively. Sensitivities and specificities for GTV-CNN were comparable with manual expert contours: 0.98 and 0.76 (cohort 1) and 1 and 0.57 (cohort 2), respectively. Computation time was around 6 seconds for a standard dataset. Conclusion: The application of a CNN for automated contouring of intraprostatic GTV in [68Ga]PSMA- and [18F]PSMA-PET images resulted in a high concordance with expert contours and in high sensitivities and specificities in comparison with the histology reference. This robust, accurate and fast technique may be implemented for treatment concepts in primary PCa. The trained model and the study's source code are available in an open source repository.


[125] 2008.03202

Middle-Aged Video Consumers' Beliefs About Algorithmic Recommendations on YouTube

User beliefs about algorithmic systems are constantly co-produced through user interaction and the complex socio-technical systems that generate recommendations. Identifying these beliefs is crucial because they influence how users interact with recommendation algorithms. With no prior work on user beliefs of algorithmic video recommendations, practitioners lack relevant knowledge to improve the user experience of such systems. To address this problem, we conducted semi-structured interviews with middle-aged YouTube video consumers to analyze their user beliefs about the video recommendation system. Our analysis revealed different factors that users believe influence their recommendations. Based on these factors, we identified four groups of user beliefs: Previous Actions, Social Media, Recommender System, and Company Policy. Additionally, we propose a framework to distinguish the four main actors that users believe influence their video recommendations: the current user, other users, the algorithm, and the organization. This framework provides a new lens to explore design suggestions based on the agency of these four actors. It also exposes a novel aspect previously unexplored: the effect of corporate decisions on the interaction with algorithmic recommendations. While we found that users are aware of the existence of the recommendation system on YouTube, we show that their understanding of this system is limited.


[126] 2008.03209

Investigating maximum likelihood based training of infinite mixtures for uncertainty quantification

Uncertainty quantification in neural networks has gained a lot of attention in recent years. The most popular approaches, Bayesian neural networks (BNNs), Monte Carlo dropout, and deep ensembles, have one thing in common: they are all based on some kind of mixture model. While BNNs build infinite mixture models and are derived via variational inference, the latter two build finite mixtures trained with the maximum likelihood method. In this work we investigate the effect of training an infinite mixture distribution with the maximum likelihood method instead of variational inference. We find that the proposed objective leads to stochastic networks with an increased predictive variance, which improves uncertainty-based identification of misclassifications and robustness against adversarial attacks in comparison to a standard BNN with an equivalent network structure. The new model also displays higher entropy on out-of-distribution data.


[127] 2008.03210

A Theory of Hypergames on Graphs for Synthesizing Dynamic Cyber Defense with Deception

In this chapter, we present an approach using formal methods to synthesize a reactive defense strategy in a cyber network equipped with a set of decoy systems. We first generalize formal graphical security models--attack graphs--to incorporate the defender's countermeasures in a game-theoretic model, called an attack-defend game on a graph. This game captures the dynamic interactions between the defender and the attacker and their defense/attack objectives in formal logic. Then, we introduce a class of hypergames to model the asymmetric information created by decoys in the attacker-defender interactions. Given qualitative security specifications in formal logic, we show that the solution concepts from hypergames and reactive synthesis in formal methods can be extended to synthesize an effective dynamic defense strategy using cyber deception. The strategy takes advantage of the attacker's misperception to ensure that the security specification is satisfied, which may not be satisfiable when the information is symmetric.


[128] 2008.03212

Managing caching strategies for stream reasoning with reinforcement learning

Efficient decision-making over continuously changing data is essential for many application domains such as cyber-physical systems, industry digitalization, etc. Modern stream reasoning frameworks allow one to model and solve various real-world problems using incremental and continuous evaluation of programs as new data arrives in the stream. Applied techniques use, e.g., Datalog-like materialization or truth maintenance algorithms to avoid costly re-computations, thus ensuring low latency and high throughput of a stream reasoner. However, the expressiveness of existing approaches is quite limited and, e.g., they cannot be used to encode problems with constraints, which often appear in practice. In this paper, we suggest a novel approach that uses Conflict-Driven Constraint Learning (CDCL) to efficiently update legacy solutions through intelligent management of learned constraints. In particular, we study the applicability of reinforcement learning to continuously assess the utility of learned constraints computed in previous invocations of the solving algorithm for the current one. Evaluations conducted on real-world reconfiguration problems show that providing a CDCL algorithm with relevant learned constraints from previous iterations results in significant performance improvements of the algorithm in stream reasoning scenarios. Under consideration for acceptance in TPLP.


[129] 2008.03215

Autonomous Six-Degree-of-Freedom Spacecraft Docking Maneuvers via Reinforcement Learning

A policy for six-degree-of-freedom docking maneuvers is developed through reinforcement learning and implemented as a feedback control law. Reinforcement learning provides a potential framework for robust, autonomous maneuvers in uncertain environments with low on-board computational cost. Specifically, proximal policy optimization is used to produce a docking policy that is valid over a portion of the six-degree-of-freedom state-space while striving to minimize performance and control costs. Experiments using the simulated Apollo transposition and docking maneuver exhibit the policy's capabilities and provide a comparison with standard optimal control techniques. Furthermore, specific challenges and work-arounds, as well as a discussion on the benefits and disadvantages of reinforcement learning for docking policies, are discussed to facilitate future research. As such, this work will serve as a foundation for further investigation of learning-based control laws for spacecraft proximity operations in uncertain environments.


[130] 2008.03225

A Probabilistic Numerical Extension of the Conjugate Gradient Method

We present a Conjugate Gradient (CG) implementation of the probabilistic numerical solver BayesCG, whose error estimates are a fully integrated design feature, easy to compute, and competitive with the best existing estimators. More specifically, we extend BayesCG to singular prior covariances, derive recursions for the posterior covariances, express the posteriors as projections, and establish that BayesCG retains the minimization properties over Krylov spaces regardless of the singular priors. We introduce a possibly singular Krylov prior covariance, under which the BayesCG posterior means coincide with the CG iterates and the posteriors can be computed efficiently. Because of its factored form, the Krylov prior is amenable to low-rank approximation, which produces an efficient BayesCG implementation as a CG method. We also introduce a probabilistic error estimator, the `$S$-statistic'. Although designed for sampling from BayesCG posteriors, its mean and variance under approximate Krylov priors can be computed with CG. An approximation of the $S$-statistic by a `95 percent credible interval' avoids the cost of sampling altogether. Numerical experiments illustrate that the resulting error estimates are competitive with the best existing methods and are easy to compute.
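
Since the Krylov-prior posterior means coincide with the CG iterates, a plain CG loop is the computational core; the sketch below is standard CG (not the BayesCG code, and without the $S$-statistic error estimator).

```python
import numpy as np

def cg(A, b, tol=1e-10, maxiter=200):
    """Plain conjugate gradients for SPD A; under the (factored) Krylov prior
    the BayesCG posterior means coincide with these iterates."""
    x = np.zeros_like(b)
    r = b - A @ x                     # residual
    p = r.copy()                      # search direction
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)         # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p     # A-conjugate direction update
        rs = rs_new
    return x

# SPD test system
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
print(np.linalg.norm(A @ cg(A, b) - b))   # ~1e-11
```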


[131] 2008.03229

Towards Sample Efficient Agents through Algorithmic Alignment

Deep reinforcement-learning agents have demonstrated great success on various tasks. However, current methods typically suffer from sample complexity problems when learning in high dimensional observation spaces, which limits the application of deep reinforcement-learning agents to complex, uncertain real-world tasks. In this work, we propose and explore the Deep Graph Value Network as a promising method to work around this drawback using a message-passing mechanism. The main idea is that the RL agent should be guided by structured non-neural-network algorithms like dynamic programming. According to recent advances in algorithmic alignment, neural networks with structured computation procedures can be trained efficiently. We demonstrate the potential of graph neural networks in supporting sample-efficient learning by showing that the Deep Graph Value Network can outperform unstructured baselines by a large margin with low sample complexity.


[132] 2008.03230

ESPRESSO: Entropy and ShaPe awaRe timE-Series SegmentatiOn for processing heterogeneous sensor data

Extracting informative and meaningful temporal segments from high-dimensional wearable sensor data, smart devices, or IoT data is a vital preprocessing step in applications such as Human Activity Recognition (HAR), trajectory prediction, gesture recognition, and lifelogging. In this paper, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid segmentation model for multi-dimensional time-series that is formulated to exploit the entropy and temporal shape properties of time-series. ESPRESSO differs from existing methods that focus exclusively upon particular statistical or temporal properties of time-series. As part of model development, a novel temporal representation of time-series, $WCAC$, was introduced, along with a greedy search approach that estimates segments based upon the entropy metric. ESPRESSO was shown to offer superior performance to four state-of-the-art methods across seven public datasets of wearable and wear-free sensing. In addition, we undertake a deeper investigation of these datasets to understand how ESPRESSO and its constituent methods perform with respect to different dataset characteristics. Finally, we provide two interesting case studies to show how applying ESPRESSO can assist in inferring daily activity routines and the emotional state of humans.


[133] 2008.03231

Dissipativity verification with guarantees for polynomial systems from noisy input-state data

In this paper, we investigate the verification of dissipativity properties for polynomial systems without explicit knowledge of a model but directly from noise-corrupted measurements. Contrary to most data-driven approaches for nonlinear systems, we determine dissipativity properties over infinite time horizon using input-state data. To this end, we propose two characterizations of the noise that affects the system and deduce from each characterization a data-based set-membership representation of the ground-truth system. Each representation then serves as a framework to derive computationally attractive conditions to verify dissipativity properties with rigorous guarantees from noise-corrupted data using SOS optimization.


[134] 2008.03232

Quran Intelligent Ontology Construction Approach Using Association Rules Mining

Ontologies can be seen as formal representations of knowledge. They have been investigated in many artificial intelligence studies, including the semantic web, software engineering, and information retrieval. The aim of an ontology is to develop knowledge representations that can be shared and reused. This research project is concerned with the use of association rules to extract the Quran ontology. The manual acquisition of ontologies from Quran verses can be very costly; therefore, we need an intelligent system for Quran ontology construction using pattern-based schemes and association rules to discover Quran concepts and semantic relations from Quran verses. Our system is based on a combination of statistical and linguistic methods to extract concepts and conceptual relations from the Quran. In particular, a linguistic pattern-based approach is exploited to extract specific concepts from the Quran, while the conceptual relations are found based on the association rules technique. The Quran ontology will offer a new and powerful representation of Quran knowledge, and the association rules will help to represent the relations between all classes of connected concepts in the Quran ontology.
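
A toy sketch of the association-rule step on hypothetical verse-to-concept transactions (Apriori-style support and confidence; the concepts and thresholds below are made up for illustration):

```python
from itertools import combinations

verses = [                       # toy "transactions": concepts per verse
    {"mercy", "forgiveness", "prayer"},
    {"mercy", "forgiveness"},
    {"prayer", "charity"},
    {"mercy", "prayer"},
]
min_support, min_conf = 0.5, 0.6
n = len(verses)

# Frequent single concepts and concept pairs (one Apriori level).
items = {i for v in verses for i in v}
support = lambda s: sum(s <= v for v in verses) / n
frequent = [frozenset(c) for k in (1, 2)
            for c in combinations(sorted(items), k)
            if support(frozenset(c)) >= min_support]

# Rules X -> Y with confidence = support(X u Y) / support(X).
for itemset in (f for f in frequent if len(f) == 2):
    for x in itemset:
        X, Y = frozenset([x]), itemset - {x}
        conf = support(itemset) / support(X)
        if conf >= min_conf:
            print(f"{set(X)} -> {set(Y)}  conf={conf:.2f}")
```

High-confidence rules such as forgiveness -> mercy would then be candidates for semantic relations between the corresponding ontology concepts.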


[135] 2008.03252

A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet of Things

Internet of Things (IoT) is an innovative paradigm envisioned to provide massive applications that are now part of our daily lives. Millions of smart devices are deployed within complex networks to provide vibrant functionalities including communications, monitoring, and controlling of critical infrastructures. However, this massive growth of IoT devices and the corresponding huge data traffic generated at the edge of the network created additional burdens on the state-of-the-art centralized cloud computing paradigm due to bandwidth and resource scarcity. Hence, edge computing (EC) is emerging as an innovative strategy that brings data processing and storage near to the end users, leading to what is called EC-assisted IoT. Although this paradigm provides unique features and enhanced quality of service (QoS), it also introduces huge risks to data security and privacy. This paper conducts a comprehensive survey on security and privacy issues in the context of EC-assisted IoT. In particular, we first present an overview of EC-assisted IoT including definitions, applications, architecture, advantages, and challenges. Second, we define security and privacy in the context of EC-assisted IoT. Then, we extensively discuss the major classifications of attacks in EC-assisted IoT and provide possible solutions and countermeasures along with the related research efforts. After that, we further classify some security and privacy issues as discussed in the literature based on security services and based on security objectives and functions. Finally, several open challenges and future research directions for secure EC-assisted IoT paradigm are also extensively provided.


[136] 2008.03254

Evaluating Snowflake as an Indistinguishable Censorship Circumvention Tool

Tor is the most well-known tool for circumventing censorship. Unfortunately, Tor traffic has been shown to be detectable using deep-packet inspection. WebRTC is a popular web framework that enables browser-to-browser connections. Snowflake is a novel pluggable transport that leverages WebRTC to connect Tor clients to the Tor network. In theory, Snowflake was created to be indistinguishable from other WebRTC services. In this paper, we evaluate the indistinguishability of Snowflake. We collect over 6,500 DTLS handshakes from Snowflake, Facebook Messenger, Google Hangouts, and Discord WebRTC connections and show that Snowflake is identifiable among these applications with 100% accuracy. We show that several features, including the extensions offered and the number of packets in the handshake, distinguish Snowflake among these services. Finally, we suggest recommendations for improving identification resistance in Snowflake.


[137] 2008.03260

Tera-SLASH: A Distributed Energy-Efficient MPI based LSH System for Tera-Scale Similarity Search

We present Tera-SLASH, an MPI (Message Passing Interface) based distributed system for approximate similarity search over tera-scale datasets. SLASH provides a multi-node implementation of the popular LSH (locality sensitive hashing) algorithm, which is generally implemented on a single machine. We offer a novel design and sketching solution to reduce the inter-machine communication overheads exponentially. In a direct comparison on comparable hardware, SLASH is more than 10000x faster than the popular LSH package in PySpark. PySpark is a widely adopted distributed implementation of the LSH algorithm for large datasets and is deployed in commercial platforms. Finally, we show how our system scales to the tera-scale Criteo dataset with more than 4 billion samples. SLASH can index this 2.3-terabyte dataset over 20 nodes (on a shared cluster at Rice) in under an hour, with a query time of a fraction of a millisecond. To the best of our knowledge, there is no open-source system that can index and perform a similarity search on Criteo with a commodity cluster.
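
A single-machine sketch of the underlying LSH primitive (signed random projections for cosine similarity); SLASH's MPI distribution and sketching are not reproduced here, and all sizes are illustrative.

```python
import numpy as np

class SRPHash:
    """Signed-random-projection LSH: vectors that are close in cosine
    similarity land in the same bucket with high probability."""
    def __init__(self, dim, bits=16, tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((tables, bits, dim))
        self.tables = [dict() for _ in range(tables)]

    def _keys(self, x):
        signs = self.planes @ x > 0          # (tables, bits) sign pattern
        return [tuple(row) for row in signs]

    def index(self, i, x):
        for table, key in zip(self.tables, self._keys(x)):
            table.setdefault(key, []).append(i)

    def query(self, x):
        out = set()
        for table, key in zip(self.tables, self._keys(x)):
            out.update(table.get(key, []))
        return out

rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 64))
lsh = SRPHash(64)
for i, v in enumerate(data):
    lsh.index(i, v)
q = data[42] + 0.05 * rng.standard_normal(64)   # noisy copy of item 42
print(42 in lsh.query(q))                       # True with high probability
```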


[138] 2008.03263

COVID, BLM, and the polarization of US politicians on Twitter

We mapped the tweets of 520 US Congress members, focusing on analyzing their engagement with two broad topics: first, the COVID-19 pandemic, and second, the recent wave of anti-racist protest. We find that, in discussing COVID-19, Democrats frame the issue in terms of public health, while Republicans are more likely to focus on small businesses and the economy. When looking at the discourse around anti-Black violence, we find that Democrats are far more likely to name police brutality as a specific concern. In contrast, Republicans not only discuss the issue far less, but also keep their terms more general, as well as criticizing perceived protest violence.


[139] 2008.03270

Multi-Level Temporal Pyramid Network for Action Detection

Currently, one-stage frameworks have been widely applied to temporal action detection, but they still suffer from the challenge that action instances span a wide range of durations. The reason is that these one-stage detectors, e.g., the Single Shot Multi-Box Detector (SSD), extract temporal features by applying only a single-level layer for each head, which is not discriminative enough to perform classification and regression. In this paper, we propose a Multi-Level Temporal Pyramid Network (MLTPN) to improve the discrimination of the features. Specifically, we first fuse the features from multiple layers with different temporal resolutions, to encode multi-layer temporal information. We then apply a multi-level feature pyramid architecture on the features to enhance their discriminative abilities. Finally, we design a simple yet effective feature fusion module to fuse the multi-level multi-scale features. By this means, the proposed MLTPN can learn rich and discriminative features for different action instances with different durations. We evaluate MLTPN on two challenging datasets: THUMOS'14 and ActivityNet v1.3, and the experimental results show that MLTPN obtains competitive performance on ActivityNet v1.3 and significantly outperforms the state-of-the-art approaches on THUMOS'14.


[140] 2008.03273

SafePILCO: a software tool for safe and data-efficient policy synthesis

SafePILCO is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the known PILCO algorithm, originally written in MATLAB, to support safe learning. We provide a Python implementation and leverage existing libraries that allow the codebase to remain short and modular, which is appropriate for wider use by the verification, reinforcement learning, and control communities.


[141] 2008.03274

SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media

In this paper, we present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media. The goal of this shared task is to design automatic methods for emphasis selection, i.e. choosing candidates for emphasis in textual content to enable automated design assistance in authoring. The main focus is on short text instances for social media, with a variety of examples, from social media posts to inspirational quotes. Participants were asked to model emphasis using plain text with no additional context from the user or other design considerations. The SemEval-2020 Emphasis Selection shared task attracted 197 participants in the early phase, and a total of 31 teams made submissions to this task. The highest-ranked submission achieved a 0.823 Match_m score. The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choices of pre-trained models, and the part-of-speech tag (POS) was the most useful feature. Full results can be found on the task's website.


[142] 2008.03276

Total Variation Diminishing (TVD) method for Elastohydrodynamic Lubrication (EHL) problem on Parallel Computers

In this article, we offer a novel numerical approach for the solution of elastohydrodynamic lubrication line and point contact problems using a class of total variation diminishing (TVD) schemes on parallel computers. A direct parallel approach is presented by introducing a novel solver, named projected alternate quadrant interlocking factorization (PAQIF), for solving the discrete variational inequality. For the one-dimensional EHL case, we use a weighted change in the Newton-Raphson approximation to compute the Jacobian matrix in the form of a banded matrix by dividing the whole computation domain into two subregions. Such subregion matrices are assembled by measuring the ratio of the diffusive coefficient and the discrete grid length on the domain of interest. The banded matrix is then passed to parallel computers for solving the discrete linearized complementarity system using the PAQIF algorithm. The idea is easily extended to the two-dimensional EHL case by taking appropriate splittings in the alternating x and y directions, respectively. Numerical experiments are performed and analyzed to validate the performance of the computed solution on serial and parallel computers.
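
For flavor, a minimal TVD building block (a minmod-limited MUSCL update for linear advection on a periodic grid, standard material rather than the paper's EHL discretization); the assert checks the defining property that total variation never grows:

```python
import numpy as np

def minmod(a, b):
    """Minmod limiter: take the smaller one-sided slope, zero at extrema."""
    return np.where(a * b > 0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def tvd_advect(u, c=0.5, steps=100):
    """MUSCL-type TVD update for u_t + u_x = 0 on a periodic grid (CFL c)."""
    tv = lambda w: np.abs(w - np.roll(w, 1)).sum()
    tv_prev = tv(u)
    for _ in range(steps):
        du = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)  # limited slopes
        face = u + 0.5 * (1 - c) * du      # reconstructed face values (a > 0)
        u = u - c * (face - np.roll(face, 1))
        assert tv(u) <= tv_prev + 1e-12    # total variation diminishing
        tv_prev = tv(u)
    return u

u = (np.abs(np.linspace(0, 1, 200, endpoint=False) - 0.3) < 0.1).astype(float)
tvd_advect(u)   # advects a square pulse; every per-step TVD assert passes
```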


[143] 2008.03277

Learning a natural-language to LTL executable semantic parser for grounded robotics

Children acquire their native language with apparent ease by observing how language is used in context and attempting to use it themselves. They do so without laborious annotations, negative examples, or even direct corrections. We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands. In particular, we focus on the difficult domain of commands with a temporal aspect, whose semantics we capture with Linear Temporal Logic, LTL. Our parser is trained with pairs of sentences and executions as well as an executor. At training time, the parser hypothesizes a meaning representation for the input as a formula in LTL. Three competing pressures allow the parser to discover meaning from language. First, any hypothesized meaning for a sentence must be permissive enough to reflect all the annotated execution trajectories. Second, the executor -- a pretrained end-to-end LTL planner -- must find that the observed trajectories are likely executions of the meaning. Finally, a generator, which reconstructs the original input, encourages the model to find representations that conserve knowledge about the command. Together these ensure that the meaning is neither too general nor too specific. Our model generalizes well, being able to parse and execute both machine-generated and human-generated commands, with near-equal accuracy, despite the fact that the human-generated sentences are much more varied and complex with an open lexicon. The approach presented here is not specific to LTL; it can be applied to any domain where sentence meanings can be hypothesized and an executor can verify these meanings, thus opening the door to many applications for robotic agents.


[144] 2008.03281

Scanning electron diffraction tomography of strain

Strain engineering is used to obtain desirable materials properties in a range of modern technologies. Direct nanoscale measurement of the three-dimensional strain tensor field within these materials has however been limited by a lack of suitable experimental techniques and data analysis tools. Scanning electron diffraction has emerged as a powerful tool for obtaining two-dimensional maps of strain components perpendicular to the incident electron beam direction. Extension of this method to recover the full three-dimensional strain tensor field has been restricted though by the absence of a formal framework for tensor tomography using such data. Here, we show that it is possible to reconstruct the full non-symmetric strain tensor field as the solution to an ill-posed tensor tomography inverse problem. We then demonstrate the properties of this tomography problem both analytically and computationally, highlighting why incorporating precession to perform scanning precession electron diffraction may be important. We establish a general framework for non-symmetric tensor tomography and demonstrate computationally its applicability for achieving strain tomography with scanning precession electron diffraction data.


[145] 2008.03285

Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning

Dexterous manipulation of objects in virtual environments with our bare hands, by using only a depth sensor and a state-of-the-art 3D hand pose estimator (HPE), is challenging. While virtual environments are ruled by physics, e.g. object weights and surface frictions, the absence of force feedback makes the task challenging, as even slight inaccuracies on finger tips or contact points from the HPE may make the interactions fail. Prior art simply generates contact forces in the direction of the fingers' closure when finger joints penetrate virtual objects. Although useful for simple grasping scenarios, this cannot be applied to dexterous manipulations such as in-hand manipulation. Existing reinforcement learning (RL) and imitation learning (IL) approaches train agents that learn skills by using task-specific rewards, without considering any online user input. In this work, we propose to learn a model that maps noisy input hand poses to target virtual poses, which introduces the needed contacts to accomplish the tasks on a physics simulator. The agent is trained in a residual setting by using a model-free hybrid RL+IL approach. A 3D hand pose estimation reward is introduced, leading to an improvement in HPE accuracy when the physics-guided corrected target poses are remapped to the input space. As the model corrects HPE errors by applying minor but crucial joint displacements for contacts, this helps to keep the generated motion visually close to the user input. Since HPE sequences performing successful virtual interactions do not exist, a data generation scheme to train and evaluate the system is proposed. We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.


[146] 2008.03286

HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures

We present HoliCity, a city-scale 3D dataset with rich structural information. Currently, this dataset has 6,300 real-world panoramas of resolution $13312 \times 6656$ that are accurately aligned with the CAD model of downtown London with an area of more than 20 km$^2$, in which the median reprojection error of the alignment of an average image is less than half a degree. This dataset aims to be an all-in-one data platform for research of learning abstracted high-level holistic 3D structures that can be derived from city CAD models, e.g., corners, lines, wireframes, planes, and cuboids, with the ultimate goal of supporting real-world applications including city-scale reconstruction, localization, mapping, and augmented reality. The accurate alignment of the 3D CAD models and panoramas also benefits low-level 3D vision tasks such as surface normal estimation, as the surface normal extracted from previous LiDAR-based datasets is often noisy. We conduct experiments to demonstrate the applications of HoliCity, such as predicting surface segmentation, normal maps, depth maps, and vanishing points, as well as test the generalizability of methods trained on HoliCity and other related datasets. HoliCity is available at https://holicity.io.


[147] 2008.03290

A Novel Tampering Attack on AES Cores with Hardware Trojans

The implementation of cryptographic primitives in integrated circuits (ICs) continues to increase over the years due to the recent advancement of semiconductor manufacturing and the reduction of cost per transistor. The hardware implementation makes cryptographic operations faster and more energy-efficient. However, various hardware attacks have been proposed aiming to extract the secret key in order to undermine the security of these primitives. In this paper, we focus on the widely used advanced encryption standard (AES) block cipher and demonstrate its vulnerability against tampering attack. Our proposed attack relies on implanting a hardware Trojan in the netlist by an untrusted foundry, which can design and implement such a Trojan as it has access to the design layout and mask information. The hardware Trojan's activation modifies a particular round's input data by preventing the effect of all previous rounds' key-dependent computation. We propose to use a sequential hardware Trojan to deliver the payload at the input of an internal round for achieving this modification of data. All the internal subkeys, and finally, the secret key can be computed from the observed ciphertext once the Trojan is activated. We implement our proposed tampering attack with a sequential hardware Trojan inserted into a 128-bit AES design from the OpenCores benchmark suite and report the area overhead to demonstrate the feasibility of the proposed tampering attack.


[148] 2008.02830

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator. The proposed generative architecture is invariant to the speaker's identity and can be trained to generate target singers from unlabeled training data, using either speech or singing sources. The model is optimized in an end-to-end fashion without any manual supervision, such as lyrics, musical notes or parallel samples. The proposed approach is fully-convolutional and can generate audio in real-time. Experiments show that our method significantly outperforms the baseline methods while generating convincingly better audio samples than alternative attempts.


[149] 2008.02852

Learning Insulin-Glucose Dynamics in the Wild

We develop a new model of insulin-glucose dynamics for forecasting blood glucose in type 1 diabetics. We augment an existing biomedical model by introducing time-varying dynamics driven by a machine learning sequence model. Our model maintains a physiologically plausible inductive bias and clinically interpretable parameters -- e.g., insulin sensitivity -- while inheriting the flexibility of modern pattern recognition algorithms. Critical to modeling success are the flexible, but structured representations of subject variability with a sequence model. In contrast, less constrained models like the LSTM fail to provide reliable or physiologically plausible forecasts. We conduct an extensive empirical study. We show that allowing biomedical model dynamics to vary in time improves forecasting at long time horizons, up to six hours, and produces forecasts consistent with the physiological effects of insulin and carbohydrates.


[150] 2008.02856

Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

This paper considers the multi-agent linear least-squares problem in a server-agent network. In this problem, the system comprises multiple agents, each having a set of local data points, that are connected to a server. The goal for the agents is to compute a linear mathematical model that optimally fits the collective data points held by all the agents, without sharing their individual local data points. This goal can be achieved, in principle, using the server-agent variant of the traditional iterative gradient-descent method. The gradient-descent method converges linearly to a solution, and its rate of convergence is lower bounded by the conditioning of the agents' collective data points. If the data points are ill-conditioned, the gradient-descent method may require a large number of iterations to converge. We propose an iterative pre-conditioning technique that mitigates the deleterious effect of the conditioning of data points on the rate of convergence of the gradient-descent method. We rigorously show that the resulting pre-conditioned gradient-descent method, with the proposed iterative pre-conditioning, achieves superlinear convergence when the least-squares problem has a unique solution. In general, the convergence is linear with improved rate of convergence in comparison to the traditional gradient-descent method and the state-of-the-art accelerated gradient-descent methods. We further illustrate the improved rate of convergence of our proposed algorithm through experiments on different real-world least-squares problems in both noise-free and noisy computation environments.
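A minimal single-machine sketch of the idea, for least squares min_x ||Ax - b||^2: a pre-conditioner matrix K is updated iteratively toward (A^T A)^{-1} while x takes pre-conditioned gradient steps. The step-size choices below are illustrative assumptions, not the authors' exact parameter rules:

# Iteratively pre-conditioned gradient descent, single-machine sketch.
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n)) @ np.diag(np.logspace(0, -2, n))  # ill-conditioned
b = A @ rng.standard_normal(n)

H = A.T @ A                               # held by the server in the paper's setting
alpha = 1.0 / np.linalg.norm(H, 2)        # keeps the K-update a contraction
x = np.zeros(n)
K = alpha * np.eye(n)                     # pre-conditioner estimate; K -> H^{-1}
for t in range(500):
    K = K - alpha * (H @ K - np.eye(n))   # iterative pre-conditioner update
    x = x - K @ (A.T @ (A @ x - b))       # pre-conditioned gradient step
print(np.linalg.norm(A @ x - b))          # residual shrinks toward zero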


[151] 2008.02859

Confidence-guided Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images in Patients with Post-treatment Malignant Gliomas

Data-driven automatic approaches have demonstrated their great potential in resolving various clinical diagnostic dilemmas in neuro-oncology, especially with the help of standard anatomic and advanced molecular MR images. However, data quantity and quality remain a key determinant of, and a significant limit on, the potential of such applications. In our previous work, we explored the synthesis of anatomic and molecular MR images network (SAMR) in patients with post-treatment malignant gliomas. Here, we extend it and propose Confidence Guided SAMR (CG-SAMR), which synthesizes data from lesion information to multi-modal anatomic sequences, including T1-weighted (T1w), gadolinium enhanced T1w (Gd-T1w), T2-weighted (T2w), and fluid-attenuated inversion recovery (FLAIR), as well as the molecular amide proton transfer-weighted (APTw) sequence. We introduce a module which guides the synthesis based on a confidence measure of the intermediate results. Furthermore, we extend the proposed architecture to allow unsupervised synthesis so that unpaired data can be used for training the network. Extensive experiments on real clinical data demonstrate that the proposed model performs better than state-of-the-art synthesis methods.


[152] 2008.02863

A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition

This paper presents a transfer learning method for speech emotion recognition based on a Time-Delay Neural Network (TDNN) architecture. A major challenge in current speech-based emotion detection research is data scarcity. The proposed method resolves this problem by applying transfer learning techniques in order to leverage data from the automatic speech recognition (ASR) task, for which ample data is available. Our experiments also show the advantage of speaker-class adaptation modeling techniques by adopting identity-vector (i-vector) based features in addition to standard Mel-Frequency Cepstral Coefficient (MFCC) features. We show that the transfer learning models significantly outperform methods without ASR pretraining. The experiments were performed on the publicly available IEMOCAP dataset, which provides 12 hours of emotional speech data. Transfer learning was initialized using the Ted-Lium v.2 speech dataset, which provides 207 hours of audio with corresponding transcripts. We achieve significantly higher accuracy than the state-of-the-art, using five-fold cross-validation. Using only speech, we obtain an accuracy of 71.7% for anger, excitement, sadness, and neutrality emotion content.


[153] 2008.02885

A general solution to the preferential selection model

We provide a general analytic solution to Herbert Simon's 1955 model for time-evolving novelty functions. This has far-reaching consequences: Simon's is a precursor model for Barabasi's 1999 preferential attachment model for growing social networks, and our general abstraction of it considers attachment, more broadly, to be a form of link selection. We show that any system which can be modeled as instances of types---i.e., occurrence data (frequencies)---can be generatively modeled (and simulated) from a distributional perspective with an exceptionally high degree of accuracy.
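Simon's process itself is easy to simulate: at each step a brand-new type occurs with some fixed probability, otherwise an existing type is selected in proportion to its current frequency. A compact sketch (parameter values are arbitrary):

# Simon's 1955 preferential-selection process: uniform draws from the history
# of past occurrences are exactly frequency-proportional type selection.
import random
from collections import Counter

def simon_process(n_steps=100_000, novelty=0.05, seed=0):
    rng = random.Random(seed)
    history = [0]                      # sequence of type labels; one seed type
    next_type = 1
    for _ in range(n_steps):
        if rng.random() < novelty:     # a brand-new type occurs
            history.append(next_type)
            next_type += 1
        else:                          # existing type, chosen by frequency
            history.append(rng.choice(history))
    return Counter(history)

freqs = sorted(simon_process().values(), reverse=True)
print(freqs[:10])                      # a few dominant types, many rare ones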


[154] 2008.02900

Respiratory Sound Classification Using Long-Short Term Memory

Developing a reliable sound detection and recognition system offers many benefits and has many useful applications across different industries. This paper examines the difficulties that exist when attempting to perform sound classification as it relates to respiratory disease classification. Some methods which have been employed, such as independent component analysis and blind source separation, are examined. Finally, an examination of the use of deep learning and long short-term memory networks is performed in order to identify how such a task can be implemented.


[155] 2008.02901

Benign Overfitting and Noisy Features

Modern machine learning often operates in the regime where the number of parameters is much higher than the number of data points, with zero training loss and yet good generalization, thereby contradicting the classical bias-variance trade-off. This \textit{benign overfitting} phenomenon has recently been characterized using so-called \textit{double descent} curves, where the risk undergoes another descent (in addition to the classical U-shaped learning curve when the number of parameters is small) as we increase the number of parameters beyond a certain threshold. In this paper, we examine the conditions under which \textit{benign overfitting} occurs in random feature (RF) models, i.e. in a two-layer neural network with fixed first-layer weights. We adopt a new view of random features and show that \textit{benign overfitting} arises due to the noise residing in such features (the noise may already be present in the data and propagate to the features, or it may be added by the user to the features directly), which plays an important implicit regularization role in the phenomenon.
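The RF setting can be reproduced in a few lines: fix a random first layer, fit the second layer by minimum-norm least squares, and sweep the number of features past the interpolation threshold. The data model and sizes below are illustrative:

# Random-feature model: fixed random ReLU features, min-norm interpolation.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 100, 1000
w_star = rng.standard_normal(d) / np.sqrt(d)

def make_data(n):
    X = rng.standard_normal((n, d))
    y = X @ w_star + 0.1 * rng.standard_normal(n)   # noisy linear target
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

for p in [20, 80, 100, 200, 1000, 5000]:            # number of random features
    W = rng.standard_normal((d, p)) / np.sqrt(d)    # fixed first layer
    Ftr, Fte = np.maximum(Xtr @ W, 0), np.maximum(Xte @ W, 0)  # ReLU features
    theta = np.linalg.pinv(Ftr) @ ytr               # min-norm least squares fit
    print(p, np.mean((Fte @ theta - yte) ** 2))     # test risk peaks near p ~ n_train

The test error typically spikes near the interpolation threshold (p close to n_train) and descends again as p grows, the double-descent shape the abstract refers to.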


[156] 2008.02908

Demand Response For Residential Uses: A Data Analytics Approach

In the Smart Grid environment, the advent of intelligent measuring devices facilitates monitoring appliance electricity consumption. This data can be used to apply Demand Response (DR) in residential houses through data analytics and data mining techniques. In this research, we introduce a smart system foundation that is applied to users' disaggregated power consumption data. This system encourages users to apply DR by shifting from heavier operation modes to lighter ones, and by shifting their usage to off-peak hours. First, we apply Cross-Correlation (XCORR) to detect the times when an appliance is being used. We then use Dynamic Time Warping (DTW) to recognize the operation mode in use.
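A toy rendition of the two analytic steps (the threshold and normalisation below are placeholder choices, not the paper's settings): cross-correlation flags when an appliance signature occurs, and a small DTW distance picks the closest operation-mode template:

# Step 1: cross-correlation to locate appliance occurrences in a power trace.
# Step 2: DTW distance to match a detected segment to a mode template.
import numpy as np

def detect_occurrences(trace, signature, threshold=0.8):
    sig = (signature - signature.mean()) / signature.std()
    corr = np.correlate(trace - trace.mean(), sig, mode='valid')
    corr /= (np.std(trace) * len(sig))           # rough normalisation
    return np.where(corr > threshold)[0]         # candidate start indices

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_mode(segment, templates):
    # templates: dict mapping mode name (e.g. 'eco', 'heavy') -> reference curve
    return min(templates, key=lambda name: dtw_distance(segment, templates[name]))

# Demo: find a ramp-shaped signature embedded in a synthetic trace.
sig = np.concatenate([np.linspace(0, 1, 10), np.linspace(1, 0, 10)])
trace = np.concatenate([np.zeros(40), sig, np.zeros(40)])
print(detect_occurrences(trace, sig))            # -> indices near 40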


[157] 2008.02941

Exploring entanglement and optimization within the Hamiltonian Variational Ansatz

Quantum variational algorithms are one of the most promising applications of near-term quantum computers; however, recent studies have demonstrated that unless the variational quantum circuits are configured in a problem-specific manner, optimization of such circuits will most likely fail. In this paper, we focus on a special family of quantum circuits called the Hamiltonian Variational Ansatz (HVA), which takes inspiration from the quantum approximate optimization algorithm and adiabatic quantum computation. Through the study of its entanglement spectrum and energy gradient statistics, we find that HVA exhibits favorable structural properties such as mild or entirely absent barren plateaus and a restricted state space that eases its optimization in comparison to the well-studied "hardware-efficient ansatz." We also numerically observe that the optimization landscape of HVA becomes almost trap free when the ansatz is over-parameterized. We observe a size-dependent "computational phase transition" as the number of layers in the HVA circuit is increased, where the optimization crosses over from a hard to an easy region in terms of the quality of the approximations and speed of convergence to a good solution. In contrast with the analogous transitions observed in the learning of random unitaries, which occur at a number of layers that grows exponentially with the number of qubits, our Variational Quantum Eigensolver experiments suggest that the threshold to achieve the over-parameterization phenomenon scales at most polynomially in the number of qubits for the transverse field Ising and XXZ models. Lastly, as a demonstration of its entangling power and effectiveness, we show that HVA can find accurate approximations to the ground states of a modified Haldane-Shastry Hamiltonian on a ring, which has long-range interactions and a power-law entanglement scaling.


[158] 2008.02950

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a single model. Although many approaches using deep neural networks (DNNs) have been proposed, DNNs are prone to overfitting when the amount of training data is limited. We propose a framework for multi-speaker speech synthesis using deep Gaussian processes (DGPs); a DGP is a deep architecture of Bayesian kernel regressions and thus robust to overfitting. In this framework, speaker information is fed to duration/acoustic models using speaker codes. We also examine the use of deep Gaussian process latent variable models (DGPLVMs). In this approach, the representation of each speaker is learned simultaneously with other model parameters, and therefore the similarity or dissimilarity of speakers is considered efficiently. We experimentally evaluated two situations to investigate the effectiveness of the proposed methods. In one situation, the amount of data from each speaker is balanced (speaker-balanced), and in the other, the data from certain speakers are limited (speaker-imbalanced). Subjective and objective evaluation results showed that both the DGP and DGPLVM synthesize multi-speaker speech more effectively than a DNN in the speaker-balanced situation. We also found that the DGPLVM outperforms the DGP significantly in the speaker-imbalanced situation.


[159] 2008.02957

Dual Convolutional Neural Networks for Breast Mass Segmentation and Diagnosis in Mammography

Deep convolutional neural networks (CNNs) have emerged as a new paradigm for mammogram diagnosis. Contemporary CNN-based computer-aided diagnosis (CAD) systems for breast cancer directly extract latent features from the input mammogram image and ignore the importance of morphological features. In this paper, we introduce a novel deep learning framework for mammogram image processing, which computes mass segmentation and simultaneously predicts diagnosis results. Specifically, our method is constructed in a dual-path architecture that solves the mapping in a dual-problem manner, with an additional consideration of important shape and boundary knowledge. One path, called the Locality Preserving Learner (LPL), is devoted to hierarchically extracting and exploiting intrinsic features of the input. The other path, called the Conditional Graph Learner (CGL), focuses on generating geometrical features via modeling pixel-wise image-to-mask correlations. By integrating the two learners, both the semantics and the structure are well preserved, and the component learning paths in return complement each other, contributing an improvement to the mass segmentation and cancer classification problems at the same time. We evaluated our method on two of the most widely used public mammography datasets, DDSM and INbreast. Experimental results show that the proposed dual CNN achieves the best mammography segmentation (in both high and low resolution) and classification simultaneously, outperforming recent state-of-the-art models.


[160] 2008.02984

NuI-Go: Recursive Non-Local Encoder-Decoder Network for Retinal Image Non-Uniform Illumination Removal

Retinal images have been widely used by clinicians for early diagnosis of ocular diseases. However, the quality of retinal images is often clinically unsatisfactory due to eye lesions and an imperfect imaging process. One of the most challenging quality degradation issues in retinal images is non-uniform illumination, which obscures pathological information and further impairs the diagnosis of ophthalmologists and computer-aided analysis. To address this issue, we propose a non-uniform illumination removal network for retinal images, called NuI-Go, which consists of three Recursive Non-local Encoder-Decoder Residual Blocks (NEDRBs) for enhancing the degraded retinal images in a progressive manner. Each NEDRB contains a feature encoder module that captures the hierarchical feature representations, a non-local context module that models the context information, and a feature decoder module that recovers the details and spatial dimension. Additionally, the symmetric skip-connections between the encoder module and the decoder module provide long-range information compensation and reuse. Extensive experiments demonstrate that the proposed method can effectively remove non-uniform illumination from retinal images while preserving image details and color. We further demonstrate the advantages of the proposed method for improving the accuracy of retinal vessel segmentation.


[161] 2008.02995

A Review on Modern Computational Optimal Transport Methods with Applications in Biomedical Research

Optimal transport has been one of the most exciting subjects in mathematics, starting from the 18th century. As a powerful tool to transport between two probability measures, optimal transport methods have been reinvigorated in a remarkable proliferation of modern data science applications. To meet the big data challenges, various computational tools have been developed in the recent decade to accelerate the computation for optimal transport methods. In this review, we present some cutting-edge computational optimal transport methods with a focus on the regularization-based methods and the projection-based methods. We discuss their real-world applications in biomedical research.
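As a concrete taste of the regularization-based methods surveyed, the entropic approach reduces to the Sinkhorn algorithm, a pair of alternating diagonal scalings; a bare-bones version (uniform marginals, squared-distance cost) follows:

# Entropy-regularised optimal transport via Sinkhorn iterations.
import numpy as np

def sinkhorn(C, a, b, eps=0.05, n_iter=500):
    K = np.exp(-C / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # alternate scaling of the two marginals
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan with marginals a, b
    return P, np.sum(P * C)           # plan and (regularised) transport cost

# Example: transport between two small 1D point clouds.
x = np.linspace(0, 1, 50)[:, None]
y = np.linspace(0.2, 1.2, 60)[:, None]
C = (x - y.T) ** 2                    # squared-distance cost
a = np.full(50, 1 / 50); b = np.full(60, 1 / 60)
P, cost = sinkhorn(C, a, b)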


[162] 2008.03006

Polynomial-time algorithms for Multimarginal Optimal Transport problems with structure

Multimarginal Optimal Transport (MOT) has recently attracted significant interest due to its many applications. However, in most applications, the success of MOT is severely hindered by a lack of sub-exponential time algorithms. This paper develops a general theory about "structural properties" that make MOT tractable. We identify two such properties: decomposability of the cost into either (i) local interactions and simple global interactions; or (ii) low-rank interactions and sparse interactions. We also provide strong evidence that (iii) repulsive costs make MOT intractable by showing that several such problems of interest are NP-hard to solve--even approximately. These three structures are quite general, and collectively they encompass many (if not most) current MOT applications. We demonstrate our results on a variety of applications in machine learning, statistics, physics, and computational geometry.


[163] 2008.03008

The Ensemble Method for Thorax Diseases Classification

A common problem in real-world medical image classification is the inherent imbalance of positive and negative patterns in the dataset, where positive patterns are usually rare. Moreover, in multi-class classification with a neural network, a training pattern is treated as positive at one output node and as negative at all the remaining output nodes. In this paper, the weight of a training pattern in the loss function is designed based not only on the number of training patterns in its class but also on the output node, since one node treats the pattern as positive while the others treat it as negative. We propose a combined approach of a weight calculation algorithm for deep network training and training optimization from a state-of-the-art deep network architecture for the thorax disease classification problem. Experimental results on the Chest X-Ray image dataset demonstrate that this new weighting scheme improves classification performance, and that the training optimization from EfficientNet improves the performance further. We compare the ensemble method with several results from previous studies of thorax disease classification to provide fair comparisons with the proposed method.
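One plausible reading of the weighting idea, in a hedged sketch (the inverse-frequency formula below is our illustrative choice, not necessarily the authors' exact scheme): each output node sees a pattern as positive or negative, so the positive and negative terms of the binary cross-entropy get node-specific weights:

# Node-dependent loss weights for imbalanced multi-label classification.
import numpy as np

def node_weights(Y):
    # Y: (n_samples, n_classes) binary multi-label matrix.
    pos = Y.sum(axis=0)                              # positives per output node
    neg = Y.shape[0] - pos                           # negatives per output node
    w_pos = Y.shape[0] / (2.0 * np.maximum(pos, 1))  # up-weight rare positives
    w_neg = Y.shape[0] / (2.0 * np.maximum(neg, 1))
    return w_pos, w_neg

def weighted_bce(y_true, p_pred, w_pos, w_neg, eps=1e-7):
    # Weighted binary cross-entropy, applied per output node.
    p = np.clip(p_pred, eps, 1 - eps)
    loss = -(w_pos * y_true * np.log(p) + w_neg * (1 - y_true) * np.log(1 - p))
    return loss.mean()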


[164] 2008.03009

DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Singing voice conversion converts the timbre in the source singing to the target speaker's voice while keeping the singing content the same. However, singing data for a target speaker is much more difficult to collect than normal speech data. In this paper, we introduce a singing voice conversion algorithm that is capable of generating high-quality target speaker's singing using only his/her normal speech data. First, we manage to integrate the training and conversion process of speech and singing into one framework by unifying the features used in standard speech synthesis systems and singing synthesis systems. In this way, normal speech data can also contribute to singing voice conversion training, making the singing voice conversion system more robust, especially when the singing database is small. Moreover, in order to achieve one-shot singing voice conversion, a speaker embedding module is developed using both speech and singing data, which provides target speaker identity information during conversion. Experiments indicate that the proposed singing voice conversion system can convert source singing to the target speaker's high-quality singing with only 20 seconds of the target speaker's enrollment speech data.


[165] 2008.03022

Construction of a minimum energy path for the VT flash model by an exponential time differencing scheme with the string method

Phase equilibrium calculation, also known as flash calculation, plays a significant role in various aspects of the petroleum and chemical industries. Since Michelsen proposed his milestone studies in 1982, through several decades of development, the research interest in flash calculation has shifted from accuracy to efficiency, but the ultimate goal remains the same: estimating the equilibrium phase amounts and phase compositions under a given variable specification. However, finding the transition route and its related saddle points is very often helpful in studying the evolution of phase change and partition. Motivated by this, in this study we apply the string method to find the minimum energy paths and saddle point information of a single-component VT flash model with the Peng-Robinson equation of state. As the system has strong stiffness, common ordinary differential equation solvers have their limitations. To overcome these issues, a Rosenbrock-type exponential time differencing scheme is employed to reduce the computational difficulty caused by the high stiffness of the investigated system. In comparison with published results and experimental data, the proposed numerical algorithm not only shows good feasibility and accuracy for phase equilibrium calculation, but also successfully computes the minimum energy path and saddle point of the single-component VT flash model with strong stiffness.
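For readers new to the string method, the sketch below runs it on a toy 2D double-well energy with plain explicit Euler relaxation (the paper instead treats the stiff VT flash energy with a Rosenbrock-type exponential time differencing integrator): relax the images, then reparameterise them to equal arc length:

# String method on V(x, y) = (x^2 - 1)^2 + y^2, a toy double-well energy.
import numpy as np

def grad_V(p):
    x, y = p[:, 0], p[:, 1]
    return np.stack([4 * x * (x ** 2 - 1), 2 * y], axis=1)

n_img, dt = 30, 0.01
string = np.stack([np.linspace(-1, 1, n_img),
                   0.3 * np.ones(n_img)], axis=1)    # initial guess path
for _ in range(2000):
    string -= dt * grad_V(string)                    # 1) relax each image
    # 2) reparameterise to equal arc length so images stay spread out
    seg = np.linalg.norm(np.diff(string, axis=0), axis=1)
    s = np.insert(np.cumsum(seg), 0, 0.0) / seg.sum()
    t = np.linspace(0, 1, n_img)
    string = np.stack([np.interp(t, s, string[:, 0]),
                       np.interp(t, s, string[:, 1])], axis=1)
# Endpoints converge to the minima (-1, 0) and (1, 0); the highest-energy
# image approaches the saddle point at (0, 0).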


[166] 2008.03024

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by the nuisance attributes. The proposed framework was compared with the conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.


[167] 2008.03029

Peking Opera Synthesis via Duration Informed Attention Network

Peking Opera has been the most dominant form of Chinese performing art for around 200 years. A Peking Opera singer usually exhibits a very strong personal style via introducing improvisation and expressiveness on stage, which leads the actual rhythm and pitch contour to deviate significantly from the original music score. This inconsistency poses a great challenge in Peking Opera singing voice synthesis from a music score. In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework. To tackle the rhythm mismatch, a Lagrange multiplier is used to find the optimal output phoneme duration sequence under the constraint of the given note duration from the music score. As for the pitch contour mismatch, instead of directly inferring from the music score, we adopt a pseudo music score generated from the real singing and feed it as input during training. The experiments demonstrate that with the proposed system we can synthesize Peking Opera singing voice with high-quality timbre, pitch and expressiveness.
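In its simplest quadratic form, the constrained duration problem has a closed-form Lagrangian solution: keep the durations d close to the predictions p while summing to the note duration T. This is an illustrative special case; the paper's exact objective may differ:

# Duration fitting: minimise ||d - p||^2 subject to sum(d) = T.
# L(d, lam) = ||d - p||^2 + lam * (sum(d) - T)  =>  d = p - lam/2,
# and the constraint gives lam/2 = (sum(p) - T) / len(p).
import numpy as np

def fit_durations(p, T):
    return p + (T - p.sum()) / len(p)   # uniform Lagrangian correction

p = np.array([0.12, 0.30, 0.08])        # predicted phoneme durations (s)
d = fit_durations(p, T=0.60)            # the note lasts 0.6 s on the score
print(d, d.sum())                       # corrected durations summing to 0.60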


[168] 2008.03033

Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited

A probability forecast or probabilistic classifier is reliable or calibrated if the predicted probabilities are matched by ex post observed frequencies, as examined visually in reliability diagrams. The classical binning and counting approach to plotting reliability diagrams has been hampered by a lack of stability under unavoidable, ad hoc implementation decisions. Here we introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way. CORP is based on non-parametric isotonic regression and implemented via the Pool-adjacent-violators (PAV) algorithm - essentially, the CORP reliability diagram shows the graph of the PAV- (re)calibrated forecast probabilities. The CORP approach allows for uncertainty quantification via either resampling techniques or asymptotic theory, furnishes a new numerical measure of miscalibration, and provides a CORP based Brier score decomposition that generalizes to any proper scoring rule. We anticipate that judicious uses of the PAV algorithm yield improved tools for diagnostics and inference for a very wide range of statistical and machine learning methods.
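In code, CORP is essentially one call to isotonic regression (the PAV algorithm, available as sklearn's IsotonicRegression): the recalibrated probabilities trace out the reliability diagram, and the Brier-score gap before and after recalibration gives a miscalibration term in the spirit of the paper's decomposition. The toy data below are ours:

# PAV-(re)calibration of probability forecasts via isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
p = rng.uniform(size=2000)                      # forecast probabilities
y = rng.binomial(1, p ** 1.3)                   # outcomes (miscalibrated forecasts)

iso = IsotonicRegression(y_min=0, y_max=1, out_of_bounds='clip')
p_cal = iso.fit_transform(p, y)                 # PAV-recalibrated forecasts;
                                                # plotting p_cal against p gives
                                                # the CORP reliability diagram

def brier(q):
    return np.mean((q - y) ** 2)

# Brier-score improvement from recalibration, a miscalibration summary.
print("miscalibration (Brier gap):", brier(p) - brier(p_cal))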


[169] 2008.03038

Fractal Gaussian Networks: A sparse random graph model based on Gaussian Multiplicative Chaos

We propose a novel stochastic network model, called Fractal Gaussian Network (FGN), that embodies well-defined and analytically tractable fractal structures. Such fractal structures have been empirically observed in diverse applications. FGNs interpolate continuously between the popular purely random geometric graphs (a.k.a. the Poisson Boolean network), and random graphs with increasingly fractal behavior. In fact, they form a parametric family of sparse random geometric graphs that are parametrized by a fractality parameter $\nu$ which governs the strength of the fractal structure. FGNs are driven by the latent spatial geometry of Gaussian Multiplicative Chaos (GMC), a canonical model of fractality in its own right. We asymptotically characterize the expected number of edges and triangles in FGNs. We then examine the natural question of detecting the presence of fractality and the problem of parameter estimation based on observed network data, in addition to fundamental properties of the FGN as a random graph model. We also explore fractality in community structures by unveiling a natural stochastic block model in the setting of FGNs.


[170] 2008.03073

From the power law to extreme value mixture distributions

The power law is useful in describing count phenomena such as network degrees and word frequencies. With a single parameter, it captures the main feature that the frequencies are linear on the log-log scale. Nevertheless, there have been criticisms of the power law, and various approaches have been proposed to resolve issues such as selecting the required threshold and quantifying the uncertainty around it, and to test hypotheses on whether the data could have come from the power law. As extreme value theory generalises the (continuous) power law, it is natural to consider the former as a solution to these problems around the latter. In this paper, we propose two extreme value mixture distributions, in one of which the power law is incorporated, without the need of pre-specifying the threshold. The proposed distributions are shown to fit the data well, quantify the threshold uncertainty in a natural way, and satisfactorily answer whether the power law is useful enough.


[171] 2008.03088

Pretraining Techniques for Sequence-to-Sequence Voice Conversion

Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody. Nonetheless, without sufficient data, seq2seq VC models can suffer from unstable training and mispronunciation problems in the converted speech, and are thus far from practical. To tackle these shortcomings, we propose to transfer knowledge from other speech processing tasks where large-scale corpora are easily available, typically text-to-speech (TTS) and automatic speech recognition (ASR). We argue that VC models initialized with such pretrained ASR or TTS model parameters can generate effective hidden representations for high-fidelity, highly intelligible converted speech. We apply such techniques to recurrent neural network (RNN)-based and Transformer-based models, and through systematic experiments, we demonstrate the effectiveness of the pretraining scheme and the superiority of Transformer-based models over RNN-based models in terms of intelligibility, naturalness, and similarity.


[172] 2008.03090

Multilevel Monte Carlo for quantum mechanics on a lattice

Monte Carlo simulations of quantum field theories on a lattice become increasingly expensive as the continuum limit is approached since the cost per independent sample grows with a high power of the inverse lattice spacing. Simulations on fine lattices suffer from critical slowdown, the rapid growth of autocorrelations in the Markov chain with decreasing lattice spacing. This causes a strong increase in the number of lattice configurations that have to be generated to obtain statistically significant results. This paper discusses hierarchical sampling methods to tame this growth in autocorrelations. Combined with multilevel variance reduction, this significantly reduces the computational cost of simulations for given tolerances $\epsilon_{\text{disc}}$ on the discretisation error and $\epsilon_{\text{stat}}$ on the statistical error. For an observable with lattice errors of order $\alpha$ and an integrated autocorrelation time that grows like $\tau_{\mathrm{int}}\propto a^{-z}$, multilevel Monte Carlo (MLMC) can reduce the cost from $\mathcal{O}(\epsilon_{\text{stat}}^{-2}\epsilon_{\text{disc}}^{-(1+z)/\alpha})$ to $\mathcal{O}(\epsilon_{\text{stat}}^{-2}\vert\log \epsilon_{\text{disc}} \vert^2+\epsilon_{\text{disc}}^{-1/\alpha})$. Even higher performance gains are expected for simulations of quantum field theories in $D$ dimensions. The efficiency of the approach is demonstrated on two model systems, including a topological oscillator that is badly affected by critical slowdown due to freezing of the topological charge. On fine lattices, the new methods are orders of magnitude faster than standard sampling based on Hybrid Monte Carlo. For high resolutions, MLMC can be used to accelerate even the cluster algorithm for the topological oscillator. Performance is further improved through perturbative matching which guarantees efficient coupling of theories on the multilevel hierarchy.
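The generic MLMC telescope behind these savings estimates E[P_L] as E[P_0] plus the sum over levels of E[P_l - P_{l-1}], with coarse and fine samples coupled through shared randomness so that the correction terms have small variance. A toy illustration on an SDE (geometric Brownian motion, with fixed, arbitrary per-level sample sizes; in practice they are chosen from estimated variances), not the lattice field theory of the paper:

# Multilevel Monte Carlo on a toy SDE: coupled coarse/fine Euler paths.
import numpy as np

rng = np.random.default_rng(0)

def level_sampler(level, n, T=1.0, x0=1.0, mu=0.05, sig=0.2):
    """Return coupled payoff samples (P_l, P_{l-1}) driven by shared noise."""
    m = 2 ** level                                 # fine steps at this level
    dW = rng.standard_normal((n, m)) * np.sqrt(T / m)
    xf = x0 + np.zeros(n)
    for j in range(m):                             # fine Euler-Maruyama path
        xf += mu * xf * (T / m) + sig * xf * dW[:, j]
    if level == 0:
        return xf, np.zeros(n)
    xc = x0 + np.zeros(n)
    dWc = dW.reshape(n, m // 2, 2).sum(axis=2)     # coarse increments
    for j in range(m // 2):                        # coarse path, same noise
        xc += mu * xc * (2 * T / m) + sig * xc * dWc[:, j]
    return xf, xc

est = 0.0
for level, n in enumerate([100_000, 20_000, 5_000, 1_000]):
    Pf, Pc = level_sampler(level, n)
    est += np.mean(Pf - Pc)                        # telescoping-sum correction
print(est)                                         # ~ E[X_T] = x0 * exp(mu * T)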


[173] 2008.03096

Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning

Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised. This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation. Interleaving the action of reading a character with that of synthesising audio reduces this latency. However, the order of this sequence of interleaved actions varies across sentences, which raises the question of how the actions should be chosen. We propose a reinforcement learning based framework to train an agent to make this decision. We compare our performance against that of deterministic, rule-based systems. Our results demonstrate that our agent successfully balances the trade-off between the latency of audio generation and the quality of synthesised audio. More broadly, we show that neural sequence-to-sequence models can be adapted to run in an incremental manner.


[174] 2008.03102

Pricing group membership

We consider a model where agents differ in their `types', which determine their voluntary contributions towards a public good. We analyze the equilibrium composition of groups under decentralized and centralized choice. We show that there exists a top-down sorting equilibrium, i.e. an equilibrium where there exists a set of prices which leads to groups that can be ordered by level of types, with the first k types in the group with the highest price and so on. This exists under both decentralized and centralized choice. We also analyze the model with endogenous group size and examine under what conditions top-down sorting is socially efficient. We illustrate when integration (i.e. mixing types so that each group's average type is the same) is socially better than top-down sorting. Finally, we show that top-down sorting is efficient even when groups compete among themselves.


[175] 2008.03127

A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

Speaker recognition is a well-known and studied task in the speech processing domain. It has many applications, either for security or speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances to be spoken, in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while using only small amounts of speech signal. This method could also be applied as an utterance selection mechanism for building speech synthesis systems.


[176] 2008.03132

BAT.jl -- A Julia-based tool for Bayesian inference

We describe the development of a multi-purpose software for Bayesian statistical inference, BAT.jl, written in the Julia language. The major design considerations and implemented algorithms are summarized here, together with a test suite that ensures the proper functioning of the algorithms. We also give an extended example from the realm of physics that demonstrates the functionalities of BAT.jl.


[177] 2008.03143

Image Transformation Network for Privacy-Preserving Deep Neural Networks and Its Security Evaluation

We propose a transformation network for generating visually-protected images for privacy-preserving DNNs. The proposed transformation network is trained using a plain image dataset so that plain images are transformed into visually protected ones. Conventional perceptual encryption methods have weak visual-protection performance and cause some accuracy degradation in image classification. In contrast, the proposed network enables us not only to strongly protect visual information but also to maintain the image classification accuracy achieved with plain images. In an image classification experiment, the proposed network is demonstrated to strongly protect the visual information of plain images without any performance degradation on CIFAR datasets. In addition, the visually protected images are shown to be robust against a DNN-based attack called the inverse transformation network attack (ITN-Attack).


[178] 2008.03149

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proved to be very effective in sequence modeling, especially in speech separation. This work investigates how to extend dual-path BiLSTM to result in a new state-of-the-art approach, called TasTas, for multi-talker monaural speech separation (a.k.a. the cocktail party problem). TasTas introduces two simple but effective improvements: one is an iterative multi-stage refinement scheme, and the other is correcting speech with imperfect separation through a loss on speaker identity consistency between the separated speech and the original speech, to boost the performance of dual-path BiLSTM based networks. TasTas takes the mixed utterance of two speakers and maps it to two separated utterances, where each utterance contains only one speaker's voice. Our experiments on the notable benchmark WSJ0-2mix data corpus yield a 20.55 dB SDR improvement, a 20.35 dB SI-SDR improvement, a PESQ of 3.69, and an ESTOI of 94.86%, which shows that the proposed networks lead to substantial performance improvements on the speaker separation task. We have open-sourced our re-implementation of DPRNN-TasNet (https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation); TasTas is realized on top of this implementation, and we believe the results in this paper can be reproduced with ease.


[179] 2008.03152

Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data. The inputs of the convolutional neural network are ultrasound tongue images. The training target is the 80-dimensional mel-spectrogram, which results in a finer detailed spectral representation than the previously used 25-dimensional Mel-Generalized Cepstrum. From the output of the ultrasound-to-mel-spectrogram prediction, WaveGlow inference results in synthesized speech. We compare the proposed WaveGlow-based system with a continuous vocoder which does not use strict voiced/unvoiced decision when predicting F0. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the WaveGlow neural vocoder produces significantly more natural synthesized speech than the baseline system. Besides, the advantage of WaveGlow is that F0 is included in the mel-spectrogram representation, and it is not necessary to predict the excitation separately.


[180] 2008.03157

A nudged hybrid analysis and modeling approach for realtime wake-vortex transport and decay prediction

We put forth a long short-term memory (LSTM) nudging framework for the enhancement of reduced order models (ROMs) of fluid flows, utilizing noisy measurements, for air traffic improvements. Toward emerging applications of digital twins in aviation, the proposed approach allows for constructing a realtime predictive tool for wake-vortex transport and decay systems. We build on the fact that in realistic applications, there are uncertainties in initial and boundary conditions, model parameters, as well as measurements. Moreover, conventional nonlinear ROMs based on Galerkin projection (GROMs) suffer from imperfection and solution instabilities, especially for advection-dominated flows with slow decay in the Kolmogorov width. In the presented LSTM nudging (LSTM-N) approach, we fuse forecasts from a combination of an imperfect GROM and uncertain state estimates with sparse Eulerian sensor measurements to provide more reliable predictions in a dynamical data assimilation framework. We illustrate our concept by solving a two-dimensional vorticity transport equation. We investigate the effects of measurement noise and state estimate uncertainty on the behavior of LSTM-N. We also demonstrate that it can sufficiently handle different levels of temporal and spatial measurement sparsity, and offers great potential for developing next-generation digital twin technologies.


[181] 2008.03167

Generating a Heterosexual Bipartite Network Embedded in Social Network

We describe how to generate a heterosexual network with a prescribed joint-degree distribution that is embedded in a prescribed large-scale social contact network. The structure of a sexual network plays an important role in how sexually transmitted infections (STIs) spread. Generating an ensemble of networks that mimics the real world is crucial to evaluating robust mitigation strategies for controlling STIs. Most of the current algorithms to generate sexual networks only use sexual activity data, such as the number of partners per month, to generate the sexual network. Real-world sexual networks also depend on biased mixing based on age, location, and social and work activities. We describe an approach that uses a broad range of social activity data to generate possible heterosexual networks. We start with a large-scale simulation of thousands of people in a city as they go through their daily activities, including work, school, shopping, and activities at home. We extract a social network from these activities where the nodes are the people and the edges indicate a social interaction, such as working in the same location. This social network captures the correlations between people of different ages, living in different locations, their economic status, and other demographic factors. We use the social contact network to define a bipartite heterosexual network that is embedded within an extended social network. The resulting sexual network captures the biased mixing inherent in the social network, and models based on this pairing of networks can be used to investigate novel intervention strategies based on the social contacts of infected people. We illustrate the approach in a model for the spread of Chlamydia in the heterosexual network representing the young sexually active community in New Orleans.


[182] 2008.03175

Perfect Reconstruction of Sparse Signals via Greedy Monte-Carlo Search

We propose a Monte-Carlo-based method for reconstructing sparse signals in the formulation of sparse linear regression in a high-dimensional setting. The basic idea of this algorithm is to explicitly select variables or covariates to represent a given data vector or responses and accept randomly generated updates of that selection if and only if the energy or cost function decreases. This algorithm is called the greedy Monte-Carlo (GMC) search algorithm. Its performance is examined via numerical experiments, which suggests that in the noiseless case, GMC can achieve perfect reconstruction in undersampling situations of a reasonable level: it can outperform the $\ell_1$ relaxation but does not reach the algorithmic limit of MC-based methods theoretically clarified by an earlier analysis. Additionally, an experiment on a real-world dataset supports the practicality of GMC.
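The GMC idea fits in a few lines: keep a support set of k columns, propose a random swap, refit least squares on the support, and accept only if the residual energy decreases. A hedged sketch (the proposal distribution and sweep count are simplified assumptions, not the paper's exact procedure):

# Greedy Monte-Carlo search for sparse linear regression.
import numpy as np

def gmc(A, y, k, n_sweeps=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    S = list(rng.choice(n, size=k, replace=False))   # initial random support

    def energy(S):
        x, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        return np.sum((A[:, S] @ x - y) ** 2), x

    E, x = energy(S)
    for _ in range(n_sweeps):
        out = rng.integers(k)                        # position to swap out
        candidates = list(set(range(n)) - set(S))
        new = S.copy()
        new[out] = rng.choice(candidates)            # propose a random swap
        E_new, x_new = energy(new)
        if E_new < E:                                # greedy accept rule
            S, E, x = new, E_new, x_new
    return np.array(S), x, E

# Noiseless test: recover a 5-sparse signal from 40 of 100 dimensions.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x0 = np.zeros(100)
x0[rng.choice(100, 5, replace=False)] = rng.standard_normal(5)
S, x, E = gmc(A, A @ x0, k=5)
print(sorted(S), E)   # with enough sweeps: the true support, near-zero energy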


[183] 2008.03183

Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks

The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) consists of three Sub-Challenges, where the tasks are to identify the level of arousal and valence of elderly speakers, determine whether the speaker is wearing a surgical mask, and estimate the breathing of the speaker. In our contribution to the Challenge, we focus on the Elderly Emotion and the Mask Sub-Challenges. Besides utilizing standard or close-to-standard features such as ComParE functionals, Bag-of-Audio-Words and Fisher vectors, we exploit the fact that emotion is related to the velocity of speech (i.e. speech rate). To utilize this, we perform phone-level recognition using an ASR system and extract features from the output, such as articulation tempo, speech tempo, and various attributes measuring the amount of pauses. We also hypothesize that wearing a surgical mask makes the speaker feel uneasy, leading to a slower speech rate and more hesitations; hence, we experiment with the same features in the Mask Sub-Challenge as well. Although this hypothesis was not confirmed by the experimental results on the Mask Sub-Challenge, in the Elderly Emotion Sub-Challenge we obtained significantly improved arousal and valence values with this feature type, both on the development set and in cross-validation.


[184] 2008.03188

CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment

This paper describes the design and development of CUCHILD, a large-scale Cantonese corpus of child speech. The corpus contains spoken words collected from 1,986 child speakers aged from 3 to 6 years old. The speech materials include 130 words of 1 to 4 syllables in length. The speakers cover both typically developing (TD) children and children with speech disorder. The intended use of the corpus is to support scientific and clinical research, as well as technology development related to child speech assessment. The design of the corpus, including selection of words, participants recruitment, data acquisition process, and data pre-processing are described in detail. The results of acoustical analysis are presented to illustrate the properties of child speech. Potential applications of the corpus in automatic speech recognition, phonological error detection and speaker diarization are also discussed.


[185] 2008.03193

Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder

Speech sound disorder (SSD) refers to a developmental disorder in which children encounter persistent difficulties in correctly pronouncing words. Assessment of SSD has been relying largely on trained speech and language pathologists (SLPs). With the increasing demand for and long-lasting shortage of SLPs, automated assessment of speech disorder becomes a highly desirable approach to assisting clinical work. This paper describes a study on the automatic detection of phonological errors in the Cantonese speech of kindergarten children, based on a newly collected large speech corpus. The proposed approach to speech error detection involves the use of a Siamese recurrent autoencoder, which is trained to learn the similarity and discrepancy between phone segments in the embedding space. Training the model requires only speech data from typically developing (TD) children. To distinguish disordered speech from typical speech, the cosine distance between the embeddings of the test segment and the reference segment is computed. Different model architectures and training strategies are investigated. Results on detecting the 6 most common consonant errors demonstrate satisfactory performance of the proposed model, with average precision values from 0.82 to 0.93.


[186] 2008.03194

Scalable Low-Rank Autoregressive Tensor Learning for Spatiotemporal Traffic Data Imputation

The missing value problem in spatiotemporal traffic data has long been a challenging topic, in particular for large-scale and high-dimensional data with complex missing mechanisms and diverse degrees of missingness. Recent studies based on the tensor nuclear norm have demonstrated the superiority of tensor learning in imputation tasks by effectively characterizing the complex correlations/dependencies in spatiotemporal data. However, despite the promising results, these approaches do not scale well to large tensors. In this paper, we focus on addressing the missing data imputation problem for large-scale spatiotemporal traffic data. To achieve both high accuracy and efficiency, we develop a scalable autoregressive tensor learning model---Low-Tubal-Rank Autoregressive Tensor Completion (LATC-Tubal)---based on the existing framework of Low-Rank Autoregressive Tensor Completion (LATC), which is well-suited for spatiotemporal traffic data characterized by the multidimensional structure of location $\times$ time of day $\times$ day. In particular, the proposed LATC-Tubal model involves a scalable tensor nuclear norm minimization scheme by integrating a linear unitary transformation. Therefore, the tensor nuclear norm minimization can be solved by singular value thresholding on the transformed matrix of each day, while the day-to-day correlation can be effectively preserved by the unitary transform matrix. For evaluation, we consider two large-scale 5-minute traffic speed data sets collected by the California PeMS system with 11160 sensors. We compare LATC-Tubal with state-of-the-art baseline models, and find that LATC-Tubal can achieve competitive accuracy with a significantly lower computational cost. In addition, LATC-Tubal will also benefit other tasks in modeling large-scale spatiotemporal traffic data, such as network-level traffic forecasting.


[187] 2008.03205

Multi-Task Driven Explainable Diagnosis of COVID-19 using Chest X-ray Images

With the number of COVID-19 cases increasing globally, countries are ramping up their testing numbers. While RT-PCR kits are available in sufficient quantity in several countries, others face challenges with the limited availability of testing kits and processing centers in remote areas. This has motivated researchers to find alternative testing methods that are reliable, easily accessible and fast. Chest X-ray is one modality that is gaining acceptance for screening. In this direction, the paper makes two primary contributions. First, we present the COVID-19 Multi-Task Network, an automated end-to-end network for COVID-19 screening. The proposed network not only predicts whether a CXR contains COVID-19 features, but also performs semantic segmentation of the regions of interest to make the model explainable. Second, with the help of medical professionals, we manually annotate the lung regions of 9000 frontal chest radiographs taken from ChestXray-14, CheXpert and a consolidated COVID-19 dataset. Further, 200 chest radiographs pertaining to COVID-19 patients are also annotated for semantic segmentation. This database will be released to the research community.
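
A shared-encoder, two-head layout of the kind the abstract describes could be sketched as follows (a generic multi-task design with toy layer sizes, not the paper's actual architecture):

    import torch
    import torch.nn as nn

    class MultiTaskCXRNet(nn.Module):
        """Shared encoder with a classification head (COVID-19 features present?)
        and a segmentation head (regions of interest) for explainability."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(              # toy stand-in for a real backbone
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            )
            self.classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
            )
            self.segmenter = nn.Conv2d(32, 1, 1)       # per-pixel mask logits

        def forward(self, x):
            feats = self.encoder(x)
            return self.classifier(feats), self.segmenter(feats)

    model = MultiTaskCXRNet()
    logits, mask = model(torch.randn(1, 1, 256, 256))

Training would typically combine a classification loss on the first head with a per-pixel loss on the second, so the segmentation output doubles as a visual explanation of the prediction.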


[188] 2008.03206

In-Depth DCT Coefficient Distribution Analysis for First Quantization Estimation

The exploitation of traces in JPEG double-compressed images is of utmost importance for investigations. By properly exploiting such insights, First Quantization Estimation (FQE) can be performed to obtain source camera model identification (CMI) and thereby reconstruct the history of a digital image. In this paper, a method for estimating the first quantization factors of JPEG double-compressed images is presented, employing a mixed statistical and machine learning approach. The presented solution is shown to work without any a-priori assumptions about the quantization matrices. Experimental results and comparisons with the state of the art demonstrate the effectiveness of the proposed technique.


[189] 2008.03213

6G Wireless Systems: Vision, Requirements, Challenges, Insights, and Opportunities

Mobile communications have been undergoing a generational change every ten years or so. However, the time between the so-called "G's" is also decreasing. While fifth-generation (5G) systems are becoming a commercial reality, there is already significant interest in systems beyond 5G, which we refer to as the sixth generation (6G) of wireless systems. In contrast to the many published papers on the topic, we take a top-down approach to 6G. We present a holistic discussion of 6G systems, beginning with the lifestyle and societal changes driving the need for next-generation networks, moving to the technical requirements needed to enable 6G applications, and through to the challenges, as well as the possibilities for practically realizable system solutions across all layers of the Open Systems Interconnection stack. Since many 6G applications will need access to an order of magnitude more spectrum, utilization of frequencies between 100 GHz and 1 THz becomes of paramount importance. We comprehensively characterize the limitations that must be overcome to realize working systems in these bands, and provide a unique perspective on the physical- as well as higher-layer challenges relating to the design of next-generation core networks, new modulation and coding methods, novel multiple-access techniques, antenna arrays, wave propagation, radio-frequency transceiver design, and real-time signal processing. We rigorously discuss the fundamental changes required in the core networks of the future, such as the redesign or significant reduction of the transport architecture, a major source of latency. While evaluating the strengths and weaknesses of key technologies, we distinguish what may be practically achievable over the next decade from what is possible in theory. For each system aspect discussed, we present concrete research challenges.


[190] 2008.03221

Manifold-adaptive dimension estimation revisited

Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesv\'ari-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates under the assumption that the local manifold density is uniform. Based on this probability density function, we propose to use the median of the local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics, the mode and the mean. Additionally, from the probability density function we derive the maximum likelihood formula for global intrinsic dimensionality under the i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula calibrated on hypercube datasets. We compare the performance of the corrected-median-FSA estimator with kNN estimators: maximum likelihood (ML, Levina-Bickel) and two implementations of DANCo (R and MATLAB). We show that the corrected-median-FSA estimator beats the ML estimator and is on an equal footing with DANCo on standard synthetic benchmarks according to mean percentage error and error rate metrics. With the median-FSA algorithm, we reveal diverse changes in neural dynamics during resting state and epileptic seizures. We identify brain areas with lower-dimensional dynamics that are possible causal sources and candidate seizure onset zones.
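
The median-of-local-estimates idea can be sketched directly from the known FSA local estimate, ln(2) / ln(R_2k / R_k), where R_j is the distance to the j-th nearest neighbour (a minimal sketch; the paper's calibrated edge correction is omitted, and k is an illustrative choice):

    import numpy as np
    from scipy.spatial import cKDTree

    def fsa_median_dimension(X: np.ndarray, k: int = 5) -> float:
        """Median of local Farahmand-Szepesvari-Audibert dimension estimates."""
        tree = cKDTree(X)
        # Query 2k+1 neighbours; column 0 is the point itself (distance 0).
        dists, _ = tree.query(X, k=2 * k + 1)
        r_k, r_2k = dists[:, k], dists[:, 2 * k]
        local = np.log(2.0) / np.log(r_2k / r_k)   # local FSA estimates
        return float(np.median(local))

    # Illustrative check on a 3-D Gaussian cloud.
    X = np.random.randn(2000, 3)
    print(fsa_median_dimension(X))   # should be close to 3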


[191] 2008.03226

The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry

The space of synthesizable molecules is greater than $10^{60}$, meaning only a vanishingly small fraction of these molecules have ever been realized in the lab. In order to prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in molecular machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths. We demonstrate superior performance in predicting these wavelengths compared to both time-dependent density functional theory (TD-DFT), the incumbent first principles quantum mechanical approach, as well as a panel of human experts. Our baseline models are currently being deployed in the lab as part of the decision process for candidate synthesis. It is our hope that this benchmark can drive real discoveries in photoswitch chemistry and that future benchmarks can be introduced to pivot learning algorithm development to benefit more expansive areas of synthetic chemistry.


[192] 2008.03235

Individual Treatment Effect Estimation in a Low Compliance Setting

Individual Treatment Effect (ITE) estimation is an extensively researched problem with applications in various domains. We model the case of heterogeneous non-compliance to a randomly assigned treatment, a typical situation in health (because of non-compliance to prescription) or digital advertising (because of competition and ad blockers, for instance). The lower the compliance, the more the signal of the treatment prescription effect, or individual prescription effect (IPE), fades away and becomes hard to capture. We propose a new approach to estimating the IPE that exploits observed compliance information to prevent signal fading. Using the Structural Causal Model framework and do-calculus, we define a general mediated causal effect setting under which the proposed estimator soundly recovers the IPE, and we study its asymptotic variance. Finally, we conduct extensive experiments on both synthetic and real-world datasets that highlight the benefit of the approach, which consistently improves on the state of the art in low-compliance settings.
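
For orientation only: the classic instrumental-variable baseline for this setting rescales the intention-to-treat effect by the compliance rate (the Wald estimator). This is not the paper's proposed estimator, but it illustrates the signal fading the abstract mentions:

    import numpy as np

    def wald_estimate(z, t, y):
        """Classic Wald/IV estimator: ITT effect divided by compliance rate.
        z: random treatment assignment, t: treatment actually taken, y: outcome."""
        itt = y[z == 1].mean() - y[z == 0].mean()          # intention-to-treat effect
        compliance = t[z == 1].mean() - t[z == 0].mean()   # first-stage effect
        return itt / compliance

    # Toy data with low compliance: the raw ITT signal shrinks by a factor of 10,
    # while the Wald ratio rescales it back.
    rng = np.random.default_rng(0)
    n = 100_000
    z = rng.integers(0, 2, n)
    t = (rng.random(n) < 0.1 * z).astype(int)   # only ~10% of those assigned comply
    y = 2.0 * t + rng.normal(size=n)            # true treatment effect = 2
    print(wald_estimate(z, t, y))               # approximately 2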


[193] 2008.03240

Quantum State Tomography with Conditional Generative Adversarial Networks

Quantum state tomography (QST) is a challenging task in intermediate-scale quantum devices. Here, we apply conditional generative adversarial networks (CGANs) to QST. In the CGAN framework, two duelling neural networks, a generator and a discriminator, learn multi-modal models from data. We augment a CGAN with custom neural-network layers that convert the output of any standard neural network into a physical density matrix. To reconstruct the density matrix, the generator and discriminator networks train each other on data using standard gradient-based methods. We demonstrate that our QST-CGAN reconstructs optical quantum states with high fidelity, orders of magnitude faster and from less data than a standard maximum-likelihood method. We also show that the QST-CGAN can reconstruct a quantum state in a single evaluation of the generator network if it has been pre-trained on similar quantum states.
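
One standard way to map arbitrary network output to a physical density matrix is a Cholesky-style parameterisation, rho = T T† / tr(T T†), which guarantees Hermiticity, positive semidefiniteness and unit trace (a generic construction; the paper's exact custom layers may differ):

    import numpy as np

    def to_density_matrix(raw: np.ndarray, dim: int) -> np.ndarray:
        """Map a real parameter vector to a valid density matrix."""
        # Split raw output into real and imaginary parts of a lower-triangular T.
        n = dim * dim
        re, im = raw[:n].reshape(dim, dim), raw[n:2 * n].reshape(dim, dim)
        T = np.tril(re + 1j * im)
        rho = T @ T.conj().T                 # Hermitian, positive semidefinite
        return rho / np.trace(rho).real      # normalize to unit trace

    rho = to_density_matrix(np.random.randn(2 * 4 * 4), dim=4)
    print(np.allclose(rho, rho.conj().T), np.trace(rho).real)  # Hermitian, trace 1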


[194] 2008.03247

Investigation of Speaker-adaptation methods in Transformer based ASR

End-to-end models are fast replacing conventional hybrid models in automatic speech recognition. The transformer is a sequence-to-sequence framework based solely on attention that was initially applied to the machine translation task. This end-to-end framework has been shown to give promising results for automatic speech recognition as well. In this paper, we explore different ways of incorporating speaker information while training a transformer-based model to improve its performance. We represent speaker information in the form of speaker embeddings for each speaker. Two broad categories of speaker embeddings are used: (i) fixed embeddings and (ii) learned embeddings. We experiment with speaker embeddings learned along with the model training, as well as with one-hot vectors and x-vectors. Using these different speaker embeddings, we obtain an average relative improvement of 1% to 3% in the token error rate. We report results on the NPTEL lecture database. NPTEL is an open-source e-learning portal providing content from top Indian universities.
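
A common way to inject such speaker information, and one plausible reading of the setup above, is to concatenate a per-speaker embedding to every acoustic frame before the encoder (hypothetical dimensions; the paper's exact fusion point is not specified in the abstract):

    import torch

    def add_speaker_embedding(frames: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        """Concatenate a fixed speaker embedding (e.g., an x-vector) to each frame.
        frames: (batch, time, feat_dim), spk_emb: (batch, emb_dim)."""
        t = frames.size(1)
        expanded = spk_emb.unsqueeze(1).expand(-1, t, -1)  # repeat along time axis
        return torch.cat([frames, expanded], dim=-1)

    frames = torch.randn(8, 300, 80)   # 80-dim filterbank features
    xvec = torch.randn(8, 512)         # 512-dim x-vectors
    augmented = add_speaker_embedding(frames, xvec)
    print(augmented.shape)             # torch.Size([8, 300, 592])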


[195] 2008.03256

Opening practice: supporting Reproducibility and Critical spatial data science

This paper reflects on a number of trends towards a more open and reproducible approach to geographic and spatial data science in recent years. In particular, it considers trends towards Big Data and the impact these are having on spatial data analysis and modelling. It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering 'black boxes' in which the internal workings of the analysis are not revealed. It is argued that this closed form of software is problematic, and a number of ways are considered in which issues identified in spatial data analysis (such as the MAUP) could be overlooked when working with closed tools, leading to problems of interpretation and possibly inappropriate actions and policies based on these. The paper also considers the role that reproducible and open spatial science may play in addressing these issues. It highlights the dangers of failing to account for the geographical properties of data, now that all data are spatial (they are collected somewhere), and the problems of a desire for n = all observations in data science, and it identifies the need for a critical approach: one in which openness, transparency, sharing and reproducibility provide a mantra for defensible and robust spatial data science.