New articles on Computer Science

[1] 2107.12373

Relational Boosted Regression Trees

Many tasks use data housed in relational databases to train boosted regression tree models. In this paper, we give a relational adaptation of the greedy algorithm for training boosted regression trees. For the subproblem of calculating the sum of squared residuals of the dataset, which dominates the runtime of the boosting algorithm, we provide a $(1 + \epsilon)$-approximation using the tensor sketch technique. Employing this approximation within the relational boosted regression trees algorithm leads to learning similar model parameters, but with asymptotically better runtime.
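The subproblem named above is easiest to see in the ordinary, non-relational setting. Below is a minimal Python sketch, assuming the data fits in memory as NumPy arrays. It shows how a greedy tree learner scores every candidate split of one feature by the sum of squared residuals. It does not implement the paper's relational algorithm or its tensor-sketch approximation, and `best_split` is a hypothetical name.

```python
import numpy as np

def best_split(x, y):
    """Return (ssr, threshold) of the best single-feature split.

    The SSR of a group of n targets is sum(y^2) - (sum y)^2 / n, so prefix
    sums let each candidate threshold be scored in O(1) after sorting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    total, total_sq = csum[-1], csq[-1]
    best_ssr, best_thr = np.inf, None
    for i in range(1, n):  # left child = first i sorted rows
        if xs[i] == xs[i - 1]:
            continue  # equal feature values cannot be separated
        sl, ql = csum[i - 1], csq[i - 1]
        sr, qr = total - sl, total_sq - ql
        ssr = (ql - sl ** 2 / i) + (qr - sr ** 2 / (n - i))
        if ssr < best_ssr:
            best_ssr, best_thr = ssr, (xs[i - 1] + xs[i]) / 2
    return best_ssr, best_thr
```

In a boosting round, `y` would be the current residuals (targets minus the ensemble's predictions), so each new tree is fit to what the previous trees left unexplained.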

[2] 2107.12374

Training Energy-Efficient Deep Spiking Neural Networks with Single-Spike Hybrid Input Encoding

Spiking Neural Networks (SNNs) have emerged as an attractive alternative to traditional deep learning frameworks, since they provide higher computational efficiency on event-driven neuromorphic hardware. However, state-of-the-art (SOTA) SNNs suffer from high inference latency, resulting from inefficient input encoding and training techniques. The most widely used input coding schemes, such as Poisson-based rate coding, do not leverage the temporal learning capabilities of SNNs. This paper presents a training framework for low-latency, energy-efficient SNNs that uses a hybrid encoding scheme at the input layer: the analog pixel values of an image are applied directly during the first timestep, and a novel variant of spike temporal coding is used during subsequent timesteps. In addition, neurons in every hidden layer are restricted to fire at most once per image, which increases activation sparsity. To train these hybrid-encoded SNNs, we propose a variant of the gradient-descent-based spike-timing-dependent backpropagation (STDB) mechanism using a novel cross-entropy loss function based on both the spike times and the membrane potentials of the output neurons. The resulting SNNs have reduced latency and high activation sparsity, yielding significant improvements in computational efficiency. We evaluate the proposed training scheme on image classification tasks from the CIFAR-10 and CIFAR-100 datasets using several VGG architectures. We achieve a top-1 accuracy of $66.46$\% with $5$ timesteps on CIFAR-100 with ${\sim}125\times$ less compute energy than an equivalent standard ANN. Additionally, our proposed SNN performs inference $5$-$300\times$ faster than other state-of-the-art rate-coded or temporally coded SNN models.
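The hybrid input layer can be sketched as follows. This is a minimal Python illustration, assuming pixel values in [0, 1]. The abstract does not specify the paper's temporal-coding variant, so the sketch substitutes a simple time-to-first-spike rule: brighter pixels fire earlier, each pixel fires at most once, and zero-valued pixels never fire. `hybrid_encode` is a hypothetical name.

```python
import numpy as np

def hybrid_encode(pixels, num_steps):
    """Return an array of shape (num_steps, *pixels.shape)."""
    x = np.clip(np.asarray(pixels, dtype=float), 0.0, 1.0)
    out = np.zeros((num_steps,) + x.shape)
    out[0] = x  # step 0: analog pixel values are applied directly
    if num_steps > 1:
        # Assumed time-to-first-spike rule: intensity 1 fires at step 1,
        # dimmer pixels fire later, and intensity 0 maps past the last step.
        fire_step = 1 + np.floor((1.0 - x) * (num_steps - 1)).astype(int)
        fires = fire_step < num_steps
        # At most one spike per pixel over steps 1..num_steps-1
        out[(fire_step[fires],) + np.nonzero(fires)] = 1.0
    return out
```

For example, with 5 steps a pixel of value 0.5 spikes only at step 3 and a pixel of value 0 never spikes.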

[3] 2107.12407

Selective MPC: Distributed Computation of Differentially Private Key Value Statistics

An increasingly popular method for computing aggregate statistics while preserving users' privacy is local differential privacy (LDP). Under this model, users perturb their data before sending it to an untrusted central party to be processed. Key-value data is a naturally occurring data type that has not been thoroughly investigated in the local trust model. Existing LDP solutions for computing statistics over key-value data suffer from the inherent accuracy limitations of each user adding their own noise. Multi-party computation (MPC) is a common alternative to LDP that removes the requirement for a trusted central party while maintaining accuracy; however, naively applying MPC to key-value data results in prohibitive computation costs. In this work, we present selective multi-party computation, a novel approach to distributed computation that leverages DP leakage to efficiently and accurately compute statistics over key-value data. We show that our protocol satisfies pure DP and is provably secure in the combined DP/MPC model. Our empirical evaluation demonstrates that we can compute statistics over 10,000 keys in 20 seconds and can scale up to 30 servers while obtaining results for a single key in under a second.
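The abstract does not detail selective MPC, but the MPC building block it contrasts with LDP can be illustrated. Below is a minimal Python sketch of additive secret sharing. It assumes each user splits a nonnegative count into random shares modulo a large prime and sends one share to each server. No single server learns an individual value, yet the servers' combined totals reveal the exact sum. The helper names are hypothetical. The sketch adds no DP noise, and it uses `random.Random` for reproducibility; a real deployment would need a cryptographic RNG such as Python's `secrets` module.

```python
import random

MODULUS = 2 ** 61 - 1  # assumed prime field size, large enough that sums do not wrap

def share(value, num_servers, rng):
    """Split value into num_servers additive shares modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(num_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def aggregate(user_values, num_servers, seed=0):
    rng = random.Random(seed)  # NOT cryptographically secure; illustration only
    server_totals = [0] * num_servers
    for v in user_values:
        for s, piece in enumerate(share(v, num_servers, rng)):
            server_totals[s] = (server_totals[s] + piece) % MODULUS
    # Each server's total alone looks random; only the combination reveals the sum.
    return sum(server_totals) % MODULUS
```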

[4] 2107.12411

Red blue $k$-center clustering with distance constraints

We consider a variant of the $k$-center clustering problem in $\mathbb{R}^d$ in which the centers are divided into two subsets: $p$ red centers and $q$ blue centers, with $p+q=k$, such that every red center is at distance at least $\alpha$ from every blue center, for some given $\alpha \geq 0$. The aim is to minimize the covering radius. We provide a bi-criteria approximation algorithm for the problem and a polynomial-time algorithm for the constrained problem where all centers must lie on a given line $\ell$.
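The problem involves two quantities: the covering radius to be minimized and the red-blue separation constraint to be satisfied. Both can be evaluated for a candidate solution as follows. This is a minimal NumPy sketch with hypothetical helper names. It only checks a given solution and does not implement the paper's bi-criteria approximation or its algorithm for centers on a line.

```python
import numpy as np

def covering_radius(points, centers):
    """Largest distance from any point to its nearest center."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return d.min(axis=1).max()

def red_blue_separated(red, blue, alpha):
    """True if every red center is at distance >= alpha from every blue center."""
    d = np.linalg.norm(red[:, None, :] - blue[None, :, :], axis=-1)
    return bool((d >= alpha).all())
```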

[5] 2107.12416

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm that leverages the network structure inherent in the optimization objective. This structure allows each agent to estimate its local gradient independently from local cost evaluations, without any consensus protocol. The proposed algorithm uses an asynchronous update scheme and is based on the block coordinate descent method; it is designed for stochastic non-convex optimization with a possibly non-convex feasible domain. We then employ the algorithm as a distributed model-free RL algorithm for distributed linear quadratic regulator (LQR) design, where a learning graph is designed to describe the interactions required among agents during distributed learning. We provide an empirical validation that benchmarks the convergence rate and variance of the proposed algorithm against a centralized ZOO algorithm.
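The core idea of estimating a gradient from cost evaluations alone, one block of coordinates at a time, can be sketched with a standard two-point estimator. This is a minimal Python illustration. For simplicity it assumes the agent can evaluate `f` directly; in the paper each agent instead evaluates a local cost. `zo_block_gradient` is a hypothetical name, and this is not the authors' exact estimator.

```python
import numpy as np

def zo_block_gradient(f, x, block, mu=1e-3, num_samples=1000, rng=None):
    """Two-point zeroth-order estimate of the gradient of f w.r.t. x[block]."""
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros(len(block))
    for _ in range(num_samples):
        u = np.zeros_like(x)
        u[block] = rng.standard_normal(len(block))  # perturb only this agent's block
        diff = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)  # directional derivative
        est += diff * u[block]
    return est / num_samples  # E[(grad . u) u] = grad for Gaussian u
```

Because the random direction has the dimension of the block rather than of the global variable, the estimator's variance scales with the block size. This is the motivation the abstract gives for exploiting network structure.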

[6] 2107.12422

Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework

Advanced tensor decompositions, such as tensor train (TT) and tensor ring (TR), have been widely studied for deep neural network (DNN) model compression, especially for recurrent neural networks (RNNs). However, compressing convolutional neural networks (CNNs) with TT/TR typically suffers significant accuracy loss. In this paper, we propose a systematic framework for tensor decomposition-based model compression using the Alternating Direction Method of Multipliers (ADMM). By formulating TT decomposition-based model compression as an optimization problem with constraints on tensor ranks, we leverage ADMM to solve this problem systematically and iteratively. During this procedure, the entire DNN model is trained in its original structure instead of TT format, but gradually acquires the desired low tensor rank characteristics. We then decompose this uncompressed model into TT format and fine-tune it to obtain a high-accuracy TT-format DNN model. Our framework is general: it works for both CNNs and RNNs and can be easily modified to fit other tensor decomposition approaches. We evaluate the proposed framework on different DNN models for image classification and video recognition tasks. Experimental results show that our ADMM-based TT-format models achieve high compression with high accuracy. Notably, on CIFAR-100, with 2.3X and 2.4X compression ratios, our models achieve 1.96% and 2.21% higher top-1 accuracy than the original ResNet-20 and ResNet-32, respectively. For compressing ResNet-18 on ImageNet, our model achieves a 2.47X FLOPs reduction without accuracy loss.
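The final step, decomposing a trained weight tensor into TT format under a rank cap, can be sketched with the standard TT-SVD procedure (sequential truncated SVDs of the tensor's unfoldings). This is a minimal NumPy illustration, not the authors' code. A rank-truncating step like this is one common way to approximate the projection onto the low-TT-rank set inside an ADMM loop, but that pairing is an assumption here. `tt_svd` and `tt_to_full` are hypothetical names.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense tensor into TT cores of shape (r_{k-1}, n_k, r_k)."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))  # rank truncation controls the compression
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        # Carry the remainder forward and unfold it along the next mode
        mat = (s[:r, None] * vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into a dense tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))  # contract shared rank index
    return full.reshape([c.shape[1] for c in cores])
```

With a large enough `max_rank` the reconstruction is exact. Smaller ranks trade accuracy for fewer parameters, which is the loss the ADMM training phase is meant to pre-empt.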

[7] 2107.12423

HySec-Flow: Privacy-Preserving Genomic Computing with SGX-based Big-Data Analytics Framework

Trusted execution environments (TEEs) such as Intel's Software Guard Extensions (SGX) have been widely studied to strengthen security and privacy protection for computation on sensitive data such as human genomic data. However, SGX often imposes a performance penalty, especially because of its small enclave memory. In this paper, we propose a new Hybrid Secured Flow framework (called "HySec-Flow") for large-scale genomic data analysis on SGX platforms. Data-intensive computing tasks are partitioned into independent subtasks that are deployed into distinct secured and non-secured containers. This allows parallel execution while alleviating the limited size of the Enclave Page Cache (EPC) memory in each enclave. We illustrate our contributions using a workflow that supports indexing, alignment, dispatching, and merging across the execution of SGX-enabled containers. We provide details regarding the architecture of the trusted and untrusted components and the underlying SCONE and Graphene support as generic shielded execution frameworks for porting legacy code. We thoroughly evaluate the performance of our privacy-preserving reads mapping algorithm using real human genome sequencing data. The results demonstrate that partitioning the time-consuming genomic computation into subtasks improves performance compared with the conventional execution of the data-intensive reads mapping algorithm in an enclave. The HySec-Flow framework is available as open source and can be adapted to the data-parallel computation of other large-scale genomic tasks that require security and scalable computational resources.

[8] 2107.12425

Applying Model-Driven Engineering to Stimulate the Adoption of DevOps Processes in Small and Medium-Sized Development Organizations

Purpose: Microservice Architecture (MSA) denotes an increasingly popular architectural style in which business capabilities are wrapped into autonomously developable and deployable software components called microservices. Microservice applications are developed by multiple DevOps teams, each owning one or more services. In this article, we explore how DevOps teams in small and medium-sized organizations (SMOs) cope with MSA and how they can be supported. Methods: Through a secondary analysis of an exploratory interview study comprising six cases, we show that the organizational and technological complexity resulting from MSA poses particular challenges for SMOs. We apply Model-Driven Engineering to address these challenges. Results: The secondary analysis identifies two challenge areas: building and maintaining a common architectural understanding, and dealing with deployment technologies. To support DevOps teams of SMOs in coping with these challenges, we present a model-driven workflow based on LEMMA, the Language Ecosystem for Modeling Microservice Architecture. To implement the workflow, we extend LEMMA with the functionality to (i) generate models from API documentation; (ii) reference remote models owned by other teams; (iii) generate deployment specifications; and (iv) generate a visual representation of the overall architecture. Conclusion: We validate the model-driven workflow and our extensions to LEMMA through a case study showing that the added functionality can bring efficiency gains for DevOps teams. To develop best practices for applying our workflow to maximize efficiency in SMOs, we plan to conduct further empirical research in the field.

[9] 2107.12428

Improving Word Recognition in Speech Transcriptions by Decision-level Fusion of Stemming and Two-way Phoneme Pruning

We introduce an unsupervised approach for correcting highly imperfect speech transcriptions based on a decision-level fusion of stemming and two-way phoneme pruning. Transcripts are acquired from videos by extracting the audio with the FFmpeg framework and converting the audio to a text transcript using the Google API. The benchmark LRW dataset contains 500 word categories, with 50 videos per class in MP4 format. Each video consists of 29 frames (1.16 s in total), and the target word appears in the middle of the video. Our approach aims to improve on the baseline accuracy of 9.34% by using stemming, phoneme extraction, filtering, and pruning. Applying a stemming algorithm to the text transcripts raised word recognition accuracy to 23.34%. To convert words to phonemes, we used the Carnegie Mellon University (CMU) Pronouncing Dictionary, which maps English words to their phonetic pronunciations. The proposed two-way phoneme pruning comprises two non-sequential steps: 1) filtering and pruning the phonemes containing vowels and plosives; and 2) filtering and pruning the phonemes containing vowels and fricatives. Applying decision-level fusion to the results of stemming and two-way phoneme pruning improved the word recognition rate to 32.96%.

[10] 2107.12429

MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

Self-supervised depth estimation for indoor environments is more challenging than its outdoor counterpart in at least two respects. (i) The depth range of indoor sequences varies considerably across frames, making it difficult for the depth network to induce consistent depth cues, whereas the maximum distance in outdoor scenes mostly stays the same because the camera usually sees the sky. (ii) Indoor sequences contain much more rotational motion, which causes difficulties for the pose network, whereas the motion in outdoor sequences is predominantly translational, especially in driving datasets such as KITTI. In this paper, we give special consideration to these challenges and consolidate a set of good practices for improving the performance of self-supervised monocular depth estimation in indoor environments. The proposed method consists of two novel modules, i.e., a depth factorization module and a residual pose estimation module, designed to tackle challenges (i) and (ii), respectively. The effectiveness of each module is shown through a carefully conducted ablation study and by state-of-the-art performance on two indoor datasets, i.e., EuRoC and NYUv2.

[11] 2107.12433

The Graph Neural Networking Challenge: A Worldwide Competition for Education in AI/ML for Networks

During the last decade, Machine Learning (ML) has become an increasingly hot topic in the field of Computer Networks and is expected to be gradually adopted for a plethora of control, monitoring, and management tasks in real-world deployments. This creates a need for new generations of students, researchers, and practitioners with a solid background in ML applied to networks. In 2020, the International Telecommunication Union (ITU) organized the "ITU AI/ML in 5G challenge", an open global competition that introduced a broad audience to some of the current main challenges in ML for networks. This large-scale initiative gathered 23 different challenges proposed by network operators, equipment manufacturers, and academia, and attracted more than 1,300 participants from more than 60 countries. This paper narrates our experience organizing one of the proposed challenges: the "Graph Neural Networking Challenge 2020". We describe the problem presented to participants, the tools and resources provided, some organizational aspects and participation statistics, an outline of the top-3 awarded solutions, and a summary of lessons learned along the way. As a result, this challenge leaves a curated set of educational resources openly available to anyone interested in the topic.

[12] 2107.12435

A Comprehensive Study on Colorectal Polyp Segmentation with ResUNet++, Conditional Random Field and Test-Time Augmentation

Colonoscopy is considered the gold standard for detection of colorectal cancer and its precursors. Existing examination methods are, however, hampered by a high overall miss rate, and many abnormalities are left undetected. Computer-Aided Diagnosis systems based on advanced machine learning algorithms are touted as a game-changer that can identify regions in the colon overlooked by physicians during endoscopic examinations, and help detect and characterize lesions. In previous work, we proposed the ResUNet++ architecture and demonstrated that it produces more efficient results than its counterparts U-Net and ResUNet. In this paper, we demonstrate that the overall prediction performance of ResUNet++ can be further improved by using a conditional random field (CRF) and test-time augmentation (TTA). We performed extensive evaluations and validated the improvements using six publicly available datasets: Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS-Larib Polyp DB, ASU-Mayo Clinic Colonoscopy Video Database, and CVC-VideoClinicDB. Moreover, we compare our proposed architecture and resulting model with other state-of-the-art methods. To explore the generalization capability of ResUNet++ on different publicly available polyp datasets, so that it could be used in a real-world setting, we performed an extensive cross-dataset evaluation. The experimental results show that applying CRF and TTA improves performance on various polyp segmentation datasets, in both same-dataset and cross-dataset evaluations.
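Test-time augmentation for segmentation can be sketched as follows. The model is run on transformed copies of the input, each prediction is mapped back to the original orientation, and the results are averaged. This is a minimal NumPy illustration that assumes `model` is a hypothetical callable returning a per-pixel probability map of the same height and width. It assumes vertical and horizontal flips as the augmentations; the paper's exact augmentation set may differ.

```python
import numpy as np

def tta_predict(model, image):
    """Average a segmentation model's probability maps over flipped copies."""
    preds = [model(image)]
    for axis in (0, 1):  # vertical and horizontal flips (assumed augmentations)
        flipped_pred = model(np.flip(image, axis=axis))
        preds.append(np.flip(flipped_pred, axis=axis))  # undo the flip
    return np.mean(preds, axis=0)
```

Undoing each flip before averaging is essential: otherwise the averaged masks would be misaligned with the input image.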

[13] 2107.12436

Feature Synergy, Redundancy, and Independence in Global Model Explanations using SHAP Vector Decomposition

We offer a new formalism for global explanations of pairwise feature dependencies and interactions in supervised models. Building upon SHAP values and SHAP interaction values, our approach decomposes feature contributions into synergistic, redundant and independent components (S-R-I decomposition of SHAP vectors). We propose a geometric interpretation of the components and formally prove its basic properties. Finally, we demonstrate the utility of synergy, redundancy and independence by applying them to a constructed data set and model.
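SHAP values are Shapley values of a model's prediction. For a handful of features they can be computed exactly by enumerating coalitions, which shows the object the S-R-I decomposition builds on. Below is a minimal Python sketch. It assumes "absent" features are replaced by a fixed baseline value, which is one of several value-function choices. It does not implement SHAP interaction values or the S-R-I decomposition itself, and `shapley_values` is a hypothetical name. The cost is exponential in the number of features.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing absent features by baseline."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        z[idx] = x[idx]  # features in the coalition take their actual values
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi
```

For `f(z) = z0 + 2*z1 + z0*z2` at `x = [1, 1, 1]` with a zero baseline, the interaction term `z0*z2` is split equally between features 0 and 2, giving `[1.5, 2.0, 0.5]`. These values sum to `f(x) - f(baseline) = 4`.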

[14] 2107.12443

SeismographAPI: Visualising Temporal-Spatial Crisis Data

Effective decision-making for crisis mitigation increasingly relies on visualisation of large amounts of data. While interactive dashboards are more informative than static visualisations, their development is far more time-consuming and requires a range of technical and financial capabilities. Few open-source libraries are available, which hinders contributions from low-resource environments and impedes rapid crisis responses. To address these limitations, we present SeismographAPI, an open-source library for visualising temporal-spatial crisis data at the country and sub-country level, demonstrated in two use cases: Conflict Monitoring Map and Pandemic Monitoring Map. The library provides easy-to-use data connectors, broad functionality, clear documentation, and runtime efficiency.

[15] 2107.12445

Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression

Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks, due to their promise to provide increased compute efficiency on event-driven neuromorphic hardware. However, to perform well on complex vision applications, most SNN training frameworks yield large inference latency, which translates to increased spike activity and reduced energy efficiency. Hence, minimizing average spike activity while preserving accuracy in deep SNNs remains a significant challenge and opportunity. This paper presents a non-iterative SNN training technique that achieves ultra-high compression with reduced spiking activity while maintaining high inference accuracy. In particular, our framework first uses the attention maps of an uncompressed meta-model to yield compressed ANNs. This step can be tuned to support both irregular and structured channel pruning to leverage computational benefits over a broad range of platforms. The framework then performs sparse-learning-based supervised SNN training using direct inputs. During training, it jointly optimizes the SNN weight, threshold, and leak parameters to drastically reduce the number of time steps required while retaining compression. To evaluate the merits of our approach, we performed experiments with variants of VGG and ResNet on both CIFAR-10 and CIFAR-100, and with VGG16 on Tiny-ImageNet. The SNN models generated through the proposed technique yield SOTA compression ratios of up to 33.4x with no significant drop in accuracy compared to their unpruned baseline counterparts. Compared to existing SNN pruning methods, we achieve up to 8.3x higher compression with improved accuracy.
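To make the trained "threshold" and "leak" parameters concrete, here is a minimal Python sketch of leaky integrate-and-fire (LIF) neurons. It is a forward simulation only; the paper trains these parameters, and this sketch does not. The soft-reset rule (subtracting the threshold after a spike) is an assumption, as is the name `lif_forward`.

```python
import numpy as np

def lif_forward(inputs, threshold, leak):
    """Simulate leaky integrate-and-fire neurons.

    inputs: array of shape (T, N) holding input currents per time step.
    leak: membrane decay factor in (0, 1]; 1.0 means no leak.
    """
    v = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs, dtype=float)
    for t, current in enumerate(inputs):
        v = leak * v + current          # leaky integration of input current
        spikes[t] = v >= threshold      # fire when the threshold is crossed
        v = v - spikes[t] * threshold   # soft reset (assumed choice)
    return spikes
```

With a constant input of 0.5 and threshold 1.0, a neuron with no leak spikes every second step. With leak 0.5, the membrane potential approaches but never reaches the threshold, so the neuron stays silent. This shows how leak and threshold together control spike activity.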

[16] 2107.12450

Resilient Distributed Averaging

In this paper, we propose a fully distributed averaging algorithm that operates in the presence of adversarial Byzantine agents. The algorithm is based on a resilient retrieval procedure, in which all non-Byzantine nodes send their own initial values and retrieve those of the other agents. We establish that the convergence of the proposed algorithm relies on strong robustness of the graph for locally bounded adversaries. We also present a topology analysis in terms of time complexity and the relation between connectivity metrics. Simulation results verify the effectiveness of the proposed algorithm under prescribed graph conditions.

[17] 2107.12452

Accelerated Gradient Descent Learning over Multiple Access Fading Channels

We consider a distributed learning problem in a wireless network consisting of N distributed edge devices and a parameter server (PS). The edge devices aim to train a shared model by communicating with the PS over multiple access channels (MAC), and the objective function is the sum of their local loss functions. This problem has attracted growing interest in distributed sensing systems and, more recently, in federated learning, where it is known as over-the-air computation. In this paper, we develop a novel Accelerated Gradient-descent Multiple Access (AGMA) algorithm that uses momentum-based gradient signals over a noisy fading MAC to improve the convergence rate compared with existing methods. Furthermore, AGMA does not require power control or beamforming to cancel the fading effect, which reduces implementation complexity. We analyze AGMA theoretically and establish a finite-sample error bound for both convex and strongly convex loss functions with Lipschitz gradient. For the strongly convex case, we show that AGMA approaches the best-known linear convergence rate as the network size grows. For the convex case, we show that AGMA significantly improves the sub-linear convergence rate compared with existing methods. Finally, we present simulation results on real datasets that demonstrate the better performance of AGMA.
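The setting can be illustrated with a toy simulation. Devices transmit analog gradients simultaneously, the channel superimposes them with random fading gains plus receiver noise, and the server takes a momentum step without inverting the fading. Below is a minimal NumPy sketch. It assumes quadratic local losses, unit-scale Rayleigh fading whose mean the server knows, and a Nesterov-style look-ahead. It is not the AGMA algorithm or its step-size schedule, and all names and parameter values are hypothetical.

```python
import numpy as np

def over_the_air_nesterov(num_devices=50, dim=3, steps=300, lr=0.2, beta=0.5,
                          noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed local losses: f_n(w) = 0.5 * ||w - t_n||^2
    targets = rng.normal(size=(num_devices, dim))
    mean_gain = np.sqrt(np.pi / 2)  # mean of a unit-scale Rayleigh fading gain
    w = np.zeros(dim)
    w_prev = np.zeros(dim)
    for _ in range(steps):
        y = w + beta * (w - w_prev)  # Nesterov look-ahead point
        local_grads = y - targets    # each device's gradient at y
        gains = rng.rayleigh(scale=1.0, size=num_devices)  # no channel inversion
        # The MAC superimposes the analog transmissions and adds receiver noise.
        received = gains @ local_grads + rng.normal(scale=noise_std, size=dim)
        grad_est = received / (num_devices * mean_gain)  # unbiased for the mean gradient
        w_prev, w = w, y - lr * grad_est
    return w, targets.mean(axis=0)  # final iterate and the true minimizer
```

Because fading is only averaged out rather than cancelled, the iterate settles in a small noisy neighborhood of the minimizer instead of converging exactly. The paper's finite-sample analysis quantifies error terms of this kind.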

[18] 2107.12455

Combining Reward and Rank Signals for Slate Recommendation

We consider the problem of slate recommendation, where the recommender system presents a user with a collection, or slate, of K recommended items at once. If the user finds the recommended items appealing, the user may click, and the recommender system receives some feedback. Two pieces of information are available to the recommender system: whether the slate was clicked (the reward) and, if so, which item was clicked (the rank). In this paper, we formulate several Bayesian models that incorporate the reward signal (Reward model), the rank signal (Rank model), or both (Full model), for non-personalized slate recommendation. In our experiments, we analyze the performance gains of the Full model and show that it achieves significantly lower error as the number of products in the catalog grows or as the slate size increases.
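The two feedback signals can be combined in a simple conjugate model. A Beta posterior estimates how often a fixed slate is clicked (reward), and a Dirichlet posterior estimates which item receives the click when one occurs (rank). Below is a minimal Python sketch under these assumptions, with uniform priors by default. It is not the paper's Reward, Rank, or Full model, and `slate_item_click_rates` is a hypothetical name.

```python
import numpy as np

def slate_item_click_rates(impressions, slate_clicks, item_clicks,
                           a=1.0, b=1.0, alpha=1.0):
    """Posterior-mean click rate of each item in one fixed slate."""
    # Reward signal: Beta(a, b) prior on P(slate is clicked)
    p_click = (a + slate_clicks) / (a + b + impressions)
    # Rank signal: Dirichlet(alpha, ..., alpha) prior on which item is clicked
    counts = np.asarray(item_clicks, dtype=float) + alpha
    p_item_given_click = counts / counts.sum()
    return p_click * p_item_given_click
```

For a 3-item slate shown 10 times with 4 clicks, split 3/1/0 across the items, the estimates are 5/12 × (4/7, 2/7, 1/7).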

[19] 2107.12460

Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers

Self-supervised pre-training of large-scale transformer models on text corpora followed by fine-tuning has achieved state-of-the-art results on a number of natural language processing tasks. Recently, Lu et al. (2021, arXiv:2103.05247) claimed that frozen pretrained transformers (FPTs) match or outperform both training from scratch and unfrozen (fine-tuned) pretrained transformers in a set of transfer tasks to other modalities. In our work, we find that this result is, in fact, an artifact of not tuning the learning rates. After carefully redesigning the empirical setup, we find that when learning rates are tuned properly, pretrained transformers do outperform or match training from scratch in all of our tasks, but only as long as the entire model is fine-tuned. Thus, while transfer from pretrained language models to other modalities does indeed provide gains and hints at exciting possibilities for future work, properly tuning hyperparameters is important for arriving at robust findings.

[20] 2107.12463

Sample Preparation Meets Farey Sequence: A New Design Technique for Free-Flowing Microfluidic Networks

The design of microfluidic biochips poses new challenges for the EDA community due to the availability of various flow-based architectures and the need to cater to diverse applications such as sample preparation, personalized medicine, point-of-care diagnostics, and drug design. The ongoing Covid-19 pandemic has multiplied the demand for low-cost diagnostic lab-on-chips. Sample preparation (dilution or mixing of biochemical fluids) is an indispensable step of any biochemical experiment, including sensitive detection and successful downstream assay execution. Although various design automation tools are currently available for valve-based microfluidic biochips, such biochips are expensive and prone to various manufacturing and operational defects. Additionally, many problems remain open in the domain of free-flowing biochips, where only a single layer of flow channels is used for fluid flow, without any control layer or valves. In this work, we present a methodology for designing a free-flowing biochip that can perform fluid dilution according to the user's requirements. The proposed sample-preparation algorithm uses Farey-sequence arithmetic on the fractions that represent the concentration factor of the target fluid. We also present a detailed layout design of a free-flowing microfluidic architecture that emulates the dilution algorithm. The network is simulated using COMSOL Multiphysics, accounting for relevant hydrodynamic parameters. Experiments on various test cases support the efficacy of the proposed design in terms of accuracy, convergence time, reactant cost, and simplicity of the fluidic network compared to prior art.
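The fraction arithmetic behind this kind of dilution can be illustrated with two standard facts. First, the Farey sequence of order n lists every reduced fraction in [0, 1] whose denominator is at most n. Second, mixing b volume units of fluid at concentration a/b with d units at concentration c/d gives concentration (a+c)/(b+d), the mediant of the two fractions. The Python sketch below shows both, using hypothetical helper names `farey`, `mix` and `mediant`. It is an illustration under these assumptions, not the authors' sample-preparation algorithm.

```python
from fractions import Fraction

def farey(n):
    """Farey sequence of order n, in increasing order."""
    a, b, c, d = 0, 1, 1, n
    seq = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq

def mix(conc1, vol1, conc2, vol2):
    """Concentration after mixing two fluid volumes."""
    return (conc1 * vol1 + conc2 * vol2) / (vol1 + vol2)

def mediant(p, q):
    return Fraction(p.numerator + q.numerator, p.denominator + q.denominator)
```

For example, 1/3 and 1/2 are neighbors in the Farey sequence of order 4. Mixing 3 units at concentration 1/3 with 2 units at concentration 1/2 yields their mediant, 2/5, which is exactly the fraction inserted between them in the order-5 sequence.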

[21] 2107.12466

High-Dimensional Distribution Generation Through Deep Neural Networks

We show that every $d$-dimensional probability distribution of bounded support can be generated through deep ReLU networks from a $1$-dimensional uniform input distribution. Moreover, this is possible without incurring a cost, in terms of approximation error measured in Wasserstein distance, relative to generating the $d$-dimensional target distribution from $d$ independent random variables. This is enabled by a vast generalization of the space-filling approach discovered by Bailey & Telgarsky (2018). The construction we propose highlights the importance of network depth in driving the Wasserstein distance between the target distribution and its neural network approximation to zero. Finally, we find that, for histogram target distributions, the number of bits needed to encode the corresponding generative network equals the fundamental limit for encoding probability distributions as dictated by quantization theory.
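The space-filling idea can be sketched in its simplest form. One ReLU layer computes a "tent" map on [0, 1]. Composing it `depth` times gives a sawtooth with 2^(depth-1) teeth, so a single uniform input u spreads the pair (u, sawtooth(u)) approximately uniformly over the unit square. Below is a minimal NumPy illustration in the spirit of that approach. The specific map u -> (u, sawtooth(u)) is an assumed simplification, not the paper's general construction, and the function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    # One ReLU layer: rises from 0 to 1 on [0, 0.5] and falls back on [0.5, 1]
    return 2 * relu(x) - 4 * relu(x - 0.5)

def sawtooth(x, depth):
    for _ in range(depth):
        x = tent(x)
    return x  # 2**(depth - 1) teeth: oscillations grow exponentially with depth

def uniform_1d_to_2d(u, depth):
    """Map 1-D uniform samples to approximately uniform points in [0, 1]^2."""
    return np.stack([u, sawtooth(u, depth)], axis=-1)
```

Because the number of teeth doubles with each added layer, the curve fills the square more densely as depth grows. This mirrors the abstract's point that depth drives the approximation error to zero.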

[22] 2107.12473

Adversarial Attacks with Time-Scale Representations

We propose a novel framework for real-time black-box universal attacks that disrupt activations of early convolutional layers in deep learning models. Our hypothesis is that perturbations produced in the wavelet space disrupt early convolutional layers more effectively than perturbations performed in the time domain. The main challenge in adversarial attacks is to preserve low-frequency image content while minimally changing the most meaningful high-frequency content. To address this, we formulate an optimization problem that uses time-scale (wavelet) representations as a dual space, in three steps. First, we project the original images into orthonormal subspaces for low and high scales via wavelet coefficients. Second, we perturb the wavelet coefficients of the high-scale projection using a generator network. Third, we generate new adversarial images by projecting back the original coefficients from the low-scale subspace and the perturbed coefficients from the high-scale subspace. We provide a theoretical framework that guarantees a dual mapping between time and time-scale domain representations. We compare our results with state-of-the-art generative-based and gradient-based black-box attacks. We also verify efficacy against multiple defense methods such as JPEG compression, Guided Denoiser, and ComDefend. Our results show that wavelet-based perturbations consistently outperform time-based attacks. This provides new insights into the vulnerabilities of deep learning models, and leveraging time-scale representations could lead to robust architectures or new defense and attack mechanisms.
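The three steps can be sketched with a one-level orthonormal 2-D Haar wavelet. The sketch (1) projects an image into a low-scale band and three high-scale bands, (2) perturbs only the high-scale coefficients, and (3) reconstructs. It is a minimal NumPy illustration that assumes even image dimensions and replaces the paper's generator network with random sign noise. The helper names are hypothetical.

```python
import numpy as np

def haar2d(img):
    """One-level orthonormal 2-D Haar transform of an even-sized image."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    low = (a + b + c + d) / 2
    high = ((a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2)
    return low, high

def ihaar2d(low, high):
    """Inverse of haar2d."""
    h1, h2, h3 = high
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = (low + h1 + h2 + h3) / 2
    out[0::2, 1::2] = (low - h1 + h2 - h3) / 2
    out[1::2, 0::2] = (low + h1 - h2 - h3) / 2
    out[1::2, 1::2] = (low - h1 - h2 + h3) / 2
    return out

def perturb_high_scale(img, eps, rng):
    low, high = haar2d(img)  # step 1: project onto low/high-scale subspaces
    # Step 2: perturb the high-scale coefficients (random signs stand in for a generator)
    noisy = tuple(h + eps * rng.choice([-1.0, 1.0], size=h.shape) for h in high)
    return ihaar2d(low, noisy)  # step 3: project back, low band untouched
```

Because the transform is orthonormal, the pixel-space size of the perturbation equals its size in wavelet space, and the low-scale content of the adversarial image is identical to that of the original.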

[23] 2107.12475

On the hardness of knowing busy beaver values BB(15) and BB(5,4)

The busy beaver value BB($n$) is the maximum number of steps made by any $n$-state, 2-symbol deterministic halting Turing machine starting on a blank tape, and BB($n,k$) denotes its $k$-symbol generalisation to $k\geq 2$. The busy beaver function $n \mapsto \text{BB}(n)$ is uncomputable, and its values have been linked to hard open problems in mathematics and to notions of unprovability. In this paper, we show that there are two explicit Turing machines, one with 15 states and 2 symbols and the other with 5 states and 4 symbols, that halt if and only if the following Collatz-related conjecture by Erd\H{o}s [1979] does not hold: for all $n>8$ there is at least one digit 2 in the base 3 representation of $2^n$. This result implies that knowing the value of BB(15) or BB(5,4) is at least as hard as solving Erd\H{o}s' conjecture, and it makes BB(15), to date, the smallest busy beaver value related to a natural open problem in mathematics. For comparison, Yedidia and Aaronson [2016] show that knowing BB(4888) and BB(5372) is as hard as solving Goldbach's conjecture and the Riemann hypothesis, respectively (later informally improved to BB(27) and BB(744)). Finally, our result puts a finite, albeit large, bound on Erd\H{o}s' conjecture by making it equivalent to the following finite statement: for all $8 < n \leq \min(\text{BB}(15), \text{BB}(5,4))$ there is at least one digit 2 in the base 3 representation of $2^n$.
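The statement of Erdős' conjecture is easy to check by computer for small $n$, which also shows why the bound $n > 8$ appears. Below is a minimal Python sketch with hypothetical helper names. It verifies only a finite range and proves nothing about the conjecture.

```python
def has_ternary_digit_2(m):
    """True if the base-3 representation of m contains the digit 2."""
    while m:
        m, r = divmod(m, 3)
        if r == 2:
            return True
    return False

def exceptions(limit):
    """Exponents n < limit for which 2**n has no digit 2 in base 3."""
    return [n for n in range(limit) if not has_ternary_digit_2(2 ** n)]

print(exceptions(2000))  # prints [0, 2, 8]
```

The last exception is $2^8 = 256$, written 100111 in base 3. The conjecture asserts that no larger exception exists.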

[24] 2107.12477

Decision Making Using Rough Set based Spanning Sets for a Decision System

Rough Set based concepts of Span and Spanning Sets were recently proposed to deal with uncertainties in data. This paper presents novel concepts for a generic decision-making process using Rough Set based span for a decision table. The majority of problems in Artificial Intelligence deal with decision making. The paper provides real-life applications of the proposed span for decision tables, illustrated with a real-life example of flood relief and rescue team assignment, and explores its uses, applications, and properties. The key contribution of the paper is the study of decision making using Rough Set based span for decision tables, as opposed to information systems in prior works. In particular, decision classes are learned automatically by the Rough Set based span technique for a particular problem, thereby automating the decision-making process. These span-based decision-making tools can guide an expert in taking decisions in tough and time-bound situations.

[25] 2107.12479

Terrain-perception-free Quadrupedal Spinning Locomotion on Versatile Terrains: Modeling, Analysis, and Experimental Validation

Dynamic quadrupedal locomotion over rough terrains remains a challenging task, although it has seen remarkable progress over the last few decades. Small-scale quadruped robots are flexible and adaptable enough to traverse numerous uneven terrains, such as slopes and stairs, while moving along their sagittal direction. However, spinning behaviors on uneven terrain often exhibit position drifts. Motivated by this problem, this study presents an algorithmic method that enables accurate spinning motions over uneven terrain and bounds the spinning radius of the Center of Mass (CoM) within a small range so as to minimize the drift risks. A modified spherical foot kinematics representation is proposed to improve the foot kinematic model and rolling dynamics of the quadruped during locomotion. A CoM planner is proposed to generate stable spinning motion based on projected stability margins. Accurate motion tracking is accomplished with a Linear Quadratic Regulator (LQR) to bound the position drift during the spinning movement. Experiments are conducted on a small-scale quadruped robot, and the effectiveness of the proposed method is verified on versatile terrains including flat ground, stairs and slopes.

[26] 2107.12480

Circular-Symmetric Correlation Layer based on FFT

Despite the vast success of standard planar convolutional neural networks, they are not the most efficient choice for analyzing signals that lie on an arbitrarily curved manifold, such as a cylinder. The problem arises because projecting such signals onto a plane inevitably distorts or breaks them in regions that carry valuable information. We propose a Circular-symmetric Correlation Layer (CCL) based on the formalism of roto-translation equivariant correlation on the continuous group $S^1 \times \mathbb{R}$, and implement it efficiently using the well-known Fast Fourier Transform (FFT) algorithm. We analyze the performance of a general network equipped with CCL on various recognition and classification tasks and datasets. A PyTorch implementation of CCL is provided online.
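
To make the FFT idea concrete, here is a minimal NumPy sketch of circular cross-correlation along the angular ($S^1$) axis only, checked against a direct sum and against the rotation-equivariance property the abstract relies on. It is not the CCL package's API: the function names are hypothetical, there are no learned channels, and the translation ($\mathbb{R}$) direction is simply treated as a batch axis.

```python
import numpy as np


def circular_correlation(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular cross-correlation along the last (angular) axis, computed with the FFT.

    out[..., k] = sum_j signal[..., (j + k) % N] * kernel[j], where N = signal.shape[-1].
    """
    n = signal.shape[-1]
    # Zero-pad the kernel to the full angular resolution N.
    padded = np.zeros(n)
    padded[: kernel.shape[0]] = kernel
    # Correlation theorem: FFT(correlation) = FFT(signal) * conj(FFT(kernel)) for real inputs.
    spectrum = np.fft.rfft(signal, axis=-1) * np.conj(np.fft.rfft(padded))
    return np.fft.irfft(spectrum, n=n, axis=-1)


def circular_correlation_direct(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """O(N * K) reference implementation of the same operation."""
    n = signal.shape[-1]
    out = np.zeros(signal.shape)
    for k in range(n):
        for j, w in enumerate(kernel):
            out[..., k] += signal[..., (j + k) % n] * w
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))  # 4 samples along the height (R) axis, 16 along the angle (S^1)
w = rng.standard_normal(5)        # a short angular filter
y = circular_correlation(x, w)
print(np.allclose(y, circular_correlation_direct(x, w)))  # prints True
# Rotating the input (a cyclic shift along the angle) rotates the output by the same amount.
print(np.allclose(circular_correlation(np.roll(x, 3, axis=-1), w), np.roll(y, 3, axis=-1)))  # prints True
```

The FFT route costs O(N log N) per row instead of O(N K), and because the angle axis wraps around, no padding seam appears where a planar projection would cut the cylinder.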

[27] 2107.12482

High-Payload Online Identification and Adaptive Control for an Electrically-actuated Quadruped Robot

Quadruped robots manifest great potential to traverse rough terrains with payload. Numerous traditional control methods for legged dynamic locomotion are model-based and exhibit high sensitivity to model uncertainties and payload variations. Therefore, high-performance model parameter estimation becomes indispensable. However, the inertia parameters of the payload are usually unknown and change dynamically when the quadruped robot is deployed in versatile tasks. To address this problem, online identification of the inertia parameters and the Center of Mass (CoM) position of the payload for quadruped robots has drawn increasing interest. This study presents an adaptive controller based on online payload identification for high-payload-capacity quadruped locomotion, where payload capacity is the ratio between the payload and the robot's self-weight. We name it the Adaptive Controller for Quadruped Locomotion (ACQL); it consists of a recursive update law and a control law. ACQL estimates the external forces and torques induced by the payload online. The estimation is incorporated in inverse-dynamics-based Quadratic Programming (QP) to realize a trotting gait. As such, the tracking accuracy of the robot's CoM and orientation trajectories is improved. The proposed method, ACQL, is verified on a real quadruped robot platform. Experiments demonstrate the estimation efficacy for payloads weighing from 20 kg to 75 kg loaded at different locations on the robot's torso.

[28] 2107.12486

AI Multi-Tenancy on Edge: Concurrent Deep Learning Model Executions and Dynamic Model Placements on Edge Devices

Many real-world applications are widely adopting the edge computing paradigm due to its low latency and better privacy protection. With notable success in AI and deep learning (DL), edge devices and AI accelerators play a crucial role in deploying DL inference services at the edge of the Internet. While prior works quantified various edge devices' efficiency, most studies focused on the performance of edge devices with single DL tasks. Therefore, there is an urgent need to investigate AI multi-tenancy on edge devices, required by many advanced DL applications for edge computing. This work investigates two techniques - concurrent model executions and dynamic model placements - for AI multi-tenancy on edge devices. With image classification as an example scenario, we empirically evaluate AI multi-tenancy on various edge devices, AI accelerators, and DL frameworks to identify its benefits and limitations. Our results show that multi-tenancy significantly improves DL inference throughput by up to 3.3x -- 3.8x on Jetson TX2. These AI multi-tenancy techniques also open up new opportunities for flexible deployment of multiple DL services on edge devices and AI accelerators.

[29] 2107.12490

LEGATO: A LayerwisE Gradient AggregaTiOn Algorithm for Mitigating Byzantine Attacks in Federated Learning

Federated learning has arisen as a mechanism that allows multiple participants to collaboratively train a model without sharing their data. In these settings, participants (workers) may not fully trust each other; for instance, a set of competitors may collaboratively train a machine learning model to detect fraud. The workers provide local gradients that a central server uses to update a global model. This global model can be corrupted when Byzantine workers send malicious gradients, which necessitates robust methods for aggregating gradients that mitigate the adverse effects of Byzantine inputs. Existing robust aggregation algorithms are often computationally expensive and only effective under strict assumptions. In this paper, we introduce LayerwisE Gradient AggregaTiOn (LEGATO), an aggregation algorithm that is, by contrast, scalable and generalizable. Informed by a study of layer-specific responses of gradients to Byzantine attacks, LEGATO employs a dynamic gradient reweighting scheme that is novel in its treatment of gradients based on layer-specific robustness. We show that LEGATO is more computationally efficient than multiple state-of-the-art techniques and more generally robust across a variety of attack settings in practice. We also demonstrate LEGATO's benefits for gradient descent convergence in the absence of an attack.

[30] 2107.12492

SpectGRASP: Robotic Grasping by Spectral Correlation

This paper presents a spectral correlation-based method (SpectGRASP) for robotic grasping of arbitrarily shaped, unknown objects. Given a point cloud of an object, SpectGRASP extracts contact points on the object's surface matching the hand configuration. It requires neither offline training nor a priori object models. We propose a novel Binary Extended Gaussian Image (BEGI), which represents the point cloud surface normals of both the object and the robot fingers as signals on a 2-sphere. Spherical harmonics are then used to estimate the correlation between the finger and object BEGIs. The resulting spectral correlation density function provides a similarity measure of gripper and object surface normals. This is highly efficient in that it is simultaneously evaluated at all possible finger rotations in SO(3). A set of contact points is then extracted for each finger using rotations with high correlation values. We then use the Local Contact Moment (LoCoMo) similarity metric from our previous work to sequentially rank the generated grasps such that the one with maximum likelihood is executed. We evaluate the performance of SpectGRASP by conducting experiments with a 7-axis robot fitted with a parallel-jaw gripper in a physics simulation environment. The results indicate that the method can not only grasp individual objects but also successfully clear randomly organized groups of objects. SpectGRASP also outperforms the closest state-of-the-art method in terms of grasp generation time and grasp efficiency.

[31] 2107.12499

CalCROP21: A Georeferenced multi-spectral dataset of Satellite Imagery and Crop Labels

Mapping and monitoring crops is a key step towards sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet that revolutionized computer vision applications can accelerate development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL) which contains crop labels at 30m resolution for the entire United States of America. While CDL is state of the art and is widely used for a number of agricultural applications, it has a number of limitations (e.g., pixelated errors, labels carried over from previous errors and absence of input imagery along with class labels). In this work, we create a new semantic segmentation benchmark dataset, which we call CalCROP21, for the diverse crops in the Central Valley region of California at 10m spatial resolution using a Google Earth Engine based robust image processing pipeline and a novel attention based spatio-temporal semantic segmentation algorithm STATT. STATT uses re-sampled (interpolated) CDL labels for training, but is able to generate a better prediction than CDL by leveraging spatial and temporal patterns in Sentinel2 multi-spectral image series to effectively capture phenologic differences amongst crops and uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation to show that STATT has significantly better results when compared to the resampled CDL labels. We have released the dataset and the processing pipeline code for generating the benchmark dataset.

[32] 2107.12501

Adversarial Random Forest Classifier for Automated Game Design

Autonomous game design, generating games algorithmically, has been a longtime goal within the technical games research field. However, existing autonomous game design systems have relied in large part on human authoring for game design knowledge, such as fitness functions in search-based methods. In this paper, we describe an experiment that attempts to learn a human-like fitness function for autonomous game design in an adversarial manner. While our experimental work did not meet our expectations, we present an analysis of our system and results that we hope will be informative to future autonomous game design research.

[33] 2107.12506

TaikoNation: Patterning-focused Chart Generation for Rhythm Action Games

Generating rhythm game charts from songs via machine learning has been a problem of increasing interest in recent years. However, all existing systems struggle to replicate human-like patterning: the placement of game objects in relation to each other to form congruent patterns based on events in the song. Patterning is a key identifier of high quality rhythm game content, seen as a necessary component in human rankings. We establish a new approach for chart generation that produces charts with more congruent, human-like patterning than seen in prior work.

[34] 2107.12507

Analyzing vehicle pedestrian interactions combining data cube structure and predictive collision risk estimation model

Traffic accidents are a threat to human lives, particularly to pedestrians, causing premature deaths. Therefore, it is necessary to devise systems that prevent accidents in advance and respond proactively, using potentially risky situations as one of the surrogate safety measurements. This study introduces a new concept of a pedestrian safety system that combines field and centralized processes. The system can warn of upcoming risks immediately in the field and improve the safety of risk-frequent areas by assessing the safety levels of roads without actual collisions. In particular, this study focuses on the latter by introducing a new analytical framework for crosswalk safety assessment based on vehicle/pedestrian behaviors and environmental features. We obtain these behavioral features from actual traffic video footage in the city with fully automatic processing. The proposed framework analyzes these behaviors from multidimensional perspectives by constructing a data cube structure, which combines an LSTM-based predictive collision risk (PCR) estimation model and online analytical processing (OLAP) operations. Using the PCR estimation model, we categorize the severity of risks into four levels and apply the proposed framework to assess crosswalk safety with behavioral features. Our analytic experiments are based on two scenarios, and the various descriptive results reveal the movement patterns of vehicles and pedestrians by road environment and the relationships between risk levels and car speeds. Thus, the proposed framework can support decision makers by providing valuable information to improve pedestrian safety and prevent future accidents, and it can help us proactively understand the behaviors of vehicles and pedestrians near crosswalks. To confirm the feasibility and applicability of the proposed framework, we implement it and apply it to CCTVs in actual operation in Osan City, Korea.

[35] 2107.12512

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene, and computationally demanding optimizations. In this paper, we tackle these limitations for the specific problem of few-shot full 3D head reconstruction, by endowing coordinate-based representations with a probabilistic shape prior that enables faster convergence and better generalization when using few input images (down to three). First, we learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations. At test time, we jointly overfit two coordinate-based neural networks to the scene, one modeling the geometry and another estimating the surface radiance, using implicit differentiable rendering. We devise a two-stage optimization strategy in which the learned prior is used to initialize and constrain the geometry during an initial optimization phase. Then, the prior is unfrozen and fine-tuned to the scene. By doing this, we achieve high-fidelity head reconstructions, including hair and shoulders, and with a high level of detail that consistently outperforms both state-of-the-art 3D Morphable Models methods in the few-shot scenario, and non-parametric methods when large sets of views are available.

[36] 2107.12514

Language Grounding with 3D Objects

Seemingly simple natural language requests to a robot are generally underspecified, for example "Can you bring me the wireless mouse?" When viewing mice on the shelf, the number of buttons or presence of a wire may not be visible from certain angles or positions. Flat images of candidate mice may not provide the discriminative information needed for "wireless". The world, and objects in it, are not flat images but complex 3D shapes. If a human requests an object based on any of its basic properties, such as color, shape, or texture, robots should perform the necessary exploration to accomplish the task. In particular, while substantial effort and progress have been made on understanding explicitly visual attributes like color and category, comparatively little progress has been made on understanding language about shapes and contours. In this work, we introduce a novel reasoning task that targets both visual and non-visual language about 3D objects. Our new benchmark, ShapeNet Annotated with Referring Expressions (SNARE), requires a model to choose which of two objects is being referenced by a natural language description. We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, these models are still weaker at understanding the 3D nature of objects, properties which play a key role in manipulation. Finally, we find that adding view estimation to language grounding models improves accuracy both on SNARE and when identifying objects referred to in language on a robot platform.

[37] 2107.12518

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

We introduce a method that automatically segments images into semantically meaningful regions without human supervision. Derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets. In cases where semantic regions might be hard for humans to define and consistently label, our method is still able to find meaningful and consistent semantic classes. In our work, we use the pretrained StyleGAN2 generative model (Karras et al., 2020): clustering in the feature space of the generative model allows us to discover semantic classes. Once classes are discovered, a synthetic dataset with generated images and corresponding segmentation masks can be created. A segmentation model is then trained on the synthetic dataset and generalizes to real images. Additionally, by using CLIP (Radford et al., 2021), we are able to use prompts defined in natural language to discover some desired semantic classes. We test our method on publicly available datasets and show state-of-the-art results.

[38] 2107.12519

Proactive Composition of Mobile IoT Energy Services

We propose a novel proactive composition framework of wireless energy services in a crowdsourced IoT environment. We define a new model for energy services and requests that includes providers' and consumers' mobility patterns and energy usage behavior. The proposed composition approach leverages the mobility and energy usage behavior to generate energy services and requests proactively. Preliminary experimental results demonstrate the effectiveness of generating proactive energy requests and composing proactive services.

[39] 2107.12521

Restricted Boltzmann Machine and Deep Belief Network: Tutorial and Survey

This is a tutorial and survey paper on Boltzmann Machine (BM), Restricted Boltzmann Machine (RBM), and Deep Belief Network (DBN). We start with the required background on probabilistic graphical models, Markov random field, Gibbs sampling, statistical physics, Ising model, and the Hopfield network. Then, we introduce the structures of BM and RBM. The conditional distributions of visible and hidden variables, Gibbs sampling in RBM for generating variables, training BM and RBM by maximum likelihood estimation, and contrastive divergence are explained. Then, we discuss different possible discrete and continuous distributions for the variables. We introduce conditional RBM and how it is trained. Finally, we explain deep belief network as a stack of RBM models. This paper on Boltzmann machines can be useful in various fields including data science, statistics, neural computation, and statistical physics.
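
The training procedure summarized above (Gibbs sampling and contrastive divergence) can be sketched for the simplest case. This is a minimal NumPy sketch of one CD-1 step for an RBM with binary visible and hidden units, assuming the common energy $E(v,h) = -b^\top v - c^\top h - v^\top W h$; the names `cd1_update` and `reconstruction_error`, the toy data, and the hyperparameters are illustrative, and using hidden probabilities rather than samples in the gradient statistics is one common variant rather than the tutorial's prescribed choice.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step for a binary RBM, updating W, b, c in place.

    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden);
    b: visible biases (n_visible,); c: hidden biases (n_hidden,).
    """
    ph0 = sigmoid(v0 @ W + c)                         # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b)                       # p(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)  # one Gibbs step back to the visibles
    ph1 = sigmoid(v1 @ W + c)                         # p(h = 1 | v1)
    # Data statistics minus one-step "model" statistics approximate the likelihood gradient.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)


def reconstruction_error(v, W, b, c):
    """Mean squared error of a deterministic (mean-field) up-down pass."""
    ph = sigmoid(v @ W + c)
    pv = sigmoid(ph @ W.T + b)
    return float(np.mean((v - pv) ** 2))


rng = np.random.default_rng(0)
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 8, dtype=float)
W = 0.01 * rng.standard_normal((6, 3))
b = np.zeros(6)
c = np.zeros(3)
before = reconstruction_error(data, W, b, c)
for _ in range(2000):
    cd1_update(data, W, b, c, lr=0.1, rng=rng)
after = reconstruction_error(data, W, b, c)
print(before, after)
```

Running a single Gibbs step instead of sampling to equilibrium is exactly what makes CD cheap; the price is a biased estimate of the model expectation.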

[40] 2107.12524

Ensemble Learning For Mega Man Level Generation

Procedural content generation via machine learning (PCGML) is the process of procedurally generating game content using models trained on existing game content. PCGML methods can struggle to capture the true variance present in underlying data with a single model. In this paper, we investigate the use of ensembles of Markov chains for procedurally generating \emph{Mega Man} levels. We conduct an initial investigation of our approach and evaluate it on measures of playability and stylistic similarity in comparison to a non-ensemble, existing Markov chain approach.

[41] 2107.12527

Physics-Enforced Modeling for Insertion Loss of Transmission Lines by Deep Neural Networks

In this paper, we investigate data-driven parameterized modeling of insertion loss for transmission lines with respect to design parameters. We first show that direct application of neural networks can lead to non-physical models with negative insertion loss. To mitigate this problem, we propose two deep learning solutions. The first adds a regularization term, which represents the passivity condition, to the final loss function to penalize negative values of insertion loss. The second first defines a third-order polynomial expression that ensures positiveness to approximate the insertion loss, and then employs the DeepONet neural network structure, recently proposed for function and system modeling, to predict the coefficients of the polynomial. The experimental results on an open-sourced SI/PI database of a PCB design show that both methods can ensure the positiveness of the insertion loss. Furthermore, both methods achieve similar prediction results, while the polynomial-based DeepONet method trains faster than the DeepONet-based method.

[42] 2107.12530

Convergence of Deep ReLU Networks

We explore the convergence of deep neural networks with the popular ReLU activation function as the depth of the networks tends to infinity. To this end, we introduce the notions of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function with multiplications by activation matrices on activation domains, we obtain an explicit expression of the ReLU network. We then identify the convergence of ReLU networks with the convergence of a class of infinite products of matrices. Necessary and sufficient conditions for the convergence of these infinite products of matrices are studied. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions in terms of the weight matrices and bias vectors at hidden layers for the pointwise convergence of deep ReLU networks. These results provide mathematical insights into the design strategy of the well-known deep residual networks in image classification.
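
The key step above, replacing each ReLU with a multiplication by an activation matrix, can be checked numerically. In this minimal NumPy sketch, each activation matrix is assumed to be the diagonal 0/1 matrix $D_\ell = \operatorname{diag}(\mathbf{1}[W_\ell x_{\ell-1} + b_\ell > 0])$, so that on an activation domain the network collapses to an affine map $A x + \beta$ built from products of $D_\ell W_\ell$. The function names and the random network are hypothetical, and the paper's exact notation may differ; the convergence analysis itself is not reproduced.

```python
import numpy as np


def relu_net(x, weights, biases):
    """Plain forward pass: x_l = max(W_l x_{l-1} + b_l, 0)."""
    for W, b in zip(weights, biases):
        x = np.maximum(W @ x + b, 0.0)
    return x


def activation_matrices(x, weights, biases):
    """Diagonal 0/1 matrices D_l that reproduce the ReLU at the point x."""
    mats = []
    for W, b in zip(weights, biases):
        z = W @ x + b
        D = np.diag((z > 0).astype(float))
        mats.append(D)
        x = D @ z  # identical to np.maximum(z, 0)
    return mats


def explicit_affine_form(x, weights, biases):
    """Return (A, beta) with net(y) = A @ y + beta for all y in x's activation domain."""
    A = np.eye(x.shape[0])
    beta = np.zeros(x.shape[0])
    for D, W, b in zip(activation_matrices(x, weights, biases), weights, biases):
        A = D @ W @ A               # product of D_l W_l over the layers
        beta = D @ (W @ beta + b)   # accumulated bias term
    return A, beta


rng = np.random.default_rng(1)
dims = [3, 5, 4, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
x = rng.standard_normal(dims[0])
A, beta = explicit_affine_form(x, weights, biases)
print(np.allclose(relu_net(x, weights, biases), A @ x + beta))  # prints True
```

As depth grows, $A$ becomes an ever longer matrix product, which is why convergence of the network reduces to convergence of infinite products of matrices.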

[43] 2107.12532

Generating Lode Runner Levels by Learning Player Paths with LSTMs

Machine learning has been a popular tool in many different fields, including procedural content generation. However, procedural content generation via machine learning (PCGML) approaches can struggle with controllability and coherence. In this paper, we attempt to address these problems by learning to generate human-like paths, and then generating levels based on these paths. We extract player path data from gameplay video, train an LSTM to generate new paths based on this data, and then generate game levels based on this path data. We demonstrate that our approach leads to more coherent levels for the game Lode Runner in comparison to an existing PCGML approach.

[44] 2107.12533

Toward Co-creative Dungeon Generation via Transfer Learning

Co-creative Procedural Content Generation via Machine Learning (PCGML) refers to systems where a PCGML agent and a human work together to produce output content. One of the limitations of co-creative PCGML is that it requires co-creative training data for a PCGML agent to learn to interact with humans. However, acquiring this data is a difficult and time-consuming process. In this work, we propose approximating human-AI interaction data and employing transfer learning to adapt learned co-creative knowledge from one game to a different game. We explore this approach for co-creative Zelda dungeon room generation.

[45] 2107.12534

On the Construction of Protograph-based Partially Doped GLDPC Codes

Generalized low-density parity-check (GLDPC) codes are a class of codes in which the single parity-check nodes of a conventional low-density parity-check (LDPC) code are replaced by linear codes with higher parity-check constraints. In this paper, we introduce a new method of constructing GLDPC codes by inserting generalized check nodes for partial doping. While the conventional protograph GLDPC code dopes the protograph check nodes by replacing them with generalized check nodes, the new GLDPC code, called a partially doped GLDPC (PD-GLDPC) code, is constructed by adding generalized check nodes and partially doping selected variable nodes to possess higher degrees of freedom. The proposed PD-GLDPC codes enable more accurate extrinsic information transfer (EXIT) analysis, and their doping granularity in terms of the protograph can be finer than that of the conventional GLDPC code. We also propose a constraint on the typical minimum distance of PD-GLDPC codes and prove that the PD-GLDPC codes satisfying this condition have the linear minimum distance growth property. Furthermore, we obtain threshold-optimized protographs for both regular and irregular ensembles of the proposed PD-GLDPC codes over the binary erasure channel (BEC). Specifically, we propose construction algorithms for both regular and irregular protograph-based PD-GLDPC codes that enable the construction of GLDPC codes with higher rates than the conventional ones. The block error rate performance of the proposed PD-GLDPC code shows a reasonably good waterfall performance with a low error floor, and it outperforms other LDPC codes with the same code rate, code length, and degree distribution.

[46] 2107.12540

A Neurorobotics Approach to Behaviour Selection based on Human Activity Recognition

Behaviour selection has been an active research topic for robotics, in particular in the field of human-robot interaction. For a robot to interact effectively and autonomously with humans, the coupling between techniques for human activity recognition, based on sensing information, and robot behaviour selection, based on decision-making mechanisms, is of paramount importance. However, most approaches to date consist of deterministic associations between the recognised activities and the robot behaviours, neglecting the uncertainty inherent to sequential predictions in real-time applications. In this paper, we address this gap by presenting a neurorobotics approach based on computational models that resemble neurophysiological aspects of living beings. This neurorobotics approach was compared to a non-bioinspired, heuristics-based approach. To evaluate both approaches, a robot simulation is developed, in which a mobile robot has to accomplish tasks according to the activity being performed by the inhabitant of an intelligent home. The outcomes of each approach were evaluated according to the number of correct outcomes provided by the robot. Results revealed that the neurorobotics approach is advantageous, especially considering the computational models based on more complex animals.

[47] 2107.12541

BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation

Depth map super-resolution is a task in high demand for practical industrial applications. Existing color-guided depth map super-resolution methods usually require an extra branch to extract high-frequency detail information from the RGB image to guide the low-resolution depth map reconstruction. However, because differences remain between the two modalities, direct information transmission in the feature dimension or edge map dimension cannot achieve satisfactory results, and may even trigger texture copying in areas where the structures of the RGB-D pair are inconsistent. Inspired by multi-task learning, we propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels. For the interaction of the two subnetworks, we adopt a differentiated guidance strategy and design two bridges correspondingly. One is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task. The other is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which provides the content guidance learned from the DSR task to the MDE task. The entire network architecture is highly portable and can provide a paradigm for associating the DSR and MDE tasks. Extensive experiments on benchmark datasets demonstrate that our method achieves competitive performance. Our code and models are available at

[48] 2107.12542

Energy-based Unknown Intent Detection with Data Manipulation

Unknown intent detection aims to identify the out-of-distribution (OOD) utterance whose intent has never appeared in the training set. In this paper, we propose using energy scores for this task as the energy score is theoretically aligned with the density of the input and can be derived from any classifier. However, high-quality OOD utterances are required during the training stage in order to shape the energy gap between OOD and in-distribution (IND), and these utterances are difficult to collect in practice. To tackle this problem, we propose a data manipulation framework to Generate high-quality OOD utterances with importance weighTs (GOT). Experimental results show that the energy-based detector fine-tuned by GOT can achieve state-of-the-art results on two benchmark datasets.
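
As a concrete reading of "energy scores ... derived from any classifier", the sketch below computes the energy score as it is commonly defined in prior work on energy-based OOD detection, $E(x) = -T \log \sum_i e^{f_i(x)/T}$ over a classifier's logits $f(x)$, with a numerically stable log-sum-exp. The GOT data-manipulation framework and the fine-tuning step are not shown; the function names and the threshold value are hypothetical.

```python
import numpy as np


def energy_score(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """E(x) = -T * log(sum_i exp(f_i(x) / T)), computed with a stable log-sum-exp."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)
    lse = np.squeeze(m, axis=-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse


def flag_unknown_intent(logits, threshold, temperature=1.0):
    """Flag utterances whose energy exceeds a (hypothetical) threshold as OOD."""
    return energy_score(logits, temperature) > threshold


logits = np.array([[10.0, 0.0, 0.0],   # confident prediction -> low energy (about -10)
                   [0.0, 0.0, 0.0]])   # flat prediction -> higher energy (-log 3)
print(energy_score(logits))
print(flag_unknown_intent(logits, threshold=-5.0))  # prints [False  True]
```

Lower energy corresponds to higher (unnormalized) density, so in-distribution utterances should sit below the threshold; training with OOD examples is what widens the gap between the two groups.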

[49] 2107.12544

Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world's oldest board games and many classic video games, but they require vast quantities of experience to learn successfully -- none of today's algorithms account for the human ability to learn so many different tasks, so quickly. Here we propose a new approach to this challenge based on a particularly strong form of model-based RL which we call Theory-Based Reinforcement Learning, because it uses human-like intuitive theories -- rich, abstract, causal models of physical objects, intentional agents, and their interactions -- to explore and model an environment, and plan effectively to achieve task goals. We instantiate the approach in a video game playing agent called EMPA (the Exploring, Modeling, and Planning Agent), which performs Bayesian inference to learn probabilistic generative models expressed as programs for a game-engine simulator, and runs internal simulations over these models to support efficient object-based, relational exploration and heuristic planning. EMPA closely matches human learning efficiency on a suite of 90 challenging Atari-style video games, learning new games in just minutes of game play and generalizing robustly to new game situations and new levels. The model also captures fine-grained structure in people's exploration trajectories and learning dynamics. Its design and behavior suggest a way forward for building more general human-like AI systems.

[50] 2107.12545

Double Deep Q-learning Based Real-Time Optimization Strategy for Microgrids

The uncertainties from distributed energy resources (DERs) bring significant challenges to the real-time operation of microgrids. In addition, due to the nonlinear constraints in the AC power flow equation and the nonlinearity of the battery storage model, etc., the optimization of the microgrid is a mixed-integer nonlinear programming (MINLP) problem. It is challenging to solve this kind of stochastic nonlinear optimization problem. To address the challenge, this paper proposes a deep reinforcement learning (DRL) based optimization strategy for the real-time operation of the microgrid. Specifically, we construct the detailed operation model for the microgrid and formulate the real-time optimization problem as a Markov Decision Process (MDP). Then, a double deep Q network (DDQN) based architecture is designed to solve the MINLP problem. The proposed approach can learn a near-optimal strategy only from the historical data. The effectiveness of the proposed algorithm is validated by the simulations on a 10-bus microgrid system and a modified IEEE 69-bus microgrid system. The numerical simulation results demonstrate that the proposed approach outperforms several existing methods.
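
The "double" in DDQN refers to how the learning target is formed: the online network selects the next action and a separate target network evaluates it, which reduces the overestimation bias of standard DQN. The following NumPy sketch shows only that target computation for a discrete action space, with made-up Q-values; the microgrid MDP (states, actions, rewards) and the networks themselves are not modeled, and the function names are illustrative.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: the online network selects a', the target network evaluates it.

    y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))
    """
    best_actions = np.argmax(next_q_online, axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Standard DQN targets for comparison: the same network selects and evaluates."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])            # the second transition ends an episode
next_q_online = np.array([[2.0, 1.0], [3.0, 0.0]])
next_q_target = np.array([[5.0, 7.0], [4.0, 9.0]])
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.5))  # prints [3.5 0. ]
print(dqn_targets(rewards, next_q_target, dones, gamma=0.5))                         # prints [4.5 0. ]
```

In the first transition the two rules disagree: standard DQN takes the target network's maximum (7.0), while Double DQN evaluates the online network's preferred action (5.0), giving a more conservative target.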

[51] 2107.12547

Probing neural networks with t-SNE, class-specific projections and a guided tour

We use graphical methods to probe neural nets that classify images. Plots of t-SNE outputs at successive layers in a network reveal an increasingly organized arrangement of the data points. They can also reveal how a network can diminish or even forget about within-class structure as the data proceeds through layers. We use class-specific analogues of principal components to visualize how succeeding layers separate the classes. These allow us to sort images from a given class from most typical to least typical (in the data), and they also serve as very useful projection coordinates for data visualization. We find them especially useful when defining versions of guided tours for animated data visualization.

[52] 2107.12548

KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation

Visualization recommendation, or automatic visualization generation, can significantly lower the barriers for general users to rapidly create effective data visualizations, especially for users without a background in data visualization. However, existing rule-based approaches require tedious manual specification of visualization rules by visualization experts. Other machine learning-based approaches often work like black boxes, making it difficult to understand why a specific visualization is recommended and limiting the wider adoption of these approaches. This paper fills the gap by presenting KG4Vis, a knowledge graph (KG)-based approach for visualization recommendation. It does not require manual specification of visualization rules and can also guarantee good explainability. Specifically, we propose a framework for building knowledge graphs, consisting of three types of entities (i.e., data features, data columns and visualization design choices) and the relations between them, to model the mapping rules between data and effective visualizations. A TransE-based embedding technique is employed to learn the embeddings of both the entities and the relations of the knowledge graph from existing dataset-visualization pairs. Such embeddings intrinsically model the desirable visualization rules. Then, given a new dataset, effective visualizations can be inferred from the knowledge graph with semantically meaningful rules. We conducted extensive evaluations to assess the proposed approach, including quantitative comparisons, case studies and expert interviews. The results demonstrate the effectiveness of our approach.
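TransE embeds each triple (head, relation, tail) so that head + relation lies close to tail. Ranking the candidate tails by distance then gives a recommendation. The sketch below shows only the scoring, the standard margin ranking loss and the ranking step. The embeddings are hand-set 2-D vectors rather than learned ones, and the entity names (`bar_chart`, `line_chart`) are hypothetical stand-ins for the paper's design-choice entities.

```python
from typing import Dict, List, Tuple

import numpy as np

Triple = Tuple[np.ndarray, np.ndarray, np.ndarray]


def transe_distance(h: np.ndarray, r: np.ndarray, t: np.ndarray, norm: int = 1) -> float:
    """TransE models a true triple (h, r, t) as h + r close to t; smaller is more plausible."""
    return float(np.linalg.norm(h + r - t, ord=norm))


def margin_loss(pos: Triple, neg: Triple, margin: float = 1.0, norm: int = 1) -> float:
    """Margin ranking loss for one positive triple and one corrupted (negative) triple."""
    return max(0.0, margin + transe_distance(*pos, norm=norm) - transe_distance(*neg, norm=norm))


def rank_tails(h: np.ndarray, r: np.ndarray, candidates: Dict[str, np.ndarray], norm: int = 1) -> List[str]:
    """Sort candidate tail entities (e.g. design choices) from most to least plausible."""
    return sorted(candidates, key=lambda name: transe_distance(h, r, candidates[name], norm))
```

During training, gradient steps on `margin_loss` push true triples closer than corrupted ones by at least `margin`. At inference time, `rank_tails` is all that is needed to turn the learned embeddings into a ranked list.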

[53] 2107.12549

Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation

6D pose estimation of rigid objects from a single RGB image has seen tremendous improvements recently by using deep learning to combat complex real-world variations, but a majority of methods build models on the per-object level, failing to scale to multiple objects simultaneously. In this paper, we present a novel approach for scalable 6D pose estimation, by self-supervised learning on synthetic data of multiple objects using a single autoencoder. To handle multiple objects and generalize to unseen objects, we disentangle the latent object shape and pose representations, so that the latent shape space models shape similarities, and the latent pose code is used for rotation retrieval by comparison with canonical rotations. To encourage shape space construction, we apply contrastive metric learning and enable the processing of unseen objects by referring to similar training objects. The different symmetries across objects induce inconsistent latent pose spaces, which we capture with a conditioned block producing shape-dependent pose codebooks by re-entangling shape and pose representations. We test our method on two multi-object benchmarks with real data, T-LESS and NOCS REAL275, and show it outperforms existing RGB-based methods in terms of pose estimation accuracy and generalization.

[54] 2107.12550

Accelerated Multiple Precision Direct Method and Mixed Precision Iterative Refinement on Python Programming Environment

The current Python programming environment does not have any reliable and efficient multiple precision floating-point (MPF) arithmetic except the "mpmath" and "gmpy2" packages, which are based on the GNU MP (GMP) and MPFR libraries. Although it is well known that multi-component-type MPF libraries can be used for medium-length precision arithmetic under 200 bits, they are not widely used in the Python environment. In this paper, we describe our accelerated MPF direct method using AVX2 techniques and its application to mixed precision iterative refinement combined with mpmath, and demonstrate their efficiency on x86\_64 computational environments.
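Mixed precision iterative refinement solves the system cheaply in low precision. It then repeatedly computes the residual in high precision and solves for a correction. The paper pairs AVX2-accelerated multi-component MPF with mpmath. The sketch below is only an assumption-laden stand-in: NumPy `float32` plays the low-precision role and `float64` the high-precision role. For clarity it also re-solves at every step, whereas a practical implementation would factor the matrix once and reuse the LU factors. The function name is hypothetical.

```python
import numpy as np


def mixed_precision_refine(A, b, iters: int = 5) -> np.ndarray:
    """Solve Ax = b with float32 solves and float64 residuals (illustrative only)."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)
    # Initial solution computed entirely in low precision.
    x = np.linalg.solve(A32, b64.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        # Residual in high precision: this is what recovers the lost digits.
        r = b64 - A64 @ x
        # Correction solved in low precision (a real code would reuse an LU factorization).
        d = np.linalg.solve(A32, r.astype(np.float32))
        x = x + d.astype(np.float64)
    return x
```

For a well-conditioned matrix, each sweep shrinks the error by roughly the condition number times the low-precision unit roundoff. A handful of iterations is therefore enough to reach high-precision accuracy.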

[55] 2107.12560

Perception-and-Regulation Network for Salient Object Detection

Effective fusion of different types of features is the key to salient object detection. Most existing network designs are based on the subjective experience of researchers, and the feature fusion process does not consider the relationship between the fused features and the highest-level features. In this paper, we focus on the feature relationship and propose a novel global attention unit, which we term the "perception-and-regulation" (PR) block, that adaptively regulates the feature fusion process by explicitly modeling interdependencies between features. The perception part uses the structure of fully-connected layers in classification networks to learn the size and shape of objects. The regulation part selectively strengthens and weakens the features to be fused. An imitating eye observation module (IEO) is further employed to improve the global perception ability of the network. The imitation of foveal vision and peripheral vision enables IEO to scrutinize highly detailed objects and to organize the broad spatial scene to better segment objects. Extensive experiments conducted on SOD datasets demonstrate that the proposed method performs favorably against 22 state-of-the-art methods.

[56] 2107.12562

Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis

Cross-speaker style transfer is crucial to applications of multi-style and expressive speech synthesis at scale, since it requires neither that the target speakers be experts in expressing all styles nor that corresponding recordings be collected for model training. However, the performance of existing style transfer methods still falls far short of real application needs. The root causes are mainly twofold. First, the style embedding extracted from a single reference utterance can hardly provide fine-grained and appropriate prosody information for arbitrary text to be synthesized. Second, in these models the content/text, prosody, and speaker timbre are usually highly entangled, so it is unrealistic to expect satisfactory results when freely combining these components, for example when transferring speaking style between speakers. In this paper, we propose a cross-speaker style transfer text-to-speech (TTS) model with an explicit prosody bottleneck. The prosody bottleneck robustly builds up the kernels accounting for speaking style and disentangles prosody from content and speaker timbre, thereby guaranteeing high-quality cross-speaker style transfer. Evaluation results show that the proposed method even achieves performance on par with the source speaker's speaker-dependent (SD) model in objective measurements of prosody, and significantly outperforms the cycle consistency and GMVAE-based baselines in objective and subjective evaluations.

[57] 2107.12563

Parallel Detection for Efficient Video Analytics at the Edge

Deep Neural Network (DNN) trained object detectors are widely deployed in many mission-critical systems for real-time video analytics at the edge, such as autonomous driving and video surveillance. A common performance requirement in these mission-critical edge services is near real-time latency of online object detection on edge devices. However, even with well-trained DNN object detectors, the online detection quality at the edge may deteriorate for a number of reasons, such as the limited capacity to run DNN object detection models on heterogeneous edge devices, and detection quality degradation due to random frame dropping when the detection processing rate is significantly slower than the incoming video frame rate. This paper addresses these problems by exploiting multi-model multi-device detection parallelism for fast object detection in edge systems with heterogeneous edge devices. First, we analyze the performance bottleneck of running a well-trained DNN model at the edge for real-time online object detection. We use offline detection as a reference model, and examine the root cause by analyzing the mismatch among the incoming video streaming rate, the video processing rate for object detection, and the output rate for real-time detection visualization of video streaming. Second, we study performance optimizations by exploiting multi-model detection parallelism. We show that the model-parallel detection approach can effectively speed up the FPS detection processing rate, minimizing the FPS disparity with the incoming video frame rate on heterogeneous edge devices. We evaluate the proposed approach using SSD300 and YOLOv3 on benchmark videos of different video stream rates. The results show that exploiting multi-model detection parallelism can speed up the online object detection processing rate and deliver near real-time object detection performance for efficient video analytics at the edge.

[58] 2107.12565

A Biomedically oriented automatically annotated Twitter COVID-19 Dataset

The use of social media data, such as Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time and to study the societal implications of interventions, as well as the sequelae presented by recovered COVID-19 cases (Long-COVID). However, manually curated social media datasets are difficult to come by due to the high cost of manual annotation and the effort needed to identify the correct texts. When datasets are available, they are usually very small and their annotations do not generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several spaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.

[59] 2107.12566

Thunder CTF: Learning Cloud Security on a Dime

Organizations have rapidly shifted infrastructure and applications over to public cloud computing services such as AWS (Amazon Web Services), Google Cloud Platform, and Azure. Unfortunately, such services have security models that are substantially different and more complex than traditional enterprise security models. As a result, misconfiguration errors in cloud deployments have led to dozens of well-publicized breaches. This paper describes Thunder CTF, a scaffolded, scenario-based CTF (Capture-the-Flag) for helping students learn about and practice cloud security skills. Thunder CTF is easily deployed at minimal cost and is highly extensible to allow for crowd-sourced development of new levels as security issues evolve in the cloud.

[60] 2107.12567

Guided Optimization for Image Processing Pipelines

Writing high-performance image processing code is challenging and labor-intensive. The Halide programming language simplifies this task by decoupling high-level algorithms from "schedules" which optimize their implementation. However, even with this abstraction, it is still challenging for Halide programmers to understand complicated scheduling strategies and productively write valid, optimized schedules. To address this, we propose a programming support method called "guided optimization." Guided optimization provides programmers a set of valid optimization options and interactive feedback about their current choices, which enables them to comprehend and efficiently optimize image processing code without the time-consuming trial-and-error process of traditional text editors. We implemented a proof-of-concept system, Roly-poly, which integrates guided optimization, program visualization, and schedule cost estimation to support the comprehension and development of efficient Halide image processing code. We conducted a user study with novice Halide programmers and confirmed that Roly-poly and its guided optimization were informative, increased productivity, and resulted in higher-performing schedules in less time.

[61] 2107.12568

Version Space Algebras are Acyclic Tree Automata

Version space algebras are ways of representing spaces of programs which can be combined using union, intersection, and cross-product/"join" operators. In their reified form as ASTs with explicit union and join nodes, they can compactly represent exponentially large spaces of programs, owing to which they have become the most popular approach to enumerative program synthesis since the introduction of FlashFill in 2010. We present a linear-time semantics-preserving constructive embedding from version space algebras into nondeterministic finite tree automata, showing that the former are but a special case of the latter. Combined with recent results finding a correspondence between e-graphs and minimal deterministic tree automata, this shows that tree automata are strict generalizations of all recent major approaches to efficiently representing large spaces of programs by sharing.
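The compactness of a reified version space algebra comes from sharing. A union node offers alternatives, a join node combines one choice from each child, and the same subterm can be referenced many times. The sketch below is an illustrative data structure with hypothetical class names. It counts the programs with memoization over shared nodes and can also enumerate them. It does not implement the paper's embedding into tree automata.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, Optional, Tuple, Union


# eq=False keeps identity-based hashing, so shared subterms are not re-hashed recursively.
@dataclass(frozen=True, eq=False)
class Leaf:
    program: str


@dataclass(frozen=True, eq=False)
class UnionNode:
    children: Tuple["Node", ...]


@dataclass(frozen=True, eq=False)
class JoinNode:
    op: str
    args: Tuple["Node", ...]


Node = Union[Leaf, UnionNode, JoinNode]


def count(node: Node, memo: Optional[Dict[int, int]] = None) -> int:
    """Number of programs represented; linear in the number of distinct nodes."""
    if memo is None:
        memo = {}
    key = id(node)
    if key in memo:
        return memo[key]
    if isinstance(node, Leaf):
        n = 1
    elif isinstance(node, UnionNode):
        # A union offers any one of its alternatives.
        n = sum(count(c, memo) for c in node.children)
    else:
        # A join picks one program per argument: a cross-product.
        n = 1
        for a in node.args:
            n *= count(a, memo)
    memo[key] = n
    return n


def enumerate_programs(node: Node) -> Iterator[str]:
    """Expand the represented programs (exponential; only for small spaces)."""
    if isinstance(node, Leaf):
        yield node.program
    elif isinstance(node, UnionNode):
        for c in node.children:
            yield from enumerate_programs(c)
    else:
        arg_lists = [list(enumerate_programs(a)) for a in node.args]
        for combo in product(*arg_lists):
            yield f"{node.op}({', '.join(combo)})"
```

Ten nested joins over a shared two-way union use only about a dozen nodes, yet they represent $2^{1024}$ programs. That gap is the sharing the abstract refers to.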

[62] 2107.12569

Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation

We propose a self-supervised spatio-temporal matching method coined Motion-Aware Mask Propagation (MAMP) for semi-supervised video object segmentation. During training, MAMP leverages the frame reconstruction task to train the model without the need for annotations. During inference, MAMP extracts high-resolution features from each frame to build a memory bank from the features as well as the predicted masks of selected past frames. MAMP then propagates the masks from the memory bank to subsequent frames according to our motion-aware spatio-temporal matching module, also proposed in this paper. Evaluation on the DAVIS-2017 and YouTube-VOS datasets shows that MAMP achieves state-of-the-art performance with stronger generalization ability than existing self-supervised methods, i.e., 4.9\% higher mean $\mathcal{J}\&\mathcal{F}$ on DAVIS-2017 and 4.85\% higher mean $\mathcal{J}\&\mathcal{F}$ on the unseen categories of YouTube-VOS than the nearest competitor. Moreover, MAMP performs on par with many supervised video object segmentation methods. Our code is available at: \url{}.
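Memory-based mask propagation generally works as follows. Each query pixel attends to the memory pixels by feature similarity, and its mask is taken as the affinity-weighted average of the stored masks. The sketch below shows only that generic soft propagation step. The `temperature` and `topk` parameters are assumptions, and MAMP's motion-aware matching and memory-frame selection are not modeled.

```python
from typing import Optional

import numpy as np


def propagate_mask(
    mem_feats: np.ndarray,    # (N, C) features of memory pixels, ideally L2-normalized
    mem_masks: np.ndarray,    # (N, K) soft masks of memory pixels for K objects
    query_feats: np.ndarray,  # (M, C) features of the current frame's pixels
    temperature: float = 0.07,
    topk: Optional[int] = None,
) -> np.ndarray:
    """Return (M, K) soft masks for query pixels via affinity-weighted averaging."""
    aff = query_feats @ mem_feats.T / temperature  # (M, N) scaled similarities
    if topk is not None:
        # Keep only each query pixel's top-k memory matches (hypothetical sparsification).
        thresh = np.sort(aff, axis=1)[:, -topk][:, None]
        aff = np.where(aff >= thresh, aff, -np.inf)
    # Numerically stable row-wise softmax over memory pixels.
    aff = aff - aff.max(axis=1, keepdims=True)
    w = np.exp(aff)
    w /= w.sum(axis=1, keepdims=True)
    return w @ mem_masks
```

Lower temperatures and smaller `topk` values make the propagation sharper. The trade-off is robustness when the best match is wrong.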

[63] 2107.12571

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows

Unsupervised anomaly detection with localization has many practical applications when labeling is infeasible and, moreover, when anomaly examples are completely missing from the training data. While recently proposed models for such a data setup achieve high accuracy metrics, their complexity is a limiting factor for real-time processing. In this paper, we propose a real-time model and analytically derive its relationship to prior methods. Our CFLOW-AD model is based on a conditional normalizing flow framework adapted for anomaly detection with localization. In particular, CFLOW-AD consists of a discriminatively pretrained encoder followed by multi-scale generative decoders, where the latter explicitly estimate the likelihood of the encoded features. Our approach results in a computationally and memory-efficient model: CFLOW-AD is 10x faster and smaller than the prior state of the art with the same input setting. Our experiments on the MVTec dataset show that CFLOW-AD outperforms previous methods by 0.36% AUROC in the detection task, and by 1.12% AUROC and 2.5% AUPRO in the localization task. We open-source our code with fully reproducible experiments.

[64] 2107.12576

CCGL: Contrastive Cascade Graph Learning

Supervised learning, while prevalent for information cascade modeling, often requires abundant labeled data for training, and the trained model does not easily generalize across tasks and datasets. Semi-supervised learning leverages unlabeled data for cascade understanding in pre-training, but it often learns fine-grained feature-level representations, which can easily result in overfitting on downstream tasks. Recently, contrastive self-supervised learning has been designed to alleviate these two fundamental issues in linguistic and visual tasks. However, its direct applicability to cascade modeling, especially graph cascade related tasks, remains underexplored. In this work, we present Contrastive Cascade Graph Learning (CCGL), a novel framework for cascade graph representation learning in a contrastive, self-supervised, and task-agnostic way. In particular, CCGL first designs an effective data augmentation strategy to capture variation and uncertainty. Second, it learns a generic model for graph cascade tasks via self-supervised contrastive pre-training using both unlabeled and labeled data. Third, CCGL learns a task-specific cascade model via fine-tuning using labeled data. Finally, to make the model transferable across datasets and cascade applications, CCGL further enhances the model via distillation using a teacher-student architecture. We demonstrate that CCGL significantly outperforms its supervised and semi-supervised counterparts for several downstream tasks.
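Contrastive pre-training of this kind typically optimizes an InfoNCE-style objective. Two augmented views of the same item should embed close together, and views of different items should embed far apart. The abstract does not state CCGL's exact loss or encoder. The sketch below therefore assumes a batch of already-encoded cascade embeddings from two augmented views and applies the generic one-directional InfoNCE loss. The function name and the `temperature` value are illustrative.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss; z1[i] and z2[i] are two views of cascade i, shape (B, D)."""
    # Cosine similarity via L2-normalized embeddings.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (B, B); the diagonal holds the positive pairs
    # Row-wise log-softmax, stabilized by subtracting the row maximum.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Maximize the log-probability of picking the matching view for each row.
    return float(-np.mean(np.diag(log_probs)))
```

The other items in the batch act as negatives, which is why contrastive pre-training can use unlabeled cascades.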

[65] 2107.12578

Dual Slot Selector via Local Reliability Verification for Dialogue State Tracking

The goal of dialogue state tracking (DST) is to predict the current dialogue state given all previous dialogue contexts. Existing approaches generally predict the dialogue state at every turn from scratch. However, the overwhelming majority of slots in each turn should simply inherit their values from the previous turn. Therefore, treating all slots equally in each turn is not only inefficient but may also lead to additional errors because of redundant slot value generation. To address this problem, we devise the two-stage DSS-DST, which consists of the Dual Slot Selector based on the current turn dialogue and the Slot Value Generator based on the dialogue history. The Dual Slot Selector determines, for each slot, whether to update its value or to inherit the value from the previous turn, based on two aspects: (1) whether there is a strong relationship between the slot and the current turn dialogue utterances; (2) whether a slot value with high reliability can be obtained for it through the current turn dialogue. The slots selected for updating are passed to the Slot Value Generator, which updates their values with a hybrid method, while the other slots directly inherit the values from the previous turn. Empirical results show that our method achieves 56.93%, 60.73%, and 58.04% joint accuracy on the MultiWOZ 2.0, MultiWOZ 2.1, and MultiWOZ 2.2 datasets, respectively, achieving new state-of-the-art performance with significant improvements.

[66] 2107.12579

Remember What You have drawn: Semantic Image Manipulation with Memory

Image manipulation with natural language, which aims to manipulate images under the guidance of language descriptions, has been a challenging problem in the fields of computer vision and natural language processing (NLP). A number of efforts have been made on this task, but their performance still falls short of generating realistic and text-conformed manipulated images. Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize texture information under the guidance of the textual description. We propose a two-stage network with an additional reconstruction stage to learn the latent memories efficiently. To avoid unnecessary background changes, we propose a Target Localization Unit (TLU) that focuses the manipulation on the region mentioned by the text. Moreover, to learn a robust memory, we further propose a novel randomized memory training loss. Experiments on four popular datasets show that our method performs better than existing ones.

[67] 2107.12580

Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization

The successes of deep learning critically rely on the ability of neural networks to output meaningful predictions on unseen data -- generalization. Yet despite its criticality, there remain fundamental open questions on how neural networks generalize. How much do neural networks rely on memorization -- seeing highly similar training examples -- and how much are they capable of human-intelligence styled reasoning -- identifying abstract rules underlying the data? In this paper we introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization. While PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty, they all have a simple underlying rule. One part of the PVR task input acts as a pointer, giving the location of a different part of the input, which forms the value (and output). We demonstrate that this task structure provides a rich testbed for understanding generalization, with our empirical study showing large variations in neural network performance based on dataset size, task complexity and model architecture. The interaction of position, values and the pointer rule also allows the development of nuanced tests of generalization, by introducing distribution shift and increasing functional complexity. These reveal both subtle failures and surprising successes, suggesting many promising directions of exploration on this benchmark.
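The pointer rule is easy to state concretely for symbolic inputs, as in the sketch below. Its layout is an assumption, not the benchmark's exact specification: the first digit is the pointer, the remaining ten digits are values, and the label is the value at the pointed-to position. To mimic "increasing functional complexity", the sketch adds a hypothetical `window` option that aggregates several consecutive values by sum mod 10.

```python
import random
from typing import List, Tuple


def pvr_label(x: List[int], window: int = 1) -> int:
    """Label of a symbolic PVR input: x[0] is the pointer, x[1:] are the values."""
    pointer, values = x[0], x[1:]
    # window=1 returns the pointed-to value itself; larger windows use a
    # hypothetical aggregation (sum mod 10) to raise functional complexity.
    return sum(values[pointer:pointer + window]) % 10


def make_pvr_example(rng: random.Random, n_values: int = 10, window: int = 1) -> Tuple[List[int], int]:
    """Sample one (input, label) pair; the pointer is chosen so the window fits."""
    values = [rng.randrange(10) for _ in range(n_values)]
    pointer = rng.randrange(n_values - window + 1)
    x = [pointer] + values
    return x, pvr_label(x, window)
```

For example, for the input `[3, 5, 1, 4, 9, 2, 6, 5, 3, 5, 8]` the pointer 3 selects the value 9. A network that memorizes surface patterns rather than this rule is exactly what the benchmark aims to expose.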

[68] 2107.12581

Average-Case Analysis of Greedy Matching for D2D Resource Sharing

Given the proximity of many wireless users and their diversity in consuming local resources (e.g., data plans, computation, and even energy resources), device-to-device (D2D) resource sharing is a promising approach towards realizing a sharing economy. In the resulting networked economy, $n$ users segment themselves into sellers and buyers that need to be efficiently matched locally. This paper adopts an easy-to-implement greedy matching algorithm that runs in a distributed fashion with only sub-linear $O(\log n)$ parallel complexity, which offers a great advantage over the optimal but computationally expensive centralized matching. But is it efficient compared to the optimal matching? Extensive simulations indicate that in a large number of practical cases the average loss is no more than $10\%$, a far better result than the $50\%$ loss bound in the worst case. However, there is no rigorous average-case analysis in the literature to back up such encouraging findings, which is a fundamental step towards supporting the practical use of greedy matching in D2D sharing. This paper is the first to present a rigorous average-case analysis of certain representative classes of graphs with random parameters, by proposing a new asymptotic methodology. For typical 2D grids with random matching weights, we rigorously prove that our greedy algorithm achieves more than $84.9\%$ of the optimal, while for typical Erdos-Renyi random graphs we prove a lower bound of $79\%$ when the graph is neither dense nor sparse. Finally, we use realistic data to show that our random graph models approximate well the D2D sharing networks encountered in practice.
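The comparison in the abstract is between greedy matching and the optimal matching, and both can be made concrete on small graphs. The greedy side is the classic heaviest-edge-first rule: repeatedly take the heaviest edge whose endpoints are both free. The sketch below implements only this sequential form, not the paper's distributed $O(\log n)$ protocol. It also brute-forces the optimal matching so that the two can be compared. Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[float, str, str]  # (weight, seller/buyer u, seller/buyer v)


def greedy_matching(edges: List[Edge]) -> List[Edge]:
    """Heaviest-edge-first greedy matching (sequential form)."""
    matched = set()
    chosen: List[Edge] = []
    for w, u, v in sorted(edges, reverse=True):
        # Accept an edge only if neither endpoint is already matched.
        if u not in matched and v not in matched:
            chosen.append((w, u, v))
            matched.update((u, v))
    return chosen


def optimal_matching_weight(edges: List[Edge]) -> float:
    """Brute-force maximum-weight matching; exponential, tiny graphs only."""
    best = 0.0
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            nodes = [x for _, u, v in subset for x in (u, v)]
            if len(nodes) == len(set(nodes)):  # no shared endpoints
                best = max(best, sum(w for w, _, _ in subset))
    return best
```

On the path a-b-c-d with weights 2, 3, 2, the greedy rule takes only the middle edge (weight 3), while the optimum is 4. That ratio of 75% sits between the worst-case $50\%$ guarantee and the average-case figures the paper proves.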

[69] 2107.12585

Nearest Neighborhood-Based Deep Clustering for Source Data-absent Unsupervised Domain Adaptation

In the classic setting of unsupervised domain adaptation (UDA), the labeled source data are available in the training phase. However, in many real-world scenarios, for reasons such as privacy protection and information security, the source data are inaccessible, and only a model trained on the source domain is available. This paper proposes a novel deep clustering method for this challenging task. To guide dynamic clustering at the feature level, we introduce extra constraints hidden in the geometric structure of the data. Concretely, we propose a geometry-based constraint, named semantic consistency on the nearest neighborhood (SCNNH), and use it to encourage robust clustering. To this end, we construct the nearest neighborhood for every target sample and take it as the fundamental clustering unit by building our objective on this geometry. We also develop a more SCNNH-compliant structure with an additional semantic credibility constraint, named semantic hyper-nearest neighborhood (SHNNH), and then extend our method to this new geometry. Extensive experiments on three challenging UDA datasets indicate that our method achieves state-of-the-art results. The proposed method yields significant improvements on all datasets (when we adopt SHNNH, the average accuracy increases by over 3.0\% on the large-scale dataset). Code is available at

[70] 2107.12588

A Novel Interactive Two-stage Joint Retail Electricity Market for Multiple Microgrids

To accommodate the advent of microgrids (MG) managing distributed energy resources (DER) in distribution systems, an interactive two-stage joint retail electricity market mechanism is proposed to provide an effective platform for these prosumers to proactively join in retail transactions. Day-ahead stochastic energy trading between the distribution system operator (DSO) and MGs is conducted in the first stage of a centralized retail market, where a chance-constrained uncertainty distribution locational marginal price (CC-UDLMP) containing the cost of uncertainty precautions is used to settle transactions. In the second stage, a novel intra-day peer-to-peer (P2P) flexibility transaction pattern is implemented between MGs in local flexibility markets under the regulation of the DSO to eliminate power imbalances caused by rolling-based estimates whilst considering systematic operations. A fully distributed iterative algorithm is presented to find the equilibrium solution of this two-stage sequential game framework. Moreover, to enhance the versatility of this algorithm, an improved Lp-box alternating direction method of multipliers (ADMM) algorithm is used to efficiently solve the first-stage stochastic economic dispatch problem with a mixed-integer second-order cone structure. It is verified that the proposed market mechanism can effectively improve the overall market efficiency under uncertainties.

[71] 2107.12589

Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization

Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in a given video with video-level categorical supervision. Previous works use both appearance and motion features, but they do not exploit them properly, applying only simple concatenation or score-level fusion. In this work, we argue that the features extracted from a pretrained extractor, e.g., I3D, are not WS-TAL task-specific features, so feature re-calibration is needed to reduce task-irrelevant information redundancy. Therefore, we propose a cross-modal consensus network (CO2-Net) to tackle this problem. In CO2-Net, we introduce two identical cross-modal consensus modules (CCM) that use a cross-modal attention mechanism to filter out task-irrelevant information redundancy, using the global information from the main modality and the cross-modal local information of the auxiliary modality. Moreover, we treat the attention weights derived from each CCM as the pseudo targets of the attention weights derived from the other CCM to maintain consistency between the predictions of the two CCMs, forming a mutual learning scheme. Finally, we conduct extensive experiments on two commonly used temporal action localization datasets, THUMOS14 and ActivityNet1.2, to verify our method, achieving state-of-the-art results. The experimental results show that our proposed cross-modal consensus module can produce more representative features for temporal action localization.

[72] 2107.12591

Combining Probabilistic Logic and Deep Learning for Self-Supervised Learning

Deep learning has proven effective for various application tasks, but its applicability is limited by the reliance on annotated examples. Self-supervised learning has emerged as a promising direction to alleviate the supervision bottleneck, but existing work focuses on leveraging co-occurrences in unlabeled data for task-agnostic representation learning, as exemplified by masked language model pretraining. In this chapter, we explore task-specific self-supervision, which leverages domain knowledge to automatically annotate noisy training examples for end applications, either by introducing labeling functions for annotating individual instances, or by imposing constraints over interdependent label decisions. We first present deep probabilistic logic (DPL), which offers a unifying framework for task-specific self-supervision by composing probabilistic logic with deep learning. DPL represents unknown labels as latent variables and incorporates diverse self-supervision using probabilistic logic to train a deep neural network end-to-end using variational EM. Next, we present self-supervised self-supervision (S4), which adds to DPL the capability to learn new self-supervision automatically. Starting from an initial seed self-supervision, S4 iteratively uses the deep neural network to propose new self-supervision. These proposals are either added directly (a form of structured self-training) or verified by a human expert (as in feature-based active learning). Experiments on real-world applications such as biomedical machine reading and various text classification tasks show that task-specific self-supervision can effectively leverage domain expertise and often match the accuracy of supervised methods with a tiny fraction of human effort.

[73] 2107.12592

Detection of cybersecurity attacks through analysis of web browsing activities using principal component analysis

Organizations such as government departments and financial institutions provide online service facilities accessible via an increasing number of internet-connected devices, which make their operational environments vulnerable to cyber attacks. Consequently, there is a need for mechanisms that detect cyber security attacks in a timely manner. A variety of Network Intrusion Detection Systems (NIDS) have been proposed, and they can be categorized into signature-based NIDS and anomaly-based NIDS. Signature-based NIDS, which identify misuse by scanning activity signatures against a list of known attack activities, are criticized for their inability to identify new (never-before-seen) attacks. Among anomaly-based NIDS, which declare a connection anomalous if it deviates from a trained model, unsupervised learning algorithms circumvent this issue since they can identify new attacks. In this study, we use an unsupervised learning algorithm based on principal component analysis to detect cyber attacks. In the training phase, our approach has the advantage of also identifying outliers in the training dataset. In the monitoring phase, our approach first identifies the affected dimensions and then calculates an anomaly score by aggregating across only those components that are affected by the anomalies. We explore the performance of the algorithm via simulations and through two applications, namely to the UNSW-NB15 dataset recently released by the Australian Centre for Cyber Security and to the well-known KDD'99 dataset. The algorithm is scalable to large datasets in both training and monitoring phases, and the results from both the simulated and real datasets show that the method has promise in detecting suspicious network activities.
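The monitoring step described above has two parts: find the principal components a new record disturbs, then aggregate the deviation over those components only. The sketch below shows one way to do this, and its selection rule is our own assumption, not necessarily the paper's. It standardizes the features, fits PCA by SVD, and treats a component as "affected" when its standardized score exceeds a hypothetical `z_threshold`. The class name is also illustrative.

```python
import numpy as np


class PCAAnomalyScorer:
    """Fit PCA on (mostly normal) training data; score new records."""

    def __init__(self, z_threshold: float = 3.0):
        # Hypothetical rule: a component is "affected" if |standardized score| > z_threshold.
        self.z_threshold = z_threshold

    def fit(self, X) -> "PCAAnomalyScorer":
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0) + 1e-12  # guard against constant features
        Z = (X - self.mean_) / self.scale_
        # Rows of vt are principal axes; s / sqrt(n - 1) are component standard deviations.
        _, s, vt = np.linalg.svd(Z, full_matrices=False)
        self.components_ = vt
        self.sd_ = s / np.sqrt(len(X) - 1) + 1e-12
        return self

    def score(self, X) -> np.ndarray:
        Z = (np.atleast_2d(np.asarray(X, dtype=float)) - self.mean_) / self.scale_
        # Standardized principal-component scores of each record.
        pc = (Z @ self.components_.T) / self.sd_
        affected = np.abs(pc) > self.z_threshold
        # Aggregate squared scores over the affected components only.
        return np.sum(np.where(affected, pc ** 2, 0.0), axis=1)
```

Low-variance components matter here. A record that breaks the usual correlation between features shows up as a huge standardized score on a minor component, even when each raw feature looks ordinary on its own.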

[74] 2107.12594

On decoding hyperbolic codes

Few decoding algorithms for hyperbolic codes are known in the literature; this article tries to fill this gap. The first part of this work compares hyperbolic codes and Reed-Muller codes. In particular, we determine when a Reed-Muller code is a hyperbolic code. As a byproduct, we state when a hyperbolic code has a greater dimension than a Reed-Muller code with the same minimum distance. We use these ideas to describe how to decode a hyperbolic code using the largest Reed-Muller code contained in it or, alternatively, using the smallest Reed-Muller code that contains it. A combination of these two algorithms is proposed for the case when hyperbolic codes are defined by polynomials in two variables. Then, we compare hyperbolic codes and Cube codes (tensor products of Reed-Solomon codes), and we propose decoding algorithms for hyperbolic codes based on their closest Cube codes. Finally, we adapt Geil and Matsumoto's generalization of Sudan's list decoding algorithm to hyperbolic codes.

[75] 2107.12595

Template-based Chatbot for Agriculture Related FAQs

Agriculture is a fundamental industry of society: it is the basis of the food supply and an important source of employment and GDP growth. However, the limited number of experts cannot meet the demand of farmers. To address this problem, we design a chatbot to answer frequently asked questions in the agriculture field. Template-based questions are answered by AIML (Artificial Intelligence Markup Language), while LSA (latent semantic analysis) is used for other, service-based questions. This chatbot will help farmers deal with industry problems conveniently and efficiently.

[76] 2107.12596

Fully Distributed LQR-based Controller Design for Multi-input Time-varying Systems

In this paper, a cooperative Linear Quadratic Regulator (LQR) problem is investigated for multi-input systems, where each input is generated by an agent in a network. The input matrices differ and are held locally by the corresponding agents, which can be regarded as different ways for the agents to control the multi-input system. By embedding a fully distributed information fusion strategy, a novel cooperative LQR-based controller is proposed. Each agent only needs to communicate with its neighbors, rather than sharing information globally across the network. Moreover, only joint controllability is required, which allows the multi-input system to be uncontrollable for every single agent, or even for an agent together with all its neighbors. In particular, only a single information exchange is needed at each control step, which significantly reduces the communication cost. It is proved that the boundedness (convergence) of the controller gains is guaranteed for time-varying (time-invariant) systems. Furthermore, the control performance of the entire system is ensured. Overall, the proposed controller achieves a better trade-off between control performance and communication overhead than existing centralized, decentralized, and consensus-based LQR controllers. Finally, the effectiveness of the theoretical results is illustrated by several comparative numerical examples.
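To make "joint controllability" concrete, the sketch below solves the ordinary centralized, time-invariant discrete-time LQR by iterating the Riccati recursion, for a hypothetical two-agent system in which each agent alone cannot control the plant but the stacked input matrix can. This is only the centralized baseline the abstract compares against, not the proposed distributed fusion controller; the matrices `A`, `B1`, `B2`, `Q`, `R` and the function names are illustrative assumptions.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """Centralized discrete-time LQR: iterate
    K = (R + B^T P B)^{-1} B^T P A,  P <- Q + A^T P (A - B K)
    to a fixed point and return K for the control law u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def ctrb_rank(A, B):
    """Rank of the controllability matrix [B, AB, ..., A^{n-1} B]."""
    n = A.shape[0]
    return np.linalg.matrix_rank(
        np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)]))

# Hypothetical plant: two unstable decoupled modes, each actuated by a different agent.
A = np.diag([1.2, 1.1])
B1 = np.array([[1.0], [0.0]])   # agent 1's local input matrix
B2 = np.array([[0.0], [1.0]])   # agent 2's local input matrix
B = np.hstack([B1, B2])         # stacked multi-input matrix
K = dlqr_gain(A, B, Q=np.eye(2), R=np.eye(2))
```

Here `(A, B1)` and `(A, B2)` each have controllability rank 1, yet `(A, B)` has full rank 2 and the closed loop `A - B K` is stable. A distributed scheme has to reach a comparable gain without any single agent ever holding the full `B`.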

[77] 2107.12597

A Long-Term Investigation on the Effects of (Personalized) Gamification on Course Participation in a Gym

Gamification is frequently used to motivate people to become more physically active. However, most systems follow a one-size-fits-all gamification approach, although past research has shown that interpersonal differences exist in the perception of gamification elements. Also, most studies investigating the effects of gamification are rather short, although it has been shown that gamification can suffer from novelty effects. In this paper, we address both issues by investigating whether gamification elements, integrated into a fitness course booking system, affect how frequently users participate in fitness courses in a gym (N=52) over a duration of 275 days (548 days including baseline). In addition, the gamification elements we implemented are tailored to specific Hexad user types, which allows us to investigate whether using suitable gamification elements leads to increased course participation. Our results show that gamification significantly increased participation in fitness courses and that users who received a suitable set of gamification elements, according to their Hexad user type, increased their participation significantly more than others.

[78] 2107.12598

Identify Apple Leaf Diseases Using Deep Learning Algorithm

Agriculture is an essential industry for both the society and the economy of a country. However, pests and diseases cause large reductions in agricultural production, while farmers lack sufficient guidance to avoid these losses. To address this problem, we apply convolutional neural networks (CNNs) to plant disease recognition by building a classification model. On a dataset of 3,642 images of apple leaves, we use ResNet34, a pre-trained CNN-based image classification model, with the fastai framework in order to save training time. Overall, the classification accuracy is 93.765%.

[79] 2107.12599

Design Guidelines to Increase the Persuasiveness of Achievement Goals for Physical Activity

Achievement goals are frequently used to support behavior change. However, they are often not specifically designed for this purpose, nor do they account for the degree to which a user already intends to perform the target behavior. In this paper, we investigate the perceived persuasiveness of different goal types as defined by the 3x2 Achievement Goal Model, what people like and dislike about them, and the role that behavior change intentions play when aiming to increase step counts. We created visualizations for each goal type based on a qualitative pre-study (N=18) and ensured their comprehensibility (N=18). In an online experiment (N=118), we show that there are differences in the perception of these goal types and that behavior change intentions should be considered to maximize their persuasiveness as goals evolve. Next, we derive design guidelines on when to use which type of achievement goal and what to consider when using them.

[80] 2107.12600

PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution

Owing to the Transformer's strength in learning long-term dependencies, the sign language Transformer model has achieved remarkable progress in Sign Language Recognition (SLR) and Translation (SLT). However, several issues with the Transformer prevent it from better understanding sign language. First, the self-attention mechanism learns sign video representations in a frame-wise manner, neglecting the temporal semantic structure of sign gestures. Second, the attention mechanism with absolute position encoding is direction- and distance-unaware, which limits its ability. To address these issues, we propose a new model architecture, namely PiSLTRc, with two distinctive characteristics: (i) content-aware and position-aware convolution layers. Specifically, we explicitly select relevant features using a novel content-aware neighborhood gathering method. Then we aggregate these features with position-informed temporal convolution layers, thus generating robust neighborhood-enhanced sign representations. (ii) Injecting relative position information into the attention mechanism in the encoder, the decoder, and even the encoder-decoder cross attention. Compared with the vanilla Transformer model, our model performs consistently better on three large-scale sign language benchmarks: PHOENIX-2014, PHOENIX-2014-T, and CSL. Furthermore, extensive experiments demonstrate that the proposed method achieves state-of-the-art translation quality, with an improvement of $+1.6$ BLEU.

[81] 2107.12603

Federated Learning Meets Natural Language Processing: A Survey

Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g., mobile phones) or servers without sacrificing local data privacy. Recent Natural Language Processing techniques rely on deep learning and large pre-trained language models. However, both large deep neural networks and language models are trained on huge amounts of data, which often reside on the server side. Since text data largely originates from end users, in this work we look into recent NLP models and techniques that use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including algorithmic challenges, system challenges, and privacy issues. We also provide a critical review of existing Federated NLP evaluation methods and tools. Finally, we highlight current research gaps and future directions.

[82] 2107.12604

Image Scene Graph Generation (SGG) Benchmark

There is a surge of interest in image scene graph generation (object, attribute, and relationship detection) due to the need for fine-grained image understanding models that go beyond object detection. Owing to the lack of a good benchmark, the reported results of different scene graph generation models are not directly comparable, impeding research progress. We have developed a much-needed scene graph generation benchmark based on the maskrcnn-benchmark and several popular models. This paper presents the main features of our benchmark and a comprehensive ablation study of scene graph generation models using the Visual Genome and OpenImages Visual Relationship Detection datasets. Our codebase is made publicly available at

[83] 2107.12606

Measuring daily-life fear perception change: a computational study in the context of COVID-19

COVID-19, as a global health crisis, has triggered fear with unprecedented intensity. Besides the fear of getting infected, the outbreak of COVID-19 also created significant disruptions in people's daily lives and thus evoked intensive psychological responses indirectly related to COVID-19 infections. Here, we construct an expressed-fear database using 16 million social media posts generated by 536 thousand users in China between January 1st, 2019 and August 31st, 2020. We employ deep learning techniques to detect fear within each post and apply topic models to extract the central fear topics. Based on this database, we find that sleep disorders ("nightmare" and "insomnia") take up the largest share of fear-labeled posts in the pre-pandemic period (January 2019-December 2019) and increase significantly during the COVID-19 pandemic. We identify health- and work-related concerns as the two major sources of fear induced by COVID-19. We also detect gender differences, with females generating more posts containing daily-life fear sources during the COVID-19 period. This research adopts a data-driven approach to trace public emotion, which can complement traditional surveys to achieve real-time emotion monitoring, discern societal concerns, and support policy decision-making.

[84] 2107.12612

Poisoning of Online Learning Filters: DDoS Attacks and Countermeasures

Recent advancements in machine learning have led to a wave of interest in adopting online learning-based approaches for long-standing attack mitigation issues. In particular, DDoS attacks remain a significant threat to network service availability even after more than two decades. These attacks have been well studied under the assumption that malicious traffic originates from a single attack profile. Based on this premise, malicious traffic characteristics are assumed to differ considerably from those of legitimate traffic. Consequently, online filtering methods are designed to learn network traffic distributions adaptively and rank requests according to their attack likelihood. During an attack, requests rated as malicious are precipitously dropped by the filters. In this paper, we conduct the first systematic study of the effects of data poisoning attacks on online DDoS filtering, introduce one such attack method, and propose practical protective countermeasures against these attacks. We investigate an adverse scenario in which the attacker is "crafty", switching profiles during attacks and generating erratic, ever-shifting attack traffic. This elusive attacker generates malicious requests by manipulating and shifting the traffic distribution to poison the training data and corrupt the filters. To this end, we present a generative model, MimicShift, capable of controlling traffic generation while retaining the intrinsic properties of the originating regular traffic. Comprehensive experiments show that online learning filters are highly susceptible to poisoning attacks, sometimes performing much worse than a random filtering strategy in this attack scenario. At the same time, our proposed protective countermeasure effectively minimizes the attack impact.

[85] 2107.12613

Iterative Reed-Muller Decoding

Reed-Muller (RM) codes are known for their good maximum likelihood (ML) performance in the short block-length regime. Although they are one of the oldest classes of channel codes, finding a low-complexity soft-input decoding scheme for them is still an open problem. In this work, we present a belief propagation (BP) decoding architecture for RM codes based on their rich automorphism group. The decoding algorithm can be seen as a generalization of multiple-bases belief propagation (MBBP) using polar BP as constituent decoders. We provide extensive error-rate performance simulations and compare our results to existing decoding schemes. We report near-ML performance for the RM(3,7) code (e.g., 0.05 dB away from the ML bound at a BLER of $10^{-4}$) at a competitive computational cost. To the best of our knowledge, our proposed decoder achieves the best performance of all iterative RM decoders presented thus far.
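For orientation, the standard parameters of a binary Reed-Muller code RM(r, m) are length $n = 2^m$, dimension $k = \sum_{i=0}^{r} \binom{m}{i}$, and minimum distance $d = 2^{m-r}$. The small helper below (its name is our own) evaluates them for the RM(3,7) code mentioned above.

```python
from math import comb

def rm_parameters(r, m):
    """Return (n, k, d) for the binary Reed-Muller code RM(r, m)."""
    n = 2 ** m                                   # block length
    k = sum(comb(m, i) for i in range(r + 1))    # number of monomials of degree <= r
    d = 2 ** (m - r)                             # minimum Hamming distance
    return n, k, d

print(rm_parameters(3, 7))  # (128, 64, 16)
```

So RM(3,7) is a rate-1/2 code of length 128 (k = 1 + 7 + 21 + 35 = 64), which sits squarely in the short block-length regime the abstract refers to.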

[86] 2107.12614

Design and Analysis of a Robotic Lizard using Five-Bar Mechanism

Legged robots are being used to explore rough terrain, as they are capable of traversing gaps and obstacles. In this paper, a new mechanism is designed to realize a robotic lizard using integrated five-bar mechanisms. There are two five-bar mechanisms, from which two more are formed by connecting the links in a particular order. The legs are attached to the links of the five-bar mechanisms such that, when the mechanism is actuated, they move the robot forward. Position analysis of the mechanism has been carried out using the vector loop approach. A prototype has been built and controlled using servo motors to verify the robotic lizard mechanism.
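For a single planar five-bar linkage, the vector loop $\vec{O_1A} + \vec{AP} = \vec{O_1O_2} + \vec{O_2B} + \vec{BP}$ reduces, once the two input crank angles are fixed, to intersecting two circles centred at the crank tips. The sketch below shows that forward position step under assumed conventions (base joints on the x-axis, a chosen assembly branch). It does not model the lizard's integrated four-mechanism arrangement, and all link names and lengths are hypothetical.

```python
import math

def five_bar_endpoint(theta1, theta4, g, l1, l2, l3, l4, upper=True):
    """Forward position of a planar five-bar linkage.
    Base joints at O1=(0,0) and O2=(g,0); crank l1 at O1 driven by theta1,
    crank l4 at O2 driven by theta4; coupler l2 (from A) and l3 (from B) meet at P."""
    ax, ay = l1 * math.cos(theta1), l1 * math.sin(theta1)        # crank tip A
    bx, by = g + l4 * math.cos(theta4), l4 * math.sin(theta4)    # crank tip B
    d = math.hypot(bx - ax, by - ay)
    if d == 0.0 or d > l2 + l3 or d < abs(l2 - l3):
        raise ValueError("vector loop cannot close for these input angles")
    a = (l2 ** 2 - l3 ** 2 + d ** 2) / (2 * d)   # distance from A to chord foot along AB
    h = math.sqrt(max(l2 ** 2 - a ** 2, 0.0))    # half-chord length
    ux, uy = (bx - ax) / d, (by - ay) / d        # unit vector from A to B
    sign = 1.0 if upper else -1.0                # which assembly branch to take
    px = ax + a * ux - sign * h * uy
    py = ay + a * uy + sign * h * ux
    return px, py
```

With base width 2, cranks of length 1 pointing straight up, and couplers of length 2, the endpoint lands at (1, 1 + √3), directly above the midpoint of the base. The `upper` flag is needed because the loop equation generally has two closures, and a physical prototype stays on one of them.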

[87] 2107.12617

VIPose: Real-time Visual-Inertial 6D Object Pose Tracking

Estimating the 6D pose of objects is beneficial for robotics tasks such as transportation, autonomous navigation, and manipulation, as well as in scenarios beyond robotics such as virtual and augmented reality. Compared with single-image pose estimation, pose tracking takes into account temporal information across multiple frames to overcome possible detection inconsistencies and to improve pose estimation efficiency. In this work, we introduce a novel Deep Neural Network (DNN) called VIPose, which combines inertial and camera data to address the object pose tracking problem in real time. The key contribution is the design of a novel DNN architecture that fuses visual and inertial features to predict the objects' relative 6D pose between consecutive image frames. The overall 6D pose is then estimated by consecutively combining relative poses. Our approach shows remarkable pose estimation results for heavily occluded objects, which are well known to be very challenging for existing state-of-the-art solutions. The effectiveness of the proposed approach is validated on a new dataset called VIYCB, with RGB images, IMU data, and accurate 6D pose annotations created using an automated labeling technique. The approach achieves accuracy comparable to state-of-the-art techniques, with the additional benefit of running in real time.
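"Consecutively combining relative poses" amounts to chaining rigid transforms. The sketch below assumes 4x4 homogeneous matrices and the convention $T_k = T_{k-1}\,\Delta T_k$, where $\Delta T_k$ is the pose at frame k expressed in the frame k-1 pose. Both the convention and the yaw-only helper `make_pose` are our own simplifications for illustration, not details taken from VIPose.

```python
import numpy as np

def make_pose(yaw, t):
    """Homogeneous transform: rotation by `yaw` radians about z, then translation t."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

def accumulate(T0, relative_poses):
    """Chain relative poses onto an initial pose: T_k = T_{k-1} @ dT_k."""
    poses = [T0]
    for dT in relative_poses:
        poses.append(poses[-1] @ dT)
    return poses

dT = make_pose(np.pi / 2, [1.0, 0.0, 0.0])   # hypothetical per-frame motion
poses = accumulate(np.eye(4), [dT, dT])
```

After two steps of "move 1 along x, then turn 90°", the object sits at (1, 1, 0) facing the opposite direction. Because each step multiplies onto the previous estimate, any error in a relative pose propagates into every later absolute pose, which is why accurate per-frame predictions matter for tracking.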

[88] 2107.12618

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021

This technical report presents an overview of our solution submitted to the 2021 HACS Temporal Action Localization Challenge, on both the Supervised Learning Track and the Weakly-Supervised Learning Track. Temporal Action Localization (TAL) requires not only precisely locating the temporal boundaries of action instances but also accurately classifying the untrimmed videos into specific categories. Weakly-Supervised TAL (WSTAL), in contrast, requires locating the action instances using only video-level class labels. In this paper, to train a supervised temporal action localizer, we adopt the Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary, progressive boundary refinement. For WSTAL, a novel framework is proposed to handle the poor quality of the class activation sequence (CAS) generated by a simple classification network, which can only focus on local discriminative parts rather than locate the entire interval of target actions. Further inspired by transfer learning, we also adopt an additional module to transfer knowledge from trimmed videos (HACS Clips dataset) to untrimmed videos (HACS Segments dataset), aiming to improve classification performance on untrimmed videos. Finally, we employ a boundary regression module embedded with an Outer-Inner-Contrastive (OIC) loss to automatically predict the boundaries based on the enhanced CAS. Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing set of the supervised and weakly-supervised temporal action localization tracks, respectively.

[89] 2107.12619

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

Recently, the problem of inaccurate learning targets in crowd counting has drawn increasing attention. Inspired by a few pioneering works, we address this problem by predicting the indices of pre-defined interval bins of counts instead of the count values themselves. However, an inappropriate interval setting might make the count error contributions from different intervals extremely imbalanced, leading to inferior counting performance. Therefore, we propose a novel count interval partition criterion called Uniform Error Partition (UEP), which always keeps the expected counting error contributions equal across all intervals to minimize the prediction risk. Then, to mitigate the discretization errors inevitably introduced in the count quantization process, we propose another criterion called Mean Count Proxies (MCP). The MCP criterion selects the best count proxy for each interval to represent its count value during inference, making the overall expected discretization error of an image nearly negligible. As far as we are aware, this work is the first to delve into such a classification task, and it ends up with a promising solution for count interval partition. Following these two theoretically demonstrated criteria, we propose a simple yet effective model termed Uniform Error Partition Network (UEPNet), which achieves state-of-the-art performance on several challenging datasets. The code will be available at:
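The interval-classification setup can be sketched as follows: each patch count is mapped to an interval index, and at inference each predicted index is turned back into a count via a per-interval proxy. Here we assume, based on the criterion's name only, that the proxy is the mean training count inside the interval. The bin edges, helper names, and midpoint fallback for empty intervals are all hypothetical and not taken from UEPNet.

```python
import numpy as np

def interval_index(counts, edges):
    """Index i of the interval [edges[i], edges[i+1]) containing each count;
    counts at or above the last interior edge fall into the last interval."""
    return np.digitize(counts, edges[1:-1])

def mean_count_proxies(train_counts, edges):
    """Per-interval representative count: mean of the training counts in that
    interval (our assumed reading of MCP); empty intervals use the midpoint."""
    idx = interval_index(train_counts, edges)
    proxies = []
    for i in range(len(edges) - 1):
        members = train_counts[idx == i]
        proxies.append(members.mean() if members.size else 0.5 * (edges[i] + edges[i + 1]))
    return np.array(proxies)

counts = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 5.0])   # hypothetical patch counts
edges = np.array([0.0, 0.5, 2.5, 10.0])             # hypothetical interval edges
proxies = mean_count_proxies(counts, edges)
restored = proxies[interval_index(counts, edges)]    # quantize, then map back
```

With a mean proxy, the signed quantization errors within each interval sum to zero on the data used to compute it. Summing restored patch counts over an image therefore tends to cancel per-patch errors: here the restored total equals the true total of 11, whereas interval midpoints would not.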

[90] 2107.12621

Intrusion Detection with Machine Learning Using Open-Sourced Datasets

No significant research has been conducted so far on intrusion detection owing to the lack of available data, since network traffic within companies is private information and no logs are available on the Internet for independent research. This paper aims to answer the question of whether open-source data, which usually consists of simulated network traffic, can assist in developing a robust model that will effectively recognize and deter possible denial-of-service or infiltration attacks.

[91] 2107.12626

Unsupervised Deep Anomaly Detection for Multi-Sensor Time-Series Signals

Nowadays, multi-sensor technologies are applied in many fields, e.g., Health Care (HC), Human Activity Recognition (HAR), and Industrial Control Systems (ICS). These sensors can generate a substantial amount of multivariate time-series data. Unsupervised anomaly detection on multi-sensor time-series data has proven critical in machine learning research. The key challenge is to discover generalized normal patterns by capturing spatial-temporal correlations in multi-sensor data. Beyond this challenge, noisy data are often intertwined with the training data, which is likely to mislead the model by making it hard to distinguish between normal, abnormal, and noisy data. Few previous studies jointly address these two challenges. In this paper, we propose a novel deep learning-based anomaly detection algorithm called Deep Convolutional Autoencoding Memory network (CAE-M). We first build a Deep Convolutional Autoencoder to characterize the spatial dependence of multi-sensor data, with a Maximum Mean Discrepancy (MMD) term to better distinguish between noisy, normal, and abnormal data. Then, we construct a Memory Network consisting of linear (Autoregressive Model) and non-linear (Bidirectional LSTM with Attention) predictions to capture temporal dependence in the time-series data. Finally, CAE-M jointly optimizes these two subnetworks. We empirically compare the proposed approach with several state-of-the-art anomaly detection methods on HAR and HC datasets. Experimental results demonstrate that our proposed model outperforms these existing methods.
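The Maximum Mean Discrepancy used in CAE-M measures how far apart two sample sets are in a kernel feature space. Below is the standard biased estimator of squared MMD with a Gaussian RBF kernel. How CAE-M actually weights and applies the term inside its autoencoder is not reproduced here, and the bandwidth `sigma=1.0` is an arbitrary assumption.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X (n x d) and Y (m x d)."""
    return float(rbf_kernel(X, X, sigma).mean()
                 + rbf_kernel(Y, Y, sigma).mean()
                 - 2.0 * rbf_kernel(X, Y, sigma).mean())
```

Two samples from the same distribution give a value near zero, while a shifted sample gives a much larger one. Minimizing such a term between, for example, latent codes and a reference distribution is one common way to regularize an autoencoder so that noisy inputs are not simply memorized.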

[92] 2107.12627

Cross-lingual Transferring of Pre-trained Contextualized Language Models

Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and, because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant. In this work, building upon recent work connecting cross-lingual model transfer and neural machine translation, we propose a novel cross-lingual model transferring framework for PrLMs: TreLM. To handle the differences in symbol order and sequence length between languages, we propose an intermediate "TRILayer" structure that learns from these differences and creates a better transfer in our primary translation direction, as well as a new cross-lingual language modeling objective for transfer training. Additionally, we showcase an embedding alignment that adversarially adapts a PrLM's non-contextualized embedding space and the TRILayer structure to learn a text transformation network across languages, which addresses the vocabulary differences between languages. Experiments on both language understanding and structure parsing tasks show that the proposed framework significantly outperforms language models trained from scratch with limited data, in both performance and efficiency. Moreover, despite an insignificant performance loss compared to pre-training from scratch in resource-rich scenarios, our cross-lingual model transferring framework is significantly more economical.

[93] 2107.12628

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

Confidence calibration is of great importance to the reliability of decisions made by machine learning systems. However, discriminative classifiers based on deep neural networks are often criticized for producing overconfident predictions that fail to reflect the true likelihood of being correct. We argue that such an inability to model uncertainty is mainly caused by the closed-world nature of softmax: a model trained with the cross-entropy loss is forced to classify an input into one of $K$ pre-defined categories with high probability. To address this problem, we propose, for the first time, a novel $(K+1)$-way softmax formulation that incorporates the modeling of open-world uncertainty as the extra dimension. To unify the learning of the original $K$-way classification task and the extra dimension that models uncertainty, we propose a novel energy-based objective function and, moreover, theoretically prove that optimizing such an objective essentially forces the extra dimension to capture the marginal data distribution. Extensive experiments show that our approach, Energy-based Open-World Softmax (EOW-Softmax), is superior to existing state-of-the-art methods in improving confidence calibration.
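To show what a $(K+1)$-way output head looks like at inference time, the sketch below takes $K+1$ logits, applies a single softmax, and reads the last entry as open-world uncertainty. Probability mass assigned to that entry directly lowers the confidence of every real class. Only the output layer is shown: the energy-based training objective that shapes the extra dimension is not implemented, and the function name is hypothetical.

```python
import numpy as np

def k_plus_one_softmax(logits):
    """logits: array of shape (..., K+1); the last entry is the extra
    'open-world uncertainty' dimension. Returns (prediction, confidence, uncertainty)."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    class_probs, uncertainty = p[..., :-1], p[..., -1]
    pred = class_probs.argmax(axis=-1)                # predicted class among the K
    confidence = class_probs.max(axis=-1)             # not renormalized over K classes
    return pred, confidence, uncertainty
```

For logits `[2, 1, 0, -10]` the extra dimension is negligible and class 0 is predicted with confidence of about 0.67. For `[0, 0, 0, 5]` nearly all mass goes to the extra dimension, so every class confidence falls below 0.01 instead of being forced toward 1/3 or higher.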

[94] 2107.12630

Low-Complexity Improved-Throughput Generalised Spatial Modulation: Bit-to-Symbol Mapping, Detection and Performance Analysis

Low-complexity improved-throughput generalised spatial modulation (LCIT-GSM) is proposed. More explicitly, in GSM, extra information bits are conveyed implicitly by activating a fixed number $N_{a}$ out of $N_{t}$ transmit antennas (TAs) at a time. As a result, GSM has the advantage of a reduced number of radio-frequency (RF) chains and reduced inter-antenna interference (IAI), at the cost of a lower throughput than its multiplexing-oriented full-RF counterparts. Variable-${N_a}$ GSM mitigates this throughput reduction by incorporating all possible TA activation patterns associated with a variable value of $N_{a}$ ranging from $1$ to $N_{t}$ during a single channel use, which maximises the throughput of GSM but suffers from a high complexity of mapping-book design and demodulation. In order to mitigate this complexity, \emph{first of all}, we propose two efficient schemes for mapping the information bits to the TA activation patterns, which can be readily scaled to massive MIMO setups. \emph{Secondly}, in the absence of IAI, we derive a pair of low-complexity near-optimal detectors, one of which has a reduced search scope, while the other benefits from a decoupled single-stream based signal detection algorithm. \emph{Finally}, the performance of the proposed LCIT-GSM system is characterised by the error probability upper bound (UB). Our Monte Carlo simulation results confirm the improved error performance of our proposed scheme, despite its reduced signal detection complexity.
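The throughput argument can be checked with a little counting. With a fixed $N_a$, there are $\binom{N_t}{N_a}$ activation patterns, so the antenna indices carry $\lfloor \log_2 \binom{N_t}{N_a} \rfloor$ bits. Allowing every $N_a$ from 1 to $N_t$ gives $\sum_{N_a=1}^{N_t} \binom{N_t}{N_a} = 2^{N_t} - 1$ patterns. The helpers below (names our own) count only these antenna-index bits and ignore the bits carried by the modulated symbols.

```python
from math import comb

def floor_log2(n):
    """Exact floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def fixed_na_index_bits(n_t, n_a):
    """Antenna-index bits when exactly n_a of n_t antennas are active."""
    return floor_log2(comb(n_t, n_a))

def variable_na_index_bits(n_t):
    """Antenna-index bits when any non-empty activation pattern is allowed:
    sum over n_a = 1..n_t of C(n_t, n_a) = 2**n_t - 1 patterns."""
    return floor_log2(2 ** n_t - 1)

print(fixed_na_index_bits(8, 2), variable_na_index_bits(8))  # 4 7
```

For $N_t = 8$, fixing $N_a = 2$ yields 28 patterns and 4 index bits, whereas the variable-$N_a$ scheme yields 255 patterns and 7 index bits. Since that larger pattern set must be mapped and searched at the receiver, scalable mapping schemes and reduced-complexity detectors become important.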

[95] 2107.12636

Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve their cross-domain performance remains unexplored and unclear. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone brings only limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequences of both domains. DQFA reduces the domain discrepancy in global feature representations and object relations when deployed in the transformer encoder and decoder, respectively. Meanwhile, TDA aligns token features in the sequences from both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. In addition, a novel bipartite matching consistency loss is proposed to enhance feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods. Code has been made available at:

[96] 2107.12637

Topology Design and Position Analysis of a Reconfigurable Modular Hybrid-Parallel Manipulator

Today, manipulators are found in the automated assembly lines of industries that mass-produce goods. These manipulators can be used in only one configuration, either serial or parallel. In this paper, a new module with two degrees of freedom (DoF) is introduced. By connecting two and three modules in series, 4 and 6 DoF hybrid manipulators can be formed, respectively. By erecting 3 modules in parallel, with some minor modifications, a 6 DoF parallel manipulator can be formed. Hence the manipulator is reconfigurable and can be used as a hybrid or a parallel manipulator by disassembling and reassembling it. The topology design and the forward and inverse position analyses have been carried out for the two hybrid configurations and the parallel configuration. This manipulator can be used in industries where flexible manufacturing systems are to be deployed. The three configurations of the manipulator have been experimentally demonstrated using graphical user interface (GUI) control through a computer.

[97] 2107.12639

Structure and temporal evolution of transportation literature

Fifty years of evolution of the transportation field are revisited at a macro scale using a scientometric analysis of all publications in all 39 journals indexed in the Transportation category of the Web of Science. The size of the literature is estimated to have reached 50,000 documents. At the highest level of aggregation, these analyses differentiate four major divisions of the literature, namely (i) network analysis and traffic flow, (ii) economics of transportation and logistics, (iii) travel behaviour, and (iv) road safety. Influential and emerging authors of each division are identified. Temporal trends in transportation research are also investigated via document co-citation analysis. This analysis identifies various major streams of transportation research while determining their approximate time of emergence and duration of activity. It documents the topics that have been most popular in each period over the last fifty years. Three clusters associated with the travel behaviour division (collectively embodying topics of land use, active transportation, residential self-selection, traveller experience/satisfaction, social exclusion, and transport/spatial equity), one cluster of statistical modelling of road accidents, and a cluster of network modelling linked predominantly to the notion of the macroscopic fundamental diagram show the characteristics of current hot topics of the field. Three smaller clusters linked predominantly to electric mobility and autonomous/automated vehicles show the characteristics of emerging hot topics. A cluster labelled shared mobility is the youngest emerging cluster. Influential articles within each cluster of references are identified. An additional outcome is the identification of influential outsiders of the transportation field.

[98] 2107.12642

Unsupervised Outlier Detection using Memory and Contrastive Learning

Outlier detection is one of the most important steps in creating good, reliable data for machine learning. Most outlier detection methods leverage an auxiliary reconstruction task by assuming that outliers are more difficult to recover than normal samples (inliers). However, this is not always true, especially for auto-encoder (AE) based models: they may recover certain outliers even when those outliers are not in the training data, because they do not constrain feature learning. Instead, we argue that outlier detection can be done in the feature space by measuring the feature distance between outliers and inliers. We therefore propose a framework, MCOD, using a memory module and a contrastive learning module. The memory module constrains the consistency of features, which represent the normal data. The contrastive learning module learns more discriminative features, which boosts the distinction between outliers and inliers. Extensive experiments on four benchmark datasets show that our proposed MCOD achieves strong performance and outperforms nine state-of-the-art methods.

[99] 2107.12646

Computer Vision-Based Guidance Assistance Concept for Plowing Using RGB-D Camera

This paper proposes a concept of computer vision-based guidance assistance for agricultural vehicles to increase plowing accuracy and reduce the driver's cognitive burden during long tillage operations. Plowing is a common agricultural practice to prepare the soil for planting in many countries, and it can take place in both spring and fall. Since plowing requires high traction forces, it increases energy consumption. Moreover, longer operation times caused by unnecessary maneuvers lead to higher fuel consumption. To provide the necessary information to the driver and the tractor's control unit, a first concept of a furrow detection system based on an RGB-D camera was developed.

[100] 2107.12648

Gradient Play in $n$-Cluster Games with Zero-Order Information

We study a distributed approach for seeking a Nash equilibrium in $n$-cluster games with strictly monotone mappings. Each player within each cluster has access to the current value of her own smooth local cost function, estimated by a zero-order oracle at some query point. We assume that the agents can communicate with their neighbors in the same cluster over an undirected graph. The goal of the agents in a cluster is to minimize their collective cost. This cost depends, however, on the actions of agents from other clusters. Thus, a game between the clusters is to be solved. We present a distributed gradient play algorithm for determining a Nash equilibrium in this game. The algorithm takes into account the communication settings and the zero-order information under consideration. We prove almost sure convergence of this algorithm to a Nash equilibrium given appropriate estimations of the local cost functions' gradients.
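
What distinguishes this setting is that each player sees only cost values, not gradients. The sketch below, under simplifying assumptions, estimates each player's partial gradient from two cost evaluations along a random +1/-1 direction and lets both players take simultaneous gradient steps. The two-player toy game, the two-point estimator, and the step sizes are hypothetical; the paper's clusters, intra-cluster communication graph, and exact estimator are not modeled.

```python
import numpy as np

def zo_partial(cost, x, i, delta, rng):
    """Two-point zero-order estimate of d cost / d x[i] from cost values only."""
    u = rng.choice([-1.0, 1.0])      # random probing direction for coordinate i
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[i] += delta * u
    x_minus[i] -= delta * u
    return (cost(x_plus) - cost(x_minus)) / (2 * delta) * u

def gradient_play(costs, x0, steps=500, lr=0.1, delta=1e-3, seed=0):
    """Each player i repeatedly steps along its own estimated partial gradient."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        grads = np.array([zo_partial(c, x, i, delta, rng) for i, c in enumerate(costs)])
        x -= lr * grads              # simultaneous update of all players
    return x

# Hypothetical strongly monotone two-player game; its Nash equilibrium is (4/3, -4/3).
def j1(x):
    return (x[0] - 1) ** 2 + 0.5 * x[0] * x[1]

def j2(x):
    return (x[1] + 1) ** 2 + 0.5 * x[0] * x[1]

x_star = gradient_play([j1, j2], [0.0, 0.0])
print(x_star)  # close to [1.3333, -1.3333]
```

For these quadratic costs the symmetric difference is exact, so the iteration behaves like ordinary gradient play; with noisy oracle values or one-point estimators, diminishing step sizes would typically be needed.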

[101] 2107.12650

Joint Power and User Grouping Optimization in Cell-Free Massive MIMO Systems

To relieve the burden of channel estimation and decoding complexity in cell-free massive multiple-input multiple-output (MIMO) systems, this paper investigates the user grouping problem, where access points (APs) based on time-division duplex (TDD) serve users on different time resources and the same frequency resource. In addition, when quality of service (QoS) requirements are considered, the widely used max-min power control is no longer applicable. We derive the minimum power constraints under diverse QoS requirements considering user grouping. Based on this analysis, we formulate the joint power and user grouping problem under QoS constraints, aiming to minimize the total transmit power. A generalized Benders decomposition (GBD) based algorithm is proposed, in which the primal problem and the master problem are solved iteratively to approach the optimal solution. Simulation results demonstrate that, with user grouping, the number of users served in cell-free MIMO systems can be as large as the number of APs without increasing the complexity of channel estimation and decoding. Furthermore, with the proposed user grouping strategy, the power consumption can be reduced by 2-3 dB compared with the reference user grouping strategy, and by 7 dB compared with the total transmit power without grouping.

[102] 2107.12651

Greedy Gradient Ensemble for Robust Visual Question Answering

Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information. As a result, they suffer from performance drops on out-of-distribution data and inadequate visual explanation. Based on an experimental analysis of existing robust VQA methods, we stress that language bias in VQA comes from two aspects, i.e., distribution bias and shortcut bias. We further propose a new de-biasing framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning. With the greedy strategy, GGE forces the biased models to overfit the biased data distribution first, thus making the base model pay more attention to examples that are hard for the biased models to solve. The experiments demonstrate that our method makes better use of visual information and achieves state-of-the-art performance on the diagnostic dataset VQA-CP without using extra annotations.

[103] 2107.12654

Co-Transport for Class-Incremental Learning

Traditional learning systems are trained in a closed world on a fixed number of classes and need pre-collected datasets. However, new classes often emerge in real-world applications and should be learned incrementally. For example, in electronic commerce, new types of products appear daily, and in a social media community, new topics emerge frequently. Under such circumstances, incremental models should learn several new classes at a time without forgetting. We find a strong correlation between old and new classes in incremental learning, which can be used to relate different learning stages and make them mutually beneficial. As a result, we propose CO-transport for class Incremental Learning (COIL), which learns to relate incremental tasks through class-wise semantic relationships. Specifically, co-transport has two aspects: prospective transport augments the old classifier with optimally transported knowledge for fast model adaptation, while retrospective transport transports new class classifiers backward as old ones to overcome forgetting. With these transports, COIL efficiently adapts to new tasks and stably resists forgetting. Experiments on benchmark and real-world multimedia datasets validate the effectiveness of our proposed method.
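
Both directions of co-transport rely on an optimal transport plan between old and new classes. A minimal sketch of one such step is below, under stated assumptions: the class-wise relationship is taken to be the squared distance between hypothetical class prototypes, the plan is computed with the standard Sinkhorn iterations for entropy-regularized optimal transport, and new-class classifiers are initialized as plan-weighted mixtures of old ones (roughly the prospective direction). It illustrates the general technique, not COIL's actual formulation or training loss.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=500):
    """Entropy-regularized optimal transport plan between histograms a and b."""
    K = np.exp(-cost / reg)              # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # rescale to match row marginals
        v = b / (K.T @ u)                # rescale to match column marginals
    return u[:, None] * K * v[None, :]

def transport_classifiers(old_w, old_protos, new_protos, reg=0.1):
    """Initialize new-class weights as transport-weighted mixes of old-class weights.

    old_w: (n_old, d) classifier weights; old_protos / new_protos: class
    prototype features (hypothetical inputs for this sketch).
    """
    cost = ((old_protos[:, None, :] - new_protos[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()             # rescale to reduce the risk of underflow in exp
    a = np.full(len(old_protos), 1.0 / len(old_protos))
    b = np.full(len(new_protos), 1.0 / len(new_protos))
    plan = sinkhorn(cost, a, b, reg)                 # shape (n_old, n_new)
    plan = plan / plan.sum(axis=0, keepdims=True)    # each new class mixes old ones
    return plan.T @ old_w                            # shape (n_new, d)

old_w = np.eye(2)
old_protos = np.array([[0.0, 0.0], [10.0, 10.0]])
new_protos = np.array([[0.1, 0.0], [10.0, 9.9]])
print(transport_classifiers(old_w, old_protos, new_protos).round(3))
```

Here each new class sits next to one old class, so it inherits almost entirely that class's weights; with less separated prototypes, the plan spreads mass over several old classes.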

[104] 2107.12655

CKConv: Learning Feature Voxelization for Point Cloud Analysis

Despite the remarkable success of deep learning, the optimal convolution operation on point clouds remains an open question because of their irregular data structure. In this paper, we present Cubic Kernel Convolution (CKConv), which learns to voxelize the features of local points by exploiting both continuous and discrete convolutions. Our continuous convolution uniquely employs a 3D cubic form of kernel weight representation that splits a feature into voxels in embedding space. By consecutively applying discrete 3D convolutions on the voxelized features in a spatial manner, the preceding continuous convolution is forced to learn a spatial feature mapping, i.e., feature voxelization. In this way, geometric information can be captured in detail by encoding it with subdivided features, and our 3D convolutions on these fixed structured data do not suffer from discretization artifacts thanks to voxelization in embedding space. Furthermore, we propose a spatial attention module, Local Set Attention (LSA), to provide comprehensive structure awareness within the local point set and hence produce representative features. By learning feature voxelization with LSA, CKConv can extract enriched features for effective point cloud analysis. We show that CKConv is broadly applicable to point cloud processing tasks, including object classification, object part segmentation, and scene semantic segmentation, with state-of-the-art results.

[105] 2107.12657

Continual Learning with Neuron Activation Importance

Continual learning is a form of online learning over multiple sequential tasks. One of its critical barriers is that a network must learn a new task while keeping the knowledge of old tasks, without access to any data of the old tasks. In this paper, we propose a neuron activation importance-based regularization method for stable continual learning regardless of the order of tasks. We conduct comprehensive experiments on existing benchmark datasets to evaluate not only the stability and plasticity of our method, which improves classification accuracy, but also the robustness of its performance to changes in task order.

[106] 2107.12659

Employee-Driven Innovation to Fuel Internal Software Startups: Preliminary Findings

To keep up with the pace of innovation, established companies are increasingly relying on internal software startups. However, succeeding with such startups is a challenging task because internal startups need to find a balance between the interests of the company and the interest of the innovator. One approach that is argued to strengthen innovation in existing companies is employee-driven innovation (EDI). This study explores this argument by examining two internal software startups in companies aligned with the principles of EDI and with a strong focus on innovation. The preliminary findings indicate that startups with EDI are characterized by commitment towards innovation, cooperative orientation, and autonomy. The findings suggest that internal software startups may be strengthened when the parent companies practice EDI.

[107] 2107.12660

The Pursuit and Evasion of Drones Attacking an Automated Turret

This paper investigates the pursuit-evasion problem of a defensive gun turret and one or more attacking drones. The turret must ``visit'' each attacking drone once, as quickly as possible, to defeat the threat. This constitutes a Shortest Hamiltonian Path (SHP) through the drones. The investigation considers situations of increasing fidelity, starting with a 2D kinematic model and progressing to a 3D dynamic model. In 2D, we determine the region from which one or more drones can always reach a turret, or the region close enough to it that they can evade the turret. This provides optimal starting angles for $n$ drones around a turret and the maximum starting radius for one and two drones. We show that safety regions also exist in 3D and provide a controller so that a drone in this region can evade the pan-tilt turret. Through simulations, we explore the maximum range from which $n$ drones can start and still have at least one reach the turret, and analyze the effects of turret behavior and of the drones' number, starting configuration, and behaviors.
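
To make the Shortest Hamiltonian Path framing concrete, the sketch below brute-forces the order in which a 2D turret should aim at each drone so that its total rotation is minimal. It assumes the drones are frozen in place and that cost is pure pan angle, both simplifications the paper does not make; function names are hypothetical.

```python
import math
from itertools import permutations

def angular_gap(a, b):
    """Smallest absolute rotation (radians) between two bearings."""
    d = (b - a) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def shortest_visit_order(turret_angle, drone_bearings):
    """Brute-force shortest Hamiltonian path of the turret's aim through all drones.

    Assumes stationary drones, so the path cost is total rotation only.
    Runtime is O(n!), acceptable for the handful of drones considered here.
    """
    best_order, best_cost = None, math.inf
    for order in permutations(range(len(drone_bearings))):
        cost, aim = 0.0, turret_angle
        for i in order:
            cost += angular_gap(aim, drone_bearings[i])
            aim = drone_bearings[i]
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

order, cost = shortest_visit_order(0.0, [0.5, 1.0, -0.2])
print(order, round(cost, 3))  # (2, 0, 1) 1.4
```

Turning briefly to the drone at -0.2 rad first and then sweeping across the other two beats sweeping outward and doubling back, which is the kind of ordering decision the SHP captures.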

[108] 2107.12664

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

Arbitrary shape text detection is a challenging task due to the high complexity and variety of scene texts. In this work, we propose a novel adaptive boundary proposal network for arbitrary shape text detection, which can learn to directly produce accurate boundaries for arbitrary shape text without any post-processing. Our method mainly consists of a boundary proposal model and an innovative adaptive boundary deformation model. The boundary proposal model, constructed from multi-layer dilated convolutions, is adopted to produce prior information (including a classification map, distance field, and direction field) and coarse boundary proposals. The adaptive boundary deformation model is an encoder-decoder network, in which the encoder mainly consists of a Graph Convolutional Network (GCN) and a Recurrent Neural Network (RNN). It performs boundary deformation iteratively to obtain the text instance shape, guided by prior information from the boundary proposal model. In this way, our method can directly and efficiently generate accurate text boundaries without complex post-processing. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.

[109] 2107.12666

Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification

Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level features from the two modalities. Second, we design a multi-view non-local network that captures the relationships between body parts, thereby establishing better correspondences between body parts and noun phrases. Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features. Finally, to expedite future research in text-to-image ReID, we build a new database named ICFG-PEDES. Extensive experiments demonstrate that SSAN outperforms state-of-the-art approaches by significant margins. Both the new ICFG-PEDES database and the SSAN code are available at

[110] 2107.12668

Next-Generation Multiple Access Based on NOMA with Power Level Modulation

To cope with the explosive traffic growth of next-generation wireless communications, it is necessary to design next-generation multiple access techniques that provide higher spectral efficiency as well as larger-scale connectivity. As a promising candidate, power-domain non-orthogonal multiple access (NOMA) has been widely studied. In conventional power-domain NOMA, multiple users are multiplexed in the same time and frequency band by different preset power levels, which, however, may limit the spectral efficiency under practical finite alphabet inputs. Inspired by the concept of spatial modulation, we propose to solve this problem by encoding extra information bits into the power levels and exploiting different signal constellations to help the receiver distinguish between them. To present this idea, termed power selection NOMA (PS-NOMA), clearly, we consider a simple downlink two-user NOMA system with finite input constellations. Assuming maximum-likelihood detection, we derive closed-form approximate bit error ratio (BER) expressions for both users. The achievable rates of both users are also derived in closed form. Simulation results verify the analysis and show that the proposed PS-NOMA outperforms conventional NOMA in terms of BER and achievable rate.

[111] 2107.12672

Differentiable Direct Volume Rendering

We present a differentiable volume rendering solution that provides differentiability of all continuous parameters of the volume rendering process. This differentiable renderer is used to steer the parameters towards a setting with an optimal solution of a problem-specific objective function. We have tailored the approach to volume rendering by enforcing a constant memory footprint via analytic inversion of the blending functions. This makes it independent of the number of sampling steps through the volume and facilitates the consideration of small-scale changes. The approach forms the basis for automatic optimizations regarding external parameters of the rendering process and the volumetric density field itself. We demonstrate its use for automatic viewpoint selection using differentiable entropy as objective, and for optimizing a transfer function from rendered images of a given volume. Optimization of per-voxel densities is addressed in two different ways: First, we mimic inverse tomography and optimize a 3D density field from images using an absorption model. This simplification enables comparisons with algebraic reconstruction techniques and state-of-the-art differentiable path tracers. Second, we introduce a novel approach for tomographic reconstruction from images using an emission-absorption model with post-shading via an arbitrary transfer function.

[112] 2107.12673

COPS: Controlled Pruning Before Training Starts

State-of-the-art deep neural network (DNN) pruning techniques, applied one-shot before training starts, evaluate sparse architectures with the help of a single criterion -- called a pruning score. Pruning weights based on a single score works well for some architectures and pruning rates but may fail for others. As a common baseline for pruning scores, we introduce the notion of a generalized synaptic score (GSS). In this work, we do not concentrate on a single pruning criterion, but provide a framework for combining arbitrary GSSs to create more powerful pruning strategies. These COmbined Pruning Scores (COPS) are obtained by solving a constrained optimization problem. Optimizing for more than one score prevents the sparse network from overly specializing on an individual task, thus COntrolling Pruning before training Starts. The combinatorial optimization problem given by COPS is relaxed to a linear program (LP). This LP is solved analytically and determines a solution for COPS. Furthermore, a numerical algorithm to compute it for two scores is proposed and evaluated. Solving COPS in this way has lower complexity than the best general LP solver. In our experiments, we compared pruning with COPS against state-of-the-art methods for different network architectures and image classification tasks and obtained improved results.
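
The core idea of combining several pruning scores can be illustrated with a much simpler rule than the one COPS derives. The sketch below uses a fixed, hand-chosen weighted sum of normalized scores and keeps the top fraction of weights; the weights, the normalization, and the example scores (magnitude and a SNIP-style |w * g|) are assumptions, whereas COPS obtains its combination by solving the constrained optimization and LP described above.

```python
import numpy as np

def combined_mask(scores, weights, keep_ratio):
    """Keep the top `keep_ratio` fraction of parameters under a weighted sum
    of normalized pruning scores (fixed weights, not COPS's optimized ones)."""
    total = np.zeros_like(scores[0], dtype=float)
    for s, lam in zip(scores, weights):
        total += lam * s / s.sum()       # normalize so scores are on a comparable scale
    k = int(round(keep_ratio * total.size))
    keep = np.argsort(total.ravel())[::-1][:k]
    mask = np.zeros(total.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(total.shape)

w = np.array([0.1, -2.0, 0.5, 1.0])      # hypothetical weights
g = np.array([3.0, 0.1, 0.2, 0.5])       # hypothetical gradients
magnitude, snip_like = np.abs(w), np.abs(w * g)
print(combined_mask([magnitude, snip_like], [0.5, 0.5], 0.5))  # [False  True False  True]
print(combined_mask([magnitude, snip_like], [0.0, 1.0], 0.5))  # [ True False False  True]
```

With only the gradient-based score, the small weight with a large gradient survives; blending in magnitude keeps the large weight instead, showing how a combination can change which sparse architecture is selected.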

[113] 2107.12674

Vision-Guided Forecasting -- Visual Context for Multi-Horizon Time Series Forecasting

Autonomous driving has gained huge traction in recent years due to its potential to change the way we commute. Much effort has been put into estimating the state of a vehicle. Meanwhile, learning to forecast the future state of a vehicle introduces new capabilities, such as predicting dangerous situations. Moreover, forecasting brings new supervision opportunities by learning to predict a richer context, expressed by multiple horizons. Intuitively, a video stream originating from a front-facing camera is necessary because it encodes information about the upcoming road. Besides, historical traces of the vehicle's states give more context. In this paper, we tackle multi-horizon forecasting of vehicle states by fusing the two modalities. We design and experiment with 3 end-to-end architectures that exploit 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering angle traces. To demonstrate the effectiveness of our method, we perform extensive experiments on two publicly available real-world datasets, Comma2k19 and the Udacity challenge. We show that we are able to forecast a vehicle's state to various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation. We examine the contribution of vision features and find that a model fed with vision features achieves an error that is 56.6% and 66.9% of the error of a model that does not use those features, on the Udacity and Comma2k19 datasets respectively.

[114] 2107.12675

Feature Fusion Methods for Indexing and Retrieval of Biometric Data: Application to Face Recognition with Privacy Protection

Computationally efficient, accurate, and privacy-preserving data storage and retrieval are among the key challenges faced by practical deployments of biometric identification systems worldwide. In this work, a method of protected indexing of biometric data is presented. By utilising feature-level fusion of intelligently paired templates, a multi-stage search structure is created. During retrieval, the list of potential candidate identities is successively pre-filtered, thereby reducing the number of template comparisons necessary for a biometric identification transaction. Protection of the biometric probe templates, as well as the stored reference templates and the created index, is carried out using homomorphic encryption. The proposed method is extensively evaluated in closed-set and open-set identification scenarios on publicly available databases using two state-of-the-art open-source face recognition systems. Compared with a typical baseline that uses exhaustive search-based retrieval, the proposed method reduces the computational workload of a biometric identification transaction by 90% while suffering no degradation of biometric performance. Furthermore, by facilitating a seamless integration of template protection with open-source homomorphic encryption libraries, the proposed method guarantees unlinkability, irreversibility, and renewability of the protected biometric data.

[115] 2107.12676

QoS-aware User Grouping Strategy for Downlink Multi-Cell NOMA Systems

In multi-cell non-orthogonal multiple access (NOMA) systems, designing an appropriate user grouping strategy is an open problem due to diverse quality of service (QoS) requirements and inter-cell interference. In this paper, we exploit both game theory and graph theory to study QoS-aware user grouping strategies, aiming at minimizing power consumption in downlink multi-cell NOMA systems. Under different QoS requirements, we derive the optimal successive interference cancellation (SIC) decoding order with inter-cell interference, which differs from the existing SIC decoding order of increasing channel gains, and obtain the corresponding power allocation strategy. Based on this, an exact potential game model of the user grouping strategies adopted by multiple cells is formulated. We prove that, in this game, the problem of each player finding a grouping strategy can be converted into the problem of searching for specific negative loops in the graph composed of users. The Bellman-Ford algorithm is extended to find these negative loops. Furthermore, we design a greedy suboptimal strategy that approaches the optimal solution in polynomial time. Extensive simulations confirm the effectiveness of grouping users with consideration of QoS and inter-cell interference, and show that the proposed strategies can considerably reduce total power consumption compared with reference strategies.
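
The grouping search reduces to finding negative loops in a graph of users. The building block is standard Bellman-Ford negative-cycle detection, sketched below. This is the textbook algorithm, not the paper's extended variant. The edge weights are hypothetical and could be read as the change in total power when a user is moved between groups.

```python
def find_negative_cycle(n, edges):
    """Bellman-Ford negative-cycle detection.

    n: number of vertices labelled 0..n-1; edges: list of (u, v, weight).
    Returns the vertices of one negative cycle in edge order, or None.
    """
    dist = [0.0] * n     # all-zero start acts like a virtual source linked to every vertex
    pred = [-1] * n
    x = -1
    for _ in range(n):
        x = -1           # last vertex relaxed in this pass
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                x = v
    if x == -1:          # no relaxation in the n-th pass: no negative cycle
        return None
    for _ in range(n):   # walk back n steps to land inside the cycle
        x = pred[x]
    cycle, v = [x], pred[x]
    while v != x:
        cycle.append(v)
        v = pred[v]
    cycle.reverse()      # predecessors were collected backwards
    return cycle

edges = [(0, 1, 1.0), (1, 2, -1.0), (2, 0, -1.0), (2, 3, 2.0)]
print(find_negative_cycle(4, edges))  # the cycle through 0, 1, 2 (total weight -1)
```

Each detected negative loop corresponds to a set of moves that lowers the objective; repeatedly applying such moves until none remains is the usual way this primitive is used in local search.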

[116] 2107.12677

Deep Variational Models for Collaborative Filtering-based Recommender Systems

Deep learning provides accurate collaborative filtering models to improve recommender system results. Deep matrix factorization and its related collaborative neural networks are the state of the art in the field; nevertheless, both models lack the necessary stochasticity to create the robust, continuous, and structured latent spaces that variational autoencoders exhibit. On the other hand, data augmentation through variational autoencoders does not provide accurate results in the collaborative filtering field due to the high sparsity of recommender systems. Our proposed models apply the variational concept to inject stochasticity into the latent space of the deep architecture, introducing the variational technique into the neural collaborative filtering field. This method does not depend on the particular model used to generate the latent representation. In this way, the approach can be applied as a plugin to any current and future specific models. The proposed models have been tested using four representative open datasets, three different quality measures, and state-of-the-art baselines. The results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect. Additionally, a framework is provided to enable the reproducibility of the conducted experiments.

[117] 2107.12679

MFAGAN: A Compression Framework for Memory-Efficient On-Device Super-Resolution GAN

Generative adversarial networks (GANs) have promoted remarkable advances in single-image super-resolution (SR) by recovering photo-realistic images. However, the high memory consumption of GAN-based SR (mainly of the generators) causes performance degradation and higher energy consumption, hindering the deployment of GAN-based SR on resource-constrained mobile devices. In this paper, we propose a novel compression framework, \textbf{M}ulti-scale \textbf{F}eature \textbf{A}ggregation Net based \textbf{GAN} (MFAGAN), for reducing the memory access cost of the generator. First, to overcome the memory explosion of dense connections, we utilize a memory-efficient multi-scale feature aggregation net as the generator. Second, for faster and more stable training, our method introduces the PatchGAN discriminator. Third, to balance the student discriminator and the compressed generator, we distill both the generator and the discriminator. Finally, we perform a hardware-aware neural architecture search (NAS) to find a specialized SubGenerator for the target mobile phone. Benefiting from these improvements, the proposed MFAGAN achieves up to \textbf{8.3}$\times$ memory saving and \textbf{42.9}$\times$ computation reduction, with only minor visual quality degradation, compared with ESRGAN. Empirical studies also show $\sim$\textbf{70} milliseconds latency on the Qualcomm Snapdragon 865 chipset.

[118] 2107.12682

Time-Varying Fuzzy Contour Trees

We present a holistic, topology-based visualization technique for spatial time series data based on an adaptation of Fuzzy Contour Trees. Common analysis approaches for time dependent scalar fields identify and track specific features. To give a more general overview of the data, we extend Fuzzy Contour Trees, from the visualization and simultaneous analysis of the topology of multiple scalar fields, to time dependent scalar fields. The resulting time-varying Fuzzy Contour Trees allow the comparison of multiple time steps that are not required to be consecutive. We provide specific interaction and navigation possibilities that allow the exploration of individual time steps and time windows in addition to the behavior of the contour trees over all time steps. To achieve this, we reduce an existing alignment to multiple sub-alignments and adapt the Fuzzy Contour Tree-layout to continuously reflect changes and similarities in the sub-alignments. We apply time-varying Fuzzy Contour Trees to different real-world data sets and demonstrate their usefulness.

[119] 2107.12683

Stability for finite element discretization of some elliptic inverse parameter problems from internal data -- application to elastography

In this article, we provide stability estimates for the finite element discretization of a class of inverse parameter problems of the form $-\nabla\cdot(\mu S) = \mathbf{f}$ in a domain $\Omega$ of $\mathbb{R}^d$. Here $\mu$ is the unknown parameter to recover, while the matrix-valued function $S$ and the vector-valued distribution $\mathbf{f}$ are known. As uniqueness is not guaranteed in general for this problem, we prove a Lipschitz-type stability estimate in a hyperplane of $L^2(\Omega)$. This stability is obtained through an adaptation of the so-called discrete \emph{inf-sup} constant, or LBB constant, to a large class of first-order differential operators. We then provide a simple and original discretization based on hexagonal finite elements that satisfies the discrete stability condition, and show corresponding numerical reconstructions. The obtained algebraic inversion method is efficient, as it does not require any iterative solving of the forward problem, and very general, as it requires neither smoothness hypotheses on the data nor any additional information at the boundary.

[120] 2107.12685

On the Role of Optimization in Double Descent: A Least Squares Study

Empirically, it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been proposed to reconcile this observation with theory, suggesting that the test error has a second descent when the model becomes sufficiently overparameterized, as the model size itself acts as an implicit regularizer. In this paper, we add to the growing body of work in this space by providing a careful study of learning dynamics as a function of model size in the least squares scenario. We show an excess risk bound for the gradient descent solution of the least squares objective. The bound depends on the smallest non-zero eigenvalue of the covariance matrix of the input features, via a functional form that has the double descent behavior. This gives a new perspective on the double descent curves reported in the literature. Our analysis of the excess risk allows us to decouple the effects of optimization and generalization error. In particular, we find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities, which was missed in studies focusing on the Moore-Penrose pseudoinverse solution. We believe that our derivation provides an alternative view compared to existing work, shedding some light on a possible cause of this phenomenon, at least in the considered least squares setting. We empirically explore whether our predictions hold for neural networks, in particular whether the covariance of intermediate hidden activations behaves similarly to what our derivations predict.
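
The abstract ties the excess risk bound to the smallest non-zero eigenvalue of the input covariance. As a small numerical companion (not the paper's bound), the sketch below computes that eigenvalue for a random overparameterized design and checks a standard fact: gradient descent started at zero converges to the Moore-Penrose solution, and on the row space of X its error shrinks by a factor of at most 1 - lr * lam_min per step, so a small lam_min means slow optimization. Data sizes, seed, and step count are arbitrary choices.

```python
import numpy as np

def smallest_nonzero_eig(X, tol=1e-10):
    """Smallest non-zero eigenvalue of the empirical covariance X^T X / n."""
    n = X.shape[0]
    eigs = np.linalg.eigvalsh(X.T @ X / n)
    return eigs[eigs > tol].min()

def gd_least_squares(X, y, steps, lr):
    """Gradient descent on (1 / 2n) * ||X w - y||^2, started from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w

rng = np.random.default_rng(0)
n, d = 20, 50                                   # overparameterized: d > n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

lam_min = smallest_nonzero_eig(X)
lam_max = np.linalg.eigvalsh(X.T @ X / n).max()
lr = 1.0 / lam_max                              # stable step size
w_gd = gd_least_squares(X, y, steps=5000, lr=lr)
w_pinv = np.linalg.pinv(X) @ y                  # minimum-norm (Moore-Penrose) solution
print(lam_min, np.linalg.norm(w_gd - w_pinv))
```

Starting from zero keeps the iterates in the row space of X, which is why the limit is the minimum-norm interpolator; stopping early leaves a residual error along directions with small eigenvalues.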

[121] 2107.12688

Toward simple "silico" experiments for drugs administration in some cancer treatments

We present some "in silico" experiments to design combined chemo- and immunotherapy treatment schedules. We introduce a new framework by combining flatness-based control, which is a model-based setting, with model-free control. The flatness property of the mathematical model used yields straightforward reference trajectories. They provide us with the nominal open-loop control inputs. Closing the loop via model-free control allows us to handle uncertainties in the injected drug doses. Several numerical simulations illustrating different case studies are presented. We show in particular that the considered health indicators are driven to the safe region, even for critical initial conditions. Furthermore, in some specific cases there is no need to inject chemotherapeutic agents.

[122] 2107.12692

Dynamic and Static Object Detection Considering Fusion Regions and Point-wise Features

Object detection is a critical problem for safe interaction between autonomous vehicles and road users. Deep-learning methodologies have enabled object detection approaches with better performance. However, obtaining more characteristics of the detected objects in real time remains a challenge, because more information about the objects in the environment can improve an autonomous vehicle's capacity to handle different urban situations. This paper proposes a new approach to detect static and dynamic objects in front of an autonomous vehicle. Our approach can also obtain other characteristics of the detected objects, such as their position, velocity, and heading. We develop our proposal by fusing the environment interpretations obtained from YOLOv3 and a Bayesian filter. To demonstrate our proposal's performance, we assess it on a benchmark dataset and on real-world data obtained from an autonomous platform, and we compare the results with those of another approach.

[123] 2107.12693

A new recursive spectral Tau method on system of generalized Abel-Volterra integral equations

This paper provides an efficient recursive approach to the spectral Tau method for approximating the solution of systems of generalized Abel-Volterra integral equations. In this regard, we first investigate the existence, uniqueness, and smoothness of the solutions under assumptions on the given data. Next, from a numerical perspective, we express the approximate solution as a linear combination of suitable canonical polynomials, which are constructed by an easy-to-use recursive formula. In most cases, the unknown parameters are calculated by solving low-dimensional algebraic systems that are independent of the degree of approximation, which avoids high computational costs. However, due to the singular behavior of the exact solutions, using classical polynomials to construct the canonical polynomials leads to low-accuracy results. To address this, we develop new fractional-order canonical polynomials using M\"untz-Legendre polynomials, which have the same asymptotic behavior as the solution of the underlying problem. The convergence analysis is discussed, and the familiar spectral accuracy is achieved in the $L^{\infty}$ norm. Finally, the reliability of the method is evaluated on various problems.

[124] 2107.12696

A tactile closed-loop device for musical interaction

This paper presents a device implementing a closed tactile loop for musical interaction, based on a small freely held magnet which serves as the medium for both input and output. The component parts as well as an example of its programmable behaviour are described.

[125] 2107.12699

A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI

Security issues are a common problem for open-source packages archived in and delivered through software ecosystems. These issues often manifest themselves as software weaknesses that may lead to concrete software vulnerabilities. This paper examines various security issues in Python packages with static analysis. The dataset is based on a snapshot of all packages stored in the Python Package Index (PyPI). In total, over 197 thousand packages and over 749 thousand security issues are covered. Even under the constraints imposed by static analysis, (a) the results indicate a prevalence of security issues: at least one issue is present in about 46% of the Python packages. In terms of issue types, (b) exception handling and various code injections have been the most common issues, with the subprocess module standing out in this regard. Reflecting the generally small size of the packages, (c) software size metrics do not predict well the number of issues revealed through static analysis. With these results and the accompanying discussion, the paper contributes to the field of large-scale empirical studies for better understanding security problems in software ecosystems.
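The abstract does not name the static analyzer. As an illustration only, here is a minimal AST-based checker for the two issue families it highlights: calls into the subprocess module with `shell=True`, and exception handlers whose body is just `pass`. The function name and the set of flagged calls are assumptions. Real analyzers cover many more patterns, such as aliased imports.

```python
import ast

# Hypothetical selection of subprocess entry points to flag.
SUBPROCESS_FUNCS = {"call", "run", "Popen", "check_call", "check_output"}


def find_issues(source, filename="<string>"):
    """Return sorted (line, issue) pairs for two simple insecure patterns."""
    tree = ast.parse(source, filename=filename)
    findings = []
    for node in ast.walk(tree):
        # Pattern 1: subprocess.<func>(..., shell=True)
        if isinstance(node, ast.Call):
            func = node.func
            if (isinstance(func, ast.Attribute)
                    and isinstance(func.value, ast.Name)
                    and func.value.id == "subprocess"
                    and func.attr in SUBPROCESS_FUNCS):
                for kw in node.keywords:
                    if (kw.arg == "shell"
                            and isinstance(kw.value, ast.Constant)
                            and kw.value.value is True):
                        findings.append(
                            (node.lineno, f"subprocess.{func.attr} with shell=True"))
        # Pattern 2: an exception handler whose whole body is `pass`
        elif isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append((node.lineno, "except: pass"))
    # ast.walk is breadth-first, so sort to report findings in line order.
    return sorted(findings)
```

The checker only parses the code and never executes it. This is what makes snapshot-scale scanning of a package index feasible, and it is also why such analysis is subject to the constraints the abstract mentions.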

[126] 2107.12701

End-To-End Real-Time Visual Perception Framework for Construction Automation

In this work, we present a robotic solution to automate the task of wall construction. To that end, we present an end-to-end visual perception framework that can quickly detect and localize bricks in clutter. Further, we present a computationally light method of brick pose estimation that incorporates this information. Unlike YOLO and SSD, the proposed detection network predicts a rotated box, thereby maximizing the portion of the predicted box occupied by the object. Precision P, recall R, and mean average precision (mAP) scores are reported to evaluate the proposed framework. We observed that, for our task, the proposed scheme outperforms upright bounding box detectors. Further, we deploy the proposed visual perception framework on a robotic system equipped with a UR5 robot manipulator and demonstrate that the system can autonomously replicate a simplified version of the wall-building task.
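To make the rotated-box argument concrete, the sketch below computes the corners of a box parameterized by centre, width, height, and angle. It also computes the fraction of the tightest upright box that a rotated brick actually occupies. The helper names are hypothetical and not taken from the paper.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a w-by-h box centred at (cx, cy), rotated counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the local offset, then translate to the box centre.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


def upright_fill_ratio(w, h, theta):
    """Object area divided by the area of its axis-aligned bounding box."""
    xs, ys = zip(*rotated_box_corners(0.0, 0.0, w, h, theta))
    aabb_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return (w * h) / aabb_area
```

For a 4 by 1 brick at 45 degrees, the object fills only 32% of the upright box (4 / 12.5). This kind of background inclusion is what a rotated-box detector avoids.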

[127] 2107.12704

The cyclotactor: towards a tactile platform for musical interaction

This paper reports on work in progress on a finger-based tactile I/O device for musical interaction. Central to the device is the ability to set up cyclical relationships between tactile input and output. A direct practical application of this to musical interaction is given, using the idea to multiplex two degrees of freedom on a single tactile loop.

[128] 2107.12706

Improving ClusterGAN Using Self-Augmented Information Maximization of Disentangling Latent Spaces

The Latent Space Clustering in Generative adversarial networks (ClusterGAN) method has been successful with high-dimensional data. However, the method assumes uniformly distributed priors during the generation of modes. This is a restrictive assumption for real-world data and causes a loss of diversity in the generated modes. In this paper, we propose self-augmentation information maximization improved ClusterGAN (SIMI-ClusterGAN) to learn distinctive priors from the data. The proposed SIMI-ClusterGAN consists of four deep neural networks: a self-augmentation prior network, a generator, a discriminator, and a clustering inference autoencoder. The proposed method has been validated on seven benchmark data sets and has shown improved performance over state-of-the-art methods. To demonstrate the superiority of SIMI-ClusterGAN on imbalanced datasets, we consider two imbalanced conditions on MNIST: a one-class imbalance case and a three-class imbalance case. The results highlight the advantages of SIMI-ClusterGAN.

[129] 2107.12707

DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization

In this work, we propose a novel two-stage framework for efficient 3D point cloud object detection. Instead of transforming point clouds into 2D bird's-eye-view projections, we parse the raw point cloud data directly in 3D space yet achieve impressive efficiency and accuracy. To achieve this goal, we propose dynamic voxelization, a method that voxelizes points at a local scale on the fly. By doing so, we preserve the point cloud geometry with 3D voxels and therefore waive the dependence on expensive MLPs to learn from point coordinates. At the same time, we still inherently follow the same processing pattern as point-wise methods (e.g., PointNet) and no longer suffer from the quantization issue of conventional convolutions. For further speed optimization, we propose a grid-based downsampling and voxelization method. We also provide different CUDA implementations to accommodate the differing requirements of the training and inference phases. We highlight our efficiency on the KITTI 3D object detection dataset, with an inference speed of 75 FPS, and on the Waymo Open dataset, with 25 FPS, at satisfactory accuracy.
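DV-Det's dynamic voxelization and its CUDA kernels are not reproduced here. As a rough CPU sketch, the code below shows how grid-based voxelization and centroid downsampling can be done on the fly, without a preallocated dense grid. It assumes an (N, 3+) NumPy point array and a hypothetical cubic `voxel_size`.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Assign points to voxels on the fly and compute one centroid per voxel.

    points: (N, D) array whose first three columns are x, y, z.
    Returns (voxel_coords, point_to_voxel, centroids).
    """
    # Integer voxel coordinates; only occupied voxels are ever materialized.
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    voxel_coords, point_to_voxel = np.unique(coords, axis=0, return_inverse=True)
    point_to_voxel = point_to_voxel.reshape(-1)  # keep 1-D across NumPy versions
    # Scatter-add point coordinates per voxel, then divide by the counts.
    counts = np.bincount(point_to_voxel, minlength=len(voxel_coords))
    centroids = np.zeros((len(voxel_coords), 3))
    np.add.at(centroids, point_to_voxel, points[:, :3])
    centroids /= counts[:, None]
    return voxel_coords, point_to_voxel, centroids
```

The per-point `point_to_voxel` index is what lets voxel-level features be scattered back to individual points. This mirrors the point-wise processing pattern the abstract contrasts with fixed-grid convolutions.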

[130] 2107.12708

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of ``reasoning types'' in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.

[131] 2107.12709

Developing the cyclotactor

This paper presents developments in the technology underlying the cyclotactor, a finger-based tactile I/O device for musical interaction. These include significant improvements both in the basic characteristics of tactile interaction and in the related (vibro)tactile sample rates, latencies, and timing precision. After presenting the new prototype's tactile output force landscape, some of the new possibilities for interaction are discussed, especially those for musical interaction with zero audio/tactile latency.

[132] 2107.12711

Inclusion, equality and bias in designing online mass deliberative platforms

Designers of online deliberative platforms aim to counter the degrading quality of online debates and to eliminate online discrimination based on class, race, or gender. Support technologies such as machine learning and natural language processing open avenues for widening the circle of people involved in deliberation, moving from small groups to ``crowd'' scale. Some design features of large-scale online discussion systems allow larger numbers of people to discuss shared problems, enhance critical thinking, and formulate solutions. However, scaling up deliberation is challenging. We review the transdisciplinary literature on the design of digital mass-deliberation platforms and examine the commonly featured design aspects (e.g., argumentation support, automated facilitation, and gamification). We find that the literature is heavily focused on developing technical fixes for scaling up deliberation. Design carries a heavy Western influence, and test users skew young and highly educated. In contrast, there is a distinct lack of discussion of the nature of the design process, the inclusion of stakeholders, and issues relating to inclusion, which may unwittingly perpetuate bias. Deliberation platforms also tend to nudge participants toward desired forms of argumentation and to simplify definitions of good and bad arguments to fit algorithmic purposes. Few studies bridge disciplines between deliberative theory, design, and engineering. As a result, scaling up deliberation will likely advance in separate systemic silos. We make design and process recommendations to correct this course and suggest avenues for future research.

[133] 2107.12714

Making grains tangible: microtouch for microsound

This paper proposes a new research direction for the large family of instrumental musical interfaces where sound is generated using digital granular synthesis, and where interaction and control involve the (fine) operation of stiff, flat contact surfaces. First, within a historical context, a general absence of, and clear need for, tangible output that is dynamically instantiated by the grain-generating process itself is identified. Second, to fill this gap, a concrete general approach is proposed based on the careful construction of non-vibratory and vibratory force pulses, in a one-to-one relationship with sonic grains. An informal pilot psychophysics experiment initiating the approach was conducted, which took into account the two main cases for applying forces to the human skin: perpendicular and lateral. Initial results indicate that the force pulse approach can enable perceivably multidimensional, tangible display of the ongoing grain-generating process. Moreover, it was found that this display can meaningfully happen (in real time) on the same timescale as basic sonic grain generation. This is not a trivial property, and it provides an important and positive foundation for further developing this type of enhanced display. It also leads to the exciting prospect of making arbitrary sonic grains actual physical manipulanda.

[134] 2107.12715

Information-Theoretic Based Target Search with Multiple Agents

This paper proposes an online path planning and motion generation algorithm for heterogeneous robot teams performing target search in a real-world environment. Path selection for each robot is optimized using an information-theoretic formulation and is computed sequentially for each agent. First, we generate candidate trajectories sampled from both global waypoints derived from vertical cell decomposition and local frontier points. From this set, we choose the path with maximum information gain. We demonstrate in simulation that the hierarchical sequential decision-making structure provided by the algorithm scales to multiple agents. We also validate our framework in a real-world apartment setting, using a two-robot team consisting of the Unitree A1 quadruped and the Toyota HSR mobile manipulator to search for a person. The agents leverage an efficient leader-follower communication structure in which only critical information is shared.
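The abstract does not give the information-gain formulation, so the sketch below rests on several assumptions. It uses a discretized belief grid in which each cell holds a hypothetical probability that the target is there. It assumes a perfect square sensor footprint that fully resolves every cell it covers, and it measures gain as the summed binary entropy of newly covered cells. Agents pick greedily in sequence, each one ignoring cells already covered by teammates, which mirrors the sequential per-agent selection described above.

```python
import math


def cell_entropy(p):
    """Binary entropy (bits) of a cell whose target probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def observed_cells(path, belief, sensor_radius):
    """Grid cells inside a square footprint of the given radius around any waypoint."""
    rows, cols = len(belief), len(belief[0])
    seen = set()
    for r, c in path:
        for dr in range(-sensor_radius, sensor_radius + 1):
            for dc in range(-sensor_radius, sensor_radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    seen.add((rr, cc))
    return seen


def path_information_gain(path, belief, sensor_radius=1, covered=None):
    """Entropy removed by a path, assuming a perfect sensor; skips covered cells."""
    covered = covered or set()
    cells = observed_cells(path, belief, sensor_radius) - covered
    return sum(cell_entropy(belief[r][c]) for r, c in cells)


def plan_team(candidate_sets, belief, sensor_radius=1):
    """Sequentially give each agent the candidate path with maximum gain."""
    covered = set()
    plan = []
    for candidates in candidate_sets:
        best = max(candidates,
                   key=lambda p: path_information_gain(p, belief, sensor_radius, covered))
        plan.append(best)
        covered |= observed_cells(best, belief, sensor_radius)
    return plan
```

In a real system, the candidate sets would come from the cell-decomposition waypoints and frontier points. Here they are just lists of grid coordinates.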

[135] 2107.12716

Ghostfinger: a novel platform for fully computational fingertip controllers

We present Ghostfinger, a technology for highly dynamic up/down fingertip haptics and control. The overall user experience offered by the technology can be described as that of tangibly and audibly interacting with a small hologram. More specifically, Ghostfinger implements automatic visualization of the dynamic instantiation/parametrization of algorithmic primitives that together determine the current haptic conditions for fingertip action. Some aspects of this visualization are visuospatial: A floating see-through cursor provides real-time, to-scale display of the fingerpad transducer, as it is being moved by the user. Simultaneously, each haptic primitive instance is represented by a floating block shape, type-colored, variably transparent, and possibly overlapping with other such block shapes. Further aspects of visualization are symbolic: Each instance is also represented by a type symbol, lighting up within a grid if the instance is providing output to the user. We discuss the system's user interface, programming interface, and potential applications. This is done from a general perspective that articulates and emphasizes the uniquely enabling role of the principle of computation in the implementation of new forms of instrumental control of musical sound. Beyond the currently presented technology, this also reflects more broadly on the role of Digital Musical Instruments (DMIs) in NIME.

[136] 2107.12717

Cooperative Reflection Design with Timing Offsets in Distributed Multi-RIS Communications

This letter investigates a wireless communication system deploying distributed reconfigurable intelligent surfaces (RISs). Existing works have assumed that perfect timing synchronization is available among all the cooperative RISs. For practical considerations, we first study cooperative reflection design for multi-RIS-aided communication systems that takes timing synchronization errors into account. We aim to minimize the mean-squared error of the recovered data in the presence of timing offsets, subject to the unit-modulus constraints imposed on the phase shifts of the RISs. To handle this sophisticated nonconvex problem, we develop a computationally efficient algorithm based on the majorization-minimization framework. In this algorithm, the RIS phase shift matrices and the timing offset equalizer are obtained in semi-closed form and closed form, respectively. Numerical examples validate the improved performance of our proposed design compared with various schemes.

[137] 2107.12719

Multi-modal estimation of the properties of containers and their content: survey and evaluation

Acoustic and visual sensing can support the contactless estimation of the weight of a container and the amount of its content when the container is manipulated by a person. However, transparencies (both of the container and of the content) and the variability of materials, shapes, and sizes make this problem challenging. In this paper, we present an open benchmarking framework and an in-depth comparative analysis of recent methods that estimate the capacity of a container, as well as the type, mass, and amount of its content. These methods estimate the type and amount of the content from acoustic data, using learned and handcrafted features (such as mel-frequency cepstrum coefficients, zero-crossing rate, and spectrograms) with different types of classifiers. They determine the capacity of the container from visual data using geometric approaches. Results on a newly distributed dataset show that audio alone is a strong modality: methods achieve weighted average F1-scores of up to 81% and 97% for content type and level classification, respectively. Estimating the container capacity with vision-only approaches and the filling mass with multi-modal, multi-stage algorithms reaches up to 65% weighted average capacity and mass scores.
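Of the handcrafted audio features listed, the zero-crossing rate is the simplest to state. It is the fraction of consecutive sample pairs whose signs differ, usually computed per frame. Below is a minimal NumPy sketch; the frame length and hop defaults are hypothetical.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs with differing signs.

    Uses the sign bit, so samples equal to +0.0 count as non-negative.
    """
    signs = np.signbit(frame)
    return np.count_nonzero(signs[1:] != signs[:-1]) / (len(frame) - 1)


def framewise_zcr(signal, frame_len=1024, hop=512):
    """Zero-crossing rate of each full frame of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([zero_crossing_rate(signal[s:s + frame_len]) for s in starts])
```

Noisy, high-frequency sounds such as pouring tend to produce high rates, while tonal or silent frames produce low ones. This per-frame sequence is the kind of input a classifier for content type or level might consume.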

[138] 2107.12720

Efficient Parallel Graph Trimming by Arc-Consistency

Given a large data graph, trimming techniques can reduce the search space by removing vertices without outgoing edges. One application is to speed up the parallel decomposition of graphs into strongly connected components (SCC decomposition), which is a fundamental step in analyzing graphs. We observe that graph trimming is essentially a kind of arc-consistency problem, and that AC-3, AC-4, and AC-6 are the arc-consistency algorithms most relevant to graph trimming. Existing parallel graph trimming methods require worst-case $\mathcal O(nm)$ time and worst-case $\mathcal O(n)$ space for graphs with $n$ vertices and $m$ edges. We call these methods parallel AC-3-based because they closely resemble the AC-3 algorithm. In this work, we propose AC-4-based and AC-6-based trimming methods. AC-4-based trimming has an improved worst-case time of $\mathcal O(n+m)$ but requires worst-case space of $\mathcal O(n+m)$. AC-6-based trimming has the same worst-case time of $\mathcal O(n+m)$ as AC-4-based trimming but an improved worst-case space of $\mathcal O(n)$. We parallelize the AC-4-based and AC-6-based algorithms for shared-memory multi-core machines and design them to minimize synchronization overhead. For these algorithms, we also prove correctness and analyze time complexities with the work-depth model. In experiments, we compare the three parallel trimming algorithms over a variety of real and synthetic graphs. Specifically, in terms of the maximum number of edges traversed per worker with 16 workers, AC-3-based trimming traverses up to 58.3 and 36.5 times more edges than AC-6-based and AC-4-based trimming, respectively.
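To make the AC-4 analogy concrete, here is a minimal sequential sketch of counter-based trimming; it is not the authors' parallel algorithm. Each vertex keeps a count of outgoing edges to surviving vertices. A reverse adjacency list, which is the source of the $\mathcal O(n+m)$ space, lets each removal decrement its predecessors' counters, so every edge is touched a constant number of times. The function name and the edge-list input format are assumptions.

```python
from collections import deque


def trim(n, edges):
    """Iteratively remove vertices with no outgoing edges to surviving vertices.

    n: number of vertices labelled 0..n-1; edges: iterable of (u, v) arcs.
    Returns the surviving vertices in increasing order.
    """
    out_deg = [0] * n                 # live out-degree counter per vertex
    preds = [[] for _ in range(n)]    # reverse adjacency: O(n + m) space
    for u, v in edges:
        out_deg[u] += 1
        preds[v].append(u)

    removed = [False] * n
    queue = deque()
    for v in range(n):
        if out_deg[v] == 0:
            removed[v] = True
            queue.append(v)

    # Each removal decrements its predecessors' counters once per arc.
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if not removed[u]:
                out_deg[u] -= 1
                if out_deg[u] == 0:
                    removed[u] = True
                    queue.append(u)
    return [v for v in range(n) if not removed[v]]
```

In the example below, the chain 2 → 3 → 4 is peeled off from the sink, while vertex 5 survives because it still has an edge into the cycle 0 → 1 → 2 → 0.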

[139] 2107.12732

Towards Black-box Attacks on Deep Learning Apps

Deep learning is a powerful tool for boosting application performance in many fields, including face recognition, object detection, image classification, natural language understanding, and recommendation systems. With the rapid increase in the computing power of mobile devices, developers can embed deep learning models into their apps to build more competitive products with more accurate and faster responses. Several works have studied adversarial attacks against deep learning models in mobile apps, but they all need information about the models' internals (i.e., structures and weights) or need to modify the models. In this paper, we propose an effective black-box approach that trains a substitute model to spoof the deep learning system inside the apps. To evaluate our approach, we select 10 popular real-world deep learning apps from Google Play and perform black-box adversarial attacks on them. Through the study, we find three factors that can influence attack performance. Our approach reaches a relatively high average attack success rate of 66.60%. Compared with other adversarial attacks on mobile deep learning models, our approach outperforms its counterparts by 27.63% in terms of average attack success rate.

[140] 2107.12734

ENHANCE (ENriching Health data by ANnotations of Crowd and Experts): A case study for skin lesion classification

We present ENHANCE, an open dataset with multiple annotations to complement the existing ISIC and PH2 skin lesion classification datasets. This dataset contains annotations of visual ABC (asymmetry, border, colour) features from non-expert annotation sources: undergraduate students, crowd workers from Amazon MTurk, and classic image processing algorithms. In this paper, we first analyse the correlations between the annotations and the diagnostic label of the lesion, and we study the agreement between different annotation sources. Overall, we find weak correlations of non-expert annotations with the diagnostic label, and low agreement between different annotation sources. We then study multi-task learning (MTL) with the annotations as additional labels, and show that non-expert annotations can improve (ensembles of) state-of-the-art convolutional neural networks via MTL. We hope that our dataset can be used in further research into multiple annotations and/or MTL. All data and models are available on GitHub:

[141] 2107.12740

Edge service resource allocation strategy based on intelligent prediction

Artificial intelligence is one of the important technologies for industrial applications, but it requires substantial computing resources and sensor data to support it. With the development of edge computing and the Internet of Things, artificial intelligence is playing an increasingly important role in the field of edge services. Therefore, using intelligent algorithms to provide better services and to support the development of the Internet of Things has become an increasingly important topic. This paper focuses on edge service distribution and proposes an edge service distribution strategy based on intelligent prediction, which reduces the bandwidth consumption and minimizes the cost of edge service providers. In addition, this article uses real data provided by the Wangsu Technology Company and an improved long short-term memory (LSTM) prediction method to dynamically change the bandwidth. This achieves better resource allocation than actual industrial applications. The simulation results show that our intelligent prediction achieves good results and that the mechanism achieves higher resource utilization.

[142] 2107.12744

Real-Time Activity Recognition and Intention Recognition Using a Vision-based Embedded System

With the rapid growth of digital technologies, human activity recognition and intention recognition have become part of many fields of study and are important in smart environments. In this research, we introduce a real-time activity recognition system that recognizes people's intention to pass or not pass through a door. If applied in elevators and automatic doors, this system will save energy and increase efficiency. For this study, data preparation is applied to combine spatial and temporal features with the help of digital image processing principles. Unlike previous studies, we use a single AlexNet neural network instead of two-stream convolutional neural networks. Our embedded system achieved an accuracy of 98.78% on our Intention Recognition dataset. We also examined our data representation approach on other datasets, including HMDB-51, KTH, and Weizmann, and obtained accuracies of 78.48%, 97.95%, and 100%, respectively. The image recognition and neural network models were simulated and implemented using Xilinx simulators for the ZCU102 board. The operating frequency of this embedded system is 333 MHz, and it works in real time at 120 frames per second (fps).

[143] 2107.12746

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework

Localizing individuals in crowds is more in accordance with the practical demands of subsequent high-level crowd analysis tasks than simply counting. However, existing localization-based methods that rely on intermediate representations (\textit{i.e.}, density maps or pseudo boxes) as learning targets are counter-intuitive and error-prone. In this paper, we propose a purely point-based framework for joint crowd counting and individual localization. Instead of merely reporting the absolute counting error at the image level, we propose a new metric for this framework, called density Normalized Average Precision (nAP), to provide a more comprehensive and precise performance evaluation. Moreover, we design an intuitive solution under this framework, called the Point to Point Network (P2PNet). P2PNet discards superfluous steps and directly predicts a set of point proposals to represent heads in an image, consistent with the human annotation results. Through thorough analysis, we reveal that the key step in implementing this idea is to assign optimal learning targets to these proposals. Therefore, we propose to conduct this crucial association in a one-to-one matching manner using the Hungarian algorithm. P2PNet not only significantly surpasses state-of-the-art methods on popular counting benchmarks but also achieves promising localization accuracy. The code will be available at:
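Below is a sketch of the one-to-one association step using SciPy's `scipy.optimize.linear_sum_assignment`. Recent SciPy versions implement a Jonker-Volgenant variant rather than the textbook Hungarian algorithm, but it returns the same kind of optimal assignment. The cost (a weighted Euclidean distance minus the proposal's confidence score) and the weight `tau` are assumptions for illustration, not P2PNet's confirmed settings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_points(gt_points, proposals, scores, tau=0.05):
    """One-to-one assignment of annotated head points to point proposals.

    Returns {gt_index: proposal_index}; unmatched proposals act as negatives.
    """
    gt = np.asarray(gt_points, dtype=float)        # (M, 2)
    props = np.asarray(proposals, dtype=float)     # (N, 2), N >= M
    conf = np.asarray(scores, dtype=float)         # (N,)
    # Pairwise Euclidean distances between every GT point and every proposal.
    dist = np.linalg.norm(gt[:, None, :] - props[None, :, :], axis=-1)
    # Assumed cost: close and confident proposals are cheap to match.
    cost = tau * dist - conf[None, :]
    rows, cols = linear_sum_assignment(cost)       # optimal one-to-one assignment
    return dict(zip(rows.tolist(), cols.tolist()))
```

Because the assignment is one-to-one, a far-away proposal with a high score (the third one in the example below) is left unmatched. This is the behaviour that lets each annotated head supervise exactly one proposal.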

[144] 2107.12751

Bibliometric Profile of Nursing Research in Ex Yugoslavian Countries

The development of modern nursing, and consequently of nursing research, in ex-Yugoslavia is about a century old. To profile the development, volume, and content of nursing research, we completed a performance and spatial bibliometric analysis combined with synthetic content analysis. This identified the most productive countries and institutions, the most prolific source titles, country cooperation, publication production trends, the content of research, and hot topics. The corpus was harvested from Web of Science All Databases and contained 1380 papers. Slovenia was the most productive country, followed by Croatia and Serbia. The synthetic content analysis demonstrated that nursing research in ex-Yugoslavian countries is growing both in scope and in number of publications. However, research content differs between countries, and each country seems to focus on its local health problems. A substantial part of the research is published in national journals in national languages; however, it is noteworthy that some ex-Yugoslavian authors have succeeded in publishing their research in top nursing journals. The study also revealed substantial international cooperation, especially among ex-Yugoslavian countries and with the European Union.

[145] 2107.12753

Discriminative-Generative Representation Learning for One-Class Anomaly Detection

As a kind of generative self-supervised learning method, generative adversarial nets have been widely studied in the field of anomaly detection. However, the representation learning ability of the generator is limited because it pays too much attention to pixel-level details. In addition, the generator has difficulty learning abstract semantic representations from label-prediction pretext tasks as effectively as the discriminator does. To improve the representation learning ability of the generator, we propose a self-supervised learning framework that combines generative and discriminative methods. The generator no longer learns representations from reconstruction error but from the guidance of the discriminator, and it can benefit from pretext tasks designed for discriminative methods. Our discriminative-generative representation learning method has performance close to that of discriminative methods and a great advantage in speed. Applied to the one-class anomaly detection task, our method significantly outperforms several state-of-the-art methods on multiple benchmark data sets. It improves on the top-performing GAN-based baseline by 6% on CIFAR-10 and 2% on MVTAD.

[146] 2107.12762

Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign Language Recognition

Continuous sign language recognition (cSLR) is a task of public significance that transcribes a sign language video into an ordered gloss sequence. It is important to capture fine-grained gloss-level details, since there is no explicit alignment between sign video frames and the corresponding glosses. Among past works, one promising approach is to adopt a one-dimensional convolutional network (1D-CNN) to temporally fuse the sequential frames. However, CNNs are agnostic to similarity or dissimilarity and thus are unable to capture locally consistent semantics within temporally neighboring frames. To address this issue, we propose to adaptively fuse local features via temporal similarity for this task. Specifically, we devise a Multi-scale Local-Temporal Similarity Fusion Network (mLTSF-Net) as follows: 1) For a specific video frame, we first select its similar neighbours with multi-scale receptive regions to accommodate glosses of different lengths. 2) To ensure temporal consistency, we then use position-aware convolution to temporally convolve each scale of selected frames. 3) To obtain a local-temporally enhanced frame-wise representation, we finally fuse the results of different scales using a content-dependent aggregator. We train our model end to end, and the experimental results on the RWTH-PHOENIX-Weather 2014 dataset (RWTH) demonstrate that our model achieves competitive performance compared with several state-of-the-art models.
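The first step of mLTSF-Net, selecting similar neighbours at several scales, can be approximated as below. This is a simplified sketch, not the paper's network. It assumes per-frame feature vectors, uses cosine similarity within a window of `radius` frames, and keeps the top `k` frames in temporal order. It also replaces the position-aware convolution and content-dependent aggregator with plain averaging. The `scales` values are hypothetical.

```python
import numpy as np


def select_similar_neighbours(feats, radius, k):
    """For each frame t, the k most cosine-similar frames within t +/- radius.

    feats: (T, C) array of non-zero per-frame features. Indices are returned
    sorted in temporal order so that a later temporal convolution sees ordered input.
    """
    T = feats.shape[0]
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    selected = []
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        window = np.arange(lo, hi)
        order = np.argsort(-sim[t, lo:hi], kind="stable")[:k]
        selected.append(np.sort(window[order]))
    return selected


def fuse_multiscale(feats, scales=((2, 3), (4, 5))):
    """Average the selected neighbours per scale, then average across scales."""
    per_scale = []
    for radius, k in scales:
        idx = select_similar_neighbours(feats, radius, k)
        per_scale.append(np.stack([feats[i].mean(axis=0) for i in idx]))
    return np.mean(per_scale, axis=0)
```

At a boundary between two glosses, similarity-based selection keeps a frame's neighbours within its own gloss. A fixed 1D convolution window would instead mix frames from both glosses, which is the shortcoming the abstract describes.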

[147] 2107.12765

Resource Optimization with Interference Coupling in Multi-IRS-assisted Multi-cell Systems

Deploying intelligent reflecting surfaces (IRSs) to enhance wireless transmission is a promising approach. In this paper, we investigate large-scale multi-IRS-assisted multi-cell systems, where multiple IRSs are deployed in each cell. Unlike in the full-buffer scenario, the mutual interference in our system is not known a priori, and for this reason we apply the load coupling model to analyze the system. The objective is to minimize the total resource consumption, subject to user demand requirements, by optimizing the reflection coefficients in the cells. The cells are highly coupled, and the overall problem is non-convex. To tackle this, we first investigate the single-cell case with given interference and propose a low-complexity algorithm based on the Majorization-Minimization (MM) method to obtain a locally optimal solution. Then, we embed this algorithm into an algorithmic framework for the overall multi-cell problem, and prove its feasibility and convergence to a solution that is at least locally optimal. Simulation results demonstrate the benefit of IRSs for time-frequency resource utilization in the multi-cell system.

[148] 2107.12766

Beyond 5G: Big Data Processing for Better Spectrum Utilization

This article emphasizes the great potential of big data processing for advanced user- and situation-oriented, that is, context-aware, resource utilization in future wireless networks. In particular, we consider the application of dedicated, detailed, and content-rich maps and records called Radio Service Maps (RSM) for unlocking spectrum opportunities in 6G networks. Due to the characteristics of 5G, there will be a future need for high convergence of various types of wireless networks, such as cellular and Internet-of-Things (IoT) networks. These networks are steadily growing and are therefore the use case studied in this work. We show that the 6G network benefits significantly from effective Dynamic Spectrum Management (DSM) based on RSM, which provides rich and accurate knowledge of the radio context. This knowledge is stored and processed within database-oriented subsystems designed to support wireless networks in improving spectral efficiency. In this article, we discuss context-aware RSM subsystem architecture and operation for DSM in convergent 6G radio and IoT networks. By providing various use cases, we demonstrate that the accurate definition of, and access to, rich context information leads to a significant improvement in system performance. Consequently, we also claim that efficient big-data processing algorithms will be necessary for future applications.

[149] 2107.12770

Comparing Prophet and Deep Learning to ARIMA in Forecasting Wholesale Food Prices

Setting sale prices correctly is of great importance for firms, and the study and forecasting of price time series is therefore a relevant topic not only from a data science perspective but also from an economic and applied one. In this paper, we examine different techniques to forecast the sale prices that an Italian food wholesaler applies to three food products. This is a step towards automating pricing tasks usually handled by human workers. We consider ARIMA models and compare them to Prophet, a scalable forecasting tool developed by Facebook and based on a generalized additive model, and to deep learning models based on Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs). ARIMA models are frequently used in econometric analyses, providing a good benchmark for the problem under study. Our results indicate that ARIMA performs similarly to LSTM neural networks for this problem, while the combination of CNNs and LSTMs attains the best overall accuracy but requires more time to tune. Conversely, Prophet is very fast to use but less accurate.

[150] 2107.12772

Collaborative Software Modeling in Virtual Reality

Modeling is a key activity in conceptual design and system design. Through collaborative modeling, end-users, stakeholders, experts, and entrepreneurs are able to create a shared understanding of a system representation. The Unified Modeling Language (UML) is one of the major conceptual modeling languages in object-oriented software engineering, yet concerns about the modeling quality of UML and its tool support are growing. Among these concerns, the limitations of the two-dimensional presentation of its notations and the lack of natural collaborative modeling tools are reported to be significant. In this paper, we explore the potential of using Virtual Reality (VR) technology for collaborative UML software design by comparing it with classical collaborative software design on conventional devices (desktop PC, laptop). For this purpose, we have developed a VR modeling environment that offers a natural collaborative modeling experience for UML Class Diagrams. Based on a user study with 24 participants, we compared collaborative VR modeling with conventional modeling with regard to efficiency, effectiveness, and user satisfaction. Results show that VR has some disadvantages concerning efficiency and effectiveness, but it increased users' enjoyment, the feeling of being in the same room with a remote collaborator, and the naturalness of collaboration.

[151] 2107.12778

Assessing the performance of smart grid communication networks under both time and budget constraints

The smart grid concept has emerged to address the existing problems of the traditional electric grid, which has been functioning for more than a hundred years. The most crucial difference between traditional grids and smart grids is the communication infrastructure used by the latter. However, coupling between these networks can increase the risk of significant failures. Hence, assessing the performance of smart grid communication networks is of great importance and is the focus of this work. As transmission time and cost play essential roles in many real-world communication networks, both time and budget constraints are considered. To evaluate the performance of communication networks, we assume that the data are transmitted from a source to a destination through a single path. We propose an algorithm that computes the exact probability of transmitting d units of data from the source to the destination within T units of time and a budget of b. The algorithm is illustrated through a benchmark network example, and complexity results are also provided. A fairly large benchmark, the Pan-European topology, along with one thousand randomly generated test problems, is used to generate experimental results that clearly show the superiority of our proposed algorithm over an existing algorithm in the literature.
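To make the quantity concrete, here is a minimal brute-force sketch of the probability that d units reach the destination within time T and budget b. It is not the authors' algorithm. It assumes a quickest-path style model: each arc has a hypothetical lead time, per-unit cost and independent discrete capacity distribution; sending d units along a path takes its total lead time plus ceil(d / bottleneck capacity) and costs d times the sum of its unit costs. Enumerating every joint capacity state is exponential in the number of arcs, which is why a dedicated algorithm matters on networks such as the Pan-European topology.

```python
from itertools import product
from math import ceil


def transmission_probability(arcs, paths, d, T, b):
    """Exact P(some path sends d units within time T and budget b), by enumeration.

    arcs:  hypothetical dict name -> (lead_time, unit_cost, {capacity: probability}),
           with arc capacities assumed statistically independent.
    paths: list of candidate paths, each a list of arc names.
    """
    names = list(arcs)
    state_lists = [list(arcs[n][2].items()) for n in names]
    total = 0.0
    for combo in product(*state_lists):  # every joint capacity state
        caps = {n: cap for n, (cap, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        for path in paths:
            cap = min(caps[a] for a in path)  # bottleneck capacity of the path
            if cap == 0:
                continue
            time = sum(arcs[a][0] for a in path) + ceil(d / cap)  # lead time + transfer time
            cost = d * sum(arcs[a][1] for a in path)
            if time <= T and cost <= b:
                total += prob  # this state succeeds; count it only once
                break
    return total


arcs = {
    "a": (1, 1.0, {0: 0.1, 5: 0.9}),   # hypothetical arc: lead time 1, unit cost 1.0
    "b": (2, 0.5, {0: 0.2, 10: 0.8}),  # hypothetical arc: lead time 2, unit cost 0.5
}
print(transmission_probability(arcs, [["a"], ["b"]], d=10, T=5, b=20))  # ≈ 0.98
```

Here either single-arc path succeeds whenever its arc is up, so the result is 1 − 0.1·0.2; tightening the budget to 6 rules out path "a" and leaves only the probability that arc "b" is up.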

[152] 2107.12780

Physics-constrained Deep Learning for Robust Inverse ECG Modeling

The rapid developments in advanced sensing and imaging bring about a data-rich environment, facilitating the effective modeling, monitoring, and control of complex systems. For example, the body-sensor network captures multi-channel information pertinent to the electrical activity of the heart (i.e., electrocardiograms (ECG)), which enables medical scientists to monitor and detect abnormal cardiac conditions. However, high-dimensional sensing data are generally complex in structure, and realizing their full potential depends to a great extent on advanced analytical and predictive methods. This paper presents a physics-constrained deep learning (P-DL) framework for high-dimensional inverse ECG modeling. This method integrates the physical laws of the complex system with advanced deep learning infrastructure for effective prediction of the system dynamics. The proposed P-DL approach is implemented to solve the inverse ECG model and predict the time-varying distribution of electric potentials in the heart from the ECG data measured by the body-surface sensor network. Experimental results show that the proposed P-DL method significantly outperforms existing methods that are commonly used in current practice.

[153] 2107.12788

On the data persistency of replicated erasure codes in distributed storage systems

This paper studies the fundamental problem of data persistency for a general family of redundancy schemes in distributed storage systems, called replicated erasure codes. Specifically, we analyze two strategies for distributing replicated erasure codes: random and symmetric. For both strategies we derive closed-form analytical and asymptotic formulas for the expected data persistency despite node failures.

[154] 2107.12791

Clickbait Detection in YouTube Videos

YouTube videos often include captivating descriptions and intriguing thumbnails designed to increase the number of views, and thereby increase the revenue for the person who posted the video. This creates an incentive for people to post clickbait videos, in which the content might deviate significantly from the title, description, or thumbnail. In effect, users are tricked into clicking on clickbait videos. In this research, we consider the challenging problem of detecting clickbait YouTube videos. We experiment with multiple state-of-the-art machine learning techniques using a variety of textual features.

[155] 2107.12794

Short-Term Electricity Price Forecasting based on Graph Convolution Network and Attention Mechanism

In electricity markets, locational marginal price (LMP) forecasting is particularly important for market participants in making reasonable bidding strategies, managing potential trading risks, and supporting efficient system planning and operation. Unlike existing methods that only consider LMPs' temporal features, this paper tailors a spectral graph convolutional network (GCN) to greatly improve the accuracy of short-term LMP forecasting. A three-branch network structure is then designed to match the structure of LMPs' compositions. Such a network can extract the spatial-temporal features of LMPs and provide fast, high-quality predictions for all nodes simultaneously. The attention mechanism is also implemented to assign varying importance weights to different nodes and time slots. Case studies based on the IEEE-118 test system and real-world data from PJM validate that the proposed model outperforms existing forecasting models in accuracy and maintains robust performance by avoiding extreme errors.

[156] 2107.12800

Deep Reinforcement Learning for L3 Slice Localization in Sarcopenia Assessment

Sarcopenia is a medical condition characterized by a reduction in muscle mass and function. A quantitative diagnostic technique consists of localizing the CT slice passing through the middle of the third lumbar vertebra (L3) and segmenting muscles at this level. In this paper, we propose a deep reinforcement learning method for accurate localization of the L3 CT slice. Our method trains a reinforcement learning agent by incentivizing it to discover the right position. Specifically, a Deep Q-Network is trained to find the best policy to follow for this problem. Visualizing the training process shows that the agent mimics the scrolling of an experienced radiologist. Extensive experiments against other state-of-the-art deep learning based methods for L3 localization demonstrate the superiority of our technique, which performs well even with a limited amount of data and annotations.
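A toy illustration of the reward idea, under heavy assumptions: tabular Q-learning stands in for the paper's Deep Q-Network, the state is only the current slice index (no image content), and the agent receives +1 for scrolling closer to a hypothetical target slice and -1 otherwise. The reward design, slice count and hyperparameters are assumptions for illustration, not the authors' formulation.

```python
import random


def train_slice_agent(n_slices=20, target=12, episodes=500, alpha=0.5,
                      gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; actions: 0 = scroll up (index - 1), 1 = scroll down (index + 1)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_slices)]
    for _ in range(episodes):
        s = rng.randrange(n_slices)  # random starting slice
        for _ in range(2 * n_slices):
            if s == target:
                break
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                a = 0 if q[s][0] >= q[s][1] else 1  # exploit
            s2 = min(max(s + (1 if a else -1), 0), n_slices - 1)
            # hypothetical reward: +1 for getting closer to the target slice, -1 otherwise
            r = 1.0 if abs(s2 - target) < abs(s - target) else -1.0
            future = 0.0 if s2 == target else max(q[s2])  # target slice is terminal
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s2
    return q


def greedy_localize(q, start, target, max_steps=50):
    """Follow the learned greedy policy and return the number of scroll steps used."""
    s, steps = start, 0
    while s != target and steps < max_steps:
        s = min(max(s + (1 if q[s][1] > q[s][0] else -1), 0), len(q) - 1)
        steps += 1
    return steps


q = train_slice_agent()
print([greedy_localize(q, s, 12) for s in (0, 19)])
```

A learned policy that always scrolls towards the target needs exactly |start − target| steps; the DQN in the paper must instead infer this direction from the CT image itself.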

[157] 2107.12801

Robust Optimization Framework for Training Shallow Neural Networks Using Reachability Method

In this paper, a robust optimization framework is developed to train shallow neural networks based on reachability analysis of neural networks. To characterize noise in the input data, the input training data are perturbed and described as interval sets. Interval-based reachability analysis is then performed for the hidden layer. Using the reachability analysis results, a robust training method is developed in the framework of robust least-squares problems. The resulting robust least-squares problem is then relaxed to a semidefinite programming problem. The developed robust learning method is shown to provide better robustness against perturbations at the price of some loss of training accuracy. Finally, the proposed method is evaluated on a robot arm model learning example.
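The interval reachability step for one hidden layer can be sketched as follows, assuming the perturbed inputs lie in a box [lo, hi] and the activation is non-decreasing (tanh or ReLU). For each neuron, a positive weight takes the lower input bound to the lower pre-activation bound and a negative weight swaps them; monotonicity then carries the bounds through the activation. Function names are hypothetical, and the robust least-squares training and its semidefinite relaxation are not shown.

```python
from math import tanh


def interval_affine(W, b, lo, hi):
    """Exact interval bounds of W x + b for x in the box [lo, hi]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, xl, xh in zip(row, lo, hi):
            if w >= 0:  # positive weight: bounds keep their order
                l += w * xl
                h += w * xh
            else:       # negative weight: lower and upper bounds swap
                l += w * xh
                h += w * xl
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def hidden_layer_reach(W, b, lo, hi, act=tanh):
    """Interval enclosing act(W x + b); valid for any non-decreasing activation."""
    l, h = interval_affine(W, b, lo, hi)
    return [act(v) for v in l], [act(v) for v in h]


def relu(v):
    return max(0.0, v)


print(hidden_layer_reach([[1.0, -2.0], [0.5, 0.5]], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0], relu))
# ([0.0, 0.5], [1.0, 2.0])
```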

[158] 2107.12806

Towards Industrial Private AI: A two-tier framework for data and model security

With the advances in 5G and IoT devices, industries are widely adopting artificial intelligence (AI) techniques to improve classification- and prediction-based services. However, the use of AI also raises data privacy and security concerns, as data can be misused or leaked. Private AI was recently coined to address the data security issue by combining AI with encryption techniques, but existing studies have shown that model inversion attacks can be used to reverse-engineer images from model parameters. In this regard, we propose a federated learning and encryption-based private (FLEP) AI framework that provides two-tier security for data and model parameters in an IIoT environment. We propose a three-layer encryption method for data security and provide a hypothetical method to secure the model parameters. Experimental results show that the proposed method achieves better encryption quality at the expense of slightly increased execution time. We also highlight several open issues and challenges regarding the realization of the FLEP AI framework.

[159] 2107.12807

HPTMT: Operator-Based Architecture for Scalable High-Performance Data-Intensive Frameworks

Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors, graphs, and tables. Our key concepts are inspired by systems like MPI, HPF (High-Performance Fortran), NumPy, Pandas, Spark, Modin, PyTorch, TensorFlow, RAPIDS (NVIDIA), and OneAPI (Intel). Further, it is crucial to support the different languages in everyday use in the Big Data arena, including Python, R, C++, and Java. We note the importance of Apache Arrow and Parquet for enabling language-agnostic high performance and interoperability. In this paper, we propose High-Performance Tensors, Matrices and Tables (HPTMT), an operator-based architecture for data-intensive applications, and identify the fundamental principles needed for performance and usability success. We illustrate these principles by discussing examples using our software environments, Cylon and Twister2, which embody HPTMT.

[160] 2107.12808

Open-Ended Learning Leads to Generally Capable Agents

In this work we create agents that can perform well beyond a single, individual task, that exhibit much wider generalisation of behaviour to a massive, rich space of challenges. We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are generally capable across this vast space and beyond. The environment is natively multi-agent, spanning the continuum of competitive, cooperative, and independent games, which are situated within procedurally generated physical 3D worlds. The resulting space is exceptionally diverse in terms of the challenges posed to agents, and as such, even measuring the learning progress of an agent is an open research problem. We propose an iterative notion of improvement between successive generations of agents, rather than seeking to maximise a singular objective, allowing us to quantify progress despite tasks being incomparable in terms of achievable rewards. We show that through constructing an open-ended learning process, which dynamically changes the training task distributions and training objectives such that the agent never stops learning, we achieve consistent learning of new behaviours. The resulting agent is able to score reward in every one of our humanly solvable evaluation levels, with behaviour generalising to many held-out points in the universe of tasks. Examples of this zero-shot generalisation include good performance on Hide and Seek, Capture the Flag, and Tag. Through analysis and hand-authored probe tasks we characterise the behaviour of our agent, and find interesting emergent heuristic behaviours such as trial-and-error experimentation, simple tool use, option switching, and cooperation. Finally, we demonstrate that the general capabilities of this agent could unlock larger scale transfer of behaviour through cheap finetuning.

[161] 2107.12809

Bayesian Optimisation for Sequential Experimental Design with Applications in Additive Manufacturing

Bayesian optimization (BO) is an approach to globally optimizing black-box objective functions that are expensive to evaluate. BO-powered experimental design has found wide application in materials science, chemistry, experimental physics, drug development, etc. This work aims to bring attention to the benefits of applying BO in designing experiments and to provide a BO manual, covering both methodology and software, for the convenience of anyone who wants to apply or learn BO. In particular, we briefly explain the BO technique, review all the applications of BO in additive manufacturing, compare and exemplify the features of different open BO libraries, and unlock new potential applications of BO to other types of data (e.g., preferential output). This article is aimed at readers with some understanding of Bayesian methods, but not necessarily with knowledge of additive manufacturing; the software performance overview and implementation instructions are instrumental for any experimental-design practitioner. Moreover, our review in the field of additive manufacturing highlights the current knowledge and technological trends of BO.
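As a minimal illustration of the BO loop (not code from any of the reviewed libraries), the sketch below fits a zero-mean Gaussian-process surrogate with a squared-exponential kernel and chooses each new experiment by maximising expected improvement (EI) over a candidate grid on [0, 1]. The kernel length scale, noise level, initial design and the quadratic test objective are assumptions chosen for illustration; real BO libraries also fit the kernel hyperparameters, which is omitted here.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior(X, Y, x, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at point x."""
    K = [[rbf(a, c) + (noise if i == j else 0.0) for j, c in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x) for a in X]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, Y)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mu, sigma, best):
    """EI for minimisation: E[max(best - f(x), 0)] when f(x) ~ N(mu, sigma^2)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(f, n_iter=10):
    """Minimise a 1-D black-box f on [0, 1] using a fixed candidate grid."""
    grid = [i / 200 for i in range(201)]
    X = [0.0, 0.5, 1.0]  # hypothetical initial design
    Y = [f(x) for x in X]
    for _ in range(n_iter):
        best = min(Y)
        candidates = [g for g in grid if g not in X]
        x_next = max(candidates,
                     key=lambda g: expected_improvement(*gp_posterior(X, Y, g), best))
        X.append(x_next)
        Y.append(f(x_next))  # one "expensive" experiment per iteration
    i = min(range(len(Y)), key=Y.__getitem__)
    return X[i], Y[i]


print(bayes_opt(lambda x: (x - 0.3) ** 2))
```

EI balances exploitation (low posterior mean) against exploration (high posterior uncertainty), which is what lets BO spend only a handful of costly experiments.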

[162] 2107.12815

Adaptive Denoising via GainTuning

Deep convolutional neural networks (CNNs) for image denoising are usually trained on large datasets. These models achieve the current state of the art, but they have difficulties generalizing when applied to data that deviate from the training distribution. Recent work has shown that it is possible to train denoisers on a single noisy image. These models adapt to the features of the test image, but their performance is limited by the small amount of information used to train them. Here we propose "GainTuning", in which CNN models pre-trained on large datasets are adaptively and selectively adjusted for individual test images. To avoid overfitting, GainTuning optimizes a single multiplicative scaling parameter (the "Gain") of each channel in the convolutional layers of the CNN. We show that GainTuning improves state-of-the-art CNNs on standard image-denoising benchmarks, boosting their denoising performance on nearly every image in a held-out test set. These adaptive improvements are even more substantial for test images differing systematically from the training data, either in noise level or image type. We illustrate the potential of adaptive denoising in a scientific application, in which a CNN is trained on synthetic data, and tested on real transmission-electron-microscope images. In contrast to the existing methodology, GainTuning is able to faithfully reconstruct the structure of catalytic nanoparticles from these data at extremely low signal-to-noise ratios.

[163] 2107.12816

Dynamic Power and Frequency Allocation Scheme for Autonomous Platooning

In this paper, we consider the use of radio environment maps (REMs) in vehicular dynamic spectrum access (VDSA) for vehicle platooning applications. We propose an algorithm that dynamically allocates the frequency bands and transmission power in the so-called TV white spaces (TVWS) for intra-platoon messaging, intending to maximize the reliability of the communications, simultaneously keeping the interference to the primary system below the required threshold. The proposed solution is evaluated in simulations, with the results indicating a significant increase in communications reliability with VDSA.

[164] 2107.12824

A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

Although high-performance deep neural networks are in high demand in edge environments, computation resources in edge devices are strictly limited, and lightweight neural network techniques, such as Depthwise Separable Convolution (DSC), have been developed. ResNet is one of the conventional deep neural network models that stack many layers and parameters for higher accuracy. To reduce the parameter size of ResNet, Neural ODE exploits the similarity between ResNet and an ODE (Ordinary Differential Equation) and repeatedly uses most of its weight parameters instead of having many different parameters. Thus, Neural ODE becomes significantly smaller than ResNet, so that it can be implemented on resource-limited edge devices. In this paper, a combination of Neural ODE and DSC, called dsODENet, is designed and implemented for FPGAs (Field-Programmable Gate Arrays). dsODENet is then applied to edge domain adaptation as a practical use case and evaluated with image classification datasets. It is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, training speed, FPGA resource utilization, and speedup compared to software execution. The results demonstrate that dsODENet is comparable to or slightly better than our baseline Neural ODE implementation in terms of domain adaptation accuracy, while the total parameter size without pre- and post-processing layers is reduced by 54.2% to 79.8%. The FPGA implementation executes the prediction tasks 27.9 times faster than a software implementation.
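The two sources of parameter savings mentioned above can be illustrated with simple arithmetic. A k×k standard convolution from C_in to C_out channels needs k²·C_in·C_out weights, whereas DSC needs k²·C_in (one depthwise filter per input channel) plus C_in·C_out (the 1×1 pointwise convolution), a ratio of 1/C_out + 1/k². Separately, an ODE block reuses one weight set where a ResNet stack keeps one per block. The channel and block counts below are hypothetical, biases are ignored, and the figures are not meant to reproduce the paper's reported reduction for the full network.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (bias omitted)."""
    return k * k * c_in * c_out


def dsc_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def stacked_params(per_block, n_blocks, shared):
    """ResNet-style stacking keeps n_blocks weight sets; an ODE block reuses one."""
    return per_block if shared else per_block * n_blocks


std = standard_conv_params(3, 64, 64)  # 3*3*64*64 = 36864
dsc = dsc_params(3, 64, 64)            # 576 + 4096 = 4672
print(std, dsc, round(dsc / std, 3))   # 36864 4672 0.127
print(stacked_params(std, 8, shared=False), stacked_params(dsc, 8, shared=True))  # 294912 4672
```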

[165] 2107.12825

Individual Survival Curves with Conditional Normalizing Flows

Survival analysis, or time-to-event modelling, is a classical statistical problem that has garnered a lot of interest for its practical use in epidemiology, demographics or actuarial sciences. Recent advances on the subject from the point of view of machine learning have been concerned with precise per-individual predictions instead of population studies, driven by the rise of individualized medicine. We introduce here a conditional normalizing flow based estimate of the time-to-event density as a way to model highly flexible and individualized conditional survival distributions. We use a novel hierarchical formulation of normalizing flows to enable efficient fitting of flexible conditional distributions without overfitting and show how the normalizing flow formulation can be efficiently adapted to the censored setting. We experimentally validate the proposed approach on a synthetic dataset as well as four open medical datasets and an example of a common financial problem.
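How censoring enters a flow-based likelihood can be sketched with the simplest possible flow: a single affine map on log t, which gives a log-normal time-to-event model. Uncensored records contribute the log density (base log density plus the log-Jacobian of the flow), and right-censored records contribute the log survival probability. In the paper a hierarchical conditional flow would produce the transformation for each individual; here mu and sigma are hypothetical fixed parameters and no covariates are used.

```python
import math


def censored_log_likelihood(times, events, mu, sigma):
    """Right-censored log-likelihood for T = exp(mu + sigma * Z), Z ~ N(0, 1).

    Observed events contribute log f(t); censored records contribute log S(t).
    """
    ll = 0.0
    for t, observed in zip(times, events):
        z = (math.log(t) - mu) / sigma  # inverse flow: time t -> base variable z
        if observed:
            # log N(z; 0, 1) plus the log-Jacobian log|dz/dt| = -log(sigma * t)
            ll += -0.5 * z * z - math.log(math.sqrt(2.0 * math.pi) * sigma * t)
        else:
            # log survival: log P(T > t) = log(1 - Phi(z))
            ll += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return ll


print(censored_log_likelihood([1.0, 1.0], [True, False], mu=0.0, sigma=1.0))  # ≈ -1.6121
```

Using `erfc` for the survival term avoids the cancellation that `1 - Phi(z)` would suffer for large z.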

[166] 2107.12826

Adversarial Stacked Auto-Encoders for Fair Representation Learning

Training machine learning models with accuracy as the only goal may promote prejudices and discriminatory behaviors embedded in the data. One solution is to learn latent representations that fulfill specific fairness metrics. Different types of learning methods are employed to map data into a fair representational space. The main purpose is to learn a latent representation of data that scores well on a fairness metric while maintaining its usability for the downstream task. In this paper, we propose a new fair representation learning approach that leverages different levels of representation of data to tighten the fairness bounds of the learned representation. Our results show that stacking different auto-encoders and enforcing fairness at different latent spaces improves fairness compared to other existing approaches.

[167] 2107.12829

Conflict-Free Four-Dimensional Path Planning for Urban Air Mobility Considering Airspace Occupations

Urban air mobility (UAM) has attracted the attention of aircraft manufacturers, air navigation service providers and governments in recent years. Preventing conflicts among urban aircraft is crucial to UAM traffic safety, which is key to enabling large-scale UAM operation. Pre-flight conflict-free path planning can provide a strategic layer in maintaining safety performance and thus becomes an important element of UAM. This paper aims to tackle the conflict-free path planning problem for UAM operation with consideration of four-dimensional airspace management. First, we introduced and extended a four-dimensional airspace management concept, AirMatrix. On the basis of AirMatrix, we formulated the shortest-flight-time path planning problem considering resolution of conflicts with both static and dynamic obstacles. A Conflict-Free A-Star algorithm was developed for planning four-dimensional paths based on a first-come-first-served scheme. The algorithm contains a novel heuristic function design as well as a conflict detection and resolution strategy. A numerical experiment was carried out in the Jurong East area of Singapore, and the results show that the algorithm can generate paths resolving a significant number of potential conflicts in airspace utilization, with acceptable computational time and flight delay. The contributions of this study provide references for stakeholders to support the development of UAM.
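A minimal sketch of first-come-first-served conflict-free planning in space-time, under simplifying assumptions: a 2D grid of airspace cells (altitude dropped from the four-dimensional AirMatrix), unit-time moves or hovering, and a plain Manhattan-distance heuristic instead of the paper's novel heuristic. Earlier flights reserve their (cell, time) slots and moves; later flights plan around them, avoiding both same-cell conflicts and head-on swaps. Aircraft are assumed to leave the airspace on arrival, and a later flight's start cell is not checked against reservations. All names are hypothetical.

```python
import heapq

MOVES = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))  # hover in place, or move one cell


def plan(grid, start, goal, t0, reserved_cells, reserved_moves, horizon=100):
    """Space-time A* avoiding static obstacles and earlier flights' reservations."""
    rows, cols = len(grid), len(grid[0])

    def h(p):  # Manhattan distance: admissible for unit-time moves
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(t0 + h(start), t0, start)]
    parent = {(start, t0): None}
    while frontier:
        _, t, pos = heapq.heappop(frontier)
        if pos == goal:  # rebuild the (cell, time) sequence
            path, node = [], (pos, t)
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if t - t0 >= horizon:
            continue
        for dr, dc in MOVES:
            nxt = (pos[0] + dr, pos[1] + dc)
            state = (nxt, t + 1)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue  # outside the area or static obstacle
            if state in reserved_cells or (nxt, pos, t) in reserved_moves:
                continue  # cell taken at t + 1, or head-on swap with an earlier flight
            if state in parent:
                continue  # every path to a state has the same cost t, so keep the first
            parent[state] = (pos, t)
            heapq.heappush(frontier, (t + 1 + h(nxt), t + 1, nxt))
    return None


def plan_fcfs(grid, requests):
    """First-come-first-served: plan each (start, goal, t0) request, then reserve its path."""
    reserved_cells, reserved_moves, paths = set(), set(), []
    for start, goal, t0 in requests:
        path = plan(grid, start, goal, t0, reserved_cells, reserved_moves)
        if path is not None:
            reserved_cells.update(path)
            reserved_moves.update((a, b, t) for (a, t), (b, _) in zip(path, path[1:]))
        paths.append(path)
    return paths


grid = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]  # 0 = free airspace cell, 1 = static obstacle
for p in plan_fcfs(grid, [((0, 0), (0, 2), 0), ((0, 2), (0, 0), 0)]):
    print(p)
```

In this example the second flight cannot pass head-on through the first flight's corridor, so it detours through the row below and arrives at t = 4 instead of t = 2, a small instance of the flight delay traded for conflict resolution.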

[168] 2107.12831

Study Addressing the Context of Fake News in Portuguese-Speaking Countries (Fake News)

This work is a study that addresses the context of false news in today's world. "Fake news" is a widely used expression nowadays. During the study, general problems related to this theme were identified, such as how widely false news spreads and the impact it has on society. From these problems, more specific ones were identified, such as the origin of the news, the news source, the person who shares and/or creates the news, and the interpersonal relationships involved. Based on these sub-problems, a taxonomic model was developed with the aim of implementing a tool that helps detect false news, identifying whether a news item is true, false, or one the user should be careful with (when it is not possible to determine whether it is true or false). The implemented tool calculates the probability of a news item being false based on the options selected for each parameter. It was also possible to verify that the calculated probability was correct and that the tool was assessed in the study carried out.

[169] 2107.12833

Development of a NIC driver in C#

Drivers have a special status in the developer community, which sees them as mysterious and inaccessible. We think their extensive communication with the hardware and their need for high performance are the cause of this bad reputation. According to a widely held view, these two requirements cannot be met using high-level languages. However, compilers and runtimes for high-level languages have made great progress in recent years in improving program performance. The use of these languages can also significantly reduce the number of bugs and security issues introduced by programmers by taking care of error-prone parts such as memory allocation and memory accesses. We also think that using high-level languages can help to demystify driver development. With this project, we try to develop a driver for a network card, the Intel 82599, in C#. Our goal is to find out the feasibility of such a development and the performance of such a driver. We will also be able to tell what could be missing today in C# to write a driver. We base our driver on the model proposed by Pirelli (2020) and its implementation in C.

[170] 2107.12841

A finite element method for simulating soft active non-shearable rods immersed in generalized Newtonian fluids

We propose a finite element method for simulating one-dimensional solid models moving and experiencing large deformations while immersed in generalized Newtonian fluids. The method is oriented towards applications involving microscopic devices or organisms in the soft-bio-matter realm. By considering that the strain energy of the solid may explicitly depend on time, we incorporate a mechanism for active response. The solids are modeled as Cosserat rods, a detailed formulation being provided for the special case of a planar non-shearable rod. The discretization adopts one-dimensional Hermite elements for the rod and low-order Lagrange two-dimensional elements for the fluid's velocity and pressure. The fluid mesh is boundary-fitted, with remeshing at each time step. Several time marching schemes are studied, of which a semi-implicit scheme emerges as most effective. The method is demonstrated in very challenging examples: the roll-up of a rod to circular shape and later sudden release, the interaction of a soft rod with a fluid jet and the active self-locomotion of a sperm-like rod. The article includes a detailed description of a code that implements the method in the Firedrake library.

[171] 2107.12842

Technical Report: Quality Assessment Tool for Machine Learning with Clinical CT

Image Quality Assessment (IQA) is important for scientific inquiry, especially in medical imaging and machine learning. Potential data quality issues can be exacerbated when human-based workflows use limited views of the data that may obscure digital artifacts. In practice, multiple factors such as network issues, accelerated acquisitions, motion artifacts, and imaging protocol design can impede the interpretation of image collections. The medical image processing community has developed a wide variety of tools for the inspection and validation of imaging data. Yet, IQA of computed tomography (CT) remains an under-recognized challenge, and no user-friendly tool is commonly available to address these potential issues. Here, we create and illustrate a pipeline specifically designed to identify and resolve issues encountered with large-scale data mining of clinically acquired CT data. Using the widely studied National Lung Screening Trial (NLST), we have identified approximately 4% of image volumes with quality concerns out of 17,392 scans. To assess robustness, we applied the proposed pipeline to our internal datasets where we find our tool is generalizable to clinically acquired medical images. In conclusion, the tool has been useful and time-saving for research study of clinical data, and the code and tutorials are publicly available at

[172] 2107.12845

A Storytelling Robot managing Persuasive and Ethical Stances via ACT-R: an Exploratory Study

We present a storytelling robot, controlled via the ACT-R cognitive architecture, able to adopt different persuasive techniques and ethical stances while conversing about some topics concerning COVID-19. The main contribution of the paper is a needs-driven model that guides and evaluates, during the dialogue, the use (if any) of persuasive techniques available in the agent's procedural memory. The portfolio of persuasive techniques tested in this model ranges from storytelling to framing techniques and rhetoric-based arguments. To the best of our knowledge, this represents the first attempt to build a persuasive agent able to integrate a mix of explicitly grounded cognitive assumptions about dialogue management, storytelling and persuasive techniques as well as ethical attitudes. The paper presents the results of an exploratory evaluation of the system with 63 participants.

[173] 2107.12846

Higher-order sliding mode observer design for linear time-invariant multivariable systems based on a new observer normal form

In various applications in the field of control engineering the estimation of the state variables of dynamic systems in the presence of unknown inputs plays an important role. Existing methods require the so-called observer matching condition to be satisfied, rely on the boundedness of the state variables or exhibit an increased observer order of twice the plant order. In this article, a novel observer normal form for strongly observable linear time-invariant multivariable systems is proposed. In contrast to classical normal forms, the proposed approach also takes the unknown inputs into account. The proposed observer normal form allows for the straightforward construction of a higher-order sliding mode observer, which ensures global convergence of the estimation error within finite time even in the presence of unknown bounded inputs. Its application is not restricted to systems which satisfy the aforementioned limitations of already existing unknown input observers. The proposed approach can be exploited for the reconstruction of unknown inputs with bounded derivative and robust state-feedback control, which is shown by means of a tutorial example. Numerical simulations confirm the effectiveness of the presented work.

[174] 2107.12847

Learning Local Recurrent Models for Human Mesh Recovery

We consider the problem of estimating frame-level full human body meshes given a video of a person with natural motion dynamics. While much progress in this field has been in single image-based mesh estimation, there has been a recent uptick in efforts to infer mesh dynamics from video given its role in alleviating issues such as depth ambiguity and occlusions. However, a key limitation of existing work is the assumption that all the observed motion dynamics can be modeled using one dynamical/recurrent model. While this may work well in cases with relatively simplistic dynamics, inference with in-the-wild videos presents many challenges. In particular, it is typically the case that different body parts of a person undergo different dynamics in the video, e.g., legs may move in a way that may be dynamically different from hands (e.g., a person dancing). To address these issues, we present a new method for video mesh recovery that divides the human mesh into several local parts following the standard skeletal model. We then model the dynamics of each local part with separate recurrent models, with each model conditioned appropriately based on the known kinematic structure of the human body. This results in a structure-informed local recurrent learning architecture that can be trained in an end-to-end fashion with available annotations. We conduct a variety of experiments on standard video mesh recovery benchmark datasets such as Human3.6M, MPI-INF-3DHP, and 3DPW, demonstrating the efficacy of our design of modeling local dynamics as well as establishing state-of-the-art results based on standard evaluation metrics.

[175] 2107.12850

Guidelines on Minimum Standards for Developer Verification of Software

Executive Order (EO) 14028, "Improving the Nation's Cybersecurity," 12 May 2021, directs the National Institute of Standards and Technology (NIST) to recommend minimum standards for software testing within 60 days. This document describes eleven recommendations for software verification techniques and provides supplemental information about the techniques as well as references for further information. It recommends the following techniques: threat modeling to look for design-level security issues; automated testing for consistency and to minimize human effort; static code scanning to look for top bugs; heuristic tools to look for possible hardcoded secrets; use of built-in checks and protections; "black box" test cases; code-based structural test cases; historical test cases; fuzzing; web app scanners, if applicable; and addressing included code (libraries, packages, services). The document does not address the totality of software verification, but instead recommends techniques that are broadly applicable and form the minimum standards. The document was developed by NIST in consultation with the National Security Agency. Additionally, we received input from numerous outside organizations through papers submitted to a NIST workshop on the Executive Order held in early June 2021, through discussion at the workshop, and through follow-up with several of the submitters.

[176] 2107.12851

Task and Situation Structures for Service Agent Planning

Everyday tasks are characterized by their varieties and variations, and frequently are not clearly specified to service agents. This paper presents a comprehensive approach to enable a service agent to deal with everyday tasks in open, uncontrolled environments. We introduce a generic structure for representing tasks, and another structure for representing situations. Based on the two newly introduced structures, we present a methodology of situation handling that avoids hard-coding domain rules while improving the scalability of real-world task planning systems.

[177] 2107.12852

Real-time Keypoints Detection for Autonomous Recovery of the Unmanned Ground Vehicle

The combination of a small unmanned ground vehicle (UGV) and a large unmanned carrier vehicle allows more flexibility in real applications such as rescue in dangerous scenarios. The autonomous recovery system, which is used to guide the small UGV back to the carrier vehicle, is an essential component for a seamless combination of the two vehicles. This paper proposes a novel autonomous recovery framework with a low-cost monocular vision system to provide accurate position and attitude estimation of the UGV during navigation. First, we introduce a lightweight convolutional neural network called UGV-KPNet to detect the keypoints of the small UGV in images captured by a monocular camera. UGV-KPNet is computationally efficient with a small number of parameters and provides pixel-level accurate keypoint detection results in real time. Then, the six-degree-of-freedom (6-DoF) pose is estimated from the detected keypoints to obtain position and attitude information of the UGV. In addition, we are the first to create a large-scale real-world keypoints dataset of the UGV. The experimental results demonstrate that the proposed system achieves state-of-the-art performance in terms of both accuracy and speed on UGV keypoint detection, and can further boost the 6-DoF pose estimation for the UGV.

[178] 2107.12854

Audio-to-Score Alignment Using Deep Automatic Music Transcription

Audio-to-score alignment (A2SA) is a multimodal task that consists of aligning audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame level. In this work, we aim to elaborate on the use of AMT Deep Learning (DL) models to achieve alignment at the note level. We propose a method that combines HMM-based score-to-score alignment with AMT and shows a remarkable advance beyond the state of the art. We design a systematic procedure to take advantage of large datasets that do not offer aligned scores. Finally, we perform a thorough comparison and extensive tests on multiple datasets.

[179] 2107.12855

Neural Network Branch-and-Bound for Neural Network Verification

Many available formal verification methods have been shown to be instances of a unified Branch-and-Bound (BaB) formulation. We propose a novel machine learning framework that can be used both for designing an effective branching strategy and for computing better lower bounds. Specifically, we learn two graph neural networks (GNNs) that both directly treat the network we want to verify as a graph input and perform forward-backward passes through the GNN layers. We use one GNN to simulate the strong branching heuristic behaviour and another to compute a feasible dual solution of the convex relaxation, thereby providing a valid lower bound. We provide a new verification dataset that is more challenging than those used in the literature, offering an effective alternative for testing algorithmic improvements for verification. Whilst using just one of the GNNs reduces verification time, we obtain the best performance when combining the two GNN approaches. Our combined framework achieves a 50\% reduction in both the number of branches and the time required for verification on various convolutional networks when compared to several state-of-the-art verification methods. In addition, we show that our GNN models generalize well to harder properties on larger unseen networks.

[180] 2107.12858

Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network

Recent deep networks have convincingly demonstrated high capability in crowd counting, a critical task that attracts widespread attention due to its various industrial applications. Despite such progress, trained data-dependent models usually cannot generalize well to unseen scenarios because of the inherent domain shift. To address this issue, this paper proposes a novel adversarial scoring network (ASNet) that gradually bridges the gap across domains from coarse to fine granularity. Specifically, at the coarse-grained stage, we design a dual-discriminator strategy that uses adversarial learning to bring the source domain closer to the target domain from the perspectives of both the global and local feature spaces. The distributions of the two domains can thus be roughly aligned. At the fine-grained stage, we explore the transferability of source characteristics by scoring, at multiple levels, how similar the source samples are to the target ones, based on the generative probability derived from the coarse stage. Guided by these hierarchical scores, transferable source features are selected to enhance knowledge transfer during adaptation. With the coarse-to-fine design, the generalization bottleneck induced by the domain discrepancy can be effectively alleviated. Three sets of domain-transfer experiments show that the proposed method achieves state-of-the-art counting performance compared with major unsupervised methods.

[181] 2107.12859

RGL-NET: A Recurrent Graph Learning framework for Progressive Part Assembly

Autonomous assembly of objects is an essential task in robotics and 3D computer vision. It has been studied extensively in robotics as a problem of motion planning, actuator control and obstacle avoidance. However, the task of developing a generalized assembly framework that is robust to structural variants remains relatively unexplored. In this work, we tackle this problem with a recurrent graph learning framework that considers inter-part relations and progressively updates the part poses. Our network can learn more plausible predictions of shape structure by accounting for previously assembled parts. Compared to the current state of the art, our network yields up to 10% improvement in part accuracy and up to 15% improvement in connectivity accuracy on the PartNet dataset. Moreover, the resulting latent space enables applications such as shape recovery from point-cloud components. We conduct extensive experiments to justify our design choices and demonstrate the effectiveness of the proposed framework.

[182] 2107.12866

Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach

Online harassment in the form of hate speech has been on the rise in recent years. Addressing the issue requires a combination of content moderation by people, aided by automatic detection methods. As content moderation is itself harmful to the people doing it, we aim to reduce this burden by improving the automatic detection of hate speech. Hate speech is challenging because it is directed at different target groups using completely different vocabularies. Further, the authors of hate speech are incentivized to disguise their behavior to avoid being removed from a platform. This makes it difficult to develop a comprehensive data set for training and evaluating hate speech detection models, because the examples that represent one hate speech domain do not typically represent others, even within the same language or culture. We propose an unsupervised domain adaptation approach to augment labeled data for hate speech detection. We evaluate the approach with three different models (character CNNs, BiLSTMs and BERT) on three different collections. We show that our approach improves the area under the precision/recall curve by as much as 42% and recall by as much as 278%, with no loss (and in some cases a significant gain) in precision.

[183] 2107.12867

From Library Portability to Para-rehosting: Natively Executing Microcontroller Software on Commodity Hardware

Finding bugs in microcontroller (MCU) firmware is challenging, even for device manufacturers who own the source code. MCUs use instruction sets different from x86 and expose a very different development environment, which renders many sophisticated x86 software testing tools inapplicable. To maintain a unified development and testing environment, a straightforward approach is to recompile the source code into a native executable for a commodity machine (called rehosting). However, ad-hoc rehosting is daunting, tedious, and subject to many issues (library dependence, kernel dependence and hardware dependence). In this work, we systematically explore the portability problem of MCU software and propose para-rehosting to ease the porting process. Specifically, we abstract and implement a portable MCU (PMCU) using the POSIX interface, which models common functions of the MCU cores. For peripheral-specific logic, we propose HAL-based peripheral function replacement, in which high-level hardware functions are replaced with an equivalent backend driver on the host. These backend drivers are invoked by well-designed para-APIs and can be reused across many MCU OSs. We categorize common HAL functions into four types and implement templates for quick backend development. Using the proposed approach, we have successfully rehosted nine MCU OSs, including the widely deployed Amazon FreeRTOS, ARM Mbed OS, Zephyr and LiteOS. To demonstrate the superiority of our approach for security testing, we ran off-the-shelf dynamic analysis tools (AFL and ASAN) against the rehosted programs and discovered 28 previously unknown bugs, of which 5 were assigned CVEs and another 19 were confirmed by vendors at the time of writing.

[184] 2107.12871

Model Free Barrier Functions via Implicit Evading Maneuvers

This paper demonstrates that in some cases the safety override arising from the use of a barrier function can be needlessly restrictive. In particular, we examine the case of fixed-wing collision avoidance and show that, when using a barrier function, there are cases where two fixed-wing aircraft come closer to colliding than if there were no barrier function at all. In addition, we construct cases where the barrier function labels the system as unsafe even when the vehicles start arbitrarily far apart. In other words, the barrier function ensures safety but at an unnecessary cost to performance. We therefore introduce model-free barrier functions, which take a data-driven approach to creating a barrier function. We demonstrate the effectiveness of model-free barrier functions in a collision avoidance simulation of two fixed-wing aircraft.
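The sketch below illustrates what a barrier-function "safety override" looks like. It is the standard model-based filter that the abstract critiques. It is not the paper's model-free method and not its fixed-wing model. It assumes a hypothetical scalar system dx/dt = u that must keep x above x_min, with barrier h(x) = x - x_min and a hypothetical gain alpha. The override replaces a nominal input with the closest input that satisfies dh/dt >= -alpha * h(x).

```python
def barrier_filter(x: float, u_nominal: float, x_min: float = 0.0, alpha: float = 1.0) -> float:
    """Minimal safety override for the assumed scalar system dx/dt = u.

    Barrier: h(x) = x - x_min. The condition dh/dt >= -alpha * h(x) becomes
    u >= -alpha * (x - x_min). The input closest to u_nominal that satisfies it
    (a one-dimensional quadratic program) is simply a clamp from below.
    """
    h = x - x_min
    return max(u_nominal, -alpha * h)


def simulate(x0: float, u_nominal: float, steps: int = 1000, dt: float = 0.01) -> list[float]:
    """Forward-Euler rollout of the filtered closed loop with a constant nominal input."""
    xs = [x0]
    for _ in range(steps):
        u = barrier_filter(xs[-1], u_nominal)
        xs.append(xs[-1] + dt * u)
    return xs
```

Under these assumptions, the filter leaves a nominal input unchanged when it already satisfies the barrier condition and clamps it otherwise. That clamp is the override; in richer multi-vehicle models, the same mechanism is what the paper argues can be needlessly restrictive.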

[185] 2107.12873

PDF-Malware: An Overview on Threats, Detection and Evasion Attacks

In recent years, the Portable Document Format, commonly known as PDF, has become a democratized standard for document exchange and dissemination. This trend is due to characteristics such as its flexibility and portability across platforms. The widespread use of PDF has instilled a false impression of inherent safety among benign users. However, these same characteristics have motivated attackers to exploit various types of vulnerabilities and overcome security safeguards, making the PDF format one of the most efficient attack vectors for malicious code. Therefore, efficiently detecting malicious PDF files is crucial for information security. Several analysis techniques, both static and dynamic, have been proposed in the literature to extract the main features that distinguish malware files from benign ones. Since classical analysis techniques may be limited against zero-day attacks, machine-learning-based techniques have recently emerged as automatic PDF-malware detection methods that can generalize from a set of training samples. These techniques themselves face the challenge of evasion attacks, in which a malicious PDF is transformed to look benign. In this work, we give an overview of the PDF-malware detection problem and a perspective on its new challenges and emerging solutions.

[186] 2107.12877

Efficient TBox Reasoning with Value Restrictions using the $\mathcal{FL}_{o}$wer reasoner

The inexpressive Description Logic (DL) $\mathcal{FL}_0$, which has conjunction and value restriction as its only concept constructors, had fallen into disrepute when it turned out that reasoning in $\mathcal{FL}_0$ w.r.t. general TBoxes is ExpTime-complete, i.e., as hard as in the considerably more expressive logic $\mathcal{ALC}$. In this paper, we rehabilitate $\mathcal{FL}_0$ by presenting a dedicated subsumption algorithm for $\mathcal{FL}_0$, which is much simpler than the tableau-based algorithms employed by highly optimized DL reasoners. Our experiments show that the performance of our novel algorithm, as prototypically implemented in our $\mathcal{FL}_o$wer reasoner, compares very well with that of the highly optimized reasoners. $\mathcal{FL}_o$wer can also deal with ontologies written in the extension $\mathcal{FL}_{\bot}$ of $\mathcal{FL}_0$ with the top and bottom concepts by employing a polynomial-time reduction, shown in this paper, which eliminates top and bottom. We also investigate the complexity of reasoning in DLs related to the Horn fragments of $\mathcal{FL}_0$ and $\mathcal{FL}_{\bot}$.

[187] 2107.12879

Consumer belonging behaviour: Qualitative testing of a theoretical framework and proposal of an alternative model

Much research has been conducted on how consumption is related to human relations, for example, consumer communities organized around specific brands, or the way people use products to define their own identity and transmit a desired image. However, little research has examined consumption behaviour whose fundamental intention is to leverage group belonging. The literature contains a single theoretical framework that describes this behaviour, a nascent proposition that has not been tested. This study examines the transferability of that framework to a context different from the one used for its proposal, and how far it extends to the phenomenon of consuming to leverage belonging. A qualitative deductive case study and a pattern-matching analysis technique were employed, followed by a structural coding analysis of interview data. The findings revealed that the model is transferable; however, its conceptual coverage of the phenomenon it addresses is limited. These findings allow the proposal of an alternative framework, the Belonging-Oriented Consumption Model, which provides a theoretical basis for future research on consumer belonging behaviour.

[188] 2107.12884

SimCleaner -- Database Standardization System Using Similarity Functions

The Knowledge Discovery in Databases (KDD) process enables the detection of patterns in databases, but this analysis may be compromised if the database is inconsistent, which makes data cleaning techniques necessary. This paper presents a tool based on similarity functions to support database preprocessing. The tool performed efficiently in standardizing a database of the Public Security System of the State of Pará, and it may be reused with other databases and data mining projects.
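The abstract does not say which similarity functions SimCleaner uses. The sketch below is a minimal, hypothetical illustration of similarity-based standardization. It uses a normalized Levenshtein similarity (an assumption) and maps each raw value to its most similar canonical value when the similarity reaches a threshold. The function names, the 0.75 threshold, and the sample place names are all made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1] after case folding and trimming whitespace."""
    a, b = a.casefold().strip(), b.casefold().strip()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def standardize(values, canonical, threshold=0.75):
    """Hypothetical cleaner: replace each value with its closest canonical form if close enough."""
    result = []
    for v in values:
        best = max(canonical, key=lambda c: similarity(v, c))
        result.append(best if similarity(v, best) >= threshold else v)
    return result
```

For example, `standardize(["Belem", "belém ", "Anannindeua", "Xyz"], ["Belém", "Ananindeua"])` maps the first three variants to their canonical spellings. It leaves `"Xyz"` untouched because no canonical value is similar enough.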

[189] 2107.12886

Parameter-uniform numerical methods for singularly perturbed linear transport problems

Pointwise accurate numerical methods are constructed and analysed for three classes of singularly perturbed first order transport problems. The methods involve piecewise-uniform Shishkin meshes and the numerical approximations are shown to be parameter-uniformly convergent in the maximum norm. A transport problem from the modelling of fluid-particle interaction is formulated and used as a test problem for these numerical methods. Numerical results are presented to illustrate the performance of the numerical methods and to confirm the theoretical error bounds established in the paper.
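The abstract names piecewise-uniform Shishkin meshes without defining them. The sketch below assumes a model problem with a single boundary layer at x = 0 and a lower bound beta > 0 on the convection coefficient. It also assumes the common transition point sigma = min(1/2, (eps/beta) ln N), with half of the N intervals placed uniformly inside the layer [0, sigma]. The three problem classes in the paper may use different transition points or layer locations.

```python
import math


def shishkin_mesh(n: int, eps: float, beta: float) -> list[float]:
    """Piecewise-uniform Shishkin mesh on [0, 1] for an assumed layer at x = 0.

    n must be even: n/2 equal intervals resolve the layer region [0, sigma],
    and n/2 equal intervals cover the remaining region [sigma, 1].
    """
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    # Transition point: shrinks with eps, capped at 1/2 when eps is not small.
    sigma = min(0.5, eps / beta * math.log(n))
    half = n // 2
    fine = [sigma * i / half for i in range(half + 1)]                     # 0 .. sigma
    coarse = [sigma + (1 - sigma) * i / half for i in range(1, half + 1)]  # sigma .. 1
    return fine + coarse
```

When eps is large relative to beta / ln N, sigma is capped at 1/2 and the mesh becomes uniform. As eps shrinks, the fine half of the mesh concentrates inside the layer, which is what makes parameter-uniform convergence possible.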

[190] 2107.12895

Emotion Recognition under Consideration of the Emotion Component Process Model

Emotion classification in text is typically performed with neural network models that learn to associate linguistic units with emotions. While this often leads to good predictive performance, it helps only to a limited degree in understanding how emotions are communicated in various domains. The emotion component process model (CPM) by Scherer (2005) is an interesting approach to explaining emotion communication. It states that emotions are a coordinated process of various subcomponents in reaction to an event, namely the subjective feeling, the cognitive appraisal, the expression, a physiological bodily reaction, and a motivational action tendency. We hypothesize that these components are associated with linguistic realizations: an emotion can be expressed by describing a physiological bodily reaction ("he was trembling"), or the expression ("she smiled"), etc. We annotate existing literature and Twitter emotion corpora with emotion component classes and find that emotions on Twitter are predominantly expressed by event descriptions or subjective reports of the feeling, while in literature, authors prefer to describe what characters do and leave the interpretation to the reader. We further include the CPM in a multitask learning model and find that this supports emotion categorization. The annotated corpora are available at

[191] 2107.12898

StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement

Image enhancement is a subjective process whose targets vary with user preferences. In this paper, we propose StarEnhancer, a deep learning-based image enhancement method that covers multiple tonal styles with only a single model. It can transform an image from one tonal style to another, even if that style is unseen. With a simple one-time setup, users can customize the model so that the enhanced images better match their aesthetics. To make the method more practical, we propose a well-designed enhancer that can process 4K-resolution images at over 200 FPS while surpassing contemporaneous single-style image enhancement methods in terms of PSNR, SSIM, and LPIPS. Finally, our proposed enhancement method offers good interactivity, allowing the user to fine-tune the enhanced image using intuitive options.

[192] 2107.12900

Projector Pixel Redirection Using Phase-Only Spatial Light Modulator

In projection mapping from a projector to a non-planar surface, the pixel density on the surface becomes uneven. This causes the critical problem of local spatial resolution degradation. We confirmed that the pixel density uniformity on the surface was improved by redirecting projected rays using a phase-only spatial light modulator.

[193] 2107.12902

Logical Characterization of Coherent Uninterpreted Programs

An uninterpreted program (UP) is a program whose semantics is defined over the theory of uninterpreted functions. This is a common abstraction used in equivalence checking, compiler optimization, and program verification. While simple, the model is sufficiently powerful to encode counter automata, and hence its verification is undecidable. Recently, a class of UPs, called coherent, has been proposed and shown to be decidable. We provide an alternative, logical characterization of this result. Specifically, we show that every coherent program is bisimilar to a finite-state system. Moreover, an inductive invariant of a coherent program is representable by a formula whose terms are of depth at most 1. We also show that the original proof, via automata, only applies to programs over unary uninterpreted functions. While this work is purely theoretical, it suggests a novel abstraction that is complete for coherent programs but can be soundly used on arbitrary uninterpreted (and partially interpreted) programs.

[194] 2107.12908

Digital Collections of Examples in Mathematical Sciences

Some areas of Computer Algebra (notably Computational Group Theory and Computational Number Theory) have good databases of examples, typically of the form "all the X up to size n". But most other areas, especially on the polynomial side, lack such databases, despite the utility these have demonstrated in the related fields of SAT and SMT solving. We claim that the field would be enhanced by such community-maintained databases, rather than each author hand-selecting a few examples, which are often too large or error-prone to print, and therefore difficult for subsequent authors to reproduce.

[195] 2107.12909

So You Want to Analyze Scheme Programs With Datalog?

Static analysis approximates the results of a program by examining only its syntax. For example, control-flow analysis (CFA) determines which syntactic lambdas (for functional languages) or methods (for object-oriented languages) may be invoked at each call site within a program. Rich theoretical results exist on control-flow analysis for Scheme-like languages, but implementations are often complex and specialized. By contrast, object-oriented languages (Java in particular) enjoy high-precision control-flow analyses that scale to thousands (or more) of lines of code. State-of-the-art implementations (such as DOOP on Soufflé) structure the analysis using Horn-SAT (Datalog) so that it can be compiled to efficient implementations such as high-performance relational algebra kernels. In this paper, we present an implementation of control-flow analysis for a significant subset of Scheme (including set!, call/cc, and primitive operations) using the Soufflé Datalog engine. We present an evaluation on a worst-case term demonstrating the polynomial complexity of our m-CFA and remark on scalability results using Soufflé.

[196] 2107.12910

Sparse Bayesian Deep Learning for Dynamic System Identification

This paper proposes a sparse Bayesian treatment of deep neural networks (DNNs) for system identification. Although DNNs show impressive approximation ability in various fields, several challenges remain for system identification problems. First, DNNs are known to be so complex that they can easily overfit the training data. Second, the selection of the input regressors for system identification is nontrivial. Third, uncertainty quantification of the model parameters and predictions is necessary. The proposed Bayesian approach offers a principled way to alleviate these challenges through marginal likelihood (model evidence) approximation and the construction of structured group-sparsity-inducing priors. The identification algorithm is derived as an iterative regularized optimization procedure that can be solved as efficiently as training typical DNNs. Furthermore, a practical calculation approach based on Monte Carlo integration is derived to quantify the uncertainty of the parameters and predictions. The effectiveness of the proposed Bayesian approach is demonstrated on several linear and nonlinear system identification benchmarks, where it achieves good and competitive simulation accuracy.
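The sketch below shows generic Monte Carlo uncertainty quantification, not the paper's exact derivation. It draws parameter samples from an approximate posterior, pushes each sample through the model, and summarizes the resulting predictions. It assumes a diagonal Gaussian posterior and uses a toy linear model as a stand-in for a trained network. Both assumptions and the names `mc_predict` and `linear_model` are hypothetical.

```python
import random
from statistics import fmean, pstdev


def mc_predict(model, x, w_mean, w_std, n_samples=5000, seed=0):
    """Monte Carlo estimate of the predictive mean and standard deviation.

    Each parameter is drawn independently from N(w_mean[i], w_std[i] ** 2), an
    assumed diagonal Gaussian approximation of the posterior. model(w, x) must
    return a float.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    preds = []
    for _ in range(n_samples):
        w = [rng.gauss(m, s) for m, s in zip(w_mean, w_std)]
        preds.append(model(w, x))
    return fmean(preds), pstdev(preds)


def linear_model(w, x):
    """Toy stand-in for a trained network: y = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

For the linear toy model, the exact predictive mean and variance are known in closed form. This makes it easy to check the sampler; for a real DNN, only the `model` callable would change.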

[197] 2107.12912

AToM: Active Topology Monitoring for the Bitcoin Peer-to-Peer Network

Over the past decade, the Bitcoin P2P network protocol has become a reference model for all modern cryptocurrencies. While the nodes in this network are known, the connections among them are kept hidden, as it is commonly believed that this helps protect against deanonymization and low-level attacks. However, adversaries can bypass this limitation by inferring connections through side channels. At the same time, the lack of topology information hinders the analysis of the network, which is essential to improve efficiency and security. In this paper, we thoroughly review network-level attacks and empirically show that topology obfuscation is not an effective countermeasure. We then argue that the benefits of an open topology potentially outweigh its risks, and propose a protocol to reliably infer and monitor connections among reachable nodes of the Bitcoin network. We formally analyze our protocol and experimentally evaluate its accuracy in both trusted and untrusted settings. Results show that our system has a low impact on the network and that its precision and recall exceed 90% with up to 20% malicious nodes in the network.
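The sketch below clarifies the reported metrics for topology inference. Precision is the fraction of inferred connections that really exist. Recall is the fraction of real connections that were inferred. It assumes connections are undirected, so (a, b) and (b, a) count as the same edge. The function names and node labels are hypothetical, and the sketch is not the paper's evaluation code.

```python
def _undirected(edges):
    """Normalize (u, v) pairs so that (a, b) and (b, a) denote the same connection."""
    return {frozenset(e) for e in edges}


def edge_precision_recall(inferred, actual):
    """Precision and recall of inferred peer connections against the ground-truth topology."""
    inferred_set, actual_set = _undirected(inferred), _undirected(actual)
    true_pos = len(inferred_set & actual_set)
    precision = true_pos / len(inferred_set) if inferred_set else 0.0
    recall = true_pos / len(actual_set) if actual_set else 0.0
    return precision, recall
```

For example, consider inferring the edges a-b, b-c, and a-c when the true topology is the 4-cycle a-b-c-d. Two of the three inferred edges are correct, so precision is 2/3. Two of the four true edges are found, so recall is 1/2.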

[198] 2107.12917

Experiments on Properties of Hidden Structures of Sparse Neural Networks

Sparsity in the structure of neural networks can lead to lower energy consumption, lower memory usage, faster computation on suitable hardware, and automated machine learning. If sparsity gives rise to certain kinds of structure, it can explain features obtained automatically during learning. We report experiments showing how sparsity can be achieved through prior initialization, through pruning, and during learning, and we answer questions on the relationship between the structure of neural networks and their performance. This includes the first work on inducing priors from network theory into recurrent neural networks and an architectural performance prediction during a neural architecture search. In our experiments, we show that magnitude class-blinded pruning achieves 97.5% on MNIST with 80% compression and retraining, 0.5 points more than without compression; that magnitude class-uniform pruning is significantly inferior to it; and that a genetic search enhanced with performance prediction achieves 82.4% on CIFAR10. Further, performance prediction for recurrent networks learning the Reber grammar achieves an $R^2$ of up to 0.81 given only structural information.
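The abstract does not define the two pruning variants. The sketch below assumes the usual reading, in which a "class" is a group of weights such as one layer. Class-blinded pruning removes the globally smallest-magnitude weights regardless of layer. Class-uniform pruning removes the same fraction of smallest-magnitude weights within each layer. The layer names and weights are hypothetical, and plain lists stand in for weight tensors.

```python
def class_blind_prune(layers: dict[str, list[float]], fraction: float) -> dict[str, list[float]]:
    """Zero the `fraction` of weights with the smallest magnitude across ALL layers."""
    flat = [(abs(w), name, i) for name, ws in layers.items() for i, w in enumerate(ws)]
    k = int(fraction * len(flat))
    to_prune = {(name, i) for _, name, i in sorted(flat)[:k]}
    return {
        name: [0.0 if (name, i) in to_prune else w for i, w in enumerate(ws)]
        for name, ws in layers.items()
    }


def class_uniform_prune(layers: dict[str, list[float]], fraction: float) -> dict[str, list[float]]:
    """Zero the `fraction` of smallest-magnitude weights separately within each layer."""
    pruned = {}
    for name, ws in layers.items():
        k = int(fraction * len(ws))
        smallest = set(sorted(range(len(ws)), key=lambda i: abs(ws[i]))[:k])
        pruned[name] = [0.0 if i in smallest else w for i, w in enumerate(ws)]
    return pruned
```

The difference shows up when layers have very different weight scales. At 50% sparsity, the class-blinded variant may empty a layer of uniformly small weights entirely. The class-uniform variant instead forces every layer to give up half its weights, including large ones.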

[199] 2107.12919

Transfer Learning in Electronic Health Records through Clinical Concept Embedding

Deep learning models have shown tremendous potential in learning representations that capture key properties of the data. This makes them strong candidates for transfer learning: exploiting commonalities between different learning tasks to transfer knowledge from one task to another. Electronic health records (EHR) research is one of the domains that has witnessed a growing number of deep learning techniques employed for learning clinically meaningful representations of medical concepts (such as diseases and medications). Despite this growth, approaches to benchmarking and assessing such learned representations (or embeddings) are under-investigated; this can be a big issue when such embeddings are shared to facilitate transfer learning. In this study, we aim to (1) train some of the most prominent disease embedding techniques on comprehensive EHR data from 3.1 million patients, (2) employ qualitative and quantitative evaluation techniques to assess these embeddings, and (3) provide pre-trained disease embeddings for transfer learning. This study may be the first comprehensive approach to clinical concept embedding evaluation and can be applied to any embedding technique and any EHR concept.

[200] 2107.12920

Emotion Stimulus Detection in German News Headlines

Emotion stimulus extraction is a fine-grained subtask of emotion analysis that focuses on identifying the description of the cause behind an emotion expression in a text passage (e.g., in the sentence "I am happy that I passed my exam", the phrase "passed my exam" corresponds to the stimulus). Previous work has mainly focused on Mandarin and English, with no resources or models for German. We fill this research gap by developing a corpus of 2006 German news headlines annotated with emotions and 811 instances with annotations of stimulus phrases. Given that such corpus creation efforts are time-consuming and expensive, we additionally work on an approach for projecting the existing English GoodNewsEveryone (GNE) corpus to a machine-translated German version. We compare the performance of a conditional random field (CRF) model (trained monolingually on German and cross-lingually via projection) with a multilingual XLM-RoBERTa (XLM-R) model. Our results show that training with the German corpus achieves higher F1 scores than projection. XLM-R models outperform their respective CRF counterparts.

[201] 2107.12921

Angel's Girl for Blind Painters: an Efficient Painting Navigation System Validated by Multimodal Evaluation Approach

For people who ardently love painting but have visual impairments, holding a paintbrush to create a work is a very difficult task. People in this group are eager to pick up the paintbrush, like Leonardo da Vinci, to create and make full use of their own talents. Therefore, to bridge this gap as far as possible, we propose a painting navigation system to assist blind people in painting and artistic creation. The proposed system is composed of a cognitive system and a guidance system. The system adopts drawing board positioning based on QR codes, brush navigation based on target detection, and real-time brush positioning. Meanwhile, this paper uses voice-based human-computer interaction and a simple but efficient position information coding rule. In addition, we design a criterion to efficiently judge whether the brush has reached the target. According to the experimental results, the thermal curves extracted from the testers' faces show that the system is relatively well accepted by blindfolded and even blind testers. With a prompt interval of 1 s, the painting navigation system performs best, with a completion degree of 89% (SD 8.37%) and an overflow degree of 347% (SD 162.14%). Meanwhile, the excellent and good types of brush tip trajectory account for 74%, and the relative movement distance is 4.21 (SD 2.51). This work demonstrates that it is practicable for blind people to feel the world through the brush in their hands. In the future, we plan to deploy Angle's Eyes on the phone to make it more portable. The demo video of the proposed painting navigation system is available at:

[202] 2107.12922

Design Space Exploration of Sparse Accelerators for Deep Neural Networks

Novel architectures for deep learning exploit both activation and weight sparsity to improve the performance of DNN inference. However, this speedup usually brings non-negligible overheads that diminish the efficiency of such designs when running dense models. These overheads are particularly pronounced for low-precision accelerators with optimized SRAM size per core. This paper examines the design space trade-offs of such accelerators, aiming to achieve competitive performance and efficiency for all four combinations of dense or sparse activation/weight tensors. To do so, we systematically examine the overheads of supporting sparsity on top of an optimized dense core. These overheads are modeled based on parameters that indicate how a multiplier can borrow a nonzero operation from neighboring multipliers or future cycles. As a result of this exploration, we identify a few promising designs that perform better than prior work. Our findings suggest that even the best design targeting dual sparsity suffers a 20%-30% drop in power efficiency when running single-sparse models, i.e., those with only sparse weight or only sparse activation tensors. We introduce novel techniques to reuse resources of the same core to maintain high performance and efficiency when running single-sparse or dense models. We call this hybrid design Griffin. Griffin is 1.2×, 3.0×, 3.1×, and 1.4× more power efficient than state-of-the-art sparse architectures for dense, weight-only sparse, activation-only sparse, and dual sparse models, respectively.

[203] 2107.12924

Finite-Time Gradient Descent-Based Adaptive Neural Network Finite-Time Control Design for Attitude Tracking of a 3-DOF Helicopter

This paper investigates a novel finite-time gradient descent-based adaptive neural network finite-time control strategy for the attitude tracking of a 3-DOF lab helicopter platform subject to composite disturbances. First, a radial basis function neural network (RBFNN) is applied to estimate lumped disturbances, where the weights, centers and widths of the RBFNN are trained online via a finite-time gradient descent algorithm. Then, a finite-time backstepping control scheme is constructed to achieve tracking control of the elevation and pitch angles. In addition, a hybrid finite-time differentiator (HFTD) is introduced to approximate the intermediate control signal and its derivative, avoiding the "explosion of complexity" problem of the traditional backstepping design. Moreover, the errors caused by the HFTD can be attenuated by a combination of compensation signals. With the aid of the stability theorem, it is proved that the closed-loop system is semi-globally uniformly ultimately bounded in finite time. Finally, a comparison is provided to illustrate the effectiveness and advantages of the designed control strategy.
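The sketch below makes the RBFNN estimator concrete with a Gaussian RBF network whose output weights are adapted online by gradient descent on the squared estimation error. It is deliberately simpler than the paper's scheme. It uses plain (not finite-time) gradient descent, adapts only the output weights while keeping the centers and widths fixed, and takes a scalar input. The class name, the learning rate, and the sine-shaped "disturbance" in the usage note are hypothetical.

```python
import math


class RBFNetwork:
    """Gaussian RBF network: y(x) = sum_j w_j * exp(-(x - c_j)^2 / (2 * b_j^2))."""

    def __init__(self, centers, widths, lr=0.1):
        self.c = list(centers)        # fixed centers c_j
        self.b = list(widths)         # fixed widths b_j
        self.w = [0.0] * len(self.c)  # output weights, adapted online
        self.lr = lr

    def features(self, x):
        """Gaussian basis activations h_j(x)."""
        return [math.exp(-(x - c) ** 2 / (2 * b ** 2)) for c, b in zip(self.c, self.b)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w, self.features(x)))

    def update(self, x, target):
        """One online gradient step on 0.5 * (target - y)^2 with respect to the weights."""
        h = self.features(x)
        err = target - sum(w * hj for w, hj in zip(self.w, h))
        # dLoss/dw_j = -err * h_j, so gradient descent adds lr * err * h_j
        self.w = [w + self.lr * err * hj for w, hj in zip(self.w, h)]
        return err
```

In use, one would create `RBFNetwork(centers=[-3 + 0.5 * j for j in range(13)], widths=[0.5] * 13)` and repeatedly call `update(x, d)` with measured samples of an unknown disturbance `d(x)`. `predict(x)` then approximates that disturbance and could feed a compensating control term.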

[204] 2107.12930

gaBERT -- an Irish Language Model

The BERT family of neural language models has become highly popular due to its ability to provide sequences of text with rich context-sensitive token encodings that generalise well to many Natural Language Processing tasks. Over 120 monolingual BERT models covering over 50 languages have been released, as well as a multilingual model trained on 104 languages. We introduce gaBERT, a monolingual BERT model for the Irish language. We compare our gaBERT model to multilingual BERT and show that gaBERT provides better representations for a downstream parsing task. We also show how different filtering criteria, vocabulary size and the choice of subword tokenisation model affect downstream performance. We release gaBERT and related code to the community.

[205] 2107.12931

Persistent Reinforcement Learning via Subgoal Curricula

Reinforcement learning (RL) promises to enable autonomous acquisition of complex behaviors for diverse agents. However, the success of current reinforcement learning algorithms is predicated on an often under-emphasised requirement: each trial needs to start from a fixed initial state distribution. Unfortunately, resetting the environment to its initial state after each trial requires a substantial amount of human supervision and extensive instrumentation of the environment, which defeats the purpose of autonomous reinforcement learning. In this work, we propose Value-accelerated Persistent Reinforcement Learning (VaPRL), which generates a curriculum of initial states such that the agent can bootstrap on the success of easier tasks to efficiently learn harder tasks. The agent also learns to reach the initial states proposed by the curriculum, minimizing the reliance on human interventions during learning. We observe that VaPRL reduces the interventions required by three orders of magnitude compared to episodic RL, while outperforming prior state-of-the-art methods for reset-free RL in terms of both sample efficiency and asymptotic performance on a variety of simulated robotics problems.

[206] 2107.12932

Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation

Understanding occupant-vehicle interactions by modeling control transitions is important to ensure safe approaches to passenger vehicle automation. Models which contain contextual, semantically meaningful representations of driver states can be used to determine the appropriate timing and conditions for transfer of control between driver and vehicle. However, such models rely on real-world control take-over data from drivers engaged in distracting activities, which is costly to collect. Here, we introduce a data augmentation scheme for such a dataset. Using the augmented dataset, we develop and train take-over time (TOT) models that operate sequentially on mid- and high-level features produced by computer vision algorithms operating on different driver-facing camera views, and show that models trained on the augmented dataset outperform those trained on the initial dataset. The demonstrated model features encode different aspects of the driver state, pertaining to the face, hands, foot and upper body of the driver. We perform ablative experiments on feature combinations as well as model architectures, showing that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay, suitable for complex real-world scenarios.

[207] 2107.12933

A Hybrid Reduced Order Model for nonlinear LES filtering

We develop a Reduced Order Model (ROM) for a Large Eddy Simulation (LES) approach that combines a three-step algorithm called Evolve-Filter-Relax (EFR) with a computationally efficient finite volume method. The main novelty of our ROM lies in the use, within the EFR algorithm, of a nonlinear, deconvolution-based indicator function that identifies the regions of the domain where the flow needs regularization. The ROM we propose is a hybrid projection/data-driven strategy: a classical Proper Orthogonal Decomposition Galerkin projection approach for the reconstruction of the velocity and pressure fields, and a data-driven reduction method to approximate the indicator function used by the nonlinear differential filter. This data-driven technique is based on interpolation with Radial Basis Functions. We test the performance of our ROM approach on two benchmark problems: 2D and 3D unsteady flow past a cylinder at Reynolds number $0 \le Re \le 100$. The accuracy of the ROM is assessed against results obtained with the full order model for velocity, pressure, indicator function and time evolution of the aerodynamic coefficients.
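
To make the three EFR steps concrete, the following is a 1D toy sketch on a periodic finite-difference grid, not the paper's finite volume full order model or its ROM. It evolves linear advection with one upwind step, filters with the linear differential filter $(I - \delta^2 \Delta)\bar{u} = w$, and relaxes with $u^{n+1} = (1-\chi)\,w + \chi\,\bar{u}$. The paper's nonlinear, deconvolution-based indicator function is deliberately replaced by a constant (so this is the linear EFR variant); the advection equation and the values of $\delta$ and $\chi$ are assumptions.

```python
import numpy as np


def periodic_laplacian(n: int, h: float) -> np.ndarray:
    """Second-order finite-difference Laplacian with periodic boundary conditions."""
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, -1] = L[-1, 0] = 1.0
    return L / h ** 2


def efr_step(u: np.ndarray, dt: float, h: float, c: float,
             delta: float, chi: float) -> np.ndarray:
    n = u.size
    # 1) Evolve: explicit upwind step of u_t + c u_x = 0 (assumes c > 0, c*dt/h <= 1)
    w = u - c * dt / h * (u - np.roll(u, 1))
    # 2) Filter: linear differential filter (I - delta^2 Lap) u_bar = w
    A = np.eye(n) - delta ** 2 * periodic_laplacian(n, h)
    u_bar = np.linalg.solve(A, w)
    # 3) Relax: blend the evolved and filtered fields
    return (1.0 - chi) * w + chi * u_bar
```

With this linear filter both the upwind step and the filtered field are convex combinations of grid values, so the toy scheme conserves the discrete mass and introduces no new extrema.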

[208] 2107.12938

Yet Another Combination of IR- and Neural-based Comment Generation

Code comment generation techniques aim to generate natural language descriptions for source code. There are two orthogonal approaches for this task, namely information retrieval (IR)-based and neural-based methods. Recent studies have focused on combining their strengths by feeding the input code and its similar code snippets retrieved by the IR-based approach to the neural-based approach, which can enhance the neural-based approach's ability to output low-frequency words and further improve performance. However, despite the tremendous progress, our pilot study reveals that the current combination is not generalizable and can lead to performance degradation. In this paper, we propose a straightforward but effective approach to tackle this issue with existing combinations of the two comment generation approaches. Instead of binding IR- and neural-based approaches statically, we combine them dynamically. Specifically, given an input code snippet, we first use an IR-based technique to retrieve a similar code snippet from the corpus. Then we use a Cross-Encoder-based classifier to dynamically decide which comment generation method to use: if the retrieved similar code snippet is a true positive (i.e., is semantically similar to the input), we directly use the IR-based technique; otherwise, we pass the input to the neural-based model to generate the comment. We evaluate our approach on a large-scale dataset of Java projects. Experimental results show that our approach achieves a BLEU score of 25.45, which improves on the state-of-the-art IR-based approach, neural-based approach, and their combination by 41%, 26%, and 7%, respectively.
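
The dynamic decision described above can be sketched as a small dispatcher. This is an illustration under stated assumptions: a token-level Jaccard similarity stands in for the paper's IR retriever, and `is_true_positive` and `neural_model` are hypothetical callables standing in for the trained Cross-Encoder classifier and the neural comment generator.

```python
from typing import Callable, List, Set, Tuple


def tokens(code: str) -> Set[str]:
    """Crude lexical tokens: split on whitespace after separating parentheses."""
    return set(code.replace("(", " ").replace(")", " ").split())


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, used here as a stand-in IR score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def generate_comment(
    code: str,
    corpus: List[Tuple[str, str]],
    is_true_positive: Callable[[str, str], bool],  # hypothetical classifier
    neural_model: Callable[[str], str],            # hypothetical generator
) -> str:
    # IR step: retrieve the (code, comment) pair whose code is most similar to the input.
    retrieved_code, retrieved_comment = max(corpus, key=lambda pair: jaccard(code, pair[0]))
    # Dynamic decision: reuse the retrieved comment only if the classifier accepts the match.
    if is_true_positive(code, retrieved_code):
        return retrieved_comment
    # Otherwise fall back to the neural generator.
    return neural_model(code)
```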

[209] 2107.12939

Optimal Frequency Regulation using Packetized Energy Management

Packetized energy management (PEM) is a demand dispatch scheme that can be used to provide ancillary services such as frequency regulation. In PEM, distributed energy resources (DERs) are granted uninterruptible access to the grid for a pre-specified time interval called the packet length. This results in a down-ramp-limited response in PEM for DERs that can only consume power from the grid. In this work, a linearized virtual battery model of PEM is provided that is capable of predicting the down-ramp-limited output of PEM, and it is used in a model predictive control (MPC) framework to improve the performance of PEM in tracking an automatic generation control (AGC) signal. By performing statistical analysis on the AGC regulation signal, PJM Reg-D, an ARMA model is derived as a predictor for the MPC-based precompensator. Finally, as an alternative to MPC, it is shown that by varying the packet length as a function of time, for example through packet randomization, frequency regulation can be improved under PEM.
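
The abstract does not state the orders or coefficients of the fitted ARMA model, so the following sketch assumes a zero-mean ARMA(1,1) model $x_t = \phi x_{t-1} + e_t + \theta e_{t-1}$ with placeholder coefficients, and shows how its one-step-ahead prediction $\hat{x}_{t+1} = \phi x_t + \theta \hat{e}_t$ is computed recursively from the innovations. For an MPC horizon, later steps would continue as $\hat{x}_{t+h} = \phi\,\hat{x}_{t+h-1}$ for $h \ge 2$.

```python
from typing import List, Sequence


def arma11_one_step(signal: Sequence[float], phi: float, theta: float) -> List[float]:
    """One-step-ahead predictions for a zero-mean ARMA(1,1) model.

    preds[t] predicts signal[t]; preds[-1] forecasts the next, not yet observed, sample.
    """
    preds = [0.0]  # prediction of the first sample (zero-mean assumption)
    for x in signal:
        innovation = x - preds[-1]                  # estimate of e_t
        preds.append(phi * x + theta * innovation)  # E[x_{t+1} | data up to t]
    return preds
```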

[210] 2107.12940

Finding Failures in High-Fidelity Simulation using Adaptive Stress Testing and the Backward Algorithm

Validating the safety of autonomous systems generally requires the use of high-fidelity simulators that adequately capture the variability of real-world scenarios. However, it is generally not feasible to exhaustively search the space of simulation scenarios for failures. Adaptive stress testing (AST) is a method that uses reinforcement learning to find the most likely failure of a system. AST with a deep reinforcement learning solver has been shown to be effective in finding failures across a range of different systems. This approach generally involves running many simulations, which can be very expensive when using a high-fidelity simulator. To improve efficiency, we present a method that first finds failures in a low-fidelity simulator. It then uses the backward algorithm, which trains a deep neural network policy using a single expert demonstration, to adapt the low-fidelity failures to high-fidelity. We have created a series of autonomous vehicle validation case studies that represent some of the ways low-fidelity and high-fidelity simulators can differ, such as time discretization. We demonstrate in a variety of case studies that this new AST approach is able to find failures with significantly fewer high-fidelity simulation steps than are needed when just running AST directly in high-fidelity. As a proof of concept, we also demonstrate AST on NVIDIA's DriveSim simulator, an industry state-of-the-art high-fidelity simulator for finding failures in autonomous vehicles.

[211] 2107.12942

Reinforcement Learning with Formal Performance Metrics for Quadcopter Attitude Control under Non-nominal Contexts

We explore the reinforcement learning approach to designing controllers by extensively discussing the case of a quadcopter attitude controller. We provide all the details needed to reproduce our approach, starting with a model of the dynamics of a Crazyflie 2.0 under various nominal and non-nominal conditions, including partial motor failures and wind gusts. We develop a robust form of signal temporal logic to quantitatively evaluate the vehicle's behavior and measure the performance of controllers. The paper thoroughly describes the choices of training algorithm, neural network architecture, hyperparameters, and observation space in view of the different performance metrics we have introduced. We discuss the robustness of the obtained controllers, both to partial loss of power for one rotor and to wind gusts, and conclude by drawing lessons for practical controller design by reinforcement learning.
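
As background for the quantitative evaluation mentioned above, the following is a minimal sketch of the standard (not the paper's extended) robustness semantics of signal temporal logic over a sampled error trace: the atomic predicate $|e| \le b$ has robustness $b - |e|$, "always" takes the minimum over time, "eventually" the maximum, and conjunction the minimum over sub-specifications. The bounds and the roll/pitch traces in the test are hypothetical.

```python
from typing import Sequence


def rob_pred(e: float, bound: float) -> float:
    """Robustness of |e| <= bound: positive iff satisfied, magnitude is the margin."""
    return bound - abs(e)


def rob_always(errors: Sequence[float], bound: float) -> float:
    """G(|e| <= bound) over the sampled trace: the worst-case margin."""
    return min(rob_pred(e, bound) for e in errors)


def rob_eventually(errors: Sequence[float], bound: float) -> float:
    """F(|e| <= bound) over the sampled trace: the best-case margin."""
    return max(rob_pred(e, bound) for e in errors)


def rob_and(*rhos: float) -> float:
    """Conjunction of specifications: the weakest one dominates."""
    return min(rhos)
```

Because the result is a signed margin rather than a Boolean, it can rank controllers that all satisfy (or all violate) a specification; which operators and bounds the paper actually uses is not stated in the abstract.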

[212] 2107.12948

Let Trajectories Speak Out the Traffic Bottlenecks

Traffic bottlenecks are a set of road segments that have an unacceptable level of traffic caused by a poor balance between road capacity and traffic volume. A huge volume of trajectory data, which captures real-time traffic conditions in road networks, provides promising new opportunities to identify traffic bottlenecks. In this paper, we define this problem as trajectory-driven traffic bottleneck identification: given a road network $R$ and a trajectory database $T$, find a representative set of $K$ seed edges of traffic bottlenecks that influence the highest number of road segments not in the seed set. We show that this problem is NP-hard and propose a framework to find the traffic bottlenecks as follows. First, a traffic spread model is defined which represents changes in traffic volume for each road segment over time. Then, the traffic diffusion probability between two connected segments and the residual ratio of traffic volume for each segment can be computed using historical trajectory data. We then propose two different algorithmic approaches to solve the problem. The first one is a best-first algorithm, BF, with an approximation ratio of $1-1/e$. To further accelerate the identification process in larger datasets, we also propose a sampling-based greedy algorithm, SG. Finally, comprehensive experiments using three different datasets compare and contrast various solutions, and provide insights into important efficiency and effectiveness trade-offs among the respective methods.
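
An approximation ratio of $1-1/e$ is the classic guarantee of greedy selection for maximizing a monotone submodular function under a cardinality constraint. The following sketch shows that greedy principle on a deliberately simplified objective: the paper's probabilistic traffic spread model is replaced by hypothetical, precomputed sets `reach[e]` of segments influenced by each candidate edge, and the objective is the size of their union. It is not the paper's BF or SG algorithm.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_seeds(reach: Dict[Hashable, Set[Hashable]],
                 k: int) -> Tuple[List[Hashable], Set[Hashable]]:
    """Pick up to k seed edges, each time adding the edge that covers the most new segments."""
    chosen: List[Hashable] = []
    covered: Set[Hashable] = set()
    for _ in range(min(k, len(reach))):
        # Marginal gain of an edge = number of segments it would newly cover.
        best = max((e for e in reach if e not in chosen),
                   key=lambda e: len(reach[e] - covered))
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered
```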

[213] 2107.12950

A Greedy Data Collection Scheme For Linear Dynamical Systems

Mathematical models are essential to analyze and understand the dynamics of complex systems. Recently, data-driven methodologies have received considerable attention, driven by advancements in sensor technology. However, the quality of the obtained data plays a vital role in learning a good and reliable model. Therefore, in this paper, we propose an efficient heuristic methodology to collect data in both the frequency domain and the time domain, aiming at the best possible information gain from limited experimental data. The efficiency of the proposed methodology is illustrated by means of several examples, and its robustness in the presence of noisy data is also shown.

[214] 2107.12954

Analysis of a stabilised finite element method for power-law fluids

A low-order finite element method is constructed and analysed for an incompressible non-Newtonian flow problem with power-law rheology. The method is based on a continuous piecewise linear approximation of the velocity field and piecewise constant approximation of the pressure. Stabilisation, in the form of pressure jumps, is added to the formulation to compensate for the failure of the inf-sup condition, and using an appropriate lifting of the pressure jumps a divergence-free approximation to the velocity field is built and included in the discretisation of the convection term. This construction allows us to prove the convergence of the resulting finite element method for the entire range $r>\frac{2 d}{d+2}$ of the power-law index $r$ for which weak solutions to the model are known to exist in $d$ space dimensions, $d \in \{2,3\}$.

[215] 2107.12957

Learning Numeric Optimal Differentially Private Truncated Additive Mechanisms

Differentially private (DP) mechanisms face the challenge of providing accurate results while protecting their inputs: the privacy-utility trade-off. A simple but powerful technique for DP adds noise to sensitivity-bounded query outputs to blur the exact query output: additive mechanisms. While a vast body of work considers infinitely wide noise distributions, some applications (e.g., real-time operating systems) require hard bounds on the deviations from the real query, and only limited work on such mechanisms exists. An additive mechanism with truncated noise (i.e., with bounded range) can offer such hard bounds. We introduce a gradient-descent-based tool to learn truncated noise for additive mechanisms with strong utility bounds while simultaneously optimizing for differential privacy under sequential composition, i.e., scenarios where multiple noisy queries on the same data are revealed. Our method can learn discrete noise patterns, not only hyper-parameters of a predefined probability distribution. For sensitivity-bounded mechanisms, we show that it is sufficient to consider symmetric noise and that, for noise that decreases monotonically away from the mean, ensuring privacy for a pair of representative query outputs guarantees privacy for all pairs of inputs (that differ in one element). We find that the utility-privacy trade-off curves of our generated noise are remarkably close to those of truncated Gaussians and even replicate their shape for $l_2$ utility loss. For a low number of compositions, we also improved DP-SGD (sub-sampling). Moreover, we extend the Moments Accountant to truncated distributions, allowing us to incorporate mechanism output events with varying, input-dependent zero occurrence probability.
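
The learning procedure itself is not reproduced here. As a minimal sketch of what "ensuring privacy for a pair of representative query outputs" means for one query, the following evaluates the $(\varepsilon, \delta)$ guarantee of a given discrete, truncated noise distribution using the standard hockey-stick divergence $\delta(\varepsilon) = \sum_y \max(0, P(y) - e^{\varepsilon} Q(y))$, where $P$ and $Q$ are the output distributions for two inputs whose true answers differ by the sensitivity. Integer-valued noise and sensitivity, and the example distributions, are assumptions; composition is not handled.

```python
import math
from typing import Dict


def hockey_stick_delta(p: Dict[int, float], q: Dict[int, float], eps: float) -> float:
    """delta(eps) = sum_y max(0, p(y) - e^eps * q(y)) for discrete distributions p, q."""
    support = set(p) | set(q)
    return sum(max(0.0, p.get(y, 0.0) - math.exp(eps) * q.get(y, 0.0)) for y in support)


def truncated_additive_delta(noise: Dict[int, float], eps: float,
                             sensitivity: int = 1) -> float:
    """delta for one query: outputs differ by `sensitivity` between the two inputs."""
    shifted = {y + sensitivity: pr for y, pr in noise.items()}
    # DP must hold in both directions, so take the worse of the two divergences.
    return max(hockey_stick_delta(noise, shifted, eps),
               hockey_stick_delta(shifted, noise, eps))
```

Because the noise has bounded support, some outputs are possible under one input but have zero probability under the other, so $\delta$ stays positive for every $\varepsilon$; these are the zero-occurrence-probability events mentioned above.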

[216] 2107.12958

Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning

Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Several prior works have proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost, or a significant computational complexity to tolerate malicious workers. Much of the overhead in prior schemes comes from the fact that they tightly couple coding for all three problems into a single framework. In this work, we propose the Verifiable Coded Computing (VCC) framework, which decouples Byzantine node detection from straggler tolerance. VCC leverages coded computing only for handling stragglers and privacy, and then uses an orthogonal approach, verifiable computing, to tackle Byzantine nodes. Furthermore, VCC dynamically adapts its coding scheme to trade off straggler tolerance against Byzantine protection and vice versa. We evaluate VCC on a compute-intensive distributed logistic regression application. Our experiments show that VCC speeds up the conventional uncoded implementation of distributed logistic regression by $3.2\times$ to $6.9\times$, and also improves the test accuracy by up to $12.6\%$.
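
The abstract does not specify which verifiable computing scheme VCC uses. As an illustration of the general idea only, the following sketches a classic Freivalds-style randomized check of a worker's claimed matrix-vector product $y = Xw$, as arises in each gradient step of logistic regression: the master precomputes $r^\top X$ once for a secret random vector $r$, after which each check $r^\top y = (r^\top X)\,w$ costs $O(n + d)$ instead of $O(nd)$. The class name and tolerances are assumptions.

```python
import numpy as np


class MatVecVerifier:
    """Probabilistic check that a worker returned y = X @ w (Freivalds-style)."""

    def __init__(self, X: np.ndarray, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.r = rng.standard_normal(X.shape[0])  # secret random vector
        self.rX = self.r @ X                      # precomputed once, O(n d)

    def check(self, w: np.ndarray, y: np.ndarray, rtol: float = 1e-9) -> bool:
        # An honest result satisfies r.(X w) == (r X).w up to rounding; O(n + d) per check.
        return bool(np.isclose(self.r @ y, self.rX @ w, rtol=rtol, atol=1e-9))
```

A wrong result passes only if its error vector is (numerically) orthogonal to $r$, which happens with negligible probability for a random Gaussian $r$ that the worker does not know.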

[217] 2107.12960

Enriching Local and Global Contexts for Temporal Action Localization

Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two conflicting goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated, followed by action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three sub-networks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models as the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3\% at IoU@0.5) and ActivityNet v1.3 (51.24\% at IoU@0.5) datasets, where it outperforms the recent state of the art.
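
The IoU@0.5 figures above refer to the temporal intersection over union between a predicted segment and a ground-truth segment; in the usual evaluation protocol, a detection counts as correct at this threshold when its tIoU is at least 0.5. A minimal sketch of that overlap measure (function names are illustrative):

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) with start <= end


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1D time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred: Segment, gt: Segment, threshold: float = 0.5) -> bool:
    """True if the predicted segment matches the ground truth at the given tIoU threshold."""
    return temporal_iou(pred, gt) >= threshold
```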

[218] 2107.12964

A Physiologically-adapted Gold Standard for Arousal During a Stress Induced Scenario

Emotion is an inherently subjective psychophysiological human state, and producing an agreed-upon representation (gold standard) for continuous emotion requires a time-consuming and costly training procedure for multiple human annotators. There is strong evidence in the literature that physiological signals are sufficient objective markers for states of emotion, particularly arousal. In this contribution, we utilise a dataset which includes continuous emotion annotations and physiological signals, namely Heartbeats per Minute (BPM), Electrodermal Activity (EDA), and Respiration-rate, captured during a stress-induced scenario (Trier Social Stress Test). We utilise a Long Short-Term Memory Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio, video, and textual features. We utilise the state-of-the-art MuSe-Toolbox to consider both annotation delay and inter-rater agreement weighting when fusing the target signals. An improvement in Concordance Correlation Coefficient (CCC) is seen across feature sets when fusing EDA with arousal, compared to the arousal-only gold standard results. Additionally, results for BERT-based textual features improved for arousal plus all physiological signals, obtaining up to .3344 CCC compared to .2118 CCC for arousal only. Multimodal fusion also improves overall CCC, with audio plus video features obtaining up to .6157 CCC when recognising arousal plus EDA and BPM.
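
The Concordance Correlation Coefficient used as the metric above is $\mathrm{CCC} = \frac{2\,\mathrm{cov}(x, y)}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$; unlike Pearson correlation, it also penalises shifts in mean and scale between prediction and gold standard. A minimal sketch, assuming population (biased) moments:

```python
import numpy as np


def ccc(x, y) -> float:
    """Concordance correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    # np.var defaults to the population variance (ddof=0), matching the covariance above.
    return float(2.0 * cov / (x.var() + y.var() + (mx - my) ** 2))
```

For example, a prediction that tracks the gold standard perfectly but is offset by a constant still has a Pearson correlation of 1, while its CCC is strictly below 1.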

[219] 2107.12972

Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation

State-of-the-art neural network architectures continue to scale in size and deliver impressive generalization results, although this comes at the expense of limited interpretability. In particular, a key challenge is to determine when to stop training the model, as this has a significant impact on generalization. Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on non-negative kernel regression (NNK) graphs with which we perform local polytope interpolation on low-dimensional channels. This method leads to instance-based interpretability of both the learned data representations and the relationship between channels. Motivated by our observations, we use CW-DeepNNK to propose a novel early stopping criterion that (i) does not require a validation set, (ii) is based on a task performance metric, and (iii) allows stopping to be reached at different points for each channel. Our experiments demonstrate that our proposed method has advantages as compared to the standard criterion based on validation set performance.

[220] 2107.12973

The Space Complexity of Sum Labelling

A graph is called a sum graph if its vertices can be labelled by distinct positive integers such that there is an edge between two vertices if and only if the sum of their labels is the label of another vertex of the graph. Most papers on sum graphs consider combinatorial questions like the minimum number of isolated vertices that need to be added to a given graph to make it a sum graph. In this paper, we initiate the study of sum graphs from the viewpoint of computational complexity. Notice that every $n$-vertex sum graph can be represented by a sorted list of $n$ positive integers where edge queries can be answered in $O(\log n)$ time. Therefore, limiting the size of the vertex labels also upper-bounds the space complexity of storing the graph in the database. We show that every $n$-vertex, $m$-edge, $d$-degenerate graph can be made a sum graph by adding at most $m$ isolated vertices to it, such that the size of each vertex label is at most $O(n^2d)$. This enables us to store the graph using $O(m\log n)$ bits of memory. For sparse graphs (graphs with $O(n)$ edges), this matches the trivial lower bound of $\Omega(n\log n)$. Since planar graphs and forests have constant degeneracy, our result implies an upper bound of $O(n^2)$ on their label size. The previously best known upper bound on the label size of general graphs with the minimum number of isolated vertices was $O(4^n)$, due to Kratochv\'il, Miller & Nguyen. Furthermore, their proof was existential, whereas our labelling can be constructed in polynomial time.
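
The sorted-list representation noted above can be sketched directly: vertices are identified with their distinct positive labels, and an edge query for labels $a$ and $b$ is a binary search for $a + b$ in the sorted list, giving $O(\log n)$ time. The class name is illustrative.

```python
from bisect import bisect_left
from typing import Iterable


class SumGraph:
    """A sum graph stored only as its sorted vertex labels."""

    def __init__(self, labels: Iterable[int]):
        self.labels = sorted(labels)
        if len(set(self.labels)) != len(self.labels):
            raise ValueError("labels must be distinct")
        if self.labels and self.labels[0] <= 0:
            raise ValueError("labels must be positive integers")

    def _contains(self, x: int) -> bool:
        i = bisect_left(self.labels, x)  # O(log n) binary search
        return i < len(self.labels) and self.labels[i] == x

    def adjacent(self, a: int, b: int) -> bool:
        """Vertices labelled a and b are adjacent iff a + b is also a vertex label."""
        return a != b and self._contains(a + b)
```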

[221] 2107.12977

The social dilemma in AI development and why we have to solve it

While the demand for ethical artificial intelligence (AI) systems increases, the number of unethical uses of AI accelerates, even though there is no shortage of ethical guidelines. We argue that a main underlying cause for this is that AI developers face a social dilemma in AI development ethics, preventing the widespread adoption of ethical best practices. We define the social dilemma for AI development and describe why the current crisis in AI development ethics cannot be solved without relieving AI developers of their social dilemma. We argue that AI development must be professionalised to overcome the social dilemma, and discuss how medicine can be used as a template in this process.

[222] 2107.12979

Predictive Coding: a Theoretical and Experimental Review

Predictive coding offers a potentially unifying account of cortical function, postulating that the core function of the brain is to minimize prediction errors with respect to a generative model of the world. The theory is closely related to the Bayesian brain framework and, over the last two decades, has gained substantial influence in the fields of theoretical and cognitive neuroscience. A large body of research has arisen that empirically tests improved and extended theoretical and mathematical models of predictive coding, and that evaluates both their potential biological plausibility for implementation in the brain and the concrete neurophysiological and psychological predictions made by the theory. Despite this enduring popularity, however, no comprehensive review of predictive coding theory, and especially of recent developments in this field, exists. Here, we provide a comprehensive review of the core mathematical structure and logic of predictive coding, thus complementing recent tutorials in the literature. We also review a wide range of classic and recent work within the framework, ranging from the neurobiologically realistic microcircuits that could implement predictive coding, to the close relationship between predictive coding and the widely-used backpropagation of error algorithm, as well as surveying the close relationships between predictive coding and modern machine learning techniques.

[223] 2107.12981

Cross-Referencing Method for Scalable Public Blockchain

We previously proposed a cross-referencing method that enables multiple peer-to-peer network domains to manage their own public blockchains and to periodically exchange the state of the latest fixed block in the blockchain, with hysteresis signatures, among all the domains via an upper network layer. In this study, we evaluated the effectiveness of our method from three theoretical viewpoints: decentralization, scalability, and tamper resistance. We show that the performance of the entire system can be improved because transactions and blocks are distributed only inside each domain. We argue that the transaction-processing capacity will increase to 56,000 transactions per second, comparable to that of the VISA credit card system. The capacity is also evaluated by multiplying the number of domains by the average reduction in transaction-processing time due to the increase in block size and the reduction in the block-generation interval achieved by domain partitioning. For tamper resistance, each domain holds evidence of the hysteresis signatures of the other domains in its blockchain. We introduce two types of tamper-resistance-improvement ratios as evaluation measures of tamper resistance for a blockchain and theoretically explain how tamper resistance is improved using our cross-referencing method. With our method, tamper resistance improves as the number of domains increases. The proposed system with 1,000 domains is 3-10 times more tamper-resistant than one with 100 domains, and its capacity is 10 times higher. We conclude that our method enables a more scalable and tamper-resistant public blockchain balanced with decentralization.

[224] 2107.12986

Logics Meet 2-Way 1-Clock Alternating Timed Automata

In this paper, we study the extension of 1-clock Alternating Timed Automata (1-ATA) with the ability to read in both the forward and backward directions, the 2-Way 1-clock Alternating Timed Automata (2-Way 1-ATA). We show that the subclass of 2-Way 1-ATA with reset-free loops (2-Way 1-ATA-rfl) is expressively equivalent to MSO[<] extended with Guarded Metric Quantifiers (GQMSO). The emptiness checking problem for 2-Way 1-ATA-rfl (and hence the satisfiability problem for GQMSO) is undecidable in general. We propose a "non-punctuality"-like restriction, called non-adjacency, for 2-Way 1-ATA-rfl, and also for GQMSO, under which emptiness (respectively, satisfiability) checking becomes decidable. Non-Adjacent 2-Way 1-ATA is the first such class of Timed Automata with alternation and 2-wayness for which emptiness checking is decidable, and moreover with elementary complexity. We also show that 2-Way 1-ATA-rfl, even with the non-adjacency restriction, can express properties that are not recognizable by 1-ATA.

[225] 2107.12996

Hamiltonian Operator Inference: Physics-preserving Learning of Reduced-order Models for Hamiltonian Systems

This work presents a nonintrusive physics-preserving method to learn reduced-order models (ROMs) of Hamiltonian systems. Traditional intrusive projection-based model reduction approaches utilize symplectic Galerkin projection to construct Hamiltonian reduced models by projecting Hamilton's equations of the full model onto a symplectic subspace. This symplectic projection requires complete knowledge about the full model operators and full access to manipulate the computer code. In contrast, the proposed Hamiltonian operator inference approach embeds the physics into the operator inference framework to develop a data-driven model reduction method that preserves the underlying symplectic structure. Our method exploits knowledge of the Hamiltonian functional to define and parametrize a Hamiltonian ROM form which can then be learned from data projected via symplectic projectors. The proposed method is `gray-box' in that it utilizes knowledge of the Hamiltonian structure at the partial differential equation level, as well as knowledge of spatially local components in the system. However, it does not require access to computer code, only data to learn the models. Our numerical results demonstrate Hamiltonian operator inference on a linear wave equation, the cubic nonlinear Schr\"{o}dinger equation, and a nonpolynomial sine-Gordon equation. Accurate long-time predictions far outside the training time interval for nonlinear examples illustrate the generalizability of our learned models.

[226] 1411.6614

Two-dimensional local Hamiltonian problem with area laws is QMA-complete

We show that the two-dimensional (2D) local Hamiltonian problem with the constraint that the ground state obeys area laws is QMA-complete. We also prove similar results in 2D translation-invariant systems and for the 3D Heisenberg and Hubbard models with local magnetic fields. Consequently, unless MA = QMA, not all ground states of 2D local Hamiltonians with area laws have efficient classical representations that support efficient computation of local expectation values. Hence, even if area laws are proved in the future for ground states of 2D gapped systems, the computational complexity of these systems will remain unclear.

[227] 2107.12102

Global optimization using random embeddings

We propose a random-subspace algorithmic framework for global optimization of Lipschitz-continuous objectives, and analyse its convergence using novel tools from conic integral geometry. X-REGO randomly projects, in a sequential or simultaneous manner, the high-dimensional original problem into low-dimensional subproblems that can then be solved with any global, or even local, optimization solver. We estimate the probability that the randomly-embedded subproblem shares (approximately) the same global optimum as the original problem. This success probability is then used to show convergence of X-REGO to an approximate global solution of the original problem, under weak assumptions on the problem (having a strictly feasible global solution) and on the solver (guaranteed to find an approximate global solution of the reduced problem with sufficiently high probability). In the particular case of unconstrained objectives with low effective dimension, which vary only over a low-dimensional subspace, we propose an X-REGO variant that explores random subspaces of increasing dimension until it finds the effective dimension of the problem, so that X-REGO converges globally after a finite number of embeddings, proportional to the effective dimension. We show numerically that this variant efficiently finds both the effective dimension and an approximate global minimizer of the original problem.
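
The sequential random-embedding idea can be sketched as follows, under stated assumptions: each subproblem is $\min_{y \in [-b, b]^d} f(p + A y)$ with a Gaussian matrix $A \in \mathbb{R}^{D \times d}$ anchored at the current incumbent $p$, and plain uniform random search stands in for the "any global solver" of the framework. The function name, box size and sample counts are hypothetical, and no convergence guarantee of the paper is claimed for this toy.

```python
from typing import Callable, Tuple

import numpy as np


def x_rego(f: Callable[[np.ndarray], float], D: int, d: int, n_embeddings: int = 5,
           box: float = 5.0, n_inner: int = 2000, seed: int = 0) -> Tuple[np.ndarray, float]:
    rng = np.random.default_rng(seed)
    p = np.zeros(D)                      # anchor point of the affine embedding
    best_x, best_f = p.copy(), f(p)
    for _ in range(n_embeddings):
        A = rng.standard_normal((D, d))  # random Gaussian embedding
        # Inner solver (stand-in): uniform random search over the box [-box, box]^d.
        Y = rng.uniform(-box, box, size=(n_inner, d))
        X = p + Y @ A.T                  # candidate points in the original space
        vals = np.array([f(x) for x in X])
        i = int(np.argmin(vals))
        if vals[i] < best_f:
            best_x, best_f = X[i], float(vals[i])
        p = best_x                       # re-anchor the next subspace at the incumbent
    return best_x, best_f
```

On an objective that depends on only a few coordinates (low effective dimension), a $d$-dimensional embedding with $d$ at least the effective dimension can already reach the optimal value, which is the situation the variant described above exploits.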

[228] 2107.12371

An FPGA cached sparse matrix vector product (SpMV) for unstructured computational fluid dynamics simulations

Field Programmable Gate Arrays (FPGAs) generate algorithm-specific architectures that improve the code's FLOP-per-watt ratio. Such devices are regaining interest due to the rise of new tools that facilitate their programming, such as OmpSs. The computational fluid dynamics community is always investigating new architectures that can improve the performance of its algorithms. Commonly, those algorithms have a low arithmetic intensity and reach only a small percentage of the peak performance. Sparse matrix-vector multiplication is one of the most time-consuming operations in unstructured simulations. The matrix's sparsity pattern determines the indirect memory accesses to the multiplying vector. This data path is hard to predict, which makes traditional implementations perform poorly. In this work, we present an FPGA architecture that maximizes the vector's reusability by introducing a cache-like architecture. The cache is implemented as a circular list that keeps the vector components in BRAM while they are needed. Following this strategy, a speedup of up to 16 times is obtained compared to a naive implementation of the algorithm.
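
To illustrate why the access pattern matters, the following is a software sketch (not the FPGA design) of a CSR sparse matrix-vector product that counts how often the needed vector component is already held in a small circular buffer. The FIFO eviction policy, the buffer size and the function name are assumptions; the values themselves are always read from `x`, so only the hit/miss bookkeeping is simulated.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def csr_spmv_cached(indptr: Sequence[int], indices: Sequence[int], data: Sequence[float],
                    x: Sequence[float], cache_size: int) -> Tuple[List[float], int, int]:
    """Compute y = A x for A in CSR format, simulating a circular cache of x entries."""
    y = [0.0] * (len(indptr) - 1)
    slots: List[Optional[int]] = [None] * cache_size  # circular list of cached columns
    where: Dict[int, int] = {}                        # column index -> slot
    head = hits = misses = 0
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            if col in where:
                hits += 1
            else:
                misses += 1                           # would fetch x[col] from external memory
                old = slots[head]
                if old is not None:
                    del where[old]                    # evict the oldest entry (FIFO)
                slots[head] = col
                where[col] = head
                head = (head + 1) % cache_size
            acc += data[k] * x[col]
        y[row] = acc
    return y, hits, misses
```

For banded matrices, such as those produced by well-ordered meshes, consecutive rows touch overlapping columns, so even a small buffer captures most of the reuse.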

[229] 2107.12375

Geometric Deep Learning on Molecular Representations

Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL bears particular promise in molecular modeling applications, in which various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. Finally, the review outlines current challenges and opportunities and presents a forecast of the future of GDL for molecular sciences.

[230] 2107.12395

Constraining dark matter annihilation with cosmic ray antiprotons using neural networks

The interpretation of data from indirect detection experiments searching for dark matter annihilations requires computationally expensive simulations of cosmic-ray propagation. In this work we present a new method based on Recurrent Neural Networks that significantly accelerates simulations of secondary and dark matter Galactic cosmic ray antiprotons while achieving excellent accuracy. This approach allows for efficient profiling or marginalisation over the nuisance parameters of a cosmic ray propagation model in order to perform parameter scans for a wide range of dark matter models. We identify importance sampling as particularly suitable for ensuring that the network is evaluated only in well-trained parameter regions. We present the resulting constraints on several models of Weakly Interacting Massive Particles, using the most recent AMS-02 antiproton data. The fully trained networks are released as DarkRayNet together with this work and speed up the runtime by at least two orders of magnitude compared to conventional approaches.

[231] 2107.12408

A pressure-based method for weakly compressible two-phase flows under a Baer-Nunziato type model with generic equations of state and pressure and velocity disequilibrium

Within the framework of diffuse interface methods, we derive a pressure-based Baer-Nunziato type model well suited to weakly compressible multiphase flows. The model can easily handle different equations of state, and it includes relaxation terms characterized by user-defined finite parameters, which drive the pressure and velocity of each phase toward equilibrium. There is no clear notion of the speed of sound, so most classical low-Mach approximations cannot easily be cast in this context. The proposed solution strategy consists of two operators: a semi-implicit finite-volume solver for the hyperbolic part and an ODE integrator for the relaxation processes. Because the acoustic terms in the hyperbolic part are integrated implicitly, the stability condition on the time step is relaxed. The discretization of the non-conservative terms involving the gradient of the volume fraction satisfies, by construction, the non-disturbance condition on pressure and velocity, which avoids oscillations across multimaterial interfaces. The developed simulation tool is validated through one-dimensional simulations of shock-tube and Riemann problems involving water-aluminum and water-air mixtures, vapor-liquid mixtures of water and of carbon dioxide, and almost pure flows. The numerical results match analytical and reference solutions, except for some expected discrepancies across shocks, which nevertheless remain acceptable (errors within a few percentage points). All tests were performed with acoustic CFL numbers greater than one, and no stability issues arose, even for CFL numbers greater than 10. The effects of different values of the relaxation parameters and of different equations of state (stiffened gas and Peng-Robinson) were investigated.

[232] 2107.12421

Parallel Surrogate-assisted Optimization Using Mesh Adaptive Direct Search

We consider computationally expensive blackbox optimization problems and present a method that employs surrogate models and concurrent computing at the search step of the mesh adaptive direct search (MADS) algorithm. Specifically, we solve a surrogate optimization problem using locally weighted scatterplot smoothing (LOWESS) models to find promising candidate points to be evaluated by the blackboxes. We consider several methods for selecting promising points from a large pool of candidates. We conduct numerical experiments on five engineering design problems to assess how the performance of the modified MADS algorithm depends on the available CPU resources.

[233] 2107.12438

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data, and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set and uses all data for training; hence, it is well suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is ``weakly-coupled'' in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that, under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. In other words, we prove that our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.

[234] 2107.12461

Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation

The U-Net architecture, built upon the fully convolutional network, has proven to be effective in biomedical image segmentation. However, U-Net applies skip connections to merge semantically different low- and high-level convolutional features, resulting not only in blurred feature maps, but also in over- and under-segmented target regions. To address these limitations, we propose a simple, yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, called Sharp U-Net, for binary and multi-class biomedical image segmentation. The key rationale of Sharp U-Net is that instead of applying a plain skip connection, a depthwise convolution of the encoder feature map with a sharpening kernel filter is employed prior to merging the encoder and decoder features, thereby producing a sharpened intermediate feature map of the same size as the encoder map. Using this sharpening filter layer, we not only fuse semantically less dissimilar features but also smooth out artifacts throughout the network layers during the early stages of training. Our extensive experiments on six datasets show that the proposed Sharp U-Net model consistently outperforms or matches the recent state-of-the-art baselines in both binary and multi-class segmentation tasks, while adding no extra learnable parameters. Furthermore, Sharp U-Net outperforms baselines that have more than three times the number of learnable parameters.
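A depthwise convolution applies one fixed kernel to each channel separately. The pure-Python sketch below assumes a common 3x3 sharpening kernel with zero padding, so the output keeps the encoder map's size. The exact kernel used in Sharp U-Net may differ, and the function names are hypothetical.

```python
# A common 3x3 sharpening kernel (assumed here; the paper's exact kernel may differ).
# Its entries sum to 1, so flat regions pass through unchanged.
SHARPEN = [[-1, -1, -1],
           [-1,  9, -1],
           [-1, -1, -1]]

def conv2d_same(channel, kernel):
    """Cross-correlate one H x W channel with a 3x3 kernel, zero padding, same output size."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:   # zero padding outside the map
                        acc += kernel[dr + 1][dc + 1] * channel[rr][cc]
            out[r][c] = acc
    return out

def depthwise_sharpen(feature_map):
    """Apply the same fixed kernel to every channel independently (no learnable weights)."""
    return [conv2d_same(channel, SHARPEN) for channel in feature_map]
```

The kernel is fixed rather than learned, so a layer like this adds no trainable parameters. The sharpened encoder map would then be merged with the decoder features in place of the plain skip connection.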

[235] 2107.12469

SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery

Access to high-resolution satellite imagery has dramatically increased in recent years as several new constellations have entered service. High revisit frequencies as well as improved resolution have widened the use cases of satellite imagery to areas such as humanitarian relief and even Search and Rescue (SaR). We propose a novel remote sensing object detection dataset for deep learning assisted SaR. This dataset contains only small objects that have been identified as potential targets as part of a live SaR response. We evaluate the application of popular object detection models to this dataset as a baseline to inform further research. We also propose a novel object detection metric, specifically designed to be used in a deep learning assisted SaR setting.

[236] 2107.12525

Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

Given a dataset $\mathcal{D}$, we are interested in computing the mean of the subset of $\mathcal{D}$ that matches a predicate. \algname leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget $N$. In this document, we theoretically analyze \algname and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2}N_2^{-3/2})$, where $N=K \cdot N_1+N_2$ for some integer constant $K$, and $K \cdot N_1$ and $N_2$ represent the number of samples used in Stage 1 and Stage 2 of \algname, respectively. Hence, if a constant fraction of the total sample budget $N$ is allocated to each stage, we achieve a mean squared error of $O(N^{-1})$, which matches the mean-squared-error rate of the optimal stratified sampling algorithm given a priori knowledge of the predicate positive rate and standard deviation per stratum.
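The step from the three-term rate to $O(N^{-1})$ can be checked directly. Write the constant-fraction allocation with an illustrative fraction $0<\alpha<1$ of the budget going to Stage 1:

```latex
\begin{aligned}
N_1 &= \frac{\alpha N}{K}, \qquad N_2 = (1-\alpha) N
\quad\Longrightarrow\quad K N_1 + N_2 = N, \\
N_1^{-1} &= \frac{K}{\alpha}\, N^{-1}, \qquad
N_2^{-1} = \frac{1}{1-\alpha}\, N^{-1}, \\
N_1^{1/2} N_2^{-3/2} &= \left(\frac{\alpha}{K}\right)^{1/2} (1-\alpha)^{-3/2}\, N^{1/2} N^{-3/2}
= O\!\left(N^{-1}\right).
\end{aligned}
```

All three terms are therefore $O(N^{-1})$, with constants depending only on $\alpha$ and $K$.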

[237] 2107.12526

Hamilton-Jacobi-Bellman-Isaacs equation for rational inattention in the long-run management of river environments under uncertainty

A new stochastic control model for the long-run environmental management of rivers is mathematically and numerically analyzed, focusing on a modern sediment replenishment problem with unique nonsmooth and nonlinear properties. Rational inattention, a novel adaptive strategy for collecting information and intervening in the target system, is modeled using Erlangization. The system dynamics, consisting of the river discharge, which follows a continuous-state branching process with immigration, and the controlled sediment storage dynamics, lead to a nonsmooth and nonlocal infinitesimal generator. Model uncertainty, which is ubiquitous in certain applications, is considered in a robust control framework in which deviations between the benchmark and distorted models are penalized through relative entropy. The partial integro-differential Hamilton-Jacobi-Bellman-Isaacs (HJBI) equation is derived as the optimality equation, and the uniqueness, existence, and optimality of its solution are discussed. A monotone finite difference scheme guaranteeing the boundedness and uniqueness of numerical solutions is proposed to discretize the HJBI equation and is verified against manufactured solutions. Model applications are also conducted with parameter values identified from the available data and physical formulae. The computational results suggest that environmental management should be rationally inattentive in a state-dependent and adaptive manner.

[238] 2107.12536

A Data-Driven Biophysical Computational Model of Parkinson's Disease based on Marmoset Monkeys

In this work, we propose a new biophysical computational model of brain regions relevant to Parkinson's Disease, based on local field potential data collected from the brains of marmoset monkeys. Parkinson's disease is a neurodegenerative disorder, linked to the death of dopaminergic neurons in the substantia nigra pars compacta, which affects the normal dynamics of the basal ganglia-thalamus-cortex neuronal circuit of the brain. Although there are multiple mechanisms underlying the disease, a complete description of those mechanisms and of the molecular pathogenesis is still missing, and there is still no cure. To address this gap, computational models that resemble neurobiological aspects found in animal models have been proposed. For our model, we adopted a data-driven approach in which a set of biologically constrained parameters is optimised using differential evolution. The evolved models successfully reproduced single-neuron mean firing rates and spectral signatures of local field potentials from healthy and parkinsonian marmoset brain data. To the best of our knowledge, this is the first computational model of Parkinson's Disease based on simultaneous electrophysiological recordings from seven brain regions of marmoset monkeys. Results show that the proposed model could facilitate the investigation of the mechanisms of PD and support the development of techniques that can indicate new therapies. It could also be applied to other computational neuroscience problems in which biological data could be used to fit multi-scale models of brain circuits.

[239] 2107.12574

Effect of an attached end mass in the dynamics of uncertainty nonlinear continuous random system

This work studies the dynamics of a one-dimensional elastic bar with a random elastic modulus and prescribed boundary conditions: the bar is fixed at one end and attached at the other end to a lumped mass and two springs (one linear and one nonlinear). The system analysis assumes that the elastic modulus has a gamma probability distribution and uses Monte Carlo simulations to compute the propagation of uncertainty in this continuous--discrete system. After describing the deterministic and stochastic modeling of the system, several configurations of the model are analyzed in order to characterize the effect of the lumped mass on the overall behavior of this dynamical system.
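The Monte Carlo step can be illustrated on a much simpler quantity than the dynamic response studied here. The sketch below assumes a static, linear model: bar stiffness $EA/L$ acts in parallel with a linear end spring under a tip force, and the nonlinear spring and lumped mass are ignored. It draws $E$ from a gamma law parameterized by its mean and coefficient of variation. All numerical values are hypothetical.

```python
import random
import statistics

def sample_modulus(mean_E, cv, rng):
    """Draw Young's modulus from a gamma law with given mean and coefficient of variation."""
    shape = 1.0 / cv ** 2        # gamma shape k, since cv = 1/sqrt(k)
    scale = mean_E * cv ** 2     # gamma scale theta, so that k * theta = mean_E
    return rng.gammavariate(shape, scale)

def tip_displacement(E, area, length, k_spring, force):
    """Static tip displacement: bar stiffness E*A/L in parallel with a linear end spring."""
    return force / (E * area / length + k_spring)

def monte_carlo(n_samples, mean_E=210e9, cv=0.1, area=1e-4, length=1.0,
                k_spring=1e6, force=1e3, seed=1):
    """Propagate the modulus uncertainty to the tip displacement; returns (mean, std)."""
    rng = random.Random(seed)
    samples = [tip_displacement(sample_modulus(mean_E, cv, rng), area, length, k_spring, force)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)
```

A gamma law keeps every sampled modulus strictly positive, which a Gaussian model of $E$ would not guarantee.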

[240] 2107.12601

Microphone Array Generalization for Multichannel Narrowband Deep Speech Enhancement

This paper addresses the problem of microphone array generalization for deep-learning-based end-to-end multichannel speech enhancement. We aim to train a single deep neural network (DNN) that can potentially perform well on unseen microphone arrays. When training on a fixed microphone array, the array geometry shapes the network's parameters and thus restricts the generalization of the trained network to other microphone arrays. To resolve this problem, a single network is trained using data recorded by various microphone arrays of different geometries. We design three variants of our recently proposed narrowband network to cope with an arbitrary number of microphones. Overall, the goal is to make the network learn universal information for speech enhancement that is available for any array geometry, rather than characteristics dedicated to a single array. Experiments on both simulated and real room impulse responses (RIRs) demonstrate the excellent cross-array generalization capability of the proposed networks, in the sense that their performance is very close to, or even exceeds, that of a network trained on the test arrays. Moreover, they notably outperform various beamforming methods and other advanced deep-learning-based methods.

[241] 2107.12631

Learning to Estimate RIS-Aided mmWave Channels

Inspired by the remarkable learning and prediction performance of deep neural networks (DNNs), we apply a special type of DNN framework, known as a model-driven deep unfolding neural network, to reconfigurable intelligent surface (RIS)-aided millimeter wave (mmWave) single-input multiple-output (SIMO) systems. We focus on uplink cascaded channel estimation, where known and fixed base station combining and RIS phase control matrices are considered for collecting observations. To boost the estimation performance and reduce the training overhead, the inherent channel sparsity of mmWave channels is leveraged in the deep unfolding method. It is verified that the proposed deep unfolding network architecture can outperform the least squares (LS) method with relatively small training overhead and online computational complexity.

[242] 2107.12665

Generating functions for message-passing on weighted networks: directed bond percolation and SIR epidemics

We study the SIR ("susceptible, infected, removed/recovered") model on directed graphs with heterogeneous transmission probabilities within the message-passing approximation. We characterize the percolation transition, predict cluster size distributions and suggest vaccination strategies. All predictions are compared to numerical simulations on real networks. The percolation threshold which we predict is a rigorous lower bound to the threshold on real networks. For large, locally tree-like networks, our predictions agree very well with the numerical data.
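Numerical simulations of this kind typically rely on the standard mapping of SIR onto directed bond percolation. Each directed edge $i \to j$ transmits at most once, with probability $T_{ij}$. The final outbreak is then the set of nodes reachable from the seed through occupied edges. The sketch below is a minimal Monte Carlo version of that mapping, not the message-passing method itself. The data layout (`adj` as out-neighbour lists, `T` keyed by edge tuples) is an assumption.

```python
import random
from collections import deque

def sir_outbreak(adj, T, seed_node, rng):
    """Final set of ever-infected nodes from one seed, via directed bond percolation.

    adj[i] lists the out-neighbours of node i and T[(i, j)] is the transmission
    probability along i -> j. Each node is dequeued once, so each out-edge is tested
    at most once; this gives the same outbreak distribution as occupying every edge
    independently with probability T[(i, j)].
    """
    infected = {seed_node}
    queue = deque([seed_node])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in infected and rng.random() < T[(i, j)]:
                infected.add(j)
                queue.append(j)
    return infected

def mean_outbreak_size(adj, T, runs=1000, seed=0):
    """Average final outbreak size over uniformly random seed nodes."""
    rng = random.Random(seed)
    nodes = list(adj)
    return sum(len(sir_outbreak(adj, T, rng.choice(nodes), rng)) for _ in range(runs)) / runs
```

Directionality matters. On the chain $0 \to 1 \to 2$ with certain transmission, a seed at node 0 infects all three nodes, but a seed at node 2 infects only itself.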

[243] 2107.12671

An Optimal Piezoelectric Beam for Acoustic Energy Harvesting

This study presents a novel piezoelectric beam structure for acoustic energy harvesting. The beams have been designed to maximize output energy in areas with high noise levels, such as near highway traffic. The beam consists of two layers of copper and polyvinylidene fluoride that convert the vibration energy of ambient noise into electrical energy. The optimum placement of the piezoelectric material has been studied, and the position on the substrate that gives the maximum yield is identified. Unlike previous studies, in which the entire beam substrate was covered by the material, this study uses less material, which helps lower the harvester's final production cost. Additionally, an electrical model was developed for the sensor and a read-out circuit was proposed for the converter. Moreover, the sensor was validated at different noise levels, for various lengths and locations. The simulations were performed in COMSOL Multiphysics and MATLAB and report a maximum sound pressure of 140 dB from 100 dB point sources in an enclosed air-filled cubic-meter chamber.

[244] 2107.12689

A persistent homology-based topological loss for CNN-based multi-class segmentation of CMR

Multi-class segmentation of cardiac magnetic resonance (CMR) images seeks a separation of data into anatomical components with known structure and configuration. The most popular CNN-based methods are optimised using pixel-wise loss functions, which are ignorant of the spatially extended features that characterise anatomy. Therefore, whilst sharing a high spatial overlap with the ground truth, inferred CNN-based segmentations can lack coherence, including spurious connected components, holes and voids. Such results are implausible, violating anticipated anatomical topology. In response, (single-class) persistent homology-based loss functions have been proposed to capture global anatomical features. Our work extends these approaches to the task of multi-class segmentation. Building an enriched topological description of all class labels and class label pairs, our loss functions make predictable and statistically significant improvements in segmentation topology using a CNN-based post-processing framework. We also present (and make available) a highly efficient implementation based on cubical complexes and parallel execution, enabling practical application within high resolution 3D data for the first time. We demonstrate our approach on 2D short axis and 3D whole heart CMR segmentation, advancing a detailed and faithful analysis of performance on two publicly available datasets.
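Topological errors such as spurious connected components can be counted directly. For a single 2D binary mask, the number of connected components is the Betti-0 number. The sketch below counts 4-connected components with a breadth-first search. It is only a diagnostic under that connectivity assumption. It is not the differentiable persistent homology loss described above, which operates on cubical complexes and on class-label pairs.

```python
from collections import deque

def count_components(mask):
    """Number of 4-connected foreground components (Betti-0) in a 2D binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                components += 1              # found an unvisited component: flood-fill it
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components
```

If a structure expected to form one region is reported with two components, the prediction contains a spurious island. A topology-aware loss is designed to penalize exactly this kind of error.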

[245] 2107.12695

Orientations and matrix function-based centralities in multiplex network analysis of urban public transport

We study urban public transport systems by means of multiplex networks in which stops are represented as nodes and each line is represented by a layer. We determine and visualize public transport network orientations and compare them with street network orientations of the $36$ largest German as well as $18$ selected major European cities. We find that German urban public transport networks are mainly oriented in a direction close to the cardinal east-west axis, which usually coincides with one of the two orthogonal preferential directions of the corresponding street network. While this behavior is present in only a subset of the European cities considered, it remains true that all but one of the public transport networks considered lack a distinct north-south-like preferential orientation. Furthermore, we study the applicability of the class of matrix function-based centrality measures, which has recently been generalized from single-layer networks to layer-coupled multiplex networks, to our more general urban multiplex framework. Numerical experiments based on highly efficient and scalable methods from numerical linear algebra show promising results, which are in line with previous studies. We comment on advantages over existing methodology, elaborate on the comparison of different measures and weight models, and present detailed hyper-parameter studies. All results are illustrated by demonstrative graphical representations.
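Matrix function-based centralities rank nodes by entries of $f(A)$. A common choice is total communicability, $\exp(\beta A)\mathbf{1}$. The single-layer sketch below approximates it with a truncated Taylor series on a small dense adjacency matrix. It does not reproduce the multiplex setting or the scalable numerical linear algebra methods mentioned above. The parameters `beta` and `terms` are illustrative.

```python
def total_communicability(adj, beta=1.0, terms=30):
    """Approximate exp(beta * A) @ 1 by a truncated Taylor series.

    adj is a dense n x n adjacency matrix (list of lists); entry i of the
    result is the total communicability of node i.
    """
    n = len(adj)
    term = [1.0] * n              # k = 0 term: the all-ones vector
    result = term[:]
    for k in range(1, terms):
        # (beta*A)^k 1 / k!  =  (beta/k) * A * [(beta*A)^(k-1) 1 / (k-1)!]
        term = [beta / k * sum(adj[i][j] * term[j] for j in range(n)) for i in range(n)]
        result = [r + t for r, t in zip(result, term)]
    return result
```

On the path graph 0-1-2, the middle node scores highest, about 4.915, and the two end nodes tie at about 3.546. These values match $\cosh\sqrt2 + 2\sinh\sqrt2/\sqrt2$ and $\cosh\sqrt2 + \sinh\sqrt2/\sqrt2$, respectively.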

[246] 2107.12698

Source-Agnostic Gravitational-Wave Detection with Recurrent Autoencoders

We present an application of anomaly detection techniques based on deep recurrent autoencoders to the problem of detecting gravitational wave signals in laser interferometers. Trained on noise data, this class of algorithms could detect signals using an unsupervised strategy, i.e., without targeting a specific kind of source. We develop a custom architecture to analyze the data from two interferometers. We compare the obtained performance to that obtained with other autoencoder architectures and with a convolutional classifier. The unsupervised nature of the proposed strategy comes with a cost in terms of accuracy, when compared to more traditional supervised techniques. On the other hand, there is a qualitative gain in generalizing the experimental sensitivity beyond the ensemble of pre-computed signal templates. The recurrent autoencoder outperforms other autoencoders based on different architectures. The class of recurrent autoencoders presented in this paper could complement the search strategy employed for gravitational wave detection and extend the reach of the ongoing detection campaigns.

[247] 2107.12710

End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection

Artefacts that serve to distinguish bona fide speech from spoofed or deepfake speech are known to reside in specific subbands and temporal segments. Various approaches can be used to capture and model such artefacts; however, none works well across a spectrum of diverse spoofing attacks. Reliable detection then often depends upon the fusion of multiple detection systems, each tuned to detect different forms of attack. In this paper, we show that better performance can be achieved when the fusion is performed within the model itself and when the representation is learned automatically from raw waveform inputs. The principal contribution is a spectro-temporal graph attention network (GAT) which learns the relationship between cues spanning different sub-bands and temporal intervals. Using a model-level graph fusion of spectral (S) and temporal (T) sub-graphs and a graph pooling strategy to improve discrimination, the proposed RawGAT-ST model achieves an equal error rate of 1.06% on the ASVspoof 2019 logical access database. This is one of the best results reported to date and is reproducible using an open-source implementation.
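The equal error rate quoted above is the operating point at which the false acceptance rate (spoofs accepted) equals the false rejection rate (bona fide speech rejected). The sketch below sweeps thresholds over the observed scores and assumes that higher scores mean "bona fide". It is a simple discrete approximation; official challenge evaluation tools may interpolate differently.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate EER: mean of FAR and FRR at the observed threshold where they are closest."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(bona_fide_scores) | set(spoof_scores)):
        # Scores >= t are accepted as bona fide.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)  # bona fide rejected
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)         # spoofs accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

Perfectly separated scores give an EER of 0. Each bona fide trial that scores below a spoof pushes the crossing point, and hence the EER, upwards.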

[248] 2107.12723

Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel

We revisit on-average algorithmic stability of Gradient Descent (GD) for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the Neural Tangent Kernel (NTK) or Polyak-{\L}ojasiewicz (PL) assumptions. In particular, we show oracle type bounds which reveal that the generalisation and excess risk of GD is controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing oracle inequalities developed here we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most of the NTK-based analyses we focus on regression with label noise and show that GD with early stopping is consistent.

[249] 2107.12724

Quantum Meet-in-the-Middle Attack on 7-round Feistel Construction

Quantum attacks on Feistel constructions have attracted increasing attention from cryptologists worldwide. To reduce the time complexity of quantum attacks on the 7-round Feistel construction, we propose, for the first time, a quantum meet-in-the-middle attack in the Q1 model, based on a quantum claw-finding algorithm and a 5-round distinguisher. Compared with quantum attacks in the Q2 model, our attack reduces the time complexity from $O({2^n})$ to $O({2^{7n/8}})$. Moreover, our attack is in the Q1 model, which is more practical than the Q2 model. Compared with the best classical attacks, our attack not only reduces the time complexity but also reduces the data and memory complexity by ${2^{n/2}}$ and ${2^{n/4}}$, respectively.

[250] 2107.12741

Partitioning all $k$-subsets into $r$-wise intersecting families

Let $r \geq 2$, $n$ and $k$ be integers satisfying $k \leq \frac{r-1}{r}n$. We conjecture that the family of all $k$-subsets of an $n$-set cannot be partitioned into fewer than $\lceil n-\frac{r}{r-1}(k-1) \rceil$ $r$-wise intersecting families. If true, this is tight for all values of the parameters. The case $r=2$ is Kneser's conjecture, proved by Lov\'asz. Here we observe that the assertion also holds provided $r$ is either a prime number or a power of $2$.
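For $r=2$, a partition of the $k$-subsets into intersecting families is the same as a proper coloring of the Kneser graph $KG(n,k)$, since its color classes are exactly the intersecting families. The conjectured bound then reduces to Lovász's theorem:

```latex
r = 2:\qquad \frac{r}{r-1} = 2, \qquad
\left\lceil n - 2(k-1) \right\rceil = n - 2k + 2 = \chi\big(KG(n,k)\big),
\qquad k \le \tfrac{n}{2}.
```

For example, $n=5$ and $k=2$ give $3$, the chromatic number of the Petersen graph $KG(5,2)$.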

[251] 2107.12773

Reradiation and Scattering from a Reconfigurable Intelligent Surface: A General Macroscopic Model

Reconfigurable intelligent surfaces (RISs) have attracted attention in the last year as nearly passive, planar structures that can dynamically change their reflection or refraction characteristics, and therefore realize anomalous reflection, focalization, or other radiowave or signal transformations, to engineer and optimize complex propagation environments. Evaluating the performance and optimizing the deployment of RISs in wireless networks require physically sound frameworks that account for the actual electromagnetic and physical characteristics of engineered metasurfaces. In this paper, we introduce a general macroscopic model for the realistic evaluation of RIS scattering, based on its decomposition into multiple scattering mechanisms. Since state-of-the-art ray models can already efficiently simulate specular interactions (reflection, diffraction) and diffuse scattering, but not anomalous reradiation, we complement them with a Huygens principle approach implemented using either an integral formulation or a simpler antenna-array-like formulation. The different scattering mechanisms are combined through a generalization of the Effective Roughness model using a suitable power conservation equation. Notably, multiple reradiation modes can be modeled through the proposed approach. In addition, we validate the overall model accuracy by benchmarking it against several case studies available in the literature, based on analytical models, full-wave simulations, or experimental measurements.

[252] 2107.12775

Realistic Ultrasound Image Synthesis for Improved Classification of Liver Disease

With the success of deep learning-based methods applied in medical image analysis, convolutional neural networks (CNNs) have been investigated for classifying liver disease from ultrasound (US) data. However, the scarcity of available large-scale labeled US data has hindered the success of CNNs for classifying liver disease from US data. In this work, we propose a novel generative adversarial network (GAN) architecture for realistic diseased and healthy liver US image synthesis. We adopt the concept of stacking to synthesize realistic liver US data. Quantitative and qualitative evaluation is performed on 550 in vivo B-mode liver US images collected from 55 subjects. We also show that the synthesized images, together with real in vivo data, can be used to significantly improve the performance of traditional CNN architectures for nonalcoholic fatty liver disease (NAFLD) classification.

[253] 2107.12783

Statistical Guarantees for Fairness Aware Plug-In Algorithms

A plug-in algorithm to estimate Bayes Optimal Classifiers for fairness-aware binary classification has been proposed in (Menon & Williamson, 2018). However, the statistical efficacy of their approach has not been established. We prove that the plug-in algorithm is statistically consistent. We also derive finite sample guarantees associated with learning the Bayes Optimal Classifiers via the plug-in algorithm. Finally, we propose a protocol that modifies the plug-in approach, so as to simultaneously guarantee fairness and differential privacy with respect to a binary feature deemed sensitive.

[254] 2107.12797

Wasserstein-Splitting Gaussian Process Regression for Heterogeneous Online Bayesian Inference

Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data. In this work, we seek to overcome these issues through (i) employing variational free energy approximations of GPs operating in tandem with online expectation propagation steps; and (ii) introducing a local splitting step which instantiates a new GP whenever the posterior distribution changes significantly as quantified by the Wasserstein metric over posterior distributions. Over time, then, this yields an ensemble of sparse GPs which may be updated incrementally, and adapts to locality, heterogeneity, and non-stationarity in training data.
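The splitting criterion can be made concrete for the simplest case. Consider two univariate Gaussian posteriors, for example the predictive distributions at one test input before and after an update. Their 2-Wasserstein distance has the closed form $W_2^2 = (\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. The sketch below uses it in a hypothetical split rule with a user-chosen `threshold`. The exact criterion and threshold of the method above are not reproduced here.

```python
import math

def w2_gaussian_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    return math.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)

def should_split(old_posterior, new_posterior, threshold):
    """Hypothetical rule: instantiate a new local GP if the posterior moved more than `threshold`.

    Each posterior is a (mean, standard deviation) pair.
    """
    return w2_gaussian_1d(*old_posterior, *new_posterior) > threshold
```

Unlike a KL divergence, $W_2$ stays finite and symmetric even when one posterior is much narrower than the other. This makes a fixed threshold easier to interpret.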

[255] 2107.12838

Graph Autoencoders for Embedding Learning in Brain Networks and Major Depressive Disorder Identification

Brain functional connectivity (FC) reveals biomarkers for identification of various neuropsychiatric disorders. Recent application of deep neural networks (DNNs) to connectome-based classification mostly relies on traditional convolutional neural networks using input connectivity matrices on a regular Euclidean grid. We propose a graph deep learning framework to incorporate the non-Euclidean information about graph structure for classifying functional magnetic resonance imaging (fMRI)-derived brain networks in major depressive disorder (MDD). We design a novel graph autoencoder (GAE) architecture based on graph convolutional networks (GCNs) to embed the topological structure and node content of large-sized fMRI networks into low-dimensional latent representations. In network construction, we employ the Ledoit-Wolf (LDW) shrinkage method to estimate the high-dimensional FC metrics efficiently from fMRI data. We consider both supervised and unsupervised approaches for the graph embedding learning. The learned embeddings are then used as feature inputs for a deep fully-connected neural network (FCNN) to discriminate MDD from healthy controls (HC). Evaluated on a resting-state fMRI MDD dataset with 43 subjects, results show that the proposed GAE-FCNN model significantly outperforms several state-of-the-art DNN methods for brain connectome classification, achieving an accuracy of 72.50% using the LDW-FC metrics as node features. The graph embeddings of fMRI FC networks learned by the GAE also reveal apparent group differences between MDD and HC. Our new framework demonstrates the feasibility of learning graph embeddings on brain networks to provide discriminative information for the diagnosis of brain disorders.
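For readers unfamiliar with graph convolutions, the sketch below shows one layer of the widely used GCN propagation rule $H' = \mathrm{ReLU}(\hat D^{-1/2}(A+I)\hat D^{-1/2} H W)$. It also shows the inner-product decoder commonly used in graph autoencoders. This is a generic illustration that assumes dense list-of-lists matrices, not the GAE architecture described above. Layer sizes, training, and the Ledoit-Wolf network construction are omitted.

```python
import math

def gcn_layer(adj, features, weights):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense list-of-lists matrices."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim, out_dim = len(weights), len(weights[0])
    # Aggregate neighbour features: (normalised adjacency) @ H.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(in_dim)]
           for i in range(n)]
    # Linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]

def inner_product_decoder(z):
    """Reconstruct edge probabilities as sigmoid(z_i . z_j) from node embeddings z."""
    return [[1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, zj)))) for zj in z]
            for zi in z]
```

In an autoencoder of this kind, stacked GCN layers produce the low-dimensional node embeddings. The decoder is trained to reconstruct the graph's edges from them.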

[256] 2107.12869

A Simplified Framework for Air Route Clustering Based on ADS-B Data

The volume of flight traffic keeps increasing over time, which makes strategic traffic flow management a challenging problem, since modelling the entire traffic data requires substantial computational resources. On the other hand, Automatic Dependent Surveillance - Broadcast (ADS-B) technology is considered a promising technology for providing both flight crews and ground control staff with the necessary information on the position and velocity of aircraft in a specific area, safely and efficiently. To tackle this problem, we present in this paper a simplified framework that supports the detection of typical air routes between airports based on ADS-B data. Specifically, flight traffic is classified into major groups based on similarity measures, which helps to reduce the number of flight paths between airports. In practice, our framework can be used to reduce the computational cost of air flow optimization and to evaluate operational performance. Finally, to illustrate the potential applications of the proposed framework, we performed an experiment using ADS-B flight traffic data for three different pairs of airports. The typical routes detected between each pair of airports show promising results, as assessed by combining two indices of clustering performance with human judgment in visual inspection.
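
The abstract does not name the similarity measure. The sketch below is therefore only a hypothetical illustration of grouping trajectories into typical routes:

- each ADS-B track is resampled to a fixed number of points along its arc length;
- two tracks are compared by their mean pointwise distance;
- a simple leader (threshold) clustering assigns each track to the first route it is close to.

Treating latitude and longitude as planar coordinates, the resampling size, and the threshold are all assumptions.

```python
import math

def resample(path, m=50):
    """Resample a polyline [(x, y), ...] to m points equally spaced in arc length."""
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        return [path[0]] * m
    out, j = [], 0
    for k in range(m):
        t = total * k / (m - 1)
        # advance to the segment that contains arc length t
        while j < len(cum) - 2 and cum[j + 1] < t:
            j += 1
        seg = cum[j + 1] - cum[j]
        frac = 0.0 if seg == 0.0 else (t - cum[j]) / seg
        (x0, y0), (x1, y1) = path[j], path[j + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out

def route_distance(a, b):
    """Mean pointwise Euclidean distance between two equally resampled tracks."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def cluster_routes(paths, threshold, m=50):
    """Leader clustering: a track joins the first route whose leader track is
    within `threshold`; otherwise it starts a new route. One label per track."""
    tracks = [resample(p, m) for p in paths]
    leaders, labels = [], []
    for idx, track in enumerate(tracks):
        for label, lead in enumerate(leaders):
            if route_distance(track, tracks[lead]) <= threshold:
                labels.append(label)
                break
        else:
            leaders.append(idx)
            labels.append(len(leaders) - 1)
    return labels
```

Leader clustering is chosen here only because it is short and order-dependent in an obvious way. A real pipeline would more likely use a distance such as dynamic time warping together with a standard clustering algorithm.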

[257] 2107.12878

Linear Prediction Residual for Efficient Diagnosis of Parkinson's Disease from Gait

Parkinson's Disease (PD) is a chronic and progressive neurological disorder that results in rigidity, tremors and postural instability. There is no definitive medical test for PD, and diagnosis is mostly a clinical exercise. Although guidelines exist, about 10-30% of patients are wrongly diagnosed with PD. Hence, there is a need for an accurate, unbiased and fast method for diagnosis. In this study, we propose LPGNet, a fast and accurate method to diagnose PD from gait. LPGNet uses Linear Prediction Residuals (LPR) to extract discriminating patterns from gait recordings and then uses a 1D convolutional neural network with depth-wise separable convolutions to perform diagnosis. LPGNet achieves an AUC of 0.91 with a 21 times speedup and about 99% fewer parameters than the state of the art. We also analyze the cross-validation strategies used in the literature on PD diagnosis from gait and find that most methods are affected by some form of data leakage between folds, which leads to unnecessarily large models and performance inflated by overfitting. This analysis clears the path for future work to evaluate methods correctly.
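
LPGNet's first stage computes linear prediction residuals of the gait signals. The sketch below uses the textbook autocorrelation method with the Levinson-Durbin recursion. The prediction order and the exact LPR variant used by LPGNet are assumptions here, not details from the paper.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n-k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # signal already perfectly predicted (or all zeros)
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(x, order=8):
    """Residual e[n] = x[n] + sum_j a[j] * x[n-j], assuming zeros before x[0]."""
    a = levinson_durbin(autocorr(x, order), order)
    return [sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(x))]
```

The residual removes the part of each sample that is linearly predictable from its recent past. What remains emphasizes irregularities in the signal, which is the kind of pattern the abstract says LPGNet feeds to its 1D convolutional network.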

[258] 2107.12889

Improved-Mask R-CNN: Towards an Accurate Generic MSK MRI instance segmentation platform (Data from the Osteoarthritis Initiative)

Objective assessment of Magnetic Resonance Imaging (MRI) scans of osteoarthritis (OA) can address the limitations of current OA assessment. Segmentation of bone, cartilage, and joint fluid is necessary for objective OA assessment. Most proposed segmentation methods do not perform instance segmentation and suffer from class imbalance problems. This study deployed Mask R-CNN instance segmentation and improved it (improved-Mask R-CNN (iMaskRCNN)) to obtain a more accurate generalized segmentation of OA-associated tissues. Training and validation of the method were performed using 500 MRI knees from the Osteoarthritis Initiative (OAI) dataset and 97 MRI scans of patients with symptomatic hip OA. Three modifications to Mask R-CNN yielded the iMaskRCNN: adding a second ROIAligned block, adding an extra decoder layer to the mask head, and connecting them by a skip connection. The results were assessed using the Hausdorff distance, the Dice score, and coefficients of variation (CoV). The iMaskRCNN improved bone and cartilage segmentation compared to Mask R-CNN, as indicated by the increase in Dice score from 95% to 98% for the femur, 95% to 97% for the tibia, 71% to 80% for femoral cartilage, and 81% to 82% for tibial cartilage. For effusion detection, the Dice score improved slightly, from 71% with Mask R-CNN to 72% with iMaskRCNN. The CoV values for effusion detection between Reader1 and Mask R-CNN (0.33), Reader1 and iMaskRCNN (0.34), Reader2 and Mask R-CNN (0.22), and Reader2 and iMaskRCNN (0.29) are close to the CoV between the two readers (0.21), indicating high agreement between the human readers and both Mask R-CNN and iMaskRCNN. Mask R-CNN and iMaskRCNN can reliably and simultaneously extract articular tissues of different scales involved in OA, forming the foundation for automated assessment of OA. The iMaskRCNN results show that the modification improved the network performance around the edges.
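
The segmentation results above are reported with the Dice score and the Hausdorff distance. Both are standard metrics. The minimal sketch below works on sets of voxel coordinates. Representing masks as coordinate sets is an assumption of the sketch, and the brute-force Hausdorff computation is quadratic in the number of points, so it only suits small boundaries.

```python
import math

def dice(a, b):
    """Dice = 2 |A & B| / (|A| + |B|) for two collections of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # worst case, over points of p, of the distance to the nearest point of q
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))
```

Dice measures overlap and is dominated by the bulk of a structure. The Hausdorff distance is driven by the single worst boundary point, which is why it is sensitive to the edge improvements mentioned at the end of the abstract.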

[259] 2107.12906

A rigorous formulation of and partial results on Lorenz's "consensus strikes back" phenomenon for the Hegselmann-Krause model

In a 2006 paper, Jan Lorenz observed a curious behaviour in numerical simulations of the Hegselmann-Krause model: Under some circumstances, making agents more closed-minded can produce a consensus from a dense configuration of opinions which otherwise leads to fragmentation. Suppose one considers initial opinions equally spaced on an interval of length $L$. As first observed by Lorenz, simulations suggest that there are three intervals $[0, L_1)$, $(L_1, L_2)$ and $(L_2, L_3)$, with $L_1 \approx 5.23$, $L_2 \approx 5.67$ and $L_3 \approx 6.84$ such that, when the number of agents is sufficiently large, consensus occurs in the first and third intervals, whereas for the second interval the system fragments into three clusters. In this paper, we prove consensus for $L \leq 5.2$ and for $L$ sufficiently close to 6. These proofs include large computations and in principle the set of $L$ for which consensus can be proven using our approach may be extended with the use of more computing power. We also prove that the set of $L$ for which consensus occurs is open. Moreover, we prove that, when consensus is assured for the equally spaced systems, this in turn implies asymptotic almost sure consensus for the same values of $L$ when initial opinions are drawn independently and uniformly at random. We thus conjecture a pair of phase transitions, making precise the formulation of Lorenz's "consensus strikes back" hypothesis. Our approach makes use of the continuous agent model introduced by Blondel, Hendrickx and Tsitsiklis. Indeed, one contribution of the paper is to provide a presentation of the relationships between the three different models with equally spaced, uniformly random and continuous agents, respectively, which is more rigorous than what can be found in the existing literature.
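
The Hegselmann-Krause dynamics behind Lorenz's experiment are easy to simulate. At each step, every agent moves to the average opinion of all agents within distance 1 of it, itself included. Below is a minimal sketch for equally spaced initial opinions on $[0, L]$. It assumes a confidence bound of 1 and uses a floating-point tolerance to decide when opinions have merged. It only counts final clusters for a finite number of agents and proves nothing about the thresholds $L_1$, $L_2$, $L_3$.

```python
def hk_step(x, eps=1.0):
    """One synchronous Hegselmann-Krause update: each agent moves to the mean
    opinion of all agents within distance eps (itself included)."""
    new = []
    for xi in x:
        nbrs = [xj for xj in x if abs(xj - xi) <= eps]
        new.append(sum(nbrs) / len(nbrs))
    return new

def simulate(L, n, eps=1.0, tol=1e-12, max_steps=10_000):
    """Start from n opinions equally spaced on [0, L] and iterate to a fixed point."""
    x = [L * i / (n - 1) for i in range(n)]
    for _ in range(max_steps):
        y = hk_step(x, eps)
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

def count_clusters(x, gap=1e-6):
    """Number of groups of final opinions separated by more than `gap`."""
    s = sorted(x)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b - a > gap)
```

For example, `count_clusters(simulate(2.0, 21))` returns 1 (consensus), while a much longer interval such as `simulate(20.0, 201)` fragments into several clusters. Probing the narrow windows around $L_1$, $L_2$ and $L_3$ needs many more agents, consistent with the abstract's condition that the number of agents be sufficiently large.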

[260] 2107.12915

Initial Foundation for Predicting Individual Earthquake's Location and Magnitude by Using Glass-Box Physics Rule Learner

Although researchers have accumulated knowledge about seismogenesis and decades of earthquake data, predicting imminent individual earthquakes at a specific time and location remains a long-standing enigma. This study hypothesizes that the observed data conceal hidden rules which may be unraveled by a novel glass-box (as opposed to black-box) physics rule learner (GPRL) framework. Without any predefined earthquake-related mechanisms or statistical laws, GPRL's two essential components, a convolved information index and a transparent link function, seek generic expressions of rules directly from data. Trained on 10 years of data, GPRL appears to identify plausible rules, suggesting a combination of the pseudo power and the pseudo vorticity of released energy in the lithosphere. An independent feasibility test supports the promising role of the unraveled rules in predicting earthquakes' magnitudes and their specific locations. The identified rules and GPRL are in their infancy and require substantial improvement. Still, this study hints at the existence of a data-guided hidden pathway to imminent individual earthquake prediction.

[261] 2107.12970

A Data-driven feature selection and machine-learning model benchmark for the prediction of longitudinal dispersion coefficient

Longitudinal dispersion (LD) is the dominant process of scalar transport in natural streams. An accurate prediction of the LD coefficient (Dl) can produce a leap in the performance of related simulations. Emerging machine learning (ML) techniques provide a self-adaptive tool for this problem. However, most existing studies use an unproven set of four features obtained through simple theoretical deduction, and few studies have examined its reliability and rationality. Moreover, due to the lack of comparative studies, the proper choice of ML models in different scenarios remains unclear. In this study, the Feature Gradient selector was first adopted to distill locally optimal feature sets directly from multivariable data. Then, a globally optimal feature set (the channel width, the flow velocity, the channel slope and the cross-sectional area) was proposed by numerically comparing the performance of the distilled local optima with representative ML models. The channel slope is identified as the key parameter for predicting Dl. Further, we designed a weighted evaluation metric that enables comprehensive model comparison. With a simple linear model as the baseline, a benchmark of single and ensemble learning models is provided, and the advantages and disadvantages of the methods involved are discussed. Results show that the support vector machine performs significantly better than the other models. The decision tree is not suitable for this problem due to its poor generalization ability. Notably, simple models outperform complicated models on this low-dimensional problem, owing to their better balance between regression and generalization.

[262] 2107.12974

Practical quantum multiparty signatures using quantum key distribution networks

Digital signatures are widely used to secure communications. At the same time, the security of currently deployed digital signature protocols is based on unproven computational assumptions. An efficient way to ensure unconditional (information-theoretic) security of communication is to use quantum key distribution (QKD), whose security is based on the laws of quantum mechanics. In this work, we develop an unconditionally secure signatures (USS) scheme that guarantees authenticity and transferability of arbitrary-length messages in a QKD network. In the proposed setup, the QKD network consists of two subnetworks: (i) an internal network that includes the signer and has a limit on the number of malicious nodes, and (ii) an external network with no assumptions on the number of malicious nodes. The price of dropping the trust assumption in the external subnetwork is that external recipients need assistance from internal recipients to verify message-signature pairs. We provide a comprehensive security analysis of the developed scheme, optimize the scheme parameters with respect to secret key consumption, and demonstrate that the developed scheme is compatible with the capabilities of currently available QKD devices.

[263] 2107.12975

Cross-architecture Tuning of Silicon and SiGe-based Quantum Devices Using Machine Learning

The potential of Si and SiGe-based devices for scaling quantum circuits is tainted by device variability. Each device needs to be tuned to its operating conditions. We take a key step towards tackling this variability with an algorithm that, without modification, can tune a 4-gate Si FinFET, a 5-gate GeSi nanowire and a 7-gate SiGe heterostructure double quantum dot device from scratch. We achieve tuning times of 30, 10, and 92 minutes, respectively. The algorithm also provides insight into the parameter space landscape of each of these devices. These results show that machine learning enables overarching solutions for the tuning of quantum devices.

[264] 2107.12978

Optimizing Operating Points for High Performance Lesion Detection and Segmentation Using Lesion Size Reweighting

There are many clinical contexts that require accurate detection and segmentation of all focal pathologies (e.g. lesions, tumours) in patient images. When there is a mix of small and large lesions, standard binary cross entropy loss results in better segmentation of large lesions at the expense of missing small ones. Adjusting the operating point to detect all lesions accurately generally leads to oversegmentation of large lesions. In this work, we propose a novel reweighting strategy to eliminate this performance gap, increasing small pathology detection performance while maintaining segmentation accuracy. Experiments on a large-scale, multi-scanner, multi-center dataset of Multiple Sclerosis patient images show that our reweighting strategy vastly outperforms competing strategies.
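
The abstract does not spell out the weighting, so the following is only one plausible illustration of lesion-size reweighting, not the paper's scheme. Each connected lesion in the ground-truth mask receives the same total weight, so a one-voxel lesion counts as much as a large one in a weighted binary cross-entropy. The 4-connectivity, the equal-total-weight rule, and all function names are assumptions.

```python
import math
from collections import deque

def label_lesions(mask):
    """4-connected component labelling of a 2D binary mask (list of lists of 0/1).
    Returns a label image (0 = background) and a dict {label: size in voxels}."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    sizes, n_labels = {}, 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                n_labels += 1
                labels[i][j] = n_labels
                queue, count = deque([(i, j)]), 0
                while queue:  # breadth-first flood fill of one lesion
                    y, x = queue.popleft()
                    count += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n_labels
                            queue.append((ny, nx))
                sizes[n_labels] = count
    return labels, sizes

def size_weights(mask):
    """Hypothetical rule: every lesion gets the same total weight (the mean lesion
    size), spread evenly over its voxels; background voxels keep weight 1."""
    labels, sizes = label_lesions(mask)
    mean_size = sum(sizes.values()) / len(sizes) if sizes else 1.0
    return [[mean_size / sizes[l] if l else 1.0 for l in row] for row in labels]

def weighted_bce(prob, mask, weights, eps=1e-7):
    """Voxel-weighted binary cross-entropy, normalised by the total weight."""
    total = norm = 0.0
    for prow, mrow, wrow in zip(prob, mask, weights):
        for p, y, wt in zip(prow, mrow, wrow):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= wt * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            norm += wt
    return total / norm
```

Under unweighted binary cross-entropy, missing a one-voxel lesion costs one voxel's worth of loss, however clinically important that lesion is. Under this illustrative rule, it costs as much as missing any other whole lesion.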