Proceedings of Machine Learning Research

Proceedings of The 12th Asian Conference on Machine Learning
Held in Bangkok, Thailand on 18-20 November 2020
Published as Volume 129 by the Proceedings of Machine Learning Research on 25 September 2020.
Volume Edited by: Sinno Jialin Pan, Masashi Sugiyama
Series Editors: Neil D. Lawrence, Mark Reid
https://proceedings.mlr.press/v129/
Thu, 28 Aug 2025 08:26:52 +0000

AARM: Action Attention Recalibration Module for Action Recognition
Most action recognition methods deploy networks pretrained on image datasets, and a common limitation is that these networks can hardly capture the salient features of video clips because of their training strategies. To address this issue, we propose the Action Attention Recalibration Module (AARM), a lightweight but effective module that introduces the attention mechanism to process the feature maps of the network. The proposed module is composed of two novel components: 1) a convolutional attention submodule that obtains inter-channel attention maps and spatial-temporal attention maps during the convolutional stage, and 2) an activation attention submodule that highlights the significant activations during the fully connected stage. Through ablation studies and extensive experiments, we demonstrate that AARM makes networks more sensitive to informative parts and improves their accuracy, achieving state-of-the-art performance on UCF101 and HMDB51.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/zhonghong20a.html

Dual Learning: Theoretical Study and an Algorithmic Extension
Dual learning has been successfully applied in many machine learning applications, including machine translation and image-to-image transformation. The high-level idea of dual learning is very intuitive: if we map an $x$ from one domain to another and then map it back, we should recover the original $x$.
Although its effectiveness has been empirically verified, the theoretical understanding of dual learning is still very limited. In this paper, we aim to understand why and when dual learning works. Based on our theoretical analysis, we further extend dual learning by introducing more related mappings and propose multi-step dual learning, in which we leverage feedback signals from additional domains to improve the quality of the mappings. We prove that multi-step dual learning can boost the performance of standard dual learning under mild conditions. Experiments on WMT 14 English↔German and MultiUN English↔French translations verify our theoretical findings on dual learning, and the results on translations among English, French, and Spanish in MultiUN demonstrate the effectiveness of multi-step dual learning.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/zhao20a.html

A One-step Approach to Covariate Shift Adaptation
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution. However, this assumption is often violated in the real world because of non-stationarity of the environment or bias in sample selection. In this work, we consider a prevalent setting called covariate shift, where the input distribution differs between the training and test stages while the conditional distribution of the output given the input remains unchanged. Most existing methods for covariate shift adaptation are two-step approaches, which first calculate the importance weights and then conduct importance-weighted empirical risk minimization. In this paper, we propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization by minimizing an upper bound of the test risk. We theoretically analyze the proposed method and provide a generalization error bound.
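The two-step baseline described above (compute importance weights, then run importance-weighted empirical risk minimization) can be sketched as follows. This is a minimal illustration, not the paper's one-step method: it assumes the training and test input densities are known 1-D Gaussians, so step 1 is exact, whereas in practice the weights $w(x) = p_{\text{test}}(x)/p_{\text{train}}(x)$ must themselves be estimated from samples. The function names, the Gaussian parameters, and the $\sin$ target are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, train=(0.0, 1.0), test=(1.0, 0.5)):
    """Step 1: w(x) = p_test(x) / p_train(x); both densities are assumed known here."""
    return [gaussian_pdf(x, *test) / gaussian_pdf(x, *train) for x in xs]


def weighted_least_squares(xs, ys, ws):
    """Step 2: minimize sum_i w_i * (y_i - a - b * x_i)^2 in closed form; returns (a, b)."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov_xy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: training inputs follow N(0, 1); test inputs follow N(1, 0.5^2).
    xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # p(y | x) is identical at training and test time: y = sin(x) + small noise.
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    print("unweighted fit (a, b):", weighted_least_squares(xs, ys, [1.0] * len(xs)))
    print("importance-weighted fit (a, b):", weighted_least_squares(xs, ys, importance_weights(xs)))
```

Because a straight line cannot represent $\sin(x)$ everywhere, the weighted fit concentrates on the region where test inputs are dense (around $x = 1$); the quality of the result then hinges on how well the weights of step 1 are estimated, which is the separation between the two steps that the one-step approach avoids.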
We also empirically demonstrate the effectiveness of the proposed method.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/zhang20a.html

Efficient Attention Calibration Network for Real-Time Semantic Segmentation
In recent years, the attention mechanism has been widely used in computer vision, and semantic segmentation, one of the fundamental tasks of computer vision, has developed tremendously as a result. However, because of their huge computational overhead, attention-based approaches are difficult to use in real-time applications such as self-driving. In this paper, we propose a self-calibration method based on self-attention that successfully applies the attention mechanism to real-time semantic segmentation. Specifically, a spatial attention module adjusts the edges of the coarse segmentation results obtained from the real-time semantic segmentation backbone network to produce more fine-grained segmentation results. We refer to this method as the Efficient Attentional Calibration Network (EACNet). Experiments on the Cityscapes dataset validate the efficiency and performance of the method. With high-resolution input and without any post-processing, EACNet achieves 72.4% mIoU while running at 116.9 FPS. Compared with other state-of-the-art methods for real-time semantic segmentation, our network achieves a better balance between performance and speed.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/zha20a.html

Geodesically-convex optimization for averaging partially observed covariance matrices
Symmetric positive definite (SPD) matrices permeate numerous scientific disciplines, including machine learning, optimization, and signal processing.
Equipped with a Riemannian geometry, the space of SPD matrices has compelling properties, and its derived Riemannian mean is now the gold standard in some applications, e.g., brain-computer interfaces (BCI). This paper addresses the problem of averaging covariance matrices with missing variables. This situation often occurs with inexpensive or unreliable sensors, or when artifact-suppression techniques remove corrupted sensors, leading to rank-deficient matrices that hinder the use of Riemannian geometry in covariance-based approaches. An alternative but questionable approach is to remove the matrices with missing variables, which reduces the training set size. We address these limitations and propose a new formulation grounded in geodesic convexity. Our approach is evaluated on synthetic datasets with a controlled number of missing variables and a known baseline, demonstrating the robustness of the proposed estimator. The practical interest of this approach is assessed on real BCI datasets. Our results show that the proposed average is more robust and better suited for classification than classical data imputation methods.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/yger20a.html

NENN: Incorporate Node and Edge Features in Graph Neural Networks
Graph neural networks (GNNs) have attracted increasing attention in recent years. However, most existing state-of-the-art graph learning methods focus only on node features and largely ignore edge features, which contain rich information about graphs in modern applications. In this paper, we propose a novel model that incorporates Node and Edge features in graph Neural Networks (NENN) based on a hierarchical dual-level attention mechanism. NENN consists of node-level attention layers and edge-level attention layers. The two types of layers are stacked alternately to learn and aggregate embeddings for nodes and edges.
Specifically, the node-level attention layer learns the importance of node-based neighbors and edge-based neighbors for each node, while the edge-level attention layer learns the importance of node-based neighbors and edge-based neighbors for each edge. With NENN, the node and edge embeddings can be mutually reinforced. Extensive experiments on academic citation and molecular networks verify the effectiveness of our proposed graph embedding model.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/yang20a.html

Disentangled Representations for Sequence Data using Information Bottleneck Principle
We propose the factorizing variational autoencoder (FAVAE), a generative model for learning disentangled representations from sequential data via the information bottleneck principle without supervision. Real-world data are often generated by a few explanatory factors of variation, and disentangled representation learning recovers these factors from the data. We focus on disentangled representations of sequential data, which can be useful in a wide range of applications, such as video, speech, and stock markets. Factors in sequential data are categorized into dynamic and static ones: dynamic factors are time dependent, and static factors are time independent. Previous models disentangle static from dynamic factors, and dynamic factors with different time dependencies from each other, by explicitly modeling the priors of latent variables. However, these models cannot disentangle representations of dynamic factors that share the same time dependency, such as “picking up” and “throwing” in robotic tasks. In contrast, FAVAE can disentangle multiple dynamic factors via the information bottleneck principle, which does not require modeling priors.
We conducted experiments showing that FAVAE can extract disentangled dynamic factors on synthetic, video, and speech datasets.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/yamada20a.html

Proxy Network for Few Shot Learning
Using a few examples per class to train a predictive model that generalizes to novel classes is a crucial and valuable research direction in artificial intelligence. This work addresses this problem by proposing a few-shot learning (FSL) algorithm, called proxy network, within the meta-learning framework. Metric-learning-based approaches assume that data points within the same class should be close, whereas data points in different classes should be separated as far as possible in the embedding space. We conclude that the success of metric-learning-based approaches lies in the data embedding, the representative of each class, and the distance metric. In this work, we propose a simple but effective end-to-end model that simultaneously learns proxies for the class representatives and the distance metric directly from data. We conduct experiments on the CUB and mini-ImageNet datasets in 1-shot 5-way and 5-shot 5-way scenarios, and the experimental results demonstrate the superiority of our proposed method over state-of-the-art methods. In addition, we provide a detailed analysis of our proposed method.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/xiao20a.html

Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks
It is well known that deep neural networks are vulnerable to adversarial examples, which are constructed by applying small but malicious perturbations to the original inputs.
Moreover, the perturbed inputs can transfer between different models: adversarial examples generated for a specific model will often fool other, unseen models with a significant success rate. This allows an adversary to attack deployed systems without issuing any queries, which could raise severe security issues, particularly in safety-critical scenarios. In this work, we empirically investigate two classes of factors that might influence the transferability of adversarial examples. One class consists of model-specific factors, including network architecture, model capacity, and test accuracy. The other is the local smoothness of the loss surface used to generate adversarial examples. More importantly, building on these findings, we propose a simple but effective strategy to improve transferability, whose effectiveness is confirmed through extensive experiments on both the CIFAR-10 and ImageNet datasets.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/wu20a.html

Robust Document Distance with Wasserstein-Fisher-Rao metric
Computing the distance between linguistic objects is an essential problem in natural language processing. The word mover’s distance (WMD) has been successfully applied to measure document distance by combining low-level word similarity with the framework of optimal transport (OT). However, because of the global transportation nature of OT, WMD may overestimate semantic dissimilarity when documents contain unequal semantic details. In this paper, we address this overestimation issue with a novel Wasserstein-Fisher-Rao (WFR) document distance grounded in unbalanced optimal transport theory. Compared with WMD, the WFR document distance provides a trade-off between global transportation and local truncation, which leads to a better similarity measure for unequal semantic details.
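To make the optimal-transport view of WMD concrete: each document is a distribution over the embedding vectors of its words, and the distance is the minimum total cost of moving one distribution onto the other. The toy sketch below assumes two documents with the same number of distinct words and uniform word weights; in that case an optimal transport plan can be taken to be a permutation (Birkhoff-von Neumann theorem), so brute force over assignments is exact for a handful of words. The 2-D vectors are invented, and this is plain balanced OT, not the paper's WFR distance, which allows mass to be created or destroyed at a cost.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def uniform_ot_distance(doc_a, doc_b):
    """Balanced OT cost between two equal-size sets of word vectors, mass 1/n per word.

    With uniform, equal-size marginals an optimal plan can be taken to be a
    permutation, so brute force over permutations is exact (but only feasible
    for a handful of words).
    """
    if len(doc_a) != len(doc_b) or not doc_a:
        raise ValueError("this toy version needs two non-empty documents of equal size")
    n = len(doc_a)
    best = min(
        sum(euclidean(doc_a[i], doc_b[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n  # every word ships mass 1/n


# Hypothetical 2-D "embeddings"; real WMD uses pretrained word vectors.
EMB = {
    "president": (1.0, 0.0), "leader": (0.9, 0.1),
    "speaks": (0.0, 1.0), "talks": (0.1, 0.9),
    "media": (0.5, 0.5), "press": (0.55, 0.45),
}

if __name__ == "__main__":
    doc_a = [EMB[w] for w in ("president", "speaks", "media")]
    doc_b = [EMB[w] for w in ("leader", "talks", "press")]
    print(uniform_ot_distance(doc_a, doc_b))
```

Because all of the mass must be moved, a word in one document with no close counterpart in the other still contributes its full transport cost; this is the global-transportation behavior that the abstract identifies as a source of overestimation.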
Moreover, an efficient pruning strategy is designed specifically for the WFR document distance to facilitate top-k queries among a large number of documents. Extensive experimental results show that the WFR document distance achieves higher accuracy than WMD and even its supervised variant, s-WMD.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/wang20c.html

Inferring Continuous Treatment Doses from Historical Data via Model-Based Entropy-Regularized Reinforcement Learning
Developments in Reinforcement Learning and the availability of healthcare data sources such as Electronic Health Records (EHR) provide an opportunity to derive data-driven treatment dose recommendations for patients and improve clinical outcomes. Recent studies have focused on deriving discretized dosages using offline historical data extracted from EHR. In this paper, we propose an Actor-Critic framework to infer continuous dosages for treatment recommendation and demonstrate its advantages in numerical stability as well as interpretability. In addition, we incorporate a Bayesian Neural Network as a simulation model, together with probability-based regularization techniques, to alleviate distribution shift in offline learning environments and thereby increase practical safety. Experiments on a real-world EHR dataset, MIMIC-III, show that our approach achieves improved performance compared with other baseline methods while maintaining similarity to expert clinician treatments.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/wang20b.html

Deep Dynamic Boosted Forest
Random forest is widely used as an ensemble learning method. In many practical applications, however, learning from imbalanced data remains a significant challenge.
To alleviate this limitation, we propose the deep dynamic boosted forest (DDBF), a novel ensemble algorithm that incorporates the notion of hard example mining into random forest. Specifically, we propose to measure the quality of each leaf node of every decision tree in the random forest to determine hard examples. By iteratively training and then removing easy examples from the training data, we dynamically evolve the random forest to focus on hard examples, so as to balance the proportion of samples and learn better decision boundaries. Data can then be cascaded in sequence through the random forests learned in each iteration to generate more accurate predictions. Our DDBF outperforms random forest on 5 UCI datasets, MNIST and SATIMAGE, and achieves state-of-the-art results compared with other deep models. Moreover, we show that DDBF is also a new way of sampling and can be very useful and efficient when learning from imbalanced data.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/wang20a.html

Thompson Sampling for Unsupervised Sequential Selection
Thompson Sampling has generated significant interest because of its better empirical performance compared with upper-confidence-bound-based algorithms. In this paper, we study a Thompson Sampling based algorithm for the Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandit problem in which the loss of an arm cannot be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from arms up to the selected arm. The learner’s goal is to find the arm that minimizes the expected total loss, where the total loss is the sum of the cost incurred for selecting the arm and the stochastic loss associated with the selected arm.
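For readers unfamiliar with Thompson Sampling, the sketch below shows the textbook Beta-Bernoulli version for an ordinary stochastic multi-armed bandit, in which the reward of the pulled arm is observed. It is not the paper's USS algorithm: in USS the loss of an arm cannot be observed and arms form a cost-ordered cascade, so that algorithm necessarily differs. The arm means, horizon, and seed are arbitrary illustrative choices.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a standard bandit; returns pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # posterior for arm a is Beta(1 + successes[a], 1 + failures[a])
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior, then play the best draw.
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # Simulated Bernoulli reward from the (unknown to the learner) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.7], horizon=2000))
```

Sampling from the posterior rather than using its mean is what drives exploration: arms with few observations have wide posteriors and occasionally produce the largest draw, while the empirically best arm is played more and more often.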
The problem is challenging because, without knowing the mean loss, one cannot compute the total loss for the selected arm. Clearly, learning is feasible only if the optimal arm can be inferred from the problem structure. As shown in prior work, learning is possible when the problem instance satisfies the so-called ‘Weak Dominance’ (WD) property. Under WD, we show that our Thompson Sampling based algorithm for the USS problem achieves near-optimal regret and has better numerical performance than existing algorithms.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/verma20a.html

Learning from Label Proportions with Consistency Regularization
The problem of learning from label proportions (LLP) involves training classifiers with weak labels on bags of instances, rather than strong labels on individual instances. The weak labels contain only the label proportion of each bag. The LLP problem is important for many practical applications that only allow label proportions to be collected because of data privacy or annotation cost, and it has recently received much research attention. Most existing works focus on extending supervised learning models to solve the LLP problem, but the weakly supervised nature of the problem makes it hard to further improve LLP performance from a supervised angle. In this paper, we take a different angle, drawing on semi-supervised learning. In particular, we propose a novel model inspired by consistency regularization, a popular concept in semi-supervised learning that encourages the model to produce a decision boundary that better describes the data manifold. With the introduction of consistency regularization, we further extend our study to non-uniform bag-generation and validation-based parameter-selection procedures that better match practical needs.
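The weak supervision in LLP can be made concrete with a bag-level proportion loss that is common in the LLP literature: average the model's predicted positive-class probabilities over a bag and compare that average with the bag's known label proportion using cross-entropy. This is a minimal binary-class sketch with made-up probabilities; whether the paper uses exactly this term is an assumption here, and its consistency-regularization component is not shown.

```python
import math


def bag_proportion_loss(instance_probs, bag_proportion, eps=1e-12):
    """Cross-entropy between a bag's known positive proportion and its mean prediction.

    instance_probs: predicted P(y = 1 | x) for every instance in the bag.
    bag_proportion: fraction of positives in the bag (the only label available).
    """
    p_hat = sum(instance_probs) / len(instance_probs)
    p_hat = min(max(p_hat, eps), 1.0 - eps)  # keep both logarithms finite
    return -(bag_proportion * math.log(p_hat) + (1.0 - bag_proportion) * math.log(1.0 - p_hat))


if __name__ == "__main__":
    # A hypothetical bag of four instances known to contain 50% positives.
    print(bag_proportion_loss([0.9, 0.8, 0.1, 0.2], 0.5))
    print(bag_proportion_loss([0.9, 0.9, 0.9, 0.9], 0.5))
```

Reordering the predictions, e.g. to `[0.1, 0.2, 0.9, 0.8]`, leaves the loss unchanged even though it flips which instances look positive; the bag-level signal alone cannot tell these solutions apart, which illustrates why LLP is hard to improve from a purely supervised angle.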
Experiments not only show that LLP with consistency regularization achieves superior performance, but also demonstrate the practical usability of the proposed procedures.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/tsai20a.html

Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis
Algorithm selection (AS) deals with automatically selecting, from a fixed set of candidate algorithms, the algorithm most suitable for a specific instance of an algorithmic problem class, where “suitability” often refers to an algorithm’s runtime. Because candidate algorithms may have extremely long runtimes, training data for algorithm selection models is usually generated under time constraints, in the sense that not all algorithms are run to completion on all instances. Training data therefore usually contains censored information, as the true runtime of algorithms that timed out remains unknown. However, many standard AS approaches cannot handle such information properly. Survival analysis (SA), on the other hand, naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime, as we demonstrate in this work. We leverage such models as the basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive. Moreover, taking advantage of a framework of this kind, we advocate a risk-averse approach to algorithm selection, in which avoiding a timeout is given high priority. In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
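The point about censoring can be illustrated with the Kaplan-Meier estimator, the textbook survival-analysis estimate of a runtime distribution from runs that may have timed out. This sketch is not Run2Survive itself, which learns instance-conditional runtime models; it only shows how a timed-out run contributes the information "runtime exceeds the cutoff" instead of being discarded. The runtimes below are invented.

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate of S(t) = P(runtime > t) from possibly censored runs.

    observations: (time, finished) pairs; finished=False marks a run that was
    stopped at `time` before completing (right-censored, e.g. it hit the cutoff).
    Returns the step points [(event_time, survival_probability), ...].
    """
    obs = sorted(observations)
    at_risk = len(obs)  # runs still going just before the current time
    survival = 1.0
    curve = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        finished_here = 0
        leaving = 0
        # Group every run that ended at time t, whether it finished or was censored.
        while i < len(obs) and obs[i][0] == t:
            finished_here += int(obs[i][1])
            leaving += 1
            i += 1
        if finished_here:
            # Runs censored at t still count as at risk at t (the usual convention).
            survival *= 1.0 - finished_here / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Five hypothetical runs with a cutoff of 10 time units; two of them timed out.
    runs = [(3, True), (5, True), (7, True), (10, False), (10, False)]
    print(kaplan_meier(runs))
```

In this example the two timed-out runs keep the estimated probability of needing more than 7 time units at 0.4; discarding them would drive the estimate to 0 at $t = 7$ and make the instances look easier than they are.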
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/tornede20a.html

Polytime Decomposition of Generalized Submodular Base Polytopes with Efficient Sampling
We consider the problem of efficiently decomposing a given point $x$ in an $n$-dimensional convex polytope into a convex combination of the polytope's extreme points. Besides its broad scope in the theory of convex polytopes, the problem also has applications in online combinatorial optimization. Towards this end, we first propose a general class of convex polytopes, Generalized Submodular Base Polytopes (GSBPs), that includes several well-known convex polytopes as special cases, including the permutahedron, $k$-forest, spanning tree, and combinatorial subset choice polytopes. We next propose a general decomposition algorithm for this class of GSBPs that uses the novel idea of first decomposing the given point into at most $n$ face centers, and then decomposing each face center into extreme points of its corresponding face. In addition, we discover special classes of partition-respecting and symmetric GSBPs for which these two steps can be performed in $O(n^2 + nT(f))$ and $O(n^2T(f))$ time, respectively. We also give a complete characterization of the underlying submodular function $f$ for which the associated GSBP satisfies these properties. Interestingly, we show that the support of the decomposition produced by our algorithm contains only $\mathrm{poly}(n)$ extreme points, which allows efficient sampling from the resulting distribution. Finally, we corroborate our theoretical results with empirical evaluations.
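As a concrete instance of the decomposition problem, consider the $k$-subset polytope $\{x \in [0,1]^n : \sum_i x_i = k\}$, the base polytope of the uniform matroid with rank function $f(S) = \min(|S|, k)$, whose extreme points are the 0/1 vectors with exactly $k$ ones. The sketch below uses classical systematic sampling rather than the paper's face-center algorithm: it lays the coordinates end to end on $[0, k)$, and each offset $u \in [0,1)$ selects the items whose intervals contain $u, u+1, \dots, u+k-1$. Grouping offsets between consecutive breakpoints yields a convex combination of at most $n$ extreme points whose marginals equal $x$. Using exact `fractions.Fraction` arithmetic is an implementation choice made so the weights come out exact.

```python
import math
from fractions import Fraction


def decompose_k_subset(x):
    """Write x (0 <= x_i <= 1, sum(x) = integer k) as a convex combination of k-subsets.

    Returns a list of (weight, subset) pairs with positive weights summing to 1.
    """
    x = [Fraction(v) for v in x]
    if any(v < 0 or v > 1 for v in x):
        raise ValueError("coordinates must lie in [0, 1]")
    total = sum(x)
    if total.denominator != 1:
        raise ValueError("coordinates must sum to an integer k")
    k = int(total)
    # Item i owns the half-open interval [cum[i], cum[i + 1]) of [0, k).
    cum = [Fraction(0)]
    for v in x:
        cum.append(cum[-1] + v)
    # The selected subset only changes when u crosses the fractional part of some cum[i].
    breakpoints = sorted({c - math.floor(c) for c in cum} | {Fraction(0), Fraction(1)})
    result = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        u = (lo + hi) / 2  # every offset strictly inside (lo, hi) selects the same subset
        subset = frozenset(
            i for i in range(len(x)) for j in range(k) if cum[i] <= u + j < cum[i + 1]
        )
        result.append((hi - lo, subset))
    return result


if __name__ == "__main__":
    x = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 2), Fraction(1, 2)]  # sums to k = 2
    for weight, subset in decompose_k_subset(x):
        print(weight, sorted(subset))
```

For $x = (1/3, 2/3, 1/2, 1/2)$ this gives $x = \tfrac13 \mathbf{1}_{\{0,2\}} + \tfrac16 \mathbf{1}_{\{1,2\}} + \tfrac12 \mathbf{1}_{\{1,3\}}$ (items indexed from 0), and sampling a subset with these weights includes each item $i$ with probability exactly $x_i$.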
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/saha20a.html

Semantic-Guided Shared Feature Alignment for Occluded Person Re-IDentification
Occluded person Re-ID is a challenging task that remains unresolved. Instead of extracting features over the entire image, which easily causes mismatching, we propose the Semantic-Guided Shared Feature Alignment (SGSFA) method to extract features focusing on the non-occluded parts. SGSFA parses human body regions through a Semantic Guided (SG) branch and simultaneously aligns regions through a Spatial Feature Alignment (SFA) branch, obtaining enriched representations over the regions for Re-ID. A dynamic classification loss on spatial features and their dynamic sequential combinations during training helps to promote feature diversity. During the matching stage, we use only the visible features shared by the probe and gallery, with no extra cues. Experimental results show that SGSFA achieves rank-1 accuracies of 62.3% and 50.5% on Occluded-DukeMTMC and P-DukeMTMC-reID, respectively, surpassing the state of the art by a large margin.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/ren20a.html

A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning
This paper proposes a Deep Reinforcement Learning (DRL) approach for solving the knapsack problem. The proposed method includes a state aggregation step, based on tabular reinforcement learning, that extracts features and constructs states. The state aggregation policy is applied to each instance of the knapsack problem and used with the Advantage Actor Critic (A2C) algorithm to train a policy through which items are selected sequentially, one at each time step. The method is a constructive solution approach: the process of selecting items is repeated until the final solution is obtained.
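Since the knapsack entry measures its results against a greedy algorithm and against optimal solutions, the sketch below shows both reference points: a common greedy heuristic (scan items by decreasing value-to-weight ratio and keep each one that still fits) and the classic dynamic program for the exact 0/1 optimum with integer weights. The instance is a small textbook-style example chosen so that greedy is suboptimal; neither function is the paper's DRL method, and the paper's greedy baseline may differ in detail.

```python
def greedy_knapsack(values, weights, capacity):
    """Greedy heuristic: scan items by decreasing value/weight ratio, keep each one that fits.

    Assumes positive weights.
    """
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    total_value, remaining = 0, capacity
    for i in order:
        if weights[i] <= remaining:
            total_value += values[i]
            remaining -= weights[i]
    return total_value


def optimal_knapsack(values, weights, capacity):
    """Exact 0/1 knapsack value by dynamic programming (positive integer weights)."""
    best = [0] * (capacity + 1)  # best[c] = max value from the items seen so far within capacity c
    for v, w in zip(values, weights):
        # Go downward over capacities so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


if __name__ == "__main__":
    values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
    print("greedy:", greedy_knapsack(values, weights, capacity))
    print("optimal:", optimal_knapsack(values, weights, capacity))
```

On this instance the greedy heuristic returns 160 (it commits to the highest-ratio item of weight 10), while the optimum is 220 (the items of weight 20 and 30). The dynamic program runs in $O(n \cdot \text{capacity})$ time, which is why learned or heuristic constructive methods become attractive for large capacities.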
The experiments show that our approach provides close-to-optimal solutions for all tested instances, outperforms the greedy algorithm, and, compared with an existing DRL approach, is more flexible and able to handle larger instances. In addition, the results demonstrate that the proposed model with the state aggregation strategy not only gives better solutions but also learns in fewer timesteps than the model without state aggregation.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/refaei-afshar20a.html

A New Representation Learning Method for Individual Treatment Effect Estimation: Split Covariate Representation Network
Individual treatment effect (ITE) estimation is widely used in many essential fields, such as medicine and education. However, two problems, unknown counterfactual outcomes and confounders, are barriers to good ITE estimation. Although some representation learning methods based on the potential outcome framework have been proposed to solve these problems, we find that most previous works assume that all features (also called covariates) of a unit are confounders. This assumption often does not hold, because instrumental variables, adjustment variables, and irrelevant variables can also be included among the features. Therefore, this paper proposes a simple method to split covariates, together with a network, the Split Covariate Representation Network (SCRNet), which estimates ITE using the different kinds of variables. Experimental results show that our method outperforms other state-of-the-art methods on IHDP, a semi-synthetic dataset, and Jobs, a real-world dataset.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/qidong20a.html

Scaling up Simhash
The seminal work of Charikar (2002) gives a space-efficient sketching algorithm (Simhash) that compresses real-valued vectors to binary vectors while maintaining an estimate of the cosine similarity between any pair of the original real-valued vectors. In this work, we propose a sketching algorithm, Simsketch, that can be applied on top of the output of Simhash. It further reduces the data dimension while maintaining an estimate of the cosine similarity between the original real-valued vectors, and consequently helps scale up Simhash. We present theoretical bounds for our result and complement them with experiments on public datasets. Our proposed algorithm is simple and efficient, and can therefore be adopted in practice.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/pratap20b.html

Randomness Efficient Feature Hashing for Sparse Binary Data
We present sketching algorithms for sparse binary datasets that keep the dataset binary after sketching while simultaneously preserving multiple similarity measures, such as Jaccard similarity, cosine similarity, inner product, and Hamming distance, on the same sketch. A major advantage of our algorithms is that they are randomness efficient: they require significantly fewer random bits for sketching (logarithmic in the dimension, whereas other competitive algorithms require a number linear in the dimension). Our proposed algorithms are efficient, offer a compact sketch of the dataset, and can be efficiently deployed in a distributed setting. We present a theoretical analysis of our approach and complement it with extensive experiments on public datasets. For the analysis, our algorithms require a natural assumption on the dataset.
We empirically verify this assumption and observe that it holds on several real-world datasets.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/pratap20a.html

PSForest: Improving Deep Forest via Feature Pooling and Error Screening
In recent years, most research on deep learning has been based on deep neural networks, which use the backpropagation algorithm to train the parameters of nonlinear layers. Recently, Zhou and Feng proposed a non-NN-style deep model called Deep Forest, or gcForest, a deep learning model based on random forests whose training process does not rely on backpropagation. In this paper, we propose PSForest, which can be regarded as a modification of the standard Deep Forest. The main idea for improving the efficiency and performance of Deep Forest is to perform multi-grained pooling of raw features and to screen the class vector of each layer based on out-of-bag error. Experiments on different datasets show that our proposed model achieves predictive accuracy comparable to or better than gcForest, with a lower memory requirement and smaller time cost. This study significantly improves the competitiveness of deep forests, further demonstrating that deep learning is more than just deep neural networks.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/ni20a.html

Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement Learning
Guilt aversion causes people to experience a utility loss if they believe they have disappointed others, and this promotes cooperative behaviour in humans. In psychological game theory, guilt aversion necessitates modelling agents that have a theory about what other agents think, also known as Theory of Mind (ToM).
We aim to build a new kind of affective reinforcement learning agent, called Theory of Mind Agents with Guilt Aversion (ToMAGA), which are equipped with the ability to think about the wellbeing of others instead of just their own self-interest. To validate the agent design, we use a general-sum game known as Stag Hunt as a test bed. As standard reinforcement learning agents can learn suboptimal policies in social dilemmas like Stag Hunt, we propose to use belief-based guilt aversion as a reward shaping mechanism. We show that our belief-based guilt-averse agents can efficiently learn cooperative behaviours in Stag Hunt games.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/nguyen20a.html

Learning Code Changes by Exploiting Bidirectional Converting Deviation
Software systems evolve through constant code changes as requirements change or bugs are found. Assessing the quality of code changes is a vital part of software development. However, most existing software mining methods inspect software data from a static view and learn global code semantics from a snapshot of code; they cannot capture the semantic information of small changes and under-represent rich historical code changes. How to build a model that emphasizes code changes remains a great challenge. In this paper, we propose a novel deep neural network called CCL, which models a forward converting process from the code before the change to the code after the change, and an inverse backward converting process, so that change representations of both converting processes can be learned. By exploiting the deviation between the converting processes, the network can evaluate the code change. Experimental results on open-source projects indicate that CCL significantly outperforms the compared methods in code change learning.
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/mi20a.html https://proceedings.mlr.press/v129/mi20a.html Data-Dependent Conversion to a Compact Integer-Weighted Representation of a Weighted Voting Classifier We propose a method of converting a real-weighted voting classifier to a compact integer-weighted voting classifier. Real-weighted voting classifiers, such as those trained using boosting, are very popular and widely used due to their high prediction performance. Real numbers, however, are space-consuming and their floating-point arithmetic is slow compared to integer arithmetic, so compact integer weights are preferable for implementation on devices with limited computational resources. Our conversion makes use of given feature vectors and solves an integer linear programming problem that minimizes the sum of integer weights under the constraint that the classification results for these vectors remain unchanged. According to our experimental results on datasets from the UCI Machine Learning Repository, the bit representation sizes are reduced to between $5.2$% and $33.4$%, with a test accuracy degradation of at most $3.7$%, in 7 of 8 datasets for weighted voting classifiers of decision stumps learned using AdaBoost-SAMME. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/maekawa20a.html https://proceedings.mlr.press/v129/maekawa20a.html MetAL: Active Semi-Supervised Learning on Graphs via Meta-Learning The objective of active learning (AL) is to train classification models with fewer labeled instances by selecting only the most informative instances for labeling. AL algorithms designed for other data types, such as images and text, do not perform well on graph-structured data. Although a few heuristics-based AL algorithms have been proposed for graphs, a principled approach is lacking. In this paper, we propose MetAL, an AL approach that selects unlabeled instances that directly improve the future performance of a classification model.
For a semi-supervised learning problem, we formulate the AL task as a bilevel optimization problem. Based on recent work in meta-learning, we use meta-gradients to approximate the impact on model performance of retraining the model with any unlabeled instance. Using multiple graph datasets belonging to different domains, we demonstrate that MetAL efficiently outperforms existing state-of-the-art AL algorithms. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/madhawa20a.html https://proceedings.mlr.press/v129/madhawa20a.html Localizing and Amortizing: Efficient Inference for Gaussian Processes Inference in Gaussian processes (GPs) concerns the distribution of the underlying function given observed data points. GP inference based on local ranges of data points is able to capture fine-scale correlations and allows a fine-grained decomposition of the computation. Following this direction, we propose a new inference model that considers the correlations and observations of the K nearest neighbors for the inference at each data point. Compared with previous works, we also eliminate the data-ordering prerequisite, which simplifies the inference process. Additionally, the inference task is decomposed into small subtasks through several technical innovations, which makes our model well suited to stochastic optimization. Since the decomposed subtasks share the same structure, we further speed up the inference procedure with amortized inference. Our model runs efficiently and achieves good performance on several benchmark tasks. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/liu20b.html https://proceedings.mlr.press/v129/liu20b.html Network Representation Learning Algorithm Based on Neighborhood Influence Sequence Network representation learning (NRL) plays an important role in network analysis; it aims to represent complex networks more concisely by transforming nodes into low-dimensional vectors.
However, most current work uses only the network structure and node attributes to learn network representations, and often ignores the historical interactions between nodes, which affect future interactions. We therefore propose a network representation learning algorithm based on neighborhood influence sequences (NIS) that investigates the influence of historical node interactions on future interactions. We propose three kinds of influence that arise when two nodes interact, and integrate them into NIS by introducing the Hawkes process. In experiments, we compare our model with existing NRL models on four real-world datasets. Experimental results demonstrate that the embeddings learned by the proposed NIS model achieve better performance than state-of-the-art methods in various tasks, including node classification, link prediction, and network visualization. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/liu20a.html https://proceedings.mlr.press/v129/liu20a.html Atlas-aware ConvNet for Accurate yet Robust Anatomical Segmentation Convolutional networks (ConvNets) have achieved promising accuracy for various anatomical segmentation tasks. Despite this success, these methods can be sensitive to appearance variations that are unseen in the training distribution. Considering the large variability of scans caused by artifacts, pathologies, and scanning setups, the robustness of ConvNets poses a major challenge for their clinical application, yet it has not been much explored. In this paper, we propose to mitigate this challenge by making ConvNets aware of the underlying anatomical invariances among imaging scans. Specifically, we introduce a fully convolutional Constraint Adoption Module (CAM) that incorporates probabilistic atlas priors as explicit constraints for predictions over a locally connected Conditional Random Field (CRF), which effectively reinforces the anatomical consistency of the labeling outputs.
We design the CAM to be flexible enough to boost various ConvNets, and compact enough to be co-optimized with ConvNets for the fusion parameters that lead to optimal performance. On two brain parcellation tasks, we show that fusing such atlas priors is advantageous in two ways. First, our models achieve state-of-the-art accuracy among ConvNet-based methods on both datasets by significantly reducing structural abnormalities in the predictions. Second, we can largely boost the robustness of existing ConvNets, as shown by (i) testing on scans with synthetic pathologies, and (ii) training and evaluating on scans from different scanning setups across datasets. Our method can easily be adopted by existing ConvNets by fine-tuning them with the CAM plugged in to boost accuracy and robustness. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/liang20a.html https://proceedings.mlr.press/v129/liang20a.html Scalable Calibration of Affinity Matrices from Incomplete Observations Estimating pairwise affinity matrices for given data samples is a basic problem in data processing applications. Accurately determining the affinity becomes impossible when the samples are not fully observed, and approximate estimates have to be sought. In this paper, we investigate calibration approaches to improve the quality of an approximate affinity matrix. By projecting the matrix onto a closed and convex subset of matrices that meets specific constraints, the calibrated result is guaranteed to be nearer to the unknown true affinity matrix than the uncalibrated matrix, except in the rare case where the two are identical. To realize the calibration, we develop two simple, efficient, and effective algorithms that scale well: one applies cyclic updates and the other applies parallel updates. In a series of evaluations, the empirical results confirm the theoretical benefits of the proposed algorithms and demonstrate their high potential in practical applications.
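The calibration abstract above relies on a standard fact: projecting onto a closed convex set that contains the true matrix never moves an estimate farther from it. A minimal Python sketch follows, assuming a simpler constraint set than the authors may use (symmetric matrices with unit diagonal and entries in [-1, 1], which contains any true cosine-similarity matrix); the function names are hypothetical. Because this set splits into independent entry pairs, its Frobenius-norm projection has a closed form: average each off-diagonal pair, then clip.

```python
from typing import List

Matrix = List[List[float]]


def calibrate(a: Matrix) -> Matrix:
    """Frobenius-norm projection onto {symmetric, unit diagonal, entries in [-1, 1]}."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out[i][i] = 1.0  # self-affinity is fixed at 1
        for j in range(i + 1, n):
            # The nearest equal pair is the mean; then clip the mean into [-1, 1].
            v = min(1.0, max(-1.0, 0.5 * (a[i][j] + a[j][i])))
            out[i][j] = out[j][i] = v
    return out


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


if __name__ == "__main__":
    true = [[1.0, 0.5, -0.2], [0.5, 1.0, 0.3], [-0.2, 0.3, 1.0]]
    noisy = [[0.9, 1.3, -0.1], [0.6, 1.2, 0.2], [-0.5, 0.4, 1.1]]
    fixed = calibrate(noisy)
    print(frobenius_distance(noisy, true), frobenius_distance(fixed, true))
```

A richer constraint set, such as positive semidefiniteness, would need an eigendecomposition and an iterative scheme (for example, cyclic or parallel updates as the abstract mentions), but the distance guarantee follows from the same projection argument.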
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/li20b.html https://proceedings.mlr.press/v129/li20b.html DFQF: Data Free Quantization-aware Fine-tuning Data-free quantization of deep neural networks is a practical challenge, since the original training data are often unavailable due to privacy, proprietary, or transmission issues. Existing methods implicitly equate data-free with training-free and quantize the model manually by analyzing the weight distribution, which leads to a significant accuracy drop for quantization below 6 bits. In this work, we propose data-free quantization-aware fine-tuning (DFQF), in which no real training data is required and the quantized network is fine-tuned with generated images. Specifically, we first train a generator from the pre-trained full-precision network with an inception score loss, a batch-normalization statistics loss, and an adversarial loss to synthesize a fake image set. We then fine-tune the quantized student network with the full-precision teacher network and the generated images using knowledge distillation (KD). The proposed DFQF outperforms state-of-the-art post-training quantization methods and achieves W4A4 quantization of ResNet20 on the CIFAR10 dataset within a 1% accuracy drop. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/li20a.html https://proceedings.mlr.press/v129/li20a.html Monte-Carlo Graph Search: the Value of Merging Similar States We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and a limited computational budget. Although the underlying MDP transitions have a graph structure, popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not merge two similar states that are reached via different trajectories and represented in separate branches of the tree.
In this work, we propose a graph-based planning algorithm, which takes into account this state similarity. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where the trajectories overlap, and recovers it otherwise. Then, we show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/leurent20a.html https://proceedings.mlr.press/v129/leurent20a.html Scalable Inference on the Soft Affiliation Graph Model for Overlapping Community Detection The Soft Affiliation Graph model (S-AGM) is a Bayesian generative model of overlapping community structure in social networks. Inference on this model is challenging due to the complexity of both the underlying network structure and the presence of non-conjugacy in the model. Scalable MCMC on the model is possible through the use of Stochastic Gradient Riemannian Langevin Dynamics (SGRLD). In this paper, we develop a novel and scalable Stochastic Gradient Variational Inference (SG-VI) algorithm and compare it to SGRLD inference. Similarly to MCMC inference, handling non-conjugacy in the S-AGM is a significant challenge for developing an SG-VI and requires the application of stochastic Monte Carlo estimation. We carry out a thorough empirical comparison of the SG-VI and SGRLD approaches, and draw some general conclusions about scalable inference on the S-AGM. 
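The Monte-Carlo Graph Search abstract above argues that a tree stores a separate node for every trajectory, while a graph can merge states reached by different paths. A toy Python sketch of the size difference follows, assuming a hypothetical deterministic walk on the integers with actions -1 and +1; the paper also merges merely similar states and handles stochastic transitions, neither of which is modelled here.

```python
ACTIONS = (-1, +1)  # hypothetical deterministic actions on the integer line


def count_tree_nodes(start: int, depth: int) -> int:
    """Tree search: every action sequence gets its own node, duplicates included."""
    level = [start]
    count = 1
    for _ in range(depth):
        level = [s + a for s in level for a in ACTIONS]
        count += len(level)
    return count


def count_graph_nodes(start: int, depth: int) -> int:
    """Graph search: nodes are keyed by state, so reconverging paths share one node."""
    frontier = {start}
    seen = {start}
    for _ in range(depth):
        frontier = {s + a for s in frontier for a in ACTIONS}
        seen |= frontier
    return len(seen)


if __name__ == "__main__":
    for d in (3, 10):
        print(d, count_tree_nodes(0, d), count_graph_nodes(0, d))
```

The tree grows exponentially with depth while the graph grows linearly here, so statistics attached to a graph node are shared by every trajectory that reaches that state; the abstract's improved bound concerns exactly such MDPs, where trajectories overlap.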
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/laitonjam20a.html https://proceedings.mlr.press/v129/laitonjam20a.html Bridging Ordinary-Label Learning and Complementary-Label Learning A supervised learning framework has been proposed for the situation where each training example is provided with a complementary label, which represents a class to which the pattern does not belong. In the existing literature, complementary-label learning has been studied independently from ordinary-label learning, which assumes that each training example is provided with a label representing the class to which the pattern belongs. However, providing a complementary label should be treated as equivalent to providing all the remaining labels as candidates for the one true class. In this paper, we focus on the fact that the loss functions for one-versus-all and pairwise classification corresponding to ordinary-label learning and complementary-label learning satisfy certain additivity and duality properties, and we provide a framework that directly bridges these existing supervised learning frameworks. Further, we derive the classification risk and an error bound for any loss function that satisfies additivity and duality. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/katsura20a.html https://proceedings.mlr.press/v129/katsura20a.html Foolproof Cooperative Learning This paper extends the notions of learning algorithms and learning equilibria from repeated game theory to stochastic games. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to an equilibrium strategy that allows cooperative strategies in the self-play setting while not being exploitable by selfish learners. By construction, FCL is a learning equilibrium for repeated symmetric games. We illustrate the behavior of FCL on symmetric matrix and grid games, as well as its robustness to selfish learners.
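The complementary-label abstract above treats a complementary label as equivalent to giving all remaining labels as candidates for the true class. Under a softmax model (an assumption here; the paper itself works with one-versus-all and pairwise losses), that equivalence can be checked directly: the negative log-likelihood of the candidate set equals $-\log(1 - p_{\bar{y}})$, because the class probabilities sum to one. A Python sketch with hypothetical function names:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidates_from_complementary(ybar: int, num_classes: int) -> List[int]:
    """A complementary label ybar rules out one class; every other class remains a candidate."""
    return [k for k in range(num_classes) if k != ybar]


def candidate_set_nll(probs: List[float], candidates: List[int]) -> float:
    """Negative log-probability that the true class lies in the candidate set."""
    return -math.log(sum(probs[k] for k in candidates))


def complementary_nll(probs: List[float], ybar: int) -> float:
    """Negative log-probability that the true class is NOT ybar."""
    return -math.log(1.0 - probs[ybar])


if __name__ == "__main__":
    p = softmax([2.0, 0.5, -1.0, 0.1])
    print(candidate_set_nll(p, candidates_from_complementary(0, 4)), complementary_nll(p, 0))
```

The two printed losses agree up to floating-point rounding, which is the "rest of the labels as candidates" view expressed as a single identity.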
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/jacq20a.html https://proceedings.mlr.press/v129/jacq20a.html Partially Observable Markov Decision Process Modelling for Assessing Hierarchies Hierarchical clustering has been shown to be valuable in many scenarios. Despite its usefulness in many situations, there is no agreed methodology for properly evaluating the hierarchies produced by different techniques, particularly when ground-truth labels are unavailable. This motivates us to propose a framework for assessing the quality of hierarchical clustering allocations that covers the case where no ground-truth information is available. Such a measure is useful, e.g., for assessing the hierarchical structures that online retailer websites use to display their product catalogues. Our framework is one of the few attempts at hierarchy evaluation from a decision-theoretic perspective. We model the process as a bot searching stochastically for items in the hierarchy and establish a measure of the degree to which the hierarchy supports this search. We employ Partially Observable Markov Decision Processes (POMDPs) to model the uncertainty, the decision making, and the cognitive return for searchers in such a scenario. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/huang20a.html https://proceedings.mlr.press/v129/huang20a.html Enhancing Topic Models by Incorporating Explicit and Implicit External Knowledge Topic models are widely used for extracting latent features from documents. Conventional count-based models such as LDA focus on word co-occurrence, neglecting features such as semantics and lexical relations in the corpora. To overcome this drawback, many knowledge-enhanced models have been proposed that attempt to achieve better topic coherence with external knowledge. In this paper, we present novel probabilistic topic models that use both explicit and implicit forms of knowledge.
Knowledge of real-world entities in a knowledge base/graph is referred to as explicit knowledge. We incorporate this form of knowledge into our models through entity linking, a technique for bridging the gap between corpora and knowledge bases, which helps solve the problems of token/phrase-level synonymy and polysemy. Apart from explicit knowledge, we use latent feature word representations (implicit knowledge) to further capture lexical relations in pretraining corpora. Qualitative and quantitative evaluations are conducted on 2 datasets against 5 baselines (3 probabilistic models and 2 neural models). Our models show high potential for generating coherent topics. Remarkably, when adopting both explicit and implicit knowledge, our proposed model even outperforms 2 state-of-the-art neural topic models, suggesting that knowledge enhancement can greatly improve the performance of conventional topic models. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/hong20a.html https://proceedings.mlr.press/v129/hong20a.html CCA-Flow: Deep Multi-view Subspace Learning with Inverse Autoregressive Flow Multi-view subspace learning aims to learn a shared representation from multiple sources or views of an entity. The learned representation enables reconstruction of common patterns in multi-view data, which helps with dimensionality reduction, exploratory data analysis, missing-view completion, and various downstream tasks. However, for the sake of computational efficiency, existing methods often use simple structured approximations of the posterior of the shared latent variables. Such oversimplified models can severely degrade inference quality and hurt representation power. To this end, we propose a new method for multi-view subspace learning that achieves efficient Bayesian inference with strong representation power.
Our method, coined CCA-Flow, is based on variational Canonical Correlation Analysis and models the inference network as an Inverse Autoregressive Flow (IAF). With flow-based variational inference imposed on the latent variables, the posterior approximations can be arbitrarily complex and flexible, while the model can still be trained efficiently with stochastic gradient descent. Experiments on three benchmark multi-view datasets show that our model learns improved representations of the shared latent variables and outperforms previous works. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/he20a.html https://proceedings.mlr.press/v129/he20a.html Robust Deep Ordinal Regression under Label Noise Real-world data are often subject to label noise, which can limit the effectiveness of existing state-of-the-art algorithms for ordinal regression. Existing works on ordinal regression do not take label noise into account. We propose a theoretically grounded approach for class-conditional label noise in ordinal regression problems. We present a deep learning implementation of two commonly used loss functions for ordinal regression that is both (1) robust to label noise and (2) rank consistent for a good ranking rule. We verify these properties of the algorithm empirically, showing robustness to label noise on real data as well as rank consistency. To the best of our knowledge, this is the first approach for robust ordinal regression models. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/garg20a.html https://proceedings.mlr.press/v129/garg20a.html FIREPruning: Learning-based Filter Pruning for Convolutional Neural Network Compression Despite their great success in various fields, modern convolutional neural networks (CNNs) require a huge amount of computation during inference because of their deep network structures, which prevents them from being used on resource-limited devices such as mobile phones and embedded sensors.
Recently, filter pruning has been introduced as a promising model compression method for reducing computation cost and storage overhead. However, existing filter pruning approaches are mainly model-based: they rely on empirical models to evaluate the importance of filters and on manually set parameters to guide model compression. In this paper, we observe that CNNs commonly contain a large number of inactive filters, and we introduce Filter Inactive RatE (FIRE), a novel metric for evaluating the importance of filters in a neural network. Based on FIRE, we develop a learning-based filter pruning strategy called FIREPruning for fast model compression. It adopts a regression model to predict the FIRE value and uses a three-stage pipeline (FIRE prediction, pruning, and fine-tuning) to compress the neural network efficiently. Extensive experiments on widely used CNN models and well-known datasets show that FIREPruning reduces the overall computation cost by up to 86.9% without sacrificing much accuracy, significantly outperforming state-of-the-art model compression methods. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/fang20a.html https://proceedings.mlr.press/v129/fang20a.html Boosting-Based Reliable Model Reuse We study the following model reuse problem: a learner needs to select a subset of models from a model pool to classify an unlabeled dataset without accessing the raw training data of the models. In this situation, it is challenging to properly estimate the reusability of the models in the pool. In this work, we consider a model reuse protocol under which the learner receives specifications of the models, including reusability indicators for verifying the models’ prediction accuracy on any unlabeled instance. We propose MoreBoost, a simple yet powerful boosting algorithm that achieves effective model reuse under the idealized assumption that the reusability indicators are noise-free.
When the reusability indicators are noisy, we strengthen MoreBoost with an active rectification mechanism that allows the learner to actively query ground-truth indicator values from the model providers. The resulting MoreBoost.AR algorithm is guaranteed to significantly reduce the prediction error caused by indicator noise. We also conduct experiments on both synthetic and benchmark datasets to verify the performance of the proposed approaches. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/ding20a.html https://proceedings.mlr.press/v129/ding20a.html Deep-n-Cheap: An Automated Search Framework for Low Complexity Deep Learning We present Deep-n-Cheap, an open-source AutoML framework for searching for deep learning models. The search covers both architecture and training hyperparameters, and supports convolutional neural networks and multilayer perceptrons. Our framework targets deployment on both benchmark and custom datasets and, as a result, offers a greater degree of search-space customizability than a more limited search over only pre-existing models from the literature. We also introduce the technique of 'search transfer', which demonstrates the generalization capabilities of our models to multiple datasets. Deep-n-Cheap includes a user-customizable complexity penalty that trades off performance against training time or number of parameters. Specifically, our framework yields models that offer performance comparable to the state of the art while taking 1-2 orders of magnitude less time to train than models from other AutoML and model search frameworks. Additionally, this work investigates and develops various insights regarding the search process. In particular, we show the superiority of a greedy strategy and justify our choice of Bayesian optimization as the primary search methodology over random or grid search.
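The Deep-n-Cheap abstract above mentions a user-customizable complexity penalty that trades off performance against training time or parameter count. A minimal Python sketch of such a penalized selection criterion follows; the objective form (log validation loss plus a weight `w_c` times a normalized parameter count), the reference size, and the candidate numbers are all hypothetical and not taken from the framework.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    val_loss: float   # validation loss after training (hypothetical numbers below)
    num_params: int   # model size used as the complexity measure


def penalized_objective(c: Candidate, w_c: float, ref_params: int) -> float:
    """Lower is better: performance term plus w_c times normalized complexity."""
    return math.log(c.val_loss) + w_c * (c.num_params / ref_params)


def select(cands: List[Candidate], w_c: float, ref_params: int = 1_000_000) -> Candidate:
    return min(cands, key=lambda c: penalized_objective(c, w_c, ref_params))


if __name__ == "__main__":
    pool = [Candidate("big", 0.30, 5_000_000), Candidate("small", 0.36, 200_000)]
    print(select(pool, w_c=0.0).name, select(pool, w_c=0.1).name)
```

With `w_c = 0` the lower-loss `big` model wins; raising `w_c` to 0.1 flips the choice to `small`, which is the kind of trade-off such a penalty is meant to expose. Swapping `num_params` for measured training time would give the other variant the abstract mentions.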
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/dey20a.html https://proceedings.mlr.press/v129/dey20a.html Bidirectional Dependency-Guided Attention for Relation Extraction The dependency relations between words in a sentence are critical for relation extraction. Existing methods often use the dependencies together with various pruning strategies, and thus suffer from the loss of detailed semantic information. To exploit the dependency structure more effectively, we propose a novel bidirectional dependency-guided attention model. The main idea is to use a top-down attention as well as a bottom-up attention to fully capture the dependencies at different granularities. Specifically, the bottom-up attention aims to model the local semantics from the subtree of each node, while the top-down attention models the global semantics from the ancestor nodes. Moreover, we employ a label embedding component to attend to the contextual features extracted by the dependency-guided attention. Overall, the proposed model is fully attention-based, which makes it easy to parallelize. Experimental results on the TACRED dataset and the SemEval 2010 Task 8 dataset show that our model outperforms existing dependency-based models as well as a powerful pretrained model. Moreover, the proposed model achieves state-of-the-art performance on the TACRED dataset. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/deng20a.html https://proceedings.mlr.press/v129/deng20a.html A Novel Higher-order Weisfeiler-Lehman Graph Convolution Current GNN architectures use a vertex neighborhood aggregation scheme, which limits their discriminative power to that of the 1-dimensional Weisfeiler-Lehman (WL) graph isomorphism test. Here, we propose a novel graph convolution operator that is based on the 2-dimensional WL test. We formally show that the resulting 2-WL-GNN architecture is more discriminative than existing GNN approaches.
This theoretical result is complemented by experimental studies using synthetic and real data. On multiple common graph classification benchmarks, we demonstrate that the proposed model is competitive with state-of-the-art graph kernels and GNNs. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/damke20a.html https://proceedings.mlr.press/v129/damke20a.html Collaborative Exploration in Stochastic Multi-Player Bandits The Internet of Things (IoT) faces multiple challenges in achieving high reliability, low latency, and low power consumption. Its performance is affected by many factors, such as external interference from other coexisting wireless communication technologies that share the same spectrum. To address this problem, we introduce a general approach for identifying channels with poor link quality. We formulate our problem as a multi-player multi-armed bandit problem, where the devices in an IoT network are the players and the arms are the radio channels. For a realistic formulation, we assume neither that sensing information is available nor that the number of players is below the number of arms. We develop and analyze a collaborative decentralized algorithm that aims to find a set of $m$ $(\epsilon,m)$-optimal arms using an Explore-$m$ algorithm (as denoted by Kalyanakrishnan and Stone (2010)) as a subroutine, and then blacklists the suboptimal arms in order to improve the QoS of IoT networks while reducing their energy consumption. We show analytically and experimentally that our algorithm outperforms selfish algorithms in terms of sample complexity with a low communication cost, and that although playing a smaller set of arms increases the collision rate, playing only the optimal arms improves the QoS of the network.
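The abstract above uses an Explore-$m$ algorithm as a subroutine to find $(\epsilon,m)$-optimal arms and blacklist the rest. The simplest such procedure, shown here only as an illustration and not as the subroutine or the collaborative algorithm the authors use, samples every arm uniformly often enough that, by Hoeffding's inequality for rewards in [0, 1], each empirical mean is within $\epsilon/2$ of its true mean with probability at least $1-\delta/K$; by a union bound over the $K$ arms, keeping the $m$ best empirical arms is then $(\epsilon,m)$-optimal with probability at least $1-\delta$. A single-player Python sketch with hypothetical names:

```python
import math
import random
from typing import Callable, List, Tuple


def samples_per_arm(num_arms: int, epsilon: float, delta: float) -> int:
    """Hoeffding: n >= (2 / eps^2) * ln(2K / delta) puts each mean within eps/2 w.p. >= 1 - delta/K."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * num_arms / delta))


def naive_explore_m(pull: Callable[[int], float], num_arms: int, m: int,
                    epsilon: float, delta: float) -> Tuple[List[int], List[int]]:
    """Return (kept arms, blacklisted arms) after uniform sampling."""
    n = samples_per_arm(num_arms, epsilon, delta)
    means = [sum(pull(a) for _ in range(n)) / n for a in range(num_arms)]
    ranked = sorted(range(num_arms), key=lambda a: means[a], reverse=True)
    return ranked[:m], ranked[m:]


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.9, 0.8, 0.2, 0.1]  # hypothetical channel qualities

    def pull(arm: int) -> float:
        return 1.0 if rng.random() < true_means[arm] else 0.0

    kept, blacklisted = naive_explore_m(pull, num_arms=4, m=2, epsilon=0.1, delta=0.05)
    print(sorted(kept), sorted(blacklisted))
```

Uniform sampling wastes pulls on clearly bad arms, which is why adaptive Explore-$m$ algorithms, and the collaborative split of this work across players described in the abstract, aim for lower sample complexity.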
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/dakdouk20a.html https://proceedings.mlr.press/v129/dakdouk20a.html Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures, such as beam search and sampling, to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, in which a given solution is improved until a near-optimal one is reached. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism which, unlike previous works, can easily be extended to more general $k$-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/costa20a.html https://proceedings.mlr.press/v129/costa20a.html A foreground detection algorithm for Time-of-Flight cameras adapted dynamic integration time adjustment and multipath distortions Two scenarios often arise when using a Time-of-Flight (ToF) camera: in one, the integration time must be adjusted dynamically to avoid overexposure; in the other, multipath distortions occur. In both scenarios, the pixel values of the depth map and the intensity map change suddenly and greatly, which affects ToF-based applications that require foreground detection.
Traditional foreground detection algorithms cannot adapt well to these scenarios, since they are sensitive to sudden large changes in pixel values and to the manually chosen threshold on pixel-value differences. Therefore, this paper proposes a pixel-insensitive and threshold-free algorithm to handle these scenarios. It is an end-to-end model based on deep learning. It takes as input two intensity maps captured by a ToF camera, one serving as the background and the other as a contrast, and uses their actual difference, i.e., the foreground, as the label. A deep network then learns to detect the foreground from these inputs and labels. To learn this pattern, datasets are collected in various scenes by multiple ToF cameras, and the training datasets are enlarged by applying a series of random transformations to the foreground and adding two-dimensional Gaussian noise. Experiments show that the new algorithm can stably detect the foreground under different circumstances, including the two scenarios mentioned above. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/chen20d.html https://proceedings.mlr.press/v129/chen20d.html Learning Dynamic Context Graph Embedding Graph embeddings represent nodes as low-dimensional vectors that preserve the proximity between nodes and the community structure of graphs for network analysis. The temporal edges (e.g., relationships, contacts, and emails) in dynamic graphs are important for analyzing graph evolution, but few existing graph embedding methods can capture the dynamic information in temporal edges. In this study, we propose a dynamic graph embedding method to analyze the evolution patterns of dynamic graphs effectively. Our method uses diffuse context sampling to preserve the proximity between nodes, and applies dynamic context graph embeddings to train discrete-time graph embeddings in the same vector space without alignment, preserving the temporal continuity of stable nodes.
We compare our method with several state-of-the-art methods for link prediction, and the experiments demonstrate that our method generally performs better at the task. Our method is further verified using a real-world dynamic graph by visualizing the evolution of its community structure at different timesteps. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/chen20c.html https://proceedings.mlr.press/v129/chen20c.html Constrained Reinforcement Learning via Policy Splitting We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are in the form of infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure where we first search over deterministic policies, followed by an aggregation with a mixture parameter search, that generates policies with simultaneous guarantees on near-optimality and feasibility. We also numerically illustrate our approach by applying it to an online advertising problem. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/chen20b.html https://proceedings.mlr.press/v129/chen20b.html A Distance-Weighted Class-Homogeneous Neighbourhood Ratio for Algorithm Selection In this paper, we introduce a new form of meta-feature that is based on a distance-weighted class-homogeneous neighbourhood ratio to facilitate algorithm selection. We show that these new meta-features, while exhibiting a cost advantage, achieve a comparable, and in some cases, higher performance than conventional meta-features. These results were obtained via experiments conducted over artificial datasets and real-world datasets from the UCI repository. We further redefine the algorithm selection problem by advocating that accuracy should be calculated based on the assumption that the population of datasets is uniformly distributed. 
Finally, we provide a new perspective on landmarkers, in which a landmarker corresponds to an (algorithm, metric) tuple, and propose the idea of a new family of meta-features. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/chen20a.html Learning Interpretable Models using Soft Integrity Constraints Integer models are of particular interest for applications where predictive models are required to be not only accurate but also interpretable to human experts. We introduce a novel penalty term called Facets whose primary goal is to favour integer weights. Our theoretical results illustrate the behaviour of the proposed penalty term: for small enough weights, Facets matches the L1 norm penalty, and as the weights grow, it approaches the L2 regulariser. We provide the proximal operator associated with the proposed penalty term, so that the regularised empirical risk minimiser can be computed efficiently. We also introduce Strongly Convex Facets and discuss its theoretical properties. Our numerical results show that optimising a loss function penalised by Facets achieves state-of-the-art accuracy while yielding a model with a significant number of integer weights. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/belahcene20a.html Convergence Rates of a Momentum Algorithm with Bounded Adaptive Step Size for Nonconvex Optimization Although Adam is a very popular algorithm for optimizing the weights of neural networks, it has recently been shown that it can diverge even on simple convex optimization examples. Several variants of Adam have been proposed to circumvent this convergence issue. In this work, we study the Adam algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate.
The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides theoretically safe adaptive step sizes. Under this boundedness assumption, we show a novel first-order convergence rate result in both deterministic and stochastic settings. Furthermore, we establish convergence rates of the function value sequence using the Kurdyka-Łojasiewicz property. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/barakat20a.html Exact Passive-Aggressive Algorithms for Multiclass Classification Using Bandit Feedbacks In many real-life classification problems, exact class labels may not be available for the training samples. One such example is bandit feedback in multiclass classification. In this setting, we only learn whether our predicted label is correct or not. Consequently, when we predict the wrong class, the actual class label remains uncertain. This paper proposes exact passive-aggressive online algorithms for multiclass classification under bandit feedback (EPABF). The proposed approach uses an exploration-exploitation strategy to guess the class label in every trial. To update the weights, we solve a quadratic optimization problem under multiple class separability constraints and find its exact solution. We do this by finding the active constraints using the KKT conditions of the optimization problem. These constraints form a support set that determines the classes for which the weight vector needs to be updated. We propose three variants of the weight update rule, which differ in how aggressively they correct a mistake: EPABF, EPABF-I, and EPABF-II. We also provide mistake bounds for EPABF, EPABF-I, and EPABF-II. Experiments demonstrate that our proposed algorithms perform better than other bandit-feedback-based approaches and comparably to full-information approaches.
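The general idea of a passive-aggressive update under bandit feedback can be illustrated with a minimal NumPy sketch. It is not the EPABF algorithm: instead of solving the multi-constraint quadratic program exactly via the KKT conditions, it enforces a single margin constraint per round with a PA-I style step. The ε-greedy exploration rule, the function name `pa_bandit_step`, and the constraint applied after a wrong guess are illustrative assumptions.

```python
import numpy as np


def pa_bandit_step(W, x, y_true, rng, epsilon=0.1, C=1.0):
    """One round of a simplified PA-I style learner under bandit feedback.

    Hypothetical helper for illustration; not the EPABF update.
    W: (k, d) float array with one weight row per class, updated in place (k >= 2).
    x: (d,) feature vector; y_true is used only to produce the yes/no feedback.
    Returns the guessed label and whether the guess was correct.
    """
    k = W.shape[0]
    scores = W @ x
    # Epsilon-greedy exploration (an assumption): random class with prob. epsilon.
    if rng.random() < epsilon:
        y_hat = int(rng.integers(k))
    else:
        y_hat = int(np.argmax(scores))
    correct = y_hat == y_true  # the only feedback the learner receives

    # Strongest competitor among the classes other than the guess.
    rivals = scores.copy()
    rivals[y_hat] = -np.inf
    r = int(np.argmax(rivals))
    margin = scores[y_hat] - scores[r]

    if correct:
        loss = max(0.0, 1.0 - margin)  # guess should beat r by at least 1
        sign = 1.0
    else:
        loss = max(0.0, 1.0 + margin)  # hypothetical: r should beat the guess by 1
        sign = -1.0

    sq_norm = float(x @ x)
    if loss > 0.0 and sq_norm > 0.0:
        # PA-I step: the constraint involves two class rows, hence 2 * ||x||^2.
        tau = min(C, loss / (2.0 * sq_norm))
        W[y_hat] += sign * tau * x
        W[r] -= sign * tau * x
    return y_hat, bool(correct)
```

After a confirmed correct guess, the chosen class is pushed to beat its strongest rival by a margin of 1; after a wrong guess, the sketch (as a hypothetical choice) pushes that rival to beat the wrongly chosen class by the same margin. The cap `C` plays the role of the aggressiveness parameter in PA-I.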
Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/arora20a.html Inverse Visual Question Answering with Multi-Level Attentions Inverse Visual Question Answering (iVQA) is a recent task that emerged from the need to improve visual and language understanding. It tackles the challenging problem of generating a corresponding question for a given image-answer pair. In this paper, we propose a novel deep multi-level attention model for inverse visual question answering. The proposed model generates regional visual and semantic features at the object level and then enhances them with the answer cue using attention mechanisms. The model employs two levels of multiple attentions: dual attention at the partial question encoding step and dynamic attention at the step that generates the next word of the question. We evaluate the proposed model on the VQA V1 dataset, where it achieves state-of-the-art performance on multiple commonly used metrics. Fri, 25 Sep 2020 00:00:00 +0000 https://proceedings.mlr.press/v129/alwattar20a.html
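Enhancing object-level region features with an answer cue can be illustrated with a minimal NumPy sketch. It assumes generic scaled dot-product attention with the answer embedding as the query; it is not the paper's dual or dynamic attention, and the function name `answer_guided_attention` and the array shapes are hypothetical.

```python
import numpy as np


def answer_guided_attention(regions, answer):
    """Attend over object-level region features using an answer embedding as the query.

    Hypothetical helper for illustration; not the paper's attention modules.
    regions: (n, d) array, one feature vector per detected object region.
    answer:  (d,) embedding of the answer cue.
    Returns (weights, context): weights sum to 1 over the n regions,
    and context is the attention-weighted sum of the region features.
    """
    d = regions.shape[1]
    logits = regions @ answer / np.sqrt(d)  # relevance of each region to the answer
    logits = logits - logits.max()          # subtract the max for numerical stability
    exp = np.exp(logits)
    weights = exp / exp.sum()               # softmax over regions
    context = weights @ regions             # (d,) answer-enhanced visual feature
    return weights, context
```

Regions whose features align with the answer embedding receive larger weights, so the resulting context vector emphasizes the parts of the image most related to the answer.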