- title: 'Asian Conference on Machine Learning: Preface'
  abstract: 'Preface to ACML 2021.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/balasubramanian21a.html
  PDF: https://proceedings.mlr.press/v157/balasubramanian21a/balasubramanian21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-balasubramanian21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: i-xiii
  id: balasubramanian21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: i
  lastpage: xiii
  published: 2021-11-28 00:00:00 +0000
- title: 'Vector Transport Free Riemannian LBFGS for Optimization on Symmetric Positive Definite Matrix Manifolds'
  abstract: 'This work concentrates on optimization on Riemannian manifolds. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a commonly used quasi-Newton method for numerical optimization in Euclidean spaces. Riemannian LBFGS (RLBFGS) is an extension of this method to Riemannian manifolds. RLBFGS involves computationally expensive vector transports as well as unfolding recursions using adjoint vector transports. In this article, we propose two mappings in the tangent space using the inverse second root and Cholesky decomposition. These mappings make both vector transport and adjoint vector transport identity and therefore isometric. Identity vector transport makes RLBFGS less computationally expensive and its isometry is also very useful in convergence analysis of RLBFGS. Moreover, under the proposed mappings, the Riemannian metric reduces to Euclidean inner product, which is much less computationally expensive. We focus on the Symmetric Positive Definite (SPD) manifolds which are beneficial in various fields such as data science and statistics. This work opens a research opportunity for extension of the proposed mappings to other well-known manifolds.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/godaz21a.html
  PDF: https://proceedings.mlr.press/v157/godaz21a/godaz21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-godaz21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Reza
    family: Godaz
  - given: Benyamin
    family: Ghojogh
  - given: Reshad
    family: Hosseini
  - given: Reza
    family: Monsefi
  - given: Fakhri
    family: Karray
  - given: Mark
    family: Crowley
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1-16
  id: godaz21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1
  lastpage: 16
  published: 2021-11-28 00:00:00 +0000
- title: 'Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron'
  abstract: 'Over-parametrization has become a popular technique in deep learning. It is observed that by over-parametrization, a larger neural network needs a fewer training iterations than a smaller one to achieve a certain level of performance — namely, over-parametrization leads to acceleration in optimization. However, despite that over-parametrization is widely used nowadays, little theory is available to explain the acceleration due to over-parametrization. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider the setting that there is a single teacher neuron with quadratic activation, where over-parametrization is realized by having multiple student neurons learn the data generated from the teacher neuron. We provably show that over-parametrization helps the iterate generated by gradient descent to enter the neighborhood of a global optimal solution that achieves zero testing error faster.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wang21a.html
  PDF: https://proceedings.mlr.press/v157/wang21a/wang21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wang21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jun-Kun
    family: Wang
  - given: Jacob
    family: Abernethy
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 17-32
  id: wang21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 17
  lastpage: 32
  published: 2021-11-28 00:00:00 +0000
- title: 'Hybrid Estimation for Open-Ended Questions with Early-Age Students’ Block-Based Programming Answers'
  abstract: 'Block-based programming is of great significance for cultivating children’s computational thinking. However, due to the following challenges, it is difficult to evaluate students’ programming ability in online learning systems: 1) compared with the traditional Online Judge (OJ) system, there is no standard answer for a given task in block-based programming; 2) in order to promote students’ interests, although the programs are not totally correct and unrelated to the task, the teacher will give a comparatively higher score. Therefore, current approaches involving output comparison and code analysis do not work effectively. Furthermore, deep learning methods also suffer from the problem of how to represent block code for classification. We propose a novel hybrid estimation model to address these challenges. We first learn graph embedding from the parsed Abstract Syntax Tree (AST) to present the logicality of the code. Next, we provide some methods to measure the workload and complexity of the code. Then, we extracted some key variables and task-irrelevant properties, introduced teacher bias. Finally, XGBoost was constructed for classification. Based on real-world data collected from an online Scratch platform by early-age students, our model outperforms KimCNN, ResNet-18, and Graph2Vec+XGBoost. Moreover, we provided statistical analyses and intuitive explanations to interpret the characteristics in various groups.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wu21a.html
  PDF: https://proceedings.mlr.press/v157/wu21a/wu21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wu21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hao
    family: Wu
  - given: Tianyi
    family: Chen
  - given: Xianzhe
    family: Luo
  - given: Canghong
    family: Jin
  - given: Yun
    family: Zhang
  - given: Minghui
    family: Wu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 33-48
  id: wu21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 33
  lastpage: 48
  published: 2021-11-28 00:00:00 +0000
- title: 'The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization'
  abstract: 'The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step-sizes, Lyapunov function constants and momentum constants. In this work we propose the use of factorial powers as a flexible tool for defining constants that appear in convergence proofs. We list a number of remarkable properties that these sequences enjoy, and show how they can be applied to convergence proofs to simplify or improve the convergence rates of the momentum method, accelerated gradient and the stochastic variance reduced method (SVRG).'
  volume: 157
  URL: https://proceedings.mlr.press/v157/defazio21a.html
  PDF: https://proceedings.mlr.press/v157/defazio21a/defazio21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-defazio21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Aaron
    family: Defazio
  - given: Robert M.
    family: Gower
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 49-64
  id: defazio21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 49
  lastpage: 64
  published: 2021-11-28 00:00:00 +0000
- title: 'Local Aggressive Adversarial Attacks on 3D Point Cloud'
  abstract: 'Deep neural networks are found to be prone to adversarial examples which could deliberately fool the model to make mistakes. Recently, a few of works expand this task from 2D image to 3D point cloud by using global point cloud optimization. However, the perturbations of global point are not effective for misleading the victim model. First, not all points are important in optimization toward misleading. Abundant points account considerable distortion budget but contribute trivially to attack. Second, the multi-label optimization is suboptimal for adversarial attack, since it consumes extra energy in finding multi-label victim model collapse and causes instance transformation to be dissimilar to any particular instance. Third, the independent adversarial and perceptibility losses, caring misclassification and dissimilarity separately, treat the updating of each point equally without a focus. Therefore, once perceptibility loss approaches its budget threshold, all points would be stock in the surface of hypersphere and attack would be locked in local optimality. Therefore, we propose a local aggressive adversarial attacks (L3A) to solve above issues. Technically, we select a bunch of salient points, the high-score subset of point cloud according to gradient, to perturb. Then a flow of aggressive optimization strategies are developed to reinforce the unperceptive generation of adversarial examples toward misleading victim models. Extensive experiments on PointNet, PointNet++ and DGCNN demonstrate the state-of-the-art performance of our method against existing adversarial attack methods.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/sun21a.html
  PDF: https://proceedings.mlr.press/v157/sun21a/sun21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-sun21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yiming
    family: Sun
  - given: Feng
    family: Chen
  - given: Zhiyu
    family: Chen
  - given: Mingjie
    family: Wang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 65-80
  id: sun21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 65
  lastpage: 80
  published: 2021-11-28 00:00:00 +0000
- title: '$h$-DBSCAN: A simple fast DBSCAN algorithm for big data'
  abstract: 'DBSCAN is a classical clustering algorithm, which can identify different shapes and isolate noisy patterns from a dataset. Despite the above advantages, the bottleneck of DBSCAN is its computation time for high dimensional datasets. This work, thus, presents a simple and fast method to improve the efficiency of DBSCAN algorithm. We reduce the execution time in two aspects. The first one is to reduce the number of points presented to DBSCAN and the second one is to apply the HNSW technique instead of the linear search structure for improving its efficiency. The experimental results show that our proposed algorithm can greatly improve the clustering speed without losing or even obtaining better accuracy, especially for large-scale datasets. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/weng21a.html
  PDF: https://proceedings.mlr.press/v157/weng21a/weng21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-weng21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shaoyuan
    family: Weng
  - given: Jin
    family: Gou
  - given: Zongwen
    family: Fan
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 81-96
  id: weng21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 81
  lastpage: 96
  published: 2021-11-28 00:00:00 +0000
- title: 'CTAB-GAN: Effective Table Data Synthesizing'
  abstract: 'While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., European General Data Protection Regulation (GDPR)) unfortunately limit its full effectiveness. Synthetic tabular data emerges as an alternative to enable data sharing while fulfilling regulatory and privacy constraints. The state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GAN) and address two main data types in industry, i.e., continuous and categorical. In this paper, we develop CTAB-GAN, a novel conditional table GAN architecture that can effectively model diverse data types, including a mix of continuous and categorical variables. Moreover, we address data imbalance and long tail issues, i.e., certain variables have drastic frequency differences across large values. To achieve those aims, we first introduce the information loss, classification loss and generator loss to the conditional GAN. Secondly, we design a novel conditional vector, which efficiently encodes the mixed data type and skewed distribution of data variable. We extensively evaluate CTAB-GAN with the state of the art GANs that generate synthetic tables, in terms of data similarity and analysis utility. The results on five datasets show that the synthetic data of CTAB-GAN remarkably resembles the real data for all three types of variables and results into higher accuracy for five machine learning algorithms, by up to 17%.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhao21a.html
  PDF: https://proceedings.mlr.press/v157/zhao21a/zhao21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhao21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zilong
    family: Zhao
  - given: Aditya
    family: Kunar
  - given: Robert
    family: Birke
  - given: Lydia Y.
    family: Chen
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 97-112
  id: zhao21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 97
  lastpage: 112
  published: 2021-11-28 00:00:00 +0000
- title: 'Fairness constraint of Fuzzy C-means Clustering improves clustering fairness'
  abstract: 'Fuzzy C-Means (FCM) clustering is a classic clustering algorithm, which is widely used in the real world. Despite the distinct advantages of FCM algorithm, whether the usage of fairness constraint in the FCM could improve clustering fairness remains fully elusive. By introducing a novel fair loss term into the objective function, a Fair Fuzzy C-Means (FFCM) algorithm was proposed in this current study. We proved that the membership value was constrained by distance and fairness in the meantime during the optimization process in the proposed objective function. By studying the Fuzzy C-Means Clustering with fairness constraint problem and proposing a fair fuzzy C-means method, this study provided mechanism understanding in achieving the fairness constraint in Fuzzy C-Means clustering and bridged up the gap of fair fuzzy clustering.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/xia21a.html
  PDF: https://proceedings.mlr.press/v157/xia21a/xia21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-xia21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Xu
    family: Xia
  - given: Zhang
    family: Hui
  - given: Ynag
    family: Chunming
  - given: Zhao
    family: Xujian
  - given: Li
    family: Bo
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 113-128
  id: xia21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 113
  lastpage: 128
  published: 2021-11-28 00:00:00 +0000
- title: 'Meta-Model-Based Meta-Policy Optimization'
  abstract: 'Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/hiraoka21a.html
  PDF: https://proceedings.mlr.press/v157/hiraoka21a/hiraoka21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-hiraoka21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Takuya
    family: Hiraoka
  - given: Takahisa
    family: Imagawa
  - given: Voot
    family: Tangkaratt
  - given: Takayuki
    family: Osa
  - given: Takashi
    family: Onishi
  - given: Yoshimasa
    family: Tsuruoka
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 129-144
  id: hiraoka21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 129
  lastpage: 144
  published: 2021-11-28 00:00:00 +0000
- title: 'An Aligned Subgraph Kernel Based on Discrete-Time Quantum Walk'
  abstract: 'In this paper, a novel graph kernel is designed by aligning the amplitude representation of the vertices. Firstly, the amplitude representation of a vertex is calculated based on the discrete-time quantum walk. Then a matching-based graph kernel is constructed through identifying the correspondence between the vertices of two graphs. The newly proposed kernel can be regarded as a kind of aligned subgraph kernel that incorporates the explicit local information of substructures. Thus, it can address the disadvantage arising in the classical R-convolution kernel that the relative locations of substructures between the graphs are ignored. Experiments on several standard datasets demonstrate that the proposed kernel has better performance compared with other state-of-the-art graph kernels in terms of classification accuracy.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/liu21a.html
  PDF: https://proceedings.mlr.press/v157/liu21a/liu21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-liu21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kai
    family: Liu
  - given: Lulu
    family: Wang
  - given: Yi
    family: Zhang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 145-157
  id: liu21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 145
  lastpage: 157
  published: 2021-11-28 00:00:00 +0000
- title: 'On the Convex Combination of Determinantal Point Processes'
  abstract: 'Determinantal point processes (DPPs) are attractive probabilistic models for expressing item quality and set diversity simultaneously. Although DPPs are widely-applicable to many subset selection tasks, there exist simple small-size probability distributions that any DPP cannot express. To overcome this drawback while keeping good properties of DPPs, in this paper we investigate the expressive power of \emph{convex combinations of DPPs}. We provide upper and lower bounds for the number of DPPs required for \emph{exactly} expressing any probability distribution. For the \emph{approximation} error, we give an upper bound on the Kullback–Leibler divergence $n-\lfloor \log t\rfloor +\epsilon$ for any $\epsilon >0$ of approximate distribution from a given joint probability distribution, where $t$ is the number of DPPs. Our numerical simulation on an online retail dataset empirically verifies that a convex combination of only two DPPs can outperform a nonsymmetric DPP in terms of the Kullback–Leibler divergence. By combining a polynomial number of DPPs, we can express probability distributions induced by bounded-degree pseudo-Boolean functions, which include weighted coverage functions of bounded occurrence.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/matsuoka21a.html
  PDF: https://proceedings.mlr.press/v157/matsuoka21a/matsuoka21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-matsuoka21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tatsuya
    family: Matsuoka
  - given: Naoto
    family: Ohsaka
  - given: Akihiro
    family: Yabe
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 158-173
  id: matsuoka21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 158
  lastpage: 173
  published: 2021-11-28 00:00:00 +0000
- title: 'Encoder-decoder-based image transformation approach for integrating precipitation forecasts'
  abstract: 'As the damage caused by heavy rainfall is becoming more serious, the improvement of precipitation forecasts is highly demanded. For this purpose, arithmetic and Bayesian average-based methods have been proposed to integrate multiple 2D-grid forecasts. However, since a single weight is shared in the entire grid in these methods, local variations of the importance of forecasts could not be taken into account. Besides, although a variety of information is available in precipitation forecast, it would not be straightforwardly to incorporate the additional information in the existing methods. To overcome these problems, we propose an encoder-decoder-based image transformation method that generates a weight image that is optimized in a pixel-wise manner and additional information could be embedded as the channel of input images and feature maps. Through the experiment of precipitation forecast in the period from April 2018 to March 2019 in Japan, we will show that our proposed integration method outperforms existing methods.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/hachiya21a.html
  PDF: https://proceedings.mlr.press/v157/hachiya21a/hachiya21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-hachiya21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hirotaka
    family: Hachiya
  - given: Yusuke
    family: Masumoto
  - given: Yuki
    family: Mori
  - given: Naonori
    family: Ueda
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 174-188
  id: hachiya21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 174
  lastpage: 188
  published: 2021-11-28 00:00:00 +0000
- title: 'A Mutual Information Regularization for Adversarial Training'
  abstract: 'Recently, a number of  methods have been developed to alleviate the vulnerability of deep neural networks to adversarial examples, among which adversarial training and its variants have been demonstrated to be the most effective empirically. This paper aims to further improve the robustness of adversarial training against adversarial examples. We propose a new training method called mutual information and mean absolute error adversarial training (MIMAE-AT) in which the mutual information between the probabilistic predictions of the natural and the adversarial examples along with the mean absolute error between their logits are used as regularization terms to the standard adversarial training.We conduct  experiments and demonstrate that the proposed MIMAE-AT method improves the state-of-the-art on adversarial robustness.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/atsague21a.html
  PDF: https://proceedings.mlr.press/v157/atsague21a/atsague21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-atsague21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Modeste
    family: Atsague
  - given: Olukorede
    family: Fakorede
  - given: Jin
    family: Tian
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 188-203
  id: atsague21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 188
  lastpage: 203
  published: 2021-11-28 00:00:00 +0000
- title: 'BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning'
  abstract: 'Online interactions with the environment to collect data samples for training a Reinforcement Learning (RL) agent is not always feasible due to economic and safety concerns. The goal of Offline Reinforcement Learning is to address this problem by learning effective policies using previously collected datasets. Standard off-policy RL algorithms are prone to overestimations of the values of out-of-distribution (less explored) actions and are hence unsuitable for Offline RL. Behavior regularization, which constraints the learned policy within the support set of the dataset, has been proposed to tackle the limitations of standard off-policy algorithms. In this paper, we improve the behavior regularized offline reinforcement learning and propose BRAC+. First, we propose quantification of the out-of-distribution actions and conduct comparisons between using Kullback–Leibler divergence versus using Maximum Mean Discrepancy as the regularization protocol. We propose an analytical upper bound on the KL divergence as the behavior regularizer to reduce variance associated with sample based estimations. Second, we mathematically show that the learned Q values can diverge even using behavior regularized policy update under mild assumptions. This leads to large overestimations of the Q values and performance deterioration of the learned policy. To mitigate this issue, we add a gradient penalty term to the policy evaluation objective. By doing so, the Q values are guaranteed to converge. On challenging offline RL benchmarks, BRAC+  outperforms the baseline behavior regularized approaches by $40%\sim 87%$ and the state-of-the-art approach by $6%$.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhang21a.html
  PDF: https://proceedings.mlr.press/v157/zhang21a/zhang21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhang21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chi
    family: Zhang
  - given: Sanmukh
    family: Kuppannagari
  - given: Prasanna
    family: Viktor
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 204-219
  id: zhang21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 204
  lastpage: 219
  published: 2021-11-28 00:00:00 +0000
- title: 'Cautious Actor-Critic'
  abstract: 'The oscillating performance of off-policy learning and persisting errors in the actor-critic(AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications  better.   In  this  paper,  we  propose  a  novel  off-policy  AC  algorithm  cautious actor-critic (CAC). The name cautious comes from the doubly conservative nature that we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy-regularization of conservative value iteration for the critic.  Our key observation is the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement.  We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate thatCAC achieves comparable performance while significantly stabilizes learning.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhu21a.html
  PDF: https://proceedings.mlr.press/v157/zhu21a/zhu21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhu21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Lingwei
    family: Zhu
  - given: Toshinori
    family: Kitamura
  - given: Matsubara
    family: Takamitsu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 220-235
  id: zhu21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 220
  lastpage: 235
  published: 2021-11-28 00:00:00 +0000
- title: 'Quaternion Graph Neural Networks'
  abstract: 'Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/nguyen21a.html
  PDF: https://proceedings.mlr.press/v157/nguyen21a/nguyen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-nguyen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Dai Quoc
    family: Nguyen
  - given: Tu Dinh
    family: Nguyen
  - given: Dinh
    family: Phung
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 236-251
  id: nguyen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 236
  lastpage: 251
  published: 2021-11-28 00:00:00 +0000
- title: 'Expressive Neural Voice Cloning'
  abstract: 'Voice cloning is the task of learning to synthesize the voice of an unseen speaker from a few samples. While current voice cloning methods achieve promising results in Text-to-Speech (TTS) synthesis for a new voice, these approaches lack the ability to control the expressiveness of synthesized audio. In this work, we propose a controllable voice cloning method that allows fine-grained control over various style aspects of the synthesized speech for an unseen speaker. We achieve this by explicitly conditioning the speech synthesis model on a speaker encoding, pitch contour and latent style tokens during training. Through both quantitative and qualitative evaluations, we show that our framework can be used for various expressive voice cloning tasks using only a few transcribed or untranscribed speech samples for a new speaker. These cloning tasks include style transfer from a reference speech, synthesizing speech directly from text, and fine-grained style control by manipulating the style conditioning variables during inference.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/neekhara21a.html
  PDF: https://proceedings.mlr.press/v157/neekhara21a/neekhara21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-neekhara21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Paarth
    family: Neekhara
  - given: Shehzeen
    family: Hussain
  - given: Shlomo
    family: Dubnov
  - given: Farinaz
    family: Koushanfar
  - given: Julian
    family: McAuley
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 252-267
  id: neekhara21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 252
  lastpage: 267
  published: 2021-11-28 00:00:00 +0000
- title: 'SPDE-Net: Neural Network based prediction of stabilization parameter for SUPG technique'
  abstract: 'We propose \textit{SPDE-Net}, an artificial neural network (ANN) to predict the stabilization parameter for the streamline upwind/Petrov-Galerkin (SUPG) stabilization technique for solving singularly perturbed differential equations (SPDEs). The prediction task is modeled as a regression problem and is solved using ANN. Three training strategies for the ANN have been proposed i.e supervised, $L^2$ error minimization (global) and $L^2$ error minimization (local). It has been observed that the proposed method yields accurate results, and even outperforms some of the existing state-of-the-art ANN-based partial differential equation (PDE) solvers such as Physics Informed Neural Network (PINN).'
  volume: 157
  URL: https://proceedings.mlr.press/v157/yadav21a.html
  PDF: https://proceedings.mlr.press/v157/yadav21a/yadav21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-yadav21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sangeeta
    family: Yadav
  - given: Sashikumaar
    family: Ganesan
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 268-283
  id: yadav21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 268
  lastpage: 283
  published: 2021-11-28 00:00:00 +0000
- title: 'DDSAS: Dynamic and Differentiable Space-Architecture Search'
  abstract: 'Neural Architecture Search (NAS) has made remarkable progress in automatically designing neural networks. However, existing differentiable NAS and stochastic NAS methods are either biased towards exploitation and thus may converge to a local minimum, or biased towards exploration and thus converge slowly. In this work, we propose a Dynamic and Differentiable Space-Architecture Search (DDSAS) method to address the exploration-exploitation dilemma. DDSAS dynamically samples space, searches architectures in the sampled subspace with gradient descent, and leverages the Upper Confidence Bound (UCB) to balance exploitation and exploration. The whole search space is elastic, offering flexibility to evolve and to consider resource constraints. Experiments on image classification datasets demonstrate that with only 4GB memory and 3 hours for searching, DDSAS achieves 2.39% test error on CIFAR10, 16.26% test error on CIFAR100, and 23.9% test error when transferring to ImageNet. When directly searching on ImageNet, DDSAS achieves comparable accuracy with more than 6.5 times speedup over state-of-the-art methods. The source codes are available at https://github.com/xingxing-123/DDSAS.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/yang21a.html
  PDF: https://proceedings.mlr.press/v157/yang21a/yang21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-yang21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Longxing
    family: Yang
  - given: Yu
    family: Hu
  - given: Shun
    family: Lu
  - given: Zihao
    family: Sun
  - given: Jilin
    family: Mei
  - given: Yiming
    family: Zeng
  - given: Zhiping
    family: Shi
  - given: Yinhe
    family: Han
  - given: Xiaowei
    family: Li
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 284-299
  id: yang21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 284
  lastpage: 299
  published: 2021-11-28 00:00:00 +0000
- title: 'Sinusoidal Flow: A Fast Invertible Autoregressive Flow'
  abstract: 'Normalising flows offer a flexible way of modelling continuous probability distributions. We consider expressiveness, fast inversion and exact Jacobian determinant as three desirable properties a normalising flow should possess. However, few flow models have been able to strike a good balance among all these properties. Realising that the integral of a convex sum of sinusoidal functions squared leads to a bijective residual transformation, we propose Sinusoidal Flow, a new type of normalising flows that inherits the expressive power and triangular Jacobian from fully autoregressive flows while guaranteed by Banach fixed-point theorem to remain fast invertible and thereby obviate the need for sequential inversion typically required in fully autoregressive flows. Experiments show that our Sinusoidal Flow is not only able to model complex distributions, but can also be reliably inverted to generate realistic-looking samples even with many layers of transformations stacked.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wei21a.html
  PDF: https://proceedings.mlr.press/v157/wei21a/wei21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wei21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yumou
    family: Wei
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 299-314
  id: wei21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 299
  lastpage: 314
  published: 2021-11-28 00:00:00 +0000
- title: 'Uplift Modeling with High Class Imbalance'
  abstract: 'Uplift modeling refers to estimating the causal effect of a treatment on an individual observation, used for instance to identify customers worth targeting with a discount in e-commerce. We introduce a simple yet effective undersampling strategy for dealing with the prevalent problem of high class imbalance (low conversion rate) in such applications. Our strategy is agnostic to the base learners and produces a 6.5% improvement over the best published benchmark for the largest public uplift data which incidentally exhibits high class imbalance. We also introduce a new metric on calibration for uplift modeling and present a strategy to improve the calibration of the proposed method.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/nyberg21a.html
  PDF: https://proceedings.mlr.press/v157/nyberg21a/nyberg21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-nyberg21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Otto
    family: Nyberg
  - given: Tomasz
    family: Kuśmierczyk
  - given: Arto
    family: Klami
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 315-330
  id: nyberg21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 315
  lastpage: 330
  published: 2021-11-28 00:00:00 +0000
- title: 'Iterative Deep Model Compression and Acceleration in the Frequency Domain'
  abstract: 'Deep Convolutional Neural Networks (CNNs) are successfully applied in many complex tasks, but their storage and huge computational costs hinder their deployment on edge devices. CNN model compression techniques have been widely studied in the past five years, most of which are conducted in the spatial domain. Inspired by the sparsity and low-rank properties of weight matrices in the frequency domain, we propose a novel frequency pruning framework for model compression and acceleration while maintaining high-performance. We firstly apply Discrete Cosine Transform (DCT) on convolutional kernels and train them in the frequency domain to get sparse representations. Then we propose an iterative model compression method to decompose the frequency matrices with a sampled-based low-rank approximation algorithm, and then fine-tune and recompose the low-rank matrices gradually until a predefined compression ratio is reached. We further demonstrate that model inference can be conducted with the decomposed frequency matrices, where model parameters and inference cost can be significantly reduced. Extensive experiments using well-known CNN models based on three open datasets show that the proposed method outperforms the state-of-the-arts in reduction of both the number of parameters and floating-point operations (FLOPs) without sacrificing too much model accuracy.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zeng21a.html
  PDF: https://proceedings.mlr.press/v157/zeng21a/zeng21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zeng21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yao
    family: Zeng
  - given: Xusheng
    family: Liu
  - given: Lintan
    family: Sun
  - given: Wenzhong
    family: Li
  - given: Yuchu
    family: Fang
  - given: Sanglu
    family: Lu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 331-346
  id: zeng21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 331
  lastpage: 346
  published: 2021-11-28 00:00:00 +0000
- title: 'Penalty Method for Inversion-Free Deep Bilevel Optimization'
  abstract: 'Solving a bilevel optimization problem is at the core of several machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, and trainingdata poisoning. Different from simultaneous or multi-objective optimization, the steepest descent direction for minimizing the upper-level cost in a bilevel problem requires the inverse of the Hessian of the lower-level cost. In this work, we propose a novel algorithm for solving bilevel optimization problems based on the classical penalty function approach. Our method avoids computing the Hessian inverse and can handle constrained bilevel problems easily. We prove the convergence of the method under mild conditions and show that the exact hypergradient is obtained asymptotically. Our method’s simplicity and small space and time complexities enable us to effectively solve large-scale bilevel problems involving deep neural networks. We present results on data denoising, few-shot learning, and training-data poisoning problems in a large-scale setting. Our results show that our approach outperforms or is comparable to previously proposed methods based on automatic differentiation and approximate inversion in terms of accuracy, run-time, and convergence speed'
  volume: 157
  URL: https://proceedings.mlr.press/v157/mehra21a.html
  PDF: https://proceedings.mlr.press/v157/mehra21a/mehra21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-mehra21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Akshay
    family: Mehra
  - given: Jihun
    family: Hamm
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 347-362
  id: mehra21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 347
  lastpage: 362
  published: 2021-11-28 00:00:00 +0000
- title: 'CTS2: Time Series Smoothing with Constrained Reinforcement Learning'
  abstract: 'Time series smoothing is essential for time series analysis and forecasting. It helps to identify trends and patterns of time series. However, the presence of irregular perturbations disrupt the time series smoothness and distort information. The goal of time series smoothing is to remove these perturbations while preserving as much information as possible. Existing smoothing algorithms have complete freedom to make corrections to the data points which often over smooth the time series and lose information. None of them considers constraining data corrections to the best of our knowledge. Moreover, most existing methods either do not smooth in real-time or their parameters need to be hand-tuned in different scenarios. To improve smoothing performance while considering data correction constraints, we propose a $\mathbf{C}$onstrained reinforcement learning-based  $\mathbf{T}$ime  $\mathbf{S}$eries  $\mathbf{S}$moothing method, or CTS$^2$. Specifically, we first formulate the smoothing problem as a Constrained Markov Decision Process (CMDP). We then incorporate  data correction constraints to restrict the amount of correction at each point. Finally, we learn a policy network with a linear projection layer to smooth the time series. The linear projection layer ensures that all data corrections satisfy the data correction constraints. We evaluate CTS$^2$ on both synthetic and real-world time series datasets; our results show that CTS$^2$ successfully smooths time series in real-time, satisfies all the correction constraints, and works efficiently in a variety of scenarios.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/liu21b.html
  PDF: https://proceedings.mlr.press/v157/liu21b/liu21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-liu21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yongshuai
    family: Liu
  - given: Xin
    family: Liu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 363-378
  id: liu21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 363
  lastpage: 378
  published: 2021-11-28 00:00:00 +0000
- title: 'Open Images V5 Text Annotation and Yet Another Mask Text Spotter'
  abstract: 'A large scale human-labeled dataset plays an important role in creating high quality deep learning models. In this paper we present text annotation for Open Images V5 dataset. To our knowledge it is the largest among publicly available manually created text annotations. Having this annotation we trained a simple Mask-RCNN-based network, referred as Yet Another Mask Text Spotter (YAMTS), which achieves competitive performance or even outperforms current state-of-the-art approaches in some cases on ICDAR 2013, ICDAR 2015 and {Total-Text} datasets. Code for text spotting model available online at: \url{https://github.com/openvinotoolkit/training_extensions}. The model can be exported to OpenVINO{\texttrademark}-format and run on Intel{\textregistered} CPUs.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/krylov21a.html
  PDF: https://proceedings.mlr.press/v157/krylov21a/krylov21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-krylov21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ilya
    family: Krylov
  - given: Sergei
    family: Nosov
  - given: Vladislav
    family: Sovrasov
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 379-389
  id: krylov21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 379
  lastpage: 389
  published: 2021-11-28 00:00:00 +0000
- title: 'Language Representations for Generalization in Reinforcement Learning'
  abstract: 'The choice of state and action representation in Reinforcement Learning (RL) has a significant effect on agent performance for the training task.  But its relationship with generalization to new tasks is under-explored.  One approach to improving generalization investigated here is the use of language as a representation. We compare vector-states and discrete-actions to language representations. We find the agents using language representations generalize better and could solve tasks with more entities, new entities, and more complexity than seen in the training task. We attribute this to the compositionality of language.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/goodger21a.html
  PDF: https://proceedings.mlr.press/v157/goodger21a/goodger21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-goodger21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nikolaj
    family: Goodger
  - given: Peter
    family: Vamplew
  - given: Cameron
    family: Foale
  - given: Richard
    family: Dazeley
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 390-405
  id: goodger21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 390
  lastpage: 405
  published: 2021-11-28 00:00:00 +0000
- title: 'Temporal Relation based Attentive Prototype Network for Few-shot Action Recognition'
  abstract: 'Few-shot action recognition aims at recognizing novel action classes with only a small number of labeled video samples. We propose a temporal relation based attentive prototype network (TRAPN) for few-shot action recognition. Concretely, we tackle this challenging task from three aspects. Firstly, we propose a spatio-temporal motion enhancement (STME) module to highlight object motions in videos. The STME module utilizes cues from content displacements in videos to enhance the features in the motion-related regions. Secondly, we learn the core common action transformations by our temporal relation (TR) module, which captures the temporal relations at short-term and long-term time scales. The learned temporal relations are encoded into descriptors to constitute sample-level features. The abstract action transformations are described by multiple groups of temporal relation descriptors. Thirdly, a vanilla prototype for the support class (e.g., the mean of the support class) cannot ﬁt well for different query samples. We generate an attentive prototype constructed from temporal relation descriptors of support samples, which gives more weight to discriminative samples. We evaluate our TRAPN on Kinetics, UCF101 and HMDB51 real-world few-shot datasets. Results show that our network achieves the state-of-the-art performance.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wang21b.html
  PDF: https://proceedings.mlr.press/v157/wang21b/wang21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wang21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Guangge
    family: Wang
  - given: Haihui
    family: Ye
  - given: Xiao
    family: Wang
  - given: Weirong
    family: Ye
  - given: Hanzi
    family: Wang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 406-421
  id: wang21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 406
  lastpage: 421
  published: 2021-11-28 00:00:00 +0000
- title: 'An Optimistic Acceleration of AMSGrad for Nonconvex Optimization'
  abstract: 'We propose a new variant of AMSGrad (Reddi et. al., 2018), a popular adaptive gradient based optimization algorithm widely used for training deep neural networks. Our algorithm adds prior knowledge about the sequence of consecutive mini-batch gradients and leverages its underlying structure making the gradients sequentially predictable. By exploiting the predictability process and ideas from optimistic online learning, the proposed algorithm can accelerate the convergence and increase its sample efficiency. After establishing a tighter upper bound under some convexity conditions on the regret, we offer a complimentary view of our algorithm which generalizes to the offline and stochastic nonconvex optimization settings. In the nonconvex case, we establish a non-asymptotic convergence bound independent of the initialization. We illustrate, via numerical experiments, the practical speedup on several deep learning models and benchmark datasets.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wang21c.html
  PDF: https://proceedings.mlr.press/v157/wang21c/wang21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wang21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jun-Kun
    family: Wang
  - given: Xiaoyun
    family: Li
  - given: Belhal
    family: Karimi
  - given: Ping
    family: Li
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 422-437
  id: wang21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 422
  lastpage: 437
  published: 2021-11-28 00:00:00 +0000
- title: 'Dynamic Coordination Graph for Cooperative Multi-Agent Reinforcement Learning'
  abstract: 'This paper introduces Dynamic $Q$-value Coordination Graph (QCGraph) for cooperative multi-agent reinforcement learning. QCGraph aims to dynamically represent and generalize through factorizing the joint value function of all agents according to dynamically created coordination graph based on subsets of agents. The value can be maximized by message passing at both a local and global level along the graph which allows training the value function end-to-end. The coordination graph is dynamically generated and used to generate the payoff functions which are approximated using graph neural networks and parameter sharing to improve generalization over the state-action space. We show that QCGraph can solve a variety of challenging multi-agent tasks being superior to other value factorization approaches. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/siu21a.html
  PDF: https://proceedings.mlr.press/v157/siu21a/siu21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-siu21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chapman
    family: Siu
  - given: Jason
    family: Traish
  - given: Richard Yi Da
    family: Xu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 438-453
  id: siu21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 438
  lastpage: 453
  published: 2021-11-28 00:00:00 +0000
- title: 'S2TNet: Spatio-Temporal Transformer Networks for Trajectory Prediction in Autonomous Driving'
  abstract: 'To safely and rationally participate in dense and heterogeneous traffic, autonomous vehicles require to sufficiently analyze the motion patterns of surrounding traffic-agents and accurately predict their future trajectories. This is challenging because the trajectories of traffic-agents are not only influenced by the traffic-agents themselves but also by spatial interaction with each other. Previous methods usually rely on the sequential step-by-step processing of Long Short-Term Memory networks (LSTMs) and merely extract the interactions between spatial neighbors for single type traffic-agents. We propose the Spatio-Temporal Transformer Networks (S2TNet), which models the spatio-temporal interactions by spatio-temporal Transformer and deals with the temporel sequences by temporal Transformer. We input additional category, shape and heading information into our networks to handle the heterogeneity of traffic-agents. The proposed methods outperforms state-of-the-art methods on ApolloScape Trajectory dataset by more than 7% on both the weighted sum of Average and Final Displacement Error.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/chen21a.html
  PDF: https://proceedings.mlr.press/v157/chen21a/chen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-chen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Weihuang
    family: Chen
  - given: Fangfang
    family: Wang
  - given: Hongbin
    family: Sun
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 454-469
  id: chen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 454
  lastpage: 469
  published: 2021-11-28 00:00:00 +0000
- title: 'Solving Machine Learning Problems'
  abstract: 'Can a machine learn Machine Learning? This work trains a machine learning model to solve machine learning problems from a University undergraduate level course. We generate a new training set of questions and answers consisting of course exercises, homework, and quiz questions from MIT’s 6.036 Introduction to Machine Learning course and train a machine learning model to answer these questions. Our system demonstrates an overall accuracy of 96% for open-response questions and 97% for multiple-choice questions, compared with MIT students’ average of 93%, achieving grade A performance in the course, all in real-time. Questions cover all 12 topics taught in the course, excluding coding questions or questions with images. Topics include: (i) basic machine learning principles; (ii) perceptrons; (iii) feature extraction and selection; (iv) logistic regression; (v) regression; (vi) neural networks; (vii) advanced neural networks; (viii) convolutional neural networks; (ix) recurrent neural networks; (x) state machines and MDPs; (xi) reinforcement learning; and (xii) decision trees. Our system uses Transformer models within an encoder-decoder architecture with graph and tree representations. An important aspect of our approach is a data-augmentation scheme for generating new example problems. We also train a machine learning model to generate problem hints. Thus, our system automatically generates new questions across topics, answers both open-response questions and multiple-choice questions, classifies problems, and generates problem hints, pushing the envelope of AI for STEM education.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/tran21a.html
  PDF: https://proceedings.mlr.press/v157/tran21a/tran21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-tran21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sunny
    family: Tran
  - given: Pranav
    family: Krishna
  - given: Ishan
    family: Pakuwal
  - given: Prabhakar
    family: Kafle
  - given: Nikhil
    family: Singh
  - given: Jayson
    family: Lynch
  - given: Iddo
    family: Drori
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 470-485
  id: tran21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 470
  lastpage: 485
  published: 2021-11-28 00:00:00 +0000
- title: 'Pedestrian Wind Factor Estimation in Complex Urban Environments'
  abstract: 'Urban planners and policy makers face the challenge of creating livable and enjoyable citiesfor larger populations in much denser urban conditions. While the urban microclimate holdsa key role in defining the quality of urban spaces today and in the future, the integrationof wind microclimate assessment in early urban design and planning processes remains achallenge  due  to  the  complexity  and  high  computational  expense  of  computational  fluiddynamics  (CFD)  simulations.   This  work  develops  a  data-driven  workflow  for  real-timepedestrian  wind  comfort  estimation  in  complex  urban  environments  which  may  enabledesigners,  policy  makers  and  city  residents  to  make  informed  decisions  about  mobility,health, and energy choices.  We use a conditional generative adversarial network (cGAN)architecture to reduce the computational computation while maintaining high confidencelevels and interpretability, adequate representation of urban complexity, and suitability forpedestrian  comfort  estimation.   We  demonstrate  high  quality  wind  field  approximationswhile reducing computation time from days to seconds.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/mokhtar21a.html
  PDF: https://proceedings.mlr.press/v157/mokhtar21a/mokhtar21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-mokhtar21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sarah
    family: Mokhtar
  - given: Matt
    family: Beveridge
  - given: Yumeng
    family: Cao
  - given: Iddo
    family: Drori
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 486-501
  id: mokhtar21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 486
  lastpage: 501
  published: 2021-11-28 00:00:00 +0000
- title: 'DPOQ: Dynamic Precision Onion Quantization'
  abstract: 'With the development of deployment platforms and application scenarios for deep neural networks, traditional fixed network architectures cannot meet the requirements. Meanwhile the dynamic network inference becomes a new research trend. Many slimmable and scalable networks have been proposed to satisfy different resource constraints (e.g., storage, latency and energy). And a single network may support versatile architectural configurations including: depth, width, kernel size, and resolution. In this paper, we propose a novel network architecture reuse strategy enabling dynamic precision in parameters. Since our low-precision networks are wrapped in the high-precision networks like an onion, we name it dynamic precision onion quantization (DPOQ). We train the network by using the joint loss with scaled gradients. To further improve the performance and make different precision network compatible with each other, we propose the precision shift batch normalization (PSBN). And we also propose a scalable input-specific inference mechanism based on this architecture and make the network more adaptable. Experiments on the CIFAR and ImageNet dataset have shown that our DPOQ achieves not only better flexibility but also higher accuracy than the individual quantization.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/li21a.html
  PDF: https://proceedings.mlr.press/v157/li21a/li21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-li21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Bowen
    family: Li
  - given: Kai
    family: Huang
  - given: Siang
    family: Chen
  - given: Dongliang
    family: Xiong
  - given: Luc
    family: Claesen
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 502-517
  id: li21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 502
  lastpage: 517
  published: 2021-11-28 00:00:00 +0000
- title: 'A Causal Approach for Unfair Edge Prioritization and Discrimination Removal'
  abstract: 'In budget-constrained settings aimed at mitigating unfairness, like law enforcement, it is essential to prioritize the sources of unfairness before taking measures to mitigate them in the real world. Unlike previous works, which only serve as a caution against possible discrimination and de-bias data after data generation, this work provides a toolkit to mitigate unfairness during data generation, given by the Unfair Edge Prioritization algorithm, in addition to de-biasing data after generation, given by the Discrimination Removal algorithm. We assume that a non-parametric Markovian causal model representative of the data generation procedure is given. The edges emanating from the sensitive nodes in the causal graph, such as race, are assumed to be the sources of unfairness. We first quantify Edge Flow in any edge X –> Y, which is the belief of observing a specific value of Y due to the influence of a specific value of X along X –> Y. We then quantify Edge Unfairness by formulating a non-parametric model in terms of edge flows. We then prove that cumulative unfairness towards sensitive groups in a decision, like race in a bail decision, is non-existent when edge unfairness is absent. We prove this result for the non-trivial non-parametric model setting when the cumulative unfairness cannot be expressed in terms of edge unfairness. We then measure the Potential to mitigate the Cumulative Unfairness when edge unfairness is decreased. Based on these measurements, we propose the Unfair Edge Prioritization algorithm that can then be used by policymakers. We also propose the Discrimination Removal Procedure that de-biases a data distribution by eliminating optimization constraints that grow exponentially in the number of sensitive attributes and values taken by them. Extensive experiments validate the theorem and specifications used for quantifying the above measures.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/pavan-ravishankar21a.html
  PDF: https://proceedings.mlr.press/v157/pavan-ravishankar21a/pavan-ravishankar21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-pavan-ravishankar21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Balaraman Ravindran
    family: Pavan Ravishankar
    suffix: Pranshu Malviya
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 518-533
  id: pavan-ravishankar21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 518
  lastpage: 533
  published: 2021-11-28 00:00:00 +0000
- title: 'Spatial Temporal Enhanced Contrastive and Pretext Learning for Skeleton-based Action Representation'
  abstract: 'In this paper, we focus on unsupervised representation learning for skeleton-based action recognition. The critical issue of this task is extracting discriminative spatial-temporal information from skeleton sequences to form action representation. To better solve this, we propose a novel unsupervised framework named contrastive-pretext spatial-temporal network (CP-STN), aiming to achieve accurate action recognition by better exploiting discriminative spatial-temporal enhanced features from massive unlabeled data. We combine contrastive and pretext tasks learning paradigms in one framework by using asymmetric spatial and temporal augmentations to enable network extracting discriminative representations with spatial-temporal information fully. Furthermore, graph-based convolution is used as the backbone to explore natural spatial-temporal graph information in skeleton data. Extensive experimental results show that our CP-STN signiﬁcantly boosts the performance of existing skeleton-based action representations learning networks and achieves state-of-the-art accuracy on two challenging benchmarks in both unsupervised and semi-supervised settings.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhan21a.html
  PDF: https://proceedings.mlr.press/v157/zhan21a/zhan21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhan21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yiwen
    family: Zhan
  - given: Yuchen
    family: Chen
  - given: Pengfei
    family: Ren
  - given: Haifeng
    family: Sun
  - given: Jingyu
    family: Wang
  - given: Qi
    family: Qi
  - given: Jianxin
    family: Liao
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 534-547
  id: zhan21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 534
  lastpage: 547
  published: 2021-11-28 00:00:00 +0000
- title: 'QActor: Active Learning on Noisy Labels'
  abstract: 'Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don’t have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning.  In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure \emph{CENT}, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/younesian21a.html
  PDF: https://proceedings.mlr.press/v157/younesian21a/younesian21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-younesian21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Taraneh
    family: Younesian
  - given: Zilong
    family: Zhao
  - given: Amirmasoud
    family: Ghiassi
  - given: Robert
    family: Birke
  - given: Lydia Y
    family: Chen
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 548-563
  id: younesian21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 548
  lastpage: 563
  published: 2021-11-28 00:00:00 +0000
- title: 'Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations'
  abstract: 'We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms. The standard MAB algorithms for countably many arms begin with selecting a random set of candidate arms and then applying standard MAB algorithms, e.g., UCB, on this candidate set downstream. We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms. We show that in tasks like online information gathering, where sequential query recommendations are employed, the sequences of queries are correlated and the number of potentially optimal queries can be reduced to a manageable size by selecting queries with maximum utility with respect to the currently executing query. Our experimental results using a recent real online literature discovery service log file demonstrate that the proposed arm selection strategy improves the cumulative regret substantially with respect to the state-of-the-art baseline algorithms. Our data model and source code are available at  \url{https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/}'
  volume: 157
  URL: https://proceedings.mlr.press/v157/puthiya-parambath21a.html
  PDF: https://proceedings.mlr.press/v157/puthiya-parambath21a/puthiya-parambath21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-puthiya-parambath21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shameem
    family: Puthiya Parambath
  - given: Christos
    family: Anagnostopoulos
  - given: Roderick
    family: Murray-Smith
  - given: Sean
    family: MacAvaney
  - given: Evangelos
    prefix: and
    family: Zervas
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 564-579
  id: puthiya-parambath21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 564
  lastpage: 579
  published: 2021-11-28 00:00:00 +0000
- title: 'Multi-task Actor-Critic with Knowledge Transfer via a Shared Critic'
  abstract: 'Multi-task actor-critic is a learning paradigm proposed in the literature to improve the learning efficiency of multiple actor-critics by sharing the learned policies across tasks while the reinforcement learning progresses online. However, existing multi-task actor-critic algorithms can only handle reinforcement learning tasks within the same problem domain, they may fail in cases where tasks possessing diverse state-action spaces. Taking this cue, in this paper, we embark a study on multi-task actor-critic with knowledge transfer via a share critic to enable the multi-task learning of actor-critic in heterogeneous state-action environments. Further, for efficient learning of the proposed multi-task actor-critic, a new formula for calculating the gradient of the actor network is also presented. To evaluate the performance of our approach, comprehensive empirical studies on continuous robotic tasks with different numbers of links. The experimental results confirmed the effectiveness of the proposed multi-task actor-critic algorithm.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhang21b.html
  PDF: https://proceedings.mlr.press/v157/zhang21b/zhang21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhang21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Gengzhi
    family: Zhang
  - given: Liang
    family: Feng
  - given: Yaqing
    family: Hou
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 580-593
  id: zhang21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 580
  lastpage: 593
  published: 2021-11-28 00:00:00 +0000
- title: 'Contrastive Neural Processes for Self-Supervised Learning'
  abstract: 'Recent contrastive methods show significant improvement in self-supervised learning in several domains. In particular, contrastive methods are most effective where data augmentation can be easily constructed e.g. in computer vision. However, they are less successful in domains without established data transformations such as time series data. In this paper, we propose a novel self-supervised learning framework that combines contrastive learning with neural processes. It relies on recent advances in neural processes to perform time series forecasting. This allows to generate augmented versions of data by employing a set of various sampling functions and, hence, avoid manually designed augmentations. We extend conventional neural processes and propose a new contrastive loss to learn times series representations in a self-supervised setup. Therefore, unlike previous self-supervised methods, our augmentation pipeline is task-agnostic, enabling our method to perform well across various applications. In particular, a ResNet with a linear classifier trained using our approach is able to outperform state-of-the-art techniques across industrial, medical and audio datasets improving accuracy over 10% in ECG periodic data. We further demonstrate that our self-supervised representations are more efficient in the latent space, improving multiple clustering indexes and that fine-tuning our method on 10% of labels achieves results competitive to fully-supervised learning.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/kallidromitis21a.html
  PDF: https://proceedings.mlr.press/v157/kallidromitis21a/kallidromitis21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-kallidromitis21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Konstantinos
    family: Kallidromitis
  - given: Denis
    family: Gudovskiy
  - given: Kozuka
    family: Kazuki
  - given: Ohama
    family: Iku
  - given: Luca
    family: Rigazio
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 594-609
  id: kallidromitis21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 594
  lastpage: 609
  published: 2021-11-28 00:00:00 +0000
- title: 'Pyramid Correlation based Deep Hough Voting for Visual Object Tracking'
  abstract: ' Most of the existing Siamese-based trackers treat tracking problem as a parallel task of classification and regression. However, some studies show that the sibling head structure could lead to suboptimal solutions during the network training. Through experiments we find that, without regression, the performance could be equally promising as long as we delicately design the network to suit the training objective. We introduce a novel voting-based classification-only tracking algorithm named Pyramid Correlation based Deep Hough Voting (short for PCDHV), to jointly locate the top-left and bottom-right corners of the target. Specifically we innovatively construct a Pyramid Correlation module to equip the embedded feature with fine-grained local structures and global spatial contexts; The elaborately designed Deep Hough Voting module further take over, integrating long-range dependencies of pixels to perceive corners; In addition, the prevalent discretization gap is simply yet effectively alleviated by increasing the spatial resolution of the feature maps while exploiting channel-space relationships. The algorithm is general, robust and simple. We demonstrate the effectiveness of the module through a series of ablation experiments. Without bells and whistles, our tracker achieves better or comparable performance to the SOTA algorithms on three challenging benchmarks (TrackingNet, GOT-10k and LaSOT) while running at a real-time speed of 80 FPS. Codes and models will be released. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/wang21d.html
  PDF: https://proceedings.mlr.press/v157/wang21d/wang21d.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wang21d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ying
    family: Wang
  - given: Tingfa
    family: Xu
  - given: Shenwang
    family: Jiang
  - given: Junjie
    family: Chen
  - given: Jianan
    family: Li
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 610-625
  id: wang21d
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 610
  lastpage: 625
  published: 2021-11-28 00:00:00 +0000
- title: 'calibrated adversarial training'
  abstract: 'Adversarial training is an approach of increasing the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge of producing adversarial examples is to contain sufficient perturbation in the example to flip the model’s output while not making severe changes in the example’s semantical content. Exuberant change in the semantical content could also change the true label of the example. Adding such examples to the training set results in adverse effects. In this paper, we present the Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. The method produces pixel-level adaptations to the perturbations based on novel calibrated robust error. We provide theoretical analysis on the calibrated robust error and derive an upper bound for it. Our empirical results show a superior performance of the Calibrated Adversarial Training over a number of public datasets.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/huang21a.html
  PDF: https://proceedings.mlr.press/v157/huang21a/huang21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-huang21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tianjin
    family: Huang
  - given: Vlado
    family: Menkovski
  - given: Yulong
    family: Pei
  - given: Mykola
    family: Pechenizkiy
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 626-641
  id: huang21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 626
  lastpage: 641
  published: 2021-11-28 00:00:00 +0000
- title: 'ASD-Conv: Monocular 3D object detection network based on Asymmetrical Segmentation Depth-aware Convolution'
  abstract: 'In the field of 3D object recognition, monocular 3D recognition technology is a valuable recognition technology. Compared with binocular technology and lidar technology, its cost is lower. In this paper, based on the existing monocular 3D recognition network, we propose an asymmetrical segmentation depth-aware network: ASD-Conv Network, which is used to better obtain the depth information of monocular images, so as to obtain better recognition results. Compared with other monocular recognition networks, ASD-Conv network performs special segmentation on the image, which can better obtain the depth distribution of the image, and has made a good breakthrough and improvement in the image recognition tasks of 2D, BEV and 3D. The improved algorithm proposed in this paper can improve the detection accuracy while maintaining a certain real-time performance. Experimental results show that compared with the current model, the proposed monocular 3D object detection algorithm based on D-ASDConv has an average improvement rate of 2.82%(AP) in large object detection and the highest average improvement rate of 2.01%(AP) in small object detection on Kitti dataset. The algorithm can effectively learn more advanced features of spatial perception, and the detection results of monocular images are more accurate.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/xingyuan21a.html
  PDF: https://proceedings.mlr.press/v157/xingyuan21a/xingyuan21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-xingyuan21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yu
    family: Xingyuan
  - given: Du
    family: Neng
  - given: Gao
    family: Ge
  - given: Wen
    family: Fan
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 642-655
  id: xingyuan21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 642
  lastpage: 655
  published: 2021-11-28 00:00:00 +0000
- title: 'Convolutional Hypercomplex Embeddings for Link Prediction'
  abstract: 'Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, $\mathbb{R}$ and $\mathbb{C}$. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means to tackle link prediction. In addition, models based on convolutions on real-valued embeddings often yield state-of-the-art results for link prediction. In this paper, we investigate a composition of convolution operations with hypercomplex multiplications. We propose the four approaches QMult, OMult, ConvQ and ConvO  to tackle the link prediction problem. QMult and OMult can be considered as quaternion and octonion extensions of previous state-of-the-art approaches, including DistMult and ComplEx. ConvQ and ConvO build upon QMult and OMult by including convolution operations in a way inspired by the residual learning framework. We evaluated our approaches on seven link prediction datasets including WN18RR, FB15K-237 and YAGO3-10. Experimental results suggest that the benefits of learning hypercomplex-valued vector representations become more apparent as the size and complexity of the knowledge graph grows. ConvO outperforms state-of-the-art approaches on FB15K-237 in MRR, Hit@1 and Hit@3, while QMult, OMult, ConvQ and ConvO outperform state-of-the-approaches on YAGO3-10 in all metrics. Results also suggest that link prediction performances can be further improved via prediction averaging. To foster reproducible research, we provide an open-source implementation of approaches, including training and evaluation scripts as well as pretrained models.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/demir21a.html
  PDF: https://proceedings.mlr.press/v157/demir21a/demir21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-demir21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Caglar
    family: Demir
  - given: Diego
    family: Moussallem
  - given: Stefan
    family: Heindorf
  - given: Axel-Cyrille
    family: Ngonga Ngomo
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 656-671
  id: demir21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 656
  lastpage: 671
  published: 2021-11-28 00:00:00 +0000
- title: 'Beyond $L_p$ Clipping: Equalization based Psychoacoustic Attacks against ASRs'
  abstract: 'Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the “psychoacoustic” attacks can create examples with relatively imperceptible perturbations, as they leverage the knowledge of the human auditory system. Unfortunately, existing psychoacoustic attacks can only be applied against traditional models, and are obsolete against the newer, fully end-to-end ASRs. In this paper, we propose an equalization-based psychoacoustic attack that can exploit both traditional and fully end-to-end ASRs. We successfully demonstrate our attack against real-world ASRs that include DeepSpeech and Wav2Letter. Moreover, we employ a user study to verify that our method creates low audible distortion. Specifically, 80 of the 100 participants voted in favor of \textit{all} our attack audio samples as less noisier than the existing state-of-the-art attack. Through this, we demonstrate both types of existing ASR pipelines can be exploited with minimum degradation to attack audio quality.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/abdullah21a.html
  PDF: https://proceedings.mlr.press/v157/abdullah21a/abdullah21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-abdullah21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hadi
    family: Abdullah
  - given: Muhammad Sajidur
    family: Rahman
  - given: Christian
    family: Peeters
  - given: Cassidy
    family: Gibson
  - given: Washington
    family: Garcia
  - given: Vincent
    family: Bindschaedler
  - given: Thomas
    family: Shrimpton
  - given: Patrick
    family: Traynor
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 672-688
  id: abdullah21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 672
  lastpage: 688
  published: 2021-11-28 00:00:00 +0000
- title: 'Slice-sampling based 3D Object Classification'
  abstract: 'Multiview-based 3D object detection achieved great success in the past years. However, for some complex models with complex inner structures, the performances of these methods are not satisfactory. This paper provides a method based on slide sampling for 3D object classification. First, we slice and sample the model from the different depths and directions to get the model’s features. Then, a deep neural network designed based on the attention mechanism is used to classify the input data. The experiments show that the performance of our method is competitive on ModelNet. Moreover, for some special models with simple surfaces and complex inner structures, the performance of our method is outstanding and stable.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/xiangwen21a.html
  PDF: https://proceedings.mlr.press/v157/xiangwen21a/xiangwen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-xiangwen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhao
    family: Xiangwen
  - given: Yang
    family: Yi-Jun
  - given: Zeng
    family: Wei
  - given: Yang
    family: Liqun
  - given: Wang
    family: Yao
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 689-704
  id: xiangwen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 689
  lastpage: 704
  published: 2021-11-28 00:00:00 +0000
- title: 'Multi-Branch Network for Cross-Subject EEG-based Emotion Recognition'
  abstract: 'In recent years, electrocardiogram (EEG)-based emotion recognition has received increasing attention in affective computing. Since the individual differences of EEG signals are large, most models are trained for specific subjects, and the generalization is poor when applied to new subjects. In this paper, we propose a Multi-Branch Network (MBN) model to solve this problem. According to the characteristics of the cross-subject data, different branch networks are designed to separate the background features and task features of the EEG signals for classification to have better model performance. Besides, there is no new-subject data needed during model training. In order to avoid the negative improvement caused by samples with significant differences to model training, a tiny amount of new-subject data is used to filter the training samples to improve the model performance further. Before training the model, the samples with significant differences from the new subject were deleted by comparing the background features between the subjects. The experimental results show that compared with Single-Branch Network (SBN) model, the accuracy of the MBN model is improved by 20.89% on the SEED dataset. Furthermore, compared with other common methods, the proposed method uses less new-subject data, which improves its practicability in practical application.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/lin21a.html
  PDF: https://proceedings.mlr.press/v157/lin21a/lin21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-lin21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Guang
    family: Lin
  - given: Li
    family: Zhu
  - given: Bin
    family: Ren
  - given: Yiteng
    family: Hu
  - given: Jianhai
    family: Zhang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 705-720
  id: lin21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 705
  lastpage: 720
  published: 2021-11-28 00:00:00 +0000
- title: 'Skew-symmetrically perturbed gradient flow for convex optimization'
  abstract: 'Recently, many methods for optimization and sampling have been developed by designing continuous dynamics followed by discretization. The dynamics that have been used for optimization have their corresponding underlying functionals to be minimized. On the other hand, a wider class of dynamics have been studied for sampling, which is not necessarily limited to functional minimization. For example, dynamics perturbed with skew-symmetric matrices, which cannot be seen as minimization of functionals, have been widely used to reduce asymptotic variance. Following this success in sampling, exploring such perturbed dynamics in the context of optimization can open a new avenue to optimization algorithm design. In this work, we introduce a perturbation technique for sampling into optimization for strongly convex functions. We show that perturbation applied to the gradient flow yields rapid convergence in optimization for strongly convex functions. Based on this continuous dynamics, we propose an optimization algorithm for strongly convex functions with a novel discretization framework that combines the Euler method with the leapfrog method which is used in the Hamilton Monte Carlo method. Our numerical experiments show that the perturbation technique is useful for optimization.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/futami21a.html
  PDF: https://proceedings.mlr.press/v157/futami21a/futami21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-futami21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Futoshi
    family: Futami
  - given: Tomoharu
    family: Iwata
  - given: Naonori
    family: Ueda
  - given: Ikko
    family: Yamane
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 721-736
  id: futami21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 721
  lastpage: 736
  published: 2021-11-28 00:00:00 +0000
- title: 'Improving Gaussian mixture latent variable model convergence with Optimal Transport'
  abstract: 'Generative models with both discrete and continuous latent variables are highly motivated by the structure of many real-world data sets. They present, however, subtleties in training often manifesting in the discrete latent variable not being leveraged. In this paper, we show why such models struggle to train using traditional log-likelihood maximization, and that they are amenable to training using the Optimal Transport framework of Wasserstein Autoencoders. We find our discrete latent variable to be fully leveraged by the model when trained, without any modifications to the objective function or significant fine tuning. Our model generates comparable samples to other approaches while using relatively simple neural networks, since the discrete latent variable carries much of the descriptive burden. Furthermore, the discrete latent provides significant control over generation.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/gaujac21a.html
  PDF: https://proceedings.mlr.press/v157/gaujac21a/gaujac21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-gaujac21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Benoit
    family: Gaujac
  - given: Ilya
    family: Feige
  - given: David
    family: Barber
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 737-752
  id: gaujac21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 737
  lastpage: 752
  published: 2021-11-28 00:00:00 +0000
- title: 'Generating Deep Networks Explanations with Robust Attribution Alignment'
  abstract: 'Attribution methods play a key role in generating post-hoc explanations on pre-trained models, however it has been shown that existing methods yield unfaithful and noisy explanations. In this paper, we propose a new paradigm of attribution method: we treat the model’s explanations as a part of network’s outputs then generate attribution maps from the underlying deep network. The generated attribution maps are up-sampled from the last convolutional layer of the network to obtain localization information about the target to be explained. Inspired by recent studies that showed adversarially robust models’ saliency map aligns well with human perception, we utilize attribution maps from the robust model to supervise the learned attributions. Our proposed method can produce visually plausible explanations along with the prediction in inference phase. Experiments on real datasets show that our proposed method yields more faithful explanations than post-hoc attribution methods with lighter computational costs.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zeng21b.html
  PDF: https://proceedings.mlr.press/v157/zeng21b/zeng21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zeng21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Guohang
    family: Zeng
  - given: Yousef
    family: Kowsar
  - given: Sarah
    family: Erfani
  - given: James
    family: Bailey
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 753-768
  id: zeng21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 753
  lastpage: 768
  published: 2021-11-28 00:00:00 +0000
- title: 'Scalable gradient matching based on state space Gaussian Processes'
  abstract: 'In many scientific fields, various phenomena are modeled by ordinary differential equations (ODEs). Parameters in ODEs are generally unknown and hard to measure directly. Since analytical solutions for ODEs can rarely be obtained, statistical methods are often used to infer parameters from experimental observations. Among many existing methods, Gaussian process-based gradient matching has been explored extensively. However, the existing method cannot be scaled to a massive dataset. Given $N$ data points, existing algorithms show $\mathcal{O}(N^3)$ computational cost. In this paper, we propose a novel algorithm using the state space reformulation of Gaussian processes. More specifically, we reformulate Gaussian process gradient matching as a special state-space model problem, then approximate its posterior distribution by a novel Rao-Blackwellization filtering, which enjoys $\mathcal{O}(N)$ computational cost. Moreover, our algorithm is expressed as closed forms, it is 1000 times more faster than existing methods measured in wall clock time.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/futami21b.html
  PDF: https://proceedings.mlr.press/v157/futami21b/futami21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-futami21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Futoshi
    family: Futami
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 769-784
  id: futami21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 769
  lastpage: 784
  published: 2021-11-28 00:00:00 +0000
- title: 'Domain Adaptive YOLO for One-Stage Cross-Domain Detection'
  abstract: 'Domain shift is a major challenge for object detectors to generalize well to real world applications. Emerging techniques of domain adaptation for two-stage detectors help to tackle this problem. However, two-stage detectors are not the first choice for industrial applications due to its long time consumption. In this paper, a novel Domain Adaptive YOLO (DA-YOLO) is proposed to improve cross-domain performance for one-stage detectors. Image level features alignment is used to strictly match for  local features like texture, and loosely match for global features like illumination. Multi-scale instance level features alignment is presented to reduce instance domain shift effectively, such as variations in object appearance and viewpoint. A consensus regularization to these domain classifiers is employed to help the network generate domain-invariant detections. We evaluate our proposed method on popular datasets like Cityscapes, KITTI, SIM10K and et al.. The results demonstrate considerable improvement when tested under different cross-domain scenarios.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhang21c.html
  PDF: https://proceedings.mlr.press/v157/zhang21c/zhang21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhang21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shizhao
    family: Zhang
  - given: Hongya
    family: Tuo
  - given: Jian
    family: Hu
  - given: Zhongliang
    family: Jing
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 785-797
  id: zhang21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 785
  lastpage: 797
  published: 2021-11-28 00:00:00 +0000
- title: 'Hierarchical Semantic Segmentation using Psychometric Learning'
  abstract: 'Assigning meaning to parts of image data is the goal of semantic image segmentation. Machine learning methods, specifically supervised learning is commonly used in a variety of tasks formulated as semantic segmentation. One of the major challenges in the supervised learning approaches is expressing and collecting the rich knowledge that experts have with respect to the meaning present in the image data. Towards this, typically a fixed set of labels is specified and experts are tasked with annotating the pixels, patches or segments in the images with the given labels. In general, however, the set of classes does not fully capture the rich semantic information present in the images. For example, in medical imaging such as histology images, the different parts of cells could be grouped and sub-grouped based on the expertise of the pathologist.  To achieve such a precise semantic representation of the concepts in the image, we need access to the full depth of knowledge of the annotator. In this work, we develop a novel approach to collect segmentation annotations from experts based on psychometric testing. Our method consists of psychometric testing procedure, active query selection,  query enhancement, and a deep metric learning model to achieve a patch-level image embedding that allows for semantic segmentation of images. We show the merits of our method with evaluation on the synthetically generated image, aerial image and histology image.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/yin21a.html
  PDF: https://proceedings.mlr.press/v157/yin21a/yin21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-yin21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Lu
    family: Yin
  - given: Vlado
    family: Menkovski
  - given: Shwei
    family: Liu
  - given: Mykola
    family: Pechenizkiy
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 798-813
  id: yin21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 798
  lastpage: 813
  published: 2021-11-28 00:00:00 +0000
- title: 'Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick'
  abstract: 'Hashing algorithms are continually used for large-scale learning and similarity search, with computationally cheap and better algorithms being proposed every year. In this paper we focus on hashing algorithms which involve estimating a distance measure $d(\vec{x}_i,\vec{x}_j)$ between two vectors $\vec{x}_i, \vec{x}_j$. Such hashing algorithms require generation of random variables, and we propose two approaches to reduce the variance of our hashed estimates: control variates and maximum likelihood estimates. We explain how these approaches can be immediately applied to a wide subset of hashing algorithms. Further, we evaluate the impact of these methods on various datasets. We finally run empirical simulations to verify our results.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/kang21a.html
  PDF: https://proceedings.mlr.press/v157/kang21a/kang21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-kang21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Keegan
    family: Kang
  - given: Sergey
    family: Kushnarev
  - given: Wei Pin
    family: Wong
  - given: Rameshwar
    family: Pratap
  - given: Haikal
    family: Yeo
  - given: Chen
    family: Yijia
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 814-829
  id: kang21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 814
  lastpage: 829
  published: 2021-11-28 00:00:00 +0000
- title: 'Feature Convolutional Networks'
  abstract: 'Convolutional neural networks are among the most successful deep learning models used for image processing, computer vision and natural language processing applications. In this paper, we define convolution operator for numerical tabular features and thus propose feature convolutional network model for machine learning tasks. Feature convolutional networks contain feature convolution layer to extract pairwise feature convolutions in the relational feature spaces. Compared with the baseline multi-layer neural network model, the feature convolutional network gains better performance among all the experiments. The experiments results suggest that feature convolutional networks can generate efficient features automatically and provide better performance through automatic feature learning. The demo code is at https://github.com/info-ruc/FeatConvNet.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/hu21a.html
  PDF: https://proceedings.mlr.press/v157/hu21a/hu21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-hu21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: He
    family: Hu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 830-839
  id: hu21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 830
  lastpage: 839
  published: 2021-11-28 00:00:00 +0000
- title: 'Bias-tolerant Fair Classification'
  abstract: 'The label bias and selection bias are acknowledged as two reasons in data that will hinder the fairness of machine-learning outcomes. The label bias occurs when the labeling decision is disturbed by sensitive features, while the selection bias occurs when subjective bias exists during the data sampling. Even worse, models trained on such data can inherit or even intensify the discrimination. Most algorithmic fairness approaches perform an empirical risk minimization with predefined fairness constraints, which tends to trade-off accuracy for fairness. However, such methods would achieve the desired fairness level with the sacrifice of the benefits (receive positive outcomes) for individuals affected by the bias. Therefore, we propose a \textbf{B}ias-Tolerant \textbf{FA}ir \textbf{R}egularized \textbf{L}oss (B-FARL), which tries to regain the benefits using data affected by label bias and selection bias. B-FARL takes the biased data as input, calls a model that approximates the one trained with fair but latent data, and thus prevents discrimination without constraints required. In addition, we show the effective components by decomposing B-FARL, and we utilize the meta-learning framework for the B-FARL optimization. The experimental results on real-world datasets show that our method is empirically effective in improving fairness towards the direction of true but latent labels.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhang21d.html
  PDF: https://proceedings.mlr.press/v157/zhang21d/zhang21d.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhang21d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yixuan
    family: Zhang
  - given: Feng
    family: Zhou
  - given: Zhidong
    family: Li
  - given: Yang
    family: Wang
  - given: Fang
    family: Chen
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 840-855
  id: zhang21d
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 840
  lastpage: 855
  published: 2021-11-28 00:00:00 +0000
- title: 'Multi-factor Memory Attentive Model for Knowledge Tracing'
  abstract: 'The traditional knowledge tracing with neural network usually embeds the required information and predicates the knowledge proficiency by embedded information. Only few information, however, is considered in traditional methods, such as the information of exercises in terms of concept. In this paper, we propose a multi-factor memory attentive model for knowledge tracing (MMAKT). In terms of Neural Cognitive Diagnosis (NeuralCD) framework, MMAKT introduces the factors of the knowledge concept relevancy, the difficulty of each concept, the discrimination among exercises and the student’s proficiency to construct interaction vectors.  Moreover, in order to achieve more accurate prediction precision, MMAKT introduces attention mechanism to enhance the expression of historical relationship between interactions. With the experiments on the real-world datasets, MMAKT shows better performance of knowledge tracing and prediction in comparision with the state-of-the-art approaches.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/liu21c.html
  PDF: https://proceedings.mlr.press/v157/liu21c/liu21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-liu21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Congjie
    family: Liu
  - given: Xiaoguang
    family: Li
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 856-869
  id: liu21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 856
  lastpage: 869
  published: 2021-11-28 00:00:00 +0000
- title: 'Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types'
  abstract: 'Missing values with mixed data types is a common problem in a large number of machine learning applications such as processing of surveys and in different medical applications. Recently, Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework. While the present Gaussian copula models have shown to yield state of the art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values, compared to previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables in addition to the present support of ordinal, binary, and continuous variables.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/christoffersen21a.html
  PDF: https://proceedings.mlr.press/v157/christoffersen21a/christoffersen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-christoffersen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Benjamin
    family: Christoffersen
  - given: Mark
    family: Clements
  - given: Keith
    family: Humphreys
  - given: Hedvig
    family: Kjellström
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 870-885
  id: christoffersen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 870
  lastpage: 885
  published: 2021-11-28 00:00:00 +0000
- title: 'Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization'
  abstract: 'For lower bit-widths such as less than 8-bit, many quantization strategies include re-training in order to recover accuracy degradation. However, the re-training works against rapid deployment for wide distribution of quantized models. Therefore, post-training quantization has been getting more attention in recent years. In one example, partial quantization according to the layer sensitivity based on the accuracy after each quantization has been proposed; however, the effects of one layer quantization on the other layers has not taken into account. To further reduce the accuracy degradation, we propose a quantization scheme that considers the effects by continuously updating the accuracy after each layer quantization. Additionally, for more data compression, we extend that scheme to mixed precision, which applies a layer-by-layer fitted bit-width. Since the search space for bit allocation per layer increases exponentially with the number of layers $N$, Existing methods require computationally intensive approach such as network training. Here, we derive practical solutions to the bit allocation problem in polynomial time $O(N^2)$ using a deterministic greedy search algorithm inspired by submodular optimization without any training. For example, the proposed algorithm completes a search on ResNet18 for ImageNet in 1 hour for a single GPU. Compared to the case without updating the layer sensitivity, our method improves the accuracy of the quantized model by more than 1% with multiple convolutional neural networks. For examples, 6-bit quantization of MobileNetV2 achieves 80.1% reduction of model size with -1.10% accuracy degradation. 4-bit quantization of ResNet50 achieves 82.9% size reduction with -0.194% accuracy degradation. Furthermore, results show that the proposed method reduces the accuracy degradation by more than about 0.7% compared to various latest post-training quantization strategies.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/satoki21a.html
  PDF: https://proceedings.mlr.press/v157/satoki21a/satoki21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-satoki21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tsuji
    family: Satoki
  - given: Kawaguchi
    family: Hiroshi
  - given: Inoue
    family: Atsuki
  - given: Sakai
    family: Yasufumi
  - given: Yamada
    family: Fuyuka
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 886-901
  id: satoki21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 886
  lastpage: 901
  published: 2021-11-28 00:00:00 +0000
- title: 'ExNN-SMOTE: Extended Natural Neighbors Based SMOTE to Deal with Imbalanced Data'
  abstract: 'Many practical applications suffer from the problem of imbalanced classification. The minority class has poor classification performance; on the other hand, its misclassification cost is high. One reason for classification difficulty is the intrinsic complicated distribution characteristics (CDCs) in imbalanced data itself. Classical oversampling method SMOTE generates synthetic minority class examples between neighbors, which is parameter dependent. Furthermore, due to blindness of neighbor selection, SMOTE suffers from overgeneralization in the minority class. To solve such problems, we propose an oversampling method, called extended natural neighbors based SMOTE (ExNN-SMOTE). In ExNN-SMOTE, neighbors are determined adaptively by capturing data distribution characteristics. Extensive experiments over synthetic and real datasets demonstrate the effectiveness of ExNN-SMOTE dealing with CDCs and the superiority of ExNN-SMOTE over other SMOTE-related methods.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/guan21a.html
  PDF: https://proceedings.mlr.press/v157/guan21a/guan21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-guan21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hongjiao
    family: Guan
  - given: Bin
    family: Ma
  - given: Yingtao
    family: Zhang
  - given: Xianglong
    family: Tang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 902-917
  id: guan21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 902
  lastpage: 917
  published: 2021-11-28 00:00:00 +0000
- title: 'Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning'
  abstract: ' The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions. However, existing analyses focus on fixed regularization with a constant weighting coefficient and do not consider cases where the coefficient is allowed to change dynamically. In this paper, we study the dynamic coefficient scheme and present the first asymptotic error bound. Based on the dynamic coefficient error bound, we propose an effective scheme to tune the coefficient according to the magnitude of error in favor of more robust learning. Complementing this development, we propose a novel algorithm, Geometric Value Iteration (GVI), that features a dynamic error-aware KL coefficient design with the aim of mitigating the impact of errors on performance. Our experiments demonstrate that GVI can effectively exploit the trade-off between learning speed and robustness over uniform averaging of a constant KL coefficient. The combination of GVI and deep networks shows stable learning behavior even in the absence of a target network, where algorithms with a constant KL coefficient would greatly oscillate or even fail to converge.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/kitamura21a.html
  PDF: https://proceedings.mlr.press/v157/kitamura21a/kitamura21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-kitamura21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Toshinori
    family: Kitamura
  - given: Lingwei
    family: Zhu
  - given: Takamitsu
    family: Matsubara
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 918-931
  id: kitamura21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 918
  lastpage: 931
  published: 2021-11-28 00:00:00 +0000
- title: 'Collaborative Novelty Detection for Distributed Data by a Probabilistic Method'
  abstract: 'Novelty detection, which detects anomalies based on a training dataset consisting of only the normal data, is an important task in several applications. In addition, in the real world, there may be situations where data is owned by multiple parties in a distributed manner but cannot be shared with each other due to privacy and confidentiality requirements. Therefore, how to develop distributed novelty detection while preserving privacy is essential. To address this challenge, we propose a probabilistic collaborative method that allows distributed novelty detection for multiple parties without sharing the original data. The proposed method constructs a collaborative kernel based on a collaborative data analysis framework, by which intermediate representations are generated from each party and shared for collaborative novelty detection. Numerical experiments demonstrate that the proposed method obtains better performance compared with the individual novelty detection in the local party.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/imakura21a.html
  PDF: https://proceedings.mlr.press/v157/imakura21a/imakura21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-imakura21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Akira
    family: Imakura
  - given: Xiucai
    family: Ye
  - given: Tetsuya
    family: Sakurai
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 932-947
  id: imakura21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 932
  lastpage: 947
  published: 2021-11-28 00:00:00 +0000
- title: 'Efficient Coreset Constructions via Sensitivity Sampling'
  abstract: 'A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a cost function, then a set $S\subseteq P$ with weights $w:P\to[0,\infty)$ is an $\epsilon$-coreset for some parameter $\epsilon>0$ if $\sum_{s\in S}w(s)f(s,q)$ is a $(1+\epsilon)$ multiplicative approximation to $\sum_{p\in P}f(p,q)$ for all $q\in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the suggested coresets in the recent decade used, or could have used a general framework for constructing coresets whose size depends quadratically on the total sensitivity $t$. In this paper we improve this bound from $O(t^2)$ to $O(t\log t)$. Thus our results imply more space efficient solutions to a number of problems, including projective clustering, $k$-line clustering, and subspace approximation. The main technical result is a generic reduction to the sample complexity of learning a class of functions with bounded VC dimension. We show that obtaining an $(\nu,\alpha)$-sample for this class of functions with appropriate parameters $\nu$ and $\alpha$ suffices to achieve space efficient $\epsilon$-coresets. Our result implies more efficient coreset constructions for a number of interesting problems in machine learning; we show applications to $k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the integer $(j,k)$-projective clustering problem. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/braverman21a.html
  PDF: https://proceedings.mlr.press/v157/braverman21a/braverman21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-braverman21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Vladimir
    family: Braverman
  - given: Dan
    family: Feldman
  - given: Harry
    family: Lang
  - given: Adiel
    family: Statman
  - given: Samson
    family: Zhou
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 948-963
  id: braverman21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 948
  lastpage: 963
  published: 2021-11-28 00:00:00 +0000
- title: 'Dynamic Popularity-Aware Contrastive Learning for Recommendation'
  abstract: 'With the development of deep learning techniques, contrastive representation learning has been increasingly employed in large-scale recommender systems. For instance, deep user-item matching models can be trained by contrasting positive and negative examples and learning discriminative user and item representations. Despite their success, the distinguishable properties of the recommender system are often ignored in existing modelling. Standard methods approximate maximum likelihood estimation on user behavior data in a manner similar to language models. Specifically, the way of model optimization corresponds to approximating the user-item pointwise mutual information, which can be regarded as eliminating the influence of global item popularity on user behavior to capture intrinsic user preference. In addition, unlike the situation in language models where word frequency is relatively stable, item popularity is constantly evolving. To address these issues, we propose a novel dynamic popularity-aware (DPA) contrastive learning method for recommendation, which consists of two key components: i) a dynamic negative sampling strategy is involved to enhance the user representation, ii) a dynamic prediction recovery is adopted by the real-time item popularity. The proposed strategy can be naturally overlaid on any contrastive learning-based matching model to more accurately capture user interest and system dynamics. Finally, the effectiveness of the proposed strategy is demonstrated through comprehensive experiments on an e-commerce scenario of Alibaba Group.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/lin21b.html
  PDF: https://proceedings.mlr.press/v157/lin21b/lin21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-lin21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Fangquan
    family: Lin
  - given: Wei
    family: Jiang
  - given: Jihai
    family: Zhang
  - given: Cheng
    family: Yang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 964-968
  id: lin21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 964
  lastpage: 968
  published: 2021-11-28 00:00:00 +0000
- title: 'Neural Graph Filtering for Context-aware Recommendation'
  abstract: ' With the rapid development of web services, various kinds of context data become available in recommender systems to handler the data sparsity problem, called context-aware recommendation (CAR). It is challenging to develop effective approaches to model and exploit these various and heterogeneous data. Recently, heterogeneous information network (HIN) has been adopted to model the context data due to its flexibility in modelling data heterogeneity. However, most of the HIN-based methods, which rely on meta paths or graph embedding to extract features from HINs, cannot fully mine the network structure and semantic features of users and items. Besides, these methods, utilizing the global dataset to learn personalized latent factors, usually suffer individuality loss problem. In this paper, we propose a neural graph filtering method for context-aware recommendation, called NGF. First, we use an unified HIN to model both the users’ feedback information and the context data. Then, we adopt graph filtering to predict aspect-level ratings on a series of independent subgraphs of the unified HIN and feed a deep neural network (DNN) to fuse the predictions for CAR. Concretely, graph filtering is a case-by-case algorithm for personalized recommendation on HINs, which predicts the further behavior by all its similar historical behaviors. We split the unified HIN into many single-aspect networks according to the semantic relations and utilize graph filtering to predict user’s behavior on each subgraphs. The following deep neural network is to fuse the personalized predictions in aspect-level. Extensive experiments on two real-world datasets demonstrate the effectiveness of our neural graph filtering for CAR.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/chuanyan21a.html
  PDF: https://proceedings.mlr.press/v157/chuanyan21a/chuanyan21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-chuanyan21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhang
    family: Chuanyan
  - given: Hong
    family: Xiaoguang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 969-984
  id: chuanyan21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 969
  lastpage: 984
  published: 2021-11-28 00:00:00 +0000
- title: 'Lifelong Learning with Sketched Structural Regularization'
  abstract: 'Preventing catastrophic forgetting while continually learning new tasks is an essential problem in lifelong learning. Structural regularization (SR) refers to a family of algorithms that mitigate catastrophic forgetting by penalizing the network for changing its “critical parameters" from previous tasks while learning a new one. The penalty is often induced via a quadratic regularizer defined by an \emph{importance matrix}, e.g., the (empirical) Fisher information matrix in the Elastic Weight Consolidation framework. In practice and due to computational constraints, most SR methods crudely approximate the importance matrix by its diagonal. In this paper, we propose \emph{Sketched Structural Regularization} (Sketched SR) as an alternative approach to compress the importance matrices used for regularizing in SR methods. Specifically, we apply \emph{linear sketching methods} to better approximate the importance matrices in SR algorithms. We show that sketched SR: (i) is computationally efficient and straightforward to implement, (ii) provides an approximation error that is justified in theory, and (iii) is method oblivious by construction and can be adapted to any method that belongs to the SR class. We show that our proposed approach consistently improves various SR algorithms’ performance on both synthetic experiments and benchmark continual learning tasks, including permuted-MNIST and CIFAR-100.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/li21b.html
  PDF: https://proceedings.mlr.press/v157/li21b/li21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-li21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Haoran
    family: Li
  - given: Aditya
    family: Krishnan
  - given: Jingfeng
    family: Wu
  - given: Soheil
    family: Kolouri
  - given: Praveen K.
    family: Pilly
  - given: Vladimir
    family: Braverman
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 985-1000
  id: li21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 985
  lastpage: 1000
  published: 2021-11-28 00:00:00 +0000
- title: 'Robust Regression for Monocular Depth Estimation'
  abstract: 'Learning accurate models for monocular depth estimation requires precise depth annotation as e.g. gathered through LiDAR scanners. Because the data acquisition with sensors of this kind is costly and does not scale well in general, less advanced depth sources, such as time-of-flight cameras, are often used instead. However, these sensors provide less reliable signals, resulting in imprecise depth data for training regression models. As shown in idealized environments, the noise produced by commonly used RGB-D sensors violates standard statistical assumptions of regression methods, such as least squares estimation. In this paper, we investigate whether robust regression methods, which are more tolerant toward violations of statistical assumptions, can mitigate the effects of low-quality data. As a viable alternative to established approaches of that kind, we propose the use of so-called superset learning, where the original data is replaced by (less precise but more reliable) set-valued data. To evaluate and compare the methods, we provide an extensive empirical study on common benchmark data for monocular depth estimation. Our results clearly show the superiority of robust variants over conventional regression.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/lienen21a.html
  PDF: https://proceedings.mlr.press/v157/lienen21a/lienen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-lienen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Julian
    family: Lienen
  - given: Nils
    family: Nommensen
  - given: Ralph
    family: Ewerth
  - given: Eyke
    family: Hüllermeier
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1001-1016
  id: lienen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1001
  lastpage: 1016
  published: 2021-11-28 00:00:00 +0000
- title: 'Transfer Learning with Adaptive Online TrAdaBoost for Data Streams'
  abstract: 'In many real-world applications, data are often produced in the form of streams. Consider, for example, data produced by sensors. In data streams there can be concept drift where the distribution of the data changes. When we deal with multiple streams from the same domain, concepts that have occurred in one stream may occur in another. Therefore, being able to reuse knowledge across multiple streams can help models recover from concept drifts more quickly. A major challenge is that these data streams may be only partially identical and a direct adoption would not suffice. In this paper, we propose a novel framework to transfer both identical and partially identical concepts across different streams. In particular, we propose a new technique called Adaptive Online TrAdaBoost that tunes weight adjustments during boosting based on model performance. The experiments on synthetic data verify the desired properties of the proposed method, and the experiments on real-world data show the better performance of the method for data stream mining compared with its baselines.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wu21b.html
  PDF: https://proceedings.mlr.press/v157/wu21b/wu21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wu21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ocean
    family: Wu
  - given: Yun Sing
    family: Koh
  - given: Gillian
    family: Dobbie
  - given: Thomas
    family: Lacombe
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1017-1032
  id: wu21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1017
  lastpage: 1032
  published: 2021-11-28 00:00:00 +0000
- title: 'Bridging Code-Text Representation Gap using Explanation'
  abstract: 'This paper studies Code-Text Representation (CTR) learning, aiming to learn general-purpose representations that support downstream code/text applications such as code search, finding code matching textual queries. However, state-of-the-arts do not focus on matching the gap between code/text modalities. In this paper, we complement this gap by providing an intermediate representation, and view it as  “explanation.” Our contribution is three fold: First, we propose four types of explanation utilization methods for CTR, and compare their effectiveness. Second, we show that using explanation as the model input is desirable. Third, we confirm that even automatically generated explanation can lead to a drastic performance gain. To the best of our knowledge, this is the first work to define and categorize code explanation, for enhancing code understanding/representation.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/han21a.html
  PDF: https://proceedings.mlr.press/v157/han21a/han21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-han21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hojae
    family: Han
  - given: Youngwon
    family: Lee
  - given: Minsoo
    family: Kim
  - given: Hwang
    family: Seung-won
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1033-1048
  id: han21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1033
  lastpage: 1048
  published: 2021-11-28 00:00:00 +0000
- title: 'ContriQ: Ally-Focused Cooperation and Enemy-Concentrated Confrontation in Multi-Agent Reinforcement Learning'
  abstract: 'Centralized training with decentralized execution (CTDE) is an important setting for cooperative multi-agent reinforcement learning (MARL) due to communication constraints during execution and scalability constraints during training, which has shown superior performance but still suffers from challenges. One branch is to understand the mutual interplay between agents. Due to the communication constraints in practice, agents cannot exchange perceptual information, and thus, many approaches use a centralized attention network with scalability constraints. Contrary to these common approaches, we propose to learn to cooperate in a decentralized way by applying attention mechanism on the local observation so that each agent could focus on allied agents with a decentralized model, and therefore promote understanding. Another branch is to model how agents cooperate and simplify the learning process. Previous approaches that focus on value decomposition have achieved innovative results but still suffer from problems. These approaches either limit the representation expressiveness of their value function classes or relax the IGM consistency to achieve scalability, which may lead to poor performance. We combine value composition with game abstraction by modeling the relationships between agents as a bi-level graph. We propose a novel value decomposition network based on it through a bi-level attention network, which indicates the contribution of allied agents attacking enemies and the priority of attacking each enemy under the situation of each time step, respectively. We show that our method substantially outperforms existing state-of-the-art methods on battle games in StarCraft Ⅱ, and attention analysis is also comprehensively discussed with sights.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/chenran21a.html
  PDF: https://proceedings.mlr.press/v157/chenran21a/chenran21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-chenran21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhao
    family: Chenran
  - given: Shi
    family: Dianxi
  - given: Zhang
    family: Yaowen
  - given: Yang
    family: Huanhuan
  - given: Yang
    family: Shaowu
  - given: Zhang
    family: Yongjun
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1049-1064
  id: chenran21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1049
  lastpage: 1064
  published: 2021-11-28 00:00:00 +0000
- title: 'DAGSurv: Directed Ayclic Graph Based Survival Analysis Using Deep Neural Networks'
  abstract: 'Causal structures for observational survival data provide crucial information regarding the relationships between covariates and time-to-event. We derive motivation from the information theoretic source coding argument, and show that incorporating the knowledge of the directed acyclic graph (DAG) can be beneficial if suitable source encoders are employed. As a possible source encoder in this context, we derive a variational inference based conditional variational autoencoder for causal structured survival prediction, which we refer to as \texttt{DAGSurv}. We illustrate the performance of \texttt{DAGSurv} on low and high-dimensional synthetic datasets, and real-world datasets such as METABRIC and GBSG. We demonstrate that the proposed method outperforms other survival analysis baselines such as \texttt{Cox} Proportional Hazards, \texttt{DeepSurv} and \texttt{Deephit}, which are oblivious to the underlying causal relationship between data entities.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/sharma21a.html
  PDF: https://proceedings.mlr.press/v157/sharma21a/sharma21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-sharma21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ansh Kumar
    family: Sharma
  - given: Rahul
    family: Kukreja
  - given: Ranjitha
    family: Prasad
  - given: Shilpa
    family: Rao
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1065-1080
  id: sharma21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1065
  lastpage: 1080
  published: 2021-11-28 00:00:00 +0000
- title: 'Modeling Risky Choices in Unknown Environments'
  abstract: 'Decision-theoretic models explain human behavior in choice problems involving uncertainty, in terms of individual tendencies such as risk aversion. However, many classical models of risk require knowing the distribution of possible outcomes (rewards) for all options, limiting their applicability outside of controlled experiments. We study the task of learning such models in contexts where the modeler does not know the distributions but instead can only observe the choices and their outcomes for a user familiar with the decision problems, for example a skilled player playing a digital game. We propose a framework combining two separate components, one for modeling the unknown decision-making environment and another for the risk behavior. By using environment models capable of learning distributions we are able to infer classical models of decision-making under risk from observations of the user’s choices and outcomes alone, and we also demonstrate alternative models for predictive purposes. We validate the approach on artificial data and demonstrate a practical use case in modeling risk attitudes of professional esports teams.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/tanskanen21a.html
  PDF: https://proceedings.mlr.press/v157/tanskanen21a/tanskanen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-tanskanen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ville
    family: Tanskanen
  - given: Chang
    family: Rajani
  - given: Homayun
    family: Afrabandpey
  - given: Aini
    family: Putkonen
  - given: Aurélien
    family: Nioche
  - given: Arto
    family: Klami
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1081-1096
  id: tanskanen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1081
  lastpage: 1096
  published: 2021-11-28 00:00:00 +0000
- title: 'Expert advice problem with noisy low rank loss'
  abstract: 'We consider the expert advice problem with a low rank but noisy loss sequence, where a loss vector $l_{t} \in [-1,1]^N$ in each round $t$ is of the form $l_{t} = U v_{t} + \epsilon_{t}$ for some fixed but unknown $N \times d$ matrix $U$ called the kernel, some $d$-dimensional seed vector $v_{t} \in \mathbb{R}^{d}$, and some additional noisy term $\epsilon_t \in \mathbb{R}^{N}$ whose norm is bounded by $\epsilon$. This is a generalization of the works of Hazan et al. and Barman et al., where the former only treats noiseless loss and the latter assumes that the kernel is known in advance. In this paper, we propose an algorithm, where we re-construct the kernel under the assumptions, that the low rank loss is noised and there is no prior information about kernel. In this algorithm, we approximate the kernel by choosing a set of loss vectors with a high degree of independence from each other, and we give a regret bound of $O(d\sqrt{T}+d^{4/3}(N\epsilon)^{1/3}\sqrt{T})$. Moreover, even if in experiment, the proposed algorithm performs better than Hazan’s algorithm and Hedge algorithm.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/liu21d.html
  PDF: https://proceedings.mlr.press/v157/liu21d/liu21d.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-liu21d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yaxiong
    family: Liu
  - given: Xuanke
    family: Jiang
  - given: Kohei
    family: Hatano
  - given: Eiji
    family: Takimoto
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1097-1112
  id: liu21d
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1097
  lastpage: 1112
  published: 2021-11-28 00:00:00 +0000
- title: 'An online semi-definite programming with a generalised log-determinant regularizer and its applications'
  abstract: 'We consider a variant of the online semi-definite programming problem: The decision space consists of positive semi-definite matrices with bounded diagonal entries and bounded $\Gamma$-trace norm, which is a generalization of the trace norm defined by a positive definite matrix $\Gamma$. To solve this problem, we propose a follow-the-regularized-leader algorithm with a novel regularizer, which is a generalisation of the log-determinant function parameterized by the matrix $\Gamma$. Then we apply our algorithm to online binary matrix completion (OBMC) with side information and online similarity prediction with side information, and improve mistake bounds by logarithmic factors. In particular, for OBMC our mistake bound is optimal.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/liu21e.html
  PDF: https://proceedings.mlr.press/v157/liu21e/liu21e.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-liu21e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yaxiong
    family: Liu
  - given: Ken-ichiro
    family: Moridomi
  - given: Kohei
    family: Hatano
  - given: Eiji
    family: Takimoto
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1113-1128
  id: liu21e
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1113
  lastpage: 1128
  published: 2021-11-28 00:00:00 +0000
- title: 'Cross-structural Factor-topic Model: Document Analysis with Sophisticated Covariates'
  abstract: 'Modern text data is increasingly gathered in situations where it is paired with a  high-dimensional collection of covariates:  then both the text, the covariates, and their relationships are of interest to analyze. Despite the growing amount of such data, current topic models are unable to take into account large amounts of covariates successfully:  they fail to model structure among covariates and distort findings of both text and covariates. This paper presents a solution: a novel factor-topic model that enables researchers to analyze latent structure in both text and sophisticated document-level covariates collectively. The key innovation is that besides learning the underlying topical structure,  the model also learns the underlying factorial structure from the covariates and the interactions between the two structures.  A set of tailored variational inference algorithms for efficient computation are provided.  Experiments on three different datasets show the model outperforms comparable topic models in the ability to predict held-out document content.  Two case studies focusing on Finnish parliamentary election candidates and game players on Steam demonstrate the model discovers semantically meaningful topics, factors, and their interactions.  The model both outperforms state-of-the-art models in predictive accuracy and offers new factor-topic insights beyond other topic models.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/lu21a.html
  PDF: https://proceedings.mlr.press/v157/lu21a/lu21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-lu21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chien
    family: Lu
  - given: Jaakko
    family: Peltonen
  - given: Timo
    family: Nummenmaa
  - given: Jyrki
    family: Nummenmaa
  - given: Kalervo
    family: Järvelin
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1129-1144
  id: lu21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1129
  lastpage: 1144
  published: 2021-11-28 00:00:00 +0000
- title: 'Deep Structural Contrastive Subspace Clustering'
  abstract: 'Deep subspace clustering based on data self-expression is devoted to learning pairwise affinities in the latent feature space. Existing methods tend to rely on an autoencoder framework to learn representations for an affinity matrix. However, the representation learning driven largely by pixel-level data reconstruction is somewhat incompatible with the subspace clustering task. With the unavailability of ground truth, can structural representations, which is exactly what subspace clustering favors, be achieved by simply exploiting the supervision information in the data itself? In this paper, we formulate this intuition as a structural contrastive prediction task and propose an end-to-end trainable framework referred as Deep Structural Contrastive Subspace Clustering (DSCSC). Specifically, DSCSC makes use of data augmentation technique to mine positive pairs and constructs a data similarity graph in the embedding feature space to search negative pairs. A novel structural contrastive loss is proposed on the latent representations to achieve positive-concentrated and negative-separated property for subspace preserving. Extensive experiments on the benchmark datasets demonstrate that our method outperforms the state-of-the-art deep subspace clustering methods and imply the necessity of the proposed structural contrastive loss.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/peng21a.html
  PDF: https://proceedings.mlr.press/v157/peng21a/peng21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-peng21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Bo
    family: Peng
  - given: Wenjie
    family: Zhu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1145-1160
  id: peng21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1145
  lastpage: 1160
  published: 2021-11-28 00:00:00 +0000
- title: 'Lifelong Learning with Branching Experts'
  abstract: 'The problem of branching experts is an extension of the experts problem where the set of experts may grow over time. We compare this problem in different learning settings along several axes: adversarial versus stochastic losses; a fixed versus a growing set of experts (branching experts); and single-task versus lifelong learning with expert advice. First, for the branching experts problem, we achieve tight regret bounds in both adversarial and stochastic setting with a single algorithm. While it was known that the adversarial branching experts problem is strictly harder than the non-branching one, the stochastic branching experts problem is in fact no harder. Next, we study the extension to the lifelong learning with expert advice in which one has to make online predictions with a sequence of tasks. For this problem, we provide a single algorithm which works for both adversarial and stochastic setting, and our bounds when specialized to the case without branching recover the regret bounds previously achieved separately via different algorithms. Furthermore, we prove a regret lower bound which shows that in the lifelong learning scenario, the case with branching experts now becomes strictly harder than the non-branching case in the stochastic setting.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wu21c.html
  PDF: https://proceedings.mlr.press/v157/wu21c/wu21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wu21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yi-Shan
    family: Wu
  - given: Yi-Te
    family: Hong
  - given: Chi-Jen
    family: Lu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1161-1175
  id: wu21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1161
  lastpage: 1175
  published: 2021-11-28 00:00:00 +0000
- title: 'Ensembling With a Fixed Parameter Budget: When Does It Help and Why?'
  abstract: 'Given a fixed parameter budget, one can build a single large neural network or create a memory-split ensemble: a pool of several smaller networks with the same total parameter count as the single network. A memory-split ensemble can outperform its single model counterpart (Lobacheva et al., 2020): a phenomenon known as the memory-split advantage (MSA). The reasons for MSA are still not yet fully understood. In particular, it is difficult in practice to predict when it will exist. This paper sheds light on the reasons underlying MSA using random feature theory. We study the dependence of the MSA on several factors: the parameter budget, the training set size, the L2 regularization and the Stochastic Gradient Descent (SGD) hyper-parameters. Using the bias-variance decomposition, we show that MSA exists when the reduction in variance due to the ensemble (\ie, \textit{ensemble gain}) exceeds the increase in squared bias due to the smaller size of the individual networks (\ie, \textit{shrinkage cost}). Taken together, our theoretical analysis demonstrates that the MSA mainly exists for the small parameter budgets relative to the training set size, and that memory-splitting can be understood as a type of regularization. Adding other forms of regularization, \eg L2 regularization, reduces the MSA. Thus, the potential benefit of memory-splitting lies primarily in the possibility of speed-up via parallel computation. Our empirical experiments with deep neural networks and large image datasets show that MSA is not a general phenomenon, but mainly exists when the number of training iterations is small.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/deng21a.html
  PDF: https://proceedings.mlr.press/v157/deng21a/deng21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-deng21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Didan
    family: Deng
  - given: Emil Bertram
    family: Shi
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1176-1191
  id: deng21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1176
  lastpage: 1191
  published: 2021-11-28 00:00:00 +0000
- title: 'Revisiting Weight Initialization of Deep Neural Networks'
  abstract: 'The proper {\em initialization of weights} is crucial for the effective training and fast convergence of {\em deep neural networks} (DNNs). Prior work in this area has mostly focused on the principle of {\em balancing the variance among weights per layer} to maintain stability of (i) the input data propagated forwards through the network, and (ii) the loss gradients propagated backwards, respectively. This prevalent heuristic is however agnostic of dependencies among gradients across the various layers and captures only first-order effects per layer. In this paper, we investigate a {\em unifying approach}, based on approximating and controlling the {\em norm of the layers’ Hessians}, which both generalizes and explains existing initialization schemes such as {\em smooth activation functions}, {\em Dropouts}, and {\em ReLU}.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/skorski21a.html
  PDF: https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-skorski21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Maciej
    family: Skorski
  - given: Alessandro
    family: Temperoni
  - given: Martin
    family: Theobald
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1192-1207
  id: skorski21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1192
  lastpage: 1207
  published: 2021-11-28 00:00:00 +0000
- title: 'Robust Model-based Reinforcement Learning for Autonomous Greenhouse Control'
  abstract: 'Due to the high efficiency and less weather dependency, autonomous greenhouses provide an ideal solution to meet the increasing demand for fresh food. However, managers are faced with some challenges in finding appropriate control strategies for crop growth, since the decision space of the greenhouse control problem is an astronomical number. Therefore, an intelligent closed-loop control framework is highly desired to generate an automatic control policy. As a powerful tool for optimal control, reinforcement learning (RL) algorithms can surpass human beings’ decision-making and can also be seamlessly integrated into the closed-loop control framework. However, in complex real-world scenarios such as agricultural automation control, where the interaction with the environment is time-consuming and expensive, the application of RL algorithms encounters two main challenges, i.e., sample efficiency and safety. Although model-based RL methods can greatly mitigate the efficiency problem of greenhouse control, the safety problem has not got too much attention. In this paper, we present a model-based robust RL framework for autonomous greenhouse control to meet the sample efficiency and safety challenges. Specifically, our framework introduces an ensemble of environment models to work as a simulator and assist in policy optimization, thereby addressing the low sample efficiency problem. As for the safety concern, we propose a sample dropout module to focus more on worst-case samples, which can help improve the adaptability of the greenhouse planting policy in extreme cases. Experimental results demonstrate that our approach can learn a more effective greenhouse planting policy with better robustness than existing methods.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhang21e.html
  PDF: https://proceedings.mlr.press/v157/zhang21e/zhang21e.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhang21e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Wanpeng
    family: Zhang
  - given: Xiaoyan
    family: Cao
  - given: Yao
    family: Yao
  - given: Zhicheng
    family: An
  - given: Xi
    family: Xiao
  - given: Dijun
    family: Luo
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1208-1223
  id: zhang21e
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1208
  lastpage: 1223
  published: 2021-11-28 00:00:00 +0000
- title: 'Physics-inspired Learning for Structure-Aware Texture-Sensitive Underwater Image Enhancement'
  abstract: 'Recently, improving the visual quality of underwater images using deep learning-based methods has drawn considerable attention. Unfortunately, diverse environmental factors (e.g., blue/green color distortion) severely limit their performance in real-world environments. Therefore, strengthening the superiority of the underwater image enhancement method is critical. In this paper, we devote ourselves to develop a new architecture with strong superiority and adaptability. Inspired by the underwater imaging principle, we establish a novel physics-inspired learning model that is easy to realize. A Structure-Aware Texture-Sensitive Network (SATS-Net) is further developed to portray the model. The structure-aware module is responsible for structural information, and the texture-sensitive module is responsible for textural information. Thus, SATS-Net successfully incorporates robust characterization absorbed from the physical principle to achieve strong robustness and adaptability. We conduct extensive experiments to demonstrate that SATS-Net outperforms existing advanced techniques in various real-world underwater environments.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/xue21a.html
  PDF: https://proceedings.mlr.press/v157/xue21a/xue21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-xue21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Xinwei
    family: Xue
  - given: Zexuan
    family: Li
  - given: Long
    family: Ma
  - given: Risheng
    family: Liu
  - given: Xin
    family: Fan
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1224-1236
  id: xue21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1224
  lastpage: 1236
  published: 2021-11-28 00:00:00 +0000
- title: 'Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation'
  abstract: 'In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variances in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning termed P2PDRL, where multiple learning agents are each assigned to a different environment, and then exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation performance to new environments at testing.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhao21b.html
  PDF: https://proceedings.mlr.press/v157/zhao21b/zhao21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhao21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chenyang
    family: Zhao
  - given: Timothy
    family: Hospedales
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1237-1252
  id: zhao21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1237
  lastpage: 1252
  published: 2021-11-28 00:00:00 +0000
- title: 'PFedAtt: Attention-based Personalized Federated Learning on Heterogeneous Clients'
  abstract: 'In federated learning, heterogeneity among the clients’ local datasets results in large variations in the number of local updates performed by each client in a communication round. Simply aggregating such local models into a global model will confine the capacity of the system, that is, the single global model will be restricted from delivering good performance on each client’s task. This paper provides a general framework to analyze the convergence of personalized federated learning algorithms. It subsumes previously proposed methods and provides a principled understanding of the computational guarantees. Using insights from this analysis, we propose PFedAtt, a personalized federated learning method that incorporates attention-based grouping to facilitate similar clients’ collaborations. Theoretically, we provide the convergence guarantee for the algorithm, and empirical experiments corroborate the competitive performance of PFedAtt on heterogeneous clients.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/ma21a.html
  PDF: https://proceedings.mlr.press/v157/ma21a/ma21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-ma21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zichen
    family: Ma
  - given: Yu
    family: Lu
  - given: Wenye
    family: Li
  - given: Jinfeng
    family: Yi
  - given: Shuguang
    family: Cui
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1253-1268
  id: ma21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1253
  lastpage: 1268
  published: 2021-11-28 00:00:00 +0000
- title: 'Multi-stream based marked point process'
  abstract: 'When using a point process, a specific form of the model needs to be designed for intensity function, based on physical and mathematical prior knowledge about the data. Recently, a fully trainable deep learning-based approach has been developed for temporal point processes. This approach models a cumulative hazard function (CHF), which is capable of systematic computation of adaptive intensity function in a data-driven manner. However, this approach does not take the attribute information of events into account although many applications of point processes generate with a variety of marked information such as location, magnitude, and depth of seismic activity. To overcome this limitation, we propose a fully trainable marked point process method, modeling decomposed CHFs for time and mark using multi-stream deep neural networks. In addition, we also propose to encode multiple marked information into a single image and extract necessary information adaptively without detailed knowledge about the data. We show the effectiveness of our proposed method through experiments with simulated toy data and real seismic data.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/hong21a.html
  PDF: https://proceedings.mlr.press/v157/hong21a/hong21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-hong21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sujun
    family: Hong
  - given: Hirotaka
    family: Hachiya
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1269-1284
  id: hong21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1269
  lastpage: 1284
  published: 2021-11-28 00:00:00 +0000
- title: 'Bayesian Latent Factor Model for Higher-order Data'
  abstract: 'Latent factor models are canonical tools to learn low-dimensional and linear embedding of original data. Traditional latent factor models are based on low-rank matrix factorization of covariance matrices. However, for higher-order data with multiple modes, i.e., tensors, this simple treatment fails to take into account the mode-specific relations. This ignorance leads to inefficiency in analysis of complex structures as well as poor data compression ability. In this paper, unlike covariance matrices, we investigate high-order covariance tensor directly by exploiting tensor ring (TR) format and propose the Bayesian TR latent factor model, which can represent complex multi-linear correlations and achieves efficient data compression. To overcome the difficulty of finding the optimal TR-ranks and simultaneously imposing sparsity on loading coefficients, a multiplicative Gamma process (MGP) prior is adopted to automatically infer the ranks and obtain sparsity. Then, we establish an efficient parameter-expanded EM algorithm to learn the maximum a posteriori (MAP) estimate of model parameters. Finally, we evaluate our model on covariance estimation, latent factor learning and image inpainting problems.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/tao21a.html
  PDF: https://proceedings.mlr.press/v157/tao21a/tao21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-tao21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zerui
    family: Tao
  - given: Xuyang
    family: Zhao
  - given: Toshihisa
    family: Tanaka
  - given: Qibin
    family: Zhao
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1285-1300
  id: tao21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1285
  lastpage: 1300
  published: 2021-11-28 00:00:00 +0000
- title: 'Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning'
  abstract: 'Traveling salesman problem (TSP) is a classical combinatorial optimization problem. As it represents a large number of important practical problems, it has received extensive studies and a great variety of algorithms have been proposed to solve it, including exact and heuristic algorithms. The success of heuristic algorithms relies heavily on the design of powerful heuristic rules, and most of the existing heuristic rules were manually designed by experienced experts to model their insights and observations on TSP instances and solutions. Recent studies have shown an alternative promising design strategy that directly learns heuristic rules from TSP instances without any manual interference. Here, we report an iterative improvement approach (called Neural-3-OPT) that solves TSP through automatically learning effective 3-opt heuristics via deep reinforcement learning. In the proposed approach, we adopt a pointer network to select 3 links from the current tour,and a feature-wise linear modulation network to select an appropriate way to reconnect the segments after removing the selected 3 links. We demonstrate that our approach achieves state-of-the-art performance on both real TSP instances and randomly-generated instances than, to the best of our knowledge, the existing neural network-based approaches.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/sui21a.html
  PDF: https://proceedings.mlr.press/v157/sui21a/sui21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-sui21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jingyan
    family: Sui
  - given: Shizhe
    family: Ding
  - given: Ruizhi
    family: Liu
  - given: Liming
    family: Xu
  - given: Dongbo
    family: Bu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1301-1316
  id: sui21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1301
  lastpage: 1316
  published: 2021-11-28 00:00:00 +0000
- title: 'Time-Constrained Multi-Agent Path Finding in Non-Lattice Graphs with Deep Reinforcement Learning'
  abstract: 'Multi-Agent Path Finding (MAPF) is a routing problem in which multiple agents need to each find a lowest-cost collection of routes in a graph that avoids collisions between agents. This problem occurs frequently in the domain of logistics, for example in the routing of trains in shunting yards, airplanes at airports, and picking robots in automated warehouses. A solution is presented for the MAPF problem in which agents operate on an arbitrary directed graph, rather than the commonly assumed grid world, which extends support to use cases where the environment cannot be easily modeled in a grid shape. Furthermore, constraints are introduced on the start and end times of the routing tasks, which is vital in MAPF problems that are part of larger logistics systems. A Reinforcement Learning-based (RL) approach is proposed to learn a local routing policy for an agent in a manner that relieves the need for manually designing heuristics. It relies on a Graph Convolutional Network to handle arbitrary graphs. Both single-agent and multi-agent RL approaches are presented, showing how a multi-agent setup can reduce training time by exploiting the similarities in agent properties and local graph topologies.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/knippenberg21a.html
  PDF: https://proceedings.mlr.press/v157/knippenberg21a/knippenberg21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-knippenberg21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Marijn
    prefix: van
    family: Knippenberg
  - given: Mike
    family: Holenderski
  - given: Vlado
    family: Menkovski
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1317-1332
  id: knippenberg21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1317
  lastpage: 1332
  published: 2021-11-28 00:00:00 +0000
- title: 'Exposing Cyber-Physical System Weaknesses by Implicitly Learning their Underlying Models'
  abstract: 'Cyber-Physical Systems (CPS) plays a critical role in today’s social life, especially with occasional pandemic events. With more reliance on the cyber operation of infrastructures, it is important to understand attacking mechanisms in CPS for potential solutions and defenses, where False Data Injection Attack (FDIA) is an important class. FDIA methods in the literature require the mathematical CPS model and state variable values to create an efficient attack vector, unrealistic for many attackers in the real world. Also, they do not have performance guarantee. This paper shows that it is possible to deploy a FDIA without having the CPS model and state variables information. Additionally, we prove a theoretic bound for the proposed method. Specifically, we design a scheme that learns an implicit CPS model to create tampered sensor measurements to deploy an attack based only on historical data. The proposed framework utilizes a Wasserstein generative adversarial network with two regularization terms to create such tampered measurements also known as adversarial examples. To build an attack with confidence, we present a proof based on convergence in distribution and Lipschitz norm to show that our method captures the real observed measurement distribution. This means that our model learns the complex underlying processes from the CPSs. We demonstrate the robustness and universality of our proposed framework based on two diversified adversarial examples with different systems, domains, and datasets.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/costilla-enriquez21a.html
  PDF: https://proceedings.mlr.press/v157/costilla-enriquez21a/costilla-enriquez21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-costilla-enriquez21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Napoleon
    family: Costilla-Enriquez
  - given: Yang
    family: Weng
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1333-1348
  id: costilla-enriquez21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1333
  lastpage: 1348
  published: 2021-11-28 00:00:00 +0000
- title: 'NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters'
  abstract: 'The benchmark datasets for neural architecture search (NAS) have been developed to alleviate the computationally expensive evaluation process and ensure a fair comparison. Recent NAS benchmarks only focus on architecture optimization, although the training hyperparameters affect the obtained model performances. Building the benchmark dataset for joint optimization of architecture and training hyperparameters is essential to further NAS research. The existing NAS-HPO-Bench is a benchmark for joint optimization, but it does not consider the network connectivity design as done in modern NAS algorithms. This paper introduces the first benchmark dataset for joint optimization of network connections and training hyperparameters, which we call NAS-HPO-Bench-II. We collect the performance data of 4K cell-based convolutional neural network architectures trained on the CIFAR-10 dataset with different learning rate and batch size settings, resulting in the data of 192K configurations. The dataset includes the exact data for 12 epoch training. We further build the surrogate model predicting the accuracies after 200 epoch training to provide the performance data of longer training epoch. By analyzing NAS-HPO-Bench-II, we confirm the dependency between architecture and training hyperparameters and the necessity of joint optimization. Finally, we demonstrate the benchmarking of the baseline optimization algorithms using NAS-HPO-Bench-II.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/hirose21a.html
  PDF: https://proceedings.mlr.press/v157/hirose21a/hirose21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-hirose21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yoichi
    family: Hirose
  - given: Nozomu
    family: Yoshinari
  - given: Shinichi
    family: Shirakawa
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1349-1364
  id: hirose21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1349
  lastpage: 1364
  published: 2021-11-28 00:00:00 +0000
- title: 'Metric Learning for comparison of HMMs using Graph Neural Networks'
  abstract: 'Hidden Markov models (HMMs) belong to the class of double embedded stochastic models which were originally leveraged for speech recognition and synthesis. HMMs subsequently became a generic sequence model across multiple domains like NLP, bio-informatics and thermodynamics to name a few. Literature has several heuristic metrics to compare two HMMs by factoring in their structure and emission probability distributions in HMM nodes. However, typical structure-based metrics overlook the similarity between HMMs having different structures yet similar behavior and typical behavior-based metrics rely on the representativeness of the reference sequence used for assessing the similarity in behavior. Further, little exploration has taken place in leveraging the recent advancements in deep graph neural networks for learning effective representations for HMMs. In this paper, we propose two novel deep neural network based approaches to learn embeddings for HMMs and evaluate the validity of the embeddings based on subsequent clustering and classification tasks. Our proposed approaches use a Graph variational Autoencoder and diffpooling based Graph neural network (GNN) to learn embeddings for HMMs. The graph autoencoder infers latent low-dimensional flat embeddings for HMMs in a task-agnostic manner; whereas the diffpooling based graph neural network learns class-label aware embeddings by inferring and aggregating a hierarchical set of clusters and sub-clusters of graph nodes. Empirical results reveal that the HMM embeddings learnt through the Graph variational autoencoders and diffpooling based GNN outperform the popular heuristics as measured by the cluster quality metrics and the classification accuracy in downstream tasks.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/soni21a.html
  PDF: https://proceedings.mlr.press/v157/soni21a/soni21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-soni21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rajan Kumar
    family: Soni
  - given: Karthick
    family: Seshadri
  - given: Balaraman
    family: Ravindran
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1365-1380
  id: soni21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1365
  lastpage: 1380
  published: 2021-11-28 00:00:00 +0000
- title: 'Layer-Wise Neural Network Compression via Layer Fusion'
  abstract: ' This paper proposes \textit{layer fusion} - a model compression technique that discovers which weights to combine and then fuses weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers of the original network with little additional computation overhead, while maintaining competitive performance. From experiments on CIFAR-10, we find that various deep convolution neural networks can remain within 2% accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. For experiments on the WikiText-2 language modelling dataset, we compress Transformer models to 20% of their original size while being within 5 perplexity points of the original network. We also find that other well-established compression techniques can achieve competitive performance when compared to their original networks given a sufficient number of retraining steps. Generally, we observe a clear inflection point in performance as the amount of compression increases, suggesting a bound on the amount of compression that can be achieved before an exponential degradation in performance.  '
  volume: 157
  URL: https://proceedings.mlr.press/v157/o-neill21a.html
  PDF: https://proceedings.mlr.press/v157/o-neill21a/o-neill21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-o-neill21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: James
    family: O’Neill
  - given: Greg
    family: V. Steeg
  - given: Aram
    family: Galstyan
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1381-1396
  id: o-neill21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1381
  lastpage: 1396
  published: 2021-11-28 00:00:00 +0000
- title: 'Bayesian neural network unit priors and generalized Weibull-tail property'
  abstract: 'The connection between Bayesian neural networks and Gaussian processes gained a lot of attention in the last few years. Hidden units are proven to follow a Gaussian process limit when the layer width tends to infinity. Recent work has suggested that finite Bayesian neural networks may outperform their infinite counterparts because they adapt their internal representations flexibly. To establish solid ground for future research on finite-width neural networks, our goal is to study the prior induced on hidden units. Our main result is an accurate description of hidden units tails which shows that unit priors become heavier-tailed going deeper, thanks to the introduced notion of generalized Weibull-tail. This finding sheds light on the behavior of hidden units of finite Bayesian neural networks. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/vladimirova21a.html
  PDF: https://proceedings.mlr.press/v157/vladimirova21a/vladimirova21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-vladimirova21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mariia
    family: Vladimirova
  - given: Julyan
    family: Arbel
  - given: Stéphane
    family: Girard
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1397-1412
  id: vladimirova21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1397
  lastpage: 1412
  published: 2021-11-28 00:00:00 +0000
- title: 'A Partial Label Metric Learning Algorithm for Class Imbalanced Data'
  abstract: 'The performance of machine learning algorithms depends on the distance metric, in addition to the model and loss function, etc. The partial label metric learning technique can improve the accuracy of partial label learning algorithms by using training data to learn a better distance metric, which has gradually attracted the attention of scholars in recent years. The essence of partial label learning is mainly to deal with multi-class classification problems, while class imbalance is a common phenomenon in these problems. The class imbalanced problem affects the prediction accuracy of minority class samples, but the current partial label metric learning algorithms rarely consider the problem. In this paper, we propose two partial label metric learning algorithms (PL-CCML-SFN and PL-CCML-LDD) that can solve the class imbalanced problem. The basic idea is to add a regularization term to the objective function of the PL-CCML model, which can induce each class to be uniformly distributed in the new metric space and thus play the role of balancing each class. The experimental results show that these two algorithms, compared with the existing partial label metric learning algorithms, have improved the overall performance on the class imbalanced data.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/liu21f.html
  PDF: https://proceedings.mlr.press/v157/liu21f/liu21f.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-liu21f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Wenpeng
    family: Liu
  - given: Li
    family: Wang
  - given: Jie
    family: Chen
  - given: Yu
    family: Zhou
  - given: Ruirui
    family: Zheng
  - given: Jianjun
    family: He
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1413-1428
  id: liu21f
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1413
  lastpage: 1428
  published: 2021-11-28 00:00:00 +0000
- title: 'Scaling Average-Linkage via Sparse Cluster Embeddings'
  abstract: 'Average-linkage is one of the most popular hierarchical clustering  algorithms. It is well known that average-linkage  does not scale to large data sets due to the slow asymptotic running time. The fastest known implementation has running time quadratic in the number of data points. This paper presents a technique that we call cluster embedding.  The embedding maps each cluster into a point in slightly higher dimensions. The pairwise distances between the mapped points approximate the average distance between clusters. By utilizing this embedding we scale the task of finding close pairs of clusters, which is a key step in average-linkage clustering. We achieve an approximate, sub-quadratic time implementation of average-linkage. We show  theoretically the algorithm proposed in this paper achieves a near-linear running time and scales to large data sets.  Moreover, its scalability empirically dominates average-linkage and typically offers 3-10x speed-up on large data sets.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/lavastida21a.html
  PDF: https://proceedings.mlr.press/v157/lavastida21a/lavastida21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-lavastida21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Thomas
    family: Lavastida
  - given: Kefu
    family: Lu
  - given: Benjamin
    family: Moseley
  - given: Yuyan
    family: Wang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1429-1444
  id: lavastida21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1429
  lastpage: 1444
  published: 2021-11-28 00:00:00 +0000
- title: 'Multi-scale Salient Instance Segmentation based on Encoder-Decoder'
  abstract: 'Salient instance segmentation refers to segmenting noticeable instance objects in images. In the face of multi-scale salient instances and overlapping instances, the existing salient instance segmentation methods have great limitations including inaccurate detection of large-scale instances, missing detection of small-scale instances, and wrong segmentation of overlapping instances. In order to solve these problems, a new multi-scale salient instance segmentation network (MSISNet) based on encoder-decoder is proposed. Firstly, a receptive field encoder (RFE) is designed to alleviate the problems of inaccurate detection of large-scale instances, missing detection of small-scale instances, and especially wrong segmentation of overlapping instances. Then, a pyramid decoder (PD) for the detection branch is designed to further alleviate the problem of inaccurate detection of large-scale instances and the difficulty in locating small-scale instances. Finally, a multi-stage decoder (MSD) is designed to improve the quality of the segmentation mask. Experiments on salient instance segmentation dataset Salient Instance Segmentation-1K (SIS-1K) have been conducted and the results show that the proposed method MSISNet is superior to the existing salient instance segmentation methods MSRNet and S4Net, and achieves better segmentation accuracy and speed.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/chen21b.html
  PDF: https://proceedings.mlr.press/v157/chen21b/chen21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-chen21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Houru
    family: Chen
  - given: Caijuan
    family: Shi
  - given: Wei
    family: Li
  - given: Changyu
    family: Duan
  - given: jinwei
    family: Yan
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1445-1460
  id: chen21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1445
  lastpage: 1460
  published: 2021-11-28 00:00:00 +0000
- title: 'A Two-Stage Training Framework with Feature-Label Matching Mechanism for Learning from Label Proportions'
  abstract: 'In this paper, we study a task called Learning from Label Proportions (LLP). LLP aims to learn an instance-level classifier given a number of bags and each bag is composed of several instances. The label of each instance is concealed and what we know is the proportion of each class in each bag. The lack of instance-level supervision information makes the model struggle for finding the right direction for optimization. In this paper, we solve this problem by developing a two-stage training framework. First, we facilitate contrastive learning to train a feature extractor in an unsupervised way. Second, we train a linear classifier with the parameter of the feature extractor fixed. This framework performs much better than most baselines but is still unsatisfactory when the bag size or the number of classes is large. Therefore, we further propose a Feature-Label Matching mechanism (FLMm). FLMm can provide a roughly right optimization direction for the classifier by assigning labels to a subset of instances selected in this bag with a high degree of confidence. Therefore, the classifier can more easily establish the correspondence between instances and labels in the second stage. Experimental results on two benchmark datasets, namely CIFAR10 and CIFAR100, show that our model is far superior than baseline models, for example, accuracy increases from 43.44% to 61.25% for bag size 128 on CIFAR100.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/yang21b.html
  PDF: https://proceedings.mlr.press/v157/yang21b/yang21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-yang21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Haoran
    family: Yang
  - given: Wanjing
    family: Zhang
  - given: Wai
    family: Lam
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1461-1476
  id: yang21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1461
  lastpage: 1476
  published: 2021-11-28 00:00:00 +0000
- title: '$K^2$-GNN: Multiple Users’ Comments Integration with Probabilistic K-Hop Knowledge Graph Neural Networks'
  abstract: 'Integrating multiple comments into a concise statement for any online products or web services requires a non-trivial understanding of the input. Recently, graph neural networks (GNN) has been successfully applied to learn from highly-structured graph representations to mitigate the relationship between entities, such as co-references. However, current inter-sentence relation extraction cannot leverage discrete reasoning chains over multiple comments. To address this issue, in this paper, we propose a probabilistic $K$-hop knowledge graph (KKG) to extend existing knowledge graphs with inferred relations via discrete intra-sentence and inter-sentence reasoning chains. KKG associates each inferred relation with a confidence value through Bayesian inference. We further answer how a knowledge graph with inferred relations can help the multiple comments integration through integrating KKG with GNN ($\text{K}^2$-GNN). Our extensive experimental results show that our $\text{K}^2$-GNN outperforms all baseline graph models on multiple comments integration.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhan21b.html
  PDF: https://proceedings.mlr.press/v157/zhan21b/zhan21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhan21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Huixin
    family: Zhan
  - given: Kun
    family: Zhang
  - given: Chenyi
    family: Hu
  - given: Victor
    family: Sheng
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1477-1492
  id: zhan21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1477
  lastpage: 1492
  published: 2021-11-28 00:00:00 +0000
- title: 'Learn to Predict Vertical Track Irregularity with Extremely Imbalanced Data'
  abstract: 'Railway systems require regular manual maintenance, a large part of which is dedicated to inspecting track deformation. Such deformation might severely impact trains’ runtime security, whereas such inspections remain costly for both finance and human resources. Therefore, a more precise and efficient approach to detect railway track deformation is in urgent need. In this paper, we showcase an application framework for predicting vertical track irregularity, based on a real-world, large-scale dataset produced by several operating railways in China. We have conducted extensive experiments on various machine learning & ensemble learning algorithms in an effort to maximize the model’s capability in capturing any irregularity. We also proposed a novel approach for handling imbalanced data in multivariate time series prediction tasks with adaptive data sampling and penalized loss. Such an approach has proven to reduce models’ sensitivity to the imbalanced target domain, thus improving its performance in predicting rare extreme values.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/chen21c.html
  PDF: https://proceedings.mlr.press/v157/chen21c/chen21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-chen21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yutao
    family: Chen
  - given: Yu
    family: Zhang
  - given: Fei
    family: Yang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1493-1504
  id: chen21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1493
  lastpage: 1504
  published: 2021-11-28 00:00:00 +0000
- title: 'Semi-Open Attribute Extraction from Chinese Functional Description Text'
  abstract: 'Attribute extraction is a task to identify the attribute and the corresponding attribute value from unstructured text, which is important for extensive applications like web information retrieval and the recommended system. The traditional relation extraction-based methods or joint extraction-based systems are often perform attribute classify based on subject and attribute-value pairs, and extract the attribute triples in the scope of ontology schema categories, which is in the assumption of the close-world and cannot satisfy the diversity of attributes. In this work, we propose a semi-open information extraction system for attribute extraction in a multi-component framework. With the proposed semi-open attribute extraction system (SOAE), more attribute-value pairs can be discovered by extracting literal triples without the limitation of pre-defined ontology. An additional co-trained ontology-based attribute extraction model is appended as a component following the assumption of the partial-closed world (PCWA), remission the performance degradation of SOAE caused by missing of the literal predicate in raw text and contribute to extract richer attribute triples and construct more dense knowledge graph. For evaluating the performance of the attribute extraction system, we construct a Chinese functional description text dataset CNShipNet and conduct experiments on it. The experimental results demonstrate that our proposed approach outperforms several state-of-the-art baselines with a large margin.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhang21f.html
  PDF: https://proceedings.mlr.press/v157/zhang21f/zhang21f.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhang21f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Li
    family: Zhang
  - given: Yanzeng
    family: Li
  - given: Rouyu
    family: Zhang
  - given: Wenjie
    family: Li
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1505-1520
  id: zhang21f
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1505
  lastpage: 1520
  published: 2021-11-28 00:00:00 +0000
- title: 'Regularized Mutual Learning for Personalized Federated Learning'
  abstract: 'Federated Learning (FL) is a privacy-protected learning paradigm, which allows many clients to jointly train a model under the coordination of a server without the local data leakage. In real-world scenarios, data in different clients usually cannot satisfy the independent and identically distributed (i.i.d.) assumption adopted widely in machine learning. Traditionally training a single global model may cause performance degradation and difficulty in ensuring convergence in such a non-i.i.d. case.  To handle this case, various models can be trained for each client to capture the personalization in each client. In this paper, we propose a new personalized FL framework, called Personalized Federated Mutual Learning (PFML), to use the non-i.i.d. characteristics to generate specific models for clients. Specifically, the PFML method integrates mutual learning into the local update process in each client to not only improve the performance of both the global and personalized models but also speed up the convergence compared with state-of-the-art methods. Moreover, the proposed PFML method can help maintain the heterogeneity of client models and protect the information of personalized models. Experiments on benchmark datasets show the effectiveness of the proposed PFML model. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/yang21c.html
  PDF: https://proceedings.mlr.press/v157/yang21c/yang21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-yang21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ruihong
    family: Yang
  - given: Junchao
    family: Tian
  - given: Yu
    family: Zhang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1521-1536
  id: yang21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1521
  lastpage: 1536
  published: 2021-11-28 00:00:00 +0000
- title: 'Relation Also Need Attention: Integrating Relation Information Into Image Captioning'
  abstract: 'Image captioning methods with attention mechanism are leading this field, especially models with global and local attention. But there are few conventional models to integrate the relationship information between various regions of the image. In this paper, this kind of relationship features are embedded into the fused attention mechanism to explore the internal visual and semantic relations between different object regions. Besides, to alleviate the exposure bias problem and make the training process more efficient, we combine Generative Adversarial Network with Reinforcement Learning and employ the greedy decoding method to generate a dynamic baseline reward for self-critical training. Finally, experiments on MSCOCO datasets show that the model can generate more accurate and vivid image captioning sentences and perform better in multiple prevailing metrics than the previous advanced models.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/chen21d.html
  PDF: https://proceedings.mlr.press/v157/chen21d/chen21d.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-chen21d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tianyu
    family: Chen
  - given: Zhixin
    family: Li
  - given: Tiantao
    family: Xian
  - given: Canlong
    family: Zhang
  - given: Huifang
    family: Ma
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1537-1552
  id: chen21d
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1537
  lastpage: 1552
  published: 2021-11-28 00:00:00 +0000
- title: 'Learning to Switch Optimizers for Quadratic Programming'
  abstract: 'Quadratic programming (QP) seeks to solve optimization problems involving quadratic functions that can include complex boundary constraints. QP in the unrestricted form is $\mathcal{NP}$-hard; but when restricted to the convex case, it becomes tractable. Active set and interior point methods are used to solve convex problems, and in the nonconvex case various heuristics or relaxations are used to produce high-quality solutions in finite time. Learning to optimize (L2O) is an emerging approach to design solvers for optimization problems. We develop an L2O approach that uses reinforcement learning to learn a stochastic policy to switch between pre-existing optimization algorithms to solve QP problem instances. In particular, our agent switches between three simple optimizers: Adam, gradient descent, and random search. Our experiments show that the learned optimizer minimizes quadratic functions faster and finds better-quality solutions in the long term than do any of the possible optimizers switched between. We also compare our solver with the standard QP algorithms in MATLAB and find better performance in fewer function evaluations.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/getzelman21a.html
  PDF: https://proceedings.mlr.press/v157/getzelman21a/getzelman21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-getzelman21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Grant
    family: Getzelman
  - given: Prasanna
    family: Balaprakash
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1553-1568
  id: getzelman21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1553
  lastpage: 1568
  published: 2021-11-28 00:00:00 +0000
- title: 'Perturbing Eigenvalues with Residual Learning in Graph Convolutional Neural Networks'
  abstract: 'Network structured data is ubiquitous in natural and social science applications. Graph Convolutional Neural Network (GCN) has attracted significant attention recently due to its success in representing, modeling, and predicting large-scale network data. Various types of graph convolutional filters were proposed to process graph signals to boost the performance of graph-based semi-supervised learning. This paper introduces a novel spectral learning technique called EigLearn, which uses residual learning to perturb the eigenvalues of the graph filter matrix to optimize its capability. EigLearn is relatively easy to implement, and yet thorough experimental studies reveal that it is more effective and efficient than the prior works on the specific issue, such as LanczosNet and FisherGCN. EigLearn only perturbs a small number of eigenvalues and does not require a complete eigendecomposition. Our investigation shows that EigLearn reaches the maximal performance improvement by perturbing about 30 to 40 eigenvalues, and the EigLearn-based GCN has comparable efficiency as the standard GCN. Furthermore, EigLearn bears a clear explanation in the spectral domain of the graph filter and shows aggregation effects in performance improvement when coupled with different graph filters. Hence, we anticipate that EigLearn may serve as a useful neural unit in various graph-involved neural net architectures.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/yao21a.html
  PDF: https://proceedings.mlr.press/v157/yao21a/yao21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-yao21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shibo
    family: Yao
  - given: Dantong
    family: Yu
  - given: Xiangmin
    family: Jiao
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1569-1584
  id: yao21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1569
  lastpage: 1584
  published: 2021-11-28 00:00:00 +0000
- title: 'Bayesian nonparametric model for arbitrary cubic partitioning'
  abstract: 'In this paper, we propose a continuous-time Markov process for cubic partitioning models of three-dimensional (3D) arrays and its application to Bayesian nonparametric relational data analysis of 3D array data. Relational data analysis is a topic that has been actively studied in the field of Bayesian nonparametrics, and in particular, models for analyzing 3D arrays have attracted much attention in recent years. In particular, the cubic partitioning model is very popular due to its practical usefulness, and various models such as the infinite relational model and the Mondrian process have been proposed. However, these conventional models have the disadvantage that they are limited to a certain class of cubic partitions, and there is a need for a model that can represent a broader class of arbitrary cubic partitions, which has long been an open issue in this field. In this study, we propose a stochastic process that can represent arbitrary cubic partitions of 3D arrays as a continuous-time Markov process. Furthermore, by combining it with the Aldous-Hoover-Kallenberg representation theorem, we construct an infinitely exchangeable 3D relational model and apply it to real data to show its application to relational data analysis. Experiments show that the proposed model improves the prediction performance by expanding the class of representable cubic partitioning. '
  volume: 157
  URL: https://proceedings.mlr.press/v157/nakano21a.html
  PDF: https://proceedings.mlr.press/v157/nakano21a/nakano21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-nakano21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Masahiro
    family: Nakano
  - given: Yasuhiro
    family: Fujiwara
  - given: Akisato
    family: Kimura
  - given: Takeshi
    family: Yamada
  - given: Naonori
    family: Ueda
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1585-1600
  id: nakano21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1585
  lastpage: 1600
  published: 2021-11-28 00:00:00 +0000
- title: 'Bayesian Inference for Optimal Transport with Stochastic Cost'
  abstract: 'In machine learning and computer vision, optimal transport has had significant success inlearning generative models and defining metric distances between structured and stochasticdata objects, that can be cast as probability measures.  The key element of optimal trans-port is the so called lifting of anexactcost (distance) function, defined on the sample space,to a cost (distance) between probability measures over the sample space.  However, in manyreal life applications the cost isstochastic: e.g., the unpredictable traffic flow affects the costof transportation between a factory and an outlet.  To take this stochasticity into account,we introduce a Bayesian framework for inferring the optimal transport plan distributioninduced by the stochastic cost, allowing for a principled way to include prior informationand to model the induced stochasticity on the transport plans.  Additionally, we tailor anHMC method to sample from the resulting transport plan posterior distribution.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/mallasto21a.html
  PDF: https://proceedings.mlr.press/v157/mallasto21a/mallasto21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-mallasto21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anton
    family: Mallasto
  - given: Markus
    family: Heinonen
  - given: Samuel
    family: Kaski
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1601-1616
  id: mallasto21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1601
  lastpage: 1616
  published: 2021-11-28 00:00:00 +0000
- title: 'Multi-view Latent Subspace Clustering based on both Global and Local Structure'
  abstract: 'Most existing multi-view clustering methods focus on the global structure or local structure among samples, and few methods focus on the two structures at the same time. In this paper, we propose a Multi-view Latent subspace Clustering based on both Global and Local structure (MLCGL). In this method, a latent embedding representation is learned by exploring the complementary information from different views. In the latent space, not only the global reconstruction relationship but also the local geometric structure among the latent variables are discovered. In this way, a unified affinity graph matrix is constructed in the latent space for different views, which indicates a clear between-class relationship. Meanwhile, a rank constraint is introduced on the Laplacian graph to facilitate the division of samples into the required clusters. In MLCGL, the affinity graph also provides positive feedback to optimize the learned latent representation and contribute to divided it into reasonable clusters. Moreover, we present an alternating iterative optimization scheme to optimize objective functions. Compared with the state-of-art algorithms, MLCGL has achieved excellent experimental performance on several real-world datasets.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/honghan21a.html
  PDF: https://proceedings.mlr.press/v157/honghan21a/honghan21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-honghan21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhou
    family: Honghan
  - given: Cai
    family: Weiling
  - given: Xu
    family: Le
  - given: Yang
    family: Ming
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1617-1632
  id: honghan21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1617
  lastpage: 1632
  published: 2021-11-28 00:00:00 +0000
- title: 'Augmenting Imbalanced Time-series Data via Adversarial Perturbation in Latent Space'
  abstract: 'Success of training deep learning models largely depends on the amount and quality of training data. Although numerous data augmentation techniques have already been pro- posed for certain domains such as computer vision where simple schemes such as rotation and flipping have been shown to be effective, other domains such as time-series data have a relatively smaller set of augmentation techniques readily available. Data imbalance is a phenomenon often observed in real-world data. However, a simple oversampling technique may make a model vulnerable to overfitting, so a proper data augmentation is desired. To tackle these problems, we propose a novel data augmentation method that utilizes the latent vectors of an autoencoder in a novel way. When input data are perturbed in its latent space, their reconstructed data retains properties similar to the original one. In con- trast, adversarial augmentation is a technique to train robust deep neural networks against unforeseen data shifts or corruptions by providing a downstream model with samples that are difficult to predict. Our method adversarially perturbs input data in its latent space so that the augmented data is diverse and conducive to reducing test error of a downstream model. The experimental results demonstrated that our method achieves the right balance, significantly modifying the input data to help generalization while retaining its realism.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/kim21a.html
  PDF: https://proceedings.mlr.press/v157/kim21a/kim21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-kim21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Beomsoo
    family: Kim
  - given: Jang-Ho
    family: Choi
  - given: Jaegul
    family: Choo
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1633-1644
  id: kim21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1633
  lastpage: 1644
  published: 2021-11-28 00:00:00 +0000
- title: 'Building Decision Tree for Imbalanced Classification via Deep Reinforcement Learning'
  abstract: 'Data imbalance is prevalent in classification problems and tends to bias the classifier towards the majority of classes. This paper proposes a decision tree building method for imbalanced binary classification via deep reinforcement learning. First, the decision tree building process is regarded as a multi-step game and modeled as a Markov decision process. Then, the tree-based convolution is applied to extract state vectors from the tree structure, and each node is abstracted into a parameterized action. Next, the reward function is designed based on a range of evaluation metrics of imbalanced classification. Finally, a popular deep reinforcement learning algorithm called Multi-Pass DQN is employed to find an optimal decision tree building policy. The experiments on more than 15 imbalanced data sets indicate that our method outperforms the state-of-the-art methods.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/wen21a.html
  PDF: https://proceedings.mlr.press/v157/wen21a/wen21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-wen21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Guixuan
    family: Wen
  - given: Kaigui
    family: Wu
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1645-1659
  id: wen21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1645
  lastpage: 1659
  published: 2021-11-28 00:00:00 +0000
- title: 'Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox'
  abstract: 'We propose a novel machine learning framework to conduct real-time multi-speaker diarization and recognition without prior registration and pretraining in a fully online learning setting. Our contributions are two-fold. First, we propose a new benchmark to evaluate the rarely studied fully online speaker diarization problem. We build upon existing datasets of real world utterances to automatically curate MiniVox, an experimental environment which generates infinite configurations of continuous multi-speaker speech stream. Second, we consider the practical problem of online learning with episodically revealed rewards and introduce a solution based on semi-supervised and self-supervised learning methods. Additionally, we provide a workable web-based recognition system which interactively handles the cold start problem of new user’s addition by transferring representations of old arms to new ones with an extendable contextual bandit. We demonstrate that our proposed method obtains robust performance in the online MiniVox framework given either cepstrum-based representations or deep neural network embeddings.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/lin21c.html
  PDF: https://proceedings.mlr.press/v157/lin21c/lin21c.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-lin21c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Baihan
    family: Lin
  - given: Xinxin
    family: Zhang
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1660-1674
  id: lin21c
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1660
  lastpage: 1674
  published: 2021-11-28 00:00:00 +0000
- title: 'Video Action Recognition with Neural Architecture Search'
  abstract: 'Recently, deep convolutional neural networks have been widely used in the field of videoaction  recognition.   Current  approaches  tend  to  concentrate  on  the  structure  design  fordifferent backbone networks, but what kind of network structures can process video botheffectively and quickly still remains to be solved despite the encouraging progress.  With thehelp of neural architecture search (NAS), we search for three hyperparameters in the videoprocessing network, which are the number of frames, the number of layers per residual stageand the channel number for all layers.  We relax the entire search space into a continuoussearch  space,  and  search  for  a  set  of  network  architectures  that  balance  accuracy  andcomputational  efficiency  by  considering  accuracy  as  the  primary  optimization  goal  andcomputational complexity as the secondary optimization goal.  We conduct experiments onUCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposedNAS based scheme for video action recognition.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/zhou21a.html
  PDF: https://proceedings.mlr.press/v157/zhou21a/zhou21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-zhou21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yuanding
    family: Zhou
  - given: Baopu
    family: Li
  - given: Zhihui
    family: Wang
  - given: Haojie
    family: Li
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1675-1690
  id: zhou21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1675
  lastpage: 1690
  published: 2021-11-28 00:00:00 +0000
- title: 'Learning Maximum Margin Markov Networks from examples with missing labels'
  abstract: 'Structured output classifiers based on the framework of Markov Networks provide a transparent way to model statistical dependencies between output labels. The Markov Network (MN) classifier can be efficiently learned by the maximum margin method, which however requires expensive completely annotated examples. We extend the maximum margin algorithm for learning of unrestricted MN classifiers from examples with partially missing annotation of labels. The proposed algorithm translates learning into minimization of a novel loss function which is convex, has a clear connection with the supervised margin-rescaling loss, and can be efficiently optimized by first-order methods. We demonstrate the efficacy of the proposed algorithm on a challenging structured output classification problem where it beats deep neural network models trained from a much higher number of completely annotated examples, while the proposed method used only partial annotations.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/franc21a.html
  PDF: https://proceedings.mlr.press/v157/franc21a/franc21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-franc21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Vojtech
    family: Franc
  - given: Andrii
    family: Yermakov
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1691-1706
  id: franc21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1691
  lastpage: 1706
  published: 2021-11-28 00:00:00 +0000
- title: 'Maximization of Monotone $k$-Submodular Functions with Bounded Curvature and Non-$k$-Submodular Functions'
  abstract: 'The concept of $k$-submodularity is an extension of submodularity, of which maximization has various applications, such as influence maximization and sensor placement. In such situations, to model complicated real problems, we want to deal with multiple factors, such as, more detailed parameter representing a property of a given function or a constraint which should be imposed for a given function, simultaneously. Besides, it is preferable that an algorithm for the modeling problem is simple. In this paper, for both monotone $k$-submodular function maximization with bounded curvature and monotone weakly $k$-submodular function maximization, we give approximation ratio analysis on greedy-type algorithms on the problem with the matroid constraint and that with the individual size constraint. Furthermore, we give an approximation ratio analysis on another type of the relaxation of $k$-submodular functions, approximately $k$-submodular functions, with the matroid constraint.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/matsuoka21b.html
  PDF: https://proceedings.mlr.press/v157/matsuoka21b/matsuoka21b.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-matsuoka21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tatsuya
    family: Matsuoka
  - given: Naoto
    family: Ohsaka
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1707-1722
  id: matsuoka21b
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1707
  lastpage: 1722
  published: 2021-11-28 00:00:00 +0000
- title: 'Unsupervised Cycle-Consistent Network for Removing Susceptibility Artifacts in Single-shot EPI'
  abstract: 'Single-shot EPI(ssEPI) is one of the most important ultrafast MRI sequences commonly used for diffusion-weighted MRI and functional MRI. However, ssEPI suffers from susceptibility artifacts, especially in the high field or at the tissue boundaries. The widely used blip/down approaches, such as TOPUP, estimate the underlying distortion field from a pair of images with reversed-phase encoding direction. Typically, the iterative methods are used to find a solution to the ill-posed problem of finding the displacement map that maps up/down acquisitions onto each other. Then the geometric and intensity corrections are applied to obtain the undistorted images based on the estimated displacement map. This paper presents a new unsupervised cycle-consistent deep neural network that takes advantage of both the deep neural network and the gradient reversal method. The proposed method consists of three main components: (1) the Resnet50-Unet to map the pair of images with inverted phase encoding to the displacement maps; (2) the geometric and intensity correction module to obtain the undistorted images; (3) the forward model is applied to get the cycled blip up/down images, and the cycle-consistent loss is optimized. In addition, the CNN network will generate two field maps to overcome motion or field drift during the scan. This new network is trained unsupervised on the clinical datasets downloaded from the Human Connection Project website. And we test this method on both preclinical and clinical datasets. The preclinical dataset is collected from 20 mice based on the modified EPI pulse sequence in 7T scanner. Both simulated and experimental results demonstrate that our method outperforms state-of-the-art methods. In conclusion, we proposed an unsupervised cycle-consistent deep neural network for removing susceptibility artifacts. The results on both preclinical and clinical datasets show this new method’s acceleration and generalization capabilities.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/xie21a.html
  PDF: https://proceedings.mlr.press/v157/xie21a/xie21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-xie21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Weida
    family: Xie
  - given: Shi
    family: Chen
  - given: Qingjia
    family: Bao
  - given: Kewen
    family: Liu
  - given: Zhao
    family: Li
  - given: Xiaojun
    family: Li
  - given: Chongxin
    family: Bai
  - given: Piqiang
    family: Li
  - given: Chaoyang
    family: Liu
  - given: Otikovs
    family: Martins
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1723-1738
  id: xie21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1723
  lastpage: 1738
  published: 2021-11-28 00:00:00 +0000
- title: 'Hybrid Summarization with Semantic Weighting Reward and Latent Structure Detector'
  abstract: 'Text summarization has been a significant challenge in the Nature Process Language (NLP) field. The approach of dealing with text summarization can be roughly divided into two main paradigms: extractive and abstractive manner. The former allows capturing the most representative snippets in a document while the latter generates a summary by understanding the latent meaning in a material with a language generation model. Recently, studies found that jointly employing the extractive and abstractive summarization models can take advantage of their complementary advantages, creating both concise and informative summaries. However, the reinforced summarization models mainly depend on the ROUGE-based reward, which only has the ability to quantify the extent of word-matching rather than semantic-matching between document and summary. Meanwhile, documents are usually collected with redundant or noisy information due to the existence of repeated or irrelevant information in real-world applications. Therefore, only depending on ROUGE-based reward to optimize the reinforced summarization models may lead to biased summary generation. In this paper, we propose a novel deep \bf{Hy}brid \bf{S}ummarization with semantic weighting \bf{R}eward and latent structure \bf{D}etector (HySRD). Specifically, HySRD introduces a new reward mechanism that simultaneously takes advantage of semantic and syntactic information among documents and summaries. To effectively model the accuracy semantics, a latent structure detector is designed to incorporate the high-level latent structures in the sentence representation for information selection. Extensive experiments have been conducted on two well-known benchmark datasets \emph{CNN/Daily Mail} (short input document) and \emph{BigPatent} (long input document). The automatic evaluation shows that our approach significantly outperforms the state-of-the-art of hybrid summarization models.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/song21a.html
  PDF: https://proceedings.mlr.press/v157/song21a/song21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-song21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mingyang
    family: Song
  - given: Liping
    family: Jing
  - given: Yi
    family: Feng
  - given: Zhiwei
    family: Sun
  - given: Lin
    family: Xiao
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1739-1754
  id: song21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1739
  lastpage: 1754
  published: 2021-11-28 00:00:00 +0000
- title: 'Fast Rate Learning in Stochastic First Price Bidding'
  abstract: 'First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising. %  As far as learning is concerned, first-price auctions are more challenging because the optimal bidding strategy does not only depend on the value of the item but also requires some knowledge of the other bids. % They have already given rise to several works in sequential learning, % many of which consider models for which the value of the buyer or the opponents’ maximal bid is chosen in an adversarial manner. Even in the simplest settings, this gives rise to algorithms whose pseudo-regret grows as $\sqrt{T}$ with respect to the time horizon $T$. % Focusing on the case where the buyer plays against a stationary stochastic environment, we show how to achieve significantly lower pseudo-regret: when the opponents’ maximal bid distribution is known we provide an algorithm whose pseudo-regret can be as low as $\log^2(T)$; in the case where the distribution must be learnt sequentially, a generalization of this algorithm can achieve $T^{1/3+ \epsilon}$ pseudo-regret, for any $\epsilon>0$. % To obtain these results, we introduce two novel ideas that can be of interest in their own right. First, by transposing results obtained in the posted price setting, we provide conditions under which the first-price bidding utility is locally quadratic around its optimum. Second, we leverage the observation that, on small sub-intervals, the concentration of the variations of the empirical distribution function may be controlled more accurately than by using the classical Dvoretzky-Kiefer-Wolfowitz inequality. % Numerical simulations confirm that our algorithms converge much faster than alternatives proposed in the literature for various bid distributions, including for bids collected on an actual programmatic advertising platform.'
  volume: 157
  URL: https://proceedings.mlr.press/v157/achddou21a.html
  PDF: https://proceedings.mlr.press/v157/achddou21a/achddou21a.pdf
  edit: https://github.com/mlresearch//v157/edit/gh-pages/_posts/2021-11-28-achddou21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 13th Asian Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Juliette
    family: Achddou
  - given: Olivier
    family: Cappé
  - given: Aurélien
    family: Garivier
  editor: 
  - given: Vineeth N.
    family: Balasubramanian
  - given: Ivor
    family: Tsang
  page: 1754-1769
  id: achddou21a
  issued:
    date-parts: 
      - 2021
      - 11
      - 28
  firstpage: 1754
  lastpage: 1769
  published: 2021-11-28 00:00:00 +0000