Proceedings of Machine Learning Research
Proceedings of The 10th Asian Conference on Machine Learning on 14-16 November 2018
Published as Volume 95 by the Proceedings of Machine Learning Research on 04 November 2018.
Volume Edited by:
Jun Zhu
Ichiro Takeuchi
Series Editors:
Neil D. Lawrence
Mark Reid
http://proceedings.mlr.press/v95/
Fri, 23 Nov 2018 17:29:55 +0000
Preface
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhu18a.html
Who Are Raising Their Hands? Hand-Raiser Seeking Based on Object Detection and Pose Estimation
In this paper, we propose an automatic hand-raiser recognition algorithm to identify who raises their hands in real classroom scenarios, which is of great importance for further analyzing the learning states of individuals. To recognize the hand-raisers, we divide hand-raiser recognition into three subproblems: hand-raising detection, pose estimation, and matching the raised hands to students. Several challenges arise in these subproblems, such as the low resolution of the back rows for keypoint detection, the motion distortion caused by hand raising in pose estimation, and various complex situations for matching. To address these challenges, we first adopt an improved R-FCN algorithm for hand-raising detection, whose effectiveness has been demonstrated. Secondly, we present a novel PAF-based pose estimation algorithm for detecting keypoints of human bodies. The proposed PAF adds a scale search and a modified weight metric to adapt to real and complex scenarios. Specifically, the scale search improves detection at low resolution by pooling human characteristics across pictures of different sizes, and the modified weight metric uses the directional vectors of possible limb connections to handle motion distortion. Thirdly, a heuristic matching strategy based on the location of hand-raising and keypoint information is proposed to recognize the hand-raisers. Experimental results on six teaching videos in real classrooms demonstrate the efficiency of the proposed algorithm, and the 83% recognition accuracy indicates its potential for application in real classrooms.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhou18a.html
Refining Synthetic Images with Semantic Layouts by Adversarial Training
Recent progress in learning-by-synthesis has proposed training models on synthetic images, which can effectively reduce the cost of manpower and material resources. However, learning from synthetic images still fails to achieve the desired performance compared to naturalistic images because synthetic images follow a different distribution. To address this issue, previous methods improved the realism of synthetic images by learning a model. However, the disadvantage of these methods is that the distortion is not reduced and the authenticity level is unstable. To solve this problem, we put forward a new structure to improve synthetic images, drawing on the idea of style transformation, through which we can efficiently reduce the distortion of pictures and minimize the need for real data annotation. We estimate that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on various datasets, including the MPIIGaze dataset.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhao18a.html
Relative Attribute Learning with Deep Attentive Cross-image Representation
In this paper, we study the relative attribute learning problem, which refers to comparing the strengths of a specific attribute between image pairs, from a new perspective of cross-image representation learning. In particular, we introduce a deep attentive cross-image representation learning (DACRL) model, which first extracts single-image representations with one shared subnetwork, and then learns an attentive cross-image representation by considering the channel-wise attention of the concatenated single-image feature maps. Taking a pair of images as input, DACRL outputs a posterior probability indicating whether the first image in the pair has a stronger presence of the attribute than the second image. The whole network is jointly optimized via a unified end-to-end deep learning scheme. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art methods.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhang18d.html
Person Re-identification by Mid-level Attribute and Part-based Identity Learning
Existing deep models using attributes usually take global features for identity classification and attribute recognition. However, some attributes, such as a hat or shoes, exist in local positions; therefore, global features alone are insufficient for person representation. In this work, we propose to use attribute recognition as an auxiliary task for person re-identification. The attributes are recognised from the local regions of mid-level layers. In addition, we extract local features and global features from a high-level layer for identity classification. The mid-level attribute learning improves the discrimination of high-level features, and the local feature is complementary to the global feature. We report competitive results on two large-scale person re-identification benchmarks, the Market-1501 and DukeMTMC-reID datasets, which demonstrate the effectiveness of the proposed method.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhang18c.html
Discriminative Feature Representation for Person Re-identification by Batch-contrastive Loss
In the past few years, person re-identification (reID) has developed rapidly due to the success of deep convolutional neural networks. The softmax loss function is an important component for learning discriminative features. However, a classifier trained with the softmax loss has difficulty distinguishing hard samples. In this work, we introduce a new auxiliary loss function for person reID, called batch-contrastive loss, to further separate the features of different identities and pull the features of the same identity closer. Furthermore, the proposed loss function does not rely on the pairwise or triplet sampling that is commonly used in the Siamese model. We test our loss function on two large-scale person reID benchmarks, the Market-1501 and DukeMTMC datasets. By combining the batch-contrastive loss with the softmax loss, even when employing only the generic L2-distance metric, we achieve results competitive with the state-of-the-art methods.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhang18b.html
End-to-End Learning of Multi-scale Convolutional Neural Network for Stereo Matching
Deep neural networks have shown excellent performance in the stereo matching task. Recently, CNN-based methods have shown that stereo matching can be formulated as a supervised learning task. However, less attention has been paid to the fusion of contextual semantic information and details. To tackle this problem, we propose a network for disparity estimation based on abundant contextual details and semantic information, called Multi-scale Features Network (MSFNet). First, we design a new structure to encode rich semantic information and fine-grained details by fusing multi-scale features. We also combine the advantages of element-wise addition and concatenation, which helps merge semantic information with details. Second, a guidance mechanism is introduced to guide the network to automatically focus more on the unreliable regions. Third, we formulate the consistency check as an error map, obtained from the low-stage features with fine-grained details. Finally, we adopt consistency checking between the left feature and the synthetic left feature to refine the initial disparity. Experiments on the Scene Flow and KITTI 2015 benchmarks demonstrate that the proposed method can achieve state-of-the-art performance.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zhang18a.html
Joint Patch-Group Based Sparse Representation for Image Inpainting
Sparse representation has achieved great success in various machine learning and image processing tasks. For image processing, typical patch-based sparse representation (PSR) models tend to generate undesirable visual artifacts, while group-based sparse representation (GSR) models produce over-smoothing. In this paper, we propose a new sparse representation model, termed joint patch-group based sparse representation (JPG-SR). Compared with existing sparse representation models, the proposed JPG-SR provides a powerful mechanism to integrate the local sparsity and nonlocal self-similarity of images. We then apply the proposed JPG-SR model to a low-level vision problem, namely, image inpainting. To make the proposed scheme tractable and robust, an iterative algorithm based on the alternating direction method of multipliers (ADMM) framework is developed to solve the proposed JPG-SR model. Experimental results demonstrate that the proposed model is efficient and outperforms several state-of-the-art methods in both objective and perceptual quality.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zha18a.html
Collaboratively Weighting Deep and Classic Representation via $l_2$ Regularization for Image Classification
Deep convolutional neural networks provide a powerful feature learning capability for image classification. The deep image features can be utilized to deal with many image understanding tasks like image classification and object recognition. However, the robustness obtained on one dataset can hardly be reproduced in another domain, which leads to inefficient models far from state-of-the-art. We propose a deep collaborative weight-based classification (DeepCWC) method to resolve this problem, by providing a novel option to take full advantage of deep features in classic machine learning. It first performs the $l_2$-norm based collaborative representation on the original images, as well as on the deep features extracted by deep CNN models. Then, two distance vectors, obtained from the pair of linear representations, are fused together via a novel collaborative weight. This collaborative weight enables deep and classic representations to weigh each other. We observed the complementarity between the two representations in a series of experiments on 10 facial and object datasets. The proposed DeepCWC produces very promising classification results, and outperforms many other benchmark methods, especially the ones claimed for Fashion-MNIST. The code is going to be published in our public repository (https://github.com/zengsn/research).
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/zeng18a.html
Co-regularized Multi-view Subspace Clustering
Multi-view data sets are very common in many clustering applications. Multi-view clustering aims to exploit information across views instead of within individual views, which promises to improve clustering performance. Note that a high-dimensional data set usually lies in a low-dimensional subspace. Thus, many multi-view subspace clustering algorithms have been developed. However, existing multi-view subspace clustering methods rarely perform clustering on the subspace representation of each view simultaneously while also keeping the indicators consistent among the representations, i.e., the same data point in different views should be assigned to the same cluster. In this paper, we propose a novel multi-view subspace clustering method. In our method, we use the indicator matrix to ensure that we perform clustering on the subspace representation of each view simultaneously. At the same time, a co-regularized term is added to guarantee the consistency of the indicator matrices. Experiments on several real-world multi-view datasets demonstrate the effectiveness and superiority of our proposed method.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/yu18a.html
Boosting Dynamic Programming with Neural Networks for Solving NP-hard Problems
Dynamic programming is a powerful method for solving combinatorial optimization problems. However, it does not always work well, particularly for some NP-hard problems with extremely large state spaces. In this paper, we propose an approach to boost the capability of dynamic programming with neural networks. First, we replace the conventional tabular method with neural networks of polynomial size to approximately represent dynamic programming functions. We then design an iterative algorithm to train the neural network with data generated from a solution reconstruction process. Our method combines the approximating ability and flexibility of neural networks with the advantage of dynamic programming in utilizing intrinsic properties of a problem. This approach can significantly reduce the space complexity, and it is flexible in balancing space, running time, and accuracy. We apply the method to the Travelling Salesman Problem (TSP). The experimental results show that our approach can solve larger problems that are intractable for conventional dynamic programming, and that its performance is near optimal, outperforming the well-known approximation algorithms.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/yang18a.html
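For context on the "conventional tabular method" mentioned in the dynamic programming abstract above, the following is a minimal sketch of the classic Held-Karp dynamic program for the TSP. It is not taken from the paper; the function name `held_karp` and the distance-matrix input are illustrative assumptions. It only shows why the table grows exponentially with the number of cities, which is the space cost the abstract proposes to reduce with a neural approximation.

```python
from itertools import combinations


def held_karp(dist):
    """Exact TSP tour length via tabular dynamic programming (Held-Karp).

    dist is a square matrix (list of lists); the tour starts and ends at city 0.
    Illustrative baseline only, not the method proposed in the paper.
    """
    n = len(dist)
    if n == 1:
        return 0
    # table[(mask, j)]: cheapest path that starts at city 0, visits exactly
    # the cities in bitmask `mask`, and ends at city j (j is in mask).
    table = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)  # same subset without the end city j
                table[(mask, j)] = min(
                    table[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    # Close the tour by returning to city 0.
    return min(table[(full, j)] + dist[j][0] for j in range(1, n))
```

The table holds on the order of n * 2^n entries, so memory rather than time is usually the first limit hit as n grows.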
Deep Multi-instance Learning with Dynamic Pooling
End-to-end optimization of multi-instance learning (MIL) using neural networks is an important problem with many applications, in which a core issue is how to design a permutation-invariant pooling function without losing much instance-level information. Inspired by the dynamic routing in recent capsule networks, we propose a novel dynamic pooling function for MIL. It is an adaptive scheme for both key instance selection and modeling the contextual information among instances in a bag. The dynamic pooling iteratively updates each instance's contribution to its bag. It is permutation-invariant and can interpret instance-to-bag relationships. The proposed dynamic pooling based multi-instance neural network has been validated on many MIL tasks and outperforms other MIL methods.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/yan18a.html
Adversarial Neural Machine Translation
In this paper, we study a new learning paradigm for neural machine translation (NMT). Instead of maximizing the likelihood of the human translation as in previous works, we minimize the distinction between the human translation and the translation given by an NMT model. To achieve this goal, inspired by the recent success of generative adversarial networks (GANs), we employ an adversarial training architecture and name it Adversarial-NMT. In Adversarial-NMT, the training of the NMT model is assisted by an adversary, which is an elaborately designed 2D convolutional neural network (CNN). The goal of the adversary is to differentiate the translation generated by the NMT model from that produced by a human. The goal of the NMT model is to produce high-quality translations so as to cheat the adversary. A policy gradient method is leveraged to co-train the NMT model and the adversary. Experimental results on English$\rightarrow$French and German$\rightarrow$English translation tasks show that Adversarial-NMT can achieve significantly better translation quality than several strong baselines.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/wu18a.html
CCNet: Cluster-Coordinated Net for Learning Multi-agent Communication Protocols with Reinforcement Learning
Multi-agent systems are crucial for many practical applications. Recent years have witnessed numerous studies on multi-agent tasks with reinforcement learning (RL) algorithms. Traditional reinforcement learning algorithms often fail to learn the cooperation between different agents, which is vital for multi-agent problems. A promising solution is to establish a communication protocol among agents. However, existing approaches often suffer from generalization challenges, especially in tasks with partial observation and a dynamically varying number of agents. In this paper, we develop a Cluster-Coordinated Network (CCNet) to address the “Learning-to-communicate” problem in multi-agent systems by combining a trainable Vector of Locally Aggregated Descriptor (VLAD) algorithm with reinforcement learning. Equipped with a VLAD-based, end-to-end trainable module for processing communication information (called the VLAD Processing Core), CCNet can learn efficient communication protocols even from scratch in partially observable environments and is also robust to dynamic changes in the number of agents. Moreover, with the help of communication, CCNet exhibits less non-stationarity when the network is trained with common RL algorithms. We evaluated the proposed CCNet on two multi-agent partially observable tasks, i.e., Traffic Junction and Combat Task. The experimental results demonstrate that CCNet is effective and improves performance by a large margin over the state-of-the-art methods.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/wen18a.html
A Data Driven Approach to Predicting Rating Scores for New Restaurants
This paper focuses on predicting rating scores of new restaurants listed on online restaurant review platforms. Most existing works rely on customer reviews to make a prediction. However, in practice, the customer reviews for new restaurants are always missing. In this paper, we mine useful features from the information of restaurants as well as highly available urban data to tackle this problem. We propose a deep-learning based approach called MR-Net to model both endogenous and exogenous factors in a unified manner and capture deep feature interaction for rating score prediction. Extensive experiments on real-world data from Dianping show that our approach achieves better performance than various baseline methods. To the best of our knowledge, it is the first work that predicts rating scores for new restaurants without the knowledge of customer reviews.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/wang18c.html
A Self-Attentive Hierarchical Model for Jointly Improving Text Summarization and Sentiment Classification
Text summarization and sentiment classification are two main text analysis tasks in NLP, both focusing on extracting the major idea of a text at different levels. Based on the characteristics of both, sentiment classification can be regarded as a more abstractive summarization task. Following this view, a Self-Attentive Hierarchical model for jointly improving text Summarization and Sentiment Classification (SAHSSC) is proposed in this paper. This model jointly performs abstractive text summarization and sentiment classification within a hierarchical end-to-end neural framework, in which the sentiment classification layer on top of the summarization layer predicts the sentiment label in the light of the text and the generated summary. Furthermore, a self-attention layer is also proposed in the hierarchical framework, which is the bridge that connects the summarization layer and the sentiment classification layer and aims at capturing emotional information at the text level as well as the summary level. The proposed model can generate a more relevant summary and lead to a more accurate summary-aware sentiment prediction. Experimental results on the SNAP Amazon online review datasets show that our model outperforms the state-of-the-art baselines on both abstractive text summarization and sentiment classification by a considerable margin.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/wang18b.html
Deep Correlation Structure Preserved Label Space Embedding for Multi-label Classification
Label embedding is an effective and efficient method that can jointly extract the information of all labels for better multi-label classification performance. However, most existing embedding methods ignore the information of the feature space or the intrinsic structure of the previous label space, so that their learned latent space does not have strong predictability and discriminant ability. We propose a novel deep neural network (DNN) based model, namely Deep Correlation Structure Preserved Label Space Embedding (DCSPE). Specifically, DCSPE derives a deep latent space by performing feature-aware label space embedding with deep canonical correlation analysis (DCCA) and preserving the intrinsic structure of the previous label space with the proposed deep multidimensional scaling (DMDS). Our DCSPE is achieved by integrating the DNN architectures of the two DNN-based models and can learn a feature-aware, structure-preserved deep latent space. Furthermore, extensive experimental results on datasets with many labels demonstrate that our proposed approach is significantly better than the existing label embedding algorithms.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/wang18a.html
Batch Normalized Deep Boltzmann Machines
Training Deep Boltzmann Machines (DBMs) is a challenging task in deep generative model studies. Careless training usually leads to divergence or a useless model. We discover that this phenomenon is due to the change of DBM layers’ input signals during model parameter updates, similar to other deterministic deep networks such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs). The change of layers’ input distributions not only complicates the learning process but also causes redundant neurons that simply imitate the others’ behaviors. Although this phenomenon can be addressed using batch normalization in deep learning, integrating this technique into the probabilistic network of DBMs is a challenging problem since it has to satisfy two conditions of energy function and conditional probabilities. In this paper, we introduce Batch Normalized Deep Boltzmann Machines (BNDBMs) that meet both aforementioned conditions and successfully combine batch normalization and DBMs into the same framework. However, unlike CNNs, due to the probabilistic nature of DBMs, training DBMs with batch normalization has some differences: i) fixing the shift parameters but learning the scale parameters; ii) avoiding normalizing the first hidden layer; and iii) maintaining multiple pairs of population means and variances per neuron rather than one pair as in CNNs. We observe that our proposed BNDBMs can stabilize the input signals of network layers and facilitate the training process as well as improve the model quality. More interestingly, BNDBMs can be trained successfully without pretraining, which is usually a mandatory step in most existing DBMs. The experimental results on the MNIST, Fashion-MNIST and Caltech 101 Silhouette datasets show that our BNDBMs outperform DBMs and centered DBMs in terms of feature representation and classification accuracy (3.98% and 5.84% average improvement for pretraining and no pretraining respectively).
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/vu18a.html
Hypernetwork-based Implicit Posterior Estimation and Model Averaging of CNN
Deep neural networks have a rich ability to learn complex representations and have achieved remarkable results in various tasks. However, they are prone to overfitting due to the limited number of training samples; regularizing the learning process of neural networks is critical. In this paper, we propose a novel regularization method, which estimates parameters of a large convolutional neural network as implicit probabilistic distributions generated by a hypernetwork. Also, we can perform model averaging to improve the network performance. Experimental results demonstrate that our regularization method outperformed the commonly used maximum a posteriori (MAP) estimation.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/ukai18a.html
RDEC: Integrating Regularization into Deep Embedded Clustering for Imbalanced Datasets
Clustering is a fundamental machine learning task and can be used in many applications. With the development of deep neural networks (DNNs), combining techniques from DNNs with clustering has become a new research direction and achieved some success. However, few studies have focused on the imbalanced-data problem, which commonly occurs in real-world applications. In this paper, we propose a clustering method, regularized deep embedding clustering (RDEC), that integrates virtual adversarial training (VAT), a network regularization technique, with a clustering method called deep embedding clustering (DEC). DEC optimizes cluster assignments by pushing data more densely around centroids in latent space, but it is sometimes sensitive to the initial location of centroids, especially in the case of imbalanced data, where the minor class has less chance to be assigned a good centroid. RDEC introduces regularization using VAT to ensure the model’s robustness to local perturbations of data. VAT pushes data that are similar in the original space closer together in the latent space, bunching together data from minor classes and thereby facilitating cluster identification by RDEC. Combining the advantages of DEC and VAT, RDEC attains state-of-the-art performance on both balanced and imbalanced benchmark/real-world datasets. For example, accuracies are as high as 98.41% on the MNIST dataset and 85.45% on a highly imbalanced dataset derived from MNIST, which is nearly 8% higher than the current best result.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/tao18a.html
RICAP: Random Image Cropping and Patching Data Augmentation for Deep CNNs
Deep convolutional neural networks (CNNs) have demonstrated remarkable results in image recognition owing to their rich expression ability and numerous parameters. However, an expression ability that is excessive relative to the variety of training images often carries a risk of overfitting. Data augmentation techniques have been proposed to address this problem, as they enrich datasets by flipping, cropping, resizing, and color-translating images. They enable deep CNNs to achieve an impressive performance. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP), which randomly crops four images and patches them to construct a new training image. Hence, RICAP randomly picks up subsets of original features among the four images and discards others, enriching the variety of training images. Also, RICAP mixes the class labels of the four images and enjoys a benefit similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) and achieved a new state-of-the-art test error of 2.23% on CIFAR-10 among competitive data augmentation techniques such as cutout and mixup. We also confirmed that deep CNNs with RICAP achieved better results on CIFAR-100 and ImageNet than those obtained by other techniques.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/takahashi18a.html
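The cropping-and-patching step described in the RICAP abstract above can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' code: the function name `ricap`, the `beta` parameter, the choice of a Beta distribution for the patch boundary, and mixing labels in proportion to patch area are all assumptions that the abstract itself does not spell out.

```python
import numpy as np


def ricap(images, labels, num_classes, beta=0.3, rng=None):
    """Patch random crops of four images into one training image per sample.

    images: float array of shape (N, H, W, C); labels: int array of shape (N,).
    Returns the patched images and soft labels of shape (N, num_classes).
    Sketch under stated assumptions; details may differ from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Assumed: the boundary that splits the canvas into four patches is
    # drawn from a Beta(beta, beta) distribution along each axis.
    bw = int(np.round(w * rng.beta(beta, beta)))
    bh = int(np.round(h * rng.beta(beta, beta)))
    # (top, left, height, width) of the four destination patches.
    patches = [
        (0, 0, bh, bw),
        (0, bw, bh, w - bw),
        (bh, 0, h - bh, bw),
        (bh, bw, h - bh, w - bw),
    ]
    out = np.empty_like(images)
    soft = np.zeros((n, num_classes))
    onehot = np.eye(num_classes)
    for top, left, ph, pw in patches:
        if ph == 0 or pw == 0:
            continue  # degenerate patch covers no pixels
        idx = rng.permutation(n)  # which source image feeds this patch
        y0 = rng.integers(0, h - ph + 1)  # random crop origin in the source
        x0 = rng.integers(0, w - pw + 1)
        out[:, top:top + ph, left:left + pw] = images[idx, y0:y0 + ph, x0:x0 + pw]
        # Assumed: labels are mixed in proportion to patch area.
        soft += (ph * pw) / (h * w) * onehot[labels[idx]]
    return out, soft
```

Because the four patch areas always sum to the full image area, each row of the returned soft labels sums to one, which is what gives the label-smoothing-like effect mentioned in the abstract.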
Unsupervised Heterogeneous Domain Adaptation with Sparse Feature Transformation
Heterogeneous domain adaptation (HDA), which aims to adapt information across domains with different input feature spaces, has attracted a lot of attention recently. However, many existing HDA approaches rely on labeled data in the target domain, which is either scarce or even absent in many tasks. In this paper, we propose a novel unsupervised heterogeneous domain adaptation approach to bridge the representation gap between the source and target domains. The proposed method learns a sparse feature transformation function based on the data in both the source and target domains and a small number of existing parallel instances. The learning problem is formulated as a sparsity regularized optimization problem and an ADMM algorithm is developed to solve it. We conduct experiments on several real-world domain adaptation datasets and the experimental results validate the advantages of the proposed method over existing unsupervised heterogeneous domain adaptation approaches.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/shen18b.html
End-to-End Time Series Imputation via Residual Short Paths
Time series imputation (replacing missing data) plays an important role in time series analysis due to missing values in real-world data. How to recover missing values and model the underlying dynamic dependencies from incomplete time series remains a challenge. Recent work has found that residual networks help build very deep networks by leveraging short paths due to skip connections (Veit et al., 2016). Inspired by this, we observe that these short paths can model underlying correlations between missing items and their previous non-missing observations in a graph-like way. Hence, we propose an end-to-end imputation network with residual short paths, called Residual IMPutation LSTM (RIMP-LSTM), a flexible combination of residual short paths with graph-based temporal dependencies. We construct a residual sum unit (RSU), which enables RIMP-LSTM to make full use of previously revealed information to model incomplete time series and reduce the negative impact of missing values. Moreover, a switch unit is designed to detect the missing values, and a new loss function is then developed to train our model end-to-end on time series with missing values, which also allows simultaneous imputation and prediction. Extensive empirical comparisons with other competitive imputation approaches over several synthetic and real-world time series with various rates of missing data verify the superiority of our model.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/shen18a.html
Knowledge Guided Multi-instance Multi-label Learning via Neural Networks in Medicines Prediction
Predicting medicines for patients with co-morbidity has long been recognized as a hard task due to complex dependencies between diseases and medicines. Efforts have been made recently to build high-order dependency between diseases and medicines by extracting knowledge from electronic health records (EHR). However, current works fail to utilize additional knowledge and ignore the data skewness problem, which leads to sub-optimal combinations of medicines. In this paper, we formulate the medicines prediction task in a multi-instance multi-label learning framework, considering the multi-diagnoses as input instances and multi-medicines as output labels. We propose a knowledge-guided multi-instance multi-label network in which two types of additional knowledge are incorporated into an RNN encoder-decoder model. The utilization of structural knowledge like clinical ontology provides a way to learn a better representation, called tree embedding, by utilizing the ancestors’ information. Contextual knowledge is a global summarization of input instances, which is informative for personal prediction. Experiments conducted on a real-world clinical dataset show the necessity of combining both contextual and structural knowledge, and the proposed network performs better than baselines by up to 4+% in terms of Jaccard similarity score.
Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/shang18c.html
http://proceedings.mlr.press/v95/shang18c.htmlASVRG: Accelerated Proximal SVRGThis paper proposes an accelerated proximal stochastic variance reduced gradient (ASVRG) method, in which we design a simple and effective momentum acceleration trick. Unlike most existing accelerated stochastic variance reduction methods such as Katyusha, ASVRG has only one additional variable and one momentum parameter. Thus, ASVRG is much simpler than those methods, and has much lower per-iteration complexity. We prove that ASVRG achieves the best known oracle complexities for both strongly convex and non-strongly convex objectives. In addition, we extend ASVRG to mini-batch and non-smooth settings. We also empirically verify our theoretical results and show that the performance of ASVRG is comparable with, and sometimes even better than that of the state-of-the-art stochastic methods.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/shang18b.html
http://proceedings.mlr.press/v95/shang18b.htmlZoomNet: Deep Aggregation Learning for High-Performance Small Pedestrian DetectionIt remains very challenging for a single deep model to detect pedestrians of different sizes appearing in an image. One typical remedy for small pedestrian detection is to up-sample the input and pass it to the network multiple times. Unfortunately, this strategy not only exponentially increases the computational cost but also probably impairs the model effectiveness. In this work, we present a deep architecture, referred to as ZoomNet, which performs small pedestrian detection by deep aggregation learning without up-sampling the input. ZoomNet learns and aggregates deep feature representations at multiple levels and retains the spatial information of the pedestrian from different scales. ZoomNet also learns to cultivate the feature representations from the classification task to the detection task and obtains further performance improvements. Extensive experimental results demonstrate the state-of-the-art performance of ZoomNet. The source code of this work will be made publicly available to facilitate further studies on this problem.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/shang18a.html
http://proceedings.mlr.press/v95/shang18a.htmlA Faster Sampling Algorithm for Spherical $k$-meansThe <em>Spherical $k$-means</em> algorithm proposed by (Dhillon and Modha, 2001) is a popular algorithm for clustering high dimensional datasets. Although their algorithm is simple and easy to implement, a drawback of the same is that it doesn’t provide any provable guarantee on the clustering result. (Endo and Miyamoto, 2015) suggest an adaptive sampling based algorithm (<em>Spherical $k$-means$++$</em>) which gives near optimal results, with high probability. However, their algorithm requires $k$ sequential passes over the entire dataset, which may not be feasible when the dataset and/or the values of $k$ are large. In this work, we propose a Markov chain based sampling algorithm that takes only one pass over the data, and gives close to optimal clustering similar to <em>Spherical $k$-means$++$</em>, <em>i.e.</em>, a faster algorithm while maintaining almost the same approximation. We present a theoretical analysis of the algorithm, and complement it with rigorous experiments on real-world datasets. Our proposed algorithm is simple and easy to implement, and can be easily adopted in practice.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/pratap18a.html
http://proceedings.mlr.press/v95/pratap18a.htmlConcorde: Morphological Agreement in Conversational ModelsNeural conversational models are widely used in applications such as personal assistants and chat bots. These models seem to give better performance when operating on the word level. However, for fusional languages such as French, Russian, or Polish, the vocabulary size can become infeasible since most of the words have multiple word forms. To reduce vocabulary size, we propose a new pipeline for building conversational models: first generate words in a standard (lemmatized) form and then transform them into a grammatically correct sentence. In this work, we focus on the <em>morphological agreement</em> part of the pipeline, i.e., reconstructing proper word forms from lemmatized sentences. For this task, we propose a neural network architecture that outperforms character-level models while being twice as fast in training and 20% faster in inference. The proposed pipeline yields better performance than character-level conversational models according to human assessor testing.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/polykovskiy18a.html
http://proceedings.mlr.press/v95/polykovskiy18a.htmlClustering Induced Kernel LearningLearning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there is a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, existing approaches to combining kernels cannot exploit the clustering structure carried in the data, especially when the data are heterogeneous. In this work, we present a new framework that combines Bayesian nonparametric models (i.e., models that automatically grow kernel functions) with multiple kernel learning, and thus enjoys the nonparametric flavor in the context of multiple kernel learning. In particular, we propose the <em>Clustering Induced Kernel Learning</em> (CIK) method, which can automatically discover clustering structure from the data and simultaneously train a single kernel machine to fit the data in each discovered cluster. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/nguyen18a.html
http://proceedings.mlr.press/v95/nguyen18a.htmlCHS-NET: A Cascaded Neural Network with Semi-Focal Loss for Mitosis DetectionCounting mitotic figures in hematoxylin and eosin (H&E) stained histological slides is the main indicator of tumor proliferation speed, which is an important biomarker indicative of breast cancer patients’ prognosis. It is difficult to detect mitotic cells due to the diversity of the cells and the problem of class imbalance. We propose a new network called CHS-NET, a cascaded neural network with hard example mining and semi-focal loss, to detect mitotic cells in breast cancer. First, we propose a screening network to preliminarily identify the candidates of mitotic cells and a refined network to identify mitotic cells from these candidates more accurately. We propose a new feature fusion module in each network to explore complex nonlinear predictors and improve accuracy. Then, we propose a novel loss named semi-focal loss and use off-line hard example mining to solve the problems of class imbalance and error labeling. Finally, we propose a new training technique of cutting patches from the whole slide image, considering the size and distribution of mitotic cells. Our method achieves a 0.68 F1 score, which outperforms the best result in the Tumor Proliferation Assessment Challenge 2016 held by MICCAI.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/ma18a.html
http://proceedings.mlr.press/v95/ma18a.htmlMaking Classifier Chains Resilient to Class ImbalanceClass imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.Sun, 04 Nov 2018 00:00:00 +0000
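The random undersampling step mentioned in the abstract above can be illustrated with a minimal sketch for a single label's binary training set. This is not the authors' implementation: the function name `undersample`, the 1:1 target ratio, and the fixed seed are assumptions made only for illustration, and the coupling with classifier chains is not shown.

```python
import random


def undersample(examples, labels, seed=0):
    """Randomly drop majority-class examples until both classes have equal size.

    Hypothetical helper: `labels` holds 0/1 values for ONE label of a
    multi-label data set; the 1:1 ratio is an assumption for this sketch.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Decide which class is the minority and which is the majority.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority example and an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [examples[i] for i in kept], [labels[i] for i in kept]


xs = list(range(8))
ys = [1, 0, 0, 0, 0, 1, 0, 0]
sub_x, sub_y = undersample(xs, ys)
```

Here `sub_y` contains two positives and two negatives; which negatives survive depends on the seed. Repeating the draw with different seeds for each binary model is one plausible way to use more of the majority examples across an ensemble.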
http://proceedings.mlr.press/v95/liu18c.html
http://proceedings.mlr.press/v95/liu18c.htmlA Scalable Heterogeneous Parallel SOM Based on MPI/CUDAThe Self-Organizing Map (SOM) is a kind of artificial neural network used in unsupervised machine learning, which is widely applied to clustering, dimension reduction and visualization for high-dimensional data, etc. There are two major versions of the training algorithm: the original algorithm and the batch algorithm. Compared with the original, the batch algorithm has some advantages, including faster convergence and less computation, and is suitable for parallelization. However, it is still confronted with the challenge of efficiency in the case of massive data, high-dimensional data or a large-scale map. In this paper, a scalable heterogeneous parallel SOM based on the batch algorithm is proposed, which combines process-level and thread-level parallelism by MPI and CUDA. To boost the parallel efficiency on GPUs and make full use of the high floating-point computing capability, we design matrix operations for the most time-consuming steps, the computation of best match units and the weights update, making these steps available for implementation by cuBLAS. In addition, memory optimization methods are adopted. The experiments show that the proposed heterogeneous parallel SOM is effective, efficient and scalable.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/liu18b.html
http://proceedings.mlr.press/v95/liu18b.htmlLearning Selfie-Friendly Abstraction from Artistic Style ImagesArtistic style transfer can be thought of as a process to generate different versions of abstraction of the original image. However, most artistic style transfer operators are not optimized for human faces and thus mainly suffer from two undesirable features when applied to selfies. First, the edges of human faces may unpleasantly deviate from the ones in the original image. Second, the skin color is far from faithful to the original one, which is usually problematic in producing quality selfies. In this paper, we take a different approach and formulate this abstraction process as a gradient domain learning problem. We aim to learn a type of abstraction which not only achieves the specified artistic style but also circumvents the two aforementioned drawbacks, and is thus highly applicable to selfie photography. We also show that our method can be directly generalized to videos with high inter-frame consistency. Our method is also robust to non-selfie images, and the generalization to various kinds of real-life scenes is discussed. We will make our code publicly available.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/liu18a.html
http://proceedings.mlr.press/v95/liu18a.htmlEfficient Mechanisms for Peer Grading and Dueling BanditsMany scenarios in our daily life require us to infer some ranking over items or people based on limited information. In this paper, we consider two such scenarios, one for ranking student papers in massive online open courses and one for identifying the best player (or team) in sports tournaments. For the peer grading problem, we design a mechanism with a new way of matching graders to papers. This allows us to aggregate partial rankings from graders into a global one, with an accuracy rate matching the best in previous works, but with a much simpler analysis. For the winner selection problem in sports tournaments, we cast it as the well-known dueling bandit problem and identify a new measure to minimize: the number of parallel rounds, as one normally would not like a large tournament to last too long. We provide mechanisms which can determine the optimal or an almost optimal player in a small number of parallel rounds and at the same time using a small number of competitions.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/lin18a.html
http://proceedings.mlr.press/v95/lin18a.htmlOptimization Algorithm Inspired Deep Neural Network Structure DesignDeep neural networks have been one of the dominant machine learning approaches in recent years. Several new network structures are proposed and have better performance than the traditional feedforward neural network structure. Representative ones include the skip connection structure in ResNet and the dense connection structure in DenseNet. However, it still lacks a unified guidance for the neural network structure design. In this paper, we propose the hypothesis that the neural network structure design can be inspired by optimization algorithms and a faster optimization algorithm may lead to a better neural network structure. Specifically, we prove that the propagation in the feedforward neural network with the same linear transformation in different layers is equivalent to minimizing some function using the gradient descent algorithm. Based on this observation, we replace the gradient descent algorithm with the heavy ball algorithm and Nesterov’s accelerated gradient descent algorithm, which are faster and inspire us to design new and better network structures. ResNet and DenseNet can be considered as two special cases of our framework. Numerical experiments on CIFAR-10, CIFAR-100 and ImageNet verify the advantage of our optimization algorithm inspired structures over ResNet and DenseNet.Sun, 04 Nov 2018 00:00:00 +0000
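The abstract above contrasts gradient descent with the heavy ball algorithm as the faster optimizer. The sketch below compares only the two update rules, on a small diagonal quadratic; it does not reproduce the paper's network structures. The test function, the step sizes, and the momentum value (the classical tuning for a quadratic with curvature between `mu` and `L`) are assumptions chosen for illustration.

```python
import math


def grad(x, diag):
    """Gradient of f(x) = 0.5 * sum(diag[i] * x[i]**2)."""
    return [a * xi for a, xi in zip(diag, x)]


def gradient_descent(x0, diag, step, iters):
    x = list(x0)
    for _ in range(iters):
        g = grad(x, diag)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def heavy_ball(x0, diag, step, momentum, iters):
    # x_{k+1} = x_k - step * grad(x_k) + momentum * (x_k - x_{k-1})
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad(x, diag)
        x_next = [xi - step * gi + momentum * (xi - xp)
                  for xi, gi, xp in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


def norm(x):
    return math.sqrt(sum(v * v for v in x))


diag = [1.0, 10.0, 100.0]          # illustrative curvatures; minimizer is 0
mu, L = min(diag), max(diag)
x0 = [1.0, 1.0, 1.0]

gd_err = norm(gradient_descent(x0, diag, 2.0 / (L + mu), 100))

sqrt_L, sqrt_mu = math.sqrt(L), math.sqrt(mu)
hb_step = 4.0 / (sqrt_L + sqrt_mu) ** 2
hb_momentum = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2
hb_err = norm(heavy_ball(x0, diag, hb_step, hb_momentum, 100))
```

After the same number of iterations, `hb_err` is much smaller than `gd_err` on this problem, which is the kind of speed-up the abstract refers to. The momentum term reuses the previous iterate, loosely analogous to a connection that skips back one layer.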
http://proceedings.mlr.press/v95/li18f.html
http://proceedings.mlr.press/v95/li18f.htmlConstruction of Incoherent Dictionaries via Direct Babel Function MinimizationHighly incoherent dictionaries have broad applications in machine learning. Previous methods commonly construct incoherent dictionaries by minimizing the mutual coherence. However, as pointed out by Tropp (2004), mutual coherence does not offer a very subtle description, and the Babel function, as a generalization of mutual coherence, is a more attractive alternative. It is, however, much more challenging to optimize. In this work, we minimize the Babel function directly to construct incoherent dictionaries. As far as we know, this is the first work to optimize the Babel function. We propose an augmented Lagrange multiplier based algorithm to solve this nonconvex and nonsmooth problem with the convergence guarantee that every accumulation point is a KKT point. We define a new norm $\|\X\|_{\infty,max_p}$ and propose an efficient method to compute its proximal operation with $O(n^2 \log n)$ complexity, which dominates the running time of our algorithm, where $max_p$ means the sum of the largest $p$ elements and $n$ is the number of atoms. Numerical experiments testify to the advantage of our method.Sun, 04 Nov 2018 00:00:00 +0000
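To make the quantity being minimized concrete, the sketch below evaluates the Babel function of a small dictionary using the standard definition: for each atom, sum the $p$ largest absolute inner products with the other atoms, then take the maximum over atoms. With $p = 1$ this reduces to the mutual coherence. The function name `babel` and the toy dictionary are assumptions for illustration; atoms are assumed to have unit norm, and this brute-force evaluation is not the paper's optimization algorithm.

```python
import math


def babel(atoms, p):
    """Babel function mu_1(p) of a dictionary given as a list of unit-norm atoms.

    Hypothetical helper for illustration; requires 1 <= p <= len(atoms) - 1.
    """
    worst = 0.0
    for i, d_i in enumerate(atoms):
        # Absolute correlations of atom i with every other atom, largest first.
        corr = sorted(
            (abs(sum(a * b for a, b in zip(d_i, d_j)))
             for j, d_j in enumerate(atoms) if j != i),
            reverse=True,
        )
        # Sum of the p largest correlations for this atom.
        worst = max(worst, sum(corr[:p]))
    return worst


s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]   # two orthogonal atoms plus a diagonal one
coherence = babel(atoms, 1)                # mutual coherence
babel_2 = babel(atoms, 2)
```

For this toy dictionary the mutual coherence is $1/\sqrt{2}$ while the Babel function at $p = 2$ is $\sqrt{2}$, because the diagonal atom correlates with both basis vectors at once; this cumulative effect is what mutual coherence alone cannot describe.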
http://proceedings.mlr.press/v95/li18e.html
http://proceedings.mlr.press/v95/li18e.htmlCharacter-based BiLSTM-CRF Incorporating POS and Dictionaries for Chinese Opinion Target ExtractionOpinion target extraction (OTE) is a fundamental step for sentiment analysis and opinion summarization. We analyze the differences between Chinese and the Indo-European language family, and reduce Chinese OTE to a character-based sequence tagging task. Then we introduce two novel features for each character by distributing POS differentially and using predefined templates over contexts and dictionaries. We further propose a character-based BiLSTM-CRF model incorporating the two feature sequences aligned with the character sequence. Experimental results on real-world consumer review datasets show that our work significantly outperforms the baseline methods for Chinese OTE.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/li18d.html
http://proceedings.mlr.press/v95/li18d.htmlStock Price Prediction Using Attention-based Multi-Input LSTMStock price prediction has always been a hot but challenging task due to the complexity and randomness of the stock market. Investors and researchers usually derive a great number of factors from original data such as historical stock prices, company profits, or textual data collected from social media. Normally these factors are then fed into models like linear regression, SVM or neural networks to make a prediction. Even though the number of factors is considerable, most of them have relatively weak correlations with the future stock price. During the training process, these factors not only result in additional computation but can sometimes even be harmful to prediction performance. In this paper, we propose a novel multi-input LSTM model which is capable of extracting valuable information from low-correlated factors and discarding their harmful noise by employing extra input gates controlled by the convincing factors called <em>mainstream</em>. We also introduce several new factors, including the prices of other related stocks, to improve the prediction accuracy. The experimental results on stock data from the Chinese stock market demonstrate the effectiveness of the proposed approach compared with the state-of-the-art methods.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/li18c.html
http://proceedings.mlr.press/v95/li18c.htmlClustering Uncertain Graphs with Node AttributesGraph clustering, which has wide applications in social and biological networks, has attracted much attention in recent years. Recent approaches to graph clustering mainly focus on either certain graphs with node attributes or uncertain graphs without node attributes. However, many real-world graphs have both uncertainty on the edges and attributes on the nodes. We refer to such networks as <em>attributed uncertain graphs</em>. Different from conventional graphs, attributed uncertain graphs pose two major challenges for graph clustering: 1) uncertainty on the edges, which makes it difficult to extract reliable clusters; 2) high dimensional attributes on the nodes, which contain irrelevant and noisy information. In this paper, we study the problem of node clustering on attributed uncertain graphs, where we exploit both the uncertain edges and a set of important attributes for graph clustering. The uncertain edges can help identify the set of relevant attributes in the nodes, which are called focus attributes, while the focus attributes can help reduce the uncertainty in edges for graph clustering. We propose two novel approaches: AUG-I, based upon integrated attribute induced graphs, and AUG-U, based upon the unified partition over possible worlds of an uncertain graph. Extensive empirical studies on real-world datasets demonstrate the effectiveness of our approaches for clustering tasks on attributed uncertain graphs.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/li18b.html
http://proceedings.mlr.press/v95/li18b.htmlFeature-correlation-aware Gaussian Process Latent Variable ModelGaussian Process Latent Variable Model (GPLVM) is a powerful nonlinear dimension reduction model and has been widely used in many machine learning scenarios. However, the original GPLVM and its variants do not explicitly model the correlations among the original features, leading to the underutilization of underlying information involved in the data. To compensate for this shortcoming, we propose a feature-correlation-aware GPLVM (fcaGPLVM) to simultaneously learn the latent variables and the feature correlations. The main contributions of this paper are 1) introducing a set of extra latent variables into the original GPLVM and proposing a feature-correlation-aware kernel function to explicitly model the feature-description information and infer the feature correlations; 2) defining a joint objective function and developing a stochastic optimization algorithm based on the stochastic variational inference (SVI) to learn all the latent variables. To the best of our knowledge, this is the first work that explicitly considers the feature correlations in the GPLVM and makes many existing GPLVMs become its special cases. Furthermore, it can be applied to both unsupervised and supervised learnings to improve the performance of dimension reduction. Experimental results show that in these two learning scenarios the proposed models outperform their state-of-the-art counterparts.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/li18a.html
http://proceedings.mlr.press/v95/li18a.htmlExtracting Invariant Features From Images Using An Equivariant AutoencoderConvolutional Neural Networks achieve state of the art results in many image recognition tasks. While their structure makes predictions invariant to small translations, some recognition tasks require invariance to other transformations, like rotation and reflection. We apply group convolutions to build an Equivariant Autoencoder with embeddings that change predictably under the specified set of transformations. We then introduce two approaches to extracting invariant features from these embeddings—Gram Pooling and Equivariant Attention. These two methods separate transformation-relevant information from all other image features. We use obtained embeddings in classification and clustering tasks and show an improvement of the classification quality on the learned embeddings compared to pure autoencoder and average pooling method. A visualization of the learned manifold shows that objects of the same class tend to cluster together, which was not observed for the pure autoencoder embeddings.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/kuzminykh18a.html
http://proceedings.mlr.press/v95/kuzminykh18a.htmlReSet: Learning Recurrent Dynamic Routing in ResNet-like Neural NetworksNeural Network is a powerful Machine Learning tool that shows outstanding performance in Computer Vision, Natural Language Processing, and Artificial Intelligence. In particular, recently proposed ResNet architecture and its modifications produce state-of-the-art results in image classification problems. ResNet and most of the previously proposed architectures have a fixed structure and apply the same transformation to all input images. In this work, we develop a ResNet-based model that dynamically selects Computational Units (CU) for each input object from a learned set of transformations. Dynamic selection allows the network to learn a sequence of useful transformations and apply only required units to predict the image label. We compare our model to ResNet-38 architecture and achieve better results than the original ResNet on CIFAR-10.1 test set. While examining the produced paths, we discovered that the network learned different routes for images from different classes and similar routes for similar images.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/kemaev18a.html
http://proceedings.mlr.press/v95/kemaev18a.htmlDistinguishing Question Subjectivity from Difficulty for Improved CrowdsourcingThe questions in a crowdsourcing task typically exhibit varying degrees of <em>difficulty</em> and <em>subjectivity</em>. Their joint effects give rise to the variation in responses to the same question by different crowd-workers. This variation is low when the question is easy to answer and objective, and high when it is difficult and subjective. Unfortunately, current quality control methods for crowdsourcing consider only the question difficulty to account for the variation. As a result, these methods cannot distinguish workers' <em>personal preferences</em> for different correct answers of a <em>partially subjective</em> question from their <em>ability</em> to avoid objectively incorrect answers for that question. To address this issue, we present a probabilistic model which (i) explicitly encodes question difficulty as a model parameter and (ii) implicitly encodes question subjectivity via latent <em>preference factors</em> for crowd-workers. We show that question subjectivity induces grouping of crowd-workers, revealed through clustering of their latent preferences. Moreover, we develop a quantitative measure for question subjectivity. Experiments show that our model (1) improves both the prediction of the true answer to a question and the prediction of unseen worker responses, and (2) can potentially provide rankings of questions coherent with human assessment in terms of difficulty and subjectivity.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/jin18a.html
http://proceedings.mlr.press/v95/jin18a.htmlCartoon-to-Photo Facial Translation with Generative Adversarial NetworksCartoon-to-photo facial translation could be widely used in different applications, such as law enforcement and anime remaking. Nevertheless, current general-purpose image-to-image models usually produce blurry or unrelated results in this task. In this paper, we propose a Cartoon-to-Photo facial translation with Generative Adversarial Networks (\name) for inverting cartoon faces to generate photo-realistic and related face images. In order to produce convincing faces with intact facial parts, we exploit global and local discriminators to capture global facial features and three local facial regions, respectively. Moreover, we use a specific content network to capture and preserve face characteristics and identity between cartoons and photos. As a result, the proposed approach can generate convincing high-quality faces that satisfy both the characteristic and identity constraints of input cartoon faces. Compared with recent works on unpaired image-to-image translation, our proposed method is able to generate more realistic and correlative images.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/huang18a.html
http://proceedings.mlr.press/v95/huang18a.htmlUnderwater Image Restoration Based on Convolutional Neural NetworkRestoring degraded underwater images is a challenging ill-posed problem. Existing prior-based approaches have limited performance in many situations due to their hand-designed features. In this paper, we propose an effective convolutional neural network (CNN) based approach for underwater image restoration, which consists of a transmission estimation network (T-network) and a global ambient light estimation network (A-network). By learning the relationship between the underwater scenes and their corresponding blue channel transmission map and global ambient light respectively, we can recover and enhance the underwater images with an underwater optical imaging model. In the T-network, we use cross-layer connections and multi-scale estimation to prevent halo artifacts and to preserve edge features. Moreover, we develop a new underwater image synthesis method for training, which can simulate underwater images captured in various underwater environments. Experimental results on synthetic and real images demonstrate that our restored underwater images exhibit more natural color correction and better visibility improvement than those of state-of-the-art methods.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/hu18a.html
http://proceedings.mlr.press/v95/hu18a.htmlPreconditioned Conjugate Gradient Methods in Truncated Newton Frameworks for Large-scale Linear ClassificationThe truncated Newton method is one of the most effective optimization methods for large-scale linear classification. The main computational task at each Newton iteration is to approximately solve a quadratic sub-problem by an iterative procedure such as the conjugate gradient (CG) method. It is known that CG has slow convergence if the sub-problem is ill-conditioned. Preconditioned CG (PCG) methods have been used to improve the convergence of the CG method, but it is difficult to find a preconditioner that performs well in most situations. Further, because Hessian-free optimization techniques are incorporated for handling large data, many existing preconditioners are not directly applicable. In this work, we study in detail some preconditioners that have been considered in past works for linear classification. We show that these preconditioners may not help to improve the training speed in some cases. After some investigation, we propose simple and effective techniques to make the PCG method more robust in a truncated Newton framework. The idea is to avoid the situation in which a preconditioner leads to a much worse condition number than when it is not applied. We provide theoretical justification. Through carefully designed experiments, we demonstrate that our method can effectively reduce the training time for large-scale problems.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/hsia18a.html
http://proceedings.mlr.press/v95/hsia18a.htmlDeep Embedded Clustering with Data AugmentationDeep Embedded Clustering (DEC) surpasses traditional clustering algorithms by jointly performing feature learning and cluster assignment. Although a lot of variants have emerged, they all ignore a crucial ingredient, <em>data augmentation</em>, which has been widely employed in supervised deep learning models to improve the generalization. To fill this gap, in this paper, we propose the framework of Deep Embedded Clustering with Data Augmentation (DEC-DA). Specifically, we first train an autoencoder with the augmented data to construct the initial feature space. Then we constrain the embedded features with a clustering loss to further learn clustering-oriented features. The clustering loss is composed of the target (pseudo label) and the actual output of the feature learning model, where the target is computed by using clean (non-augmented) data, and the output by augmented data. This is analogous to supervised training with data augmentation and expected to facilitate unsupervised clustering too. Finally, we instantiate five DEC-DA based algorithms. Extensive experiments validate that incorporating data augmentation can improve the clustering performance by a large margin. Our DEC-DA algorithms become the new state of the art on various datasets.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/guo18b.html
http://proceedings.mlr.press/v95/guo18b.htmlMultidimensional Time Series Anomaly Detection: A GRU-based Gaussian Mixture Variational Autoencoder ApproachUnsupervised anomaly detection on multidimensional time series data is a very important problem due to its wide applications in many systems such as cyber-physical systems, the Internet of Things. Some existing works use traditional variational autoencoder (VAE) for anomaly detection. They generally assume a single-modal Gaussian distribution as prior in the data generative procedure. However, because of the intrinsic multimodality in time series data, previous works cannot effectively learn the complex data distribution, and hence cannot make accurate detections. To tackle this challenge, in this paper, we propose a GRU-based Gaussian Mixture VAE system for anomaly detection, called GGM-VAE. In particular, Gated Recurrent Unit (GRU) cells are employed to discover the correlations among time sequences. Then we use Gaussian Mixture priors in the latent space to characterize multimodal data. The proposed detector reports an anomaly when the reconstruction probability is below a certain threshold. We conduct extensive simulations on real world datasets and find that our proposed scheme outperforms the state-of-the-art anomaly detection schemes and achieves up to 5.7% and 7.2% improvements in accuracy and F1 score, respectively, compared with existing methods.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/guo18a.html
http://proceedings.mlr.press/v95/guo18a.htmlA Joint Selective Mechanism for Abstractive Sentence SummarizationThe sequence-to-sequence (Seq2Seq) learning framework has been widely used in many natural language processing (NLP) tasks, including abstractive summarization and machine translation (MT). However, abstractive summarization generates the output in a lossy manner, in comparison with MT, which is almost loss-less. We model this by introducing a joint selective mechanism: (i) A selective gate is added after the encoding phase of the Seq2Seq learning framework, which learns to tailor the original input information and generates a selected input representation. (ii) A selection loss function is also added to help our selective gate function well, which is computed by looking at the input and the output jointly. Experimental results show that our proposed model outperforms most of the baseline models and is comparable to the state-of-the-art model in automatic evaluations.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/fu18a.html
http://proceedings.mlr.press/v95/fu18a.htmlFast Randomized PCA for Sparse DataPrincipal component analysis (PCA) is widely used for dimension reduction and embedding of real data in social network analysis, information retrieval, and natural language processing. In this work we propose a fast randomized PCA algorithm for processing large sparse data. The algorithm has similar accuracy to the basic randomized SVD (rPCA) algorithm (Halko et al., 2011), but is largely optimized for sparse data. It also has good flexibility to trade off runtime against accuracy for practical usage. Experiments on real data show that the proposed algorithm is up to 9.1X faster than the basic rPCA algorithm without accuracy loss, and is up to 20X faster than the \texttt{svds} in Matlab with little error. The algorithm computes the first 100 principal components of a large information retrieval dataset with 12,869,521 persons and 323,899 keywords in less than 400 seconds on a 24-core machine, while all conventional methods fail due to the out-of-memory issue.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/feng18a.html
http://proceedings.mlr.press/v95/feng18a.htmlDeep Fully-Connected Part-Based Models for Human Pose EstimationWe propose a 2D multi-level appearance representation of the human body in RGB images, spatially modelled using a fully-connected graphical model. The appearance model is based on a CNN body part detector, which uses shared features in a cascade architecture to simultaneously detect body parts with different levels of granularity. We use a fully-connected Conditional Random Field (CRF) as our spatial model, over which approximate inference is efficiently performed using the Mean-Field algorithm, implemented as a Recurrent Neural Network (RNN). The stronger visual support from body parts with different levels of granularity, along with the fully-connected pairwise spatial relations, which have their weights learnt by the model, improve the performance of the bottom-up part detector. We adopt an end-to-end training strategy to leverage the potential of both our appearance and spatial models, and achieve competitive results on the MPII and LSP datasets.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/de-bem18a.html
http://proceedings.mlr.press/v95/de-bem18a.htmlFast Dynamic Convolutional Neural Networks for Visual TrackingMost of the existing tracking methods based on CNNs (convolutional neural networks) are too slow for real-time application despite their excellent tracking precision compared with the traditional ones. In this paper, a fast dynamic visual tracking algorithm combining the CNN-based MDNet (Multi-Domain Network) and RoIAlign was developed. The major problem of MDNet also lies in its time efficiency. Considering that the computational complexity of MDNet is mainly caused by the large amount of convolution operations and fine-tuning of the network during tracking, a RoIPool layer which could conduct the convolution over the whole image instead of each RoI is added to accelerate the convolution, and a new strategy of fine-tuning the fully-connected layers is used to accelerate the update. With RoIPool employed, the computation speed has been increased but the tracking precision has dropped simultaneously. RoIPool could lose some positioning precision because it cannot handle locations represented by floating-point numbers. So RoIAlign, which can process floating-point locations by bilinear interpolation, has been added to the network instead of RoIPool. The results show that the target localization precision has been improved with hardly any increase in the computational cost. These strategies accelerate the processing and make it 7 $\times$ faster than MDNet with very low impact on precision, and it can run at around 7 fps. The proposed algorithm has been evaluated on two benchmarks: OTB100 and VOT2016, on which high precision and speed have been obtained. The influence of the network structure and training data is also discussed with experiments. The source code is publicly available on GitHub: https://github.com/ZhiyanCui/fdn-releaseSun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/cui18a.html
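The abstract above contrasts RoIPool, which cannot handle fractional locations, with RoIAlign, which samples them by bilinear interpolation. The following is a minimal sketch of that difference on a single sampling point, not the authors' implementation; the function names `bilinear_sample` and `quantized_sample` and the list-of-rows feature map are assumptions made for illustration.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    # Clamp the location so the four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feature, y, x):
    """Truncate the location to integer indices, as a quantizing pooling step would."""
    return feature[int(y)][int(x)]


feature_map = [[0.0, 1.0],
               [2.0, 3.0]]
interpolated = bilinear_sample(feature_map, 0.5, 0.5)
truncated = quantized_sample(feature_map, 0.5, 0.5)
```

In this toy case the interpolated value blends all four neighbours, while the truncated lookup snaps to the top-left cell, which is the kind of positional error the abstract attributes to RoIPool.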
http://proceedings.mlr.press/v95/cui18a.htmlAdversarial TableQA: Attention Supervision for Question Answering on TablesThe task of answering a question given a text passage has shown great progress in model performance thanks to community efforts in building useful datasets. Recently, there have been doubts whether such rapid progress has been based on truly understanding language. The same question has not been asked in the table question answering (TableQA) task, where we are tasked to answer a query given a table. We show that existing efforts, which use “answers” for both evaluation and supervision for TableQA, show deteriorating performance in adversarial settings of perturbations that do not affect the answer. This insight naturally motivates the development of new models that understand the question and the table more precisely. For this goal, we propose \textsc{Neural Operator (NeOp)}, a multi-layer sequential network with attention supervision to answer the query given a table. \textsc{NeOp} uses multiple Selective Recurrent Units (SelRUs) to further help the interpretability of the answers of the model. Experiments show that the use of operand information to train the model significantly improves the performance and interpretability of TableQA models. \textsc{NeOp} outperforms all the previous models by a big margin.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/cho18a.html
http://proceedings.mlr.press/v95/cho18a.htmlAn Empirical Evaluation of Sketched SVD and its Application to Leverage Score OrderingThe power of randomized algorithms in numerical methods has led to fast solutions which use the Singular Value Decomposition (SVD) as a core routine. However, given the large size of modern data and the modest runtime of SVD, most practical algorithms would require some form of approximation, such as sketching, when running SVD. While these approximation methods satisfy many theoretical guarantees, we provide the first algorithmic implementations for sketch-and-solve SVD problems on real-world, large-scale datasets. We provide a comprehensive empirical evaluation of these algorithms and provide guidelines on how to ensure accurate deployment to real-world data. As an application of sketched SVD, we present Sketched Leverage Score Ordering, a technique for determining the ordering of data in the training of neural networks. Our technique is based on the distributed computation of leverage scores using random projections. These computed leverage scores provide a flexible and efficient method to determine the optimal ordering of training data without manual intervention or annotations. We present empirical results on an extensive set of experiments across image classification, language sentiment analysis, and multi-modal sentiment analysis. Our method is faster compared to standard randomized projection algorithms and shows improvements in convergence and results.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/chin18a.html
http://proceedings.mlr.press/v95/chin18a.htmlTVT: Two-View Transformer Network for Video CaptioningVideo captioning is the task of automatically generating a natural text description of a given video. There are two main challenges in video captioning in the context of an encoder-decoder framework: 1) How to model the sequential information; 2) How to combine the modalities including video and text. For challenge 1), recurrent neural network (RNN) based methods are currently the most common approaches for learning temporal representations of videos, but they suffer from a high computational cost. For challenge 2), the features of different modalities are often roughly concatenated together without insightful discussion. In this paper, we introduce a novel video captioning framework, i.e., Two-View Transformer (TVT). TVT comprises a Transformer network backbone for sequential representation and two types of fusion blocks in decoder layers for combining different modalities effectively. Empirical study shows that our TVT model outperforms the state-of-the-art methods on the MSVD dataset and achieves a competitive performance on the MSR-VTT dataset under four common metrics.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/chen18b.html
http://proceedings.mlr.press/v95/chen18b.htmlSecureNets: Secure Inference of Deep Neural Networks on an Untrusted CloudInference using deep neural networks may be outsourced to the cloud due to its high computational cost, which, however, raises security concerns. Particularly, the data involved in deep neural networks can be highly sensitive, such as in medical, financial, and commercial applications, and hence should be kept private. Besides, the deep neural network models owned by research institutions or commercial companies are their valuable intellectual properties and can contain proprietary information, which should be protected as well. Moreover, an untrusted cloud service provider may return inaccurate or even erroneous computing results. To address the above issues, we propose a secure outsourcing framework for deep neural network inference called SecureNets, which can preserve both a user’s data privacy and his/her neural network model privacy, and also verify the computation results returned by the cloud. Specifically, we employ a secure matrix transformation scheme in SecureNets to avoid privacy leakage of the data and the model. Meanwhile, we propose a verification method that can efficiently verify the correctness of cloud computing results. Our simulation results on four- and five-layer deep neural networks demonstrate that SecureNets can reduce the processing runtime by up to 64%. Compared with CryptoNets, one of the previous schemes, SecureNets can increase the throughput by 104.45% while reducing the data transmission size by 69.78% per instance.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/chen18a.html
http://proceedings.mlr.press/v95/chen18a.htmlStructured Gaussian Processes with Twin Multiple Kernel LearningVanilla Gaussian processes (GPs) have prohibitive computational needs for very large data sets. To overcome this difficulty, special structures in the covariance matrix, if they exist, should be exploited using decomposition methods such as the Kronecker product. In this paper, we integrated the Kronecker decomposition approach into a multiple kernel learning (MKL) framework for GP regression. We first formulated a regression algorithm with the Kronecker decomposition of structured kernels for spatiotemporal modeling to learn the contribution of spatial and temporal features as well as to learn a model for out-of-sample prediction. We then evaluated the performance of our proposed computational framework, namely, structured GPs with twin MKL, on two different real data sets to show its efficiency and effectiveness. MKL helped us extract the relative importance of input features by assigning weights to kernels calculated on different subsets of temporal and spatial features.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/ak18a.html
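The abstract above relies on Kronecker-structured covariance matrices to reduce cost. One standard reason such structure helps is that a product with A ⊗ B can be computed from the small factors without ever forming the large matrix. The sketch below illustrates that general identity only; it is not the paper's algorithm, and the helper names (`kron`, `kron_matvec`) and the toy "spatial" and "temporal" matrices are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product on lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def kron(a, b):
    """Explicit Kronecker product, used here only as a reference for checking."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def kron_matvec(a, b, x):
    """Compute (a ⊗ b) x without forming a ⊗ b.

    With x reshaped row-major into a matrix X of shape (cols of a, cols of b),
    the result is the row-major flattening of a X b^T.
    """
    m = len(b[0])
    x_mat = [x[j * m:(j + 1) * m] for j in range(len(a[0]))]
    y_mat = matmul(matmul(a, x_mat), transpose(b))
    return [v for row in y_mat for v in row]


spatial = [[1.0, 2.0], [3.0, 4.0]]    # hypothetical small factor
temporal = [[0.0, 1.0], [1.0, 0.0]]   # hypothetical small factor
vec = [1.0, 2.0, 3.0, 4.0]

fast = kron_matvec(spatial, temporal, vec)
dense = [row[0] for row in matmul(kron(spatial, temporal), [[v] for v in vec])]
```

Both routes give the same vector; the structured route only ever multiplies by the small factors, which is where the savings come from when the factors are large.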
http://proceedings.mlr.press/v95/ak18a.htmlProfitable BanditsOriginally motivated by default risk management applications, this paper investigates a novel problem, referred to as the \emph{profitable bandit problem} here. At each step, an agent chooses a subset of the $K\geq 1$ possible actions. For each action chosen, she then respectively pays and receives the sum of a random number of costs and rewards. Her objective is to maximize her cumulative profit. For this purpose, we adapt and study three well-known strategies that were proved to be most efficient in other settings: \textsc{kl-UCB}, \textsc{Bayes-UCB} and \textsc{Thompson Sampling}. For each of them, we prove a finite time regret bound which, together with a lower bound we obtain as well, establishes asymptotic optimality in some cases. Our goal is also to \emph{compare} these three strategies from both a theoretical and an empirical perspective. We give simple, self-contained proofs that emphasize their similarities, as well as their differences. While both Bayesian strategies are automatically adapted to the geometry of information, the numerical experiments carried out show a slight advantage for \textsc{Thompson Sampling} in practice.Sun, 04 Nov 2018 00:00:00 +0000
http://proceedings.mlr.press/v95/achab18a.html
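For readers unfamiliar with Thompson Sampling, one of the three strategies named in the abstract above, here is a minimal sketch of its classical form for a Bernoulli bandit where a single arm is pulled per round. This is an assumption made for illustration: the profitable bandit setting described above (subsets of actions, random costs and rewards) differs, and the function name `thompson_bernoulli` and the arm probabilities are hypothetical.

```python
import random


def thompson_bernoulli(success_probs, rounds, seed=0):
    """Classical Thompson Sampling with Beta(1, 1) priors on Bernoulli arms.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    wins = [0] * k
    losses = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one sample from each arm's Beta posterior and play the best draw.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < success_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8], rounds=2000)
```

With well-separated arms like these, the posterior of the best arm concentrates quickly, so most pulls should go to the last arm.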
http://proceedings.mlr.press/v95/achab18a.html