Category: Uncategorized

7 AITRICS Papers Accepted to NeurIPS 2018

November 22, 2018

Seven research papers co-authored by AITRICS researchers have been accepted to NeurIPS 2018, the 32nd Conference on Neural Information Processing Systems, and will be presented at the conference from December 4 to 7.

The “Neural Information Processing Systems” conference, or “NIPS”, is a well-known machine learning conference that recently adopted “NeurIPS” as an alternative acronym.

This year, 1,011 papers were accepted out of 4,856 submissions, an acceptance rate of 20.8%. AITRICS is among the top corporate research institutions with the most accepted papers, along with DeepMind, Element AI, and Amazon.

Listed below are the AITRICS papers accepted to NeurIPS 2018.

  1. Uncertainty-Aware Attention for Reliable Interpretation and Prediction
  2. Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding
  3. DropMax: Adaptive Variational Softmax
  4. Stacked Semantic-Guided Attention Model for Fine-Grained Zero-Shot Learning
  5. Stochastic Chebyshev Gradient Descent for Spectral Optimization
  6. A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
  7. Learning to Specialize with Knowledge Distillation for Visual Question Answering

A key challenge with prediction models is explaining why the algorithm arrived at a certain prediction or suggestion. Attention mechanisms are the most common tool for interpreting neural network-based systems: they let the network focus on the relevant parts of the input when producing a prediction, and the resulting attention weights indicate how strongly each feature is associated with that prediction.

However, with conventional attention mechanisms it is difficult to assess how reliably each feature contributes to the final result. On noisy data that cannot be cleanly matched to the prediction target, such as the electronic health records used for patient risk prediction, overconfident and inaccurate attention can lead to incorrect predictions with potentially severe consequences.

To tackle this limitation, AITRICS proposes a novel method that captures input-level uncertainty, so that the model knows when it is safe to make a prediction and when it is not. The model's accurately calibrated uncertainty, together with attention that aligns well with human interpretation, demonstrates that uncertainty-aware attention can provide both high reliability and interpretability in health care.

We also propose a joint learning framework for active feature acquisition and classification, which can minimize the number of unnecessary examinations and reduce overall medical expenses.

Doctors often make an initial diagnosis based on a few symptoms that patients report. They then conduct further examinations, such as blood tests, urinalysis, and electrocardiograms, to narrow down the set of diseases the patient might have, until they are confident enough to make a final diagnosis.

However, ordering every available test is often inappropriate. It is expensive, and waiting for all the results may keep patients from receiving proper treatment at the right time. Furthermore, collecting irrelevant features may only add noise to the data and make the prediction unstable.

In our paper, we propose a framework that sequentially collects a subset of features to achieve optimal prediction performance in the most cost-effective way.

We evaluated both approaches on electronic health record (EHR) datasets, on which they outperformed all baselines in terms of prediction performance, interpretation, and feature acquisition cost.


NeurIPS 2018 Workshop on Metalearning

TAEML: Task-Adaptive Ensemble of Meta-Learners

Minseop Park, Saehoon Kim, Jungtaek Kim, Yanbin Liu, Seungjin Choi
Most meta-learning methods assume that the tasks in the meta-training phase are sampled from a single dataset. Thus, when a new task is drawn from another dataset, their performance degrades. To alleviate this effect, we introduce a task-adaptive ensemble network that aggregates meta-learners by putting more weight on the learners that are expected to perform well on the given task. Experiments demonstrate that our task-adaptive ensemble significantly outperforms previous meta-learners and their uniform averaging.
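The aggregation step can be illustrated with a minimal sketch. It assumes each meta-learner outputs a class-probability vector for a query and that some scoring network (hypothetical here, passed in as plain numbers via `task_scores`) rates how well each learner suits the current task; a softmax over those scores gives the ensemble weights. This is not the authors' network, only the weighted-averaging idea it relies on.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def task_adaptive_ensemble(learner_probs, task_scores):
    """Combine per-learner class probabilities with task-dependent weights.

    learner_probs: list of L probability vectors over the same C classes.
    task_scores:   list of L floats; a higher score means the learner is
                   expected to perform better on this task (a hypothetical
                   output of a task-scoring network).
    """
    weights = softmax(task_scores)
    num_classes = len(learner_probs[0])
    combined = [0.0] * num_classes
    for w, probs in zip(weights, learner_probs):
        for c in range(num_classes):
            combined[c] += w * probs[c]
    return combined

def uniform_average(learner_probs):
    """Baseline: equal scores give every learner the same weight."""
    return task_adaptive_ensemble(learner_probs, [0.0] * len(learner_probs))
```

With `learner_probs = [[0.9, 0.1], [0.2, 0.8]]`, the uniform average gives `[0.55, 0.45]`, while scores `[3.0, 0.0]` shift most of the weight to the first learner.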


Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout

Juho Lee, Saehoon Kim, Jaehong Yoon, Hae Beom Lee, Eunho Yang, Sung Ju Hwang
While variational dropout approaches have been shown to be effective for network sparsification, they are still suboptimal in the sense that they set the dropout rate for each neuron without consideration of the input data. With such input-independent dropout, each neuron evolves to be generic across inputs, which makes it difficult to sparsify networks without accuracy loss. To overcome this limitation, we propose adaptive variational dropout whose probabilities are drawn from a sparsity-inducing beta-Bernoulli prior. It allows each neuron to evolve to be either generic or specific to certain inputs, or to be dropped altogether. Such input-adaptive, sparsity-inducing dropout allows the resulting network to tolerate a larger degree of sparsity without losing its expressive power, by removing redundancies among features. We validate our dependent variational beta-Bernoulli dropout on multiple public datasets, on which it obtains significantly more compact networks than baseline methods, with consistent accuracy improvements over the base networks.
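To show why a beta-Bernoulli prior induces sparsity, here is a minimal sketch of drawing a dropout mask from a finite beta-Bernoulli prior: each neuron k gets a keep probability pi_k ~ Beta(alpha / K, 1) and a mask bit z_k ~ Bernoulli(pi_k). It covers only the input-independent prior; the paper's dependent variational version, which makes the keep probabilities a function of the input and learns them with variational inference, is not implemented here, and the parameter values are illustrative assumptions.

```python
import random

def sample_beta_bernoulli_mask(num_neurons, alpha, rng):
    """Draw a binary keep-mask from a finite beta-Bernoulli prior.

    pi_k ~ Beta(alpha / K, 1) and z_k ~ Bernoulli(pi_k). A small alpha / K
    pushes most pi_k toward zero, so most neurons are dropped.
    """
    k = num_neurons
    pis = [rng.betavariate(alpha / k, 1.0) for _ in range(k)]
    return [1 if rng.random() < p else 0 for p in pis]

def apply_mask(activations, mask):
    """Zero out the activations of dropped neurons."""
    return [a * z for a, z in zip(activations, mask)]
```

For K = 100 and alpha = 5, the expected keep probability is (0.05) / (1.05), about 0.048, so only a handful of the 100 neurons survive a typical draw.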


Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sungju Hwang, Yi Yang
The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class. Recently introduced meta-learning approaches tackle this problem by learning a generic classifier across a large number of multiclass classification tasks and generalizing the model to a new task. Yet, even with such meta-learning, the low-data problem in the novel classification task remains. In this paper, we propose the Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. Specifically, we propose to learn to propagate labels from labeled instances to unlabeled test instances by learning a graph construction module that exploits the manifold structure in the data. TPN jointly learns the parameters of both the feature embedding and the graph construction in an end-to-end manner. We validate TPN on multiple benchmark datasets, on which it largely outperforms existing few-shot learning approaches and achieves state-of-the-art results.
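The propagation step can be sketched with classic closed-form label propagation, F = (I - alpha S)^{-1} Y, where S = D^{-1/2} W D^{-1/2} is the normalized similarity graph over support and query embeddings. In this sketch the Gaussian bandwidth `sigma` is a fixed assumption, whereas TPN learns its graph construction end to end; the paper's exact graph (for example, any neighbor sparsification) may also differ.

```python
import numpy as np

def propagate_labels(features, labels_onehot, alpha=0.99, sigma=1.0):
    """Closed-form label propagation F = (I - alpha * S)^{-1} Y.

    features:      (N, D) embeddings of support and query examples.
    labels_onehot: (N, C) one-hot rows for labeled (support) examples and
                   all-zero rows for unlabeled (query) examples.
    sigma:         fixed Gaussian bandwidth (an assumption; TPN learns it).
    """
    # Pairwise squared Euclidean distances between all examples.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian similarity graph without self-loops.
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = features.shape[0]
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - alpha * s, labels_onehot)
```

Each query is then assigned the class with the largest entry in its row of F, so all query examples are classified jointly rather than one at a time.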


Mixed Effect Composite RNN-GP: A Personalized and Reliable Prediction Model for Healthcare

Ingyo Chung, Saehoon Kim, Juho Lee, Sung Ju Hwang, Eunho Yang
We present a personalized and reliable prediction model for healthcare, which can provide individually tailored medical services such as diagnosis, disease treatment, and prevention. Our proposed framework aims to make reliable predictions from time-series data, such as Electronic Health Records (EHR), by modeling two complementary components: i) a shared component that captures the global trend across diverse patients, and ii) a patient-specific component that models the idiosyncratic variability of each patient. To this end, we propose a composite model of a deep recurrent neural network (RNN), which exploits the expressive power of the RNN to estimate global trends from a large number of patients, and Gaussian Processes (GP), which probabilistically model individual time series given a relatively small number of time points. We evaluate the strength of our model on diverse and heterogeneous tasks in EHR datasets. The results show that our model significantly outperforms baselines such as the RNN, demonstrating a clear advantage over existing models when working with noisy medical data.
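The shared-plus-personal decomposition can be illustrated with a minimal sketch: a global trend function (a hypothetical stand-in for the trained RNN) predicts the population-level value, and a Gaussian process with a squared-exponential kernel is fitted to one patient's residuals to produce a personalized mean and variance. The kernel choice, noise level, and length scale are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def personalized_prediction(global_trend, t_obs, y_obs, t_new,
                            noise=0.1, length_scale=1.0):
    """Global trend plus a GP fitted to one patient's residuals.

    global_trend: callable mapping time points to the shared prediction; a
                  hypothetical placeholder for the RNN component.
    noise:        observation-noise variance added to the kernel diagonal.
    Returns the predictive mean and variance at t_new.
    """
    residual = y_obs - global_trend(t_obs)
    k_oo = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(len(t_obs))
    k_no = rbf_kernel(t_new, t_obs, length_scale)
    k_nn = rbf_kernel(t_new, t_new, length_scale)
    # Standard GP posterior applied to the patient-specific residuals.
    weights = np.linalg.solve(k_oo, residual)
    mean = global_trend(t_new) + k_no @ weights
    cov = k_nn - k_no @ np.linalg.solve(k_oo, k_no.T)
    return mean, np.diag(cov)
```

For a patient who sits consistently 1.0 above a trend of `0.5 * t`, the prediction at `t = 1.5` comes out close to 1.75 rather than the population value of 0.75, and the GP variance quantifies how much to trust that personal correction.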

NeurIPS 2018

Learning to Specialize with Knowledge Distillation for Visual Question Answering

Jonghwan Mun, Kimin Lee, Jinwoo Shin and Bohyung Han


Visual Question Answering (VQA) is a notoriously challenging problem because it involves various heterogeneous tasks, defined by questions, within a unified framework. Learning specialized models for individual types of tasks is intuitively appealing but surprisingly difficult; it is not straightforward to outperform naive independent ensemble approaches. We present a principled algorithm to learn specialized models with knowledge distillation under a multiple choice learning framework. The training examples are dynamically assigned to a subset of models for specializing their functionality. The assigned and non-assigned models are trained to predict ground-truth answers and to imitate their own base models before specialization, respectively. Our approach alleviates the problem of data deficiency, which is a critical limitation in existing frameworks on multiple choice learning, and allows each model to learn its own specialized expertise without forgetting general knowledge, thanks to knowledge distillation. Our experiments show that the proposed algorithm achieves superior performance compared to naive ensemble methods and other baselines in VQA. Our framework is also effective for more general tasks, e.g., image classification with a large number of labels, which is known to be difficult under existing multiple choice learning schemes.
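The per-example training signal described above can be sketched as follows: the example is assigned to the k models with the lowest task loss, assigned models receive the cross-entropy loss against the ground truth, and every other model receives a distillation loss, KL(base || model), that keeps it close to its own pre-specialization base model. The weighting `beta` and the choice of plain KL divergence are illustrative assumptions; the paper's exact loss formulation may differ.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the ground-truth answer."""
    return -math.log(probs[target])

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; p is the teacher (base model)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mcl_kd_loss(model_probs, base_probs, target, k=1, beta=1.0):
    """Loss for one training example under multiple choice learning with KD.

    model_probs: list of M probability vectors, one per specialized model.
    base_probs:  list of M probability vectors from each model's frozen base
                 model before specialization (the distillation teachers).
    target:      index of the ground-truth answer.
    k:           number of models the example is assigned to.
    beta:        weight of the distillation term (an assumed hyperparameter).
    """
    task_losses = [cross_entropy(p, target) for p in model_probs]
    # Assign the example to the k models that currently handle it best.
    assigned = sorted(range(len(model_probs)), key=lambda m: task_losses[m])[:k]
    total = 0.0
    for m in range(len(model_probs)):
        if m in assigned:
            total += task_losses[m]
        else:
            total += beta * kl_divergence(base_probs[m], model_probs[m])
    return total, assigned
```

Because the non-assigned models are still trained toward their base models' outputs rather than ignored, each model keeps its general knowledge while specializing on the examples it wins.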


NeurIPS 2018

Stacked Semantic-Guided Attention Model for Fine-Grained Zero-Shot Learning

Yunlong Yu, Zhong Ji, Yanwei Fu, Jichang Guo, Yanwei Pang and Zhongfei Zhang


Zero-Shot Learning (ZSL) is achieved by aligning the semantic relationships between the global image feature vector and the corresponding class semantic descriptions. However, using global features to represent fine-grained images may lead to sub-optimal results, since they neglect the discriminative differences of local regions. Moreover, different regions contain distinct discriminative information, so the important regions should contribute more to the prediction. To this end, we propose a novel stacked semantics-guided attention (S2GA) model that obtains semantically relevant features by using individual class semantic features to progressively guide the visual features in generating an attention map that weights the importance of different local regions. By feeding both the integrated visual features and the class semantic features into a multi-class classification architecture, the proposed framework can be trained end-to-end. Extensive experimental results on the CUB and NABirds datasets show that the proposed approach consistently improves both fine-grained zero-shot classification and retrieval.
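A single semantic-guided attention step can be sketched as follows: region features and the class semantic vector are projected into a shared hidden space, combined through a `tanh`, scored, and normalized with a softmax over regions, and the attention map then weights the region features. The projection matrices and their shapes are hypothetical, and the stacking rule that S2GA uses to refine the guidance across layers is not reproduced here.

```python
import numpy as np

def semantic_guided_attention(regions, class_semantic, w_v, w_s, w_p):
    """One attention step guided by a class semantic vector.

    regions:        (R, D) local region features of one image.
    class_semantic: (A,) class description embedding (e.g. attributes).
    w_v (D, H), w_s (A, H), w_p (H,): projection parameters (hypothetical).
    Returns the attention map over regions and the attended visual feature.
    """
    # Fuse each region with the semantic guidance in a shared hidden space.
    hidden = np.tanh(regions @ w_v + class_semantic @ w_s)   # (R, H)
    logits = hidden @ w_p                                    # (R,)
    logits = logits - logits.max()                           # numerical stability
    attn = np.exp(logits) / np.exp(logits).sum()             # softmax over regions
    attended = attn @ regions                                # (D,) weighted sum
    return attn, attended
```

Regions that agree with the class description receive larger weights, so the attended feature emphasizes the discriminative parts of a fine-grained image instead of averaging all regions equally.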


NeurIPS 2018

DropMax: Adaptive Variational Softmax

Haebeom Lee, Juho Lee, Saehoon Kim, Eunho Yang and Sung Ju Hwang


We propose DropMax, a stochastic version of the softmax classifier, which at each iteration drops non-target classes according to dropout probabilities adaptively decided for each instance. Specifically, we overlay binary masking variables over class output probabilities, which are input-adaptively learned via variational inference. This stochastic regularization has the effect of building an ensemble classifier out of exponentially many classifiers with different decision boundaries. Moreover, learning the dropout rates for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes. We validate our model on multiple public datasets for classification, on which it obtains significantly improved accuracy over the regular softmax classifier and other baselines. Further analysis of the learned dropout probabilities shows that our model indeed selects confusing classes more often when it performs classification.
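The masked softmax at the heart of the idea can be sketched as follows: a binary mask z is sampled per class from retain probabilities rho, the target class is always kept during training, and the probabilities are renormalized over the surviving classes only, p_k = z_k exp(o_k) / sum_j z_j exp(o_j). Here `keep_probs` is passed in directly; in DropMax it is produced input-adaptively and learned with variational inference, which this sketch does not cover.

```python
import math
import random

def dropmax_probs(logits, keep_probs, target, rng):
    """Stochastic softmax that randomly drops non-target classes.

    logits:     class scores o_k.
    keep_probs: per-class retain probabilities rho_k (input-adaptive in
                DropMax; passed in directly here).
    target:     ground-truth class, which is always retained in training.
    Returns the renormalized probabilities and the sampled mask.
    """
    mask = [1 if (k == target or rng.random() < rho) else 0
            for k, rho in enumerate(keep_probs)]
    # Subtract the largest retained logit for numerical stability.
    m = max(o for o, z in zip(logits, mask) if z)
    exps = [z * math.exp(o - m) for o, z in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps], mask
```

With all retain probabilities at 1 this reduces to the ordinary softmax; with retain probabilities at 0 only the target survives. Learned intermediate values let each instance keep mainly its most confusing competitors.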


NeurIPS 2018

Uncertainty-Aware Attention for Reliable Interpretation and Prediction

*Jay Heo, *Haebeom Lee, Saehoon Kim, Juho Lee, Kwangjun Kim, Eunho Yang, and Sung Ju Hwang
(* indicates equal contribution)


The attention mechanism is effective both in focusing deep learning models on relevant features and in interpreting them. However, attention may be unreliable, since the networks that generate it are often trained in a weakly supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, so that it generates attention for each feature with varying degrees of noise based on the given input, learning larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference and validate it on various risk prediction tasks from electronic health records, on which our model significantly outperforms existing attention models. The analysis of the learned attention shows that our model generates attention that complies with clinicians’ interpretation and provides richer interpretation via the learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with an “I don’t know” decision shows that UA yields networks with high reliability as well.
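Input-dependent attention noise can be sketched with the reparameterization trick: the input is mapped to a mean and a log-variance for each feature's attention logit, a noisy logit mu + sigma * eps is sampled, and the result is squashed into (0, 1). The linear maps `w_mu` and `w_logvar` and the sigmoid squashing are illustrative assumptions, not the paper's exact architecture or its variational objective.

```python
import numpy as np

def uncertainty_aware_attention(x, w_mu, w_logvar, rng, num_samples=1):
    """Sample feature-level attention with input-dependent noise.

    x:        (F,) input features for one instance.
    w_mu:     (F, F) maps the input to attention-logit means (hypothetical).
    w_logvar: (F, F) maps the input to attention-logit log-variances
              (hypothetical).
    Uses the reparameterization trick, logit = mu + sigma * eps with
    eps ~ N(0, I), so sampling stays differentiable in mu and sigma.
    Returns sampled attention of shape (num_samples, F) and sigma.
    """
    mu = x @ w_mu
    sigma = np.exp(0.5 * (x @ w_logvar))
    eps = rng.standard_normal((num_samples, x.shape[0]))
    logits = mu + sigma * eps
    attention = 1.0 / (1.0 + np.exp(-logits))   # sigmoid keeps weights in (0, 1)
    return attention, sigma
```

Averaging several samples gives a point estimate of each feature's attention, while a large `sigma` flags an instance where the model is unsure which features matter.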


NeurIPS 2018

Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding

Hajin Shim, Sung Ju Hwang and Eunho Yang


We consider the problem of active feature acquisition, where we sequentially select a subset of features in order to achieve the maximum prediction performance in the most cost-effective way. In this work, we formulate this active feature acquisition problem as a reinforcement learning (RL) problem and provide a novel framework for jointly learning both the RL agent and the classifier (environment). We also introduce a more systematic way of encoding subsets of features that properly handles the innate challenge of missing entries in active feature acquisition problems, using an orderless LSTM-based set encoding mechanism that readily fits into the joint learning framework. We evaluate our model on a carefully designed synthetic dataset for active feature acquisition as well as on several real datasets, such as electronic health record (EHR) datasets, on which it outperforms all baselines in terms of both prediction performance and feature acquisition cost.
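The RL formulation can be sketched as a single episode: at each step a policy either requests one more feature, paying its acquisition cost, or stops, after which the classifier predicts from the features observed so far and a terminal reward is added for a correct prediction. The `policy` and `classifier` callables and the reward values are hypothetical placeholders; in the paper both are jointly trained neural networks, and the observed subset is encoded with an LSTM-based set encoder rather than kept as a plain dictionary.

```python
def run_episode(features, costs, policy, classifier, label, correct_reward=1.0):
    """Roll out one acquisition episode and return (prediction, total reward).

    features:   full feature vector of the patient; a value is revealed only
                once it has been acquired.
    costs:      acquisition cost of each feature.
    policy:     hypothetical callable(observed: dict) -> feature index, or
                None to stop and classify.
    classifier: hypothetical callable(observed: dict) -> predicted label.
    The observed subset is a dict {index: value}, so nothing downstream
    depends on the order in which features were acquired.
    """
    observed = {}
    reward = 0.0
    while len(observed) < len(features):
        action = policy(observed)
        # Stopping, or re-requesting a known feature, ends acquisition.
        if action is None or action in observed:
            break
        observed[action] = features[action]
        reward -= costs[action]           # pay for every acquired feature
    prediction = classifier(observed)
    if prediction == label:
        reward += correct_reward          # terminal reward for a correct prediction
    return prediction, reward
```

A policy that acquires one cheap but informative feature (cost 0.05) and then stops with a correct prediction earns a return of 0.95, so the agent is pushed toward short, informative sequences of tests.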


NeurIPS 2018

Stochastic Chebyshev Gradient Descent for Spectral Optimization

Insu Han, Haim Avron and Jinwoo Shin


A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices, e.g., the log-determinant and the nuclear norm. Unfortunately, computing the gradient of a spectral function generally has cubic complexity, and as such, gradient descent methods are rather expensive for optimizing objectives involving spectral functions. Thus, one naturally turns to stochastic gradient methods in the hope that they will provide a way to reduce or altogether avoid the computation of full gradients. However, a new challenge appears here: there is no straightforward way to compute unbiased stochastic gradients for spectral functions. In this paper, we develop unbiased stochastic gradients for spectral-sums, an important subclass of spectral functions. Our unbiased stochastic gradients are based on combining randomized trace estimators with stochastic truncation of the Chebyshev expansions. A careful design of the truncation distribution allows us to offer distributions that are variance-optimal, which is crucial for fast and stable convergence of stochastic gradient methods. We further leverage our proposed stochastic gradients to devise stochastic methods for objective functions involving spectral-sums and rigorously analyze their convergence rate. The utility of our methods is demonstrated in numerical experiments.
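The two building blocks named above can be illustrated on the spectral-sum log det(A) = tr(log A): a Chebyshev expansion of `log` on an interval [a, b] containing the spectrum, evaluated on the matrix with the three-term recurrence, combined with Hutchinson's randomized trace estimator using Rademacher probe vectors. This sketch uses a fixed expansion degree and estimates only the function value; the paper's stochastic truncation of the expansion and its unbiased gradients are not implemented, and the spectral bounds `a`, `b` are assumed to be known.

```python
import numpy as np

def chebyshev_coefficients(f, degree, a, b):
    """Coefficients c_0..c_n of the Chebyshev interpolant of f on [a, b]."""
    n = degree
    k = np.arange(n + 1)
    t = np.cos(np.pi * (k + 0.5) / (n + 1))          # Chebyshev nodes in [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (b + a)             # nodes mapped to [a, b]
    fx = f(x)
    coeffs = np.array([
        2.0 / (n + 1) * np.sum(fx * np.cos(j * np.pi * (k + 0.5) / (n + 1)))
        for j in range(n + 1)
    ])
    coeffs[0] /= 2.0
    return coeffs

def stochastic_logdet(A, degree, num_probes, a, b, rng):
    """Estimate log det(A) = tr(log A) for symmetric A with spectrum in [a, b].

    Requires degree >= 1. Uses T_0(B)v = v, T_1(B)v = Bv and
    T_{j+1}(B)v = 2 B T_j(B)v - T_{j-1}(B)v with Rademacher probes v.
    """
    n = A.shape[0]
    coeffs = chebyshev_coefficients(np.log, degree, a, b)
    # Affine map so that the spectrum of B lies in [-1, 1].
    B = (2.0 * A - (b + a) * np.eye(n)) / (b - a)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)
        w_prev, w_curr = v, B @ v
        acc = coeffs[0] * (v @ w_prev) + coeffs[1] * (v @ w_curr)
        for j in range(2, degree + 1):
            w_prev, w_curr = w_curr, 2.0 * (B @ w_curr) - w_prev
            acc += coeffs[j] * (v @ w_curr)
        total += acc
    return total / num_probes
```

Only matrix-vector products with A are needed, which is what makes this approach attractive compared with the cubic cost of an eigendecomposition for large matrices.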


NeurIPS 2018

A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks

Kimin Lee, Kibok Lee, Honglak Lee and Jinwoo Shin


Detecting test samples drawn sufficiently far away from the training distribution, statistically or adversarially, is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which results in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves state-of-the-art performance for both cases in our experiments. Moreover, we found that our proposed method is more robust in extreme cases, e.g., when the training dataset has noisy labels or a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training the deep models.
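The confidence score can be sketched on a single feature layer: fit one mean per class and a covariance shared by all classes (as in Gaussian discriminant analysis), then score a test feature by the negative Mahalanobis distance to the closest class mean. This sketch uses raw feature vectors as input and omits the input preprocessing and multi-layer feature ensemble that the full method adds.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Class means and the inverse of a shared (tied) covariance matrix."""
    means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)
    return means, np.linalg.pinv(cov)

def mahalanobis_score(x, means, precision):
    """Confidence score: negative Mahalanobis distance to the closest class mean."""
    diffs = means - x                                          # (C, D)
    dists = np.einsum("cd,de,ce->c", diffs, precision, diffs)  # squared distances
    return -dists.min()
```

A sample whose score falls below a threshold chosen on validation data would be flagged as out-of-distribution or adversarial; no retraining of the classifier is required, since only the class statistics are fitted.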


ICML 2018

Deep Asymmetric Multi-task Feature Learning

Hae Beom Lee, Eunho Yang, and Sung Ju Hwang


We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to have high contribution to the feature learning while suppressing the influences of unreliable predictors for more difficult tasks. This allows the learning of less noisy representations, and enables unreliable predictors to exploit knowledge from the reliable predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer.


ICML 2018

MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning

Bo Zhao, Xinwei Sun, Yanwei Fu, Yuan Yao, Yizhou Wang


We propose the idea that features consist of three orthogonal parts, namely sparse strong signals, dense weak signals, and random noise, where both strong and weak signals contribute to fitting the data. To facilitate this novel decomposition, we propose MSplit LBI, the first method to realize feature selection and dense estimation simultaneously. We provide theoretical and simulation-based evidence that our method outperforms L1 and L2 regularization, and extensive experimental results show that it achieves state-of-the-art performance in few-shot and zero-shot learning.


ICML 2018

Bucket Renormalization for Approximate Inference

Sungsoo Ahn, Michael Chertkov, Adrian Weller, and Jinwoo Shin


Probabilistic graphical models are a key tool in machine learning applications. Computing the partition function, i.e., the normalizing constant, is a fundamental task of statistical inference, but it is generally computationally intractable, which has led to extensive study of approximation methods. Iterative variational methods are a popular and successful family of approaches. However, even state-of-the-art variational methods can return poor results or fail to converge on difficult instances. In this paper, we instead consider computing the partition function via sequential summation over variables. We develop robust approximate algorithms by combining ideas from mini-bucket elimination with tensor network and renormalization group methods from statistical physics. The resulting “convergence-free” methods show good empirical performance on both synthetic and real-world benchmark models, even for difficult instances.
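Computing the partition function by sequential summation over variables can be illustrated with exact bucket (variable) elimination on a small chain-structured model: each variable is summed out in turn, leaving a message over the next one, so the cost grows linearly with chain length instead of exponentially. This sketch shows only the exact procedure on a chain, where it is cheap; bucket renormalization targets general graphs, where exact elimination creates intractably large intermediate factors and must be approximated.

```python
import itertools

def chain_partition_function(unary, pairwise):
    """Exact Z for a chain MRF by summing out variables one at a time.

    unary:    list of n lists; unary[i][s] is the factor value for x_i = s.
    pairwise: list of n - 1 tables; pairwise[i][s][t] couples x_i = s and
              x_{i+1} = t.
    """
    message = list(unary[0])
    for i, table in enumerate(pairwise):
        nxt = unary[i + 1]
        # Sum out x_i; the result is a message over the values of x_{i+1}.
        message = [
            nxt[t] * sum(message[s] * table[s][t] for s in range(len(message)))
            for t in range(len(nxt))
        ]
    return sum(message)

def brute_force_partition_function(unary, pairwise):
    """Reference Z by enumerating every joint assignment (exponential cost)."""
    total = 0.0
    for assignment in itertools.product(*(range(len(u)) for u in unary)):
        weight = 1.0
        for i, s in enumerate(assignment):
            weight *= unary[i][s]
        for i, table in enumerate(pairwise):
            weight *= table[assignment[i]][assignment[i + 1]]
        total += weight
    return total
```

Both functions return the same Z; only the elimination version remains practical as the chain grows.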


AITRICS research papers to be presented at ICML 2018

July 9, 2018


For the second consecutive year, the AITRICS research team has had three research papers accepted at the International Conference on Machine Learning (ICML).

At this well-known conference, where some of the best machine learning experts from around the world come together to discuss new ideas, our researchers will present their work on solving critical problems that can arise when applying AI techniques to clinical data.

Listed below are the three papers that have been accepted for publication.

  1. Deep Asymmetric Multi-task Feature Learning (Deep-AMTFL)
  2. MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning
  3. Bucket Renormalization for Approximate Inference

The risks of multiple related diseases are often modeled together in a multi-task framework, where knowledge is transferred between diseases. However, a given disease may not be closely related to every other disease being predicted. In such cases, sharing information with an unrelated task can hurt prediction performance, a phenomenon known as negative transfer.

To address this challenge, our researchers proposed Deep Asymmetric Multitask Feature Learning (Deep-AMTFL), which prevents negative transfer by allowing asymmetric knowledge transfer between tasks.

Many medical diagnosis datasets are naturally imbalanced and typically have too few training instances. For example, datasets for heart failure or cerebrovascular diseases are small and imbalanced. When sufficient data is not available, developing a representative prediction algorithm becomes even more difficult because of the unequal distribution between classes. Applying a standard multi-task learning model in this setting only increases the risk of transferring misleading information between unrelated diseases.

Our Deep-AMTFL method effectively controls such negative transfer by allowing reliable predictors for easier (related) tasks to contribute more to feature sharing, while suppressing the influence of unreliable predictors for more difficult (unrelated) tasks. AITRICS continues to focus on building core AI techniques that overcome the challenges of integrating AI into the healthcare industry.

ICML is one of the world’s most prestigious machine learning conferences, highlighting research on emerging machine learning technologies. The 35th International Conference on Machine Learning (ICML 2018) will take place from July 10 to July 15, 2018, in Stockholm, Sweden.


ICML 2017

Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity

Eunho Yang and A. Lozano


Imposing sparse + group-sparse superposition structures in high-dimensional parameter estimation is known to provide flexible regularization that is more realistic for many real-world problems. For example, such a superposition enables partially-shared support sets in multi-task learning, thereby striking the right balance between parameter overlap across tasks and task specificity.
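In the multi-task setting above, the parameters can be viewed as a feature-by-task matrix Theta = S + B, where the group-sparse part B selects whole rows (features shared by all tasks) and the sparse part S adds individual task-specific entries. The sketch below computes this standard convex penalty, lam_s * ||S||_1 + lam_g * sum_j ||B_j||_2, together with the two proximal operators a solver would typically use. It is a generic illustration under those assumptions, not the paper's estimator or its non-convex variant.

```python
import math

def soft_threshold(value, lam):
    """Proximal operator of lam * |x| (elementwise sparsity)."""
    return math.copysign(max(abs(value) - lam, 0.0), value)

def group_soft_threshold(row, lam):
    """Proximal operator of lam * ||row||_2; a row is one feature across tasks."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= lam:
        return [0.0] * len(row)     # the whole feature is dropped for every task
    scale = 1.0 - lam / norm
    return [scale * v for v in row]

def dirty_penalty(sparse, group, lam_s, lam_g):
    """lam_s * ||S||_1 + lam_g * sum_j ||B_j||_2 for parameters Theta = S + B.

    sparse, group: feature-by-task matrices given as lists of rows.
    """
    l1 = sum(abs(v) for row in sparse for v in row)
    l12 = sum(math.sqrt(sum(v * v for v in row)) for row in group)
    return lam_s * l1 + lam_g * l12
```

Group thresholding zeroes or keeps an entire feature row across all tasks, while elementwise thresholding lets individual tasks keep or drop their own entries. This is how the superposition yields partially shared support sets.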


ICML 2017

SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization

Juyoung Kim, YooKoon Park, Gunhee Kim, and Sungju Hwang


We propose a novel deep neural network that is both lightweight and effectively structured for model parallelization. Our network, which we call SplitNet, automatically learns to split the network weights into either a set or a hierarchy of multiple groups that use disjoint sets of features, by learning both the class-to-group and feature-to-group assignment matrices along with the network weights.


ICML 2017

Graphical Models for Ordinal Data: A Tale of Two Approaches

Arun Sai Suggala, Eunho Yang, and P. Ravikumar


Undirected graphical models, or Markov random fields (MRFs), are widely used for modeling multivariate probability distributions. Much of the work on MRFs has focused on continuous variables and nominal variables (that is, unordered categorical variables). However, data from many real-world applications involve ordered categorical variables, also known as ordinal variables, e.g., movie ratings on Netflix, which can be ordered from 1 to 5 stars.
