• A Bayesian random simple graph model with power-law degree distribution
    ISBA 2018
    Juho Lee, Lancelot F. James, François Caron
    We present a novel Bayesian model of random simple graphs with power-law degree distributions. Building on the random graph model of Norros and Reittu (2006), we place inverse gamma and generalized inverse Gaussian priors on the vertex weights, and show that the asymptotic degree distribution of graphs generated from our model is power-law with exponent greater than 2 in the inverse gamma case.
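    The generative rule is simple enough to sketch. Below is a minimal, illustrative Python snippet (not the paper's code): it draws inverse gamma vertex weights and links vertices i and j with probability 1 - exp(-w_i w_j / L_n), following the Norros-Reittu construction; the values of n, alpha, and beta are arbitrary assumptions.

    ```python
    # Minimal sketch of a Norros-Reittu random graph with inverse gamma vertex weights.
    # Parameter values are illustrative, not taken from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    n, alpha, beta = 2000, 2.5, 1.0

    # Inverse gamma weights: if X ~ Gamma(shape=alpha, rate=beta), then 1/X ~ InvGamma(alpha, beta).
    w = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=n)
    L = w.sum()

    # Norros-Reittu rule: connect i and j independently with probability 1 - exp(-w_i w_j / L).
    P = 1.0 - np.exp(-np.outer(w, w) / L)
    upper = np.triu(rng.random((n, n)) < P, k=1)   # simple graph: upper triangle only, no self-loops
    A = upper | upper.T                            # symmetric adjacency matrix

    degrees = A.sum(axis=0)
    print("mean degree:", degrees.mean(), "max degree:", degrees.max())
    ```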
  • Lifelong Learning with Dynamically Expandable Networks
    ICLR 2018
    Jaehong Yoon, Eunho Yang, Jeongtae Lee, Sung Ju Hwang
    We propose a novel deep network architecture for lifelong learning, which we refer to as the Dynamically Expandable Network (DEN), that dynamically decides its network capacity as it trains on a sequence of tasks, learning a compact, overlapping knowledge-sharing structure among tasks.
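    As a rough illustration only (not the authors' algorithm), the sketch below shows the capacity-expansion idea in isolation: if retraining on a new task cannot reach a target loss, the hidden layer is grown by k units. The function train_and_eval is a hypothetical placeholder, and tau and k are made-up values.

    ```python
    # Highly simplified, hypothetical sketch of capacity expansion over a task sequence.
    import numpy as np

    def train_and_eval(weights, task_id):
        """Hypothetical placeholder: 'train' on task_id and return a validation loss.
        Larger capacity artificially lowers the loss, just to drive the example."""
        rng = np.random.default_rng(task_id)
        return float(rng.uniform(0.5, 1.5) / np.sqrt(weights.shape[1]))

    hidden = 16
    weights = np.zeros((32, hidden))             # toy weight matrix: inputs x hidden units
    tau, k = 0.20, 8                             # loss threshold and expansion size (made up)

    for task_id in range(5):                     # a sequence of tasks arriving one by one
        loss = train_and_eval(weights, task_id)
        while loss > tau:                        # capacity insufficient for the new task
            weights = np.hstack([weights, np.zeros((weights.shape[0], k))])  # add k hidden units
            loss = train_and_eval(weights, task_id)
        print(f"task {task_id}: hidden units = {weights.shape[1]}, loss = {loss:.3f}")
    ```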
  • On the Optimal Bit Complexity of Circulant Binary Embedding
    AAAI 2018
    Saehoon Kim, Jungtaek Kim, and Seungjin Choi
    In this paper, to support the promising empirical results of CBE, we extend the previous theoretical framework to address the optimal condition on the number of bits, showing that, under mild assumptions, CBE attains the optimal number of bits needed to approximate angles up to epsilon-distortion. We also provide numerical experiments to support our theoretical results.
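    For context, a circulant binary embedding maps x to sign(C_r D x), where C_r is a circulant matrix and D is a random sign-flip diagonal, so the projection costs O(d log d) via the FFT. The snippet below is a generic illustration of CBE rather than the paper's code; the dimension and the Gaussian choice of r are assumptions.

    ```python
    # Generic sketch of circulant binary embedding (CBE), for illustration only.
    import numpy as np

    rng = np.random.default_rng(1)
    d = 256                                      # input dimension (and maximum number of bits)

    r = rng.standard_normal(d)                   # first column of the circulant matrix C_r
    D = rng.choice([-1.0, 1.0], size=d)          # random sign flips

    def cbe(x, num_bits=d):
        """sign(C_r D x), with the circulant product computed in O(d log d) via the FFT."""
        proj = np.fft.ifft(np.fft.fft(r) * np.fft.fft(D * x)).real
        return np.sign(proj[:num_bits])

    x, y = rng.standard_normal(d), rng.standard_normal(d)
    bx, by = cbe(x), cbe(y)
    hamming = np.mean(bx != by)                  # normalized Hamming distance roughly tracks angle/pi
    angle = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    print(f"normalized Hamming: {hamming:.3f}  vs  angle/pi: {angle / np.pi:.3f}")
    ```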
  • Product Quantized Translation for Fast Nearest Neighbor Search
    AAAI 2018
    Yoonho Hwang, Mooyeol Baek, Saehoon Kim, Bohyung Han, Hee-Kap Ahn
    We propose an effective filtering algorithm that eliminates nearest neighbor candidates using distance lower bounds in nonlinear embedded spaces constructed by product quantized translations. Experiments on several large-scale benchmark datasets show that our framework achieves state-of-the-art performance among exact nearest neighbor search algorithms.
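    The filtering idea can be shown generically. In the sketch below, candidates are visited in order of a cheap distance lower bound and pruned once the bound exceeds the best exact distance found so far; the paper's bound comes from product quantized translations, whereas here the simple norm-difference bound |‖q‖ - ‖x‖| ≤ ‖q - x‖ stands in as a placeholder.

    ```python
    # Generic sketch of exact nearest neighbor search with lower-bound pruning.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((10_000, 64))        # database vectors
    q = rng.standard_normal(64)                  # query

    norms = np.linalg.norm(X, axis=1)            # precomputed offline
    q_norm = np.linalg.norm(q)

    best_dist, best_idx, evaluated = np.inf, -1, 0
    for i in np.argsort(np.abs(norms - q_norm)): # visit candidates by increasing lower bound
        if abs(norms[i] - q_norm) >= best_dist:  # bound already too large: prune everything left
            break
        d = float(np.linalg.norm(q - X[i]))      # exact distance only for surviving candidates
        evaluated += 1
        if d < best_dist:
            best_dist, best_idx = d, i

    print(f"nearest index {best_idx}, distance {best_dist:.3f}, "
          f"exact evaluations {evaluated}/{len(X)}")
    ```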
  • Learning to Transfer Initializations for Bayesian Hyperparameter Optimization
    NIPS 2017 Workshop on Bayesian Optimization
    Jungtaek Kim, Saehoon Kim, and Seungjin Choi
    We propose a neural network that learns meta-features over datasets, which are used to select initial points for Bayesian hyperparameter optimization. Specifically, we retrieve the k nearest datasets to transfer prior knowledge about initial points, where similarity between datasets is computed from the learned meta-features. Experiments demonstrate that our learned meta-features are useful for optimizing several hyperparameters of deep residual networks for image classification.
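    The initialization-transfer step can be sketched as follows: embed every dataset with the learned meta-feature extractor, retrieve the k datasets closest to the new one, and reuse their best known hyperparameter configurations as the first Bayesian optimization evaluations. The meta-features and "best configurations" below are random placeholders rather than learned quantities.

    ```python
    # Schematic sketch of transferring BO initial points via k-nearest datasets.
    import numpy as np

    rng = np.random.default_rng(3)
    meta_feats = rng.standard_normal((50, 16))   # meta-features of 50 previously seen datasets
    best_configs = rng.uniform(size=(50, 3))     # best hyperparameters found on each of them
    new_feat = rng.standard_normal(16)           # meta-feature vector of the new dataset

    k = 3
    dists = np.linalg.norm(meta_feats - new_feat, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k most similar datasets

    initial_points = best_configs[nearest]       # hand these to the BO loop as its first evaluations
    print("transferred initial points:\n", initial_points)
    ```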
  • Combined Group and Exclusive Sparsity for Deep Neural Networks
    ICML 2017
    Jaehong Yoon and Sung Ju Hwang
    The number of parameters in a deep neural network is usually very large, which helps its learning capacity but also hinders its scalability and practicality due to memory/time inefficiency and overfitting. To resolve this issue, we propose a sparsity regularization method that exploits both positive and negative correlations among the features to enforce sparsity in the network, while at the same time removing redundancies among the features to fully utilize the capacity of the network.
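    A common way to write such a combined regularizer is a group-sparsity term (sum of group l2 norms, which prunes whole groups) plus an exclusive-sparsity term (sum of squared group l1 norms, which makes features within a group compete). The sketch below computes this combination; the mixing weight mu and the choice of columns as groups are illustrative assumptions, not necessarily the paper's exact formulation.

    ```python
    # Sketch of a combined group + exclusive sparsity regularizer (illustrative form).
    import numpy as np

    def combined_group_exclusive(W, mu=0.5):
        """W: weight matrix whose columns are treated as groups (e.g., one group per output unit)."""
        group_l2 = np.linalg.norm(W, axis=0)          # ||w_g||_2 per group: prunes whole groups
        group_l1 = np.abs(W).sum(axis=0)              # ||w_g||_1 per group
        exclusive = (group_l1 ** 2).sum()             # squared l1 per group: within-group competition
        return (1.0 - mu) * group_l2.sum() + mu * exclusive

    W = np.random.default_rng(4).standard_normal((128, 64))
    print("regularizer value:", combined_group_exclusive(W))
    ```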
  • Graphical Models for Ordinal Data: A Tale of Two Approaches
    ICML 2017
    Arun Sai Suggala, Eunho Yang, and Pradeep Ravikumar
    Undirected graphical models, or Markov random fields (MRFs), are widely used for modeling multivariate probability distributions. Much of the work on MRFs has focused on continuous variables and nominal variables (that is, unordered categorical variables). However, data from many real-world applications involve ordered categorical variables, also known as ordinal variables, e.g., movie ratings on Netflix, which can be ordered from 1 to 5 stars.
  • SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization
    ICML 2017
    Juyong Kim, Yookoon Park, Gunhee Kim and Sung Ju Hwang
    We propose a novel deep neural network that is both lightweight and effectively structured for model parallelization. Our network, which we call SplitNet, automatically learns to split the network weights into either a set or a hierarchy of multiple groups that use disjoint sets of features, by learning the class-to-group and feature-to-group assignment matrices along with the network weights.
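    Schematically, soft feature-to-group and class-to-group assignment matrices induce a block-structured mask on a layer's weights, so each class group reads mostly from its own feature group; hardening the assignments yields disjoint blocks that can be placed on different devices. The sketch below is a loose illustration under assumed shapes and a softmax parametrization, not the paper's exact formulation.

    ```python
    # Loose sketch of group assignments inducing a block-structured weight mask.
    import numpy as np

    rng = np.random.default_rng(5)
    F, C, G = 512, 100, 4                          # features, classes, groups (illustrative)

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    P = softmax(rng.standard_normal((F, G)))       # soft feature-to-group assignments
    Q = softmax(rng.standard_normal((C, G)))       # soft class-to-group assignments

    W = rng.standard_normal((F, C))                # weights of the layer being split
    W_masked = W * (P @ Q.T)                       # each class is driven mostly by its group's features

    # With hard (one-hot) assignments the mask becomes block structured, so the groups
    # use disjoint feature sets and can be evaluated in parallel.
    print("masked weights:", W_masked.shape)
    ```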
  • Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity
    ICML 2017
    Eunho Yang and Aurelie Lozano
    Imposing sparse + group-sparse superposition structures in high-dimensional parameter estimation is known to provide flexible regularization that is more realistic for many real-world problems. For example, such a superposition enables partially-shared support sets in multi-task learning, thereby striking the right balance between parameter overlap across tasks and task specificity.
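    Concretely, in the multi-task setting the coefficient matrix is modeled as a superposition W = S + B, with an elementwise l1 penalty on S capturing task-specific entries and a row-wise group penalty on B capturing features shared across all tasks. The sketch below evaluates such a penalty; the regularization weights and shapes are illustrative assumptions.

    ```python
    # Sketch of the sparse + group-sparse "dirty model" penalty for multi-task regression.
    import numpy as np

    def dirty_penalty(S, B, lam_s=0.1, lam_b=0.2):
        """S, B: (features x tasks) matrices; the model estimate is W = S + B."""
        sparse_term = np.abs(S).sum()                 # elementwise l1: task-specific entries
        group_term = np.linalg.norm(B, axis=1).sum()  # l2 norm of each feature's row across tasks
        return lam_s * sparse_term + lam_b * group_term

    rng = np.random.default_rng(6)
    S = rng.standard_normal((20, 5)) * (rng.random((20, 5)) < 0.1)   # mostly zero, scattered entries
    B = np.zeros((20, 5))
    B[:4] = rng.standard_normal((4, 5))                              # a few rows shared by all tasks
    print("penalty:", dirty_penalty(S, B), " W shape:", (S + B).shape)
    ```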