We present a novel Bayesian model of random simple graphs with power-law degree distributions. Building on the random graph model of Norros and Reittu (2006), we place inverse gamma and generalized inverse Gaussian priors on the vertex weights, and show that in the inverse gamma case the asymptotic degree distribution of the generated graphs is power-law with index greater than 2.
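As an illustration, here is a minimal sketch of the Norros-Reittu construction with inverse gamma vertex weights; the graph size and the shape/scale parameters below are hypothetical choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000
alpha, beta = 2.5, 1.0  # hypothetical inverse gamma shape/scale

# Inverse gamma weights: if X ~ Gamma(alpha, scale=1/beta), then 1/X ~ InvGamma(alpha, beta)
w = 1.0 / rng.gamma(alpha, 1.0 / beta, size=n)

# Norros-Reittu model: edge (i, j) appears with probability 1 - exp(-w_i * w_j / L),
# where L is the total weight
L = w.sum()
P = 1.0 - np.exp(-np.outer(w, w) / L)
A = rng.random((n, n)) < P
A = np.triu(A, 1)          # keep one triangle: simple graph, no self-loops
A = A | A.T                # symmetrize

deg = A.sum(axis=1)        # empirical degree sequence
```

With heavy-tailed weights, the degree sequence inherits a heavy tail, which is what the power-law analysis in the abstract formalizes.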
We propose a novel deep network architecture for lifelong learning, the Dynamically Expandable Network (DEN), which can dynamically decide its network capacity as it trains on a sequence of tasks, learning a compact, overlapping knowledge-sharing structure among tasks.
In this paper, to support the promising empirical results of circulant binary embedding (CBE), we extend the previous theoretical framework to address the optimal condition on the number of bits, showing that CBE requires the same number of bits to approximate the angle up to epsilon-distortion under mild assumptions. We also provide numerical experiments that support our theoretical results.
Yoonho Hwang, Mooyeol Baek, Saehoon Kim, Bohyung Han, Hee-Kap Ahn
We propose an effective filtering algorithm that eliminates nearest neighbor candidates using their distance lower bounds in nonlinear embedded spaces constructed by product quantized translations. Experiments on several large-scale benchmark datasets show that our framework achieves state-of-the-art performance among exact nearest neighbor search algorithms.
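The pruning logic behind lower-bound filtering can be sketched generically. The example below uses the reverse triangle inequality |‖q‖ - ‖x‖| ≤ ‖q - x‖ as the lower bound, which is an illustrative stand-in for the paper's product-quantized bounds; the search remains exact because a candidate is skipped only when its lower bound already exceeds the best distance found:

```python
import numpy as np

def exact_nn_with_filtering(q, X):
    """Exact nearest neighbor search that prunes candidates whose distance
    lower bound exceeds the best distance seen so far (illustrative bound)."""
    lb = np.abs(np.linalg.norm(X, axis=1) - np.linalg.norm(q))
    order = np.argsort(lb)            # scan in order of increasing lower bound
    best_d, best_i = np.inf, -1
    for i in order:
        if lb[i] >= best_d:           # all remaining candidates are pruned
            break
        d = np.linalg.norm(q - X[i])  # exact distance only for survivors
        if d < best_d:
            best_d, best_i = d, int(i)
    return best_i, best_d
```

Tighter lower bounds (such as those from quantized embeddings) prune more candidates and so compute fewer exact distances, without changing the returned answer.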
We propose a neural network that learns meta-features over datasets, which are then used to select initial points for Bayesian hyperparameter optimization. Specifically, we retrieve the k nearest datasets to transfer prior knowledge about good initial points, where similarity between datasets is computed from the learned meta-features. Experiments demonstrate that our learned meta-features are useful in optimizing several hyperparameters of deep residual networks for image classification.
The number of parameters in a deep neural network is usually very large, which helps with its learning capacity but also hinders its scalability and practicality due to memory/time inefficiency and overfitting. To resolve this issue, we propose a sparsity regularization method that exploits both positive and negative correlations among the features to encourage sparsity in the network while removing redundancies among the features, so as to fully utilize the capacity of the network.
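One way such a regularizer can combine feature sharing with redundancy removal is to mix a group-sparsity term (which prunes whole feature groups) with an exclusive-sparsity term (which discourages redundant features within a group). The exact form and weighting below are assumptions for illustration, not the paper's objective:

```python
import numpy as np

def combined_sparsity_penalty(W, mu=0.5):
    """Illustrative combined regularizer on a weight matrix W whose rows
    are treated as feature groups.
    - group term (l2,1): sum_g ||w_g||_2, drives entire groups to zero
    - exclusive term (l1,2): sum_g ||w_g||_1^2, penalizes redundancy
      within a group
    mu in [0, 1] trades off the two terms."""
    group_l21 = np.sum(np.linalg.norm(W, axis=1))
    exclusive_l12 = np.sum(np.sum(np.abs(W), axis=1) ** 2)
    return (1.0 - mu) * group_l21 + mu * exclusive_l12
```

In training, this penalty would be added to the task loss and minimized jointly with the network weights.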
Undirected graphical models, or Markov random fields (MRFs), are widely used for modeling multivariate probability distributions. Much of the work on MRFs has focused on continuous variables and nominal variables (that is, unordered categorical variables). However, data from many real-world applications also involve ordered categorical variables, known as ordinal variables, e.g., movie ratings on Netflix, which can be ordered from 1 to 5 stars.
Juyong Kim, Yookoon Park, Gunhee Kim and Sung Ju Hwang
We propose a novel deep neural network that is both lightweight and effectively structured for model parallelization. Our network, which we call SplitNet, automatically learns to split the network weights into either a set or a hierarchy of multiple groups that use disjoint sets of features, by learning both the class-to-group and feature-to-group assignment matrices along with the network weights.
Imposing sparse + group-sparse superposition structures in high-dimensional parameter estimation is known to provide flexible regularization that is more realistic for many real-world problems. For example, such a superposition enables partially shared support sets in multi-task learning, thereby striking the right balance between parameter overlap across tasks and task specificity.
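A minimal sketch of the superposition structure: decompose the parameter as Theta = S + B, with an l1 penalty on the sparse part S and a group-lasso penalty on the group-sparse part B, each handled by its own proximal operator. The function names and the rows-as-groups convention are illustrative, not notation from the paper:

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group(V, t):
    """Group soft-thresholding (rows as groups): proximal operator of
    t * sum_g ||V_g||_2. Shrinks each row toward zero; rows with norm
    below t are zeroed out entirely."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return V * scale

# In a proximal-gradient scheme for the "dirty" superposition
#   min_{S,B} loss(S + B) + lam_s * ||S||_1 + lam_b * sum_g ||B_g||_2,
# each iteration applies prox_l1 to the gradient step on S and
# prox_group to the gradient step on B.
```

A row of B surviving group shrinkage corresponds to a feature shared across tasks, while nonzeros in S capture task-specific deviations, which is the partially shared support the abstract describes.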