Hyunjik Kim

Preprints | Publications

I'm a second-year PhD student in machine learning at the University of Oxford, supervised by Prof. Yee Whye Teh in the Machine Learning group at the Department of Statistics.

My research interests fall under the topic of scalable probabilistic inference. So far I have worked on scaling up inference for Gaussian processes (GPs): in particular, on regression models for collaborative filtering that are motivated by a scalable GP approximation, and on scaling up the compositional kernel search used by the Automatic Statistician via variational sparse GP methods. I have recently developed an interest in deep generative models, in particular latent variable models with interpretable latent variables, for example ones representing disentangled factors of variation in the data. I am also interested in gradient-based inference for generative models with discrete units.

Previously, I studied Mathematics at the University of Cambridge, where I obtained B.A. and M.Math. degrees. I spent a summer as a research intern at Microsoft Research, Cambridge, working on collaborative filtering.

Curriculum Vitae

E-mail: hkim@stats.ox.ac.uk

Recent

Public Engagement: Introducing Machine Learning to the Public

Along with friends at Oxford, I helped create a cute two-minute animation that introduces machine learning to the general public. Check it out below!

[Embedded video]
Further details can be found here.

Preprints

Tucker Gaussian Process for Regression and Collaborative Filtering

Abstract: We introduce the Tucker Gaussian Process (TGP), a model for regression that regularises a Gaussian Process (GP) towards simpler regression functions for enhanced generalisation performance. We derive it using a novel approach to scalable GP learning, and show that our model is particularly well-suited to grid-structured data and problems where the dependence on covariates is close to being separable. A prime example is collaborative filtering, for which our model provides an effective GP-based method that has a low-rank matrix factorisation at its core. We show that the TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information.

Hyunjik Kim, Xiaoyu Lu, Seth Flaxman, Yee Whye Teh
arXiv, 2016
pdf | bibtex
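
As a rough illustration of the low-rank matrix factorisation at the TGP's core, here is a minimal NumPy sketch of that classical baseline (plain alternating least squares with L2 regularisation, not the TGP itself; the function and variable names are illustrative assumptions):

```python
# Minimal sketch of the classical low-rank matrix factorisation that sits
# at the core of models like the TGP (illustration only -- NOT the TGP).
import numpy as np

def als_matrix_factorisation(R, mask, rank=5, reg=0.1, iters=20):
    """Fit R ~= U @ V.T on observed entries (mask == 1) by alternating
    ridge-regularised least squares."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    eye = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n_users):  # closed-form update for each user factor
            obs = mask[i] == 1
            U[i] = np.linalg.solve(V[obs].T @ V[obs] + eye,
                                   V[obs].T @ R[i, obs])
        for j in range(n_items):  # symmetric update for each item factor
            obs = mask[:, j] == 1
            V[j] = np.linalg.solve(U[obs].T @ U[obs] + eye,
                                   U[obs].T @ R[obs, j])
    return U, V
```

A prediction for user i and item j is then U[i] @ V[j]; the TGP can be read as regularising a GP towards this family of functions, as described in the abstract.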

Publications

Scaling up the Automatic Statistician: Scalable Structure Discovery for Regression using Gaussian Processes

Abstract: Automatic Bayesian Covariance Discovery (ABCD) in Lloyd et al. (2014) provides a framework for automating statistical modelling as well as exploratory data analysis for regression problems. However, ABCD does not scale due to its O(N^3) running time. This is undesirable not only because the average size of data sets is growing fast, but also because there is potentially more information in bigger data, implying a greater need for more expressive models that can discover sophisticated structure. We propose a scalable version of ABCD, to encompass big data within the boundaries of automated statistical modelling.

Hyunjik Kim, Yee Whye Teh
AutoML 2016, Journal of Machine Learning Research Workshop and Conference Proceedings.
Practical Bayesian Nonparametrics Workshop, NIPS 2016. Oral & Travel Award.
pdf | bibtex
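
For context on the O(N^3) bottleneck mentioned above: exact GP regression requires solving an N x N linear system. Sparse GP approximations of the kind this work builds on instead route computation through M << N inducing points. The sketch below shows the related subset-of-regressors (Nystrom) predictive mean in NumPy, which costs O(N M^2) rather than O(N^3); it only illustrates the inducing-point idea, and the RBF kernel and all names here are illustrative assumptions, not the paper's code:

```python
# Minimal sketch of the inducing-point idea behind sparse GP methods:
# subset-of-regressors (Nystrom) mean prediction in O(N M^2) instead of
# the O(N^3) of exact GP regression. Illustration only.
import numpy as np

def rbf(X, Z, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between point sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sor_predict_mean(X, y, Xstar, Z, noise=0.1):
    """Predictive mean at Xstar from training data (X, y) and M inducing
    points Z; only an M x M linear system is solved."""
    Kxz = rbf(X, Z)                       # N x M
    Kzz = rbf(Z, Z) + 1e-6 * np.eye(len(Z))
    Ksz = rbf(Xstar, Z)                   # S x M
    A = noise ** 2 * Kzz + Kxz.T @ Kxz    # M x M system instead of N x N
    return Ksz @ np.linalg.solve(A, Kxz.T @ y)
```

A full variational sparse GP (e.g. Titsias's bound) additionally optimises the inducing points and retains calibrated uncertainties; the sketch is only meant to show where the complexity reduction comes from.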