OREGON STATE UNIVERSITY

You are here

On provable exact low-rank recovery in topic models

TitleOn provable exact low-rank recovery in topic models
Publication TypeConference Paper
Year of Publication2011
AuthorsBehmardi, B., and R. Raich
Conference Name2011 IEEE Statistical Signal Processing Workshop (SSP)
Pagination265 - 268
Date Published06/2011
PublisherIEEE
Conference LocationNice, France
ISBN Number978-1-4577-0569-4
Keywordslow rank matrix recovery, nuclear norm minimization, topic models
Abstract

In the past few years, probabilistic topic models have been developed and applied to problems in text document classification and computer vision. Such models provide a probabilistic framework for characterizing a corpus of documents (or images) in the bag-of-words representation. Key feature of such models is that a low dimensional representation is facilitated through latent topic variables. Most inference algorithms in topic models assume a fixed number of topics and determine the number of topics empirically. In this paper, we consider the problem of identifying the number of topics in topic models. We present a rank minimization framework and provide sufficient conditions, which guarantee exact recovery of the number of topics. Moreover, we propose a heuristic convex relaxation to the rank minimization. Using simulations, we show that the proposed convex relaxation provides exact rank recovery under the sufficient conditions proposed for the rank minimization problem.

DOI10.1109/SSP.2011.5967677