OREGON STATE UNIVERSITY

On Confidence-Constrained Rank Recovery in Topic Models

Title: On Confidence-Constrained Rank Recovery in Topic Models
Publication Type: Journal Article
Year of Publication: 2012
Authors: Behmardi, B., and R. Raich
Journal: IEEE Transactions on Signal Processing
Volume: 60
Issue: 10
Pagination: 5146-5162
Date Published: 10/2012
ISSN: 1941-0476
Keywords: confidence constraints, low-rank matrix recovery, nuclear norm minimization, rank estimation, topic models
Abstract

Topic models have been proposed to model a collection of data such as text documents and images in which each object (e.g., a document) contains a set of instances (e.g., words). In many topic models, the dimension of the latent topic space (the number of topics) is assumed to be a deterministic unknown. The number of topics significantly affects the prediction performance and interpretability of the estimated topics. In this paper, we propose a confidence-constrained rank minimization (CRM) to recover the exact number of topics in topic models with theoretical guarantees on recovery probability and mean squared error of the estimation. We provide a computationally-efficient optimization algorithm for the problem to further the applicability of the proposed framework to large real world datasets. Numerical evaluations are used to verify our theoretical results. Additionally, to illustrate the applicability of the proposed framework to practical problems, we provide results in image classification for two real world datasets and text classification for three real world datasets.

DOI: 10.1109/TSP.2012.2208634
Short Title: IEEE Trans. Signal Process.