Canonical correlation analysis (CCA) has been the main workhorse for multi-view feature learning, where we have access to multiple ''views'' of data at training time while only one primary view is available at test time. The idea of CCA is to project the views to a common space such that the projections of different views are maximally correlated.
In the first part of the talk, we compare different nonlinear extensions of CCA, and find that the deep neural network extension of CCA, termed deep CCA (DCCA), has consistently good performance while being computationally efficient for large datasets. We further compare DCCA with deep autoencoder-based approaches, as well as new variants. We find an advantage for correlation-based representation learning.
In the second part of the talk, we study the stochastic optimization of canonical correlation analysis, whose objective is nonconvex and does not decouple over training samples. Although several stochastic optimization algorithms have been previously proposed to solve this problem, no global convergence guarantee was provided by any of them. Based on the alternating least squares formulation of CCA, we propose a globally convergent stochastic algorithm, which solves the resulting least squares problems approximately to sufficient accuracy with state-of-the-art stochastic gradient methods. We provide the overall time complexity of our algorithm which improves upon that of previous work.
This talk summarizes primarily my postdoc research at TTI-Chicago, and I will give pointers to more recent development. The talk includes joint work with Raman Arora (JHU), Jeff Bilmes (UW), Jialei Wang (U Chicago), Dan Garber (Technion), Nathan Srebro (TTIC), and Karen Livescu (TTIC).