To make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. A key challenge to this end is bridging the gap between low-level perceptual inputs and semantically useful inferences. A natural way to approach this challenge is to introduce a set of intermediate characterizations that channel low-level information so that it can be used to draw useful high-level inferences. In this talk, I will address some of the challenges associated with one such set of intermediate characterizations, comprising (i) key-objects present in an environment, (ii) interactions among these key-objects that define various actions, and (iii) sequences of different actions that compose various activities.
I will begin by focusing on the problem of localizing and tracking key-objects in an environment, especially when they are observed using multiple cameras. I will present a method that models the fusion of information from multiple cameras as the problem of finding cycles in complete K-partite graphs, and will summarize a class of greedy algorithms that search for these cycles efficiently. Using sports visualization as a motivating application, I will present results of our work on close to 300,000 frames of real soccer footage captured under a diverse set of playing conditions.
Next, I will talk about the problem of accurately detecting the different actions performed in an environment. Exploiting the perceptual similarity that naturally exists among multiple actions, I will present a method of adaptively sharing information among multiple actions in order to simultaneously learn their discriminative models. I will present results of our learning framework on a set of 10 different actions performed in real soccer games.
Finally, I will focus on the problem of learning the structure of activities performed in everyday environments. In particular, I will talk about n-gram representations, which attempt to capture the global structure of activities using their local action statistics. I will discuss how such a data-driven approach to activity modeling can help discover and characterize human activities, and learn the typical behaviors that are crucial for detecting irregular occurrences in an environment.
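
As a rough illustration of the n-gram idea, and not the specific method presented in the talk, an activity can be treated as a sequence of action labels and summarized by the counts of its contiguous n-grams. The sketch below assumes that representation; the action labels, the function names, and the overlap-based similarity measure are all hypothetical choices made only for this example.

```python
from collections import Counter


def action_ngrams(sequence, n):
    """Count the contiguous n-grams in one activity.

    `sequence` is a list of action labels (hypothetical representation).
    Sequences shorter than n yield an empty Counter.
    """
    return Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )


def ngram_similarity(a, b, n=2):
    """Illustrative similarity between two activities.

    Uses the ratio of shared n-gram counts (element-wise minimum) to
    combined n-gram counts (element-wise maximum). This measure is an
    assumption for the example, not one taken from the talk.
    """
    counts_a, counts_b = action_ngrams(a, n), action_ngrams(b, n)
    shared = sum((counts_a & counts_b).values())
    combined = sum((counts_a | counts_b).values())
    return shared / combined if combined else 0.0


# Two made-up kitchen activities that differ by one action.
typical = ["enter", "wash", "cut", "cook", "exit"]
variant = ["enter", "wash", "cut", "exit"]

bigrams = action_ngrams(typical, 2)
score = ngram_similarity(typical, variant, n=2)
```

Under these assumptions, an activity whose n-gram counts overlap little with those of typical activities would be a candidate irregular occurrence, although the threshold for "little" would have to be chosen from data.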