Thomas G. Dietterich
Research Activities
Research Description
I am interested in all aspects of machine learning. My primary focus is on how machine learning can provide the basis for building integrated intelligent systems and intelligent user interfaces. Hence, I am interested in problems of "learning in the wild," where a deployed system must learn without an engineer intervening to adjust parameters or change features and where the user feedback may be very noisy and indirect. Another interest is combining learning with probabilistic and logical inference, for example through probabilistic truth maintenance systems. Transfer learning--where the learner is able to transfer knowledge learned in one domain to another--is a central challenge for machine learning. Integrated learning, in which the learning system can flexibly learn from a wide range of different information sources, poses major new challenges for the field. Finally, I am interested in bringing machine learning methods to bear on the challenging problem of generic object recognition in computer vision.
Project Descriptions
Most of my research involves collaborations with many other researchers. Here is a brief list of my current projects:
TaskTracer (collaborators: Jon Herlocker, Simone Stumpf, Margaret Burnett)
TaskTracer (http://eecs.oregonstate.edu/TaskTracer/) applies machine learning and HCI techniques to make the computer desktop "task-aware." Existing desktop applications lack a coordinated view of the user's activities. Instead, each application operates independently and is unaware of the ways in which users combine multiple applications to carry out tasks. TaskTracer tracks the user's actions as he or she accesses "resources" (i.e., web pages, files, folders, email messages, and email folders). This information is automatically associated with tasks in a user-defined task hierarchy, and TaskTracer learns to recognize which task the user is working on and then uses this information to help the user in several ways. TaskTracer automatically tags incoming email with the task to which it belongs. TaskTracer makes it easy to access documents associated with the current task (in contrast to the "recent documents" menus in Windows, which are not aware of the user's current task). TaskTracer also initializes the file Open and Save dialogue boxes so that they are in the right folders. Finally, TaskTracer helps the user recover from interruptions by restoring the desktop to the state it was in when the user was last working on the relevant task.
CALO (http://www.ai.sri.com/project/CALO) (collaborators: SRI International and 25 other university research groups)
The goal of the CALO project is to develop an integrated intelligent system that learns "in the wild" from a wide range of inputs. The task of the intelligent system is to serve as a personal assistant to a computer user. Hence, CALO's visible component is a kind of all-in-one desktop application called IRIS that incorporates a web browser, email system, calendar, and other tools (as well as the traditional MS Office application suite). A user's CALO builds and maintains a model of that user's world: the projects they are working on, the action items they are responsible for, the people and organizations they work with, the files, folders, and web pages they manipulate, the meetings they attend, and so on. CALO helps the user organize and prepare information, including preparing for meetings and assembling PowerPoint presentations. CALO accompanies the user to meetings where it applies speech recognition, handwriting recognition, and computer vision to understand the meeting. It builds a transcript of the meeting and of handwritten notes and extracts summary information such as the topics discussed and the action items assigned to each person. It also acquires project plans drawn on whiteboards or digital paper, and it keeps track of which PowerPoint slides were presented and discussed during the meeting. Finally, CALO automates some kinds of complex tasks such as purchasing and scheduling meetings (including full-day visits, workshops attended by multiple people, etc.). CALO has a wide range of machine learning capabilities including email foldering, email urgency detection, extracting meeting invitations and contact information from emails, automatically populating your contacts list with information extracted from web pages, automatically ranking documents for their relevance to future meetings, and so forth.
Transfer Learning (collaborators: Alan Fern, Prasad Tadepalli)
People are able to learn things in one task or domain and then transfer that knowledge to another task or domain. For example, after learning to play a computer game (such as a real-time strategy game), a person can learn to play a similar game much faster. However, existing machine learning methods have difficulty transferring learned knowledge even from one part of a game to another, let alone to entirely different games. The goal of our transfer learning project is to develop new algorithms that are capable of transfer learning. This requires that learning take place at deeper, more fundamental levels. We are developing our methods in the real-time strategy game of Wargus and similar games. My particular focus in this project is on the problem of learning subroutine hierarchies that can transfer from one domain to another. This builds on my previous work on MAXQ hierarchical reinforcement learning.
Integrated Learning (collaborators: Prasad Tadepalli, Weng-Keen Wong, and Ron Metoyer)
Consider a case where a teacher demonstrates a complex task to a student. This single "training example" is a rich experience that includes commentary about required and optional steps, key conditions to check, places where mistakes are often made, and so on. The student can learn much more from this rich experience than current machine learning algorithms learn from a single training example. The goal of Integrated Learning is to develop learning algorithms that combine extensive and diverse sources of background knowledge with rich training experiences to learn complex task knowledge. We are studying this in two domains: air traffic flight planning and the Wargus real-time strategy game.
Insect Identification for Ecosystem Science and Environmental Monitoring (collaborators: Eric Mortensen, Bob Paasch, David Lytle, Andrew Moldenke, Linda Shapiro (UW))
Many challenges in community ecology and environmental monitoring could be addressed if we had an inexpensive ability to collect population counts of insects. Under NSF funding, we are developing methods for automated insect population counting in two applications: (a) counting stoneflies that live in freshwater stream substrates for water quality and stream health monitoring and (b) counting soil mesofauna for biodiversity studies. Our approach combines special-purpose hardware for manipulating insect specimens with imaging and pattern recognition software for photographing and classifying insects. This provides a challenging test problem for generic object recognition in computer vision. It is also part of a larger effort in the area of Ecosystem Informatics, where the goal is to build interdisciplinary collaborations between mathematics, computer science, and the ecosystem sciences.
Sequential and Structural Supervised Learning
Many emerging machine learning problems involve assigning class labels to each item in a sequence, grid, or arbitrary graph. Often the label on one item is related in some way to the labels of its nearby items. We can achieve higher prediction accuracy by learning and exploiting these correlations. This is known as "collective classification," and we are developing and applying such methods to sequential and spatial data, including predicting protein secondary structure from the primary amino acid sequence and predicting land cover classes from pixels in remote-sensed images.
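To illustrate why correlations between neighboring labels matter, here is a minimal sketch in Python. It is not the method used in this project: it uses standard Viterbi decoding over a chain as a stand-in for collective classification, and the labels ("H" for helix, "C" for coil), the per-item scores, and the transition scores are all made-up values chosen for illustration.

```python
import math

def independent_labels(emission):
    """Label each item on its own, ignoring its neighbors."""
    return [max(scores, key=scores.get) for scores in emission]

def viterbi_labels(emission, transition):
    """Label the whole sequence jointly, maximizing the sum of
    per-item log scores and adjacent-pair log scores."""
    labels = list(emission[0])
    # Best log score of any labeling of the prefix that ends in label y.
    best = {y: math.log(emission[0][y]) for y in labels}
    back = []  # one dict of back-pointers per position after the first
    for scores in emission[1:]:
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + math.log(transition[p][y]))
            new_best[y] = best[prev] + math.log(transition[prev][y]) + math.log(scores[y])
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical per-item scores: the middle item is ambiguous on its own.
emission = [
    {"H": 0.9, "C": 0.1},
    {"H": 0.45, "C": 0.55},
    {"H": 0.9, "C": 0.1},
]
# Hypothetical pair scores: neighboring items tend to share a label.
transition = {
    "H": {"H": 0.8, "C": 0.2},
    "C": {"H": 0.2, "C": 0.8},
}

print(independent_labels(emission))          # ['H', 'C', 'H']
print(viterbi_labels(emission, transition))  # ['H', 'H', 'H']
```

Labeled independently, the ambiguous middle item is assigned "C". Labeled jointly, the strong evidence for "H" on both neighbors, together with the tendency of adjacent items to share a label, changes it to "H".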
Recent Research Collaborations & Projects
- ITR: Pattern Recognition for Ecological Science and Environmental Monitoring. NSF ITR program. (co-PIs: Dave Lytle, Andrew Moldenke, Bob Paasch, and Linda Shapiro) 9/03 to 8/07. $1,730,000.
- Machine Learning for Task Recognition and Exploitation in CALO. DARPA (via subcontract from SRI International). (co-PIs Jon Herlocker, Margaret Burnett, Prasad Tadepalli, Alan Fern). 7/04 to 12/07. $2,166,658.
- Effective Bayesian Transfer Learning. DARPA (via subcontract from Berkeley). (co-PIs Alan Fern, Prasad Tadepalli) 10/05 to 9/08. $1,380,120.
- Generalized Integrated Learning Architecture (GILA). DARPA (via subcontract from Lockheed Martin ATL). (co-PIs Prasad Tadepalli, Weng-Keen Wong, Ron Metoyer) 6/06 to 5/10. $1,651,108.
- IGERT: Ecosystem Informatics. NSF program for interdisciplinary graduate stipends. (PI Julia Jones, co-PIs Mark Harmon, Ed Waymire). 10/03 to 9/08. $3,913,548.
- Summer Institute in EcoInformatics. NSF. (PI: Desiree Tullos; co-PIs: Julia Jones, Kari O'Connell, and Enrique Thomann). 6/06 to 5/10. $581,291.