Friday, February 24, 2012 - 9:40am to 11:00am
KEC 1005

Speaker Information

Yeye He
PhD candidate
Department of Computer Science
University of Wisconsin-Madison

Abstract

In this era of big data, the tension between doing useful data analysis and preserving data privacy has grown significantly, and the problem of data privacy has become ever more important. Unfortunately, existing techniques cannot handle or do not consider a number of important data processing tasks. To address this problem, my dissertation analyzes challenges and proposes anonymization techniques in the context of three fundamental data models: relational data, set-valued data, and streaming event data. In this talk, I will focus on a new privacy problem motivated by hospital applications of the streaming model called Complex Event Processing. Despite the popularity of this event processing model, its privacy implications have so far been overlooked. I will describe the fundamental structure of the problem and discuss its theoretical properties. I will also present real-time privacy-aware event processing techniques that serve as a promising step towards a full privacy solution in a streaming environment.

Speaker Bio

Yeye He is a PhD candidate advised by Professor Jeffrey Naughton in the Department of Computer Science at the University of Wisconsin-Madison. His thesis work on preserving data privacy is motivated by diverse real-world applications, including streaming event processing, market basket analysis, and machine learning using medical records. Yeye has completed several industrial internships at Microsoft Research and Google. In addition to his dissertation work, he has worked on a wide range of projects: SEISA, a set expansion system using semi-structured Web data; Keyword++, a framework to improve keyword search over entity databases; and EntityCrawl, a deep-web crawling system optimized for entity-oriented content. Before starting his PhD work, he worked on performance tuning for data warehousing benchmarks and participated in the development of the TPC-DS benchmark as a Member of Technical Staff at Oracle Corporation.