OREGON STATE UNIVERSITY

Constructing Training Sets for Outlier Detection

Title: Constructing Training Sets for Outlier Detection
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Liu, L-P., and X. Z. Fern
Conference Name: Proceedings of SIAM International Conference on Data Mining
Pagination: 919-929
Date Published: 04/2012
Conference Location: Anaheim, California
Abstract

Outlier detection is often performed in an unsupervised manner because of the difficulty of obtaining enough training data. Since outliers are rare, one has to label a very large dataset to include enough outliers in the training set for classifiers to sufficiently learn the concept of outliers. Labeling a large training set is costly for most applications. However, we could label only the suspected instances identified by unsupervised methods. In this way, the number of instances to be labeled could be greatly reduced. Based on this idea, we propose CISO, an algorithm for Constructing training sets by Identifying Suspected Outliers. In this algorithm, instances in a pool are first ranked by an unsupervised outlier detection algorithm. Then, suspected instances are selected and hand-labeled, and all remaining instances receive the label of inlier. As such, all instances in the pool are labeled and used in the training set. We also propose Budgeted CISO (BCISO), with which the user can set a fixed budget for labeling. Experiments show that both algorithms achieve good performance compared to other methods when the same amount of labeling effort is used.
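
The rank, hand-label, and default-to-inlier steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which unsupervised detector is used or how the suspected instances are chosen, so the sketch assumes a mean k-nearest-neighbor distance score and a fixed number of queries (loosely in the spirit of the fixed budget mentioned for BCISO). The names `knn_scores`, `build_training_set`, `oracle`, and `budget` are hypothetical.

```python
import math

def knn_scores(points, k=3):
    """Assumed detector: score = mean distance to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def build_training_set(points, oracle, budget, k=3):
    """Label the `budget` highest-scoring instances by hand; the rest become inliers (0)."""
    scores = knn_scores(points, k)
    # Rank the pool from most to least suspicious.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    suspected = set(ranked[:budget])
    # `oracle(i)` stands in for a human annotator returning 1 (outlier) or 0 (inlier).
    labels = [oracle(i) if i in suspected else 0 for i in range(len(points))]
    return labels, sorted(suspected)

# Toy pool: five points near the origin and one distant point.
pool = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
truth = [0, 0, 0, 0, 0, 1]  # hidden ground truth, consulted only through the oracle
labels, queried = build_training_set(pool, oracle=lambda i: truth[i], budget=2, k=2)
```

Only the two queried instances cost labeling effort here, yet every instance in the pool ends up with a label and can be passed to a supervised classifier.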