OREGON STATE UNIVERSITY

Detecting insider threats in a real corporate database of computer usage activity

Title: Detecting insider threats in a real corporate database of computer usage activity
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Senator, T. E., H. G. Goldberg, A. Memory, W. T. Young, B. Rees, R. Pierce, D. Huang, M. Reardon, D. A. Bader, E. Chow, I. Essa, J. Jones, V. Bettadapura, D. H. Chau, O. Green, O. Kaya, A. Zakrzewska, E. Briscoe, R M. L. IV, R. McColl, L. Weiss, T. G. Dietterich, A. Fern, W-K. Wong, S. Das, A. Emmott, J. Irvine, J-Y. Lee, D. Koutra, C. Faloutsos, D. Corkill, L. Friedland, A. Gentzel, and D. Jensen
Conference Name: Proceedings of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '13)
Pagination: 1393
Date Published: 08/2013
Publisher: ACM Press
Conference Location: Chicago, Illinois
ISBN Number: 9781450321747
Abstract

This paper reports on the methods and results of an applied research project, conducted by a team consisting of SAIC and four universities, to develop, integrate, and evaluate new approaches for detecting the weak signals characteristic of insider threats on organizations' information systems. Our system combines structural and semantic information from a real corporate database of activity monitored on users' computers to detect independently developed red team inserts of malicious insider activities. We have developed and applied multiple anomaly detection algorithms based on suspected scenarios of malicious insider behavior, indicators of unusual activities, high-dimensional statistical patterns, temporal sequences, and normal graph evolution. Algorithms and representations for dynamic graph processing provide the ability to scale as needed for enterprise-level deployments on real-time data streams. We have also developed a visual language for specifying combinations of features, baselines, peer groups, time periods, and algorithms to detect anomalies suggestive of insider threat behavior. We defined over 100 data features in seven categories, based on approximately 5.5 million actions per day from approximately 5,500 users. On two months of real data, we have achieved area under the ROC curve (AUC) values of up to 0.979 and lift values of 65 on the top 50 user-days identified.
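The two evaluation metrics cited in the abstract can be illustrated with a small sketch. The abstract does not define them, so this assumes the standard definitions: AUC is the probability that a randomly chosen malicious (red team) user-day receives a higher anomaly score than a randomly chosen benign one, and lift at k is the precision among the k highest-scoring user-days divided by the overall rate of malicious user-days. Under that assumption, a lift of 65 on the top 50 user-days would mean the top 50 contain malicious activity 65 times more often than a random sample of 50 would. The function names `lift_at_k` and `roc_auc`, and the toy scores below, are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision among the k highest-scoring items divided by the overall positive rate.

    Assumes labels are 1 for malicious user-days and 0 for benign ones.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined without at least one positive label")
    # Rank item indices by descending anomaly score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precision_at_k = sum(labels[i] for i in ranked[:k]) / k
    return precision_at_k / base_rate


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win. This pairwise form is O(P * N) and is meant only
    for illustration, not for millions of user-days.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical anomaly scores for 10 user-days; 2 are labeled malicious.
    scores = [0.9, 0.8, 0.3, 0.7, 0.1, 0.2, 0.4, 0.05, 0.6, 0.15]
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    print(lift_at_k(scores, labels, k=2))  # 2.5
    print(roc_auc(scores, labels))         # 0.9375
```

In the toy data, half of the top 2 user-days are malicious against a base rate of 0.2, giving a lift of 2.5. Lift at k rewards concentrating true positives at the very top of the ranking, which matters when analysts can only review a small number of user-days. AUC summarizes ranking quality across all thresholds.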

DOI: 10.1145/2487575.2488213