OREGON STATE UNIVERSITY


Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?

Publication Type: Conference Paper
Year of Publication: 2012
Authors: Negara, S., M. Vakilian, N. Chen, R. E. Johnson, and D. Dig
Pagination: 79-103
Date Published: 06/2012
Conference Location: Beijing, China
ISBN Number: 978-3-642-31057-7
Abstract

Researchers use file-based Version Control Systems (VCSs) as the primary source of code evolution data. Because developers use VCSs widely, researchers have easy access to the historical data of many projects. Although convenient, research based on VCS data is incomplete and imprecise. Moreover, questions that correlate code changes with other development activities (e.g., test runs, refactoring) cannot be answered from VCS data alone.

Our tool, CodingTracker, non-intrusively records fine-grained and diverse data during code development. CodingTracker collected data from 24 developers: 1,652 hours of development, 23,002 committed files, and 314,085 test case runs.

This allows us to answer the following questions: How much code evolution data is not stored in the VCS? How often do developers intersperse refactorings and edits in the same commit? How frequently do developers fix failing tests by changing the test itself? How many changes are committed to the VCS without being tested? What is the temporal and spatial locality of changes?
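The first question, how much code evolution data never reaches the VCS, can be illustrated with a minimal sketch. It assumes a deliberately simplified, hypothetical model in which every fine-grained edit replaces one whole line. The names `Edit`, `replay`, `commit_diff`, and `shadowed` are illustrative only and are not part of CodingTracker. Under this model, an edit counts as "shadowed" if a later edit to the same line overwrites it before the commit, so the commit diff never shows it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical fine-grained edit: set line `line` to `text`."""
    line: int
    text: str


def replay(lines: list[str], edits: list[Edit]) -> list[str]:
    """Apply edits in order to a copy of `lines` (what an edit recorder observes)."""
    result = list(lines)
    for e in edits:
        result[e.line] = e.text
    return result


def commit_diff(before: list[str], after: list[str]) -> list[int]:
    """Indices of lines that differ between two snapshots (what a VCS diff shows).

    Assumes both snapshots have the same number of lines.
    """
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def shadowed(edits: list[Edit]) -> list[Edit]:
    """Edits overwritten by a later edit to the same line before the commit."""
    last_index = {e.line: i for i, e in enumerate(edits)}
    return [e for i, e in enumerate(edits) if last_index[e.line] != i]


committed = ["x = 1", "y = 2", "print(x + y)"]
session = [
    Edit(0, "x = 10"),  # tried a new value...
    Edit(0, "x = 1"),   # ...then reverted it
    Edit(1, "y = 3"),
    Edit(1, "y = 4"),   # overwrote the previous attempt
]

final = replay(committed, session)
print(len(session))                   # 4 edits recorded
print(commit_diff(committed, final))  # [1]: only one changed line reaches the VCS
print(len(shadowed(session)))         # 2 edits overwritten before the commit
```

In this toy session, four edits are recorded, but the commit diff shows a single changed line. The revert on line 0 leaves no trace in the diff at all, even though it is not counted as shadowed.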

DOI: 10.1007/978-3-642-31057-7_5