On the relative value of cross-company and within-company data for defect prediction
Authors: Burak Turhan, Tim Menzies, Ayşe B. Bener, Justin Di Stefano
Affiliations: (1) Department of Computer Engineering, Bogazici University, Istanbul, Turkey; (2) Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, West Virginia, USA
Abstract: We propose a practical defect prediction approach for companies that do not track defect-related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. First, we analyze the conditions under which CC data can be used as is; these conditions turn out to be few. Then we apply principles of analogy-based learning (i.e. nearest neighbor (NN) filtering) to CC data in order to fine-tune these models for localization. We compare the performance of these models with that of defect predictors learned from within-company (WC) data. As expected, defect predictors learned from WC data outperform those learned from CC data. However, our analyses also yield defect predictors learned from NN-filtered CC data whose performance is close to, but still not better than, that of predictors learned from WC data. We therefore perform a final analysis to determine the minimum number of local defect reports needed to learn WC defect predictors. We demonstrate in this paper that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected within a few months. Hence, for companies with no local defect data, we recommend a two-phase approach that lets them start the defect prediction process immediately. In phase one, companies should use NN-filtered CC data to initiate the defect prediction process while simultaneously collecting WC (local) data. Once enough WC data has been collected (i.e. after a few months), organizations should switch to phase two and use predictors learned from WC data.
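The NN filtering step and the two-phase policy described in the abstract can be illustrated with a minimal sketch. It assumes that each module is a vector of numeric static code features, that similarity is Euclidean distance, and that each local (WC) module contributes its k nearest CC neighbors to the training set; local features are available even when local defect labels are not. The function names, the default k = 10 and the `min_wc` switch-over threshold are hypothetical illustrations, not values taken from the abstract.

```python
import math
from typing import List, Sequence, Tuple

# (static code features, defect label), label 1 = defective. Hypothetical layout.
Row = Tuple[Sequence[float], int]


def nn_filter(cc_data: List[Row], wc_features: List[Sequence[float]], k: int = 10) -> List[Row]:
    """Keep only CC rows that are among the k nearest neighbors
    (Euclidean distance) of at least one local WC module."""
    selected = set()
    for x in wc_features:
        # Rank every CC row by its distance to this local module.
        ranked = sorted(range(len(cc_data)), key=lambda i: math.dist(cc_data[i][0], x))
        selected.update(ranked[:k])
    # Union of all selected neighbors, in original CC order, without duplicates.
    return [cc_data[i] for i in sorted(selected)]


def select_training_data(
    cc_data: List[Row],
    wc_labeled: List[Row],
    wc_features: List[Sequence[float]],
    min_wc: int = 100,  # hypothetical threshold for "enough" WC data
    k: int = 10,
) -> List[Row]:
    """Two-phase policy: NN-filtered CC data until enough labeled WC data exists."""
    if len(wc_labeled) >= min_wc:
        return wc_labeled  # phase two: learn from local data
    return nn_filter(cc_data, wc_features, k)  # phase one: filtered CC data


if __name__ == "__main__":
    cc = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((10.0, 10.0), 1), ((11.0, 11.0), 0)]
    print(nn_filter(cc, [(0.2, 0.2), (10.4, 10.4)], k=1))
```

Any classifier can then be trained on the returned rows. Taking the union of per-module neighborhoods keeps the CC training set focused on the regions of feature space that the local project actually occupies, which is the intuition behind localizing CC data.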
Contact information: Justin Di Stefano

Burak Turhan received his PhD degree from the Department of Computer Engineering at Bogazici University. He recently joined NRC-Canada IIT-SEG as a Research Associate after six years as a research assistant at Bogazici University. His research interests include all aspects of software quality, with a focus on software defect prediction models. He is a member of IEEE, the IEEE Computer Society and ACM SIGSOFT.

Tim Menzies (tim@menzies.us) has been working on advanced modeling, software engineering, and AI since 1986. He received his PhD from the University of New South Wales, Sydney, Australia, and is the author of over 160 refereed papers. A former research chair for NASA, Dr. Menzies is now an associate professor at West Virginia University's Lane Department of Computer Science and Electrical Engineering. For more information, visit his web page at .

Ayşe B. Bener is an assistant professor and a full-time faculty member in the Department of Computer Engineering at Bogazici University. Her research interests are software defect prediction, process improvement and software economics. Bener has a PhD in information systems from the London School of Economics. She is a member of the IEEE, the IEEE Computer Society and the ACM.

Justin Di Stefano is currently the Software Technical Lead for Delcan, Inc. in Vienna, Virginia, specializing in transportation management and planning. He earned his Master's degree in Electrical Engineering (with a specialty area of Software Engineering) from West Virginia University in 2007. Prior to his current employment, he worked as a researcher for the WVU/NASA Space Grant program, where he helped develop a spin-off product based on research into static code metrics and error-prone code prediction. His undergraduate degrees, in Electrical Engineering and Computer Engineering, are both from West Virginia University and were earned in the fall of 2002. He has numerous publications on software error prediction, static code analysis and various machine learning algorithms.
Keywords: Defect prediction; Learning; Metrics (product metrics); Cross-company; Within-company; Nearest-neighbor filtering
This article is indexed in SpringerLink and other databases.