Mining extremely small data sets with application to software reuse
Authors: Yuan Jiang, Ming Li, Zhi-Hua Zhou
Affiliation: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
Abstract: A serious problem facing machine learning and data mining techniques in software engineering is the lack of sufficient data. For example, the largest data set currently available on software reuse contains only 24 examples. In this paper, a recently proposed machine learning algorithm is modified for mining extremely small data sets. The algorithm works in a twice-learning style: first, a random forest is trained on the original data set; then, virtual examples are generated from the random forest and used to train a single decision tree. In contrast to the numerous discrepancies between empirical data and expert opinions reported by previous research, our mining practice shows that the empirical data are actually consistent with expert opinions. Copyright © 2008 John Wiley & Sons, Ltd.
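The twice-learning procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the authors' implementation. It makes several assumptions that the abstract does not state: a simple Gini-based binary tree stands in for whatever tree learner the paper uses, virtual inputs are drawn uniformly within the observed range of each numeric feature, and the final tree is trained on the virtual examples only. All function names (`build_tree`, `train_forest`, `twice_learning`, `fidelity`) are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label. Labels must not be tuples (tuples mark internal nodes)."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, max_depth=5, rng=None, max_features=None, depth=0):
    """Grow a binary tree on numeric features. If rng is given, each split
    considers only a random subset of max_features features."""
    if len(set(y)) == 1 or depth >= max_depth:
        return majority(y)
    features = list(range(len(X[0])))
    if rng is not None and max_features is not None:
        features = rng.sample(features, max_features)
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the chosen features
        return majority(y)
    _, f, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  max_depth, rng, max_features, depth + 1)
    return (f, t, grow(left), grow(right))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(X, y, n_trees=25, seed=0):
    """Bagged trees with random feature subsets (a small random forest)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng=rng, max_features=m))
    return forest

def predict_forest(forest, row):
    return majority([predict_tree(t, row) for t in forest])

def twice_learning(X, y, n_virtual=200, seed=0):
    """First learning: fit a forest. Second learning: fit one tree on
    virtual examples labeled by that forest."""
    rng = random.Random(seed)
    forest = train_forest(X, y, seed=seed)
    d = len(X[0])
    lows = [min(r[f] for r in X) for f in range(d)]
    highs = [max(r[f] for r in X) for f in range(d)]
    # Assumption: virtual inputs are sampled uniformly within observed ranges.
    virtual_X = [[rng.uniform(lows[f], highs[f]) for f in range(d)]
                 for _ in range(n_virtual)]
    virtual_y = [predict_forest(forest, r) for r in virtual_X]
    tree = build_tree(virtual_X, virtual_y)
    return forest, tree, virtual_X

def fidelity(tree, forest, rows):
    """Fraction of rows on which the single tree agrees with the forest."""
    agree = sum(predict_tree(tree, r) == predict_forest(forest, r) for r in rows)
    return agree / len(rows)
```

Calling `twice_learning(X, y)` on a small numeric data set returns the forest, the single tree, and the virtual inputs; `fidelity` then gives a rough check of how closely the single, more readable tree mimics the ensemble it was distilled from.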
Keywords: data mining; machine learning; software reuse; extremely small data set; twice learning; ensemble learning; random forest