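Software multi-version clone group mapping method based on LDA and DBSCAN
Cite this article: GE Guangshuai, LIU Dongsheng, HOU Min. Software multi-version clone group mapping method based on LDA and DBSCAN[J]. Application Research of Computers, 2017, 34(2).
Authors: GE Guangshuai, LIU Dongsheng, HOU Min
Affiliation: College of Computer and Information Engineering, Inner Mongolia Normal University (all three authors)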
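Abstract: Most clone group mapping approaches compare only adjacent versions, so when a clone group temporarily disappears in an intermediate version, mapping it across multiple versions is difficult. To address this, this paper proposes a multi-version clone group mapping method for software based on LDA and DBSCAN. First, the clone groups of all versions are preprocessed to obtain a collection of clone group documents. Second, a suitable number of topics T is selected according to the Bayesian information criterion (BIC), a topic probability model is trained, and every clone group is represented as a probability distribution vector over the T topics. Third, the JS (Jensen-Shannon) distance between clone groups is computed, and the DBSCAN algorithm gathers homologous clone groups into the same cluster. Finally, the clone groups in each cluster are sorted by version order to obtain the multi-version clone group mapping. Mapping experiments on 83 versions of 5 open-source software systems show that both recall and precision exceed 98%, providing strong support for the analysis and management of cloned code.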

Keywords: clone group mapping; software evolution; LDA; DBSCAN; clone code
Received: 2016-03-17
Revised: 2016-12-20
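The abstract states that the number of topics T is chosen with the Bayesian information criterion but does not give the likelihood or parameter count the authors used. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it uses BIC = -2 ln L + k ln n, takes n as the total token count of the clone group documents, and counts k as the free parameters of the topic-word (T(V-1)) and document-topic (D(T-1)) distributions. The log-likelihood for each candidate T would come from a fitted LDA model (for instance, scikit-learn's `LatentDirichletAllocation.score` returns an approximate log-likelihood). The helper names `lda_bic` and `select_num_topics` and the sample numbers are hypothetical.

```python
import math


def lda_bic(log_likelihood, num_topics, vocab_size, num_docs, num_tokens):
    """BIC = -2 ln L + k ln n for a fitted LDA model (hypothetical helper).

    Assumed parameter count k: T*(V-1) topic-word probabilities plus
    D*(T-1) document-topic probabilities; n is the total token count.
    """
    k = num_topics * (vocab_size - 1) + num_docs * (num_topics - 1)
    return -2.0 * log_likelihood + k * math.log(num_tokens)


def select_num_topics(log_likelihoods, vocab_size, num_docs, num_tokens):
    """Return the candidate T with the lowest BIC, plus all BIC scores.

    log_likelihoods maps each candidate T to the corpus log-likelihood
    of an LDA model trained with T topics.
    """
    scores = {
        t: lda_bic(ll, t, vocab_size, num_docs, num_tokens)
        for t, ll in log_likelihoods.items()
    }
    best_t = min(scores, key=scores.get)  # lower BIC is better
    return best_t, scores


if __name__ == "__main__":
    # Made-up log-likelihoods for illustration only, not results from the paper.
    candidates = {5: -150000.0, 10: -130000.0, 20: -125000.0}
    best, scores = select_num_topics(
        candidates, vocab_size=500, num_docs=100, num_tokens=20000
    )
    for t in sorted(scores):
        print(f"T={t}: BIC={scores[t]:.1f}")
    print("selected T =", best)
```

With these made-up values the growing penalty term k ln n outweighs the likelihood gain beyond T = 10, so T = 10 is selected; the trade-off between fit and model size is exactly what BIC is meant to balance.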

Clone group mapping method in multi-version based on the LDA and DBSCAN
GE Guangshuai, LIU Dongsheng and HOU Min. Clone group mapping method in multi-version based on the LDA and DBSCAN[J]. Application Research of Computers, 2017, 34(2).
Authors: GE Guangshuai, LIU Dongsheng and HOU Min
Affiliation: College of Computer and Information Engineering, Inner Mongolia Normal University (all three authors)
Abstract: Existing studies on clone group mapping are mostly based on comparing adjacent versions. When a clone group temporarily disappears in an intermediate version, it is difficult to map it across multiple versions. This paper proposes a clone group mapping method based on LDA and DBSCAN. First, the clone groups of all versions are preprocessed to obtain a collection of clone documents. Second, a suitable number of topics T is selected based on the Bayesian information criterion, a topic probability model is trained, and every clone group is described as a probability distribution vector over the T topics. Third, the JS distance between clone groups is computed, and the DBSCAN algorithm is used to place homologous clone groups into the same cluster. Finally, the clone groups in each cluster are sorted by version order, yielding the multi-version clone mapping results. A mapping experiment was conducted on 83 versions of 5 open-source software systems. The results show that both recall and precision exceed 98%, which provides strong support for the analysis and management of cloned code.
Keywords: clone group mapping; software evolution; LDA; DBSCAN; clone code
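The third and fourth steps (JS distance, DBSCAN clustering, ordering each cluster by version) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes every clone group has already been converted into a T-dimensional topic distribution, takes "JS distance" to mean the square root of the base-2 Jensen-Shannon divergence, and runs a small hand-written DBSCAN over the precomputed distance matrix. The helper names, the `eps` and `min_pts` values, and the sample topic vectors are all hypothetical.

```python
import math
from collections import defaultdict


def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the base-2 JS divergence (assumed definition)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 since y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding error


def dbscan(dist, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]  # includes i itself
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [x for x in range(n) if dist[j][x] <= eps]
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


def map_clone_groups(groups, eps=0.2, min_pts=2):
    """groups: list of (version, group_id, topic_vector).

    Returns one list of (version, group_id) per cluster, sorted by version,
    i.e. a multi-version mapping chain. Noise points (groups with no
    homologous group in any other version) are left out.
    """
    dist = [[js_distance(a[2], b[2]) for b in groups] for a in groups]
    labels = dbscan(dist, eps, min_pts)
    clusters = defaultdict(list)
    for (version, gid, _), label in zip(groups, labels):
        if label != -1:
            clusters[label].append((version, gid))
    return [sorted(members) for _, members in sorted(clusters.items())]


if __name__ == "__main__":
    # Hypothetical 3-topic vectors. Group A is absent from version 3
    # but reappears in version 4; it should still map to one chain.
    groups = [
        (1, "A", [0.80, 0.10, 0.10]),
        (1, "B", [0.10, 0.80, 0.10]),
        (2, "A'", [0.78, 0.12, 0.10]),
        (4, "A''", [0.79, 0.10, 0.11]),
        (2, "B'", [0.10, 0.79, 0.11]),
        (3, "C", [0.10, 0.10, 0.80]),
    ]
    for chain in map_clone_groups(groups):
        print(" -> ".join(f"v{v}:{g}" for v, g in chain))
```

With these sample vectors, A, A' and A'' form one chain across versions 1, 2 and 4 despite the gap at version 3, B and B' form a second chain, and C is left unmapped as noise. Clustering all versions at once, rather than comparing neighbouring versions pairwise, is what lets a group that vanishes for a version still join its earlier and later counterparts.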