Similar Articles
20 similar articles found.
1.
With the usage of version control systems, many bug fixes have accumulated over the years. Researchers have proposed various automatic program repair (APR) approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new fixes overlap with old fixes, have not been investigated. Intuitively, the overlap between old and new fixes determines how APR approaches can construct new fixes from old ones. Based on this intuition, we systematically designed six overlap metrics and performed an empirical study on 5,735 bug fixes to investigate the usefulness of past fixes when composing new fixes. For each bug fix, we created delta graphs (i.e., program dependency graphs for code changes) and identified how bug fixes overlap with each other in terms of the content, code structures, and identifier names of fixes. Our results show that if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the content of past fixes, only 2.1% and 3.2% of new fixes can be created from single or multiple past fixes in the same project, compared with 0.9% and 1.2% of fixes created from past fixes across projects. However, if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the code structures of past fixes, up to 41.3% and 29.7% of new fixes can be created. Through these observations and ten other findings, we investigated the upper bound of reusable past fixes and composable new fixes, exploring the potential of existing and future APR approaches.
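As a rough illustration of what a content-overlap metric can look like, the sketch below computes a Jaccard overlap between two fixes reduced to sets of changed lines. The set-based representation is an assumption for illustration; the study itself measures overlap on delta graphs.

```python
# Minimal sketch of a content-overlap metric between two bug fixes,
# assuming each fix is reduced to a set of normalized changed lines.
# The study itself measures overlap on delta graphs (program
# dependency graphs of code changes); this is a simplification.

def content_overlap(fix_a: set[str], fix_b: set[str]) -> float:
    """Jaccard overlap between the change contents of two fixes."""
    if not fix_a or not fix_b:
        return 0.0
    return len(fix_a & fix_b) / len(fix_a | fix_b)

old_fix = {"if (x == null) return;", "log.warn(msg);"}
new_fix = {"if (x == null) return;", "x.close();"}
print(content_overlap(new_fix, old_fix))  # ~0.33 -> partial reuse
```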

2.
Context: Bug report assignment, namely assigning new bug reports to developers for timely and effective bug resolution, is crucial for software quality assurance. However, with the increasing size of software systems, it is difficult for bug managers to assign bugs to appropriate developers. Objective: This paper proposes an approach, called KSAP (K-nearest-neighbor search and heterogeneous proximity), to improve automatic bug report assignment by using historical bug reports and a heterogeneous network of the bug repository. Method: When a new bug report is submitted to the bug repository, KSAP assigns developers to it in a two-phase procedure. The first phase searches for historically resolved bug reports similar to the new report using the K-nearest-neighbor (KNN) method. The second phase ranks the developers who contributed to those similar bug reports by heterogeneous proximity. Results: We collected the bug repositories of the Mozilla, Eclipse, Apache Ant, and Apache Tomcat6 projects to investigate the performance of the proposed KSAP approach. Experimental results demonstrate that KSAP improves the recall of bug report assignment by 7.5–32.25% in comparison with state-of-the-art techniques. When there are only a small number of developer collaborations on common bug reports, KSAP still outperforms the other state-of-the-art techniques. When we tune the number of historically resolved similar bug reports (K) and the number of recommended developers (Q), KSAP keeps its superiority steadily. Conclusion: This is the first paper to demonstrate how to automatically build a heterogeneous network of a bug repository and extract meta-paths of developer collaborations from that network for bug report assignment.
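A minimal sketch of the two-phase idea follows: TF-IDF plus nearest-neighbor search for phase one, and a plain contribution count over the neighbors' fixers standing in for KSAP's heterogeneous-proximity ranking, which is not reproduced here. The report texts and developer names are invented.

```python
# Two-phase assignment sketch: (1) find the K most similar resolved
# reports with TF-IDF + nearest neighbors; (2) rank developers who
# worked on those reports. A plain contribution count stands in for
# KSAP's heterogeneous-proximity ranking.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

resolved_reports = ["crash when saving file", "UI freezes on startup",
                    "save dialog throws exception"]
fixers = [["alice"], ["bob"], ["alice", "carol"]]  # developers per report

vec = TfidfVectorizer()
X = vec.fit_transform(resolved_reports)
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)

new_report = vec.transform(["exception while saving a file"])
_, idx = knn.kneighbors(new_report)

votes = Counter(d for i in idx[0] for d in fixers[i])
print([dev for dev, _ in votes.most_common(2)])  # e.g. ['alice', 'carol']
```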

3.
Context: Bug fixing is an integral part of software development and maintenance. A large number of bugs often indicates poor software quality, since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the user's overall experience with the software product. The impact of long-lived bugs can be even more critical, since experiencing the same bug version after version is particularly frustrating for users. While many studies investigate factors affecting bug fixing time across entire bug repositories, to the best of our knowledge, none investigates the extent of and reasons for long-lived bugs. Objective: In this paper, we investigate the triaging and fixing processes of long-lived bugs so that we can identify the reasons for delay and improve the overall bug fixing process. Methodology: We mine the bug repositories of popular open source projects and analyze long-lived bugs from five perspectives: their proportion, severity, assignment, reasons, and the nature of their fixes. Results: Our study of seven open-source projects shows that each system contains a considerable number of long-lived bugs and that over 90% of them adversely affect the user's experience. The reasons for these long-lived bugs are diverse, including long assignment time and failure to recognize their importance in advance. Many bug fixes, however, were delayed without any specific reason. Furthermore, 40% of long-lived bugs need only small fixes. Conclusion: Our overall results suggest that a significant number of long-lived bugs could be minimized through careful triaging and prioritization if developers could predict their severity, change effort, and change impact in advance. We believe our results will help both developers and researchers better understand the factors behind delays, improve the overall bug fixing process, and investigate analytical approaches for prioritizing bugs based on severity and expected fixing effort.

4.
The large number of new bug reports received in the bug repositories of software systems makes their management a challenging task. Handling these reports manually is time consuming and often delays the resolution of important bugs. To address this issue, a recommender may be developed that automatically prioritizes new bug reports. In this paper, we propose and evaluate a classification-based approach to build such a recommender. We use the Naïve Bayes and Support Vector Machine (SVM) classifiers and compare their accuracy. Since a bug report contains both categorical and text features, we also evaluate which combination of features best determines the priority of a bug. To evaluate the bug priority recommender, we use precision and recall measures and also propose two new measures, Nearest False Negatives (NFN) and Nearest False Positives (NFP), which provide insight into the results produced by precision and recall. Our findings are that SVM outperforms the Naïve Bayes algorithm on text features, whereas on categorical features Naïve Bayes outperforms SVM. The highest accuracy is achieved with SVM when categorical and text features are combined for training.
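A sketch of the comparison setup, under the assumption that bug reports carry a free-text summary and a categorical component field (the field names and toy data are invented):

```python
# Sketch comparing Naive Bayes and a linear SVM on combined categorical
# and text features of bug reports, as the paper does; the field names
# and toy data are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

reports = pd.DataFrame({
    "summary":   ["app crashes on save", "typo in help text",
                  "data loss on sync", "button misaligned"],
    "component": ["core", "docs", "core", "ui"],
    "priority":  ["P1", "P4", "P1", "P3"],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "summary"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["component"]),
])

for name, clf in [("NB", MultinomialNB()), ("SVM", LinearSVC())]:
    model = Pipeline([("features", features), ("clf", clf)])
    model.fit(reports, reports["priority"])
    print(name, model.predict(reports))  # predicted priorities
```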

5.
With the rapid growth of location-based social networks, point-of-interest (POI) recommendation, which fuses location-based services into recommender systems, has become an important research focus. State-of-the-art POI recommendation work has begun to fuse geographical, textual, and social information, but this information is still not fully exploited. We therefore propose an improved POI recommendation model based on joint probabilistic generation that fuses multiple types of information. First, a hierarchical Dirichlet process topic model that automatically learns the number of topics is used to learn the interest topics of users and POIs. Meanwhile, a kernel density estimation method whose bandwidth is determined by the check-in distribution personalizes the influence of geographical information on each user's check-in behavior. The model also incorporates the influence of already-visited POIs on candidate POIs in a user's visiting sequence, i.e., sequential patterns, and further considers the influence of the user's social relations. Finally, fusing textual, geographical, social, and sequential information in a joint probabilistic generative model, we propose the TGSS-PGM POI recommendation model, which generates a ranked list of recommended POIs for each user. Experimental results show that the model achieves better results on recommendation accuracy and several other evaluation metrics.
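A sketch of the geographic component only, assuming a Gaussian KDE whose bandwidth is derived from the user's own check-ins (Scott's rule stands in for the paper's bandwidth choice; the coordinates are invented):

```python
# Per-user kernel density estimate over that user's check-in
# coordinates, with the bandwidth chosen from the check-in
# distribution itself (Scott's rule here as a stand-in).
import numpy as np
from scipy.stats import gaussian_kde

# (longitude, latitude) check-ins of one user -- toy data
checkins = np.array([[116.40, 39.90], [116.41, 39.91],
                     [116.42, 39.90], [121.47, 31.23]]).T

kde = gaussian_kde(checkins, bw_method="scott")  # bandwidth from the data
candidate_poi = np.array([[116.405], [39.905]])
print(kde(candidate_poi))  # geographic likelihood of visiting this POI
```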

6.
We propose a latent document similarity model (LDSM) that treats each pair of documents as a bipartite graph: the latent topics of the documents serve as the vertices, the weighted similarity between topics is assigned to the corresponding edges, and the optimal matching of the bipartite graph yields the similarity between the two documents. Experimental results show that LDSM outperforms document similarity models built with TextTiling and bipartite-graph optimal matching in both average precision and average recall.
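A sketch of the matching step, with an invented topic-topic similarity matrix: the optimal bipartite matching over topic pairs yields the document similarity.

```python
# LDSM sketch: topics of two documents form the two vertex sets of a
# bipartite graph, edges are weighted by topic-topic similarity, and
# the optimal matching score is the document similarity.
import numpy as np
from scipy.optimize import linear_sum_assignment

# sim[i, j] = similarity between topic i of doc A and topic j of doc B
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.8, 0.4],
                [0.1, 0.3, 0.7]])

rows, cols = linear_sum_assignment(sim, maximize=True)  # best matching
doc_similarity = sim[rows, cols].mean()
print(doc_similarity)  # 0.8 -- average weight of the matched topic pairs
```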

7.
Dynamic texture recognition based on sparse coding
Objective: Linear dynamical systems (LDS) effectively capture the temporal and spatial transition information of dynamic textures. However, LDS models do not live in a Euclidean space, so traditional sparse coding cannot be applied to them directly for classification. We therefore propose a sparse-coding formulation for linear dynamical systems and apply it to dynamic texture recognition. Method: Starting from a constrained convex optimization formulation, we combine sparse coding with the similarity transformation from control theory to optimize the model parameters, making sparse-coding-based classification applicable and dynamic texture recognition effective. Results: Experiments on the public UCLA dynamic texture database show that the proposed method outperforms competing approaches, achieving a recognition rate of up to 97% and better robustness to occlusion. Conclusion: The proposed method achieves a better recognition rate on dynamic textures, including under occlusion.

8.
BTopicMiner: a domain-specific hot-topic mining system for Chinese microblogs
李劲  张华  吴浩雄  向军 《计算机应用》2012,32(8):2346-2349
With the rapid growth of microblogging, automatically extracting hot topics that interest users from massive microblog streams has become a challenging research problem. We propose a hot-topic extraction algorithm for Chinese microblogs based on an extended topic model. To address the inherent data sparsity of microblog posts, the algorithm first uses text clustering to merge related posts into microblog documents. Based on the assumption that follow-up (reply) relations between posts imply topical relatedness, the algorithm extends the traditional latent Dirichlet allocation (LDA) topic model to capture these follow-up relations. Finally, mutual information (MI) is used to compute topic words for the extracted topics, which are then used for hot-topic recommendation. To validate the extended topic extraction model, we implemented BTopicMiner, a prototype system for domain-specific hot-topic mining of Chinese microblogs. Experimental results show that the extended topic model based on follow-up relations extracts hot topics more accurately, and that the semantic similarity between the topic words computed automatically with MI and manually selected hot words exceeds 75%.
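A sketch of the topic-word step with a plain LDA model; the follow-up-relation extension and the MI-based word scoring are not reproduced, and the toy posts are invented:

```python
# Standard-LDA sketch of topic extraction: fit LDA on merged microblog
# documents, then report the top words per topic. Top per-topic word
# weights stand in for the paper's MI-based topic-word selection.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["earthquake relief donations", "earthquake rescue teams arrive",
        "new phone released today", "phone camera review"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-3:][::-1]     # indices of the 3 heaviest words
    print(f"topic {k}:", [words[i] for i in top])
```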

9.
胡川  孟祥武  张玉洁  杜雨露 《软件学报》2018,29(10):3164-3183
In recent years, group recommender systems have gradually become one of the research hotspots in the recommender systems field. In movie, TV, and travel recommendation, the "user" is often a group of people participating in an activity together, which requires recommending to groups rather than individuals. As an effective means of solving this problem, group recommender systems extend single-user recommendation to group recommendation and have been applied in domains such as news, music, movies, and dining. Existing group recommendation fusion methods fall mainly into model fusion and recommendation fusion; which works better is still unsettled, and each has its own strengths and weaknesses: model fusion suffers from a fairness problem among group members, while recommendation fusion ignores interactions among group members. We propose an improved preference-fusion group recommendation method that combines the advantages of both. Our experiments also yield the conclusion that group preferences are similar to the individual preferences of the members, and this observation is incorporated into the improved method. Finally, experiments on the MovieLens dataset validate the effectiveness of the method and show that it effectively improves recommendation accuracy.
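The tension between fusion strategies can be pictured with toy predicted ratings: the average strategy (a common model-fusion aggregator) and the fairness-oriented least-misery strategy can disagree, which is the fairness issue mentioned above. The data are invented.

```python
# Sketch contrasting two classic aggregation strategies for a group:
# averaging member ratings vs. least misery (protecting the least
# satisfied member). Rows are members, columns are candidate items.
import numpy as np

ratings = np.array([[5.0, 3.0],
                    [5.0, 3.0],
                    [1.0, 4.0]])

average = ratings.mean(axis=0)       # -> [3.67, 3.33], picks item 0
least_misery = ratings.min(axis=0)   # -> [1.0, 3.0],  picks item 1
print(average.argmax(), least_misery.argmax())  # the strategies disagree
```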

10.
11.
Nowadays, many software organizations rely on automatic problem reporting tools to collect crash reports directly from users' environments. These crash reports are later grouped together into crash types. Usually, developers prioritize crash types based on the number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups, and a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91% and a recall of 87% for Firefox and a precision of 76% and a recall of 61% for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62% and a precision of 42% for Firefox, and a recall of 52% and a precision of 50% for Eclipse. On the top 10 buggy file candidates, the recall increases to 92% for Firefox and 90% for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50% and a precision of 55% on Firefox, and a recall of 47% and a precision of 35% on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types all together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.
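One of the paper's rules can be pictured as comparing the top frames of two crash types' signature stack traces; the top-N comparison, the frame names, and the threshold below are assumptions for illustration, not the paper's exact rules.

```python
# Sketch of a stack-trace-based crash-correlation rule: two crash
# types are grouped when the top N frames of their signature stack
# traces are identical. N and the frame names are invented.
def same_group(trace_a: list[str], trace_b: list[str], top_n: int = 3) -> bool:
    """Group crash types whose top-N stack frames match exactly."""
    return trace_a[:top_n] == trace_b[:top_n]

t1 = ["nsDocShell::Destroy", "nsWebShell::Destroy", "main"]
t2 = ["nsDocShell::Destroy", "nsWebShell::Destroy", "XRE_main"]
print(same_group(t1, t2, top_n=2))  # True -> same crash correlation group
```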

12.
A news topic mining method based on a weighted latent Dirichlet allocation model
李湘东  巴志超  黄莉 《计算机应用》2014,34(5):1354-1359
To address the low accuracy and poor topic interpretability of traditional news topic mining, we propose a news topic mining method based on a weighted latent Dirichlet allocation (LDA) model that exploits the structural conventions of news reports. First, term weights are improved from several angles and combined into a composite weight, and the word-generation process of the LDA model is extended accordingly to obtain more expressive terms. Second, the category distinguishing words (CDW) method is applied to reorder the words in the modeling results, eliminating topic ambiguity and noise and improving topic interpretability. Finally, based on the mathematical properties of the model's topic probability distributions, topics are scored by document-to-topic contribution and topic weight probability to identify hot topics. Simulation experiments show that, compared with the traditional LDA model, the improved method reduces the miss rate and false-alarm rate by 1.43% and 0.16% on average, respectively, and lowers the minimum normalized cost by 2.68% on average, validating the feasibility and effectiveness of the method.

13.
Context: Topic models such as probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA) have demonstrated success in mining software repositories. Understanding software change messages, described in unstructured natural-language text, is one of the fundamental challenges in mining these messages from repositories. Objective: We present a novel automatic change message classification method characterized by semi-supervised topic semantic analysis. Method: We present a semi-supervised LDA-based approach to automatically classify change messages. We use domain knowledge of software changes to create labeled samples, which are added to build the semi-supervised LDA model. We then verify the cross-project applicability of our method on three open-source projects. Our method has two advantages over existing software change classification methods. First, it mitigates the issue of setting the appropriate number of latent topics: we do not have to choose the number of latent topics, because it corresponds to the number of class labels. Second, the approach utilizes the information provided by the labeled samples in the training set. Results: Our method automatically classified about 85% of the change messages in our experiment, and our validation survey showed that 70.56% of the time our automatic classification results agreed with developer opinions. Conclusion: Our approach automatically classifies most change messages that record the cause of a software change, and the method is applicable to cross-project analysis of software change messages.

14.
The rapid growth of social network services has produced a considerable amount of data, called big social data. Big social data are helpful for improving personalized recommender systems because these enormous data have various characteristics. Therefore, many personalized recommender systems based on big social data have been proposed, in particular models that use relationship information between people. However, most existing studies have provided recommendations on special-purpose, single-domain SNSs whose users share similar tastes, such as MovieLens and Last.fm, although they have considered closeness relations. In this paper, we introduce an appropriate measure, the friendship strength, to calculate the closeness between users in a social circle. Further, we propose a friendship strength-based personalized recommender system that recommends topics or interests users might have, analyzing big social data from Twitter in particular. The proposed measure provides precise recommendations in multi-domain environments with various topics. We evaluated the proposed system using one month of Twitter data and various evaluation metrics. Our experimental results show that our personalized recommender system outperforms the baseline systems and that friendship strength is of great importance in personalized recommendation.

15.
User-based collaborative filtering (CF) has been successfully applied in recommender systems for years. The main idea of user-based CF is to discover communities of users sharing similar interests, so the measurement of user similarity is the foundation of CF. However, existing user-based CF methods suffer from data sparsity: the user-item matrix is often too sparse to produce good results in recommender systems. One possible way to alleviate this problem is to bring new data sources into user-based CF. Thanks to the rapid development of social annotation systems, we turn to tags as such a source. In these approaches, user-topic rating based CF extracts topics from tags using different topic model methods, and user similarities are then computed by measuring the users' preferences on topics. In this paper, we compare three user-topic rating based CF methods, using pLSA, hierarchical clustering, and LDA. All three methods calculate user-topic preferences from the users' ratings of items and the topic weights. We conduct experiments on the MovieLens dataset. The experimental results show that LDA-based and hierarchical-clustering-based user-topic rating CF outperform traditional user-based CF in recommendation accuracy, while pLSA-based user-topic rating CF performs worse than traditional user-based CF.
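A sketch of the user-topic preference computation, with an invented item-topic matrix standing in for the pLSA/clustering/LDA step: user-topic vectors are obtained from ratings, and user similarity is the cosine between these vectors.

```python
# User-topic rating CF sketch: project user-item ratings onto topics,
# then measure user similarity on the resulting topic preferences.
import numpy as np

ratings = np.array([[5, 0, 3],      # user-item rating matrix (0 = unrated)
                    [4, 1, 0],
                    [0, 5, 1]], dtype=float)
item_topics = np.array([[0.9, 0.1],  # item-topic weights (e.g. from LDA
                        [0.1, 0.9],  # over the items' social tags)
                        [0.8, 0.2]])

user_topics = ratings @ item_topics            # user preference per topic
unit = user_topics / np.linalg.norm(user_topics, axis=1, keepdims=True)
sim = unit @ unit.T                            # cosine similarity
print(sim.round(2))  # user-user similarities used to pick neighbors
```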

16.
黄育  张鸿 《计算机应用》2017,37(4):1061-1064
Data of different modalities express the same semantic topic differently, and traditional cross-media retrieval algorithms ignore the fact that different modalities can cooperatively explore the intrinsic semantics of the data. To address these problems, we propose a new cross-media retrieval algorithm based on latent semantic topic reinforcement (LSTR). First, a latent Dirichlet allocation (LDA) model is used to construct the text semantic space, and the image corresponding to each text is represented with a bag-of-words (BoW) model. Second, multiclass logistic regression classifies both images and texts, and the resulting multiclass posterior probabilities represent the latent semantic topics of texts and images. Finally, the text latent topics are used to regularize the image latent topics, reinforcing the latter and maximizing the semantic correlation between the two modalities. On the Wikipedia dataset, the mean average precision of text-to-image and image-to-text retrieval reaches 57.0%, an improvement of 35.1%, 34.8%, and 32.1% over canonical correlation analysis (CCA), semantic matching (SM), and semantic correlation matching (SCM), respectively. The experimental results show that LSTR effectively improves the mean average precision of cross-media retrieval.

17.
Information retrieval (IR) approaches, such as latent semantic indexing (LSI) and the vector space model (VSM), are commonly applied to recover software traceability links. Recently, an approach based on developers' eye gazes was proposed to retrieve traceability links. This paper presents a comparative study of IR and eye-gaze based approaches. In addition, it reports on the possibility of using eye gaze links as an alternative benchmark to commits. In the study, developers performed bug-localization tasks on the open source subject system JabRef. The iTrace environment, an eye-tracking-enabled Eclipse plugin, was used to collect eye gaze data. During the data collection phase, an eye tracker was used to gather the source code entities (SCEs) developers looked at while solving these tasks. We present an algorithm that uses the collected gaze dataset to produce candidate traceability links related to the tasks. In the evaluation phase, we compared the results of our algorithm with the results of an IR technique in two different contexts. In the first context, precision and recall values are reported for both the IR and eye gaze approaches against commits. In the second context, another set of developers rated the candidate links from each technique by how useful they were in fixing the bugs. The eye gaze approach outperforms standard LSI and VSM approaches, reporting 55% precision and 67% recall on average across all tasks when compared with how the developers actually fixed the bug. In the second context, the usefulness results show that links generated by our algorithm were considered significantly more useful for fixing the bug than those of the IR technique in a majority of tasks. We discuss the implications of this radically different method of deriving traceability links. Techniques for feature location and bug localization are commonly evaluated on benchmarks formed from commits, as is done in the evaluation phase of this study. Although commits are a reasonable source, they only capture entities that were eventually changed to fix a bug or resolve a feature. We therefore investigate another type of benchmark based on eye tracking data, namely links generated from the bug-localization tasks given to the developers in the data collection phase. The source code entities recommended by IR methods for the subject bugs are evaluated against both commits and links generated from eye gaze. The results of the benchmarking phase show that eye tracking could form an effective complementary benchmark and adds another interesting perspective to the evaluation of bug-localization techniques.
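A sketch of the IR side of the comparison: an LSI pipeline (TF-IDF followed by truncated SVD) ranking source code entities against a bug report. The entity texts are invented stand-ins, not actual JabRef code.

```python
# LSI traceability sketch: embed source code entities and a bug report
# in a latent semantic space, then rank entities by cosine similarity.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

source_entities = ["class BibtexParser parse entry fields",
                   "class ExportDialog save file format",
                   "class GroupTree add remove group node"]
bug_report = ["parser fails on malformed bibtex entry"]

lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
doc_vecs = lsi.fit_transform(source_entities)
query_vec = lsi.transform(bug_report)

scores = cosine_similarity(query_vec, doc_vecs)[0]
print(scores.argsort()[::-1])  # candidate traceability links, best first
```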

18.
Hippocampal subfield segmentation via dictionary learning and sparse representation
Objective: Hippocampal subfields are extremely small and structurally complex, and existing multi-atlas segmentation methods struggle to achieve satisfactory results on them, so we propose a segmentation method based on dictionary learning and sparse representation. Method: The method builds a sparse representation and dictionary learning model for each voxel of the target image to obtain its label. The dictionary is constructed from image patches in the atlas intensity images, and local binary pattern (LBP) features of the atlas label images are used to enhance the discriminability of the training dictionary. The sparse representation of each target image patch over the training dictionary then determines the voxel's label, and finally mislabeled voxels in the segmentation result are corrected using atlas prior knowledge. Results: Qualitative and quantitative comparisons show that the method outperforms typical multi-atlas segmentation methods, achieving an average segmentation accuracy of 0.890 on the larger hippocampal subfields. Conclusion: The method is suitable for accurately segmenting hippocampal subfields in brain magnetic resonance images, is robust, and can provide a reliable basis for diagnosing neurodegenerative diseases.
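The per-voxel labeling step can be sketched with orthogonal matching pursuit as the sparse solver; the dictionary, labels, and patch sizes below are invented, and the LBP enhancement and label-correction steps are not reproduced.

```python
# Sketch of sparse-representation labeling: solve for a sparse code of
# the target patch over a dictionary of atlas patches, then vote with
# the labels of the atoms that received nonzero coefficients.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(27, 10))   # 10 atlas patches (3x3x3 voxels)
patch_labels = np.array([0, 0, 1, 1, 1, 0, 2, 2, 1, 0])
target_patch = dictionary[:, 2] + 0.05 * rng.normal(size=27)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3)
omp.fit(dictionary, target_patch)        # columns of `dictionary` are atoms
weights = np.abs(omp.coef_)
label = np.bincount(patch_labels, weights=weights).argmax()
print(label)  # predicted label for the centre voxel, likely 1
```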

19.
Hyperspectral decision-fusion classification using PCA and a moving-window wavelet transform
Objective: Hyperspectral data have high inter-band resolution and correlation, which complicates classification. To improve classification accuracy, we propose a hyperspectral decision-fusion classification algorithm combining PCA with a moving-window wavelet transform. Method: First, the original hyperspectral bands are grouped using the correlation coefficient matrix; then principal component analysis reduces the spectral dimensionality of each group; spatial features are extracted with the proposed moving-window wavelet transform; finally, a linear opinion pool (LOP) decision-fusion rule combines the outputs of the multiple classifiers. Results: Experiments on two datasets from different sensors show that the algorithm achieves higher classification accuracy and Kappa coefficients than five existing algorithms, with accuracy about 8% higher than SVM-RBF. Conclusion: The algorithm fully exploits the spectral-spatial information of hyperspectral images, effectively improves classification accuracy, and also performs well with small training samples and in noisy environments.
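A sketch of the spectral stage only: bands are grouped via the correlation matrix and each group is reduced with PCA. The greedy grouping rule and the 0.5 threshold are assumptions, and the wavelet spatial features and LOP fusion are not reproduced.

```python
# Band grouping + per-group PCA sketch for a hyperspectral cube,
# flattened to (pixels x bands). Toy random data stand in for a cube.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
cube = rng.normal(size=(100, 40))          # 100 pixels x 40 bands (toy)
corr = np.corrcoef(cube, rowvar=False)     # band-to-band correlation

# naive grouping: walk the bands, cut a group when correlation drops
groups, start = [], 0
for b in range(1, 40):
    if corr[start, b] < 0.5:               # assumed threshold
        groups.append(range(start, b))
        start = b
groups.append(range(start, 40))

features = np.hstack([PCA(n_components=1).fit_transform(cube[:, g])
                      for g in groups])
print(features.shape)  # one principal component per band group
```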

20.
Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown to be effective in topic-model-based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness. Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods. Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems. Results: The results show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base. Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA, and that there is no significant relationship between the accuracy of the technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.
