Similar Documents
 20 similar documents found (search time: 15 ms)
1.
This paper presents the results of a large scale empirical study of coherent dependence clusters. All statements in a coherent dependence cluster depend upon the same set of statements and affect the same set of statements; a coherent cluster's statements have ‘coherent’ shared backward and forward dependence. We introduce an approximation to efficiently locate coherent clusters and show that it has a minimum precision of 97.76%. Our empirical study also finds that, despite their tight coherence constraints, coherent dependence clusters are in abundance: 23 of the 30 programs studied have coherent clusters that contain at least 10% of the whole program. Studying patterns of clustering in these programs reveals that most programs contain multiple substantial coherent clusters. A series of subsequent case studies uncover that all clusters of significant size map to a logical functionality and correspond to a program structure. For example, we show that for the program acct, the top five coherent clusters all map to specific, yet otherwise non-obvious, functionality. Cluster visualization also brings out subtle deficiencies in program structure and identifies potential refactoring candidates. A study of inter-cluster dependence is used to highlight how coherent clusters are connected to each other, revealing higher-level structures, which can be used in reverse engineering. Finally, studies are presented to illustrate how clusters are not correlated with program faults as they remain stable during most system evolution.
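As a minimal sketch of the underlying idea (not the authors' tooling), coherent clusters can be computed by grouping statements on the pair of their backward- and forward-slice sets; the slice maps and statement IDs below are hypothetical inputs that a program slicer would normally supply:

```python
from collections import defaultdict

def coherent_clusters(statements, backward_slice, forward_slice):
    """Group statements whose backward AND forward slice sets coincide.

    backward_slice / forward_slice: dicts mapping a statement ID to the
    frozenset of statement IDs it depends on / affects (hypothetical
    inputs; a real tool would obtain these from a program slicer).
    """
    clusters = defaultdict(set)
    for s in statements:
        clusters[(backward_slice[s], forward_slice[s])].add(s)
    # Singleton "clusters" carry no shared dependence, so drop them.
    return [c for c in clusters.values() if len(c) > 1]

# Toy example: statements 1 and 2 share both slice sets, 3 does not.
bs = {1: frozenset({1, 2, 3}), 2: frozenset({1, 2, 3}), 3: frozenset({3})}
fs = {1: frozenset({1, 2}), 2: frozenset({1, 2}), 3: frozenset({3})}
print(coherent_clusters([1, 2, 3], bs, fs))  # -> [{1, 2}]
```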

2.
Understanding the implementation of a certain feature of a system requires identification of the computational units of the system that contribute to this feature. In many cases, the mapping of features to the source code is poorly documented. In this paper, we present a semiautomatic technique that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior. The mapping is in general not injective; that is, a computational unit may contribute to several features. Our technique allows for the distinction between general and specific computational units with respect to a given set of features. For a set of features, it also identifies jointly and distinctly required computational units. The presented technique combines dynamic and static analyses to rapidly focus on the system's parts that relate to a specific set of features. Dynamic information is gathered based on a set of scenarios invoking the features. Rather than assuming a one-to-one correspondence between features and scenarios as in earlier work, we can now handle scenarios that invoke many features. Furthermore, we show how our method allows incremental exploration of features while preserving the "mental map" the analyst has gained through the analysis.
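The paper's analyses are richer than this, but the core set algebra for deriving jointly and distinctly required units from per-scenario traces can be sketched as follows (the trace data and feature assignments are made up for illustration):

```python
def units_for_features(executed, scenario_features):
    """Classify computational units by the features whose scenarios execute them.

    executed: dict scenario -> set of executed units (from dynamic traces).
    scenario_features: dict scenario -> set of features the scenario
    invokes (scenarios may invoke several features, as in the paper).
    """
    feature_units = {}
    for scenario, units in executed.items():
        for f in scenario_features[scenario]:
            feature_units.setdefault(f, set()).update(units)
    features = list(feature_units)
    # Jointly required: executed for every feature in the set.
    jointly = set.intersection(*feature_units.values())
    # Specific to one feature: executed for that feature only.
    specific = {f: feature_units[f]
                   - set().union(*(feature_units[g] for g in features if g != f))
                for f in features}
    return jointly, specific

executed = {"s1": {"parse", "render"}, "s2": {"parse", "export"}}
feats = {"s1": {"view"}, "s2": {"save"}}
print(units_for_features(executed, feats))
# -> ({'parse'}, {'view': {'render'}, 'save': {'export'}})
```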

3.
Context: Identifying refactoring opportunities in object-oriented code is an important stage that precedes the actual refactoring process. Several techniques have been proposed in the literature to identify opportunities for various refactoring activities. Objective: This paper provides a systematic literature review of existing studies identifying opportunities for code refactoring activities. Method: We performed an automatic search of the relevant digital libraries for potentially relevant studies published through the end of 2013, performed pilot and author-based searches, and selected 47 primary studies (PSs) based on inclusion and exclusion criteria. The PSs were analyzed based on a number of criteria, including the refactoring activities, the approaches to refactoring opportunity identification, the empirical evaluation approaches, and the data sets used. Results: The results indicate that research in the area of identifying refactoring opportunities is highly active. Most of the studies have been performed by academic researchers using nonindustrial data sets. Extract Class and Move Method were found to be the most frequently considered refactoring activities. The results show that researchers use six primary existing approaches to identify refactoring opportunities and six approaches to empirically evaluate the identification techniques. Most of the systems used in the evaluation process were open-source, which helps to make the studies repeatable. However, a relatively high percentage of the data sets used in the empirical evaluations were small, which limits the generality of the results. Conclusions: It would be beneficial to perform further studies that consider more refactoring activities, involve researchers from industry, and use large-scale and industrial-based systems.

4.
Existing link-analysis-based hot topic detection methods suffer from low accuracy, susceptibility to link spam, and topic drift. Exploiting the fact that community structures in complex networks are highly topic-coherent, this paper proposes a hot topic detection algorithm for blogs that combines link analysis with firefly-algorithm clustering of blog posts. Blog pages are treated as nodes, and links to identical or related content as edges; page weights are computed from attributes of the posts and their authors to build a blog topic model. The firefly algorithm clusters the posts to obtain cluster centers, which are then ranked by page weight from high to low to produce a ranking of hot topics. Experimental results show that the method discovers more blog hot topics with higher precision.
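The paper's full pipeline is not reproduced here, but a minimal sketch of firefly-algorithm clustering over a numeric feature space (the toy post vectors, parameter values, and SSE-based brightness are all assumptions for illustration) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def sse(centers, X):
    # Sum of squared distances from each point to its nearest center
    # (lower is better, so brightness is defined as -sse).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

def firefly_cluster(X, k=2, n_fireflies=8, iters=40,
                    beta0=1.0, gamma=0.1, alpha=0.05):
    # Each firefly encodes k candidate cluster centers.
    flies = rng.uniform(X.min(0), X.max(0), size=(n_fireflies, k, X.shape[1]))
    for _ in range(iters):
        bright = np.array([-sse(f, X) for f in flies])
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if bright[j] > bright[i]:
                    # Move firefly i toward the brighter firefly j;
                    # attraction decays with squared distance r2.
                    r2 = ((flies[i] - flies[j]) ** 2).sum()
                    flies[i] += beta0 * np.exp(-gamma * r2) * (flies[j] - flies[i])
                    flies[i] += alpha * rng.normal(size=flies[i].shape)
    return min(flies, key=lambda f: sse(f, X))

# Toy post vectors: two obvious topic groups.
X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
print(firefly_cluster(X))  # two centers, one near each group
```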

5.
Optimizing Semantic Web Service Discovery with Clustering
To address the imprecise results returned by registries because traditional Web services lack semantics, this paper proposes an approach that uses OWL-S to provide semantic support and clusters Web services by semantic similarity. OWL-S is used to describe services semantically, and the Single-Link agglomerative hierarchical clustering algorithm groups similar Web services, enabling the most suitable service to be located and returned quickly and improving the precision of service discovery.
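A minimal sketch of the clustering step, assuming a pairwise semantic-similarity matrix between services is already available (the similarity values below are made up; deriving them from OWL-S descriptions is the paper's contribution):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical pairwise semantic similarities between four services,
# e.g. derived from matching OWL-S input/output concepts.
sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.2, 0.8, 1.0]])

dist = 1.0 - sim  # turn similarity into a distance
Z = linkage(squareform(dist, checks=False), method="single")  # Single-Link
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)  # services 0-1 form one cluster, services 2-3 another
```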

6.
7.
Appropriate comments on code snippets provide insight into code functionality and are helpful for program comprehension. However, because writing comments is costly, many code projects do not contain adequate comments. Automatic comment generation techniques have been proposed to generate comments from pieces of code in order to reduce the human effort of annotating code. Most existing approaches attempt to exploit certain correlations (usually manually specified) between code and generated comments, which are easily violated when coding patterns change, causing the performance of comment generation to decline. In addition, recent approaches ignore code constructs and treat code snippets like plain text. Furthermore, previous datasets are too small to validate the methods and show their advantage. In this paper, we propose a new attention mechanism called CodeAttention to translate code to comments, which is able to utilize code constructs such as critical statements, symbols and keywords. By focusing on these specific points, CodeAttention understands the semantic meaning of code better than previous methods. To verify our approach across wider coding patterns, we build a large dataset from open projects on GitHub. Experimental results on this large dataset demonstrate that the proposed method outperforms existing approaches in both objective and subjective evaluation. We also perform ablation studies to determine the effects of the different parts of CodeAttention.
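CodeAttention's exact architecture is not reproduced here, but the operation it builds on, attending over code-token representations so that keywords and symbols can receive higher weight, is standard scaled dot-product attention; a toy NumPy version (embeddings are made up):

```python
import numpy as np

def attention(query, keys, values):
    # Scaled dot-product attention: tokens whose key vectors align with
    # the query (e.g. keywords, symbols) receive larger weights.
    scores = keys @ query / np.sqrt(query.size)
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w @ values, w

# Toy 4-dimensional embeddings for three code tokens.
keys = values = np.array([[1.0, 0, 0, 0],    # 'return'  (keyword)
                          [0.2, 1, 0, 0],    # 'total'   (identifier)
                          [0.1, 0, 1, 0]])   # ';'       (symbol)
context, weights = attention(np.array([1.0, 0, 0, 0]), keys, values)
print(weights)  # the keyword token gets the highest weight
```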

8.
Source code documentation often contains summaries of source code written by authors. Recently, automatic source code summarization tools have emerged that generate summaries without requiring author intervention. These summaries are designed for readers to be able to understand the high-level concepts of the source code. Unfortunately, there is no agreed-upon understanding of what makes up a “good summary.” This paper presents an empirical study examining summaries of source code written by authors, readers, and automatic source code summarization tools. This empirical study examines the textual similarity between source code and summaries of source code using Short Text Semantic Similarity metrics. We found that readers use source code in their summaries more than authors do. Additionally, this study finds that the accuracy of a human-written summary can be estimated by the textual similarity of that summary to the source code.
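As a simple stand-in for the Short Text Semantic Similarity metrics used in the study (which are more sophisticated), a bag-of-words cosine similarity between a summary and the code already illustrates the idea; the snippet and summary below are invented:

```python
import re
from collections import Counter
from math import sqrt

def cosine_sim(a, b):
    # The tokenizer splits camelCase identifiers so code terms can
    # overlap with English summary words.
    tok = lambda s: Counter(w.lower() for w in re.findall(r"[A-Z]?[a-z]+", s))
    ta, tb = tok(a), tok(b)
    dot = sum(ta[w] * tb[w] for w in ta)
    na = sqrt(sum(v * v for v in ta.values()))
    nb = sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

code = "int sumArray(int[] a) { int s = 0; for (int x : a) s += x; return s; }"
summary = "Returns the sum of the array elements."
print(round(cosine_sim(code, summary), 3))  # shared terms: 'sum', 'array'
```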

9.
10.
A Source Code Security Detection Model Based on Static Analysis
This paper surveys the current mainstream static code analysis techniques and, after discussing their strengths and weaknesses, proposes a new static code detection model. The model combines mature static analysis techniques and borrows the data-flow and control-flow analysis ideas used in compilers to obtain context-sensitive data information, allowing security problems in the code to be analyzed more accurately.
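A toy check in the spirit of such a model (not the paper's implementation): mark variables assigned from user input as tainted and report calls that pass them to a dangerous sink. The sink list and example are assumptions; a real detector would track data flow along control-flow paths.

```python
import ast

DANGEROUS_SINKS = {"eval", "exec", "system"}  # illustrative sink list

def find_tainted_sinks(source):
    """Flow-insensitive toy: input() taints a variable; passing a tainted
    variable to a dangerous sink is reported as (line, sink, variable)."""
    tree = ast.parse(source)
    tainted, findings = set(), []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            v = node.value
            if (isinstance(v, ast.Call) and isinstance(v.func, ast.Name)
                    and v.func.id == "input"):
                tainted |= {t.id for t in node.targets
                            if isinstance(t, ast.Name)}
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in DANGEROUS_SINKS):
            for arg in node.args:
                if isinstance(arg, ast.Name) and arg.id in tainted:
                    findings.append((node.lineno, node.func.id, arg.id))
    return findings

print(find_tainted_sinks("cmd = input()\neval(cmd)\n"))  # [(2, 'eval', 'cmd')]
```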

11.
A concern can be characterized as a developer's intent behind a piece of code, often not explicitly captured in it. We discuss a technique of recording concerns using source code annotations (concern annotations). Using two studies and two controlled experiments, we seek to answer the following 3 research questions: (1) Do programmers' mental models overlap? (2) How do developers use shared concern annotations when they are available? (3) Does using annotations created by others improve program comprehension and maintenance correctness, time and confidence? The first study shows that developers' mental models, recorded using concern annotations, overlap and thus can be shared. The second study shows that shared concern annotations can be used during program comprehension for the following purposes: hypotheses confirmation, feature location, obtaining new knowledge, finding relationships and maintenance notes. The first controlled experiment with students showed that the presence of annotations significantly reduced program comprehension and maintenance time by 34%. The second controlled experiment was a differentiated replication of the first one, focused on industrial developers. It showed a 33% significant improvement in correctness. We conclude that concern annotations are a viable way to share developers' thoughts.
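The studies used annotations in the host language; purely as an illustration (in Python, with a made-up `concern` decorator), recording a concern next to the code it describes could look like:

```python
def concern(label, note=""):
    """Hypothetical concern annotation: attaches a developer's intent
    to a function so tools (or teammates) can search and group by it."""
    def wrap(fn):
        fn.__concern__ = (label, note)
        return fn
    return wrap

@concern("persistence", note="must stay in sync with schema v2")
def save_order(order):
    ...

print(save_order.__concern__)  # ('persistence', 'must stay in sync with schema v2')
```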

12.
A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
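The key transformation is parameter replacement: identifiers and literals collapse to placeholders so renamed copies still match token-by-token. A toy sketch (CCFinder itself uses a suffix-tree over the whole token stream; this version just compares fixed-length token windows per line):

```python
import re
from collections import defaultdict

def normalize(tokens):
    # Collapse identifiers and numeric literals to placeholders.
    return tuple("$id" if re.fullmatch(r"[A-Za-z_]\w*", t)
                 else "$num" if re.fullmatch(r"\d+", t)
                 else t
                 for t in tokens)

def find_clones(lines, min_len=4):
    """Report groups of lines whose normalized token windows coincide."""
    index = defaultdict(list)
    for ln, line in enumerate(lines, 1):
        toks = normalize(re.findall(r"\w+|\S", line))
        for i in range(max(0, len(toks) - min_len + 1)):
            index[toks[i:i + min_len]].append(ln)
    return {k: v for k, v in index.items() if len(set(v)) > 1}

src = ["total = total + price;", "count = count + step;"]
for window, where in find_clones(src).items():
    print(where, window)  # lines 1 and 2 are clones after normalization
```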

13.
We propose a semantic passage segmentation method for a Question Answering (QA) system. We define a semantic passage as sentences grouped by semantic coherence, determined by the topic assigned to individual sentences. Topic assignments are done by a sentence classifier based on a statistical classification technique, Maximum Entropy (ME), combined with multiple linguistic features. We ran experiments to evaluate the proposed method and its impact on application tasks, passage retrieval and template-filling for question answering. The experimental result shows that our semantic passage retrieval method using topic matching is more useful than fixed length passage retrieval. With the template-filling task used for information extraction in the QA system, the value of the sentence topic assignment method was reinforced.
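A minimal sketch of the pipeline shape, under the assumptions that multinomial logistic regression stands in for the Maximum Entropy classifier (the two are equivalent formulations) and that the training sentences, topics, and features below are invented:

```python
from itertools import groupby
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: sentences labeled with topics.
train = [("The patient reported chest pain.", "symptom"),
         ("Aspirin 100mg was prescribed.", "treatment"),
         ("He complained of dizziness.", "symptom"),
         ("The dose was increased to 200mg.", "treatment")]
X, y = zip(*train)

# Multinomial logistic regression == a maximum-entropy classifier.
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

doc = ["Nausea was also noted.", "Vomiting followed.", "Ibuprofen was given."]
topics = clf.predict(doc)

# A semantic passage = a maximal run of sentences sharing the same topic.
passages = [(t, [s for s, _ in grp])
            for t, grp in groupby(zip(doc, topics), key=lambda p: p[1])]
print(passages)
```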

14.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant for the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The groupings which are obtained help identify coarse-grain semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlap with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium-size corpora which resist statistical clustering methods, but seems more appropriate for specialized texts.
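One way such groupings arise, sketched with invented (head, relation, dependent) triples rather than the paper's extraction methods: words that occur in the same dependency contexts become candidates for the same semantic class.

```python
from collections import defaultdict

# Hypothetical triples extracted from normalized parse trees
# of a small medical corpus.
triples = [("administer", "obj", "aspirin"),
           ("administer", "obj", "insulin"),
           ("measure", "obj", "glucose"),
           ("measure", "obj", "pressure"),
           ("administer", "obj", "glucose")]

# Group dependents by the set of (head, relation) contexts they occur in.
contexts = defaultdict(set)
for head, rel, dep in triples:
    contexts[dep].add((head, rel))

classes = defaultdict(list)
for word, ctx in contexts.items():
    classes[frozenset(ctx)].append(word)
print(list(classes.values()))  # [['aspirin', 'insulin'], ['glucose'], ['pressure']]
```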

15.
There are a number of reasons why one might wish to transform the source code of an operational program:
  1. To make the program conform to a standard layout.
  2. To make the program conform to syntax and semantics standards.
  3. To improve the performance of the program.
The primary advantage of transforming source code into a standard form is that the programs become easier to maintain. The cost-benefit of standardization is thus realized at a later stage in the lifetime of the programs. Alternatively, the cost-benefit of improving performance is immediate. The desirability of transforming source code is affected by several parameters:
  1. The benefit to be realized from transformation.
  2. The cost of transformation.
  3. The time involved in transformation.
  4. The risk associated with transformation.
If the benefit of transformation is significant, the cost, time and risk associated with the process can be considerably reduced by automating the process. In this paper, application of the CONVERT language to the transformation problem is discussed. CONVERT was developed as a vehicle for writing automatic language and dialect converters. Clearly, the features useful for converters are also applicable when the application involves transformation of source code.

16.
Many tools designed to help programmers view and manipulate source code exploit the formal structure of the programming language. Language-based tools use information derived via linguistic analysis to offer services that are impractical for purely text-based tools. In order to be effective, however, language-based tools must be designed to account properly for the documentary structure of source code: a structure that is largely orthogonal to the linguistic but no less important. Documentary structure includes, in addition to the language text, all extra-lingual information added by programmers for the sole purpose of aiding the human reader: comments, white space, and choice of names. Largely ignored in the research literature, documentary structure occupies a central role in the practice of programming. An examination of the documentary structure of programs leads to a better understanding of requirements for tool architectures.
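The linguistic/documentary split is easy to see mechanically. A small sketch (my illustration, not the paper's) using Python's own tokenizer to separate language tokens from comments and blank structure:

```python
import io
import tokenize

def documentary_tokens(source):
    """Split a source string into linguistic tokens and documentary ones
    (comments and blank-line structure) -- the distinction the paper
    argues tools must preserve."""
    ling, doc = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            doc.append(tok.string)
        elif tok.type not in (tokenize.NEWLINE, tokenize.INDENT,
                              tokenize.DEDENT, tokenize.ENDMARKER):
            ling.append(tok.string)
    return ling, doc

src = "# running total\ntotal = 0  # the name and comment are documentary\n"
print(documentary_tokens(src))
```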

17.

18.
19.
Although the number of malicious code samples is growing explosively, genuinely new malicious code is rare; most samples are variants of existing code. By studying the behavioral characteristics of malicious code, this paper proposes a method for determining malware homology. Starting from behavioral features, namely sensitive dangerous behaviors and the code flow and function calls that produce them, concrete features are extracted with a disassembler, similarity measures between different malware samples are computed, and homology analysis and comparison are performed; the DBSCAN clustering algorithm then groups malicious code with identical or similar features into malware families. A prototype system was designed and implemented, and experimental results show that the proposed method can effectively analyze and determine the homology of different malicious code samples and their variants.
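A minimal sketch of the clustering step, assuming behavior features have already been extracted into numeric vectors (the feature vectors and DBSCAN parameters below are made up):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical behavior feature vectors per sample, e.g. counts of
# sensitive API calls (file write, registry edit, network send, ...).
features = np.array([[9, 1, 7],
                     [8, 1, 6],   # near-duplicate of sample 0: a variant
                     [0, 5, 1],
                     [1, 5, 0],   # variant of sample 2
                     [4, 9, 9]])  # outlier: no close relatives

labels = DBSCAN(eps=2.0, min_samples=2).fit_predict(features)
print(labels)  # [0 0 1 1 -1]: two families plus one unclustered sample
```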

20.
To improve the efficiency of detecting similarity between source programs, a similar-code detection algorithm based on sequence clustering is proposed. The algorithm first segments the source code according to its structure, then applies partial code transformations to each segment, and finally clusters the resulting symbol sequences using a weighted edit distance as the similarity measure, yielding similar code fragments and thereby detecting similar functionality in the source programs. Experiments on several real and synthetic programs validate the effectiveness and scalability of the algorithm.
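A minimal sketch of the similarity measure the clustering step is built on, a weighted edit distance over symbol sequences (the operation weights and token sequences are illustrative):

```python
def weighted_edit_distance(a, b, w_ins=1.0, w_del=1.0, w_sub=1.5):
    """Edit distance over token sequences with per-operation weights,
    computed by standard dynamic programming."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * w_del
    for j in range(1, n + 1):
        d[0][j] = j * w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,      # delete a[i-1]
                          d[i][j - 1] + w_ins,      # insert b[j-1]
                          d[i - 1][j - 1] + sub)    # match or substitute
    return d[m][n]

x = ["id", "=", "id", "+", "num"]
y = ["id", "=", "id", "-", "num"]
print(weighted_edit_distance(x, y))  # 1.5: one substitution
```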
