Similar literature
 20 similar documents found
1.
Model checking and static analysis are traditionally seen as two separate approaches to software analysis and verification. In this work we define a model checking approach for the static analysis of large C/C++ source code bases to detect potential run-time issues such as program crashes, security vulnerabilities and memory leaks. Working at the intersection of software model checking and automated static bug detection for real-life systems, we address a number of issues: how to scale to real-life systems of 1,000,000 LoC or more, how to quickly write new checks, and most importantly how to distinguish between relevant and irrelevant bugs and fine-tune the analysis accordingly. We define our model checking-based static analysis approach as implemented in our tool Goanna, illustrate a number of design and implementation decisions made to obtain practical outcomes and relevant results, and support our findings with empirical data obtained from regularly analyzing large industrial and open source code bases such as the Firefox Web browser.

2.
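With the growth of software ecosystems and open source communities, code is frequently copied, propagated, and evolved across multiple software systems, which introduces uncertainty and risk into software quality. Efficiently locating the possible origins of a system's code is therefore an active research topic. This paper proposes a code provenance analysis method based on code clone detection: the target software's code is split into method-level fragments, each fragment is converted into a bag of words, and parallel clone detection is run against a large-scale code repository, yielding provenance analysis at method granularity. Based on this method, a code provenance analysis tool was designed and implemented. The tool automatically analyzes the possible origins of the code of a project under test across multiple projects and versions in the code repository. Experimental results show that the system effectively finds provenance information for the target project in a large-scale code base, helping maintainers understand and maintain the code.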
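The pipeline described in this abstract (method-level fragments, bag-of-words conversion, parallel comparison against a corpus) can be illustrated with a minimal sketch. This is not the authors' tool: the tokenizer, the overlap similarity measure, the 0.8 threshold, the thread pool, and all names and sample data below are assumptions chosen for illustration, and the methods are assumed to have been split out already.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tokenizer: identifiers, keywords, and integer literals only.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+")


def bag_of_words(method_source: str) -> Counter:
    """Turn one method's source text into a multiset of tokens."""
    return Counter(TOKEN_RE.findall(method_source))


def overlap_similarity(a: Counter, b: Counter) -> float:
    """Shared token count divided by the size of the larger bag (0.0 to 1.0)."""
    larger = max(sum(a.values()), sum(b.values()))
    if larger == 0:
        return 0.0
    return sum((a & b).values()) / larger


def find_origins(target_methods, corpus, threshold=0.8, workers=4):
    """Map each target method name to corpus entries at or above the threshold.

    target_methods: dict of method name -> source text
    corpus: dict of (project, version, method name) -> source text
    """
    corpus_bags = {key: bag_of_words(src) for key, src in corpus.items()}

    def match(item):
        name, src = item
        bag = bag_of_words(src)
        hits = []
        for key, other in corpus_bags.items():
            score = overlap_similarity(bag, other)
            if score >= threshold:
                hits.append((key, score))
        # Best candidates first.
        return name, sorted(hits, key=lambda hit: -hit[1])

    # Each target method is compared against the corpus in its own task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(match, target_methods.items()))


# Made-up sample data: one target method and a two-entry corpus.
target = {"add": "int add(int a, int b) { return a + b; }"}
corpus = {
    ("libmath", "1.0", "sum"): "int sum(int a, int b) { return a + b; }",
    ("libio", "2.1", "read"): "int read(char *buf, int n) { return fread(buf, 1, n, stdin); }",
}
result = find_origins(target, corpus)
print(result)
```

In this toy run only the `libmath` method clears the threshold, because it differs from the target in a single token. A real system would need a proper parser to split methods and an index to avoid comparing every pair.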

3.
李广威, 袁挺, 李炼. 《软件学报》 (Journal of Software), 2022, 33(6): 2061-2081
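Static software defect detection is an active research topic in software security. As software written in C/C++ grows in scale and complexity and iterates ever faster, static defect detection has drawn increasing attention from industry and academia because it can find latent defects without running the target code. Many detection tools based on static analysis have emerged in recent years and play a significant role in software projects across domains, yet developers still lack confidence in them. A high false positive rate is the primary obstacle to the adoption of C/C++ static defect detection tools. We therefore selected relatively mature open source C/C++ static defect detection tools and studied in depth how well they detect specific defect types on the Juliet benchmark test suite and on 37 well-maintained open source projects. Drawing on the tools' implementations, we summarized the key causes of their false positives. In addition, by studying the version-to-version evolution of these tools, we identified current directions and future trends in static analysis tools, which should help optimize and advance static analysis techniques and thereby promote the widespread adoption of static defect detection tools.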

4.
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.

5.
Metamorphic software changes its internal structure across generations with its functionality remaining unchanged. Metamorphism has been employed by malware writers as a means of evading signature detection and other advanced detection strategies. However, code morphing also has potential security benefits, since it can serve to increase the “genetic diversity” of software. We have created a metamorphic code generator within the LLVM compiler framework. LLVM is a three-phase compiler that supports multiple source languages and target architectures. It uses a common intermediate representation (IR) bytecode in its optimizer. Consequently, any supported high-level programming language is transformed to this IR bytecode as part of the LLVM compilation process. Our metamorphic generator functions at the IR bytecode level, which provides many advantages over morphing at the assembly or source code level. The morphing techniques that we employ include dead code insertion and transposition, where the dead code is actually executed within the morphed code, making its detection and removal more challenging. We have verified the effectiveness of our code morphing using hidden Markov model analysis.

6.
Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and predict maintenance problems in software projects. Manual categorization is expensive, tedious, and laborious, which is why automatic categorization approaches are gaining widespread importance. Unfortunately, for various legal and organizational reasons, the applications’ source code is often not available, making it difficult to categorize these applications automatically. In this paper, we propose a novel approach in which we use Application Programming Interface (API) calls from third-party libraries for automatic categorization of software applications that use these API calls. Our approach is general: because API calls can be extracted from both source code and bytecode, it enables different categorization algorithms to be applied to repositories that contain both source code and bytecode of applications. We compare our approach to a state-of-the-art approach that uses machine learning algorithms for software categorization, and conduct experiments on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository with 745 applications, where the source code was not available. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code using a small number of API calls as attributes, and we carry out a comprehensive empirical evaluation of automatic categorization approaches.

7.
Context: Static analysis of source code is a scalable method for discovery of software faults and security vulnerabilities. Techniques for static code analysis have matured in the last decade and many tools have been developed to support automatic detection. Objective: This research work is focused on empirical evaluation of the ability of static code analysis tools to detect security vulnerabilities, with the objective of better understanding their strengths and shortcomings. Method: We conducted an experiment that used the benchmarking test suite Juliet to evaluate three widely used commercial tools for static code analysis. Using a design-of-experiments approach to conduct the analysis and evaluation, and including statistical testing of the results, are unique characteristics of this work. In addition to the controlled experiment, the empirical evaluation included case studies based on three open source programs. Results: Our experiment showed that 27% of C/C++ vulnerabilities and 11% of Java vulnerabilities were missed by all three tools. Some vulnerabilities were detected by only one tool or a combination of two tools; 41% of C/C++ and 21% of Java vulnerabilities were detected by all three tools. More importantly, static code analysis tools did not show a statistically significant difference in their ability to detect security vulnerabilities for either C/C++ or Java. Interestingly, all tools had median and mean of the per-CWE recall values and overall recall across all CWEs close to or below 50%, which indicates comparable or worse performance than random guessing.
While for C/C++ vulnerabilities one of the tools had better performance in terms of probability of false alarm than the other two tools, there was no statistically significant difference among the tools’ probability of false alarm for Java test cases. Conclusions: Despite recent advances in methods for static code analysis, the state-of-the-art tools are not very effective in detecting security vulnerabilities.

8.
As software systems become increasingly massive, the advantages of automated transformation tools are clearly evident. These tools allow the machine to both reason about and manipulate high-level source code. They enable off-loading of mundane and laborious programming tasks from human developer to machine, thereby reducing cost and development time frames. Although there has been much work in software transformation, there still exist many hurdles in realizing this technology in a commercial domain. From our own experience, there are two significant problems that must be addressed before transformation technology can be usefully applied in a commercial setting. These are: (1) Avoiding disruption of the style (i.e., layout and commenting) of source code and the introduction of any undesired modifications that can occur as a side effect of the transformation process. (2) Correct automated handling of C preprocessing and the presentation of a semantically correct view of the program during transformation. Many existing automated transformation tools require source to be manually modified so that preprocessing constructs can be parsed. The real semantics of the program remain obscured, resulting in the need for complicated analysis during transformation. Many systems also resort to pretty printing to generate transformed programs, which inherently disrupts coding style. In this paper we describe our own C/C++ transformation system, Proteus, that addresses both these issues. It has been tested on millions of lines of commercial C/C++ code and has been shown to meet the stringent criteria laid out by Lucent’s own software developers.

9.
Model-based development (MBD) holds the promise to capture potential timing problems in embedded software during the early phases of the development, securing the production of bug-free embedded software. For most MBD approaches, the source code is just an intermediate artifact that can be generated automatically from the models. This assumption clashes with an undeniable fact: a large share of commercial embedded software exploits existing libraries or is developed natively in C/C++. A way to reconcile the ambitions of MBD with the use of a programming language is to offer new language constructs and an innovative compilation tool-chain that prevent model errors and timing problems “by construction.” However, the persistent popularity of C/C++ among embedded programmers and the limited availability of tools have severely limited the uptake of alternative programming languages for embedded software. Therefore, we propose an original route. Our language proposal, named Tice, has been shaped as a C++ active library. Tice retains full compatibility with existing C++ code, which can be integrated easily into new Tice-based projects. Tice syntax and semantics can be enforced by a standard C++ compiler, forgoing the need for new tools. In this article, we describe Tice's syntax, semantics, and model of computation and communication. We demonstrate Tice's practical applicability on an industrial-scale use case and give ample evidence for Tice's efficient compilation using off-the-shelf C++ compilers. Finally, we show Tice's code generation process.

10.
There has been an ongoing trend toward collaborative software development using open and shared source code published in large software repositories on the Internet. While traditional source code analysis techniques perform well in single-project contexts, new types of source code analysis techniques are emerging that focus on global source code analysis challenges. In this article, we discuss how the Semantic Web can become an enabling technology that provides a standardized, formal, and semantically rich representation for modeling and analyzing large global source code corpora. Furthermore, inference services and other services provided by Semantic Web technologies can be used to support a variety of core source code analysis techniques, such as semantic code search, call graph construction, and clone detection. We introduce SeCold, the first publicly available online linked-data source code dataset for software engineering researchers and practitioners. Along with its dataset, SeCold also provides some Semantic Web enabled core services to support the analysis of Internet-scale source code repositories. We illustrate through several examples how this linked data, combined with Semantic Web technologies, can be harvested for different source code analysis tasks to support software trustworthiness. For the case studies, we combine both our linked-data set and Semantic Web enabled source code analysis services with knowledge extracted from StackOverflow, a crowdsourcing website. In these case studies, we demonstrate that our approach is not only capable of crawling, processing, and scaling to traditional types of structured data (e.g., source code), but also supports emerging non-structured data sources, such as crowdsourced information (e.g., StackOverflow.com), to support a global source code analysis context.

11.
Mining social networks from software repositories is becoming a popular research area. Mining approaches often use technical artifacts, such as source code, or communication artifacts, such as emails, to create social networks. The authors describe a repository-independent approach to mining task-based communication social networks. In their approach, collaborative tasks that tools record in software engineering repositories provide the context for the constructed networks, which link developers if they have communicated about a collaborative task. The authors demonstrate the applicability of their approach through two research studies that mined the IBM Rational Jazz development repository. They then propose practical applications that use their approach to directly support development projects.

12.
A system for analyzing program structures is described. The system extracts relational information from C programs according to a conceptual model and stores the information in a database. It is shown how several interesting software tasks can be performed by using the relational views. These tasks include generation of graphical views, subsystem extraction, program layering, dead code elimination, and binding analysis.

13.
The safety and reliability of software is influenced by the choice of implementation language and the choice of programming idioms. C++ is gaining popularity as the implementation language of choice for large software projects because of its promise to reduce the complexity and cost of their construction. But is C++ an appropriate choice for such projects? An assessment of how well C++ fits into recent software guidelines for safety-critical systems is presented along with a collection of techniques and idioms for the construction of safer C++ code.

14.
When analyzing legacy code, generating a high-level model of an application during the reverse engineering process helps developers understand how the application is structured and how dependencies relate the different software entities. Within the context of procedural programming languages (such as C), the existing approaches to obtaining a model of the code require documentation and/or implicit knowledge that stakeholders acquire while building the software. These approaches use the code itself to build a syntactic model that shows the different software artifacts, such as variables, functions, and modules. However, there is no supporting methodology to detect and analyze whether there are relationships or dependencies between those artifacts, such as which variable in a module is declared using an abstract data type described in another one, or which functions use parameters typed with an abstract data type; nor to recover design decisions taken by the original developers, such as how functions were distributed across modules. On the other hand, current developers use the object-oriented (OO) paradigm to implement not only business applications but also useful methodologies and tools that allow semiautomatic analysis of any application. We note that legacy procedural code still has value and remains in use in several industries, and, as with any evolving code, developers have to be able to perform maintenance tasks while minimizing the limitations imposed by the language. Based on useful properties that the OO paradigm (and its supporting analysis tools) provides, such as UML models, we propose M2K as a methodology to generate a high-level model from legacy procedural code, mainly written in ANSI C. Understanding how C-based applications were implemented is not a new problem in software reengineering.
However, our contribution is based on building an OO model and suggesting different refactorings that help the developer to improve it and to eventually guide a new implementation of the target application. Specifically, the methodology builds cohesive software entities mapped from procedural code and makes the coupling between C entities explicit in the high-level model. The result of our methodology is a set of refactored class candidates: a structure that groups a set of variables and a set of functions obtained from the C applications. Based on the class candidate model, we propose refactorings based on OO design principles to improve the design of the application. The most relevant design improvements were obtained with algorithm abstraction by applying the strategy pattern, attribute/method relocation, variable type generalization, and removing/renaming methods/attributes. Besides a methodology and the supporting tool, we provide 14 case studies based on real projects implemented in C, and we show how the results validate our proposal.

15.
李玫, 高庆, 马森, 张世琨, 胡文蕙, 张兴明. 《软件学报》 (Journal of Software), 2021, 32(7): 2242-2259
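Code similarity detection is one of the fundamental tasks in software engineering and plays an important role in plagiarism detection, license violation detection, software reuse analysis, and vulnerability discovery. As open source software spreads and the volume of open source code grows rapidly, open source code is used ever more often across domains, posing new challenges for traditional code similarity detection methods. Some existing methods based on lexical, syntactic...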

16.
Quantitative empirical software engineering research benefits greatly from processing large open source software repository data sets. The diversity of repository management tools and the long history of some projects render the task of working with those data sets tedious and error-prone. The Alitheia Core analysis platform preprocesses repository data into an intermediate format that allows researchers to provide custom analysis tools. Alitheia Core automatically distributes the processing load across multiple processors while enabling programmatic access to the raw data, the metadata, and the analysis results. The tool has been successfully applied to hundreds of medium to large-sized open-source projects, enabling large-scale empirical studies.

17.
In this article, we present CLAM, a C++ software framework that offers a complete development and research platform for the audio and music domain. It offers an abstract model for audio systems and includes a repository of processing algorithms and data types as well as all the necessary tools for audio and control input/output. The framework offers tools that enable the exploitation of all these features to easily build cross-platform applications or rapid prototypes for media processing algorithms and systems. Furthermore, the included ready-to-use applications can be used for tasks such as audio analysis/synthesis, plug-in development, feature extraction or metadata annotation. CLAM represents a step forward over other similar existing environments in the multimedia domain. Nevertheless, it also shares models and constructs with many of those. These commonalities are expressed in the form of a metamodel for multimedia processing systems and a design pattern language.

18.
Open source software systems are becoming increasingly important these days. Many companies are investing in open source projects and many of them are also using such software in their own work. But because open source software is often developed with a different management style than industrial software, the quality and reliability of the code need to be studied. Hence, the characteristics of the source code of these projects need to be measured to obtain more information about it. This paper describes how we calculated the object-oriented metrics given by Chidamber and Kemerer to illustrate how fault-proneness detection can be carried out on the source code of Mozilla, the open source Web and e-mail suite. We checked the values obtained against the number of bugs found in its bug database, Bugzilla, using regression and machine learning methods to validate the usefulness of these metrics for fault-proneness prediction. We also compared the metrics of several versions of Mozilla to see how the predicted fault-proneness of the software system changed during its development cycle.

19.
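The emergence of code search engines (CSEs) and the growing number of open source projects on the Internet allow software developers to reuse large amounts of existing source code during development. However, most developers use CSEs only for simple searches for relevant code. This paper presents a general pattern mining process model that makes full use of CSEs and ensures the quality of reused code by mining source code patterns, and explains in detail how the process model assists software quality improvement in three respects.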

20.
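The emergence of code search engines (CSEs) and the growing number of open source projects on the Internet allow software developers to reuse large amounts of existing source code during development. However, most developers use CSEs only for simple searches for relevant code. This paper presents a general pattern mining process model that makes full use of CSEs and ensures the quality of reused code by mining source code patterns, and explains in detail how the process model assists software quality improvement in three respects.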
