Similar Documents
20 similar documents found (search time: 46 ms)
1.
Model checking is an effective technique used to identify subtle problems in software safety using a comprehensive search algorithm. However, this comprehensiveness requires a large number of resources and is often too expensive to be applied in practice. This work strives to find a practical solution to model-checking automotive operating systems for the purpose of safety analysis, with minimum requirements and a systematic engineering approach for applying the technique in practice. The paper presents methods for converting the Trampoline kernel code into formal models for the model checker SPIN, a series of experiments using an incremental verification approach, and the use of embedded C constructs for performance improvement. The conversion methods include functional modularization and treatment for hardware-dependent code, such as memory access for context switching. The incremental verification approach aims at increasing the level of confidence in the verification even when comprehensiveness cannot be provided because of the limitations of the hardware resource. We also report on potential safety issues found in the Trampoline operating system during the experiments and present experimental evidence of the performance improvement using the embedded C constructs in SPIN. Copyright © 2012 John Wiley & Sons, Ltd.
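As a rough illustration of the incremental verification idea described above, the following Python sketch re-runs a bounded search with growing depth bounds so that confidence increases with each completed bound; the toy transition system and property are invented stand-ins, not the paper's Trampoline models or SPIN itself.

```python
# A minimal sketch of incremental verification: when exhaustive
# exploration is infeasible, repeat a bounded search with larger and
# larger depth bounds. The system and property below are hypothetical.
from collections import deque

def bounded_check(initial, successors, violates, depth_bound):
    """Explore states up to depth_bound; return a counterexample path or None."""
    seen = {initial}
    queue = deque([(initial, [initial])])
    while queue:
        state, path = queue.popleft()
        if violates(state):
            return path                      # property violation found
        if len(path) > depth_bound:
            continue                         # cut off: this search is incomplete
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

# Toy model: a counter that must never reach 7.
succ = lambda s: [s + 1, s * 2]
for bound in (4, 8, 16):                     # incremental deepening
    cex = bounded_check(0, succ, lambda s: s == 7, bound)
    print(f"bound={bound}: {'counterexample ' + str(cex) if cex else 'no violation'}")
```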

2.
The term "vulnerability" has accompanied the development of computer software for decades. Since the world's first software vulnerability was disclosed, software security researchers and engineers have been exploring methods for discovering and analyzing vulnerabilities. Static analysis of source code vulnerabilities is a technique that spans the entire software development life cycle, helps developers find vulnerabilities early, and is widely used in industry. However, as software grows ever larger and its functionality ever more complex, how to represent and model source code has become a difficult open problem. In addition, recent research tends to combine static source code vulnerability analysis with machine learning, attempting to improve the precision of vulnerability discovery by introducing learning models; how to select and construct a suitable learning model is a core question in this line of work. This survey focuses on static analysis techniques for source code vulnerabilities (hereafter: static analysis techniques) and, by reviewing related work in the field, divides the research into two directions: traditional static analysis and learning-based static analysis. Traditional static analysis models and analyzes source code using a family of software analysis techniques such as data-flow analysis and taint analysis; learning-based static analysis represents source code numerically, submits it to a learning model, and uses the model to mine deep representational features and correlations in the code. The paper first explains the basic concepts of software vulnerability analysis and compares the strengths and weaknesses of static and dynamic analysis; it then surveys methods for representing source code...
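As a hedged illustration of the learning-based direction the survey describes, the sketch below encodes toy code snippets numerically with a bag-of-tokens representation and trains a classifier on them; the snippets and labels are placeholders rather than a real vulnerability corpus, and scikit-learn availability is assumed.

```python
# A minimal sketch of the "learning-based static analysis" pipeline:
# represent source snippets numerically, then let a model learn
# correlations with vulnerability labels. Toy data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

snippets = [
    "strcpy ( dst , src ) ;",                # classic unchecked copy
    "strncpy ( dst , src , sizeof ( dst ) ) ;",
    "gets ( buf ) ;",
    "fgets ( buf , sizeof ( buf ) , stdin ) ;",
]
labels = [1, 0, 1, 0]                        # 1 = vulnerable (toy ground truth)

# Bag-of-tokens is the simplest code representation; real systems use
# ASTs, data-flow graphs, or code property graphs instead.
vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(snippets)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["strcpy ( p , q ) ;"])))  # classify an unseen snippet
```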

3.
A software repository provides a central information source for understanding and reengineering code in a software project. Complex reverse engineering tools can be built by analyzing information stored in the repository without reparsing the original source code. The most critical design aspect of a repository is its data model, which directly affects how effectively the repository supports various analysis tasks. This paper focuses on the design rationales behind a data model for a C++ software repository that supports reachability analysis and dead code detection at the declaration level. These two tasks are frequently needed in large software projects to help remove excess software baggage, select regression tests and support software reuse studies. The language complexity introduced by class inheritance, friendship, and template instantiation in C++ requires a carefully designed model to catch all necessary dependencies for correct reachability analysis. We examine the major design decisions and their consequences in our model and illustrate how future software repositories can be evaluated for completeness at a selected abstraction level. Examples are given to illustrate how our model also supports variants of reachability analysis: impact analysis, class visibility analysis, and dead code detection. Finally, we discuss the implementation and experience of our analysis tools on a few C++ software projects.
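A minimal sketch of declaration-level reachability analysis and dead code detection over a repository-style dependency graph follows; the declarations and edges are hypothetical, and a faithful C++ model would also record the inheritance, friendship, and template-instantiation dependencies the paper discusses.

```python
# A minimal sketch of declaration-level reachability: declarations not
# reachable from a root (e.g., main) are reported as dead code.
deps = {                                      # declaration -> declarations it uses
    "main":         ["Widget::draw", "log"],
    "Widget::draw": ["Shape::area"],
    "Shape::area":  [],
    "log":          [],
    "Widget::old":  ["helper"],               # never reached from a root
    "helper":       [],
}

def reachable(roots, deps):
    seen, stack = set(), list(roots)
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(deps.get(d, []))
    return seen

live = reachable(["main"], deps)
dead = set(deps) - live
print("dead declarations:", sorted(dead))     # -> ['Widget::old', 'helper']
```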

4.
Despite many advances, today’s software model checkers and extended static checkers still do not scale well to large code bases when verifying properties that depend on complex interprocedural flow of data. An obvious approach to improve performance is to exploit software structure. Although a tremendous amount of work has been done on exploiting structure at various levels of granularity, the fine-grained shared structure among multiple verification conditions has been largely ignored. In this paper, we formalize the notion of shared structure among verification conditions and propose a novel and efficient approach to exploit this sharing by safely reusing facts learned while checking one verification condition to help solve the others. Experimental results show that this approach can improve the performance of verification, even on path- and context-sensitive and dataflow-intensive properties.
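A simplified sketch of the reuse idea, assuming the z3-solver package (Z3 stands in for any SMT backend): two verification conditions share a sub-obligation, and the result of discharging it once is cached rather than re-proved. The paper's sharing is finer-grained and comes with safety conditions; this only memoizes identical subformulas.

```python
# A minimal sketch of fact reuse across verification conditions (VCs):
# identical sub-obligations are discharged once and the result is cached.
from z3 import Int, Implies, Not, Solver, unsat

x = Int("x")
lemma = Implies(x > 0, x + 1 > 0)                 # sub-obligation shared by both VCs
vc1 = [lemma, Implies(x > 5, x > 0)]              # a VC = conjunction of obligations
vc2 = [lemma, Implies(x > 9, x > 0)]

cache, solver_calls = {}, 0

def is_valid(f):
    global solver_calls
    key = f.sexpr()                               # structural key enables reuse
    if key not in cache:
        solver_calls += 1
        s = Solver()
        s.add(Not(f))                             # f is valid iff its negation is unsat
        cache[key] = (s.check() == unsat)
    return cache[key]

for name, vc in (("vc1", vc1), ("vc2", vc2)):
    print(name, "holds:", all(is_valid(f) for f in vc))
print("solver calls:", solver_calls, "(4 obligations, 3 distinct)")
```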

5.
Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system.
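A minimal sketch of the underlying mechanics, assuming scikit-learn: fit a topic model on identifier streams and track a per-version topic-weight metric, in the spirit of the paper's topic-evolution metrics. The documents are toy data, not the JHotDraw or jEdit histories.

```python
# A minimal sketch of topic evolution: discover topics in code vocabulary
# and watch how each topic's total weight changes across versions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

versions = {
    "v1": ["open file read buffer close file", "draw shape canvas paint"],
    "v2": ["open file read write buffer flush close", "draw shape canvas paint color"],
}
vec = CountVectorizer()
X = vec.fit_transform(doc for docs in versions.values() for doc in docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Metric: total weight of each topic per version (its evolution over time).
for name, docs in versions.items():
    weights = lda.transform(vec.transform(docs)).sum(axis=0)
    print(name, [round(w, 2) for w in weights])
```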

6.
H. Sözer 《Software》2015, 45(10): 1359–1373
Static code analysis tools automatically generate alerts for potential software faults that can lead to failures. However, these tools usually generate a very large number of alerts, some of which are subject to false positives. Because of limited resources, it is usually hard to inspect all the alerts. As a complementary approach, runtime verification techniques verify dynamic system behavior with respect to a set of specifications. However, these specifications are usually created manually based on system requirements and constraints. In this paper, we introduce a novel approach and a toolchain for integrated static code analysis and runtime verification. Alerts that are generated by static code analysis tools are utilized for automatically generating runtime verification specifications. Conversely, runtime verification results are used for automatically generating filters for static code analysis tools to eliminate false positives. The approach is illustrated for the static analysis and runtime verification of open-source bibliography reference manager software. Copyright © 2014 John Wiley & Sons, Ltd.
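A toy sketch of the feedback loop just described: each static alert becomes a runtime check, and alerts whose checks never fire during execution are fed back as candidate false-positive filters. The alert format and predicates are hypothetical, far simpler than the paper's toolchain.

```python
# A minimal sketch: static alerts -> runtime monitors -> suppression filters.
alerts = [
    {"id": "A1", "site": "div", "check": lambda args: args[1] == 0},          # div-by-zero
    {"id": "A2", "site": "idx", "check": lambda args: args[1] >= len(args[0])},
]

fired = set()

def monitored(site, fn):
    """Wrap fn so every matching alert is verified on each call."""
    def wrapper(*args):
        for a in alerts:
            if a["site"] == site and a["check"](args):
                fired.add(a["id"])            # alert confirmed at runtime
        return fn(*args)
    return wrapper

div = monitored("div", lambda a, b: a / b if b else None)
idx = monitored("idx", lambda xs, i: xs[i] if i < len(xs) else None)

div(4, 2); idx([1, 2, 3], 1)                  # benign runs only
suppress = [a["id"] for a in alerts if a["id"] not in fired]
print("filter these as likely false positives:", suppress)
```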

7.
There has been an ongoing trend toward collaborative software development using open and shared source code published in large software repositories on the Internet. While traditional source code analysis techniques perform well in single-project contexts, new types of source code analysis techniques are emerging that focus on global source code analysis challenges. In this article, we discuss how the Semantic Web can become an enabling technology providing standardized, formal, and semantically rich representations for modeling and analyzing large global source code corpora. Furthermore, inference services and other services provided by Semantic Web technologies can be used to support a variety of core source code analysis techniques, such as semantic code search, call graph construction, and clone detection. In this paper, we introduce SeCold, the first publicly available online linked data source code dataset for software engineering researchers and practitioners. Along with its dataset, SeCold also provides some Semantic Web enabled core services to support the analysis of Internet-scale source code repositories. We illustrate through several examples how this linked data, combined with Semantic Web technologies, can be harvested for different source code analysis tasks to support software trustworthiness. For the case studies, we combine our linked dataset and Semantic Web enabled source code analysis services with knowledge extracted from StackOverflow, a crowdsourcing website. Through these case studies, we demonstrate that our approach is not only capable of crawling, processing, and scaling to traditional types of structured data (e.g., source code), but also supports emerging non-structured data sources, such as crowdsourced information (e.g., StackOverflow.com), to support a global source code analysis context.
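A minimal sketch of code facts as linked data, assuming the rdflib package; the `ex:` vocabulary is hypothetical, not SeCold's actual ontology. Call-graph construction then becomes a SPARQL query over the triples.

```python
# A minimal sketch of source code facts as RDF triples queried with SPARQL.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/code#")
g = Graph()
g.add((EX.parse_json, EX.definedIn, EX.FileA))
g.add((EX.parse_json, EX.calls, EX.read_file))
g.add((EX.load_cfg,   EX.calls, EX.read_file))
g.add((EX.read_file,  EX.definedIn, EX.FileB))

# Call-graph construction as a query: who calls read_file?
q = """
PREFIX ex: <http://example.org/code#>
SELECT ?caller WHERE { ?caller ex:calls ex:read_file . }
"""
for row in g.query(q):
    print(row.caller)
```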

8.
Software model checkers quickly reach their limits when being applied to verifying pointer safety properties in source code that includes function pointers and inlined assembly. This article introduces a novel technique for checking pointer safety violations, called symbolic object code analysis (SOCA), which is based on bounded symbolic execution, incorporates path-sensitive slicing, and employs the SMT solver Yices as its execution and verification engine. Extensive experimental results of a prototypic SOCA Verifier, using the Verisec suite and almost 10,000 Linux device driver functions as benchmarks, show that SOCA performs competitively to modern source-code model checkers, scales well when applied to real operating systems code and pointer safety issues, and effectively explores niches of pointer-complex software that current software verifiers do not reach.
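A minimal sketch of the core mechanism, bounded symbolic execution backed by an SMT solver, for one pointer-safety check. SOCA uses Yices; the z3-solver package is substituted here purely for illustration.

```python
# A minimal sketch of an SMT-backed pointer-safety check: is there an
# input reaching this path that indexes past the end of a 16-byte object?
from z3 import BitVec, Solver, And, sat

# Symbolic model of `buf[i] = 0;` where buf has 16 bytes and i is an
# unconstrained 32-bit input on this path.
i = BitVec("i", 32)
path_condition = i >= 0                       # constraints collected along the path
unsafe = And(path_condition, i >= 16)         # access past the 16-byte object

s = Solver()
s.add(unsafe)
if s.check() == sat:
    print("pointer-safety violation possible, e.g. i =", s.model()[i])
```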

9.
In recent decades, computer hardware performance and the scale of software have grown far beyond what they once were, and computing now underpins almost every aspect of social life and production. The rapid development of computer technology has also drawn attention to program security. A large amount of legacy software remains on the market that is unmaintained and lacks source code, so its security is a real concern; binary analysis techniques are used to address such software. By detection approach, binary analysis techniques fall into three classes: static binary code analysis, dynamic binary code analysis, and hybrid static/dynamic binary code analysis. This paper surveys recent research in the field of binary code security analysis, elaborates the main methods in each of the three classes, and describes their key techniques in detail.

10.
When reengineering software systems, maintainers should be able to assess and compare multiple change scenarios for a given goal, so as to choose the most pertinent one. Because they implicitly consider one single working copy, revision control systems do not scale up well to perform simultaneous analyses of multiple versions of systems. We designed Orion, an interactive prototyping tool for reengineering, to simulate changes and compare their impact on multiple versions of software source code models. Our approach offers an interactive simulation of changes, reuses existing assessment tools, and has the ability to hold multiple and branching versions simultaneously in memory. Specifically, we devise an infrastructure which optimizes memory usage of multiple versions for large models. This infrastructure uses an extension of the FAMIX source code meta-model but it is not limited to source code analysis tools since it can be applied to models in general. In this paper, we validate our approach by running benchmarks on memory usage and computation time of model queries on large models. Our benchmarks show that the Orion approach scales up well in terms of memory usage, while the current implementation could be optimized to lower its computation time. We also report on two large case studies on which we applied Orion.
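A minimal sketch of the memory optimization Orion's infrastructure provides: branching versions store only their deltas and share all unchanged entities with their parent. The entities here are hypothetical stand-ins for FAMIX model elements.

```python
# A minimal sketch of copy-on-write model versions: each branch stores
# only what changed; lookups fall back to the parent version.
class Version:
    def __init__(self, parent=None):
        self.parent = parent
        self.delta = {}                       # only entities changed here

    def get(self, name):
        v = self
        while v is not None:
            if name in v.delta:
                return v.delta[name]
            v = v.parent                      # unchanged: shared with ancestor
        raise KeyError(name)

    def change(self, name, entity):           # simulate one reengineering change
        child = Version(parent=self)
        child.delta[name] = entity
        return child

base = Version()
base.delta = {"ClassA": {"methods": 12}, "ClassB": {"methods": 3}}
branch1 = base.change("ClassA", {"methods": 8})      # one candidate scenario
branch2 = base.change("ClassB", {"methods": 5})      # a competing scenario
print(branch1.get("ClassB"), branch2.get("ClassA"))  # both still shared from base
```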

11.
Software systems accumulate technical debt (TD) when short-term goals in software development are traded for long-term goals (e.g., a quick-and-dirty implementation to reach a release date versus a well-refactored implementation that supports the long-term health of the project). Some forms of TD accumulate over time in the form of source code that is difficult to work with and exhibits a variety of anomalies. A number of source code analysis techniques and tools have been proposed to identify the code-level debt accumulated in a system. What has not yet been studied is whether using multiple tools to detect TD is beneficial, that is, whether different tools flag the same or different source code components. Further, the symptoms of TD “interest” that such debt leads to have not been investigated for these techniques. To address this latter question, we also investigated whether TD, as identified by the source code analysis techniques, correlates with interest payments in the form of increased defect- and change-proneness. Our goal is to compare the results of different TD identification approaches, to understand their commonalities and differences, and to evaluate their relationship to indicators of future TD “interest.” We selected four different TD identification techniques (code smells, automatic static analysis issues, grime buildup, and Modularity violations) and applied them to 13 versions of the Apache Hadoop open source software project. We collected and aggregated statistical measures to investigate whether the different techniques identified TD indicators in the same or different classes and whether those classes in turn exhibited high interest (in the form of a large number of defects and higher change-proneness). The outputs of the four approaches have very little overlap and therefore point to different problems in the source code. Dispersed Coupling and Modularity violations were co-located in classes with higher defect-proneness. We also observed a strong relationship between Modularity violations and change-proneness. Our main contribution is an initial overview of the TD landscape, showing that different TD techniques are loosely coupled and therefore indicate problems in different locations of the source code. Moreover, our proxy interest indicators (change- and defect-proneness) correlate with only a small subset of TD indicators.
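A minimal sketch of the two comparisons the study performs, assuming scipy: overlap between the classes flagged by different TD detectors, and correlation of flags with an "interest" indicator such as defect counts. All names and numbers are toy placeholders, not Hadoop data.

```python
# A minimal sketch: Jaccard overlap between detectors, and rank
# correlation between TD flags and defect counts per class.
from scipy.stats import spearmanr

smells     = {"A", "B", "C"}                  # classes flagged per technique
modularity = {"C", "D"}
jaccard = len(smells & modularity) / len(smells | modularity)
print(f"overlap (Jaccard): {jaccard:.2f}")    # low overlap -> different problems

classes = ["A", "B", "C", "D", "E"]
flagged = [1, 1, 1, 1, 0]                     # flagged by any technique?
defects = [4, 0, 7, 5, 1]                     # defect counts per class
rho, p = spearmanr(flagged, defects)
print(f"flag/defect correlation: rho={rho:.2f} (p={p:.2f})")
```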

12.
Code generation, defect prediction, and similar tasks built on large-scale open-source code are an important research topic in intelligent software development methods and techniques. However, existing work focuses mainly on the intelligent algorithms themselves, such as recommendation and prediction, and rarely evaluates or analyzes the quality of the data the research uses. Most data for intelligent software development research comes from open-source hosting platforms and, being limited by the skill of the contributing developers, cannot be assumed to be of high quality. Following "garbage in, garbage out", this affects the quality of the final results. The quality of source code data has a major impact on the related research, yet it has not received enough attention. To address this problem, the paper proposes a method-block data quality evaluation approach for large-scale open-source code. It first studies how to define and assess data quality problems in source code extracted from GitHub, and then evaluates open-source code quality along several dimensions. The approach can help researchers build higher-quality datasets and thereby improve the quality of results in intelligence-related research such as code generation and defect prediction.
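A hedged sketch of the kind of screening such a quality evaluation enables: filtering mined method blocks for exact duplicates and near-empty bodies before they enter a research dataset. The criteria are illustrative; the paper's evaluation dimensions are richer.

```python
# A minimal sketch of method-block data-quality screening for a mined
# corpus: drop exact duplicates and trivially small stub bodies.
import hashlib

methods = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a + b",       # duplicate mined from a fork
    "def f():\n    pass",                     # trivially small stub
    "def mean(xs):\n    return sum(xs) / len(xs)",
]

seen, kept = set(), []
for m in methods:
    digest = hashlib.sha1(m.encode()).hexdigest()
    if digest in seen:
        continue                              # drop exact duplicates
    seen.add(digest)
    if len(m.splitlines()) < 2 or "pass" in m:
        continue                              # drop trivially small or stub bodies
    kept.append(m)
print(f"kept {len(kept)} of {len(methods)} method blocks")
```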

13.
With advances in technology it has become feasible to implement a large multiprocessor system on a single chip. In such Systems-on-Chip (SoCs), a significant portion of energy is spent in the memory subsystem. Several approaches exist for reducing this energy, at the physical, architectural, and algorithmic levels. Classical approaches, including algorithmic and some architectural ones, use static analysis and transformation of the application source code. However, it is often not possible to perform static analysis and optimization of a program’s memory access behavior unless the program is written in an easily analyzable form, e.g., free from pointer arithmetic. In this paper, we introduce the FORAY model of a program, which allows aggressive analysis of the application’s memory behavior and enables optimizations on arbitrary code that are not otherwise possible. We then present FORAY-GEN: an automated profile-based approach for extracting the FORAY model from the original program. We also outline our approach to applying FORAY-GEN to multiprocessor SoCs. We demonstrate how FORAY-GEN enhances the applicability of other memory subsystem optimization approaches, resulting in an average twofold increase in the number of memory references that can be analyzed by existing static approaches.
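A toy sketch of profile-based extraction in the FORAY spirit: record the addresses a simulated traversal touches, then recover an affine base/stride form that static memory analyses can handle even when the source used pointer arithmetic. The trace is synthetic; FORAY-GEN works on real program profiles.

```python
# A minimal sketch: recover a regular array-walk model from a profiled
# address trace so that static analyses can reason about it.
trace = [0x1000 + 12 * i for i in range(8)]   # profiled access addresses

def affine_form(addresses):
    """Return (base, stride, count) if the trace is a regular array walk."""
    diffs = {b - a for a, b in zip(addresses, addresses[1:])}
    if len(diffs) == 1:                       # constant stride observed
        return addresses[0], diffs.pop(), len(addresses)
    return None                               # irregular: not representable this way

model = affine_form(trace)
print("extracted access model (base, stride, count):",
      tuple(hex(v) if i == 0 else v for i, v in enumerate(model)))
```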

14.
Context: Static analysis of source code is a scalable method for the discovery of software faults and security vulnerabilities. Techniques for static code analysis have matured in the last decade and many tools have been developed to support automatic detection. Objective: This research work is focused on the empirical evaluation of the ability of static code analysis tools to detect security vulnerabilities, with the objective of better understanding their strengths and shortcomings. Method: We conducted an experiment that consisted of using the benchmarking test suite Juliet to evaluate three widely used commercial tools for static code analysis. Using a design-of-experiments approach to conduct the analysis and evaluation, and including statistical testing of the results, are unique characteristics of this work. In addition to the controlled experiment, the empirical evaluation included case studies based on three open source programs. Results: Our experiment showed that 27% of C/C++ vulnerabilities and 11% of Java vulnerabilities were missed by all three tools. Some vulnerabilities were detected by only one tool or a combination of two tools; 41% of C/C++ and 21% of Java vulnerabilities were detected by all three tools. More importantly, the static code analysis tools did not show a statistically significant difference in their ability to detect security vulnerabilities for either C/C++ or Java. Interestingly, all tools had median and mean per-CWE recall values, and overall recall across all CWEs, close to or below 50%, which indicates performance comparable to or worse than random guessing. While for C/C++ vulnerabilities one of the tools performed better in terms of probability of false alarm than the other two, there was no statistically significant difference among the tools’ probability of false alarm for the Java test cases. Conclusions: Despite recent advances in methods for static code analysis, the state-of-the-art tools are not very effective in detecting security vulnerabilities.
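A minimal sketch of the evaluation measures used in the experiment, per-CWE recall and probability of false alarm over Juliet-style cases with known ground truth; the counts below are invented solely to exercise the formulas.

```python
# A minimal sketch: compute per-CWE recall and probability of false alarm
# from tool reports against known-good/known-flawed test cases.
cases = [  # (cwe, has_real_flaw, tool_reported)
    ("CWE-121", True,  True), ("CWE-121", True,  False),
    ("CWE-121", False, False), ("CWE-89", True,  True),
    ("CWE-89",  False, True),  ("CWE-89", False, False),
]

by_cwe = {}
for cwe, flaw, hit in cases:
    by_cwe.setdefault(cwe, []).append((flaw, hit))

for cwe, rows in by_cwe.items():
    tp = sum(flaw and hit for flaw, hit in rows)
    fn = sum(flaw and not hit for flaw, hit in rows)
    fp = sum(hit and not flaw for flaw, hit in rows)
    tn = sum(not flaw and not hit for flaw, hit in rows)
    recall = tp / (tp + fn)                   # flawed cases correctly reported
    pfa = fp / (fp + tn)                      # clean cases wrongly reported
    print(f"{cwe}: recall={recall:.2f}  P(false alarm)={pfa:.2f}")
```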

15.
Static analysis is an efficient approach for software assurance. Its most effective use is to perform analysis interactively throughout the software development process, which imposes high performance requirements. This paper concentrates on rule-based static analysis tools and proposes an optimized rule-checking algorithm. Our technique improves the performance of static analysis tools by filtering vulnerability rules in terms of characteristic objects before checking source files. Since a source file usually contains vulnerabilities for only a small subset of the rules rather than all of them, our approach can achieve better performance. To investigate our technique’s feasibility and effectiveness, we implemented it in an open source static analysis tool called PMD and used it to conduct experiments. Experimental results show that our approach obtains an average performance improvement of 28.7% compared with the original PMD, while remaining effective and precise in detecting vulnerabilities and introducing no side effects.
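A minimal sketch of the filtering idea: each rule names a characteristic object, and a file is matched against a rule's full pattern only if that object appears in the file at all. The rules are hypothetical and far simpler than PMD's.

```python
# A minimal sketch of rule filtering by characteristic objects: skip the
# expensive pattern match for rules whose object never occurs in the file.
import re

rules = [
    {"name": "no-eval",      "object": "eval", "pattern": r"\beval\s*\("},
    {"name": "no-exec",      "object": "exec", "pattern": r"\bexec\s*\("},
    {"name": "no-bare-open", "object": "open", "pattern": r"\bopen\s*\([^)]*\)\s*$"},
]

def check(source):
    findings = []
    for rule in rules:
        if rule["object"] not in source:
            continue                          # filtered: full match skipped
        if re.search(rule["pattern"], source, re.MULTILINE):
            findings.append(rule["name"])
    return findings

print(check("x = eval(user_input)"))          # -> ['no-eval']
```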

16.
To improve software quality, static or dynamic defect-detection tools accept programming rules as input and detect their violations in software as defects. As these programming rules are often not well documented in practice, previous work developed various approaches that mine programming rules as frequent patterns from program source code. These approaches then use static or dynamic defect-detection techniques to detect pattern violations in the source code under analysis. However, these existing approaches often produce many false positives due to various factors. To reduce the false positives produced by such mining approaches, we develop a novel approach, called Alattin, that includes new mining algorithms and a technique for detecting neglected conditions based on our mining algorithms. Our new mining algorithms mine patterns in four formats: conjunctive, disjunctive, exclusive-disjunctive, and combinations of these. We show the benefits and limitations of these four pattern formats with respect to false positives and false negatives among detected violations by applying the patterns to the problem of detecting neglected conditions.
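A toy sketch of the mining-and-detection step: count support for conditions observed around an API call, form a disjunctive pattern from the frequent ones, and flag call sites satisfying none of the alternatives. Alattin's actual algorithms and pattern formats are considerably richer.

```python
# A minimal sketch of mining guard conditions around an API call and
# flagging call sites that neglect all frequent alternatives.
sites = [  # conditions observed guarding each call to it.next()
    {"hasNext()"},
    {"hasNext()"},
    {"size() > 0"},
    {"hasNext()", "size() > 0"},
    set(),                                    # candidate neglected condition
]

min_support = 0.4
n = len(sites)
conds = {c for s in sites for c in s}
frequent = {c for c in conds if sum(c in s for s in sites) / n >= min_support}
disjunct = frequent                           # pattern: "hasNext() OR size() > 0"

for i, s in enumerate(sites):
    if frequent and not (s & disjunct):
        print(f"site {i}: neglected condition, expected one of {sorted(disjunct)}")
```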

17.
Modular software model checking of large real-world systems is known to require extensive manual effort for environment modelling and for preparing source code for model checking. Avinux is a tool chain that facilitates the automatic analysis of Linux, and especially of Linux device drivers. The tool chain is implemented as a plugin for the Eclipse IDE, using the source code bounded model checker CBMC as its backend. Avinux supports a verification process for Linux that is built upon specification annotations with SLICx (an extension of the SLIC language), automatic data environment creation, source code transformation and simplification, and the invocation of the verification backend. In this paper, technical details of the verification process are presented: using Avinux on thousands of drivers from various Linux versions led to the discovery of six new errors. In these experiments, Avinux also reduced the immense overhead of manual code preprocessing that other projects incurred. Copyright © 2008 John Wiley & Sons, Ltd.

18.
In this paper, we explore the use of system-specific static code analyses with the goal of improving software quality for specific software systems. Specialized analyses, tailored to a particular system, make it possible to take advantage of system and domain knowledge that is not available to more generic analyses. Furthermore, analyses can be selected and developed to best meet the challenges and specific issues of the system at hand. As a result, such analyses can complement more generic code analysis tools, because they are likely to have a greater impact on (business) concerns such as improving certain software quality attributes and reducing certain classes of failures. We present a case study of a large, industrial embedded system, giving examples of the kinds of analyses that could be realized, and we demonstrate the feasibility of implementing such analyses. We synthesize lessons learned from our case study and provide recommendations on how to realize system-specific analyses and how to get them adopted by industry.
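A minimal sketch of what a system-specific analysis can look like, built on Python's ast module. The paired `acquire_bus()`/`release_bus()` convention is hypothetical domain knowledge, the kind of in-house rule a generic tool would not encode.

```python
# A minimal sketch of a system-specific check: every acquire_bus() must
# be paired with release_bus() in the same function.
import ast

SOURCE = """
def good():
    acquire_bus()
    release_bus()

def bad():
    acquire_bus()
    do_work()
"""

def called_names(fn):
    return {n.func.id for n in ast.walk(fn)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}

tree = ast.parse(SOURCE)
for fn in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
    calls = called_names(fn)
    if "acquire_bus" in calls and "release_bus" not in calls:
        print(f"{fn.name}: acquire_bus() without matching release_bus()")
```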

19.
Many of the existing approaches in software comprehension focus on program structure or external documentation. However, by analyzing only formal information, they overlook the informal semantics contained in the vocabulary of source code. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering that groups source artifacts which use similar vocabulary. We call these groups semantic clusters, and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent, as it works at the level of identifier names. To validate our approach we applied it to several case studies, two of which we present in this paper. Note: Some of the visualizations presented make heavy use of colors; please obtain a color copy of the article for better understanding.
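A minimal sketch of the Semantic Clustering pipeline, assuming scikit-learn: build a term-document matrix from identifier vocabulary, reduce it with LSI (truncated SVD), and cluster artifacts that use similar vocabulary. The four artifacts are toy identifier streams.

```python
# A minimal sketch of Semantic Clustering: vocabulary -> LSI -> clusters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

artifacts = {
    "Parser.java":   "token stream parse grammar rule syntax",
    "Lexer.java":    "token scan character stream lex",
    "Canvas.java":   "draw paint shape pixel render",
    "Renderer.java": "render paint shape color pixel",
}
X = TfidfVectorizer().fit_transform(artifacts.values())
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsi)
for name, label in zip(artifacts, labels):
    print(f"cluster {label}: {name}")
```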

20.
The emergence of code search engines (CSEs) and the ever-growing number of open-source projects on the Internet allow software developers to reuse existing source code extensively during development. However, most developers use CSEs only for simple, relevance-based code search. This paper presents a generic paradigm-mining process model that makes fuller use of CSEs, ensuring the quality of reused code by mining source code paradigms, and explains in detail how the process model supports software quality improvement in three respects.
