Similar literature
 20 similar documents found (search time: 8 ms)
1.
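A concurrent program consists of multiple flows that execute concurrently over a shared memory space. Because the execution order among flows is nondeterministic, concurrent software systems are difficult to test. Mutation testing is a fault-based software testing technique that is widely used to evaluate the adequacy of test suites and the effectiveness of testing techniques. A key problem in applying mutation testing to concurrent programs is how to efficiently generate a large set of mutants that simulate concurrency faults. A mutation testing framework for concurrent programs is presented, ...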

2.
Context: Mutation analysis has been widely used in research studies to evaluate the effectiveness of test suites and testing techniques. Faulty versions (i.e., mutants) of a program are generated such that each mutant contains one seeded fault. The mutation score provides a measure of effectiveness.
Objective: We study three problems with the use of mutation analysis for testing AspectJ programs:
  • The manual identification and removal of equivalent mutants is difficult and time consuming. We calculate the percentage of equivalent mutants generated for benchmark AspectJ programs using available mutation tools.
  • The generated mutants need to cover the various fault types described in the literature on fault models for AspectJ programs. We measure the distribution of the mutants generated using available mutation tools with respect to the AspectJ fault types.
  • We measure the difficulty of killing the generated mutants.
We propose the use of simple analysis of the subject programs to prevent the generation of some equivalent mutants.
Method: We revised existing AspectJ fault models and presented a fault model that removes the problems in existing fault models, such as overlapping between fault types and missing fault types. We also defined three new fault types that occur due to incorrect data-flow interactions occurring in AspectJ programs. We used three mutation tools: AjMutator, Proteum/AJ, and MuJava on three AspectJ programs. To measure the difficulty of killing the mutants created using a mutation operator, we compared the average number of the mutants killed by 10 test suites that satisfy block coverage criterion.
Results: A high percentage of the mutants are equivalent. The mutation tools do not cover all the fault types. Only 4 out of 27 operators generated mutants that were easy to kill.
Conclusions: Our analysis approach removed about 80% of the equivalent mutants. Higher order mutation is needed to cover all the fault types.
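The abstract above relies on the mutation score without defining it. As a minimal sketch, assuming the commonly used definition (killed mutants divided by non-equivalent mutants), the effect of equivalent mutants on the score can be illustrated as follows. The function name and the sample numbers are hypothetical and are not taken from the paper.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Return killed / (total - equivalent), the usual mutation score.

    Equivalent mutants cannot be killed by any test, so they are
    excluded from the denominator.
    """
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("there must be at least one non-equivalent mutant")
    if not 0 <= killed <= non_equivalent:
        raise ValueError("killed must lie between 0 and the non-equivalent count")
    return killed / non_equivalent


# Hypothetical numbers: 100 mutants, 10 of them equivalent, 45 killed.
score_ignoring_equivalents = mutation_score(45, 100)      # equivalents not removed
score_after_removal = mutation_score(45, 100, equivalent=10)
```

Leaving equivalent mutants in the denominator understates the score, which is one reason their identification and removal matters.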

3.

Context

The increasing presence of Object-Oriented (OO) programs in industrial systems is progressively drawing the attention of mutation researchers toward this paradigm. However, while research contributions on this topic are plentiful, empirical results are still scarce and are mostly provided by researchers rather than practitioners.

Objective

This article reports our experience using mutation testing to measure the effectiveness of an automated test data generator from a user perspective.

Method

In our study, we applied both traditional and class-level mutation operators to FaMa, an open source Java framework currently being used for research and commercial purposes. We also compared and contrasted our results with the data obtained from some motivating faults found in the literature and two real tools for the analysis of feature models, FaMa and SPLOT.

Results

Our results are summarized in a number of lessons learned supporting previous isolated results as well as new findings that hopefully will motivate further research in the field.

Conclusion

We conclude that mutation testing is an effective and affordable technique to measure the effectiveness of test mechanisms in OO systems. We found, however, several practical limitations in current tool support that should be addressed to facilitate the work of testers. We also found a lack of specific techniques and tools for applying mutation testing at the system level.

4.
Context: Testing and debugging consume a significant portion of software development effort. Both processes are usually conducted independently despite their close relationship with each other. Test adequacy is vital for developers to assure that sufficient testing effort has been made, while finding all the faults in a program as soon as possible is equally important. A tight integration between testing and debugging activities is essential.
Objective: The paper aims at finding whether three factors, namely, the adequacy criterion to gauge a test suite, the size of a prioritized test suite, and the percentage of such a test suite used in fault localization, have significant impacts on integrating test case prioritization techniques with statistical fault localization techniques.
Method: We conduct a controlled experiment to investigate the effectiveness of applying adequate test suites to locate faults in a benchmark suite of seven Siemens programs and four real-life UNIX utility programs using three adequacy criteria, 16 test case prioritization techniques, and four statistical fault localization techniques. We measure the proportion of code needed to be examined in order to locate a fault as the effectiveness of statistical fault localization techniques. We also investigate the integration of test case prioritization and statistical fault localization with postmortem analysis.
Result: The main result shows that on average, it is more effective for a statistical fault localization technique to utilize the execution results of an MC/DC-adequate test suite than those of a branch-adequate test suite, and is in turn more effective to utilize the execution results of a branch-adequate test suite than those of a statement-adequate test suite. On the other hand, we find that none of the fault localization techniques studied can be sufficiently effective in suggesting fault-relevant statements that can fit easily into one debug window of a typical IDE.
Conclusion: We find that the adequacy criterion and the percentage of a prioritized test suite utilized are major factors affecting the effectiveness of statistical fault localization techniques. In our experiment, the adoption of a stronger adequacy criterion can lead to more effective integration of testing and debugging.

5.
We report results from an experiment to compare the fault detection effectiveness of mutation, its variants and the all-uses data flow criteria. Adequate test sets were generated randomly, as opposed to by human testers as in some previous studies. We view our results in the light of those from earlier studies comparing mutation with path-oriented testing strategies. We identify and discuss factors that one might consider while evaluating an adequacy criterion for use in practice. Results from our experiments strengthen a hypothesis that an adequacy criterion based on one of the two variants of mutation has fault detection effectiveness superior to that of the all-uses criterion.

6.
Regression testing is an important activity in the software life cycle, but it can also be very expensive. To reduce the cost of regression testing, software testers may prioritize their test cases so that those which are more important, by some measure, are run earlier in the regression testing process. One potential goal of test case prioritization techniques is to increase a test suite's rate of fault detection (how quickly, in a run of its test cases, that test suite can detect faults). Previous work has shown that prioritization can improve a test suite's rate of fault detection, but the assessment of prioritization techniques has been limited primarily to hand-seeded faults, largely due to the belief that such faults are more realistic than automatically generated (mutation) faults. A recent empirical study, however, suggests that mutation faults can be representative of real faults and that the use of hand-seeded faults can be problematic for the validity of empirical results focusing on fault detection. We have therefore designed and performed two controlled experiments assessing the ability of test case prioritization techniques to improve the rate of fault detection, measured relative to mutation faults. Our results show that prioritization can be effective relative to the faults considered, and they expose ways in which that effectiveness can vary with characteristics of faults and test suites. More importantly, a comparison of our results with those collected using hand-seeded faults reveals several implications for researchers performing empirical studies of test case prioritization techniques in particular and testing techniques in general.

7.
Using spanning sets for coverage testing   Total citations: 1 (self-citations: 0, citations by others: 1)
A test coverage criterion defines a set E_r of entities of the program flowgraph and requires that every entity in this set is covered under some test case. Coverage criteria are also used to measure the adequacy of the executed test cases. In this paper, we introduce the notion of spanning sets of entities for coverage testing. A spanning set is a minimum subset of E_r, such that a test suite covering the entities in this subset is guaranteed to cover every entity in E_r. When the coverage of an entity always guarantees the coverage of another entity, the former is said to subsume the latter. Based on the subsumption relation between entities, we provide a generic algorithm to find spanning sets for control flow and data flow-based test coverage criteria. We suggest several useful applications of spanning sets: They help reduce and estimate the number of test cases needed to satisfy coverage criteria. We also empirically investigate how the use of spanning sets affects the fault detection effectiveness.

8.
The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of fault detection effectiveness of test suites; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria (block, decision, C-use, and P-use), this paper investigates this important issue based on a medium-sized industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are very consistent across the investigated criteria as they show that the use of mutation operators is yielding trustworthy results: generated mutants can be used to predict the detection effectiveness of real faults. Applying such a mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we can use a large number of mutants, which helps decrease the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and the research leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.

9.
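Mutation analysis is a widely used method for evaluating the performance of software testing techniques. Existing mutation analysis techniques usually apply mutation operators uniformly to the original program. Because faults in real programs tend to be clustered, mutation analysis based on a uniform distribution cannot objectively evaluate the performance of software testing techniques. Earlier work proposed a non-uniform mutation analysis method and used a case study to verify how different fault distributions affect the evaluation of testing techniques. To make the non-uniform mutation analysis method more practical, MujavaX, a mutant generation system that supports non-uniform distributions, was developed; it extends and improves the widely used Mujava tool. A case system was used to verify the correctness and feasibility of MujavaX, and the experimental results show that the system can generate non-uniform mutant sets that follow a specified distribution.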

10.
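孙昌爱, 吴思懿, 张守峰, 付安. 软件学报 (Journal of Software), 2024, 35(6): 2844-2862
BPEL (business process execution language) is an executable Web service composition language. Compared with traditional programs, BPEL programs differ considerably in programming model, execution mode, and other respects. These new characteristics make it challenging to locate and fix the faults in BPEL programs that are found during testing, and fault repair techniques designed for traditional software are difficult to apply directly to BPEL programs. From the perspective of mutation analysis, a template-matching-based fault repair method for BPEL programs, BPELRepair, is proposed. To overcome the high computational overhead of mutation-analysis-based fault repair, several optimization strategies are proposed from three angles: patch generation, test case selection, and termination conditions. A supporting tool for BPEL fault repair is developed to improve the automation and efficiency of fault repair. An empirical study evaluates the effectiveness of the proposed fault repair technique and optimization strategies. The experimental results show that the proposed method successfully repairs about 53% of BPEL program faults, and that the proposed optimization strategies significantly reduce the overhead of search and matching, patch validation, test case execution, and fault repair.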

11.
Although numerous empirical studies have been conducted to measure the fault detection capability of software analysis methods, few studies have been conducted using programs of similar size and characteristics. Therefore, it is difficult to derive meaningful conclusions on the relative detection ability and cost-effectiveness of various fault detection methods. In order to compare fault detection capability objectively, experiments must be conducted using the same set of programs to evaluate all methods and must involve participants who possess comparable levels of technical expertise. One such experiment was ‘Conflict1’, which compared voting, a testing method, self-checks, code reading by stepwise refinement and data-flow analysis methods on eight versions of a battle simulation program. Since an inspection method was not included in the comparison, the authors conducted a follow-up experiment ‘Conflict2’, in which five of the eight versions from Conflict1 were subjected to Fagan inspection. Conflict2 examined not only the number and types of faults detected by each method, but also the cost-effectiveness of each method, by comparing the average amount of effort expended in detecting faults. The primary findings of the Conflict2 experiment are the following. First, voting detected the largest number of faults, followed by the testing method, Fagan inspection, self-checks, code reading and data-flow analysis. Second, the voting, testing and inspection methods were largely complementary to each other in the types of faults detected. Third, inspection was far more cost-effective than the testing method studied. Copyright © 2002 John Wiley & Sons, Ltd.

12.
Automating software testing activities can increase the quality and drastically decrease the cost of software development. Toward this direction, various automated test data generation tools have been developed. The majority of existing tools aim at structural testing, while a quite limited number aim at a higher level of testing thoroughness such as mutation. In this paper, an attempt toward automating the generation of mutation-based test cases by utilizing existing automated tools is proposed. This is achieved by reducing the mutant-killing problem to a branch-covering one. To this end, this paper is motivated by the use of state of the art techniques and tools suitable for covering program branches when performing mutation. Tools and techniques such as symbolic execution, concolic execution, and evolutionary testing can be easily adopted toward automating the test input generation activity for the weak mutation testing criterion by simply utilizing a special form of the mutant schemata technique. The propositions made in this paper integrate three automated tools in order to illustrate and examine the method’s feasibility and effectiveness. The obtained results, based on a set of Java program units, indicate the applicability and effectiveness of the suggested technique. The results suggest that the proposed approach is able to guide existing automated tools in producing test cases according to the weak mutation testing criterion. Additionally, experimental results with the proposed mutation testing regime show that weak mutation is able to speed up the mutant execution time by at least 4.79 times when compared with strong mutation.

13.
Prioritizing test cases for regression testing   Total citations: 1 (self-citations: 0, citations by others: 1)
Test case prioritization techniques schedule test cases for execution in an order that attempts to increase their effectiveness at meeting some performance goal. Various goals are possible; one involves rate of fault detection, a measure of how quickly faults are detected within the testing process. An improved rate of fault detection during testing can provide faster feedback on the system under test and let software engineers begin correcting faults earlier than might otherwise be possible. One application of prioritization techniques involves regression testing, the retesting of software following modifications; in this context, prioritization techniques can take advantage of information gathered about the previous execution of test cases to obtain test case orderings. We describe several techniques for using test execution information to prioritize test cases for regression testing, including: 1) techniques that order test cases based on their total coverage of code components; 2) techniques that order test cases based on their coverage of code components not previously covered; and 3) techniques that order test cases based on their estimated ability to reveal faults in the code components that they cover. We report the results of several experiments in which we applied these techniques to various test suites for various programs and measured the rates of fault detection achieved by the prioritized test suites, comparing those rates to the rates achieved by untreated, randomly ordered, and optimally ordered suites.
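The first two families of techniques named in this abstract (ordering by total coverage, and ordering by coverage of components not yet covered) can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names, the tie-breaking by test name, the reset step once no test adds new coverage, and the sample coverage data are all assumptions made for the example.

```python
from typing import Dict, List, Set


def prioritize_total(coverage: Dict[str, Set[str]]) -> List[str]:
    """Order tests by how many components each covers, largest first."""
    return sorted(coverage, key=lambda t: (-len(coverage[t]), t))


def prioritize_additional(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick the test that covers the most not-yet-covered components."""
    remaining = {t: set(c) for t, c in coverage.items()}
    covered: Set[str] = set()
    order: List[str] = []
    while remaining:
        # Best candidate by new coverage; ties broken by test name (assumption).
        best = min(remaining, key=lambda t: (-len(remaining[t] - covered), t))
        if not (remaining[best] - covered) and covered:
            # No remaining test adds coverage: reset and rank the rest afresh.
            covered = set()
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Hypothetical coverage data: test name -> covered code components.
example = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"d"},
    "t4": {"c", "d"},
}
total_order = prioritize_total(example)
additional_order = prioritize_additional(example)
```

On this data the two orderings differ: the total-coverage ordering places t2 second because it covers two components, whereas the additional-coverage ordering places t3 second because it is the first to reach the uncovered component d.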

14.
This paper presents a theory of testing that integrates into Hoare and He’s Unifying Theory of Programming (UTP). We give test cases a denotational semantics by viewing them as specification predicates. This reformulation of test cases allows for relating test cases via refinement to specifications and programs. Having such a refinement order that integrates test cases, we develop a testing theory for fault-based testing. Fault-based testing uses test data designed to demonstrate the absence of a set of pre-specified faults. A well-known fault-based technique is mutation testing. In mutation testing, first, faults are injected into a program by altering (mutating) its source code. Then, test cases that can detect these errors are designed. The assumption is that other faults will be caught, too. In this paper, we apply the mutation technique to both specifications and programs. Using our theory of testing, two new test case generation laws for detecting injected (anticipated) faults are presented: one is based on the semantic level of UTP design predicates, the other on the algebraic properties of a small programming language.

15.
In this paper we discuss the advantages and limitations of a specification-based software testing technique we call CEG-BOR. There are two phases in this approach. First, informal software specifications are converted into cause-effect graphs (CEG). Then, the Boolean OperatoR (BOR) strategy is applied to design and select test cases. The conversion of an informal specification into a CEG helps detect ambiguities and inconsistencies in the specification and sets the stage for design of test cases. The number of test cases needed to satisfy the BOR strategy grows linearly with the number of Boolean operators in the CEG, and BOR testing guarantees detection of certain classes of Boolean operator faults. What makes the approach especially attractive, however, is that the BOR-based test suites appear to be very effective in detecting other fault types. We have empirically evaluated this broader aspect of the CEG-BOR strategy on a simplified safety-related real-time control system, a set of N-version programs, and on elements of a commercial data-base system. In all cases, CEG-BOR testing required fewer test cases than those generated for the applications without the use of CEG-BOR. Furthermore, in all cases CEG-BOR testing detected all faults that the original, and independently generated, application test-suites did. In two instances CEG-BOR testing uncovered additional faults. Our results indicate that the CEG-BOR strategy is practical, scalable, and effective across diverse applications. We believe that it is a cost-effective methodology for the development of systematic specification-based software test-suites.

16.
Interface Mutation: an approach for integration testing   Total citations: 1 (self-citations: 0, citations by others: 1)
The need for test adequacy criteria is widely recognized. Several criteria have been proposed for the assessment of adequacy of tests at the unit level. However, there remains a lack of criteria for the assessment of the adequacy of tests generated during integration testing. We present a mutation based interprocedural criterion, named Interface Mutation (IM), suitable for use during integration testing. A case study to evaluate the proposed criterion is reported. In the study, the UNIX sort utility was seeded with errors and Interface Mutation evaluated by measuring the cost of its application and its error revealing effectiveness. Alternative IM criteria using different sets of Interface Mutation operators were also evaluated. While comparing the error revealing effectiveness of these Interface Mutation-based test sets with same size randomly generated test sets, we observed that in most cases Interface Mutation based test sets are superior. The results suggest that Interface Mutation offers a viable test adequacy criterion for use at the integration level.

17.
Variability testing techniques search for effective and manageable test suites that lead to the rapid detection of faults in systems with high variability. Evaluating the effectiveness of these techniques in realistic settings is a must, but challenging due to the lack of variability-intensive systems with available code, automated tests and fault reports. In this article, we propose using the Drupal framework as a case study to evaluate variability testing techniques. First, we represent the framework variability using a feature model. Then, we report on extensive non-functional data extracted from the Drupal Git repository and the Drupal issue tracking system. Among other results, we identified 3392 faults in single features and 160 faults triggered by the interaction of up to four features in Drupal v7.23. We also found positive correlations relating the number of bugs in Drupal features to their size, cyclomatic complexity, number of changes and fault history. To show the feasibility of our work, we evaluated the effectiveness of non-functional data for test case prioritization in Drupal. Results show that non-functional attributes are effective at accelerating the detection of faults, outperforming related prioritization criteria such as test case similarity.

18.
A number of coverage criteria have been proposed for testing classes and class clusters modeled with state machines. Previous research has revealed their limitations in terms of their capability to detect faults. As these criteria can be considered to execute the control flow structure of the state machine, we are investigating how data flow information can be used to improve them in the context of UML state machines. More specifically, we investigate how such data flow analysis can be used to further refine the selection of a cost-effective test suite among alternative, adequate test suites for a given state machine criterion. This paper presents a comprehensive methodology to perform data flow analysis of UML state machines (with a specific focus on identifying the data flow from OCL guard conditions and operation contracts) and applies it to a widely referenced coverage criterion, the round-trip path (transition tree) criterion. It reports on two case studies whose results show that data flow information can be used to select the best transition tree, in terms of cost effectiveness, when more than one satisfies the transition tree criterion. The results also suggest that different trees are complementary in terms of the data flow that they exercise, thus leading to the detection of intersecting but distinct subsets of faults. Copyright © 2009 John Wiley & Sons, Ltd.

19.
The mutation score is an important measure to evaluate the quality of the test cases. It is obtained by executing a large number of mutant programs generated by a set of operators. A common problem, however, is that some operators can generate unnecessary and redundant mutants. Because of this, different strategies were proposed to find a set of operators that generates a reduced number of mutants without decreasing the mutation score. However, the operator selection, in practice, may include real constraints and is dependent on diverse factors besides the number of mutants and score, such as: number of test data, execution time, number of revealed faults, number of equivalent mutants, etc. In fact this is a multi-objective problem, which does not have a single solution. Different sets of operators exist for multiple objectives to be satisfied, and some restrictions can be used to choose among the existing sets. To make this choice possible, in this paper, we introduce a multi-objective strategy. We investigate three multi-objective algorithms and introduce a procedure to establish a set of operators to prioritize mutation score. Better results are obtained in comparison with traditional strategies.

20.
This paper compares the fault-detecting ability of several software test data adequacy criteria. It has previously been shown that if C1 properly covers C2, then C1 is guaranteed to be better at detecting faults than C2, in the following sense: a test suite selected by independent random selection of one test case from each subdomain induced by C1 is at least as likely to detect a fault as a test suite similarly selected using C2. In contrast, if C1 subsumes but does not properly cover C2, this is not necessarily the case. These results are used to compare a number of criteria, including several that have been proposed as stronger alternatives to branch testing. We compare the relative fault-detecting ability of data flow testing, mutation testing, and the condition-coverage techniques, to branch testing, showing that most of the criteria examined are guaranteed to be better than branch testing according to two probabilistic measures. We also show that there are criteria that can sometimes be poorer at detecting faults than substantially less expensive criteria.
