
1.

An Experimental Evaluation of Data Flow and Mutation Testing

A. Jefferson Offutt Jie Pan Kanupriya Tewary Tong Zhang 《Software: Practice and Experience》1996,26(2):165-176

Two experimental comparisons of data flow and mutation testing are presented. These techniques are widely considered to be effective for unit-level software testing, but can only be analytically compared to a limited extent. We compare the techniques by evaluating the effectiveness of test data developed for each. We develop ten independent sets of test data for a number of programs: five to satisfy the mutation criterion and five to satisfy the all-uses data-flow criterion. These test sets are developed using automated tools, in a manner consistent with the way a test engineer might be expected to generate test data in practice. We use these test sets in two separate experiments. First we measure the effectiveness of the test data that was developed for one technique in terms of the other. Second, we investigate the ability of the test sets to find faults. We place a number of faults into each of our subject programs, and measure the number of faults that are detected by the test sets. Our results indicate that while both techniques are effective, mutation-adequate test sets are closer to satisfying the data flow criterion, and detect more faults.

2.

An empirical comparison of data flow and mutation-based test adequacy criteria

Aditya P. Mathur W. Eric Wong 《Software Testing, Verification and Reliability》1994,4(1):9-31

Evaluation of the adequacy of a test set consisting of one or more test cases is a problem often encountered in software testing environments. Two test adequacy criteria are considered, namely the data flow based all-uses criterion and a mutation based criterion. An empirical study was conducted to compare the ‘difficulty’ of satisfying the two criteria and their costs. Similar studies conducted in the past are discussed in the light of this study. A discussion is also presented of how and why the results of this study, when viewed in conjunction with the results of earlier comparisons of testing methods, are useful to a software test team.

3.

A quest for appropriate software fault models: Case studies on fault detection effectiveness of model-based test generation techniques

《Information and Software Technology》2006,48(10):949-959

Model-based test generation (MBTG) is becoming an area of active research. These techniques differ in terms of (1) modeling notations used, and (2) the adequacy criteria used for test generation. This paper (1) reviews different classes of MBTG techniques at a conceptual level, and (2) reports results of three case studies comparing various techniques in terms of their fault detection effectiveness. Our results indicate that an MBTG technique that employs mutation and explicitly generates state verification sequences has better fault detection effectiveness than those based on boundary values and predicate coverage criteria for transitions. Instead of a default adequacy criterion, certain techniques allow the user to specify test objectives in addition to the model. Our experience indicates that the task of defining appropriate test objectives is not intuitive. Furthermore, notations provided to describe such test objectives may have inadequate expressive power. We posit the need for a suitable fault modeling notation which also treats domain invariants as first class entities.

4.

On the adoption of MC/DC and control-flow adequacy for a tight integration of program testing and statistical fault localization

《Information and Software Technology》2013,55(5):897-917

Context: Testing and debugging consume a significant portion of software development effort. Both processes are usually conducted independently despite their close relationship with each other. Test adequacy is vital for developers to assure that sufficient testing effort has been made, while finding all the faults in a program as soon as possible is equally important. A tight integration between testing and debugging activities is essential. Objective: The paper aims at finding whether three factors, namely, the adequacy criterion to gauge a test suite, the size of a prioritized test suite, and the percentage of such a test suite used in fault localization, have significant impacts on integrating test case prioritization techniques with statistical fault localization techniques. Method: We conduct a controlled experiment to investigate the effectiveness of applying adequate test suites to locate faults in a benchmark suite of seven Siemens programs and four real-life UNIX utility programs using three adequacy criteria, 16 test case prioritization techniques, and four statistical fault localization techniques. We measure the proportion of code needed to be examined in order to locate a fault as the effectiveness of statistical fault localization techniques. We also investigate the integration of test case prioritization and statistical fault localization with postmortem analysis. Result: The main result shows that on average, it is more effective for a statistical fault localization technique to utilize the execution results of an MC/DC-adequate test suite than those of a branch-adequate test suite, and is in turn more effective to utilize the execution results of a branch-adequate test suite than those of a statement-adequate test suite. On the other hand, we find that none of the fault localization techniques studied can be sufficiently effective in suggesting fault-relevant statements that can fit easily into one debug window of a typical IDE. Conclusion: We find that the adequacy criterion and the percentage of a prioritized test suite utilized are major factors affecting the effectiveness of statistical fault localization techniques. In our experiment, the adoption of a stronger adequacy criterion can lead to more effective integration of testing and debugging.
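The effectiveness measure described in this abstract, the proportion of code that must be examined to reach the fault, can be illustrated with one well-known statistical fault localization formula. The sketch below assumes the Tarantula suspiciousness score as a representative technique (the abstract does not name the four techniques studied); the coverage data, statement numbers, and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tarantula_scores(runs: List[Tuple[Set[int], bool]]) -> Dict[int, float]:
    """Map each statement to its Tarantula suspiciousness.

    runs: one (covered statements, passed?) pair per executed test case.
    """
    total_passed = sum(1 for _, passed in runs if passed)
    total_failed = len(runs) - total_passed
    statements = set().union(*(cov for cov, _ in runs))
    scores: Dict[int, float] = {}
    for s in statements:
        passed_cov = sum(1 for cov, passed in runs if passed and s in cov)
        failed_cov = sum(1 for cov, passed in runs if not passed and s in cov)
        pass_ratio = passed_cov / total_passed if total_passed else 0.0
        fail_ratio = failed_cov / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        scores[s] = fail_ratio / denom if denom else 0.0
    return scores


def code_examined(scores: Dict[int, float], faulty: int) -> float:
    """Proportion of statements inspected before reaching the faulty one,
    counting every statement tied with it (worst case)."""
    at_least = sum(1 for v in scores.values() if v >= scores[faulty])
    return at_least / len(scores)


# Hypothetical coverage: statement 5 is faulty and runs only in failing tests.
runs = [
    ({1, 2, 3}, True),
    ({1, 2, 4}, True),
    ({1, 3, 5}, False),
    ({1, 5}, False),
]
scores = tarantula_scores(runs)
examined = code_examined(scores, faulty=5)
```

A stronger adequacy criterion changes which test cases end up in `runs`, which in turn changes the scores and the proportion of code examined.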

5.

Can fault‐exposure‐potential estimates improve the fault detection abilities of test suites?

Wei Chen Roland H. Untch Gregg Rothermel Sebastian Elbaum Jeffery von Ronne 《Software Testing, Verification and Reliability》2002,12(4):197-218

Code‐coverage‐based test data adequacy criteria typically treat all coverable code elements (such as statements, basic blocks or outcomes of decisions) as equal. In practice, however, the probability that a test case can expose a fault in a code element varies: some faults are more easily revealed than others. Thus, several researchers have suggested that if one could estimate the probability that a fault in a code element will cause a failure, one could use this estimate to determine the number of executions of a code element that are required to achieve a certain level of confidence in that element's correctness. This estimate, in turn, could be used to improve the fault‐detection effectiveness of test suites and help testers distribute testing resources more effectively. This conjecture is intriguing; however, like many such conjectures it has never been directly examined empirically. If empirical evidence were to support this conjecture, it would motivate further research into methodologies for obtaining fault‐exposure‐potential estimates and incorporating them into test data adequacy criteria. This paper reports the results of experiments conducted to investigate the effects of incorporating an estimate of fault‐exposure probability into the statement coverage test data adequacy criterion. The results of these experiments, however, ran contrary to the conjectures of previous researchers. Although incorporation of the estimates did produce statistically significant increases in the fault‐detection effectiveness of test suites, these increases were quite small, suggesting that the approach might not be able to produce the gains hoped for and might not be worth the cost of its employment. Copyright © 2002 John Wiley & Sons, Ltd.
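The idea of turning a fault-exposure estimate into a required number of executions can be made concrete under a simple independence assumption. The sketch below is not the estimator used in the paper: it assumes each execution of a faulty element fails independently with probability `p`, so the fault survives `n` executions with probability `(1 - p) ** n`, and it solves for the smallest `n` that reaches the requested confidence. The function name and parameters are hypothetical.

```python
import math


def executions_needed(p: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - p)**n >= confidence.

    p: assumed probability that one execution of a faulty element fails.
    confidence: desired probability of seeing at least one failure.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    if p == 1.0:
        return 1  # a single execution always exposes the fault
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
    return max(1, n)


# Elements whose faults are harder to expose need many more executions.
easy = executions_needed(0.5, 0.95)
hard = executions_needed(0.1, 0.95)
```

Under this model, a plain statement coverage criterion corresponds to requiring one execution of every element regardless of `p`.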

6.

Comparing coverage criteria for dynamic web application: An empirical evaluation

《Computer Standards & Interfaces》2021

Web applications have become popular and a preferred means for users to do various crucial tasks such as selling and buying goods, doing short tasks, controlling smart houses and bank account management. The correctness of all such applications is important and requires thorough testing. Structural testing is widely used to achieve correctness in traditional software; however, for web applications, it is challenging because of their dynamic and heterogeneous nature. To achieve desired structural coverage of web applications, different dynamic coverage criteria are used as a quality assessment indicator. However, there is a lack of empirical evidence regarding the effectiveness of the proposed coverage criteria. In this paper, we conduct an empirical evaluation by evaluating and comparing the fault detection effectiveness and efficiency of various dynamic coverage criteria by performing mutation analysis. We conduct a series of experiments to assess and compare four widely used coverage criteria on seven open-source case studies including small to large scale applications. We performed mutation analysis by first generating different faulty versions (mutants) for the case studies and then by executing test suites to record the mutation score for each criterion. The results from most of the subject applications show that DOM coverage is the most effective and efficient criterion, followed by Virtual DOM, HTML Element and Statement coverage criteria.

7.

An assessment of operational coverage as both an adequacy and a selection criterion for operational profile based testing

Breno Miranda Antonia Bertolino 《Software Quality Journal》2018,26(4):1571-1594

While the relation between code coverage measures and fault detection is actively studied, only a few works have investigated the correlation between measures of coverage and of reliability. In this work, we introduce a novel approach to measuring code coverage, called the operational coverage, that takes into account how much the program’s entities are exercised so as to reflect the profile of usage in the measure of coverage. Operational coverage is proposed as (i) an adequacy criterion, i.e., to assess the thoroughness of a black box test suite derived from the operational profile, and as (ii) a selection criterion, i.e., to select test cases for operational profile-based testing. Our empirical evaluation showed that operational coverage is better correlated than traditional coverage with the probability that the next test case derived according to the user’s profile will not fail. This result suggests that our approach could provide a good stopping rule for operational profile-based testing. With respect to test case selection, our investigations revealed that operational coverage outperformed the traditional one in terms of test suite size and fault detection capability when we look at the average results.

8.

An Equivalent Mutant Identification Algorithm Based on Fault Detection Context

于畅 王雅文 林欢 宫云战 《计算机研究与发展》2021,58(1):83-97

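Equivalent mutant identification has long been a key obstacle to the wide adoption of mutation testing in industry. To address it, this paper proposes an equivalent mutant identification algorithm based on fault detection context. The algorithm uses static analysis to extract the code context information related to the fault detection conditions in a program and uses it to construct the fault detection context. The fault detection context is then converted into a document model and encoded by a document representation learning network. Finally, a machine learning model classifies each mutant as equivalent or non-equivalent. On a training set containing 22 C programs and 118,000 mutant samples, the algorithm achieves a classification precision of 91% and a recall of 82%; in cross-project validation, the machine learning model achieves a precision of 77% and a recall of 78%. These results indicate that identification based on fault detection context can effectively improve the precision and generality of equivalent mutant classification, providing technical support for improving the effectiveness of mutation testing.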
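The precision and recall figures reported for this classifier follow the usual definitions for a binary classification task. The sketch below shows how the two numbers are computed for an "equivalent mutant" label; the label lists are invented purely for illustration (chosen so the result lands near the reported 91% and 82%) and are not data from the paper.

```python
from typing import List, Tuple


def precision_recall(actual: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive class (True = equivalent mutant)."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical outcome: 100 truly equivalent mutants, of which 82 are found,
# plus 8 non-equivalent mutants wrongly flagged as equivalent.
actual = [True] * 100 + [False] * 8
predicted = [True] * 82 + [False] * 18 + [True] * 8
precision, recall = precision_recall(actual, predicted)
```

Low precision here means killable mutants are discarded as equivalent, while low recall means testers still spend time on mutants that no test can kill.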

9.

Interface Mutation Test Adequacy Criterion: An Empirical Evaluation

Márcio Eduardo Delamaro José Carlos Maldonado Alberto Pasquini Aditya P. Mathur 《Empirical Software Engineering》2001,6(2):111-142

An experiment was conducted to evaluate an inter-procedural test adequacy criterion named Interface Mutation. Program SPACE, developed for the European Space Agency (ESA), was used in this experiment. The development record available for this program was used to find the faults uncovered during its development. Using this information the test process was reproduced starting with a version of SPACE containing several faults and then applying Interface Mutation. Thus we could evaluate the fault revealing effectiveness of Interface Mutation. Results from the experiment suggest that (a) the application of Interface Mutation favors the selection of fault revealing test cases when they exist and (b) Interface Mutation tends to select fault revealing test cases more efficiently than in the case where random selection is used.

10.

Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria

Andrews J.H. Briand L.C. Labiche Y. Namin A.S. 《IEEE Transactions on Software Engineering》2006,32(8):608-624

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of fault detection effectiveness of test suites; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria (block, decision, C-use, and P-use), this paper investigates this important issue based on a middle-size industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are very consistent across the investigated criteria as they show that the use of mutation operators is yielding trustworthy results: generated mutants can be used to predict the detection effectiveness of real faults. Applying such a mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we can use a large number of mutants, which helps decrease the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and the research leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.
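Mutation analysis of the kind described in this abstract reduces to running a test suite against each mutant and counting how many are killed. The sketch below is a minimal illustration, assuming programs can be modelled as plain Python callables and that a mutant counts as killed when any test input makes its output differ from the original; the `original_abs` function and its three hand-written mutants are hypothetical.

```python
from typing import Callable, Sequence

Program = Callable[[int], int]


def mutation_score(original: Program, mutants: Sequence[Program],
                   tests: Sequence[int]) -> float:
    """Fraction of mutants whose output differs from the original on some test."""
    if not mutants:
        raise ValueError("at least one mutant is required")
    killed = sum(
        1 for mutant in mutants
        if any(mutant(t) != original(t) for t in tests)
    )
    return killed / len(mutants)


def original_abs(x: int) -> int:
    return x if x >= 0 else -x


mutants = [
    lambda x: x if x > 0 else -x,    # >= replaced by >: equivalent, since -0 == 0
    lambda x: x if x >= 0 else x,    # negation dropped in the else branch
    lambda x: -x if x >= 0 else -x,  # negation inserted in the then branch
]

weak_suite = [1, 5]     # positive inputs only
better_suite = [-2, 3]  # exercises both branches

weak_score = mutation_score(original_abs, mutants, weak_suite)
better_score = mutation_score(original_abs, mutants, better_suite)
```

The first mutant can never be killed, so no suite reaches a score of 1.0 here unless equivalent mutants are excluded from the denominator.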

11.

A state-based approach to integration testing based on UML models

Shaukat Ali Lionel C. Briand Muhammad Jaffar-ur Rehman Hajra Asghar Muhammad Zohaib Z. Iqbal Aamer Nadeem 《Information and Software Technology》2007,49(11-12):1087-1106

Correct functioning of object-oriented software depends upon the successful integration of classes. While individual classes may function correctly, several new faults can arise when these classes are integrated together. In this paper, we present a technique to enhance testing of interactions among modal classes. The technique combines UML collaboration diagrams and statecharts to automatically generate an intermediate test model, called SCOTEM (State COllaboration TEst Model). The SCOTEM is then used to generate valid test paths. We also define various coverage criteria to generate test paths from the SCOTEM model. In order to assess our technique, we have developed a tool and applied it to a case study to investigate its fault detection capability. The results show that the proposed technique effectively detects all the seeded integration faults when complying with the most demanding adequacy criterion and still achieves reasonably good results for less expensive adequacy criteria.

12.

A formal analysis of the subsume relation between software test adequacy criteria

Hong Zhu 《IEEE Transactions on Software Engineering》1996,22(4):248-255

Software test adequacy criteria are rules to determine whether a software system has been adequately tested. A central question in the study of test adequacy criteria is how they relate to fault detecting ability. We identify two idealized software testing scenarios. In the first scenario, which we call the prior testing scenario, software testers are provided with an adequacy criterion in addition to the software under test. The knowledge of the adequacy criterion is used to generate test cases. In the second scenario, which we call the posterior testing scenario, software testers are not provided with the knowledge of the adequacy criterion. The criterion is only used to decide when to stop the generation of test cases. In 1993, Frankl and Weyuker proved that the subsume relation between software test adequacy criteria does not guarantee better fault detecting ability in the prior testing scenario. We investigate the posterior testing scenario and prove that in this scenario the subsume relation does guarantee a better fault detecting ability. Two measures of fault detecting ability are used: the probability of detecting faults and the expected number of exposed errors.

13.

An experimental comparison of the effectiveness of branch testing and data flow testing

Frankl P.G. Weiss S.N. 《IEEE Transactions on Software Engineering》1993,19(8):774-787

An experiment comparing the effectiveness of the all-uses and all-edges test data adequacy criteria is discussed. The experiment was designed to overcome some of the deficiencies of previous software testing experiments. A large number of test sets was randomly generated for each of nine subject programs with subtle errors. For each test set, the percentages of executable edges and definition-use associations covered were measured, and it was determined whether the test set exposed an error. Hypothesis testing was used to investigate whether all-uses adequate test sets are more likely to expose errors than are all-edges adequate test sets. Logistic regression analysis was used to investigate whether the probability that a test set exposes an error increases as the percentage of definition-use associations or edges covered by it increases. Error exposing ability was shown to be strongly positively correlated to percentage of covered definition-use associations in only four of the nine subjects. Error exposing ability was also shown to be positively correlated to the percentage of covered edges in four different subjects, but the relationship was weaker.

14.

CMuJava: A Concurrent Mutant Generation System for Java Programs

孙昌爱 耿宁 代贺鹏 顾友达 《软件学报》2022,33(2):397-409

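A concurrent program consists of multiple flows that execute concurrently and share memory. Because the execution order among these flows is nondeterministic, testing concurrent software systems is difficult. Mutation testing is a fault-based software testing technique that is widely used to evaluate the adequacy of test suites and the effectiveness of testing techniques. A key problem in applying mutation testing to concurrent programs is how to efficiently generate a large set of mutants that simulate concurrency faults. A mutation testing framework for concurrent programs is presented, ...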

15.

Novel Metrics for Mutation Analysis

Savas Takan Gokmen Katipoglu 《Computer Systems Science and Engineering》2023,46(2):2075-2089

A measure of the “goodness” or efficiency of the test suite is used to determine the proficiency of a test suite. The appropriateness of the test suite is determined through mutation analysis. Several Finite State Machine (FSM) mutants are produced in mutation analysis by injecting errors against hypotheses. These mutants serve as test subjects for the test suite (TS). The effectiveness of the test suite is proportional to the number of eliminated mutants. The most effective test suite is the one that removes the most significant number of mutants at the optimal time. It is difficult to determine the fault detection ratio of the system because it is difficult to identify the system’s potential flaws precisely. In mutation testing, the Fault Detection Ratio (FDR) metric is currently used to express the adequacy of a test suite. However, there are some issues with this metric. If both test suites have the same defect detection rate, the smaller of the two tests is preferred. The test case (TC) is affected by the same issue. The smaller of two test cases with identical performance is assumed to have superior performance. Another difficulty involves time. The performance of numerous vehicles claiming to have a perfect mutant capture time is problematic. Our study developed three metrics to address these issues: , , and In this context, the most used test generation tools were examined and tested using the developed metrics. Thanks to the metrics we have developed, the research contributes to eliminating the problems related to performance measurement by integrating the missing parameters into the system.

16.

Mutant Selection Optimization Based on an Improved Particle Swarm Algorithm

王曙燕 杨悦 孙家泽 《计算机应用研究》2017,34(3)

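Mutation testing is one of the commonly used testing methods. The computational cost of mutation analysis is relatively high, mainly because a large number of mutants are generated during testing. To reduce the number of mutants, selection optimization with the standard particle swarm clustering algorithm has been proposed; however, once the amount of data under test grows beyond a certain size, the standard particle swarm algorithm needs more iterations and converges more slowly. To address these problems, this paper proposes selecting and optimizing mutants with an improved particle swarm algorithm. The mutant set is partitioned by clustering to increase its diversity, which in turn improves the particle swarm algorithm. Experimental results show that the number of mutants is greatly reduced without affecting test adequacy, and that the improved method achieves better optimization results than both the K-means algorithm and the standard particle swarm algorithm.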

17.

Interface Mutation: an approach for integration testing

Delamaro M.E. Maldonado J.C. Mathur A.P. 《IEEE Transactions on Software Engineering》2001,27(3):228-247

The need for test adequacy criteria is widely recognized. Several criteria have been proposed for the assessment of adequacy of tests at the unit level. However, there remains a lack of criteria for the assessment of the adequacy of tests generated during integration testing. We present a mutation based interprocedural criterion, named Interface Mutation (IM), suitable for use during integration testing. A case study to evaluate the proposed criterion is reported. In the study, the UNIX sort utility was seeded with errors and Interface Mutation evaluated by measuring the cost of its application and its error revealing effectiveness. Alternative IM criteria using different sets of Interface Mutation operators were also evaluated. While comparing the error revealing effectiveness of these Interface Mutation-based test sets with same size randomly generated test sets, we observed that in most cases Interface Mutation based test sets are superior. The results suggest that Interface Mutation offers a viable test adequacy criterion for use at the integration level.

18.

On the Use of Mutation Faults in Empirical Assessments of Test Case Prioritization Techniques

Hyunsook Do Rothermel G. 《IEEE Transactions on Software Engineering》2006,32(9):733-752

Regression testing is an important activity in the software life cycle, but it can also be very expensive. To reduce the cost of regression testing, software testers may prioritize their test cases so that those which are more important, by some measure, are run earlier in the regression testing process. One potential goal of test case prioritization techniques is to increase a test suite's rate of fault detection (how quickly, in a run of its test cases, that test suite can detect faults). Previous work has shown that prioritization can improve a test suite's rate of fault detection, but the assessment of prioritization techniques has been limited primarily to hand-seeded faults, largely due to the belief that such faults are more realistic than automatically generated (mutation) faults. A recent empirical study, however, suggests that mutation faults can be representative of real faults and that the use of hand-seeded faults can be problematic for the validity of empirical results focusing on fault detection. We have therefore designed and performed two controlled experiments assessing the ability of prioritization techniques to improve the rate of fault detection of test case prioritization techniques, measured relative to mutation faults. Our results show that prioritization can be effective relative to the faults considered, and they expose ways in which that effectiveness can vary with characteristics of faults and test suites. More importantly, a comparison of our results with those collected using hand-seeded faults reveals several implications for researchers performing empirical studies of test case prioritization techniques in particular and testing techniques in general.
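A test suite's rate of fault detection is commonly quantified with the APFD (average percentage of faults detected) metric; the abstract does not name the metric, so its use here is an assumption. The sketch below computes APFD for a given test order as 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where TF_i is the position of the first test that detects fault i; the test names and fault matrix are hypothetical.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[int]]) -> float:
    """APFD of a test order, given which faults each test detects.

    order: test case names in execution order.
    detects: test case name -> set of fault ids it detects.
    """
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one fault")
    first_position: Dict[int, int] = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            first_position.setdefault(fault, position)
    if len(first_position) != m:
        raise ValueError("some fault is not detected by any test in the order")
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix: t3 detects two of the three faults.
detects = {"t1": {1}, "t2": set(), "t3": {2, 3}, "t4": {1}}

unordered = apfd(["t1", "t2", "t3", "t4"], detects)
prioritized = apfd(["t3", "t1", "t2", "t4"], detects)
```

Running the test that detects the most faults first raises the score, which is the effect a prioritization technique aims for.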

19.

Genetic Algorithm Training of Elman Neural Network in Motor Fault Detection

X. Z. Gao S. J. Ovaska 《Neural computing & applications》2002,11(1):37-44

Fault detection methods are crucial in acquiring safe and reliable operation in motor drive systems. Remarkable maintenance costs can also be saved by applying advanced detection techniques to find potential failures. However, conventional motor fault detection approaches often have to work with explicit mathematical models. In addition, most of them are deterministic or non-adaptive, and therefore cannot be used in time-varying cases. In this paper, we propose an Elman neural network-based motor fault detection scheme to address these difficulties. The Elman neural network has the advantageous time series prediction capability because of its memory nodes, as well as local recurrent connections. Motor faults are detected from the variants in the expectation of feature signal prediction error. A Genetic Algorithm (GA) aided training strategy for the Elman neural network is further introduced to improve the approximation accuracy, and achieve better detection performance. Experiments with a practical automobile transmission gearbox with an artificial fault are carried out to verify the effectiveness of our method. Encouraging fault detection results have been obtained without any prior information on the gearbox model.

20.

On generating mutants for AspectJ programs

《Information and Software Technology》2012,54(8):900-914

Context: Mutation analysis has been widely used in research studies to evaluate the effectiveness of test suites and testing techniques. Faulty versions (i.e., mutants) of a program are generated such that each mutant contains one seeded fault. The mutation score provides a measure of effectiveness. Objective: We study three problems with the use of mutation analysis for testing AspectJ programs:

• The manual identification and removal of equivalent mutants is difficult and time consuming. We calculate the percentage of equivalent mutants generated for benchmark AspectJ programs using available mutation tools.
• The generated mutants need to cover the various fault types described in the literature on fault models for AspectJ programs. We measure the distribution of the mutants generated using available mutation tools with respect to the AspectJ fault types.
• We measure the difficulty of killing the generated mutants.

We propose the use of simple analysis of the subject programs to prevent the generation of some equivalent mutants. Method: We revised existing AspectJ fault models and presented a fault model that removes the problems in existing fault models, such as overlapping between fault types and missing fault types. We also defined three new fault types that occur due to incorrect data-flow interactions occurring in AspectJ programs. We used three mutation tools: AjMutator, Proteum/AJ, and MuJava on three AspectJ programs. To measure the difficulty of killing the mutants created using a mutation operator, we compared the average number of the mutants killed by 10 test suites that satisfy block coverage criterion. Results: A high percentage of the mutants are equivalent. The mutation tools do not cover all the fault types. Only 4 out of 27 operators generated mutants that were easy to kill. Conclusions: Our analysis approach removed about 80% of the equivalent mutants. Higher order mutation is needed to cover all the fault types.