20 similar documents found; search time: 0 ms
1.
Tong-Seng Quah 《Information Sciences》2009,179(4):430-445
In this study, defect tracking is used as a proxy for predicting software readiness. The number of defects remaining in an application under development is one of the most important factors in deciding whether a piece of software is ready to be released. By comparing the predicted number of faults with the number of faults discovered in testing, the software manager can decide whether the software is likely ready for release. The predictive model developed in this research can predict: (i) the number of faults (defects) likely to exist, (ii) the estimated number of code changes required to correct a fault, and (iii) the estimated amount of time (in minutes) needed to make those changes in the respective classes of the application. The model uses product metrics as independent variables. These metrics are selected according to the nature of the source code with regard to architecture layers, types of faults, and the contribution factors of these metrics. A neural network model with a genetic training strategy is introduced to improve prediction results for estimating software readiness. This genetic-net combines a genetic algorithm with a statistical estimator to produce a model that also indicates the usefulness of its inputs. The model is divided into three parts: (1) a prediction model for the presentation logic tier, (2) a prediction model for the business tier, and (3) a prediction model for the data access tier. Existing object-oriented metrics and complexity software metrics are used in the business tier prediction model, while new sets of metrics are proposed for the presentation logic and data access tiers. These metrics are validated using data extracted from real-world applications. The trained models can be used as tools to assist software managers in making release decisions.
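The paper's exact "genetic-net" and statistical estimator are not reproduced below; the following is a minimal sketch of the general idea of training a small neural network's weights with a genetic algorithm instead of gradient descent. All metric names, dimensions, and data are hypothetical placeholders.

```python
# Minimal sketch: evolving neural-network weights with a genetic algorithm
# to predict a fault count from product metrics. Data are synthetic; this
# is not the paper's genetic-net.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: rows = classes of the application, columns = product metrics;
# target = number of faults found in testing (synthetic relationship).
X = rng.random((40, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(0, 0.1, 40)

def predict(weights, X):
    """One hidden layer; `weights` is the flat vector we evolve."""
    W1 = weights[:4 * 6].reshape(4, 6)
    W2 = weights[4 * 6:].reshape(6, 1)
    return (np.tanh(X @ W1) @ W2).ravel()

def fitness(weights):
    return -np.mean((predict(weights, X) - y) ** 2)  # negative MSE

# Plain generational GA: truncation selection, uniform crossover, mutation.
pop = rng.normal(0, 1, (60, 4 * 6 + 6))
for generation in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-20:]]
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, 20, 2)]
        mask = rng.random(a.size) < 0.5              # uniform crossover
        child = np.where(mask, a, b)
        child += rng.normal(0, 0.05, child.size)     # Gaussian mutation
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print("predicted faults for first class:", predict(best, X)[0])
```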
2.
Topic models are generative probabilistic models which have been applied in information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real-world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis of the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system.
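For readers unfamiliar with topic models, the sketch below shows the basic mechanism on toy "source code documents" using scikit-learn's LDA implementation; fitting it to successive repository snapshots and comparing per-document topic weights is one way the evolution analysis could proceed. The file contents here are invented stand-ins, not the paper's pipeline.

```python
# Sketch: discovering topics in source-code documents with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "figure draw handle color stroke fill",   # e.g. a drawing class
    "buffer edit undo redo text selection",   # e.g. an editing class
    "file load save stream path directory",   # e.g. an I/O class
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)        # per-document topic weights

terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = comp.argsort()[-3:][::-1]           # three strongest words
    print(f"topic {k}:", [terms[i] for i in top])
```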
3.
Heng Li, Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan 《Empirical Software Engineering》2018,23(5):2655-2694
Software developers insert logging statements in their source code to record important runtime information; such logged information is valuable for understanding system usage in production and debugging system failures. However, providing proper logging statements remains a manual and challenging task. Missing an important logging statement may increase the difficulty of debugging a system failure, while too much logging can increase system overhead and mask the truly important information. Intuitively, the actual functionality of a software component is one of the major drivers behind logging decisions. For instance, a method maintaining network communications is more likely to be logged than getters and setters. In this paper, we used automatically computed topics of a code snippet to approximate its functionality. We studied the relationship between the topics of a code snippet and the likelihood of the snippet being logged (i.e., containing a logging statement). Our driving intuition is that certain topics in the source code are more likely to be logged than others. To validate our intuition, we conducted a case study on six open source systems, and we found that i) there exists a small number of “log-intensive” topics that are more likely to be logged than other topics; ii) each pair of the studied systems shares 12% to 62% common topics, and the likelihood of logging such common topics has a statistically significant correlation of 0.35 to 0.62 among all the studied systems; and iii) our topic-based metrics help explain the likelihood of a code snippet being logged, providing an improvement of 3% to 13% on AUC and 6% to 16% on balanced accuracy over a set of baseline metrics that capture the structural information of a code snippet. Our findings highlight that topics contain valuable information that can help guide and drive developers’ logging decisions.
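The shape of this analysis can be illustrated with a small synthetic sketch: topic memberships are treated as features for a classifier that predicts whether a snippet is logged, and the model is scored by AUC. Everything below (number of topics, the "log-intensive" topic, the labels) is invented; the paper's baseline structural metrics and real systems are not reproduced.

```python
# Sketch: topic memberships as features for predicting logged snippets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
topics = rng.dirichlet(np.ones(5), size=n)   # per-snippet topic memberships
# Assume topic 0 is "log-intensive" (e.g. network communication).
p_logged = 0.1 + 0.8 * topics[:, 0]
logged = rng.random(n) < p_logged

X_tr, X_te, y_tr, y_te = train_test_split(topics, logged, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC of topic-based model: {auc:.2f}")
```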
4.
Experimental field data are used at different levels of complexity to calibrate, validate and improve agro-ecosystem models, and thereby to enhance their reliability for regional impact assessment. A methodological framework and software are presented to evaluate data sets and classify them into four classes regarding their suitability for different modelling purposes. The weighting of inputs and test variables was set from the perspective of crop modelling, and the software allows users to adjust the weights according to their specific requirements. Background information is given for the variables with respect to their relevance for modelling and possible uncertainties, and examples are given of data sets in the different classes. The framework helps to assemble high-quality databases and to select data from databases according to modellers' requirements, and it gives experimentalists guidelines for experimental design and for deciding on the most effective measurements to improve the usefulness of their data for modelling, statistical analysis and data assimilation.
5.
Spatial classification using fuzzy membership models
Kent, J.T., Mardia, K.V. 《IEEE Transactions on Pattern Analysis and Machine Intelligence》1988,10(5):659-671
In the usual statistical approach to spatial classification, each pixel is assumed to belong to precisely one of a small number of known groups. This framework is extended here to mixed-pixel data, in which only a proportion of each pixel belongs to each group. Two models based on multivariate Gaussian random fields are proposed to model this fuzzy membership process. The problems of predicting group membership and estimating the parameters are discussed. Some simulations are presented to study the properties of this approach, and an example is given using Landsat remote-sensing data.
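A generic linear mixed-pixel model of the kind this setting suggests can be written as follows; the notation is ours and is not necessarily the form of either of the paper's two models.

```latex
% Generic linear mixed-pixel model (notation ours): the value x_s observed
% at site s mixes the group signatures \mu_k according to fuzzy memberships
% p_k(s), which are nonnegative and sum to one.
\[
  x_s = \sum_{k=1}^{K} p_k(s)\,\mu_k + \varepsilon_s,
  \qquad p_k(s) \ge 0,\quad \sum_{k=1}^{K} p_k(s) = 1,
\]
% with the membership fields p_k(\cdot) modelled via Gaussian random fields.
```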
6.
Karunanithi, N., Whitley, D., Malaiya, Y.K. 《IEEE Transactions on Software Engineering》1992,18(7):563-574
The usefulness of connectionist models for software reliability growth prediction is illustrated. The applicability of the connectionist approach is explored using various network models, training regimes, and data representation methods. An empirical comparison is made between this approach and five well-known software reliability growth models using actual data sets from several different software projects. The results presented suggest that connectionist models adapt well across different data sets and exhibit better predictive accuracy. The analysis shows that the connectionist approach is capable of developing models of varying complexity.
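As a rough illustration of the setup, the sketch below fits a small feed-forward network to a synthetic reliability growth curve and extrapolates it; scikit-learn's MLPRegressor stands in for the paper's networks, and the failure data are generated, not taken from the paper's projects.

```python
# Sketch: a feed-forward ("connectionist") model fit to a software
# reliability growth curve. Data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.arange(1, 61, dtype=float)                    # testing weeks
failures = 100 * (1 - np.exp(-0.05 * t))             # synthetic growth curve
failures += np.random.default_rng(2).normal(0, 1, t.size)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(t[:40, None] / 60.0, failures[:40])        # train on early weeks

pred = model.predict(t[40:, None] / 60.0)            # extrapolate later weeks
print("predicted cumulative failures at week 60:", round(pred[-1], 1))
```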
7.
8.
Software classification models have been regarded as an essential support tool in performing measurement and analysis processes. Most of the established models are single-cycled in the model usage stage, and thus require the measurement data for all of the model's variables to be collected simultaneously and used to classify an unseen case within a single decision cycle. Conversely, a multi-cycled model allows the measurement data for the model's variables to be collected gradually and used for classification over more than one decision cycle, and thus intuitively seems to offer better classification efficiency but poorer classification accuracy. Software project managers often have difficulty choosing the classification model that is better suited to their specific environments and needs, yet this important topic is not adequately explored in the software measurement and analysis literature. Using an industrial software measurement dataset, NASA KC2, this paper presents quantitative comparisons of the classification accuracy and efficiency of the discriminant analysis (DA)- and logistic regression (LR)-based single-cycled models and the decision tree (DT)-based (C4.5 and ECHAID algorithms) multi-cycled models. The experimental results suggest that the re-appraisal cost of a Type I MR, the software failure cost of a Type II MR, and the data collection cost of software measurements should be considered together when choosing a classification model.
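The single- versus multi-cycled contrast can be made concrete with a toy comparison: a logistic regression needs every feature measured up front, while a decision tree consults one measurement per internal node, so its depth bounds the number of decision cycles. The sketch below uses a bundled scikit-learn dataset as a stand-in for NASA KC2 and omits the misclassification and data-collection costs the paper weighs.

```python
# Sketch: single-cycled (logistic regression) vs multi-cycled (decision
# tree) classification on a stand-in dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

lr = LogisticRegression(max_iter=5000)     # needs all measurements at once
dt = DecisionTreeClassifier(max_depth=4)   # consumes one test per cycle

print("LR accuracy:", cross_val_score(lr, X, y).mean().round(3))
print("DT accuracy:", cross_val_score(dt, X, y).mean().round(3))
print("DT max decision cycles (tree depth):", dt.fit(X, y).get_depth())
```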
9.
Hardware–software (HW/SW) partitioning divides an application into software and hardware parts, and is one of the crucial steps in embedded system design. For a given task, hardware implementations with different areas may provide different execution speeds, owing to the potential for parallel execution in hardware; one task may therefore have multiple hardware implementation choices depending on the available hardware area. Existing HW/SW partitioning approaches typically consider only a single hardware implementation per task, overlooking these multiple choices. This paper presents a computing model that caters for HW/SW partitioning problems with multiple-choice hardware implementations. An efficient heuristic algorithm is proposed to rapidly generate an approximate solution, which is further refined by a tabu search algorithm, also customized in this paper. Moreover, a dynamic programming algorithm is proposed to solve relatively small problems exactly. Extensive simulation results show that the approximate solutions are very close to the exact ones, and that tabu search refines them to solutions with an error of no more than 1.5% for all cases considered in this paper.
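The exact formulation is essentially a multiple-choice knapsack, which the dynamic program below sketches: each task can run in software or in one of several hardware variants that trade area for speed. The task data are made up, and the paper's heuristic, tabu search, and any communication-delay handling are omitted.

```python
# Sketch: exact DP for HW/SW partitioning with multiple hardware
# implementation choices per task (a multiple-choice knapsack).
import math

# Each task: software time, plus (hardware_area, hardware_time)
# alternatives -- larger areas typically buy more parallelism, hence speed.
tasks = [
    {"sw": 10.0, "hw": [(2, 6.0), (4, 3.0)]},
    {"sw": 8.0,  "hw": [(1, 5.0), (3, 2.0)]},
    {"sw": 12.0, "hw": [(2, 7.0), (5, 4.0)]},
]
AREA = 6  # total hardware area available

# best[a] = minimum total time using at most area a, over tasks seen so far.
best = [0.0] * (AREA + 1)
for task in tasks:
    new = [math.inf] * (AREA + 1)
    for a in range(AREA + 1):
        new[a] = best[a] + task["sw"]                 # software option
        for area, time in task["hw"]:                 # each hardware choice
            if area <= a:
                new[a] = min(new[a], best[a - area] + time)
    best = new

print("minimum total time within area budget:", best[AREA])
```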
10.
Texture classification using noncausal hidden Markov models
Povlow, B.R., Dunn, S.M. 《IEEE Transactions on Pattern Analysis and Machine Intelligence》1995,17(10):1010-1014
This paper addresses the problem of using noncausal hidden Markov models (HMMs) for texture classification. In noncausal models, the state of each pixel may be dependent on its neighbors in all directions. New algorithms are given to learn the parameters of a noncausal HMM of a texture and to classify it into one of several learned categories. Texture classification results using these algorithms are provided.
11.
Classification models based on statistical data have been developed that make it possible to identify a potential insider from indicators that manifest themselves even when the data on the insider's behavior are incomplete.
12.
Arie ten Cate 《Computational Statistics & Data Analysis》2009,53(6):2055-2060
Simultaneous econometric models may contain pairs of complementary inequalities. This paper discusses how to reformulate such models and solve them with econometric software that can handle only equalities. Two approaches are applied: the normal map representation and the Fischer-Burmeister NCP (nonlinear complementarity problem) function; the latter seems to work best. The software programs TSP, SAS/ETS and EViews are tested. The test model describes two markets for electricity, each with fluctuating demand and an endogenous production capacity; the capacity of the trade link between the regions is also endogenous.
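The Fischer-Burmeister reformulation itself is standard and worth stating: it turns a complementarity pair into a single smooth-looking equality, which is exactly what an equality-only solver needs.

```latex
% The Fischer-Burmeister NCP function rewrites a complementarity pair as
% one equality:
\[
  \phi_{\mathrm{FB}}(a,b) \;=\; a + b - \sqrt{a^2 + b^2} \;=\; 0
  \quad\Longleftrightarrow\quad a \ge 0,\; b \ge 0,\; ab = 0 .
\]
```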
13.
Lior Rokach 《Pattern Recognition》2008,41(5):1676-1700
Feature set partitioning generalizes the task of feature selection by partitioning the feature set into subsets of features that are collectively useful, rather than by finding a single useful subset of features. This paper presents a novel feature set partitioning approach based on a genetic algorithm. As part of this new approach, a new encoding scheme is also proposed and its properties are discussed. We examine the effectiveness of using a Vapnik–Chervonenkis dimension bound to evaluate the fitness function of multiple oblivious tree classifiers. The new algorithm was tested on various datasets, and the results indicate the superiority of the proposed algorithm over other methods.
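The search space can be made concrete with a small sketch: each chromosome assigns every feature to a subset, and fitness rewards partitions whose per-subset classifiers each perform well. The fitness below uses ordinary cross-validated naive Bayes accuracy as a stand-in for the paper's VC-dimension bound and oblivious trees, and the GA operators are deliberately simplistic.

```python
# Sketch: genetic search over feature-set partitions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(3)
n_features, n_groups = X.shape[1], 2

def fitness(chrom):
    # Average CV accuracy of one classifier per non-empty feature subset.
    scores = []
    for g in range(n_groups):
        cols = np.flatnonzero(chrom == g)
        if cols.size:
            scores.append(cross_val_score(GaussianNB(), X[:, cols], y).mean())
    return np.mean(scores)

pop = rng.integers(0, n_groups, (12, n_features))   # feature -> subset ids
for _ in range(15):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-4:]]          # keep best partitions
    pop = parents[rng.integers(0, 4, 12)].copy()    # clone them
    mutate = rng.random(pop.shape) < 0.2            # random reassignment
    pop[mutate] = rng.integers(0, n_groups, mutate.sum())

best = max(pop, key=fitness)
print("best partition (feature -> subset):", best)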
14.
Kitchenham, B., Pfleeger, S.L., Fenton, N. 《IEEE Transactions on Software Engineering》1995,21(12):929-944
In this paper we propose a framework for validating software measurement. We start by defining a measurement structure model that identifies the elementary components of measures and the measurement process, and then consider five other models involved in measurement: unit definition models, instrumentation models, attribute relationship models, measurement protocols, and entity population models. We consider a number of measures from the viewpoint of our measurement validation framework and identify a number of shortcomings; in particular, we identify several problems with the construction of function points. We also compare our view of measurement validation with ideas presented by other researchers and identify a number of areas of disagreement. Finally, we suggest several rules that practitioners and researchers can use to avoid measurement problems, including the use of measurement vectors rather than artificially contrived scalars.
15.
A low-complexity algorithm is proposed for hardware/software partitioning. The proposed algorithm employs dynamic programming principles while accounting for communication delays. It is shown that, relative to the most recent prior algorithm, the time complexity is reduced from O(n²⋅A) to O(n⋅A), without an increase in space complexity, for n code fragments and hardware area A.
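One dynamic-programming formulation consistent with the stated O(n⋅A) bound is sketched below; the notation is ours, and the paper's actual recurrence, in particular its handling of communication delays, may differ.

```latex
% For code fragments i = 1..n with software time s_i, hardware time h_i and
% hardware area a_i, let T[i][a] be the minimum execution time of the first
% i fragments using hardware area at most a. Then
\[
  T[i][a] \;=\; \min\bigl( T[i-1][a] + s_i,\;\; T[i-1][a - a_i] + h_i \bigr),
\]
% and each of the n \cdot A table cells is filled in O(1) time.
```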
16.
We present an interactive software package for the supervised classification task in the electromyographic (EMG) signal decomposition process, using a fuzzy k-NN classifier and the MATLAB high-level programming language and its interactive environment. The method employs assertion-based classification that takes into account a combination of motor unit potential (MUP) shapes and two modes of use of motor unit firing pattern information: passive and active. The developed package consists of several graphical user interfaces used to detect individual MUP waveforms in a raw EMG signal, extract relevant features, and classify the MUPs into motor unit potential trains (MUPTs) using assertion-based classifiers.
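The fuzzy k-NN idea at the core of such a classifier is easy to sketch: instead of a hard majority vote, each of the k nearest neighbours contributes a distance-weighted membership to every class. The Python below is a simplified, crisp-label variant with synthetic feature vectors, not the MATLAB package or its assertion-based logic.

```python
# Sketch: distance-weighted fuzzy k-NN assigning a MUP feature vector
# a membership in each candidate train. Data are synthetic.
import numpy as np

def fuzzy_knn(train_X, train_y, x, k=3, m=2):
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / np.maximum(d[nearest], 1e-12) ** (2 / (m - 1))
    classes = np.unique(train_y)
    member = np.array([w[train_y[nearest] == c].sum() for c in classes])
    member /= member.sum()                  # memberships sum to one
    return classes, member

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)           # two hypothetical MUPTs
classes, member = fuzzy_knn(X, y, np.array([3.5, 3.5]))
print(dict(zip(classes.tolist(), member.round(2))))
```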
17.
Software managers are routinely confronted with software projects that contain errors or inconsistencies and exceed budget and time limits. By mining software repositories with comprehensible data mining techniques, predictive models can be induced that offer software managers the insights they need to tackle these quality and budgeting problems efficiently. This paper examines the role that the Ant Colony Optimization (ACO)-based classification technique AntMiner+ can play as a comprehensible data mining technique for predicting erroneous software modules. In an empirical comparison on three real-world public datasets, the rule-based models produced by AntMiner+ are shown to achieve predictive accuracy competitive with that of models induced by several other classification techniques, such as C4.5, logistic regression and support vector machines. In addition, we argue that the intuitiveness and comprehensibility of the AntMiner+ models can be considered superior to those of the latter models.
18.
Hidden Markov models (HMMs) are a widely used tool for sequence modelling. In the sequence classification case, the standard approach consists of training one HMM for each class and then applying a standard Bayesian classification rule. In this paper, we introduce a novel classification scheme for sequences based on HMMs, obtained by extending the recently proposed similarity-based classification paradigm to HMM-based classification. In this approach, each object is described by the vector of its similarities with respect to a predetermined set of other objects, where these similarities are supported by HMMs. A central problem is the high dimensionality of the resulting space, and three alternatives for dealing with it are investigated. Synthetic and real experiments show that the similarity-based approach outperforms standard HMM classification schemes.
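A compact sketch of the representation: train one HMM per reference sequence, describe every sequence by its vector of log-likelihoods under those HMMs, and run an ordinary classifier in that similarity space. The code assumes the third-party hmmlearn package and uses synthetic one-dimensional sequences; the paper's dimensionality-reduction alternatives are not shown.

```python
# Sketch: similarity-based sequence classification with HMM log-likelihoods
# as the similarity measure.
import numpy as np
from hmmlearn import hmm
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)

def make_seq(mean):
    return rng.normal(mean, 1.0, (30, 1))   # one observation per row

train = [make_seq(0) for _ in range(5)] + [make_seq(3) for _ in range(5)]
labels = [0] * 5 + [1] * 5

# Reference set: one HMM per training sequence.
refs = []
for seq in train:
    model = hmm.GaussianHMM(n_components=2, n_iter=20, random_state=0)
    model.fit(seq)
    refs.append(model)

def similarity_vector(seq):
    return np.array([m.score(seq) for m in refs])   # log-likelihoods

S_train = np.array([similarity_vector(s) for s in train])
clf = KNeighborsClassifier(n_neighbors=3).fit(S_train, labels)

test = make_seq(3)
print("predicted class:", clf.predict([similarity_vector(test)])[0])
```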
19.
Software systems are increasingly seen as evolving systems. At design time, software is constantly adapted by the building process itself, and at runtime it can be adapted in response to changing conditions in the execution environment, such as location or resources. Adaptation is generally difficult to specify because of its cross-cutting impact on software. This article introduces an approach, based on Aspect Oriented Modeling, that unifies adaptation at design time and at runtime. Our approach proposes a unified aspect metamodel and a platform that realizes two different weaving processes to achieve design-time and runtime adaptations. This approach is used in a Dynamic Software Product Line which derives products that can be configured at design time and adapted at runtime in order to dynamically fit new requirements or resource changes. Such products are implemented using the Service Component Architecture and Java. Finally, we illustrate the use of our approach with an adaptive e-shopping scenario. The main advantages of this unification are a clear separation of concerns, a self-contained aspect model that can be woven both during design and during execution, and the platform independence guaranteed by the two different types of weaving.
20.
Guoxiang Gu 《IEEE Transactions on Automatic Control》2002,47(3):486-490
Uncertainty validation using frequency response data has been studied by several authors. If the uncertainty is assumed to be stable, then the problem amounts to one of boundary interpolation. It is shown in this paper that for more general uncertainty models, an additional constraint is needed in order for the results of Boulet et al. (1998) and Chen (1997) to be applicable. Boundary interpolation, together with the proposed constraint, gives a satisfactory answer to the validation problem for uncertainty models.