Similar Documents
 20 similar documents found.
1.
In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important to obtain the distribution of the expected utility estimate because it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. We propose an approach using cross-validation predictive densities to obtain expected utility estimates and Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use multilayer perceptron neural networks and Gaussian processes with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
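
A minimal sketch of the two ingredients described above, not the authors' MCMC setup: k-fold cross-validation collects a per-observation utility (here the log predictive density of a BayesianRidge model on synthetic data, both illustrative choices), and the Bayesian bootstrap (Dirichlet weights over observations) yields draws from the distribution of the expected utility estimate.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

# 1. k-fold cross-validation: per-point log predictive density as the utility u_i.
utilities = np.empty(len(y))
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = BayesianRidge().fit(X[train], y[train])
    mean, std = model.predict(X[test], return_std=True)
    utilities[test] = norm.logpdf(y[test], loc=mean, scale=std)

# 2. Bayesian bootstrap: Dirichlet(1,...,1) weights over observations give draws
#    from the distribution of the expected utility estimate.
weights = rng.dirichlet(np.ones(len(y)), size=4000)
expected_utility_draws = weights @ utilities
print("mean expected utility:", expected_utility_draws.mean())
print("90% interval:", np.percentile(expected_utility_draws, [5, 95]))
```

Comparing two models then amounts to computing the share of bootstrap draws in which one model's expected utility exceeds the other's.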

2.
The paper examines the validation of models for predicting the acceptance of academic placement offers by international applicants at a large metropolitan Australian university, using data mining techniques. Earlier work in enrolment management has examined various classification problems such as inquiry to enrol, persistence and graduation. The data and settings from different institutions are often different, which implies that in order to find out which models and techniques are applicable at a given university, the dataset from that university needs to be used in the validation effort. The whole dataset from the Australian university comprised 24,283 offers made to international applicants from 2008 to 2013. Every year, around 2000–2500 new international students who accept offers of academic placement commence their studies. The important predictors for the acceptance of offers were the chosen course and faculty, whether the student was awarded any form of scholarship, and the visa assessment level assigned to the applicant's country by the immigration department. Prediction models were developed using a number of classification methods, such as logistic regression, Naïve Bayes, decision trees, support vector machines, random forests, k-nearest neighbour and neural networks, and their performances were compared. Overall, the neural network prediction model with a single hidden layer produced the best result.
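
A hedged sketch of the model-comparison setup described above: the listed classifier families evaluated on the same acceptance-prediction task. The synthetic features stand in for the university's dataset, which is not available here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: one row per offer, binary target = offer accepted or not.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(),
    "k-NN": KNeighborsClassifier(),
    "neural net (1 hidden layer)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:28s} mean accuracy = {acc:.3f}")
```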

3.
Modeling Spatial-Temporal Data with a Short Observation History
A novel method is proposed for forecasting spatial-temporal data with a short observation history sampled on a uniform grid. The method is based on spatial-temporal autoregressive modeling, where the predictions of the response at the subsequent temporal layer are obtained using the response values from a recent history in a spatial neighborhood of each sampling point. Several modeling aspects, such as covariance structure and sampling, as well as identification, model estimation and forecasting issues, are discussed. Extensive experimental evaluation is performed on synthetic and real-life data. The proposed forecasting models were shown to be capable of providing near-optimal prediction accuracy on simulated stationary spatial-temporal data in the presence of additive noise and a correlated model error. Results on a spatial-temporal agricultural dataset indicate that the proposed methods can provide useful predictions on complex real-life data with a short observation history.
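
An illustrative sketch of spatial-temporal autoregression on a uniform grid, not the paper's estimator: the response at time t+1 at each grid cell is regressed on the previous layer's values at that cell and its 4-neighbourhood. The synthetic field and the plain linear model are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
T, H, W = 12, 20, 20                                # short observation history on a 20x20 grid
field = rng.normal(size=(T, H, W)).cumsum(axis=0)   # synthetic spatial-temporal data

def neighbourhood_features(frame):
    """Stack each interior cell's value with its 4 spatial neighbours."""
    c = frame[1:-1, 1:-1]
    n, s = frame[:-2, 1:-1], frame[2:, 1:-1]
    w, e = frame[1:-1, :-2], frame[1:-1, 2:]
    return np.stack([c, n, s, w, e], axis=-1).reshape(-1, 5)

# Training pairs: features from layer t, targets from layer t+1 (interior cells only).
X = np.vstack([neighbourhood_features(field[t]) for t in range(T - 1)])
y = np.concatenate([field[t + 1, 1:-1, 1:-1].ravel() for t in range(T - 1)])
model = LinearRegression().fit(X, y)

# One-step-ahead forecast from the last observed layer.
forecast = model.predict(neighbourhood_features(field[-1])).reshape(H - 2, W - 2)
print("forecast shape:", forecast.shape)
```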

4.
Bayesian Network Models for Web Effort Prediction: A Comparative Study
OBJECTIVE – The objective of this paper is to compare, using a cross-company dataset, several Bayesian Network (BN) models for Web effort estimation. METHOD – Eight BNs were built: four automatically, using the Hugin and PowerSoft tools with two training sets, each with 130 Web projects from the Tukutuku database; and four using a causal graph elicited by a domain expert, with parameters automatically fit using the same training sets used in the automated elicitation (hybrid models). Their accuracy was measured using two validation sets, each containing data on 65 projects, and point estimates. As a benchmark, the BN-based estimates were also compared to estimates obtained using Manual StepWise Regression (MSWR), Case-Based Reasoning (CBR), and mean- and median-based effort models. RESULTS – MSWR presented significantly better predictions than any of the BN models built herein, and in addition was the only technique to provide significantly superior predictions to a median-based effort model. CONCLUSIONS – This paper investigated data-driven and hybrid BN models using project data from the Tukutuku database. Our results suggest that the use of simpler models, such as the median effort, can outperform more complex models, such as BNs. In addition, MSWR seemed to be the only effective technique for Web effort estimation.

5.
6.
Context: Parametric cost estimation models need to be continuously calibrated and improved to assure more accurate software estimates and to reflect changing software development contexts. Local calibration, in which a subset of model parameters is tuned, is a frequent practice when software organizations adopt parametric estimation models to increase model usability and accuracy. However, there is a lack of understanding about the cumulative effects of such local calibration practices on the evolution of general parametric models over time. Objective: This study aims at quantitatively analyzing and effectively handling the local bias associated with historical cross-company data, thus improving the usability of cross-company datasets for calibrating and maintaining parametric estimation models. Method: We design and conduct three empirical studies to measure, analyze and address local bias in a cross-company dataset: (1) defining a method for measuring the local bias associated with each individual organization's data subset in the overall dataset; (2) analyzing the impact of local bias on the performance of an estimation model; (3) proposing a weighted sampling approach to handle local bias. The studies are conducted on the latest COCOMO II calibration dataset. Results: Our results show that local bias is pervasive in the cross-company dataset and that it negatively impacts the performance of the parametric model. The local-bias-based weighted sampling technique helps reduce these negative impacts on model performance. Conclusion: Local bias in cross-company data harms model calibration and adds noise to model maintenance. The proposed local bias measure offers a means to quantify the degree of local bias associated with a cross-company dataset and to assess its influence on parametric model performance. The local-bias-based weighted sampling technique can be applied to trade off and mitigate the potential risk of significant local bias, which otherwise limits the usability of cross-company data for general parametric model calibration and maintenance.
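
A hedged sketch of the idea, not the COCOMO II study itself: quantify the local bias each organization contributes to a cross-company dataset and use it to down-weight heavily biased subsets when sampling calibration data. The bias measure used here (mean absolute residual of a global model on each organization's projects) is an illustrative stand-in for the paper's definition, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
orgs = np.repeat(np.arange(8), 40)                        # 8 organizations, 40 projects each
log_size = rng.uniform(1, 6, size=len(orgs))              # log project size, synthetic
org_offset = rng.normal(scale=0.6, size=8)[orgs]          # organization-specific bias
log_effort = 1.1 * log_size + 0.5 + org_offset + rng.normal(scale=0.2, size=len(orgs))

# 1. Measure local bias: residual of a global model, averaged per organization.
global_model = LinearRegression().fit(log_size[:, None], log_effort)
residuals = log_effort - global_model.predict(log_size[:, None])
local_bias = np.array([np.abs(residuals[orgs == k].mean()) for k in range(8)])

# 2. Weighted sampling: organizations with large local bias contribute fewer points
#    to the calibration sample.
weights = 1.0 / (1.0 + local_bias[orgs])
weights /= weights.sum()
idx = rng.choice(len(orgs), size=200, replace=False, p=weights)
calibrated = LinearRegression().fit(log_size[idx, None], log_effort[idx])

print("per-organization local bias:", np.round(local_bias, 2))
print("calibrated slope/intercept:", round(calibrated.coef_[0], 2), round(calibrated.intercept_, 2))
```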

7.
Artificial Intelligence (AI) use in automated Electrocardiogram (ECG) classification has continuously attracted the research community's interest, motivated by promising results. Despite this promise, limited attention has been paid to the robustness of the results, which is a key element for implementation in clinical practice. Uncertainty Quantification (UQ) is critical for trustworthy and reliable AI, particularly in safety-critical domains such as medicine. Estimating uncertainty in Machine Learning (ML) model predictions has been used extensively for Out-of-Distribution (OOD) detection in single-label tasks; however, the use of UQ methods in multi-label classification remains underexplored. This study goes beyond developing highly accurate models by comparing five uncertainty quantification methods, all using the same Deep Neural Network (DNN) architecture, across various validation scenarios, including internal and external validation as well as OOD detection, taking multi-label ECG classification as the example domain. We show the importance of external validation and its impact on classification performance, the quality of uncertainty estimates, and calibration. Ensemble-based methods yield more robust uncertainty estimates than single-network or stochastic methods. Although current methods still have limitations in accurately quantifying uncertainty, particularly under dataset shift, combining uncertainty estimates with classification with a rejection option improves the ability to detect such changes. Moreover, we show that using uncertainty estimates as a criterion for sample selection in an active learning setting yields greater improvements in classification performance than random sampling.
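
A minimal sketch, not the paper's DNN setup, of ensemble-based uncertainty estimation with a rejection option: average the members' predicted probabilities, use predictive entropy as the uncertainty score, and abstain on the most uncertain samples. The model family, data, and rejection rate are illustrative assumptions (single-label for brevity, whereas the paper treats multi-label ECG).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=20, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Ensemble of identically structured networks trained from different initializations.
ensemble = [MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=s).fit(X_tr, y_tr)
            for s in range(5)]
proba = np.mean([m.predict_proba(X_te) for m in ensemble], axis=0)
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)        # predictive entropy

keep = entropy <= np.quantile(entropy, 0.8)                      # reject the 20% most uncertain
acc_all = (proba.argmax(axis=1) == y_te).mean()
acc_keep = (proba.argmax(axis=1)[keep] == y_te[keep]).mean()
print(f"accuracy without rejection: {acc_all:.3f}, with 20% rejection: {acc_keep:.3f}")
```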

8.
An important factor for planning, budgeting and bidding a software project is prediction of the development effort required to complete it. This prediction can be obtained from models related to neural networks. The hypothesis of this research was the following: the effort prediction accuracy of a general regression neural network (GRNN) model is statistically equal to or better than that obtained by a statistical regression model, using data obtained from industrial environments. Each model was generated from a separate dataset obtained from the International Software Benchmarking Standards Group (ISBSG) software projects repository. Each of the two models was then validated using a new dataset from the same ISBSG repository. Results obtained from a variance analysis of the accuracies of the models suggest that a GRNN could be an alternative for predicting the development effort of software projects developed in industrial environments.
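
A minimal sketch of a general regression neural network (GRNN): for a single smoothing parameter sigma it reduces to Gaussian-kernel (Nadaraya-Watson) regression, so each prediction is a distance-weighted average of the training targets. The synthetic size/effort data stand in for the ISBSG projects; sigma and the accuracy measure are assumptions.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(3)
size = rng.uniform(0, 1, size=(150, 1))                    # normalized project size
effort = 1.0 + 3.0 * size[:, 0] ** 1.2 + rng.normal(scale=0.2, size=150)

pred = grnn_predict(size[:100], effort[:100], size[100:], sigma=0.1)
mmre = np.mean(np.abs(pred - effort[100:]) / np.abs(effort[100:]))
print("MMRE on the hold-out projects:", round(float(mmre), 3))
```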

9.
This work studies the class-imbalance problem in software defect prediction and proposes a sampling method for handling imbalanced data, addressing the performance degradation that classifiers suffer when the classes in the sample set are imbalanced. To avoid the blindness of random sampling, a heuristic hybrid sampling method is used to balance the data: SMOTE over-sampling is applied to the minority class and K-Means clustering-based under-sampling to the majority class; multiple single classifiers are then combined through voting for ensemble prediction. Experimental results show that the defect prediction method combining hybrid sampling with ensemble learning achieves good classification performance, obtaining a high recall while significantly reducing the false alarm rate.
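
A hedged sketch of the hybrid-sampling plus voting-ensemble pipeline described above, assuming the imbalanced-learn (imblearn) package and an illustrative configuration rather than the paper's exact one: SMOTE over-samples the minority (defective) class partway, the K-Means-based ClusterCentroids under-sampler shrinks the majority class to match, and three base classifiers vote on the balanced data.

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import ClusterCentroids
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder defect dataset: ~8% defective modules.
X, y = make_classification(n_samples=3000, weights=[0.92, 0.08], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance the training set: over-sample defects to half the majority size,
# then cluster-undersample the majority class down to the same size.
X_bal, y_bal = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X_tr, y_tr)
X_bal, y_bal = ClusterCentroids(random_state=0).fit_resample(X_bal, y_bal)

vote = VotingClassifier([("lr", LogisticRegression(max_iter=1000)),
                         ("dt", DecisionTreeClassifier()),
                         ("rf", RandomForestClassifier())], voting="hard")
vote.fit(X_bal, y_bal)
print("recall on defective modules:", round(recall_score(y_te, vote.predict(X_te)), 3))
```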

10.
Bayesian inference and prediction are performed for a generalized autoregressive conditional heteroskedastic (GARCH) model in which the innovations are assumed to follow a mixture of two Gaussian distributions. The mixture GARCH model can capture the patterns usually exhibited by many financial time series, such as volatility clustering, large kurtosis and extreme observations. A Griddy-Gibbs sampler implementation is proposed for parameter estimation and volatility prediction. Bayesian prediction of the Value at Risk is also addressed, providing point estimates and predictive intervals. The method is illustrated using the Swiss Market Index.
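
An illustrative sketch of the model's likelihood only; the Griddy-Gibbs sampler itself is not reproduced here. It implements a GARCH(1,1) variance recursion with innovations drawn from a two-component Gaussian mixture; the parameter values and the stand-in return series are placeholders.

```python
import numpy as np
from scipy.stats import norm

def mixture_garch_loglik(r, omega, alpha, beta, p, s1, s2):
    """Log-likelihood of returns r under GARCH(1,1) with mixture-normal innovations.

    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
    eps_t ~ p * N(0, s1^2) + (1 - p) * N(0, s2^2), scaled by sqrt(sigma2_t).
    """
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()                       # simple initialization of the recursion
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    sd = np.sqrt(sigma2)
    dens = p * norm.pdf(r, scale=s1 * sd) + (1 - p) * norm.pdf(r, scale=s2 * sd)
    return np.log(dens).sum()

rng = np.random.default_rng(4)
returns = rng.standard_t(df=5, size=500) * 0.01       # heavy-tailed stand-in for index returns
print(mixture_garch_loglik(returns, omega=1e-6, alpha=0.08, beta=0.9, p=0.9, s1=0.8, s2=2.0))
```

In the Bayesian treatment, this likelihood is combined with priors and the posterior is explored over a grid for each parameter in turn, which is the role of the Griddy-Gibbs sampler.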

11.
Metamodels are approximate mathematical models used as surrogates for computationally expensive simulations. Since metamodels are widely used in design space exploration and optimization, there is growing interest in developing techniques to enhance their accuracy. It has been shown that the accuracy of metamodel predictions can be increased by combining individual metamodels in the form of an ensemble. Several efforts have focused on determining the contribution (or weight factor) of a metamodel in the ensemble using global error measures. In addition, prediction variance is also used as a local error measure to determine the weight factors. This paper investigates the efficiency of using local error measures, and also presents the use of the pointwise cross-validation error as a local error measure as an alternative to using prediction variance. The effectiveness of ensemble models is tested on several problems with varying dimensionality: five mathematical benchmark problems, two structural mechanics problems and an automobile crash problem. It is found that the spatial ensemble models show better performance than the global ensemble for the low-dimensional problems, while the global ensemble is more accurate than the spatial ensembles for the high-dimensional problems. Ensembles based on pointwise cross-validation error and prediction variance provide similar accuracy. The ensemble models based on local measures reduce cross-validation errors drastically, but their performance in reducing the error evaluated at random test points is less impressive, because the pointwise cross-validation error is not a good surrogate for the error at a point.
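
A minimal sketch of a global ensemble of metamodels, where each surrogate's weight is inversely proportional to its cross-validation error; the pointwise/local variants discussed above are not shown. The cheap test function and the two surrogate types are illustrative choices, not the paper's benchmarks.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def cheap_response(X):                       # stand-in for an expensive simulation
    return np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(60, 2))
y = cheap_response(X)

surrogates = {"GP": GaussianProcessRegressor(),
              "poly": make_pipeline(PolynomialFeatures(3), Ridge())}
# Global weight of each surrogate: inverse of its cross-validation RMSE, normalized.
cv_err = {name: -cross_val_score(m, X, y, cv=10, scoring="neg_root_mean_squared_error").mean()
          for name, m in surrogates.items()}
w = {name: (1.0 / e) / sum(1.0 / v for v in cv_err.values()) for name, e in cv_err.items()}

for m in surrogates.values():
    m.fit(X, y)
X_test = rng.uniform(-2, 2, size=(5, 2))
ensemble_pred = sum(w[name] * surrogates[name].predict(X_test) for name in surrogates)
print("weights:", {k: round(v, 2) for k, v in w.items()})
print("ensemble predictions:", ensemble_pred.round(2))
```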

12.
The quality of conceptual business process models is highly relevant for the design of corresponding information systems. In particular, a precise measurement of model characteristics can be beneficial from a business perspective, helping to save costs thanks to early error detection. This is just as true from a software engineering point of view, where models facilitate stakeholder communication and software system design. Research has investigated several proposed measures for business process models, mostly from a correlational perspective. This is helpful for understanding, for example, size and complexity as general driving forces of error probability. Yet design decisions usually have to build on thresholds, which can reliably indicate that a certain counter-action has to be taken. This cannot be achieved only by providing measures; it requires a systematic identification of effective and meaningful thresholds. In this paper, we derive thresholds for a set of structural measures for predicting errors in conceptual process models. To this end, we use a collection of 2000 business process models from practice as a means of determining thresholds, applying an adaptation of the ROC curve method. Furthermore, an extensive validation of the derived thresholds was conducted using 429 EPC models from an Australian financial institution. Finally, significant thresholds were adapted to refine existing modeling guidelines in a quantitative way.
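
A hedged sketch of deriving a threshold for a structural measure from labeled models: sweep candidate cut-offs with an ROC curve and pick the point maximizing Youden's J statistic, a standard variant of the ROC-based method referenced above (the paper uses its own adaptation). The synthetic size measure and error labels are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(6)
n_models = 2000
size = rng.gamma(shape=3.0, scale=10.0, size=n_models)        # e.g. number of nodes per model
p_error = 1 / (1 + np.exp(-(size - 40) / 8))                  # larger models err more often
has_error = rng.random(n_models) < p_error

fpr, tpr, thresholds = roc_curve(has_error, size)
best = np.argmax(tpr - fpr)                                   # Youden's J = TPR - FPR
print(f"suggested size threshold: {thresholds[best]:.1f} "
      f"(TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")
```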

13.
In agricultural and environmental sciences, dispersal models are often used for risk assessment, both to predict the risk associated with a given configuration and to test scenarios that are likely to minimise those risks. Like any biological process, dispersal is subject to biological, climatic and environmental variability, and its prediction relies on models and parameter values which can only approximate the real processes. In this paper, we present a Bayesian method to model dispersal using spatial configuration and climatic data (distances between emitters and receptors; main wind direction) while accounting for uncertainty, with an application to the prediction of the adventitious presence rate of genetically modified (GM) maize in a non-GM field. This method includes the design of candidate models, their calibration, selection and evaluation on an independent dataset. A group of models was identified that is sufficiently robust to be used for prediction purposes. The group of models allows local information to be included and reflects the observed variability in the data reliably enough that probabilistic model predictions can be performed and used to quantify risk under different scenarios or to derive optimal sampling schemes.

14.
In this paper, a sensor data validation/reconstruction methodology applicable to water networks, and its implementation by means of a software tool, are presented. The aim is to guarantee that the sensor data are reliable and complete in case sensor faults occur. The availability of such a dataset is of paramount importance in order to successfully use the sensor data for further tasks, e.g. water billing, network efficiency assessment, leak localization and real-time operational control. The methodology presented here is based on a sequence of tests and on the combined use of spatial models (SM) and time series models (TSM) applied to the sensors used for real-time monitoring and control of the water network. Spatial models take advantage of the physical relations between different system variables (e.g. flow and level sensors in hydraulic systems), while time series models take advantage of the temporal redundancy of the measured variables (here by means of a Holt–Winters (HW) time series model). First, the data validation approach, based on several tests of different complexity, is described to detect potentially invalid or missing data. Then, the reconstruction process is based on a set of spatial and time series models used to reconstruct the missing/invalid data with the model estimation providing the best fit. A software tool implementing the proposed data validation and reconstruction methodology is also described. Finally, results obtained by applying the proposed methodology to a real case study based on the Catalonia regional water network are used to illustrate its performance.
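
A minimal sketch of the time-series half of such a methodology: reconstruct a gap in a flow-sensor signal with a Holt–Winters model fitted on the data preceding the gap, using statsmodels' ExponentialSmoothing. The series, its daily seasonality, and the 12-hour gap are synthetic assumptions, not the Catalonia network data.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(7)
t = np.arange(24 * 14)                                     # two weeks of hourly samples
flow = 50 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=1.0, size=t.size)

# Suppose the validation tests flagged the last 12 hours as invalid/missing.
observed, missing = flow[:-12], flow[-12:]
hw = ExponentialSmoothing(observed, trend="add", seasonal="add", seasonal_periods=24).fit()
reconstructed = hw.forecast(12)
print("mean absolute reconstruction error:", float(np.abs(reconstructed - missing).mean()))
```

In the full methodology, the Holt–Winters estimate would compete with spatial-model estimates, and the reconstruction providing the best fit would be retained.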

15.
Standard practice in building models in software engineering normally involves three steps: (1) collecting domain knowledge (previous results, expert knowledge); (2) building a skeleton of the model based on step 1, including as yet unknown parameters; (3) estimating the model parameters using historical data. Our experience shows that it is extremely difficult to obtain reliable data of the required granularity, or of the required volume, with which we could later generalize our conclusions. Therefore, in searching for a method for building a model we cannot consider methods requiring large volumes of data. This paper discusses an experiment to develop a causal model (Bayesian net) for predicting the number of residual defects that are likely to be found during independent testing or operational usage. The approach supports (1) and (2), does not require (3), yet still makes accurate defect predictions (an R² of 0.93 between predicted and actual defects). Since our method does not require detailed domain knowledge, it can be applied very early in the process life cycle. The model incorporates a set of quantitative and qualitative factors describing a project and its development process, which are inputs to the model. The model variables, as well as the relationships between them, were identified as part of a major collaborative project. A dataset, elicited from 31 completed software projects in the consumer electronics industry, was gathered using a questionnaire distributed to managers of recent projects. We used this dataset to validate the model by analyzing several popular evaluation measures (R², measures based on the relative error, and Pred). The validation results also confirm the need for using the qualitative factors in the model. The dataset may be of interest to other researchers evaluating models with similar aims. Based on some typical scenarios, we demonstrate how the model can be used for better decision support in operational environments. We also performed a sensitivity analysis in which we identified the variables with the most influence on the number of residual defects. This showed that the project size, the scale of distributed communication and the project complexity cause most of the variation in the number of defects in our model. We make both the dataset and the causal model available for research use.
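
A small sketch of the evaluation measures named above (R², relative-error-based measures, Pred), applied to hypothetical predicted vs. actual residual-defect counts rather than the paper's dataset.

```python
import numpy as np

actual = np.array([12, 30, 7, 55, 21, 9, 40, 16])       # placeholder defect counts
predicted = np.array([10, 34, 9, 48, 25, 8, 37, 20])

ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                                 # R^2 between predicted and actual

mre = np.abs(actual - predicted) / actual                # magnitude of relative error
mmre = mre.mean()
pred_25 = (mre <= 0.25).mean()                           # Pred(25): share within 25% of actual

print(f"R^2 = {r2:.2f}, MMRE = {mmre:.2f}, Pred(25) = {pred_25:.2f}")
```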

16.
A cluster operator takes a set of data points and partitions the points into clusters (subsets). As with any scientific model, the scientific content of a cluster operator lies in its ability to predict results. This ability is measured by its error rate relative to cluster formation. To estimate the error of a cluster operator, a sample of point sets is generated, the algorithm is applied to each point set, the clusters are evaluated relative to the known partition determined by the distributions, and the errors are then averaged over the point sets composing the sample. Many validity measures have been proposed for evaluating clustering results based on a single realization of the random-point-set process. In this paper we consider a number of proposed validity measures and examine how well they correlate with error rates across a number of clustering algorithms and random-point-set models. Validity measures fall broadly into three classes: internal validation is based on calculating properties of the resulting clusters; relative validation is based on comparisons of partitions generated by the same algorithm with different parameters or different subsets of the data; and external validation compares the partition generated by the clustering algorithm and a given partition of the data. To quantify the degree of similarity between the validation indices and the clustering errors, we use Kendall's rank correlation between their values. Our results indicate that, overall, the performance of validity indices is highly variable. For complex models, or when a clustering algorithm yields complex clusters, both the internal and relative indices fail to predict the error of the algorithm. Some external indices appear to perform well, whereas others do not. We conclude that one should not put much faith in a validity score unless there is evidence, either in terms of sufficient data for model estimation or prior model knowledge, that the validity measure is well correlated with the error rate of the clustering algorithm.
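
A hedged sketch of the methodology: over many random point sets, compute an internal validity index (the silhouette score) for each k-means result and check, via Kendall's rank correlation, how well it tracks the clustering error. As a simple stand-in for the misclassification error against the known partition, the sketch uses 1 minus the adjusted Rand index; the point-set model and algorithm are illustrative.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

silhouettes, errors = [], []
for seed in range(30):                                   # 30 realizations of the point-set process
    X, labels = make_blobs(n_samples=300, centers=3,
                           cluster_std=0.5 + 0.1 * seed, random_state=seed)
    pred = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    silhouettes.append(silhouette_score(X, pred))        # internal validity index
    errors.append(1.0 - adjusted_rand_score(labels, pred))  # proxy for clustering error

tau, _ = kendalltau(silhouettes, errors)
print(f"Kendall tau between silhouette index and clustering error: {tau:.2f}")
```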

17.
Software defect prediction is a hot research topic in software quality assurance, and the quality of a defect prediction model is closely related to its training data. Datasets used for defect prediction mainly suffer from two problems: the selection of data features and class imbalance. For feature selection, commonly used software development process features together with newly proposed extended process features are adopted, and a feature selection algorithm based on cluster analysis is then applied. For class imbalance, an improved Borderline-SMOTE over-sampling method is proposed, so that the numbers of positive and negative samples in the training set become relatively balanced and the features of the synthesized samples better match those of real samples. Experiments on open-source datasets from projects such as bugzilla and jUnit show the following: the adopted feature selection algorithm reduces model training time by 57.94% while preserving the model's F-measure; compared with the original method, the defect prediction model trained on samples processed with the improved Borderline-SMOTE improves Precision, Recall, F-measure and AUC by 2.36, 1.8, 2.13 and 2.36 percentage points on average, respectively; introducing the extended process features improves the model's F-measure by 3.79% on average over the model without them; and compared with a model built using a method from the literature, the proposed method improves F-measure by 15.79% on average. The experimental results demonstrate that the proposed method can effectively improve the quality of defect prediction models.
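
A hedged sketch of the class-imbalance step described above, using the standard Borderline-SMOTE from the imbalanced-learn package rather than the paper's modified variant, and evaluating Precision, Recall, F-measure and AUC on a synthetic stand-in for the defect datasets.

```python
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder defect dataset with roughly 10% defective modules.
X, y = make_classification(n_samples=4000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Over-sample only near the class boundary, then train a standard classifier.
X_bal, y_bal = BorderlineSMOTE(random_state=0).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

pred, proba = clf.predict(X_te), clf.predict_proba(X_te)[:, 1]
print(f"Precision={precision_score(y_te, pred):.3f}  Recall={recall_score(y_te, pred):.3f}  "
      f"F-measure={f1_score(y_te, pred):.3f}  AUC={roc_auc_score(y_te, proba):.3f}")
```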

18.

19.
Since 1984, the International Function Point Users Group (IFPUG) has produced and maintained a set of standards and technical documents for a functional size measurement method, known as IFPUG, based on Albrecht's function points. In 1998, the Common Software Measurement International Consortium (COSMIC) proposed an improved measurement method known as full function points (FFP). The IFPUG and COSMIC methods both measure the functional size of software, but they produce different results. In this paper, we propose a model to convert functional size measures obtained with the IFPUG method to the corresponding COSMIC measures. We also present the validation of the model using 33 software projects measured with both methods. This approach may be beneficial to companies using both methods or migrating to COSMIC, so that past IFPUG data can be considered for future estimates using COSMIC and used as a validation procedure.
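
A minimal sketch of the conversion idea: fit a simple regression mapping IFPUG function points to COSMIC function points on projects measured with both methods, then use it to convert historical IFPUG data. The 33 paired measurements below are synthetic placeholders, not the paper's projects, and the linear form is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
ifpug = rng.uniform(50, 600, size=33)                        # IFPUG function points
cosmic = 1.1 * ifpug - 10 + rng.normal(scale=20, size=33)    # COSMIC function points (CFP)

conv = LinearRegression().fit(ifpug[:, None], cosmic)
print(f"CFP ~ {conv.coef_[0]:.2f} * IFPUG + {conv.intercept_:.1f}")
print("converted estimate for a 250-FP legacy project:",
      round(float(conv.predict([[250.0]])[0]), 1))
```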

20.
陈曙, 叶俊民, 刘童. 《软件学报》 (Journal of Software), 2020, 31(2): 266-281
Software defect prediction aims to help developers find and locate potential defects in software components early, so as to optimize the allocation of testing resources and improve software product quality. Cross-project defect prediction trains a model on defect datasets from existing projects to predict defects in a new project, but its performance is often unsatisfactory, mainly because sample datasets drawn from different projects differ considerably in their probability distributions, which strongly affects prediction accuracy. To address this problem, a supervised domain-adaptation approach to cross-project software defect prediction is proposed. It combines instance-weighted domain adaptation with the training of machine-learning prediction models: weights related to the target-project samples are constructed and applied to the abundant source-project samples, so that the instance weights influence the parameter learning of the prediction model and the distribution characteristics of the target project's defect data are adapted into the training data, thereby enabling the reuse of defect samples and cross-project defect prediction. The approach is evaluated empirically on ten large open-source software projects, analyzing different experimental settings from the perspectives of the datasets, data preprocessing and experimental results, and the over-fitting of the prediction model is examined at the data, model and model-adaptation levels. The experimental results show that the method outperforms comparable approaches, is significantly better than the baseline, and can approach or reach the performance of within-project defect prediction.
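
A hedged sketch of instance-weighted domain adaptation for cross-project defect prediction. The target-related weights are estimated here with a common density-ratio trick (a domain classifier separating source from target features); this is one standard weighting scheme, not necessarily the paper's construction, and the data are synthetic. The weights are then passed to the prediction model via sample_weight.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Source project (labeled) and target project with a shifted feature distribution.
X_src, y_src = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=1)
X_tgt, y_tgt = make_classification(n_samples=800, weights=[0.85, 0.15], shift=0.8, random_state=2)

# 1. Domain classifier: estimate p(target | x) and derive importance weights
#    w(x) proportional to p(target | x) / p(source | x) for the source instances.
X_dom = np.vstack([X_src, X_tgt])
d = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
dom = LogisticRegression(max_iter=1000).fit(X_dom, d)
p_tgt = dom.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt + 1e-12)

# 2. Train the defect predictor on source data, weighted toward target-like instances.
clf = RandomForestClassifier(random_state=0).fit(X_src, y_src, sample_weight=weights)
print("F-measure on target project:", round(f1_score(y_tgt, clf.predict(X_tgt)), 3))
```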
