Similar Literature
20 similar documents found (search time: 15 ms)
1.
Context: Due to the complex nature of the software development process, traditional parametric models and statistical methods often appear inadequate to model the increasingly complicated relationship between project development cost and project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Data preprocessing has been claimed by many researchers to be a fundamental stage of ML methods; however, very few works have focused on the effects of data preprocessing techniques. Objective: This study aims at an empirical assessment of the effectiveness of data preprocessing techniques on ML methods in the context of software cost estimation. Method: We first conduct a literature survey of recent publications using data preprocessing techniques, followed by a systematic empirical study analyzing the strengths and weaknesses of individual data preprocessing techniques as well as their combinations. Results: Our results indicate that data preprocessing techniques can significantly influence the final prediction, and they sometimes have negative impacts on the prediction performance of ML methods. Conclusion: To reduce prediction errors and improve efficiency, preprocessing techniques must be selected carefully according to the characteristics of both the machine learning methods and the datasets used for software cost estimation.
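As a minimal illustration of why preprocessing can change an ML method's prediction (the projects, feature names, and effort values below are made up, not taken from the study), consider analogy-based estimation with and without min-max normalisation. Without scaling, the larger-magnitude feature dominates the distance and a different analogue is retrieved:

```python
# Hypothetical projects: features are (team_size, kloc), targets are effort
# in person-months. The feature scales differ, so a distance-based
# estimator retrieves a different nearest neighbour depending on whether
# the data is normalised first.
projects = [((5, 108.0), 20.0), ((48, 300.0), 95.0), ((20, 150.0), 45.0)]
query = (50, 110.0)

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def nearest_effort(query, projects, scale=False):
    feats = [p[0] for p in projects] + [query]
    cols = list(zip(*feats))                 # column-wise view of the features
    if scale:
        cols = [min_max_scale(list(c)) for c in cols]
    rows = list(zip(*cols))
    *train, q = rows
    dists = [sum((a - b) ** 2 for a, b in zip(r, q)) for r in train]
    return projects[dists.index(min(dists))][1]

print(nearest_effort(query, projects))              # raw features: kloc dominates
print(nearest_effort(query, projects, scale=True))  # scaled: team size counts too
```

With raw features the query is matched to the project with similar kloc; after min-max scaling it is matched to the project with similar team size, producing a very different effort estimate.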

2.
Context: Software defect prediction has been widely studied using various machine-learning algorithms. Previous studies usually focus on within-company defect prediction (WCDP), but the lack of training data in the early stages of software testing limits the efficiency of WCDP in practice. Recent research has therefore largely examined cross-company defect prediction (CCDP) as an alternative. Objective: The gap between the distributions of cross-company (CC) data and within-company (WC) data usually makes it difficult to build a high-quality CCDP model. In this paper, a novel algorithm named Double Transfer Boosting (DTB) is introduced to narrow this gap and improve CCDP performance by reducing negative samples in CC data. Method: The proposed DTB model integrates two levels of data transfer: first, the data gravitation method reshapes the overall distribution of CC data to fit the WC data; second, the transfer boosting method uses a small proportion of labeled WC data to eliminate negative instances from the CC data. Results: The empirical evaluation was conducted on 15 publicly available datasets. CCDP experiments indicated that the proposed model achieved better overall performance than the compared CCDP models. DTB was also compared with WCDP in two different situations: statistical analysis suggested that DTB performed significantly better than WCDP models trained on limited samples and produced results comparable to WCDP with sufficient training data. Conclusions: DTB reshapes the distribution of CC data at different levels to improve CCDP performance, and the experimental results and analysis demonstrate that it can be an effective model for early software defect detection.

3.

Context

Software defect prediction studies usually build models using within-company data, but very few have focused on prediction models trained with cross-company data. Models built on within-company data are difficult to apply in practice because such local data repositories are often unavailable. Recently, transfer learning has attracted increasing attention for building classifiers in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?

Objective

In this paper, we consider the cross-company defect prediction scenario where source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a fast and highly effective prediction model.

Method

Unlike prior work, which selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB) that uses the information of all the relevant features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into weights on the training instances. The defect prediction model is then built on these weighted data.

Results

This article presents a theoretical analysis of the compared methods and reports experimental results on datasets from different organizations. They indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than state-of-the-art methods.

Conclusion

We conclude that when there are too few local training data to train good classifiers, useful feature-level knowledge from training data with a different distribution can help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
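The instance-weighting idea behind TNB can be sketched as follows. The w = s / (k − s + 1)² gravitation-style formula is an assumption based on published descriptions of TNB, not the authors' code, and the feature values are made up:

```python
# Sketch of TNB-style instance weighting: a cross-company training
# instance gets a higher weight the more of its k feature values fall
# inside the [min, max] range observed in the target (test) data.
def tnb_weights(train, test):
    k = len(test[0])
    mins = [min(col) for col in zip(*test)]
    maxs = [max(col) for col in zip(*test)]
    weights = []
    for row in train:
        s = sum(1 for j, v in enumerate(row) if mins[j] <= v <= maxs[j])
        weights.append(s / (k - s + 1) ** 2)  # assumed gravitation-style weight
    return weights

train = [(1.0, 5.0, 3.0), (9.0, 9.0, 9.0)]   # cross-company instances
test = [(0.5, 4.0, 2.0), (2.0, 6.0, 4.0)]    # target-company instances
print(tnb_weights(train, test))  # first instance matches all 3 feature ranges
```

A weighted Naive Bayes classifier would then be trained with these per-instance weights, so instances that look like the target data dominate the learned probabilities.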

4.
Based on HY-2 altimeter data, a local linear regression non-parametric estimation method, using a spherical-harmonic kernel function and a locally adjustable bandwidth, is applied to estimate the sea state bias non-parametrically at the crossover points of cycles 70 and 71. The model is evaluated in terms of explained variance, the correlation of the sea state bias with significant wave height and wind speed, and an analysis of the model residuals. The results are compared with those of a parametric model estimated on the same dataset. The comparison shows that the correlations of the selected non-parametric model's sea state bias with significant wave height and with wind speed are both at a high level, indicating that the model is more effective. The non-parametric and parametric models each have strengths in different latitude bands; in the high-latitude regions of the Northern Hemisphere, the non-parametric model performs better.
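A one-dimensional sketch of local linear regression may help fix ideas. The study uses a spherical-harmonic kernel on spherical altimeter data; a Gaussian kernel with a scalar predictor is used here purely as a simplified illustration:

```python
import math

# Local linear regression: at each query point x0, fit a weighted
# least-squares line, with weights decaying with distance from x0
# (bandwidth controls how "local" the fit is), and evaluate it at x0.
def local_linear(x0, xs, ys, bandwidth):
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, zip(xs, ys)))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]   # exactly linear data: y = 2x
print(local_linear(2.5, xs, ys, bandwidth=1.0))  # recovers 5.0
```

On exactly linear data the local linear fit reproduces the line; on real sea-state-bias data the locally adjustable bandwidth trades bias against variance.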

5.
Context: Several issues hinder software defect data, including redundancy, correlation, feature irrelevance, and missing samples. It is also hard to ensure a balanced distribution between data pertaining to defective and non-defective software; in most experimental cases, data for the non-defective class dominates the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on defect classification performance. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on defect classification performance. Results: Forward selection showed that only a few features contribute to a high area under the receiver operating characteristic curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson's correlation, suggesting that the features are highly unstable. However, ensemble learners such as random forests and the proposed algorithm, average probability ensemble (APE), are not as affected by poor features as weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 on the NASA datasets PC2, PC4, and MC1. Conclusion: This paper shows that the features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
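Greedy forward selection itself is simple to sketch. The scoring function below is a made-up stand-in; in the paper it would be the cross-validated AUC of the ensemble learner on the candidate feature subset:

```python
# Greedy forward selection (GFS): repeatedly add the single feature that
# most improves the score, stopping when no remaining feature helps.
def greedy_forward_selection(features, score):
    selected, best = [], score([])
    while True:
        gains = [(score(selected + [f]), f) for f in features if f not in selected]
        if not gains:
            break
        top_score, top_feat = max(gains)
        if top_score <= best:
            break
        selected.append(top_feat)
        best = top_score
    return selected, best

# Made-up score surface: "a" helps most, "b" helps, "c" hurts.
def toy_score(subset):
    return 0.5 + 0.2 * ("a" in subset) + 0.1 * ("b" in subset) - 0.05 * ("c" in subset)

sel, best = greedy_forward_selection(["a", "b", "c"], toy_score)
print(sel, round(best, 2))  # selects a then b, never c
```

Because each step re-evaluates the model with the candidate feature added, GFS naturally accounts for interactions with already-selected features, unlike a per-feature filter such as Pearson's correlation.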

6.
This paper presents a generic framework in which images are modelled as order-less sets of weighted visual features. Each visual feature is associated with a weight factor that may inform its relevance. This framework can be applied to various bag-of-features approaches such as the bag-of-visual-words or the Fisher kernel representations. We suggest that if dense sampling is used, different schemes to weight local features can be evaluated, leading to results that are often better than the combination of multiple sampling schemes, at a much lower computational cost, because the features are extracted only once. This allows our framework to be a test-bed for saliency estimation methods in image categorisation tasks. We explored two main possibilities for the estimation of local feature relevance. The first is based on saliency maps obtained from human feedback, either by gaze tracking or by mouse clicks; the method is able to profit from such maps, leading to a significant improvement in categorisation performance. The second is based on automatic saliency estimation methods, including Itti & Koch's method and SIFT's DoG. We evaluated the proposed framework and saliency estimation methods using an in-house dataset and the PASCAL VOC 2008/2007 datasets, showing that some of the saliency estimation methods lead to a significant performance improvement in comparison to the standard unweighted representation.
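The core idea, applied to the bag-of-visual-words case, can be sketched in a few lines. The word assignments and saliency weights below are made up for illustration:

```python
# Weighted bag-of-visual-words: each local feature contributes its
# saliency weight to its visual word's bin instead of a constant 1,
# and the histogram is L1-normalised.
def weighted_bovw(word_ids, weights, vocab_size):
    hist = [0.0] * vocab_size
    for wid, w in zip(word_ids, weights):
        hist[wid] += w
    total = sum(hist)
    return [h / total for h in hist] if total else hist

word_ids = [0, 0, 2, 3]           # visual word assigned to each local feature
saliency = [0.9, 0.8, 0.1, 0.2]   # e.g. from a gaze map or a DoG response
print(weighted_bovw(word_ids, saliency, vocab_size=4))
```

Setting all weights to 1 recovers the standard unweighted representation, which is what makes the framework a convenient test-bed for comparing saliency estimators.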

7.
In 2004 [Kitchenham, B.A., Mendes, E., 2004a. Software productivity measurement using multiple size measures. IEEE Transactions on Software Engineering 30 (12), 1023-1035; Kitchenham, B.A., Mendes, E., 2004b. A comparison of cross-company and single-company effort estimation models for web applications. In: Proceedings Evaluation and Assessment in Software Engineering (EASE'04), pp. 47-55] (S1) investigated, using data on 63 Web projects, to what extent a cross-company cost model could be successfully employed to estimate development effort for single-company Web projects. Their effort models were built using forward stepwise regression (SWR), and they found that cross-company predictions were significantly worse than single-company predictions. Study S1 was extended by Mendes and Kitchenham [Mendes, E., Kitchenham, B.A., 2004. Further comparison of cross-company and within-company effort estimation models for web applications. In: Proceedings International Software Metrics Symposium (METRICS'04), Chicago, Illinois, September 11-17, 2004. IEEE Computer Society, pp. 348-357] (S2), who used SWR and case-based reasoning (CBR) and data on 67 Web projects from the Tukutuku database. They built two cross-company models and one single-company model and found that both the SWR cross-company models and the CBR cross-company data provided predictions significantly worse than single-company predictions. Since 2004 another 83 projects were volunteered to the Tukutuku database and were recently used by Mendes et al. [Mendes, E., Di Martino, S., Ferrucci, F., Gravino, C., in press. Effort estimation: How valuable is it for a web company to use a cross-company data set, compared to using its own single-company data set? In: Proceedings of International World Wide Web Conference (WWW'07), Banff, Canada, 8-12 May] (S3), who partially replicated Mendes and Kitchenham's study (S2) using SWR and CBR.
They corroborated some of S2's findings (the SWR cross-company model and the CBR cross-company data provided predictions significantly worse than single-company predictions); however, they replicated only part of S2. The objective of this paper (S4) is therefore to extend Mendes et al.'s work and fully replicate S2. We used the same dataset as S3, and our results corroborated most of those obtained in S2. The main difference between S2 and our study was that one of our SWR cross-company models showed predictions significantly similar to those of the single-company model, which contradicts the findings from S2.

8.
Objective: A light field camera records both the spatial and the angular information of a scene in a single exposure, yielding multi-view and refocused images, which gives it a unique advantage in depth estimation. Occlusion is one of the hard problems in light field depth estimation: existing methods either ignore occlusion or handle only single-occluder cases, and they fail at scene points with multiple occluders. To address occlusion, this paper proposes an occlusion-robust light field depth estimation algorithm within a multi-view stereo matching framework. Method: First, refocused images are obtained with a digital refocusing algorithm, the scene's occlusion types are defined, and correlation cost volumes are constructed. Then the best cost volume is selected adaptively according to the minimum-cost principle, and a local depth map is solved. Finally, a Markov random field combines the cost volume with a smoothness constraint, and a globally optimized depth map is obtained via graph cuts and weighted median filtering, improving depth estimation accuracy. Results: Experiments were conducted on the HCI synthetic dataset and the Stanford Lytro Illum real-scene dataset, covering both local and global depth estimation. The results show that, compared with other state-of-the-art methods, the proposed method handles occluded scenes better, reducing the mean squared error by about 26.8% on average. Conclusion: The proposed method handles different occlusion cases effectively, better preserves edge information in the depth map, produces more accurate depth estimates, and runs faster. It is, however, designed for Lambertian planar scenes and has limitations on non-Lambertian scenes containing specular highlights.

9.
Recent studies have reported that Support Vector Regression (SVR) has potential as a technique for software development effort estimation. However, its prediction accuracy is heavily influenced by the setting of parameters that must be done when employing it. No general guidelines are available for selecting these parameters, whose choice also depends on the characteristics of the dataset being used. This motivated the work described in Corazza et al. (2010), which is extended herein. To automatically select suitable SVR parameters, we proposed an approach based on the meta-heuristic Tabu Search (TS). We designed TS to search for the parameters of both the support vector algorithm and the employed kernel function (RBF). We empirically assessed the effectiveness of the approach using different types of datasets (single- and cross-company datasets, Web and non-Web projects) from the PROMISE repository and the Tukutuku database. A total of 21 datasets were employed to perform a 10-fold or a leave-one-out cross-validation, depending on the size of the dataset. Several benchmarks were taken into account to assess both the effectiveness of TS in setting SVR parameters and the prediction accuracy of the proposed approach with respect to widely used effort estimation techniques. The use of TS allowed us to automatically obtain the suitable parameter choices required to run SVR. Moreover, the combination of TS and SVR significantly outperformed all the other techniques. The proposed approach represents a suitable technique for software development effort estimation.
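A toy tabu search over a discrete (C, gamma) grid illustrates the search strategy. The error surface below is a made-up quadratic stand-in; in the paper the objective would be the cross-validated prediction error of SVR with an RBF kernel:

```python
import math

# Made-up "error surface" with its minimum at C = 10, gamma = 1e-2.
def error(c, g):
    return (math.log10(c) - 1) ** 2 + (math.log10(g) + 2) ** 2

grid_c = [0.1, 1, 10, 100]
grid_g = [1e-4, 1e-3, 1e-2, 1e-1]

def tabu_search(start, iters=20, tabu_len=5):
    current = best = start
    tabu = [start]                      # short-term memory of visited cells
    for _ in range(iters):
        ci, gi = current
        neighbours = [(ci + dc, gi + dg) for dc in (-1, 0, 1) for dg in (-1, 0, 1)
                      if (dc, dg) != (0, 0)
                      and 0 <= ci + dc < len(grid_c) and 0 <= gi + dg < len(grid_g)]
        candidates = [n for n in neighbours if n not in tabu]
        if not candidates:
            break
        # move to the best non-tabu neighbour, even if it is worse (escapes local optima)
        current = min(candidates, key=lambda n: error(grid_c[n[0]], grid_g[n[1]]))
        tabu = (tabu + [current])[-tabu_len:]
        if error(grid_c[current[0]], grid_g[current[1]]) < error(grid_c[best[0]], grid_g[best[1]]):
            best = current
    return grid_c[best[0]], grid_g[best[1]]

print(tabu_search((0, 0)))  # reaches the grid optimum (10, 0.01)
```

The tabu list forces the search to keep moving even after it reaches a minimum, which is what distinguishes it from plain hill climbing.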

10.
崔帅, 张骏, 高隽. 《中国图象图形学报》, 2019, 24(12): 2111-2125
Objective: Color constancy usually refers to the human ability to perceive object colors correctly under arbitrary illumination, and it is an important prerequisite for high-level tasks such as recognition, segmentation, and 3D vision. Estimating the illuminant color of an image is one of the main routes to computational color constancy, but existing illuminant estimation methods often suffer large errors caused by ambiguous colors in local scene regions. This paper therefore proposes an illuminant color estimation method based on deep residual learning. Method: The input image is divided into uniform patches, and the global illuminant color of the whole image is estimated from the illuminant estimates of the local patches. The algorithm comprises two residual networks: an illuminant estimation network, whose deeper layers and residual structure improve estimation accuracy, and a patch selection network, which classifies patches by their illuminant estimation error; patches with large errors are removed according to the classification result, further improving the global illuminant estimate. In addition, the input image is preprocessed into log-chrominance form, which reduces the influence of image brightness on illuminant estimation and improves computational efficiency. Results: Experiments on the NUS-8 dataset and the reprocessed ColorChecker dataset show that the proposed method achieves good accuracy and robustness. Under the same conditions, log-chrominance images yield 10%-15% lower estimation error than raw images, and the patch selection network further reduces the illuminant estimation network's error by about 5%. Conclusion: Experiments on two single-illuminant datasets show that the overall design is sound and effective, with good accuracy and robustness; the method can be applied to image processing and computer vision tasks that require color correction.
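The log-chrominance preprocessing can be sketched as follows. The exact formulation in the paper is not given in the abstract; the green-channel-normalised form below is a common variant and should be read as an assumption:

```python
import math

# Log-chrominance: dividing by the green channel removes overall
# brightness, and the log makes multiplicative illuminant changes
# approximately additive. eps guards against division by zero.
def log_chroma(r, g, b, eps=1e-6):
    return math.log((r + eps) / (g + eps)), math.log((b + eps) / (g + eps))

# The same surface under a dim and a bright exposure of one illuminant
# maps to (almost) the same log-chroma point:
print(log_chroma(0.2, 0.4, 0.1))
print(log_chroma(0.4, 0.8, 0.2))
```

Because absolute brightness cancels out, a network fed log-chroma inputs does not have to learn exposure invariance from data, which is consistent with the reported 10%-15% error reduction.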

11.
Background: Source code size in terms of SLOC (source lines of code) is the input of many parametric software effort estimation models; however, it is unavailable at the early phase of software development. Objective: We investigate the accuracy of early SLOC estimation approaches for an object-oriented system using information collected from its UML class diagram, which is available at the early software development phase. Method: We use different modeling techniques to build prediction models and investigate the accuracy of six types of metrics for estimating SLOC. The techniques include linear models, non-linear models, rule/tree-based models, and instance-based models. The investigated metrics are class diagram metrics, predictive object points, the object-oriented project size metric, fast&serious class points, objective class points, and object-oriented function points. Results: Based on 100 open-source Java systems, we find that the prediction model built using the object-oriented project size metric and ordinary least squares regression with a logarithmic transformation achieves the highest accuracy (mean MMRE = 0.19 and mean Pred(25) = 0.74). Conclusion: We should use the object-oriented project size metric and ordinary least squares regression with a logarithmic transformation to build a simple, accurate, and comprehensible SLOC estimation model.
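The best-performing setup, ordinary least squares on log-transformed data, amounts to fitting log(SLOC) = a + b·log(size metric) and exponentiating the prediction. The data points below are made up for illustration:

```python
import math

# OLS on log-transformed data: fit log(SLOC) = a + b * log(size_metric).
def fit_log_ols(sizes, slocs):
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in slocs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def predict_sloc(a, b, size):
    return math.exp(a + b * math.log(size))  # back-transform to SLOC

sizes = [10, 20, 40, 80]          # hypothetical project size metric values
slocs = [1000, 2000, 4000, 8000]  # exactly linear in log-log space
a, b = fit_log_ols(sizes, slocs)
print(round(b, 3), round(predict_sloc(a, b, 160)))
```

The log transformation handles the right-skew typical of size data and turns multiplicative effects into additive ones, which is why it often improves both fit and MMRE.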

12.
ABSTRACT

Aboveground biomass (AGB) of mangrove forest plays a crucial role in the global carbon cycle by reducing greenhouse gas emissions and mitigating climate change impacts. Accurately monitoring mangrove forest biomass remains challenging compared to other forest ecosystems. We investigated the usability of machine learning techniques for the estimation of AGB of a mangrove plantation in a coastal area of Hai Phong city (Vietnam). The study employed a GIS database and support vector regression (SVR) to build and verify an AGB model, drawing upon data from a survey of 25 sampling plots and an integration of Advanced Land Observing Satellite-2 Phased Array Type L-band Synthetic Aperture Radar-2 (ALOS-2 PALSAR-2) dual-polarization horizontal transmitting and horizontal receiving (HH) and horizontal transmitting and vertical receiving (HV) data and Sentinel-2A multispectral data. The performance of the model was assessed using root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and leave-one-out cross-validation. Usability of the SVR model was assessed by comparison with four state-of-the-art machine learning techniques, i.e. radial basis function neural networks, multi-layer perceptron neural networks, Gaussian process, and random forest. The SVR model shows a satisfactory result (R2 = 0.596, RMSE = 0.187, MAE = 0.123) and outperforms the four machine learning models. The SVR model-estimated AGB ranged between 36.22 and 230.14 Mg ha−1 (average = 87.67 Mg ha−1). We conclude that an integration of ALOS-2 PALSAR-2 and Sentinel-2A data used with an SVR model can improve the accuracy of AGB estimation for mangrove plantations in tropical areas.
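The three evaluation metrics named in the abstract are easy to write out explicitly. The observed/predicted values below are made up, not study data:

```python
import math

# Root mean square error: penalises large errors quadratically.
def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

# Mean absolute error: average magnitude of the errors.
def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

# Coefficient of determination: fraction of variance explained.
def r2(y, p):
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

observed  = [40.0, 90.0, 150.0, 220.0]   # hypothetical AGB, Mg/ha
predicted = [50.0, 80.0, 160.0, 210.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Leave-one-out cross-validation then simply repeats model fitting n times, each time holding out one of the n plots and predicting it with a model trained on the other n − 1.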

13.
Indoor scene reconstruction based on an RGB-D depth camera
Objective: Reconstructing colored 3D scene models with realistic texture is an important research topic in computer vision. Because indoor scenes are complex and the sampled image sequences are long with unconstrained motion, existing 3D reconstruction algorithms suffer from limited reconstruction scale and poor reconstruction of local detail. Method: Building on the RGBD-SLAM algorithm, two improvements are proposed. First, plane information extracted from the depth map is incorporated into the frame-to-frame registration algorithm, improving its robustness and accuracy. Second, during truncated signed distance function (TSDF) volumetric reconstruction, an exponential weight function is proposed, which reduces the effect of camera depth distortion on the reconstruction better than the usual weight function. Results: The proposed method yields better camera pose estimates than RGBD-SLAM, reducing the mean absolute trajectory error by 1.3 cm, and achieves better reconstruction results. Conclusion: The method effectively improves camera pose estimation accuracy and can be applied to indoor scene reconstruction.
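TSDF fusion with a depth-dependent weight can be sketched as a weighted running average per voxel. The exact exponential weight function is not given in the abstract, so the form below (weight decaying exponentially with measurement depth, where consumer depth cameras distort most) is an assumption:

```python
import math

# Assumed exponential weight: far readings get smaller fusion weights.
def weight(depth_m, k=0.5):
    return math.exp(-k * depth_m)

# Weighted running average of the signed distance stored in one voxel.
def fuse(tsdf, w, new_tsdf, depth_m):
    nw = weight(depth_m)
    fused = (tsdf * w + new_tsdf * nw) / (w + nw)
    return fused, w + nw

voxel, w = 0.0, 0.0
# two near, consistent readings and one far, distorted reading
for sdf, depth in [(0.02, 1.0), (0.02, 1.2), (0.30, 4.0)]:
    voxel, w = fuse(voxel, w, sdf, depth)
print(round(voxel, 3))
```

With a constant weight the outlier would pull the voxel to (0.02 + 0.02 + 0.30) / 3 ≈ 0.113; the depth-dependent weight keeps it much closer to the consistent near readings.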

14.
ABSTRACT

For photometric stereo in capsule endoscopy, calibration of the light source is crucial for improving the precision of surface normal estimation. Therefore, this paper presents an improved planar-mirror-based light source position calibration method: from captured images of the light source and detected poses of the planar mirror, light paths are retraced from the camera to the light source, and the position of the light source is triangulated with a least-squares method. The contribution of this paper is that a refraction model of the planar mirror is employed in the retracing of the light paths; thus the bias of the light paths caused by refraction can be compensated and the position of the light source can be estimated more precisely. Simulation and experimental results show that the proposed method provides higher calibration accuracy than the current planar-mirror-based calibration method and can improve the precision of subsequent photometric-stereo-based 3D reconstruction.
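The least-squares triangulation step can be sketched as finding the point closest, in squared distance, to a set of 3D rays. The ray values below are made up; in the paper the rays would be the retraced, refraction-corrected light paths from the camera:

```python
# Solves sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) p_i, where each
# ray has origin p_i and unit direction d_i: the normal equations for
# the point minimising total squared distance to the rays.
def solve3(a, b):
    # naive Gauss-Jordan elimination with partial pivoting, 3x3 system
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def triangulate(rays):
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, d in rays:
        for i in range(3):
            for j in range(3):
                c = (i == j) - d[i] * d[j]   # (I - d d^T) entry
                a[i][j] += c
                b[i] += c * p[j]
    return solve3(a, b)

# Two rays that intersect at (1, 2, 3): one along x, one along y.
rays = [((0.0, 2.0, 3.0), (1.0, 0.0, 0.0)),
        ((1.0, 0.0, 3.0), (0.0, 1.0, 0.0))]
print(triangulate(rays))
```

With noisy, non-intersecting rays the same equations return the point of closest mutual approach, which is the least-squares estimate of the light source position.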

15.
Software cost estimation techniques help control schedule and reduce risk during project execution, thereby assuring the quality of the developed software. In this paper, the feature weights of a weighted analogy-based cost estimation model are optimized with an improved particle swarm optimization algorithm (C&S-PSO), and the estimation accuracy is compared, using the MMRE and Pred(0.25) criteria, against unweighted analogy, support vector regression, and fuzzy neural network models. In addition, the stability of the optimized PSO-weighted analogy estimation model is assessed with a non-parametric bootstrap method. The results show that the proposed model achieves higher estimation accuracy than the models above and also exhibits good stability.
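The two accuracy criteria named above, MMRE and Pred(0.25), can be written out directly. The actual/estimated effort pairs below are made up:

```python
# MMRE: mean magnitude of relative error over all projects.
def mmre(actual, estimated):
    return sum(abs(a - e) / a for a, e in zip(actual, estimated)) / len(actual)

# Pred(l): fraction of projects whose relative error is within l (e.g. 25%).
def pred(actual, estimated, level=0.25):
    hits = sum(1 for a, e in zip(actual, estimated) if abs(a - e) / a <= level)
    return hits / len(actual)

actual    = [100.0, 200.0, 400.0]   # hypothetical effort, person-months
estimated = [110.0, 140.0, 420.0]
print(mmre(actual, estimated), pred(actual, estimated))
```

Lower MMRE and higher Pred(0.25) both indicate better estimates; the two are reported together because MMRE is sensitive to a few large relative errors while Pred(0.25) is not.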

16.
Objective: To remedy the incomplete sampling and ambiguous model expression of indirect camera calibration, and to achieve complete sampling of the detection field of view under small fields of view, a support vector machine (SVM) calibration method based on full-field sampling of a binocular system is proposed. Method: Exploiting the fact that the target-point indices on a hexagonal-lattice calibration board are readable, the disparity coordinates and world coordinates of detection points across the entire effective field of view of the binocular system are collected to build a complete sample set. An SVM is trained on this sample set, and the learned model parameters are substituted into its decision function to obtain a formulaic calibration model. Because five mutually non-centrosymmetric polygons are distributed at the four corners and the center of the hexagonal-lattice board, the board's pose can be recovered even when only part of the board is captured, allowing the index of each captured target point to be read. By moving the board up and down and reading the target-point indices in the images with HALCON operators, a complete sample set covering the detection region of the binocular vision system is built. Finally, training the samples with the SVM algorithm yields a calibration model whose mathematical form can be stated explicitly. Results: Compared with a model built by conventional sampling, experiments show that the calibration error of the proposed model is reduced by 24.51%, lowering the calibration error in regions left unsampled by the conventional method and demonstrating the feasibility of the approach. Conclusion: An SVM calibration method based on full-field sampling of a binocular system is proposed; it identifies target-point indices through non-centrosymmetric polygons and achieves complete sampling of the binocular system's detection field of view. Experiments show that the method improves the accuracy of indirect camera calibration, with good applicability and robustness, making it suitable for indirect calibration of binocular vision systems in small fields of view.

17.
Research on an incremental algorithm and software design for Laguerre diagrams of weighted point sets
As a generalization of the Voronoi diagram, the Laguerre diagram has important applications in computational geometry, materials science, and other fields. This paper focuses on an incremental algorithm for the regular triangulation of a weighted point set and on the process of constructing the Laguerre diagram from its dual. By studying how sphere-packed weighted point sets affect the cell structure of the Laguerre diagram, software for the parametric, automated, and visual construction of Laguerre diagrams was developed on this basis. Application examples of microstructure simulation for polycrystalline and foam materials are given, validating the effectiveness of the software.

18.
ABSTRACT

Creating an interactive, accurate, and low-latency big data visualisation is challenging due to the volume, variety, and velocity of the data. Visualisation options range from visualising the entire big dataset, which could take a long time and be taxing to the system, to visualising a small subset of the dataset, which could be fast and less taxing to the system but could also lead to a less-beneficial visualisation as a result of information loss. The main research questions investigated by this work are what effect sampling has on visualisation insight and how to provide guidance to users in navigating this trade-off. To investigate these issues, we study an initial case of simple estimation tasks on histogram visualisations of sampled big data, in hopes that these results may generalise. Leveraging sampling, we generate subsets of large datasets and create visualisations for a crowd-sourced study involving a simple cognitive visualisation task. Using the results of this study, we quantify insight, sampling, visualisation, and perception error in comparison to the full dataset. We use these results to model the relationship between sample size and insight error, and we propose the use of our model to guide big data visualisation sampling.
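The sample-size/error trade-off on a histogram task is easy to demonstrate empirically. The data distribution, bin count, and sample rates below are illustrative only, not the study's settings:

```python
import random

# Normalised 10-bin histogram over [0, 1).
def histogram(values, bins=10, lo=0.0, hi=1.0):
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]

random.seed(42)
full = [random.betavariate(2, 5) for _ in range(100_000)]
reference = histogram(full)  # "ground truth" from the full dataset

errors = []
for rate in (0.001, 0.01, 0.1):
    sample = random.sample(full, int(rate * len(full)))
    err = sum(abs(a - b) for a, b in zip(histogram(sample), reference))
    errors.append(err)
    print(f"rate={rate}: L1 error={err:.3f}")
```

The L1 deviation of the sampled histogram from the full-data histogram shrinks as the sample rate grows, which is the relationship the paper models to guide sampling decisions.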

19.
Context: Nowadays there are sound methods and tools that implement the Model-Driven Development (MDD) approach satisfactorily. However, MDD approaches focus on representing and generating code for functionality, behaviour, and persistence, putting interaction, and more specifically usability, in second place. If we aim to include usability features in a system developed with an MDD tool, we need to extend the generated code manually. Objective: This paper tackles how to include functional usability features (usability recommendations strongly related to system functionality) in MDD through conceptual primitives. Method: The approach consists of studying usability guidelines to identify usability properties that can be represented in a conceptual model. These new primitives are then the input for a model compiler that generates code according to the characteristics expressed in them. An empirical study with 66 subjects was conducted to study the effect of including functional usability features on end users' satisfaction and time to complete tasks. Moreover, we compared the workload of two MDD analysts including usability features by hand in the generated code versus including them through conceptual primitives according to our approach. Results: The empirical study shows that after including usability features, end users' satisfaction improves while the time spent does not change significantly, which justifies the use of usability features in the software development process. The comparison shows that the workload required to adapt the MDD method to support usability features through conceptual primitives is heavy; however, once MDD supports these features, MDD analysts working with primitives are more efficient than MDD analysts implementing the features manually. Conclusion: This approach brings us a step closer to conceptual models that represent not only functionality, behaviour, or persistence, but also usability features.

20.
Bayesian analysis of empirical software engineering cost models
Many parametric software estimation models have evolved over the last two decades (L.H. Putnam and W. Myers, 1992; C. Jones, 1997; R.M. Park et al., 1992). Almost all of these parametric models have been empirically calibrated to actual data from completed software projects. The most commonly used technique for empirical calibration has been the classical multiple regression approach. As discussed in the paper, multiple regression imposes a few assumptions that software engineering datasets frequently violate. The paper illustrates the problems faced by the multiple regression approach during the calibration of one of the popular software engineering cost models, COCOMO II. It describes the pragmatic 10 percent weighted-average approach used for the first publicly available calibrated version (S. Chulani et al., 1998). It then shows how a more sophisticated Bayesian approach can alleviate some of the problems faced by multiple regression. It compares and contrasts the two empirical approaches and concludes that the Bayesian approach was better and more robust than multiple regression.
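The Bayesian idea can be sketched as a precision-weighted combination of an expert prior and a noisy regression estimate for a cost-driver coefficient. The numbers below are illustrative, not COCOMO II values:

```python
# Combine an expert prior with a data estimate, weighting each by its
# precision (1/variance): the Gaussian conjugate-update posterior.
def bayes_combine(prior_mean, prior_var, data_mean, data_var):
    w_prior, w_data = 1 / prior_var, 1 / data_var
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_var = 1 / (w_prior + w_data)
    return post_mean, post_var

# Expert opinion puts the coefficient near 1.0 with fairly high certainty;
# the regression estimate is 1.6 but comes from a small, noisy sample.
mean, var = bayes_combine(1.0, 0.04, 1.6, 0.16)
print(round(mean, 2), round(var, 3))  # posterior leans toward the precise prior
```

Unlike a fixed 10 percent weighted average, the posterior automatically gives more weight to whichever source, expert judgment or project data, is more precise, and its variance quantifies the remaining uncertainty.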


Copyright©北京勤云科技发展有限公司  京ICP备09084417号