1.

On the relationship between the circumference of rubber trees and L-band waves

Bambang H. Trisasongko David J. Paull Amy L. Griffin Xiuping Jia Dyah R. Panuju 《International journal of remote sensing》2019,40(16):6395-6417

Despite substantial research conducted within the forestry domain, detailed assessments to monitor plantations and support their sustainable management have been understudied. This article attempts to fill this gap by coupling fully polarimetric L-band data with contemporary data mining methods to estimate tree circumference as: (1) a primary dataset for biomass accumulation studies; and (2) critical information for operational management in rubber plantations. We used two rubber plantation sites in Subang (West Java) and Jember (East Java), Indonesia, to evaluate the capability of L-band radar data. Although polarimetric features derived from polarimetric decomposition theorems have been advocated by others, we show that backscatter coefficients, especially HV polarization, remain an important dataset for this research domain. Using Subang data to build the model, we found that modern machine learning methods do not always deliver the best performance. The input data appear to play a significant role in obtaining a good model; hence, the selection of datasets from multiple forms of polarimetric SAR data needs further careful consideration. The highest coefficient of determination (R² = 0.79) was achieved by Yamaguchi decomposition features with partial least squares regression. Nonetheless, the R² gap to the backscatter coefficient with random forests regression (R² = 0.78) was insignificant. Overall, only the backscatter coefficient dataset delivered fairly consistent results with every regression model, with an average R² of about 0.67. When tuning parameters were not assessed, random forests consistently outperformed support vector regression on all datasets. Support vector regression gained substantially in R² when a linear kernel was used instead of the popular radial basis function. The transferability of the model is also addressed in this article.
Similarity of terrain characteristics appears to substantially influence the model's performance. Models developed in Subang, which has gentle slopes, seem valid only in plantations with similar terrain. Validation in very flat terrain within two plantation sectors in Jember delivered poor results, although these sectors have elevations similar to the Subang site. In contrast, validation in a plantation sector with similar, gently sloping terrain achieved an R² of about 0.6 using some datasets.
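The abstract contrasts a linear kernel with the radial basis function (RBF) kernel in support vector regression. The minimal sketch below (not the authors' code) shows how each kernel scores the similarity of two feature vectors; the feature values and `gamma = 0.5` are hypothetical.

```python
import math

def linear_kernel(x, z):
    """k(x, z) = <x, z>: similarity grows linearly with the dot product."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """k(x, z) = exp(-gamma * ||x - z||^2): similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical two-feature vectors (e.g. HH and HV backscatter in dB).
x, z = [-8.0, -15.0], [-7.5, -14.0]
print(linear_kernel(x, z))  # 270.0
print(rbf_kernel(x, z))
```

The RBF kernel adds the `gamma` hyperparameter on top of the SVR settings, so leaving it untuned may plausibly hurt it more than the linear kernel; that is an interpretation, not a finding stated in the abstract.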

2.

Recommendation algorithm integrating a knowledge graph and a lightweight graph convolutional network

樊海玮张丽苗鲁芯丝雨王帅《计算机系统应用》(Computer Systems & Applications) 2023,32(8):207-213

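To address the uneven modeling of the user side and the item side and the high model complexity of knowledge-graph-based recommendation algorithms, a recommendation algorithm integrating a knowledge graph and a lightweight graph convolutional network is proposed. On the user side, a neighbor set is generated from user similarity, and the interaction records of a user and similar users are propagated iteratively over the knowledge graph to enhance the user feature representation. On the item side, entity embeddings in the knowledge graph are propagated to mine item information related to user preferences. Next, a lightweight graph convolutional network aggregates neighborhood features to obtain user and item feature representations, while an attention mechanism incorporates neighborhood weights into entities to enhance the node embeddings. Finally, the rating between users and items is predicted. Experiments show that, compared with the best baseline, AUC and ACC improve by 1.8% and 2.3% on the Book-Crossing dataset and by 1.2% and 1.4% on the Yelp2018 dataset. The results show that the model achieves better recommendation performance than the other baseline models.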
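The abstract does not spell out the lightweight GCN layer. Assuming it follows LightGCN (no feature transformation or nonlinearity, symmetric degree normalisation, and averaging over layers), propagation over a user–item graph can be sketched as below. The knowledge-graph propagation and attention weighting described above are not shown, and all node names and embeddings are hypothetical.

```python
import math

def light_gcn(adj, emb, n_layers=2):
    """LightGCN-style propagation (an assumption, not taken from the paper).

    adj: dict node -> list of neighbour nodes (undirected user-item graph)
    emb: dict node -> list[float] initial embedding
    Returns, per node, the mean of its embeddings from layers 0..n_layers.
    """
    layers = [emb]
    for _ in range(n_layers):
        prev, nxt = layers[-1], {}
        for u, nbrs in adj.items():
            agg = [0.0] * len(prev[u])
            for v in nbrs:
                # symmetric normalisation 1 / sqrt(|N(u)| * |N(v)|)
                norm = 1.0 / math.sqrt(len(nbrs) * len(adj[v]))
                agg = [a + norm * x for a, x in zip(agg, prev[v])]
            nxt[u] = agg  # no weight matrix, no activation
        layers.append(nxt)
    return {u: [sum(layer[u][d] for layer in layers) / len(layers)
                for d in range(len(emb[u]))] for u in adj}

# Hypothetical toy graph: two users, two items.
adj = {"u1": ["i1"], "u2": ["i1", "i2"], "i1": ["u1", "u2"], "i2": ["u2"]}
emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "i1": [1.0, 1.0], "i2": [0.0, 0.0]}
out = light_gcn(adj, emb, n_layers=1)
```

A predicted score for a user–item pair would then be the dot product of their final embeddings, e.g. `sum(a * b for a, b in zip(out["u1"], out["i1"]))`.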

3.

Antecedents of open source software defects: A data mining approach to model formulation, validation and testing (total citations: 1; self-citations: 0; citations by others: 1)

Uzma Raja Marietta J. Tretter 《Information Technology and Management》2009,10(4):235-251

This paper develops, tests, and validates a model for the antecedents of open source software (OSS) defects, using Data and Text Mining. The public archives of OSS projects are used to access historical data on over 5,000 active and mature OSS projects. Using domain knowledge and exploratory analysis, a wide range of variables is identified from the process, product, resource, and end-user characteristics of a project to ensure that the model is robust and considers all aspects of the system. Multiple Data Mining techniques are used to refine the model, and the data are enriched by using Text Mining for knowledge discovery from qualitative information. The study demonstrates the suitability of Data Mining and Text Mining for model building. Results indicate that project type, end-user activity, process quality, team size and project popularity have a significant impact on the defect density of operational OSS projects. Since many organizations, both for-profit and not-for-profit, are beginning to use Open Source Software as an economic alternative to commercial software, these results can inform decisions about which software an organization can reasonably maintain.

4.

A new technique to predict fly-rock in bench blasting based on an ensemble of support vector regression and GLMNET

Guo Hongquan Nguyen Hoang Bui Xuan-Nam Armaghani Danial Jahed 《Engineering with Computers》2021,37(1):421-435

Fly-rock caused by blasting is one of the dangerous side effects that need to be accurately predicted in open-pit mines. This study proposed a new technique to predict the distance of fly-rock based on an ensemble of support vector regression models (SVRs) and the Lasso and elastic-net regularized generalized linear model (GLMNET), called SVRs–GLMNET. It was developed by combining six SVR models and a GLMNET model. The dataset of 210 experimental records was divided into three parts: training, validating, and testing. First, 70% of the whole dataset was used to develop the six SVR models as sub-models. Subsequently, 20% of the entire dataset (the validating dataset) was used to predict fly-rock with the six developed SVR models. The predictions from the six SVR models were then used as the input variables to establish the GLMNET model (i.e., the SVRs–GLMNET model). Finally, the remaining 10% of the dataset was used to test the performance of the proposed SVRs–GLMNET model. The six SVR models and the proposed SVRs–GLMNET model were compared and evaluated with five statistical criteria: mean absolute error (MAE), mean absolute percentage error (MAPE), root-mean-square error (RMSE), variance accounted for (VAF), and the coefficient of determination (R²). The results indicated that the proposed SVRs–GLMNET model provided the best performance in predicting the distance of fly-rock caused by bench blasting in this study, with an RMSE of 3.737, R² of 0.993, MAE of 3.214, MAPE of 0.018, and VAF of 99.207. By contrast, the other models yielded poorer accuracy, with RMSE of 7.058–12.779, R² of 0.920–0.972, MAE of 3.438–7.848, MAPE of 0.021–0.055, and VAF of 90.538–97.003.
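The five criteria can be computed as follows. This is a sketch assuming the definitions common in this literature: VAF = (1 − var(y − ŷ)/var(y)) × 100, MAPE reported as a fraction (consistent with the 0.018 above), and R² = 1 − SS_res/SS_tot. The paper may define some of them slightly differently, and the sample values are made up.

```python
import math

def regression_metrics(y, y_hat):
    """MAE, MAPE (as a fraction), RMSE, VAF (%) and R^2 for paired observations."""
    n = len(y)
    res = [a - b for a, b in zip(y, y_hat)]          # residuals y - y_hat
    mean_y = sum(y) / n
    mean_res = sum(res) / n
    var_y = sum((a - mean_y) ** 2 for a in y) / n
    var_res = sum((r - mean_res) ** 2 for r in res) / n
    ss_res = sum(r ** 2 for r in res)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return {
        "MAE": sum(abs(r) for r in res) / n,
        "MAPE": sum(abs(r / a) for r, a in zip(res, y)) / n,  # requires y != 0
        "RMSE": math.sqrt(ss_res / n),
        "VAF": (1 - var_res / var_y) * 100,
        "R2": 1 - ss_res / ss_tot,
    }

# Hypothetical observed vs. predicted fly-rock distances (m).
m = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
```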

5.

Integrating geostatistics and remote sensing for mapping the spatial distribution of cattle hoofprints in relation to malaria vector control

Oupa E. Malahlela Clement Adjorlolo Jane M. Olwoch Mahlatse L. Kganyago Morwapula J. Mashalane 《International journal of remote sensing》2019,40(15):5917-5937

Globally, malaria is still a persistent health problem affecting more than 200 million people. With about 90% of malaria cases occurring in Sub-Saharan Africa, it is imperative to understand the environmental factors contributing to malaria vector proliferation. Cattle hoofprints are known to be productive breeding sites for Anopheles (An.) arabiensis and An. funestus in Southern and East African countries. Therefore, this study aimed to test the potential of integrating field data and Sentinel-2 satellite imagery for mapping cattle hoofprint distribution in the Vhembe District, South Africa. The purpose was to improve the predictability of mosquito breeding sites in the study area by using a field point dataset and Sentinel-2 data. Because it was difficult to sample all locations in the study area, spatial interpolation was employed to create continuous surfaces of cattle hoofprints from a limited number of sampled point observations. The sampled point observations were then correlated with Sentinel-derived variables to predict cattle hoofprints at unsampled locations. Ordinary Kriging (OK), co-Kriging (CK) and stepwise multiple linear regression (SMLR) were used because of their ability to incorporate both field point data and ancillary datasets. CK was the best-performing interpolation method, with R² = 0.69 for the validation dataset (n = 33), compared with OK (R² = 0.57) and SMLR (R² = 0.25). The resulting co-Kriging semivariogram shows that combining field data with the remote sensing dataset improves the prediction accuracy of cattle hoofprint distribution. The interpolation error for estimating cattle hoofprints/100 m² was much lower with CK (RMSE = 0.2; MAD = 0.04) than with OK (RMSE = 2.39; MAD = 2.11) or SMLR (RMSE = 5.20; MAD = 4.55).
Furthermore, the results indicate that there are more cattle hoofprints in malaria-prone areas of the study site than in malaria-free areas. Studies such as this provide a basis for developing an operational platform for long-term monitoring of malaria-susceptible areas, malaria risk, and control management.
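Both OK and CK start from a semivariogram. A minimal sketch of the empirical semivariogram γ(h) = (1 / 2N(h)) Σ (z_i − z_j)², binned by lag distance, is shown below; fitting a variogram model and solving the kriging system are not shown, and the lag width and sample points are hypothetical.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, lag, n_lags):
    """gamma(h) = 1 / (2 N(h)) * sum of (z_i - z_j)^2 over pairs in lag bin h.

    Returns {bin centre distance: semivariance} for bins that contain pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = int(d // lag)                  # index of the lag bin
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return {(k + 0.5) * lag: sums[k] / (2 * counts[k]) for k in sorted(counts)}

# Hypothetical hoofprint counts at three sampled points on a transect.
g = empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [0, 1, 2], lag=1.5, n_lags=2)
```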

6.

Prediction of soil compression coefficient for urban housing project using novel integration machine learning approach of swarm intelligence and Multi-layer Perceptron Neural Network

《Advanced Engineering Informatics》2018

In many engineering projects, the soil compression coefficient is an important parameter used for estimating the settlement of soil layers. Determining the soil compression coefficient via the oedometer test, the common practice, is time-consuming and expensive. This study proposes a machine learning solution to replace the conventional tests used for obtaining the coefficient of soil compression. The new approach integrates the Multi-Layer Perceptron Neural Network (MLP Neural Nets) and Particle Swarm Optimization (PSO). These two computational intelligence methods work synergistically to establish a prediction model of the soil compression coefficient. The PSO metaheuristic is employed to optimize the MLP Neural Nets model structure. To train and validate the proposed method, named PSO-MLP Neural Nets, a dataset of 154 soil samples featuring 12 influencing factors was collected from the geotechnical investigation of a high-rise building project. Experimental results show that the proposed PSO-MLP Neural Nets achieved the most accurate prediction of the soil compression coefficient, with RMSE = 0.0267, MAE = 0.0145, and R² = 0.884. This result is significantly better than those obtained from other benchmark methods, including the backpropagation neural network, the radial basis function neural network, support vector regression, random forest, and the Gaussian process. Based on the experimental results, the newly constructed PSO-MLP Neural Nets is a promising alternative to assist geotechnical engineers in the design phase of civil engineering projects.
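The abstract does not detail how PSO encodes the MLP structure. A generic global-best PSO is sketched below under that caveat: in PSO-MLP the objective would be the validation error of an MLP built from a particle's position (discrete choices such as neuron counts would need rounding), whereas here a toy sphere function stands in. The inertia weight 0.7298 and coefficients 1.49618 are commonly used defaults, not values from the paper.

```python
import random

def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimise `objective` over a box with global-best particle swarm optimisation."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy stand-in for "validation error of the MLP built from this particle".
best, val = pso(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2)
```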

7.

Estimation of maize leaf area index based on GF-1 WFV imagery and machine learning algorithms

贾洁琼刘万青孟庆岩孙云晓孙震辉《中国图象图形学报》(Journal of Image and Graphics) 2018,23(5):719-729

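Objective: Leaf area index (LAI) is an important vegetation biophysical and biochemical parameter and is of great significance for monitoring crop growth and predicting yield. LAI estimation methods based on physical models and empirical models are currently considered the most commonly used, but both have limited estimation efficiency and accuracy. In recent years, machine learning algorithms have been widely applied in remote-sensing monitoring; they can fit nonlinear data and incorporate more auxiliary information. To evaluate the applicability of machine learning algorithms to remote-sensing estimation of maize LAI, this paper analyzes and compares the ability of the random forest (RF) and BP neural network algorithms to estimate maize LAI, and compares them with a traditional empirical model. Method: Taking Donghuayuan Town, Huailai County, Hebei Province as the study area, field-measured maize LAI data were combined with contemporaneous imagery from China's Gaofen satellite (GF-1 WFV). First, the correlation between eight vegetation indices and LAI was analyzed. Then, using holdout cross-validation, all sample data were split into two parts, 65% as the model training set and 35% as the validation set, and this random split was repeated to form three groups. RF models, BP neural network models and traditional empirical models were built with the eight vegetation indices as independent variables and the corresponding LAI values as the dependent variable. The coefficient of determination R² and the root-mean-square error (RMSE) were used as evaluation metrics. Result: The correlation analysis showed that, across all samples, the measured LAI was highly significantly correlated with every vegetation index (P < 0.01), with all correlation coefficients above 0.5. The three sample groups were trained multiple times with the RF and BP neural network algorithms and tested on the validation data; the empirical models were built on the training data and tested on the validation data. The RF model showed strong predictive ability: the R² values between predicted and measured LAI were 0.681, 0.757 and 0.701, all higher than those of the BP model (0.504, 0.589, 0.605) and the empirical model (0.492, 0.557, 0.531); the corresponding RMSEs were 0.264, 0.292 and 0.259, all lower than those of the BP model (0.284, 0.410, 0.283) and the empirical model (0.541, 0.398, 0.306). Conclusion: The RF algorithm estimates maize LAI from remote-sensing data better, providing a technical reference for fast and accurate remote-sensing monitoring of crop LAI.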
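The splitting scheme in the Method step (65% training, 35% validation, random split repeated three times) can be sketched as below; the sample size of 60 and the seed are hypothetical, since the abstract does not give them.

```python
import random

def repeated_holdout(n_samples, train_frac=0.65, n_repeats=3, seed=42):
    """Return n_repeats independent random (train, validation) index splits."""
    rng = random.Random(seed)
    n_train = round(train_frac * n_samples)
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                       # fresh random partition each time
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits

splits = repeated_holdout(60)  # 60 plots is a hypothetical sample size
```

Each of the three models would then be trained on each training list and scored on the matching validation list, giving three R² and RMSE values per model as reported above.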

8.

Improving Markov Chain Monte Carlo Model Search for Data Mining (total citations: 9; self-citations: 0; citations by others: 9)

Giudici Paolo Castelo Robert 《Machine Learning》2003,50(1-2):127-158

The motivation of this paper is the application of MCMC model scoring procedures to data mining problems involving a large number of competing models and other relevant model choice aspects. To achieve this aim, we analyze one of the most popular Markov Chain Monte Carlo methods for structural learning in graphical models, namely the MC³ algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215–232, 1995). Our aim is to improve their algorithm to make it an effective and reliable tool in the field of data mining. In such a context, typically high-dimensional in the number of variables, little can be known a priori and, therefore, a good model search algorithm is crucial. We present and describe in detail our implementation of the MC³ algorithm, which provides an efficient general framework for computations with both Directed Acyclic Graphical (DAG) models and Undirected Decomposable Models (UDG). We believe that the possibility of switching easily between the two classes of models is an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish. Furthermore, to improve the MC³ method, we propose several graphical monitors that can help extract results and assess the goodness of the Markov chain Monte Carlo approximation to the posterior distribution of interest. We first apply our proposed methodology to the well-known coronary heart disease dataset (D. Edwards & T. Havránek, Biometrika, 72:2, 339–351, 1985). We then introduce a novel data mining application concerning market basket analysis.
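A minimal sketch of MC³ as it is commonly described for DAGs: a Metropolis–Hastings chain over graph structures whose proposals add or delete one edge (keeping the graph acyclic), with the acceptance ratio corrected by the neighbourhood sizes |nbd(M)| / |nbd(M′)|. This is not the authors' improved implementation, and the scoring function in the usage is a hypothetical toy stand-in for a log posterior.

```python
import math
import random
from collections import Counter
from itertools import permutations

def is_acyclic(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be removed."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def neighbours(nodes, edges):
    """All DAGs differing from `edges` by adding or deleting one directed edge."""
    result = []
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            result.append(edges - {(a, b)})
        else:
            cand = edges | {(a, b)}
            if is_acyclic(nodes, cand):
                result.append(cand)
    return result

def mc3(nodes, log_score, n_iter, seed=0):
    """Return visit counts of DAGs (frozensets of edges) along the chain."""
    rng = random.Random(seed)
    current = frozenset()                      # start from the empty graph
    nbrs = neighbours(nodes, current)
    visits = Counter()
    for _ in range(n_iter):
        proposal = rng.choice(nbrs)
        prop_nbrs = neighbours(nodes, proposal)
        # log of P(M'|D) |nbd(M)| / (P(M|D) |nbd(M')|)
        log_alpha = (log_score(proposal) - log_score(current)
                     + math.log(len(nbrs)) - math.log(len(prop_nbrs)))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            current, nbrs = proposal, prop_nbrs
        visits[current] += 1
    return visits

# Hypothetical toy score that favours the chain A -> B -> C.
target = frozenset({("A", "B"), ("B", "C")})
visits = mc3(["A", "B", "C"], lambda e: -5.0 * len(e ^ target), 5000)
```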

9.

Managing Uncertainties in Image Databases: A Fuzzy Approach (total citations: 1; self-citations: 1; citations by others: 0)

Chianese A. Picariello A. Sansone L. Sapino M.L. 《Multimedia Tools and Applications》2004,23(3):237-252

10.

A simple proof of the Pontryagin maximum principle on manifolds

Dong Eui Chang 《Automatica》2011,(3):630-633

Applying the tubular neighborhood theorem, we give a simple proof of the Pontryagin maximum principle on a smooth manifold. The idea is as follows. Given a control system on a manifold M, we embed M into some Rⁿ and extend the control system to Rⁿ. Then we apply the Pontryagin maximum principle on Rⁿ to the extended system and project the result back to M.
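For reference, one common statement of the Pontryagin maximum principle on Rⁿ, the result the proof applies to the extended system, is given below for minimising ∫ L(x, u) dt subject to ẋ = f(x, u), u(t) ∈ U. Sign conventions for the multiplier p₀ vary between texts, and the paper's exact formulation may differ.

```latex
H(x, p, p_0, u) = p_0 \, L(x, u) + \langle p, f(x, u) \rangle, \qquad p_0 \in \{0, -1\},

\dot{x}^*(t) = \frac{\partial H}{\partial p}\big(x^*(t), p(t), p_0, u^*(t)\big), \qquad
\dot{p}(t) = -\frac{\partial H}{\partial x}\big(x^*(t), p(t), p_0, u^*(t)\big),

H\big(x^*(t), p(t), p_0, u^*(t)\big) = \max_{v \in U} H\big(x^*(t), p(t), p_0, v\big)
\quad \text{for almost every } t, \qquad (p_0, p(t)) \neq 0 .
```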

11.

A semi-supervised ensemble method for cross-project software defect prediction

何吉元孟昭鹏陈翔王赞樊向宇《软件学报》(Journal of Software) 2017,28(6):1455-1473

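Software defect prediction can optimize the allocation of testing resources by identifying, early in project development, all software modules that may contain defects. Early defect prediction research focused mostly on within-project defect prediction, which requires sufficient historical data; in practice, however, the project to be predicted may have scarce historical data or may be entirely new. Cross-project defect prediction has therefore become a research hotspot in software defect prediction. Its challenges lie in the distribution differences between source and target project datasets and in the class imbalance within datasets. Inspired by search-based software engineering, this paper proposes S³EL, a search-based semi-supervised ensemble method for cross-project software defect prediction. The method first constructs multiple naive Bayes base classifiers by adjusting the proportion of each class in the training set, and then uses a genetic algorithm, which has global search ability, to combine these base classifiers based on a small number of labeled target instances, yielding the final defect prediction model. S³EL was compared with several classic cross-project defect prediction methods (the Burak filter, the Peters filter, TCA+, CODEP and HYDRA) on the Promise and AEEEM datasets. With F1 as the evaluation metric, the results show that S³EL achieves the best prediction performance in most cases.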
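The first step of S³EL, building several training sets with different class proportions, might look like the sketch below. The ratios, the sampling-with-replacement strategy and the helper name `resample_to_ratio` are assumptions; the naive Bayes training and the genetic-algorithm weighting of the base classifiers are not shown.

```python
import random

def resample_to_ratio(instances, labels, pos_ratio, seed=0):
    """Hypothetical helper: build a training set of the same size whose defective
    (label 1) fraction is pos_ratio, sampling with replacement from each class."""
    rng = random.Random(seed)
    pos = [x for x, y in zip(instances, labels) if y == 1]
    neg = [x for x, y in zip(instances, labels) if y == 0]
    n = len(instances)
    n_pos = round(pos_ratio * n)
    sample = [(rng.choice(pos), 1) for _ in range(n_pos)]
    sample += [(rng.choice(neg), 0) for _ in range(n - n_pos)]
    rng.shuffle(sample)
    return [x for x, _ in sample], [y for _, y in sample]

# One training set per ratio; a naive Bayes base classifier would be fit on each.
ratios = [0.1, 0.3, 0.5]          # assumed ratios, not taken from the paper
X = list(range(20))               # hypothetical module ids
y = [1] * 4 + [0] * 16            # imbalanced: 4 defective modules
training_sets = [resample_to_ratio(X, y, r) for r in ratios]
```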

12.

Quantitative mapping of pasture biomass using satellite imagery

A. Edirisinghe M. J. Hill G. E. Donald M. Hyder 《International journal of remote sensing》2013,34(10):2699-2724

Knowledge of the amount of pasture biomass available in farm paddocks is crucial for improving utilization and productivity in the Australian grazing industry. A method to quantitatively map the biomass of annual pastures under grazing has been developed using the Normalized Difference Vegetation Index (NDVI) derived from high-resolution satellite imagery. Relationships between field-measured pasture biomass and the NDVI were examined for different transects in paddocks under different grazing regimes across three geographically dispersed farm sites. A significant linear relationship (R² = 0.84) was observed when the NDVI was regressed against biomass. The slope of the relationship between the NDVI and biomass declined in a highly predictable (R² = 0.82) exponential form as the growing season progressed, and this pattern was consistent across four separate seasons. This knowledge was used to formulate a reliable model to predict paddock average pasture biomass from the NDVI. The model estimates were validated against observed biomass in the range 500–4000 kilograms of dry matter per hectare (kg DM ha⁻¹), with R² = 0.85 and a standard error of 315 kg DM ha⁻¹.
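The index and the linear fit behind this model can be sketched as follows. The reflectance and biomass values are synthetic, and the season-dependent exponential decline of the slope is not modelled here.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b

# Synthetic paddock NDVI values and biomass (kg DM/ha).
intercept, slope = fit_line([0.2, 0.4, 0.6], [1000, 2000, 3000])
```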

13.

Application of several non-linear prediction tools for estimating uniaxial compressive strength of granitic rocks and comparison of their performances

Danial Jahed Armaghani Edy Tonnizam Mohamad Mohsen Hajihassani Saffet Yagiz Hossein Motaghedi 《Engineering with Computers》2016,32(2):189-206

Uniaxial compressive strength (UCS) of rock is crucial for any type of project constructed in or on rock mass. The test conducted to measure the UCS of rock is expensive, time-consuming and subject to sample restrictions. For this reason, the UCS of rock may be estimated using simple rock tests such as the point load index (I_s(50)), Schmidt hammer (R_n) and p-wave velocity (V_p) tests. To estimate the UCS of granitic rock as a function of relevant rock properties such as R_n, p-wave velocity and I_s(50), rock cores were collected from the face of the Pahang–Selangor fresh water tunnel in Malaysia. Afterwards, 124 samples were prepared and tested in accordance with relevant standards to obtain the dataset. This dataset was then used to estimate the UCS of rock with three non-linear prediction tools, namely non-linear multiple regression (NLMR), artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). The models were examined using several performance indices, including the coefficient of determination (R²), variance accounted for (VAF) and root mean squared error (RMSE), together with a simple ranking procedure, and the best prediction model was selected. An R² of 0.951 on the testing dataset indicates the superiority of the ANFIS model; the corresponding values are 0.651 and 0.886 for the NLMR and ANN techniques, respectively. The results indicate that the ANFIS model predicts the UCS of rocks more accurately than the other models. Although the developed model may be useful at a preliminary stage of design, it should be used with caution and only for the specified rock types.
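The simple ranking procedure mentioned above is commonly implemented by ranking the models on each performance index (the best model gets the highest rank) and summing the ranks. A sketch under that assumption follows, using only the testing R² values reported here, because the abstract does not give the VAF or RMSE values; ties are not handled.

```python
def rank_models(scores, higher_is_better):
    """Simple ranking procedure (assumed form).

    scores: {model: {index: value}}
    higher_is_better: {index: True if larger values are better}
    For each index the worst model gets rank 1 and the best gets len(models);
    the total rank is the sum over indices.
    """
    models = list(scores)
    totals = {m: 0 for m in models}
    for idx, better_high in higher_is_better.items():
        # Order models from worst to best on this index.
        ordered = sorted(models, key=lambda m: scores[m][idx], reverse=not better_high)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return totals

scores = {"NLMR": {"R2": 0.651}, "ANN": {"R2": 0.886}, "ANFIS": {"R2": 0.951}}
totals = rank_models(scores, {"R2": True})
```

For an index such as RMSE, `higher_is_better` would be `False`, so the model with the largest error receives rank 1.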

14.

Comparison of different regression models and validation techniques for the assessment of wheat leaf area index from hyperspectral data (total citations: 1; self-citations: 0; citations by others: 1)

Bastian Siegmann Thomas Jarmer 《International journal of remote sensing》2013,34(18):4519-4534

Leaf area index (LAI) is one of the most important plant parameters when observing agricultural crops and a decisive factor for yield estimates. Remote-sensing data provide spectral information on large areas and allow for a detailed quantitative assessment of LAI and other plant parameters. The present study compared support vector regression (SVR), random forest regression (RFR), and partial least-squares regression (PLSR) and the model quality each achieved for the assessment of LAI from wheat reflectance spectra. In this context, the validation technique used to verify the accuracy of an empirical–statistical regression model was very important for judging whether the models could be transferred spatially to unknown data. Thus, two different validation methods, leave-one-out cross-validation (cv) and independent validation (iv), were performed to determine model accuracy. The LAI and field reflectance spectra of 124 plots were collected from four fields during two stages of plant development in 2011 and 2012. With cross-validation, for the separate years as well as for the entire data set, SVR provided the best results (2011: R²_cv = 0.739; 2012: R²_cv = 0.85; 2011 and 2012: R²_cv = 0.944). Independent validation of the data set from both years led to completely different results. The accuracy of PLSR (R²_iv = 0.912) and RFR (R²_iv = 0.770) remained almost at the same level as with cross-validation, while SVR showed a clear decline in model performance (R²_iv = 0.769). The results indicate that regression model robustness largely depends on the applied validation approach and on the range of LAI values used for model building.
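The difference between the two validation schemes can be sketched with any fit/predict pair. Below, a simple least-squares line is a stand-in for SVR, RFR or PLSR, and the data are synthetic: leave-one-out refits the model n times, each time predicting the one held-out sample, while independent validation fits once on one subset (e.g. one year) and scores another.

```python
def fit_line(x, y):
    """Stand-in model: ordinary least-squares line, returned as (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

def predict_line(model, xi):
    a, b = model
    return a + b * xi

def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def loocv_r2(x, y, fit, predict):
    """Leave-one-out CV: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(x)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, x[i]))
    return r_squared(y, preds)

def independent_r2(x_train, y_train, x_test, y_test, fit, predict):
    """Independent validation: one fit on the training subset, scored on a separate subset."""
    model = fit(x_train, y_train)
    return r_squared(y_test, [predict(model, xi) for xi in x_test])

# Synthetic spectral feature vs. LAI.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * v + 1 for v in x]
```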

15.

GA-SVR: a novel hybrid data-driven model to simulate vertical load capacity of driven piles

Luo Zhenyan Hasanipanah Mahdi Bakhshandeh Amnieh Hassan Brindhadevi Kathirvel Tahir M. M. 《Engineering with Computers》2021,37(2):823-831

Piles are widely applied to the substructures of various infrastructure buildings. Because soil has a complex nature, a variety of empirical models have been proposed for predicting the bearing capacity of piles. The aim of this study is to propose a novel artificial intelligence approach to predict the vertical load capacity of driven piles in cohesionless soils using support vector regression (SVR) optimized by a genetic algorithm (GA). To the best of our knowledge, no previous study has developed a GA-SVR model to predict the vertical load capacity of driven piles in different timescales, and the novelty of this study lies in developing a new hybrid intelligent approach in this field. To investigate the efficacy of the GA-SVR model, two other models, i.e., SVR and linear regression, are also used for comparison. According to the results, the GA-SVR model clearly outperformed the SVR and linear regression models, achieving a lower root mean square error (RMSE) and a higher coefficient of determination (R²): GA-SVR achieved an RMSE of 0.017 and R² of 0.980, compared with an RMSE of 0.035 and R² of 0.912 for SVR and an RMSE of 0.079 and R² of 0.625 for linear regression.
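A GA-SVR setup typically encodes SVR hyperparameters such as C and gamma in a chromosome and minimises a validation error. The real-coded GA below sketches that search under stated assumptions: the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism) are generic choices, not necessarily the paper's, and a toy quadratic in (log10 C, log10 gamma) stands in for the SVR validation RMSE.

```python
import random

def genetic_search(fitness, bounds, pop_size=40, n_gen=100,
                   mut_rate=0.2, mut_sd=0.1, seed=1):
    """Minimise `fitness` over a box with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(n_gen):
        new_pop = [min(pop, key=fitness)]              # elitism: keep the best
        while len(new_pop) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)  # tournament of size 3
            p2 = min(rng.sample(pop, 3), key=fitness)
            alpha = rng.random()                       # arithmetic crossover
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = [clip(g + rng.gauss(0, mut_sd), lo, hi)  # Gaussian mutation
                     if rng.random() < mut_rate else g
                     for g, (lo, hi) in zip(child, bounds)]
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=fitness)

# Hypothetical stand-in for validation RMSE as a function of (log10 C, log10 gamma).
toy_rmse = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
best = genetic_search(toy_rmse, [(-2.0, 3.0), (-4.0, 1.0)])
```

In a real GA-SVR run, `fitness` would train an SVR with C = 10**p[0] and gamma = 10**p[1] and return its validation RMSE, which is far more expensive than the toy function.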

16.

Retrieval of remotely sensed sea surface salinity using MODIS data in the Chinese Bohai Sea

Xiang Yu Bei Xiao Xiangyang Liu Yebao Wang Buli Cui Xin Liu 《International journal of remote sensing》2017,38(23):7357-7373

Salinity dominates seawater density and directly affects physical and biochemical processes. A reliable retrieval model is essential for providing frequent and accurate sea surface salinity (SSS) data for marine research. Remote-sensing techniques provide alternatives for SSS retrieval, with the advantages of wide-area surveys and real-time monitoring. In the present study, an inverse relationship between SSS and coloured dissolved organic matter (CDOM) concentration in the Chinese Bohai Sea was verified. Thus, four simple band ratios of the original remote-sensing reflectance (R_rs) used to retrieve the CDOM concentration were compared and tested for SSS retrieval. R_rs(531)/R_rs(551) performed best among the four band ratios. The model employed here can derive SSS with a root mean square error (RMSE) of 0.26 practical salinity units (psu) (R² = 0.76). A calibration model was verified using a discrete dataset of measured SSS and was tested further by mapping SSS in the Chinese Bohai Sea during 2010–2014. The resulting spatial patterns of SSS were satisfactory, and an inverse relationship between SSS and the Yellow River discharge was confirmed.

17.

An investigation on the feasibility of cross-project defect prediction

Zhimin He Fengdi Shu Ye Yang Mingshu Li Qing Wang 《Automated Software Engineering》2012,19(2):167-199

Software defect prediction helps to optimize the allocation of testing resources by identifying defect-prone modules prior to testing. Most existing models build their prediction capability on a set of historical data, presumably from the same or similar project settings as those under prediction. However, such historical data are not always available in practice. One potential way of predicting defects in projects without historical data is to learn predictors from data of other projects. This paper investigates defect prediction in the cross-project context, focusing on the selection of training data. We conduct three large-scale experiments on 34 data sets obtained from 10 open source projects. Major conclusions from our experiments include: (1) in the best cases, training data from other projects can provide better prediction results than training data from the same project; (2) the prediction results obtained using training data from other projects meet our acceptance criteria on average: defects in 18 of the 34 cases were predicted with a recall greater than 70% and a precision greater than 50%; (3) the results of cross-project defect prediction are related to the distributional characteristics of data sets, which are valuable for training data selection. We further propose an approach to automatically select suitable training data for projects without historical data. The prediction results obtained with training data selected by our approach are comparable with those obtained with training data from the same project.
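The acceptance criterion in point (2) can be checked from confusion-matrix counts as below; F1, the metric used in several other entries in this list, is included for completeness. The counts in the usage are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def meets_criteria(tp, fp, fn, min_recall=0.70, min_precision=0.50):
    """Acceptance rule from the abstract: recall > 70% and precision > 50%."""
    p, r = precision_recall(tp, fp, fn)
    return r > min_recall and p > min_precision

# Hypothetical counts for two cross-project predictions.
ok = meets_criteria(tp=40, fp=30, fn=10)       # precision 0.571, recall 0.8
not_ok = meets_criteria(tp=30, fp=10, fn=20)   # recall 0.6 fails
```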

18.

Research on reward-based crowdfunding from the investor's perspective

倪宁曦陈玉婕金百锁《计算机系统应用》(Computer Systems & Applications) 2017,26(7):17-23

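With the rapid development of the crowdfunding industry, the number of crowdfunding projects has grown quickly, so investors spend a great deal of time and effort choosing projects. This paper aims to help investors select high-quality crowdfunding projects at minimal time cost. Assuming that a project's quality is positively correlated with its funding completion ratio, this paper builds a decision-tree model with the CART regression tree algorithm on JD Crowdfunding data; the model's R² reaches 0.746. The results show that investors should focus on four indicators: target amount, number of followers, project progress, and topics. The findings apply only to reward-based crowdfunding; for other types of crowdfunding, the independent variables should be reselected to build the model, although the decision-tree model remains applicable.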
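CART grows a regression tree by repeatedly choosing the split that most reduces the squared error of the child nodes. A single-feature version of that split search is sketched below; the feature values (e.g. a target amount) and funding ratios are hypothetical, and recursion and pruning are omitted.

```python
def best_split(x, y):
    """Return (threshold, cost): the threshold on one feature that minimises the
    summed squared error of the two child nodes (the CART regression criterion)."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best_cost, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_thr = cost, threshold
    return best_thr, best_cost

# Hypothetical target amounts (10k CNY) and funding completion ratios.
thr, cost = best_split([1, 2, 3, 10, 11, 12], [0.2, 0.3, 0.25, 0.9, 1.0, 0.95])
```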

19.

A cross-project defect prediction method based on feature transfer and instance transfer

倪超陈翔刘望舒顾庆黄启国李娜《软件学报》(Journal of Software) 2019,30(5):1308-1329

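In real-world software development, the project that needs defect prediction may be newly started, or its historical training data may be scarce. One solution is to build a model from training data already collected from other projects (source projects) and use it to predict the current project (target project). However, the datasets of different projects can differ considerably in distribution. To address this problem, a two-stage cross-project defect prediction method, FeCTrA, is proposed from the perspectives of feature transfer and instance transfer. Specifically, in the feature-transfer stage, the method uses cluster analysis to select features with high distributional similarity between the source and target projects; in the instance-transfer stage, the method builds on TrAdaBoost and uses a small number of labeled target-project instances to select source-project instances whose distribution is close to those labeled instances. To verify the effectiveness of FeCTrA, the Relink and AEEEM datasets were chosen as evaluation subjects, with F1 as the evaluation metric. First, FeCTrA outperforms single-stage methods that consider only the feature-transfer stage or only the instance-transfer stage. Second, compared with the classic cross-project defect prediction methods TCA+, the Peters filter, the Burak filter and DCPDP, FeCTrA improves prediction performance by 23%, 7.2%, 9.8% and 38.2%, respectively, on the Relink dataset, and by 96.5%, 108.5%, 103.6% and 107.9%, respectively, on the AEEEM dataset. Finally, the influence of the factors within FeCTrA on prediction performance is analyzed, providing guidelines for using FeCTrA effectively.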
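The instance-transfer stage builds on TrAdaBoost. One weight-update round, following the update rule of Dai et al. (2007) as generally described, is sketched below: misclassified source instances are down-weighted by β = 1 / (1 + √(2 ln n / N)) and misclassified target instances are up-weighted by 1/β_t with β_t = ε_t / (1 − ε_t). Training the weak learner is omitted, and clamping ε_t below 0.5 is an implementation assumption.

```python
import math

def tradaboost_update(w_src, err_src, w_tgt, err_tgt, n_rounds):
    """One TrAdaBoost weight update.

    err_src / err_tgt hold |h(x) - y| in {0, 1} for the current weak learner h.
    n_rounds is the total number of boosting rounds N.
    """
    n_src = len(w_src)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))
    # Weighted error of h on the labelled target instances only.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)   # assumed clamp: keeps beta_t in (0, 1)
    beta_t = eps / (1.0 - eps)
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]        # shrink bad source
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]   # boost hard target
    return new_src, new_tgt

# Hypothetical round: source instance 2 and target instance 1 were misclassified.
new_src, new_tgt = tradaboost_update([1.0, 1.0], [0, 1], [1.0] * 4, [1, 0, 0, 0], n_rounds=10)
```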

20.

Towards building a universal defect prediction model with rank transformed predictors

Feng Zhang Audris Mockus Iman Keivanloo Ying Zou 《Empirical Software Engineering》2016,21(5):2107-2145

Software defects can lead to undesired results. Correcting defects costs 50% to 75% of total software development budgets. To predict defective files, a prediction model must be built with predictors (e.g., software metrics) obtained either from the project itself (within-project) or from other projects (cross-project). A universal defect prediction model built from a large set of diverse projects would remove the need to build and tailor prediction models for each individual project. A formidable obstacle to building a universal model is the variation in the distribution of predictors among projects of diverse contexts (e.g., size and programming language). Hence, we propose to cluster projects based on the similarity of the distribution of predictors, and derive the rank transformations using quantiles of predictors for a cluster. We fit the universal model on the transformed data of 1,385 open source projects hosted on SourceForge and GoogleCode. The universal model obtains prediction performance comparable to the within-project models, yields similar results when applied to five external projects (one Apache and four Eclipse projects), and performs similarly among projects with different context factors. Finally, we investigate which predictors should be included in the universal model. We expect that this work could form a basis for future work on building a universal model and lead to software support tools that incorporate it into a regular development workflow.
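A rank transformation based on within-cluster quantiles can be sketched as below. The choice of ten ranks from deciles is an assumption for illustration; the abstract only says that quantiles of the predictors in a cluster are used.

```python
import bisect
import statistics

def rank_transform(cluster_values, n_ranks=10):
    """Return a function mapping a raw predictor value to a rank in 1..n_ranks,
    using the quantile cut points of that predictor within one project cluster."""
    cuts = statistics.quantiles(cluster_values, n=n_ranks)   # n_ranks - 1 cut points
    return lambda v: bisect.bisect_right(cuts, v) + 1

# Hypothetical lines-of-code values observed in one cluster of projects.
to_rank = rank_transform(list(range(1, 101)))
```

Because every project's raw metrics are mapped onto the same 1..10 scale of its cluster, predictors from projects of very different sizes become comparable before the universal model is fit.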