Similar literature
20 similar documents found.
1.
Despite substantial research in the forestry domain, detailed assessments for monitoring plantations and supporting their sustainable management remain understudied. This article attempts to fill this gap by coupling fully polarimetric L-band data with contemporary data mining methods to estimate tree circumference as (1) a primary dataset for biomass accumulation studies and (2) critical information for the operational management of rubber plantations. We used two rubber plantation sites, in Subang (West Java) and Jember (East Java), Indonesia, to evaluate the capability of L-band radar data. Although others have advocated polarimetric features derived from polarimetric decomposition theorems, we show that backscatter coefficients, especially HV polarization, remain an important dataset for this research domain. Using the Subang data to build the models, we found that modern machine learning methods do not always deliver the best performance. The input data appear to play a significant role in obtaining a good model, so the selection of datasets from the multiple forms of polarimetric SAR data needs further careful consideration. The highest coefficient of determination (R² = 0.79) was achieved by Yamaguchi decomposition features with partial least squares regression. Nonetheless, the gap to the backscatter coefficient with random forests regression (R² = 0.78) was insignificant. Overall, only the backscatter coefficient dataset delivered fairly consistent results across all regression models, with an average R² of about 0.67. When tuning parameters were not assessed, random forests consistently outperformed support vector regression on all forms of datasets. Support vector regression, in turn, improved substantially in R² when a linear kernel was used instead of the popular radial basis function. The article also addresses the transferability of the model.
It appears that similarity of terrain characteristics substantially influences model performance. Models developed in Subang, which has gentle slopes, seem valid only in plantations with similar terrain. Validation in very flat terrain within two plantation sectors in Jember yielded poor results, even though these sectors have elevations similar to the Subang site. In contrast, validation in a plantation sector with similar, gently sloping terrain achieved an R² of about 0.6 with some datasets.
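The transferability finding can be illustrated with a minimal sketch: calibrate a model at one site, then score it, unchanged, at another. As a simplifying assumption, the sketch uses a one-variable least-squares line (circumference against HV backscatter) instead of the PLS, random forests, or SVR models of the study; the function names are hypothetical and the numbers are toy values, not study data.

```python
from statistics import mean

def fit_ols(x, y):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (negative if worse than the mean)."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def transfer_r2(train_x, train_y, test_x, test_y):
    """Calibrate on one site, then score the frozen model on another site."""
    a, b = fit_ols(train_x, train_y)
    return r2_score(test_y, [a + b * xi for xi in test_x])

# Toy numbers only (hypothetical HV backscatter in dB -> circumference in cm).
subang_hv, subang_girth = [-18, -16, -14, -12], [40, 50, 60, 70]
print(transfer_r2(subang_hv, subang_girth, [-17, -13], [45, 65]))  # -> 1.0 (same relation)
print(transfer_r2(subang_hv, subang_girth, [-17, -13], [60, 80]))  # -> -1.25 (shifted relation)
```

A site whose backscatter-girth relation differs from the calibration site drives R² down, even below zero, which is the pattern the abstract reports for the flat Jember sectors.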

2.
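To address the uneven depth of modeling between the user side and the item side, and the high model complexity, of knowledge-graph-based recommendation algorithms, a recommendation algorithm that fuses a knowledge graph with a lightweight graph convolutional network is proposed. On the user side, a neighbor set is generated from user similarity, and the interaction records of a user and its similar users are propagated iteratively over the knowledge graph to enhance the user feature representation. On the item side, entity embeddings in the knowledge graph are propagated to mine item information related to user preferences. Next, a lightweight graph convolutional network aggregates neighborhood features to obtain user and item representations, while an attention mechanism incorporates neighborhood weights into entities to enhance node embeddings. Finally, the ratings between users and items are predicted. Experiments show that on the Book-Crossing dataset, AUC and ACC improve by 1.8% and 2.3%, respectively, over the best baseline; on the Yelp2018 dataset, AUC and ACC improve by 1.2% and 1.4%, respectively. The results demonstrate that the model achieves better recommendation performance than the other baseline models.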
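The lightweight graph convolution step can be sketched as follows. This follows the general LightGCN-style rule (neighbour aggregation with symmetric degree normalization, no weight matrices or nonlinearities, and averaging over layers); the knowledge-graph propagation and attention components of the proposed model are not shown, and the function names are hypothetical.

```python
import math

def light_gcn(interactions, emb, num_layers=2):
    """LightGCN-style propagation over a user-item bipartite graph.

    interactions: iterable of (user_id, item_id) pairs (user and item ids must differ).
    emb: dict node_id -> list[float], the layer-0 embedding of every node.
    Each layer sums neighbour embeddings weighted by 1 / sqrt(|N(a)| * |N(b)|),
    with no weight matrix and no nonlinearity; the output averages all layers.
    """
    neighbors = {}
    for u, i in interactions:
        neighbors.setdefault(u, set()).add(i)
        neighbors.setdefault(i, set()).add(u)
    dim = len(next(iter(emb.values())))
    layer = {n: list(v) for n, v in emb.items()}
    total = {n: list(v) for n, v in emb.items()}
    for _ in range(num_layers):
        nxt = {}
        for node in layer:
            acc = [0.0] * dim
            for nb in neighbors.get(node, ()):
                w = 1.0 / math.sqrt(len(neighbors[node]) * len(neighbors[nb]))
                for d in range(dim):
                    acc[d] += w * layer[nb][d]
            nxt[node] = acc
        layer = nxt
        for node in total:  # accumulate this layer for the final average
            for d in range(dim):
                total[node][d] += layer[node][d]
    return {n: [x / (num_layers + 1) for x in v] for n, v in total.items()}

def predict_score(final, user, item):
    """Preference score as the inner product of final user and item embeddings."""
    return sum(a * b for a, b in zip(final[user], final[item]))
```

Dropping the per-layer weight matrices is what keeps this kind of model "lightweight": the only trainable parameters are the layer-0 embeddings.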

3.
This paper develops, tests, and validates a model for the antecedents of open source software (OSS) defects using Data and Text Mining. The public archives of OSS projects are used to access historical data on over 5,000 active and mature OSS projects. Using domain knowledge and exploratory analysis, a wide range of variables is identified from the process, product, resource, and end-user characteristics of a project to ensure that the model is robust and considers all aspects of the system. Multiple Data Mining techniques are used to refine the model, and the data are enriched by using Text Mining to discover knowledge from qualitative information. The study demonstrates the suitability of Data Mining and Text Mining for model building. Results indicate that project type, end-user activity, process quality, team size, and project popularity have a significant impact on the defect density of operational OSS projects. Since many organizations, both for-profit and not-for-profit, are beginning to use open source software as an economic alternative to commercial software, these results can inform decisions about which software an organization can reasonably maintain.

4.
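Objective: Leaf area index (LAI) is an important biophysical and biochemical vegetation parameter and is of great significance for monitoring crop growth and predicting yield. LAI estimation methods based on physical models and empirical models are currently considered the most common, but the efficiency and accuracy of both are limited. In recent years, machine learning algorithms have been widely applied in remote sensing monitoring because they can fit nonlinear data and integrate more auxiliary information. To evaluate the applicability of machine learning algorithms to remote sensing estimation of maize LAI, this paper analyzes and compares the ability of random forest (RF) and BP neural network algorithms to estimate maize LAI and compares them with a traditional empirical model. Methods: With Donghuayuan Town, Huailai County, Hebei Province, as the study area, field-measured maize LAI data were combined with contemporaneous imagery from China's Gaofen satellite (GF1-WFV). First, the correlation between eight vegetation indices and LAI was analyzed. Then, using hold-out cross-validation, all samples were split into two parts, 65% as the training set and 35% as the validation set, and this random split was repeated to produce three groups. RF, BP neural network, and traditional empirical models were built with the eight vegetation indices as independent variables and the corresponding LAI values as the dependent variable. The coefficient of determination (R²) and root mean square error (RMSE) were used as evaluation metrics. Results: Correlation analysis over all samples showed that measured LAI was highly significantly correlated with every vegetation index (P < 0.01), with all correlation coefficients above 0.5. The RF and BP neural network models were trained multiple times on the three sample groups and their estimation accuracy was checked on the validation sets; the empirical models were built on the training sets and checked on the validation sets. RF showed strong predictive ability: the R² between predicted and measured LAI was 0.681, 0.757, and 0.701, all higher than those of the BP model (0.504, 0.589, 0.605) and the empirical model (0.492, 0.557, 0.531), and the corresponding RMSE values were 0.264, 0.292, and 0.259, all lower than those of the BP model (0.284, 0.410, 0.283) and the empirical model (0.541, 0.398, 0.306). Conclusion: The RF algorithm estimates maize LAI from remote sensing data better, providing a technical reference for fast and accurate remote sensing monitoring of crop LAI.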
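The evaluation protocol (a 65%/35% hold-out split repeated over three random groups, scored by R² and RMSE) can be sketched as below. The abstract does not state the form of the empirical model, so a single-index linear model LAI = a + b·VI is assumed here; the RF and BP models are not reproduced, and all function names are hypothetical.

```python
import math
import random
from statistics import mean

def split_65_35(samples, seed):
    """Randomly split samples into 65% training and 35% validation (hold-out)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = round(0.65 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def r2_rmse(y_true, y_pred):
    """Return (R², RMSE) for predictions against observations."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))

def empirical_model(train):
    """Assumed empirical model: LAI = a + b * VI, fitted by least squares."""
    xs, ys = [vi for vi, _ in train], [lai for _, lai in train]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda vi: a + b * vi

def repeated_holdout(samples, seeds=(1, 2, 3)):
    """Fit on each training split and score on its validation split."""
    results = []
    for seed in seeds:
        train, valid = split_65_35(samples, seed)
        model = empirical_model(train)
        results.append(r2_rmse([lai for _, lai in valid], [model(vi) for vi, _ in valid]))
    return results
```

Here `samples` is a list of (vegetation index, measured LAI) pairs; swapping `empirical_model` for another fitting function would score RF or BP models under the same three splits.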

5.
Improving Markov Chain Monte Carlo Model Search for Data Mining   Total citations: 9; self-citations: 0; citations by others: 9
Paolo Giudici, Robert Castelo. Machine Learning, 2003, 50(1-2): 127-158
The motivation of this paper is the application of MCMC model scoring procedures to data mining problems involving a large number of competing models and other relevant model choice aspects.
To achieve this aim, we analyze one of the most popular Markov chain Monte Carlo methods for structural learning in graphical models, namely the MC³ algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215–232, 1995). Our aim is to improve their algorithm to make it an effective and reliable tool in the field of data mining. In such contexts, which are typically high-dimensional in the number of variables, little can be known a priori; therefore, a good model search algorithm is crucial.
We present and describe in detail our implementation of the MC³ algorithm, which provides an efficient general framework for computations with both Directed Acyclic Graphical (DAG) models and Undirected Decomposable Models (UDG). We believe that the ability to move easily between the two classes of models is an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish.
Furthermore, to improve the MC³ method, we provide several graphical monitors that help extract results and assess the goodness of the Markov chain Monte Carlo approximation to the posterior distribution of interest.
We apply our proposed methodology first to the well-known coronary heart disease dataset (D. Edwards & T. Havránek, Biometrika, 72:2, 339–351, 1985). We then introduce a novel data mining application concerning market basket analysis.
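The core Metropolis-Hastings move of MC³ can be sketched as follows. This is a simplified illustration, not the authors' implementation: it searches unrestricted undirected graphs (no decomposability or acyclicity check), and `log_score` is a hypothetical stand-in for a log posterior model score.

```python
import itertools
import math
import random
from collections import Counter

def mc3(num_vars, log_score, steps, seed=0):
    """Metropolis-Hastings search over undirected graph structures (MC³-style).

    A state is a frozenset of edges (i, j) with i < j. Each proposal toggles one
    edge chosen uniformly at random, so every graph has the same number of
    neighbours and the Hastings ratio reduces to exp(log_score(new) - log_score(old)).
    Returns a Counter of how often each graph was visited.
    """
    rng = random.Random(seed)
    all_edges = list(itertools.combinations(range(num_vars), 2))
    state = frozenset()
    current = log_score(state)
    visits = Counter()
    for _ in range(steps):
        proposal = state ^ {rng.choice(all_edges)}  # add or delete one edge
        candidate = log_score(proposal)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            state, current = proposal, candidate
        visits[state] += 1
    return visits
```

With a real score (for example, a log marginal likelihood plus a log prior), visit frequencies approximate posterior model probabilities. Restricting moves to decomposable graphs or DAGs makes neighbourhood sizes vary, so the Hastings correction |nbd(M)| / |nbd(M')| would no longer cancel.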

6.
Managing Uncertainties in Image Databases: A Fuzzy Approach   Total citations: 1; self-citations: 1; citations by others: 0

7.
Software defect prediction helps optimize the allocation of testing resources by identifying defect-prone modules prior to testing. Most existing models build their prediction capability on a set of historical data, presumably from the same or similar project settings as those under prediction. However, such historical data are not always available in practice. One potential way of predicting defects in projects without historical data is to learn predictors from the data of other projects. This paper investigates defect prediction in the cross-project context, focusing on the selection of training data. We conduct three large-scale experiments on 34 datasets obtained from 10 open source projects. Our major conclusions are: (1) in the best cases, training data from other projects can provide better prediction results than training data from the same project; (2) prediction results obtained using training data from other projects meet our acceptance criteria on average: in 18 out of 34 cases, defects were predicted with a Recall greater than 70% and a Precision greater than 50%; (3) the results of cross-project defect prediction are related to the distributional characteristics of the datasets, which are valuable for training data selection. We further propose an approach to automatically select suitable training data for projects without historical data. Prediction results obtained with training data selected by our approach are comparable with those obtained with training data from the same project.
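The acceptance criterion used in these experiments (Recall above 70% and Precision above 50%) can be checked per case in a few lines. This is a minimal sketch with hypothetical function names; the training-data selection approach itself is not shown.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 marks a defective module."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def meets_acceptance(y_true, y_pred, min_recall=0.70, min_precision=0.50):
    """Acceptance as stated in the abstract: Recall > 70% and Precision > 50%."""
    precision, recall = precision_recall(y_true, y_pred)
    return recall > min_recall and precision > min_precision

def count_accepted(cases):
    """Count accepted cases; `cases` is a list of (y_true, y_pred) pairs."""
    return sum(meets_acceptance(t, p) for t, p in cases)
```

Applied to each of the 34 dataset cases, `count_accepted` would produce a tally comparable to the "18 out of 34" figure.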

8.
In many engineering projects, the soil compression coefficient is an important parameter for estimating the settlement of soil layers. The common practice of determining this coefficient via the oedometer test is time-consuming and expensive. This study proposes a machine learning solution to replace the conventional tests used to obtain the soil compression coefficient. The new approach integrates the Multi-Layer Perceptron Neural Network (MLP Neural Nets) and Particle Swarm Optimization (PSO), two computational intelligence methods that work synergistically to establish a prediction model of the soil compression coefficient. The PSO metaheuristic is employed to optimize the MLP Neural Nets model structure. To train and validate the proposed method, named PSO-MLP Neural Nets, a dataset of 154 soil samples featuring 12 influencing factors was collected from the geotechnical investigation of a high-rise building project. Experimental results show that the proposed PSO-MLP Neural Nets achieved the most accurate prediction of the soil compression coefficient, with RMSE = 0.0267, MAE = 0.0145, and R² = 0.884. This result is significantly better than those of the benchmark methods, including the backpropagation neural network, the radial basis function neural network, support vector regression, random forest, and the Gaussian process. Based on these results, the newly constructed PSO-MLP Neural Nets has strong potential as a new alternative to assist geotechnical engineers in the design phase of civil engineering projects.
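The PSO component can be sketched as a standard global-best particle swarm. In the study PSO optimizes the MLP structure; here the objective is a hypothetical stand-in (in practice it might be the validation RMSE of an MLP whose structure is decoded from a particle's position, a decoding the abstract does not specify), and the inertia and acceleration constants are common defaults, not values from the paper.

```python
import random

def pso(objective, dim, bounds, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation (minimises `objective`).

    bounds: (low, high) applied to every dimension; positions are clipped to it.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val
```

For example, `pso(lambda p: sum(x * x for x in p), dim=3, bounds=(-5.0, 5.0))` drives the sphere function towards its minimum at the origin.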

9.
Knowledge of the amount of pasture biomass available in farm paddocks is crucial for improving utilization and productivity in the Australian grazing industry. A method to quantitatively map the biomass of annual pastures under grazing has been developed using the Normalized Difference Vegetation Index (NDVI) derived from high-resolution satellite imagery. Relationships between field-measured pasture biomass and the NDVI were examined for different transects in paddocks under different grazing regimes across three geographically dispersed farm sites. A significant linear relationship (R² = 0.84) was observed when the NDVI was regressed against biomass. The slope of the relationship between the NDVI and biomass declined in a highly predictable (R² = 0.82) exponential form as the growing season progressed, and this pattern was consistent across four separate seasons. This knowledge was used to formulate a reliable model to predict paddock average pasture biomass from the NDVI. The model estimates were validated against observed biomass in the range 500–4000 kilograms of dry matter per hectare (kg DM ha⁻¹), with R² = 0.85 and a standard error of 315 kg DM ha⁻¹.
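The NDVI and the season-dependent linear model can be sketched as below. The functional form (biomass linear in NDVI, with a slope that decays exponentially over the season) follows the abstract's description, but the coefficients `s0`, `k`, and `intercept` are hypothetical placeholders, since the fitted values are not reported here.

```python
import math

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def seasonal_slope(day_of_season, s0, k):
    """Slope of the biomass-NDVI line, decaying exponentially through the season.

    s0 and k are hypothetical coefficients that would be fitted to field data.
    """
    return s0 * math.exp(-k * day_of_season)

def predict_biomass(nir, red, day_of_season, s0, k, intercept=0.0):
    """Paddock biomass (kg DM/ha) as a linear function of NDVI with a seasonal slope."""
    return intercept + seasonal_slope(day_of_season, s0, k) * ndvi(nir, red)
```

Making the slope a function of time lets one NDVI value map to less biomass late in the season, which is how the decline described in the abstract would enter the prediction.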

10.
The uniaxial compressive strength (UCS) of rock is crucial for any type of project constructed in or on a rock mass. The test that measures the UCS of rock is expensive and time-consuming and is subject to sample restrictions. For this reason, the UCS of rock may be estimated using simple rock tests such as the point load index (Is(50)), Schmidt hammer (Rn), and p-wave velocity (Vp) tests. To estimate the UCS of granitic rock as a function of relevant rock properties such as Rn, Vp, and Is(50), rock cores were collected from the face of the Pahang–Selangor fresh water tunnel in Malaysia. Afterwards, 124 samples were prepared and tested in accordance with the relevant standards to obtain the dataset. This dataset was then used to estimate the UCS of rock via three nonlinear prediction tools, namely non-linear multiple regression (NLMR), artificial neural network (ANN), and adaptive neuro-fuzzy inference system (ANFIS). The models were examined using several performance indices, including the coefficient of determination (R²), variance accounted for (VAF), and root mean squared error (RMSE), together with a simple ranking procedure, and the best prediction model was selected. An R² of 0.951 on the testing dataset indicates the superiority of the ANFIS model, compared with 0.651 and 0.886 for the NLMR and ANN techniques, respectively. The results show that the ANFIS model predicts the UCS of rocks better than the other models. However, although the developed model may be useful at a preliminary design stage, it should be used with caution and only for the specified rock types.
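The performance indices and the simple ranking procedure can be sketched as follows. VAF is computed here as (1 − var(y − ŷ)/var(y)) × 100, its usual definition; the ranking variant (the best model gets the highest rank for each index, ranks are summed over indices, ties are not handled) is a common form of the procedure and an assumption about the one used in the study. Function names are hypothetical.

```python
import math
from statistics import mean, pvariance

def performance_indices(y_true, y_pred):
    """R², VAF (%) and RMSE for one model's predictions."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    my = mean(y_true)
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,
        "VAF": (1 - pvariance(residuals) / pvariance(y_true)) * 100,
        "RMSE": math.sqrt(ss_res / len(y_true)),
    }

def simple_ranking(results):
    """Rank models per index (best model gets the highest rank) and sum the ranks.

    results: dict model_name -> dict of indices as returned by performance_indices.
    """
    higher_is_better = {"R2": True, "VAF": True, "RMSE": False}
    totals = {name: 0 for name in results}
    for index, higher in higher_is_better.items():
        # worst model first, so it receives rank 1
        ordered = sorted(results, key=lambda m: results[m][index], reverse=not higher)
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    return totals
```

The model with the largest total rank is selected; summing ranks rather than raw values keeps indices with different scales (percentages versus MPa) from dominating each other.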

11.

Fly-rock caused by blasting is one of the dangerous side effects that need to be accurately predicted in open-pit mines. This study proposed a new technique to predict the fly-rock distance based on an ensemble of support vector regression models (SVRs) and the Lasso and elastic-net regularized generalized linear model (GLMNET), called SVRs–GLMNET. It combines six SVR models with a GLMNET model. The dataset of 210 experimental records was divided into three parts: training, validating, and testing. First, 70% of the dataset was used to develop the six SVR models as sub-models. Next, 20% of the dataset (the validating dataset) was used to predict fly-rock with the six developed SVR models. The predictions of the six SVR models were then used as input variables to establish the GLMNET model (i.e., the SVRs–GLMNET model). Finally, the remaining 10% of the dataset was used to test the performance of the proposed SVRs–GLMNET model. The six SVR models and the proposed SVRs–GLMNET model were compared and evaluated using five statistical criteria: mean absolute error (MAE), mean absolute percentage error (MAPE), root-mean-square error (RMSE), variance accounted for (VAF), and coefficient of determination (R²). The results indicated that the proposed SVRs–GLMNET model delivered the best performance in predicting the fly-rock distance caused by bench blasting in this study, with an RMSE of 3.737, R² of 0.993, MAE of 3.214, MAPE of 0.018, and VAF of 99.207. By contrast, the other models yielded poorer accuracy, with RMSE of 7.058–12.779, R² of 0.920–0.972, MAE of 3.438–7.848, MAPE of 0.021–0.055, and VAF of 90.538–97.003.
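The stacking step, in which SVR predictions on the validating set become the inputs of GLMNET, can be sketched with an elastic-net fitted by cyclic coordinate descent, which is the penalized objective GLMNET optimizes. This is a minimal pure-Python sketch, not the glmnet package: there is no regularization path, no warm starts, and no feature standardization; the six SVR sub-models are assumed to be already trained, and the `lam` and `alpha` values are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by lasso-type updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam=0.01, alpha=0.5, iters=200):
    """Elastic-net regression by cyclic coordinate descent.

    Minimises (1/2n) * sum((y - b0 - X w)^2)
              + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).
    X: list of rows; here each row holds the base SVR models' predictions for one sample.
    Returns (intercept, weights). The intercept is unpenalised (handled by centring).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centred features
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual of the all-zero-weight model
    w = [0.0] * p
    for _ in range(iters):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            if new != w[j]:
                delta = new - w[j]
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new
    intercept = y_mean - sum(m * wj for m, wj in zip(means, w))
    return intercept, w

def stacked_predict(intercept, w, base_preds):
    """Meta-model output for one sample, given the base models' predictions."""
    return intercept + sum(wj * x for wj, x in zip(w, base_preds))
```

Because the meta-model is fitted on the validating split rather than the training split, it learns how much to trust each sub-model on data the sub-models have not seen.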


12.

Piles are widely used in the substructures of various infrastructure buildings. Because soil is complex in nature, a variety of empirical models have been proposed to predict the bearing capacity of piles. This study aims to propose a novel artificial intelligence approach to predict the vertical load capacity of driven piles in cohesionless soils using support vector regression (SVR) optimized by a genetic algorithm (GA). To the best of our knowledge, no previous study has developed a GA-SVR model to predict the vertical load capacity of driven piles at different timescales; the novelty of this study lies in developing a new hybrid intelligent approach in this field. To investigate the efficacy of the GA-SVR model, two other models, SVR and linear regression, are also used for comparison. According to the results, the GA-SVR model clearly outperformed the SVR and linear regression models, achieving a lower root mean square error (RMSE) and a higher coefficient of determination (R²). Specifically, GA-SVR (RMSE of 0.017, R² of 0.980) performed better than SVR (RMSE of 0.035, R² of 0.912) and the linear regression model (RMSE of 0.079, R² of 0.625).
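A GA-based hyperparameter search of the kind used in GA-SVR can be sketched as follows. The abstract does not say which SVR hyperparameters were tuned or how fitness was defined, so `fitness` is a hypothetical callable (for example, the negative cross-validated RMSE of an SVR trained with the candidate hyperparameters), and the selection, crossover, and mutation operators are common choices rather than the paper's.

```python
import random

def genetic_search(fitness, bounds, pop_size=30, generations=80,
                   mutation_rate=0.2, seed=0):
    """Real-coded genetic algorithm that maximises `fitness` inside box bounds.

    bounds: list of (low, high), one per hyperparameter (e.g. SVR C, epsilon, gamma).
    Uses elitism, 3-way tournament selection, arithmetic crossover and Gaussian mutation.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        children = [ind[:] for ind in ranked[: max(1, pop_size // 5)]]  # keep top 20%
        while len(children) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fitness)  # tournament winners
            p2 = max(rng.sample(ranked, 3), key=fitness)
            a = rng.random()
            child = [a * x1 + (1 - a) * x2 for x1, x2 in zip(p1, p2)]  # arithmetic crossover
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[d] = min(hi, max(lo, child[d] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

In a real GA-SVR run each fitness call trains and validates an SVR, so caching fitness values per individual would be worthwhile; the sketch recomputes them for brevity.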


13.
Globally, malaria is still a persistent health problem affecting more than 200 million people. With about 90% of malaria cases occurring in Sub-Saharan Africa, it is imperative to understand the environmental factors contributing to malaria vector proliferation. Cattle hoofprints are known to be among the productive breeding sites for Anopheles (An.) arabiensis and An. funestus in Southern and East African countries. Therefore, this study aimed to test the potential of integrating field data and Sentinel-2 satellite imagery for mapping cattle hoofprint distribution in the Vhembe District, South Africa. The purpose was to improve the predictability of mosquito breeding sites in the study area using a field point dataset and Sentinel-2 data. Because it was difficult to sample all locations in the study area, spatial interpolation was employed to create continuous surfaces of cattle hoofprints from a limited number of sampled point observations. The sampled point observations were then correlated with Sentinel-derived variables to predict cattle hoofprints at unsampled locations. Ordinary Kriging (OK), co-Kriging (CK), and step-wise multiple linear regression (SMLR) were used because of their ability to incorporate both field point data and ancillary datasets. CK was the best-performing interpolation method, with R² = 0.69 for the validation dataset (n = 33), compared with OK (R² = 0.57) and SMLR (R² = 0.25). The resulting co-Kriging semivariogram shows that combining field data and remote sensing data improves the prediction accuracy of cattle hoofprint distribution. The findings demonstrate that the interpolation error for estimating cattle hoofprints/100 m² is much lower with CK (RMSE = 0.2; MAD = 0.04) than with OK (RMSE = 2.39; MAD = 2.11) or SMLR (RMSE = 5.20; MAD = 4.55).
Furthermore, the results indicate that there are more cattle hoofprints in the malaria-prone areas of the study site than in the malaria-free areas. Studies such as this provide a basis for developing an operational platform for long-term monitoring of malaria-susceptible areas, malaria risk, and control management.
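Ordinary kriging, one of the three interpolators compared, can be sketched as below. The exponential semivariogram and its sill and range are hypothetical (in practice they are fitted to the empirical semivariogram of hoofprint counts), and co-kriging, the best method in the study, additionally needs cross-semivariograms with the Sentinel-2 covariates and is not shown.

```python
import math

def exponential_variogram(h, sill=1.0, corr_range=50.0, nugget=0.0):
    """Exponential semivariogram model (hypothetical parameters); gamma(0) = 0."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / corr_range))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(points, values, target, variogram=exponential_variogram):
    """Ordinary kriging estimate at `target` from sampled (x, y) points.

    Solves [Gamma 1; 1^T 0] [weights; mu] = [gamma_0; 1], so the weights sum to 1.
    Returns (estimate, weights).
    """
    n = len(points)
    A = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve_linear(A, b)[:n]
    return sum(wi * vi for wi, vi in zip(weights, values)), weights
```

The unit-sum constraint on the weights is what makes the estimator unbiased without knowing the mean hoofprint density; with a zero nugget the estimate reproduces the observed value exactly at a sampled location.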

14.
We study syntax-free models for name-passing processes. For interleaving semantics, we identify the indexing structure required of an early labelled transition system to support the usual π-calculus operations, defining Indexed Labelled Transition Systems. For non-interleaving causal semantics we define Indexed Labelled Asynchronous Transition Systems, smoothly generalizing both our interleaving model and the standard Asynchronous Transition Systems model for CCS-like calculi. In each case we relate a denotational semantics to an operational view, for bisimulation and causal bisimulation respectively. We establish completeness properties of, and adjunctions between, categories of the two models. Alternative indexing structures and possible applications are also discussed. These are first steps towards a uniform understanding of the semantics and operations of name-passing calculi.

15.
Software defects can lead to undesired results. Correcting defects costs 50% to 75% of total software development budgets. To predict defective files, a prediction model must be built with predictors (e.g., software metrics) obtained either from the project itself (within-project) or from other projects (cross-project). A universal defect prediction model built from a large set of diverse projects would relieve the need to build and tailor prediction models for each individual project. A formidable obstacle to building a universal model is the variation in the distribution of predictors among projects of diverse contexts (e.g., size and programming language). Hence, we propose to cluster projects based on the similarity of the distribution of predictors, and to derive rank transformations using the quantiles of predictors within each cluster. We fit the universal model on the transformed data of 1,385 open source projects hosted on SourceForge and GoogleCode. The universal model obtains prediction performance comparable to within-project models, yields similar results when applied to five external projects (one Apache and four Eclipse projects), and performs similarly across projects with different context factors. Finally, we investigate which predictors should be included in the universal model. We expect that this work could form a basis for future work on building a universal model and lead to software support tools that incorporate it into a regular development workflow.
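The quantile-based rank transformation can be sketched as follows. The abstract does not state how many quantiles were used, so ten bins (deciles) are assumed here; the project clustering step is not shown, and the function name is hypothetical.

```python
import bisect
from statistics import quantiles

def make_rank_transform(cluster_values, n_bins=10):
    """Return a function mapping a raw metric value to a rank in 1..n_bins.

    Cut points are the quantiles of the metric within one project cluster, so
    projects of different size or language become comparable after the transform.
    """
    cuts = quantiles(cluster_values, n=n_bins)  # n_bins - 1 cut points
    def transform(value):
        # number of cut points <= value, shifted so ranks start at 1
        return bisect.bisect_right(cuts, value) + 1
    return transform
```

Values outside the cluster's observed range simply fall into the first or last bin, which keeps the transformed predictors bounded for the universal model.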

16.
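Using artificial intelligence to answer standardized test questions is a challenging task that has attracted extensive research in the AI field. This paper focuses on solving causal short-answer questions in high school geography, which requires knowledge integration and multi-hop causal reasoning and ultimately generating a long passage of text as the answer. To this end, the paper defines an abstract event graph (AEG) to represent causal and related relations, and uses a pre-trained language model to automatically extract from a corpus an AEG oriented to high school geography causal short-answer questions, achieving multi-source knowledge integration. Based on the AEG, graph neural network techniques are used to fuse structured and unstructured knowledge, achieving multi-hop causal reasoning. Experiments on GeoCEQA, a dataset of real high school geography causal short-answer questions, show that the proposed method achieves the best results on ROUGE, BLEU, and human evaluation scores: compared with the best baseline, it improves ROUGE by 0.8% to 1.4%, BLEU by 0.4%, and the human evaluation score by 4.2%.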
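Multi-hop causal reasoning over an event graph can be illustrated symbolically by enumerating causal chains of bounded length. This is only a simplified stand-in: the paper performs the reasoning with a graph neural network over the AEG, and the graph format, function name, and event labels below are hypothetical.

```python
from collections import deque

def causal_chains(graph, start, max_hops):
    """Enumerate causal chains (paths) of 1..max_hops edges starting at `start`.

    graph: dict mapping an event to the list of events it causes.
    Breadth-first, so shorter chains are listed before longer ones; cycles are skipped.
    """
    chains = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            chains.append(path)
        if len(path) - 1 == max_hops:
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # skip cycles
                queue.append(path + [nxt])
    return chains

# Hypothetical geography-style events, for illustration only.
aeg = {
    "low latitude": ["strong solar radiation"],
    "strong solar radiation": ["high temperature"],
    "high temperature": ["strong evaporation"],
}
for chain in causal_chains(aeg, "low latitude", max_hops=3):
    print(" -> ".join(chain))
```

A learned model replaces this exhaustive enumeration with message passing, which lets it weigh alternative chains and fuse them with unstructured text instead of listing every path.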

17.
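A Semi-Supervised Ensemble Method for Cross-Project Software Defect Prediction   Total citations: 2; self-citations: 2; citations by others: 0
He Jiyuan, Meng Zhaopeng, Chen Xiang, Wang Zan, Fan Xiangyu. Journal of Software, 2017, 28(6): 1455-1473
Software defect prediction methods can optimize the allocation of testing resources by identifying, early in a project's development, all software modules that may contain defects. Early defect prediction research focused mostly on within-project defect prediction, but within-project prediction requires sufficient historical data, whereas in practice the project to be predicted may have scarce historical data or may be entirely new. Cross-project defect prediction has therefore become a research hotspot in software defect prediction; its challenges lie in the distribution differences between the source and target project datasets and in the class imbalance within datasets. Inspired by search-based software engineering, this paper proposes S3EL, a search-based semi-supervised ensemble method for cross-project software defect prediction. The method first builds multiple naive Bayes base classifiers by adjusting the proportions of each class in the training set, then uses a genetic algorithm, which has global search capability, to ensemble these base classifiers based on a small number of labeled target instances, producing the final defect prediction model. S3EL was compared with several classic cross-project defect prediction methods (the Burak filter, the Peters filter, TCA+, CODEP, and HYDRA) on the Promise and AEEEM datasets. Using the F1 score as the evaluation metric, the results show that S3EL achieves the best prediction performance in most cases.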
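The fitness that the genetic algorithm optimizes in this kind of method can be sketched as the F1 score of a weighted ensemble on the few labeled target instances. Combining base classifiers by a weighted average of their predicted defect probabilities is an assumption (the abstract does not specify the combination rule), as are the function names; the class-ratio resampling and naive Bayes training are not shown.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = defective): 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def ensemble_predict(base_probs, weights, threshold=0.5):
    """Weighted average of base-classifier defect probabilities, then threshold.

    base_probs: one list of per-instance probabilities for each base classifier.
    weights: non-negative, not all zero (one weight per base classifier).
    """
    total = sum(weights)
    n = len(base_probs[0])
    return [int(sum(w * p[k] for w, p in zip(weights, base_probs)) / total >= threshold)
            for k in range(n)]

def ensemble_fitness(base_probs, weights, y_labeled):
    """Fitness a GA could maximise: F1 of the weighted ensemble on labeled target data."""
    return f1_score(y_labeled, ensemble_predict(base_probs, weights))
```

A GA individual would then be a weight vector, and `ensemble_fitness` would rank individuals against the small labeled subset of the target project.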

18.
Knowledge, 2006, 19(7): 459-470
This paper presents a set of experiments we carried out with Divago, a system that attempts to implement our ideas towards a computational model of creativity. It is expected to be able to generate novel concepts out of previous knowledge. Here we show its behaviour with a large dataset, constructed independently by other researchers, consisting of over 170 nouns (for a project named C3). Each noun is represented with a syntax equivalent to the one adopted for Divago. We apply a two-step experimentation procedure, which starts by “training” the system with “preferred outcomes” and then allows it to do free generation, constrained by the pragmatic goal of a given query. We evaluate the results and briefly discuss them with respect to well-defined criteria of novelty and usefulness. We also present a comparison with a similar experiment done with C3.

19.

This study proposes a novel design to systematically optimize the parameters of the adaptive neuro-fuzzy inference system (ANFIS) model using the stochastic fractal search (SFS) algorithm. To affirm the efficiency of the proposed SFS-ANFIS model, its predictions were compared with those of ANFIS and of three hybrid methods combining ANFIS with the genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO). Accurate prediction of uniaxial compressive strength (UCS) is of great significance for geotechnical projects such as tunnels and dams. Hence, this study proposes the use of SFS-ANFIS, GA-ANFIS, DE-ANFIS, PSO-ANFIS, and ANFIS models to predict UCS. To this end, the Pahang–Selangor fresh water tunnel in Malaysia was considered, and the required data samples were collected. Statistical metrics such as the coefficient of determination (R²) and mean absolute error were used to evaluate the models. The results show that SFS-ANFIS (R² = 0.981) predicts UCS better than the PSO-ANFIS, DE-ANFIS, GA-ANFIS, and ANFIS models.


20.
The spectral characteristics of leaves and their interaction with light were analysed based on the optical absorption coefficients of foliar water and biochemical components. Equations for calculating the radiative-equivalent water thickness (REWT) of leaves and canopies are presented, based on the difference in reflectance at 945 and 975 nm. Because of direct reflection at the leaf surface and scattering inside the leaf, the REWT derived from the Beer–Lambert principle differs from the leaf or canopy equivalent water thickness (EWT). Two independent datasets, at canopy and leaf scales, were designed to calibrate and validate the relationships between EWT and REWT. The results show that (1) the leaf or canopy REWT can be calculated from the reflectance difference between 945 and 975 nm; (2) the leaf REWT was 3.3 times larger than the EWT, with a significant coefficient of determination (R²) of 0.80 for our dataset and 0.86 for the Leaf Optical Properties Experiment (LOPEX'93) dataset; and (3) the canopy REWT was 1.4 times larger than the EWT, with a significant R² of 0.56 for the 2002 winter wheat canopy spectral dataset and 0.61 for the 2004 dataset. Therefore, the leaf or canopy EWT can be detected by calculating REWT from the difference in reflectance at 945 and 975 nm. Furthermore, because the relationship between REWT and EWT reflects the interaction of light with leaves or canopies, the multiple-scattering optical pathlength in the near-infrared (NIR) bands can also be calculated as the ratio of REWT to EWT.
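One way to see how a Beer–Lambert inversion turns the 945/975 nm reflectance contrast into a water thickness is sketched below. It assumes simple exponential attenuation by water, R(λ) = R₀ e^(−α_w(λ) d), with the same background reflectance R₀ at both wavelengths; the symbols α_w (specific absorption coefficient of water) and L (pathlength factor) are ours, and the paper's exact equations may differ. Water absorbs more strongly at 975 nm than at 945 nm, so the denominator is positive.

```latex
\begin{aligned}
R(\lambda) &= R_0\, e^{-\alpha_w(\lambda)\, d} \\
\ln R_{945} - \ln R_{975} &= \bigl[\alpha_w(975) - \alpha_w(945)\bigr]\, d \\
\mathrm{REWT} &= \frac{\ln R_{945} - \ln R_{975}}{\alpha_w(975) - \alpha_w(945)}
  \approx \frac{R_{945} - R_{975}}{R_{945}\,\bigl[\alpha_w(975) - \alpha_w(945)\bigr]} \\
L &= \frac{\mathrm{REWT}}{\mathrm{EWT}}
\end{aligned}
```

The approximation uses ln(1 + x) ≈ x for a small relative reflectance difference, which is why a plain difference R₉₄₅ − R₉₇₅ can serve as the working quantity. With the reported ratios, L would be about 3.3 for leaves and 1.4 for canopies in these datasets.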


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417