Similar Documents
1.
Crop models are used to estimate crop productivity under future climate projections, and modellers manage uncertainty by considering different scenarios and GCMs and by using a range of crop simulators. Five crop models and 20 users were arranged in a randomized block design with four replicates. Parameters were calibrated for maize (well studied by modellers) and rapeseed (almost ignored by modellers). While all models were accurate for maize (RRMSE from 16.5% to 25.9%), they were to some extent unsuitable for rapeseed. Differences between the biomass simulated by the models were generally significant for rapeseed, but significant in only 30% of the cases for maize. This may suggest that, for models well suited to a crop, user subjectivity (which explained 14% of the total variance in maize outputs) can hide differences between model algorithms; consequently, the uncertainty due to parameterization should be investigated further.
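RRMSE, the accuracy measure quoted above, can be computed as below. This is a minimal Python sketch that assumes the usual crop-modelling convention (RMSE divided by the mean of the observations, in percent); the abstract does not spell out its normalisation, and the biomass numbers are made up for illustration.

```python
import math


def rrmse(observed, simulated):
    """Relative root mean square error in percent.

    Assumes the common crop-modelling convention RRMSE = 100 * RMSE / mean(observed).
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Hypothetical biomass values (t/ha), not data from the study
observed = [10.0, 12.0, 14.0]
simulated = [11.0, 11.0, 15.0]
print(f"RRMSE = {rrmse(observed, simulated):.2f}%")  # prints RRMSE = 8.33%
```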

2.
    
Despite significant successes in knowledge discovery, traditional machine learning methods may fail to achieve satisfactory performance on complex data, such as imbalanced, high-dimensional, or noisy data. The reason is that it is difficult for these methods to capture the multiple characteristics and underlying structure of such data. In this context, how to construct an effective and efficient knowledge discovery and mining model has become an important topic in data mining. Ensemble learning, a research hot spot, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features through a variety of transformations. Based on these learned features, multiple learning algorithms are used to produce weak predictive results. Finally, ensemble learning adaptively fuses the informative knowledge in these results via voting schemes to achieve knowledge discovery and better predictive performance. In this paper, we review the research progress of the mainstream ensemble learning approaches and classify them by their characteristics. In addition, we present challenges and possible research directions for each mainstream approach, and we also introduce combinations of ensemble learning with other machine learning hot spots such as deep learning and reinforcement learning.

3.
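In ensemble learning, traditional dynamic ensemble selection must choose a subset of classifiers for every single sample to form the ensemble, which greatly increases computational complexity. To address this problem, a new semi-dynamic ensemble selection method is proposed. The method has two stages: the first stage selects the best individual classifiers for all test samples to form one ensemble classifier; the second stage dynamically selects sub-classifiers from the remaining individual classifiers for the current test sample to form a second ensemble classifier. The final classification result is obtained by fusing the outputs of the ensembles from the two stages. Results on UCI datasets show that the algorithm not only achieves good classification performance but also greatly reduces computational complexity.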
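The two-stage procedure can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' code: classifiers are plain callables, the local region is the k nearest validation samples under Euclidean distance, and the two ensembles are fused by pooling their votes into one majority vote. The names `select_static`, `semi_dynamic_predict`, `m`, `k` and `r` are hypothetical.

```python
import math
from collections import Counter


def accuracy(clf, X, y):
    """Fraction of samples in (X, y) that clf labels correctly."""
    return sum(clf(x) == t for x, t in zip(X, y)) / len(y)


def select_static(classifiers, X_val, y_val, m):
    """Stage 1, run once for all test samples: keep the m classifiers with the
    highest global validation accuracy. Returns (static, remaining)."""
    ranked = sorted(classifiers, key=lambda c: accuracy(c, X_val, y_val), reverse=True)
    return ranked[:m], ranked[m:]


def semi_dynamic_predict(x, static, remaining, X_val, y_val, k=3, r=1):
    """Stage 2, per test sample: add the r remaining classifiers that are most
    accurate on the k validation points nearest to x, then majority-vote."""
    nearest = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))[:k]
    X_loc = [X_val[i] for i in nearest]
    y_loc = [y_val[i] for i in nearest]
    dynamic = sorted(remaining, key=lambda c: accuracy(c, X_loc, y_loc), reverse=True)[:r]
    votes = Counter(c(x) for c in static + dynamic)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D example: one reliable global rule and two constant "local experts"
X_val = [(0.0,), (1.0,), (2.0,), (3.0,)]
y_val = [0, 0, 1, 1]
pool = [lambda x: 0, lambda x: int(x[0] > 1.5), lambda x: 1]
static, remaining = select_static(pool, X_val, y_val, m=1)
print(semi_dynamic_predict((2.9,), static, remaining, X_val, y_val, k=2),
      semi_dynamic_predict((0.1,), static, remaining, X_val, y_val, k=2))
```

Because `select_static` runs once, the per-sample cost is only the neighbour search and the ranking of the remaining classifiers, which is where a saving over fully dynamic selection would come from.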

4.
Chinese Sentiment Classification on Imbalanced Data   Total citations: 2, self-citations: 0, other citations: 2
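In recent years, sentiment classification has made remarkable progress in natural language processing research. However, most existing studies assume that the positive and negative samples used in classification are equally numerous, whereas in practice the distribution of positive and negative data is often imbalanced. This paper collects Chinese review texts from four product domains and finds that positive samples far outnumber negative ones. For Chinese sentiment classification on imbalanced data, an ensemble learning framework based on undersampling and multiple classification algorithms is proposed. Experimental results in four different domains show that the method significantly improves classification performance and clearly outperforms several current mainstream imbalanced classification methods.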
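A minimal sketch of an undersampling ensemble in this spirit, assuming the reviews have already been turned into numeric feature vectors. Each balanced subset pairs all minority (negative) samples with a disjoint chunk of majority samples, subsets are assigned different algorithms in rotation, and predictions are combined by majority vote. The two toy learners and all function names are hypothetical stand-ins; the paper's actual classifiers and fusion rule are not given in the abstract.

```python
import math
import random
from collections import Counter


def balanced_subsets(X, y, minority_label, seed=0):
    """Shuffle the majority class, cut it into disjoint chunks the size of the
    minority class, and pair each chunk with all minority samples. Leftover
    majority samples that do not fill a whole chunk are dropped."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    if not minority:
        raise ValueError("no minority samples")
    rng.shuffle(majority)
    size = len(minority)
    return [minority + majority[s:s + size] for s in range(0, len(majority) - size + 1, size)]


def nearest_centroid(X, y):
    """Hypothetical learner 1: assign x to the class with the closest mean."""
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def one_nearest_neighbour(X, y):
    """Hypothetical learner 2: copy the label of the closest training sample."""
    return lambda x: y[min(range(len(X)), key=lambda i: math.dist(x, X[i]))]


def train_undersampled_ensemble(X, y, minority_label, learners, seed=0):
    """Train one model per balanced subset, cycling through the algorithms."""
    models = []
    for i, subset in enumerate(balanced_subsets(X, y, minority_label, seed)):
        fit = learners[i % len(learners)]
        models.append(fit([x for x, _ in subset], [t for _, t in subset]))
    return models


def predict(models, x):
    """Majority vote over the ensemble members."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Hypothetical reviews embedded in 2-D: positives (majority) near the origin
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (5, 5), (6, 5)]
y = ["pos"] * 6 + ["neg"] * 2
models = train_undersampled_ensemble(X, y, "neg", [nearest_centroid, one_nearest_neighbour])
print(len(models), predict(models, (0.3, 0.2)), predict(models, (5.5, 4.8)))
```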

5.
Random Forests   Total citations: 333, self-citations: 0, other citations: 333
Breiman  Leo 《Machine Learning》2001,45(1):5-32
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
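Breiman's paper bounds the forest's generalization error in terms of exactly the two quantities mentioned above: PE* ≤ ρ̄(1 − s²)/s², where s is the strength of the individual trees and ρ̄ their mean correlation. The small Python helper below just evaluates that bound, to show how lowering correlation at fixed strength tightens it; the input values are illustrative only.

```python
def rf_error_upper_bound(mean_correlation, strength):
    """Breiman's (2001) bound on forest generalization error:
    PE* <= rho_bar * (1 - s**2) / s**2, meaningful for strength s > 0."""
    if not 0 < strength <= 1:
        raise ValueError("strength s must lie in (0, 1]")
    return mean_correlation * (1 - strength ** 2) / strength ** 2


# Same tree strength, lower correlation between trees -> tighter bound
for rho in (0.3, 0.1):
    print(rho, round(rf_error_upper_bound(rho, 0.6), 4))
```

The bound can exceed 1 for weak trees, so it is informative mainly when trees are reasonably strong and weakly correlated, which is what random feature selection at each split aims for.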

6.
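With the development of Internet technology, the number of documents on the World Wide Web has grown exponentially. In such a vast information repository it is difficult for users to find the information they need, so how to process this mass of documents automatically and efficiently has become an important research topic. This paper builds separate classification models on the hypertext information extracted from the documents in the dataset (titles, hyperlinks, and tags) and on the document content itself. A neural network then integrates the individual classification models to produce the final decision, yielding a meta-information-based hypertext ensemble classification algorithm that makes better combined use of the multiple kinds of structured information in hypertext. Experimental results show that, compared with methods that classify using only one type of hypertext structural information, the meta-information-based hypertext ensemble classification algorithm achieves better classification performance.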
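The fusion step can be sketched as a one-neuron network (a logistic unit) trained on the probabilities produced by the per-view classifiers. This is an assumption for illustration: the abstract only says that a neural network integrates the models, and a deeper network would follow the same pattern. The view probabilities below are invented, and in practice they should come from data the view classifiers were not trained on.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_combiner(view_probs, labels, lr=0.5, epochs=2000):
    """Fit one logistic neuron on per-view probabilities by batch gradient descent
    on the cross-entropy loss. view_probs[i][j] is view j's P(class 1) for page i."""
    n_views = len(view_probs[0])
    w, b = [0.0] * n_views, 0.0
    m = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_views, 0.0
        for p, t in zip(view_probs, labels):
            err = sigmoid(sum(wj * pj for wj, pj in zip(w, p)) + b) - t
            for j in range(n_views):
                grad_w[j] += err * p[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def combine(w, b, p):
    """Fused P(class 1) for one page from its per-view probabilities."""
    return sigmoid(sum(wj * pj for wj, pj in zip(w, p)) + b)


# Hypothetical outputs of two view classifiers per page: (content, title)
view_probs = [(0.9, 0.5), (0.8, 0.5), (0.85, 0.4), (0.95, 0.6),
              (0.1, 0.5), (0.2, 0.5), (0.15, 0.6), (0.05, 0.4)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_combiner(view_probs, labels)
print(combine(w, b, (0.9, 0.5)) > 0.5, combine(w, b, (0.1, 0.5)) < 0.5)
```

With an informative content view and an uninformative title view, the learned weight on content dominates, which is the behaviour a learned combiner offers over a fixed vote.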

7.
Niu Peng  Wei Wei 《Journal of Computer Applications》2010,30(6):1590-1593
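Building on Bagging support vector machines (SVMs), dynamic classifier selection is applied to SVM ensemble learning, and the application of dynamic SVM ensembles to hyperspectral remote sensing image classification is studied. Taking the characteristics of hyperspectral data into account, the Bagging SVM method is improved through random selection of feature subspaces and feedback learning; the computation of the K-nearest-neighbor local region is improved by introducing an additive composite distance; and the representativeness of the validation set is enhanced by adding misclassified training samples to it. Experimental results show that, compared with a single optimized SVM and other common SVM ensemble methods, the improved dynamic SVM ensemble achieves the highest classification accuracy and can effectively improve the classification accuracy of hyperspectral remote sensing images.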
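A minimal Python sketch of dynamic classifier selection by local accuracy, plus the validation-set augmentation described above. The default distance is Euclidean; the paper's additive composite distance is not defined in the abstract, so it is left as the pluggable `distance` argument rather than guessed. Constant toy classifiers stand in for the Bagging SVM members, and all names are hypothetical.

```python
import math


def dcs_local_accuracy(x, classifiers, X_val, y_val, k=5, distance=math.dist):
    """Find the k validation samples nearest to x and let the classifier that is
    most accurate on them make the prediction (ties go to the first one)."""
    nearest = sorted(range(len(X_val)), key=lambda i: distance(x, X_val[i]))[:k]

    def local_hits(clf):
        return sum(clf(X_val[i]) == y_val[i] for i in nearest)

    return max(classifiers, key=local_hits)(x)


def augment_validation(X_val, y_val, X_train, y_train, clf):
    """Make the validation set more representative by appending the training
    samples that clf misclassifies."""
    wrong = [(x, t) for x, t in zip(X_train, y_train) if clf(x) != t]
    return X_val + [x for x, _ in wrong], y_val + [t for _, t in wrong]


# Hypothetical members, each reliable in only one region of feature space
members = [lambda x: 0, lambda x: 1]
X_val = [(0,), (1,), (2,), (8,), (9,), (10,)]
y_val = [0, 0, 0, 1, 1, 1]
print(dcs_local_accuracy((1.5,), members, X_val, y_val, k=3),
      dcs_local_accuracy((9.2,), members, X_val, y_val, k=3))
```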

8.
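Rhinitis is a common chronic inflammation of the upper respiratory tract with multiple syndrome patterns and signs. Clinical classification of rhinitis involves many sample types and class imbalance and falls under multi-output classification; it often suffers from low recognition rates on minority-class samples and poor overall accuracy. To address this, this paper proposes a heterogeneous ensemble structure classification algorithm that converts multi-output rhinitis classification into multi-label and multi-class classification and uses ensemble learning algorithms to build a heterogeneous ensemble classifier. According to the imbalance ratio of each single label in the sub-datasets, the method automatically adjusts the number and depth of the base learners in the ensemble forest, effectively reducing the impact of imbalanced samples on classification, improving the overall accuracy on both majority and minority classes, and thereby improving the generalization ability of the ensemble model. In cross-validation experiments on 461 clinical rhinitis samples, the proposed model achieved a sensitivity of 74.9%, a specificity of 86.5%, an accuracy of 92.0%, an F1 of 0.783, and an AUC of 0.953. Compared with six typical models, it shows better evaluation performance and is better suited to the early clinical diagnosis of rhinitis.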
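The evaluation measures reported above follow the standard binary definitions sketched below for a single label, together with the per-label imbalance ratio that the method reportedly uses to size its forests. How the paper averages metrics across labels, and how it maps the imbalance ratio to tree count and depth, is not stated, so neither is attempted here; the counts are hypothetical.

```python
from collections import Counter


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count for one label."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Hypothetical counts, not the study's confusion matrix
print(binary_metrics(tp=30, fp=10, tn=50, fn=10))
print(imbalance_ratio(["A", "A", "A", "B"]))
```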

9.
Combined-Dimension Kernel Method for Graph Classification   Total citations: 1, self-citations: 0, other citations: 1
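Learning from data that carries structural information, such as graphs, is an important problem in machine learning, and kernel methods are an effective technique for solving it. For the molecular graph classification problem, and building on the work of Swamidass et al., this paper proposes a combined-dimension kernel method for graph classification. The method first constructs a 2D kernel that incorporates 1D information to characterize the chemical features of molecules, and then, drawing on molecular mechanics, uses geometric information to construct a 3D kernel that characterizes the physical properties of molecules. The kernels of different dimensions are then integrated, and the optimal kernel combination is obtained by solving a quadratically constrained quadratic programming (QCQP) problem. Experimental results show that the method performs better than existing techniques.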
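A sketch of combining kernels of different dimensions, assuming a Tanimoto kernel on sets of 2D substructure features and a Gaussian kernel on hypothetical 3D descriptors such as interatomic-distance histograms. The weights are fixed by hand here; the paper learns them by solving a QCQP, which is not reproduced. A non-negative combination of valid kernels is itself a valid kernel, which is what makes the weighted sum usable by an SVM.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets; two empty sets count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rbf(u, v, gamma=1.0):
    """Gaussian kernel on equal-length vectors, here hypothetical histograms of
    interatomic distances standing in for 3D geometry."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(u, v)))


def combined_gram(fingerprints, histograms, weights):
    """Gram matrix of K = w2 * K_2D + w3 * K_3D. Non-negative weights keep K
    positive semi-definite; the paper learns the weights via a QCQP instead."""
    w2, w3 = weights
    if w2 < 0 or w3 < 0:
        raise ValueError("kernel weights must be non-negative")
    n = len(fingerprints)
    return [[w2 * tanimoto(fingerprints[i], fingerprints[j])
             + w3 * rbf(histograms[i], histograms[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical molecules: substructure sets (2D) and distance histograms (3D)
fps = [{"C-C", "C=O"}, {"C-C", "C-N"}, {"C=O", "O-H"}]
hists = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
K = combined_gram(fps, hists, weights=(0.5, 0.5))
print([round(v, 3) for v in K[0]])
```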

10.
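Electricity-fee-sensitive customers of power-grid companies tend to react strongly to consumption-related electricity services such as metered quantity, price, charges, payments, and arrears. Quickly identifying fee-sensitive customers plays an important role in reducing complaint rates, improving customer satisfaction, and building a good service image for power supply companies. Based on power-grid user data, a multi-view fusion framework for building user profiles is proposed that can identify fee-sensitive customers quickly and accurately. First, grid users are analyzed, and a dual-channel approach models and predicts users with different characteristics separately. Second, several feature extraction methods are proposed to build a multi-source feature system for users. Finally, to make full use of the multi-source features, a multi-view fusion model based on two-layer XGBoost is proposed. The framework achieved an F1 score of 0.90379 (first place) in the "Customer Profiling" competition of the 2016 CCF Big Data and Computational Intelligence Contest, validating its effectiveness.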
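The two-layer structure can be sketched as stacking with out-of-fold predictions: a first-layer model is trained per feature view, its held-out predictions become features, and a second-layer model is trained on them. In the paper both layers are XGBoost models; here `fit` is any callable that returns a predictor, and `nearest_mean_fit` is a hypothetical stand-in so the example runs with the standard library only.

```python
import math


def out_of_fold(X, y, fit, n_folds=5):
    """First-layer predictions for every sample, each made by a model that did
    not see that sample during training (folds assigned by index modulo n_folds)."""
    preds = [None] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), n_folds):
            preds[i] = model(X[i])
    return preds


def two_layer_fit(views, y, fit, n_folds=5):
    """Layer 1: one model per view (refit on all data for prediction time) and
    out-of-fold meta-features. Layer 2: a model trained on those meta-features."""
    first = [fit(Xv, y) for Xv in views]
    meta = [list(row) for row in zip(*(out_of_fold(Xv, y, fit, n_folds) for Xv in views))]
    return first, fit(meta, y), meta


def two_layer_predict(first, second, x_views):
    """Feed one sample's views through layer 1, then layer 2."""
    return second([model(xv) for model, xv in zip(first, x_views)])


def nearest_mean_fit(X, y):
    """Hypothetical stand-in learner for 0/1 labels: score 1.0 if x is closer
    to the class-1 mean than to the class-0 mean."""
    means = {}
    for label in (0, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: 1.0 if math.dist(x, means[1]) < math.dist(x, means[0]) else 0.0


view_a = [(0.0,), (0.1,), (1.0,), (0.9,), (0.2,), (1.1,)]  # hypothetical billing view
view_b = [(0.1,), (0.0,), (0.8,), (1.0,), (0.1,), (0.9,)]  # hypothetical service view
y = [0, 0, 1, 1, 0, 1]
first, second, meta = two_layer_fit([view_a, view_b], y, nearest_mean_fit, n_folds=3)
print(two_layer_predict(first, second, [(0.95,), (0.85,)]))
```

Training layer 2 on out-of-fold rather than in-sample predictions keeps it from learning to trust first-layer models that have simply memorised their training data.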

11.
12.
Formulation of the static frame problem   Total citations: 1, self-citations: 0, other citations: 1
This report describes the static frame validation challenge problem used in the Sandia Validation Challenge Workshop, May 21–23, 2006. The challenge problem has a clear engineering character, is simple to state, and admits many different solution approaches. The regulatory assessment problem is to estimate the probability that a given vertical displacement exceeds a prescribed threshold.
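The regulatory question is an exceedance probability, which plain Monte Carlo estimates as the fraction of sampled inputs whose response crosses the threshold. The sketch below uses a hypothetical cantilever with a lognormal Young's modulus instead of the actual challenge frame and data, chosen because its exceedance probability also has a closed form to check against.

```python
import math
import random

# Hypothetical cantilever standing in for the challenge frame: tip deflection
# delta = P * L**3 / (3 * E * I) with an uncertain, lognormal Young's modulus E.
LOAD_N = 1000.0
LENGTH_M = 2.0
INERTIA_M4 = 1e-6
E_LOG_MU = math.log(200e9)  # median E = 200 GPa
E_LOG_SIGMA = 0.1


def sample_inputs(rng):
    return {"E": rng.lognormvariate(E_LOG_MU, E_LOG_SIGMA)}


def tip_deflection(inputs):
    return LOAD_N * LENGTH_M ** 3 / (3.0 * inputs["E"] * INERTIA_M4)


def exceedance_probability(model, sampler, threshold, n=100_000, seed=1):
    """Plain Monte Carlo estimate of P(model(inputs) > threshold)."""
    rng = random.Random(seed)
    hits = sum(model(sampler(rng)) > threshold for _ in range(n))
    return hits / n


threshold = 0.014  # metres
p_hat = exceedance_probability(tip_deflection, sample_inputs, threshold)
# Closed form for this toy model: P(delta > t) = P(E < P L^3 / (3 I t))
z = (math.log(LOAD_N * LENGTH_M ** 3 / (3.0 * INERTIA_M4 * threshold)) - E_LOG_MU) / E_LOG_SIGMA
analytic = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(round(p_hat, 4), round(analytic, 4))
```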

13.
A probabilistic construction of model validation   Total citations: 1, self-citations: 0, other citations: 1
We describe a procedure to assess the predictive accuracy of process models subject to approximation error and uncertainty. The proposed approach is a functional-analysis-based probabilistic approach in which random quantities are represented using polynomial chaos expansions (PCEs). The approach allows the uncertainty assessment in validation, a significant component of the process, to be formulated as a problem of approximation theory. It has two essential parts. First, a statistical procedure is implemented to calibrate the uncertain parameters of the candidate model from experimental or model-based measurements. This calibration technique employs PCEs to represent the inherent uncertainty of the model parameters. Based on the asymptotic behavior of the statistical parameter estimator, the associated PCE coefficients are then characterized as independent random quantities to represent epistemic uncertainty due to lack of information. Second, a simple hypothesis test is implemented to explore the validation of the computational model assumed for the physics of the problem. This validation path is applied to the dynamical system validation challenge exercise.
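To make the PCE representation concrete, the sketch below expands a function of one standard normal variable ξ in probabilists' Hermite polynomials, computing each coefficient by projection, c_k = E[f(ξ)He_k(ξ)]/k!. It is a one-dimensional illustration only, not the paper's calibration procedure; the lognormal test function is chosen because its coefficients are known in closed form, exp(μ + σ²/2)σ^k/k!.

```python
import math


def hermite_e(k, x):
    """Probabilists' Hermite polynomial He_k(x) via He_{n+1} = x He_n - n He_{n-1}."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h


def gaussian_expectation(g, lo=-10.0, hi=10.0, steps=4000):
    """E[g(xi)] for xi ~ N(0, 1), using the trapezoidal rule on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(x) * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2.0 * math.pi)


def pce_coefficients(f, order):
    """Projection: c_k = E[f(xi) He_k(xi)] / k!, because E[He_k(xi)^2] = k!."""
    return [gaussian_expectation(lambda x, k=k: f(x) * hermite_e(k, x)) / math.factorial(k)
            for k in range(order + 1)]


def pce_eval(coeffs, x):
    """Evaluate the truncated expansion sum_k c_k He_k(x)."""
    return sum(c * hermite_e(k, x) for k, c in enumerate(coeffs))


mu, sigma = 0.0, 0.3  # hypothetical lognormal quantity exp(mu + sigma * xi)
coeffs = pce_coefficients(lambda xi: math.exp(mu + sigma * xi), order=5)
exact = [math.exp(mu + sigma ** 2 / 2) * sigma ** k / math.factorial(k) for k in range(6)]
print(max(abs(a - b) for a, b in zip(coeffs, exact)))  # quadrature error only
```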

14.
A method for black-box identification of uncertain systems is presented. The method identifies a nominal model and an uncertainty model set, consisting of unfalsified uncertainty models. Minimisation of a Chebyshev criterion leads to computationally favourable linear programming problems and allows the possibility to include a priori information in the form of linear constraints without making the computations more complex. Using data compression via correlation computations solves the computation problem associated with identifying unfalsified uncertainty models. The application of set-valued uncertainty models to robust process control is illustrated in a simulation study of robust model predictive control of a distillation column.
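Assuming a model that is linear in its parameters, y_k ≈ φ_kᵀθ, minimising the Chebyshev (worst-case) criterion max_k |y_k − φ_kᵀθ| is equivalent to the linear program below, with a priori knowledge entering as extra linear constraints Aθ ≤ b. This is the standard reformulation, shown to illustrate why the problem stays an LP; the regressor φ_k, A and b are generic placeholders, not the paper's notation.

```latex
\begin{aligned}
\min_{\theta,\,t}\quad & t \\
\text{s.t.}\quad & -t \;\le\; y_k - \varphi_k^{\top}\theta \;\le\; t, \qquad k = 1,\dots,N,\\
& A\theta \le b .
\end{aligned}
```

At the optimum, t equals the largest absolute residual, and adding rows to A and b changes only the constraint set, not the problem class.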

15.
Semi-Supervised Hybrid Ensemble Classification Algorithm for Data Streams   Total citations: 1, self-citations: 0, other citations: 1
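Existing data stream classification models all require large numbers of labeled samples for training, but in real applications labeling many samples is relatively costly. To address this, a semi-supervised hybrid ensemble classification algorithm for data streams, SMEClass, is proposed. It organizes the base classifiers in a hybrid mode: K decision tree classifiers vote to assign labels to unlabeled data, raising the confidence of the class labels and improving the accuracy of the ensemble classifier, and a Bayesian classifier is added to effectively reduce the noisy data produced during labeling. Experimental results show that, compared with the latest semi-supervised ensemble classification algorithms, SMEClass achieves somewhat higher accuracy and has clear advantages in running time and noise resistance.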
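The voting-based labelling step can be sketched as below: an unlabelled sample receives the majority label of the K trees only when enough of them agree. The optional `noise_filter` callback is a hypothetical hook standing in for the Bayesian classifier used to suppress noisy labels; the abstract does not say how that check works, and the agreement threshold is an assumption.

```python
from collections import Counter


def vote_label(x, trees, min_agreement):
    """Majority label of the K trees and its vote share, or None if the share
    is below min_agreement."""
    label, count = Counter(tree(x) for tree in trees).most_common(1)[0]
    share = count / len(trees)
    return (label, share) if share >= min_agreement else None


def self_label(unlabelled, trees, min_agreement=0.8, noise_filter=None):
    """Pseudo-label one chunk of the stream, keeping only confident samples
    that also pass the optional noise filter."""
    accepted = []
    for x in unlabelled:
        result = vote_label(x, trees, min_agreement)
        if result is None:
            continue
        label, _ = result
        if noise_filter is not None and not noise_filter(x, label):
            continue  # hypothetical stand-in for the Bayesian noise check
        accepted.append((x, label))
    return accepted


# Hypothetical ensemble of K = 5 "trees": four threshold rules and one constant
trees = [lambda x: int(x > 0)] * 4 + [lambda x: 0]
print(self_label([1.0, -1.0], trees))                     # both accepted
print(self_label([1.0, -1.0], trees, min_agreement=0.9))  # 1.0 has only 4/5 votes
```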

16.
Behavioral Diversity and a Probabilistically Optimal GP Ensemble   Total citations: 3, self-citations: 0, other citations: 3
We propose N-version Genetic Programming (NVGP) as an ensemble method to enhance accuracy and reduce performance fluctuation of programs produced by genetic programming. Diversity is essential for forming successful ensembles. NVGP quantifies the behavioral diversity of ensemble members and defines an NVGP-optimal ensemble as one whose members have independent fault occurrences. We observed significant accuracy improvements from NVGP-optimal ensembles applied to a DNA segment classification problem.
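Why independent faults matter can be shown with a binomial calculation: if each of n members fails independently with probability p, a majority vote fails only when more than half of them fail on the same input. The sketch also includes a simple check of pairwise fault independence from fault records; this is a hypothetical diagnostic, not NVGP's own diversity measure, which the abstract does not define.

```python
from math import comb


def majority_error(p, n):
    """P(majority vote is wrong) for n members (n odd) whose faults are
    independent, each occurring with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of members to avoid ties")
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def fault_dependence(faults_a, faults_b):
    """|P(A and B fail) - P(A fails) * P(B fails)| from paired 0/1 fault
    records; 0 means the observed faults look independent."""
    n = len(faults_a)
    p_a = sum(faults_a) / n
    p_b = sum(faults_b) / n
    p_ab = sum(a * b for a, b in zip(faults_a, faults_b)) / n
    return abs(p_ab - p_a * p_b)


print(majority_error(0.2, 3), majority_error(0.2, 5))  # both below the single-member 0.2
```

If faults were perfectly correlated instead, every member would fail on the same inputs and the ensemble error would stay at p, which is why independence is the optimum the abstract refers to.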

17.
Crop models are important tools for assessing the impact of climate change and for exploring management options under the current climate. It is essential to evaluate the uncertainty associated with the predictions of these models. We compare two criteria of prediction error: MSEPfixed, which evaluates the mean squared error of prediction for a model with fixed structure, parameters, and inputs, and MSEPuncertain(X), which evaluates the mean squared error averaged over the distributions of model structure, inputs, and parameters. Comparing model outputs with data can be used to estimate the former. The latter has a squared bias term, which can be estimated using hindcasts, and a model variance term, which can be estimated from a simulation experiment. The separate contributions to MSEPuncertain(X) can be estimated using a random-effects ANOVA. We argue that MSEPuncertain(X) is the more informative uncertainty criterion because it is specific to each prediction situation.
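A simplified illustration of how a squared-bias term and a model-variance term arise, assuming an ensemble of predictions for one prediction situation (drawn, say, from different model structures, parameters and inputs) and one observed value. The identity mean((y − f_i)²) = (y − mean f)² + var(f) is exact; it is not the paper's estimator, which uses hindcasts and a random-effects ANOVA, and the numbers are made up.

```python
def msep_fixed(observed, predicted):
    """Mean squared error of prediction for one model with fixed structure,
    parameters and inputs, estimated by comparison with data."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def squared_bias_plus_variance(y_obs, ensemble):
    """For one prediction situation: the mean over model variants of (y - f_i)^2,
    split exactly into (y - mean f)^2 and the population variance of f."""
    n = len(ensemble)
    mean_f = sum(ensemble) / n
    total = sum((y_obs - f) ** 2 for f in ensemble) / n
    bias_sq = (y_obs - mean_f) ** 2
    variance = sum((f - mean_f) ** 2 for f in ensemble) / n
    return total, bias_sq, variance


total, bias_sq, variance = squared_bias_plus_variance(5.0, [4.0, 6.0, 7.0])
print(total, bias_sq + variance)  # equal up to rounding
```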

18.
Bagging is an ensemble method that first establishes a committee of classifiers and then aggregates their outputs through majority voting; it has attracted considerable research interest and has been applied in various application domains. It has demonstrated several advantages, but in its present form it has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through multi-algorithm ensembles, in which multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that, compared with using different training sets alone, using heterogeneous algorithms together with different training sets increases ensemble diversity, which provides a fundamental explanation for research that uses heterogeneous algorithms. In addition, we partially address the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, noting that the bootstrap procedure is the only source of diversity in bagging, we use heterogeneity as an additional source of diversity and propose an approach that uses heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in the experiments and, in terms of mean margin, it is superior to all of them.
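A minimal sketch of bagging with heterogeneous algorithms: each member is trained on its own bootstrap sample, algorithms are assigned round-robin so that diversity comes from both resampling and heterogeneity, and the committee votes. The two simple learners and the round-robin assignment are hypothetical choices; the abstract does not list the algorithms used.

```python
import math
import random
from collections import Counter


def one_nn(X, y):
    """Hypothetical learner A: label of the nearest training sample."""
    return lambda x: y[min(range(len(X)), key=lambda i: math.dist(x, X[i]))]


def stump(X, y):
    """Hypothetical learner B: best single-feature threshold (decision stump)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [t for x, t in zip(X, y) if x[j] <= thr]
            right = [t for x, t in zip(X, y) if x[j] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(t != l_lab for t in left) + sum(t != r_lab for t in right)
            if best is None or err < best[0]:
                best = (err, j, thr, l_lab, r_lab)
    _, j, thr, l_lab, r_lab = best
    return lambda x: l_lab if x[j] <= thr else r_lab


def heterogeneous_bagging(X, y, learners, n_models, seed=0):
    """Train n_models members, each on a bootstrap sample (drawn with
    replacement), assigning the algorithms round-robin."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for m in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        fit = learners[m % len(learners)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def committee_vote(models, x):
    """Majority vote of the committee."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


X = [(0.0, 0.2), (0.5, 0.9), (1.0, 0.1), (0.3, 0.6),
     (5.0, 5.2), (5.5, 6.0), (6.0, 5.1), (5.2, 5.8)]
y = ["a"] * 4 + ["b"] * 4
models = heterogeneous_bagging(X, y, [one_nn, stump], n_models=5)
print(committee_vote(models, (0.0, 0.0)), committee_vote(models, (6.0, 6.0)))
```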

19.
Safety measures for terrain classification and safest site selection   Total citations: 1, self-citations: 0, other citations: 1
Two safety measures for terrain classification are described: safety score and safety grade. The terrain safety score s is a multi-valued quantitative measure in the form of a crisp numeric value in the continuous unit interval [0.0, 1.0], that is, 0.0 ≤ s ≤ 1.0. The terrain safety grades are qualitative measures in the form of linguistic fuzzy sets, defined by a human expert, that cover the range of values of s, with adjacent grades having smooth (i.e., non-abrupt) and overlapping boundaries. The safety grade of a terrain segment is inferred from a set of linguistic rules, provided by the human expert, that relate terrain qualities to terrain safety grades. The safety score for the terrain segment is then computed from the safety grades in the activated rules. The safety margin of a terrain is also introduced as a quantitative measure of the degree of terrain safety. Validation of and confidence in the sensory data are discussed. The terrain safety score and the sensor confidence score are combined and represented by the fused safety/confidence grid. Given the safety/confidence grid of a terrain patch, two new methods for selecting the safest site are presented: Peak-with-High-Neighbors (PHN) and Center-of-Largest-Area (CLA). These two methods are then illustrated by a numerical example. The methods presented in this paper are computationally fast and are thus strong candidates for real-time implementation. Similar fuzzy rule-based terrain classifiers have previously been implemented successfully in rover navigation experiments and spacecraft landing simulations at JPL.
Homayoun Seraji was born in Tehran, Iran, in 1947, completed his school education in Iran, and ranked first in the national high-school diploma examinations in 1965. He graduated with a B.Sc. (First Class Honours) in Electronics from the University of Sussex, England, in 1969, and earned his Ph.D. in Control Systems at the University of Cambridge, England, in 1972.
He was elected a Research Fellow at St. John’s College, Cambridge, and conducted post-doctoral research and teaching for two years. In 1974, he joined Sharif (formerly Arya-Mehr) University of Technology, Iran, as a Professor of Electrical Engineering and was involved in teaching and research in control systems for ten years. He was selected as a U.N. Distinguished Scientist in 1984 and spent one year at the University of New Mexico, USA, as a Visiting Professor. During his 13-year academic career, he published extensively in the field of multivariable control systems, focusing on optimal control, pole placement, multivariable PID controllers, and output regulation. Dr. Seraji joined JPL in 1985 as a Senior Member of Technical Staff and also taught part-time at Caltech. Since 1991, he has been a Group Supervisor leading and managing a group of about 20 engineers and researchers in the Telerobotics Research and Applications Group. During his tenure at JPL, he has conducted extensive research that has led to major contributions in the field of robot control systems, particularly in adaptive robot control, control of dexterous robots, contact control, real-time collision avoidance, rule-based robot navigation, and safe spacecraft landing. He received the NASA Exceptional Engineering Achievement Award in 1992, the NASA Group Achievement Award in 2002 and 1991, and eight NASA Major Space Act Awards since 1995. In 2003, he received the JPL Edward Stone Award for Outstanding Research Publication. His research in controls and robotics has been published in 93 peer-reviewed journal papers, 112 refereed conference publications, and 5 contributed chapters, and has led to 10 patents. In 1996, Dr. Seraji was appointed a Senior Research Scientist at JPL in recognition of his significant individual research contributions in the fields of controls and robotics.
He was selected a Fellow of IEEE in 1997 for his contributions to robotic control technology and its space applications. In 2003, he was recognized as the most-published author in the 20-year history of the Journal of Robotic Systems.
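The fuzzy rule-based scoring described in the abstract of entry 19 (grades inferred from expert rules, then a score computed from the activated rules) can be sketched as a small fuzzy rule evaluation. Everything specific here is hypothetical: the slope and roughness inputs, the ramp membership functions, the four rules, and the grade centres. The score is taken as the firing-strength-weighted average of the grade centres, a common defuzzification that keeps s in [0, 1]; the paper's exact computation is not given, and PHN/CLA site selection is not sketched.

```python
def ramp_down(v, a, b):
    """Membership that is 1 for v <= a, 0 for v >= b, and linear in between."""
    if v <= a:
        return 1.0
    if v >= b:
        return 0.0
    return (b - v) / (b - a)


# Hypothetical grade centres on the unit interval
GRADE_CENTRES = {"UNSAFE": 0.0, "RISKY": 0.5, "SAFE": 1.0}


def safety_score(slope_deg, roughness):
    """Fire four hypothetical expert rules (AND = min) and return the
    firing-strength-weighted average of the activated grades' centres."""
    flat = ramp_down(slope_deg, 5.0, 20.0)
    smooth = ramp_down(roughness, 0.1, 0.4)
    steep, rough = 1.0 - flat, 1.0 - smooth
    rules = [
        (min(flat, smooth), "SAFE"),
        (min(flat, rough), "RISKY"),
        (min(steep, smooth), "RISKY"),
        (min(steep, rough), "UNSAFE"),
    ]
    total = sum(strength for strength, _ in rules)  # always > 0 with these memberships
    return sum(strength * GRADE_CENTRES[grade] for strength, grade in rules) / total


print(safety_score(0.0, 0.0), safety_score(30.0, 1.0), round(safety_score(12.5, 0.25), 3))
```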

20.
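Imbalanced classification is widely used in real life, and improving classification accuracy on imbalanced data has long been a hot topic in related fields. To address the tendency of existing undersampling methods to retain noisy majority-class samples, an improved undersampling method based on clustering-fusion undersampling is proposed. Combining clustering fusion with the Isolation Forest (iForest) method, it screens out and deletes majority-class noise samples with high anomaly scores, effectively improving the quality of the samples in the model...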
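The noise-screening step can be sketched with a small from-scratch Isolation Forest (random axis-aligned splits, path lengths normalised by c(n), score 2^(−E[h]/c(ψ))). This follows standard iForest scoring; the clustering-fusion part of the method is omitted, and dropping a fixed fraction of the highest-scoring majority samples before random undersampling is an assumption for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (iForest's c(n))."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolation tree: split on a random feature at a random value until the
    node is isolated or the height limit is reached."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """iForest score 2 ** (-E[h(x)] / c(psi)); values near 1 mean easy to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / n_trees / c_factor(psi))
            for p in points]


def iforest_undersample(X, y, majority_label, drop_fraction=0.1, seed=0):
    """Drop the most anomalous majority samples, then randomly undersample the
    remaining majority samples down to the minority size."""
    majority = [i for i, t in enumerate(y) if t == majority_label]
    minority = [i for i, t in enumerate(y) if t != majority_label]
    scores = anomaly_scores([X[i] for i in majority], seed=seed)
    ranked = sorted(range(len(majority)), key=lambda k: scores[k], reverse=True)
    kept = [majority[k] for k in ranked[int(len(majority) * drop_fraction):]]
    kept = random.Random(seed).sample(kept, min(len(kept), len(minority)))
    keep = sorted(kept + minority)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical data: a tight majority cluster with one far-off noisy point
majority_pts = [(i * 0.1, j * 0.1) for i in range(3) for j in range(3)] + [(10.0, 10.0)]
minority_pts = [(5.0, 0.0), (5.0, 0.1), (5.0, 0.2)]
X = majority_pts + minority_pts
y = ["maj"] * len(majority_pts) + ["min"] * len(minority_pts)
X_bal, y_bal = iforest_undersample(X, y, "maj")
print((10.0, 10.0) in X_bal, y_bal.count("maj"), y_bal.count("min"))
```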

