1.
Crop models are used to estimate crop productivity under future climate projections, and modellers manage uncertainty by considering different scenarios and GCMs and by using a range of crop simulators. Five crop models and 20 users were arranged in a randomized block design with four replicates. Parameters were calibrated for maize (well studied by modellers) and rapeseed (almost ignored). While all models were accurate for maize (RRMSE from 16.5% to 25.9%), they were, to some extent, unsuitable for rapeseed. Differences between the biomass simulated by the models were generally significant for rapeseed, but significant in only 30% of the cases for maize. This could suggest that, for models well suited to a crop, user subjectivity (which explained 14% of the total variance in maize outputs) can hide differences in model algorithms and, consequently, that the uncertainty due to parameterization should be investigated further.
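The RRMSE range quoted above is a relative root mean squared error. The sketch below assumes the common definition, which is the RMSE divided by the mean of the observations and expressed as a percentage. The abstract does not say which normalization the authors used, and the function name `rrmse` is hypothetical.

```python
import math

def rrmse(observed, simulated):
    """Relative RMSE in percent: RMSE divided by the mean observed value (assumed definition)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)

# Hypothetical biomass values (t/ha), not data from the study.
print(round(rrmse([10.0, 12.0, 14.0], [11.0, 11.0, 15.0]), 2))  # prints 8.33
```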
2.
Xibin DONG Zhiwen YU Wenming CAO Yifan SHI Qianli MA 《Frontiers of Computer Science》2020,14(2):241-258
Despite significant successes in knowledge discovery, traditional machine learning methods may fail to achieve satisfactory performance on complex data, such as imbalanced, high-dimensional, or noisy data. The reason is that it is difficult for these methods to capture the multiple characteristics and underlying structure of such data. In this context, how to construct an effective and efficient knowledge discovery and mining model has become an important topic in data mining. Ensemble learning, a research hot spot, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features using a variety of transformations. Based on these learned features, multiple learning algorithms are used to produce weak predictive results. Finally, ensemble learning adaptively fuses the informative knowledge from these results through voting schemes to achieve knowledge discovery and better predictive performance. In this paper, we review the research progress of the mainstream approaches to ensemble learning and classify them according to their characteristics. In addition, we present challenges and possible research directions for each mainstream approach, and we also introduce combinations of ensemble learning with other machine learning hot spots such as deep learning and reinforcement learning.
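The survey describes fusing base-learner outputs "via voting schemes". The following weighted majority vote is only an illustration and is not a method taken from the survey. It assumes each learner's weight is already known, for example from its validation accuracy, and `weighted_vote` is a hypothetical name.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight.

    predictions: one label per base learner.
    weights: optional non-negative weights; equal weights give plain majority voting.
    Ties are broken in favour of the label that appeared first.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    order = []  # first-seen order of labels, used for tie-breaking
    for label, w in zip(predictions, weights):
        if label not in totals:
            order.append(label)
        totals[label] += w
    return max(order, key=lambda lab: totals[lab])
```

With equal weights, `weighted_vote(["a", "b", "b"])` returns `"b"`. With weights `[3, 1, 1]`, a single strong learner can outvote two weak ones and the result becomes `"a"`.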
3.
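In ensemble learning, traditional dynamic ensemble selection must choose a subset of classifiers to form an ensemble for every sample, which greatly increases computational complexity. To address this problem, a new semi-dynamic ensemble selection method is proposed. The method has two stages. The first stage selects the best individual classifiers for all test samples to form one ensemble classifier. The second stage dynamically selects classifiers for the current test sample from the remaining pool to form another ensemble classifier. The final classification result is obtained by fusing the outputs of the ensembles from the two stages. Tests on UCI datasets show that the algorithm not only achieves good classification performance but also greatly reduces computational complexity.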
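The two stages can be sketched as follows. Several details here are assumptions, because the abstract does not specify them:

- Classifiers are plain callables that map a point to a label.
- Stage 1 keeps the `m` classifiers with the best global validation accuracy.
- Stage 2 keeps the `j` remaining classifiers that are most accurate on the `k` validation samples nearest to the test point.
- The fusion step is a majority vote.

All function and parameter names are hypothetical. Stage 1 runs once for the whole test set, which is where the saving over fully dynamic selection comes from.

```python
import math
from collections import Counter

def accuracy(clf, X, y):
    """Fraction of samples in (X, y) that clf labels correctly."""
    return sum(clf(x) == t for x, t in zip(X, y)) / len(y)

def select_static(classifiers, X_val, y_val, m):
    """Stage 1: rank once by global validation accuracy; keep the top m for all samples."""
    ranked = sorted(classifiers, key=lambda c: accuracy(c, X_val, y_val), reverse=True)
    return ranked[:m], ranked[m:]

def semi_dynamic_predict(static, rest, X_val, y_val, x, k=3, j=1):
    """Stage 2 plus fusion for one test sample x.

    From the remaining classifiers, pick the j most accurate on the k validation
    samples nearest to x (Euclidean distance), then take a majority vote over
    the static and dynamic members together.
    """
    nearest = sorted(range(len(X_val)), key=lambda i: math.dist(X_val[i], x))[:k]
    X_loc = [X_val[i] for i in nearest]
    y_loc = [y_val[i] for i in nearest]
    dynamic = sorted(rest, key=lambda c: accuracy(c, X_loc, y_loc), reverse=True)[:j]
    votes = Counter(c(x) for c in static + dynamic)
    return votes.most_common(1)[0][0]
```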
4.
5.
Random Forests (citations: 333 total, 0 self, 333 by others)
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
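The link between strength and correlation can be made concrete with the upper bound Breiman derives in the full paper, PE* ≤ ρ̄(1 − s²)/s². Here s is the strength of the individual trees and ρ̄ is the mean correlation between them. The sketch below only evaluates this bound. The function name is hypothetical, and the inputs in the test are illustrative values, not values reported in the paper.

```python
def breiman_error_bound(mean_correlation, strength):
    """Upper bound on forest generalization error: rho_bar * (1 - s^2) / s^2.

    Only meaningful for strength s > 0. The bound grows with the mean
    correlation between trees and shrinks as individual trees get stronger.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must lie in (0, 1]")
    s2 = strength ** 2
    return mean_correlation * (1.0 - s2) / s2
```

With s = 0.5, halving ρ̄ from 0.2 to 0.1 halves the bound from 0.6 to 0.3. This is why random feature selection at each split, which decorrelates the trees, can help even when individual trees get somewhat weaker.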
6.
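With the development of Internet technology, the number of documents on the World Wide Web is growing exponentially. In such a vast information repository, users find it hard to locate the information they need, so processing these massive document collections automatically and efficiently has become an important research topic. This paper builds separate classification models from the hypertext information extracted from the documents in the dataset (titles, hyperlinks, and tags) and from the document content itself. A neural network then combines the individual classification models to produce the final decision. The result is a meta-information-based hypertext ensemble classification algorithm that makes better combined use of the multiple kinds of structured information in hypertext. Experimental results show that, compared with methods that classify using a single kind of hypertext structural information, the meta-information-based hypertext ensemble classification algorithm achieves better classification performance.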
7.
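Building on Bagging support vector machines (SVMs), dynamic classifier selection is applied to SVM ensemble learning, and its application to hyperspectral remote sensing image classification is studied. The method makes three changes that take the characteristics of hyperspectral data into account:

- Bagging SVM is improved through random feature subspace selection and feedback learning.
- The computation of the K-nearest-neighbor local region is improved by introducing an additive composite distance.
- The validation set is made more representative by adding misclassified training samples to it.

Experimental results show that, compared with a single optimized SVM and other common SVM ensemble methods, the improved dynamic SVM ensemble achieves the highest classification accuracy and effectively improves the classification accuracy of hyperspectral remote sensing images.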
8.
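Rhinitis is a common chronic inflammation of the upper respiratory tract with multiple syndrome types and signs. Clinical classification of rhinitis involves many sample types and imbalanced classes and is a multi-output classification problem. It often suffers from low recognition rates for minority-class samples and poor overall classification accuracy. To address this, this paper proposes a heterogeneous ensemble structure classification algorithm. The algorithm converts multi-output rhinitis classification into multi-label and multi-class classification and uses ensemble learning algorithms to build a heterogeneous ensemble classifier. Based on the imbalance degree of each single class label in the sub-datasets, the method automatically adjusts the number and depth of the base learners in the ensemble forest. This reduces the influence of imbalanced samples on classification, improves the overall accuracy for both majority and minority classes, and in turn improves the generalization ability of the ensemble model. In cross-validation experiments on 461 clinical rhinitis samples, the proposed model achieved a sensitivity of 74.9%, a specificity of 86.5%, an accuracy of 92.0%, an F1 of 0.783, and an AUC of 0.953. Compared with six typical models, the proposed model shows better evaluation performance and is better suited to the early clinical diagnosis of rhinitis.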
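The reported sensitivity, specificity, accuracy and F1 follow the standard confusion-matrix definitions. The sketch below covers the binary case only. The abstract does not state how these metrics were aggregated across labels in the multi-label setting. The counts in the test are made up for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}
```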
9.
10.
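Power grid companies have customers who are sensitive to electricity charges and often react strongly to services related to their consumption, such as usage volume, electricity price, bills, payment, and arrears. Quickly identifying these fee-sensitive customers helps reduce complaint rates, improve customer satisfaction, and build a good service image for power supply enterprises. Based on power grid user data, this work proposes a multi-view fusion framework for building user profiles that can quickly and accurately identify fee-sensitive customers. First, grid users are analyzed, and two channels are used to model and predict users with different characteristics separately. Second, several feature extraction methods are proposed to build a multi-source feature system for users. Finally, to make full use of the multi-source features, a multi-view fusion model based on two-layer Xgboost is proposed. The framework achieved an F1 score of 0.90379 (first place) in the "Customer Profiling" competition of the 2016 CCF Big Data and Computational Intelligence Contest, which confirms its effectiveness.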
11.
Static frame challenge problem: Summary (citations: 1 total, 0 self, 1 by others)
12.
Formulation of the static frame problem (citations: 1 total, 0 self, 1 by others)
I. Babuška F. Nobile R. Tempone 《Computer Methods in Applied Mechanics and Engineering》2008,197(29-32):2496
This report describes a static frame validation challenge problem used in the SANDIA Validation Challenge Workshop, May 21–23, 2006. The challenge problem has a clear engineering character, is simple to state, and admits many different solution approaches. The regulatory assessment problem is to estimate the probability that a given vertical displacement exceeds a prescribed threshold.
13.
A probabilistic construction of model validation (citations: 1 total, 0 self, 1 by others)
Roger G. Ghanem Alireza Doostan John Red-Horse 《Computer Methods in Applied Mechanics and Engineering》2008,197(29-32):2585
We describe a procedure to assess the predictive accuracy of process models subject to approximation error and uncertainty. The proposed approach is a functional-analysis-based probabilistic approach in which random quantities are represented using polynomial chaos expansions (PCEs). The approach permits the formulation of uncertainty assessment in validation, a significant component of the process, as a problem of approximation theory. It has two essential parts. First, a statistical procedure is implemented to calibrate uncertain parameters of the candidate model from experimental or model-based measurements. This calibration technique employs PCEs to represent the inherent uncertainty of the model parameters. Based on the asymptotic behavior of the statistical parameter estimator, the associated PCE coefficients are then characterized as independent random quantities to represent epistemic uncertainty due to lack of information. Second, a simple hypothesis test is implemented to explore the validation of the computational model assumed for the physics of the problem. This validation path is implemented for the dynamical system validation challenge exercise.
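This is a textbook one-dimensional example of what it means to represent a random quantity by a polynomial chaos expansion. It is not taken from the paper. A lognormal variable is expanded in probabilists' Hermite polynomials He_n of a standard Gaussian ξ. The expansion follows from the Hermite generating function exp(ξt − t²/2) = Σ He_n(ξ) tⁿ/n!.

```latex
\exp(\mu + \sigma\xi)
  = e^{\mu + \sigma^2/2} \sum_{n=0}^{\infty} \frac{\sigma^n}{n!}\,\mathrm{He}_n(\xi),
  \qquad \xi \sim \mathcal{N}(0,1).
```

Truncating at order P gives a finite PCE with coefficients u_n = e^{μ+σ²/2} σⁿ/n!. The mean is u_0, and the variance is Σ_{n≥1} u_n² n!, which sums to e^{2μ+σ²}(e^{σ²} − 1), the familiar lognormal variance.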
14.
A method for black-box identification of uncertain systems is presented. The method identifies a nominal model and an uncertainty model set, consisting of unfalsified uncertainty models. Minimisation of a Chebyshev criterion leads to computationally favourable linear programming problems and allows the possibility to include a priori information in the form of linear constraints without making the computations more complex. Using data compression via correlation computations solves the computation problem associated with identifying unfalsified uncertainty models. The application of set-valued uncertainty models to robust process control is illustrated in a simulation study of robust model predictive control of a distillation column.
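A Chebyshev (minimax) criterion becomes a linear program through a standard reformulation. The paper's exact formulation may differ. This sketch assumes a model that is linear in its parameters, ŷ_i = φ_iᵀθ. The regressors φ_i, the parameter vector θ, and the constraint data A, b are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
\min_{\theta,\,t}\quad & t \\
\text{s.t.}\quad & -t \le y_i - \varphi_i^{\top}\theta \le t, \qquad i = 1,\dots,N, \\
& A\theta \le b \quad \text{(optional a priori linear constraints)}
\end{aligned}
```

At the optimum, t equals min over θ of max_i |y_i − φ_iᵀθ|. Prior knowledge written as the extra rows Aθ ≤ b keeps the problem a linear program, which matches the abstract's remark that such constraints add no computational complexity.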
15.
Hybrid Ensemble Classification Algorithm for Data Streams Based on Semi-Supervised Learning (citations: 1 total, 0 self, 1 by others)
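Existing data stream classification models all require a large number of labeled samples for training, but in practical applications labeling many samples is relatively costly. To address this problem, this paper proposes SMEClass, a hybrid ensemble classification algorithm for data streams based on semi-supervised learning. SMEClass organizes its base classifiers in a hybrid mode and uses a vote among K decision tree classifiers to label unlabeled data. This increases confidence in the assigned class labels and improves the accuracy of the ensemble classifier. A Bayesian classifier is also included to reduce the noisy data produced during labeling. Experimental results show that, compared with the latest semi-supervised ensemble classification algorithms, SMEClass achieves somewhat higher accuracy and has clear advantages in running time and noise resistance.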
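The labeling step described above might work roughly as follows:

1. K tree classifiers vote on an unlabeled instance.
2. The majority label is accepted only if the vote is sufficiently unanimous.
3. The Bayesian classifier acts as a noise filter and rejects the label when it disagrees.

The abstract does not give SMEClass's exact acceptance rule, so this sketch is an assumption throughout. The function name and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter

def pseudo_label(tree_votes, bayes_label, min_agreement=0.75):
    """Return a label for an unlabeled instance, or None to leave it unlabeled.

    tree_votes: labels predicted by the K decision trees.
    bayes_label: label predicted by the Bayesian classifier (used as a noise filter).
    min_agreement: hypothetical minimum fraction of trees that must agree.
    """
    if not tree_votes:
        return None
    label, count = Counter(tree_votes).most_common(1)[0]
    if count / len(tree_votes) < min_agreement:
        return None          # trees disagree too much: low-confidence label
    if label != bayes_label:
        return None          # Bayesian filter flags a likely noisy label
    return label
```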
16.
Behavioral Diversity and a Probabilistically Optimal GP Ensemble (citations: 3 total, 0 self, 3 by others)
Kosuke Imamura Terence Soule Robert B. Heckendorn James A. Foster 《Genetic Programming and Evolvable Machines》2003,4(3):235-253
We propose N-version Genetic Programming (NVGP) as an ensemble method to enhance accuracy and reduce performance fluctuation of programs produced by genetic programming. Diversity is essential for forming successful ensembles. NVGP quantifies behavioral diversity of ensemble members and defines NVGP optimal as an ensemble that has independent fault occurrences among its members. We observed significant accuracy improvement by NVGP optimal ensembles when applied to a DNA segment classification problem.
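Independent fault occurrences matter because they make majority-vote error exactly computable from the binomial distribution, and for members better than chance that error falls as the ensemble grows. The sketch below assumes a binary task, an odd number N of members, and the same error probability p for every member. It illustrates the independence argument only and is not NVGP's own formulation.

```python
from math import comb

def majority_vote_error(n_members, p_error):
    """P(majority of n independent members is wrong) for a binary task, n odd."""
    if n_members % 2 == 0:
        raise ValueError("use an odd number of members to avoid ties")
    need = n_members // 2 + 1   # number of wrong votes that makes the majority wrong
    return sum(comb(n_members, k) * p_error**k * (1 - p_error)**(n_members - k)
               for k in range(need, n_members + 1))
```

With p = 0.2, three independent members give 3·0.2²·0.8 + 0.2³ = 0.104, which is already about half of a single member's error.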
17.
Crop models are important tools for assessing the impact of climate change and for exploring management options under the current climate. It is essential to evaluate the uncertainty associated with predictions of these models. We compare two criteria of prediction error: MSEPfixed, which evaluates the mean squared error of prediction for a model with fixed structure, parameters and inputs, and MSEPuncertain(X), which evaluates the mean squared error averaged over the distributions of model structure, inputs and parameters. Comparison of model outputs with data can be used to estimate the former. The latter has a squared bias term, which can be estimated using hindcasts, and a model variance term, which can be estimated from a simulation experiment. The separate contributions to MSEPuncertain(X) can be estimated using a random effects ANOVA. It is argued that MSEPuncertain(X) is the more informative uncertainty criterion, because it is specific to each prediction situation.
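The split of MSEPuncertain(X) into a squared bias term and a model variance term follows the same pattern as a simple identity. Take a set of equally weighted model predictions for one prediction situation. The average squared error against the observation equals the squared difference between the observation and the mean prediction, plus the variance of the predictions. The sketch below shows that identity only. It assumes the predictions are drawn from the distributions of structure, inputs and parameters. It is not the paper's estimator, which estimates the bias term from hindcasts. The function name is hypothetical.

```python
def msep_decomposition(y_obs, predictions):
    """Split the mean squared error over an ensemble of predictions.

    Returns (msep, squared_bias, model_variance), where
    msep == squared_bias + model_variance (population variance, divisor n).
    """
    n = len(predictions)
    mean_pred = sum(predictions) / n
    msep = sum((y_obs - p) ** 2 for p in predictions) / n
    squared_bias = (y_obs - mean_pred) ** 2
    model_variance = sum((p - mean_pred) ** 2 for p in predictions) / n
    return msep, squared_bias, model_variance
```

For an observation of 5 and predictions 4, 6 and 8, the result is 11/3 = 1 + 8/3. The squared bias is 1 and the model variance is 8/3.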
18.
Working as an ensemble method that establishes a committee of classifiers first and then aggregates their outcomes through majority voting, bagging has attracted considerable research interest and been applied in various application domains. It has demonstrated several advantages, but in its present form, bagging has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through the use of multi-algorithm ensembles. In a multi-algorithm ensemble, multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that compared to using different training sets alone, using heterogeneous algorithms together with different training sets increases diversity in ensembles, and hence we provide a fundamental explanation for research utilizing heterogeneous algorithms. In addition, we partially address the problem of the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, after realizing that the bootstrap procedure is the exclusive source of diversity in bagging, we use heterogeneity as another source of diversity and propose an approach utilizing heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in experiments and, in terms of mean margin, our approach is superior to all the others considered in experiments.
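The core idea combines two sources of diversity: bootstrap samples, as in ordinary bagging, and different learning algorithms. The sketch below trains each member on its own bootstrap sample and cycles the learning algorithm round-robin across members. The final decision is a majority vote. It is not the authors' implementation. The two toy 1-D base algorithms (nearest centroid and 1-nearest-neighbour) and all function names are hypothetical stand-ins.

```python
import random
from collections import Counter

def fit_nearest_centroid(X, y):
    # Hypothetical base algorithm 1: predict the class whose mean is closest.
    centroids = {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def fit_one_nn(X, y):
    # Hypothetical base algorithm 2: copy the label of the closest training point.
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: abs(x - p[0]))[1]

def heterogeneous_bagging(X, y, algorithms, n_bags=6, seed=0):
    """Train n_bags members on bootstrap samples, cycling through the algorithms."""
    rng = random.Random(seed)
    members = []
    for b in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        fit = algorithms[b % len(algorithms)]                  # algorithm heterogeneity
        members.append(fit(Xb, yb))
    # Majority vote over all members.
    return lambda x: Counter(m(x) for m in members).most_common(1)[0][0]
```

With a single algorithm in the list, this reduces to ordinary bagging. Adding a second algorithm changes only which learner each bag uses.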
19.
Homayoun Seraji 《Autonomous Robots》2006,21(3):211-225
Two safety measures for terrain classification are described: safety score and safety grade. The terrain safety score s is a multi-valued quantitative measure in the form of a crisp numeric value in the continuous unit interval [0.0, 1.0], that is, s ∈ [0.0, 1.0]. The terrain safety grades are qualitative measures in the form of linguistic fuzzy sets defined by a human expert that cover the range of values of s, with adjacent grades having smooth (i.e., non-abrupt) and overlapping boundaries. The safety grade of a terrain segment is inferred from a set of linguistic rules provided by the human expert that relate the terrain qualities to the terrain safety grades. The safety score for the terrain segment is then computed simply from the safety grades in the activated rules. The safety margin of a terrain is also introduced as a quantitative measure of the degree of terrain safety. Validation of and confidence in the sensory data are discussed. The terrain safety score and the sensor confidence score are combined and represented by the fused safety/confidence grid. Given the safety/confidence grid of a terrain patch, two new methods for selecting the safest site are presented: Peak-with-High-Neighbors (PHN) and Center-of-Largest-Area (CLA). These two methods are then illustrated by a numerical example. The methods presented in this paper are computationally fast and are thus strong, viable candidates for real-time implementation. Similar fuzzy rule-based terrain classifiers have previously been implemented successfully in rover navigation experiments and spacecraft landing simulations at JPL.
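The abstract says the numeric safety score is computed from the safety grades of the activated rules. One common way to do this is a weighted average of representative grade values, weighted by each rule's firing strength. The paper's actual formula is not given here. The grade names, their representative values, and the firing strengths below are all hypothetical.

```python
# Hypothetical representative values for three linguistic safety grades in [0, 1].
GRADE_VALUE = {"UNSAFE": 0.0, "RISKY": 0.5, "SAFE": 1.0}

def safety_score(activated_rules):
    """Weighted-average defuzzification.

    activated_rules: list of (grade, firing_strength) pairs, strength in [0, 1].
    Returns a crisp score s in [0.0, 1.0], or None if no rule fired.
    """
    total = sum(w for _, w in activated_rules)
    if total == 0:
        return None
    return sum(GRADE_VALUE[g] * w for g, w in activated_rules) / total
```

For example, a segment that fires a SAFE rule at 0.8 and a RISKY rule at 0.2 scores (0.8·1.0 + 0.2·0.5)/1.0 = 0.9.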
Homayoun Seraji was born in Tehran, Iran, in 1947, completed his school education in Iran, and ranked first in the national high-school diploma examinations in 1965. He graduated with a B.Sc. (First Class Honours) in Electronics from the University of Sussex, England, in 1969, and earned his Ph.D. in Control Systems at the University of Cambridge, England, in 1972. He was elected a Research Fellow at St. John's College, Cambridge, and conducted post-doctoral research and teaching for two years. In 1974, he joined Sharif (formerly Arya-Mehr) University of Technology, Iran, as a Professor of Electrical Engineering and was involved in teaching and research in control systems for ten years. He was selected as a U.N. Distinguished Scientist in 1984 and spent one year at the University of New Mexico, USA, as a Visiting Professor. During his 13-year academic career, he published extensively in the field of multivariable control systems, focusing on optimal control, pole placement, multivariable PID controllers, and output regulation.
Dr. Seraji joined JPL in 1985 as a Senior Member of Technical Staff and also taught part-time at Caltech. Since 1991, he has been a Group Supervisor, leading and managing a group of about 20 engineers and researchers in the Telerobotics Research and Applications Group. During his tenure at JPL, he has conducted extensive research that has led to major contributions in the field of robot control systems, particularly in adaptive robot control, control of dexterous robots, contact control, real-time collision avoidance, rule-based robot navigation, and safe spacecraft landing. He received the NASA Exceptional Engineering Achievement Award in 1992, the NASA Group Achievement Award in 1991 and 2002, and eight NASA Major Space Act Awards since 1995. In 2003, he received the JPL Edward Stone Award for Outstanding Research Publication. His research in controls and robotics has been published in 93 peer-reviewed journal papers, 112 refereed conference publications, and 5 contributed chapters, and has led to 10 patents.
In 1996, Dr. Seraji was appointed a Senior Research Scientist at JPL in recognition of his significant individual research contributions in the fields of controls and robotics. He was selected a Fellow of IEEE in 1997 for his contributions to robotic control technology and its space applications. In 2003, he was recognized as the most-published author in the 20-year history of the Journal of Robotic Systems.
20.
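Imbalanced classification has wide real-world applications, and improving classification accuracy on imbalanced data has long been a hot topic in related fields. Existing undersampling methods tend to retain noisy majority-class samples. To address this, an improved undersampling method based on cluster-fusion undersampling is proposed. Combining cluster fusion with the Isolation Forest (iForest) method, it identifies and deletes noisy majority-class samples with high anomaly scores, effectively improving the quality of the samples in the model...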
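The iForest step ranks majority-class samples by an anomaly score and deletes the highest-scoring ones. This compact sketch of Isolation Forest scoring has four parts:

- Random axis-parallel splits build each isolation tree.
- The tree height limit is ceil(log2 ψ).
- The average path length is normalized by c(ψ).
- The anomaly score is 2^(−E[h(x)]/c(ψ)).

The cluster-fusion part and the paper's thresholds are not given in the truncated abstract. `drop_fraction` and all function names are therefore hypothetical. The defaults of 100 trees and a subsample size of 256 are the values commonly used with iForest.

```python
import math
import random

def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    """Depth at which x lands, plus c(size) for leaves that were not fully isolated."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)

def iforest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] for each point; higher means easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(psi)
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / n_trees / norm) for p in points]

def drop_majority_outliers(majority, drop_fraction=0.1, **kw):
    """Undersampling step: remove the highest-scoring fraction of majority samples."""
    scores = iforest_scores(majority, **kw)
    n_drop = int(len(majority) * drop_fraction)
    order = sorted(range(len(majority)), key=lambda i: scores[i], reverse=True)
    dropped = set(order[:n_drop])
    return [p for i, p in enumerate(majority) if i not in dropped]
```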