Similar articles
1.
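Random forests build decision trees by sampling features on top of bootstrap sampling, trading the accuracy of individual trees for lower correlation between trees and thereby improving prediction accuracy. On large datasets, however, the correlation between trees remains high, and random forest performance suffers. To address this, an improved algorithm based on out-of-bag prediction is proposed that raises the predictive performance of the random forest by improving the accuracy of its decision trees. The random forest's out-of-bag predictions are combined with the original features and the random forest is retrained, which effectively lowers the trees' VC dimension, empirical risk, and generalization risk and raises their accuracy, ultimately improving the forest's predictive performance. However, more accurate trees tend to make similar predictions, which raises the correlation between trees and hurts the forest's final predictions. To counter this, an expanded-space algorithm generates different features for different trees, lowering the correlation between trees without significantly reducing their accuracy. Experiments show that the algorithm's average accuracy across 32 datasets is 1.7% higher than that of the original random forest, and under the corrected paired t-test it significantly outperforms the original random forest on 19 of those datasets.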

2.
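Random forest is an effective ensemble learning algorithm that is widely used in pattern recognition. Achieving higher prediction accuracy requires parameter optimization. A random forest algorithm is proposed that uses the classification error estimated from out-of-bag data, together with an improved grid search algorithm, to optimize two parameters: the number of decision trees and the number of candidate split attributes. Simulation results show that parameters optimized in this way improve the classification performance of the random forest to some degree.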
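
The idea of tuning the number of trees and the number of candidate split attributes against an out-of-bag (OOB) error estimate can be sketched as follows. This is a minimal illustration, not the paper's method: it uses a plain exhaustive grid rather than the improved grid search the abstract mentions, and single-split decision stumps stand in for full decision trees. All function names (`fit_stump`, `oob_error`, `grid_search`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Fit a one-split tree: best threshold over the candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        maj = Counter(y).most_common(1)[0][0]
        return (features[0], 0.0, maj, maj)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def oob_error(X, y, n_trees, n_candidates, seed=0):
    """Error of the majority vote, using only trees that did not see the sample."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(idx)
        feats = rng.sample(range(d), n_candidates)   # candidate split attributes
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        for i in range(n):
            if i not in in_bag:                      # sample i is out-of-bag
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    if not scored:
        return 1.0
    return sum(p != t for p, t in scored) / len(scored)


def grid_search(X, y, tree_grid, cand_grid):
    """Evaluate every (n_trees, n_candidates) pair and return the best one."""
    results = {(n, m): oob_error(X, y, n, m)
               for n in tree_grid for m in cand_grid}
    best = min(results, key=results.get)
    return best, results
```

Calling `grid_search(X, y, [5, 15], [1, 3])` on a list of feature rows `X` and labels `y` returns the best `(n_trees, n_candidates)` pair together with the OOB error of every pair. Because the OOB estimate reuses the training data, no separate validation split is needed.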

3.
杨丰瑞 (Yang Fengrui). 《计算机应用研究》 (Application Research of Computers), 2020, 37(9): 2625-2628, 2633
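Processing high-dimensional, complex data is a key problem in data mining. To address the unbalanced prediction accuracy and low overall classification efficiency of existing feature-selection classification algorithms, a feature-selection classification algorithm combining probabilistic correlation with an extreme random forest (P-ERF) is proposed. The algorithm selects features by combining full consideration of the correlation between features with P values, which avoids the redundancy introduced during tree-node splitting. With random trees as base classifiers and an extreme random forest as the overall framework, P-ERF achieves higher accuracy and a better generalization error. Experiments show that, compared with the random forest and extreme random forest algorithms, P-ERF performs well in both classification accuracy on the datasets and overall performance.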

4.
Changes in the structural state of forests of the semi-arid U.S.A., such as an increase in tree density, are widely believed to be leading to an ecological crisis, but accurate methods of quantifying forest density and configuration are lacking at landscape scales. An individual tree canopy (ITC) method based on aerial LiDAR has been developed to assess forest structure by estimating the density and spatial configuration of trees in four different height classes. The method has been tested against field measured forest inventory data from two geographically distinct forests with independent LiDAR acquisitions. The results show two distinct patterns: accurate, unbiased density estimates for trees taller than 20 m, and underestimation of density in trees less than 20 m tall. The underestimation of smaller trees is suggested to be a limitation of LiDAR remote sensing. Ecological applications of the method are demonstrated through landscape metrics analysis of density and configuration rasters.

5.
Improved random forest and its application to remote sensing images   Cited by: 1 in total (self-citations: 0, citations by others: 1)
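To address the difficulty of obtaining training samples for remote sensing images, the random forest algorithm, which is suited to small-sample classification, is introduced. So that the random forest classifies better and more stably with small samples, a more random method of combining features is proposed on top of the decision trees, reducing the correlation between trees and hence the generalization error of the forest. An artificial immune algorithm is introduced to compress and optimize the improved random forest, striking a good balance between forest size on the one hand and classification stability and accuracy on the other. Experiments on UCI datasets demonstrate the effectiveness of the improved random forest and the feasibility of its optimized model: the optimized forest is smaller and achieves higher classification accuracy. The method is also compared with traditional methods on remote sensing images.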

6.
Airborne spectral and light detection and ranging (lidar) sensors have been used to quantify biophysical characteristics of tropical forests. Lidar sensors have provided high-resolution data on forest height, canopy topography, volume, and gap size; and provided estimates on number of strata in a forest, successional status of forests, and above-ground biomass. Spectral sensors have provided data on vegetation types, foliar biochemistry content of forest canopies, tree and canopy phenology, and spectral signatures for selected tree species. A number of advances are theoretically possible with individual and combined spectral and lidar sensors for the study of forest structure, floristic composition and species richness. Delineating individual canopies of over-storey trees with small footprint lidar and discrimination of tree architectural types with waveform distributions is possible and would provide scientists with a new method to study tropical forest structure. Combined spectral and lidar data can be used to identify selected tree species and identify the successional status of tropical forest fragments in order to rank forest patches by levels of species richness. It should be possible in the near future to quantify selected patterns of tropical forests at a higher resolution than can currently be undertaken in the field or from space.

7.
Enlarging the Margins in Perceptron Decision Trees   Cited by: 4 in total (self-citations: 0, citations by others: 4)
Capacity control in perceptron decision trees is typically performed by controlling their size. We prove that other quantities can be as relevant to reduce their flexibility and combat overfitting. In particular, we provide an upper bound on the generalization error which depends both on the size of the tree and on the margin of the decision nodes. So enlarging the margin in perceptron decision trees will reduce the upper bound on generalization error. Based on this analysis, we introduce three new algorithms, which can induce large margin perceptron decision trees. To assess the effect of the large margin bias, OC1 (Journal of Artificial Intelligence Research, 1994, 2, 1–32.) of Murthy, Kasif and Salzberg, a well-known system for inducing perceptron decision trees, is used as the baseline algorithm. An extensive experimental study on real world data showed that all three new algorithms perform better or at least not significantly worse than OC1 on almost every dataset with only one exception. OC1 performed worse than the best margin-based method on every dataset.

8.
Many areas of forest across northern Canada are challenging to monitor on a regular basis as a result of their large extent and remoteness. Although no forest inventory data typically exist for these northern areas, detailed and timely forest information for these areas is required to support national and international reporting obligations. We developed and tested a sample-based approach that could be used to estimate forest stand height in these remote forests using panchromatic Very High Spatial Resolution (VHSR, < 1 m) optical imagery and light detection and ranging (lidar) data. Using a study area in central British Columbia, Canada, to test our approach, we compared four different methods for estimating stand height using stand-level and crown-level metrics generated from the VHSR imagery. ‘Lidar plots’ (voxel-based samples of lidar data) are used for calibration and validation of the VHSR-based stand height estimates, similar to the way that field plots are used to calibrate photogrammetric estimates of stand height in a conventional forest inventory or to make empirical attribute estimates from multispectral digital remotely sensed data. A k-nearest neighbours (k-NN) method provided the best estimate of mean stand height (R² = 0.69; RMSE = 2.3 m, RMSE normalized by the mean value of the estimates (RMSE-%) = 21) compared with linear regression, random forests, and regression tree methods. The approach presented herein demonstrates the potential of VHSR panchromatic imagery and lidar to provide robust and representative estimates of stand height in remote forest areas where conventional forest inventory approaches are either too costly or are not logistically feasible. While further evaluation of the methods is required to generalize these results over Canada to provide robust and representative estimation, VHSR and lidar data provide an opportunity for monitoring in areas for which there is no detailed forest inventory information available.
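
To make the reported metrics concrete, here is a minimal sketch of k-NN regression together with RMSE and RMSE-%, where RMSE-% follows the abstract's wording (RMSE normalized by the mean value of the estimates). The Euclidean distance, the unweighted neighbour average, and the function names are assumptions for illustration. The study's actual predictors and k-NN configuration are not given in the abstract.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the plain mean of the k nearest training targets."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(estimates, observed):
    """Root-mean-square error between estimates and reference values."""
    squared = [(e - o) ** 2 for e, o in zip(estimates, observed)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_percent(estimates, observed):
    """RMSE as a percentage of the mean of the estimates."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * rmse(estimates, observed) / mean_estimate
```

Under this definition, the abstract's RMSE of 2.3 m and RMSE-% of 21 would imply a mean estimated stand height of roughly 11 m (2.3 / 0.21).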

9.
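This paper proposes a textual entailment recognition method based on linguistic phenomena. The method builds a joint classification model for linguistic phenomenon recognition and overall inference judgment, with the aim of learning two highly related tasks together, avoiding the error propagation of pipeline models, and improving system accuracy. For linguistic phenomenon recognition, 22 specialized features and 20 general features are designed. To improve the generalization ability of the random forest, a random forest generation algorithm based on feature selection is proposed. Experiments show that the random-forest-based joint classification model can effectively recognize linguistic phenomena and the overall entailment relation.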

10.
Meso-scale digital terrain models (DTMs) and canopy-height estimates, or digital canopy models (DCMs), are two lidar products that have immense potential for research in tropical rain forest (TRF) ecology and management. In this study, we used a small-footprint lidar sensor (airborne laser scanner, ALS) to estimate sub-canopy elevation and canopy height in an evergreen tropical rain forest. A fully automated, local-minima algorithm was developed to separate lidar ground returns from overlying vegetation returns. We then assessed inverse distance weighted (IDW) and ordinary kriging (OK) geostatistical techniques for the interpolation of a sub-canopy DTM. OK was determined to be a superior interpolation scheme because it smoothed fine-scale variance created by spurious understory heights in the ground-point dataset. The final DTM had a linear correlation of 1.00 and a root-mean-square error (RMSE) of 2.29 m when compared against 3859 well-distributed ground-survey points. In old-growth forests, RMS error on steep slopes was 0.67 m greater than on flat slopes. On flatter slopes, variation in vegetation complexity associated with land use caused highly significant differences in DTM error distribution across the landscape. The highest DTM accuracy observed in this study was 0.58-m RMSE, under flat, open-canopy areas with relatively smooth surfaces. Lidar ground retrieval was complicated by dense, multi-layered evergreen canopy in old-growth forests, causing DTM overestimation that increased RMS error to 1.95 m. A DCM was calculated from the original lidar surface and the interpolated DTM. Individual and plot-scale heights were estimated from DCM metrics and compared to field data measured using similar spatial supports and metrics. For old-growth forest emergent trees and isolated pasture trees greater than 20 m tall, individual tree heights were underestimated and had 3.67- and 2.33-m mean absolute error (MAE), respectively. Linear-regression models explained 51% (4.15-m RMSE) and 95% (2.41-m RMSE) of the variance, respectively. It was determined that improved elevation and field-height estimation in pastures explained why individual pasture trees could be estimated more accurately than old-growth trees. Mean height of tree stems in 32 young agroforestry plantation plots (0.38 to 18.53 m tall) was estimated with a mean absolute error of 0.90 m (r² = 0.97; 1.08-m model RMSE) using the mean of lidar returns in the plot. As in other small-footprint lidar studies, plot mean height was underestimated; however, our plot-scale results have stronger linear models for tropical, leaf-on hardwood trees than has been previously reported for temperate-zone conifer and deciduous hardwoods.
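
As a rough illustration of one of the two interpolation schemes compared above, the sketch below implements basic inverse distance weighting (IDW) of ground elevations and derives a canopy height from it. The power parameter of 2, the use of all ground points rather than a search neighbourhood, and the computation of canopy height as surface minus interpolated ground are assumptions, not details from the study. The study itself found ordinary kriging superior to IDW, and kriging is not shown here.

```python
import math


def idw(ground_points, query, power=2.0):
    """Interpolate elevation at query (x, y) from (x, y, z) ground returns."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in ground_points:
        distance = math.hypot(x - query[0], y - query[1])
        if distance == 0.0:
            return z  # query coincides with a ground point
        weight = 1.0 / distance ** power
        weighted_sum += weight * z
        weight_total += weight
    return weighted_sum / weight_total


def canopy_height(surface_elevation, ground_points, query, power=2.0):
    """Assumed DCM cell value: lidar surface minus interpolated ground."""
    return surface_elevation - idw(ground_points, query, power)
```

For example, a query point midway between two ground returns at 10 m and 20 m elevation receives equal weights and an interpolated ground elevation of 15 m.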
