Similar Articles
7 similar articles found (search time: 15 ms)
1.
The k-Nearest Neighbor (kNN) method of forest attribute estimation and mapping has become an integral part of national forest inventory methods in Finland in the last decade. The success of the kNN method in facilitating multi-source inventory has encouraged trials of the method in the Great Lakes Region of the United States. Here we present results from applying the method to Landsat TM and ETM+ data and land cover data collected by the USDA Forest Service's Forest Inventory and Analysis (FIA) program. In 1999, the FIA program in the state of Minnesota moved to a new annual inventory design to reach its targeted full sampling intensity over a 5-year period. This inventory design also utilizes a new 4-subplot cluster plot configuration. Using this new plot design together with 1 year of field plot observations, the kNN classification of forest/nonforest/water achieved overall accuracies ranging from 87% to 91%. Our analysis revealed several important behavioral features associated with kNN classification using the new FIA sample plot design. Results demonstrate the simplicity and utility of using kNN to produce FIA-defined forest/nonforest/water classifications.
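The kNN classification described above assigns each pixel the majority class among the k field plots nearest to it in spectral feature space. A minimal numpy sketch with synthetic two-band data (all names and values here are hypothetical, not the FIA data):

```python
import numpy as np

def knn_classify(X_train, y_train, X_query, k=5):
    """Classify each query pixel by majority vote among its k nearest
    training plots in spectral feature space (Euclidean distance)."""
    labels = []
    for x in X_query:
        d = np.linalg.norm(X_train - x, axis=1)    # distances to all plots
        nn = np.argsort(d)[:k]                     # indices of the k nearest
        vals, counts = np.unique(y_train[nn], return_counts=True)
        labels.append(vals[np.argmax(counts)])     # majority class
    return np.array(labels)

# Toy 2-band "spectral" values for forest (0), nonforest (1), water (2).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(m, 0.3, (30, 2))
                     for m in ([1, 4], [4, 4], [1, 1])])
y_train = np.repeat([0, 1, 2], 30)
X_query = np.array([[1.0, 4.0], [4.0, 4.0], [1.0, 1.0]])
print(knn_classify(X_train, y_train, X_query, k=5))   # → [0 1 2]
```

In practice k and the distance metric are tuned against held-out plots; the sketch hard-codes both for brevity.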

2.
Meaningful relationships between forest structure attributes measured in representative field plots on the ground and remotely sensed data measured comprehensively across the same forested landscape facilitate the production of maps of forest attributes such as basal area (BA) and tree density (TD). Because imputation methods can efficiently predict multiple response variables simultaneously, they may be usefully applied to map several structural attributes at the species-level. We compared several approaches for imputing the response variables BA and TD, aggregated at the plot-scale and species-level, from topographic and canopy structure predictor variables derived from discrete-return airborne LiDAR data. The predictor and response variables were associated using imputation techniques based on normalized and unnormalized Euclidean distance, Mahalanobis distance, Independent Component Analysis (ICA), Canonical Correlation Analysis (aka Most Similar Neighbor, or MSN), Canonical Correspondence Analysis (aka Gradient Nearest Neighbor, or GNN), and Random Forest (RF). To compare and evaluate these approaches, we computed a scaled Root Mean Square Distance (RMSD) between observed and imputed plot-level BA and TD for 11 conifer species sampled in north-central Idaho. We found that RF produced the best results overall, especially after reducing the number of response variables to the most important species in each plot with regard to BA and TD. We concluded that RF was the most robust and flexible among the imputation methods we tested. We also concluded that canopy structure and topographic metrics derived from LiDAR surveys can be very useful for species-level imputation.
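Imputation predicts multiple responses at once by copying whole response vectors from reference observations. A minimal sketch of the simplest variant listed above, unnormalized Euclidean nearest neighbor (the data are invented for illustration, not the Idaho plots):

```python
import numpy as np

def nn_impute(X_ref, Y_ref, X_target):
    """Impute all response variables at once: each target plot receives the
    complete response vector of its single nearest reference plot in the
    LiDAR-derived predictor space (unnormalized Euclidean distance)."""
    idx = np.array([np.argmin(np.linalg.norm(X_ref - x, axis=1))
                    for x in X_target])
    return Y_ref[idx]

# Invented data: predictors are two LiDAR metrics (95th-percentile height,
# canopy cover); responses are [BA, TD] for two species, four columns total.
X_ref = np.array([[20.0, 0.8], [10.0, 0.4], [30.0, 0.9]])
Y_ref = np.array([[25.0, 400.0, 5.0, 100.0],
                  [8.0, 150.0, 2.0, 50.0],
                  [40.0, 600.0, 10.0, 200.0]])
X_target = np.array([[21.0, 0.78]])
print(nn_impute(X_ref, Y_ref, X_target))   # copies the first reference plot
```

The MSN, GNN, and RF variants differ only in the space or proximity measure in which "nearest" is computed; the copy-the-neighbor step is the same.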

3.
The Nearest Neighbor rule is one of the most successful classifiers in machine learning. However, it is very sensitive to noisy, redundant and irrelevant features, which may cause its performance to deteriorate. Feature weighting methods try to overcome this problem by incorporating weights into the similarity function to increase or reduce the importance of each feature, according to how they behave in the classification task. This paper proposes a new feature weighting classifier, in which the computation of the weights is based on a novel idea combining imputation methods – used to estimate a new distribution of values for each feature based on the rest of the data – and the Kolmogorov–Smirnov nonparametric statistical test to measure the changes between the original and imputed distribution of values. This proposal is compared with classic and recent feature weighting methods. The experimental results show that our feature weighting scheme is very resilient to the choice of imputation method and is an effective way of improving the performance of the Nearest Neighbor classifier, outperforming the rest of the classifiers considered in the comparisons.
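One plausible reading of the scheme above, sketched on invented data: impute each feature from the remaining ones, measure the distributional change with the two-sample Kolmogorov–Smirnov statistic, and turn that into a weight. The mapping used here (1 minus the statistic, so features whose imputed values resemble the originals get higher weight) and the stand-in imputer are assumptions, not the paper's exact formulas:

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

def ks_feature_weights(X, impute_col):
    """Weight feature j by 1 - KS(original column, imputed column): the less
    the imputed distribution departs from the original, the higher the weight."""
    return np.array([1.0 - ks_stat(X[:, j], impute_col(X, j))
                     for j in range(X.shape[1])])

# Hypothetical stand-in imputer: predict column j as the row-wise mean of
# the other columns (the paper plugs real imputation methods in here).
def mean_of_others(X, j):
    mask = np.arange(X.shape[1]) != j
    return X[:, mask].mean(axis=1)

rng = np.random.default_rng(1)
base = rng.normal(0, 1, 1000)
X = np.column_stack([base,
                     base + rng.normal(0, 0.3, 1000),   # echoes column 0
                     rng.normal(0, 3, 1000)])           # unrelated, wider spread
w = ks_feature_weights(X, mean_of_others)
print(np.round(w, 2))   # the unrelated column gets the lowest weight
```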

4.
Forest information over a landscape is often represented as a spatial mosaic of polygons, separated by differences in species composition, height, age, crown closure, productivity, and other variables. These polygons are commonly delineated on medium-scale photography (e.g., 1:15,000) by a photo-interpreter familiar with the inventory area, and displayed and stored in a Geographic Information System (GIS) layer as a forest cover map. Forest cover maps are used for multiple purposes including timber and habitat supply analyses, and carbon inventories, at a regional or management unit level, and for parks planning, operational planning, and selection of stands for many purposes at a local level. Attribute data for each polygon commonly include the variables used to delineate the polygon, and other variables that can be measured or estimated using these medium-scale photographs. Additional measures that can only be obtained via expensive ground measures or possibly on high resolution photographs (e.g., volume per unit area, biomass components per unit area, tree-list of species and diameters) are available only for a sample of polygons, or may have been gathered independently using a sample survey over the land area. Improved linkages over a variety of data sources may help to support landscape level analyses. This study presents an approach to combine information from a systematic (grid) ground survey, forest cover (polygon) data, and Landsat Thematic Mapper (TM) imagery using variable-space nearest neighbor methods to estimate (i) mean ground-measured attributes for each polygon, in particular, volume per ha (m3/ha), stems per ha, and quadratic mean diameter for each polygon; and (ii) variation of these ground attributes within polygons. The approach was initially evaluated using Monte Carlo simulations with known measures of these attributes. 
Nearest neighbor methods were then applied to an approximate 5000 ha area (about 1000 polygons) of high productivity, mountainous forests located near the Pacific Coast of British Columbia, Canada. Based on the simulation results, the use of Landsat pixel reflectances to estimate volume per ha, average tree size (i.e., quadratic mean diameter), and stems per ha did not show great promise in improving per-polygon estimates compared with using forest cover data alone. However, in application, the use of remotely sensed data provided estimates of within-polygon variability. At the same time, the estimated means of these three imputed variables over the entire study area were very similar to the representative sample estimates using the ground data only. Extensions to other variables, such as ranges of diameters and numbers of snags, may also be possible, providing useful data for habitat and forest growth analyses.
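The variable-space nearest neighbor idea above can be sketched as follows: impute a ground attribute to each pixel from its nearest plot in reflectance space, then summarize the pixels of each polygon to get both a polygon mean and a within-polygon variability estimate. All reflectances, volumes, and polygon IDs below are invented for illustration:

```python
import numpy as np

def polygon_estimates(X_plots, y_plots, X_pixels, polygon_ids):
    """Impute a ground attribute (e.g. volume per ha) to every pixel from its
    nearest field plot in reflectance space, then summarize per polygon:
    the mean is the polygon estimate, the std the within-polygon variability."""
    nn = np.array([np.argmin(np.linalg.norm(X_plots - p, axis=1))
                   for p in X_pixels])
    y_pix = y_plots[nn]
    return {int(pid): (y_pix[polygon_ids == pid].mean(),
                       y_pix[polygon_ids == pid].std())
            for pid in np.unique(polygon_ids)}

# Invented single-band reflectances and volumes (m3/ha) for three plots,
# and four pixels belonging to two polygons.
X_plots = np.array([[0.1], [0.5], [0.9]])
y_plots = np.array([100.0, 300.0, 500.0])
X_pixels = np.array([[0.12], [0.48], [0.11], [0.88]])
polygon_ids = np.array([1, 1, 1, 2])
print(polygon_estimates(X_plots, y_plots, X_pixels, polygon_ids))
```

Polygon 2 contains a single pixel, so its within-polygon std is zero; with real imagery every polygon covers many pixels and the std becomes informative.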

5.
Transient disturbances in process measurements compromise the accuracy of some methods for plant-wide oscillation analysis. This paper presents a method to remove such transients while maintaining the dynamic features of the original measurement. The method is based on a nearest neighbors imputation technique: it replaces the removed transient with an estimate derived from the time series of the whole measurement. The method is demonstrated on experimental and industrial case studies. The results demonstrate the efficacy of the method and support the recommended parameter values. Furthermore, inconsistency indices are proposed which facilitate the automation of the method.
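A rough sketch of the nearest-neighbors imputation idea (not the paper's exact algorithm): match the samples just before the flagged transient against the rest of the series, and substitute the segment that follows the best-matching context:

```python
import numpy as np

def impute_transient(x, start, end, context=10):
    """Replace x[start:end] with a nearest-neighbor estimate: find the place
    in the series whose preceding `context` samples best match the samples
    just before the transient, and copy the segment that follows it."""
    gap = end - start
    target = x[start - context:start]
    best_s, best_d = None, np.inf
    for s in range(context, len(x) - gap):
        if start - context < s + gap and s < end + context:
            continue                       # window would overlap the transient
        d = np.linalg.norm(x[s - context:s] - target)
        if d < best_d:
            best_s, best_d = s, d
    y = x.copy()
    y[start:end] = x[best_s:best_s + gap]
    return y

# Invented example: a clean sinusoid with an additive spike as the transient.
t = np.linspace(0, 20 * np.pi, 1000)
x = np.sin(t)
x[400:410] += 10.0                         # the transient disturbance
y = impute_transient(x, 400, 410, context=10)
print(np.abs(y[400:410]).max())            # back near the sine amplitude
```

Because the signal is periodic, the best-matching context lies one cycle away and the copied segment restores the oscillation through the gap.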

6.
The handling of missing values is a topic of growing interest in the software quality modeling domain. Data values may be absent from a dataset for numerous reasons, for example, the inability to measure certain attributes. As software engineering datasets are sometimes small in size, discarding observations (or program modules) with incomplete data is usually not desirable. Deleting data from a dataset can result in a significant loss of potentially valuable information. This is especially true when the missing data is located in an attribute that measures the quality of the program module, such as the number of faults observed in the program module during testing and after release. We present a comprehensive experimental analysis of five commonly used imputation techniques. This work also considers three different mechanisms governing the distribution of missing values in a dataset, and examines the impact of noise on the imputation process. To our knowledge, this is the first study to thoroughly evaluate the relationship between data quality and imputation. Further, our work is unique in that it employs a software engineering expert to oversee the evaluation of all of the procedures and to ensure that the results are not inadvertently influenced by poor quality data. Based on a comprehensive set of carefully controlled experiments, we conclude that Bayesian multiple imputation and regression imputation are the most effective techniques, while mean imputation performs extremely poorly. Although a preliminary evaluation has been conducted using Bayesian multiple imputation in the empirical software engineering domain, this is the first work to provide a thorough and detailed analysis of this technique. Our studies also demonstrate conclusively that the presence of noisy data has a dramatic impact on the effectiveness of imputation techniques.
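The gap between mean and regression imputation reported above is easy to reproduce on synthetic data: when the incomplete attribute correlates with an observed one, a regression fit on the complete cases recovers the missing values far better than the observed mean. A sketch under an MCAR missingness mechanism (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(0.0, 1.0, n)                # fully observed predictor
y = 3.0 * x + rng.normal(0.0, 0.5, n)      # attribute with missing values
miss = rng.random(n) < 0.3                 # 30% missing, MCAR mechanism

# Mean imputation: every gap gets the observed mean.
y_mean = np.where(miss, y[~miss].mean(), y)

# Regression imputation: fit y ~ x on the complete cases, predict the gaps.
slope, intercept = np.polyfit(x[~miss], y[~miss], 1)
y_reg = np.where(miss, intercept + slope * x, y)

rmse = lambda yhat: np.sqrt(np.mean((yhat[miss] - y[miss]) ** 2))
print(rmse(y_mean), rmse(y_reg))           # regression error is far smaller
```

Bayesian multiple imputation goes a step further than this single regression draw by propagating the uncertainty of the fit, which is what the study above evaluates in detail.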

7.
Clustering algorithms are routinely used in biomedical disciplines, and are a basic tool in bioinformatics. Depending on the task at hand, the two most popular options are the central partitional techniques and the agglomerative hierarchical clustering techniques and their derivatives. These methods are well studied and well established. However, both categories have some drawbacks related to data dimensionality (for partitional algorithms) and to the bottom-up structure (for hierarchical agglomerative algorithms). To overcome these limitations, motivated by the problem of gene expression analysis with DNA microarrays, we present a hierarchical clustering algorithm based on a completely different principle: the analysis of shared farthest neighbors. We present a framework for clustering using ranks and indexes, and introduce the shared farthest neighbors (SFN) clustering criterion. We illustrate the properties of the method and present experimental results on different data sets, using the strategy of evaluating data clustering by extrinsic knowledge given by class labels.
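A loose, simplified sketch of the shared-farthest-neighbor intuition (not the paper's full rank-and-index framework): points that agree on which point is farthest from them tend to lie on the same side of the data, which already induces a coarse partition on well-separated toy clusters:

```python
import numpy as np

def shared_farthest_neighbor_groups(X):
    """Group points whose farthest neighbor is the same point. This is only
    the core intuition behind the SFN criterion; the published algorithm
    builds a full hierarchy from shared-farthest-neighbor ranks."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    fn = D.argmax(axis=1)              # index of each point's farthest neighbor
    groups = {}
    for i, f in enumerate(fn):
        groups.setdefault(f, []).append(i)
    return list(groups.values())

# Two well-separated toy clusters (coordinates invented; slight asymmetry
# keeps the farthest-neighbor choice unambiguous within each cluster).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [10.0, 10.0], [12.0, 10.0], [10.0, 11.0]])
print(shared_farthest_neighbor_groups(X))   # recovers the two clusters
```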
