Similar Documents
20 similar documents found (search time: 0 ms)
2.
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small-sized training set, 2) SVM's optimal hyperplane may be biased when there are far fewer positive feedback samples than negative ones, and 3) overfitting occurs because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, named random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve all three problems and further improve relevance feedback performance.
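The asymmetric bagging and random subspace ideas described above can be sketched as follows. This is a minimal illustration on synthetic data (the function name, data shapes and parameters are all hypothetical), not the authors' ABRS-SVM implementation:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy CBIR-style feedback: few positives, many negatives, high-dimensional features.
X_pos = rng.normal(1.0, 1.0, size=(5, 40))
X_neg = rng.normal(-1.0, 1.0, size=(100, 40))

def abrs_svm_scores(X_pos, X_neg, X_query, n_bags=10, n_dims=15, seed=0):
    """Each weak SVM sees all positives, an equally sized negative sample
    (asymmetric bagging) and a random feature subspace; scores are averaged."""
    bag_rng = np.random.default_rng(seed)
    scores = np.zeros(len(X_query))
    for _ in range(n_bags):
        neg_idx = bag_rng.choice(len(X_neg), size=len(X_pos), replace=False)
        dims = bag_rng.choice(X_pos.shape[1], size=n_dims, replace=False)
        X = np.vstack([X_pos[:, dims], X_neg[neg_idx][:, dims]])
        y = np.array([1] * len(X_pos) + [0] * len(X_pos))
        clf = SVC(kernel="linear").fit(X, y)
        scores += clf.decision_function(X_query[:, dims])
    return scores / n_bags

# Query: three positive-like and three negative-like images.
query = np.vstack([rng.normal(1.0, 1.0, size=(3, 40)),
                   rng.normal(-1.0, 1.0, size=(3, 40))])
scores = abrs_svm_scores(X_pos, X_neg, query)
print(scores)
```

Balancing each bag against the full positive set addresses the class asymmetry, while the per-bag feature subspace keeps each weak learner's dimensionality below the training-set size.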

3.
谢雨, 蒋瑜, 龙超奇. 计算机应用 (Journal of Computer Applications), 2021, 41(6): 1679-1685
To address the excessive time cost of the extended isolation forest (EIF) algorithm, an extended isolation forest based on random subspaces (RS-EIF) is proposed. First, multiple random subspaces are determined in the original data space. Then, in each random subspace, extended isolation trees are built by computing the intercept vector and slope at every node, and the trees are ensembled into a subspace extended isolation forest. Finally, the average traversal depth of each data point in the extended isolation forest is computed to determine the data…
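A minimal sketch of the random-subspace ensembling idea, using scikit-learn's standard IsolationForest as a stand-in for the extended isolation trees (the intercept-vector/slope construction of EIF is not reproduced here; data and parameters are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(300, 10))
X[0] = 8.0  # plant one obvious outlier

def rs_iforest_scores(X, n_subspaces=5, subspace_dim=4, seed=1):
    """Average the anomaly scores of isolation forests built in random
    feature subspaces (lower score = more anomalous)."""
    sub_rng = np.random.default_rng(seed)
    total = np.zeros(len(X))
    for _ in range(n_subspaces):
        dims = sub_rng.choice(X.shape[1], size=subspace_dim, replace=False)
        forest = IsolationForest(random_state=seed).fit(X[:, dims])
        total += forest.score_samples(X[:, dims])
    return total / n_subspaces

scores = rs_iforest_scores(X)
print(scores.argmin())  # index of the most anomalous point
```

Fitting each forest on a low-dimensional subspace is where the time savings come from, since tree construction cost grows with the number of candidate split dimensions.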

4.
With the rapid growth of and increased competition in the credit industry, corporate credit risk prediction is becoming more important for credit-granting institutions. In this paper, we propose an integrated ensemble approach, called RS-Boosting, based on two popular ensemble strategies, boosting and random subspace, for corporate credit risk prediction. Because two different factors encourage diversity in RS-Boosting, it is expected to achieve better performance. Two corporate credit datasets are selected to demonstrate the effectiveness and feasibility of the proposed method. Experimental results reveal that RS-Boosting achieves the best performance among seven methods: RS-Boosting itself, logistic regression analysis (LRA), decision tree (DT), artificial neural network (ANN), bagging, boosting and random subspace. All these results illustrate that RS-Boosting can be used as an alternative method for corporate credit risk prediction.
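A hedged sketch of the RS-Boosting idea — boosted learners trained in random feature subspaces and combined by majority vote — on synthetic data; member counts, subspace sizes and the base booster are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=2)

def rs_boosting_predict(X_tr, y_tr, X_te, n_members=7, subspace_dim=10, seed=2):
    """Train one boosted learner per random feature subspace and combine
    the members by majority vote."""
    sub_rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_te))
    for _ in range(n_members):
        dims = sub_rng.choice(X_tr.shape[1], size=subspace_dim, replace=False)
        member = AdaBoostClassifier(random_state=seed)
        votes += member.fit(X_tr[:, dims], y_tr).predict(X_te[:, dims])
    return (votes > n_members / 2).astype(int)

pred = rs_boosting_predict(X[:300], y[:300], X[300:])
accuracy = (pred == y[300:]).mean()
print(accuracy)
```

The two diversity sources the abstract mentions are visible here: boosting reweights samples inside each member, while the random subspace varies the features across members.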

5.
Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with the missing pixels, outliers, and occlusions that often appear in visual data. Discriminative methods, such as LDA, on the other hand, are better suited for classification tasks but are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods, which enables one to work on subsets of pixels in images to efficiently detect and reject outliers. The proposed approach is therefore capable of robust classification with a high breakdown point. We also show that subspace methods used for solving regression tasks, such as CCA, can be treated in a similar manner. The theoretical results are demonstrated on several computer vision tasks, showing that the proposed approach significantly outperforms standard discriminative methods in the case of missing pixels and images containing occlusions and outliers.

6.
《Information Fusion》2002,3(4):245-258
In classifier combination, it is believed that diverse ensembles have a better potential for improvement in accuracy than non-diverse ensembles. We put this hypothesis to the test for two methods for building ensembles, Bagging and Boosting, with two linear classifier models: the nearest mean classifier and the pseudo-Fisher linear discriminant classifier. To estimate diversity, we apply nine measures proposed in the recent literature on combining classifiers. Eight combination methods were used: minimum, maximum, product, average, simple majority, weighted majority, Naive Bayes and decision templates. We carried out experiments on seven data sets for different sample sizes, different numbers of classifiers in the ensembles, and the two linear classifiers. Altogether, we created 1364 ensembles by the Bagging method and the same number by the Boosting method. On each of these, we calculated the nine measures of diversity and the accuracy of the eight combination methods, averaged over 50 runs. The results confirmed in a quantitative way the intuitive explanation behind the success of Boosting for linear classifiers for increasing training sizes, and the poor performance of Bagging in this case. Diversity measures indicated that Boosting succeeds in inducing diversity even for stable classifiers, whereas Bagging does not.
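Two commonly used pairwise diversity measures from this literature — the disagreement measure and Yule's Q-statistic — can be computed from per-sample correctness vectors; the example data below are made up:

```python
import numpy as np

def disagreement(correct_a, correct_b):
    """Fraction of samples on which exactly one of the two classifiers
    is correct; higher means a more diverse pair."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    return np.mean(a != b)

def q_statistic(correct_a, correct_b):
    """Yule's Q: +1 for identical behaviour; values near 0 or below
    indicate diversity."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n11 = np.sum(a & b)    # both correct
    n00 = np.sum(~a & ~b)  # both wrong
    n10 = np.sum(a & ~b)   # only A correct
    n01 = np.sum(~a & b)   # only B correct
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

a = [1, 1, 1, 0, 1, 0, 1, 1]  # per-sample correctness of classifier A
b = [1, 0, 1, 1, 1, 0, 0, 1]  # per-sample correctness of classifier B
print(disagreement(a, b))  # 3/8
print(q_statistic(a, b))
```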

7.

In this paper, we propose a novel method for face recognition, called the random subspace method (RSM) based on tensors (Tensor-RS). Unlike the traditional RSM, which treats each pixel (or feature) of the face image as a sampling unit and thus ignores the spatial information within the image, the proposed Tensor-RS regards each small image region as a sampling unit and captures the spatial information within small regions by reshaping the image and applying a tensor-based feature extraction method. More specifically, an original whole face image is first partitioned into sub-images to improve robustness to facial variations, and each sub-image is then reshaped into a new matrix, each row of which corresponds to a vectorized small sub-image region. After that, based on these rearranged, newly formed matrices, an incomplete random sampling by row vectors, rather than by features (or feature projections), is applied. Finally, a tensor subspace method, which can effectively extract the spatial information within the same row (or column) vector, is used to extract useful features. Extensive experiments on four standard face databases (AR, Yale, Extended Yale B and CMU PIE) demonstrate that the proposed Tensor-RS method significantly outperforms state-of-the-art methods.

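The region-level (rather than pixel-level) sampling that Tensor-RS describes can be sketched with plain array manipulation; the image, partition and region sizes below are illustrative assumptions, and the tensor feature extraction step is omitted:

```python
import numpy as np

rng = np.random.default_rng(7)
face = rng.random((64, 64))  # stand-in for a face image

def region_matrix(sub_image, region=(4, 4)):
    """Rows are vectorized region x region patches of the sub-image."""
    h, w = sub_image.shape
    rh, rw = region
    rows = [sub_image[i:i + rh, j:j + rw].ravel()
            for i in range(0, h, rh) for j in range(0, w, rw)]
    return np.array(rows)

# Partition the 64x64 image into four 32x32 sub-images.
subs = [face[i:i + 32, j:j + 32] for i in (0, 32) for j in (0, 32)]
mats = [region_matrix(s) for s in subs]  # each: (64 regions, 16 pixels)

# Incomplete random sampling by rows (regions), not by individual pixels.
sampled = [m[rng.choice(len(m), size=32, replace=False)] for m in mats]
print(sampled[0].shape)
```

Sampling whole rows keeps each 4x4 neighbourhood intact inside a member's training data, which is the spatial information a pixel-level subspace sample would destroy.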

8.
The investigation of the accuracy of methods employed to forecast agricultural commodity prices is an important area of study, and developing effective models for it is necessary. Regression ensembles can be used for this purpose: an ensemble is a set of combined models that act together to forecast a response variable with lower error. The general contribution of this work is to explore the predictive capability of regression ensembles by comparing ensembles among themselves, as well as with approaches based on a single model (reference models), for forecasting agribusiness prices one month ahead. Monthly time series of the price paid to producers in the state of Parana, Brazil for a 60 kg bag of soybean (case study 1) and wheat (case study 2) are used. The ensembles bagging (random forests, RF), boosting (gradient boosting machine, GBM, and extreme gradient boosting machine, XGB) and stacking (STACK) are adopted. Support vector regression (SVR), a multilayer perceptron neural network (MLP) and K-nearest neighbors (KNN) are adopted as reference models. Performance measures such as mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE) and mean squared error (MSE) are used for model comparison. Friedman and Wilcoxon signed-rank tests are applied to evaluate the models' absolute percentage errors (APE). On the test sets, MAPE lower than 1% is observed for the best ensemble approaches. The XGB/STACK (Least Absolute Shrinkage and Selection Operator-KNN-XGB-SVR) and RF models showed the best short-term forecasting performance for case studies 1 and 2, respectively, with statistically smaller APE than the reference models. Approaches based on boosting are consistent, providing good results in both case studies. Overall, the performance ranking is: XGB, GBM, RF, STACK, MLP, SVR and KNN. It can be concluded that the ensemble approach yields statistically significant gains, reducing prediction errors for the price series studied. The use of ensembles is recommended for forecasting agricultural commodity prices one month ahead, since their more assertive performance increases the accuracy of the constructed model and reduces decision-making risk.
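The evaluation setup described above — one-month-ahead forecasts built from lagged prices and scored by MAPE — can be sketched on a synthetic seasonal series (not the Parana data; the models, lag count and split are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
t = np.arange(120)
# Synthetic monthly price: level 60 with annual seasonality plus noise.
price = 60 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 120)

def lagged(series, n_lags=3):
    """Supervised pairs: n_lags past values -> next month's price."""
    X = np.column_stack([series[i:len(series) - n_lags + i]
                         for i in range(n_lags)])
    return X, series[n_lags:]

X, y = lagged(price)
X_tr, y_tr, X_te, y_te = X[:100], y[:100], X[100:], y[100:]

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

results = {}
for name, model in [("RF", RandomForestRegressor(random_state=3)),
                    ("KNN", KNeighborsRegressor())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = mape(y_te, pred)
print(results)
```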

9.
New healthcare technologies are emerging as society ages, with the development of smart homes for monitoring elders' activities at their center. Identifying a resident's activities in an apartment is an important module in such systems. The dense sensing approach embeds sensors in the environment to report detected events continuously; the events are segmented and analyzed via classifiers to identify the corresponding activity. Although several methods have been introduced in recent years for detecting simple activities, recognizing complex ones requires more effort. Because each activity has a different time duration and event density, finding the best segment size is one challenge in activity detection; using classifiers capable of detecting both simple and interleaved activities is another. In this paper, we devise a two-phase approach called CARER (Complex Activity Recognition using Emerging patterns and Random forest). In the first phase, emerging patterns are mined and various features of the activities are extracted to build a model using the random forest technique. In the second phase, the sequences of events are segmented dynamically by considering their recency and sensor correlation, and the segments are analyzed by the model generated in the previous phase to recognize both simple and complex activities. We examined the performance of the devised approach on the CASAS dataset. We first investigated several classifiers; the outcome showed that the combination of emerging patterns and random forest provides a higher degree of accuracy. We then compared CARER with a static-window approach that used a Hidden Markov Model; for a fair comparison, we replaced the dynamic segmentation module of CARER with a static one. The results showed more than 12% improvement in F-measure. Finally, we compared our work with dynamic sensor segmentation for real-time activity recognition, which also used dynamic segmentation; the F-measure showed up to 12.73% improvement.

10.
We give a consistency proof for two subspace methods. We then show the asymptotic equivalence of a special subspace method and the initial estimate proposed by Hannan and Rissanen. Finally, a simulation study comparing two subspace methods and the maximum-likelihood method is performed.

12.
Existing microRNA identification methods tend to emphasize new features while neglecting weakly discriminative and redundant features, which leads to poor or unbalanced sensitivity and specificity. To address this, an ensemble algorithm based on feature clustering and random subspaces, CLUSTER-RS, is proposed. The algorithm first uses the information gain ratio to discard some weakly discriminative features, then measures inter-feature correlation with information entropy and clusters the features; an equal number of features is randomly drawn from each cluster to form the feature set used to build each base classifier, and the base classifiers are finally ensembled for microRNA identification. After optimizing the algorithm by tuning its parameters and selecting base classifiers, it was compared with the classic methods Triplet-SVM, miPred, MiPred, microPred and HuntMi on the latest microRNA datasets. The results show that CLUSTER-RS is inferior to microPred in sensitivity but superior to the other models, achieves the best specificity of the six, and, as the overall metrics of accuracy and Matthews correlation coefficient show, has an advantage over the other algorithms. CLUSTER-RS thus achieves good identification performance and a good balance between sensitivity and specificity, outperforming the compared methods in metric balance.
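The feature-clustering-plus-random-subspace idea can be sketched as follows; absolute correlation stands in for the entropy-based correlation measure, and the threshold, member count and base learner are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_redundant=4, random_state=4)

def correlation_clusters(X, threshold=0.7):
    """Greedy feature grouping: a feature joins the first cluster whose
    representative it correlates with above the threshold."""
    corr = np.abs(np.corrcoef(X.T))
    reps, clusters = [], []
    for j in range(X.shape[1]):
        for ci, r in enumerate(reps):
            if corr[j, r] > threshold:
                clusters[ci].append(j)
                break
        else:
            reps.append(j)
            clusters.append([j])
    return clusters

def cluster_rs_predict(X_tr, y_tr, X_te, clusters, n_members=15, seed=4):
    """Each base tree sees one randomly drawn feature per cluster;
    the ensemble output is a majority vote."""
    pick_rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_te))
    for _ in range(n_members):
        feats = [pick_rng.choice(c) for c in clusters]
        tree = DecisionTreeClassifier(random_state=seed)
        votes += tree.fit(X_tr[:, feats], y_tr).predict(X_te[:, feats])
    return (votes > n_members / 2).astype(int)

clusters = correlation_clusters(X)
pred = cluster_rs_predict(X[:200], y[:200], X[200:], clusters)
print((pred == y[200:]).mean())
```

Drawing one feature per cluster, rather than sampling features uniformly, keeps redundant features from dominating any single subspace.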

13.
Growing subspace pattern recognition methods and their neural-network models
In statistical pattern recognition, the decision of which features to use is usually left to human judgment. If possible, automatic methods are desirable. Like multilayer perceptrons, learning subspace methods (LSMs) have the potential to integrate feature extraction and classification. In this paper, we propose two new algorithms, along with their neural-network implementations, to overcome certain limitations of earlier LSMs. By introducing one cluster at a time and adapting it if necessary, we eliminate the limitation of having to decide by trial and error how many clusters each class needs. By using principal component analysis neural networks along with this strategy, we propose neural-network models that better overcome another limitation, scalability. Our results indicate that the proposed classifiers are comparable to classifiers such as multilayer perceptrons and the nearest-neighbor classifier in terms of classification accuracy. In terms of classification speed and scalability in design, they appear to be better for large-dimensional problems.
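The classical subspace classification rule that LSMs build on — assign a sample to the class whose subspace reconstructs it best — can be sketched with per-class PCA (toy data; the paper's growing and learning steps are omitted):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# Two classes whose energy lives in different 3-dimensional subspaces.
X0 = rng.normal(0, 1, size=(100, 10)) * np.array([3, 3, 3] + [0.2] * 7)
X1 = rng.normal(0, 1, size=(100, 10)) * np.array([0.2] * 7 + [3, 3, 3])

subspaces = {0: PCA(n_components=3).fit(X0),
             1: PCA(n_components=3).fit(X1)}

def classify(x):
    """Assign x to the class whose PCA subspace reconstructs it with the
    smallest residual."""
    errors = {}
    for c, pca in subspaces.items():
        recon = pca.inverse_transform(pca.transform(x.reshape(1, -1)))
        errors[c] = np.linalg.norm(x - recon.ravel())
    return min(errors, key=errors.get)

print(classify(np.array([3.0, -2.0, 2.5] + [0.1] * 7)))  # class-0-like sample
```

The per-class PCA here plays the role of the paper's PCA neural networks; the growing step would add further clusters (subspaces) per class only when reconstruction errors demand it.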

14.
We report results from modelling standing volume, above-ground biomass and stem count, with the aim of exploring the potential of two non-parametric approaches to estimate forest attributes. The models were built from spectral and 3D information extracted from airborne optical and laser scanner data. The survey covered two geographically adjacent temperate forest sites in southwestern Germany, using spatially and temporally comparable remote-sensing data collected by similar instruments. Samples from the auxiliary reference stands (called off-site samples) were combined with random, random stratified and systematically stratified samples from the target area to predict standing volume, above-ground biomass and stem count in the target area. A range of combinations was used for the modelling process, comprising the most similar neighbour (MSN) and random forest (RF) imputation methods, three sampling designs and two predictor subset sizes. An evolutionary genetic algorithm (GA) was applied to prune the predictor variables. Diagnostic tools, including root mean square error (RMSE), bias and the standard error of imputation, were employed to evaluate the results. RF produced more accurate results than MSN (an average improvement of 3.5% for a single-neighbour case with selected predictors), yet was more biased (an average bias of 5.13% with RF compared to 2.44% with MSN for stem volume in a single-neighbour case with selected predictors). Combining systematically stratified auxiliary samples from the target data set with the reference data set yielded more accurate results than random and stratified random samples. Combining additional data was most influential when up to 40% of supplementary samples were appended to the reference set. The use of GA-selected predictors reduced the bias of the models. Bootstrap simulations of RMSE were shown to lie within the applied non-parametric confidence intervals. We conclude that the results are helpful for modelling these forest attributes from airborne remote-sensing data.

15.
In a small case study of mixed hardwood Hyrcanian forests of Iran, three non-parametric methods, namely k-nearest neighbour (k-NN), support vector machine regression (SVR) and tree regression based on random forest (RF), were used for plot-level estimation of volume/ha, basal area/ha and stems/ha using field inventory and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. Relevant pre-processing and processing steps were applied to the ASTER data for geometric and atmospheric correction and for enhancing quantitative forest parameters. After collecting terrestrial information on trees in the 101 sample plots, the volume, basal area and tree number per hectare were calculated for each plot. In the k-NN implementation, cross-validation over different distance measures and values of k was used to find the best distance measure and the optimal k. In SVR, the best regularization parameters for four kernel types were obtained using leave-one-out cross-validation. RF was implemented using a bootstrap learning method with regularized parameters for the decision tree model and stopping. The validity of the performances was examined on held-out test samples using absolute and relative root mean square error (RMSE) and bias metrics. In volume/ha estimation, all three algorithms performed similarly, although SVR and RF produced better results than k-NN, with relative RMSE values of 28.54, 25.86 and 26.86 (m3 ha–1) for k-NN, SVR and RF, respectively; only RF generated unbiased estimates. In basal area/ha and stems/ha estimation, RF was slightly superior in relative RMSE (18.39, 20.64) to SVR (19.35, 22.09) and k-NN (20.20, 21.53), but k-NN generated unbiased estimates compared with the other two algorithms.
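The validation metrics used above can be sketched directly; the observation/estimate values are hypothetical plot-level volumes, not the study's data:

```python
import numpy as np

def rmse(obs, est):
    """Absolute root mean square error."""
    return np.sqrt(np.mean((np.asarray(obs) - np.asarray(est)) ** 2))

def relative_rmse(obs, est):
    """RMSE expressed as a percentage of the observed mean."""
    return 100 * rmse(obs, est) / np.mean(obs)

def bias(obs, est):
    """Mean of (estimate - observation); zero for unbiased estimators."""
    return np.mean(np.asarray(est) - np.asarray(obs))

obs = np.array([210.0, 180.0, 250.0, 160.0])  # field-measured volume (m3/ha)
est = np.array([200.0, 190.0, 240.0, 170.0])  # model estimates
print(rmse(obs, est), relative_rmse(obs, est), bias(obs, est))
```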

17.
Kolcz A. Neural Computation, 2000, 12(2): 293-304
Similarities between bootstrap aggregation (bagging) and N-tuple sampling are explored to propose a retina-free, data-driven version of the N-tuple network, whose close analogies to aggregated regression trees, such as classification and regression trees (CART), lead to further architectural enhancements. The performance of the proposed algorithms is compared with traditional versions of the N-tuple and CART networks on a number of regression problems. The architecture significantly outperforms conventional N-tuple networks while leading to more compact solutions and avoiding certain implementational pitfalls of the latter.

18.
The behavioural framework has several attractions to offer for the identification of multivariable systems: some of the variables may be left unexplained without the need for a distinction between inputs and outputs; criteria for model quality are independent of the chosen parametrization; and behaviours allow for a global (i.e., non-local) approximation of the system dynamics. This is illustrated by the identification of dynamic factor models. Behavioural least squares is a natural method for this problem, and a comparison is given with non-behavioural methods.

19.
Information about forest cover is needed by all nine societal benefit areas identified by the Group on Earth Observations (GEO). In particular, the biodiversity and ecosystem areas need information on landscape composition, forest structure and species richness, as well as their changes. Field sample plots from National Forest Inventories (NFIs) are, in combination with satellite data, a tremendous resource for fulfilling these information needs. NFIs have a history of almost 100 years and have developed in parallel in several countries. For example, the NFIs in Finland and Sweden annually measure more than 10,000 field plots with approximately 200 variables per plot, with inventories designed for five-year rotations. In Finland, nationwide forest cover maps have been produced operationally since 1990 by using the k-NN algorithm to combine satellite data, field sample plot information and other georeferenced digital data; a similar k-NN database has also been created for Sweden. The potential of NFIs to fulfil diverse information needs is also currently being analyzed in the COST Action E43 project of the European Union. In this article, we review how NFI field plot information has been used to parameterize image data in Sweden and Finland, including pre-processing steps such as haze correction, slope correction and the optimization of the estimation variables. Furthermore, we review how the produced small-area statistics and forest cover data have been used in forestry, including forest biodiversity monitoring and habitat modelling. We also show how remote-sensing data can be used for post-stratification to derive sample-plot-based estimates that cannot be estimated directly from the spectral data.

20.
To train classifiers from positive and unlabeled data (PU learning), a random-forest-based PU learning algorithm is proposed. The POSC4.5 algorithm is extended by adding random feature selection to its decision-tree growing procedure. In the training phase, the PU dataset is sampled with replacement to generate multiple distinct PU training sets, each of which is used to train the extended POSC4.5 algorithm and build a decision tree. In the classification phase, the outputs of the trees are combined by majority voting. Experimental results on UCI datasets show that the algorithm outperforms the biased support vector machine algorithm, the POSC4.5 algorithm, and bagging-based POSC4.5.
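A hedged sketch of the bagged PU scheme described above, with a scikit-learn decision tree (random feature selection at each split via max_features) standing in for the extended POSC4.5 learner; the data and parameters are synthetic assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
X_pos = rng.normal(2.0, 1.0, size=(40, 5))                 # labelled positives
X_unl = np.vstack([rng.normal(2.0, 1.0, size=(30, 5)),     # hidden positives
                   rng.normal(-2.0, 1.0, size=(120, 5))])  # hidden negatives

def pu_bagging_predict(X_pos, X_unl, X_query, n_trees=25, seed=6):
    """Each tree treats a bootstrap sample of the unlabelled pool as
    negatives; the ensemble output is a majority vote."""
    bag_rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_query))
    for _ in range(n_trees):
        idx = bag_rng.choice(len(X_unl), size=len(X_pos), replace=True)
        X = np.vstack([X_pos, X_unl[idx]])
        y = np.array([1] * len(X_pos) + [0] * len(X_pos))
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
        votes += tree.fit(X, y).predict(X_query)
    return (votes > n_trees / 2).astype(int)

query = np.vstack([rng.normal(2.0, 1.0, size=(5, 5)),
                   rng.normal(-2.0, 1.0, size=(5, 5))])
pred = pu_bagging_predict(X_pos, X_unl, query)
print(pred)
```

The majority vote is what makes the scheme tolerant of hidden positives in the unlabelled pool: each individual tree's "negatives" are contaminated, but the contamination differs per bootstrap.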


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号