Similar literature
 20 similar documents found (search time: 15 ms)
1.
In this paper, we propose an approach for ensemble construction based on the use of supervised projections, both linear and non-linear, to achieve both accuracy and diversity of individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort on difficult instances, but instead of learning the classifier on a biased distribution of the training set, it uses misclassified instances to find a supervised projection that favors their correct classification. We show that supervised projection algorithms can be used for this task. We try several known supervised projections, both linear and non-linear, in order to test their ability in the present framework. Additionally, the method is further improved by introducing concepts from oversampling for imbalanced datasets. The introduced method counteracts the negative effect of a low number of instances for constructing the supervised projections. The method is compared with AdaBoost, showing improved performance on a large set of 45 problems from the UCI Machine Learning Repository. The method is also more robust than AdaBoost in the presence of noise.

2.
A number of earlier studies that have attempted a theoretical analysis of majority voting assume independence of the classifiers. We formulate the majority voting problem as an optimization problem with linear constraints. No assumptions on the independence of classifiers are made. For a binary classification problem, given the accuracies of the classifiers in the team, the theoretical upper and lower bounds for performance obtained by combining them through majority voting are shown to be solutions of the corresponding optimization problem. The objective function of the optimization problem is nonlinear in the case of an even number of classifiers when rejection is allowed; in the other cases the objective function is linear and hence the problem is a linear program (LP). Using this framework, we provide some insights and investigate the relationship between two candidate classifier diversity measures and majority voting performance.
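To make the contrast between the independence assumption and dependence-free bounds concrete, here is a minimal sketch. It is not the optimization formulation of the paper: it assumes an odd number of classifiers that all share the same accuracy p and no rejection option, and the function names are hypothetical. The first function gives the classical binomial result under independence; the second gives bounds from a simple vote-counting argument that makes no independence assumption.

```python
from math import comb


def independent_majority_accuracy(p: float, n: int) -> float:
    """Majority-vote accuracy of n independent classifiers (n odd),
    each correct with probability p: a binomial tail sum."""
    k = n // 2 + 1  # votes needed for a correct majority
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(k, n + 1))


def counting_bounds(p: float, n: int) -> tuple[float, float]:
    """Lower and upper bounds on majority-vote accuracy without any
    independence assumption (n odd, equal accuracy p).

    The average number of correct votes per instance is n * p.
    A correctly decided instance uses between k and n correct votes,
    a wrongly decided one between 0 and k - 1, which gives both bounds.
    """
    k = n // 2 + 1
    lower = max(0.0, (n * p - (k - 1)) / (n - k + 1))
    upper = min(1.0, n * p / k)
    return lower, upper


if __name__ == "__main__":
    print(independent_majority_accuracy(0.6, 3))
    print(counting_bounds(0.6, 3))
```

For three classifiers with accuracy 0.6, the independent case gives about 0.648, while the dependence-free bounds are about 0.4 and 0.9, which shows how much the outcome can depend on how the classifiers' errors overlap.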

3.

Image texture can be an important source of data in the image classification process. Although not as easily measurable as image spectral attributes, image texture has proved in a number of cases to be a valuable source of data capable of increasing the accuracy of the classification process. In remote sensing there are cases in which classes are spectrally very similar, but present distinct spatial distribution, i.e. different textural characteristics. Image texture then becomes an important source of information in the classification process. The aim of this study is (1) to develop and test a supervised image classification method based on the image spatial texture as extracted by the Gabor filtering concept and (2) to investigate experimentally the performance of the classification process as a function of the Gabor filter's parameters. A set of Gabor filters is initially generated for the given image data. The filter parameters related to the relevant spatial frequencies present in the image are estimated from the available samples via the Fourier transform. Each filter generates one filtered image which characterizes the particular spatial frequency implemented by the filter parameters. As a result, a number of filtered images, sometimes referred to as 'textural bands', are generated and the originally univariate problem is transformed into a multivariate one, every pixel being defined by a vector with dimension identical to the number of filters used. The multidimensional image data can then be classified by implementing an appropriate supervised classification method. In this study the Euclidean Minimum Distance and the Gaussian Maximum Likelihood classifiers are used. The adequacy of the selected Gabor filter parameters (namely, the spatial frequency and the filter's spatial extent) is then examined as a function of the resulting classification accuracy. The proposed supervised methodology is tested using both synthetic and real image data. Results are presented and analysed.
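As an illustration of the two filter parameters named in the abstract, the following sketch builds a single real-valued Gabor kernel. It is an assumption-laden example, not the study's implementation: the isotropic Gaussian envelope, the cosine phase, and the names `gabor_kernel`, `frequency`, `sigma` and `theta` are all choices made here for illustration.

```python
import math


def gabor_kernel(size, frequency, sigma, theta=0.0):
    """Return a size x size real Gabor kernel as a list of rows.

    frequency: spatial frequency in cycles per pixel along direction theta
    sigma:     standard deviation of the Gaussian envelope (spatial extent)
    theta:     orientation of the sinusoid in radians
    size is assumed to be odd so that the kernel has a centre pixel.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction of the sinusoidal carrier.
            xr = x * math.cos(theta) + y * math.sin(theta)
            # Isotropic Gaussian envelope (an assumption of this sketch).
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * frequency * xr))
        kernel.append(row)
    return kernel


if __name__ == "__main__":
    for row in gabor_kernel(5, frequency=0.25, sigma=2.0):
        print(" ".join(f"{v:6.3f}" for v in row))
```

Convolving the image with one such kernel per chosen frequency would yield one 'textural band' per filter, so each pixel ends up with a feature vector whose length equals the number of filters.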

4.
We propose a framework for learning good prototypes, called prototype generation and filtering (PGF), by integrating the strengths of instance-filtering and instance-abstraction techniques using two different integration methods. The two integration methods differ in the filtering granularity as well as the degree of coupling of the techniques. In order to characterize the effect of integration, we categorize instance-filtering techniques into three kinds, namely, (1) removing border instances, (2) retaining border instances, and (3) retaining center instances. The effect of using different kinds of filtering in different variants of our PGF framework is investigated. We have conducted experiments on 35 real-world benchmark data sets. We found that our PGF framework maintains or achieves better classification accuracy and gains a significant improvement in data reduction compared with pure filtering and pure abstraction techniques as well as KNN and C4.5.

5.
Learning classifier systems (LCS) are machine learning systems designed to work for both multi-step and single-step decision tasks. The latter case presents an interesting challenge for such algorithms, especially when they are applied to real-world data mining (DM) problems. The present investigation departs from the popular approach of applying accuracy-based LCS to single-step classification and aims to uncover the potential of strength-based LCS in such tasks. Although the latter family of algorithms has often been associated with poor generalization and performance, we aim at alleviating these problems by defining appropriate extensions to the traditional strength-based LCS framework. These extensions are detailed and their effect on system performance is studied through the application of the proposed algorithm on a set of artificial problems, designed to challenge its scalability and generalization abilities. The comparison of the proposed algorithm with UCS, its state-of-the-art accuracy-based counterpart, emphasizes the effects of our extended strength-based approach and validates its competitiveness in multi-class problems with various class distributions. Overall, our work presents an investigation of strength-based LCS in the domain of supervised classification. Our extensive analysis of the learning dynamics involved in these systems provides proof of their potential as real-world DM tools, inducing tractable rule-based classification models, even in the presence of severe class imbalances.

6.
It has been demonstrated that combining the decisions of several classifiers can lead to better recognition results. The combination can be implemented using a variety of strategies, among which majority vote is by far the simplest, and yet it has been found to be just as effective as more complicated schemes in improving the recognition results. This paper examines the mode of operation of the majority vote method in order to gain a deeper understanding of how and why it works, so that a more solid basis can be provided for its future applications to different data and/or domains. In the course of our research, we have analyzed this method from its foundations and obtained many new and original results regarding its behavior. Particular attention has been directed toward the changes in the correct and error rates when classifiers are added, and conditions are derived under which their addition/elimination would be valid for the specific objectives of the application. At the same time, our theoretical findings are compared against experimental results, and these results do reflect the trends predicted by the theoretical considerations.

7.
In a distributed database system, data replicas are placed at different locations to achieve high data availability in the presence of link failures. With a majority voting protocol, a location survives for read/write operations if and only if it is accessible to more than half of the replicas. The problem is to find the optimal placements for a given number of data replicas in a ring network. When the number of replicas is odd, it was conjectured by Hu et al. [X.-D. Hu, X.-H. Jia, D.-Z. Du, D.-Y. Li, H.-J. Huang, Placement of data replicas for optimal data availability in ring networks, J. Parallel Distrib. Comput., 61 (2001) 1412–1424] that every uniform placement is optimal, which was later proved by Shekhar and Wu. However, when the number of replicas is even, it was pointed out by Hu et al. that uniform placements are not optimal and the optimal placement problem may be very complicated. In this paper, we study the optimal placement problem in a ring network with majority voting protocol and an even number of replicas, and give a complete characterization of optimal placements when the number of replicas is not too large compared with the number of locations.

8.
Donald FM. Ergonomics, 2008, 51(11): 1643-1655
The ability to generalise vigilance research to operational environments has been questioned, largely due to differences between laboratory research and real-world settings. The taxonomy of vigilance tasks proposed by Parasuraman and Davies (1977) represents an attempt to classify vigilance tasks so that tasks with similar information-processing demands can be compared and the ability to generalise results enhanced. Although the taxonomy originally included complexity, the term specifically referred to multiple sources of information. Complexity has been overlooked in much of the traditional vigilance literature, although it is included in more recent studies of jobs such as air traffic control. In this paper, the taxonomy is evaluated in relation to two vigilance intensive jobs - closed circuit television surveillance operators and air traffic controllers. In its present form, the existing taxonomy of experimental settings has limited applicability to these operational settings. Therefore, recommendations for expanding the taxonomy to include more aspects of complexity are made. It is argued that the revised taxonomy be used in conjunction with situation awareness, which makes provision for the cognitive processes involved in these jobs.

9.
Ergonomics, 2012, 55(11): 1643-1655
The ability to generalise vigilance research to operational environments has been questioned, largely due to differences between laboratory research and real-world settings. The taxonomy of vigilance tasks proposed by Parasuraman and Davies (1977) represents an attempt to classify vigilance tasks so that tasks with similar information-processing demands can be compared and the ability to generalise results enhanced. Although the taxonomy originally included complexity, the term specifically referred to multiple sources of information. Complexity has been overlooked in much of the traditional vigilance literature, although it is included in more recent studies of jobs such as air traffic control. In this paper, the taxonomy is evaluated in relation to two vigilance intensive jobs – closed circuit television surveillance operators and air traffic controllers. In its present form, the existing taxonomy of experimental settings has limited applicability to these operational settings. Therefore, recommendations for expanding the taxonomy to include more aspects of complexity are made. It is argued that the revised taxonomy be used in conjunction with situation awareness, which makes provision for the cognitive processes involved in these jobs.

10.
Weighted voting is the commonly used strategy for combining predictions in pairwise classification. Even though it shows good classification performance in practice, it is often criticized for lacking a sound theoretical justification. In this paper, we study the problem of combining predictions within a formal framework of label ranking and, under some model assumptions, derive a generalized voting strategy in which predictions are properly adapted according to the strengths of the corresponding base classifiers. We call this strategy adaptive voting and show that it is optimal in the sense of yielding a MAP prediction of the class label of a test instance. Moreover, we offer a theoretical justification for weighted voting by showing that it yields a good approximation of the optimal adaptive voting prediction. This result is further corroborated by empirical evidence from experiments with real and synthetic data sets showing that, even though adaptive voting is sometimes able to achieve consistent improvements, weighted voting is in general quite competitive, all the more in cases where the aforementioned model assumptions underlying adaptive voting are not met. In this sense, weighted voting appears to be a more robust aggregation strategy.
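The weighted voting rule discussed in the abstract can be sketched as follows. This is a generic illustration and not the adaptive voting strategy derived in the paper: it assumes each pairwise base classifier outputs a confidence in [0, 1] for the first class of its pair, and the function names and example numbers are hypothetical.

```python
def weighted_voting(pairwise, classes):
    """Sum the pairwise confidences for each class and pick the maximum.

    pairwise maps (a, b) to the confidence in [0, 1] that class a is
    preferred over class b; the complement 1 - r is credited to b.
    """
    scores = {c: 0.0 for c in classes}
    for (a, b), r in pairwise.items():
        scores[a] += r
        scores[b] += 1.0 - r
    winner = max(classes, key=lambda c: scores[c])
    return winner, scores


def binary_voting(pairwise, classes):
    """Plain voting for comparison: each pair casts one full vote."""
    votes = {c: 0 for c in classes}
    for (a, b), r in pairwise.items():
        votes[a if r > 0.5 else b] += 1
    return votes


if __name__ == "__main__":
    classes = ["A", "B", "C"]
    pairwise = {("A", "B"): 0.9, ("A", "C"): 0.4, ("B", "C"): 0.7}
    print(weighted_voting(pairwise, classes))
    print(binary_voting(pairwise, classes))
```

In this made-up example plain voting gives every class exactly one vote, whereas weighted voting breaks the tie in favour of A because the confident A-over-B prediction counts for more than the marginal C-over-A one.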

11.
Support vector machines (SVMs) are a relatively new classification method that has established a firm foothold in machine learning and has been applied in numerous domains. Automated taxa identification of benthic macroinvertebrates has generally received very little attention, and the use of support vector machines for it even less. In this paper we investigate how changing the kernel function in an SVM classifier affects classification results. A novel question is how changing the kernel function affects the number of ties in a majority voting method in the multi-class case. We repeated the classification tests with two different feature sets. Using SVM, we present accurate classification results, suggesting that SVM is well suited to the automated taxa identification of benthic macroinvertebrates. We also show that the selection of a kernel has a great effect on the number of ties.

12.
A software system, Gel Analysis System for Epo (GASepo), has been developed within an international WADA project. As recent WADA criteria of rEpo positivity are based on identification of each relevant object (band) in Epo images, development of suitable methods of image segmentation and object classification was needed for the GASepo system. In the paper we address two particular problems: segmentation of disrupted bands and classification of the segmented objects into three or two classes. A novel band projection operator is based on convenient object merging measures and their discrimination analysis using a specifically generated training set of segmented objects. A weighted ranks classification method is proposed, which is new in the field of image classification. It is based on ranks of the values of a specific criterial function. The weighted ranks classifiers proposed in our paper have been evaluated on real samples of segmented objects of Epo images and compared to three selected well-known classifiers: Fisher linear classifier, Support Vector Machine, and Multilayer Perceptron.
Svorad Štolc (Corresponding author)

13.
The rapid growth in Internet applications in tourism has led to an enormous number of personal reviews of travel-related information on the Web. These reviews can appear in different forms such as BBS, blogs, wikis or forum websites. More importantly, the information in these reviews is valuable to both travelers and practitioners for various understanding and planning processes. An intrinsic problem of the overwhelming information on the Internet, however, is information overload, as users are simply unable to read all the available information. Query functions in search engines like Yahoo and Google can help users find some of the reviews that they need about specific destinations. The pages returned by these search engines are still beyond the visual capacity of humans. In this research, sentiment classification techniques were incorporated into the domain of mining reviews from travel blogs. Specifically, we compared three supervised machine learning algorithms, Naïve Bayes, SVM and the character-based N-gram model, for sentiment classification of the reviews on travel blogs for seven popular travel destinations in the US and Europe. Empirical findings indicated that the SVM and N-gram approaches outperformed the Naïve Bayes approach, and that when training datasets had a large number of reviews, all three approaches reached accuracies of at least 80%.

14.
Proton therapy has the potential for high-precision radiotherapy of retinal tumors. However, the standardized eye models currently used do not fully account for the patient's individual anatomy. To better exploit the data provided by MR images, a model-based approach was used based on a database of eye models. A face recognition algorithm was advanced to define similarity criteria between the reference image and the actual image. After building a high-dimensional feature vector and using a training data set, the reference model was selected by using the minimum Mahalanobis distance between the image to be classified and the reference images.

15.
Because of its self-regulating nature, the immune system has mostly inspired unsupervised learning methods in classification applications of Artificial Immune Systems (AIS). However, supervision can bring advantages to AIS, as it does to other classification systems. Indeed, some studies in this branch of AIS that include supervision have obtained reasonable results. In this study, we propose a new supervised AIS named the Supervised Affinity Maturation Algorithm (SAMA) and present its performance by applying it to the diagnosis of atherosclerosis from carotid artery Doppler signals, a real-world medical classification problem. We used the maximum envelope of the carotid artery Doppler sonograms derived with the Autoregressive (AR) method as the input of the proposed classification system and reached a maximum average classification accuracy of 98.93% with 10-fold cross-validation used for training-test partitioning. To evaluate this result, a comparison was made with Artificial Neural Networks and Decision Trees. Our system was found to be comparable with those systems, which are used effectively in the literature, with respect to classification accuracy and classification time. The effects of the system's parameters were also analyzed in the performance evaluation. With this study and other possible contributions to AIS, classification algorithms with effective performance can be developed and the potential of AIS in classification can be further revealed.

16.
While mapping vegetation and land cover using remotely sensed data has a rich history of application at local scales, it is only recently that the capability has evolved to allow the application of classification models at regional, continental and global scales. The development of a comprehensive training, testing and validation site network for the globe to support supervised and unsupervised classification models is fraught with problems imposed by scale, bioclimatic representativeness of the sites, availability of ancillary map and high spatial resolution remote sensing data, landscape heterogeneity, and vegetation variability. The System for Terrestrial Ecosystem Parameterization (STEP) - a model for characterizing site biophysical, vegetation and landscape parameters to be used for algorithm training and testing and validation - has been developed to support supervised land cover mapping. This system was applied in Central America using two classification systems based on 428 sites. The results indicate that: (1) it is possible to generate site data efficiently at the regional scale; (2) implementation of a supervised model using artificial neural network and decision tree classification algorithms is feasible at the regional level with classification accuracies of 75-88%; and (3) the STEP site parameter model is effective for generating multiple classification systems and thus supporting the development of global surface biophysical parameters.

17.
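A feature extraction method based on the energy entropy of intrinsic mode functions (IMFs) is proposed. Empirical mode decomposition (EMD) was applied to three classes of mental-task EEG signals to obtain the corresponding IMFs. Experiments showed that, for signals of different classes, the discriminant entropy of the energy of same-order IMFs differs markedly. The three classes of EEG signals were then classified with a K-nearest-neighbor classifier; in classification experiments based on optimal feature-vector selection, the average correct recognition rate exceeded 75%.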
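A minimal sketch of an IMF energy entropy feature is given below. It assumes the IMFs have already been obtained by some EMD routine (not shown) and uses the common Shannon-entropy definition over normalized IMF energies; the abstract does not give the exact formula, so both the definition and the name `imf_energy_entropy` are assumptions made for illustration.

```python
import math


def imf_energy_entropy(imfs):
    """Shannon entropy of the energy distribution across IMFs.

    imfs is a list of equally sampled sequences, one per IMF.
    The energy of each IMF is the sum of its squared samples; the
    energies are normalized to sum to one before taking the entropy.
    """
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    if total == 0:
        raise ValueError("all IMFs have zero energy")
    shares = [e / total for e in energies]
    # Terms with zero share contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in shares if p > 0)


if __name__ == "__main__":
    # Two toy "IMFs" with equal energy: entropy equals ln(2).
    print(imf_energy_entropy([[1.0, -1.0], [1.0, 1.0]]))
```

The entropy is largest when the energy is spread evenly over the IMFs and drops to zero when a single IMF carries all of it, which is what makes it usable as a class-dependent feature.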

18.
During heavy rains, small urbanized watersheds with predominantly impervious surfaces exhibit high surface runoff which may subsequently lead to flash floods. Prediction of such extreme events in an efficient and timely manner is one of the important problems faced by regional flood management teams. These predictions can be done using supervised classification and data collected by stream and rain gauges installed on the watershed. The accuracy of predictions depends on data granularity which determines the achievable level of uncertainty for different lead time intervals. The study was implemented on data collected in a highly urbanized watershed of a small stream – Spring Creek, Ontario, Canada. It was demonstrated that the upscaling of observation data improves the classifiers’ performance while increasing modelling scales. The obtained results suggest the development of ensembles of classifiers trained on data sets of different granularity as a means to extend the lead time of reliable predictions.

19.
The objective of this article was to apply supervised classification to accomplish automated landform mapping using four morphometric parameters. The approach was tested on high-resolution light detection and ranging (lidar) elevation data from the northern flank of the Dushanzi Anticline, western China. The morphometric parameters were calculated by applying a moving window to the lidar-derived digital elevation models (DEMs). The results obtained from using the Jeffries–Matusita distance and standard deviation ellipses for the training areas show that the main landforms in the study area can be distinguished using the four morphometric parameters. Compared with field surveying and image interpretation, the automated landform classification technique has advantages in terms of its efficiency and reproducibility, and it is capable of accurately reconstructing a detailed geomorphological map covering the study area with a classification accuracy of 72.9% and a kappa coefficient (κ) of 0.66. The geomorphological map derived using the automated classification approach revealed an obvious east–west zone composed of alluvial landforms. The close spatial relationship between this zone and mapped thrust faults indicates that this east–west zone represents a belt of seismic risks associated with the thrust faults, which should be avoided in major engineering projects. Due to its accuracy and efficacy, an automated landform classification has considerable prospects for its application in geomorphological mapping and landform characterization studies in the future, especially given the increasing availability of high-resolution digital terrain data.
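The two quality figures quoted in the abstract, overall accuracy and the kappa coefficient, can both be computed from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the function name and the toy matrix are illustrative only and are not the study's data.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i that
    were assigned to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    toy = [[40, 10],
           [10, 40]]
    print(accuracy_and_kappa(toy))
```

Kappa is lower than the raw accuracy because it discounts the agreement that the class proportions alone would produce by chance; in the toy matrix an accuracy of 0.8 corresponds to a kappa of about 0.6.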

20.
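Application of rough sets and decision trees to e-mail classification and filtering   Cited by: 1 (self-citations: 0; citations by others: 1)
Identifying and filtering spam is currently one of the active research topics. Rough set theory is a relatively new data-analysis tool for handling vague and uncertain knowledge and has been applied successfully in many classification-related fields. Combining rough sets with decision trees, this paper proposes an e-mail classification scheme and model based on RS-DT and presents experiments and an analysis of the results. Comparison with the naive Bayes model and SVM shows that the proposed RS-DT-based model can reduce the rate at which legitimate e-mail is misclassified as spam and improve the self-learning ability of the filtering system.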
