Similar Documents
10 similar documents found; search time: 156 ms
1.
Zhao Qiangli, Jiang Yanhuang, Lu Yutong. Journal of Software, 2015, 26(10): 2567-2580
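Ensemble-based data stream mining is an important approach to learning from data streams that exhibit concept drift. To address the shortcomings of traditional ensemble data stream mining, this paper introduces the human mechanisms of recall and forgetting into data stream mining and proposes a memory-based data stream mining model, MDSM (memorizing based data stream mining). The model treats base classifiers as knowledge acquired by the system. Through its "recall and forgetting" mechanism, historically useful base classifiers are retained in a "memory repository" because of their high memory strength, which improves prediction stability, and base classifiers that currently classify well are selected from the memory repository to take part in ensemble prediction, which improves adaptability to concept change. Based on MDSM, an ensemble data stream mining algorithm, MAE (memorizing based adaptive ensemble), is proposed. MAE designs the system's forgetting mechanism using the Ebbinghaus forgetting curve and uses selective ensemble to simulate the human "recall" mechanism. Compared with four typical data stream mining algorithms, MAE achieves high classification accuracy and strong overall adaptability to concept drift, and it adapts especially well to recurring concept drift and to the complex concept drift found in real applications. It not only adapts quickly to new concept changes but also effectively resists the impact of random concept fluctuations on system performance.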
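The abstract names MAE's ingredients (an Ebbinghaus forgetting curve for forgetting, selective ensemble for recall) without giving formulas. The Python sketch below shows one way they could fit together. The retention formula R = exp(-t/S), the rule that multiplies the stability S on each recall, the top-k selection by accuracy on the newest chunk, and all parameter values are assumptions, not the authors' design.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A base classifier is any callable mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def accuracy(model: Classifier, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


@dataclass
class MemoryItem:
    model: Classifier
    stability: float = 1.0  # S in R = exp(-t / S); grows each time the item is recalled (assumed)
    last_recall: int = 0    # time step (chunk index) of the most recent recall

    def retention(self, now: int) -> float:
        # Ebbinghaus-style forgetting curve: retention decays exponentially with elapsed time.
        return math.exp(-(now - self.last_recall) / self.stability)


class MemoryEnsemble:
    """Hypothetical recall-and-forget ensemble; not the authors' MAE implementation."""

    def __init__(self, capacity: int = 10, forget_threshold: float = 0.05,
                 k: int = 3, boost: float = 2.0) -> None:
        self.capacity = capacity
        self.forget_threshold = forget_threshold
        self.k = k          # number of recalled classifiers that vote
        self.boost = boost  # stability multiplier applied on recall (assumed)
        self.memory: List[MemoryItem] = []
        self.selected: List[MemoryItem] = []
        self.now = 0

    def learn_chunk(self, new_model: Classifier, X, y) -> None:
        self.now += 1
        self.memory.append(MemoryItem(new_model, last_recall=self.now))
        # "Recall": rank every remembered classifier on the newest chunk and keep the top k.
        ranked = sorted(self.memory, key=lambda m: accuracy(m.model, X, y), reverse=True)
        self.selected = ranked[: self.k]
        for item in self.selected:
            item.stability *= self.boost
            item.last_recall = self.now
        # "Forgetting": drop classifiers whose retention fell below the threshold,
        # then keep only the `capacity` strongest memories.
        self.memory = [m for m in self.memory if m.retention(self.now) >= self.forget_threshold]
        self.memory.sort(key=lambda m: m.retention(self.now), reverse=True)
        self.memory = self.memory[: self.capacity]

    def predict(self, x: Sequence[float]) -> int:
        # Majority vote among the classifiers recalled for the current concept.
        votes = Counter(m.model(x) for m in self.selected)
        return votes.most_common(1)[0][0]
```

With k=1, feeding a chunk labeled 0, then a chunk labeled 1, then another chunk labeled 0 shows a recurring concept being handled: the classifier learned on the first chunk is still in memory and is recalled on the third chunk, even though the newest model predicts 1.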

2.
This article presents a thorough comparison of two types of advanced non-parametric classifiers used in remote sensing for land cover classification. A SPOT-5 HRG image of Yanqing County, Beijing, China, where agriculture and forest dominate land use, was used. Artificial neural networks (ANNs) trained with the adaptive backpropagation (ABP), Levenberg–Marquardt (LM), and Quasi-Newton (QN) algorithms, as well as a radial basis function (RBF) network, were carefully tested. The LM–ANN and RBF–ANN, which outperformed the other two, were selected for a detailed comparison with support vector machines (SVMs). The experiments show no significant difference in classification accuracy between well-trained ANNs and SVMs, although the SVM usually performs slightly better. Analysis of the effect of training set size shows that the SVM classifier is highly tolerant of small training sets and avoids the insufficient-training problem of ANN classifiers. The tests also show that ANNs and SVMs can differ greatly in training time. The LM–ANN converges very quickly but not stably, whereas training of the RBF–ANN and SVM classifiers is fast and repeatable.

3.
Oh et al. (2006a, 2006b) proposed a classification approach for building an early warning system (EWS) against potential financial crises. This EWS classification approach was developed mainly for monitoring daily financial markets for abnormal movements and is based on the recently developed crisis hypothesis that financial crises are often self-fulfilling because of investors' herding behavior. This article extends the EWS classification approach to the traditional type of crisis, in which a financial crisis is the outcome of long-term deterioration of economic fundamentals. It is shown that the support vector machine (SVM) is an efficient classifier in this case.

4.
An early warning system (EWS) can be treated as a pattern recognition problem, because the distinctive features of an economic crisis make it possible to distinguish critical from normal economic situations with a pattern classifier. Although most work on EWSs has focused on training pattern classifiers, little attention has been paid to effective indices or feature variables that allow closer examination of the nature of the current instability in an economic crisis. This paper proposes using a market instability index (MII) and stepwise risk warning levels to diagnose the current instability of the stock market and to foretell, in advance, how the market will proceed. This allows appropriate policy actions to be taken against a possible financial crisis according to the risk warning level of instability. Through empirical examples from the Korean and Greek stock markets, the proposed method demonstrates its potential usefulness in an early warning system.

5.
Gender recognition plays an important role in applications such as human–computer interaction, surveillance, and security. Nonlinear support vector machines (SVMs) have been investigated for gender identification using the Face Recognition Technology (FERET) face image database, where SVM classifiers were shown to outperform traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, and nearest neighbour). In this context, this paper aims to improve SVM classification accuracy in a gender classification system and proposes new models for better performance. We evaluated different SVM learning algorithms; the SVM with a radial basis function kernel and a 5% outlier fraction outperformed the other SVM classifiers. We examined the effectiveness of different feature selection methods, and AdaBoost performed better than the others in selecting the most discriminating features. We propose two classification methods that train on subsets of the training images. Method 1 combines the outcomes of classifiers trained on different image subsets, whereas method 2 clusters the training data and builds a classifier for each cluster. Experimental results show that both methods increase classification accuracy.
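Method 2 (cluster the training data, then build one classifier per cluster) can be sketched as follows. This is a hedged illustration, not the authors' system: it uses plain k-means with a deterministic farthest-first initialisation, and a nearest-centroid classifier stands in for the per-cluster SVM so the example needs only the Python standard library. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(X: List[Vector], k: int, iters: int = 100) -> List[List[float]]:
    # Deterministic farthest-first initialisation avoids a random seed in this sketch.
    centers = [list(X[0])]
    while len(centers) < k:
        centers.append(list(max(X, key=lambda x: min(dist2(x, c) for c in centers))))
    for _ in range(iters):
        groups: Dict[int, List[Vector]] = defaultdict(list)
        for x in X:
            groups[min(range(k), key=lambda c: dist2(x, centers[c]))].append(x)
        new_centers = [mean(groups[c]) if groups[c] else centers[c] for c in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


class NearestCentroid:
    """Stand-in for the per-cluster SVM: predicts the label whose mean is closest."""

    def fit(self, X: List[Vector], y: List[str]) -> "NearestCentroid":
        by_label: Dict[str, List[Vector]] = defaultdict(list)
        for x, label in zip(X, y):
            by_label[label].append(x)
        self.centroids = {label: mean(pts) for label, pts in by_label.items()}
        return self

    def predict(self, x: Vector) -> str:
        return min(self.centroids, key=lambda label: dist2(x, self.centroids[label]))


class ClusterThenClassify:
    """Cluster the training data, then train one classifier per cluster."""

    def __init__(self, k: int) -> None:
        self.k = k

    def fit(self, X: List[Vector], y: List[str]) -> "ClusterThenClassify":
        self.centers = kmeans(X, self.k)
        parts: Dict[int, Tuple[List[Vector], List[str]]] = defaultdict(lambda: ([], []))
        for x, label in zip(X, y):
            px, py = parts[self._nearest(x, range(self.k))]
            px.append(x)
            py.append(label)
        self.experts = {j: NearestCentroid().fit(px, py) for j, (px, py) in parts.items()}
        return self

    def _nearest(self, x: Vector, candidates) -> int:
        return min(candidates, key=lambda c: dist2(x, self.centers[c]))

    def predict(self, x: Vector) -> str:
        # Route the sample to the nearest cluster that actually has a trained expert.
        return self.experts[self._nearest(x, self.experts)].predict(x)
```

Routing each test sample to a single cluster expert keeps each classifier focused on a more homogeneous region of the data, which is the intuition behind method 2; method 1 would instead combine the votes of several subset-trained classifiers.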

6.
Neural networks are increasingly applied in data mining. Although they may have a complex structure, long training times, and results that are hard to interpret, they tolerate noisy data well and achieve high accuracy, which makes them attractive for data mining. On the other hand, the best way to train neural networks and extract symbolic rules from them in domains such as classification remains an open question. In this paper, we train neural networks by constructive learning, which yields a simple network structure, and analyze the convergence rate of the error for networks with and without a threshold trained by this constructive method. An ANN produces a response, but the result is not in an understandable form; the network acts as a black box. It is often desirable to use the model backwards, identifying sets of input variables that produce a desired output value. The large number of variables and the nonlinear nature of many materials models make finding an optimal set of input variables difficult. We use a genetic algorithm to solve this problem. The method is evaluated on several public-domain data sets to test its predictive ability and is compared with standard classifiers; the results show comparatively high accuracy.
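Using a trained model "backwards" amounts to searching the input space for points whose predicted output is close to a target value. The sketch below is a generic real-coded genetic algorithm (tournament selection, blend crossover, Gaussian mutation, elitism). The operators, parameters, and the toy function standing in for a trained network are assumptions, since the abstract does not specify the GA's design.

```python
import random
from typing import Callable, List, Sequence, Tuple


def invert_model(model: Callable[[Sequence[float]], float], target: float,
                 bounds: Sequence[Tuple[float, float]], pop_size: int = 60,
                 generations: int = 200, mutation_sigma: float = 0.1,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Search for inputs x within `bounds` such that model(x) is close to `target`."""
    rng = random.Random(seed)

    def error(x: Sequence[float]) -> float:
        return abs(model(x) - target)

    def clip(v: float, i: int) -> float:
        lo, hi = bounds[i]
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        next_pop = pop[:2]  # elitism: carry the two best individuals over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = min(rng.sample(pop, 3), key=error)
            p2 = min(rng.sample(pop, 3), key=error)
            a = rng.random()
            # Blend (arithmetic) crossover followed by Gaussian mutation, clipped to bounds.
            child = [clip(a * u + (1 - a) * v + rng.gauss(0.0, mutation_sigma), i)
                     for i, (u, v) in enumerate(zip(p1, p2))]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=error)
    return best, error(best)
```

For example, `invert_model(lambda x: x[0] ** 2 + x[1], 1.5, [(-2, 2), (-2, 2)])` searches for inputs whose output is about 1.5; in practice the lambda would be replaced by the trained network's prediction function. Because elitism keeps the best individual, the returned error never increases from one generation to the next.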

7.
This article uses artificial neural network (ANN) models to simulate and estimate the structural response of a two-storey shear building by training the model on a particular earthquake. The neural network is first trained using real earthquake data as input and the numerically generated responses of the floors of the two-storey building as the training patterns. The trained ANN is then used to simulate and test the structural response of the floors for earthquake data of various intensities, and the responses predicted by the ANN model are found to be good enough for practical purposes. Although the simulation is done with numerically generated response data for a particular earthquake, the approach may also be applied to actual experimental response data.

8.
Offline/realtime traffic classification using semi-supervised learning (cited 4 times: 0 self-citations, 4 by others)
Jeffrey, Anirban, Martin, Ira, Carey. Performance Evaluation, 2007, 64(9-12): 1194-1213
Identifying and categorizing network traffic by application type is challenging because applications continue to evolve, especially those designed to avoid detection. The diminished effectiveness of port-based identification and the overheads of deep packet inspection motivate us to classify traffic by exploiting the distinctive flow characteristics of applications when they communicate on a network. In this paper, we explore this latter approach and propose a semi-supervised classification method that can accommodate both known and unknown applications. To the best of our knowledge, this is the first work to use semi-supervised learning techniques for the traffic classification problem. Our approach allows classifiers to be designed from training data consisting of only a few labeled flows and many unlabeled flows. We consider pragmatic classification issues such as the longevity of classifiers and the need to retrain them. Our performance evaluation using empirical Internet traffic traces spanning a 6-month period shows that: (1) high flow and byte classification accuracy (greater than 90%) can be achieved using training data that consists of a small number of labeled flows and a large number of unlabeled flows; (2) the presence of "mice" and "elephant" flows in the Internet complicates the design of classifiers, especially those with high byte accuracy, and necessitates weighted sampling techniques to obtain training flows; and (3) retraining of classifiers is necessary only when there are non-transient changes in network usage characteristics. As a proof of concept, we implement prototype offline and realtime classification systems to demonstrate the feasibility of our approach.
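Classifying from a few labeled and many unlabeled flows can be sketched as cluster-then-label: cluster all flows, name each cluster after the most common application among its labeled members, and report clusters with no labeled members as unknown. This is a minimal sketch under stated assumptions: the two-dimensional flow features, the application names, k-means with farthest-first initialisation, and the majority-vote mapping are illustrative choices, and the sketch omits the paper's weighted sampling of mice and elephant flows and its retraining logic.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(X: List[Vector], k: int, iters: int = 100) -> List[List[float]]:
    centers = [list(X[0])]  # farthest-first initialisation (deterministic)
    while len(centers) < k:
        centers.append(list(max(X, key=lambda x: min(dist2(x, c) for c in centers))))
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in X]
        new_centers = []
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


class SemiSupervisedFlowClassifier:
    def __init__(self, k: int) -> None:
        self.k = k

    def fit(self, flows: List[Vector], labels: List[Optional[str]]):
        # Step 1: cluster ALL flows, labeled and unlabeled alike.
        self.centers = kmeans(flows, self.k)
        # Step 2: map each cluster to the most common application among its labeled flows.
        votes: Dict[int, Counter] = defaultdict(Counter)
        for x, label in zip(flows, labels):
            if label is not None:
                votes[self._cluster(x)][label] += 1
        # Clusters containing no labeled flows are reported as "unknown" applications.
        self.cluster_label = {c: votes[c].most_common(1)[0][0] if votes[c] else "unknown"
                              for c in range(self.k)}
        return self

    def _cluster(self, x: Vector) -> int:
        return min(range(self.k), key=lambda c: dist2(x, self.centers[c]))

    def predict(self, x: Vector) -> str:
        return self.cluster_label[self._cluster(x)]
```

Only three of the eight example flows below need labels; unlabeled flows still shape the clusters, and a cluster with no labeled flows surfaces as "unknown" rather than being forced into a known class.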

9.
Choice of a classification algorithm is generally based upon a number of factors, among which are availability of software, ease of use, and performance, measured here by overall classification accuracy. The maximum likelihood (ML) procedure is, for many users, the algorithm of choice because of its ready availability and the fact that it does not require an extended training process. Artificial neural networks (ANNs) are now widely used by researchers, but their operational applications are hindered by the need for the user to specify the configuration of the network architecture and to provide values for a number of parameters, both of which affect performance. The ANN also requires an extended training phase.
In the past few years, the use of decision trees (DTs) to classify remotely sensed data has increased. Proponents of the method claim that it has a number of advantages over the ML and ANN algorithms. The DT is computationally fast, makes no statistical assumptions, and can handle data that are represented on different measurement scales. Software to implement DTs is readily available over the Internet. Pruning of DTs can make them smaller and more easily interpretable, while the use of boosting techniques can improve performance.
In this study, separate test and training data sets from two different geographical areas and two different sensors, multispectral Landsat ETM+ and hyperspectral DAIS, are used to evaluate the performance of univariate and multivariate DTs for land cover classification. Factors considered are the effects of variations in training data set size and of the dimensionality of the feature space, together with the impact of boosting, attribute selection measures, and pruning. The level of classification accuracy achieved by the DT is compared to results from back-propagation ANN and ML classifiers.
Our results indicate that the performance of the univariate DT is acceptably good in comparison with that of other classifiers, except with high-dimensional data. Classification accuracy increases linearly with training data set size to a limit of 300 pixels per class in this case. Multivariate DTs do not appear to perform better than univariate DTs. While boosting produces an increase in classification accuracy of between 3% and 6%, the use of attribute selection methods does not appear to be justified in terms of accuracy increases. However, neither the univariate DT nor the multivariate DT performed as well as the ANN or ML classifiers with high-dimensional data.
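The study compares attribute selection measures without naming them in the abstract. As a hedged illustration, the sketch below computes three widely used measures for one candidate split of a DT node: information gain, gain ratio, and the decrease in Gini impurity. Which measures the study actually compared is not stated here, and the land-cover labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(labels: Sequence[str], partitions: List[Sequence[str]]) -> Dict[str, float]:
    """Score one candidate test; `partitions` holds the labels reaching each branch."""
    n = len(labels)
    weights = [len(p) / n for p in partitions]
    info_gain = entropy(labels) - sum(w * entropy(p) for w, p in zip(weights, partitions))
    # Gain ratio normalises information gain by the entropy of the split itself,
    # penalising tests that fragment the data into many small branches.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_decrease = gini(labels) - sum(w * gini(p) for w, p in zip(weights, partitions))
    return {"info_gain": info_gain, "gain_ratio": gain_ratio, "gini_decrease": gini_decrease}
```

A tree-growing routine would evaluate every candidate test this way and split on the one with the highest score under the chosen measure.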

10.
Classifier combination falls within the so-called data mining area. Its aim is to combine paradigms from supervised classification, sometimes preceded by an unsupervised data-partitioning phase, in order to improve on the individual accuracy of the component classifiers. Forming classifier hierarchies is one of several methods of classifier combination. In this paper we present a novel method to find good hierarchies of classifiers for given databases. In this new proposal, the search is performed by means of genetic algorithms, returning the best individual according to its classification accuracy over the dataset, estimated through 10-fold cross-validation. Experiments carried out on 14 databases from the UCI repository show an improvement in performance compared to the single classifiers. Moreover, results similar to or better than those of other approaches, such as decision tree bagging and boosting, have been obtained.
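The GA's fitness in this approach is classification accuracy estimated by 10-fold cross-validation. The sketch below shows only that fitness computation. How a candidate hierarchy is encoded and trained is not described in the abstract, so a hypothetical 1-nearest-neighbour builder stands in for a candidate, and pooling correct predictions across folds (rather than averaging per-fold accuracies) is an assumption.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]
Builder = Callable[[List[Sequence[float]], List[int]], Model]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]  # k disjoint folds covering every index once


def cv_accuracy(build: Builder, X: List[Sequence[float]], y: List[int],
                k: int = 10, seed: int = 0) -> float:
    """Pooled k-fold cross-validated accuracy, usable as a GA fitness value."""
    correct = 0
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = build([X[i] for i in train], [y[i] for i in train])
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def one_nn(train_X: List[Sequence[float]], train_y: List[int]) -> Model:
    """Example builder: a 1-nearest-neighbour classifier (a stand-in candidate)."""
    def predict(x: Sequence[float]) -> int:
        j = min(range(len(train_X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])))
        return train_y[j]
    return predict
```

In a GA, each individual would be decoded into a builder like `one_nn`, and `cv_accuracy(builder, X, y)` would serve as its fitness; fixing the seed gives every individual the same folds, so fitness values are directly comparable.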
