Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
This paper presents a new support vector machine (SVM) model that handles data with tolerance and uncertainty. The constraints of the SVM are converted to fuzzy inequalities. Relaxing the constraints further allows an importance degree to be considered for each training sample in the constraints of the SVM. The new method is called the relaxed constraints support vector machine (RSVM). The fuzzy SVM model is also improved with more relaxed constraints; the new model is called fuzzy RSVM. With this method, importance degrees for training samples can be considered in both the cost function and the constraints of the SVM simultaneously. In addition, the method is extended to one-class classification problems. The effectiveness of the proposed method is demonstrated on artificial and real-life data sets.

2.
3.
Mining fuzzy association rules for classification problems   Total citations: 3, self-citations: 0, citations by others: 3
The effective development of data mining techniques for discovering knowledge from training samples for classification problems in industrial engineering is necessary in applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy association rules for classification problems. The consequent part of each rule is one class label. The proposed learning algorithm consists of two phases: one generates large fuzzy grids from training samples by fuzzy partitioning of each attribute, and the other generates fuzzy association rules for classification problems from the large fuzzy grids. The algorithm is implemented by scanning the training samples stored in a database only once and applying a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can be easily extended to discover other types of fuzzy association rules. Simulation results on the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy association rules for classification problems.

4.
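Traditional machine learning methods assume that training data and test data follow the same distribution. In some real-world applications, however, the training and test data come from different domains, and traditional algorithms that ignore the difference in distribution may fail. To address this problem, a text transfer learning algorithm based on fuzzy C-means (FCM) is proposed. First, the test samples are classified by a simple classifier; next, the natural neighbor algorithm is used to construct the initial fuzzy memberships of the samples; then the FCM algorithm iteratively updates the fuzzy memberships and corrects the sample labels; finally, isolated samples are handled to obtain the final classification result. Experimental results show that the algorithm achieves good accuracy and effectively solves the text classification problem when training and test data are distributed differently.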
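
The iterative membership update at the core of FCM can be sketched as follows. This is a minimal illustration of the textbook FCM update only, not the transfer learning algorithm of the abstract; the function names, the fuzzifier `m=2.0`, and the use of Euclidean distance are assumptions.

```python
import numpy as np

def fcm_update_memberships(X, centers, m=2.0, eps=1e-12):
    """One FCM step: U[i, k] is the degree to which sample i belongs to cluster k."""
    # Euclidean distance of every sample to every center, shape (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against division by zero when a sample sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    # Normalise so that each sample's memberships sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_update_centers(X, U, m=2.0):
    """Recompute each center as the mean of the samples weighted by U**m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

Alternating the two functions until the memberships stop changing gives the usual FCM loop; taking `U.argmax(axis=1)` yields hard labels that could be compared against the labels from an initial classifier.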

5.
This paper proposes a classification method that is based on easily interpretable fuzzy rules and capitalizes on two key technologies: pruning the outliers in the training data with SVMs (support vector machines), which eliminates the influence of outliers on the learning process, and finding a fuzzy set with a sound linguistic interpretation to describe each class based on AFS (axiomatic fuzzy set) theory. Compared with other fuzzy rule-based methods, the proposed models are usually more compact and more easily understood by users, since each class is described by far fewer rules. The proposed method has two further advantages: each rule obtained from the algorithm is simply a conjunction of linguistic terms, and no parameters need to be tuned. The proposed classification method is compared with previously published fuzzy rule-based classifiers on 16 UCI data sets. The results show that the fuzzy rule-based classifier presented in this paper offers a compact, understandable, and accurate classification scheme that balances interpretability and accuracy.

6.
Instance selection aims at filtering out noisy data (or outliers) from a given training set. This not only reduces the need for storage space, but can also ensure that a classifier trained on the reduced set performs similarly to or better than the baseline classifier trained on the original set. However, among the numerous instance selection algorithms there is no clear winner across datasets from different problem domains. In other words, instance selection performance depends on both the algorithm and the dataset. One main reason is that it is very hard to define what the outliers are across different datasets. It should be noted that, with a specific instance selection algorithm, over-selection may occur: too many 'good' data samples are filtered out, which leads to a classifier that performs worse than the baseline. In this paper, we introduce a dual classification (DuC) approach, which aims to deal with the potential drawback of over-selection. Specifically, after instance selection is performed on a given training set, two classifiers are trained on the 'good' and 'noisy' sets identified by the instance selection algorithm, respectively. Then, the similarities between a test sample and the data in the good and noisy sets are compared, and this comparison routes the test sample to one of the two classifiers. Experiments on 50 small-scale and 4 large-scale datasets demonstrate the superior performance of the proposed DuC approach over the baseline instance selection approach.

7.
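A classification algorithm for imbalanced data sets based on a hybrid re-sampling strategy   Total citations: 1, self-citations: 0, citations by others: 1
Imbalanced data is a common problem in classification: a data set is class-imbalanced when the instances of one class far outnumber those of another. Many real-world classification problems are imbalanced, and the issue has attracted wide attention from researchers; the classification of imbalanced data has become a new research focus in data mining and pattern recognition and poses a major challenge to traditional classification algorithms. This paper proposes a new re-sampling algorithm. An improved SMOTE algorithm over-samples the minority class, generating new minority samples so that the class sizes are roughly balanced; then, based on the characteristics of the SMO algorithm, a clustering-based under-sampling method is proposed to delete redundant or noisy data. After over-sampling and cleaning, useful samples are retained, the data set is reduced in size, and support vector machine training becomes more efficient. Experimental results show that the method effectively improves classification accuracy on the minority class while maintaining overall classification performance.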
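
The over-sampling step can be illustrated with the basic SMOTE interpolation. This is a sketch of plain SMOTE, not the improved variant the abstract refers to (whose details are not given here); the function name, `k=5`, and the seeding are assumptions, and the minority set is assumed to contain at least two samples.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating towards near neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Pairwise distances within the minority class; a sample is not its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    # Pick a random base sample and one of its k nearest neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    # Place the synthetic point at a random position on the segment between the two.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies, which is why a cleaning step for redundant or noisy points is a natural follow-up.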

8.
In this paper, we make an effort to overcome the sensitivity of traditional clustering algorithms to noisy data points (noise and outliers). A novel pruning method, in terms of information theory, is therefore proposed to phase out noisy points for robust data clustering. This approach identifies and prunes the noisy points based on the maximization of mutual information against input data distributions such that the resulting clusters are least affected by noise and outliers, where the degree of robustness is controlled through a separate parameter to make a trade-off between rejection of noisy points and optimal clustered data. The pruning approach is general, and it can improve the robustness of many existing traditional clustering methods. In particular, we apply the pruning approach to improve the robustness of fuzzy c-means clustering and its extensions, e.g., fuzzy c-spherical shells clustering and kernel-based fuzzy c-means clustering. As a result, we obtain three clustering algorithms that are the robust versions of the existing ones. The effectiveness of the proposed pruning approach is supported by experimental results.

9.
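Anomaly detection based on robust support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
Machine learning methods used for anomaly detection, such as neural networks and support vector machines, are highly sensitive to noise in the training samples, which degrades their generalization ability and classification accuracy. To solve this problem, the paper proposes a new method based on robust support vector machines (RSVMs). RSVM is first compared with the standard SVM, and the 1998 DARPA BSM data are then used for evaluation. Experiments show that the method performs well on several measures: intrusion detection accuracy, false detection rate, generalization ability in the presence of noise, and running time.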

10.
A study on reduced support vector machines   Total citations: 14, self-citations: 0, citations by others: 14
Recently the reduced support vector machine (RSVM) was proposed as an alternative to the standard SVM. Motivated by the difficulty of handling large data sets using SVM with nonlinear kernels, it preselects a subset of data as support vectors and solves a smaller optimization problem. However, several issues of its practical use have not been fully discussed yet. For example, we do not know whether its generalization ability is comparable to that of the standard SVM. In addition, we would like to see how large a problem must be for RSVM to outperform SVM on training time. In this paper we show that the RSVM formulation is already in the form of a linear SVM and discuss four RSVM implementations. Experiments indicate that in general the test accuracy of RSVM is a little lower than that of the standard SVM. In addition, for problems with up to tens of thousands of data points, if the percentage of support vectors is not high, existing implementations of SVM are quite competitive on training time. Thus, from this empirical study, RSVM will be mainly useful for either larger problems or those with many support vectors. Experiments in this paper also serve as comparisons of: 1) different implementations of linear SVM and 2) standard SVM using linear and quadratic cost functions.

11.
The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Ziarko thus proposed the variable precision rough-set model to deal with noisy data and uncertain information. This model allows for some degree of uncertainty and misclassification in the mining process. Conventionally, mining algorithms based on rough-set theory identify the relationships among data using crisp attribute values; however, data with quantitative values are commonly seen in real-world applications. This paper thus deals with the problem of producing a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. A new method, which combines the variable precision rough-set model and fuzzy set theory, is proposed to solve this problem. It first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions and then calculates the fuzzy β-lower and the fuzzy β-upper approximations. The certain and possible rules are then generated based on these fuzzy approximations. These rules can then be used to classify unknown objects. The paper thus extends the existing rough-set mining approaches to process quantitative data with tolerance of noise and uncertainty.

12.
In recent years, because of the widespread use of wireless sensor networks, distributed estimation has attracted the attention of many researchers. Two popular learning algorithms, incremental least mean square (ILMS) and diffusion least mean square (DLMS), have been reported for distributed estimation using data collected from sensor nodes. However, being derivative based, these algorithms tend to converge to local minima, particularly when minimizing a multimodal cost function. Hence, for problems such as distributed parameter estimation of IIR systems, alternative distributed algorithms need to be developed. With this in view, the paper proposes two population-based incremental particle swarm optimization (IPSO) algorithms for estimating the parameters of noisy IIR systems. The proposed IPSO algorithms perform poorly, however, when the training samples are contaminated with outliers. To alleviate this problem, the paper proposes a robust distributed algorithm (RDIPSO) for the IIR system identification task. Simulation results on benchmark IIR systems demonstrate that the proposed algorithms provide excellent identification performance in all cases, even when the training samples are contaminated with outliers.

13.
14.
In two-class classification problems, the traditional support vector machine (SVM) often cannot achieve good classification accuracy when outliers exist in the training data set. The fuzzy support vector machine (FSVM) can resolve this problem by assigning an appropriate fuzzy membership to each data point, which effectively reduces the effect of the outliers when the classification problem is solved. In this paper, a new fuzzy membership function is employed in the linear and the nonlinear fuzzy support vector machine, respectively. The fuzzy membership is calculated from the structural information of the two classes in the input space and in the feature space. This method can distinguish the support vectors from the outliers effectively. Experimental results show that this approach contributes greatly to reducing the effect of the outliers and significantly improves classification accuracy and generalization.
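
To make the idea of a per-sample fuzzy membership concrete, here is a sketch of the classic distance-to-class-center membership often used with FSVM. It is not the new membership function of the abstract, which is not specified here; the function name and the small constant `delta` are assumptions.

```python
import numpy as np

def class_center_memberships(X, y, delta=1e-3):
    """Assign each sample a membership in (0, 1] that shrinks with distance from its class center."""
    s = np.empty(len(X), dtype=float)
    for label in np.unique(y):
        mask = (y == label)
        center = X[mask].mean(axis=0)                    # class mean in input space
        dist = np.linalg.norm(X[mask] - center, axis=1)  # distance of each sample to it
        radius = dist.max()                              # class radius
        # Samples near the center get values near 1, the farthest sample gets a value near 0.
        s[mask] = 1.0 - dist / (radius + delta)
    return s
```

The resulting values can be used as per-sample weights on the slack penalty, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, so that likely outliers contribute less to the separating hyperplane.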

15.
This paper describes various experiments involving the pre-processing of training data in order to improve the performance of a neural network classifier. To minimise the effect of noisy training samples we applied k-nearest neighbour filtering to the training data. This effectively detects and eliminates outliers, which would otherwise degrade the learning process of the neural network. For a range of neural network complexities, and for two classification tasks, the filtering operation increased classification accuracy.
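
One common form of k-nearest neighbour filtering removes every training sample whose label disagrees with the majority label of its k nearest neighbours. The sketch below assumes that rule, Euclidean distance, and `k=3`; the abstract does not state the exact rule the authors used.

```python
import numpy as np

def knn_filter(X, y, k=3):
    """Keep only samples whose label matches the majority label of their k nearest neighbours."""
    # Pairwise distances; a sample is excluded from its own neighbour list.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    keep = np.empty(len(X), dtype=bool)
    for i, idx in enumerate(neighbors):
        labels, counts = np.unique(y[idx], return_counts=True)
        keep[i] = labels[np.argmax(counts)] == y[i]  # majority vote among neighbours
    return X[keep], y[keep], keep
```

The filtered `X` and `y` would then be passed to the classifier in place of the original training set.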

16.
Robust clustering by pruning outliers   Total citations: 1, self-citations: 0, citations by others: 1
In many applications of C-means clustering, the given data set often contains noisy points. These noisy points will affect the resulting clusters, especially if they are far away from the data points. In this paper, we develop a pruning approach for robust C-means clustering. This approach identifies and prunes the outliers based on the sizes and shapes of the clusters so that the resulting clusters are least affected by the outliers. The pruning approach is general, and it can improve the robustness of many existing C-means clustering methods. In particular, we apply the pruning approach to improve the robustness of hard C-means clustering, fuzzy C-means clustering, and deterministic-annealing C-means clustering. As a result, we obtain three clustering algorithms that are the robust versions of the existing ones. In addition, we integrate the pruning approach with the fuzzy approach and the possibilistic approach to design two new algorithms for robust C-means clustering. The numerical results demonstrate that the pruning approach can achieve good robustness.

17.
Dealing efficiently and accurately with the complexity and uncertainty of big data has become the premise of, and the key to, machine learning. The fuzzy support vector machine (FSVM) not only handles classification problems for training samples with fuzzy information, but also assigns a fuzzy membership degree to each training sample. This allows different training samples to contribute differently to finding an optimal hyperplane that separates two classes with maximum margin, and reduces the effect of outliers and noise. Quantum computing has highly parallel computing capabilities and holds the promise of faster algorithmic processing of data. However, neither FSVM nor quantum computing alone can deal with the complexity and uncertainty of big data efficiently and accurately. This paper proposes an efficient and accurate quantum fuzzy support vector machine (QFSVM) algorithm, based on the fact that quantum computing can process large amounts of data efficiently and that FSVM handles complexity and uncertainty well. The central idea of the proposed algorithm is to use the quantum algorithm for solving linear systems of equations (the HHL algorithm) and the least-squares method to solve the quadratic programming problem in the FSVM. The proposed algorithm can determine whether a sample belongs to the positive or the negative class while also achieving good generalization performance. Furthermore, this paper applies QFSVM to handwritten character recognition and demonstrates that QFSVM can be run on quantum computers and achieves accurate classification of handwritten characters. Compared with FSVM, the computational complexity of QFSVM is reduced exponentially in the number of training samples.
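
The step from a quadratic program to a linear system can be illustrated with the standard weighted least-squares SVM. The block below is a sketch under assumptions: the abstract does not give the exact system the authors solve, and the symbols $s_i$ (fuzzy membership of sample $i$), $\gamma$ (regularization constant), $K$ (kernel matrix), $\alpha$, $b$, and $e_i$ are introduced here for illustration.

```latex
% Weighted least-squares SVM: equality constraints replace the inequalities of the FSVM.
\min_{w,\,b,\,e}\; \frac{1}{2}\lVert w\rVert^{2} + \frac{\gamma}{2}\sum_{i=1}^{N} s_i\, e_i^{2}
\quad \text{s.t.} \quad y_i = w^{\top}\varphi(x_i) + b + e_i, \qquad i = 1,\dots,N.

% Setting the derivatives of the Lagrangian to zero gives
% w = \sum_i \alpha_i \varphi(x_i), \quad \sum_i \alpha_i = 0, \quad e_i = \alpha_i / (\gamma s_i),
% and substituting into the constraints yields one linear system:
\begin{pmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & K + \gamma^{-1} S^{-1} \end{pmatrix}
\begin{pmatrix} b \\ \alpha \end{pmatrix}
=
\begin{pmatrix} 0 \\ y \end{pmatrix},
\qquad K_{ij} = \varphi(x_i)^{\top}\varphi(x_j), \quad S = \operatorname{diag}(s_1,\dots,s_N).
```

A system of this shape is the kind of problem a linear-system solver such as HHL is designed for; a new sample $x$ would then be classified by the sign of $\sum_i \alpha_i K(x_i, x) + b$.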

18.
In this paper, fuzzy inference models for pattern classification are developed and fuzzy inference networks based on these models are proposed. Most existing fuzzy rule-based systems have difficulty deriving inference rules and membership functions directly from training data; rules and membership functions are instead obtained from experts. Some approaches use backpropagation (BP) type learning algorithms to learn the parameters of membership functions from training data. However, BP algorithms take a long time to converge, and they require the number of inference rules to be set in advance. Determining the number of inference rules demands considerable experience from the designer. In this paper, self-organizing learning algorithms are proposed for the fuzzy inference networks. In the proposed learning algorithms, the number of inference rules and the membership functions in the inference rules are determined automatically during the training procedure, and learning is fast. The proposed fuzzy inference network (FIN) classifiers possess both the structure and the learning ability of neural networks and the fuzzy classification ability of fuzzy algorithms. Simulation results on fuzzy classification of two-dimensional data are presented and compared with those of the fuzzy ARTMAP. The proposed fuzzy inference networks perform better than the fuzzy ARTMAP and need fewer training samples.

19.
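Because the support vector machine (SVM) performs relatively poorly on problems with outliers and imbalanced data, researchers have proposed a membership-weighted fuzzy support vector machine for imbalanced classification; however, the fuzzy membership used there does not measure well how much each sample contributes to determining the optimal separating hyperplane. To address this problem, a credibility-weighted fuzzy support vector machine based on density peaks (DP) clustering is proposed. First, outliers are found by DP clustering and removed. Then initial memberships are obtained from the distance of each point to the hyperplane determined by DEC (Different Error Costs), and the memberships are updated with an improved FSVM-CIL (Fuzzy Support Vector Machines for Class Imbalance Learning). Some sample points are then removed, which reduces the sample set and lessens the effect of data imbalance. Experiments verify the effectiveness of the proposed algorithm.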

20.
Robust TSK fuzzy modeling for function approximation with outliers   Total citations: 3, self-citations: 0, citations by others: 3
The Takagi-Sugeno-Kang (TSK) type of fuzzy model has attracted great attention from the fuzzy modeling community because of its good performance in various applications. Most approaches for modeling TSK fuzzy rules define their fuzzy subspaces based on the idea of training data being close enough rather than having similar functions. Moreover, training data sets often contain outliers, which seriously affect clustering and learning algorithms based on least-squares error minimization. A robust TSK fuzzy modeling approach is presented. In the approach, a clustering algorithm termed robust fuzzy regression agglomeration (RFRA) is proposed to define fuzzy subspaces in a fuzzy regression manner with robustness against outliers. To obtain a more precise model, a robust fine-tuning algorithm is then employed. Various examples are used to verify the effectiveness of the proposed approach. In the simulation results, the proposed robust TSK fuzzy modeling showed superior performance over other approaches.

Copyright©北京勤云科技发展有限公司  京ICP备09084417号