Similar literature
20 similar documents found.
1.
In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers, that is, objects that behave in an unexpected way or have abnormal properties. The identification of outliers is important for many applications, such as intrusion detection, credit card fraud, criminal activities in electronic commerce, medical diagnosis, and anti-terrorism. In this paper, we propose a hybrid approach to outlier detection that combines ideas from boundary-based and distance-based methods for outlier detection ([Jiang et al., 2005], [Jiang et al., 2009], and [Knorr and Ng, 1998]). We give a novel definition of outliers, BD (boundary and distance)-based outliers, by virtue of the notion of boundary region in rough set theory and the definitions of distance-based outliers. An algorithm to find such outliers is also given. The effectiveness of our method for outlier detection is demonstrated on two publicly available databases.
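The distance-based half of this definition can be illustrated with a minimal sketch of a DB(p, D)-style test in the sense of Knorr and Ng: a point is flagged when at least a fraction p of the data lies farther than D from it. This is not the BD-based definition proposed in the abstract, which also uses the rough-set boundary region. The function name `db_outliers`, the choice to count only the remaining points, and the sample data are assumptions made for illustration.

```python
from math import dist

def db_outliers(points, p, d):
    """Return indices of points for which at least a fraction p of the
    remaining points lie farther than distance d away."""
    n = len(points)
    outliers = []
    for i, x in enumerate(points):
        # Count how many other points are farther than d from x.
        far = sum(1 for j, y in enumerate(points) if j != i and dist(x, y) > d)
        if far >= p * (n - 1):
            outliers.append(i)
    return outliers

# Hypothetical data: four nearby points and one isolated point.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (5.0, 5.0)]
print(db_outliers(data, p=0.9, d=1.0))
```

The nested loop is quadratic in the number of points, which is acceptable for a sketch but would need an index structure for large data sets.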

2.
Clustering analysis is an important topic in artificial intelligence, data mining and pattern recognition research. Conventional clustering algorithms, for instance the well-known fuzzy C-means clustering algorithm (FCM), assume that all the attributes are equally relevant to all the clusters. However, in most domains, especially for high-dimensional datasets, some attributes are irrelevant, and some relevant ones are less important than others with respect to a specific class. In this paper, such imbalances between the attributes are considered and a new weighted fuzzy kernel-clustering algorithm (WFKCA) is presented. WFKCA performs clustering in a kernel feature space mapped by Mercer kernels. Compared with the conventional hard kernel-clustering algorithm, WFKCA can yield meaningful prototypes (cluster centers) of the clusters. Numerical convergence properties of WFKCA are also discussed. For in-depth studies, WFKCA is extended to WFKCA2, which has been demonstrated to be a useful tool for clustering incomplete data. Numerical examples demonstrate the effectiveness of the new WFKCA algorithm.

3.
In this paper, we show how one can take advantage of the stability and effectiveness of object data clustering algorithms when the data to be clustered are available in the form of mutual numerical relationships between pairs of objects. More precisely, we propose a new fuzzy relational algorithm, based on the popular fuzzy C-means (FCM) algorithm, which does not require any particular restriction on the relation matrix. We describe the application of the algorithm to four real and four synthetic data sets, and show that our algorithm performs better than well-known fuzzy relational clustering algorithms on all these sets.

4.
Dubois and Prade (1990) [1] introduced the notion of fuzzy rough sets as a fuzzy generalization of rough sets, which were originally proposed by Pawlak (1982) [8]. Later, Radzikowska and Kerre introduced the so-called (I,T)-fuzzy rough sets, where I is an implication and T is a triangular norm. In the present paper, by using a pair of implications (I,J), we define the so-called (I,J)-fuzzy rough sets, which generalize the concept of fuzzy rough sets in the sense of Radzikowska and Kerre, and that of Mi and Zhang. Basic properties of (I,J)-fuzzy rough sets are investigated in detail.

5.
Semi-supervised fuzzy clustering: A kernel-based approach   Total citations: 1, self-citations: 0, citations by others: 1
Huaxiang Zhang  Jing Lu 《Knowledge》2009,22(6):477-481
Semi-supervised clustering algorithms aim to improve clustering accuracy under the supervision of a limited amount of labeled data. Since kernel-based approaches, such as the kernel-based fuzzy c-means algorithm (KFCM), have been used successfully in classification and clustering problems, in this paper we propose a novel semi-supervised clustering approach based on KFCM and call it the semi-supervised kernel fuzzy c-means algorithm (SSKFCM). The objective function of SSKFCM is defined by adding the classification errors of both the labeled and the unlabeled data, and its optimum is sought by repeatedly updating the fuzzy memberships and the optimized kernel parameter. The objective function may have more than one local optimum, so we employ a function transformation technique to reformulate the objective function after a local minimum has been obtained, and select the best optimum as the solution to the objective function. Experimental results on both artificial and several real data sets show that SSKFCM performs better than its conventional counterparts and that it achieves its most accurate clustering results when the kernel parameter is optimized.
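Kernel-based fuzzy clustering of this kind replaces the Euclidean distance with a distance measured in the kernel feature space. The following is a minimal sketch, assuming a Gaussian kernel of the form exp(-||x - v||^2 / (2 sigma^2)); the abstract does not state the kernel or its parametrization, so the function names and the role of `sigma` here are assumptions. For any kernel K, the squared feature-space distance is K(x,x) - 2K(x,v) + K(v,v), which simplifies to 2(1 - K(x,v)) for a Gaussian kernel because K(x,x) = 1.

```python
from math import exp

def gaussian_kernel(x, v, sigma):
    """Gaussian kernel value for two equal-length vectors (assumed form)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return exp(-sq / (2.0 * sigma ** 2))

def kernel_sq_distance(x, v, sigma):
    """Squared distance between x and v in the kernel feature space.

    Uses K(x,x) - 2K(x,v) + K(v,v) = 2 * (1 - K(x,v)) for a Gaussian kernel.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

# The distance is 0 for identical points and approaches 2 for distant ones.
print(kernel_sq_distance((1.0, 2.0), (1.0, 2.0), sigma=1.0))
print(kernel_sq_distance((0.0,), (3.0,), sigma=1.0))
```

Because this distance is bounded above by 2, a single distant point has limited influence on a cluster prototype, which is one commonly cited reason for preferring kernel distances on noisy data.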

6.
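Most traditional fast clustering algorithms are based on the fuzzy C-means (FCM) algorithm. However, FCM is sensitive to the initial cluster centers and to noisy data, and it tends to converge to local minima, so its clustering accuracy is limited. This paper builds an algorithmic framework that solves the clustering problem with a divide-and-conquer strategy, takes the characteristics of the data into full account, and improves the traditional FCM algorithm. Experimental results on standard data sets show that the divide-and-conquer FCM clustering algorithm improves clustering accuracy and speeds up convergence.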

7.
Clustering is a widely used unsupervised data mining technique. It identifies structures in collections of objects by grouping them into classes, called clusters, in such a way that the similarity of objects within any cluster is maximized and the similarity of objects belonging to different clusters is minimized. In density-based clustering, a cluster is defined as a connected dense component and grows in the direction driven by the density. The basic structure of density-based clustering presents some common drawbacks: (i) parameters have to be set; (ii) the behavior of the algorithm is sensitive to the density of the starting object; and (iii) adjacent clusters of different densities may not be properly identified. In this paper, we address all the above problems. Our method, based on the concept of space stratification, efficiently identifies the different densities in the dataset and, accordingly, ranks the objects of the original space. Next, it exploits this knowledge by projecting the original data into a space with one more dimension. It then performs a density-based clustering that takes into account the reverse nearest neighbors of the objects. Our method also reduces the number of input parameters by giving a guideline for setting them in a suitable way. Experimental results indicate that our algorithm is able to deal with clusters of different densities and outperforms the popular algorithms DBSCAN and OPTICS on all the standard benchmark datasets.

8.
This paper presents a new extension of fuzzy sets: R-fuzzy sets. The membership of an element of an R-fuzzy set is represented as a rough set. This new extension facilitates the representation of an uncertain fuzzy membership with a rough approximation. Based on our definition of R-fuzzy sets and their operations, the relationships between R-fuzzy sets and other fuzzy sets are discussed and some examples are provided.

9.
Evolutionary semi-supervised fuzzy clustering   Total citations: 3, self-citations: 0, citations by others: 3
To learn a classifier from labeled and unlabeled data, this paper proposes an evolutionary semi-supervised fuzzy clustering algorithm. The class label information provided by the labeled data is used to guide the evolution of each fuzzy partition of the unlabeled data, which plays the role of a chromosome. The fitness of each chromosome is evaluated with a combination of the fuzzy within-cluster variance of the unlabeled data and the misclassification error of the labeled data. The structure of the clusters obtained can be used to classify a future new pattern. The performance of the proposed approach is evaluated using two benchmark data sets. Experimental results indicate that the proposed approach can improve classification accuracy significantly compared to a classifier trained with only a small number of labeled data. It also outperforms a similar approach, SSFCM.

10.
A new method of partitive clustering is developed in the framework of shadowed sets. The core and exclusion regions of the generated shadowed partitions result in a reduction in computations as compared to conventional fuzzy clustering. Unlike rough clustering, here the choice of threshold parameter is fully automated. The number of clusters is optimized in terms of various validity indices. It is observed that shadowed clustering can efficiently handle overlapping among clusters as well as model uncertainty in class boundaries. The algorithm is robust in the presence of outliers. A comparative study is made with related partitive approaches. Experimental results on synthetic as well as real data sets demonstrate the superiority of the proposed approach.

11.
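The fuzzy C-means clustering algorithm is sensitive to outliers and to randomly initialized cluster centers. To address this, the traditional fuzzy C-means algorithm is improved by combining it with a stacked sparse autoencoder. A stacked sparse autoencoder can extract features of the original data set from low level to high level, and the high-level features usually reflect the essential characteristics of the samples to be clustered better than the original data set does. Clustering on these features instead of the original data set therefore helps improve the clustering results. Experiments with the improved algorithm on several standard UCI data sets show that it is effective and feasible.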

12.
Induction of multiple fuzzy decision trees based on rough set technique   Total citations: 5, self-citations: 0, citations by others: 5
The integration of fuzzy sets and rough sets can lead to a hybrid soft-computing technique which has been applied successfully to many fields such as machine learning, pattern recognition and image processing. The key to this soft-computing technique is how to set up and make use of the fuzzy attribute reduct in fuzzy rough set theory. Given a fuzzy information system, we may find many fuzzy attribute reducts, and each of them can contribute differently to decision-making. If only one of the fuzzy attribute reducts, which may be the most important one, is selected to induce decision rules, some useful information hidden in the other reducts will unavoidably be lost. To make full use of the information provided by every individual fuzzy attribute reduct in a fuzzy information system, this paper presents a novel induction of multiple fuzzy decision trees based on rough set technique. The induction consists of three stages. First, several fuzzy attribute reducts are found by a similarity-based approach. Then a fuzzy decision tree is generated for each fuzzy attribute reduct according to the fuzzy ID3 algorithm. Finally, the fuzzy integral is used as a fusion tool to integrate the generated decision trees: it combines all outputs of the multiple fuzzy decision trees and forms the final decision result. An illustration is given to show the proposed fusion scheme. A numerical experiment on real data indicates that the proposed multiple tree induction is superior to single tree induction based on an individual reduct or on the entire feature set for learning problems with many attributes.

13.
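When used for image segmentation, the traditional fuzzy C-means clustering algorithm is highly sensitive to outliers and noise points, and its clustering time grows rapidly with image size. To address these shortcomings, the SFGFCM algorithm, a fuzzy C-means clustering algorithm based on the spatial distance between neighboring elements, uses a kernelized spatial distance formula to compute the similarity Sij between each spatially neighboring pixel and the pixel under consideration. It then computes a neighborhood-constrained image from the weighted sum of the neighboring pixels' gray levels and clusters on the gray-level statistics of that image. The algorithm takes both the gray levels and the positions of neighboring pixels into account and balances them well. It is robust, preserves details of the original image such as edges, improves clustering accuracy, and greatly shortens the clustering time for large images. Extensive experiments on synthetic, medical, and natural images show that its clustering performance is clearly better than that of traditional algorithms and that it produces good segmentation results.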

14.
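To improve the quality and stability of existing fuzzy C-means (FCM) clustering algorithms when segmenting noisy images, an FCM-based image segmentation algorithm is proposed. A sum image is constructed from non-local spatial information, the initial cluster centers are selected automatically from the histogram of the sum image, and the segmentation is obtained by minimizing the objective function. Theoretical analysis and experimental results show that the algorithm is more effective and stable than existing algorithms and more robust to noisy images.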

15.
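Hu Han  Li Yongzhong 《计算机仿真》 (Computer Simulation) 2010,27(3):140-142,150
A new semi-supervised clustering algorithm for intrusion detection in network environments is proposed. An active learning strategy is applied within the semi-supervised clustering process: a small amount of labeled data is used to generate seed clusters that initialize the algorithm and guide the clustering process, and known and unknown attacks are detected according to the characteristics of the network data. The active learning strategy queries the constraint relationships between unlabeled and labeled data in the network, so that k disjoint, non-empty neighbor sets can be obtained quickly from the labeled data. Detection results show that the strategy improves the performance of the algorithm and demonstrate its feasibility and effectiveness.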

16.
This paper proposes a hybrid framework composed of a filtering module and a clustering module to identify six common types of control chart patterns: natural pattern, cyclic pattern, upward shift, downward shift, upward trend, and downward trend. In particular, a multi-scale wavelet filter is designed for denoising, and its performance is compared to that of single-scale filters, including the mean filter and the exponentially weighted moving average (EWMA) filter. Moreover, three fuzzy clustering algorithms, based on fuzzy c-means (FCM), entropy fuzzy c-means (EFCM) and kernel fuzzy c-means (KFCM), are adopted to compare their pattern classification performance. Experimental results demonstrate the excellent performance of EFCM and KFCM against outliers, especially when the input data contain a high level of noise. Therefore, a hybrid framework combining the wavelet filter with robust fuzzy clustering is proposed in this paper. Compared to neural network based approaches, the proposed method provides a promising way to recognize control chart patterns on-line because of its efficient computation and robustness against outliers.
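One of the single-scale baselines named above, the EWMA filter, follows the standard recursion z_t = lambda * x_t + (1 - lambda) * z_(t-1). The sketch below is a minimal illustration of that recursion only. The function name `ewma`, the parameter name `lam`, and the choice to initialize the filter with the first observation are assumptions; the abstract does not give the paper's settings, and control chart practice often initializes with the process target instead.

```python
def ewma(series, lam):
    """Exponentially weighted moving average of a sequence of numbers.

    lam in (0, 1] is the weight given to the newest observation.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    if not series:
        return []
    smoothed = []
    z = series[0]  # assumption: start the filter at the first observation
    for x in series:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed

# A step from 0 to 10 is followed gradually rather than instantly.
print(ewma([0.0, 10.0, 10.0, 10.0], lam=0.5))
```

Smaller values of `lam` smooth more strongly but respond more slowly to a genuine shift, which is the trade-off a single-scale filter cannot avoid.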

17.
Generalized rough sets over fuzzy lattices   Total citations: 2, self-citations: 0, citations by others: 2
This paper studies generalized rough sets over fuzzy lattices through both the constructive and axiomatic approaches. From the viewpoint of the constructive approach, the basic properties of generalized rough sets over fuzzy lattices are obtained. The matrix representation of the lower and upper approximations is given. According to this matrix view, a simple algorithm is obtained for computing the lower and upper approximations. As for the axiomatic approach, a set of axioms is constructed to characterize the upper approximation of generalized rough sets over fuzzy lattices.
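To give a feel for the matrix view, the sketch below computes lower and upper approximations in the crisp special case only: a binary relation on a finite universe stored as a 0/1 matrix, and a subset stored as a 0/1 vector. It does not implement the fuzzy-lattice construction of the paper, whose details are not given in the abstract. The function names and the example relation are assumptions made for illustration.

```python
def upper_approximation(relation, subset):
    """Element i belongs if some j related to i lies in the subset."""
    n = len(relation)
    return [any(relation[i][j] and subset[j] for j in range(n)) for i in range(n)]

def lower_approximation(relation, subset):
    """Element i belongs if every j related to i lies in the subset."""
    n = len(relation)
    return [all((not relation[i][j]) or subset[j] for j in range(n)) for i in range(n)]

# Hypothetical universe {0, 1, 2, 3} with equivalence classes {0, 1}, {2}, {3}.
relation = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
subset = [0, 1, 1, 0]  # the set {1, 2}

print(upper_approximation(relation, subset))
print(lower_approximation(relation, subset))
```

The upper approximation is a Boolean matrix-vector product, and the lower approximation is its dual (complement, multiply, complement), so both cost one pass over the relation matrix.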

18.
Most existing representative works in semi-supervised clustering do not sufficiently solve the violation problem of pairwise constraints. On the other hand, traditional kernel methods for semi-supervised clustering not only face the problem of manually tuning the kernel parameters, because insufficient supervision is provided, but also lack a measure that achieves better clustering effectiveness. In this paper, we propose an adaptive Semi-supervised Clustering Kernel Method based on Metric learning (SCKMM) to mitigate the above problems. Specifically, we first construct an objective function from pairwise constraints to automatically estimate the parameter of the Gaussian kernel. Then, we use a pairwise constraint-based K-means approach to solve the constraint violation issue and to cluster the data. Furthermore, we introduce metric learning into nonlinear semi-supervised clustering to improve the separability of the data for clustering. Finally, we perform clustering and metric learning simultaneously. Experimental results on a number of real-world data sets validate the effectiveness of the proposed method.

19.
One of the simple techniques for data clustering is fuzzy C-means (FCM) clustering, which describes the degree to which each data point belongs to a cluster by a fuzzy membership function instead of a crisp value. However, the results of fuzzy clustering depend highly on the selection of the initial state, and there is also a high risk of not obtaining the best results when the datasets are large. In this paper, we present a hybrid algorithm based on FCM and a modified stem cells algorithm, called the SC-FCM algorithm, for optimum clustering of a dataset into K clusters. The experimental results obtained by using the new algorithm on different well-known datasets, compared with those obtained by the K-means algorithm, FCM, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and the Artificial Bee Colony (ABC) algorithm, demonstrate the better performance of the new algorithm.
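For reference, plain FCM (without the stem cells component, which the abstract does not describe) alternates two standard updates: each membership is u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)), and each center is the mean of the points weighted by u_ki^m. The sketch below is a minimal pure-Python version under those textbook formulas. The function names, the fixed iteration count, the fuzzifier default m = 2, and the sample data are assumptions for illustration.

```python
from math import dist

def memberships(points, centers, m=2.0, eps=1e-12):
    """u[k][i]: degree to which point k belongs to cluster i (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        # Clamp distances so a point lying exactly on a center cannot divide by zero.
        d = [max(dist(x, c), eps) for c in centers]
        u.append([1.0 / sum((d_i / d_j) ** power for d_j in d) for d_i in d])
    return u

def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership**m."""
    dims = len(points[0])
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dims)))
    return centers

def fcm(points, centers, m=2.0, iterations=50):
    """Run a fixed number of alternating updates from the given initial centers."""
    for _ in range(iterations):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(points, centers, m)

# Hypothetical data: two well-separated groups and a rough initial guess.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(points, centers=[(1, 1), (9, 9)])
print(centers)
```

The result depends on the `centers` argument, which is the sensitivity to initial state that the abstract mentions; hybrid schemes such as the one proposed replace this fixed starting guess with a search over candidate starting points.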

20.
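A definition of variable precision fuzzy rough sets   Total citations: 1, self-citations: 1, citations by others: 1
Like the classical rough set model, the fuzzy rough set model is easily affected by noisy data. To address this problem, and inspired by the variable precision rough set model, the concept of variable precision fuzzy rough sets is proposed. Because existing variable precision fuzzy rough set models fail to satisfy some basic properties, the β-lower approximation and β-upper approximation of a fuzzy set in a fuzzy approximation space are redefined so that these basic properties hold.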
