Similar literature
 20 similar documents found (search time: 15 ms)
1.
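Text clustering based on a genetic FCM algorithm   Total citations: 3, self-citations: 1, citations by others: 3
Kuang Hang  Luo Jun 《计算机应用》 (Journal of Computer Applications) 2009,29(2):558-560
This paper proposes a text clustering method based on a genetic FCM algorithm. The method first reduces the dimensionality of the text features with LSI, then determines the number of text categories through cluster validity analysis, and finally clusters the texts with the genetic FCM algorithm. The approach largely overcomes FCM's tendency to converge to a local optimum and resolves its sensitivity to initial values. Experiments show that the proposed method achieves good clustering performance.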
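The LSI step mentioned in this abstract can be illustrated with a truncated singular value decomposition. The sketch below is only an illustration under stated assumptions: the function name `lsi_reduce`, the toy term-document matrix, and the choice of `k` are hypothetical, and the abstract does not say how the authors weight terms or pick the number of latent dimensions.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD.

    term_doc: (n_terms, n_docs) weight matrix (hypothetical input, e.g. TF-IDF).
    Returns an (n_docs, k) array of document coordinates.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; documents are the rows of V_k * S_k.
    return Vt[:k, :].T * s[:k]

# Toy term-document matrix: 4 terms x 3 documents (made-up values).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
docs_2d = lsi_reduce(A, k=2)  # reduced features that a clustering step could consume
```

The reduced rows of `docs_2d` would then be the inputs to whatever clustering algorithm follows; keeping all singular values reproduces the document-document inner products exactly, and smaller `k` approximates them.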

2.
In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers: objects that behave in an unexpected way or have abnormal properties. The identification of outliers is important for many applications such as intrusion detection, credit card fraud, criminal activities in electronic commerce, medical diagnosis and anti-terrorism. In this paper, we propose a hybrid approach to outlier detection, which combines ideas from boundary-based and distance-based methods for outlier detection ([Jiang et al., 2005], [Jiang et al., 2009] and [Knorr and Ng, 1998]). We give a novel definition of outliers, BD (boundary and distance)-based outliers, by virtue of the notion of boundary region in rough set theory and the definitions of distance-based outliers. An algorithm to find such outliers is also given. The effectiveness of our method for outlier detection is demonstrated on two publicly available databases.
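The distance-based half of this hybrid definition builds on the DB(p, D)-outlier notion of Knorr and Ng: an object is an outlier if at least a fraction p of all objects lie farther than distance D from it. The sketch below shows only that ingredient; the rough-set boundary region and the way the paper combines the two are not described in the abstract and are not modelled here. The function name and the toy data are assumptions.

```python
import numpy as np

def db_outliers(X, p, D):
    """Boolean mask of DB(p, D)-outliers: points for which at least a
    fraction p of all points lie farther than D (Euclidean distance)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Fraction of the dataset lying farther than D from each point.
    far_fraction = (dist > D).mean(axis=1)
    return far_fraction >= p

# Four points near the origin and one isolated point (made-up data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
mask = db_outliers(X, p=0.75, D=1.0)
```

With these values only the isolated point is flagged. The pairwise distance matrix costs O(n^2) memory, which is acceptable for a sketch but not for large datasets.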

3.
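An improved FCM clustering algorithm and its application to intrusion detection   Total citations: 2, self-citations: 0, citations by others: 2
To address the limitations of the fuzzy C-means (FCM) algorithm, a two-stage improved FCM clustering algorithm is proposed. The objective function of FCM is modified by adding weighting coefficients derived from a point density function and weights on the sample feature vectors, and the iterative update formulas and algorithm description are then derived. The algorithm copes with unevenly distributed samples and with sample features that contribute unequally to classification, and effectively improves clustering accuracy. Finally, experiments on the KDD CUP 99 dataset show that the algorithm is reliable and feasible.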

4.
Fuzzy C-means (FCM) partitions the observations partially into several clusters based on the principles of fuzzy theory. However, minimization of the Euclidean distance in FCM tends to detect hyper-spherical clusters, which is unrealistic for real-world problems. In this paper, an effective FCM algorithm that adopts a symmetry similarity measure is proposed in order to search for appropriate clusters, regardless of their geometric structure and overlapping characteristics. Experimental results on several artificial and real-life datasets of different natures, together with a performance comparison against existing clustering algorithms, demonstrate its superiority.

5.
Fuzzy C-means (FCM) clustering has been used successfully in many real-world applications. However, the FCM algorithm is sensitive to the initial prototypes, and it cannot handle non-traditional curved clusters. In this paper, a multi-center fuzzy C-means algorithm based on transitive closure and spectral clustering (MFCM-TCSC) is provided. In this algorithm, initial guesses of the locations of the cluster centers or of the membership values are not necessary. Multiple centers are adopted to represent the non-spherical shape of clusters, so the algorithm can handle non-traditional curved clusters. The algorithm has three phases. First, the dataset is partitioned into subclusters by the FCM algorithm with multiple centers. Then, the subclusters are merged by spectral clustering. Finally, the final results are obtained from these two clustering results. When merging subclusters, we adopt the lattice similarity method as the distance between two subclusters, which has an explicit form when the fuzzy membership values of the subclusters are used as features. Experimental results on two artificial datasets, UCI data and real image segmentation show that the proposed method clearly outperforms the traditional FCM algorithm and spectral clustering in efficiency and robustness.

6.
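In the fuzzy C-means algorithm with local spatial information (WFLICM), the spatial influence factor is easily corrupted by noise and produces incorrect labels. To address this, a fuzzy C-means image segmentation algorithm that fuses local and non-local spatial information (NLWFLICM) is proposed. It introduces non-local spatial information into the fuzzy influence factor of WFLICM, sets the weights of local and non-local information adaptively according to the noise level, and relabels the fuzzy influence factor of the center pixel. Experimental results show that NLWFLICM is more robust and adaptive than WFLICM, improves to some extent the robustness of segmentation on heavily noisy images, and preserves image texture. To further improve clustering performance and convergence speed, an improved FCM image segmentation algorithm based on Canopy clustering and non-local spatial information (Canopy-NLWFLICM) is also proposed. Exploiting the Canopy algorithm's ability to coarsely cluster data quickly, it preprocesses the cluster centers before NLWFLICM clustering, thereby improving convergence speed and segmentation accuracy.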

7.
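Research on a fast fuzzy C-means clustering algorithm combining soft and hard clustering   Total citations: 1, self-citations: 1, citations by others: 1
This paper discusses an improvement to fuzzy C-means clustering: building on the original fuzzy C-means algorithm, it proposes a fast fuzzy C-means algorithm that combines soft and hard clustering. The fast algorithm adds a hard C-means clustering stage before fuzzy C-means. Hard clustering runs much faster than fuzzy clustering, and its cluster centers are used as the initial centers for the fuzzy iteration, which speeds up the convergence of fuzzy C-means; this is valuable when clustering large amounts of data. Simulations verify that the fast algorithm needs a shorter iterative adjustment process, converges faster, and clusters better than plain fuzzy C-means.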
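The two-stage idea in this abstract (a cheap hard C-means pass whose centers seed the fuzzy iteration) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the number of hard iterations, the stopping tolerance, and the toy data are all assumptions.

```python
import numpy as np

def sq_dists(X, V):
    """Squared Euclidean distances between samples X (n, d) and centers V (c, d)."""
    return ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)

def hard_c_means(X, c, n_iter=10, seed=0):
    """A few hard C-means (k-means) iterations; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dists(X, V).argmin(axis=1)
        for k in range(c):
            if np.any(labels == k):  # an empty cluster keeps its old center
                V[k] = X[labels == k].mean(axis=0)
    return V

def fcm_from_centers(X, V, m=2.0, max_iter=100, tol=1e-5):
    """Fuzzy C-means started from given centers; returns (U, V, iterations)."""
    V = V.copy()
    for it in range(1, max_iter + 1):
        d2 = np.maximum(sq_dists(X, V), 1e-12)  # floor avoids division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # memberships sum to 1 per sample
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V, it

# Two well-separated synthetic blobs (made-up data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, V, n_iter = fcm_from_centers(X, hard_c_means(X, c=2))
```

Because the fuzzy stage starts from centers that are already close to the final prototypes, it typically needs fewer iterations than a random start, which is the effect the abstract reports.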

8.
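Research on an improved genetic algorithm for fuzzy C-means clustering image segmentation   Total citations: 3, self-citations: 0, citations by others: 3
Based on the fuzzy C-means (FCM) clustering algorithm and exploiting the global random search of genetic algorithms, an improved genetic algorithm for image segmentation is proposed. The algorithm first uses an initialization procedure to determine a suitable initial search range for the genetic algorithm, then makes appropriate improvements to the encoding scheme, crossover operator, mutation operator and other components, and gives the theoretical derivation and concrete implementation steps. Besides addressing FCM's tendency to fall into local optima in medical image segmentation, the initialization procedure determines a more suitable initial search range than the standard genetic FCM algorithm does, which accelerates the convergence of the genetic algorithm. Experiments show that the method performs much better than the standard genetic FCM clustering algorithm.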

9.
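An improved fuzzy C-means clustering algorithm   Total citations: 4, self-citations: 0, citations by others: 4
An adaptive strategy is combined with the traditional fuzzy C-means clustering algorithm to form a new fuzzy clustering algorithm. Without sacrificing convergence speed, it handles the problems of local optima and sensitivity to initial values well. Experiments on two datasets from the UCI Machine Learning Repository show that its accuracy is comparable to that of the adaptive immune clustering algorithm, that it obtains the correct number of clusters, and that it converges faster, which matters all the more given how quickly network data changes today.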

10.
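To address the heavy computation and limited accuracy of the traditional graph transduction (GT) algorithm, a semi-supervised classification algorithm based on C-means clustering and graph transduction is proposed. First, the fuzzy C-means (FCM) clustering algorithm preselects unlabeled samples, narrowing the dataset on which graph transduction builds its graph. Next, a k-nearest-neighbor sparse graph is constructed, which reduces spurious connections in the similarity matrix and shortens graph construction time, and labels for the preselected unlabeled samples are obtained by label propagation. Finally, under a semi-supervised manifold assumption model, a classifier is trained on the expanded labeled set together with the remaining unlabeled data to produce the final classification. On the Weizmann Horse dataset the proposed algorithm consistently achieves classification accuracy above 96%; compared with classification using graph transduction alone, it removes the dependence on the initial labeled set and improves accuracy by at least 10%. Applied directly to the Terracotta Warriors dataset, it also achieves accuracy above 95%, clearly higher than the traditional graph transduction algorithm. The results show that the algorithm classifies images well and is of research value for accurate image classification.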
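The middle stage of this pipeline, a k-nearest-neighbor sparse graph followed by label propagation, can be sketched as below. This is a generic illustration under assumptions, not the paper's method: the Gaussian edge weights, the clamping scheme, the function names, and the toy data are all hypothetical, the points are assumed to be distinct, and the FCM preselection and final classifier training are omitted.

```python
import numpy as np

def knn_graph(X, k, sigma=1.0):
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        # Index 0 of the sort is the point itself (distance 0), so skip it.
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # keep an edge if either endpoint selected it

def propagate_labels(W, labels, n_classes, n_iter=200):
    """Iterative label propagation; labels uses -1 for unlabeled samples."""
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    labeled = labels >= 0
    Y = np.zeros((len(labels), n_classes))
    Y[labeled, labels[labeled]] = 1.0
    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F
        F[labeled] = Y[labeled]  # clamp the known labels after every step
    return F.argmax(axis=1)

# Two chains of points, one labeled sample in each (made-up data).
X = np.array([[0.1 * i, 0.0] for i in range(10)] +
             [[5.0 + 0.1 * i, 0.0] for i in range(10)])
labels = -np.ones(20, dtype=int)
labels[0], labels[19] = 0, 1
pred = propagate_labels(knn_graph(X, k=2), labels, n_classes=2)
```

Keeping only k neighbors per node is what makes the similarity matrix sparse; in this toy case the two chains form separate components, so each chain inherits the label of its single labeled point.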

11.
Clustering analysis is an important topic in artificial intelligence, data mining and pattern recognition research. Conventional clustering algorithms, for instance the well-known fuzzy C-means clustering algorithm (FCM), assume that all attributes are equally relevant to all clusters. However, in most domains, especially for high-dimensional datasets, some attributes are irrelevant, and some relevant ones are less important than others with respect to a specific class. In this paper, such imbalances between the attributes are considered and a new weighted fuzzy kernel-clustering algorithm (WFKCA) is presented. WFKCA performs clustering in a kernel feature space mapped by Mercer kernels. Compared with the conventional hard kernel-clustering algorithm, WFKCA can yield meaningful prototypes (cluster centers) of the clusters. Numerical convergence properties of WFKCA are also discussed. For in-depth studies, WFKCA is extended to WFKCA2, which has been demonstrated to be a useful tool for clustering incomplete data. Numerical examples demonstrate the effectiveness of the new WFKCA algorithm.
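For readers unfamiliar with kernel clustering, the distance in the kernel feature space can be written without ever computing the mapping itself. The identity below is the standard kernel trick, shown for a Gaussian kernel with prototypes v_i assumed to live in the input space; the symbols phi, K, sigma, x_k and v_i are notation chosen here, and the abstract does not state which kernel or attribute weighting WFKCA actually uses.

```latex
\begin{align}
\|\phi(x_k) - \phi(v_i)\|^2
  &= K(x_k, x_k) - 2K(x_k, v_i) + K(v_i, v_i), \\
K(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{\sigma^2}\right)
  &\;\Longrightarrow\; K(x, x) = 1, \\
\|\phi(x_k) - \phi(v_i)\|^2
  &= 2\bigl(1 - K(x_k, v_i)\bigr).
\end{align}
```

Because the prototypes stay in the input space under this assumption, they remain interpretable as points in the original attribute space, which is consistent with the abstract's remark about meaningful prototypes.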

12.
In this paper, we show how one can take advantage of the stability and effectiveness of object data clustering algorithms when the data to be clustered are available in the form of mutual numerical relationships between pairs of objects. More precisely, we propose a new fuzzy relational algorithm, based on the popular fuzzy C-means (FCM) algorithm, which does not require any particular restriction on the relation matrix. We describe the application of the algorithm to four real and four synthetic data sets, and show that our algorithm performs better than well-known fuzzy relational clustering algorithms on all these sets.

13.
Dubois and Prade (1990) [1] introduced the notion of fuzzy rough sets as a fuzzy generalization of rough sets, which were originally proposed by Pawlak (1982) [8]. Later, Radzikowska and Kerre introduced the so-called (I,T)-fuzzy rough sets, where I is an implication and T is a triangular norm. In the present paper, by using a pair of implications (I,J), we define the so-called (I,J)-fuzzy rough sets, which generalize the concept of fuzzy rough sets in the sense of Radzikowska and Kerre, and that of Mi and Zhang. Basic properties of (I,J)-fuzzy rough sets are investigated in detail.

14.
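Fuzzy C-means clustering based on spatial neighborhood weighting and its application   Total citations: 2, self-citations: 0, citations by others: 2
When fuzzy C-means clustering is applied to images, it uses only the gray-level information of pixels and ignores spatial position, which causes misclassification in noisy regions and at boundaries. To address this, a new fuzzy C-means image clustering method based on spatial neighborhood weighting is proposed. First, a spatial neighborhood information function is defined that strongly suppresses noise points while preserving boundary characteristics. Second, a membership matrix weighted by sample neighborhood information under spatial constraints is designed. Finally, the method is applied to clustering synthetic images and simulated MR brain images. Experimental results show that the method achieves good clustering results and strong noise suppression.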

15.
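Most traditional fast clustering algorithms are based on fuzzy C-means (FCM). However, FCM is sensitive to the initial cluster centers and to noisy data and easily converges to local minima, so its clustering accuracy is limited. An algorithmic framework that solves the clustering problem with a divide-and-conquer strategy is established; it takes the characteristics of the data into account and improves the traditional FCM algorithm. Experiments on standard datasets show that this divide-and-conquer FCM clustering algorithm improves clustering accuracy and speeds up convergence.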

16.
Semi-supervised fuzzy clustering: A kernel-based approach   Total citations: 1, self-citations: 0, citations by others: 1
Huaxiang Zhang  Jing Lu 《Knowledge》2009,22(6):477-481
Semi-supervised clustering algorithms aim to improve clustering accuracy under the supervision of a limited amount of labeled data. Since kernel-based approaches, such as the kernel-based fuzzy c-means algorithm (KFCM), have been used successfully in classification and clustering problems, in this paper we propose a novel semi-supervised clustering approach based on KFCM and call it the semi-supervised kernel fuzzy c-means algorithm (SSKFCM). The objective function of SSKFCM is defined by adding the classification errors of both the labeled and the unlabeled data, and its optimum is obtained by repeatedly updating the fuzzy memberships and the optimized kernel parameter. The objective function may have more than one local optimum, so we employ a function transformation technique to reformulate the objective function after a local minimum has been obtained, and select the best optimum as the solution. Experimental results on both artificial and several real data sets show that SSKFCM performs better than its conventional counterparts and achieves the most accurate clustering results when the parameter is optimized.

17.
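Because labeled network behavior data is hard to collect and such data is heterogeneous, a semi-supervised fuzzy clustering algorithm based on heterogeneous distance and sample density is proposed and applied to network intrusion detection. Based on the heterogeneity of the samples, the method computes the heterogeneous distance between each sample and each class as well as the sample density of each class, uses these to compute the fuzzy membership of each sample in each class, labels the unlabeled samples according to the resulting memberships, and obtains the corresponding classifier. Simulation experiments on the KDD CUP99 dataset show that the method is feasible and efficient.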

18.
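Research on medical image segmentation based on fuzzy C-means clustering   Total citations: 1, self-citations: 0, citations by others: 1
Building on hard C-means clustering, the fuzzy C-means (FCM) clustering algorithm effectively handles the fuzziness present in medical image segmentation: it defines an objective function expressing the weighted similarity between image pixels and cluster centers, and determines the best clustering by iteratively minimizing that objective. FCM has several shortcomings: it segments large datasets slowly, its results depend on the initial values, it is sensitive to noise, and it adapts poorly to varied data distributions. Many improved algorithms have emerged in response. This paper surveys some of them, mainly fast FCM algorithms, FCM with initial-value selection, FCM with spatial neighborhood information, and kernel-based FCM, and briefly summarizes their advantages and disadvantages. Directions for further research on the algorithm are pointed out.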
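The iterative minimization described in this survey abstract is the standard FCM alternation between center updates and membership updates. The sketch below is a minimal NumPy version of that textbook scheme, with the random membership initialization whose influence on the result the abstract lists as a weakness. The function name, the fuzzifier default m = 2, the tolerance, and the toy data are assumptions made for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-means. Returns memberships U (n, c) and centers V (c, d)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the samples.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Two well-separated synthetic blobs stand in for pixel features (made-up data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, V = fcm(X, c=2)
labels = U.argmax(axis=1)  # hard labels obtained by defuzzification
```

For image segmentation, `X` would hold one row per pixel (for example its gray level), and the improved variants surveyed in the abstract change the initialization, the distance, or the membership update of this loop.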

19.
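Weighted fuzzy C-means clustering image segmentation based on fast two-dimensional entropy   Total citations: 1, self-citations: 0, citations by others: 1
An image segmentation method combining fast two-dimensional entropy with weighted fuzzy C-means clustering is proposed. A fast two-dimensional entropy algorithm first performs a preliminary segmentation of the image to obtain the centers of the object and the background. The difference between each sample pixel and the gray levels of its neighboring pixels then characterizes how strongly that sample influences classification. Finally, weighted fuzzy C-means clustering completes the segmentation. The method resolves the sensitivity of traditional fuzzy C-means to initial values and overcomes the tendency of traditional clustering algorithms to partition the dataset into equal-sized parts. Experimental results show that the method converges well and effectively separates the object from the background, giving it practical value.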

20.
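Based on a comprehensive analysis of the standard fuzzy C-means clustering algorithm and the conditional fuzzy C-means clustering algorithm, the fuzzy partition space is modified and the constraints on the fuzzy partition matrix are further relaxed, yielding an extended conditional fuzzy C-means clustering algorithm. The partition matrix and prototypes of the algorithm do not depend on the context constraint or on the sum of memberships in the fuzzy partition matrix. Experimental results show that the algorithm can obtain different cluster prototypes and clusters well.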
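As background for the constraints this abstract refers to, the two starting points can be written side by side. The formulas below are the usual textbook constraints for standard FCM and for conditional FCM, in which a context value f_k in [0, 1] replaces the unit sum; the notation (u_ik, d_ik, m, c, f_k) is chosen here, and the further relaxation introduced by the extended algorithm is not given in the abstract, so it is not reproduced.

```latex
\begin{align}
\text{standard FCM:} \quad & \sum_{i=1}^{c} u_{ik} = 1, &
  u_{ik} &= \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}}, \\
\text{conditional FCM:} \quad & \sum_{i=1}^{c} u_{ik} = f_k, &
  u_{ik} &= \frac{f_k}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}}.
\end{align}
```

Here u_ik is the membership of sample k in cluster i and d_ik is its distance to prototype i; setting f_k = 1 for every sample recovers standard FCM.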
