18 similar documents found (search time: 171 ms)
1.
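Traditional clustering methods are mostly based on distance or on similarity between samples, which requires the data under analysis to be quantitative. In data mining, however, there is a large amount of qualitative data for which traditional cluster analysis is no longer feasible, so a clustering method that can effectively handle qualitative data is needed. Rough sets are an effective tool for handling qualitative data. After describing the relevant rough set concepts in detail, the paper uses the concept of attribute significance to propose a clustering method that can effectively handle qualitative data, and an empirical analysis of the method on data yields good results.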
2.
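《软件》 (Software), 2019, (9): 156-163
Rough set theory is a relatively new mathematical tool for handling vague and uncertain knowledge. It is good at analyzing the facts hidden in data without requiring any additional knowledge about the data. Rough set theory not only provides new scientific logic and research methods for information science and cognitive science, but also provides effective techniques for intelligent information processing. Clustering, as a module of a data mining system, can serve either as a stand-alone tool for discovering deep information about the distribution of data in a database, or as a preprocessing step for other data mining algorithms. Fuzzy clustering algorithms ignore the uncertainty of cluster boundaries and the problem of complex data, which leads to unsatisfactory clustering results. This paper proposes combining rough sets with a fuzzy clustering algorithm, using the rough set concepts of upper and lower approximation sets to obtain a similarity measure that improves the fuzzy clustering algorithm. Experiments show that the improved algorithm achieves better clustering results.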
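The upper and lower approximation sets mentioned in this abstract can be illustrated with a minimal sketch. It follows the textbook rough set definitions, not the paper's own similarity measure (which the abstract does not specify); the toy objects, the attribute names, and the function names `equivalence_classes` and `approximations` are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids whose values agree on every listed attribute."""
    classes = defaultdict(set)
    for obj_id, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximations of `target`, a set of ids."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy data: four objects described by two categorical attributes.
objects = {
    1: {"color": "red", "size": "small"},
    2: {"color": "red", "size": "small"},
    3: {"color": "blue", "size": "large"},
    4: {"color": "blue", "size": "small"},
}
lower, upper = approximations(objects, ["color", "size"], {1, 3})
boundary = upper - lower  # objects that cannot be classified with certainty
```

Objects 1 and 2 are indiscernible on these attributes, so a target set containing only object 1 cannot be described exactly: both fall in the boundary region, while object 3 belongs to the lower approximation.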
3.
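A Rough-Set-Based Clustering Algorithm for Mixed-Attribute Data   Total citations: 2, self-citations: 0, citations by others: 2
Traditional clustering methods assign each object strictly to a single class, but boundary objects often cannot be assigned strictly. The rough-set-based k-means and rough-set-based leader clustering algorithms use rough set theory to place data objects in the upper or lower approximation set of a cluster, offering a new perspective on handling uncertainty and resolving this boundary-uncertainty problem well. Their drawbacks are that they cannot handle mixed-attribute data and that the clustering results depend markedly on the initial values. To address these shortcomings, the paper gives a distance definition suited to mixed-attribute data, proposes an improved way of selecting initial values, and proposes a rough-set-based clustering algorithm for mixed-attribute data. Simulation experiments show that, when the number of clusters is not known in advance, the algorithm's clustering accuracy is markedly higher than that of the traditional k-means algorithm.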
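The abstract does not state the paper's distance definition. As an illustration of what a distance over mixed-attribute data can look like, the sketch below uses a k-prototypes-style measure (squared Euclidean distance on numeric attributes plus a weighted mismatch count on categorical attributes). This is a swapped-in, commonly used technique, and the function name, the attribute names, and the weight `gamma` are assumptions.

```python
def mixed_distance(x, y, numeric_keys, categorical_keys, gamma=1.0):
    """k-prototypes-style dissimilarity (an assumption, not the paper's definition).

    Numeric attributes contribute squared differences; each categorical
    mismatch contributes `gamma`.
    """
    numeric_part = sum((x[k] - y[k]) ** 2 for k in numeric_keys)
    mismatches = sum(1 for k in categorical_keys if x[k] != y[k])
    return numeric_part + gamma * mismatches


# Hypothetical records; numeric attributes are assumed to be pre-scaled to [0, 1].
a = {"age": 0.2, "income": 0.5, "city": "A", "job": "x"}
b = {"age": 0.5, "income": 0.1, "city": "A", "job": "y"}
d = mixed_distance(a, b, ["age", "income"], ["city", "job"], gamma=0.5)
```

Here the numeric part is 0.09 + 0.16 = 0.25 and there is one categorical mismatch, so the distance is about 0.75. The weight `gamma` controls how much a categorical mismatch counts relative to numeric differences.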
4.
5.
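A Rough Clustering Algorithm Based on Data Fields   Total citations: 2, self-citations: 1, citations by others: 1
Cluster analysis is a research hotspot in data mining. Traditional clustering algorithms assign each object to exactly one cluster, so the boundaries between classes are crisp. With the development of Web mining, clustering algorithms that assign every object precisely face great challenges. Drawing on the ability of data field theory and classical rough set theory to handle imprecise and uncertain data, the paper proposes a new rough clustering algorithm based on data fields. The algorithm uses potential values as the basis for assigning objects, avoiding the Euclidean-distance-based assignment that traditional rough clustering algorithms have consistently used. It first partitions the data objects coarsely and then refines the partition iteratively until stable clusters form. In the experiments, the proposed algorithm is compared with the rough K-means and rough K-medoids algorithms; the results show that it clusters overlapping datasets well and converges quickly.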
6.
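A K-Means Clustering Algorithm Based on Rough Sets   Total citations: 5, self-citations: 0, citations by others: 5
冯征 (Feng Zheng). 《计算机工程与应用》 (Computer Engineering and Applications), 2006, 42(20): 141-142, 146
In traditional hard clustering, the data objects in each resulting cluster are determinate; in the real world, however, boundary data cannot be assigned accurately to any single cluster. Rough sets are a tool for handling this boundary uncertainty. On this basis, the paper proposes a K-Means clustering algorithm based on rough sets; the clusters it generates consist of an upper approximation set and a lower approximation set, so boundary objects can be handled. Experiments show that the algorithm is effective.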
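The idea of clusters with lower and upper approximation sets can be sketched as one iteration of a generic rough k-means scheme. This is not necessarily the procedure of the paper above: the distance-ratio test, the `threshold` value, and the weight `w_lower` are assumptions, and all function names are hypothetical.

```python
import math


def rough_assign(points, centroids, threshold=1.3):
    """Assign each point to lower/upper approximations of the clusters.

    A point whose distance to some other centroid is within `threshold`
    times its nearest distance is treated as a boundary point: it joins
    the upper approximations of all such clusters and no lower one.
    """
    k = len(centroids)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for idx, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        close = [j for j in range(k)
                 if j != nearest and dists[j] <= threshold * dists[nearest]]
        if close:
            for j in [nearest] + close:
                upper[j].add(idx)
        else:
            lower[nearest].add(idx)
            upper[nearest].add(idx)  # lower is always a subset of upper
    return lower, upper


def mean_point(points, ids):
    """Coordinate-wise mean of the selected points, or None if empty."""
    if not ids:
        return None
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in ids) / len(ids) for d in range(dim))


def update_centroids(points, lower, upper, centroids, w_lower=0.7):
    """Weighted mix of the lower-approximation mean and the boundary mean."""
    new_centroids = []
    for j, old in enumerate(centroids):
        low = mean_point(points, lower[j])
        bnd = mean_point(points, upper[j] - lower[j])
        if low is not None and bnd is not None:
            new = tuple(w_lower * a + (1 - w_lower) * b for a, b in zip(low, bnd))
        elif low is not None:
            new = low
        elif bnd is not None:
            new = bnd
        else:
            new = old  # empty cluster: keep the previous centroid
        new_centroids.append(new)
    return new_centroids


# Hypothetical toy data: two tight groups and one point midway between them.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
centroids = [(0, 0.5), (10, 0.5)]
lower, upper = rough_assign(points, centroids)
new_centroids = update_centroids(points, lower, upper, centroids)
```

In this toy run the midway point (index 4) lands in the upper approximation of both clusters and in neither lower approximation, so it pulls both centroids toward it with the smaller weight `1 - w_lower`.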
7.
8.
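The paper proposes an image clustering method that combines rough set methods with the fuzzy C-means (FCM) clustering algorithm. Exploiting the strengths of rough set theory in handling large volumes of data and eliminating redundant information, the method reduces the amount of training data for FCM and overcomes its slow processing on large datasets. At the same time, it exploits the good clustering performance of FCM by clustering on the reduced minimal attribute subset, yielding fast, accurate, and robust image clustering. Clustering experiments on face images gave good results.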
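For reference, the membership step of standard FCM can be sketched as below. The sketch covers only the textbook membership formula with fuzzifier `m`; it does not include the rough set attribute reduction described in the abstract, and the function name and toy values are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # The point coincides with one or more centers: share full
            # membership among those centers to avoid division by zero.
            n = sum(zeros)
            memberships.append([1.0 / n if z else 0.0 for z in zeros])
            continue
        row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
               for d_i in dists]
        memberships.append(row)
    return memberships


# Hypothetical toy data: one point at distance 1 and 3 from two centers.
u = fcm_memberships([(1.0, 0.0)], [(0.0, 0.0), (4.0, 0.0)], m=2.0)
```

With `m = 2` the memberships are inversely proportional to squared distance, so the point above receives memberships of about 0.9 and 0.1, which sum to 1.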
9.
10.
11.
Pixel clustering in the spectral domain is an important approach for the soft-tissue categorization of magnetic resonance (MR) brain images. In this regard, clustering algorithms based on type-1 fuzzy set theory are suitable for overlapping partitions, while rough-set-based clustering algorithms deal with uncertainty and vagueness. However, an additional degree of fuzziness makes clustering more challenging in the presence of various subtle uncertainties and noisy data in the overlapping areas. This motivates us to propose a hybrid technique, called Rough Possibilistic Type-2 Fuzzy C-Means clustering with the integration of Random Forest. In the proposed method, the possibilistic approach handles noisy data better, whereas the other uncertainties and inherent vagueness are handled by type-2 fuzzy set and rough set theories. After clustering, the method produces rough and crisp points. The crisp points are then used to train the Random Forest classifier, which classifies the rough points to yield a better clustering solution. The performance of the proposed method has been demonstrated in comparison with several other recently proposed methods for MR brain image segmentation. Finally, the superiority of the results produced by the proposed hybrid method has also been validated through a statistical significance test.
12.
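Semi-supervised clustering improves clustering performance by making effective use of a small amount of supervision within unsupervised learning. The paper proposes a semi-supervised clustering algorithm based on seeds sets. It applies the Apriori algorithm to mine frequent itemsets from the initial seeds set and from the enlarged seeds set, so that noisy and mislabeled data are cleaned and corrected, improving the quality of the seeds set and the clustering performance. The algorithm uses a weighted χ² test as the measure for classification rules in order to predict class labels for unlabeled data. Experimental results show that the proposed seeds-set-based semi-supervised clustering algorithm, which combines frequent itemset mining with the weighted χ² test, not only improves seeds-set quality but also raises prediction accuracy and improves clustering performance.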
13.
Matthias Luber Kai O. Arras Christian Plagemann Wolfram Burgard 《Autonomous Robots》2009,26(2-3):141-151
For robots operating in real-world environments, the ability to deal with dynamic entities such as humans, animals, vehicles, or other robots is of fundamental importance. The variability of dynamic objects, however, is large in general, which makes it hard to manually design suitable models for their appearance and dynamics. In this paper, we present an unsupervised learning approach to this model-building problem. We describe an exemplar-based model for representing the time-varying appearance of objects in planar laser scans as well as a clustering procedure that builds a set of object classes from given observation sequences. Extensive experiments in real environments demonstrate that our system is able to autonomously learn useful models for, e.g., pedestrians, skaters, or cyclists without being provided with external class information.
14.
Interval Set Clustering of Web Users with Rough K-Means   Total citations: 1, self-citations: 0, citations by others: 1
Data collection and analysis in web mining face certain unique challenges. Due to a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical in a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
15.
16.
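Cluster analysis is an important research topic in data mining. In many practical applications the data to be clustered are very high-dimensional; document data and gene microarray data, for example, can reach thousands of dimensions, and data are sparsely distributed in such high-dimensional spaces. As a result, many classical clustering algorithms that work well on low-dimensional data often fail on high-dimensional data. To address this problem, the paper proposes a new clustering method for high-dimensional data based on a genetic algorithm. The method uses the global search capability of the genetic algorithm to search the feature space for effective clustering feature subspaces. In addition, to examine the role of individual feature dimensions in subspace clustering, the paper designs a fitness function based on the contribution rate of each feature dimension to subspace clustering. Experimental results on synthetic and real data, together with comparative experiments against the k-means algorithm, demonstrate the feasibility and effectiveness of the method.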
17.
Clustering consists in partitioning a set of objects into disjoint and homogeneous clusters. For many years, clustering methods have been applied in a wide variety of disciplines and scientific areas. Traditionally, clustering methods deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, commercial and scientific databases nowadays usually contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure that is capable of dealing with tree-structured categorical data. It can thus be used to extend the various versions of the very popular k-means clustering algorithm to deal with such data. We discuss how such an extension can be achieved. Moreover, we show empirically that the proposed dissimilarity measure is accurate compared to other well-known (dis)similarity measures for categorical data.
18.
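To address the clustering of complex and noisy datasets, the paper proposes a grid sorting strategy based on local density (GSS-LD) and uses it as the organizing scheme for grid clustering. On the one hand, GSS-LD uses the local nature of clusters to sort grid cells, turning the grid-based clustering problem into a grid sorting problem. On the other hand, it uses the concept of the relative rate of change of local density to overcome the limitation of global parameters in traditional grid clustering algorithms, making it suitable for clustering multi-density datasets. The clustering performance of GSS-LD is tested on three groups of datasets with different topological structures and compared with two other methods. The results show that GSS-LD clusters complex datasets effectively, that its time complexity is linear in both the data size and the grid structure, and that it handles noise well.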