Similar literature
1.
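To address the discretization of continuous attribute values, an improved self-organizing map (SOM) clustering discretization algorithm is proposed. The algorithm first uses SOM to perform an initial clustering, which sets an upper bound on the number of clusters. It then takes the initial cluster centers as samples and clusters them a second time with the hierarchical algorithm BIRCH (balanced iterative reducing and clustering using hierarchies), which corrects the inflated cluster count and determines the set of discretization breakpoints. Finally, for each sample in the breakpoint set, it finds the nearest neighbor among the cluster centers in that sample's dimension and uses it as the basis for fine-tuning the discretization. Experimental results show that the algorithm outperforms the traditional SOM clustering discretization algorithm both in the number of breakpoints (silhouette coefficient improved by 75%) and in discretization accuracy (inconsistency closer to 0), and that it can effectively discretize large-sample, high-dimensional data.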

2.
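The self-organizing map (SOM) is a neural network method based on competitive unsupervised learning. SOM networks have been widely applied to pattern clustering, pattern recognition, topology-preserving mapping, and related tasks. This paper uses SOM to cluster real data on 46 faculty members at a university who were promoted to associate professor in 2012, and builds a decision model for academic title evaluation. First, influencing indicators are selected, such as the number of SCI/EI papers, first-tier core journal papers, and second-tier core journal papers, and used as the input patterns of the SOM network. The data are then clustered with SOM. Finally, the clustering results are analyzed to derive the characteristics and rank of each cluster. Experimental results show that clustering faculty title data with SOM is feasible and effective: it avoids subjective human factors and yields clustering results more quickly and objectively. The approach offers a new reference for faculty title evaluation and has good application prospects.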

3.
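The core problem in text clustering is to find an optimized algorithm for clustering text vectors, which is a typical high-dimensional clustering task. This paper proposes TCBSA, a two-stage text clustering algorithm based on the self-organizing neural network SOM and the artificial immune network aiNet. The new algorithm first clusters with the SOM network, mapping the high-dimensional text data onto a two-dimensional plane, and then clusters the texts with aiNet. The method exploits the SOM network's strength in reducing the dimensionality of high-dimensional data and thereby overcomes the artificial immune network's poor clustering ability on such data. Simulation results show that the algorithm is feasible, has some adaptive capability, and clusters well.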

4.
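A software component classification method based on SOM clustering   Cited by: 1 (self-citations: 0, citations by others: 1)
Faceted classification is widely adopted by major software component library systems. However, the traditional faceted method requires a large term space to be built and maintained by hand, which increases the work of building a library and adding components to it. Clustering based on the SOM neural network enables automatic classification of software components without building a term space. In addition, SOM clustering has drawbacks: the topology must be fixed in advance, and the clustering result depends on the order of the input samples. Taking these drawbacks and the characteristics of software components into account, the SOM training process is modified to meet the requirements of software component clustering.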

5.
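Image segmentation based on the SOM neural network and the K-means algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
An image segmentation algorithm based on the SOM neural network and K-means is proposed. The SOM network maps multidimensional data onto a regular low-dimensional grid and can be used effectively for mining large data sets, whereas K-means is a dynamic clustering algorithm suited to small and medium-sized data sets. The algorithm uses the SOM network to map pixels with similar features onto a 2-D neural network and then, based on the similarity between neurons, clusters the neurons with K-means. The algorithm is applied to color image segmentation. After the initial SOM clustering, segmentation results obtained by clustering the neurons with different values of K are presented and compared with segmentation by K-means alone.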
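The two-stage idea in this abstract (SOM first, then K-means on the neurons) can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the linearly decaying learning rate and neighbourhood width, the Gaussian neighbourhood, and every function name (`train_som`, `kmeans`, `segment`) are assumptions chosen for brevity, and pixels are assumed to be plain tuples of colour values.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vectors, x):
    """Index of the vector in `vectors` closest to x."""
    return min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], x))

def train_som(data, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM and return its neuron weight vectors.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Initialise each neuron from a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            decay = 1.0 - step / total
            lr = lr0 * decay
            sigma = sigma0 * decay + 1e-3
            bmu = nearest(weights, x)  # best-matching unit
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured on the 2-D neuron grid.
                h = math.exp(-sq_dist(grid[i], grid[bmu]) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(centers, p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(centers, p) for p in points]

def segment(pixels, k, **som_args):
    """Label each pixel with the K-means cluster of its best-matching neuron."""
    codebook = train_som(pixels, **som_args)
    neuron_labels = kmeans(codebook, k)
    return [neuron_labels[nearest(codebook, p)] for p in pixels]
```

In this sketch K-means only ever sees the handful of neuron weight vectors rather than every pixel, which is the division of labour the abstract describes: SOM handles the large data set, K-means the small one.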

6.
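Improvement and application of the self-organizing feature map neural network   Cited by: 2 (self-citations: 0, citations by others: 2)
To improve the learning speed and classification accuracy of the self-organizing feature map (SOM) neural network, the methods for determining the initial connection weights and the number of competitive-layer neurons are improved. A new method that determines the initial weights by clustering is proposed, together with a method that sets the number of competitive-layer neurons to the sum of the number of clusters and the neighborhood size, and the improved SOM classification algorithm is given. The improved SOM network is applied to the classification of stored-grain pests and validated with leave-one-out experiments. Simulation results show that the improved SOM network learns markedly faster and classifies markedly more accurately, which demonstrates the effectiveness of the method.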

7.
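Parallel coordinates are an important analysis technique in information visualization that can display multidimensional data in two-dimensional space. To give users a fast and convenient tool for visualizing and analyzing financial data, a financial data visualization method based on gravitational-field clustering is proposed. First, a self-organizing map (SOM) classifies the initial financial data so that each class carries a specific economic meaning. Visual clustering is then performed: following the gravitational-field principle, the polylines within each class are drawn together while different classes repel each other, and the visualization is further enhanced by setting opacity and by interactive operations. Experimental results show that the method produces clear visual clusters and makes patterns of change in the data easier to discover.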

8.
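Using the principle of fuzzy clustering (the SOM neural network algorithm), this paper proposes an architecture for a personalized Web information retrieval system, comprising fuzzy clustering of user profiles and fuzzy clustering of Web information, and describes how each is implemented.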

9.
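Kernel methods can cluster data in a high-dimensional feature space, but they lack an intuitive description of the cluster centers and results in the original input space. A kernel self-organizing map competitive clustering algorithm is proposed. The algorithm uses the properties of the kernel to derive the winning neuron and the weight update rule of the SOM algorithm, while the competitive learning mechanism remains in the original input space. This addresses the drop in classification ability when the distribution of the input samples is highly nonlinear. It also addresses a problem of the Donald [1] algorithm, in which the winning neuron in the feature space has no pre-image in the original input space, so the clustering result cannot be interpreted with visualization techniques. Experimental results show that on some data sets the proposed algorithm obtains better results than the SOM algorithm.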

10.
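覃晓, 元昌安. 《计算机应用》 (Journal of Computer Applications), 2008, 28(3): 757-760
As an unsupervised learning algorithm for clustering and high-dimensional visualization, the self-organizing map (SOM) is a powerful tool for clustering Chinese Web documents. However, the SOM algorithm is inherently sensitive to the initial network weights, which degrades clustering quality. A genetic algorithm is therefore introduced to optimize the SOM network, and a text clustering algorithm based on the genetically optimized SOM network (GSTCA) is proposed. Comparative experiments show that GSTCA achieves higher accuracy than the SOM algorithm when clustering Chinese Web documents, with the F-measure improved by 14% on average. The experiments also show that GSTCA is insensitive to the initial network weights, which improves the stability of the algorithm.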

11.
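A fuzzy K-Prototypes clustering algorithm for mixed numeric and categorical data   Cited by: 15 (self-citations: 0, citations by others: 15)
陈宁, 陈安, 周龙骧. 《软件学报》 (Journal of Software), 2001, 12(8): 1107-1119
Because databases often contain both numeric and categorical attributes, clustering algorithms that can handle mixed data are important. This paper discusses the clustering of mixed data and proposes a fuzzy K-prototypes algorithm. The algorithm combines the ways K-means and K-modes handle numeric and categorical data, and can therefore process data of mixed types. The fuzzy technique captures the boundary characteristics of clusters and is better suited to databases containing noise and missing data. Experimental results show that the fuzzy algorithm produces more accurate results than the corresponding crisp algorithm.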

12.
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
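The three ingredients named in this abstract (simple matching dissimilarity, frequency-based modes, and a combined measure for mixed data) can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's code: the function names are hypothetical, the numeric part is assumed to use squared Euclidean distance, and the weight `gamma` that balances the numeric and categorical parts is an assumed parameter whose value the abstract does not give.

```python
from collections import Counter

def matching_dissim(a, b):
    """Simple matching dissimilarity: count of categorical attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(rows):
    """Column-wise mode of categorical rows (frequency-based cluster 'center').

    Ties go to the value seen first, which is how Counter orders equal counts.
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def mixed_dissim(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Combined measure: squared Euclidean distance on the numeric part
    plus gamma times the number of categorical mismatches.
    gamma is an assumed weighting parameter, not a value from the abstract."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissim(cat_a, cat_b)
```

For example, `mode_of([("a", "x"), ("a", "y"), ("b", "y")])` yields `("a", "y")`, the most frequent value in each column, which takes the place of the mean in a k-means style update.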

13.
Clustering is one of the most popular techniques in data mining. The goal of clustering is to identify distinct groups in a dataset. Many clustering algorithms have been published so far, but they are often limited to either numeric or categorical data. However, most real world data are mixed, containing both numeric and categorical attributes. In this paper, we propose a clustering algorithm CAVE which is based on variance and entropy, and is capable of mining mixed data. The variance is used to measure the similarity of the numeric part of the data. To express the similarity between categorical values, distance hierarchy has been proposed. Accordingly, the similarity of the categorical part is measured based on entropy weighted by the distances in the hierarchies. A new validity index for evaluating the clustering results has also been proposed. The effectiveness of CAVE is demonstrated by a series of experiments on synthetic and real datasets in comparison with that of several traditional clustering algorithms. An application of mining a mixed dataset for customer segmentation and catalog marketing is also presented.

14.
Generalizing self-organizing map for categorical data   Cited by: 1 (self-citations: 0, citations by others: 1)
The self-organizing map (SOM) is an unsupervised neural network which projects high-dimensional data onto a low-dimensional grid and visually reveals the topological order of the original data. Self-organizing maps have been successfully applied to many fields, including engineering and business domains. However, the conventional SOM training algorithm handles only numeric data. Categorical data are usually converted to a set of binary data before training of an SOM takes place. If a simple transformation scheme is adopted, the similarity information embedded between categorical values may be lost. Consequently, the trained SOM is unable to reflect the correct topological order. This paper proposes a generalized self-organizing map model that offers an intuitive method of specifying the similarity between categorical values via distance hierarchies and, hence, enables the direct processing of categorical values during training. In fact, distance hierarchy unifies the distance computation of both numeric and categorical values. The unification is done by mapping the values to distance hierarchies and then measuring the distance in the hierarchies. Experiments on synthetic and real datasets were conducted, and the results demonstrated the effectiveness of the generalized SOM model.
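To make the idea of measuring distance in a hierarchy concrete, the sketch below computes the distance between two categorical values as the total link weight on the tree path that joins them. It is a simplified reading of the abstract, not the paper's full model: it handles only leaf values in a single-rooted tree, and the drink hierarchy, the unit link weights, and all names (`PARENT`, `LINK_WEIGHT`, `hierarchy_distance`) are hypothetical.

```python
# Hypothetical concept hierarchy: child -> parent.
PARENT = {
    "Coke": "Carbonated", "Pepsi": "Carbonated",
    "Mocha": "Coffee", "Latte": "Coffee",
    "Carbonated": "Any", "Coffee": "Any",
}
# Assumed weight of the link from each node up to its parent.
LINK_WEIGHT = {node: 1.0 for node in PARENT}

def path_to_root(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def hierarchy_distance(a, b, parent, link_weight):
    """Total link weight on the path between a and b via their lowest common ancestor.

    Assumes both values belong to the same single-rooted hierarchy.
    """
    above_b = set(path_to_root(b, parent))
    lca = next(n for n in path_to_root(a, parent) if n in above_b)

    def cost_up_to_lca(node):
        cost = 0.0
        while node != lca:
            cost += link_weight[node]
            node = parent[node]
        return cost

    return cost_up_to_lca(a) + cost_up_to_lca(b)
```

With these assumed weights, Coke and Pepsi are at distance 2.0 while Coke and Mocha are at distance 4.0. A plain binary (one-hot) encoding would place every pair of distinct values at the same distance, which is the loss of similarity information the abstract points out.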

15.
The Self-Organizing Map (SOM) is effective at visualizing high-dimensional data and therefore has numerous applications in visualized clustering. Many growing SOMs have been proposed to overcome the constraint of having a fixed map size in conventional SOMs. However, most growing SOMs lack a robust solution to process mixed-type data which may include numeric, ordinal and categorical values in a dataset. Moreover, the growing scheme has an impact on the quality of resultant maps. In this paper, we propose a Growing Mixed-type SOM (GMixSOM), combining a value representation mechanism, the distance hierarchy, with a novel growing scheme to tackle the problem of analyzing mixed-type data and to improve the quality of the projection map. Experimental results on synthetic and real-world datasets demonstrate that the proposed mechanism is feasible and the growing scheme yields better projection maps than the existing method.

16.
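余泽. 《计算机系统应用》 (Computer Systems & Applications), 2014, 23(12): 125-130
Clustering data with mixed attributes has been an active research topic in recent years. A clustering algorithm for such data must handle both numeric and categorical attributes well, yet many existing algorithms do not balance the two kinds of attributes properly and so fail to produce satisfactory clusterings. For mixed attributes, an intersection-based cluster ensemble algorithm is proposed. It processes the numeric attributes separately with a relative-density-based algorithm and the categorical attributes with an information-entropy-based algorithm, then merges the two base clusterings with an intersection-based fusion step to obtain the final result. The algorithm is validated on the UCI Zoo data set and compared with the existing k-prototypes and EM algorithms; its clustering accuracy is higher than that of both. The effect of the intersection element ratio used in the fusion step on the results is also discussed.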

17.
K-means type clustering algorithms for mixed data that consist of numeric and categorical attributes suffer from the cluster center initialization problem. The final clustering results depend upon the initial cluster centers. Random cluster center initialization is a popular initialization technique. However, clustering results are not consistent across different cluster center initializations. The K-Harmonic means clustering algorithm tries to overcome this problem for pure numeric data. In this paper, we extend the K-Harmonic means clustering algorithm to mixed datasets. We propose a definition for a cluster center and a distance measure. These cluster centers and the distance measure are used with the cost function of the K-Harmonic means clustering algorithm in the proposed algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. Results suggest that the proposed clustering algorithm is quite insensitive to the cluster center initialization problem. Comparative studies with other clustering algorithms show that the proposed algorithm produces better clustering results.
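As background for the cost function this abstract refers to, the sketch below evaluates the usual K-Harmonic means objective for purely numeric data: for each point, the number of centers divided by the sum of inverse powered distances to all centers. It uses ordinary Euclidean distance, not the mixed-data distance the paper proposes, and the function name, the default power `p`, and the `eps` guard against division by zero are assumptions.

```python
import math

def khm_cost(points, centers, p=2.0, eps=1e-12):
    """K-Harmonic means cost: sum over points of k / sum_j (1 / d(x, c_j)**p).

    Numeric data only; d is Euclidean distance, clamped below by eps
    (an assumed safeguard for points that coincide with a center).
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inverse_sum = 0.0
        for c in centers:
            d = max(math.dist(x, c), eps)
            inverse_sum += 1.0 / d ** p
        total += k / inverse_sum
    return total
```

Because each point's term is a harmonic average over all centers, every center contributes to every point's cost, which is the property usually credited with reducing sensitivity to where the centers start.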

18.
Self-Organizing Map (SOM) networks have been successfully applied as a clustering method to numeric datasets. However, it is not feasible to directly apply SOM for clustering transactional data. This paper proposes the Transactions Clustering using SOM (TCSOM) algorithm for clustering binary transactional data. In the TCSOM algorithm, a normalized Dot Product norm based dissimilarity measure is utilized for measuring the distance between the input vector and an output neuron, and a modified weight adaptation function is employed for adjusting the weights of the winner and its neighbors. More importantly, TCSOM is a one-pass algorithm, which makes it extremely suitable for data mining applications. Experimental results on real datasets show that the TCSOM algorithm is superior to state-of-the-art transactional data clustering algorithms with respect to clustering accuracy.

19.
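The fuzzy K-Prototypes (FKP) algorithm combines the ways K-Means and K-Modes handle numeric and symbolic data, and is suited to clustering data of mixed types. In addition, the fuzzy technique makes FKP suitable for databases containing noise and missing data. However, when FCM (the Fuzzy C-Means algorithm) or FKP is used, how to choose the weighting exponent α remains an open question. Based on their experimental results, many researchers have suggested that the best weighting exponent in FCM probably lies in the interval [1.5, 2.5]. This paper proposes an algorithm for searching for the weighting exponent in FKP. Experimental results on several real data sets show that, for effective clustering, the weighting exponent in FKP should be less than 1.5.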
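The role of the weighting exponent α can be seen in the membership update commonly used by fuzzy c-means style algorithms, sketched below for a single object. The formula assumed here is u_j = 1 / Σ_l (d_j / d_l)^(1/(α-1)), where d_j is the object's dissimilarity to prototype j (squared distance in the FCM case); the function name and the handling of zero dissimilarities are assumptions, and the sketch does not reproduce the search procedure the paper proposes.

```python
def memberships(dissims, alpha=2.0):
    """Fuzzy memberships of one object given its dissimilarity to each prototype.

    alpha is the weighting exponent and must be greater than 1.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    zeros = [i for i, d in enumerate(dissims) if d == 0]
    if zeros:
        # The object coincides with a prototype: give it full membership there.
        return [1.0 if i == zeros[0] else 0.0 for i in range(len(dissims))]
    exponent = 1.0 / (alpha - 1.0)
    return [1.0 / sum((d_j / d_l) ** exponent for d_l in dissims)
            for d_j in dissims]
```

For dissimilarities 1 and 3, α = 2 gives memberships of 0.75 and 0.25, whereas α = 1.5 gives 0.9 and 0.1: a smaller exponent makes the partition crisper, which is why the choice of α affects clustering quality.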

20.
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real life data mining applications. In this article, we present two algorithms that extend the Squeezer algorithm to domains with mixed numeric and categorical attributes. The performance of the two algorithms has been studied on real and artificially generated datasets. Comparisons with other clustering algorithms illustrate the superiority of our approaches. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 1077–1089, 2005.
