首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 10 毫秒
1.
2.
The self-organizing map (SOM) is a powerful method for visualization, cluster extraction, and data mining. It has been used successfully for data of high dimensionality and complexity where traditional methods may often be insufficient. In order to analyze data structure and capture cluster boundaries from the SOM, one common approach is to represent the SOM's knowledge by visualization methods. Different aspects of the information learned by the SOM are presented by existing methods, but data topology, which is present in the SOM's knowledge, is greatly underutilized. We show in this paper that data topology can be integrated into the visualization of the SOM and thereby provide a more elaborate view of the cluster structure than existing schemes. We achieve this by introducing a weighted Delaunay triangulation (a connectivity matrix) and draping it over the SOM. This new visualization, CONNvis, also shows both forward and backward topology violations along with the severity of forward ones, which indicate the quality of the SOM learning and the data complexity. CONNvis greatly assists in detailed identification of cluster boundaries. We demonstrate the capabilities on synthetic data sets and on a real 8D remote sensing spectral image.  相似文献   

3.
基于涌现自组织映射的聚类分析与可视化处理   总被引:1,自引:0,他引:1  
K0honen自组织特征映射可实现高维模式空间到低维拓扑结构的映射,借此可进行模式聚类分析及高维数据的二维可视化.但当输入样本数目较多、复杂度较大时,采用KSOM将使相邻类簇间发生大面积重叠,降低聚类效果.本文通过利用涌现自组织特征映射神经网络对数据进行聚类分析.并通过无边界u矩阵实现可视化功能.测试结果表明.借助ESOM模型进行数据的聚类分析与可视化在诸多方面表现出优越的性能.  相似文献   

4.
Model-Based Clustering by Probabilistic Self-Organizing Maps   总被引:1,自引:0,他引:1  
In this paper, we consider the learning process of a probabilistic self-organizing map (PbSOM) as a model-based data clustering procedure that preserves the topological relationships between data clusters in a neural network. Based on this concept, we develop a coupling-likelihood mixture model for the PbSOM that extends the reference vectors in Kohonen's self-organizing map (SOM) to multivariate Gaussian distributions. We also derive three expectation-maximization (EM)-type algorithms, called the SOCEM, SOEM, and SODAEM algorithms, for learning the model (PbSOM) based on the maximum-likelihood criterion. SOCEM is derived by using the classification EM (CEM) algorithm to maximize the classification likelihood; SOEM is derived by using the EM algorithm to maximize the mixture likelihood; and SODAEM is a deterministic annealing (DA) variant of SOCEM and SOEM. Moreover, by shrinking the neighborhood size, SOCEM and SOEM can be interpreted, respectively, as DA variants of the CEM and EM algorithms for Gaussian model-based clustering. The experimental results show that the proposed PbSOM learning algorithms achieve comparable data clustering performance to that of the deterministic annealing EM (DAEM) approach, while maintaining the topology-preserving property.  相似文献   

5.
崔鹏  张汝波 《计算机科学》2010,37(7):205-207
半监督聚类是近年来研究的热点,传统的方法是在无监督算法的基础上加入有限的背景知识来提高聚类性能.然而大多数半监督聚类技术都基于邻近或密度,难以处理高维数据,因此必须将约减的特征加入到半监督聚类过程中.为解决此问题,提出了一种新的半监督聚类算法框架.该算法利用样本约束传递性进行预处理,然后将特征投影到低维空间实现降维,最终用半监督算法对约减后的样本进行聚类.通过实验同现行主要降维方法进行了比较,说明此方法能有效地处理高维数据,聚类效果良好.  相似文献   

6.
Self-organizing map (SOM) is an artificial neural network tool that is trained using unsupervised learning to produce a low dimensional representation of the input space, called a map. This map is generally the object of a clustering analysis step which aims to partition the referents vectors (map neurons) into compact and well-separated groups. In this paper, we consider the problem of the clustering SOM using different aspects: partitioning, hierarchical and graph coloring based techniques. Unlike the traditional clustering SOM techniques, which use k-means or hierarchical clustering, the graph-based approaches have the advantage of providing a partitioning of the SOM by simultaneously using dissimilarities and neighborhood relations provided by the map. We present the experimental results of several comparisons between these different ways of clustering.  相似文献   

7.
Web sites contain an ever increasing amount of information within their pages. As the amount of information increases so does the complexity of the structure of the web site. Consequently it has become difficult for visitors to find the information relevant to their needs. To overcome this problem various clustering methods have been proposed to cluster data in an effort to help visitors find the relevant information. These clustering methods have typically focused either on the content or the context of the web pages. In this paper we are proposing a method based on Kohonen’s self-organizing map (SOM) that utilizes both content and context mining clustering techniques to help visitors identify relevant information quicker. The input of the content mining is the set of web pages of the web site whereas the source of the context mining is the access-logs of the web site. SOM can be used to identify clusters of web sessions with similar context and also clusters of web pages with similar content. It can also provide means of visualizing the outcome of this processing. In this paper we show how this two-level clustering can help visitors identify the relevant information faster. This procedure has been tested to the access-logs and web pages of the Department of Informatics and Telecommunications of the University of Athens.  相似文献   

8.
维度灾难、含有噪声数据和输入参数对领域知识的强依赖性,是不确定数据聚类领域中具有挑战性的问题。针对这些问题,基于相似性度量和凝聚层次聚类思想的基础上提出了高维不确定数据高效聚类HDUDEC(High Dimensional Un-certain Data Efficient Clustering)算法。该算法采用一个能够准确表达不确定高维对象之间的相似度的度量函数计算出对象之间的相似度,然后根据相似度阈值自底向上进行聚类分析。实验证明新的算法需要的先验知识较少、可以有效地过滤噪声数据、可以高效的获得任意形状的高维不确定聚类结果。  相似文献   

9.
10.
通常,经典的数据聚类算法在低维情况下是有效的,但随着维数的增加,性能和效率都明显的下降,原因在于数据的复杂度是呈指数增长。本文提出了一个处理高维数据聚类的框架,并分析了该框架的性能。  相似文献   

11.
自组织特征映射神经网络SOM(Self-Organizing Feature Maps)是一种优良的聚类工具,但其存在着一些限制,如需要预先定义网络大小、网络的收敛性较差和结构不灵活等.为了克服这些不足,在自组织神经网络理论的指导下,提出了一种基于生长型自组织神经网络的聚类方法.在无监督的情况下,该方法采用阈值控制的触发机制实现网络中神经元的生长和删除,并通过神经元权值的有效调整,以期得到数据对象的聚类结果.实验以二维空间中的数据对象为输入样本,验证了该方法的有效性和优越性.  相似文献   

12.
自组织映射(SOM)聚类算法的研究   总被引:7,自引:0,他引:7  
余健  郭平 《现代计算机》2007,(3):7-8,33
通过自组织映射神经网络实现的聚类算法能将任意维数的输入信号模式转变为一维或二维的离散映射,以拓扑有序的方式自适应实现这个变换.介绍自组织映射聚类算法的原理,通过实验进行仿真,结果表明自组织映射聚类算法是可行有效的.  相似文献   

13.
On High Dimensional Projected Clustering of Data Streams   总被引:3,自引:0,他引:3  
The data stream problem has been studied extensively in recent years, because of the great ease in collection of stream data. The nature of stream data makes it essential to use algorithms which require only one pass over the data. Recently, single-scan, stream analysis methods have been proposed in this context. However, a lot of stream data is high-dimensional in nature. High-dimensional data is inherently more complex in clustering, classification, and similarity search. Recent research discusses methods for projected clustering over high-dimensional data sets. This method is however difficult to generalize to data streams because of the complexity of the method and the large volume of the data streams.In this paper, we propose a new, high-dimensional, projected data stream clustering method, called HPStream. The method incorporates a fading cluster structure, and the projection based clustering methodology. It is incrementally updatable and is highly scalable on both the number of dimensions and the size of the data streams, and it achieves better clustering quality in comparison with the previous stream clustering methods. Our performance study with both real and synthetic data sets demonstrates the efficiency and effectiveness of our proposed framework and implementation methods.Charu C. Aggarwal received his B.Tech. degree in Computer Science from the Indian Institute of Technology (1993) and his Ph.D. degree in Operations Research from the Massachusetts Institute of Technology (1996). He has been a Research Staff Member at the IBM T. J. Watson Research Center since June 1996. He has applied for or been granted over 50 US patents, and has published over 75 papers in numerous international conferences and journals. He has twice been designated Master Inventor at IBM Research in 2000 and 2003 for the commercial value of his patents. His contributions to the Epispire project on real time attack detection were awarded the IBM Corporate Award for Environmental Excellence in 2003. He has been a program chair of the DMKD 2003, chair for all workshops organized in conjunction with ACM KDD 2003, and is also an associate editor of the IEEE Transactions on Knowledge and Data Engineering Journal. His current research interests include algorithms, data mining, privacy, and information retrieval.Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. He has been working on research into data mining, data warehousing, stream and RFID data mining, spatiotemporal and multimedia data mining, biological data mining, social network analysis, text and Web mining, and software bug mining, with over 300 conference and journal publications. He has chaired or served in many program committees of international conferences and workshops, including ACM SIGKDD Conferences (2001 best paper award chair, 1996 PC co-chair), SIAM-Data Mining Conferences (2001 and 2002 PC co-chair), ACM SIGMOD Conferences (2000 exhibit program chair), International Conferences on Data Engineering (2004 and 2002 PC vice-chair), and International Conferences on Data Mining (2005 PC co-chair). He also served or is serving on the editorial boards for Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, Journal of Computer Science and Technology, and Journal of Intelligent Information Systems. He is currently serving on the Board of Directors for the Executive Committee of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Jiawei has received three IBM Faculty Awards, the Outstanding Contribution Award at the 2002 International Conference on Data Mining, ACM Service Award (1999) and ACM SIGKDD Innovation Award (2004). He is an ACM Fellow (since 2003). He is the first author of the textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann, 2001).Jianyong Wang received the Ph.D. degree in computer science in 1999 from the Institute of Computing Technology, the Chinese Academy of Sciences. Since then, he ever worked as an assistant professor in the Department of Computer Science and Technology, Peking (Beijing) University in the areas of distributed systems and Web search engines (May 1999–May 2001), and visited the School of Computing Science at Simon Fraser University (June 2001–December 2001), the Department of Computer Science at the University of Illinois at Urbana-Champaign (December 2001–July 2003), and the Digital Technology Center and Department of Computer Science and Engineering at the University of Minnesota (July 2003–November 2004), mainly working in the area of data mining. He is currently an associate professor in the Department of Computer Science and Technology, Tsinghua University, Beijing, China.Philip S. Yuis the manager of the Software Tools and Techniques group at the IBM Thomas J. Watson Research Center. The current focuses of the project include the development of advanced algorithms and optimization techniques for data mining, anomaly detection and personalization, and the enabling of Web technologies to facilitate E-commerce and pervasive computing. Dr. Yu,s research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, disk arrays, computer architecture, performance modeling and workload analysis. Dr. Yu has published more than 340 papers in refereed journals and conferences. He holds or has applied for more than 200 US patents. Dr. Yu is an IBM Master Inventor.Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He will become the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering on Jan. 2001. He is an associate editor of ACM Transactions of the Internet Technology and also Knowledge and Information Systems Journal. He is a member of the IEEE Data Engineering steering committee. He also serves on the steering committee of IEEE Intl. Conference on Data Mining. He received an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts”. Philip S. Yu received the B.S. Degree in E.E. from National Taiwan University, Taipei, Taiwan, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University.  相似文献   

14.
高维Turnstile型数据流聚类算法   总被引:4,自引:1,他引:3  
现有数据流聚类算法只能处理Time Series和Cash Register型数据流,并且应用于高维数据流时其精度不甚理想。提出针对高维Turnstile型数据流的子空间聚类算法HT-Stream,算法对数据空间进行网格划分,在线动态维护网格单元信息,采用倾斜时间窗口存储统计信息,根据用户指定时间跨度离线输出聚类结果。基于真实数据集与仿真数据集的实验表明,算法具有良好的适用性和有效性。  相似文献   

15.
一种大规模高维数据快速聚类算法   总被引:1,自引:1,他引:1  
提出了一种面向大规模高维数据的自组织映射聚类算法. 算法通过压缩神经元的特征集合, 仅选择与神经元代表的文档类相关的特征构造神经元的特征向量, 从而减少了聚类时间. 同时由于选取的特征能够将映射到不同神经元的文档类进行有效区分, 避免了无关特征的干扰, 因而提升了聚类的精度. 实验结果表明该方法能够有效加快聚类的速度, 提升聚类的准确度, 达到比较理想的聚类效果.  相似文献   

16.
CABOSFV是一种有效的高维数据聚类算法。针对CABOSFV算法倾向于将数据对象分配到更大的类中这一问题,提出一种拓展差异度的高维数据聚类算法(CABOSFV_D)。该算法引入了调整指数[p],对原始稀疏差异度进行拓展,降低类大小对对象分配的影响;同时用位集的方式实现CABOSFV_D算法,使算法的运算效率明显提升。基于多个UCI标准数据集进行聚类实验,结果表明CABOSFV_D在聚类效果和时间效率上均优于原始CABOSFV算法。  相似文献   

17.
祝琴  高学东  武森  陈敏  陈华 《计算机工程》2010,36(22):13-14
针对CABOSFV聚类算法对数据输入顺序的敏感性问题,提出融合排序思想的高属性维稀疏数据聚类算法,通过计算首次聚类中两两高属性维稀疏数据非零属性取值情况确定所需要计算差异度的集合组合,减小了算法复杂度。应用结果表明,该方法能提高CABOSFV聚类的质量。  相似文献   

18.
Multimedia Tools and Applications - The emergence of IoT and advanced multimedia information systems have undoubtedly created a proliferation of video sensor data. Although diverse machine learning...  相似文献   

19.
针对含有噪声的高维数据的聚类问题,提出一种使用新的距离度量方式的增量式聚类算法ANFCM(c+p)。由于传统的模糊C均值聚类算法对初始化聚类中心比较敏感,所提出的聚类算法将单程FCM的增量机制(称为SpFCM)与FCPM中使用的初始化聚类中心的策略相结合,即将先前数据块的聚类中心附近的几个样本点添加到下一个数据块进行聚类,以避免FCM对噪声的敏感性。此外,所提出的聚类算法使用一种新的改进后的距离度量的同时,使用修正后的约束条件和目标函数。通过以上改进,可以有效区分已知类和未知类在算法中的不同影响程度,并加强类之间的相互影响程度。实验结果表明,该算法对高维噪声数据具有很好的聚类效果和鲁棒性。  相似文献   

20.
In data mining, the usefulness of a data pattern depends on the user of the database and does not solely depend on the statistical strength of the pattern. Based on the premise that heuristic search in combinatorial spaces built on computer and human cognitive theories is useful for effective knowledge discovery, this study investigates how the use of self-organizing maps as a tool of data visualization in data mining plays a significant role in human–computer interactive knowledge discovery. This article presents the conceptual foundations of the integration of data visualization and query processing for knowledge discovery, and proposes a set of query functions for the validation of self-organizing maps in data mining. Received 1 November 1999 / Revised 2 March 2000 / Accepted in revised form 20 October 2000  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号