首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 10 毫秒
1.
概念聚类挖掘方法的客户交易行为分析   总被引:4,自引:0,他引:4  
本文首先介绍数据挖掘的相关概念,再给出了一个在证券行业应用的系统。该系统采用要领聚类的挖掘方法,从客户的交易行为中撮有价值的信息,发现影响客户盈亏的一般性规律。  相似文献   

2.
分析二部图的二元组和概念聚类问题之间的关系,在此基础上结合数据流的特点,提出一种适用于对象属性为布尔型的数据流概念聚类算法。将数据流分段,对每一批到来的数据流,生成局部的近似极大ε二元组集合,对全局的近似极大ε二元组集合进行更新,从而有效地对整个数据流进行聚类。实验结果表明,该算法具有良好的时间效率和空间效率。  相似文献   

3.
一种有效的用于数据挖掘的动态概念聚类算法   总被引:11,自引:0,他引:11  
郭建生  赵奕  施鹏飞 《软件学报》2001,12(4):582-591
概念聚类适用于领域知识不完整或领域知识缺乏时的数据挖掘任务.定义了一种基于语义的距离判定函数,结合领域知识对连续属性值进行概念化处理,对于用分类属性和数值属性混合描述数据对象的情况,提出了一种动态概念聚类算法DDCA(domain-baseddynamicclusteringalgorithm).该算法能够自动确定聚类数目,依据聚类内部属性值的频繁程度修正聚类中心,通过概念归纳处理,用概念合取表达式解释聚类输出.研究表明,基于语义距离判定函数和基于领域知识的动态概念聚类的算法DDCA是有效的.  相似文献   

4.
When tens and even hundreds of schemas are involved in the integration process, criteria are needed for choosing clusters of schemas to be integrated, so as to deal with the integration problem through...  相似文献   

5.
The theory of concept (or Galois) lattices provides a simple and formal approach to conceptual clustering. In this paper we present GALOIS, a system that automates and applies this theory. The algorithm utilized by GALOIS to build a concept lattice is incremental and efficient, each update being done in time at most quadratic in the number of objects in the lattice. Also, the algorithm may incorporate background information into the lattice, and through clustering, extend the scope of the theory. The application we present is concerned with information retrieval via browsing, for which we argue that concept lattices may represent major support structures. We describe a prototype user interface for browsing through the concept lattice of a document-term relation, possibly enriched with a thesaurus of terms. An experimental evaluation of the system performed on a medium-sized bibliographic database shows good retrieval performance and a significant improvement after the introduction of background knowledge.  相似文献   

6.
文本聚类在文本挖掘和信息检索系统中发挥着重要的作用,而词聚类是文本聚类的基础。提出了一种基于混合聚类的中文词聚类方法,它将层次聚类和概念聚类结合起来,以缩短整个聚类时间。首先对预处理后的词集进行初始聚类,然后从每个类中各取一个出现次数最多的词组成新的词集,最后对该词集进行再聚类。实验表明,这种方法有效降低了中文词聚类的时间复杂度。  相似文献   

7.
The automatic discovery of classes of errors that represent misconceptions and other knowledge errors underlying discrepancies in novice behavior is not a trivial task. A novel approach to this problem is described, in which relationships among behavioral discrepancies are analyzed and inductively generalized via an unsupervised, incremental, relational multistrategy conceptual clustering method that takes into account similarities as well as causalities in the data. Performance results on the classification of discrepancy sets and discovery of error classes from discrepancies of buggy PROLOG programs demonstrate the potential of the approach.  相似文献   

8.
文章提出了一种基于算法选择和结果评估的自动聚类方法。对给定数据集,该方法首先通过分析数据集的潜在簇结构,并依据所发现的簇结构为数据集挑选一种合适的备选聚类算法集;然后利用聚类有效性指标对这个算法集的算法聚类结果进行评估,以确保得到高质量聚类结果。实验结果表明该方法能够自动地挑选适合数据集的聚类算法,并获得高质量的聚类结果。  相似文献   

9.
An Improved Algorithm for Online Unit Clustering   总被引:1,自引:0,他引:1  
We revisit the online unit clustering problem in one dimension which we recently introduced at WAOA’06: given a sequence of n points on the line, the objective is to partition the points into a minimum number of subsets, each enclosable by a unit interval. We present a new randomized online algorithm that achieves expected competitive ratio 11/6 against oblivious adversaries, improving the previous ratio of 15/8. This immediately leads to improved upper bounds for the problem in two and higher dimensions as well. A preliminary version of this paper appeared in the Proceedings of the 13th Annual International Computing and Combinatorics Conference (COCOON 2007), LNCS 4598, pp. 383–393.  相似文献   

10.
A Randomized Algorithm for Online Unit Clustering   总被引:1,自引:0,他引:1  
In this paper, we consider the online version of the following problem: partition a set of input points into subsets, each enclosable by a unit ball, so as to minimize the number of subsets used. In the one-dimensional case, we show that surprisingly the naïve upper bound of 2 on the competitive ratio can be beaten: we present a new randomized 15/8-competitive online algorithm. We also provide some lower bounds and an extension to higher dimensions.  相似文献   

11.
In this paper we present an n^ O(k 1-1/d ) -time algorithm for solving the k -center problem in \reals d , under L fty - and L 2 -metrics. The algorithm extends to other metrics, and to the discrete k -center problem. We also describe a simple (1+ɛ) -approximation algorithm for the k -center problem, with running time O(nlog k) + (k/ɛ)^ O(k 1-1/d ) . Finally, we present an n^ O(k 1-1/d ) -time algorithm for solving the L -capacitated k -center problem, provided that L=Ω(n/k 1-1/d ) or L=O(1) . Received July 25, 2000; revised April 6, 2001.  相似文献   

12.
Hush  Don  Scovel  Clint 《Machine Learning》2003,51(1):51-71
This paper studies the convergence properties of a general class of decomposition algorithms for support vector machines (SVMs). We provide a model algorithm for decomposition, and prove necessary and sufficient conditions for stepwise improvement of this algorithm. We introduce a simple rate certifying condition and prove a polynomial-time bound on the rate of convergence of the model algorithm when it satisfies this condition. Although it is not clear that existing SVM algorithms satisfy this condition, we provide a version of the model algorithm that does. For this algorithm we show that when the slack multiplier C satisfies 1/2 C mL, where m is the number of samples and L is a matrix norm, then it takes no more than 4LC 2 m 4/ iterations to drive the criterion to within of its optimum.  相似文献   

13.
每一种聚类算法都有其适合处理的特定分布的数据集.为了给未知分布数据集挑选合适的聚类算法,提出了一种挑选聚类算法的网格连通图方法 SCGG.SCGG通过对数据潜在类结构的分析,若含有环形结构类则选择层次聚类的单连接算法对数据聚类,否则选择k-means算法.实验显示该方法十分的有效,能够挑选到合适的聚类算法对数据聚类.  相似文献   

14.
为了解决主成分分析(PCA)算法无法处理高维数据降维后再聚类精确度下降的问题,提出了一种新的属性空间概念,通过属性空间与信息熵的结合构建了基于特征相似度的降维标准,提出了新的降维算法ENPCA。针对降维后特征是原特征的线性组合而导致可解释性变差以及输入不够灵活的问题,提出了基于岭回归的稀疏主成分算法(ESPCA)。ESPCA算法的输入为主成分降维结果,不需要迭代获得稀疏结果,增加了灵活性和求解速度。最后在降维数据的基础上,针对遗传算法聚类收敛速度慢等问题,对遗传算法的初始化、选择、交叉、变异等操作进行改进,提出了新的聚类算法GKA++。实验分析表明EN-PCA算法表现稳定,GKA++算法在聚类有效性和效率方面表现良好。  相似文献   

15.
面向属性的归纳与概念聚类   总被引:3,自引:1,他引:3  
伍小荣  谢立宏 《计算机工程》2003,29(5):92-93,123
面向属性的归纳是新近提出的一种广泛用于数据库中知识发现的方法,文章指出这种方法与一种机器学习方法-概念聚类之间的紧密联系,并描述如何使用一个概念聚类算法进行面向属性的归纳。  相似文献   

16.
Comparing, clustering and merging ellipsoids are problems that arise in various applications, e.g., anomaly detection in wireless sensor networks and motif-based patterned fabrics. We develop a theory underlying three measures of similarity that can be used to find groups of similar ellipsoids in p-space. Clusters of ellipsoids are suggested by dark blocks along the diagonal of a reordered dissimilarity image (RDI). The RDI is built with the recursive iVAT algorithm using any of the three (dis) similarity measures as input and performs two functions: (i) it is used to visually assess and estimate the number of possible clusters in the data; and (ii) it offers a means for comparing the three similarity measures. Finally, we apply the single linkage and CLODD clustering algorithms to three two-dimensional data sets using each of the three dissimilarity matrices as input. Two data sets are synthetic, and the third is a set of real WSN data that has one known second order node anomaly. We conclude that focal distance is the best measure of elliptical similarity, iVAT images are a reliable basis for estimating cluster structures in sets of ellipsoids, and single linkage can successfully extract the indicated clusters.  相似文献   

17.
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.  相似文献   

18.
基于视觉系统的聚类算法   总被引:15,自引:0,他引:15  
人类对于结构的感知方式和产生数据的物理系统原理对于聚类分析而言具有同等的重要性。因此,在聚类算法的设计和分析中,模拟人类的主要器官-视觉系统将帮助我们解决这一领域的一些基本问题。从这一观点出发,文中提出一类基于初级视觉系统尺度空间理论的聚类算法,并通过引入显著性假设,将生物物理学中的Weber定律与聚类结构的有效性联系起来。由此产生的聚类算法简洁有效,并可部分地回答那些与人类感知数据结构相关联的聚类有效性问题。我们的数值试验表明这一方法具有广泛的应用前景。  相似文献   

19.
Knowledge Acquisition Via Incremental Conceptual Clustering   总被引:32,自引:22,他引:32  
Conceptual clustering is an important way of summarizing and explaining data. However, the recent formulation of this paradigm has allowed little exploration of conceptual clustering as a means of improving performance. Furthermore, previous work in conceptual clustering has not explicitly dealt with constraints imposed by real world environments. This article presents COBWEB, a conceptual clustering system that organizes data so as to maximize inference ability. Additionally, COBWEB is incremental and computationally economical, and thus can be flexibly applied in a variety of domains.  相似文献   

20.
采用的聚类思想是,不替换随机选取的聚类代表,按语义相关的原则界定对象,合并相似度较大的聚类,分解稀疏聚类,对未有归宿的对象再给机会聚类。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号