Similar literature
1.
Spatial clustering has attracted considerable research attention because of its many applications. In most conventional clustering problems, the similarity measure mainly considers geometric attributes. However, in many real applications, the nongeometric attributes are what users care about. Conventional spatial clustering partitions the input data set into several compact regions, so data points that are similar in their nongeometric attributes may be scattered over different regions, which makes the corresponding objective difficult to achieve. To remedy this, we propose and explore in this paper a new clustering problem on two domains, called dual clustering, where one domain is the optimization domain and the other is the constraint domain. Attributes on the optimization domain are those involved in optimizing the objective function, while those on the constraint domain specify the application-dependent constraints. Our goal is to optimize the objective function in the optimization domain while satisfying the constraint specified in the constraint domain. We devise an efficient and effective algorithm, named Interlaced Clustering-Classification (ICC), to solve this problem. The proposed ICC algorithm combines the information in both domains and iteratively performs a clustering algorithm on the optimization domain and a classification algorithm on the constraint domain to reach the target clustering. The time and space complexities of the ICC algorithm are formally analyzed. Several experiments are conducted to provide insight into the dual clustering problem and the proposed algorithm.

2.
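Mining association rules over quantitative attributes in relational databases is a problem frequently encountered in association rule mining. This paper uses a genetic algorithm to solve the FCM fuzzy clustering problem, mainly to avoid the local-minimum problem of the FCM algorithm. The clustering result allows quantitative attributes in association rules to be converted into categorical attributes, which are in turn converted into Boolean attributes, so that many existing association rule mining methods can be used to mine meaningful rules.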

3.
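Two-layer random walk semi-supervised clustering (Total citations: 3, self-citations: 0, citations by others: 3)
何萍, 徐晓华, 陆林, 陈崚. 《软件学报》 (Journal of Software), 2014, 25(5): 997-1013
Semi-supervised clustering aims to partition all data points into clusters according to user-provided must-link and cannot-link constraints, so as to obtain clustering results that are more accurate and better match user requirements. Most current semi-supervised clustering algorithms modify existing clustering algorithms or incorporate metric learning so that the clustering result agrees with the pairwise constraints as far as possible, but they rarely consider the explicit degree of influence that pairwise constraints exert on the surrounding unconstrained data. This paper proposes a two-layer random walk semi-supervised clustering algorithm consisting of a lower-level random walk on vertices and a higher-level random walk on components. The lower-level random walk computes the range and degree of influence of each selected constrained vertex on the other vertices, which is called a component. The higher-level random walk then propagates each pairwise constraint over the components with adaptive strength and combines their influence on every vertex into a cluster indicator matrix. Experimental results on UCI data sets and large real-world data sets show that the algorithm is more accurate than other semi-supervised clustering algorithms and is also fairly efficient.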

4.
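Application of fuzzy clustering to the extraction of quantitative association rules (Total citations: 1, self-citations: 0, citations by others: 1)
王越, 曹长修. 《计算机仿真》 (Computer Simulation), 2003, 20(11): 64-66, 69
Mining association rules over quantitative attributes in relational databases is a frequently encountered problem. This paper performs fuzzy clustering with an improved FCM, mainly to address the local-minimum problem of the FCM algorithm. The clustering result allows quantitative attributes in association rules to be converted into categorical attributes, which are in turn converted into Boolean attributes, so that meaningful rules can be found with many existing association rule mining methods.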
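As a rough illustration of the clustering step this abstract relies on, the following is a minimal sketch of plain fuzzy c-means (FCM) on a single numeric attribute, followed by a conversion of each value to a categorical label. It is not the improved FCM of the paper, whose details the abstract does not give. The function name `fcm_1d`, the fuzzifier `m = 2.0`, and the sample data are assumptions made for illustration.

```python
def fcm_1d(values, centers, m=2.0, iters=100, tol=1e-9):
    """Plain fuzzy c-means on one numeric attribute (illustrative sketch).

    values: list of floats; centers: initial cluster centers.
    Returns (centers, memberships); memberships[i][j] is the degree to which
    values[i] belongs to cluster j, and each row sums to 1.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # x sits exactly on one or more centers: share membership there.
                hits = [d == 0.0 for d in dists]
                row = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                       for dj in dists]
            memberships.append(row)
        # Center update: mean of the values weighted by u_ij^m.
        new_centers = []
        for j, old in enumerate(centers):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # empty cluster: keep the old center
            else:
                new_centers.append(
                    sum(w * x for w, x in zip(weights, values)) / total)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical quantitative attribute with two obvious groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, memberships = fcm_1d(values, centers=[0.0, 10.0])
# Turn each quantitative value into a categorical label (index of the
# cluster with the highest membership).
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Each label can then be expanded into one Boolean column per cluster, which is the form that standard association rule miners expect. Plain FCM as sketched here can still stop in a local minimum, which is the weakness the abstract says the improved variant targets.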

5.
Hierarchical clustering of mixed data based on distance hierarchy (Total citations: 1, self-citations: 0, citations by others: 1)
Data clustering is an important data mining technique that partitions data according to some similarity criterion. Many algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike the subtraction scheme used for numerical attributes, there is no standard for measuring distance between categorical values. In this article, we propose a distance representation scheme, distance hierarchy, which facilitates expressing the similarity between categorical values and also unifies distance measurement for numerical and categorical values. We then apply the scheme to mixed data clustering, in particular by integrating it with a hierarchical clustering algorithm. Consequently, this integrated approach can uniformly handle numerical and categorical data, and also enables one to take the similarity between categorical values into consideration. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity.
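One way to picture a hierarchy-based distance between categorical values is as the total link weight on the tree path that joins them. The sketch below works under that assumption and is not necessarily the exact scheme of the article. The names `hierarchy_distance`, `parent`, and `weight`, and the small concept tree, are hypothetical.

```python
def hierarchy_distance(a, b, parent, weight):
    """Total link weight on the tree path between values a and b.

    parent maps each non-root node to its parent; weight[node] is the
    weight of the link from node up to parent[node].
    Raises KeyError if a and b are not in the same tree.
    """
    # Record the accumulated cost from a to each of its ancestors.
    cost_from_a = {a: 0.0}
    node, cost = a, 0.0
    while node in parent:
        cost += weight[node]
        node = parent[node]
        cost_from_a[node] = cost
    # Climb from b until the first ancestor shared with a is reached.
    node, cost = b, 0.0
    while node not in cost_from_a:
        cost += weight[node]
        node = parent[node]
    return cost + cost_from_a[node]


# Hypothetical concept tree: Any -> {Drink, Appliance}, Drink -> {Coke, Pepsi},
# Appliance -> {TV}; every link has weight 1.
parent = {"Drink": "Any", "Appliance": "Any",
          "Coke": "Drink", "Pepsi": "Drink", "TV": "Appliance"}
weight = {name: 1.0 for name in parent}

d_close = hierarchy_distance("Coke", "Pepsi", parent, weight)  # siblings
d_far = hierarchy_distance("Coke", "TV", parent, weight)       # via the root
```

With such a distance, two soft drinks end up closer to each other than a soft drink and an appliance, which a simple equal-or-different comparison of categorical values cannot express.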

6.
7.
Automatic network clustering is an important technique for mining the meaningful communities (or clusters) of a network. Communities in a network are clusters of nodes where the intra-cluster connection density is high and the inter-cluster connection density is low. The most popular scheme for automatic network clustering partitions all the nodes into clusters by maximizing a criterion function known as modularity. However, modularity suffers from the resolution limit problem, which remains an open challenge. In this paper, automatic network clustering is formulated as a constrained optimization problem: maximizing a criterion function subject to a density constraint. With this scheme, the resulting algorithm is free from the resolution limit problem. Furthermore, the density constraint is found to improve the detection accuracy of modularity optimization. The efficiency of the proposed scheme is verified by comparative experiments on large-scale benchmark networks.
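For reference, the modularity criterion mentioned above can be computed as follows for an undirected, unweighted graph, using the standard Newman-Girvan form Q = Σ_c [e_c / m − (d_c / 2m)²], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of its nodes. This is a minimal sketch. The function name and the toy graph are assumptions, and the density constraint of the paper is not implemented because the abstract does not define it.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (illustrative sketch).

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    internal = defaultdict(int)    # number of edges inside each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in a single community
```

On this toy graph the two-triangle partition scores higher than the single-community partition, which is the behavior a modularity maximizer exploits.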

8.
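BIRCH-based clustering method for mixed-attribute data (Total citations: 2, self-citations: 1, citations by others: 1)
Data clustering is an important research topic in data mining. Real-world data often have both continuous and discrete attributes, but most existing algorithms handle only one kind and simply discard the other, which loses clustering information and lowers clustering quality. Some algorithms that can handle mixed attributes process too many attributes, which greatly increases the computational cost. This paper proposes a clustering algorithm for mixed-attribute data based on the BIRCH algorithm. Experiments on UCI data sets show that the proposed algorithm performs well.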

9.
Pattern Recognition, 2014, 47(2): 820-832
A key issue in semi-supervised clustering is how to utilize the limited but informative pairwise constraints. In this paper, we propose a new graph-based constrained clustering algorithm, named SCRAWL. It is composed of two random walks with different granularities. In the lower-level random walk, SCRAWL partitions the vertices (i.e., data points) into constrained and unconstrained ones, according to whether they appear in the pairwise constraints. For every constrained vertex, its influence range, that is, the degree of influence it exerts on the unconstrained vertices, is encapsulated in an intermediate structure called a component. The edge set between each pair of components determines the scope affected by the pairwise constraints. In the higher-level random walk, SCRAWL enforces the pairwise constraints on the components, so that the constraint influence can be propagated to the unconstrained edges. Finally, we combine the cluster memberships of all the components to obtain the cluster assignment for each vertex. The promising experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our method.

10.
We optimize eigenvalues in optimal shape design using binary level set methods. The interfaces of subregions are represented implicitly by the discontinuities of binary level set functions that take the two values 1 or -1 at convergence. A binary constraint is added to the original model problems. We propose two variational algorithms to solve the constrained optimization problems. One is a hybrid type that couples the Lagrange multiplier approach for the geometry constraint with the augmented Lagrangian method for the binary constraint. The other uses the Lagrange multiplier method for both constraints. Both iterative algorithms are largely independent of the initial guess and satisfy the geometry constraint very accurately during the iterations. Extensive numerical results are presented and compared with those obtained by level set methods, demonstrating the effectiveness and robustness of our algorithms.
