Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Spatial clustering has attracted a lot of research attention due to its various applications. In most conventional clustering problems, the similarity measurement mainly takes the geometric attributes into consideration. However, in many real applications, the nongeometric attributes are what users are concerned about. In conventional spatial clustering, the input data set is partitioned into several compact regions, and data points that are similar to one another in their nongeometric attributes may be scattered over different regions, thus making the corresponding objective difficult to achieve. To remedy this, we propose and explore in this paper a new clustering problem on two domains, called dual clustering, where one domain refers to the optimization domain and the other refers to the constraint domain. Attributes on the optimization domain are those involved in the optimization of the objective function, while those on the constraint domain specify the application-dependent constraints. Our goal is to optimize the objective function in the optimization domain while satisfying the constraint specified in the constraint domain. We devise an efficient and effective algorithm, named Interlaced Clustering-Classification, abbreviated as ICC, to solve this problem. The proposed ICC algorithm combines the information in both domains and iteratively performs a clustering algorithm on the optimization domain and a classification algorithm on the constraint domain to reach the target clustering effectively. The time and space complexities of the ICC algorithm are formally analyzed. Several experiments are conducted to provide insights into the dual clustering problem and the proposed algorithm.

2.
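Mining association rules over quantitative attributes in relational databases is a problem frequently encountered in association rule mining. This paper uses a genetic algorithm to solve the FCM fuzzy clustering problem, mainly to avoid the local-minimum problem of the FCM algorithm. Using the clustering results, quantitative attributes can be converted into categorical attributes, which are in turn converted into Boolean attributes, so that many existing association rule mining methods can be used to mine meaningful rules.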

3.
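Two-level random walk semi-supervised clustering   Total citations: 3, self-citations: 0, citations by others: 3
何萍, 徐晓华, 陆林, 陈崚. 《软件学报》 (Journal of Software), 2014, 25(5): 997-1013
Semi-supervised clustering aims to partition all data points into clusters according to user-provided must-link and cannot-link constraints, yielding clustering results that are more accurate and better match user requirements. Most existing semi-supervised clustering algorithms modify an existing clustering algorithm or incorporate metric learning so that the clustering result agrees with the pairwise constraints as far as possible, but they rarely consider the explicit degree of influence that pairwise constraints have on the surrounding unconstrained data. This paper proposes a two-level random walk semi-supervised clustering algorithm consisting of a lower-level random walk on vertices and a higher-level random walk on components. The lower-level random walk computes the range and degree of influence that each selected constrained vertex exerts on the other vertices, which is called a component; the higher-level random walk then propagates each pairwise constraint over the components with adaptive strength and integrates their influence on every vertex into a cluster indicator matrix. Experiments on UCI data sets and large real-world data sets show that the algorithm is more accurate than other semi-supervised clustering algorithms and also fairly efficient.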

4.
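Application of fuzzy clustering to the extraction of quantitative association rules   Total citations: 1, self-citations: 0, citations by others: 1
王越, 曹长修. 《计算机仿真》 (Computer Simulation), 2003, 20(11): 64-66, 69
Mining association rules over quantitative attributes in relational databases is a frequently encountered problem. This paper performs fuzzy clustering with an improved FCM, mainly to address the local-minimum problem of the FCM algorithm. Using the clustering results, quantitative attributes can be converted into categorical attributes, which are in turn converted into Boolean attributes, so that meaningful rules can be found with many existing association rule mining methods.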
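The conversion chain described in this abstract (quantitative value, then category, then Boolean items) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: it assumes the cluster centers have already been found (by FCM or an improved variant), uses the standard FCM membership formula, and the names `fcm_memberships`, `to_boolean_items`, the attribute name, and the labels are all hypothetical.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of a scalar x in each (distinct) cluster center."""
    dists = [abs(x - c) for c in centers]
    # A value lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def to_boolean_items(values, centers, labels, attr, m=2.0):
    """Map each quantitative value to its highest-membership category,
    then expand that category into one Boolean item per label."""
    rows = []
    for x in values:
        u = fcm_memberships(x, centers, m)
        best = max(range(len(centers)), key=lambda j: u[j])
        rows.append({f"{attr}={lab}": (j == best) for j, lab in enumerate(labels)})
    return rows


# Hypothetical example: ages clustered around assumed centers 20, 40 and 60.
rows = to_boolean_items([22, 58], [20, 40, 60], ["low", "mid", "high"], "age")
```

The resulting rows are ordinary Boolean transactions, so any Boolean association rule miner could consume them; keeping only the highest-membership category is a simplifying assumption, since the abstract does not say how memberships are turned into categories.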

5.
Hierarchical clustering of mixed data based on distance hierarchy   Total citations: 1, self-citations: 0, citations by others: 1
Data clustering is an important data mining technique which partitions data according to some similarity criterion. Abundant algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike the subtraction scheme used for numerical attributes, there is no standard for measuring distance between categorical values. In this article, we propose a distance representation scheme, distance hierarchy, which facilitates expressing the similarity between categorical values and also unifies distance measuring of numerical and categorical values. We then apply the scheme to mixed data clustering, in particular, integrating it with a hierarchical clustering algorithm. Consequently, this integrated approach can uniformly handle numerical data and categorical data, and also enables one to take the similarity between categorical values into consideration. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity.
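To make the idea of a distance hierarchy concrete, the sketch below measures the distance between two categorical values as the weighted path length between them in a concept tree, and combines it with a plain numeric difference. It is a simplified illustration under stated assumptions: values are treated as tree nodes only, the tree `PARENT`, its link weights, and the Euclidean-style combination in `mixed_distance` are all hypothetical and not taken from the article.

```python
# Hypothetical hierarchy: child -> (parent, weight of the link to the parent).
PARENT = {
    "Carbonated": ("Any", 1.0),
    "Juice": ("Any", 1.0),
    "Coke": ("Carbonated", 1.0),
    "Pepsi": ("Carbonated", 1.0),
    "Orange juice": ("Juice", 1.0),
}


def path_to_root(node, parent):
    """Return {ancestor: accumulated link weight from node}, node included."""
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        node, weight = parent[node]
        total += weight
        dist[node] = total
    return dist


def hierarchy_distance(a, b, parent):
    """Weighted path length between a and b through their lowest common ancestor."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    common = set(up_a) & set(up_b)
    if not common:
        raise ValueError("values do not share a hierarchy")
    return min(up_a[n] + up_b[n] for n in common)


def mixed_distance(x, y, parent):
    """x and y are (numeric, categorical) pairs; combine per-attribute distances."""
    numeric = abs(x[0] - y[0])
    categorical = hierarchy_distance(x[1], y[1], parent)
    return (numeric ** 2 + categorical ** 2) ** 0.5
```

With this tree, two carbonated drinks are closer to each other than either is to a juice, which is the kind of graded categorical similarity a flat equal-or-not comparison cannot express.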

6.
7.
Automatic network clustering is an important technique for mining the meaningful communities (or clusters) of a network. Communities in a network are clusters of nodes where the intra-cluster connection density is high and the inter-cluster connection density is low. The most popular scheme of automatic network clustering aims at maximizing a criterion function known as modularity in partitioning all the nodes into clusters. However, modularity suffers from the resolution limit problem, which remains an open challenge. In this paper, automatic network clustering is formulated as a constrained optimization problem: maximizing a criterion function with a density constraint. With this scheme, the established algorithm can be free from the resolution limit problem. Furthermore, it is found that the density constraint can improve the detection accuracy of the modularity optimization. The efficiency of the proposed scheme is verified by comparative experiments on large-scale benchmark networks.
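For reference, the modularity criterion mentioned in this abstract can be computed for a given partition as in the sketch below. It uses the standard definition for an undirected, unweighted graph, Q = sum over communities of (e_c / m) - (d_c / 2m)^2, where e_c is the number of intra-community edges, d_c the total degree of the community, and m the number of edges. The function name and example graph are illustrative assumptions; the paper's density constraint is not specified in the abstract, so it is not shown.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    intra = {}    # community -> number of edges with both ends inside it
    deg_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        deg_sum[cu] = deg_sum.get(cu, 0) + 1
        deg_sum[cv] = deg_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "all" for n in range(6)}
```

Here the two-triangle split scores higher than putting every node in one community, which scores exactly zero.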

8.
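A BIRCH-based clustering method for mixed-attribute data   Total citations: 2, self-citations: 1, citations by others: 1
Data clustering is an important research topic in data mining. Real-world data often have both continuous and discrete attributes, but most existing algorithms handle only one type and simply discard the other, which loses clustering information and lowers clustering quality. Some algorithms that can handle mixed attributes process too many attributes, which greatly increases the computational cost. This paper proposes a clustering algorithm for mixed-attribute data based on the BIRCH algorithm; experiments on UCI data sets show that the proposed algorithm performs well.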

9.
Pattern Recognition, 2014, 47(2): 820-832
A key issue of semi-supervised clustering is how to utilize the limited but informative pairwise constraints. In this paper, we propose a new graph-based constrained clustering algorithm, named SCRAWL. It is composed of two random walks with different granularities. In the lower-level random walk, SCRAWL partitions the vertices (i.e., data points) into constrained and unconstrained ones, according to whether they are in the pairwise constraints. For every constrained vertex, its influence range, or the degrees of influence it exerts on the unconstrained vertices, is encapsulated in an intermediate structure called a component. The edge set between each pair of components determines the affecting scope of the pairwise constraints. In the higher-level random walk, SCRAWL enforces the pairwise constraints on the components, so that the constraint influence can be propagated to the unconstrained edges. Finally, we combine the cluster membership of all the components to obtain the cluster assignment for each vertex. The promising experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our method.

10.
We optimize eigenvalues in optimal shape design using binary level set methods. The interfaces of subregions are represented implicitly by the discontinuities of binary level set functions taking the two values 1 or −1 at convergence. A binary constraint is added to the original model problems. We propose two variational algorithms to solve the constrained optimization problems. One is a hybrid type that couples the Lagrange multiplier approach for the geometry constraint with the augmented Lagrangian method for the binary constraint. The other is devised using the Lagrange multiplier method for both constraints. The two iterative algorithms are both largely independent of the initial guess and can satisfy the geometry constraint very accurately during the iterations. Extensive numerical results are presented and compared with those obtained by level set methods, which demonstrate the effectiveness and robustness of our algorithms.

11.
Clustering is an important research area with numerous applications in pattern recognition, machine learning, and data mining. Since the clustering problem on numeric data sets can be formulated as a typical combinatorial optimization problem, much research has addressed the design of heuristic algorithms for finding sub-optimal solutions in a reasonable period of time. However, most heuristic clustering algorithms are sensitive to initialization and do not guarantee high-quality results. Recently, the Approximate Backbone (AB), i.e., the commonly shared intersection of several sub-optimal solutions, has been proposed to address this sensitivity to initialization. In this paper, we aim to introduce the AB into heuristic clustering to overcome the initialization sensitivity of conventional heuristic clustering algorithms. The main advantage of the proposed method is the capability of restricting the initial search space around the optimal result by defining the AB, which in turn reduces the impact of initialization on clustering and eventually improves the performance of heuristic clustering. Experiments on synthetic and real-world data sets are performed to validate the effectiveness of the proposed approach in comparison to three conventional heuristic clustering algorithms and three other algorithms with improved initialization.

12.
Non-redundant data clustering   Total citations: 6, self-citations: 6, citations by others: 0
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We discuss extensions of the technique to the tasks of semi-supervised classification and enumeration of successive non-redundant clusterings. We present experimental results for applications in text mining and computer vision.

13.
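Redundancy removal and clustering of association rules   Total citations: 9, self-citations: 0, citations by others: 9
Association rule mining often produces a large number of rules, which makes it very difficult for users to analyze and use them; the problem is especially pronounced when the attributes in the database are highly correlated. To help users perform exploratory analysis, various techniques can be used to reduce the number of rules effectively, such as constrained association rule mining and clustering or generalizing the rules. This paper proposes ADRR, an algorithm for removing redundant association rules, and ACAR, an algorithm for clustering association rules. Based on properties of sets, it is proved that the mined association rules contain many redundant rules that can be removed, which leads to the ADRR algorithm. The ACAR algorithm uses a new method that defines the distance between rules through the correlation between items, and clusters the association rules following the idea of the DBSCAN algorithm. Finally, the proposed algorithms are implemented, and the experimental results show that they are effective, feasible, and highly efficient.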

14.
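A semi-supervised document clustering algorithm combined with active learning   Total citations: 1, self-citations: 0, citations by others: 1
Semi-supervised document clustering, which uses a small amount of data with supervision information to assist unsupervised document clustering, has in recent years become a hot topic in machine learning and data mining. Because obtaining a large amount of supervision information is time-consuming and laborious, researchers have considered how to obtain a small amount of supervision information that significantly improves clustering performance. This paper proposes a semi-supervised document clustering algorithm combined with active learning. Pairwise constraints are introduced to guide the clustering process of DBSCAN and improve clustering performance, which yields a semi-supervised document clustering algorithm, Cons-DBSCAN. By measuring the amount of information contained in a constraint set and analyzing the DBSCAN algorithm itself, a heuristic active learning algorithm is proposed that selects highly informative sets of pairwise constraints and thus assists semi-supervised document clustering more efficiently. Experimental results show that the proposed algorithm clusters documents efficiently, that the pairwise constraint sets obtained through active learning significantly improve clustering performance, and that the algorithm outperforms two representative semi-supervised clustering algorithms combined with active learning.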

15.
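Discriminative semi-supervised clustering analysis with pairwise constraints   Total citations: 10, self-citations: 1, citations by others: 9
尹学松, 胡恩良, 陈松灿. 《软件学报》 (Journal of Software), 2008, 19(11): 2791-2802
Some typical existing semi-supervised clustering methods have difficulty resolving violations of pairwise constraints effectively and, at the same time, cannot handle high-dimensional data. This paper proposes a discriminative semi-supervised clustering analysis method based on pairwise constraints to solve both problems. The method uses the supervision information effectively to integrate dimensionality reduction and clustering: the data are clustered in the projection space with a K-means algorithm based on pairwise constraints, and the clustering result is then used to select the projection space. The algorithm also reduces the computational complexity of constraint-based semi-supervised clustering and resolves the violation of pairwise constraints during clustering. Experimental results on a set of real-world data sets show that, compared with existing related semi-supervised clustering algorithms, the new method not only handles high-dimensional data but also effectively improves clustering performance.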

16.
A Generic Framework for Constrained Optimization Using Genetic Algorithms   Total citations: 7, self-citations: 0, citations by others: 7
In this paper, we propose a generic, two-phase framework for solving constrained optimization problems using genetic algorithms. In the first phase of the algorithm, the objective function is completely disregarded and the constrained optimization problem is treated as a constraint satisfaction problem. The genetic search is directed toward minimizing the constraint violation of the solutions and eventually finding a feasible solution. A linear rank-based approach is used to assign fitness values to the individuals. The solution with the least constraint violation is archived as the elite solution in the population. In the second phase, the simultaneous optimization of the objective function and the satisfaction of the constraints are treated as a biobjective optimization problem. We elaborate on how the constrained optimization problem requires a balance of exploration and exploitation under different problem scenarios and conclude that a nondominated ranking between the individuals helps the algorithm explore further, while the elitist scheme facilitates exploitation. We analyze the proposed algorithm under different problem scenarios using Test Case Generator-2 and demonstrate the proposed algorithm's capability to perform well independent of various problem characteristics. In addition, the proposed algorithm performs competitively with state-of-the-art constrained optimization algorithms on 11 test cases that are widely studied benchmark functions in the literature.
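The first phase described in this abstract (ignore the objective, rank individuals by constraint violation, keep the least-violating one as elite) might look roughly like the sketch below. It is only an illustration under assumptions: constraints are taken to have the form g(x) <= 0, violation is the sum of positive parts, and the fitness uses the common linear ranking formula with a selective-pressure parameter `sp`; the paper's exact violation measure and ranking parameters are not given in the abstract, and all names here are hypothetical.

```python
def violation(x, constraints):
    """Total violation of constraints written as g(x) <= 0 (0.0 means feasible)."""
    return sum(max(0.0, g(x)) for g in constraints)


def linear_rank_fitness(population, constraints, sp=2.0):
    """Phase-one fitness: rank by violation only, objective ignored.

    Returns (fitness, violations). The worst individual gets 2 - sp,
    the best gets sp, with linear spacing in between (1 <= sp <= 2).
    """
    n = len(population)
    viol = [violation(x, constraints) for x in population]
    # Indices ordered from worst (largest violation) to best (smallest).
    order = sorted(range(n), key=lambda i: viol[i], reverse=True)
    fitness = [1.0] * n
    if n > 1:
        for pos, i in enumerate(order):
            fitness[i] = 2.0 - sp + 2.0 * (sp - 1.0) * pos / (n - 1)
    return fitness, viol


# Hypothetical problem: x0 + x1 <= 1 and x0 >= 0.
constraints = [lambda x: x[0] + x[1] - 1.0, lambda x: -x[0]]
population = [(0.2, 0.3), (2.0, 1.0), (-1.0, 0.5)]
fitness, viol = linear_rank_fitness(population, constraints)
elite = population[min(range(len(population)), key=lambda i: viol[i])]
```

Because fitness depends only on rank, a single badly infeasible individual cannot dominate selection through the sheer size of its violation.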

17.
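In general, there are two ways to improve the efficiency of multi-attribute region queries over data files accessed from external storage. One is to build an index on the file; the other is to rearrange the records at the physical level according to some rule. This paper explores how to improve range query efficiency with the second approach, that is, obtaining a better storage order for the records in the data file through multidimensional clustering. The problem is first analyzed in detail and a mathematical model is constructed for it. Then, by introducing the idea of the spectral algorithm (SA), a polynomial-time approximate solution to this NP-hard problem is provided. Finally, experiments verify the effectiveness of the method for rectangular region queries and single-dimension range queries.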

18.
With the increasing adoption of role-based access control (RBAC) in business security, role mining technology has been widely applied to aid the process of migrating a non-RBAC system to an RBAC system. However, because it is hard to deal with a variety of constraint conflicts at the same time, none of the existing role mining algorithms can simultaneously satisfy the various constraints that usually describe organizations’ security and business requirements. To extend the ability of role mining technology, this paper proposes a novel role mining approach using answer set programming (ASP) that complies with constraints and meets various optimization objectives, named constrained role miner (CRM). Essentially, the idea is that ASP is an approach to declarative problem solving. Thus, whether discovering RBAC configurations or dealing with conflicts between constraints, ASP programs do not need to specify how answers are computed. Finally, we demonstrate the effectiveness and efficiency of our approach through experimental results.

19.
In this paper we explore a recent iterative compression technique called non-negative matrix factorization (NMF). Several special properties are obtained as a result of the constrained optimization problem of NMF. For facial images, the additive nature of NMF results in a basis of features, such as eyes, noses, and lips. We explore various methods for efficiently computing NMF, placing particular emphasis on the initialization of current algorithms. We propose using Spherical K-Means clustering to produce a structured initialization for NMF. We demonstrate some of the properties that result from this initialization and develop an efficient way of choosing the rank of the low-dimensional NMF representation.

20.
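Because of its wide range of applications, cluster analysis has become an active research area in data mining, mathematical statistics, and related disciplines. Clustering techniques can be applied to pattern recognition, data analysis, image processing, web mining, e-commerce, and more. Previous cluster analysis has not considered the physical obstacles that exist in the real world, which affects clustering results. This paper gives a preliminary discussion of the problem of clustering in the presence of obstacles and proposes an algorithm called improved Chameleon (ADP-Chameleon) to solve it.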
