Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
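Qiu Ye, He Zhenfeng. Computer Science (计算机科学), 2012, 39(8): 196-198, 209
Combining pairwise constraints with the K-means algorithm can effectively improve clustering results, but the resulting algorithm is very sensitive to the order in which data objects are assigned. To obtain a good assignment order, an iterative learning algorithm based on the clustering instability of assignment orders is proposed. Exploiting the stability characteristics of the Cop-Kmeans algorithm, it iteratively determines the stability of each data object and from that derives the assignment order. Experimental results show that the iterative learning algorithm effectively improves the accuracy of Cop-Kmeans.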
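The order sensitivity mentioned in this abstract can be illustrated with a minimal sketch of one constrained assignment pass in the style of COP-KMeans. This is not the authors' iterative learning algorithm; the function names `violates_constraints` and `assign_in_order`, and the use of squared Euclidean distance, are assumptions made for illustration.

```python
def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """Return True if putting `point` into `cluster` breaks a constraint.

    Only points that are already assigned can cause a violation, which is
    why the outcome depends on the order in which points are processed.
    """
    for a, b in must_link:
        other = b if a == point else (a if b == point else None)
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else (a if b == point else None)
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def assign_in_order(order, points, centers, must_link, cannot_link):
    """One assignment pass: each point takes the nearest feasible center."""
    assignment = {}
    for p in order:
        ranked = sorted(
            range(len(centers)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(points[p], centers[c])),
        )
        for c in ranked:
            if not violates_constraints(p, c, assignment, must_link, cannot_link):
                assignment[p] = c
                break
        else:
            return None  # no feasible cluster for p under this order
    return assignment


# Two points that both prefer center 0 but must not share a cluster.
points = [(0.4,), (0.45,)]
centers = [(0.0,), (1.0,)]
first = assign_in_order([0, 1], points, centers, [], [(0, 1)])
second = assign_in_order([1, 0], points, centers, [], [(0, 1)])
```

In this toy case, whichever point is processed first takes center 0 and forces the other into center 1, so the two orders give different partitions.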

2.
This paper proposes a graph-based approach for semisupervised clustering based on pairwise relations among instances. In our approach, the entire data set is represented as an edge-weighted graph by mapping each data element (instance) to a vertex and connecting the instances by edges weighted with their similarities. In order to reflect pairwise constraints on the clustering process, the graph is modified by contraction, as known from general graph theory, and by the graph Laplacian from spectral graph theory. The graph representation enables us to deal with pairwise constraints as well as pairwise similarities over the same unified representation. By exploiting the constraints as well as similarities among instances, the entire data set is projected onto a subspace via the modified graph, and data clustering is conducted over the projected representation. The proposed approach is evaluated over several real-world data sets. The results are encouraging and show that it is worthwhile to pursue the proposed approach.

3.
In this short paper, a unified framework for performing density-weighted fuzzy $c$-means (FCM) clustering of feature and relational datasets is presented. The proposed approach consists of reducing the original dataset to a smaller one, assigning each selected datum a weight reflecting the number of nearby data, clustering the weighted reduced dataset using a weighted version of the feature or relational data FCM algorithm, and, if desired, extending the reduced data results back to the original dataset. Several methods are given for each of the tasks of data subset selection, weight assignment, and extension of the weighted clustering results. The newly proposed weighted version of the non-Euclidean relational FCM algorithm is proved to produce results identical to those of its feature data analog for a certain type of relational data. Artificial and real data examples are used to demonstrate and contrast various instances of this general approach.
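As a rough illustration of how per-datum weights can enter FCM, the sketch below computes cluster centers with the commonly used weighted update $v_i = \sum_k w_k u_{ik}^m x_k / \sum_k w_k u_{ik}^m$. Whether the paper uses exactly this form is an assumption; the function name `weighted_fcm_centers` and the data layout are hypothetical.

```python
def weighted_fcm_centers(data, weights, memberships, m=2.0):
    """Weighted FCM center update (a sketch, not the paper's exact method).

    data:        list of points (tuples of floats)
    weights:     one non-negative weight per point, e.g. a local density count
    memberships: memberships[i][k] = membership of point k in cluster i
    m:           fuzzifier, m > 1
    """
    dim = len(data[0])
    centers = []
    for u_row in memberships:
        # Each point contributes in proportion to weight * membership^m.
        coef = [w * (u ** m) for w, u in zip(weights, u_row)]
        total = sum(coef)
        centers.append(
            tuple(sum(c * x[d] for c, x in zip(coef, data)) / total for d in range(dim))
        )
    return centers


# One cluster, two points; the second point stands for three nearby data.
centers = weighted_fcm_centers([(0.0,), (2.0,)], [1.0, 3.0], [[1.0, 1.0]])
```

With weights 1 and 3, the center is pulled toward the heavier point, which is the effect a density weight is meant to have on a reduced dataset.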

4.
This paper presents a linear assignment algorithm for solving the clustering problem. By using the most dissimilar data as cluster representatives, a linear assignment algorithm is developed based on the linear assignment model for clustering multivariate data. The computational results, evaluated using multiple performance criteria, show that the clustering algorithm is very effective and efficient, especially for clustering large data sets with many attributes.

5.
There is no doubt that clustering is one of the most studied data mining tasks. Nevertheless, it remains a challenging problem to solve despite the many proposed clustering approaches. Graph-based approaches solve the clustering task as a global optimization problem, while many other works are based on local methods. In this paper, we propose a novel graph-based algorithm, “GBR”, that relaxes a well-defined method, improving accuracy while keeping the method simple. The primary motivation of our relaxation of the objective is to allow the reformulated objective to find well-distributed cluster indicators for complicated data instances. This relaxation results in an analytical solution that avoids the approximate iterative methods adopted in many other graph-based approaches. The experiments on synthetic and real data sets show that our relaxation achieves excellent clustering results. Our key contributions are: (1) an analytical solution to the global clustering task, as opposed to approximate iterative approaches; (2) a very simple implementation using existing optimization packages; (3) an algorithm that requires relatively less computation time, as a function of the number of data instances to cluster, than other well-defined methods in the literature.

6.
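Lu Linhua. Computer Simulation (计算机仿真), 2009, 26(7): 122-125, 158
To perform cluster analysis when the number of clusters is not known in advance, a new dynamic clustering algorithm combining nearest-neighbor clustering and a genetic algorithm is proposed. The new algorithm has two stages. In the first stage, a nearest-neighbor clustering algorithm places the most similar instances in the same cluster and filters out noisy data according to similarity or dissimilarity measures, yielding an initial set of clusters. The second stage is a genetic optimization stage, which uses a dynamic cluster evaluation function to merge the initial clusters dynamically and thereby obtain a near-optimal solution. Finally, the algorithm is evaluated in simulation experiments; the results show that the method clusters effectively without prior knowledge of the number of clusters.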

7.
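Discovering audit suspicions through database queries depends on the auditor's prior knowledge; when experience is lacking and the volume of audit data is huge, it is difficult to exploit the advantages of big data and find suspicious items in massive data. To solve this problem, a method for discovering potential suspicious items in audit big data is proposed, based on iterative clustering with an improved Leaders operator. Without prior knowledge, the method first uses the Leaders algorithm to produce an initial clustering of the audit data automatically, then optimizes the initial result with a random-sampling fusion method, and finally applies several rounds of iterative clustering to further cluster small clusters that contain few instances or whose degree of suspicion is easily masked. This yields a precise clustering of the audit data; clusters with few instances and clearly abnormal behavior are identified as potential suspicious items, which, together with the auditor's experience, allows audit suspicions to be located quickly and precisely. Experimental results verify the effectiveness of the algorithm and show that it helps discover audit suspicions autonomously from massive data, narrows the screening scope, and improves audit efficiency.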
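The initial clustering step can be sketched with the classic single-pass Leader algorithm, in which each point joins the first leader within a distance threshold and otherwise becomes a new leader. This is only the textbook baseline, not the improved operator or the sampling and iteration stages of the paper; the names `leaders_clustering`, `threshold`, and the size limit used to flag small clusters are assumptions.

```python
import math


def leaders_clustering(points, threshold):
    """Single-pass Leader clustering with Euclidean distance."""
    leaders = []   # one representative point per cluster
    clusters = []  # clusters[i] holds the points that follow leaders[i]
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            # No existing leader is close enough: p starts a new cluster.
            leaders.append(p)
            clusters.append([p])
    return leaders, clusters


points = [(0, 0), (0.5, 0), (10, 10), (0.2, 0.1)]
leaders, clusters = leaders_clustering(points, threshold=1.0)

# Hypothetical follow-up: treat very small clusters as candidates for review.
small = [c for c in clusters if len(c) <= 1]
```

Because the pass needs no preset number of clusters and touches each point once, it suits a first cut over large data; the small clusters it leaves behind are the ones a later stage would examine more closely.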

8.
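A hybrid genetic algorithm combining clustering and iteration is proposed for the multi-depot VRPTW. The algorithm works in three stages: customer clustering and assignment, route planning, and route improvement. Unlike earlier two-stage algorithms, it uses a hybrid genetic algorithm for route planning and competition-insertion for route improvement, and route planning and route improvement are integrated into an iterative route-planning process. Experiments on the benchmark instances proposed by Cordeau et al. show that the algorithm obtains acceptably good solutions within acceptable computation time.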

9.
Constructive genetic algorithm for clustering problems   Cited by: 1 (self-citations: 0, citations by others: 1)
Genetic algorithms (GAs) have recently been accepted as powerful approaches to solving optimization problems. It is also well-accepted that building block construction (schemata formation and conservation) has a positive influence on GA behavior. Schemata are usually indirectly evaluated through a derived structure. We introduce a new approach called the Constructive Genetic Algorithm (CGA), which allows for schemata evaluation and the provision of other new features to the GA. Problems are modeled as bi-objective optimization problems that consider the evaluation of two fitness functions. This double fitness process, called fg-fitness, evaluates schemata and structures on a common basis. Evolution is conducted considering an adaptive rejection threshold that contemplates both objectives and attributes a rank to each individual in the population. The population is dynamic in size and composed of schemata and structures. Recombination preserves good schemata, and mutation is applied to structures to diversify the population. The CGA is applied to two clustering problems in graphs. The representation of schemata and structures uses a binary digit alphabet and is based on assignment (greedy) heuristics that provide a clearly distinguished representation for the problems. The clustering problems studied are the classical p-median and the capacitated p-median. Good results are shown for problem instances taken from the literature.
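For readers unfamiliar with the p-median problem named in this abstract, the sketch below evaluates its objective: every vertex is served by its nearest chosen median, and the cost is the sum of those distances. The brute-force search is for tiny instances only and is not the CGA; the names `p_median_cost` and `brute_force_p_median` and the example distance matrix are made up for illustration.

```python
from itertools import combinations


def p_median_cost(dist, medians):
    """Sum over all vertices of the distance to the nearest chosen median."""
    return sum(min(row[j] for j in medians) for row in dist)


def brute_force_p_median(dist, p):
    """Try every set of p medians; feasible only for very small graphs."""
    n = len(dist)
    best = min(combinations(range(n), p), key=lambda ms: p_median_cost(dist, ms))
    return best, p_median_cost(dist, best)


# Symmetric distances between three vertices (illustrative values).
dist = [
    [0, 2, 9],
    [2, 0, 7],
    [9, 7, 0],
]
best_medians, best_cost = brute_force_p_median(dist, p=2)
```

The capacitated variant adds a demand per vertex and a capacity per median, so the nearest-median assignment used here would no longer always be feasible.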

10.
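Obtaining the true class labels of samples in a data stream is expensive, so labeling every sample is impractical, while labeling a random subset makes the model unstable. To address these problems, a data stream classification algorithm based on the clustering assumption is proposed. Under the clustering assumption, samples that a clustering algorithm places in the same cluster are likely to share the same class. The clustering result on the training set is used to fit the sample distribution, and in the classification stage the samples that are hard to classify or that indicate potential concept drift are purposefully selected to update the model. A separate base classifier is built for the samples of each class in the training set; when a class disappears from or reappears in the data stream, only the corresponding base classifier needs to be frozen or activated, with no need to relearn knowledge already acquired. Experiments show that, while adapting to concept drift, the algorithm reduces the number of samples needed to update the model and achieves classification performance comparable to or better than current data stream classification algorithms.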

11.
Chien-Yu, Shien-Ching, Yen-Jen. Pattern Recognition, 2005, 38(12): 2256-2269
As the sizes of many contemporary databases continue to grow rapidly, incremental clustering has emerged as an essential issue for conducting data analysis on contemporary databases. An incremental clustering algorithm refers to an abstraction of the distribution of the data instances generated by the previous run of the algorithm and is therefore able to cope well with ever-growing contemporary databases. There are two main challenges in the design of incremental clustering algorithms. The first challenge is how to reduce information loss due to the data abstraction (or summarization) operations. The second challenge is that the clustering result should not be sensitive to the order of input data. This paper presents the GRIN algorithm, an incremental hierarchical clustering algorithm for numerical datasets based on the gravity theory in physics. In the design of GRIN, a statistical test aimed at reducing information loss and distortion is employed to control the formation of subclusters as well as to monitor the evolution of the dataset. Owing to the statistical test-based summarization approach, GRIN is able to achieve near-linear scalability and is not sensitive to input ordering.

12.
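Cui Peng, Zhang Rubo. Computer Science (计算机科学), 2010, 37(7): 205-207
Semi-supervised clustering has been a research hotspot in recent years. The traditional approach adds limited background knowledge to an unsupervised algorithm to improve clustering performance. However, most semi-supervised clustering techniques are based on proximity or density and handle high-dimensional data poorly, so feature reduction must be incorporated into the semi-supervised clustering process. To solve this problem, a new semi-supervised clustering framework is proposed. The algorithm preprocesses the data using the transitivity of sample constraints, then projects the features onto a low-dimensional space for dimensionality reduction, and finally clusters the reduced samples with a semi-supervised algorithm. Experimental comparison with the main current dimensionality reduction methods shows that the method handles high-dimensional data effectively and clusters well.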

13.
In recent years, semi-supervised clustering has received a lot of attention. It is distinguished from more traditional unsupervised approaches by the use of a small amount of supervision to “steer” clustering. Unfortunately, in the real world the supervision is not always available: the data to process are often too large, so the cost (in terms of time and human resources) of user-provided information is prohibitive. To address this issue, this work presents an automatic generation of the supervision by analysis of the data structure itself. This analysis is performed using a partitional clustering algorithm that discovers relationships between pairs of instances, which may be used as semi-supervision in the clustering process. The methodology has been studied in the document clustering domain, an area where novel approaches for accurate document classification are strongly required. Experimental results show the validity of this approach.

14.
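Density peaks clustering struggles to produce good results on categorical data. This paper analyzes the causes in detail: an overlap problem in distance computation and an aggregation problem in density computation. To solve these problems, a density peaks clustering algorithm for categorical data is proposed (Cauchy kernel-based density peaks clustering for categorical data, CDPCD). The algorithm first notes that ordinal characteristics (the order relations among the values of a categorical attribute) are rarely considered when measuring distance between categorical data, and proposes a weighted ordinal distance measure based on probability distributions to alleviate the overlap problem. By incorporating the Cauchy kernel function, it re-evaluates data density on the basis of the shared-nearest-neighbor density peaks clustering algorithm, improving the density computation and the secondary assignment scheme, increasing density diversity, and reducing the impact of the aggregation problem. Experiments on several real data sets show that CDPCD achieves better clustering results than traditional partition-based and density-based clustering algorithms.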
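To make the two problems concrete, the sketch below computes the basic density peaks quantities (local density rho with a cutoff kernel, and delta, the distance to the nearest point of higher density) on categorical records using plain Hamming distance. It is the baseline that the abstract criticizes, not CDPCD; the function names, the cutoff value, and the toy records are assumptions.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def density_peaks_scores(data, cutoff):
    """Return (rho, delta) for basic density peaks clustering."""
    n = len(data)
    d = [[hamming(data[i], data[j]) for j in range(n)] for i in range(n)]
    # rho: how many other points lie strictly within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbour get their largest distance instead.
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
rho, delta = density_peaks_scores(data, cutoff=2)
```

On these four records the Hamming distances take only the values 0, 1, and 2, three points end up with the same density, and all four share the same delta, so the usual rho-delta ranking cannot separate cluster centers. That tie behaviour is the kind of overlap and aggregation the abstract refers to.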

15.
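A fuzzy clustering ensemble method based on fuzzy measure and evidence theory   Cited by: 1 (self-citations: 1, citations by others: 0)
To address the shortcomings of existing ensemble methods in handling fuzzy clusterings, a fuzzy clustering ensemble method based on evidence theory is proposed. Each clustering member is treated as an evidence element and the class relations between sample points as focal elements, and a co-association (cross-correlation) matrix is constructed by evidence accumulation. Taking into account the clustering validity of a fuzzy clustering for each sample point, a representation of class relations that combines point fuzziness and fuzzy closeness is proposed and used as the basic probability assignment function of each evidence element. Finally, similarity relations between sample points are built from the co-association matrix and clustered with a spectral clustering algorithm. Experimental comparison with several existing clustering ensemble methods shows that the method achieves high clustering performance.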
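The idea of accumulating evidence from several fuzzy partitions into a pairwise matrix can be sketched as follows. Here the agreement of points i and j within one partition is taken to be the sum over clusters of the product of their memberships, averaged over ensemble members. This is a simplified stand-in, not the basic probability assignment built from point fuzziness and fuzzy closeness in the paper; the name `fuzzy_coassociation` and the data layout are assumptions.

```python
def fuzzy_coassociation(partitions):
    """Average pairwise agreement over an ensemble of fuzzy partitions.

    partitions[t][c][i] is the membership of point i in cluster c according
    to ensemble member t; memberships of each point sum to 1 over clusters.
    """
    n = len(partitions[0][0])
    co = [[0.0] * n for _ in range(n)]
    for u in partitions:
        for i in range(n):
            for j in range(n):
                # Agreement of i and j under this ensemble member.
                co[i][j] += sum(row[i] * row[j] for row in u)
    m = len(partitions)
    return [[v / m for v in row] for row in co]


# Two crisp partitions written as 0/1 memberships for three points.
p1 = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
co = fuzzy_coassociation([p1, p2])
```

The resulting symmetric matrix can be used as a similarity matrix for a spectral clustering step, as the abstract describes for its own matrix.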

16.
Owing to sparseness, directly clustering high-dimensional data is still a challenging problem. Therefore, obtaining a low-dimensional compact representation by dimensionality reduction is an effective method for clustering high-dimensional data. Most existing dimensionality reduction methods, however, were originally developed for classification (such as Linear Discriminant Analysis) or for recovering the geometric structure (known as the manifold) of high-dimensional data (such as Locally Linear Embedding) rather than for clustering. Hence, a novel nonlinear discriminant clustering method using dimensionality reduction based on spectral regularization is proposed. The contributions of the proposed method are twofold: (1) it can obtain a nonlinear low-dimensional representation that recovers the intrinsic manifold structure as well as enhancing the cluster structure of the original high-dimensional data; (2) the clustering results can also be obtained in the dimensionality reduction procedure. Firstly, the desired low-dimensional coordinates are represented as linear combinations of predefined smooth vectors with respect to the data manifold, which is characterized by a weighted graph. Then, the optimal combination coefficients and the optimal cluster assignment matrix are computed simultaneously by maximizing the ratio of the between-cluster scatter to the total scatter while preserving the smoothness of the cluster assignment matrix with respect to the data manifold. Finally, the optimization problem is solved by an iterative procedure, which is proved to be convergent. Experiments on UCI data sets and real-world data sets demonstrate the effectiveness of the proposed method for both clustering and visualization of high-dimensional data sets.

17.
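K-Hub is an effective clustering algorithm for high-dimensional data, but it is very sensitive to the choice of initial cluster centers and often fails to cluster instances near class boundaries correctly. To solve these problems, a K-Hub clustering algorithm combining active learning and semi-supervised clustering is proposed. An active learning strategy is used to learn pairwise constraints for some instances, and these constraints then guide the K-Hub clustering process. Experimental results show that the active-learning-based K-Hub algorithm effectively improves the clustering accuracy of K-Hub.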

18.
The problem of task assignment in heterogeneous computing systems has been studied for many years with many variations. We consider the version in which communicating tasks are to be assigned to heterogeneous processors with identical communication links to minimize the sum of the total execution and communication costs. Our contributions are threefold: a task clustering method which takes the execution times of the tasks into account; two metrics to determine the order in which tasks are assigned to the processors; and a refinement heuristic which improves a given assignment. We use these three methods to obtain a family of task assignment algorithms, including multilevel ones that apply clustering and refinement heuristics repeatedly. We have implemented eight existing algorithms to test the proposed methods. Our refinement algorithm improves the solutions of the existing algorithms by up to 15%, and the proposed algorithms obtain better solutions than these refined solutions.
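The objective described in this abstract can be written down directly: the execution cost of each task on its assigned processor, plus the communication cost of every pair of communicating tasks placed on different processors. The sketch below evaluates that objective and applies a plain single-task-move hill climb as a stand-in for a refinement step. The hill climb is not the paper's heuristic, and the names `assignment_cost` and `refine` and the cost values are hypothetical.

```python
def assignment_cost(exec_cost, comm_cost, assignment):
    """Total execution cost plus communication cost across processors.

    exec_cost[t][p]: cost of running task t on processor p
    comm_cost[(a, b)]: cost paid only if tasks a and b are on different processors
    assignment[t]: processor chosen for task t
    """
    execution = sum(exec_cost[t][p] for t, p in enumerate(assignment))
    communication = sum(
        c for (a, b), c in comm_cost.items() if assignment[a] != assignment[b]
    )
    return execution + communication


def refine(exec_cost, comm_cost, assignment):
    """Move one task at a time while any single move lowers the total cost."""
    best = list(assignment)
    best_cost = assignment_cost(exec_cost, comm_cost, best)
    improved = True
    while improved:
        improved = False
        for t in range(len(best)):
            for p in range(len(exec_cost[t])):
                if p == best[t]:
                    continue
                trial = best[:t] + [p] + best[t + 1:]
                cost = assignment_cost(exec_cost, comm_cost, trial)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
    return best, best_cost


exec_cost = [[1, 4], [3, 1], [2, 2]]
comm_cost = {(0, 1): 5, (1, 2): 1}
refined, refined_cost = refine(exec_cost, comm_cost, [0, 1, 1])
```

In this toy instance the hill climb stops at cost 7 although placing all tasks on processor 0 costs 6, which shows why single-move refinement alone can get stuck and why combining it with clustering or multilevel schemes is attractive.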

19.
Multidimensional projection-based visualization methods typically rely on clustering and attribute selection mechanisms to enable visual analysis of multidimensional data. Clustering is often employed to group similar instances according to their distance in the visual space. However, considering only distances in the visual space may be misleading due to projection errors as well as the lack of guarantees that distinct clusters contain instances with different content. Identifying clusters made up of a few elements is also an issue for most clustering methods. In this work we propose a novel multidimensional projection-based visualization technique that relies on representative instances to define clusters in the visual space. Representative instances are selected by a deterministic sampling scheme derived from matrix decomposition, which is sensitive to the variability of data while still being able to handle classes with a small number of instances. Moreover, the sampling mechanism can easily be adapted to select relevant attributes from each cluster. Therefore, our methodology unifies sampling, clustering, and feature selection in a simple framework. A comprehensive set of experiments validates our methodology, showing that it outperforms most existing sampling and feature selection techniques. A case study shows the effectiveness of the proposed methodology as a visual data analysis tool.

20.
This paper presents and analyzes a Two-Phase Multi-Swarm Particle Swarm Optimizer (2MPSO) for solving the Dynamic Vehicle Routing Problem (DVRP). The research presented in this paper focuses on finding a configuration of several optimization improvement techniques, dedicated to solving dynamic optimization problems, within the 2MPSO framework. The techniques whose impact on the results achieved for the DVRP is analyzed include: solving the current state of a problem with capacitated clustering and routing heuristic algorithms, solving the requests-to-vehicles assignment by the PSO algorithm, route optimization by a separate instance of the PSO algorithm, and knowledge transfer between subsequent states of the problem. The results obtained by the best chosen configuration of the 2MPSO are compared with state-of-the-art results from the literature on a popular set of benchmark instances. Our study shows that the strong results achieved by 2MPSO should be attributed to three factors: generating initial solutions with a clustering heuristic, optimizing the requests-to-vehicles assignment with a metaheuristic approach, and directly passing solutions obtained in the previous stage (time step) of the problem-solving procedure to the next stage. Additionally, 2MPSO outperforms the average results obtained by other algorithms presented in the literature, both in the time-limited experiments and in those restricted by the number of fitness function evaluations.
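The PSO building block that 2MPSO relies on can be sketched with the standard global-best velocity and position update on a continuous test function. This is the generic textbook optimizer, not the two-phase multi-swarm scheme or its DVRP encoding; the function name `pso_minimize`, the parameter values, and the search range are assumptions.

```python
import random


def pso_minimize(f, dim, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: returns the best position found and its value."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vs[i][d] = (
                    w * vs[i][d]
                    + c1 * r1 * (pbest[i][d] - xs[i][d])
                    + c2 * r2 * (gbest[d] - xs[i][d])
                )
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2)
start_val = pso_minimize(sphere, dim=2, iters=0)[1]
```

Passing solutions between time steps, as the abstract describes, would correspond to seeding `xs` with the previous stage's best positions instead of drawing them at random.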
