Similar literature
 Found 20 similar documents; search took 15 ms
1.
Applying graph theory to clustering, we propose a partitional clustering method and a clustering tendency index. The method requires no initial assumptions about the data set. The number of clusters and the partition that best fits the data set are selected according to the optimal value of the clustering tendency index.

2.
In this paper, multiple kernel learning (MKL) is formulated as a supervised classification problem. We deal with binary classification data, so the data modelling problem involves computing two decision boundaries, one related to kernel learning and the other to the input data. In our approach, both are found with a single cost function: we construct a global reproducing kernel Hilbert space (RKHS) as the direct sum of the RKHSs corresponding to the decision boundaries of kernel learning and of the input data, and search this global RKHS for a function that can be represented as the direct sum of the two decision boundaries. In our experimental analysis, the proposed model showed superior performance compared with the existing two-stage function approximation formulation of MKL, in which the decision functions of kernel learning and input data are found separately using two different cost functions. This is because the single-stage representation enables knowledge transfer between the procedures for finding the two decision boundaries, which in turn boosts the generalisation capacity of the model.

3.
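Parallel fuzzy clustering algorithm based on the kernel method   Total citations: 1, self-citations: 0, citations by others: 1
This paper introduces and analyses the fuzzy C-means clustering algorithm, the kernel-based fuzzy C-means clustering algorithm, and hard clustering algorithms. Hard and fuzzy clustering are combined: a hard clustering algorithm initializes the cluster centers, which effectively reduces the number of iterations of the fuzzy clustering algorithm. To handle massive data, the improved algorithm is parallelized, which effectively increases data processing speed and efficiency, and its performance is tested in a distributed environment of interconnected PCs. The test results show that the kernel-based parallel fuzzy clustering algorithm has good scalability and speedup.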

4.
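Kernel clustering algorithm based on a fuzzy one-class support vector machine   Total citations: 2, self-citations: 0, citations by others: 2
Fuzzy concepts are introduced to replace the distance-based rejection measure, and a fuzzy membership function with support-vector properties is defined to describe the degree to which each training point belongs to a cluster. The contribution weight of boundary points to the cluster center is penalized, which suppresses drift of the cluster center and ensures the robustness of the algorithm while avoiding a complex parameter search. Simulation results show that, under the same initial conditions, the improved algorithm processes irregularly distributed data more efficiently than the original algorithm.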

5.
Clustering analysis is an important topic in artificial intelligence, data mining, and pattern recognition research. Conventional clustering algorithms, such as the well-known fuzzy C-means clustering algorithm (FCM), assume that all attributes are equally relevant to all clusters. However, in most domains, especially for high-dimensional datasets, some attributes are irrelevant, and some relevant ones are less important than others with respect to a specific class. In this paper, such imbalances between attributes are considered and a new weighted fuzzy kernel-clustering algorithm (WFKCA) is presented. WFKCA performs clustering in a kernel feature space mapped by Mercer kernels. Compared with the conventional hard kernel-clustering algorithm, WFKCA can yield meaningful prototypes (cluster centers) of the clusters. Numerical convergence properties of WFKCA are also discussed. WFKCA is further extended to WFKCA2, which has been demonstrated to be a useful tool for clustering incomplete data. Numerical examples demonstrate the effectiveness of the new WFKCA algorithm.

6.
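The fuzzy K-means clustering algorithm is combined with kernel functions, and the resulting kernel-based fuzzy K-means algorithm is used for clustering. The kernel function implicitly defines a nonlinear transformation that maps the data into a high-dimensional feature space, which increases the separability of the data. The algorithm addresses the inability of fuzzy K-means to cluster data with non-convex shapes correctly.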
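
The kernel-based fuzzy clustering idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rules, so the code below assumes one common variant: a Gaussian kernel with prototypes kept in the input space, memberships proportional to (1 - K(x, v))^(-1/(m-1)), and kernel-weighted center updates. The function names, the parameters `m` and `sigma`, and the requirement that the caller supplies initial centers are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, sigma: float) -> float:
    """Gaussian (RBF) kernel; equals 1.0 when x == y."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_fcm(points: List[Point], init_centers: List[Point],
               m: float = 2.0, sigma: float = 1.0,
               max_iter: int = 100, tol: float = 1e-6
               ) -> Tuple[List[List[float]], List[List[float]]]:
    """Assumed variant of kernel fuzzy c-means (input-space prototypes).

    Returns (centers, u) where u[i][k] is the membership of point k
    in cluster i.
    """
    centers = [list(v) for v in init_centers]
    c, n, dim = len(centers), len(points), len(points[0])
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: kernel-induced distance is 1 - K(x, v).
        for k, x in enumerate(points):
            dists = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12)
                     for v in centers]
            weights = [d ** (-1.0 / (m - 1.0)) for d in dists]
            total = sum(weights)
            for i in range(c):
                u[i][k] = weights[i] / total
        # Center update: each point is weighted by u^m * K(x, v).
        shift = 0.0
        for i in range(c):
            w = [(u[i][k] ** m) * gaussian_kernel(points[k], centers[i], sigma)
                 for k in range(n)]
            total = sum(w)
            new_center = [sum(w[k] * points[k][j] for k in range(n)) / total
                          for j in range(dim)]
            shift = max(shift, max(abs(a - b)
                                   for a, b in zip(new_center, centers[i])))
            centers[i] = new_center
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    blob_b = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
    centers, u = kernel_fcm(blob_a + blob_b, [(0.0, 0.0), (6.0, 6.0)],
                            sigma=1.5)
    print(centers)
```

With a Gaussian kernel, distant points receive almost no weight in the center update, so this variant is sensitive to where the initial centers are placed; that is why the sketch takes them as an explicit argument rather than sampling them.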

7.
All state-of-the-art approaches based on evolutionary algorithms (EA) for the meta-matching problem in ontology alignment require a domain expert to provide a reference alignment (RA) between two ontologies in advance. Since the RA is very expensive to obtain, especially when the ontologies are very large, in this paper we propose to use a Partial Reference Alignment (PRA), built by a clustering-based approach, in place of the RA in the evolutionary process. A problem-specific Memetic Algorithm (MA) is then proposed to address the meta-matching problem by optimizing the aggregation of three basic similarity measures (Syntactic Measure, Linguistic Measure, and Taxonomy-based Measure) into a single similarity metric. The experimental results show that using the PRA constructed by our approach leads in most cases to higher-quality solutions than using a PRA built by randomly selecting classes from the ontology, and that the solution quality is very close to that of the approach using the RA, where the precision of the solution is generally high. Compared with state-of-the-art ontology matching systems, our approach obtains more accurate results. Moreover, our approach performs better than the GOAL approach, which is based on a Genetic Algorithm (GA) and the RA, with an average improvement of up to 50.61%. Therefore, the proposed approach is effective.

8.
Fingerprint matching is an important problem in fingerprint identification. A fingerprint is usually represented by a set of minutiae, and most existing fingerprint identification systems match two fingerprints using a minutiae-based method. Typically, they choose one reference minutia from the template fingerprint and one from the query fingerprint. When matching the two sets of minutiae, the reference minutiae pair is first aligned in position and direction, and the matching score of the remaining minutiae is then evaluated. This method guarantees satisfactory alignment of the regions adjacent to the reference minutiae. However, the alignment of regions far from the reference minutiae is usually less satisfactory. In this paper, we propose a minutia matching method based on the global alignment of multiple pairs of reference minutiae, which are distributed across various fingerprint regions. During matching, these pairs of reference minutiae are globally aligned, so that region pairs far from the original reference minutiae are aligned more satisfactorily. Experiments show that this method improves system identification performance.

9.
Image clustering methods are efficient tools for applications such as content-based image retrieval and image annotation. Recently, graph-based manifold learning methods have shown promising performance in extracting features for image clustering. Typical manifold learning methods adopt a suitable neighborhood size to construct the neighborhood graph, which captures the local geometry of the data distribution. Because the density of data points may differ across regions of the manifold, a fixed neighborhood size may be inappropriate for building the manifold. In this paper, we propose a novel algorithm, named the sparse patch alignment framework, for embedding data lying in multiple manifolds. Specifically, we assume that for each data point there exists a small neighborhood in which only the points that come from the same manifold lie approximately in a low-dimensional affine subspace. Based on the patch alignment framework, we propose an optimization strategy for constructing local patches, which adopts sparse representation to select a few neighbors of each data point that span a low-dimensional affine subspace passing near that point. The whole alignment strategy is then used to build the manifold. Experiments are conducted on four real-world datasets, and the results demonstrate the effectiveness of the proposed method.

10.
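Medical image segmentation based on PSO_KFCM   Total citations: 1, self-citations: 0, citations by others: 1
Building on the kernel fuzzy clustering algorithm (KFCM), a new PSO_KFCM clustering algorithm is proposed. The new algorithm uses a Gaussian kernel function to map samples from the input space into a high-dimensional feature space, and it exploits the global search and fast convergence of particle swarm optimization (PSO) to replace the step-by-step iteration of KFCM when clustering in the feature space. This overcomes KFCM's sensitivity to initial values and noisy data and its tendency to become trapped in local optima. Simulation experiments on medical image segmentation show that the new algorithm considerably improves on the KFCM clustering algorithm, gives better clustering results, and converges quickly.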

11.
Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate and statistically significant multiple alignments is still a challenge. In this paper, we propose an efficient method that uses a multi-objective genetic algorithm (MSAGMOGA) to discover optimal alignments with affine gaps in multiple sequence data. The main advantage of our approach is that a large number of tradeoff (i.e., non-dominated) alignments can be obtained in a single run with respect to conflicting objectives: affine gap penalty minimization, and similarity and support maximization. To the best of our knowledge, this is the first effort with three objectives in this direction. The proposed method can be applied to any data set with a sequential character. Furthermore, it allows any choice of similarity measure for finding alignments. By analyzing the obtained optimal alignments, the decision maker can understand the tradeoff between the objectives. We compared our method with three well-known multiple sequence alignment methods: MUSCLE, SAGA, and MSA-GA. The first is a progressive method, and the other two are based on evolutionary algorithms. Experiments on the BAliBASE 2.0 database confirm that MSAGMOGA obtains results with better accuracy and statistical significance than the three well-known methods when aligning multiple sequences with affine gaps. The proposed method also finds solutions faster than the other evolutionary approaches mentioned above.

12.
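A new method for mapping music into R^n space and a string-kernel-based method for clustering musical styles are presented. Statistical analysis of the melodic contours of a large body of music yields a suitable encoding scheme, which is used to encode each melodic contour as a string over a finite alphabet (8 letters). A contiguous-substring embedding maps the music strings explicitly into a high-dimensional R^n space, and this mapping is represented by a kernel. A kernel-based mountain method selects suitable initial points for clustering, and a kernel-based K-means method then clusters the music data sets. The clustering performance of 3 different string kernels is compared on 5 music data sets.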
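
A contiguous-substring embedding of the kind this abstract describes can be sketched as follows. The abstract does not say which three string kernels were compared, so the code assumes the standard p-spectrum kernel: each string is mapped to the counts of its contiguous substrings of length `p`, and the kernel is the dot product of the count vectors. The example strings over the letters A to H are hypothetical stand-ins for encoded melodic contours.

```python
import math
from collections import Counter


def spectrum_features(s: str, p: int) -> Counter:
    """Explicit embedding: counts of every contiguous substring of length p."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s: str, t: str, p: int) -> int:
    """p-spectrum kernel: dot product of the two substring-count vectors."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    return sum(count * ft[sub] for sub, count in fs.items())


def normalized_spectrum_kernel(s: str, t: str, p: int) -> float:
    """Cosine-normalized kernel, so strings of different length compare fairly."""
    denom = math.sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical contour strings over an 8-letter alphabet (A-H).
    melody_1 = "ABCABCDE"
    melody_2 = "ABCDEFGH"
    print(spectrum_kernel(melody_1, melody_2, 3))
    print(normalized_spectrum_kernel(melody_1, melody_2, 3))
```

Because the kernel values are ordinary inner products, the resulting kernel matrix can be passed to any kernel-based clustering procedure, such as the kernel K-means step mentioned in the abstract.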

13.
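Manually labelling data classes is too costly, and traditional clustering methods suffer from the curse of dimensionality on high-dimensional data. To address both problems, a new fuzzy kernel clustering method for unlabeled data is proposed. K-means and DBSCAN are combined to generate an association matrix, a threshold on the constraint conditions yields the initial clustering result, and the clustering is then completed on the basis of fuzzy support vector data description. Comparative experiments on network connection data verify the feasibility and effectiveness of the method.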

14.
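The state of research on clustering algorithms in data mining is summarized, and the differences and limitations of the algorithms are analysed and compared. A new clustering method is proposed. A worked example indicates that the method provides an effective platform for data mining.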

15.
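Dynamically weighted fuzzy kernel clustering algorithm   Total citations: 2, self-citations: 0, citations by others: 2
To overcome the influence of noisy feature vectors on clustering, and to account for the different contributions of individual feature vectors to the clustering result, a Mercer kernel is used to map the data into a high-dimensional space, and a new dynamically weighted fuzzy kernel clustering algorithm is proposed. Through dynamic weighting, the algorithm automatically weakens the role of noisy feature vectors in classification. Without any prior information about the data, it can both partition linearly separable data accurately and partition non-spherical data nonlinearly. Results on simulated and real data show that noise in the data has little effect on the classification result and that the algorithm is highly practical.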

16.
In this paper, we develop a novel framework, called Monitoring Vehicle Outliers based on a Clustering technique (MVOC), for monitoring vehicle outliers caused by complex vehicle states. Vehicle outlier monitoring continuously checks the current condition of a vehicle. Most previous monitoring methods perform simple operations that rely on uncomplicated analyses or on the expected lifetimes of vehicle components. However, many serious vehicle outliers, such as the engine turning off during a drive, result from complex vehicle states influenced by correlated components. Unlike previous methods, which consider individual components, the proposed method uses a clustering technique to monitor the current vehicle condition on the basis of more complex and varied vehicle states. We cluster the vehicle data and then analyze the generated clusters together with information on vehicle outliers caused by complex correlations among vehicle components, which lets us learn vehicle information in more detail. To facilitate MVOC, we also propose related techniques, such as sampling cluster data with representative attributes and deciding cluster characteristics on the basis of relations between vehicle data and states. We then evaluate the performance of our approach at monitoring vehicle outliers through various experiments based on real complex correlations between outliers and vehicle data. Experimental results show that the proposed method can monitor complex outliers by predicting the possibility of their occurrence in advance and that it outperforms a standard technique. Moreover, we establish the statistical significance of the results through significance tests.

17.
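A new support vector machine with a mixed kernel function   Total citations: 1, self-citations: 0, citations by others: 1
To address the performance limitations of support vector machines with a single kernel function, a new mixed-kernel support vector machine is proposed that combines the sigmoid kernel with the Gaussian kernel. The Gaussian kernel is a typical local kernel, while the sigmoid kernel has been shown in neural networks to have good global classification performance. The new mixed kernel combines the advantages of both, and the resulting support vector machine classifies better than support vector machines built on a single kernel. Experimental results demonstrate the effectiveness of the method.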
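
A mixed kernel of this kind can be sketched in a few lines. The abstract does not state how the two kernels are combined, so the code assumes a convex combination with a mixing weight `lam`; the parameter names `gamma`, `alpha`, and `coef0` are likewise illustrative assumptions. Note that the sigmoid (tanh) kernel is not positive semi-definite for every parameter choice, so a mixture that includes it is not guaranteed to be a valid Mercer kernel.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Local kernel: decays quickly as x and y move apart."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def sigmoid_kernel(x: Vector, y: Vector,
                   alpha: float = 0.01, coef0: float = 0.0) -> float:
    """Global kernel: depends on the inner product, not on the distance."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(alpha * dot + coef0)


def mixed_kernel(x: Vector, y: Vector, lam: float = 0.7,
                 gamma: float = 0.5, alpha: float = 0.01,
                 coef0: float = 0.0) -> float:
    """Assumed convex combination: lam * Gaussian + (1 - lam) * sigmoid."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (lam * gaussian_kernel(x, y, gamma)
            + (1.0 - lam) * sigmoid_kernel(x, y, alpha, coef0))


if __name__ == "__main__":
    a, b = (1.0, 2.0), (2.0, 0.5)
    print(mixed_kernel(a, b))
```

Setting `lam` to 1 or 0 recovers the pure Gaussian or pure sigmoid kernel, which makes it easy to compare the mixture against each single-kernel baseline.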

18.
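Wu Kaixing (吴开兴), Yang Ying (杨颖), Zhang Hu (张虎). 《微计算机信息》 (Microcomputer Information), 2006, 22(13): 279-281
This paper examines the design of the dictionary in dictionary-based vector map compression and proposes a novel clustering-based dictionary design method, which lets the dictionary better approximate a particular data set. Experiments show that, provided the dictionary structure is suitable, this clustering-based dictionary compression technique achieves better compression.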

19.
In multi-instance learning, the training set is composed of labeled bags, each consisting of many unlabeled instances; that is, an object is represented by a set of feature vectors instead of only one feature vector. Most current multi-instance learning algorithms work by adapting single-instance learning algorithms to the multi-instance representation, whereas this paper proposes a new solution that goes the opposite way: adapting the multi-instance representation to single-instance learning algorithms. In detail, the instances of all the bags are first collected together and clustered into d groups. Each bag is then re-represented by d binary features, where the value of the ith feature is set to one if the bag has instances falling into the ith group and to zero otherwise. Thus, each bag is represented by one feature vector, so that single-instance classifiers can be used to distinguish different classes of bags. By repeating the above process with different values of d, many classifiers can be generated and then combined into an ensemble for prediction. Experiments show that the proposed method works well on standard as well as generalized multi-instance problems. Zhi-Hua Zhou is currently Professor in the Department of Computer Science & Technology and head of the LAMDA group at Nanjing University. His main research interests include machine learning, data mining, information retrieval, and pattern recognition. He is associate editor of Knowledge and Information Systems and on the editorial boards of Artificial Intelligence in Medicine, International Journal of Data Warehousing and Mining, Journal of Computer Science & Technology, and Journal of Software. He has also been involved in various conferences. Min-Ling Zhang received his B.Sc. and M.Sc. degrees in computer science from Nanjing University, China, in 2001 and 2004, respectively. Currently he is a Ph.D. candidate in the Department of Computer Science & Technology at Nanjing University and a member of the LAMDA group. His main research interests include machine learning and data mining, especially multi-instance learning and multi-label learning.
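
The bag re-representation step described in this abstract can be sketched as follows. The abstract does not name the clustering algorithm, so the code assumes plain k-means with a deterministic initialization (the first d pooled instances), chosen only to keep the example reproducible. The function names are hypothetical, and the ensemble over several values of d is left out.

```python
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def _sq_dist(a: Instance, b: Instance) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Instance, centers: List[List[float]]) -> int:
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: _sq_dist(p, centers[i]))


def kmeans(points: List[Instance], d: int,
           max_iter: int = 100) -> List[List[float]]:
    """Plain k-means; assumed stand-in for the unspecified clustering step."""
    centers = [list(p) for p in points[:d]]  # deterministic initialization
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [_nearest(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for i in range(d):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bags_to_binary_features(bags: List[Bag], d: int) -> List[List[int]]:
    """Re-represent each bag by d binary features, one per instance group."""
    pooled = [inst for bag in bags for inst in bag]
    centers = kmeans(pooled, d)
    features = []
    for bag in bags:
        row = [0] * d
        for inst in bag:
            row[_nearest(inst, centers)] = 1  # bag has an instance in group i
        features.append(row)
    return features


if __name__ == "__main__":
    bags = [
        [(0.0, 0.0), (0.0, 1.0)],      # instances from one region only
        [(10.0, 10.0), (10.0, 11.0)],  # instances from the other region only
        [(1.0, 0.0), (11.0, 10.0)],    # instances from both regions
    ]
    print(bags_to_binary_features(bags, d=2))  # [[1, 0], [0, 1], [1, 1]]
```

Each row is an ordinary fixed-length feature vector, so any single-instance classifier can be trained on the rows together with the bag labels.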

20.
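Cluster analysis occupies an important place in data mining research. Visualizing clustering results means showing the quality of a clustering graphically and intuitively. The visualization methods currently in use are mostly statistical charts, such as pie charts and bar charts. However, these charts reflect only the size relationships between clusters and the proportions of components within a cluster; they do not go down to the level of individual objects and do not use the information each object contains. To address these problems, this paper proposes three methods for visualizing clustering results: the random dot plot, the sequential dot plot, and the electron cloud plot. The random dot plot is simple and easy to implement; the sequential dot plot shows exactly which objects were misclassified and is suited to displaying the clustering process dynamically; the electron cloud plot shows the distance between each object and its cluster center.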
