Similar literature
 20 similar documents found (search time: 812 ms)
1.
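Zhu Erzhou, Sun Yue, Zhang Yuanxiang, Gao Xin, Ma Ruhui, Li Xuejun. Journal of Software (软件学报), 2021, 32(10): 3085-3103
Cluster analysis is an active research topic in statistics, pattern recognition, and machine learning. Effective cluster analysis can reveal the intrinsic structure and characteristics of a data set. However, because clustering is unsupervised, existing methods still suffer from unstable results and fail to cluster data sets of diverse structures correctly. To address these problems, the paper first combines the ideas of the K-means algorithm and hierarchical clustering into a hybrid clustering algorithm, K-means-AHC. Second, drawing on knee-point detection, it proposes a new cluster validity index based on the average synthesis degree, DAS (difference of average synthesis degree), to evaluate the quality of K-means-AHC clustering results. Finally, it combines K-means-AHC with the DAS index to design an effective method for finding the optimal number of clusters and the optimal partition of a data set. In experiments on data sets of various structures, the algorithm improves clustering accuracy without adding much time overhead, and the new DAS index outperforms commonly used cluster validity indices in evaluating clustering results.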

2.
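A method for determining the optimal number of clusters for the K-means algorithm   Total citations: 10 (self-citations: 0; citations by others: 10)
The K-means clustering algorithm requires the number of clusters k to be fixed in advance, but this number usually cannot be known beforehand. A new cluster validity index is designed from the perspective of the geometric structure of the samples, and on this basis a new method for determining the optimal number of clusters for K-means is proposed. Theoretical analysis and experimental results verify the effectiveness and good performance of the approach.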
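The general idea in this abstract, running K-means for several candidate values of k and keeping the k that scores best on a validity index, can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not define its geometric index, so the well-known Calinski-Harabasz index is swapped in, and the deterministic farthest-first initialisation and the helper names (`kmeans`, `calinski_harabasz`, `best_k`) are hypothetical choices, not the authors' method.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=50):
    """Plain Lloyd iterations with farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers

def calinski_harabasz(points, labels, centers):
    """Calinski-Harabasz index (a stand-in for the paper's unspecified index)."""
    n, k = len(points), len(centers)
    overall = mean(points)
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))

def best_k(points, k_max):
    """Score every k in 2..k_max and return the best k with all scores."""
    scores = {}
    for k in range(2, k_max + 1):
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores
```

Calling `best_k(points, k_max)` on a list of coordinate tuples returns the selected k together with the score for each candidate; any other validity index could be substituted for `calinski_harabasz` without changing the sweep.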

3.
We propose an internal cluster validity index for a fuzzy c-means algorithm which combines a mathematical model for the fuzzy c-partition and a heuristic search for the number of clusters in the data. Our index resorts to information theoretic principles, and aims to assess the congruence between such a model and the data that have been observed. The optimal cluster solution represents a trade-off between discrepancy and the complexity of the underlying fuzzy c-partition. We begin by testing the effectiveness of the proposed index using two sets of synthetic data, one comprising a well-defined cluster structure and the other containing only noise. Then we use datasets arising from real life problems. Our results are compared to those provided by several available indices and their goodness is judged by an external measure of similarity. We find substantial evidence supporting our index as a credible alternative to the cluster validation problem, especially when it concerns structureless data.

4.
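Determining the optimal number of clusters in a data set is an important and difficult problem in clustering research. To determine it more effectively, this paper proposes a method that improves the K-means algorithm and combines it with a validity index Q(c) that does not depend on any particular algorithm. Theoretical analysis and experimental results show that the method performs well and is effective.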

5.
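A method for determining the optimal number of clusters based on the affinity propagation algorithm   Total citations: 2 (self-citations: 0; citations by others: 2)
In cluster analysis, determining the optimal number of clusters is key to clustering quality. To this end, a sample clustering distance and a sample clustering deviation distance are defined from the perspective of the geometric structure of the samples, and a new cluster validity index is designed. On this basis, a method based on the affinity propagation algorithm is proposed for determining the optimal number of clusters. Theoretical analysis and experimental results show that the proposed index and method evaluate clustering results effectively and are suitable for determining the optimal number of clusters.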

6.
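Fuzzy clustering is an important research topic in pattern recognition, machine learning, and image processing. The fuzzy C-means clustering algorithm is the most commonly used fuzzy clustering algorithm, but it requires the number of clusters to be given in advance. A new cluster validity index is proposed to validate clustering results. Drawing on partition entropy, membership degree, and geometric structure, the index defines three key measures: compactness, separation, and overlap. On this basis, a method for determining the optimal number of clusters is proposed. The new index and traditional validity indices are compared experimentally on six artificial data sets and three real data sets. The results show that the proposed index and method evaluate clustering results effectively and are suitable for determining the optimal number of clusters.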

7.
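Building on the principles of traditional algorithms for determining the number of clusters in a data set, a new algorithm, MHC, is proposed. It uses a bottom-up strategy to generate partitions of the data set at different levels, computes the clustering quality at each level, and selects the optimal number of clusters according to that quality. A new validity index, BIP, is also designed to measure the clustering quality of different partitions; it relies mainly on the geometric structure of the data set. Experimental results show that the algorithm accurately determines the optimal number of clusters in multidimensional data sets.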

8.
A new cluster validity index is proposed that determines the optimal partition and optimal number of clusters for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index exploits an overlap measure and a separation measure between clusters. The overlap measure, which indicates the degree of overlap between fuzzy clusters, is obtained by computing an inter-cluster overlap. The separation measure, which indicates the isolation distance between fuzzy clusters, is obtained by computing a distance between fuzzy clusters. A good fuzzy partition is expected to have a low degree of overlap and a large separation distance. Testing of the proposed index and nine previously formulated indexes on well-known data sets showed the superior effectiveness and reliability of the proposed index in comparison to other indexes.

9.
A measurement of cluster quality is often needed for DNA microarray data analysis. In this paper, we introduce a new cluster validity index, which measures geometrical features of the data. The essential concept of this index is to evaluate the ratio of the squared total length of the data eigen-axes to the between-cluster separation. We show that this cluster validity index works well for data that contain clusters closely distributed or with different sizes. We verify the method using three simulated data sets, two real world data sets and two microarray data sets. The experimental results show that the proposed index is superior to five other cluster validity indices, including partition coefficients (PC), general silhouette index (GS), Dunn’s index (DI), CH Index and I-Index. Also, we have given a theorem to show for what situations the proposed index works well.
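For orientation, one of the baseline indices named in this abstract, the partition coefficient (PC), is simple enough to sketch. The code below assumes Bezdek's standard definition (the mean over samples of the sum of squared memberships); it is not the eigen-axis index the abstract proposes, and the function name and the row-per-sample input layout are illustrative assumptions.

```python
def partition_coefficient(memberships):
    """Partition coefficient of a fuzzy partition.

    `memberships` is a list of rows, one per sample; each row holds that
    sample's membership degrees in every cluster (assumed to sum to 1).
    """
    n = len(memberships)
    if n == 0:
        raise ValueError("need at least one sample")
    # Mean over samples of the sum of squared membership degrees.
    return sum(u * u for row in memberships for u in row) / n
```

With c clusters the value lies between 1/c (every sample spread evenly over all clusters) and 1 (a crisp partition), so higher values indicate a less fuzzy partition.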

10.
We present a categorical logic formulation of induction and coinduction principles for reasoning about inductively and coinductively defined types. Our main results provide sufficient criteria for the validity of such principles: in the presence of comprehension, the induction principle for initial algebras is admissible, and dually, in the presence of quotient types, the coinduction principle for terminal coalgebras is admissible. After giving an alternative formulation of induction in terms of binary relations, we combine both principles and obtain a mixed induction/coinduction principle which allows us to reason about minimal solutions X ≅ σ(X) where X may occur both positively and negatively in the type constructor σ. We further strengthen these logical principles to deal with contexts and prove that such strengthening is valid when the (abstract) logic we consider is contextually/functionally complete. All the main results follow from a basic result about adjunctions between “categories of algebras” (inserters).

11.
NeuroIS, the methods and knowledge of neuroscience applied to the information systems (IS) domain, has become an established research field within the IS discipline. A key advantage of NeuroIS is its ability to provide insights into human cognition beyond those obtained using behavioural techniques alone. Nevertheless, in neuroscience, there is renewed interest in examining behaviour together with neurophysiological methods to better inform our understanding of neural processes. In this research opinion article, we argue that in the field of NeuroIS, there is an opportunity for hybrid programs of study that combine neurophysiological and behavioural methods in a complementary manner. We outline four strategies for designing complementary neurophysiological and behavioural experiments in a research program: (1) observe the relationship between neural processes and behavioural change; (2) combine neurophysiological and behavioural methods to enhance internal, external, and ecological validity; (3) extend, rather than replicate, experiments based on theory; and (4) use neurophysiological and behavioural experiments together to evaluate IT artefact design. By applying these strategies, researchers can more effectively design programs using complementary neurophysiological and behavioural methods, which, in turn, can help to provide richer insights into the phenomena under study as well as accelerate the advancement of IS knowledge.

12.
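A comparative study of methods for determining the optimal number of clusters based on the affinity propagation algorithm   Total citations: 2 (self-citations: 0; citations by others: 2)
In cluster analysis, determining the optimal number of clusters is key to clustering quality. The affinity propagation clustering algorithm, which clusters well, is used to cluster the samples, and six cluster validity indices are each applied to the clustering results to determine the optimal number of clusters. These validity indices are analyzed in detail, and the way the IGP index determines the optimal number of clusters is improved. The performance of the indices is compared experimentally on eight data sets. Analysis and experimental results show that, based on affinity propagation clustering...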

13.
Ergonomics, 2012, 55(3): 404-420
Data from on-road and simulation studies were compared to assess the validity of measures generated in the simulator. In the on-road study, driver interaction with three manual address entry methods (keypad, touch screen and rotational controller) was assessed in an instrumented vehicle to evaluate relative usability and safety implications. A separate group of participants drove a similar protocol in a medium fidelity, fixed-base driving simulator to assess the extent to which simulator measures mirrored those obtained in the field. Visual attention and task measures mapped very closely between the two environments. In general, however, driving performance measures did not differentiate among devices at the level of demand employed in this study. The findings obtained for visual attention and task engagement suggest that medium fidelity simulation provides a safe and effective means to evaluate the effects of in-vehicle information systems (IVIS) designs on these categories of driver behaviour.

Statement of Relevance: Realistic evaluation of the user interface of IVIS has significant implications for both user acceptance and safety. This study addresses the validity of driving simulation for accurately modelling differences between interface methodologies by comparing results from the field with those from a medium fidelity, fixed-base simulator.

14.
Cluster validity indices are used to estimate the quality of partitions produced by clustering algorithms and to determine the number of clusters in data. Cluster validation is a difficult task because, for the same data set, several partitions may exist that fit its natural groupings at different levels of detail. Even though several cluster validity indices exist, they are inefficient when clusters differ widely in density or size. We propose a clustering validity index that addresses these issues. It is based on compactness and overlap measures. The overlap measure, which indicates the degree of overlap between fuzzy clusters, is obtained by calculating the overlap rate of all data objects that belong strongly enough to two or more clusters. The compactness measure, which indicates the degree of similarity of data objects in a cluster, is calculated from the membership values of data objects that are strongly enough associated with one cluster. We propose ratio-type and summation-type indices using the same compactness and overlap measures. The maximal value of the index denotes the optimal fuzzy partition, which is expected to have high compactness and a low degree of overlap among clusters. Tests of the proposed indices and many well-known earlier indices on well-known data sets showed the superior reliability and effectiveness of the proposed index in comparison to other indices, especially when evaluating partitions with clusters that differ widely in size or density.

15.
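Sun Xiujuan, Liu Xiyu. Journal of Computer Applications (计算机应用), 2008, 28(12): 3244-3247
In the K-means algorithm, the number of clusters k is one of the key factors affecting clustering quality. Many cluster validity methods have been proposed for determining the best value of k, but none of them handles two kinds of data sets well: those whose clusters differ in density, and those with small between-cluster distances (data sets containing merged clusters). To address this, a new cluster validity function is proposed, defined as the ratio of the squared total length of the data eigen-axes to the minimum between-cluster distance; the optimal number of clusters is the k at which this ratio reaches its minimum. In addition, to reduce the sensitivity of K-means to noise and outliers, a weighted, improved K-means method is used to compute the cluster centers. Experiments show that, compared with other algorithms, the K-wmeans algorithm based on the new validity function not only reduces the influence of noise and outliers on the clustering results but also handles the two kinds of data sets mentioned above effectively, markedly improving clustering quality.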

16.
The upper bound of the optimal number of clusters in fuzzy clustering   Total citations: 7 (self-citations: 0; citations by others: 7)
The upper bound of the optimal number of clusters in clustering algorithms is studied in this paper. A new method is proposed to solve this issue. This method shows that the rule c_max ≤ √n, which is popular in current papers, is reasonable in some sense. The above conclusion is tested and analyzed by some typical examples in the literature, which demonstrates the validity of the new method.
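A minimal sketch of how such an upper bound is typically used, assuming the rule is the common heuristic c_max ≤ √n with n the number of samples: it limits the range of cluster counts that a validity-index search has to try. The function name is hypothetical, and the lower limit of two clusters is an assumption rather than something the abstract states.

```python
import math

def candidate_cluster_counts(n_samples):
    """Candidate cluster counts 2..floor(sqrt(n_samples)) under c_max <= sqrt(n)."""
    if n_samples < 4:
        # sqrt(n) < 2, so no count of at least two clusters satisfies the rule.
        return []
    c_max = math.isqrt(n_samples)  # integer floor of the square root
    return list(range(2, c_max + 1))
```

For a data set of 150 samples this gives the candidates 2 through 12, which a search over cluster counts would then score one by one.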

17.
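Reverse mapping rules are given from the logical nodes of PASCAL procedure blueprints to the concept nodes of abstract logical structure diagrams. By constructing a bidirectional mapping relation graph that is equivalent to the bidirectional mapping rule sets, the relationship between the two rule sets and their properties are revealed. Based on a definition of validity for the bidirectional mapping functions between the conceptual-level representation and the PASCAL logical-level representation, a validity theorem for these functions is then given.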

18.
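Li Jie, Gao Xinbo, Jiao Licheng. Control and Decision (控制与决策), 2004, 19(11): 1250-1254
A fuzzy CLOPE algorithm is proposed, and a modified partition fuzziness degree is defined and used as a new cluster validity function for automatic parameter selection. Experiments on real data show that the fuzzy CLOPE algorithm and the parameter selection method based on the modified partition fuzziness degree are very effective.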

19.
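A design scheme for checking the validity of Oracle database backup files   Total citations: 1 (self-citations: 0; citations by others: 1)
Starting from the question of whether Oracle database backup files are valid, this work addresses the problem that enterprises tend to focus on designing a thorough backup scheme for production data while neglecting to check the validity of the backup data. Based on backup checking principles and taking the actual production environment into account, a validity checking scheme for Oracle database backup files is designed. Finally, a case of backup validity checking for an enterprise's fund management system is presented.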

20.
A fuzzy reinforcement learning (FRL) scheme which is based on the principles of sliding-mode control and fuzzy logic is proposed. The FRL uses only immediate reward. Sufficient conditions for the convergence of the FRL to the optimal task performance are studied. The validity of the method is tested through simulation examples of a robot which deburrs a metal surface.
