首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Mining and visualization of time profiled temporal associations is an important research problem that is not addressed in a wider perspective and is understudied. Visual analysis of time profiled temporal associations helps to better understand hidden seasonal, emerging, and diminishing temporal trends. The pioneering work by Yoo and Shashi Sekhar termed as SPAMINE applied the Euclidean distance measure. Following their research, subsequent studies were only restricted to the use of Euclidean distance. However, with an increase in the number of time slots, the dimensionality of a prevalence time sequence of temporal association, also increases, and this high dimensionality makes the Euclidean distance not suitable for the higher dimensions. Some of our previous studies, proposed Gaussian based dissimilarity measures and prevalence estimation approaches to discover time profiled temporal associations. To the best of our knowledge, there is no research that has addressed a similarity measure which is based on the standard score and normal probability to find the similarity between temporal patterns in z-space and retains monotonicity. Our research is pioneering work in this direction. This research has three contributions. First, we introduce a novel similarity (or dissimilarity) measure, SRIHASS to find the similarity between temporal associations. The basic idea behind the design of dissimilarity measure is to transform support values of temporal associations onto z-space and then obtain probability sequences of temporal associations using a normal distribution chart. The dissimilarity measure uses these probability sequences to estimate the similarity between patterns in z-space. The second contribution is the prevalence bound estimation approach. Finally, we give the algorithm for time profiled associating mining called Z-SPAMINE that is primarily inspired from SPAMINE. Experiment results prove that our approach, Z-SPAMINE is computationally more efficient and scalable compared to existing approaches such as Naïve, Sequential and SPAMINE that applies the Euclidean distance.  相似文献   

2.
3.
基于粗糙集的改进K—Modes聚类算法   总被引:3,自引:0,他引:3  
传统的K-Modes算法采用简单匹配的方法来计算对象之间的距离,并没有充分考虑同一属性下的两个不同值之间的相似性.基于粗糙集中的上、下近似,提出了一种新的距离度量,并重新定义了类中心,对传统K-Modes算法进行了改进.与其他改进K-Modes算法进行了比较,实验结果表明,基于粗糙集的改进K-Modes算法有效地提高了聚类精度.  相似文献   

4.
为解决社会关系网络图中节点没有坐标值、不能采用传统的欧几里得距离和曼哈坦距离进行聚类的问题,提出采用最短路径算法,来衡量点与点之间的相异度.针对最短路径算法具有时间复杂度大的缺点,引入基于参考节点嵌入的最短距离估算思想来估算两点之间的近似距离.在此基础上,针对DBLP数据集构成的社会关系网络图进行聚类,使用基于划分的k-medoids算法,分别采用以上两种距离算法,比较其优劣.实验证明改进后的算法和最短路径算法中的Dijkstra 算法相比,距离误差率小,时间复杂度大大降低,在提高效率的同时,取得了同样好的聚类效果.  相似文献   

5.
This paper reports an experimental result obtained by additionally using unlabeled data together with labeled ones to improve the classification accuracy of dissimilarity-based methods, namely, dissimilarity-based classifications (DBC) [25]. In DBC, classifiers among classes are not based on the feature measurements of individual objects, but on a suitable dissimilarity measure among the objects instead. In order to measure the dissimilarity distance between pairwise objects, an approach using the one-shot similarity (OSS) [30] measuring technique instead of the Euclidean distance is investigated in this paper. In DBC using OSS, the unlabeled set can be used to extend the set of prototypes as well as to compute the OSS distance. The experimental results, obtained with artificial and real-life benchmark datasets, demonstrate that designing the classifiers in the OSS dissimilarity matrices instead of expanding the set of prototypes can further improve the classification accuracy in comparison with the traditional Euclidean approach. Moreover, the results demonstrate that the proposed setting does not work with non-Euclidean data.  相似文献   

6.
非相似度保持投影   总被引:1,自引:0,他引:1       下载免费PDF全文
由主成分分析(PCA)可知任何一幅人脸图像都可以通过一组特征脸的线性加权来重构,PCA是最小均方误差意义下图像的最优表示,但是传统的PCA最终只通过比较加权系数的欧氏距离来进行识别,没有考虑残差。因此,提出非相似尺度的概念,将两个样本同时投影到相同向量上,在确定它们关系时既考虑投影系数,也考虑重构所产生的残差。两者的投影系数和残差相差越大,说明这两个样本越不相似。和保局投影(LPP)有所不同,非相似度保持投影算法不必预先设定近邻个数,它是利用非相似度的概念,创建非相似度散布矩阵,最终通过最大化目标函数获取最优子空间。在AR库和Feret库上的实验结果证明了该方法的有效性。  相似文献   

7.
提出度量多个集合之间总体差异程度的拓展集合差异度及相关定理,并给出一种新的解决分类属性高维数据聚类问题的CAESD算法。基于拓展集合差异度及拓展集合特征向量,在CABOSFV_C聚类的基础上通过两阶段聚类完成全部聚类过程。采用UCI数据集与K-modes及其改进算法、CABOSFV_C算法进行比较实验,结果表明CAESD算法具有较高的聚类正确率。  相似文献   

8.
基于流形距离的人工免疫无监督分类与识别算法   总被引:3,自引:0,他引:3  
将一种新的流形距离作为相似性度量测度, 提出了一种用于无监督分类与识别的人工免疫系统方法. 通过基于流形距离的相似性度量, 有效利用样本集固有的全局一致性信息, 充分挖掘无类属样本的空间分布信息, 对样本进行类别划分. 新方法将免疫响应过程建模为一个四元组 AIR=(G,I,R,A) , 其中 G 为引发免疫响应的外界刺激, 即抗原; I 为所有可能抗体的集合; R 为抗体间相互作用的规则集合; A 为支配抗体反应、指导抗体进化的动态算法. 针对无监督分类问题, 将抗体编码为代表各类别的典型样本序号的排列, 利用动态算法 A 搜索能代表各类别的典型样本的最佳组合. 将新方法与标准的 K-均值算法、基于流形距离的进化聚类算法以及 Maulik 等人提出的基于遗传算法的聚类算法进行了性能比较. 对 6 个人工数据集及手写体数字识别问题的仿真实验结果显示, 新方法对样本空间分布复杂的无监督分类问题和实际的模式识别问题具有较高的准确率和较好的鲁棒性.  相似文献   

9.
多维尺度分析(MDS)通常以欧氏空间中点的距离来度量对象间的差异性(相似性)。当对象有像性别、颜色等名义属性时,通常的做法是将它们数量化,然后再对其运用欧氏距离,显然,这种处理方法存在不合理性。将一种混合值差度量(HVDM)引入含名义属性的对象间距离的计算,以改善名义属性下MDS的计算合理性。在UCI Abalone数据集上进行的实验,结果表明该方法比传统的数量化方法在重构能力、重构精确度方面都有更好的表现。  相似文献   

10.
In this paper, a new matching pursuits dissimilarity measure (MPDM) is presented that compares two signals using the information provided by their matching pursuits (MP) approximations, without requiring any prior domain knowledge. MPDM is a flexible and differentiable measure that can be used to perform shape-based comparisons and fuzzy clustering of very high-dimensional, possibly compressed, data. A novel prototype based classification algorithm, which is termed the computer aided minimization procedure (CAMP), is also proposed. The CAMP algorithm uses the MPDM with the competitive agglomeration (CA) fuzzy clustering algorithm to build reliable shape based prototypes for classification. MP is a well known sparse signal approximation technique, which is commonly used for video and image coding. The dictionary and coefficient information produced by MP has previously been used to define features to build discrimination and prototype based classifiers. However, existing MP based classification applications are quite problem domain specific, thus making their generalization to other problems quite difficult. The proposed CAMP algorithm is the first MP based classification system that requires no assumptions about the problem domain and builds a bridge between the MP and fuzzy clustering algorithms. Experimental results also show that the CAMP algorithm is more resilient to outliers in test data than the multilayer perceptron (MLP) and support-vector-machine (SVM) classifiers, as well as prototype-based classifiers using the Euclidean distance as their dissimilarity measure.  相似文献   

11.

Time profiled association mining is one of the important and challenging research problems that is relatively less addressed. Time profiled association mining has two main challenges that must be addressed. These include addressing i) dissimilarity measure that also holds monotonicity property and can efficiently prune itemset associations ii) approaches for estimating prevalence values of itemset associations over time. The pioneering research that addressed time profiled association mining is by J.S. Yoo using Euclidean distance. It is widely known fact that this distance measure suffers from high dimensionality. Given a time stamped transaction database, time profiled association mining refers to the discovery of underlying and hidden time profiled itemset associations whose true prevalence variations are similar as the user query sequence under subset constraints that include i) allowable dissimilarity value ii) a reference query time sequence iii) dissimilarity function that can find degree of similarity between a temporal itemset and reference. In this paper, we propose a novel dissimilarity measure whose design is a function of product based gaussian membership function through extending the similarity function proposed in our earlier research (G-Spamine). Our approach, MASTER (Mining of Similar Temporal Associations) which is primarily inspired from SPAMINE uses the dissimilarity measure proposed in this paper and support bound estimation approach proposed in our earlier research. Expression for computation of distance bounds of temporal patterns are designed considering the proposed measure and support estimation approach. Experiments are performed by considering naïve, sequential, Spamine and G-Spamine approaches under various test case considerations that study the scalability and computational performance of the proposed approach. Experimental results prove the scalability and efficiency of the proposed approach. The correctness and completeness of proposed approach is also proved analytically.

  相似文献   

12.
A new data clustering algorithm Density oriented Kernelized version of Fuzzy c-means with new distance metric (DKFCM-new) is proposed. It creates noiseless clusters by identifying and assigning noise points into separate cluster. In an earlier work, Density Based Fuzzy C-Means (DOFCM) algorithm with Euclidean distance metric was proposed which only considered the distance between cluster centroid and data points. In this paper, we tried to improve the performance of DOFCM by incorporating a new distance measure that has also considered the distance variation within a cluster to regularize the distance between a data point and the cluster centroid. This paper presents the kernel version of the method. Experiments are done using two-dimensional synthetic data-sets, standard data-sets referred from previous papers like DUNN data-set, Bensaid data-set and real life high dimensional data-sets like Wisconsin Breast cancer data, Iris data. Proposed method is compared with other kernel methods, various noise resistant methods like PCM, PFCM, CFCM, NC and credal partition based clustering methods like ECM, RECM, CECM. Results shown that proposed algorithm significantly outperforms its earlier version and other competitive algorithms.  相似文献   

13.
王治和  王淑艳  杜辉 《计算机工程》2021,47(5):88-96,103
模糊C均值(FCM)聚类算法无法识别非凸数据,算法中基于欧式距离的相似性度量只考虑数据点之间的局部一致性特征而忽略了全局一致性特征。提出一种利用密度敏感距离度量创建相似度矩阵的FCM算法。通过近邻传播算法获取粗类数作为最佳聚类数的搜索范围上限,以解决FCM算法聚类数目需要人为预先设定和随机选定初始聚类中心造成聚类结果不稳定的问题。在此基础上,改进最大最小距离算法,得到具有代表性的样本点作为初始聚类中心,并结合轮廓系数自动确定最佳聚类数。基于UCI数据集和人工数据集的实验结果表明,相比经典FCM、K-means和CFSFDP算法,该算法不仅具有识别复杂非凸数据的能力,而且能够在保证聚类性能和稳定性的前提下加快收敛速度。  相似文献   

14.
15.
邱兴兴  程霄 《计算机应用》2013,33(9):1001-9081
针对空间分布复杂的数据以及空间分布未知的现实数据聚类问题,设计了一种改进流形距离作为不相似测度。该不相似测度可有效利用所有数据点之间的全局一致性,挖掘无类属数据集的空间分布信息。通过使用该不相似测度,提出了基于改进流形距离K-medoids算法。将新算法与基于已有的流形距离和基于欧氏距离的K-medoids算法进行性能比较,对八个人工数据集以及USPS手写体数字识别问题的实验结果表明:新算法针对不同结构的测试数据集,在聚类性能上均优于或接近于另外两种K-medoids算法,并且对于各种分布的,无论简单或复杂,凸或者非凸的数据都可以进行聚类。  相似文献   

16.
One of the critical aspects of clustering algorithms is the correct identification of the dissimilarity measure used to drive the partitioning of the data set. The dissimilarity measure induces the cluster shape and therefore determines the success of clustering algorithms. As cluster shapes change from a data set to another, dissimilarity measures should be extracted from data. To this aim, we exploit some pairs of points with known dissimilarity value to teach a dissimilarity relation to a feed-forward neural network. Then, we use the neural dissimilarity measure to guide an unsupervised relational clustering algorithm. Experiments on synthetic data sets and on the Iris data set show that the relational clustering algorithm based on the neural dissimilarity outperforms some popular clustering algorithms (with possible partial supervision) based on spatial dissimilarity.  相似文献   

17.
Nonlinear kernel-based feature extraction algorithms have recently been proposed to alleviate the loss of class discrimination after feature extraction. When considering image classification, a kernel function may not be sufficiently effective if it depends only on an information resource from the Euclidean distance in the original feature space. This study presents an extended radial basis kernel function that integrates multiple discriminative information resources, including the Euclidean distance, spatial context, and class membership. The concepts related to Markov random fields (MRFs) are exploited to model the spatial context information existing in the image. Mutual closeness in class membership is defined as a similarity measure with respect to classification. Any dissimilarity from the additional information resources will improve the discrimination between two samples that are only a short Euclidean distance apart in the feature space. The proposed kernel function is used for feature extraction through linear discriminant analysis (LDA) and principal component analysis (PCA). Experiments with synthetic and natural images show the effectiveness of the proposed kernel function with application to image classification.  相似文献   

18.
基于新的相异度量的模糊K-Modes聚类算法   总被引:3,自引:2,他引:1       下载免费PDF全文
白亮  曹付元  梁吉业 《计算机工程》2009,35(16):192-194
传统的模糊K-Modes聚类算法采用简单匹配方法度量对象与Mode之间的相异程度,没有充分考虑Mode对类的代表程度,容易造成信息的丢失,弱化了类内的相似性。针对上述问题,通过对象对类的隶属度反映Mode对类的代表程度,提出一种新的相异度量,并将它应用于传统的模糊K—Modes聚类算法。与传统的K—Modes和模糊K-Modes聚类算法相比,该相异度量是有效的。  相似文献   

19.
K-means聚类算法简单高效,应用广泛。针对传统K-means算法初始聚类中心点的选择随机性导致算法易陷入局部最优以及K值需要人工确定的问题,为了得到最合适的初始聚类中心,提出一种基于距离和样本权重改进的K-means算法。该聚类算法采用维度加权的欧氏距离来度量样本点之间的远近,计算出所有样本的密度和权重后,令密度最大的点作为第一个初始聚类中心,并剔除该簇内所有样本,然后依次根据上一个聚类中心和数据集中剩下样本点的权重并通过引入的参数[τi]找出下一个初始聚类中心,不断重复此过程直至数据集为空,最后自动得到[k]个初始聚类中心。在UCI数据集上进行测试,对比经典K-means算法、WK-means算法、ZK-means算法和DCK-means算法,基于距离和权重改进的K-means算法的聚类效果更好。  相似文献   

20.
K-means type clustering algorithms for mixed data that consists of numeric and categorical attributes suffer from cluster center initialization problem. The final clustering results depend upon the initial cluster centers. Random cluster center initialization is a popular initialization technique. However, clustering results are not consistent with different cluster center initializations. K-Harmonic means clustering algorithm tries to overcome this problem for pure numeric data. In this paper, we extend the K-Harmonic means clustering algorithm for mixed datasets. We propose a definition for a cluster center and a distance measure. These cluster centers and the distance measure are used with the cost function of K-Harmonic means clustering algorithm in the proposed algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. Results suggest that the proposed clustering algorithm is quite insensitive to the cluster center initialization problem. Comparative studies with other clustering algorithms show that the proposed algorithm produce better clustering results.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号