Similar literature
Found 20 similar documents; search took 15 ms
1.

Time-profiled association mining is an important and challenging research problem that has received relatively little attention. It poses two main challenges: (i) designing a dissimilarity measure that satisfies the monotonicity property and can efficiently prune itemset associations, and (ii) estimating the prevalence values of itemset associations over time. The pioneering work on time-profiled association mining, by J.S. Yoo, uses the Euclidean distance, a measure widely known to degrade in high-dimensional spaces. Given a time-stamped transaction database, time-profiled association mining is the discovery of hidden time-profiled itemset associations whose true prevalence variations are similar to a user query sequence, under subset constraints that include (i) an allowable dissimilarity value, (ii) a reference query time sequence, and (iii) a dissimilarity function that measures the degree of similarity between a temporal itemset and the reference. In this paper, we propose a novel dissimilarity measure built on a product-based Gaussian membership function, extending the similarity function proposed in our earlier research (G-Spamine). Our approach, MASTER (Mining of Similar Temporal Associations), is primarily inspired by SPAMINE and uses the dissimilarity measure proposed in this paper together with the support bound estimation approach proposed in our earlier research. Expressions for computing the distance bounds of temporal patterns are derived from the proposed measure and the support estimation approach. Experiments compare the naïve, sequential, Spamine, and G-Spamine approaches under various test cases to study the scalability and computational performance of the proposed approach. The experimental results demonstrate the scalability and efficiency of the proposed approach. The correctness and completeness of the proposed approach are also proved analytically.
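
As a rough illustration of the problem setting in entry 1, the sketch below implements only the naïve baseline: it computes each itemset's prevalence (support) per time slot and keeps the itemsets whose prevalence sequence lies within a Euclidean distance threshold of the reference sequence. It does not implement MASTER or its Gaussian-based measure. The function names, the toy data, and the `max_size` cap are assumptions made for illustration, and every time slot is assumed to be non-empty.

```python
from itertools import combinations
from math import sqrt


def support_sequence(itemset, slots):
    """Fraction of transactions containing `itemset`, one value per time slot."""
    target = set(itemset)
    return [
        sum(1 for transaction in slot if target <= set(transaction)) / len(slot)
        for slot in slots
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_itemsets(slots, reference, threshold, max_size=2):
    """Naive search: test every itemset up to `max_size` against the reference."""
    items = sorted({item for slot in slots for transaction in slot for item in transaction})
    found = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            sequence = support_sequence(itemset, slots)
            if euclidean(sequence, reference) <= threshold:
                found[itemset] = sequence
    return found


# Hypothetical data: two time slots with four transactions each.
slots = [
    [("a", "b"), ("a",), ("b", "c"), ("a", "b")],
    [("a", "b"), ("a", "b"), ("c",), ("a", "c")],
]
reference = [0.5, 0.5]
result = similar_itemsets(slots, reference, threshold=0.1)
print(result)
```

The exhaustive enumeration is what makes this baseline expensive; the abstract's emphasis on monotonicity and distance bounds is about pruning that search space.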


2.
In this paper, we address nonlinear dimensionality reduction for efficiently indexing spectral audio similarity measures. We propose embedding the spectral similarity space into a low-dimensional Euclidean space. This guarantees the triangle inequality and allows the adoption of several indexing schemes. We highlight the advantages of the proposed indexable method over recently proposed spectral similarity measures that are also indexable. Moreover, our method compares favorably to linear dimensionality reduction methods, such as multidimensional scaling (MDS). The proposed method significantly reduces the computation time during the construction process compared to any audio measure and, simultaneously, minimizes the cost of searching for similar songs. To the best of our knowledge, this is the first time the important issue of the scalability of audio similarity measures has been addressed.

3.
We address time series search based on two important distance definitions: Euclidean distance and time warping distance. The conventional method reduces the dimensionality by means of a discrete Fourier transform. We apply the Haar wavelet transform and propose a proper normalization so that the method guarantees no false dismissals for Euclidean distance. Our experiments show that this method has competitive performance. Euclidean distance cannot handle time shifts of patterns, and it fails to match the same rise-and-fall patterns in sequences with different scales. A distance measure that handles this problem is the time warping distance. However, the complexity of computing the time warping distance is high. Also, because time warping distance is not a metric, most indexing techniques cannot guarantee the absence of false dismissals. We propose efficient strategies to mitigate these problems. We suggest a Haar wavelet-based approximation function for time warping distance, called Low Resolution Time Warping, which reduces computation by trading off a small amount of accuracy. We apply our approximation function to similarity search in time series databases and show experimentally that it is highly effective in suppressing the number of false alarms.
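
The "no false dismissal" guarantee for Euclidean distance in entry 3 can be illustrated with a minimal sketch. It assumes that the "proper normalization" is the orthonormal scaling by 1/sqrt(2) at each level (the abstract does not spell this out) and that the series length is a power of two. Under that scaling the transform preserves Euclidean distance, so the distance computed on any prefix of the coefficients can never exceed the true distance. The function names are illustrative.

```python
from math import sqrt


def haar(series):
    """Orthonormal Haar transform; len(series) must be a power of two."""
    approx = list(series)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        averages = [(approx[i] + approx[i + 1]) / sqrt(2) for i in pairs]
        differences = [(approx[i] - approx[i + 1]) / sqrt(2) for i in pairs]
        # Coarser detail coefficients are placed before finer ones.
        details = differences + details
        approx = averages
    return approx + details


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


x = [1.0, 3.0, 5.0, 7.0, 6.0, 4.0, 2.0, 0.0]
y = [2.0, 2.0, 4.0, 8.0, 7.0, 3.0, 1.0, 1.0]
hx, hy = haar(x), haar(y)

full = euclidean(x, y)               # distance in the original space
reduced = euclidean(hx[:2], hy[:2])  # distance on the first two coefficients
print(full, euclidean(hx, hy), reduced)
```

Because the reduced distance is a lower bound, a range query that filters on it can return extra candidates (false alarms) but cannot drop a true match.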

4.
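Symbolic representation is an effective dimensionality reduction technique for time series, and its similarity measure underlies many mining tasks. The SAX (symbolic aggregate approximation)-based distance MINDIST_PAA_iSAX is not symmetric, which limits its use in time series mining. A symmetric measure, Sym_PAA_SAX, is proposed that lower-bounds the Euclidean distance. Experiments on real and synthetic data sets show that the lower bound is reasonably tight and that the false alarm rate in similarity search is low.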
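
As background for entry 4, the sketch below shows the standard SAX building blocks: piecewise aggregate approximation (PAA), mapping PAA values to symbols, and the classic MINDIST between two SAX words, which is symmetric and lower-bounds the Euclidean distance. It is not the Sym_PAA_SAX measure, whose definition the abstract does not give. The sketch assumes the input is already z-normalized, that the series length is divisible by the number of segments, and an alphabet of four symbols with breakpoints -0.67, 0, 0.67; the function names are illustrative.

```python
from bisect import bisect_right
from math import sqrt

# Breakpoints for a four-symbol alphabet (assumed; equiprobable under N(0, 1)).
BREAKPOINTS = [-0.67, 0.0, 0.67]


def paa(series, segments):
    """Mean of each of `segments` equal-length frames."""
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]


def sax(series, segments):
    """SAX word as a list of integer symbols 0..3."""
    return [bisect_right(BREAKPOINTS, value) for value in paa(series, segments)]


def symbol_gap(r, c):
    """Smallest possible gap between values falling in symbol cells r and c."""
    if abs(r - c) <= 1:
        return 0.0
    return BREAKPOINTS[max(r, c) - 1] - BREAKPOINTS[min(r, c)]


def mindist(word_a, word_b, n):
    """Classic SAX MINDIST for two words built from series of length n."""
    w = len(word_a)
    total = sum(symbol_gap(r, c) ** 2 for r, c in zip(word_a, word_b))
    return sqrt(n / w) * sqrt(total)


def euclidean(a, b):
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


x = [-1.5, -1.3, -0.2, 0.1, 0.9, 1.1, 0.3, 0.6]
y = [1.2, 1.0, 0.8, 0.9, -1.0, -1.2, -0.4, -0.1]
word_x, word_y = sax(x, 4), sax(y, 4)
lower_bound = mindist(word_x, word_y, len(x))
print(word_x, word_y, lower_bound, euclidean(x, y))
```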

5.
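An effective dimensionality reduction method for time series   Total citations: 3, self-citations: 0, citations by others: 3
An effective method for reducing the dimensionality of time series for similarity queries is proposed. The method uses the fast wavelet transform to decompose a time series into sub-bands of different frequencies and re-represents the original series by the low-frequency approximation signal obtained from the multiresolution decomposition, thereby mapping a high-dimensional time series to a low-dimensional space. The method supports the Euclidean distance and the L-shift Euclidean distance. The time complexity of the algorithm is O(n).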

6.
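Multidimensional scaling (MDS) usually measures the dissimilarity (similarity) between objects by the distance between points in Euclidean space. When objects have nominal attributes such as gender or color, the usual practice is to quantify them and then apply the Euclidean distance, which is clearly unreasonable. A heterogeneous value difference metric (HVDM) is introduced to compute distances between objects with nominal attributes, making MDS computations more sound when nominal attributes are present. Experiments on the UCI Abalone data set show that this method outperforms the traditional quantification approach in both reconstruction ability and reconstruction accuracy.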

7.
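This work studies QoS-aware cloud service composition based on time series. It brings the way QoS preferences for services change over time into the scope of cloud service composition and models cloud service composition as a time series similarity comparison problem. The Euclidean distance and the extended Frobenius norm distance are each used to measure the similarity of two-dimensional time series; the PCA-based extended Frobenius norm distance, the Euclidean distance, Brute Force, and other methods are then used to measure the similarity of multidimensional time series. Comparative experiments verify the advantage of the extended Frobenius norm distance in both time and accuracy.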

8.
This paper reports experimental results obtained by using unlabeled data together with labeled data to improve the classification accuracy of dissimilarity-based methods, namely, dissimilarity-based classifications (DBC) [25]. In DBC, classifiers are not based on the feature measurements of individual objects, but on a suitable dissimilarity measure among the objects. To measure the dissimilarity between pairs of objects, this paper investigates the one-shot similarity (OSS) [30] measuring technique as an alternative to the Euclidean distance. In DBC using OSS, the unlabeled set can be used to extend the set of prototypes as well as to compute the OSS distance. The experimental results, obtained with artificial and real-life benchmark datasets, demonstrate that designing the classifiers in the OSS dissimilarity matrices instead of expanding the set of prototypes can further improve the classification accuracy in comparison with the traditional Euclidean approach. Moreover, the results demonstrate that the proposed setting does not work with non-Euclidean data.

9.
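The locally linear embedding (LLE) algorithm commonly uses the Euclidean distance to measure similarity between samples. For high-dimensional data such as images, however, the Euclidean distance cannot accurately reflect how similar samples are. This paper proposes a locally linear embedding algorithm based on the Mahalanobis distance (MLLE). The algorithm first learns a Mahalanobis metric from the available samples, and then uses it as the similarity measure in LLE's neighbor selection and in the dimensionality reduction of both existing and new samples. MLLE and other typical manifold learning algorithms are compared on the ORL and USPS databases; the results show that MLLE achieves good recognition performance.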

10.
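The vector approximation file method is an effective way to address the curse of dimensionality in high-dimensional indexing, but it cannot directly support nearest neighbor search under quadratic form distances. To address this, a vector approximation method for quadratic form distances based on singular value decomposition (SVD) is proposed: SVD transforms the quadratic form distance into Euclidean form, and the transformed feature vectors are approximated to obtain approximation vectors. Nearest neighbor search uses a low-dimensional filtering algorithm that first computes approximate distances in a high-energy, low-dimensional subspace to filter candidates, and then computes high-dimensional distances on the filtered results. Experimental results show that the low-dimensional filtering algorithm filters out most feature vectors, so only a small portion of the data requires high-dimensional distance computation, and that the method can significantly improve nearest neighbor search performance on large high-dimensional image databases.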
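
The key step in entry 10, rewriting a quadratic form distance as a plain Euclidean distance between transformed vectors, can be sketched as follows. The abstract uses SVD for the factorization; to stay within the Python standard library, this sketch swaps in a Cholesky factorization A = L Lᵀ, which works when A is symmetric positive definite and yields the same identity: (x - y)ᵀ A (x - y) equals the squared Euclidean norm of Lᵀx - Lᵀy. The matrix, vectors, and function names are hypothetical.

```python
from math import sqrt


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


def transform(low, v):
    """Compute L^T v, the vector whose Euclidean norm matches the quadratic form."""
    n = len(v)
    return [sum(low[i][j] * v[i] for i in range(j, n)) for j in range(n)]


def quadratic_distance(a, x, y):
    """Direct evaluation of sqrt((x - y)^T a (x - y))."""
    d = [p - q for p, q in zip(x, y)]
    n = len(d)
    return sqrt(sum(d[i] * a[i][j] * d[j] for i in range(n) for j in range(n)))


def euclidean(u, v):
    return sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


a = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical similarity matrix
x, y = [1.0, 0.0], [0.0, 1.0]
low = cholesky(a)
direct = quadratic_distance(a, x, y)
via_euclidean = euclidean(transform(low, x), transform(low, y))
print(direct, via_euclidean)
```

Once every feature vector has been transformed this way, any index or approximation scheme designed for Euclidean distance can be applied to the transformed vectors.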

11.
Similarity and dissimilarity measures are widely used in many research areas and applications. When a dissimilarity measure is used, it is normally required to be a distance metric. However, when a similarity measure is used, there is no formal requirement. In this article, we make three contributions. First, we give a formal definition of a similarity metric. Second, we show the relationship between similarity metrics and distance metrics. Third, we present general solutions for normalizing a given similarity metric or distance metric.

12.
A tree is a data structure used to express various objects such as semistructured data and genes. When objects are represented as trees, computing tree similarity is essential for pattern recognition and retrieval. This paper considers the noisy subsequence tree recognition problem, whose purpose is to recognize the original tree given its noisy subsequence tree. Previous research on this problem relied on constrained tree edit distance to measure the dissimilarity. However, the number of relabelings must be predetermined to compute it. This paper proposes a new dissimilarity measure for this problem. Our dissimilarity measure is obtained by counting the node edit operations included in the unit‐cost tree edit distance that contribute to the matching of node labels. The number of relabelings need not be specified to compute our dissimilarity measure. Moreover, our measure achieves more accurate recognition and faster execution than the constrained tree edit distance. Our measure is also useful for solving the tree inclusion problem, which is the problem of deciding whether a tree includes another tree, and it shows the extent of approximate tree inclusion when a tree incompletely includes another tree. © 2011 Wiley Periodicals, Inc.

13.
The conventional Fuzzy C-means (FCM) algorithm uses Euclidean distance to describe the dissimilarity between data and cluster prototypes. Since the Euclidean distance based dissimilarity measure only characterizes the mean information of a cluster, it is sensitive to noise and cluster divergence. In this paper, we propose a novel fuzzy clustering algorithm for image segmentation, in which the Mahalanobis distance is utilized to define the dissimilarity measure. We add a new regularization term to the objective function of the proposed algorithm, reflecting the covariance of the cluster. We experimentally demonstrate the effectiveness of the proposed algorithm on a generated 2D dataset and a subset of Berkeley benchmark images.

14.
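To address the problem that current dimensionality reduction algorithms perform poorly because they are affected by the sample distribution in high-dimensional space, an adaptively weighted t-distributed stochastic neighbor embedding (t-SNE) algorithm is proposed. The algorithm normalizes the Euclidean distance between pairs of sample points in high-dimensional space and then analyzes them in groups according to how the distances are distributed. When computing the similarity probabilities between sample points in high-dimensional space, it applies adaptive weighting for three cases (short, relatively short, and long distances), replacing the absolute Euclidean distance with a weighted relative distance so as to measure more faithfully the similarity of each group of samples in high-dimensional space. Dimensionality reduction experiments on high-dimensional brain network state observation matrices show that the clustering visualization of the adaptively weighted t-SNE is better than that of other dimensionality reduction algorithms. Compared with the traditional t-SNE algorithm, the DBI clustering index decreases by 28.39% on average and the DI index increases by 161.84% on average, and problems such as dispersion, crossing, and scattered points are effectively eliminated.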

15.
In this paper, we present a general guideline for finding a better distance measure for similarity estimation, based on statistical analysis of distribution models and distance functions. A new set of distance measures is derived from the harmonic distance, the geometric distance, and their generalized variants according to Maximum Likelihood theory. These measures can provide a more accurate feature model than the classical Euclidean and Manhattan distances. We also find that the feature elements often come from heterogeneous sources that may influence similarity estimation differently. Therefore, the assumption of a single isotropic distribution model is often inappropriate. To alleviate this problem, we use a boosted distance measure framework that finds multiple distance measures which best fit the distribution of selected feature elements for accurate similarity estimation. The new distance measures for similarity estimation are tested on two applications: stereo matching and motion tracking in video sequences. The performance of the boosted distance measure is further evaluated on several benchmark data sets from the UCI repository and two image retrieval applications. In all the experiments, robust results are obtained with the proposed methods.

16.
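Research on similar subsequence search in multidimensional time series data   Total citations: 4, self-citations: 0, citations by others: 4
Because the dynamic time warping distance is more robust than the Euclidean distance, it is widely used as the similarity measure in similar subsequence search for time series data. Similar subsequence search along a single dimension may not yield enough matches to support further in-depth analysis. By introducing the data cube model commonly used in multidimensional data analysis, the similar subsequence search problem is therefore extended to the multidimensional setting, so that search results are obtained across multiple dimensions and more valuable knowledge is gained. On this basis, the correlation between cells at adjacent levels of the data cube is exploited to improve the basic search algorithm, raising search efficiency while preserving accuracy. Experiments on a real network security data set verify the effectiveness of the proposed method.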
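
The robustness of dynamic time warping that entry 16 relies on can be seen in a minimal sketch of the textbook dynamic program. It covers only the distance itself, not the data cube search described in the abstract. The absolute difference as local cost, the unconstrained warping path, and the function name `dtw` are assumptions of this sketch.

```python
def dtw(a, b):
    """Dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again with a later b
                cost[i][j - 1],      # b[j-1] is matched again with a later a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


pattern = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape, delayed by one step
print(dtw(pattern, shifted))
```

The two series have different lengths, so a point-by-point Euclidean comparison is not even defined for them, while the warping path absorbs the one-step delay. The table has (n + 1)(m + 1) cells, which is the quadratic cost that motivates faster approximations and index-based pruning.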

17.
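Research on dimensionality reduction with PCA after deep learning image feature extraction   Total citations: 1, self-citations: 0, citations by others: 1
Deep learning is a machine learning method widely used in artificial intelligence today. Its heavy dependence on data sharply increases the dimensionality of the data to be processed, which greatly affects computational efficiency and classification performance. Taking data dimensionality reduction as the research goal, this paper analyzes the various dimensionality reduction methods used in deep learning. On this basis, with the Caltech 101 image data set as the experimental object, the VGG-16 deep convolutional neural network is used to extract image features, and principal component analysis (PCA) is taken as an example to reduce the dimensionality of the high-dimensional image feature data. In the experiments, the Euclidean distance is used as the similarity measure to evaluate accuracy after dimensionality reduction. The experiments show that after extracting the 4096-dimensional features from the fc3 layer of the VGG-16 network and using PCA to reduce them to 64 dimensions, a high level of feature information is still retained.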

18.
We introduce a new representation for time series, the Multiresolution Vector Quantized (MVQ) approximation, along with a distance function. Like the Discrete Wavelet Transform, MVQ keeps both local and global information about the data. However, instead of keeping low-level time series values, it maintains high-level feature information (key subsequences), facilitating the introduction of more meaningful similarity measures. The method is fast and scales linearly with the database size and dimensionality. Contrary to previous methods, the vast majority of which use the Euclidean distance, MVQ uses a multiresolution/hierarchical distance function. In our experiments, the proposed technique consistently outperforms the other major methods.

19.
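Offline, text-independent handwriting identification is studied. Considering characteristics of Uyghur script such as frequent cursive joining and complex letter shapes, a Uyghur handwriting identification method is proposed that uses micro-structure features based on probability distribution functions. The method describes the writing tendencies of local fine structures in handwriting and matches handwriting features using the Euclidean and Manhattan distance measures. Tests on 120 handwriting samples from Uyghur students show that the method effectively improves the accuracy of Uyghur handwriting identification.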

20.
In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.
