Similar Articles
1.
Recognition of noisy subsequences using constrained edit distances
Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We consider the problem of estimating X* by processing Y, which is a noisy version of U. We do this by defining the constrained edit distance between X ∈ H and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance is presented. Although the algorithm has cubic time complexity in general, within the framework of our solution it possesses quadratic time complexity. Recognition using the constrained edit distance as a criterion demonstrates remarkable accuracy. Experimental results involving strings of lengths between 40 and 80, containing an average of 26.547 errors per string, demonstrate that the scheme has about 99.5 percent accuracy.
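For reference, the unconstrained edit distance that this constrained measure generalizes can be computed with the classic quadratic dynamic program. A minimal sketch follows; the constrained variant described in the abstract additionally tracks the number and type of edit operations, which this sketch omits.

```python
def edit_distance(x, y):
    # Classic Wagner-Fischer dynamic program: d[i][j] is the minimum
    # number of substitutions, insertions, and deletions needed to
    # turn x[:i] into y[:j].
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # delete x[i-1]
                          d[i][j - 1] + 1,           # insert y[j-1]
                          d[i - 1][j - 1] + cost)    # substitute / match
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```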

2.
Curve matching is one instance of the fundamental correspondence problem. Our flexible algorithm is designed to match curves under substantial deformations and arbitrarily large scaling and rigid transformations. A syntactic representation is constructed for both curves, and an edit transformation which maps one curve to the other is found using dynamic programming. We present extensive experiments in which we apply the algorithm to silhouette matching. In these experiments, we examine partial occlusion, viewpoint variation, articulation, and class matching (where silhouettes of similar objects are matched). Based on the qualitative syntactic matching, we define a dissimilarity measure and compute it for every pair of images in a database of 121 images. We use this experiment to objectively evaluate our algorithm. First, we compare our results to those reported by others. Second, we use the dissimilarity values to organize the image database into shape categories. The veridical hierarchical organization stands as evidence of the quality of our matching and similarity estimation.

3.
The normalized string editing problem revisited
Marzal and Vidal (1993) considered the problem of computing the normalized edit distance between two strings, and reported experimental results which demonstrated the use of the measure to recognize hand-written characters. Their paper formulated the theoretical properties of the measure and developed two algorithms to compute it. In this short communication the authors demonstrate how this measure is related to an auxiliary measure already defined in the literature: the inter-string constrained edit distance. Since the normalized edit distance can be computed efficiently using the latter, the analytic and experimental results reported in the above paper can be obtained just as accurately, but more efficiently, using the strategies presented here.

4.
Graphs have been widely used for complex data representation in many real applications, such as social networks, bioinformatics, and computer vision. Graph similarity join has therefore become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs, and the graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the similarity join based on graph edit distance, we use a preprocessing strategy to remove mismatching graph pairs with significant differences. We then propose a novel method of building indexes for each graph by grouping the nodes reachable within k hops of each key node with structure conservation: the k-hop tree based indexing method. For each candidate pair, we propose a similarity computation algorithm with boundary filtering, which can be applied with good efficiency and effectiveness. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join, and that the join process finishes in polynomial time.

5.
In this paper, we propose a formal definition of a new class of trees called semi-ordered trees and a polynomial dynamic programming algorithm to compute a constrained edit distance between such trees. The core of the method relies on a similar approach to compare unordered [Kaizhong Zhang, A constrained edit distance between unordered labeled trees, Algorithmica 15 (1996) 205–222] and ordered trees [Kaizhong Zhang, Algorithms for the constrained editing distance between ordered labeled trees and related problems, Pattern Recognition 28 (3) (1995) 463–474]. The method is currently applied to evaluate the similarity between architectures of apple trees [Vincent Segura, Aida Ouangraoua, Pascal Ferraro, Evelyne Costes, Comparison of tree architecture using tree edit distances: Application to two-year-old apple tree, Euphytica 161 (2007) 155–164].

6.
Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching
In a way similar to the string-to-string correction problem, we address discrete time series similarity in light of a time-series-to-time-series correction problem, for which the similarity between two time series is measured as the minimum cost sequence of edit operations needed to transform one time series into another. To define the edit operations, we use the paradigm of a graphical editing process and end up with a dynamic programming algorithm that we call time warp edit distance (TWED). TWED is slightly different in form from dynamic time warping (DTW), longest common subsequence (LCSS), or edit distance with real penalty (ERP) algorithms. In particular, it highlights a parameter that controls a kind of stiffness of the elastic measure along the time axis. We show that the similarity provided by TWED is a potentially useful metric in time series retrieval applications, since it could benefit from the triangle inequality to speed up the retrieval process while tuning the parameters of the elastic measure. In that context, a lower bound is derived to link the matching of time series in down-sampled representation spaces to the matching in the original space. The empirical quality of the TWED distance is evaluated on a simple classification task. Compared to edit distance, DTW, LCSS, and ERP, TWED has proved to be quite effective on the considered experimental task.
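A minimal sketch of one published form of the TWED recurrence: `nu` is the stiffness parameter penalizing timestamp differences and `lam` is the constant penalty added to the two delete operations. The dummy-sample padding at time 0 is a common implementation convention, not necessarily the paper's exact boundary treatment.

```python
import math

def twed(a, ts_a, b, ts_b, nu, lam):
    # Dynamic program over two 1-D series a, b with timestamps ts_a, ts_b.
    # Each series is padded with a dummy sample (value 0 at time 0).
    a, ts_a = [0.0] + list(a), [0.0] + list(ts_a)
    b, ts_b = [0.0] + list(b), [0.0] + list(ts_b)
    n, m = len(a), len(b)
    dp = [[math.inf] * m for _ in range(n)]
    dp[0][0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete a[i]: local change in a plus stiffness and lambda.
            delete_a = (dp[i - 1][j] + abs(a[i] - a[i - 1])
                        + nu * (ts_a[i] - ts_a[i - 1]) + lam)
            # Delete b[j]: symmetric case on the second series.
            delete_b = (dp[i][j - 1] + abs(b[j] - b[j - 1])
                        + nu * (ts_b[j] - ts_b[j - 1]) + lam)
            # Match a[i] with b[j], also comparing the preceding samples
            # and their timestamps (this is where stiffness acts).
            match = (dp[i - 1][j - 1]
                     + abs(a[i] - b[j]) + abs(a[i - 1] - b[j - 1])
                     + nu * (abs(ts_a[i] - ts_b[j])
                             + abs(ts_a[i - 1] - ts_b[j - 1])))
            dp[i][j] = min(delete_a, delete_b, match)
    return dp[n - 1][m - 1]
```

As a sanity check, identical series have distance 0, and increasing `nu` penalizes matches between samples that are close in value but far apart in time.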

7.
A method to restore stochastic monotonicity of noisy multi-criteria data sets through relabeling is presented. By formulating the problem as a weighted maximum independent set problem on a comparability graph, it is possible to compute relabelings that are optimal w.r.t. a cumulative label frequency loss function. We demonstrate how to formulate the problem in this manner and discuss why it requires objects to be relabeled rather than deleted. More precisely, we formulate the zero-one, L1, and squared cumulative label frequency losses, and provide a weighting function for each. We investigate these loss functions in the related context of restoring regular monotonicity, dealing with objects carrying a single label rather than distributions. Finally, we provide applications on some closely related example data sets and discuss some interesting findings.

8.
Finding a sequence of edit operations that transforms one string of symbols into another with the minimum cost is a well-known problem. The minimum cost, or edit distance, is a widely used measure of the similarity of two strings. An important parameter of this problem is the cost function, which specifies the cost of each insertion, deletion, and substitution. We show that cost functions having the same ratio of the sum of the insertion and deletion costs divided by the substitution cost yield the same minimum cost sequences of edit operations. This leads to a partitioning of the universe of cost functions into equivalence classes. Also, we show the relationship between a particular set of cost functions and the longest common subsequence of the input strings. This work was supported in part by the U.S. Department of Defense and the U.S. Department of Energy.
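The stated connection to the longest common subsequence can be illustrated concretely: when the substitution cost is at least the sum of the insertion and deletion costs, substitutions are never profitable, and the optimal cost is fully determined by the LCS. A sketch with unit insertion/deletion costs:

```python
def weighted_edit_distance(x, y, ci, cd, cs):
    # DP over prefixes with per-operation costs: insertion ci,
    # deletion cd, substitution cs (matches are free).
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * cd
    for j in range(1, n + 1):
        d[0][j] = j * ci
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else cs)
            d[i][j] = min(d[i - 1][j] + cd, d[i][j - 1] + ci, sub)
    return d[m][n]

def lcs_len(x, y):
    # Length of the longest common subsequence of x and y.
    m, n = len(x), len(y)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            L[i][j] = (L[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1]
                       else max(L[i - 1][j], L[i][j - 1]))
    return L[m][n]

# With ci = cd = 1 and cs = 2 (so cs >= ci + cd), the optimal script
# deletes the |x| - LCS unmatched symbols of x and inserts the
# |y| - LCS unmatched symbols of y:
x, y = "abcdef", "azced"
L = lcs_len(x, y)
assert weighted_edit_distance(x, y, 1, 1, 2) == (len(x) - L) + (len(y) - L)
```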

9.
The edit distance between two strings is the minimum number of elementary operations required to transform one string into the other. It is widely used in approximate string matching, string similarity joins, and related fields. The dynamic programming method computes the edit distance of two strings via an edit distance matrix and must evaluate every element of the matrix, so its time efficiency is low. Improved methods change the order in which matrix elements are computed and reduce the number of elements that must be compared, but they still compare more than half of the elements, leaving room for improvement. This paper proposes a sequential verification method for edit distance based on sequences of elementary operations. First, the enumerability of elementary operation sequences is analyzed, and a method for enumerating them is given. Then the operation sequences are verified in order of increasing operation count until some sequence passes verification, which yields the edit distance. Experiments on approximate string search with a threshold of 2 show that the proposed method is more efficient than dynamic-programming-style methods.
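For the small-threshold verification setting (deciding whether the edit distance is at most k), a common baseline is a banded dynamic program that inspects only cells within k of the diagonal; a sketch of this standard Ukkonen-style technique (not the operation-sequence enumeration method of the paper):

```python
def within_edit_distance(x, y, k):
    # Banded dynamic program: any edit script of cost <= k only
    # touches cells with |i - j| <= k, so cells outside that band
    # are never computed.
    m, n = len(x), len(y)
    if abs(m - n) > k:
        return False
    INF = k + 1                                      # cap: "more than k"
    prev = {j: j for j in range(min(n, k) + 1)}      # row 0: d[0][j] = j
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if j == 0:
                best = i                             # d[i][0] = i deletions
            else:
                best = min(cur.get(j - 1, INF) + 1,  # insert y[j-1]
                           prev.get(j - 1, INF)
                           + (0 if x[i - 1] == y[j - 1] else 1))  # sub/match
            best = min(best, prev.get(j, INF) + 1)   # delete x[i-1]
            cur[j] = min(best, INF)
        prev = cur
    return prev.get(n, INF) <= k
```

This verifies a candidate pair in O(k · min(m, n)) time rather than O(m · n).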

10.
A phylogenetic tree represents the history of evolutionary relationships among species. A fundamental problem in bioinformatics is comparing phylogenetic trees. One approach is to measure the agreement of two trees by defining a similarity or dissimilarity between them in tree space. The Robinson-Foulds distance is currently the most widely used dissimilarity. This paper defines a new dissimilarity for comparing rooted phylogenetic trees. It accounts for finer-grained similarity between clades, rather than considering only whether clades are identical as the Robinson-Foulds distance does, and therefore provides a more precise and discriminating measurement. Two algorithms that compute this dissimilarity efficiently are given. With simple modifications, the results apply to five other related comparison measures.
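For context, the Robinson-Foulds baseline that the abstract compares against can be sketched for rooted trees as the symmetric difference of the clade sets induced by internal nodes. The nested-tuple tree encoding here is purely illustrative:

```python
def clades(tree):
    # 'tree' is a nested tuple of leaf labels, e.g. (('a', 'b'), ('c', 'd')).
    # Returns the set of leaf sets (clades) induced by internal nodes.
    out = set()

    def leaves(t):
        if isinstance(t, tuple):
            s = frozenset().union(*(leaves(c) for c in t))
            out.add(s)        # every internal node contributes a clade
            return s
        return frozenset([t])  # a leaf; singletons are not recorded

    leaves(tree)
    return out

def robinson_foulds(t1, t2):
    # Rooted Robinson-Foulds distance: the number of clades present
    # in exactly one of the two trees (symmetric difference).
    return len(clades(t1) ^ clades(t2))
```

For example, `(('a','b'),('c','d'))` and `(('a','c'),('b','d'))` share only the root clade, so their distance is 4; this counts only clade identity, which is exactly the coarseness the proposed dissimilarity refines.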

11.
A hierarchical clustering algorithm based on tree edit distance
To identify false identities forged or tampered with by criminal suspects, tree edit distance is used to compute the similarity of individual attributes. The relevant mathematical properties of tree edit distance are proved, a hierarchical coding scheme is applied to the attributes, and a new hierarchical clustering algorithm based on tree edit distance, HCTED (Hierarchical Clustering Algorithm Based on Tree Edit Distance), is proposed. The new algorithm computes attribute similarity as the minimum cost of tree edit operations, overcoming the shortcomings of traditional clustering algorithms on nominal data and improving clustering precision; given samples are clustered by setting a threshold. Experiments demonstrate the accuracy and effectiveness of the new method for identity recognition, and the influence of different parameters on the results is discussed. Compared with traditional clustering algorithms, the performance of HCTED is clearly improved. The algorithm has already been applied to police analysis of transient populations, with good results.

12.
We consider a fundamental inference problem in syntactic pattern recognition (PR). We assume that the system has a dictionary, which is a collection of all the ideal representations of the objects in question. To recognize a noisy sample, the system compares it with every element in the dictionary on a nearest-neighbor basis, using the three standard edit operations of substitution, insertion, and deletion and the associated primitive elementary edit distances d(.,.). In this paper, we consider the assignment of the inter-symbol distances using parametric distances. We show how the classifier can be trained to obtain the optimal parametric distance using vector quantization in the meta-space. In all our experiments, training was typically achieved in very few iterations. The subsequent classification accuracy obtained using this single-parameter scheme was 96.13%. The power of the scheme is evident if we compare it to 96.67%, the accuracy of the scheme which uses the complete array of inter-symbol distances derived from a knowledge of all the confusion probabilities.

13.
String similarity join (SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. Existing algorithms usually adopt the filter-and-refine framework; they cannot catch the dissimilarity between string subsets and do not fully exploit statistics such as character frequencies. We develop a partition-based algorithm that uses such statistics. Frequency vectors are used to partition datasets into data chunks between which dissimilarity is easily caught. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed that leverages the statistics to avoid computing edit distances for a noticeable proportion of the candidate pairs which survive the existing filters. Our algorithm outperforms alternative methods notably on real datasets.
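The character-frequency statistics mentioned here support a simple, well-known count filter: each edit operation increases at most one character count and decreases at most one, so the edit distance is lower-bounded by the larger of the total surplus and total deficit between the two frequency vectors. A sketch of this generic count filter (not the paper's partition-based filter):

```python
from collections import Counter

def freq_lower_bound(x, y):
    # Lower bound on ed(x, y): each edit operation can raise at most
    # one character count and lower at most one, so max(surplus,
    # deficit) edits are unavoidable.
    cx, cy = Counter(x), Counter(y)
    surplus = sum((cx - cy).values())   # counts x has in excess of y
    deficit = sum((cy - cx).values())   # counts y has in excess of x
    return max(surplus, deficit)

def passes_filter(x, y, tau):
    # Cheap O(|x| + |y|) pruning test for an edit-distance join with
    # threshold tau: if the bound exceeds tau, ed(x, y) > tau for sure.
    return freq_lower_bound(x, y) <= tau
```

Pairs failing the test are discarded without computing the quadratic edit distance; pairs passing it still need exact verification.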

14.
Discovering Shape Classes using Tree Edit-Distance and Pairwise Clustering
This paper describes work aimed at the unsupervised learning of shape-classes from shock trees. We commence by considering how to compute the edit distance between weighted trees. We show how to transform the tree edit distance problem into a series of maximum weight clique problems, and show how to use relaxation labeling to find an approximate solution. This allows us to compute a set of pairwise distances between graph-structures. We show how the edit distances can be used to compute a matrix of pairwise affinities using χ2 statistics. We present a maximum likelihood method for clustering the graphs by iteratively updating the elements of the affinity matrix. This involves interleaved steps for updating the affinity matrix using an eigendecomposition method and updating the cluster membership indicators. We illustrate the new tree clustering framework on shock-graphs extracted from the silhouettes of 2D shapes. National ICT Australia is funded by the Australian Government’s Backing Australia’s Ability initiative, in part through the Australian Research Council.

15.
This paper reports an experimental result obtained by additionally using unlabeled data together with labeled ones to improve the classification accuracy of dissimilarity-based methods, namely, dissimilarity-based classifications (DBC) [25]. In DBC, classifiers among classes are not based on the feature measurements of individual objects, but on a suitable dissimilarity measure among the objects instead. In order to measure the dissimilarity distance between pairwise objects, an approach using the one-shot similarity (OSS) [30] measuring technique instead of the Euclidean distance is investigated in this paper. In DBC using OSS, the unlabeled set can be used to extend the set of prototypes as well as to compute the OSS distance. The experimental results, obtained with artificial and real-life benchmark datasets, demonstrate that designing the classifiers in the OSS dissimilarity matrices instead of expanding the set of prototypes can further improve the classification accuracy in comparison with the traditional Euclidean approach. Moreover, the results demonstrate that the proposed setting does not work with non-Euclidean data.

16.
An improved tree matching algorithm is proposed. Taking the characteristics of HTML into account, the tree edit distance method is refined by assigning different weight values to HTML tree nodes according to the weights of the data they display in a browser. The algorithm constructs a node-weighted HTML tree from the HTML data objects, and pattern recognition is achieved by finding the maximum mapping value between the two constructed trees. Experiments based on commercial web sites confirm the effectiveness of the algorithm.

17.
An improved K-Modes clustering algorithm based on rough sets
The traditional K-Modes algorithm uses simple matching to compute the distance between objects and does not adequately consider the similarity between two different values of the same attribute. Based on the upper and lower approximations of rough set theory, a new distance measure is proposed and the cluster centers are redefined, improving the traditional K-Modes algorithm. Comparisons with other improved K-Modes algorithms show experimentally that the rough-set-based improved K-Modes algorithm effectively increases clustering precision.

18.
We consider the problem of recognizing ordered labeled trees by processing their noisy subsequence-trees, which are “patched-up” noisy portions of their fragments. We assume that H, a finite dictionary of ordered labeled trees, is given. X* is an unknown element of H, and U is any arbitrary subsequence-tree of X*. We consider the problem of estimating X* by processing Y, which is a noisy version of U. The solution which we present is, to our knowledge, the first reported solution to the problem. We solve the problem by sequentially comparing Y with every element X of H, the basis of comparison being a new dissimilarity measure between two trees, which implicitly captures the properties of the corrupting mechanism that noisily garbles U into Y. The algorithm which incorporates this constraint has been used to test our pattern recognition system, and the experimental results obtained demonstrate good accuracy.

19.
The problem of determining the identity and pose of occluded objects from noisy data is examined. Previous work has shown that local measurements of the position and surface orientation of small patches of an object's surface may be used in a constrained search process to solve this problem for rigid polygonal objects using 2-D sensory data, or for rigid polyhedral objects using 3-D data. The recognition system is extended to recognize and locate curved objects. The extension is done in two dimensions and applies to the recognition of 2-D objects from 2-D data, or of 3-D objects in stable positions from 2-D data.

20.
Traditional normalized tree edit distances do not satisfy the triangle inequality. We present a metric normalization method for tree edit distance which results in a new normalized tree edit distance fulfilling the triangle inequality, under the condition that the weight function is a metric over the set of elementary edit operations with all insertions/deletions having the same cost. We prove that the new distance, in the range [0, 1], is a genuine metric given as a simple function of the sizes of the two ordered labeled trees and the tree edit distance between them, and that it can be computed directly through tree edit distance with the same complexity. Based on an efficient algorithm to represent digits as ordered labeled trees, we show that the normalized tree edit metric can provide slightly better results than other existing methods in handwritten digit recognition experiments using the approximating and eliminating search algorithm (AESA).
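The abstract does not state the formula, but one known normalization of this shape for the string case, with unit insertion/deletion costs, is d'(x, y) = 2·d(x, y) / (|x| + |y| + d(x, y)), which lies in [0, 1]. The sketch below applies it to the string edit distance as a stand-in for the tree case; the exact form used for trees should be checked against the paper.

```python
def edit_distance(x, y):
    # Unit-cost Levenshtein distance, computed with a single rolling
    # row (a stand-in for the tree edit distance of the paper, which
    # plays the same role in the normalization).
    m, n = len(x), len(y)
    d = list(range(n + 1))
    for i in range(1, m + 1):
        prev, d[0] = d[0], i           # prev holds d[i-1][j-1]
        for j in range(1, n + 1):
            prev, d[j] = d[j], min(d[j] + 1,                    # delete
                                   d[j - 1] + 1,                # insert
                                   prev + (x[i - 1] != y[j - 1]))  # sub
    return d[n]

def normalized_distance(x, y):
    # Hypothetical normalization 2*d / (|x| + |y| + d): 0 iff the
    # inputs are identical, 1 when they share nothing (d = |x| + |y|).
    if not x and not y:
        return 0.0
    d = edit_distance(x, y)
    return 2.0 * d / (len(x) + len(y) + d)
```

Because the normalization is a monotone function of d for fixed sizes, it can be computed directly from the plain edit distance with no extra asymptotic cost, matching the complexity claim in the abstract.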


Copyright©北京勤云科技发展有限公司  京ICP备09084417号