共查询到17条相似文献,搜索用时 78 毫秒
1.
汉字/字符串编辑距离和编辑路径的有效求解技术 总被引:2,自引:0,他引:2
邹旭楷 《计算机研究与发展》1996,33(8):574-580
本文提出了一种有效的编辑距离和编辑路径求解技术,该技术不但适合于单字符字符串而且也适合于双字节汉字串的编辑距离和编辑路径的计算。它首先通过一有效的字符串相似匹配算法计算出串编辑距离,而后通过简单的二进制字位运算正确计算出串(最短)编辑路径。文章也给出了本技术的完整实现算法并分析了算法的复杂性。 相似文献
2.
3.
为解决目前人工评判古诗词默写正确性的方式存在耗时、漏判、误判等问题的情况,提出一种基于编辑距离的古诗词在线评判算法。根据中文古诗词的结构特点,该算法应用了基于树结构的编辑距离进行相似度判定。整个古诗词文本的相似度是语句、语块两个级别的相似度的综合。试验表明基于树状结构的编辑距离算法在古诗词相似评判方面较基于字符串的编辑距离算法具有更佳的适用性。 相似文献
4.
5.
《计算机工程与应用》2016,(2):81-85
目前关于XML文档相似性算法有很多种,其中基于编辑距离的方法是很重要的一类。目前已发表的基于编辑距离的算法中,编辑图算法由于其计算高效率的特点成为研究的出发点。首先介绍了编辑图算法的思想,由于它在计算过程中对同层兄弟节点的顺序有很强的依赖性,因此不能准确有效地比较数据无序的数据中心的XML文档相似性。针对该问题,在编辑图算法思想的基础上,结合路径算法的思想提出拆分编辑图算法。实验结果表明,拆分编辑图算法降低了编辑图算法中对兄弟节点次序的依赖性,更适合于数据中心的XML文档相似性比较,而且所得结果更加准确有效。 相似文献
6.
一种有效的编辑距离和编辑路径求解技术 总被引:1,自引:0,他引:1
邹旭楷 《小型微型计算机系统》1996,17(7):72-76
给定字符串T.P,将T转换成P所需的插入,删除,替代序列称为T到P的编辑路径,其最短编辑路径所需的插入,删除替代总数称为T到P的编辑距离,本文提出一种有效的编辑距离和编辑路径求解技术,该技术首先通过一有效的字符串相似匹配算法计算出编辑距离,而后仅通过简单的二进制字位运算正确计算出编辑路径。 相似文献
7.
8.
余丽 《电脑编程技巧与维护》2010,(18):44-46
在此提出一种基于模糊聚类的目录查询新方法,该方法基于模糊C均值聚类算法,并结合了编辑距离算法。针对传统的模糊C均值聚类算法的聚类结果不稳定性问题,引入了高权样本点集;并且在处理聚类过程中的边界值归属不足问题,引入编辑距离算法。 相似文献
9.
为了识别犯罪嫌疑人伪造和篡改的虚假身份,利用树编辑距离计算个体属性相似性,证明了树编辑距离的相关数学性质,对属性应用层次编码方法,提出了一种新的基于树编辑距离的层次聚类算法HCTED(Hi-erarchical Clustering Algorithm Based on Tree Edit Distance)。新算法通过树编辑操作使用最少的代价计算属性相似性,克服了传统聚类算法标称型计算的缺陷,提高了聚类精度,通过设定阈值对给定样本聚类。实验证明了新方法在身份识别上的准确性和有效性,讨论了不同参数对实验结果的影响,对比传统聚类算法,HCTED算法性能明显提高。新算法已经应用到警用流动人口分析中,取得了良好效果。 相似文献
10.
编辑距离是一种距离测量法,源于将一个字符串变换为另一个字符串所需要的编辑操作数,该方法能够自动将语言进行分类,最近这些年在西方很受关注,被证明测量语言或方言间距离是有效的。运用编辑距离算法对侗台语族语言做出计量分类以及亲缘关系程度的描述。结果表明编辑距离分类结果与历史语言学的分类结果是基本一致的,为计量法提供了新思路。编辑距离可以应用于东亚语言的研究中。 相似文献
11.
Traditional normalized tree edit distances do not satisfy the triangle inequality. We present a metric normalization method
for tree edit distance, which results in a new normalized tree edit distance fulfilling the triangle inequality, under the
condition that the weight function is a metric over the set of elementary edit operations with all costs of insertions/deletions
having the same weight. We prove that the new distance, in the range [0, 1], is a genuine metric as a simple function of the
sizes of two ordered labeled trees and the tree edit distance between them, which can be directly computed through tree edit
distance with the same complexity. Based on an efficient algorithm to represent digits as ordered labeled trees, we show that
the normalized tree edit metric can provide slightly better results than other existing methods in handwritten digit recognition
experiments using the approximating and eliminating search algorithm (AESA) algorithm. 相似文献
12.
两字符串的编辑距离是从一个串转换到另一个串所需要的最少基本操作数。编辑距离广泛应用于字符串近似匹配、字符串相似连接等领域。动态规划法利用编辑距离矩阵来计算两个串的编辑距离,需要计算矩阵中的所有元素,时间效率低。改进的方法改变了矩阵中元素的计算次序,减少了需要比对的元素,但仍需要比对一半以上的元素,时间效率还有待提高。提出基于基本操作序列的编辑距离顺序验证方法。首先,分析了基本操作序列的可列性,给出了列举基本操作序列的方法。然后依次顺序验证基本操作数从小到大的基本操作序列直到某一序列通过验证,得到其编辑距离。在阈值为2的字符串近似搜索实验中发现,所提方法比动态规划类方法具有更高的效率。 相似文献
13.
Aïda Ouangraoua Valentin Guignon Sylvie Hamel Cedric Chauve 《Theoretical computer science》2011,412(8-10):753-764
In this paper, we propose a new algorithm for the alignment of nested arc-annotated sequences, having applications in the comparison of RNA secondary structures without pseudo-knots. We use a general edit distance model between arc-annotated sequences, that considers classical sequences of edit operations and structural edit operations on arcs. In this model, the general edit distance problem under a non-constrained weight scheme, is NP-hard. Recently, a hierarchy of arc-annotated sequence alignment problems that highlights less general, but tractable, problems was introduced. We refine this hierarchy of alignment problems and extend the class of tractable alignment problems. Up to date, the alignment problem we solve is the most general one that is known to be tractable in the considered edit distance model and under arbitrary weight schemes. This algorithm is efficient, as its asymptotic time and space complexities are the same as the complexities of the best previously published algorithm. 相似文献
14.
在图相似性搜索问题中,图编辑距离是较为普遍的度量方法,其计算性能很大程度上决定了图相似性搜索算法的性能。针对传统图编辑距离算法中存在的因大量冗余映射和较大搜索空间导致的性能低下问题,提出了一种改进的图编辑距离算法。该算法首先对图中顶点进行等价划分,以此计算映射编码来判断等价映射;然后定义映射完整性更新等价映射优先级,选出主映射参与扩展;其次,设计高效的启发式函数,提出基于映射编码的下界计算方法,快速得到最优映射。最后,将改进的图编辑距离算法扩展应用于图相似性搜索。在不同数据集上的实验结果表明,该算法具有更好的搜索性能,在搜索空间上最大可降低49%,速度提升了约29%。 相似文献
15.
Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We consider the problem of estimating X* by processing Y, which is a noisy version of U. We do this by defining the constrained edit distance between XH and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance has been presented. Although in general the algorithm has a cubic time complexity, within the framework of our solution the algorithm possesses a quadratic time complexity. Recognition using the constrained edit distance as a criterion demonstrates remarkable accuracy. Experimental results which involve strings of lengths between 40 and 80 and which contain an average of 26.547 errors per string demonstrate that the scheme has about 99.5 percent accuracy. 相似文献
16.
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art algorithm for the tree edit distance is RTED, which is guaranteed to be fast independent of the tree shape. Unfortunately, this algorithm requires up to twice the memory of its competitors. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation.In this paper we present a new, memory efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance). Our algorithm runs at least as fast as RTED without trading in memory efficiency. This is achieved by releasing memory early during the first step of the algorithm, which computes a decomposition strategy for the actual distance computation. We show the correctness of our approach and prove an upper bound for the memory usage. The strategy computed by AP-TED is optimal in the class of all-path strategies, which subsumes the class of LRH strategies used in RTED. We further present the AP-TED+ algorithm, which requires less computational effort for very small subtrees and improves the runtime of the distance computation. Our experimental evaluation confirms the low memory requirements and the runtime efficiency of our approach. 相似文献
17.
We present an algorithm to approximate edit distance between two ordered and rooted trees of bounded degree. In this algorithm,
each input tree is transformed into a string by computing the Euler string, where labels of some edges in the input trees
are modified so that structures of small subtrees are reflected to the labels. We show that the edit distance between trees
is at least 1/6 and at most O(n
3/4) of the edit distance between the transformed strings, where n is the maximum size of two input trees and we assume unit cost edit operations for both trees and strings. The algorithm
works in O(n
2) time since computation of edit distance and reconstruction of tree mapping from string alignment takes O(n
2) time though transformation itself can be done in O(n) time. 相似文献