首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
3.
Double‐array structures have been widely used to implement dictionaries with string keys. Although the space efficiency of dynamic double‐array dictionaries tends to decrease with key updates, we can still maintain high efficiency using existing methods. However, these methods have practical problems of time and functionality. This paper presents several efficient rearrangement methods to solve these problems. Through experiments using real‐world datasets, we demonstrate that the proposed rearrangement methods are much more practical than existing methods.  相似文献   

4.
Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, outweighing by far the theoretical slowdown brought by the space reduction. We present a novel compressed suffix tree, which is the first achieving at the same time sublogarithmic complexity for the operations, and space usage that asymptotically goes to zero as the entropy of the text does. The main ideas in our development are compressing the longest common prefix information, totally getting rid of the suffix tree topology, and expressing all the suffix tree operations using range minimum queries and a novel primitive called next/previous smaller value in a sequence. Our solutions to those operations are of independent interest.  相似文献   

5.
6.
7.
Tight connections between leaf languages and strings compressed by straight-line programs (SLPs) are established. It is shown that the compressed membership problem for a language L is complete for the leaf language class defined by L via logspace machines. A more difficult variant of the compressed membership problem for L is shown to be complete for the leaf language class defined by L via polynomial time machines. As a corollary, it is shown that there exists a fixed linear visibly pushdown language for which the compressed membership problem is PSPACE-complete. For XML languages, it is shown that the compressed membership problem is coNP-complete.Furthermore it is shown that the embedding problem for SLP-compressed strings is hard for PP (probabilistic polynomial time).  相似文献   

8.
We consider succinct, or highly space-efficient, representations of a (static) string consisting of n   pairs of balanced parentheses, which support natural operations such as finding the matching parenthesis for a given parenthesis, or finding the pair of parentheses that most tightly enclose a given pair. This problem was considered by Jacobson [Space-efficient static trees and graphs, in: Proc. of the 30th FOCS, 1989, pp. 549–554] and Munro and Raman [Succinct representation of balanced parentheses and static trees, SIAM J. Comput. 31 (2001) 762–776] who gave O(n)O(n)-bit and 2n+o(n)2n+o(n)-bit representations, respectively, that supported the above operations in O(1)O(1) time on the RAM model of computation. This data structure is a fundamental tool in succinct representations, and has applications in representing suffix trees, ordinal trees, planar graphs and permutations.  相似文献   

9.
近似字符串匹配是模式匹配研究领域中的一个重要研究方向。压缩后缀数组是字符串匹配、数据压缩等领域广泛使用的索引结构,具有检索速度快和适用广泛的优点。利用压缩后缀数组,提出了适合近似字符串匹配搜索算法的数据结构,并在此基础上提出了一种匹配搜索算法。实验结果表明,相对于现有的算法,提出的算法在小字母表的情况下具有计算优势。  相似文献   

10.
字符串相似性连接是数据质量管理的基本操作,也是数据价值发现的关键步骤。针对目前已有的方法不能满足面向大数据的增量式处理需求的问题,提出一种面向流式数据的增量式字符串相似性连接方法——Inc-Join,并对方法的索引技术进行了优化。该方法以Pass-Join字符串连接算法为基础,首先,采用字符串划分技术将字符串划分成多个互不相交的子串;然后,建立字符串的反向索引列表并将其作为状态;最后,新增数据只需根据状态进行相似性计算,每次连接操作结束后都对状态进行更新。实验结果表明,Inc-Join方法在不影响连接准确率的同时,有效将长、 短字符串重复匹配次数减少为√n(n是批处理方式的匹配次数)。 实验对3种数据集进行处理,发现使用批处理方式进行相似性连接的响应时间是Inc-Join的1至4.7倍,并呈现急剧递增的趋势;而且优化后Inc-Join方法的响应时间最小只占优化前的3/4,并随处理数据的增多所占比例越来越小。同时优化后的Inc-Join不需要保存状态,再一次减小了算法执行的时间和空间开销。  相似文献   

11.
A number string is a sequence of positive integers from a set (1,2,3, ..., p) 5; N. Interlocking Number String(INS) is a number string in which certain length of adjacent numbers are meaningfully interrelated. Code matrix is a matrix generated using INS. The development of INS and code matrix is novel, and has a practical origin. In addition, INS and code matrix has interesting theoretical characteristics. One may use this characteristics to develop tools and methodologies in various applications. This paper discusses basic theories on INS and code matrix, and briefly presents an application of the code matrix in computer vision area. The applications of INS and code matrix in real world problems are greatly anticipated.  相似文献   

12.
13.
14.
针对基于内容的可变长度的分块CDC算法中数字签名计算需要耗费大量CPU开销的问题,提出了一种基于位串内容感知的数据块分块算法。算法利用每一次失败匹配尝试所带来的位特征信息,最大限度地排除不能匹配的位置,从而获得最大的跳跃长度,减少中间计算和比较的开销。实验结果表明,本算法减小了数据分块过程中数字签名计算的开销,降低了确定块边界时的CPU资源消耗,从而优化了数据分块的时间性能。  相似文献   

15.
Multiplicative noise removal is a key issue in image processing problem. While a large amount of literature on this subject are total variation (TV)-based and wavelet-based methods, recently sparse representation of images has shown to be efficient approach for image restoration. TV regularization is efficient to restore cartoon images while dictionaries are well adapted to textures and some tricky structures. Following this idea, in this paper, we propose an approach that combines the advantages of sparse representation over dictionary learning and TV regularization method. The method is proposed to solve multiplicative noise removal problem by minimizing the energy functional, which is composed of the data-fidelity term, a sparse representation prior over adaptive learned dictionaries, and TV regularization term. The optimization problem can be efficiently solved by the split Bregman algorithm. Experimental results validate that the proposed model has a superior performance than many recent methods, in terms of peak signal-to-noise ratio, mean absolute-deviation error, mean structure similarity, and subjective visual quality.  相似文献   

16.
一种融合多种编辑距离的字符串相似度计算方法*   总被引:5,自引:0,他引:5  
针对中西文混合字符串,采用了将汉字作为西文字符的等价单位计算编辑距离的方法,并从输入法的角度提出了采用拼音编码和五笔编码计算编辑距离的方法,最后给出了融合三种编辑距离计算字符串相似度的算法。仿真结果表明,该方法在提高相似重复记录检测的查全率的同时,也能获得较高的查准率。  相似文献   

17.
18.
目的 自动指纹识别系统大多是基于细节点匹配的,系统性能依赖于输入指纹质量。输入指纹质量差是目前自动指纹识别系统面临的主要问题。为了提高系统性能,实现对低质量指纹的增强,提出了一种基于多尺度分类字典稀疏表示的指纹增强方法。方法 首先,构建高质量指纹训练样本集,基于高质量训练样本学习得到多尺度分类字典;其次,使用线性对比度拉伸方法对指纹图像进行预增强,得到预增强指纹;然后,在空域对预增强指纹进行分块,基于块内点方向一致性对块质量进行评价和分级;最后,在频域构建基于分类字典稀疏表示的指纹块频谱增强模型,基于块质量分级机制和复合窗口策略,结合频谱扩散,基于多尺度分类字典对块频谱进行增强。结果 在指纹数据库FVC2004上将提出算法与两种传统指纹增强算法进行了对比实验。可视化和量化实验结果均表明,相比于传统指纹增强算法,提出的方法具有更好的鲁棒性,能有效改善低质量输入指纹质量。结论 通过将指纹脊线模式先验引入分类字典学习,为拥有不同方向类别的指纹块分别学习一个更为可靠的字典,使得学习到的分类字典拥有更可靠的脊线模式信息。块质量分级机制和复合窗口策略不仅有助于频谱扩散,改善低质量块的频谱质量,而且使得多尺度分类字典能够成功应用,克服了增强准确性和抗噪性之间的矛盾,使得块增强结果更具稳定性和可靠性,显著提升了低质量指纹图像的增强质量。  相似文献   

19.
20.
经典字符串匹配算法的本质都是从左向右或者从右向左顺序进行字符匹配的,在主串中存在大量子串与模式串前缀或者后缀相同时效率较低,并且模式串最大右移长度为模式串长度。改进算法采用二分匹配字符串的方法,有效地避免了由主串中大量子串与模式串前缀相同或者后缀相同引起的无意义比较次数。模式串的移动距离根据改进的坏字符规则进行计算,增大了模式串的移动距离。实验结果表明,改进的字符串匹配算法可以有效地减少字符串的匹配次数和移动次数,达到了提高算法效率的目的。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号