首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
基于约束树编辑距离与导航树的信息采集   总被引:1,自引:0,他引:1       下载免费PDF全文
姜波  丁岳伟 《计算机工程》2009,35(14):75-77
介绍基于网站和网页结构的信息采集算法,提出一种基于约束树编辑距离的导航树算法。该算法通过提取网页的HTML的重要标记生成网页结构的标签树,对网页进行结构分析,通过约束树编辑距离算法判断爬行到的网页与主题的相关性,并根据网站基于URL的拓扑结构,提出基于导航树的信息采集约束信息采集器的爬行路径,提高了目标页面采集的效率和准确率。  相似文献   

3.
提出一种改进的树匹配算法,通过考量HTML特性,对树编辑距离方法进行改进,根据不同HTML树结点在浏览器中所显示的相关数据的不同权重赋以不同的权重值。算法由HTML数据对象构造具有结点权重的HTML树,模式识别通过取得两棵构造树的最大映射值达成。通过基于商用网站的实验对算法有效性进行了证实。  相似文献   

4.
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ?th subword complexity , for small ?). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.  相似文献   

5.
图编辑距离是图模式匹配技术中常用的方法之一。基于图编辑距离的匹配方法能够处理多种类型的图数据,因而受到了学术界的广泛关注。首先介绍了图编辑距离的相关概念;然后简述了基于启发式搜索技术的精确图编辑距离算法,重点分析了基于二分图匹配的近似图编辑距离算法;最后对现存的一些图编辑问题进行了总结,并对未来的发展趋势进行了展望。  相似文献   

6.
Ferraro  Godin 《Algorithmica》2008,36(1):1-39
Abstract. In this paper we propose a dynamic programming algorithm to compare two quotiented trees using a constrained edit distance. A quotiented tree is a tree defined with an additional equivalent relation on vertices and such that the quotient graph is also a tree. The core of the method relies on an adaptation of an algorithm recently proposed by Zhang for comparing unordered rooted trees. This method is currently being used in plant architecture modelling to quantify different types of variability between plants represented by quotiented trees.  相似文献   

7.
Ferraro  Godin 《Algorithmica》2003,36(1):1-39
In this paper we propose a dynamic programming algorithm to compare two quotiented trees using a constrained edit distance. A quotiented tree is a tree defined with an additional equivalent relation on vertices and such that the quotient graph is also a tree. The core of the method relies on an adaptation of an algorithm recently proposed by Zhang for comparing unordered rooted trees. This method is currently being used in plant architecture modelling to quantify different types of variability between plants represented by quotiented trees.  相似文献   

8.
两字符串的编辑距离是从一个串转换到另一个串所需要的最少基本操作数。编辑距离广泛应用于字符串近似匹配、字符串相似连接等领域。动态规划法利用编辑距离矩阵来计算两个串的编辑距离,需要计算矩阵中的所有元素,时间效率低。改进的方法改变了矩阵中元素的计算次序,减少了需要比对的元素,但仍需要比对一半以上的元素,时间效率还有待提高。提出基于基本操作序列的编辑距离顺序验证方法。首先,分析了基本操作序列的可列性,给出了列举基本操作序列的方法。然后依次顺序验证基本操作数从小到大的基本操作序列直到某一序列通过验证,得到其编辑距离。在阈值为2的字符串近似搜索实验中发现,所提方法比动态规划类方法具有更高的效率。  相似文献   

9.
支持块编辑距离的索引结构   总被引:1,自引:0,他引:1  
在近似字符串匹配中,传统的编辑距离不能很好地衡量诸如人名、地址等数据的相似关系,而块编辑距离可以很好地衡量两个字符串的相似性.如何有效地支持块编辑距离,进行近似字符串查询处理具有重要的意义.计算两个字符串的块编辑距离是一个NP完全问题,因此希望提供有效的方法可以增强过滤能力,并减少假通过率.设计了一种支持移动编辑距离的新颖的索引结构SHV-Trie,通过研究移动编辑距离的操作特性,使用字母出现的频率作为支持移动编辑距离操作的一个下界,并且提出相应的查询过滤算法,同时,针对索引SHV-Trie的空间开销过大的问题,提出一种优化字母排列的索引结构和一种压缩的索引结构及相关查询过滤算法.真实数据集上的实验结果与分析显示了所提出的索引结构具有良好的过滤能力,并通过减少效率假通过率提高查询的效率.  相似文献   

10.
主要探讨了用Visual FoxPro设计编辑框中文本的查找与替换方法.首先在进行查找与替换表单界面的设计及相关属性设置的基础上,阐明了查找与替换的功能目标;然后分析了相应对象相关事件的基本过程,并提供了相应的Visual FoxPro程序代码.  相似文献   

11.
主要探讨了用Visual FoxPro设计编辑框中文本的查找与替换方法。首先在进行查找与替换表单界面的设计及相关属性设置的基础上,阐明了查找与替换的功能目标;然后分析了相应对象相关事件的基本过程,并提供了相应的Visual FoxPro程序代码。  相似文献   

12.
Given an undirected connected graph GG we consider the problem of finding a spanning tree of GG which has a maximum number of internal (non-leaf) vertices among all spanning trees of GG. This problem, called Maximum Internal Spanning Tree problem, is clearly NP-hard since it is a generalization of the Hamiltonian Path problem. From the optimization point of view the Maximum Internal Spanning Tree problem is equivalent to the Minimum Leaf Spanning Tree problem. However, the two problems have different approximability properties. Lu and Ravi proved that the latter has no constant factor approximation–unless P = NP–, while Salamon and Wiener gave a linear-time 2-approximation algorithm for the Maximum Internal Spanning Tree problem.  相似文献   

13.
14.
基于最小编辑距离的维语词语检错与纠错研究   总被引:2,自引:1,他引:2  
拼写错误的发现和候选词选取是文本分析中的一个重要的技术问题。本文结合维吾尔语的语音和词语结构特点,列出了文本中常见的拼写错误类型,详细分析了解决方法,利用最小编辑距离(minimum edit distance)算法实现了维吾尔语文本拼写错误分析中的查错和纠错功能,并以此为基础,结合维吾尔语构词规则,进一步提高了建议候选词的准确率和速度。该算法已被成功地应用到了维吾尔语文字自动校对和多文种文本检索等领域中。在以新疆高校学报为语料的测试中,词语查纠率达到 85%以上。  相似文献   

15.
16.
Spirn  J. 《Computer》1976,9(11):14-20
To analyze or simulate an operating/computer system, one must construct a model of the programs executing within the system. Many recent analyses, particularly queueing models, have used mathematically convenient program models. For example, in one popular model the times between page faults are assumed to be exponentially distributed and independent. However, such a program model is inaccurate, which may cause the analysis to be unrepresentative of the real world. Simulation models of systems, on the other hand, have relied largely on traces of actual programs. Such traces are undoubtedly more accurate than simple mathematical models, but they have several drawbacks. They are expensive to generate, they may not be truly representative of typical programs, and they may contain more detail than is necessary for accurate system modeling. Moreover, it is difficult to extrapolate their behavior to other similar programs.  相似文献   

17.
We consider the problem of finding a minimum diameter spanning tree with maximum node degree $B$ in a complete undirected edge-weighted graph. We provide an $O(\sqrt{\log_Bn})$-approximation algorithm for the problem. Our algorithm is purely combinatorial, and relies on a combination of filtering and divide and conquer.  相似文献   

18.
Approximating the Degree-Bounded Minimum Diameter Spanning Tree Problem   总被引:1,自引:0,他引:1  
We consider the problem of finding a minimum diameter spanning tree with maximum node degree $B$ in a complete undirected edge-weighted graph. We provide an $O(\sqrt{\log_Bn})$-approximation algorithm for the problem. Our algorithm is purely combinatorial, and relies on a combination of filtering and divide and conquer.  相似文献   

19.
如何快速有效地对数据立方体上的聚集查询给出近似的回答,是数据挖掘和数据仓库研究领域中的核心问题之一。现有大多数聚集查询算法在同一个数据立方体上只能支持某种特定的而非多种类型的聚集查询。本文给出了一种新的框架AdenTS,即基于密度的自适应树结构,它可以回答同一数据立方体上的各类聚集查询,也提出了一些近似和启发式技术,改善了查询结果和精度。实验结果表明,这种方法在支持的查询种类和性能上是更好的。  相似文献   

20.
基于编辑距离和多种后处理的生物实体名识别   总被引:1,自引:1,他引:0       下载免费PDF全文
基于编辑距离和多种后处理的生物医学文献实体名识别方法通过“全称缩写对识别算法”扩充词典,利用编辑距离算法提高识别召回率。在后处理阶段,使用前后缀词扩展、POS扩展、合并邻近实体及利用上下文线索等方法进一步提高性能。实验结果表明,使用该方法即使利用内部词典也可以获得较好的识别效果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号