Similar literature
 20 similar documents found (search time: 109 ms)
1.
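Building on a study of methods for measuring program code similarity, this paper proposes a program code query-matching algorithm based on XML store. Because XML store saves XML files in a tree structure, the algorithm queries the DVM trees in XML store to determine whether programs contain subtrees with the same structure, and measures similarity on that basis. Finally, a series of experiments on a prototype system further demonstrates the feasibility and effectiveness of the proposed algorithm in practical program code similarity measurement.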

2.
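An integrated ontology similarity computation method based on association rules   Total citations: 1, self-citations: 0, citations by others: 1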
李华  苏乐 《计算机应用》2012,32(9):2472-2475
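The currently popular Risk Minimization based Ontology Mapping (RiMOM) framework has achieved some success by adopting a "multi-strategy" approach, but the framework is bloated and complex, and the selection strategy it uses for computing structural similarity has certain limitations. To address these problems, an integrated ontology similarity computation method based on association rules is proposed. First, a structural "tree" model for association rules is constructed and the corresponding transaction set is derived. Second, association rules are mined, and the similarity of concept structures is computed from them. Third, the similarities of concept instances, attributes, and names are computed. Finally, the multiple feature similarities are combined by weighting to obtain the best overall ontology similarity. Experimental results show that the method substantially improves on RiMOM in both recall and precision; it also eliminates the strategy selection step, which effectively reduces time complexity.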
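The final weighting step in this abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the exact combination formula, so the normalized weighted average, the function name `combined_similarity`, and all numeric values below are assumptions made for illustration only.

```python
def combined_similarity(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Normalized weighted average of per-feature similarity scores.

    Assumption: a plain weighted average; the paper's actual weighting
    scheme is not stated in the abstract.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical feature similarities for one pair of concepts.
scores = {"structure": 0.8, "instance": 0.6, "attribute": 0.5, "name": 0.9}
# Hypothetical weights; they are normalized inside the function.
weights = {"structure": 0.4, "instance": 0.2, "attribute": 0.2, "name": 0.2}

print(round(combined_similarity(scores, weights), 2))
```

With these made-up numbers the structural feature dominates the result; in practice the weights would have to be tuned or learned for the ontologies at hand.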

3.
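Web page main-text extraction based on content similarity   Total citations: 6, self-citations: 0, citations by others: 6
This paper proposes a method that simplifies complex web page markup and maps it onto an easy-to-manipulate tree structure. The method does not depend on the DOM tree and does not need the HTMLparser package for parsing. Instead, it uses text similarity computation: by computing the similarity between the text content of tree nodes and the headings at each level, it judges the usefulness of small blocks of text, and on that basis cleans the page and extracts the main text to obtain the page's textual information. Experimental results show that the method offers high generality and accuracy for main-text extraction.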

4.
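In model-driven engineering, model composition can reduce software complexity and improve development efficiency and quality. This paper proposes a method that matches models on both semantics and structure and composes them according to rules. First, a formal intermediate data model is established to unify how models are represented during composition. Next, a semantic similarity function is designed around element names, and structural similarity is computed from element attributes and the relationships between elements, which improves matching precision. Matching elements are then determined from the similarity values: for completely matching and completely non-matching elements the composition rules are determined automatically, while for similar element pairs the rules can be decided after a small amount of manual analysis. The composed model is generated automatically from the composition rules, and validation rules and a verification framework are designed to check its correctness and consistency. Finally, a case study illustrates the effectiveness and feasibility of the method.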

5.
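Extracting sentiment evaluation units is one of the basic tasks of sentiment analysis. Most current extraction methods rely on flat features such as syntactic paths and are weak at filtering out noise in Chinese review text. This paper proposes a tree structure based on phrase-level syntax to represent sentiment evaluation unit patterns and uses an approximate convolution tree kernel to compute the similarity of such structures. On this basis, sentiment evaluation units are extracted with a pattern-matching method driven by similarity computation. In experiments on Chinese product reviews, the method improves precision by 13.4% and recall by 9.2% over the syntactic-path-based method. The experiments show that the proposed method performs well at extracting sentiment evaluation units from Chinese product reviews.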

6.
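Probabilistic models are an effective tool for reasoning under uncertainty and for data analysis. To handle the uncertainty in ontology matching, an improved ontology matching algorithm based on Markov networks is proposed. Several traditional matching algorithms are used to compute similarity matrices, the similarity propagation rule is improved, two structural stability constraint rules and one Disjoint consistency constraint rule are added, and the potential functions of the corresponding cliques are defined. From the similarity matrices and these rules, a method for constructing the Markov network is given; loopy belief propagation is used to compute the posterior probabilities of the random variables, and the final ontology matching result is obtained from those posteriors. Experiments on the OAEI2010 data set show that, compared with the iMatch ontology matching system, the algorithm effectively reduces the complexity of the probabilistic model and improves the precision and recall of ontology matching.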

7.
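International research on change detection algorithms currently focuses on optimizing efficiency or space, and the accuracy of change detection remains unsatisfactory; for example, changed text cannot be located precisely. By combining the tree structure of XML documents with similarity between texts, this paper proposes DML-Diff, a novel change detection algorithm oriented toward text content. It emphasizes changes in text content and makes the change detection results more accurate.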

8.
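International research on change detection algorithms currently focuses on optimizing efficiency or space, and the accuracy of change detection remains unsatisfactory; for example, changed text cannot be located precisely. By combining the tree structure of XML documents with similarity between texts, this paper proposes DML-Diff, a novel change detection algorithm oriented toward text content. It emphasizes changes in text content and makes the change detection results more accurate.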

9.
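A method for measuring the similarity of program source code is proposed. Function scopes are delimited according to the structural characteristics of C source code, the partitioned code is normalized using a set of rules, hash values are computed over the resulting Token sequences, and a hash-matching algorithm is used to measure source code similarity. Experimental results show that the method improves the accuracy of source code similarity measurement and runs efficiently.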
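The normalize, hash, and match pipeline described in this abstract can be sketched roughly as follows. The abstract does not spell out the paper's normalization rules, scope partitioning, or matching algorithm, so everything here is an assumption: identifiers and numbers are collapsed to `ID` and `NUM`, overlapping runs of `k` tokens are hashed with SHA-1, and similarity is taken as the Jaccard overlap of the two hash sets. The keyword list is deliberately incomplete.

```python
import hashlib
import re

# Small, incomplete set of C keywords kept verbatim (illustrative only).
C_KEYWORDS = {"int", "float", "char", "void", "return", "if", "else", "for", "while"}
# Identifiers, integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> list[str]:
    """Turn C-like source text into a normalized token sequence."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in C_KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # hide identifier names
        elif tok[0].isdigit():
            tokens.append("NUM")  # hide literal values
        else:
            tokens.append(tok)    # punctuation and operators
    return tokens


def kgram_hashes(tokens: list[str], k: int = 4) -> set[str]:
    """Hash every run of k consecutive tokens."""
    grams = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest() for g in grams}


def similarity(src_a: str, src_b: str, k: int = 4) -> float:
    """Jaccard overlap of the two k-gram hash sets, in [0, 1]."""
    ha = kgram_hashes(normalize(src_a), k)
    hb = kgram_hashes(normalize(src_b), k)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)


a = "int sum(int a, int b) { return a + b; }"
b = "int add(int x, int y) { return x + y; }"  # same code, renamed identifiers
c = "int f(int a) { while (a) a = a - 1; return a; }"
print(similarity(a, b), similarity(a, c))
```

Because renaming identifiers does not change the normalized token sequence, `a` and `b` compare as identical, while `c` shares only part of its token structure with `a`.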

10.
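Chemical structure similarity search plays an important role in modern chemical research, and measuring the similarity of chemical structures is the foundation and prerequisite for similarity search. Cheminformatics research currently offers a large number of distance measures and similarity representations for chemical structures. This paper adopts the Daylight molecular fingerprint method together with the similarity measure defined by the Tanimoto coefficient, and uses CDK to compute the similarity index of chemical structures according to this measure. Building on this work, a chemical structure similarity search system based on the browser/server model was developed, through which chemical structure similarity searches can be run against a database of active ingredients of traditional Chinese medicine. When searching, users can select an existing chemical structure or draw a new one with JME. The next step is to carry out cluster analysis and molecular diversity studies on this database.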
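For reference, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below assumes fingerprints are represented as sets of on-bit positions; it does not call CDK or Daylight, and the bit positions are made up for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


# Hypothetical on-bit positions; real fingerprints would come from a toolkit.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
print(tanimoto(query, candidate))  # 3 shared bits / (4 + 5 - 3) = 0.5
```

A similarity search then amounts to computing this value between the query fingerprint and each stored fingerprint and keeping the entries above a chosen threshold.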

11.
杨国清 《计算机时代》2020,(3):50-52,56
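The relational structure is the most commonly used logical form of data. Within relational databases, some data locally take the form of tree structures. For such tree-structured data, a data organization method based on a matrix model is proposed; using SQL queries directly, it implements tree insertion, traversal, deletion, and move algorithms inside the database.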

12.
In the past few decades, much success has been achieved in the use of artificial neural networks for classification, recognition, approximation and control. The flexible neural tree (FNT) is a special kind of artificial neural network whose most distinctive feature is its flexible tree structure. This makes it possible for FNT to obtain near-optimal network structures using tree structure optimization algorithms. However, the modeling efficiency of FNT is always a problem because of its two-stage optimization. This paper designs a parallel evolving algorithm for FNT (PE-FNT). The algorithm uses the PIPE algorithm to optimize tree structures and the PSO algorithm to optimize parameters. The evaluation processes of tree structure populations and parameter populations are both parallelized. As an implementation of the PE-FNT algorithm, two parallel programs were developed using MPI. One small data set, two medium data sets and three large data sets were used to evaluate the performance of these programs. Experimental results show that PE-FNT is an effective parallel FNT algorithm, especially for large data sets.

13.
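For content-based retrieval in large-scale multimedia databases, high-dimensional index structures are an important research problem. This paper proposes an effective high-dimensional index structure, the adaptive approximation tree, describes its structure, and gives construction and search algorithms. It combines the advantages of tree structures and sequential search and adapts its structure to different data distributions: when the dimensionality is low or the data distribution is highly skewed it takes the form of a tree, and when the dimensionality is high or the data are densely distributed it takes the form of a sequential scan, so as to achieve better retrieval efficiency. Structurally, MBRs are stored in compressed form to save storage space. The algorithms make full use of the fact that MBS and MBR coexist in the space partitioning, which avoids a large amount of complex computation and thereby greatly improves retrieval efficiency.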

14.
Database applications very often require a sophisticated class of storage structures in order to answer different types of queries efficiently. This often dictates that the file should be organized on multiple keys. Several storage structures have been proposed to satisfy these needs. Most of these are a generalization of the storage structures used for managing one-dimensional data. The k-d tree is one such example and it is a natural generalization of the standard one-dimensional binary search tree. Recently, a new storage structure, called the BD tree, was proposed to manage multidimensional data. This structure has good dynamic characteristics. Several variations are possible on the basic k-d tree structure. This paper studies the performance implications of three variations. Further, it provides an empirical performance comparison of the k-d tree and BD tree in database applications.
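To make concrete how the k-d tree generalizes the one-dimensional binary search tree, here is a minimal sketch of the basic structure, in which the comparison key cycles through the dimensions level by level. It is not one of the three variations studied in the paper, and the names `Node`, `insert`, and `contains` are chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], point: tuple, depth: int = 0) -> Node:
    """Insert a point, comparing on dimension (depth mod k) at each level."""
    if root is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def contains(root: Optional[Node], point: tuple, depth: int = 0) -> bool:
    """Exact-match search that follows the same path insertion would take."""
    if root is None:
        return False
    if root.point == point:
        return True
    axis = depth % len(point)
    child = root.left if point[axis] < root.point[axis] else root.right
    return contains(child, point, depth + 1)


root = None
for p in [(7, 2), (5, 4), (9, 6), (2, 3), (4, 7), (8, 1)]:
    root = insert(root, p)

print(contains(root, (4, 7)), contains(root, (4, 8)))
```

With one-dimensional points the axis is always 0, and the structure reduces to an ordinary binary search tree.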

15.
This paper defines and demonstrates four philosophies for processing queries on tree structures; shows that the data semantics of queries should be described by designating sets of nodes from which values for attributes may be returned to the data consumer; shows that the data semantics of database processing can be specified totally independent of any machine, file structure, or implementation; shows that set theory is a natural and effective vehicle for analyzing the semantics of queries on tree structures; and finally, shows that Bolts is an adequate formalism for conveying the semantics of tree structure processing.

16.
17.
G-tree: a new data structure for organizing multidimensional data   Total citations: 4, self-citations: 0, citations by others: 4
The author describes an efficient data structure called the G-tree (or grid tree) for organizing multidimensional data. The data structure combines the features of grids and B-trees in a novel manner. It also exploits an ordering property that numbers the partitions in such a way that partitions that are spatially close to one another in a multidimensional space are also close in terms of their partition numbers. This structure adapts well to dynamic data spaces with a high frequency of insertions and deletions, and to nonuniform distributions of data. We demonstrate that it is possible to perform insertion, retrieval, and deletion operations, and to run various range queries efficiently using this structure. A comparison with the BD tree, zkdb tree and the KDB tree is carried out, and the advantages of the G-tree over the other structures are discussed. The simulated bucket utilization rates for the G-tree are also reported.

18.
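Existing research on relational learning is based on complete data, whereas data in real-world problems are usually incomplete. This paper proposes MLTEC (maximum likelihood tree and evolutionary computing method), a method for learning probabilistic relational models (PRMs) from incomplete relational data. First, the incomplete relational data are filled in at random to obtain complete relational data. Next, a maximum likelihood tree is generated from each randomly filled data sample and used as an initial PRM network. The best network structure found during the evolutionary process is then used to revise the incomplete data set repeatedly, and the probabilistic relational model is finally obtained. Experimental results show that MLTEC can learn good probabilistic relational models from incomplete relational data.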

19.
Several novel data center network structures have been proposed to improve the topological properties of data centers. A common characteristic of these structures is that they are designed for supporting general applications and services. Consequently, these structures do not match well with the specific requirements of some dedicated applications. In this paper, we propose a hyper-fat-tree network (HFN): a novel data center structure for MapReduce, a well-known distributed data processing application. HFN possesses the advanced characteristics of BCube as well as fat-tree structures and naturally supports MapReduce. We then address several challenging issues that face HFN in supporting MapReduce. Mathematical analysis and comprehensive evaluation show that HFN possesses excellent properties and is indeed a viable structure for MapReduce in practice. Copyright © 2011 John Wiley & Sons, Ltd.

20.
We present a Markov chain model for the analysis of the behaviour of binary search trees (BSTs) under the dynamic conditions of insertions and deletions. The model is based on a data structure called a lineage tree, which provides a compact representation of different BST structures while still retaining enough information to model the effect of insertions and deletions and to compute average path length and tree height. Different lineages in the lineage tree correspond to states in the Markov chain. Transition probabilities are based on the number of BST structures corresponding to each lineage. The model is based on a similar lineage tree model developed for B-trees. The BST model is not intended for practical computations, but rather as a demonstration of the generalizability of the lineage tree approach for modeling data structures such as B-trees, B*-trees, B+-trees, BSTs, etc.
