首页 | 本学科首页   官方微博 | 高级检索  
     

基于Hamming范数的XML流相关性估测算法
引用本文:孙 贺,朱 洪.基于Hamming范数的XML流相关性估测算法[J].软件学报,2010,21(4):672-679.
作者姓名:孙 贺  朱 洪
作者单位:1. 复旦大学,上海市智能信息处理重点实验室,上海,200433
2. 华东师范大学,上海市高可信计算重点实验室,上海,200062
基金项目:Supported by the National High-Tech Research and Development Plan of China under Grant No.2007AA01Z189 (国家高技术研究发展计划(863)); the Shanghai Leading Academic Discipline Project of China under Grant No.B412 (上海重点学科建设项目资助)
摘    要:在数据库理论中,如何在较小的空间条件下快速地比较不同的XML(extensible markup language)流的差异性是一个基本问题.在这一问题的研究中,人们提出了树编辑距离等测度来描述XML文本的差异性.提出了一种基于Hamming范数的l0测度——即XML树的不同子树的个数,并以此来刻画XML文本的相关性.在数据流模型下,给出了基于空间有界伪随机数发生器、稳态分布于哈希函数的l0测度的概率算法.理论上的时空复杂性分析、正确性证明与实验模拟结果表明,这一概率算法对问题的输入提供了一个理想的近似.

关 键 词:算法设计  数据流  Hamming范数  稳态分布  XML(extensible  markup  language)
收稿时间:2007/12/17 0:00:00
修稿时间:7/2/2008 12:00:00 AM

Correlation Estimating Algorithm of XML Stream Based on Hamming Norms
SUN He and ZHU Hong.Correlation Estimating Algorithm of XML Stream Based on Hamming Norms[J].Journal of Software,2010,21(4):672-679.
Authors:SUN He and ZHU Hong
Affiliation:SUN He1,ZHU Hong2 1(Shanghai Key Laboratory of Intelligent Information Processing,Fudan University,Shanghai 200433,China) 2(Shanghai Key Laboratory of Trustworthy Computing,East China Normal University,Shanghai 200062,China)
Abstract:It is of great importance to compare the correlation of different XML (extensible markup language) streams in the limited space in the Database Theory. In the study of these problems, several measures are proposed, e.g. the tree-edit distance, to show the difference of XML trees. This paper proposes a natural measure l0 employing Hamming norms, i.e. the number of distinct sub-trees between two XML trees, to estimate the correlation. Furthermore, a probabilistic estimating algorithm involving space-bounded pseudorandom generators, stable distributions and hash functions has been presented in the data stream model. Theoretical time/space complexity analysis, correctness proof and experimental simulation show that this algorithm can give a desired approximation.
Keywords:algorithm design  data stream  Hamming norm  stable distribution  XML (extensible markup language)
本文献已被 CNKI 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号