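A Method to Query Document Database by Content and Structure
Cite this article: WANG Xiao-Ling, WEN Ji-Rong, LUAN Jin-Feng, MA Wei-Ying, DONG Yi-Sheng. A method to query document database by content and structure[J]. Journal of Software (软件学报), 2003, 14(5): 976-983.
Authors: WANG Xiao-Ling, WEN Ji-Rong, LUAN Jin-Feng, MA Wei-Ying, DONG Yi-Sheng
Affiliations: 1. Department of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096
2. Microsoft Research Asia, Beijing 100080
Funding: This work was performed while the first author was a visiting student at Microsoft Research Asia.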
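Abstract: Documents have a logical structure; titles, sections, paragraphs, and similar units make up a document's intrinsic logic. Different users have different needs when searching documents, and how a retrieval system can return meaningful information has long been a central research problem. This paper combines the structure and the content of documents and proposes a new method for computing similarity when retrieving structured documents. The method supports retrieval of document content at multiple granularities, from words and phrases up to paragraphs or sections. A question answering system was implemented on top of this method and tested on Microsoft's Encarta encyclopedia; experimental comparison with traditional methods shows that the passages it retrieves are more appropriate and more effective.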

Keywords: document database; information retrieval; passage retrieval; structured document
Received: 2002-04-04
Revised: 2002-10-17

A Method to Query Document Database by Content and Structure
WANG Xiao-Ling,WEN Ji-Rong,LUAN Jin-Feng,MA Wei-Ying and DONG Yi-Sheng.A Method to Query Document Database by Content and Structure[J].Journal of Software,2003,14(5):976-983.
Authors: WANG Xiao-Ling, WEN Ji-Rong, LUAN Jin-Feng, MA Wei-Ying, DONG Yi-Sheng
Abstract: Structured documents are made up of a few logical components, such as titles, sections, subsections, and paragraphs. The components of each structured document can be represented by an ordered tree model, which can also be viewed as a hierarchical concept relationship. To meet users' requirements for more precise and focused search results, retrieval techniques should allow users to retrieve document components at varying granularity. This paper presents a method to query a document database by content and structure. The key idea is to construct a more comprehensive similarity function that takes advantage of the inherent hierarchical structure of documents. This work combines information retrieval techniques, semi-structured data querying, and proximity search over documents. The proposed method is evaluated on the Encarta encyclopedia document set, and the experimental results show that it provides more accurate and focused answers than traditional document retrieval methods.
Keywords: document database; information retrieval; passage retrieval; structured document
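The abstract does not spell out the similarity function itself. As a rough illustration of the general idea (an ordered tree of document components, scored by content plus hierarchical structure, with answers returned at any granularity), here is a minimal Python sketch. It assumes a simple score-propagation rule, which is a stand-in and not the paper's method: each component's own term overlap with the query is mixed with its parent's score, so a paragraph inherits context from its section heading and the document title. The `Node` class, the `rank_components` function, the weight `alpha = 0.7`, and the toy article are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One logical component of a structured document (ordered tree model)."""
    kind: str                                           # e.g. "title", "section", "paragraph"
    text: str                                           # the component's own text: a heading or a paragraph body
    children: list[Node] = field(default_factory=list)  # sub-components in document order


def terms(text: str) -> set[str]:
    """Lower-cased alphanumeric terms of a string (no stemming, no stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def content_score(node: Node, query: set[str]) -> float:
    """Fraction of the query terms that occur in the node's own text."""
    if not query:
        return 0.0
    return len(query & terms(node.text)) / len(query)


def rank_components(root: Node, query: str, alpha: float = 0.7) -> list[tuple[float, Node]]:
    """Score every component at every granularity, highest score first.

    Hypothetical propagation rule (an assumption, not the paper's function):
        score(root) = content(root)
        score(v)    = alpha * content(v) + (1 - alpha) * score(parent(v))
    """
    q = terms(query)
    ranked: list[tuple[float, Node]] = []
    stack: list[tuple[Node, float | None]] = [(root, None)]  # (node, parent's score)
    while stack:
        node, parent_score = stack.pop()
        own = content_score(node, q)
        score = own if parent_score is None else alpha * own + (1 - alpha) * parent_score
        ranked.append((score, node))
        # Push children in reverse so they are popped in document order.
        for child in reversed(node.children):
            stack.append((child, score))
    # sort() is stable, so components with equal scores keep document order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Toy Encarta-style article written only for this sketch.
doc = Node("title", "Encarta", [
    Node("section", "History", [
        Node("paragraph", "Microsoft first released the encyclopedia in 1993."),
        Node("paragraph", "It was discontinued in 2009."),
    ]),
    Node("section", "Content", [
        Node("paragraph", "Articles cover science, history and culture."),
    ]),
])

ranked = rank_components(doc, "Encarta first released")
for score, node in ranked:
    print(f"{score:.3f}  {node.kind:<9}  {node.text}")
```

On this toy article the paragraph about the 1993 release ranks first: it matches "first" and "released" itself and picks up "Encarta" indirectly through the title's score, while the article as a whole ranks second. Because titles, sections, and paragraphs compete in the same ranking, the granularity of the returned component comes out of the scores rather than being fixed in advance. A practical system would likely replace the plain overlap fraction with weighted term statistics.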