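Automatic Extraction of Question-Answer Pairs Based on Decision Trees and Markov Chains
Cite this article:LIU Jia-bin,HU Guo-ping,CHEN Chao,SHAO Zheng-rong.Automatic Extraction of Question-Answer Pairs Based on Decision Trees and Markov Chains[J].中文信息学报 (Journal of Chinese Information Processing),2007,21(2):46-51.
Authors:LIU Jia-bin  HU Guo-ping  CHEN Chao  SHAO Zheng-rong
Affiliation:Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui 230027
Funding:National Natural Science Foundation of China; Microsoft fund
Abstract:A question answering system answers questions that users pose in natural language with accurate, concise answers, and the scale of the question-answer pairs in the system is clearly a major factor in its final performance. To enlarge the set of question-answer pairs and make full use of Internet resources, this paper proposes an algorithm based on decision trees and Markov chains for automatically extracting question-answer pairs from the Internet. First, a web page is represented as a DOM tree according to its HTML tags; next, features are extracted from the structural and textual information of each node in the tree; finally, the node features are classified by a model that combines a decision tree with a first-order Markov chain. Experimental results show a precision of 90.398% and a recall of 86.032%. Extraction results on a large number of web pages show that the classification model adapts to a wide variety of web pages.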

Keywords:artificial intelligence  pattern recognition  information extraction  DOM tree  decision tree  Markov chain
Article ID:1003-0077(2007)02-0046-06
Received:2006-02-08
Revised:2006-11-13

Decision Tree and Markov Model Based Question-Answer Pair Extraction
LIU Jia-bin,HU Guo-ping,CHEN Chao,SHAO Zheng-rong.Decision Tree and Markov Model Based Question-Answer Pair Extraction[J].Journal of Chinese Information Processing,2007,21(2):46-51.
Authors:LIU Jia-bin  HU Guo-ping  CHEN Chao  SHAO Zheng-rong
Affiliation:Department of EEIS, University of Science and Technology of China, Hefei, Anhui 230027, China
Abstract:Question answering systems give users precise answers to questions posed in natural language, and a major factor influencing such a system's performance is the scale of its question-answer pairs. To increase the number of question-answer pairs and make full use of web resources, this paper proposes a method that uses a decision tree and a Markov model to extract question-answer pairs from web pages. The method first represents a web page as a DOM tree according to its HTML tags, then extracts feature values from every node of the DOM tree, and finally passes the features through a classification model, built from the decision tree and the Markov model, to obtain each node's final classification. Experimental results show that precision reached 90.40% and recall reached 86.03%. The results also show that the model can extract information from a wide variety of web pages.
Keywords:artificial intelligence  pattern recognition  information extraction  DOM tree  decision tree  Markov model
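
The three steps named in the abstract (DOM representation, per-node features, classification by a decision tree combined with a first-order Markov chain) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the label set `Q`/`A`/`O`, the features, the hand-written `tree_scores` rules that stand in for a trained decision tree, the `START` and `TRANS` probabilities, and the choice of Viterbi decoding as the way to combine the two models. The abstract does not say how the authors combine them.

```python
import math
from html.parser import HTMLParser

LABELS = ("Q", "A", "O")  # question, answer, other (assumed label set)

class TextNodeCollector(HTMLParser):
    """Record each non-empty text node with its parent tag and depth."""
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags (path from the root)
        self.nodes = []   # (parent_tag, depth, text) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            parent = self.stack[-1] if self.stack else ""
            self.nodes.append((parent, len(self.stack), text))

def features(node):
    """Structural (tag, depth) and textual (question mark, length) features."""
    tag, depth, text = node
    return {"tag": tag, "depth": depth, "length": len(text),
            "ends_with_qmark": text.endswith(("?", "\uff1f"))}

def tree_scores(f):
    """Hypothetical stand-in for a trained decision tree: P(label | features)."""
    if f["ends_with_qmark"]:
        return {"Q": 0.8, "A": 0.1, "O": 0.1}
    if f["tag"] in ("dd", "p", "td") and f["length"] >= 20:
        return {"Q": 0.1, "A": 0.6, "O": 0.3}
    return {"Q": 0.1, "A": 0.3, "O": 0.6}

# Hypothetical first-order Markov chain over the labels of consecutive nodes.
START = {"Q": 0.3, "A": 0.1, "O": 0.6}
TRANS = {"Q": {"Q": 0.1, "A": 0.8, "O": 0.1},
         "A": {"Q": 0.5, "A": 0.3, "O": 0.2},
         "O": {"Q": 0.3, "A": 0.05, "O": 0.65}}

def viterbi(score_seq):
    """Most likely label sequence given per-node scores and the chain."""
    if not score_seq:
        return []
    best = {y: math.log(START[y]) + math.log(score_seq[0][y]) for y in LABELS}
    back = []
    for scores in score_seq[1:]:
        prev, best, ptr = best, {}, {}
        for y in LABELS:
            p = max(LABELS, key=lambda x: prev[x] + math.log(TRANS[x][y]))
            best[y] = prev[p] + math.log(TRANS[p][y]) + math.log(scores[y])
            ptr[y] = p
        back.append(ptr)
    y = max(LABELS, key=lambda x: best[x])
    path = [y]
    for ptr in reversed(back):   # follow back-pointers from the last node
        y = ptr[y]
        path.append(y)
    return path[::-1]

def extract_pairs(html):
    """Return (labels, pairs); each Q is paired with the first A after it."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    labels = viterbi([tree_scores(features(n)) for n in parser.nodes])
    pairs, question = [], None
    for (_, _, text), label in zip(parser.nodes, labels):
        if label == "Q":
            question = text
        elif label == "A" and question is not None:
            pairs.append((question, text))
            question = None
    return labels, pairs
```

In this sketch the chain matters for short answers: a node such as `<dd>See above.</dd>` scores highest as `O` under the stand-in tree alone, but is decoded as `A` when it directly follows a question, because the assumed `Q` to `A` transition probability is high.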
This article is indexed in databases including 维普 (VIP) and 万方数据 (Wanfang Data).