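Information extraction using tree automata inference technique (采用树自动机推理技术的信息抽取方法)

Cite this article:	TAN Peng-xu, ZHANG Lai-shun. Information extraction using tree automata inference technique[J]. Computer Engineering and Applications (计算机工程与应用), 2010, 46(16): 153-156. DOI: 10.3778/j.issn.1002-8331.2010.16.045

Authors:	TAN Peng-xu, ZHANG Lai-shun

Affiliation:	Institute of Electronic Technology, PLA Information Engineering University, Zhengzhou 450004, China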

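Abstract:	This paper presents an information extraction technique based on an improved k-contextual tree automata inference algorithm. The core idea is to convert structured or semi-structured documents into trees and then use an improved k-contextual tree (the KLH tree) to construct an unranked tree automaton that accepts the sample trees; whether web page information is extracted is decided by the automaton's accepting and rejecting states. The method fully exploits the tree structure of web documents and, by relying on tree automata, combines traditional extraction methods that use structure alone with the principles of grammatical inference to obtain extraction rules. Experiments show that, compared with similar extraction methods, both the sample learning time and the extraction time are reduced.
Keywords:	tree automata inference algorithm; structured (semi-structured) documents; unranked tree automata; information extraction; KLH tree
Received:	2008-11-19
Revised:	2009-02-18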
Information extraction using tree automata inference technique

TAN Peng-xu, ZHANG Lai-shun. Information extraction using tree automata inference technique[J]. Computer Engineering and Applications, 2010, 46(16): 153-156. DOI: 10.3778/j.issn.1002-8331.2010.16.045

Authors:	TAN Peng-xu, ZHANG Lai-shun

Affiliation:	Institute of Electronic Technology, PLA Information Engineering University, Zhengzhou 450004, China

Abstract:	This paper proposes an information extraction method based on an improved k-contextual tree automata inference algorithm. The key idea is to transform structured or semi-structured documents into trees and then use an improved k-contextual tree language, called the KLH tree language, to construct an unranked tree automaton that accepts the sample trees; data are extracted according to whether the automaton ends in an accepting or a rejecting state. The method makes full use of the tree structure of web documents and combines structure-based extraction with grammatical inference. Experimental results show that, compared with similar extraction approaches, the method reduces both the learning time and the extraction time.

Keywords:	tree automata inference algorithm; (semi-)structured documents; unranked tree automata; information extraction; KLH tree language
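
The pipeline outlined in the abstract (document to tree, inference from sample trees, then accept or reject) can be illustrated with a small sketch. This is not the paper's KLH algorithm, whose definition is not given on this page. As a loudly simplified stand-in, it learns a local tree language: it records, for every node of the sample trees, the node label together with the sequence of its child labels, and accepts a new tree only if every such pattern was seen during learning. The names `Node`, `TreeBuilder`, `parse`, `patterns`, `learn` and `accepts` are hypothetical, and the markup is assumed to be well formed (every tag closed).

```python
from html.parser import HTMLParser


class Node:
    """One node of an unranked tree: a label and any number of children."""

    def __init__(self, label):
        self.label = label
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a label tree from markup. Assumes every opened tag is closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#doc")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Non-blank text becomes a leaf labelled "#text"; its content is ignored.
        if data.strip():
            self.stack[-1].children.append(Node("#text"))


def parse(markup):
    """Convert a (semi-)structured document into a tree."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def patterns(node):
    """Yield (label, child-label sequence) for every node in the tree."""
    yield (node.label, tuple(child.label for child in node.children))
    for child in node.children:
        yield from patterns(child)


def learn(samples):
    """Collect every local pattern that occurs in the sample documents."""
    allowed = set()
    for markup in samples:
        allowed.update(patterns(parse(markup)))
    return allowed


def accepts(allowed, markup):
    """Accept a document only if all of its local patterns were learned."""
    return all(p in allowed for p in patterns(parse(markup)))


if __name__ == "__main__":
    model = learn(["<ul><li>a</li><li>b</li></ul>"])
    print(accepts(model, "<ul><li>x</li><li>y</li></ul>"))  # True
    print(accepts(model, "<ol><li>x</li></ol>"))            # False
```

Because whole child sequences are memorised, this stand-in generalises very little: a list with one item is rejected after learning from a list with two. Inference over bounded contexts, as the k-contextual family is described, is meant to address that kind of over-specificity, but the details would have to come from the paper itself.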
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).