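Research on Ontology-Based Web Page Data Extraction Techniques
Cite this article: CHANG Li-jun (常丽君). Research on Ontology-Based Web Page Data Extraction Techniques[J]. Digital Community & Smart Home, 2014, 0(6): 3726-3728
Author: CHANG Li-jun (常丽君)
Affiliation: School of Information Engineering, Nanjing University of Finance and Economics, Nanjing, Jiangsu 210046
Abstract: With the rapid growth of information on the Web, the Web has become a huge database, and users increasingly need to obtain web page data quickly and accurately. Web information extraction has become a key research topic in natural language processing. This paper first introduces basic concepts of ontology and, building on them, proposes and implements a web data extraction method based on a domain ontology. The keywords, concepts, and relations of the domain ontology are used to generate extraction rules; a syntax analysis module preprocesses the input documents; finally, data are extracted from each document according to the structure produced by syntax analysis and the generated extraction rules. Experiments show that the method performs well.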

Keywords: ontology  web page data extraction  wrapper

Web Information Extraction Based on Ontology
CHANG Li-jun. Web Information Extraction Based on Ontology[J]. Digital Community & Smart Home, 2014, 0(6): 3726-3728
Authors:CHANG Li-jun
Affiliation:CHANG Li-jun (School of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210046, China)
Abstract: With the rapid growth of information on the Web, the Web has developed into a huge database, and users increasingly need to obtain web page data quickly. Web information extraction has become a focus of research in natural language processing. This paper first introduces basic knowledge about ontology and, based on it, proposes and implements a web data extraction method that uses a domain ontology. The words, concepts, and relationships of the domain ontology are used to generate extraction rules, and a syntax analysis module preprocesses the input document. Finally, data are extracted according to the extraction rules and the document structure produced by parsing. Experiments show that the approach performs well.
Keywords:ontology  web information extraction  wrapper
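The abstract describes a three-stage pipeline: generate extraction rules from the domain ontology's keywords, concepts, and relations; preprocess the input document with a syntax analysis module; then apply the rules to the preprocessed document. The paper's actual rule format and parser are not given here, so the following Python code is only a minimal sketch under stated assumptions. The `Concept` and `Ontology` classes, the "keyword [:] value" regex rule shape, the book-domain example, and the tag-splitting `preprocess` function (a simple stand-in for a real syntax analyser) are all hypothetical illustrations, not the author's implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical domain concept: the keywords that signal it in page text."""
    name: str
    keywords: list[str]
    value_pattern: str  # regex for the value that follows a keyword


@dataclass
class Ontology:
    concepts: dict[str, Concept] = field(default_factory=dict)
    # (subject, relation, object) triples, e.g. ("Book", "hasPrice", "Price")
    relations: list[tuple[str, str, str]] = field(default_factory=list)


def generate_rules(onto: Ontology) -> dict[str, re.Pattern]:
    """Turn each concept's keywords into a 'keyword [:] value' regex rule."""
    rules = {}
    for c in onto.concepts.values():
        kw = "|".join(re.escape(k) for k in c.keywords)
        rules[c.name] = re.compile(rf"(?:{kw})\s*[:：]?\s*({c.value_pattern})")
    return rules


def preprocess(html: str) -> list[str]:
    """Assumed stand-in for syntax analysis: split the page at block-level
    tags, strip the remaining markup, and normalize whitespace."""
    blocks = re.split(r"<(?:br|p|li|tr|div)\b[^>]*>", html, flags=re.IGNORECASE)
    segments = []
    for b in blocks:
        text = re.sub(r"<[^>]+>", " ", b)
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            segments.append(text)
    return segments


def extract(html: str, onto: Ontology, record: str) -> dict[str, str]:
    """Fill the attribute concepts that the ontology relates to `record`,
    keeping the first match found for each attribute."""
    rules = generate_rules(onto)
    attrs = [obj for subj, _, obj in onto.relations if subj == record]
    result: dict[str, str] = {}
    for seg in preprocess(html):
        for a in attrs:
            if a not in result and (m := rules[a].search(seg)):
                result[a] = m.group(1).strip()
    return result


# Hypothetical book-domain ontology and page, for illustration only.
books = Ontology(
    concepts={
        "Title": Concept("Title", ["Title", "书名"], r".+"),
        "Author": Concept("Author", ["Author", "作者"], r".+"),
        "Price": Concept("Price", ["Price", "定价"], r"\d+(?:\.\d+)?"),
    },
    relations=[
        ("Book", "hasTitle", "Title"),
        ("Book", "hasAuthor", "Author"),
        ("Book", "hasPrice", "Price"),
    ],
)
page = ("<div><p>Title: Data Mining Basics</p><p>Author: Li Ming</p>"
        "<p>Price: 45.00 yuan</p></div>")
print(extract(page, books, "Book"))
# → {'Title': 'Data Mining Basics', 'Author': 'Li Ming', 'Price': '45.00'}
```

In this sketch the concept keywords anchor each rule, and the ontology's relations decide which attributes a `Book` record collects. Regular expressions keep the example short; a real wrapper would likely need a proper HTML or sentence parser in place of `preprocess`.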
This article is indexed in VIP (维普) and other databases.