
正则表达式的Web数据提取研究 (Study on Extraction Approach of Web Information Based on Regular Expression)

Cite this article:	刘松业 (LIU Songye). 正则表达式的Web数据提取研究[J]. 电脑编程技巧与维护 (Computer Programming Skills & Maintenance), 2008, 0(15): 89-91

Author:	刘松业 (LIU Songye)

Affiliation:	Information School, East China Normal University, Shanghai 200062

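Abstract:	The Internet is increasingly becoming an important source of information, and how to retrieve and process Web data so that users can make better use of the data resources on the Internet has become a new research focus. This paper discusses a semi-automatic data extraction algorithm that combines an information-slot extraction algorithm based on extended regular expressions with an event segmentation algorithm based on web page features. It also describes an information extraction system built on these algorithms and presents the system's architecture and implementation details. The system can be used in a real Web environment to improve the efficiency of storing and using information, and to some extent it addresses the difficulty of obtaining and using information on the Internet.
Keywords:	data extraction algorithm; regular expression; semi-structured data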
Study on Extraction Approach of Web Information Based on Regular Expression

LIU Songye. Study on Extraction Approach of Web Information Based on Regular Expression[J]. Computer Programming Skills & Maintenance, 2008, 0(15): 89-91

Authors:	LIU Songye

Affiliation:	Information School, East China Normal University, Shanghai 200062

Abstract:	The Internet is becoming a very important information resource. How to retrieve and process Web information so that users can utilize the resources on the Internet more effectively and efficiently has become a hot topic in academic research. The paper describes a semi-automatic information extraction algorithm, which uses an extraction algorithm based on extended regular expressions and an event-splitting algorithm based on web page features. The algorithm is used in a web recruitment information extraction project, and good performance is obtained. Relevant experiments are performed to show the advantages and disadvantages of these algorithms.

Keywords:	Web information extraction algorithms; regular expression; semi-structured data
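
For illustration only, the two steps named in the abstract (splitting a page into events by a page feature, then filling information slots with regular expressions) might be sketched as follows. This is not the paper's algorithm: its extended regular expression syntax is not given in this record, so plain Python `re` is used instead. The sample markup, the choice of `<li>` as the event delimiter, the field labels, and all function names are assumptions made up for the example.

```python
import re

# Hypothetical listing page: each job posting ("event") is assumed to sit in its own <li>.
SAMPLE_HTML = """
<ul>
  <li><b>Java Developer</b> Company: Acme Software Location: Shanghai Salary: 8000</li>
  <li><b>Test Engineer</b> Company: Example Tech Location: Beijing</li>
</ul>
"""

# Event segmentation by a page feature: here, the repeated <li> element (an assumption).
EVENT_RE = re.compile(r"<li>(.*?)</li>", re.DOTALL | re.IGNORECASE)

# One pattern per information slot; labels such as "Company:" are hypothetical.
SLOT_PATTERNS = {
    "title": re.compile(r"<b>(.*?)</b>"),
    "company": re.compile(r"Company:\s*(.+?)\s*(?=Location:|Salary:|$)"),
    "location": re.compile(r"Location:\s*(.+?)\s*(?=Company:|Salary:|$)"),
    "salary": re.compile(r"Salary:\s*(\d+)"),
}


def split_events(html):
    """Return the inner text of every <li> element as one event string."""
    return [m.group(1).strip() for m in EVENT_RE.finditer(html)]


def extract_slots(event):
    """Fill each slot from one event; a slot with no match is set to None."""
    record = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(event)
        record[name] = match.group(1) if match else None
    return record


def extract_records(html):
    """Segment the page into events, then extract slots from each event."""
    return [extract_slots(event) for event in split_events(html)]


if __name__ == "__main__":
    for record in extract_records(SAMPLE_HTML):
        print(record)
```

Keeping segmentation and slot filling as separate functions mirrors the division described in the abstract: a page whose layout changes would, under these assumptions, need a new event pattern, while the slot patterns could stay the same.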
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
