Regular expression and its applications to web information extraction

Cite this article:	HU Jun-wei, QIN Yi-qing, ZHANG Wei. Regular expression and its applications to web information extraction[J]. Journal of Beijing Institute of Machinery, 2011(6): 86-89.

Authors:	HU Jun-wei, QIN Yi-qing, ZHANG Wei

Affiliation:	School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China

Funding:	General Program of the Beijing Municipal Education Science and Technology Plan (KM201110772014)

Abstract:	For information extraction methods based on HTML structure, a processing approach built on regular expressions is proposed. Using the matching, replacement, and extraction functions of regular expressions, the paper focuses on how they are applied in the Web information extraction process. Regular expressions have been applied successfully throughout that process: data collection, page optimization, rule learning, and information extraction.

Keywords:	Web information extraction; regular expression; matching; substitution; extraction
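
The three regular-expression functions named in the abstract (matching, replacement, extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the page fragment, the patterns, and the helper names `clean_page`, `has_listing`, and `extract_items` are all assumptions made for illustration, and real pages usually need more tolerant patterns.

```python
import re

# Hypothetical page fragment; markup and field names are assumptions.
HTML = """<html><head><script>var x = 1;</script></head>
<body><!-- nav -->
<ul>
<li><a href="/item/1">Regex basics</a> <span class="price">12.50</span></li>
<li><a href="/item/2">Web mining</a> <span class="price">30.00</span></li>
</ul></body></html>"""


def clean_page(html):
    """Replacement: drop scripts and comments so later patterns see less noise."""
    html = re.sub(r"<script\b.*?</script>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def has_listing(html):
    """Matching: test whether the page contains a list block at all."""
    pattern = r"<ul\b[^>]*>.*?</ul>"
    return re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE) is not None


# Extraction rule: one record per link followed by a price span.
ITEM = re.compile(
    r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>\s*'
    r'<span class="price">(?P<price>[\d.]+)</span>'
)


def extract_items(html):
    """Extraction: return one dict (url, title, price) per matched record."""
    return [m.groupdict() for m in ITEM.finditer(html)]


cleaned = clean_page(HTML)
items = extract_items(cleaned) if has_listing(cleaned) else []
for item in items:
    print(item["url"], item["title"], item["price"])
```

Named groups keep the extraction rule readable and make each record a dictionary, which is convenient when rules are generated or adjusted during a rule-learning step.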
This article is indexed in VIP and other databases.