
News Extraction System Based on Web (基于Web的新闻采集系统)

Cite this article:	胡静芳, 沈亚斌. 基于Web的新闻采集系统[J]. 数字社区&智能家居, 2009(19).

Authors:	胡静芳 (HU Jing-fang), 沈亚斌 (SHEN Ya-bin)

Affiliations:	School of Information Engineering, Jingdezhen Ceramic Institute; China Helicopter Research and Development Institute

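Abstract:	With the rapid growth of the Internet, the Web has become a vast repository of information. However, most Web data currently takes the form of HTML, so applications cannot directly use the massive amount of information on the Web. Web information extraction technology emerged to address this problem. This paper discusses information extraction technology and, on that basis, implements a Web-based news extraction system. Using extraction rules that the user writes as regular expressions, the system quickly and accurately extracts information from target web pages and stores it in a local database for internal use or for publication on an external site.
Keywords:	Web information extraction; regular expressions; extraction rules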
News Extraction System Based on Web

HU Jing-fang, SHEN Ya-bin. News Extraction System Based on Web[J]. Digital Community & Smart Home, 2009(19).

Authors:	HU Jing-fang, SHEN Ya-bin

Affiliation:	1. School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China; 2. China Helicopter Research and Development Institute, Jingdezhen 333001, China

Abstract:	With the rapid development of the Internet, the Web has become a huge, distributed, and shared library of information resources. However, most Web data are represented in HTML, so this massive body of data is not directly available to applications. Web information extraction technology emerged to address this problem. In this paper, we discuss information extraction technology and, on this basis, implement a Web-based news extraction system, in which users can use regular expressions to make extraction rules and use them to e...

Keywords:	Web information extraction; regular expressions; extraction rules
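
The abstract describes extraction rules written as regular expressions, applied to target pages, with the results stored in a local database. The paper's actual rule format and database schema are not given here, so the following is only a minimal sketch of that idea under stated assumptions: the rule `RULE`, the sample markup `SAMPLE_HTML`, the `news` table, and the helper names `extract` and `save` are all hypothetical, and SQLite stands in for whatever local database the authors used.

```python
import re
import sqlite3

# Hypothetical extraction rule: a regular expression with named groups,
# one group per field to collect from a news list page.
RULE = re.compile(
    r'<li>\s*<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>\s*'
    r'<span class="date">(?P<date>\d{4}-\d{2}-\d{2})</span>\s*</li>'
)

# Made-up HTML standing in for a downloaded target page.
SAMPLE_HTML = """
<ul>
  <li><a href="/news/1.html">First headline</a> <span class="date">2009-05-01</span></li>
  <li><a href="/news/2.html">Second headline</a> <span class="date">2009-05-02</span></li>
</ul>
"""


def extract(html, rule):
    """Apply one extraction rule and return a list of field dictionaries."""
    return [m.groupdict() for m in rule.finditer(html)]


def save(records, conn):
    """Store extracted records in a local table, skipping URLs already saved."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (url TEXT PRIMARY KEY, title TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, title, date) VALUES (:url, :title, :date)",
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
records = extract(SAMPLE_HTML, RULE)
save(records, conn)
rows = conn.execute("SELECT title, date FROM news ORDER BY date").fetchall()
print(rows)
```

Keeping the rule as data (a compiled pattern with named groups) rather than as code means a new site could be supported by writing a new pattern, which is in the spirit of the user-written rules the abstract mentions. Regular expressions are brittle against markup changes, so a rule like this would need updating whenever the target page layout changes.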
This article is indexed in CNKI and other databases.
