
Web Literature Collection System (WEB文献资料采集系统)

Citation:	MA Chuang-Xin. WEB文献资料采集系统 [Web Literature Collection System] [J]. 计算机系统应用 (Computer Systems & Applications), 2012, 21(7): 9-12, 37.

Author:	MA Chuang-Xin (马创新)

Affiliation:	College of Liberal Arts, Nanjing Normal University, Nanjing 210097

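Funding:	Major Project of the National Social Science Fund of China (10&ZD117); Major Project of the Key Research Bases of Jiangsu Universities (2010JDXM023); Philosophy and Social Science Fund for Jiangsu Universities, Jiangsu Provincial Department of Education (2011SJB740010); Natural Science Research Project of Jiangsu Universities (11KJD520009)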

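Abstract:	To make full use of the abundant literature resources on the Web, a specialized Web literature collection system, WLES, is designed. The system combines web page crawling with web page cleaning and applies machine learning to the cleaning step: a cleaning model is learned from a training corpus and then used to clean web pages. Experiments show that the system performs well in both crawling and cleaning and meets users' literature collection needs.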
Keywords:	literature collection; machine learning; web page cleaning; cleaning model
Received:	2011-11-03
Revised:	2011-12-01
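The abstract names web crawling as one of the two techniques WLES integrates but does not describe the crawler. As an illustration only, the sketch below assumes a breadth-first crawler that starts from seed URLs, keeps a set of already-seen URLs so that no page is fetched twice, and stops at a page budget. The `crawl` function, the `fetch` callback, and the in-memory `site` table (used so the example runs offline) are hypothetical and not taken from the paper.

```python
from collections import deque
from typing import Callable, Iterable

# Hypothetical callback: fetch(url) -> (html, outgoing_links); raises OSError on failure.
Fetcher = Callable[[str], tuple[str, list[str]]]


def crawl(seeds: Iterable[str], fetch: Fetcher, max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl from seed URLs; returns {url: html} for fetched pages."""
    frontier: deque[str] = deque()
    seen: set[str] = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)
    pages: dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()  # FIFO order gives breadth-first traversal
        try:
            html, links = fetch(url)
        except OSError:  # unreachable page: skip it and keep crawling
            continue
        pages[url] = html
        for link in links:
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                frontier.append(link)
    return pages


# Offline stand-in for HTTP: a tiny hypothetical site as {url: (html, links)}.
site = {
    "index": ("<html>index</html>", ["list", "about"]),
    "list": ("<html>list</html>", ["index", "paper1", "missing"]),
    "about": ("<html>about</html>", []),
    "paper1": ("<html>paper1</html>", ["list"]),
}


def fake_fetch(url: str) -> tuple[str, list[str]]:
    if url not in site:
        raise OSError(f"cannot fetch {url}")
    return site[url]


print(list(crawl(["index"], fake_fetch)))  # ['index', 'list', 'about', 'paper1']
```

Breadth-first order reaches pages close to the seeds first. A production crawler would also need politeness delays, robots.txt handling, and URL normalization, none of which are shown here.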
Web Literature Collection System

MA Chuang-Xin. Web Literature Collection System [J]. Computer Systems & Applications, 2012, 21(7): 9-12, 37.

Authors:	MA Chuang-Xin

Affiliation:	MA Chuang-Xin (College of Liberal Arts, Nanjing Normal University, Nanjing 210097, China)

Abstract:	To take full advantage of the rich literature resources on the Web, this paper designs a specialized web literature collection system, WLES. WLES integrates web crawling and web page cleaning, and introduces machine learning into web page cleaning: a cleaning model is learned from training data and then used to clean web pages. Experiments show that WLES performs well in both web crawling and web page cleaning and meets users' literature collection needs.

Keywords:	literature collection; machine learning; web page cleaning; cleaning model
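The abstract states that WLES learns a cleaning model from a training corpus and then applies it to new pages, but it does not name the features or the learning algorithm. The sketch below is one minimal way to realize that train-then-apply idea under stated assumptions: each page has already been split into text blocks, each training block is labeled as main content (+1) or noise (-1), the features are a saturating length score and link density, and the learner is a perceptron, chosen here only for brevity. `Block`, `features`, `train_cleaning_model`, and `clean_page` are hypothetical names, not the paper's.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text block cut from a web page (hypothetical segmentation unit)."""
    text: str        # visible text of the block
    link_chars: int  # how many of those characters sit inside hyperlinks


def features(block: Block) -> list[float]:
    """Assumed features: bias term, saturating length score, link density."""
    n = len(block.text)
    length_score = min(n, 200) / 200
    link_density = block.link_chars / n if n else 1.0
    return [1.0, length_score, link_density]


def score(w: list[float], block: Block) -> float:
    return sum(wi * xi for wi, xi in zip(w, features(block)))


def train_cleaning_model(blocks: list[Block], labels: list[int], max_epochs: int = 1000) -> list[float]:
    """Perceptron training; label +1 = main content, -1 = noise (navigation, ads, ...)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for block, y in zip(blocks, labels):
            if y * score(w, block) <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, features(block))]
                mistakes += 1
        if mistakes == 0:  # every training block is on the correct side
            break
    return w


def clean_page(blocks: list[Block], w: list[float]) -> list[str]:
    """Apply the learned model: keep only blocks scored as main content."""
    return [b.text for b in blocks if score(w, b) > 0]


# Hypothetical labeled training blocks.
training = [
    Block("Home | News | About | Contact", 22),
    Block("This paper designs a web literature collection system. It integrates "
          "crawling and cleaning, and learns a cleaning model from a training corpus.", 0),
    Block("Next page", 9),
    Block("Experiments were run on pages from several sites; the results are "
          "reported in the tables below.", 0),
    Block("Share: Weibo QQ", 9),
]
labels = [-1, 1, -1, 1, -1]
model = train_cleaning_model(training, labels)

page = [Block("Previous | Next", 12),
        Block("The crawler saves each page, and the cleaner keeps only the main text.", 0)]
print(clean_page(page, model))
# ['The crawler saves each page, and the cleaner keeps only the main text.']
```

Because this toy training set is linearly separable, the perceptron stops as soon as every training block is classified correctly. A real corpus would call for richer features and evaluation on held-out pages.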
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.