
Web Page Cleaning Technology Based on Web Mining

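Cite this article:	LI Jia-you, JIA Zi-yan, HE Qing, SHI Zhong-zhi. Web Page Cleaning Technology Based on Web Mining[J]. Computer Engineering and Applications, 2006, 42(25): 98-101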

Authors:	LI Jia-you, JIA Zi-yan, HE Qing, SHI Zhong-zhi

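Affiliations:	University of Science and Technology of China, Hefei 230027; Intelligent Information Processing Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Intelligent Information Processing Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080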

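Abstract:	As the volume of information on the Internet grows rapidly, Web mining technology is becoming increasingly important. Within the Web mining process, a major part of Web-based information extraction is removing noisy data from web pages. This is a preprocessing step for Web data, and its outcome affects the results of Web mining. This paper first analyzes the characteristics of noisy data, and then applies rules derived from practical observation, together with statistical methods, to remove the noisy data and extract the relevant usable information.
Keywords:	Web data; information extraction; noisy data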
Article ID:	1002-8331-(2006)25-0098-04
Received:	2006-03-01
Revised:	2006-03-01
Web Page Cleaning Technology Based on Web Mining

LI Jia-you, JIA Zi-yan, HE Qing, SHI Zhong-zhi. Web Page Cleaning Technology Based on Web Mining[J]. Computer Engineering and Applications, 2006, 42(25): 98-101

Authors:	LI Jia-you, JIA Zi-yan, HE Qing, SHI Zhong-zhi

Affiliation:	1 University of Science and Technology of China, Hefei 230027; 2 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080

Abstract:	With the rapid expansion of information resources on the Internet, Web mining technology plays an increasingly important role. How to eliminate noisy information in web pages is a main part of information extraction based on Web mining. It is a preprocessing step in Web mining, and the result of Web mining depends on this step. In this paper, we first analyze the features of noisy information. Then, based on our observations, we use extraction rules and statistical methods to eliminate noisy information and extract usable information.

Keywords:	Web data; information extraction; noisy information
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
