Research on Key Technologies of Web Data Mining Based on XML (基于XML的Web数据挖掘关键技术的研究)

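Cite this article:	崔建群 (CUI Jianqun), 何炎祥 (HE Yanxiang), 郑世珏 (ZHENG Shijue), 吴黎兵 (WU Libing). 基于XML的Web数据挖掘关键技术的研究[J]. 计算机工程 (Computer Engineering), 2006, 32(20): 43-44, 7.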

Authors:	崔建群 (CUI Jianqun), 何炎祥 (HE Yanxiang), 郑世珏 (ZHENG Shijue), 吴黎兵 (WU Libing)

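Affiliations:	1. Institute of Network & Communication Technology, Huazhong Normal University, Wuhan 430079; School of Computer, Wuhan University, Wuhan 430072. 2. School of Computer, Wuhan University, Wuhan 430072. 3. Institute of Network & Communication Technology, Huazhong Normal University, Wuhan 430079.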

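Abstract (translated from the Chinese):	Because of the large amount of information available online, the WWW has become a focus of data mining. This paper introduces data mining techniques for Web pages, proposes an XML-based Web data mining model, explains why semi-structured HTML documents should be converted into well-formed XML documents, and gives conversion code based on the HTML Tidy library. It introduces the key technologies for extracting data from Web pages with XML, including XHTML, XSLT and XQuery, and discusses other aspects of Web data mining such as data checking and integration.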
Keywords:	Web data mining; XML; model; key technologies
Article ID:	1000-3428(2006)20-0043-02
Received:	2006-03-30
Revised:	2006-03-30
Research on Key Technologies of Web Mining Based on XML

CUI Jianqun,HE Yanxiang,ZHENG Shijue,WU Libing.Research on Key Technologies of Web Mining Based on XML[J].Computer Engineering,2006,32(20):43-44,7.

Authors:	CUI Jianqun, HE Yanxiang, ZHENG Shijue, WU Libing

Affiliation:	(1. Institute of Network & Communication Technology, Huazhong Normal University, Wuhan 430079; 2. School of Computer, Wuhan University, Wuhan 430072)

Abstract:	With the huge amount of information available online, the World Wide Web is a fertile area for data mining research. This paper addresses the problem of extracting data from Web pages and proposes an XML-based approach to it. It explains the motivation for converting semi-structured HTML documents into well-formed XML, presents part of the conversion source code, which is built on the HTML Tidy library, and illustrates how to extract the desired information from Web pages with XML technologies, including XHTML, XSLT and XQuery. It also discusses other aspects of a Web mining project, such as data checking and data integration.

Keywords:	Web data mining; XML-based model; Key technologies
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据) and other databases.
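
The abstract outlines a two-step pipeline: repair semi-structured HTML into well-formed XML, then query the result with XML tools. The paper's own conversion code is not reproduced on this page, so the following is only a minimal sketch of that idea under stated assumptions. It uses Python's standard library as a stand-in: `html.parser` replaces HTML Tidy, and the limited XPath subset of `xml.etree.ElementTree` replaces XSLT/XQuery. The names `WellFormedBuilder`, `to_well_formed_xml`, `extract_rows` and the sample page are hypothetical, not taken from the paper.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Elements that never have content; emitted as self-closing tags.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
# Opening one of these implicitly closes the listed open elements.
AUTO_CLOSE = {"p": {"p"}, "li": {"li"}, "td": {"td"}, "tr": {"td", "tr"}}


class WellFormedBuilder(HTMLParser):
    """Hypothetical, greatly simplified stand-in for HTML Tidy."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the XML text being built
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        # Quote every attribute value; valueless attributes repeat their name.
        attr_text = "".join(f" {k}={quoteattr(v or k)}" for k, v in attrs)
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")
            return
        while self.stack and self.stack[-1] in AUTO_CLOSE.get(tag, ()):
            self.out.append(f"</{self.stack.pop()}>")
        self.stack.append(tag)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.stack:
            return  # ignore stray or redundant closing tags
        while True:  # close everything opened inside this element
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # escapes &, < and >

    def close(self):
        super().close()
        while self.stack:  # close anything the page left open
            self.out.append(f"</{self.stack.pop()}>")


def to_well_formed_xml(html_text):
    builder = WellFormedBuilder()
    builder.feed(html_text)
    builder.close()
    return "".join(builder.out).strip()


def extract_rows(xml_text):
    """Pull (name, price) pairs out of the repaired page with path queries."""
    root = ET.fromstring(xml_text)
    rows = []
    for tr in root.findall(".//table[@id='prices']/tr"):
        name = tr.find("td[@class='name']").text.strip()
        price = float(tr.find("td[@class='price']").text.strip())
        rows.append((name, price))
    return rows


# Sample page (invented): unquoted attributes, unclosed cells, a bare '&'.
MESSY_HTML = """
<html><body>
<table id=prices>
<tr><td class=name>Widget<td class=price>4.50
<tr><td class=name>Gadget & Co<td class=price>12.00
</table>
<p>Updated daily<br>
</body></html>
"""

if __name__ == "__main__":
    print(extract_rows(to_well_formed_xml(MESSY_HTML)))
```

The point of the intermediate step is that once the page parses as XML, extraction becomes a declarative path query rather than string matching. A real page would need the far more complete repair rules of a tool such as HTML Tidy.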