Application of Frequent Itemsets to Deep Web Data Source Clustering

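Cite this article:	ZHANG Pengfei, ZHU Qunxiong. Application of frequent itemsets to Deep Web data source clustering[J]. Computer Engineering and Applications (计算机工程与应用), 2012, 48(14): 152-157.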

Authors:	ZHANG Pengfei (张蓬飞), ZHU Qunxiong (朱群雄)

Affiliation:	College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029

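Abstract:	Behind Deep Web pages lie massive numbers of data sources that can be accessed through structured query interfaces. Organizing these data sources by domain is a key step in Deep Web data integration. Existing approaches rely mainly on the query interface schema or on the results returned by submitted queries; they suffer from the difficulty of fully extracting query interface features and from the low efficiency of submitting queries to the databases. This paper proposes a clustering method based on frequent itemsets that incorporates Web page text: data sources are clustered by domain according to the title, keywords, and prompt text of the page containing the query interface. The method effectively addresses the traditional approaches' dependence on query interface features and the high dimensionality of text models. Experimental results show that the method is feasible and efficient.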
Keywords:	Deep Web; data source clustering; text clustering; frequent itemsets; data integration
Deep Web data source clustering using frequent itemsets

ZHANG Pengfei, ZHU Qunxiong. Deep Web data source clustering using frequent itemsets[J]. Computer Engineering and Applications, 2012, 48(14): 152-157.

Authors:	ZHANG Pengfei, ZHU Qunxiong

Affiliation:	College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

Abstract:	A vast number of data sources are hidden behind Deep Web pages and can be accessed only through structured query interfaces. Organizing these data sources by domain is an important step in Deep Web data integration. Existing methods focus mainly on query interface schemas and query results; they suffer from the difficulty of extracting interface schemas and the inefficiency of submitting queries to the databases. A method based on frequent itemsets is proposed to cluster data sources by domain. The method uses Web page text such as the title, keywords, and label text, and it addresses two problems of traditional solutions: overdependence on the query interface and the high dimensionality of text processing. Experimental results show that the method is effective and efficient.

Keywords:	Deep Web; data source clustering; text clustering; frequent itemsets; data integration
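
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of clustering by frequent itemsets, not the authors' method. It assumes each data source has already been reduced to a set of terms taken from its page title, keywords, and label text. The function names, the `min_support` threshold, the `max_size` limit, and the rule for assigning a source to an itemset are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise (Apriori-style) mining.

    Returns a dict mapping each frequent itemset (a frozenset of terms)
    to the set of transaction indices that contain it.
    """
    # Level 1: record which transactions contain each single term.
    cover = defaultdict(set)
    for idx, terms in enumerate(transactions):
        for term in terms:
            cover[frozenset([term])].add(idx)
    current = {s: c for s, c in cover.items() if len(c) >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        candidates = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != size or union in candidates:
                continue
            # Transactions containing the union are exactly those
            # containing both parts.
            shared = current[a] & current[b]
            if len(shared) >= min_support:
                candidates[union] = shared
        result.update(candidates)
        current = candidates
    return result


def cluster_by_itemsets(transactions, min_support, max_size=2):
    """Assign each data source to one frequent itemset it contains.

    Assumed rule (hypothetical): prefer the itemset covering the most
    sources, then the larger itemset, then alphabetical order.
    Sources containing no frequent itemset are grouped under frozenset().
    """
    transactions = [set(t) for t in transactions]
    itemsets = frequent_itemsets(transactions, min_support, max_size)
    clusters = defaultdict(list)
    for idx, terms in enumerate(transactions):
        matching = [s for s in itemsets if s <= terms]
        if not matching:
            clusters[frozenset()].append(idx)
            continue
        best = max(matching,
                   key=lambda s: (len(itemsets[s]), len(s), sorted(s)))
        clusters[best].append(idx)
    return dict(clusters)


# Made-up term sets for five query-interface pages (illustration only).
pages = [
    {"book", "author", "title", "isbn"},
    {"book", "author", "publisher"},
    {"flight", "departure", "airline"},
    {"flight", "departure", "return"},
    {"book", "title", "price"},
]
clusters = cluster_by_itemsets(pages, min_support=2)
```

On this toy input, pages 0, 1, and 4 are grouped under the itemset {book} and pages 2 and 3 under {flight, departure}. Because a cluster is described by a handful of shared terms rather than a full term vector, this kind of approach sidesteps the high dimensionality that the abstract attributes to traditional text models.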
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
