An Aggregation Query Processing Method of Dirty Database Based on Clustering

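Cite this article:	Jiang Guohua, Wang Hongzhi, Li Jianzhong, Gao Hong. An Aggregation Query Processing Method of Dirty Database Based on Clustering[J]. Journal of Computer Research and Development, 2009, 46(Z2).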

Authors:	Jiang Guohua, Wang Hongzhi, Li Jianzhong, Gao Hong

Affiliation:	School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

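Funding:	National Key Basic Research and Development Program of China ("973" Program); Key Program of the National Natural Science Foundation of China; National Natural Science Foundation of China; Heilongjiang Province Youth Science and Technology Special Fund; NSFC/RGC Joint Research Fund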

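Abstract:	Dirty data in real-world databases, such as incomplete, inconsistent, and duplicate data, hinders the effective use of those databases. Obtaining statistical analysis results that meet a required clean-degree from a database containing dirty data poses a new challenge for database research, and aggregation queries are the basis of statistical analysis. This paper proposes an aggregation query processing algorithm for dirty data that guarantees a clean-degree, targeting aggregation queries with a GROUP BY clause. Because the same tuple in dirty data may belong to different groups, the method groups the tuples of the database using overlapping clustering. This produces groups that account for the dirtiness of the data, together with aggregate results computed over these groups and their clean-degree expressed as a probability. The method applies to a variety of aggregate functions and to aggregation queries with selection conditions. Experiments verify the efficiency of the method.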
Keywords:	dirty data; aggregation; aggregation query
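The abstract does not spell out how overlapping groups turn into probabilistic aggregate results. As a rough illustration only, the Python sketch below assumes that some overlapping clustering step has already given every tuple a probability of belonging to each candidate group, and that memberships are independent across tuples. It computes the expected COUNT and SUM of each group by linearity of expectation, and the full distribution of a group's COUNT (a Poisson-binomial distribution) by dynamic programming. The input format, the independence assumption, the example data, and the names `expected_aggregates` and `count_distribution` are hypothetical; this is not the authors' clustering algorithm or their definition of clean-degree.

```python
from collections import defaultdict

# Hypothetical input format: (value, {group_key: membership probability}).
# A tuple may list several groups, which models the overlapping groups
# that arise when dirty data makes the true group of a tuple uncertain.
DirtyTuple = tuple[float, dict[str, float]]


def expected_aggregates(tuples: list[DirtyTuple]) -> dict[str, dict[str, float]]:
    """Expected COUNT and SUM per group, using linearity of expectation."""
    result: defaultdict[str, dict[str, float]] = defaultdict(
        lambda: {"count": 0.0, "sum": 0.0}
    )
    for value, memberships in tuples:
        for group, p in memberships.items():
            result[group]["count"] += p        # E[COUNT] adds each membership probability
            result[group]["sum"] += p * value  # E[SUM] adds probability-weighted values
    return dict(result)


def count_distribution(probs: list[float]) -> list[float]:
    """Return [P(COUNT = 0), ..., P(COUNT = n)] for one group, assuming the
    n candidate tuples join the group independently with the given probabilities."""
    dist = [1.0]  # with no tuples considered, COUNT = 0 with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this tuple is not in the group
            new[k + 1] += q * p    # this tuple is in the group
        dist = new
    return dist


if __name__ == "__main__":
    # Made-up example: the middle tuple could belong to group A or group B.
    data: list[DirtyTuple] = [
        (10.0, {"A": 1.0}),
        (20.0, {"A": 0.5, "B": 0.5}),
        (30.0, {"B": 1.0}),
    ]
    print(expected_aggregates(data))
    # {'A': {'count': 1.5, 'sum': 20.0}, 'B': {'count': 1.5, 'sum': 40.0}}
    probs_a = [m["A"] for _, m in data if "A" in m]
    print(count_distribution(probs_a))
    # [0.0, 0.5, 0.5]
```

A probability such as P(COUNT = k) is one way an aggregate result could carry a confidence value; how the paper actually defines and computes clean-degree may differ, and dropping the independence assumption would require a different computation.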
This article is indexed in Wanfang Data and other databases.
