
Research on Data Cleansing Based on Clustering Analysis Techniques

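Cite this article:	LIU Fang, HE Fei. Research on Data Cleansing Based on Clustering Analysis Techniques [J]. Computer Engineering & Science, 2005, 27(6): 70-71.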

Authors:	LIU Fang, HE Fei

Affiliation:	School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074 (both authors)

Funding:	National "Tenth Five-Year Plan" Major Science and Technology Special Project (2001BA102A06 11)

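Abstract:	Data cleansing is an important step in building a data warehouse and in data mining. The core of data cleansing is detecting approximately duplicate records, and clustering is an analysis method that gathers highly similar data objects into one class. The data cleansing process described in this paper is based on clustering analysis: it applies ICAD, an improved density-based clustering algorithm, to the cleansing process. The algorithm finds approximately duplicate records by repeatedly adjusting the density and quickly completes cleansing tasks on large-volume data.
Keywords:	data cleansing; approximately duplicate records; clustering; ICAD
Article ID:	1007-130X(2005)06-0070-02
Revision date:	February 11, 2004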
Improved Algorithms for Data Cleansing Based on Clustering Analysis

LIU Fang, HE Fei. Improved Algorithms for Data Cleansing Based on Clustering Analysis [J]. Computer Engineering & Science, 2005, 27(6): 70-71.

Authors:	LIU Fang, HE Fei

Abstract:	Data cleansing is an important part of data warehousing and data mining, and detecting approximately duplicate database records is its key task. Clustering analysis, another subarea of data mining, gathers similar data records into a group. The data cleansing process described in this paper is based on clustering: it employs the algorithm ICAD (Improved Clustering Algorithms using Adjustable Density), which finds approximately duplicate database records by adjusting the density, and it can complete the data cleansing task on large-volume data sets.

Keywords:	data cleansing; approximately duplicate record; clustering; ICAD
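
The abstract does not give the details of ICAD, so the following is only a minimal sketch of the general idea it describes: group approximately duplicate records with a density-based (DBSCAN-style) clustering pass, and loosen the density threshold step by step for records that are still ungrouped. It is not the authors' algorithm. The function names, the `difflib`-based string distance, the `eps_schedule` values, and the sample records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    """Dissimilarity in [0, 1]; 0 means identical strings (case-insensitive)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def density_clusters(records, eps, min_pts=2):
    """DBSCAN-style grouping; returns clusters as sorted lists of indices."""
    n = len(records)
    # The neighbourhood of i includes i itself, as in standard DBSCAN.
    neighbours = [
        [j for j in range(n) if distance(records[i], records[j]) <= eps]
        for i in range(n)
    ]
    label = [None] * n
    clusters = []
    for i in range(n):
        if label[i] is not None or len(neighbours[i]) < min_pts:
            continue  # already assigned, or not dense enough to seed a cluster
        cid = len(clusters)
        members, stack = [], [i]
        while stack:
            p = stack.pop()
            if label[p] is not None:
                continue
            label[p] = cid
            members.append(p)
            if len(neighbours[p]) >= min_pts:  # core point: keep expanding
                stack.extend(q for q in neighbours[p] if label[q] is None)
        clusters.append(sorted(members))
    return clusters


def find_duplicates(records, eps_schedule=(0.1, 0.2, 0.3), min_pts=2):
    """Hypothetical 'adjustable density' loop: relax eps on leftover records."""
    remaining = list(range(len(records)))
    groups = []
    for eps in eps_schedule:
        subset = [records[i] for i in remaining]
        grouped = set()
        for cluster in density_clusters(subset, eps, min_pts):
            groups.append([remaining[k] for k in cluster])
            grouped.update(cluster)
        # Only records that are still ungrouped go on to the looser threshold.
        remaining = [idx for k, idx in enumerate(remaining) if k not in grouped]
    return groups


# Made-up sample records, for illustration only.
records = [
    "John Smith, 12 Main St",
    "Jon Smith, 12 Main St.",
    "Alice Wong, 5 Oak Ave",
    "Alice Wong, 5 Oak Avenue",
    "Bob Lee, 9 Elm Rd",
    "Robert Lee, 9 Elm Road",
    "Carol Diaz, 77 Pine Ct",
]
groups = find_duplicates(records)
print(groups)
```

With these sample records, the first two pairs are grouped at the tightest threshold, the "Bob"/"Robert" pair is grouped only after the threshold is loosened to 0.2, and the last record stays ungrouped. The all-pairs distance computation is quadratic in the number of records, so a real large-volume cleansing job would need some form of indexing or blocking, which this sketch leaves out.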
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
