
Data Cleaning Based on Clustering Technique (基于聚类模式的数据清洗技术)

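Cite this article:	TANG Yi-fang, ZHONG Da-fu, YAN Xiao-wei. Data Cleaning Based on Clustering Technique (基于聚类模式的数据清洗技术)[J]. Journal of Computer Applications (计算机应用), 2004, 24(5): 116-119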

Authors:	TANG Yi-fang (唐懿芳), ZHONG Da-fu (钟达夫), YAN Xiao-wei (严小卫)

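Affiliations:	Department of Computer Science, Guangxi Normal University, Guilin, Guangxi 541004; Department of Computer Science, Guangxi Normal University, Guilin, Guangxi 541004; Faculty of Information Technology, University of Technology, Sydney, Australia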

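Abstract:	Before mining, the data sources to be mined must be cleaned to remove incorrect data. This paper studies the problem of integrating multiple data sources during data cleaning. To address the shortcomings of existing techniques for detecting duplicate records, it proposes a data cleaning method that uses the Canopy clustering technique to cluster duplicate records, and experimental results verify the effectiveness and accuracy of the proposed algorithm.
Keywords:	data cleaning; Canopy clustering technique; duplicate records
Article ID:	1001-9081(2004)05-0116-04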
Data Cleaning Based on Clustering Technique

TANG Yi-fang, ZHONG Da-fu, YAN Xiao-wei. Data Cleaning Based on Clustering Technique[J]. Journal of Computer Applications, 2004, 24(5): 116-119

Authors:	TANG Yi-fang, ZHONG Da-fu, YAN Xiao-wei

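Affiliation:	Department of Computer Science, Guangxi Normal University, Guilin, Guangxi 541004 (TANG Yi-fang, ZHONG Da-fu); Faculty of Information Technology, University of Technology, Sydney, Australia (YAN Xiao-wei)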

Abstract:	Data sources must be cleaned before mining in order to eliminate incorrect data. This paper studies the data cleaning problem that arises when integrating multiple data sources. After analyzing the shortcomings of existing techniques for detecting duplicate records, it proposes a data cleaning approach that uses the Canopy clustering technique to cluster duplicate records. Experimental results show the effectiveness and accuracy of the proposed algorithm.

Keywords:	data cleaning; Canopy clustering technique; duplicate record
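
The abstract names Canopy clustering as the way duplicate records are grouped but gives no details here. The sketch below shows the general Canopy idea only, not the authors' algorithm: a cheap distance assigns records to possibly overlapping canopies, and only records sharing a canopy need a costlier comparison. The token-set Jaccard distance, the function names, and the thresholds `loose` and `tight` are all assumptions made for illustration.

```python
from itertools import combinations


def tokens(record: str) -> frozenset:
    """Cheap representation of a record: its set of lowercase word tokens."""
    return frozenset(record.lower().split())


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Cheap distance in [0, 1]; 0 means the token sets are identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def canopies(records, loose=0.8, tight=0.4):
    """Group record indices into possibly overlapping canopies.

    loose and tight are illustrative thresholds (assumed values),
    with 0 <= tight <= loose.
    """
    if not 0 <= tight <= loose:
        raise ValueError("thresholds must satisfy 0 <= tight <= loose")
    sets = [tokens(r) for r in records]
    remaining = list(range(len(records)))  # indices that may still start a canopy
    result = []
    while remaining:
        center = remaining[0]
        dist = [jaccard_distance(sets[center], s) for s in sets]
        # Every record within the loose threshold joins this canopy.
        result.append([i for i, d in enumerate(dist) if d <= loose])
        # Records within the tight threshold can no longer start a canopy.
        remaining = [i for i in remaining if dist[i] > tight]
    return result


def candidate_pairs(groups):
    """Index pairs sharing a canopy: the only pairs a costlier matcher compares."""
    pairs = set()
    for group in groups:
        pairs.update(combinations(sorted(group), 2))
    return pairs
```

With four hypothetical address records, two of which are near-duplicates of the other two, calling `candidate_pairs(canopies(records))` would leave two pairs to compare instead of all six, which is the saving Canopy clustering is meant to provide.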
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
