A Method of Improving Approximately Duplicated Records Detection Precision
Cite this article: Chen Wei, Wang Hao, Zhu Wenming. A Method of Improving Approximately Duplicated Records Detection Precision[J]. Computer Applications and Software, 2006, 23(10): 29-30,42
Authors: Chen Wei, Wang Hao, Zhu Wenming
Affiliation: Nanjing Audit Institute, Nanjing, Jiangsu 210029, China (all three authors)
Funding: Natural Science Foundation of Jiangsu Higher Education Institutions; National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Abstract: Eliminating approximately duplicated records from a data source is an important problem in data cleaning research. To improve detection precision, this paper builds on an existing approximately duplicated record detection algorithm and uses the rank-based weights method to assign an appropriate weight to each field of a record. An example is used to verify the effectiveness of the method.

Keywords: Data mining; Data cleaning; Approximately duplicated records; Rank-based weights method
Received: 2004-10-22
Revised: 2004-10-22
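
As a rough illustration of the approach summarized in the abstract, the sketch below derives field weights from importance ranks and combines per-field similarities into one weighted record similarity. The abstract does not state the weighting formula, so the rank-order centroid formula used here is an assumption. The field names, the `difflib`-based string similarity, and the 0.85 threshold are likewise hypothetical choices for illustration and do not come from the paper.

```python
from difflib import SequenceMatcher


def rank_centroid_weights(ranks):
    """Turn field ranks into weights that sum to 1.

    `ranks` maps field name -> rank, where 1 is the most important field
    and ranks are assumed to be the distinct integers 1..n.
    Assumption: rank-order centroid weights, w_i = (1/n) * sum_{k=i..n} 1/k.
    """
    n = len(ranks)
    return {
        field: sum(1.0 / k for k in range(rank, n + 1)) / n
        for field, rank in ranks.items()
    }


def field_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec1, rec2, weights):
    """Weighted sum of per-field similarities over the weighted fields."""
    return sum(
        weight * field_similarity(rec1[field], rec2[field])
        for field, weight in weights.items()
    )


def is_duplicate(rec1, rec2, weights, threshold=0.85):
    """Flag two records as approximate duplicates (threshold is hypothetical)."""
    return record_similarity(rec1, rec2, weights) >= threshold


if __name__ == "__main__":
    # Hypothetical schema: name matters most, phone least.
    weights = rank_centroid_weights({"name": 1, "address": 2, "phone": 3})
    a = {"name": "Zhang Wei", "address": "12 Zhongshan Road", "phone": "025-555-0101"}
    b = {"name": "Zhang  Wei", "address": "12 Zhongshan Rd.", "phone": "025-555-0199"}
    print(weights)
    print(record_similarity(a, b, weights), is_duplicate(a, b, weights))
```

With ranked weights, a mismatch in a low-ranked field such as the phone number lowers the overall score less than a mismatch in the top-ranked field, which is the intuition behind weighting fields rather than treating them equally.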
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.