Incremental Algorithm for Detecting Approximately Duplicate Database Records

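Citation:	Xu Xiangyang, She Chunhong. Incremental Algorithm for Detecting Approximately Duplicate Database Records[J]. Computer Engineering and Applications, 2003, 39(12): 191-193, 220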

Authors:	Xu Xiangyang, She Chunhong

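Affiliation:	Institute of Database and Multimedia Technology, School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074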

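Funding:	National Key Technologies R&D Program project "Research on Key Technologies and Application Systems of the Science and Technology E-Government System of the Ministry of Science and Technology" (No. 2001BA110B01)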

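Abstract:	Data cleaning is an important research topic in data warehousing, and detecting approximately duplicate records is one of its main technical difficulties. This paper introduces the sorted-neighborhood method and, building on it, studies how to detect approximately duplicate records when new data sources are added dynamically while the data schema and the matching rules stay unchanged. It proposes an incremental algorithm, IMPN (Incremental Multi-Pass Sorted-Neighborhood), and concludes with experimental results.
Keywords:	data cleaning; approximately duplicate records; incremental detection; representative record
Article ID:	1002-8331-(2003)12-0191-03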
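The abstract builds IMPN on the multi-pass sorted-neighborhood method. The Python sketch below illustrates that general idea under stated assumptions; it is not the paper's IMPN, whose details are not given here. The sort keys (`KEYS`), the matching rule `same_person`, the window size, and the use of a cluster's union-find root as its "representative record" are all hypothetical choices. Each pass sorts the records by one key and compares each record only with its neighbors inside a sliding window; matches from all passes are merged by transitive closure (union-find). The hypothetical `incremental` step keeps the schema and matching rule fixed and reruns the passes over one representative per existing cluster plus the newly added records.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List, Sequence

Record = Dict[str, str]
KeyFn = Callable[[Record], str]
MatchFn = Callable[[Record, Record], bool]


class UnionFind:
    """Disjoint sets over record ids; merging matched pairs yields their transitive closure."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def multi_pass(ids: Sequence[int], records: Dict[int, Record], keys: Sequence[KeyFn],
               match: MatchFn, window: int, uf: UnionFind) -> None:
    """Multi-pass sorted neighborhood: one sort per key, comparisons only within the window."""
    for key in keys:
        order = sorted(ids, key=lambda i: key(records[i]))
        for pos, i in enumerate(order):
            # Compare record i with the (window - 1) records just before it in sort order.
            for j in order[max(0, pos - window + 1):pos]:
                if match(records[i], records[j]):
                    uf.union(i, j)


def incremental(old_ids: Iterable[int], new_ids: Sequence[int], records: Dict[int, Record],
                keys: Sequence[KeyFn], match: MatchFn, window: int, uf: UnionFind) -> None:
    """Hypothetical increment: rerun the passes on one representative per old cluster plus new records."""
    reps = sorted({uf.find(i) for i in old_ids})
    multi_pass(reps + list(new_ids), records, keys, match, window, uf)


def clusters(ids: Iterable[int], uf: UnionFind) -> List[List[int]]:
    groups: Dict[int, List[int]] = {}
    for i in ids:
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def same_person(a: Record, b: Record) -> bool:
    # Hypothetical matching rule: same city and similar names.
    return a["city"] == b["city"] and SequenceMatcher(None, a["name"], b["name"]).ratio() >= 0.8


# Hypothetical sort keys, one per pass.
KEYS: List[KeyFn] = [lambda r: r["name"], lambda r: r["city"] + r["name"]]

records: Dict[int, Record] = {
    0: {"name": "john smith", "city": "wuhan"},
    1: {"name": "jon smith", "city": "wuhan"},
    2: {"name": "mary lee", "city": "beijing"},
}
uf = UnionFind()
multi_pass([0, 1, 2], records, KEYS, same_person, 3, uf)
first = clusters([0, 1, 2], uf)
print(first)  # [[0, 1], [2]]

# A new batch arrives; the schema and matching rule are unchanged.
records.update({3: {"name": "mary leee", "city": "beijing"},
                4: {"name": "tom wang", "city": "wuhan"}})
incremental([0, 1, 2], [3, 4], records, KEYS, same_person, 3, uf)
after = clusters(range(5), uf)
print(after)  # [[0, 1], [2, 3], [4]]
```

Comparing the new batch only against cluster representatives keeps the incremental step cheaper than rerunning every pass over all records, but in this sketch it can miss a match between a new record and a non-representative member of an old cluster.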
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
