A Detection Method for Approximately Duplicated Chinese Address Information

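Cite this article:	LIU Zhe, XIA Xiu-feng, SONG Xiao-yan, LIN Tong. A detection method for approximately duplicated Chinese address information[J]. Mini-micro Systems, 2008, 29(4): 726-729.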

Authors:	LIU Zhe, XIA Xiu-feng, SONG Xiao-yan, LIN Tong

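Affiliations:	1. School of Computer, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110136; Computing Center, Shenyang Normal University, Shenyang, Liaoning 110034. 2. School of Computer, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110136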

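Abstract:	Identifying and eliminating approximately duplicated records in a data warehouse is a central problem in data cleansing, and address information plays a very important role in recognizing records that refer to the same entity. To handle Chinese address information, a meta-database containing word-segmentation rules is built and a detection model for approximate duplicates is proposed. On this basis, a word-segmentation algorithm based on feature characters and an algorithm that computes record similarity with a variable-weight strategy are described. Experimental results show that the method effectively detects duplicated Chinese address information and improves both the execution efficiency and the detection precision of the algorithm.
Keywords:	approximately duplicated records; Chinese address; feature character; word segmentation; variable weight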
Article ID:	1000-1220(2008)04-0726-04
Revised:	November 24, 2006
Detection Method of Approximately Duplicated Chinese Address Information

LIU Zhe, XIA Xiu-feng, SONG Xiao-yan, LIN Tong. Detection Method of Approximately Duplicated Chinese Address Information[J]. Mini-micro Systems, 2008, 29(4): 726-729.

Authors:	LIU Zhe, XIA Xiu-feng, SONG Xiao-yan, LIN Tong

Affiliation:	LIU Zhe (1, 2), XIA Xiu-feng (1), SONG Xiao-yan (1), LIN Tong (1). (1) School of Computer, Shenyang Institute of Aeronautical Engineering, Shenyang 110136, China; (2) Computing Center, Shenyang Normal University, Shenyang 110034, China

Abstract:	Eliminating approximately duplicated records is a hot issue in data cleansing for data warehouses, and address information plays an important role in identifying records of the same entity. To process Chinese address information, a meta-database of segmentation rules is established and a detection model for approximate duplicates is proposed. A feature-word-based segmentation algorithm and a similarity computation algorithm are presented. The experiment results indicate that this method can detect a...

Keywords:	approximately duplicated records; Chinese address information; tagged word; segment; variable weight
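
The abstract names two steps, segmentation by feature characters and a record similarity computed with variable weights, without giving their details. The Python sketch below is only one plausible reading and not the authors' algorithm: the feature-character table `FEATURE_CHARS`, the base weights `BASE_WEIGHTS`, the rule that weights are renormalized over the components both addresses contain, and the use of `difflib.SequenceMatcher` for per-component similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical feature characters that close an address component.
# The paper keeps its real rules in a meta-database; this table is an assumption.
FEATURE_CHARS = {
    "省": "province", "市": "city", "区": "district", "县": "district",
    "路": "street", "街": "street", "号": "number",
}

# Hypothetical base weight of each component (not taken from the paper).
BASE_WEIGHTS = {"province": 1.0, "city": 2.0, "district": 2.0,
                "street": 3.0, "number": 2.0}


def segment(address):
    """Cut the address after each feature character; return {level: token}."""
    parts = {}
    token = ""
    for ch in address:
        token += ch
        level = FEATURE_CHARS.get(ch)
        # Ignore a feature character that opens a token (e.g. 市 in 市府大路)
        # and keep only the first token found for each level.
        if level and len(token) > 1 and level not in parts:
            parts[level] = token
            token = ""
    if token:
        parts["rest"] = token  # trailing text with no closing feature character
    return parts


def similarity(addr_a, addr_b):
    """Weighted similarity in [0, 1] over the components both addresses have.

    The weights are "variable" in the sense assumed here: a component missing
    from either address is dropped and the remaining weights are renormalized.
    """
    parts_a, parts_b = segment(addr_a), segment(addr_b)
    shared = [lvl for lvl in BASE_WEIGHTS if lvl in parts_a and lvl in parts_b]
    if not shared:
        return 0.0
    total = sum(BASE_WEIGHTS[lvl] for lvl in shared)
    score = 0.0
    for lvl in shared:
        ratio = SequenceMatcher(None, parts_a[lvl], parts_b[lvl]).ratio()
        score += BASE_WEIGHTS[lvl] / total * ratio
    return score


if __name__ == "__main__":
    full = "辽宁省沈阳市皇姑区黄河北大街253号"
    short = "沈阳市皇姑区黄河北大街253号"  # same address, province omitted
    print(segment(full))
    print(similarity(full, short))
```

Under these assumptions, an address that merely omits the province is not penalized, because the province weight is redistributed over the remaining components. The threshold above which two records count as duplicates is left to the caller.
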
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
