An effective weighted rule-based method for entity resolution
Authors: Hiba Abu Ahmad, Hongzhi Wang
Affiliation: 1. Department of Computer Science, Harbin Institute of Technology, Harbin, China
Abstract: Entity resolution is an important data-cleaning task that detects records belonging to the same entity. It is critical for digital libraries, where different entities often share the same name and have no identifying key. Conventional methods use similarity measures and clustering techniques to find the records of a specific entity. Because their performance is limited, recent methods instead build rules on record attributes whose values are distinct to an entity, which overcomes some of these drawbacks. However, they use too few attributes and ignore common and empty attribute values, which degrades the quality of entity resolution. In this paper, we define a multi-attributes weighted rule system (MAWR) that examines all values of the records' attributes in order to represent the difficult record-entity mapping. We then propose a rule generation algorithm based on this system, and an entity resolution algorithm (MAWR-ER) that uses the generated rules to identify entities. We evaluate our method on real data, and the experimental results demonstrate its effectiveness and efficiency.
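The abstract does not give the rule format or the weighting scheme, so the following is only a minimal sketch of the general idea of weighted attribute-value rules, not the MAWR or MAWR-ER algorithms themselves. Everything in it is an assumption made for illustration: the function names `generate_rules` and `resolve`, the weight definition (the share of records carrying a value that belong to a given entity), the treatment of empty values as neutral, and the `threshold` parameter. Under this weighting, a value shared by several entities still contributes, but less than a value distinct to one entity.

```python
from collections import defaultdict


def generate_rules(labeled_records):
    """Build weighted rules per entity from (record, entity) pairs.

    Assumed weighting (not from the paper): the weight of the rule
    (attribute, value) -> entity is the fraction of training records
    with that value that belong to the entity.
    """
    # (attribute, value) -> entity -> number of records
    counts = defaultdict(lambda: defaultdict(int))
    for record, entity in labeled_records:
        for attr, value in record.items():
            if value:  # assumption: empty values generate no rule
                counts[(attr, value)][entity] += 1

    rules = defaultdict(dict)  # entity -> {(attribute, value): weight}
    for key, per_entity in counts.items():
        total = sum(per_entity.values())
        for entity, n in per_entity.items():
            rules[entity][key] = n / total
    return dict(rules)


def resolve(record, rules, threshold=0.5):
    """Return the entity whose matching rules have the largest total
    weight, or None if no entity reaches the (hypothetical) threshold."""
    best_entity, best_score = None, 0.0
    for entity, entity_rules in rules.items():
        score = sum(
            weight
            for (attr, value), weight in entity_rules.items()
            if record.get(attr) == value
        )
        if score > best_score:
            best_entity, best_score = entity, score
    return best_entity if best_score >= threshold else None


if __name__ == "__main__":
    # Toy digital-library data: two authors who share one name.
    training = [
        ({"name": "Wei Wang", "coauthor": "A", "venue": "VLDB"}, "e1"),
        ({"name": "Wei Wang", "coauthor": "B", "venue": "ICML"}, "e2"),
        ({"name": "Wei Wang", "coauthor": "A", "venue": ""}, "e1"),
    ]
    rules = generate_rules(training)
    print(resolve({"name": "Wei Wang", "coauthor": "B", "venue": ""}, rules))
```

In this toy data the shared name alone cannot separate the two authors, so the decision rests on the coauthor rule, while the empty venue neither helps nor hurts.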
This article is indexed in SpringerLink and other databases.