
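Crowdsourcing Entity Resolution with Privacy Protection (支持隐私保护的众包实体解析)
Cite this article:YAN Cairong, ZHANG Yangshun, XU Guangwei. Crowdsourcing Entity Resolution with Privacy Protection[J]. 计算机科学与探索, 2014, 0(7): 802-811
Authors:YAN Cairong (燕彩蓉)  ZHANG Yangshun (张洋舜)  XU Guangwei (徐光伟)
Affiliations:[1] School of Computer Science and Technology, Donghua University, Shanghai 201620 [2] Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092
Funding:The Fundamental Research Funds for the Central Universities of China under Grant No. 14D111210; the Open Fund of the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University.
Abstract:Entity resolution refers to finding and aggregating records that describe the same real-world object. Purely machine-based algorithms can be highly efficient, but their accuracy is hard to guarantee. This paper proposes an entity resolution method that combines machine computation with crowdsourcing. The method first uses the MapReduce parallel computing framework to exclude record pairs that cannot match, which reduces the number of human intelligence tasks, and then has people label the remaining pairs definitively. To support privacy protection, a role-based access control model and a strategy for hiding important information are proposed for the crowdsourcing stage. The method and the model were applied to a platform for building a hospital's master patient index. Experimental results show that the hybrid human-machine method makes full use of the strengths of both machine and human processing, resolves patient entities with high efficiency and high accuracy, and effectively prevents the leakage of patient information.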

Keywords:entity resolution  crowdsourcing  MapReduce programming model  privacy protection  master patient index

Crowdsourcing Entity Resolution with Privacy Protection
YAN Cairong, ZHANG Yangshun, XU Guangwei. Crowdsourcing Entity Resolution with Privacy Protection[J]. Journal of Frontiers of Computer Science and Technology, 2014, 0(7): 802-811
Authors:YAN Cairong  ZHANG Yangshun  XU Guangwei
Affiliation:1. School of Computer Science and Technology, Donghua University, Shanghai 201620, China; 2. Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China
Abstract:Entity resolution finds and clusters records that refer to the same real-world object. Machine algorithms alone are efficient, but their accuracy is hard to guarantee. This paper proposes a hybrid approach that combines machine processing with crowdsourcing for entity resolution. First, a MapReduce-based parallel computing framework excludes record pairs that cannot match, which reduces the number of human intelligence tasks; the remaining ambiguous record pairs are then labeled by human workers. For privacy protection during crowdsourcing sessions, a role-based access control model and related information hiding strategies are adopted. The approach and the model are applied to a master patient index building platform for a hospital. The experimental results show that the hybrid approach makes full use of the advantages of both machine-based and human-based processing, achieves high efficiency and accuracy in patient entity resolution, and avoids the leakage of patient information.
Keywords:entity resolution  crowdsourcing  MapReduce programming model  privacy protection  master patient index
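
The two-stage idea in the abstract (a machine pass that discards pairs that cannot match, followed by crowd labeling over privacy-filtered views) can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use MapReduce; the field names, the role names in `ROLE_VISIBLE`, the `min_agree` threshold, and the sample records are all hypothetical, chosen only to make the flow concrete.

```python
from itertools import combinations

# Hypothetical role-to-field grants; the abstract does not list the actual roles.
ROLE_VISIBLE = {
    "crowd_worker": {"name", "dob"},
    "auditor": {"name", "dob", "phone"},
}
# Hypothetical fields compared during the machine pass.
MATCH_FIELDS = ("name", "dob", "phone")


def candidate_pairs(records, min_agree=2):
    """Machine pass: keep index pairs that agree on at least `min_agree` fields."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        agree = sum(a[f] == b[f] for f in MATCH_FIELDS)
        if agree >= min_agree:
            pairs.append((i, j))
    return pairs


def view_for(record, role):
    """Copy of `record` in which fields not granted to `role` are replaced by asterisks."""
    visible = ROLE_VISIBLE[role]
    return {k: (v if k in visible else "*" * len(v)) for k, v in record.items()}


def crowd_tasks(records, role="crowd_worker", min_agree=2):
    """Build human labeling tasks: one pair of role-filtered views per candidate pair."""
    return [
        (view_for(records[i], role), view_for(records[j], role))
        for i, j in candidate_pairs(records, min_agree)
    ]


# Made-up sample records for illustration only.
SAMPLE = [
    {"name": "Zhang Wei", "dob": "1980-01-02", "phone": "13800000001"},
    {"name": "Zhang Wei", "dob": "1980-01-02", "phone": "13900000002"},
    {"name": "Li Na", "dob": "1975-06-30", "phone": "13700000003"},
]
TASKS = crowd_tasks(SAMPLE)
```

With these sample records, only the first two form a candidate pair, so a single task is sent to the crowd, and the worker sees the name and date of birth but not the phone number. In a real deployment the pairwise comparison would be partitioned across workers (for example by a blocking key) rather than run as one nested loop.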
This article is indexed in VIP (维普) and other databases.