An Incremental Interactive Data Integration Approach for the Intelligent Livelihood Domain
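Cite this article: Xia Ding, Wang Yasha, Zhao Zipeng, Cui Da. An incremental interactive data integration approach for the intelligent livelihood domain[J]. Journal of Computer Research and Development, 2017, 54(3): 586-596. DOI: 10.7544/issn1000-1239.2017.20151048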
Authors: Xia Ding, Wang Yasha, Zhao Zipeng, Cui Da
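Affiliations: 1 (Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871); 2 (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871); 3 (National Engineering & Research Center of Software Engineering (Peking University), Beijing 100871) (fabkxd@foxmail.com)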
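Funding: National High Technology Research and Development Program of China ("863" Program) (2013AA01A605); Special Fund for Public Welfare Research in the Quality Inspection Industry (201510209)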
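Abstract: Intelligent livelihood, a key domain of the smart city, comprises numerous application systems that have accumulated large amounts of hierarchically structured data. To build a complete city-wide data set, the heterogeneous data schemas must be integrated and unified so that users are offered a single, unified data view. The intelligent livelihood domain has several distinctive characteristics: its domain knowledge is broad, semantic matching of Chinese labels is poorly supported, schemas are numerous, and element labels are often missing while instance data are plentiful. To address these characteristics, we propose an incremental and interactive schema integration approach. The approach completes the multi-schema integration task step by step through incremental iterations, which substantially reduces the computation required for integration. In the schema matching phase, it combines schema information and instance data to build several complementary matchers suited to Chinese, and it uses similarity entropy to measure the machine's confidence in each decision, introducing manual intervention only where appropriate. In the mediated schema generation phase, it resolves the various conflicts that may arise between schemas and finally outputs a globally unified mediated schema. Experiments were designed and conducted on multi-source second-hand housing data crawled from the Internet. The results show that the approach achieves good schema matching accuracy while keeping the degree of manual intervention sufficiently small.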

Keywords: schema matching; schema integration; data integration; smart city; intelligent livelihood
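The abstract names the ingredients of the matching phase (Chinese-friendly label matching, instance-data matching, and a similarity-entropy confidence measure) but not their formulas. The Python sketch below illustrates one plausible reading under explicit assumptions that are not taken from the paper: the label matcher uses character-bigram Jaccard similarity (which needs no Chinese word segmentation), the instance matcher uses Jaccard overlap of value sets, the two are mixed with fixed weights, and "similarity entropy" is taken to be the normalized Shannon entropy of one source element's scores over all candidate targets. The names `combined_similarity`, `similarity_entropy`, `decide`, and the constants `LABEL_WEIGHT`, `INSTANCE_WEIGHT`, `ENTROPY_THRESHOLD` are hypothetical.

```python
import math

# Hypothetical parameters, chosen for illustration only; the abstract does not
# state the paper's matcher weights or its entropy threshold.
LABEL_WEIGHT = 0.4
INSTANCE_WEIGHT = 0.6
ENTROPY_THRESHOLD = 0.8


def char_bigrams(label: str) -> set[str]:
    """Character bigrams of a label; needs no Chinese word segmentation."""
    if len(label) < 2:
        return {label} if label else set()
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard(a: set, b: set) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(src_label: str, src_values: list[str],
                        tgt_label: str, tgt_values: list[str]) -> float:
    """Weighted mix of a label matcher and an instance-data matcher.

    When either label is missing (empty), only instance data are used.
    """
    instance_sim = jaccard(set(src_values), set(tgt_values))
    if not src_label or not tgt_label:
        return instance_sim
    label_sim = jaccard(char_bigrams(src_label), char_bigrams(tgt_label))
    return LABEL_WEIGHT * label_sim + INSTANCE_WEIGHT * instance_sim


def similarity_entropy(scores: list[float]) -> float:
    """Shannon entropy of the normalized scores, scaled to [0, 1].

    Near 0: one candidate dominates, so the machine can decide.
    Near 1: similarity is spread evenly, so the decision is ambiguous.
    """
    total = sum(scores)
    if total == 0:
        return 1.0  # no evidence at all: treat as maximally uncertain
    if len(scores) == 1:
        return 0.0  # a single candidate with a nonzero score is unambiguous
    probs = [s / total for s in scores if s > 0]
    h = sum(-p * math.log(p) for p in probs)
    return h / math.log(len(scores))


def decide(src: tuple[str, list[str]],
           candidates: dict[str, tuple[str, list[str]]]) -> tuple[str, bool]:
    """Pick the best target for one source element and flag it for human review.

    src is (label, instance values); candidates maps a target element name to
    its (label, instance values). Assumes at least one candidate.
    """
    names = list(candidates)
    scores = [combined_similarity(src[0], src[1], *candidates[n]) for n in names]
    best = max(range(len(names)), key=lambda i: scores[i])
    needs_human = similarity_entropy(scores) > ENTROPY_THRESHOLD
    return names[best], needs_human


if __name__ == "__main__":
    target = {
        "总价": ("总价", ["350万", "420万", "298万"]),
        "面积": ("建筑面积", ["89平米", "120平米", "65平米"]),
    }
    # A source element with a missing label whose values overlap "总价".
    print(decide(("", ["350万", "298万", "510万"]), target))
```

In the usage example the unlabeled source column is mapped to 总价 without human review, because all of its similarity mass falls on that one candidate; two equally good candidates would instead drive the entropy to 1 and route the decision to a person. A real system would likely tune the threshold on labeled matches rather than fix it by hand.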

Incremental and Interactive Data Integration Approach for Hierarchical Data in Domain of Intelligent Livelihood
Xia Ding, Wang Yasha, Zhao Zipeng, Cui Da. Incremental and Interactive Data Integration Approach for Hierarchical Data in Domain of Intelligent Livelihood[J]. Journal of Computer Research and Development, 2017, 54(3): 586-596. DOI: 10.7544/issn1000-1239.2017.20151048
Authors: Xia Ding, Wang Yasha, Zhao Zipeng, Cui Da
Affiliations: 1 (Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871); 2 (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871); 3 (National Engineering & Research Center of Software Engineering (Peking University), Beijing 100871)
Abstract: Intelligent livelihood is an important domain of the smart city. In this domain, many application systems have accumulated a large amount of multi-source hierarchical data. To form an overall, unified view of the multi-source data across the whole city, the various data schemas must be integrated. Because of the distinct characteristics of data in the intelligent livelihood domain, such as the lack of support for semantic matching of Chinese labels, the large number of schemas, and missing element labels, existing schema integration approaches are not suitable. We therefore propose an incremental and iterative approach that reduces the heavy computational workload caused by the large number of schemas. In each iteration, both schema meta-information and instance data are used to produce more precise results, and a criterion based on similarity entropy is introduced to control human intervention. Experiments are conducted on real second-hand housing data for Beijing, collected from multiple second-hand housing Web applications. The results show that our approach achieves high matching accuracy with only a small amount of human intervention.
Keywords: schema matching; schema integration; data integration; smart city; intelligent livelihood
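The incremental strategy can be illustrated with a short Python sketch: rather than matching every pair of schemas, each new schema is matched only against the mediated schema built so far, so n schemas need n - 1 matching rounds instead of n(n - 1)/2 pairwise comparisons. Everything specific here is an assumption, not the authors' method: each hierarchical schema is flattened to a list of leaf-element labels, `difflib.SequenceMatcher` stands in for the paper's matchers, a fixed hypothetical `MATCH_THRESHOLD` replaces the entropy-based human check, and conflict handling is reduced to a one-to-one constraint within each schema plus renaming of clashing labels. `integrate_incrementally` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical threshold; the abstract does not give the paper's criterion.
MATCH_THRESHOLD = 0.6


def label_similarity(a: str, b: str) -> float:
    """Stand-in matcher: difflib's ratio in [0, 1] over the raw label strings."""
    return SequenceMatcher(None, a, b).ratio()


def integrate_incrementally(
    schemas: list[list[str]],
    sim: Callable[[str, str], float] = label_similarity,
    threshold: float = MATCH_THRESHOLD,
) -> tuple[list[str], list[dict[str, str]]]:
    """Fold schemas one at a time into a growing mediated schema.

    Returns the mediated element list and, for each input schema, a mapping
    from its elements to mediated elements.
    """
    mediated: list[str] = []
    mappings: list[dict[str, str]] = []
    for schema in schemas:
        candidates = list(mediated)  # match only against earlier rounds' elements
        used: set[str] = set()       # one-to-one: claim each target at most once
        mapping: dict[str, str] = {}
        for element in schema:
            scored = [(sim(element, m), m) for m in candidates if m not in used]
            best_score, best = max(scored, default=(0.0, None))
            if best is not None and best_score >= threshold:
                mapping[element] = best  # matched an existing mediated element
                used.add(best)
            else:
                name = element
                while name in mediated:  # resolve a naming conflict by renaming
                    name += "'"
                mediated.append(name)    # no good match: extend the mediated schema
                mapping[element] = name
        mappings.append(mapping)
    return mediated, mappings


if __name__ == "__main__":
    listings = [
        ["小区", "总价", "面积"],
        ["小区名称", "总价", "建筑面积"],
        ["总价", "朝向"],
    ]
    mediated, mappings = integrate_incrementally(listings)
    print(mediated)
    for m in mappings:
        print(m)
```

With these three toy housing schemas, the sketch yields the mediated schema 小区, 总价, 面积, 朝向: 小区名称 is mapped to 小区 and 建筑面积 to 面积 (both score about 0.67), while 朝向 has no counterpart and is added as a new element. Snapshotting `candidates` at the start of each round keeps elements of the same schema from being merged with one another.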