Web Information Integration Based on Synonymous Entities Recognition
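Cite this article: XU Zhe-Hao, WU Gong-Qing, HU Xue-Gang. Web Information Integration Based on Synonymous Entities Recognition[J]. Computer Systems & Applications, 2015, 24(9): 35-42.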
Authors: XU Zhe-Hao (徐喆昊), WU Gong-Qing (吴共庆), HU Xue-Gang (胡学钢)
Affiliation (all three authors): Department of Computer Science, Hefei University of Technology, Hefei 230009, China
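Funding: National High Technology Research and Development Program of China (863 Program) (2012AA011005); National Natural Science Foundation of China (61273297)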
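Abstract: Accurately and effectively integrating massive amounts of Web information is an important foundation for analytic applications such as dynamic Web information aggregation, market intelligence analysis, public opinion analysis, and business intelligence. During data integration, different names often refer to the same real-world entity. To address this problem, this paper designs and implements FSE, a search-engine-based synonymous entity recognition algorithm that uses the page snippets returned by a search engine. It also proposes a Web information integration framework built on synonymous entity recognition. Experimental results on a hospital information integration test dataset show that FSE outperforms synonymous entity recognition algorithms based on the VarientDice, VarientCosine, VarientJaccard, and VarientOverlap similarity measures.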

Keywords: Web information integration; synonymous entity recognition; similarity computation; search engine
Received: 2015-01-20
Revised: 2015-03-04
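The abstract compares FSE against four snippet-based similarity baselines (VarientDice, VarientCosine, VarientJaccard, VarientOverlap) but does not define them. The sketch below is a rough illustration of the general idea only. It applies the standard textbook Dice, Jaccard, overlap, and cosine coefficients to the words drawn from the search-engine snippets returned for two entity names.

It is neither the FSE algorithm nor the paper's Varient* measures. The following are all hypothetical and chosen only for illustration:

* the regex tokenizer,
* the function names `snippet_similarities` and `is_synonymous`,
* the 0.5 threshold,
* the sample snippets.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-cased \w+ tokens. Assumption: space-delimited input; Chinese
    # snippets would need a real word segmenter instead of this regex.
    return re.findall(r"\w+", text.lower())


def dice(a: set[str], b: set[str]) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a: set[str], b: set[str]) -> float:
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def cosine(a: Counter, b: Counter) -> float:
    # Cosine over term-frequency vectors (token counts), not over sets.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def snippet_similarities(snippets_a: list[str], snippets_b: list[str]) -> dict[str, float]:
    """Compare two entity names via the snippets returned for each query (hypothetical helper)."""
    counts_a = Counter(t for s in snippets_a for t in tokenize(s))
    counts_b = Counter(t for s in snippets_b for t in tokenize(s))
    set_a, set_b = set(counts_a), set(counts_b)
    return {
        "dice": dice(set_a, set_b),
        "jaccard": jaccard(set_a, set_b),
        "overlap": overlap(set_a, set_b),
        "cosine": cosine(counts_a, counts_b),
    }


def is_synonymous(snippets_a: list[str], snippets_b: list[str],
                  measure: str = "jaccard", threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: measure and threshold are illustrative
    # defaults, not values taken from the paper.
    return snippet_similarities(snippets_a, snippets_b)[measure] >= threshold


# Toy snippets (invented for illustration) for two names of one hospital.
snips_a = ["city general hospital cardiology department",
           "general hospital emergency services"]
snips_b = ["city general hospital emergency department"]
sims = snippet_similarities(snips_a, snips_b)
# 5 shared distinct tokens out of 7 in total: jaccard = 5/7, overlap = 1.0
print(sims, is_synonymous(snips_a, snips_b))
```

The set-based coefficients ignore how often a word occurs, while cosine uses token counts. This difference is one reason such measures can rank the same pair differently.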
