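Web information extraction in health field
Cite this article:LI Rujun, ZHANG Jun, ZHANG Xiaomin, GUI Xiaoqing. Web information extraction in health field[J]. Journal of Computer Applications, 2016, 36(1):163-170.
Authors:LI Rujun  ZHANG Jun  ZHANG Xiaomin  GUI Xiaoqing
Affiliation:College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026
Funding:National Natural Science Foundation of China (61073057).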
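Abstract:To address the application of Web Information Extraction (WIE) technology in the health field, a WebHarvest-based method for extracting health information from the Web is proposed. Extraction rules for health entities are designed by analyzing the structure of different health websites, and a WebHarvest-based algorithm is implemented that automatically extracts health entities and their attributes. The extracted entities and attributes are stored in a relational database after a consistency check. Attribute values in the database that implicitly contain health entities are then processed with the Ansj natural language processing tool to recognize those entities, from which the relationships between health entities are extracted. In the health entity extraction experiments the average F-measure reached 99.9%; in the entity relationship extraction experiments it reached 80.51%. The experimental results indicate that the health information extracted by the proposed technique is of high quality and credibility.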

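Keywords:information extraction  health information extraction  consistency check  entity recognition  entity relationship extraction
Received:2015-07-01
Revised:2015-08-12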
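
The steps named in the abstract (consistency check, storage in a relational database, recognizing entities inside attribute values) can be illustrated with a minimal sketch. It is not the authors' implementation: the records, the `is_consistent` rule, the table layout and the `has_symptom` relation label are all hypothetical, and plain substring matching against known entity names stands in for the Ansj-based recognition described in the paper.

```python
import sqlite3

# Hypothetical records standing in for the output of the extraction rules.
RECORDS = [
    {"type": "disease", "name": "influenza", "symptoms": "fever, cough and headache"},
    {"type": "symptom", "name": "fever", "symptoms": ""},
    {"type": "symptom", "name": "cough", "symptoms": ""},
    {"type": "disease", "name": "", "symptoms": "cough"},  # fails the check below
]


def is_consistent(record):
    """Assumed consistency rule: entity type and name must both be non-empty."""
    return bool(record["type"].strip()) and bool(record["name"].strip())


def store(conn, records):
    """Create the entity table and insert only the consistent records."""
    conn.execute(
        "CREATE TABLE entity (type TEXT, name TEXT PRIMARY KEY, symptoms TEXT)"
    )
    rows = [(r["type"], r["name"], r["symptoms"]) for r in records if is_consistent(r)]
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", rows)
    return len(rows)


def extract_relations(conn):
    """Link an entity to every other known entity named in its attribute text.

    Substring matching is a simplification; the paper uses Ansj for this step.
    """
    names = sorted(row[0] for row in conn.execute("SELECT name FROM entity"))
    relations = []
    for name, text in conn.execute("SELECT name, symptoms FROM entity ORDER BY name"):
        for other in names:
            if other != name and other in text:
                relations.append((name, "has_symptom", other))
    return relations


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(store(connection, RECORDS), "entities stored")
    for relation in extract_relations(connection):
        print(relation)
```

Matching against names already stored in the database keeps the sketch self-contained; for Chinese text a word segmenter would be needed before matching, which is the role the abstract assigns to Ansj.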

Web information extraction in health field
LI Rujun,ZHANG Jun,ZHANG Xiaomin,GUI Xiaoqing.Web information extraction in health field[J].Journal of Computer Applications,2016,36(1):163-170.
Authors:LI Rujun  ZHANG Jun  ZHANG Xiaomin  GUI Xiaoqing
Affiliation:College of Information Science and Technology, Dalian Maritime University, Dalian Liaoning 116026, China
Abstract:To address the problem of applying Web Information Extraction (WIE) technology to the health field, a Web information extraction method based on WebHarvest was proposed. Extraction rules for health entities were designed by analyzing the structure of different health Web sites, and an algorithm based on WebHarvest was implemented to automatically extract health entities and their attributes. The extracted entities and attributes were stored in a relational database after a consistency check. Finally, attribute values that implicitly contain health entities were analyzed with the Ansj natural language processing tool to recognize those entities and thereby extract the relationships among entities. In the health entity extraction experiments, the average F-measure reached 99.9%; in the entity relationship extraction experiments, the average F-measure reached 80.51%. The experimental results show that the health information extracted by the proposed Web information extraction technology is of high quality and credibility.
Keywords:information extraction  health information extraction  consistency check  entity recognition  entity relationship extraction
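
The abstract reports average F-measures without stating the formula. Assuming the usual balanced F-measure, F = 2PR / (P + R) with precision P and recall R, a per-site score and its average could be computed as in the sketch below. The function names and the counts in the usage lines are illustrative assumptions, not figures from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1) from raw counts; returns 0.0 when undefined."""
    extracted = true_positives + false_positives  # items the extractor returned
    relevant = true_positives + false_negatives   # items it should have returned
    if true_positives == 0 or extracted == 0 or relevant == 0:
        return 0.0
    precision = true_positives / extracted
    recall = true_positives / relevant
    return 2 * precision * recall / (precision + recall)


def average_f_measure(counts_per_site):
    """Unweighted mean of per-site F-measures (assumed averaging scheme)."""
    scores = [f_measure(*counts) for counts in counts_per_site]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (TP, FP, FN) counts for two sites.
    print(average_f_measure([(8, 2, 2), (9, 1, 1)]))
```

An unweighted mean treats every site equally regardless of how many entities it contributes; whether the paper averages this way is not stated in the abstract.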