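Chinese Time Expression Recognition Based on Conditional Random Fields and a Time Thesaurus
Citation: WU Qiong, HUANG Degen. Chinese time expression recognition based on conditional random fields and a time thesaurus[J]. Journal of Chinese Information Processing, 2014, 28(6): 169-174
Authors: WU Qiong, HUANG Degen
Affiliation: School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
Funding: National Natural Science Foundation of China (61173100, 61173101, 61272375)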
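Abstract: This paper proposes a time expression recognition method that combines statistics with rules. First, based on an analysis of the word form, part of speech, and context of time expressions in Chinese text, conditional random fields (CRF) are used to recognize time units instead of whole time expressions, which avoids inaccurate boundary localization of Chinese time expressions. Next, candidate trigger words are automatically extracted from the training corpus and scored with an evaluation function, and the correct trigger words are selected to enrich the trigger-word thesaurus. Finally, rules based on the time trigger-word thesaurus and the time affix-word thesaurus are formulated to locate time expression boundaries. Experimental results show an F1 score of 98.31% in the open test.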

Keywords: CRF; rules; time trigger words; time affix words
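The CRF step tags time units rather than whole expressions. The sketch below shows what the input and output of such a tagger might look like, assuming a word-segmented, POS-tagged sentence and a hypothetical B-TU / I-TU / O tag scheme: `token_features` builds word-form, part-of-speech, and ±1 context features, and `bio_to_units` reads time units back out of a tag sequence. The feature names, `TIME_SUFFIX_CHARS`, the POS labels, and the tag scheme are illustrative assumptions, not the paper's feature template. Lists of per-token feature dicts like these are the input format accepted by CRF toolkits such as sklearn-crfsuite, but no training is shown; the tags below are written by hand in place of model output.

```python
from typing import Dict, List

# Hypothetical set of characters that often close a Chinese time unit
# (an illustrative assumption, not the paper's feature template).
TIME_SUFFIX_CHARS = set("年月日号时点分秒")


def token_features(tokens: List[str], pos: List[str], i: int) -> Dict[str, object]:
    """Word-form, part-of-speech and +/-1 context features for token i."""
    w = tokens[i]
    feats: Dict[str, object] = {
        "word": w,
        "pos": pos[i],
        "is_digit": w.isdigit(),                            # word form: all digits?
        "ends_with_time_char": w[-1] in TIME_SUFFIX_CHARS,  # word form: time suffix?
    }
    if i > 0:  # left context
        feats["-1:word"] = tokens[i - 1]
        feats["-1:pos"] = pos[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:  # right context
        feats["+1:word"] = tokens[i + 1]
        feats["+1:pos"] = pos[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], pos: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, i.e. one CRF input sequence."""
    return [token_features(tokens, pos, i) for i in range(len(tokens))]


def bio_to_units(tokens: List[str], tags: List[str]) -> List[str]:
    """Read time units back out of hypothetical B-TU / I-TU / O tags.
    An I-TU tag with no open unit before it is ignored."""
    units: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B-TU":
            if current:
                units.append("".join(current))
            current = [tok]
        elif tag == "I-TU" and current:
            current.append(tok)
        else:
            if current:
                units.append("".join(current))
            current = []
    if current:
        units.append("".join(current))
    return units


# Segmented sentence "他于2014年6月5日上午抵达大连" with illustrative POS tags.
# The tags are hand-written, standing in for CRF output.
tokens = ["他", "于", "2014", "年", "6", "月", "5", "日", "上午", "抵达", "大连"]
pos = ["r", "p", "m", "q", "m", "q", "m", "q", "t", "v", "ns"]
tags = ["O", "O", "B-TU", "I-TU", "B-TU", "I-TU", "B-TU", "I-TU", "B-TU", "O", "O"]
X = sentence_features(tokens, pos)
print(bio_to_units(tokens, tags))  # ['2014年', '6月', '5日', '上午']
```

Recognizing the four units separately, rather than one long span, is what the abstract credits with avoiding boundary errors; joining them into a full expression is left to the rule-based step.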

Temporal Information Extraction Based on CRF and Time Thesaurus
WU Qiong, HUANG Degen. Temporal Information Extraction Based on CRF and Time Thesaurus[J]. Journal of Chinese Information Processing, 2014, 28(6): 169-174
Authors: WU Qiong, HUANG Degen
Affiliation: School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract: This paper proposes a method for time expression recognition that combines rules with statistics. Based on an analysis of linguistic features of time expressions, such as word form, part of speech, and context information, Conditional Random Fields (CRF) are applied to recognize time units rather than whole time expressions, which avoids the boundary localization problems of Chinese time expressions. In addition, candidate trigger words are automatically obtained from the training corpus and scored with a designed score function, and the correctly identified trigger words are used to refine the trigger thesaurus. Finally, rules for locating time expression boundaries are formulated based on the time trigger thesaurus and the time affix word thesaurus. Experimental results show that the F1 value reaches 98.31% in an open test.
Keywords: CRF; rule; time trigger; time affix word
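The final boundary-localization step can be pictured with a minimal rule-based sketch that merges recognized time units into whole expressions and extends their boundaries with a trigger thesaurus and prefix/suffix affix thesauri. Everything named here is hypothetical: the function `locate_expressions`, the three rules, and the entries in `TRIGGER_WORDS`, `PREFIX_AFFIXES`, and `SUFFIX_AFFIXES` are illustrative assumptions, since the paper's actual rules and thesauri (including its automatically scored trigger words) are not listed on this page. `is_unit` is assumed to mark tokens that the CRF placed inside a time unit.

```python
from typing import List, Sequence, Tuple

# Hypothetical thesaurus entries for illustration only; not the paper's thesauri.
TRIGGER_WORDS = {"目前", "当时", "今天", "明年"}
PREFIX_AFFIXES = {"公元", "大约", "约"}
SUFFIX_AFFIXES = {"以来", "左右", "前后", "初", "底"}


def locate_expressions(tokens: Sequence[str], is_unit: Sequence[bool]) -> List[str]:
    """Merge CRF-recognized time units into time expressions using simple,
    assumed boundary rules (a sketch, not the paper's rule set)."""
    spans: List[Tuple[int, int]] = []
    n = len(tokens)
    i = 0
    last_end = 0  # exclusive end of the previous expression
    while i < n:
        # An expression starts at a time unit or a trigger word.
        if not (is_unit[i] or tokens[i] in TRIGGER_WORDS):
            i += 1
            continue
        start, end = i, i + 1
        # Rule 1: extend the left boundary over prefix affix words.
        while start > last_end and tokens[start - 1] in PREFIX_AFFIXES:
            start -= 1
        # Rule 2: adjacent time units and trigger words form one expression.
        while end < n and (is_unit[end] or tokens[end] in TRIGGER_WORDS):
            end += 1
        # Rule 3: extend the right boundary over suffix affix words, then stop.
        while end < n and tokens[end] in SUFFIX_AFFIXES:
            end += 1
        spans.append((start, end))
        last_end = i = end
    return ["".join(tokens[s:e]) for s, e in spans]


tokens = ["公元", "2014", "年", "6", "月", "以来", ",", "他", "目前", "在", "大连"]
is_unit = [False, True, True, True, True, False, False, False, False, False, False]
print(locate_expressions(tokens, is_unit))  # ['公元2014年6月以来', '目前']
```

In this sketch a suffix affix closes an expression, so a word like "以来" cannot glue two separate dates into one span; a trigger word alone (here "目前") is treated as a complete expression.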
This article is indexed in CNKI and other databases.