
Encyclopedia Text Segmentation Based on a Semi-CRF Model (基于半CRF模型的百科全书文本段落划分)

Cite this article:	XU Yong (许勇), SONG Rou (宋柔). 基于半CRF模型的百科全书文本段落划分 [Encyclopedia text segmentation based on a semi-CRF model][J]. 北京工业大学学报, 2008, 34(2): 204-210.

Authors:	XU Yong (许勇), SONG Rou (宋柔)

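Affiliations:	1. College of Computer Science, Beijing University of Technology (北京工业大学), Beijing 100022; 2. Department of Computer Science, Beijing Language and Culture University (北京语言大学), Beijing 100083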

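Funding:	National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)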

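Abstract:	This paper presents a method for segmenting encyclopedia text into topical segments based on semi-Markov conditional random fields (semi-CRFs). To overcome the segment-type duplication problem of plain HMM and CRF models, the method takes the adjusted posterior distribution over HMM states as its basic evidence. It adds segment-start features derived from a lexical-semantic ontology knowledge base, together with cue features for specific segment types, to better fit the characteristics of the target text. Experimental results show that the method can combine several different kinds of information, suits the segment structure of encyclopedia text, and outperforms plain HMM and CRF models.
Keywords:	natural language processing; machine learning; hidden Markov model; text segmentation; semi-Markov conditional random fields
Article ID:	0254-0037(2008)02-0204-07
Received:	2006-11-10
Revised:	2006-11-10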
A Semi-Markov CRF Model Approach to Encyclopedia Text Topic Segmentation

XU Yong, SONG Rou. A Semi-Markov CRF Model Approach to Encyclopedia Text Topic Segmentation[J]. Journal of Beijing Polytechnic University, 2008, 34(2): 204-210.

Authors:	XU Yong, SONG Rou

Abstract:	This paper introduces a method for topic segmentation of Chinese encyclopedia text based on semi-Markov conditional random fields (semi-CRFs). To overcome the topic duplication problem of fully connected HMM and CRF models, the authors use the HMM state posterior, adjusted for each text instance, as the basic segmentation cue. They also use several segment-level word-semantic features derived from a domain thesaurus, together with additional topic-specific cue phrases, to better adapt the method to the target domain. Experimental results show that the method suits the topic structure of Chinese encyclopedia text and achieves better performance than the HMM and CRF models.

Keywords:	natural language processing systems; machine learning; hidden Markov models; topic segmentation; semi-Markov CRF
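
The abstract describes scoring whole segments rather than single sentences, with HMM state posteriors as the basic evidence and segment-start features added on top. The following is a minimal sketch of semi-Markov Viterbi decoding under those assumptions, not the authors' implementation. The names `semi_markov_viterbi`, `log_post`, `start_bonus`, and `max_len` are hypothetical. The rule that adjacent segments must carry different labels is a simplification introduced here, and it does not by itself rule out a topic type recurring later in the text.

```python
import math


def semi_markov_viterbi(log_post, max_len, start_bonus=None):
    """Return the best segmentation as a list of (start, end, label), end exclusive.

    log_post[t][y]    -- log score of label y for sentence t (assumed here to be
                         a stand-in for an HMM state posterior).
    start_bonus[t][y] -- optional extra score for opening a label-y segment at
                         sentence t (a stand-in for segment-start features).
    max_len           -- maximum number of sentences in one segment.
    """
    n = len(log_post)
    if n == 0:
        return []
    num_labels = len(log_post[0])

    # prefix[y][t] = sum of log_post[0..t-1][y], so a segment score is O(1).
    prefix = [[0.0] * (n + 1) for _ in range(num_labels)]
    for y in range(num_labels):
        for t in range(n):
            prefix[y][t + 1] = prefix[y][t] + log_post[t][y]

    neg_inf = -math.inf
    # best[t][y]: best score over sentences [0, t) whose last segment has label y.
    best = [[neg_inf] * num_labels for _ in range(n + 1)]
    back = [[None] * num_labels for _ in range(n + 1)]

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for y in range(num_labels):
                seg = prefix[y][end] - prefix[y][start]
                if start_bonus is not None:
                    seg += start_bonus[start][y]
                if start == 0:
                    prev_score, prev_label = 0.0, None
                else:
                    # Best preceding segment whose label differs from y.
                    prev_score, prev_label = neg_inf, None
                    for yp in range(num_labels):
                        if yp != y and best[start][yp] > prev_score:
                            prev_score, prev_label = best[start][yp], yp
                    if prev_label is None:
                        continue  # no valid segmentation ends at `start`
                cand = prev_score + seg
                if cand > best[end][y]:
                    best[end][y] = cand
                    back[end][y] = (start, prev_label)

    y = max(range(num_labels), key=lambda k: best[n][k])
    if best[n][y] == neg_inf:
        raise ValueError("no valid segmentation under the given constraints")

    # Follow back-pointers from the end of the text to the beginning.
    segments = []
    end = n
    while end > 0:
        start, prev_label = back[end][y]
        segments.append((start, end, y))
        end, y = start, prev_label
    segments.reverse()
    return segments
```

Because each candidate segment is scored as a unit, features that only make sense at the segment level (such as a cue at the segment's first sentence) can be added to `seg` without changing the recurrence. The cost is an extra factor of `max_len` in running time compared with sentence-level Viterbi.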
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
