
Text information extraction algorithm based on multiple templates hidden Markov model

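Cite this article:	HU Yu-zhou, WANG Lei, GU Xue-dao. Text information extraction algorithm based on multiple templates hidden Markov model[J]. Journal of Computer Applications, 2008, 28(3): 699-702.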

Authors:	HU Yu-zhou, WANG Lei, GU Xue-dao

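Affiliations:	1. School of Management, Tianjin University, Tianjin 300072; 2. Postdoctoral Program of Computer Science and Technology, Tsinghua University, Beijing 100084, and Postdoctoral Research Workstation, Shenzhen Modern Computer Manufacture, Shenzhen, Guangdong 518057; 3. Postdoctoral Research Workstation, Shenzhen Modern Computer Manufacture, Shenzhen, Guangdong 518057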

Funding:	Natural Science Foundation of Hunan Province; Fujian Province Young Scientific and Technological Talents Innovation Fund

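Abstract:	Because training data come from diverse sources, it is difficult to learn optimal model parameters. This paper therefore proposes a text information extraction algorithm based on a multiple-template hidden Markov model (HMM). The algorithm first segments the text into blocks, using typesetting format and delimiter information. Then, on the basis of these blocks, it clusters the training data into templates of several forms (multiple templates) and trains the HMM initial probability and transition probability parameters on the multi-template data. Finally, it trains the emission probability parameters on all of the training data together, and combines the initial, transition, and emission probabilities to extract the text information. Experimental results show that the algorithm outperforms a simple HMM in both precision and recall.
Keywords:	text information extraction; hidden Markov model; multiple templates; text blocking
Article ID:	1001-9081(2008)03-0699-04
Received:	2007-09-27
Revised:	2007-09-27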
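
The three steps named in the abstract (per-template initial and transition probabilities, emission probabilities shared across all training data, and extraction by decoding) can be illustrated with a minimal sketch. The abstract does not say how blocks are turned into observations, how the clustering is done, or how a template is chosen at extraction time, so everything below is an assumption made for illustration: each block is reduced to one hypothetical symbol (`caps`, `name`, ...), training sequences are clustered by the order in which their labels first appear, counts use add-one smoothing, and the template whose Viterbi path scores highest wins. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def signature(seq):
    """Order in which states first appear; an assumed, crude clustering key."""
    seen = []
    for _, state in seq:
        if state not in seen:
            seen.append(state)
    return tuple(seen)

def normalize(counts, keys, alpha=1.0):
    """Add-alpha smoothed log-probabilities over `keys`."""
    total = sum(counts.get(k, 0) for k in keys) + alpha * len(keys)
    return {k: math.log((counts.get(k, 0) + alpha) / total) for k in keys}

def train(sequences):
    """sequences: list of [(observation_symbol, state_label), ...], one per document."""
    states = sorted({s for seq in sequences for _, s in seq})
    vocab = sorted({o for seq in sequences for o, _ in seq} | {"<unk>"})
    # Cluster the training sequences into templates (assumed clustering rule).
    clusters = defaultdict(list)
    for seq in sequences:
        clusters[signature(seq)].append(seq)
    # One (initial, transition) parameter pair per template.
    templates = []
    for seqs in clusters.values():
        init, trans = Counter(), defaultdict(Counter)
        for seq in seqs:
            init[seq[0][1]] += 1
            for (_, a), (_, b) in zip(seq, seq[1:]):
                trans[a][b] += 1
        pi = normalize(init, states)
        A = {a: normalize(trans[a], states) for a in states}
        templates.append((pi, A))
    # Emission probabilities are estimated once, on all training data.
    emit = defaultdict(Counter)
    for seq in sequences:
        for obs, s in seq:
            emit[s][obs] += 1
    B = {s: normalize(emit[s], vocab) for s in states}
    return states, templates, B

def viterbi(obs, states, pi, A, B):
    """Return (log-score, best state path) for a non-empty observation list."""
    def e(s, o):
        return B[s].get(o, B[s]["<unk>"])  # unseen symbols fall back to <unk>
    score = {s: pi[s] + e(s, obs[0]) for s in states}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + A[p][s])
            score[s] = prev[best] + A[best][s] + e(s, o)
            ptr[s] = best
        back.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return score[last], path[::-1]

def extract(obs, states, templates, B):
    """Decode under every template and keep the highest-scoring path (assumed rule)."""
    return max(viterbi(obs, states, pi, A, B) for pi, A in templates)[1]

# Hypothetical toy data: two layouts (title first, or author first).
toy = [
    [("caps", "title"), ("name", "author"), ("para", "abstract")],
    [("caps", "title"), ("name", "author"), ("email", "author"), ("para", "abstract")],
    [("name", "author"), ("caps", "title"), ("para", "abstract")],
]
model = train(toy)
print(extract(["caps", "name", "para"], *model))
print(extract(["name", "caps", "para"], *model))
```

Keeping the transition statistics separate per template means documents with different layouts do not blur each other's ordering constraints, while pooling the emission counts gives every template the full vocabulary evidence.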
Text information extraction algorithm based on multiple templates hidden Markov model

HU Yu-zhou, WANG Lei, GU Xue-dao. Text information extraction algorithm based on multiple templates hidden Markov model[J]. Journal of Computer Applications, 2008, 28(3): 699-702.

Authors:	HU Yu-zhou, WANG Lei, GU Xue-dao

Affiliation:	HU Yu-zhou 1, WANG Lei 2,3, GU Xue-dao 3 (1. School of Management, Tianjin University, Tianjin 300072, China; 2. Postdoctoral Program of Computer Science and Technology, Tsinghua University, Beijing 100084; 3. Postdoctoral Program, Shenzhen Modern Computer Manufacture, Shenzhen, Guangdong 518057, China)

Abstract:	Because training data sources are varied, it is difficult to obtain optimal model parameters through learning, so a text information extraction algorithm based on a Hidden Markov Model (HMM) with multiple templates was proposed. First, the algorithm segmented texts using typesetting formats and list separators. Then multiple templates were formed by clustering the training data based on the segmentations, and the parameters of initial probability and transition probability for HMM were obtain...

Keywords:	text information extraction; Hidden Markov Model (HMM); multiple templates; text block
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.