Text Segmentation Algorithm Oriented to Small General-text (面向概括性小文本的文本分割算法)
Cite as: CHEN Yuan, CHEN Rong, HU Jun-feng, LIN Lin, ZHANG Jing-bo, YU Zhong-hua. Text Segmentation Algorithm Oriented to Small General-text[J]. Computer Engineering (计算机工程), 2008, 34(22): 43-45.
Authors: CHEN Yuan (陈源), CHEN Rong (陈蓉), HU Jun-feng (胡俊锋), LIN Lin (林霖), ZHANG Jing-bo (张靖波), YU Zhong-hua (于中华)
Affiliation: School of Computer Science, Sichuan University, Chengdu 610064
Funding: National Natural Science Foundation of China (60473071); Specialized Research Fund for the Doctoral Program of Higher Education (20020610007); Youth Fund of the School of Computer Science, Sichuan University
Abstract: Text segmentation is an important research topic in natural language processing. Existing models, however, cannot effectively segment small general-texts. To address this shortcoming, this paper proposes a statistical algorithm based on the Hidden Markov Model (HMM). The algorithm uses the length of each structure block and the lexical information in a small text to segment a single-topic small general-text into the different aspects of its discussion. Two methods are designed for computing the emission probabilities of the HMM: one based on sentence groups and the other based on segmentation points. Experiments on Medline abstracts show that the algorithm is effective for segmenting small general-texts and performs markedly better than the classic TextTiling algorithm.

Keywords: text segmentation; small general-text; Hidden Markov Model (HMM); boundary recognition; similarity metric
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
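
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea (decoding section boundaries with an HMM over sentences), not the authors' implementation. The section labels, the toy word counts, the add-one-smoothed unigram emission model, and the `STAY` probability are all assumptions made for illustration; the paper's two emission-probability methods (sentence-group based and segmentation-point based) and its use of block lengths are not reproduced here.

```python
import math

# Hypothetical section labels and toy per-section word counts (not from the paper).
STATES = ["background", "method", "result"]
WORD_COUNTS = {
    "background": {"important": 3, "problem": 3, "existing": 2},
    "method": {"propose": 3, "algorithm": 3, "model": 2},
    "result": {"experiments": 3, "show": 3, "better": 2},
}
VOCAB = sorted({w for counts in WORD_COUNTS.values() for w in counts})
STAY = 0.5  # assumed probability of remaining in the same section


def log_emission(state, sentence):
    """Log-probability of a sentence under an add-one-smoothed unigram model."""
    counts = WORD_COUNTS[state]
    total = sum(counts.values()) + len(VOCAB) + 1  # +1 reserves mass for unknown words
    return sum(math.log((counts.get(w, 0) + 1) / total) for w in sentence.lower().split())


def log_transition(i, j):
    """Left-to-right transitions: stay in section i or advance to section i + 1."""
    if j == i:
        return math.log(STAY) if i < len(STATES) - 1 else 0.0
    if j == i + 1:
        return math.log(1 - STAY)
    return -math.inf


def viterbi(sentences):
    """Return the most likely section label for each sentence."""
    n = len(STATES)
    # The text is assumed to start in the first section.
    score = [log_emission(STATES[0], sentences[0])] + [-math.inf] * (n - 1)
    back = []
    for sent in sentences[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_transition(i, j))
            pointers.append(best_i)
            new_score.append(
                score[best_i] + log_transition(best_i, j) + log_emission(STATES[j], sent)
            )
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def boundaries(labels):
    """Indices of sentences that open a new section."""
    return [k for k in range(1, len(labels)) if labels[k] != labels[k - 1]]
```

Calling `viterbi` on a list of sentence strings yields one label per sentence, and `boundaries` turns that label sequence into segmentation points. In a real setting the word counts and transition probabilities would be estimated from annotated abstracts rather than written by hand.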