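Text Classification Based on Hidden Markov Models
Cite this article: LUO Shuang-hu (罗双虎), OUYANG Wei-min (欧阳为民). Text classification based on hidden Markov models[J]. Computer Engineering and Applications (计算机工程与应用), 2007, 43(30): 179-181.
Authors: LUO Shuang-hu (罗双虎)  OUYANG Wei-min (欧阳为民)
Affiliation: College of Computer Science and Engineering, Shanghai University, Shanghai 200072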
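Abstract: This paper introduces the hidden Markov model (HMM), a sequence-based model, into text classification. A document to be classified is described as a hidden Markov process that evolves through a series of states, where each state emits, with a specific probability, the feature terms that represent the document. Each text class is described by a sequence pattern; a document's feature sequence is matched against the HMM to obtain its corresponding state sequence and maximum output probability. Comparing the results across the text classes yields the classification. Finally, a comparison with the simple vector algorithm, KNN, and Naive Bayes classifiers shows that the algorithm can be applied successfully to text classification.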

Keywords: hidden Markov  text classification  sequence model
Article ID: 1002-8331(2007)30-0179-03
Revised: 2007-03

HMM based text categorization
LUO Shuang-hu, OUYANG Wei-min. HMM based text categorization[J]. Computer Engineering and Applications, 2007, 43(30): 179-181.
Authors:LUO Shuang-hu  OUYANG Wei-min
Affiliation: College of Computer Science and Engineering, Shanghai University, Shanghai 200072, China
Abstract: This paper presents a new method that uses hidden Markov models (HMMs) for supervised document classification. A document to be classified is represented as a hidden Markov process whose states emit symbols with certain probabilities; these symbols make up the document. Each document class is assumed to be composed of sequences of feature terms. Computing the output probability of the HMM on a document's feature sequence yields the maximum output probability and the corresponding state sequence. Comparing the results across all classes determines the category of the document. The model is evaluated on a real dataset against Naive Bayes, KNN, and simple vector models, and is shown to be a successful method for text categorization.
Keywords:Hidden Markov Models(HMM)  text categorization  sequence model
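
The decision rule described in the abstract (score a document's feature sequence under each class and pick the class with the largest output probability) can be sketched as below. This is a minimal illustration and not the authors' implementation: it assumes one HMM per class and Viterbi decoding, neither of which the abstract states explicitly, and every name and number in it (`viterbi_log_prob`, `classify`, `MODELS`, the `floor` smoothing value, the toy probabilities) is hypothetical.

```python
import math


def viterbi_log_prob(obs, start_p, trans_p, emit_p, floor=1e-12):
    """Return (best log-probability, best state path) of obs under one HMM.

    start_p[s]    : probability of starting in state s
    trans_p[r][s] : probability of moving from state r to state s
    emit_p[s]     : dict mapping a feature term to its emission probability
    floor         : assumed smoothing value for unseen terms (not from the paper)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start_p)

    def log(p):
        # Work in log space to avoid underflow on long documents.
        return math.log(max(p, floor))

    # delta[s]: best log-probability of any state path that ends in state s.
    delta = [log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0))
             for s in range(n_states)]
    backpointers = []
    for term in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: delta[r] + log(trans_p[r][s]))
            score = (delta[best_prev] + log(trans_p[best_prev][s])
                     + log(emit_p[s].get(term, 0.0)))
            new_delta.append(score)
            pointers.append(best_prev)
        delta = new_delta
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


def classify(obs, class_models):
    """Pick the class whose HMM gives obs the highest Viterbi log-probability."""
    scores = {label: viterbi_log_prob(obs, *params)[0]
              for label, params in class_models.items()}
    return max(scores, key=scores.get), scores


# Toy two-state models per class; all probabilities are made up for illustration.
MODELS = {
    "sports": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [{"match": 0.5, "team": 0.4, "stock": 0.1},
         {"goal": 0.6, "team": 0.3, "market": 0.1}],
    ),
    "finance": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.5, 0.5]],
        [{"stock": 0.6, "market": 0.3, "team": 0.1},
         {"market": 0.5, "bank": 0.4, "goal": 0.1}],
    ),
}

if __name__ == "__main__":
    label, scores = classify(["team", "match", "goal"], MODELS)
    print(label, scores)
```

In practice the start, transition, and emission probabilities would be estimated from labeled training documents of each class; the abstract does not say how the authors estimate them, so the sketch takes them as given.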
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.