
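Research on a Conditional Random Field Topic Model Based on LDA (基于LDA的条件随机场主题模型研究)
Cite this article: SHI Qingwei (史庆伟), GUO Pengliang (郭朋亮). Research on a conditional random field topic model based on LDA [J]. Computer Engineering and Applications (计算机工程与应用), 2015, 51(7): 131-135.
Authors: SHI Qingwei (史庆伟), GUO Pengliang (郭朋亮)
Affiliation: College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
Funding: National Key Technology R&D Program of the 12th Five-Year Plan (No. 2013BAG06B01).
Abstract: Modeling text with topic models to extract its latent topics, and then using those topics for tasks such as part-of-speech tagging and text classification, is an active research area in machine learning and text mining. This paper proposes an LDA-based topic model built on a "bag of paragraphs" assumption: a paragraph of a text has a single topic, and consecutive paragraphs tend to share the same topic. A conditional random field (CRF) model is used to segment the paragraphs of an article and to judge whether they share the same topic. Experiments show that, compared with the LDA model, the new model extracts topics better and has lower perplexity, and that it also performs well on part-of-speech tagging and text classification.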

Keywords: latent Dirichlet allocation (LDA); conditional random fields; topic

Conditional random fields topic model based on LDA model
SHI Qingwei, GUO Pengliang. Conditional random fields topic model based on LDA model [J]. Computer Engineering and Applications, 2015, 51(7): 131-135.
Authors:SHI Qingwei  GUO Pengliang
Affiliation:College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
Abstract: Using topic models to model text and extract latent topics for part-of-speech tagging and document classification is an active research area in machine learning and text mining. This paper proposes a new LDA-based model built on a “bag of paragraphs” assumption: a paragraph has a single topic, and successive paragraphs tend to share the same topic. For the paragraphs of an article, it uses a Conditional Random Field (CRF) model to segment them and judge whether they share the same topic. Experiments show that, compared with the LDA model, the improved model extracts topics better and has lower perplexity. The improved model also performs better in part-of-speech tagging and document classification.
Keywords:Latent Dirichlet Allocation(LDA)  conditional random fields  topic
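
The abstract compares the models by perplexity without giving the formula. As a rough illustration only, the sketch below computes the standard topic-model perplexity, exp(-(sum of log p(w|d)) / N), from a document-topic matrix and a topic-word matrix. The function name, argument names, and toy data are hypothetical and are not taken from the paper, whose exact evaluation setup is not described on this page.

```python
import math


def perplexity(doc_topic, topic_word, docs):
    """Perplexity of a fitted topic model on tokenized documents.

    doc_topic[d][k]  : assumed p(topic k | document d)
    topic_word[k][w] : assumed p(word w | topic k)
    docs[d]          : list of word ids in document d
    """
    log_likelihood = 0.0
    token_count = 0
    for d, words in enumerate(docs):
        for w in words:
            # Mixture likelihood of one token: p(w|d) = sum_k p(k|d) * p(w|k)
            p = sum(theta * phi[w] for theta, phi in zip(doc_topic[d], topic_word))
            log_likelihood += math.log(p)
            token_count += 1
    # Lower values mean the model assigns higher probability to the text.
    return math.exp(-log_likelihood / token_count)


# Toy check (hypothetical data): a model that is uniform over 4 words
# should have perplexity equal to the vocabulary size.
uniform = perplexity(
    doc_topic=[[0.5, 0.5]],
    topic_word=[[0.25] * 4, [0.25] * 4],
    docs=[[0, 1, 2, 3]],
)
```

A lower value indicates a better fit, which is the sense in which the abstract reports that the proposed model improves on LDA.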
This article is indexed in Wanfang Data (万方数据) and other databases.