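Semantic Knowledge Acquisition from Blog Tags Based on a Topic Model
Cite this article: He Tingting, Li Fang. Semantic Knowledge Acquisition from Blog Tags Based on a Topic Model[J]. 中国通信学报, 2012, 9(3): 38-48.
Authors: He Tingting, Li Fang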
Received: 2012-04-20

Semantic Knowledge Acquisition from Blogs with Tag-Topic Model
He Tingting, Li Fang. Semantic Knowledge Acquisition from Blogs with Tag-Topic Model[J]. China communications magazine, 2012, 9(3): 38-48.
Authors: He Tingting, Li Fang
Affiliation: 1. Department of Computer Science, Central China Normal University, Wuhan 430079, P. R. China
2. Engineering & Research Center for Information Technology on Education, Central China Normal University, Wuhan 430079, P. R. China
3. Network Media Branch, National Language Resources Monitoring and Research Center, Wuhan 430079, P. R. China
Abstract: This paper addresses semantic knowledge acquisition from blogs using a proposed tag-topic model. The model extends Latent Dirichlet Allocation (LDA) by adding a tag layer between the document layer and the topic layer. Each document is represented as a mixture of tags, each tag is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. After parameter estimation, the tags are used to describe the underlying topics, so the latent semantic knowledge within the topics can be represented explicitly. The tags are treated as concepts, and the top-N words from the top topics are selected as words related to those concepts. PMI-IR is then employed to compute the relatedness of each tag-word pair, and noisy words with low relatedness are removed to improve the quality of the semantic knowledge. Experimental results show that the proposed method can effectively capture semantic knowledge, especially polysemy and synonymy.
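The two steps described in the abstract, the document-tag-topic-word generative chain and the PMI-based removal of noisy words, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy probabilities, the hit counts, the page total, and the threshold are all hypothetical, and the PMI estimate uses the standard co-occurrence form log2(N * hits(tag, word) / (hits(tag) * hits(word))) rather than whatever search-engine queries the paper actually issued. Parameter estimation is not shown; the distributions are simply given.

```python
import math
import random


def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_words(doc_tag_probs, tag_topic_probs, topic_word_probs, vocab, n_words, rng):
    """Generate words through the chain document -> tag -> topic -> word."""
    words = []
    for _ in range(n_words):
        tag = sample_index(doc_tag_probs, rng)              # tag from the document's tag mixture
        topic = sample_index(tag_topic_probs[tag], rng)     # topic from that tag's distribution
        word = sample_index(topic_word_probs[topic], rng)   # word from that topic's distribution
        words.append(vocab[word])
    return words


def pmi(hits_tag, hits_word, hits_both, total_pages):
    """Pointwise mutual information estimated from (hypothetical) hit counts."""
    if hits_both == 0:
        return float("-inf")
    return math.log2(hits_both * total_pages / (hits_tag * hits_word))


def filter_related_words(tag, candidates, hits, pair_hits, total_pages, threshold):
    """Keep the candidate words whose PMI with the tag reaches the threshold."""
    kept = []
    for word in candidates:
        score = pmi(hits[tag], hits[word], pair_hits.get((tag, word), 0), total_pages)
        if score >= threshold:
            kept.append(word)
    return kept


# Toy parameters (assumed for illustration): 2 tags, 2 topics, 4 vocabulary words.
vocab = ["apple", "fruit", "iphone", "juice"]
doc_tag_probs = [0.5, 0.5]
tag_topic_probs = [[0.9, 0.1], [0.2, 0.8]]
topic_word_probs = [[0.4, 0.4, 0.0, 0.2], [0.5, 0.0, 0.5, 0.0]]
sampled = generate_words(doc_tag_probs, tag_topic_probs, topic_word_probs,
                         vocab, 20, random.Random(0))

# Toy hit counts (assumed): "fruit" co-occurs with the tag far more than chance, "the" does not.
hits = {"apple": 10000, "fruit": 20000, "the": 500000}
pair_hits = {("apple", "fruit"): 4000, ("apple", "the"): 5000}
related = filter_related_words("apple", ["fruit", "the"], hits, pair_hits, 1000000, 1.0)
```

In this toy setting the candidate "the" has a PMI of zero with the tag "apple" and falls below the assumed threshold of 1.0, while "fruit" is kept. In practice the threshold would have to be tuned on real data.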
Keywords: semantic knowledge acquisition; topic model; tag