
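Tag Prediction Based on Probabilistic Topic Model
Citation: YUAN Liu, ZHANG Long-bo. Tag prediction based on probabilistic topic model[J]. Computer Science (计算机科学), 2011, 38(7): 175-180.
Authors: 袁柳 (YUAN Liu), 张龙波 (ZHANG Long-bo)
Affiliations: 1. College of Computer Science, Shaanxi Normal University, Xi'an 710062
2. School of Computer Science, Shandong University of Technology, Zibo 255049
Funding: This work was supported by the National Natural Science Foundation of China project "Data Stream Mining for Intrusion Detection" (60873196).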
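Abstract: Making full use of user-defined tag information is an important way to understand the semantics of Web resources and to make Web applications more intelligent. Because tag assignments to resources are frequently incomplete and inconsistent, a probabilistic topic model built on the characteristics of users' tagging behavior is established and used to predict tags for resources whose tagging information is incomplete. From the statistical characteristics of the tags attached to each resource, tag documents of different forms can be generated; by analyzing the quality of the topics each form produces, the tag-document form best suited to a particular dataset is selected. Exploiting the high correlation among words within the same topic, a ranking method for predicted tags is designed, which enables both tag prediction for incompletely tagged resources and detection of semantically inconsistent tags. Experiments on the DeliciousT140 and Wiki10+ datasets show that the proposed method predicts tags effectively and can improve information-retrieval performance.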

Keywords: tagging system, tag prediction, statistical topic model
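The abstract does not say exactly which tag-document forms are compared. The sketch below is a minimal illustration, assuming two hypothetical forms built from (user, resource, tag) assignment triples: a "set" form in which each distinct tag appears once, and a "frequency" form in which each tag is repeated once per assignment. The function names and the `min_count` filter are illustrative assumptions, not the authors' code; the resulting word lists would then serve as the input corpus for a topic model.

```python
from collections import Counter


def tag_counts(assignments):
    """Count how often each tag was assigned to each resource.

    `assignments` is an iterable of (user, resource, tag) triples
    (hypothetical input format).
    """
    counts = {}
    for _user, resource, tag in assignments:
        counts.setdefault(resource, Counter())[tag] += 1
    return counts


def build_tag_documents(assignments, form="frequency", min_count=1):
    """Turn each resource's tags into a bag-of-words "tag document".

    form="set":       each distinct tag appears once.
    form="frequency": each tag is repeated once per assignment.
    min_count:        drop tags assigned fewer than this many times
                      (a hypothetical noise filter).
    """
    if form not in ("set", "frequency"):
        raise ValueError(f"unknown form: {form!r}")
    docs = {}
    for resource, counter in tag_counts(assignments).items():
        words = []
        # Sort by tag name so the output is deterministic.
        for tag, n in sorted(counter.items()):
            if n < min_count:
                continue
            words.extend([tag] * (1 if form == "set" else n))
        docs[resource] = words
    return docs
```

Comparing the topics learned from each form on the same dataset corresponds to the selection step described in the abstract; the quality measure used for that comparison is not specified here.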

Social Tag Predication Based on Probabilistic Topic Model
YUAN Liu,ZHANG Long-bo.Social Tag Predication Based on Probabilistic Topic Model[J].Computer Science,2011,38(7):175-180.
Authors: YUAN Liu, ZHANG Long-bo
Affiliation: College of Computer Science, Shaanxi Normal University, Xi'an 710062, China; School of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China
Abstract: Tagging information created by users is important for understanding the semantics of Web resources and for improving the intelligence of Web applications. A probabilistic topic model was exploited to deal with the incompleteness and inconsistency of tagging systems, and a technique for generating probabilistic topic models based on tag statistical characteristics was proposed. According to the tag statistical characteristics of each resource, tag documents of different formats can be created. By analyzing the performance of the topics generated from the different tag documents, the document format appropriate for a given dataset was determined. The high relatedness between words in the same topic was exploited to predict tags for resources with incomplete or inconsistent tags. Experiments on DeliciousT140 and Wiki10+ show the effectiveness of the proposed technique.
Keywords: Tagging system, Tag prediction, Statistical topic model
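The prediction step can be sketched as follows, assuming an LDA-style topic model has already been trained on the tag documents and has produced a topic distribution `theta` for one resource and per-topic word distributions `phi` (training not shown). Each vocabulary tag t is scored by the standard topic-model predictive probability p(t | d) = Σ_k θ_k φ_{k,t}, which rewards tags that are strongly related to the resource's dominant topics. This is a generic scoring rule chosen for illustration, not necessarily the paper's exact ranking method; `predict_tags`, `flag_inconsistent`, and the threshold are hypothetical.

```python
def tag_scores(theta, phi, vocab):
    """Score every tag by p(t | d) = sum_k theta[k] * phi[k][t].

    theta: list of K topic weights for one resource (assumed to sum to 1).
    phi:   K rows, each a distribution over the tags in `vocab`.
    """
    return {
        tag: sum(theta[k] * phi[k][i] for k in range(len(theta)))
        for i, tag in enumerate(vocab)
    }


def predict_tags(theta, phi, vocab, existing, top_n=3):
    """Return the top_n highest-scoring tags not already on the resource."""
    scores = tag_scores(theta, phi, vocab)
    candidates = [t for t in vocab if t not in existing]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda t: (-scores[t], t))
    return candidates[:top_n]


def flag_inconsistent(theta, phi, vocab, existing, threshold):
    """Return existing tags whose score falls below `threshold`.

    Tags missing from `vocab` score 0.0 and are therefore always flagged.
    """
    scores = tag_scores(theta, phi, vocab)
    return sorted(t for t in existing if scores.get(t, 0.0) < threshold)
```

For example, with two topics over the tags `python`, `code`, `music`, `guitar`, a resource dominated by the first (programming) topic and already tagged `python` would receive `code` as its top suggestion, while an existing `guitar` tag would score low and could be flagged as inconsistent.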
This article is indexed in Wanfang Data and other databases.