
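Short text expansion and classification based on pseudo-relevance feedback
Cite this article: WANG Meng, LIN Lan-fen, WANG Feng. Short text expansion and classification based on pseudo-relevance feedback [J]. Journal of Zhejiang University (Natural Science Edition), 2014, 48(10): 1835-1842.
Authors: WANG Meng, LIN Lan-fen, WANG Feng
Affiliation: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
Funding: Doctoral Program Fund (20110101110065); National Science and Technology Support Program of the 12th Five-Year Plan (2012BAD35B01-3, 2013BAF02B10)
Abstract: To address the short text classification problem, a short text expansion and classification method based on pseudo-relevance feedback (PFR) is proposed. Similar corpora from the Internet are used to expand the content of each short text while keeping its semantics unchanged. Existing feature extraction methods for the expanded corpus use only local features; the proposed method adds global feature extraction and combines global and local features to obtain a better feature vector. This effectively addresses the highly sparse feature matrix that the limited length of short texts causes during classification. Tests on an open dataset, together with a comparison against results reported in other literature, verify that the method performs well on short text classification.

Keywords: pseudo-relevance feedback; short text classification; feature extraction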

Short text expansion and classification based on pseudo-relevance feedback
WANG Meng, LIN Lan-fen, WANG Feng. Short text expansion and classification based on pseudo-relevance feedback [J]. Journal of Zhejiang University (Engineering Science), 2014, 48(10): 1835-1842.
Authors: WANG Meng, LIN Lan-fen, WANG Feng
Affiliation: College of Computer Science and Technology, Zhejiang University
Abstract: A classification method based on pseudo-relevance feedback (PFR) is proposed to address the sparseness problem in short text classification. Each short text is expanded with web pages that are semantically similar to it. The feature vector generation algorithm is modified to extract both local and global features. The method alleviates the sparseness of the final feature matrix, which is common in short text classification because the texts are so short. Experimental results on an open dataset show that the method significantly improves short text classification performance compared with state-of-the-art methods.
Keywords: pseudo-relevance feedback; short text classification; feature extraction
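
The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the retrieval model, the term-selection rule, or the definitions of local and global features, so this example assumes a small in-memory corpus in place of web pages, TF-IDF with cosine similarity for retrieval, and summed TF-IDF weight for choosing expansion terms. The names `expand`, `top_k`, and `n_terms` are hypothetical, and the local/global feature combination is left out because the abstract does not specify it.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this toy example.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (list of sparse TF-IDF dicts, idf table) for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def expand(short_text, corpus, top_k=2, n_terms=3):
    """Pseudo-relevance feedback: assume the top_k retrieved documents are
    relevant and append their highest-weighted new terms to the short text."""
    docs = [tokenize(d) for d in corpus]
    query = tokenize(short_text)
    vectors, idf = tfidf_vectors(docs)
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}

    # Rank corpus documents by similarity to the short text.
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(qvec, vectors[i]), reverse=True)

    # Score candidate terms by their summed weight in the feedback documents.
    scores = Counter()
    for i in ranked[:top_k]:
        for term, weight in vectors[i].items():
            if term not in query:
                scores[term] += weight

    best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return query + [term for term, _ in best]


corpus = [
    "apple releases new iphone with faster chip",
    "iphone sales lift apple quarterly revenue",
    "football club wins league title",
    "heavy rain floods city streets",
]
expanded = expand("apple iphone", corpus)
print(expanded)
```

The expanded token list is longer than the original two-word text, so a bag-of-words vector built from it has more non-zero entries, which is the sparsity relief the abstract refers to. In this toy corpus the added terms come only from the two documents that mention both query words.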
This article is indexed in CNKI and other databases.