
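Relevance model estimation based on stable semantic clustering
Cite this article:SUN Xinyu,WU Jiang,PU Qiang.Relevance model estimation based on stable semantic clustering[J].Journal of Computer Applications,2016,36(5):1313-1318.
Authors:SUN Xinyu  WU Jiang  PU Qiang
Affiliation:1. School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu 611130;2. School of Information Science and Engineering, Chengdu University, Chengdu 610106
Funding:Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (11YJCZH084); Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2014GZ0013, 2014SZ0107); Key Natural Science Project of the Sichuan Provincial Department of Education (13ZA0297).
Abstract:To address the problem that relevance models estimated from unstable clusters degrade retrieval performance, a relevance model based on stable semantic clustering (SSRM) is proposed. First, the top N result documents of the initial query form the feedback data set. Next, the number of stable semantic clusters in that data set is detected. Then, the semantic clusters most similar to the user query are selected from the stable semantic clustering to estimate SSRM. Finally, the retrieval performance of the model is verified experimentally. Experiments on five subsets of the TREC data set show that, compared with the Relevance Model (RM) and the Semantic Relevance Model (SRM), SSRM improves Mean Average Precision (MAP) by at least 32.11% and 0.41%, respectively; compared with clustering-based retrieval methods, namely the Cluster-Based Document Model (CBDM), the LDA-Based Document Model (LBDM) and Resampling, it improves MAP by at least 23.64%, 19.59% and 8.03%, respectively. The experimental results indicate that SSRM helps improve retrieval performance.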

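Keywords:information retrieval  semantic clustering  stability validation  Independent Component Analysis (ICA)  relevance model estimation
Received:2015-10-21
Revised:2016-01-07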

Relevance model estimation based on stable semantic clustering
SUN Xinyu,WU Jiang,PU Qiang.Relevance model estimation based on stable semantic clustering[J].Journal of Computer Applications,2016,36(5):1313-1318.
Authors:SUN Xinyu  WU Jiang  PU Qiang
Affiliation:1. School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu Sichuan 611130, China;2. School of Information Science and Engineering, Chengdu University, Chengdu Sichuan 610106, China
Abstract:To address the problem that a relevance model estimated from unstable clusters degrades retrieval performance, a new Stable Semantic Relevance Model (SSRM) was proposed. The feedback data set was first formed from the top N documents returned for the user's initial query. After the number of stable semantic clusters had been detected, SSRM was estimated from the stable semantic clusters most similar to the user query. Finally, the retrieval performance of SSRM was verified by experiments. Compared with the Relevance Model (RM), the Semantic Relevance Model (SRM) and the clustering-based retrieval methods Cluster-Based Document Model (CBDM), LDA-Based Document Model (LBDM) and Resampling, SSRM improved Mean Average Precision (MAP) by at least 32.11%, 0.41%, 23.64%, 19.59% and 8.03%, respectively. The experimental results show that retrieval performance can benefit from SSRM.
Keywords:information retrieval  semantic clustering  stability validation  Independent Component Analysis (ICA)  relevance model estimation  
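
The last two steps described in the abstract (choosing the cluster closest to the query, then estimating a relevance model from it) can be illustrated with a minimal sketch. It is not the authors' implementation: the clusters are taken as given rather than produced by ICA-based stability validation, cluster-to-query similarity is assumed to be cosine similarity between smoothed unigram models, a single best cluster is selected, and the relevance model is assumed to take the standard RM1-style form P(w|R) ∝ Σ_D P(w|D) Π_q P(q|D). All function names and the smoothing constant `mu` are hypothetical.

```python
import math
from collections import Counter


def unigram(tokens, vocab, mu=0.5):
    """Additively smoothed unigram model P(w | tokens) over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def cosine(p, q):
    """Cosine similarity between two models defined over the same vocabulary."""
    dot = sum(p[w] * q[w] for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


def estimate_relevance_model(query, docs, clusters):
    """Pick the cluster most similar to the query, then estimate P(w|R) from it.

    query    -- list of query terms
    docs     -- list of tokenized feedback documents (the top N results)
    clusters -- list of lists of document indices (assumed already stable)
    """
    vocab = sorted({w for d in docs for w in d} | set(query))
    query_model = unigram(query, vocab)

    def similarity_to_query(members):
        pooled = [w for i in members for w in docs[i]]
        return cosine(unigram(pooled, vocab), query_model)

    best = max(clusters, key=similarity_to_query)

    # RM1-style estimate (an assumption, see lead-in):
    # weight each document model by the query likelihood under that document.
    rm = dict.fromkeys(vocab, 0.0)
    for i in best:
        doc_model = unigram(docs[i], vocab)
        query_likelihood = math.prod(doc_model[q] for q in query)
        for w in vocab:
            rm[w] += doc_model[w] * query_likelihood

    z = sum(rm.values())
    return best, {w: p / z for w, p in rm.items()}


# Toy feedback set: two documents about cars, two about the animal.
docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "cat", "prey"],
]
clusters = [[0, 1], [2, 3]]
best, rm = estimate_relevance_model(["jaguar", "cat"], docs, clusters)
```

On this toy input the animal cluster is selected for the query "jaguar cat", so the estimated model gives more probability to "cat" than to "car". In a full pipeline the `clusters` argument would come from the stable semantic clustering step, which this sketch does not attempt to reproduce.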