
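Text term weighting approach based on latent semantic indexing
Cite this article: LI Yuan-yuan, MA Yong-qiang. Text term weighting approach based on latent semantic indexing[J]. Journal of Computer Applications, 2008, 28(6): 1460-1462.
Authors: LI Yuan-yuan  MA Yong-qiang
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
Abstract: Latent semantic indexing is highly computable and requires little human involvement. This paper analyzes in depth one of its important optimization steps, weight calculation. TF-IDF, currently the most widely used method, relies on linear processing, which is unreasonable, and has difficulty highlighting the features that play a key role in a text's content. To address these shortcomings, a new weighting scheme based on the Sigmoid function and a location factor is proposed. It highlights how important the different feature terms in a text are and better supports construction of the latent semantic space. Test results on the experimental platform "Chinese Latent Semantic Indexing Analysis System" show that the weighting method improves the performance of retrieval based on latent semantics.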

Keywords: latent semantic indexing  Sigmoid function  location factor  weighting algorithm
Article ID: 1001-9081(2008)06-1460-03
Received: 2007-12-10
Revised: 2007-12-10

Text term weighting approach based on latent semantic indexing
LI Yuan-yuan, MA Yong-qiang. Text term weighting approach based on latent semantic indexing[J]. Journal of Computer Applications, 2008, 28(6): 1460-1462.
Authors:LI Yuan-yuan  MA Yong-qiang
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China
Abstract: Latent Semantic Indexing (LSI) is a document retrieval model developed over the last ten years. It is easy to compute and requires little human intervention. Term weighting, a difficult and important problem in LSI, is studied in detail. TF-IDF, the most popular term weighting algorithm, relies on linear processing, which is unreasonable, and cannot emphasize the key terms that contribute most to the content of a text. To address these shortcomings, a new weighting scheme based on the Sigmoid function and a location factor is proposed. The new method highlights the importance of the different terms in documents and better supports construction of the latent semantic space. It was tested on the experimental platform "Chinese LSI Retrieval Analysis System", and the results show that the new method improves LSI retrieval performance.
Keywords: Latent Semantic Indexing  Sigmoid function  location factor  weighting algorithm
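
The abstract does not give the formula itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It contrasts classic TF-IDF, whose weight grows linearly with the raw term frequency, with a hypothetical variant that passes the term frequency through a sigmoid and multiplies by a location factor. The function names, the exact form of the combination, and the `location_factor` parameter are all assumptions made for this example.

```python
import math


def tf_idf(tf, df, n_docs):
    """Classic TF-IDF: the weight is linear in the raw term frequency tf."""
    return tf * math.log(n_docs / df)


def sigmoid(x):
    """Standard logistic function, mapping any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_location_weight(tf, df, n_docs, location_factor=1.0):
    """Hypothetical weight (an assumption, not the paper's formula).

    The term frequency is squashed by a sigmoid so that very frequent
    terms do not dominate, and the result is scaled by a location
    factor, e.g. a larger value for terms that occur in a title.
    """
    return sigmoid(tf) * math.log(n_docs / df) * location_factor


if __name__ == "__main__":
    # A term seen 1 time vs. 10 times, in 5 of 100 documents.
    for tf in (1, 10):
        print(tf, tf_idf(tf, 5, 100), sigmoid_location_weight(tf, 5, 100))
    # The same term when it occurs in a (hypothetically) more important position.
    print(sigmoid_location_weight(1, 5, 100, location_factor=2.0))
```

In this sketch a tenfold increase in term frequency raises the TF-IDF weight tenfold, while the sigmoid-based weight saturates; the location factor then lets terms in prominent positions count for more.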
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
