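Corpus-based latent semantic information measurement
Cite this article: JIANG Kai-zhong, LI Lu, WANG Zhao-zong. Corpus-based latent semantic information measurement[J]. 计算机应用 (Journal of Computer Applications), 2009, 29(9).
Authors: JIANG Kai-zhong  LI Lu  WANG Zhao-zong
Affiliation: College of Basic Teaching, Shanghai University of Engineering Science, Shanghai 201620
Funding: Key Science and Technology Project of the Science and Technology Commission of Shanghai Municipality; Undergraduate Innovation Project of Shanghai University of Engineering Science
Abstract: An information measurement tied to a topic or to semantics is defined for keywords. A topic-based corpus is collected first; a latent semantic vector space model of that corpus is then built, and the information measurement of each keyword is defined through the model. From this, the amount of topic information contained in any document can be computed, and the document's membership degree with respect to the topic can be defined. Setting a threshold on this membership degree determines whether the document belongs to the topic class. Experiments show that an information measurement associated with a topic or with semantics can overcome the shortcomings of "word matching" in search and achieve "semantic matching" search.

Keywords: latent semantics  information measurement  metric distribution  membership degree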

Latent semantic information measurement of corpus orientation
JIANG Kai-zhong, LI Lu, WANG Zhao-zong. Latent semantic information measurement of corpus orientation[J]. Journal of Computer Applications, 2009, 29(9).
Authors: JIANG Kai-zhong  LI Lu  WANG Zhao-zong
Affiliation: College of Basic Teaching; Shanghai University of Engineering Science; Shanghai 200335; China
Abstract: The authors defined an information measurement associated with a topic or semantics for a keyword. Firstly, the topic-based corpus was obtained. Then the latent semantic vector space model of the corpus was established. After that, the information measurement of the keyword was defined through the model. Accordingly, the amount of the topic information any document contained could be calculated. Lastly, the membership measurement which measured the membership degree of the document belonging to the topic was introdu...
Keywords: latent semantics  information measurement  metric distribution  membership degree
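
The abstract outlines a pipeline (topic corpus, latent semantic space, keyword measurement, document membership degree, threshold) but does not give the formulas. The sketch below is only one possible reading, not the authors' definitions. Everything in it is an assumption made for illustration: the function names, the toy corpus, the use of a rank-1 latent space (the leading left singular vector of the term-document matrix, found by power iteration), the normalisation of keyword loadings into a distribution, and the default threshold of 0.3.

```python
import math
from collections import Counter


def build_term_doc_matrix(corpus):
    """Return (vocab, matrix); matrix[i][j] is the count of term i in document j."""
    vocab = sorted({w for doc in corpus for w in doc.split()})
    counts = [Counter(doc.split()) for doc in corpus]
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix


def leading_left_singular_vector(matrix, iterations=200):
    """Power iteration on A A^T, approximating the first left singular vector of A."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    u = [1.0] * n_terms
    for _ in range(iterations):
        # v = A^T u (project terms onto documents)
        v = [sum(matrix[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]
        # u = A v (project back onto terms), then renormalise
        u = [sum(matrix[i][j] * v[j] for j in range(n_docs)) for i in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def keyword_measure(corpus):
    """Assumed keyword measurement: latent loadings normalised to sum to 1."""
    vocab, matrix = build_term_doc_matrix(corpus)
    u = leading_left_singular_vector(matrix)
    total = sum(abs(x) for x in u)
    return {w: abs(x) / total for w, x in zip(vocab, u)}


def membership(document, measure):
    """Assumed membership degree in [0, 1]: mean keyword weight over the top weight."""
    words = document.split()
    if not words:
        return 0.0
    info = sum(measure.get(w, 0.0) for w in words) / len(words)
    return info / max(measure.values())


def belongs_to_topic(document, measure, threshold=0.3):
    """Classify a document by comparing its membership degree to a threshold."""
    return membership(document, measure) >= threshold


if __name__ == "__main__":
    # Hypothetical toy corpus for a single "search" topic.
    topic_corpus = [
        "search engine query index",
        "query index ranking search",
        "semantic search query matching",
    ]
    measure = keyword_measure(topic_corpus)
    for doc in ["index ranking for a search query", "recipe for tomato soup"]:
        print(doc, "->", round(membership(doc, measure), 3), belongs_to_topic(doc, measure))
```

A rank-1 space keeps the example short; a fuller latent semantic model would retain several singular vectors, and the threshold would have to be tuned on real data.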
This article is indexed in CNKI, 万方数据 (Wanfang Data), and other databases.