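Latent Semantic Indexing and Its Applications in Chinese Text Processing
Cite this article: Zhou Shuigeng, Guan Jihong, Hu Yunfa. Latent semantic indexing and its applications in Chinese text processing[J]. Mini-Micro Systems, 2001, 22(2): 239-243.
Authors: Zhou Shuigeng, Guan Jihong, Hu Yunfa
Affiliations: 1. Department of Computer Science, Fudan University, Shanghai 200433
2. School of Computer Science, Wuhan University, Wuhan 430072
Funding: Supported by the National Natural Science Foundation of China project "Key Related Technologies for Electronic Libraries" (69933010) and the National 863 Program project "Intelligent Library System" (863-306-ZT04-02-2)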
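Abstract: Information retrieval is essentially semantic retrieval, yet traditional information retrieval systems are built on indexes of independent terms, so their retrieval performance is unsatisfactory. Latent semantic indexing (LSI) is a new information retrieval model: through singular value decomposition, it projects term vectors and document vectors into a low-dimensional space, which reduces the semantic ambiguity between terms and documents and makes the semantic relationships among documents clearer. Experimental and theoretical results confirm that LSI achieves better retrieval performance. This paper discusses the theoretical foundations of LSI and studies its applications in Chinese text processing, including Chinese text retrieval, Chinese text classification, and Chinese text clustering.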

Keywords: Information retrieval, Latent semantic indexing, Chinese text processing, Chinese information processing
Article ID: 1000-1220(2001)02-0239-05

LATENT SEMANTIC INDEXING (LSI) AND ITS APPLICATIONS IN CHINESE TEXT PROCESSING
ZHOU Shui-geng, GUAN Ji-hong, HU Yun-fa. LATENT SEMANTIC INDEXING (LSI) AND ITS APPLICATIONS IN CHINESE TEXT PROCESSING[J]. Mini-micro Systems, 2001, 22(2): 239-243.
Authors: ZHOU Shui-geng, GUAN Ji-hong, HU Yun-fa
Affiliation: ZHOU Shui-geng (1), GUAN Ji-hong (2), HU Yun-fa (1)
Abstract: Information retrieval is essentially semantic retrieval. However, most classic information retrieval systems represent the contents of documents and queries as sets of independent index terms, which can lead to poor retrieval performance. Latent semantic indexing (LSI) is a new algebraic model for information retrieval that maps document and query vectors into a lower-dimensional space by singular value decomposition, so that the inherent vagueness of retrieval based on keyword sets is considerably reduced and the semantic associations among documents are highlighted. Both theoretical analysis and experimental results show that LSI can improve retrieval performance significantly. This paper introduces the fundamental principles of LSI and explores its applications in Chinese text processing, including Chinese text retrieval, classification, and clustering.
Keywords: Information retrieval, Latent semantic indexing, Singular value decomposition, Chinese text processing
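
As a rough illustration of the mechanism the abstract describes, the following minimal sketch applies a truncated singular value decomposition to a toy term-by-document matrix and compares a query with the documents in the reduced space. It is not the authors' implementation: the five terms, four documents, the choice k = 2, and the function names `lsi_fit` and `cosine` are all assumptions made for this example, and the English terms merely stand in for whatever index units (for instance segmented words) a Chinese text system would use.

```python
import numpy as np

# Hypothetical toy vocabulary and documents (not taken from the paper).
terms = ["car", "automobile", "engine", "flower", "petal"]
# Rows are terms, columns are documents d0..d3 (raw term counts).
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

def lsi_fit(term_doc, k):
    """Truncated SVD: keep the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    # Each row is one document expressed in the k-dimensional latent space.
    doc_vectors = (s[:k, None] * Vt[:k, :]).T
    return U_k, doc_vectors

def cosine(query, rows):
    """Cosine similarity between one query vector and each row vector."""
    return rows @ query / (np.linalg.norm(rows, axis=1) * np.linalg.norm(query))

# Query containing only the term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)

# Baseline: match on independent index terms in the original term space.
raw_scores = cosine(q, A.T)

# LSI: project the query with the same basis used for the documents.
U_k, doc_vectors = lsi_fit(A, k=2)
q_k = U_k.T @ q
lsi_scores = cosine(q_k, doc_vectors)

print("raw:", np.round(raw_scores, 3))
print("lsi:", np.round(lsi_scores, 3))
```

In this toy setup the document d1 ("automobile engine") shares no term with the query, so its score in the original term space is zero, whereas in the two-dimensional latent space it lands next to d0 ("car engine") and scores close to 1. The two flower documents stay near zero in both spaces. How many dimensions to keep is a tuning decision that this sketch does not address.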
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.