
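An intelligent retrieval method for large-capacity text collections
Cite this article: JIN Xiaofeng. An intelligent retrieval method for large-capacity text collections [J]. Computer Engineering and Applications, 2011, 47(7): 143-145.
Author: JIN Xiaofeng
Affiliation: Intelligent Information Processing Laboratory, Department of Computer Science and Technology, College of Engineering, Yanbian University, Yanji, Jilin 133002
Funding: International Cooperation Project of the Jilin Province Science and Technology Development Plan
Abstract: The latent semantic model is analyzed and the representation of text in the latent semantic space is studied; on that basis, a retrieval strategy for large-capacity text collections is proposed. Retrieval consists of two steps: coarse-grained elimination of irrelevant texts, followed by precise retrieval over the relevant texts. First, a latent semantic space model screens the collection and discards irrelevant texts. Next, a large-scale text retrieval method searches the relevant texts precisely at the paragraph level; a genetic algorithm is introduced into the retrieval algorithm to improve execution efficiency. Finally, the indices of the candidate paragraphs are output. Experimental results demonstrate the effectiveness and efficiency of the method.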

Keywords: vector space model; latent semantic indexing; singular value decomposition; text information retrieval

Intelligent information retrieval approach for large-scale collections of full-text document
JIN Xiaofeng. Intelligent information retrieval approach for large-scale collections of full-text document [J]. Computer Engineering and Applications, 2011, 47(7): 143-145.
Authors: JIN Xiaofeng
Affiliation: Intelligent Information Processing Lab, Dept. of Computer Science and Technology, College of Engineering, Yanbian University, Yanji, Jilin 133002, China
Abstract: An information retrieval approach for large-scale collections of full-text documents is proposed, based on an analysis of the latent semantic model and a study of text representation in the latent semantic space. Retrieval is divided into two procedures: a coarse culling of irrelevant documents, followed by a precise search over the relevant documents. The first procedure removes irrelevant documents. The second retrieves from the relevant documents at the passage level, and a genetic algorithm (GA) is introduced at this stage to improve search efficiency. Finally, the indices of the candidate passages are returned. Experimental results show that the proposed method is both valid and efficient.
Keywords: Vector Space Model (VSM); Latent Semantic Indexing (LSI); Singular Value Decomposition (SVD); text information retrieval
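
The two-stage process described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the function names (`coarse_filter`, `rank_passages`), the rank `k`, and the similarity threshold are all assumptions made for illustration. Stage 1 uses a truncated SVD of the term-document matrix as the latent semantic space. Stage 2 scores every passage exhaustively with plain vector-space cosine similarity, which stands in for the unspecified passage-level method and its genetic algorithm search.

```python
import numpy as np


def term_vector(text, index):
    """Bag-of-words count vector for `text` over the fixed vocabulary `index`."""
    v = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:              # out-of-vocabulary words are ignored
            v[index[w]] += 1.0
    return v


def cosine_rows(q, M):
    """Cosine similarity between vector q and every row of matrix M."""
    denom = np.linalg.norm(q) * np.linalg.norm(M, axis=1)
    denom[denom == 0.0] = 1.0       # zero vectors get similarity 0
    return (M @ q) / denom


def coarse_filter(docs, query, k=2, threshold=0.5):
    """Stage 1 (assumed form): drop documents with low latent-space similarity."""
    texts = [" ".join(passages) for passages in docs]
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.column_stack([term_vector(t, index) for t in texts])  # terms x docs

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = Vt[:k, :].T * s[:k]              # row j equals U_k^T A[:, j]
    q_vec = U_k.T @ term_vector(query, index)   # fold the query into the same space

    sims = cosine_rows(q_vec, doc_vecs)
    kept = [j for j, sim in enumerate(sims) if sim >= threshold]
    return kept, index


def rank_passages(docs, kept, query, index, top_n=3):
    """Stage 2 (stand-in): exhaustive term-space scoring of surviving passages."""
    ids = [(d, p) for d in kept for p in range(len(docs[d]))]
    if not ids:
        return []
    q = term_vector(query, index)
    P = np.vstack([term_vector(docs[d][p], index) for d, p in ids])
    order = np.argsort(-cosine_rows(q, P), kind="stable")[:top_n]
    return [ids[i] for i in order]


# Hypothetical toy corpus: each document is a list of passages.
docs = [
    ["latent semantic indexing uses singular value decomposition",
     "decomposition maps terms and documents into one semantic space"],
    ["genetic search uses selection crossover mutation",
     "passage retrieval with semantic indexing"],
    ["mix flour sugar butter", "bake cake in hot oven"],
    ["whisk butter sugar eggs", "bake bread in hot oven"],
]
query = "semantic indexing decomposition"

kept, index = coarse_filter(docs, query)
top = rank_passages(docs, kept, query, index)
print("documents kept:", kept)
print("candidate (document, passage) indices:", top)
```

In this sketch the coarse stage only has to compare the query with one low-dimensional vector per document, so the more expensive passage-level scoring runs on the survivors alone. A genetic algorithm, as the abstract mentions, would replace the exhaustive loop in `rank_passages` when the number of candidate passages is too large to score one by one.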
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.