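Multiple Formats File Indexing Technology Based on Merging Factor
Cite this article: SUN Guang-lu, YI Cheng-qi, LANG Fei. Multiple Formats File Indexing Technology Based on Merging Factor [J]. Journal of Harbin University of Science and Technology, 2012, 17(2): 1-4
Authors: SUN Guang-lu (孙广路), YI Cheng-qi (易成岐), LANG Fei (郎非)
Affiliations: 1. Research Institute of Information Technology, Tsinghua University, Beijing 100084; Research Center of Information Security and Intelligent Technology, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080
2. Research Center of Information Security and Intelligent Technology, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080
3. School of Foreign Languages, Harbin University of Science and Technology, Harbin, Heilongjiang 150080
Funding: National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province; Humanities and Social Sciences Project of the Ministry of Education
Abstract: Traditional text retrieval techniques support only a narrow range of file formats, index large files slowly, and can even cause memory overflow. To address these problems, this paper studies a multiple-format file indexing technique based on the merging factor, built on the Lucene system and related technologies, and uses it to construct a Chinese text information retrieval system. Experimental analysis shows that the system effectively implements retrieval over multiple file formats, that setting the merging factor effectively improves indexing speed, and that the system is highly reliable.

Keywords: text retrieval; merging factor; multiple formats file indexing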

Multiple Formats File Indexing Technology Based on Merging Factor
SUN Guang-lu, YI Cheng-qi, LANG Fei. Multiple Formats File Indexing Technology Based on Merging Factor [J]. Journal of Harbin University of Science and Technology, 2012, 17(2): 1-4
Authors: SUN Guang-lu, YI Cheng-qi, LANG Fei
Affiliation: 1. Research Institute of Information and Technology, Tsinghua University, Beijing 100084, China; 2. Research Center of Information Security and Intelligent Technology, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; 3. School of Foreign Languages, Harbin University of Science and Technology, Harbin 150080, China
Abstract: Traditional file indexing technology has several problems: it supports only a narrow range of file formats, it indexes large volumes of data and documents slowly, and it can even run out of memory. To tackle these problems, this paper proposes a multiple formats file indexing technology based on the merging factor. Furthermore, a Chinese text information retrieval system is built on an improved version of the Lucene system. Experimental results show that the system effectively implements multiple formats file indexing with high reliability, and that setting the merging factor improves indexing speed.
Keywords: text retrieval; merging factor; multiple formats file indexing
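
The abstract attributes the indexing speed-up to the choice of merging factor but gives no detail on the mechanism. As a rough illustration only, the Python sketch below models a logarithmic segment-merge scheme of the kind commonly associated with Lucene-style indexes. It is a toy simulation, not the authors' system and not Lucene's API: the function name `simulate_indexing`, the one-document-per-segment flush, and the cost measure `docs_rewritten` are all assumptions made for this example.

```python
def simulate_indexing(num_docs: int, merge_factor: int) -> dict:
    """Toy model of a logarithmic segment-merge scheme (hypothetical, not Lucene code).

    Assumption: every added document is flushed as its own level-0 segment.
    Whenever `merge_factor` segments accumulate at one level, they are merged
    into a single segment at the next level.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")

    levels = []          # levels[i] = number of segments currently at level i
    merges = 0           # how many merge operations were triggered
    docs_rewritten = 0   # documents copied during merges (a proxy for merge I/O)

    for _ in range(num_docs):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            # merge_factor segments at this level collapse into one segment
            # at the next level; each holds merge_factor ** level documents.
            levels[level] = 0
            merges += 1
            docs_rewritten += merge_factor ** (level + 1)
            level += 1

    return {
        "segments": sum(levels),
        "merges": merges,
        "docs_rewritten": docs_rewritten,
    }


if __name__ == "__main__":
    # Compare a small and a large merging factor on the same workload.
    for factor in (10, 100):
        print(factor, simulate_indexing(1000, factor))
```

Under these assumptions, a larger merging factor triggers fewer merges and rewrites fewer documents while indexing, at the price of leaving more segments for a search to visit. That trade-off is one plausible reading of why tuning the merging factor could affect indexing speed; the paper itself should be consulted for the values and measurements actually used.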
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.