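Word Segmentation and Retrieval Techniques in Chinese Information Retrieval Engines
Cite this article: WU Dong, TENG Yuping. Word Segmentation and Retrieval Techniques in Chinese Information Retrieval Engines[J]. Journal of Computer Applications, 2004, 24(7): 128-131.
Authors: WU Dong, TENG Yuping
Affiliation: Key Laboratory of Pure Mathematics and Combinatorics (Ministry of Education), Center for Combinatorics, Nankai University, Tianjin 300071, China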
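Abstract: This paper discusses two key techniques involved in developing a Chinese information retrieval system: Chinese word segmentation and retrieval. For word segmentation, it presents an improved forward maximum matching segmentation algorithm together with a correction strategy introduced to eliminate ambiguity, and on that basis combines statistical methods to handle out-of-vocabulary words. For retrieval, it surveys the principles of several of the most commonly used retrieval models and briefly analyzes the strengths and weaknesses of each. Finally, the segmentation algorithm is tested, and the results indicate that its accuracy and efficiency meet the requirements of practical use.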
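
To make the segmentation idea concrete, here is a minimal sketch of plain forward maximum matching, the textbook baseline that the abstract says the paper improves on. It is not the authors' improved algorithm, and it includes neither their correction strategy nor their statistical handling of out-of-vocabulary words, since the abstract does not describe those in detail. The function name `forward_max_match`, the `max_word_len` parameter, and the toy dictionaries are assumptions made for illustration only.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Greedy left-to-right segmentation: at each position, take the
    longest dictionary word that starts there (baseline sketch only)."""
    if max_word_len is None:
        # Assumption: the search window is bounded by the longest dictionary entry.
        max_word_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan never stalls.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionaries, not taken from the paper.
toy_dictionary = {"中文", "信息", "检索", "信息检索", "系统"}
print(forward_max_match("中文信息检索系统", toy_dictionary))
# → ['中文', '信息检索', '系统']

ambiguous_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", ambiguous_dictionary))
# → ['研究生', '命', '起源']
```

The second call shows the kind of ambiguity a purely greedy match produces: the intended reading is 研究 / 生命 / 起源, but the longest-first rule commits to 研究生 and strands 命 as a single character. Errors of this sort are presumably what a disambiguation correction step is meant to repair.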

Keywords: information retrieval; search engine; word segmentation; retrieval techniques
Article ID: 1001-9081(2004)07-0128-04

Word Segment and Search Techniques for Chinese Information Search Engines
WU Dong, TENG Yuping. Word Segment and Search Techniques for Chinese Information Search Engines[J]. Journal of Computer Applications, 2004, 24(7): 128-131.
Authors: WU Dong, TENG Yuping
Abstract: This paper discusses two key techniques in the development of a Chinese information retrieval system: Chinese word segmentation and search. For Chinese word segmentation, the paper presents an improved maximum matching (MM) segmentation algorithm, a correction strategy for disambiguation, and a statistical method for recognizing unknown words that builds on these methods. For search, the paper summarizes the principles of several common search models and briefly analyzes the advantages and disadvantages of each. Finally, the proposed segmentation algorithm is evaluated, and the results show that its accuracy and efficiency meet practical requirements.
Keywords:information retrieval  search engine  word segmentation  search technique
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.