
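A Lucene-based fast retrieval system for place name databases*
Cite this article: Zhang Wenyuan, Zhou Shiyu, Tan Guoxin. 一种基于Lucene的地名数据库快速检索系统 (A Lucene-based fast retrieval system for place name databases)[J]. Application Research of Computers (计算机应用研究), 2017, 34(6).
Authors: Zhang Wenyuan, Zhou Shiyu, Tan Guoxin
Affiliation: National Research Center of Cultural Industries, Central China Normal University (all three authors)
Funding: National Key Technology R&D Program of China (project 2012BAH83F00)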
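Abstract: To address the low efficiency of searching massive place name data in traditional relational databases, this paper proposes a fast place name database retrieval method that combines the PanGu Chinese word segmenter with Lucene full-text search. First, a place name table structure is designed, the Chinese segmentation performance of several common open-source segmenters is compared, and the best-performing PanGu segmenter is selected; its dictionary is extended so that Chinese place names are segmented effectively. Second, in-memory indexing and multi-threaded parallel processing are used to speed up the creation of Lucene's inverted index, and the relevance ranking of search results is optimized according to the place name category and display-priority attributes. Finally, a Web place name retrieval system offering fast search and map-based location display is developed. Tests on 5 million real place name records show an average query time of under 1 second, 15 times more efficient than fuzzy retrieval in a MySQL database, with more accurate matches; the system can provide efficient and flexible public retrieval services for massive place name data.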

Keywords: Lucene; place name; full-text search; database; Chinese word segmentation; relevance ranking
Received: 2016-06-21
Revised: 2017-04-09
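The abstract combines three mechanisms: dictionary-extended segmentation, an inverted index, and multi-threaded in-memory index construction. The minimal sketch below shows how they fit together, under stated assumptions: a plain forward-maximum-matching segmenter stands in for PanGu (whose actual algorithm is more elaborate), a Python dict stands in for Lucene's index, and the dictionary entries, sample records and the names `fmm_segment`, `build_shard`, `build_index` and `search` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: forward maximum matching stands in for PanGu,
# and a plain dict stands in for a Lucene in-memory index.
# Hypothetical extended dictionary: general words plus added place names.
DICTIONARY = {"湖北省", "武汉", "武汉市", "洪山区", "华中师范大学"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def fmm_segment(text):
    """Forward maximum matching: at each position emit the longest
    dictionary word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_shard(records):
    """Index one chunk of (id, name) records: term -> set of record ids."""
    shard = defaultdict(set)
    for rec_id, name in records:
        for term in fmm_segment(name):
            shard[term].add(rec_id)
    return shard


def build_index(records, workers=4):
    """Split the records into chunks, index each chunk in a worker thread,
    then merge the partial in-memory indexes into one."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    index = defaultdict(set)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for shard in pool.map(build_shard, chunks):
            for term, ids in shard.items():
                index[term] |= ids
    return dict(index)


def search(index, query):
    """AND query: ids of records that contain every term of the query."""
    terms = fmm_segment(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


records = [(1, "湖北省武汉市"), (2, "武汉市洪山区"), (3, "华中师范大学")]
index = build_index(records, workers=2)
print(sorted(search(index, "武汉市")))  # [1, 2]
```

Python threads share the global interpreter lock, so this pure-Python version illustrates the shard-then-merge structure rather than a real speed-up. In Java Lucene, `IndexWriter` is documented as safe to call from multiple threads, which is the more natural route for parallel indexing in a production system.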

A place name database quick searching system based on Lucene
Zhang Wenyuan, Zhou Shiyu and Tan Guoxin. A place name database quick searching system based on Lucene[J]. Application Research of Computers, 2017, 34(6).
Authors: Zhang Wenyuan, Zhou Shiyu and Tan Guoxin
Affiliation: National Research Center of Cultural Industries, Central China Normal University, Wuhan (all three authors)
Abstract: To overcome the low efficiency of searching massive place name data in traditional relational databases, this paper proposed a fast place name database retrieval method that integrates the PanGuAnalyzer Chinese segmenter with the Lucene full-text search toolkit. Firstly, a place name data structure was designed, and the segmentation performance of several open-source Chinese analyzers was compared. Based on the results, the best-performing PanGuAnalyzer, extended with a rich place name dictionary, was integrated into Lucene to improve the segmentation of Chinese place names. To speed up the creation of the inverted index, this paper adopted in-memory indexing and multi-threaded parallel processing. The ranking of query results by similarity score was also optimized according to the category and display-priority attributes of place names. Finally, a place name search system integrating place name search, visualization and location services was developed. More than 5,000,000 real place name records were used to test its performance. Compared with fuzzy queries on a MySQL database, the new method achieved an average response time of less than one second and was nearly three times faster. The proposed full-text search strategy demonstrated advantages in both accuracy and response speed and can provide an efficient and flexible public place name search service.
Keywords: Lucene; place name; full-text search; database; Chinese word segmentation; relevancy ranking
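The abstract says only that ranking was optimized with the category and display-priority attributes; the formula itself is not given. One plausible reading, sketched below purely as an assumption, multiplies the text-relevance score by a category weight and a priority factor. The weights, the priority scale, `PRIORITY_STEP` and the names `PlaceHit`, `final_score` and `rank` are hypothetical, and `text_score` simply stands in for the score Lucene would return.

```python
from dataclasses import dataclass

# Hypothetical weights: broader administrative units rank above roads and POIs.
CATEGORY_WEIGHT = {"province": 3.0, "city": 2.5, "district": 2.0, "road": 1.2, "poi": 1.0}
PRIORITY_STEP = 0.2  # hypothetical: each display-priority level adds 20%


@dataclass
class PlaceHit:
    name: str
    category: str
    priority: int      # hypothetical scale: larger value = display earlier
    text_score: float  # stand-in for Lucene's text-similarity score


def final_score(hit):
    """Combine text relevance with category and display-priority factors."""
    weight = CATEGORY_WEIGHT.get(hit.category, 1.0)  # unknown category: neutral
    return hit.text_score * weight * (1 + PRIORITY_STEP * hit.priority)


def rank(hits):
    """Sort by combined score, highest first; ties broken by name."""
    return sorted(hits, key=lambda h: (-final_score(h), h.name))


hits = [
    PlaceHit("武汉大道", "road", 0, 1.2),
    PlaceHit("武汉市", "city", 2, 1.0),
    PlaceHit("武汉市洪山区", "district", 1, 0.9),
]
print([h.name for h in rank(hits)])  # ['武汉市', '武汉市洪山区', '武汉大道']
```

Because the factors multiply, a record whose text score is zero cannot be promoted by its category or priority alone; the weights themselves would need tuning against real queries. Inside Lucene, this kind of adjustment would usually be expressed as a boost or a custom scoring function rather than a re-sort in application code.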