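XML File Similarity Retrieval System Based on Lucene
Cite this article: WU Xin-Qiang, ZHOU Ya, WANG Ru-Yi, ZHANG Jing-Wei, LIN Yu-Ming. XML File Similarity Retrieval System Based on Lucene[J]. Computer Systems & Applications, 2015, 24(2): 134-139
Authors: WU Xin-Qiang (吴新强)  ZHOU Ya (周娅)  WANG Ru-Yi (王如意)  ZHANG Jing-Wei (张敬伟)  LIN Yu-Ming (林煜明)
Affiliation: School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin 541004, China
Funding: University Science and Technology Project of the Guangxi Department of Education (2013YB095); Key Project of the Guangxi Experiment Center of Information Science (20130111); General Funded Project of the Guangxi Department of Education (20103YB051); Graduate Innovation Project of Guilin University of Electronic Technology (GDYCS201465)
Abstract: After analyzing the architecture of the open-source Lucene system and a special XML data source, and to address the shortcomings of Lucene's search scoring formula, this paper proposes a formula that combines term position with secondary retrieval and designs a text search system. Experiments target better retrieval performance, similarity-search accuracy, index space efficiency, and query time efficiency, and the system is finally deployed on a Tomcat server. The experiments show that, compared with the original Lucene system, the improved system raises indexing efficiency, query efficiency, and accuracy.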

Keywords: Web Lucene  similarity  term position  secondary retrieval  XML
Received: 2014-05-13
Revised: 2014-06-13

XML File Similarity Retrieval System Based on Lucene
WU Xin-Qiang, ZHOU Ya, WANG Ru-Yi, ZHANG Jing-Wei and LIN Yu-Ming. XML File Similarity Retrieval System Based on Lucene[J]. Computer Systems & Applications, 2015, 24(2): 134-139
Authors: WU Xin-Qiang  ZHOU Ya  WANG Ru-Yi  ZHANG Jing-Wei  LIN Yu-Ming
Affiliation: School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: Based on an analysis of the open-source Lucene system architecture and a special XML data source, this paper designs a semantic search system. We use term position and word semantics to improve Lucene's search results, and we conduct experiments to verify retrieval performance, similarity-search accuracy, index space efficiency, and query time efficiency. Finally, the system is deployed on a Tomcat server. The experimental results show that, compared with the original Lucene system, our system improves indexing efficiency, query efficiency, and accuracy.
Keywords: Lucene  similarity  lexical item location  secondary retrieval  XML
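
The abstract and keywords name two ideas, term position and secondary retrieval, without giving the scoring formula itself. The following is only a minimal sketch of how such a combination might look, not the authors' method and not Lucene's API. It assumes that "secondary retrieval" means a second pass that re-ranks the first-pass candidates, and it uses a hypothetical position weight that decays linearly from 1.0 at the start of a document to 0.5 at its end. All function names here (`tokenize`, `build_idf`, `first_pass`, `position_score`, `search`) are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency over {doc_id: tokens}."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {term: math.log(1 + n_docs / count) for term, count in df.items()}


def first_pass(query_terms, docs, idf, k):
    """Plain TF-IDF scoring; returns the ids of the k best-matching documents."""
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def position_score(query_terms, tokens, idf):
    """Assumed position-aware score: earlier occurrences count more."""
    n = len(tokens)
    score = 0.0
    for pos, tok in enumerate(tokens):
        if tok in query_terms:
            # Hypothetical weight: 1.0 at the first token, 0.5 at the last.
            weight = 1.0 - 0.5 * pos / max(n - 1, 1)
            score += weight * idf.get(tok, 0.0)
    return score


def search(query, raw_docs, k=10):
    """First pass selects candidates; second pass re-ranks them by position."""
    docs = {doc_id: tokenize(text) for doc_id, text in raw_docs.items()}
    idf = build_idf(docs)
    terms = set(tokenize(query))
    candidates = first_pass(terms, docs, idf, k)
    rescored = {d: position_score(terms, docs[d], idf) for d in candidates}
    return sorted(candidates, key=lambda d: (-rescored[d], d))
```

Splitting the work into two passes keeps the more expensive position scan limited to the `k` candidates that survive the cheap first pass. The linear decay is an arbitrary choice made for this sketch; any monotone weighting could be substituted.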
This article is indexed in CNKI, Wanfang Data, and other databases.