
Implementation of Full-Text Search for PDF Documents Based on Lucene

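Cite this article:	Huang Jiangping, Huang Lican, Xu Ling. Implementation of Full-Text Search for PDF Documents Based on Lucene [J]. Industrial Control Computer (工业控制计算机), 2012, 25(5): 103-104.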

Authors:	Huang Jiangping (黄江平), Huang Lican (黄理灿), Xu Ling (徐玲)

Affiliation:	School of Information, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018

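Abstract (translated from the Chinese):	In Lucene full-text search, searching PDF documents directly is almost impossible, yet real applications need to search large numbers of PDF documents. The Xpdf tool is used to convert each PDF document to TXT text, and an index is then built over the TXT text. At search time, the file name provides a one-to-one correspondence between each TXT file and its original PDF document, which yields full-text search over PDF documents. The content that contains the search keywords can also be highlighted in the results. In an actual project, the approach achieved good retrieval results.
Keywords (translated from the Chinese):	Lucene; PDF; full-text search; highlighting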
Implementation of PDF Full-text Based on Lucene

Abstract:	In Lucene full-text search, searching PDF documents directly is almost impossible, yet in practice large numbers of PDF documents need to be retrieved. This article first uses the Xpdf tool to convert PDF documents to TXT text and then indexes the TXT text. During a search, the file name maps each TXT file one-to-one to its original PDF document, which ultimately provides full-text search of PDF documents. The approach also highlights the retrieved content that contains the keywords.

Keywords:	Lucene; PDF; full-text search; highlighting
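
The pipeline described in the abstract (convert PDF to TXT, index the TXT, map hits back to the PDF by file name, highlight keywords) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a small in-memory inverted index stands in for Lucene, the function names and directory layout are hypothetical, and the conversion step assumes Xpdf's `pdftotext` command is installed and on the PATH. The word-based tokenizer is a simplification that would not segment Chinese text properly.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path


def convert_pdf_to_txt(pdf_path: Path, txt_dir: Path) -> Path:
    """Convert one PDF to text with Xpdf's pdftotext, keeping the file stem."""
    txt_path = txt_dir / (pdf_path.stem + ".txt")
    # Assumes the Xpdf command-line tool `pdftotext` is available.
    subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)],
        check=True,
    )
    return txt_path


def pdf_for_txt(txt_path: Path, pdf_dir: Path) -> Path:
    """Map a TXT file back to its source PDF purely by file name."""
    return pdf_dir / (txt_path.stem + ".pdf")


def tokenize(text: str) -> list:
    """Lowercase word tokenizer (a stand-in for a real Lucene analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_index(txt_dir: Path) -> dict:
    """Build a toy inverted index: term -> set of TXT file names."""
    index = defaultdict(set)
    for txt_path in sorted(txt_dir.glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        for term in tokenize(text):
            index[term].add(txt_path.name)
    return index


def search(index: dict, keyword: str, txt_dir: Path, pdf_dir: Path) -> list:
    """Return the original PDF paths whose extracted text contains the keyword."""
    hits = index.get(keyword.lower(), set())
    return [pdf_for_txt(txt_dir / name, pdf_dir) for name in sorted(hits)]


def highlight(text: str, keyword: str, pre: str = "<b>", post: str = "</b>") -> str:
    """Wrap every case-insensitive occurrence of the keyword in marker tags."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)
```

In a real deployment, `build_index` and `search` would presumably be replaced by Lucene's own indexing and query classes, with the TXT file name stored as a field on each indexed document so that a hit can be resolved to its PDF in the same way as `pdf_for_txt` does here.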
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
