
PDF文件信息的抽取与分析 (Extraction and Analysis of Information from PDF Files)

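Cite this article:	李珍, 田学东. PDF文件信息的抽取与分析[J]. 计算机应用, 2003, 23(12): 145-147

Authors:	李珍 (LI Zhen), 田学东 (TIAN Xue-dong)

Affiliation:	College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China

Funding:	Natural Science Foundation of Hebei Province (602127)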

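Abstract (translated from the Chinese):	PDF files are an important resource for extracting information from the Web. Starting from an analysis of the PDF file structure and targeting linearized PDF files, the most widely used kind, the paper first describes how to pull the content string streams out of the file's source and decode them. It then discusses in detail how to extract the text from the decoded streams together with its associated information, such as font, font size, and line breaks. This lays the groundwork for further extracting information from PDF files as needed.
Keywords:	information extraction; PDF files; text information analysis
Article ID:	1001-9081(2003)12-0145-03
Revised:	2003-06-19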
Extraction and Analysis of Information from PDF Files

LI Zhen, TIAN Xue-dong. Extraction and Analysis of Information from PDF Files[J]. Journal of Computer Applications, 2003, 23(12): 145-147

Authors:	LI Zhen, TIAN Xue-dong

Abstract:	PDF files are an important resource for information extraction from the Internet. Based on an analysis of the PDF file structure, this article discusses methods for extracting text and related information, such as font, font size, and line breaks, from linearized PDF files, the most widely used kind. This will help in extracting further information from PDF files as needed.

Keywords:	information extraction; PDF files; analysis of text information
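
The two steps named in the abstract (decoding the content streams, then reading text with its font, font size, and line breaks) can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes streams are either unfiltered or FlateDecode-compressed, that text uses literal `(...)` strings in a single-byte encoding, and that line breaks are signalled only by the `T*`, `Td`, `TD`, `'` and `"` operators (the `Tm` matrix and hex strings are ignored). The function names `decoded_streams`, `unescape`, and `text_runs` are hypothetical, and the sketch does not use the linearization tables at all.

```python
import re
import zlib

# Stream bodies are located by keyword only; a real parser should honour /Length.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\nendstream", re.S)
TOKEN_RE = re.compile(
    rb"(?P<string>\((?:\\.|[^\\()])*\))"        # literal string, no nested parentheses
    rb"|(?P<name>/[^\s/\[\]()<>]+)"             # name such as /F1
    rb"|(?P<number>[-+]?(?:\d+\.?\d*|\.\d+))"   # integer or real number
    rb"|(?P<op>[A-Za-z]+\*?|'|\")",             # operator such as Tf, Tj, T*
    re.S,
)


def decoded_streams(pdf_bytes):
    """Yield every stream body, inflated when it turns out to be zlib data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a stray end-of-line byte after the data
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # assumption: the stream was stored without a filter


def unescape(literal):
    """Strip the outer parentheses; resolve escaped parentheses and backslashes only."""
    return re.sub(rb"\\([()\\])", rb"\1", literal[1:-1]).decode("latin-1")


def text_runs(content):
    """Return (line, font, size, text) tuples found in one decoded content stream."""
    font, size, line = None, None, 0
    operands, runs = [], []
    for match in TOKEN_RE.finditer(content):
        kind, token = match.lastgroup, match.group(0)
        if kind != "op":
            operands.append((kind, token))  # collect operands until an operator arrives
            continue
        strings = [unescape(t) for k, t in operands if k == "string"]
        numbers = [float(t.decode("ascii")) for k, t in operands if k == "number"]
        names = [t[1:].decode("latin-1") for k, t in operands if k == "name"]
        if token == b"Tf" and names and numbers:
            font, size = names[-1], numbers[-1]  # /Font size Tf
        elif token == b"T*":
            line += 1
        elif token in (b"Td", b"TD") and len(numbers) >= 2 and numbers[-1] != 0:
            line += 1  # a non-zero vertical offset is treated as a line break
        elif token in (b"'", b'"'):
            line += 1  # these operators move to the next line before showing text
        if token in (b"Tj", b"TJ", b"'", b'"') and strings:
            runs.append((line, font, size, "".join(strings)))
        operands.clear()
    return runs
```

Feeding each result of `decoded_streams(open(path, "rb").read())` to `text_runs` gives one tuple per text-showing operator; streams that are not page content (fonts, images) simply produce no useful tuples and would need filtering in practice.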
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
