
Study on Tables Information Extraction Based on Web

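Cite this article:	QIN Zhen-hai, TAN Shou-biao, XU Chao. Study on Tables Information Extraction Based on Web[J]. Microcomputer Development, 2010(2): 217-220.

Authors:	QIN Zhen-hai, TAN Shou-biao, XU Chao

Affiliation:	School of Electronic Science and Technology, Anhui University

Funding:	Key Natural Science Research Project of Anhui Province (2005KJ004ZD)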

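Abstract:	The Web has become the main platform for online information. Studies have found that tables are used frequently in Web documents. Because tables are concise in form and rich in information, automatic table understanding is widely useful in applications such as knowledge management, information retrieval, and Web mining, so research on Web table information extraction has real practical significance. A large amount of information on the Internet is presented in HTML tables, but because HTML does not describe the content of the data, machines cannot understand or query it. The paper first converts HTML documents into XML documents, combines them with an ontology to form heuristic rules, and analyzes two key techniques: table location and table structure recognition. On this basis, it uses HTML table attributes to normalize HTML tables, making the approach applicable to information extraction from complex tables.
Keywords:	HTML tables; information extraction; Web; XML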
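
The abstract's last step, normalizing HTML tables by means of their table attributes, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the relevant attributes are `rowspan` and `colspan`, that the input holds a single, non-nested table, and that normalization means copying each spanning cell into every grid position it covers. The names `TableGridParser` and `normalize_table` are hypothetical.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect table cells row by row as [text, rowspan, colspan]."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cells per <tr>
        self._cell = None   # cell currently being read, or None

    def _flush(self):
        # Store the open cell, if any (also covers omitted </td> or </th>).
        if self._cell is not None and self.rows:
            self._cell[0] = self._cell[0].strip()
            self.rows[-1].append(self._cell)
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush()
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._flush()
            a = dict(attrs)
            # Missing or empty span attributes default to 1.
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._flush()


def normalize_table(html):
    """Expand rowspan/colspan into a rectangular grid of cell texts."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, column) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:   # position already taken by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions no cell covers are filled with an empty string.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Calling `normalize_table` on a table whose header uses `rowspan="2"` or `colspan="2"` returns a list of equal-length rows in which the spanning header text is repeated for each covered position, so that every data cell can be paired with the header cells in its column. Real Web pages would additionally need the table location and structure recognition steps that the abstract mentions.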
Study on Tables Information Extraction Based on Web

QIN Zhen-hai, TAN Shou-biao, XU Chao. Study on Tables Information Extraction Based on Web[J]. Microcomputer Development, 2010(2): 217-220.

Authors:	QIN Zhen-hai, TAN Shou-biao, XU Chao

Affiliation:	QIN Zhen-hai, TAN Shou-biao, XU Chao (Department of Electronic Science and Technology, Anhui University, Hefei 230039, China)

Abstract:	The Web has become the main information resource. Studies show that tables are used frequently in web documents. Since tables are inherently concise as well as information-rich, the automatic understanding of tables has many applications, including knowledge management, information retrieval, and web mining. The study of web-based table information extraction therefore has important practical significance. A large amount of information available on the web is formatted in HTML tables, which are not content-or...

Keywords:	HTML tables; information extraction; Web; XML
This article is indexed in CNKI, VIP, and other databases.
