Research on a Web Information Extraction Method Based on Table Semantics

Affiliation:	Department of Computer Engineering, Guangzhou City Polytechnic, Guangzhou, Guangdong 510405

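Abstract (translated from the Chinese):	Tables are a commonly used element in Web pages. This paper proposes a method of information extraction based on table semantics. First, a measure of semantic similarity between phrases is proposed. Phrase similarity is then used to identify the table's title row (or column) and to compute the correspondence between table rows (or columns) and the fields to be extracted. Finally, the overall semantics of the table are computed to measure how relevant the table is to the content to be extracted.
Keywords (translated from the Chinese):	Web information extraction; table; phrase semantics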
A Research on the Method of Web Information Extraction Based on Table Semantic

YU Cheng-Jian. A Research on the Method of Web Information Extraction Based on Table Semantic[J]. Digital Community & Smart Home, 2008, 0(12)

Authors:	YU Cheng-Jian

Abstract:	The table tag is often used in Web pages. This paper presents a method of Web information extraction based on table semantics. First, a method for calculating the semantic similarity between two phrases is proposed. This similarity is then used to determine the title row or column of the table and, at the same time, the correspondence between the titles and the fields to be extracted. Finally, based on the titles of the table, a simple method is presented for calculating the correlation between the table and the content to be extracted.

Keywords:	Web information extraction; table; phrase semantics
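
The three steps outlined in the abstract (phrase similarity, title row/column detection with field matching, and a table-level relevance score) can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not give the similarity formula, so a plain Jaccard overlap of word sets stands in for it here. The function names, the restriction to the first row and first column as title candidates, and the averaging used for the relevance score are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def phrase_similarity(a: str, b: str) -> float:
    """Placeholder measure (assumption): Jaccard overlap of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_fields(cells: List[str], fields: List[str]) -> Dict[str, Tuple[int, float]]:
    """For each target field, return (index of best-matching cell, similarity)."""
    result = {}
    for field in fields:
        scores = [phrase_similarity(cell, field) for cell in cells]
        best = max(range(len(cells)), key=scores.__getitem__)
        result[field] = (best, scores[best])
    return result


def analyze_table(table: List[List[str]], fields: List[str]):
    """Choose the title row or column and score the table's relevance.

    Assumes a non-empty rectangular table and a non-empty field list.
    Only the first row and the first column are tried as title candidates.
    Returns (orientation, field-to-cell mapping, relevance in [0, 1]).
    """
    candidates = {"row": table[0], "column": [row[0] for row in table]}
    best = None
    for orientation, cells in candidates.items():
        mapping = match_fields(cells, fields)
        # Relevance (assumption): mean of the best similarity per field.
        relevance = sum(score for _, score in mapping.values()) / len(fields)
        if best is None or relevance > best[2]:
            best = (orientation, mapping, relevance)
    return best


if __name__ == "__main__":
    table = [
        ["Product name", "Unit price", "Stock"],
        ["Pen", "1.20", "300"],
        ["Ink", "4.50", "120"],
    ]
    orientation, mapping, relevance = analyze_table(table, ["name", "price"])
    print(orientation, mapping, relevance)
```

In this sketch a table whose relevance falls below some chosen threshold would simply be skipped; the threshold itself is left open because the abstract does not specify one.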
This article is indexed in CNKI and other databases.