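Research on a Web Information Extraction Method Based on Table Semantics
Author affiliation: Department of Computer Engineering, Guangzhou City Polytechnic (广州城市职业学院), Guangzhou, Guangdong 510405
Abstract: Tables are a commonly used element in Web pages. This paper proposes an information extraction method based on table semantics. First, a measure of semantic similarity between phrases is proposed. Phrase similarity is then used to identify the header row (or column) of a table and to compute the correspondence between table rows (or columns) and the fields to be extracted. Finally, the overall semantics of the table are computed to measure how relevant the table is to the content to be extracted.

Keywords: Web information extraction; table; phrase semantics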

A Research on the Method of Web Information Extraction Based on Table Semantic
YU Cheng-Jian. A Research on the Method of Web Information Extraction Based on Table Semantic[J]. Digital Community & Smart Home, 2008, 0(12)
Authors:YU Cheng-Jian
Abstract: The table tag is often used in web pages. This paper presents a method of web information extraction based on table semantics. First, a method for calculating the semantic similarity between two phrases is proposed. This similarity is then used to determine the title row or column of the table, and the correspondence between titles and extraction fields is determined at the same time. Based on the titles of the table, a simple method is presented for calculating the correlation between the table and the content to be extracted.
Keywords: web information extraction; table; phrase semantics
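
The three steps named in the abstract (phrase similarity, header detection with field mapping, and an overall table relevance score) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual similarity measure, so the sketch substitutes a character-based ratio from the standard library's `difflib`; this is a stand-in, not a semantic measure. The function names, the 0.5 threshold, and the choice to consider only the first row and first column as header candidates are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def phrase_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]. Character-based, NOT the paper's semantic measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(cell: str, fields: list[str]) -> tuple[str, float]:
    """Return the extraction field most similar to a cell, with its score."""
    return max(((f, phrase_similarity(cell, f)) for f in fields), key=lambda p: p[1])


def header_score(cells: list[str], fields: list[str]) -> float:
    """Average best-match similarity of a candidate header line against the fields."""
    if not cells:
        return 0.0
    return sum(best_match(c, fields)[1] for c in cells) / len(cells)


def analyze_table(table: list[list[str]], fields: list[str], threshold: float = 0.5):
    """Pick the header orientation, map header cells to fields, and score relevance.

    Assumption: only the first row and the first column are header candidates.
    """
    first_row = table[0]
    first_col = [row[0] for row in table]
    row_score = header_score(first_row, fields)
    col_score = header_score(first_col, fields)

    if row_score >= col_score:
        orientation, headers, relevance = "row", first_row, row_score
    else:
        orientation, headers, relevance = "column", first_col, col_score

    # Keep only header cells whose best field match clears the (assumed) threshold.
    mapping = {}
    for index, header in enumerate(headers):
        field, score = best_match(header, fields)
        if score >= threshold:
            mapping[index] = field
    return orientation, mapping, relevance


# Hypothetical example data, not taken from the paper.
table = [
    ["Name", "Price", "Stock"],
    ["Laptop", "999", "5"],
    ["Phone", "499", "12"],
]
fields = ["name", "price"]
orientation, mapping, relevance = analyze_table(table, fields)
```

In this sketch the relevance score is simply the header line's average similarity to the wanted fields; a table whose score falls below some cutoff would be skipped. Replacing `phrase_similarity` with a genuine semantic measure would leave the rest of the structure unchanged.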
This article is indexed in CNKI and other databases.