
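Information Extraction Based on Web Page Structure Mining (基于网页结构挖掘的信息提取)
Cite this article: 李媛 (LI Yuan), 耿桦 (GENG Hua), 张甍 (ZHANG Meng), 潘金贵 (PAN Jin-Gui). Information Extraction Based on Web Page Structure Mining [J]. Computer Science (计算机科学), 2006, 33(3): 191-193
Authors: 李媛 (LI Yuan), 耿桦 (GENG Hua), 张甍 (ZHANG Meng), 潘金贵 (PAN Jin-Gui)
Affiliation (all authors): State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Abstract: This paper proposes two fine-grained information extraction methods based on Web page structure mining, compares their advantages and disadvantages, and presents performance tests and an analysis of the results for the corresponding implementations.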

Keywords: information extraction; Web page structure mining; repeated patterns; temporal features; RSS

Extracting Information by Mining Structures of Web Pages
LI Yuan,GENG Hua,ZHANG Meng,PAN Jin-Gui. Extracting Information by Mining Structures of Web Pages[J]. Computer Science, 2006, 33(3): 191-193
Authors: LI Yuan, GENG Hua, ZHANG Meng, PAN Jin-Gui
Affiliation: State Key Laboratory for Novel Software Technology of Nanjing University, Multimedia Technology Institute of Nanjing University, Nanjing 210093
Abstract: To simplify the task of obtaining information from the vast number of sources available on the WWW, we have developed two methods for fine-grained information extraction. This paper first describes the principles of the two methods, both of which work by mining the structure of Web pages, and then compares their advantages and disadvantages. Finally, we test the performance of both methods and analyze the experimental results.
Keywords: information extraction; Web page structure mining; repeated patterns; temporal features; RSS
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.