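Video Metadata Extraction System Based on DOM Tree

Cite this article:	TANG Chao-wei, LI Jun, MIAO Guang-sheng, DU Xin-hui. Video Metadata Extraction System Based on DOM Tree[J]. Computer Engineering, 2012, 38(8): 268-270

Authors:	TANG Chao-wei, LI Jun, MIAO Guang-sheng, DU Xin-hui

Affiliations:	1. College of Communication Engineering, Chongqing University, Chongqing 400044; 2. High Performance Network Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190

Funding:	National Science and Technology Major Project (2011ZX002-4, 2011ZX03002-005-02); Chongqing University Graduate Education Reform Fund (2010JGXM015)

Abstract:	Most current extraction methods target the main topic information block and do not reach the individual information blocks inside it. To address this, a video metadata extraction system based on the DOM tree is designed. The system improves the link filtering function and the URL queue management strategy of Heritrix and, using the node types of the Web page DOM tree, extracts page metadata from each individual information block. Experimental results show an average Web page precision of 95.7% and an average extraction accuracy of 98.4%, higher than similar systems.
Keywords:	Web crawler; information collection; URL scheduling; incremental update; DOM tree
Received:	2011-06-23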
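
The abstract mentions an improved link filter and URL queue management strategy for Heritrix but gives no details. As a rough illustration only, the idea can be sketched in Python rather than through the Heritrix Java API. The URL patterns, the two priority levels, and the `Frontier` class are assumptions made for this sketch, not the authors' design.

```python
import heapq
import re
from urllib.parse import urldefrag, urljoin

# Hypothetical patterns: which links count as video detail pages or listing pages.
DETAIL_RE = re.compile(r"/video/\d+\.html$")
LISTING_RE = re.compile(r"/list/[\w-]+/?$")


class Frontier:
    """URL queue that filters links and serves detail pages first."""

    def __init__(self):
        self._heap = []     # entries are (priority, sequence, url)
        self._seen = set()  # URLs already scheduled, for de-duplication
        self._seq = 0       # tie-breaker that keeps FIFO order within a priority

    def offer(self, base_url, href):
        """Resolve href against base_url and queue it if it passes the filter."""
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url in self._seen:
            return False
        if DETAIL_RE.search(url):
            priority = 0    # detail pages carry the metadata, so crawl them first
        elif LISTING_RE.search(url):
            priority = 1    # listing pages only lead to more detail pages
        else:
            return False    # filtered out: matches neither pattern
        self._seen.add(url)
        heapq.heappush(self._heap, (priority, self._seq, url))
        self._seq += 1
        return True

    def next_url(self):
        """Pop the highest-priority URL, or return None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A crawler loop would call `offer` for every link found on a fetched page and `next_url` to pick the next page to fetch. Filtering before queuing keeps irrelevant pages out of the queue entirely.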
Video Metadata Extraction System Based on DOM Tree

TANG Chao-wei, LI Jun, MIAO Guang-sheng, DU Xin-hui. Video Metadata Extraction System Based on DOM Tree[J]. Computer Engineering, 2012, 38(8): 268-270

Authors:	TANG Chao-wei, LI Jun, MIAO Guang-sheng, DU Xin-hui

Affiliation:	1. College of Communication Engineering, Chongqing University, Chongqing 400044, China; 2. High Performance Network Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China

Abstract:	Most existing extraction methods focus on extracting the main topic information block and pay no attention to the individual information blocks within it. To solve this problem, a video metadata extraction system based on the DOM tree is proposed. By improving the link filtering function and the URL queue management strategy of Heritrix, and by using the node types of the Web page DOM tree, the system extracts Web page metadata from each individual information block. Experimental results show that the system achieves an average Web page precision of 95.7% and an average extraction accuracy of 98.4%, both higher than those of similar systems.

Keywords:	Web crawler; information collection; URL schedule; incremental update; DOM tree
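
The abstract describes extracting metadata from individual information blocks by using DOM node types. A minimal Python sketch of that idea follows. It assumes well-formed markup so that the standard-library `xml.dom.minidom` parser can build the tree, whereas real Web pages would need a tolerant HTML parser. The class names in `FIELD_BY_CLASS` and the sample page are hypothetical and are not taken from the paper.

```python
from xml.dom import Node, minidom

# Hypothetical mapping from a block's class attribute to a metadata field.
FIELD_BY_CLASS = {"v-title": "title", "v-director": "director", "v-year": "year"}


def text_of(node):
    """Concatenate the text nodes beneath a node, dispatching on node type."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    if node.nodeType == Node.ELEMENT_NODE:
        return "".join(text_of(child) for child in node.childNodes)
    return ""  # comments and other node types contribute no text


def extract_metadata(markup):
    """Walk the DOM tree depth-first and read each labelled information block."""
    doc = minidom.parseString(markup)
    metadata = {}
    stack = [doc.documentElement]
    while stack:
        node = stack.pop()
        if node.nodeType != Node.ELEMENT_NODE:
            continue  # skip text and comment nodes between blocks
        field = FIELD_BY_CLASS.get(node.getAttribute("class"))
        if field:
            # Keep the first block seen for each field, with whitespace normalized.
            metadata.setdefault(field, " ".join(text_of(node).split()))
        else:
            # Not a labelled block: descend into its children in document order.
            stack.extend(reversed(node.childNodes))
    return metadata


PAGE = """<div class="detail">
  <h1 class="v-title">Sample <em>Video</em></h1>
  <!-- advertisement block -->
  <ul>
    <li class="v-director">Director: A. Person</li>
    <li class="v-year">2011</li>
  </ul>
</div>"""

print(extract_metadata(PAGE))
```

Checking `nodeType` lets the walk ignore comment nodes and whitespace-only text nodes between blocks. It still collects text nested inside inline elements such as `<em>`.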
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.