
A Novel Method to Extract English Examination Papers from Web Pages Based on Template
Original title: 基于模板法的网页英语试卷自动抽取技术的研究
Cite this article: Xiong Huihui, Ouyang Jun. A Novel Method to Extract English Examination Papers from Web Pages Based on Template[J]. Computer and Digital Engineering, 2009, 37(4): 50-52.
Authors: Xiong Huihui (熊惠荟), Ouyang Jun (欧阳君)
Affiliation: Research Institute of Information Storage and Film Technology, Huazhong University of Science and Technology, Wuhan 430074
Abstract: To address the problem of building a massive database for an online examination system, a template-based Web information extraction method is used to extract the main text from similar web pages. Extraction rules for the main text are formulated according to the characteristics of web pages that contain English examination papers, so that complete English papers and their answers can be obtained. Experimental results indicate that the method achieves high accuracy and extraction speed.

Keywords: Web; information extraction; DOM extraction rules; template
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.