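Research on Rapid Information Extraction in Web Content Security

Cite this article:	ZHANG Chi, LUO Sen-lin. Research on Rapid Information Extraction in Web Content Security[J]. Netinfo Security, 2012, 0(10): 20-22

Authors:	ZHANG Chi (张驰), LUO Sen-lin (罗森林)

Affiliation:	Information System and Security & Countermeasures Experimental Center, Beijing Institute of Technology, Beijing 100081

Abstract:	This article proposes a text information extraction method based on the features of static web pages. The method first uses URL features to decide whether a page is static, then uses the structural and content features of static pages to extract the title and the body text, and finally stores the results sequentially in a unified format for later processing. Experimental results show that the recall and precision of web content extraction are 96.2% and 95.9%, respectively. The method has a low computational cost, extracts quickly, and is highly accurate, so it can be applied in practice to large-scale web content security analysis.
Keywords:	information extraction; web content; static web page; text information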
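
The abstract names three stages (a URL-based static-page test, title and body extraction, and storage in a unified format) but does not give the concrete features used. The following is a minimal sketch under stated assumptions, not the authors' implementation: the suffix list, the use of `<title>` and `<p>` elements as structural cues, and the names `is_static_url`, `TitleBodyParser`, and `extract_record` are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed URL feature: a static page has one of these suffixes and no query string.
STATIC_SUFFIXES = (".html", ".htm", ".shtml")


def is_static_url(url):
    """Heuristic static-page test based only on the URL (an assumption)."""
    parts = urlparse(url)
    return parts.query == "" and parts.path.lower().endswith(STATIC_SUFFIXES)


class TitleBodyParser(HTMLParser):
    """Collects <title> text and <p> text, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._skip_depth = 0
        self._current = []

    def flush(self):
        # Normalize whitespace and keep the paragraph only if it is non-empty.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            self.flush()  # an unclosed previous <p> ends here
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p":
            self.flush()
            self._in_p = False

    def handle_data(self, data):
        if self._skip_depth > 0:
            return
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_p:
            self._current.append(data)


def extract_record(url, html):
    """Return a uniform record for a static page, or None for other URLs."""
    if not is_static_url(url):
        return None
    parser = TitleBodyParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # capture a trailing unclosed <p>
    title = " ".join("".join(parser.title_parts).split())
    return {"url": url, "title": title, "body": "\n".join(parser.paragraphs)}
```

Calling `extract_record(url, html)` on each crawled page and appending the returned dictionaries to a list, in crawl order, would give one possible reading of "stored sequentially in a unified format". Pages whose URL fails the test are skipped before any parsing, which is where the low computational cost of such a scheme would come from.
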
Research on Rapid Information Extraction in Web Content Security

ZHANG Chi, LUO Sen-lin. Research on Rapid Information Extraction in Web Content Security[J]. Netinfo Security, 2012, 0(10): 20-22

Authors:	ZHANG Chi, LUO Sen-lin

Affiliation:	(Information System and Security & Countermeasures Experimental Center, Beijing Institute of Technology, Beijing 100081, China)

Abstract:	This paper proposes a text information extraction algorithm based on the characteristics of static web pages. The method first recognizes static web pages from their URL features, then extracts the title and body text according to structure and content features, and stores the results sequentially in a standard format. Experimental results show that the algorithm reaches a recall of 96.2% and a precision of 95.9%. The method requires little computation and works quickly and accurately, so it can be applied directly to large-scale analysis of web content security.

Keywords:	information extraction; Web content; static Web page; text information
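
The abstract reports recall and precision without stating how they were counted. As a reminder of the standard definitions, the sketch below computes both from a set of extracted items and a set of items that should have been extracted. The function name `precision_recall` and the toy item labels are hypothetical, and the unit being counted (pages, text blocks, or something else) is an assumption that the abstract does not settle.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for two sets of item identifiers.

    precision = |extracted ∩ relevant| / |extracted|
    recall    = |extracted ∩ relevant| / |relevant|
    An empty denominator yields 0.0 by convention here.
    """
    extracted = set(extracted)
    relevant = set(relevant)
    correct = len(extracted & relevant)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Toy illustration only; these labels are not data from the paper.
precision, recall = precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"})
```

In the toy call, two of the three extracted items are correct and two of the four relevant items are found, so precision is 2/3 and recall is 1/2.
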
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).
