
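Large-scale network security entity recognition method based on Hadoop
Cite this article: QIN Ya (1, 2), SHEN Guowei (1, 2), YU Hongxing (1, 2). Large-scale network security entity recognition method based on Hadoop[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2019, 14(5): 1017-1025. DOI: 10.11992/tis.201809024
Authors: QIN Ya (1, 2), SHEN Guowei (1, 2), YU Hongxing (1, 2)
Affiliations: 1. Department of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China
Abstract: With the arrival of the big data era, accurately identifying network security entities in multi-source heterogeneous data has become a fundamental problem in constructing network security knowledge graphs. This paper therefore studies security entity recognition algorithms that support massive network data, focusing on text related to network security, to lay a foundation for building such knowledge graphs. To extract security entities efficiently and precisely from massive textual network data, the paper proposes an improved conditional random fields (CRF) algorithm based on the Hadoop distributed computing framework that partitions the data set effectively, achieving efficient and accurate recognition of security entities. Experiments on large-scale real network data sets show that the proposed algorithm achieves high accuracy in network security entity recognition while also improving recognition efficiency.

Keywords: big data; heterogeneous data; network security; knowledge graph; security entity; entity recognition; network data; Hadoop; CRF algorithm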

Large-scale network security entity recognition method based on Hadoop
QIN Ya (1, 2), SHEN Guowei (1, 2), YU Hongxing (1, 2). Large-scale network security entity recognition method based on Hadoop[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 1017-1025. DOI: 10.11992/tis.201809024
Authors: QIN Ya (1, 2), SHEN Guowei (1, 2), YU Hongxing (1, 2)
Affiliation: 1. Department of Computer Science and Technology, Guizhou University, Guiyang 550025, China; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
Abstract: In this era of big data, a fundamental problem in constructing network security knowledge graphs is how to efficiently and accurately identify the network security entities present in multi-source heterogeneous data. This study focuses on text data related to network security and investigates a security entity recognition algorithm that supports massive volumes of network text data, thereby laying a foundation for building the network security knowledge graph. To efficiently and accurately extract the security entities in such data, we propose an improved conditional random fields (CRF) algorithm based on the Hadoop distributed computing framework that segments data sets effectively and thereby realizes efficient and accurate recognition of security entities. The experimental results show that the proposed security entity recognition algorithm achieves high accuracy on a large-scale real network data set and improves the efficiency of network security entity recognition.
Keywords:big data   heterogeneous data   network security   knowledge graph   security entity   entity recognition   network data   Hadoop   CRF algorithm
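
Note: For readers unfamiliar with the model named in the abstract, the standard linear-chain CRF can be written as below. This is the textbook formulation only; the abstract does not specify how the improved, Hadoop-based variant differs from it, so the symbols here (feature functions f_k, weights λ_k, sequence length T, feature count K) are illustrative assumptions and not the authors' notation.

```latex
% Standard linear-chain CRF (textbook form, not the paper's improved variant).
% x: input token sequence, y: label sequence (e.g. entity tags),
% f_k: feature functions, \lambda_k: learned weights.
\[
  p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Bigl( \sum_{t=1}^{T} \sum_{k=1}^{K}
      \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Bigr),
\]
\[
  Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\Bigl( \sum_{t=1}^{T} \sum_{k=1}^{K}
      \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Bigr).
\]
```

Here Z(x) normalizes over all candidate label sequences y', so that the probabilities of the possible taggings of a sentence sum to one.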