Similar literature
1.
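A new semi-supervised intrusion detection algorithm   Total citations: 3 (self-citations: 0, citations by others: 3)
Song Ling, Li Meiyi, Li Xiaoyuan. Journal of Computer Applications (《计算机应用》), 2008, 28(7): 1781-1783
To address the low accuracy of intrusion detection algorithms based on unsupervised learning and the difficulty of obtaining training samples for those based on supervised learning, this paper proposes a semi-supervised intrusion detection algorithm built on K-means improved by particle swarm optimization. A small amount of labeled data is used to generate a model of correctly labeled samples that guides the clustering of a large amount of unlabeled data; data that still cannot be labeled after this clustering are then clustered with particle-swarm-optimized K-means. This effectively improves the classifier's accuracy and enables detection of new types of attack. Experimental results show that the algorithm's overall detection performance is clearly better than that of detection algorithms based on unsupervised or supervised learning.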
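
The idea of letting a few labeled samples guide the clustering of unlabeled data can be sketched as a seeded K-means. This is only an illustration under stated assumptions: the function names (`seeded_kmeans`, `mean`, `nearest`) are hypothetical, the particle swarm optimization step described in the abstract is omitted, and the paper's actual procedure may differ.

```python
import math

def mean(points):
    # Component-wise mean of a non-empty list of equal-length tuples.
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(point, centroids):
    # Label of the centroid closest to `point` (Euclidean distance).
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

def seeded_kmeans(labeled, unlabeled, iterations=10):
    """labeled: list of (point, label); unlabeled: list of points.

    Assumption: one cluster per label, initialised from the labeled seeds.
    """
    seeds = {}
    for point, label in labeled:
        seeds.setdefault(label, []).append(point)
    centroids = {label: mean(pts) for label, pts in seeds.items()}
    assignment = []
    for _ in range(iterations):
        # Assign every unlabeled point to the nearest centroid.
        assignment = [nearest(p, centroids) for p in unlabeled]
        # Recompute centroids from the seeds plus the assigned points;
        # the seeds always keep their known label.
        members = {label: list(pts) for label, pts in seeds.items()}
        for p, label in zip(unlabeled, assignment):
            members[label].append(p)
        centroids = {label: mean(pts) for label, pts in members.items()}
    return centroids, assignment
```

Because the seeds stay fixed to their labels in every iteration, each cluster keeps a meaningful class name, which is what allows the clustering result to be read as a classification.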

2.
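An anomaly intrusion detection algorithm based on semi-clustering   Total citations: 2 (self-citations: 0, citations by others: 2)
Yu Yan, Huang Hao. Journal of Computer Applications (《计算机应用》), 2006, 26(7): 1640-1642
To address the shortage of training samples faced by intrusion detection algorithms based on supervised learning, this paper proposes an anomaly intrusion detection algorithm based on semi-supervised clustering combined with an improved k-nearest-neighbor method. A small amount of labeled data is used to improve the algorithm's learning ability, and detection of new attack types is achieved. Experimental results show that, with very little labeled data, the algorithm's detection results are clearly better than those of unsupervised learning algorithms and close to those of supervised detection algorithms.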

3.
Tax fraud is one of the substantial issues affecting governments around the world. It is defined as the intentional alteration of information provided on a tax return to reduce someone's tax liability. This is done by either understating sales or overstating purchases. According to recent studies, governments lose over $500 billion annually due to tax fraud. A loss of this magnitude motivates tax authorities worldwide to implement efficient fraud detection strategies. Most machine learning work on tax fraud is centered on supervised models. A significant drawback of this approach is that it requires tax returns that have been previously audited, which constitute a small percentage of the data. Other strategies use unsupervised models, which draw on the whole dataset when searching for patterns but ignore whether the tax returns are fraudulent or not. Unsupervised models are therefore of limited use if applied on their own to detect tax fraud. This paper addresses these limitations by proposing a fraud detection framework that uses supervised and unsupervised models to exploit the entire set of tax returns. The framework consists of four modules: a supervised module, which uses a tree-based model to extract knowledge from the data; an unsupervised module, which calculates anomaly scores; a behavioral module, which assigns a compliance score to each taxpayer; and a prediction module, which uses the output of the previous modules to output a probability of fraud for each tax return. We demonstrate the effectiveness of our framework by testing it on existing tax returns provided by the Saudi tax authority.

4.
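Current intrusion detection systems mainly use signature-based misuse detection. Intrusion detection methods based on data mining, which have appeared in recent years, depend on labeled training data to ensure detection performance, yet in real environments such training data is often hard to obtain. By comparison, unsupervised anomaly detection systems have a distinct advantage: they do not need large amounts of labeled training data marking the various attacks, and only need to find and define the normal classes. They can therefore discover new types of attack without any prior knowledge. This paper proposes a new method that uses a fuzzy adaptive resonance network (fuzzyART) to discover network intrusions, evaluates it on the KDDCUP99 test dataset, and confirms its effectiveness for network anomaly detection.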

5.
In this paper, we propose a method for network intrusion detection based on language models. Our method proceeds by extracting language features such as n-grams and words from connection payloads and applying unsupervised anomaly detection, without a prior learning phase or the presence of labeled data. The essential part of this procedure is linear-time computation of similarity measures between language models of connection payloads. Particular patterns in these models that are decisive for differentiating attacks from normal data can be traced back to attack semantics and used for automatic generation of attack signatures. Results of experiments conducted on two datasets of network traffic demonstrate the importance of high-order n-grams and variable-length language models for detection of unknown network attacks. An implementation of our system achieved detection accuracy of over 80% with no false positives on instances of recent remote-to-local attacks in HTTP, FTP and SMTP traffic.
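
As a rough illustration of comparing payloads through their n-gram models, the sketch below counts byte n-grams and measures a Manhattan distance between two count vectors. The function names are hypothetical and the choice of distance is an assumption; the abstract does not say which similarity measures or data structures the authors use to reach linear time.

```python
from collections import Counter

def ngram_counts(payload: bytes, n: int = 3) -> Counter:
    # Count every contiguous byte n-gram in the payload.
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))

def manhattan_distance(a: Counter, b: Counter) -> int:
    # Sum of absolute count differences over all n-grams seen in either model.
    # A Counter returns 0 for n-grams it has not seen.
    return sum(abs(a[g] - b[g]) for g in set(a) | set(b))
```

A payload whose distance to a profile of typical traffic is unusually large would be a candidate anomaly; how that profile is built is left open here.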

6.
In-operation construction vibration monitoring records inevitably contain various anomalies caused by sensor faults, system errors, or environmental influence. An accurate and efficient anomaly detection technique is essential for vibration impact assessment. Identifying anomalies using visualization tools is computationally expensive, time-consuming, and labor-intensive. In this study, an unsupervised approach for detecting anomalies in construction vibration monitoring data is proposed based on a temporal convolutional network and an autoencoder. Anomalies are detected autonomously on the basis of the reconstruction errors between the original and reconstructed signals. To reduce the false and missed detections caused by the great variability of vibration signals, an adaptive threshold method is applied to achieve the best identification performance. This method uses the log-likelihood of the reconstruction errors to search for an optimal coefficient for anomalies. A distributed training strategy is implemented on a cloud platform to speed up training and perform anomaly detection without significant time delay. Construction-induced accelerations measured by a real vibration monitoring system are used to evaluate the proposed method. Experimental results show that the proposed approach can detect anomalies with high accuracy, and that distributed training saves considerable training time, thereby enabling anomaly detection for online monitoring systems with large volumes of accumulated data.
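
Thresholding reconstruction errors can be sketched as follows. This is a deliberately simpler stand-in, not the log-likelihood search described in the abstract: it assumes the threshold is the mean error plus a fixed `coefficient` times the standard deviation, and the function name is hypothetical.

```python
import statistics

def flag_anomalies(errors, coefficient=2.0):
    """Flag reconstruction errors above mean + coefficient * std.

    Assumption: `coefficient` is chosen by hand here; the abstract describes
    searching for it via the log-likelihood of the errors instead.
    """
    mu = statistics.fmean(errors)
    sigma = statistics.pstdev(errors)
    threshold = mu + coefficient * sigma
    return threshold, [e > threshold for e in errors]
```

With only a handful of samples a large coefficient can never be exceeded, which is one reason a data-driven choice of the coefficient is attractive.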

7.
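Credit card fraud detection is an important problem. To improve the recognition rate on real-world credit card fraud data, a hybrid credit card fraud detection model, AWFD (Anomaly weight of credit card fraud detection), is proposed. It first uses anomaly detection to divide the data into trusted and anomalous data, then trains an ensemble model with a semi-supervised method, and finally applies anomaly detection again to remove anomalous results from the detection output. While preserving learning performance on trusted data, AWFD uses semi-supervised ensemble learning to exploit the anomalous data to further increase the diversity of the ensemble, and fuses anomaly detection with the ensemble model. Experimental results show that, compared with some traditional machine learning methods, AWFD improves the overall recognition rate of credit card fraud detection.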

8.
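Object detection and recognition models based on visual information often rely on viewpoint information from the training samples, but only a few databases provide training samples annotated with viewpoint information. When this information is missing, traditional general-purpose object detection systems usually estimate the viewpoint of each sample roughly with unsupervised learning methods. This paper adapts and introduces a selective transfer learning method, TransferBoost, to address missing viewpoint information. The TransferBoost method here is implemented on the GentleBoost framework and improves learning on samples of the current class by reusing prior information from samples of other classes. Given a well-annotated sample set as the source database, TransferBoost adjusts the weight of each sample and the weight of each source task simultaneously, achieving knowledge transfer at both the instance level and the task level. This two-level transfer extracts useful information more effectively from a dataset that mixes relevant and irrelevant source data. Experimental results show that, compared with directly applying traditional machine learning methods, the transfer learning method needs far fewer training samples, which lowers the training cost of object detection and recognition systems and broadens the range of applications of existing systems.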

9.
Online auction sites are a target for fraud due to their anonymity, number of potential targets and low likelihood of identification. Researchers have developed methods for identifying fraud. However, these methods must be individually tailored to each type of fraud, since the types differ in the characteristics important for their identification. Using supervised learning methods, it is possible to produce classifiers for specific types of fraud by providing a dataset where instances with behaviours of interest are assigned to a separate class. However, this requires multiple labelled datasets: one for each fraud type of interest. It is difficult to use real-world datasets for this purpose since they are hard to label, often limited in size, and contain zero or multiple suspicious behaviours that may or may not be under investigation. The aims of this work are to: (1) demonstrate the approach of using supervised learning together with a validated synthetic data generator to create fraud detection models that are experimentally more accurate than existing methods and that are effective on real data, and (2) evaluate a set of features for use in general fraud detection, which is shown to further improve the performance of the created detection models. The approach is as follows: the data generator is an agent-based simulation modelled on users in commercial online auction data. The simulation is extended with fraud agents that model a known type of online auction fraud called competitive shilling. These agents are added to the simulation to produce the synthetic datasets. Features extracted from this data are used as training data for supervised learning. Using this approach, we optimise an existing fraud detection algorithm and produce classifiers capable of detecting shilling fraud. Experimental results with synthetic data show the new models have significant improvements in detection accuracy. Results with commercial data show the models identify users with suspicious behaviour.

10.
Summarization is an important intermediate step for expediting knowledge discovery tasks such as anomaly detection. In the context of anomaly detection from data streams, the summary needs to represent both anomalous and normal data. But streaming data has distinct characteristics, such as the one-pass constraint, which make data mining operations difficult. Existing stream summarization techniques are unable to create summaries that represent both normal and anomalous instances. To address this problem, this paper designs and develops a number of hybrid summarization techniques that use the concept of a reservoir for anomaly detection from network traffic. Experimental results on thirteen benchmark data streams show that the summaries produced from streams using the pairwise distance (PSSR) and template matching (TMSSR) techniques retain more anomalies than existing stream summarization techniques, and that anomaly detection techniques can identify the anomalies with high true positive and low false positive rates.
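
The reservoir concept that these techniques build on can be illustrated with plain uniform reservoir sampling, which keeps a fixed-size sample in a single pass over the stream. This sketch is the classic baseline only; the PSSR and TMSSR techniques named in the abstract add pairwise-distance and template-matching criteria that are not reproduced here, and the function name is hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a one-pass stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Uniform sampling tends to drop rare anomalies, which is the limitation the hybrid techniques in the abstract set out to address.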

11.
Software defect detection aims to automatically identify defective software modules for efficient software testing in order to improve the quality of a software system. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, in a software system there are usually far fewer defective modules than defect-free modules, so learning has to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve detection accuracy and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect detection tasks show that Rocus is effective for software defect detection. Its performance is better than that of a semi-supervised learning method that ignores the class-imbalance nature of the task and of a class-imbalance learning method that does not make effective use of unlabeled data.
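
The under-sampling step can be sketched as random under-sampling of the larger classes down to the size of the smallest one. This shows only the class-balancing idea; how Rocus combines it with unlabeled data is not described in the abstract, and the function name and the fixed default seed are assumptions made for the example.

```python
import random

def undersample(samples, labels, rng=None):
    """Return (sample, label) pairs with every class cut to the minority size."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch repeatable
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Size of the smallest class determines how many items each class keeps.
    n = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, n):
            balanced.append((sample, label))
    return balanced
```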

12.
In this research, we propose two new clustering algorithms, the improved competitive learning network (ICLN) and the supervised improved competitive learning network (SICLN), for fraud detection and network intrusion detection. The ICLN is an unsupervised clustering algorithm that applies new rules to the standard competitive learning neural network (SCLN). The network neurons in the ICLN are trained to represent the center of the data by a new reward-punishment update rule. This new update rule overcomes the instability of the SCLN. The SICLN is a supervised version of the ICLN. In the SICLN, the new supervised update rule uses the data labels to guide the training process to achieve a better clustering result. The SICLN can be applied to both labeled and unlabeled data and is highly tolerant of missing or delayed labels. Furthermore, the SICLN is capable of reconstructing itself and is thus completely independent of the initial number of clusters. To assess the proposed algorithms, we have performed experimental comparisons on both research data and real-world data in fraud detection and network intrusion detection. The results demonstrate that both the ICLN and the SICLN achieve high performance, and that the SICLN outperforms traditional unsupervised clustering algorithms.

13.
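Network intrusion detection refers to methods for detecting behavior that endangers the security of computer systems, and it is an indispensable defense mechanism in computer network security. At present, network anomaly intrusion detection based on supervised learning offers high efficiency and accuracy; it has attracted wide attention and produced a large body of research. However, these methods need a large number of labeled samples for model training. To reduce the dependence on labeled samples, network intrusion detection techniques based on unsupervised or semi-supervised learning have been proposed and have gradually become a research focus in the field. Autoencoder-based network anomaly detection is a typical representative. This paper first introduces the basic principles, model structures, loss functions, and training methods of various autoencoders. On this basis, it divides the techniques into threshold-based and classification-based methods; threshold-based methods are further divided into those based on reconstruction error and those based on reconstruction probability. Because a suitable threshold is critical to the success of anomaly detection, the paper introduces three ways of computing the threshold. It then compares and analyzes the methods, performance, and innovations of several representative studies, describes open problems in this line of research, and looks ahead to future research directions.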

14.
Anomaly detection in resource-constrained wireless networks is an important challenge for tasks such as intrusion detection, quality assurance and event monitoring. The challenge is to detect these interesting events or anomalies in a timely manner while minimising energy consumption in the network. We propose a distributed anomaly detection architecture that uses multiple hyperellipsoidal clusters to model the data at each sensor node and to identify global and local anomalies in the network. In particular, a novel anomaly scoring method is proposed that gives each hyperellipsoidal model a score based on how remote the ellipsoid is from its neighbours. We demonstrate on several synthetic and real datasets that the proposed scheme achieves higher detection performance with a significant reduction in communication overhead compared to centralised and existing schemes.

15.
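Anomaly detection is one of the active research areas in machine learning and data mining, with applications mainly in fault diagnosis, intrusion detection, fraud detection, and similar fields. Much effective work already exists, especially anomaly detection methods based on isolation forest, but many difficulties remain when handling high-dimensional data. This paper proposes a new k-nearest-neighbor isolation forest anomaly detection algorithm: k-nearest neighbor based isolation forest (KNIF). The method uses hyperspheres as the isolation tool, builds the isolation forest using the k-th nearest neighbor, and constructs a distance-based method for computing anomaly scores. Extensive experiments show that KNIF can effectively detect anomalies under complex distributions and adapts to application scenarios with different forms of distribution.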
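
The k-th nearest neighbor distance that such a method relies on can be illustrated with a brute-force score: the farther a point's k-th neighbor, the more isolated the point. This is not KNIF itself, since the hypersphere-based forest construction is not specified in the abstract; it is a minimal distance-based sketch with a hypothetical function name.

```python
import math

def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest other point (brute force)."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, sorted ascending.
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores
```

The brute-force loop costs quadratic time in the number of points, so it is suitable only for small illustrative datasets.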

16.
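When a university network is accessed from external networks, the external access data carry no class labels, so the identifying features of the data are not distinct. Traditional intrusion detection models cannot effectively extract identifying features from unlabeled external access data and so cannot be trained accurately, which results in low intrusion detection accuracy for university networks. To solve this problem, an intrusion detection algorithm based on unsupervised immune optimization and hierarchical clustering is proposed: the data are learned in an immune network, a small-scale network is used to compress the data and concentrate and strengthen their identifying features, and hierarchical clustering is used to analyze the network and build the data model. Simulation experiments show that this unsupervised intrusion detection method overcomes the indistinct identifying features of external access data on university networks, improves intrusion detection accuracy, and achieves satisfactory results.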

17.
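Li Zhong, Jin Xiaolong, Zhuang Chuanzhi, Sun Zhi. Journal of Software (《软件学报》), 2021, 32(1): 167-193
In recent years, with the spread of Web 2.0, anomaly detection using graph mining techniques has attracted increasing attention. Graph anomaly detection plays an important role in fraud detection, intrusion detection, fake voting, zombie follower analysis, and other areas. Based on a broad survey of the domestic and international literature and the latest research results, graph-oriented anomaly detection is divided by data representation into two categories, anomaly detection on static graphs and anomaly detection on dynamic graphs, and, further, by anomaly type, static...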

18.
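Research on an intrusion detection method based on outlier detection   Total citations: 3 (self-citations: 0, citations by others: 3)
This paper proposes a kernel clustering intrusion detection method based on outlier detection. The basic idea is to first map the samples in the input space to a high-dimensional feature space, generate clusters by redefining the distance from data points to clusters in the feature space, determine the anomalous data classes according to the proportion N of normal classes, and then apply the result to detection on real data. The method converges faster and clusters more accurately, and it does not require manual or other means of classifying the training set. Experiments used the KDD99 test data, and the results show that the method can detect intrusions fairly effectively.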
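
For orientation, the standard way to measure the distance from a point to a cluster mean in a kernel-induced feature space, without computing the mapping explicitly, is shown below. The symbols are assumptions introduced for this note ($\phi$ for the feature map, $K$ for the kernel, $C$ for a cluster with mean $m_C$ in feature space); the abstract does not state how the paper redefines the distance, so this is the textbook expression rather than the authors' definition.

```latex
\[
\lVert \phi(x) - m_C \rVert^2
  = K(x, x)
  - \frac{2}{|C|} \sum_{x_i \in C} K(x, x_i)
  + \frac{1}{|C|^2} \sum_{x_i \in C} \sum_{x_j \in C} K(x_i, x_j),
\qquad
m_C = \frac{1}{|C|} \sum_{x_i \in C} \phi(x_i).
\]
```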

19.
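Unsupervised anomaly detection based on principal component analysis   Total citations: 5 (self-citations: 0, citations by others: 5)
Intrusion detection systems need large amounts of labeled supervised data for learning during training, which hinders their application and adoption. To solve this problem, an unsupervised anomaly detection method based on principal component analysis is proposed. Under the minimum mean squared error criterion it learns the main features of the samples; after the mutually inverse processes of compression and reconstruction it reproduces the sample information as fully as possible, so that anomalies can be detected from differences in mean squared error. Experiments on a simulation system built for this purpose show that the method can detect intrusions without prior involvement of experts, and the results confirm its effectiveness.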
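
The compress-and-reconstruct idea can be written compactly. The notation is an assumption made for this note and does not come from the paper: $\mu$ is the sample mean, $W_k$ is the matrix whose orthonormal columns are the top $k$ principal directions, and $e(x)$ is the reconstruction error used as an anomaly score.

```latex
\[
z = W_k^{\top} (x - \mu), \qquad
\hat{x} = \mu + W_k z, \qquad
e(x) = \lVert x - \hat{x} \rVert^2 .
\]
```

Samples that follow the dominant structure of the data are reconstructed with small $e(x)$, while samples that deviate from it give a large $e(x)$ and can be flagged once the error exceeds a chosen threshold.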

20.