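An outlier mining algorithm for high-dimensional data with mixed attributes   Total citations: 2, self-citations: 0, citations by others: 2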
李庆华  李新  蒋盛益 《计算机应用》2005,25(6):1353-1356
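Outlier detection is one of the most fundamental problems in data mining, with wide applications in fraud detection, weather forecasting, customer segmentation, and intrusion detection. To meet the needs of network intrusion detection, a new outlier mining algorithm based on clustering of mixed-attribute data is proposed. Starting from the observation that outliers are, in essence, the rare points of a dataset, new definitions of data similarity and outlier degree are given. The proposed algorithm has linear time complexity. Experiments on the KDDCUP99 and Wisconsin Prognosis Breast Cancer datasets show that the algorithm offers near-linear time complexity and good scalability while detecting the outliers in the datasets well.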

Identification of attacks by a network intrusion detection system (NIDS) is an important task. In signature- or rule-based detection, previously encountered attacks are modeled and signatures/rules are extracted. These rules are used to detect such attacks in the future. In an anomaly or outlier detection system, by contrast, the normal network traffic is modeled, and any deviation from the normal model is deemed to be an outlier/attack. Data mining and machine learning techniques are widely used in offline NIDS. Unsupervised and supervised learning techniques differ in the way the NIDS dataset is treated: unsupervised learning finds patterns in data and detects outliers, whereas supervised learning determines a learned function of the input features that generalizes over the data instances. The intuition is that if these two techniques are combined, better performance may be obtained. Hence, in this paper the advantages of unsupervised and supervised techniques are inherited in the proposed hierarchical model, which is organized into three stages to detect attacks in an NIDS dataset. The NIDS dataset is clustered using Dirichlet process (DP) clustering based on the underlying data distribution. Iteratively on each cluster, local denser areas are identified using the local outlier factor (LOF), which in turn is discretized into four bins of separation based on the LOF score. Further, in each bin the normal data instances are modeled using a one-class classifier (OCC). A combination of density estimation, reconstruction, and boundary methods is used for the OCC model. A product-rule combination of the three methods takes into consideration the strengths of each method in building a stronger OCC model. Any deviation from this model is considered an attack. Experiments are conducted on the KDD CUP'99 and SSENet-2011 datasets. The results show that the proposed model is able to identify attacks with a higher detection rate and fewer false alarms.

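How to automatically identify, classify, and grade the sensitive attributes (fields) of code-obfuscated structured datasets in production environments has become a bottleneck for privacy protection of structured data. An algorithm for automatic identification and grading of sensitive attributes in structured datasets is proposed. It defines attribute sensitivity using information entropy and, through sensitivity clustering and mining of association rules among attributes, identifies the sensitive attributes of an arbitrary structured dataset and quantifies their sensitivity. By analyzing the mutual-information correlation and the association rules among the attributes within a sensitive-attribute cluster, it groups the sensitive attributes and quantifies their average sensitivity, thereby achieving classification and grading of sensitive attributes. Experiments show that the algorithm can identify, classify, and grade the sensitive attributes of arbitrary structured datasets with higher efficiency and precision. Comparative analysis shows that it performs identification and grading at the same time, requires no prior knowledge of attribute characteristics or a dictionary of sensitive features, and accounts for both the correlations and the association relationships among attributes.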

Attribute subset selection based on rough sets is a crucial preprocessing step in data mining and pattern recognition to reduce the modeling complexity. To cope with the new era of big data, new approaches need to be explored to address this problem effectively. In this paper, we review recent work related to attribute subset selection in decision-theoretic rough set models. We also introduce a scalable implementation of a parallel genetic algorithm in Hadoop MapReduce to approximate the minimum reduct, which has the same discernibility power as the original attribute set in the decision table. Then, we focus on intrusion detection in computer networks and apply the proposed approach to four datasets with varying characteristics. The results show that the proposed model can be a powerful tool to boost the performance of identifying attributes in the minimum reduct in large-scale decision systems.

Multivariate outlier identification requires the choice of reliable cut-off points for the robust distances that measure the discrepancy from the fit provided by high-breakdown estimators of location and scatter. Multiplicity issues affect the identification of the appropriate cut-off points. It is described how a careful choice of the error rate which is controlled during the outlier detection process can yield a good compromise between high power and low swamping, when alternatives to the Family Wise Error Rate are considered. Multivariate outlier detection rules based on the False Discovery Rate and the False Discovery Exceedance criteria are proposed. The properties of these rules are evaluated through simulation. The rules are then applied to real data examples. The conclusion is that the proposed approach provides a sensible strategy in many situations of practical interest.
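
To make the contrast between Family Wise Error Rate control and False Discovery Rate control concrete, the following is a minimal sketch. It assumes each observation already has a p-value (for example, from comparing its robust distance with a reference distribution) and applies the standard Benjamini-Hochberg step-up rule next to a Bonferroni cut-off. The function names and the p-values are illustrative assumptions, and this is not the specific rule proposed in the abstract above.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices flagged by the Benjamini-Hochberg step-up rule (FDR level q)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Everything up to that rank is flagged as an outlier.
    return sorted(order[:cutoff_rank])


def bonferroni(p_values, alpha=0.05):
    """Indices flagged when the family-wise error rate is held at alpha."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Hypothetical per-observation p-values (assumed, not taken from the paper).
p_values = [0.001, 0.008, 0.012, 0.04, 0.2, 0.5, 0.7, 0.9, 0.03, 0.6]
fdr_flags = benjamini_hochberg(p_values)
fwer_flags = bonferroni(p_values)
```

On these made-up p-values the FDR rule flags more observations than the Bonferroni cut-off, which illustrates the gain in power that the abstract trades against swamping.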

Roth V 《Neural computation》2006,18(4):942-960
The problem of detecting atypical objects or outliers is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The method presented in this letter bridges the gap between kernelized one-class classification and Gaussian density estimation in the induced feature space. Having established the exact relation between the two concepts, it is now possible to identify atypical objects by quantifying their deviations from the Gaussian model. This model-based formalization of outliers overcomes the main conceptual shortcoming of most one-class approaches, which, in a strict sense, are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. In order to overcome the inherent model selection problem of unsupervised kernel methods, a cross-validated likelihood criterion for selecting all free model parameters is applied. Experiments for detecting atypical objects in image databases effectively demonstrate the applicability of the proposed method in real-world scenarios.

王美晶  叶东毅 《计算机应用》2012,32(Z1):139-143
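The particle swarm optimization (PSO) based outlier detection method recently proposed by Mohemmed et al. (MOHEMMED A, ZHANG M, BROWNE W. Particle swarm optimisation for outlier detection[C]//GECCO'10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. Portland, Oregon: ACM, 2010: 83-84) can exhibit an unreasonable mismatch between a particle's fitness value and the outlier degree of the corresponding data object. The causes of this phenomenon are analyzed and an improved fitness function is proposed. The new fitness function adjusts the penalty applied to unreasonable estimates of the neighborhood radius, thereby reducing the deviation between particle fitness and object outlier degree. The algorithm searches the solution space for a near-optimal particle to determine a suitable estimate of the neighborhood radius, and finally measures the outlier degree of each data object based on that estimate. Experiments on several UCI datasets show that the outlier detection algorithm using the new fitness function outperforms both the original method and the LOF method. The proposed algorithm not only resolves the problem described above but also detects outliers more effectively, which indicates that a well-defined fitness function helps improve detection quality.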

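A sample-adaptive-parameter outlier detection method suitable for correlated attributes   Total citations: 1, self-citations: 0, citations by others: 1
To address interference among correlated attributes in a dataset, the Mahalanobis distance is introduced and the traditional k-nearest-neighbor outlier detection method is improved, yielding a new sample-based parameter selection method. The method trains on the normal data and the outlier data in the dataset to obtain the optimal k-distance value and threshold. Simulation results show that the proposed algorithm achieves higher accuracy while reducing the false detection rate.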
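
The idea of scoring points by their k-nearest-neighbor distance under the Mahalanobis metric can be sketched as follows. This is a minimal illustration restricted to 2-D data so that the covariance matrix can be inverted by hand. The function names, the toy data, and the fixed k are assumptions, and the training step that selects k and the threshold from labeled samples is not reproduced here.

```python
import math


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of a list of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between a and b for a 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate data)
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case.
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    cov = covariance_2d(points)
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            mahalanobis_2d(p, q, cov) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


# Toy data (assumed): seven points along a correlated trend, plus (3, 0),
# which breaks the correlation between the two attributes.
points = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 5.8), (3, 0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = max(range(len(points)), key=lambda i: scores[i])
```

Because the metric rescales distances along the correlated direction, the point that violates the correlation receives the largest score even though it is not far from the others in raw coordinates.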

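A solution is proposed to the problem of using feature extraction to improve outlier detection performance in high-dimensional spaces. In recent years, traditional detection techniques have been unable to cope with high-dimensional data. An effective feature-extraction-based method, DROPT, is introduced. It integrates the ERE strategy and the APCDA method to regularize the eigenspace without feature loss and then reduce dimensionality, which can greatly improve outlier detection accuracy and, on this basis, also reduce the difficulty of detection. Experiments demonstrate that applying feature extraction to outlier detection in this way has practical value.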

Outlier detection algorithms are often computationally intensive because of their need to score each point in the data. Even simple distance-based algorithms have quadratic complexity. High-dimensional outlier detection algorithms such as subspace methods are often even more computationally intensive because of their need to explore different subspaces of the data. In this paper, we propose an exceedingly simple subspace outlier detection algorithm, which can be implemented in a few lines of code, whose complexity is linear in the size of the data set, and whose space requirement is constant. We show that this outlier detection algorithm is much faster than both conventional and high-dimensional algorithms and also provides more accurate results. The approach uses randomized hashing to score data points and has a neat subspace interpretation. We provide a visual representation of this interpretability in terms of outlier sensitivity histograms. Furthermore, the approach can be easily generalized to data streams, where it provides an efficient approach to discover outliers in real time. We present experimental results showing the effectiveness of the approach over other state-of-the-art methods.

Dynamic human shape in video contains rich perceptual information, such as the body posture, identity, and even the emotional state of a person. Human locomotion activities, such as walking and running, have familiar spatiotemporal patterns that can easily be detected in arbitrary views. We present a framework for detecting shape outliers for human locomotion using a dynamic shape model that factorizes the body posture, the viewpoint, and the individual’s shape style. The model uses a common embedding of the kinematic manifold of the motion and factorizes the shape variability with respect to different viewpoints and shape styles in the space of the coefficients of the nonlinear mapping functions that are used to generate the shapes from the kinematic manifold representation. Given a corrupted input silhouette, an iterative procedure is used to recover the body posture, viewpoint, and shape style. We use the proposed outlier detection approach to fill in the holes in the input silhouettes, and detect carried objects, shadows, and abnormal motions.

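In recent years, spectral clustering has begun to be applied in data mining with good results. Outlier data mining detects outliers in order to uncover useful knowledge. The NJW algorithm from spectral clustering is applied to outlier data mining and, combined with the concept of an outlier index, a spectral clustering algorithm suited to outlier data mining is proposed. Compared with existing clustering-based outlier detection algorithms, it offers better efficiency and adaptability. Experiments verify the effectiveness and feasibility of the proposed algorithm.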

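Dynamic detection of outliers in process control time series   Total citations: 1, self-citations: 0, citations by others: 1
To address the shortcomings of traditional wavelet-based outlier detection and the characteristics of the oscillating data that a control regulation system collects during its regulation phase, an online outlier detection method that combines an auto-regressive (AR) model with wavelets is proposed for oscillating data from regulation systems. By introducing an improved robust AR model, the method overcomes the shortcomings of traditional wavelet analysis when detecting outliers in control process data. To avoid the need of traditional outlier detection methods to set a detection threshold in advance, the algorithm introduces a hidden Markov model (HMM) to analyze the wavelet coefficients and updates the HMM parameters online, which improves detection accuracy. Experiments and applications show that the proposed method is better suited to oscillating control process data and has practical value.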

To effectively detect time delays, path errors, and destination errors in the express logistics process, a novel outlier detection algorithm for express logistics is proposed in this paper. An operating model of the express logistics system is built to test the detection results. Experimental results show that the proposed algorithm applies well to express logistics data with multi-attribute characteristics and works well in detecting abnormal conditions in express logistics.

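How to detect outliers in a dataset remains an important problem in multivariate calibration. For chemometrics researchers, finding a generally applicable method is still an important task. The purpose of this paper is to introduce a relatively new bootstrap-based outlier detection method. The method takes the internally studentized residual as its benchmark, uses the bootstrap to estimate the relevant variables, and uses the jackknife-after-bootstrap to evaluate the estimates. It does not require the residuals of the regression model to be normally distributed and is therefore suitable for outlier detection in most regression analyses. Near-infrared spectral data of tobacco and corn samples are used to validate the method. The results show that, after outlying samples are removed with the bootstrap-based method, the prediction error of the model decreases by 15%, which is better than the studentized residual-leverage method and robust partial least squares. A simulation study on the number of outlying samples was also carried out on the basis of the corn near-infrared spectra and examined with this method. The results show that when the outlying samples make up less than 10% of all samples, the method performs better than the other two. The bootstrap-based outlier detection method is therefore an effective one.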

梅林  张凤荔  高强 《计算机应用研究》2020,37(12):3521-3527
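To gain a deeper understanding of how outlier detection techniques have developed, recent outlier detection techniques are surveyed. First, the definition of outliers, the causes of outlying behavior, and the taxonomy of outlier mining algorithms are introduced and summarized. Second, proximity-based outlier detection algorithms, outlier detection algorithms under distributed architectures, and deep-learning-based outlier detection algorithms are reviewed and summarized, with particular discussion of the most representative current methods in the field and their strengths and weaknesses. Finally, future research directions for outlier detection are outlined.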

Density-based outlier detection identifies an outlying observation with reference to the density of the surrounding space. In spite of the several advantages of density-based outlier detection, its computational complexity remains one of the major barriers to its application. The purpose of the present study is to reduce the computation time of LOF (Local Outlier Factor), a density-based outlier detection algorithm. The proposed method incorporates kd-tree indexing and an approximated k-nearest neighbors search algorithm (ANN). A theoretical analysis of the approximation of nearest neighbor search was conducted, and a set of experiments was carried out to examine the performance of the proposed algorithm. The results show that the method can effectively detect local outliers in a reduced computation time.
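
For reference, the LOF score that the study above seeks to accelerate can be written as a short brute-force sketch. It uses an exhaustive neighbor search, which is the quadratic-cost step that kd-tree indexing and approximate nearest-neighbor search are meant to replace. The function name and toy data are assumptions, distinct points are assumed, and ties at the k-th distance are truncated to exactly k neighbors for simplicity.

```python
import math


def lof_scores(points, k):
    """Brute-force Local Outlier Factor for a list of distinct points."""
    n = len(points)
    # neighbors[i] holds (distance, index) pairs for the k nearest neighbors of i.
    neighbors = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_dist[j], d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data (assumed): a 3x3 grid of points plus one distant point.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=3)
most_outlying = scores.index(max(scores))
```

Points inside the grid get scores close to 1, while the distant point gets a score well above 1. Replacing the sorted scan with an index-based or approximate neighbor query changes only how `neighbors` is built.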
