Similar literature
1.
Local anomaly detection for mobile network monitoring   Total citations: 1, self-citations: 0, citations by others: 1
Huge amounts of operation data are constantly collected from various parts of communication networks. These data include measurements from the radio connections and system logs from servers. System operators and developers need robust, easy-to-use decision support tools based on these data. One of their key applications is to detect anomalous phenomena in the network. In this paper we present an anomaly detection method that describes the normal states of the system with a self-organizing map (SOM) identified from the data. A large deviation of a data sample from the SOM nodes is detected as anomalous behavior. Large deviations have traditionally been detected using global thresholds. If the variation of the data differs between separate parts of the data space, global thresholds either fail to reveal anomalies or reveal false anomalies. Instead of one global threshold, we can use local thresholds, which depend on the local variation of the data. We also present a method to find an adaptive threshold using the distribution of the deviations. Our anomaly detection method can be used either to explore historical data or to compare unseen data against a data model derived from historical data. It is applicable to a wide range of processes that produce multivariate data. In this paper we present examples of this method applied to server log data and radio interface data from mobile networks.
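
The difference between a global threshold and local thresholds can be illustrated with a minimal sketch. It is not the authors' implementation: the prototype vectors stand in for already-trained SOM nodes (training is not shown), and the "mean + k · standard deviation" rule and the names `nearest`, `fit_local_thresholds`, and `is_anomalous` are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev

def nearest(prototypes, x):
    """Return (index, distance) of the prototype closest to sample x."""
    dists = [math.dist(p, x) for p in prototypes]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]

def fit_local_thresholds(prototypes, samples, k=3.0):
    """One threshold per prototype: mean + k * std of the distances of the
    training samples mapped to that prototype (an assumed rule)."""
    per_node = [[] for _ in prototypes]
    for x in samples:
        i, d = nearest(prototypes, x)
        per_node[i].append(d)
    # A prototype that attracted no samples gets threshold 0.0,
    # so any sample with a nonzero distance to it is flagged.
    return [mean(ds) + k * pstdev(ds) if ds else 0.0 for ds in per_node]

def is_anomalous(prototypes, thresholds, x):
    """Flag x if its distance to the nearest prototype exceeds that
    prototype's own (local) threshold."""
    i, d = nearest(prototypes, x)
    return d > thresholds[i]
```

With one tight cluster and one wide cluster, the same deviation of 1.0 is flagged near the tight cluster but accepted near the wide one, which a single global threshold cannot express.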

2.
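Web log mining applies data mining techniques to Web server logs to discover the behavior patterns of Web users, so that the site structure can be improved or personalized services can be offered to users. This paper examines user identification algorithms in Web log mining and proposes a user identification algorithm based on multiple constraints.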

3.
An outlier is defined as an observation that is significantly different from the other data in its set. An auditor will employ many techniques, processes and tools to identify these entries, and data mining is one such medium through which the auditor can analyze information. The enormous amount of information contained within transactional processing systems' logs means that auditors must employ automated systems for anomalous data detection. Several data mining algorithms have been tested, especially those that deal specifically with classification and outlier detection. A group of these previously described algorithms was selected for use in designing and developing a process to assist the auditor in anomalous data detection within audit logs. We have been successful in creating and ratifying an outlier detection process that works in the alphanumeric fields of the audit logs from an information system, thus constituting a useful tool for system auditors performing data analysis tasks.

4.
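A nonlinear normalization method for non-uniformly distributed data   Total citations: 1, self-citations: 1, citations by others: 0
Traditional data normalization usually relies on linear transformations. When applied to non-uniformly distributed data sets, the spacing between data points within local intervals can become too small, which makes the results of subsequent data mining (especially distance-based mining) less accurate. This paper therefore proposes a nonlinear normalization method for non-uniformly distributed data based on data fitting. Without changing the overall distribution pattern of the data, the method derives a corresponding nonlinear transformation function from statistics and uses it to rescale the value of each data point nonlinearly, expanding densely populated intervals and compressing sparse ones so that the mining results become more accurate. Experiments applied three classic classification algorithms, the BP (Back Propagation) neural network, the Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) classification, to different data sets. The results show that the classification error rate decreased to varying degrees while the F1 measure improved.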
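
The effect described above, stretching dense intervals and compressing sparse ones, can be sketched with a simple stand-in. The abstract does not give the fitted transformation function, so this sketch substitutes the empirical cumulative distribution function (ECDF) of the training data, which is a different, well-known nonlinear rescaling. The names `min_max` and `ecdf_scaler` are hypothetical.

```python
from bisect import bisect_right

def min_max(values):
    """Ordinary linear normalization to [0, 1], for comparison."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def ecdf_scaler(train):
    """Return a function that maps v to the fraction of training values <= v.
    This is a stand-in nonlinear transform, not the paper's fitted function."""
    ordered = sorted(train)
    n = len(ordered)

    def scale(v):
        return bisect_right(ordered, v) / n

    return scale
```

For data such as `[1, 2, 3, 4, 100]`, linear normalization squeezes the four small values into about 3% of the range, while the ECDF-based transform spaces them evenly, so distance-based methods can tell them apart more easily.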

5.
The rapid evolution of technology has led to the generation of high-dimensional data streams in a wide range of fields, such as genomics, signal processing, and finance. The combination of the streaming scenario and high dimensionality is particularly challenging for the outlier detection task. This is due to the special characteristics of data streams, such as concept drift and limited time and space requirements, in addition to the impact of the well-known curse of dimensionality in high-dimensional space. To the best of our knowledge, few studies have addressed these challenges simultaneously, and therefore detecting anomalies in this context requires a great deal of attention. The main objective of this work is to study the main approaches existing in the literature and to identify a set of comparison criteria, such as the computational cost and the interpretation of outliers, which will help us to reveal the different challenges and additional research directions associated with this problem. At the end of this study, we draw up a summary of the main limits identified and detail the different research directions related to this issue in order to promote research for this community.

6.
An effective and efficient algorithm for high-dimensional outlier detection   Total citations: 8, self-citations: 0, citations by others: 8
The outlier detection problem has important applications in the fields of fraud detection, network robustness analysis, and intrusion detection. Most such applications involve high-dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms have been proposed for outlier detection that use several concepts of proximity in order to find the outliers based on their relationship to the other points in the data. However, in high-dimensional space, the data are sparse and concepts using the notion of proximity fail to retain their effectiveness. In fact, the sparsity of high-dimensional data can be understood in a different way so as to imply that every point is an equally good outlier from the perspective of distance-based definitions. Consequently, for high-dimensional data, the notion of finding meaningful outliers becomes substantially more complex and nonobvious. In this paper, we discuss new techniques for outlier detection that find the outliers by studying the behavior of projections from the data set. Received: 19 November 2002; accepted: 6 February 2004; published online: 19 August 2004. Edited by: R. Ng.

7.
An online fault detection and isolation (FDI) technique for nonlinear systems based on neurofuzzy networks (NFN) is proposed in this paper. Two NFNs are used. The first one, trained on data obtained under normal operating conditions, models the system, and the second one, trained online, models the residuals. Fuzzy rules that are activated under fault-free and faulty conditions are extracted from the second NFN and stored in the symptom vectors using a binary code. A fault database is then formed from these symptom vectors. When applying the proposed FDI technique, the NFN that models the residuals is updated recursively online, from which the symptom vector is obtained. By comparing this symptom vector with those in the fault database, faults are isolated. Further, the fuzzy rules obtained from the symptom vector can also provide linguistic information to experienced operators for identifying the faults. The implementation and performance of the proposed FDI technique are illustrated by simulation examples involving a two-tank water level control system under faulty conditions.

8.
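Data preprocessing is a key stage of Web usage mining; its results directly affect the effectiveness of subsequent steps such as transaction identification, path analysis, association rule mining, and sequential pattern mining. This paper proposes a data preprocessing algorithm that accurately identifies users and sessions (USIA) and demonstrates experimentally that it is efficient.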

9.
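Research on data mining based on neural networks   Total citations: 10, self-citations: 0, citations by others: 10
Although neural networks have drawbacks such as complex structure, long training times, and results that are hard to interpret, their high tolerance of noisy data and low error rates are unmatched by other methods, which gives them an advantage among the methods used in data mining. This paper presents a detailed study of data mining based on neural networks.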

10.
Dynamic input–output models have been identified for columns of an industrial sequential ion-exclusive chromatographic separation unit. The models are aimed at describing motion and form transformation of the fronts of different substances in the columns so that changes in “limit cycles” dynamics and drifts to undesired disturbed states could be observed on-line with model-based simulations. The model structure has been innovated on the basis of the classical Wiener representation, in which a nonlinear dynamic system is described with a combination of linear Laguerre dynamics and static nonlinear mapping. The static mapping is realized here with an MLP-type neural network. A separate delay model is needed for describing the movement of the front. The delay time adapts to variations of the process flow rate. Form transformation of the front is described with a dispersion model, which is a smoother-type Wiener-MLP model. Forward and backward Laguerre presentations are calculated with Laguerre filters. These Laguerre presentations are mapped to the output with a neural network. Dynamics of “salt” and two important compounds have been modeled on the basis of analyzed samples, which were taken in a factory experiment during normal production. A priori information about the process dynamics can be included in the dispersion model by choosing a suitable Laguerre parameter, but otherwise the representativeness of the identification data determines the validity of the model.

11.
Estimator design in jet engine applications   Total citations: 1, self-citations: 0, citations by others: 1
Jet engines are nonlinear dynamical systems for which an exact mathematical model cannot be used for estimator design, because it is either not available or so complex that it does not fit the necessary assumptions. Thus, classical analytical tools for studying standard system properties like observability, which is very important in estimator design, cannot be directly applied. Generally, for practical jet engine applications, the designer faces two closely related problems: first, given a non-measurable parameter, find the minimal set of estimator inputs that facilitates achieving a satisfactory estimation performance (input selection); second, given a predetermined set of inputs, derive an “observability” measure that characterizes the estimation feasibility of a specific non-measurable parameter. In this paper, techniques for solving these two problems are developed and applied to estimator design for jet engine thrust, stall margins, and an unmeasurable state.

12.
Data preprocessing for Web log mining based on user access trees   Total citations: 1, self-citations: 0, citations by others: 1
刘加伶, 范军. 《计算机科学》 (Computer Science), 2009, 36(9): 154-156
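In Web log mining, data preprocessing is the foundation of the whole mining process and directly affects the quality and results of log mining. This paper proposes a data preprocessing method for Web log mining based on user access trees. During processing, the method builds user access trees from the Web logs and uses them to identify users and transactions, so that Web logs can be preprocessed accurately even when the site topology is unavailable.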

13.
This article presents a novel classification of wavelet neural networks based on the orthogonality/non-orthogonality of neurons and the type of nonlinearity employed. On the basis of this classification different network types are studied and their characteristics illustrated by means of simple one-dimensional nonlinear examples. For multidimensional problems, which are affected by the curse of dimensionality, the idea of spherical wavelet functions is considered. The behaviour of these networks is also studied for modelling of a low-dimension map.

14.
In this article we discuss automated preprocessing of environmental data for further use. Environmental data is by default heterogeneous, as it may consist of data from sources such as weather stations, weather radars, chemical sensors, acoustic sensors, and off-line laboratory analysis. When integrating data from such heterogeneous sources, it needs to be processed in a context-dependent manner. In addition, there is no single generic processing method; rather, several atomic methods need to be applied in an appropriate sequence. Furthermore, the problem is complicated by the requirements set by the intended use of the data. The requirements influence not only the set of applicable methods but also the application sequence. In this article, we study automation of the selection and sequencing of preprocessing methods based on the user requirements. As the main contribution, we propose the use of characterizations and a reachability algorithm to solve the selection and sequencing problem. We present the algorithm and argue for its correctness. We also discuss how the algorithm is implemented as a cloud service, and illustrate the use of the service with simple case studies.

15.
In this paper, two Neural Network (NN) identifiers are proposed for nonlinear system identification via dynamic neural networks with different time scales, including both fast and slow phenomena. The first NN identifier uses the output signals from the actual system for the system identification. The on-line update laws for the dynamic neural networks have been developed using the Lyapunov function and singularly perturbed techniques. In the second NN identifier, all the output signals from the nonlinear system are replaced with the state variables of the neural networks. The on-line identification algorithm with a dead-zone function is proposed to improve nonlinear system identification performance. Compared with other dynamic neural network identification methods, the proposed identification methods exhibit improved identification performance. Three examples are given to demonstrate the effectiveness of the theoretical results.

16.
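This paper studies how the order of the training samples affects the convergence of continuous function mapping networks and proposes rules for reordering the samples. Both mathematical analysis and computer simulation experiments show that an improved algorithm built on these rules effectively overcomes the “memory forgetting” phenomenon that occurs in real-time network learning.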

17.
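In anomalous pattern detection on data streams, concept drift sometimes occurs under the influence of noise and other factors, which reduces detection efficiency. To address this problem, a dynamic anomalous pattern detection method based on incremental learning with constructive neural networks is proposed: it extracts a data synopsis from the data within the sliding window, revises the global data synopsis, and updates the existing learning model. In addition, factors such as the rate and volume of the data stream also affect detection efficiency. The detection method is therefore improved using the idea of granularity analysis: a suitable time sliding window is set, and the analysis granularity is selected adaptively according to the data volume, so that anomalous patterns are found more accurately. Experiments on anomalous pattern detection in radio communication signal monitoring data verify the effectiveness of the method.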

18.
Anomaly detection in large populations is a challenging but highly relevant problem. It is essentially a multi-hypothesis problem, with a hypothesis for every division of the systems into normal and anomalous systems. The number of hypotheses grows rapidly with the number of systems, and approximate solutions become a necessity for any problem of practical interest. In this paper we take an optimization approach to this multi-hypothesis problem. It is first shown to be equivalent to a non-convex combinatorial optimization problem and is then relaxed to a convex optimization problem that can be solved distributively on the systems and that stays computationally tractable as the number of systems increases. An interesting property of the proposed method is that it can under certain conditions be shown to give exactly the same result as the combinatorial multi-hypothesis problem, and the relaxation is hence tight.

19.
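Session identification is an important step in Web log preprocessing. To address the shortcomings of traditional session identification, an improved session identification algorithm is proposed. After the individual users have been identified, the large number of frame pages is filtered out; then, based on the content of each page and the structure of the site, a reasonable page access time threshold is constructed and used to identify user sessions. Finally, a comparison with several traditional session identification methods on experimental data shows that the algorithm is more reasonable and effective.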
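
The idea of splitting a user's visits into sessions with a page-dependent time threshold can be sketched as follows. The abstract does not say how the thresholds are derived from page content and site structure, so the sketch simply takes them as a given dictionary. The function name `split_sessions`, the `page_timeout` mapping, and the 1800-second default are assumptions for illustration.

```python
def split_sessions(visits, page_timeout, default_timeout=1800):
    """Split one user's visits into sessions.

    visits: list of (timestamp_seconds, url) tuples sorted by time.
    page_timeout: hypothetical mapping url -> maximum seconds a user is
        expected to stay on that page before the session is considered over.
    """
    sessions = []
    for ts, url in visits:
        if sessions:
            prev_ts, prev_url = sessions[-1][-1]
            # The threshold depends on the page the user was viewing.
            limit = page_timeout.get(prev_url, default_timeout)
            if ts - prev_ts <= limit:
                sessions[-1].append((ts, url))
                continue
        # First visit, or the gap exceeded the threshold: start a new session.
        sessions.append([(ts, url)])
    return sessions
```

Passing an empty `page_timeout` reduces this to the traditional fixed-timeout heuristic, which makes it easy to compare the two on the same log.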

20.
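Building on a detailed introduction to the ELF log file format, this paper defines a session table and discusses in depth several major steps of the preprocessing process. It summarizes the existing processing techniques and proposes new improvements, focusing on session identification, for which a new algorithm is presented.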
