Found 20 similar documents (search time: 0 ms)
1.
In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers: objects that behave in an unexpected way or have abnormal properties. The identification of outliers is important for many applications, such as intrusion detection, credit card fraud, criminal activities in electronic commerce, medical diagnosis and anti-terrorism. In this paper, we propose a hybrid approach to outlier detection, which combines the opinions from boundary-based and distance-based methods for outlier detection ([Jiang et al., 2005], [Jiang et al., 2009] and [Knorr and Ng, 1998]). We give a novel definition of outliers, BD (boundary and distance)-based outliers, by virtue of the notion of boundary region in rough set theory and the definitions of distance-based outliers. An algorithm to find such outliers is also given. The effectiveness of our method for outlier detection is demonstrated on two publicly available databases.
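The distance-based half of the definition above can be illustrated with a minimal sketch of the DB(p, D) notion associated with Knorr and Ng: a point is flagged when at least a fraction p of the points lie farther than D from it. This is only an illustration of the distance-based ingredient; the rough-set boundary region and the combined BD definition of the paper are not reproduced here. The function name `db_outliers`, the sample data, and the choice to count only the other points are assumptions made for the example.

```python
import math

def db_outliers(points, p, d):
    """Return indices of DB(p, d)-outliers.

    A point is flagged when at least a fraction p of the *other* points
    lie farther than d from it (counting only the other points is a
    simplification assumed for this sketch).
    """
    n = len(points)
    flagged = []
    for i, a in enumerate(points):
        # Count points that are farther than d from point i.
        far = sum(1 for j, b in enumerate(points)
                  if j != i and math.dist(a, b) > d)
        if far >= p * (n - 1):
            flagged.append(i)
    return flagged

# Hypothetical data: a tight cluster plus one distant point.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (5.0, 5.0)]
result = db_outliers(data, p=0.9, d=1.0)
```

With these illustrative parameters only the distant point is flagged; lowering `p` makes the test easier to satisfy and flags more points.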
2.
This paper is concerned with a SOM-based data mining strategy for adaptive modelling of a slowly varying process. The aim is to follow the process in a way that makes a representative up-to-date data set of a reasonable size available at any time. The technique developed allows analysis and filtering of redundant data, detection of the need to update the process models and the core module of the system itself, and creation of process models of adaptive, data-dependent complexity. Experimental investigations performed using data from a slowly varying offset lithographic printing process have shown that the tools developed can follow the process and make the necessary adaptations of the data set and the process models. A low process modelling error has been obtained by employing data-dependent committees for modelling the process.
3.
The offset lithographic printing process requires the operator to make appropriate and timely on-line adjustments to compensate for color deviations from the desired print. An operator acquires proficiency by working on the same machine over a period of several years; thus he is able to apply adjustments according to its specific characteristics. It was found that this machine-specific knowledge consists of articulated and unarticulated knowledge. A connectionist representation was designed to map the observable variables to the operator's adjustments, while a forward chaining expert system was developed to represent the operator's articulated knowledge. A weight-based conflict resolution technique was constructed to dynamically update the knowledge base. This paper begins by presenting the press characterization problem. Then the development of the system is described. Finally, an analysis of results that cover all possible categories is documented.
4.
In this paper we present a genetic solution to the outlier detection problem. The essential idea behind this technique is to define outliers by examining those projections of the data, along which the data points have abnormal or inconsistent behavior (defined in terms of their sparsity values). We use a partitioning method to divide the data set into groups such that all the objects in a group can be considered to behave similarly. We then identify those groups that contain outliers. The algorithm assigns an ‘outlier-ness’ value that gives a relative measure of how strong an outlier group is. An evolutionary search computation technique is employed for determining those projections of the data over which the outliers can be identified. A new data structure, called the grid count tree (GCT), is used for efficient computation of the sparsity factor. GCT helps in quickly determining the number of points within any grid defined over the projected space and hence facilitates faster computation of the sparsity factor. A new crossover is also defined for this purpose. The proposed method is applicable for both numeric and categorical attributes. The search complexity of the GCT traversal algorithm is provided. Results are demonstrated for both artificial and real life data sets including four gene expression data sets.
5.
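The local outlier factor (LOF) defines the degree to which process data are local outliers. However, industrial processes place high real-time demands on data anomaly detection, and computing the outlier factor for every sample point is computationally expensive. This paper therefore modifies the LOF algorithm: the local reachability density of an object is computed from its k nearest neighbors, and a sample preprocessing method, CDC (Closest Distance to Center), first prunes the sample points by computing the distance from each point to the center, discarding most of the points that cannot be outliers. The improved LOF value then needs to be computed only for the remaining points, which improves the efficiency of outlier detection. Finally, simulation on TE process data shows that, while outlier detection accuracy is maintained, the running time of the algorithm is shorter than that of LOF.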
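For orientation, the standard LOF computation that this abstract builds on can be sketched as follows: each point's local reachability density is derived from its k nearest neighbors, and the LOF is the ratio of the neighbors' average density to the point's own density. This is a plain, unpruned LOF and not the modified algorithm of the paper; the CDC pruning step is omitted. The sketch assumes distinct points and takes exactly k neighbors (ties are broken by index), and the name `lof_scores` and the sample data are hypothetical.

```python
import math

def lof_scores(points, k):
    """Plain LOF for a small list of distinct points (O(n^2) sketch)."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Exactly k nearest neighbours of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        lrd.append(k / total)
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a 3x3 grid plus one isolated point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))
scores = lof_scores(points, k=3)
```

Points inside the grid score close to 1, while the isolated point scores far higher, which is the behaviour a pruning step would try to preserve while skipping most of the inliers.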
6.
Zero-day cyber attacks such as worms and spyware are becoming increasingly widespread and dangerous. The existing signature-based intrusion detection mechanisms are often not sufficient in detecting these types of attacks. As a result, anomaly intrusion detection methods have been developed to cope with such attacks. Among the variety of anomaly detection approaches, the Support Vector Machine (SVM) is known to be one of the best machine learning algorithms to classify abnormal behaviors. The soft-margin SVM is one of the well-known basic SVM methods using supervised learning. However, it is not appropriate to use the soft-margin SVM method for detecting novel attacks in Internet traffic since it requires pre-acquired learning information for the supervised learning procedure. Such pre-acquired learning information is divided into normal and attack traffic with labels separately. Furthermore, we apply the one-class SVM approach using unsupervised learning for detecting anomalies. This means one-class SVM does not require the labeled information. However, there is a downside to using one-class SVM: it is difficult to use the one-class SVM in the real world, due to its high false positive rate. In this paper, we propose a new SVM approach, named Enhanced SVM, which combines these two methods in order to provide unsupervised learning and low false alarm capability, similar to that of a supervised SVM approach. We use the following additional techniques to improve the performance of the proposed approach (referred to as Anomaly Detector using Enhanced SVM): First, we create a profile of normal packets using Self-Organized Feature Map (SOFM), for SVM learning without pre-existing knowledge. Second, we use a packet filtering scheme based on Passive TCP/IP Fingerprinting (PTF), in order to reject incomplete network traffic that either violates the TCP/IP standard or generation policy inside of well-known platforms. Third, a feature selection technique using a Genetic Algorithm (GA) is used for extracting optimized information from raw internet packets. Fourth, we use the flow of packets based on temporal relationships during data preprocessing, for considering the temporal relationships among the inputs used in SVM learning. Lastly, we demonstrate the effectiveness of the Enhanced SVM approach using the above-mentioned techniques, such as SOFM, PTF, and GA on MIT Lincoln Lab datasets, and a live dataset captured from a real network. The experimental results are verified by m-fold cross validation, and the proposed approach is compared with real world Network Intrusion Detection Systems (NIDS).
7.
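Detecting outliers in a data set remains an important problem in multivariate calibration, and finding a generally applicable method remains an important task for chemometricians. The aim of this paper is to introduce a relatively new bootstrap-based outlier detection method. The method takes internally studentized residuals as its basis, uses the bootstrap to estimate the relevant variables, and uses the jackknife-bootstrap method to assess the estimates. It does not require the residuals of the regression model to be normally distributed, so it is applicable to outlier detection in most regression analyses. The method is validated on near-infrared spectral data of tobacco and corn samples. The results show that after outlying samples are removed with the bootstrap-based method, the prediction error of the model decreases by 15%, which is better than the studentized residual-leverage method and robust partial least squares. We also carried out a simulation study on the number of outlying samples, based on the corn near-infrared spectra, and tested it with this method. The results show that when outlying samples make up less than 10% of the total number of samples, the method performs better than the other two methods. The bootstrap-based outlier detection method is therefore an effective method.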
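The internally studentized residual that the method takes as its basis can be sketched for the simplest case, a least-squares line with one predictor: each residual is divided by the residual standard deviation times the square root of one minus the leverage of that sample. This is a simplification assumed for illustration; the paper works with multivariate calibration models, and its bootstrap and jackknife steps are not reproduced. The function name and the data below are hypothetical.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each sample in simple linear regression.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Hypothetical data: roughly y = 2x + 1, with one corrupted response at index 5.
x = [float(i) for i in range(10)]
y = [1.0, 3.1, 4.9, 7.0, 9.1, 20.0, 13.0, 14.9, 17.1, 19.0]
r = studentized_residuals(x, y)
suspect = max(range(len(r)), key=lambda i: abs(r[i]))
```

A common rule of thumb treats samples with an absolute studentized residual above roughly 2 to 3 as suspect; a bootstrap, as described in the abstract, avoids relying on a normality assumption when judging such values.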
8.
Trajectory outlier detection is one of the most popular trajectory data mining topics. It helps researchers obtain a lot of valuable information that can be used as important guidance in monitoring and forecasting. Existing methods have difficulty in detecting outlying trajectories with continuous multi-segment exceptions. To address the problem, in this paper, we propose a novel trajectory outlier detection algorithm based on common slices sub-sequence (TODCSS). For each trajectory, the direction-code sequence is first calculated based on the direction of each trajectory segment. Second, the corresponding sequence consisting of trajectory slices is obtained by inflection point segmentation. Then, the common slices sub-sequences between two trajectories are found to measure their distance. Finally, the slice outliers and trajectory outliers are detected based on the new CSS distance calculation. Both the intuitive visualization presentation and the experimental results on a real Atlantic hurricane dataset, a real-life mobility trajectory dataset of taxis in San Francisco and a synthetic labeled dataset show that the proposed TODCSS algorithm effectively detects slice and trajectory outliers, and improves accuracy and stability in trajectory outlier detection.
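The first step described above, turning each trajectory segment into a direction code, can be sketched as follows. The abstract does not state how directions are quantized, so the eight equal sectors used here (code 0 for east, counted counter-clockwise) are an assumption, as are the function name and the sample trajectory. The slice segmentation and the CSS distance are not reproduced.

```python
import math

def direction_codes(trajectory, sectors=8):
    """Map each segment of a 2-D trajectory to a quantized direction code.

    Assumed convention: sectors of equal width centred on the compass
    directions, with code 0 for east and codes increasing counter-clockwise.
    A zero-length segment falls into code 0.
    """
    width = 2 * math.pi / sectors
    codes = []
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        # Heading of the segment in [0, 2*pi).
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # Shift by half a sector so each code is centred on its direction.
        codes.append(int((angle + width / 2) // width) % sectors)
    return codes

# Hypothetical trajectory: east, north-east, north, then west.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]
codes = direction_codes(track)
```

Once trajectories are reduced to code sequences like this, sequence-comparison techniques can be applied to them, which is the role the common slices sub-sequence plays in the abstract.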
9.
To implement on-line process monitoring techniques such as principal component analysis (PCA) or partial least squares (PLS), it is necessary to extract data associated with the normal operating conditions from the plant historical database for calibrating the models. One way to do this is to use robust outlier detection algorithms such as resampling by half-means (RHM), smallest half volume (SHV), or ellipsoidal multivariate trimming (MVT) in the off-line model building phase. While RHM and SHV are conceptually clear and statistically sound, the computational requirements are heavy. Closest distance to center (CDC) is proposed in this paper as an alternative for outlier detection. The use of Mahalanobis distance in the initial step of MVT for detecting outliers is known to be ineffective. To improve MVT, CDC is incorporated with MVT. The performance was evaluated relative to the goal of finding the best half of a data set. Data sets were derived from the Tennessee Eastman process (TEP) simulator. Comparable results were obtained for RHM, SHV, and CDC. Better performance was obtained when CDC was incorporated with MVT, compared to using CDC and MVT alone. All robust outlier detection algorithms outperformed the standard PCA algorithm. The effects of auto scaling, robust scaling and a new scaling approach called modified scaling were investigated. With the presence of multiple outliers, auto scaling was found to degrade the performance of all the robust techniques. Reasonable results were obtained with the use of robust scaling and modified scaling.
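The idea of selecting the best half of a data set by closeness to a center can be sketched as below. The abstract does not specify how the center is defined or how the data are scaled, so the coordinate-wise median center and the unscaled Euclidean distance used here are assumptions, and the function name and data are hypothetical.

```python
import math
import statistics

def cdc_best_half(points):
    """Return sorted indices of the half of the points closest to the center.

    Assumptions for this sketch: the center is the coordinate-wise median,
    and distances are plain Euclidean distances on unscaled data.
    """
    dims = range(len(points[0]))
    center = [statistics.median(p[d] for p in points) for d in dims]
    # Rank points by their distance to the assumed center.
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], center))
    keep = math.ceil(len(points) / 2)
    return sorted(order[:keep])

# Hypothetical data: five clustered samples and three gross outliers.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
           (10.0, 10.0), (-8.0, 9.0), (12.0, -7.0)]
kept = cdc_best_half(samples)
```

Using a median rather than a mean keeps the assumed center from being dragged toward the outliers, which matters because the retained half is meant to represent normal operating conditions.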
10.
Evaluating conceptual design alternatives in a new product development (NPD) environment has been one of the most critical issues for many companies which try to survive in the fast-growing world markets. Therefore, most companies have used various methods to successfully carry out this difficult and time-consuming evaluation process. Of these methods, analytic hierarchy process (AHP) has been widely used in multiple-criteria decision-making (MCDM) problems. In this study, however, we used analytical network process (ANP), a more general form of AHP, instead of AHP, because AHP cannot accommodate the variety of interactions, dependencies and feedback between higher and lower level elements. Furthermore, in some cases, due to the vagueness and uncertainty in the judgments of a decision-maker, the crisp pairwise comparison in the conventional ANP is insufficient and imprecise to capture the right judgments of the decision-maker. Therefore, fuzzy logic is introduced in the pairwise comparison of ANP to make up for this deficiency in the conventional ANP, and the result is called fuzzy ANP. In short, in this paper, a fuzzy ANP-based approach is proposed to evaluate a set of conceptual design alternatives developed in an NPD environment in order to reach the best one satisfying both the needs and expectations of customers and the engineering specifications of the company. In addition, a numerical example is presented to illustrate the proposed approach.
11.
Outlier detection has become an important research area in the field of stream data mining due to its vast applications. In the literature, many methods have been proposed, but they work well for simple and positive regions of outliers, where boundary regions are not given much importance. Moreover, an algorithm which processes stream data must be effective and able to compute infinite data in one pass or a limited number of passes. These problems have motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs the concept of relative cardinality, entropy outlier factor theory of information-based system, and size-variant sliding window in stream data. In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is executed on nine benchmark datasets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. Experimental results show that the proposed method outperforms the six existing methods in terms of receiver operating characteristic curve, precision recall, and computational time for positive regions as well as for boundary regions.
12.
Chemometrics, the application of mathematical and statistical methods to the analysis of chemical data, is finding ever widening applications in the chemical process environment. This article reviews the chemometrics approach to chemical process monitoring and fault detection. These approaches rely on the formation of a mathematical/statistical model that is based on historical process data. New process data can then be compared with models of normal operation in order to detect a change in the system. Typical modelling approaches rely on principal components analysis, partial least squares and a variety of other chemometric methods. Applications where the ordered nature of the data is taken into account explicitly are also beginning to see use. This article reviews the state-of-the-art of process chemometrics and current trends in research and applications.
13.
Neural Computing and Applications - Outlier detection is an essential task in data mining applications, which include military surveillance, tax fraud detection, telecommunication, etc. In recent...
14.
Multimedia Tools and Applications - In this paper, we present a hybrid deep network based approach for crowd anomaly detection in videos. For improved performance, the proposed approach exploits...
15.
The K nearest neighbors approach is a viable technique in time series analysis when dealing with ill-conditioned and possibly chaotic processes. Such problems are frequently encountered in, e.g., finance and production economics. More often than not, the observed processes are distorted by nonnormal disturbances, incomplete measurements, etc. that undermine the identification, estimation and performance of multivariate techniques. If outliers can be duly recognized, many crisp statistical techniques may perform adequately as such. Geno-mathematical programming provides a connection between statistical time series theory and fuzzy regression models that may be utilized, e.g., in the detection of outliers. In this paper we propose a fuzzy distance measure for detecting outliers via geno-mathematical parametrization. Fuzzy KNN is connected as a linkable library to the genetic hybrid algorithm (GHA) of the author, in order to facilitate the determination of the LR-type fuzzy number for automatic outlier detection in time series data. We demonstrate that GHA[Fuzzy KNN] provides a platform for automatically detecting outliers in both simulated and real world data.
16.
To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy consumption, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance by simulation, using real sensor data streams. Our results demonstrate that our approach is accurate and imposes reasonable communication and power consumption demands.
17.
Pattern Analysis and Applications - Outlier detection approaches show their efficacy while extracting unforeseen knowledge in domains such as intrusion detection, e-commerce, and fraudulent...
18.
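The PageRank algorithm is briefly introduced, and an algorithm applying it to outlier detection is presented together with experimental results and analysis. Finally, the algorithm is compared with other algorithms. The results show that the method detects outliers fairly accurately and adapts to a variety of shapes.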
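One way PageRank could be turned into an outlier score is sketched below. The abstract does not describe how the graph is built, so everything here is an assumption for illustration: points are linked by Gaussian similarity weights, PageRank is computed by power iteration on the row-normalized weights, and points that end up with the lowest rank are treated as outlier candidates. The function name, the `sigma` parameter and the data are hypothetical.

```python
import math

def pagerank_scores(points, sigma=1.0, damping=0.85, iterations=100):
    """PageRank over an assumed Gaussian-similarity graph of the points."""
    n = len(points)
    # Edge weight between two points decays with their squared distance.
    weights = [
        [0.0 if i == j else
         math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))
         for j, b in enumerate(points)]
        for i, a in enumerate(points)
    ]
    row_sums = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if row_sums[i] > 0:
                    share = weights[i][j] / row_sums[i]
                else:
                    # A point with no usable links spreads its rank uniformly.
                    share = 1.0 / n
                new_rank[j] += damping * rank[i] * share
        rank = new_rank
    return rank

# Hypothetical data: four clustered points and one isolated point.
cloud = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
ranks = pagerank_scores(cloud)
candidate = min(range(len(ranks)), key=lambda i: ranks[i])
```

The isolated point receives almost no rank from the cluster, so it ends up with the lowest score; in practice the choice of `sigma` controls how local the notion of similarity is.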
19.
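To address the problem of outliers in XML data, an algorithm for detecting outliers in semi-structured XML data is designed, using ideas from cluster analysis and the contextual information among elements that is implicit in the nested structure of XML data. The algorithm gathers logically related nodes into corresponding subspaces and, based on these relevant subspaces, computes an outlier interestingness measure, the XO measure, to identify outlying data. Experimental results show that the algorithm achieves high identification efficiency for outlier data of a certain scale.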
20.
To prevent internal data leakage, database activity monitoring uses software agents to analyze protocol traffic over networks and to observe local database activities. However, the large size of data obtained from database activity monitoring has presented a significant barrier to effective monitoring and analysis of database activities. In this paper, we present database activity monitoring by means of a density-based outlier detection method and a commercial database activity monitoring solution. In order to provide efficient computing of outlier detection, we exploited a kd-tree index and an Approximated k-nearest neighbors (ANN) search method. By these means, the outlier computation time could be significantly reduced. The proposed methodology was successfully applied to a very large log dataset collected from the Korea Atomic Energy Research Institute (KAERI). The results showed that the proposed method can effectively detect outliers of database activities in a shorter computation time.