Similar literature
 Found 20 similar documents; search took 31 ms
1.
Outlier or anomaly detection is a fundamental data mining task with the aim to identify data points, events, transactions which deviate from the norm. The identification of outliers in data can provide insights about the underlying data generating process. In general, outliers can be of two kinds: global and local. Global outliers are distinct with respect to the whole data set, while local outliers are distinct with respect to data points in their local neighbourhood. While several approaches have been proposed to scale up the process of global outlier discovery in large databases, this has not been the case for local outliers. We tackle this problem by optimising the use of local outlier factor (LOF) for large and high-dimensional data. We propose projection-indexed nearest-neighbours (PINN), a novel technique that exploits extended nearest-neighbour sets in a reduced-dimensional space to create an accurate approximation for k-nearest-neighbour distances, which is used as the core density measurement within LOF. The reduced dimensionality allows for efficient sub-quadratic indexing in the number of items in the data set, where previously only quadratic performance was possible. A detailed theoretical analysis of random projection (RP) and PINN shows that we are able to preserve the density of the intrinsic manifold of the data set after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world data sets of up to 300,000 elements and 102,600 dimensions. A further investigation into the use of high-dimensionality-specific indexing such as spatial approximate sample hierarchy (SASH) shows that our novel technique holds benefits over even these types of highly efficient indexing. 
We cement the practical applications of our novel technique with insights into what it means to find local outliers in real data including image and text data, and include potential applications for this knowledge.
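
As a point of reference for the abstract above, here is a minimal brute-force sketch of the standard LOF score in Python. It is not the PINN method: it computes exact k-nearest-neighbour distances in the original space, which is the quadratic baseline that the abstract says PINN approximates. The function names and the toy data are hypothetical, and ties at the k-th neighbour are ignored for brevity.

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]

def lof_scores(points, k):
    """Brute-force LOF: quadratic in the number of points."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        return len(reach) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(dens[j] for _, j in neigh[i]) / (k * dens[i]) for i in range(n)]

# Toy data (hypothetical): a tight 3x3 grid plus one isolated point.
data = [(x * 0.1, y * 0.1) for x in range(3) for y in range(3)] + [(5.0, 5.0)]
scores = lof_scores(data, k=3)
```

Scores near 1 indicate a point whose density matches that of its neighbours; the isolated point in the toy data receives a much larger score than any grid point.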

2.
We propose a hybrid radial basis function network-data envelopment analysis (RBFN-DEA) neural network for classification problems. The procedure uses the radial basis function to map low-dimensional input data from the input space to a high-dimensional feature space where DEA can be used to learn the classification function. Using simulated datasets for a non-linearly separable binary classification problem, we illustrate how the RBFN-DEA neural network can be used to solve it. We also show how asymmetric misclassification costs can be incorporated in the hybrid RBFN-DEA model. Our preliminary experiments comparing the RBFN-DEA with feed-forward and probabilistic neural networks show that the RBFN-DEA fares very well.

3.
In practical cluster analysis tasks, an efficient clustering algorithm should be less sensitive to parameter configurations and tolerate the existence of outliers. Based on the neural gas (NG) network framework, we propose an efficient prototype-based clustering (PBC) algorithm called enhanced neural gas (ENG) network. Several problems associated with the traditional PBC algorithms and the original NG algorithm, such as sensitivity to initialization, sensitivity to input sequence ordering and the adverse influence of outliers, can be effectively tackled in our new scheme. In addition, our new algorithm can establish the topology relationships among the prototypes, and all topology-wise badly located prototypes can be relocated to represent more meaningful regions. Experimental results on synthetic and UCI datasets show that our algorithm possesses superior performance in comparison to several PBC algorithms and their improved variants, such as hard c-means, fuzzy c-means, NG, fuzzy possibilistic c-means, credibilistic fuzzy c-means, hard/fuzzy robust clustering and alternative hard/fuzzy c-means, in static data clustering tasks with a fixed number of prototypes.

4.
This paper presents a new loss function for neural network classification, inspired by the recently proposed similarity measure called Correntropy. We show that this function essentially behaves like the conventional square loss for samples that are well within the decision boundary and have small errors, and L0 or counting norm for samples that are outliers or are difficult to classify. Depending on the value of the kernel size parameter, the proposed loss function moves smoothly from convex to non-convex and becomes a close approximation to the misclassification loss (ideal 0–1 loss). We show that the discriminant function obtained by optimizing the proposed loss function in the neighborhood of the ideal 0–1 loss function to train a neural network is immune to overfitting, more robust to outliers, and has consistent and better generalization performance as compared to other commonly used loss functions, even after prolonged training. The results also show that it is a close competitor to the SVM. Since the proposed method is compatible with simple gradient based online learning, it is a practical way of improving the performance of neural network classifiers.
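
The behaviour described above (square-like for small errors, counting-like for outliers) can be illustrated with a bounded Gaussian-kernel loss. The abstract does not give the formula, so the form 1 - exp(-e^2 / (2*sigma^2)) used below is an assumption, as are the function names and the kernel size sigma = 1.0. Treat this as a sketch of the idea rather than the paper's exact loss.

```python
import math

def kernel_loss(error, sigma):
    """Bounded loss 1 - exp(-e^2 / (2*sigma^2)); the exact form is an assumption."""
    return 1.0 - math.exp(-error ** 2 / (2.0 * sigma ** 2))

def scaled_square_loss(error, sigma):
    """Square loss with the same curvature at zero, for comparison."""
    return error ** 2 / (2.0 * sigma ** 2)

sigma = 1.0  # hypothetical kernel size
small, large = 0.05, 10.0

# For a small error the two losses nearly coincide.
small_pair = (kernel_loss(small, sigma), scaled_square_loss(small, sigma))
# For a large error the kernel loss saturates near 1 while the square loss keeps growing.
large_pair = (kernel_loss(large, sigma), scaled_square_loss(large, sigma))
```

Because the kernel loss is capped at 1, a single outlier contributes at most one "count" to the total, which is the sense in which it approximates a 0–1 style loss.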

5.
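An adaptive fuzzy clustering model based on the intrinsic relationships among data   Total citations: 2, self-citations: 0, citations by others: 2
Tang Chenglong (唐成龙), Wang Shigang (王石刚). Acta Automatica Sinica (《自动化学报》), 2010, 36(11): 1544-1556
A new fuzzy c-means clustering model (FCM), called adaptive fuzzy clustering (Adaptive FCM, AFCM), is proposed. Unlike most existing fuzzy clustering methods, AFCM takes into account the intrinsic relationships among all data in the data set, and introduces an adaptive-degree vector W and an adaptive exponent p into the model. W adapts during the iterations, while p is a given parameter. W and p together regulate the clustering process. AFCM outputs three sets of parameters simultaneously: the fuzzy membership set U, the adaptive-degree vector W, and the set of cluster prototypes V. Two groups of experiments are presented to validate the performance of AFCM. The first group tests the clustering performance of AFCM, with FCM as the baseline. The experiments show that AFCM achieves better clustering quality and that, with a reasonable choice of the adaptive exponent p, AFCM and FCM remain at the same level of time complexity. The second group tests the outlier-mining performance of AFCM, with the commonly used density-based LOF as the baseline. The experiments show that AFCM has a very large advantage in computational efficiency, and that the outliers found by AFCM are global: they reflect the relationship between the outliers and the whole data set, and they carry richer information. The paper points out that AFCM has unique advantages for mining outliers in large data sets and real-time data, for obtaining high-quality clustering results, and especially for applications that need to mine outliers while clustering.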

6.
We propose a robust algorithm for estimating the projective reconstruction from image features using the RANSAC-based Triangulation method. In this method, we select input points randomly, separate the input points into inliers and outliers by computing their reprojection error, and correct the outliers so that they can become inliers. Both the reprojection error and the outlier corrections are computed using the Triangulation method. After correcting the outliers, we can reliably recover projective motion and structure using the projective factorization method. Experimental results showed that errors can be reduced significantly compared to the previous research as a result of robustly estimated projective reconstruction.

7.
We propose to fit a recurrent feedback neural network structure to input–output data through prediction error minimization. The recurrent feedback neural network structure takes the form of a nonlinear state estimator, which can compactly represent a multivariable dynamic system with stochastic inputs. The inclusion of the feedback error term as an input to the model allows the user to update the model based on feedback measurements in real-time use. The model can be useful in a variety of applications including software sensing, process monitoring, and predictive control. A dynamic learning algorithm for training the recurrent neural network has been developed. Through some examples, we evaluate the efficacy of the proposed method and the prediction improvement achieved by the inclusion of the feedback error term.

8.
9.
During the course of most bioprocess development programs a large amount of process data is generated and stored. However, while these data records contain important information about the process, little or no use is made of this asset. The work described here uses a neural network approach to “learn” to recognize patterns in fermentation data. Neural networks, trained using fermentation data generated from previous runs, are then used to interpret data from a new fermentation. We propose a task decomposition approach to the problem. The approach involves decomposing the problem of bioprocess data interpretation into specific tasks. Separate neural networks are trained to perform each of these tasks, which include fault diagnosis, growth phase determination and metabolic condition evaluation. These trained networks are combined into a multiple neural network hierarchy for the diagnosis of bioprocess data. The methodology is evaluated using experimental data from fed-batch Saccharomyces cerevisiae fermentations. We argue that the task decomposition approach taken here allows each network to develop a task-specific representation and that this, in turn, can lead to network activations and connection weights that are more clearly interpretable. These expert networks can then be pruned to remove nodes that do not contribute significant additional information.

10.
Applied Soft Computing, 2007, 7(3): 957-967
In this study, CPBUM neural networks with an annealing robust learning algorithm (ARLA) are proposed to address the problems that conventional neural networks have when modeling data with outliers and noise. In general, training data obtained in real applications may contain outliers and noise. Although CPBUM neural networks converge quickly, they have difficulty dealing with outliers and noise. Hence, the robustness of CPBUM neural networks must be enhanced. Additionally, the ARLA can overcome the problems of initialization and cut-off points in the traditional robust learning algorithm and can handle models with outliers and noise. In this study, the ARLA is used as the learning algorithm to adjust the weights of the CPBUM neural networks. It turns out that CPBUM neural networks with the ARLA converge quickly and are more robust against outliers and noise than conventional neural networks with a robust mechanism. Simulation results are provided to show the validity and applicability of the proposed neural networks.

11.
12.
Applied Soft Computing, 2007, 7(2): 577-584
In this paper, a novel robust fuzzy clustering neural network (RFCNN) is presented as an improvement of the fuzzy clustering neural network (FCNN) proposed by Zhang et al., to cope with the sensitivity of clustering to outliers. The new algorithm is based on Vapnik's ɛ-insensitive loss function and quadratic programming optimization. Our experimental results demonstrate that RFCNN is much more robust to outliers than FCNN.

13.
The role of the bootstrap is highlighted for nonlinear discriminant analysis using a feedforward neural network model. Statistical techniques are formulated in terms of the principle of the likelihood of a neural-network model when the data consist of ungrouped binary responses and a set of predictor variables. We show that the information criterion based on the bootstrap method is favorable when selecting the optimum number of hidden units for a neural-network model. In order to summarize the measure of goodness-of-fit, the deviance on fitting a neural-network model to binary response data can be bootstrapped. We also provide the bootstrap estimates of the biases of excess error in a prediction rule constructed by fitting to the training sample in the neural network model. We also propose bootstrap methods for the analysis of residuals in order to identify outliers and examine distributional assumptions in neural-network model fitting. These methods are illustrated through the analyses of medical diagnostic data.

14.

Data points situated near a cluster boundary are called boundary points and they can represent useful information about the process generating this data. The existing methods of boundary points detection cannot differentiate boundary points from outliers as they are affected by the presence of outliers as well as by the size and density of clusters in the dataset. Also, they require tuning of one or more parameters and prior knowledge of the number of outliers in the dataset for tuning. In this research, a boundary points detection method called BPF is proposed which can effectively differentiate boundary points from outliers and core points. BPF combines the well-known outlier detection method Local Outlier Factor (LOF) with Gravity value to calculate the BPF score. Our proposed algorithm StaticBPF can detect the top-m boundary points in the given dataset. Importantly, StaticBPF requires tuning of only one parameter i.e. the number of nearest neighbors \((k)\) and can employ the same \(k\) used by LOF for outlier detection. This paper also extends BPF for streaming data and proposes StreamBPF. StreamBPF employs a grid structure for improving k-nearest neighbor computation and an incremental method of calculating BPF scores of a subset of data points in a sliding window over data streams. In evaluation, the accuracy of StaticBPF and the runtime efficiency of StreamBPF are evaluated on synthetic and real data where they generally performed better than their competitors.


15.
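Luo Shihua (罗世华), Chen Kun (陈坤). Control and Decision (《控制与决策》), 2021, 36(2): 491-497
Blast furnace smelting is a dynamic process characterized by high complexity, chaos, and time delay. In industry, the silicon content of the hot metal is commonly used to reflect fluctuations in the thermal state of the blast furnace. Skewed projection depth reflects the outlyingness of data well when the data are skewed, and it is very robust in classification computations on high-dimensional data. First, differencing and correlation analysis are used to select 11 influencing factors as input variables, in order to study how changes in each variable relate to changes in silicon content; then, the skewed projection depth values at 90...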

16.
In this paper, we propose a neural network model for predicting the durations of syllables. A four-layer feedforward neural network trained with the backpropagation algorithm is used for modeling the duration knowledge of syllables. Broadcast news data in three Indian languages, Hindi, Telugu and Tamil, is used for this study. The input to the neural network consists of a set of features extracted from the text. These features correspond to phonological, positional and contextual information. The relative importance of the positional and contextual features is examined separately. For improving the accuracy of prediction, further processing is done on the predicted values of the durations. We also propose a two-stage duration model for improving the accuracy of prediction. From the studies we find that 85% of the syllable durations could be predicted from the models within 25% of the actual duration. The performance of the duration models is evaluated using objective measures such as average prediction error (μ), standard deviation (σ) and correlation coefficient (γ).

17.
Pattern Recognition Letters, 2001, 22(6-7): 691-700
In this paper, a two-phase clustering algorithm for outlier detection is proposed. In Phase 1, we modify the traditional k-means algorithm with a heuristic: “if one new input pattern is far enough away from all clusters' centers, then assign it as a new cluster center”. As a result, the data points in the same cluster are most likely either all outliers or all non-outliers. In Phase 2, we construct a minimum spanning tree (MST) and remove the longest edge. The small clusters, that is, the trees with fewer nodes, are selected and regarded as outliers. The experimental results show that our process works well.
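
The two phases described above can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it assumes a single pass over the data, a fixed distance threshold for "far enough away", an MST built over the cluster centres with Prim's method, and a single cut of the longest edge. The names phase1, phase2, and threshold, as well as the toy points, are hypothetical.

```python
import math

def phase1(points, threshold):
    """One pass: assign each point to its nearest centre, or open a new centre."""
    centres, members = [], []
    for p in points:
        if centres:
            d, c = min((math.dist(p, ctr), i) for i, ctr in enumerate(centres))
        if not centres or d > threshold:
            # The point is far from every centre: it becomes a new centre.
            centres.append(p)
            members.append([p])
        else:
            members[c].append(p)
            # Move the centre to the mean of its members.
            centres[c] = tuple(sum(v) / len(members[c]) for v in zip(*members[c]))
    return centres, members

def phase2(centres):
    """Build an MST over the centres, cut its longest edge, and return the
    indices of the smaller resulting tree. Assumes at least two centres."""
    n = len(centres)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        d, a, b = min(
            (math.dist(centres[a], centres[b]), a, b)
            for a in in_tree for b in range(n) if b not in in_tree
        )
        in_tree.add(b)
        edges.append((d, a, b))
    edges.remove(max(edges))
    # Collect the component that still contains node 0 after the cut.
    comp, frontier = {0}, [0]
    while frontier:
        u = frontier.pop()
        for _, a, b in edges:
            for v, w in ((a, b), (b, a)):
                if v == u and w not in comp:
                    comp.add(w)
                    frontier.append(w)
    rest = set(range(n)) - comp
    return rest if len(rest) < len(comp) else comp

# Toy data (hypothetical): two tight groups and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (9.0, 9.0)]
centres, members = phase1(points, threshold=0.5)
outlier_centres = phase2(centres)
outliers = [p for c in sorted(outlier_centres) for p in members[c]]
```

On the toy data, Phase 1 produces three centres and Phase 2 isolates the centre of the distant point, so its single member is reported as an outlier.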

18.
Detecting and tracking regional outliers in meteorological data   Total citations: 1, self-citations: 0, citations by others: 1
Detecting spatial outliers can help identify significant anomalies in spatial data sequences. In the field of meteorological data processing, spatial outliers are frequently associated with natural disasters such as tornadoes and hurricanes. Previous studies on spatial outliers mainly focused on identifying single location points over a static data frame. In this paper, we propose and implement a systematic methodology to detect and track regional outliers in a sequence of meteorological data frames. First, a wavelet transformation such as the Mexican Hat or Morlet is used to filter noise and enhance the data variation. Second, an image segmentation method, λ-connected segmentation, is employed to identify the outlier regions. Finally, a regression technique is applied to track the center movement of the outlying regions for consecutive frames. In addition, we conducted experimental evaluations using real-world meteorological data and events such as Hurricane Isabel to demonstrate the effectiveness of our proposed approach.

19.
Recurrent neural networks and robust time series prediction   Total citations: 22, self-citations: 0, citations by others: 22
We propose a robust learning algorithm and apply it to recurrent neural networks. This algorithm is based on filtering outliers from the data and then estimating parameters from the filtered data. The filtering removes outliers from both the target function and the inputs of the neural network. The filtering is soft in that some outliers are neither completely rejected nor accepted. To show the need for robust recurrent networks, we compare the predictive ability of least squares estimated recurrent networks on synthetic data and on the Puget Power Electric Demand time series. These investigations result in a class of recurrent neural networks, NARMA(p,q), which show advantages over feedforward neural networks for time series with a moving average component. Conventional least squares methods of fitting NARMA(p,q) neural network models are shown to suffer a lack of robustness towards outliers. This sensitivity to outliers is demonstrated on both the synthetic and real data sets. Filtering the Puget Power Electric Demand time series is shown to automatically remove the outliers due to holidays. Neural networks trained on filtered data are then shown to give better predictions than neural networks trained on unfiltered time series.

20.
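Gesture recognition in engineering applications demands high real-time performance and accuracy, yet on-site environments usually cannot provide sufficient computing power. A lightweight neural network addresses these problems while achieving recognition performance comparable to that of deep neural networks. To this end, a gesture recognition method based on an improved lightweight neural network is proposed. The method modifies the ReXNet network structure used for hand keypoint detection to improve local attention on skeleton points, and replaces the keypoint detection loss function MSE with the Huber loss to improve robustness to outliers. In the experimental setup, images are captured with an ordinary monocular camera and passed through a YOLO v3 hand detection model and the improved ReXNet keypoint detection model; different gestures are defined by constraining the vector angles between hand skeleton keypoints, and real-time detection is ultimately achieved. Test results on the public RWTH dataset show that the detection accuracy of the improved gesture recognition method rises by 2.62% overall compared with the unmodified method, reaching 96.18%, with faster convergence.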
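
To illustrate why replacing MSE with the Huber loss reduces the influence of outliers, here is a minimal per-residual comparison. The threshold delta = 1.0 and the sample residual are assumptions, since the abstract states neither; the function names are hypothetical, and this is not the paper's training code.

```python
def mse_loss(error):
    """Squared error for a single residual."""
    return error ** 2

def mse_grad(error):
    """Gradient of the squared error: grows linearly with the residual."""
    return 2.0 * error

def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)

def huber_grad(error, delta=1.0):
    """Gradient of the Huber loss: clipped to [-delta, delta]."""
    if abs(error) <= delta:
        return error
    return delta if error > 0 else -delta

outlier = 8.0  # hypothetical residual from a badly labelled keypoint
mse_pair = (mse_loss(outlier), mse_grad(outlier))
huber_pair = (huber_loss(outlier), huber_grad(outlier))
```

For small residuals the two losses differ only by a constant factor, but for the outlying residual the Huber gradient is capped at delta, so a single bad keypoint cannot dominate a weight update the way it can under MSE.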
