Similar Documents
20 similar documents found (search time: 234 ms)
1.
When an object is scanned with a 3D laser scanner, the collected point cloud is usually contaminated by numerous measurement outliers. These outliers can be sparse outliers, isolated outlier clusters, or non-isolated outlier clusters. Non-isolated outlier clusters make automatic outlier detection especially hard. They are attached to the scanned points on the object surface and are difficult to distinguish from these valid surface measurements. This paper presents an effective outlier detection method based on the principle of majority voting. The method detects non-isolated outlier clusters as well as the other types of outliers in a scanned point cloud. Its key component is a majority voting scheme that cuts the connection between non-isolated outlier clusters and the scanned surface, so that non-isolated outliers become isolated. An expandable boundary criterion is also proposed. It removes isolated outliers and preserves valid point clusters more reliably than a simple cluster-size threshold. The method was validated by comparison with several existing methods on a variety of scanned point clouds.

2.
Zhang Yue, Liu Jie, Li Hang. Computer Engineering (计算机工程), 2013, 39(3): 46-50, 55
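Most existing outlier detection methods require the number of outliers to be specified in advance, and an inaccurate setting lowers detection accuracy. To address this problem, a probability-based outlier detection method is proposed. The method combines the density-based DBSCAN algorithm with a variance computed about the median to cluster the data set. It extracts the suspected outliers that belong to no cluster and analyzes them to determine the final outliers. The data examined by the method are linearly independent of time factors. Neither the number of outliers nor the number of clusters has to be preset, and the method is strongly robust to noisy data. Experiments on the IRIS data set show that the method identifies outliers effectively.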
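The following is a minimal sketch of the two-stage idea in entry 2, under stated assumptions. In the first stage, DBSCAN labels the points that belong to no cluster as suspects. In the second stage, a median-centred dispersion test decides which suspects are confirmed as outliers. The confirmation rule is a hypothetical choice for illustration, not the authors' exact procedure: a suspect is confirmed when its nearest-neighbour distance exceeds the median nearest-neighbour distance plus `c` times the root-mean-square deviation from that median. The names `detect_outliers`, `eps`, `min_pts` and `c` are likewise hypothetical.

```python
import math
from statistics import median
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def region_query(data: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of data[i], including i itself."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]


def dbscan(data: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: one cluster label per point, -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(data)
    cluster = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbours = region_query(data, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(data, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbours)
        cluster += 1
    return [lab if lab is not None else -1 for lab in labels]


def detect_outliers(data: Sequence[Point], eps: float, min_pts: int,
                    c: float = 3.0) -> List[int]:
    """Stage 1: DBSCAN noise points are suspects.
    Stage 2 (assumed rule): keep a suspect if its nearest-neighbour distance
    exceeds median + c * sqrt(mean squared deviation from the median)."""
    labels = dbscan(data, eps, min_pts)
    suspects = [i for i, lab in enumerate(labels) if lab == -1]
    nn = [min(math.dist(p, q) for j, q in enumerate(data) if j != i)
          for i, p in enumerate(data)]
    med = median(nn)
    spread = math.sqrt(sum((d - med) ** 2 for d in nn) / len(nn))
    return [i for i in suspects if nn[i] > med + c * spread]
```

Usage example: take a 5×5 unit grid plus the points (6.5, 2.0) and (20.0, 20.0), with `eps=1.5`, `min_pts=3` and `c=3.0`. DBSCAN marks both extra points as noise, but only (20.0, 20.0) passes the median-based confirmation. The brute-force neighbour search is quadratic, so it is meant for small inputs only.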

3.

Outlier detection has become an important research area in stream data mining because of its many applications. Many methods have been proposed in the literature. However, they work well only for simple, positive regions of outliers and give little importance to boundary regions. Moreover, an algorithm that processes stream data must be effective and able to handle unbounded data in one pass or a limited number of passes. These problems motivated us to propose an outlier detection approach for large-scale data streams. The proposed algorithm employs three ideas:
- the concept of relative cardinality;
- the entropy outlier factor theory of information-based systems;
- a size-variant sliding window over the stream data.

In addition, we propose a new methodology for concept drift adaptation on evolving data streams. The proposed method is run on nine benchmark data sets and compared with six existing methods: EXPoSE, iForest, OC-SVM, LOF, KDE, and FastAbod. The proposed method outperforms these six methods for positive regions as well as boundary regions. It does so in terms of the receiver operating characteristic curve, precision-recall, and computational time.


4.
Outlier detection is an important field of data mining with several applications in medical research. Mining outliers based on the notion of rare patterns is a promising approach to medical diagnosis, because it tries to identify the unconventional and abnormal risk patterns present in medical data. A crucial issue in medical data analysis is that medical databases grow continuously as new records are added. Existing outlier detection techniques handle only static data, so they must re-run from scratch to find outliers in incremental medical data. This paper introduces an efficient rare pattern based outlier detection (RPOD) method that identifies outliers by mining rare patterns from incremental data. Existing rare pattern mining techniques perform multiple database scans and expensive candidate generation. To avoid both and to support incremental mining, a single-pass, prefix-tree-based rare pattern mining technique is proposed, adapted from the well-known FP-Growth frequent pattern mining algorithm. An outlier detection technique that identifies outliers from the generated set of rare patterns is also presented. The RPOD approach is evaluated on several well-known medical data sets, and comparisons show that it outperforms existing outlier mining methods.

5.
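To deal with erroneous data in data sources, this paper analyzes the importance of outlier detection in data cleaning and proposes an erroneous-data cleaning method based on outlier detection. Commonly used outlier detection methods are first compared and analyzed. On that basis, an effective outlier detection method is adopted to detect outliers in the data source. Finally, a case study verifies the method. The results show that the outlier-detection-based cleaning method effectively detects erroneous data in data sources.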

6.
In recent years, much attention has been given to outlier detection, whose aim is to find outliers: objects that behave in an unexpected way or have abnormal properties. Identifying outliers is important for many applications, such as intrusion detection, credit card fraud, criminal activity in electronic commerce, medical diagnosis and anti-terrorism. In this paper, we propose a hybrid approach to outlier detection that combines ideas from boundary-based and distance-based methods ([Jiang et al., 2005], [Jiang et al., 2009] and [Knorr and Ng, 1998]). We give a novel definition of outliers, BD (boundary and distance)-based outliers. It draws on the notion of the boundary region in rough set theory and on the definitions of distance-based outliers. An algorithm to find such outliers is also given. The effectiveness of the method is demonstrated on two publicly available databases.
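The distance-based half of the hybrid in entry 6 builds on the DB(p, D) outlier of Knorr and Ng. An object is a DB(p, D)-outlier if at least a fraction p of the data set lies farther than distance D from it. The brute-force sketch below covers only that definition. The rough-set boundary-region part of the BD-based definition is not modeled, and the function name `db_outliers` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def db_outliers(data: Sequence[Tuple[float, ...]], p: float, d: float) -> List[int]:
    """Indices of DB(p, D)-outliers: objects for which at least a fraction p
    of all objects lie farther than distance d (nested loop, O(n^2))."""
    n = len(data)
    result = []
    for i, o in enumerate(data):
        far = sum(1 for q in data if math.dist(o, q) > d)  # o itself is at distance 0
        if far >= p * n:
            result.append(i)
    return result
```

For example, take the four corners of a unit square plus (10, 10), with p = 0.75 and D = 3. Only (10, 10) qualifies, because the other four objects (80% of the set) are all more than 3 away from it.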

7.
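Statistical monitoring models based on principal component analysis (PCA) are strongly affected by outliers in the historical data, and outliers in the historical modeling data degrade process-monitoring performance. This paper analyzes the principles and shortcomings of commonly used robust outlier detection algorithms. It then combines the closest distance to center (CDC) method with the ellipsoidal multivariate trimming (MVT) method into CDC-MVT, an integrated outlier detection algorithm based on a robust scale that detects outliers more accurately. The algorithm was applied to an industrial fermentation process. Compared with CDC and MVT alone, it effectively removes outliers from the modeling data.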

8.
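On-site verification of electricity theft faces practical difficulties that are hard to overcome. This paper therefore proposes an electricity theft detection method based on outlier data mining. The outlier algorithm is built on density-based clustering and proceeds in three steps:
- It identifies different consumption patterns from the direction of fluctuations in electricity consumption.
- It mines potential outlier points by computing consumption frequency, outlier distance, and the degree of association with anomaly rules.
- It derives an outlier threshold from an evaluation matrix and uses it to assess how likely each outlier point is to involve electricity theft.

Together, these steps give a data-driven way of detecting theft. Finally, simulation tests on consumption data that mixes different consumption patterns show that the algorithm detects theft better than other data mining algorithms.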

9.
Liang Shaoyi, Han Deqiang. Control and Decision (控制与决策), 2019, 34(7): 1433-1440
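Much research in outlier detection focuses on a class of "density-based" methods. These methods overcome many shortcomings of traditional outlier detection methods. However, most of them still estimate the local density of data points from geometric distances, which in some cases produces counterintuitive results. To address this problem, the paper replaces the traditional local-density estimate with one based on neighborhood chains and designs a new outlier detection method on top of it. The method was compared with the classic density-based method LOF (Local Outlier Factor) and several LOF-based improvements. Experimental results show that it distinguishes normal from abnormal data points more accurately and avoids counterintuitive results.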
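Entry 9 compares against LOF (Local Outlier Factor), the classic density-based baseline. The sketch below computes plain LOF scores from Euclidean distances, which is the geometric-distance density estimate that the paper replaces. The neighborhood-chain estimator itself is not shown. Scores near 1 mark points about as dense as their neighbors, and much larger scores mark outliers. The sketch assumes there are no duplicate points, since duplicates would make a local reachability density infinite.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lof_scores(data: Sequence[Point], k: int) -> List[float]:
    """Local Outlier Factor of every point (Euclidean distance, no duplicates)."""
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]

    k_distance: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbour
        k_distance.append(kd)
        # k-distance neighbourhood: may hold more than k points when distances tie
        neighbours.append([j for d, j in others if d <= kd])

    def reach_dist(i: int, j: int) -> float:
        # reachability distance of i with respect to neighbour j
        return max(k_distance[j], dist[i][j])

    # local reachability density = inverse of the mean reachability distance
    lrd = [len(neighbours[i]) / sum(reach_dist(i, j) for j in neighbours[i])
           for i in range(n)]
    # LOF = mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]
```

On a 3×3 unit grid plus the point (10, 10) with k = 3, the far point gets by far the largest score, and every grid point stays close to 1.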

10.
A fuzzy index for detecting spatiotemporal outliers   Cited: 1, self-cited: 1, cited by others: 0
Detecting spatial outliers helps extract important and valuable information from large spatial data sets. Most existing work on outlier detection treats being an outlier as a binary property. In many scenarios, however, it is more meaningful to assign each object a degree of being an outlier, and the temporal dimension should also be taken into account. In this paper, we formally introduce a new notion of spatial outliers, discuss the spatiotemporal outlier detection problem, and design a methodology to discover these outliers effectively. We introduce a new index, the fuzzy outlier index (FoI), which expresses the degree to which a spatial object belongs to a spatiotemporal neighbourhood. The proposed method can be applied to phenomena that evolve over time, such as moving objects, pedestrian modelling or credit card fraud.

11.
Fuzzy outlier detection based on mathematical morphology   Cited: 1, self-cited: 0, cited by others: 1
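Outlier detection is an important data mining task that can lead to unexpected knowledge discovery. Traditional outlier detection techniques, however, ignore the natural structure of the data, namely the relationship between outliers and clusters. Combining outlier scores with clustering helps in studying this relationship. This paper proposes fuzzy outlier detection and analysis based on mathematical morphology. The approach integrates mathematical morphology and connectivity-based outlier detection into one fuzzy model. The model analyzes the fuzzy relationship between objects and clusters from two aspects, outlier membership and fuzzy membership. Extensive experiments show that the algorithm finds outliers correctly and efficiently in data sets with complex planar shapes and varying density. It also discovers the cluster information associated with those outliers and measures how strongly each outlier is associated with a cluster core, which helps interpret what the outliers mean.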

12.

Data points located near a cluster boundary are called boundary points, and they can carry useful information about the process that generated the data. Existing boundary point detection methods cannot differentiate boundary points from outliers, because they are affected by the presence of outliers and by the size and density of clusters in the data set. They also require tuning one or more parameters, and that tuning needs prior knowledge of the number of outliers in the data set. This research proposes BPF, a boundary point detection method that effectively differentiates boundary points from outliers and core points. BPF combines the well-known Local Outlier Factor (LOF) outlier detection method with a Gravity value to compute the BPF score. Our algorithm StaticBPF detects the top-m boundary points in a given data set. Importantly, StaticBPF needs only one tuned parameter, the number of nearest neighbors \(k\), and can reuse the same \(k\) that LOF uses for outlier detection. The paper also extends BPF to streaming data as StreamBPF. StreamBPF uses a grid structure to speed up k-nearest-neighbor computation. It also updates, incrementally, the BPF scores of a subset of the data points in a sliding window over the stream. The evaluation measures the accuracy of StaticBPF and the runtime efficiency of StreamBPF on synthetic and real data, where both generally performed better than their competitors.


13.
A density-based local outlier detection algorithm   Cited: 1, self-cited: 0, cited by others: 1
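Statistics-based and distance-based outlier detection both depend on the global distribution of the given data set, yet data are usually not uniformly distributed. When the data contain regions of very different density, density-based local outlier detection is good at identifying local outliers, but its time complexity is high. This paper proposes an improved algorithm that reduces the time complexity and detects local outliers effectively.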

14.
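In outlier detection, the data can be viewed as normal points and anomalous points highly mixed in space. Provided that few normal points are lost, the outliers are usually contained in the set of samples farthest from the cluster centers. Inspired by this idea, the paper proposes an interpolation-based outlier detection method for high-dimensional sparse data. Building on K-means, the method uses a genetic algorithm to interpolate the original data. This solves the problem that sparse data are easily merged during K-means clustering. The method was compared with outlier detection based on traditional K-means clustering and with several typical improved K-means methods. Experimental results show that it loses fewer normal points and achieves higher detection accuracy and precision.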

15.
Statistical outlier detection using direct density ratio estimation   Cited: 2, self-cited: 2, cited by others: 0
We propose a new statistical approach to inlier-based outlier detection, that is, finding outliers in a test set using a training set that consists only of inliers. Our key idea is to use the ratio of the training and test data densities as an outlier score. Methods exist that estimate the density ratio directly, without first estimating each density. For that reason, this approach is expected to perform well even in high-dimensional problems. Among the available density ratio estimators, we use unconstrained least-squares importance fitting (uLSIF). uLSIF comes with natural cross-validation procedures, which let us objectively optimize tuning parameters such as the regularization parameter and the kernel width. Furthermore, uLSIF has a closed-form solution as well as a closed-form formula for the leave-one-out error. It is therefore computationally very efficient and scales to massive data sets. Simulations with benchmark and real-world data sets illustrate the usefulness of the proposed approach.
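The closed-form uLSIF fit described in entry 15 can be sketched in pure Python. Following the abstract, the ratio w(x) = p_train(x) / p_test(x) is modeled as a nonnegative combination of Gaussian kernels. Minimizing the squared error against the test density, with an L2 penalty, gives α = max(0, (H + λI)^(-1) h). Here H averages φ(x)φ(x)^T over the test points and h averages φ(x) over the training points. Test points with a small estimated ratio are outlier candidates. Two simplifications are assumptions of this sketch:
- The kernel centres are the first `n_centers` training points.
- `sigma` and `lam` are fixed by hand instead of being selected by cross-validation as the abstract describes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian(x: Point, c: Point, sigma: float) -> float:
    return math.exp(-math.dist(x, c) ** 2 / (2 * sigma ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for cc in range(col, n + 1):
                m[r][cc] -= f * m[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][cc] * x[cc] for cc in range(r + 1, n))) / m[r][r]
    return x


def ulsif_scores(train: Sequence[Point], test: Sequence[Point], sigma: float = 1.0,
                 lam: float = 0.1, n_centers: int = 50) -> List[float]:
    """Estimated p_train(x) / p_test(x) at each test point; low = outlier candidate."""
    centers = list(train[:n_centers])  # assumption: first n_centers training points
    b = len(centers)
    phi_te = [[gaussian(x, c, sigma) for c in centers] for x in test]
    phi_tr = [[gaussian(x, c, sigma) for c in centers] for x in train]
    # H: test-sample average of phi phi^T, plus the ridge term lam * I
    H = [[sum(f[i] * f[j] for f in phi_te) / len(test) + (lam if i == j else 0.0)
          for j in range(b)] for i in range(b)]
    # h: training-sample average of phi
    h = [sum(f[i] for f in phi_tr) / len(train) for i in range(b)]
    alpha = [max(0.0, a) for a in solve(H, h)]  # clip negative weights to zero
    return [sum(a * v for a, v in zip(alpha, f)) for f in phi_te]
```

The weights are clipped to be nonnegative and the kernels are positive, so every score is nonnegative. A test point far from all training data, for example x = 8 when the inliers lie in [-2, 2], gets a score near zero.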

16.
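To mine outliers in data streams, this paper proposes DOKM, an outlier detection algorithm built on K-means clustering that uses a distance-based criterion to decide which data points are outliers. The sliding window size is adjusted adaptively according to the results of concept drift detection on the stream, which enables outlier detection over the data stream. DOKM was tested in a series of experiments and compared with other outlier algorithms. The results show that it detects outliers effectively on both synthetic and real data sets.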
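Below is a minimal sketch of the static core of a K-means-plus-distance detector like the one in entry 16. It clusters the data with K-means, then flags points whose distance to their assigned centroid exceeds the mean of those distances plus `t` standard deviations. Several parts are assumptions, not DOKM's published details:
- the threshold rule;
- the first-k-points initialization;
- the function names.

The drift-driven adaptive sliding window is not modeled.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(data: Sequence[Point], k: int, iters: int = 100):
    """Lloyd's K-means; returns (centroids, assignment)."""
    centroids = [tuple(p) for p in data[:k]]  # assumption: first k points as seeds
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in data]
        if new_assign == assign:  # converged: assignment unchanged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(data, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign


def kmeans_outliers(data: Sequence[Point], k: int, t: float = 2.0) -> List[int]:
    """Flag points whose centroid distance exceeds mean + t * std (assumed rule)."""
    centroids, assign = kmeans(data, k)
    d = [math.dist(p, centroids[a]) for p, a in zip(data, assign)]
    limit = mean(d) + t * pstdev(d)
    return [i for i, di in enumerate(d) if di > limit]
```

In a streaming version, this check would run on the contents of the current window. Seeding with the first k points is only for determinism here. Farthest-point seeding is a poor fit for this task, because it tends to give an outlier its own cluster, which hides it.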

17.
Research on local outlier mining algorithms   Cited: 14, self-cited: 0, cited by others: 14
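Outliers can be divided into global outliers and local outliers, and in many cases mining local outliers is more meaningful than mining global ones. Existing algorithms based on a local outlier degree have two main limitations: their detection accuracy depends on user-specified parameters, and they are computationally expensive. To overcome these limitations, this paper divides object attributes into intrinsic attributes and environmental attributes. The environmental attributes determine an object's neighborhood, and the intrinsic attributes are used to compute its outlier degree. Spatial data serve as an example: the spatial attributes determine the spatial neighborhood, and the non-spatial attributes are used to compute the spatial outlier degree. On this basis, a spatial outlier mining algorithm is designed. Experimental results show that the algorithm depends little on user input and has high detection accuracy, good scalability, and high computational efficiency.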
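The split in entry 17 can be illustrated with a common spatial-outlier test. One set of attributes defines the neighborhood and another measures the deviation. Here the spatial coordinates pick the k nearest neighbors, and each object's non-spatial value is compared with the mean value of those neighbors. The differences are then standardized across the data set. This z-score rule is a generic stand-in, not the paper's outlier-degree measure, and the parameters `k` and `theta` and the function name are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def spatial_outliers(locations: Sequence[Tuple[float, ...]], values: Sequence[float],
                     k: int = 4, theta: float = 2.0) -> List[int]:
    """Indices whose standardized neighbour difference exceeds theta."""
    n = len(locations)
    diffs = []
    for i in range(n):
        # neighbourhood from the spatial (environmental) attributes only
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: math.dist(locations[i], locations[j]))[:k]
        # deviation measured on the non-spatial (intrinsic) attribute only
        diffs.append(values[i] - mean([values[j] for j in nbrs]))
    mu, sd = mean(diffs), pstdev(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - mu) > theta * sd]
```

Consider a 5×5 grid where every cell has value 10 except the centre cell, which has value 50, with k = 4. Only the centre is flagged. Its four direct neighbours also show a nonzero difference, but it stays below the threshold.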

18.
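Distance-based outlier detection algorithms are constrained by a global threshold and can therefore detect only global outliers. To mine local outliers, this paper proposes a two-phase outlier detection algorithm based on clustering partition:
- First, agglomerative hierarchical clustering is iterated to obtain the value of k that K-means needs, and K-means then partitions the data set into several micro-clusters.
- Second, to improve mining efficiency, an information-entropy-based filtering mechanism determines whether each micro-cluster contains outliers.
- Finally, a distance-based method extracts the local outliers from the micro-clusters that contain them.

Experimental results show that the algorithm is efficient and accurate and has low time complexity.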

19.
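Anomaly detection has long been one of the important tasks in data mining. When algorithms based on Euclidean distance are applied to high-dimensional data, their detection accuracy cannot be guaranteed and their running time becomes excessive. Building on angle-variance-based anomaly detection, this paper proposes HODA, a multi-level anomaly detection algorithm for high-dimensional data (Hybrid outlier detection algorithm based on angle variance for High-dimensional data). HODA works in three stages:
- It uses rough set theory to analyze interactions between attributes and discard attributes with little influence.
- It analyzes the data distribution in each dimension, partitions the data into grid cells, and finds the cells that may contain anomalies.
- It computes the angle-variance outlier factor for those cells to single out the anomalous data.

Experimental results show that, compared with ABOD, FastVOA and the classic LOF algorithm, HODA keeps detection accuracy while greatly reducing running time, and it scales well.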
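HODA's final stage (entry 19) computes an angle-variance outlier factor. The sketch below is the brute-force angle-based outlier factor (ABOF) of ABOD, one of the methods the abstract compares against. For each point A, ABOF is the variance of ⟨AB, AC⟩ / (|AB|^2 |AC|^2) over all pairs of other points B and C, weighted by 1 / (|AB| |AC|). A small factor means the rest of the data lies within a narrow angle as seen from A, which is typical of points outside the data. The rough-set attribute filtering and grid pruning are not shown. The function name is hypothetical, and duplicate points are assumed absent because they would cause a division by zero.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def abof(data: Sequence[Tuple[float, ...]], i: int) -> float:
    """Angle-based outlier factor of data[i]; smaller values are more outlying."""
    a = data[i]
    vecs = [tuple(q - p for p, q in zip(a, b)) for j, b in enumerate(data) if j != i]
    sw = swx = swx2 = 0.0
    for u, v in combinations(vecs, 2):
        nu2 = sum(x * x for x in u)
        nv2 = sum(x * x for x in v)
        val = sum(x * y for x, y in zip(u, v)) / (nu2 * nv2)  # <AB,AC> / (|AB|^2 |AC|^2)
        w = 1.0 / math.sqrt(nu2 * nv2)                          # weight 1 / (|AB| |AC|)
        sw += w
        swx += w * val
        swx2 += w * val * val
    m = swx / sw
    return swx2 / sw - m * m  # weighted variance
```

Computing the factor for every point costs O(n^3), so pruning attributes and grid cells, as the abstract describes, matters in practice.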

20.
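Outliers cause data-quality problems that harm power-system applications. This paper analyzes the characteristics of power sensing data and proposes an outlier detection scheme for such data. The scheme is based on the temporal characteristics of the data and on the requirement that anomaly detection be easy to use. It consists of an anomaly detection service framework and an outlier detection method. The framework borrows the idea of Web services and is built on big data technology. It supports storage and computation of power sensing data and offers anomaly detection for that data as a service. The outlier detection method detects outliers in power sensing data by combining a clustering algorithm with a data segmentation method that takes the time attribute into account. Experiments verify that the method is feasible and effective. The results show that it identifies outliers in power sensing data with temporal correlation and continuity, and that it parallelizes and scales well as the data volume grows.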
