Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
One-class classification is one of the novel and very promising topics in contemporary machine learning. In recent years, ensemble approaches have gained significant attention because they increase robustness to unknown outliers and reduce the complexity of the learning process. In our previous work, we proposed a highly efficient one-class classifier ensemble based on clustering the input data and training weighted one-class classifiers on the clustered subsets. However, the main drawback of this approach lay in the difficult and time-consuming selection of the number of competence areas, which indirectly affects the number of members in the ensemble. In this paper, we investigate ten different methodologies for automatically determining the optimal number of competence areas for the proposed ensemble. They have roots in model selection for clustering but can also be applied effectively to the classification task. To select the most useful technique, we investigate their performance on a number of one-class and multi-class problems. Numerous experimental results, backed up by statistical testing, allow us to propose an efficient and fully automatic method for tuning one-class clustering-based ensembles.

2.
This paper focuses on outlier detection and its application to process monitoring. The main contribution is a dynamic ensemble detection model that uses one-class classifiers as base learners. Developing a dynamic ensemble model for one-class classification is challenging because labeled training samples are absent. To this end, we propose a procedure that generates pseudo outliers; before that, we transform the outputs of all base classifiers into probabilities. We then use a probabilistic model to evaluate the competence of all base classifiers. The Friedman test and the Nemenyi test are used together to construct a switching mechanism, which determines whether a single classifier should be nominated to make the decision or a fusion method should be applied instead. Extensive experiments are carried out on 20 data sets and an industrial application to verify the effectiveness of the proposed method.
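The switching idea in this abstract can be illustrated with the standard Friedman statistic and the Nemenyi critical difference. The sketch below is only an assumption about how such a switch might look: the abstract does not state the decision rule, so `nominate_or_fuse`, its arguments, and the rule "nominate the best-ranked classifier only if it beats the runner-up by more than the critical difference" are hypothetical. The critical values are supplied by the caller.

```python
import math

def average_ranks(scores):
    """scores[i][j] is the score of classifier j on evaluation block i.
    Higher is better; rank 1 is best; tied scores share the mean rank."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for t in range(i, j + 1):
                totals[order[t]] += mean_rank
            i = j + 1
    return [t / n for t in totals]

def friedman_statistic(avg_ranks, n):
    """Friedman chi-square statistic for k classifiers over n blocks."""
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )

def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference between two average ranks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

def nominate_or_fuse(scores, chi2_critical, q_alpha):
    """Hypothetical switch: return the index of the nominated classifier,
    or None when the outputs should be fused instead."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    if friedman_statistic(ranks, n) <= chi2_critical:
        return None  # no significant overall difference: fuse
    order = sorted(range(k), key=lambda j: ranks[j])
    best, runner_up = order[0], order[1]
    if ranks[runner_up] - ranks[best] > nemenyi_cd(k, n, q_alpha):
        return best  # the best classifier is significantly ahead
    return None
```

For three classifiers at the 0.05 level, the usual critical values are 5.991 for the chi-square distribution with two degrees of freedom and 2.343 for the Nemenyi test. With those values, a consistent ordering over only 4 blocks is not enough to nominate a single classifier, while the same ordering over 12 blocks is.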

3.
This study addresses the problem of mining data streams with concept drift. The goal of the article is to propose and validate a new approach to this problem that uses an ensemble classifier constructed from one-class base classifiers. It is assumed that the base classifiers of the proposed ensemble are induced from incoming chunks of the data stream. Each chunk consists of prototypes and information about whether the class predictions for these instances, made at earlier steps, were correct. Each data chunk can be updated with an instance selection technique when new data arrive. When a new data chunk is formed, the ensemble model is also updated on the basis of the weights assigned to each one-class classifier. In this article, two well-known instance-based learning algorithms, the CNN and the ENN, have been adopted to solve the one-class classification problems and, consequently, to update the proposed classifier ensemble. The proposed approaches have been validated experimentally, and the results of the computational experiment are shown and discussed. The results indicate that the proposed approach, an ensemble of one-class base classifiers with instance selection for chunk updating, can outperform well-known approaches for data streams with concept drift.

4.
《Real》2002,8(3):213-226
This article presents a quantitative and objective approach to cat ganglion cell characterization and classification. Several biologically relevant features, such as diameter, eccentricity, fractal dimension, influence histogram, influence area, convex hull area, and convex hull diameter, are derived from geometrical transforms and then processed by three different clustering methods (Ward's hierarchical scheme, K-means, and a genetic algorithm), whose results are combined by a voting strategy. These experiments indicate the superiority of some features and also suggest some possible biological implications.

5.
Fault diagnosis of bevel gearboxes is of great significance. At present, the commonly used methods are based on pattern recognition, such as the support vector machine, the convex hull classifier, and the hyperdisk classifier. However, the number of elements in the kernel matrix of these kernel-based classification methods grows quadratically with the data size, resulting in intolerable training time. To address this, a sparse random projection-based hyperdisk classifier model is proposed. The proposed method has the following novelties. First, based on sparse random projection and the geometrical characteristics of the hyperdisk model, a method is designed to screen out the core samples efficiently, and these samples are given different weights in the process. Second, the method introduces slack variables and a dynamic penalty parameter to obtain a hyperdisk model with a more reasonable boundary. Last, a strategy is developed to minimize the adverse effects of imbalanced training data. The effectiveness and applicability of the proposed method are verified on bevel gearbox fault data. The experimental results show that, compared with other classifiers, the proposed method greatly reduces the training time while maintaining high classification accuracy. Moreover, it offers better performance and efficiency in fault diagnosis with imbalanced training data.

6.
The implementation of anomaly detection systems is a key problem that has attracted considerable effort from the scientific community. In this context, the use of one-class techniques to model a training set of non-anomalous objects can play a significant role. One common approach to the one-class problem is based on determining the geometric boundaries of the target set. More specifically, the convex hull combined with random projections offers good results but performs poorly when applied to non-convex sets. This work therefore proposes a new method that addresses this issue by implementing non-convex boundaries over each projection. The proposal was assessed and compared with the most common one-class techniques over different data sets, obtaining successful results.
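The baseline that this abstract builds on, a convex hull combined with random projections, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses Gaussian random 2-D projections and plain convex hulls, and it does not reproduce the non-convex boundaries that the paper proposes. The names `ProjectedHullDetector`, `n_projections`, `seed`, and `tol` are hypothetical.

```python
import random

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))

def project(matrix, x):
    """Map a d-dimensional sample to 2-D with a 2 x d matrix."""
    return tuple(sum(w * v for w, v in zip(row, x)) for row in matrix)

class ProjectedHullDetector:
    """Accept a sample only if it lies inside the training hull in
    every random 2-D projection (hypothetical illustrative class)."""

    def __init__(self, n_projections=10, seed=0, tol=1e-9):
        self.n_projections = n_projections
        self.seed = seed
        self.tol = tol
        self.models = []

    def fit(self, samples):
        rng = random.Random(self.seed)
        dim = len(samples[0])
        self.models = []
        for _ in range(self.n_projections):
            matrix = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
            hull = convex_hull([project(matrix, x) for x in samples])
            if len(hull) < 3:
                raise ValueError("degenerate projection: hull has fewer than 3 vertices")
            self.models.append((matrix, hull))
        return self

    def is_target(self, x):
        for matrix, hull in self.models:
            q = project(matrix, x)
            n = len(hull)
            # Inside a counter-clockwise polygon means left of (or on) every edge.
            if any(cross(hull[i], hull[(i + 1) % n], q) < -self.tol for i in range(n)):
                return False
        return True
```

Because a point inside the original convex hull stays inside the hull of every linear projection, the test never rejects such a point; adding projections can only tighten the accepted region.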

7.
The pulse-coupled neural network (PCNN) has been widely used in image processing. The outputs of a PCNN represent unique features of the original stimulus and are invariant to translation, rotation, scaling, and distortion, which makes them particularly suitable for feature extraction. In this paper, the PCNN and the intersecting cortical model (ICM), a simplified version of the PCNN model, are applied to extract texture features that are invariant to the geometrical changes of rotation and scale; a one-class support vector machine based classification method is then employed to train on and predict from the features. The experimental results show that the pulse features outperform the classic Gabor features in both feature extraction time and retrieval accuracy, and that the proposed retrieval system based on the one-class support vector machine is more accurate and more robust to geometrical changes than the traditional system based on Euclidean distance.

8.
Evolving diverse ensembles using genetic programming has recently been proposed for classification problems with unbalanced data. Population diversity is crucial for evolving effective algorithms. Multilevel selection strategies that involve additional colonization and migration operations have shown better performance in some applications. Therefore, in this paper, we analyse the performance of evolving diverse ensembles using genetic programming for software defect prediction with unbalanced data under different selection strategies. We use colonization and migration operators along with three ensemble selection strategies for the multi-objective evolutionary algorithm. We compare the performance of the operators on software defect prediction datasets with varying levels of data imbalance. Moreover, to generalize the results, gain a broader view, and understand the underlying effects, we replicate the same experiments on UCI datasets, which are often used in the evolutionary computing community. The multilevel selection strategies provide reliable results with relatively fast convergence and outperform the other evolutionary algorithms that are often used in this research area and investigated in this paper. This paper also presents a promising ensemble strategy based on a simple convex hull approach and raises the question of whether an ensemble strategy based on the whole population should also be investigated.

9.
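High-dimensional data usually contain redundancy and noise, so a covering model built directly on them cannot fully reflect the distribution of the data, which degrades classifier performance. To address this, a multi-tree ensemble classification method based on pruned random subspaces is proposed. The method first generates multiple random subspaces and constructs an independent minimum spanning tree covering model on each subspace. It then prunes the classification models built on the subspaces, using an evaluation criterion (the AUC value) to prune the generated one-class classifiers. Finally, the remaining classifiers are fused into a single ensemble classifier by averaging. Experimental results show that, compared with other direct covering classification models and the bagging algorithm, the multi-tree ensemble covering classifier achieves higher classification accuracy.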

10.
In this article, a novel technique for user authentication and verification using gait as an unobtrusive biometric pattern is proposed. The method is based on a two-stage pipeline. First, a general activity recognition classifier is personalized for a specific user using a small sample of her or his walking pattern. As a result, the system is much more selective with respect to the new walking pattern. A second stage verifies whether or not the user is an authorized one. This stage is defined as a one-class classification problem. To solve this problem, a four-layer architecture is built around the geometric concept of the convex hull. This architecture improves robustness to outliers, models non-convex shapes, and takes temporal coherence information into account. Two different scenarios with two different wearable systems are proposed for validation. First, a custom high-performance wearable system is built and used in a free environment. A second dataset is acquired from an Android-based commercial device in a 'wild' scenario with rough terrain, adversarial conditions, crowded places, and obstacles. Results on both systems and datasets are very promising, reducing the verification error rates by an order of magnitude with respect to state-of-the-art technologies.

11.
The standard support vector machine (SVM) is not suitable for classifying large data sets because of its high training complexity. A convex hull can simplify SVM training. However, the classification accuracy drops when inseparable points exist. This paper introduces a novel method for SVM classification, called the convex–concave hull SVM (CCH-SVM). After grid processing, the convex hull is used to find extreme points. Then, the Jarvis march method is used to determine the concave (non-convex) hull for the inseparable points. Finally, the vertices of the convex–concave hull are used for SVM training. The proposed CCH-SVM classifier has distinctive advantages in dealing with large data sets. We apply the proposed method to several benchmark problems. Experimental results demonstrate that our approach achieves good classification accuracy while training significantly faster than other SVM classifiers. Compared with other convex hull SVM methods, its classification accuracy is higher.
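For readers unfamiliar with the Jarvis march named in this abstract, the sketch below shows the classical gift-wrapping version, which finds the convex hull of a 2-D point set. It is an illustration only: the paper's grid preprocessing and its concave-hull variant for inseparable points are not described in the abstract and are not reproduced here, and the function names are assumptions.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative when b is to the right of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def jarvis_march(points):
    """Gift wrapping: return convex hull vertices in counter-clockwise order,
    starting from the leftmost (then lowest) point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    hull = []
    start = pts[0]  # the lexicographically smallest point is always on the hull
    current = start
    while True:
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Move to p if it lies to the right of current -> candidate, or if it
            # is collinear with the candidate but farther from current.
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:
            break
    return hull
```

Each wrapping step scans all n points, so the running time is O(nh) for h hull vertices. In a hull-based SVM scheme, only the returned vertices (typically far fewer than n) would be passed on to training.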

12.
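To detect salient objects in images more accurately, a salient object detection algorithm based on multi-prior fusion is proposed. Because the traditional center prior fails for salient objects that lie away from the image center, the approximate location of the object is determined by intersecting the minimum convex hulls of the salient object computed in multiple color spaces, and the center prior is computed from the center of the convex hull region. A fusion strategy then combines the convex-hull-region center prior, the color contrast prior, and the background prior and integrates them into the feature matrix. Finally, the saliency map is generated by a low-rank matrix recovery model. Simulation experiments on the public datasets MSRA1000 and ESSCD show that MPLRR produces clear, highlighted saliency maps of the salient objects, and that evaluation metrics such as F, AUC, and MAE improve markedly over many existing methods.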

13.
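Gu Xiaoqing, Zhang Cong, Ni Tongguang. Control and Decision (《控制与决策》), 2020, 35(5): 1151-1158
In traditional kernel-based classification methods, kernel matrix computation has high complexity and cannot meet the demands of large-scale data classification. To address this problem, a fast convex hull classifier based on random projection (FCHC-RP) is proposed. First, random projection is used to project the samples onto multiple two-dimensional subspaces, and the subspace data are mapped into the feature space. Second, a candidate set for the convex hull is obtained from the geometric characteristics of the data distribution. Third, the convex hull vectors in the feature space are computed according to the definition of the convex hull. Finally, a support vector machine is trained on the original samples that correspond to the convex hull vectors, together with their weights. FCHC-RP also applies to imbalanced classification: by choosing different parameters according to the degree of imbalance between the two classes, it obtains convex hull sets of comparable size for both classes and so balances the training data. Theoretical analysis and experimental results confirm the advantages of FCHC-RP in classification performance and training time.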

14.
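Diao Shumin, Wang Yongli. Journal of Computer Applications (《计算机应用》), 2009, 29(6): 1578-1581
When making an ensemble decision, existing ensemble classification methods require a common set of labeled training samples that is valid for all the ensemble classifiers. To solve the problem of ensemble classification decisions on data streams when no labeled samples are available, a fusion strategy for data stream ensemble classifiers based on constraint learning is proposed. When determining the decision on a test sample, a method that satisfies the constraint measure of every local classifier is designed according to transductive learning theory; this guarantees the feasibility of the constraints and solves the transductive extension of the maximum entropy problem in distributed classification aggregation. Experiments on test datasets show that, compared with existing transductive learning methods, this method achieves better decision accuracy and can be applied to the fusion of data stream ensemble classifiers.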

15.
Despite the great success of transfer learning techniques in anomaly detection, it is still challenging to achieve a good transfer of detection rules based merely on the preferred data in anomaly detection with one-class classification, especially for data with a large distribution difference. To address this challenge, a novel deep one-class transfer learning algorithm with domain-adversarial training is proposed in this paper. First, by integrating a hypersphere adaptation constraint into...

16.
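With the arrival of the mobile Internet era, more and more spatial data containing geographic location information need to be processed, and performing common geometric queries over massive spatial data has become a challenge. The convex hull problem has become a research focus in recent years because it is widely used in pattern recognition, image processing, statistics, geographic information systems, game theory, graph theory, and other fields. Research on the convex hull problem began with single-machine algorithms and then moved to disk-based distributed systems such as Hadoop. However, limited by the compute and storage bottleneck of a single node and by the disk-based nature of the Hadoop platform, their performance still falls short of the demand for online real-time computation. This work studies the convex hull problem on Spark, an in-memory distributed computing framework, and presents an overall framework for convex hull queries on the Spark platform that integrates with the SparkSQL engine at the query interface, syntax parsing, and physical execution levels. It then gives CHStand, a single-machine algorithm based on Andrew's monotone chain algorithm. After analyzing the parallelism limitations of the single-machine algorithm, it proposes the Spark-based CHSpark algorithm and further optimizes it into CHGeom, an optimized algorithm for the Spark platform. Experiments compare the relative performance of the three algorithms and show that the Spark-based solutions substantially outperform the traditional single-machine solution; the proposed algorithms scale well and have broad practical value.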
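Andrew's monotone chain algorithm, which the abstract names as the basis of CHStand, and the general partition-then-merge idea behind distributed hull computation can be sketched in plain Python. This is not the paper's CHSpark or CHGeom, whose details the abstract does not give, and it does not use Spark: `partitioned_hull` and `num_partitions` are hypothetical names, and the list comprehension stands in for a per-partition map step followed by a merge.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def monotone_chain(points):
    """Andrew's monotone chain: sort the points, then build the lower and
    upper chains; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it makes a right turn or is collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def partitioned_hull(points, num_partitions):
    """Compute a local hull per partition, then the hull of all local
    hull vertices. Correct because any vertex of the global hull must
    also be a vertex of the hull of its own partition."""
    parts = [points[i::num_partitions] for i in range(num_partitions)]
    local_vertices = [v for part in parts for v in monotone_chain(part)]
    return monotone_chain(local_vertices)
```

The merge step only sees local hull vertices, which are usually far fewer than the input points; that reduction is what makes the partitioned form attractive when the partitions are processed in parallel.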

17.
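Nearest neighbor convex hull classifier based on sample selection   Cited by: 1 (self-citations: 0, citations by others: 1)
The nearest neighbor convex hull classification algorithm is a nearest-neighbor classifier that uses the distance from a test point to the convex hull of each class's samples as its classification measure. However, the high computational complexity of solving its convex quadratic programming problem limits its use on larger datasets. This paper proposes a sample selection method called subclass convex hull growth. It iteratively selects the point farthest from the convex hull of the samples already selected until a termination condition is met, thereby effectively reducing the dataset. Experiments on the ORL database and the MIT-CBCL face recognition training-synthetic set show that the convex hull generated from the small number of samples selected by subclass convex hull growth represents the training set well and greatly speeds up the algorithm without degrading the performance of the nearest neighbor convex hull classifier.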

18.
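As a branch of hybrid intelligent systems, multiple classifier systems integrate a diverse set of classifiers so that the whole achieves better classification performance. Result fusion is an important problem in this field: with the same member classifiers, a good fusion strategy can effectively improve the overall classification accuracy of the system. As model security receives growing attention, the poor interpretability of traditional fusion strategies has become prominent. This paper builds a model based on the knowledge-line memory theory from psychology and, with reference to the human decision-making process, proposes a...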

19.
Many graphics applications require that 3D models be presented automatically upright and from a good view. We propose a method that simultaneously recognizes the upright orientation and a good view for 3D man-made models. The strategy is to determine, from a small set of candidate bases, the best base on which a 3D model can stand upright. Every candidate base is composed of clustered facets of the simplified convex hull of the given 3D model. Next, a proposed UV-measurement selects the best base from the candidates using weighted feature-based evaluation functions that cover geometrical, physical, and visual aspects. Our method has been tested on a public 3D model database and compared with previous methods. As the experimental results show, our method outperforms previous work in both efficiency and accuracy.

20.
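To address the low accuracy of fine-grained classification caused by insufficient data and by the performance limits of a single convolutional network model, a classification algorithm that combines data augmentation with multi-model ensemble fusion is proposed. First, the CompCars dataset is augmented with six transformations: mirroring, rotation, multi-scale scaling, Gaussian noise, random cropping, and color enhancement. Next, three different networks, CaffeNet, VGG16, and GoogleNet, are trained on differently sampled versions of the dataset. The outputs of the models are then combined by multiple ensembling. The experiments test the classification results of the network architectures under different data augmentation algorithms and different model ensembles. The ensemble reaches a classification accuracy of 94.9%, which is 9.2 percentage points higher than the best single model, GoogleNet. The results show that the algorithm effectively improves classification accuracy.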
