Similar literature
1.
The one-class classification problem aims to distinguish a target class from outliers. The spherical one-class classifier (SOCC) solves this problem by finding a hypersphere with minimum volume that contains the target data while keeping outlier samples outside. SOCC achieves satisfactory performance only when the target samples have the same distribution tendency in all orientations; otherwise, its hypersphere may mistakenly enclose many superfluous outliers. The authors propose to exploit target data structures obtained via unsupervised methods such as agglomerative hierarchical clustering and use them in calculating a set of hyperellipsoidal separating boundaries. This method is named the structured one-class classifier (TOCC). The optimization problem in TOCC can be formulated as a series of second-order cone programming problems that can be solved with acceptable efficiency by primal-dual interior-point methods. The experimental results on artificially generated data sets and benchmark data sets demonstrate the advantages of TOCC.

2.
In many common data analysis scenarios the data elements are logically grouped into sets. Venn and Euler style diagrams are a common visual representation of such set membership, where the data elements are represented by labels or glyphs and sets are indicated by boundaries surrounding their members. Generating such diagrams automatically such that set regions do not intersect unless the corresponding sets have a non-empty intersection is a difficult problem. Further, it may be impossible in some cases if regions are required to be continuous and convex. Several approaches exist to draw such set regions using more complex shapes; however, the resulting diagrams can be difficult to interpret. In this paper we present two novel approaches for simplifying a complex collection of intersecting sets into a strict hierarchy that can be more easily automatically arranged and drawn (Figure 1). In the first approach, we use compact rectangular shapes for drawing each set, attempting to improve the readability of the set intersections. In the second approach, we avoid drawing intersecting set regions by duplicating elements belonging to multiple sets. We compared both of our techniques to the traditional non-convex region technique using five readability tasks. Our results show that the compact rectangular shapes technique was often preferred by experimental subjects even though the use of duplications dramatically improves the accuracy and performance time for most of our tasks. In addition to general set representation, our techniques are also applicable to visualization of networks with intersecting clusters of nodes.

3.
Pattern Recognition, 2014, 47(2): 854-864
In this work, a new one-class classification ensemble strategy called approximate polytope ensemble is presented. The main contribution of the paper is threefold. First, the geometrical concept of convex hull is used to define the boundary of the target class defining the problem. Expansions and contractions of this geometrical structure are introduced in order to avoid over-fitting. Second, the decision whether a point belongs to the convex hull model in high-dimensional spaces is approximated by means of random projections and an ensemble decision process. Finally, a tiling strategy is proposed in order to model non-convex structures. Experimental results show that the proposed strategy is significantly better than state-of-the-art one-class classification methods on over 200 datasets.
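The idea of approximating convex hull membership with random projections can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's algorithm: it projects onto one-dimensional random directions (where the convex hull is simply an interval), and the function names and the `expand` parameter are hypothetical.

```python
import random

def random_directions(dim, count, seed=0):
    """Draw `count` random projection directions in `dim` dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

def project(point, direction):
    """Scalar projection of a point onto a direction."""
    return sum(p * d for p, d in zip(point, direction))

def fit_intervals(targets, directions, expand=0.0):
    """For each direction, store the projected hull of the target data.

    In one dimension the convex hull is the interval [min, max]. `expand` is an
    assumed knob: positive values grow the interval, negative values shrink it.
    """
    intervals = []
    for d in directions:
        values = [project(x, d) for x in targets]
        lo, hi = min(values), max(values)
        margin = expand * (hi - lo)
        intervals.append((lo - margin, hi + margin))
    return intervals

def is_target(point, directions, intervals):
    """Accept a point only if every projection falls inside its interval.

    A point inside the true convex hull passes every projection; a point
    outside is rejected as soon as one direction separates it.
    """
    return all(lo <= project(point, d) <= hi
               for d, (lo, hi) in zip(directions, intervals))
```

With the corners of the unit square as target data and the two coordinate axes as directions, `is_target` accepts `[0.5, 0.5]` and rejects `[2.0, 0.5]`. More random directions tighten the approximation at higher cost.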

4.
As a large-scale database of hundreds of thousands of face images collected from the Internet and digital cameras becomes available, how to utilize it to train a well-performing face detector is quite a challenging problem. In this paper, we propose a method to resample a representative training set from a collected large-scale database to train a robust human face detector. First, in a high-dimensional space, we estimate geodesic distances between pairs of face samples/examples inside the collected face set by isometric feature mapping (Isomap) and then subsample the face set. After that, we embed the face set to a low-dimensional manifold space and obtain the low-dimensional embedding. Subsequently, in the embedding, we interweave the face set based on the weights computed by locally linear embedding (LLE). Furthermore, we resample nonfaces by Isomap and LLE likewise. Using the resulting face and nonface samples, we train an AdaBoost-based face detector and run it on a large database to collect false alarms. We then use the false detections to train a one-class support vector machine (SVM). Combining the AdaBoost-based and one-class SVM-based face detectors, we obtain a stronger detector. The experimental results on the MIT + CMU frontal face test set demonstrated that the proposed method significantly outperforms the other state-of-the-art methods.

5.
One-class learning algorithms are used in situations when training data are available only for one class, called the target class. Data for the other class(es), called outliers, are not available. One-class learning algorithms are used for detecting outliers, or novelty, in the data. The common approach in one-class learning is to use density estimation techniques or adapt standard classification algorithms to define a decision boundary that encompasses only the target data. In this paper, we introduce the OneClass-DS learning algorithm, which combines rule-based classification with a greedy search algorithm based on the density of features. Its performance is tested on 25 data sets and compared with eight other one-class algorithms; the results show that it performs on par with those algorithms.

6.
In this paper, the problem of time-optimal control for hybrid systems with discrete-time dynamics is considered. The hybrid controller steers all trajectories starting from a maximal set to a given target set in minimum time. We derive an algorithm that computes this maximal winning set. Also, algorithms for the computation of level sets associated with the value function, rather than the value function itself, are presented. We show that by solving the reachability problem for the discrete-time hybrid automata we obtain the time-optimal solution as well. The control synthesis is subject to hard constraints on both control inputs and states. For linear discrete-time dynamics, linear programming and quantifier elimination techniques are employed for the backward reachability analysis. Emphasis is placed on the computation of operators for non-convex sets using an extended convex hull approach. A two-tank example is considered in order to demonstrate the techniques of the paper.
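The link between backward reachability and minimum-time level sets can be illustrated on a finite transition system. The sketch below is an assumption-laden simplification: it replaces the paper's hybrid dynamics, linear programming, and quantifier elimination with explicit set operations on a hand-written state graph, and the name `min_time_level_sets` is hypothetical.

```python
def min_time_level_sets(transitions, target):
    """Compute, for each winning state, the minimum guaranteed number of steps to the target.

    transitions: dict mapping state -> dict mapping control -> set of possible
                 successor states (several successors model uncertainty).
    target:      set of target states.
    A state joins the winning set at step k if some control sends ALL of its
    successors into the set of states already winning within k - 1 steps.
    """
    time_to_target = {state: 0 for state in target}
    winning = set(target)
    step = 0
    while True:
        step += 1
        newly_winning = set()
        for state, controls in transitions.items():
            if state in winning:
                continue
            # Controllable predecessor test: one control whose successors all win.
            if any(succs and succs <= winning for succs in controls.values()):
                newly_winning.add(state)
        if not newly_winning:
            break  # fixed point reached: the maximal winning set is complete
        for state in newly_winning:
            time_to_target[state] = step
        winning |= newly_winning
    return time_to_target
```

The keys of the returned dictionary form the maximal winning set, and the states sharing a value `k` form the k-th level set. States that can never be steered to the target are absent from the result.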

7.
In this paper, we investigate the problem of estimating high-density regions from univariate or multivariate data samples. We estimate minimum volume sets, whose probability is specified in advance, known in the literature as density contour clusters. This problem is strongly related to one-class support vector machines (OCSVM). We propose a new method to solve this problem, the one-class neighbor machine (OCNM), and we show its properties. In particular, the OCNM solution converges asymptotically to the exact prespecified minimum volume set. Finally, numerical results illustrating the advantage of the new method are shown.
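A neighbor-based estimate of a high-density region can be sketched roughly as below. This is an illustration under assumptions, not the OCNM as defined by its authors: the distance to the k-th nearest neighbor stands in for a sparsity measure, the threshold is chosen so that about a fraction `nu` of the training points fall outside the region, and all function names are hypothetical.

```python
import math

def kth_neighbor_distance(point, sample, k):
    """Distance from `point` to its k-th nearest neighbor in `sample` (larger = sparser)."""
    distances = sorted(math.dist(point, other) for other in sample)
    return distances[k - 1]

def fit_threshold(sample, k, nu):
    """Pick the sparsity threshold that keeps roughly a (1 - nu) fraction of the sample."""
    scores = []
    for i, point in enumerate(sample):
        others = sample[:i] + sample[i + 1:]  # leave the point itself out
        scores.append(kth_neighbor_distance(point, others, k))
    scores.sort()
    index = max(0, math.ceil((1.0 - nu) * len(scores)) - 1)
    return scores[index]

def in_region(point, sample, k, threshold):
    """A new point belongs to the estimated region if it is no sparser than the threshold."""
    return kth_neighbor_distance(point, sample, k) <= threshold
```

For example, with three clustered points and one distant point, `nu = 0.25` yields a threshold equal to the typical within-cluster neighbor distance, so new points near the cluster are accepted and remote ones are rejected.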

8.
Speaker verification is a challenging problem in speaker recognition where the objective is to determine whether a segment of speech in fact comes from a specific individual. In supervised machine learning terms this is a challenging problem as, while examples belonging to the target class are easy to gather, the set of counter-examples is completely open. This makes it difficult to cast this as a supervised classification problem, as it is difficult to construct a representative set of counter-examples. So we cast this as a one-class classification problem and evaluate a variety of state-of-the-art one-class classification techniques on a benchmark speech recognition dataset. We construct this as a two-level classification process whereby, at the lower level, speech segments of 20 ms in length are classified and then a decision on a complete speech sample is made by aggregating these component classifications. We show that, of the one-class classification techniques we evaluate, Gaussian Mixture Models show the best performance on this task.
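The two-level process described above (classify short frames, then aggregate) might look like the following sketch. Several choices here are assumptions made for brevity: a single diagonal Gaussian replaces the Gaussian Mixture Model, majority voting is used as the aggregation rule, and the names, the threshold, and `min_fraction` are hypothetical.

```python
import math

def fit_gaussian(frames):
    """Fit a diagonal Gaussian to target-speaker frames (a stand-in for a full GMM)."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dim)]
    variances = [max(sum((f[j] - means[j]) ** 2 for f in frames) / n, 1e-6)
                 for j in range(dim)]
    return means, variances

def log_likelihood(frame, model):
    """Log-density of one frame under the diagonal Gaussian."""
    means, variances = model
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, means, variances))

def accept_sample(frames, model, frame_threshold, min_fraction=0.5):
    """Lower level: accept or reject each frame. Upper level: vote over all frames."""
    votes = [log_likelihood(f, model) >= frame_threshold for f in frames]
    return sum(votes) / len(votes) >= min_fraction
```

Voting over many frames makes the sample-level decision less sensitive to individual noisy frames than any single frame-level decision would be.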

9.
10.
Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm for solving supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers.

11.
Novelty detection is the identification of new observations that a machine learning system is not aware of. Detecting novel instances is one of the interesting topics in recent studies. The problem with current methods is their high run-time, which often makes them unusable for large data sets. This paper presents a method that addresses this problem. Focusing on the task of one-class classification, the labeled data are mapped into two hypersphere regions for target and non-target objects. This mapping process is formulated as a nonlinear programming problem, which is solved by employing a filled function to find a global minimizer. The global minimizer is taken as a boundary that fits the target class. In the end, a one-class classifier to detect target class members is obtained. To demonstrate the power of the proposed method, several experiments have been conducted based on 10-fold cross-validation over real-world data sets from the UCI repository. Experimental results show that the proposed method is superior to the state-of-the-art competing methods with regard to the applied evaluation metrics.

12.
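Because traditional one-class classifiers are unsuitable for anomaly detection in periodic time series, a one-class classifier based on phase-shifted weighted spherical single-cluster clustering, PS-WS1M-OCC, is proposed. Adding an efficient circular-shift operation to the clustering process solves the problem of computing the similarity between time-series records. In addition, a new adaptive threshold-determination method based on the weight distribution of the time-series records is proposed, which makes the classifier insensitive to anomalous data in the training set and to parameter settings. Experiments show that the proposed classifier can be used for anomaly detection in periodic time series; compared with traditional one-class classifiers, it learns successfully in an unsupervised manner from training sets that contain anomalous data, is robust to those anomalies, and is insensitive to its parameters.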
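The role of the circular shift in comparing periodic records can be illustrated with a small sketch. It is an assumption-based illustration rather than the PS-WS1M-OCC procedure: it uses cosine similarity and a naive loop over all shifts instead of the efficient operation the abstract mentions, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_shift_similarity(series, centroid):
    """Return (similarity, shift) for the circular shift that best aligns `series` with `centroid`.

    Trying every rotation makes the comparison independent of where in its
    period a record happens to start. This loop costs O(n^2) per comparison.
    """
    best_similarity, best_shift = -2.0, 0
    for shift in range(len(series)):
        rotated = series[shift:] + series[:shift]
        similarity = cosine(rotated, centroid)
        if similarity > best_similarity:
            best_similarity, best_shift = similarity, shift
    return best_similarity, best_shift
```

A record whose best similarity to the cluster centroid stays below a chosen threshold would then be a candidate anomaly; how that threshold is set adaptively is specific to the paper and is not reproduced here.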

13.
One-class learning and concept summarization for data streams   (Cited by: 2 total; 2 self-citations; 0 by others)
In this paper, we formulate a new research problem of concept learning and summarization for one-class data streams. The main objectives are to (1) allow users to label instance groups, instead of single instances, as positive samples for learning, and (2) summarize concepts labeled by users over the whole stream. The employment of batch labeling raises serious issues for stream-oriented concept learning and summarization, because a labeled instance group may contain non-positive samples and users may change their labeling interests at any time. As a result, the positive samples labeled by users over the whole stream may be inconsistent and contain multiple concepts. To resolve these issues, we propose a one-class learning and summarization (OCLS) framework with two major components. In the first component, we propose a vague one-class learning (VOCL) module for concept learning from data streams using an ensemble of classifiers with instance-level and classifier-level weighting strategies. In the second component, we propose a one-class concept summarization (OCCS) module that uses clustering techniques and a Markov model to summarize concepts labeled by users, with only one scan of the stream data. Experimental results on synthetic and real-world data streams demonstrate that the proposed VOCL module outperforms its peers for learning concepts from vaguely labeled stream data. The OCCS module is also able to rebuild a high-level summary for concepts marked by users over the stream.

14.
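The η-one-class problem, the η-outlier problem, and their LP learning algorithms   (Cited by: 1 total; 0 self-citations; 1 by others)
陶卿, 齐红威, 吴高巍, 章显. 《计算机学报》 (Chinese Journal of Computers), 2004, 27(8): 1102-1108
The one-class and outlier problems are studied using the SVM approach. Treating the one-class problem as a function estimation problem, the authors define, for the first time, the generalization error of the η-one-class and η-outlier problems, and then define linear separability and the margin, obtaining maximum-margin, soft-margin, and ν-soft-margin algorithms for solving the one-class problem. These learning algorithms are grounded in statistical learning theory and reduce to solving linear programming problems. The algorithms are implemented following an idea similar to boosting. Experimental results show that the proposed algorithms are of practical value.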

15.
A machine learning evaluation of an artificial immune system   (Cited by: 1 total; 0 self-citations; 1 by others)
ARTIS is an artificial immune system framework which contains several adaptive mechanisms. LISYS is a version of ARTIS specialized for the problem of network intrusion detection. The adaptive mechanisms of LISYS are characterized in terms of their machine-learning counterparts, and a series of experiments is described, each of which isolates a different mechanism of LISYS and studies its contribution to the system's overall performance. The experiments were conducted on a new data set, which is more recent and realistic than earlier data sets. The network intrusion detection problem is challenging because it requires one-class learning in an on-line setting with concept drift. The experiments confirm earlier experimental results with LISYS, and they study in detail how LISYS achieves success on the new data set.

16.
Solution of a non-convex optimization arising in PI/PID control design   (Cited by: 1 total; 0 self-citations; 1 by others)
As shown by Åström et al. (Automatica 34(5) (1998) 585), the problem of designing a stabilizing PI controller by minimizing the integral of error associated with a step load disturbance, subject to constraints on maximum sensitivity and/or complementary sensitivity, amounts to that of finding the maximum allowable integral gain. The latter problem is a non-convex optimization problem whose true solution cannot be obtained with a guarantee by a gradient-based search algorithm. In this paper, we present a novel and effective approach to solve such a non-convex optimization problem. Our approach is based on regarding an equality constraint set on controller gain parameters as a two-dimensional value set in the complex plane and using the notion of principal points to characterize its boundary. With this treatment, we are able to derive analytical expressions for describing the boundary of an equality constraint set in the controller gain plane. These expressions allow one to trace the boundaries of equality constraint sets using an existing path-following algorithm. Hence, by constructing the boundary of the feasible domain in the controller gain space, the maximum allowable integral gain can be obtained. In addition to being able to obtain the globally optimal solution, our approach can handle sensitivity and complementary sensitivity constraints simultaneously without using an iterative procedure.

17.
Dimension reduction (DR) is important in the processing of data in domains such as multimedia or bioinformatics because such data can be of very high dimension. Dimension reduction in a supervised learning context is a well-posed problem in that there is a clear objective of discovering a reduced representation of the data where the classes are well separated. By contrast, DR in an unsupervised context is ill-posed in that the overall objective is less clear. Nevertheless, successful unsupervised DR techniques such as principal component analysis (PCA) exist: PCA has the pragmatic objective of transforming the data into a reduced number of dimensions that still captures most of the variation in the data. While one-class classification falls somewhere between the supervised and unsupervised learning categories, supervised DR techniques appear not to be applicable at all for one-class classification because of the absence of a second class label in the training data. In this paper we evaluate the use of a number of up-to-date unsupervised DR techniques for one-class classification and we show that techniques based on cluster coherence and locality preservation are effective.

18.
Detecting fraudulent plastic card transactions is an important and challenging problem. The challenges arise from a number of factors including the sheer volume of transactions financial institutions have to process, the asynchronous and heterogeneous nature of transactions, and the adaptive behaviour of fraudsters. In this fraud detection problem the performance of a supervised two-class classification approach is compared with performance of an unsupervised one-class classification approach. Attention is focussed primarily on one-class classification approaches. Useful representations of transaction records, and ways of combining different one-class classifiers are described. Assessment of performance for such problems is complicated by the need for timely decision making. Performance assessment measures are discussed, and the performance of a number of one- and two-class classification methods is assessed using two large, real world personal banking data sets.

19.
Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net.

20.
Locating human faces in photographs   (Cited by: 19 total; 0 self-citations; 19 by others)
The human face is an object that is easily located in complex scenes by infants and adults alike. Yet the development of an automated system to perform this task is extremely challenging. An attempt to solve this problem raises two important issues in object location. First, natural objects such as human faces tend to have boundaries which are not exactly described by analytical functions. Second, the object of interest (face) could occur in a scene in various sizes, thus requiring the use of scale-independent techniques which can detect instances of the object at all scales. Although the task of identifying a well-framed face (as one of a set of labeled faces) has been well researched, the task of locating a face in a natural scene is relatively unexplored. We present a computational theory for locating human faces in scenes with certain constraints. The theory will be validated by experiments confined to instances where people's faces are the primary subject of the scene, occlusion is minimal, and the faces contrast well against the background. This work was supported in part by the NSF and The Eastman Kodak Company.
