期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Multivariate Decision Trees 总被引：24，自引：0，他引：24

Brodley Carla E. Utgoff Paul E. 《Machine Learning》1995,19(1):45-77

Unlike a univariate decision tree, a multivariate decision tree is not restricted to splits of the instance space that are orthogonal to the features' axes. This article addresses several issues for constructing multivariate decision trees: representing a multivariate test, including symbolic and numeric features, learning the coefficients of a multivariate test, selecting the features to include in a test, and pruning of multivariate decision trees. We present several new methods for forming multivariate decision trees and compare them with several well-known methods. We compare the different methods across a variety of learning tasks, in order to assess each method's ability to find concise, accurate decision trees. The results demonstrate that some multivariate methods are in general more effective than others (in the context of our experimental assumptions). In addition, the experiments confirm that allowing multivariate tests generally improves the accuracy of the resulting decision tree over a univariate tree. 相似文献

2.

An Empirical Study of Two Approaches to Sequence Learning for Anomaly Detection 总被引：12，自引：0，他引：12

Lane Terran Brodley Carla E. 《Machine Learning》2003,51(1):73-107

This paper introduces the computer security domain of anomaly detection and formulates it as a machine learning task on temporal sequence data. In this domain, the goal is to develop a model or profile of the normal working state of a system user and to detect anomalous conditions as long-term deviations from the expected behavior patterns. We introduce two approaches to this problem: one employing instance-based learning (IBL) and the other using hidden Markov models (HMMs). Though not suitable for a comprehensive security solution, both approaches achieve anomaly identification performance sufficient for a low-level focus of attention detector in a multitier security system. Further, we evaluate model scaling techniques for the two approaches: two clustering techniques for the IBL approach and variation of the number of hidden states for the HMM approach. We find that over both model classes and a wide range of model scales, there is no significant difference in performance at recognizing the profiled user. We take this invariance as evidence that, in this security domain, limited memory models (e.g., fixed-length instances or low-order Markov models) can learn only part of the user identity information in which we're interested and that substantially different models will be necessary if dramatic improvements in user-based anomaly detection are to be achieved. 相似文献

3.

Goal-directed classification using linear machine decision trees

Draper B.A. Brodley C.E. Utgoff P.E. 《IEEE transactions on pattern analysis and machine intelligence》1994,16(9):888-893

Recent work in feature-based classification has focused on nonparametric techniques that can classify instances even when the underlying feature distributions are unknown. The inference algorithms for training these techniques, however, are designed to maximize the accuracy of the classifier, with all errors weighted equally. In many applications, certain errors are far more costly than others, and the need arises for nonparametric classification techniques that can be trained to optimize task-specific cost functions. This correspondence reviews the linear machine decision tree (LMDT) algorithm for inducing multivariate decision trees, and shows how LMDT can be altered to induce decision trees that minimize arbitrary misclassification cost functions (MCF's). Demonstrations of pixel classification in outdoor scenes show how MCF's can optimize the performance of embedded classifiers within the context of larger image understanding systems 相似文献

4.

Recursive Automatic Bias Selection for Classifier Construction 总被引：1，自引：0，他引：1

Brodley Carla E. 《Machine Learning》1995,20(1-2):63-94

The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; each is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In this article we present an approach that uses characteristics of the given data set, in the form of feedback from the learning process, to guide a search for a tree-structured hybrid classifier. Heuristic knowledge about the characteristics that indicate one bias is better than another is encoded in the rule base of the Model Class Selection (MCS) system. The approach does not assume that the entire instance space is best learned using a single representation language; for some data sets, choosing to form a hybrid classifier is a better bias, and MCS has the ability to determine these cases. The results of an empirical evaluation illustrate that MCS achieves classification accuracies equal to or higher than the best of its primitive learning components for each data set, demonstrating that the heuristic rules effectively select an appropriate learning bias. 相似文献

5.

Finding anomalous periodic time series

Umaa Rebbapragada Pavlos Protopapas Carla E. Brodley Charles Alcock 《Machine Learning》2009,74(3):281-313

Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data precludes the use of these methods as the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time-series data, that outputs a ranked list of both global and local anomalies. It calculates its anomaly score for each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. Our method is able to scale to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD’s reported anomalies are comparable to or better than all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena. 相似文献

6.

FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection

Noto K Brodley C Slonim D 《Data mining and knowledge discovery》2012,25(1):109-133

Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach. 相似文献

7.

Focusing attention on objects of interest using multiple matchedfilters

Stough T.M. Brodley C.E. 《IEEE transactions on image processing》2001,10(3):419-426

In order to be of use to scientists, large image databases need to be analyzed to create a catalog of the objects of interest. One approach is to apply a multiple tiered search algorithm that uses reduction techniques of increasing computational complexity to select the desired objects from the database. The first tier of this type of algorithm, often called a focus of attention (FOA) algorithm, selects candidate regions from the image data and passes them to the next tier of the algorithm. In this paper we present a new approach to FOA that employs multiple matched filters (MMF), one for each object prototype, to detect the regions of interest. The MMFs are formed using k-means clustering on a set of image patches identified by domain experts as positive examples of objects of interest. An innovation of the approach is to radically reduce the dimensionality of the feature space, used by the k-means algorithm, by taking block averages (spoiling) the sample image patches. The process of spoiling is analyzed and its applicability to other domains is discussed. The combination of the output of the MMFs is achieved through the projection of the detections back into an empty image and then thresholding. This research was motivated by the need to detect small volcanos in the Magellan probe data from Venus. An empirical evaluation of the approach illustrates that a combination of the MMF plus the average filter results in a higher likelihood of 100% detection of the objects of interest at a lower false positive rate than a single matched filter alone. 相似文献

8.

Unsupervised feature selection applied to content-based retrieval of lung images 总被引：3，自引：0，他引：3

Dy J.G. Brodley C.E. Kak A. Broderick L.S. Aisen A.M. 《IEEE transactions on pattern analysis and machine intelligence》2003,25(3):373-378

This paper describes a new hierarchical approach to content-based image retrieval called the "customized-queries" approach (CQA). Contrary to the single feature vector approach which tries to classify the query and retrieve similar images in one step, CQA uses multiple feature sets and a two-step approach to retrieval. The first step classifies the query according to the class labels of the images using the features that best discriminate the classes. The second step then retrieves the most similar images within the predicted class using the features customized to distinguish "subclasses" within that class. Needing to find the customized feature subset for each class led us to investigate feature selection for unsupervised learning. As a result, we developed a new algorithm called FSSEM (feature subset selection using expectation-maximization clustering). We applied our approach to a database of high resolution computed tomography lung images and show that CQA radically improves the retrieval precision over the single feature vector approach. To determine whether our CBIR system is helpful to physicians, we conducted an evaluation trial with eight radiologists. The results show that our system using CQA retrieval doubled the doctors' diagnostic accuracy. 相似文献

9.

Client-initiated homework in client-centered therapy.

Brodley Barbara T. 《Canadian Metallurgical Quarterly》2006,16(2):140

This article explains the way homework is integrated into client-centered therapy (sometimes called person-centered therapy). It first presents a summary of the theory based on Carl R. Rogers' therapeutic conditions (congruence, unconditional positive regard, and empathic understanding), emphasizing the importance of the nondirective attitude. It describes Rogers' change theory based on unconditional positive regard and illustrates the therapeutic interaction process with segments of a typical session conducted by Rogers. Homework is then described and explained as almost always initiated by the client, with therapists' responses that range from pure empathic following to occasionally providing suggestions and instructions. The results of a small survey of nondirective client-centered therapists concerning homework are summarized, and several client/therapist interactions relating to homework are described. Homework in client-centered therapy, when it does occur, is an outcome of clients' initiatives and is consistent with the way the therapy fosters and protects clients' autonomy, self-determination and their sense of self. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

10.

Using Human Perceptual Categories for Content-Based Retrieval from a Medical Image Database

Chi-Ren Shyu Christina Pavlopoulou Avinash C. Kak Carla E. Brodley Lynn S. Broderick 《Computer Vision and Image Understanding》2002,88(3):119

It is often difficult to come up with a well-principled approach to the selection of low-level features for characterizing images for content-based retrieval. This is particularly true for medical imagery, where gross characterizations on the basis of color and other global properties do not work. An alternative for medical imagery consists of the “scattershot” approach that first extracts a large number of features from an image and then reduces the dimensionality of the feature space by applying a feature selection algorithm such as the Sequential Forward Selection method.This contribution presents a better alternative to initial feature extraction for medical imagery. The proposed new approach consists of (i) eliciting from the domain experts (physicians, in our case) the perceptual categories they use to recognize diseases in images; (ii) applying a suite of operators to the images to detect the presence or the absence of these perceptual categories; (iii) ascertaining the discriminatory power of the perceptual categories through statistical testing; and, finally, (iv) devising a retrieval algorithm using the perceptual categories. In this paper we will present our proposed approach for the domain of high-resolution computed tomography (HRCT) images of the lung. Our empirical evaluation shows that feature extraction based on physicians' perceptual categories achieves significantly higher retrieval precision than the traditional scattershot approach. Moreover, the use of perceptually based features gives the system the ability to provide an explanation for its retrieval decisions, thereby instilling more confidence in its users. 相似文献