Found 20 similar documents; search took 234 ms
1.
A software system, Gel Analysis System for Epo (GASepo), has been developed within an international WADA project. As the recent
WADA criteria for rEpo positivity are based on the identification of each relevant object (band) in Epo images, suitable methods
of image segmentation and object classification had to be developed for the GASepo system. In this paper we address
two particular problems: segmentation of disrupted bands and classification of the segmented objects into three or two classes.
A novel band projection operator is based on convenient object-merging measures and their discrimination analysis using a specifically
generated training set of segmented objects. A weighted-ranks classification method is proposed, which is new in the field
of image classification. It is based on the ranks of the values of a specific criterial function. The weighted-ranks classifiers
proposed in our paper have been evaluated on real samples of segmented objects from Epo images and compared to three selected
well-known classifiers: the Fisher linear classifier, the Support Vector Machine, and the Multilayer Perceptron.
2.
In this paper, we study the performance improvement that can be obtained by combining classifiers based on different
notions (each trained using a different physicochemical property of amino acids). This multi-classifier has been tested on
three problems: HIV-protease; recognition of T-cell epitopes; and predictive vaccinology. We propose a multi-classifier that combines
a classifier that approaches the problem as a two-class pattern recognition problem with a method based on a one-class classifier.
Combining several classifiers with the “sum rule” enables us to improve on the best results previously
published in the literature.
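The “sum rule” fusion mentioned here is straightforward to illustrate. The sketch below is a minimal, hypothetical example (the classifier posteriors are invented, not taken from the paper): each base classifier outputs class-probability vectors, the vectors are summed, and the argmax gives the combined decision.

```python
import numpy as np

def sum_rule(prob_matrices):
    """Sum-rule fusion: add each classifier's class-posterior matrix
    and predict the class with the largest combined score."""
    summed = np.sum(prob_matrices, axis=0)   # shape: (n_samples, n_classes)
    return np.argmax(summed, axis=1)

# Posteriors from two hypothetical base classifiers: three samples, two classes.
clf_a = np.array([[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]])
clf_b = np.array([[0.7, 0.3], [0.2, 0.8], [0.30, 0.70]])
labels = sum_rule([clf_a, clf_b])  # -> [0, 1, 1]
```

The same pattern extends to any number of base classifiers, including a mix of two-class and one-class models, once their outputs are mapped to comparable scores.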
3.
In this paper, we propose an innovative architecture to segment a news video into so-called “stories” by using both the
video and audio information it contains. Segmentation of news into stories is one of the key issues in achieving efficient treatment
of news-based digital libraries. While the relevance of this research problem is widely recognized in the scientific community,
few established solutions exist in the field. In our approach, the segmentation is performed in two steps:
first, shots are classified by combining three different anchor-shot detection algorithms using video information only. Then,
the shot classification is improved by using a novel anchor-shot detection method based on features extracted from the audio
track. Tests on a large database confirm that the proposed system outperforms each single video-based method as well as their
combination.
4.
In this paper we present a novel methodology for sequence classification based on sequential pattern mining and optimization
algorithms. The proposed methodology automatically generates a sequence classification model through a two-stage process.
In the first stage, a sequential pattern mining algorithm is applied to a set of sequences and the sequential patterns are
extracted. Then, the score of every pattern with respect to each sequence is calculated using a scoring function, and the score
of each class under consideration is estimated by summing the corresponding pattern scores. Each score is updated, multiplied by
a weight, and the output of the first stage is the classification confusion matrix of the sequences. In the second stage, an
optimization technique aims to find a set of weights that minimizes an objective function defined using the classification
confusion matrix. The set of extracted sequential patterns and the optimal class weights comprise the sequence
classification model. Extensive evaluation of the methodology was carried out in the protein classification domain by varying
the number of training and test sequences, the number of patterns, and the number of classes. The methodology is compared with
other similar sequence classification approaches. The proposed methodology exhibits several advantages, such as automated
weight assignment to classes using optimization techniques and knowledge discovery in the domain of application.
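The two-stage scoring idea can be sketched in a few lines. This is an illustrative toy (the patterns, weights, and occurrence-count scoring are my own simplifications, not the paper's exact scoring function): each class's score is a weighted sum of the occurrence counts of its mined patterns in the sequence.

```python
def class_scores(sequence, patterns_by_class, weights):
    """Score each class as weight * total occurrence count of its patterns."""
    def count(seq, pat):
        # Number of (possibly overlapping) occurrences of pat in seq.
        return sum(1 for i in range(len(seq) - len(pat) + 1)
                   if seq[i:i + len(pat)] == pat)
    return {c: weights[c] * sum(count(sequence, p) for p in pats)
            for c, pats in patterns_by_class.items()}

scores = class_scores("ABABC", {"x": ["AB"], "y": ["BC"]},
                      {"x": 1.0, "y": 0.5})
predicted = max(scores, key=scores.get)  # -> "x"
```

In the full methodology, the weights would be the output of the second-stage optimization rather than fixed by hand.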
5.
Actuated artificial whiskers modeled on rat macrovibrissae can provide effective tactile sensor systems for autonomous robots.
This article focuses on texture classification using artificial whiskers and addresses a limitation of previous studies, namely,
their use of whisker deflection signals obtained under relatively constrained experimental conditions. Here we consider the
classification of signals obtained from a whiskered robot required to explore different surface textures from a range of orientations
and distances. This procedure resulted in a variety of deflection signals for any given texture. Using a standard Gaussian
classifier we show, using both hand-picked features and ones derived from studies of rat vibrissal processing, that a robust
rough-smooth discrimination is achievable without any knowledge of how the whisker interacts with the investigated object.
On the other hand, finer discriminations appear to require knowledge of the target’s relative position and/or of the manner
in which the whisker contacts its surface.
Electronic Supplementary Material The online version of this article () contains supplementary material, which is available to authorized users.
6.
This paper presents a new approach to Particle Swarm Optimization, called Michigan Approach PSO (MPSO), and its application
to continuous classification problems as a Nearest Prototype (NP) classifier. In Nearest Prototype classifiers, a collection
of prototypes has to be found that accurately represents the input patterns. The classifier then assigns classes based on
the nearest prototype in this collection. The MPSO algorithm is used to process training data to find those prototypes. In
the MPSO algorithm, each particle in the swarm represents a single prototype in the solution, and modified movement rules
with particle competition and cooperation ensure particle diversity. The proposed method is tested both with artificial
problems and with real benchmark problems and compared with several algorithms of the same family. Results show that the particles
are able to recognize clusters, find decision boundaries and reach stable situations that also retain adaptation potential.
The MPSO algorithm is able to improve the accuracy of 1-NN classifiers, obtains results comparable to the best among other
classifiers, and improves the accuracy reported in the literature for one of the problems.
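Once the prototypes have been found, the Nearest Prototype classification step itself is simple. A minimal sketch (the prototype positions and labels below are invented for illustration):

```python
import numpy as np

def nearest_prototype(prototypes, proto_labels, x):
    """Assign x the label of its nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(np.asarray(prototypes) - np.asarray(x), axis=1)
    return proto_labels[int(np.argmin(d))]

protos = [[0.0, 0.0], [5.0, 5.0]]   # e.g. one prototype per particle
labels = ["low", "high"]
nearest_prototype(protos, labels, [1.0, 0.5])  # -> "low"
```

In MPSO, the swarm as a whole encodes this prototype collection, so the classifier's quality is decided entirely by where the particles settle.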
7.
The decision tree is one of the most effective and widely used methods for classification. However, many real-world applications
require instances to be ranked by the probability of class membership. The area under the receiver operating characteristic
curve (AUC) has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present
two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating
the probability of class membership using simple voting at the leaf into which the test instance falls, our algorithms use
similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly
outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
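AUC, the ranking measure used in this comparison, can be computed directly from its probabilistic definition. A small self-contained sketch using the pairwise formulation (real evaluations typically use an O(n log n) sorting-based version):

```python
def auc(scores, labels):
    """AUC = probability that a random positive outranks a random negative,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])  # perfect ranking -> 1.0
```

This is why leaf-level probability estimates matter for AUC: two trees with identical 0/1 predictions can rank instances very differently.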
8.
Recently, multi-objective evolutionary algorithms have been applied to improve the difficult tradeoff between the interpretability
and the accuracy of fuzzy rule-based systems. The two requirements are usually contradictory; however, these kinds
of algorithms can obtain a set of solutions with different trade-offs. This contribution analyzes different application alternatives
for attaining the desired accuracy/interpretability balance, maintaining the improved accuracy that a tuning of membership
functions can give while trying to obtain more compact models. In this way, we propose the use of multi-objective evolutionary
algorithms as a tool to obtain at least one solution improved with respect to a classic single-objective approach (a solution that
could dominate the one obtained by such an algorithm in terms of system error and number of rules). To do that, this work
presents and analyzes the application of six different multi-objective evolutionary algorithms to obtain simpler and still
accurate linguistic fuzzy models by performing rule selection and a tuning of the membership functions. The results in two
different scenarios show that the use of expert knowledge in the algorithm design process significantly improves the search
ability of these algorithms and that they are able to improve both objectives together, obtaining models that are more accurate
and at the same time simpler than those of the single-objective approach.
9.
As an effective technique for feature extraction and pattern classification, the Fisher linear discriminant (FLD) has been successfully applied in many fields. However, for tasks with very high-dimensional data such as face images,
the conventional FLD technique encounters a fundamental difficulty caused by the singular within-class scatter matrix. To avoid this
trouble, many improvements on the feature-extraction aspect of FLD have been proposed. In contrast, studies on the pattern-classification
aspect of FLD are quite few. In this paper, we focus our attention on possible improvements to the
pattern-classification aspect of FLD by presenting a novel linear discriminant criterion called maximum scatter difference (MSD). Theoretical analysis demonstrates that the MSD criterion is a generalization of the Fisher discriminant criterion and is
the asymptotic form of the large margin linear projection discriminant criterion. The performance of the MSD classifier is tested in face recognition. Experiments performed on the ORL, Yale, FERET and AR databases
show that the MSD classifier can compete with top-performing linear classifiers such as linear support vector machines, and is better than or equivalent to combinations of well-known facial feature extraction methods, such as eigenfaces, Fisherfaces, orthogonal complementary space, nullspace, direct linear discriminant analysis, and the nearest neighbor classifier.
10.
We present a scalable, multi-level feature extraction technique to detect malicious executables. We propose a novel combination
of three different kinds of features at different levels of abstraction: binary n-grams, assembly instruction sequences, and Dynamic Link Library (DLL) function calls, extracted from binary executables,
disassembled executables, and executable headers, respectively. We also propose an efficient and scalable feature extraction
technique and apply it to a large corpus of real benign and malicious executables. The above-mentioned features
are extracted from the corpus data and a classifier is trained, which achieves high accuracy and a low false positive rate in
detecting malicious executables. Our approach is knowledge-based for several reasons. First, we apply the knowledge
obtained from the binary n-gram features to extract assembly instruction sequences using our Assembly Feature Retrieval algorithm. Second, we apply
the statistical knowledge obtained during feature extraction to select the best features and to build a classification model.
Our model is compared against other feature-based approaches for malicious code detection and found to be more efficient
in terms of detection accuracy and false alarm rate.
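Of the three feature kinds, binary n-grams are the simplest to illustrate. A toy sketch (the byte string is invented; real systems scan whole executables and keep only the most informative grams):

```python
def binary_ngrams(data: bytes, n: int = 2):
    """Return the set of contiguous byte n-grams in a binary blob."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

# First four bytes of a hypothetical PE file (the "MZ" header).
grams = binary_ngrams(b"\x4d\x5a\x90\x00", 2)
# -> {b"MZ", b"Z\x90", b"\x90\x00"}
```

Presence or absence of each selected gram then becomes one binary dimension of the feature vector fed to the classifier.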
11.
Recently, a new class of data mining methods, known as privacy preserving data mining (PPDM) algorithms, has been developed by the research community working on security and knowledge discovery. The aim of these
algorithms is the extraction of relevant knowledge from large amounts of data while protecting sensitive
information at the same time. Several data mining techniques incorporating privacy protection mechanisms have been developed that allow one
to hide sensitive itemsets or patterns before the data mining process is executed. Privacy preserving classification methods,
instead, prevent a miner from building a classifier able to predict sensitive data. Additionally, privacy preserving
clustering techniques have recently been proposed, which distort sensitive numerical attributes while preserving general
features for clustering analysis. A crucial issue is to determine which of these privacy-preserving techniques better
protect sensitive information. However, this is not the only criterion by which these algorithms can be evaluated.
It is also important to assess the quality of the data resulting from the modifications applied by each algorithm, as well
as the performance of the algorithms. There is thus a need to identify a comprehensive set of criteria against
which to assess the existing PPDM algorithms and determine which algorithm meets specific requirements.
In this paper, we present a first evaluation framework for estimating and comparing different kinds of PPDM algorithms. Then,
we apply our criteria to a specific set of algorithms and discuss the evaluation results we obtain. Finally, some considerations
about future work and promising directions in the context of privacy preservation in data mining are discussed.
*The work reported in this paper has been partially supported by the EU under the IST Project CODMINE and by the Sponsors of
CERIAS.
Editor: Geoff Webb
12.
Support vector machines (SVMs) have been promising methods for classification and regression analysis due to their solid mathematical
foundations, which include two desirable properties: margin maximization and nonlinear classification using kernels. However,
despite these prominent properties, SVMs are usually not chosen for large-scale data mining problems because their training
complexity is highly dependent on the data set size. Unlike traditional pattern recognition and machine learning, real-world
data mining applications often involve huge numbers of data records. Thus it is too expensive to perform multiple scans on
the entire data set, and it is also infeasible to put the data set in memory. This paper presents a method, Clustering-Based SVM (CB-SVM), that maximizes the SVM performance for very large data sets given a limited amount of resources, e.g., memory. CB-SVM applies
a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high quality samples.
These samples carry statistical summaries of the data and maximize the benefit of learning. Our analyses show that the training
complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much less than that of
the entire data set. Our experiments on synthetic and real-world data sets show that CB-SVM is highly scalable for very large
data sets and very accurate in terms of classification.
A preliminary version of this paper, “Classifying Large Data Sets Using SVM with Hierarchical Clusters”, by H. Yu, J. Yang, and J. Han, appeared in Proc. 2003 Int. Conf. on Knowledge Discovery in Databases (KDD'03), Washington, DC, August 2003. This submission substantially extends the previous paper and contains new, value-added technical contributions in comparison with the conference publication.
13.
The k-nearest neighbors (k-NN) classifier is one of the most popular supervised classification methods. It is very simple, intuitive and accurate in
a great variety of real-world domains. Nonetheless, despite its simplicity and effectiveness, practical use of this rule has
historically been limited by its high storage requirements and the computational costs involved. On the other hand, the
performance of this classifier appears to be strongly sensitive to training data complexity. In this context, by means of several
problem difficulty measures, we try to characterize the behavior of the k-NN rule when working under certain situations. More specifically, the present analysis focuses on the use of some data complexity
measures to describe class overlap, feature space dimensionality and class density, and to discover their relation with the
practical accuracy of this classifier.
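For reference, the k-NN rule under analysis is just a majority vote among the k closest training points. A minimal sketch with invented 2-D data:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classic k-NN: majority vote among the k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
knn_predict(X, y, (0.5, 0.5))  # -> "a"
```

Note that the whole training set must be stored and scanned for every query, which is exactly the storage and computational limitation the abstract mentions.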
14.
In this paper we propose three variants of a linear feature extraction technique based on AdaBoost for two-class classification
problems. Unlike other feature extraction techniques, we make no assumptions about the distribution of the data. At
each boosting step we select, from a pool of linear projections, the one that minimizes the weighted error. The three
variants of the feature extraction algorithm differ in the way the pool of individual projections is constructed.
Using nine real and two artificial data sets of different original dimensionality and sample size, we compare the performance
of the three proposed techniques with three classical techniques for linear feature extraction: Fisher linear discriminant
analysis (FLD), nonparametric discriminant analysis (NDA), and a recently proposed feature extraction method for heteroscedastic
data based on the Chernoff criterion. Our results show that for data sets of relatively low original dimensionality, FLD appears
to be both the most accurate and the most economical feature extraction method (giving just one dimension in the two-class
case). The techniques based on AdaBoost fare better than the classical techniques for data sets of large original dimensionality.
15.
This paper proposes a framework to aid video analysts in detecting suspicious activity within the tremendous amounts of video
data that exists in today’s world of omnipresent surveillance video. Ideas and techniques for closing the semantic gap between
low-level machine readable features of video data and high-level events seen by a human observer are discussed. An evaluation
of the event classification and detection technique is presented and a future experiment to refine this technique is proposed.
These experiments lead into a discussion of the most suitable machine learning algorithm for learning the event representation
scheme proposed in this paper.
16.
Listening to music on personal, digital devices whilst mobile is an enjoyable, everyday activity. We explore a scheme for
exploiting this practice to immerse listeners in navigation cues. Our prototype, ONTRACK, continuously adapts audio, modifying
the spatial balance and volume to lead listeners to their target destination. First we report on an initial lab-based evaluation
that demonstrated the approach’s efficacy: users were able to complete tasks within a reasonable time and their subjective
feedback was positive. Encouraged by these results we constructed a handheld prototype. Here, we discuss this implementation
and the results of field-trials. These indicate that even with a low-fidelity realisation of the concept, users can quite
effectively navigate complicated routes.
17.
The paper presents a decision algorithmic model, called the vector gravitational force model, in the feature space. The model,
inspired by and analogous to the Law of Universal Gravitation, is derived from a vector geometric analysis of the linear
classifier and established in the feature space. Based on this model, we propose a classification method called
vector gravitational recognition. The proposed method is applied to the benchmark Glass Identification task in the UCI database,
contributed by the USA Forensic Science Service, and to two other UCI benchmark tasks. The experimental and comparative results show
that the proposed approach yields quite good results, outperforms some well-known and recent approaches on these tasks, and
may benefit other applications.
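The gravitational analogy can be sketched in code. This is my own simplified reading, not the paper's exact formulation: each training point attracts the query with a force proportional to 1/d², the forces are summed per class, and the strongest pull wins.

```python
def gravitational_class(train, x, eps=1e-9):
    """Toy gravity-like classifier: sum 1/d^2 'forces' per class."""
    force = {}
    for xi, yi in train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x)) + eps  # avoid /0
        force[yi] = force.get(yi, 0.0) + 1.0 / d2
    return max(force, key=force.get)

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
gravitational_class(train, (1, 1))  # -> "a"
```

Unlike k-NN, every training point contributes to the decision, with influence decaying smoothly with distance.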
18.
Image classification is an important research direction in computer vision. Combining multiple features can, to some extent, improve image classification accuracy. However, how to combine multiple image features remains an open problem. We propose a multi-feature fusion algorithm based on multi-class multiple kernel learning and apply it to the image classification task. The algorithm effectively exploits the ability of multiple kernel learning to automatically select the features valuable for the task at hand, while avoiding the decomposition of the multi-class problem into multiple binary problems in multiple kernel learning. For image feature representation, a dictionary self-learning method is used. Experimental results show that the proposed algorithm effectively improves image classification accuracy.
19.
We present here a new randomized algorithm for repairing the topology of objects represented by 3D binary digital images.
By “repairing the topology”, we mean a systematic way of modifying a given binary image in order to produce a similar binary
image which is guaranteed to be well-composed. A 3D binary digital image is said to be well-composed if, and only if, the square faces shared by background and foreground
voxels form a 2D manifold. Well-composed images enjoy some special properties which can make such images very desirable in
practical applications. For instance, well-known algorithms for extracting surfaces from and thinning binary images can be
simplified and optimized for speed if the input image is assumed to be well-composed. Furthermore, some algorithms for computing
surface curvature and extracting adaptive triangulated surfaces, directly from the binary data, can only be applied to well-composed
images. Finally, we introduce an extension of the aforementioned algorithm to repairing 3D digital multivalued images. Such
an algorithm finds application in repairing segmented images resulting from multi-object segmentations of other 3D digital
multivalued images.
20.
This work shows that earthquake damage in urban areas can be determined with acceptable accuracy through the exploitation
of multitemporal SAR data and ancillary information defining urban blocks. In this article, two different methodologies are
presented: an unsupervised statistical analysis of the parameters of models representing backscatter intensity or coherence
values for each block of the urban area under analysis, and a supervised approach involving a multi-band/multi-temporal
classification, performed using a Markov Random Field (MRF) classifier or a spatial Fuzzy ARTMAP (FA) classifier. The two
procedures are compared using ERS images acquired before and after the 1999 Turkey earthquake.