Found 20 similar documents.
1.
Chengfu Yang Zhang Yi 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(7):677-683
A method of document clustering based on locality preserving indexing (LPI) and support vector machines (SVM) is presented.
The document space is generally high-dimensional, and clustering in such a space is often infeasible
due to the curse of dimensionality. In this paper, LPI is used to project the documents into a lower-dimensional semantic
space in which documents with related semantics are close to each other. Then, using SVM, the vectors in the semantic
space are mapped by a Gaussian kernel into a high-dimensional feature space, in which the minimal enclosing sphere is
sought. When mapped back to the semantic space, the sphere separates into several independent components bounded by the support
vectors, each enclosing a separate cluster of documents. Combining LPI and SVM yields higher clustering accuracy
in a more effective, unsupervised way, as well as better generalization. Extensive experiments
are performed on the Reuters-21578 and TDT2 data sets.
This work was supported by the National Science Foundation of China under Grant 60471055 and the Specialized Research Fund for the Doctoral
Program of Higher Education under Grant 20040614017.
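The sphere-based clustering step matches the general support vector clustering (SVC) scheme. As a hedged reference, the standard SVC formulation with a Gaussian kernel of width parameter q is written out below; the abstract does not give the paper's own equations, so the notation (coefficients beta_j, bound C) is an assumption drawn from the usual SVC formulation rather than from this paper.

```latex
\begin{aligned}
\max_{\beta}\; W(\beta) &= \sum_{j} \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j),
\qquad \text{s.t. } 0 \le \beta_j \le C,\; \sum_j \beta_j = 1,\\
R^2(x) &= K(x, x) - 2 \sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j),\\
K(x, y) &= \exp\!\left(-q \lVert x - y \rVert^2\right).
\end{aligned}
```

Support vectors are the points with 0 < beta_j < C, and the sphere radius is R = R(x_s) for any such x_s. Two points fall in the same cluster when R(y) <= R for every y on the segment joining them (checked in practice at a few sample points along it); the connected components of this relation are the clusters.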
2.
Acoustic events produced in controlled environments may carry information useful for perceptually aware interfaces. In this paper we focus on the problem of classifying 16 types of meeting-room acoustic events. First, we define the events and collect a sound database. Then, several classifiers based on support vector machines (SVM) are developed, using confusion-matrix-based clustering schemes to deal with the multi-class problem. Several sets of acoustic features are also defined and used in the classification tests. In the experiments, the developed SVM-based classifiers are compared with a previously reported binary tree scheme and with their corresponding Gaussian mixture model (GMM) classifiers. The best results are obtained with a tree SVM-based classifier that may use a different feature set at each node, which yields a 31.5% relative average error reduction with respect to the best result from a conventional binary tree scheme.
3.
In this paper, we propose a novel approach to identifying unknown nonlinear systems with fuzzy rules and support vector machines. Our approach consists of four steps: on-line clustering, structure identification, parameter identification and local model combination. The collected data are first clustered into several groups by an on-line clustering technique; structure identification is then performed on each group using support vector machines, so that the fuzzy rules are generated automatically from the support vectors. Time-varying learning rates are applied to update the membership functions of the fuzzy rules. The modeling errors are proven to be robustly stable with bounded uncertainties using a Lyapunov method and an input-to-state stability technique. Comparisons with other related works are made on a real application, a crude oil blending process. The results demonstrate that our approach achieves good accuracy and is suitable for on-line fuzzy modeling.
4.
Support vector regression (SVR) is a powerful tool for modeling and prediction tasks, with widespread application in many areas. The most representative algorithms for training SVR models are Shevade et al.'s Modification 2 and Lin's WSS1 and WSS2 methods in the LIBSVM library. Both are variants of standard SMO in which the pair selected for updating is the one that most violates the Karush-Kuhn-Tucker optimality conditions; LIBSVM adds a heuristic to improve the decrease in the objective function. In this paper, after presenting a simple derivation of the updating procedure based on a greedy maximization of the gain in the objective function, we show how cycle-breaking techniques that accelerate the convergence of support vector machines (SVM) in classification can also be applied within this framework, resulting in significantly improved training times for SVR.
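The "most violating pair" rule that these SMO variants share can be sketched as follows. This is a minimal illustration written for the C-SVC classification dual (minimize 0.5 * alpha^T Q alpha - e^T alpha with 0 <= alpha_t <= C and y^T alpha = 0) rather than for the SVR formulation the paper studies; the function name and the eps default are assumptions, and the paper's cycle-breaking techniques are not shown.

```python
def select_working_set(alpha, y, grad, C, eps=1e-3):
    """Maximal-violating-pair selection (WSS1-style) for the C-SVC dual.

    alpha: current dual variables, y: labels in {+1, -1},
    grad: gradient of the dual objective at alpha.
    Returns (i, j), or None when the KKT gap m(alpha) - M(alpha) <= eps.
    """
    n = len(alpha)
    # I_up: indices whose alpha can still move "up" along y_t.
    i_up = [t for t in range(n)
            if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    # I_low: indices whose alpha can still move "down" along y_t.
    i_low = [t for t in range(n)
             if (y[t] == -1 and alpha[t] < C) or (y[t] == 1 and alpha[t] > 0)]
    if not i_up or not i_low:
        return None
    # i maximizes and j minimizes -y_t * grad_t over their respective sets.
    i = max(i_up, key=lambda t: -y[t] * grad[t])
    j = min(i_low, key=lambda t: -y[t] * grad[t])
    gap = (-y[i] * grad[i]) - (-y[j] * grad[j])
    if gap <= eps:
        return None  # no pair violates the KKT conditions by more than eps
    return i, j
```

At alpha = 0 the gradient is -1 everywhere, so the rule simply pairs the first positive example with the first negative one.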
5.
Texture classification using the support vector machines Total citations: 12, self-citations: 0, citations by others: 12
In recent years, support vector machines (SVMs) have demonstrated excellent performance in a variety of pattern recognition problems. In this paper, we apply SVMs to texture classification, using translation-invariant features generated from the discrete wavelet frame transform. To alleviate the problem of selecting the right kernel parameter in the SVM, we use a fusion scheme based on multiple SVMs, each with a different setting of the kernel parameter. Compared to the traditional Bayes classifier and the learning vector quantization algorithm, SVMs, and in particular the fused output from multiple SVMs, produce more accurate classification results on the Brodatz texture album.
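Why a wavelet frame (undecimated) transform gives translation-invariant features can be shown in one dimension. The sketch below is a simplification under stated assumptions: a Haar filter pair, periodic boundaries, the a-trous spacing of filter taps, and subband energy as the feature; the paper works on 2-D texture images and its filters and features may differ.

```python
def haar_frame_energies(signal, levels=2):
    """Subband energies of an undecimated (frame) Haar transform of a 1-D signal.

    Assumptions: periodic boundaries and a-trous spacing (taps 2**level apart).
    With no downsampling, circularly shifting the input circularly shifts every
    subband, so the returned energies do not change.
    """
    n = len(signal)
    approx = [float(s) for s in signal]
    energies = []
    for level in range(levels):
        step = 2 ** level
        low = [(approx[t] + approx[(t + step) % n]) / 2 for t in range(n)]
        high = [(approx[t] - approx[(t + step) % n]) / 2 for t in range(n)]
        energies.append(sum(h * h for h in high) / n)  # detail-subband energy
        approx = low
    energies.append(sum(a * a for a in approx) / n)  # final approximation energy
    return energies
```

A decimated wavelet transform would not have this property, since its subbands depend on the alignment of the input with the downsampling grid.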
6.
Felipe Alonso-Atienza José Luis Rojo-Álvarez Alfredo Rosado-Muñoz Juan J. Vinagre Arcadi García-Alberola Gustavo Camps-Valls 《Expert systems with applications》2012,39(2):1956-1967
Early detection of ventricular fibrillation (VF) is crucial for the success of defibrillation therapy in automatic devices. A large number of detectors have been proposed based on temporal, spectral, and time-frequency parameters extracted from the surface electrocardiogram (ECG), all showing limited performance. Combining ECG parameters from different domains (time, frequency, and time-frequency) using machine learning algorithms has been used to improve detection efficiency. However, the use of a large number of parameters in machine learning schemes has raised the need for efficient feature selection (FS) procedures. In this study, we propose a novel FS algorithm based on support vector machine (SVM) classifiers and bootstrap resampling (BR) techniques. We define a backward FS procedure that relies on evaluating changes in SVM performance when features are removed from the input space. This evaluation is carried out with a nonparametric statistic based on BR. After simulation studies, we benchmark the performance of our FS algorithm on the AHA and MIT-BIH ECG databases. Our results show that the proposed FS algorithm outperforms the recursive feature elimination method on synthetic examples, and that VF detector performance improves with the reduced feature set.
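The backward step (remove a feature if SVM performance does not degrade significantly without it) can be sketched with a paired percentile bootstrap. This is a hedged illustration: the statistic, the 95% interval and the removal rule are assumptions rather than the paper's exact nonparametric test, and the SVM itself is abstracted into per-sample 0/1 correctness vectors on a common test set.

```python
import random


def bootstrap_accuracy_drop(correct_full, correct_reduced, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(full) - accuracy(reduced).

    correct_full / correct_reduced: per-sample 0/1 correctness on the same
    test samples, with and without the candidate feature. Samples are
    resampled in pairs so the comparison stays paired.
    """
    n = len(correct_full)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_full[k] - correct_reduced[k] for k in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def can_remove(correct_full, correct_reduced, **kw):
    """Hypothetical rule: drop the feature unless removal significantly lowers accuracy."""
    lo, _ = bootstrap_accuracy_drop(correct_full, correct_reduced, **kw)
    return lo <= 0.0
```

In a backward loop, each remaining feature would be tested this way and the procedure would stop once every candidate removal is significantly harmful.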
7.
Karim O. Elish 《Journal of Systems and Software》2008,81(5):649-660
Effective prediction of defect-prone software modules can enable software developers to focus quality assurance activities and allocate effort and resources more efficiently. Support vector machines (SVM) have been successfully applied to both classification and regression problems in many applications. This paper evaluates the capability of SVM in predicting defect-prone software modules and compares its prediction performance against eight statistical and machine learning models on four NASA datasets. The results indicate that the prediction performance of SVM is generally better than, or at least competitive with, that of the compared models.
8.
Manuele Bicego Mario A.T. Figueiredo 《Pattern recognition》2009,42(1):27-5183
This paper describes a new soft clustering algorithm in which each cluster is modelled by a one-class support vector machine (OC-SVM). The proposed algorithm extends a previously proposed hard clustering algorithm, also based on an OC-SVM representation of clusters. The key building block of our method is the weighted OC-SVM (WOC-SVM), a novel tool introduced in this paper, on which an expectation-maximization-type soft clustering algorithm is built. A deterministic annealing version of the algorithm is also introduced and shown to improve robustness with respect to initialization. Experimental results show that the proposed soft clustering algorithm outperforms its hard clustering counterpart, notably in terms of robustness with respect to initialization, as well as several other state-of-the-art methods.
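An EM-type soft clustering with deterministic annealing needs an E-step that turns per-cluster costs into memberships at a given temperature. A generic sketch of that step follows; how the paper derives the costs from the weighted OC-SVMs, and its annealing schedule, are not given in the abstract, so the cost matrix here is an assumed input.

```python
import math


def soft_assign(costs, temperature):
    """Gibbs soft assignments r[n][k] proportional to exp(-costs[n][k] / temperature).

    costs[n][k] is a hypothetical cost of assigning point n to cluster k
    (e.g. a distance derived from cluster k's one-class SVM). High
    temperatures give near-uniform memberships; as temperature -> 0 the
    assignment becomes hard (all weight on the lowest-cost cluster).
    """
    resp = []
    for row in costs:
        m = min(row)  # subtract the minimum cost for numerical stability
        w = [math.exp(-(c - m) / temperature) for c in row]
        z = sum(w)
        resp.append([v / z for v in w])
    return resp
```

Annealing then means starting at a high temperature and lowering it gradually, re-fitting the clusters with these weights at each stage, which is what reduces sensitivity to the initial partition.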
9.
Shouxian Cheng 《Pattern recognition》2007,40(3):964-971
In this paper, we present an improved incremental training algorithm for support vector machines (SVMs). Instead of selecting training samples randomly, we divide them into groups and apply the k-means clustering algorithm to collect the initial set of training samples. In the active query stage, we assign a weight to each sample according to its confidence factor and its distance to the separating hyperplane. The confidence factor is calculated from the error upper bound of the SVM and indicates how close the current hyperplane is to the optimal hyperplane. A criterion is developed to eliminate non-informative training samples incrementally. Experimental results show that our algorithm works successfully on artificial and real data and is superior to existing methods.
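The initial-set step (group the samples with k-means, then seed training from the groups) might look like the sketch below. Taking the sample nearest each centroid is an assumption for illustration; the abstract does not say exactly how samples are drawn from each group, and the later confidence-weighted active queries are not shown.

```python
import random


def kmeans_initial_set(points, k, iters=20, seed=0):
    """Pick one representative per k-means cluster as the initial SVM training set.

    Plain Lloyd's k-means with randomly chosen initial centers; returns the
    index of the point closest to each final center (duplicates removed).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            groups[nearest].append(p)
        for c, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    chosen = []
    for center in centers:
        idx = min(range(len(points)), key=lambda i: dist2(points[i], center))
        if idx not in chosen:
            chosen.append(idx)
    return chosen
```

Compared with random selection, this spreads the initial samples over the data so that every region is represented before active querying starts.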
10.
George E. Sakr Imad H. Elhajj 《Engineering Applications of Artificial Intelligence》2013,26(8):1892-1901
Support vector machines (SVM) have shown high prediction accuracy in many applications. However, like any statistical learning algorithm, SVM's accuracy drops if some of the training points are contaminated by an unknown source of noise. The choice of clean training points is critical to avoiding overfitting, which generally occurs when the model is excessively complex and is reflected by high accuracy on the training set and low accuracy on the testing set (unseen points). In this paper we present a new multi-level SVM architecture that splits the training set into points labeled as ‘easily classifiable’, which do not increase the model complexity, and ‘non-easily classifiable’, which are responsible for increasing it. This method is used to create an SVM architecture that yields, on average, higher accuracy than a traditional soft margin SVM trained on the same training set. The architecture is tested on the well-known US postal handwritten digit recognition problem, the Wisconsin breast cancer dataset and the agitation detection dataset. The results show an increase in overall accuracy on all three datasets. Throughout this paper the word confidence denotes the confidence in the decision, as commonly used in the literature.
11.
Simultaneous feature selection and classification using kernel-penalized support vector machines Total citations: 2, self-citations: 0, citations by others: 2
We introduce an embedded method that selects relevant features during classifier construction by penalizing each feature’s use in the dual formulation of support vector machines (SVM). This approach, called kernel-penalized SVM (KP-SVM), optimizes the shape of an anisotropic RBF kernel, eliminating features that have low relevance for the classifier. Additionally, KP-SVM employs an explicit stopping condition that avoids eliminating features that would negatively affect the classifier’s performance. We performed experiments on four real-world benchmark problems, comparing our approach with well-known feature selection techniques. KP-SVM outperformed the alternative approaches and consistently selected fewer relevant features.
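KP-SVM eliminates features through the widths of an anisotropic RBF kernel. The sketch shows one common way to write such a kernel with a non-negative weight per feature; this parametrization and the zero threshold are assumptions for illustration, and the paper's penalty and dual optimization are omitted.

```python
import math


def anisotropic_rbf(x, z, sigma):
    """K(x, z) = exp(-sum_j sigma_j * (x_j - z_j) ** 2), one weight per feature.

    Assumed parametrization: a feature whose weight sigma_j is 0 no longer
    influences the kernel, which is how shrinking weights removes features.
    """
    return math.exp(-sum(s * (a - b) ** 2 for s, a, b in zip(sigma, x, z)))


def active_features(sigma, tol=1e-8):
    """Indices of features whose kernel weight has not been driven to (near) zero."""
    return [j for j, s in enumerate(sigma) if s > tol]
```

For example, with weights (1.0, 0.0) the kernel ignores the second coordinate entirely, so two points differing only in that coordinate get kernel value 1.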
12.
Fuzzy functions with support vector machines Total citations: 1, self-citations: 0, citations by others: 1
A new fuzzy system modeling (FSM) approach that identifies the fuzzy functions using support vector machines (SVM) is proposed. This new approach is structurally different from fuzzy rule base approaches and fuzzy regression methods; it is an alternative version of earlier FSM approaches with fuzzy functions. SVM is applied to determine the support vectors for each fuzzy cluster obtained by the fuzzy c-means (FCM) clustering algorithm. The original input variables and the membership values obtained from FCM, together with their transformations, form a new augmented set of input variables. The performance of the proposed system modeling approach is compared with previous fuzzy functions approaches and with standard SVM and LSE methods, using an artificial sparse dataset and a real-life non-sparse dataset. The results indicate that the proposed fuzzy functions with support vector machines approach is a feasible and stable method for regression problems and performs better than the classical statistical methods.
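The augmented input set (original variables, FCM memberships and their transformations) can be sketched as follows. The membership formula is the standard FCM one with fuzzifier m; which transformations of the membership the paper uses is not stated in the abstract, so the log-odds term below is an assumed example.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with d_k = ||x - v_k||.
    A point that coincides with a center gets membership 1 there, 0 elsewhere.
    """
    d = [math.dist(x, v) for v in centers]
    if 0.0 in d:
        k = d.index(0.0)
        return [1.0 if i == k else 0.0 for i in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def augmented_inputs(x, u):
    """Hypothetical augmented input for one cluster's local SVM regressor.

    Original inputs, the membership u, and one example transformation,
    log((1 - u) / u); the paper's actual transformations may differ.
    """
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep the log-odds finite
    return list(x) + [u, math.log((1 - u) / u)]
```

One such augmented vector is built per cluster, and a separate SVM regressor per cluster is then trained on it.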
13.
《Pattern recognition》2003,36(7):1479-1488
Semiparametric Support Vector Machines have been shown to offer advantages over nonparametric approaches, in the sense that generalization capability is further improved and the size of the machines is always under control. We propose here an incremental procedure for Growing Support Vector Classifiers, which avoids the need for an a priori architecture estimation or for a pruning mechanism after SVM training. The proposed growing approach also opens up new possibilities for dealing with multi-kernel machines, automatic selection of hyperparameters, and fast classification methods. The performance of the proposed algorithm and its extensions is evaluated on several benchmark problems.
14.
It is important to develop a reliable system for predicting bacterial virulent proteins, both for finding novel drugs and vaccines and for understanding virulence mechanisms in pathogens. In this work we propose a bacterial virulent protein prediction method based on an ensemble of classifiers in which the features are extracted directly from the amino acid sequence of a given protein. It is well known in the literature that features extracted from the evolutionary information of a protein perform better than features extracted from the amino acid sequence. Our method tries to fill the gap between amino acid sequence based approaches and evolutionary information based approaches. An extensive evaluation according to a blind testing protocol, in which the parameters of the system are calculated on the training set and the system is validated on three different independent datasets, has demonstrated the validity of the proposed method.
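Features computed directly from the amino acid sequence can be as simple as residue composition. The sketch below shows only that one descriptor; the abstract does not list the paper's actual features or ensemble members, so this is an assumed, minimal example of a sequence-only feature vector.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """20-dimensional amino acid composition: relative frequency of each residue.

    Letters outside the 20 standard residues (e.g. X, B, Z) are ignored.
    """
    seq = [a for a in sequence.upper() if a in AMINO_ACIDS]
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]
```

Vectors like this have a fixed length regardless of protein length, which is what lets standard classifiers consume them.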
15.
Towards improving fuzzy clustering using support vector machine: Application to gene expression data
Anirban Mukhopadhyay Ujjwal Maulik 《Pattern recognition》2009,42(11):2744-2763
Recent advances in microarray technology permit monitoring of the expression levels of a large set of genes across a number of time points simultaneously. Extracting knowledge from such huge volumes of microarray gene expression data requires computational analysis. Clustering is one of the important data mining tools for analyzing such microarray data by grouping similar genes into clusters, and researchers have proposed a number of clustering algorithms for this purpose. In this article, an attempt is made to improve the performance of fuzzy clustering by combining it with a support vector machine (SVM) classifier. A recently proposed real-coded variable string length genetic algorithm based clustering technique and an iterated version of fuzzy C-means clustering are used for this purpose. The performance of the proposed clustering scheme is compared with that of some well-known existing clustering algorithms and their SVM-boosted versions on one simulated and six real-life gene expression data sets. A statistical significance test based on analysis of variance (ANOVA), followed by a post hoc Tukey-Kramer multiple comparison test, is conducted to establish the statistical significance of the superior performance of the proposed clustering scheme. Moreover, the biological significance of the clustering solutions is established.
16.
As an exploratory approach, clustering of fMRI time series has proved effective in analyzing functional MRI data, especially in detecting activated regions. Because fMRI time series are arbitrarily distributed in the temporal domain, imposing simple assumptions on the data structure can be misleading and limit the detector's performance. Therefore, a truly data-driven clustering algorithm that adapts to the data structure is preferred, with only high-level control over the clustering procedure. Support vector clustering (SVC) is suitable to some extent because of its advantages, such as no restriction on cluster shape, no need to specify the number of clusters explicitly, and a built-in mechanism for outlier elimination. In this work, we propose an extension of SVC that goes a step further toward a data-sensitive detector, named ellipsoidal support vector clustering (ESVC). To be robust to noise, the clustering is performed on features extracted from the fMRI time series via the Fourier transform. Experimental results on simulated and real data sets demonstrate the effectiveness of incorporating data structure in clustering fMRI time series.
17.
Ligang Zhou Kin Keung Lai Lean Yu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2009,13(2):149-155
Support vector machines (SVM) are an effective tool for building good credit scoring models. However, the performance of the
model depends on its parameter settings. In this study, we use a direct search method to optimize the SVM-based credit scoring
model and compare it with three other parameter optimization methods: grid search, a method based on design of experiments
(DOE) and a genetic algorithm (GA). Two real-world credit datasets are selected to demonstrate the effectiveness and feasibility
of the method. The results show that the direct search method can find an effective model with high classification accuracy
and good robustness, and depends less on the initial search space or starting point.
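The abstract does not say which direct search variant is used. As a hedged illustration, compass (pattern) search is one of the simplest derivative-free methods of this family; it is written below for a generic objective, and using it on SVM parameters such as (log2 C, log2 gamma) with a cross-validation error as the objective is an assumption not implemented here.

```python
def compass_search(f, x0, step=1.0, min_step=1e-3, max_evals=1000):
    """Minimize f by compass (pattern) search, a basic direct search method.

    Polls +/- step along each coordinate and moves to the first improving
    point; if no poll point improves, the step is halved. Stops when the
    step falls below min_step or the evaluation budget is used up.
    """
    x, fx, evals = list(x0), f(x0), 1
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * step
                fy = f(y)
                evals += 1
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            step /= 2.0
    return x, fx
```

Because it needs only objective values, the same loop works with a noisy, non-differentiable cross-validation error, which is the usual motivation for direct search in parameter tuning.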
18.
In cancer classification based on gene expression data, it would be desirable to defer a decision for observations that are difficult to classify. For instance, an observation whose conditional probability of being cancer is around 1/2 should preferably undergo more advanced tests rather than receive an immediate decision. This motivates the use of a classifier with a reject option that issues a warning for observations that are difficult to classify. In this paper, we consider the problem of gene selection with a reject option. Typically, gene expression data comprise expression levels of several thousand candidate genes. In such cases, an effective gene selection procedure is necessary to provide a better understanding of the underlying biological system that generates the data and to improve prediction performance. We propose a machine learning approach in which we apply the l1 penalty to the SVM with a reject option. This method is referred to as the l1 SVM with a reject option. We develop a novel optimization algorithm for this SVM that is fast and stable enough to analyze gene expression data. The proposed algorithm computes the entire solution path with respect to the regularization parameter. Results of numerical studies show that, in comparison with the standard l1 SVM, the proposed method efficiently reduces prediction errors without hampering gene selectivity.
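A reject option can be expressed as a thresholded decision rule plus a hinge-type loss that penalizes wrong-sided margins more steeply. The sketch assumes the generalized hinge loss of Bartlett and Wegkamp, which the abstract does not name, and a hypothetical threshold parameter delta; the l1 penalty and the paper's path-following solver are not shown.

```python
def generalized_hinge(z, d):
    """Hinge-type loss for classification with a reject option.

    z = y * f(x) is the margin and 0 < d < 1/2 is the cost of rejecting.
    Slope -1 on [0, 1), steeper slope -(1 - d) / d for z < 0, zero for z >= 1.
    """
    a = (1.0 - d) / d
    if z < 0:
        return 1.0 - a * z
    if z < 1:
        return 1.0 - z
    return 0.0


def predict_with_reject(score, delta):
    """Return +1 or -1, or 0 (reject) when |f(x)| <= delta."""
    if abs(score) <= delta:
        return 0
    return 1 if score > 0 else -1
```

With d = 0.25 the loss at margin -1 is 4.0 instead of the ordinary hinge's 2.0, so confident mistakes are penalized more than abstaining near the boundary.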
19.
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on an adequate choice of the values of a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose combining meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which further explores the parameter space. In this proposal, we expect the initial solutions provided by meta-learning to lie in good regions of the search space (i.e., closer to optimal solutions). Hence, the search algorithm needs to evaluate fewer candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates than their components applied in isolation.
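The meta-learning step (recommend parameter values that worked on similar past problems, then hand them to PSO or Tabu Search as starting points) can be sketched as a nearest-neighbour lookup. The meta-features, the Euclidean distance and the value of k are assumptions; the abstract does not specify them.

```python
import math


def recommend_initial_points(new_meta, history, k=3):
    """Suggest initial SVM parameter settings from the k most similar past problems.

    history: list of (meta_features, best_params) pairs from previously
    solved problems. Similarity is Euclidean distance between meta-feature
    vectors (an assumption; meta-features should be normalized first).
    """
    ranked = sorted(history, key=lambda h: math.dist(new_meta, h[0]))
    return [params for _, params in ranked[:k]]
```

The returned settings would seed the initial population (PSO) or starting solution (Tabu Search), so the search begins near configurations that already worked on similar problems.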
20.
Yinqiao Yan 《Optimization methods & software》2020,35(4):855-883
Support vector machine (SVM) has proved to be a successful approach to machine learning. Two typical SVM models are the L1-loss model for support vector classification (SVC) and the ε-L1-loss model for support vector regression (SVR). Due to the non-smoothness of the L1-loss function in these two models, most traditional approaches focus on solving the dual problem. In this paper, we propose an augmented Lagrangian method for the L1-loss model that is designed to solve the primal problem. By tackling the non-smooth term in the model with Moreau–Yosida regularization and the proximal operator, the subproblem in the augmented Lagrangian method reduces to a non-smooth linear system, which can be solved via the quadratically convergent semismooth Newton method. Moreover, the high computational cost of the semismooth Newton method can be significantly reduced by exploiting the sparse structure of the generalized Jacobian. Numerical results on various datasets in LIBLINEAR show that the proposed method is competitive with the most popular solvers in both speed and accuracy.
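The non-smooth term is handled through a proximal operator. For the L1 (hinge) loss, written here as h(u) = c * max(0, u) with u = 1 - y_i w^T x_i, the proximal map has a closed form, sketched below. This is a generic derivation under that assumed splitting, not the paper's code; the augmented Lagrangian outer loop and the semismooth Newton solver are omitted.

```python
def prox_hinge(v, sigma, c):
    """Elementwise prox of h(u) = c * max(0, u) with step sigma.

    Solves argmin_u  sigma * c * max(0, u) + 0.5 * (u - v) ** 2 for each v.
    """
    t = sigma * c
    out = []
    for x in v:
        if x > t:
            out.append(x - t)  # linear piece of h: shift left by sigma * c
        elif x < 0:
            out.append(x)      # flat piece of h: prox is the identity
        else:
            out.append(0.0)    # near the kink: clamp to the breakpoint u = 0
    return out


def moreau_envelope(v, sigma, c):
    """Optimal value of the problem above (Moreau envelope of sigma * h), a smooth surrogate of h."""
    return [sigma * c * max(0.0, p) + 0.5 * (p - x) ** 2
            for x, p in zip(v, prox_hinge(v, sigma, c))]
```

The prox is piecewise linear in v, so its generalized Jacobian is diagonal with entries 0 or 1; that sparsity is the kind of structure a semismooth Newton method can exploit.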