Similar Documents
20 similar documents found.
1.
To address the high false-alarm rate and low detection rate of anomaly detection systems, as well as the burden that redundant features place on them, an anomaly detection method combining feature selection with support vector machines is proposed. The method constructs a feature selection algorithm based on the classification accuracy of the classification model, screens out the feature combination that yields the highest classification accuracy, and combines it with an SVM classifier to detect anomalies in the data. Simulation results show that the method achieves high detection accuracy and low detection time, and, by removing noisy features, reduces the data-processing burden on the system.
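A minimal sketch of the idea described above, using scikit-learn: a greedy wrapper that keeps adding the feature whose inclusion most improves cross-validated SVM accuracy. The classifier settings, stopping rule and the names X_train/y_train are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def accuracy_driven_selection(X, y, max_features=10):
    """Greedy forward search: keep the feature combination with the highest CV accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(X.shape[1]))
    for _ in range(max_features):
        scores = []
        for f in remaining:
            cols = selected + [f]
            acc = cross_val_score(SVC(kernel="rbf", gamma="scale"),
                                  X[:, cols], y, cv=5).mean()
            scores.append((acc, f))
        acc, f = max(scores)
        if acc <= best_score:        # stop when no candidate improves accuracy
            break
        best_score = acc
        selected.append(f)
        remaining.remove(f)
    return selected, best_score

# Usage: feats, acc = accuracy_driven_selection(X_train, y_train)
# followed by SVC().fit(X_train[:, feats], y_train) for anomaly detection.
```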

2.
Stock index forecasting is one of the most difficult tasks that financial organizations, firms and private investors have to face. Support vector regression (SVR) has become a popular alternative in stock index forecasting tasks due to its generalization capability in obtaining a unique solution. However, the major limitation of SVR is that it cannot capture the relative importance of independent variables to the dependent variable when many potential independent variables are considered. This study incorporates a feature selection method and SVR for building a stock index forecasting model. The proposed model uses multivariate adaptive regression splines (MARS), an effective nonlinear and nonparametric regression methodology, to identify important forecasting variables. The obtained significant predictor variables then serve as the inputs for the SVR model. Experimental results reveal that the important variables obtained from MARS can improve the forecasting performance of the SVR models. Moreover, the MARS results provide useful information about the relationship between the selected predictor variables and the stock index through the obtained basis functions, important predictor variables and the MARS prediction function. Hence, the proposed stock index forecasting model can generate good forecasting performance and exhibits the capability of identifying significant predictor variables, which provides valuable information for further investment decisions/strategies.

3.
This paper focuses on feature selection in classification. A new version of the support vector machine (SVM), named the p-norm support vector machine ($p\in[0,1]$), is proposed. Unlike the standard SVM, the p-norm ($p\in[0,1]$) of the normal vector of the decision plane is used, which leads to a sparser solution. The new model can not only select fewer features but also improve classification accuracy by adjusting the parameter p. Numerical experiments show that the p-norm SVM is more effective than several common feature selection methods.
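For reference, a common way to write the model the abstract describes is sketched below; the exact constraints and regularization trade-off in the paper may differ.

```latex
\min_{w,\,b,\,\xi}\; \|w\|_p^p + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.} \quad y_i\,(w^{\top}x_i + b) \ge 1-\xi_i,\;\; \xi_i \ge 0,\;\; i=1,\dots,n,
\qquad p\in[0,1].
```

As p decreases, the penalty $\|w\|_p^p$ drives more components of $w$ to zero, so fewer features enter the decision function.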

4.
From the practical standpoint of industrial production processes, and given the effectiveness of wavelet transforms for signal denoising and the excellent learning performance of support vector machines, this paper investigates fault trend forecasting based on wavelets and SVMs, and builds an explanation mechanism with an expert system. Applied to forecasting faults in the tail-oxygen concentration of the p-xylene (PX) oxidation reactor in an industrial purified terephthalic acid (PTA) production process, the results show that the method accurately predicts fault trends in the tail-oxygen concentration and simultaneously gives the probability of a fault occurring, providing assurance for the safe and stable operation of the PX oxidation reactor.
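An illustrative sketch of the wavelet-plus-SVM pipeline: denoise the tail-oxygen signal with a wavelet transform (PyWavelets), then fit an SVR on lagged values to forecast the trend. The wavelet family, threshold rule and lag length are assumptions; the expert-system explanation layer is not shown.

```python
import numpy as np
import pywt
from sklearn.svm import SVR

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold the detail coefficients and reconstruct the signal."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def fit_trend_forecaster(signal, n_lags=10):
    """Train an SVR that predicts the next value from the previous n_lags values."""
    s = wavelet_denoise(np.asarray(signal, dtype=float))
    X = np.array([s[i : i + n_lags] for i in range(len(s) - n_lags)])
    y = s[n_lags:]
    return SVR(C=10.0, epsilon=0.01).fit(X, y)
```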

5.
In the areas of investment research and applications, feasible quantitative models include methodologies stemming from soft computing for prediction of financial time series, multi-objective optimization of investment return and risk reduction, as well as selection of investment instruments for portfolio management based on asset ranking using a variety of input variables and historical data. Among all these, stock selection has long been identified as a challenging and important task. This line of research is highly contingent upon reliable stock ranking for successful portfolio construction. Recent advances in machine learning and data mining are leading to significant opportunities to solve these problems more effectively. In this study, we aim at developing a methodology for effective stock selection using support vector regression (SVR) as well as genetic algorithms (GAs). We first employ the SVR method to generate surrogates for actual stock returns, which in turn provide reliable rankings of stocks. Top-ranked stocks can thus be selected to form a portfolio. On top of this model, the GA is employed to optimize model parameters and to perform feature selection, acquiring optimal subsets of input variables for the SVR model. We show that the investment returns provided by our proposed methodology significantly outperform the benchmark. Based upon these promising results, we expect this hybrid GA-SVR methodology to advance research in soft computing for finance and provide an effective solution to stock selection in practice.
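A compact sketch of the GA wrapper described above, assuming a matrix X of input variables and a vector of realized returns to rank against. The population size, mutation rate and the fixed SVR hyper-parameters are illustrative; the paper also evolves the SVR parameters themselves.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated score of an SVR trained on the selected features."""
    if mask.sum() == 0:
        return -np.inf
    return cross_val_score(SVR(C=10.0, gamma="scale"),
                           X[:, mask.astype(bool)], y, cv=5).mean()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
    n_features = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]   # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < p_mut             # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()].astype(bool)

# Usage: best = ga_select(X, returns); SVR().fit(X[:, best], returns)
```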

6.
In a DNA microarray dataset, gene expression data often has a huge number of features (referred to as genes) versus a small number of samples. With the development of DNA microarray technology, the number of dimensions increases even faster than before, which can lead to the curse of dimensionality. To obtain good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, this paper enhances SVM-RFE for gene selection by incorporating feature clustering, called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs gene selection roughly and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which genes have similar expression profiles. Then, a representative gene is found for each gene group, yielding a representative gene set. SVM-RFE is then applied to rank these representative genes. FCSVM-RFE can reduce the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE achieves better classification performance and lower computational complexity than state-of-the-art methods such as SVM-RFE.
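A minimal sketch of the FCSVM-RFE idea with scikit-learn: cluster the genes, keep the gene closest to each cluster centroid as its representative, then rank the representatives with standard SVM-RFE. The number of clusters and the linear SVM settings are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

def fcsvm_rfe(X, y, n_clusters=200, n_genes=50):
    """Rough selection by clustering genes, then fine ranking with SVM-RFE."""
    # Step 1: cluster genes (columns of X) by their expression profiles.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X.T)
    # Step 2: keep one representative gene per cluster (closest to the centroid).
    reps = []
    for k in range(n_clusters):
        members = np.where(km.labels_ == k)[0]
        if members.size == 0:
            continue
        dists = np.linalg.norm(X.T[members] - km.cluster_centers_[k], axis=1)
        reps.append(members[dists.argmin()])
    reps = np.array(reps)
    # Step 3: rank the representatives with standard SVM-RFE.
    rfe = RFE(LinearSVC(C=1.0, dual=False, max_iter=5000),
              n_features_to_select=n_genes).fit(X[:, reps], y)
    return reps[rfe.support_]
```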

7.
With the development and popularization of remote-sensing imaging technology, there are more and more applications of hyperspectral image classification, such as target detection and land cover investigation. Selecting a minimal and effective subset from the mass of available bands is a challenging issue of urgent importance. This paper proposes a hybrid feature selection strategy based on a genetic algorithm and support vector machine (GA–SVM), which forms a wrapper that searches for the combination of bands with the highest classification accuracy. In addition, band grouping based on conditional mutual information between adjacent bands is utilized to counteract the high correlation between bands and further reduce the computational cost of the genetic algorithm. During the post-processing phase, the branch and bound algorithm is employed to filter out irrelevant band groups. Experimental results on two benchmark data sets show that the proposed approach is very competitive and effective.

8.
Support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with feature selection, significantly affects classification accuracy. The objective of this study is to obtain better parameter values while also finding a subset of features that does not degrade the SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To evaluate the proposed SA-SVM approach, several datasets from the UCI machine learning repository are adopted to calculate the classification accuracy rate. The proposed approach was compared with grid search, a conventional method of parameter setting, and various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and the other approaches. SA-SVM is thus useful for parameter determination and feature selection in the SVM.
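A hedged sketch of the SA-SVM loop: a simulated-annealing search over the feature mask and log-scaled C and gamma, scored by cross-validated accuracy. The neighbourhood moves, cooling schedule and acceptance rule are generic SA choices rather than the paper's exact settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def cv_accuracy(X, y, mask, log_c, log_gamma):
    if mask.sum() == 0:
        return 0.0
    clf = SVC(C=10.0 ** log_c, gamma=10.0 ** log_gamma)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

def sa_svm(X, y, n_iter=300, t0=1.0, cooling=0.95):
    n = X.shape[1]
    # Current state: random feature mask plus log-scaled C and gamma.
    mask = rng.random(n) < 0.5
    log_c, log_gamma = 0.0, -1.0
    score = cv_accuracy(X, y, mask, log_c, log_gamma)
    best = (mask.copy(), log_c, log_gamma, score)
    t = t0
    for _ in range(n_iter):
        # Neighbour: flip one feature bit and perturb the hyper-parameters.
        new_mask = mask.copy()
        new_mask[rng.integers(n)] ^= True
        new_c = log_c + rng.normal(scale=0.3)
        new_gamma = log_gamma + rng.normal(scale=0.3)
        new_score = cv_accuracy(X, y, new_mask, new_c, new_gamma)
        # Accept improvements always, worse moves with Boltzmann probability.
        if new_score > score or rng.random() < np.exp((new_score - score) / t):
            mask, log_c, log_gamma, score = new_mask, new_c, new_gamma, new_score
            if score > best[3]:
                best = (mask.copy(), log_c, log_gamma, score)
        t *= cooling
    return best
```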

9.
Pattern Analysis and Applications - With the rapid development of computer technology, data collection has become easier and data objects have become more complex. Data analysis methods based on machine...

10.
In many pattern recognition applications, high-dimensional feature vectors impose a high computational cost as well as the risk of “overfitting”. Feature selection addresses the dimensionality reduction problem by determining a subset of available features that is most essential for classification. This paper presents a novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM). In comparison with conventional wrapper methods that employ the SFS strategy, FS_SFS has two important properties that reduce the time of computation. First, it dynamically maintains a subset of samples for the training of the SVM. Because not all available samples participate in the training process, the computational cost of obtaining a single SVM classifier is decreased. Second, a new criterion, which takes into consideration both the discriminant ability of individual features and the correlation between them, is proposed to effectively filter out nonessential features. As a result, the total number of training runs is significantly reduced and the overfitting problem is alleviated. The proposed approach is tested on both synthetic and real data to demonstrate its effectiveness and efficiency.
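For comparison, plain wrapper-style sequential forward search with an SVM can be written with scikit-learn's SequentialFeatureSelector as below; FS_SFS additionally maintains a reduced training subset and its own filtering criterion, which this sketch does not reproduce. X_train and y_train are placeholders for your labelled data.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Plain wrapper SFS with an SVM; one feature is added at a time, keeping the
# addition that maximizes cross-validated accuracy.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
sfs = SequentialFeatureSelector(svm, n_features_to_select=10,
                                direction="forward", cv=5)
sfs.fit(X_train, y_train)            # X_train, y_train: your labelled data
X_reduced = sfs.transform(X_train)   # keep only the selected columns
print("selected:", sfs.get_support(indices=True))
print("cv accuracy:", cross_val_score(svm, X_reduced, y_train, cv=5).mean())
```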

11.
Selecting relevant features for support vector machine (SVM) classifiers is important for a variety of reasons such as generalization performance, computational efficiency, and feature interpretability. Traditional SVM approaches to feature selection typically extract features and learn SVM parameters independently. Independently performing these two steps might result in a loss of information related to the classification process. This paper proposes a convex energy-based framework to jointly perform feature selection and SVM parameter learning for linear and non-linear kernels. Experiments on various databases show significant reduction of features used while maintaining classification performance.

12.
Piecewise linear representation (PLR) and back-propagation artificial neural networks (BPN) have recently been integrated for stock trading signal prediction (PLR–BPN). However, this approach has difficulty avoiding over-fitting, escaping local minima and choosing the threshold of the trading decision. Since the support vector machine (SVM) is better able to avoid over-fitting and local minima, we integrate PLR and a weighted SVM (WSVM) to forecast stock trading signals (PLR–WSVM). The new characteristics of PLR–WSVM are as follows: (1) the turning points obtained from PLR are assigned different weights according to the change rate of the closing price between the current turning point and the next one, so the weight reflects the relative importance of each turning point; (2) the prediction of the stock trading signal is formulated as a weighted four-class classification problem, which removes the need to determine the threshold of the trading decision; (3) WSVM is used to model the relationship between the trading signal and the input variables, which improves the generalization performance of the prediction model; (4) the historical dataset is divided into overlapping training–testing sets rather than training–validation–testing sets, which not only makes full use of the data but also reduces its time variability; and (5) some new technical indicators representing investors' sentiment are added to the input variables, which improves prediction performance. Comparative experiments among PLR–WSVM, PLR–BPN and a buy-and-hold strategy (BHS) on 20 shares from the Shanghai Stock Exchange in China show that the prediction accuracy and profitability of PLR–WSVM are the best, indicating that PLR–WSVM is effective and can be used for stock trading signal prediction.
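A small sketch of points (1)-(3): training a weighted SVM in which each PLR turning point is weighted by the magnitude of the closing-price change to the next turning point. The feature matrix, label encoding, normalization and SVM settings are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical inputs: features at each detected turning point, the four-class
# trading-signal labels, and `change_rate`, the relative change of the closing
# price between a turning point and the next one.
def fit_weighted_svm(X_turn, y_signal, change_rate):
    # Larger price moves get larger weights, as PLR-WSVM weights turning points.
    weights = np.abs(change_rate)
    weights = weights / weights.mean()          # normalize around 1.0
    clf = SVC(kernel="rbf", C=10.0, gamma="scale",
              decision_function_shape="ovo")    # multi-class trading signals
    clf.fit(X_turn, y_signal, sample_weight=weights)
    return clf
```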

13.
It is well known that software defect prediction is one of the most important tasks for software quality improvement. The use of defect predictors allows test engineers to focus on defective modules, so testing resources can be allocated effectively and quality assurance costs can be reduced. For within-project defect prediction (WPDP), there should be sufficient data within a company to train a prediction model. Without such local data, cross-project defect prediction (CPDP) is feasible since it uses data collected from similar projects in other companies. Software defect datasets have a class imbalance problem that increases the difficulty for the learner to predict defects. In addition, the impact of imbalanced data on the real performance of models can be hidden by the performance measures chosen. We investigate whether class imbalance learning can be beneficial for CPDP. In our approach, the asymmetric misclassification cost and the similarity weights obtained from distributional characteristics are closely associated to guide the appropriate resampling mechanism. We performed the A-statistic effect size test to evaluate the magnitude of the improvement, and used the Wilcoxon rank-sum test for statistical significance. The experimental results show that our approach can provide higher prediction performance than both the existing CPDP technique and the existing class imbalance technique.

14.
Traditional classifiers, including support vector machines, use only labeled data in training. However, labeled instances are often difficult, costly, or time consuming to obtain, while unlabeled instances are relatively easy to collect. The goal of semi-supervised learning is to improve classification accuracy by using unlabeled data together with a few labeled data in training classifiers. Recently, the Laplacian support vector machine has been proposed as an extension of the support vector machine to semi-supervised learning. The Laplacian support vector machine shares the interpretability drawbacks of the support vector machine. It also performs poorly when there are many non-informative features in the training data, because the final classifier is expressed as a linear combination of informative as well as non-informative features. We introduce a variant of the Laplacian support vector machine that is capable of feature selection based on functional analysis of variance decomposition. Through synthetic and benchmark data analysis, we illustrate that our method can be a useful tool in semi-supervised learning.

15.
Forecasting stock price movements is one of the most difficult problems in finance, because financial time series are complex and non-stationary. Furthermore, it is very difficult to predict such movements with parametric models. Instead of parametric models, we propose two techniques that are data-driven and non-parametric. Based on the idea that excess returns would be possible with publicly available information, we developed two models to forecast short-term price movements using technical indicators. Our assumption is that the future value of a stock price depends on the financial indicators, although there is no parametric model to explain this relationship; the relationship comes from technical analysis. The comparison shows that support vector regression (SVR) outperforms multi-layer perceptron (MLP) networks for short-term prediction in terms of mean squared error. If the risk premium is used as the comparison criterion, then the SVR technique is as good as or better than the MLP method.
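An illustrative comparison in scikit-learn of the two non-parametric models on the mean-squared-error criterion; X (technical indicators) and y (next-period movement) are placeholders, and the hyper-parameters are not taken from the paper.

```python
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X holds technical indicators, y the next-period price movement (assumed).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False, test_size=0.2)

models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                      random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: test MSE = {mse:.6f}")
```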

16.
To overcome the drawback that traditional process monitoring methods must assume the process feature signals follow a multivariate normal distribution, this paper proposes a fault diagnosis method combining independent component analysis (ICA) with support vector machines. An independent component model is built to determine the corresponding statistical control limits, the fault data that require further inspection are screened out, and the SVM then identifies the faults. Applied to process monitoring and fault diagnosis of a chemical polymerization reaction, simulation results show that, by appropriately adjusting the control limits of the monitoring statistics, this hybrid fault diagnosis method not only identifies faults correctly but also corrects false alarms caused by mis-detected data, improving the accuracy of fault diagnosis.
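A rough sketch of the two-stage scheme: an ICA-based monitoring statistic with an empirical control limit screens suspicious samples, and an SVM then identifies the fault class. The number of components, the I²-style statistic and the 99% quantile limit are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVC

def ica_monitor(X_normal, X_new, quantile=0.99):
    """Screen suspicious samples with an ICA statistic and an empirical control limit."""
    ica = FastICA(n_components=5, random_state=0).fit(X_normal)

    def stat(X):
        # I^2-style monitoring statistic: squared norm of the independent components.
        S = ica.transform(X)
        return np.sum(S ** 2, axis=1)

    limit = np.quantile(stat(X_normal), quantile)   # empirical control limit
    suspicious = stat(X_new) > limit                # samples sent to the SVM stage
    return suspicious, limit

# Second stage (assuming labelled fault data X_f, y_f are available):
# clf = SVC(kernel="rbf").fit(X_f, y_f); fault_type = clf.predict(X_new[suspicious])
```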

17.
Multivariate calibration is a classic problem in analytical chemistry and has frequently been solved by partial least squares (PLS) and artificial neural networks (ANNs) in previous work. The special characteristic of multivariate calibration is high dimensionality with small sample sizes. Here, we apply support vector regression (SVR), as well as ANNs and PLS, to the multivariate calibration problem of determining three aromatic amino acids (phenylalanine, tyrosine and tryptophan) in their mixtures by fluorescence spectroscopy. The results of the leave-one-out method show that SVR performs better than the other methods and appears to be a good method for this task. Furthermore, feature selection is performed for SVR to remove redundant features, and a novel algorithm named Prediction RIsk based FEature selection for support vector Regression (PRIFER) is proposed. Results on the above multivariate calibration data set show that PRIFER is a powerful tool for solving multivariate calibration problems.
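A minimal leave-one-out comparison of SVR and PLS on a calibration set, in the spirit of the abstract; X (spectra) and y (one analyte's concentration) are placeholders and the hyper-parameters are illustrative. The PRIFER feature selection step is not reproduced here.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X: fluorescence spectra, y: concentration of one amino acid (assumed arrays).
loo = LeaveOneOut()
models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=0.01)),
    "PLS": PLSRegression(n_components=5),
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=loo)
    rmse = np.sqrt(np.mean((np.ravel(pred) - y) ** 2))
    print(f"{name}: leave-one-out RMSE = {rmse:.4f}")
```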

18.
Classification of membrane proteins based on a new feature extraction method and support vector machines
Introducing a weighting idea, protein sequences are represented with a new feature extraction method, the weighted autocorrelation function, which is combined with support vector machines; membrane proteins are then classified with one-versus-rest and one-versus-one strategies, with clearly improved results. Under the SVM algorithm and the one-versus-rest strategy, the per-class accuracy, Matthews correlation coefficient and overall accuracy of the weighted autocorrelation feature extraction method are all higher than the corresponding results of the amino-acid composition method: the overall accuracy and the accuracy for lipid chain-anchored proteins reach 87.98% and 65.85%, respectively, 3.38 and 9.75 percentage points higher than with amino-acid composition. The one-versus-one strategy reaches an overall accuracy of 94.88%, 6.9 percentage points higher than one-versus-rest. The SVM learning algorithm also classifies better than the Bayesian covariance statistical algorithm, with an overall accuracy up to 15.6 percentage points higher.
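A sketch of what a weighted autocorrelation feature could look like: autocorrelations of a per-residue physicochemical profile, down-weighted with increasing lag. The Kyte-Doolittle hydropathy scale and the exponential weighting are illustrative assumptions; the paper's property and weighting scheme may differ.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (one common choice; assumed, not the paper's).
HYDRO = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
         "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
         "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
         "Y": -1.3, "V": 4.2}

def weighted_autocorrelation(seq, max_lag=20, decay=0.9):
    """Represent a protein sequence by weighted autocorrelations of a property profile.

    Assumes the sequence is longer than max_lag residues.
    """
    h = np.array([HYDRO[a] for a in seq if a in HYDRO])
    h = (h - h.mean()) / (h.std() + 1e-12)       # normalize the property profile
    feats = []
    for lag in range(1, max_lag + 1):
        # Weight longer-range correlations less (illustrative weighting scheme).
        feats.append(decay ** lag * np.mean(h[:-lag] * h[lag:]))
    return np.array(feats)

# The resulting feature vectors can be fed to sklearn.svm.SVC with
# decision_function_shape="ovr" (one-versus-rest) or "ovo" (one-versus-one).
```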

19.
We present a mechanism to train support vector machines (SVMs) with a hybrid kernel and minimal Vapnik-Chervonenkis (VC) dimension. After describing the VC dimension of sets of separating hyperplanes in a high-dimensional feature space produced by a mapping related to kernels from the input space, we propose an optimization criterion to design SVMs by minimizing the upper bound of the VC dimension. This method realizes structural risk minimization and utilizes a flexible kernel function such that superior generalization over test data can be obtained. To obtain a flexible kernel function, we develop a hybrid kernel function and a sufficient condition for it to be an admissible Mercer kernel based on common Mercer kernels (polynomial, radial basis function, two-layer neural network, etc.). The nonnegative combination coefficients and parameters of the hybrid kernel are determined subject to the minimal upper bound of the VC dimension of the learning machine. The use of the hybrid kernel results in better performance than that of a single common kernel. Experimental results are discussed to illustrate the proposed method and show that the SVM with the hybrid kernel outperforms one with a single common kernel in terms of generalization power.
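Independently of the VC-dimension criterion, the hybrid kernel itself is easy to use with scikit-learn, since a nonnegative combination of Mercer kernels is again a Mercer kernel; here alpha, degree and gamma are placeholders rather than coefficients chosen by the paper's optimization.

```python
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

def make_hybrid_kernel(alpha=0.5, degree=3, gamma=0.1):
    """Nonnegative combination of two Mercer kernels is itself a Mercer kernel."""
    def kernel(X, Y):
        return (alpha * polynomial_kernel(X, Y, degree=degree)
                + (1.0 - alpha) * rbf_kernel(X, Y, gamma=gamma))
    return kernel

# alpha, degree and gamma would be chosen here by minimizing an upper bound on
# the VC dimension; fixed values are used purely for illustration.
clf = SVC(kernel=make_hybrid_kernel(alpha=0.7))
# clf.fit(X_train, y_train); clf.predict(X_test)
```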

20.
The aqueous solubility of organic compounds is an important physicochemical property. In this paper, 18 topological descriptors are used to describe the molecular structures of 1293 compounds. A classification model for the 1293 organic compounds is first built: the data are divided into three classes according to the magnitude of logS, the model is trained on a training set and checked on a test set, and the classification accuracy reaches 92.2%. On this basis, with the 18 descriptors as inputs and logS as output, the quantitative prediction of aqueous solubility is studied and a support vector machine prediction model is built. Comparing results on the test set, a previously built artificial neural network model gave a correlation coefficient r² = 0.94 and a standard deviation sd = 0.52, whereas the SVM model built here gives r² = 0.95 and sd = 0.50, clearly outperforming the earlier model.
