Similar literature
 20 similar documents found
1.
We introduce a novel wrapper algorithm for feature selection using Support Vector Machines with kernel functions. Our method is based on sequential backward selection, using the number of errors on a validation subset as the measure for deciding which feature to remove in each iteration. We compare our approach with other algorithms, such as a filter method and Recursive Feature Elimination SVM, to demonstrate its effectiveness and efficiency.
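The backward-selection loop described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the kernel SVM so the sketch needs no external library, and all function names and the toy data are hypothetical, not taken from the paper.

```python
# Sketch: sequential backward selection driven by validation errors.
# ASSUMPTION: a nearest-centroid classifier replaces the kernel SVM of the paper.

def fit_centroids(X, y, features):
    """Return {label: centroid} computed only over the given feature indices."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, row, features):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((row[f] - c[i]) ** 2 for i, f in enumerate(features))
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))


def validation_errors(train, valid, features):
    """Train on `train` and count misclassified rows of `valid`."""
    (Xt, yt), (Xv, yv) = train, valid
    centroids = fit_centroids(Xt, yt, features)
    return sum(predict(centroids, r, features) != t for r, t in zip(Xv, yv))


def backward_selection(train, valid, n_features, keep):
    """Drop one feature per iteration until `keep` features remain."""
    features = list(range(n_features))
    while len(features) > keep:
        # Remove the feature whose removal leaves the fewest validation errors.
        worst = min(
            features,
            key=lambda f: validation_errors(
                train, valid, [g for g in features if g != f]
            ),
        )
        features.remove(worst)
    return features


# Toy data (hypothetical): feature 0 is informative, feature 1 is misleading.
train = ([[0, 0], [1, 0], [4, 10], [5, 10]], [0, 0, 1, 1])
valid = ([[1, 10], [4, 0]], [0, 1])
selected = backward_selection(train, valid, n_features=2, keep=1)
```

On this toy data the misleading feature is removed first, so `selected` holds only feature 0. A real wrapper would retrain the SVM at each step, which is what makes the approach costly.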

2.
The success of stock selection is contingent upon the future performance of stock markets. We incorporate stock prediction into stock selection to specifically capture the future features of stock markets, thereby forming a novel hybrid (two-step) stock selection method (involving stock prediction and stock scoring). (1) Stock returns for the next period are predicted using emerging computational intelligence (CI), i.e., extreme learning machine with a powerful learning capacity and a fast computing speed. (2) A stock scoring mechanism is developed as a linear combination of the predicted factor (generated in the first step) and the fundamental factors (popular in existing literature) based on CI-based optimization for weights, and top-ranked stocks are selected for an equally weighted portfolio. Using the A-share market of China as the study sample, the empirical results show that the novel hybrid approach, using highly weighted predicted factors, statistically outperforms both traditional methods (without stock prediction) and similar counterparts (with other model designs) in terms of market returns, which suggests the great contribution of stock prediction for improving stock selection.
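The scoring step of this two-step method (a linear combination of factors, then an equally weighted portfolio of the top-ranked stocks) can be illustrated with a short sketch. The weights here are fixed by hand as an assumption, whereas the abstract says they are optimized with CI; the tickers, factor values, and function names are all hypothetical.

```python
# Sketch: linear stock scoring and an equally weighted top-N portfolio.
# ASSUMPTION: weights are hand-picked; the paper optimizes them instead.

def score_stocks(predicted, fundamentals, w_pred, w_fund):
    """Score = w_pred * predicted return + sum of weighted fundamental factors."""
    scores = {}
    for ticker, pred in predicted.items():
        factors = fundamentals[ticker]
        scores[ticker] = w_pred * pred + sum(
            w * x for w, x in zip(w_fund, factors)
        )
    return scores


def equal_weight_portfolio(scores, top_n):
    """Pick the top_n highest-scoring tickers and give each the same weight."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties break by name
    chosen = ranked[:top_n]
    return {t: 1.0 / len(chosen) for t in chosen}


# Hypothetical inputs: one predicted return and one fundamental factor per stock.
predicted = {"A": 0.02, "B": -0.01, "C": 0.05}
fundamentals = {"A": [1.0], "B": [3.0], "C": [0.0]}
scores = score_stocks(predicted, fundamentals, w_pred=10.0, w_fund=[0.1])
portfolio = equal_weight_portfolio(scores, top_n=2)
```

With a large weight on the predicted factor, stocks C and A rank highest and each receives half of the portfolio.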

3.
We focus on a hybrid approach to feature selection. We begin our analysis with a filter model that exploits the geometrical information contained in the minimum spanning tree (MST) built on the learning set. This model uses a statistical test of relative certainty gain within a forward selection algorithm. In the second part of the paper, we show that the MST can be replaced by the 1-nearest-neighbor graph without challenging the statistical framework. This leads to a feature selection algorithm belonging to a new category of hybrid models (filter-wrapper). Experimental results on readily available synthetic and natural domains are presented and discussed.

4.
Prediction of stock market trends is considered an important task and attracts great attention, as successfully predicting stock prices may lead to attractive profits through sound decisions. Stock market prediction is a major challenge owing to non-stationary, noisy, and chaotic data, which makes it difficult for investors to decide where to invest for profit. Several techniques have been devised in the existing literature to predict stock market trends. This work presents a detailed review of 50 research papers proposing methodologies for stock market prediction, such as Bayesian models, fuzzy classifiers, Artificial Neural Networks (ANN), Support Vector Machine (SVM) classifiers, Neural Networks (NN), and other machine learning methods. The papers are classified according to the prediction and clustering techniques they use. The research gaps and the challenges faced by the existing techniques are listed and elaborated, which can help researchers improve future work. The works are analyzed in terms of the datasets, software tools, performance evaluation measures, and prediction techniques used, and the performance attained by the different techniques. The most commonly used techniques for effective stock market prediction are ANN and fuzzy-based techniques. Despite considerable research effort, current stock market prediction techniques still have many limitations. From this survey, it can be concluded that stock market prediction is a very complex task, and that different factors should be considered to predict the future of the market more accurately and efficiently.

5.
Predicting the direction of stock market prices is a challenging and important task in financial time series analysis. This study predicts the next-day stock price direction using an optimal subset of indicators selected with an ensemble feature selection approach. The main focus is to obtain a final best feature subset that also yields good prediction of the next-day price trend by removing irrelevant and redundant indicators from the dataset. For this purpose, filter methods are combined, support vector machines (SVM) are applied, and finally a voting scheme is used. These steps are carried out on a real dataset obtained from the Istanbul Stock Exchange (ISE), with technical and macroeconomic indicators. The results show that predicting the next-day direction with the reduced dataset improves on predicting it with the full dataset.

6.
With the economic successes of several Asian economies and their increasingly important roles in the global financial market, the prediction of Asian stock markets has become a hot research area. As Asian stock markets are highly dynamic and exhibit wide variation, it may be more realistic and practical to assume that the stock indexes of Asian stock markets are nonlinear mixture data. In this research, a time series prediction model combining nonlinear independent component analysis (NLICA) and a neural network is proposed to forecast Asian stock markets. NLICA is a novel feature extraction technique for finding independent sources from observed nonlinear mixture data when no relevant data mixing mechanisms are available. In the proposed method, we first use NLICA to transform the input space composed of the original time series data into a feature space consisting of independent components (ICs) that represent the underlying information of the original data. The ICs then serve as the input variables of the neural network used to build the prediction model. Among the Asian stock markets, the Japanese and Chinese stock markets are the two biggest in Asia, and they respectively represent the two types of stock markets. Therefore, to evaluate the performance of the proposed approach, the Nikkei 225 closing index and the Shanghai B-share closing index are used as illustrative examples. Experimental results show that the proposed forecasting model not only improves the prediction accuracy of the neural network approach but also outperforms the three comparison methods. The proposed stock index prediction model can therefore be a good alternative for Asian stock market indexes.

7.
This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. Recently the literature has contained numerous references to the use of hybrid selection algorithms: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Though they work well, these methods still have their problems: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account interaction between the variables already included in the selected subset and the remaining ones. Here we propose a new approach whose main goal is to drastically reduce the number of wrapper evaluations while maintaining good performance (e.g. accuracy and size of the obtained subset). To do this we propose an algorithm that iteratively alternates between filter ranking construction and wrapper feature subset selection (FSS). Thus, the FSS only uses the first block of ranked attributes and the ranking method uses the current selected subset in order to build a new ranking where this knowledge is considered. The algorithm terminates when no new attribute is selected in the last call to the FSS algorithm. The main advantage of this approach is that only a few blocks of variables are analyzed, and so the number of wrapper evaluations decreases drastically. The proposed method is tested over eleven high-dimensional datasets (2,400 to 46,000 variables) using different classifiers. The results show an impressive reduction in the number of wrapper evaluations without degrading the quality of the obtained subset.

8.
Stock market prediction is of great interest to stock traders and investors because of the high profits available in trading stocks. A successful stock purchase or sale generally occurs near a turning point in the price trend. Thus the prediction and analysis of stock market indices are important for ascertaining whether the next day's closing price will increase or decrease. This paper, therefore, presents a simple IIR filter based dynamic neural network (DNN) and an innovative optimized adaptive unscented Kalman filter for forecasting stock price indices of four different Indian stocks, namely the Bombay stock exchange (BSE), the IBM stock market, RIL stock market, and Oracle stock market. The weights of the dynamic neural information system are adjusted by four different learning strategies: gradient calculation, unscented Kalman filter (UKF), differential evolution (DE), and a hybrid technique (DEUKF) that alternately executes the DE and UKF for a few generations. To improve the performance of both the UKF and DE algorithms, adaptation of certain parameters in both algorithms is presented in this paper. After the stock price indices are predicted over time horizons of one day to one week ahead, the stock market trend is analyzed using several important technical indicators such as the moving average (MA), stochastic oscillators such as the K and D parameters, and WMS%R (Williams indicator). Extensive computer simulations are carried out with the four learning strategies for prediction of stock indices and of the up or down trends of the indices. The results show that significant accuracy is achieved using the hybrid DEUKF algorithm in comparison to the others, namely DE alone, UKF, and the gradient descent technique, in that order. Comparisons with some widely used neural networks (NNs) are also presented in the paper.

9.
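Stock market prediction has been a popular research topic in recent years. Many researchers have tried to predict stock prices or indices using technical indicators based on various mathematical models together with machine learning techniques. Although existing methods report fairly satisfactory predictive results, the accuracy of predicting whether the market will rise or fall has rarely been analyzed. A wrapper method is used to select the optimal feature subset from an original feature set of 23 technical indicators, and a voting scheme that combines different classification algorithms is then used to predict the trends of two stock markets. Experimental results show that the wrapper method outperforms commonly used filter feature selection algorithms such as the χ² statistic, information gain, ReliefF, symmetrical uncertainty, and CFS. In addition, the proposed voting scheme outperforms single classifiers such as SVM, k-nearest neighbors, BP neural network, decision tree, and logistic regression.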
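A voting scheme of the kind this abstract mentions can be sketched as plain majority voting over the label predictions of several classifiers. The abstract does not say which voting rule the authors use, so majority voting, the tie-breaking rule, and the example predictions below are assumptions made for illustration.

```python
# Sketch: majority voting across classifiers.
# ASSUMPTION: unweighted majority vote; ties go to the alphabetically first label.
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per classifier, all of equal length."""
    voted = []
    for labels in zip(*predictions):  # labels given to one sample by all voters
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(min(lab for lab, c in counts.items() if c == top))
    return voted


# Hypothetical trend predictions from three classifiers for three trading days.
svm_pred = ["up", "down", "up"]
knn_pred = ["up", "up", "down"]
tree_pred = ["down", "up", "down"]
combined = majority_vote([svm_pred, knn_pred, tree_pred])
```

Each day receives the label that most classifiers agree on, so a single classifier's mistake can be outvoted by the other two.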

10.
A new approach is introduced to identify natural clusters of acoustic emission signals. The presented technique is based on an exhaustive screening that takes into account all combinations of signal features extracted from the recorded acoustic emission signals. For each possible combination of signal features, the classification performance of the k-means algorithm is evaluated for two to ten classes. The numerical degree of cluster separation of each partition is calculated using the Davies-Bouldin and Tou indices, Rousseeuw’s silhouette validation method, and Hubert’s Gamma statistics. The individual ratings of the cluster validation techniques are accumulated in a voting scheme to determine the number of clusters with the best performance. This is defined as the best partitioning for the given signal feature combination. As a second step, the numerical ranking of all these partitions is evaluated to find the globally optimal partition in a second voting scheme using the results of the cluster validation methods. This methodology can be used for automated evaluation of the number of natural clusters and their partitions without previous knowledge of the cluster structure of acoustic emission signals. The suitability of the approach was evaluated using artificial datasets with a defined degree of separation. In addition, the application of the approach to clustering of acoustic emission signals is demonstrated for signals obtained from failure during loading of carbon fiber reinforced plastic specimens.
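Of the validation measures named in this abstract, the Davies-Bouldin index is the simplest to show in code. The sketch below computes it for a partition that is already given; the k-means step, the other indices, and the voting are omitted, and the function names and toy clusters are hypothetical. Lower values indicate better separated clusters.

```python
# Sketch: Davies-Bouldin index for an already partitioned dataset.
# ASSUMPTION: clusters are given as lists of points; k-means itself is omitted.
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance_ij."""
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two toy partitions: same cluster shapes, different distance between clusters.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(2, 0), (2, 2)]]
db_far = davies_bouldin(well_separated)
db_near = davies_bouldin(close_together)
```

Here `db_far` is smaller than `db_near`, which matches the intuition that the first partition is cleaner. In a scheme like the one described, such a score would be one vote among several indices when choosing the number of clusters.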

11.
This paper proposes a modified binary particle swarm optimization (MBPSO) method for feature selection with simultaneous optimization of the SVM kernel parameter setting, applied to mortality prediction in septic patients. An enhanced version of binary particle swarm optimization, designed to cope with premature convergence of the BPSO algorithm, is proposed. MBPSO controls the swarm variability using the velocity and the similarity between the best swarm solutions. This paper uses support vector machines in a wrapper approach, where the kernel parameters are optimized at the same time. The approach is applied to predict the outcome (survived or deceased) of patients with septic shock. Further, MBPSO is tested on several benchmark datasets and is compared with other PSO-based algorithms and genetic algorithms (GA). The experimental results show that the proposed approach can correctly select the discriminating input features and also achieve high classification accuracy, especially when compared to other PSO-based algorithms. When compared to GA, MBPSO is similar in terms of accuracy, but its subset solutions have fewer selected features.

12.
Financial distress prediction (FDP) has been a widely and continually studied topic in the field of corporate finance. One of the core problems in FDP is to design effective feature selection algorithms. In contrast to existing approaches, we propose an integrated approach to feature selection for the FDP problem that embeds expert knowledge in the wrapper method. The financial features are categorized into seven classes according to their financial semantics, based on experts’ domain knowledge surveyed from the literature. We then apply the wrapper method to search for “good” feature subsets consisting of top candidates from each feature class. For concept verification, we compare several scholars’ models as well as leading feature selection methods with the proposed method. Our empirical experiment indicates that the prediction model based on the feature set selected by the proposed method outperforms models based on traditional feature selection methods in terms of prediction accuracy.

13.
Remote sensing hyperspectral sensors are important and powerful instruments for addressing classification problems in complex forest scenarios, as they allow a detailed characterization of the spectral behavior of the considered information classes. However, the processing of hyperspectral data is particularly complex both from a theoretical viewpoint [e.g. problems related to the Hughes phenomenon (Hughes, 1968)] and from a computational perspective. Although many previous investigations of feature reduction and feature extraction in hyperspectral data have been presented in the literature, only a few studies have analyzed the role of spectral resolution in the classification accuracy in different application domains. In this paper, we present an empirical study aimed at understanding the relationship among spectral resolution, classifier complexity, and classification accuracy obtained with hyperspectral sensors for the classification of forest areas. We considered two different test sets characterized by images acquired by an AISA Eagle sensor over 126 bands with a spectral resolution of 4.6 nm, and we subsequently degraded the spectral resolution to 9.2, 13.8, 18.4, 23, 27.6, 32.2 and 36.8 nm. A series of classification experiments were carried out with bands at each of the degraded spectral resolutions, and with bands selected by a feature selection algorithm at the highest spectral resolution (4.6 nm). The classification experiments were carried out with three different classifiers: Support Vector Machine, Gaussian Maximum Likelihood with Leave-One-Out-Covariance estimator, and Linear Discriminant Analysis. From the experimental results, important conclusions can be drawn about the choice of the spectral resolution of hyperspectral sensors as applied to forest areas, also in relation to the complexity of the adopted classification methodology. The outcomes of these experiments are also applicable in directing the user towards a more efficient use of the current instruments (e.g. programming of the spectral channels to be acquired) and classification techniques in forest applications, as well as in the design of future hyperspectral sensors.

14.
We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built. The induced classifiers are then used for voting. This article compares the performance of our AB method with bagging and other algorithms on a hand-pose recognition dataset. It is shown that AB gives consistently better results than bagging, both in accuracy and stability. The performance of ensemble voting in bagging and the AB method as a function of the attribute subset size and the number of voters for both weighted and unweighted voting is tested and discussed. We also demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the resulting performance of the ensemble.

15.
In the areas of investment research and applications, feasible quantitative models include methodologies stemming from soft computing for prediction of financial time series, multi-objective optimization of investment return and risk reduction, as well as selection of investment instruments for portfolio management based on asset ranking using a variety of input variables and historical data, etc. Among all these, stock selection has long been identified as a challenging and important task. This line of research is highly contingent upon reliable stock ranking for successful portfolio construction. Recent advances in machine learning and data mining are leading to significant opportunities to solve these problems more effectively. In this study, we aim at developing a methodology for effective stock selection using support vector regression (SVR) as well as genetic algorithms (GAs). We first employ the SVR method to generate surrogates for actual stock returns that in turn serve to provide reliable rankings of stocks. Top-ranked stocks can thus be selected to form a portfolio. On top of this model, the GA is employed for the optimization of model parameters, and feature selection to acquire optimal subsets of input variables to the SVR model. We will show that the investment returns provided by our proposed methodology significantly outperform the benchmark. Based upon these promising results, we expect this hybrid GA-SVR methodology to advance the research in soft computing for finance and provide an effective solution to stock selection in practice.

16.
Naive Bayes is one of the most widely used algorithms in classification problems because of its simplicity, effectiveness, and robustness. It is suitable for many learning scenarios, such as image classification, fraud detection, web mining, and text classification. Naive Bayes is a probabilistic approach based on assumptions that features are independent of each other and that their weights are equally important. However, in practice, features may be interrelated. In that case, such assumptions may cause a dramatic decrease in performance. In this study, by following preprocessing steps, a Feature Dependent Naive Bayes (FDNB) classification method is proposed. Features are included for calculation as pairs to create dependence between one another. This method was applied to the software defect prediction problem and experiments were carried out using widely recognized NASA PROMISE data sets. The obtained results show that this new method is more successful than the standard Naive Bayes approach and that it has a competitive performance with other feature-weighting techniques. A further aim of this study is to demonstrate that to be reliable, a learning model must be constructed by using only training data, as otherwise misleading results arise from the use of the entire data set.

17.
This article presents an intelligent stock trading system that can generate timely stock trading suggestions according to the prediction of short-term trends of price movement using dual-module neural networks (dual net). Retrospective technical indicators extracted from raw price and volume time series data gathered from the market are used as independent variables for neural modeling. Both neural network modules of the dual net learn the correlation between the trends of price movement and the retrospective technical indicators by use of a modified back-propagation learning algorithm. To reinforce the temporal correlation between the neural weights and the training patterns, the two neural network modules are trained on a short-term and a long-term moving window of training patterns, respectively. An adaptive reversal recognition mechanism that can self-tune thresholds for identifying the timing for buying or selling stocks has also been developed in our system. It is shown that the proposed dual net architecture generalizes better than a single-module neural network. Given the acceptable rate of return and the consistent quality of trading suggestions shown in the performance evaluation, an intelligent stock trading system with price trend prediction and reversal recognition can be realized using the proposed dual-module neural networks.

18.
Incremental construction of classifier and discriminant ensembles   (total citations: 2; self-citations: 0; citations by others: 2)
We discuss approaches to incrementally construct an ensemble. The first constructs an ensemble of classifiers choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We see that an incremental ensemble has higher accuracy than bagging and random subspace method; and it has a comparable accuracy to AdaBoost, but fewer classifiers.

19.
Contemporary biological technologies produce extremely high-dimensional data sets from which to design classifiers, with 20,000 or more potential features being commonplace. In addition, sample sizes tend to be small. In such settings, feature selection is an inevitable part of classifier design. Heretofore, there have been a number of comparative studies for feature selection, but they have either considered settings with much smaller dimensionality than those occurring in current bioinformatics applications or constrained their study to a few real data sets. This study compares some basic feature-selection methods in settings involving thousands of features, using both model-based synthetic data and real data. It defines distribution models involving different numbers of markers (useful features) versus non-markers (useless features) and different kinds of relations among the features. Under this framework, it evaluates the performances of feature-selection algorithms for different distribution models and classifiers. Both classification error and the number of discovered markers are computed. Although the results clearly show that none of the considered feature-selection methods performs best across all scenarios, there are some general trends relative to sample size and relations among the features. For instance, the classifier-independent univariate filter methods have similar trends. Filter methods such as the t-test perform better than or similarly to wrapper methods for harder problems. This improved performance is usually accompanied by significant peaking. Wrapper methods have better performance when the sample size is sufficiently large. ReliefF, the classifier-independent multivariate filter method, has worse performance than univariate filter methods in most cases; however, ReliefF-based wrapper methods show performance similar to their t-test-based counterparts.

20.
The monitoring of the expression profiles of thousands of genes has proved to be particularly promising for biological classification. DNA microarray data have recently been used for the development of classification rules, particularly for cancer diagnosis. However, microarray data present major challenges due to their complex, multiclass nature and the overwhelming number of variables characterizing gene expression profiles. A regularized form of the sliced inverse regression approach (REGSIR) is proposed. It allows the simultaneous development of classification rules and the selection of those genes that are most important in terms of classification accuracy. The method is illustrated on some publicly available microarray data sets. Furthermore, an extensive comparison with other classification methods is reported. The REGSIR performance is comparable with the best classification methods available, and when appropriate feature selection is made the performance can be considerably improved.
