Similar Documents
1.
Credit scoring has become a critical and challenging management science issue as the credit industry has faced fiercer competition in recent years. Many methods have been suggested in the literature to tackle this problem. In this paper, we propose a hybrid support vector machine technique based on three strategies: (1) using CART to select input features, (2) using MARS to select input features, and (3) using grid search to optimize model parameters. To verify the feasibility and effectiveness of the proposed hybrid SVM model, a credit card dataset provided by a local bank in China is used in this study. Analytic results demonstrate that the hybrid SVM technique not only has the best classification rate but also has the lowest Type II error in comparison with CART, MARS, and SVM, and they support the presumption that SVM has a better capability of capturing nonlinear relationships among variables.
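Strategy (3), grid search over the SVM parameters, can be sketched as an exhaustive search over a base-2 grid of the RBF parameters C and gamma. This is a minimal sketch under assumptions: the exponent ranges are a common convention, and `toy_cv_accuracy` is a hypothetical stand-in for the cross-validated accuracy of an SVM trained on the CART- or MARS-selected features.

```python
import math
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search(
    evaluate: Callable[[float, float], float],
    c_exponents: Iterable[int],
    gamma_exponents: Iterable[int],
) -> Tuple[float, float, float]:
    """Exhaustively try C = 2**i, gamma = 2**j and keep the best-scoring pair."""
    best = (0.0, 0.0, float("-inf"))
    for ce, ge in product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** ce, 2.0 ** ge
        score = evaluate(c, gamma)  # e.g. k-fold CV accuracy on the selected features
        if score > best[2]:
            best = (c, gamma, score)
    return best


def toy_cv_accuracy(c: float, gamma: float) -> float:
    """Hypothetical stand-in for an RBF-SVM's CV accuracy; peaks at C = 8, gamma = 2**-5."""
    return 1.0 - 0.01 * ((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


best_c, best_gamma, best_score = grid_search(
    toy_cv_accuracy, range(-5, 16, 2), range(-15, 4, 2)
)
```

A coarse grid like this is usually followed by a finer grid around the best cell; the cost grows with the product of the two grid sizes, which is why feature selection beforehand helps.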

2.
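To address the high dimensionality, large volume, and many redundant features of bank customer data, a comprehensive feature selection method inspired by multimodal fusion is proposed; it screens out redundant features by computing and comparing the comprehensive contribution of each feature in the dataset. Based on the characteristics of real bank customer data, a three-part data preprocessing scheme (type conversion and discretization, missing-value imputation, and standardization) is presented and applied to real bank customer data. Pearson correlation coefficients, random forest, quantified prior knowledge, and the proposed multimodal comprehensive feature selection method are then used to screen redundant features from the preprocessed dataset, extracting 14, 8, 15, and 11 features, respectively. Based on the experimental results, the four feature selection methods are compared thoroughly at both qualitative and quantitative levels. The results show that the proposed method effectively compensates for the shortcomings of the individual feature selection methods, reduces data dimensionality, and thereby improves the performance of bank customer classification models.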
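One way to picture a "comprehensive contribution" is to put each method's feature scores on a common scale and average them. The abstract does not give the fusion rule, so the min-max normalisation, equal weights, the feature names, and the random-forest and prior-knowledge scores below are all hypothetical; only the Pearson view is computed from (toy) data.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one method's scores to [0, 1] so different methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {f: (s - lo) / span if span > 0 else 1.0 for f, s in scores.items()}


def combined_contribution(views: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Weighted average of the normalised per-method scores (assumed fusion rule)."""
    normed = [min_max(v) for v in views]
    total = sum(weights)
    return {f: sum(w * v[f] for w, v in zip(weights, normed)) / total for f in normed[0]}


def select_top(scores: Dict[str, float], k: int) -> List[str]:
    return sorted(scores, key=scores.get, reverse=True)[:k]


y = [0, 0, 1, 1]
columns = {"income": [1, 2, 3, 4], "age": [3, 1, 4, 2], "zip_code": [5, 5, 5, 5]}
pearson_view = {f: abs(pearson(col, y)) for f, col in columns.items()}
rf_view = {"income": 0.5, "age": 0.3, "zip_code": 0.2}     # hypothetical RF importances
prior_view = {"income": 0.9, "age": 0.6, "zip_code": 0.1}  # hypothetical prior-knowledge scores
scores = combined_contribution([pearson_view, rf_view, prior_view], [1.0, 1.0, 1.0])
chosen = select_top(scores, 2)
```

The constant `zip_code` column gets zero weight from every view and is screened out, which is the kind of redundancy the method targets.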

3.
In this paper, we develop a diagnosis model based on particle swarm optimization (PSO), support vector machines (SVMs), and association rules (ARs) to diagnose erythemato-squamous diseases. The proposed model consists of two stages: first, AR is used to select the optimal feature subset from the original feature set; then, a PSO-based approach for SVM parameter determination is developed to find the best parameters of the kernel function (because kernel parameter settings in SVM training significantly influence classification accuracy, and PSO is a promising tool for global search). Experimental results show that the proposed AR_PSO–SVM model achieves 98.91% classification accuracy using 24 features of the erythemato-squamous diseases dataset from the UCI (University of California, Irvine) machine learning database. We therefore conclude that the proposed method is very promising compared with previously reported results.
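The PSO stage can be sketched as a generic global-best minimiser over (log2 C, log2 gamma). This is a minimal sketch: the inertia weight and acceleration constants are common textbook values rather than the paper's settings, and `toy_cv_error` is a hypothetical quadratic stand-in for the SVM's cross-validation error on the AR-selected features.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20, iters: int = 100,
    w: float = 0.7, c1: float = 1.5, c2: float = 1.5, seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO; positions are clipped to the box given by bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull towards own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull towards swarm best
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(p: Sequence[float]) -> float:
    """Hypothetical stand-in for SVM CV error over (log2 C, log2 gamma)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 5.0) ** 2


best, err = pso_minimize(toy_cv_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
```

Searching in log2 space keeps the swarm's step sizes meaningful across the several orders of magnitude that C and gamma typically span.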

4.
The challenges of classifying large-scale, high-dimensional datasets are: (1) a huge computational burden in both the training and classification phases; (2) large storage requirements for the many training samples; and (3) the difficulty of determining decision rules in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily leads to overfitting, especially when the data are not evenly distributed. The profile support vector machine (PSVM) was recently proposed to solve this problem: because local learning is superior to global learning, multiple linear SVM models are trained to achieve performance similar to that of a nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that reduces both training time and classification time. We first choose border samples near the decision boundary from the training samples. The reduced training samples are then clustered into several local subsets with the MagKmeans algorithm, for which we also propose a fast search method to find the optimal solution. The clusters are then used to learn multiple linear SVM models. Both artificial and real datasets are used to evaluate the proposed method. In the experiments, the proposed method avoids both overfitting and underfitting, and the proposed strategy proves effective and efficient.
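The first step, keeping only border samples near the decision boundary, can be sketched with a simple k-nearest-neighbour test: a sample is kept if any of its k nearest neighbours carries a different label. The abstract does not say how border samples are chosen, so this criterion and the function name are assumptions; the MagKmeans clustering and per-cluster linear SVMs are not shown.

```python
import math
from typing import List, Sequence


def border_samples(X: List[Sequence[float]], y: Sequence[int], k: int = 3) -> List[int]:
    """Indices of samples with a differently-labelled point among their k nearest neighbours."""
    keep = []
    for i, xi in enumerate(X):
        neighbours = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        if any(y[j] != y[i] for _, j in neighbours[:k]):
            keep.append(i)
    return keep


X = [(float(v),) for v in range(8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
reduced = border_samples(X, y, k=2)
```

On this toy line only the two points flanking the class boundary survive. The brute-force neighbour search is O(n²), so a real implementation on large data would need a spatial index.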

5.
Tropical cyclones (TCs) are often associated with severe weather conditions that cause great losses of life and property. The precise classification of cyclone tracks is therefore important in weather forecasting. In this paper, we propose a novel hybrid model that integrates an ontology and a support vector machine (SVM) to classify tropical cyclone tracks into four classes based on track shape: straight, quasi-straight, curving, and sinuous. The Tropical Cyclone TRacks Ontology (TCTRO) described in this paper is a knowledge base comprising classes, objects, and data properties that represent the interactions among TC characteristics. A set of SWRL (Semantic Web Rule Language) rules is inserted directly into the TCTRO ontology for reasoning and inferring new knowledge. Furthermore, we propose a learning algorithm that uses the inferred knowledge to optimize the feature subset. In experiments on the IBTrACS dataset, the proposed ontology-based SVM classifier achieves an accuracy of 98.3% with reduced classification error rates.
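The ontology and SWRL reasoning cannot be reconstructed from the abstract, but the kind of numeric shape descriptor an SVM could consume for this task can be illustrated. The sketch computes a sinuosity index (along-track path length divided by the straight-line distance between the first and last fixes); this feature, the function name, and the planar coordinates are assumptions, not the paper's feature set. Real IBTrACS positions are latitude/longitude and would need great-circle distances.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def sinuosity(track: Sequence[Point]) -> float:
    """Path length / straight-line distance between first and last fix (always >= 1)."""
    path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    chord = math.dist(track[0], track[-1])
    return path / chord if chord > 0 else math.inf


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
curving = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
features = [sinuosity(straight), sinuosity(curving)]
```

A perfectly straight track scores 1.0 and the value grows as the track bends, so it separates the straight end of the class spectrum from the sinuous end.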

6.
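Research on a Tumor Classification Algorithm Based on Principal Component Analysis   Total citations: 1; self-citations: 0; citations by others: 1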
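Tumor diagnosis based on gene expression profiles promises to become a fast and effective clinical diagnostic method. However, gene expression data have very high dimensionality, small sample sizes, and heavy noise, which makes extracting informative tumor-related genes challenging. Therefore, after analyzing the methods currently used for tumor classification, this paper proposes a hybrid feature extraction method that combines gene feature scoring with principal component analysis. Experiments show that the method effectively extracts discriminative features and greatly reduces the dimensionality of gene expression data while maintaining high tumor recognition accuracy, considerably improving classifier performance. Two tumor-related gene expression datasets were used to verify the effectiveness of the hybrid feature extraction method. Classification experiments with support vector machines show that the hybrid method not only achieves high cross-validation accuracy but also produces classification results that can be visualized. The cross-validation accuracy reaches 95.16% on the colon cancer tissue dataset and 100% on the acute leukemia tissue dataset.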
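The two-step pipeline (score genes, then compress the selected genes with PCA) can be sketched as follows. The abstract does not name its scoring function, so the signal-to-noise score |μ0 − μ1| / (σ0 + σ1) is an assumption; the leading principal component is found by power iteration on the sample covariance, and the 4 × 3 expression matrix is toy data.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def snr_scores(X: Matrix, y: Sequence[int]) -> List[float]:
    """Signal-to-noise score |mu0 - mu1| / (sd0 + sd1) for every gene (column)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def first_principal_component(X: Matrix, iters: int = 200) -> Tuple[List[float], List[float]]:
    """Column means and leading eigenvector of the sample covariance (power iteration)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Z = [[row[j] - mu[j] for j in range(d)] for row in X]
    cov = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


def project(X: Matrix, mu: Sequence[float], v: Sequence[float]) -> List[float]:
    return [sum((row[j] - mu[j]) * v[j] for j in range(len(v))) for row in X]


X = [[1.0, 5.0, 0.0], [1.2, 4.0, 1.0], [3.0, 5.0, 0.2], [3.2, 4.0, 0.8]]
y = [0, 0, 1, 1]
genes = top_k(snr_scores(X, y), 2)
X_sel = [[row[g] for g in genes] for row in X]
mu, pc1 = first_principal_component(X_sel)
scores_1d = project(X_sel, mu, pc1)
```

Projecting onto one or two components is also what makes the classification results easy to visualize as a scatter plot.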

7.
In the stock market, technical analysis is a useful method for predicting stock prices. Although professional stock analysts and fund managers usually make subjective judgments based on objective technical indicators, it is difficult for non-professionals to apply this forecasting technique because there are too many complex technical indicators to consider. Moreover, two drawbacks have been found in many past forecasting models: (1) time series models such as the autoregressive moving average (ARMA) model and the autoregressive conditional heteroscedasticity (ARCH) model require statistical assumptions about the variables to produce forecasting models expressed as mathematical equations, and these are not easily understood by stock investors; and (2) the rules mined by some artificial intelligence (AI) algorithms, such as neural networks (NN), are not easily interpreted. To overcome these drawbacks, this paper proposes a hybrid forecasting model that uses multiple technical indicators to predict stock price trends. The hybrid model includes four procedures that provide efficient forecasting rules, evolved from high-support rules extracted with a toolset based on rough set theory (RST): (1) select the essential technical indicators, those highly related to the future stock price, from popular indicators based on a correlation matrix; (2) use the cumulative probability distribution approach (CDPA) and the minimize entropy principle approach (MEPA) to partition technical indicator values and daily price fluctuations into linguistic values based on the characteristics of the data distribution; (3) employ an RST algorithm to extract linguistic rules from the linguistic technical indicator dataset; and (4) use genetic algorithms (GAs) to refine the extracted rules for better forecasting accuracy and stock return.
The effectiveness of the proposed model is verified with two types of performance evaluation, accuracy and stock return, using a six-year period of the TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) as the experimental dataset. The experimental results show that the proposed model is superior to the two listed forecasting models (RST and GAs) in terms of accuracy, and the stock return evaluations reveal that the profits produced by the proposed model are higher than those of the three listed models (Buy-and-Hold, RST, and GAs).
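Procedure (2), turning numeric indicator values into linguistic values, can be illustrated by cutting the data at equal steps of its empirical cumulative distribution. The exact CDPA rules are not given in the abstract, so the quantile-based cut points, the three labels, and the indicator readings are assumptions; MEPA and the RST rule extraction are not shown.

```python
from bisect import bisect_right
from typing import List, Sequence


def cumulative_cuts(values: Sequence[float], n_terms: int) -> List[float]:
    """Cut points at equal steps of the empirical cumulative distribution."""
    s = sorted(values)
    n = len(s)
    return [s[(n * k) // n_terms] for k in range(1, n_terms)]


def to_linguistic(value: float, cuts: Sequence[float], labels: Sequence[str]) -> str:
    """Map a numeric value to the label of the interval it falls in."""
    return labels[bisect_right(cuts, value)]


indicator_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical indicator readings
labels = ["low", "medium", "high"]
cuts = cumulative_cuts(indicator_values, len(labels))
terms = [to_linguistic(v, cuts, labels) for v in indicator_values]
```

Because the cuts follow the data's own distribution, each linguistic term covers roughly the same number of trading days, which keeps rule supports comparable across terms.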

8.
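To improve the credit risk assessment of data-mining-based credit management systems in commercial banks, a multiple-decision-tree Choquet fuzzy integral fusion (MTCFF) model is applied to the bank credit management system. The basic idea is to mine customer data of known classes with decision trees, producing different trees and rules by applying different degrees of pruning; the rules of these trees are then used to classify customer data of unknown class, and the Choquet fuzzy integral fuses the classification results of the multiple trees into an optimal decision. Validation on the German customer credit card dataset from the UCI repository shows that the nonlinear fusion by the Choquet fuzzy integral outperforms any single decision tree as well as other linear fusion methods, and that the Choquet fuzzy integral also outperforms the Sugeno fuzzy integral.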
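The fusion step can be sketched with a discrete Choquet integral. The abstract does not say which fuzzy measure is used, so the Sugeno λ-measure below (built from per-tree "densities" that express how much each tree is trusted) is an assumption, as are all names and numbers. Scores are assumed to lie in [0, 1].

```python
from typing import Sequence


def sugeno_lambda(densities: Sequence[float], iters: int = 200) -> float:
    """Solve prod(1 + lam * g_i) = 1 + lam (lam > -1) for at least two densities."""
    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0  # densities are already additive

    def f(lam: float) -> float:
        p = 1.0
        for g in densities:
            p *= 1.0 + lam * g
        return p - (1.0 + lam)

    if s < 1.0:  # the root is positive
        lo, hi = 1e-12, 1.0
        while f(hi) < 0:
            hi *= 2.0
    else:  # the root lies in (-1, 0)
        lo, hi = -1.0 + 1e-12, -1e-12
    for _ in range(iters):  # bisection on the sign change
        mid = (lo + hi) / 2
        if (f(mid) < 0) == (f(lo) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def choquet(scores: Sequence[float], densities: Sequence[float]) -> float:
    """Discrete Choquet integral of scores w.r.t. the Sugeno lambda-measure."""
    lam = sugeno_lambda(densities)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total, g_prev = 0.0, 0.0
    for rank, i in enumerate(order):
        # measure of the set of the (rank + 1) highest-scoring trees
        g_curr = densities[i] + g_prev + lam * densities[i] * g_prev
        h_next = scores[order[rank + 1]] if rank + 1 < len(order) else 0.0
        total += (scores[i] - h_next) * g_curr
        g_prev = g_curr
    return total


tree_support = [0.9, 0.6, 0.2]  # hypothetical support of three pruned trees for "good credit"
fused = choquet(tree_support, densities=[0.3, 0.3, 0.3])
```

In a classifier, the fused support would be computed for every class and the class with the largest value chosen. Unlike a weighted average, the λ-measure lets a coalition of trees count for more (or less) than the sum of its members, which is the nonlinearity the abstract credits.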

9.
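A knowledge graph (KG) is one of the key technologies for building domain question answering (QA) systems: it can reduce customer service costs and make customer self-service more intelligent, so it has considerable commercial value and research significance. To address the ambiguous phrasing of Chinese questions and the high operation and maintenance cost of online services in KG-based QA systems, a power grid customer service QA system that fuses domain-feature knowledge graphs (HDKG-QA) is proposed. It recognizes entities and predicates with an LSTM model, accurately locates external knowledge with a topic-comparison-based semantic enhancement method, optimizes the candidate answer set with heuristic rules, and periodically sets the update strategy of the global KG with an ILP solver. HDKG-QA achieves high entity/predicate recognition accuracy, automatically maps domain knowledge into a local KG, quickly updates the service knowledge base online, and answers with high accuracy at low response latency. The system was validated on a real customer service QA dataset from the Information and Communication Branch of State Grid Chongqing Electric Power Company. The results show that introducing the LSTM and the semantic enhancement method improved QA accuracy by 17%, the heuristic-rule-based answer ranking strategy improved accuracy by 8%, and introducing the ILP solver reduced response latency by 9% at the same accuracy.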
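The answer-ranking step can be pictured as: look up all KG triples for the recognised entity, then order them by a heuristic match between the question and each predicate. Everything here is hypothetical (the triples, the token-overlap heuristic, and the assumption that entity recognition has already produced `electricity_bill`); the LSTM, semantic enhancement, and ILP update strategy are not shown.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def rank_candidates(question: str, entity: str, kg: List[Triple]) -> List[Triple]:
    """Return the entity's triples, best-matching predicate first (token-overlap heuristic)."""
    q_tokens = set(question.lower().split())

    def overlap(t: Triple) -> float:
        p_tokens = set(t[1].lower().replace("_", " ").split())
        return len(q_tokens & p_tokens) / len(p_tokens)

    candidates = [t for t in kg if t[0] == entity]
    return sorted(candidates, key=overlap, reverse=True)


kg = [
    ("electricity_bill", "payment_method", "online banking or a service hall"),
    ("electricity_bill", "due_date", "the 25th of each month"),
    ("smart_meter", "install_time", "within 5 working days"),
]
ranked = rank_candidates(
    "what is the payment method for my electricity bill", "electricity_bill", kg
)
```

For Chinese questions, whitespace tokenisation would be replaced by a word segmenter or character n-grams.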

10.
Extreme learning machines (ELMs) have gained popularity as a learning tool due to their unique characteristics and performance. However, the generalisation capability of an ELM often depends on the nature of the dataset, particularly on whether uncertainty is present in it. To reduce the effects of uncertainty on ELM predictions and improve generalisation, this paper proposes a hybrid system that combines type-2 fuzzy logic systems (type-2 FLS) and ELM; the hybrid system is then applied to model the permeability of carbonate reservoirs. Type-2 FLS was chosen as a precursor to ELM because it handles uncertainties in datasets beyond the capability of type-1 fuzzy logic systems. The type-2 FLS first handles the uncertainties in the reservoir data; its output is then passed to the ELM for training, and final predictions are made on the unseen test dataset. Comparative studies were carried out between the proposed T2-ELM hybrid system and each of its constituents (type-2 FLS and ELM), as well as artificial neural networks (ANN) and support vector machines (SVM), using five different industrial reservoir datasets. Empirical results show that the proposed T2-ELM hybrid outperformed both constituent models in all cases; the improvement over ELM was far greater than that over type-2 FLS, whose performance was closer to the hybrid's because it is already known for its ability to model uncertainties. The proposed hybrid also outperformed the ANN and SVM models considered.
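The ELM stage can be sketched in a few lines: hidden-layer weights are drawn at random and never trained, and only the output weights are fitted, here by ridge-regularised least squares. This is a minimal pure-Python sketch; the type-2 FLS preprocessing is omitted, the weight range and ridge value are assumptions, and the target y = x² is toy data standing in for permeability.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class ELM:
    """Single-hidden-layer ELM: random sigmoid hidden layer, least-squares output weights."""

    def __init__(self, n_hidden: int = 20, ridge: float = 1e-8, seed: int = 0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _hidden(self, x: Sequence[float]) -> Vector:
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.W, self.b)]

    def fit(self, X: List[Vector], y: Vector) -> "ELM":
        rng = random.Random(self.seed)
        d, L = len(X[0]), self.n_hidden
        # Hidden-layer parameters are drawn once at random and never updated.
        self.W = [[rng.uniform(-3, 3) for _ in range(d)] for _ in range(L)]
        self.b = [rng.uniform(-3, 3) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Normal equations: (H^T H + ridge * I) beta = H^T y
        HtH = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = solve(HtH, Hty)
        return self

    def predict(self, X: List[Vector]) -> Vector:
        return [sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x))) for x in X]


X = [[i / 20] for i in range(21)]
y = [x[0] ** 2 for x in X]
model = ELM().fit(X, y)
mse = sum((p - t) ** 2 for p, t in zip(model.predict(X), y)) / len(y)
```

Because training reduces to one linear solve, ELMs are fast to fit; the flip side is that their quality depends on the random hidden layer and on the data, which is where the paper's uncertainty handling comes in.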

11.
The style of a person's handwriting is a biometric feature that can be used for person authentication. In this paper, we propose a text-independent method for Persian writer identification. In the proposed method, pattern-based features are extracted from the data using Gabor and XGabor filters. For each person, the extracted features are represented by a graph called the FRG (feature relation graph), which is constructed from the relations between extracted features using a fuzzy method. The fuzzy method determines the similarity between features extracted from different handwritten samples of each person. In the identification phase, a graph similarity approach is used to measure the similarity between the FRG generated from the test data and the FRGs generated from the training data. The experimental results were satisfactory: the proposed method achieved about 100% accuracy on a dataset of 100 writers when enough training data were used. Although this method has been applied to Persian handwriting, we believe it can be extended to other languages, especially its data representation and classification parts.
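The Gabor stage can be illustrated by generating the real part of a standard 2-D Gabor kernel: a Gaussian envelope multiplied by a cosine carrier, both rotated to orientation theta. The kernel size, sigma, wavelength, and the four-orientation bank are hypothetical choices; the XGabor variant and the FRG construction are not shown.

```python
import math
from typing import List


def gabor_kernel(size: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2-D Gabor filter on an odd size x size grid centred at the origin."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


# A small bank at four orientations (hypothetical parameters).
bank = [gabor_kernel(7, sigma=2.0, theta=k * math.pi / 4, wavelength=4.0) for k in range(4)]
```

Convolving a handwriting image with each kernel in the bank gives orientation-specific responses; statistics of those responses are a typical text-independent texture descriptor.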

12.
Hybrid systems are potential tools for dealing with construction engineering and management problems. This study proposes an optimized hybrid artificial intelligence model that integrates a fast messy genetic algorithm (fmGA) with a support vector machine (SVM). The fmGA-based SVM (GASVM) is used for early prediction of dispute propensity in the initial phase of public–private partnership projects. The SVM mainly provides learning and curve fitting, while the fmGA optimizes the SVM parameters. Accuracy, precision, sensitivity, specificity, area under the curve, and a synthesis index are used to evaluate the performance of the proposed hybrid intelligent classification model. Experimental comparisons indicate that GASVM achieves better cross-fold prediction accuracy than other baseline models (i.e., CART, CHAID, QUEST, and C5.0) and previous works. The forecasting results provide the proactive-warning and decision-support information needed to manage potential disputes.
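The evaluation measures named above (except the synthesis index, which the abstract does not define) follow directly from the confusion matrix and from the classifier's scores. The counts and scores below are illustrative only.

```python
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix measures for a two-class problem."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall, true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
area = auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])
```

Reporting sensitivity and specificity alongside accuracy matters here because dispute cases are likely a minority class, where accuracy alone can look good while most disputes are missed.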

13.
Support vector machines (SVMs) are effective tools for financial distress identification (FDI). However, a potential issue that keeps SVM from being applied efficiently to FDI is how to select features. Although filters are commonly employed, this type of approach does not consider the predictive capability of the SVM itself when selecting features. This research is devoted to constructing a statistics-based wrapper for SVM-based FDI that uses statistical indices of ranking-order information derived from predictive performance over various parameters. The wrapper consists of four levels: the data level, the SVM-based model level, the feature ranking-order level, and the feature selection index level. Once the data are ready, the predictive accuracies of one type of SVM model, namely linear SVM (LSVM), polynomial SVM (PSVM), Gaussian SVM (GSVM), or sigmoid SVM (SSVM), are first calculated on various parameter pairs. The performances of the SVM models on each candidate feature are then transformed into ranking-order indices. Next, two statistical indices, the mean and the standard deviation, are calculated from the ranking-order information of each feature. Finally, the SVM feature selection indices are produced by combining these statistical indices. Each feature whose feature selection index is smaller than half of the average index is selected for the optimal feature set. Using a dataset collected for Chinese FDI three years prior to distress, we statistically verified the performance of this statistics-based wrapper against a non-statistics-based wrapper, two filters, and no feature selection for SVM-based FDI. Results on an unseen dataset indicate that GSVM with the statistics-based wrapper significantly outperformed the other SVM models with the other feature selection methods, as well as two wrapper-based classical statistical models.
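The wrapper's last three levels can be sketched as: rank the candidate features within each parameter pair, summarise each feature's ranks by mean and standard deviation, combine them into one index, and keep features whose index is below half the average. The abstract does not say how the two statistics are combined, so the sum `mean + std` is an assumption, as are the names and the toy accuracy matrix `acc[p][f]` (accuracy of the SVM for parameter pair p when using candidate feature f).

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def rank_row(row: Sequence[float]) -> List[int]:
    """Rank features within one parameter pair: 1 = highest accuracy."""
    order = sorted(range(len(row)), key=lambda f: row[f], reverse=True)
    ranks = [0] * len(row)
    for r, f in enumerate(order, start=1):
        ranks[f] = r
    return ranks


def statistics_wrapper(acc: List[List[float]]) -> Tuple[List[int], List[float]]:
    """acc[p][f]: SVM accuracy for parameter pair p and candidate feature f."""
    rank_rows = [rank_row(row) for row in acc]
    index = []
    for f in range(len(acc[0])):
        ranks = [rr[f] for rr in rank_rows]
        index.append(mean(ranks) + pstdev(ranks))  # assumed combination of the two statistics
    threshold = 0.5 * mean(index)  # "smaller than half of the average index"
    return [f for f, v in enumerate(index) if v < threshold], index


acc = [
    [0.90, 0.70, 0.60, 0.50],
    [0.88, 0.72, 0.55, 0.52],
    [0.91, 0.69, 0.58, 0.51],
]
selected, index = statistics_wrapper(acc)
```

Adding the standard deviation penalises features whose rank is good only for a few parameter pairs, so the index rewards features that help the SVM consistently.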

14.
Recognition of human actions is an important task in many applications, such as human-computer interaction, content-based video retrieval and indexing, intelligent video surveillance, gesture recognition, and robot learning and control. An efficient action recognition system is presented that extracts features with the Difference Intensity Distance Group Pattern (DIDGP) method and recognizes actions with a Support Vector Machine (SVM) classifier. First, a Region of Interest (ROI), which represents the motion information, is extracted from the difference frame. The extracted ROI is divided into two blocks, B1 and B2. The proposed DIDGP feature is applied to the maximum-intensity block of the ROI to discriminate each action in the video sequences. The feature vectors obtained from DIDGP are classified using SVMs with polynomial and RBF kernels. The proposed method was evaluated on the KTH action dataset, which consists of actions such as walking, running, jogging, hand waving, clapping, and boxing, and achieved an overall accuracy of 94.67% with the RBF kernel.
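The preprocessing before DIDGP (difference frame, ROI, split into B1 and B2, pick the maximum-intensity block) can be sketched on tiny grey-level frames. The threshold, the bounding-box ROI, and splitting into upper and lower halves are assumptions, since the abstract does not say how the ROI is divided; the DIDGP feature itself is not shown.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def difference_frame(prev: Frame, curr: Frame) -> Frame:
    """Absolute per-pixel difference between consecutive grey-level frames."""
    return [[abs(c - p) for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]


def roi_bbox(diff: Frame, thresh: int) -> Optional[Tuple[int, int, int, int]]:
    """End-exclusive box (row0, row1, col0, col1) around pixels whose change exceeds thresh."""
    rows = [r for r, row in enumerate(diff) if any(v > thresh for v in row)]
    cols = [c for c in range(len(diff[0])) if any(row[c] > thresh for row in diff)]
    if not rows:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def max_intensity_block(diff: Frame, bbox: Tuple[int, int, int, int]) -> Tuple[str, Frame]:
    """Split the ROI into upper block B1 and lower block B2; return the more active one."""
    r0, r1, c0, c1 = bbox
    roi = [row[c0:c1] for row in diff[r0:r1]]
    mid = len(roi) // 2
    b1, b2 = roi[:mid], roi[mid:]
    return ("B1", b1) if sum(map(sum, b1)) >= sum(map(sum, b2)) else ("B2", b2)


prev = [[0] * 4 for _ in range(4)]
curr = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 2, 2, 0], [0, 0, 0, 0]]
diff = difference_frame(prev, curr)
box = roi_bbox(diff, thresh=1)
name, block = max_intensity_block(diff, box)
```

Restricting the descriptor to the more active block concentrates the features on the moving body part (for example the arms in hand waving).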

15.
This study proposed a novel PSO–SVM model that hybridizes particle swarm optimization (PSO) and support vector machines (SVM) to improve classification accuracy with a small and appropriate feature subset. The optimization mechanism combines discrete PSO with continuous-valued PSO to simultaneously optimize the input feature subset and the SVM kernel parameter setting. The hybrid PSO–SVM data mining system was implemented on a distributed architecture using web service technology to reduce computational time. In a heterogeneous computing environment, the PSO optimization was performed on the application server and the SVM model was trained on the client (agent) computer. The experimental results showed that the proposed approach can correctly select the discriminating input features and also achieve high classification accuracy.
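The discrete half of the mechanism can be sketched with the classic binary PSO step: velocities are updated as in continuous PSO, then each feature bit is resampled to 1 with probability sigmoid(v). The constants, the velocity clamp, and the function name are assumptions; the continuous kernel-parameter dimension would use the ordinary position update instead.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def update_feature_bits(
    bits: Sequence[int], vel: Sequence[float],
    pbest: Sequence[int], gbest: Sequence[int], rng: random.Random,
    w: float = 0.7, c1: float = 1.5, c2: float = 1.5, vmax: float = 4.0,
) -> Tuple[List[int], List[float]]:
    """One binary-PSO step over a feature-selection mask."""
    new_bits, new_vel = [], []
    for b, v, pb, gb in zip(bits, vel, pbest, gbest):
        v = w * v + c1 * rng.random() * (pb - b) + c2 * rng.random() * (gb - b)
        v = max(-vmax, min(vmax, v))  # clamp so that bits can still flip later
        new_vel.append(v)
        new_bits.append(1 if rng.random() < sigmoid(v) else 0)
    return new_bits, new_vel


rng = random.Random(0)
mask, velocity = update_feature_bits([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 1, 0], rng)
```

The clamp keeps sigmoid(v) away from exactly 0 or 1, so even a converged swarm keeps a small chance of toggling a feature.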

16.
Using SVM to Extract Acronyms from Text   Total citations: 1; self-citations: 0; citations by others: 1
This paper addresses the problem of extracting acronyms and their expansions from text. We propose a support vector machine (SVM) based approach to the problem. First, all likely acronyms are identified using heuristic rules. Second, expansion candidates are generated from the text surrounding the acronyms. Finally, an SVM model is employed to select the genuine expansions. Analysis shows that the proposed approach offers savings over conventional rule-based approaches. Experimental results show that our approach outperforms the rule-based baseline. We also show that the trained SVM model is generic and can easily be adapted to other domains.
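The first two steps, spotting likely acronyms with heuristic rules and generating expansion candidates from nearby text, can be sketched with regular expressions. The specific rules (2 to 10 capitals or digits with an optional plural "s"; word windows preceding the acronym whose first word shares its initial) are assumptions, and the SVM that picks the genuine expansion is not shown.

```python
import re
from typing import List, Tuple

ACRONYM = re.compile(r"\b[A-Z][A-Z0-9]{1,9}s?\b")  # assumed rule for "likely acronym"
WORD = re.compile(r"[A-Za-z][\w-]*")


def find_acronyms(text: str) -> List[Tuple[int, str]]:
    """(start offset, acronym) for every token that looks like an acronym."""
    return [(m.start(), m.group()) for m in ACRONYM.finditer(text)]


def expansion_candidates(text: str, start: int, acronym: str, max_extra: int = 2) -> List[str]:
    """Word windows just before the acronym whose first word shares its initial."""
    letters = acronym[:-1] if acronym.endswith("s") else acronym
    words = WORD.findall(text[:start])
    candidates = []
    for size in range(len(letters), len(letters) + max_extra + 1):
        if size <= len(words):
            window = words[-size:]
            if window[0][0].lower() == letters[0].lower():
                candidates.append(" ".join(window))
    return candidates


text = "We propose a support vector machines (SVM) based approach."
acronyms = find_acronyms(text)
start, acr = acronyms[0]
cands = expansion_candidates(text, start, acr)
```

The rules are deliberately loose: they only need high recall, because the SVM later filters the candidates using richer features.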

17.
In this work, a novel technique for building ensembles of classifiers for spectrogram classification is presented. We propose a simple approach for classifying signals from a large database of plant echoes. These echoes are highly complex stochastic signals; nevertheless, their spectrograms contain enough information to extract a good set of features for training the proposed ensemble of classifiers. The proposed ensemble is a novel modified version of a recent feature-transform-based ensemble method, the Input Decimated Ensemble. In the proposed variant, different subsets of randomly extracted training patterns are used to create a set of different Neighborhood Preserving Embedding subspace projections. These feature transformations are applied to the whole dataset, and a set of decision trees is trained on the transformed spaces. Finally, the scores of this set of classifiers are combined by the sum rule. Experiments carried out on a previously proposed dataset show the superiority of this method over other approaches: on the tested dataset, it outperforms the previously proposed combination of principal component analysis and support vector machine (SVM). Moreover, we show that fusing the proposed ensemble with the SVM-based system outperforms both stand-alone methods.
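The final combination step, the sum rule, adds the per-class scores of all ensemble members and predicts the class with the largest total. In this sketch the scores are hypothetical and assumed to be on comparable scales; the NPE projections and decision trees that would produce them are not shown.

```python
from typing import List, Tuple

Scores = List[List[float]]  # scores[i][c]: score of sample i for class c


def sum_rule(score_matrices: List[Scores]) -> Tuple[Scores, List[int]]:
    """Add the per-class scores of every classifier and pick the top class per sample."""
    n_samples, n_classes = len(score_matrices[0]), len(score_matrices[0][0])
    fused = [[sum(m[i][c] for m in score_matrices) for c in range(n_classes)]
             for i in range(n_samples)]
    labels = [max(range(n_classes), key=row.__getitem__) for row in fused]
    return fused, labels


tree_a = [[0.9, 0.1], [0.3, 0.7]]  # hypothetical scores of one ensemble member
tree_b = [[0.3, 0.7], [0.4, 0.6]]  # hypothetical scores of another member
fused, labels = sum_rule([tree_a, tree_b])
```

For the first sample the two members disagree, and the more confident one wins; this averaging of confidences is why the sum rule is robust to individually noisy members.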

18.
Credit scoring with a data mining approach based on support vector machines   Total citations: 3; self-citations: 0; citations by others: 3
The credit card industry has been growing rapidly, and thus huge amounts of consumer credit data are collected by banks' credit departments. Credit scoring managers often evaluate a consumer's credit based on intuitive experience; with the support of a credit classification model, however, a manager can evaluate an applicant's credit score accurately. Support vector machine (SVM) classification is currently an active research area and has successfully solved classification problems in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate an applicant's credit score from the applicant's input features. Two credit datasets from the UCI database were selected as experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can simultaneously perform feature selection and model parameter optimization. Experimental results show that SVM is a promising addition to the existing data mining methods.
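Simultaneous feature selection and parameter optimization in GA-SVM is usually achieved by putting both into one chromosome. The sketch decodes such a chromosome, assuming (not taken from the paper) one bit per feature followed by two fixed-width binary fields mapped linearly onto log2 C and log2 gamma ranges; the GA operators and the SVM fitness evaluation are not shown.

```python
from typing import List, Sequence, Tuple


def bits_to_int(bits: Sequence[int]) -> int:
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def decode(chromosome: Sequence[int], n_features: int, n_bits: int = 10,
           log2_c_range: Tuple[float, float] = (-5, 15),
           log2_gamma_range: Tuple[float, float] = (-15, 3)) -> Tuple[List[int], float, float]:
    """Split a GA chromosome into (selected feature indices, C, gamma)."""
    mask = chromosome[:n_features]
    c_bits = chromosome[n_features:n_features + n_bits]
    g_bits = chromosome[n_features + n_bits:n_features + 2 * n_bits]
    scale = (1 << n_bits) - 1

    def to_exponent(bits: Sequence[int], lo: float, hi: float) -> float:
        return lo + (hi - lo) * bits_to_int(bits) / scale  # linear map onto [lo, hi]

    selected = [i for i, bit in enumerate(mask) if bit]
    c = 2.0 ** to_exponent(c_bits, *log2_c_range)
    gamma = 2.0 ** to_exponent(g_bits, *log2_gamma_range)
    return selected, c, gamma


chrom = [1, 0, 1, 1] + [0, 0, 0, 0] + [1, 1, 1, 1]
features, c, gamma = decode(chrom, n_features=4, n_bits=4)
```

Each decoded triple would then be scored, for example by cross-validated accuracy of an SVM trained on the selected features with that C and gamma.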

19.

Credit scoring is the process of calculating the risk associated with an applicant on the basis of the applicant's credentials, such as social status and financial status, and it plays a vital role in improving cash flow for the financial industry. However, credit scoring datasets may contain many irrelevant or redundant features, which lead to poorer classification performance and higher complexity; removing such features can therefore mitigate the problem. This work emphasizes the role of feature selection and proposes a hybrid model that combines feature selection by the Binary BAT optimization technique, using a novel fitness function, with a Radial Basis Function Neural Network (RBFN) for credit score classification. Further, the proposed feature selection approach is combined with Support Vector Machine (SVM) and Random Forest (RF) classifiers, and other optimization approaches, namely Hybrid Particle Swarm Optimization and Gravitational Search Algorithm (PSOGSA), Hybrid Particle Swarm Optimization and Genetic Algorithm (PSOGA), Improved Krill Herd (IKH), Improved Cuckoo Search (ICS), Firefly Algorithm (FF), and Differential Evolution (DE), are also applied for comparative analysis.
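Wrapper methods such as Binary BAT need a fitness function that scores each candidate feature subset. The paper's novel fitness function is not described in the abstract, so the sketch shows only a common baseline form: a weighted sum of classification error and the fraction of features kept. The weight `alpha` and the function name are assumptions.

```python
def feature_subset_fitness(error_rate: float, n_selected: int, n_total: int,
                           alpha: float = 0.99) -> float:
    """Lower is better: mostly classification error, plus a small penalty on subset size."""
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total


small_subset = feature_subset_fitness(0.2, 5, 20)
large_subset = feature_subset_fitness(0.2, 10, 20)
```

With equal error rates the smaller subset scores better, which is how such a fitness pushes the search away from redundant features without sacrificing much accuracy.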


20.
A support vector machine (SVM) is a state-of-the-art classification tool with good accuracy owing to its ability to generate nonlinear models. However, the nonlinear models it generates are typically regarded as incomprehensible black boxes. This lack of explanatory ability is a serious problem for practical SVM applications that require comprehensibility. Therefore, this study applies a C5 decision tree (DT) to extract rules from SVM results. In addition, a metaheuristic algorithm is employed for feature selection: because both SVM and C5 DT are computationally expensive, applying the two algorithms together to high-dimensional data increases the computational cost, so this study uses the artificial bee colony (ABC) optimization algorithm to select the important features. The proposed ABC–SVM–DT algorithm extracts comprehensible rules from SVMs, with the ABC algorithm performing feature selection and parameter optimization before SVM–DT. The algorithm is evaluated on eight datasets to demonstrate its effectiveness. The results show that, compared with a genetic algorithm and particle swarm optimization, the proposed ABC–SVM–DT algorithm simultaneously improves the classification accuracy and reduces the complexity of the final decision tree.
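Extracting rules from an SVM with a decision tree can be illustrated with the "pedagogical" idea: relabel the training data with the black-box model's own predictions, then fit an interpretable model to those labels. As a simplification, a one-level decision stump stands in for the C5 tree and a hand-written linear rule (`svm_like`) stands in for a trained SVM; both substitutions and all names are assumptions, and the ABC feature selection is not shown.

```python
from typing import List, Sequence, Tuple


def fit_stump(X: List[Sequence[float]], labels: Sequence[int]) -> Tuple[float, int, float, str]:
    """Best single-feature threshold rule (accuracy, feature, threshold, side) for 0/1 labels."""
    best = (-1.0, -1, 0.0, "gt")
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for side in ("le", "gt"):
                pred = [int((x[f] <= t) if side == "le" else (x[f] > t)) for x in X]
                acc = sum(p == l for p, l in zip(pred, labels)) / len(X)
                if acc > best[0]:
                    best = (acc, f, t, side)
    return best


def svm_like(x: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained SVM's decision function."""
    return 1 if 0.8 * x[0] - 0.4 > 0 else 0


X = [(0.1, 0.9), (0.3, 0.2), (0.45, 0.7), (0.55, 0.1), (0.7, 0.8), (0.9, 0.4)]
labels = [svm_like(x) for x in X]  # learn from the model's answers, not the ground truth
acc, feature, threshold, side = fit_stump(X, labels)
rule = f"IF x{feature} {'>' if side == 'gt' else '<='} {threshold:.2f} THEN class 1 ELSE class 0"
```

The extracted rule reproduces the black box's behaviour on this data (fidelity 1.0) while being readable; a full C5 tree generalises this to several nested thresholds.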


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417