Similar literature
20 similar documents found
1.
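To address the data-volume problem in financial management, this paper proposes using a supervised support vector machine (SVM) algorithm to classify financial data. First, the financial data are preprocessed by year and by department and then labeled; second, training and validation data are selected in a fixed proportion, and the training data are fed into the SVM to train the classifier; finally, the best classifier is used to classify the financial data. Experimental results show that the proposed algorithm has high application value in financial management.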
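
The "fixed proportion" split between training and validation data described above could be sketched as follows. This is only an illustration: the record layout (year, department, amount), the 80/20 ratio, and the function name `split_train_validation` are assumptions, not details taken from the paper, and the SVM training step itself is left out.

```python
import random

def split_train_validation(records, train_ratio=0.8, seed=0):
    """Shuffle labelled records and split them into training and validation sets."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labelled records: ((year, department, amount), label).
records = [((2018 + i % 3, "dept%d" % (i % 4), 100.0 * i), i % 2) for i in range(10)]
train, validation = split_train_validation(records, train_ratio=0.8)
```

The training part would then be passed to whichever SVM implementation is in use, and the validation part held back to pick the best classifier.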

2.
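Targeting the characteristics of high-dimensional data classification, this paper proposes a dimensionality-reduction algorithm based on an improved Locally Linear Embedding (LLE) algorithm and combines it with a Support Vector Machine (SVM) to classify data. First, for the data set reduced by LLE, the optimal number of neighbors is computed on the principle of minimizing the dispersion within data sets and maximizing the dispersion between data sets; second, the reduced data obtained with the optimal number of neighbors are taken as the best result, a training set is selected in a fixed proportion, and it is fed into the SVM to build the classifier; finally, the test set is fed into the trained classifier to carry out the classification. Comparative experiments against traditional algorithms on several data sets, including Iris flower and Yale, verify the feasibility of the algorithm. Experimental results show that the proposed algorithm classifies data effectively, applies well to both low-dimensional and high-dimensional classification problems, and also achieves good results in face detection.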

3.
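Research on the SVM-KNN classification algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
The SVM-KNN classification algorithm is a new classification method that combines support vector machine (SVM) classification with nearest-neighbor (NN) classification. To address problems in traditional SVM classifiers, the algorithm trains on the data set with the SVM's sequential minimal optimization (SMO) training algorithm, and samples whose distance difference is below a given threshold are passed to a K-nearest-neighbor classifier that uses all support vectors of each class as representative points. Experiments on UCI data sets show that this classifier achieves higher classification accuracy than an SVM classifier alone, that it is to some extent unaffected by the choice of kernel parameters, and that it is fairly robust.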
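
The switching rule between the two classifiers can be illustrated with a small sketch. It assumes an already trained linear decision function (`w`, `b`), uses the absolute decision value as a stand-in for the "distance difference", and measures distances in input space with the Euclidean metric; the paper may work in kernel space, so treat all names and parameters here as hypothetical.

```python
import math
from collections import Counter

def svm_knn_predict(x, w, b, support_vectors, epsilon=0.5, k=3):
    """Classify x with the SVM far from the boundary, with KNN near it.

    support_vectors is a list of (point, label) pairs, label in {+1, -1}.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if abs(score) >= epsilon:
        # Confident region: trust the sign of the SVM decision value.
        return 1 if score > 0 else -1
    # Ambiguous region: vote among the k nearest support vectors.
    nearest = sorted(support_vectors, key=lambda sv: math.dist(sv[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical trained model: boundary x0 = 0, two support vectors per class.
w, b = (1.0, 0.0), 0.0
svs = [((1.0, 0.0), 1), ((1.0, 1.0), 1), ((-1.0, 0.0), -1), ((-1.0, 1.0), -1)]
far_label = svm_knn_predict((3.0, 0.0), w, b, svs)    # decided by the SVM
near_label = svm_knn_predict((-0.2, 0.0), w, b, svs)  # decided by KNN
```

An odd `k` avoids ties in the two-class case.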

4.
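One of the main problems facing machine learning today is how to process massive data effectively, while labeled training data are very limited and hard to obtain. A new semi-supervised SVM algorithm is proposed that needs only a small amount of labeled data for SVM training and can use a large amount of unlabeled data to refine the classifier repeatedly. The experiments show that applying Tri-training does improve the classification accuracy of the SVM algorithm and that increasing the diversity among the classifiers yields better classification results. Because Tri-training places only loose requirements on the classifiers, diversity among them is introduced by using different SVM kernel functions, which further improves co-training performance. Theoretical analysis and experiments show that the algorithm learns well.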

5.
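SVM-based incremental learning algorithm and its application to web page classification   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on the role of support vectors, an SVM-based incremental learning algorithm divides a large data set into many disjoint subsets and trains on the samples of each training subset batch by batch to obtain a classifier, which is then used to classify web documents automatically. For web document classification, this paper proposes training the SVM classifier using only positive examples and some unlabeled data in order to improve classification accuracy.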

6.
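To address the unsatisfactory recognition rate of traditional individual support vector machine (SVM) classifiers in data recognition and classification, an SVM ensemble data classification algorithm (CAB-SVM) based on cost-sensitive learning and adaptive boosting (AdaBoost) is proposed. Before each AdaBoost iteration trains an SVM weak classifier, initial sample weights are set according to the total number of samples, and samples are drawn to form a temporary training set for training the SVM weak classifier. In the weight update stage, misclassified samples are assigned a higher misclassification cost so that their weights grow faster, which effectively reduces the number of iterations. At the same time, the iterative process greatly improves the recognition robustness of the individual classifiers, giving the proposed CAB-SVM algorithm better classification performance. Experiments on UCI data sets show that CAB-SVM achieves a higher correct recognition rate than the SVM and SVME algorithms.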
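
The abstract does not give the exact update formula, so the following is only a sketch of one plausible cost-sensitive variant: the usual AdaBoost reweighting, with an extra multiplicative factor `miss_cost` (an assumed parameter) applied to misclassified samples. Initial weights are set to 1/n, which is one reading of "set according to the total number of samples".

```python
import math

def update_weights(weights, y_true, y_pred, miss_cost=2.0):
    """One AdaBoost-style reweighting round with an extra cost on mistakes.

    Returns the normalized new weights and the weak classifier's vote alpha.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clamp the weighted error so the logarithm below stays finite.
    err = min(max(err, 1e-10), 1.0 - 1e-10)
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:
            updated.append(w * math.exp(-alpha))
        else:
            # miss_cost > 1 makes misclassified samples gain weight faster.
            updated.append(w * miss_cost * math.exp(alpha))
    total = sum(updated)
    return [w / total for w in updated], alpha

n = 4
weights = [1.0 / n] * n                    # initial weights from the sample count
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]                     # the last sample is misclassified
new_weights, alpha = update_weights(weights, y_true, y_pred)
```

With `miss_cost=1.0` the function reduces to the standard AdaBoost update, in which the misclassified sample above would end the round with weight 0.5.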

7.
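An SVM ensemble method based on a convex hull algorithm   Cited by: 1 (self-citations: 1, citations by others: 0)
To speed up the training of support vector machine (SVM) ensembles, an SVM ensemble method based on a convex hull algorithm is proposed. The hull vectors of each class in the training set are computed and used as the training set of the base classifiers, and the individual SVMs are combined with a Bagging strategy. During training, base classifiers that perform poorly are discarded to further improve the ensemble's classification accuracy. The method was applied to three data sets; experimental results show that the training and classification speeds of the SVM ensemble improved by 266% and 25% on average, respectively.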
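
To make the idea of per-class hull vectors concrete, here is a minimal sketch restricted to 2-D points and using Andrew's monotone chain algorithm. The paper does not say which hull algorithm or which feature space it uses, so this choice and the helper names are assumptions for illustration only.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices of 2-D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def hull_vectors_per_class(samples):
    """Group (point, label) samples by label and keep only hull vertices."""
    by_class = {}
    for point, label in samples:
        by_class.setdefault(label, []).append(point)
    return {label: convex_hull(pts) for label, pts in by_class.items()}

samples = [((0, 0), "a"), ((2, 0), "a"), ((2, 2), "a"), ((0, 2), "a"), ((1, 1), "a"),
           ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
reduced = hull_vectors_per_class(samples)
```

Interior points such as (1, 1) are dropped, so the base classifiers would see a smaller training set per class.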

8.
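To address the problems of the AdaBoost cascade classification algorithm in real-time pedestrian detection, the training algorithm of the AdaBoost cascade classifier is improved and an AdaBoost-SVM cascade classification algorithm is proposed, which combines the strengths of AdaBoost and SVM. Pedestrian detection experiments were run on a custom sample set and the PET image library; fixed-size windows were used as candidate regions, Haar-like rectangular features were used for feature extraction, and classification was performed by the AdaBoost-SVM cascade classifier. The results show that the AdaBoost-SVM cascade classifier reaches an accuracy of 99.5% with a false-alarm rate below 0.05%, outperforming the AdaBoost cascade classifier, and that its training time is far shorter than that of an SVM classifier.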

9.
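Solving large-scale complex problems with support vector machines suffers from long training times and limited classification accuracy. This paper proposes a divide-and-conquer algorithm that combines support vector machine (SVM) and K-nearest-neighbor (KNN) classification. An analysis of the SVM classification mechanism first shows that, as a classifier, an SVM is in effect equivalent to a nearest-neighbor classifier that selects only one representative point per class. On this basis, following the basic idea of divide and conquer, the training set is partitioned into several training subsets and a separate SVM is trained on each; the SVM trained on each subset yields one representative point for the positive examples and one for the negative examples, and together these points form the set of positive and negative representative points for the whole training set. A KNN classifier built on this set of representative points is then used as the solution to the whole problem. Experimental results show that, for large-scale data, the divide-and-conquer algorithm substantially reduces training time and improves classification accuracy to varying degrees.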

10.
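Pedestrian detection methods based on support vector machine classifiers use undersampling and suffer from low accuracy caused by the imbalance between positive and negative pedestrian samples. Combining undersampling with the EasyEnsemble method, a pedestrian detection method using an aggregated support vector machine (Ensemble SVM) classifier is proposed. Negative samples are randomly selected as the initial training samples and divided into several negative subsets, each balanced against the positive sample set, to build balanced training subsets, whose classifiers are linearly combined into an EasyEnsemble SVM classifier. This classifier is then used to classify the negative samples; misclassified samples are treated as hard examples and repartitioned to build new balanced training subsets on which sub-classifiers are trained, and these are combined with the EasyEnsemble SVM classifier to obtain the Ensemble SVM pedestrian detection method. Experiments on the INRIA pedestrian data set show that the method outperforms the classical SVM pedestrian detection algorithm in both detection speed and detection rate.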
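
The construction of balanced training subsets can be sketched as below. The sample representation, the choice to drop leftover negatives that do not fill a whole subset, and the function name `balanced_subsets` are assumptions; training the sub-classifiers and the hard-example round are not shown.

```python
import random

def balanced_subsets(positives, negatives, seed=0):
    """Split shuffled negatives into chunks the size of the positive set.

    Each returned subset holds all positives plus one disjoint chunk of
    negatives, so every subset is class-balanced.
    """
    size = len(positives)
    if size == 0:
        raise ValueError("at least one positive sample is required")
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    subsets = []
    # Leftover negatives that cannot fill a whole chunk are dropped here.
    for start in range(0, len(shuffled) - size + 1, size):
        subsets.append(list(positives) + shuffled[start:start + size])
    return subsets

# Hypothetical samples: (identifier, label) with 1 = pedestrian, 0 = background.
positives = [("p%d" % i, 1) for i in range(3)]
negatives = [("n%d" % i, 0) for i in range(10)]
subsets = balanced_subsets(positives, negatives)
```

One sub-classifier would then be trained per subset and the results combined.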

11.
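To address the high misclassification rate of current financial data classification systems, an automated financial data classification system based on business processes is designed. Centered on a financial data classification algorithm, and operating in program-loading and cross-compilation modes, the system uses distributed cloud computing to fuse the collected financial data and extract their higher-order statistical features; it analyzes the correlations among financial data with grouped-sample test analysis and identifies the attribute classes of the data in combination with the business process; taking the fuzzy clustering distribution of the business process as the center vector, it classifies the financial data automatically with a segmented detection method; the whole procedure is ported to the processor terminal by program loading, and cross-compilation control of the classification system is carried out, completing the design of the automated financial data classification system. Simulation results show that the system classifies financial data with high accuracy and a low misclassification rate, improving business management and analysis of financial data.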

12.
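To address the low efficiency of traditional classification of financial analysis reports, a Chinese text classification technique based on support vector machines is proposed for classifying such reports. The technique performs word segmentation with the Chinese word segmentation system provided by the Chinese Academy of Sciences, performs feature selection by combining two feature selection algorithms, and proposes an improved TF/IDF weighting method. A support vector machine is chosen as the classification algorithm, and samples are trained and tested with an open-source SVM implementation. Experimental results show that classifying financial analysis reports by industry with this Chinese text classification technique meets the needs of financial institutions.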
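
For reference, the baseline that such an improvement starts from can be sketched as plain TF-IDF weighting. This is the textbook formula (term frequency times the log of inverse document frequency), not the paper's improved variant, which the abstract does not specify; the tokenized documents below are invented examples standing in for segmented report text.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute standard TF-IDF weights for a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document it appears in.
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical segmented documents (tokens already produced by a segmenter).
docs = [["bank", "loan", "risk"],
        ["bank", "steel", "output"],
        ["bank", "loan", "rate"]]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "bank" here, gets weight zero, while a term unique to one document gets the highest weight; the resulting vectors would be the input features for the SVM.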

13.
Applying inductive learning to enhance knowledge-based expert systems   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper describes the use of inductive learning in MARBLE, a knowledge-based expert system I have developed for assisting business loan evaluation. Inductive learning is the process of inferring classification concepts from raw data; I use this technique to generate loan-granting decision rules based on historical and pro forma financial information. A learning method is presented in this paper that can induce decision rules from training examples.

14.
Bank failures threaten the economic system as a whole. Therefore, predicting bank financial failures is crucial to prevent or lessen the negative effects on the economic system. This is essentially a classification problem: categorizing banks as healthy or non-healthy. This study applies various neural network techniques, support vector machines and multivariate statistical methods to the bank failure prediction problem in a Turkish case, and presents a comprehensive computational comparison of the classification performance of the techniques tested. Twenty financial ratios in six feature groups, namely capital adequacy, asset quality, management quality, earnings, liquidity and sensitivity to market risk (CAMELS), are selected as predictor variables in the study. Four different data sets with different characteristics are developed using official financial data to improve the prediction performance. Each data set is also divided into training and validation sets. In the category of neural networks, four different architectures, namely multi-layer perceptron, competitive learning, self-organizing map and learning vector quantization, are employed. The multivariate statistical methods tested are multivariate discriminant analysis, k-means cluster analysis and logistic regression analysis. Experimental results are evaluated with respect to the classification accuracy of the techniques. Results show that multi-layer perceptron and learning vector quantization can be considered the most successful models in predicting the financial failure of banks.

15.
Multirelational classification aims to discover patterns across multiple interlinked tables (relations) in a relational database. In many large organizations, such a database often spans numerous departments and/or subdivisions, which are involved in different aspects of the enterprise such as customer profiling, fraud detection, inventory management, financial management, and so on. When considering classification, different phases of the knowledge discovery process are affected by economic utility. For instance, in the data preprocessing process, one must consider the cost associated with acquiring, cleaning, and transforming large volumes of data. When training and testing the data mining models, one has to consider the impact of the data size on the running time of the learning algorithm. In order to address these utility-based issues, the paper presents an approach to create a pruned database for multirelational classification, while minimizing predictive performance loss on the final model. Our method identifies a set of strongly uncorrelated subgraphs from the original database schema, to use for training, and discards all others. The experiments performed show that our strategy is able to, without sacrificing predictive accuracy, significantly reduce the size of the databases, in terms of the number of relations, tuples, and attributes. The approach prunes the sizes of databases by as much as 94%. Such reduction also results in decreasing computational cost of the learning process. The method improves the multirelational learning algorithms' execution time by as much as 80%. In particular, our results demonstrate that one may build an accurate model with only a small subset of the provided database.

16.
In one-class classification, the low variance directions in the training data carry crucial information to build a good model of the target class. Boundary-based methods like One-Class Support Vector Machine (OSVM) preferentially separate the data from outliers along the large variance directions. On the other hand, retaining only the low variance directions can result in sacrificing some initial properties of the original data and is not desirable, especially in case of limited training samples. This paper introduces a Covariance-guided One-Class Support Vector Machine (COSVM) classification method which emphasizes the low variance projectional directions of the training data without compromising any important characteristics. COSVM improves upon the OSVM method by controlling the direction of the separating hyperplane through incorporation of the estimated covariance matrix from the training data. Our proposed method is a convex optimization problem resulting in one global optimum solution which can be solved efficiently with the help of existing numerical methods. The method also keeps the principal structure of the OSVM method intact, and can be implemented easily with the existing OSVM libraries. Comparative experimental results with contemporary one-class classifiers on numerous artificial and benchmark datasets demonstrate that our method results in significantly better classification performance.

17.
Classifying the operating performance of enterprises is not only a hot issue emphasized by management but also an important reference for investors in their decision-making. Generally speaking, when predicting or analyzing business performance classification, most researchers adopt corporate financial early-warning or credit-rating models, which rely largely on previous data and facts. Therefore, this paper presents an alternative method to discriminate between excellent and poor business management, so that preventive measures can be taken before a business crisis or bankruptcy. We collected the financial reports and financial ratios of listed firms in mainland China and Taiwan as our samples to build four kinds of forecasting models for business performance. The empirical results show that the hybrid model provides better classification forecasting capability than the other models, while the ANFIS model adjusted by a genetic algorithm could effectively enhance the classification forecasting capability.

18.
Over the past two decades, the quantitative analysis of financial and banking decisions has gained significant interest among researchers and practitioners. A significant part of the research conducted in this field focused on the development of analytical models that can be used in evaluating the alternative ways of action in financial and banking problems. Typically, this evaluation involves the choice of the best alternative, the ranking of the alternatives from the best to the worst ones, or their classification into predefined homogeneous classes. This paper is focused on the classification approach illustrating the use of multi-criteria decision aid (MCDA) classification methods in making financial and banking decisions. Three MCDA approaches (the UTADIS method, the ELECTRE TRI method, and the rough set approach) are applied in financial and banking problems, such as business failure prediction, credit-risk assessment, and portfolio selection and management. A comparison is also performed with linear and quadratic discriminant analysis, and logit analysis.

19.
In solving pattern recognition problems, many classification methods, such as the nearest-neighbor (NN) rule, need to determine prototypes from a training set. To improve the performance of these classifiers in finding an efficient set of prototypes, this paper introduces a training sample sequence planning method. In particular, by estimating the relative nearness of the training samples to the decision boundary, the approach proposed here incrementally increases the number of prototypes until the desired classification accuracy has been reached. This approach has been tested with a NN classification method and a neural network training approach. Studies based on both artificial and real data demonstrate that higher classification accuracy can be achieved with fewer prototypes.

20.
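The effectiveness of traditional machine learning methods depends on large amounts of valid training data, a requirement that is hard to meet; transfer learning has therefore been widely studied and has become a hot research topic in recent years. To address the drop in classification performance in multi-class scenarios caused by a severe shortage of training data, an inductive transfer learning method based on DLSR (discriminative least squares regression), called TDLSR, is proposed. Starting from inductive transfer learning, the method uses a knowledge-leverage mechanism to transfer source-domain knowledge to the target domain and learn the model together with the target-domain data, improving classification performance while keeping the source-domain data secure. TDLSR inherits DLSR's advantage of enlarging the margins between classes in multi-class tasks and adds transfer capability to DLSR to cope with insufficient data, making it better suited to complex multi-class tasks. Experiments on 12 real UCI data sets verify the effectiveness of the proposed method.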
