Similar literature
20 similar documents found (search time: 31 ms)
1.
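张永, 浮盼盼, 张玉婷. Journal of Computer Applications (《计算机应用》), 2013, 33(10): 2801-2803
To address the classification of large-scale data, this paper combines supervised and unsupervised learning and proposes a support vector machine (SVM) classification method based on hierarchical clustering and resampling. The method first uses k-means clustering, an unsupervised learning technique, to partition the data set into subsets. It then clusters each subset class by class and selects the sample points in the neighbourhood of each class centre to form the final training set. Finally, an SVM is trained on the selected, most representative sample points. Experiments show that the method greatly reduces the training cost of the SVM, achieves higher classification accuracy than random undersampling, and can match the results obtained by training on the complete data set.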
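
The sample-selection step described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it skips the initial partition into subsets, and the names `kmeans` and `select_near_centres`, the deterministic seeding of centres, and the parameters `k` (clusters per class) and `m` (samples kept per centre) are all hypothetical. The reduced set would then be handed to any SVM trainer.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centres (an assumption)."""
    centres = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres


def select_near_centres(samples, labels, k=1, m=2):
    """For each class, cluster its samples and keep the m samples nearest each centre."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    reduced = []
    for y, xs in by_class.items():
        for centre in kmeans(xs, min(k, len(xs))):
            nearest = sorted(xs, key=lambda x: sq_dist(x, centre))[:m]
            reduced.extend((x, y) for x in nearest)
    # A sample close to two centres could be picked twice; drop duplicates.
    return list(dict.fromkeys(reduced))


# Toy data: each class has three tightly grouped points and one outlier.
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (10, 10), (10, 11), (11, 10), (15, 15)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
reduced = select_near_centres(samples, labels, k=1, m=2)
```

On this toy data the two outliers, (5, 5) and (15, 15), are dropped and four of the eight samples remain for training.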

2.
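Research on support vector machine learning methods based on neural networks   Total citations: 4 (self-citations: 0, citations by others: 4)
To address the low efficiency of the support vector machine (SVM) when classifying large sample sets, this paper proposes SVM training algorithms based on the Adaptive Resonance Theory (ART) neural network and the Self-Organizing feature Map (SOM) neural network, called the ART-SVM and SOM-SVM algorithms respectively. Both algorithms compress the data set by clustering, which greatly speeds up SVM training while still achieving satisfactory generalization.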

3.
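An L1-norm regularized support vector machine (SVM) clustering algorithm is proposed. The algorithm performs clustering and feature selection simultaneously. The primal and dual forms of the L1-norm regularized SVM clustering problem are given, and the difficult mixed-integer programming problem is solved with a method similar to iterative coordinate descent. Experiments on several data sets show that the clustering accuracy of the L1-norm regularized SVM clustering algorithm is close to that of the L2-norm regularized SVM clustering algorithm, and that it additionally achieves feature selection.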

4.
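To address the limited image feature learning ability of single-layer sparse coding structures, this paper proposes a deep architecture based on the sparse representation of image patches: a multi-layer Laplacian sparse coding algorithm that incorporates locality and non-negativity (MLLSC). Each image is divided evenly into regions and scale-invariant feature transform (SIFT) features are extracted. In the sparse coding stage, locality and non-negativity are added to the optimization function of Laplacian sparse coding. Dictionary learning and sparse coding are performed in the first and second layers, yielding patch-level and image-level sparse representations respectively. To remove redundant features, principal component analysis (PCA) is applied for dimensionality reduction before the second-layer sparse coding. Finally, a multi-class linear support vector machine performs the classification. In experiments on four standard data sets, MLLSC learns features efficiently and captures deeper-level image feature information, improving accuracy by 3% to 13% over single-layer algorithms and by 1% to 2.3% over multi-layer sparse coding algorithms. A comparative analysis of different parameters further demonstrates its effectiveness in image classification.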

5.
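曹鸿亮, 张莹, 武斌, 李繁菀, 那绪博. Journal of Computer Applications (《计算机应用》), 2021, 41(12): 3608-3613
Many machine learning algorithms handle predictive classification well, but when applied to medical data sets with few samples and a large feature space they suffer from low prediction accuracy and low F1 scores. To improve the accuracy and F1 score of liver transplantation complication prediction, a classification method based on transfer component analysis (TCA) and support vector machine (SVM) is proposed. The method uses TCA to map and reduce the feature space, mapping the source and target domains into the same reproducing kernel Hilbert space and thereby achieving marginal distribution adaptation. After the transfer, an SVM is trained on the source domain and then used to predict complications on the target domain. In the prediction experiments, complications I, II, IIIa, IIIb and IV were predicted. Compared with traditional machine learning and progressive-alignment heterogeneous domain adaptation (HDA), the proposed method improved accuracy by 7.8% to 42.8% and reached F1 scores of 85.0% to 99.0%, whereas traditional machine learning and HDA showed high precision but very low recall because of the imbalance between positive and negative samples. The results show that TCA combined with SVM can effectively improve the accuracy and F1 score of liver transplantation complication prediction.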
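
The "high precision, very low recall" pattern mentioned for imbalanced data is why this abstract reports F1 alongside accuracy. The sketch below computes the three metrics from confusion-matrix counts. The counts are hypothetical and are not taken from the paper; the function name `precision_recall_f1` is also an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical classifier that finds only 2 of 20 positives but raises no false alarms.
precision, recall, f1 = precision_recall_f1(tp=2, fp=0, fn=18)
```

Here precision is 1.0 while recall is only 0.1, and F1, the harmonic mean of the two, stays low at 2/11 (about 0.18), so a perfect precision cannot mask the missed positives.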

6.
In classification, every feature of the data set contributes to prediction accuracy and affects the cost of building the model. A suitable feature selector is therefore needed to extract the most important features for prediction. This paper proposes a novel memetic feature selection model named Shapely Value Embedded Genetic Algorithm (SVEGA). The relevance of each feature to prediction is measured by combining genetic algorithms with shapely value measures retrieved from SVEGA. The selected features are then evaluated using Support Vector Machine (SVM) with different kernel configurations on 11 + 11 benchmark datasets (both binary-class and multi-class). Finally, SVEGA-SVM is compared with other existing feature selection models. The experimental results show that the proposed setup gives robust outcomes, indicating that it is an efficient approach for discovering knowledge via feature selection, with improved classification accuracy compared to conventional methods.

7.
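Imbalanced data sets are often encountered in machine learning classification problems. To improve classification on imbalanced data sets, an oversampling classification algorithm based on quotient space theory, QMSVM, is proposed. The majority-class samples in the training set are partitioned by clustering, and the resulting partition is merged with the minority-class samples to train a linear support vector machine (SVM), which yields the support vectors and the misclassified sample granules of the majority class. In parallel, the support vectors and misclassified samples of the minority class are obtained and oversampled with SMOTE. Finally, the two sets of samples obtained above are merged for SVM training. This rebalances the training data and gives a more reasonable separating hyperplane. Experiments show that, compared with several other algorithms, the proposed algorithm has a somewhat lower overall classification accuracy but considerably improves the g_means and acc+ values, and performs better on data sets with a higher imbalance ratio.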
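
The SMOTE step named in this abstract creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that interpolation, as a minimal illustration. It does not reproduce QMSVM's choice of which samples to oversample (support vectors and misclassified samples), and the function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: sq_dist(p, x))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical minority class with three samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_new=5)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority region is filled in rather than merely duplicated.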

8.
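童林, 官铮. Journal of Computer Applications (《计算机应用》), 2021, 41(10): 2919-2927
To address the volatility and low prediction accuracy of the support vector machine (SVM) in traffic flow prediction, an SVM model using fuzzy information granulation (FIG) and an improved whale optimization algorithm (IWOA) is proposed to predict the trend and dynamic interval of traffic flow. First, the data are processed with FIG to obtain the upper bound (Up), lower bound (Low) and trend value (R) of the traffic flow variation interval. Second, dynamic opposition-based learning is used in the population initialization of the whale optimization algorithm (WOA) to increase population diversity, and a nonlinear convergence factor and adaptive weights are introduced to strengthen global search and local optimization; the IWOA model is then built and its complexity analysed. Finally, with the mean squared error (MSE) of the predicted traffic flow as the objective function, the SVM hyperparameters are optimized during the IWOA iterations, giving a traffic flow interval prediction model based on FIG-IWOA-SVM. Tests on domestic and foreign traffic flow data sets show that, for the foreign traffic flow prediction, the mean absolute error (MAE) of the IWOA-SVM model is 89.5%, 81.5% and 1.5% lower than that of the SVM optimized by a genetic algorithm (GA-SVM), the SVM optimized by particle swarm optimization (PSO-SVM) and the SVM based on the whale optimization algorithm (WOA-SVM) respectively. In predicting the dynamic interval and trend of traffic flow, the FIG-IWOA-SVM model is more accurate and gives a more stable prediction range than FIG-GA-SVM, FIG-PSO-SVM and FIG-WOA-SVM. The results show that, without increasing algorithmic complexity, FIG-IWOA-SVM reasonably predicts the trend and interval of traffic flow, providing a basis for subsequent traffic planning and flow control.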

9.
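Support vector machine classifier based on fuzzy partition and neighbouring pairs   Total citations: 1 (self-citations: 0, citations by others: 1)
Support vector machines are sensitive to noise points and outliers. Fuzzy support vector machines were proposed to solve this problem, but their fuzzy membership function has to be set manually. This paper proposes a support vector machine classifier based on fuzzy partition and neighbouring pairs. The algorithm first clusters the positive and negative training data separately with fuzzy c-means, guided by cluster validity. It then constructs c binary classification problems from the clustering results and solves them to obtain c binary classifiers. Finally, sample points are classified with a neighbouring-pair strategy. Numerical experiments on four well-known data sets show that the algorithm effectively improves prediction accuracy on data sets containing noise points and outliers.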
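
Fuzzy c-means, used in the first step above, assigns each point a graded membership in every cluster instead of a hard label. The sketch below shows the standard membership update for a single point given fixed centres. It is a minimal illustration, not the paper's classifier: the function name `fcm_memberships`, the fuzzifier value `m=2.0` and the toy centres are assumptions.

```python
import math


def fcm_memberships(point, centres, m=2.0):
    """Standard fuzzy c-means memberships of one point for fixed cluster centres.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the distance to centre i.
    """
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(point, c))) for c in centres]
    if 0.0 in dists:  # the point coincides with one or more centres
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


# A point at distance 1 from the first centre and distance 3 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

With `m = 2` the memberships are 0.9 and 0.1: the nearer centre dominates, but the point still belongs partly to the other cluster, and the memberships always sum to 1.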

10.
Graph-based methods have attracted wide interest in pattern recognition and machine learning. They bring the structural information in the data into classifier design by defining a graph over the data and assuming label smoothness over the graph. Laplacian Support Vector Machine (LapSVM) is a representative of these methods; it extends the traditional SVM by optimizing a new objective with an additional Laplacian regularizer. The regularizer uses local linear patches to approximate the data manifold structure and assumes that the data on each patch share the same label. Although LapSVM has been shown experimentally to classify more effectively than SVM, the Laplacian regularizer makes it focus on the locality of the data manifold more than on its globality. As a result, LapSVM is relatively sensitive to local changes in the data and cannot characterize the manifold faithfully. In this paper, we design an alternative regularizer, termed the Glocalization Pursuit Regularizer. The new regularizer introduces a natural global structure measure to capture global and local manifold information as simultaneously as possible, and can be proved to make the representation of the manifold more compact than the Laplacian regularizer does. We further introduce the new regularizer into SVM to develop an alternative graph-based SVM, called Glocalization Pursuit Support Vector Machine (GPSVM). GPSVM not only inherits the advantages of both SVM and LapSVM but also uses the structural information more reasonably to guide classifier design. Experiments on both toy and real-world datasets demonstrate that the proposed GPSVM classifies better than SVM and LapSVM.
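
The label-smoothness assumption behind the Laplacian regularizer can be made concrete with the standard graph-Laplacian identity f^T L f = (1/2) Σ_ij W_ij (f_i − f_j)^2, where L = D − W. The sketch below checks this identity numerically on a small graph. It illustrates only the Laplacian regularizer that LapSVM uses; the Glocalization Pursuit Regularizer is not specified in the abstract and is not reproduced. The function names and the toy graph are assumptions.

```python
def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W (list of lists)."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L, f):
    """Compute f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def smoothness(W, f):
    """Compute 0.5 * sum_ij W_ij * (f_i - f_j)^2, the penalty on label changes along edges."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 with unit edge weights; the label flips across edge (1, 2).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
f = [1.0, 1.0, -1.0]
penalty = quadratic_form(laplacian(W), f)
```

Both expressions evaluate to 4.0 here: only the edge whose endpoints disagree contributes, which is why the term favours decision functions that vary slowly between neighbouring points.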

11.
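To address the problem that existing algorithms cannot be applied effectively to multi-factor trajectory anomaly detection, an anomalous trajectory detection method based on kernel principal component analysis (KPCA) is proposed. First, to improve trajectory feature extraction, KPCA transforms the trajectory data from the nonlinear space into a high-dimensional linear space. Second, to improve detection accuracy, a one-class support vector machine performs unsupervised learning and prediction on the trajectory feature data. Finally, trajectories with anomalous behaviour are detected. Tests on Atlantic hurricane data show that the algorithm extracts trajectory features effectively and detects multi-factor trajectory anomalies better than comparable algorithms.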

12.
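To address the low classification accuracy, redundant feature subsets and poor computational efficiency of the traditional support vector machine (SVM) in wrapper feature selection, a metaheuristic optimization algorithm is used to optimize the SVM and feature selection simultaneously. To improve the classification performance of the SVM and its ability to select feature subsets, the spotted hyena optimizer (SHO) is first improved with an adaptive differential evolution (DE) algorithm, chaotic initialization and a tournament selection strategy, which strengthens its local search ability and improves its optimization efficiency and solution accuracy. The improved algorithm is then applied to the simultaneous optimization of feature selection and SVM parameter tuning. Finally, feature selection simulations are run on UCI data sets, and the algorithm is evaluated by classification accuracy, number of selected features, fitness value and running time. The results show that the simultaneous optimization mechanism of the improved algorithm reduces the number of selected features while maintaining high classification accuracy, and that the algorithm is better suited to wrapper feature selection than traditional algorithms and has good application value.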

13.
Support Vector Machine (SVM) is one of the best-known classifiers. SVM parameters such as the kernel parameters and the penalty parameter (C) significantly influence classification accuracy. In this paper, a novel Chaotic Antlion Optimization (CALO) algorithm is proposed to optimize the parameters of the SVM classifier so that the classification error is reduced. To evaluate the proposed algorithm (CALO-SVM), the experiments use six standard datasets from the UCI machine learning data repository. For verification, the results of CALO-SVM are compared with grid search, a conventional method of searching parameter values; with the standard Ant Lion Optimization (ALO) SVM; and with three well-known optimization algorithms: Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Social Emotional Optimization Algorithm (SEOA). The experimental results show that the proposed algorithm is capable of finding the optimal values of the SVM parameters and avoids the local optima problem. The results also show lower classification error rates than the GA, PSO, and SEOA algorithms.

14.
This paper proposes a novel kernel clustering algorithm that uses a hybrid memetic algorithm to cluster complex, unlabeled, and linearly non-separable datasets. The kernel function transforms nonlinear data into a high-dimensional feature space, which increases the probability that the patterns are linearly separable in the transformed space and simplifies the associated data structure. Three local learning operators are designed according to the distribution of various datasets, and double mutation operators are incorporated into the local learning operators to further enhance global exploration and overcome premature convergence. Performance comparisons with k-means, kernel k-means, global kernel k-means and spectral clustering on artificial datasets and UCI datasets indicate that the proposed clustering algorithm outperforms the compared algorithms.
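
Kernel clustering methods such as kernel k-means never compute the feature-space mapping explicitly; the squared distance from a point to a cluster mean is expanded into kernel evaluations only. The sketch below shows that expansion. It is a minimal illustration of the kernel trick, not the memetic algorithm proposed in the abstract, and the function names and toy data are assumptions.

```python
import math


def linear_kernel(x, y):
    """Ordinary dot product; with it the result reduces to Euclidean geometry."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel, a common choice for non-linearly separable data."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def feature_space_sq_dist(x, cluster, kernel):
    """Squared distance from phi(x) to the mean of phi(p) over the cluster.

    ||phi(x) - mu||^2 = K(x, x) - (2/n) * sum_p K(x, p) + (1/n^2) * sum_{p,q} K(p, q)
    """
    n = len(cluster)
    cross = sum(kernel(x, p) for p in cluster) / n
    within = sum(kernel(p, q) for p in cluster for q in cluster) / (n * n)
    return kernel(x, x) - 2.0 * cross + within


cluster = [(0.0, 0.0), (2.0, 0.0)]  # centroid is (1, 0) in input space
x = (1.0, 2.0)
d_linear = feature_space_sq_dist(x, cluster, linear_kernel)
d_rbf = feature_space_sq_dist(x, cluster, rbf_kernel)
```

With the linear kernel the result is 4.0, the ordinary squared distance from (1, 2) to the centroid (1, 0), which is a quick sanity check before swapping in a non-linear kernel.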

15.
The credit card industry has been growing rapidly, and the credit departments of banks therefore collect huge amounts of consumer credit data. The credit scoring manager often evaluates a consumer's credit by intuitive experience; with the support of a credit classification model, however, the manager can evaluate the applicant's credit score accurately. Support Vector Machine (SVM) classification is currently an active research area and has successfully solved classification problems in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate the applicant's credit score from the applicant's input features. Two credit datasets from the UCI database are selected as experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can perform feature selection and model parameter optimization simultaneously. Experimental results show that SVM is a promising addition to existing data mining methods.

16.
Credit scoring with a data mining approach based on support vector machines   Total citations: 3 (self-citations: 0, citations by others: 3)
The credit card industry has been growing rapidly, and the credit departments of banks therefore collect huge amounts of consumer credit data. The credit scoring manager often evaluates a consumer's credit by intuitive experience; with the support of a credit classification model, however, the manager can evaluate the applicant's credit score accurately. Support Vector Machine (SVM) classification is currently an active research area and has successfully solved classification problems in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate the applicant's credit score from the applicant's input features. Two credit datasets from the UCI database are selected as experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can perform feature selection and model parameter optimization simultaneously. Experimental results show that SVM is a promising addition to existing data mining methods.

17.
Most widely used pattern classification algorithms, such as Support Vector Machines (SVM), are sensitive to irrelevant or redundant features in the training data. Automatic feature selection algorithms aim to select a subset of the features in a given dataset so that the accuracy of the subsequent classifier is maximized. Feature selection algorithms generally fall into two broad categories: algorithms that do not take the subsequent classifier into account (filter approaches), and algorithms that evaluate the subsequent classifier for each considered feature subset (wrapper approaches). Filter approaches are typically faster, but wrapper approaches deliver higher performance. In this paper, we present the Predictive Forward Selection algorithm, which is based on the widely used wrapper approach forward selection. Using ideas from meta-learning, it reduces the number of required evaluations of the target classifier by drawing on experience gained during past feature selection runs on other datasets. We evaluated our approach on 59 real-world datasets with a focus on SVM as the target classifier. We present comparisons with state-of-the-art wrapper and filter approaches, as well as one embedded method for SVM, in terms of accuracy and run-time. The results show that the presented method reaches the accuracy of traditional wrapper approaches while requiring significantly fewer evaluations of the target algorithm. Moreover, our method achieves statistically significantly better results than the filter approaches and the embedded method.
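
Plain forward selection, the wrapper baseline that this abstract builds on, can be sketched as a greedy loop that re-evaluates the classifier for every candidate feature in each round. The sketch below is a minimal illustration and does not include the meta-learning step of Predictive Forward Selection. The `score` callback stands in for a cross-validated classifier accuracy, and `toy_score`, `USEFUL` and the feature indices are hypothetical.

```python
def forward_selection(n_features, score, max_features=None):
    """Greedy wrapper selection: add the feature that most improves score(subset)."""
    limit = n_features if max_features is None else min(n_features, max_features)
    selected, best = [], score([])
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # One classifier evaluation per candidate: this is the costly part of a wrapper.
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:  # stop when no candidate improves the score
            break
        selected.append(top_feature)
        best = top_score
    return selected, best


# Hypothetical scoring function: features 0 and 2 are informative,
# and every selected feature carries a small cost.
USEFUL = {0, 2}


def toy_score(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)


selected, best = forward_selection(4, toy_score)
```

On this toy problem the loop keeps features 0 and 2 and stops when adding a third feature lowers the score. It needs ten score evaluations for four features, which is the cost that methods reducing the number of evaluations aim to cut.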

18.
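Support vector guided dictionary learning follows the large-margin classification principle and builds the decision hyperplane only from the boundary conditions of each class's coding vectors, without using the distribution information of the data, which limits the generalization ability of the model to some extent. To solve this problem, a minimum within-class variance support vector guided dictionary learning algorithm is proposed. The minimum within-class variance support vector machine, which combines Fisher linear discriminant analysis with the large-margin classification criterion of the support vector machine, is used as the discriminative term. During the alternating optimization of the model's classifier, the distribution information of the coding vectors is fully taken into account: coding vectors of the same class are kept consistent overall, the coupling between vectors is reduced, and the classification vector is corrected. This exploits the discriminative information in the coding vectors so that they better guide dictionary learning and improve classification performance. Experiments on face, object and handwritten digit recognition data sets show that, under most sample and atom number settings, the algorithm outperforms classical dictionary learning algorithms such as K-singular value decomposition and local-feature and class-label embedding constraints in both recognition rate and atom robustness.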

19.
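Subspace clustering is a clustering approach for high-dimensional data, with a distinctive self-representation of the data and high clustering accuracy. Traditional subspace clustering algorithms focus on building an optimal similarity graph from the input data and then partitioning it, so the clustering result depends heavily on similarity graph learning. The clustering with adaptive neighbors (CAN) algorithm improves similarity graph learning by adaptively assigning optimal neighbours according to the distances between data points to build the similarity graph and the cluster structure. However, existing CAN algorithms struggle to capture local data structure in nonlinear clustering of high-dimensional data, which limits clustering accuracy and generalization. A deep subspace clustering algorithm that integrates automatic weight learning and structured information is proposed. An autoencoder maps the data into a nonlinear latent space and reduces its dimensionality; latent features are adaptively weighted to handle noisy features; and the autoencoder's reconstruction error is minimized to preserve the local structure of the data. The similarity graph is learned with CAN, and the correlation between features is iteratively strengthened in the latent representation to preserve the global structure of the data. Experiments show that the algorithm reaches accuracies of 0.7801, 0.8743 and 0.7421 on the ORL, COIL-20 and UMIST data sets respectively, outperforming LRR, LRSC, SSC, KSSC and other algorithms.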

20.
Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm   Total citations: 3 (self-citations: 0, citations by others: 3)
