Similar literature
 Found 20 similar documents; search took 15 ms
1.
A document clustering method based on locality preserving indexing (LPI) and support vector machines (SVM) is presented. The document space is generally high-dimensional, and clustering in such a space is often infeasible because of the curse of dimensionality. In this paper, LPI projects the documents into a lower-dimensional semantic space in which documents with the same semantics lie close to each other. An SVM then maps the vectors in the semantic space, by means of a Gaussian kernel, to a high-dimensional feature space in which the minimal enclosing sphere is sought. When mapped back to the semantic space, the sphere can separate into several independent components delimited by the support vectors, each enclosing a separate cluster of documents. Combining LPI and SVM yields both higher clustering accuracy in an unsupervised manner and better generalization properties. Extensive experiments are performed on the Reuters-21578 and TDT2 data sets. This work was supported by the National Science Foundation of China under Grant 60471055 and by the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.

2.
S. Asharaf 《Pattern recognition》2005,38(10):1779-1783
In this paper, a novel kernel-based soft clustering method is proposed. The method incorporates a rough-set-theoretic flavour into the support vector clustering paradigm to achieve soft clustering. Empirical studies show that the method can find soft clusters of arbitrary shape.

3.
On-line fuzzy modeling via clustering and support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
Wen Yu  Xiaoou Li 《Information Sciences》2008,178(22):4264-4279
In this paper, we propose a novel approach to identifying unknown nonlinear systems with fuzzy rules and support vector machines. Our approach consists of four steps: on-line clustering, structure identification, parameter identification and local model combination. The collected data are first clustered into several groups by an on-line clustering technique; structure identification is then performed on each group using support vector machines, so that the fuzzy rules are generated automatically from the support vectors. Time-varying learning rates are applied to update the membership functions of the fuzzy rules. The modeling errors are proven to be robustly stable with bounded uncertainties by a Lyapunov method and an input-to-state stability technique. Comparisons with related work are made on a real application, a crude oil blending process. The results demonstrate that our approach has good accuracy and is suitable for on-line fuzzy modeling.

4.
Many studies on developing technologies have been published as articles, papers, or patents. We use and analyze these documents to find scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. In general, documents are hard to analyze directly because document data are not suitable for statistical and machine learning methods. We therefore have to transform document data into structured data, and we use text mining techniques for this step. The resulting structured data are very sparse and hence difficult to analyze. This study proposes a new method to overcome the sparsity problem in document clustering. We build a combined clustering method using dimension reduction and K-means clustering based on support vector clustering and the Silhouette measure. In particular, we attempt to overcome sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California at Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
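
The Silhouette measure named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `silhouette_score`, the use of plain Euclidean distance, and the convention that singleton clusters score 0 are all assumptions made for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Illustrative assumption: a point in a singleton cluster scores 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

One possible use, under the same assumptions, is to run K-means for several candidate values of K and keep the K whose labelling gives the highest score; scores near 1 indicate compact, well-separated clusters.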

5.
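Multi-level support vector machine classification tree based on kernel clustering   Total citations: 2, self-citations: 0, citations by others: 2
This work studies SVM methods for multi-class pattern recognition. After comparing several commonly used multi-class SVM classification algorithms, it proposes a multi-level SVM classification tree based on kernel clustering. By combining unsupervised and supervised learning in the kernel space, the method yields a multi-level SVM classification tree with a simpler, clearer structure and higher computational efficiency, and it achieved good results in experiments.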

6.
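To address the difficulty of solving the transductive support vector machine (TSVM) learning model, a TSVM learning algorithm based on k-means clustering, TSVMKMC, is proposed. The algorithm uses k-means clustering to partition the unlabeled samples into several clusters, assigns the same class label to all samples in a cluster, and then merges the unlabeled and labeled samples for transductive learning. Because TSVMKMC effectively reduces the size of the state space, it runs much faster than traditional algorithms. Experimental results show that TSVMKMC reaches high classification accuracy quickly.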
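
The cluster-then-label step described in the abstract above can be sketched as follows. This is a hedged illustration, not the TSVMKMC algorithm itself: the abstract does not say how a cluster's label is chosen, so the rule used here (take the label of the labeled sample nearest the cluster centroid) is an assumption, as are the function names `kmeans` and `pseudo_label` and the deterministic seeding with the first `k` points.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster's members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign


def pseudo_label(labeled, unlabeled, k):
    """Give every unlabeled point the label of its cluster.

    Assumption (not from the abstract): a cluster takes the label of the
    labeled sample closest to its centroid.
    """
    centroids, assign = kmeans(unlabeled, k)
    cluster_label = [
        min(labeled, key=lambda item: math.dist(item[0], c))[1]
        for c in centroids
    ]
    return [(p, cluster_label[a]) for p, a in zip(unlabeled, assign)]
```

Under these assumptions, the merged training set would be `labeled + pseudo_label(labeled, unlabeled, k)`, which could then be handed to any SVM trainer.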

7.
This paper describes a new soft clustering algorithm in which each cluster is modelled by a one-class support vector machine (OC-SVM). The proposed algorithm extends a previously proposed hard clustering algorithm, also based on OC-SVM representation of clusters. The key building block of our method is the weighted OC-SVM (WOC-SVM), a novel tool introduced in this paper, based on which an expectation-maximization-type soft clustering algorithm is defined. A deterministic annealing version of the algorithm is also introduced, and shown to improve the robustness with respect to initialization. Experimental results show that the proposed soft clustering algorithm outperforms its hard clustering counterpart, namely in terms of robustness with respect to initialization, as well as several other state-of-the-art methods.

8.
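Support vector machines (SVMs) generalize well and are widely used in machine learning and pattern recognition. However, when the training set is large, training an SVM incurs very high time and space costs. On the other hand, the decision function obtained from SVM training depends only on the support vectors, so learning from the support vector set instead of the full training set can shorten training time without affecting the accuracy of the resulting classifier. A hybrid method is used to reduce the training data set and select potential support vectors, thereby lowering the time and space complexity of SVM training. Experimental results show that the algorithm greatly speeds up SVM training while largely preserving the generalization performance of the original classifier.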

9.
Support vector clustering involves three steps: solving an optimization problem, identifying clusters, and tuning hyper-parameters. In this paper, we introduce a pre-processing step that eliminates data points from the training data that are not crucial for clustering. Pre-processing is efficiently implemented using the R*-tree data structure. Experiments on real-world and synthetic datasets show that pre-processing drastically decreases the run-time of the clustering algorithm. In many cases, it also reduces the number of support vectors. Further, we suggest an improvement to the cluster identification step.

10.
11.
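Huang Huajuan  Wei Xiuxi  Zhou Yongquan 《CAAI Transactions on Intelligent Systems (智能系统学报)》2019,14(6):1271-1277
The traditional granular support vector machine (GSVM) granulates the training samples in the original space and then maps them to the kernel space, so the data distribution in the kernel space is inconsistent with that in the original space, which reduces the generalization ability of the GSVM. To address this problem, this paper proposes a granular support vector machine learning algorithm based on fuzzy kernel cluster granulation (fuzzy kernel cluster granular support vector machine, FKC-GSVM). FKC-GSVM uses fuzzy kernel clustering to partition the data into granules and select the support vector granules directly in the kernel space, and then trains the GSVM on the support vector granules in the same kernel space. Experiments on UCI data sets and NDC large-scale data show that, compared with several other algorithms, FKC-GSVM obtains more accurate solutions in less time.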

12.
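The label-mean semi-supervised support vector machine selects unlabeled samples at random when used for image classification, which leads to low classification accuracy and poor stability. To address this, a semi-supervised support vector machine algorithm based on cluster label means is proposed. The algorithm modifies the original algorithm's penalty term for unlabeled samples: it clusters the selected unlabeled samples and replaces the label mean with the cluster label mean. Experimental results show that a classifier trained with cluster label means greatly reduces misclassification between background and target, improves classification accuracy and stability, and is well suited to image classification.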

13.
Support vector regression (SVR) is a powerful tool in modeling and prediction tasks with widespread application in many areas. The most representative algorithms for training SVR models are Shevade et al.'s Modification 2 and Lin's WSS1 and WSS2 methods in the LIBSVM library. Both are variants of standard SMO in which the updating pairs selected are those that most violate the Karush-Kuhn-Tucker optimality conditions, to which LIBSVM adds a heuristic to improve the decrease in the objective function. In this paper, after presenting a simple derivation of the updating procedure based on a greedy maximization of the gain in the objective function, we show how cycle-breaking techniques that accelerate the convergence of support vector machines (SVM) in classification can also be applied under this framework, resulting in significantly improved training times for SVR.

14.
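For fault prediction in nonlinear time series, a method based on clustering and support vector machines is proposed. Normal time series are clustered with the K-means algorithm, while a time series prediction algorithm based on support vector regression produces the predicted sequence; faults are then predicted by comparing the similarity between the normal prototypes obtained from clustering and the predicted sequence. Simulation results show that the proposed method better meets real-time requirements and is also more accurate.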
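
The comparison between the predicted sequence and the normal prototypes can be sketched as a nearest-prototype distance test. The abstract does not specify the similarity measure or the decision rule, so the Euclidean distance, the fixed `threshold`, and the function name `is_fault` are assumptions; the predicted window is taken as given, since the regression model itself is outside this sketch.

```python
import math


def is_fault(predicted_window, normal_prototypes, threshold):
    """Flag a predicted window that is far from every normal prototype.

    Assumptions: similarity is Euclidean distance and `threshold` is
    chosen by the user, e.g. from distances seen on normal data.
    """
    nearest = min(
        math.dist(predicted_window, proto) for proto in normal_prototypes
    )
    return nearest > threshold
```

For example, with prototypes `[(1, 1, 1), (2, 2, 2)]` and a threshold of `0.5`, a predicted window of `(1.1, 0.9, 1.0)` stays within the normal range, while `(5, 5, 5)` is flagged.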

15.
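Because the purity of the output components of a simulated moving bed (SMB) is currently difficult to measure online in real time, an intelligent model of the system at cyclic steady state is built by combining normalized cut (Ncut) clustering with an incremental-learning support vector machine. The Ncut method is first used to cluster the data samples collected offline. The clustered samples are then fed to the support vector machine under a repeated memory reinforcement mechanism for reinforced incremental training. Finally, the original test samples are fed to the trained model for validation. The validation results show that the model achieves better fitness and validation accuracy, and the simulation results confirm the effectiveness of the method.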

16.
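Given the complexity of setting class labels for the hospitalization costs of gastric cancer patients and the limitations of traditional cost modeling algorithms, this paper proposes a hospitalization cost modeling algorithm based on clustering and support vector machines, providing a methodological basis for controlling and predicting these costs. A sample of 1583 gastric cancer patients treated at a Grade-A tertiary hospital in Ningxia between 2009 and 2011 was collected. K-means was applied to total hospitalization costs year by year to obtain class labels, and a support vector machine was then used to model and predict hospitalization costs and to analyze the influencing factors, with classification accuracy as the evaluation metric. The results show that hospitalization costs for gastric cancer patients increased year by year, with Western medicine costs the largest component at 53.74% of the total. Clustering costs by year with K-means improved classification accuracy by 13.13% over clustering on the cost distribution alone. The support vector machine model was most stable with a Gaussian kernel, penalty factor C = 10 and kernel parameter = 1, giving a classification accuracy of 92.11%. The results indicate that class labels obtained by year-wise clustering are more reasonable, and that an SVM combined with clustering predicts hospitalization costs more effectively.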
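
The year-by-year labelling step can be sketched with a one-dimensional K-means run separately for each year. This is an illustration under stated assumptions rather than the paper's procedure: the function names `kmeans_1d` and `label_costs_by_year`, the evenly spaced seeding, and the convention that class 0 is the lowest-cost cluster are all hypothetical, and the sample records in the usage note are made up.

```python
def kmeans_1d(values, k, iters=100):
    """1-D Lloyd's k-means; returns the centroids sorted ascending."""
    ordered = sorted(values)
    # Seed centroids at evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [
            sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def label_costs_by_year(records, k):
    """Map each (year, cost) record to a cost class found within its own year.

    Class 0 is the lowest-cost cluster of that year (an assumed convention).
    """
    by_year = {}
    for year, cost in records:
        by_year.setdefault(year, []).append(cost)
    centroids = {year: kmeans_1d(costs, k) for year, costs in by_year.items()}
    return [
        min(range(k), key=lambda c: abs(cost - centroids[year][c]))
        for year, cost in records
    ]
```

With made-up records such as `[(2009, 10.0), (2009, 50.0), (2010, 20.0), (2010, 90.0)]` and `k = 2`, a cost of 50.0 is labelled as the high class within 2009 even though it would sit closer to the low class on the 2010 scale, which is the effect that clustering within each year is meant to capture when costs rise over time.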

17.
Breast cancer is one of the most common cancers diagnosed in women. Large margin classifiers like the support vector machine (SVM) have been reported effective in computer-assisted diagnosis systems for breast cancers. However, since the separating hyperplane determination exclusively relies on support vectors, the SVM is essentially a local classifier and its performance can be further improved. In this work, we introduce a structured SVM model to determine if each mammographic region is normal or cancerous by considering the cluster structures in the training set. The optimization problem in this new model can be solved efficiently by being formulated as one second order cone programming problem. Experimental evaluation is performed on the Digital Database for Screening Mammography (DDSM) dataset. Various types of features, including curvilinear features, texture features, Gabor features, and multi-resolution features, are extracted from the sample images. We then select the salient features using the recursive feature elimination algorithm. The structured SVM achieves better detection performance compared with a well-tested SVM classifier in terms of the area under the ROC curve.

18.
19.
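Objective: To make image super-resolution algorithms more robust to singular data points, an image super-resolution reconstruction algorithm using K-means clustering and support vector data description (Kmeans-SVDD) is proposed. Methods: In training, the approximation subbands of the training images are first divided into several classes with K-means clustering; support vector data description is then used to remove the singular points from each class; finally, dictionaries for the approximation and detail subbands are trained in the wavelet domain with principal component analysis. In testing, based on the observation that the approximation subbands of high- and low-resolution images of the same scene are similar, the approximation subband of the low-resolution test image is first taken as the approximation subband of the corresponding high-resolution test image; the detail subbands of the high-resolution test image are then recovered from the trained dictionaries; finally, the high-resolution test image is obtained by the inverse wavelet transform. Results: Compared with the bicubic interpolation, Zeyde, ANR and Kmeans-PCA algorithms, the Kmeans-SVDD algorithm improves the average peak signal-to-noise ratio of the reconstructed high-resolution test images by 1.82 dB, 0.37 dB, 0.30 dB and 0.15 dB, respectively. Conclusion: Extensive experiments show that adding an SVDD step before dictionary training removes outliers and improves dictionary quality. Reconstructing each frequency band separately in the wavelet domain avoids the influence of unreliable high-frequency information contained in the low-frequency image on the super-resolution result, so that reliable high-frequency information is recovered.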

20.
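To address the high computational complexity of the support vector data description (SVDD) one-class classification method, a heuristic reduced support vector data description (HR-SVDD) method is proposed. A subset of samples is heuristically selected from the original training set to form a reduced training set, and the quadratic program is then solved on the reduced training set to obtain the support vectors and the decision boundary. The effectiveness of HR-SVDD is demonstrated through a discussion of the characteristics of SVDD with Gaussian kernels of different width coefficients. Experiments on artificial and real data sets show that HR-SVDD matches the classification accuracy of conventional SVDD while running faster and using less memory.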
