Similar Literature
1.
The median graph has been presented as a useful tool for representing a set of graphs. However, computing it is very complex, and existing algorithms are restricted to small amounts of data. In this paper we propose a new approach to computing the median graph based on graph embedding. Graphs are embedded into a vector space and the median is computed in the vector domain. We have designed a procedure based on the weighted mean of a pair of graphs to go from the vector domain back to the graph domain and obtain a final approximation of the median graph. Experiments on three different databases containing large graphs show that we succeed in computing good approximations of the median graph. We have also applied the median graph to some basic classification tasks, achieving reasonably good results. These experiments on real data open the door to applying the median graph in a number of more complex machine learning algorithms where a representative of a set of graphs is needed.
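Once graphs are embedded, "computing the median in the vector domain" can be illustrated with the geometric median of the embedding vectors, approximated by Weiszfeld's algorithm. This is a minimal sketch, assuming the embeddings are already given as equal-length lists of floats; whether the paper uses exactly this median, and how it maps back to a graph via weighted means, is not reproduced here.

```python
import math
from typing import List, Sequence


def geometric_median(points: List[Sequence[float]], iters: int = 200, eps: float = 1e-9) -> List[float]:
    """Approximate the geometric median of embedded graphs with Weiszfeld's algorithm.

    Each element of `points` is a (hypothetical) vector embedding of one graph.
    """
    dim = len(points[0])
    # Start from the arithmetic mean of the embeddings.
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        denom = 0.0
        for p in points:
            dist = math.dist(p, current)
            if dist < eps:
                # Estimate landed on a data point; return it (simple handling of
                # the singular case, a more careful variant would test optimality).
                return list(p)
            w = 1.0 / dist  # closer embeddings get larger weights
            for d in range(dim):
                num[d] += w * p[d]
            denom += w
        nxt = [v / denom for v in num]
        if math.dist(nxt, current) < eps:
            return nxt
        current = nxt
    return current
```

Unlike the arithmetic mean, the geometric median is robust to a few outlying graphs, which is one reason median-style representatives are attractive.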

2.
Credit scoring aims to assess the risk associated with lending to individual consumers. Recently, ensemble classification has become popular in this field. However, most studies use random sampling to generate the training subsets for the base classifiers, so the diversity of those classifiers is not guaranteed, which may degrade overall classification performance. In this paper, we propose an ensemble classification approach based on supervised clustering for credit scoring. Supervised clustering partitions the samples of each class into a number of clusters, and clusters from different classes are then combined pairwise to form training subsets. A base classifier is built on each training subset. To predict the label of a sample, the outputs of the base classifiers are combined by weighted voting, where each classifier's weight is determined by its classification performance in the neighborhood of that sample. In the experimental study, two benchmark credit data sets are used for performance evaluation, and an industrial case study is conducted. The results show that, compared with other ensemble classification methods, the proposed approach generates base classifiers with higher diversity and local accuracy and improves the accuracy of credit scoring.
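The combination rule weights each base classifier by how well it performs near the sample being scored. The sketch below is a minimal illustration, assuming base classifiers are plain callables, the neighborhood is the k nearest validation samples under Euclidean distance, and the weight is the fraction of those neighbors the classifier labels correctly; the supervised-clustering step that builds the training subsets is omitted, and `local_accuracy_vote` is a hypothetical name.

```python
import math
from collections import defaultdict
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]


def local_accuracy_vote(
    x: Sequence[float],
    classifiers: List[Classifier],
    val_X: List[Sequence[float]],
    val_y: List[int],
    k: int = 5,
) -> int:
    """Predict a label for x by weighted voting with neighborhood-accuracy weights."""
    # Indices of the k validation samples closest to x.
    neighbors = sorted(range(len(val_X)), key=lambda i: math.dist(x, val_X[i]))[:k]
    scores = defaultdict(float)
    for clf in classifiers:
        # Weight = local accuracy of this classifier around x.
        correct = sum(1 for i in neighbors if clf(val_X[i]) == val_y[i])
        weight = correct / len(neighbors)
        scores[clf(x)] += weight
    # Label with the largest accumulated weight (ties go to the smaller label).
    return max(sorted(scores), key=lambda label: scores[label])
```

With one accurate classifier and two that always predict the same class, plain majority voting follows the two weak voters, whereas local-accuracy weighting gives them zero weight wherever they are wrong.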

3.
In this paper, we consider a vector optimization problem in which all functions involved are defined on Banach spaces. New classes of generalized type-I functions are introduced for functions between Banach spaces. Based on these generalized type-I functions, we obtain several sufficient optimality conditions and prove some duality results.

4.
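Application of an Improved K-Means Clustering Algorithm in Support Vector Machines
An improved K-means clustering algorithm is applied to the training of support vector machines (SVMs). Based on this improved clustering algorithm, an incremental SVM training procedure is designed, and a method for deleting useless samples during training is given. Pattern classification experiments show that applying the improved K-means algorithm to SVMs not only greatly shortens SVM training time but also further improves classification ability.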

5.
The Semantic Web is distributed yet interoperable: distributed because resources are created and published by a variety of producers, tailored to their specific needs and knowledge; interoperable because entities are linked across resources, allowing resources from different providers to be used in concert. Complementary to the explicit use of Semantic Web resources, embedding methods have made them applicable to machine learning tasks. Subsequently, embedding models for numerous tasks and structures have been developed, and embedding spaces for various resources have been published. The ecosystem of embedding spaces is distributed but not interoperable: entity embeddings are not readily comparable across different spaces. To parallel the Web of Data with a Web of Embeddings, we must therefore integrate the available embedding spaces into a uniform space. Current integration approaches are limited to two spaces and presume that both were embedded with the same method; both assumptions are unlikely to hold in a Web of Embeddings. In this paper, we present FedCoder, an approach that integrates multiple embedding spaces via a latent space. We enforce that linked entities have similar representations in the latent space, so that entities become comparable across embedding spaces. FedCoder employs an autoencoder to learn this latent space from both linked and non-linked entities. Our experiments show that FedCoder substantially outperforms state-of-the-art approaches when faced with different embedding models, that it scales better than previous methods in the number of embedding spaces, and that it improves as more graphs are integrated, while performing comparably to current approaches that assume joint learning of the embeddings and are usually limited to two sources. Our results demonstrate that FedCoder is well suited to integrating the distributed, diverse, and large ecosystem of embedding spaces into an interoperable Web of Embeddings.

6.
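To address the difficulty of solving the transductive support vector machine (TSVM) learning model, a transductive SVM learning algorithm based on k-means clustering, TSVMKMC, is proposed. The algorithm uses k-means clustering to partition the unlabeled samples into several clusters, assigns the same class label to all samples in each cluster, and then combines the unlabeled and labeled samples for transductive learning. Because TSVMKMC effectively reduces the size of the state space, it runs much faster than conventional algorithms. Experimental results show that TSVMKMC reaches high classification accuracy quickly.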
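The key step in TSVMKMC is that every unlabeled sample in a k-means cluster receives one shared label before transductive training. The abstract does not say how that label is chosen; the sketch below assumes it is the label of the labeled sample nearest to the cluster centroid. It uses plain Lloyd's k-means with the first k points as initial centroids, omits the TSVM training itself, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [_nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            # Mean of the cluster; keep the old centroid if the cluster is empty.
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment consistent with the returned centroids.
    return centroids, [_nearest(p, centroids) for p in points]


def label_unlabeled_by_cluster(
    unlabeled: List[Point], labeled_X: List[Point], labeled_y: List[int], k: int
) -> List[int]:
    """Give every unlabeled sample in a cluster the same label.

    Assumption: a cluster takes the label of the labeled sample nearest its centroid.
    """
    centroids, assign = kmeans(unlabeled, k)
    anchors = [list(x) for x in labeled_X]
    cluster_label = [labeled_y[_nearest(c, anchors)] for c in centroids]
    return [cluster_label[a] for a in assign]
```

Labeling whole clusters instead of individual samples is what shrinks the search space: the transductive step only has to decide k cluster labels rather than one label per unlabeled sample.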

7.
Graph-based representations are widely used in pattern recognition. However, they have a major drawback with regard to the processing tools available in their domain. Graph embedding into vector spaces is a growing field in the structural pattern recognition community that aims to provide a feature vector representation for every graph, thereby enabling classical statistical learning machinery to be applied to graph-based input patterns. In this work, we propose a novel embedding methodology for graphs with continuous node attributes and unattributed edges. The approach is based on statistics of the node labels and of the edges between them, computed from the labels' similarity to a set of representatives. We specifically address an important issue of this methodology, namely the selection of a suitable set of representatives. In an experimental evaluation on several graph databases, we empirically show the advantages of this approach on different classification problems.
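The embedding turns node attributes and edges into counts relative to a set of representatives. A minimal sketch of that idea, assuming each node is hard-assigned to its nearest representative (Euclidean distance), the first part of the vector counts nodes per representative, and the second counts undirected edges per unordered pair of representatives; the paper's exact similarity function and its representative-selection strategy are not reproduced.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple


def embed_graph(
    node_attrs: List[Sequence[float]],
    edges: List[Tuple[int, int]],
    representatives: List[Sequence[float]],
) -> List[int]:
    """Map a graph with continuous node attributes to a fixed-length count vector."""
    n_reps = len(representatives)
    # Hard-assign each node to the closest representative.
    nearest = [
        min(range(n_reps), key=lambda r: math.dist(attr, representatives[r]))
        for attr in node_attrs
    ]
    # Node statistics: how many nodes fall on each representative.
    node_counts = [0] * n_reps
    for r in nearest:
        node_counts[r] += 1
    # Edge statistics: one slot per unordered representative pair (i <= j).
    pairs = list(combinations_with_replacement(range(n_reps), 2))
    pair_index = {p: idx for idx, p in enumerate(pairs)}
    edge_counts = [0] * len(pairs)
    for u, v in edges:
        a, b = sorted((nearest[u], nearest[v]))
        edge_counts[pair_index[(a, b)]] += 1
    return node_counts + edge_counts
```

Every graph, whatever its size, maps to a vector of length n + n(n+1)/2 for n representatives, which is why the choice of representatives matters so much.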

8.
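The shortcomings of support vector machines in solving unsupervised classification problems are analyzed, and a new maximum-margin clustering method based on the SVM idea is proposed. Experimental results show that the algorithm successfully solves many unsupervised classification problems.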

9.
Searching and mining biomedical literature databases are common ways for biomedical researchers to generate scientific hypotheses. Clustering can help researchers form hypotheses by effectively extracting valuable information from grouped documents. Although a large number of clustering algorithms are available, this paper attempts to answer which algorithm is best suited to accurately clustering biomedical documents. Non-negative matrix factorization (NMF) has been widely applied to clustering general text documents; however, its clustering results are sensitive to the initial values of its parameters. To overcome this drawback, we present ensemble NMF for clustering biomedical documents. Its performance was evaluated on numerous datasets generated from the TREC Genomics track dataset. On most datasets, the experimental results demonstrate that ensemble NMF significantly outperforms the classical clustering algorithms bisecting K-means and hierarchical clustering. We compared four different methods for constructing an ensemble NMF. For clustering biomedical documents, this research is the first to compare ensemble NMF with typical classical clustering algorithms, and it validates ensemble NMF constructed from different graph-based ensemble algorithms. It is also the first work to apply ensemble NMF with the Hybrid Bipartite Graph Formulation to clustering biomedical documents.

10.
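曹晓莉, 江朝元, 甘思源. Journal of Computer Applications (《计算机应用》), 2008, 28(10): 2648-2651
For condition monitoring and fault diagnosis of marine sewage treatment units, a clustering-SVM fault diagnosis model is proposed. The model first uses a neural-network clustering algorithm to partition the space of monitored equipment-state samples into normal and abnormal subspaces, and then builds a multi-class support vector machine on the abnormal subspace to diagnose and identify faults. This avoids blind fault classification and improves classification performance. Training and testing on samples measured from a marine sewage treatment unit show that the algorithm generalizes well.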

11.
Although many multi-view clustering approaches have been developed recently, a common shortcoming of most of them is that they rely on the original feature space or treat the two components of similarity-based clustering (similarity matrix construction and cluster indicator matrix calculation) separately, which may hurt clustering performance. To address this, we propose a new method termed Multi-view Clustering in Latent Embedding Space (MCLES), which jointly recovers a comprehensive latent embedding space, a robust global similarity matrix, and an accurate cluster indicator matrix in a unified optimization framework, in which the variables reinforce one another to reach the optimal solution. To avoid solving a quadratic programming problem, we further relax the constraint on the global similarity matrix, yielding an improved version termed Relaxed Multi-view Clustering in Latent Embedding Space (R-MCLES). Compared with MCLES, R-MCLES has lower computational complexity and captures more correlations between pairs of data points. Extensive experiments on both image and document datasets demonstrate the superiority of the proposed methods over the state of the art.

12.
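To address the sensitivity of traditional image segmentation algorithms to different types of noise, the SFCM algorithm (fuzzy C-means clustering algorithm based on the space distance of the nearest pixels) adopts a kernelized spatial distance formula that converts point-to-point distances into point-to-space distances. This balances the gray-level information and the position information of the pixels neighboring the pixel under consideration, and further overcomes the shortcoming that neighboring pixels at different positions influence the considered pixel differently. Extensive experiments on synthetic and natural images, with comparisons against several traditional algorithms, show strong noise resistance, improved clustering accuracy, and good preservation of edges and other details of the original image, demonstrating strong robustness.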

13.
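张烨, 田雯, 刘盛鹏. Computer Engineering (《计算机工程》), 2012, 38(24): 152-155
Combining ensemble empirical mode decomposition (EEMD), multivariate phase-space reconstruction, and a nonlinear support vector regression (SVR) model, a combined forecasting method for the time series of fire counts is proposed. EEMD decomposes the non-stationary fire time series into a set of intrinsic mode components at different scales; multivariate phase-space reconstruction is applied to each component to build its training data; a nonlinear SVR forecasting model is built for each component from the reconstructed data; and an SVR ensemble forecast is used to predict the fire time series. Simulation results show that the method achieves higher forecasting accuracy than univariate phase-space reconstruction and plain SVR.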
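Before SVR can be trained on each decomposed component, the series must be turned into input/target pairs. A minimal sketch of multivariate phase-space (time-delay) reconstruction, assuming every input series shares one embedding dimension `m` and delay `tau`, and that the target is the next value of the first series; the EEMD step and the SVR models are omitted, and `delay_embed` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def delay_embed(
    series: List[Sequence[float]], m: int, tau: int
) -> Tuple[List[List[float]], List[float]]:
    """Build SVR training pairs from one or more aligned time series.

    For time t, the input concatenates x_i(t), x_i(t - tau), ..., x_i(t - (m - 1) * tau)
    for every series i; the target is series[0][t + 1].
    """
    n = len(series[0])
    start = (m - 1) * tau  # first index with a full delay window
    X, y = [], []
    for t in range(start, n - 1):
        row = []
        for s in series:
            row.extend(s[t - j * tau] for j in range(m))
        X.append(row)
        y.append(series[0][t + 1])
    return X, y
```

For example, `delay_embed([[1, 2, 3, 4, 5]], m=2, tau=1)` yields inputs `[[2, 1], [3, 2], [4, 3]]` with targets `[3, 4, 5]`; passing additional aligned series simply appends their delay vectors to each row, which is the multivariate part.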

14.
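A spatial-vector genetic clustering method is presented for analyzing multi-parameter data from marine environmental monitoring. The method clusters sampling points by parameters such as temperature, salinity, pH, and dissolved oxygen (DO) and projects the clustering result onto the feature space of the monitoring parameters, so that the multi-parameter data of the sampling points in a monitored area over a given period can be analyzed visually in that feature space and the water quality at each sampling point determined. Cluster analysis of sampling-point data from different periods also makes it possible to judge trends in seawater change in the monitored area. The method not only uncovers correlations among sampling-point data but also makes the analysis of multi-point, multi-parameter data intuitive and clear, improving the analysis of marine environmental monitoring data.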

15.
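A Fuzzy Method for Web Service Discovery Based on Vector Space
彭敦陆, 周傲英. Journal of Computer Applications (《计算机应用》), 2006, 26(9): 2009-2012
Web services have gradually become an important distributed computing paradigm. Based on an analysis of existing Web service description documents, a fuzzy-set-based algorithm for selecting service feature term sets and a method for generating a Web service vector space are proposed. Web services are fuzzy-clustered in the generated vector space, and on this basis a fuzzy method for Web service discovery in the vector space is given. The method uses only existing Web service description information, which ensures the effectiveness of service discovery.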
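Once service descriptions are mapped into a vector space, discovery reduces to comparing a request vector with the service vectors. A minimal sketch, assuming raw term-frequency vectors over whitespace-separated tokens and cosine similarity; the paper's fuzzy feature-term selection and fuzzy clustering are not reproduced, and the service names in the usage example are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_vector(text: str) -> Counter:
    """Term-frequency vector of a lower-cased, whitespace-tokenized description."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_services(query: str, services: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank services by cosine similarity between the query and each description."""
    q = tf_vector(query)
    scored = [(name, cosine(q, tf_vector(desc))) for name, desc in services.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For instance, ranking the query "weather forecast" against {"WeatherService": "get weather forecast by city", "StockQuote": "get stock quote by symbol"} puts WeatherService first and gives StockQuote a score of 0.0. A fuzzy variant would replace the raw counts with membership degrees, which this sketch does not attempt.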

16.
Manufacturing processes usually exhibit mixed operational conditions (OCs) due to changes in process, tool, or equipment health status. Undesired OCs directly cause out-of-control production and therefore need to be identified. Data-driven OC identification has been widely used to recognize undesired OCs, yet most such methods require labels indicating the OCs for model training. In industrial applications, such labels are rarely available because of delays, incompleteness, or physical constraints in data collection. A typical case is thermal images acquired by in-process infrared cameras and pyrometers, which contain rich information about process health status but are unlabeled. To enable data-driven OC identification from unlabeled thermal images, this study proposes a feature extraction-clustering framework that characterizes the heat-affected zone by its temperature profile and performs ensemble clustering on the extracted features to label the data. Domain knowledge from the manufacturing plant is incorporated into the framework to map cluster labels to OCs. Both offline OC recovery and online OC identification are studied. Thermal images from hot stamping in automotive manufacturing are used to demonstrate and validate the proposed method, and the case study results support its feasibility, effectiveness, and generality.
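Ensemble clustering combines several base clusterings into one labeling. One standard combiner is the co-association matrix: count how often each pair of samples shares a cluster, then link pairs that co-occur in more than half of the runs. This is a minimal sketch, assuming the base clusterings are already computed label lists; the paper's feature extraction, its specific ensemble method, and the mapping of clusters to operational conditions are not reproduced.

```python
from typing import List


def consensus_labels(clusterings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Combine base clusterings via a co-association matrix and connected components."""
    n = len(clusterings[0])
    runs = len(clusterings)
    # co[i][j] = fraction of base clusterings that put samples i and j together.
    co = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / runs
    # Connected components over pairs whose co-association exceeds the threshold.
    consensus = [-1] * n
    next_label = 0
    for seed in range(n):
        if consensus[seed] != -1:
            continue
        stack = [seed]
        consensus[seed] = next_label
        while stack:
            i = stack.pop()
            for j in range(n):
                if consensus[j] == -1 and co[i][j] > threshold:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus
```

Because only "same cluster or not" is recorded, the base clusterings may use arbitrary, inconsistent label numbers; for example, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` count as the same partition.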

17.
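The choice of the Gaussian kernel parameter σ directly affects the classification performance of a Gaussian-kernel support vector machine. By combining a clustering method with minimum-distance classification, an optimization algorithm is constructed that effectively determines σ. A Gaussian-kernel SVM is then used to classify the test set, and classification accuracy is used to judge how well σ was chosen. Experiments show that the method suits a wide range of data types, generalizes well, and effectively improves classification.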
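For reference, σ is the width parameter of the Gaussian (RBF) kernel, shown here in its common parameterization. A small σ makes the kernel decay quickly with distance (a more local, more flexible decision boundary), while a large σ smooths the boundary. For example, with ‖x − y‖² = 2 and σ = 1, the kernel value is e^{-1} ≈ 0.368. How the clustering-plus-minimum-distance procedure picks σ is not described in the abstract and is not reproduced here.

$$
K(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}{2\sigma^{2}}\right)
$$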

18.
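胡磊, 牛秦洲, 陈艳. Journal of Computer Applications (《计算机应用》), 2013, 33(4): 991-993
To address the low accuracy and high resource consumption of traditional repeated clustering algorithms, an enhanced clustering algorithm combining fuzzy C-means (FCM) and support vector machines (SVMs) is proposed. The algorithm first uses FCM to coarsely partition the instance data set into C classes and then uses SVMs to refine the classification within each class; the implementation introduces a decision-cascade SVM model based on a complete binary tree to achieve enhanced clustering. Because new features may appear during iterative FCM clustering and unbalance the existing clusters, the data set is preprocessed by partitioning to eliminate this adverse effect. Comparative experiments on the real Iris data set show that the algorithm overcomes the low-accuracy problem, saves system resources, and improves clustering quality.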
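The first stage coarsely partitions the data into C classes with FCM. A minimal sketch of standard FCM, alternating the membership update and the weighted-centroid update with fuzzifier m = 2, assuming deterministic initial centers taken from the first C points; the SVM refinement stage, the binary-tree cascade, and the partition-based preprocessing are not shown.

```python
import math
from typing import List, Sequence, Tuple


def fuzzy_c_means(
    points: List[Sequence[float]],
    c: int,
    m: float = 2.0,
    iters: int = 100,
    tol: float = 1e-6,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard fuzzy C-means; returns (centers, u) with u[j][i] = membership of point j in cluster i."""
    n, dims = len(points), len(points[0])
    centers = [list(p) for p in points[:c]]  # assumed deterministic initialization
    exponent = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                # Point coincides with a center: full membership in that cluster.
                row = [0.0] * c
                row[d.index(0.0)] = 1.0
            else:
                row = [1.0 / sum((d[i] / d[k]) ** exponent for k in range(c)) for i in range(c)]
            u.append(row)
        # Center update: mean of the points weighted by u ** m.
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(centers[i])  # no support: keep the old center
                continue
            new_centers.append([sum(w[j] * points[j][dd] for j in range(n)) / total for dd in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Each membership row sums to 1, so a point near a cluster boundary is shared between clusters instead of being forced into one; those ambiguous points are the natural candidates for a second, SVM-based refinement.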

19.
引入事务的恢复机制改进Kmeans算法,改进后的算法允许在运行过程中的任何时刻停机,重新启动后可在停机前运算成果的基础上继续运算,直至算法结束。改进后的算法使得普通机器条件下针对大数据集运用Kmeans算法成为可能。改进后的算法在长达400 h的聚类运算中得到了检验。  相似文献   

20.
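Starting from a multidimensional vector model of the pixel color space, an improved fuzzy C-means clustering algorithm is used to segment images, yielding a set of feature-region vectors in the image pixel space. A feature-vector similarity measure is then used to compute image similarity and compare two images, thereby achieving image recognition. Experiments verifying the image-similarity recognition show that the fuzzy clustering method based on the multidimensional vector model has some practical value in image recognition.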
