Similar documents (20 results)
1.
Chemoinformatics is a research field concerned with the study of physical or biological molecular properties through computer science's research fields such as machine learning and graph theory. From this point of view, graph kernels provide a convenient framework in which machine learning and graph theory techniques can be combined naturally. Graph kernels based on bags of patterns have proven efficient on several problems, both in terms of accuracy and computational time. The treelet kernel is a graph kernel based on a bag of small subtrees. In this paper we propose several extensions of this kernel devoted to chemoinformatics problems. These extensions aim to weight each pattern according to its influence, to include the comparison of non-isomorphic patterns, to include stereo information and, finally, to explicitly encode cyclic information into the kernel computation.
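As a minimal illustration of the bag-of-patterns idea behind such kernels, the Python sketch below counts simple star-shaped patterns (a node's label plus its neighbours' labels, a hypothetical stand-in for the paper's treelet enumeration) and compares two toy molecular graphs with a weighted linear kernel; the adjacency-dictionary format and the `weight` hook are assumptions, not the authors' implementation.

```python
from collections import Counter

def pattern_bag(adj, labels):
    """Count simple star-shaped patterns: a node's label plus the sorted
    multiset of its neighbours' labels (a stand-in for treelet enumeration)."""
    bag = Counter()
    for v, nbrs in adj.items():
        bag[(labels[v], tuple(sorted(labels[u] for u in nbrs)))] += 1
    return bag

def weighted_bag_kernel(g1, g2, weight=None):
    """Linear kernel on weighted pattern counts: k(G1, G2) = sum_p w(p) c1(p) c2(p)."""
    b1, b2 = pattern_bag(*g1), pattern_bag(*g2)
    w = weight or (lambda p: 1.0)
    return sum(w(p) * b1[p] * b2[p] for p in b1.keys() & b2.keys())

# Toy molecules as (adjacency, node-label) pairs; labels play the role of atom types.
g1 = ({0: [1, 2], 1: [0], 2: [0]}, {0: "C", 1: "O", 2: "O"})
g2 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "O", 1: "C", 2: "O"})
print(weighted_bag_kernel(g1, g2))
```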

2.
Computer-aided diagnosis of Alzheimer's disease that combines complex network analysis with machine learning has received increasing attention; such work usually describes brain activity with brain functional networks. However, most existing studies build the brain functional network by matching time-domain signals, ignoring the differences of brain activity across frequency bands. This paper therefore proposes an Alzheimer's disease diagnosis method based on a multi-frequency fusion graph kernel over brain networks. First, the images produced by functional magnetic resonance imaging are decomposed into frequency bands by wavelet transform. Second, the mutual information between every pair of brain regions is computed within each band and compared against a threshold to construct the multi-frequency brain network model. Then, a fusion graph kernel for this multi-frequency brain network model is proposed. Finally, based on the multi-frequency fusion graph kernel, a kernel extreme learning machine is used to diagnose Alzheimer's disease on a dataset obtained from the public ADNI (Alzheimer's Disease Neuroimaging Initiative) database and a dataset obtained from the public OASIS (Open Access Series of Imaging Studies) database. The influence of different parameter settings on the diagnostic results is also examined experimentally. Results on the two datasets show that the proposed multi-frequency fusion graph kernel achieves the best performance, improving diagnostic accuracy over the best of the compared methods by 13.79% and 15.29% on the two datasets, respectively.
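The sketch below illustrates only the per-band network construction step, assuming the band-limited regional time series from the wavelet decomposition are already available; the histogram-based mutual information estimate, the bin count and the fixed threshold are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_network(band_signals, threshold, bins=16):
    """Build one binary brain network for a single frequency band.

    band_signals: (n_regions, n_timepoints) array of band-limited time series.
    An edge links two regions when their estimated mutual information
    exceeds the threshold.
    """
    n = band_signals.shape[0]
    # Discretise each regional time series for a histogram-based MI estimate.
    digitised = [np.digitize(s, np.histogram_bin_edges(s, bins=bins)) for s in band_signals]
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if mutual_info_score(digitised[i], digitised[j]) > threshold:
                adj[i, j] = adj[j, i] = 1.0
    return adj

# Hypothetical input: 4 regions x 128 timepoints for one wavelet sub-band.
rng = np.random.default_rng(0)
band = rng.standard_normal((4, 128))
print(mi_network(band, threshold=0.1))
```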

3.
In the framework of online object retrieval with learning, we address the problem of graph matching using kernel functions. An image is represented by a graph of regions whose edges represent spatial relationships. Kernels on graphs are built from kernels on walks in the graph. This paper first proposes new kernels on graphs and on walks which are very efficient for graphs of regions. Second, we propose fast solutions for exact or approximate computation of these kernels. Third, we show results for the retrieval of images containing a specific object with the help of very few examples and counter-examples in the framework of an active retrieval scheme.
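For context, here is a generic labelled walk kernel computed on the direct product graph; it is not the specific region-graph kernels proposed in the paper, and the damping scheme and label-compatibility test are assumptions.

```python
import numpy as np

def product_adjacency(A1, labels1, A2, labels2):
    """Adjacency matrix of the direct (tensor) product graph, restricted to
    label-compatible node pairs. A1 and A2 are numpy adjacency matrices."""
    n1, n2 = len(labels1), len(labels2)
    pairs = [(i, j) for i in range(n1) for j in range(n2) if labels1[i] == labels2[j]]
    Ax = np.zeros((len(pairs), len(pairs)))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            Ax[a, b] = A1[i, k] * A2[j, l]
    return Ax

def walk_kernel(A1, labels1, A2, labels2, max_len=4, lam=0.1):
    """Count label-compatible common walks up to max_len, geometrically damped."""
    Ax = product_adjacency(A1, labels1, A2, labels2)
    k, power = 0.0, np.eye(len(Ax))
    for step in range(1, max_len + 1):
        power = power @ Ax
        k += (lam ** step) * power.sum()
    return k
```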

4.
The median graph has been presented as a useful tool to represent a set of graphs. Nevertheless, its computation is very complex, and existing algorithms are restricted to limited amounts of data. In this paper we propose a new approach to the computation of the median graph based on graph embedding. Graphs are embedded into a vector space and the median is computed in the vector domain. We have designed a procedure based on the weighted mean of a pair of graphs to go from the vector domain back to the graph domain in order to obtain a final approximation of the median graph. Experiments on three different databases containing large graphs show that we succeed in computing good approximations of the median graph. We have also applied the median graph to perform some basic classification tasks, achieving reasonably good results. These experiments on real data open the door to the application of the median graph to a number of more complex machine learning algorithms where a representative of a set of graphs is needed.
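A rough sketch of the embed-then-take-median idea, assuming some graph distance (e.g. an approximate graph edit distance) is available; mapping back by picking the closest set graph is a crude stand-in for the paper's weighted-mean-based reconstruction.

```python
import numpy as np

def embed(graphs, prototypes, graph_distance):
    """Dissimilarity embedding: each graph becomes its vector of distances
    to a set of prototype graphs."""
    return np.array([[graph_distance(g, p) for p in prototypes] for g in graphs])

def approx_median_graph(graphs, graph_distance, prototypes=None):
    """Median in the vector domain, then map back by picking the set graph
    whose embedding is closest to that median vector."""
    prototypes = prototypes or graphs
    X = embed(graphs, prototypes, graph_distance)
    median_vec = np.median(X, axis=0)          # coordinate-wise median in vector space
    best = np.argmin(np.linalg.norm(X - median_vec, axis=1))
    return graphs[best]
```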

5.
Nowadays, developing effective techniques able to deal with data coming from structured domains is becoming crucial. In this context, kernel methods are the state-of-the-art tool widely adopted in real-world applications that involve learning on structured data. Conversely, when one has to deal with unstructured domains, deep learning methods represent a competitive, or even better, choice. In this paper we propose a new family of kernels for graphs which exploits an abstract representation of the information inspired by the multilayer perceptron architecture. Our proposal combines the advantages of the two worlds: on one side, the potential of state-of-the-art graph node kernels; on the other, a multilayer architecture built from a series of stacked kernel pre-image estimators, trained in an unsupervised fashion via convex optimization. The hidden layers of the proposed framework are trained in a forward manner, which allows us to avoid the greedy layerwise training of classical deep learning. Results on real-world graph datasets confirm the quality of the proposal.
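A heavily simplified sketch of such a forward-trained, layer-by-layer kernel architecture, using kernel PCA as a stand-in for the paper's stacked kernel pre-image estimators; the layer sizes, the RBF kernel and the input features are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def stacked_kernel_layers(X, n_layers=2, n_components=8, kernel="rbf"):
    """Build hidden representations layer by layer, in a forward manner:
    each layer is an unsupervised kernel projection of the previous one."""
    layers, H = [], X
    for _ in range(n_layers):
        kpca = KernelPCA(n_components=n_components, kernel=kernel)
        H = kpca.fit_transform(H)
        layers.append(kpca)
    return layers, H

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 16))      # e.g. node-kernel features for 30 nodes
_, hidden = stacked_kernel_layers(X)
print(hidden.shape)
```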

6.
Most existing graph kernels focus on local properties of graphs, building similarity measures from local topological features while ignoring hierarchical structure information. To address this problem, this paper proposes a hierarchical graph kernel based on optimal transport. First, each graph is represented as a hierarchical graph structure; during its construction, the K-means clustering algorithm is used to construct the nodes of each layer, and probabilistic connections between nodes serve as edges. Then, entropy-regularized optimal transport is used to compute the optimal transport distance between the corresponding layers of two hierarchical structures. Finally, the hierarchical graph kernel based on optimal transport is computed from these optimal transport distances. Experiments on six real graph datasets show that the proposed method improves classification performance.
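The sketch below shows an entropy-regularised optimal transport (Sinkhorn) comparison of two layers, assuming each layer is given as an array of node feature vectors (e.g. K-means centroids); the uniform marginals and the RBF-of-distance mapping into a kernel are illustrative simplifications and are not guaranteed to produce a positive definite kernel in general.

```python
import numpy as np

def sinkhorn_distance(X, Y, eps=0.1, n_iter=200):
    """Entropy-regularised optimal transport cost between two node sets,
    with uniform marginals (a standard Sinkhorn iteration)."""
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2) ** 2   # squared-distance cost matrix
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]        # transport plan
    return float((P * C).sum())

def hierarchical_ot_kernel(layers1, layers2, gamma=1.0):
    """Turn per-layer OT distances into a similarity via an RBF on their sum."""
    d = sum(sinkhorn_distance(l1, l2) for l1, l2 in zip(layers1, layers2))
    return np.exp(-gamma * d)
```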

7.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-values has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting, we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion, which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL.
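A minimal sketch of the stability point: the Gaussian process posterior mean is obtained by solving the linear system through a QR factorization rather than an explicit inverse. It shows only batch mean prediction, not the paper's incremental update or variance computation, and the noise term is an assumption.

```python
import numpy as np

def gp_predict_qr(K_train, y, K_star, noise=1e-2):
    """GP posterior mean at test points, solving (K + sigma^2 I) alpha = y
    via QR factorisation instead of an explicit inverse, which is more
    stable when the kernel matrix is (nearly) rank-deficient."""
    A = K_train + noise * np.eye(len(y))
    Q, R = np.linalg.qr(A)
    alpha = np.linalg.solve(R, Q.T @ y)    # back-substitution on the triangular factor
    return K_star @ alpha                  # predictive mean for the test inputs

# K_train: kernel matrix between seen state-action pairs (e.g. from a graph kernel);
# K_star: kernel values between new and seen state-action pairs; y: observed Q-values.
```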

8.
Comprehensive, automated software testing requires an oracle to check whether the output produced by a test case matches the expected behaviour of the program. But the challenges in creating suitable oracles limit the ability to perform automated testing in some programs, and especially in scientific software. Metamorphic testing is a method for automating the testing process for programs without test oracles. This technique operates by checking whether the program behaves according to properties called metamorphic relations. A metamorphic relation describes the change in output when the input is changed in a prescribed way. Unfortunately, finding the metamorphic relations satisfied by a program or function remains a labour-intensive task, which is generally performed by a domain expert or a programmer. In this work, we propose a machine learning approach for predicting metamorphic relations that uses a graph-based representation of a program to represent control flow and data dependency information. In earlier work, we found that simple features derived from such graphs provide good performance. An analysis of the features used in that work led us to explore the effectiveness of several representations of those graphs using the machine learning framework of graph kernels, which provide various ways of measuring similarity between graphs. Our results show that a graph kernel that evaluates the contribution of all paths in the graph has the best accuracy, and that control flow information is more useful than data dependency information. The data used in this study are available for download at http://www.cs.colostate.edu/saxs/MRpred/functions.tar.gz to help researchers in further development of metamorphic relation prediction methods.
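To make the notion concrete, the snippet below uses a metamorphic relation (permutation invariance of a mean function) as a pseudo-oracle; it only illustrates what a metamorphic relation is, not the graph-kernel-based prediction studied in the paper.

```python
import random

def check_permutation_mr(func, xs, trials=20, tol=1e-9):
    """Metamorphic relation: permuting the input list must not change the output.
    Useful as a pseudo-oracle when the exact expected value is unknown."""
    base = func(xs)
    for _ in range(trials):
        shuffled = xs[:]
        random.shuffle(shuffled)
        if abs(func(shuffled) - base) > tol:
            return False
    return True

def mean(xs):
    return sum(xs) / len(xs)

print(check_permutation_mr(mean, [1.5, 2.0, 3.25, 7.0]))   # True: mean satisfies this relation
```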

9.
Graph-based representations have proved powerful in computer vision. The challenge that arises with large amounts of graph data is that of computationally burdensome edit distance computation. Graph kernels can be used to formulate efficient algorithms to deal with high-dimensional data, and have proved an elegant way to overcome this computational bottleneck. In this paper, we investigate whether the Jensen-Shannon divergence can be used as a means of establishing a graph kernel. The Jensen-Shannon kernel is a nonextensive information-theoretic kernel, and is defined using the entropy and mutual information computed from probability distributions over the structures being compared. To establish a Jensen-Shannon graph kernel, we explore two different approaches. The first of these is based on the von Neumann entropy associated with a graph. The second approach uses the Shannon entropy associated with the probability state vector for a steady-state random walk on a graph. We compare the two resulting graph kernels on the problem of graph clustering. We use kernel principal component analysis (kPCA) to embed graphs into a feature space. Experimental results reveal that the method gives good classification results on graphs extracted both from an object recognition database and from an application in bioinformatics.
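A sketch of the von Neumann entropy route, treating the trace-scaled Laplacian spectrum as a probability distribution; using the disjoint union of the two graphs as the composite structure and exponentiating the divergence are simplifying assumptions, not necessarily the construction used in the paper, and the graphs are assumed to have at least one edge.

```python
import numpy as np

def von_neumann_entropy(A):
    """Von Neumann entropy of the graph density matrix L / trace(L),
    for a numpy adjacency matrix A with at least one edge."""
    L = np.diag(A.sum(axis=1)) - A
    evals = np.linalg.eigvalsh(L)
    p = evals / evals.sum()                      # scaled eigenvalues as probabilities
    p = p[p > 1e-12]
    return float(-(p * np.log(p)).sum())

def js_kernel(A1, A2):
    """Jensen-Shannon style similarity: entropy of a composite (here, disjoint
    union) structure minus the mean of the individual entropies, exponentiated."""
    union = np.block([[A1, np.zeros((len(A1), len(A2)))],
                      [np.zeros((len(A2), len(A1))), A2]])
    js = von_neumann_entropy(union) - 0.5 * (von_neumann_entropy(A1) + von_neumann_entropy(A2))
    return float(np.exp(-js))
```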

10.
In this paper, we investigate the use of heat kernels as a means of embedding the individual nodes of a graph in a vector space. The reason for turning to the heat kernel is that it encapsulates information concerning the distribution of path lengths, and hence node affinities, on the graph. The heat kernel of the graph is found by exponentiating the Laplacian eigensystem over time. We explore how graphs can be characterized in a geometric manner using embeddings into a vector space obtained from the heat kernel. We explore two different embedding strategies. The first of these is a direct method in which the matrix of embedding co-ordinates is obtained by performing a Young–Householder decomposition on the heat kernel. The second method is indirect and involves performing a low-distortion embedding by applying multidimensional scaling to the geodesic distances between nodes. We show how the required geodesic distances can be computed using the parametrix expansion of the heat kernel. Once the nodes of the graph are embedded using one of the two alternative methods, we can characterize them in a geometric manner using the distribution of the node co-ordinates. We investigate several alternative methods of characterization, including spatial moments of the embedded points, the Laplacian spectrum of the Euclidean distance matrix, and scalar curvatures computed from the difference between geodesic and Euclidean distances. We experiment with the resulting algorithms on the COIL database.
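A small sketch of the direct embedding strategy: the heat kernel is built from the Laplacian eigensystem and node coordinates are recovered Young–Householder style, so that inner products of the coordinates reproduce the heat kernel; the diffusion time is an arbitrary assumption.

```python
import numpy as np

def heat_kernel(A, t=1.0):
    """Heat kernel H(t) = exp(-t L), computed from the Laplacian eigensystem."""
    L = np.diag(A.sum(axis=1)) - A
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-t * evals)) @ evecs.T

def heat_embedding(A, t=1.0):
    """Young-Householder style coordinates X with X @ X.T == H(t):
    one row of coordinates per graph node."""
    H = heat_kernel(A, t)
    evals, evecs = np.linalg.eigh(H)
    evals = np.clip(evals, 0.0, None)           # guard against tiny negative eigenvalues
    return evecs @ np.diag(np.sqrt(evals))
```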

11.
Recently, multiple kernel learning (MKL) has gained increasing attention due to its empirical superiority over traditional single-kernel methods. However, most state-of-the-art MKL methods are “uniform” in the sense that the relative weights of the kernels are kept fixed across all data. Here we propose a “non-uniform” MKL method with a data-dependent gating mechanism, i.e., the kernel weights are determined adaptively for each sample. We utilize a soft clustering algorithm and then tune the weights for each cluster under the graph embedding (GE) framework. The idea of exploiting cluster structure is based on the observation that data from the same cluster tend to perform consistently, which increases resistance to noise and results in more reliable estimates. Moreover, it is computationally simple to handle out-of-sample data, whose implicit RKHS representations are modulated by their posterior probabilities of belonging to each cluster. Quantitative studies comparing the proposed method with representative MKL methods are conducted on both synthetic and widely used public data sets. The experimental results validate its superiority.
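A simplified sketch of a data-dependent gating mechanism, assuming one base kernel per soft cluster (the paper instead tunes per-cluster kernel weights under the graph embedding framework); the Gaussian mixture used for soft clustering is an illustrative choice.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gated_kernel_matrix(X, base_kernels, n_clusters=3):
    """Sample-dependent kernel combination: the weight of base kernel c for a
    pair (i, j) is gamma_c(x_i) * gamma_c(x_j), where gamma is a soft cluster
    posterior, so different regions of the data favour different kernels."""
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X)
    gamma = gmm.predict_proba(X)                     # (n_samples, n_clusters)
    K = np.zeros((len(X), len(X)))
    for c, k_func in enumerate(base_kernels[:n_clusters]):
        K += np.outer(gamma[:, c], gamma[:, c]) * k_func(X, X)
    return K

# e.g. from sklearn.metrics.pairwise import rbf_kernel, linear_kernel, polynomial_kernel
#      K = gated_kernel_matrix(X, [rbf_kernel, linear_kernel, polynomial_kernel])
```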

12.
To address the problem that existing graph kernel methods in graph-based pattern recognition do not sufficiently mine the node features that reflect a graph's own topological structure, this paper proposes graph kernels based on space syntax and shortest paths. Borrowing space syntax theory from architecture and urban planning, a quantitative description of the topological features distributed over graph nodes is constructed. On this basis, a space syntax kernel and a shortest-path-based space syntax kernel are proposed that are representable, computable, positive definite and widely applicable, and inexact graph matching is then realized with support vector machines. Unlike other graph kernel methods, this approach has strong expressive power for graph topological features and good generality. Experimental results show that the designed graph kernels achieve significantly better classification accuracy than the shortest-path kernel.
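For reference, the sketch below implements the classic unlabelled shortest-path kernel used here as the baseline; the space-syntax node features themselves are not reproduced, and the histogram length cap is an assumption.

```python
import numpy as np

def all_pairs_shortest_paths(A):
    """Floyd-Warshall on an unweighted adjacency matrix."""
    n = len(A)
    D = np.where(A > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def shortest_path_kernel(A1, A2, max_len=10):
    """Unlabelled shortest-path kernel: count, for each path length, how many
    node pairs in each graph are at that distance, and take the dot product
    of the two histograms."""
    def histogram(A):
        D = all_pairs_shortest_paths(A)
        lengths = D[np.triu_indices_from(D, k=1)]
        lengths = lengths[np.isfinite(lengths)].astype(int)
        return np.bincount(lengths, minlength=max_len + 1)[: max_len + 1]
    return float(histogram(A1) @ histogram(A2))
```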

13.
Graph-based malware detection using dynamic analysis
We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated by the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is sent to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but rather use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.
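A minimal sketch of the trace-to-graph step: consecutive instructions in the trace are counted and normalised into Markov transition probabilities; the mnemonic-level trace is a hypothetical example, and the kernel and SVM stages are not shown.

```python
from collections import defaultdict

def trace_to_markov_graph(trace):
    """Turn a dynamically collected instruction trace into a Markov chain:
    vertices are instructions, edge weights are estimated transition
    probabilities between consecutive instructions in the trace."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: c / total for dst, c in dsts.items()}
    return graph

# Hypothetical x86 mnemonic trace.
trace = ["push", "mov", "call", "mov", "cmp", "jne", "mov", "call", "ret"]
print(trace_to_markov_graph(trace)["mov"])   # transition probabilities out of 'mov'
```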

14.
Graphs are a powerful and popular representation formalism in pattern recognition. Particularly in the field of document analysis they have found widespread application. From the formal point of view, however, graphs are quite limited in the sense that the majority of the mathematical operations needed to build common algorithms, such as classifiers or clustering schemes, are not defined. Consequently, we observe a severe lack of algorithmic procedures that can be applied directly to graphs. There exists recent work, however, aimed at overcoming these limitations. The present paper first provides a review of the use of graph representations in document analysis. Then we discuss a number of novel approaches suitable for making tools from statistical pattern recognition available to graphs. These novel approaches include graph kernels and graph embedding. With several experiments, using different data sets from the field of document analysis, we show that the new methods have great potential to outperform traditional procedures applied to graph representations.

15.
Recognizing characters extracted from natural scene images is quite challenging due to the high degree of intra-class variation. In this paper, we propose a multi-scale graph-matching-based kernel for scene character recognition. In order to capture the inherently distinctive structures of characters, each image is represented by several graphs associated with multi-scale image grids. The similarity between two images is thus defined as the optimum energy of matching two graphs (images), which finds the best match for each node in the graph while preserving spatial consistency across adjacent nodes. The computed similarity is suitable for constructing a kernel for a support vector machine (SVM). Multiple kernels acquired by matching graphs with multi-scale grids are combined so that the final kernel is more robust. Experimental results on the challenging Chars74k and ICDAR03-CH datasets show that the proposed method performs better than state-of-the-art methods.
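A sketch of the final fusion step only, assuming the per-scale graph-matching similarity matrices have already been computed; the uniform weighting is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_kernels(kernel_mats, weights=None):
    """Combine per-scale kernel matrices (each from graph matching at one grid
    resolution) into a single, more robust kernel by weighted averaging."""
    weights = weights or [1.0 / len(kernel_mats)] * len(kernel_mats)
    return sum(w * K for w, K in zip(weights, kernel_mats))

# K_scale2, K_scale3, K_scale4 would be precomputed n x n similarity matrices:
#   K_train = fuse_kernels([K_scale2, K_scale3, K_scale4])
#   clf = SVC(kernel="precomputed").fit(K_train, y_train)
```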

16.
Recognizing text in natural scenes is highly challenging because characters exhibit large intra-class variation. This paper proposes a scene character recognition method based on a multi-scale graph matching kernel. To exploit the structural features specific to characters, each image is represented as undirected graphs built on different grid partitions, and the similarity between two images is obtained as the optimal energy of the graph matching between their undirected graphs. Because graph matching also takes the spatial constraints between adjacent nodes into account when finding the best match for each node, it can cope with characters that exhibit a certain amount of deformation. The image similarity obtained by graph matching is well suited to constructing the kernel matrix of a support vector machine. The kernel matrices obtained under different grid partitions are combined by multiple kernel fusion, which makes the final kernel matrix more robust. Experimental results on the public scene text recognition datasets Chars74k and ICDAR03-CH show that this method achieves a higher single-character recognition rate than other published methods.

17.
In recent years, multiple kernel graph clustering (MKGC) has received extensive attention, because multiple kernel learning effectively avoids the selection of kernel functions and kernel parameters, while graph clustering can fully mine the complex structural information among samples. However, existing MKGC methods suffer from the following problems: graph learning techniques complicate the model; the high-rank nature of the graph Laplacian matrix makes it difficult to guarantee that the learned affinity graph contains exactly c connected components (the block-diagonal property); and most methods ignore the high-order structural information among the candidate affinity graphs, so the multiple kernel information is hard to exploit fully. To address these problems, a new MKGC method is proposed. First, a new upper-bound simplex projection graph learning method is proposed, which projects the kernel matrices directly onto the graph simplex and reduces computational complexity. Meanwhile, a new block-diagonal constraint is introduced so that the learned affinity graph maintains an exact block-diagonal property. In addition, low-rank tensor learning is introduced in the upper-bound simplex projection space to fully mine the high-order structural information of the multiple candidate affinity graphs. Compared with existing MKGC methods on multiple datasets, the proposed method has low computational cost and high stability, and shows clear advantages in clustering accuracy (ACC) and normalized mutual information (NMI).
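The snippet below sketches only the simplex-projection building block (the standard sort-based Euclidean projection onto the probability simplex) applied row-wise to a kernel matrix; the paper's upper-bound variant, block-diagonal constraint and low-rank tensor learning are not reproduced.

```python
import numpy as np

def project_to_simplex(v, s=1.0):
    """Euclidean projection of a vector onto the probability simplex
    {x : x >= 0, sum(x) = s} (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (s - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - s) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def kernel_to_affinity_graph(K):
    """Row-wise simplex projection of a kernel matrix: each row of the
    resulting affinity graph is a probability distribution over samples
    (self-affinity is not treated specially in this sketch)."""
    return np.array([project_to_simplex(row) for row in K])
```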

18.
Graphs are a flexible and general formalism providing rich models in various important domains, such as distributed computing, intelligent tutoring systems or social network analysis. In many cases, such models need to take changes in the graph structure into account, that is, changes in the number of nodes or in the graph connectivity. Predicting such changes can be expected to yield important insight into the underlying dynamics, e.g. user behaviour. However, predictive techniques have in the past focused almost exclusively on single edges or nodes. In this contribution, we attempt to predict the future state of a graph as a whole. We propose to phrase time series prediction as a regression problem and apply dissimilarity- or kernel-based regression techniques, such as 1-nearest neighbor, kernel regression and Gaussian process regression, which can be applied to graphs via graph kernels. The output of the regression is a point embedded in a pseudo-Euclidean space, which can be analyzed using subsequent dissimilarity- or kernel-based processing methods. We discuss strategies to speed up Gaussian process regression from cubic to linear time and evaluate our approach on two well-established theoretical models of graph evolution as well as two real data sets from the domain of intelligent tutoring systems. We find that simple regression methods, such as kernel regression, are sufficient to capture the dynamics in the theoretical models, but that Gaussian process regression significantly improves the prediction error for real-world data.
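A minimal sketch of the kernel regression (Nadaraya-Watson) predictor in this setting: the predicted future graph is a kernel-weighted average of known successor embeddings, i.e. a point in the (pseudo-)Euclidean output space; the interface is assumed, not taken from the paper.

```python
import numpy as np

def kernel_regression_predict(K_star, successor_embeddings):
    """Nadaraya-Watson prediction of the next graph's embedding.

    K_star: graph-kernel values between the query graph and the n training graphs.
    successor_embeddings: (n, d) embeddings of each training graph's successor.
    Assumes at least one strictly positive kernel value.
    """
    w = K_star / K_star.sum()                 # normalised kernel weights
    return w @ successor_embeddings           # point in the embedding space
```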

19.
In pattern recognition and related fields, graph-based representations offer a versatile alternative to the widely used feature vectors. Therefore, an emerging trend of representing objects by graphs can be observed. This trend is intensified by the development of novel approaches in graph-based machine learning, such as graph kernels or graph-embedding techniques. These procedures overcome a major drawback of graphs, namely the serious lack of algorithms that can be applied to them for classification. This paper is inspired by the idea of representing graphs through dissimilarities and extends our previous work to the more general setting of Lipschitz embeddings. In an experimental evaluation, we empirically confirm that classifiers that rely on the original graph distances can be outperformed by a classification system using the Lipschitz-embedded graphs.
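A small sketch of a Lipschitz embedding under the usual definition: each coordinate of a graph is its minimum distance to one reference set of prototype graphs; the choice of reference sets and of the underlying graph distance are assumptions.

```python
import numpy as np

def lipschitz_embed(graphs, reference_sets, graph_distance):
    """Lipschitz embedding: coordinate k of a graph is its distance to the
    k-th reference set, i.e. the minimum distance to any prototype in it."""
    return np.array([[min(graph_distance(g, p) for p in ref)
                      for ref in reference_sets]
                     for g in graphs])

# graph_distance would typically be a (possibly approximate) graph edit distance;
# reference_sets are small subsets of prototype graphs selected from the training data.
```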

20.
We provide polynomial-time data reduction rules for Connected Dominating Set on planar graphs and analyze them to obtain a linear kernel for the planar Connected Dominating Set problem. To obtain the desired kernel we introduce a method that we call reduce or refine. Our kernelization algorithm analyzes the input graph and either finds an appropriate reduction rule that can be applied, or zooms in on a region of the graph which is more amenable to reduction. We find this method of independent interest and believe that it will be useful for obtaining linear kernels for other problems on planar graphs.
