gMLC: a multi-label feature selection framework for graph classification |
| |
Authors: | Xiangnan Kong, Philip S. Yu |
| |
Affiliation: | (1) AIST Computational Biology Research Center Tokyo, Tokyo, Japan |
| |
Abstract: | Graph classification has shown critical importance in a wide variety of applications, e.g. drug activity prediction
and toxicology analysis. Current research on graph classification focuses on single-label settings. However, in many applications,
each graph can be assigned a set of multiple labels simultaneously. Extracting good features using the multiple labels
of the graphs is an important step before graph classification. In this paper, we study the problem of multi-label feature
selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features
for graph objects with multiple labels. Unlike existing feature selection methods in vector spaces that assume the
feature set is given, we perform multi-label feature selection for graph data in a progressive way together with the subgraph
feature mining process. We derive an evaluation criterion to estimate the dependence between subgraph features and multiple
labels of graphs. Then, a branch-and-bound algorithm is proposed to efficiently search for optimal subgraph features by judiciously
pruning the subgraph search space using multiple labels. Empirical studies demonstrate that our feature selection approach
can effectively boost multi-label graph classification performance and is more efficient because it prunes the subgraph search
space using multiple labels. |
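
The abstract does not spell out the evaluation criterion, so the following is only a minimal sketch of one plausible form: an HSIC-style dependence score between binary subgraph-occurrence features and a multi-label matrix, using a linear label kernel. The function names, the kernel choice, and the toy data are assumptions made for illustration and are not taken from the paper; the branch-and-bound pruning step is not shown.

```python
import numpy as np

def dependence_scores(F, Y):
    """Score each candidate feature by its dependence on the label sets.

    F: (n_graphs, n_features) 0/1 matrix, F[i, j] = 1 if subgraph j occurs in graph i.
    Y: (n_graphs, n_labels) 0/1 matrix of label assignments.
    Assumed criterion (illustrative): f^T H (Y Y^T) H f with H = I - 11^T / n,
    which equals the squared norm of (H Y)^T f.
    """
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)          # centering the labels is the same as applying H
    P = Yc.T @ F                     # (n_labels, n_features) projections
    return (P ** 2).sum(axis=0)      # one score per feature

def select_top_k(F, Y, k):
    """Return indices of the k highest-scoring features (hypothetical helper)."""
    scores = dependence_scores(F, Y)
    return np.argsort(-scores)[:k]

# Toy data: 4 graphs, 2 labels, 2 candidate subgraph features.
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
F = np.array([[1, 1], [1, 1], [0, 1], [0, 1]])  # feature 1 occurs in every graph
scores = dependence_scores(F, Y)
best = select_top_k(F, Y, 1)
```

In this toy setting the feature that occurs in every graph gets a score of zero, since a constant feature carries no information about the labels, while the feature aligned with the label split scores highest.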
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|