Network regression with predictive clustering trees
Authors: Daniela Stojanova, Michelangelo Ceci, Annalisa Appice, Sašo Džeroski
Affiliation: 1. Department of Knowledge Technologies, Jožef Stefan Institute, Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
2. Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro", via Orabona 4, 70125, Bari, Italy
3. Department of Knowledge Technologies, Jožef Stefan Institute, Jožef Stefan International Postgraduate School, Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Jamova cesta 39, 1000, Ljubljana, Slovenia
Abstract: Network data describe entities represented by nodes, which may be connected with (related to) each other by edges. Many network datasets are characterized by a form of autocorrelation, where the value of a variable at a given node depends on the values of variables at the nodes it is connected with. This phenomenon is a direct violation of the assumption that data are independently and identically distributed. At the same time, it offers a unique opportunity to improve the performance of predictive models on network data, as inferences about one entity can be used to improve inferences about related entities. Regression inference in network data is a challenging task. While many approaches for network classification exist, there are very few approaches for network regression. In this paper, we propose a data mining algorithm, called NCLUS, that explicitly considers autocorrelation when building regression models from network data. The algorithm is based on the concept of predictive clustering trees (PCTs) that can be used for clustering, prediction and multi-target prediction, including multi-target regression and multi-target classification. We evaluate our approach on several real-world problems of network regression, coming from the areas of social and spatial networks. Empirical results show that our algorithm performs better than PCTs learned by completely disregarding network information, PCTs that are tailored for spatial data but do not take autocorrelation into account, and a variety of other existing approaches.
This article is indexed in SpringerLink and other databases.