
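A High Dimensional Causal Inference Algorithm Based on CDC
Cite this article: LI Hong-fei, WAN Ya-ping, YANG Xiao-hua, GENG Jia-xing. A High Dimensional Causal Inference Algorithm Based on CDC[J]. Computer Technology and Development, 2020(1): 38-43.
Authors: LI Hong-fei  WAN Ya-ping  YANG Xiao-hua  GENG Jia-xing
Affiliation: School of Computer Science, University of South China; CNNC Key Laboratory on High Trusted Computing
Funding: Innovation Special Zone Project of the Science and Technology Commission of the Central Military Commission (17-163-15-XJ-002-002-04); National Natural Science Foundation of China (11805093); Hunan Provincial Education Key Project (17A185); Natural Science Foundation of Hunan Province (2019JJ0486)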
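Abstract: Inferring the causal relationship between a pair of observed variables is a fundamental problem in science, and methods that propose causal relationships from the analysis of observational data have practical value for generating hypotheses and accelerating scientific discovery. Learning causal network structure from high-dimensional data with traditional causal inference algorithms, and improving the accuracy of that learning, is a current research difficulty. Building on the copula dependence coefficient (CDC), a two-step causal inference algorithm for high-dimensional data is proposed. First, the algorithm uses CDC, which outperforms the maximum information coefficient, to measure the degree of association between variables and find the parent-child node set of the target node. Next, it uses a nonlinear least squares independence regression algorithm to label the causal direction between the target node and each of its parent and child nodes. Finally, it iterates over all nodes to complete the causal network structure. Experimental results show that the algorithm improves the accuracy of causal network structure learning on high-dimensional data. In addition, on large-sample data sets its time complexity is better than that of traditional algorithms, and it is robust to outliers.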

Keywords: copula dependence coefficient  maximum information coefficient  least squares regression  causal inference

A High Dimensional Causal Inference Algorithm Based on CDC
LI Hong-fei, WAN Ya-ping, YANG Xiao-hua, GENG Jia-xing. A High Dimensional Causal Inference Algorithm Based on CDC[J]. Computer Technology and Development, 2020(1): 38-43.
Authors:LI Hong-fei  WAN Ya-ping  YANG Xiao-hua  GENG Jia-xing
Affiliation:(School of Computer Science,University of South China,Hengyang 421001,China;CNNC Key Laboratory on High Trusted Computing,Hengyang 421001,China)
Abstract: Inferring the causal relationship between a pair of observed variables is a fundamental problem in science. Methods that propose causal relationships from the analysis of observational data are valuable for generating hypotheses and accelerating scientific discovery. Learning causal network structure from high-dimensional data with traditional causal inference algorithms, and improving the learning accuracy, remains difficult. Based on the copula dependence coefficient (CDC), we propose a two-step causal inference algorithm for high-dimensional data. First, the CDC, which is superior to the maximum information coefficient, is used to measure the degree of association between variables and to find the set of parent and child nodes of the target node. Then, a nonlinear least squares independence regression algorithm is used to determine the causal directions between the target node and its parent and child nodes. Finally, all nodes are iterated over to complete the causal network structure. Experiments show that the proposed algorithm improves the accuracy of causal network structure learning on high-dimensional data. In addition, on large-sample data sets, the time complexity of the algorithm is better than that of traditional algorithms, and the algorithm is robust to outliers.
Keywords:copula dependence coefficient  maximum information coefficient  least squares regression  causal inference
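
The two-step procedure outlined in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `dependence_score` is a simple rank-histogram measure used as a stand-in for CDC, a cubic polynomial fit stands in for the nonlinear least squares independence regression, and all function names, the bin count, and the threshold of 0.3 are hypothetical choices made for the example.

```python
import numpy as np

def rank_uniform(a):
    """Map samples into (0, 1) by rank (the empirical copula transform)."""
    ranks = np.argsort(np.argsort(a))
    return (ranks + 0.5) / len(a)

def dependence_score(x, y, bins=8):
    """Stand-in for CDC (assumption): half the L1 distance between a
    histogram of the rank-transformed pair and the flat histogram that
    independence would give. Lies in [0, 1 - 1/bins]."""
    u, v = rank_uniform(x), rank_uniform(y)
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    cell_mass = counts / len(x)
    return 0.5 * np.abs(cell_mass - 1.0 / bins**2).sum()

def candidate_neighbors(data, target, threshold=0.3):
    """Step 1: columns whose dependence on the target exceeds the threshold."""
    return [j for j in range(data.shape[1])
            if j != target
            and dependence_score(data[:, target], data[:, j]) > threshold]

def residual_dependence(cause, effect, degree=3):
    """Least-squares polynomial fit effect = f(cause), then score how
    strongly the residual still depends on the assumed cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return dependence_score(cause, residual)

def orient(data, a, b):
    """Step 2: return (cause, effect), preferring the direction whose
    regression residual looks more independent of the regressor."""
    forward = residual_dependence(data[:, a], data[:, b])
    backward = residual_dependence(data[:, b], data[:, a])
    return (a, b) if forward < backward else (b, a)

def learn_edges(data, threshold=0.3):
    """Iterate over all nodes and collect the directed edges found."""
    edges = set()
    for target in range(data.shape[1]):
        for j in candidate_neighbors(data, target, threshold):
            edges.add(orient(data, target, j))
    return edges

# Toy data: column 1 is a noisy function of column 0; column 2 is unrelated.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
y = x**2 + 0.1 * rng.uniform(-1, 1, 2000)
z = rng.uniform(-1, 1, 2000)
print(learn_edges(np.column_stack([x, y, z])))
```

Scoring dependence on ranks rather than raw values keeps the measure unchanged under monotone rescaling of either variable, which is one plausible reason a copula-based coefficient would tolerate outliers; the fixed threshold and polynomial degree would need tuning on real data.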
This article is indexed in VIP (维普) and other databases.