Dimension reduction in principal component analysis for trees
Affiliation: 1. Departamento de Matemáticas, Centro de Investigación y de Estudios Avanzados del IPN, Apartado Postal 14–740, 07000 Mexico City, D.F., Mexico; 2. HP Laboratories, 1501 Page Mill Rd MS 1140, Palo Alto, CA, United States; 3. Department of Neurosurgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; 1. Department of Computer Science, University of North Carolina at Charlotte, USA; 2. Department of Computer Science, Rutgers University, USA; 1. CINVESTAV del IPN, Departamento de Matemáticas, Av. Instituto Politécnico Nacional 2508, Ciudad de México, 07360, Mexico; 2. Instituto Politécnico Nacional, Escuela Superior de Física y Matemáticas, Av. Instituto Politécnico Nacional s/n, Edificio 9, Ciudad de México, 07730, Mexico; 1. Department of Mathematics, University of Isfahan, Isfahan 81746-73441, Iran; 2. School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran
Abstract: The statistical analysis of tree-structured data is a new topic in statistics with a wide range of applications. Some Principal Component Analysis (PCA) ideas have previously been developed for binary tree spaces. These ideas are extended to the more general space of rooted and ordered trees. Concepts such as tree-line and forward principal component tree-line are redefined for this more general space, and the optimal algorithm that finds them is generalized. An analog of the classical dimension reduction technique in PCA is developed for tree spaces. To do this, backward principal components, the components that carry the least information about the tree data set, are defined. An optimal algorithm to find them is presented. Furthermore, the relationship of these to the forward principal components is investigated, and a path-independence property between the forward and backward techniques is proven. These methods are applied to a brain artery data set of 98 subjects. Using these techniques, the effects of aging on the brain artery structure of males and females are investigated. A second data set, the organization structure of a large US company, is also analyzed, and the structural differences across different types of departments within the company are explored.
Keywords: Object oriented data analysis; Combinatorial optimization; Principal component analysis; Tree-lines; Tree structured objects; Dimension reduction
This article is indexed in ScienceDirect and other databases.