Integrated dimensionality reduction technique for mixed-type data involving categorical values
Affiliation: 1. School of Computer Science and Digital Media, The Robert Gordon University, Aberdeen, United Kingdom; 2. Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Tarragona, Spain; 3. Department of Mathematics and Computer Science, University of Münster, Münster, Germany
Abstract: Dimensionality reduction is a useful technique for coping with the high dimensionality of real-world data. However, traditional methods have been studied mainly in the context of datasets with only numeric attributes. To meet the demand for analyzing datasets that involve categorical attributes, an extension to the recent dimensionality-reduction technique t-SNE is proposed. The extension enables t-SNE to handle mixed-type datasets. Each attribute of the data is associated with a distance hierarchy, which allows the distance between numeric values and the distance between categorical values to be measured in a unified manner. More importantly, domain knowledge about distances that reflect the semantics embedded in categorical values can be specified via the hierarchy. Consequently, the extended t-SNE can project high-dimensional, mixed-type data onto a low-dimensional space with a topological order that reflects the user's intuition.
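The idea of measuring numeric and categorical distances in a unified manner can be illustrated with a minimal Python sketch. The abstract does not give the construction, so everything here is an assumption made for illustration: the hypothetical `DRINK_PARENT` tree and its edge weights, the path-length distance through the lowest common ancestor, the range normalisation for the numeric attribute, and the Euclidean combination in `mixed_distance`. It is not the authors' formulation.

```python
import math

# Hypothetical hierarchy for a categorical attribute "drink" (an assumption,
# not taken from the paper): child -> (parent, edge weight). "Any" is the root.
DRINK_PARENT = {
    "Coke": ("Carbonated", 1.0),
    "Pepsi": ("Carbonated", 1.0),
    "Latte": ("Coffee", 1.0),
    "Carbonated": ("Any", 1.0),
    "Coffee": ("Any", 1.0),
}
MAX_DRINK_PATH = 4.0  # longest leaf-to-leaf path in the tree above


def path_to_root(value, parent):
    """Map the value and each of its ancestors to the distance from the value."""
    dist, out = 0.0, {value: 0.0}
    while value in parent:
        value, weight = parent[value]
        dist += weight
        out[value] = dist
    return out


def categorical_distance(a, b, parent):
    """Path length between a and b through their lowest common ancestor."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    return min(pa[n] + pb[n] for n in pa if n in pb)


def numeric_distance(a, b, lo, hi):
    """Absolute difference scaled to [0, 1] by the attribute's assumed range."""
    return abs(a - b) / (hi - lo)


def mixed_distance(x, y):
    """Assumed combination: Euclidean norm of per-attribute distances.

    Each record is (drink, price), with price assumed to lie in [0, 10].
    """
    d_drink = categorical_distance(x[0], y[0], DRINK_PARENT) / MAX_DRINK_PATH
    d_price = numeric_distance(x[1], y[1], 0.0, 10.0)
    return math.sqrt(d_drink ** 2 + d_price ** 2)


records = [("Coke", 2.0), ("Pepsi", 2.0), ("Latte", 7.0)]
# Pairwise distance matrix over the mixed-type records.
matrix = [[mixed_distance(r, s) for s in records] for r in records]
```

In this sketch, "Coke" and "Pepsi" end up closer to each other than either is to "Latte" because they share the "Carbonated" parent, which is how a hierarchy can encode domain knowledge. A pairwise matrix like `matrix` could then be handed to a t-SNE implementation that accepts precomputed distances.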
Keywords: Information technology; Dimensionality reduction; Categorical data; Mixed-type data; Distance hierarchy
This article is indexed in ScienceDirect and other databases.