Minimizer of the Reconstruction Error for multi-class document categorization
Affiliation:1. Research Program of Applied Mathematics and Computations, Mexican Petroleum Institute;2. Graduate Programs on Computer Sciences Tecnologico de Monterrey, Campus Estado de México;1. College of Biomedical Engineering and Instrument Science, Zhejiang University, 310008 Zhou Yiqing Building 510, Zheda road 38#, Hangzhou, Zhejiang, China;2. Department of Information and Communication Engineering, University of Murcia, Spain;1. Innovative Information Industry Research Center, School of Computer Science and Technology, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China;2. Information and Communications Research Laboratories, ITRI, Hsinchu, Taiwan, ROC;3. CyLab, Carnegie Mellon University, Pittsburgh, USA;4. Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, ROC;1. Department of Industrial Engineering and Management, National Chin-Yi University of Technology, 57, Section 2, Zhongshan, Taiping District, Taichung 41170, Taiwan, ROC;2. Department of Industrial Design, National United University, 1, Lienda, Miaoli 36003, Taiwan, ROC;3. Department of Innovative Living Design, Overseas Chinese University, 100, Chiao Kwang Rd., Taichung 40721, Taiwan, ROC;1. Datameer, USA;2. Faculty of Computer Science, Otto-von-Guericke University, Magdeburg, Germany;3. University of Eichstätt-Ingolstadt, Germany
Abstract: In this article we introduce and validate an approach for single-label, multi-class document categorization based on text content features. The approach exploits a statistical property of Principal Component Analysis: it minimizes the reconstruction error of the training documents used to compute a low-rank category transformation matrix. This matrix projects the training documents of a given category into a new low-rank space and then reconstructs them in the original space with minimum reconstruction error. The proposed method, called the Minimizer of the Reconstruction Error (mRE) classifier, extends this property and applies it to new, unseen test documents. Experiments on four multi-class text categorization datasets test the performance of the proposed approach, which is stable and generally better than that of other popular classification methods.
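The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_category_bases`, `reconstruction_error`, `predict`), the per-category mean centering, the use of a plain SVD, and the rule of assigning a document to the category with the smallest reconstruction error are all assumptions made for illustration, inferred from the method's name rather than stated in the abstract.

```python
import numpy as np

def fit_category_bases(X, y, rank):
    """Fit one low-rank PCA basis per category (hypothetical helper).

    X is an (n_documents, n_features) array, y holds one label per row.
    Returns {label: (mean, W)} where the rows of W are the top principal
    directions of that category's training documents.
    """
    bases = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)  # centering is an assumption of this sketch
        # Rows of Vt are principal directions, sorted by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        bases[label] = (mean, Vt[:rank])
    return bases

def reconstruction_error(x, mean, W):
    """Project x into the category's low-rank space, map it back, and
    return the Euclidean distance between x and its reconstruction."""
    centered = x - mean
    reconstructed = (centered @ W.T) @ W
    return float(np.linalg.norm(centered - reconstructed))

def predict(bases, x):
    """Assign x to the category whose basis reconstructs it best (assumed rule)."""
    return min(bases, key=lambda label: reconstruction_error(x, *bases[label]))
```

In this sketch each category gets its own basis, so a test document is reconstructed once per category; the `rank` parameter controls how much of each category's variance the basis retains and would need tuning on real term-weight vectors.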
Keywords: Document categorization; Text mining; Dimensionality reduction; Principal Component Analysis
This article is indexed in ScienceDirect and other databases.