

Text classification using genetic algorithm oriented latent semantic features
Affiliation:1. Grup de Recerca en Sistemes Intel·ligents, Ramon Llull University, Quatre Camins 2, 08022 Barcelona, Spain;2. Grup de Recerca en Internet Technologies & Storage, Ramon Llull University, Quatre Camins 2, 08022 Barcelona, Spain;3. Departamento de Ingeniería Matemática e Informática, Universidad Pública de Navarra, Campus de Arrosadía, 31006 Pamplona, Spain;1. Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, Niš, Serbia;2. Faculty of Mechanical Engineering, University of Niš, Aleksandra Medvedeva 14, Niš, Serbia;1. University of Cauca, Cll. 5 4-70 Popayán, Colombia;2. Universidad Carlos III de Madrid, Av. Universidad 30, 28911 Leganés, Spain;3. University of East London, Docklands Campus, London E16 2RD, United Kingdom;1. School of Management, Hefei University of Technology, Hefei 230009, PR China;2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei 230009, PR China;3. School of Electric Engineering and Automation, Hefei University of Technology, Hefei 230009, PR China
Abstract: In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of a feature selection stage and a feature transformation stage. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) guided by a genetic algorithm, so that a better projection is attained using appropriate singular vectors. Unlike the standard LSI approach, these vectors are not limited to the ones corresponding to the largest singular values: singular vectors with small singular values may be used for projection, and vectors with large singular values may be eliminated, to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions.
Keywords: Feature selection; Genetic algorithm; Latent semantic indexing; Text classification
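
The transformation stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fitness function, the genetic operators, or the data layout, so everything below is an assumption made for illustration: documents are rows of a small synthetic matrix, fitness is the training accuracy of a nearest-centroid classifier, and the names `lsi_project`, `centroid_accuracy`, and `evolve_mask` are hypothetical. The only idea taken from the abstract is that the projection uses a searched subset of singular vectors rather than the top-k ones.

```python
import numpy as np

def lsi_project(X, mask):
    """Project documents (rows of X) onto the right singular vectors picked by mask."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[mask].T  # shape: (n_docs, number of selected vectors)

def centroid_accuracy(Z, y):
    """Assumed fitness: training accuracy of a nearest-centroid classifier."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

def evolve_mask(X, y, k, pop_size=20, generations=30, seed=0):
    """Search for k singular vectors with a simple GA (illustrative operators)."""
    rng = np.random.default_rng(seed)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = len(s)
    proj = X @ Vt.T  # projections onto all r singular vectors, computed once

    def fitness(mask):
        return centroid_accuracy(proj[:, mask], y)

    def random_mask():
        mask = np.zeros(r, dtype=bool)
        mask[rng.choice(r, size=k, replace=False)] = True
        return mask

    # Seed the population with the standard LSI choice (top-k singular values).
    top_k = np.zeros(r, dtype=bool)
    top_k[:k] = True
    pop = [top_k] + [random_mask() for _ in range(pop_size - 1)]

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw k vectors from the union of both parents' selections.
            union = np.flatnonzero(parents[a] | parents[b])
            child = np.zeros(r, dtype=bool)
            child[rng.choice(union, size=k, replace=False)] = True
            # Mutation: swap one selected vector for an unselected one.
            if k < r and rng.random() < 0.3:
                on, off = np.flatnonzero(child), np.flatnonzero(~child)
                child[rng.choice(on)] = False
                child[rng.choice(off)] = True
            children.append(child)
        pop = parents + children

    best = max(pop, key=fitness)
    return best, fitness(best)

# Usage on synthetic data (40 documents, 12 terms, 2 classes); not real results.
data_rng = np.random.default_rng(1)
X = data_rng.random((40, 12))
y = np.repeat([0, 1], 20)
k = 4
best_mask, best_fit = evolve_mask(X, y, k)
top_k_mask = np.arange(12) < k
baseline_fit = centroid_accuracy(lsi_project(X, top_k_mask), y)
```

Because the top-k mask is placed in the initial population and the best half of each generation is kept, the returned mask in this sketch never scores below the standard LSI choice on the assumed fitness. A practical setup would evaluate fitness on held-out data rather than on the training set.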
This article is indexed in ScienceDirect and other databases.