

An automatically constructed thesaurus for neural network based document categorization
Authors: Cheng Hua Li, Wei Song, Soon Cheol Park
Affiliations: 1. Bordeaux INP, ICMCB, CNRS, University of Bordeaux, UMR 5026, F-33608 Pessac, France; 2. UMR 8207 - UMET - Unité Matériaux et Transformations, University of Lille, CNRS, INRA, ENSCL, F-59000 Lille, France; 3. Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, United States
Abstract: This paper presents a method for computing a thesaurus from a text corpus and combining it with a revised back-propagation neural network (BPNN) learning algorithm for document categorization. An automatically constructed thesaurus is a data structure built by extracting the relatedness between words. Neural networks are an efficient approach to document categorization; however, the conventional BPNN suffers from slow learning and tends to become trapped in local minima. We use a revised algorithm that improves on the conventional BPNN and overcomes these problems. A well-constructed thesaurus is recognized as a valuable tool for effective document categorization: it addresses a weakness of bag-of-words categorization, which ignores the relationships between words. To investigate the effectiveness of our method, we conducted experiments on the standard Reuters-21578 collection. The experimental results show that the proposed model achieves higher categorization effectiveness as measured by precision, recall, and F-measure.
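The abstract does not say how the relatedness between words is measured. The following minimal sketch assumes one common choice, the cosine similarity between the binary document-occurrence vectors of two words, and then uses the resulting thesaurus to add weighted related terms to a bag-of-words vector. The functions `build_thesaurus` and `expand_bow` and the `min_sim` threshold are hypothetical names for illustration, not the authors' method.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_thesaurus(docs, min_sim=0.5):
    """Hypothetical: relate words whose document-occurrence vectors have cosine >= min_sim."""
    doc_freq = Counter()  # word -> number of documents containing it
    co_freq = Counter()   # (word_a, word_b) -> number of documents containing both
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        doc_freq.update(words)
        co_freq.update(combinations(words, 2))
    thesaurus = defaultdict(dict)
    for (a, b), n_ab in co_freq.items():
        # Cosine of binary occurrence vectors: |D_a and D_b| / sqrt(|D_a| * |D_b|)
        sim = n_ab / math.sqrt(doc_freq[a] * doc_freq[b])
        if sim >= min_sim:
            thesaurus[a][b] = sim
            thesaurus[b][a] = sim
    return dict(thesaurus)


def expand_bow(doc, thesaurus):
    """Hypothetical: add related terms to a bag of words, weighted by tf * similarity."""
    bow = Counter(doc.lower().split())
    expanded = Counter(bow)
    for word, tf in bow.items():
        for related, sim in thesaurus.get(word, {}).items():
            expanded[related] += tf * sim
    return expanded
```

With this expansion, a document mentioning only "stock" also receives weight on "shares" if the two words co-occur strongly in the corpus, which is the kind of word relationship a plain bag-of-words vector ignores.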
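The abstract also does not describe what the revised BPNN algorithm changes. As a point of reference, this sketch shows a conventional one-hidden-layer BPNN trained online with a momentum term, a widely used remedy for slow convergence; it is a stand-in, not the authors' revision. Inputs are assumed to be term-weight vectors (for example, thesaurus-expanded bag-of-words vectors) and outputs are one sigmoid unit per category. The class `BPNN` and its parameters `lr` and `momentum` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNN:
    """Hypothetical one-hidden-layer network trained by back-propagation with momentum."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.3, momentum=0.7, seed=0):
        rng = random.Random(seed)
        # Each row has one extra weight for a bias input fixed at 1.0.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Previous weight changes, reused by the momentum term.
        self.d1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.d2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target):
        """One online update on squared error; returns the error before the update."""
        xb, hb, y = self.forward(x)
        delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, y)]
        # Hidden deltas use w2 before it is updated; the bias unit is skipped.
        delta_hid = [
            hb[j] * (1 - hb[j]) * sum(delta_out[k] * self.w2[k][j] for k in range(len(y)))
            for j in range(len(hb) - 1)
        ]
        # Weight change = lr * delta * input + momentum * previous change.
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                self.d2[k][j] = self.lr * delta_out[k] * hb[j] + self.momentum * self.d2[k][j]
                row[j] += self.d2[k][j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                self.d1[j][i] = self.lr * delta_hid[j] * xb[i] + self.momentum * self.d1[j][i]
                row[i] += self.d1[j][i]
        return 0.5 * sum((t - o) ** 2 for t, o in zip(target, y))
```

Each weight change blends the current gradient step with the previous change, which damps oscillation and tends to speed progress along shallow directions of the error surface; a document is assigned to the category whose output unit is largest.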
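Precision, recall, and F-measure are computed from true positives (TP), false positives (FP), and false negatives (FN): P = TP/(TP+FP), R = TP/(TP+FN), and F1 = 2PR/(P+R). The sketch below assumes the balanced F1 and shows both micro-averaging (pooling counts over all categories) and macro-averaging (averaging per-category scores), since the abstract does not say which was used on Reuters-21578. The function names are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and balanced F-measure (F1) from raw counts; 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_prf(gold, pred):
    """Micro-average: pool TP/FP/FN over every (document, category) decision.

    gold and pred are lists of sets of category labels, one set per document.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    return prf(tp, fp, fn)


def macro_prf(gold, pred, categories):
    """Macro-average: compute P, R, F per category, then take the unweighted mean of each."""
    scores = []
    for c in categories:
        tp = sum(1 for g, p in zip(gold, pred) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, pred) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, pred) if c in g and c not in p)
        scores.append(prf(tp, fp, fn))
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
```

Micro-averaging is dominated by frequent categories, while macro-averaging gives rare categories equal weight, so the two can differ noticeably on a skewed collection like Reuters-21578.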
This article is indexed in ScienceDirect and other databases.
