Temporal contexts: Effective text classification in evolving document collections
Authors: Leonardo Rocha, Fernando Mourão, Hilton Mota, Thiago Salles, Marcos André Gonçalves, Wagner Meira Jr.
Affiliations: 1. Federal University of São João del-Rei, Computer Science Department, São João del-Rei, Brazil; 2. Federal University of Minas Gerais, Computer Science Department, Belo Horizonte, Brazil; 3. Federal University of Minas Gerais, Electrical Engineering Department, Belo Horizonte, Brazil
Abstract: Managing the huge and growing amount of information available today makes Automatic Document Classification (ADC) not only crucial but also very challenging. Furthermore, the dynamics inherent to classification problems, particularly on the Web, make this task even harder. Despite this, the actual impact of such temporal evolution on ADC is still poorly understood in the literature. In this context, this work aims to evaluate, characterize and exploit temporal evolution to improve ADC techniques. Our first contribution is a pragmatic methodology for evaluating temporal evolution in ADC domains. Through this methodology, we can identify measurable factors associated with the degradation of ADC models over time. Going a step further, based on these analyses, we propose effective and efficient strategies to make current techniques more robust to natural shifts over time. We present a strategy, named temporal context selection, for selecting portions of the training set that minimize those factors. Our second contribution is a general algorithm, called Chronos, for determining such contexts. By instantiating Chronos, we are able to reduce uncertainty and improve overall classification accuracy. Empirical evaluations of two heuristic instantiations of the algorithm, named WindowsChronos and FilterChronos, on two real document collections demonstrate the usefulness of our proposal. Comparing them against state-of-the-art ADC algorithms shows that selecting temporal contexts improves classification accuracy by up to 10%. Finally, we highlight the applicability and generality of our proposal in practice, pointing to this study as a promising research direction.
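To make the idea of temporal context selection concrete, the following is a minimal sketch, not the authors' Chronos, WindowsChronos or FilterChronos. It assumes a hypothetical rule: keep only training documents whose timestamp lies within a fixed `width` of the test document's timestamp, then train a classifier on that subset. The classifier is a plain multinomial Naive Bayes with Laplace smoothing, swapped in purely for illustration. The helper names (`temporal_window`, `train_nb`, `predict_nb`, `classify_with_window`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def temporal_window(train, t_test, width):
    """Hypothetical context rule: keep docs whose time is within `width` of t_test."""
    return [d for d in train if abs(d["time"] - t_test) <= width]

def train_nb(docs):
    """Fit a multinomial Naive Bayes model (class counts, per-class word counts, vocabulary)."""
    class_counts = Counter(d["label"] for d in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for d in docs:
        words = d["text"].split()
        word_counts[d["label"]].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior, using Laplace (add-one) smoothing."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

def classify_with_window(train, text, t_test, width):
    """Train only on the temporal context; fall back to all data if the window is empty."""
    context = temporal_window(train, t_test, width) or train
    return predict_nb(train_nb(context), text)

# Toy collection in which the meaning of "apple" drifts over time (made-up data).
train = [
    {"time": 1990, "text": "apple orchard harvest", "label": "agriculture"},
    {"time": 1991, "text": "apple pie harvest", "label": "agriculture"},
    {"time": 2010, "text": "apple iphone launch", "label": "technology"},
]

print(predict_nb(train_nb(train), "apple"))          # whole training set: agriculture
print(classify_with_window(train, "apple", 2011, 5)) # temporal context only: technology
```

In this toy case the full training set favors the older, more frequent sense of "apple", while restricting training to documents near the test timestamp recovers the recent sense. This mirrors, in a very simplified form, the motivation for selecting temporal contexts rather than using all training data indiscriminately.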
Keywords: Classification; Text mining; Temporal evolution
This article is indexed in ScienceDirect and other databases.