An Intelligent Information System for Organizing Online Text Documents
Authors: Han-joon Kim, Sang-goo Lee
Affiliation: (1) Department of Electrical and Computer Engineering, The University of Seoul, 90 Jeonnong-dong, Dongdaemun-gu, 130-743 Seoul, Korea; (2) School of Computer Science and Engineering, Seoul National University, Seoul, Korea
Abstract: This paper describes an intelligent information system for effectively managing large volumes of online text documents
(such as Web documents) in a hierarchical manner. The system's organizational capabilities can evolve semi-automatically with
minimal human input. The system starts with an initial taxonomy into which documents are automatically categorized, and then
evolves to keep providing a good indexing service as the document collection grows or its usage changes. To this end, we
propose a series of algorithms that use text-mining techniques such as document clustering, document categorization, and
hierarchy reorganization. In particular, we have studied clustering and categorization algorithms in depth so that both the
hierarchical structure and the categorization criteria can evolve. Through experiments on the Reuters-21578 document
collection, we evaluate the proposed clustering and categorization methods by comparing their performance with that of
well-known conventional methods.
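The abstract does not spell out the authors' categorization algorithm, but its keywords name Naïve Bayes. As a rough illustration of the automatic categorization step only, the sketch below implements the textbook multinomial Naïve Bayes classifier with Laplace (add-one) smoothing. It is not the method proposed in the paper. The function names `train_naive_bayes` and `classify` are hypothetical, and the training documents are tiny made-up token lists labeled with two Reuters-21578 category names (`grain`, `crude`), not real data from the collection.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Train a multinomial Naive Bayes model (hypothetical helper).

    labeled_docs: list of (category, list_of_tokens) pairs.
    Returns (log_priors, log_likelihoods, vocabulary).
    """
    doc_counts = Counter(cat for cat, _ in labeled_docs)
    term_counts = defaultdict(Counter)  # category -> term frequencies
    vocabulary = set()
    for cat, tokens in labeled_docs:
        term_counts[cat].update(tokens)
        vocabulary.update(tokens)

    total_docs = len(labeled_docs)
    # P(c): fraction of training documents in category c, in log space.
    log_priors = {cat: math.log(n / total_docs) for cat, n in doc_counts.items()}

    log_likelihoods = {}
    for cat in doc_counts:
        total_terms = sum(term_counts[cat].values())
        denom = total_terms + len(vocabulary)  # Laplace (add-one) smoothing
        log_likelihoods[cat] = {
            term: math.log((term_counts[cat][term] + 1) / denom)
            for term in vocabulary
        }
    return log_priors, log_likelihoods, vocabulary


def classify(model, tokens):
    """Return the category with the highest posterior log-score."""
    log_priors, log_likelihoods, vocabulary = model
    scores = {}
    for cat, prior in log_priors.items():
        score = prior
        for tok in tokens:
            if tok in vocabulary:  # skip terms never seen in training
                score += log_likelihoods[cat][tok]
        scores[cat] = score
    return max(scores, key=scores.get)


# Toy training set (made-up tokens, illustrative only).
training = [
    ("grain", ["wheat", "corn", "harvest"]),
    ("grain", ["wheat", "export", "tonnes"]),
    ("crude", ["oil", "barrel", "opec"]),
    ("crude", ["oil", "prices", "opec"]),
]
model = train_naive_bayes(training)

print(classify(model, ["wheat", "harvest"]))  # grain
print(classify(model, ["oil", "opec"]))       # crude
```

Working in log space avoids floating-point underflow when many term probabilities are multiplied, and add-one smoothing keeps a term that is unseen in one category from zeroing out that category's score. An evolving taxonomy, as the abstract describes, would additionally need to retrain or update these counts as categories are split, merged, or reorganized.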
Keywords: Document categorization; Document clustering; Fuzzy relations; Hierarchical agglomerative clustering; Information systems; Naïve Bayes; Topic hierarchy
This article is indexed in SpringerLink and other databases.