A Latent Semantic Indexing-based approach to multilingual document clustering
Authors: Chih-Ping, Christopher C., Chia-Min
Affiliations: (a) Institute of Technology Management, College of Technology Management, National Tsing Hua University, Hsinchu, Taiwan, ROC; (b) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; (c) Openfind Information Technology Inc., 4F, No. 222, Sec. 2, Nan-Chang Rd., Taipei, Taiwan, ROC
Abstract:The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge repository, knowledge maps, often created by document clustering techniques, represent an appealing and promising approach. Various document clustering techniques have been proposed in the literature, but most deal with monolingual documents (i.e., written in the same language). However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual document clustering (MLDC) to create organizational knowledge maps. Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents. The empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision, and is capable of maintaining a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.
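The abstract does not say how LSI bridges languages. One common approach is cross-language LSI. It builds the latent space from a parallel corpus in which each training document concatenates a text with its translation. It then folds the monolingual documents into that space and clusters them there by cosine similarity. This is an assumption, not necessarily the authors' exact procedure. The NumPy sketch below illustrates that idea. The toy corpus, the rank `k`, the single-link threshold clustering, and the pair-counting definitions of cluster recall and cluster precision are all illustrative assumptions, and every function name is hypothetical.

```python
import itertools
import numpy as np


def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix: rows are terms, columns are documents."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            if tok in index:  # out-of-vocabulary tokens are ignored
                A[index[tok], j] += 1.0
    return A


def lsi_space(A, k):
    """Rank-k truncated SVD: keep the top-k left singular vectors and values."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(A_new, U_k, s_k):
    """Project document columns into the latent space: d_hat = S_k^-1 U_k^T d."""
    return (U_k.T @ A_new) / s_k[:, None]


def cluster(X, threshold):
    """Hypothetical single-link clustering of latent vectors (columns of X):
    documents whose cosine similarity >= threshold are merged via union-find."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    Xn = X / norms
    sim = Xn.T @ Xn
    n = X.shape[1]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(n), 2):
        if sim[i, j] >= threshold:
            parent[find(i)] = find(j)
    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(n):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels


def cluster_recall_precision(true, pred):
    """Assumed pair-counting definitions over all document pairs."""
    same_true = same_pred = both = 0
    for i, j in itertools.combinations(range(len(true)), 2):
        t, p = true[i] == true[j], pred[i] == pred[j]
        same_true += t
        same_pred += p
        both += t and p
    recall = both / same_true if same_true else 0.0
    precision = both / same_pred if same_pred else 0.0
    return recall, precision


# Toy parallel training corpus: each document = English text + Chinese translation.
train = [
    ["stock", "market", "股票", "市場"],
    ["market", "price", "市場", "價格"],
    ["football", "team", "足球", "隊"],
    ["team", "match", "隊", "比賽"],
]
vocab = sorted({t for d in train for t in d})
U_k, s_k = lsi_space(term_doc_matrix(train, vocab), k=2)

# Monolingual documents to cluster: two finance (EN, ZH), two sports (EN, ZH).
docs = [["stock", "price"], ["股票", "價格", "市場"],
        ["football", "match"], ["足球", "隊"]]
X = fold_in(term_doc_matrix(docs, vocab), U_k, s_k)
labels = cluster(X, threshold=0.5)
print(labels)                                          # [0, 0, 1, 1]
print(cluster_recall_precision([0, 0, 1, 1], labels))  # (1.0, 1.0)
```

Each Chinese term co-occurs with its English counterpart in the training documents, so both load on the same latent dimension. As a result, the English and Chinese finance documents land close together even though they share no words. In a real system the weighting (e.g., TF-IDF), the rank `k`, and the clustering algorithm would all need tuning. The choices here only keep the sketch short.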
Keywords: Document management; Text mining; Document clustering; Multilingual document clustering; Multilingual knowledge management; Latent Semantic Indexing
This article is indexed in ScienceDirect and other databases.