Using latent semantic indexing for multilanguage information retrieval
Authors: Michael W. Berry and Paul G. Young
Affiliation: (1) Department of Computer Science, University of Tennessee, 107 Ayres Hall, 37996-1301 Knoxville, TN, USA
Abstract:
In this paper, a method for indexing cross-language databases for conceptual query matching is presented. Two languages (Greek and English) are combined by appending a small portion of documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database (made up of different translations of the Gospels). Previous strategies duplicated up to 34% of the initial database in order to perform the merger. The proposed method retrieves a larger number of relevant documents for both languages with higher cosine rankings when Latent Semantic Indexing (LSI) is employed. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow the use of natural language for searching (queries). LSI enables the user to form queries using natural expressions in the user's own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively using a minimum of the database in a foreign language.

Michael W. Berry is an Assistant Professor in the Department of Computer Science at the University of Tennessee, Knoxville. He received a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1990, and an M.S. in Applied Mathematics from North Carolina State University at Raleigh in 1983. His current interests include scientific computing, parallel algorithms, information retrieval applications, and computer performance evaluation. He is a member of the ACM, SIAM, and the IEEE Computer Society.

Paul G. Young is now employed as an Associate Consultant with Oracle Government Services in Knoxville, TN. In 1984 he graduated from the Gordon-Conwell Theological Seminary in S. Hamilton, MA and became an Ordained Presbyterian Minister (PCUSA). He later received an M.S. in Computer Science from the University of Tennessee in 1994.
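
The retrieval idea summarized in the abstract (index a collection in which a few documents carry both languages, then rank by cosine in the reduced LSI space) can be illustrated with a minimal sketch. This is not the authors' implementation or their merge strategy: the six-passage corpus, the word choices, the rank `k = 2`, and all function names are assumptions made for illustration, and the query is folded in with the common formula q̂ = qᵀ U_k S_k⁻¹.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Build a raw term-frequency matrix (terms x documents)."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d:
            A[index[t], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # rows of V_k are document vectors

def lsi_scores(tokens, index, U_k, s_k, V_k):
    """Fold a query into the LSI space and return its cosine with every document."""
    q = np.zeros(U_k.shape[0])
    for t in tokens:
        if t in index:  # terms unseen at indexing time are ignored
            q[index[t]] += 1.0
    q_hat = (q @ U_k) / s_k
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / norms

# Hypothetical toy corpus (not the paper's data).
# Documents 0 and 1 are "merged": the English passage with its Greek
# counterpart appended. Documents 2-5 are monolingual.
docs = [
    ["word", "god", "beginning", "λόγος", "θεός", "ἀρχή"],  # 0: merged
    ["light", "darkness", "φῶς", "σκοτία"],                  # 1: merged
    ["word", "god", "flesh"],                                # 2: English only
    ["λόγος", "θεός", "σάρξ"],                               # 3: Greek only
    ["light", "world"],                                      # 4: English only
    ["φῶς", "κόσμος"],                                       # 5: Greek only
]

A, index = build_term_doc_matrix(docs)
U_k, s_k, V_k = lsi_fit(A, k=2)

# An English query and a Greek query, neither of them translated.
english_scores = lsi_scores(["word", "god"], index, U_k, s_k, V_k)
greek_scores = lsi_scores(["φῶς"], index, U_k, s_k, V_k)

for name, scores in [("english", english_scores), ("greek", greek_scores)]:
    ranking = np.argsort(-scores)
    print(name, [(int(j), round(float(scores[j]), 3)) for j in ranking])
```

In this toy setup the English query gives a high score to the Greek-only passage (document 3) even though the two share no terms, because the merged document 0 links the two vocabularies in the reduced space. How much of a real collection must be merged for this to work is the question the paper studies, and the sketch does not address it.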
Keywords: Bible; English; Gospels; Greek; Hebrew; information retrieval; latent semantic indexing; singular value decomposition
This article is indexed in SpringerLink and other databases.