A synergistic strategy for combining thesaurus-based and corpus-based approaches in building ontology for multilingual search engines |
| |
Affiliation: | 1. The American College of Greece;2. College of Computer and Information Sciences;3. Instituto Politécnico Nacional;4. Department of Applied Informatics;5. University of Science and Technology of China (USTC);6. King Abdulaziz University;1. Center of Excellence in Information Assurance (CoEIA), King Saud University (KSU), Riyadh, Saudi Arabia;2. Center of Excellence in Information Assurance (CoEIA), College of Computer and Information Sciences (CCIS), King Saud University (KSU), Riyadh, Saudi Arabia;3. College of Computer and Information Sciences (CCIS), King Saud University (KSU), Riyadh, Saudi Arabia;4. Intelligent Systems Group (ISG), Department of Computing, Macquarie University, NSW 2109, Australia;1. School of Business, Anhui University, 230039 Hefei, China;2. School of Computer Science and Technology, University of Science and Technology of China, 230027 Hefei, China |
| |
Abstract: | In this article we illustrate a methodology for building a cross-language search engine. A synergistic combination of the thesaurus-based and corpus-based approaches is proposed. First, a bilingual ontology thesaurus is designed for two languages, English and Spanish, in which a simple bilingual listing of terms, phrases, concepts, and subconcepts is built. Second, term vector translation is used: a statistical multilingual text retrieval technique that maps statistical information about term use between languages (ontology co-learning). This technique maps sets of tf-idf term weights from one language to another. We also applied a query translation method to retrieve multilingual documents, with an expansion technique for phrasal translation. Finally, we present our findings. |
| |
Keywords: | Multi-language search engines; Cross-language search engines; Ontologies; Social Networks; Ontology co-learning |
This article is indexed in ScienceDirect and other databases. |
|
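
The term vector translation step mentioned in the abstract, mapping tf-idf term weights from one language to another, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `TRANSLATIONS` table, its probabilities, the sample weights, and the function name `translate_term_vector` are all hypothetical, and the abstract does not say how the mapping is actually estimated.

```python
from collections import defaultdict

# Hypothetical English -> Spanish translation table. Each source term maps to
# candidate target terms with an assumed translation probability.
TRANSLATIONS = {
    "search": {"búsqueda": 0.8, "buscar": 0.2},
    "engine": {"motor": 1.0},
    "ontology": {"ontología": 1.0},
}


def translate_term_vector(weights, translations):
    """Map a source-language tf-idf vector into the target language.

    Each source weight is spread over the term's candidate translations in
    proportion to the assumed translation probability. Source terms with no
    entry in the table are dropped.
    """
    target = defaultdict(float)
    for term, weight in weights.items():
        for target_term, prob in translations.get(term, {}).items():
            target[target_term] += weight * prob
    return dict(target)


# Illustrative tf-idf weights for an English query (made-up values).
query = {"search": 0.5, "engine": 0.3, "ontology": 0.9, "unknown": 0.7}
translated = translate_term_vector(query, TRANSLATIONS)
```

The resulting Spanish-side vector could then be compared against Spanish document vectors with an ordinary similarity measure such as cosine similarity; that choice is likewise an assumption rather than something stated in the abstract.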