An ontology‐based framework for automatic topic detection in multilingual environments |
| |
Authors: | Karel Gutiérrez‐Batista, Jesús R Campaña, Maria‐Amparo Vila, Maria J Martin‐Bautista |
| |
Affiliation: | Department of Computer Science and Artificial Intelligence, ETSIIT – University of Granada, Granada, Spain |
| |
Abstract: | Topic detection in large volumes of text is an active research area with many applications in the development of computational systems. In data mining, one proposed solution for topic detection is to apply clustering methods. This paper presents a new ontology‐based methodology for automatic topic detection that requires no prior information and relies on hierarchical clustering algorithms and a multilingual knowledge base. The approach also incorporates lexical resources that enrich the semantics of the analyzed texts. Its novelty lies in using an ontology to reduce the dimensionality of the terms present in the texts and in introducing a method for building a term weight matrix for use in clustering algorithms. This approach makes it possible to improve automatic topic detection in documents. The proposed methodology was assessed on four datasets, two in English and two in Spanish. |
| |
Keywords: | multilingual topic detection, ontologies, text clustering, text mining |
|