An ontology‐based framework for automatic topic detection in multilingual environments
Authors: Karel Gutiérrez‐Batista, Jesús R Campaña, Maria‐Amparo Vila, Maria J Martin‐Bautista
Affiliation: Department of Computer Science and Artificial Intelligence, ETSIIT – University of Granada, Granada, Spain
Abstract: The detection of topics in large volumes of text is an active research area with many applications in the development of computational systems. In data mining, one proposed solution is to apply clustering methods. This paper presents a new ontology‐based methodology for automatic topic detection that requires no prior information and relies on hierarchical clustering algorithms and a multilingual knowledge base. The approach also incorporates lexical resources that enrich the semantics of the analyzed texts. Its novelty lies in reducing the dimensionality of the terms present in the texts by means of an ontology, and in introducing a method for building a term weight matrix for use in clustering algorithms. With this approach, it is possible to improve automatic topic detection in documents. The methodology was assessed on four datasets, two in English and two in Spanish.
Keywords: multilingual topic detection, ontologies, text clustering, text mining
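
The pipeline outlined in the abstract (mapping terms to ontology concepts, building a weight matrix, then clustering hierarchically) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONCEPT_MAP` dictionary is a hypothetical stand-in for the multilingual knowledge base, the TF-IDF-style weighting and the average-link merge rule are assumptions chosen for illustration, and all function names are invented here.

```python
import math
from collections import Counter

# Hypothetical toy "ontology": English and Spanish terms map to shared concept IDs.
# A real system would query a multilingual knowledge base instead.
CONCEPT_MAP = {
    "football": "c:sport", "fútbol": "c:sport",
    "match": "c:sport", "partido": "c:sport",
    "election": "c:politics", "elección": "c:politics",
    "vote": "c:politics", "voto": "c:politics",
}


def to_concepts(text):
    """Replace known terms by concept IDs; unknown terms are dropped.
    This is the dimensionality-reduction step (many terms, few concepts)."""
    return [CONCEPT_MAP[t] for t in text.lower().split() if t in CONCEPT_MAP]


def weight_matrix(docs):
    """Build a document-by-concept weight matrix (assumed TF-IDF-style weights)."""
    concept_lists = [to_concepts(d) for d in docs]
    vocab = sorted({c for cl in concept_lists for c in cl})
    n = len(docs)
    df = Counter(c for cl in concept_lists for c in set(cl))
    rows = []
    for cl in concept_lists:
        tf = Counter(cl)
        rows.append([tf[c] * (math.log((1 + n) / (1 + df[c])) + 1) for c in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(rows, k):
    """Average-link agglomerative clustering: merge the most similar pair until k remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


docs = [
    "football match today",
    "partido de fútbol",
    "election vote results",
    "voto en la elección",
]
vocab, rows = weight_matrix(docs)
clusters = agglomerative(rows, k=2)
print(vocab)
print(clusters)
```

Because the English and Spanish documents are projected onto the same concept IDs before weighting, documents about the same topic can end up in one cluster regardless of language, which is the intuition behind using a multilingual knowledge base.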