首页 | 本学科首页   官方微博 | 高级检索  
     


Semantic similarity based approach for reducing Arabic texts dimensionality
Authors:Arafat Awajan
Affiliation:1.The King Hussein Faculty of Computing Sciences,Princess Sumaya University for Technology,Amman,Jordan
Abstract:An efficient method is introduced to represent large Arabic texts in comparatively smaller size without losing significant information. The proposed method uses the distributional semantics to build the word-context matrix representing the distribution of words across contexts and to transform the text into a vector space model (VSM) representation based on word semantic similarity. The linguistic features of the Arabic language, in addition to the semantic information extracted from different lexical-semantic resources such as Arabic WordNet and named entities’ gazetteers are used to improve the text representation and to create word clusters of similar and related words. Distributional similarity measures have been used to capture the words’ semantic similarity and to create clusters of similar words. The conducted experiments have shown that the proposed method significantly reduces the size of text representation by about 27 % compared with the stem-based VSM and by about 50 % compared with the traditional bag-of-words model. Their results have shown that the amount of dimension reduction depends on the size and shape of the windows of analysis as well as on the content of the text.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号