SAMAR: Subjectivity and sentiment analysis for Arabic social media
Affiliation: 1. Department of Linguistics, Indiana University, 1021 E 3rd. St., Bloomington, IN 47405, USA; 2. School of Library and Information Science, 1320 East 10th Street, Bloomington, IN 47405, USA; 3. Department of Computer Science, School of Engineering & Applied Science, The George Washington University, Washington, DC, USA; 1. Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia; 2. Faculty of Computer Sciences and Information Technology, Taiz University, Taiz, Yemen; 1. Jordan University of Science and Technology, Irbid, Jordan; 2. Université de Lyon, CNRS, UMR 5516, Laboratoire Hubert-Curien, Saint-Étienne, France; 3. National Institute of Technology Kurukshetra, India
Abstract: SAMAR is a system for subjectivity and sentiment analysis (SSA) of Arabic social media genres. Arabic is a morphologically rich language, which poses significant challenges for standard approaches to building SSA systems designed for English. Apart from the difficulties of processing social media genres, Arabic inherently has a large number of variable word forms, which leads to data sparsity. In this context, we address four pertinent issues: how best to represent lexical information; whether standard features used for English are useful for Arabic; how to handle Arabic dialects; and whether genre-specific features have a measurable impact on performance. Our results show that using either lemma or lexeme information is helpful, as is using the two part-of-speech tagsets (RTS and ERTS). The results also show that each genre and task needs an individualized solution, although lemmatization and the ERTS POS tagset are present in a majority of the settings.
Keywords: Subjectivity and sentiment analysis; Morphologically rich language; Arabic; Social media data
This article is indexed in ScienceDirect and other databases.