

Automatically annotating a five-billion-word corpus of Japanese blogs for sentiment and affect analysis
Affiliation: 1. Department of Computer Science, Kitami Institute of Technology, Japan; 2. Graduate School of Information Science and Technology, Hokkaido University, Japan; 3. Department of Electronics and Information Engineering, Faculty of Engineering, Hokkai-Gakuen University, Japan
Abstract: This paper presents our research on the automatic annotation of a five-billion-word corpus of Japanese blogs with information on affect and sentiment. We first survey existing emotion blog corpora and find that no large-scale emotion corpus has been available for the Japanese language. We then take the largest blog corpus for the language and annotate it using two affect analysis systems: ML-Ask for word- and sentence-level affect analysis, and CAO for detailed analysis of emoticons. The annotated information includes affective features such as sentence subjectivity (emotive/non-emotive) and emotion classes (joy, sadness, etc.), which are useful in affect analysis. The annotations are also generalized on a two-dimensional model of affect to obtain information on sentence valence (positive/negative), which is useful in sentiment analysis. The annotations are evaluated in three ways. First, they are checked on a test set of a thousand randomly extracted sentences evaluated by over forty respondents. Second, the annotation statistics are compared with those of other existing emotion blog corpora. Finally, the corpus is applied in several tasks, such as the generation of an emotion object ontology and the retrieval of emotional and moral consequences of actions.
Keywords: Emotion corpora; Corpus annotation; Sentiment analysis; Affect analysis
This article is indexed in ScienceDirect and other databases.