
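An Improved Method for Social Media Text Normalization
Cite this article: SONG Yajun, YU Zhonghua, CHEN Li, DING Gejian, LUO Qian. An Improved Method for Social Media Text Normalization[J]. Journal of Chinese Information Processing, 2015, 29(5): 104-112.
Authors: SONG Yajun  YU Zhonghua  CHEN Li  DING Gejian  LUO Qian
Affiliations: 1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065;
2. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004;
3. Information Technology Branch, the Second Research Institute of the General Administration of Civil Aviation of China, Chengdu, Sichuan 610042
Funding: Natural Science Foundation of Zhejiang Province (LY12F02010); Sichuan Province Science and Technology Support Program (2014GZ0063)
Abstract: Social media text is characteristically informal, so existing natural language processing tools perform poorly when applied to it directly; moreover, based on …

Keywords: social media  text normalization  natural language processing  word embedding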

An Improving Method for Social Media Text Normalization
SONG Yajun,YU Zhonghua,CHEN Li,DING Gejian,LUO Qian.An Improving Method for Social Media Text Normalization[J].Journal of Chinese Information Processing,2015,29(5):104-112.
Authors:SONG Yajun  YU Zhonghua  CHEN Li  DING Gejian  LUO Qian
Affiliation:1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China;
2. College of Mathematics, Physics and Information Engineering,
Zhejiang Normal University, Jinhua, Zhejiang 321004, China;
3. Information Technology Branch, the Second Research Institute of General Administration
of Civil Aviation of China, Chengdu, Sichuan 610042, China
Abstract: The informal style of social media text challenges many natural language processing tools, including many keyword-based methods proposed specifically for social media text. Normalizing social media text is therefore indispensable. Based on the assumption that lexical variants of a word occur in similar contexts, we propose an improved graph-based social media text normalization method that introduces a word embedding model to better capture this context similarity. Because the method is unsupervised and language-independent, it can process large-scale social media text in various languages. Experimental results show that the proposed method outperforms previous methods, achieving the best F-score.
Keywords: social media  text normalization  natural language processing  word embedding
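The context-similarity assumption in the abstract can be illustrated with a small sketch. It is not the authors' graph-based method. It simply ranks each canonical word as a normalization candidate for a non-standard token by a weighted mix of embedding cosine similarity (standing in for context similarity) and string similarity (surface form). The toy vectors, the CANONICAL word list, the alpha weight, and the normalize function are all hypothetical; in practice the vectors would come from a word embedding model trained on a large social media corpus.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy 3-d embeddings standing in for vectors learned by a word
# embedding model; variants that share contexts get similar vectors.
EMBEDDINGS = {
    "tmrw": [0.90, 0.10, 0.05],
    "tomorrow": [0.88, 0.12, 0.07],
    "today": [0.60, 0.50, 0.10],
    "2day": [0.58, 0.52, 0.12],
    "music": [0.05, 0.20, 0.95],
}

# Hypothetical dictionary of canonical (standard) word forms.
CANONICAL = {"tomorrow", "today", "music"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexical_similarity(a, b):
    """Surface-form similarity in [0, 1] using difflib's matching ratio."""
    return SequenceMatcher(None, a, b).ratio()


def normalize(token, alpha=0.5, top_k=1):
    """Rank canonical candidates for a token (hypothetical scoring scheme).

    score = alpha * embedding cosine (context) + (1 - alpha) * lexical similarity
    Canonical tokens and tokens without a vector are returned unchanged.
    """
    if token in CANONICAL or token not in EMBEDDINGS:
        return [(token, 1.0)]
    scored = []
    for cand in sorted(CANONICAL):  # sorted for deterministic iteration
        if cand not in EMBEDDINGS:
            continue
        ctx = cosine(EMBEDDINGS[token], EMBEDDINGS[cand])
        lex = lexical_similarity(token, cand)
        scored.append((cand, alpha * ctx + (1 - alpha) * lex))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for tok in ["tmrw", "2day", "music"]:
        print(tok, "->", normalize(tok)[0][0])
```

With these toy vectors, "tmrw" ranks "tomorrow" first and "2day" ranks "today" first. Mixing in a lexical term is one common way to keep words that merely share contexts (such as "today" and "tomorrow") from being confused; the weighting here is an assumption, not a value from the paper.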