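Research on New Word Recognition in Network Public Opinion Monitoring
Cite this article:TANG Ji-tao, LI Fei, GUO Chang-song. Research on New Word Recognition in Network Public Opinion Monitoring[J]. Microcomputer Development (微机发展), 2012, 0(1): 119-121,125
Authors:TANG Ji-tao  LI Fei  GUO Chang-song
Affiliation:[1] Department of Computer Science, Chengdu University of Information Technology, Chengdu, Sichuan 610225 [2] Department of Network Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225
Funding:Sichuan Provincial Education Research Project (川教函[2011]210号)
Abstract:In network public opinion monitoring, the suddenness of events and the proliferation of Internet vocabulary cause new words and new character strings to emerge in large numbers. A finite segmentation dictionary is essentially powerless to recognize such new words, so existing word segmentation systems split these unrecognized strings into scattered fragments. This greatly reduces the accuracy of hot-word and topic-word extraction and becomes a bottleneck for improving the performance of public opinion monitoring systems. The paper analyzes the strengths and weaknesses of the main current word segmentation techniques. Exploiting the fact that topic words absent from the dictionary occur with high local frequency in monitored text, it computes the cohesion between each anomalous segment and its neighboring segments in order to identify topic words that the dictionary does not contain. Experimental results show that the proposed segmentation algorithm can identify such out-of-dictionary topic words and, compared with traditional segmentation algorithms, is better suited to network public opinion monitoring.

Keywords:network public opinion monitoring  new word recognition  segmentation dictionary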

Research of New Word Pattern Recognization in Network Monitoring Public Opinion
TANG Ji-tao,LI Fei,GUO Chang-song. Research of New Word Pattern Recognization in Network Monitoring Public Opinion[J]. Microcomputer Development, 2012, 0(1): 119-121,125
Authors:TANG Ji-tao  LI Fei  GUO Chang-song
Affiliation:1. Department of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China; 2. Department of Network Engineering, Chengdu University of Information Technology, Chengdu 610225, China
Abstract:In network public opinion monitoring, a large number of new words and new character strings emerge because events break out suddenly and new words appear on the network with high frequency, so dictionary-based segmentation methods are largely ineffective against them. Most seriously, existing segmentation systems split these rarely seen strings into scattered fragments, which greatly reduces the accuracy of hot-word and keyword extraction and becomes a bottleneck for improving the performance of network monitoring systems. This paper analyzes the main advantages and disadvantages of several word segmentation techniques. Using the local high frequency of keywords that are not included in the dictionary, it calculates the bond between abnormal segments and their surrounding segments in order to identify keywords missing from the dictionary. The experiment shows that, compared with traditional segmentation algorithms, the proposed algorithm identifies such keywords better and is more suitable for network public opinion monitoring.
Keywords:network monitoring public opinion  new word pattern recognition  dictionary
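
The abstract describes merging fragments produced by a dictionary-based segmenter when they occur together with high local frequency and bond strongly to their neighbors, but it does not give the bond formula. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it uses pointwise mutual information (PMI) between adjacent fragments as a stand-in bond measure, considers every adjacent pair rather than only "abnormal" fragments, and relies on a hypothetical function name (`merge_new_words`) and hypothetical thresholds (`min_count`, `min_pmi`).

```python
import math
from collections import Counter


def merge_new_words(tokens, min_count=3, min_pmi=1.0):
    """Merge adjacent fragments that co-occur often and cohesively.

    tokens: fragments from some dictionary-based segmenter (assumed input).
    min_count, min_pmi: hypothetical thresholds, not taken from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    def pmi(a, b):
        # PMI is an assumed stand-in for the paper's bond measure.
        p_ab = bigrams[(a, b)] / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log(p_ab / (p_a * p_b))

    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            a, b = tokens[i], tokens[i + 1]
            # Local high frequency first, then cohesion between neighbors.
            if bigrams[(a, b)] >= min_count and pmi(a, b) >= min_pmi:
                merged.append(a + b)
                i += 2
                continue
        merged.append(tokens[i])
        i += 1
    return merged


# Fragments as a segmenter without "给力" in its dictionary might emit them.
sample = ["给", "力", "的", "比赛", "很", "给", "力", "他", "说", "给", "力", "真", "的"]
result = merge_new_words(sample)
```

In this toy input the pair "给" + "力" appears three times and is merged into a single candidate word, while pairs seen only once are left as separate fragments. Real monitoring text would need thresholds tuned to the corpus size.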
This article is indexed in VIP (维普) and other databases.