
A Noun-Based Semantic Computation Method for Microblogs
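Cite this article: Shi Rui, Feng Huamin. A noun-based semantic computation method for microblogs[J]. Journal of Beijing Electronic Science & Technology Institute, 2011, 19(4): 16-22, 29.
Authors: Shi Rui, Feng Huamin
Affiliation: School of Telecommunication Engineering, Xidian University, Xi'an, Shaanxi 710071, China
Funding: Supported by the National Natural Science Foundation of China project "Key Theories and Technologies of Multimedia Semantic Analysis Based on Multimodal Features" (No. 60972139) and the Beijing Natural Science Foundation project "Research on Network Public Opinion Analysis Based on the Semantics of Online Multimedia Information" (No. 4092041). The authors gratefully acknowledge this support.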
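Abstract: Microblogs spread quickly, appear in large volumes, and use terse language, which places higher demands on public opinion analysis. When features are extracted from short microblog texts to compute similarity, existing string-matching methods are limited in their ability to capture meaning. This paper therefore proposes a microblog similarity algorithm based on the semantics of nouns. The algorithm uses the set of nouns in a microblog as its feature and computes the similarity between short microblog texts from the tree structure of the HowNet (《知网》) dictionary. Experiments on a medium-scale microblog dataset show that the algorithm accurately identifies the topic of a microblog. More than 90% of the similarity values between microblogs of the same category fall between 0.6 and 1.0, which can support subsequent microblog clustering.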
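The abstract does not give the word-level formula. The following is a minimal sketch of noun-set similarity over a concept tree, built on several assumptions:

1. Each noun maps to a single node of a concept tree. `TOY_TREE` is a hypothetical, hand-made stand-in for the HowNet hierarchy, not real HowNet data, and it uses English placeholder words.
2. Word similarity uses the distance-based form alpha / (d + alpha) that is common in HowNet word-similarity work. Here d is the number of tree edges between the two nodes, and alpha = 1.6 is an assumed value.
3. Two noun sets are compared by averaging each noun's best match in the other set, computed in both directions so the result is symmetric.

None of these choices is confirmed as the paper's exact method. Noun extraction (word segmentation and part-of-speech tagging) is also left out. All function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy standing in for the HowNet tree: child -> parent (root has None).
TOY_TREE: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animal": "physical",
    "dog": "animal",
    "cat": "animal",
    "artifact": "physical",
    "phone": "artifact",
    "computer": "artifact",
    "mental": "thing",
    "news": "mental",
}


def ancestors(node: str, tree: Dict[str, Optional[str]]) -> List[str]:
    """Return [node, parent, grandparent, ..., root]; raises KeyError for unknown words."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = tree[current]
    return path


def tree_distance(a: str, b: str, tree: Dict[str, Optional[str]]) -> int:
    """Number of edges between a and b through their lowest common ancestor."""
    depth_in_a = {n: i for i, n in enumerate(ancestors(a, tree))}
    for j, n in enumerate(ancestors(b, tree)):
        if n in depth_in_a:
            return depth_in_a[n] + j
    raise ValueError("nodes are not in the same tree")


def word_sim(a: str, b: str, tree: Dict[str, Optional[str]], alpha: float = 1.6) -> float:
    """Assumed distance-based similarity alpha / (d + alpha); equals 1.0 when a == b."""
    return alpha / (tree_distance(a, b, tree) + alpha)


def noun_set_sim(nouns_a: List[str], nouns_b: List[str],
                 tree: Dict[str, Optional[str]], alpha: float = 1.6) -> float:
    """Average best-match word similarity, taken in both directions for symmetry."""
    if not nouns_a or not nouns_b:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(word_sim(x, y, tree, alpha) for y in dst) for x in src) / len(src)

    return (directed(nouns_a, nouns_b) + directed(nouns_b, nouns_a)) / 2


if __name__ == "__main__":
    # dog~cat and phone~computer are siblings (d = 2), so each best match is 1.6 / 3.6.
    print(round(noun_set_sim(["dog", "phone"], ["cat", "computer"], TOY_TREE), 3))  # 0.444
```

The best-match average lets two posts that mention related but non-identical nouns still score well, whereas exact string matching would give them zero overlap.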

Keywords: short microblog texts; semantic similarity; nouns

A Noun-based Micro Blog Semantic Algorithm
Rui Shi, Huamin Feng. A Noun-based Micro Blog Semantic Algorithm[J]. Journal of Beijing Electronic Science & Technology Institute, 2011, 19(4): 16-22, 29.
Authors: Rui Shi, Huamin Feng
Affiliation: Rui Shi (1), Huamin Feng (1, 2). (1) School of Telecommunication Engineering, Xidian University, Xi'an 710071, China; (2) Beijing Electronic Science and Technology Institute, Beijing 100070, P.R. China
Abstract: Common string-matching methods for capturing micro blog features in public opinion analysis are limited in semantic analysis. In this paper, an algorithm that computes the similarity of micro blogs based on the semantics of nouns is proposed. The algorithm uses the set of nouns in a micro blog as its feature and computes the similarity between short micro blog texts from the tree structure of the "HowNet" dictionary. Experimental results on a medium-sized micro blog dataset show that the algorithm identifies the subject accurately. More than 90% of the similarities between micro blogs of the same class fall in the range 0.6 to 1.0, which can support subsequent clustering.
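The abstract reports that more than 90% of same-class similarities lie between 0.6 and 1.0. The following is a minimal sketch of how such a band ratio could be computed. It assumes the statistic is taken over all unordered pairs of micro blogs that share a class label; the paper's exact evaluation protocol is not stated in the abstract. The function name `same_class_band_ratio` is hypothetical. The Jaccard overlap used in the usage example is only a stand-in for the noun-semantic similarity, not the paper's measure.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def same_class_band_ratio(
    items: Sequence[T],
    labels: Sequence[Hashable],
    sim: Callable[[T, T], float],
    low: float = 0.6,
    high: float = 1.0,
) -> float:
    """Fraction of same-class pairs whose similarity lies in [low, high].

    Hypothetical helper: scores every unordered pair (i < j) with labels[i] == labels[j];
    pairs from different classes are skipped. Returns 0.0 if there are no such pairs.
    """
    scores = [
        sim(a, b)
        for (a, la), (b, lb) in combinations(zip(items, labels), 2)
        if la == lb
    ]
    if not scores:
        return 0.0
    return sum(low <= s <= high for s in scores) / len(scores)


def jaccard(a: set, b: set) -> float:
    """Stand-in similarity for illustration only (not the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    posts = [{"phone", "price"}, {"phone", "price", "launch"}, {"rain"}, {"rain", "traffic"}]
    classes = ["tech", "tech", "weather", "weather"]
    # Same-class scores: 2/3 (inside the band) and 1/2 (outside), so the ratio is 0.5.
    print(same_class_band_ratio(posts, classes, jaccard))  # 0.5
```

Because `sim` is passed in as a parameter, the same counting code works with any pairwise similarity, including a noun-set similarity over the HowNet tree.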
Keywords: micro blog; semantic similarity; noun
This article is indexed in CNKI, VIP, Wanfang Data and other databases.