Similar literature
 Found 20 similar documents; search took 15 ms
1.
This study addresses the problem of Chinese microblog opinion retrieval, which aims to retrieve opinionated Chinese microblog posts relevant to a target specified by a user query. Existing studies have shown that lexicon-based approaches employ online public sentiment resources to rank sentiment words based on document features. However, these approaches cannot be applied effectively to microblogs, which are typical user-generated content with valuable contextual information: “user–user” interpersonal interactions and “user–post/comment” intrapersonal interactions. This contextual information is very helpful in estimating the strength of sentiment words more accurately. In this study, we integrate the social contextual relationships among users, posts/comments, and sentiment words into a mutual reinforcement model and propose a unified three-layer heterogeneous graph, on which a random walk sentiment word weighting algorithm is presented to measure the opinion strength of the sentiment words. Furthermore, the weights of sentiment words are incorporated into a lexicon-based model for Chinese microblog opinion retrieval. Comparative experiments are conducted on a Chinese microblog corpus, and the results show that our proposed mutual reinforcement model achieves significant improvement over previous methods.

2.
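Microblog users use tags to describe their interests and attributes. By analyzing the characteristics of microblog user tags and the limitations of existing microblog recommendation methods, this paper proposes an improved microblog interest modeling method based on semantic associations among multiple tags. To address the problem that existing tagging methods ignore semantic associations and the associations among multiple tags, the method first computes the co-occurrence frequency of tag pairs across the set of microblog users to obtain the internal semantic association of each tag pair. Next, it builds paths composed of the words that connect a tag pair and uses shared entropy to compute the pair's external semantic association. Finally, it combines the two into a tag-pair semantic association matrix, which is used to update the user-tag matrix, yielding a microblog user interest model based on multi-tag semantic associations. Experiments and analyses on a large volume of microblog data crawled through the Sina Weibo public API show that the resulting user interest model performs well.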
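The first step described above, counting how often two tags co-occur across users, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the normalization used here (the fraction of all users who carry both tags) is an assumption, and the function name `tag_cooccurrence` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(user_tags):
    """Return {(tag_a, tag_b): fraction of users who carry both tags}.

    Assumption: co-occurrence is normalized by the total number of users;
    the paper's actual normalization is not stated in the abstract.
    """
    pair_counts = Counter()
    for tags in user_tags.values():
        # Sort so that each unordered pair maps to a single dictionary key.
        for pair in combinations(sorted(set(tags)), 2):
            pair_counts[pair] += 1
    n_users = len(user_tags)
    return {pair: count / n_users for pair, count in pair_counts.items()}


# Hypothetical toy data: user id -> self-assigned tags.
users = {
    "u1": ["travel", "food", "photography"],
    "u2": ["travel", "food"],
    "u3": ["food", "music"],
    "u4": ["travel", "photography"],
}
cooc = tag_cooccurrence(users)
print(cooc[("food", "travel")])  # 0.5: two of the four users carry both tags
```

Pairs that never co-occur are simply absent from the result, which keeps the structure sparse when the tag vocabulary is large.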

3.
Sponsored blog posts need to disclose sponsorship information, specifically whether the blogger received any compensation for the posts. While some bloggers include sponsorship information only (i.e., “simple” sponsorship disclosure), others add a note that the opinions in the post are honest although it is a sponsored post (i.e., “honest opinions” sponsorship disclosure). This study examines how emphasizing “honest opinions” in sponsored posts affects consumers' responses. This study found that, compared to the no disclosure (control) condition, source credibility perceptions and message attitudes became negative in the “simple” sponsorship condition. However, the negative effects of sponsorship disclosure on source credibility perceptions and message attitudes disappeared in the “honest opinions” condition. This trend was stronger among those who had high skepticism toward product review blog posts.

4.
Zhong Zhaoman, Guan Yan, Hu Yun, Li Cunhua. Journal of Software (《软件学报》), 2017, 28(2): 278-291
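Mining microblog users' interests is foundational work for personalized recommendation and community detection. Based on an in-depth analysis of the characteristics of microblog networks, this paper gives a descriptive model that reveals their multi-modal nature, which is of reference value for subsequent research on microblog networks. According to these characteristics, it proposes a method for representing and mining users' static interests based on background information, and a method for representing and mining users' dynamic interests based on their posts. For the large number of inactive users who lack background information and post very little, it proposes an interest mining method based on the accounts they follow. Taking Sina Weibo as an example, experiments on user interest mining and similarity computation in five domains (fashion, business management, education, military affairs, and culture) show that, compared with mainstream interest mining methods, the proposed representation and mining methods effectively improve the results of microblog user interest mining.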

5.
Since classical public sentiment analysis systems for microblogs are based on text sentiment analysis, it is difficult for them to determine the sentiment of short microblog posts that lack clear sentiment words. Fortunately, many microblog posts contain images, which also convey users’ sentiment. To fully understand users’ sentiment, we propose a cross-media public sentiment analysis system for microblogs. The main advantage of this system is its unified cross-media public sentiment analysis framework, which fuses text sentiment and image sentiment not only at the level of sentiment results but also at the level of sentiment ontology. To improve presentation, the system shows sentiment results from a macroscopic view and from a microscopic view, the latter detailing the sentiment results by region, topic, microblog content, and user diffusion. To our knowledge, this is the first unified cross-media public sentiment analysis system.

6.
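This paper proposes an improved model based on the traditional PageRank algorithm to evaluate the influence of bloggers in a microblog community. A blogger's influence reflects how much voice they have in the community and is one of the core concepts in the study of microblog communities. Statistics on network indicators such as average degree, clustering coefficient, and average path length verify that the microblog community network has a pronounced "small-world" property. Evaluation indicators for blogger influence are constructed from two perspectives, user activity and post quality; a factor for the blogger's propagation ability is introduced; and a new influence ranking algorithm, Influence Rank, is designed following the idea of PageRank. Comparative experiments show that Influence Rank considers the characteristics of the nodes themselves in addition to the relationships between nodes, and therefore reflects bloggers' influence rankings more accurately and objectively.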
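The abstract does not spell out the Influence Rank formula, so the following is not the authors' algorithm. It is only a sketch of one standard way to let a node's own characteristics enter a PageRank-style score: power iteration with a non-uniform teleport vector. The function name `influence_rank`, the `prior` scores (standing in for activity or post quality), and the toy graph are all hypothetical.

```python
def influence_rank(follows, prior, damping=0.85, iterations=50):
    """PageRank power iteration with a non-uniform teleport vector.

    follows: dict mapping a user to the list of users they follow; an edge
             u -> v passes part of u's score to v. Every user named here
             must also be a key of `prior`.
    prior:   dict of non-negative per-user scores (assumed to summarize the
             node's own characteristics); normalized to sum to 1.
    """
    nodes = list(prior)
    total_prior = sum(prior.values())
    teleport = {n: prior[n] / total_prior for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node: redistribute its mass according to the prior.
                for v in nodes:
                    new_rank[v] += damping * rank[u] * teleport[v]
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b follow c, and c follows a.
follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
prior = {"a": 1.0, "b": 1.0, "c": 2.0}
ranks = influence_rank(follows, prior)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

With a uniform `prior` this reduces to ordinary PageRank, which makes it easy to compare the two rankings on the same graph.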

7.
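Microblog information source tracing analyzes topic datasets collected from the platform to find the true sources of a topic, that is, the set of posts that were published early and have large influence, in order to manage and guide online public opinion. This paper proposes a microblog source tracing algorithm based on user interests. The algorithm computes a blogger's influence from the blogger's interests, computes the influence of commenters and reposters from their interests, combines these with factors such as the attention the blogger receives and the publication time, scores posts with a web page ranking algorithm, and traces sources by ranking the posts by score. Experimental results show that the algorithm improves recall by about 21% over traditional source tracing algorithms.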

8.
A blog, or weblog, is an online diary whose writer is known as a blogger. Many bloggers choose to publish anonymously. This paper examines whether a blog by an anonymous blogger will be perceived as being any more or less credible than one by an identifiable blogger. Two studies were conducted in the UK to examine this, with one of the two studies being replicated in Malaysia. The first study presented respondents with a blog entry in one of three conditions: where the blogger was fully identifiable with a photograph, where only the age and sex of the blogger were revealed, and where only an alias was given for the blogger. Multi-item constructs were used to measure the credibility of the blog and the blogger. No differences were found. Study 2 examined whether this was due to the presentation of the blog entry. This time respondents were shown one of two blog posts which conveyed exactly the same information and revealed exactly the same information about the blogger. One post introduced a number of spelling, grammar, and punctuation errors. Results show that the writer of the well-presented blog was perceived as being more credible than the writer of the badly presented blog, but there was no difference in the credibility of the blog itself. The implications of the results are discussed with reference to the use of blogs as a knowledge-sharing tool.

9.
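To address the low quality of microblog data caused by colloquial and non-standard microblog language, three algorithms (centroid, degree centrality, and eigenvector centrality) are used to clean microblog topic data and thereby improve data quality. Algorithm performance is analyzed by comparing attribute indicators of topic posts, such as conformity to language norms, relevance, and usefulness, before and after cleaning. Experimental results show that after processing by the three cleaning algorithms, the overall quality of topic posts improves, especially on the language-norm indicator; the centroid algorithm cleans well with respect to the usefulness indicator; and the degree centrality and eigenvector centrality algorithms help obtain topic posts that are strongly similar to one another.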

10.
A microblog timeline summarization algorithm based on sliding window   Total citations: 1, self-citations: 0, citations by others: 1
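Timeline summarization is a technique for condensing text content and generating summaries along the time dimension. Traditional timeline summarization mainly studies long texts such as news, whereas this paper studies timeline summarization for short microblog texts. Because short microblog texts offer limited content features, a summary cannot be generated from text content alone. This paper therefore evaluates timeline summaries with three indicators (content coverage, temporal distribution, and propagation influence) and proposes a sliding-window-based microblog timeline summarization algorithm (Microblog timeline summarization based on sliding window, MTSW). The algorithm first uses term strength and entropy to determine representative terms; it then builds a composite indicator for evaluating timeline summaries from the three indicators above; finally, it uses a sliding window to traverse the sequence of microblog messages along the time axis and generate the timeline summary. Experiments on a real microblog dataset show that the timeline summaries generated by MTSW effectively reflect how hot events develop and evolve.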

11.
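Automatically identifying the aspects involved in opinions about legal cases on microblogs is an important means of understanding news and public opinion on social media. However, microblog text is flexible in both form and content, and traditional aspect identification methods usually use only the post body or only the comments, so their semantic understanding of microblogs is very limited. This paper studies aspect identification for case-related microblog text and proposes a case aspect identification method based on interactive attention between post bodies and comments, which identifies the aspects involved in case opinions by fusing the contextual information of social media. First, the post body and the comments are encoded separately with a Transformer framework. Then an interactive attention mechanism fuses the body and comment information, and the case aspects of the comment texts are identified from the fused features. Finally, experiments are conducted on a microblog dataset built from 12 cases. The results show that fusing body and comment information through interactive attention significantly improves the accuracy of case aspect identification, demonstrating the effectiveness of the proposed method.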

12.
This paper introduces a two-layered framework that improves the results of authorship identification for larger numbers of bloggers as compared with earlier work. Previous studies are mainly divided into two categories: profile-based and instance-based methods. Each of these approaches has its advantages and limitations. The two-layered framework presented here integrates the two previous approaches and presents a new solution to a key problem in authorship identification, namely the drop in accuracy experienced as the number of authors increases. The paper begins by illustrating the regular instance-based core model and the investigated features. It then introduces a new psycholinguistic profile representation of authors, presents similarity grouping extraction over profiles, and applies blogger identification utilizing the two-layered approach. The results confirm the improvement introduced by the proposed two-layered approach against our regular classifier, as well as a selected baseline, for an extended number of users.

13.
Being aware of local community information is critical to maintaining civic engagement and participation. The use of online news and microblog content to create and disseminate community information has long been studied. However, online spaces dedicated to local communities tend to see very limited use, and people often do not consider microblog content a meaningful source of local community information. Local News Chatter (LNC) was designed to address these challenges by augmenting local news feeds with microblog content and presenting them in a tag cloud that displays news topics of varying popularity with different tag sizes. Our study with 30 local residents highlights that LNC increases the visibility of hyperlocal community news information and successfully utilizes microblog content as an additional information layer. LNC also increases one’s community awareness and shows the potential for leveraging community knowledge as a deliberation platform for local topics.

14.
Grammar induction, also known as grammar inference, is one of the most important research areas in natural language processing. The availability of large corpora has encouraged many researchers to use statistical methods for grammar induction. The problem can be divided into three categories (supervised, semi-supervised, and unsupervised) according to the type of data set required for the training phase. Most current induction methods are supervised and need a bracketed data set for training; the lack of such data sets in many languages encouraged us to focus on unsupervised approaches. Here, we introduce a novel approach to unsupervised grammar inference, which we call history-based inside-outside (HIO), that uses part-of-speech tag sequences as the only source of lexical information. HIO is an extension of the inside-outside algorithm enriched with some notions from history-based approaches. Our experiments on English and Persian show that by adding some conditions to the rule assumptions of the induced grammar, one can achieve an acceptable improvement in the quality of the output grammar.

15.
We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two different languages). To this end, we develop a new bilingual probabilistic topic model called comparable bilingual latent Dirichlet allocation (C-BiLDA), which is able to deal with such comparable data, and, unlike the standard bilingual LDA model (BiLDA), does not assume the availability of document pairs with identical topic distributions. We present a full overview of C-BiLDA, and show its utility in the task of cross-lingual knowledge transfer for multi-class document classification on two benchmarking datasets for three language pairs. The proposed model outperforms the baseline LDA model, as well as the standard BiLDA model and two standard low-rank approximation methods (CL-LSI and CL-KCCA) used in previous work on this task.

16.
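Recommending content that microblog users find valuable and interesting is an important way to improve user experience. By analyzing the characteristics of microblogs and the shortcomings of existing microblog recommendation algorithms, and using tag information to represent user interests, this paper proposes LPCMR, a microblog recommendation method based on the probabilistic correlation of tags. First, the method uses the probabilistic correlation between tags to construct a tag similarity matrix. It then strengthens tag weights through a correlated-tag weighting scheme and builds a user-tag matrix. To address the sparsity of the user-tag matrix, the tag similarity matrix is used to update the user-tag matrix, so that the matrix contains both user interest information and the relationships between tags. Experiments and analyses on microblog data crawled through the Sina Weibo public API show that the proposed recommendation algorithm works well.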
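One simple way to update a sparse user-tag matrix with a tag similarity matrix is a matrix product, so that weight on one tag spills over to related tags. The abstract does not state the LPCMR update rule, so treating the update as a plain product is an assumption, and the function name `smooth_user_tags` and the toy matrices are hypothetical.

```python
def smooth_user_tags(user_tag, tag_sim):
    """Multiply a user-tag weight matrix by a tag-tag similarity matrix.

    user_tag: list of rows, one per user, each a list of tag weights.
    tag_sim:  square list of lists; tag_sim[i][j] is the similarity of
              tag i to tag j, with 1.0 on the diagonal.
    Assumption: the update is a plain matrix product, which is only one
    of several possible ways to fold tag relations into user profiles.
    """
    n_tags = len(tag_sim)
    return [
        [sum(row[k] * tag_sim[k][j] for k in range(n_tags)) for j in range(n_tags)]
        for row in user_tag
    ]


# Hypothetical toy data; columns are the tags "travel", "food", "music".
tag_sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
user_tag = [
    [1.0, 0.0, 0.0],  # user 0 only lists "travel"
    [0.0, 0.0, 1.0],  # user 1 only lists "music"
]
dense = smooth_user_tags(user_tag, tag_sim)
print(dense[0])  # [1.0, 0.5, 0.0]: "food" gains weight through "travel"
```

Because "music" is unrelated to the other tags in this toy similarity matrix, user 1's row is unchanged by the update.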

17.
The widespread acceptance and use of blog communities by a significant portion of Web users has made knowledge extraction from blogs a particularly important research field. One of the most interesting related problems is opinionated retrieval, that is, the retrieval of blog entries which contain opinions about a topic. There has been a remarkable amount of work on improving the effectiveness of opinion retrieval systems. The primary objective of these systems is to retrieve blog posts which are both relevant to a given query and contain opinions, and to generate a ranked list of the retrieved documents according to their relevance and opinion scores. Although a wide variety of effective opinion retrieval methods have been proposed, to the best of our knowledge, none of them takes into consideration the importance of the retrieved opinions. In this work we introduce a ranking model which combines the existing retrieval strategies with query-independent information to enhance the ranking of the opinionated documents. More specifically, our model accounts for the influence of the blogger who authored an opinion, the reputation of the blog site which published a specific blog post, and the impact of the post itself. Furthermore, we expand the current proximity-based opinion scoring strategies by considering the physical locations of the query and opinion terms within a document. We conduct extensive experiments with the TREC Blogs08 dataset which demonstrate that the application of our methods enhances retrieval precision by a significant margin.

18.
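Microblog data is real-time and dynamic, and by analyzing it people can detect events in real life. At the same time, the massive volume, short texts, and rich social relationships of microblog data bring new challenges to event detection. Taking into account the textual features of microblog data (reposts, comments, embedded links, user hashtags, named entities, and so on), its semantic features, its temporal characteristics, and its social relationship characteristics, this paper proposes an effective event detection algorithm for microblog data (event detection in microblogs, EDM). It also proposes a method that builds an event summary by extracting the key elements of an event: keywords, named entities, posting time, and users' sentiment orientation. Experimental comparison with an event detection algorithm based on the LDA (latent Dirichlet allocation) model shows that EDM achieves better event detection results and provides more intuitive and readable event summaries.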

19.
A multi-feature fusion method for blog post classification   Total citations: 2, self-citations: 0, citations by others: 2
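Blogs have become one of the most popular applications on the Internet. Blog posts vary enormously in content, so classifying them is important. Blog posts differ from news articles, and applying general text classification methods directly to them gives unsatisfactory results. This paper proposes a new method that makes full use of features specific to blog posts, such as tags and user-defined categories, and fuses these features. In addition, user-defined categories are preprocessed to filter out noise words unrelated to the category. Experimental results show that the multi-feature fusion method effectively improves the accuracy of blog post classification.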

20.
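The cold-start problem of recommender systems is a recent research focus, and determining user activeness is the basis for addressing it. Existing methods consider only the amount of information a user publishes when judging activeness, and make insufficient use of features such as social relationships and behaviors in social media. Targeting microblog networks, this paper proposes a systematic method for determining user activeness. Its main contributions are: (1) it proposes four categories of indicators that affect user activeness in microblog networks, namely user background, social relationships, quality of published content, and social behavior, avoiding the crude approach of judging activeness solely by the number of posts; (2) it proposes a procedure for determining user activeness and a model, based on the four categories of indicators, for computing the dissimilarity between a user and a user set. Taking Sina Weibo as an example, 900 users from five domains (academic research, business management, education, culture, and military affairs) were selected as the test set, and the experiments were analyzed and compared using precision P, recall R, and F-measure as evaluation metrics. The results show that the precision P, recall R, and F-measure of the proposed method are 21%, 13%, and 16% higher, respectively, than those of the traditional method. When the proposed method is applied to user recommendation, the resulting P, R, and F values are 5%, 2%, and 3% higher, respectively, than those of the most recent method, verifying the effectiveness of the proposed method.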
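For reference, the three evaluation metrics named above have standard definitions, which the sketch below computes for a set of users predicted to be active against a ground-truth set. The abstract does not say which F variant was used, so the balanced F1 is an assumption, and the function name `precision_recall_f` and the toy user ids are hypothetical.

```python
def precision_recall_f(predicted, actual):
    """Return (precision, recall, F1) for two collections of item ids.

    Assumption: F is the balanced F1, the harmonic mean of P and R.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: users judged active by a method vs. by annotators.
predicted_active = ["u1", "u2", "u3", "u4"]
truly_active = ["u2", "u3", "u5"]
p, r, f = precision_recall_f(predicted_active, truly_active)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Here two of the four predicted users are truly active (P = 2/4) and two of the three truly active users are found (R = 2/3), giving F1 = 4/7.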
