Similar literature
 20 similar articles found (search time: 31 ms)
1.
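Microblog topic mining based on the MB-LDA model   Total citations: 5 (self-citations: 0, citations by others: 5)
As microblogging grows ever more popular, microblog sites such as Twitter have become outlets for massive amounts of information, and microblog research needs to shift from analyzing user relationships alone to mining the content of the microblogs themselves. In data mining, topic mining of conventional text has been studied extensively, but microblogs are a special kind of text that carries structured social-network information, so traditional text mining algorithms cannot model them well. This paper proposes MB-LDA, an LDA-based generative model for microblogs that takes both the contact relations and the text relations among microblogs into account to assist topic mining. Inference for the model is derived with Gibbs sampling; it uncovers not only the topics of microblogs but also the topics that contacts pay attention to. The model can also be extended to many other kinds of text with social-network properties. Experiments on real datasets show that MB-LDA mines microblog topics effectively.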
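
The abstract does not give the MB-LDA sampling equations, so the block below is only a minimal sketch of collapsed Gibbs sampling for plain LDA, the base model that MB-LDA extends. It does not model contact or text relations. The function name `lda_gibbs`, the hyperparameter defaults, and the integer word-id encoding are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not MB-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k

    def bump(d, w, z, delta):
        # Add or remove one token of word w in document d from topic z.
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            bump(d, w, z, +1)
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assign[d][i], -1)  # take the token out of the counts
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                bump(d, w, z, +1)             # put it back under the new topic
    return doc_topic, topic_word
```

Normalizing a row of `doc_topic` (after adding `alpha`) gives an estimate of that document's topic mixture. A model in the spirit of MB-LDA would add further latent variables for the social relations on top of this loop.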

2.
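于广川, 贺瑞芳, 刘洋, 党建武. 《软件学报》 (Journal of Software), 2017, 28(10): 2654-2673
Temporal tweet summarization is an important branch of text summarization. It aims to distill, from the massive stream of tweets related to a hot event, a concise set of tweets that evolves over time, helping users obtain information quickly. Twitter is among today's most popular social media platforms; the explosive growth of its content and the unstructured, fragmentary nature of its text mean that traditional summarization methods relying solely on text content no longer apply. At the same time, the new characteristics of social media bring new opportunities for tweet summarization. Treating the tweet stream as a signal, this paper analyzes the complex noise in it and proposes a new temporal tweet summarization method that fuses the macro- and micro-level signals of the stream's variation over time with users' social context information. First, wavelet analysis models the global temporal information of the tweet stream to detect the time points of hot sub-events related to a given keyword. Next, local temporal information and user social information are incorporated into a random-walk graph model summarization framework that generates a tweet summary for each hot sub-event. For evaluation, a real tweet dataset was manually annotated with expert time points and expert summaries; the experimental results demonstrate the effectiveness of wavelet analysis and of the graph model fused with temporal-social context for temporal tweet summarization.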

3.

With the development of online social networking applications, microblogs have become a necessary online communication network in daily life. Users are interested in obtaining personalized recommendations related to their tastes and needs. In some microblog systems, tags are not available, or the use of tags is rare. In addition, user-specified social relations are extremely rare. Hence, sparsity is a problem in microblog systems. To address this problem, we propose a new framework called Pblog to alleviate sparsity. Pblog identifies users’ interests via their microblogs and social relations and computes implicit similarity among users using a new algorithm. The experimental results indicated that the use of this algorithm can improve the results. In online social networks, such as Twitter, the number of microblogs in the system is high, and it is constantly increasing. Therefore, providing personalized recommendations to target users requires considerable time. To address this problem, the Pblog framework groups similar users using the analytic hierarchy process (AHP) method. Then, Pblog prunes microblogs of the target user group and recommends microblogs with higher ratings to the target user. In the experimental results section, the Pblog framework was compared with several other frameworks. All of these frameworks were run on two datasets: Twitter and Tumblr. Based on the results of these comparisons, the Pblog framework provides more appropriate recommendations to the target user than previous frameworks.


4.
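A sliding-window-based timeline summarization algorithm for microblogs   Total citations: 1 (self-citations: 0, citations by others: 1)
Timeline summarization is a technique for condensing text content and generating summaries along the time dimension. Traditional timeline summarization mainly studies long texts such as news, whereas this paper studies timeline summarization of short microblog texts. Because short microblog texts offer limited content features, summaries cannot be generated from text content alone. This paper therefore evaluates timeline summaries using three indicators (content coverage, temporal distribution, and propagation influence) and proposes a sliding-window-based microblog timeline summarization algorithm (Microblog timeline summarization based on sliding window, MTSW). The algorithm first uses term strength and entropy to determine representative terms; it then builds a composite evaluation metric for timeline summaries from the three indicators; finally, it uses a sliding window to traverse the sequence of microblog messages along the time axis and generate the timeline summary. Experiments on a real microblog dataset show that the timeline summaries generated by MTSW effectively reflect how hot events develop and evolve.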
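
The abstract does not specify the composite metric, so the following is only a rough sketch of the sliding-window step under stated assumptions. A post's score is a hypothetical weighted sum of content coverage (the share of representative terms it contains) and propagation influence (its normalized repost count). Temporal distribution is approximated by picking at most one post per window position. The names `timeline_summary`, `width`, `step`, and `alpha` are all illustrative.

```python
def timeline_summary(posts, key_terms, width, step, alpha=0.5):
    """Pick the best-scoring post in each window along the time axis.

    posts: list of (timestamp, text, reposts) tuples.
    key_terms: representative terms (assumed to be given; the paper derives
               them from term strength and entropy).
    """
    if not posts:
        return []
    key_terms = set(key_terms)
    posts = sorted(posts)  # order by timestamp
    max_reposts = max(p[2] for p in posts) or 1

    def score(post):
        _, text, reposts = post
        words = set(text.lower().split())
        coverage = len(words & key_terms) / len(key_terms) if key_terms else 0.0
        influence = reposts / max_reposts
        return alpha * coverage + (1 - alpha) * influence  # hypothetical weighting

    summary = []
    start, end = posts[0][0], posts[-1][0]
    while start <= end:
        window = [p for p in posts if start <= p[0] < start + width]
        if window:
            best = max(window, key=score)
            if best not in summary:  # overlapping windows may pick the same post
                summary.append(best)
        start += step
    return summary
```

With `step < width` the windows overlap, which is the usual sense of "sliding"; with `step == width` the time axis is simply cut into consecutive buckets.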

5.
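Microblog data is real-time and dynamic, and analyzing it makes it possible to detect real-world events. At the same time, its sheer volume, short texts, and rich social relations pose new challenges for event detection. Taking into account the textual features of microblog data (reposts, comments, embedded links, user hashtags, named entities, etc.), its semantic features, temporal characteristics, and social-relation characteristics, this paper proposes an effective algorithm for event detection in microblogs (EDM). It also proposes a method that builds event summaries by extracting the key elements of an event, namely keywords, named entities, posting time, and users' sentiment orientation. An experimental comparison with an event detection algorithm based on the latent Dirichlet allocation (LDA) model shows that EDM achieves better event detection results and provides more intuitive, readable event summaries.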

6.
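To simulate how information spreads in a microblog environment, this paper proposes a microblog user influence algorithm that fuses user behavior (posting, following, reposting, commenting, etc.) with microblog content. Analysis of user behavior yields behavior-factor data, from which a weight for user influence is computed. A word co-occurrence matrix is built from users' microblog content, and a latent Dirichlet allocation (LDA) model is then applied to identify latent topic distributions. The similarity between users is obtained via Kullback-Leibler (KL) divergence and is finally combined with the influence weight to give each user's influence. Experiments show that the algorithm is reasonably effective.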

7.
This study addresses the problem of Chinese microblog opinion retrieval, which aims to retrieve opinionated Chinese microblog posts relevant to a target specified by a user query. Existing studies have shown that lexicon-based approaches employ online public sentiment resources to rank sentiment words based on document features. However, this approach cannot be applied effectively to microblogs, which are typical user-generated content with valuable contextual information: “user–user” interpersonal interactions and “user–post/comment” intrapersonal interactions. This contextual information is very helpful in estimating the strength of sentiment words more accurately. In this study, we integrate the social contextual relationships among users, posts/comments, and sentiment words into a mutual reinforcement model and propose a unified three-layer heterogeneous graph, on which a random walk sentiment word weighting algorithm is presented to measure the strength of opinion of the sentiment words. Furthermore, the weights of sentiment words are incorporated into a lexicon-based model for Chinese microblog opinion retrieval. Comparative experiments are conducted on a Chinese microblog corpus, and the results show that our proposed mutual reinforcement model achieves significant improvement over previous methods.

8.
Social broadcasting services, such as Twitter and YouTube, are revolutionizing the way we access information and publish our own content. What is the key innovation of such services? We argue that the key innovation of social broadcasting services is recognizing and connecting people’s need for information and attention. While the value of information is widely studied, the importance of attention is less well understood. We use a collection of nearly 3 million Twitter user profiles to study the cross-sectional characteristics of user behavior; we also monitor 521 active Twitter users over a period of 282 days to carry out time-series analyses and a panel data analysis of user behavior. The empirical results consistently suggest that people’s search for attention is an important motivation for them to contribute content on Twitter. This finding supports our conceptual view of social broadcasting services as innovative platforms connecting people’s need for information and attention. It also has important implications for practitioners in this booming field.

9.
Social media, especially Twitter, is now one of the most popular platforms where people can freely express their opinions. However, it is difficult to extract important summary information from the many millions of tweets sent every hour. In this work we propose a new concept, sentimental causal rules, and techniques for extracting sentimental causal rules from textual data sources such as Twitter, which combine sentiment analysis and causal rule discovery. Sentiment analysis refers to the task of extracting public sentiment from textual data. The value of sentiment analysis lies in its ability to reflect popularly voiced perceptions that are stated in natural language. Causal rules, on the other hand, indicate associations between different concepts in a context where one (or several concepts) cause(s) the other(s). We believe that sentimental causal rules are an effective summarization mechanism that combines the causal relations among different aspects extracted from textual data with the sentiment embedded in these causal relationships. To show the effectiveness of sentimental causal rules, we have conducted experiments on Twitter data collected on the Kurdish political issue in Turkey, which has been an ongoing heated public debate for many years. Our experiments on Twitter data show that sentimental causal rule discovery is an effective method for summarizing information about important aspects of an issue on Twitter, which may further be used by politicians for better policy making.

10.
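Recommending content that microblog users find valuable and interesting is an important way to improve user experience. By analyzing the characteristics of microblogs and the shortcomings of existing microblog recommendation algorithms, and using tag information to represent user interests, this paper proposes LPCMR, a microblog recommendation method based on the probabilistic correlation of tags. First, the method uses the probabilistic correlation between tags to construct a tag similarity matrix. It then strengthens tag weights through a correlation-based tag weighting scheme and builds a user-tag matrix. To address the sparsity of the user-tag matrix, the tag similarity matrix is used to update the user-tag matrix, so that it contains both user interest information and the relationships between tags. A series of experiments and analyses were carried out on microblog data crawled through the public Sina Weibo API, and the results show that the proposed recommendation algorithm performs well.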
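
The abstract does not spell out the formulas, so the sketch below shows only one plausible reading of the idea, under loud assumptions. The "probabilistic correlation" between tags is taken to be the conditional probability P(tag_j | tag_i) estimated from tag co-occurrence across users. The sparse user-tag matrix R is densified as R' = R · S. The function names and the toy data are hypothetical and are not taken from LPCMR.

```python
def tag_similarity(user_tags, tags):
    """S[i][j] = P(tags[j] | tags[i]), estimated from co-occurrence across users.

    user_tags: dict mapping each user to the set of tags on their profile.
    tags: ordered list of all tags (fixes the matrix row/column order).
    """
    count = {t: 0 for t in tags}
    co = {(a, b): 0 for a in tags for b in tags}
    for tag_set in user_tags.values():
        for a in tag_set:
            count[a] += 1
            for b in tag_set:
                co[(a, b)] += 1  # includes a == b, so S[i][i] is 1 for used tags
    return [[co[(a, b)] / count[a] if count[a] else 0.0 for b in tags] for a in tags]


def update_user_tag_matrix(user_tag, sim):
    """Spread each user's tag weights onto correlated tags: R' = R · S."""
    n = len(sim)
    return [[sum(row[k] * sim[k][j] for k in range(n)) for j in range(n)]
            for row in user_tag]
```

For example, a user who only has the tag "music" receives a non-zero weight on "film" if other users tend to carry both tags, which is the sense in which the updated matrix holds both user interest and tag-to-tag relations.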

11.
Twitter is one of the most popular social media platforms for online users to create and share information. Tweets are short, informal, and large-scale, which makes it difficult for online users to find reliable and useful information and gives rise to the problem of Twitter summarization. On the one hand, tweets are short and highly unstructured, which makes it difficult for traditional document summarization methods to handle Twitter data. On the other hand, Twitter provides rich social-temporal context beyond texts, bringing about new opportunities. In this paper, we investigate how to exploit social-temporal context for Twitter summarization. In particular, we provide a methodology to model temporal context globally and locally, and propose a novel unsupervised summarization framework with social-temporal context for Twitter data. To assess the proposed framework, we manually label a real-world Twitter dataset. Experimental results from the dataset demonstrate the importance of social-temporal context in Twitter summarization.

12.
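马慧芳, 张迪, 赵卫中, 史忠植. 《软件学报》 (Journal of Software), 2019, 30(11): 3397-3412
Recommending content that microblog users find valuable and interesting is an important way to improve user experience. By analyzing the characteristics of microblogs and the shortcomings of existing microblog recommendation algorithms, and using tag information to represent user interests, this paper proposes a microblog recommendation method that combines tag expansion with the probabilistic correlation of tags. First, since most microblog users have added no tags, or too few, to their profiles, a hypergraph is built in which each microblog a user posts is a hyperedge and each word in a microblog is a hypervertex. Hyperedges and hypervertices are weighted under a weighting strategy, and a random walk on the hypergraph yields a number of keywords that expand the user's tags. Then a user-tag matrix is built using a correlation-based tag weighting scheme, a tag similarity matrix is constructed from the probabilistic correlation between tags, and the user-tag matrix is updated with it, so that the matrix contains both user interest information and the relationships between tags. A series of experiments and analyses were carried out on microblog data crawled through the public Sina Weibo API, and the results show that the recommendation algorithm performs well.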

13.
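Microblog platforms contain latent user information, and mining user interests from microblog data is of significant social value. Combining the characteristics of user interests and microblog content, this paper proposes TCID-MUIM, a microblog user interest mining method based on text clustering and interest decay. First, a synonym-merging strategy based on Cilin (词林) compensates for insufficient word-frequency information during modeling. Second, a two-pass Single-Pass incomplete clustering algorithm partitions a user's microblogs into clusters, and each cluster is merged into a single document to offset the difficulty of mining topic information from short microblog texts. Finally, an LDA model is fitted and, because user interests change over time, a time factor is introduced to compress the microblog-topic matrix into a user-topic matrix from which user interests are obtained. Experiments show that, compared with traditional modeling and with modeling that merges a user's entire microblog history into one document, the interest topics mined by TCID-MUIM are better differentiated and fit users' real interest preferences more closely.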

14.
Social media such as microblogs (Twitter™) allow more people to disclose more personal and private information more frequently to more others than ever before. But what is the nature of, and what factors influence, those disclosures? Applying concepts from research and theory on self-disclosure research and microblogging, this study analyses 3751 tweets, with nearly half including disclosures, over a three-day period. At the user level, user-controlled boundary impermeability varied by user gender, feed identity (parenting, social media professional), and their interaction. At the tweet level, tweet valence, presence of disclosure, and front- or back-stage disclosure were variously influenced by user gender, Twitter feed identity, interactions between them, and boundary impermeability. Social construction of gender roles and social identities, as well as individual tendencies, and possibly communication contexts, are reflected in the valence, presence, and stage of disclosures in microblog content.

15.
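段大高, 白宸宇, 韩忠明, 熊海涛. 《计算机工程》 (Computer Engineering), 2022, 48(10): 138-145+157
Rumor detection on social media is a current research focus. Most existing methods learn user features from large numbers of user attributes, which is unsuitable for early rumor detection and ignores the influence of latent relationships between users on information propagation. This paper proposes a rumor detection method based on multi-transfer influence. A text propagation graph is built from the relationships between each source microblog and its reposts (comments), and a graph convolutional network captures and learns the propagation features of the text. The influence of text and users during propagation is used to enrich the information available in the early stage of rumor detection. Users linked by repost relationships form a user influence propagation graph, and a learning method for user node influence is constructed to obtain each user node's influence and strengthen user feature information. On this basis, text features and user features are fused for rumor detection to improve performance. Experiments on three real social media datasets show significant improvements in both automatic and early rumor detection; compared with the best current baseline, accuracy on the Weibo, Twitter15, and Twitter16 datasets improves by 2.8%, 6.9%, and 3.4%, respectively.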

16.
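Reposting is an important means of information dissemination on microblogs, and repost prediction is valuable for analyzing microblog influence and microblog topics. Most existing research on repost prediction centers on features of the microblog itself, such as message attributes and user attributes. This paper proposes a repost prediction method that incorporates hot topics. It quantitatively analyzes how the content and propagation trend of background hot topics affect users' reposting behavior, proposes features such as repost interest, repost activity, and behavior patterns that incorporate background hot-topic information, and builds a classification-based repost prediction model for microblogs related to hot topics. Experiments on real data show that the method reaches a prediction accuracy of 96.6%, an improvement of up to 12.14%.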

17.
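Current rumor detection methods for microblogs rely mainly on the text of the post, supplemented by information such as user comment features and propagation features. However, existing methods do not consider that the quality of user comments directly affects rumor detection performance; low-quality comments can introduce useless or even harmful features, further degrading performance. To address this problem, and building on the correlation between user comments and rumor detection, this paper proposes, for the first time, a rumor detection method that accounts for comment validity and is based on multi-task joint learning. Rumor detection is the main task and user comment relevance detection is the auxiliary task. A gating mechanism and an attention mechanism then filter and select effective user comment features. Finally, experiments are run on a self-built dataset of 30,000 microblog rumors about the epidemic. The results show that filtering user comments not only improves rumor detection performance but also allows the quality of user comments to be assessed.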

18.
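The exponential growth in the number of social network users makes it hard for users to find suitable friends online. This paper proposes BSBT-FR, a microblog friend recommendation algorithm based on the multi-object detection algorithm SSD and a temporal model. SSD first extracts information from the collected user images; the temporal model then further processes the extracted information along the time dimension; the JS divergence formula is next used to compute the similarity between users; finally, together with the similarity derived from users' personal information, ...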
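
The similarity step named above can be illustrated with a small sketch of Jensen-Shannon (JS) divergence between two users' feature distributions. How BSBT-FR builds those distributions from SSD output is not given in the truncated abstract, so the inputs here are simply assumed to be probability vectors of equal length. Turning the divergence into a similarity as `1 - JS` is an illustrative choice and is not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and within [0, 1] for log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_similarity(p, q):
    """Hypothetical similarity score: 1 for identical distributions, 0 for disjoint ones."""
    return 1.0 - js_divergence(p, q)
```

Unlike plain KL divergence, the JS form stays finite when one user has zero probability on a category that the other user has, which matters for sparse per-user histograms.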

19.
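The rise of self-media among the general public has triggered major public-opinion crises, and efficiently discovering hot topics and extracting useful information from massive fragmented content has become a major challenge. Against this background, this paper proposes a microblog summary optimization method based on Lex-PageRank. Taking clustering results as the experimental data, and considering the temporal characteristics of a microblog's influence cycle together with its weight attributes, it proposes an improved Lex-PageRank algorithm that extracts several texts from the clustering results and organizes them into a summary. Comparative experiments on Sina Weibo data show that the approach can effectively extract key information from large volumes of text.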
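
The improved algorithm is not described in enough detail to reproduce, so the block below is only a baseline sketch of a LexRank-style PageRank over a text-similarity graph, without the paper's time and weight adjustments. As simplifying assumptions it uses raw term-frequency cosine similarity on whitespace tokens (Chinese text would need a word segmenter first) and a fixed similarity threshold. The names `lex_pagerank`, `threshold`, and `damping` are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lex_pagerank(posts, threshold=0.1, damping=0.85, iters=50):
    """Score posts by PageRank on a thresholded cosine-similarity graph."""
    n = len(posts)
    if n == 0:
        return []
    vecs = [Counter(p.lower().split()) for p in posts]
    # Undirected, unweighted edge wherever two distinct posts are similar enough.
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * adj[j][i] / degree[j]
                            for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores
```

Taking the top-scoring posts of each cluster gives an extractive summary. Isolated posts keep only the `(1 - damping) / n` baseline, so in this simplified version the scores need not sum to 1.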

20.
Twitter provides search services to help people find users to follow by recommending popular users or the friends of their friends. However, these services neither offer the most relevant users to follow nor provide a way to find the most interesting tweet messages for each user. Recently, collaborative filtering techniques for recommendations based on friend relationships in social networks have been widely investigated. However, since such techniques do not work well when friend relationships are not sufficient, we need to take advantage of as much other information as possible to improve the performance of recommendations. In this paper, we propose TWILITE, a recommendation system for Twitter using probabilistic modeling based on latent Dirichlet allocation, which recommends top-K users to follow and top-K tweets to read for a user. Our model can capture the realistic process of posting tweet messages by generalizing an LDA model, as well as the process of connecting to friends by utilizing matrix factorization. We next develop an inference algorithm based on the variational EM algorithm for learning model parameters. Based on the estimated model parameters, we also present effective personalized recommendation algorithms to find the users to follow as well as the interesting tweet messages to read. The performance study with real-life data sets confirms the effectiveness of the proposed model and the accuracy of our personalized recommendations.
