Similar literature
Found 20 similar documents; search took 46 ms.
1.
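As a new form of social media, microblogging has gained wide acceptance thanks to its real-time information, dynamic topic following, and fast propagation. Filtering out the microblog posts relevant to a topic, so that users can follow how the topic develops, has become a pressing problem. Because posts are extremely short and carry little information and few features, filtering topic-relevant posts poses new challenges, and traditional text classification techniques are no longer adequate. This paper proposes an information-entropy-based algorithm for learning filtering rules and uses the learned rules to filter microblog posts effectively. The algorithm uses information entropy to evaluate the quality of a rule, while a random strategy based on simulated annealing keeps rule selection from being overly greedy. Experiments on about 90,000 labeled posts from Sina Weibo and about 3,000 labeled topic-specific posts from TREC 2011 show that the rules learned by the proposed algorithm achieve a higher F-score in filtering than the CPAR and SVM algorithms.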
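The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two ingredients it names: scoring a rule by the information entropy of the posts it covers, and a simulated-annealing acceptance step that occasionally keeps a worse rule. The rule representation (a tuple of terms that must all occur in a post) and the names `label_entropy`, `rule_entropy`, and `accept` are assumptions made for illustration, not the authors' definitions.

```python
import math
import random
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0.0 if empty."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rule_entropy(rule_terms, posts):
    """Score a hypothetical rule: entropy of the labels of the posts it covers.

    A post is covered when it contains every term of the rule.
    Returns (entropy, number_of_covered_posts); lower entropy means purer.
    """
    covered = [label for text, label in posts
               if all(term in text for term in rule_terms)]
    return label_entropy(covered), len(covered)


def accept(current_entropy, candidate_entropy, temperature, rng=random):
    """Simulated-annealing acceptance: always take a better (lower-entropy)
    rule, and take a worse one with probability exp(-delta / temperature)."""
    if candidate_entropy <= current_entropy:
        return True
    delta = candidate_entropy - current_entropy
    return rng.random() < math.exp(-delta / temperature)
```

A purely greedy learner would always keep the lowest-entropy candidate; the acceptance step is what lets the search leave a locally good but globally poor rule set while the temperature is still high.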

2.
Web-based social media such as microblogging services and social networking sites are changing the way people interact online and search for information and opinions. An essential property of social media discourse is searchability. A key semiotic resource supporting this function is the hashtag, a type of social tag that enables microbloggers to embed metadata in social media posts. This paper analyzes stance detection and app recommendation discourse in tweets using hashtag techniques, working within linguistics and focusing on the structure of language at the clause level. Given renewed interest in topics identified by modeling language at the discourse level, a graphical model of conversational structure (i.e., the structural topic model) has been constructed using three methods: displaying words associated with topics or documents highly associated with topics, calculating topic correlations, and assessing associations between metadata and topical content. Its capture of latent topics and topical structures inside documents has been examined on a benchmark dataset (i.e., SemEval 2016) for stance detection, and data have been crawled from Twitter using the hashtag #App for app recommendations.

3.
How online social media, such as Twitter or its variant Weibo, interacts with the stock market, and whether it can serve as a convincing proxy for predicting the stock market, has been debated for years, especially for China. Because traditional behavioral finance theory holds that individual emotions can influence investors' decision-making, it is reasonable to explore these controversial topics systematically from the perspective of online emotions, which are richly carried by massive numbers of tweets in social media. A thorough study of over 10 million stock-relevant tweets and 3 million investors from Weibo reveals that inexperienced investors with high emotional volatility are more sensitive to market fluctuations than experienced or institutional ones, and their dominant share also indicates that the Chinese market might be more emotional than its western counterparts. Both correlation analysis and causality tests then demonstrate that five attributes of the stock market in China can be competently predicted by various online emotions, such as disgust, joy, sadness, and fear. Specifically, the presented prediction model significantly outperforms the baseline models, including one that takes purely financial time series as input features, in predicting five attributes of the stock market under K-means discretization. We also employ this prediction model in a realistic online application scenario, where its performance is further verified.

4.
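User influence analysis on microblogs is an important part of social network analysis and has long attracted researchers' attention. Existing work analyzes the temporal nature of user behavior inadequately and ignores the relevance between users and the topics they participate in. To address these problems, a topic-oriented user influence analysis algorithm for microblogs is proposed: a user ranking algorithm based on topic and propagation ability (TSRank). First, the temporal nature of users' reposting behavior is analyzed on the basis of microblog topics, and two topic repost relationship networks, one of user reposts and one of user post reposts, are constructed to predict each user's ability to spread topic information. Then, the text of users' historical posts and of background-topic posts is analyzed to mine the relevance between users and background topics. Finally, user influence is computed by jointly considering the user's ability to spread topic information and the user's relevance to the background topic. Experiments on real topic data crawled from Sina Weibo show that users with higher topic relevance have markedly more topic reposts than users with very low relevance. Introducing the temporal nature of reposting behavior raises the capture rate (CR) of TSRank by 18.7% compared with ignoring repost timing. Compared with the typical influence analysis algorithms WBRank, TwitterRank, and PageRank, TSRank improves precision by 5.9%, 8.7%, and 13.1% and recall by 6.7%, 9.1%, and 14.2%, respectively, which verifies its effectiveness. The results support theoretical research on the social attributes of social networks and on topic propagation, as well as applied research such as friend recommendation and public opinion monitoring.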

5.
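To address the difficulty of detecting hot topics caused by the sparsity of microblog text data, a topic detection model based on IDLDA-ITextRank is proposed. First, by introducing time-series features and word-frequency features of microblogs, an IDLDA topic text clustering model is built and used to cluster texts on the same topic into a text set TS. Then, using a similarity measure that combines edit distance with character vectors, an ITextRank model for text summarization and keyword extraction is built to extract a summary and keywords from the text set TS. Finally, word mutual information and left and right information entropy are used to convert the extracted keywords into key topic phrases, which are combined with the summary to describe the topic. Experiments show that the IDLDA model clusters topic texts better than the traditional BTM and LDA models, and that describing microblog topics with key topic phrases and a summary is more understandable than describing them directly with topic words.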
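The last step above relies on two standard phrase-boundary statistics. The sketch below shows one common way to compute them over a token list: the entropy of a token's left or right neighbors, and the pointwise mutual information of an adjacent pair. It is an illustration under assumed definitions (adjacent-token counts, base-2 logarithms), not the paper's exact formulation, and the function names are hypothetical.

```python
import math
from collections import Counter


def neighbor_entropy(tokens, target, side):
    """Entropy (bits) of the tokens immediately to the left or right of `target`.

    A high value means `target` appears in many different contexts on that
    side, which suggests a phrase boundary there.
    """
    offset = -1 if side == "left" else 1
    neighbors = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == target and 0 <= j < len(tokens):
            neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def pmi(tokens, first, second):
    """Pointwise mutual information (bits) of the adjacent pair (first, second).

    Returns -inf when the pair never occurs.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair = bigrams[(first, second)]
    if pair == 0:
        return float("-inf")
    p_pair = pair / (n - 1)
    p_first = unigrams[first] / n
    p_second = unigrams[second] / n
    return math.log2(p_pair / (p_first * p_second))
```

In this kind of scheme, a keyword is typically extended into a phrase when its PMI with a neighbor is high (the words belong together) and the combined unit has high left and right entropy (the unit is used freely in context); the thresholds would have to be tuned on data.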

6.
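贺瑞芳, 王浩成, 刘宏宇, 王博. 《软件学报》 (Journal of Software), 2023, 34(11): 5162-5178
Topic detection in social media aims to mine latent topic information from large-scale short posts. The task is challenging because posts are short and informally worded and because user interactions in social media are complex and diverse. Previous work considers only the text content of posts, or additionally models the social context in a homogeneous setting, ignoring the heterogeneity of social networks. However, different modes of user interaction, such as reposting and commenting, may imply different behavior patterns and interest preferences, reflecting different attention to and understanding of a topic. In addition, different users influence the development and evolution of the same topic differently: authoritative users who lead a community matter more for topic inference than ordinary users. Therefore, a new multi-view topic model (MVTM) is proposed that infers more complete and coherent topics by encoding the heterogeneous social context in microblog conversation networks. First, an attributed multiplex heterogeneous conversation network is built from the interactions among users and decomposed into multiple views with different interaction semantics. Next, taking into account the importance of different interaction modes and different users, neighbor-level and interaction-level attention mechanisms are used to obtain view-specific embeddings. Finally, a multi-view-driven neural variational inference method is designed to capture the deep correlations among views and to adaptively balance their consistency and independence, thereby producing more coherent topics. Experiments on a 3-month Sina Weibo dataset demonstrate the effectiveness of the proposed method.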

7.
As a new form of social media, microblogging provides a platform on which users can share their feelings and ideas on certain topics. Bursty topics in microblogs result from emerging issues that instantly attract more followers and more attention online, and they provide a unique opportunity to gauge the relation between expressed public sentiment and hot topics. This paper presents a Social Sentiment Sensor (SSS) system on Sina Weibo that detects daily hot topics and analyzes the sentiment distributions toward these topics. SSS includes two main techniques: hot topic detection and topic-oriented sentiment analysis. Hot topic detection aims to detect the most popular topics online through three steps: topic detection, topic clustering, and topic popularity ranking. We extract topics from hashtags using a hashtag filtering model because hashtags cover almost all topics. We then cluster the topics that describe the same issue and rank the topic clusters by popularity to obtain the final hot topics. Topic-oriented sentiment analysis aims to analyze public opinions toward the hot topics. After retrieving the topic-related messages, we recognize the sentiment of each message using a state-of-the-art SVM (Support Vector Machine) sentiment classifier. We then summarize the sentiments for each hot topic to obtain the topic sentiment distribution. Based on the above framework and algorithms, SSS provides a real-time visualization system for monitoring social sentiment, offering the public a new and timely perspective on the dynamics of social topics.

8.
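Microblogging is an important social platform where individual and organizational users share or obtain short, real-time information, and automatic microblog text generation can help users quickly realize various social intentions on the platform. To help users write posts and express their social intentions, a user-intention-based microblog text generation technique is proposed. It mines and extracts features of microblog text and, given a microblog topic, generates text consistent with the user's intention. By combining a pre-trained language model with fine-tuning, controlled text generation that jointly conditions on topic and user intention, as well as predictive text generation with a user dialogue function, is implemented on the pre-trained language model GPT2. Experimental results show that the generated text is highly readable and matches the language style of microblog text, and that samples generated for combinations of topic and 5 categories of user intention receive human ratings above 77 points.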

9.
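Microblogging is an important form of social media, and many researchers have studied user influence on it. However, most influence evaluation algorithms assess user influence from users' static attributes within a microblog topic or from their behavioral features after the topic has occurred. Starting from three user behaviors (reposting, commenting, and liking) and drawing on an emergent computation model, a user influence ranking algorithm based on the Swarm model, SMRank, is proposed. SMRank can compute each user's influence in every time period while a microblog topic is unfolding, offering a new way to compute user influence in microblog topics. Experiments on real microblog topic data show that SMRank can effectively discover users with high influence levels in a topic and can compute the influence of different users at different times.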

10.
Twitter has become one of the most popular social media platforms, widely used for discussion and information dissemination on all kinds of topics. As a result, both businesses and academics have researched methods to identify the topics being discussed on Twitter. Those methods can be employed in a number of applications, including emergency management, advertising, and corporate/government communication. However, deriving topics from this short-text-based and highly dynamic environment remains a huge challenge. Most current methods use the content of tweets as the only source for topic derivation. Recently, tweet interactions have been considered for improving the quality of topic derivation. In this paper, we propose a method that considers both content and interactions with a temporal aspect to further improve the quality of topic derivation. The impact of the temporal aspect in user/tweet interactions is analyzed on several Twitter datasets. The proposed method incorporates time when it clusters tweets and identifies representative terms for each topic. Experimental results show that including the temporal aspect in the interactions significantly improves the quality of topic derivation compared with existing baseline methods.

11.
The COVID-19 pandemic has become one of the most severe diseases of recent years. As it profoundly affects the livelihood of people across the world, it is essential for administrators and healthcare professionals to be aware of the views of the community so as to monitor the severity of the spread of the outbreak. Public opinions are shared in enormous volume on microblogging media such as Twitter, which is considered one of the popular sources for collecting public opinions on any topic, such as politics, sports, and entertainment. This work presents a combination of an Intensity Based Emotion Classification Convolution Neural Network (IBEC-CNN) model and Non-negative Matrix Factorization (NMF) for detecting and analyzing the different topics discussed in COVID-19 tweets as well as the intensity of the emotional content of those tweets. The topics are identified using NMF, and the emotions are classified using the pretrained IBEC-CNN, based on predefined intensity scores. The research aims at identifying the emotions in Indian tweets related to COVID-19 and producing a list of topics discussed by users during the COVID-19 pandemic. Using the Twitter Application Programming Interface (Twitter API), a large number of COVID-19 tweets were retrieved during January and July 2020. The extracted tweets are analyzed for the emotions fear, joy, sadness, and trust with the proposed pretrained IBEC-CNN model. Each classified tweet is given an intensity score from 1 to 3, with 1 being low intensity for the emotion, 2 moderate, and 3 high. NMF is employed to identify the topics in the tweets and the themes of those topics.
The analysis of emotions in COVID-19 tweets finds that positive tweets outnumber negative tweets during the period considered and that negative tweets related to COVID-19 make up less than 5%. Also, more than 75% of the negative tweets expressing sadness and fear are of low intensity. A qualitative analysis has also been conducted, and the detected topics are grouped into themes such as economic impacts, case reports, treatments, entertainment, and vaccination. The results show that issues related to the pandemic are expressed with different emotions on Twitter, which helps in interpreting public views during the pandemic, and these results are useful for planning the dissemination of factual health statistics to build people's trust. The performance comparison shows that the proposed IBEC-CNN model outperforms the conventional models and achieves 83.71% accuracy. The percentage of COVID-19 tweets discussing the different topics varies from 7.45% to 26.43% across the topics Economy, Statistics on Cases, Government/Politics, Entertainment, Lockdown, Treatments, and Virtual Events. The fewest tweets discussed government/politics, whereas the most discussed topic was treatments.

12.
Various kinds of online social media applications, such as Twitter and Weibo, have brought a huge volume of short texts. However, mining semantic topics from short texts efficiently is still a challenging problem because of the sparseness of word occurrence and the diversity of topics. To address these problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-sized latent pseudo documents and that the topic distributions are sampled from the pseudo documents. In this way, the model reduces the sparseness of word occurrence and the diversity of topics because it implicitly aggregates short texts into longer, higher-level pseudo documents. To make full use of the labeled information in the training data, we introduce labels into the model and further propose a supervised topic model to learn a reasonable distribution of topics. Extensive experiments demonstrate that our proposed method achieves better performance than several state-of-the-art methods.

13.
The geographical identification of content in social networks has helped bridge the gap between online social platforms and the physical world. Although vast amounts of data in such networks are due to breaking news or global occurrences, local events witnessed by users in situ are also present in these streams and are of great importance for many city entities. Nowadays, unsupervised machine learning techniques, such as Tweet-SCAN, are able to retrospectively detect these local events from tweets. However, these approaches have limited ability to reason about unseen observations in a principled way because they lack a proper probabilistic foundation. Probabilistic models have also been proposed for the task, but their event identification capabilities are far from those of Tweet-SCAN. In this paper, we identify two key factors which, when combined, boost the accuracy of such models. As a first key factor, we notice that the large amount of meaningless social data requires explicitly modeling non-event observations. Therefore, we propose to incorporate a background model that captures spatio-temporal fluctuations of non-event tweets. As a second key factor, we observe that the shortness of tweets hampers the application of traditional topic models. Thus, we integrate event detection and topic modeling, assigning topic proportions to events instead of to individual tweets. As a result, we propose Warble, a new probabilistic model and learning scheme for retrospective event detection that incorporates these two key factors. We evaluate Warble on a data set of tweets located in Barcelona during its festivities. The empirical results show that the model outperforms other state-of-the-art techniques in detecting various types of events while relying on a principled probabilistic framework that enables reasoning under uncertainty.

14.
Successful adoption and management of sustainable urban systems hinges on the community embracing these systems. Capturing citizens’ ideas, views, and assessments of the built environment will be essential to this goal. In collaborative city planning, these are qualified and valued forms of partial knowledge that should be collectively used to shape the decision-making process of urban planning. Among other tools, social media and online social network analytics can provide means to capture elements of such distributed knowledge. While a structured definition of sustainability (normally dictated in a top-down fashion) may not respond well to the pluralist nature of such knowledge acquisition, dealing with unstructured community inputs, assessments, and contributions on social media can be confusing. We can detect fully relevant topics/ideas in community discussions, but they typically suffer from a lack of coherence. In this paper, we advocate the use of a semi-structured approach for capturing, analyzing, and interpreting citizens’ inputs. Public officials and professionals can develop the main elements (topical aspects) of sustainability, which can act as the skeleton of a taxonomy. It is, however, the community inputs/ideas (in our case collected via social media and parsed) that can flesh out that skeleton and augment those topical aspects with the required semantic depth. More specifically, we collected tweets for four urban infrastructure mega-projects in North America. We then used a game-with-a-purpose to crowdsource the identification of topics for a training set of tweets. This was then used to train machine learning algorithms to cluster the rest of the collected tweets. We studied the semantics (finding the topics) of tweets as well as their sentiment (in terms of opposing or supporting a project). Our classification tested different decision trees with different topic hierarchies.
We considered/extracted eight different linguistic features in studying the contents of a tweet. Finally, we examined the accuracy of three algorithms in classifying tweets according to the sequence in the tree and based on the extracted features: K-nearest neighbors, Naïve Bayes classifiers, and Support Vector Machines (SVM). On our data set, SVM outperformed the other algorithms. Semantic analysis was insensitive to the depth/number of linguistic features considered. In contrast, sentiment analysis was enhanced when part of speech (PoS) was tracked. Interestingly, our work shows that considering the topic (semantics) of a tweet helped enhance the accuracy of sentiment analysis: including topical class as a feature in sentiment analysis results in higher accuracy. This could be used as a means to detect the evolution of community opinion, that is, that topic-based social networks are evolving within the communities tweeting about urban projects. It could also be used to identify the topics of top priority to the community or the ones that have the widest spread of views. In our case, these were mainly the impacts of the design and engineering features on social issues.

15.
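孙媛, 赵倩. 《中文信息学报》 (Journal of Chinese Information Processing), 2017, 31(1): 102-111
Obtaining information about Tibetan-language topics from other languages is of great significance for making social management in ethnic minority regions more scientific, safeguarding national unity and solidarity among ethnic groups, and building a harmonious society. Most current research focuses on English-Chinese cross-lingual information processing, and Tibetan-Chinese cross-lingual research is scarce. A pressing problem is how to use the characteristics of Tibetan and Chinese, together with the current state of Tibetan information processing, to link Tibetan and Chinese social network relations from multiple perspectives, discover topics of interest synchronously, and compare the data. Based on a Tibetan-Chinese comparable corpus, this paper uses word vectors to expand text words semantically, builds an LDA topic model, estimates the model parameters with Gibbs sampling, and extracts Tibetan and Chinese topics. On the basis of the document-topic distributions generated by the LDA topic model, a voting method over four similarity measures (cosine similarity, Euclidean distance, Hellinger distance, and KL divergence) is proposed to align Tibetan and Chinese topics.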
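The abstract names the four measures but not the voting rule, so the sketch below assumes the simplest one: each measure votes for the target topic it considers closest, and the topic with the most votes wins (ties go to the lowest index). It also assumes that source and target topics are already represented as probability vectors in a shared space, for example as distributions over aligned comparable documents. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(p, q):
    """1 - cosine similarity, so that smaller means more similar."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against division by zero (an assumed smoothing)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


MEASURES = (cosine_distance, euclidean, hellinger, kl_divergence)


def align_by_vote(source_topic, target_topics):
    """Return the index of the target topic chosen by the most measures."""
    votes = Counter()
    for measure in MEASURES:
        scores = [measure(source_topic, t) for t in target_topics]
        votes[scores.index(min(scores))] += 1
    best = max(votes.values())
    return min(i for i, v in votes.items() if v == best)
```

Voting over several measures is a way to avoid depending on the quirks of any single one: KL divergence is asymmetric and sensitive to near-zero entries, while cosine and Euclidean distance ignore the probabilistic structure that Hellinger distance respects.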

16.
The objective of this study is to examine and quantify the relationships among sociodemographic factors, damage claims, and social media attention on areas during natural disasters. Social media has become an important communication channel for people to share and seek situational information to learn of risks, to cope with community disruptions, and to support disaster response. Recent studies in disaster informatics have recognized the presence of bias in the representation of social media activity in areas affected by disasters. To explore related factors for such bias, existing studies have used geo-tagged tweets to assess the extent of social media activity in disaster-affected areas to evaluate whether vulnerable populations remain silent on social media. However, less than 1% of all tweets are actually geo-tagged; therefore, attempts to understand the representativeness of geo-tagged tweets to the general population have shown that certain populations are over- or underrepresented. To address this limitation, this study examined the attention given to locations based on social media content. The study conducted a content-based analysis to filter tweets related to 84 super-neighborhoods in Houston during Hurricane Harvey and 57 cities in North Carolina during Hurricane Florence. By examining the relationships among sociodemographic factors, the number of damage claims, and the volume of tweets, the results showed that social media attention concentrates in populous areas, independent of education, language, unemployment, and median income. The relationship between population and social media attention is characterized by a sub-linear power law, indicating a large variation among the sparsely populated areas.
Using a machine-learning model to label the topics of the tweets, the results showed that social media users pay more attention to rescue- and donation-related information; nevertheless, the topic variation is consistent across areas with different levels of attention. These findings contribute to a better understanding of the spatial concentration of social media attention regarding posting and spreading situational information in disasters. The findings could inform emergency managers and public officials to effectively use social media data for equitable resource allocation and action prioritization.
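For readers unfamiliar with the term, a sub-linear power law between population and attention has the generic form below. The symbols are introduced here for illustration only (A for tweet volume about an area, P for its population, c and β for fitted constants); the abstract reports no fitted values, so none are given.

```latex
A = c\,P^{\beta}, \qquad 0 < \beta < 1
\quad\Longleftrightarrow\quad
\log A = \log c + \beta \log P
```

On log-log axes this is a straight line with slope β below 1, so doubling an area's population multiplies its expected attention by 2^β, which is less than 2.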

17.
Social media platforms such as Twitter are becoming increasingly mainstream and provide valuable user-generated information through published and shared content. Identifying interesting and useful content in large text streams is a crucial issue in social media because many users struggle with information overload. Retweeting, as a forwarding function, plays an important role in information propagation, and the retweet count simply reflects a tweet’s popularity. However, the main reasons for retweets may be limited to personal interest and satisfaction. In this paper, we use topic identification as a proxy to understand a large number of tweets and to score the interestingness of an individual tweet based on its latent topics. Our assumption is that fascinating topics generate content that may be of interest to a wide audience. We propose a novel topic model called Trend Sensitive-Latent Dirichlet Allocation (TS-LDA) that can efficiently extract latent topics from content by modeling temporal trends on Twitter over time. Experimental results on real-world Twitter data demonstrate that our proposed method outperforms several baseline methods.

18.
Social media data can be valuable in many ways. However, the vast amount of content shared and the linguistic variants of languages used on social media make it very challenging to identify high-value topics. In this paper, we present an unsupervised multilingual approach for identifying highly relevant terms and topics from the mass of social media data. This approach combines term ranking, localised language analysis, unsupervised topic clustering, and multilingual sentiment analysis to extract prominent topics by analysing tweets from a given period of time. We observe that each of the ranking methods tested has its strengths and weaknesses, and that our proposed ‘Joint’ ranking method is able to take advantage of the strengths of the individual ranking methods. This ‘Joint’ ranking method, coupled with an unsupervised topic clustering model, is shown to have the potential to discover topics of interest or concern to a local community. Practically, being able to do so may help decision makers gauge the true opinions or concerns on the ground. Theoretically, the research is significant as it shows how an unsupervised online topic identification approach can be designed without much manual annotation effort, which may have great implications for the future development of expert and intelligent systems.

19.
Social media influence analysis, sometimes also called authority detection, aims to rank users based on their influence scores in social media. Existing approaches to social influence analysis usually focus on developing effective algorithms to quantify users’ influence scores. They rarely consider a person’s expertise level, which is arguably important to influence measures. In this paper, we propose a computational approach to measuring the correlation between expertise and social media influence, and we take a new perspective on social media influence by incorporating expertise into influence analysis. We carefully constructed a large dataset of 13,684 Chinese celebrities from Sina Weibo (literally “Sina microblogging”). We found a strong correlation between expertise levels and social media influence scores. Our analysis gives a good explanation of the phenomenon of “top across-domain influencers”. In addition, different expertise levels showed different influence patterns: (1) high-expertise celebrities have stronger influence on the “audience” in their expertise domains; (2) expertise seems to be more important than relevance and participation for social media influence; (3) the audiences of top-expertise celebrities are more likely to forward tweets from high-expertise celebrities on topics outside their expertise domains.

20.
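王臻皇, 陈思明, 袁晓如. 《软件学报》 (Journal of Software), 2018, 29(4): 1115-1130
As microblogging grows, its influence keeps increasing, and analyzing the topical content of microblogs is of great value. Topic modeling can extract topics from text data, but microblog text is short, casual, and low in information, which makes topic analysis difficult. This paper presents a visual analytics system for microblog topics that uses multiple linked views and rich interactions to help users analyze and explore topic model results. The system takes the characteristics of microblog data into account and incorporates microblog users and time, allowing analysts to examine microblog topics comprehensively from multiple angles. On top of visual topic analysis, users can edit topics interactively, thereby improving the topic model and making it more accurate and reliable. Case studies show that the proposed system effectively helps users analyze and revise microblog topics.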
