Similar literature
20 similar documents found.
1.
ABSTRACT

Twitter has become a popular microblogging service that allows millions of active users to share news, emergent social events, personal opinions, etc. This produces a large amount of data every day, which makes managing tweets extremely difficult. To categorize tweets and make them easier to search, users can embed hashtags in their tweets. However, the set of valid hashtags is unrestricted, which leads to a very heterogeneous set of hashtags on Twitter and increases the difficulty of tweet categorization. In this paper, we propose a hashtag recommendation method based on analyzing the content of tweets, user characteristics, and currently popular hashtags on Twitter. The proposed method uses the personal profiles of users to discover relevant hashtags. First, a combination of tweet content and user characteristics is used to find the top-k similar tweets. We exploit the content of historical tweets, previously used hashtags, and social interactions to build the user profiles. The user characteristics help to find close users and improve the accuracy of finding the similar tweets from which hashtag candidates are extracted. Then the set of hashtag candidates is ranked based on their popularity over long and short periods. Experiments on tweet data showed that the proposed method significantly improves the performance of hashtag recommendation systems.
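
The two-stage pipeline in the abstract above (find the top-k similar tweets, then rank their hashtags by long-term and short-term popularity) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' method: bag-of-words cosine similarity stands in for the paper's combination of content and user-profile features, and the names `recommend`, `long_pop`, `short_pop`, and the mixing weight `alpha` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(query, history, long_pop, short_pop, k=2, alpha=0.5, n=3):
    """Recommend up to n hashtags for the query tweet.

    history:   list of (tweet_text, [hashtags]) pairs (hypothetical data).
    long_pop:  hashtag -> usage count over a long period (assumed input).
    short_pop: hashtag -> usage count over a short period (assumed input).
    alpha:     illustrative weight between the two popularity signals.
    """
    q = Counter(query.lower().split())
    # Stage 1: keep the k historical tweets most similar to the query.
    top_k = sorted(
        history,
        key=lambda item: cosine(q, Counter(item[0].lower().split())),
        reverse=True,
    )[:k]
    candidates = {tag for _, tags in top_k for tag in tags}

    # Stage 2: rank candidates by a blend of long- and short-term popularity.
    def popularity(tag):
        return alpha * long_pop.get(tag, 0) + (1 - alpha) * short_pop.get(tag, 0)

    return sorted(candidates, key=lambda t: (-popularity(t), t))[:n]
```

User-profile similarity, which the abstract describes as part of the first stage, is omitted here; it could be added as an extra term in the sort key.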

2.
Hashtags, terms prefixed by the hash symbol #, are widely used and can be inserted anywhere within short messages (tweets) on microblogging systems, as they carry rich sentiment information on topics that people are interested in. In this paper, we focus on the problem of hashtag recommendation, considering its personalized and temporal aspects. As far as we know, this is the first work to address this issue specifically by recommending personalized hashtags that combine long-term and short-term user interest. We introduce three features to capture personal and temporal user interest: 1) hashtag textual information; 2) user behavior; and 3) time. We offer two recommendation models for comparison: a linear-combined model and an enhanced session-based temporal graph (STG) model, Topic-STG, both of which use these features to learn user preferences and subsequently recommend personalized hashtags. Experiments on two real tweet datasets illustrate the effectiveness of the proposed models and algorithms.

3.
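Social media such as microblogs provide an important platform for people to express emotions, and analyzing the emotional tendencies of microblog posts has significant commercial value and social significance. This paper proposes a lexicon-based rule method to identify six emotions expressed in microblog posts: joy, sadness, anger, fear, disgust, and surprise. For emoticons, an important cue to emotional expression, an emoticon lexicon is generated using the mutual information method; it is combined with a traditional emotion lexicon, and rules for negation are formulated to analyze microblog posts. The first manually annotated microblog dataset covering six emotions is built. Experiments show that although traditional emotion lexicons contain a large vocabulary, their accuracy and coverage on social media text are both low. Applying the emoticon lexicon significantly improves the precision and coverage of microblog emotion analysis.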
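
The abstract above states that the emoticon lexicon is generated with a mutual information method but gives no formula. As a rough illustration only, the sketch below assumes pointwise mutual information (PMI) between an emoticon and an emotion label, computed over labeled posts, and assigns each emoticon the label with the highest PMI. The function name `pmi_lexicon` and the input layout are hypothetical.

```python
import math
from collections import Counter


def pmi_lexicon(posts):
    """Build an emoticon -> (emotion, pmi) lexicon from labeled posts.

    posts: list of (set_of_emoticons, emotion_label) pairs (assumed format).
    PMI(e, c) = log2( P(e, c) / (P(e) * P(c)) ), estimated from post counts.
    """
    n = len(posts)
    emoticon_count = Counter()
    label_count = Counter()
    joint_count = Counter()
    for emoticons, label in posts:
        label_count[label] += 1
        for e in set(emoticons):
            emoticon_count[e] += 1
            joint_count[(e, label)] += 1

    lexicon = {}
    for (e, label), c in joint_count.items():
        pmi = math.log2(c * n / (emoticon_count[e] * label_count[label]))
        # Keep, for each emoticon, the emotion it is most associated with.
        if e not in lexicon or pmi > lexicon[e][1]:
            lexicon[e] = (label, pmi)
    return lexicon
```

In practice a minimum-count threshold would probably be needed, since PMI is unreliable for rare emoticons; that refinement is left out to keep the sketch short.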

4.
Hashtag recommendation for microblogs is a very active research topic that is useful to many applications involving microblogs. However, because the short text of microblogs and the low utilization rate of hashtags lead to data sparsity, it is difficult for typical hashtag recommendation methods to make accurate recommendations. In light of this, we propose HRMF, a hashtag recommendation method based on multiple features of microblogs. First, HRMF expands short text into long text, and then it simultaneously models multiple features (i.e., user, hashtag, text) of microblogs with a newly designed topic model. To further alleviate the data sparsity problem, HRMF uses the hashtags of both similar users and similar microblogs as candidate hashtags. In particular, to find similar users, HRMF combines the designed topic model with a typical user-based collaborative filtering method. Finally, we perform hashtag recommendation by calculating a recommendation score for each hashtag based on the generated topical representations of the multiple features. Experimental results on a real-world dataset crawled from Sina Weibo demonstrate the effectiveness of HRMF for hashtag recommendation.

5.
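In recent years, the hashtag recommendation task has attracted the attention of many researchers. Currently, most deep learning methods treat this task as a multi-label classification problem, regarding hashtags as categories of microblog posts. However, these methods have a fixed output space and, without retraining, cannot handle hashtags unseen during training. In practice, hashtags are updated rapidly and continually as current events unfold. To address this problem, this paper proposes modeling hashtag recommendation as a few-shot learning task. In addition, users' hashtag usage preferences are incorporated to reduce the complexity of recommendation. Experiments on a real Twitter dataset show that, compared with the current state-of-the-art methods, the model not only achieves better recommendation results but is also more robust.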

6.
Recently, Twitter has become a prominent part of social protest movement communication. This study examines Twitter as a new kind of citizen journalism platform emerging at the aggregate in the context of such “crisis” situations by undertaking a case study of the use of Twitter in the 2011 Wisconsin labor protests. A corpus of more than 775,000 tweets tagged with #wiunion during the first 3 weeks of the protests provides the source of the analyses. Findings suggest that significant differences exist between users who tweet via mobile devices, and thus may be present at protests, and those who tweet from computers. Mobile users post fewer URLs overall; however, when they do, they are more likely to link to traditional news sources and to provide additional hashtags for context. Over time, all link-posting declines, as users become better able to convey first-hand information. Notably, results for most analyses significantly change when restricted to original tweets only, rather than including retweets.

7.
The purpose of this study is to investigate influencers on Twitter to discover the characteristics of their tweets through PIAR, a unique data mining research tool developed by the University of Salamanca that combines graph theory and social influence theory. An analysis of 3853 users posting about two Japanese car firms, Toyota and Nissan, reveals the characteristics influencers have on this social network. The findings suggest that influencers use more hashtags and mentions on average when they tweet, and that their word count is lower than that of users with less power in this virtual community. Surprisingly, they tend to include fewer embedded links in their posts. Additionally, influencers on average follow a large number of people, and they clearly express their opinions and feelings (either positive or negative) when tweeting. The results broaden the understanding of how influencers write and behave on social networks when they communicate with their community of users. Further, the study provides insights for practitioners and marketers on how to discover influencers talking about their brands by observing the content of tweets.

8.
Social emotion detection of online users has become an important task for mining public opinions. Social emotion detection aims at predicting the readers’ emotions evoked by news articles, tweets, etc. In this article, we focus on building a social emotion detection system for online news. The system is built based on the modules of document selection, Part-of-speech (POS) tagging, and social emotion lexicon generation. Empirical studies are extensively conducted on a large scale real-world collection of news articles. Experiments show that the document selection algorithm has a positive effect on the social emotion detection. The system performs better with the words and POS combination compared to a feature set consisting only of words. POS is also useful to detect emotion ambiguity of words and the context dependence of their sentiment orientations. Furthermore, the proposed method of generating the lexicon outperforms the baselines in terms of social emotion prediction.

9.
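As a new type of social media, microblogging has gradually been accepted and adopted thanks to its highly real-time information, dynamic topic following, and rapid dissemination. Filtering microblog posts relevant to a topic, to help users follow the topic's development, has become a pressing problem. Because microblog posts are extremely short and contain little information and few features, filtering topic-relevant posts poses new challenges, and traditional text classification techniques are no longer suitable. This paper proposes a filtering-rule learning algorithm based on information entropy and uses the learned rules to filter microblog posts effectively. The algorithm uses information entropy to evaluate the quality of rules, while a randomized strategy based on simulated annealing keeps rule selection from being overly greedy. Experiments were conducted on about 90,000 annotated posts from Sina Weibo and about 3,000 annotated topic-specific posts from TREC 2011; compared with the CPAR and SVM algorithms, the rules learned by the proposed algorithm achieved higher F-scores in filtering.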
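
The abstract above names two ingredients: information entropy to score a candidate rule, and a simulated-annealing-style random choice so that rule selection is not purely greedy. The sketch below shows one generic way these could fit together. It is an assumption-laden illustration, not the paper's algorithm: the entropy is computed over the relevant and irrelevant posts a rule matches, and the acceptance test is the standard Metropolis criterion. The function names `rule_entropy` and `accept` are hypothetical.

```python
import math
import random


def rule_entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of the posts matched by a rule.

    pos / neg: number of relevant / irrelevant posts the rule matches.
    A purer rule (mostly one class) has lower entropy, which is better.
    """
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def accept(current_h: float, candidate_h: float, temperature: float,
           rng=random) -> bool:
    """Metropolis acceptance: always take a better rule, and sometimes
    take a worse one with probability exp(-delta / temperature)."""
    if candidate_h <= current_h:
        return True
    delta = candidate_h - current_h
    return rng.random() < math.exp(-delta / temperature)
```

With a high temperature, worse rules are accepted often, which encourages exploration; as the temperature is lowered, the choice approaches a greedy one.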

10.
Arabic is one of the most spoken languages across the globe. However, there are comparatively few studies concerning Sentiment Analysis (SA) in Arabic. In recent years, the detection of sentiments and emotions expressed in tweets has received significant interest. The substantial role played by the Arab region in international politics and the global economy has created a need to examine sentiments and emotions in the Arabic language. Two common families of approaches are available for emotion classification problems: Machine Learning and lexicon-based approaches. With this motivation, the current research article develops a Teaching and Learning Optimization with Machine Learning Based Emotion Recognition and Classification (TLBOML-ERC) model for Sentiment Analysis on tweets written in Arabic. The presented TLBOML-ERC model focuses on recognising emotions and sentiments expressed in Arabic tweets. To attain this, the proposed TLBOML-ERC model initially carries out data pre-processing and a Continuous Bag Of Words (CBOW)-based word embedding process. In addition, a Denoising Autoencoder (DAE) model is used to categorise the different emotions expressed in Arabic tweets. To improve the efficacy of the DAE model, the Teaching and Learning-based Optimization (TLBO) algorithm is utilized to optimize its parameters. The proposed TLBOML-ERC method was experimentally validated on an Arabic tweets dataset. The obtained results show the promising performance of the proposed TLBOML-ERC model on Arabic emotion classification.

11.
Twitter users mention cities in the context of tourist attractions or events, such as protests or games, thus forming a network between cities from which they tweet and cities that they tweet about. This study tackles the challenge of explaining why users tweet about cities outside of their own by analyzing an underlying network of city mentions on Twitter. It applies graph theory as well as various measures of network connectivity such as indegree, hub score, and authority score to examine the prominence of individual cities in the Twitter landscape and the connection patterns between cities. Closely related to communication ties is the sentiment of tweets about other cities, which can be extracted from the text of tweets that contain geohashtags, i.e., hashtags with names of other cities. The effect of distance between cities on user sentiments towards cities will be explored. Furthermore, Quadratic Assignment Procedure (QAP) network regression will be used to build a general socio-demographic and geographic model that helps to identify which characteristics of city pairs, e.g. separation distance, or similarity in employment data or population, increase or decrease the likelihood of mentions between those cities. Findings show that distance and network size (compactness) are major determinants in communication ties between cities. City popularity, when measured by indegree, follows a power-law distribution, and is closely tied to population, GDP, or visitor numbers. Larger cities reveal a higher percentage of self-mentions than smaller cities, showing the high level of attention these metropolitan areas attract from Twitter users due to the many opportunities, events, and sights offered. Future research in the field of analysis of geotagged tweets can further extend the network regression model with new covariates.
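
The connectivity measures named in the abstract above (indegree, hub score, authority score) can be computed on a directed mention network with a few lines of Python. The sketch below is a generic illustration, not the study's code: an edge `(a, b)` is assumed to mean "users tweeting from city a mention city b", the hub and authority scores follow the standard HITS power iteration, and the city names in any example data are hypothetical.

```python
import math


def indegree(edges):
    """Count how many distinct source-target edges point at each node."""
    deg = {}
    for src, dst in edges:
        deg.setdefault(src, 0)
        deg[dst] = deg.get(dst, 0) + 1
    return deg


def _normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts via HITS power iteration.

    A good authority is mentioned by good hubs; a good hub mentions
    good authorities. Scores are renormalized after every update.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        auth = _normalize(
            {n: sum(hub[s] for s, d in edges if d == n) for n in nodes}
        )
        hub = _normalize(
            {n: sum(auth[d] for s, d in edges if s == n) for n in nodes}
        )
    return hub, auth
```

For large networks, a graph library with sparse-matrix implementations would be a more practical choice than this quadratic loop; the plain-Python version is used only to keep the example dependency-free.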

12.
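As the third most used application among Chinese Internet users, after instant messaging and search engines, online music and its applied technologies are favored by researchers in the field. Music, as humanity's most important medium of communication, carries rich emotional information, and computational music emotion analysis has received great attention in the field of affective human-computer interaction. In lyric-based music emotion analysis, a well-designed music-domain emotion lexicon yields finer-grained and more accurate analysis results. Based on an improved Hevner emotion ring model, and drawing on the semantic resources provided by HowNet and a lyric corpus crawled from the web, a Chinese emotion lexicon for the music domain with a tree-shaped hierarchical structure is constructed. The time tags carried by LRC lyrics are used to obtain speech-rate information for each song, and lyric emotion classification is implemented based on an emotion vector space model and the emotion lexicon. Experiments show that, compared with a manually built emotion lexicon, the constructed lexicon is better suited to the music domain.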
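
The abstract above mentions using the time tags in LRC lyric files to estimate a song's speech rate, without defining the measure. A minimal sketch follows, assuming the common `[mm:ss.xx]text` LRC line format and defining the rate as characters sung per second, averaged over all lines that have a following timestamp. Both the definition and the function name `lyric_rate` are assumptions made for illustration.

```python
import re

# Matches timed lyric lines such as "[01:23.45]some text".
# Metadata tags like "[ti:Title]" do not match because they lack digits.
LRC_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")


def lyric_rate(lrc_text: str) -> float:
    """Return the average number of lyric characters per second.

    Each line's duration is taken as the gap to the next timestamp, so
    the final timed line (which has no end time) is not counted.
    """
    entries = []
    for line in lrc_text.splitlines():
        match = LRC_LINE.match(line.strip())
        if match:
            start = int(match.group(1)) * 60 + float(match.group(2))
            entries.append((start, match.group(3).strip()))
    entries.sort()

    chars = 0
    seconds = 0.0
    for (start, text), (next_start, _) in zip(entries, entries[1:]):
        if text:  # skip instrumental gaps marked by empty lines
            chars += len(text.replace(" ", ""))
            seconds += next_start - start
    return chars / seconds if seconds else 0.0
```

Counting characters suits Chinese lyrics, where one character is roughly one syllable; for other languages a word or syllable count would be a more sensible unit.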

13.
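Currently popular Chinese word segmentation methods are machine learning methods based on statistical models. Statistical methods are generally trained on manually annotated sentence-level corpora, but this often overlooks existing manually annotated lexicon information accumulated over many years. This information is especially valuable in cross-domain settings, where sentence-level annotated resources in the target domain are scarce. How to make full and effective use of lexicon information in statistical models is therefore well worth attention. Some recent work has studied this question; by the way lexicon information is incorporated, the approaches fall roughly into two classes: one incorporates lexicon features into character-based sequence labeling models, and the other incorporates features into word-based beam search models. This paper compares the two classes of methods and further combines them. Experiments show that, once the two are combined, lexicon information is exploited more fully, yielding better performance in both in-domain and cross-domain tests.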

14.
In this paper, a study concerning the evaluation and analysis of natural language tweets is presented. Based on our experience in text summarisation, we carry out a deep analysis of users' perception through the evaluation of tweets generated manually and automatically from news. Specifically, we consider two key aspects of a tweet: its informativeness and its interestingness. We therefore analyse: (1) do users perceive manual and automatic tweets equally?; (2) what linguistic features should a good tweet have in order to be interesting as well as informative? The main challenge of this proposal is the analysis of tweets to help companies with their positioning and reputation on the Web. Our results show that: (1) informative and interesting natural language tweets can be generated automatically by summarisation approaches; and (2) we can characterise good and bad tweets based on specific linguistic features not present in other types of tweets.

15.

During disasters, multimedia content on social media sites offers vital information. Reports of injured or deceased people, infrastructure destruction, and missing or found people are among the types of information exchanged. While several studies have demonstrated the importance of both text and picture content for disaster response, previous research has concentrated primarily on the text modality and has had less success with multi-modality. The latest research on multi-modal classification of disaster-related tweets uses comparatively primitive models such as KIMCNN and VGG16. In this work we take this further and use state-of-the-art models in both text and image classification to try to improve multi-modal classification of disaster-related tweets. The research was conducted on two different classification tasks: first, to detect whether a tweet is informative or not; second, to understand the response needed. The multimodal analysis is broken down into extracting features from the textual corpus with different methods and pre-processing the corresponding image corpus; we then train several classification models, predict the output, and compare their performance while tuning the parameters to improve the results. Models such as XLNet, BERT and RoBERTa for text classification and ResNet, ResNeXt and DenseNet for image classification were trained and analyzed. Results show that the proposed multimodal architecture outperforms models trained on a single modality (text or image alone). The results also indicate that the newer state-of-the-art models outperform the baseline models by a reasonable margin on both classification tasks.


16.
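This paper explores a method for constructing a social emotion lexicon for microblogs and applies it to emotion analysis of public social events. First, a small-scale baseline emotion lexicon is built manually. Then the deep learning tool Word2vec is applied to a microblog corpus of trending social events, using an incremental learning approach to expand the baseline lexicon, and the final emotion lexicon is produced by combining HowNet lexicon matching with manual screening. Next, lexicon-based and SVM-based emotion analysis methods are each applied to an annotated experimental corpus. The comparison shows that the lexicon-based method outperforms the SVM-based method, with average precision and recall higher by 13.9% and 1.5%, respectively. Finally, the constructed lexicon is used to analyze emotions around trending public events, and the experimental results show that the method is effective.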
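
The lexicon-expansion step in the abstract above (growing a small seed emotion lexicon with Word2vec) can be pictured with a small sketch. This is not the paper's procedure: it assumes word vectors are already available as plain Python lists, performs a single expansion pass rather than the incremental learning the abstract describes, and uses an arbitrary cosine-similarity threshold. The names `expand_lexicon` and `threshold` are hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_lexicon(seed, vectors, threshold=0.8):
    """Add words whose vector is close to a seed word's vector.

    seed:      dict word -> emotion label (the hand-built baseline lexicon).
    vectors:   dict word -> embedding (assumed to come from a trained model).
    threshold: illustrative minimum similarity for adopting a seed's label.
    """
    expanded = dict(seed)
    for word, vec in vectors.items():
        if word in seed:
            continue
        best_label, best_sim = None, threshold
        for seed_word, label in seed.items():
            if seed_word not in vectors:
                continue
            sim = cosine(vec, vectors[seed_word])
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            expanded[word] = best_label
    return expanded
```

The abstract also describes HowNet matching and manual screening after expansion; those steps would filter the candidates this function produces and are not modeled here.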

17.
The CMU-EBMT machine translation system (total citations: 1; self-citations: 1; citations by others: 0)

18.
In recent years, Twitter has become one of the most important microblogging services of the Web 2.0. Among its possible uses, it can be employed for communicating and broadcasting information in real time. The goal of this research is to analyze the task of automatic tweet generation from a text summarization perspective in the context of the journalism genre. To achieve this, different state-of-the-art summarizers are selected and employed to produce tweets in two languages (English and Spanish). A wide experimental framework is proposed, comprising the creation of a new corpus, the generation of the automatic tweets, and their assessment through a quantitative and a qualitative evaluation, where informativeness, indicativeness and interest are key criteria that should be ensured in the proposed context. From the results obtained, it was observed that although the original tweets were considered model tweets with respect to their informativeness, they were not among the most interesting ones from a human viewpoint. Therefore, relying only on these tweets may not be the ideal way to communicate news through Twitter, especially if a more personalized and catchy way of reporting news is desired. In contrast, we showed that recent text summarization techniques may be more appropriate, reflecting a balance between indicativeness and interest, even if their content was different from the tweets delivered by the news providers.

19.
Web-based social networking, such as microblogging services and social networking sites, is changing the way in which individuals interact on the web and search for data and opinions. An essential parameter of online networking discourse is searchability. A key semiotic resource supporting this capacity is the hashtag, a type of social tag that enables microbloggers to embed metadata in their posts. In this paper, an attempt is made to analyze stance detection and app recommendation discourse on tweets using hashtag techniques, which falls within the field of linguistics, and to focus on the structure of language at the clause level. With a revival of interest in topics identified by modeling language at the discourse level, a graphical model of conversational structure (i.e., the structural topic model) has been constructed using three methods: displaying words associated with topics or documents highly associated with topics, calculating topic correlations, and assessing associations between metadata and topical content. Its capture of latent topics and topical structures within documents has been examined on a benchmark dataset (i.e., SemEval 2016) for stance detection, and data have been crawled from Twitter using the hashtag #App for app recommendations.

20.
Sporting events evoke strong emotions among fans and thus act as natural laboratories to explore emotions and how they unfold in the wild. Computational tools, such as sentiment analysis, provide new ways to examine such dynamic emotional processes. In this article we use sentiment analysis to examine tweets posted during the 2014 World Cup. Such analysis gives insight into how people respond to highly emotional events, how these emotions are shaped by contextual factors such as prior expectations, and how these emotions change as events unfold over time. Here we report on some preliminary analysis of a World Cup Twitter corpus using sentiment analysis techniques. After performing initial validation tests for sentiment analysis on data in this corpus, we show that these tools can give new insights into existing theories of what makes a sporting match exciting. This analysis seems to suggest that, contrary to assumptions in sports economics, excitement relates to expressions of negative emotion. The results are discussed in terms of innovations in methodology and understanding the role of emotion for “tuning in” to real world events. We also discuss some challenges that such data present for existing sentiment analysis techniques and discuss future analysis.
