首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Although topic detection and tracking techniques have made great progress, most of the researchers seldom pay more attention to the following two aspects. First, the construction of a topic model does not take the characteristics of different topics into consideration. Second, the factors that determine the formation and development of hot topics are not further analyzed. In order to correctly extract news blog hot topics, the paper views the above problems in a new perspective based on the W2T (Wisdom Web of Things) methodology, in which the characteristics of blog users, context of topic propagation and information granularity are investigated in a unified way. The motivations and features of blog users are first analyzed to understand the characteristics of news blog topics. Then the context of topic propagation is decomposed into the blog community, topic network and opinion network, respectively. Some important factors such as the user behavior pattern, opinion leader and network opinion are identified to track the development trends of news blog topics. Moreover, a blog hot topic detection algorithm is proposed, in which news blog hot topics are identified by measuring the duration, topic novelty, attention degree of users and topic growth. Experimental results show that the proposed method is feasible and effective. These results are also useful for further studying the formation mechanism of opinion leaders in blogspace.  相似文献   

2.
Modeling the propagation of hot online topic is a preliminary requirement of predicting the trend of hot online topic. We propose a time-varying hot topic propagation model in online discussion context based upon the collective behavior of users who are in different social subgroups on blog networks and bulletin board system (BBS) sites. By analyzing the stability of the equilibrium of our model, we search for the threshold to be watershed of the trend of hot online topic and generalize about two theorems from the results of analysis, they exposit two sufficient conditions under which the trend of hot online topic will die out or remain uniformly weakly persistent. Furthermore, we propose methods to predict the trend of hot online topic on the strength of our model and theorems. For different motivation, we design two methods: Method (I) is mainly served as a way of theoretical research for predicting long trend of single-peak hot online topic by the thresholds of theorems; and for application, we design method (II) to predict the number of users writing or commenting upon article posts with respect to multi-peak hot online topic and single-peak one in the following two days with the help of Method (I). Experiments of two methods are performed on widely-discussed topics on the Sina Blog and the famous Liang Quan Qi Mei (LQQM) BBS and Xi’an Jiaotong University (BMY) BBS in China. The experimental results show that our methods predict the trend of hot online topic efficiently not only for theoretical motivation but also for applicable motivation, and reduce the computational complexity. Hence, our model can serve as basis for predicting trends in hot online topic propagation.  相似文献   

3.
基于动态主题模型融合多维数据的微博社区发现算法   总被引:1,自引:0,他引:1  
随着微博用户的不断增加,微博网络已经成为用户进行信息交流的平台.针对由于博文长度受限,传统的社区发现算法无法有效解决微博网络的稀疏性等问题,提出了DC-DTM算法.DC-DTM算法首先将微博网络映射为有向加权网络,网络中边的方向反映结点之间的关注关系,利用提出的DTM模型计算出结点之间的语义相似度,并将其作为节点间连边的权重.DTM模型是一种微博主题模型,该模型不仅能够挖掘博客的主题分布,而且能计算出某一主题中用户的影响力大小.其次,利用提出的复杂度低的标签传播算法WLPA进行微博网络的社区发现.该算法的初始化阶段将影响力大的用户结点作为初始结点,标签按照结点的影响力从大到小进行传播,克服了传统标签传播算法的逆流现象,提高了标签传播算法的稳定性.在真实数据上的实验表明,DTM模型能很好地对微博进行主题挖掘,DC-DTM算法能够有效地挖掘出微博网络的社区.  相似文献   

4.
针对传统模型难以真实地描述社交网络舆情话题传播过程,提出一种基于传染病模型的社交网络舆情话题传播模型。分析了社交网络舆情话题的传播特点,根据传染病动力学机制,将内部感染概率、外部感染概率、免疫概率以及直接免疫概率引入舆情话题传播过程中,构建了社交网络舆情话题传播模型,在Matlab 2012平台下采用Facebook数据集进行仿真测试。仿真实验结果表明,该模型可以准确描述社交网络中的话题传播行为特征,研究结果可以为社交网络舆论管理者提供有价值的参考意见。  相似文献   

5.
Weblogs have emerged as a new communication and publication medium on the Internet for diffusing the latest useful information. Providing value-added mobile services, such as blog articles, is increasingly important to attract mobile users to mobile commerce, in order to benefit from the proliferation and convenience of using mobile devices to receive information any time and anywhere. However, there are a tremendous number of blog articles, and mobile users generally have difficulty in browsing weblogs owing to the limitations of mobile devices. Accordingly, providing mobile users with blog articles that suit their particular interests is an important issue. Very little research, however, has focused on this issue.In this work, we propose a novel Customized Content Service on a mobile device (m-CCS) to filter and push blog articles to mobile users. The m-CCS includes a novel forecasting approach to predict the latest popular blog topics based on the trend of time-sensitive popularity of weblogs. Mobile users may, however, have different interests regarding the latest popular blog topics. Thus, the m-CCS further analyzes the mobile users’ browsing logs to determine their interests, which are then combined with the latest popular blog topics to derive their preferred blog topics and articles. A novel hybrid approach is proposed to recommend blog articles by integrating personalized popularity of topic clusters, item-based collaborative filtering (CF) and attention degree (click times) of blog articles. The experiment result demonstrates that the m-CCS system can effectively recommend mobile users’ desired blog articles with respect to both popularity and personal interests.  相似文献   

6.
博客是Web环境中个人表达观点和情感的一种重要载体,一般涉及较宽泛的话题,蕴含丰富的舆情信息。现有针对有关社会事件的用户产生内容进行情感分析的研究多数以篇章级为处理粒度,尚不能满足博客文本深度情感分析的需求。该文提出一种基于LDA话题模型与Hownet词典的中文博客多方面话题情感分析方法。该方法首先利用数据语料训练LDA话题模型,然后以滑动窗口为基本处理单位,利用训练好的LDA模型对博客文本进行话题识别与划分;在此基础上,基于Hownet词典对划分后的话题段落进行情感倾向计算。该方法有助于同时识别博客文本所涉及的多方面子话题及每个子话题上的情感倾向。实验结果表明,该方法不仅能获得较好的话题划分结果,也有助于改善情感分析的准确率。  相似文献   

7.
Significant world events often cause the behavioral convergence of the expression of shared sentiment. This paper examines the use of the blogosphere as a framework to study user psychological behaviors, using their sentiment responses as a form of ‘sensor’ to infer real-world events of importance automatically. We formulate a novel temporal sentiment index function using quantitative measure of the valence value of bearing words in blog posts in which the set of affective bearing words is inspired from psychological research in emotion structure. The annual local minimum and maximum of the proposed sentiment signal function are utilized to extract significant events of the year and corresponding blog posts are further analyzed using topic modeling tools to understand their content. The paper then examines the correlation of topics discovered in relation to world news events reported by the mainstream news service provider, Cable News Network, and by using the Google search engine. Next, aiming at understanding sentiment at a finer granularity over time, we propose a stochastic burst detection model, extended from the work of Kleinberg, to work incrementally with stream data. The proposed model is then used to extract sentimental bursts occurring within a specific mood label (for example, a burst of observing ‘shocked’). The blog posts at those time indices are analyzed to extract topics, and these are compared to real-world news events. Our comprehensive set of experiments conducted on a large-scale set of 12 million posts from Livejournal shows that the proposed sentiment index function coincides well with significant world events while bursts in sentiment allow us to locate finer-grain external world events.  相似文献   

8.
There are hidden and rich information for data mining in the topology of topic-specific websites. A new topic-specific association rules mining algorithm is proposed to further the research on this area. The key idea is to analyze the frequent hyperlinked relati ons between pages of different topics. In the topic-specific area, if pages of onetopic are frequently hyperlinked by pages of another topic, we consider the two topics are relevant. Also, if pages oftwo different topics are frequently hyperlinked together by pages of the other topic, we consider the two topics are relevant.The initial experiments show that this algorithm performs quite well while guiding the topic-specific crawling agent and it can be applied to the further discovery and mining on the topic-specific website.  相似文献   

9.
User comments, as a large group of online short texts, are becoming increasingly prevalent with the development of online communications. These short texts are characterized by their co-occurrences with usually lengthier normal documents. For example, there could be multiple user comments following one news article, or multiple reader reviews following one blog post. The co-occurring structure inherent in such text corpora is important for efficient learning of topics, but is rarely captured by conventional topic models. To capture such structure, we propose a topic model for co-occurring documents, referred to as COTM. In COTM, we assume there are two sets of topics: formal topics and informal topics, where formal topics can appear in both normal documents and short texts whereas informal topics can only appear in short texts. Each normal document has a probability distribution over a set of formal topics; each short text is composed of two topics, one from the set of formal topics, whose selection is governed by the topic probabilities of the corresponding normal document, and the other from a set of informal topics. We also develop an online algorithm for COTM to deal with large scale corpus. Extensive experiments on real-world datasets demonstrate that COTM and its online algorithm outperform state-of-art methods by discovering more prominent, coherent and comprehensive topics.  相似文献   

10.
常见的词嵌入向量模型存在每个词只具有一个词向量的问题,词的主题值是重要的多义性条件,可以作为获得多原型词向量的附加信息。在skip-gram(cbow)模型和文本主题结构基础上,该文研究了两种改进的多原型词向量方法和基于词与主题的嵌入向量表示的文本生成结构。该模型通过联合训练,能同时获得文本主题、词和主题的嵌入向量,实现了使用词的主题信息获得多原型词向量,和使用词和主题的嵌入式向量学习文本主题。实验表明,该文提出的方法不仅能够获得具有上下文语义的多原型词向量,也可以获得关联性更强的文本主题。  相似文献   

11.
社交网络舆情已经成为社会舆情的主要阵地。针对传统模型难以描述社交网络舆情话题的真实传播过程,分析社交网络舆情话题的真实特点,补充加入社交网络中显著的水军和僵尸粉这2大显著特征,作为舆情话题传播中的正负反馈,分别对舆情话题的传播起到推动及抑制作用,构建带有正负反馈的社交网络舆情传播话题模型,提高舆情预测模型的准确率,得出正负反馈对舆情传播的影响力。  相似文献   

12.
微博是舆情话题传播的重要渠道,研究微博网络中的舆情话题传播机制,将有利于对舆情话题的传播过程进行分析与监控,而传统的网络信息传播模型却无法真实地描述微博网络中的舆情话题传播机制。针对以上问题,分析了微博网络中的信息互动模式及舆情话题的传播特点,以传染病动力学中的SIR模型为基础,通过引入一个新的节点状态--接触状态,构建了基于SCIR(Susceptible Contacted Infected Removed)的微博网络舆情话题传播模型。仿真结果表明,该模型可以很好地描述微博网络中的舆情话题传播规律。  相似文献   

13.
跨语言新闻话题发现是将互联网上报道相同事件的不同语言新闻进行自动归类,由于不同语言文本很难表示在同一特征空间下,对其共同话题的挖掘就比较困难。然而类似的新闻事件在不同语言文本表达上具有相同的新闻要素,这些要素之间关联能够体现出新闻事件的关联性,因此,针对汉越新闻话题发现问题,提出基于文档图聚类的汉越双语新闻话题发现方法。首先提取汉越新闻文本新闻要素,借助文本中要素相似度计算汉越文本相关度,构建汉越双语文本图模型,获得新闻文本相似度矩阵;然后,借助图模型中文本间的传播特点,采用随机游走算法对相似度矩阵进行调整,最后利用信息传递算法进行聚类。实验结果表明提出的方法取得了很好的效果。  相似文献   

14.
现有话题流行度预测方法仅基于话题本身的特征进行流行度预测,未考虑不同话题间的相关性.然而在微博上下文不同的话题之间存在一定的相关性,特别是在同一个事件的不同话题之间.因此,文中利用动态话题模型探测微博中的隐式话题及其流行度时间序列,通过Jensen Shannon散度和皮尔逊相关系数分别分析话题间的内容和时序相关度,然后在预测模型中引入话题时序相关性,提出基于向量自回归模型的微博隐式话题流行度预测算法.通过在真实微博数据上的实验分析可知,相比未考虑话题相关性的算法,文中算法具有更高的预测准确率和更好的模型拟合效果.  相似文献   

15.
软件缺陷预测通常针对代码表面特征训练预测模型并对新样本进行预测,忽视了代码背后隐藏的不同技术方面和主题,从而导致预测不准确。针对这种问题,提出了一种基于主题模型的软件缺陷预测方法。将软件代码库视为不同技术方面和主题的集合,不同的主题或技术方面有不同的缺陷倾向。采用LDA主题模型对不同主题及其缺陷倾向进行建模,根据建模结果计算主题度量,并将传统度量方式和主题度量结合进行模型训练和预测。实验结果显示,该方法相对传统的软件缺陷预测技术有高的准确性,并且可以在软件演化中保证模型相对稳定,可以适用于各种缺陷预测任务。  相似文献   

16.
17.
Documents come naturally with structure: a section contains paragraphs which itself contains sentences; a blog page contains a sequence of comments and links to related blogs. Structure, of course, implies something about shared topics. In this paper we take the simplest form of structure, a document consisting of multiple segments, as the basis for a new form of topic model. To make this computationally feasible, and to allow the form of collapsed Gibbs sampling that has worked well to date with topic models, we use the marginalized posterior of a two-parameter Poisson-Dirichlet process (or Pitman-Yor process) to handle the hierarchical modelling. Experiments using either paragraphs or sentences as segments show the method significantly outperforms standard topic models on either whole document or segment, and previous segmented models, based on the held-out perplexity measure.  相似文献   

18.
文健  李舟军 《中文信息学报》2008,22(1):61-66,122
近年来研究表明使用主题语言模型增强了信息检索的性能,但是仍然不能解决信息检索存在的一些难点问题,如数据稀疏问题,同义词问题,多义词问题,对文档中不可见项和可见项的平滑问题。这些问题在一些领域相关文献检索中显得尤其重要,比如大规模的生物文献检索。本文提出了一种新的基于聚类的主题语言模型方法进行生物文献检索,这主要包括两个方面工作,一是采用本体库中的概念表示文档,并在此基础上进行模糊聚类,把聚类的结果作为数据集中的主题,文档属于某个主题的概率由文档与聚类的模糊相似度决定。二是采用EM算法来估计主题产生项的概率。把上述方法集成到语言模型中就得到本文的语言模型。本文的语言模型能够准确描述项在不同主题中的分布概率,以及文档属于某个主题的概率,并且利用本体中概念部分地解决了同义词问题,而且项可以由不同的主题产生,这也能够部分解决词的多义问题。本文的方法在TREC 2004/05 Genomics Track数据集上进行了测试,与简单语言模型以及现有主题语言模型相比,检索性能得到一定的提高。  相似文献   

19.
项目文档主题表征的好坏直接影响后续评审专家的推荐效果.为有效利用项目文档片段之间的关联关系进行项目主题分析,提出一种基于半监督图聚类的项目主题模型构建方法.该方法首先分析项目文档的结构特点,提取项目名称、项目关键字等能表征主题的结构信息,结合专家证据文档、专家主题关系网等能表征专家主题的外部资源,定义及提取项目文档片段之间的关联关系特征;然后,利用不同类型的关联关系计算项目文档片段之间的相关性,构建项目文档片段间的无向图模型;最后,利用已标记关联关系特征作为聚类的监督信息,采用半监督图聚类算法对项目文档片段进行聚类,从而实现项目主题的提取.项目主题提取对比实验结果验证了所提方法的有效性,项目文档结构化特征、专家证据文档以及专家主题关系网对项目主题模型的构建具有一定的指导作用.  相似文献   

20.
《Knowledge》2007,20(7):607-613
Discovering topics from large amount of documents has become an important task recently. Most of the topic models treat document as a word sequence, whether in discrete character or term frequency form. However, the number of words in a document is greatly different from that in other documents. This will lead to several problems for current topic models in dealing with topics analysis. On the other hand, it is difficult to perform topic transition analysis based on current topic models. In an attempt to overcome these deficiencies, a variable space hidden Markov model (VSHMM) is proposed to represent the topics, and several operations based on space computation are presented. A hierarchical clustering algorithm with dynamically changing of the component number in topic model is proposed to demonstrate the effectiveness of the VSHMM. Method of document partition based on topic transition is also present. Experiments on a real-world dataset show that the VSHMM can improve the accuracy while decreasing the algorithm’s time complexity greatly compared with the algorithm based on current mixture model.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号