Similar literature
 20 similar articles found (search time: 31 ms)
1.
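Analyzing the evolution of video topics helps uncover valuable patterns in massive video data. This work studies a clustering-based method for video topic evolution analysis. First, the visual similarity between videos is analyzed using a bipartite graph. On this basis, to strengthen the association among videos of the same topic and the separation between videos of different topics, a link-analysis-based method is used to cluster video topics, and the evolution of the topics is then analyzed. Finally, experiments demonstrate the effectiveness of the proposed method.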

2.
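An LDA-based online topic evolution mining model   Total citations: 3 (self-citations: 1, citations by others: 2)
Cui Kai, Zhou Bin, Jia Yan, Liang Zheng. 《计算机科学》 (Computer Science), 2010, 37(11): 156-159
An online topic evolution model is built on latent semantic analysis of text content, and topic evolution is analyzed by tracking how topics change across time slices. The Latent Dirichlet Allocation (LDA) model is extended to online text streams, and an online LDA model is built and implemented. Continuity between topics is maintained by letting the posterior probability of the previous time slice inform the prior probability of the current one. Inference uses an improved incremental Gibbs algorithm to obtain the topic-word and document-topic probability distributions, and the Kullback-Leibler (KL) relative entropy measures the similarity between topics, revealing "topic inheritance" and "topic mutation" during topic evolution. Experimental results show that the model can identify topic evolution trends in Internet corpora with good results.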
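The KL-based comparison of topics across time slices can be illustrated with a minimal sketch. It assumes each topic is a plain list of word probabilities over a shared vocabulary; the function names, the 0.5 threshold, and the inheritance/mutation decision rule are hypothetical illustrations, not the procedure from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # eps guards against division by zero when q assigns no mass
            total += pi * math.log(pi / max(qi, eps))
    return total

def classify_evolution(prev_topic, curr_topic, threshold=0.5):
    """Hypothetical rule: a small divergence means the topic was inherited."""
    d = kl_divergence(curr_topic, prev_topic)
    return "inheritance" if d < threshold else "mutation"

# Toy topic-word distributions over a three-word vocabulary (made-up values)
prev_topic = [0.5, 0.3, 0.2]
near_topic = [0.45, 0.35, 0.2]
far_topic = [0.05, 0.05, 0.9]
print(classify_evolution(prev_topic, near_topic))
print(classify_evolution(prev_topic, far_topic))
```

KL divergence is asymmetric, so the direction of comparison (current slice against previous slice here) is itself a design choice that a real system would need to fix.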

3.
Video scene segmentation using Markov chain Monte Carlo   Total citations: 3 (self-citations: 0, citations by others: 3)
Videos are composed of many shots that are caused by different camera operations, e.g., on/off operations and switching between cameras. One important goal in video analysis is to group the shots into temporal scenes, such that all the shots in a single scene are related to the same subject, which could be a particular physical setting, an ongoing action or a theme. In this paper, we present a general framework for temporal scene segmentation in various video domains. The proposed method is formulated in a statistical fashion and uses the Markov chain Monte Carlo (MCMC) technique to determine the boundaries between video scenes. In this approach, a set of arbitrary scene boundaries are initialized at random locations and are automatically updated using two types of updates: diffusion and jumps. Diffusion is the process of updating the boundaries between adjacent scenes. Jumps consist of two reversible operations: the merging of two scenes and the splitting of an existing scene. The posterior probability of the target distribution of the number of scenes and their corresponding boundary locations is computed based on the model priors and the data likelihood. The updates of the model parameters are controlled by the hypothesis ratio test in the MCMC process, and the samples are collected to generate the final scene boundaries. The major advantage of the proposed framework is two-fold: 1) it is able to find the weak boundaries as well as the strong boundaries, i.e., it does not rely on the fixed threshold; 2) it can be applied to different video domains. We have tested the proposed method on two video domains: home videos and feature films, and accurate results have been obtained.

4.
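Multi-document automatic summarization based on the LDA topic model   Total citations: 3 (self-citations: 0, citations by others: 3)
In recent years, representing the multi-document summarization problem with probabilistic topic models has drawn researchers' attention. LDA (latent Dirichlet allocation) is one of the representative generative probabilistic topic models. An LDA-based summarization method is proposed: perplexity determines the number of topics in the LDA model; Gibbs sampling yields the topic distribution of sentences and the word distribution of topics; the sum of topic weights over sentences determines the importance of each topic; and two different sentence-weighting models are proposed based on the topic and sentence probability distributions in the LDA model. Using the ROUGE evaluation metrics, the method is compared on the generic multi-document summarization test set DUC2002 with the state-of-the-art SumBasic method and two other LDA-based multi-document summarization methods. The results show that the proposed method outperforms SumBasic on all ROUGE metrics and also holds an advantage over the other LDA-based summarization methods.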

5.
This paper proposes a novel approach to generating and analyzing a path model by structural equation modeling (SEM). SEM is an important technique for causal analysis based on a path model, so constructing path models that yield reliable analysis is important in SEM. An LSA-based method for building a path model from text data has been proposed. However, this method requires each document to belong to one topic; thus, the model cannot express natural variables and relationships. Therefore, this paper extends the existing approach to latent Dirichlet allocation (LDA) and generates a path model from the topics extracted by LDA. Experiments using review text data confirm the feasibility and applicability of the proposed process.

6.
User communities in social networks are usually identified by considering explicit structural social connections between users. While such communities can reveal important information about their members such as family or friendship ties and geographical proximity, just to name a few, they do not necessarily succeed at pulling like-minded users that share the same interests together. Therefore, researchers have explored the topical similarity of social content to build like-minded communities of users. In this article, following the topic-based approaches, we are interested in identifying communities of users that share similar topical interests with similar temporal behavior. More specifically, we tackle the problem of identifying temporal (diachronic) topic-based communities, i.e., communities of users who have a similar temporal inclination toward emerging topics. To do so, we utilize multivariate time series analysis to model the contributions of each user toward emerging topics. Further, our modeling is completely agnostic to the underlying topic detection method. We extract topics of interest by employing seminal topic detection methods; one graph-based and two latent Dirichlet allocation-based methods. Through our experiments on Twitter data, we demonstrate the effectiveness of our proposed temporal topic-based community detection method in the context of news recommendation, user prediction, and document timestamp prediction applications, compared with the nontemporal as well as the state-of-the-art temporal approaches.

7.
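Microblog topic detection is a current research hotspot. A microblog topic detection method based on overlapping community discovery in complex networks is proposed. The method preprocesses microblog data from a period of time; after word segmentation, it extracts topic words according to part of speech and the temporal distribution of words, and builds a complex network by adding edges between highly correlated topic words. The concept of the independent modularity of a community is introduced, overlapping communities are found by a model that maximizes it, and each community is treated as a microblog topic. Overlapping community discovery addresses the low detection accuracy caused by one or more topic words belonging to multiple topics. Experimental results demonstrate the method's effectiveness for microblog topic detection.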

8.
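Zheng Shizhuo, Cui Xiaoyan. 《软件》 (Software), 2014, (1): 46-48
In today's era of information explosion, data grows exponentially, and most of it is unstructured. These data hold a large amount of important knowledge waiting to be mined with sound methods, so classifying text conveniently, soundly, and quickly is an important problem. LDA is an unsupervised model that can discover latent topics. To discover latent topics more effectively, this paper proposes a semi-supervised LDA topic model that finds a topic set to serve as the knowledge set of the latent layer; topics found this way are more relevant to the text. In addition, both the LDA model and the semi-supervised LDA model are applied to text feature extraction and compared with other feature extraction methods. Experiments show that the semi-supervised LDA model performs slightly better.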

9.
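A topic model based on word similarity and CRP   Total citations: 1 (self-citations: 1, citations by others: 0)
Topic models extract the topics latent in documents, so that documents can be reduced, classified, and retrieved by topic; they have become a research hotspot in information classification and retrieval. To address the inability of the LDA (Latent Dirichlet Allocation) topic model to determine the number of topics automatically, a latent topic model combining word similarity with the CRP (Chinese Restaurant Process) is proposed; it adaptively and dynamically updates topic content and determines a reasonable number of topics. A method for setting hyperparameters while the number of topics is dynamically updated is also proposed. In experiments on clinical diagnosis and treatment data from traditional Chinese medicine, domain experts found the analysis results well interpretable.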
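How a Chinese Restaurant Process lets the number of groups grow with the data, rather than being fixed in advance, can be seen in a small sampling sketch. This shows only the standard CRP prior; the word-similarity component and hyperparameter scheme of the paper are not modeled, and the function name and the `alpha` and `seed` parameters are illustrative assumptions.

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample table assignments from a Chinese Restaurant Process prior.

    alpha > 0 is the concentration parameter: larger values open more tables.
    """
    rng = random.Random(seed)
    tables = []        # tables[k] = number of customers seated at table k
    assignments = []   # assignments[n] = table index chosen by customer n
    for n in range(n_customers):
        # With n customers already seated, join table k with probability
        # tables[k] / (n + alpha), or open a new one with alpha / (n + alpha).
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_assignments(100, alpha=1.0)
print(len(tables), "tables for", len(assignments), "customers")
```

In a topic-model setting, tables would play the role of topics, so the number of topics becomes an outcome of inference instead of an input.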

10.
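To address the poor topic discovery of traditional topic models when mining multi-source text datasets, a multi-source text topic model based on Dirichlet multinomial allocation (DMA) and feature partitioning is designed. Building on the DMA model, it relaxes the requirement that the number of topics be given in advance, assigns each data source its own topic distribution parameters, and uses Gibbs sampling to estimate the number of topics for each data source. It also assigns each data source its own noise-word distribution parameters and topic-word distribution parameters, uses feature partitioning to separate the feature words of each data source from its noise words, and learns the word-usage characteristics of each source, so that noise words do not interfere with the model's clustering. Experimental results show that, compared with traditional topic models, the model preserves the word features specific to each data source and achieves better topic discovery and robustness.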

11.
This paper presents the extraction of a boundary-independent dynamic compact thermal model (DCTM). The paper specifically focuses on the influence of time-varying Dirichlet boundary conditions, and the methodology proposed to obtain the DCTM is applied to a thermopile-based infrared (IR) sensor. Sensors of this type are quite sensitive to environmental changes because a variation in the temperature of the sensor's bulk silicon usually affects the hot and cold areas differently, which can produce incorrect transient measurements of the incident IR radiation. A DCTM can be used to estimate the influence of the environmental temperature evolution on the sensor output and, with the help of a temperature sensor, to correct the measurement of the incident IR radiation in the real device. The methodology to construct the DCTM is based on the construction of an equivalent thermal RC network, the topology of which, as well as its component values, are obtained from the analysis of the dynamic power-temperature relationship at the points of interest.

12.
This paper investigates the possibility of extracting latent aspects of a video in order to develop a video fingerprinting framework. Semantic visual information about humans, more specifically face occurrences in video frames, along with a generative probabilistic model, namely the Latent Dirichlet Allocation (LDA), are used for this purpose. The latent variables, namely the video topics, are modeled as a mixture of distributions of faces in each video. The method also involves a clustering approach based on the Scale-Invariant Feature Transform (SIFT) for clustering the detected faces and adapts the bag-of-words concept into a bag-of-faces one, in order to ensure exchangeability between topic distributions. Experimental results, on three different data sets, provide low misclassification rates of the order of 2% and false rejection rates of 0%. These rates provide evidence that the proposed method performs very efficiently for video fingerprinting.

13.
In this paper, we propose a hierarchical probabilistic model for scene classification. This model infers the local-class-shared and local-class-specific latent topics respectively. Our approach consists of first learning the latent topics from the BoW representation and subsequently, training SVM on the distribution of the latent topics. This approach is compared to that of using traditional graphical models to learn the latent topics and training SVM on the topic distribution. The experiments on a variety of datasets show that the topics learned by our model have higher discriminative power.

14.
An extended susceptible-infective (SI) epidemic model is presented in this paper to describe the collective blogging behavior on popular incidental topics. Our model has two major extensions over the classic SI model: in the new model, different blog writers get interested in a specific topic with different probabilities, while in a classic SI model, the infection probability of a disease between any two individuals is identical; the new model takes into consideration the impact of external mainstream media on blog writers, while in a classical SI model, spreading of diseases is merely based on personal contacts between individuals. The new model is capable of explaining the widely observed early burst and heavy tail of topic propagation velocity. The proposed model has a closed-form solution when the individual interest is of uniform distribution with the external influence assumed constant. We validate the proposed model using ten topics from two different data sets: Sina Blog and LiveJournal Blogspace, the results indicating that our model fits the topic propagation velocity and predicts the propagation trend very well.
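The contrast between contact-only spreading and spreading with an external source can be sketched with a simple mean-field simulation. This is not the model from the paper: it assumes the classic SI rate beta * i * (1 - i) plus a constant external term gamma * (1 - i), integrates with forward Euler, and leaves out the heterogeneous per-writer interest probabilities. The function name and all parameter values are hypothetical.

```python
def simulate_si(beta, gamma, i0, steps, dt=0.1):
    """Forward-Euler integration of di/dt = (beta * i + gamma) * (1 - i).

    i     : fraction of writers who have already blogged about the topic
    beta  : contact-driven spreading rate (classic SI term)
    gamma : constant external media influence (assumed form)
    """
    i = i0
    trajectory = [i]
    for _ in range(steps):
        di = (beta * i + gamma) * (1.0 - i) * dt
        i = min(1.0, i + di)   # the infected fraction cannot exceed 1
        trajectory.append(i)
    return trajectory

# With no initial bloggers, only the external term can start the spread.
with_media = simulate_si(beta=1.0, gamma=0.05, i0=0.0, steps=200)
without_media = simulate_si(beta=1.0, gamma=0.0, i0=0.0, steps=200)
print(round(with_media[-1], 3), without_media[-1])
```

Under these assumptions the external term is what allows propagation to begin from zero participants, which is one intuition for why media influence matters early in a topic's life.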

15.
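Building on link-based probabilistic latent semantic analysis, an incremental method that fuses text and links is proposed for topic modeling. Topic modeling is first performed on the original set of web pages; then, as the structure and content of the pages change dynamically, a sound update mechanism updates the model parameters, so that dynamic changes in the online web page stream are handled efficiently and quickly. In addition, an adaptive asymmetric learning method is proposed to fuse the latent topics of the text and link modalities. For each web page, its topic distributions over the two modalities are fused by weighting, with the weights determined by the entropy of the page's feature-word distribution. Because the fused probabilistic structure sensibly relates information from the link and text modalities, good modeling results are obtained. Experimental results on two types of datasets show that the algorithm saves time effectively and substantially improves web page classification performance; topics generated by the model are also presented.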

16.
In this paper, we propose a hierarchical Bayesian model, an improved hierarchical Dirichlet process-hidden Markov model (iHDP-HMM), for visual document analysis. The iHDP-HMM is capable of clustering visual documents and capturing the temporal correlations between the visual words within a visual document while identifying the number of document clusters and the number of visual topics adaptively. A Bayesian inference mechanism for the iHDP-HMM is developed to carry out likelihood evaluation, topic estimation, and cluster membership prediction. We apply the iHDP-HMM to simultaneously cluster motion trajectories and discover latent topics for trajectory words, based on the proposed method for constructing the trajectory word codebook. Then, an iHDP-HMM-based probabilistic trajectory retrieval framework is developed. The experimental results verify the clustering accuracy of the iHDP-HMM and trajectory retrieval accuracy of the proposed framework.

17.
Various kinds of online social media applications, such as Twitter and Weibo, have brought a huge volume of short texts. However, mining semantic topics from short texts efficiently is still a challenging problem because of the sparseness of word occurrence and the diversity of topics. To address the above problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-size latent pseudo documents, and the topic distributions are sampled from the pseudo documents. In this way, the model reduces the sparseness of word occurrence and the diversity of topics because it implicitly aggregates short texts into longer and higher-level pseudo documents. To make full use of the label information in the training data, we introduce labels into the model, and further propose a supervised topic model to learn a reasonable distribution of topics. Extensive experiments demonstrate that our proposed method achieves better performance compared with some state-of-the-art methods.

18.
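As the Internet develops, topic extraction is applied ever more widely, especially to academic literature. Although abstracts of academic papers are short texts, their high dimensionality makes them hard for text topic models to handle, and their time-sensitivity means the time factor is easily overlooked during topic mining, leading to uneven and unclear topic distributions. To address these problems, a topic clustering model for academic abstracts based on TTF-LDA (time + tf-idf + latent Dirichlet allocation) is proposed. TF-IDF feature extraction is introduced to extract feature words from abstracts, which effectively reduces the dimensionality of the LDA input text. The publication time of the papers is incorporated by establishing a time window that limits the period of topic analysis, and publication time is used to increase the time weight of feature words; the sum of the feature words' time weights, together with a topic-guiding feature-word lexicon, serves as the influence factor for LDA. In experiments on a dataset collected by a web crawler, compared with standard LDA and MVC-LDA at the same number of topics, the model has lower perplexity and higher separation between topics, and better matches the characteristics of academic literature.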
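The dimensionality-reduction step, keeping only the highest-scoring TF-IDF words of each abstract before topic modeling, might look like the sketch below. It assumes pre-tokenized documents and the plain tf * log(N / df) weighting; the time-window and time-weight components of TTF-LDA are not modeled, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def top_tfidf_terms(docs, k):
    """Return the k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            w: (c / len(doc)) * math.log(n_docs / df[w])
            for w, c in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result

docs = [
    ["topic", "model", "lda", "the"],
    ["video", "shot", "the"],
    ["topic", "video", "the", "the"],
]
print(top_tfidf_terms(docs, k=2))
```

A word that appears in every document, such as "the" in this toy corpus, gets an IDF of zero and drops out, which is the effect that shrinks the vocabulary passed on to LDA.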

19.
We propose two new models for human action recognition from video sequences using topic models. Video sequences are represented by a novel “bag-of-words” representation, where each frame corresponds to a “word.” Our models differ from previous latent topic models for visual recognition in two major aspects: first of all, the latent topics in our models directly correspond to class labels; second, some of the latent variables in previous topic models become observed in our case. Our models have several advantages over other latent topic models used in visual recognition. First of all, the training is much easier due to the decoupling of the model parameters. Second, it alleviates the issue of how to choose the appropriate number of latent topics. Third, it achieves much better performance by utilizing the information provided by the class labels in the training set. We present action classification results on five different data sets. Our results are either comparable to, or significantly better than previously published results on these data sets.

20.
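To automatically annotate instructional videos, a specialized video category, this paper first extracts the subtitle information in the video. After text preprocessing, the subtitle text is combined with the latent Dirichlet allocation (LDA) topic model to obtain the probability distribution of each video shot over topics, and semantic-level shot segmentation is performed by computing the differences between topic probability distributions. Then, with shots as samples, the safe semi-supervised support vector machine (S4VM) method uses a small number of annotated shots to annotate the unannotated shots automatically. Experimental results show that, by using subtitle text and the LDA model, the method effectively performs semantic shot segmentation, and can annotate not only individual shots but also the whole video with keywords.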
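Segmenting by differences between topic distributions can be sketched as a threshold on the divergence between consecutive shots. The abstract does not say which difference measure is used, so the Jensen-Shannon divergence here is an assumption, as are the function names, the threshold, and the toy distributions.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def segment_boundaries(topic_dists, threshold):
    """Indices i where shot i starts a new semantic segment (assumed rule)."""
    return [
        i for i in range(1, len(topic_dists))
        if js_divergence(topic_dists[i - 1], topic_dists[i]) > threshold
    ]

# Per-shot topic distributions over two topics (made-up values)
shots = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.12, 0.88]]
print(segment_boundaries(shots, threshold=0.1))
```

Jensen-Shannon divergence is symmetric and bounded by log 2, which makes a fixed threshold easier to interpret than with the raw KL divergence.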
