Similar Literature
20 related articles retrieved.
1.
于广川  贺瑞芳  刘洋  党建武 《软件学报》2017,28(10):2654-2673
Temporal Twitter summarization is an important branch of text summarization. It aims to distill, from the massive stream of tweets about a trending event, a concise set of tweets that evolves over time, helping users acquire information quickly. Twitter is one of today's most popular social media platforms, and the explosive growth of its content, together with the unstructured, fragmented nature of its short texts, means that traditional summarization methods relying on textual content alone are no longer adequate. At the same time, the new characteristics of social media bring new opportunities for Twitter summarization. This work treats the tweet stream as a signal, analyzes the complex noise it contains, and proposes a new temporal Twitter summarization method that fuses the stream's macro- and micro-level temporal signals with users' social context. First, wavelet analysis is applied to model the global temporal information of the tweet stream and detect the time points of trending sub-events related to a given keyword. Then, local temporal information and user social information are incorporated into a random-walk graph model summarization framework that generates a tweet summary for each detected sub-event. For evaluation, expert time points and expert summaries were manually annotated on a real Twitter dataset; the experimental results demonstrate the effectiveness of wavelet analysis and of the temporal-social context graph model for temporal Twitter summarization.
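The sub-event time-point detection step rests on wavelet analysis of the tweet-volume signal. A minimal sketch of that idea (not the authors' code; the hourly counts are hypothetical) using SciPy's continuous-wavelet peak finder:

```python
# Wavelet-based burst detection on a tweet-volume time series (illustrative only).
import numpy as np
from scipy.signal import find_peaks_cwt

# Hypothetical tweets-per-hour counts for one keyword over four days.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=96).astype(float)
counts[30:34] += [80, 150, 120, 60]   # injected burst (a "hot sub-event")
counts[70:73] += [60, 110, 70]        # second injected burst

# find_peaks_cwt convolves the series with Ricker wavelets at several widths
# and keeps ridge lines that persist across scales, suppressing single-bin noise.
peak_hours = find_peaks_cwt(counts, widths=np.arange(2, 8))
print("candidate sub-event time points (hour index):", peak_hours)
```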

2.
Microblog data are real-time and dynamic, and analyzing them makes it possible to detect events in the real world. At the same time, the massive volume, short texts, and rich social relations of microblog data pose new challenges for event detection. Taking into account the textual features of microblog posts (reposts, comments, embedded links, hashtags, named entities, etc.) together with their semantic, temporal, and social-relation characteristics, this paper proposes an effective event detection algorithm for microblog data, EDM (event detection in microblogs). It also proposes a method that builds an event summary from the key elements of an event, namely its keywords, named entities, posting time, and user sentiment polarity. In experimental comparisons with an event detection algorithm based on the LDA (latent Dirichlet allocation) model, EDM achieves better detection results and provides more intuitive and readable event summaries.
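The LDA comparison above refers to a standard topic-model baseline. A minimal sketch of that kind of baseline (made-up posts, not the paper's EDM algorithm) with scikit-learn:

```python
# LDA-style baseline: group short posts by inferred topic (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "earthquake reported downtown, buildings shaking",
    "strong earthquake felt across the city",
    "new phone launch event scheduled for friday",
    "company unveils phone at launch event",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)      # rows: posts, columns: topic weights
print(doc_topic.argmax(axis=1))       # crude event/topic assignment per post
```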

3.
Building on research into automatic summarization, this paper proposes an event-based multi-topic summarization method for narrative texts that takes the event as the basic semantic unit. Events and the relations between them are used to build an event-network representation of the text, and a community detection algorithm is used to partition sub-events into topics. Experimental results show that the summaries extracted by this method achieve higher precision, recall, and F-measure, and capture the content of the text better.
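A minimal sketch of the community-detection step under assumed data (a tiny hypothetical event network, not the paper's construction), using NetworkX's modularity-based communities:

```python
# Partition an event network into sub-event communities (illustrative only).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Nodes are events; weighted edges encode how strongly two events are related
# (e.g., shared participants, temporal adjacency). Values here are made up.
G = nx.Graph()
G.add_weighted_edges_from([
    ("quake_hits", "buildings_collapse", 0.9),
    ("quake_hits", "rescue_starts", 0.7),
    ("buildings_collapse", "rescue_starts", 0.6),
    ("press_conference", "official_statement", 0.8),
    ("rescue_starts", "press_conference", 0.2),
])

communities = greedy_modularity_communities(G, weight="weight")
for i, com in enumerate(communities):
    print(f"sub-event topic {i}: {sorted(com)}")
```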

4.
To help data holders avoid legal risk, it is necessary to detect and tally the personal information contained in datasets. However, effective tools for detecting personal information in Chinese datasets are still lacking. To address this, the categories of personal information that need to be detected are compiled from legal documents, and an automated detection framework combining pattern matching with natural language processing is proposed to detect personal information in Chinese text. A method for recognizing home addresses is also proposed, which handles the problem of address format...
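A minimal sketch of the pattern-matching side of such a framework; the regular expressions below are simplified illustrations of common personal-information patterns, not the paper's actual rule set:

```python
# Simplified pattern-matching detector for personal information (illustrative only).
import re

PATTERNS = {
    # Mainland China mobile numbers: 11 digits starting with 1 (simplified).
    "phone": re.compile(r"\b1[3-9]\d{9}\b"),
    # 18-character resident ID numbers (simplified; no checksum validation).
    "id_card": re.compile(r"\b\d{17}[\dXx]\b"),
    "email": re.compile(r"\b[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}\b"),
}

def detect_personal_info(text: str) -> dict:
    """Return every pattern match found in the text, grouped by category."""
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

sample = "联系人: 张三, 电话 13812345678, 邮箱 zhangsan@example.com"
print(detect_personal_info(sample))
```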

5.
New event detection (NED), which is crucial to firms' environmental surveillance, requires timely access to and effective analysis of live streams of news articles from various online sources. These news articles, available in unprecedented frequency and quantity, are difficult to sift through manually. Most existing techniques for NED are full-text-based; typically, they perform full-text analysis to measure the similarity between a new article and previous articles. This full-text-based approach is potentially ineffective, because a news article often contains sentences that are less relevant to the focal event being reported, and including these less relevant sentences in the similarity estimation can impair the effectiveness of NED. To address this limitation and support NED more effectively and efficiently, this study proposes and develops a summary-based event detection method that first selects relevant sentences of each article as a summary, then uses the resulting summaries to detect new events. We empirically evaluate our proposed method in comparison with several prevalent full-text-based techniques, including a vector space model and two deep-learning-based models. Our evaluation results confirm that the proposed method provides greater utility for detecting new events from online news articles. This study demonstrates the value and feasibility of the text summarization approach for generating news article summaries for detecting new events from live streams of online news articles, proposes a new method that is more effective and efficient than the benchmark techniques, and contributes to NED research in several important ways.
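A minimal sketch (my own simplification, not the paper's method) of summary-based new event detection: compare each incoming article summary against previously seen summaries with TF-IDF cosine similarity and flag it as a new event when the best match falls below a threshold.

```python
# Summary-based new event detection via TF-IDF novelty (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

NOVELTY_THRESHOLD = 0.3   # assumed value; would be tuned on held-out data

def detect_new_events(summaries):
    """Yield (index, is_new_event) over a stream of article summaries."""
    seen = []
    for i, summary in enumerate(summaries):
        if not seen:
            yield i, True
            seen.append(summary)
            continue
        tfidf = TfidfVectorizer().fit(seen + [summary])
        sims = cosine_similarity(tfidf.transform([summary]),
                                 tfidf.transform(seen))
        yield i, bool(sims.max() < NOVELTY_THRESHOLD)
        seen.append(summary)

stream = [
    "strong earthquake strikes coastal city overnight",
    "earthquake aftershocks continue in coastal city",
    "central bank raises interest rates by a quarter point",
]
print(list(detect_new_events(stream)))
```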

6.
Clustering over Multiple Evolving Streams by Events and Correlations
In applications involving multiple data streams, such as stock market trading and sensor network data analysis, the clusters of streams change over time because of data evolution. Information about evolving clusters is valuable for supporting online decisions. In this paper, we present COMET-CORE, a framework for clustering over multiple evolving streams by correlations and events, which monitors the distribution of clusters over multiple data streams based on their correlation. Instead of directly re-clustering the multiple data streams periodically, COMET-CORE applies efficient cluster split and merge processes only when significant cluster evolution happens. Accordingly, we devise an event detection mechanism to signal the cluster adjustments. Incoming streams are smoothed into sequences of end points by piecewise linear approximation. Whenever end points are generated, the weighted correlations between streams are updated. End points are good indicators of significant change in a stream, which is a main cause of a cluster evolution event. When an event occurs, the latest clustering results can be reported through split and merge operations. As shown in our experimental studies, COMET-CORE performs effectively with good clustering quality.
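A minimal sketch (assumed error bound and data, not the COMET-CORE implementation) of the piecewise-linear-approximation step that turns a stream into end points: extend the current segment until the fitting error exceeds a bound, then emit an end point and start a new segment.

```python
# Piecewise linear approximation producing segment end points (illustrative only).
import numpy as np

def pla_end_points(values, max_error=1.0):
    """Return indices where one linear segment ends and the next begins."""
    end_points = [0]
    start = 0
    for end in range(start + 2, len(values)):
        x = np.arange(start, end + 1)
        y = values[start:end + 1]
        slope, intercept = np.polyfit(x, y, deg=1)       # least-squares line
        err = np.abs(y - (slope * x + intercept)).max()  # worst-case residual
        if err > max_error:
            end_points.append(end - 1)   # previous point closes the segment
            start = end - 1
    end_points.append(len(values) - 1)
    return end_points

stream = np.array([1, 2, 3, 4, 5, 5, 5, 5, 8, 11, 14, 17], dtype=float)
print(pla_end_points(stream, max_error=0.5))
```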

7.
Complex Event Processing over Raw RFID Data Streams
RFID complex event detection is usually built on a data model that has already been cleaned, but RFID data cleaning tends to be costly and single-purpose, and, worse for efficiency, the cleaning step and the complex event processing step require two separate scans of the data stream. To address these problems, this paper proposes performing complex event processing directly on the raw RFID data stream, combining the data cleaning step with the complex event processing step. It designs a complex event processing engine architecture that integrates this method and implements the engine. Extensive comparative experiments verify the correctness and efficiency of the approach.

8.
Managing large-scale time series databases has recently attracted significant attention in the database community. Related fundamental problems such as dimensionality reduction, transformation, pattern mining, and similarity search have been studied extensively. Although time series data are dynamic by nature, as in data streams, current solutions to these fundamental problems have mostly targeted static time series databases. In this paper, we first propose a framework for online summary generation for large-scale and dynamic time series data, such as data streams. We then propose online transform-based summarization techniques over data streams that can be updated in constant time and space. We present both exact and approximate versions of the proposed techniques and provide error bounds for the approximate case. One of our main contributions is an extensive performance analysis. Our experiments carefully evaluate the quality of the online summaries for point, range, and knn queries using real-life dynamic data sets of substantial size.

9.
Structured scenario photos, referring to images that capture important events which usually follow specific routines or structures (such as wedding ceremonies, graduation ceremonies, etc.), account for a significant proportion of personal photo collections. Conventional image analysis techniques that ignore these event routines/structures are not sufficient to handle such photos. In this paper, we explore an appropriate framework to learn and utilize the specific routines for understanding structured scenario photos. Specifically, we propose a novel framework that systematically integrates a Hidden Markov Model and a Gaussian Mixture Model to recognize sub-events from structured scenario photos. We then present a comprehensive criterion for selecting representative images to summarize the whole photo collection. Experimental results on real-world datasets demonstrate the superiority of our framework in both the sub-event recognition and photo summarization tasks.
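A minimal sketch (toy 2-D features, not the authors' pipeline) of recognizing sub-events in an ordered photo sequence with a Gaussian HMM from the hmmlearn library; the hidden states play the role of sub-events such as "ceremony" or "reception".

```python
# HMM-based sub-event labeling of an ordered photo sequence (illustrative only).
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)
# Hypothetical 2-D visual features for 60 photos taken in temporal order,
# drawn from two regimes to mimic two sub-events.
features = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2)),
    rng.normal(loc=[2.0, 2.0], scale=0.3, size=(30, 2)),
])

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50,
                  random_state=0)
hmm.fit(features)                     # unsupervised fit on one photo sequence
sub_events = hmm.predict(features)    # most likely sub-event per photo
print(sub_events)
```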

10.
With the rise of Web 2.0 and the rapid development of the mobile Internet and smart terminals, social media represented by microblogs has grown rapidly. Techniques for mining event storylines from social media play an important role in burst event detection, event trend analysis, public opinion prediction, and many other areas, and have attracted wide attention in academia. Building on the latest research results and literature, and taking the implementation of event storyline mining as its starting point, this survey summarizes the key techniques involved in the core steps and identifies four key technical problems and challenges in current event storyline mining and analysis: storyline generation under multimodal information fusion; event mining and storyline generation over heterogeneous cross-media data; relation mapping for hierarchical, multi-granularity complex events; and fast recognition of dynamic events and storyline generation under real-time data. For each of these problems, the survey provides theoretical discussion, analyses of progress and trends, and an introduction to practical applications, offering new research directions for the in-depth study of social-media-based event storyline mining.

11.
The massive, disordered, and fragmented news data in social networks prevent people from perceiving news events at a fine granularity, let alone grasping how events unfold from multiple perspectives. To address this, the paper proposes a named-entity-sensitive hierarchical news storyline generation method that, without supervision, makes full use of news information to construct a hierarchical, multi-view event storyline. The method consists of three main steps: (1) detecting events with a method that combines event topic information and implicit semantic information...

12.
Twitter is one of the most popular social media platforms for online users to create and share information. Tweets are short, informal, and large-scale, which makes it difficult for online users to find reliable and useful information, giving rise to the problem of Twitter summarization. On the one hand, tweets are short and highly unstructured, which makes traditional document summarization methods difficult to apply to Twitter data. On the other hand, Twitter provides rich social-temporal context beyond the text itself, bringing new opportunities. In this paper, we investigate how to exploit social-temporal context for Twitter summarization. In particular, we provide a methodology to model temporal context both globally and locally, and propose a novel unsupervised summarization framework with social-temporal context for Twitter data. To assess the proposed framework, we manually label a real-world Twitter dataset. Experimental results on the dataset demonstrate the importance of social-temporal context in Twitter summarization.

13.
We propose a novel approach based on predictive quantization (PQ) for online summarization of multiple time-varying data streams. A synopsis over a sliding window of the most recent entries is computed in one pass and dynamically updated in constant time. The correlation between consecutive data elements is effectively taken into account without the need for preprocessing. We extend PQ to multiple streams and propose structures for real-time summarization and querying of a massive number of streams. Queries on any subsequence of a sliding window over multiple streams are processed in real time. We examine the two components of the proposed approach, prediction and quantization, separately, and investigate the space-accuracy trade-off in synopsis generation. Complementing the theoretical optimality of PQ-based approaches, we show that the proposed technique, even for very short prediction windows, significantly outperforms current techniques for a wide variety of query types on both synthetic and real data sets.
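A minimal sketch (my own toy version, not the paper's PQ scheme) of the predictive-quantization idea: predict each value from the previous reconstructed value, then keep only a coarsely quantized residual as the synopsis, which bounds the reconstruction error by half the quantization step.

```python
# Last-value-predictor predictive quantization of a numeric stream (illustrative only).
import numpy as np

def pq_encode(values, step=0.5):
    """Return quantized residuals; the synopsis needs one small int per value."""
    residuals, prev = [], 0.0
    for v in values:
        q = int(round((v - prev) / step))   # quantize the prediction error
        residuals.append(q)
        prev = prev + q * step              # track the decoder's reconstruction
    return residuals

def pq_decode(residuals, step=0.5):
    out, prev = [], 0.0
    for q in residuals:
        prev = prev + q * step
        out.append(prev)
    return np.array(out)

stream = np.array([10.0, 10.4, 10.9, 11.2, 15.0, 15.3])
synopsis = pq_encode(stream)
print(synopsis)                                      # compact representation
print(np.abs(pq_decode(synopsis) - stream).max())    # error stays below step / 2
```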

14.
We present the multivariate Bayesian scan statistic (MBSS), a general framework for event detection and characterization in multivariate spatial time series data. MBSS integrates prior information and observations from multiple data streams in a principled Bayesian framework, computing the posterior probability of each type of event in each space-time region. MBSS learns a multivariate Gamma-Poisson model from historical data, and models the effects of each event type on each stream using expert knowledge or labeled training examples. We evaluate MBSS on various disease surveillance tasks, detecting and characterizing outbreaks injected into three streams of Pennsylvania medication sales data. We demonstrate that MBSS can be used both as a “general” event detector, with high detection power across a variety of event types, and as a “specific” detector that incorporates prior knowledge of an event’s effects to achieve much higher detection power. MBSS has many other advantages over previous event detection approaches, including faster computation and easy interpretation and visualization of results, and allows faster and more accurate event detection by integrating information from multiple streams. Most importantly, MBSS can model and differentiate between multiple event types, thus distinguishing between events requiring urgent responses from other, less relevant patterns in the data.
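For intuition, a minimal sketch of a scan-statistic-style score in the same spirit, though much simpler than MBSS (a toy, single-stream, plain-Poisson version rather than the paper's multivariate Gamma-Poisson Bayesian model): score each time window by the log-likelihood ratio of an elevated rate against the baseline rate.

```python
# Toy Poisson scan score over a single count stream (illustrative only).
import numpy as np
from scipy.stats import poisson

def window_score(counts, baseline, window):
    """Log-likelihood ratio of 'rate elevated in window' vs 'baseline everywhere'."""
    scores = []
    for start in range(len(counts) - window + 1):
        c = counts[start:start + window]
        b = baseline[start:start + window]
        mle_rate = max(c.sum() / b.sum(), 1.0)      # relative risk, at least 1
        llr = (poisson.logpmf(c, mle_rate * b).sum()
               - poisson.logpmf(c, b).sum())
        scores.append(llr)
    return np.array(scores)

baseline = np.full(30, 5.0)                 # expected daily counts
counts = np.random.default_rng(2).poisson(baseline)
counts[20:23] += 12                         # injected outbreak
scores = window_score(counts, baseline, window=3)
print("most anomalous window starts at day", scores.argmax())
```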

15.
In real supply-chain systems, items are usually packaged before circulation, and reading the tags of items at the lowest packaging level is costly. Existing online and offline RFID (radio frequency identification) complex event detection methods assume that every lowest-level tag can be read, and do not support complex event detection over RFID data streams that contain data at multiple packaging levels. Based on the characteristics of RFID data streams generated by supply-chain systems with deployed RFID, a new complex event detection method is proposed. Interval encoding is used to store the packaging relations of items offline, complex event detection is performed by combining online and offline data, and different detection strategies are used for different types of complex events to improve detection efficiency. Experiments show that the method effectively supports complex event detection in supply-chain systems and performs well.
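A minimal sketch (made-up hierarchy, not the paper's encoding details) of interval encoding for a packaging tree: each container gets an interval that encloses the intervals of everything packed inside it, so containment questions become constant-time interval tests.

```python
# Interval (nested-set) encoding of a packaging hierarchy (illustrative only).
packaging = {               # parent -> children; a tiny hypothetical hierarchy
    "pallet_1": ["case_A", "case_B"],
    "case_A": ["item_1", "item_2"],
    "case_B": ["item_3"],
}

def interval_encode(root, tree, counter=None, intervals=None):
    """Assign (left, right) labels by a depth-first traversal."""
    if counter is None:
        counter, intervals = [0], {}
    counter[0] += 1
    left = counter[0]
    for child in tree.get(root, []):
        interval_encode(child, tree, counter, intervals)
    counter[0] += 1
    intervals[root] = (left, counter[0])
    return intervals

def contains(intervals, outer, inner):
    ol, orr = intervals[outer]
    il, ir = intervals[inner]
    return ol < il and ir < orr

iv = interval_encode("pallet_1", packaging)
print(contains(iv, "pallet_1", "item_3"))   # True: item_3 is inside pallet_1
print(contains(iv, "case_A", "item_3"))     # False
```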

16.
This paper proposes a multi-document automatic summarization method based on topic and sub-event extraction. Going beyond traditional frequency statistics, the method considers not only word frequency and position but also whether a word describes a topic or a sub-event of the document collection, extracting eight basic features in total; a logistic regression model is used to estimate the influence of these features on word weights, and the word weights are then computed. Sentences are scored through a sentence vector space model, and the summary is produced by combining sentence scores with redundancy. Comparative experiments using three metrics (N-gram co-occurrence frequency, topic-word coverage, and high-frequency-word coverage) across three summarization systems (Coverage Baseline, Centroid-Based Summary, and Word Mining based Summary (WMS)) show that the WMS system performs better in several respects.
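A minimal sketch (synthetic features and labels, not the paper's eight features) of using logistic regression to turn per-word features into a word weight, here taken as the predicted probability of the "important" class.

```python
# Logistic regression as a word-weighting model (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [term frequency, appears-in-title, is-topic-word] for a few words.
X_train = np.array([
    [0.12, 1, 1],
    [0.02, 0, 0],
    [0.08, 0, 1],
    [0.01, 0, 0],
    [0.15, 1, 0],
])
y_train = np.array([1, 0, 1, 0, 1])   # 1 = word judged important in references

model = LogisticRegression().fit(X_train, y_train)

new_words = {"earthquake": [0.10, 1, 1], "reportedly": [0.03, 0, 0]}
for word, feats in new_words.items():
    weight = model.predict_proba([feats])[0, 1]
    print(f"{word}: weight {weight:.2f}")
```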

17.
RFID data streams change continuously over time, and capturing the changes they contain can be used to detect the occurrence of meaningful events. This paper proposes CECD, an algorithm for capturing events in data streams. It detects changes in a data stream by analyzing shifts in the distribution of clustering results and deviations arising in the value domain, and uses ensemble classification to classify the changes, capture the characteristics of the observed events or phenomena, and build a mapping from events to responses. Experiments show that the proposed framework detects changes in data streams efficiently and captures events more accurately than purely rule-based event detection without change detection.

18.
Event detection is a fundamental information extraction task, which has been explored largely in the context of question answering, topic detection and tracking, knowledge base population, news recommendation, and automatic summarization. In this article, we explore an event detection framework to improve a key phrase-guided centrality-based summarization model. Event detection is based on the fuzzy fingerprint method, which is able to detect all types of events in the ACE 2005 Multilingual Corpus. Our base summarization approach is a two-stage method that starts by extracting a collection of key phrases that will be used to help the centrality-as-relevance retrieval model. We explored three different ways to integrate event information, achieving state-of-the-art results in text and speech corpora: (1) filtering of nonevents, (2) event fingerprints as features, and (3) combination of filtering of nonevents and event fingerprints as features.

19.
In recent years, microblogs have become an important source for reporting real-world events. A real-world occurrence reported in microblogs is also called a social event. Social events may hold critical material that describes the situation during a crisis. In real applications, such as crisis management and decision making, monitoring critical events over social streams enables watch officers to analyze the whole situation as a composite event and make the right decision based on detailed context such as what is happening, where an event is happening, and who is involved. Although there has been significant research effort on detecting a target event in social networks from a single source, in a crisis we often want to analyze composite events contributed by different social users. So far, the problem of integrating ambiguous views from different users has not been well investigated. To address this issue, we propose a novel framework to detect composite social events over streams, which fully exploits the information of social data over multiple dimensions. Specifically, we first propose a graphical model called the location-time constrained topic (LTT) model to capture the content, time, and location of social messages. Using LTT, a social message is represented as a probability distribution over a set of topics by inference, and the similarity between two messages is measured by the distance between their distributions. Events are then identified by conducting efficient similarity joins over social media streams. To accelerate the similarity join, we also propose a variable-dimensional extendible hash over social streams. We have conducted extensive experiments that demonstrate the high effectiveness and efficiency of the proposed approach.
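A minimal sketch of the message-similarity idea under assumptions (made-up topic distributions and threshold, with Jensen-Shannon distance standing in for whatever distributional distance the paper uses): messages whose inferred topic distributions are close are candidates to be joined into the same event.

```python
# Join messages whose topic distributions are close (illustrative only).
import numpy as np
from scipy.spatial.distance import jensenshannon

messages = {
    "msg_1": np.array([0.70, 0.20, 0.10]),   # hypothetical topic mixtures
    "msg_2": np.array([0.65, 0.25, 0.10]),
    "msg_3": np.array([0.05, 0.15, 0.80]),
}
THRESHOLD = 0.2   # assumed join threshold

ids = list(messages)
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        d = jensenshannon(messages[ids[i]], messages[ids[j]])
        if d < THRESHOLD:
            print(f"{ids[i]} and {ids[j]} likely report the same event (d={d:.3f})")
```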

20.
Traditional summarization methods only use the internal information of a Web document while ignoring social information, such as tweets from Twitter, that can give readers a perspective on an event. This paper proposes a framework named SoRTESum that takes advantage of social information, such as reflections of document content, to extract summary sentences and social messages. To do so, summarization is formulated in two steps: scoring and ranking. In the scoring step, the score of a sentence or social message is computed using intra-relations and inter-relations, which integrate the support of local and social information in a mutual reinforcement form. To calculate these relations, 16 features are proposed. After scoring, the summary is generated by selecting the top m ranked sentences and social messages. SoRTESum was extensively evaluated on two datasets. Promising results show that: (i) SoRTESum obtains significant ROUGE-score improvements over state-of-the-art baselines and results competitive with a learning-to-rank approach trained with RankBoost, and (ii) combining intra-relations and inter-relations benefits single-document summarization.
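A minimal sketch of the mutual-reinforcement scoring idea under assumptions (a tiny made-up sentence-tweet similarity matrix, not the paper's 16 features): sentence scores and tweet scores boost each other iteratively until they stabilize.

```python
# Mutual reinforcement between sentence and tweet scores (illustrative only).
import numpy as np

# Cross-similarity matrix W (rows: sentences, columns: tweets); values made up.
W = np.array([
    [0.9, 0.1, 0.0],     # sentence 0 is echoed strongly by tweet 0
    [0.2, 0.8, 0.1],
    [0.0, 0.1, 0.3],
])

sent_scores = np.ones(W.shape[0]) / W.shape[0]
tweet_scores = np.ones(W.shape[1]) / W.shape[1]

for _ in range(50):                       # iterate until the scores stabilize
    new_sent = W @ tweet_scores           # sentences supported by good tweets
    new_tweet = W.T @ sent_scores         # tweets supported by good sentences
    sent_scores = new_sent / new_sent.sum()
    tweet_scores = new_tweet / new_tweet.sum()

print("sentence scores:", np.round(sent_scores, 3))
print("tweet scores:   ", np.round(tweet_scores, 3))
```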
