Similar articles
20 similar articles found (search time: 31 ms)
1.
Twitter is among the fastest-growing microblogging and online social networking services. Messages posted on Twitter (tweets) have been reporting everything from daily life stories to the latest local and global news and events. Monitoring and analyzing this rich and continuous user-generated content can yield unprecedentedly valuable information, enabling users and organizations to acquire actionable knowledge. This article provides a survey of techniques for event detection from Twitter streams. These techniques aim at finding real-world occurrences that unfold over space and time. In contrast to conventional media, event detection from Twitter streams poses new challenges. Twitter streams contain large amounts of meaningless messages and polluted content, which negatively affect the detection performance. In addition, traditional text mining techniques are not suitable, because of the short length of tweets, the large number of spelling and grammatical errors, and the frequent use of informal and mixed language. Event detection techniques presented in the literature address these issues by adapting techniques from various fields to the uniqueness of Twitter. This article classifies these techniques according to the event type, detection task, and detection method and discusses commonly used features. Finally, it highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.

2.
The automated detection of points in a time series with a special meaning to a user, commonly referred to as the detection of events, is an important aspect of temporal data mining. These events are often points in a time series such as peaks, level changes, or sudden changes of spectral characteristics. Fast algorithms are needed for event detection in online applications or applications with huge time series data sets. In this article, we present a very fast algorithm for event detection that learns detection criteria from labeled sample time series (i.e., time series where events are marked). This algorithm is based on fast transformations of time series into low-dimensional feature spaces and probabilistic modeling techniques to identify criteria in a supervised manner. Events are then found in a single fast pass over the signal (hence the name SwiftEvent) by evaluating learned thresholds on Mahalanobis distances in the feature space. We analyze the run-time complexity of SwiftEvent and demonstrate its application in some use cases with artificial and real-world data sets in comparison with other state-of-the-art techniques.
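
The idea of thresholding Mahalanobis distances in a low-dimensional feature space can be sketched as follows. This is only an illustration under stated assumptions, not the SwiftEvent algorithm itself: the two features (sample value and first difference), the function names, the training signal, and the threshold of 3.0 are all hypothetical choices, and the Gaussian is fitted to event-free samples rather than learned from labeled events.

```python
import math

def features(signal, i):
    """Hypothetical 2-D feature vector for sample i: (value, first difference)."""
    prev = signal[i - 1] if i > 0 else signal[i]
    return (signal[i], signal[i] - prev)

def fit_gaussian(vectors):
    """Return the mean and the 2x2 sample covariance (sxx, sxy, syy)."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / (n - 1)
    syy = sum((v[1] - my) ** 2 for v in vectors) / (n - 1)
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(v, mean, cov):
    """Mahalanobis distance of a 2-D vector, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

def detect_events(signal, mean, cov, threshold):
    """Single pass over the signal; flag indices whose distance exceeds the threshold."""
    return [i for i in range(len(signal))
            if mahalanobis(features(signal, i), mean, cov) > threshold]

# Event-free training signal (made-up values) used to fit the Gaussian model.
train = [0.0, 0.1, -0.1, 0.05, 0.0, -0.05, 0.1, 0.0, -0.1, 0.05]
mean, cov = fit_gaussian([features(train, i) for i in range(len(train))])

# Test signal with a spike at index 4; the drop back at index 5 is also unusual.
test_signal = [0.0, 0.05, -0.05, 0.0, 5.0, 0.05, 0.0]
events = detect_events(test_signal, mean, cov, threshold=3.0)
print(events)
```

Because each sample needs only a constant amount of arithmetic, the pass is linear in the length of the signal, which matches the single-pass character described in the abstract.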

3.
While it is possible to analyze the run-time behavior of a business process through process mining techniques, in practice there is often a gap between the low-level nature of the events recorded in an event log and the high level of abstraction at which the process is modeled. This makes it difficult to understand the recorded behavior in terms of the high-level activities in the process model. It also makes it difficult to improve the model based on run-time data about the process. In this work we present an approach to mine mappings between the events in the log and the activities in the model. These mappings can be used to generate suggestions of how the process model can be extended in order to capture the behavior recorded in the event log. Using a real-world and publicly available event log, we show how the approach can improve the model in a stepwise manner, until it covers all the behavior recorded in the event log.

4.
The increasing popularity of Twitter as a social network tool for opinion expression as well as information retrieval has resulted in the need to derive computational means to detect and track relevant topics/events in the network. The application of topic detection and tracking methods to tweets enables users to extract newsworthy content from the vast and somewhat chaotic Twitter stream. In this paper, we apply our technique named Transaction-based Rule Change Mining to extract newsworthy hashtag keywords present in tweets from two different domains, namely sports (The English FA Cup 2012) and politics (US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time-windows and update rates to each of the datasets in order to study their impact on performance. The effectiveness results reveal that our approach is able to accurately detect and track newsworthy content. In addition, the results show that adapting the time-window yields better performance, especially on the sports dataset, which can be attributed to the usually shorter duration of football events.

5.
Twitter has become an important data source for detecting events, especially tracking detailed information for events of a specific domain. Previous studies on targeted-domain Twitter information extraction have used supervised learning techniques to identify domain-related tweets; however, the need for extensive manual labeling makes these supervised systems extremely expensive to build and maintain. Moreover, most existing work fails to consider spatiotemporal factors, which are essential attributes of target-domain events. In this paper, we propose a semi-supervised method for Automatical Targeted-domain Spatiotemporal Event Detection (ATSED) in Twitter. Given a targeted domain, ATSED first learns tweet labels from historical data, and then detects on-going events from real-time Twitter data streams. Specifically, an efficient label generation algorithm is proposed to automatically recognize tweet labels from domain-related news articles, a customized classifier is created for Twitter data analysis by utilizing tweets' distinguishing features, and a novel multinomial spatial-scan model is provided to identify geographical locations for detected events. Experiments on 305 million tweets demonstrated the effectiveness of this new approach.

6.
Twitter has recently emerged as a popular microblogging service that has 284 million monthly active users around the world. A part of the 500 million tweets posted on Twitter every day are personal observations of the immediate environment. If provided with time and location information, these observations can be seen as sensory readings for monitoring and localizing objects and events of interest. Location information on Twitter, however, is scarce, with fewer than 1% of tweets having associated GPS coordinates. Current research on Twitter location inference mostly focuses on city-level or coarser inference, and cannot provide accurate results for fine-grained locations. We propose an event monitoring system for Twitter that emphasizes local events, called SNAF (Sense and Focus). The system filters personal observations posted on Twitter and infers the location of each report. Our extensive experiments with real Twitter data show that the proposed observation filtering approach achieves about a 22% improvement over existing filtering techniques, and our location inference approach can increase the location accuracy by up to 36% within the 3 km error range. By aggregating the observation reports with location information, our prototype event monitoring system can detect real-world events, in many cases earlier than news reports.

7.
Online social networks such as Twitter, Facebook and Instagram are increasingly becoming the go-to medium for users to acquire information and discuss what is happening globally. Understanding real-time mass conversations on social media platforms can provide rich insights into events, provided that there is a way to detect and characterise events. To this end, in the past twenty years, many researchers have been developing event detection methods based on the data collected from various social media platforms. The developed methods for discovering events are generally modular in design and novel in scale and speed. To review the research in this field, we line up existing works for event detection in online social networks and organise them to provide a comprehensive and in-depth survey. This survey comprises three major parts: research methodologies, the review of state-of-the-art literature and the evolution of significant challenges. Each part is intended for readers with different motivations and expectations of this survey. For example, the methodologies provide the life-cycle for designing new event detection models, from data collection to model evaluation. A timeline and a taxonomy of existing methods are also introduced to elaborate the development of various technologies under the umbrella of event detection. These two parts benefit readers who have a background in event detection and want to explore existing models in depth, including their pros and cons. The third part shows the development of the major open issues in this field. It also indicates the milestones of each challenge in terms of typical models. Our survey can contribute to the community by highlighting possible new problem statements and opening new research directions.

8.
Social networks, once an innocuous platform for sharing pictures and thoughts among a small online community of friends, have now transformed into a powerful tool of information, activism, mobilization, and sometimes abuse. Detecting the true identity of social network users is an essential step in making social media an efficient channel of communication. This paper targets the microblogging service Twitter as the social network of choice for investigation. It has been observed that the dissemination of pornographic content and the promotion of follower markets are active on Twitter. This clearly indicates loopholes in Twitter's spam detection techniques. Through this work, five types of spammers have been identified: sole spammers, pornographic users, followers market merchants, fake profiles, and compromised profiles. For detection purposes, data on around 100,000 (1 lakh) Twitter users with their 20 million tweets has been collected. Users have been classified based on trust-, user-, and content-based features using machine learning techniques such as Bayes Net, Logistic Regression, J48, Random Forest, and AdaBoostM1. The experimental results show that the Random Forest classifier is able to predict spammers with an accuracy of 92.1%. Based on these initial classification results, a novel system for real-time streaming of users for spam detection has been developed. We envision that such a system should provide Twitter users with an indication of the identity of other users in real time.

9.
Performance and scalability are critically important for on-chip interconnect in many-core chip-multiprocessor systems. Packet-switched interconnect fabric, widely viewed as the de facto on-chip data communication backplane in the many-core era, offers high throughput and excellent scalability. However, these benefits come at the price of router latency due to run-time multi-hop data buffering and resource arbitration. The network accounts for a majority of on-chip data transaction latency. In this work, we propose dynamic in-network resource reservation techniques to optimize run-time on-chip data transactions. This idea is motivated by the need to preserve existing abstraction and general-purpose network performance while optimizing for frequently occurring network events such as data transactions. Experimental studies using multithreaded benchmarks demonstrate that the proposed techniques can reduce on-chip data access latency by 28.4% on average in a 16-node system and 29.2% on average in a 36-node system.

10.
11.
Discovery of unusual regional social activities using geo-tagged microblogs   Total citations: 1, self-citations: 0, citations by others: 1
The advent of microblogging services represented by Twitter has stirred a popular trend of sharing personal updates from all over the world. Furthermore, recent mobile device and wireless network technologies are greatly expanding the connectivity between people over social networking sites. Regarding the buzz shared over these sites as a crowd-sourced database reflecting various kinds of real-world events, we are able to conduct a variety of social analytics using the crowd power in much easier ways. In this paper, we propose a geo-social event detection method that finds unusually crowded places, based on the conception of social networking sites as a social event detector. In order to detect unusual statuses of a region, we first construct geographical regularities deduced from geo-tagged microblogs. In particular, we use a large number of geo-tagged Twitter messages, which are collected by means of our own tweet acquisition method in terms of geographic relevancy. By comparing against those regularities, we decide whether any unusual events are happening in the monitored geographical areas. Finally, we describe the experimental results to evaluate the proposed unusualness detection method on the basis of geographical regularities, which are computed from a large dataset of real geo-tagged tweets from around Japan.
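
Comparing current activity against a per-region regularity can be sketched as below. This is a minimal illustration, not the authors' method: the region names, the daily tweet counts, the use of mean and standard deviation as the "regularity", and the factor `k` are all assumptions made for the example.

```python
from statistics import mean, stdev

def build_regularity(history):
    """history maps a region to past tweet counts; return (mean, stdev) per region."""
    return {region: (mean(counts), stdev(counts))
            for region, counts in history.items()}

def unusually_crowded(regularity, current, k=3.0):
    """Flag regions whose current count exceeds mean + k * stdev.

    The stdev is floored at 1.0 (an arbitrary choice) so that a perfectly
    flat history does not turn every small fluctuation into an alarm.
    """
    flagged = []
    for region, count in current.items():
        mu, sigma = regularity[region]
        if count > mu + k * max(sigma, 1.0):
            flagged.append(region)
    return flagged

# Made-up daily counts of geo-tagged tweets for two hypothetical regions.
history = {
    "region_A": [10, 12, 11, 9, 10],
    "region_B": [100, 95, 105, 98, 102],
}
regularity = build_regularity(history)

# region_A is far above its usual level; region_B is within its normal range.
current = {"region_A": 40, "region_B": 101}
flagged = unusually_crowded(regularity, current)
print(flagged)
```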

12.
13.
The volume of information generated by social and cellular networks has significantly increased in recent years. Automated collection of these data and their rapid analysis allow for better and faster detection of major (in terms of national impact) 'real life' events. This study uses data obtained from social networks such as Twitter and Google+. It proposes a mechanism for detecting major events and a system to alert on their manifestation. The article describes the considerations and algorithms required to develop and establish such a system. The methodology presented here is based on linking major events that occurred in Israel during the years 2011–2014 with information extracted from social networks. Results indicate that alerts were received shortly after the event occurred for most major events, such as large fires, earthquakes and terror attacks. However, attempts to achieve alerts for 'local' secondary events failed, because their impact on the social network is low. Furthermore, it was found that the volume of false alerts depends on the type of domain and keywords.

14.
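In rumor detection, the lack of external knowledge makes it difficult for models to perceive implicit information, which in turn limits their ability to mine deep information. To address this problem, a knowledge-graph-based multi-feature fusion rumor detection method (KGMRD) is proposed. First, for each event, the post and its comments are jointly assembled into a text sequence, from which a classifier extracts sentiment features; ConceptNet is used to construct a knowledge graph from the text, and the entity representations in the knowledge graph are aggregated with the semantic features of the text through an attention mechanism, yielding an enhanced semantic feature representation. Second, for the propagation structure, a propagation structure graph is built for each event from the propagation and repost relations among posts, and DropEdge is used to prune this graph to obtain more effective propagation structure features. Finally, the resulting features are fused into a new representation. On three real-world datasets (Weibo, Twitter15, and Twitter16), comparative experiments were conducted against seven baseline models, including SVM-RBF. The results show that, compared with the best-performing baseline, KGMRD improves accuracy (Acc) by 1.1% on the Weibo dataset and by 2.2% on the Twitter15 and Twitter16 datasets, demonstrating that the proposed KGMRD method is sound and effective.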
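
The pruning step can be illustrated with a small sketch of random edge dropping on a propagation graph. It is a simplified stand-in for DropEdge, assuming the graph is a plain edge list and that a fixed fraction of edges is removed uniformly at random; the function name, the toy repost tree, and the drop rate are hypothetical, and nothing here reflects how KGMRD integrates the step into training.

```python
import random

def drop_edge(edges, drop_rate, rng):
    """Return a copy of the edge list with a fraction drop_rate of edges removed.

    Edges are chosen uniformly at random; the surviving edges keep their
    original order so that results are easy to compare across runs.
    """
    n_keep = len(edges) - int(len(edges) * drop_rate)
    keep_idx = sorted(rng.sample(range(len(edges)), n_keep))
    return [edges[i] for i in keep_idx]

# Toy propagation graph: (parent post, repost or reply) pairs under a source post.
edges = [
    ("source", "r1"), ("source", "r2"), ("r1", "r3"),
    ("r1", "r4"), ("r2", "r5"), ("r5", "r6"),
]

rng = random.Random(0)          # seeded so the sketch is repeatable
pruned = drop_edge(edges, drop_rate=0.5, rng=rng)
print(pruned)
```

In graph neural network training, the sampling is typically repeated at every epoch so that the model sees a different sparsified graph each time.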

15.
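Jin Dawei, Shi Si, Yi Cai, Yang Bing. Computer Science (《计算机科学》), 2017, 44(7): 151-160
Complex event processing (CEP) extracts event sequences that match specific patterns from data streams. It is real-time, high-volume, and intelligent, and in recent years it has attracted wide attention from academia and industry. However, previous work has focused on single-layer complex event detection. In practice, because business systems need information at different levels, events must be processed hierarchically, and single-layer complex event detection cannot adequately support this requirement. To address this, building on the concept of event hierarchy and the traditional NFA model, the hierarchical automaton NHA is defined as a model for hierarchical complex event detection. Based on the NHA model, a more intuitive and efficient EH-Tree structure is designed, and the hierarchical complex event detection algorithm HCED and its cost model are given. Finally, extensive experiments were conducted using throughput and memory usage as metrics, comparing and analyzing the time and space performance of HCED against the traditional NFA-based SASE algorithm. The results show that HCED performs hierarchical complex event detection effectively and efficiently, fills the gap left by CEP's lack of support for hierarchical complex event detection, and provides a basis for further research.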

16.
17.
Detection of events using voluntarily generated content in microblogs has been the objective of numerous recent studies. One essential challenge tackled in these studies is estimating the locations of events. In this paper, we review the state-of-the-art location estimation techniques used in the localization of events detected in microblogs, particularly in Twitter, which is one of the most popular microblogging platforms worldwide. We analyze these techniques with respect to the targeted event type, granularity of estimated locations, location-related features selected as sources of spatial evidence, and the method used to make aggregate decisions based on the extracted evidence. We discuss the strengths and advantages of alternative solutions to various problems related to location estimation, as well as their preconditions and limitations. We examine the most widely used evaluation methods to analyze the accuracy of estimations and present the results reported in the literature. We also discuss our findings and highlight important research challenges that may need further attention.

18.
The number of people and organizations using online social networks as a new way of communication is continually increasing. Messages that users write in networks and their interactions with other users leave a digital trace that is recorded. In order to understand what is going on in these virtual environments, systems are needed that collect, process, and analyze the information generated. The majority of existing tools analyze information related to an online event once it has finished or at a specific point in time (i.e., without considering an in-depth analysis of the evolution of users' activity during the event). They focus on an analysis based on statistics about the quantity of information generated in an event. In this article, we present a multi-agent system that automates the process of gathering data from users' activity in social networks and performs an in-depth analysis of the evolution of social behavior at different levels of granularity in online events based on network theory metrics. We evaluated its functionality by analyzing users' activity in events on Twitter.

19.
This paper presents a series of studies on probabilistic properties of activity data in an information system for detecting intrusions into the information system. Various probabilistic techniques of intrusion detection, including decision tree, Hotelling's T² test, chi-square multivariate test, and Markov chain, are applied to the same training set and the same testing set of computer audit data for investigating the frequency property and the ordering property of computer audit data. The results of these studies provide answers to several questions concerning which properties are critical to intrusion detection. First, our studies show that the frequency property of multiple audit event types in a sequence of events is necessary for intrusion detection. A single audit event at a given time is not sufficient for intrusion detection. Second, the ordering property of multiple audit events provides an additional advantage beyond the frequency property for intrusion detection. However, unless the scalability problem of complex data models taking into account the ordering property of activity data is solved, intrusion detection techniques based on the frequency property provide a viable solution that produces good intrusion detection performance with low computational overhead.
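
The difference between the frequency property and the ordering property can be made concrete with a small sketch. It is an illustration under assumptions, not the paper's experimental setup: the event names, the uniform expected frequencies, the add-one smoothing, and the function names are hypothetical. A chi-square style score looks only at how often each event type occurs in a window, while a first-order Markov chain scores the transitions between consecutive events.

```python
import math
from collections import Counter

def chi_square_score(window, expected_freq):
    """Frequency property: sum of (observed - expected)^2 / expected per event type."""
    n = len(window)
    counts = Counter(window)
    score = 0.0
    for event, p in expected_freq.items():
        expected = p * n
        score += (counts.get(event, 0) - expected) ** 2 / expected
    return score

def fit_transitions(sequence, alphabet, smoothing=1.0):
    """Ordering property: first-order transition probabilities with additive smoothing."""
    counts = {a: Counter() for a in alphabet}
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values()) + smoothing * len(alphabet)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in alphabet}
    return probs

def avg_log_likelihood(window, probs):
    """Mean log-probability of the transitions in a window (lower is more anomalous)."""
    pairs = list(zip(window, window[1:]))
    return sum(math.log(probs[prev][cur]) for prev, cur in pairs) / len(pairs)

# Hypothetical audit event types and a made-up "normal" training sequence.
alphabet = ["open", "read", "close"]
normal = alphabet * 20
expected_freq = {e: 1 / 3 for e in alphabet}
probs = fit_transitions(normal, alphabet)

ordered = ["open", "read", "close", "open", "read", "close"]
shuffled = ["close", "read", "open", "close", "read", "open"]

# Same event counts, so the frequency-based score cannot tell the windows apart,
# while the transition-based score is much lower for the reordered window.
print(chi_square_score(ordered, expected_freq), chi_square_score(shuffled, expected_freq))
print(avg_log_likelihood(ordered, probs), avg_log_likelihood(shuffled, probs))
```

The frequency score needs only one counter per event type, whereas the transition table grows with the square of the number of event types, which hints at the scalability concern raised in the abstract.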

20.
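Wu Rong, Li Jianhui, Zhu Chuanqi. Computer Engineering (《计算机工程》), 2001, 27(7): 103-104, 150
This paper introduces the basic methods of dynamic data-flow analysis, analyzes their shortcomings under complex control flow, and proposes the BPD test, a method that can use backward information for dynamic data-flow analysis. The method eliminates the side effects of dynamic dead code and extracts a considerable amount of parallelism from a loop. Experimental results on fpppp.f from the SPEC95 benchmark suite are given, verifying that the BPD test achieves significant speedups that other existing methods cannot obtain.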
