Similar articles
20 similar articles found.
1.
Twitter has become an important data source for detecting events, especially for tracking detailed information about events in a specific domain. Previous studies on targeted-domain Twitter information extraction have used supervised learning techniques to identify domain-related tweets; however, the need for extensive manual labeling makes these supervised systems extremely expensive to build and maintain. Moreover, most existing work fails to consider spatiotemporal factors, which are essential attributes of targeted-domain events. In this paper, we propose a semi-supervised method for Automatical Targeted-domain Spatiotemporal Event Detection (ATSED) in Twitter. Given a targeted domain, ATSED first learns tweet labels from historical data and then detects ongoing events from real-time Twitter data streams. Specifically, an efficient label generation algorithm is proposed to automatically recognize tweet labels from domain-related news articles, a customized classifier is created for Twitter data analysis by utilizing tweets' distinguishing features, and a novel multinomial spatial-scan model is provided to identify geographical locations for detected events. Experiments on 305 million tweets demonstrated the effectiveness of this new approach.

2.
Twitter spam detection is a recent area of research in which most previous work has focused on identifying malicious user accounts and on honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user, and the application of statistical language analysis to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. We present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34K trending topics and 20 million tweets. We then propose a reduced set of features that are hard for spammers to manipulate. In addition, we have developed a machine learning system with some orthogonal features that can be combined with other feature sets to analyze emergent characteristics of spam in social networks. We have also conducted an extensive evaluation showing that our system obtains an F-measure on par with the best state-of-the-art systems based on the detection of spam accounts. Thus, our system can be applied to real-time Twitter spam detection in trending topics, mainly because it analyzes tweets instead of user accounts.

3.
Li Huan, Zhang Ruisheng, Zhao Zhili, Liu Xin, Yuan Yongna. Applied Intelligence, 2021, 51(11): 7749-7765
Applied Intelligence - Influence maximization refers to selecting a small number of influential nodes in a given network to maximize the influence affected by the subset. In social network analysis...

4.
Influence Maximization aims to find the top-K influential individuals who maximize the influence spread within a social network, and it remains an important yet challenging problem. Most existing greedy algorithms focus on computing the exact influence spread, which leads to low computational efficiency and limits their application to real-world social networks. In this paper, we show that supervised sampling can estimate the influence spread efficiently at only a negligible cost in precision, thus significantly reducing the execution time. Motivated by this, we propose ESMCE, a power-law exponent supervised Monte Carlo estimation method. In particular, ESMCE exploits the power-law exponent of the social network to guide the sampling and employs multiple iterative steps to guarantee the estimation accuracy. Moreover, ESMCE shows excellent scalability and is well suited to large-scale social networks. Extensive experiments on six real-world social networks demonstrate that, compared with state-of-the-art greedy algorithms, ESMCE achieves a speedup of almost two orders of magnitude in execution time with only negligible error (2.21% on average) in influence spread.
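To make the idea of estimating influence spread by sampling concrete, here is a minimal sketch of a plain Monte Carlo estimator. It is not ESMCE: the abstract does not name the diffusion model or describe how the power-law exponent guides sampling, so the Independent Cascade model, the uniform activation probability `p`, the fixed number of `runs`, and the toy `graph` are all assumptions made for illustration.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """Run one Independent Cascade simulation (assumed model) and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly active node gets one chance to activate each inactive out-neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, p=0.1, runs=1000, seed=0):
    """Average the cascade size over independent simulations."""
    rng = random.Random(seed)
    total = sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs))
    return total / runs

# Hypothetical toy network: adjacency lists of a directed graph.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
spread = estimate_spread(graph, ["a"], p=0.5, runs=2000)
```

The estimate becomes more precise as `runs` grows, at a proportional cost in time. The abstract's claim is that guiding the sampling lets ESMCE reach a small error with far less work than exact computation.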

5.
Applied Intelligence - Influence maximization in social networks refers to the process of finding influential users who make the most of information or product adoption. The social networks is...

6.
The Journal of Supercomputing - Depression is the most prevalent mental disorder that can lead to suicide. Due to the tendency of people to share their thoughts on social platforms, social data...

7.
Multimedia Tools and Applications - An efficient way of extracting useful information from multiple sources of data is to use data fusion technology. This paper introduces a data fusion approach in...

8.
Multimedia Tools and Applications - Online Social Networks (OSNs) are generally at risk from many potential dangers. Malicious attackers use compromised OSN accounts to spread fake news, to send...

9.
Multimedia Tools and Applications - Microblogs have become a customary news media source in recent times. But as synthetic text or ‘readfakes’ scale up the online disinformation...

10.
Multimedia Tools and Applications - A wheelchair users detector is presented to extend people detection, providing a more general solution to detect people in environments such as houses adapted...

11.
A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context to disambiguate. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones actually refer to the company. Our approach relies on the identification of filter keywords: those whose presence in a tweet reliably confirms (positive keywords) or rules out (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm can classify 58% of the tweets with 75% accuracy, and those tweets can be used to feed a machine learning algorithm to obtain a complete classification of all tweets with an overall accuracy of 73%. In comparison, a 10-fold validation of the same machine learning algorithm provides an accuracy of 85%, i.e., our unsupervised algorithm has a 14% loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not directly derive from the public information about the company on the Web: a manual selection of keywords from relevant web sources covers only 15% of the tweets with 86% accuracy; (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection.
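The filter-keyword idea can be sketched in a few lines. This is an illustration only: the keyword sets below are hypothetical, whitespace tokenization is a simplification, and the abstract does not say how the authors' algorithm extracts the keywords or resolves tweets containing both kinds. The last line reads the reported "14% loss" as a loss relative to the supervised accuracy, which is an interpretation, since the absolute gap between 85% and 73% is 12 points.

```python
def classify_tweet(text, positive, negative):
    """Return 'related', 'unrelated', or None when no filter keyword decides."""
    tokens = set(text.lower().split())
    has_positive = bool(tokens & positive)
    has_negative = bool(tokens & negative)
    if has_positive and not has_negative:
        return "related"
    if has_negative and not has_positive:
        return "unrelated"
    return None  # undecided: left for a downstream classifier

# Hypothetical keyword sets for the ambiguous name "apple".
positive_keywords = {"iphone", "macbook"}
negative_keywords = {"pie", "juice"}

tweets = ["New iPhone from apple", "apple pie recipe", "I like apple"]
labels = [classify_tweet(t, positive_keywords, negative_keywords) for t in tweets]

# Coverage: the fraction of tweets that the filter keywords can decide.
coverage = sum(label is not None for label in labels) / len(tweets)

# Relative accuracy loss of the unsupervised pipeline (73%) against the supervised baseline (85%).
relative_loss = (0.85 - 0.73) / 0.85
```

Tweets that the keywords decide can then serve as automatically labeled training data for a classifier that handles the undecided remainder, which is the two-stage design the abstract describes.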

12.
Micro-blogging networks have become the most influential online social networks in recent years, and more and more people use them to obtain and spread information. Detecting topics from a large number of tweets is important for information propagation and business marketing; in particular, detecting emerging topics early could strongly support real-time intelligent systems such as real-time recommendation, ad targeting, and marketing strategy. However, most previous work detects emerging topics only once they have reached a large scale and is less effective for early detection, when the topic is still small and offers less informative properties. To solve this problem, we propose a new early detection method for emerging topics in micro-blogging networks based on Dynamic Bayesian Networks (DBNs). We first analyze the topic diffusion process and find two main characteristics of emerging topics: attractiveness and key nodes. Based on this finding, we select features from the topological properties of topic diffusion and build a DBN-based model from the conditional dependencies between features to identify emerging keywords. An emerging keyword not only occurs in a given time period with certain frequency properties but also diffuses with specific topological properties. Finally, we cluster the emerging keywords into emerging topics using the co-occurrence relations between keywords. Experimental results on real data from Sina micro-blogging demonstrate that our method is effective and can detect emerging topics one to two hours earlier than the other methods.

13.
14.
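Click farming (placing fake orders to inflate sales) on e-commerce platforms raises shop revenue to some extent. On the one hand, however, it drives up the platforms' promotion costs and causes serious reputation and security problems; on the other hand, the false order information easily misleads consumers and causes them financial losses. To address click farming on e-commerce platforms, a user-oriented intelligent detection method, the SVM-NB algorithm, is proposed, together with a method for constructing click-farming feature values. First, relevant product data are collected and a feature-value database is built. Second, a classifier is built with the supervised support vector machine (SVM) algorithm to decide whether click farming has occurred. Finally, the probability that a product is subject to click farming is computed with the naive Bayes formula and reported to buyers as a reference for their purchasing decisions. K-fold cross-validation confirms the soundness and accuracy of the SVM-NB algorithm; under the experimental conditions, the accuracy reaches 95.0536%.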
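The final step, turning evidence into a probability with the naive Bayes formula, can be illustrated as follows. This is a sketch under stated assumptions, not the SVM-NB algorithm itself: the binary features, the prior, and every likelihood value are hypothetical, and the abstract does not say which features are used or how the SVM output enters the calculation.

```python
def naive_bayes_posterior(prior, likelihood_fraud, likelihood_normal, observed):
    """Posterior P(fraud | observed binary features), assuming the features are conditionally independent."""
    p_fraud = prior
    p_normal = 1.0 - prior
    for name, present in observed.items():
        lf = likelihood_fraud[name]    # P(feature present | click farming)
        ln = likelihood_normal[name]   # P(feature present | no click farming)
        p_fraud *= lf if present else 1.0 - lf
        p_normal *= ln if present else 1.0 - ln
    # Bayes' rule: normalize over the two hypotheses.
    return p_fraud / (p_fraud + p_normal)

# Hypothetical numbers, for illustration only.
prior = 0.2
likelihood_fraud = {"burst_of_orders": 0.8, "repeated_reviews": 0.6}
likelihood_normal = {"burst_of_orders": 0.1, "repeated_reviews": 0.2}

posterior = naive_bayes_posterior(
    prior, likelihood_fraud, likelihood_normal,
    {"burst_of_orders": True, "repeated_reviews": True},
)
```

With both hypothetical signals present, the posterior is 0.096 / (0.096 + 0.016), or about 0.857. A value like this is the kind of probability the abstract says is reported back to buyers.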

15.
16.
This paper presents an evolutionary algorithm for Discriminative Pattern (DP) mining that focuses on high-dimensional data sets. DP mining aims to identify the sets of characteristics that best differentiate a target group from the others (e.g., successful vs. unsuccessful medical treatments). Extracting information from high-dimensional data sets becomes more natural as the volume of data stored in the world increases (30 GB/s on the Internet alone). There are several evolutionary approaches to DP mining, but none focuses on high-dimensional data. We propose an evolutionary approach with features that reduce memory and processing costs in the context of high-dimensional data. The new algorithm seeks the best (top-k) patterns and hides from the user many parameters common in other evolutionary heuristics, such as population size, mutation and crossover rates, and the number of evaluations. We carried out experiments with real-world high-dimensional data and traditional low-dimensional data. The results showed that the proposed algorithm was superior to other approaches in the literature on high-dimensional data sets and competitive on the traditional data sets.

17.
As we move deeper into the 21st century, critical infrastructures related to energy and transportation are becoming smart: they monitor themselves, communicate, and, most importantly, self-govern. Various drivers have enabled this transition, including sustainability concerns, scarcity of resources, economic considerations, and rapid growth in the enabling technologies of sensor networks and of computational and communication systems. Two notable examples of such infrastructures are smart grids and smart cities. The idea behind a smart grid is the creation of a dynamic, cyber-physical infrastructure that meets the challenges of intermittency and distributed availability of renewables, and realizes reduced operational costs and emissions, via a flexible, intelligent, and networked grid that plans, controls, and balances supply and demand over an entire region. The concept of a smart city is gaining popular attention, driven by goals of sustainability and efficiency, the need to enhance quality of life and affordability, the growing urbanization of the world's population, and the explosion of technological advances in communication and computation. While systems and control problems abound in any complex dynamic system, two characteristics that are specific to critical infrastructures are the need to deliver reliable service and the ability to accomplish this goal amidst constrained resources. These in turn lead to new research topics in systems and control, including empowered consumers, transactive control, and resilience. The focus of this paper is on these emerging topics. Their role in smart infrastructures, the opportunities they provide, and the research challenges they bring are all discussed. Specific illustrations of recent successes are presented that are based on coordinated adjustment of generation and consumption using concepts of multi-agents and multi-timescales in smart grids and socio-technical models of empowered drivers in smart cities.

18.
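Discovering important nodes in social networks is a meaningful research problem, and most current methods for doing so are based on unweighted networks. Because relationships between nodes in a social network differ in strength, a social network is essentially a weighted network, yet the discovery of important nodes in weighted social networks has received little study. Using node interactions, a measure of the relationship strength between nodes is proposed that considers both local directed interaction features and global interaction features. Node activity is defined from the behavioral features of nodes. Taking relationship strength as the edge weight and activity as the node weight yields a weighted social network. Based on the idea of the PageRank algorithm, two improved algorithms are proposed; they use the node weight as the damping factor and, in the iterative process, replace PageRank's sum over in-links with edge weights. Extensive experiments were conducted on datasets from two representative social networks, one Chinese and one international, with different methods chosen for comparison; the results show that the improved algorithms find important nodes in weighted social networks well.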
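One way to read "node weight as damping factor, edge weights in place of the in-link sum" is sketched below. The update rule is an assumption made for illustration and is not either of the paper's two algorithms. The abstract gives no formulas, so the proportional splitting of a node's score by edge weight, the per-iteration normalization, and the toy data are all hypothetical. The sketch assumes activity values strictly between 0 and 1 and positive edge weights.

```python
def weighted_pagerank(edges, activity, iterations=50):
    """PageRank-style scores with edge weights and a per-node damping factor (assumed update rule)."""
    nodes = sorted(activity)
    n = len(nodes)
    # Total outgoing weight of each node, used to split its score among its out-edges.
    out_weight = {u: 0.0 for u in nodes}
    for (u, _v), w in edges.items():
        out_weight[u] += w
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        incoming = {v: 0.0 for v in nodes}
        for (u, v), w in edges.items():
            # u passes its score along each out-edge in proportion to the edge weight.
            incoming[v] += score[u] * w / out_weight[u]
        new = {v: (1.0 - activity[v]) / n + activity[v] * incoming[v] for v in nodes}
        total = sum(new.values())
        # Renormalize, because per-node damping does not preserve the total score.
        score = {v: s / total for v, s in new.items()}
    return score

# Hypothetical toy network: (source, target) -> relationship strength.
edges = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "c"): 2.0, ("c", "a"): 1.0}
activity = {"a": 0.9, "b": 0.5, "c": 0.8}  # hypothetical node activity used as damping
scores = weighted_pagerank(edges, activity)
```

Ranking nodes by `scores` gives the importance ordering. With equal activity values and equal edge weights, the rule reduces to ordinary PageRank on the same graph.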

19.
Detecting anomalies in process runtime behavior is crucial: they might reflect, on the one hand, security breaches and fraudulent behavior and, on the other hand, desired deviations due to, for example, exceptional conditions. Both scenarios yield valuable insights for process analysts and owners, but they happen for different reasons and require different treatment. Hence, a distinction between malign and benign anomalies is required. Existing anomaly detection approaches typically fall short in supporting experts when they need to make this decision. An additional problem is false positives, which could result in selecting incorrect countermeasures. This paper proposes a novel anomaly detection approach based on association rule mining. It fosters the explanation of anomalies and the estimation of their severity. In addition, the approach is able to deal with process change and flexible executions, which potentially lead to false positives. This makes it easier to take the appropriate countermeasure for a malign anomaly and to avoid the possible termination of benign process executions. The feasibility and result quality of the approach are shown by a prototypical implementation and by analyzing real-life logs with injected artificial anomalies. The explanatory power of the presented approach is evaluated through a controlled experiment with users.
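As background for the association-rule approach, the sketch below computes the two standard rule measures, support and confidence, over process traces and checks whether a trace violates a rule. It shows only the textbook definitions. How the paper mines rules, handles process change, or estimates severity is not stated in the abstract, and the activity names and traces are hypothetical.

```python
def rule_stats(traces, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent over a list of traces."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in traces if antecedent <= set(t))
    n_both = sum(1 for t in traces if both <= set(t))
    support = n_both / len(traces)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def violates(trace, antecedent, consequent):
    """A trace violates a rule when it contains the antecedent but not the whole consequent."""
    events = set(trace)
    return set(antecedent) <= events and not set(consequent) <= events

# Hypothetical historical log: each trace lists the activities of one process execution.
traces = [
    ["receive", "check", "approve"],
    ["receive", "check", "approve"],
    ["receive", "check", "reject"],
    ["receive", "approve"],
]

support, confidence = rule_stats(traces, {"receive"}, {"check"})
suspicious = violates(["receive", "approve"], {"receive"}, {"check"})
```

A violated rule also names the expected behavior that is missing (here, an approval without a prior check), which is one plausible reason rule-based detection lends itself to explaining anomalies.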

20.
Named entity recognition (NER) methods have been regarded as an efficient strategy for extracting relevant entities to answer a given query. The aim of this work is to exploit conventional NER methods to analyze a large set of short microtexts, in particular those streaming on online social media such as Twitter. To do so, this paper proposes three properties of contextual association among the microtexts to discover contextual clusters of microtexts, which can be expected to improve the performance of NER tasks. As a case study, we have applied the proposed NER system to Twitter. Experimental results demonstrate the feasibility of the proposed method (a precision of around 90.3%) for extracting relevant information in online social network applications.
