Similar Articles
20 similar articles found.
1.
Over the last decade, the infrastructure supporting the smart city has coexisted with, and been overtaken by, the rise of social media. The tremendous growth in both mobile devices and social media users has given rise to a new kind of service: location-based social networks (LBSNs). In this new scenario, the term crowdsensing refers to sharing data collected by humans acting as sensors in order to measure phenomena of common interest. Crowd-sourced location data make it possible, for the first time, to study the movement of individuals in urban environments. In this paper, we address the problem of monitoring crowds' whereabouts and movement, which can assist decision making in education, emergency training, urban planning, traffic engineering, etc. Specifically, two-phase density-based analysis for collectives and crowds (2PD-CC) is a novel methodology over public LBSN data that combines density-based clustering, outlier detection, and topic modeling over a region under study to detect, predict, and explain abnormal group behavior. To validate the methodology and its potential application to full-scale problems, an experiment was performed on Twitter data in the city of Madrid.
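
The density-based clustering and outlier-detection phases can be illustrated with a minimal DBSCAN sketch. The abstract does not say which density-based algorithm or parameters 2PD-CC uses, so DBSCAN, the 2-D coordinates, and the `eps` and `min_pts` values below are assumptions; points labeled -1 (noise) play the role of candidate outliers.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query (includes i itself); O(n) per call.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # noise for now; may later become a border point
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border point
            if labels[j] is not None:
                continue             # already assigned; border points are not expanded
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)   # j is also a core point: keep growing the cluster
    return labels

labels = dbscan([(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)],
                eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The brute-force neighbor search is quadratic overall; city-scale post volumes would need a spatial index.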

2.
Traditional ways of studying urban social behavior, e.g. surveys, are costly and do not scale. Recent studies have shown new ways of obtaining data through location-based social networks (LBSNs), such as Foursquare, which could revolutionize the study of urban social behavior. We use Foursquare check-ins to represent user preferences regarding eating and drinking habits. Considering datasets that differ in data volume and observation window size, our results indicate that the spatio-temporal eating and drinking habits users voluntarily express in LBSNs have the potential to explain their cultural habits. Building on this, we propose a methodology to identify cultural boundaries and similarities across populations at different scales, e.g., countries, cities, or neighborhoods. The methodology is evaluated extensively in several respects, for instance by testing variants that disregard some of the considered dimensions and by analyzing results from datasets covering different periods and observation windows. The results indicate that the proposed methodology is a promising approach for automatically separating cultural habits, which could enable new urban services.
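
The abstract does not specify how habits are represented. One plausible reading, assumed here purely for illustration, is to describe each population by a normalized histogram of the hours at which its food or drink check-ins occur and to compare populations with cosine similarity. The check-in hours below are made-up data.

```python
import math

def hourly_profile(checkin_hours, bins=24):
    """Normalized histogram of check-in hours (0-23) for one population."""
    counts = [0] * bins
    for h in checkin_hours:
        counts[h % bins] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(u, v):
    """Cosine similarity of two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Made-up restaurant check-in hours for three populations.
city_a = hourly_profile([12, 12, 13, 19, 20, 20])
city_b = hourly_profile([12, 13, 13, 20, 20, 21])
city_c = hourly_profile([14, 15, 15, 22, 23, 23])
print(round(cosine(city_a, city_b), 2))  # 0.8: similar eating times
print(cosine(city_a, city_c))            # 0.0: no overlapping hours
```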

3.
With the recent surge of location-based social networks (LBSNs), e.g., Foursquare and Facebook Places, a huge amount of the digital footprints that people leave in cyber-physical space has become accessible, including users' profiles, online social connections, and especially the places they have checked in to. Unlike social networks (e.g., Flickr, Facebook) that have explicit groups for users to subscribe to or join, LBSNs usually have no explicit community structure. Moreover, unlike social networks that contain only a single type of social interaction, LBSNs combine online and offline social interactions with user and venue attributes, which makes community detection much more challenging. To capitalize on the large number of potential users and venues and on the huge volume of heterogeneous social interactions, a high-quality community detection approach is needed. In this paper, by exploring the heterogeneous digital footprints of LBSN users in cyber-physical space, we propose a novel edge-centric co-clustering framework to discover overlapping communities. By employing both inter-mode and intra-mode features, the proposed framework is able to group like-minded users from different social perspectives. The efficacy of our approach is validated by extensive empirical evaluations on a collected Foursquare dataset.
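
The edge-centric idea can be sketched as follows: cluster the edges (here user-venue check-ins) rather than the nodes, then let each user inherit the cluster labels of its edges, so one user can belong to several communities. This sketch substitutes plain k-means on hypothetical edge feature vectors for the paper's co-clustering. It is not the authors' framework.

```python
import math

def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def overlapping_communities(edges, features, k):
    """edges: (user, venue) pairs; features: one vector per edge.
    A user's communities are the cluster labels of all of its edges."""
    labels = kmeans(features, k)
    membership = {}
    for (user, _venue), label in zip(edges, labels):
        membership.setdefault(user, set()).add(label)
    return membership

edges = [("u1", "cafe"), ("u1", "gym"), ("u2", "cafe"), ("u3", "gym")]
features = [[0, 0], [10, 10], [0, 1], [10, 9]]  # hypothetical edge features
print(overlapping_communities(edges, features, k=2))
# {'u1': {0, 1}, 'u2': {0}, 'u3': {1}}
```

User `u1` ends up in two communities because its two check-ins fall into different edge clusters, which is the overlap that node-centric partitioning cannot express.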

4.
Smog disasters are becoming increasingly frequent and may have severe consequences for the environment and public health, especially in urban areas. Social media, as a real-time urban data source, has become an increasingly effective channel for observing people's reactions to smog-related health hazards, and can be used to capture possible smog-related public health disasters at an early stage. We therefore propose a predictive analytics approach that uses both social media and physical sensor data to forecast the next day's smog-related health hazard. First, we model smog-related health hazards and smog severity by mining raw microblog text and network information diffusion data. Second, we develop an artificial neural network (ANN)-based model that forecasts the smog-related health hazard from the current health hazard and smog severity observations. We compare the performance of the approach against alternative machine learning methods. To the best of our knowledge, we are the first to integrate social media and physical sensor data for forecasting smog-related health hazards. The empirical findings can help researchers better understand the non-linear relationships between current smog observations and the next day's health hazard. In addition, this forecasting approach can provide decision support for smog-related health hazard management through functions such as early warning.

5.
The popularity of GPS-equipped gadgets and mapping mashup applications has driven the growth of geotagged Web resources as well as georeferenced multimedia applications. More and more research attention has been paid to mining collaborative knowledge from massive user-contributed geotagged content. However, little attention has been paid to generating high-quality geographical clusters, an important preliminary data-cleaning step for most geographical mining work. Previous work mainly uses geotags to derive geographical clusters. Using a single channel of information is not sufficient to generate distinguishable clusters, especially when location ambiguity occurs. In this paper, we propose a two-level clustering framework that uses both the spatial and the semantic features of photographs. In the first-level geoclustering phase, we cluster geotagged photographs by their spatial ties to partition the dataset roughly but efficiently. In the second-level semantic clustering phase, we then leverage the textual semantics of the photographs' annotations to refine the grouping. To measure the semantic correlation between photographs effectively, we propose a semantic enhancement method and a new term weighting function. We also propose a method for automatically determining the parameters of the second-level spectral clustering. Evaluation of our implementation on a real georeferenced photograph dataset shows that the algorithm performs well, producing distinguishable geographical clusters with high accuracy and mutual information.

6.
The idiosyncrasy of the Web has, in the last few years, been altered by Web 2.0 technologies and applications and the advent of the so-called Social Web. While users were merely information consumers in the traditional Web, they play a much more active role in the Social Web, since they are now also data providers. Mass participation in creating Web content has led many public and private organizations to focus on analyzing this content in order to ascertain the general public's opinions on a number of topics. Given the current size and growth rate of the Web, automated techniques are essential for practical and scalable solutions. Opinion mining is a highly active research field that combines natural language processing, computational linguistics, and text analysis techniques with the aim of extracting various kinds of added-value and informational elements from users' opinions. However, current opinion mining approaches are hampered by drawbacks such as the absence of semantic relations between concepts in feature search processes and the lack of advanced mathematical methods in sentiment analysis processes. In this paper we propose an innovative opinion mining methodology that takes advantage of new Semantic Web-guided solutions to enhance the results obtained with traditional natural language processing techniques and sentiment analysis processes. The main goals of the proposed methodology are: (1) to improve feature-based opinion mining by using ontologies at the feature selection stage, and (2) to provide a new vector analysis-based method for sentiment analysis. The methodology has been implemented and thoroughly tested in a real-world movie review scenario, yielding very promising results when compared with other conventional approaches.

7.
Cheng Lin (程琳). Computer & Digital Engineering (计算机与数字工程), 2009, 37(11): 95-98, 151
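In current police informatization practice, data in criminal case information systems are used only for storage, querying, and retrieval; mining these data more deeply would help raise the level and efficiency of case solving. Linked-case investigation is an important technique in criminal investigation by public security organs. Targeting the characteristics of criminal case data, and after preprocessing case-related information, this paper builds a self-organizing feature map (SOM) neural network to cluster identical or similar cases. This gives criminal investigators a new method for linking serial cases and also provides auxiliary reference for public security organs in case analysis and in issuing early-warning information.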
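
A minimal SOM sketch follows. It assumes each case has already been preprocessed into a numeric feature vector scaled to [0, 1], because the abstract does not describe the encoding. Cases whose best-matching units coincide or sit next to each other on the map would be candidates for linking. The grid size, learning-rate schedule, and Gaussian neighborhood are conventional textbook choices, not the paper's settings.

```python
import math
import random

def best_matching_unit(weights, x):
    """Map node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per (row, col) node, initialized uniformly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = 1 - t / epochs
        lr = lr0 * frac                      # learning rate decays linearly
        radius = max(radius0 * frac, 0.5)    # shrinks until only the BMU is updated
        for x in data:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d = math.dist(node, bmu)     # distance on the map grid, not in feature space
                if d <= radius:
                    h = math.exp(-d * d / (2 * radius * radius))  # Gaussian neighborhood
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical cases, each already encoded as features scaled to [0, 1].
cases = [[0.9, 0.1, 0.8], [0.85, 0.15, 0.9], [0.1, 0.9, 0.2]]
som = train_som(cases, rows=3, cols=3)
print([best_matching_unit(som, c) for c in cases])
```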

8.
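As network security becomes increasingly important, research on intrusion detection has deepened, yet the false positive and false negative rates of current intrusion detection systems (IDSs) remain unsatisfactory. This paper proposes a framework for a data-mining-based collaborative intrusion detection system (CoIDS) and discusses in detail how collaboration and data mining methods are applied to intrusion detection. Several data mining methods are used to build the detection models, and a three-tier Agent/Manager/UI entity architecture is adopted. A concrete example illustrates how data mining is applied within the framework.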

9.
This paper addresses the problem of fully automated mining of public space video data, a highly desirable capability under contemporary commercial and security considerations. This task is especially challenging due to the complexity of the object behaviors to be profiled, the difficulty of analysis under the visual occlusions and ambiguities common in public space video, and the computational challenge of doing so in real-time. We address these issues by introducing a new dynamic topic model, termed a Markov Clustering Topic Model (MCTM). The MCTM builds on existing dynamic Bayesian network models and Bayesian topic models, and overcomes their drawbacks on sensitivity, robustness and efficiency. Specifically, our model profiles complex dynamic scenes by robustly clustering visual events into activities and these activities into global behaviours with temporal dynamics. A Gibbs sampler is derived for offline learning with unlabeled training data and a new approximation to online Bayesian inference is formulated to enable dynamic scene understanding and behaviour mining in new video data online in real-time. The strength of this model is demonstrated by unsupervised learning of dynamic scene models for four complex and crowded public scenes, and successful mining of behaviors and detection of salient events in each.

10.
Early detection of unusual events in urban areas is a priority for city management departments, which usually deploy dedicated, complex video-based infrastructures typically monitored by human staff. However, with the emergence and rapid popularity of location-based social networks (LBSNs), detecting an abnormally high or low number of citizens in a specific area at a specific time could be done by an expert system that automatically analyzes public geo-tagged posts. Our approach focuses exclusively on the location information linked to these posts. By applying a density-based clustering algorithm in a first training phase, we obtain the pulse of the city (24 hours by 7 days), which enables on-the-fly detection of outliers (unexpected behaviors) in a subsequent test or monitoring phase. This solution requires no dedicated infrastructure, since citizens themselves buy, maintain, and carry the mobile devices and freely disclose their location by proactively sharing posts. Moreover, location analysis is lighter than video analysis and can be fully automated. Our approach was validated with good results on a dataset of geo-tagged Instagram posts collected in New York City over almost six months. In fact, not only were all previously known events detected, but other, unknown events were also discovered during the experiment.
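
The train-then-monitor structure can be illustrated with a per-slot baseline over the 24 x 7 weekly grid. This sketch swaps the paper's density-based clustering for a simple per-slot mean and standard deviation with a z-score threshold. The counts, the slot granularity, and `z=3.0` are assumptions for illustration only.

```python
import statistics
from collections import defaultdict

def weekly_pulse(history):
    """history: iterable of (weekday, hour, count) for one area.
    Returns {(weekday, hour): (mean, population std. dev.)}, up to 7 x 24 slots."""
    slots = defaultdict(list)
    for weekday, hour, count in history:
        slots[(weekday, hour)].append(count)
    return {slot: (statistics.mean(v), statistics.pstdev(v))
            for slot, v in slots.items()}

def is_abnormal(pulse, weekday, hour, count, z=3.0):
    """Flag abnormally high or low activity relative to the learned slot."""
    mean, std = pulse[(weekday, hour)]
    if std == 0:
        return count != mean       # no variation seen in training: any change stands out
    return abs(count - mean) / std > z

# Hypothetical training data: five Mondays (weekday 0) at 09:00.
pulse = weekly_pulse([(0, 9, c) for c in (10, 12, 11, 9, 13)])
print(is_abnormal(pulse, 0, 9, 30))  # True  (unexpected crowd)
print(is_abnormal(pulse, 0, 9, 11))  # False
print(is_abnormal(pulse, 0, 9, 3))   # True  (unusually empty)
```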

11.
The detection of retinal microaneurysms is crucial for the early detection of important diseases such as diabetic retinopathy. However, detecting these lesions in retinography, the most widely available retinal imaging modality, remains a very challenging task, mainly because of the tiny size and low contrast of microaneurysms in the images. Consequently, automated microaneurysm detection usually relies on extensive ad-hoc processing. Although microaneurysms can be detected more easily using fluorescein angiography, this alternative imaging modality is invasive and not suitable for regular preventive screening. In this work, we propose a novel deep learning methodology that takes advantage of unlabeled multimodal image pairs to improve the detection of microaneurysms in retinography. In particular, we propose a novel adversarial multimodal pre-training consisting of the prediction of fluorescein angiography from retinography using generative adversarial networks. This pre-training allows the network to learn about the retina and the microaneurysms without any manually annotated data. Additionally, we propose to approach microaneurysm detection as heatmap regression, which allows efficient detection and precise localization of multiple microaneurysms. To validate and analyze the proposed methodology, we perform exhaustive experiments on different public datasets and provide relevant comparisons against different state-of-the-art approaches. The results show satisfactory performance, with an Average Precision of 64.90%, 31.36%, and 33.55% on the E-Ophtha, ROC, and DDR public datasets, respectively. Overall, the proposed approach outperforms existing deep learning alternatives while providing a more straightforward detection method that can be applied effectively to raw, unprocessed retinal images.
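
Heatmap regression turns point annotations into dense targets and reads detections back as local maxima. The sketch below does not reproduce the paper's network. It only shows, under assumed `sigma` and `threshold` values, how Gaussian target heatmaps can be built from lesion centers and how peaks can be extracted from a heatmap (here the target itself stands in for a prediction).

```python
import math

def gaussian_heatmap(height, width, centers, sigma=2.0):
    """Target heatmap: pixel-wise max over Gaussians placed at each annotated lesion."""
    hm = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                g = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                hm[y][x] = max(hm[y][x], g)
    return hm

def detect_peaks(hm, threshold=0.5):
    """Return (y, x) of local maxima (8-neighborhood) at or above threshold."""
    h, w = len(hm), len(hm[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = hm[y][x]
            if v < threshold:
                continue
            neighbors = [hm[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x)]
            if all(v >= n for n in neighbors):
                peaks.append((y, x))
    return peaks

hm = gaussian_heatmap(20, 20, [(5, 5), (14, 12)])
print(detect_peaks(hm))  # [(5, 5), (14, 12)]
```

Because each lesion becomes its own peak, several nearby microaneurysms can be detected and localized in a single pass, which is the property the abstract attributes to the heatmap formulation.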

12.
Human relationships give rise to complex communication networks among individuals in a society. Because relationships change, these networks also change over time, which makes them dynamic networks consisting of several consecutive snapshots. Nowadays, the pervasiveness of electronic communication networks, so-called social networks, has made this valuable communication information easier to obtain, and social network mining has emerged as one of the most interesting research topics in data mining. One of its most challenging problems is community detection, that is, detecting the hidden communities in a social network from the available information. This study proposes a solution for finding and tracking communities in a dynamic social network based on local information. Our approach detects communities by finding initial kernels and maintaining them in subsequent snapshots. Experiments on well-known datasets, comparing the proposed method with several state-of-the-art approaches, indicate that its performance and computational complexity are promising and that it can outperform its competitors.

13.
A new dependency and correlation analysis for features (total citations: 3; self-citations: 0; citations by others: 3)
The quality of the data being analyzed is a critical factor affecting the accuracy of data mining algorithms. Two important aspects of data quality are relevance and redundancy. Including irrelevant and redundant features in a data mining model results in poor predictions and high computational overhead. This paper presents an efficient method that considers both the relevance of the features and the pairwise correlation between features in order to improve the prediction accuracy of our data mining algorithm. We introduce a new feature correlation metric $Q_Y(X_i, X_j)$ and a feature subset merit measure $e(S)$ to quantify the relevance of and correlation among features with respect to a desired data mining task (e.g., detecting abnormal behavior in a network service caused by network attacks). Our approach takes into consideration not only the dependency among the features but also their dependency with respect to a given data mining task. Our analysis shows that the correlation among features depends on the decision task, so features behave differently as the decision task changes. We applied our data mining approach to network security and validated it using the DARPA KDD99 benchmark data set. Our results show that, using the new decision-dependent correlation metric, we can efficiently detect rare network attacks such as User to Root (U2R) and Remote to Local (R2L) attacks. The best reported detection rates for U2R and R2L on the KDD99 data sets were 13.2 percent and 8.4 percent, respectively, with 0.5 percent false alarms. For U2R attacks, our approach can achieve a 92.5 percent detection rate with a false alarm rate of 0.7587 percent. For R2L attacks, it can achieve a 92.47 percent detection rate with a false alarm rate of 8.35 percent.

14.
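Anomaly detection has long been one of the important tasks in data mining. When applied to high-dimensional data, anomaly detection algorithms based on Euclidean distance cannot guarantee detection accuracy and take too long to run. Building on angle-variance-based anomaly detection, this paper proposes a multi-level outlier detection algorithm for high-dimensional data (Hybrid outlier detection algorithm based on angle variance for High-dimensional data, HODA). The algorithm incorporates rough set theory, analyzing the interactions between attributes to exclude those with little influence; it then analyzes the data distribution along each dimension to partition the data into a grid and locate grid cells that may contain outliers; finally, it computes the angle-variance outlier factor within those cells to filter out anomalous data. Experimental results show that, compared with ABOD, FastVOA, and the classic LOF algorithm, HODA significantly reduces running time while maintaining detection accuracy, and scales well.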
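
The angle-variance outlier factor at the core of HODA can be illustrated with the unweighted angle-based outlier factor from ABOD: for a point A, take the variance over all pairs (B, C) of other points of ⟨AB, AC⟩ / (|AB|² |AC|²). A point far from everything sees all others under similar angles, so its variance is small. This sketch covers only that final step. The rough-set attribute reduction and grid pruning are omitted, and the original ABOD additionally weights each pair by distance.

```python
import itertools
import statistics

def abof(point, others):
    """Unweighted angle-based outlier factor; smaller values suggest an outlier.
    Needs at least two other points distinct from `point`."""
    values = []
    for b, c in itertools.combinations(others, 2):
        ab = [bi - ai for ai, bi in zip(point, b)]
        ac = [ci - ai for ai, ci in zip(point, c)]
        nab = sum(v * v for v in ab)   # |AB|^2
        nac = sum(v * v for v in ac)   # |AC|^2
        if nab == 0 or nac == 0:
            continue                   # skip duplicates of the query point
        dot = sum(x * y for x, y in zip(ab, ac))
        values.append(dot / (nab * nac))
    return statistics.pvariance(values)

pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = {p: abof(p, [q for q in pts if q != p]) for p in pts}
print(min(scores, key=scores.get))  # (10, 10)
```

Each call is quadratic in the number of other points, which is why HODA restricts the computation to the candidate grid cells.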

15.
Objective: This work proposes a novel approach to model the spatiotemporal distribution of crowd motions and detect anomalous events. Methods: We first learn the regions of interest (ROIs) that characterize behavioral patterns through trajectory analysis with Hierarchical Dirichlet Processes (HDP), so that the main trends of crowd motion can be modeled. Based on the ROIs, we then build a series of histograms at both global and local levels as templates for the observed movement distribution, which statistically describe time-correlated crowd events. Once the templates have been built hierarchically, we feed in real data containing discrete trajectory observations from video surveillance and detect abnormal events for both individuals and crowds. Results: Experimental results show the effectiveness of our approach, which is able to analyze and extract crowd motion information from an observed trajectory dataset and achieve anomaly detection at hierarchical levels. Conclusion: The proposed hierarchical approach can learn the movement trends of crowds in both global and local areas and describe crowd behavior statistically, building a template for the pedestrian movement distribution that allows time-correlated abnormal crowd events to be detected.
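
The template idea can be sketched at the local level only. The sketch assumes the ROIs are already known (the paper learns them with HDP) and uses per-ROI histograms of movement direction as templates. An ROI is flagged when its observed histogram departs from the template by more than an L1-distance threshold. The choice of direction as the feature, 8 bins, and the threshold are all assumptions for illustration.

```python
import math

def direction_bin(dx, dy, bins=8):
    """Quantize a displacement (dx, dy) into one of `bins` direction sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi / bins)) % bins

def local_histograms(steps, bins=8):
    """steps: (roi, dx, dy) displacements -> {roi: normalized direction histogram}."""
    counts = {}
    for roi, dx, dy in steps:
        counts.setdefault(roi, [0] * bins)[direction_bin(dx, dy, bins)] += 1
    return {roi: [c / sum(hist) for c in hist] for roi, hist in counts.items()}

def abnormal_rois(template, observed, threshold=0.5):
    """ROIs whose observed histogram is far (L1 distance) from the template."""
    flagged = []
    for roi, hist in observed.items():
        if roi in template:
            dist = sum(abs(a - b) for a, b in zip(hist, template[roi]))
            if dist > threshold:
                flagged.append(roi)
    return sorted(flagged)

# Hypothetical data: in training, pedestrians in ROI "A" walk east and in "B" north.
template = local_histograms([("A", 1, 0)] * 5 + [("B", 0, 1)] * 5)
observed = local_histograms([("A", 1, 0)] * 3 + [("B", 0, -1)] * 3)
print(abnormal_rois(template, observed))  # ['B']
```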

16.
Discovering Dispatching Rules Using Data Mining (total citations: 1; self-citations: 0; citations by others: 1)
This paper introduces a novel methodology for generating scheduling rules using a data-driven approach. We show how to use data mining to discover previously unknown dispatching rules by applying learning algorithms directly to production data. The approach involves preprocessing historic scheduling data into an appropriate data file, discovering key scheduling concepts, and representing the data mining results in a way that enables their use for job scheduling. We also consider how this new approach can yield unexpected knowledge and insights that would not be obtainable if an explicit model of the system or the basic scheduling rules had to be obtained beforehand. All of our results are illustrated via numerical examples and experiments on simulated data.

17.
Yu Dongjin, Yu Ting, Wang Dongjing, Shen Yi. Multimedia Tools and Applications, 2022, 81(27): 39207-39228
Nowadays, many people like to share the places they have visited on location-based social networks (LBSNs). A Point of Interest (POI) recommendation, as one of...

18.
Discovering shared conceptualizations in folksonomies (total citations: 2; self-citations: 0; citations by others: 2)
Social bookmarking tools are rapidly emerging on the Web. In such systems users set up lightweight conceptual structures called folksonomies. Unlike in ontologies, the shared conceptualizations in folksonomies are not formalized but implicit. We present a new data mining task, the mining of all frequent tri-concepts, together with an efficient algorithm for discovering these implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. We provide a formal definition of the problem and present an efficient algorithm for its solution. Finally, we show the applicability of our approach on three large real-world examples.
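
The notion of a tri-concept can be made concrete with a brute-force sketch. A folksonomy is a set Y of (user, tag, resource) triples, and a tri-concept is a triple of sets (A, B, C) with A x B x C contained in Y and none of the three sets extendable. The sketch enumerates every non-empty tag set B and resource set C, derives the maximal user set A, and keeps the triple only if B and C are also closed. It is exponential and only meant for toy data; the paper's efficient algorithm is not reproduced, and the minimum-size parameters stand in for its frequency thresholds.

```python
from itertools import chain, combinations

def subsets(items):
    """All non-empty subsets of `items`, in sorted order."""
    items = sorted(items)
    return chain.from_iterable(combinations(items, k) for k in range(1, len(items) + 1))

def tri_concepts(Y, min_users=1, min_tags=1, min_res=1):
    users = {u for u, _, _ in Y}
    tags = {t for _, t, _ in Y}
    res = {r for _, _, r in Y}
    found = []
    for B in map(frozenset, subsets(tags)):
        for C in map(frozenset, subsets(res)):
            # Maximal user set compatible with B x C.
            A = frozenset(u for u in users
                          if all((u, b, c) in Y for b in B for c in C))
            if len(A) < min_users:
                continue
            # B and C must also be maximal given the other two components.
            B_closed = frozenset(t for t in tags
                                 if all((u, t, c) in Y for u in A for c in C))
            C_closed = frozenset(r for r in res
                                 if all((u, b, r) in Y for u in A for b in B))
            if B_closed == B and C_closed == C and len(B) >= min_tags and len(C) >= min_res:
                found.append((A, B, C))
    return found

Y = {("alice", "python", "r1"), ("alice", "python", "r2"),
     ("bob", "python", "r1"), ("bob", "python", "r2"), ("bob", "java", "r3")}
print(len(tri_concepts(Y)))  # 2
```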

19.
We investigate the use of biased sampling according to the density of the data set to speed up general data mining tasks, such as clustering and outlier detection, in large multidimensional data sets. In density-biased sampling, the probability that a given point is included in the sample depends on the local density of the data set. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest and can be tuned for specific data mining tasks. This allows great flexibility and improved accuracy of the results over simple random sampling. We describe our approach in detail, evaluate it analytically, and show how it can be optimized for approximate clustering and outlier detection. Finally, we present a thorough experimental evaluation of the proposed method, applying density-biased sampling to real and synthetic data sets and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.
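
One simple form of density-biased sampling makes each point's inclusion probability proportional to its local density raised to an exponent `a`, scaled so that the expected sample size is close to a target. With `a > 0` dense regions are favored (useful for clustering), with `a < 0` sparse regions are favored (useful for outlier detection), and `a = 0` reduces to uniform random sampling. The grid-count density estimate and this exact weighting are assumptions for illustration; the paper's estimator and tuning may differ.

```python
import random
from collections import Counter

def inclusion_probabilities(points, cell, a, target_size):
    """Probability of keeping each point: proportional to density ** a,
    scaled so the expected sample size is about target_size."""
    def key(p):
        return tuple(int(c // cell) for c in p)     # grid cell of a point
    density = Counter(key(p) for p in points)       # points per grid cell
    weights = [density[key(p)] ** a for p in points]
    total = sum(weights)
    # Capping at 1 can make the expected size slightly smaller than requested.
    return [min(1.0, target_size * w / total) for w in weights]

def density_biased_sample(points, cell, a, target_size, seed=0):
    """Independent Bernoulli draw per point using the biased probabilities."""
    rng = random.Random(seed)
    probs = inclusion_probabilities(points, cell, a, target_size)
    return [p for p, prob in zip(points, probs) if rng.random() < prob]

dense = [(0.1 + 0.008 * i, 0.1) for i in range(100)]   # 100 points in one cell
sparse = [(10 * k + 0.5, 10.5) for k in range(10)]     # 10 isolated points
sample = density_biased_sample(dense + sparse, cell=1.0, a=-1, target_size=10)
print(len(sample), sum(p in sparse for p in sample))
```

With `a = -1` each isolated point is kept with probability 10/11 while each point of the dense cell is kept with probability 1/110, so the sample is dominated by the sparse, outlier-like points.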

20.
OLAP cubes enable aggregation-centric analysis of transactional data by shaping data records into measurable facts with dimensional characteristics. A multidimensional view is obtained from the available data fields and the explicit relationships between them. This classical modeling approach is not feasible for scenarios dealing with semi-structured or poorly structured data. We propose to extend the data warehouse design methodology with content-driven discovery of measures and dimensions in the original dataset. Our approach is based on introducing a data enrichment layer responsible for detecting new structural elements in the data using data mining and other techniques. Discovered elements can be measures, dimensions, or hierarchy levels and may represent static or even dynamic properties of the data. This paper focuses on the challenge of generating, maintaining, and querying discovered elements in OLAP cubes. We demonstrate the power of our approach by providing OLAP over the public stream of user-generated content on the Twitter platform. We were able to enrich the original set with dynamic characteristics, such as user activity, popularity, and messaging behavior, as well as to classify messages by topic, impact, origin, method of generation, etc. Knowledge discovery techniques coupled with human expertise enable structural enrichment of the original data beyond the scope of existing methods for obtaining multidimensional models from relational or semi-structured data.
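
A toy version of the enrichment layer can be sketched with user activity, one of the dynamic characteristics the abstract mentions. The sketch derives a new "activity" dimension for each tweet from the data itself, then aggregates a tweet-count measure by (activity, topic). The record layout, the threshold, and the two-level high/low classification are hypothetical simplifications, not the paper's design.

```python
from collections import defaultdict

def enrich_with_activity(tweets, threshold):
    """Enrichment layer: derive a new 'activity' dimension for each tweet
    from how many tweets its author posted in the dataset."""
    per_user = defaultdict(int)
    for user, _topic in tweets:
        per_user[user] += 1
    return [(user, topic, "high" if per_user[user] >= threshold else "low")
            for user, topic in tweets]

def count_by(records):
    """Aggregate the measure (tweet count) by the (activity, topic) dimensions."""
    cube = defaultdict(int)
    for _user, topic, activity in records:
        cube[(activity, topic)] += 1
    return dict(cube)

tweets = [("a", "sport"), ("a", "sport"), ("a", "music"), ("b", "music")]
print(count_by(enrich_with_activity(tweets, threshold=2)))
# {('high', 'sport'): 2, ('high', 'music'): 1, ('low', 'music'): 1}
```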
