Similar documents
1.
Crime risk prediction is helpful for urban safety and citizens' quality of life. However, existing crime studies have focused on coarse-grained prediction and usually fail to capture the dynamics of urban crime. The key challenge is data sparsity, because 1) not all crimes are recorded, and 2) crimes usually occur with low frequency. In this paper, we propose an effective framework to predict fine-grained and dynamic crime risks on each road using heterogeneous urban data. First, to address unreported crimes, we propose a cross-aggregation soft-impute (CASI) method. Then, we use a novel crime risk measurement to capture crime dynamics from the perspective of influence propagation, taking into consideration both time-varying and location-varying risk propagation. Based on the dynamically calculated crime risks, we design contextual features (i.e., POI distributions, taxi mobility, demographic features) from various urban data sources, and propose a zero-inflated negative binomial regression (ZINBR) model to predict future crime risks on roads. Experiments using real-world data from New York City show that our framework can accurately predict road crime risks and outperforms other baseline methods.
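
The ZINBR model mentioned above is built on the zero-inflated negative binomial distribution, which suits sparse road-level counts with many zeros. Below is a minimal sketch of that likelihood. It assumes the common parameterization: NB mean mu with a log link, dispersion r, and zero-inflation probability pi with a logit link. It does not implement the paper's features, the CASI imputation, or the fitting procedure. Function names are hypothetical.

```python
import math
from typing import Sequence


def nb_pmf(k: int, mu: float, r: float) -> float:
    """Negative binomial P(Y = k) with mean mu > 0 and dispersion (size) r > 0."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k: int, mu: float, r: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    base = nb_pmf(k, mu, r)
    if k == 0:
        return pi + (1.0 - pi) * base
    return (1.0 - pi) * base


def zinb_log_likelihood(counts: Sequence[int], features: Sequence[Sequence[float]],
                        beta: Sequence[float], gamma: Sequence[float], r: float) -> float:
    """Log-likelihood of per-road counts given per-road feature vectors.

    mu uses a log link, exp(x . beta); pi uses a logit link, sigmoid(x . gamma).
    """
    total = 0.0
    for y, x in zip(counts, features):
        mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        pi = 1.0 / (1.0 + math.exp(-sum(g * xi for g, xi in zip(gamma, x))))
        total += math.log(zinb_pmf(y, mu, r, pi))
    return total
```

A fitting routine would maximize `zinb_log_likelihood` over `beta`, `gamma` and `r`. The extra mass at zero from `pi` is what separates this model from a plain negative binomial regression.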

2.
Outlier detection is a useful technique in areas such as fraud detection, financial analysis, and health monitoring. Many recent approaches detect outliers according to reasonable, predefined concepts of an outlier (e.g., distance-based, density-based). However, the definition of an outlier differs between users and even between datasets. This paper addresses this problem by incorporating input from users. Our OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets. By incorporating a small number of such examples, OBE can successfully develop an algorithm that identifies further outliers based on their outlierness. Several algorithmic challenges and engineering decisions must be addressed in building such a system, and we describe the key design decisions and algorithms in this paper. To interact with users who have different degrees of domain knowledge, we develop two detection schemes: OBE-Fraction and OBE-RF. Our experiments on both real and synthetic datasets demonstrate that OBE can discover values that a user would consider outliers.

3.
In this paper, we tackle the problem of automated scene understanding by tracking and predicting paths for multiple humans, with a new methodology that uses data from a single fixed camera monitoring the environment. Our main idea is to build goal-oriented prior motion models that can drive both the tracking and path prediction algorithms, based on a coarse-to-fine modeling of the target goal. To implement this idea, we use a dataset of training video sequences with associated ground-truth trajectories, from which we hierarchically extract a set of key locations. These key locations may correspond to exit/entrance zones in the observed scene, or to crossroads where trajectories often change direction abruptly. A simple heuristic allows us to make piecewise associations of the ground-truth trajectories to the key locations, and we use these data to learn one statistical motion model per key location, based on the variations of the trajectories in the training data and on a regularizing prior over the models' spatial variations. We illustrate how to use these motion priors within an interacting multiple model scheme for target tracking and path prediction, and finally evaluate this methodology with experiments on common datasets for comparing tracking algorithms.

4.
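The development of the Internet has made identifying and punishing organized crime groups more difficult. Based on the characteristics of organized crime groups, we combine co-offender network structure analysis with data mining techniques to determine the structure of organized crime groups and their constituent entities, and propose a new algorithm for systematically analyzing and mining organized crime groups. The algorithm can extract information from large real-world crime datasets, quickly obtain evidence of organized crime groups, and improve the efficiency of organized crime group detection. Finally, the algorithm is compared with other existing algorithms, and the experimental results show that it has superior time performance.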

5.
This paper presents a redundant multicast routing problem in multilayer networks that arises from the large-scale distribution of real-time multicast data (e.g., Internet TV, videocasting, online games, stock quotes). Since these multicast services commonly operate in multilayer networks, the communication paths need to be robust against a single router or link failure, as well as against multiple such failures due to shared risk link groups (SRLGs). The main challenge of this multicast is to ensure service availability and reliability using a path protection scheme, that is, to find a redundant path that is SRLG-disjoint (diverse) from each working path. The objective of this problem is therefore to find two redundant multicast trees, each from one of the two redundant sources to every destination, at minimum total communication cost, while guaranteeing that the two paths from the two sources to every destination are SRLG-diverse (i.e., links in the same risk group are disjoint). In this paper, we present two new mathematical programming models, edge-based and path-based, for the redundant multicast routing problem with SRLG-diverse constraints. Because the number of paths in the path-based model grows exponentially with the network size, it is impossible to enumerate all possible paths in real-life networks. We develop three approaches (probabilistic, non-dominated, and nearly non-dominated) to generate potentially good paths that may be included in the path-based model. This study is motivated by emerging applications of Internet Protocol TV service, and we evaluate the proposed approaches using real-life network topologies. Our empirical results suggest that both models perform very well, and that the nearly non-dominated path approach outperforms all other path generation approaches.
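
The SRLG-diverse constraint above can be stated concretely as a feasibility check, rather than as part of an optimization model. Below is a minimal sketch under several assumptions:

- Links are undirected.
- Each link maps to a set of SRLG identifiers.
- Every link also counts as its own risk group, so plain single-link failures are covered.
- Each multicast tree is given as a parent-pointer map.

Router (node) diversity is not checked. The names (`srlg_of`, `trees_srlg_diverse`, and so on) are hypothetical, and this is not the paper's edge-based or path-based model.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

Node = Hashable
Link = FrozenSet[Node]  # undirected link, stored as the set of its two end nodes


def link(u: Node, v: Node) -> Link:
    return frozenset((u, v))


def risk_groups(path: List[Node], srlg_of: Dict[Link, Set[str]]) -> Set[Hashable]:
    """Every failure unit a path depends on: its links plus their SRLG ids."""
    groups: Set[Hashable] = set()
    for u, v in zip(path, path[1:]):
        e = link(u, v)
        groups.add(e)                    # a plain single-link failure
        groups |= srlg_of.get(e, set())  # shared-risk groups this link belongs to
    return groups


def srlg_diverse(p1: List[Node], p2: List[Node], srlg_of: Dict[Link, Set[str]]) -> bool:
    return risk_groups(p1, srlg_of).isdisjoint(risk_groups(p2, srlg_of))


def tree_path(parent: Dict[Node, Node], root: Node, dest: Node) -> List[Node]:
    """Walk parent pointers from dest back to root; return the root-to-dest path."""
    path = [dest]
    while path[-1] != root:
        path.append(parent[path[-1]])
    return path[::-1]


def trees_srlg_diverse(parent1: Dict[Node, Node], src1: Node,
                       parent2: Dict[Node, Node], src2: Node,
                       destinations: Iterable[Node],
                       srlg_of: Dict[Link, Set[str]]) -> bool:
    """True if, for every destination, the two tree paths share no risk group."""
    return all(
        srlg_diverse(tree_path(parent1, src1, d), tree_path(parent2, src2, d), srlg_of)
        for d in destinations
    )
```

For example, two paths into destination `d` that use physically different links but share a conduit `"duct7"` fail this check.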

6.
Li Zhongfei (李忠飞), Yang Yajun (杨雅君), Wang Xin (王鑫). Journal of Software (《软件学报》), 2019, 30(3): 515-536
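Shortest path queries are a very important class of problems in graph data management. This work studies rule-based shortest path queries, a special class of shortest path query. Given a source and a destination, a rule-based shortest path query finds a shortest path from the source to the destination that passes through every vertex in a user-specified vertex set, such that the visiting order of certain vertices satisfies given partial-order rules. This problem has been proven NP-hard. Existing work focuses on rule-based shortest paths over spatial datasets (where the shortest distance between two points is their Euclidean distance); it exhaustively enumerates all paths that satisfy the rules and then selects the one with minimum length as the solution. In real road networks, however, the distance between two points equals the length of the shortest path between them, which is often greater than their Euclidean distance; moreover, exhaustive enumeration causes a large amount of repeated computation. We therefore design a forward search algorithm, together with several optimization techniques, to solve this problem. Finally, extensive experiments on different real datasets verify the effectiveness of the algorithm. The results show that the algorithm quickly finds solutions and substantially outperforms existing algorithms in efficiency.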
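
The abstract defines the query but does not detail the forward search. To make the problem definition concrete, here is a small exact solver built on a different, textbook technique: Dijkstra for road-network distances, plus a bitmask dynamic program over visited subsets that enforces the precedence rules. It runs in time exponential in the number of required vertices, which is consistent with the problem being NP-hard. It assumes nonnegative edge weights. Visits are counted in the chosen visiting order, so a required vertex crossed incidentally on a connecting sub-path is not treated as visited at that point. All names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # directed adjacency list: u -> [(v, weight)]
INF = float("inf")


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Road-network distances from source (not Euclidean distances)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rule_based_shortest(graph: Graph, source: str, target: str,
                        required: Sequence[str],
                        rules: Sequence[Tuple[str, str]]) -> float:
    """Length of the shortest source->target route visiting every required vertex,
    where each rule (a, b) means a must be visited before b."""
    n = len(required)
    idx = {v: i for i, v in enumerate(required)}
    pre = [0] * n  # pre[j]: bitmask of vertices that must precede required[j]
    for a, b in rules:
        pre[idx[b]] |= 1 << idx[a]
    dist_from = {v: dijkstra(graph, v) for v in [source, *required]}
    if n == 0:
        return dist_from[source].get(target, INF)
    # best[mask][i]: shortest length from source having visited exactly `mask`, ending at required[i]
    best = [[INF] * n for _ in range(1 << n)]
    for i, v in enumerate(required):
        if pre[i] == 0:
            best[1 << i][i] = dist_from[source].get(v, INF)
    for mask in range(1 << n):  # masks only grow, so increasing order is a valid DP order
        for i in range(n):
            cur = best[mask][i]
            if cur == INF:
                continue
            for j in range(n):
                if mask & (1 << j) or (pre[j] & mask) != pre[j]:
                    continue  # already visited, or a prerequisite of j not yet visited
                cand = cur + dist_from[required[i]].get(required[j], INF)
                nxt = mask | (1 << j)
                if cand < best[nxt][j]:
                    best[nxt][j] = cand
    full = (1 << n) - 1
    return min(best[full][i] + dist_from[required[i]].get(target, INF) for i in range(n))
```

Memoizing one Dijkstra run per key vertex avoids recomputing the same sub-paths repeatedly. That repeated computation is the inefficiency the abstract attributes to exhaustive enumeration.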

7.
Semi-supervised clustering is gaining importance because neither supervised nor unsupervised learning methods, used on their own, provide satisfactory results. Existing semi-supervised clustering techniques are mostly based on pairwise constraints, which could be misleading. These algorithms also fail to handle attributes with different weights: in most real-life applications, not all attributes are equally important, so the same weight cannot be assigned to every attribute. In this paper, a novel distance-based semi-supervised clustering algorithm is proposed. It uses a functional link neural network (FLNN) to find attribute weights from a small amount of labeled data, and these weights are then used in a parametric Minkowski model for clustering. In the FLNN, nonlinearity is captured by enhancing the input with orthonormal basis functions. The effectiveness of the approach is illustrated on a number of datasets taken from the UCI Machine Learning Repository. Comparative performance evaluation demonstrates that the proposed approach outperforms existing semi-supervised clustering algorithms. The proposed approach has also been successfully used to cluster crime locations and find crime hot spots in India, using data provided by the National Crime Records Bureau (NCRB).
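
The attribute weights described above ultimately enter clustering through a weighted Minkowski distance. Here is a minimal sketch of that distance and of the nearest-centroid step of a k-means-style loop that would use it. It assumes the weights are already given; the FLNN that learns them is not reproduced. Function names are hypothetical.

```python
from typing import List, Sequence


def weighted_minkowski(x: Sequence[float], y: Sequence[float],
                       weights: Sequence[float], p: float = 2.0) -> float:
    """(sum_i w_i * |x_i - y_i|^p)^(1/p); nonnegative weights and p >= 1 keep it a (pseudo)metric."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(w * abs(a - b) ** p for a, b, w in zip(x, y, weights)) ** (1.0 / p)


def assign(points: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]],
           weights: Sequence[float], p: float = 2.0) -> List[int]:
    """Nearest-centroid step of a k-means-style loop under the weighted distance."""
    return [
        min(range(len(centroids)),
            key=lambda j: weighted_minkowski(pt, centroids[j], weights, p))
        for pt in points
    ]
```

Setting an attribute's weight to zero removes it from the distance entirely. In the usage check, the same point is therefore assigned to different centroids depending on which attribute is emphasized.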

8.
Crime is a focal problem in modern society, affecting social stability, public safety, economic development, and residents' quality of life. Promptly predicting where crimes will occur with relatively high accuracy is an important and meaningful research direction. With the rapid development of social media (e.g., Twitter), online information can act as a strong supplement to offline information (crime records). Additionally, geographic information and taxi flows between communities can model the spatial relationships between communities, which previous work has already confirmed to be effective. To solve the crime prediction problem efficiently, we propose a generalized deep multi-view representation learning framework for crime forecasting. Extensive experiments on a 4-month city-wide dataset covering 77 communities and 22 crime types show that our model improves prediction accuracy for most crime types.

9.
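Previous research and experiments have amply shown that delay-based congestion control outperforms congestion control based on packet-loss events. Unfortunately, most of these models ignore the impact of reverse-path traffic. To address this problem, we propose a simple analytical model that focuses on the relationship between the equilibrium point of the congestion window and the round-trip delay. Experimental results show that delay on the reverse path is as important as delay on the forward data path.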

10.
We consider a multiple-criterion shortest path problem with resource constraints, in which one needs to find paths between two points in a terrain for the movement of an unmanned combat vehicle (UCV). In the path planning problem considered here, the cumulative traverse time of the UCV, and the risk level and (communication) jamming level associated with the paths, must not exceed given limits. We propose a modified label-correcting algorithm with a new label-selection strategy to find Pareto-optimal solutions for the multiple objectives of minimizing the traverse time, risk level, and jamming level of the paths. In addition, we develop a path planning algorithm based on the label-correcting method to solve problems with a single objective within a reasonably short time. To evaluate the performance of the proposed algorithms, computational experiments are performed on a number of instances, and the results show that the proposed algorithms perform better than existing methods in terms of computation time.
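
The label-correcting idea above keeps, at every node, only the labels (cumulative cost vectors) that no other label dominates, and discards any label that exceeds a resource limit. Below is a minimal sketch of that idea. It uses a plain FIFO label-selection order, not the new strategy proposed in the paper, and assumes nonnegative edge costs. The cost tuple stands for (traverse time, risk, jamming). Names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Cost = Tuple[float, ...]  # e.g. (traverse time, risk level, jamming level)
Label = Tuple[Cost, List[str]]
Graph = Dict[str, List[Tuple[str, Cost]]]


def dominates(a: Cost, b: Cost) -> bool:
    """a Pareto-dominates b: no worse in every criterion and not identical."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_paths(graph: Graph, source: str, target: str, limits: Cost) -> List[Label]:
    """FIFO label-correcting search for all Pareto-optimal simple paths whose
    cumulative costs stay within `limits`."""
    start: Label = (tuple(0.0 for _ in limits), [source])
    labels: Dict[str, List[Label]] = {source: [start]}
    queue = deque([start])
    while queue:
        cost, path = queue.popleft()
        node = path[-1]
        if (cost, path) not in labels.get(node, []):
            continue  # label was dominated after it was queued
        for nxt, edge_cost in graph.get(node, []):
            if nxt in path:
                continue  # keep paths simple
            new_cost = tuple(c + e for c, e in zip(cost, edge_cost))
            if any(c > lim for c, lim in zip(new_cost, limits)):
                continue  # violates a resource constraint
            existing = labels.setdefault(nxt, [])
            if any(c == new_cost or dominates(c, new_cost) for c, _ in existing):
                continue
            existing[:] = [(c, p) for c, p in existing if not dominates(new_cost, c)]
            label: Label = (new_cost, path + [nxt])
            existing.append(label)
            queue.append(label)
    return sorted(labels.get(target, []))
```

Tightening `limits` prunes labels early. This is why resource constraints can make the search cheaper rather than harder.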

11.
With the extensive application of machine learning, it has been applied in fields such as e-commerce, mobile data processing, health analytics, and behavioral analytics. Word vector training is usually deployed in machine learning to provide a model architecture and optimization, for example, to learn word embeddings from large datasets. Training word vectors requires large amounts of data and outputs a model; however, some of this data may contain private and sensitive information, and the training phase can expose both the trained model and the user datasets. Offering users useful, plausible, and personalized alternatives usually also entails a breach of their privacy. For instance, user data might contain faces, irises, and personal identities, which raises serious problems for machine learning. In this article, we investigate the problem of training high-quality word vectors on encrypted datasets using privacy-preserving learning algorithms. First, we use a pseudo-random function to generate a statistical token for each word to help build the word-vector vocabulary. Then, we employ functional inner-product encryption to securely obtain the inner product needed to calculate the activation function. Finally, we use the BGN cryptosystem to encrypt and hide the sensitive datasets, and perform homomorphic operations over the ciphertexts to carry out the training procedure. To implement privacy-preserving word vector training, we propose four privacy-preserving machine learning schemes. We analyze the security and efficiency of our protocols and present numerical experiments. Compared with existing solutions, the results indicate that our scheme provides higher efficiency and lower communication overhead.
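
The first step above, building a vocabulary from per-word pseudo-random tokens, can be illustrated with HMAC-SHA256 as a stand-in PRF. This is an assumption: the abstract does not name the PRF, and the sketch leaves out the inner-product encryption and BGN steps. A keyed PRF maps equal words to equal tokens, so word frequencies can be counted without exposing the words to anyone who lacks the key. Names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List, Sequence


def word_token(key: bytes, word: str) -> str:
    """Deterministic PRF token for a word; HMAC-SHA256 stands in for the PRF."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize_corpus(key: bytes, sentences: Sequence[Sequence[str]]) -> List[List[str]]:
    """Replace every word with its token; only the key holder can link tokens to words."""
    return [[word_token(key, w) for w in sentence] for sentence in sentences]


def build_vocab(token_sentences: Sequence[Sequence[str]], min_count: int = 1) -> Dict[str, int]:
    """Map each frequent token to an index, most frequent first (ties broken by token)."""
    counts = Counter(t for sentence in token_sentences for t in sentence)
    kept = sorted((t for t, c in counts.items() if c >= min_count),
                  key=lambda t: (-counts[t], t))
    return {t: i for i, t in enumerate(kept)}
```

Determinism is the point of this design. It also means token frequencies leak word frequencies, which a full scheme has to account for.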

12.
The Orienteering Problem with Time Windows (OPTW) is the problem of finding a path that maximizes the profit available at the nodes of a time-constrained network. The OPTW has multiple applications in transportation, telecommunications, and scheduling. First, we extend an exact method for shortest path problems with side constraints into a general-purpose framework for hard shortest path variants. Then, using this framework, we develop a new method for the OPTW that incorporates problem-specific knowledge. Our method outperforms the state-of-the-art algorithm on instances derived from benchmark datasets in the literature, achieving speedups of up to 266 times, and is able to find optimal solutions to large-scale problems with up to 562 nodes in short computational times.

13.
Subspace clustering algorithms have shown their advantage in handling high-dimensional data by optimizing a linear combination of clustering criteria. However, setting the coefficients of these criteria without prior knowledge leads to inaccurate and non-robust clustering results. To address this problem, we propose to optimize multiple clustering criteria simultaneously, without any predefined coefficients, using a multi-objective evolutionary algorithm. Furthermore, to accelerate the convergence of the algorithm, we provide a novel local search method. In this method, the multi-objective clustering problem is decomposed by reference vectors into many localized scalarizing sub-problems, and solutions are then searched locally around their associated sub-problems. In addition, we develop a knee-pruning fuzzy ensemble method for selecting the final solution; it applies clustering ensemble to solutions selected from knee regions to obtain robust results. Experiments on UCI benchmarks and gene expression datasets show that the proposed algorithm can efficiently handle high-dimensional clustering problems without any user-defined coefficients.
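
Decomposing a multi-objective problem into scalarizing sub-problems by reference vectors, as described above, can be sketched in a few lines. The sketch assumes weighted Tchebycheff scalarization, a common choice in decomposition-based evolutionary algorithms; the abstract does not say which scalarizing function is used. It also assumes each solution is associated with the reference vector closest in angle. The two-objective setting and all names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def uniform_reference_vectors_2d(n: int) -> List[Tuple[float, float]]:
    """n evenly spaced weight vectors on the 2-objective simplex (n >= 2)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def tchebycheff(objs: Sequence[float], weights: Sequence[float], ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff scalarization: max_i w_i * |f_i - z_i|."""
    return max(w * abs(f - z) for f, w, z in zip(objs, weights, ideal))


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0  # a solution sitting on the ideal point fits every direction equally
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def associate(objective_vectors, reference_vectors, ideal) -> List[int]:
    """Attach each solution to the reference vector closest in angle to (f - ideal)."""
    out = []
    for f in objective_vectors:
        shifted = [fi - zi for fi, zi in zip(f, ideal)]
        out.append(max(range(len(reference_vectors)),
                       key=lambda j: _cosine(shifted, reference_vectors[j])))
    return out


def best_per_subproblem(objective_vectors, reference_vectors, ideal) -> List[int]:
    """Index of the best associated solution for each sub-problem (-1 if none)."""
    owners = associate(objective_vectors, reference_vectors, ideal)
    best = [-1] * len(reference_vectors)
    for i, (f, j) in enumerate(zip(objective_vectors, owners)):
        w = reference_vectors[j]
        if best[j] == -1 or tchebycheff(f, w, ideal) < tchebycheff(objective_vectors[best[j]], w, ideal):
            best[j] = i
    return best
```

A local search step would then perturb each sub-problem's best solution and keep a candidate only if it lowers that sub-problem's Tchebycheff value.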

14.
Three-dimensional terrain reconstruction from 2D aerial images is a problem of utmost importance due to its wide range of applications. It is relevant in the context of intelligent systems for disaster management (for example, to analyze a flooded area), soil analysis, earthquake crises, civil engineering, urban planning, surveillance, and defense research. It is a two-level problem: the first level is the acquisition of the aerial images, and the second is the 3D reconstruction. We focus here on the first problem, known as coverage path planning, and consider the case where the camera is mounted on an unmanned aerial vehicle (UAV). Compared with the case in which ground vehicles are used, coverage path planning for a UAV is a less-studied problem. As the areas to cover become complex, there is a clear need for algorithms that provide good-enough solutions in affordable times while taking into account certain specificities of the problem at hand. Our algorithm can deal with both convex and non-convex areas, and its main aim is to obtain a path that reduces battery consumption by minimizing the number of turns. We comment on line sweep calculation and propose improvements for the path generation and polygon decomposition problems, such as coverage alternatives and the interrupted path concept. Illustrative examples show the potential of our algorithm in two senses: its ability to perform coverage when complex regions are considered, and its achievement of a better solution than a published result (in terms of the number of turns used).
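
For a single convex region, the link between line sweep and the number of turns can be made concrete. A back-and-forth sweep needs fewer lines when lines run across the polygon's narrowest extent, and the minimum width of a convex polygon is attained perpendicular to one of its edges. The sketch below rests on these assumptions:

- The polygon is convex. Non-convex areas would first need the decomposition the abstract mentions.
- Sweep lines are evenly spaced by the camera footprint.
- Each change of line costs two turns.

It is not the paper's algorithm, and the names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vertex = Tuple[float, float]


def width_from_edge(poly: Sequence[Vertex], i: int) -> float:
    """Extent of a convex polygon measured perpendicular to edge i."""
    (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
    ex, ey = x2 - x1, y2 - y1
    length = math.hypot(ex, ey)
    # distance of each vertex from the line through edge i (cross product / edge length)
    return max(abs(ex * (y - y1) - ey * (x - x1)) / length for x, y in poly)


def best_sweep(poly: Sequence[Vertex], spacing: float) -> Tuple[int, float, int, int]:
    """Pick the sweep direction parallel to the edge giving the smallest width.

    Returns (edge index, width, number of sweep lines, number of turns), where a
    back-and-forth path over L lines is counted as 2 * (L - 1) turns."""
    width, edge = min((width_from_edge(poly, i), i) for i in range(len(poly)))
    lines = max(1, math.ceil(width / spacing))
    return edge, width, lines, 2 * (lines - 1)
```

For a 10 x 2 rectangle with spacing 1, sweeping along the long side gives 2 lines and 2 turns. Sweeping along the short side would need 10 lines and 18 turns.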

15.
Crime attractors are locations (e.g., shopping malls) that attract criminally motivated offenders because of the presence of known criminal opportunities. Although many studies have explored the patterns of crime in and around these locations, many questions remain. In recent years, there has been growing interest in developing mathematical models to help answer questions about various criminological phenomena. In this paper, we apply a formal methodology to model the relative attractiveness of crime attractor locations based on the characteristics of offenders and the crimes they committed. To accomplish this, we adopt fuzzy logic techniques to calculate the attractiveness of crime attractors in three suburban cities in the Metro Vancouver region of British Columbia, Canada. The fuzzy logic techniques provide results consistent with our real-life expectation that offenders do not necessarily commit significant crimes in the immediate neighbourhood of the attractors, but travel towards them and commit crimes on the way. The results of this study could lead to a variety of crime prevention benefits and urban planning strategies.

16.
Over the last few years, activity recognition in the smart home has become an active research area due to its wide range of human-centric applications. With the development of machine learning algorithms for activity classification, datasets have become significantly important for testing and validating algorithms. Collecting real data is challenging because of the budget, human resources, and annotation cost involved, which is why most researchers prefer to use existing datasets for evaluation. However, openly available smart home datasets vary in terms of performed activities, deployed sensors, and environment settings. Unfortunately, analyzing the characteristics of existing datasets is a bottleneck for researchers when selecting datasets suited to their purpose. In this paper, we develop a Framework for Smart Homes Dataset Analysis (FSHDA) that reflects their diverse dimensions in a predefined format. It analyzes a list of data dimensions that cover variations in time, activities, sensors, and inhabitants. For validation, we examine the effects of the proposed data dimensions on state-of-the-art activity recognition techniques. The results show that dataset dimensions strongly affect both the classifiers' individual activity label assignments and their overall performance. The outcome of our study can help future researchers better understand how smart home dataset characteristics relate to classifier performance.

17.
Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying data whose structure is complex or not fully known to the user, or integrating XML data sources with different structures. These applications have recently motivated the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted DAGs for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first exploits a structural summary of the data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second extracts a spanning tree from the query DAG, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third exploits multiple pointers of stack entries and a topological ordering of the query DAG to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other two. Our approaches are the first to efficiently evaluate this class of queries in the indexed streaming model.

18.
Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of the existing distance-based outlier detection algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms are unable to deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that we outperform the state-of-the-art algorithm, often by an order of magnitude.
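
The distance-based outlier definition above scores each point by the distance to its k-th nearest neighbour and reports the n largest scores. Below is a minimal sketch of that definition using the simple nested-loop pruning rule common to fast distance-based detectors. It is not RBRP: there is no recursive binning or re-projection, and the worst case stays quadratic. Names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def top_n_outliers(points: Sequence[Point], k: int, n: int) -> List[int]:
    """Indices of the n points with the largest distance to their k-th nearest
    neighbour, using the nested-loop pruning rule."""
    if n <= 0:
        return []
    top: List[Tuple[float, int]] = []  # min-heap of (score, index): current top-n outliers
    for i, p in enumerate(points):
        cutoff = top[0][0] if len(top) == n else 0.0
        neighbours: List[float] = []  # max-heap (negated) of the k smallest distances seen so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if len(neighbours) < k:
                heapq.heappush(neighbours, -d)
            elif d < -neighbours[0]:
                heapq.heapreplace(neighbours, -d)
            # The running k-NN distance only shrinks as more points are seen, so once
            # it falls below the cutoff, p can no longer make the top n.
            if len(neighbours) == k and -neighbours[0] < cutoff:
                pruned = True
                break
        if pruned or len(neighbours) < k:
            continue
        score = -neighbours[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
    return [i for _, i in sorted(top, reverse=True)]
```

The pruning is safe because the running k-th smallest distance is an upper bound on the true score. Most inliers are discarded after scanning only a few neighbours.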

19.
We propose an internal cluster validity index for a fuzzy c-means algorithm which combines a mathematical model for the fuzzy c-partition and a heuristic search for the number of clusters in the data. Our index resorts to information theoretic principles, and aims to assess the congruence between such a model and the data that have been observed. The optimal cluster solution represents a trade-off between discrepancy and the complexity of the underlying fuzzy c-partition. We begin by testing the effectiveness of the proposed index using two sets of synthetic data, one comprising a well-defined cluster structure and the other containing only noise. Then we use datasets arising from real life problems. Our results are compared to those provided by several available indices and their goodness is judged by an external measure of similarity. We find substantial evidence supporting our index as a credible alternative to the cluster validation problem, especially when it concerns structureless data.
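
Any validity index for fuzzy c-means is computed from the membership matrix that FCM produces. Below is a minimal sketch of the standard FCM alternating updates, followed by Bezdek's partition entropy. Partition entropy is a classical information-theoretic index included only as a familiar baseline; it is not the index proposed in this paper, whose formula the abstract does not give. The fuzzifier defaults to m = 2, and names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fcm_memberships(points: Sequence[Point], centers: Sequence[Point], m: float = 2.0) -> List[List[float]]:
    """Standard FCM update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))."""
    U = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if any(di == 0.0 for di in d):
            # point coincides with a center: share membership among the coinciding centers
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(hits)
            U.append([h / total for h in hits])
        else:
            U.append([1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in d) for di in d])
    return U


def update_centers(points: Sequence[Point], U: Sequence[Sequence[float]], m: float = 2.0) -> List[Tuple[float, ...]]:
    """Centers as u^m-weighted means of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(U[0])):
        w = [row[j] ** m for row in U]
        total = sum(w)
        centers.append(tuple(sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)))
    return centers


def fcm(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        new_centers = update_centers(points, fcm_memberships(points, centers, m), m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(points, centers, m)


def partition_entropy(U: Sequence[Sequence[float]]) -> float:
    """Bezdek's partition entropy: lower means a crisper partition (range 0..log c)."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / len(U)
```

On well-separated data the memberships are nearly crisp and the entropy is close to 0. On structureless data it drifts toward log c, which is the regime the abstract singles out.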

20.
While data clustering has a long history and a large amount of research has been devoted to developing numerous clustering techniques, significant challenges remain. One of the most important is associated with high data dimensionality. A particular class of clustering algorithms has been very successful in dealing with such datasets by utilising information derived from principal component analysis. In this work, we try to deepen our understanding of what can be achieved by this kind of approach. We attempt to theoretically discover the relationship between true clusters in the data and the distribution of their projections onto the principal components. Based on these findings, we propose appropriate criteria for the various steps involved in hierarchical divisive clustering and combine them into new algorithms. The proposed algorithms require minimal user-defined parameters and have the desirable feature of being able to approximate the number of clusters present in the data. The experimental results indicate that the proposed techniques are effective in both simulated and real data scenarios.
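
One divisive step of the PCA-driven clustering described above projects a cluster onto its first principal component and splits it. Below is a minimal dependency-free sketch. It finds the component by power iteration on the sample covariance and cuts at the largest gap between consecutive projections. The largest-gap rule is a stand-in assumption, not one of the criteria the paper derives, and names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def first_principal_component(points: Sequence[Point], iters: int = 200, seed: int = 0):
    """Mean and leading eigenvector of the sample covariance, via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centred = [[p[k] - mean[k] for k in range(d)] for p in points]
    cov = [[sum(x[a] * x[b] for x in centred) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all points identical: no direction of variance
        v = [x / norm for x in w]
    return mean, v


def split_on_pc(points: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """One divisive step: project onto the first PC and cut at the largest gap."""
    mean, v = first_principal_component(points)
    proj = sorted((sum((p[k] - mean[k]) * v[k] for k in range(len(v))), i)
                  for i, p in enumerate(points))
    _, cut = max((proj[t + 1][0] - proj[t][0], t) for t in range(len(proj) - 1))
    left = [points[i] for _, i in proj[: cut + 1]]
    right = [points[i] for _, i in proj[cut + 1:]]
    return left, right
```

Applying `split_on_pc` recursively builds the divisive hierarchy. A stopping criterion on each split is what would yield an estimate of the number of clusters.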
