首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and solution space is large. Genetic algorithm deals effectively with larger solution space and provides better solution. In this paper, we proposed a novel clustering algorithm for distributed datasets, using combination of genetic algorithm (GA) with Mahalanobis distance and k-means clustering algorithm. The proposed algorithm is two phased; in phase 1, GA is applied in parallel on data chunks located across different machines. Mahalanobis distance is used as fitness value in GA, which considers covariance between the data points and thus provides a better representation of initial data. K-means with K-means\( ++ \) initialization is applied in phase 2 on intermediate output to get final result. The proposed algorithm is implemented on Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted for multiple real-life and synthetic datasets to measure performance of our proposed algorithm. Results were compared with MapReduce-based algorithms, mrk-means, parallel k-means and scaling GA.  相似文献   

2.
In this paper, a steganographic scheme adopting the concept of the generalized K d -distance N-dimensional pixel matching is proposed. The generalized pixel matching embeds a B-ary digit (B is a function of K and N) into a cover vector of length N, where the order-d Minkowski distance-measured embedding distortion is no larger than K. In contrast to other pixel matching-based schemes, a N-dimensional reference table is used. By choosing d, K, and N adaptively, an embedding strategy which is suitable for arbitrary relative capacity can be developed. Additionally, an optimization algorithm, namely successive iteration algorithm (SIA), is proposed to optimize the codeword assignment in the reference table. Benefited from the high dimensional embedding and the optimization algorithm, nearly maximal embedding efficiency is achieved. Compared with other content-free steganographic schemes, the proposed scheme provides better image quality and statistical security. Moreover, the proposed scheme performs comparable to state-of-the-art content-based approaches after combining with image models.  相似文献   

3.
The field of high dynamic range imaging addresses the problem of capturing and displaying the large range of luminance levels found in the world, using devices with limited dynamic range. In this paper we present a novel tone mapping algorithm that is based on K-means clustering. Using dynamic programming we are able to not only solve the clustering problem efficiently, but also find the global optimum. Our algorithm runs in \(\hbox {O}(N^2K)\) for an image with N input luminance levels and K output levels. We show that our algorithm gives comparable results to state-of-the-art tone mapping algorithms, but with the additional large benefit of a minimum of parameters. We show how to extend the method to handle video input. We test our algorithm on a number of standard high dynamic range images and video sequences and give qualitative and quantitative comparisons to a number of state-of-the-art tone mapping algorithms.  相似文献   

4.
Recently, sparse subspace clustering, as a subspace learning technique, has been successfully applied to several computer vision applications, e.g. face clustering and motion segmentation. The main idea of sparse subspace clustering is to learn an effective sparse representation that are used to construct an affinity matrix for spectral clustering. While most of existing sparse subspace clustering algorithms and its extensions seek the forms of convex relaxation, the use of non-convex and non-smooth l q (0 < q < 1) norm has demonstrated better recovery performance. In this paper we propose an l q norm based Sparse Subspace Clustering method (lqSSC), which is motivated by the recent work that l q norm can enhance the sparsity and make better approximation to l 0 than l 1. However, the optimization of l q norm with multiple constraints is much difficult. To solve this non-convex problem, we make use of the Alternating Direction Method of Multipliers (ADMM) for solving the l q norm optimization, updating the variables in an alternating minimization way. ADMM splits the unconstrained optimization into multiple terms, such that the l q norm term can be solved via Smooth Iterative Reweighted Least Square (SIRLS), which converges with guarantee. Different from traditional IRLS algorithms, the proposed algorithm is based on gradient descent with adaptive weight, making it well suit for general sparse subspace clustering problem. Experiments on computer vision tasks (synthetic data, face clustering and motion segmentation) demonstrate that the proposed approach achieves considerable improvement of clustering accuracy than the convex based subspace clustering methods.  相似文献   

5.
This paper introduces Hk-medoids, a modified version of the standard k-medoids algorithm. The modification extends the algorithm for the problem of clustering complex heterogeneous objects that are described by a diversity of data types, e.g. text, images, structured data and time series. We first proposed an intermediary fusion approach to calculate fused similarities between objects, SMF, taking into account the similarities between the component elements of the objects using appropriate similarity measures. The fused approach entails uncertainty for incomplete objects or for objects which have diverging distances according to the different component. Our implementation of Hk-medoids proposed here works with the fused distances and deals with the uncertainty in the fusion process. We experimentally evaluate the potential of our proposed algorithm using five datasets with different combinations of data types that define the objects. Our results show the feasibility of the our algorithm, and also they show a performance enhancement when comparing to the application of the original SMF approach in combination with a standard k-medoids that does not take uncertainty into account. In addition, from a theoretical point of view, our proposed algorithm has lower computation complexity than the popular PAM implementation.  相似文献   

6.
We consider a single-cell network with a hybrid full-/half-duplex base station. For the practical scenario with N channels, K uplink users, and M downlink users (max{K,M} ≤ NK + M), we tackle the issue of user admission and power control to simultaneously maximize the user admission number and minimize the total transmit power when guaranteeing the quality-of-service requirement of individual users. We formulate a 0–1 integer programming problem for the joint-user admission and power allocation problem. Because finding the optimal solution of this problem is NP-hard in general, a low-complexity algorithm is proposed by introducing the novel concept of adding dummy users. Simulation results show that the proposed algorithm achieves performance similar to that of branch and bound algorithm and significantly outperforms the random pairing algorithm.  相似文献   

7.
Clustering is a powerful machine learning technique that groups “similar” data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver “qbsolv.” The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.  相似文献   

8.
In this paper, we propose a new robust self-tuning control, called the generalized minimum variance αl-equivalent selftuning control (GMVSTC-αl) for the linear timevarying (LTV) systems, which can be described by the discrete-time auto-regressive exogenous (ARX) mathematical model in the presence of unmodelled dynamics. The estimation of the parameters contained in this mathematical model is made on the basis of the proposed modified recursive least squares (m-RLS) parametric estimation algorithm with dead zone and forgetting factor. The stability analysis of the proposed parametric estimation algorithm m-RLS is treated on the basis of a Lyapunov function. A numerical simulation example is used to prove the performances and the effectiveness of the explicit scheme of the proposed robust self-tuning control GMVSTC-αl.  相似文献   

9.
Efficient and effective processing of the distance-based join query (DJQ) is of great importance in spatial databases due to the wide area of applications that may address such queries (mapping, urban planning, transportation planning, resource management, etc.). The most representative and studied DJQs are the K Closest Pairs Query (KCPQ) and εDistance Join Query (εDJQ). These spatial queries involve two spatial data sets and a distance function to measure the degree of closeness, along with a given number of pairs in the final result (K) or a distance threshold (ε). In this paper, we propose four new plane-sweep-based algorithms for KCPQs and their extensions for εDJQs in the context of spatial databases, without the use of an index for any of the two disk-resident data sets (since, building and using indexes is not always in favor of processing performance). They employ a combination of plane-sweep algorithms and space partitioning techniques to join the data sets. Finally, we present results of an extensive experimental study, that compares the efficiency and effectiveness of the proposed algorithms for KCPQs and εDJQs. This performance study, conducted on medium and big spatial data sets (real and synthetic) validates that the proposed plane-sweep-based algorithms are very promising in terms of both efficient and effective measures, when neither inputs are indexed. Moreover, the best of the new algorithms is experimentally compared to the best algorithm that is based on the R-tree (a widely accepted access method), for KCPQs and εDJQs, using the same data sets. This comparison shows that the new algorithms outperform R-tree based algorithms, in most cases.  相似文献   

10.
The k nearest neighbors (k-NN) classification technique has a worldly wide fame due to its simplicity, effectiveness, and robustness. As a lazy learner, k-NN is a versatile algorithm and is used in many fields. In this classifier, the k parameter is generally chosen by the user, and the optimal k value is found by experiments. The chosen constant k value is used during the whole classification phase. The same k value used for each test sample can decrease the overall prediction performance. The optimal k value for each test sample should vary from others in order to have more accurate predictions. In this study, a dynamic k value selection method for each instance is proposed. This improved classification method employs a simple clustering procedure. In the experiments, more accurate results are found. The reasons of success have also been understood and presented.  相似文献   

11.
The successive projection algorithm (SPA) can quickly solve a nonnegative matrix factorization problem under a separability assumption. Even if noise is added to the problem, SPA is robust as long as the perturbations caused by the noise are small. In particular, robustness against noise should be high when handling the problems arising from real applications. The preconditioner proposed by Gillis and Vavasis (SIAM J Optim 25(1):677–698, 2015) makes it possible to enhance the noise robustness of SPA. Meanwhile, an additional computational cost is required. The construction of the preconditioner contains a step to compute the top-k truncated singular value decomposition of an input matrix. It is known that the decomposition provides the best rank-k approximation to the input matrix; in other words, a matrix with the smallest approximation error among all matrices of rank less than k. This step is an obstacle to an efficient implementation of the preconditioned SPA. To address the cost issue, we propose a modification of the algorithm for constructing the preconditioner. Although the original algorithm uses the best rank-k approximation, instead of it, our modification uses an alternative. Ideally, this alternative should have high approximation accuracy and low computational cost. To ensure this, our modification employs a rank-k approximation produced by an SPA based algorithm. We analyze the accuracy of the approximation and evaluate the computational cost of the algorithm. We then present an empirical study revealing the actual performance of the SPA based rank-k approximation algorithm and the modified preconditioned SPA.  相似文献   

12.
This paper suggests ways to facilitate creativity and innovation in software development. The paper applies four perspectives – Product, Project, Process, and People – to identify an outlook for software innovation. The paper then describes a new facility – Software Innovation Research Lab (SIRL) – and a new method concept for software innovation – Essence – based on views, modes, and team roles. Finally, the paper reports from an early experiment using SIRL and Essence and identifies further research.  相似文献   

13.
A p-adaptive continuous residual distribution scheme is proposed in this paper. Under certain conditions, primarily the expression of the total residual on a given element K into residuals on the sub-elements of K and the use of a suitable combination of quadrature formulas, it is possible to change locally the degree of the polynomial approximation of the solution. The discrete solution can then be considered non continuous across the interface of elements of different orders, while the numerical scheme still verifies the hypothesis of the discrete Lax–Wendroff theorem which ensures its convergence to a correct weak solution. We detail the theoretical material and the construction of our p-adaptive method in the frame of a continuous residual distribution scheme. Different test cases for non-linear equations at different flow velocities demonstrate numerically the validity of the theoretical results.  相似文献   

14.
Cellular Learning Automata (CLAs) are hybrid models obtained from combination of Cellular Automata (CAs) and Learning Automata (LAs). These models can be either open or closed. In closed CLAs, the states of neighboring cells of each cell called local environment affect on the action selection process of the LA of that cell whereas in open CLAs, each cell, in addition to its local environment has an exclusive environment which is observed by the cell only and the global environment which can be observed by all the cells in CLA. In dynamic models of CLAs, one of their aspects such as structure, local rule or neighborhood radius may change during the evolution of the CLA. CLAs can also be classified as synchronous CLAs or asynchronous CLAs. In a synchronous CLA, all LAs in different cells are activated synchronously whereas in an asynchronous CLA, the LAs in different cells are activated asynchronously. In this paper, a new closed asynchronous dynamic model of CLA whose structure and the number of LAs in each cell may vary with time has been introduced. To show the potential of the proposed model, a landmark clustering algorithm for solving topology mismatch problem in unstructured peer-to-peer networks has been proposed. To evaluate the proposed algorithm, computer simulations have been conducted and then the results are compared with the results obtained for two existing algorithms for solving topology mismatch problem. It has been shown that the proposed algorithm is superior to the existing algorithms with respect to communication delay and average round-trip time between peers within clusters.  相似文献   

15.
Choosing the best location for starting a business or expanding an existing enterprize is an important issue. A number of location selection problems have been discussed in the literature. They often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets: a set of customers, a set of existing facilities, and a set of potential locations. The k-MSL selection problem outputs k locations from the potential location set, such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms’ performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently.  相似文献   

16.
In recent years, many layered indexing techniques over distributed hash table (DHT)-based peer-to-peer (P2P) systems have been proposed to realize distributed range search. In this paper, we present a fault tolerant constant degree dynamic Distributed Spatial Data Structure called DSDS that supports orthogonal range search on a set of N d-dimensional points published on n nodes. We describe a total order binary relation algorithm to publish points among supernodes and determine supernode keys. A non-redundant rainbow skip graph is used to coordinate message passing among nodes. The worst case orthogonal range search cost in a d-dimensional DSDS with n nodes is \(O\left (\log n+m+\frac {K}{B}\right )\) messages, where m is the number of nodes intersecting the query, K is the number of points reported in range, and B is the number of points that can fit in one message. A complete backup copy of data points stored in other nodes provides redundancy for our DSDS. This redundancy permits answering a range search query in the case of failure of a single node. For single node failure, the DSDS routing system can be recovered to a fully functional state at a cost of O(log n) messages. Backup sets in DSDS nodes are used to first process a query in the most efficient dimension, and then used to process a query containing the data in a failed node in d-dimensional space. The DSDS search algorithm can process queries in d-dimensional space and still tolerate failure of one node. Search cost in the worst case with a failed node increases to \(O\left (d\log n+dm+\frac {K}{B}\right )\) messages for d dimensions.  相似文献   

17.
Nowadays, location-based services (LBS) are facilitating people in daily life through answering LBS queries. However, privacy issues including location privacy and query privacy arise at the same time. Existing works for protecting query privacy either work on trusted servers or fail to provide sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We recognize the sufficient and necessary condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ??-DPkA with minimized ??. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions.  相似文献   

18.
The density based notion for clustering approach is used widely due to its easy implementation and ability to detect arbitrary shaped clusters in the presence of noisy data points without requiring prior knowledge of the number of clusters to be identified. Density-based spatial clustering of applications with noise (DBSCAN) is the first algorithm proposed in the literature that uses density based notion for cluster detection. Since most of the real data set, today contains feature space of adjacent nested clusters, clearly DBSCAN is not suitable to detect variable adjacent density clusters due to the use of global density parameter neighborhood radius N rad and minimum number of points in neighborhood N pts . So the efficiency of DBSCAN depends on these initial parameter settings, for DBSCAN to work properly, the neighborhood radius must be less than the distance between two clusters otherwise algorithm merges two clusters and detects them as a single cluster. Through this paper: 1) We have proposed improved version of DBSCAN algorithm to detect clusters of varying density adjacent clusters by using the concept of neighborhood difference and using the notion of density based approach without introducing much additional computational complexity to original DBSCAN algorithm. 2) We validated our experimental results using one of our authors recently proposed space density indexing (SDI) internal cluster measure to demonstrate the quality of proposed clustering method. Also our experimental results suggested that proposed method is effective in detecting variable density adjacent nested clusters.  相似文献   

19.
Caixun Wang 《Calcolo》2017,54(4):1293-1303
This paper introduces the K-nonnegative double splitting of a K-monotone matrix using knowledge of the matrices that leave a cone \(K\subseteq \mathbb {R}^n\) invariant. The convergence of this splitting is studied. Comparison theorems for two K-nonnegative double splittings of a K-monotone matrix are obtained. The results generalize the corresponding results introduced by Song and Song (Calcolo 48:245–260, 2011) for nonnegative double splitting. Some examples are provided to illustrate the main results.  相似文献   

20.
Representative skyline computation is a fundamental issue in database area, which has attracted much attention in recent years. A notable definition of representative skyline is the distance-based representative skyline (DBRS). Given an integer k, a DBRS includes k representative skyline points that aims at minimizing the maximal distance between a non-representative skyline point and its nearest representative. In the 2D space, the state-of-the-art algorithm to compute the DBRS is based on dynamic programming (DP) which takes O(k m 2) time complexity, where m is the number of skyline points. Clearly, such a DP-based algorithm cannot be used for handling large scale datasets due to the quadratic time cost. To overcome this problem, in this paper, we propose a new approximate algorithm called ARS, and a new exact algorithm named PSRS, based on a carefully-designed parametric search technique. We show that the ARS algorithm can guarantee a solution that is at most ?? larger than the optimal solution. The proposed ARS and PSRS algorithms run in O(klog2mlog(T/??)) and O(k 2 log3m) time respectively, where T is no more than the maximal distance between any two skyline points. We also propose an improved exact algorithm, called PSRS+, based on an effective lower and upper bounding technique. We conduct extensive experimental studies over both synthetic and real-world datasets, and the results demonstrate the efficiency and effectiveness of the proposed algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号