首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
The Relative Neighborhood Graph (RNG) of a set of nk-dimensional points connects “relatively close” neighbors: two points are connected by an edge if they are at least as close to each other as to any other point. Toussaint recently investigated the properties of the RNG in the Euclidean metric and proposed algorithms for its computation. This note examines one of the open problems listed by Toussaint: the extension of the analysis to non-Euclidean metrics. It is shown that Bentley's range query data structures may be used to improve the speed of the best known RNG algorithm in the L (for k ? 2) and L1 (for k = 2) metrics.  相似文献   

2.
k-Anonymity is a method for providing privacy protection by ensuring that data cannot be traced to an individual. In a k-anonymous dataset, any identifying information occurs in at least k tuples. To achieve optimal and practical k-anonymity, recently, many different kinds of algorithms with various assumptions and restrictions have been proposed with different metrics to measure quality. This paper evaluates a family of clustering-based algorithms that are more flexible and even attempts to improve precision by ignoring the restrictions of user-defined Domain Generalization Hierarchies. The evaluation of the new approaches with respect to cost metrics shows that metrics may behave differently with different algorithms and may not correlate with some applications’ accuracy on output data.  相似文献   

3.
Functional verification has become the key bottleneck that delays time-to-market during the embedded system design process. And simulation-based verification is the mainstream practice in functional verification due to its flexibility and scalability. In practice, the success of the simulation-based verification highly depends on the quality of functional tests in use which is usually evaluated by coverage metrics. Since test prioritization can provide a way to simulate the more important tests which can improve the coverage metrics evidently earlier, we propose a test prioritization approach based on the clustering algorithm to obtain a high coverage level earlier in the simulation process. The k-means algorithm, which is one of the most popular clustering algorithms and usually used for the test prioritization, has some shortcomings which have an effect on the effectiveness of test prioritization. Thus we propose three enhanced k-means algorithms to overcome these shortcomings and improve the effectiveness of the test prioritization. Then the functional tests in the simulation environment can be ordered with the test prioritization based on the enhanced k-means algorithms. Finally, the more important tests, which can improve the coverage metrics evidently, can be selected and simulated early within the limited simulation time. Experimental results show that the enhanced k-means algorithms are more accurate and efficient than the standard k-means algorithm for the test prioritization, especially the third enhanced k-means algorithm. In comparison with simulating all the tests randomly, the more important tests, which are selected with the test prioritization based on the third enhanced k-means algorithm, achieve almost the same coverage metrics in a shorter time, which achieves a 90% simulation time saving.  相似文献   

4.
This paper presents parallel algorithms for determining the number of partitions of a given integerN, where the partitions may be subject to restrictions, such as being composed of distinct parts, of a given number of parts, and/or of parts belonging to a specified set. We present a series of adaptive algorithms suitable for varying numbers of processors. The fastest of these algorithms computes the number of partitions ofnwith largest part equal tok, for 1 ≤knN, in timeO(log2(N)) usingO(N2/logN) processors. Parallel logarithmic time algorithms that generate partitions uniformly at random, using these quantities, are also presented.  相似文献   

5.
In this paper, a new approach called ‘instance variant nearest neighbor’ approximates a regression surface of a function using the concept of k nearest neighbor. Instead of fixed k neighbors for the entire dataset, our assumption is that there are optimal k neighbors for each data instance that best approximates the original function by fitting the local regions. This approach can be beneficial to noisy datasets where local regions form data characteristics that are different from the major data clusters. We formulate the problem of finding such k neighbors for each data instance as a combinatorial optimization problem, which is solved by a particle swarm optimization. The particle swarm optimization is extended with a rounding scheme that rounds up or down continuous-valued candidate solutions to integers, a number of k neighbors. We apply our new approach to five real-world regression datasets and compare its prediction performance with other function approximation algorithms, including the standard k nearest neighbor, multi-layer perceptron, and support vector regression. We observed that the instance variant nearest neighbor outperforms these algorithms in several datasets. In addition, our new approach provides consistent outputs with five datasets where other algorithms perform poorly.  相似文献   

6.
Many time-critical applications require predictable performance and tasks in these applications have deadlines to be met. For tasks with hard deadlines, a deadline miss can be catastrophic while for Quality of Service (QoS) degradable tasks (soft real-time tasks) timely approximate results of poorer quality or occasional deadline misses are acceptable. Imprecise computation and (m,k)-firm guarantee are two workload models that quantify the trade-off between schedulability and result quality. In this paper, we propose dynamic scheduling algorithms for integrated scheduling of real-time tasks, represented by these workload models, in multiprocessor systems. The algorithms aim at improving the schedulability of tasks by exploiting the properties of these models in QoS degradation. We also show how the proposed algorithms can be adapted for integrated scheduling of multimedia streams and hard real-time tasks, and demonstrate their effectiveness in quantifying QoS degradation. Through simulation, we evaluate the performance of these algorithms using the metrics – success ratio (measure of schedulability) and quality. Our simulation results show that one of the proposed algorithms, multilevel degradation algorithm, outperforms the others in terms of both the performance metrics.  相似文献   

7.
Finding the nearest k objects to a query object is a fundamental operation for many data mining algorithms. With the recent interest in privacy, it is not surprising that there is strong interest in k-NN queries to enable clustering, classification and outlier-detection tasks. However, previous approaches to privacy-preserving k-NN have been costly and can only be realistically applied to small data sets. In this paper, we provide efficient solutions for k-NN queries for vertically partitioned data. We provide the first solution for the L (or Chessboard) metric as well as detailed privacy-preserving computation of all other Minkowski metrics. We enable privacy-preserving L by providing a practical approach to the Yao’s millionaires problem with more than two parties. This is based on a pragmatic and implementable solution to Yao’s millionaires problem with shares. We also provide privacy-preserving algorithms for combinations of local metrics into a global metric that handles the large dimensionality and diversity of attributes common in vertically partitioned data. To manage very large data sets, we provide a privacy-preserving SASH (a very successful data structure for associative queries in high dimensions). Besides providing a theoretical analysis, we illustrate the efficiency of our approach with an empirical evaluation.  相似文献   

8.
In many complex computational processes one may want to store a sample of the process’ history for later use by placing checkpoints. In this paper we consider the problem of maintaining, in an online fashion, a collection of k checkpoints as an approximately uniformly spaced sample in the history of a continuous-time process. We present deterministic algorithms tailored for small values of k and a general one for arbitrary k. The algorithms are proven to be close to optimum for several different measures.  相似文献   

9.
Recommendation agents employ prediction algorithms to provide users with items that match their interests. In this paper, several prediction algorithms are described and evaluated, some of which are novel in that they combine user-based and item-based similarity measures derived from either explicit or implicit ratings. Both statistical and decision-support accuracy metrics of the algorithms are compared against different levels of data sparsity and different operational thresholds. The first metric evaluates the accuracy in terms of average absolute deviation, while the second evaluates how effectively predictions help users to select high-quality items. The experimental results indicate better performance of item-based predictions derived from explicit ratings in relation to both metrics. Category-boosted predictions lead to slightly better predictions when combined with explicit ratings, while implicit ratings, in the context that have been defined in this paper, perform much worse than explicit ratings.  相似文献   

10.
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over 100 processors and indicate that similar speedup can be obtained with several hundred processors.  相似文献   

11.
This paper derives a number of results related to the topological properties of OTIS k-ary n-cube interconnection networks. The basic topological metrics of size, degree, shortest distance, and diameter are obtained. Then results related to embedding in OTIS k-ary n-cubes of OTIS k-ary (n−1)-cubes, cycles, meshes, cubes, and spanning trees are derived. The OTIS k-ary n-cube is shown to be Hamiltonian. Minimal one-to-one routing and optimal broadcasting algorithms are proposed. The OTIS k-ary n-cube is shown to be maximally fault-tolerant. These results are derived based on known properties of k-ary n-cube networks and general properties of OTIS networks.  相似文献   

12.
The satisfiability problem is a basic core NP-complete problem. In recent years, a lot of heuristic algorithms have been developed to solve this problem, and many experiments have evaluated and compared the performance of different heuristic algorithms. However, rigorous theoretical analysis and comparison are rare. This paper analyzes and compares the expected runtime of three basic heuristic algorithms: RandomWalk, (1+1) EA, and hybrid algorithm. The runtime analysis of these heuristic algorithms on two 2-SAT instances shows that the expected runtime of these heuristic algorithms can be exponential time or polynomial time. Furthermore, these heuristic algorithms have their own advantages and disadvantages in solving different SAT instances. It also demonstrates that the expected runtime upper bound of RandomWalk on arbitrary k-SAT (k?3) is O(n(k−1)), and presents a k-SAT instance that has Θ(n(k−1)) expected runtime bound.  相似文献   

13.
In this paper, the conventional k-modes-type algorithms for clustering categorical data are extended by representing the clusters of categorical data with k-populations instead of the hard-type centroids used in the conventional algorithms. Use of a population-based centroid representation makes it possible to preserve the uncertainty inherent in data sets as long as possible before actual decisions are made. The k-populations algorithm was found to give markedly better clustering results through various experiments.  相似文献   

14.
We consider the problem of learning an acyclic discrete circuit with n wires, fan-in bounded by k and alphabet size s using value injection queries. For the class of transitively reduced circuits, we develop the Distinguishing Paths Algorithm, that learns such a circuit using (ns) O(k) value injection queries and time polynomial in the number of queries. We describe a generalization of the algorithm to the class of circuits with shortcut width bounded by b that uses (ns) O(k+b) value injection queries. Both algorithms use value injection queries that fix only O(kd) wires, where d is the depth of the target circuit. We give a reduction showing that without such restrictions on the topology of the circuit, the learning problem may be computationally intractable when s=n Θ(1), even for circuits of depth O(log?n). We then apply our large-alphabet learning algorithms to the problem of approximate learning of analog circuits whose gate functions satisfy a Lipschitz condition. Finally, we consider models in which behavioral equivalence queries are also available, and extend and improve the learning algorithms of (Angluin in Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pp. 584–593, 2006) to handle general classes of gate functions that are polynomial time learnable from counterexamples.  相似文献   

15.
Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analysis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges to be addressed, such as addressing non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page–Hinkley Test to detect eventual degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.  相似文献   

16.
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. For a graph of 16 million vertices, we observe relative throughputs reaching a million times, relative to the non-incremental algorithms.  相似文献   

17.
In privacy-preserving data mining (PPDM), a widely used method for achieving data mining goals while preserving privacy is based on k-anonymity. This method, which protects subject-specific sensitive data by anonymizing it before it is released for data mining, demands that every tuple in the released table should be indistinguishable from no fewer than k subjects. The most common approach for achieving compliance with k-anonymity is to replace certain values with less specific but semantically consistent values. In this paper we propose a different approach for achieving k-anonymity by partitioning the original dataset into several projections such that each one of them adheres to k-anonymity. Moreover, any attempt to rejoin the projections, results in a table that still complies with k-anonymity. A classifier is trained on each projection and subsequently, an unlabelled instance is classified by combining the classifications of all classifiers.Guided by classification accuracy and k-anonymity constraints, the proposed data mining privacy by decomposition (DMPD) algorithm uses a genetic algorithm to search for optimal feature set partitioning. Ten separate datasets were evaluated with DMPD in order to compare its classification performance with other k-anonymity-based methods. The results suggest that DMPD performs better than existing k-anonymity-based algorithms and there is no necessity for applying domain dependent knowledge. Using multiobjective optimization methods, we also examine the tradeoff between the two conflicting objectives in PPDM: privacy and predictive performance.  相似文献   

18.
We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. This algorithm discovers automatically clusters in numerical data without prior knowledge of possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of a swarm intelligent system, and on the effects on the final clustering by using during the classification different metrics of dissimilarity: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as e.g. k-means, etc. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining.As a case study, this paper focus on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated in a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms performs well when compared to another techniques.  相似文献   

19.
Zhou  Jukai  Liu  Tong  Zhu  Jingting 《Multimedia Tools and Applications》2019,78(23):33415-33434

K-means clustering is one of the most popular clustering algorithms and has been embedded in other clustering algorithms, e.g. the last step of spectral clustering. In this paper, we propose two techniques to improve previous k-means clustering algorithm by designing two different adjacent matrices. Extensive experiments on public UCI datasets showed the clustering results of our proposed algorithms significantly outperform three classical clustering algorithms in terms of different evaluation metrics.

  相似文献   

20.
When a table containing individual data is published, disclosure of sensitive information should be prohibitive. Since simply removing identifiers such as name and social security number may reveal the sensitive information by linking attacks which joins the published table with other tables on some attributes, the notion of k-anonymity which makes each record in the table be indistinguishable with k−1 other records by suppression or generalization has been proposed previously. It is shown to be NP-hard to k-anonymize a table minimizing information loss. The approximation algorithms with up to O(k) approximation ratio were proposed when generalization is used for anonymization.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号