Found 20 similar documents (search time: 0 ms)
1.
Vineet Chaoji Mohammad Al Hasan Saeed Salem Mohammed J. Zaki 《Knowledge and Information Systems》2009,21(2):201-229
Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, including partitional, hierarchical, mixture-model-based, density-based, spectral, and subspace clustering. The focus of this paper is on full-dimensional, arbitrarily shaped clusters. Existing methods for this problem suffer in terms of either memory or time complexity (quadratic or even cubic), which has restricted them to datasets of moderate size. In this paper we propose SPARCL, a simple and scalable algorithm for finding clusters with arbitrary shapes and sizes that has linear space and time complexity. SPARCL consists of two stages: the first runs a carefully initialized version of the K-means algorithm to generate many small seed clusters; the second iteratively merges the generated clusters to obtain the final shape-based clusters. Experiments were conducted on a variety of datasets to highlight the effectiveness, efficiency, and scalability of our approach. On large datasets SPARCL is an order of magnitude faster than the best existing approaches.
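The two-stage structure lends itself to a compact illustration. Below is a minimal sketch of the idea, not the authors' implementation: scikit-learn's KMeans produces the seed clusters, and seeds are then merged whenever their centers lie within a distance threshold, a deliberate simplification of SPARCL's actual similarity-based merging criterion. The seed count and merge radius are assumed for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def sparcl_like(X, n_seeds=50, merge_radius=1.0):
    """Two-stage shape-based clustering sketch: K-means seeds, then merging."""
    # Stage 1: over-segment the data into many small seed clusters.
    km = KMeans(n_clusters=n_seeds, n_init=10).fit(X)
    labels, centers = km.labels_, km.cluster_centers_

    # Stage 2: merge seed clusters whose centers are closer than merge_radius
    # (a simplification of SPARCL's merging criterion), via union-find.
    parent = list(range(n_seeds))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n_seeds):
        for j in range(i + 1, n_seeds):
            if np.linalg.norm(centers[i] - centers[j]) < merge_radius:
                parent[find(i)] = find(j)

    roots = {find(i) for i in range(n_seeds)}
    remap = {r: c for c, r in enumerate(sorted(roots))}
    return np.array([remap[find(l)] for l in labels])
```

Because merges chain transitively, nearby seed clusters coalesce along elongated or curved regions, which is how a partitional first stage can still recover arbitrarily shaped clusters.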
2.
3.
Y.C. Lee 《Computers & Fluids》2007,36(5):838-855
Gravity-driven continuous thin-film flow over a plane containing well-defined single and grouped topographic features is modelled as Stokes flow using lubrication theory. The associated time-dependent, nonlinear, coupled set of governing equations is solved using a full approximation storage (FAS) multigrid algorithm employing automatic mesh adaptivity, the power, efficiency and accuracy of which are demonstrated by comparing the results with corresponding global fine-mesh solutions. These comparisons show that automatic grid refinement effectively restricts the use of fine grids to regions of rapid flow development which, for the topographies considered, include the topography itself, the upstream capillary ridge, the downstream surge region, and the characteristic bow wave. It is shown that for the accurate solution of the associated lubrication equations, adaptive multigrid offers increased flexibility together with a significant reduction in memory requirements. This is further demonstrated by solving the problem of transient flow over a trench topography, generated by a sinusoidally varying inlet condition.
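For readers unfamiliar with the FAS scheme, the sketch below illustrates its core idea on a deliberately simple 1D nonlinear model problem, -u'' + u^3 = f with zero Dirichlet ends, rather than on the lubrication equations themselves; the smoother, transfer operators, and sweep counts are all illustrative assumptions.

```python
import numpy as np

def A(u, h):
    """Nonlinear operator -u'' + u^3 on interior points (u = 0 at both ends)."""
    r = np.zeros_like(u)
    r[1:-1] = (2*u[1:-1] - u[:-2] - u[2:]) / h**2 + u[1:-1]**3
    return r

def smooth(u, f, h, sweeps):
    """Nonlinear Gauss-Seidel: one Newton step per point, in place."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            g  = (2*u[i] - u[i-1] - u[i+1]) / h**2 + u[i]**3 - f[i]
            dg = 2 / h**2 + 3*u[i]**2
            u[i] -= g / dg
    return u

def fas_two_grid(u, f, h):
    """One FAS two-grid cycle for A(u) = f on a grid of 2^k + 1 points."""
    u = smooth(u, f, h, 3)                   # pre-smoothing
    uc = u[::2].copy()                       # restrict solution (injection)
    rc = (f - A(u, h))[::2]                  # restrict residual (injection)
    fc = A(uc, 2*h) + rc                     # FAS coarse right-hand side
    vc = smooth(uc.copy(), fc, 2*h, 50)      # "solve" coarse problem by smoothing
    e = np.zeros_like(u)
    e[::2] = vc - uc                         # coarse-grid correction
    e[1:-1:2] = 0.5 * (e[:-2:2] + e[2::2])   # prolong by linear interpolation
    return smooth(u + e, f, h, 3)            # post-smoothing
```

A full solver recurses to still coarser grids and, as in the paper, adapts the mesh locally in space and time; the two-grid cycle above is just the building block.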
4.
Athitsos V Alon J Sclaroff S Kollios G 《IEEE transactions on pattern analysis and machine intelligence》2008,30(1):89-104
This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space, in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts, for any three objects X, A, B, whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and non-metric spaces. Performance is evaluated on databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods, such as Lipschitz embeddings, FastMap, and VP-trees.
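The reduction can be illustrated compactly. The sketch below is a simplification rather than the published algorithm: it uses reference-object embeddings F_r(x) = dist(x, r) and greedily selects the references whose induced 1D classifiers best preserve proximity order on sampled training triples. All names and the greedy criterion are illustrative assumptions.

```python
import numpy as np

def train_boostmap_like(objects, dist, triples, n_dims):
    """Greedily pick reference objects r whose 1D embeddings F_r(x) = dist(x, r)
    best preserve 'x is closer to a than to b' on the training triples."""
    chosen = []
    for _ in range(n_dims):
        best_ri, best_acc = None, -1.0
        for ri in range(len(objects)):
            if ri in chosen:
                continue
            r = objects[ri]
            # The weak classifier induced by r compares distances in 1D.
            acc = np.mean([(abs(dist(x, r) - dist(a, r))
                            < abs(dist(x, r) - dist(b, r)))
                           == (dist(x, a) < dist(x, b))
                           for x, a, b in triples])
            if acc > best_acc:
                best_ri, best_acc = ri, acc
        chosen.append(best_ri)
    # The learned embedding maps any object to n_dims coordinates.
    return lambda x: np.array([dist(x, objects[ri]) for ri in chosen])
```

Full BoostMap additionally reweights the triples AdaBoost-style and assigns weights to the selected coordinates, which yields the weighted L1 distance used at query time.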
5.
We explore in this paper the efficient clustering of market-basket data. Unlike traditional data, market-basket data are known to be of high dimensionality and sparsity. Without explicitly considering the presence of the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance for marketing strategies as well as for the result representation of clustering techniques for market-basket data. In view of these features of market-basket data, we devise in this paper a novel measurement, called category-based adherence, and utilize it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest tode, and the category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. With this measurement, we develop an efficient clustering algorithm for market-basket data, called algorithm k-todes, with the objective of minimizing the category-based adherence. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. As validated by both real and synthetic datasets, our experimental results show that, with the taxonomy information, the k-todes algorithm significantly outperforms prior works in both execution efficiency and clustering quality as measured by information gain, indicating the usefulness of category-based adherence in market-basket data clustering.
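The adherence measure itself is easy to state in code. The sketch below is an illustration rather than the paper's implementation: the taxonomy is a parent map, a cluster's "todes" are taken as a given set of taxonomy nodes, and distance is counted as tree links to the nearest tode.

```python
def tree_distance(node, target, parent):
    """Number of links between two taxonomy nodes (via lowest common ancestor)."""
    def ancestors(n):
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path
    pa, pb = ancestors(node), ancestors(target)
    depth = {n: i for i, n in enumerate(pa)}
    for j, n in enumerate(pb):
        if n in depth:                 # first common ancestor found
            return depth[n] + j
    return len(pa) + len(pb)           # disconnected (cannot happen in a tree)

def adherence(transaction, todes, parent):
    """Average distance of a transaction's items to their nearest tode."""
    return sum(min(tree_distance(item, t, parent) for t in todes)
               for item in transaction) / len(transaction)
```

Clustering then proceeds k-means-style: transactions are assigned to the cluster with the smallest adherence, and each cluster's todes are re-derived from its members.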
6.
7.
Li GS Tricoche X Weiskopf D Hansen C 《IEEE transactions on visualization and computer graphics》2008,14(5):1067-1080
We introduce a novel flow visualization method called Flow Charts, which uses a texture atlas approach for the visualization of flows defined over curved surfaces. In this scheme the surface and its associated flow are segmented into overlapping patches which are then parameterized and packed in the texture domain. This scheme allows accurate particle advection across multiple charts in the texture domain, providing a flexible framework that supports various flow visualization techniques. The use of surface parameterization enables flow visualization techniques requiring the global view of the surface over long time spans, such as Unsteady Flow LIC (UFLIC), particle-based Unsteady Flow Advection-Convolution (UFAC), or dye advection. It also prevents visual artifacts normally associated with view-dependent methods. Represented as textures, Flow Charts can be naturally integrated into GPU flow visualization techniques for interactive performance.
8.
Recursively generated B-spline surfaces on arbitrary topological meshes (cited 89 times: 0 self-citations, 89 by others)
This paper describes a method for recursively generating surfaces that approximate points lying on a mesh of arbitrary topology. The method is presented as a generalization of a recursive bicubic B-spline patch subdivision algorithm. For rectangular control-point meshes, the method generates a standard B-spline surface. For non-rectangular meshes, it generates surfaces that are shown to reduce to a standard B-spline surface except at a small number of points, called extraordinary points. Therefore, everywhere except at these points the surface is continuous in tangent and curvature. At the extraordinary points, pictures of the surface indicate that it is at least continuous in tangent, but no proof of continuity is given. A similar algorithm for biquadratic B-splines is also presented.
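This is the Catmull-Clark scheme, and one refinement step is short enough to sketch. The code below is a minimal illustration for a closed manifold mesh (vertex positions plus faces as vertex-index lists), computing the face, edge, and vertex points and assembling the refined all-quad mesh; boundary handling is omitted.

```python
import numpy as np

def catmull_clark(verts, faces):
    """One Catmull-Clark subdivision step for a closed manifold polygon mesh."""
    verts = np.asarray(verts, float)
    face_pts = [verts[f].mean(axis=0) for f in faces]

    edge_faces, edge_mid = {}, {}
    for fi, f in enumerate(faces):
        for i in range(len(f)):
            e = tuple(sorted((f[i], f[(i + 1) % len(f)])))
            edge_faces.setdefault(e, []).append(fi)
            edge_mid[e] = 0.5 * (verts[e[0]] + verts[e[1]])
    # Edge point: average of the endpoints and the two adjacent face points.
    edge_pts = {e: (verts[e[0]] + verts[e[1]]
                    + sum(face_pts[fi] for fi in fs)) / (2 + len(fs))
                for e, fs in edge_faces.items()}

    # New vertex position: (F + 2R + (n-3)P) / n, where F averages the adjacent
    # face points, R averages the incident edge midpoints, n is the valence.
    new_verts = []
    for v in range(len(verts)):
        es = [e for e in edge_faces if v in e]
        fs = {fi for e in es for fi in edge_faces[e]}
        n = len(es)
        F = sum(face_pts[fi] for fi in fs) / len(fs)
        R = sum(edge_mid[e] for e in es) / n
        new_verts.append((F + 2 * R + (n - 3) * verts[v]) / n)

    # Assemble one quad per (face, corner): vertex, edge, face, edge points.
    pts, index = list(new_verts), {}
    def idx(key, p):
        if key not in index:
            index[key] = len(pts)
            pts.append(p)
        return index[key]
    quads = []
    for fi, f in enumerate(faces):
        c = idx(('f', fi), face_pts[fi])
        for i in range(len(f)):
            e1 = tuple(sorted((f[i], f[(i + 1) % len(f)])))
            e0 = tuple(sorted((f[i - 1], f[i])))
            quads.append([f[i], idx(('e', e1), edge_pts[e1]), c,
                          idx(('e', e0), edge_pts[e0])])
    return np.array(pts), quads
```

Vertices of valence 4 on quad meshes reproduce B-spline refinement exactly; any other valence produces the extraordinary points discussed in the abstract.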
9.
The flow of a continuous liquid film, only a few microns in thickness, on a plane surface containing occlusions is modelled using lubrication theory. The resultant time-dependent nonlinear equation set is discretised using both finite difference and finite element analogues; the former is solved using a very efficient full approximation storage (FAS) multigrid algorithm employing both automatic spatial and temporal adaptivity, while the latter exploits the COMSOL suite of software to solve a weak form of the governing equations. The efficiency and accuracy of both approaches are compared, and a variety of problems of increasing complexity explored. Although the adaptive multigrid approach is clearly the most efficient, with the two sets of solutions indistinguishable, COMSOL offers an attractive alternative for the non-specialist user. The results, the first of their kind, further demonstrate: (i) the power of using adaptive multigridding, in time as well as space, for the simultaneous control of both spatial and temporal errors; (ii) the significant reduction in memory requirements and CPU time that can be achieved. It is revealed that occlusions lead to many of the features inherent in the flow of thin liquid films over fully submerged micro-scale topographic features, namely the presence of capillary ridges linked to “bow-wave” plus “comet-tail” free-surface disturbances.
10.
In this paper we propose and analyze a new spatial access method, namely the S*-tree, for the efficient secondary-memory encoding and manipulation of images containing multiple non-overlapping features (i.e., coloured images). The S*-tree is based on a non-straightforward and space-efficient extension to coloured images of its precursor, the S+-tree, which was explicitly designed for binary images. To assess the qualities of the S*-tree experimentally, we test it against the HL-quadtree, a previous spatial access method for coloured images known to be space and time efficient. Our experiments show that the S*-tree achieves up to 75% space savings and consistently performs fewer I/O accesses than the HL-quadtree in solving classical window queries.
11.
《Information Systems》2005,30(3):227-244
Recently, several important database applications have called for the design of efficient techniques for incremental mining of association rules. In response to this need, we explore in this paper an effective sliding-window filtering (abbreviated as SWF) algorithm for incremental mining of association rules. In essence, by partitioning a transaction database into several partitions, algorithm SWF employs a filtering threshold in each partition to deal with candidate itemset generation. Under SWF, the cumulative information from mining previous partitions is selectively carried over toward the generation of candidate itemsets for subsequent partitions. Algorithm SWF not only significantly reduces I/O and CPU cost through cumulative filtering and scan reduction techniques but also effectively controls memory utilization through sliding-window partitioning. More importantly, algorithm SWF is particularly powerful for efficient incremental mining of an ongoing time-variant transaction database. By utilizing proper scan reduction techniques, only one scan of the incremented dataset is needed by algorithm SWF. The I/O cost of SWF is orders of magnitude smaller than that required by prior methods, thus resolving the performance bottleneck. Extensive experimental studies are performed to evaluate the performance of algorithm SWF, and sensitivity analysis of various parameters is conducted to provide many insights into the algorithm. It is noted that the improvement achieved by algorithm SWF is even more prominent as the incremented portion of the dataset increases and as the size of the database increases.
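The partition-level filtering is the heart of the method and can be sketched directly. The code below is an illustrative reduction of SWF to candidate 2-itemsets: each candidate carries the partition where it first qualified and a cumulative count, and it survives only while that count meets the minimum-support threshold over the partitions scanned since then. The partitioning into transaction lists and the relative support value are assumed inputs.

```python
from itertools import combinations
from math import ceil

def swf_candidates(partitions, min_support):
    """Sliding-window filtering over database partitions: returns surviving
    candidate 2-itemsets as {itemset: [first_partition, cumulative_count]}."""
    cand = {}
    sizes = []                           # transactions per partition so far
    for p, transactions in enumerate(partitions):
        sizes.append(len(transactions))
        # Count 2-itemsets occurring in this partition.
        local = {}
        for t in transactions:
            for pair in combinations(sorted(set(t)), 2):
                local[pair] = local.get(pair, 0) + 1
        # Update carried-over candidates and admit newly frequent ones.
        for pair, c in local.items():
            if pair in cand:
                cand[pair][1] += c
            elif c >= ceil(min_support * len(transactions)):
                cand[pair] = [p, c]      # qualifies within its first partition
        # Progressive filtering: drop candidates below threshold since start.
        for pair in list(cand):
            start, count = cand[pair]
            if count < ceil(min_support * sum(sizes[start:])):
                del cand[pair]
    return cand
```

A full implementation then applies scan reduction to generate the longer itemsets from these C2 candidates, so that only one scan of the incremented dataset is required.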
12.
S. Moloudzadeh T. Allahviranloo P. Darabi 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2013,17(9):1725-1731
In this paper, we propose a simple and practical method (that works only for triangular fuzzy numbers) to solve an arbitrary fully fuzzy linear system (FFLS) of the form $\widetilde{A}\otimes \widetilde{x}=\widetilde{b},$ where $\widetilde{A}_{n \times n}$ is a fuzzy matrix and $\widetilde{x}$ and $\widetilde{b}$ are n × 1 fuzzy vectors. The idea of the presented method is based on extending the 0-cut and 1-cut solutions of the original FFLS. We also define a fuzzy solution of an FFLS and establish necessary and sufficient conditions for the uniqueness of a fuzzy solution.
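A minimal sketch of the 0-cut/1-cut idea follows, under assumptions the abstract does not state (triangular entries encoded as (lower, peak, upper) and nonnegative coefficients, so the 0-cut interval system decouples): solve three crisp linear systems, one for the 1-cut (the peaks) and two for the endpoints of the 0-cut, then accept the result as triangular only if the bounds are consistent.

```python
import numpy as np

def solve_ffls_triangular(A_lo, A_pk, A_hi, b_lo, b_pk, b_hi):
    """0-cut/1-cut sketch for a triangular FFLS, assuming nonnegative
    coefficients and solutions so the interval system decouples."""
    x_pk = np.linalg.solve(A_pk, b_pk)   # 1-cut: the peaks
    x_lo = np.linalg.solve(A_lo, b_lo)   # 0-cut: lower endpoints
    x_hi = np.linalg.solve(A_hi, b_hi)   # 0-cut: upper endpoints
    if not (np.all(x_lo <= x_pk) and np.all(x_pk <= x_hi)):
        raise ValueError("no consistent triangular fuzzy solution")
    return x_lo, x_pk, x_hi              # triangular fuzzy solution vector
```

The consistency check at the end corresponds to the abstract's notion of a (unique) fuzzy solution: the three crisp solutions must nest to form valid triangular fuzzy numbers.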
13.
Li X Bao Y Guo X Jin M Gu X Qin H 《IEEE transactions on visualization and computer graphics》2008,14(4):805-819
Computing smooth and optimal one-to-one maps between surfaces of the same topology is a fundamental problem in computer graphics, and such a method provides a ubiquitous tool for geometric modeling and data visualization. Its wide variety of applications includes shape registration/matching, shape blending, material/data transfer, data fusion, information reuse, etc. The mapping quality is typically measured in terms of angular distortions among different shapes. This paper proposes and develops a novel quasi-conformal surface mapping framework to globally minimize the stretching energy inevitably introduced between two different shapes. The existing state-of-the-art inter-surface mapping techniques afford only local optimization, either on surface patches via boundary cutting or on a simplified base domain, lacking rigorous mathematical foundation and analysis. We design and articulate an automatic variational algorithm that can reach the global distortion minimum for surface mapping between shapes of arbitrary topology, and our algorithm is solely founded upon the intrinsic geometric structure of surfaces. To the best of our knowledge, this is the first attempt towards numerically computing globally optimal maps. Consequently, our mapping framework offers a powerful computational tool for graphics and visualization tasks such as data and texture transfer, shape morphing, and shape matching.
14.
Progressive image transmission (PIT) is useful in Web applications. When network bandwidth is restricted, the user can preview a coarse version of the image to decide whether the entire detail of a picture needs to be transmitted. We propose a new mechanism for PIT. Exploiting the continuity of image pixels, we introduce a guessing-by-neighbors strategy: only half of the original pixels are transmitted, with additional pixels sent for error recovery from unsuccessful guessing, thereby saving bandwidth. The proposed algorithms can be used for both gray-level and color pictures. Preliminary results show that the number of transmitted bits is reduced in the first few rounds, and the PSNR values in the first few rounds are much better than those of other PIT techniques.
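The guessing-by-neighbors idea can be sketched in a few lines. Below is an illustrative round, not the paper's exact scheme: the sender transmits the checkerboard half of the pixels, the receiver predicts each remaining pixel as the average of its transmitted 4-neighbors, and corrections are listed for pixels where the prediction error exceeds a tolerance (both the neighborhood and the tolerance are assumptions).

```python
import numpy as np

def pit_round(img, tol=8):
    """One progressive round: send checkerboard pixels, guess the rest from
    their 4-neighbors, and list corrections where the guess is too far off."""
    h, w = img.shape
    mask = (np.indices((h, w)).sum(axis=0) % 2) == 0    # transmitted half
    sent = np.zeros((h, w))
    sent[mask] = img[mask]

    # Receiver-side guess: average of the available 4-neighbors
    # (every neighbor of an untransmitted pixel lies on the checkerboard).
    recon = sent.copy()
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                nb = [sent[yy, xx]
                      for yy, xx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                      if 0 <= yy < h and 0 <= xx < w]
                recon[y, x] = sum(nb) / len(nb)

    # Corrections for pixels whose guess misses by more than tol.
    bad = (~mask) & (np.abs(recon - img) > tol)
    corrections = [(y, x, int(img[y, x])) for y, x in zip(*np.nonzero(bad))]
    return recon, corrections
```

In smooth image regions almost every guess succeeds, so the correction list stays short and the round costs little more than half the pixels.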
15.
Partially materialized digest scheme: an efficient verification method for outsourced databases (cited 2 times: 0 self-citations, 2 by others)
Kyriakos Mouratidis Dimitris Sacharidis HweeHwa Pang 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(1):363-381
In the outsourced database model, a data owner publishes her database through a third-party server; i.e., the server hosts the data and answers user queries on behalf of the owner. Since the server may not be trusted, or may be compromised, users need a means to verify that answers received are both authentic and complete, i.e., that the returned data have not been tampered with and that no qualifying results have been omitted. We propose a result verification approach for one-dimensional queries, called the Partially Materialized Digest scheme (PMD), that applies to both static and dynamic databases. PMD uses separate indexes for the data and for their associated verification information, and only partially materializes the latter. In contrast with previous work, PMD avoids unnecessary costs when processing queries that do not request verification, achieving the performance of an ordinary index (e.g., a B+-tree). On the other hand, when an authenticity and completeness proof is required, PMD outperforms the existing state-of-the-art technique by a wide margin, as we demonstrate analytically and experimentally. Furthermore, we design two verification methods for spatial queries. The first, termed the Merkle R-tree (MR-tree), extends the conventional approach of embedding authentication information into the data index (i.e., an R-tree). The second, called the Partially Materialized KD-tree (PMKD), follows the PMD paradigm using separate data and verification indexes. An empirical evaluation with real data shows that the PMD methodology is superior to the traditional approach for spatial queries too.
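The authenticity half of such proofs rests on Merkle hash trees, which are compact enough to sketch. The code below is a generic illustration, not PMD itself: it builds a binary Merkle tree over an ordered dataset and verifies one record against the owner-signed root using its sibling-hash path.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a binary Merkle tree over the (ordered) leaves."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf `index` up to the root."""
    level, path = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append(level[index ^ 1])       # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify(leaf, index, path, root):
    """Recompute the root from one record and its proof; compare to signed root."""
    node = h(leaf)
    for sib in path:
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root
```

Completeness additionally requires the proof to cover the records adjacent to the query range boundaries, so that omissions at the edges of the answer are detectable.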
16.
Norifumi Katafuchi Mutsuo Sano Shuichi Ohara Masashi Okudaira 《Machine Vision and Applications》2000,12(4):170-176
A new method based on an optics model for highly reliable surface inspection of industrial parts has been developed. This method uses multiple images taken under different camera conditions. Phong's model is employed for surface reflection, and the albedo and the reflection model parameters are then estimated by the least squares method. The developed method has advantages over conventional binarization in that it can easily determine the threshold of product acceptability and cope with changes in light intensity when detecting defects.
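As a rough illustration of the estimation step (not the authors' procedure, whose imaging geometry the abstract does not give), the sketch below fits Phong parameters for one surface point, diffuse albedo, specular coefficient, and shininess, to intensities observed under several known lighting/viewing conditions using scipy least squares. The numeric observations are synthetic example values.

```python
import numpy as np
from scipy.optimize import curve_fit

def phong_intensity(angles, kd, ks, alpha):
    """Phong reflection: I = kd*cos(theta_i) + ks*cos(theta_s)^alpha, with
    theta_i the light-normal angle and theta_s the mirror-view angle."""
    cos_i, cos_s = angles
    return kd * cos_i + ks * np.clip(cos_s, 0, None) ** alpha

# Intensities of one surface point under five camera conditions
# (synthesized near kd=0.8, ks=0.5, alpha=8 so the fit converges).
cos_i = np.array([0.90, 0.80, 0.70, 0.95, 0.60])
cos_s = np.array([0.99, 0.50, 0.20, 0.80, 0.10])
I_obs = np.array([1.18, 0.64, 0.56, 0.84, 0.48])

(kd, ks, alpha), _ = curve_fit(phong_intensity, (cos_i, cos_s), I_obs,
                               p0=[0.5, 0.5, 10.0], bounds=(0, np.inf))
print(f"albedo={kd:.2f}  specular={ks:.2f}  shininess={alpha:.1f}")
```

Comparing the fitted albedo against its expected range then gives a physically grounded acceptance threshold, which is the advantage claimed over plain intensity binarization.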
17.
Given a basic pseudo-random number generator that returns uniformly distributed samples U from the interval (0, 1), and a statistical distribution defined by its distribution function F(x), the inversion method X ← F⁻¹(U) produces samples from F(x). A procedure is developed which prepares “guide tables” in order to facilitate this inversion, so that sampling becomes efficient for arbitrary F(x). For discrete distributions these tables are small and easy to set up, and the resulting sampling algorithm compares well with known general methods. Continuous distributions require longer set-up times and more space for tables; these are prepared using given probability densities f(x). The method can cope with “reasonable” f(x), including most cases commonly encountered in statistics. The reported computational experience, on Poisson, Normal, Gamma and Cauchy distributions, indicates that our general routine is almost as fast as the best known sampling algorithms specially designed for these distributions.
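For the discrete case the guide-table construction is short enough to show in full. The sketch below illustrates the standard technique, with the table size chosen equal to the number of outcomes (a common but assumed choice): for each of m equal slices of (0, 1) it precomputes the first index whose cumulative probability exceeds the slice boundary, so the inversion search starts next to its answer.

```python
import random

def make_guide_table(probs):
    """Cumulative distribution plus a guide table: guide[j] is the smallest
    index i with cum[i] > j/m, so inversion starts near its target."""
    m = len(probs)
    cum, s = [], 0.0
    for p in probs:
        s += p
        cum.append(s)
    cum[-1] = 1.0                        # guard against floating-point drift
    guide, i = [], 0
    for j in range(m):
        while cum[i] <= j / m:
            i += 1
        guide.append(i)
    return cum, guide

def sample(cum, guide):
    """Inversion X = F^{-1}(U), accelerated by the guide table."""
    u = random.random()
    i = guide[int(u * len(guide))]       # jump close to the answer...
    while cum[i] < u:                    # ...then finish with a short scan
        i += 1
    return i

# Example: a small discrete distribution over {0,...,4} (assumed weights).
cum, guide = make_guide_table([0.10, 0.25, 0.30, 0.20, 0.15])
counts = [0] * 5
for _ in range(10000):
    counts[sample(cum, guide)] += 1
print(counts)
```

With one guide entry per outcome, the expected number of comparisons per sample is below two, which is why the generic routine stays competitive with distribution-specific samplers.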
18.
O'Gorman L Sanderson AC 《IEEE transactions on pattern analysis and machine intelligence》1984,(3):280-288
The converging squares algorithm is a method for locating peaks in sampled data of two dimensions or higher. There are two primary advantages of this algorithm over conventional methods. First, it is robust with respect to noise and data type; there are no empirical parameters to adjust, so results are completely objective. Second, the method is computationally efficient: the inherent structure of the algorithm is that of a resolution pyramid, which enhances computational efficiency as well as contributing to the noise immunity of the method. The algorithm is detailed for two-dimensional data and described for three-dimensional data. Quantitative comparisons of computation are made with two conventional peak-picking methods. Applications to biomedical image analysis and to industrial inspection tasks are discussed.
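The 2D algorithm is simple enough to state in full. The sketch below is an illustrative implementation (assuming a square image for brevity): starting from the whole k x k image, each step compares the four overlapping (k-1) x (k-1) subsquares of the current square, using a summed-area table for O(1) box sums, and keeps the one with the largest sum until a single peak pixel remains.

```python
import numpy as np

def converging_squares(img):
    """Locate a peak by repeatedly keeping the brightest (k-1)x(k-1) subsquare."""
    img = np.asarray(img, dtype=float)
    # Summed-area table with a zero border: any box sum in O(1).
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    def box(y, x, k):                    # sum of the k x k square at (y, x)
        return sat[y + k, x + k] - sat[y, x + k] - sat[y + k, x] + sat[y, x]

    y, x, k = 0, 0, min(img.shape)       # assumes a square region of interest
    while k > 1:
        k -= 1
        # Four overlapping candidate subsquares; keep the one with max sum.
        cands = [(y, x), (y, x + 1), (y + 1, x), (y + 1, x + 1)]
        y, x = max(cands, key=lambda c: box(c[0], c[1], k))
    return y, x                          # coordinates of the peak pixel
```

Note the parameter-free convergence: each step is a deterministic comparison of four sums, which is the source of both the objectivity and the noise immunity claimed above.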
19.
Layered point clouds: a simple and efficient multiresolution structure for distributing and rendering gigantic point-sampled models (cited 1 time: 0 self-citations, 1 by others)
We recently introduced an efficient multiresolution structure for distributing and rendering very large point-sampled models on consumer graphics platforms [1]. The structure is based on a hierarchy of precomputed object-space point clouds, which are combined coarse-to-fine at rendering time to locally adapt sample densities according to the projected size in the image. The progressive block-based refinement nature of the rendering traversal exploits on-board caching and object-based rendering APIs, hides out-of-core data access latency through speculative prefetching, and lends itself well to incorporating backface, view-frustum, and occlusion culling, as well as compression and view-dependent progressive transmission. The resulting system allows rendering of complex out-of-core models at high frame rates (over 60 M rendered points/second), supports network streaming, and is fundamentally simple to implement. We demonstrate the efficiency of the approach on a number of very large models, stored on local disks or accessed through a consumer-level broadband network, including a massive 234 M-sample isosurface generated by a compressible turbulence simulation and a 167 M-sample model of Michelangelo's St. Matthew. Many of the details of our framework were presented in a previous study; here we provide a more thorough exposition, together with significant new material, including a higher-quality bottom-up construction method and additional qualitative and quantitative results.
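The coarse-to-fine refinement test is the core of the traversal and is easy to sketch. Below is an illustrative version, not the authors' code: a block of the hierarchy is refined while the screen-space projection of its sample spacing exceeds a pixel tolerance, and the block layout (a dict with center, spacing, points, children) is an assumption made for the example.

```python
import math

def lod_select(block, cam_pos, focal_px, tol_px=1.0, out=None):
    """Collect blocks whose projected sample spacing is within tol_px pixels,
    refining coarse-to-fine otherwise. `block` has keys 'center', 'spacing',
    'points', 'children' (assumed layout; children halve the spacing)."""
    if out is None:
        out = []
    dist = math.dist(block['center'], cam_pos)
    projected = block['spacing'] * focal_px / max(dist, 1e-9)  # px on screen
    if projected <= tol_px or not block['children']:
        out.append(block['points'])          # dense enough (or leaf): render
    else:
        for child in block['children']:      # children refine sample density
            lod_select(child, cam_pos, focal_px, tol_px, out)
    return out
```

Because whole precomputed blocks are emitted rather than individual points, the traversal feeds the GPU large batches, which is what sustains the reported rendering rates.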