首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Texture‐by‐Numbers is an attractive texture synthesis framework, because it is able to cope with non‐homogeneous texture exemplars, and provides the user with intuitive creative control over the outcome of the synthesis process. Like many other exemplar‐based texture synthesis methods, its basic underlying mechanism is neighbourhood matching. In this paper we review a number of commonly used neighbourhood matching acceleration techniques, compare and analyse their performance in the specific context of Texture‐by‐Numbers (as opposed to ordinary unconstrained texture synthesis). Our study indicates that the standard approaches are not optimally suited for the Texture‐by‐Numbers framework, often producing visually inferior results compared to searching for the exact L2 nearest neighbour. We then show that performing Texture‐by‐Number using the Texture Optimization framework in conjunction with an efficient FFT‐based search is able to produce good results in reasonable running times and with a minimal memory overhead.  相似文献   

2.
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. Our connected components algorithm uses a novel approach for parallel merging which performs drastically limited updating during iterative steps, and concludes with a total consistency update at the final step. The algorithms have been coded in S -C and run on a variety of platforms. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for these two primitives, even when compared with machine-specific implementations.  相似文献   

3.
External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Blockwise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in high performance parallel computation systems, for which blockwise communication is a crucial issue. {We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel computers. Our examples originate as algorithms for parallel machines, and we adapt them to the EM situation where the queries and data structure are considered to be much larger than the size of the available internal memory. } This paper presents techniques to achieve blocking for I/ O as well as for communication in multisearch on the BSP and EM-BSP models. We describe improvements to the 1-optimal BSP* multisearch algorithm of [8] which permit larger search trees to be handled. In the area of EM algorithms new algorithms for multisearch in balanced trees are described. For search trees of size O(nlog n) where n is the number of queries, we obtain a work-optimal, parallel, randomized EM multisearch algorithm whose I/ O and communication time are smaller, asymptotically, than the computation time. We obtain a deterministic version via a similar technique. These algorithms are obtained via the simulation techniques of [12], [17], [13], [14], and [24]. For larger trees we describe a parallel, EM algorithm which is 1-optimal considering computation, communication, and I/ O (for suitable parameter constellations) plus I/ O-optimal. We give a lower bound to the number of I/ O operations required for filtering n queries through a binary or multiway search tree of size m when m\geq n 2+ε , for a constant ε>0 . Online publication February 20, 2001.  相似文献   

4.
Image space occlusion culling is a powerful approach to reduce the rendering load of large polygonal models. However, occlusion culling is not for free; it trades overhead costs with the rendering costs of the possibly occluded geometry. Meanwhile, occlusion queries based on image space occlusion culling are supported on modern graphics hardware. However, a significant consumption of fillrate bandwidth and latency costs are associated with these queries. In this paper, we propose new techniques to reduce redundant occlusion queries. Our approach uses several "Occupancy Maps" to organize scene traversal. The respective information is accumulated efficiently by hardware‐supported asynchronous occlusion queries. To avoid redundant requests, we arrange these multiple occlusion queries according to the information of the Occupancy Maps. Our presented technique is conservative and benefits from a partial depth order of the geometry.  相似文献   

5.
Processing point clouds often requires information about the point neighbourhood in order to extract, calculate and determine characteristics. We continue the tradition of developing increasingly faster neighbourhood query algorithms and present a highly efficient algorithm for solving the exact neighbourhood problem in point clouds using the GPU. Both, the required data structures and the kNN query, are calculated entirely on the GPU. This enables real‐time performance for large queries in extremely large point clouds. Our experiments show a more than threefold acceleration, compared to state‐of‐the‐art GPU based methods including all memory transfers. In terms of pure query performance, we achieve over answered neighbourhood queries per millisecond for 16 nearest neighbours on common graphics hardware.  相似文献   

6.
Proximity queries such as closest point computation and collision detection have many applications in computer graphics, including computer animation, physics‐based modelling, augmented and virtual reality. We present efficient algorithms for proximity queries between a closed rigid object and an arbitrary, possibly deformable, polygonal mesh. Using graphics hardware to densely sample the distance field of the rigid object over the arbitrary mesh, we compute minimal proximity and collision response information on the graphics processing unit (GPU) using blending and depth buffering, as well as parallel reduction techniques, thus minimizing the readback bottleneck. Although limited to image‐space resolution, our algorithm provides high and steady performance when compared with other similar algorithms. Proximity queries between arbitrary meshes with hundreds of thousands of triangles and detailed distance fields of rigid objects are computed in a few milliseconds at high‐sampling resolution, even in situations with large overlap.  相似文献   

7.
We present graphics processing unit (GPU) data structures and algorithms to efficiently solve sparse linear systems that are typically required in simulations of multi‐body systems and deformable bodies. Thereby, we introduce an efficient sparse matrix data structure that can handle arbitrary sparsity patterns and outperforms current state‐of‐the‐art implementations for sparse matrix vector multiplication. Moreover, an efficient method to construct global matrices on the GPU is presented where hundreds of thousands of individual element contributions are assembled in a few milliseconds. A finite‐element‐based method for the simulation of deformable solids as well as an impulse‐based method for rigid bodies are introduced in order to demonstrate the advantages of the novel data structures and algorithms. These applications share the characteristic that a major computational effort consists of building and solving systems of linear equations in every time step. Our solving method results in a speed‐up factor of up to 13 in comparison to other GPU methods.  相似文献   

8.
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus, improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets. Recommended by: Ahmed Elmagarmid Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.  相似文献   

9.
The constantly increasing complexity of polygonal models in interactive applications poses two major problems. First, the number of primitives that can be rendered at real‐time frame rates is currently limited to a few million. Secondly, less than 45 million triangles—with vertices and normal—can be stored per gigabyte. Although the rendering time can be reduced using level‐of‐detail (LOD) algorithms, representing a model at different complexity levels, these often even increase memory consumption. Out‐of‐core algorithms solve this problem by transferring the data currently required for rendering from external devices. Compression techniques are commonly used because of the limited bandwidth. The main problem of compression and decompression algorithms is the only coarse‐grained random access. A similar problem occurs in view‐dependent LOD techniques. Because of the interdependency of split operations, the adaption rate is reduced leading to visible popping artefacts during fast movements. In this paper, we propose a novel algorithm for real‐time view‐dependent rendering of gigabyte‐sized models. It is based on a neighbourhood dependency‐free progressive mesh data structure. Using a per operation compression method, it is suitable for parallel random‐access decompression and out‐of‐core memory management without storing decompressed data.  相似文献   

10.
This paper presents a reformulation of bidirectional path‐tracing that adequately divides the algorithm into processes efficiently executed in parallel on both the CPU and the GPU. We thus benefit from high‐level optimization techniques such as double buffering, batch processing, and asyncronous execution, as well as from the exploitation of most of the CPU, GPU, and memory bus capabilities. Our approach, while avoiding pure GPU implementation limitations (such as limited complexity of shaders, light or camera models, and processed scene data sets), is more than ten times faster than standard bidirectional path‐tracing implementations, leading to performance suitable for production‐oriented rendering engines.  相似文献   

11.
We present an efficient algorithm for object‐space proximity queries between multiple deformable triangular meshes. Our approach uses the rasterization capabilities of the GPU to produce an image‐space representation of the vertices. Using this image‐space representation, inter‐object vertex‐triangle distances and closest points lying under a user‐defined threshold are computed in parallel by conservative rasterization of bounding primitives and sorted using atomic operations. We additionally introduce a similar technique to detect penetrating vertices. We show how mechanisms of modern GPUs such as mipmapping, Early‐Z and Early‐Stencil culling can optimize the performance of our method. Our algorithm is able to compute dense proximity information for complex scenes made of more than a hundred thousand triangles in real time, outperforming a CPU implementation based on bounding volume hierarchies by more than an order of magnitude.  相似文献   

12.
In this paper, we present a novel hash map-based sparse data structure for Smoothed Particle Hydrodynamics, which allows for efficient neighbourhood queries in spatially adaptive simulations as well as direct ray tracing of fluid surfaces. Neighbourhood queries for adaptive simulations are improved by using multiple independent data structures utilizing the same underlying self-similar particle ordering, to significantly reduce non-neighbourhood particle accesses. Direct ray tracing is performed using an auxiliary data structure, with constant memory consumption, which allows for efficient traversal of the hash map-based data structure as well as efficient intersection tests. Overall, our proposed method significantly improves the performance of spatially adaptive fluid simulations and allows for direct ray tracing of the fluid surface with little memory overhead.  相似文献   

13.
14.
Hardware/software partitioning is an essential step in hardware/software co-design. For large size problems, it is difficult to consider both solution quality and time. This paper presents an efficient GPU-based parallel tabu search algorithm (GPTS) for HW/SW partitioning. A single GPU kernel of compacting neighborhood is proposed to reduce the amount of GPU global memory accesses theoretically. A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS. To further minimize the transfer overhead of GPTS between CPU and GPU, an optimized transfer strategy for GPU-based tabu evaluation is proposed, which considers that all the candidates do not satisfy the given constraint. Experiments show that GPTS outperforms state-of-the-art work of tabu search and is competitive with other methods for HW/SW partitioning. The proposed parallelization is significant when considering the ordinary GPU platform.  相似文献   

15.
The one‐step leapfrog alternative‐direction‐implicit finite‐difference time‐domain (ADI‐FDTD), free from the Courant‐Friedrichs‐Lewy (CFL) stability condition and sub‐step computations, is efficient when dealing with fine grid problems. However, solution of the numerous tridiagonal systems still imposes a great computational burden and makes the method hard to execute in parallel. In this paper, we proposed an efficient graphic processing unit (GPU)‐based parallel implementation of the one‐step leapfrog ADI‐FDTD for the far‐field EM scattering simulation of objects, in which we present and analyze the manners of calculation area division and thread allocation and a data layout transformation of z components is proposed to achieve better memory access mode, which is a key factor affecting GPU execution efficiency. The simulation experiment is carried out to verify the accuracy and efficiency of the GPU‐based implementation. The simulation results show that there is a good agreement between the proposed one‐step leapfrog ADI‐FDTD method and Yee's FDTD in solving the far‐field scattering problem and huge benefits in performance were encountered when the method was accelerated using GPU technology.  相似文献   

16.
Visualizing dynamic participating media in particle form by fully solving equations from the light transport theory is a computationally very expensive process. In this paper, we present a computational pipeline for particle volume rendering that is easily accelerated by the current GPU. To fully harness its massively parallel computing power, we transform input particles into a volumetric density field using a GPU-assisted, adaptive density estimation technique that iteratively adapts the smoothing length for local grid cells. Then, the volume data is visualized efficiently based on the volume photon mapping method where our GPU techniques further improve the rendering quality offered by previous implementations while performing rendering computation in acceptable time. It is demonstrated that high quality volume renderings can be easily produced from large particle datasets in time frames of a few seconds to less than a minute.  相似文献   

17.
While large‐scale parallel/distributed simulations are rapidly becoming critical research modalities in academia and industry, their efficient and scalable implementations continue to present many challenges. A key challenge is that the dynamic and complex communication/coordination required by these applications (dependent on the state of the phenomenon being modeled) are determined by the specific numerical formulation, the domain decomposition and/or sub‐domain refinement algorithms used, etc. and are known only at runtime. This paper presents Seine, a dynamic geometry‐based shared‐space interaction framework for scientific applications. The framework provides the flexibility of shared‐space‐based models and supports extremely dynamic communication/coordination patterns, while still enabling scalable implementations. The design and prototype implementation of Seine are presented. Seine complements and can be used in conjunction with existing parallel programming systems such as MPI and OpenMP. An experimental evaluation using an adaptive multi‐block oil‐reservoir simulation is used to demonstrate the performance and scalability of applications using Seine. Copyright © 2006 John Wiley & Sons, Ltd.  相似文献   

18.
Presently, dynamic surface-based models are required to contain increasingly larger numbers of points and to propagate them over longer time periods. For large numbers of surface points, the octree data structure can be used as a balance between low memory occupation and relatively rapid access to the stored data. For evolution rules that depend on neighborhood states, extended simulation periods can be obtained by using simplified atomistic propagation models, such as the Cellular Automata (CA). This method, however, has an intrinsic parallel updating nature and the corresponding simulations are highly inefficient when performed on classical Central Processing Units (CPUs), which are designed for the sequential execution of tasks. In this paper, a series of guidelines is presented for the efficient adaptation of octree-based, CA simulations of complex, evolving surfaces into massively parallel computing hardware. A Graphics Processing Unit (GPU) is used as a cost-efficient example of the parallel architectures. For the actual simulations, we consider the surface propagation during anisotropic wet chemical etching of silicon as a computationally challenging process with a wide-spread use in microengineering applications. A continuous CA model that is intrinsically parallel in nature is used for the time evolution. Our study strongly indicates that parallel computations of dynamically evolving surfaces simulated using CA methods are significantly benefited by the incorporation of octrees as support data structures, substantially decreasing the overall computational time and memory usage.  相似文献   

19.
Object-oriented database management systems (OODBMSs) provide rich facilities for the modeling and processing of structural as well as behavioral properties of complex application objects. However, due to their inherent generality and continuously evolving functionalities, efficient implementations are important for these OODBMSs to support the present and future applications, particularly when the databases are very large. In this paper, we present several parallel, multi-wavefront algorithms based on two processing approaches, i.e., identification and elimination approaches, to verify association patterns specified in queries. Both approaches allow more processors to operate concurrently on a query than the traditional tree-structured query processing approach, thus introducing a higher degree of parallelism in query processing. A heuristic method is presented for partitioning an object-oriented database (OODB). The main consideration for partitioning the database is load balancing. This method also tries to reduce the communication time by reducing the length of the path that wavefronts need to be propagated. Multiple wavefront algorithms based on the two approaches for tree-structured queries have been implemented on an nCUBE 2 parallel computer. The implementation of the query processor allows multiple queries to be executed simultaneously. This implementation provides an environment for evaluating the algorithms and the heuristic method for partitioning the database. The evaluation results are presented in this paper.Recommended by: Patrick Valduriez  相似文献   

20.
Recently, due to the imprecise nature of the data generated from a variety of streaming applications, such as sensor networks, query processing on uncertain data streams has become an important problem. However, all the existing works on uncertain data streams study unbounded streams. In this paper, we take the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on one of the most important types of queries—top-k queries. It is nontrivial to find an efficient solution for answering sliding-window top-k queries on uncertain data streams, because challenges not only stem from the strict space and time requirements of processing both arriving and expiring tuples in high-speed streams, but also rise from the exponential blowup in the number of possible worlds induced by the uncertain data model. In this paper, we design a unified framework for processing sliding-window top-k queries on uncertain streams. We show that all the existing top-k definitions in the literature can be plugged into our framework, resulting in several succinct synopses that use space much smaller than the window size, while they are also highly efficient in terms of processing time. We also extend our framework to answering multiple top-k queries. In addition to the theoretical space and time bounds that we prove for these synopses, we present a thorough experimental report to verify their practical efficiency on both synthetic and real data.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号