首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 156 毫秒
1.
This paper presents a parallel volume rendering algorithm that can render a 256×256×225 voxel medical data set at over 15 Hz and a 512×512×334 voxel data set at over 7 Hz on a 32-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently-reported shear-warp serial volume rendering algorithm which is over five times faster than previous serial algorithms. The algorithm uses run-length encoding to exploit coherence and an efficient volume traversal to reduce overhead. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets  相似文献   

2.
Communication costs for parallel volume-rendering algorithms   总被引:2,自引:0,他引:2  
The computational expense of volume rendering motivates the development of parallel implementations on multicomputers. Parallelism achieves higher frame rates, which provide more natural viewing control and enhanced comprehension of 3D structure. Although many parallel implementations exist, we have no framework to compare their relative merits independent of host hardware. The article attempts to establish that framework by enumerating and classifying parallel volume-rendering algorithms suitable for multicomputers with distributed memory and a communication network. It determined the communication costs for classes of parallel algorithms by considering their inherent communication requirements  相似文献   

3.
This paper presents the design and performance of a new parallel graphics renderer for 3D images. This renderer is based on an adaptive supersampling approach that works for time/space-efficient execution on two classes of parallel computers. Our rendering scheme takes subpixel supersamples only along polygon edges. This leads to a significant reduction in rendering time and in buffer memory requirements. Furthermore, we offer a balanced rasterization of all transformed polygons. Experimental results prove these advantages on both a shared-memory SGI multiprocessor server and a Unix cluster of Sun workstations. We reveal performance effects of the new rendering scheme on subpixel resolution, polygon number, scene complexity, and memory requirements. The balanced parallel renderer demonstrates scalable performance with respect to increase in graphic complexity and in machine size. Our parallel renderer outperforms Crow's scheme in benchmark experiments performed. The improvements are made in three fronts: (1) reduction in rendering time, (2) higher efficiency with balanced workload,: and (3) adaptive to available buffer memory size. The balanced renderer can be more cost-effectively embedded within many 3D graphics algorithms, such as those for edge smoothing and 3D visualization. Our parallel renderer is MPI-coded, offering high portability and cross-platform performance. These advantages can greatly improve the QoS in 3D imaging and in real-time interactive graphics  相似文献   

4.
Coordinating Parallel Processes on Networks of Workstations   总被引:1,自引:0,他引:1  
The network of workstations (NOW) we consider for scheduling is heterogeneous and nondedicated, where computing power varies among the workstations and local and parallel jobs may interact with each other in execution. An effective NOW scheduling scheme needs sufficient information about system heterogeneity and job interactions. We use the measured power weight of each workstation to quantify the differences of computing capability in the system. Without a processing power usage agreement between parallel jobs and local user jobs in a workstation, job interactions are unpredictable, and performance of either type of jobs may not be guaranteed. Using the quantified and deterministic system information, we design a scheduling scheme calledself-coordinated local schedulingon a heterogeneous NOW. Based on a power usage agreement between local and parallel jobs, this scheme coordinates parallel processes independently in each workstation based on the coscheduling principle. We discuss its implementation on Unix System V Release 4 (SVR4). Our simulation results on a heterogeneous NOW show the effectiveness of the self-coordinated local scheduling scheme.  相似文献   

5.
Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi architecture of a single GPU supports uniform virtual addressing, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require the user to reconsider the parallel rendering algorithms. In this paper, we thoroughly investigate the NUMA-aware image compositing problem, which is the key final stage in sort-last parallel rendering. Based on a proven radix-k strategy, we find one optimal compositing algorithm, which takes advantage of NUMA architecture on the multi-GPU platform. We quantitatively analyze different image compositing modes for practical image compositing, taking into account peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast, an image of a few megapixels can be composited in about 10 ms by eight GPUs.  相似文献   

6.
Displaying a large number of lines within a limited amount of screen space is a task that is common to many different classes of visualization techniques such as time‐series visualizations, parallel coordinates, link‐node diagrams, and phase‐space diagrams. This paper addresses the challenging problems of cluttering and overdraw inherent to such visualizations. We generate a 2×2 tensor field during line rasterization that encodes the distribution of line orientations through each image pixel. Anisotropic diffusion of a noise texture is then used to generate a dense, coherent visualization of line orientation. In order to represent features of different scales, we employ a multi‐resolution representation of the tensor field. The resulting technique can easily be applied to a wide variety of line‐based visualizations. We demonstrate this for parallel coordinates, a time‐series visualization, and a phase‐space diagram. Furthermore, we demonstrate how to integrate a focus+context approach by incorporating a second tensor field. Our approach achieves interactive rendering performance for large data sets containing millions of data items, due to its image‐based nature and ease of implementation on GPUs. Simulation results from computational fluid dynamics are used to evaluate the performance and usefulness of the proposed method.  相似文献   

7.
8.
Scattering Points in Parallel Coordinates   总被引:1,自引:0,他引:1  
In this paper, we present a novel parallel coordinates design integrated with points (scattering points in parallel coordinates, SPPC), by taking advantage of both parallel coordinates and scatterplots. Different from most multiple views visualization frameworks involving parallel coordinates where each visualization type occupies an individual window, we convert two selected neighboring coordinate axes into a scatterplot directly. Multidimensional scaling is adopted to allow converting multiple axes into a single subplot. The transition between two visual types is designed in a seamless way. In our work, a series of interaction tools has been developed. Uniform brushing functionality is implemented to allow the user to perform data selection on both points and parallel coordinate polylines without explicitly switching tools. A GPU accelerated dimensional incremental multidimensional scaling (DIMDS) has been developed to significantly improve the system performance. Our case study shows that our scheme is more efficient than traditional multi-view methods in performing visual analysis tasks.  相似文献   

9.
图像处理中的对象阴影计算影响着图像的渲染速度,是图像处理领域的重要研究内容。为了进一步提高对象阴影的渲染速度,提出了一种基于聚类方法的对象阴影识别方法。按照光线的衰减半径将光线表示为一个个球体,当球体之间的距离大于预定义的最小距离时,将其划分到两个不同的类中,采用自上而下的层次方法对光线进行聚类。在聚类过程中,光线的衰减半径随着与光源的距离呈线性增长。对光线进行聚类分析后,对同一聚类内的光线采用相同的渲染方式,因而提高了阴影的渲染效率。最后通过实验验证了提出的方法的有效性。  相似文献   

10.
Dynamic load balancing for parallel polygon rendering   总被引:2,自引:0,他引:2  
Using parallel processing for visualization speeds up computer graphics rendering of complex data sets. A parallel algorithm designed for polygon scan conversion and rendering is presented which supports fast rendering of highly complex data sets using advanced lighting models. Dedicated graphics rendering engines do not necessarily suit such data sets, although they can support real-time update of moderately complex scenes using simple lighting. Advantages to using a software-based approach include the feasibility of adding special rendering features to the program and the capability of integrating a parallel scientific application with a parallel graphics renderer. A new work decomposition strategy presented, called task adaptive, is based on dynamically partitioning the amount of computational work left at a given time. The algorithm uses a heuristic for dynamic task decomposition in which image space tasks are partitioned without requiring interruption of the partitioned processor. A sophisticated memory referencing strategy lets local memory access graphics data during rendering. This permits implementation of the algorithm on a distributed memory multiprocessor. An in-depth analysis of the overhead costs accompanying parallel processing shows where performance is adequate or could be improved  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号