Similar Documents
20 similar documents found (search time: 0 ms)
1.
Cardiac simulation on multi-GPU platform (Cited by: 2; self-citations: 2; cited by others: 0)
The cardiac bidomain model is a popular approach for studying the electrical behavior of tissue and simulating interactions between cells by solving partial differential equations. The iterative, data-parallel model is an ideal match for the parallel architecture of Graphics Processing Units (GPUs). In this study, we evaluate the effectiveness of architecture-specific optimizations and fine-grained parallelization strategies, port the model entirely to the GPU, and evaluate the performance of single-GPU and multi-GPU implementations. Simulating one action potential duration (350 ms of real time) for a 256×256×256 tissue takes 453 hours on a high-end general-purpose processor, but only 664 seconds on a four-GPU system, including communication and data-transfer overhead. This drastic improvement (a factor of 2460×) will allow clinicians to extend the time scale of simulations from milliseconds to seconds and minutes, and to evaluate hypotheses in time frames that were previously infeasible.
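The quoted speedup follows directly from the two timings given in the abstract; the toy calculation below (Python, purely illustrative) reproduces the roughly 2460× factor.

```python
# Check the speedup factor implied by the abstract's two timings.
cpu_hours = 453        # high-end general-purpose processor
gpu_seconds = 664      # four-GPU system, including communication overhead

speedup = (cpu_hours * 3600) / gpu_seconds
print(f"{speedup:.0f}x")   # ~2456x, consistent with the reported ~2460x
```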

2.
Morphological image compositing (Cited by: 1; self-citations: 0; cited by others: 1)
Image mosaicking can be defined as the registration of two or more images that are then combined into a single image. Once the images have been registered to a common coordinate system, the problem amounts to defining a selection rule that outputs a unique value for all those pixels that are present in more than one image. This process is known as image compositing. In this paper, we propose a compositing procedure based on mathematical morphology and its marker-controlled segmentation paradigm. Its aim is to position seams along salient image structures so as to diminish their visibility in the output mosaic, even in the absence of radiometric corrections or blending procedures. We also show that it is suited to the seamless minimization of undesirable transient objects occurring in the regions where two or more images overlap. The proposed methodology and algorithms are illustrated for the compositing of satellite images while minimizing cloud cover.

3.
Wu Hao, Xu Dan. 《中国图象图形学报》(Journal of Image and Graphics), 2012, 17(11): 1333-1346
Digital image compositing has long been an active research topic in image processing, with wide applications in photo editing, graphic design, and film special effects. The basic goal of image compositing is to accurately extract a target object from a source image and composite it seamlessly into a new background. Classified by the key technique used, existing digital image compositing methods fall into three categories: compositing based on the alpha matte, compositing based on the gradient field, and compositing based on multi-resolution models. This paper first reviews representative algorithms in each of the three categories and compares them in terms of compositing quality, robustness, and computational efficiency; it then surveys emerging applications of image compositing; finally, it summarizes the common limitations of existing methods and discusses the challenges and future directions of image compositing.
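As a concrete reference for the first category in the survey (alpha-based compositing), below is a minimal sketch of the standard "over" operator, C = αF + (1 − α)B. The NumPy implementation, array shapes, and value ranges are illustrative assumptions, not code from the survey.

```python
import numpy as np

def composite_over(foreground, background, alpha):
    """Alpha-based compositing ('over' operator): C = alpha * F + (1 - alpha) * B.

    foreground, background: float arrays of shape (H, W, 3) with values in [0, 1]
    alpha: float matte of shape (H, W) with values in [0, 1]
    """
    a = alpha[..., np.newaxis]              # broadcast the matte over the RGB channels
    return a * foreground + (1.0 - a) * background

# Illustrative usage with random data standing in for a real foreground/background pair.
rng = np.random.default_rng(0)
F, B = rng.random((256, 256, 3)), rng.random((256, 256, 3))
matte = rng.random((256, 256))
composite = composite_over(F, B, matte)
print(composite.shape)                      # (256, 256, 3)
```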

4.
5.
In current quantum computing research, quantum circuit simulators are an important research tool and have received close attention from researchers. QuEST is an open-source, general-purpose quantum circuit simulator that runs flexibly on a variety of test platforms, including a single CPU node, multiple CPU nodes, and a single GPU. The inherent parallelism of quantum circuit simulation makes it well suited to running on GPUs, where it can obtain substantial speedups. Its drawback, however, is the enormous amount of memory it consumes; a single GPU…

6.
A new antialiasing approach for image compositing (Cited by: 1; self-citations: 0; cited by others: 1)

7.
To obtain compositing results with fewer artifacts, this paper proposes a matting algorithm fused with image compositing, unifying matting and compositing into a single process. By introducing information about the target background into the matting process, the new algorithm can perform matting in a more targeted way. Theoretical analysis and experimental results show that when the target background is similar to the source image, the new algorithm effectively reduces the impact of matting estimation errors; when the target background differs substantially in color from the source image, it still yields good compositing results.

8.
Because of intensive inter-node communication, image compositing has always been a bottleneck in parallel visualization systems. In a heterogeneous networking environment, the variation of link bandwidth and latency adds further uncertainty to the system performance. In this paper, we present a pipelining image compositing algorithm for heterogeneous networking environments that is able to rearrange the direction of data flow of a compositing pipeline under a strict ordering constraint. We introduce a novel directional image compositing operator that specifies not only the color and α channels of the output but also the direction of data flow when performing compositing. Based on this new operator, we thoroughly study the properties of image compositing pipelines in heterogeneous environments. We develop an optimization algorithm that finds the optimal pipeline from an exponentially large search space in polynomial time. We conducted a comprehensive evaluation on the ns-3 network simulator. Experimental results demonstrate the efficiency of our method.

9.
NPU-based image compositing in a distributed visualization system (Cited by: 1; self-citations: 0; cited by others: 1)
This paper describes the first use of a network processing unit (NPU) to perform hardware-based image composition in a distributed rendering system. The image composition step is a notorious bottleneck in a clustered rendering system. Furthermore, image compositing algorithms do not necessarily scale as data size and number of nodes increase. Previous researchers have addressed the composition problem via software and/or custom-built hardware. We used the heterogeneous multicore computation architecture of the Intel IXP28XX NPU, a fully programmable commercial off-the-shelf (COTS) technology, to perform the image composition step. With this design, we have attained a nearly four-times performance increase over traditional software-based compositing methods, achieving sustained compositing rates of 22-28 fps on a 1,024×1,024 image. This system is fully scalable with a negligible penalty in frame rate, is entirely COTS, and is flexible with regard to operating system, rendering software, graphics cards, and node architecture. The NPU-based compositor has the additional advantage of being a modular compositing component that is eminently suitable for integration into existing distributed software visualization packages.

10.
Scientific datasets of large volumes generated by next-generation computational sciences need to be transferred and processed for remote visualization and distributed collaboration among a geographically dispersed team of scientists. Parallel visualization using high-performance computing facilities is a typical approach to processing such increasingly large datasets. We propose an optimized image compositing scheme with linear pipeline and adaptive transport to support efficient image delivery to a remote client. The proposed scheme arranges an arbitrary number of parallel processors within a cluster in a linear order and divides the image into a carefully selected number of segments, which flow through the linear in-cluster pipeline and wide-area networks to the remote client consecutively. We analytically determine the segment size that minimizes the final image display time and derive the conditions where the proposed image compositing and delivery scheme outperforms the traditional schemes including the binary swap algorithm. In order to match the transport throughput for image delivery over wide-area networks to the pipelining rate for image compositing within the cluster, we design a class of transport protocols using stochastic approximation methods that are able to stabilize the data flow at a target rate. The experimental results from remote visualization of large-scale scientific datasets justify the correctness of our theoretical analysis and illustrate the superior performance of the proposed method.
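The trade-off behind the "carefully selected number of segments" can be illustrated with a toy pipeline model: more segments overlap the pipeline stages better but pay a fixed per-segment overhead. The timing formula, parameter values, and overhead term below are assumptions for illustration only, not the paper's actual analysis.

```python
import math

def display_time(num_segments, num_stages, image_bytes, rate, per_segment_overhead):
    """Toy pipeline model: time for the last of m segments to leave an n-stage linear
    pipeline, with each stage spending S/(m*r) + c seconds per segment."""
    stage_time = image_bytes / (num_segments * rate) + per_segment_overhead
    return (num_stages + num_segments - 1) * stage_time

# Illustrative parameters (assumptions, not from the paper).
n, S, r, c = 16, 32e6, 1e9, 2e-3         # stages, image bytes, bytes/s, seconds overhead

best_m = min(range(1, 2000), key=lambda m: display_time(m, n, S, r, c))
analytic_m = math.sqrt((n - 1) * S / (c * r))    # stationary point of this toy model
print(best_m, round(analytic_m, 1), round(display_time(best_m, n, S, r, c), 4))
```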

11.
This study strives to establish an objective basis for image compositing in satellite oceanography. Image compositing is a powerful technique for cloud filtering that often emphasizes cloud clearing at the expense of obtaining synoptic coverage. Although incomplete cloud removal in image compositing is readily apparent, the loss of synopticity, often, is not. Consequently, the primary goal of image compositing should be to obtain the greatest amount of cloud-free coverage or clarity in a period short enough that synopticity, to a significant degree, is preserved. To illustrate the process of image compositing and the problems associated with it, we selected a region off the coast of California and constructed two 16-day image composites, one, during the spring, and the second, during the summer of 2006, using Advanced Very High Resolution Radiometer (AVHRR) InfraRed (IR) satellite imagery. Based on the results of cloud clearing for these two 16-day sequences, rapid cloud clearing occurred up to day 4 or 5, followed by much slower cloud clearing out to day 16, suggesting an explicit basis for the growth in cloud clearing. By day 16, the cloud clearing had, in most cases, exceeded 95%. Based on these results, a shorter compositing period could have been employed without a significant loss in clarity. A method for establishing an objective basis for selecting the period for image compositing is illustrated using observed data. The loss in synopticity, which, in principle, could be estimated from pattern correlations between the images in the composite, was estimated from a separate time series of SST since the loss of synopticity, in our approach, is only a function of time. The autocorrelation function of the detrended residuals provided the decorrelation time scale and the basis for the decay process, which, together, define the loss of synopticity. The results show that (1) the loss of synopticity and the gain in clarity are inversely related, (2) an objective basis for selecting a compositing period corresponds to the day number where the decay and growth curves for synopticity and clarity intersect, and (3), in this case, the point of intersection occurred 3.2 days into the compositing period. By applying simple mathematics it was shown that the intersection time for the loss in synopticity and the growth in clarity is directly proportional to the initial conditions required to specify the clarity at the beginning of the compositing period, and inversely proportional to the sum of the rates of growth for clarity and the loss in synopticity. Finally, we consider these results to be preliminary in nature, and, as a result, hope that future work will bring forth significant improvements in the approach outlined in this study.
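One way to read the stated proportionality (intersection time set by the initial clarity and the sum of the two rates) is a linearized toy model in which clarity grows and synopticity decays at constant rates. The sketch below is only that reading, with made-up numbers; it is not the paper's exponential-decay analysis or its fitted parameters.

```python
def intersection_time(initial_clarity, clarity_rate, synopticity_loss_rate):
    """Linearized toy model: clarity(t) = C0 + g*t grows while synopticity(t) = 1 - d*t
    decays; they intersect at t* = (1 - C0) / (g + d), i.e. proportional to the initial
    clarity deficit and inversely proportional to the sum of the two rates."""
    return (1.0 - initial_clarity) / (clarity_rate + synopticity_loss_rate)

# Illustrative numbers only (not fitted to the paper's data): with these rates the curves
# cross a few days into the compositing period, comparable to the reported 3.2 days.
print(round(intersection_time(initial_clarity=0.20, clarity_rate=0.15,
                              synopticity_loss_rate=0.10), 1), "days")
```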

12.

The main objective of this study was to compare the adequacy of various multitemporal image compositing algorithms to produce composite images suitable for burned area analysis. Satellite imagery from the NOAA Advanced Very High Resolution Radiometer (AVHRR) from three different regions (Portugal, central Africa, and South America) was used to compare six algorithms, two of which involve the sequential application of two criteria. Performance of the algorithms was assessed with the Jeffries-Matusita distance, to quantify spectral separability of the burned and unburned classes in the composite images. The ability of the algorithms to avoid the retention of cloud shadows was assessed visually with red-green-blue colour composites, and the level of radiometric speckle in the composite images was quantified with the Moran's I spatial autocorrelation statistic. The commonly used NDVI maximum value compositing procedure was found to be the least appropriate to produce composites to be used for burned area mapping, from all standpoints. The best spectral separability is provided by the minimum channel 2 (m2) compositing approach, which has, however, the drawback of retaining cloud shadows. A two-criterion approach which complements m2 with maximization of brightness temperature in a subset of the data (m2M4) is considered the better method.
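For reference, the Jeffries-Matusita distance used here as the separability measure is commonly computed from the Bhattacharyya distance under Gaussian class statistics, JM = 2(1 − e^(−B)). The sketch below uses that standard form with made-up class statistics; it is not the paper's data or code.

```python
import numpy as np

def jeffries_matusita(mu1, cov1, mu2, cov2):
    """Jeffries-Matusita distance between two classes under Gaussian class statistics:
    JM = 2 * (1 - exp(-B)), with B the Bhattacharyya distance. JM ranges from 0
    (classes inseparable) to 2 (classes fully separable)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov_mean = (cov1 + cov2) / 2.0
    d = mu2 - mu1
    b = d @ np.linalg.inv(cov_mean) @ d / 8.0 \
        + 0.5 * np.log(np.linalg.det(cov_mean)
                       / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return 2.0 * (1.0 - np.exp(-b))

# Made-up two-band class statistics standing in for "burned" vs "unburned" pixels.
jm = jeffries_matusita([0.05, 305.0], [[1e-4, 0.0], [0.0, 4.0]],
                       [0.25, 295.0], [[4e-4, 0.0], [0.0, 9.0]])
print(round(float(jm), 3))   # values close to 2 indicate well-separated classes
```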

13.
This paper introduces two methods for digital image acquisition under Windows: one develops against the functions in the toolbox shipped with the image capture card, and the other uses VFW (Video for Windows), the video development component provided by Windows. For the first method, the authors take the QP-300 video capture card made by Daheng as an example and describe in detail how to use the vendor-supplied functions for real-time video capture and image acquisition; with the second method, developers can readily write acquisition programs without knowing the hardware characteristics or the functions in the vendor's SDK.

14.
When run on clusters of GPUs, simple algorithms for executing a Breadth-First Search (BFS) on large graphs lead to load imbalance among threads and uncoalesced memory accesses, resulting in poor performance. To obtain a significant improvement on a single GPU and to scale across multiple GPUs, we resort to a suitable combination of operations to rearrange data before processing them. We propose a novel technique for mapping threads to data that achieves a perfect load balance by leveraging prefix-sum and binary search operations. To reduce the communication overhead, we perform a pruning operation on the set of edges that needs to be exchanged at each BFS level. The result is an algorithm that fully exploits the parallelism available on a single GPU and minimizes communication among GPUs. We show that a cluster of GPUs can efficiently perform a distributed BFS on graphs with billions of nodes.
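A minimal sketch of the prefix-sum-plus-binary-search mapping idea described above, written as sequential NumPy for clarity rather than as the paper's CUDA kernels; the example frontier, degree values, and function name are illustrative assumptions.

```python
import numpy as np

def map_work_items_to_vertices(frontier_degrees):
    """Load-balanced mapping of work items to frontier vertices: an exclusive prefix sum
    over vertex degrees gives each vertex's starting offset, and each work item (one per
    edge) locates its owning vertex with a binary search into that prefix sum."""
    degrees = np.asarray(frontier_degrees)
    offsets = np.concatenate(([0], np.cumsum(degrees)))     # exclusive prefix sum
    work_items = np.arange(offsets[-1])                      # one work item per edge
    # side='right' minus 1 finds, for each work item, the vertex whose range covers it.
    owner = np.searchsorted(offsets, work_items, side='right') - 1
    local_edge = work_items - offsets[owner]                 # edge index within the vertex
    return owner, local_edge

# Illustrative frontier with very uneven degrees (the case that unbalances naive mappings).
owner, local_edge = map_work_items_to_vertices([1, 7, 0, 3])
print(owner.tolist())        # [0, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3]
print(local_edge.tolist())   # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2]
```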

15.
Modern HPC systems are composed of multicore processors and powerful graphics processing units. Adapting existing code and libraries to these new systems is a fundamental problem because of the significant increase in programming difficulty. Heterogeneity, at both the architectural and programming levels, raises the programmability wall. The performance of the code is affected by the strong interdependence between the code and the parallel architecture. We have developed a dynamic load balancing library that allows parallel code to be adapted to a wide variety of heterogeneous systems. The overhead introduced by our system is minimal and the cost to the programmer negligible. This system has been successfully applied to solve load imbalance problems appearing in homogeneous and heterogeneous multi-GPU platforms. We consider the Dynamic Programming technique as a case study to validate our proposals in different heterogeneous scenarios on multi-GPU systems.
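To illustrate the kind of adjustment such a dynamic load balancing library performs, here is a toy rebalancing step that redistributes work in proportion to each device's measured throughput; the function name, update rule, and example timings are assumptions for illustration, not the library's API.

```python
def rebalance(work_items, measured_times, previous_shares):
    """Proportionally rebalance work across heterogeneous devices: a device that
    finished its previous share quickly gets more work in the next round.
    Throughput is estimated as share / time; shares are then renormalized."""
    throughputs = [share / t for share, t in zip(previous_shares, measured_times)]
    total = sum(throughputs)
    new_shares = [max(1, round(work_items * tp / total)) for tp in throughputs]
    # Fix rounding drift so the shares sum exactly to the amount of work.
    new_shares[-1] += work_items - sum(new_shares)
    return new_shares

# Illustrative round: a CPU and two dissimilar GPUs (timings are made up).
shares = [100, 100, 100]                 # previous distribution of 300 items
times = [4.0, 1.0, 2.0]                  # seconds each device took
print(rebalance(300, times, shares))     # -> roughly [43, 171, 86]
```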

16.
Recent advances in neuroscientific understanding have highlighted the highly parallel computation power of the mammalian neocortex. In this paper we describe a GPGPU-accelerated implementation of an intelligent learning model inspired by the structural and functional properties of the neocortex. Furthermore, we consider two inefficiencies inherent to our initial implementation and propose software optimizations to mitigate such problems. Analysis of our application's behavior and performance provides important insights into the GPGPU architecture, including the number of cores, the memory system, atomic operations, and the global thread scheduler. Additionally, we create a runtime profiling tool for the cortical network that proportionally distributes work across the host CPU as well as multiple GPGPUs available to the system. Using the profiling tool with these optimizations on Nvidia's CUDA framework, we achieve up to 60× speedup over a single-threaded CPU implementation of the model.

17.
In the last two decades, we have seen remarkable development of image processing techniques targeted at medical applications. We propose multi-GPU-based parallel real-time algorithms for segmentation and shape-based object detection, aiming at accelerating two medical image processing methods: automated blood detection in wireless capsule endoscopy (WCE) images and automated bright lesion detection in retinal fundus images. In the former method we identified segmentation and object detection as responsible for consuming most of the global processing time, while in the latter, since segmentation was not used, shape-based object detection was identified as the compute-intensive task. Experimental results show that the accelerated method running on multi-GPU systems for blood detection in WCE images is on average 265 times faster than the original CPU version and is able to process 344 frames per second. By applying the multi-GPU framework to bright lesion detection in fundus images, we are able to process 62 frames per second, an average speedup of 667× over the equivalent CPU version.

18.
Athanas, P.M.; Abbott, A.L. Computer, 1995, 28(2): 16-25
The authors explore the utility of custom computing machinery for accelerating the development, testing, and prototyping of a diverse set of image processing applications. We chose an experimental custom computing platform called Splash-2 to investigate this approach to prototyping real-time image processing designs. Custom computing platforms are emerging as a class of computers that can provide near application-specific computational performance. We developed a real-time image processing system called VTSplash, based on the Splash-2 general-purpose platform. Splash-2 is an attached processor featuring programmable processing elements (PEs) and communication paths. The Splash-2 system uses arrays of RAM-based field-programmable gate arrays (FPGAs), crossbar networks, and distributed memory to achieve the needed flexibility and performance. Such platforms let designers customize specific operations for function and size, and data paths for individual applications.

19.
Persistent memory (PM) is non-volatile, byte-addressable, low-latency, and high-capacity; it breaks down the traditional boundary between memory and external storage and has a disruptive impact on existing software architectures. However, current PM hardware still suffers from problems such as wear imbalance and read/write asymmetry; in particular, when PM is accessed across NUMA (non-uniform memory access) nodes, there is severe I/O performance degradation. This paper proposes a NUMA-aware optimized design for a PM storage engine and applies it to GoldenX, ZTE's new-generation database system, significantly reducing the cost of cross-NUMA-node accesses to persistent memory. The main contributions include: a cross-NUMA-node data-space distribution strategy and distributed access model for a hybrid DRAM+PM memory architecture, enabling efficient use of the PM data space; to address the high cost of cross-NUMA PM accesses, an I/O proxy routine access method that converts a cross-NUMA PM access into one remote DRAM memory copy plus a local PM access, together with a Cache Line Area (CLA) cache-page mechanism that alleviates the I/O write amplification problem and improves the efficiency of local PM access; and an extension of the traditional tablespace concept so that each tablespace has both its own independent table-data storage and dedicated WAL (write-ahead logging) storage; targeting this distributed WA…

20.
Video compositing, the editing and integrating of many video sequences into a single presentation, is an integral part of advanced multimedia services. Single-user compositing systems have been suggested in the past, but when they are extended to accommodate many users, the amount of memory required quickly grows out of hand. We propose two new architectures for digital video compositing in a multiuser environment that are memory-efficient and can operate in real time. Both architectures decouple the task of memory management from compositing processing. We show that under hard throughput and bandwidth constraints, a memoryless solution for transferring data from many video sources to many users does not exist. We overcome this using (i) a dynamic memory buffering architecture and (ii) a constant memory bandwidth solution that transforms the sources-to-users transfer schedule into two schedules, then pipelines the computation. The architectures support opaque overlapping of images, arbitrarily shaped images, and images whose shapes change dynamically from frame to frame.
