Similar Literature
20 similar documents found (search time: 31 ms)
1.
Modern GPUs generally provide dedicated hardware (such as texture units, rasterization units, and various on-chip caches) to accelerate the processing and display of 2D images, and the corresponding programming models (CUDA, OpenCL) define specific programming interfaces (CUDA texture memory, OpenCL image objects) so that image applications can exploit this hardware support. Taking the optimization of a typical image-blurring algorithm on AMD GPUs as an example, this paper explores the range of applicability of OpenCL image objects in image-algorithm optimization, and in particular analyzes their advantages and disadvantages relative to the more general optimization approach based on global memory combined with on-chip local memory. Experimental results show that image objects yield a clear performance improvement only when the image has four channels and the amount of data to be cached during computation is small; in all other cases, global memory plus local memory achieves better performance. The optimized algorithm achieves a speedup of 200-1000x over a carefully implemented CPU version, and 1.3-5x over the corresponding functions in the NVIDIA NPP library.
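The stencil computation being optimized can be sketched as a separable box blur on a multi-channel image. This NumPy version is only a CPU reference for illustration, not the paper's code: on the GPU, the same neighborhood reads would come either from an OpenCL image object (where the sampler handles edge clamping) or from global memory staged through local memory.

```python
import numpy as np

def box_blur(img, radius=1):
    """Separable box blur; img is (H, W, C), edges clamped.

    Illustrative CPU sketch of the kernel discussed above; the GPU
    versions differ only in where the neighborhood reads come from.
    """
    k = 2 * radius + 1
    # vertical pass (clamp edges, average k rows)
    pad = np.pad(img, ((radius, radius), (0, 0), (0, 0)), mode="edge")
    v = sum(pad[i:i + img.shape[0]] for i in range(k)) / k
    # horizontal pass (clamp edges, average k columns)
    pad = np.pad(v, ((0, 0), (radius, radius), (0, 0)), mode="edge")
    return sum(pad[:, i:i + img.shape[1]] for i in range(k)) / k
```

A constant image is a fixed point of the blur, which makes a convenient sanity check.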

2.
Design and Implementation of the Late Pixel-Processing Stage of a 3D Graphics Pipeline (Cited by 1: 0 self-citations, 1 other)
Targeting the real-time, high-volume data processing and memory read/write requirements of the late pixel-processing stage of a 3D graphics pipeline, together with the resource and power constraints peculiar to embedded systems, this paper presents a hardware design for the late pixel-processing stage. The design first implements all test functions to guarantee the various rendering effects; second, it adopts a tile-based (screen-subdivision) rendering scheme to reduce memory requirements; it then incorporates the Early-Z algorithm to discard invisible triangle data as early as possible, reducing the amount of data to be rendered; finally, it implements the FlipQuad antialiasing algorithm to improve image quality. The module has been modeled at the RTL level and verified on an FPGA.
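The Early-Z idea mentioned above can be sketched in a few lines of software: fragments that fail the depth test are discarded before any shading work is spent on them. This is only an illustrative analogue; the paper's version is RTL hardware.

```python
import numpy as np

def early_z_filter(fragments, depth_buffer):
    """Discard fragments that fail the depth test before shading.

    Each fragment is (x, y, z); only fragments strictly nearer than the
    stored depth survive and update the buffer. Software sketch of the
    Early-Z rejection described in the abstract above.
    """
    survivors = []
    for x, y, z in fragments:
        if z < depth_buffer[y, x]:       # nearer than what is stored
            depth_buffer[y, x] = z       # update the depth buffer
            survivors.append((x, y, z))  # only these reach shading
    return survivors
```

For example, of two fragments landing on the same pixel, only the nearer one reaches the shading stage.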

3.
《Parallel Computing》2002,28(7-8):1111-1139
Multimedia processing is becoming increasingly important, with a wide variety of applications ranging from multimedia cell phones to high-definition interactive television. Media processing techniques typically involve the capture, storage, manipulation and transmission of multimedia objects such as text, handwritten data, audio objects, still images, 2D/3D graphics, animation and full-motion video. A number of implementation strategies have been proposed for processing multimedia data. These approaches can be broadly classified into two major categories, namely (i) general-purpose processors with programmable media processing capabilities, and (ii) dedicated implementations (ASICs). We have performed a detailed complexity analysis of the recent multimedia standard MPEG-4, which has shown the potential for reconfigurable computing, which adapts the underlying hardware dynamically in response to changes in the input data or processing environment. We therefore propose a methodology for designing a reconfigurable media processor. This involves hardware-software co-design implemented in the form of a parser, profiler, recurring-pattern analyzer, and spatial and temporal partitioner. The proposed methodology enables efficient partitioning of resources for complex and time-critical multimedia applications.

4.
A large number of remote-sensing techniques and image-based photogrammetric approaches allow an efficient generation of massive 3D point clouds of our physical environment. The efficient processing, analysis, exploration, and visualization of massive 3D point clouds constitute challenging tasks for applications, systems, and workflows in disciplines such as urban planning, environmental monitoring, disaster management, and homeland security. We present an approach to segment massive 3D point clouds according to object classes of virtual urban environments including terrain, building, vegetation, water, and infrastructure. The classification relies on analysing the point cloud topology; it does not require per-point attributes or representative training data. The approach is based on an iterative multi-pass processing scheme, where each pass focuses on different topological features and considers already detected object classes from previous passes. To cope with the massive amount of data, out-of-core spatial data structures and graphics processing unit (GPU)-accelerated algorithms are utilized. Classification results are discussed based on a massive 3D point cloud with almost 5 billion points of a city. The results indicate that object-class-enriched 3D point clouds can substantially improve analysis algorithms and applications as well as enhance visualization techniques.
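A hypothetical simplification of one such pass is shown below: a terrain pass that labels every point lying within a tolerance of the lowest point of its grid cell, leaving the remaining points for later passes (building, vegetation, and so on). This grid heuristic is an assumption for illustration, not the paper's topology-based method.

```python
import numpy as np

def ground_pass(points, cell=1.0, tol=0.2):
    """Label as terrain the points within `tol` of the minimum height
    of their (cell x cell) grid cell. points is an (N, 3) array;
    returns a boolean mask. Illustrative first pass of a multi-pass
    segmentation, not the paper's algorithm."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    keys = [tuple(k) for k in ij]
    zmin = {}
    for k, z in zip(keys, points[:, 2]):       # per-cell minimum height
        zmin[k] = min(zmin.get(k, np.inf), z)
    return np.array([points[i, 2] <= zmin[k] + tol
                     for i, k in enumerate(keys)])
```

Points well above their local minimum (a rooftop point, say) are left unlabeled for subsequent passes.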

5.
The 3D discrete cosine transform and its inverse (3D DCT/IDCT) extend the spatial compression properties of conventional 2D DCT to the spatio-temporal coding of 2D videos. The 3D DCT/IDCT transform is particularly suited for embedded systems needing the low-complexity implementation of both video encoder and decoder, such as mobile terminals with video-communication capabilities. This paper addresses the problem of real-time and low-power 3D DCT/IDCT processing by presenting a context-aware fast transform algorithm and a family of VLSI architectures characterized by different levels of parallelism. Implemented in submicron CMOS technology, the proposed hardware macrocells support the real-time processing of main video formats (up to high definition ones with an input rate of tens of Mpixels/s) with different trade-offs between circuit complexity, power consumption and computational throughput. Voltage scaling and adaptive clock-gating strategies are applied to reduce the power consumption versus the state of the art.
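The 3D DCT is separable: it can be computed as three 1D DCT-II transforms, one along each axis, and (with orthonormal scaling) inverted by applying the transposed matrices. The reference sketch below illustrates that structure; the paper's contribution is a fast, context-aware hardware realization, not this direct matrix form.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def _apply(m, arr, axis):
    # Multiply m along the given axis, keeping the axis order intact.
    return np.moveaxis(np.tensordot(m, arr, axes=(1, axis)), 0, axis)

def dct3(cube):
    """3D DCT computed separably: one 1D transform per axis."""
    out = np.asarray(cube, dtype=float)
    for axis in range(3):
        out = _apply(dct_matrix(out.shape[axis]), out, axis)
    return out

def idct3(coeffs):
    """Inverse 3D DCT: orthonormal matrices invert by transposition."""
    out = np.asarray(coeffs, dtype=float)
    for axis in range(3):
        out = _apply(dct_matrix(out.shape[axis]).T, out, axis)
    return out
```

A constant block compacts all of its energy into the single DC coefficient, which is the property spatio-temporal coding exploits.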

6.
The impressive progress of rendering software and hardware over the last two decades often leads, too rapidly, to the conclusion that high-quality 3D imagery can now be incorporated in all sorts of applications. Interestingly, these advances allow more and more complex applications to be envisioned; however, an increase in processing power is not necessarily used to treat the same problem faster, but also creates a desire to attack larger problems. In many ways the models for visual simulation or engineering applications grow faster than the graphics systems! Recent work on image-based rendering and modeling shows a growing awareness that traditional 3D methods may not scale well for the current and coming complexity levels. This talk will examine some of the challenges lying ahead for the development of future graphics applications. Specifically: when is it better to use pixels than polygons? When is a 3D model required? How can we mix and match competing approaches? Can image-based approaches help for different applications such as lighting simulations? Some issues related to network applications will also be discussed.

7.
Research on a Fast Warp Transformation Algorithm for Hybrid Geometry- and Image-Based Rendering (Cited by 1: 0 self-citations, 1 other)
In hybrid geometry- and image-based rendering, the 3D warp algorithm is grounded in a rigorous mathematical transformation and therefore guarantees accurate projection relationships, but it requires a large amount of arithmetic during the real-time rendering stage, so its time complexity is high. This paper proposes a new fast warp transformation algorithm. Built on the 3D warp algorithm, it adopts a new projection process that reduces the time complexity substantially compared with 3D warping (by a factor of about 7.51). Moreover, the algorithm has a pipelined structure, so it can make effective use of existing acceleration hardware without any change to the graphics hardware architecture.
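The per-pixel arithmetic that makes 3D warping expensive is the back-project/re-project round trip shown below. This is a generic sketch of the standard 3D warp (shared intrinsics K are an assumption for simplicity), not the paper's accelerated variant.

```python
import numpy as np

def warp_pixel(u, v, depth, K, R, t):
    """Reproject one reference-view pixel (u, v) with known depth into
    a new view with rotation R and translation t (3D warping). K is
    the 3x3 intrinsic matrix, assumed shared by both views."""
    p = np.linalg.inv(K) @ np.array([u, v, 1.0]) * depth  # back-project
    q = K @ (R @ p + t)                                   # re-project
    return q[0] / q[2], q[1] / q[2]                       # dehomogenize
```

With the identity transform a pixel maps to itself, and a pure camera translation shifts it by the expected parallax.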

8.
9.
The rapid advance of computer hardware and the popularity of multimedia applications enable multi-core processors with sub-word parallelism instructions to become a dominant market trend in desktop PCs as well as high-end mobile devices. This paper presents an efficient parallel implementation of a 2D convolution algorithm, which demands high-performance computing power, on multi-core desktop PCs. It is a representative computation-intensive algorithm in image and signal processing applications, accompanied by heavy memory access; on the other hand, its computational complexity is relatively low. The purpose of this study is to explore the effectiveness of exploiting the streaming SIMD (Single Instruction Multiple Data) extensions (SSE) technology and the TBB (Threading Building Blocks) run-time library on Intel multi-core processors. By doing so, we can take advantage of all the hardware features of a multi-core processor concurrently for data- and task-level parallelism. For the performance evaluation, we implemented a 3 × 3 kernel-based convolution algorithm using SSE2 and TBB in different combinations and compared their processing speeds. The experimental results show that both technologies have a significant effect on performance and that processing speed can be greatly improved when the two are used together; for example, speedups of 6.2, 6.1, and 1.4 times over the implementation using either technology alone are obtained for 256 × 256, 512 × 512, and 1024 × 1024 data sets, respectively.
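The data-level restructuring that SSE exploits can be illustrated in NumPy: instead of a per-pixel scalar loop, the 3 × 3 kernel becomes nine shifted whole-array multiply-adds. A sketch of the idea (in cross-correlation form, as is common in image processing), not the paper's SSE2/TBB code:

```python
import numpy as np

def conv3x3_naive(img, k):
    """Scalar reference: 3x3 'valid' filtering with explicit loops."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(img[y:y + 3, x:x + 3] * k)
    return out

def conv3x3_vector(img, k):
    """Data-parallel version: one shifted multiply-add per kernel tap,
    the same restructuring SIMD exploits (here via whole-array ops)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out
```

Task-level parallelism (the TBB side) would then split `out` into row bands processed by different threads.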

10.
This paper presents a hardware architecture using Dichotomous Coordinate Descent (DCD) iterations for Adaptive Weight Computation (AWC) in the Minimum Variance Distortionless Response (MVDR) beamformer. The objective of the proposed work is a low-latency, reduced-area architecture for the AWC stage of the MVDR beamformer. The work investigates the computation of adaptive weights for 4-, 8-, 16- and 32-channel array beamformers. To improve the weight-updating rate, the existing complex-valued cyclic DCD hardware implementation is optimized to 2 clock cycles per iteration. Moreover, the DCD algorithm implementation requires no multiplication or division. The proposed design is implemented on an FPGA platform and the results are compared with the state-of-the-art literature, leading to the conclusion that the proposed architecture is suitable for MVDR beamformers employed in high-sampling-rate applications such as medical ultrasound imaging, owing to its moderate resource occupancy and improved processing speed.
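A real-valued sketch of cyclic DCD is given below to show why it is hardware-friendly: solving R w = b uses only sign tests, additions, and step halvings, with step sizes that are powers of two (so the `s * d * R[:, k]` update is a shift-and-add in hardware). This is an illustrative simplification; the paper's design is complex-valued RTL, and the parameter names here are assumptions.

```python
import numpy as np

def dcd_solve(R, b, h_bits=4, amplitude=1.0, n_iter=64):
    """Cyclic dichotomous coordinate descent for R w = b, with R
    symmetric positive definite. Sketch of the algorithm family named
    above, not the paper's implementation."""
    n = len(b)
    w = np.zeros(n)
    r = b.copy()                 # residual b - R w
    d = amplitude                # current step size (power of two)
    for _ in range(n_iter):      # each iteration is one cyclic pass
        updated = False
        for k in range(n):
            if abs(r[k]) > (d / 2) * R[k, k]:
                s = np.sign(r[k])
                w[k] += s * d                # +/- step on coordinate k
                r -= s * d * R[:, k]         # shift-and-add in hardware
                updated = True
        if not updated:
            d /= 2               # the "dichotomous" step halving
            if d < amplitude / (1 << h_bits):
                break
    return w
```

For a solution whose entries are dyadic fractions the iteration lands on it exactly.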

11.
3D video has recently seen a massive increase in exposure in our lives. However, differences between the viewing and shooting conditions for a film lead to disparities between the reproduced media and the original three-dimensional effect, which cause severe visual fatigue to viewers and result in headaches and dizziness. In this paper, a series of image processing algorithms is introduced to overcome these problems. The image processing pipeline is composed of four steps: eye-pupil detection, stereo correspondence computation, saliency map generation, and 3D warping. Each step is implemented in a stereoscopic 3D (S3D) rendering system and its time complexity is measured. From the results, it was found that real-time stereoscopic 3D rendering is impossible using only a software implementation, because SIFT and optical-flow calculation require a significant amount of time. Therefore, these two algorithm blocks should be implemented with hardware acceleration. Fortunately, active research is being conducted on these issues and real-time processing is expected to become available soon for applications beyond full-HD TV screens. In addition, it was found that the saliency map generation and 3D warping blocks also need to be implemented in hardware for full-HD display, although their time complexity is not significant compared with the SIFT and optical-flow algorithm blocks.

12.
This paper describes a set of methods that make it possible to estimate the position of a feature inside a three-dimensional (3D) space by starting from a sequence of two-dimensional (2D) acoustic images of the seafloor acquired with a sonar system. Typical sonar imaging systems are able to generate just 2D images, and the acquisition of 3D information involves sharp increases in complexity and costs. The front-scan sonar proposed in this paper is a new equipment devoted to acquiring a 2D image of the seafloor to sail over, and allows one to collect a sequence of images showing a specific feature during the approach of the ship. This fact seems to make it possible to recover the 3D position of a feature by comparing the feature positions along the sequence of images acquired from different (known) ship positions. This opportunity is investigated in the paper, where it is shown that encouraging results have been obtained by a processing chain composed of some blocks devoted to low-level processing, feature extraction and analysis, a Kalman filter for robust feature tracking, and some ad hoc equations for depth estimation and averaging. A statistical error analysis demonstrated the great potential of the proposed system even if some inaccuracies affect the sonar measurements and the knowledge of the ship position. This was also confirmed by several tests performed on both simulated and real sequences, obtaining satisfactory results on both the feature tracking and, above all, the estimation of the 3D position.
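The feature-tracking block can be sketched as a constant-velocity Kalman filter over a scalar feature coordinate: each step predicts the state (position, velocity) and then corrects it with the measured position. A minimal stand-in for the robust tracker above; the noise levels `q` and `r` are assumptions.

```python
import numpy as np

def track(measurements, dt=1.0, q=1e-3, r=0.25):
    """1D constant-velocity Kalman filter; returns position estimates.
    Illustrative sketch, not the paper's tracker."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # assumed process noise
    x = np.array([measurements[0], 0.0])    # initial (pos, vel)
    P = np.eye(2)
    out = []
    for z in measurements:
        x = F @ x                           # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + r                 # innovation covariance
        K = P @ H.T / S                     # Kalman gain (2x1)
        x = x + (K * (z - H @ x)).ravel()   # correct with measurement
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return out
```

Because the motion model matches, a feature moving at constant velocity is tracked with vanishing error after a short transient.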

13.
This paper presents the complex environment that was built to ease the prototyping of real-time applications on the PAPRICA-3 massively parallel system. Applications are developed in C++ using high-level data types, and the corresponding Assembly code is automatically created by a code generator. A stochastic code optimizer takes the assembly code and improves it according to a genetic approach; due to the high computational power required by this approach, the stochastic code optimizer was implemented with MPI and runs in parallel on a cluster of workstations. The availability of this complex environment made it possible to test the performance of the system and to tune it according to some target applications before the actual development of the hardware. For this purpose a system-level simulator was also built to determine the number of clock cycles required to run a specific segment of code. The whole environment has been used to validate possible solutions for the hardware system and to develop, test, and tune several real-time image processing applications. The hardware system is now completely defined.

14.
The clipping operation is still the bottleneck of the graphics pipeline in spite of the latest developments in graphical hardware and a significant increase in performance. Algorithms for line and line segment clipping have been studied for a long time and many research papers have been published so far. This paper presents a new robust approach to line and line segment clipping using a rectangular window. A simple extension for the case of convex polygon clipping is presented as well. The presented approach does not require a division operation and uses homogeneous coordinates for input and output point representation. The proposed algorithms can take advantage of operations supported by vector-vector hardware. The main contribution of this paper is a new approach to intersection computations applied to line and line segment clipping. This approach leads to algorithms that are simpler, robust, and easy to implement.
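Division-free intersection in homogeneous coordinates rests on a standard projective-geometry fact: the line through two homogeneous points, and the intersection point of two homogeneous lines, are both cross products. The sketch below illustrates that primitive (the general duality, not the paper's specific clipping algorithm); division is deferred until, and only if, a Euclidean result is needed.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two homogeneous points: a cross product."""
    return np.cross(p, q)

def intersection(l1, l2):
    """Homogeneous intersection point of two lines: again a cross
    product, so no division occurs; the result can stay homogeneous,
    in the spirit of the clipping approach described above."""
    return np.cross(l1, l2)
```

For example, the segment from (0, 0) to (4, 4) meets the window edge x = 2 at (2, 2); normalizing is optional and happens only at output.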

15.
This paper describes the design of a reconfigurable architecture for implementing image processing algorithms. This architecture is a pipeline of small identical processing elements that contain a programmable logic device (FPGA) and dual-port memories. This processing system has been adapted to accelerate the computation of differential algorithms. Log-polar vision selectively reduces the amount of data to be processed and simplifies several vision algorithms, making their implementation possible using few hardware resources. The reconfigurable architecture has been designed with implementation in mind, and has been employed in an autonomous platform, which has power-consumption, size and weight restrictions. Two different vision algorithms have been implemented in the reconfigurable pipeline, for which some experimental results are shown. Received: 30 March 2001 / Accepted: 11 February 2002. This work has been supported by the Ministerio de Ciencia y Tecnología and FEDER under project TIC2001-3546. Correspondence to: J.A. Boluda

16.
Medical imaging scanners now exist that can generate 4D cardiac images. Since the heart moves, cardiac anatomy and physiology can be studied using 4D image sequences. Interactive manual 4D image analysis can be time-consuming and error-prone; automatic and semi-automatic methods have many advantages over manual segmentation. This paper describes a procedure for performing semi-automatic image segmentation on 4D image sequences. Our procedure is based on a small set of user-defined image-segmentation cues specified at certain time points in the sequence. These cues are then automatically interpolated or extrapolated for the remaining time points. The complete set of cues is interpreted and used to generate a sequence of image processing operations (such as operators for image enhancement, morphological processing, and region segmentation) that can subsequently segment the 4D image. This procedure permits 4D cardiac image segmentation with only a small amount of user interaction. The proposed approach compares favorably to results generated by defining cues on each individual volume and to results generated completely manually. The 4D approach also requires significantly less interaction time than pure manual analysis.
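The cue-propagation step can be sketched as simple interpolation: a scalar cue parameter (say, an intensity threshold, which is an assumed example) defined at a few key time points is linearly interpolated to every frame, with the nearest cue held constant outside the keyed range. This is an illustrative sketch of the idea, not the paper's procedure.

```python
import numpy as np

def interpolate_cues(cue_times, cue_values, n_frames):
    """Linearly interpolate user-defined cue values from key time
    points to all frames; beyond the keyed range the nearest cue is
    repeated (np.interp clamps at the endpoints)."""
    frames = np.arange(n_frames)
    return np.interp(frames, cue_times, cue_values)
```

With cues at frames 0 and 4, the intermediate frames get blended values and frame 5 reuses the last cue.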

17.
BD-GKS3D is a 3D graphics support package developed in accordance with the international graphics standard GKS-3D (ISO 8805). This paper discusses the software's design principles: conformance to the international standard and the highest possible efficiency. It also discusses implementation strategies, including compatibility between 2D GKS and the 3D version, clipping and transformations, the implementation environment, the data structures for segments, and 3D input. Finally, it comments on the development effort for BD-GKS3D and its relationship to PHIGS and CGI.

18.
A Probabilistic Model of 2D Line-Segment Clipping (Cited by 1: 0 self-citations, 1 other)
Clipping a 2D line segment against a rectangular window is one of the basic operations in computer graphics, and many clipping algorithms exist. Because these algorithms each have strengths and weaknesses in different situations, their performance can usually only be compared case by case, and their average performance cannot be compared directly. This paper first analyzes the probability distribution of the positional relationships between a segment and the window, yielding a probabilistic model of 2D line-segment clipping. The model is then used to compute the average operation counts of several common algorithms and to compare their average performance. The model also corrects some mistaken views in the literature concerning the probability distribution of segment-window positional relationships.

19.
Due to rapid technology advances, Multiprocessor Systems-on-Chip (MPSoCs) are likely to become commodity computing platforms for embedded applications. In the future, it is possible that an MPSoC will be equipped with a large number of processing elements as well as on-chip resources. The management of these resources faces many challenges, among which deadlock is one of the most crucial. This paper presents a novel hardware-oriented deadlock detection algorithm suitable for current and future MPSoCs. Unlike previously published methods, whose runtime complexities are often affected by the number of processing elements and resources in the system, the proposed algorithm leverages specialized hardware to guarantee O(1) overall runtime complexity. Such complexity is achieved by: 1) classifying resource allocation events; 2) for each type of event, using hardware to perform a set of specific detection and/or preparation operations that take only constant runtime; and 3) updating the necessary information for multiple resources in parallel in hardware. We implement the algorithm in Verilog HDL and demonstrate through simulation that each algorithm invocation takes at most four clock cycles.
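For contrast with the O(1) hardware approach, the classic software formulation detects deadlock as a cycle in the wait-for graph derived from hold and wait edges, which costs O(V+E) per invocation. This sketch shows that baseline, not the paper's algorithm.

```python
def has_deadlock(holds, waits):
    """Detect deadlock as a cycle in the wait-for graph.

    holds[p] / waits[p] are sets of resources process p holds / waits
    for (single-instance resources assumed). Classic O(V+E) software
    check, shown for contrast with the O(1) hardware scheme above."""
    # wait-for edge p -> q: p waits for a resource held by q
    owner = {r: p for p, rs in holds.items() for r in rs}
    graph = {p: {owner[r] for r in rs if r in owner}
             for p, rs in waits.items()}
    visiting, done = set(), set()

    def dfs(p):
        if p in done:
            return False
        if p in visiting:
            return True            # back edge: cycle found
        visiting.add(p)
        cyc = any(dfs(q) for q in graph.get(p, ()))
        visiting.discard(p)
        done.add(p)
        return cyc

    return any(dfs(p) for p in graph)
```

Two processes each holding what the other requests form a cycle; a one-way wait does not.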

20.
Discrete relaxation techniques have proven useful in solving a wide range of problems in digital signal and digital image processing, artificial intelligence, operations research, and machine vision. Much work has been devoted to finding efficient hardware architectures. This paper shows that a conventional hardware design for a Discrete Relaxation Algorithm (DRA) suffers from O(n²m³) time complexity and O(n²m²) space complexity. By reformulating the DRA into a parallel computational tree and using a multiple tree-root pipelining scheme, time complexity is reduced to O(nm), while the space complexity is reduced by a factor of 2. For certain relaxation processing, the space complexity can even be decreased to O(nm). Furthermore, a technique for dynamically configuring an architectural wavefront is used, which leads to an O(n)-time, highly concurrent DRA3 architecture.
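The core of discrete relaxation can be sketched as arc-consistency filtering (AC-3 style): labels that have no supporting label under a binary constraint are deleted, and deletions are propagated until a fixed point. This serial software analogue is only illustrative of what the parallel DRA hardware above accelerates.

```python
from collections import deque

def relax(domains, constraints):
    """Discrete relaxation by support deletion.

    domains[i] is a set of candidate labels for variable i;
    constraints[(i, j)](a, b) is True when label a at i is compatible
    with label b at j (symmetric entries assumed present for both
    directions). Illustrative sketch, not the paper's DRA."""
    queue = deque(constraints)
    while queue:
        i, j = queue.popleft()
        ok = constraints[(i, j)]
        # delete labels at i with no support in the domain of j
        pruned = {a for a in domains[i]
                  if not any(ok(a, b) for b in domains[j])}
        if pruned:
            domains[i] -= pruned
            # revisit every arc whose support set just shrank
            queue.extend(arc for arc in constraints if arc[1] == i)
    return domains
```

On two variables over {1, 2, 3} constrained by x0 < x1, relaxation deletes the unsupported extremes from each domain.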


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号