Similar Literature
1.
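Using the SOPC Builder custom-component method, an IP core for the JPEG2000 image compression algorithm is designed. Based on an in-depth analysis of the parallelism in JPEG2000 compression, a design is proposed that exploits this parallelism to accelerate, in hardware, the EBCOT coding stage of the compression process. Tests on the DE2 development platform show that the JPEG2000 IP core designed in this paper compresses noticeably faster than comparable processors with a serial architecture.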
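EBCOT codes each code-block independently, one magnitude bit-plane at a time, which is the parallelism a hardware design can exploit. The sketch below illustrates only that decomposition, assuming integer wavelet coefficients in sign-magnitude form and 8 bit-planes; it does not implement EBCOT's context modeling or the MQ arithmetic coder, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def bit_planes(block, num_planes):
    """Split the coefficient magnitudes of one code-block into bit-planes, MSB first."""
    return [
        [[(abs(c) >> p) & 1 for c in row] for row in block]
        for p in range(num_planes - 1, -1, -1)
    ]


def rebuild(planes, block_signs):
    """Inverse of bit_planes: reassemble magnitudes, then reapply the signs."""
    n = len(planes)
    rows, cols = len(planes[0]), len(planes[0][0])
    mags = [[0] * cols for _ in range(rows)]
    for i, plane in enumerate(planes):
        weight = 1 << (n - 1 - i)
        for r in range(rows):
            for c in range(cols):
                mags[r][c] += plane[r][c] * weight
    return [[-m if s else m for m, s in zip(mrow, srow)]
            for mrow, srow in zip(mags, block_signs)]


def decompose_all(code_blocks, num_planes=8):
    """Code-blocks share no state, so each one can go to its own worker (or HW unit)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda b: bit_planes(b, num_planes), code_blocks))
```

Because no code-block depends on another, the `pool.map` call could equally be a bank of parallel hardware coders.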

2.
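Building on an analysis of the characteristics of GPU parallel computing, a fast terrain-texture rendering method based on GPU programming is proposed and implemented; its core is fast decompression of terrain texture images through GPU programming. Unlike the traditional rendering pipeline, the method first transfers the compressed texture images to the graphics card and then uses GPU programming to hardware-accelerate their decompression, thereby addressing a series of problems: storage of massive texture data, transfer bandwidth, and decompression speed. Experimental results show that the method has a clear advantage in virtual-scene rendering speed, and that the advantage grows as the resolution of the terrain texture images increases.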

3.
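To address the problem that the graphics display drivers of current VxWorks systems no longer meet user needs, a graphics display driver technique based on a two-dimensional (2D) hardware acceleration engine is proposed. It implements 2D hardware acceleration for UGL, a hardware cursor, hardware double buffering, overlay, and other functions, improving the graphics display performance of VxWorks systems and meeting the requirements of real-time applications. The design has been applied and validated in several types of naval equipment.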

4.
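Luo Yong. Diannaomi (Computer Fan), 2011, (6): 71-71
In today's pursuit of efficient browsing, GPU hardware acceleration is the most eye-catching of all types of hardware acceleration. The defining feature of a GPU is its powerful computing capability, which can be many times that of a CPU (3D game rendering places very high demands on arithmetic units). More and more applications can use the GPU for hardware acceleration, such as high-definition video...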

5.
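Designing hardware acceleration units has become an important means of extending general-purpose microprocessor computing platforms to scientific applications. After discussing the ways in which reconfigurable hardware acceleration units can be coupled to a general-purpose microprocessor platform, and noting that the memory bus offers high bandwidth and low latency, this paper proposes an architecture for a reconfigurable hardware acceleration unit (RHAU) coupled through the memory bus, and presents solutions to the problems encountered during the design. For performance evaluation, an AES encryption program was chosen as the test application and simulated with the SIS simulator; RHAU achieves a speedup of 22 on the AES-128 encryption algorithm.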

6.
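Design and Implementation of an FPGA-Based Particle Filter Tracking System   Total citations: 1, self-citations: 0, citations by others: 1
Object tracking is widely used in consumer electronics, industrial inspection, security surveillance, intelligent transportation, and other fields. Because PCs are bulky and power-hungry, while embedded microprocessors have limited resources and processing capacity, this paper uses FPGA technology to design and implement an object tracking system built around the particle filter algorithm, using hardware acceleration to improve processing performance. The complex floating-point operations that limit speed are accelerated with custom instructions, while highly repetitive, computationally simple operations such as RGB-to-HSV conversion are accelerated with IP cores, parallelizing the algorithm in hardware and increasing system processing speed. Experiments show that the tracking system built with FPGA and hardware acceleration techniques meets real-time requirements, and its processing performance per unit of clock frequency is about 5 times that of a high-performance PC.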
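RGB-to-HSV conversion suits an IP-core accelerator because it is per-pixel arithmetic with little control flow. Below is a minimal integer-only sketch, assuming 8-bit RGB input, hue in whole degrees [0, 360), and saturation and value scaled to [0, 255]; these ranges and the function name are assumptions, not the paper's IP core. Integer arithmetic is used because it maps onto FPGA logic more cheaply than floating point.

```python
def rgb_to_hsv_u8(r, g, b):
    """Integer RGB->HSV: H in degrees [0, 360), S and V in [0, 255]."""
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    if mx == 0:
        return 0, 0, 0                      # black: hue and saturation undefined
    v = mx
    s = (255 * delta) // mx
    if delta == 0:
        return 0, s, v                      # gray: hue undefined, report 0
    if mx == r:
        h = (60 * (g - b)) // delta         # may be negative; wrapped below
    elif mx == g:
        h = 120 + (60 * (b - r)) // delta
    else:
        h = 240 + (60 * (r - g)) // delta
    return h % 360, s, v
```

Floor division truncates the hue to whole degrees, so results can differ from a floating-point reference by less than one degree.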

7.
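In the current era of "big data + deep models", using FPGAs to hardware-accelerate a wide range of algorithms is a feasible solution. Using the Vivado HLS tool, this paper designs an intelligent hardware-acceleration framework based on a genetic algorithm: programs automatically generate Tcl files, invoke the HLS tool to run simulation, and extract data from the reports for analysis. The framework was tested on example programs provided by Xilinx, such as the FIR and DCT cases. The experiments found good solutions with an order-of-magnitude efficiency improvement over repeated manual trial and error, meeting the need for a general approach to hardware-accelerating common algorithms.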
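The automated search can be pictured as a genetic algorithm over a vector of HLS directive settings. The sketch below is hypothetical throughout: the directive names, their value ranges, and `estimate_cost` stand in for the Tcl generation, HLS run, and report parsing that the framework automates. Because the best half of each generation survives unchanged (elitism), the best cost per generation never gets worse.

```python
import random

# Hypothetical search space: each gene is one HLS directive setting.
SPACE = {
    "unroll_factor": [1, 2, 4, 8, 16],
    "pipeline_ii": [1, 2, 4],
    "array_partition": [1, 2, 4, 8],
}
KEYS = list(SPACE)


def estimate_cost(cfg):
    """Stand-in for running HLS and reading latency/area from its report."""
    latency = 1024 / (cfg["unroll_factor"] * cfg["array_partition"]) * cfg["pipeline_ii"]
    area = 50 * cfg["unroll_factor"] + 30 * cfg["array_partition"]
    return latency + area


def random_cfg(rng):
    return {k: rng.choice(SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    """Uniform crossover: each directive is inherited from a random parent."""
    return {k: (a if rng.random() < 0.5 else b)[k] for k in KEYS}


def mutate(cfg, rng, rate=0.2):
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v) for k, v in cfg.items()}


def genetic_search(pop_size=12, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [random_cfg(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=estimate_cost)
        history.append(estimate_cost(pop[0]))
        parents = pop[: pop_size // 2]          # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children                # elitism: parents survive unchanged
    return min(pop, key=estimate_cost), history
```

In a real flow each `estimate_cost` call would be an HLS run, so the population size and generation count bound the tool invocations.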

8.
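A Pencil Filter Generation Algorithm and Its Implementation on the GPU   Total citations: 2, self-citations: 0, citations by others: 2
A pencil filter generation algorithm based on convolution operators is proposed. By analyzing the structural characteristics of real pencil textures, a simple mathematical model of pencil strokes is abstracted. From this model, the corresponding brush template (convolution operator) can be readily determined and used to obtain the pencil texture the user wants. This paper combines the algorithm with the graphics processing unit (GPU) and, using the GPU's hardware acceleration, achieves real-time pencil-drawing-style rendering of video images.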
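One reading of a "brush template (convolution operator)" is a directional line kernel: convolving random noise with a normalized line of ones smears it into stroke-like streaks. This is a minimal sketch under that assumption; the kernel shape, stroke length, angle handling, and function names are illustrative choices, not the paper's stroke model.

```python
import math
import random


def line_kernel(length, angle_deg):
    """Normalized kernel with ones along a line through the center (one 'stroke')."""
    size = length if length % 2 else length + 1
    k = [[0.0] * size for _ in range(size)]
    c = size // 2
    dx, dy = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    for t in range(-c, c + 1):
        k[round(c + t * dy)][round(c + t * dx)] = 1.0
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def convolve(img, kernel):
    """2D convolution (kernel flipped) with edge pixels clamped."""
    h, w, n = len(img), len(img[0]), len(kernel)
    c = n // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(n):
                for i in range(n):
                    yy = min(max(y + j - c, 0), h - 1)
                    xx = min(max(x + i - c, 0), w - 1)
                    acc += img[yy][xx] * kernel[n - 1 - j][n - 1 - i]
            out[y][x] = acc
    return out


def pencil_texture(h, w, length=7, angle_deg=45, seed=0):
    """Streaky grayscale texture in [0, 1): noise smeared along the stroke direction."""
    rng = random.Random(seed)
    noise = [[rng.random() for _ in range(w)] for _ in range(h)]
    return convolve(noise, line_kernel(length, angle_deg))
```

Every output pixel is an independent weighted sum, which is why this kind of filter maps naturally onto per-pixel GPU shaders.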

9.
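Optimized Design of Hardware Acceleration for Graphical Interfaces in Embedded Linux   Total citations: 1, self-citations: 0, citations by others: 1
Many graphics card chips provide hardware acceleration. Making full use of the limited hardware of embedded systems to improve GUI performance is an important part of embedded system development. This article analyzes the general methods for hardware-accelerating the GUI in embedded Linux systems, and designs and implements a more standardized acceleration scheme with a wider range of applicability.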

10.
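Using the ZYNQ heterogeneous multicore processor as the implementation platform, the HLS design methodology is applied to FPGA hardware acceleration of a motion-feature extraction algorithm, achieving a computing capacity of 1080p at 60 fps. Motion features are clustered with K-means, high-dimensional vectors are then generated, and an SVM classifier performs classification and recognition. The algorithm is accelerated through an efficient system architecture and hardware acceleration circuits. The system provides a human-machine interface based on Linux and the Qt framework, and supports online learning and the creation of action libraries.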
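The K-means step can be sketched as a bag-of-words encoder: cluster the motion feature vectors, then describe a clip by the histogram of its cluster assignments, which is the high-dimensional vector handed to the SVM. This is a minimal sketch assuming low-dimensional numeric features and a fixed iteration count; the SVM is omitted, and the function names are hypothetical.

```python
import math
import random


def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def bow_histogram(features, centers):
    """Normalized histogram of cluster assignments: the vector fed to the SVM."""
    hist = [0] * len(centers)
    for f in features:
        hist[nearest(f, centers)] += 1
    return [h / len(features) for h in hist]
```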

11.
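Pan Qingsong, Zhang Yi, Yang Zongming, Qin Jianxiu. Computer Science, 2017, 44(Z11): 530-533, 556
The entire system is designed and implemented on a Zynq chip using hardware/software co-design. The Zynq chip uses a heterogeneous ARM + FPGA architecture that combines the flexibility of an ARM processor with the parallel processing capability of an FPGA. The design takes full advantage of this: in the hardware/software partitioning, the ARM processor performs image acquisition, while image corner and edge detection run on the FPGA, so hardware acceleration improves overall system performance. The ARM processor and the FPGA exchange data over the AXI4 bus, yielding a system-on-chip on the Zynq that integrates image acquisition, image feature extraction, and image display. System tests show that the hardware-accelerated image feature extraction algorithms run 6 to 8 times faster than the same algorithms implemented in software on the ARM processor.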
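Edge detection is the kind of fixed-window, per-pixel kernel that belongs on the FPGA side of the partition. Below is a minimal reference sketch assuming a Sobel operator with the |Gx| + |Gy| magnitude approximation, a common hardware-friendly choice that avoids a square root; the abstract does not state which edge or corner operators the paper uses.

```python
# Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """|Gx| + |Gy| for interior pixels of a 2D list of ints; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += GX[j][i] * p
                    gy += GY[j][i] * p
            out[y][x] = abs(gx) + abs(gy)
    return out
```

In hardware, the 3 × 3 window is usually fed by two line buffers so each pixel is read from memory only once.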

12.
The fractal coding algorithm has many applications, including image compression. In this paper, a classification scheme is presented that allows the fractal coder to be implemented in hardware. High speed and low power consumption are the goals of the proposed design. The method is based on binary classification of domain and range blocks. The proposed technique increases processing speed and reduces power consumption, while the quality of the reconstructed images remains comparable to that of available software techniques. To demonstrate the functionality of the proposed algorithm, the architecture was implemented on an FPGA chip, and the proposed hardware is applied to image compression. The resulting compression ratios, PSNR, gate count, compression speed, and power consumption are compared with those of existing designs. The proposed design is also applicable in other fields, such as mass-volume database coding and block-matching schemes in video coders.
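One way to picture binary block classification: give each block a small bit signature, here whether each quadrant's mean lies above the whole-block mean (a 4-bit class), and compare a range block only with domain blocks of the same class. This signature is an assumption for illustration; the abstract does not give the paper's classification rule, and this simplified version ignores the isometries a full fractal coder also searches. Restricting the search this way cuts the number of block comparisons, which dominate fractal encoding time.

```python
from collections import defaultdict


def quadrant_class(block):
    """4-bit class: bit q is set when quadrant q's mean exceeds the block mean."""
    n = len(block)
    h = n // 2
    mean = sum(map(sum, block)) / (n * n)
    code = 0
    for q, (r0, c0) in enumerate([(0, 0), (0, h), (h, 0), (h, h)]):
        quad = [block[r][c] for r in range(r0, r0 + h) for c in range(c0, c0 + h)]
        if sum(quad) / len(quad) > mean:
            code |= 1 << q
    return code


def build_domain_index(domains):
    """Bucket domain-block indices by class so a lookup skips all other classes."""
    index = defaultdict(list)
    for i, d in enumerate(domains):
        index[quadrant_class(d)].append(i)
    return index


def candidates(range_block, index):
    """Domain blocks worth matching against this range block."""
    return index.get(quadrant_class(range_block), [])
```

A positive contrast scaling plus brightness offset leaves the class unchanged, so blocks that differ only in that way still land in the same bucket.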

13.
In this paper, a learning-based high-speed reconstruction system for ultra-low-resolution faces is implemented using a software/hardware co-design paradigm. The hardware component, running at 60 MHz, contains a field-programmable gate array (FPGA) reconfigured to contain parallel processing units, and multiple memories that supply data in parallel. The hardware component efficiently handles the generation and sorting of computationally intensive similarity metrics, which solves the processing-speed problem of learning-based super-resolution reconstruction for ultra-low-resolution faces. The system can reconstruct faces from 8 × 6, 16 × 12, and 32 × 24 images with 4 × 4, 8 × 8, or 16 × 16 magnification. The experimental results verify the effectiveness of the system in terms of both visual quality and low root-mean-square error. Processing is up to 7900 times faster than a pure software implementation in C.
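Generating and sorting similarity metrics amounts to a nearest-neighbor search over training patches. A minimal software reference sketch follows, assuming the sum of squared differences (SSD) as the metric and patches flattened into lists; the hardware described above computes these distances in parallel units, whereas this version computes them serially.

```python
import heapq


def ssd(a, b):
    """Sum of squared differences between two equal-length flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_most_similar(query, training, k):
    """Indices of the k training patches with the smallest SSD to the query."""
    scores = ((ssd(query, t), i) for i, t in enumerate(training))
    return [i for _, i in heapq.nsmallest(k, scores)]
```

`heapq.nsmallest` avoids a full sort when only the top k matches are needed, which is the same trade-off a hardware partial sorter makes.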

14.
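Implementing convolutional neural network models on FPGAs is hampered by the large resources convolution consumes and the high cost of improving FPGA chip performance. To address these problems, an improved optimization design method based on an embedded SoC is proposed. The implementation of convolution and the memory access channels are optimized to improve parallel computing performance; 32-bit floating-point numbers are quantized to 16-bit fixed-point numbers to speed up data transfer during forward propagation; and, using high-level synthesis (HLS) tools, the convolutional neural network is mapped onto the hardware platform as a synchronous dataflow model to speed up computation. Experiments show that the scheme uses 89% less BRAM and 72% fewer LUTs than existing designs, and in tests at an operating frequency of 100 MHz it processes data 42 times faster than a scheme using the Cortex-A9 alone.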
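Quantizing 32-bit floats to 16-bit fixed point means choosing a Q-format, scaling, rounding, and saturating to the int16 range. The sketch below assumes Q8.8 (8 integer bits including sign, 8 fractional bits); the paper's split between integer and fractional bits is not stated. The product of two Q8.8 values carries 16 fractional bits and has to be shifted back.

```python
FRAC_BITS = 8                                   # assumed Q8.8 format
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def to_fixed(x, frac_bits=FRAC_BITS):
    """Round a float to the nearest fixed-point code, saturating to int16."""
    q = round(x * (1 << frac_bits))
    return max(INT16_MIN, min(INT16_MAX, q))


def to_float(q, frac_bits=FRAC_BITS):
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two codes; arithmetic-shift the product back and saturate."""
    prod = (a * b) >> frac_bits
    return max(INT16_MIN, min(INT16_MAX, prod))
```

With 8 fractional bits the rounding error is at most 1/512, and values outside roughly ±128 saturate, so the format has to be matched to the network's weight and activation ranges.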

15.
Texture mapping has been widely used to improve the quality of 3D rendered images. To reduce the storage and bandwidth impact of texture mapping, compression systems are commonly used. To further increase the quality of the rendered images, texture filtering is also often adopted. These two techniques are generally considered independent: first, a decompression step gathers texture samples, and then a separate filtering step follows. We have investigated a system based on linear transforms that merges both phases, allowing more efficient decompression and filtering at higher compression ratios. This paper formally presents our approach for any linear transform, shows how the commonly used discrete cosine transform (DCT) can be adapted to it, and describes how the method can be implemented in real time on current-generation graphics cards using shaders. By reusing the existing hardware filtering, fast magnification and minification filtering is achieved. Our implementation provides fully anisotropically filtered samples four to six times faster than an implementation that uses two separate phases for decompression and filtering. Additionally, our transform-based compression provides higher and variable compression ratios compared with standard hardware compression systems, at a comparable or better quality level.
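The reason decompression and filtering can be merged is linearity: filtering is a weighted sum of texels, and each texel is a linear function of its block's transform coefficients, so blending coefficient blocks with the filter weights and inverse-transforming once gives the same texels as decoding every block and blending afterwards. The sketch below checks that identity in 1-D only, assuming an orthonormal DCT; it is not the paper's shader implementation.

```python
import math


def idct(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III) of one block."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out


def filter_then_decode(blocks, weights):
    """Blend coefficient blocks with the filter weights, then inverse-transform once."""
    n = len(blocks[0])
    mixed = [sum(w * b[k] for w, b in zip(weights, blocks)) for k in range(n)]
    return idct(mixed)


def decode_then_filter(blocks, weights):
    """Reference path: inverse-transform every block, then blend the texels."""
    decoded = [idct(b) for b in blocks]
    return [sum(w * d[x] for w, d in zip(weights, decoded)) for x in range(len(decoded[0]))]
```

The merged path performs one inverse transform per output instead of one per contributing block, which is where the savings come from.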

16.
We propose an end-to-end security scheme for the mobility-enabled healthcare Internet of Things (IoT). The proposed scheme consists of (i) a secure and efficient end-user authentication and authorization architecture based on the certificate-based DTLS handshake, (ii) secure end-to-end communication based on session resumption, and (iii) robust mobility based on interconnected smart gateways. The smart gateways act as an intermediate processing layer (called the fog layer) between IoT devices and sensors (device layer) and cloud services (cloud layer). In our scheme, the fog layer facilitates ubiquitous mobility without requiring any reconfiguration at the device layer. The scheme is demonstrated by simulation and a full hardware/software prototype. Based on our analysis, our scheme has the most extensive set of security features compared with related approaches in the literature. Energy-performance evaluation results show that, compared with existing approaches, our scheme reduces the communication overhead by 26% and the communication latency between smart gateways and end users by 16%. In addition, our scheme is approximately 97% faster than certificate-based DTLS and 10% faster than symmetric-key-based DTLS. Compared with our scheme, certificate-based DTLS consumes about 2.2 times more RAM and 2.9 times more ROM. On the other hand, the RAM and ROM requirements of our scheme are almost as low as those of symmetric-key-based DTLS. Analysis of our implementation revealed that the handover latency caused by mobility is low and that the handover process does not incur any processing or communication overhead on the sensors.

17.
In this paper, we propose a novel convolutional neural network (CNN) hardware accelerator called CoNNA, capable of accelerating pruned, quantized CNNs. In contrast to most existing solutions, CoNNA offers a complete solution for compressed CNN acceleration, as it can accelerate all layer types commonly found in contemporary CNNs. CoNNA is designed as a coarse-grained reconfigurable architecture that uses rapid, dynamic reconfiguration during CNN layer processing. The CoNNA architecture enables on-the-fly selection of the CNN to be accelerated and also supports the acceleration of CNNs with dynamic topology. Furthermore, because it can directly process compressed feature and kernel maps and skip all ineffectual computations during CNN layer processing, the CoNNA accelerator achieves higher CNN processing rates than some previously proposed solutions. The CoNNA architecture has been implemented on the Xilinx Zynq UltraScale+ FPGA family and compared with seven previously proposed CNN hardware accelerators. The experimental results suggest that CoNNA is up to 14.10, 6.05, 4.91, 2.67, 11.30, 3.08, and 3.58 times faster than the previously proposed MIT Eyeriss, NullHop, NVIDIA Deep Learning Accelerator (NVDLA), NEURAghe, CNN_A1, fpgaConvNet, and Deephi Aristotle CNN accelerators, respectively, while using an identical number of computing units and operating at the same clock frequency.
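Skipping ineffectual computations means never issuing a multiply-accumulate (MAC) whose activation or weight is zero. A minimal sketch, assuming a simple compressed format of (index, value) pairs for the nonzero activations; CoNNA's actual compressed feature-map and kernel encodings are not described in the abstract, and the function names are hypothetical.

```python
def compress(vec):
    """Keep only the nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]


def sparse_dot(compressed_acts, weights):
    """Dot product that visits only nonzero activations; returns (result, macs_issued)."""
    acc = 0
    macs = 0
    for i, a in compressed_acts:
        w = weights[i]
        if w != 0:            # pruned (zero) weight: skip this MAC as well
            acc += a * w
            macs += 1
    return acc, macs
```

The MAC count shows the saving: a dense dot product over 8 elements always issues 8 MACs, regardless of how many operands are zero.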

18.
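To promote the development of the decoder-chip industry for the Audio Video coding Standard (AVS), a hardware architecture is proposed for the luma interpolation circuit used in AVS inter-frame prediction. The design divides pixels into three layers according to their positions and implements each layer with a different pipeline structure, making full use of data reuse between pixels while balancing processing speed against implementation cost. The scheme achieves high hardware efficiency and meets the requirements on hardware resources and system clock frequency.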
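Luma sub-pixel interpolation is a short FIR filter applied to neighboring integer pixels. The sketch below assumes the 4-tap half-sample filter (-1, 5, 5, -1)/8 for AVS Part 2 luma, with rounding and clipping to 8 bits and edge pixels replicated; check the taps and rounding against the AVS specification before relying on them. It also shows the reuse the abstract exploits: each integer pixel in a row feeds up to four neighboring half-sample outputs.

```python
TAPS = (-1, 5, 5, -1)   # assumed AVS half-sample luma taps (they sum to 8)


def clip8(v):
    return max(0, min(255, v))


def half_pel_row(row):
    """Horizontal half-sample values between row[x] and row[x + 1], edges replicated."""
    n = len(row)
    out = []
    for x in range(n - 1):
        acc = 0
        for t, tap in enumerate(TAPS):
            xx = min(max(x - 1 + t, 0), n - 1)   # taps cover pixels x-1 .. x+2
            acc += tap * row[xx]
        out.append(clip8((acc + 4) >> 3))         # divide by 8 with rounding
    return out
```

The negative outer taps can push the filtered value outside [0, 255] next to sharp edges, which is why the clip is needed.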

19.
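Matrix multiplication is fundamental to numerical analysis and to graphics and image processing algorithms, and the design of general-purpose matrix multiplication accelerators has long been a research focus in embedded system design. However, because of its high computational complexity and low processing efficiency, matrix multiplication often becomes the computation-speed bottleneck of embedded systems. To make better use of matrix multiplication in the embedded domain, a hardware/software co-acceleration architecture based on an MPSoC (MultiProcessor System-on-Chip) is proposed. Within this architecture, on the one hand, a matrix blocking method that respects hardware constraints is designed, yielding a general-purpose matrix multiplication accelerator system; on the other hand, exploiting the multicore architecture of the MPSoC, corresponding task partitioning and load-balancing scheduling algorithms are proposed, improving parallel efficiency and the overall system speedup. Experimental results show that the proposed architecture and algorithms implement general-purpose matrix multiplication, and that the multicore parallel scheduling algorithm achieved through hardware/software co-design significantly improves computational efficiency compared with a traditional single-core design.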
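Blocking for hardware constraints means splitting the operands into tiles no larger than the accelerator's on-chip buffers, so each tile product fits the hardware; the output tiles then become tasks that can be spread across cores. A minimal sketch, assuming square tiles of side `tile` and a round-robin assignment of output tiles to cores, a simple stand-in for the paper's load-balancing scheduler, which the abstract does not detail.

```python
def tiled_matmul(a, b, tile):
    """C = A x B computed tile by tile; each inner tile product stands in for one accelerator call."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        c[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, min(k0 + tile, m)))
    return c


def assign_tiles(n, p, tile, cores):
    """Round-robin the output tiles over the cores; returns one task list per core."""
    tasks = [(i0, j0) for i0 in range(0, n, tile) for j0 in range(0, p, tile)]
    return [tasks[c::cores] for c in range(cores)]
```

Round-robin balances well only when tiles cost about the same; edge tiles are smaller, which is one reason a dedicated load-balancing scheduler can do better.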

20.
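Objective: In heterogeneous network environments, different end users require different remote sensing image quality, which leads to excessive image data volumes and long transmission and display delays. To address this, a progressive transmission model for remote sensing images with online compression, real-time transmission, and real-time decompression is proposed. Method: The model uses an acceleration algorithm based on synchronized multithreaded pipeline processing, combining the quality-progressive SPIHT compression algorithm with multithreaded pipelining. In a VC++ environment, remote sensing images are compressed online into a bitstream; while compression is under way, multiple threads send the compressed bitstream in real time over a socket channel, and the client uses multiple threads to decompress and display the bitstream in real time as it arrives. Multithreading lets compression, transmission, and decompression proceed concurrently, reducing overall processing time. Results: Experimental results show that the proposed real-time compression and progressive transmission model nearly doubles processing speed without affecting image quality. The similarity between each progressive layer and the original image is on average 20% higher than that between multiresolution progressively compressed layers and the original image. Conclusion: The model effectively solves the problem of compression, transmission, and decompression being out of sync during progressive transmission of remote sensing images, thereby improving progressive transmission efficiency. Compared with multiresolution progressive transmission, this model provides better visual quality.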
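The compress, transmit, and decompress stages can overlap as a three-stage pipeline with queues between threads, so one chunk is decompressed while the next is in transit and a third is being compressed. A minimal sketch, assuming zlib as a stand-in for SPIHT and an in-process queue in place of the socket channel; both substitutions are for illustration only.

```python
import queue
import threading
import zlib

DONE = object()  # sentinel that marks the end of the stream


def compressor(chunks, out_q):
    for chunk in chunks:
        out_q.put(zlib.compress(chunk))
    out_q.put(DONE)


def transmitter(in_q, out_q):
    """Stands in for the socket sender/receiver pair."""
    while (item := in_q.get()) is not DONE:
        out_q.put(item)
    out_q.put(DONE)


def decompressor(in_q, results):
    while (item := in_q.get()) is not DONE:
        results.append(zlib.decompress(item))


def run_pipeline(chunks):
    """Run all three stages concurrently; bounded queues throttle the faster stages."""
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    results = []
    stages = [
        threading.Thread(target=compressor, args=(chunks, q1)),
        threading.Thread(target=transmitter, args=(q1, q2)),
        threading.Thread(target=decompressor, args=(q2, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

Each stage is a single thread reading from a FIFO queue, so chunks arrive in their original order without extra sequencing.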
