首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 265 毫秒
1.
介绍了一种适用于MPEG-4视频简单层解压缩应用的二维IDCT协处理器。该处理器采用Loeffler架构的IDCT快速算法,并使用加法和移位运算代替IDCT快速算法中的浮点乘法运算单元,用高度并行流水VLSI结构加快数据处理速度,采用一维的IDCT单元的复用的方式来实现二维的IDCT运算。在满足处理速度和精度要求的基础上,利用较少的晶体管数目实现了一种高性能的二维IDCT处理器。该方案已经应用于一款SOC芯片中的硬件MMA(多媒体加速单元)中,IDCT的运算精度也得到了验证。  相似文献   

2.
近年来某些领域的复杂嵌入式计算系统对于小型化的需求越来越迫切,但是常规的小型化设计手段存在处理器运算负荷加重,需要频繁处理中断事件等缺陷。针对这样的问题,提出了一种复杂嵌入式计算系统的新型硬件架构,该架构将处理器模块与接口模块合并为一个模块,去除了原接口模块中的协处理器及相关芯片,在现场可编程门阵列中集成一个指令执行电路来依据主处理器提供的指令自动进行各接口数据的发送、接收和预处理工作,减轻了主处理器的运算负荷,并且主处理器也无需频繁访问接口和处理接口中断。实际应用表明,该新型架构达到了体积小,功耗低,重量轻的设计目的,并且解决了常规小型化设计方法所存在的问题,可以在各领域的复杂嵌入式计算系统小型化设计中进行推广。  相似文献   

3.
前言FPGA已经不再单纯应用在芯片与系统之间的直接互联层,在软件无线电(SDR)中,FPGA逐渐被用做通用运算架构来实现硬件加速单元,在降低成本和功耗的基础上提高了性能。SDR调制解调器的典型实现包括通用处理器(GPP)、DSP和FPGA。而且,FPGA架构可以结合专用硬件加速单元,用来卸载  相似文献   

4.
文章讨论了定义在GaloisField(GF)2有限域上椭圆曲线密码体制(ECC)协处理器芯片的设计。首先在详细分析基于GF(2n)ECC算法的基础上提取了最基本和关键的运算,并提出了通过协处理器来完成关键运算步骤,主处理器完成其它运算的ECC加/解密实现方案。其次,进行了加密协处理器体系结构设计,在综合考虑面积、速度、功耗的基础上选择了全串行方案来实现GF(2n)域上的乘和加运算。然后,讨论了加密协处理器芯片的电路设计和仿真、验证问题。最后讨论了芯片的物理设计并给出了样片的测试结果。  相似文献   

5.
基于GF(2^n)的ECC协处理器芯片设计   总被引:2,自引:0,他引:2  
文章讨论了定义在Galois Field(GF)2^n有限域上椭圆曲线密码体制(ECC)协处理器芯片的设计。首先在详细分析基于GF(2^n)ECC算法的基础上提取了最基本和关键的运算,并提出了通过协处理器来完成关键运算步骤,主处理器完成其它运算的ECC加/解密实现方案。其次,进行了加密协处理器体系结构设计,在综合考虑面积、速度、功耗的基础上选择了全串行方案来实现GF(2^n)域上的乘和加运算。然后,讨论了加密协处理器芯片的电路设计和仿真、验证问题。最后讨论了芯片的物理设计并给出了样片的测试结果。  相似文献   

6.
在由通用RISC处理器核和附加定点硬件加速器构成的定点SoC(System-on-Chip)芯片体系架构基础上,提出了一种新颖的基于统计分析的定点硬件加速器字长设计方法。该方法利用统计参数在数学层面上求解计算出满足不同信噪比要求下的最小字长,能有效地降低芯片面积、功耗和制作成本,从而在没有DSP协处理器的低成本RISC处理器核SoC芯片上运行高计算复杂度应用。  相似文献   

7.
多模基带处理器芯片的设计已成为研究的热点。文章结合无线通信处理算法的特点.利用指令级加速技术,设计了一种基于无线通信中复数运算的16位嵌入式处理器核。利用它可以通过灵活的配置软件完成基带处理中绝大部分的复数相关运算和控制功能。经实际流片测试,该处理器核具有面积小、功耗低的特点.可用于多模无线通信基带处理器的设计。  相似文献   

8.
基于流体系结构的高效能分组密码处理器研究   总被引:1,自引:0,他引:1       下载免费PDF全文
针对现有密码处理器存在的问题,借鉴流处理器架构,提出了高效能的可重构分组密码流处理器架构.该架构采用层次化设计思想,通过分块式本地寄存器组的数据组织方式和共享拼接使用运算单元机制,实现了软件流水和硬件流水的协同工作,能够挖掘分组内和分组间的指令级并行性并提高功能单元的利用率.在65nm CMOS工艺下对架构进行了综合仿真,并经过了大量算法映射.实验结果证明,该架构在CBC和ECB加密模式下均具有良好的加密性能.与其他密码处理器相比,该架构具有小面积、高效能的特点.  相似文献   

9.
主要从系统级、算法级、结构级等多个层面综合考虑减少数字语音解码器的功耗.系统级使用双向不交叠时钟技术,在提高耗时长的模块运算频率的同时消除了电路的竞争与冒险;算法级主要使用汇编语言重写和优化原代码,既可以压缩源代码,更能充分挖掘硬件的运算潜力;在结构级,主要利用并行技术,增加协处理器进行并行计算,有效提高运算速度.另外在布局布线时使用全定制集成电路设计技术手工布线,大为减少解码器的芯片面积.  相似文献   

10.
蔡洪波  金声震 《电子学报》2005,33(9):1717-1719
本文提出了一种为空间太阳望远镜星载数据处理系统而设计的动态可重构协处理器方案,该方案利用4bits粒度可重构阵列将传统的基于指令流的运算方式变为基于数据流与配置流的运算方式,并通过指令流水实现了动态可重构单元与主处理器的协同工作.文章最后还给出了该方案在Xilinx XC2V3000上的实现及该实现用于乘法和1024点复数快速傅立叶变换时的性能.  相似文献   

11.
This paper presents the design and implementation of a hyperelliptic curve cryptography (HECC) coprocessor over affine and projective coordinates, along with measurements of its performance, hardware complexity, and power consumption. We applied several design techniques, including parallelism, pipelining, and loop unrolling, in designing field arithmetic units, group operation units, and scalar multiplication units to improve the performance and power consumption. Our affine and projective coordinate‐based HECC processors execute in 0.436 ms and 0.531 ms, respectively, based on the underlying field GF(289). These results are about five times faster than those for previous hardware implementations and at least 13 times better in terms of area‐time products. Further results suggest that neither case is superior to the other when considering the hardware complexity and performance. The characteristics of our proposed HECC coprocessor show that it is applicable to high‐speed network applications as well as resource‐constrained environments, such as PDAs, smart cards, and so on.  相似文献   

12.
The computational power required in many multimedia applications is well beyond the capabilities of today's multimedia systems. Therefore, the embedding of additional high-performance accelerator multimedia components into these systems is most decisive. This paper presents the embedding of multimedia components into computer systems using reconfigurable coprocessor boards. The goal of those reconfigurable platforms which can be adapted to several applications and which include programmable digital signal processors, control and memory devices as well as dedicated multimedia ASICs is worked out. On the way to such a platform four ASICs for image and text processing are presented. The embedding of these components into a computing system using a CardBus-based coprocessor board is shown. Such a reconfigurable coprocessor board is an important intermediate stage on the way to future hybrid reconfigurable systems on chip.  相似文献   

13.
Reconfigurable Filter Coprocessor Architecture for DSP Applications   总被引:1,自引:0,他引:1  
Digital Signal Processing (DSP) is widely used in high-performance media processing and communication systems. In majority of these applications, critical DSP functions are realized as embedded cores to meet the low-power budget and high computational complexity. Usually these cores are ASICs that cannot be easily retargeted for other similar applications that share certain commonalities. This stretches the design cycle that affects time-to-market constraints. In this paper, we present a reconfigurable high-performance low-power filter coprocessor architecture for DSP applications. The coprocessor architecture, apart from having the performance and power advantage of its ASIC counterpart, can be reconfigured to support a wide variety of filtering computations. Since filtering computations abound in DSP applications, the implementation of this coprocessor architecture can serve as an important embedded hardware IP.  相似文献   

14.
In this paper we analyze the power consumption and energy efficiency of general matrix-matrix multiplication (GEMM) and Fast Fourier Transform (FFT) implemented as streaming applications for an FPGA-based coprocessor card. The power consumption is measured with internal voltage sensors and the power draw is broken down onto the systems components in order to classify the energy consumed by the processor cores, the memory, the I/O links and the FPGA card. We present an abstract model that allows for estimating the power consumption of FPGA accelerators on the system level and validate the model using the measured kernels. The performance and energy consumption is compared against optimized multi-threaded software running on the POWER7 host CPUs. Our experimental results show that the accelerator can improve the energy efficiency by an order of magnitude when the computations can be undertaken in a fixed point format. Using floating point data, the gain in energy-efficiency was measured as up to 30 % for the double precision GEMM accelerator and up to 5 × for a 1k complex FFT.  相似文献   

15.
Next-generation mobile devices will continue to demand high processing power for imaging applications. The expected performance is in the class of supercomputers, but delivered with limited energy and memory bandwidth for embedded systems. This article advocates a streaming computation model that leverages the deterministic access patterns in imaging applications to deliver the necessary processing throughput. A reconfigurable datapath connects a set of functional units, forming a computation pipeline to offer energy efficiency. The architecture and implementation of a stream processor are presented along with the memory subsystem to support stream data transfers. The results show speedup ranging from a factor of 2 to 28 for imaging applications, offering favorable comparison against scalar processors.  相似文献   

16.
In this paper, we propose a configuration-aware data-partitioning approach for reconfigurable computing. We show how the reconfiguration overhead impacts the data-partitioning process. Moreover, we explore the system-level power-performance tradeoffs available when implementing streaming embedded applications on fine-grained reconfigurable architectures. For a certain group of streaming applications, we show that an efficient hardware/software partitioning algorithm is required when targeting low power. However, if the application objective is performance, then we propose the use of dynamically reconfigurable architectures. We propose a design methodology that adapts the architecture and algorithms to the application requirements. The methodology has been proven to work on a real research platform based on Xilinx devices. Finally, we have applied our methodology and algorithms to the case study of image sharpening, which is required nowadays in digital cameras and mobile phones.  相似文献   

17.
Complexity management, portability and long term adaptivity are common challenges in different fields of embedded systems, normally colliding with the needs of efficient resource utilization and power balance. Image/signal processing systems, though required to offer a large variety of complex functions, have also to deal with battery-life limitations. Wearable signal processing systems, for example, should provide high performance and support new generation standards without compromising their portability and their long-term usability. These constraints challenge hardware designers: early stage trade-off analysis and power management automated techniques are helpful to guarantee a reasonable time-to-market. In the field of video codec specifications, the MPEG standard known as Reconfigurable Video Coding (RVC) framework addresses functional complexity and adaptivity leveraging on the intrinsic modularity of the dataflow model of computation, but it still lacks in offering power management support. The main contribution of this work is providing an automatic early-stage power management methodology to be adopted within the MPEG-RVC context. Starting from different high-level specifications, our mapping methodology identifies directly on the high-level models disjointed homogeneous logic clock regions, where the platform resources can be enabled/disabled together without affecting the overall system performance. To extend its usability to the RVC community, we have integrated this methodology within the Multi-Dataflow Composer (MDC) tool. MDC is a tool for on-the-fly reconfigurable signal processing platforms deployment. In this paper, we extended MDC to address power-aware multi-context systems. To prove the effectiveness of our work, a coprocessor for image and video processing acceleration has been assembled. This latter has been synthesized on a 90 nm ASIC technology, where demonstrated up to 90 % of reduction in the dynamic power consumption on different dataflow-intensive applications. The coprocessor has been implemented also on FPGA, confirming, partially, the benefits of adopting the proposed methodology.  相似文献   

18.
为保证智能电网中的数据安全,防止电力通信过程中的数据被篡改,安全芯片的应用必不可少,而RSA算法是安全芯片中应用最广泛的公钥算法之一。RSA算法复杂度高,硬件实现功耗较大,在设计的过程中常常无法完全兼顾性能、功耗、安全性等各个方面。文章设计了一种高性能、能抵抗常见侧信道攻击及EMA电磁攻击的高安全RSA协处理器。提出的随机存储模幂算法真伪运算结果的防护策略,增强了协处理器抵抗侧信道攻击、差分功耗攻击以及EMA电磁攻击的能力。通过两个层级的算法优化来提升协处理器性能,并通过结合CIOS平方算法和Karatsuba算法的改进的Montgomery模乘算法,使得1 024位带防护的RSA算法在UMC 55 nm工艺下的面积为4.8万门@30 MHz,功耗为4.62 mW@30 MHz, FPGA开发板上进行API测试的性能为709.3 kbit/s。  相似文献   

19.
The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms, restrain the widespread use of reconfigurable accelerators and limit the designer productivity. Furthermore, communication between SW and HW parts of codesigned applications are typically exposed to SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access the user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, the virtualization layer improves the coprocessor execution-applications achieve better performance without any user intervention. We use two different reconfigurable system-on-chip (SoC) running Linux and codesigned applications to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to the virtualisation is limited. Dynamic prefetching in the virtualisation layer further reduces the abstraction overhead.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号