Similar literature
1.
This paper presents a TriMedia processor extended with three reconfigurable designs for entropy decoding (ED), inverse quantization (IQ), and two-dimensional (2-D) inverse discrete cosine transform (IDCT), and assesses the performance gain that such extensions provide when performing MPEG2-compliant pel reconstruction. We first describe an extension of the TriMedia architecture, which consists of a multiple-context field-programmable gate array (FPGA)-based reconfigurable functional unit (RFU), a configuration unit managing the reconfiguration of the RFU, and their associated instructions. Then, we address the computation of the ED, IQ, and 2-D IDCT tasks, and propose reconfigurable hardware support for a variable-length decoder that can decode two symbols per call (VLD-2), an inverse quantizer that can dequantize four coefficients per call (IQ-4), and a one-dimensional IDCT (1-D IDCT). The most important aspects of the implementation of the FPGA-mapped VLD-2, IQ-4, and 1-D IDCT units, as well as the organization of the software routines calling these FPGA-mapped computing units, are outlined. Experimental results indicate that by configuring each of the VLD-2, IQ-4, and 1-D IDCT units on a different FPGA context, and by activating the contexts as needed, the FPGA-augmented TriMedia can perform MPEG2-compliant pel reconstruction with an average speed-up of 1.4× over the standard TriMedia.
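
The abstract above pairs a 2-D IDCT task with a 1-D IDCT hardware unit. One common way to bridge the two is row-column decomposition, sketched below in software. This is an illustration under stated assumptions, not the paper's implementation: the function names `idct_1d` and `idct_2d` are hypothetical, the arithmetic is floating point rather than fixed point, and the block size of 8 is the usual MPEG-2 block size.

```python
import math

N = 8  # assumed block size (8x8 blocks, as used in MPEG-2)


def idct_1d(coeffs):
    """Inverse DCT of one length-N vector, with orthonormal scaling."""
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(total)
    return out


def idct_2d(block):
    """2-D IDCT built from 2*N calls to idct_1d: N rows, then N columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds sample (r, c); transpose back to row-major order.
    return [[cols[c][r] for c in range(N)] for r in range(N)]
```

With this decomposition, a single 1-D unit is called 16 times per 8x8 block, which is one reason a 1-D unit can be enough to support the full 2-D transform.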

2.
Advances in programmable hardware have led to new architectures in which the hardware can be dynamically adapted to the application to gain better performance. Many challenging problems remain to be solved before a practical general-purpose reconfigurable system can be built. One fundamental problem is the placement of modules on the reconfigurable functional unit (RFU). In reconfigurable systems, we are interested both in online placement, where the arrival time of tasks is determined at runtime and is not known a priori, and in offline placement, where the schedule is known at compile time. In the offline case, we are willing to spend more time at compile time to find a compact floorplan for the RFU modules and to use the RFU area more efficiently. In this paper we present offline placement algorithms based on simulated annealing and greedy methods, and show that their placements are superior to those generated by an online algorithm.

3.
A software radio architecture for linear multiuser detection   Total citations: 5, self-citations: 0, citations by others: 5
The integration of multimedia services over wireless channels calls for the provision of variable quality of service (QoS) requirements. While radio resource management algorithms (such as power control and call admission control) can provide certain levels of variability in QoS, an alternative approach is to use reconfigurable radio architectures to provide diverse QoS guarantees. We outline a novel reconfigurable architecture for linear multiuser detection that provides a wide range of bit error rate (BER) requirements among the constituent receivers of the reconfigurable architecture. Specifically, we focus on achieving this dynamic reconfiguration via a software radio implementation of linear multiuser receivers. Using a unified framework for achieving this reconfiguration, we partition functionality between two core technologies, field-programmable gate array (FPGA) and digital signal processor (DSP) devices, based on processing speed requirements. We present experimental results on the performance and reconfigurability of the software radio architecture, as well as on the impact of fixed-point arithmetic (due to hardware constraints).

4.
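Now that China's space communication technology has made great strides, there is growing demand for spacecraft electronics whose functions can be reconfigured and upgraded and whose running code can be replaced. This paper analyzes the impact of space radiation effects on high-performance digital signal processors (DSPs) and SRAM-based FPGAs, and proposes a hardware design method for a reconfigurable information processing platform for spacecraft, featuring reconfigurability and online upgrading of the running code. The hardware architecture consists mainly of a high-performance DSP and a high-reliability antifuse FPGA, and a method by which this architecture counters space single-event effects is proposed. The circuit design method can serve as a reference for the design of spacecraft communication equipment.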

5.
A low-power programmable processor named icyflex1 was designed, combining features of a digital signal processor (DSP) and a micro-controller unit (MCU). Implemented as a synthesizable VHDL software intellectual property core, the processor offers a broad range of power-saving features, including a customizable architecture and a reconfigurable instruction set. Its performance is compared with that of other commercially available processors, and values are given for its integration in a 180 nm technology. The processor targets applications with tight power consumption constraints and correspondingly significant processing performance.

6.
Adding a reconfigurable processing unit (RPU) to a general-purpose processor is a promising way to improve performance significantly. In this paper, a Reconfigurable Multi-Core (RMC) architecture combining a general multi-core processor and reconfigurable logic is proposed. The reconfigurable logic is logically divided into RPUs, which are coupled with the general-purpose cores as co-processors via a full crossbar switch. An RPU Manager (RPU-M) is also designed to manage the RPUs. To verify RMC, a simulation method based on Simics and a Virtex 5 FPGA is adopted, which simplifies the simulation and ensures the evaluation accuracy of the hardware function cores. Five workloads are selected to test RMC: 3-DES, AES, SHA2, IDCT, and JPEG_ENC. The experimental results show a 3.10 times average speedup over the software implementation on the original multi-core, and the data and control communication overhead on RMC is acceptable.

7.
Yingjiao Ma, Jinglin Shi. Annals of Telecommunications, 2018, 73(9-10): 639-650
This paper proposes a reconfigurable remote radio head (RRH) software and hardware architecture and presents the results of its implementation. The implemented RRH has been designed for a centralized-architecture super base station that facilitates the integration of 2G, 3G, and 4G ground base stations and geosynchronous equatorial orbit (GEO) satellite gateway stations. The reconfigurable RRH is a primary aspect of the system. By modifying the programmable software configuration from the perspective of software-defined radio, the RRH can support multiple frequency allocations. The RRH is composed of radio frequency (RF) modules, intermediate frequency (IF) modules, and some of the physical layer functions. The development of a reconfigurable IF module is emphasized. A filter-bank multiplexing technique has been employed for the IF module, which reduces the use of hardware resources. We experimentally evaluated the key performance indicators (KPIs) of the RRH. The experimental results demonstrate that the proposed design attains the goal of reducing hardware costs and improving system flexibility and stability.

8.
Chinese Journal of Electronics, 2017, (6): 1161-1167
By exploiting the data-level and instruction-level parallelism of symmetric cryptography, a reconfigurable processor architecture for symmetric ciphers is presented, based on a very-long instruction word (VLIW) structure. An application-specific instruction set for symmetric ciphers is proposed. For arithmetic operations that symmetric ciphers have in common, eleven kinds of reconfigurable cryptographic arithmetic units are designed using reconfigurable technology. To meet the requirement for an energy-efficient design, a loop buffer structure for the instruction fetch unit is proposed, which significantly reduces power consumption at the same frequency as a conventional design; in addition, a chain processing mechanism is proposed to improve cryptographic throughput without any area overhead. The processor has been fabricated in 0.18 μm CMOS technology. The results show that it can operate at up to 200 MHz and that fourteen cryptographic algorithms were mapped onto it; the encryption throughput of the AES, SNOW 2.0, and SHA2 algorithms reaches 1.19 Gbps, 1.05 Gbps, and 407 Mbps, respectively.

9.
In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are sped up on the coarse-grain reconfigurable hardware to meet the timing requirements of application code mapped onto the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture that can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware is realized by our high-performance data-path. The methodology consists mainly of three stages: the analysis, the mapping of the application parts onto fine- and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.

10.
Application-specific processors offer an attractive option in the design of embedded systems by providing high performance for a specific application domain. In this work, we describe the use of a reconfigurable processor core based on a RISC architecture as the starting point for application-specific processor design. By using a common base instruction set, development cost can be reduced and design space exploration is focused on the application-specific aspects of performance. An important aspect of deploying any new architecture is verification, which usually requires lengthy software simulation of a design model. We show how hardware emulation based on programmable logic can be integrated into the hardware/software codesign flow. While hardware emulation previously required massive investment in design effort and special-purpose emulators, an emulation approach based on high-density field-programmable gate array (FPGA) devices now makes hardware emulation practical and cost-effective for embedded processor designs. To reduce development cost and avoid duplication of design effort, FPGA prototypes and ASIC implementations are derived from a common source: we show how to perform targeted optimizations to fully exploit the capabilities of the target technology while maintaining a common source base.

11.
This paper describes a new architecture for embedded reconfigurable computing, based on a very-long instruction word (VLIW) processor enhanced with an additional run-time configurable datapath. The reconfigurable unit is tightly coupled with the processor, featuring an application-specific instruction-set extension. Mapping computation-intensive algorithmic portions onto the reconfigurable unit allows more efficient processing, thus leading to an improvement in both timing performance and power consumption. A test chip has been implemented in a standard 0.18-μm CMOS technology. The test of a signal processing algorithmic benchmark showed speedups ranging from 4.3× to 13.5× and energy consumption reduced by up to 92%.

12.
In this paper, we present a new coarse-grained reconfigurable architecture called FleXilicon for multimedia and wireless communications, which improves resource utilization and achieves a high degree of loop-level parallelism (LLP). The proposed architecture mitigates major shortcomings of existing architectures through wider memory bandwidth, a reconfigurable controller, and flexible word-length support. VLSI implementation of FleXilicon indicates that the proposed pipeline architecture can operate at up to 1 GHz using a 65-nm SOI CMOS process with moderate silicon area. To estimate the performance of FleXilicon, we modeled the processor in SystemC, implemented five different types of applications commonly used in wireless communications and multimedia, and compared its performance with that of an ARM processor and a TI digital signal processor. The simulation results indicate that FleXilicon reduces the number of clock cycles and increases the speed for all five applications. The reduction and speedup ratios are as large as two orders of magnitude for some applications.

13.
As technology scaling slows and energy efficiency becomes an important new design constraint, superscalar processor designs are reaching their performance limits due to area and power restrictions. As a result, new microarchitectural paradigms need to be developed. This work proposes a new organization for x86 processors, based on a traditional superscalar design coupled to a reconfigurable array. The system exploits the fact that a few basic blocks are responsible for most of the instructions that execute in the processor, and transforms these basic blocks into configurations for the reconfigurable array. Each configuration encodes the semantics and dependencies of all instructions in the block, so that those already mapped can execute while bypassing the fetch, decode, and dependency-check stages, improving instruction throughput. Our study of the potential of the architecture shows that performance gains of up to 2.5× with respect to a traditional superscalar can be achieved.

14.
In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler‐hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write‐back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single‐instruction multiple‐data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32‐bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2‐way MLEP and 33.7% faster with a 4‐way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler.

15.
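A design framework for pipelined microprocessors based on a hardware abstract machine is proposed, which makes novel use of a tag-based simulated-execution technique. Based on this framework, the working principle of a stack abstract machine is described and a Java instruction-level parallel processor is implemented. The combination of the stack hardware abstract machine and stack instruction folding resolves the stack-dependency bottleneck in Java processors. Software simulation shows that the processor can exploit the instruction-level parallelism in Java programs to the greatest extent and achieves higher processing capability.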

16.
Research on a high-efficiency block cipher processor based on a stream architecture   Total citations: 1, self-citations: 0, citations by others: 1
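To address the problems of existing cipher processors, a high-efficiency reconfigurable block cipher stream processor architecture is proposed, drawing on stream processor architectures. The architecture adopts a hierarchical design approach. Through a data organization based on partitioned local register files and a mechanism for sharing and concatenating arithmetic units, it lets software pipelining and hardware pipelining work together, exploits instruction-level parallelism within and across blocks, and improves functional unit utilization. The architecture was synthesized and simulated in a 65 nm CMOS process, and a large number of algorithms were mapped onto it. Experimental results show that the architecture delivers good encryption performance in both CBC and ECB modes. Compared with other cipher processors, it offers small area and high efficiency.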

17.
Lu Zhijun (陆智俊), Ben De (贲德), Mao Bonian (毛博年). Infrared and Laser Engineering, 2016, 45(11): 1126003-1126003(6)
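To resolve the conflict between high computing performance and low power consumption in the GNC information processing system of CubeSat nanosatellites, a resource-constrained reconfigurable parallel information processing method is proposed. The method uses a tightly coupled reconfigurable parallel information processing architecture: complex software algorithms in GNC processing that require many iterations and are ill-suited to a CPU are implemented as dynamically partially reconfigurable (DPR) hardware circuit units. A mutex-based multi-core parallel scheduling algorithm for reconfigurable resources lets the multi-core CPU manage and schedule the shared DPR units in parallel, thereby accelerating and optimizing the software algorithms in hardware. Experimental results show that the method achieves efficient, fast real-time processing in the CubeSat GNC information processing system and saves about 50% of the power consumption compared with conventional information processing methods. It is applicable to on-board information processing where computing resources are extremely limited and has good prospects for engineering application.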

18.
Coarse-grained reconfigurable arrays (CGRAs) have shown potential for application in embedded systems in recent years. The numerous reconfigurable processing elements (PEs) in CGRAs provide flexibility while maintaining high performance by exploiting different levels of parallelism. However, a gap remains between the CGRA and the application-specific integrated circuit (ASIC). Some application domains, such as software-defined radios (SDRs), require flexibility even as performance demands increase, so more effective CGRA architectures are needed. Customisation of a CGRA according to its application can improve performance and efficiency. This study proposes an application-specific CGRA architecture template composed of generic PEs (GPEs) and special PEs (SPEs). The hardware of the SPE can be customised to accelerate specific computational patterns. An automatic design methodology that includes pattern identification and application-specific function unit generation is also presented. A mapping algorithm based on ant colony optimisation is provided. Experimental results on the SDR target domain show that, compared with other ordinary and application-specific reconfigurable architectures, the CGRA generated by the proposed method performs more efficiently for given applications.

19.
Data hiding systems have emerged as a solution to the piracy problem; those based on quantization in particular have been widely used for their simplicity and high performance. Several data hiding applications, such as broadcast monitoring and live performance watermarking, require real-time multi-channel behavior. While Digital Signal Processors (DSPs) have been used to implement these schemes with real-time performance for audio signal processing, custom hardware architectures offer the possibility of fully exploiting the inherent parallelism of this type of algorithm for more demanding applications. This paper presents an efficient hardware implementation of a Rational Dither Modulation (RDM) algorithm-based data hiding system in the Modulated Complex Lapped Transform (MCLT) domain. In general terms, the proposed hardware architecture is composed of an MCLT processor, an Inverse MCLT processor, a Coordinate Rotation Digital Computer (CORDIC), and an RDM-QIM processor. Results of implementing the proposed hardware architecture on a Field Programmable Gate Array (FPGA) are presented and discussed.
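
As background for the quantization-based data hiding mentioned above, the following is a minimal sketch of plain scalar quantization index modulation (QIM) with a fixed step size. It is not the RDM scheme of the paper and says nothing about the MCLT domain or the hardware; the names `qim_embed` and `qim_extract` and the choice of a single fixed `step` are assumptions made for illustration.

```python
def qim_embed(sample, bit, step):
    """Quantize sample onto the lattice that encodes bit (0 or 1).

    Bit 0 uses multiples of step; bit 1 uses the same lattice shifted
    by step / 2.
    """
    offset = bit * step / 2.0
    return round((sample - offset) / step) * step + offset


def qim_extract(sample, step):
    """Return the bit whose lattice lies closest to the received sample."""
    d0 = abs(sample - qim_embed(sample, 0, step))
    d1 = abs(sample - qim_embed(sample, 1, step))
    return 0 if d0 <= d1 else 1
```

In this sketch, embedding moves a sample by at most `step / 2`, and extraction still recovers the bit as long as any added noise stays below `step / 4` in magnitude.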

20.
A run-time reconfigurable multiply-accumulate (MAC) architecture is introduced. It can be easily reconfigured to trade bitwidth for array size (thus maximizing the utilization of available hardware); process signed-magnitude, unsigned or 2's complement data; make use of part of its structure or adapt its structure based on the specified throughput requirements and the anticipated computational load. The proposed architecture consists of a reconfigurable multiplier, a reconfigurable adder, an accumulation unit, and two units for data representation conversion and incoming and outgoing data stream transfer. Reconfiguration can be done dynamically by using only a few control bits and the main component modules can operate independently from each other. Therefore, they can be enabled or disabled according to the required function each time. Comparison results in terms of performance, area and power consumption prove the superiority of the proposed reconfigurable module over existing realizations in a quantitative and qualitative manner.
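
To illustrate why the data representation matters for a MAC unit that accepts signed-magnitude, unsigned, or 2's complement data, here is a small behavioral model in software. It is only a sketch: the function names `decode` and `mac`, the format labels, and the unbounded Python accumulator are assumptions, and nothing here reflects the paper's actual multiplier or adder structure.

```python
def decode(word, width, fmt):
    """Interpret a width-bit pattern as an integer in the given format."""
    word &= (1 << width) - 1          # keep only the low `width` bits
    sign_bit = 1 << (width - 1)
    if fmt == "unsigned":
        return word
    if fmt == "twos":                 # 2's complement
        return word - (1 << width) if word & sign_bit else word
    if fmt == "sign_mag":             # signed-magnitude
        magnitude = word & (sign_bit - 1)
        return -magnitude if word & sign_bit else magnitude
    raise ValueError(f"unknown format: {fmt}")


def mac(pairs, width, fmt):
    """Accumulate the products of (a, b) bit-pattern pairs."""
    acc = 0
    for a, b in pairs:
        acc += decode(a, width, fmt) * decode(b, width, fmt)
    return acc
```

For example, the 8-bit patterns 0xFF and 0x02 multiply to 510 when read as unsigned, -2 as 2's complement, and -254 as signed-magnitude, so the same datapath input needs format-aware handling.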
