1.

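TTA Instruction Compression Based on Code Feature Analysis and Implementation of the Decompression Unit Total citations: 3, self-citations: 2, citations by others: 1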

Lai Mingche, Wang Zhiying, Dai Kui, Gao Lei 《Acta Electronica Sinica》2008,36(11):2234-2238

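An efficient instruction compression technique is proposed for the transport triggered architecture (TTA). Template compression is improved to eliminate null transport instructions and empty long immediates. Based on the locality of transports, vertical dictionary compression is proposed to improve the compression result. Finally, a single-cycle decompression unit is designed that achieves loosely coupled, real-time decompression at a small hardware cost. Experimental results show that the technique achieves a compression ratio of 37.2% and reduces the area and power consumption of the computing core and instruction memory by about 29% and 23% respectively, while execution overhead increases by only about 4%.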

2.

Design of an Ant Colony Routing System for Coarse-Grained Configurable Architecture Chips

Song Liguo, Jiang Yuxian 《Microelectronics & Computer》2007,24(4):15-17

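Building on the max-min ant system, the ant colony is given an added olfactory discrimination ability and applied to the routing problem of coarse-grained configurable architecture chips. Taking the coarse-grained reconfigurable chip CTaiJi as the target, a comparison over several test cases shows that this method is better at finding the optimal solution than the commonly used negotiated congestion algorithm.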
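
For orientation, the baseline that the abstract builds on can be sketched as follows. This is only the standard max-min ant system pheromone update (evaporation, best-path reinforcement, clamping to a fixed range); the paper's olfactory extension is not described in the abstract and is not modelled here. The function name, the edge-dictionary representation, and the parameter values are all illustrative assumptions.

```python
def mmas_update(pheromone, best_path, best_cost, rho=0.1, tau_min=0.01, tau_max=5.0):
    """One max-min ant system pheromone update (hypothetical helper).

    pheromone: dict mapping a directed edge (u, v) to its trail strength.
    best_path: list of nodes visited by the best ant.
    best_cost: cost of that path; a lower cost deposits more pheromone.
    """
    best_edges = set(zip(best_path, best_path[1:]))
    updated = {}
    for edge, tau in pheromone.items():
        tau = (1.0 - rho) * tau              # evaporation on every edge
        if edge in best_edges:
            tau += 1.0 / best_cost           # only the best path is reinforced
        # Clamping to [tau_min, tau_max] is what distinguishes MMAS from
        # the basic ant system: no trail can vanish or dominate completely.
        updated[edge] = min(tau_max, max(tau_min, tau))
    return updated
```

Keeping every trail inside a fixed range limits premature convergence, which matters in routing problems where many nets compete for the same wires.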

3.

Reconfigurable Hardware Architectures for Sequential and Hybrid Decoding Total citations: 1, self-citations: 0, citations by others: 1

Benaissa M. Yiqun Zhu 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(3):555-565

A novel reconfigurable sequential decoder architecture based on the Fano algorithm is presented in which the constraint length, the threshold spacing, and the time-out threshold are all run-time reconfigurable. To maximize decoding performance, the maximum possible backward depth (a whole frame) is supported. This is achieved by using shift registers combined with memory to store the information of an entire visited path. A field-programmable gate array (FPGA) prototype of the decoder is built, actual hardware decoding performance in terms of decoding speed, bit error rate (BER), and buffer overflow rate is measured, and comparisons are made. To overcome the decoding delay that is inherent in sequential decoders, a hybrid scheme including simple block codes and a cyclic redundancy check is proposed to limit the number of backward search operations that the sequential decoder has to execute. As a result, a significant reduction in decoding delay and buffer overflow rate is achieved while maintaining comparable decoding performance in terms of BER.

4.

Application of Reconfigurable CORDIC Architectures Total citations: 1, self-citations: 0, citations by others: 1

Oskar Mencer Luc Séméria Martin Morf Jean-Marc Delosme 《The Journal of VLSI Signal Processing》2000,24(2-3):211-221

Reconfiguration enables the adaption of Coordinate Rotation DIgital Computer (CORDIC) units to the specific needs of sets of applications, hence creating application specific CORDIC-style implementations. Reconfiguration can be implemented at a high level, taking the entire CORDIC unit as a basic cell (CORDIC-cells) implemented in VLSI, or at a low level such as Field-Programmable Gate Arrays (FPGAs). We suggest a design methodology and analyze area/time results for coarse (VLSI) and fine-grain (FPGA) reconfigurable CORDIC units. For FPGAs we implement CORDIC units in Verilog HDL and our object-oriented design environment, PAM-Blox. For CORDIC-cells, multiple reconfigurable CORDIC modules are synthesized with state-of-the-art CAD tools. At the algorithm level we present a case study combining multiple CORDICs based on a geometrical interpretation of a normalized ladder algorithm for adaptive filtering to reduce latency and area of a fully pipelined CORDIC implementation. Ultimately, the goal is to create automatic tools to map applications directly to reconfigurable high-level arithmetic units such as CORDICs.
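
As background on what a single CORDIC unit computes, here is a minimal software sketch of textbook rotation-mode CORDIC. It is not the paper's Verilog or PAM-Blox design; the function name and the floating-point formulation are assumptions made for readability, whereas hardware would use fixed-point shifts and adds.

```python
import math

def cordic_rotate(angle, iterations=32):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Rotation-mode CORDIC: the target angle is decomposed into a sequence
    of elementary rotations by +/- atan(2^-i), each of which needs only a
    shift and an add in hardware.
    """
    # Elementary angles; a hardware unit would hold these in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # starting from the inverse of the total gain cancels the stretch.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward residual angle 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y
```

Because every iteration has the same shift-add structure, the iteration count and word length are natural parameters to specialize per application, which is the kind of adaptation the abstract refers to.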

5.

Design Aspects of Multi-level Reconfigurable Architectures

Sebastian Lange Martin Middendorf 《Journal of Signal Processing Systems》2008,51(1):23-37

Dynamically reconfigurable hardware has already been deployed for accelerating computationally demanding applications. Some of these hardware architectures allow run time reconfiguration but this usually leads to a large reconfiguration overhead. The advantage of run time reconfiguration is that it allows new algorithmic solutions for many applications. To study the potential of frequent run time reconfiguration it is interesting to investigate its costs and benefits from an abstract point of view and to develop new architectural concepts. Multi-level reconfigurable architectures are one such concept that introduces several levels of reconfiguration. This paper deals with new types of multi-level reconfigurable architectures. The corresponding problem of finding the best granularity for different reconfiguration levels is formulated and investigated. Although this problem is shown to be NP-complete, an interesting restricted subcase is solved optimally in polynomial time. For the general case, a good heuristic is proposed that is based on solutions for the restricted case. Results on three example applications show that the reconfiguration cost can be reduced with the new architectures. Based on a proposed measure of relative efficiency it is also shown that the new architectures are more efficient so that they obtain a larger reconfiguration cost reduction with less additional hardware.

6.

High-Performance Software Protection Using Reconfigurable Architectures

Zambreno J. Honbo D. Choudhary A. Simha R. Narahari B. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2006,94(2):419-431

One of the key problems facing the computer industry today is ensuring the integrity of end-user applications and data. Researchers in the relatively new field of software protection investigate the development and evaluation of controls that prevent the unauthorized modification or use of system software. While many previously developed protection schemes have provided a strong level of security, their overall effectiveness has been hindered by a lack of transparency to the user in terms of performance overhead. Other approaches go to the opposite extreme and sacrifice security for the sake of this transparency. In this work we present an architecture for software protection that provides a high level of both security and user transparency by utilizing field programmable gate array (FPGA) technology as the main protection mechanism. We demonstrate that by relying on FPGA technology, this approach can accelerate the execution of programs in a cryptographic environment, while maintaining the flexibility through reprogramming to carry out any compiler-driven protections that may be application-specific.

7.

Noise Analysis of Regenerative Comparators for Reconfigurable ADC Architectures

Nuzzo P. De Bernardinis F. Terreni P. Van der Plas G. 《IEEE transactions on circuits and systems. I, Regular papers》2008,55(6):1441-1454

The need for highly integrable and programmable analog-to-digital converters (ADCs) is pushing towards the use of dynamic regenerative comparators to maximize speed, power efficiency and reconfigurability. Comparator thermal noise is, however, a limiting factor for the achievable resolution of several ADC architectures with scaled supply voltages. While mismatch in these comparators can be compensated for by calibration, noise can irreparably hinder performance and is less straightforward to account for at design time. This paper presents a method to estimate the input referred noise in fully dynamic regenerative comparators leveraging a reference architecture. A time-domain analysis is proposed that accounts for the time varying nature of the circuit exploiting some basic results from the solution of stochastic differential equations. The resulting symbolic expressions allow focusing designers' attention on the most influential noise contributors. Analysis results are validated by comparison with electrical simulations and measurement results from two ADC prototypes based on the reference comparator architecture, implemented in 0.18-μm and 90-nm CMOS technologies.

8.

A JPEG Chip for Image Compression and Decompression

Sung-Hsien Sun Shie-Jue Lee 《The Journal of VLSI Signal Processing》2003,35(1):43-60

9.

Implementation of the JPEG Decoding Algorithm on a Multi-core Architecture

Zhu Ling, Shao Shixiang, Yuan Kaizhi 《Information Technology & Standardization》2008,124(1):38-40

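The principles of the JPEG algorithm are introduced, and a suitable flow is designed to parallelize it for execution on a multi-core processor architecture. Together with optimizations such as those to memory reads, this greatly increases JPEG decoding speed. Measurements show that on a multi-core processor integrating three cores, the average decoding cycle count for a JPEG image is about 36% of that on a single-core processor.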
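
The reported figure of about 36% of the single-core cycle count can be restated as a speedup and a per-core efficiency. The short calculation below is plain arithmetic on the abstract's rounded number; the helper name is illustrative and nothing here comes from the paper's own measurements beyond the 36% and the three cores.

```python
def speedup_and_efficiency(cycle_fraction, cores):
    """Convert 'parallel cycles / single-core cycles' into speedup and efficiency."""
    speedup = 1.0 / cycle_fraction       # how many times faster than one core
    efficiency = speedup / cores         # fraction of the ideal linear speedup
    return speedup, efficiency

speedup, efficiency = speedup_and_efficiency(0.36, 3)
print(f"speedup = {speedup:.2f}x, efficiency = {efficiency:.0%}")
# prints "speedup = 2.78x, efficiency = 93%"
```

A speedup of roughly 2.78 on three cores is close to linear scaling. Note that the single-core baseline may not have included the memory-read optimizations, so part of the gain may come from them rather than from parallelism alone.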

10.

A Lossless Compression Algorithm with Fast Decompression

Wang Song, Fang Liguo, Han Lianbing, Liu Hongbo 《Communications Technology》2020,(5):1121-1126

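With the continuing development of information technology, large volumes of data pose major challenges for both storage and transmission. Data compression effectively reduces data volume and eases processing and transmission. Lossless compression exploits redundancy in the data, and decompression restores the data exactly, without any distortion. Based on a study of the fast-decompression principle of the LZO algorithm, a new compression algorithm is designed. By reducing the number of compressed blocks in the compressed data, the algorithm lowers the execution overhead of the decompression routine. Test results show that the new algorithm decompresses faster than LZO.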

11.

A Multi-objective Optimization Mapping Algorithm for Coarse-Grained Reconfigurable Architectures

Chen Naijin, Jiang Jianhui 《Acta Electronica Sinica》2015,43(11):2151-2160

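A multi-objective optimization mapping algorithm is proposed for the hardware task partitioning and mapping problem of row-pipelined coarse-grained reconfigurable architectures under multiple constraints. The algorithm builds a cumulative probability weight function from factors such as the execution latency and dependency degree of operation nodes. Subject to constraints such as reconfigurable cell area and interconnect, it uses the value of this function to dynamically adjust the mapping and scheduling order of ready nodes. When the current row of a reconfigurable cell array has been mapped, the algorithm moves to the next row automatically; when an array is full, it switches to the next array; and when a data flow graph has been fully mapped, it automatically computes parameters such as the number of partitions. Experimental results show that, compared with the level-based greedy mapping algorithm, the proposed algorithm reduces the average total execution cycles by 8.4% (RCA_4×4) and 5.3% (RCA_6×6); compared with the split-compress kernel mapping algorithm, it reduces them by 20.6% (RCA_4×4) and 21.0% (RCA_6×6), which verifies the effectiveness of the proposed algorithm.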

12.

High-Performance Code Generation Techniques for VLIW Architectures

Wang Hongmei, Wang Min, Zhang Tiejun, Shan Rui, Hou Chaohuan 《Microelectronics & Computer》2010,27(2)

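DSP processors achieve high performance by adopting VLIW architectures, but this also makes it harder for compilers to generate assembly code for them. The code generator, as the code generation component of the compiler, is key to realizing the performance of a VLIW architecture. A code generator based on a retargetable compilation framework is therefore proposed and implemented. It takes full advantage of the architectural features of VLIW, supports SIMD instructions and predicated execution, and generates assembly code with a high degree of instruction-level parallelism, significantly improving application execution performance.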

13.

Code Decompression Unit Design for VLIW Embedded Processors

Yuan Xie Wolf W. Lekatsas H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(8):975-980

Code size "bloating" in embedded very long instruction word (VLIW) processors is a major concern for embedded systems since memory is one of the most restricted resources. In this paper, we describe a code compression algorithm based on arithmetic coding, discuss how to design decompression architecture, and illustrate the tradeoffs between compression ratio and decompression overhead, by using different probability models. Experimental results for a VLIW embedded processor TMS320C6x show that compression ratios between 67% and 80% can be achieved, depending on the probability models used. A precache decompression unit design is implemented in TSMC 0.25 μm and a test chip is fabricated.

14.

Still Image Processing on Coarse-Grained Reconfigurable Array Architectures

Matthias Hartmann Vasileios Pantazis Tom Vander Aa Mladen Berekovic Christian Hochberger 《Journal of Signal Processing Systems》2010,60(2):225-237

Due to the increasing demands on efficiency, performance and flexibility, reconfigurable computational architectures are very promising candidates in embedded systems design. Recently, coarse-grained reconfigurable array architectures (CGRAs), such as the ADRES CGRA and its corresponding DRESC compiler, have been gaining popularity due to several technological breakthroughs in this area. We investigate the mapping of two image processing algorithms, Wavelet encoding and decoding, and TIFF compression, onto this novel type of array architecture in a systematic way. The results of our experiments show that CGRAs based on ADRES and its DRESC compiler technology deliver improved performance levels for these two benchmark applications when compared to results obtained on a state-of-the-art commercial DSP platform, the c64x DSP from Texas Instruments. ADRES/DRESC can beat its performance by at least 50% in cycle count and the power consumption even drops to 10% of the published numbers of the c64x DSP.

15.

Optimized RTL Code Generation from Coarse-Grain Dataflow Specification for Fast HW/SW Cosynthesis

Hyunuk Jung Hoeseok Yang Soonhoi Ha 《Journal of Signal Processing Systems》2008,52(1):13-34

This paper presents a new methodology of automatic RTL code generation from coarse-grain dataflow specification for fast HW/SW cosynthesis. A node in a coarse-grain dataflow specification represents a functional block such as FIR and DCT and an arc may deliver multiple data samples per block invocation, which complicates the problem and distinguishes it from behavioral synthesis problem. Given optimized HW library blocks for dataflow nodes, we aim to generate the RTL codes for the entire hardware system including glue logics such as buffer and MUX, and the central controller. In the proposed design methodology, a dataflow graph can be mapped to various hardware structures by changing the resource allocation and schedule information. It simplifies the management of the area/performance tradeoff in hardware design and widens the design space of hardware implementation of a dataflow graph. We also support Fractional Rate Dataflow (FRDF) specification for more efficient hardware implementation. To overcome the additional hardware area overhead in the synthesized architecture, we propose two techniques reducing buffer overhead. Through experiments with some real examples, the usefulness of the proposed technique is demonstrated.

16.

Interconnect Exploration for Energy Versus Performance Tradeoffs for Coarse Grained Reconfigurable Architectures

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(1):151-155

Modern portable embedded devices require processors that can provide sufficient performance for demanding multimedia and wireless applications. At the same time they have to be flexible to support a wide range of products and extremely energy efficient to provide a long battery life. Coarse Grained Reconfigurable Architectures (CGRAs) potentially meet these constraints by providing a mix of flexible computational resources and large amounts of programmable interconnect. The vast design space of CGRAs complicates the development of optimized processors. Most effort has been spent on improving the performance. However, the energy cost of the programmable interconnect is becoming more expensive and this cost can no longer be neglected. In this work we present an energy- and performance-aware exploration for the interconnect of a CGRA and show that important tradeoffs can be made for those metrics. This will enable designers to develop more efficient architectures, tuned to a targeted application domain.

17.

Dynamic Context Compression for Low-Power Coarse-Grained Reconfigurable Architecture

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2010,18(1):15-28

Most coarse-grained reconfigurable architectures (CGRAs) are composed of reconfigurable ALU arrays and a configuration cache (or context memory) to achieve high performance and flexibility. In particular, the configuration cache is the main component of a CGRA that provides its distinctive feature, dynamic reconfiguration in every cycle. However, frequent memory-read operations for dynamic reconfiguration consume considerable power. Thus, reducing power in the configuration cache has become critical for CGRAs to be more competitive and reliable for use in embedded systems. In this paper, we propose a dynamically compressible context architecture for power saving in the configuration cache. This power-efficient context architecture works without degrading the performance and flexibility of the CGRA. Experimental results show that the proposed approach saves up to 39.72% power in the configuration cache with negligible area overhead (2.16%).

18.

A Graph Drawing Based Spatial Mapping Algorithm for Coarse-Grained Reconfigurable Architectures

Yoon J.W. Shrivastava A. Sanghyun Park Minwook Ahn Yunheung Paek 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(11):1565-1578

Recently, coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their efficiency and flexibility. While many CGRAs have demonstrated impressive performance improvements, the effectiveness of CGRA platforms ultimately hinges on the compiler. Existing CGRA compilers do not model the details of the CGRA, and thus they i) are unable to map applications even though a mapping exists, and ii) use too many processing elements (PEs) to map an application. In this paper, we model several CGRA details, e.g., irregular CGRA topologies, shared resources and routing PEs, in our compiler and develop a graph drawing based approach, split-push kernel mapping (SPKM), for mapping applications onto CGRAs. On randomly generated graphs our technique can map on average 4.5 times more applications than the previous approach, while generating mappings of better quality in terms of utilized CGRA resources. Utilizing fewer resources translates directly into increased opportunities for novel power and performance optimization techniques. Our technique shows less power consumption in 71 cases and shorter execution cycles in 66 cases out of 100 synthetic applications, with minimum mapping time overhead. We observe similar results on a suite of benchmarks collected from Livermore loops, Mediabench, Multimedia, Wavelet and DSPStone benchmarks. SPKM is not customized for a specific CGRA template, which is demonstrated by exploring various PE interconnection topologies and shared resource configurations with SPKM.

19.

Efficient Code Compression for Embedded Processors

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(12):1696-1707

Code density is of increasing concern in embedded system design since it reduces the need for the scarce resource memory and also implicitly improves further important design parameters like power consumption and performance. In this paper we introduce a novel, hardware-supported approach. Besides the code, the lookup tables (LUTs), which can become significant in size if the application is large and/or high compression is desired, are also compressed. Our scheme optimizes the number and size of generated LUTs to improve the compression ratio. To show the efficiency of our approach, we apply it to two compression schemes: “dictionary-based” and “statistical”. We achieve an average compression ratio of 48% (already including the overhead of the LUTs). Thereby, our scheme is orthogonal to approaches that take particularities of a certain instruction set architecture into account. We have conducted evaluations using a representative set of applications and have applied it to three major embedded processor architectures, namely ARM, MIPS, and PowerPC.
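
To make the phrase "compression ratio including the overhead of the LUTs" concrete, the following is a minimal sketch of a plain dictionary-based scheme. It is not the paper's method, which additionally compresses and sizes the LUTs themselves. The function names, the one-bit flag encoding, and the 32-bit word size are all assumptions for illustration; the ratio is taken as compressed bits divided by original bits, so lower is better.

```python
from collections import Counter

def dictionary_compress(instructions, word_bits=32, dict_entries=4):
    """Replace the most frequent instruction words with short LUT indices.

    Returns (encoded, lut, ratio). The compressed size used for the ratio
    counts the LUT storage as well as the encoded instruction stream.
    """
    lut = [word for word, _ in Counter(instructions).most_common(dict_entries)]
    index_bits = max(1, (len(lut) - 1).bit_length())
    position = {word: i for i, word in enumerate(lut)}
    encoded, bits = [], 0
    for word in instructions:
        if word in position:
            encoded.append((1, position[word]))   # flag 1 + LUT index
            bits += 1 + index_bits
        else:
            encoded.append((0, word))             # flag 0 + uncompressed word
            bits += 1 + word_bits
    bits += len(lut) * word_bits                  # LUT overhead
    return encoded, lut, bits / (len(instructions) * word_bits)

def dictionary_decompress(encoded, lut):
    """Invert dictionary_compress: look up indexed words, pass raw words through."""
    return [lut[value] if flag else value for flag, value in encoded]
```

A larger dictionary shortens the instruction stream but lengthens the LUT, so the ratio does not improve monotonically with `dict_entries`. That tension is why the number and size of the LUTs are worth optimizing.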

20.

Hardware Supported Task Scheduling on Dynamically Reconfigurable SoC Architectures

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(11):1465-1474

Dynamically reconfigurable system-on-a-chip (RSoC) technology features embedded microprocessors that are dispersed on the same die with significant amounts of programmable logic fabric. In this paper, we present a strategy to solve the recently emerging problem of how to utilize the flexible but still limited RSoC resources in an effective manner for a multi-task application. The major contribution of this paper is the development of a dynamic task scheduling algorithm that can be implemented in fixed or reconfigurable hardware that will perform the online scheduling of task systems onto the RSoC type architecture. The results from extensive simulations demonstrate the benefits of the proposed dynamic scheduling approach as compared with that of other static scheduling techniques taken from the technical literature.