1.

HW/SW codesign techniques for dynamically reconfigurable architectures

Noguera J. Badia R.M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2002,10(4):399-415

Hardware/software (HW/SW) codesign and reconfigurable computing are commonly used methodologies for digital-systems design. However, no previous work has defined a HW/SW codesign methodology with dynamic scheduling for run-time reconfigurable architectures. In addition, all previous approaches to reconfigurable computing multicontext scheduling are based on static-scheduling techniques. In this paper, we present three main contributions: 1) a novel HW/SW codesign methodology with dynamic scheduling for discrete event systems using dynamically reconfigurable architectures; 2) a new dynamic approach to reconfigurable computing multicontext scheduling; and 3) a HW/SW partitioning algorithm for dynamically reconfigurable architectures. We have developed a whole codesign framework, where we have applied our methodology and algorithms to the case study of software acceleration. An exhaustive study has been carried out, and the obtained results demonstrate the benefits of our approach.

2.

An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear Programming (total citations: 6; self-citations: 1; citations by others: 5)

Ralf Niemann Peter Marwedel 《Design Automation for Embedded Systems》1997,2(2):165-193

One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated for a chosen objective function. The partitioning approach works fully automatically and supports multiprocessor systems, interfacing and hardware sharing. In contrast to other approaches, which use special estimators, we use compilation and synthesis tools for cost estimation. The increased time for calculating the cost metric values is compensated by the improved quality of those values, so fewer partitioning iteration steps are needed. The paper presents an algorithm that uses integer programming to solve the hardware/software partitioning problem, with promising results.
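To make the formulation concrete, the following is a minimal Python sketch of a toy 0/1 partitioning problem. It assumes each task has a software time, a hardware time and a hardware area, and that the objective is total execution time under an area budget. Instead of calling an IP solver it enumerates every assignment, which returns the same optimum as the integer program but only scales to a handful of tasks. The cost model and the function name are hypothetical simplifications that ignore communication, interfacing and hardware sharing.

```python
from itertools import product


def partition_exhaustive(sw_time, hw_time, hw_area, area_budget):
    """Return (best_time, assignment); assignment[i] == 1 puts task i in hardware.

    Hypothetical toy model: enumerating all 0/1 vectors solves the same
    0/1 integer program exactly, but is only practical for a few tasks.
    """
    best = (float("inf"), None)
    for x in product((0, 1), repeat=len(sw_time)):
        area = sum(a for a, xi in zip(hw_area, x) if xi)
        if area > area_budget:
            continue  # violates the hardware area constraint
        # Sequential execution model: total time is the sum of per-task times.
        time = sum(h if xi else s for s, h, xi in zip(sw_time, hw_time, x))
        if time < best[0]:
            best = (time, list(x))
    return best
```

For example, `partition_exhaustive([10, 8, 6], [2, 3, 1], [5, 4, 3], 8)` moves tasks 0 and 2 to hardware (area 8, total time 11); moving all three would exceed the budget.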

3.

Efficient variable partitioning and scheduling for DSP processors with multiple memory modules

Qingfeng Zhuge Sha E.H.-M. Bin Xiao Chantrapornchai C. 《Signal Processing, IEEE Transactions on》2004,52(4):1090-1099

Multiple on-chip memory modules are attractive for many high-performance digital signal processing (DSP) applications. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. However, making effective use of multiple memory modules remains difficult. The performance gain in this kind of architecture strongly depends on variable partitioning and scheduling techniques. In this paper, we propose a graph model known as the variable independence graph (VIG) and algorithms to tackle the variable partitioning problem. Our results show that the VIG is more effective than the interference graph for solving the variable partitioning problem. We then present a scheduling algorithm called rotation scheduling with variable repartition (RSVR) that efficiently improves schedule lengths on a multiple memory module architecture. This algorithm adjusts the variable partitions during scheduling and generates a compact schedule based on retiming and software pipelining. The experimental results show that the average improvement in schedule length is 44.8% when using RSVR with the VIG. We also propose a design space exploration algorithm using RSVR to find the minimum number of memory modules and functional units that satisfy a schedule length requirement. The algorithm produces more feasible solutions with an equal or smaller number of functional units compared with the method using the interference graph.
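The core idea behind variable partitioning is that two variables read by the same instruction can be fetched in parallel only if they live in different memory modules. The sketch below illustrates that idea with a simple greedy assignment over a weighted co-access graph. It is a hypothetical illustration (the function names and the greedy rule are assumptions), not the paper's VIG construction or the RSVR algorithm.

```python
from collections import defaultdict


def partition_variables(access_pairs, num_modules=2):
    """Greedily assign variables to memory modules (illustrative heuristic only).

    access_pairs: list of (u, v, weight) meaning u and v are accessed by the
    same instruction `weight` times; placing them in different modules lets
    those accesses proceed in parallel.
    """
    adj = defaultdict(dict)
    for u, v, w in access_pairs:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    # Place the most heavily co-accessed variables first.
    order = sorted(adj, key=lambda x: -sum(adj[x].values()))
    module = {}
    for var in order:
        # Conflict cost of a module = weight of already-placed neighbours in it.
        cost = [0] * num_modules
        for nbr, w in adj[var].items():
            if nbr in module:
                cost[module[nbr]] += w
        module[var] = min(range(num_modules), key=lambda m: cost[m])
    return module


def parallel_weight(access_pairs, module):
    """Total weight of accesses whose two operands sit in different modules."""
    return sum(w for u, v, w in access_pairs if module[u] != module[v])
```

With the triangle `[("a", "b", 3), ("b", "c", 2), ("a", "c", 1)]` and two modules, not every pair can be separated; the greedy pass separates the two heaviest pairs, giving a parallel weight of 5.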

4.

A High-Level Synthesis Algorithm for Simultaneous Operation Scheduling and Data-Flow Graph Partitioning

Wang Lei, Wei Shaojun (王磊, 魏少军) 《Chinese Journal of Semiconductors》2004,25(4):383-387

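A partitioned (multi-module) datapath is chosen as the target architecture for high-level synthesis. A high-level synthesis algorithm that performs operation scheduling and data-flow graph partitioning simultaneously is fully defined, and an efficient heuristic solution method is proposed. Compared with the conventional architecture, the partitioned architecture can effectively reduce the clock period and improve circuit performance, because global wire delays are eliminated from the critical path. Experimental results confirm the effectiveness of the method.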

6.

Test Architecture for Test Reuse in SoC Design

Wang Chao, Shen Haibin, Lu Si'an, Yan Xiaolang (王超, 沈海斌, 陆思安, 严晓浪) 《Microelectronics》2004,34(3):314-316,321

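In system-on-a-chip (SoC) design, the chip test architecture that enables test reuse of IP cores generally consists of two parts: 1) an on-chip test access mechanism (TAM) that transports test stimuli and test responses; and 2) a chip test controller that performs test control. This paper analyzes a test-bus-based chip test architecture, explains in detail the concept of test scheduling in SoC design, and presents the design of a chip test controller that can flexibly implement a variety of test scheduling results.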

7.

Exploiting intellectual properties with imprecise design costs for system-on-chip synthesis

Byoung-Woon Kim Chong-Min Kyung 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2002,10(3):240-252

This paper presents an intellectual property (IP)-based system-on-chip (SoC) synthesis framework focusing on how to select IPs from different sources and how to integrate the selected IPs using on-chip buses. In order to synthesize an on-chip bus-based SoC architecture using IPs with imprecise design costs, we propose a possibilistic mixed integer linear programming (PMILP) model, which is converted into an equivalent mixed integer linear programming (MILP) model without increasing the computational complexity. Then, the equivalent MILP model is solved to decide whether each IP is selected or not, and to locate the selected IP on the optimal on-chip bus of a hierarchical bus architecture that consists of on-chip buses with different bus attributes. Experimental results on an MP3 decoding system show that the IP-centric design space with uncertainty can be successfully explored using the proposed scheme.

8.

An optimization approach to the synthesis of multichip architectures

Gebotys C.H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(1):11-20

An optimization approach to the high-level synthesis of VLSI multichip architectures is presented in this paper. This research is important for industry, since it is well known that these early high-level decisions have the greatest impact on the final VLSI implementation. Optimal application-specific architectures are synthesized to minimize latency, given constraints on chip area, I/O pin count and interchip communication delays. A mathematical integer programming (IP) model for simultaneously partitioning, scheduling, and allocating hardware (functional units, I/O pins, and interchip buses) is formulated. By exploiting the problem structure using polyhedral theory, the size of the search space is reduced, and a new variable selection strategy based on the branch-and-bound algorithm is introduced. Optimal multichip architectures for several examples are synthesized in practical CPU times. Execution times are comparable to those of previous heuristic approaches; however, the schedules and allocations of the multichip systems are significantly better. This research breaks new ground by 1) simultaneously partitioning, scheduling, and allocating in practical CPU times, 2) guaranteeing globally optimal architectures for multichip systems for a specific objective function, and 3) supporting interchip communication delay, interchip bus allocation, and other complex interface constraints.

9.

Topology/Floorplan/Pipeline Co-Design of Cascaded Crossbar Bus (total citations: 1; self-citations: 0; citations by others: 1)

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(8):1034-1047

On-chip bus design has a significant impact on the die area, power consumption, performance and design cycle of complex systems-on-chip (SoCs). In particular, for high-frequency systems whose on-chip buses are extensively pipelined to cope with long wire delays, a naive bus design may incur a significant area/power cost, mostly due to bus pipelining. The topology, floorplan, and pipeline are the most important design factors affecting the cost and frequency of an on-chip bus. Because they are strongly correlated with one another, all three must be codesigned. In this paper, we present an automated codesign method for cascaded crossbar bus design. We present CADBUS (CAscadeD crossbar BUS design tool), an automated tool for AXI-based cascaded crossbar bus architecture design. The primary objective of this study is to design a cascaded crossbar bus, including its topology, floorplan and bus pipelines, with minimum area/power cost while satisfying the given constraints on communication bandwidth/latency or frequency. Experimental results on three industrial-strength SoCs show that, compared with the existing approach, the proposed method saves 11.6% to 34.2% in bus area and 9.9% to 33.5% in power consumption.

10.

Hardware/software codesign: a systematic approach targeting data-intensive applications

Wiangtong T. Cheung P.Y.K. Luk W. 《Signal Processing Magazine, IEEE》2005,22(3):14-22

This article presents a systematic approach to hardware/software codesign targeting data-intensive applications. It focuses on application processes that can be represented as directed acyclic graphs (DAGs) and that, when running, use a synchronous dataflow (SDF) model, the popular form of dataflow employed in DSP systems. The codesign system is based on the UltraSONIC reconfigurable platform, a system designed jointly at Imperial College and the SONY Broadcast Laboratory. This system is modeled as a loosely coupled structure consisting of a single instruction processor and multiple reconfigurable hardware elements. The article also introduces and demonstrates a task-based hardware/software codesign environment specialized for real-time video applications. Together, the automated partitioning and scheduling environment and the task manager program help provide fast, robust support for demanding applications in the codesign system.

11.

SoC based floating point implementation of differential evolution algorithm using FPGA

Kiran Kumar Anumandla Rangababu Peesapati Samrat L. Sabat Siba K. Udgata 《Design Automation for Embedded Systems》2012,16(4):221-240

This paper presents the floating-point design and implementation of a System-on-Chip (SoC) based Differential Evolution (DE) algorithm on a Xilinx Virtex-5 Field Programmable Gate Array (FPGA). The hardware implementation is carried out to increase the execution speed of embedded applications. An Intellectual Property (IP) core for the DE algorithm is developed and interfaced with the 32-bit PowerPC 440 processor through the processor local bus (PLB) of the Xilinx Virtex-5 FPGA. In the proposed architecture, the algorithmic parameters of DE are scalable. The software and hardware implementations of the DE algorithm run on the PowerPC embedded processor and on the hardware IP, respectively. Optimization of numerical benchmark functions and system identification in control systems are implemented to verify the proposed hardware SoC platform. The performance of the IP is measured in terms of the acceleration gain of the DE algorithm. The optimization problems are solved using floating-point arithmetic in both the embedded processor and the hardware. The experimental results show that the hardware DE IP accelerates execution by approximately 200 times compared with an equivalent software implementation of the DE algorithm on the PowerPC 440 processor. Further, as a case study, an Infinite Impulse Response (IIR) based system identification task is implemented on the SoC using the developed hardware accelerator.
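For readers unfamiliar with DE, the algorithm that the IP core accelerates can be sketched in software. The following is a minimal Python sketch of the textbook DE/rand/1/bin variant. The population size, F, CR and generation count are assumed typical values rather than parameters taken from the paper, and the sketch models only the algorithm, not the hardware datapath.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Minimise f over a box with the classic DE/rand/1/bin scheme.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_vector, best_value). Parameter defaults are assumptions.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = pop[:], fit[:]
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated component
            trial = []
            for d in range(dim):
                # Binomial crossover between the mutant and the target vector.
                if rng.random() < CR or d == j_rand:
                    lo, hi = bounds[d]
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            # Selection: the trial replaces the target if it is no worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]
```

Calling it on a simple benchmark such as the sphere function, `differential_evolution(lambda x: sum(v * v for v in x), [(-5, 5)] * 3)`, should return a point close to the origin.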

12.

An FPGA implementation of HW/SW codesign architecture for H.263 video coding (total citations: 1; self-citations: 0; citations by others: 1)

Ahmed Patrice Fahmi Patrice Nouri Herve 《AEUE-International Journal of Electronics and Communications》2007,61(9):605-620

In this paper, we present an efficient HW/SW codesign architecture for an H.263 video encoder and its FPGA implementation. Each module of the encoder is examined to determine whether HW or SW better achieves real-time processing speed as well as flexibility. The hardware portions include the Discrete Cosine Transform (DCT), inverse DCT (IDCT), quantization (Q) and inverse quantization (IQ). The remaining parts are realized in software executed on the NIOS II soft-core processor. This paper also introduces efficient design methods for the HW and SW modules. In hardware, an efficient architecture for the 2-D DCT/IDCT is proposed to reduce chip size, and NIOS II custom instruction logic is used to implement Q/IQ. In software, an optimization technique using a fast block-matching algorithm for motion estimation (ME) is explored. The whole design is described in VHDL, verified in simulation and implemented on a Stratix II EP2S60 FPGA. Finally, the encoder has been tested on the Altera NIOS II development board and can run at up to 120 MHz. Implementation results show that HW/SW codesign improves coding speed by a factor of 15.8 to 16.5 compared with the software-only solution.
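A common way to reduce 2-D DCT hardware is row-column decomposition: the 8×8 2-D DCT is computed as 1-D DCTs over the rows followed by 1-D DCTs over the columns, so a single 1-D unit can be reused. The abstract does not say which decomposition the proposed architecture uses, so the following floating-point Python sketch of the orthonormal DCT-II is only a reference model of the transform itself, not of the paper's hardware.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """2-D DCT by row-column decomposition: 1-D DCT on every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # transform each column
    return [list(r) for r in zip(*cols)]  # transpose back to [vertical][horizontal]
```

As a sanity check, a flat 8×8 block of value 10 transforms to a single DC coefficient of 80 with all other coefficients (numerically) zero.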

13.

Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory

Lei Zhang Meikang Qiu Wei-Che Tseng Edwin H.-M. Sha 《Journal of Signal Processing Systems》2010,58(2):247-265

One of the most critical components determining the success of an MPSoC-based architecture is its on-chip memory. Scratch Pad Memory (SPM) is increasingly used instead of cache as the on-chip memory of embedded MPSoCs because of its advantages in chip area, power consumption and timing predictability. SPM can be organized as a Virtually Shared SPM (VS-SPM) architecture that combines the advantages of shared and private SPM. However, making effective use of the VS-SPM architecture depends strongly on two interdependent problems: variable partitioning and task scheduling. In this paper, we decouple these two problems and solve them in a phase-ordered manner. We propose two variable partitioning heuristics based on an initial schedule: High Access Frequency First (HAFF) variable partitioning and Global View Prediction (GVP) variable partitioning. We then present a loop pipeline scheduling algorithm called Rotation Scheduling with Variable Partitioning (RSVP) to improve overall throughput. Our experimental results on MiBench show that the average performance improvements over IDAS (Integrated Data Assignment with Scheduling) are 23.74% for HAFF and 31.91% for GVP on a four-core MPSoC. The average schedule length generated by RSVP is 25.96% shorter than that of list scheduling with optimal variable partitioning.
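The abstract names HAFF but does not spell out its rules. One plausible reading, stated here purely as an assumption, is a greedy pass that places the most frequently accessed variables first, each in the SPM of the core that accesses it most, falling back to another core's SPM (a remote access in a VS-SPM) and finally to off-chip memory. A minimal Python sketch of that interpretation, with a hypothetical function name and data layout:

```python
def haff_partition(access, sizes, capacity):
    """Greedy 'high access frequency first' placement (illustrative interpretation).

    access[v][c]: number of accesses to variable v by core c
    sizes[v]: size of v; capacity[c]: free SPM space on core c
    Returns {v: core index, or None for off-chip memory}.
    """
    free = list(capacity)
    placement = {}
    # Visit variables in decreasing order of total access count.
    for v in sorted(access, key=lambda v: -sum(access[v])):
        # Prefer the SPM of the core that touches v most (local access).
        prefs = sorted(range(len(free)), key=lambda c: -access[v][c])
        placement[v] = None
        for c in prefs:
            if free[c] >= sizes[v]:
                free[c] -= sizes[v]
                placement[v] = c
                break
    return placement
```

For instance, with `access = {"x": [10, 0], "y": [0, 8], "z": [5, 1]}`, sizes of 2 for every variable and capacities `[3, 4]`, variable `z` prefers core 0 but finds it full and lands in core 1's SPM.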

14.

MULTIPAR: behavioral partition for synthesizing multiprocessor architectures

Yunn-Yen Chen Yu-Chin Hsu Chung-Ta King 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(1):21-32

15.

Exploring Hardware/Software Codesign Based on Genetic Algorithms

Guo Zhaoyang (郭兆阳) 《Vacuum Electronics》2005,(2):13-15

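The target architecture for hardware/software codesign consists of one CPU and multiple ASICs that communicate over a single bus. This paper introduces a new codesign approach for multi-objective, multi-mode system synthesis. Each operating mode has a different probability of execution. Subject to the design constraints, a genetic algorithm is applied to optimize two objectives of the system: speed and power consumption. Being a global algorithm, the genetic algorithm can avoid getting trapped in local minima.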
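A genetic algorithm for this kind of problem can be sketched as follows. This is a minimal Python illustration under stated assumptions: each task is mapped either to the CPU (0) or to an ASIC (1); the two objectives are combined into a weighted sum of probability-weighted execution time and power over the operating modes; and the area constraint is handled with a penalty term. The cost model, weights, function names and GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are illustrative choices rather than the author's, and bus communication is ignored.

```python
import random


def make_cost(modes, hw_area, area_budget, w_time=0.5, w_power=0.5, penalty=1000.0):
    """Build a hypothetical weighted-sum cost over probability-weighted modes.

    modes: list of (probability, sw_time, hw_time, sw_power, hw_power),
    each a per-task list. Exceeding the ASIC area budget is penalised.
    """
    def cost(x):
        exp_time = exp_power = 0.0
        for prob, sw_t, hw_t, sw_p, hw_p in modes:
            exp_time += prob * sum(h if g else s for s, h, g in zip(sw_t, hw_t, x))
            exp_power += prob * sum(h if g else s for s, h, g in zip(sw_p, hw_p, x))
        area = sum(a for a, g in zip(hw_area, x) if g)
        return w_time * exp_time + w_power * exp_power + penalty * max(0, area - area_budget)
    return cost


def ga_partition(cost, n_tasks, pop_size=30, generations=100, p_mut=0.05, seed=1):
    """Minimise cost over 0/1 chromosomes (1 = task on an ASIC, 0 = on the CPU)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_tasks)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [min(pop, key=cost)]  # elitism: carry over the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_tasks)  # one-point crossover (needs n_tasks >= 2)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return min(pop, key=cost)
```

Because the best individual is always carried over, the best cost never gets worse from one generation to the next; the penalty weight simply has to be large enough that infeasible partitions never win.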

16.

Improving a Multi-Way Hardware/Software Partitioning Algorithm with a Screening Method

Cao Yun, Bian Jinian, Wu Qiang (曹云, 边计年, 吴强) 《Microelectronics & Computer》2007,24(1):1-4

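This paper proposes a screening method that improves the multi-way hardware/software partitioning algorithm based on an abstract architecture template, greatly shortening the overall hardware/software partitioning and task scheduling process. The method inserts a screening step between the partitioning and task-scheduling stages of the original algorithm: the hardware area of each partitioning result is estimated, candidates are screened according to this estimate, and only the partitioning schemes that meet the requirements are scheduled, which greatly reduces the workload of the scheduling stage. Experimental results show that, with the screening step added, the overall partitioning and scheduling process is significantly faster with essentially no loss in the performance of the final result.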
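The screening step can be illustrated with a small sketch. The abstract does not describe the area estimator, so the code assumes, hypothetically, that the hardware area of a partition is the sum of per-task area estimates; `schedule` stands for the expensive task-scheduling step and is passed in as a function. The function name is likewise an assumption.

```python
def screen_and_schedule(candidates, task_area, area_budget, schedule):
    """Schedule only the candidate partitions whose estimated HW area fits the budget.

    candidates: iterable of sets of task ids mapped to hardware
    task_area: {task: estimated HW area} (assumed additive estimate)
    schedule: expensive function returning the schedule length of a partition
    Returns ((best_length, best_partition) or None, number_of_schedule_calls).
    """
    best, calls = None, 0
    for part in candidates:
        # Screening: a cheap area estimate rejects infeasible partitions early.
        if sum(task_area[t] for t in part) > area_budget:
            continue
        calls += 1
        length = schedule(part)  # the costly step, run only for survivors
        if best is None or length < best[0]:
            best = (length, part)
    return best, calls
```

The saving comes from the `calls` counter staying below the number of candidates: every partition rejected by the cheap estimate is one scheduling run avoided.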

17.

Synthesis for Mixed Arithmetic

Anne Mignotte Jean-Michel Muller Olivier Peyran 《Design Automation for Embedded Systems》2000,5(1):29-60

The primary goal of this paper is to show that a clever use of redundant number systems in some parts of a design can significantly increase its speed without noticeably increasing its area or power consumption. This can be achieved by automatically using, in the same design, redundant (e.g., carry-save or borrow-save) as well as non-redundant (i.e., conventional) number systems; this approach can be called mixed arithmetic. It imposes specific constraints on the scheduling process. We propose an integer linear programming (ILP) formulation, which finds an optimal solution for examples of reasonable size. In some cases, however, the ILP solution time may become huge. To solve this problem, we introduce a general solution based on constraint graph partitioning, which leads to a partitioned ILP formulation. This partitioning approach can also be used for other similar synthesis problems formulated as ILPs.
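Why redundant arithmetic is faster can be seen in a carry-save adder: three operands are reduced to a sum word and a carry word using only per-bit operations, so no carry ripples across bit positions, and a conventional carry-propagate addition is needed only once at the end. A minimal Python sketch on non-negative integers follows; the function names are illustrative, and borrow-save arithmetic and the paper's ILP scheduling formulation are not shown.

```python
def carry_save_add(a, b, c):
    """3:2 compression: returns (s, cy) with a + b + c == s + cy.

    Each bit position is handled independently (a row of full adders),
    so no carry propagates between positions.
    """
    s = a ^ b ^ c                            # per-bit sum
    cy = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority = carry, shifted left
    return s, cy


def multi_operand_sum(values):
    """Accumulate in redundant (carry-save) form, then resolve carries once."""
    s, cy = 0, 0
    for v in values:
        s, cy = carry_save_add(s, cy, v)
    return s + cy  # single conventional carry-propagate addition
```

For example, `carry_save_add(5, 3, 6)` yields `(0, 14)`: the redundant pair is not the sum itself, but it adds up to it, which is exactly what lets intermediate operations skip carry propagation.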

18.

Efficient Block Scheduling to Minimize Context Switching Time for Programmable Embedded Processors

Inki Hong Miodrag Potkonjak Marios Papaefthymiou 《Design Automation for Embedded Systems》1999,4(4):311-327

Scheduling is one of the most frequently addressed optimization problems in DSP compilation, behavioral synthesis, and system-level synthesis research. With the rapid pace of change in modern DSP application requirements and implementation technologies, however, new types of scheduling challenges arise. This paper is concerned with the problem of scheduling blocks of computations in order to optimize the efficiency of their execution on programmable embedded systems under a realistic timing model of their processors. We describe an effective scheme for scheduling the blocks of any computation on a given system architecture and with a specified algorithm implementing each block. We also present algorithmic techniques for performing optimal block scheduling simultaneously with optimal architecture and algorithm selection. Our techniques address the block scheduling problem for both single- and multiple-processor system platforms and for a variety of optimization objectives, including throughput, cost, and power dissipation. We demonstrate the practical effectiveness of our techniques on numerous designs and synthetic examples.

19.

Synthesis of Array Processors from Behavioral-Level Descriptions of DSP Algorithms

Xu Chao, Zhang Zengyan (许超, 张增雁) 《Acta Electronica Sinica》1993,21(5):1-9

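This paper proposes a high-level system synthesis flow for recursive DSP algorithms, implemented as a systolic processor array. Starting from an FDDL behavioral-level description of the DSP algorithm, compilation and partitioning produce a Data Dependency Graph. Spatial mapping and temporal scheduling of the algorithm graph then yield the algorithm's Signal Flow Graph, which is retimed to generate a systolic array. Finally, datapath synthesis and controller synthesis are performed for the processing elements, and the processing elements are placed. The paper also presents the algorithmic and optimization strategies used at each design stage and reports the synthesis results.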

20.

A Petri Net Model for Hardware/Software Codesign (total citations: 4; self-citations: 0; citations by others: 4)

Paulo Maciel Edna Barros Wolfgang Rosenstiel 《Design Automation for Embedded Systems》1999,4(4):243-310
