Similar Articles
20 similar records found.
1.
In order to satisfy cost and performance requirements, digital signal processing and telecommunication systems are generally implemented with a combination of different components, from custom-designed chips to off-the-shelf processors. These components vary in their area, performance, programmability and so on, and the system functionality is partitioned amongst the components to best utilize this tradeoff. However, for performance critical designs, it is not sufficient to only implement the critical sections as custom-designed high-performance hardware, but it is also necessary to pipeline the system at several levels of granularity. We present a design flow and an algorithm to first allocate software and hardware components, and then partition and pipeline a throughput-constrained specification amongst the selected components. This is performed to best satisfy the throughput constraint at minimal application-specific integrated-circuit cost. Our ability to incorporate partitioning with pipelining at several levels of granularity enables us to attain high-throughput designs, and also distinguishes this paper from previously proposed hardware/software partitioning algorithms.

2.
While hardware/software partitioning has been shown to provide significant performance gains, most hardware/software partitioning approaches are limited to partitioning computational kernels utilizing integers or fixed point implementations. Software developers often initially develop an application using the floating point representations built into most programming languages and later convert the application to a fixed point representation, a potentially time-consuming process. In this paper, we present the Arizona Float Fixed Hardware Library (AFFHL), consisting of efficient, configurable floating point to fixed point and fixed point to floating point hardware converters. By utilizing these converters, a system's hardware/software implementation can be separated into a floating point domain consisting of the microprocessor and memory subsystem and a fixed point domain consisting of one or more partitioned hardware coprocessors. This separation enables a rapid hardware/software partitioning approach in which floating point software kernels can be implemented using fixed point hardware coprocessors without the need for application developers to first rewrite software applications as fixed point implementations. We further present an overview of a basic hardware/software partitioning methodology for rapidly partitioning computational kernels within floating point software applications to either statically determined fixed point hardware coprocessors or dynamically adaptable fixed point hardware coprocessors in which the required fixed point representation can be dynamically determined and adjusted at runtime.
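The conversion that such hardware converters perform can be illustrated in software. The sketch below is purely illustrative (it is not the AFFHL interface): it maps a float to a signed fixed point value with a configurable number of fractional bits, saturating on overflow, and back.

```python
# Illustrative float <-> fixed-point mapping (signed Q format); bit widths and
# round-to-nearest behaviour are assumptions, not the AFFHL specification.

def float_to_fixed(x: float, frac_bits: int = 16, total_bits: int = 32) -> int:
    """Quantize a float to a signed fixed-point integer with frac_bits fractional bits."""
    scaled = int(round(x * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))        # saturate on overflow

def fixed_to_float(q: int, frac_bits: int = 16) -> float:
    """Recover an approximate float from the fixed-point integer."""
    return q / (1 << frac_bits)

if __name__ == "__main__":
    q = float_to_fixed(3.14159, frac_bits=16)
    print(q, fixed_to_float(q, frac_bits=16))   # error bounded by 2**-17
```

With round-to-nearest, the conversion error is at most half of 2^-frac_bits, which is why the fractional width selected for a coprocessor directly trades numerical accuracy against hardware cost.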

3.
One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated for a chosen objective function. The partitioning approach works fully automatically and supports multi-processor systems, interfacing and hardware sharing. In contrast to other approaches where special estimators are used, we use compilation and synthesis tools for cost estimation. The increased time for calculating values for the cost metrics is compensated by an improved quality of the values. Therefore, fewer iteration steps for partitioning are needed. The paper presents an algorithm using integer programming for solving the hardware/software partitioning problem, leading to promising results.
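To make the flavor of such a formulation concrete, the toy sketch below (not the paper's IP model or cost metrics; all numbers are invented) enumerates the same 0/1 decision variables an integer-programming solver would optimize: minimize total hardware area subject to a deadline on execution time. An ILP solver replaces the exhaustive enumeration but returns the same optimal assignment for a given objective.

```python
# Toy hardware/software partitioning as a 0/1 optimization problem, solved
# exactly by enumeration (feasible only for small task sets). An ILP solver
# would optimize the same objective over the same binary variables.
from itertools import product

# (software time, hardware time, hardware area) per task -- invented numbers
tasks = [(10, 2, 5), (8, 3, 4), (6, 1, 7), (12, 4, 6)]
DEADLINE = 20

best = None
for assign in product((0, 1), repeat=len(tasks)):      # 1 = move task to hardware
    time = sum(hw if a else sw for a, (sw, hw, _) in zip(assign, tasks))
    area = sum(ar for a, (_, _, ar) in zip(assign, tasks) if a)
    if time <= DEADLINE and (best is None or area < best[0]):
        best = (area, assign)

print("minimal hardware area:", best[0], "assignment:", best[1])
```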

4.
We present Avalanche, a prototyping framework that addresses the issues of power estimation and optimization for mixed hardware and software embedded systems. Avalanche is based on a generic embedded system architecture consisting of an embedded CPU, custom hardware, and a memory hierarchy. For system-level power estimation, given various system parameters like cache sizes, cache policies, bus width, etc., Avalanche is able to rapidly evaluate/estimate power and performance and thus facilitate comprehensive design space explorations. For system-level power optimization, Avalanche offers different modes reflecting various design scenarios: if no hardware/software partitioning or only partial partitioning has been conducted, Avalanche guides the designer in finding power-aware hardware/software partitioning; when a system has already been partitioned, Avalanche can optimize system parameters such as cache and memory size; if system parameters and partitioning are given, Avalanche applies additional optimizations for power including source-to-source compiler transformations. Avalanche has been deployed during the design phase of real-world applications including an MPEG II encoder in a set-top box design. Extensive design space explorations in terms of power and performance could be conducted within several hours, and various optimization techniques led to power reductions of up to 94% without performance losses and only a slight increase in total chip size (i.e., transistor count).

5.
A program execution trace (hereafter, trace) records the instruction stream of a program run: it captures completely the content and order of the instructions executed. For most programs, a few short hot traces determine overall system performance. This paper proposes a hardware/software partitioning method that extracts acceleration modules from program execution traces. A hot-trace extraction algorithm maps the system's critical traces to hardware, and branch assertions are used to build atomic execution units, so that a high speedup is obtained at a small hardware cost. In our experiments, compared with fine-grained instruction-level partitioning based on simulated annealing, the proposed method achieves on average 9.6% higher performance with 29% smaller hardware area.
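As a rough illustration of the idea (not the algorithm evaluated in the paper), hot-trace selection can be sketched as ranking traces by execution frequency and moving the hottest ones to hardware until an assumed area budget is exhausted; the trace log, areas, and budget below are invented.

```python
# Greedy hot-trace selection under an area budget -- an illustrative sketch only.
from collections import Counter

def select_hot_traces(executed_traces, trace_area, area_budget):
    """executed_traces: trace ids in execution order; returns traces moved to hardware."""
    freq = Counter(executed_traces)
    chosen, used = [], 0
    for trace_id, _count in freq.most_common():        # hottest traces first
        if used + trace_area[trace_id] <= area_budget:
            chosen.append(trace_id)
            used += trace_area[trace_id]
    return chosen

trace_log = ["t1"] * 900 + ["t2"] * 80 + ["t3"] * 20    # a few hot traces dominate
areas = {"t1": 30, "t2": 25, "t3": 10}
print(select_hot_traces(trace_log, areas, area_budget=40))   # ['t1', 't3']
```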

6.
This paper presents two heuristics for automatic hardware/software partitioning of system level specifications. Partitioning is performed at the granularity of blocks, loops, subprograms, and processes with the objective of performance optimization with a limited hardware and software cost. We define the metric values for partitioning and develop a cost function that guides partitioning towards the desired objective. We consider minimization of communication cost and improvement of the overall parallelism as essential criteria during partitioning. Two heuristics for hardware/software partitioning, formulated as a graph partitioning problem, are presented: one based on simulated annealing and the other on tabu search. Results of extensive experiments, including real-life examples, show the clear superiority of the tabu search based algorithm.
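The simulated-annealing variant of such a partitioner can be sketched generically as follows. The cost function mixing execution time, cut (communication) edges, and an area penalty is a placeholder, not the metric values defined in the paper; a tabu-search version would replace the probabilistic acceptance rule with a tabu list over recently moved nodes.

```python
# Generic simulated annealing over a binary HW/SW partition; problem data and
# the cost function are illustrative placeholders.
import math, random

random.seed(0)
N = 8
comm = [[1 if abs(i - j) == 1 else 0 for j in range(N)] for i in range(N)]  # chain graph
sw_t, hw_t, hw_area = [10] * N, [3] * N, [5] * N
AREA_MAX = 20

def cost(part):
    time = sum(hw_t[i] if part[i] else sw_t[i] for i in range(N))
    cut = sum(comm[i][j] for i in range(N) for j in range(N) if part[i] != part[j])
    area = sum(hw_area[i] for i in range(N) if part[i])
    return time + 2 * cut + (1000 if area > AREA_MAX else 0)   # penalize infeasibility

part = [0] * N                      # start with everything in software
best, best_cost, T = part[:], cost(part), 50.0
while T > 0.1:
    cand = part[:]
    cand[random.randrange(N)] ^= 1                  # move one node across the cut
    delta = cost(cand) - cost(part)
    if delta < 0 or random.random() < math.exp(-delta / T):
        part = cand
        if cost(part) < best_cost:
            best, best_cost = part[:], cost(part)
    T *= 0.95                                       # cool down
print("best partition:", best, "cost:", best_cost)
```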

7.
In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are sped up on the coarse-grain reconfigurable hardware to meet the timing requirements of the application code mapped on the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture which can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware is realized by our developed high-performance data-path. The methodology mainly consists of three stages: analysis, mapping of the application parts onto fine- and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.

8.
Reconfigurable computers (RCs) host multiple field programmable gate arrays (FPGAs) and one or more physical memories that communicate through an interconnection fabric. State-of-the-art RCs provide abundant hardware and storage resources, but have tight constraints on FPGA pin-out and inter-FPGA interconnection resources. These stringent constraints are the primary impediment for multi-FPGA partitioning tools to generate high-quality designs. In this paper, we present two integrated partitioning and synthesis approaches for RCs. The first approach involves fine-grained partitioning of a scheduled data-flow graph (DFG, or an operation graph), and the second involves coarse-grained partitioning of an unscheduled control data flow graph (CDFG, or a block graph). A hardware design space exploration engine is integrated with the block graph partitioner that dynamically contemplates multiple schedules during partitioning. The novel feature in the partitioning approaches is that the physical memory in the RC is effectively used to alleviate the FPGA pin-out and inter-FPGA interconnection bottleneck. Several experiments have been conducted, targeting commercial multi-FPGA boards, to compare the two partitioning approaches, and detailed summaries are presented.

9.
Overlay routing has emerged as a promising approach to improving the performance and reliability of Internet paths. To fully realize the potential of overlay routing under the constraints of deployment costs in terms of hardware, network connectivity and human effort, it is critical to carefully place infrastructure overlay nodes to balance the tradeoff between performance and resource constraints. In this paper, we investigate approaches to perform intelligent placement of overlay nodes to facilitate (i) resilient routing and (ii) TCP performance improvement. We formulate objective functions to capture application behavior, namely reliability and TCP performance, and develop several placement algorithms which offer a wide range of tradeoffs in complexity and required knowledge of the client-server locations and traffic load. Using simulations on synthetic and real Internet topologies, as well as PlanetLab experiments, we demonstrate the effectiveness of the placement algorithms and objective functions developed. We conclude that a hybrid approach combining greedy and random approaches provides the best tradeoff between computational efficiency and accuracy. We also uncover the fundamental challenge in simultaneously optimizing for reliability and TCP performance, and propose a simple unified algorithm to achieve both.
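The greedy half of such a hybrid can be sketched as below; the coverage-style objective is an invented stand-in for the reliability and TCP-performance objective functions formulated in the paper, and the candidate nodes and client-server pairs are illustrative.

```python
# Greedy overlay-node placement: repeatedly pick the candidate node that helps
# the largest number of not-yet-covered client-server pairs. Objective and data
# are illustrative assumptions.

def greedy_placement(candidates, pairs, covers, k):
    """covers[n] = set of (client, server) pairs node n can offer an alternate path for."""
    placed, helped = [], set()
    for _ in range(k):
        best = max(candidates, key=lambda n: len(covers[n] - helped), default=None)
        if best is None or not (covers[best] - helped):
            break                                   # no further improvement possible
        placed.append(best)
        helped |= covers[best]
        candidates = [n for n in candidates if n != best]
    return placed, len(helped) / len(pairs)

pairs = [("c1", "s1"), ("c2", "s1"), ("c3", "s2")]
covers = {"n1": {("c1", "s1"), ("c2", "s1")}, "n2": {("c3", "s2")}, "n3": {("c1", "s1")}}
print(greedy_placement(["n1", "n2", "n3"], pairs, covers, k=2))   # (['n1', 'n2'], 1.0)
```

A purely random placement would skip the argmax step and sample nodes uniformly; the hybrid favored in the paper combines the two to balance computational efficiency and accuracy.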

10.
Automatic Performance Setting for Dynamic Voltage Scaling
The emphasis on processors that are both low power and high performance has resulted in the incorporation of dynamic voltage scaling into processor designs. This feature allows one to make fine granularity tradeoffs between power use and performance, provided there is a mechanism in the OS to control that tradeoff. In this paper, we describe a novel software approach to automatically controlling dynamic voltage scaling in order to optimize energy use. Our mechanism is implemented in the Linux kernel and requires no modification of user programs. Unlike previous automated approaches, our method works equally well with irregular and multiprogrammed workloads. Moreover, it has the ability to ensure that the quality of interactive performance is within user specified parameters. Our experiments show that as a result of our algorithm, processor energy savings of as much as 75% can be achieved with only a minimal impact on the user experience.
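A much-simplified interval-based controller conveys the basic shape of such a mechanism (the kernel implementation described in the paper additionally tracks interactive episodes and user-specified quality parameters); the thresholds and frequency table below are illustrative assumptions.

```python
# Toy interval-based DVS policy: raise the clock when the last interval was
# mostly busy, lower it when it was mostly idle. Not the paper's algorithm.

FREQS_MHZ = [200, 400, 600, 800, 1000]

def next_frequency(idx: int, utilization: float) -> int:
    """utilization: fraction of the last interval the CPU was busy (0..1)."""
    if utilization > 0.85 and idx < len(FREQS_MHZ) - 1:
        return idx + 1          # falling behind: speed up
    if utilization < 0.50 and idx > 0:
        return idx - 1          # plenty of slack: save energy
    return idx

idx = 2
for util in [0.90, 0.95, 0.30, 0.20, 0.60]:
    idx = next_frequency(idx, util)
    print(f"utilization {util:.2f} -> run at {FREQS_MHZ[idx]} MHz")
```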

11.
In this paper we present a software programmable design flow that facilitates the implementation and integration of efficient digital pre-distortion (DPD) solutions on leading-edge field programmable gate arrays, combining industry-standard embedded processors and programmable logic fabric into one chip. In addition to software programmability, another key contribution of this design flow is the flexible partitioning of functionality among the hardware and software components, depending on the complexity of the DPD parameter estimation algorithm in use. We have applied processor-specific optimizations to the software implementation and used the Vivado high-level synthesis (HLS) tool as the design tool for the programmable logic. Furthermore, we have compared two different techniques for the integration of hardware and software components, and have chosen the one with the better area/latency trade-off. We present a comprehensive study reporting the DPD parameter update times when exploring the partitioning of the functionality among hardware and software. For low-complexity algorithms, we show that a software-only solution is applicable after carrying out the processor-specific software optimizations. For higher-complexity algorithms, we use Vivado HLS to accelerate the time-consuming blocks in the programmable logic, leading to a speed-up factor of up to 7× in the overall algorithm execution time. We present the performance results for two target devices. We also show that our accelerators use only a small portion of the programmable logic fabric on these devices and that a significant reduction of the system's energy consumption can be obtained by leveraging the FPGA fabric.

12.
13.
Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step in validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case study confirms that blindly maximizing data-parallelism can result in schedules even worse than those generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very importantly, we demonstrate that our approach is well-suited for true on-demand computing, with detailed execution time estimates on a typical embedded processor: heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.

14.
吴伟, 朱樟明. 《电子质量》, 2004, (8): 60-62, 84.
Two approaches to co-simulation of a processor-based System-on-a-Chip (SoC) using SystemC are presented. By simulating the system with both approaches, the two are compared in terms of simulation interval, speed, and other performance aspects. The results are of practical value for current hardware/software co-design and verification of SoCs.

15.
Design of real-time electronic systems is critical since these systems include performance constraints as part of their requirements. The goal is to map all functions of such systems onto a distributed hardware/software architecture in such a way that all performance constraints can be met. Hardware/software codesign approaches are thus an important issue. The aim of this paper is to discuss a case study of an X25 system design using a hardware/software co-design methodology. Several alternatives are discussed with respect to their performance. A prototype of the X25 system, which correctly implements the system functionality while meeting real-time requirements, has been experimentally checked.

16.
姜东, 李波, 李炜, 宋建斌. 《电子学报》, 2006, 34(11): 1941-1946.
This paper proposes ZFMO, a flexible macroblock ordering algorithm based on ZIG-ZAG interleaving. Macroblocks are interleaved along the minor diagonals following a ZIG-ZAG scan, which achieves the best balance between coding efficiency and error resilience at low bit rates. Experimental results show that, over networks with low packet loss rates, ZFMO offers better coding efficiency and error resilience than the algorithms currently recommended for H.264/AVC. After rate-distortion optimization, the performance of ZFMO improves further.
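One possible reading of the interleaving (an assumed interpretation for illustration, not necessarily the exact ZFMO mapping) is to walk the macroblock grid along its anti-diagonals in a zig-zag order and assign macroblocks to slice groups round-robin, so that spatially neighbouring macroblocks tend to travel in different packets and a lost packet leaves decodable neighbours for concealment.

```python
# Assign macroblocks to slice groups by walking anti-diagonals in zig-zag order
# and distributing them round-robin -- an illustrative interpretation only.

def zigzag_slice_groups(mb_rows, mb_cols, num_groups):
    order = []
    for d in range(mb_rows + mb_cols - 1):                       # anti-diagonals
        diag = [(r, d - r) for r in range(mb_rows) if 0 <= d - r < mb_cols]
        order.extend(diag if d % 2 == 0 else reversed(diag))     # zig-zag direction
    return {mb: i % num_groups for i, mb in enumerate(order)}

groups = zigzag_slice_groups(4, 4, num_groups=2)
for r in range(4):
    print([groups[(r, c)] for c in range(4)])    # slice-group id per macroblock
```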

17.
With modern technology, a very-large-scale system may contain several million gates. Achieving an optimal multiple-level partitioning of such a system onto a fixed-hierarchy hardware accelerator presents a formidable challenge to even the fastest computing engines currently available. The application of a divide-and-conquer heuristic coupled with a novel ratio-cut algorithm that solves the above problem under a variety of constraints is described. The goal of this approach is to minimize the communication cost in the hierarchy. Experiments with designs containing up to two million gates are described, and it is demonstrated that the proposed approach decreased communication costs by a factor of two or more when compared with other approaches. This approach enables the hardware simulator to perform approximately three billion gate evaluations per second, or approximately 200 million event evaluations in an event-driven simulator, using a 6% activity rate.
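The ratio-cut objective itself is standard: the weight of the edges crossing a bipartition divided by the product of the part sizes, so that both heavy cuts and badly unbalanced parts are penalized. A minimal sketch with invented nodes and weights:

```python
# Ratio-cut value of a bipartition: cut weight / (|A| * |B|).

def ratio_cut(edges, part_a, part_b):
    """edges: {(u, v): weight}; part_a and part_b are disjoint sets of nodes."""
    cut = sum(w for (u, v), w in edges.items() if (u in part_a) != (v in part_a))
    return cut / (len(part_a) * len(part_b))

edges = {("g1", "g2"): 3, ("g2", "g3"): 1, ("g3", "g4"): 2, ("g1", "g4"): 1}
print(ratio_cut(edges, {"g1", "g2"}, {"g3", "g4"}))   # (1 + 1) / (2 * 2) = 0.5
```

The divide-and-conquer heuristic presumably applies such bipartitions recursively down the fixed accelerator hierarchy, preferring cuts that keep this ratio small.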

18.
In this paper, we address the joint data-aided estimation of frequency offsets and channel coefficients in uplink multiple-input multiple-output orthogonal frequency-division multiple access (MIMO-OFDMA) systems. As the maximum-likelihood (ML) estimator is impractical in this context, we introduce a family of suboptimal estimators with the aim of exhibiting an attractive tradeoff between performance and complexity. The estimators do not rely on a particular subcarrier assignment scheme and are, thus, valid for a large number of OFDMA systems. As far as complexity is concerned, the computational cost of the proposed estimators is shown to be significantly reduced compared to existing estimators based on ML. As far as performance is concerned, the proposed suboptimal estimators are shown to be asymptotically efficient, i.e., the covariance matrix of the estimation error achieves the Cramer-Rao bound when the total number of subcarriers increases. Simulation results sustain our claims.
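Written out, the asymptotic-efficiency claim is the standard one: with θ collecting the frequency offsets and channel coefficients and N the total number of subcarriers, the error covariance of the estimators approaches the inverse Fisher information matrix, i.e., the Cramer-Rao bound (the notation below is generic, not taken from the paper):

```latex
\operatorname{Cov}(\hat{\boldsymbol\theta})
  = \mathbb{E}\!\left[(\hat{\boldsymbol\theta}-\boldsymbol\theta)
                      (\hat{\boldsymbol\theta}-\boldsymbol\theta)^{H}\right]
  \;\xrightarrow[\,N\to\infty\,]{}\;
  \mathbf{J}^{-1}(\boldsymbol\theta),
\qquad
[\mathbf{J}(\boldsymbol\theta)]_{k\ell}
  = \mathbb{E}\!\left[
      \frac{\partial \ln p(\mathbf{y};\boldsymbol\theta)}{\partial \theta_{k}}\,
      \frac{\partial \ln p(\mathbf{y};\boldsymbol\theta)}{\partial \theta_{\ell}}
    \right]
```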

19.
Virtual Prototyping For Modular And Flexible Hardware-Software Systems
The goal of this work is to develop a methodology for fast prototyping of highly modular and flexible electronic systems including both software and hardware. The main contribution of this work is the ability to handle a wide range of architectures. We assume that hardware/software partitioning has already been made. This stage of the codesign process starts with a virtual prototype, a heterogeneous architecture composed of a set of distributed modules, represented in VHDL for hardware elements and in C for software elements, communicating through communication modules. This work concentrates on a modelling strategy that allows the virtual prototype to be used for both cosynthesis (mapping hardware and software modules onto an architectural platform) and cosimulation (that is, the joint simulation of hardware and software components) in a unified environment. The main contribution is the use of a multi-view library concept in order to hide specific hardware/software implementation details and communication schemes. In particular, this approach addresses the problem of communication between the hardware and software modules.

20.
SoC-Based ASIC Design of a DRM Receiver
田曦, 董在望. 《电声技术》, 2005, (3): 61-63, 67.
DRM is a new-generation digital broadcasting standard. For the ASIC design of a DRM receiver, an SoC architecture based on hardware/software co-design is proposed. The on-chip processing units are described, together with the hardware/software partitioning, co-design, and verification methods used in the SoC design. Finally, the performance of the DRM receiver is reported.
