首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The authors present a new polynomial-time algorithm for computing lower bounds on the number of functional units (FUs) of each type required to schedule a data flow graph in a specified number of control steps. A formal approach is presented that is guaranteed to find the tightest possible bounds that can be found by relaxing either the precedence constraints or integrality constraints on the scheduling problem. This tight, yet fairly efficient, bounding method can be used to estimate FU area, to generate resource constraints for reducing the search space, or in conjunction with exact techniques for efficient optimal design space exploration  相似文献   

2.
In this paper, we are concerned about performance estimation of bus-based communication architectures assuming that task partitioning and scheduling on processing elements are already determined. Since communication overhead is dynamic and unpredictable due to bus contention, a simulation-based approach seems inevitable for accurate performance estimation. However, it is too time-consuming to be used for exploring the wide design space of bus architectures. We propose a static performance-estimation technique based on a queueing analysis assuming that the memory traces and the task schedule information are given. We use this static estimation technique as the first step in our design space exploration framework to prune the design space drastically before applying a simulation-based approach to the reduced design space. Experimental results show that the proposed technique is several orders of magnitude faster than a trace-driven simulation while keeping the estimation error within 10% consistently in various communication architecture configurations.  相似文献   

3.
Effect of floating-body charge on SOI MOSFET design   总被引:2,自引:0,他引:2  
This work presents a new method for assessing the effect of floating-body charge on a fully- and partially-depleted Silicon-on-Insulator (SOI) MOSFET device design space. Floating-body effects under transient conditions are incorporated into the device design parameters threshold voltage VT and off-current I0FF using calibrated two-dimensional (2-D) device simulation. Simulation methodology which reveals the worst-case bounds of the device design parameters, from idle to switching-steady-state, is presented and applied to a CMOS inverter example. Using this methodology, the worst-case shifts in VT and I0FF due to hysteretic floating-body charge are quantified for devices in L eff=0.2- and 0.1-μm design spaces. Methods to reduce floating-body effects are discussed including a demonstration of how reducing the effective bulk carrier lifetime widens the 0.1-μm design space  相似文献   

4.
Dynamic Partial Reconfiguration (DPR) enables resource sharing in FPGA-based systems. It can also be used for the mitigation of aging-related permanent faults by increasing the number of redundant Partially Reconfigurable Regions (PRRs). Normally, these PRRs are able to host any of the Partially Reconfigurable Modules (PRMs), or tasks, at one particular instance. This kind of system is called homogeneous. However, the FPGA resource constraints limit the amount of homogeneous redundancy that can be used and hence affect the lifetime of the system. This issue can be addressed by utilizing the heterogeneous approach where each PRR now only hosts a subset of the tasks. Further, the deadlines of the applications must also be taken care of in the design phase to decide the mapping and scheduling of tasks to PRRs. To this end, we propose an application-specific multi-objective system-level design methodology to determine the appropriate number of PRRs and the mapping and scheduling of tasks to the PRRs. Specifically, we propose a lifetime-aware scheduling method that maximizes the system's mean time to failure (MTTF) with different tolerances in the makespan specification of an application. We use the scheduler along with an automated floorplanner for design space exploration at design-time to generate a feasible heterogeneous PRR-based system. Our experiments show that the heterogeneous systems can offer more than 2x lifetime improvement over homogeneous ones. It also offers better scaling with increased tolerance in makespan specification.  相似文献   

5.
An integrated top-down design methodology is presented in this brief for synthesizing high performance clock distribution networks based on application dependent localized clock skew. The methodology is divided into four phases: (1) determining an optimal clock skew schedule composed of a set of nonzero clock skew values and the related minimum clock path delays; (2) designing the topology of the clock distribution network with delays assigned to each branch based on the circuit hierarchy, the aforementioned clock skew schedule, and minimizing process and environmental delay variations; (3) designing circuit structures to emulate the delay values assigned to the individual branches of the clock tree; and (4) designing the physical layout of the clock distribution network. The clock distribution network synthesis methodology is based on CMOS technology. The clock lines are transformed from distributed resistive capacitive interconnect lines into purely capacitive interconnect lines by partitioning the RC interconnect lines with inverting repeaters. Variations in process parameters are considered during the circuit design of the clock distribution network to guarantee a race-free circuit. Nominal errors of less than 2.5% for the delay of the clock paths and 7% for the clock skew between any two registers belonging to the same global data path as compared with SPICE Level-3 are demonstrated  相似文献   

6.
Rapid and effective design space exploration at all stages of a design process enables faster design convergence and shorter time-to-market. This is particularly important during the early stage of a design where design decisions can have a significant impact on design convergence. This paper describes a methodology for design space exploration using design target prediction models. These models are driven by legacy design data, technology scaling trends and, an in situ model-fitting process. Experiments on ISCAS benchmark circuits validate the feasibility of the proposed approach and yielded power centric designs that improved power by 7-32% for a corresponding 0-9% performance impact; or performance centric designs with improved performance of 10.31-17% for a corresponding 2-3.85% power penalty. Evolutionary algorithm based Pareto analysis on an industrial 65 nm design uncovered design tradeoffs which are not obvious to designers and optimize both power and performance. The high performance design option of the industrial design improved the straight-ported design's performance by 29% with a 2.5% power penalty, whereas the low power design option reduced the straight-ported design's power consumption by 40% for a 9% performance penalty.  相似文献   

7.
Multiple on-chip memory modules are attractive to many high-performance digital signal processing (DSP) applications. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. However, making effective use of multiple memory modules remains difficult. The performance gain in this kind of architecture strongly depends on variable partitioning and scheduling techniques. In this paper, we propose a graph model known as the variable independence graph (VIG) and algorithms to tackle the variable partitioning problem. Our results show that VIG is more effective than interference graph for solving variable partitioning problem. Then, we present a scheduling algorithm known as the rotation scheduling with variable repartition (RSVR) to improve the schedule lengths efficiently on a multiple memory module architecture. This algorithm adjusts the variable partitions during scheduling and generates a compact schedule based on retiming and software pipelining. The experimental results show that the average improvement on schedule lengths is 44.8% by using RSVR with VIG. We also propose a design space exploration algorithm using RSVR to find the minimum number of memory modules and functional units satisfying a schedule length requirement. The algorithm produces more feasible solutions with equal or fewer number of functional units compared with the method using interference graph.  相似文献   

8.
Built-in self-test (BIST) has emerged as a promising test solution for high-speed, deep sub-micron VLSI circuits. Traditionally, the testability insertion phase comes after functional logic synthesis and verification in the design cycle. This creates two separate optimisation processes: functional optimisation followed by BIST insertion and optimisation. The first deals with functional design behaviour, while the second deals with test behaviour. Considering testability at such a late stage in the design flow limits efficient design space exploration. In this paper, we consider testability as a design objective alongside area and delay. We extend the concept of design space to include testability and show how this enhanced design space can be used by a high-level synthesis tool. We demonstrate that by taking testability into account at an early stage, we can generate better designs than by leaving BIST insertion to the end of the design cycle.  相似文献   

9.
Applications and architectures for embedded systems are becoming more and more complex. It is difficult to analyze complex embedded applications at early stages of the design flow without generic automated tools and methodologies. Consequently, application analysis under the real input conditions is becoming more and more important. Existing application analysis methodologies mainly focus on a single design objective. A general purpose application analysis methodology is required to satisfy multiple objectives of early design space exploration. This article proposes a general purpose application analysis methodology based on a visitor design pattern. High level source specifications are transformed into a trace tree representation by dynamic analysis. Trace tree representation is analyzed by using a visitor design pattern to get run-time characteristics of the application. Among other outcomes, application characterization and average inherited parallelism are key concerns in this article. Experimental results with MPEG-2 video decoder shows viability of the proposed methodology.  相似文献   

10.
A highly efficient loop-based interconnect modeling methodology is proposed for multigigahertz clock network design and optimization. Closed-form loop resistance and inductance models are proposed for fully shielded global clock interconnect structures, which capture high-frequency effects including inductance and proximity effects. The models are validated through comparisons with electromagnetic simulations and measured data taken from a Power4 chip. This modeling methodology greatly improves the clock interconnect simulation efficiency and enables fast physical design exploration. Examples of interconnect performance optimization are demonstrated and design guidelines are proposed.  相似文献   

11.
Three dimensional (3-D) multi-core SoC has been recognized as a promising solution for implementing complex applications with lower system energy. Recently, voltage–frequency island (VFI)-based design paradigm was widely adopted for energy optimization. However, the existing work commonly targeted 2-D platform, which cannot handle the exacerbated thermal issues and the increased solution space from 3-D integration. In this paper, we propose an optimization framework targeting VFI-based 3-D multi-core SoCs to minimize system energy meanwhile still meeting task deadline and thermal constraints. Our framework conducts at an earlier design phase in which designers have the freedom to determine the core stacks and map them into the hardware platform. Besides energy-aware task scheduling, we also conduct core stacking and task adjusting to balance the powers across the chip for thermal optimization. Moreover, by treating each core stack as a unity, the complicated problem of core mapping and VFI partitioning in 3-D platform can be simplified as a 2-D one. Experimental results demonstrate that on average our framework can achieve an energy reduction of 15.8% over the prior thermal balancing algorithm [17] (X. Zhou, J. Yang, Y. Xu, et al. Thermal-aware task scheduling for 3D multicore processors, IEEE Trans. Parallel Distrib. Syst. (TPDS), 21(1) (2010), 60–71.). Moreover, on average a reduction of 4.8 °C in peak temperature is achieved by our framework, compared with the state-of-the-art energy optimization scheme [8] (U.Y. Ogras, R. Marculescu, P. Choudhary, et al. Voltage–frequency island partitioning for GALS-based networks-on-chip, in: ACM/IEEE Design Automation Conference (DAC), 2007, pp. 110–115.).  相似文献   

12.
Generalized processor sharing (GPS) has been considered as an ideal scheduling discipline based on its end-to-end delay bounds and fairness properties. Until recently, emulation of GPS in a packet server has been regarded as the ideal means of designing a packet-level scheduling algorithm to obtain low delay bounds and bounded unfairness. Strict emulation of GPS, as required in the weighted fair queueing (WFQ) scheduler, however, incurs a time-complexity of O(N) where N is the number of sessions sharing the link. Efforts in the past to simplify the implementation of WFQ, such as self-clocked fair queueing (SCFQ), have resulted in degrading its isolation properties, thus affecting the delay bound. We present a methodology for the design of scheduling algorithms that provide the same end-to-end delay bound as that of WFQ and bounded unfairness without the complexity of GPS emulation. The resulting class of algorithms, called rate-proportional servers (RPSs), are based on isolating scheduler properties that give rise to ideal delay and fairness behavior. Network designers can use this methodology to construct efficient fair-queueing algorithms, balancing their fairness with implementation complexity  相似文献   

13.
14.
A Graph-Based Approach to Power-Constrained SOC Test Scheduling   总被引:2,自引:0,他引:2  
The test scheduling problem is one of the major issues in the test integration of system-on-chip (SOC), and a test schedule is usually influenced by the test access mechanism (TAM). In this paper we propose a graph-based approach to power-constrained test scheduling, with TAM assignment and test conflicts also considered. By mapping a test schedule to a subgraph of the test compatibility graph, an interval graph recognition method can be used to determine the order of the core tests. We then present a heuristic algorithm that can effectively assign TAM wires to the cores, given the test order. With the help of the tabu search method and the test compatibility graph, the proposed algorithm allows rapid exploration of the solution space. Experimental results for the ITC02 benchmarks show that short test length is achieved within reasonable computation time.  相似文献   

15.
Applying system-level fault-tolerant techniques such as active redundancy is a promising way to enhance the system reliability for safety-related applications. Embedded system design using active redundancy is a challenging task that involves solving two major problems, namely finding the optimal redundancy configuration and mapping/scheduling of the application (including the redundant components) to the platform under timing and reliability constraints. This paper presents a framework for automatic synthesis of fault-tolerant designs on multiprocessor platforms. The core of the framework consists of: (1) a reliability analysis, that computes the system-level reliability in the presence spatial and temporal redundancy, and (2) an optimization approach for reliability-aware design space exploration. The proposed approach considers both transient and permanent faults and is among the first to support system design using imperfect fault detectors. The framework takes an application model, a platform model and a set of application requirements as input, and generates the recommended design parameters, including task-to-processor binding, task schedule and the selection/placement of redundancy. The effectiveness of our approach is illustrated using several case studies.  相似文献   

16.
All-optical networks (AONs) with a broadcast-star based physical topology offer the possibility of transmission scheduling to resolve channel and receiver conflicts. This paper considers the problem of scheduling packet transmissions in a wavelength-division multiplexed (WDM) optical network with tunable transmitters and fixed-tuned receivers. The scheduling problem is complicated by tuning latency, a limited number of channels, and arbitrary traffic demands. We first analyze scheduling all-to-all packet transmissions and obtain a new lower bound for the schedule length. The lower bound is achieved by an algorithm proposed by Pieris and Sasaki (1994). We then extend the analysis to the case of arbitrary traffic demands and obtain lower bounds for the schedule length. Two constructions for scheduling algorithms are provided through list scheduling and multigraphs. The upper bounds so obtained not only provide performance guarantees with arbitrary demands, but also nearly meet the lower bound in simulations  相似文献   

17.
The recent MPEG Reconfigurable Media Coding (RMC) standard aims at defining media processing specifications (e.g. video codecs) in a form that abstracts from the implementation platform, but at the same time is an appropriate starting point for implementation on specific targets. To this end, the RMC framework has standardized both an asynchronous dataflow model of computation and an associated specification language. Either are providing the formalism and the theoretical foundation for multimedia specifications. Even though these specifications are abstract and platform-independent the new approach of developing implementations from such initial specifications presents obvious advantages over the approaches based on classical sequential specifications. The advantages appear particularly appealing when targeting the current and emerging homogeneous and heterogeneous manycore or multicore processing platforms. These highly parallel computing machines are gradually replacing single-core processors, particularly when the system design aims at reducing power dissipation or at increasing throughput. However, a straightforward mapping of an abstract dataflow specification onto a concurrent and heterogeneous platform does often not produce an efficient result. Before an abstract specification can be translated into an efficient implementation in software and hardware, the dataflow networks need to be partitioned and then mapped to individual processing elements. Moreover, system performance requirements need to be accounted for in the design optimization process. This paper discusses the state of the art of the combinatorial problems that need to be faced at this design space exploration step. Some recent developments and experimental results for image and video coding applications are illustrated. Both well-known and novel heuristics for problems such as mapping, scheduling and buffer minimization are investigated in the specific context of exploring the design space of dataflow program implementations.  相似文献   

18.
19.
In this paper, we examine the multicriteria optimization involved in scheduling for data-path synthesis (DPS). The criteria we examine are the area cost of the components and schedule time. Scheduling for DPS is a well-known NP-complete problem. We present a method to find nondominated schedules using a combination of restricted search and heuristic scheduling techniques. Our method supports design with architectural constraints such as the total number of functional units, buses, etc. The schedules produced have been taken to completion using GABIND as written by Mandal et al., and the results are promising  相似文献   

20.
We evaluate four scheduling algorithms for satellite communications that use the Time Division Multiple Access methodology. All the algorithms considered are based on the open‐shop model. The open‐shop model is suitably represented or modified to exploit some existing algorithms to solve the satellite communication problem. In the first two algorithms, namely pre‐emptive scheduling with no intersatellite links and greedy heuristics with two intersatellite links, a (traffic) matrix representation of the open‐shop model is used to get a near optimal schedule. In the next two algorithms, generalized heuristic algorithm and the branch and bound algorithm, the open‐shop model is modified to accommodate the inter‐satellite link and this modified open‐shop model is used to solve for a near optimal schedule. The basic methodology of all the algorithms are briefly described and their performance was evaluated through extensive simulations. The performance criteria to evaluate the algorithms are—run time of the algorithms, schedule lengths, and optimality of the algorithm against theoretical bounds. Three of the above‐mentioned algorithms are evaluated by comparing the performance criteria under similar conditions. Optimal branch and bound algorithm is not evaluated due to its high complexity. The general heuristic algorithm is found to give a good trade off between computation time and optimality. The computation time is comparable with the pre‐emptive scheduling algorithm and greedy heuristic algorithm and the schedule length achieved is near to the lower bound value. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号