期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks

R. Govindarajan Guang R. Gao Palash Desai 《The Journal of VLSI Signal Processing》2002,31(3):207-229

Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multi-rate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multi-rate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that in constructing the rate-optimal schedule, it directly minimizes the memory requirement by choosing the schedule time of nodes appropriately. Lastly, a new circular-arc interval graph coloring algorithm has been proposed to further reduce the memory requirement by allowing buffer sharing among the arcs of the multi-rate dataflow graph.We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also known as block schedules) proposed by Lee and Messerschmitt (IEEE Transactions on Computers, vol. 36, no. 1, 1987, pp. 24–35), (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao (Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, SC, Jan. 10–13, 1993, pp. 29–42), and (iii) the multi-rate software pipelining (MRSP) algorithm (Govindarajan and Gao, in Proceedings of the 1993 International Conference on Application Specific Array Processors, Venice, Italy, Oct. 25–27, 1993, pp. 77–88). Schedules generated for a number of random dataflow graphs and for a set of DSP application programs using the different scheduling methods are compared. The experimental results have demonstrated a significant improvement (10–20%) in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods, without sacrificing the computation rate. The MBRO method also gives a 20% average improvement in computation rate compared to Lee's Block scheduling method. 相似文献

2.

Reducing Data Hazards on Multi-pipelined DSP Architecture with Loop Scheduling

Sissades Tongsima Chantana Chantrapornchai Edwin H.-M. Sha Nelson L. Passos 《The Journal of VLSI Signal Processing》1998,18(2):111-123

Computation intensive DSP applications usually require parallel/pipelined processors in order to meet specific timing requirements. Data hazards are a major obstacle against the high performance of pipelined systems. This paper presents a novel efficient loop scheduling algorithm that reduces data hazards for such DSP applications. This algorithm has been embedded in a tool, called SHARP, which schedules a pipelined data flow graph to multiple pipelined units while hiding the underlying data hazards and minimizing the execution time. This paper reports significant improvement for some well-known benchmarks showing the efficiency of the scheduling algorithm and the flexibility of the simulation tool. 相似文献

3.

Cost Minimization with HPDFG and Data Mining for Heterogeneous DSP

Jian-Wei Niu Meikang Qiu Xiaofei Wang Jiayin Li Gang Wu Tianzhou Chen 《Journal of Signal Processing Systems》2012,67(3):213-228

Cost minimization and execution-time reduction have become the most important issues in today’s real-time embedded system. Meanwhile, for the DSP (Digital Signal Processing) applications running on embedded system, loops inside them are the most critical part for performance optimization. To optimize the loop iteration patterns, we need to schedule the loop execution order. Due to the uncertainties within the execution time of tasks, we model varied execution times of tasks as random variables and propose a novel data graph model, called HPDFG (Heterogeneous Probabilistic Data-Flow Graph) to model DSP applications on embedded systems. A novel algorithm, LSHAPE, is proposed to minimize the cost and satisfy the timing constraints. First of all, we use the data mining methods to estimate the probabilistic distribution of the execution time variables. Second, we rotate the loops in the application to explore different possible execution patterns. Finally, we combine the list-scheduling and the dynamic programming to generate a near-optimal task allocation and a core-mode assignment. Experimental results demonstrate the effectiveness of our algorithm. Our approach can handle loops efficiently. 相似文献

4.

Determining the minimum iteration period of an algorithm

Kazuhito Ito Keshab K. Parhi 《The Journal of VLSI Signal Processing》1995,11(3):229-244

Digital signal processing algorithms are repetitive in nature. These algorithms are described by iterative data-flow graphs where nodes represent computations and edges represent communications. For all data-flow graphs, there exists a fundamental lower bound on the iteration period referred to as theiteration bound. Determining the iteration bound for signal processing algorithms described by iterative data-flow graphs is an important problem. In this paper we review two existing algorithms for determination of the iteration bound. Then we propose another novel method based on theminimum cycle mean algorithm to determine the iteration bound with a lower polynomial time complexity than the two existing techniques. It is convenient to represent many multi-rate signal processing algorithms by multi-rate data-flow graphs. The iteration bound of a multi-rate data-flow graph (MRDFG) can be determined by considering the single-rate data-flow graph (SRDFG) equivalent of the MRDFG. However, the equivalent single-rate data-flow graph contains many redundant nodes and edges. The iteration bound of the MRDFG can be determined faster if these redundancies in the equivalent SRDFG are first removed. A previous approach has considered elimination of edge redundancy. In this paper we present an approach to eliminatenode redundancy in the MRDFG. We combine elimination of node and edge redundancies to propose a novel algorithm for faster determination of the iteration bound of the MRDFG.This research was supported by the Advanced Research Projects Agency and monitored by Wright—Patterson AFB under contract number F33615-93-C-1309. 相似文献

5.

Properties and Algorithms for Unfolding of Probabilistic Data-Flow Graphs

Sissades Tongsima Timothy W. O''Neil Chantana Chantrapornchai Edwin H.-M. Sha 《The Journal of VLSI Signal Processing》2000,25(3):215-233

It is known that any selection statement (e.g. if and switch-case statements) in an application is associated with a probability which could either be predetermined by user input or chosen at runtime. Such a statement can be regarded as a computation node whose computation time is represented by a random variable. This paper focuses on iterative applications (containing loops) reflecting those uncertainties. Such an application can then be transformed to a probabilistic data-flow graph.Two timing models, the time-invariant and time-variant models, are introduced to characterize the nature of these applications. Since there can be many unfolding factors associated with each of the possible graph outcomes, for the time-invariant model, we propose a means of selecting a constant minimum rate-optimal unfolding factor for unfolding the probabilistic graph. We demonstrate that this factor guarantees the best schedule length.We also suggest a good estimate for choosing an unfolding factor for a graph under the time-variant model. Experiments show that using our selection scheme results in an iteration period close to the theoretical iteration bound of the experimental graph. Furthermore, this paper discusses an alternative approach which selects a few optimal schedules (with respect to the graph outcomes) to be stored in the system. The other possibilities will be represented by a modified template graph. 相似文献

6.

Three-Phase Algorithms for Task Scheduling in Distributed Mobile DSP System with Lifetime Constraints

Jiayin Li Meikang Qiu Jian-Wei Niu Yongxin Zhu Meiqin Liu Tianzhou Chen 《Journal of Signal Processing Systems》2012,67(3):239-253

A distributed mobile DSP system consists of a group of mobile devices with different computing powers. These devices are connected by a wireless network. Parallel processing in the distributed mobile DSP system can provide high computing performance. Due to the fact that most of the mobile devices are battery based, the lifetime of mobile DSP system depends on both the battery behavior and the energy consumption characteristics of tasks. In this paper, we present a systematic system model for task scheduling in mobile DSP system equipped with Dynamic Voltage Scaling (DVS) processors and energy harvesting techniques. We propose the three-phase algorithms to obtain task schedules with shorter total execution time while satisfying the system lifetime constraints. The simulations with randomly generated Directed Acyclic Graphs (DAG) show that our proposed algorithms generate the optimal schedules that can satisfy lifetime constraints. 相似文献

7.

Synthesis of control circuits in folded pipelined DSP architectures 总被引：1，自引：0，他引：1

Parhi K.K. Wang C.-Y. Brown A.P. 《Solid-State Circuits, IEEE Journal of》1992,27(1):29-43

A systematic folding transformation technique to fold any arbitrary signal processing algorithm data-flow graph to a hardware data-flow architecture, for a specified folding set and specified technology constraints, is presented. The folding set specifies the processor and the time partition at which the task is executed and is typically obtained by performing scheduling and resource allocation for the algorithm data-flow graph and the specified iteration period. The constraints imposed on the hardware architecture are also assumed to be known. The technique is used to derive the control circuitry of the hardware architecture. The authors derive conditions for the validity of a specified folding set, and present approaches to generate the dedicated architecture using systematic folding of tasks to operators. They propose automatic retiming and pipelining of algorithms described by data-flow graphs for folding. The folding algorithm is applied after preprocessing the data-flow graph for automated pipelining and retiming 相似文献

8.

Computing lower bounds on functional units before scheduling

Chaudhuri S. Walker R.A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1996,4(2):273-279

The authors present a new polynomial-time algorithm for computing lower bounds on the number of functional units (FUs) of each type required to schedule a data flow graph in a specified number of control steps. A formal approach is presented that is guaranteed to find the tightest possible bounds that can be found by relaxing either the precedence constraints or integrality constraints on the scheduling problem. This tight, yet fairly efficient, bounding method can be used to estimate FU area, to generate resource constraints for reducing the search space, or in conjunction with exact techniques for efficient optimal design space exploration 相似文献

9.

Failure-Aware Task Scheduling of Synchronous Data Flow Graphs Under Real-Time Constraints

Chanhee Lee Sungchan Kim Hyunok Oh Soonhoi Ha 《Journal of Signal Processing Systems》2013,73(2):201-212

As more processors are integrated into Multiprocessor System-on-Chips (MPSoCs) via relentless technology scaling, the mean-time-to-failure (MTTF) is reduced to the extent that unexpected processor failures are considered during design time. A popular approach to tolerate processor failures is to migrate tasks on the faulty processor to live processors. This approach, however, is not suitable for real-time digital signal processing (DSP) applications since it may not guarantee real-time constraints. In this paper, we propose the re-scheduling of the entire application to minimize throughput degradation under a latency constraint, given that the application is specified by a Synchronous Data Flow (SDF) graph. We obtain sub-optimal re-scheduling results using a genetic algorithm for each scenario of processor failures at compile-time. If a failure is detected at run-time, the live processors obtain the saved schedule, perform task transfer, and execute the remaining tasks of the current iteration. We compare preemptive and non-preemptive migration policies and propose a hybrid policy to obtain better performance. We demonstrate the viability of the proposed technique through experiments with real-life DSP applications as well as randomly generated graphs under timing constraints and random fault scenarios. 相似文献

10.

An optimization technique for lowering the iteration bound of DSP programs

Frederico Buchholz Maciel Yoshikazu Miyanaga Koji Tochinai 《The Journal of VLSI Signal Processing》1993,5(2-3):273-282

The throughput of a parallel execution of a DSP algorithm is limited by the iteration bound, which is the minimum period between the starts of consecutive iterations. It is given byT _i=max (T _i/D _i), whereT _i andD _i are the total time of operations and the number of delays in loopi, respectively. The execution throughput of a DSP algorithm can be increased by reducing theT _is, and this reduction can be realized by taking as many operations as possible out of loops without changing the semantic of the calculation. Since many DSP algorithms extensively use the four basic arithmetic operations, a simple and effective way of doing this reduction is to apply commutativity, associativity and distributivity on these operations. This paper presents an optimization technique, calledLoop Shrinking, which reduces the iteration bound by using the above method. Loop Shrinking is based on a heuristic method which is time-efficient for simple cases but can also tackle complex examples. An implementation of Loop Shrinking is presented in this article. The results show that it can yield a reduction in the iteration bound near or equal to careful hand-tuning. 相似文献

11.

Synthesis Scheme for Low Power Designs Under Timing Constraints 总被引：4，自引：1，他引：3

Wang Ling Wen Dongxin Yang Xiaozong and Jiang Yingta 《半导体学报》2005,26(2):287-293

To minimize the power consumption with resources operating at multiple voltages a timeconstrained algorithm is presented.The input to the scheme is an unscheduled data flow graph (DFG),and timing or resource constraints.Partitioning is considered with scheduling in the proposed algorithm as multiple voltage design can lead to an increase in interconnection complexity at layout level.That is,in the proposed algorithm power consumption is first reduced by the scheduling step,and then the partitioning step takes over to decrease the interconnection complexity.The timeconstrained algorithm has time complexity of Ｏ（ｎ２）,where n is the number of nodes in the DFG.Experiments with a number of DSP benchmarks show that the proposed algorithm achieves the power reduction under timing constraints by an average of 46.5%. 相似文献

12.

一种基于时间限制条件的低功耗高层综合设计方案

王玲温东新杨孝宗蒋颖涛《半导体学报》2005,26(2)

提出了多电压时间限制下电路功耗最小的高层综合设计算法,其输入为数据流图及时间限制条件.由于多电压设计会引起低层布局时的连线复杂性提高,所以提出的算法在进行高层调度过程同时考虑了低层分区问题,即算法利用调度步骤降低功耗,利用分区步骤来减小连线的复杂性.该算法的时间复杂性为O(n2),n是DFG图中的结点个数.大量的DSP基准实验表明该算法使得电路功耗平均降低46.5%. 相似文献

13.

TDMA scheduling algorithms for wireless sensor networks

Sinem Coleri Ergen Pravin Varaiya 《Wireless Networks》2010,16(4):985-997

Algorithms for scheduling TDMA transmissions in multi-hop networks usually determine the smallest length conflict-free assignment of slots in which each link or node is activated at least once. This is based on the assumption that there are many independent point-to-point flows in the network. In sensor networks however often data are transferred from the sensor nodes to a few central data collectors. The scheduling problem is therefore to determine the smallest length conflict-free assignment of slots during which the packets generated at each node reach their destination. The conflicting node transmissions are determined based on an interference graph, which may be different from connectivity graph due to the broadcast nature of wireless transmissions. We show that this problem is NP-complete. We first propose two centralized heuristic algorithms: one based on direct scheduling of the nodes or node-based scheduling, which is adapted from classical multi-hop scheduling algorithms for general ad hoc networks, and the other based on scheduling the levels in the routing tree before scheduling the nodes or level-based scheduling, which is a novel scheduling algorithm for many-to-one communication in sensor networks. The performance of these algorithms depends on the distribution of the nodes across the levels. We then propose a distributed algorithm based on the distributed coloring of the nodes, that increases the delay by a factor of 10–70 over centralized algorithms for 1000 nodes. We also obtain upper bound for these schedules as a function of the total number of packets generated in the network. 相似文献

14.

Reward-based voltage scheduling for dynamic-priority hard real-time systems

Han-Saem Yun Jihong Kim 《Design Automation for Embedded Systems》2007,11(1):25-48

Reward-based scheduling has been investigated for flexible applications in which an approximate but timely result is acceptable. Meanwhile, significant research efforts have been made on voltage scheduling which exploits the tradeoff between the processor speed and the energy consumption. In this paper, we address the combined scheduling problem of maximizing the total reward of hard real-time systems with a given energy budget. We present an optimal off-line algorithm and an efficient on-line algorithm for jobs with their own release-times/deadlines under Earliest-Deadline-First (EDF) scheduling. Experimental results show that the solution computed by the on-line algorithm is no more than 14% worse than the theoretical optimal solution obtained by the optimal off-line algorithm. This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment). A preliminary version of this article was presented at Real-Time and Embedded Computing Systems and Applications (RTCSA’04). 相似文献

15.

Efficient variable partitioning and scheduling for DSP processors with multiple memory modules

Qingfeng Zhuge Sha E.H.-M. Bin Xiao Chantrapornchai C. 《Signal Processing, IEEE Transactions on》2004,52(4):1090-1099

Multiple on-chip memory modules are attractive to many high-performance digital signal processing (DSP) applications. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. However, making effective use of multiple memory modules remains difficult. The performance gain in this kind of architecture strongly depends on variable partitioning and scheduling techniques. In this paper, we propose a graph model known as the variable independence graph (VIG) and algorithms to tackle the variable partitioning problem. Our results show that VIG is more effective than interference graph for solving variable partitioning problem. Then, we present a scheduling algorithm known as the rotation scheduling with variable repartition (RSVR) to improve the schedule lengths efficiently on a multiple memory module architecture. This algorithm adjusts the variable partitions during scheduling and generates a compact schedule based on retiming and software pipelining. The experimental results show that the average improvement on schedule lengths is 44.8% by using RSVR with VIG. We also propose a design space exploration algorithm using RSVR to find the minimum number of memory modules and functional units satisfying a schedule length requirement. The algorithm produces more feasible solutions with equal or fewer number of functional units compared with the method using interference graph. 相似文献

16.

An efficient graph algorithm for FSM scheduling

Ti-Yen Yen Wolf W. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1996,4(1):98-112

相似文献

17.

Scheduling for Concept-Oriented Rapid Prototyping

Alexander Burst Michael Wolff Bernhard Spitzer Klaus D. Müller-Glaser 《Design Automation for Embedded Systems》2000,5(3-4):265-280

In this paper scheduling strategies for a rapid prototyping system are discussed. Our rapid prototyping system is able to deal with several CASE-tools and generate code for models of heterogenous domains. By using the emerging CASE Data Interchange Format CDIF the model data of CASE-tools is represented tool-independent. This tool-independent layer serves as a basis for analysis, simulation and code generation. The generated code is partitioned in tasks which must be scheduled as fast as possible using a real-time operating system to support high performance applications. We classify scheduling requirements for the constraints of rapid prototyping and present a new scheduling strategy called pseudo-rate scheduling which significantly improve the execution speed of rapid prototyping applications. Additionally, we provide a set of equations to estimate schedulability. Experimental results demonstrate the main advantages of our scheduling strategy. 相似文献

18.

Scheduling of DSP programs onto multiprocessors for maximumthroughput

Hoang P.D. Rabaey J.M. 《Signal Processing, IEEE Transactions on》1993,41(6):2225-2235

A flow graph scheduling algorithm that simultaneously considers pipelining, retiming, parallelism, and hierarchical node decomposition is presented. The ability to simultaneously consider the many types of concurrency allows the scheduler to find efficient multiprocessor solutions for a wide range of DSP applications. It has been implemented as part of a software environment for scheduling DSP programs onto fixed and configurable multiprocessor systems. The results on a set of benchmarks demonstrate that the algorithm achieves near ideal speedups even across programs with different types of concurrency 相似文献

19.

Multihop packet scheduling in WDM/TDM networks with nonnegligibletransceiver tuning times

Marsan M.A. Bianco A. Leonardi E. Neri F. Nucci A. 《Communications, IEEE Transactions on》2000,48(4):692-703

This paper addresses the design of packet transmission schedules in photonic slotted wavelength-division multiplexing/time-division multiplexing broadcast-and-select networks with W wavelengths and N nodes. Nodes are equipped with one tunable-wavelength transmitter with nonnegligible tuning times and one fixed-wavelength receiver. A new scheduling algorithm that exploits multihop packet transfer to shorten the duration of scheduling periods is first proposed. A single-hop scheduling algorithm that performs slightly better than previous proposals is then described. A simulation-based analysis of the two algorithms shows that they jointly lead to significant improvements in both throughput and delay with respect to previous single-hop schedules 相似文献

20.

Scheduling constraint generation for communicating processes

Takach A. Wolf W. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1995,3(2):215-230

This paper describes a new algorithm for generation of scheduling constraints in networks of communicating processes. Our model of communication intertwines the schedules of the machines in the network: timing constraints of a machine may affect the schedules of machines communicating with it. This model of communication facilitates the modular specification of timing constraints. A feasible solution to the set of constraints generated gives a schedule for each machine in the network such that all internal constraints of each machine are satisfied and communication between machines is statically coordinated whenever possible. Static scheduling of communication saves on the cost of handshake associated with dynamic synchronization. Our algorithm can handle complex, state-dependent and cyclic timing constraints. Experimental results show that our algorithm is both effective and efficient 相似文献