首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Reducing address arithmetic operations by optimization of address offset assignment greatly improves the performance of digital signal processor (DSP) applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. Little research work has been conducted on loop optimization with address offset assignment problem for architectures with multiple functional units. In this paper, we combine loop scheduling, array interleaving, and address assignment to minimize the schedule length and the number of address operations for loops on DSP architectures with multiple functional units. Array interleaving is applied to optimize address assignment for arrays in loop scheduling process. An algorithm, Address Operation Reduction Rotation Scheduling (AORRS), is proposed. The algorithm minimizes both schedule length and the number of address operations. with to list scheduling, AORRS shows an average reduction of 38.4% in schedule length and an average reduction of 31.7% in the number of address operations. Compared with rotation scheduling, AORRS shows an average reduction of 15.9% in schedule length and 33.6% in the number of address operations.   相似文献   

2.
Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable tool in optimizing 1-D signal processing applications, represented by flow graphs. Such transformation can maximize the parallelism of a loop body. Few results on retiming have been obtained for multidimensional (MD) systems. The article develops a novel framework, which consists of a MD retiming technique that considers the final schedule as part of the optimization process. To the author's knowledge, this is the first retiming algorithm on general MD flow graphs  相似文献   

3.
The performance of computation-intensive digital signal processing applications running on parallel systems is highly dependent on communication delays imposed by the parallel architecture. In order to obtain a more compact task/processor assignment, a scheduling algorithm considering the communication time between processors needs to be investigated. Such applications usually contain iterative or recursive segments that are modeled as communication sensitive data flow graphs (CS-DFGs), where nodes represent computational tasks and edges represent dependencies between them. Based on the theorems derived, this paper presents a novel efficient technique called cyclo-compaction scheduling, which is applied to a CS-DFG to obtain a better schedule. This new method takes into account the data transmission time, loop carried dependencies, and the target architecture. It implicitly uses the retiming technique (loop pipelining) and a task remapping procedure to allocate processors and to iteratively improve the parallelism while handling the underlying communication and resource constraints. Experimental results on different architectures demonstrate that this algorithm yields significant improvement over existing methods. For some applications, the final schedule length is less than one half of its initial length  相似文献   

4.
5.
Non-volatile memories (NVMs) show great potential in replacing DRAM as the main memory in many embedded systems because of their attractive characteristics such as low cost, high density, and low energy consumption. However, the problem of asymmetric read and write costs has to be addressed before the advantages of NVM can be fully exploited. That is, the cost of write operation is much more expensive than the cost of read operation on NVMs. The existing techniques for loop optimization cannot be used effectively with non-volatile main memory because this special feature is not considered. In this paper, we propose an efficient loop scheduling algorithm, the Rotation with Maximum Bipartite Matching (RMBM) algorithm, to address the problem of expensive write operations on non-volatile main memory for chip multiprocessors (CMPs). It achieves high parallelism for a loop and, at the same time, reduces the number of write operations on NVM. The experimental results show that the RMBM algorithm reduces the number of write activities on NVM by 34.5 % on average compared with the traditional rotation scheduling algorithm. The execution time is reduced by 20.5 %, and the energy consumption is also reduced by 15.03 % on average using the RMBM algorithm. In other words, the average lifetime of NVM can be extended by more than 2 times using the proposed technique.  相似文献   

6.
We propose a new segment-based approach for a real-time reactive rescheduling method. The approach is applicable to actual large-scale manufacturing systems, such as fully automated semiconductor wafer fabrication lines. The proposed method can efficiently (in terms of computation time) schedule processing operations at equipment units by dividing a wide scheduling range into small segments and by using a greedy scheduling algorithm. It can also reactivate infeasible schedules without sacrificing the quality of schedules in terms of productivity as much as possible. From the simulation results obtained, we experimentally confirmed that the proposed method reactivates, without significant productivity loss caused by the rescheduling algorithm itself, infeasible schedules faster than a comparative method commonly in use today. Consequently, the proposed method manages to keep schedules at each equipment unit executable in terms of processing performance and schedule quality  相似文献   

7.
Multiple on-chip memory modules are attractive to many high-performance digital signal processing (DSP) applications. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. However, making effective use of multiple memory modules remains difficult. The performance gain in this kind of architecture strongly depends on variable partitioning and scheduling techniques. In this paper, we propose a graph model known as the variable independence graph (VIG) and algorithms to tackle the variable partitioning problem. Our results show that VIG is more effective than interference graph for solving variable partitioning problem. Then, we present a scheduling algorithm known as the rotation scheduling with variable repartition (RSVR) to improve the schedule lengths efficiently on a multiple memory module architecture. This algorithm adjusts the variable partitions during scheduling and generates a compact schedule based on retiming and software pipelining. The experimental results show that the average improvement on schedule lengths is 44.8% by using RSVR with VIG. We also propose a design space exploration algorithm using RSVR to find the minimum number of memory modules and functional units satisfying a schedule length requirement. The algorithm produces more feasible solutions with equal or fewer number of functional units compared with the method using interference graph.  相似文献   

8.
Real-Time Dynamic Voltage Loop Scheduling for Multi-Core Embedded Systems   总被引:1,自引:0,他引:1  
In this brief, we propose a novel real-time loop-scheduling technique to minimize energy consumption via dynamic voltage scaling (DVS) for applications with loops considering transition overhead. One algorithm, dynamic voltage loop scheduling (DVLS), is designed integrating with DVS. In DVLS, we repeatedly regroup a loop based on rotation scheduling and decrease the energy by DVS as much as possible within a timing constraint. We conduct the experiments on a set of digital signal processing benchmarks. The experimental results show that DVLS achieves big energy saving compared with the traditional time-performance-oriented scheduling algorithm  相似文献   

9.
New metrics and scheduling rules for disassembly and bulk recycling   总被引:1,自引:0,他引:1  
In recent years, growing quantities of end-of-life electronics have increased the amount of attention devoted to product recovery. Research on end-of-life electronics returns has primarily focused on manual disassembly operations. In this paper, we focus on the scheduling problem for a facility with staging, manual disassembly operations, and bulk recycling. In bulk recycling, shredding or grinding reduces the size of the material fragments while magnetic, eddy current or other density separation techniques separate the material fragments. Unlike production, there are often no due dates in materials recovery processing. Recyclers can sell the recovered materials to material commodity buyers at any time. However, recyclers wait to accumulate a shipment of material to reduce transportation costs and meet minimum sales quantities. Another important difference between production and recycling is that manufacturers purchase raw materials while recyclers may be paid to receive products. When due dates do not apply to scheduling products for materials recycling and product receipts generate revenue for recycling services, we propose two new metrics: the staging space turnover and the shipment fill time. We use our metrics to analyze new scheduling rules for disassembly and bulk recycling and to evaluate their performance. Using discrete-event simulation models, we test our scheduling rules on seven product families, where product families are defined based on material composition and separation operations. Of the rules we test, the disassembly scheduling rule which ranks product families based on the ratio of product size to disassembly time (SDT) most quickly empties the staging space. Shipment fill time is less sensitive to our scheduling rules. Our results illustrate how a recycler can reduce incoming product inventory with a new scheduling rule.  相似文献   

10.
11.
Because of the complexity of a short-term scheduling problem for crude oil operations, some constraints are ignored in modeling the system by the existing approaches, leading to an infeasible solution. To avoid this, the short-term scheduling problem can be studied in control theory perspective by viewing an operation decision in the schedule as a control. With this idea, this paper conducts the schedulability analysis for systems with two and more than two distillers, and the schedulability conditions are presented with the help of Petri net theory. It shows that the number of charging tanks and their capacity, the amount of crude oil of different types in the charging tanks, the oil transportation rate of the pipeline, and the production rate of the system affect the safeness of the system. It also presents the conditions under which the system can reach its maximal production rate. With the safeness conditions and proved results presented in this paper, if a refining schedule is realizable, a feasible detailed schedule for the refining schedule can be easily obtained by creating the operation decisions for the schedule one by one. In the schedule obtained, the starting time of each operation decision can be at any continuous time point, and the schedule is certainly feasible, which overcomes the difficulty faced by techniques that are based on mathematical programming methods.  相似文献   

12.
13.
This paper proposes multi-hop scheduling algorithms for the All-to-All Broadcast (AAB) problem in Wavelength Division Multiplexed (WDM) optical star networks. The multi-hop AAB problem can be split into two subproblems: Logical Topologies Construction (LTC) problem, and Transmission Scheduling (TS) problem. For improving the efficiency of multi-hop scheduling, we focus on a new multi-hop transmission model and transfer the LTC problem to a special case of the Round Robin Tournament (RRT) problem. In the proposed logical topologies, our multi-hop scheduling algorithms can easily overlap the tuning latency and reduce the number of tuning operations on each node. We compare our results with previous research in terms of schedule length. Overall results indicate that our multi-hop scheduling algorithms have better performance than previous algorithms.  相似文献   

14.
We present some necessary and sufficient conditions for a perfect reconstruction (PR) modified multidimensional (MD) delay chain system. With these results, one is able to systematically check if a D-dimensional delay chain system with a D×D sampling matrix M and a D×D delay matrix L has a PR property in a simpler way than before, where the matrix module operations are avoided. Moreover, given a D×D sampling matrix M, in many cases one can determine all possible D×D delay matrices L so that the delay chain systems with the sampling matrix M have a PR property. Several examples are provided. We also present a method to generate D×D sampling and delay matrices M and L such that their corresponding traditional MD delay chain systems are PR  相似文献   

15.
Hardware/Software co-design is an increasingly common design style for integrated circuits. It allows the majority of a system to designed quickly with standardized parts, while special purpose hardware is used for the time critical portions of the system. The framework considered in this paper performs Hardware/Multi-Software (HMS) co-design for iterative loops, given an input specification that includes the system to be built, the number of available processors, the total chip area, and the required response time. Originally, all operations are done in software. The system then substitutes hardware (adder, multiplier, bus) for software based on theneedability of each type of hardware unit. After a new hardware unit is introduced the system is rescheduled using a variation of rotation scheduling in which operations may be moved between processors. Experimental results are shown that illustrate the efficiency of the algorithms as well as the savings achieved.  相似文献   

16.
Multimedia SIMD extensions are commonly employed today to speed up media processing. When performing vectorization for SIMD architectures, one of the major issues is to handle the problem of memory alignment. Prior study focused on either vectorizing loops with all memory references being properly aligned, or introducing extra operations to deal with the misaligned memory references. On the other hand, multi-core SIMD architectures require coarse-grain parallelism. Therefore, it is an important problem to study how to parallelize and vectorize loop nests with the awareness of data misalignments. This paper presents a loop transformation scheme that maximizes the parallelism of outermost loops, while the misaligned memory references in innermost loops are reduced. The basic idea of our technique is to align each level of loops in the nest, considering the constraint of dependence relations. To reduce the data misalignments, we establish a mathematical model with a concept of offset-collection and propose an effective heuristic algorithm. For coarser-grain parallelism, we propose some rules to analyze the outermost loop. When transformations are applied, the inner loops are involved to maximize the parallelism. To avoid introducing more data misalignments, the involved innermost loop is handled from other levels of loops. Experimental results show that 7 % to 37 % (on average 18.4 %) misaligned memory references can be reduced. The simulations on CELL show that 1.1x speedup can be reached by reducing the misaligned data, while 6.14x speedup can be achieved by enhancing the parallelism for multi-core.  相似文献   

17.
Instruction scheduling for instruction level parallel processors   总被引:5,自引:0,他引:5  
Nearly all personal computer and workstation processors, and virtually all high-performance embedded processor cores, now embody instruction level parallel (ILP) processing in the form of superscalar or very long instruction word (VLIW) architectures. ILP processors put much more of a burden on compilers; without "heroic" compiling techniques, most such processors fall far short of their performance goals. Those techniques are largely found in the high-level optimization phase and in the code generation phase; they are also collectively called instruction scheduling. This paper reviews the state of the art in code generation for ILP parallel processors. Modern ILP code generation methods move code across basic block boundaries. These methods grew out of techniques for generating horizontal microcode, so we introduce the problem by describing its history. Most modem approaches can be categorized by the shape of the scheduling "region." Some of these regions are loops, and for those techniques known broadly as "Software Pipelining" are used. Software Pipelining techniques are only considered here when there are issues relevant to the region-based techniques presented. The selection of a type of region to use in this process is one of the most controversial questions in code generation; the paper surveys the best known alternatives. The paper then considers two questions: First, given a type of region, how does one pick specific regions of that type in the intermediate code. In conjunction with region selection, we consider region enlargement techniques such as unrolling and branch target expansion. The second question, how does one construct a schedule once regions have been selected, occupies the next section of the paper. Finally, schedule construction using recent, innovative resource modeling based on finite-state automata is then reexamined. The paper includes an extensive bibliography  相似文献   

18.
This paper describes a new approach to high-level synthesis for high throughput applications. Such applications are typically found in real-time video systems such as HDTV. The method is capable of dealing with hierarchical flow graphs containing loops with manifest boundaries and linear index expressions. The algorithm is based on the model of periodic operations which allows optimizations across loop boundaries. Processing units and storage units are minimized simultaneously. The algorithm is implemented in thePHIDEO system. The major parts of this system are the processing unit synthesis, the scheduler and the memory synthesis including address generation.  相似文献   

19.
This paper describes an exact solution methodology, implemented in Rensselaer's Voyager design space exploration system, for solving the scheduling problem in a three-dimensional (3-D) design space: the usual two-dimensional (2-D) design space (which trades off area and schedule length), plus a third dimension representing clock length. Unlike design space exploration methodologies which rely on bounds or estimates, this methodology is guaranteed to find the globally optimal solution to a 3-D scheduling problem. Furthermore, this methodology efficiently prunes the search space, eliminating provably inferior design points through the following: 1) a careful selection of candidate clock lengths and 2) tight bounds on the number of functional units or on the schedule length. Both chaining and multicycle operations are supported  相似文献   

20.
Resource scheduling and routing tree construction in WiMAX mesh centralized scheduling are not defined in the standard and thus are subject to extensive research. In this paper, we consider routing and scheduling in a WiMAX-based mesh network. We assume that nodes are not necessarily stationary, but rather mobile with a mobility that may yield to frequent topology changes (e.g., failure of existing links and creation of new transmission links). We model the joint routing and scheduling as an optimization problem whose objective is either to determine a minimum length schedule by maximizing spectrum spatial reuse or maximizing the network lifetime by routing around the less stable RF-links, while satisfying a set of (uplink/downlink) end-to-end demands. While solving the problem with the two objectives, we study the tradeoffs between these two objectives. We show that minimizing the schedule length forces the joint routing and scheduling problem to generate a routing tree and feasible transmission groups which favor higher spectrum spatial reuse (and hence higher system throughput), irrespective of the robustness of the selected transmission links. In addition, we show that maximizing the network stability or lifetime yields the selection of different routing trees and slot assignments which do not necessarily result in shorter schedule length. We perform numerical experiences where we compare the performances of our proposed models with respect to the network stability and resource spatial reuse.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号