1.

ILP-based cost-optimal DSP synthesis with module selection and data format conversion

Ito K. Lucke L.E. Parhi K.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(4):582-594

2.

Cost Effective Shared Path Protection for WDM Optical Mesh Networks with Partial Wavelength Conversion

Li Tianjian Wang Bin 《Photonic Network Communications》2004,8(3):251-266

In this paper, we study routing and wavelength assignment of connection requests in survivable WDM optical mesh networks employing shared path protection with partial wavelength conversion, while guaranteeing 100% restorability against any single failure. We formulate the problem as an integer linear program under a static traffic model. The objective is to minimize the total cost of the wavelength-links and wavelength converters used by the working and protection paths of all connections. A weight factor, defined as the cost ratio of a wavelength converter to a wavelength-link, lets the optimization objective trade off bandwidth against wavelength conversion according to their relative costs. The proposed shortest-widest-path-first (SWPF) algorithm uses a modified Dijkstra's algorithm to find a working path and a protection path for each connection request in the wavelength graph transformed from the original network topology. When multiple candidate paths have the same minimum total cost, SWPF chooses the one that minimizes the maximum number of converters used at any node along the path. We evaluated the effectiveness of the proposed algorithm via extensive simulation. The results indicate that its performance is very close to that of the optimal solutions obtained by solving the ILP formulation, and that it outperforms existing heuristic algorithms in terms of the total number of converters used and the maximum number of converters required at any node in the network. The proposed algorithm also achieves slightly better performance in terms of the total cost of wavelength-links and converters used by all connections. We also investigate shared path protection employing converter sharing.
The results show that this technique reduces not only the total number of converters used in the network but also the maximum number of converters required at each node, especially when a large number of converters are needed. Although the ILP formulation assumes static traffic, the proposed algorithm is also applicable to routing dynamic connection requests.
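The wavelength-graph search described above can be sketched as a Dijkstra search over (node, wavelength) states, where a wavelength-link costs 1 and a conversion costs the weight factor. This is a simplified sketch, not the SWPF algorithm itself: the function name, the `free` and `converters` inputs, and the tie-break (fewer conversions on the path, rather than the paper's per-node converter maximum) are assumptions, and protection paths are not computed.

```python
import heapq
from collections import defaultdict


def least_cost_lightpath(free, src, dst, converters, weight):
    """Least-cost path in a wavelength graph (hypothetical helper).

    free:       dict mapping a directed link (u, v) to its set of free wavelengths
    converters: set of nodes that can convert wavelengths
    weight:     cost of one conversion relative to one wavelength-link (cost 1)
    Returns (cost, conversions, [(u, v, wavelength), ...]) or None.
    """
    out = defaultdict(list)  # adjacency: u -> [(v, free wavelengths)]
    for (u, v), waves in free.items():
        out[u].append((v, waves))
    all_waves = set().union(*free.values())
    # State = (node, current wavelength); priority = (cost, conversions), so
    # equal-cost paths are broken by fewer conversions (simplified tie-break).
    heap = [(0.0, 0, src, w, []) for w in sorted(all_waves)]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, conv, node, w, path = heapq.heappop(heap)
        if node == dst:
            return cost, conv, path
        if (node, w) in settled:
            continue
        settled.add((node, w))
        for nxt, waves in out[node]:
            for w2 in waves:
                hop = path + [(node, nxt, w2)]
                if w2 == w:  # stay on the same wavelength
                    heapq.heappush(heap, (cost + 1, conv, nxt, w2, hop))
                elif node in converters:  # convert at this node
                    heapq.heappush(heap, (cost + 1 + weight, conv + 1, nxt, w2, hop))
    return None
```

When a conversion-free route and a converting route of the same length both exist, the conversion-free one wins for any positive `weight`.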

3.

Synthesis of Hard Real-Time Application Specific Systems

Chunho Lee Miodrag Potkonjak Wayne Wolf 《Design Automation for Embedded Systems》1999,4(4):215-242

This paper presents a system-level approach for the synthesis of hard real-time multitask application-specific systems. The algorithm takes into account task precedence constraints among multiple hard real-time tasks and targets a multiprocessor system consisting of a set of heterogeneous off-the-shelf processors. The optimization goal is to select a minimal-cost multi-subset of processors while satisfying all the required timing and precedence constraints. There are three design phases: resource allocation, assignment, and scheduling. Since resource allocation is a search for a minimal-cost multi-subset of processors, we adopt an A*-search-based technique for the first synthesis phase. A variation of the force-directed optimization technique is used to assign each task to an allocated processor. The final scheduling of the hard real-time tasks is done by a task-level scheduler based on the Earliest Deadline First (EDF) scheduling policy. Our task-level scheduler incorporates force-directed scheduling to address the situations where EDF is not optimal. Experimental results on a variety of examples show that the approach is highly effective and efficient.
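For reference, plain preemptive EDF on a single processor can be simulated as below. This sketch covers only the EDF policy named in the abstract; the multiprocessor allocation, force-directed assignment, and force-directed refinements are not modeled, and `edf_simulate` and its job-tuple layout are hypothetical.

```python
import heapq


def edf_simulate(jobs, horizon):
    """Preemptive earliest-deadline-first on one processor, in unit time steps.

    jobs: list of (name, release, exec_time, deadline) with integer times.
    Returns {name: completion time, or None if the job missed its deadline}.
    """
    pending = sorted(jobs, key=lambda j: j[1])  # by release time
    ready = []  # heap of (deadline, name, remaining work)
    finished = {}
    i = 0
    for t in range(horizon):
        # Release every job whose release time has arrived.
        while i < len(pending) and pending[i][1] <= t:
            name, _, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, exec_time))
            i += 1
        if ready:
            # Run the ready job with the earliest deadline for one time unit.
            deadline, name, remaining = heapq.heappop(ready)
            if remaining > 1:
                heapq.heappush(ready, (deadline, name, remaining - 1))
            else:
                finished[name] = t + 1
    result = {}
    for name, _, _, deadline in jobs:
        done = finished.get(name)
        result[name] = done if done is not None and done <= deadline else None
    return result
```

In the first test below, job B arrives later but has the earlier deadline, so it preempts A.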

4.

A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors

R. Govindarajan Erik R. Altman Guang R. Gao 《Design Automation for Embedded Systems》2002,6(3):243-275

Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application-specific instruction set processors (ASIPs) and embedded processors. Unlike conventional general-purpose processors, ASIPs and embedded processors typically run a single application and hence must be optimized extensively for it to extract maximum performance. Moreover, the low-power and low-cost requirements of ASIPs may demand reuse of pipeline stages, producing pipelines with complex structural hazards. In such architectures, exploiting higher ILP is a major challenge for the designer. Existing techniques deal either with scheduling hardware pipelines to obtain higher throughput or with software pipelining (an instruction scheduling technique for iterative computation) to exploit greater ILP. We integrate these techniques to co-schedule hardware and software pipelines and achieve greater instruction throughput. In this paper, we develop the underlying theory of co-scheduling, called the Modulo-Scheduled Pipeline (MS-Pipeline) theory. More specifically, we establish the necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling. We also establish a sufficient condition for achieving a specified throughput, on which we base a methodology for designing hardware pipelines that achieve such a throughput. Finally, we present initial experimental results that help establish the usefulness of MS-pipeline theory in software pipelining. Because the proposed theory helps to analyze and improve the throughput of MS-pipelines, it is especially useful in designing ASIPs and embedded processors.
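The basic structural-hazard check behind modulo scheduling can be sketched with a modulo reservation table: issuing one operation every II cycles is hazard-free if no pipeline stage is needed twice in the same slot modulo II. This is the textbook check, not the MS-Pipeline theory itself; the function names and the dict-of-sets reservation-table layout are assumptions.

```python
def collision_free(reservation, ii):
    """reservation: dict stage -> set of cycles (relative to issue) in which one
    operation occupies that stage. Issuing a new operation every `ii` cycles
    causes no structural hazard iff no stage is used twice in the same slot mod ii."""
    for cycles in reservation.values():
        slots = [c % ii for c in cycles]
        if len(slots) != len(set(slots)):
            return False
    return True


def min_initiation_interval(reservation):
    """Smallest collision-free initiation interval (throughput = 1 / II)."""
    # A stage busy k cycles per operation needs at least k slots per period.
    ii = max(1, max(len(cycles) for cycles in reservation.values()))
    while not collision_free(reservation, ii):
        ii += 1  # terminates: once ii exceeds the largest cycle, slots are distinct
    return ii
```

For a stage reused at cycles 0 and 2, an II of 2 puts both uses in slot 0, so the smallest hazard-free interval is 3.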

5.

Design of a Superscalar, Superpipelined Fixed-Point RISC Core (total citations: 1; self-citations: 0; citations by others: 1)

Wei Jian Zhang Ming Zhou Qiongfang Yu Yan Yao Qingdong 《Journal of Circuits and Systems》2001,6(4):56-60

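From the perspective of exploiting instruction-level parallelism (ILP), this paper analyzes the architectural characteristics of superscalar and superpipelined processors and, on that basis, presents the design of a fixed-point superscalar RISC core. The design follows a top-down methodology, contains three pipelined execution units, schedules instructions dynamically, and implements a non-blocking cache mechanism.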

6.

An architectural co-synthesis algorithm for distributed, embedded computing systems

Wolf W.H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1997,5(2):218-229

Many embedded computers are distributed systems composed of several heterogeneous processors and communication links of varying speeds and topologies. This paper describes a new heuristic algorithm that simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost. The hardware architecture of the synthesized system consists of a network of processors of multiple types and arbitrary communication topology; the software architecture consists of an allocation of processes to processors and a schedule for the processes. Most previous work in co-synthesis targets an architectural template, whereas this algorithm can synthesize a distributed system of arbitrary topology. The algorithm works from a technology database that describes the available processors, communication links, I/O devices, and implementations of processes on processors. Previous work had proposed solving this problem by integer linear programming (ILP); our algorithm is much faster than ILP and produces high-quality results.

7.

Coactive scheduling and checkpoint determination during high level synthesis of self-recovering microarchitectures

Orailoglu A. Karri R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(3):304-311

The growing trend toward VLSI implementation of crucial tasks in critical applications has increased both the demand for and the scope of fault-tolerant VLSI systems. In this paper, we present a self-recovering microarchitecture synthesis system. In a self-recovering microarchitecture, intermediate results are compared at regular intervals and, if correct, saved in registers (checkpointing). On detecting a fault, the self-recovering microarchitecture rolls back to a previous checkpoint and retries. The proposed synthesis system comprises a heuristic subsystem and an optimal subsystem. The heuristic synthesis subsystem has two components: the checkpoint insertion algorithm identifies good checkpoints by successively eliminating clock cycle boundaries that either have a high checkpoint overhead or violate the retry period constraint, while the novel edge-based scheduler assigns edges to clock cycle boundaries in addition to scheduling nodes to clock cycles. Checkpoint insertion and edge-based scheduling are intertwined using a flexible synthesis methodology. We also present an integer linear programming model for the self-recovering microarchitecture synthesis problem. The resulting ILP formulation can minimize either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints.
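As an illustration of the checkpoint-placement trade-off (not the authors' heuristic or ILP), the sketch below uses dynamic programming to pick checkpoint boundaries with minimum total overhead. It assumes the retry-period constraint means that consecutive checkpoints may be at most `retry_period` clock-cycle boundaries apart; the overhead values and the function name are hypothetical.

```python
def choose_checkpoints(overhead, retry_period):
    """overhead[b-1]: cost of a checkpoint at clock-cycle boundary b (b = 1..n).
    Boundary 0 (the start) is an implicit checkpoint and boundary n must be one.
    Consecutive checkpoints may be at most `retry_period` boundaries apart.
    Returns (minimum total overhead, sorted list of chosen boundaries)."""
    if retry_period < 1:
        raise ValueError("retry_period must be at least 1")
    n = len(overhead)
    best = [0] + [float("inf")] * n  # best[b]: min cost with a checkpoint at b
    prev = [None] * (n + 1)
    for b in range(1, n + 1):
        # The previous checkpoint must lie within the retry period.
        for a in range(max(0, b - retry_period), b):
            cost = best[a] + overhead[b - 1]
            if cost < best[b]:
                best[b], prev[b] = cost, a
    chosen, b = [], n
    while b:  # walk back to boundary 0
        chosen.append(b)
        b = prev[b]
    return best[n], sorted(chosen)
```

A tighter retry period forces more (and possibly costlier) checkpoints; with a loose enough period only the final boundary is needed.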

8.

On-line testable data path synthesis for minimizing testing time

A. A. Ismaeel R. Bhatnagar R. Mathew 《Microelectronics Reliability》2002,42(3):437-453

9.

Digital TDM-FDM Translator with Multistage Structure

Tsuda T. Morita S. Fujii Y. 《Communications, IEEE Transactions on》1978,26(5):734-741

In this paper, a new digital signal processing algorithm for the digital TDM-FDM translator is proposed. The digital TDM-FDM translator, which uses digital techniques to translate directly between two multiplex formats in the telephone network (time-division multiplexing (TDM) and frequency-division multiplexing (FDM)), has advantages in accuracy and stability of characteristics over equivalent analog equipment. From an economic point of view, however, its viability depends largely on cost reductions in semiconductor devices and LSI technologies. The proposed algorithm can be realized using only two digital filters and requires neither product modulators nor Fast Fourier Transform (FFT) processors. The required number of multiplications, which is closely related to the amount of hardware, is considerably reduced by the multistage structure of the algorithm. The reduction in both the variety of digital hardware and the number of multiplications makes it possible to exploit new hardware realization techniques for digital filters or multipliers that use read-only memories and simple logic devices. Since the cost of memory devices is expected to fall more rapidly than that of logic devices, the proposed algorithm is expected to have a cost advantage over existing algorithms that require complex multiplier logic. The computation rate is estimated for a practical case, and computer simulation results are also shown.
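The abstract does not name a specific ROM-based realization. One well-known option is distributed arithmetic, sketched below for an inner product with integer coefficients and unsigned `bits`-bit samples: a ROM (here a list) holds the coefficient sum for every bit pattern, so the output is formed with table look-ups, shifts, and additions only. The function name and the unsigned-sample restriction are assumptions of this sketch.

```python
def da_inner_product(coeffs, samples, bits):
    """y = sum(coeffs[k] * samples[k]) via distributed arithmetic.

    coeffs:  integer filter coefficients (one per tap)
    samples: unsigned integers, each below 2**bits
    """
    taps = len(coeffs)
    # "ROM": entry p holds the sum of the coefficients whose bit is set in p.
    rom = [sum(c for k, c in enumerate(coeffs) if (p >> k) & 1)
           for p in range(1 << taps)]
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one ROM address.
        address = 0
        for k, x in enumerate(samples):
            address |= ((x >> b) & 1) << k
        acc += rom[address] << b  # shift-and-add, no multiplier
    return acc
```

The ROM grows as 2 to the number of taps, which is why such schemes favor short filter sections.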

10.

Multiple-Input-Buffer and Shared-Buffer Architectures for Optical Packet- and Burst-Switching Networks

Yiannopoulos K. Vlachos K.G. Varvarigos E. 《Lightwave Technology, Journal of》2007,25(6):1379-1389

We present an architecture for implementing optical buffers, based on the feed-forward-buffer concept, that can truly emulate input queuing and accommodate asynchronous packet and burst operation. The architecture uses wavelength converters and fixed-length delay lines that are combined to form either a multiple-input buffer or a shared buffer. Both architectures are modular, allowing the buffer to be expanded at a cost that grows logarithmically with the buffer depth, where cost is measured in terms of the number of switching elements and wavelength converters employed. The architectural design also provides a tradeoff between the number of wavelength converters and their tunability. The proposed buffer architectures are complemented with scheduling algorithms that can guarantee lossless communication, and they are evaluated using physical-layer simulations to obtain their performance in terms of bit-error rate and achievable buffer size.
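A cost that grows logarithmically with buffer depth is what one gets by cascading fixed delay lines of 1, 2, 4, ... slots, each of which a packet either traverses or bypasses. The sketch below shows only this binary decomposition; it is an assumption about the general principle, not the authors' architecture, and it ignores the wavelength converters and scheduling logic. Both function names are hypothetical.

```python
def stages_needed(depth):
    """Number of binary-weighted delay lines (1, 2, 4, ... slots) needed to
    realise every delay 0 .. depth-1; grows as log2(depth)."""
    return (depth - 1).bit_length()


def delay_settings(delay, stages):
    """Per-stage choice, traverse (True) or bypass (False), so that the
    total delay in slots equals `delay`."""
    if not 0 <= delay < (1 << stages):
        raise ValueError("delay out of range for this many stages")
    return [bool((delay >> i) & 1) for i in range(stages)]
```

Doubling the buffer depth adds one delay stage rather than doubling the hardware.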

11.

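An Input/Output Scheduling Scheme for Integrated Access Systems Based on Internal Labels (total citations: 1; self-citations: 1; citations by others: 0)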

Han Guodong Zhang Xingming et al. 《Telecommunication Engineering》2003,43(1):115-120

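After introducing the basic structure of an integrated services access system, this paper describes the signaling format of the autonomous-system internal label (ILP) and proposes a new Priority-based Parameter Auto-adaptive Round-robin Scheduling scheme, together with its implementation for I/O scheduling in an ILP-based integrated access system. The performance of several scheduling schemes is compared experimentally.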

12.

Vectorized transforms in scalar processors

Trelewicz J.Q. Mitchell J.L. Brady M.T. 《Signal Processing Magazine, IEEE》2002,19(4):22-31

We disclose a generalized approach to creating efficient implementations of linear, orthogonal transforms, with specific examples discussed for the 8 x 8 DCT used in image compression. We connect this with a method for performing signed, parallel processing in scalar, off-the-shelf processors for integer transforms. Uniform data precision may be used but is not required by the method. The coefficients resulting from the new algorithm converge more quickly than the approximation made to the coefficients. Furthermore, the new algorithm allows more control over the specific representation chosen for the coefficients, as detailed in the article. The methods described were designed to address this need with two's-complement arithmetic. Data that can be processed in parallel because of the algorithm structure are packed into registers in a "vector" format. Many signed arithmetic operations can be performed on these vectors, including addition, subtraction, multiplication by scalars, and shifting. When the parallel processing is complete, the vectors can be unpacked into scalar values for storage or subsequent processing. The importance of these methods lies in their handling of carries and borrows in the packed vector format. After establishing notation for consistency throughout the article, we describe the generalized method: a generalized approach to integer transforms, using the DCT as a specific example, followed by the vector format, which allows parallelizable algorithms to be computed as vectors in scalar processors. The IDCT is used as a numerical example in the discussion of the vector format. The results were developed for high-end printers (e.g., more than 100 pages per minute), where image compression and decompression must be performed in real time, either in FPGAs or in embedded processors; however, the methods are applicable to a broad range of signal processing systems.
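The packed "vector" idea can be sketched in Python, with masking standing in for a 64-bit scalar register. The sketch assumes four signed 16-bit lanes with no guard bits and that every lane result stays within the signed 16-bit range; the lane layout and the helper names `pack`, `add`, `scale`, and `unpack` are hypothetical, not the article's format. It shows the point the abstract stresses: a negative lane borrows from the lane above, and unpacking must undo that borrow.

```python
FIELD_BITS = 16  # width of each packed lane (assumption)
LANES = 4        # 4 x 16 bits fill a 64-bit register
MASK64 = (1 << 64) - 1


def pack(values):
    """Pack signed integers into one 64-bit word, lane 0 in the low bits.
    A negative lane borrows from the lane above, as two's-complement adds do."""
    word = 0
    for i, v in enumerate(values):
        word += v << (FIELD_BITS * i)
    return word & MASK64


def add(a, b):
    return (a + b) & MASK64  # one scalar add updates every lane


def scale(a, k):
    return (a * k) & MASK64  # one scalar multiply scales every lane


def unpack(word):
    """Recover the signed lanes, undoing the borrow each negative lane caused."""
    lanes = []
    for _ in range(LANES):
        low = word & ((1 << FIELD_BITS) - 1)
        if low >= 1 << (FIELD_BITS - 1):  # sign-extend the lane
            low -= 1 << FIELD_BITS
        lanes.append(low)
        word = (word - low) >> FIELD_BITS  # remove this lane and its borrow
    return lanes
```

If any lane overflows its 16 bits, the carry silently corrupts the next lane, which is why real packed formats reserve guard bits.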

13.

Scalable dimensioning of optical transport networks for grid excess load handling

Pieter Thysebaert Marc De Leenheer Bruno Volckaert Filip De Turck Bart Dhoedt Piet Demeester 《Photonic Network Communications》2006,12(2):117-132

Grids aggregate numerous dispersed computational and storage resources and are able to satisfy even the most demanding computing jobs. An important aspect of Grid deployment is the allocation and activation of installed network capacity, which is needed to transfer data and jobs to and from remote resources. Because of the data-intensive nature of Grid jobs, optical transport networks are expected to play an important role in Grid deployment. Because Grids may comprise large numbers of resources and users, solving the network dimensioning problem (i.e., determining the number of wavelength channels per fiber and the required wavelength granularity) with straightforward integer linear programs (ILP) does not scale well as the number of jobs increases. We therefore propose using Divisible Load Theory (DLT) to model the OCS (with wavelength translation) dimensioning problem in this context. We compare this approach with both an exact ILP approach and a heuristic derived from the exact ILP, as a function of the job arrival process, network-related parameters, and the Grid job scheduling strategy. Results show that the DLT-based and exact ILP approaches converge, which indicates that the DLT-based approach is of practical use where the exact ILP-based problem becomes intractable. We study an excess load scenario and evaluate the network cost for varying wavelength granularity, fiber/wavelength cost models, network topology, and traffic demand asymmetry under multiple Grid scheduling strategies. Results indicate that our DLT-based approach is suitable as an Optical Transport Network dimensioning tool for network operators.
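For intuition about Divisible Load Theory, the simplest textbook case splits one divisible job among workers so that all of them finish at the same time. The sketch assumes every worker receives its share concurrently, with per-unit communication time `comm[i]` and computation time `comp[i]`; this is far simpler than the dimensioning model in the paper, and the function name is hypothetical.

```python
def divisible_load_split(comm, comp):
    """Split one unit of divisible load so that all workers finish together.

    comm[i], comp[i]: time per unit of load to send to / process on worker i.
    Assumes shares are delivered to all workers concurrently, so worker i
    finishes at alpha_i * (comm[i] + comp[i]).
    Returns (fractions alpha_i, common finish time)."""
    rates = [1.0 / (c + w) for c, w in zip(comm, comp)]
    total = sum(rates)
    # Equal finish times require alpha_i proportional to 1 / (comm[i] + comp[i]).
    fractions = [r / total for r in rates]
    return fractions, 1.0 / total
```

With per-unit times of 2 and 4, the faster worker gets two thirds of the load and both finish at 4/3.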

14.

Heuristic Loop-Based Scheduling and Allocation for DSP Synthesis with Heterogeneous Functional Units

Yun-Nan Chang Ching-Yi Wang Keshab K. Parhi 《The Journal of VLSI Signal Processing》1998,19(3):243-256

This paper presents a new heuristic, concurrent, iterative loop-based scheduling and allocation algorithm for high-level synthesis of digital signal processing (DSP) architectures using heterogeneous functional units. In a heterogeneous architecture, functional units can be bit-serial, digit-serial, or bit-parallel. We assume that a library of functional units in these heterogeneous implementation styles is available. Experiments show that this new heuristic synthesis approach generates optimal and near-optimal area solutions. Although optimum synthesis of such architectures was recently proposed using an integer linear programming (ILP) model, our method produces similar solutions in one to two orders of magnitude less time, at the expense of cost optimality. We compare the solutions generated by the proposed algorithm with the optimal solutions generated by the ILP approach and with other recent techniques. We have incorporated this new algorithm into the Minnesota ARchitecture Synthesis (MARS-II) system.

15.

On Optimal $p$-Cycle-Based Protection in WDM Optical Networks With Sparse-Partial Wavelength Conversion

《Reliability, IEEE Transactions on》2006,55(3):496-506

We study the optimal configuration of $p$-cycles in survivable wavelength division multiplexing (WDM) optical mesh networks with sparse-partial wavelength conversion while guaranteeing 100% restorability against any single failure. We formulate the problem as two integer linear programs (Optimization Models I and II) that have the same constraints but different objective functions. The $p$-cycles and wavelength converters are optimally determined subject to the constraint that only a given number of nodes have wavelength conversion capability, and the maximum number of wavelength converters that can be placed at such nodes is limited. Optimization Model I has a composite sequential objective function that first (G1) minimizes the cost of link capacity used by all $p$-cycles to accommodate a set of traffic demands, and then (G2) minimizes the total number of wavelength converters used in the entire network. In Optimization Model II, the cost of one wavelength converter is measured as the cost of a deployed wavelength link with a length of $\alpha$ units, and the objective is to minimize the total cost of the link capacity and wavelength converters required by the $p$-cycle configuration. During $p$-cycle configuration, our schemes fully take into account wavelength converter sharing, which reduces the number of converters required while attaining a satisfactory level of performance. Our simulation results indicate that the proposed schemes significantly outperform existing approaches in terms of protection cost, number of wavelength conversion sites, and number of wavelength converters needed.
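A standard property of $p$-cycles is that a cycle offers one restoration path for each span on the cycle and two for each straddling span (a span whose end nodes are both on the cycle but which is not itself on it). The sketch below classifies spans accordingly; it ignores wavelengths and converters, and the function name and input format are assumptions.

```python
def p_cycle_protection(cycle, spans):
    """cycle: node sequence of a p-cycle, e.g. ["A", "B", "C", "D"] (closing back to A).
    spans: iterable of undirected spans (u, v).
    Returns {span: number of restoration paths the p-cycle offers}."""
    on_cycle = {frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
                for i in range(len(cycle))}
    nodes = set(cycle)
    result = {}
    for u, v in spans:
        if frozenset((u, v)) in on_cycle:
            result[(u, v)] = 1  # on-cycle span: the rest of the cycle
        elif u in nodes and v in nodes:
            result[(u, v)] = 2  # straddling span: both arcs of the cycle
        else:
            result[(u, v)] = 0  # not protected by this cycle
    return result
```

Straddling spans are what make $p$-cycles capacity-efficient: they are protected twice without consuming any spare capacity on themselves.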

16.

VLSI architectures for discrete wavelet transforms

Parhi K.K. Nishitani T. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(2):191-202

A folded architecture and a digit-serial architecture are proposed for implementing one- and two-dimensional discrete wavelet transforms. In the one-dimensional folded architecture, the computations of all wavelet levels are folded onto the same low-pass and high-pass filters. The number of registers in the folded architecture is minimized by a generalized lifetime analysis. The converter units are synthesized with a minimum number of registers using forward-backward allocation. The advantage of the folded architecture is its low latency; its drawbacks are increased hardware area, less than 100% hardware utilization, and the complex routing and interconnection required by the converters. These drawbacks are eliminated in the alternative digit-serial architecture at the expense of increased system latency and some constraints on the wordlength. The folded architecture is suggested for latency-critical applications; where latency is less critical, the digit-serial architecture should be used. A combined folded and digit-serial architecture is proposed for implementing two-dimensional discrete wavelet transforms.
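As a software analogy for folding all wavelet levels onto one filter pair, the loop below reuses the same two-tap low-pass and high-pass filters at every level, feeding each level's low-pass output to the next. The Haar-like default coefficients and the function name are assumptions; the sketch says nothing about registers, latency, or digit-serial scheduling.

```python
def folded_dwt(signal, levels, lo=(0.5, 0.5), hi=(0.5, -0.5)):
    """Multi-level 1-D DWT in which every level reuses the same two-tap
    low-pass/high-pass pair (Haar-like filters by default).
    Returns (final approximation, [detail coefficients per level])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))  # filter + downsample by 2
        details.append([hi[0] * a + hi[1] * b for a, b in pairs])
        approx = [lo[0] * a + lo[1] * b for a, b in pairs]
    return approx, details
```

Each level handles half as many samples as the previous one, which is the slack a folded hardware schedule exploits to share one filter pair.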

17.

Three-Dimensional Feedforward Space Vector Modulation Applied to Multilevel Diode-Clamped Converters

《Industrial Electronics, IEEE Transactions on》2009,56(1):101-109

Simplified space vector modulation (SVM) techniques for multilevel converters are being developed to improve factors such as the computational cost, the number of commutations, and voltage distortion. The feedforward SVM presented in this paper takes into account the actual DC capacitor voltage unbalance of the multilevel power converter. The resulting technique is a low-cost, generalized feedforward 3-D SVM method, specialized here for three-phase multilevel diode-clamped converters. This new modulation technique can be applied to topologies where the gamma component may not be zero. Its computational cost is similar to that of comparable methods and is independent of the number of levels of the power converter. Experimental results using a three-level diode-clamped converter are presented to validate the proposed modulation technique.

18.

Estimation of BIST Resources During High-Level Synthesis

Ishwar Parulkar Sandeep K. Gupta Melvin A. Breuer 《Journal of Electronic Testing》1998,13(3):221-237

Lower-bound estimations of functional resources at various stages of high-level synthesis have been developed to guide synthesis algorithms toward optimal solutions. In this paper we present lower bounds on the number of test resources (i.e., registers that generate pseudo-random test patterns and/or compress test responses) required to test a synthesized data path using built-in self-test (BIST). The bounds on the different types of test resources are proved to be individually achievable, and experiments show that in most cases the bounds can be achieved simultaneously and with the minimum number of functional registers. Efficient ways of computing the lower bounds are developed. The estimations are performed on scheduled data flow graphs with a given module assignment and provide a practical way of selecting or modifying module assignments and schedules so that the resulting synthesized data path requires a small number of BIST resources to test itself.

19.

Allocation Techniques for Reducing BIST Area Overhead of Data Paths

Ishwar Parulkar Sandeep K. Gupta Melvin A. Breuer 《Journal of Electronic Testing》1998,13(2):149-166

Built-in self-test (BIST) techniques modify functional hardware so that a chip can test itself. A prime concern in using BIST is the area overhead due to converting normal registers into BIST registers. This paper proposes register and interconnect assignment techniques that address the BIST area overhead during high-level synthesis. A minimal-intrusion BIST methodology is employed, in which a subset of the functional registers is modified to become BIST registers. Depending on the BIST functions performed (test pattern generation and/or test response compression) and whether they are performed concurrently, four types of BIST registers with varying costs are used. Data path allocation techniques are presented that (1) maximize the sharing of BIST registers between modules and (2) minimize the number of expensive BIST registers that are essential for minimal-intrusion BIST of a data path. The designs synthesized by our techniques have the same number of functional modules and registers as those synthesized using traditional approaches but require significantly lower BIST area overhead.
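A BIST register that generates pseudo-random test patterns is commonly built as a linear feedback shift register (LFSR). The sketch below is a 4-bit Fibonacci LFSR with taps at bits 3 and 2 (feedback polynomial $x^4 + x^3 + 1$), which cycles through all 15 non-zero states; the width, taps, and function name are illustrative choices, not taken from the paper.

```python
def lfsr_patterns(seed, width, taps, count):
    """Fibonacci LFSR used as a BIST test-pattern generator.

    taps: bit positions (0 = LSB) XOR-ed together to form the feedback bit.
    Returns `count` successive register states starting from `seed`."""
    mask = (1 << width) - 1
    state = seed & mask
    patterns = []
    for _ in range(count):
        patterns.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask  # shift left, insert feedback
    return patterns
```

The all-zero state is excluded: an LFSR seeded with 0 stays at 0, so BIST pattern generators must be initialized to a non-zero seed.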

20.

Optimal scheduling of multiple preventive maintenance activities

Kyung S. Park 《Microelectronics Reliability》1983,23(2):351-354

This paper is concerned with scheduling multiple periodic preventive maintenance (PM) activities. The scheduling problem is formulated as partitioned integer linear programming (ILP) models based on the Chinese Remainder Theorem. Because the numbers of variables and constraints in the proposed partitioned ILP model do not grow rapidly with the number of activities, the computation time and memory requirements are negligible. For a small problem involving ten or so activities, the scheduling can be carried out by hand. An example is given to illustrate the approach.
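The Chinese Remainder Theorem step can be illustrated on its own: two periodic activities with periods `p1`, `p2` and offsets `o1`, `o2` fall due at the same time if and only if `o1 ≡ o2 (mod gcd(p1, p2))`, and the coincidences then repeat every `lcm(p1, p2)`. The sketch below finds the first such time; it is not the partitioned ILP model itself, and the function name is hypothetical. It uses `pow(x, -1, m)`, which requires Python 3.8 or later.

```python
from math import gcd


def first_coincidence(o1, p1, o2, p2):
    """Earliest t >= 0 with t ≡ o1 (mod p1) and t ≡ o2 (mod p2), or None."""
    g = gcd(p1, p2)
    if (o2 - o1) % g:
        return None  # offsets incompatible: the activities never coincide
    lcm = p1 // g * p2
    m = p2 // g
    # Solve o1 + p1*k ≡ o2 (mod p2), i.e. (p1/g)*k ≡ (o2-o1)/g (mod p2/g).
    k = ((o2 - o1) // g * pow(p1 // g, -1, m)) % m if m > 1 else 0
    return (o1 + p1 * k) % lcm
```

For example, activities every 4 and 6 periods with offsets 1 and 3 first coincide at period 9, while offsets 0 and 1 never coincide because they differ modulo gcd(4, 6) = 2.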