首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Resource-constrained loop list scheduler for DSP algorithms   总被引:1,自引:0,他引:1  
We present a new algorithm for resource-constrained scheduling for digital signal processing (DSP) applications when the number of processors is fixed and the objective is to obtain a schedule with the minimum iteration period. This type of scheduling is best suited for moderate speed applications where conservation of area and power is more important than speed. We define and make use of newgraph dependent constraints to obtain a lower bound estimate on the iteration period for any data-flow graph. By satisfying these constraints before performing the scheduling task, we can restrict the design space and can generate valid schedules in less time than previously reported. The graph dependent constraints provide a more accurate lower bound estimate on the iteration period than previously published results. This new scheduling algorithm exploits the iterative nature of DSP algorithms and uses aniterative-loop based scheduling approach. This resource scheduling algorithm has been incorporated in the Minnesota ARchitecture Synthesis (MARS) system. Our approach exploits inter-iteration and intra-iteration precedence constraints and incorporates implicit retiming and pipelining to generate optimal and near optimal schedules.This research was supported by the Advanced Research Projects Agency under grant number F33615-93-C-1309 and the office of Naval Research under contract number N00014-91-J-1008.  相似文献   

2.
A 32-b RISC/DSP microprocessor with reduced complexity   总被引:2,自引:0,他引:2  
This paper presents a new 32-b reduced instruction set computer/digital signal processor (RISC/DSP) architecture which can be used as a general purpose microprocessor and in parallel as a 16-/32-b fixed-point DSP. This has been achieved by using RISC design principles for the implementation of DSP functionality. A DSP unit operates in parallel to an arithmetic logic unit (ALU)/barrelshifter on the same register set. This architecture provides the fast loop processing, high data throughput, and deterministic program flow absolutely necessary in DSP applications. Besides offering a basis for general purpose and DSP processing, the RISC philosophy offers a higher degree of flexibility for the implementation of DSP algorithms and achieves higher clock frequencies compared to conventional DSP architectures. The integrated DSP unit provides instruction set support for highly specialized DSP algorithms. Subword processing optimized for DSP algorithms has been implemented to provide maximum performance for 16-b data types. While creating a unified base for both application areas, we also minimized transistor count and we reduced complexity by using a short instruction pipeline. A parallelism concept based on a varying number of instruction latency cycles made superscalar instruction execution superfluous  相似文献   

3.
In implementing digital signal processing (DSP) algorithms for audio real-time applications, one is frequently faced with problems regarding incompatibilities between the hardware buffer length (the internal buffer of a professional sound card) and the software buffer size imposed by the underlying algorithm (due to i.e. multirate or FFT constraints). This mismatch is solved by proper frame size conversion algorithms which inevitably introduce delay. In this context, this paper presents a buffering scheme together with a theoretical proof of the minimum delay property shown by it. Some examples derived from frequently encountered issues in DSP applications are reported.  相似文献   

4.
Identifies a comprehensive set of compact rules and efficient algorithms for simplifying and rearranging structures common in multidimensional multirate signal processing. The authors extend the 1D rules reported by Crochiere and Rabiner (1983), especially the many equivalent forms of cascades of upsamplers and downsamplers. They also include rules reported by other authors for completeness. The extension to mD is based primarily on the Smith form decomposition of resampling (nonsingular integer square) matrices. The Smith form converts non-separable multidimensional operations into separable ones by means a shuffling of input samples and a reshuffling of the separable operations. Based on the Smith form, the authors have developed algorithms for 1) computing coset vectors 2) finding greatest common sublattices 3) simplifying cascades of up/downsampling operations. The algorithms and rules are put together in a form that can be implemented efficiently in a symbolic algebra package. The authors have encoded the knowledge in the commercially available Mathematica environment  相似文献   

5.
6.
The theory of the polynomial residue number system (PRNS), a system in which totally parallel polynomial multiplication can be achieved provided that the arithmetic takes place in some carefully chosen ring, is examined. Such a system is defined by a mapping which maps the problem of multiplication of two polynomials onto a completely parallel scheme where the mapped polynomial coefficients are multiplied pairwise. The properties of the mapping and the rules of operations in the PRNS are proven. Choices of the rings for which the PRNS is defined are also studied. It is concluded that the PRNS can offer significant advantages in those digital signal processing (DSP) applications that involve multiplication-intensive algorithms like convolutions and one-dimensional or multidimensional correlation  相似文献   

7.
In this paper we formalize a novel multirate folding transformation which is a tool used to systematically synthesize control circuits for pipelined VLSI architectures which implement multirate algorithms. Although multirate algorithms contain decimators and expanders which change the effective sample rate of a discrete-time signal, multirate folding time-multiplexes the multirate algorithm to hardware in such a manner that the resulting synchronous architecture requires only a single-clock signal. Multirate folding equations are derived and these equations are used to address two related issues. The first issue is memory requirements in folded architectures. We derive expressions for the minimum number of registers required by a folded architecture which implements a multirate algorithm. The second issue is retiming. Based on the noble identities of multirate signal processing, we derive retiming for folding constraints which indicate how a multirate data-flow graph must be retimed for a given schedule to be feasible. The techniques introduced in this paper can be used to synthesize architectures for a wide variety of digital signal processing applications which are based on multirate algorithms, such as signal analysis and coding based on subband decompositions and wavelet transforms  相似文献   

8.
This brief deals with shift invariance in linear systems (LSs), using a group theoretical approach. The group approach is necessary to obtain general definitions, which can be applied to continuous-time, discrete-time, multirate and also multidimensional LSs. Thus, a unique definition is given for periodic invariance and strict invariance (SI) and also for quasi-invariance (QI), a new concept, intermediate between periodic and SI. The importance of QI relies upon the fact that it is comprehensive of practically all multirate LSs  相似文献   

9.
This paper addresses instruction-level parallelism in code generation for digital signal processors (DSPs). In the presence of potential parallelism, the task of code generation includes code compaction, which parallelizes primitive processor operations under given dependency and resource constraints. Furthermore, DSP algorithms in most cases are required to guarantee real-time response. Since the exact execution speed of a DSP program is only known after compaction, real-time constraints should be taken into account during the compaction phase. While previous DSP code generators rely on rigid heuristics for compaction, we propose a novel approach to exact local code compaction based on an integer programming (IP) model, which handles time constraints. Due to a general problem formulation, the IP model also captures encoding restrictions and handles instructions having alternative encodings and side effects and therefore applies to a large class of instruction formats. Capabilities and limitations of our approach are discussed for different DSPs  相似文献   

10.
In this paper the design and implementation of Multi-Dimensional (MD) filter, particularly 3-Dimensional (3D) filter, are presented. Digital (discrete domain) filters applied to image and video signal processing using the novel 3D multirate algorithms for efficient implementation of moving object extraction are engineered with an example. The multirate (decimation and/or interpolation) signal processing algorithms can achieve significant savings in computation and memory usage. The proposed algorithm uses the mapping relations of z-transfer functions between non-multirate and multirate mathematical expressions in terms of time-varying coefficient instead of traditional polyphase de- composition counterparts. The mapping properties can be readily used to efficiently analyze and synthesize MD multirate filters.  相似文献   

11.
朱正学  郑重 《微电子学》1998,28(1):16-22
从视频信号的特征出发,简要说明了实时视频压缩的常用算法及其国际标准。通过系统地分析了视频压缩算法中内在的模块特性和并行特性,结合数字信号处理领域中具有并行实现机制的典型硬件结构,得出了可用于实时视频压缩的两种单片硬件结构模型。  相似文献   

12.
This paper considers two-dimensional (2-D) retiming, which is the problem of retiming circuits that operate on 2-D signals. We begin by discussing two types of parallelism available in 2-D data processing, which we call inter-iteration parallelism and inter-operation parallelism. We then present two novel techniques for 2-D retiming that can be used to extract inter-operation parallelism. These two techniques are designed to minimize the amount of memory required to implement a 2-D data-flow graph while maintaining a desired clock rate for the circuit. The first technique is based on an integer linear programming (ILP) formulation of the problem, and is called ILP 2-D retiming. This technique considers the entire 2-D retiming problem as a whole, but long central processing unit times are required if the circuit is large. The second technique, called orthogonal 2-D retiming, is a linear programming formulation which is derived by partitioning ILP 2-D retiming into two parts called s- and a-retiming. This technique finds a solution in polynomial time and is much faster than the ILP 2-D retiming technique, but the two sub problems (s- and a-retiming) can give results which are not compatible with one another. To solve this incompatibility problem, a variation of orthogonal 2-D retiming called integer orthogonal 2-D retiming is developed. This technique runs in polynomial time and the s-retiming and a-retiming steps are guaranteed to give compatible results. We show that the techniques presented in this paper can result in memory hardware savings of 50% compared to previously published 2-D retiming techniques  相似文献   

13.
The polyphase representation with respect to sampling lattices in multidimensional (M-D) multirate signal processing allows us to identify perfect reconstruction (PR) filter banks with unimodular Laurent polynomial matrices, and various problems in the design and analysis of invertible MD multirate systems can be algebraically formulated with the aid of this representation. While the resulting algebraic problems can be solved in one dimension (1-D) by the Euclidean Division Algorithm, we show that Gröbner bases offers an effective solution to them in the M-D case.  相似文献   

14.
The performance of multirate direct-sequence code-division multiple-access (CDMA) systems is considered. We compare two multirate schemes: variable spreading length (VSL-CDMA) and multicode (MC-CDMA). The performance in terms of asymptotic multiuser efficiency (AME) and near-far resistance (NFR) for various detectors are evaluated. Analytical and numerical results demonstrate that in multirate systems, MC-CDMA has a similar performance to that of VSL-CDMA employing low-rate detection in terms of multirate AME (MAME) and multirate NFR (MNFR). A lower bound for the optimal MNFR is also obtained and is shown to be that of the linear decorrelator in multirate systems. Thus, this implies that the decorrelator is no longer optimal in the sense of MNFR.  相似文献   

15.
高维环网上的一种可扩展的全交换算法   总被引:1,自引:0,他引:1       下载免费PDF全文
刘刚  顾乃杰  任开新  熊焰 《电子学报》2005,33(9):1723-1728
全交换在并行计算领域中有着大量而且重要的应用,例如FFT和矩阵运算等.本文提出了一种适合环网结构的全交换算法.算法中采用了新的网络划分技术及通信模式,使高维环网上全交换算法的通信量的主项达到了理论下限,这是已知的其他相关算法未能达到的,且其启动次数与通信量均优于现有的其他同类算法.本文所述的算法并不要求环网每一维上的处理器结点数目是2的方幂或某一个数的平方.最后,该算法简单规范,易于硬件高效实现.  相似文献   

16.
Today's communications systems especially in the field of wireless communications rely on many different algorithms to provide applications with constantly increasing data rates and higher quality. This development combined with the wireless channel characteristics as well as the invention of turbo codes has particularly increased the importance of interleaver algorithms. In this paper, we demonstrate the feasibility to exploit the hardware parallelism in order to accelerate the interleaving procedure. Based on a heuristic algorithm, the possible speedup for different interleavers as a function of the degree of parallelism of the hardware is presented. The parallelization is generic in the sense that the assumed underlying hardware is based on a parallel datapath DSP architecture and therefore provides the flexibility of software solutions.  相似文献   

17.
On the complexity of frequency-domain adaptive filtering   总被引:1,自引:0,他引:1  
In frequency-domain adaptive filtering (FDAF), the `crossover point' where the frequency domain becomes more efficient than the time domain is asymptotically 1.5 times as high for a DSP software realization as for a VLSI hardware realization. However, actual software realizations of FDAF may require as much as twice the computation predicted by theory, depending on the processor chosen. The approach can be extended to analyze other FDAF structures, such as multirate filters  相似文献   

18.
Majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for these nested loops. Most of the existing loop transformation techniques either can not achieve maximum parallelism, or can achieve maximum parallelism but with complicated loop bounds and loop indexes calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves better iteration period than software pipelining and loop unfolding, improving average iteration period by 50 and 54% respectively.
Edwin H.-M. ShaEmail:
  相似文献   

19.
We present a novel strategy for incorporating massive parallelism into the solution of Maxwell's equations using finite-difference time-domain methods. In a departure from previous techniques wherein spatial parallelism is used, our approach exploits massive temporal parallelism by computing all of the time steps in parallel. Furthermore, in contrast to other methods which appear to concentrate on explicit schemes such as Yee's (1966) algorithm, our strategy uses the implicit Crank-Nicolson technique which provides superior numerical properties. We show that the use of temporal parallelism results in algorithms which offer a massive degree of coarse grain parallelism with minimum communication and synchronization requirements. Due to these features, the time-parallel algorithms are particularly suitable for implementation on emerging massively parallel multiple instruction-multiple data (MIMD) architectures. The methodology is applied to a circular cylindrical configuration, which serves as a testbed problem for the approach, to demonstrate the massive parallelism that can be exploited. We also discuss the generalization of the methodology for more complex problems  相似文献   

20.
The paper addresses the problem of multirate signal processing over arbitrary fields. Studies of multirate systems and filter banks have proceeded in parallel, and a wealth of results are available in literature. The authors concentrate their attention on cyclic systems. These structures are ideally suited to generalising the concepts to finite fields. The perfect reconstruction property for quadrature mirror filter banks is obtained. It is shown how the cyclic wavelet transform (CWT) can be derived from such systems; the relationships between cyclic filter banks and CWTs are explored in detail. The results obtained are potentially very well suited for speech and image encoding, as well as for fast algorithms in signal processing  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号