Found 20 similar documents; search took 15 ms
1.
Georg Hetmanczyk 《Multidimensional Systems and Signal Processing》2010,21(1):45-58
Multidimensional wave digital algorithms for numerical integration of partial differential equations exhibit not only important
robustness properties, but also a massive amount of parallelism. As the technology limit of heat dissipation stalls a further
increase of clock rates, modern CPUs incorporate multiple cores for parallel computation. In this paper, a safe and efficient
multithreading concept is presented to exploit the multicore architecture for multidimensional wave digital algorithms. Context
switching and synchronization overhead is investigated as well as effects of unfair operating system thread scheduling due
to unequal cache sharing of cores. Simulation results for the nonlinear Euler equations confirm the efficiency of the proposed
setup on a 1-core, 4-core and a 2 × 4-core system.
2.
This paper addresses the problem of adaptively optimizing a two-channel lossless finite-impulse-response (FIR) filter bank, which finds application in subband coding and wavelet signal analysis. Instead of using a gradient descent procedure, with its inherent problem of becoming trapped in local minima of a nonquadratic cost function, two eigenstructure algorithms are proposed. Both algorithms feature a priori bounds on the output variance at any convergent point, which, based on simulations, lead to solutions that lie acceptably close to a global minimum point of an output variance objective function. Moreover, a sufficient condition for such stationary points based on fixed-point theory is shown. It is shown that the convergence rate of both algorithms increases as the separation of eigenvalues of the input covariance matrix increases. Simulations for synthetic and real data support the conclusions.
3.
Chandrachoodan N. Bhattacharyya S.S. Liu K.J.R. 《Signal Processing, IEEE Transactions on》2004,52(5):1209-1217
The problem of representing timing information associated with functions in a dataflow graph is considered. This information is used for constraint analysis during behavioral synthesis of appropriate architectures for implementing the graph. Conventional models for timing suffer from shortcomings that make it difficult to represent timing information in a hierarchical manner for sequential and multirate systems. Some of these shortcomings are identified, and an alternate timing model that does not have these problems for hardware implementations is provided. We introduce the concept of timing pairs to model delay elements in sequential and multirate circuits and show how this allows us to derive hierarchical timing information for complex circuits. The resulting compact representation of the timing information can be used to streamline system performance analysis. In addition, several analytical results that previously applied only to single rate systems can now be extended to multirate systems. We present an algorithm to compute the timing parameters and have used this to compute timing parameters for a number of benchmark circuits. The results obtained on several ISCAS benchmark circuits as well as several multirate dataflow graphs corresponding to useful signal processing applications are presented. These results show that the new representation model can result in large reductions in the amount of information required to represent timing for hierarchical systems.
4.
The problem of estimating a probability density function (PDF) from measurements has been widely studied by many researchers. Even though much work has been done in the area of PDF estimation, most of it was focused on the continuous case. We propose a new model-based approach for modeling and estimating discrete probability density functions or probability mass functions. This approach is based on multirate signal processing theory, and it has several advantages over the conventional histogram method. We illustrate the PDF estimation procedure and analyze the statistical properties of the PDF estimates. Based on this model, a novel scheme is introduced that can be used for estimating the PDF in the presence of noise. Furthermore, the proposed ideas are extended to the more general case of estimating multivariate PDFs. Finally, we also consider practical issues such as optimizing the coefficients of a digital filter, which is an integral part of the model. This allows us to apply the proposed model to solve real-world problems. Simulation results are given where appropriate in order to demonstrate the ideas.
5.
Duen-Jeng Wang Yu Hen Hu 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1995,3(3):393-403
In this paper, we consider multiprocessor implementation of real-time recursive digital signal processing algorithms. The objective is to devise a periodic schedule and a fully static task assignment scheme to meet the desired throughput rate while minimizing the number of processors. Toward this goal, we propose a notion called cutoff time. We prove that the minimum-processor schedule can be found within a finite time interval bounded by the cutoff time. As such, the complexity of the scheduling algorithm and the allocation algorithm can be significantly reduced. Next, taking advantage of the cutoff time, we derive efficient heuristic algorithms which promise better performance and lower computational complexity compared to other existing algorithms. Extensive benchmark examples are tested which yield most encouraging results.
6.
Bernardini R. Cortelazzo G.M. Mian G.A. 《Signal Processing, IEEE Transactions on》1994,42(7):1786-1794
This work determines the scrambling rule of the multidimensional Cooley-Tukey FFT, and of the multidimensional prime factor FFT, in complete generality, i.e., for signals defined on lattices of general type. The characteristics of the scrambling rule bear interesting similarities with the 1-D case: the scrambling can be performed on the input data and it can be eliminated from the operations requiring pairs of FFT and inverse FFT (e.g., convolutions and correlations). The results of this work allow one to derive the most efficient way of performing multidimensional scrambling. The consequent memory access savings are relevant, especially with arrays of sizable dimensions.
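For orientation, the 1-D case that the abstract compares against can be sketched as follows. This is the familiar bit-reversal scrambling of a radix-2 Cooley-Tukey FFT, not the multidimensional lattice rule derived in the paper; the function names `bit_reverse_indices` and `scramble` are hypothetical and chosen only for this illustration.

```python
def bit_reverse_indices(n):
    """Return the bit-reversal permutation for a length-n sequence.

    Assumption: n is a power of two (1-D radix-2 case only).
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    perm = []
    for i in range(n):
        rev = 0
        # Mirror the 'bits'-wide binary representation of i.
        for b in range(bits):
            if i & (1 << b):
                rev |= 1 << (bits - 1 - b)
        perm.append(rev)
    return perm


def scramble(data):
    """Reorder the input data into bit-reversed order."""
    perm = bit_reverse_indices(len(data))
    return [data[j] for j in perm]


if __name__ == "__main__":
    print(bit_reverse_indices(8))
    print(scramble(list("abcdefgh")))
```

Bit reversal is its own inverse, so applying `scramble` twice restores the original order. This is the 1-D analogue of the observation that scrambling can be dropped from operations that pair an FFT with an inverse FFT.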
7.
Resource-constrained loop list scheduler for DSP algorithms (cited by: 1; self-citations: 0; cited by others: 1)
We present a new algorithm for resource-constrained scheduling for digital signal processing (DSP) applications when the number of processors is fixed and the objective is to obtain a schedule with the minimum iteration period. This type of scheduling is best suited for moderate speed applications where conservation of area and power is more important than speed. We define and make use of new graph-dependent constraints to obtain a lower bound estimate on the iteration period for any data-flow graph. By satisfying these constraints before performing the scheduling task, we can restrict the design space and can generate valid schedules in less time than previously reported. The graph-dependent constraints provide a more accurate lower bound estimate on the iteration period than previously published results. This new scheduling algorithm exploits the iterative nature of DSP algorithms and uses an iterative-loop based scheduling approach. This resource scheduling algorithm has been incorporated in the Minnesota ARchitecture Synthesis (MARS) system. Our approach exploits inter-iteration and intra-iteration precedence constraints and incorporates implicit retiming and pipelining to generate optimal and near optimal schedules. This research was supported by the Advanced Research Projects Agency under grant number F33615-93-C-1309 and the Office of Naval Research under contract number N00014-91-J-1008.
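As a point of reference, the two textbook lower bounds on the iteration period of a data-flow graph can be sketched as below. These are the classical loop bound and resource bound, not the tighter graph-dependent constraints the abstract refers to. The sketch assumes integer computation times, identical processors, and that the loops of the graph are supplied explicitly; the function name and input layout are hypothetical.

```python
from fractions import Fraction
from math import ceil


def iteration_period_lower_bound(node_times, loops, num_processors):
    """Classical lower bound on the iteration period (in integer time units).

    node_times:     dict mapping node name -> computation time
    loops:          list of (nodes_in_loop, delay_count) pairs, delay_count > 0
    num_processors: number of identical processors (assumption)
    """
    # Loop bound: each loop needs (loop computation time / loop delays) per iteration.
    loop_bound = max(
        (Fraction(sum(node_times[v] for v in nodes), delays)
         for nodes, delays in loops),
        default=Fraction(0),
    )
    # Resource bound: total work per iteration shared among the processors.
    resource_bound = Fraction(sum(node_times.values()), num_processors)
    return ceil(max(loop_bound, resource_bound))


if __name__ == "__main__":
    times = {"a": 1, "b": 2, "c": 2, "d": 1}
    loops = [(["a", "b"], 1), (["b", "c", "d"], 2)]
    for p in (1, 2, 4):
        print(p, iteration_period_lower_bound(times, loops, p))
```

With one processor the resource bound dominates; with four processors the loop containing `a` and `b` sets the bound, so adding processors beyond that point cannot shorten the iteration period.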
8.
Rundblad E. Labunets V. Astola J. Egiazarian K. 《Signal Processing, IEEE Transactions on》2002,50(6):1496-1507
Fast algorithms for a wide class of nonseparable n-dimensional (n-D) discrete unitary 𝒦 transforms (DKTs) are introduced. They need fewer 1-D DKTs than in the case of the classical radix-2 FFT-type approach. The method utilizes a decomposition of the n-D K transform into the product of a new n-D discrete Radon transform and of a set of parallel/independent 1-D K transforms. If the n-D K transform has a separable kernel (e.g., the case of the discrete Fourier transform), our approach leads to a decrease in multiplicative complexity by a factor of n, compared with the classical row/column separable approach.
9.
J. Meerbergen J. Huisken P. Lippens O. McArdle R. Segers G. Goossens J. Vanhoof D. Lanneer F. Catthoor H. Man 《The Journal of VLSI Signal Processing》1990,1(4):265-278
An integrated design environment for the automated design of DSP systems is described. The overall design time of complex DSP systems on silicon can be reduced drastically by offering the designer a complete silicon compilation environment, integrating architectural level synthesis tools, a module generator and a floorplanner. The system is supported by a flexible and powerful library. A true exploration of the design space in an interactive way is possible. Examples of the first complex chips that have been designed with this system are discussed.
10.
Jiandong Wang Tongwen Chen Biao Huang 《Signal Processing, IEEE Transactions on》2005,53(7):2421-2431
This paper studies two problems in the spectral theory of discrete-time cyclostationary signals: the cyclospectrum representation and the cyclospectrum transformation by linear multirate systems. Four types of cyclospectra are presented, and their interrelationships are explored. In the literature, the problem of cyclospectrum transformation by linear systems was investigated only for some specific configurations and was usually developed with inordinate complexities due to lack of a systematic approach. A general multirate system that encompasses most common systems (linear time-invariant systems and linear periodically time-varying systems) is proposed as the unifying framework; more importantly, it also includes many configurations that have not been investigated before, e.g., fractional sample-rate changers with cyclostationary inputs. The blocking technique provides a systematic solution as it associates a multirate system with an equivalent linear time-invariant system and cyclostationary signals with stationary signals; thus, the original problem is elegantly converted into a relatively simple one, which is solved in the form of matrix multiplication.
11.
Silicon technology has now advanced to the point that there is a serious mismatch in the time taken to design advanced silicon-based systems and the time to market for any new product or product derivative. To obviate this delay, a new paradigm is emerging based on intellectual property (IP) exchange, where designers and differing companies share subsystems (virtual cores) between themselves to reduce design time to acceptable levels. To this end, over 150 companies including all the major players formed the Virtual Socket Interface Alliance in March 1997. The protection of IP has become a serious issue as intercompany subsystem design exchange becomes more commonplace. This paper presents new techniques to protect the IP of virtual cores that implement digital signal processing (DSP) algorithms. The approach involves embedding codewords into the design of fundamental signal processing algorithms such as digital filters and the DFT in such a way that proof of authorship can be retained, and, if required, easily identified. The techniques discussed can be adapted to protect other fundamental DSP algorithms such as convolution and correlation. The protection of IP via watermarking techniques is increasingly being applied at all levels of design. It is particularly advantageous if such techniques are applied at the highest abstraction levels in the design flow, and if such techniques are applied at the basic algorithm level, they become very difficult to detect at lower levels of system design.
12.
13.
14.
Mikko Berg Ilpo Kojo Jari Laarni 《Journal of Visual Communication and Image Representation》2010,21(8):880-888
This article discusses the human ability to detect, locate, or identify objects and their features using peripheral vision. The potential of peripheral vision is underused in user interfaces, probably due to the limits of visual acuity. Peripheral preview can guide focused attention to informative locations, if the visual objects are large enough and otherwise within the limits of discrimination. Our experiments focused on the task of identifying an outlier and implicated another limiting factor, crowding, for integration of object features. The target object and the corresponding data dimension were located from an object display representation used for integrating multidimensional data. We measured performance on a peripheral vision task in terms of reaction times and eye movements. Subjects identified the target item from 480 alternatives within 100 ms. Therefore, the identification process would not slow down the natural gaze sequence and focused attention during monitoring and data mining tasks.
15.
On fast address-lookup algorithms (cited by: 17; self-citations: 0; cited by others: 17)
The growth of the Internet and its acceptance has sparked keen interest in the research community in respect to many apparent scaling problems for a large infrastructure based on IP technology. A self-contained problem of considerable practical and theoretical interest is the longest-prefix lookup operation, perceived as one of the decisive bottlenecks. Several novel approaches have been proposed to speed up this operation that promise to scale forwarding technology into gigabit speeds. This paper surveys these new lookup algorithms and classifies them based on applied techniques, accompanied by a set of practical requirements that are critical to the design of high-speed routing devices. We also propose several new algorithms to provide lookup capability at gigabit speeds. In particular, we show the theoretical limitations of routing table size and show that one of our new algorithms is almost optimal, while requiring only a small number of memory accesses to perform each address lookup.
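To make the longest-prefix lookup operation concrete, here is a minimal sketch using a plain binary trie. This is the standard baseline structure, not one of the algorithms surveyed or proposed in the paper. It assumes prefixes and addresses are given as strings of "0" and "1"; the class name `PrefixTrie` and the next-hop labels are hypothetical.

```python
class PrefixTrie:
    """Binary trie for longest-prefix matching (illustrative baseline only)."""

    def __init__(self):
        # Each node is a dict with optional child keys "0" and "1"
        # and an optional "hop" key holding the next hop for that prefix.
        self.root = {}

    def insert(self, prefix_bits, next_hop):
        node = self.root
        for bit in prefix_bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address_bits):
        """Return the next hop of the longest matching prefix, or None."""
        node = self.root
        best = node.get("hop")  # the empty prefix acts as a default route
        for bit in address_bits:
            node = node.get(bit)
            if node is None:
                break
            if "hop" in node:
                best = node["hop"]
        return best


if __name__ == "__main__":
    table = PrefixTrie()
    table.insert("", "default")
    table.insert("10", "A")
    table.insert("1011", "B")
    for addr in ("10110000", "10010000", "01110000"):
        print(addr, table.lookup(addr))
```

A lookup in this structure visits one node per address bit in the worst case, which is the memory-access cost that faster lookup schemes aim to reduce.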
16.
Muhammad K. Roy K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2002,10(3):292-300
In this paper, we present a general approach which specifically targets reduction of redundant computation in common digital-signal processing (DSP) tasks such as filtering and matrix multiplication. We show that such tasks can be expressed as multiplication of vectors by scalars and this allows fast multiplication by sharing computation. Vector scaling operation is decomposed to find the most effective precomputations which yield a fast multiplier implementation. Two decomposition approaches are presented, one based on a greedy decomposition and the other based on fixed-size lookup and this leads to two multiplier architectures for vector-scalar products. Analog simulation of an example multiplier shows a speed advantage by a factor of about 1.85 over a conventional carry save array multiplier. Further simulations using 0.18 μ technology show up to 20% speed advantage over Booth encoded Wallace tree multipliers.
17.
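Digital receivers commonly use demodulation modes such as SSB, AM, and FM. In FM demodulation mode, good squelch performance is usually an important measure of receiver performance. Several common squelch algorithms are compared, an improved squelch algorithm based on the spectral characteristics of the audio signal is proposed, and the DSP implementation and performance analysis of the algorithm are completed. Simulation and analysis of field test data show that the algorithm is computationally simple, easy to implement on a DSP, highly reliable, and suitable for use in digital receivers.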
18.
19.
Kittitornkun S. Yu Hen Hu 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(2):208-217
Recently, FPGA (field programmable gate array) technology has made significant advances in both speed and capacity. Millions of logic gates are now available for reconfiguration programming. To fully exploit the potential of so many programmable devices, powerful design methodology must be developed. In this paper, we propose a novel systematic computer-aided design methodology that can efficiently implement deeply nested do-loop algorithms on an FPGA. Specifically, our design methodology maps the loop dependence graph onto a linear array of locally connected processing elements to exploit parallelism. Due to the regular structure of this linear array of processors, it can be easily implemented on an FPGA. While this method is based on conventional systolic array design methodology, our proposed approach exhibits two distinct features that contribute to its superior performance: 1) We developed a novel multiple-order dependence graph representation that is able to efficiently represent distinct, yet correct algorithm execution orders. 2) We developed new FPGA-specific architectural constraints during the mapping process. As such, FPGA implementations based on our approach will utilize far fewer lookup tables while achieving superior performance.
20.
Parallel implementation for iterative image restoration algorithms on a parallel DSP machine (cited by: 1; self-citations: 0; cited by others: 1)
Robert L. Stevenson George B. Adams III Leah H. Jamieson Edward J. Delp 《The Journal of VLSI Signal Processing》1993,5(2-3):261-272
Many low-level image processing algorithms which are posed as variational problems can be numerically solved using local and iterative relaxation algorithms. Because of the structure of these algorithms, processing time will decrease nearly linearly with the addition of processing nodes working in parallel on the problem. In this article, we discuss the implementation of a particular application from this class of algorithms on the 8×8 processing array of the AT&T Pixel system. In particular, a case study for an image interpolation algorithm is presented. The performance of the implementation is evaluated in terms of the absolute processing time. We show that near linear speedup is achieved for such iterative image processing algorithms when the processing array is relatively small. This work was made possible by a grant from the AT&T University Equipment Program.