
1.

Architecture and Evaluation of an Asynchronous Array of Simple Processors

Zhiyi Yu Michael J. Meeuwsen Ryan W. Apperson Omar Sattari Michael A. Lai Jeremy W. Webb Eric W. Work Tinoosh Mohsenin Bevan M. Baas 《Journal of Signal Processing Systems》2008,53(3):243-259

This paper presents the architecture of an asynchronous array of simple processors (AsAP), and evaluates its key architectural features as well as its performance and energy efficiency. The AsAP processor computes DSP applications with high energy efficiency, is capable of high performance, is easily scalable, and is well suited to future fabrication technologies. It is composed of a two-dimensional array of simple single-issue programmable processors interconnected by a reconfigurable mesh network. Processors are designed to capture the kernels of many DSP algorithms with very little additional overhead. Each processor contains its own tunable and haltable clock oscillator, and processors operate completely asynchronously with respect to each other in a globally asynchronous locally synchronous (GALS) fashion. A 6×6 AsAP array has been designed and fabricated in a 0.18 μm CMOS technology. Each processor occupies 0.66 mm², is fully functional at a clock rate of 520–540 MHz at 1.8 V, and dissipates an average of 35 mW per processor at 520 MHz under typical conditions while executing applications such as a JPEG encoder core and a complete IEEE 802.11a/g wireless LAN baseband transmitter. Most processors operate at over 600 MHz at 2.0 V. Processors dissipate 2.4 mW at 116 MHz and 0.9 V. A single AsAP processor occupies no more than 4% of the area of a single processing element in other multi-processor chips. Compared to several RISC processors (single-issue MIPS and ARM), AsAP achieves performance 27–275 times greater and energy efficiency 96–215 times greater, while using far less area. Compared to the TI C62x high-end DSP processor, AsAP achieves performance 0.8–9.6 times greater and energy efficiency 10–75 times greater, with an area 7–19 times smaller. Compared to ASIC implementations, AsAP achieves performance within a factor of 2–5 and energy efficiency within a factor of 3–50, with area within a factor of 2.5–3. These data are for varying numbers of AsAP processors per benchmark.
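
The power and clock figures quoted in this abstract imply an energy per clock cycle for each operating point. The sketch below is plain arithmetic on the quoted numbers only; the helper name `energy_per_cycle_pj` is an illustrative assumption and does not come from the paper.

```python
def energy_per_cycle_pj(power_mw: float, clock_mhz: float) -> float:
    """Return the energy spent per clock cycle, in picojoules.

    mW / MHz = (1e-3 W) / (1e6 Hz) = 1e-9 J, i.e. nanojoules,
    so the quotient is multiplied by 1000 to express it in pJ.
    """
    return power_mw / clock_mhz * 1000.0


# Operating points quoted in the abstract (per processor).
typical = energy_per_cycle_pj(35.0, 520.0)      # 35 mW at 520 MHz, 1.8 V
low_voltage = energy_per_cycle_pj(2.4, 116.0)   # 2.4 mW at 116 MHz, 0.9 V

print(f"typical:     {typical:.1f} pJ/cycle")
print(f"low voltage: {low_voltage:.1f} pJ/cycle")
print(f"ratio:       {typical / low_voltage:.2f}x")
```

On these figures, the 0.9 V operating point spends roughly a third of the energy per cycle of the 1.8 V point, at about a fifth of the clock rate.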


2.

Negative Save Sign Extension for Multi-term Adders and Multipliers

Robert T. Grisamore Earl E. Swartzlander Jr. 《Journal of Signal Processing Systems》2008,52(1):1-11

This paper outlines a new sign extension technique for use in carry save adder trees that reduces the computational complexity. The “Negative Save” technique presented is a modification to the Baugh–Wooley sign extension technique developed for array multipliers. Applying this sign extension technique to both parallel adder and multiplier partial product structures reduces the hardware required. The speed of the resulting structures is also improved.


3.

Error Probability Distribution and Density Functions for Rayleigh and Rician Fading Channels with Diversity

Chen Jie Vidhyacharan Bhaskar 《International Journal of Wireless Information Networks》2008,15(1):53-60

This paper analyzes the distribution and density functions of the probability of error for Rayleigh and Rician fading channels with diversity. An expression for the signal-to-noise ratio is derived for an asynchronous CDMA (A-CDMA) system with diversity. The error probability distribution and density functions are derived and plotted for different mean energy-to-noise ratios.


4.

Layerless Communications: From Dream to Reality?

Juha Saarnio Rui Aguiar I. Vijaya Kumar 《Wireless Personal Communications》2008,44(1):51-55

The paper summarizes the main results of one of the key panel sessions of the Workshop, which investigated whether “layerless communications” can be translated from a visionary dream into reality.


5.

A Scalable Configurable Architecture for Advanced Wireless Communication Algorithms

Konstantinos Sarrigeorgidis Jan Rabaey 《The Journal of VLSI Signal Processing》2006,45(3):127-151


6.

Design of 100 μW Wireless Sensor Nodes for Biomedical Monitoring

Lennart Yseboodt Michael De Nil Jos Huisken Mladen Berekovic Qin Zhao Frank Bouwens Jos Hulzink Jef Van Meerbergen 《Journal of Signal Processing Systems》2009,57(1):107-119

Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare monitoring applications. Power dissipation is the dominant design constraint in this domain. This paper shows the different steps to develop a digital signal processing architecture for a single channel electrocardiogram application, which is used as an application example. The target power consumption is 100 μW as that is the power energy scavengers can deliver. We follow a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse grained clock-gating is applied, next the static as well as the dynamic dissipation of the digital processor is reduced by tuning the core to the target domain. The impact of each step is quantified. A solution of 11 μW is possible for both radio and DSP running the electrocardiogram algorithm.


7.

Isolation of Failing Scan Cells through Convolutional Test Response Compaction

Grzegorz Mrugalski Janusz Rajski Chen Wang Artur Pogiel Jerzy Tyszer 《Journal of Electronic Testing》2007,23(1):35-45

This paper describes a non-recursive fault diagnosis technique for scan-based designs with convolutional test response compaction. The proposed approach allows time-efficient and accurate identification of failing scan cells using the Gauss–Jordan elimination method.
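
The abstract does not give the compactor equations, so the following is only a generic sketch of Gauss–Jordan elimination over GF(2), the kind of computation that can recover an error vector from a linearly compacted signature. The matrix `compactor`, the vector `errors`, and the function name `gauss_jordan_gf2` are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
def gauss_jordan_gf2(rows, rhs):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    rows: matrix A as a list of 0/1 lists; rhs: vector b as a 0/1 list.
    Returns one solution (free variables set to 0), or None if the
    system is inconsistent.
    """
    n_cols = len(rows[0])
    # Work on an augmented copy so the inputs are left untouched.
    aug = [r[:] + [b] for r, b in zip(rows, rhs)]
    pivots = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if p is None:
            continue
        aug[r], aug[p] = aug[p], aug[r]
        # Clear column c in every other row (addition in GF(2) is XOR).
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == len(aug):
            break
    # An all-zero row with right-hand side 1 means no solution exists.
    if any(not any(row[:-1]) and row[-1] for row in aug):
        return None
    x = [0] * n_cols
    for i, c in enumerate(pivots):
        x[c] = aug[i][-1]
    return x


# Hypothetical 4-cell example: the signature is compactor * errors over GF(2).
compactor = [[1, 1, 0, 0],
             [0, 1, 1, 0],
             [0, 0, 1, 1],
             [0, 0, 0, 1]]
errors = [0, 1, 0, 1]  # assumed failing cells: 1 and 3
signature = [sum(a * e for a, e in zip(row, errors)) % 2 for row in compactor]

recovered = gauss_jordan_gf2(compactor, signature)
failing_cells = [i for i, bit in enumerate(recovered) if bit]
print(failing_cells)
```

In this toy case the matrix is invertible, so the solution is unique; a real compactor maps many scan cells to few outputs, and the diagnosis then depends on additional structure that the abstract does not describe.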

Corresponding author: Jerzy Tyszer

8.

The Architecture and Development Flow of the S5 Software Configurable Processor

Jeffrey M. Arnold 《The Journal of VLSI Signal Processing》2007,47(1):3-14

A software configurable processor (SCP) is a hybrid device that couples a conventional processor datapath with programmable logic to allow application programs to dynamically customize the instruction set. SCP architectures can offer significant performance gains by exploiting data parallelism, operator specialization and deep pipelines. The S5000 is a family of high performance software configurable processors for embedded applications. The S5000 consists of a conventional 32-bit RISC processor coupled with a programmable Instruction Set Extension Fabric (ISEF). To develop an application for the S5 the programmer identifies critical sections to be accelerated, writes one or more extension instructions as functions in a variant of the C programming language, and accesses those functions from the application program. Performance gains of more than an order of magnitude over the unaccelerated processor can be achieved.


9.

Blocking models of optical burst switches with shared wavelength converters: exact formulations and analytical approximations

Pedro Reviriego Anna Maria Guidotti Carla Raffaelli Javier Aracil 《Photonic Network Communications》2008,16(1):61-70

Loss modeling of asynchronous optical burst switches with shared wavelength converters is considered. An exact analysis based on continuous-time Markov chains is proposed and validated by comparison with simulation for balanced and unbalanced traffic. A computationally efficient approximate analysis is also proposed and compared with the exact model to find its applicability conditions. Approximate loss performance evaluation is presented for ranges of values that are not tractable by either simulation or exact analysis.


10.

Implementation of a Coarse-Grained Reconfigurable Media Processor for AVC Decoder

B. Mei B. De Sutter T. Vander Aa M. Wouters A. Kanstein S. Dupont 《Journal of Signal Processing Systems》2008,51(3):225-243

Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor architecture. It targets embedded applications that demand high performance, low power, and high-level language programmability. Compared with typical very long instruction word-based digital signal processors, ADRES can exploit higher parallelism by using more scalable hardware supported by novel compilation techniques. We developed a complete tool-chain, including a compiler, simulator and HDL generator. This paper describes the design of a media processor, based on the ADRES template, that targets H.264 decoding and other video tasks. The whole processor design, hardware implementation and application mapping were completed in a relatively short period, yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed and power consumption are also very competitive compared with other processors.


11.

O³BPSK-based Linear Decorrelating Detector for Asynchronous DS/CDMA Systems over Frequency-Selective Rayleigh Fading Channels

Chia-Hsin Cheng Jen-Yung Lin Chao-Kai Wen Jyh-Horng Wen 《Wireless Personal Communications》2009,48(2):311-325

In this paper, a convenient signaling scheme, called orthogonal on–off BPSK (O³BPSK), along with a simple one-shot linear decorrelating detector (LDD) and a whitening Rake bank, is proposed for near–far resistant detection in asynchronous DS/CDMA systems. Based on the maximum multi-path spreading delay, a minimum duration of “off” is suggested, during which the temporally adjacent bits (TABs) that contain multi-user interference (MUI) and inter-symbol interference (ISI) from different users at the receiver are decoupled. The O³BPSK signaling scheme is combined with the whitening Rake receiver to preserve multi-path diversity gain in multi-path fading CDMA channels. The scheme offers low complexity, no detection delay, near–far resistance, and compensation for fading channels.

Corresponding author: Jyh-Horng Wen

12.

An Area-Efficient Design of Variable-Length Fast Fourier Transform Processor

Shuenn-Shyang Wang Chien-Sung Li 《Journal of Signal Processing Systems》2008,51(3):245-256

The fast Fourier transform (FFT) plays an important role in orthogonal frequency division multiplexing (OFDM) communication systems. In this paper, we propose an area-efficient design of a variable-length FFT processor that can perform FFTs of 512/1,024/2,048/4,096/8,192 points, as used in OFDM-based communication systems such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists of radix-2, radix-2² and radix-2/4/8 algorithms and by optimizing the realization through substructure sharing. Based on this architecture, an area-efficient design of a variable-length FFT processor is presented. Synthesized using the UMC 0.18 μm process, the processor occupies 2.9 mm², and the 8,192-point FFT can be performed correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.


13.

Architecture Considerations for Multi-Format Programmable Video Processors

Jonah Probell 《Journal of Signal Processing Systems》2008,50(1):33-39

Many different video processor architectures exist. A processor's architecture determines its strengths for a particular application. Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that support multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor architectures achieve the best performance through parallelism at the data (SIMD), instruction (VLIW), and multiprocessor levels, and through optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth rather than processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video processor, it is important to consider many factors, particularly performance, cost, power consumption, programmability, and peripheral support.


14.

An Area Efficient FFT/IFFT Processor for MIMO-OFDM WLAN 802.11n

Bo Fu Paul Ampadu 《Journal of Signal Processing Systems》2009,56(1):59-68

A pipelined Fast Fourier Transform and inverse FFT (FFT/IFFT) processor that utilizes hardware resources efficiently is proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation (in which as many FFT/IFFT processors are used as there are transmit/receive antennas), the proposed architecture (which shares hardware among multiple data sequences) reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support 1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 μm CMOS technology and saves 25% area compared to a conventional implementation approach using the radix-2³ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one input sequence of 64-point length and 87 mW at 75 MHz for four input sequences of 128-point length, and can be used efficiently for the IEEE 802.11n WLAN standard.


15.

Real-time DSP and FPGA Implementation of Wiener LMS Based Multipath Channel Estimation in 3G CDMA Systems (total citations: 2; self-citations: 0; citations by others: 2)

Messaoud Ahmed Ouameur Daniel Massicotte 《The Journal of VLSI Signal Processing》2007,47(3):259-279

This paper investigates real-time DSP and FPGA implementations of a low-complexity technique for asynchronous multiuser delay acquisition and time-varying channel tracking for multipath channels in WCDMA and cdma2000 systems. A multiuser-LMS-like structure, along with smoothing/prediction filters to improve tracking quality, is reviewed. We investigate an efficient implementation based on the FFT/IFFT technique, under fixed-point data representation and computation constraints. The measured BER reveals that a fixed-point implementation is feasible with possibly no performance degradation. Based on real-time execution on a fixed-point high-performance DSP, the maximum number of users is 15 and 17 for the proposed method and the correlator, respectively. Because of the inherent parallelism and regular data flow, an FPGA implementation is suggested, in which more than 80 users can be supported on a Xilinx Virtex™ II Pro device.

Corresponding author: Daniel Massicotte

16.

MGFs for Rayleigh Random Variables

Christopher S. Withers Saralees Nadarajah 《Wireless Personal Communications》2008,46(4):463-468

Expressions are given for the moment generating functions of the Rayleigh and generalized Rayleigh distributions.
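
The abstract does not reproduce the expressions themselves. As a point of reference, the standard closed form for the ordinary Rayleigh distribution with scale σ is M(t) = 1 + σt·exp(σ²t²/2)·√(π/2)·(1 + erf(σt/√2)); this is the textbook result and not necessarily the form the authors give, and the generalized Rayleigh case is not covered here. The sketch below compares that closed form with a direct numerical integral; the function names and the test point are illustrative assumptions.

```python
import math


def rayleigh_mgf(t: float, sigma: float) -> float:
    """Closed-form MGF of a Rayleigh(sigma) variable (textbook expression)."""
    a = sigma * t
    return 1.0 + a * math.exp(a * a / 2.0) * math.sqrt(math.pi / 2.0) * (
        1.0 + math.erf(a / math.sqrt(2.0))
    )


def rayleigh_mgf_numeric(t: float, sigma: float,
                         upper: float = 40.0, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of E[exp(t X)] on [0, upper * sigma].

    The density is f(x) = (x / sigma^2) * exp(-x^2 / (2 sigma^2)); the two
    exponentials are combined into one to avoid overflow.
    """
    h = upper * sigma / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (x / sigma**2) * math.exp(t * x - x * x / (2.0 * sigma**2))
    return total * h


closed = rayleigh_mgf(0.5, 1.0)
numeric = rayleigh_mgf_numeric(0.5, 1.0)
print(f"closed form: {closed:.6f}, numerical: {numeric:.6f}")
```

The two values should agree to within the integration error, and `rayleigh_mgf(0.0, sigma)` returns 1 for any scale, as every moment generating function must.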


17.

Triangular representation of Fornasini–Marchesini systems

Chen Dubi 《Multidimensional Systems and Signal Processing》2009,20(3):217-234

We further develop the study of Fornasini–Marchesini linear systems with upper triangular state operators, addressing the problem of constructing a triangular Fornasini–Marchesini model equivalent (under a proper definition) to a given system. In particular, we are interested in the problem of determining when such a system can be constructed, without losing the information about the state of the original system.


18.

Fast PWM-Based Test for High Resolution ΣΔ ADCs

Daniela De Venuto Leonardo Reyneri 《Journal of Electronic Testing》2007,23(6):539-548

This work describes a novel test strategy that uses digital stimuli for cheap, fast, yet accurate testing of high resolution ΣΔ ADCs. Simulations and measurements showed a discrimination threshold on specification parameters up to −90 dBc. The proposed method helps to reduce the cost of ADC production test, to extend test coverage and to enable built-in self-test and test-based self-calibration.

Corresponding author: Leonardo Reyneri

19.

Design and Implementation of a High-Performance and Complexity-Effective VLIW DSP for Multimedia Applications

Tay-Jyi Lin Shin-Kai Chen Yu-Ting Kuo Chih-Wei Liu Pi-Chen Hsiao 《Journal of Signal Processing Systems》2008,51(3):209-223

This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications. The DSP core embodies a distributed & ping-pong register file which, by restricting its access patterns, saves 76.8% of the silicon area and improves access time by 46.9% relative to the centralized register files found in most VLIW processors. Nevertheless, its performance (estimated in cycles) remains comparable with that of state-of-the-art DSPs for multimedia applications. A hierarchical instruction encoding scheme is also adopted to reduce the program sizes to 24.1–26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M Copper Logic Process, and it can operate at 333 MHz while consuming 189 mW. The core size is 3.2 × 3.15 mm², including 160 KB of on-chip SRAM.


20.

Coding Schemes for Crisscross Error Patterns

Simon Plass Gerd Richter A. J. Han Vinck 《Wireless Personal Communications》2008,47(1):39-49

This paper addresses two coding schemes that can handle errors emerging in crisscross patterns. First, codes with maximum rank distance, so-called Rank-Codes, are described and a modified Berlekamp–Massey algorithm is provided. Second, a Permutation Code based coding scheme for crisscross error patterns is presented. The influence of different types of noise is also discussed.
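
One way to see why rank-distance codes suit crisscross errors: an error matrix whose nonzero entries are confined to a few rows and columns has rank at most the number of those rows plus columns, however many individual entries are corrupted. The sketch below checks this on a small binary example. The error matrix and the helper `rank_gf2` are illustrative assumptions, and the paper's codes may be defined over a larger field than GF(2).

```python
def rank_gf2(matrix):
    """Rank of a 0/1 matrix over GF(2), with each row packed into an integer."""
    rows = [int("".join(map(str, r)), 2) for r in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            # XOR the pivot into every remaining row that shares that bit.
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


# Hypothetical crisscross error: corrupted entries lie in row 1 and column 3.
error = [
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]

corrupted_entries = sum(map(sum, error))
error_rank = rank_gf2(error)
print(f"{corrupted_entries} corrupted entries, rank {error_rank}")
```

Here six entries are in error, yet the rank is only 2 (one row plus one column), so a code designed to correct rank-2 errors would cover this pattern even though its Hamming weight is larger.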
