Found 20 similar documents; search time 15 ms
1.
Zhiyi Yu Michael J. Meeuwsen Ryan W. Apperson Omar Sattari Michael A. Lai Jeremy W. Webb Eric W. Work Tinoosh Mohsenin Bevan M. Baas 《Journal of Signal Processing Systems》2008,53(3):243-259
This paper presents the architecture of an asynchronous array of simple processors (AsAP), and evaluates its key architectural
features as well as its performance and energy efficiency. The AsAP processor computes DSP applications with high energy efficiency,
is capable of high performance, is easily scalable, and is well-suited to future fabrication technologies. It is composed
of a two-dimensional array of simple single-issue programmable processors interconnected by a reconfigurable mesh network.
Processors are designed to capture the kernels of many DSP algorithms with very little additional overhead. Each processor
contains its own tunable and haltable clock oscillator, and processors operate completely asynchronously with respect to each
other in a globally asynchronous locally synchronous (GALS) fashion. A 6×6 AsAP array has been designed and fabricated in
a 0.18 μm CMOS technology. Each processor occupies 0.66 mm², is fully functional at a clock rate of 520–540 MHz at 1.8 V, and dissipates an average of 35 mW per processor at 520 MHz
under typical conditions while executing applications such as a JPEG encoder core and a complete IEEE 802.11a/g wireless LAN
baseband transmitter. Most processors operate at over 600 MHz at 2.0 V. Processors dissipate 2.4 mW at 116 MHz and 0.9 V.
A single AsAP processor occupies 4% or less of the area of a single processing element in other multi-processor chips. Compared
to several RISC processors (single issue MIPS and ARM), AsAP achieves performance 27–275 times greater, energy efficiency
96–215 times greater, while using far less area. Compared to the TI C62x high-end DSP processor, AsAP achieves performance
0.8–9.6 times greater, energy efficiency 10–75 times greater, with an area 7–19 times smaller. Compared to ASIC implementations,
AsAP achieves performance within a factor of 2–5, energy efficiency within a factor of 3–50, with area within a factor of
2.5–3. These data are for varying numbers of AsAP processors per benchmark.
Bevan M. Baas
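The two operating points quoted in the abstract above (35 mW at 520 MHz and 1.8 V; 2.4 mW at 116 MHz and 0.9 V) can be compared as energy per clock cycle, which is average power divided by clock frequency. The following is a minimal arithmetic sketch only; the function name `energy_per_cycle_pj` is illustrative and does not come from the paper.

```python
# Energy per clock cycle = average power / clock frequency.
# Both operating points are the figures quoted in the abstract;
# the helper name is a hypothetical one chosen for this sketch.

def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Return the energy dissipated per clock cycle, in picojoules."""
    power_w = power_mw * 1e-3   # mW -> W
    freq_hz = freq_mhz * 1e6    # MHz -> Hz
    return power_w / freq_hz * 1e12  # J -> pJ


nominal = energy_per_cycle_pj(35.0, 520.0)   # 1.8 V operating point
low_volt = energy_per_cycle_pj(2.4, 116.0)   # 0.9 V operating point

print(f"1.8 V, 520 MHz: {nominal:.1f} pJ per cycle")
print(f"0.9 V, 116 MHz: {low_volt:.1f} pJ per cycle")
print(f"ratio: {nominal / low_volt:.2f}x")
```

Under these figures the per-cycle energy is roughly 67 pJ at 1.8 V and roughly 21 pJ at 0.9 V per processor, a reduction of a little over 3x for a halved supply voltage.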
2.
This paper outlines a new sign extension technique for use in carry save adder trees that reduces the computational complexity.
The “Negative Save” technique presented is a modification to the Baugh–Wooley sign extension technique developed for array
multipliers. Applying this sign extension technique to both parallel adder and multiplier partial product structures reduces
the hardware required. The speed of the resulting structures is also improved.
Robert T. Grisamore
3.
Chen Jie Vidhyacharan Bhaskar 《International Journal of Wireless Information Networks》2008,15(1):53-60
This paper analyzes the distribution and density functions of the probability of error for Rayleigh and Rician fading channels
with diversity. An expression for the signal-to-noise ratio is derived for an asynchronous CDMA (A-CDMA) system with diversity.
The error probability distribution and density functions are derived and plotted for different mean energy-to-noise ratios.
Vidhyacharan Bhaskar
4.
The paper summarizes the main results of one of the key panel sessions of the Workshop, which investigated whether
“layerless communications” can be translated from a vision into reality.
Juha Saarnio
5.
6.
Lennart Yseboodt Michael De Nil Jos Huisken Mladen Berekovic Qin Zhao Frank Bouwens Jos Hulzink Jef Van Meerbergen 《Journal of Signal Processing Systems》2009,57(1):107-119
Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare
monitoring applications. Power dissipation is the dominant design constraint in this domain. This paper shows the different
steps to develop a digital signal processing architecture for a single channel electrocardiogram application, which is used
as an application example. The target power consumption is 100 μW as that is the power energy scavengers can deliver. We follow
a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse grained clock-gating is applied,
next the static as well as the dynamic dissipation of the digital processor is reduced by tuning the core to the target domain.
The impact of each step is quantified. A solution of 11 μW is possible for both radio and DSP running the electrocardiogram
algorithm.
Jef Van Meerbergen
7.
Grzegorz Mrugalski Janusz Rajski Chen Wang Artur Pogiel Jerzy Tyszer 《Journal of Electronic Testing》2007,23(1):35-45
This paper describes a non-recursive fault diagnosis technique for scan-based designs with convolutional test response compaction.
The proposed approach allows a time-efficient and accurate identification of failing scan cells using Gauss–Jordan elimination
method.
Jerzy Tyszer (Corresponding author)
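The abstract above names Gauss–Jordan elimination as the solving step but gives no details of the compactor equations. As a generic illustration only, and not the authors' diagnosis procedure, the sketch below solves a linear system over GF(2), where addition is XOR. The matrix `a` (which scan cells feed which compactor outputs) and the observed error vector `s` are hypothetical toy data.

```python
from typing import List, Optional


def gauss_jordan_gf2(a: List[List[int]], b: List[int]) -> Optional[List[int]]:
    """Solve a x = b over GF(2) by Gauss-Jordan elimination.

    Returns one solution with all free variables set to 0,
    or None if the system is inconsistent.
    """
    rows, cols = len(a), len(a[0])
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c]), None)
        if pivot is None:
            continue  # no pivot in this column: free variable
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row (XOR is addition in GF(2)).
        for i in range(rows):
            if i != r and m[i][c]:
                m[i] = [x ^ y for x, y in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
        if r == rows:
            break
    # A zero row with a non-zero right-hand side means no solution.
    if any(m[i][cols] for i in range(r, rows)):
        return None
    x = [0] * cols
    for i, c in enumerate(pivot_cols):
        x[c] = m[i][cols]
    return x


# Hypothetical toy compactor: 3 outputs observing 4 scan cells.
a = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
s = [0, 1, 1]  # error pattern produced by a single failing cell (index 2)

failing = gauss_jordan_gf2(a, s)
print(failing)
```

Because this toy system has more unknowns than equations, the solution is not unique; the sketch simply returns the one with free variables set to zero. A real diagnosis flow would need additional constraints to single out the failing cells.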
8.
Jeffrey M. Arnold 《The Journal of VLSI Signal Processing》2007,47(1):3-14
A software configurable processor (SCP) is a hybrid device that couples a conventional processor datapath with programmable
logic to allow application programs to dynamically customize the instruction set. SCP architectures can offer significant
performance gains by exploiting data parallelism, operator specialization and deep pipelines. The S5000 is a family of high
performance software configurable processors for embedded applications. The S5000 consists of a conventional 32-bit RISC processor
coupled with a programmable Instruction Set Extension Fabric (ISEF). To develop an application for the S5 the programmer identifies
critical sections to be accelerated, writes one or more extension instructions as functions in a variant of the C programming
language, and accesses those functions from the application program. Performance gains of more than an order of magnitude
over the unaccelerated processor can be achieved.
Jeffrey M. Arnold
9.
Pedro Reviriego Anna Maria Guidotti Carla Raffaelli Javier Aracil 《Photonic Network Communications》2008,16(1):61-70
Loss modeling of asynchronous optical burst switches with shared wavelength converters is considered. An exact analysis based
on continuous time Markov chains is proposed and validated by comparison with simulation for balanced and unbalanced traffic.
A computationally efficient approximated analysis is also proposed and compared with the exact model to find applicability
conditions. Approximate loss performance evaluation is presented for ranges of values which are not tractable either by simulation
or exact analysis.
Javier Aracil
10.
B. Mei B. De Sutter T. Vander Aa M. Wouters A. Kanstein S. Dupont 《Journal of Signal Processing Systems》2008,51(3):225-243
Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor
architecture. It targets embedded applications that demand high performance, low power and high-level language programmability.
Compared with typical very long instruction word-based digital signal processors, ADRES can exploit higher parallelism by using
more scalable hardware with support of novel compilation techniques. We developed a complete tool-chain, including compiler,
simulator and HDL generator. This paper describes the design case of a media processor targeting H.264 decoding and other
video tasks based on the ADRES template. The whole processor design, hardware implementation and application mapping are done
in a relatively short period. Yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed
and the power consumption are also very competitive compared with other processors.
S. Dupont
11.
Chia-Hsin Cheng Jen-Yung Lin Chao-Kai Wen Jyh-Horng Wen 《Wireless Personal Communications》2009,48(2):311-325
In this paper, a convenient signaling scheme, called orthogonal on–off BPSK (O³BPSK), along with a simple one-shot linear decorrelating detector (LDD) and a whitening Rake bank, is proposed for near–far
resistant detection in asynchronous DS/CDMA systems. Based on the maximum multi-path spreading delay, a minimum duration of
“off” is suggested, during which the temporally adjacent bits (TABs) that contain multi-user interference (MUI) and inter-symbol
interference (ISI) from different users at the receiver are decoupled. The O³BPSK signaling scheme is combined with the whitening Rake receiver to preserve multi-path diversity gain in multi-path fading
CDMA channels. The scheme offers low complexity, no detection delay, near–far resistance, and compensation for fading channels.
Jyh-Horng Wen (Corresponding author)
12.
Fast Fourier transform (FFT) plays an important role in the orthogonal frequency division multiplexing (OFDM) communication
systems. In this paper, we propose an area-efficient design of variable-length FFT processor which can perform various FFT
lengths of 512/1,024/2,048/4,096/8,192 points used in OFDM-based communication systems, such as digital audio broadcasting
(DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). To reduce computational
complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists
of radix-2, radix-2² and radix-2/4/8 algorithms and optimizing the realization by substructure sharing. Based on this architecture, an area-efficient
design of variable-length FFT processor is presented. Synthesized using the UMC 0.18 μm process, the processor occupies
2.9 mm², and the 8,192-point FFT can be performed correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.
Shuenn-Shyang Wang
13.
Jonah Probell 《Journal of Signal Processing Systems》2008,50(1):33-39
Many different video processor architectures exist. A processor's architecture determines its strengths for a particular application.
Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that support
multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor
architectures achieve best performance through the use of parallelism at the data (SIMD), instruction (VLIW), and multiprocessor
level, and optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized
for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth
rather than processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video
processor it is important to consider many factors, particularly performance, cost, power consumption, programmability, and
peripheral support.
Jonah Probell
14.
A pipelined Fast Fourier Transform and its inverse (FFT/IFFT) processor, which utilizes hardware resources efficiently, is
proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation (in which as many FFT/IFFT processors
as the number of transmit/receive antennas are used), the proposed architecture (using hardware sharing among multiple data
sequences) reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support
1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 μm
CMOS technology and saves 25% area compared to a conventional implementation approach using the radix-2³ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input
data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one input sequence with 64-point length;
it consumes 87 mW at 75 MHz for four input sequences of 128-point length and can be used efficiently for the IEEE 802.11n WLAN
standard.
Paul Ampadu
15.
Real-time DSP and FPGA Implementation of Wiener LMS Based Multipath Channel Estimation in 3G CDMA Systems (cited 2 times in total: 0 self-citations, 2 citations by others)
This paper investigates real-time DSP and FPGA implementations of a low complexity technique for asynchronous multiuser delay
acquisition and time varying channel tracking for multipath channels in WCDMA and cdma2000 systems. A multiuser-LMS-like structure
along with smoothing/prediction filters to improve tracking quality is reviewed. We investigate an efficient implementation
based on the FFT/IFFT technique, under fixed-point data representation and computation constraints. The measured BER reveals that
a fixed-point implementation is feasible with possibly no performance degradation. Based on real-time execution on a fixed-point
high performance DSP, the maximum number of users is 15 and 17 for the proposed method and correlator, respectively. Due to
the inherent parallelism and regular data flow, an FPGA implementation is suggested, in which more than
80 users can be supported on a Xilinx Virtex™ II Pro device.
Daniel Massicotte (Corresponding author)
16.
Expressions are given for the moment generating functions of the Rayleigh and generalized Rayleigh distributions.
Saralees Nadarajah
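The abstract above does not reproduce the expressions themselves. For orientation, the standard textbook form of the moment generating function of the ordinary Rayleigh distribution with scale σ is M(t) = 1 + σt·exp(σ²t²/2)·√(π/2)·[erf(σt/√2) + 1]; this is assumed here as background and is not quoted from the paper. The generalized Rayleigh case is not covered. The sketch below checks that closed form against direct numerical integration of E[exp(tX)]; all function names are illustrative.

```python
import math


def rayleigh_mgf_closed_form(t: float, sigma: float) -> float:
    """Textbook closed form of the Rayleigh MGF (assumed, not from the paper)."""
    z = sigma * t
    return 1.0 + z * math.exp(z * z / 2.0) * math.sqrt(math.pi / 2.0) * (
        math.erf(z / math.sqrt(2.0)) + 1.0
    )


def rayleigh_mgf_numeric(t: float, sigma: float,
                         upper_sigmas: float = 30.0, steps: int = 20000) -> float:
    """Approximate E[exp(tX)] with composite Simpson's rule on [0, upper_sigmas * sigma].

    `steps` must be even. The truncation point is a heuristic that is
    adequate for moderate values of t.
    """
    upper = upper_sigmas * sigma
    h = upper / steps

    def integrand(x: float) -> float:
        # exp(t x) times the Rayleigh density, with the exponents combined.
        return (x / sigma ** 2) * math.exp(t * x - x * x / (2.0 * sigma ** 2))

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


for sigma in (0.5, 2.0):
    for t in (-1.0, 0.0, 0.5, 1.0):
        exact = rayleigh_mgf_closed_form(t, sigma)
        approx = rayleigh_mgf_numeric(t, sigma)
        print(f"sigma={sigma}, t={t}: closed={exact:.6f}, numeric={approx:.6f}")
```

At t = 0 both functions return 1, as any moment generating function must.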
17.
Chen Dubi 《Multidimensional Systems and Signal Processing》2009,20(3):217-234
We further develop the study of Fornasini–Marchesini linear systems with upper triangular state operators, addressing the
problem of constructing a triangular Fornasini–Marchesini model equivalent (under a proper definition) to a given system.
In particular, we are interested in the problem of determining when such a system can be constructed, without losing the information
about the state of the original system.
Chen Dubi
18.
This work describes a novel test strategy that uses digital stimuli for cheap, fast, yet accurate testing of high resolution
ΣΔ ADCs. Simulations and measurements showed a discrimination threshold on specification parameters up to −90 dBc. The proposed
method helps to reduce the cost of ADC production test, to extend test coverage and to enable built-in self-test and test-based
self-calibration.
Leonardo Reyneri (Corresponding author)
19.
Tay-Jyi Lin Shin-Kai Chen Yu-Ting Kuo Chih-Wei Liu Pi-Chen Hsiao 《Journal of Signal Processing Systems》2008,51(3):209-223
This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications.
The DSP core embodies a distributed & ping-pong register file which, by restricting its access patterns, saves 76.8% of the silicon area and improves access time by 46.9%
relative to the centralized register files found in most VLIW processors. It still has comparable performance
(estimated in cycles) with state-of-the-art DSP for multimedia applications. A hierarchical instruction encoding scheme is
also adopted to reduce the program sizes to 24.1–26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M Copper Logic Process,
and it can operate at 333 MHz while consuming 189 mW power. The core size is 3.2 × 3.15 mm² including 160 KB on-chip SRAM.
Chih-Wei Liu
20.
This paper addresses two coding schemes which can handle emerging errors with crisscross patterns. First, codes with maximum
rank distance, so-called Rank-Codes, are described and a modified Berlekamp–Massey algorithm is provided. Secondly, a Permutation
Code based coding scheme for crisscross error patterns is presented. The influence of different types of noise is also discussed.
A. J. Han Vinck