Similar Documents
20 similar documents found.
1.
This paper presents the architecture of an asynchronous array of simple processors (AsAP) and evaluates its key architectural features, performance, and energy efficiency. The AsAP processor executes DSP applications with high energy efficiency, is capable of high performance, is easily scalable, and is well suited to future fabrication technologies. It is composed of a two-dimensional array of simple single-issue programmable processors interconnected by a reconfigurable mesh network. The processors are designed to capture the kernels of many DSP algorithms with very little additional overhead. Each processor contains its own tunable and haltable clock oscillator, and the processors operate completely asynchronously with respect to each other in a globally asynchronous, locally synchronous (GALS) fashion. A 6×6 AsAP array has been designed and fabricated in a 0.18 μm CMOS technology. Each processor occupies 0.66 mm², is fully functional at a clock rate of 520–540 MHz at 1.8 V, and dissipates an average of 35 mW at 520 MHz under typical conditions while executing applications such as a JPEG encoder core and a complete IEEE 802.11a/g wireless LAN baseband transmitter. Most processors operate at over 600 MHz at 2.0 V, and processors dissipate 2.4 mW at 116 MHz and 0.9 V. A single AsAP processor occupies 4% or less of the area of a single processing element in other multiprocessor chips. Compared to several RISC processors (single-issue MIPS and ARM), AsAP achieves 27–275 times higher performance and 96–215 times higher energy efficiency while using far less area. Compared to the TI C62x high-end DSP processor, AsAP achieves 0.8–9.6 times the performance and 10–75 times the energy efficiency, with 7–19 times less area. Compared to ASIC implementations, AsAP achieves performance within a factor of 2–5, energy efficiency within a factor of 3–50, and area within a factor of 2.5–3.
These data are for varying numbers of AsAP processors per benchmark.
Bevan M. Baas
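The per-processor power figures quoted above convert to energy per clock cycle with E = P / f. The sketch below does only that arithmetic on the abstract's own numbers. Note that the 35 mW figure is an average over applications at 1.8 V, while the 2.4 mW figure at 0.9 V may come from a different workload, so the ratio between the two results is indicative only.

```python
def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Average energy per clock cycle in picojoules, E = P / f."""
    power_w = power_mw * 1e-3
    freq_hz = freq_mhz * 1e6
    return power_w / freq_hz * 1e12

# Per-processor operating points quoted in the abstract.
nominal = energy_per_cycle_pj(35.0, 520.0)  # 1.8 V, typical applications
low_v = energy_per_cycle_pj(2.4, 116.0)     # 0.9 V
print(f"{nominal:.1f} pJ/cycle at 1.8 V, {low_v:.1f} pJ/cycle at 0.9 V")
# prints "67.3 pJ/cycle at 1.8 V, 20.7 pJ/cycle at 0.9 V"
```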

2.
This paper outlines a new sign extension technique for use in carry save adder trees that reduces the computational complexity. The “Negative Save” technique presented is a modification to the Baugh–Wooley sign extension technique developed for array multipliers. Applying this sign extension technique to both parallel adder and multiplier partial product structures reduces the hardware required. The speed of the resulting structures is also improved.
Robert T. Grisamore
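Sign-extension schemes for carry-save trees avoid copying each operand's sign bit across the full accumulator width. A common way to do this, related to but not the same as the Baugh–Wooley and Negative Save techniques (which are not reproduced here), is to invert each operand's sign bit, treat the operand as unsigned, and fold all the sign corrections into a single constant. The function name and the chosen widths below are illustrative.

```python
def sum_signed_with_constant(values, n, width):
    """Add n-bit two's-complement operands in a width-bit accumulator
    without sign-extending each operand individually."""
    msb = 1 << (n - 1)
    mask_n = (1 << n) - 1
    total = 0
    for v in values:
        pattern = v & mask_n      # operand as an n-bit two's-complement word
        total += pattern ^ msb    # flip the sign bit: now an unsigned n-bit value
    # Flipping the sign bit adds 2^(n-1) per operand; one constant removes it all.
    correction = (-len(values) * msb) % (1 << width)
    total = (total + correction) % (1 << width)
    # Reinterpret the width-bit result as two's complement.
    return total - (1 << width) if total >> (width - 1) else total

print(sum_signed_with_constant([-8, 7, -1, 3], n=4, width=8))  # → 1
```

In hardware the correction term is a precomputed constant that enters the adder tree as one extra row, which is what makes this style of scheme cheaper than per-operand sign extension.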

3.
This paper analyzes the distribution and density functions of the probability of error for Rayleigh and Rician fading channels with diversity. An expression for the signal-to-noise ratio is derived for an asynchronous CDMA (A-CDMA) system with diversity. The error probability distribution and density functions are derived and plotted for different mean energy-to-noise ratios.
Vidhyacharan Bhaskar
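To illustrate what a "distribution of the probability of error" means, consider the simplest case. This is an assumption and not the paper's A-CDMA system with diversity: BPSK over a single-branch Rayleigh channel. The instantaneous error probability Pe = Q(√(2γ)) is then a random variable because the SNR γ is exponentially distributed with mean γ̄. Solving Pe ≤ p for γ gives P(Pe ≤ p) = exp(−(Q⁻¹(p))² / (2γ̄)). The sketch checks this closed form against a Monte Carlo estimate. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def bpsk_pe(snr):
    """Instantaneous BPSK bit-error probability Q(sqrt(2*snr))."""
    return 0.5 * math.erfc(math.sqrt(snr))

def pe_cdf_rayleigh(p, mean_snr):
    """Closed-form P(Pe <= p) for exponential SNR (Rayleigh, no diversity)."""
    if p >= 0.5:
        return 1.0                       # BPSK Pe never exceeds 1/2
    if p <= 0.0:
        return 0.0
    q_inv = -NormalDist().inv_cdf(p)     # Q^{-1}(p) = -Phi^{-1}(p)
    return math.exp(-q_inv * q_inv / (2.0 * mean_snr))

def pe_cdf_monte_carlo(p, mean_snr, trials=100_000, seed=1):
    """Empirical P(Pe <= p) from random exponential SNR draws."""
    rng = random.Random(seed)
    hits = sum(bpsk_pe(rng.expovariate(1.0 / mean_snr)) <= p for _ in range(trials))
    return hits / trials

# mean_snr is linear (10.0 corresponds to 10 dB).
print(pe_cdf_rayleigh(1e-3, 10.0), pe_cdf_monte_carlo(1e-3, 10.0))
```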

4.
The paper summarizes the main results of one of the key panel sessions of the Workshop, which investigated whether “layerless communications” could be turned from a visionary dream into reality.
Juha Saarnio

5.
6.
Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare monitoring applications, where power dissipation is the dominant design constraint. This paper shows the steps involved in developing a digital signal processing architecture for a single-channel electrocardiogram application, which serves as an application example. The target power consumption is 100 μW, because that is the power energy scavengers can deliver. We follow a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse-grained clock gating is applied, and finally both the static and the dynamic dissipation of the digital processor are reduced by tuning the core to the target domain. The impact of each step is quantified. A solution of 11 μW is possible for both the radio and the DSP running the electrocardiogram algorithm.
Jef Van Meerbergen

7.
This paper describes a non-recursive fault diagnosis technique for scan-based designs with convolutional test response compaction. The proposed approach allows time-efficient and accurate identification of failing scan cells using Gauss–Jordan elimination.
Jerzy Tyszer (Corresponding author)
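The diagnosis step above rests on solving linear equations over GF(2). As a simplifying assumption, each compacted output bit is modeled as the XOR of a known set of scan cells, so the failing cells x satisfy A x = s for the observed syndrome s. The sketch below shows only generic Gauss–Jordan elimination over GF(2). It does not reproduce the paper's non-recursive procedure or the structure of a convolutional compactor. The compactor matrix is hypothetical. When the system is underdetermined, the function returns one solution with the free variables set to 0, and that solution need not be the true set of failing cells.

```python
def gf2_matvec(a, x):
    """Compacted response: each output bit is the XOR of the cells it observes."""
    return [sum(r & v for r, v in zip(row, x)) % 2 for row in a]

def gf2_solve(a, b):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.
    Returns one solution (free variables = 0) or None if inconsistent."""
    m, n = len(a), len(a[0])
    rows = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c]), None)
        if pivot is None:
            continue                                 # free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(m):                           # clear column c in every other row
            if i != r and rows[i][c]:
                rows[i] = [p ^ q for p, q in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    if any(rows[i][n] for i in range(r, m)):         # 0 = 1 row: no solution
        return None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = rows[i][n]
    return x

compactor = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]  # hypothetical 3-output, 3-cell compactor
errors = [1, 0, 1]                             # cells 0 and 2 fail
print(gf2_solve(compactor, gf2_matvec(compactor, errors)))  # → [1, 0, 1]
```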

8.
A software configurable processor (SCP) is a hybrid device that couples a conventional processor datapath with programmable logic, allowing application programs to customize the instruction set dynamically. SCP architectures can offer significant performance gains by exploiting data parallelism, operator specialization, and deep pipelines. The S5000 is a family of high-performance software configurable processors for embedded applications. The S5000 consists of a conventional 32-bit RISC processor coupled with a programmable Instruction Set Extension Fabric (ISEF). To develop an application for the S5, the programmer identifies critical sections to be accelerated, writes one or more extension instructions as functions in a variant of the C programming language, and calls those functions from the application program. Performance gains of more than an order of magnitude over the unaccelerated processor can be achieved.
Jeffrey M. Arnold

9.
Loss modeling of asynchronous optical burst switches with shared wavelength converters is considered. An exact analysis based on continuous time Markov chains is proposed and validated by comparison with simulation for balanced and unbalanced traffic. A computationally efficient approximated analysis is also proposed and compared with the exact model to find applicability conditions. Approximate loss performance evaluation is presented for ranges of values which are not tractable either by simulation or exact analysis.
Javier Aracil
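The exact analysis above builds a continuous-time Markov chain of the switch. As a much-simplified illustration of that method, consider one output fiber with W wavelengths, full wavelength conversion, Poisson burst arrivals, and exponentially distributed burst durations. This is an assumption, not the paper's shared-converter model, whose state space is larger. Under it, the number of busy wavelengths is a birth-death chain, and the loss probability is the stationary probability that all W wavelengths are busy. That probability equals the Erlang-B formula, which serves as a cross-check here.

```python
def birth_death_stationary(load, servers):
    """Stationary distribution of the busy-wavelength chain (M/M/c/c).
    Detailed balance: pi[k] * load = pi[k+1] * (k+1), rates in service units."""
    weights = [1.0]
    for k in range(servers):
        weights.append(weights[-1] * load / (k + 1))
    total = sum(weights)
    return [w / total for w in weights]

def burst_loss_exact(load, wavelengths):
    """Burst loss = probability that every wavelength is busy."""
    return birth_death_stationary(load, wavelengths)[-1]

def erlang_b(load, servers):
    """Erlang-B recursion, used as an independent check."""
    b = 1.0
    for k in range(1, servers + 1):
        b = load * b / (k + load * b)
    return b

print(round(burst_loss_exact(1.0, 2), 6))  # → 0.2
```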

10.
Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor architecture. It targets embedded applications that demand high performance, low power, and high-level language programmability. Compared with a typical very long instruction word (VLIW)-based digital signal processor, ADRES can exploit higher parallelism by using more scalable hardware supported by novel compilation techniques. We developed a complete tool chain, including a compiler, a simulator, and an HDL generator. This paper describes the design case of a media processor, based on the ADRES template, that targets H.264 decoding and other video tasks. The whole processor design, hardware implementation, and application mapping were done in a relatively short period, yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed, and power consumption are also very competitive with those of other processors.
S. Dupont
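Coarse-grained reconfigurable arrays such as ADRES are commonly programmed by modulo scheduling loop bodies. The abstract mentions only "novel compilation techniques", so treat this as background rather than the paper's method. A first step in modulo scheduling is a lower bound on the initiation interval (II): the larger of a resource bound and a recurrence bound. The operation counts, functional-unit counts, and recurrence below are hypothetical.

```python
import math

def res_mii(op_counts, fu_counts):
    """Resource bound: each operation class must fit on its functional units."""
    return max(math.ceil(op_counts[kind] / fu_counts[kind]) for kind in op_counts)

def rec_mii(recurrences):
    """Recurrence bound; each loop-carried cycle is (total latency, iteration distance)."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)

def min_ii(op_counts, fu_counts, recurrences):
    """Lower bound on the initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, fu_counts), rec_mii(recurrences))

# Hypothetical loop body: 10 ALU ops, 3 multiplies, one recurrence of latency 3.
print(min_ii({"alu": 10, "mul": 3}, {"alu": 16, "mul": 4}, [(3, 1)]))  # → 3
```

Here the recurrence, not the array size, limits the II, which is why wide arrays pay off only when the compiler can also shorten or break loop-carried dependences.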

11.
In this paper, a convenient signaling scheme, called orthogonal on–off BPSK (O3BPSK), along with a simple one-shot linear decorrelating detector (LDD) and a whitening Rake bank, is proposed for near–far resistant detection in asynchronous DS/CDMA systems. Based on the maximum multi-path spreading delay, a minimum duration of “off” is suggested, during which the temporally adjacent bits (TABs) that contain multi-user interference (MUI) and inter-symbol interference (ISI) from different users at the receiver are decoupled. The O3BPSK signaling scheme is combined with the whitening Rake receiver to preserve multi-path diversity gain in multi-path fading CDMA channels. The scheme offers low complexity, no detection delay, near–far resistance, and compensation for fading channels.
Jyh-Horng Wen (Corresponding author)
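The near–far resistance of a linear decorrelating detector comes from inverting the users' cross-correlation matrix R before taking decisions: sign(R⁻¹y) removes multi-user interference regardless of the users' amplitudes. The sketch below assumes a noise-free synchronous two-user model, y = R·A·b. That is a simplification, not the paper's asynchronous O3BPSK scheme with a whitening Rake bank. The correlation and amplitudes are hypothetical.

```python
def solve_linear(r, y):
    """Solve r @ x = y by Gaussian elimination with partial pivoting."""
    n = len(r)
    a = [row[:] + [yi] for row, yi in zip(r, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n + 1):
                a[i][j] -= f * a[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x

def decorrelate(r, y):
    """Linear decorrelating detector: bit decisions sign(R^{-1} y)."""
    return [1 if v >= 0 else -1 for v in solve_linear(r, y)]

rho = 0.7                                  # hypothetical cross-correlation
R = [[1.0, rho], [rho, 1.0]]
amps, bits = [1.0, 10.0], [1, -1]          # user 2 is 10x stronger (near-far)
z = [a * b for a, b in zip(amps, bits)]
y = [sum(R[i][j] * z[j] for j in range(2)) for i in range(2)]  # matched-filter outputs

print([1 if v >= 0 else -1 for v in y])    # matched filter: [-1, -1], user 1 wrong
print(decorrelate(R, y))                   # decorrelator:   [1, -1]
```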

12.
The fast Fourier transform (FFT) plays an important role in orthogonal frequency division multiplexing (OFDM) communication systems. In this paper, we propose an area-efficient design of a variable-length FFT processor that can perform FFT lengths of 512/1,024/2,048/4,096/8,192 points, as used in OFDM-based communication systems such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), and digital video broadcasting-handheld (DVB-H). To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists of radix-2, radix-2², and radix-2/4/8 algorithms, and by optimizing the realization through substructure sharing. Based on this architecture, an area-efficient design of the variable-length FFT processor is presented. Synthesized in the UMC 0.18 μm process, the processor occupies 2.9 mm², and the 8,192-point FFT operates correctly at up to 50 MHz with a power consumption of 823 mW from a 1.8 V supply.
Shuenn-Shyang Wang
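The processor above computes the same transform for every supported length, using a mixed-radix pipeline with substructure sharing. The sketch below is only a plain software radix-2 decimation-in-time FFT restricted to the lengths listed in the abstract. It shows what is computed, not how the paper's architecture shares hardware. `SUPPORTED_LENGTHS` and `variable_length_fft` are illustrative names.

```python
import cmath

SUPPORTED_LENGTHS = (512, 1024, 2048, 4096, 8192)  # lengths listed in the abstract

def fft(x):
    """Iterative radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    # Reorder inputs into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages; each stage doubles the sub-transform size.
    size = 2
    while size <= n:
        half = size // 2
        tw = [cmath.exp(-2j * cmath.pi * k / size) for k in range(half)]
        for start in range(0, n, size):
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * tw[k]
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a

def variable_length_fft(x, n):
    """Run the FFT only for the configured lengths, mimicking a length register."""
    if n not in SUPPORTED_LENGTHS or len(x) != n:
        raise ValueError("unsupported FFT length")
    return fft(x)
```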

13.
Many different video processor architectures exist, and a processor's architecture determines the applications for which it is strongest. Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that must support multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor architectures achieve their best performance through parallelism at the data (SIMD), instruction (VLIW), and multiprocessor levels, and through optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth rather than by processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video processor, it is important to consider many factors, particularly performance, cost, power consumption, programmability, and peripheral support.
Jonah Probell

14.
A pipelined fast Fourier transform and inverse FFT (FFT/IFFT) processor that uses hardware resources efficiently is proposed for MIMO-OFDM WLAN 802.11n. A conventional MIMO-OFDM implementation uses as many FFT/IFFT processors as there are transmit/receive antennas. Compared with that approach, the proposed architecture shares hardware among multiple data sequences and so reduces hardware complexity without sacrificing system throughput. The architecture supports 1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized in TSMC 0.18 μm CMOS technology and saves 25% area compared with a conventional implementation using the radix-2³ algorithm. It can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one 64-point input sequence and 87 mW at 75 MHz for four 128-point input sequences, and it can be used efficiently for the IEEE 802.11n WLAN standard.
Paul Ampadu

15.
This paper investigates real-time DSP and FPGA implementations of a low-complexity technique for asynchronous multiuser delay acquisition and time-varying channel tracking in multipath channels for WCDMA and cdma2000 systems. A multiuser LMS-like structure is reviewed, along with smoothing/prediction filters that improve tracking quality. We investigate an efficient implementation based on an FFT/IFFT technique under fixed-point data representation and computation constraints. The measured BER shows that a fixed-point implementation is feasible with possibly no performance degradation. In real-time execution on a high-performance fixed-point DSP, the maximum number of users is 15 for the proposed method and 17 for the correlator. Because of the inherent parallelism and regular data flow, an FPGA implementation is suggested, in which more than 80 users can be supported on a Xilinx Virtex™ II Pro device.
Daniel Massicotte (Corresponding author)
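The LMS-like structure mentioned above builds on the least-mean-squares update. As a minimal floating-point sketch, the function below identifies a single user's FIR channel from known QPSK pilot symbols with the complex LMS rule w ← w + μ·e·x*. It deliberately leaves out the paper's multiuser structure, its smoothing/prediction filters, and its fixed-point arithmetic. The channel taps, step size, and pilot model are hypothetical.

```python
import random

def lms_identify(h_true, n_samples=5000, mu=0.05, seed=0):
    """Estimate an FIR channel h_true with the complex LMS algorithm (noise-free)."""
    rng = random.Random(seed)
    taps = len(h_true)
    w = [0j] * taps
    x_buf = [0j] * taps                                    # newest sample first
    for _ in range(n_samples):
        s = complex(rng.choice((-1, 1)), rng.choice((-1, 1)))  # QPSK pilot symbol
        x_buf = [s] + x_buf[:-1]
        d = sum(h * x for h, x in zip(h_true, x_buf))       # received sample
        y = sum(wi * xi for wi, xi in zip(w, x_buf))        # adaptive filter output
        e = d - y                                           # estimation error
        w = [wi + mu * e * xi.conjugate() for wi, xi in zip(w, x_buf)]
    return w

h = [0.8 + 0.1j, -0.3j, 0.1]                               # hypothetical 3-tap channel
print([complex(round(v.real, 4), round(v.imag, 4)) for v in lms_identify(h)])
```

Tracking a time-varying channel uses the same update; a larger μ follows changes faster at the cost of more steady-state noise when the received signal is noisy.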

16.
Expressions are given for the moment generating functions of the Rayleigh and generalized Rayleigh distributions.
Saralees Nadarajah
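For the ordinary Rayleigh distribution with scale σ, the moment generating function has the well-known closed form M(t) = 1 + σt·e^{σ²t²/2}·√(π/2)·(erf(σt/√2) + 1). The sketch below checks this formula against direct numerical integration of E[e^{tX}]. The paper's expressions for the generalized Rayleigh distribution are not reproduced, and the integration cutoff is a hypothetical heuristic.

```python
import math

def rayleigh_mgf(t, sigma):
    """Closed-form MGF of a Rayleigh(sigma) random variable."""
    st = sigma * t
    return 1.0 + st * math.exp(st * st / 2.0) * math.sqrt(math.pi / 2.0) * (
        math.erf(st / math.sqrt(2.0)) + 1.0)

def rayleigh_mgf_numeric(t, sigma, steps=20000):
    """E[exp(tX)] via Simpson's rule on the Rayleigh pdf (steps must be even)."""
    upper = sigma * (12.0 + abs(t) * sigma)  # heuristic cutoff past the integrand's peak

    def f(x):
        return math.exp(t * x) * x / sigma**2 * math.exp(-x * x / (2.0 * sigma**2))

    h = upper / steps
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

print(rayleigh_mgf(1.0, 1.0), rayleigh_mgf_numeric(1.0, 1.0))
```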

17.
We further develop the study of Fornasini–Marchesini linear systems with upper triangular state operators, addressing the problem of constructing a triangular Fornasini–Marchesini model equivalent (under a proper definition) to a given system. In particular, we are interested in the problem of determining when such a system can be constructed, without losing the information about the state of the original system.
Chen Dubi
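For reference, the standard finite-dimensional form of the (second) Fornasini–Marchesini model is shown below, together with the effect of a change of state coordinates x = Tz. The paper works with state operators, and its precise notion of equivalence may be more general than the plain similarity used here. Treat this as background notation rather than the paper's construction.

```latex
\begin{aligned}
x(i+1,j+1) &= A_1\,x(i,j+1) + A_2\,x(i+1,j) + B_1\,u(i,j+1) + B_2\,u(i+1,j),\\
y(i,j) &= C\,x(i,j) + D\,u(i,j),\\
x = Tz \;&\Rightarrow\; (A_1, A_2, B_1, B_2, C, D) \mapsto
\bigl(T^{-1}A_1T,\; T^{-1}A_2T,\; T^{-1}B_1,\; T^{-1}B_2,\; CT,\; D\bigr).
\end{aligned}
```

The transformed model is triangular when both T⁻¹A₁T and T⁻¹A₂T are upper triangular. For complex matrices, such a common T exists, for example, whenever A₁ and A₂ commute.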

18.
This work describes a novel test strategy that uses digital stimuli for cheap and fast, yet accurate, testing of high-resolution ΣΔ ADCs. Simulations and measurements showed a discrimination threshold on specification parameters of up to −90 dBc. The proposed method helps reduce the cost of ADC production testing, extend test coverage, and enable built-in self-test and test-based self-calibration.
Leonardo Reyneri (Corresponding author)
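The abstract does not say how the digital stimuli are generated. One known way to drive an analog input from purely digital data is a 1-bit sigma-delta bitstream, which approximates the target waveform after analog low-pass filtering. The sketch below assumes that approach and implements a first-order modulator. It is not necessarily the paper's stimulus generator, and the sine parameters are hypothetical.

```python
import math

def sigma_delta_bitstream(samples):
    """First-order sigma-delta modulator: inputs in [-1, 1] -> stream of +/-1 values."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                # accumulate the quantization error
        out = 1.0 if integrator >= 0.0 else -1.0  # 1-bit quantizer
        bits.append(out)
        feedback = out                            # 1-bit DAC in the feedback loop
    return bits

# Hypothetical stimulus: a sine at 1/256 of the bit rate, amplitude 0.5.
sine = [0.5 * math.sin(2 * math.pi * n / 256) for n in range(4096)]
stream = sigma_delta_bitstream(sine)
```

Because the integrator stays bounded, the running average of the bits converges to the input. This is what lets a simple filter turn the bitstream back into an analog test signal.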

19.
This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications. The DSP core embodies a distributed and ping-pong register file. By restricting its access patterns, this register file saves 76.8% silicon area and improves access time by 46.9% compared with the centralized register files found in most VLIW processors. Nevertheless, the DSP achieves performance (estimated in cycles) comparable to that of state-of-the-art DSPs for multimedia applications. A hierarchical instruction encoding scheme is also adopted to reduce program sizes to 24.1∼26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M copper logic process, and it operates at 333 MHz while consuming 189 mW. The core size is 3.2 × 3.15 mm², including 160 KB of on-chip SRAM.
Chih-Wei Liu

20.
This paper addresses two coding schemes that can handle errors occurring in crisscross patterns. First, a code with maximum rank distance, the so-called Rank-Codes, is described, and a modified Berlekamp–Massey algorithm is provided. Second, a coding scheme for crisscross error patterns based on Permutation Codes is presented. The influence of different types of noise is also discussed.
A. J. Han Vinck
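Rank codes measure the distance between two codeword matrices by the rank of their difference, not by the number of differing symbols. Under this metric, a crisscross error confined to a few rows and columns counts as a small error even when it touches many entries. The helper below computes that rank over GF(2). Practical rank codes (e.g. Gabidulin codes) are usually defined over an extension field, and the modified Berlekamp–Massey decoder from the paper is not shown. The function names are illustrative.

```python
def gf2_rank(matrix):
    """Rank of a binary matrix over GF(2); rows are lists of 0/1."""
    width = len(matrix[0])
    rows = [int("".join(map(str, r)), 2) for r in matrix]  # pack each row into an int
    rank = 0
    for bit in reversed(range(width)):
        pivot = next((i for i in range(rank, len(rows)) if rows[i] >> bit & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] >> bit & 1:
                rows[i] ^= rows[rank]                      # eliminate this bit elsewhere
        rank += 1
    return rank

def rank_distance(a, b):
    """Rank metric: rank over GF(2) of the elementwise difference (XOR)."""
    return gf2_rank([[p ^ q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)])

n = 6
sent = [[0] * n for _ in range(n)]
# Crisscross error: every bit of row 1 and of column 3 is flipped.
received = [[int(i == 1 or j == 3) for j in range(n)] for i in range(n)]
print(sum(map(sum, received)), rank_distance(sent, received))  # → 11 2
```

The error flips 11 bits but has rank 2, so a code with minimum rank distance 5 can correct it.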
