Found 20 similar documents; search took 31 ms
1.
A pipelined Fast Fourier Transform and its inverse (FFT/IFFT) processor, which utilizes hardware resources efficiently, is
proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation (in which as many FFT/IFFT processors
are used as there are transmit/receive antennas), the proposed architecture (using hardware sharing among multiple data
sequences) reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support
1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 μm
CMOS technology and saves 25% area compared to a conventional implementation approach using the radix-2³ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input
data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one input sequence with 64-point length;
it consumes 87 mW at 75 MHz for four input sequences with length 128-point and can be efficiently used for IEEE 802.11n WLAN
standard.
Paul Ampadu
2.
B. Mei B. De Sutter T. Vander Aa M. Wouters A. Kanstein S. Dupont 《Journal of Signal Processing Systems》2008,51(3):225-243
Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor
architecture. It targets embedded applications that demand high performance, low power and high-level language programmability.
Compared with a typical very long instruction word-based digital signal processor, ADRES can exploit higher parallelism by using
more scalable hardware with the support of novel compilation techniques. We developed a complete tool-chain, including a compiler,
simulator and HDL generator. This paper describes the design case of a media processor targeting an H.264 decoder and other
video tasks based on the ADRES template. The whole processor design, hardware implementation and application mapping were done
in a relatively short period. Yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed
and the power consumption are also very competitive compared with other processors.
S. Dupont
3.
Jonah Probell 《Journal of Signal Processing Systems》2008,50(1):33-39
Many different video processor architectures exist. A processor's architecture gives it strength for a particular application.
Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that support
multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor
architectures achieve best performance through the use of parallelism at the data (SIMD), instruction (VLIW), and multiprocessor
level, and optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized
for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth
rather than processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video
processor it is important to consider many factors, particularly performance, cost, power consumption, programmability, and
peripheral support.
Jonah Probell
4.
Lennart Yseboodt Michael De Nil Jos Huisken Mladen Berekovic Qin Zhao Frank Bouwens Jos Hulzink Jef Van Meerbergen 《Journal of Signal Processing Systems》2009,57(1):107-119
Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare
monitoring applications. Power dissipation is the dominant design constraint in this domain. This paper shows the different
steps to develop a digital signal processing architecture for a single channel electrocardiogram application, which is used
as an application example. The target power consumption is 100 μW as that is the power energy scavengers can deliver. We follow
a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse grained clock-gating is applied,
next the static as well as the dynamic dissipation of the digital processor is reduced by tuning the core to the target domain.
The impact of each step is quantified. A solution of 11 μW is possible for both radio and DSP running the electrocardiogram
algorithm.
Jef Van Meerbergen
5.
6.
F. Angarita M. J. Canet T. Sansaloni A. Perez-Pascual J. Valls 《Journal of Signal Processing Systems》2008,52(2):181-191
In an orthogonal frequency division multiplexing-based wireless local area network receiver there are three operations that
can be performed by a single coordinate rotation digital computer (CORDIC) processor, since they are needed at different time
instants. These are the rotation of a vector, the computation of the angle of a vector and the computation of the reciprocal.
This paper proposes a common CORDIC architecture that implements the three operations with only a small increase
in hardware cost relative to a single-operation CORDIC. The proposed architecture has been validated on field-programmable
gate array devices, and the implementation results show an area saving of around 28% and a throughput increase of 64%.
J. Valls
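The three operations named in the abstract above (vector rotation, vector angle, reciprocal) can be illustrated with a textbook floating-point CORDIC. This is a minimal sketch and not the shared architecture from the paper. The iteration count `N_ITER`, the function names, and the use of linear vectoring mode for the reciprocal are all assumptions made for illustration.

```python
import math

N_ITER = 24  # assumed number of micro-rotations; sets the precision
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Cumulative gain of the circular micro-rotations; outputs are divided by it.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_rotate(x, y, theta):
    """Circular rotation mode: rotate (x, y) by theta radians (|theta| < ~1.74)."""
    z = theta
    for i, a in enumerate(ANGLES):
        d = 1.0 if z >= 0 else -1.0          # steer the residual angle to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return x / GAIN, y / GAIN


def cordic_angle(x, y):
    """Circular vectoring mode: angle of (x, y) for x > 0, found by driving y to zero."""
    z = 0.0
    for i, a in enumerate(ANGLES):
        d = -1.0 if y >= 0 else 1.0          # rotate towards the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a                           # accumulate the angle removed
    return z


def cordic_reciprocal(x):
    """Linear vectoring mode: 1/x for x >= 0.5, keeping y + x*z == 1 invariant."""
    y, z = 1.0, 0.0
    for i in range(N_ITER):
        d = 1.0 if y >= 0 else -1.0          # drive y to zero
        y -= d * x * 2.0 ** -i
        z += d * 2.0 ** -i
    return z


rx, ry = cordic_rotate(1.0, 0.0, math.pi / 4)
angle = cordic_angle(1.0, 1.0)
recip = cordic_reciprocal(4.0)
```

All three loops use the same shift-and-add update pattern, which is what makes sharing one datapath plausible. In hardware the multiplications by `2.0 ** -i` would be bit shifts on fixed-point values.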
7.
Mladen Berekovic Tim Niggemeier 《Journal of Signal Processing Systems》2008,50(2):201-229
A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal
processing applications by combining high-frequency design techniques with a very high degree of parallel processing on a
chip. The architecture is based on a superscalar processor model with a modified Tomasulo scheme that was extended to eliminate
all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads
[simultaneously multi-threaded (SMT)]. Consistent application of fine clustering reduces the cycle time for wire-sensitive
building blocks of the processor, such as the register file and the scheduling window, and leads to a distributed architecture
model in which independent thread processing units, arithmetic logic units, register files and memories are distributed across
the chip and communicate with each other over a special network. A special communication protocol replaces broadcasting and associative
comparison of destination tags in a centralised instruction scheduler with explicit operand transfer instructions, thus decentralizing
the control of the data flow to the greatest extent. As a result, the processor cycle time depends neither on the issue
bandwidth of a single thread nor on the execution bandwidth of the SMT processor. This makes the performance of the architecture
scalable with both the number of function units and the number of thread units without any impact on the processor's cycle time.
The performance and scalability of the proposed microarchitecture are demonstrated with critical signal processing kernels from
the MPEG-4 video coding standard on a cycle-true simulator.
Tim Niggemeier
8.
Tay-Jyi Lin Shin-Kai Chen Yu-Ting Kuo Chih-Wei Liu Pi-Chen Hsiao 《Journal of Signal Processing Systems》2008,51(3):209-223
This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications.
The DSP core embodies a distributed ping-pong register file which, by restricting its access patterns, saves 76.8% of the silicon area
and improves access time by 46.9% compared with the centralized register files found in most VLIW processors. However, it still has performance
(estimated in cycles) comparable to state-of-the-art DSPs for multimedia applications. A hierarchical instruction encoding scheme is
also adopted to reduce the program sizes to 24.1–26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M Copper Logic Process,
and it can operate at 333 MHz while consuming 189 mW power. The core size is 3.2 × 3.15 mm² including 160 KB on-chip SRAM.
Chih-Wei Liu
9.
This work presents an efficient architecture design for the deblocking filter in H.264/AVC using a novel fast-deblocking boundary-strength
(FDBS) technique. Based on the FDBS technique, the proposed architecture divides the deblocking process into three filtering
modes, namely offset-based, standard-based and diagonal-based filtering modes, to reduce the blocking artifact and improve
the video quality in H.264/AVC. The proposed architecture is designed in Verilog HDL, simulated with Quartus II and synthesized
using a 0.18 μm CMOS cell library with the Synopsys Design Compiler. Simulation results demonstrate good performance in PSNR
improvement and bit-rate reduction. Additionally, verification results through physical chip design reveal that the proposed
architecture design can support 1,280 × 720@30 Hz processing throughput while clocking at 100 MHz. Comparisons with other
studies show the excellent properties of the proposed architecture in terms of gate count, memory size and clock-cycle/macroblock.
Chun-Lung Hsu
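The throughput claim in the abstract above (1,280 × 720 at 30 Hz with a 100 MHz clock) implies a cycle budget per macroblock. The sketch below works that budget out. It assumes that "30 Hz" means 30 frames per second and that frames are split into 16 × 16 macroblocks as in H.264/AVC. The function name is hypothetical, and the result is an upper bound derived from the quoted figures, not a number reported by the paper.

```python
def cycles_per_macroblock(width, height, fps, clock_hz):
    """Average clock cycles available per 16x16 macroblock at a given frame rate."""
    # Ceiling division so that partially covered macroblocks still count.
    mbs_per_frame = -(-width // 16) * -(-height // 16)
    return clock_hz / (mbs_per_frame * fps)


# 80 x 45 = 3,600 macroblocks per frame, 108,000 macroblocks per second.
budget = cycles_per_macroblock(1280, 720, 30, 100e6)
```

With these inputs the budget comes to roughly 926 cycles per macroblock, so a design that sustains this throughput must finish each macroblock within that many cycles on average.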
10.
Lingfeng Li Yang Song Shen Li Takeshi Ikenaga Satoshi Goto 《Journal of Signal Processing Systems》2008,50(1):81-95
This paper presents a compact hardware architecture for a Context-Based Adaptive Binary Arithmetic Coding (CABAC) codec for H.264/AVC.
The similarities between the encoding and decoding algorithms are explored to achieve remarkable hardware reuse. System-level
hardware/software partitioning is conducted to improve overall performance. Meanwhile, the characteristics of the CABAC algorithm
are utilized to implement a dynamic pipeline scheme, which increases the processing throughput with very small hardware overhead.
The proposed architecture is implemented in 0.18 μm technology. Results show that the core area of the proposed design is 0.496 mm² when the maximum clock frequency is 230 MHz. It is estimated that the proposed architecture can support CABAC encoding or
decoding for HD1080i resolution at a speed of 30 frames/s.
Lingfeng Li
11.
T. Sansaloni A. Pérez-Pascual V. Torres J. Valls 《The Journal of VLSI Signal Processing》2007,47(2):183-187
A scheme is presented for reducing the hardware resources needed to implement, on LUT-based FPGA devices, the twiddle factors required in Fast Fourier
Transform (FFT) processors. The proposed scheme reduces the number of embedded block RAMs for large FFTs and the
number of slices for FFT lengths greater than 128 points. Results are given for Xilinx devices, but they can be generalized
to other advanced LUT-based devices such as Altera Stratix.
T. Sansaloni
12.
This paper presents an Application-Specific Signal Processor (ASSP) for Orthogonal Frequency Division Multiplexing (OFDM)
Communication Systems, called SPOCS. The instruction set and its architecture are specially designed for OFDM system functions such
as Fast Fourier Transform (FFT), scrambling/descrambling, puncturing, convolutional encoding, interleaving/deinterleaving,
etc. SPOCS employs the optimized Data Processing Unit (DPU) to support the proposed instructions and the FFT Address Generation
Unit (FAGU) to automatically calculate input/output data addresses. In addition, the proposed Bit Manipulation Unit (BMU)
supports efficient bit manipulation operations. SPOCS has been synthesized using the SEC 0.18 μm standard cell library and
has a much smaller area than commercial DSP chips. SPOCS can reduce the number of clock cycles by 8%–53% for FFT and by about
48%–84% for scrambling, convolutional encoding and interleaving compared with existing DSP chips. SPOCS can support various
OFDM communication standards, such as Wireless Local Area Network (WLAN), Digital Audio Broadcasting (DAB), Digital Video
Broadcasting-Terrestrial (DVB-T), etc.
Myung H. Sunwoo
13.
This paper presents an FPGA realisation of an application-specific cellular processor array designed for asynchronous skeletonization
of binary images. The skeletonization algorithm is based on iterative thinning utilizing a ‘grassfire’ transformation approach.
The purpose of this work was to test the performance of a fully parallel asynchronous processor array and to evaluate the
inhomogeneity of wave propagation velocity. A proof-of-concept design has been implemented and evaluated; the results are
presented and discussed.
Piotr Dudek
14.
We implemented the H.264/AVC variable block size motion estimation (VBSME) using a very long instruction word (VLIW)–single
instruction multiple data (SIMD) digital signal processor (DSP). The SAD_Reuse method, which has a regular structure, is chosen
for VBSME not only to remove redundant sum of absolute difference (SAD) operations but also to utilize the instruction level
parallelism (ILP) and data level parallelism (DLP) of the architecture. A fast mode decision algorithm is developed to reduce
the number of ‘compare and update’ operations and simplify the rate distortion optimization (RDO). The developed fast mode
decision uses the difference of motion vectors and the maximum a posteriori (MAP) estimation of the rate-distortion costs.
Several advanced software techniques, including software pipelining and packed-data processing, are employed. In particular,
memory access overhead reduction schemes, including multi-block processing and inter-procedural scheduling, are used
for the software optimization. To reduce occurrences of ‘write buffer full’ in the quarter-pixel ME, a 4-bit quantization scheme
is developed, which increases the number of arithmetic operations but greatly decreases the stall cycles. The implemented
variable block size ME for H.264/AVC requires an average of 9 Mcycles and 78 Mcycles per frame for QCIF and CIF size video sequences,
respectively, on the TMS320C64x DSP architecture.
Wonyong Sung
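The idea of reusing SADs across block sizes, mentioned in the abstract above, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's SAD_Reuse implementation or its DSP code. It assumes a 16 × 16 macroblock given as nested lists and a single candidate motion vector, and the function names are hypothetical. The sixteen 4 × 4 SADs are computed once and every larger H.264 block size is obtained by summing them.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for r in range(16):
        for c in range(16):
            grid[r // 4][c // 4] += abs(cur[r][c] - ref[r][c])
    return grid


def merge(grid, rows, cols):
    """Sum rows x cols tiles of 4x4 SADs to get the SADs of larger blocks."""
    return [[sum(grid[r + i][c + j] for i in range(rows) for j in range(cols))
             for c in range(0, 4, cols)]
            for r in range(0, 4, rows)]


def all_block_sads(cur, ref):
    """SADs for all seven H.264 block sizes (width x height) from one pixel pass."""
    g = sad_4x4_grid(cur, ref)
    return {
        "4x4": g,
        "8x4": merge(g, 1, 2),    # 8 wide, 4 tall: two 4x4 blocks side by side
        "4x8": merge(g, 2, 1),
        "8x8": merge(g, 2, 2),
        "16x8": merge(g, 2, 4),
        "8x16": merge(g, 4, 2),
        "16x16": merge(g, 4, 4),
    }


# Toy input: every pixel differs by 1, so each 4x4 SAD is 16.
cur = [[1] * 16 for _ in range(16)]
ref = [[0] * 16 for _ in range(16)]
sads = all_block_sads(cur, ref)
```

Only `sad_4x4_grid` touches pixel data. The larger block sizes cost a handful of additions each, which is the redundancy that SAD reuse removes.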
15.
This paper proposes a novel cost-effective and programmable architecture of CAVLC decoder for H.264/AVC, including decoders
for Coeff_token, T1_sign, Level, Total_zeros and Run_before. To simplify the hardware architecture and provide programmability,
we propose four new techniques: a new group-based VLD with efficient memory (NG–VLDEM) for Coeff_token decoder, a novel combined
architecture (NCA) for level decoder, a new group-based VLD with memory access once (GMAO) for Total_zeros decoder and a new
VLD architecture based on multiplexers instead of searching memory (MISM) for Run_before decoder. With the above four techniques,
the proposed CAVLC decoder can decode every syntax element within one clock cycle. Synthesis results show that the hardware
cost is 3,310 gates with 0.18 μm CMOS technology at a clock constraint of 125 MHz. Therefore, the proposed design is suitable
for real-time applications, such as H.264/AVC HD1080i video decoding.
Shunliang Mei
16.
17.
A. Pérez-Pascual T. Sansaloni V. Torres V. Almenar J. Valls 《Journal of Signal Processing Systems》2009,56(1):35-40
This paper shows that when a digital receiver is designed utilizing two clock scopes, the digital down-converter can be designed
to be efficient in terms of area and power consumption. The main design parameter that makes the design efficient
is the relationship between the transition band of the designed filter and its sampling frequency.
J. Valls
18.
In this paper, we propose a cost-effective architecture of variable length decoder (VLD) for MPEG-2 and AVS. In order to save
the buffer memory between VLD and IDCT and accelerate decoding speed, block-based pipeline buffers are adopted. Inverse scan
(IScan) and inverse quantisation (IQ) are also merged into this architecture for cost-effective implementation and for easier
system integration. A novel group-based architecture with the optimized look-up table is used for MPEG-2 and a new memory-efficient
architecture with mixed memory organization is used for AVS. We use shared modules in both MPEG-2 and AVS as much as possible,
such as the flush unit, the buffer controller and the buffers. Moreover, we propose a merged IQ scheme and a merged RAM scheme.
Based on 0.18 μm CMOS technology, the proposed design consumes about 11.5 K gates at a clock constraint of 125 MHz. The simulation
results show that it can achieve real-time decoding, such as HD1080i (1,920 × 1,088 at 30 MHz) format video of AVS and MPEG-2.
Furthermore, we propose an effective design of the buffers between VLD and IDCT according to the IDCT architecture, a cost-efficient
IQ architecture with full flexibility and an efficient scheme for accelerating VLC decoding.
Yun He
19.
Cognitive Radio with Single Carrier TDCS and Multicarrier OFDM Approach with V-BLAST Receiver in Rayleigh Fading Channel (Cited by: 1; self-citations: 0; citations by others: 1)
This article presents the performance comparison of TDCS and OFDM based cognitive radio for a MIMO system using the V-BLAST receiver
architecture to reconstruct the transmitted data. The interference avoidance performance in terms of BER and bitrate is improved
by adding multiple antennas to the system and by using the V-BLAST technique at the receiver. The results show the most promising
interference avoidance technique combined with MIMO V-BLAST architecture to be applied in the CR system.
L. P. Ligthart
20.
Virtual Identity Framework for Telecom Infrastructures (Cited by: 1; self-citations: 1; citations by others: 0)
Amardeo Sarma Alfredo Matos João Girão Rui L. Aguiar 《Wireless Personal Communications》2008,45(4):521-543
Identity management has so far been a field focused mainly on applications and the Web. This paper describes a novel approach to
cross-layer identity management that extends digital identities to the network: the virtual identity (VID) framework. The
VID framework provides strong privacy to the user, while easily supporting personalization across service providers. While
other identity management solutions are tailored to one specific application and/or protocol domain, the proposed framework
extends the use of one’s digital identity to all aspects of the network and services architecture. It is also the first to
consider legal constraints, such as ownership of data and legal intercept issues, in such a broad scope. One major aspect reported
here is the relevance for operators.
Rui L. Aguiar