Similar literature
Found 20 similar documents (search time: 31 ms)
1.
A pipelined Fast Fourier Transform and inverse FFT (FFT/IFFT) processor that uses hardware resources efficiently is proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation, which uses as many FFT/IFFT processors as there are transmit/receive antennas, the proposed architecture shares hardware among multiple data sequences and so reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support 1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 μm CMOS technology and saves 25% area compared to a conventional implementation approach using the radix-2³ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one 64-point input sequence and 87 mW at 75 MHz for four 128-point input sequences, and it can be used efficiently for the IEEE 802.11n WLAN standard.
Paul Ampadu

2.
Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor architecture. It targets embedded applications that demand high performance, low power and high-level language programmability. Compared with a typical very long instruction word-based digital signal processor, ADRES can exploit higher parallelism by using more scalable hardware with the support of novel compilation techniques. We developed a complete tool-chain, including compiler, simulator and HDL generator. This paper describes the design case of a media processor targeting an H.264 decoder and other video tasks based on the ADRES template. The whole processor design, hardware implementation and application mapping were done in a relatively short period. Yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed and power consumption are also very competitive compared with other processors.
S. Dupont

3.
Many different video processor architectures exist. A processor's architecture determines its strengths for a particular application. Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that support multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor architectures achieve their best performance through the use of parallelism at the data (SIMD), instruction (VLIW), and multiprocessor levels, and through optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth rather than processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video processor it is important to consider many factors, particularly performance, cost, power consumption, programmability, and peripheral support.
Jonah Probell

4.
Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare monitoring applications. Power dissipation is the dominant design constraint in this domain. This paper shows the different steps to develop a digital signal processing architecture for a single channel electrocardiogram application, which is used as an application example. The target power consumption is 100 μW as that is the power energy scavengers can deliver. We follow a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse grained clock-gating is applied, next the static as well as the dynamic dissipation of the digital processor is reduced by tuning the core to the target domain. The impact of each step is quantified. A solution of 11 μW is possible for both radio and DSP running the electrocardiogram algorithm.
Jef Van Meerbergen

5.
6.
In an orthogonal frequency division multiplexing-based wireless local area network receiver, three operations can be performed by a single coordinate rotation digital computer (CORDIC) processor because they are needed at different time instants. These are the rotation of a vector, the computation of the angle of a vector and the computation of the reciprocal. This paper proposes a common CORDIC architecture that implements the three operations with only a small increase in hardware cost over a single-operation CORDIC. The proposed architecture has been validated on field-programmable gate array devices, and the implementation results show an area saving of around 28% and a throughput increase of 64%.
J. Valls
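The three operations named in this abstract map onto three standard CORDIC modes. The following is a minimal floating-point sketch of the textbook algorithm, not the shared architecture proposed in the paper: the function names, the iteration count and the use of floats are assumptions made for illustration, and a hardware version would use fixed-point shifts and adds.

```python
import math

ITERATIONS = 16  # assumed iteration count; more iterations give more precision

# Elementary rotation angles atan(2^-i) and the constant gain the rotations accumulate.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, angle):
    """Circular rotation mode: rotate (x, y) by `angle` radians (|angle| < ~1.74)."""
    z = angle
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0  # rotate so the residual angle z goes to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN  # compensate the CORDIC gain


def cordic_angle(x, y):
    """Circular vectoring mode: return atan(y / x) for x > 0 by driving y to zero."""
    z = 0.0
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0 else 1.0  # rotate toward the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return z


def cordic_reciprocal(a):
    """Linear vectoring mode: return 1 / a for a >= 0.5 by computing y / x with y = 1."""
    x, y, z = a, 1.0, 0.0
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0 else 1.0  # subtract or add shifted x until y reaches zero
        y += d * x * 2.0 ** -i
        z -= d * 2.0 ** -i
    return z
```

All three loops share the same shift-and-add structure and differ only in the direction decision and the angle table, which is what makes a common datapath plausible.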

7.
A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with a modified Tomasulo scheme that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads [simultaneously multi-threaded (SMT)]. Consistent application of fine clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the scheduling window, and leads to a distributed architecture model in which independent thread processing units, arithmetic logic units, register files and memories are distributed across the chip and communicate with each other via a special network. A special communication protocol replaces broadcasting and associative comparison of destination tags in a centralised instruction scheduler with explicit operand transfer instructions, thus decentralizing the control of the data flow to the greatest extent. As a result, the processor cycle time depends neither on the issue bandwidth of a single thread nor on the execution bandwidth of the SMT processor. This makes the performance of the architecture scalable with both the number of function units and the number of thread units without any impact on the processor's cycle time. The performance and scalability of the proposed microarchitecture are demonstrated with critical signal processing kernels from the MPEG-4 video coding standard on a cycle-true simulator.
Tim Niggemeier

8.
This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications. The DSP core embodies a distributed, ping-pong register file. By restricting its access patterns, this register file saves 76.8% of the silicon area and improves access time by 46.9% relative to the centralized register files found in most VLIW processors. Nevertheless, the DSP achieves performance (estimated in cycles) comparable to state-of-the-art DSPs for multimedia applications. A hierarchical instruction encoding scheme is also adopted to reduce the program sizes to 24.1%–26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M Copper Logic Process, and it can operate at 333 MHz while consuming 189 mW. The core size is 3.2 × 3.15 mm² including 160 KB of on-chip SRAM.
Chih-Wei Liu

9.
This work presents an efficient architecture design for the deblocking filter in H.264/AVC using a novel fast-deblocking boundary-strength (FDBS) technique. Based on the FDBS technique, the proposed architecture divides the deblocking process into three filtering modes, namely offset-based, standard-based and diagonal-based filtering, to reduce blocking artifacts and improve the video quality in H.264/AVC. The proposed architecture is designed in Verilog HDL, simulated with Quartus II and synthesized using a 0.18 μm CMOS cell library with the Synopsys Design Compiler. Simulation results demonstrate good performance in PSNR improvement and bit-rate reduction. Additionally, verification results through physical chip design reveal that the proposed architecture can support 1,280 × 720@30 Hz processing throughput while clocking at 100 MHz. Comparisons with other studies show the excellent properties of the proposed architecture in terms of gate count, memory size and clock cycles per macroblock.
Chun-Lung Hsu

10.
This paper presents a compact hardware architecture of a Context-Based Adaptive Binary Arithmetic Coding (CABAC) codec for H.264/AVC. The similarities between the encoding and decoding algorithms are exploited to achieve substantial hardware reuse. System-level hardware/software partitioning is conducted to improve overall performance. Meanwhile, the characteristics of the CABAC algorithm are used to implement a dynamic pipeline scheme, which increases the processing throughput with very small hardware overhead. The proposed architecture is implemented in 0.18 μm technology. Results show that the core area of the proposed design is 0.496 mm² at a maximum clock frequency of 230 MHz. It is estimated that the proposed architecture can support CABAC encoding or decoding for HD1080i resolution at 30 frames/s.
Lingfeng Li

11.
A scheme is presented for reducing the hardware resources needed to implement, on LUT-based FPGA devices, the twiddle factors required in Fast Fourier Transform (FFT) processors. The proposed scheme reduces the number of embedded block RAMs for large FFTs and the number of slices for FFT lengths greater than 128 points. Results are given for Xilinx devices, but they can be generalized to other advanced LUT-based devices such as ALTERA Stratix.
T. Sansaloni
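The abstract does not describe the scheme itself. As background, one widely used way to shrink twiddle-factor storage is to exploit the quarter-wave symmetry of sine and cosine, so that only N/4 + 1 cosine samples are stored. The sketch below illustrates that generic technique only; it is not the paper's method, and the function names are hypothetical.

```python
import cmath
import math


def make_quarter_table(n):
    """Store cos(2*pi*k/n) for k = 0..n/4 only (n must be a multiple of 4)."""
    return [math.cos(2.0 * math.pi * k / n) for k in range(n // 4 + 1)]


def twiddle(table, n, k):
    """Rebuild W_n^k = exp(-2j*pi*k/n) from the quarter-wave cosine table."""
    k %= n
    quadrant, r = divmod(k, n // 4)
    c = table[r]            # cos(theta), theta = 2*pi*r/n in [0, pi/2)
    s = table[n // 4 - r]   # sin(theta) = cos(pi/2 - theta)
    # The full angle is quadrant * pi/2 + theta; apply the quadrant identities.
    if quadrant == 0:
        cos_a, sin_a = c, s
    elif quadrant == 1:
        cos_a, sin_a = -s, c
    elif quadrant == 2:
        cos_a, sin_a = -c, -s
    else:
        cos_a, sin_a = s, -c
    return complex(cos_a, -sin_a)
```

For a 64-point FFT this table holds 17 real values instead of 64 complex ones, at the cost of a small amount of address and sign logic.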

12.
This paper presents an Application-Specific Signal Processor (ASSP) for Orthogonal Frequency Division Multiplexing (OFDM) communication systems, called SPOCS. The instruction set and architecture are specially designed for OFDM operations such as the Fast Fourier Transform (FFT), scrambling/descrambling, puncturing, convolutional encoding and interleaving/deinterleaving. SPOCS employs an optimized Data Processing Unit (DPU) to support the proposed instructions and an FFT Address Generation Unit (FAGU) to calculate input/output data addresses automatically. In addition, the proposed Bit Manipulation Unit (BMU) supports efficient bit manipulation operations. SPOCS has been synthesized using the SEC 0.18 μm standard cell library and has a much smaller area than commercial DSP chips. Compared with existing DSP chips, SPOCS reduces the number of clock cycles by 8%–53% for the FFT and by about 48%–84% for scrambling, convolutional encoding and interleaving. SPOCS can support various OFDM communication standards, such as Wireless Local Area Network (WLAN), Digital Audio Broadcasting (DAB) and Digital Video Broadcasting-Terrestrial (DVB-T).
Myung H. Sunwoo

13.
This paper presents an FPGA realisation of an application-specific cellular processor array designed for asynchronous skeletonization of binary images. The skeletonization algorithm is based on iterative thinning using a ‘grassfire’ transformation approach. The purpose of this work was to test the performance of a fully parallel asynchronous processor array and to evaluate the inhomogeneity of wave propagation velocity. A proof-of-concept design has been implemented and evaluated, and the results are presented and discussed.
Piotr Dudek

14.
We implemented H.264/AVC variable block size motion estimation (VBSME) on a very long instruction word (VLIW)–single instruction multiple data (SIMD) digital signal processor (DSP). The SAD_Reuse method, which has a regular structure, is chosen for VBSME not only to remove redundant sum of absolute difference (SAD) operations but also to exploit the instruction-level parallelism (ILP) and data-level parallelism (DLP) of the architecture. A fast mode decision algorithm is developed to reduce the number of ‘compare and update’ operations and to simplify the rate distortion optimization (RDO). The fast mode decision uses the difference of motion vectors and the maximum a posteriori (MAP) estimation of the rate-distortion costs. Several advanced software techniques, including software pipelining and packed-data processing, are employed. In particular, memory access overhead reduction schemes, including multi-block processing and inter-procedural scheduling, are used for software optimization. To reduce ‘write buffer full’ stalls in the quarter-pixel ME, a 4-bit quantization scheme is developed, which increases the number of arithmetic operations but substantially decreases the stall cycles. The implemented variable block size ME for H.264/AVC requires an average of 9 Mcycles and 78 Mcycles per frame for QCIF and CIF video sequences, respectively, on the TMS320C64x DSP architecture.
Wonyong Sung
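The idea behind SAD reuse can be sketched in a few lines: the SADs of the sixteen 4 × 4 sub-blocks of a macroblock are computed once, and the SADs of the larger H.264 partitions are obtained by adding them up instead of revisiting the pixels. This is a plain-Python illustration under assumed names (`sad_4x4_grid`, `merge_sads`), not the optimized VLIW–SIMD implementation described in the abstract.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock.

    `cur` and `ref` are 16x16 lists of pixel values (current and reference block).
    """
    grid = [[0] * 4 for _ in range(4)]
    for r in range(16):
        for c in range(16):
            grid[r // 4][c // 4] += abs(cur[r][c] - ref[r][c])
    return grid


def merge_sads(grid, block_h, block_w):
    """Sum 4x4 SADs into SADs of block_h x block_w partitions.

    Sizes are in pixels and must be 4, 8 or 16, so the result is a grid with
    (16 // block_h) rows and (16 // block_w) columns.
    """
    gh, gw = block_h // 4, block_w // 4
    return [
        [
            sum(grid[r + i][c + j] for i in range(gh) for j in range(gw))
            for c in range(0, 4, gw)
        ]
        for r in range(0, 4, gh)
    ]
```

For one candidate motion vector, `merge_sads(grid, 8, 8)` gives the four 8 × 8 SADs and `merge_sads(grid, 16, 16)` the single macroblock SAD, with no further absolute-difference operations.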

15.
This paper proposes a novel cost-effective and programmable architecture of a CAVLC decoder for H.264/AVC, including decoders for Coeff_token, T1_sign, Level, Total_zeros and Run_before. To simplify the hardware architecture and provide programmability, we propose four new techniques: a new group-based VLD with efficient memory (NG–VLDEM) for the Coeff_token decoder, a novel combined architecture (NCA) for the Level decoder, a new group-based VLD with memory access once (GMAO) for the Total_zeros decoder and a new VLD architecture based on multiplexers instead of searching memory (MISM) for the Run_before decoder. With these four techniques, the proposed CAVLC decoder can decode every syntax element within one clock cycle. Synthesis results show that the hardware cost is 3,310 gates in 0.18 μm CMOS technology at a clock constraint of 125 MHz. The proposed design is therefore suitable for real-time applications such as H.264/AVC HD1080i video decoding.
Shunliang Mei

16.
17.
This paper shows that when a digital receiver is designed using two clock scopes, the digital down-converter can be made efficient in terms of area and power consumption. The main design parameter that contributes to this efficiency is the relationship between the transition band of the designed filter and its sampling frequency.
J. Valls
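Why the ratio of transition band to sampling frequency matters can be illustrated with a common rule of thumb for FIR filter length (fred harris's approximation, N ≈ (fs / Δf) · (A / 22), with A the stopband attenuation in dB). This estimate is general background rather than anything taken from the paper, and the function name is hypothetical.

```python
import math


def estimate_fir_taps(fs_hz, transition_hz, stopband_atten_db):
    """Rule-of-thumb FIR length: N ~ (fs / transition width) * (attenuation_dB / 22)."""
    return math.ceil(fs_hz / transition_hz * stopband_atten_db / 22.0)
```

Under this estimate, a filter with a 1 MHz transition band and 60 dB attenuation needs about 273 taps at a 100 MHz sampling rate but about 28 taps if it runs at 10 MHz, which suggests why placing the sharp filter in a slower clock region can save area and power.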

18.
In this paper, we propose a cost-effective architecture of a variable length decoder (VLD) for MPEG-2 and AVS. To save buffer memory between the VLD and the IDCT and to accelerate decoding, block-based pipeline buffers are adopted. Inverse scan (IScan) and inverse quantisation (IQ) are also merged into this architecture for cost-effective implementation and easier system integration. A novel group-based architecture with an optimized look-up table is used for MPEG-2, and a new memory-efficient architecture with mixed memory organization is used for AVS. Modules such as the flush unit, the buffer controller and the buffers are shared between MPEG-2 and AVS as much as possible. Moreover, we propose a merged IQ scheme and a merged RAM scheme. Based on 0.18 μm CMOS technology, the proposed design consumes about 11.5 K gates at a clock constraint of 125 MHz. The simulation results show that it can achieve real-time decoding of formats such as HD1080i (1,920 × 1,088 at 30 MHz) video for AVS and MPEG-2. Furthermore, we propose an effective design of the buffers between the VLD and the IDCT according to the IDCT architecture, a cost-efficient IQ architecture with full flexibility and an efficient scheme for accelerating VLC decoding.
Yun He

19.
This article presents a performance comparison of TDCS- and OFDM-based cognitive radio for a MIMO system that uses the V-BLAST receiver architecture to reconstruct the transmitted data. The interference avoidance performance in terms of BER and bit rate is improved by adding multiple antennas to the system and by using the V-BLAST technique at the receiver. The results show which interference avoidance technique, combined with the MIMO V-BLAST architecture, is the most promising for application in the CR system.
L. P. Ligthart

20.
Virtual Identity Framework for Telecom Infrastructures   Cited by: 1 (self-citations: 1, citations by others: 0)
Identity management has so far been a field focused mainly on applications and the Web. This paper describes a novel approach to cross-layer identity management that extends digital identities to the network: the virtual identity (VID) framework. The VID framework provides strong privacy to the user while easily supporting personalization across service providers. While other identity management solutions are tailored to one specific application and/or protocol domain, the proposed framework extends the use of one’s digital identity to all aspects of the network and services architecture. It is also the first to consider legal constraints, such as ownership of data and legal intercept issues, in such a broad scope. One major aspect reported here is the relevance for operators.
Rui L. Aguiar
