首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Borden codes are optimal nonsystematic t-unidirectional error detecting (t-UED) codes. A possible method to design a Borden code checker is to map the Borden code words to words of an AN arithmetic code and to check the obtained words with an appropriate AN code checker. For t = q − 1 with q = 2 m  − 1 we show how this method can be modified such that the Borden code checkers achieve the self-testing property under very weak conditions. It is only required that no checker input line gets a constant signal and that the Borden code words occur in a random order, making the proposed checkers very suitable for use as embedded checkers. Based on these checkers it is then possible to design embedded Borden t-UED code checkers for t = 2 k q − 1 with q = 2 m  − 1.
Steffen TarnickEmail:
  相似文献   

2.
FPGA Implementation of Integer Transform and Quantizer for H.264 Encoder   总被引:1,自引:0,他引:1  
This paper deals with the process of Transformation and Quantization that is carried out on each inter-predicted residual block in a video encoding process and their reduced complexity hardware implementation. H.264/AVC utilizes 4 × 4 integer transform, which is derived from the 4 × 4 DCT. We propose, a reduced complexity algorithm and a pipelined structure for the Core forward integer transform module. A multiplier-less architecture is realized with less number of shifts and adds compared to existing works. The corresponding inverse transform is exactly reversible. Each of the transformed coefficients is quantized by a scalar quantizer. The quantization step size can be varied from macroblock to macroblock. The proposed unified pipelined architecture outperforms many recent implementations in terms of gate count and is capable of processing a 4 × 4 residual block in 4 clock cycles.
Reeba KorahEmail:
  相似文献   

3.
This paper presents a compact hardware architecture of Context-Based Adaptive Binary Arithmetic Coding (CABAC) codec for H.264/AVC. The similarities between encoding algorithm and decoding algorithm are explored to achieve remarkable hardware reuse. System-level hardware/software partition is conducted to improve overall performance. Meanwhile, the characteristics of CABAC algorithm are utilized to implement dynamic pipeline scheme, which increases the processing throughput with very small hardware overhead. Proposed architecture is implemented under 0.18 μm technology. Results show that the core area of proposed design is 0.496 mm2 when the maximum clock frequency is 230 MHz. It is estimated that the proposed architecture can support CABAC encoding or decoding for HD1080i resolution at a speed of 30 frame/s.
Lingfeng LiEmail:
  相似文献   

4.
A pipelined Fast Fourier Transform and its inverse (FFT/IFFT) processor, which utilizes hardware resources efficiently, is proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation, (in which as many FFT/IFFT processors as the number of transmit/receive antennas is used), the proposed architecture (using hardware sharing among multiple data sequences) reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support 1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 um CMOS technology and saves 25% area compared to a conventional implementation approach using radix-23 algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one input sequence with 64-point length; it consumes 87 mW at 75 MHz for four input sequences with length 128-point and can be efficiently used for IEEE 802.11n WLAN standard.
Paul AmpaduEmail:
  相似文献   

5.
Elliptic curve cryptography (ECC) is recognized as a fast cryptography system and has many applications in security systems. In this paper, a novel sharing scheme is proposed to significantly reduce the number of field multiplications and the usage of lookup tables, providing high speed operations for both hardware and software realizations.
Brian KingEmail:
  相似文献   

6.
This work presents an efficient architecture design for deblocking filter in H.264/AVC using a novel fast-deblocking boundary-strength (FDBS) technique. Based on the FDBS technique, the proposed architecture divides the deblocking process into three filtering modes, namely offset-based, standard-based and diagonal-based filtering modes, to reduce the blocking artifact and improve the video quality in H.264/AVC. The proposed architecture is designed in Verilog HDL, simulated with Quartus II and synthesized using 0.18 μm CMOS cells library with the Synopsys Design Compiler. Simulation results demonstrate good performance in PSNR improvement and bit-rate reduction. Additionally, verification results through physical chip design reveal that the proposed architecture design can support 1,280 × 720@30 Hz processing throughput while clocking at 100 MHz. Comparisons with other studies show the excellent properties of the proposed architecture in terms of gate count, memory size and clock-cycle/macroblock.
Chun-Lung HsuEmail:
  相似文献   

7.
This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications. The DSP core embodies a distributed & ping-pong register file, which saves 76.8% silicon area and improves 46.9% access time of centralized ones found in most VLIW processors by restricting its access patterns. However, it still has comparable performance (estimated in cycles) with state-of-the-art DSP for multimedia applications. A hierarchical instruction encoding scheme is also adopted to reduce the program sizes to 24.1∼26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M Copper Logic Process, and it can operate at 333 MHz while consuming 189 mW power. The core size is 3.2 × 3.15 mm2 including 160 KB on-chip SRAM.
Chih-Wei LiuEmail:
  相似文献   

8.
In this paper, we present high performance motion compensation architecture for H.264/AVC HDTV decoder. The bottleneck of efficient motion compensation implementation primarily rests on the high memory bandwidth demand and six-tap fractional interpolation complexity. To solve the bottleneck for H.264/AVC HD applications, three combined bandwidth optimization strategies are proposed to minimize the memory bandwidth for MB-based decoding process. To improve the interpolation hardware utilization and reduce the interpolation cycles, an interpolation classification scheme is proposed. By classifying the fifteen fractional pixels into five types and processing correspondingly, the interpolation cycles decrease significantly. A direct mapping memory cache characterized with circular addressing, byte-aligned addressing and horizontal and vertical parallel access is designed to support the proposed scheme. The hardware of proposed motion compensation is implemented at 100 M with 31.841 K logic gates, averagely 70–80% reduced memory bandwidth can be offered and the interpolation hardware can be fully utilized and interpolate one MB within 304 cycles, which can satisfy the real time constraint for H.264/AVC HD (1,920 × 1,088) 30 fps decoder. The design is implemented under UMC 0.18 μm technology, and the synthesis results and comparisons are shown.
Yu LiEmail:
  相似文献   

9.
This paper presents a family of uniform random number generators designed for efficient implementation in Lookup table (LUT) based FPGA architectures. A generator with a period of 2 k  − 1 can be implemented using k flip-flops and k LUTs, and provides k random output bits each cycle. Each generator is based on a binary linear recurrence, with a state-transition matrix designed to make best use of all available LUT inputs in a given FPGA architecture, and to ensure that the critical path between all registers is a single LUT. This class of generator provides a higher sample rate per area than LFSR and Combined Tausworthe generators, and operates at similar or higher clock-rates. The statistical quality of the generators increases with k, and can be used to pass all common empirical tests such as Diehard, Crush and the NIST cryptographic test suite. Theoretical properties such as global equidistribution can also be calculated, and best and average case statistics shown. Due to the large number of random bits generated per cycle these generators can be used as a basis for generators with even higher statistical quality, and an example involving combination through addition is demonstrated.
Wayne LukEmail:
  相似文献   

10.
Fast Fourier transform (FFT) plays an important role in the orthogonal frequency division multiplexing (OFDM) communication systems. In this paper, we propose an area-efficient design of variable-length FFT processor which can perform various FFT lengths of 512/1,024/2,048/4,096/8,192 points used in OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consist of radix-2, radix-22 and radix-2/4/8 algorithms and optimizing the realization by substructure sharing. Based on this architecture, an area-efficient design of variable-length FFT processor is presented. By synthesized using the UMC 0.18 μm process, the area of the processor is 2.9 mm2 and the 8,192-point FFT can be performed correctly up to 50 MHz with power consumption 823 mW under a 1.8 V supply voltage.
Shuenn-Shyang WangEmail:
  相似文献   

11.
We present a high-speed public-key cryptoprocessor that exploits three-level parallelism in Elliptic Curve Cryptography (ECC) over GF(2 n ). The proposed cryptoprocessor employs a Parallelized Modular Arithmetic Logic Unit (P-MALU) that exploits two types of different parallelism for accelerating modular operations. The sequence of scalar multiplications is also accelerated by exploiting Instruction-Level Parallelism (ILP) and processing multiple P-MALU instructions in parallel. The system is programmable and hence independent of the type of the elliptic curves and scalar multiplication algorithms. The synthesis results show that scalar multiplication of ECC over GF(2163) on a generic curve can be computed in 20 and 16 μs respectively for the binary NAF (Non-Adjacent Form) and the Montgomery method. The performance can be accelerated furthermore on a Koblitz curve and reach scalar multiplication of 12 μs with the TNAF (τ-adic NAF) method. This fast performance allows us to perform over 80,000 scalar multiplications per second and to enhance security in wireless mobile applications.
Ingrid VerbauwhedeEmail:
  相似文献   

12.
In an orthogonal frequency division multiplexing-based wireless local area network receiver there are three operations that can be performed by a unique coordinate rotation digital computer (CORDIC) processor since they are needed in different time instants. These are the rotation of a vector, the computation of the angle of a vector and the computation of the reciprocal. This paper proposes a common architecture of CORDIC algorithm suitable to implement the three operations with a reduced increase of the hardware cost with respect to a single operation CORDIC. The proposed architecture has been validated on field programmable gate-arrays devices and the results of the implementation show that area saving around 28% and throughput increment of 64% are obtained.
J. VallsEmail:
  相似文献   

13.
This paper proposes a novel cost-effective and programmable architecture of CAVLC decoder for H.264/AVC, including decoders for Coeff_token, T1_sign, Level, Total_zeros and Run_before. To simplify the hardware architecture and provide programmability, we propose four new techniques: a new group-based VLD with efficient memory (NG–VLDEM) for Coeff_token decoder, a novel combined architecture (NCA) for level decoder, a new group-based VLD with memory access once (GMAO) for Total_zeros decoder and a new VLD architecture based on multiplexers instead of searching memory (MISM) for Run_before decoder. With the above four techniques, the proposed CAVLC decoder can decode every syntax element within one clock cycle. Synthesis result shows that the hardware cost is 3,310 gates with 0.18 μm CMOS technology at a clock constrain of 125 MHz. Therefore, the proposed design is satisfied for real-time applications, such as H.264/AVC HD1080i video decoding.
Shunliang MeiEmail:
  相似文献   

14.
In this paper an improved Montgomery multiplier, based on modified four-to-two carry-save adders (CSAs) to reduce critical path delay, is presented. Instead of implementing four-to-two CSA using two levels of carry-save logic, authors propose a modified four-to-two CSA using only one level of carry-save logic taking advantage of pre-computed input values. Also, a new bit-sliced, unified and scalable Montgomery multiplier architecture, applicable for both RSA and ECC (Elliptic Curve Cryptography), is proposed. In the existing word-based scalable multiplier architectures, some processing elements (PEs) do not perform useful computation during the last pipeline cycle when the precision is not equal to an exact multiple of the word size, like in ECC. This intrinsic limitation requires a few extra clock cycles to operate on operand lengths which are not powers of 2. The proposed architecture eliminates the need for extra clock cycles by reconfiguring the design at bit-level and hence can operate on any operand length, limited only by memory and control constraints. It requires 2∼15% fewer clock cycles than the existing architectures for key lengths of interest in RSA and 11∼18% for binary fields and 10∼14% for prime fields in case of ECC. An FPGA implementation of the proposed architecture shows that it can perform 1,024-bit modular exponentiation in about 15 ms which is better than that by the existing multiplier architectures.
M. B. SrinivasEmail:
  相似文献   

15.
A Highly Parallel Joint VLSI Architecture for Transforms in H.264/AVC   总被引:1,自引:0,他引:1  
In H.264/AVC, the concept of adapting the transform size to the block size of motion-compensated prediction residue has proven to be an important coding tool. This paper presents highly parallel joint circuit architecture for 8 × 8 and 4 × 4 adaptive block-size transforms in H.264/AVC. By decomposing the 8 × 8 transform to basic 4 × 4 transforms, a unified architecture is designed for both 8 × 8 and 4 × 4 transform and the transform data-path can be efficiently reused for six kinds of transforms. i.e., 8 × 8 forward, 8 × 8 inverse, 4 × 4 forward, 4 × 4 inverse, forward-Hadamard, inverse-Hadamard transforms. Linear shift mapping is applied on the memory buffer to support parallel access both in row and column directions which eliminates the need for a transpose circuit. For reusable and configurable transform data-path, a multiple-stage pipeline is designed to reduce the critical path length and increase throughput. The design is implemented under UMC 0.18 um technology at 200 MHz with 13.651 K logic gates, which can support 1,920 × 1,088 30 fps H.264/AVC HDTV decoder.
Yu LiEmail:
  相似文献   

16.
In this paper we combine two points made in two previous papers on negative correlation learning (NC) by different authors, which have theoretical implications for the optimal setting of λ, a parameter of the method whose correct choice is critical for stability and good performance. An expression for the optimal λ is derived whose value λ* depends only on the number of classifiers in the ensemble. This result arises from the form of the ambiguity decomposition of the ensemble error, and the close links between this and the error function used in NC. By analyzing the dynamics of the outputs we find dramatically different behavior for λ < λ*, λ = λ* and λ > λ*, providing further motivation for our choice of λ and theoretical explanations for some empirical observations in other papers on NC. These results will be illustrated using well known synthetic and medical datasets.
Bogdan GabrysEmail:
  相似文献   

17.
In this paper a new Back-Propagation (BP) algorithm cost function is appropriately studied for the modeling of air pollution time series. The underlying idea is that of modifying the error definition in order to improve the capabilities of this kind of models to forecast episodes of poor air quality. The proposed error definition can be regarded as a generalization of the traditional squared error cost function thanks to the presence of a parameter α which allows to obtain the ordinary BP as a special case when α = 1. A criterion for choosing this parameter is stated based on setting a-priori a maximum level of allowable false alarms. The goodness of the proposed approach is assessed by means of case studies both on synthetic and measured air quality data.
Flavio Cannavó (Corresponding author)Email:
  相似文献   

18.
A frequency domain analysis is presented to optimize the Predictive Least Mean Square (PLMS) algorithm used for wireless channel tracking. Simulation results show that the PLMS offers significant improvement in tracking performance compared to that of the conventional LMS based method. The algorithm parameters should be carefully selected in order to gain such improvements. The objective of this paper is to use frequency domain analysis to determine an expression for the Mean Square Tracking Error (MSTE) and use it to obtain the optimum PLMS algorithm parameters such as step size (μ) and smoothing constant (θ) with numerical optimization methods.
Qassim NasirEmail:
  相似文献   

19.
In this paper, we propose a cost-effective architecture of variable length decoder (VLD) for MPEG-2 and AVS. In order to save the buffer memory between VLD and IDCT and accelerate decoding speed, block-based pipeline buffers are adopted. Inverse scan (IScan) and inverse quantisation (IQ) are also merged into this architecture for cost-effective implementation and for easier system integration. A novel group-based architecture with the optimized look-up table is used for MPEG-2 and a new memory-efficient architecture with mixed memory organization is used for AVS. We use shared modules in both MPEG-2 and AVS as much as possible, such as the flush unit, the buffer controller and the buffers. Moreover, we propose merged IQ scheme and merged RAMs scheme. Based on 0.18 μm CMOS technology, the proposed design consumes about 11.5 K gates at a clock constrain of 125 MHz. The simulation results show that it can achieve real-time decoding, such as HD1080i (1,920 × 1,088 at 30 MHz) format video of AVS and MPEG-2. Furthermore, we propose an effective design of the buffers between VLD and IDCT according to the IDCT architecture, a cost-efficient IQ architecture with full flexibility and an efficient scheme for accelerating VLC decoding.
Yun HeEmail:
  相似文献   

20.
Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare monitoring applications. Power dissipation is the dominant design constraint in this domain. This paper shows the different steps to develop a digital signal processing architecture for a single channel electrocardiogram application, which is used as an application example. The target power consumption is 100 μW as that is the power energy scavengers can deliver. We follow a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse grained clock-gating is applied, next the static as well as the dynamic dissipation of the digital processor is reduced by tuning the core to the target domain. The impact of each step is quantified. A solution of 11 μW is possible for both radio and DSP running the electrocardiogram algorithm.
Jef Van MeerbergenEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号