Found 20 similar documents (search time: 31 ms)
1.
Steffen Tarnick 《Journal of Electronic Testing》2008,24(6):509-527
Borden codes are optimal nonsystematic t-unidirectional error detecting (t-UED) codes. A possible method to design a Borden code checker is to map the Borden code words to words of an AN arithmetic code and to check the obtained words with an appropriate AN code checker. For $t = q - 1$ with $q = 2^m - 1$ we show how this method can be modified such that the Borden code checkers achieve the self-testing property under very weak conditions. It is only required that no checker input line gets a constant signal and that the Borden code words occur in a random order, making the proposed checkers very suitable for use as embedded checkers. Based on these checkers it is then possible to design embedded Borden t-UED code checkers for $t = 2^k q - 1$ with $q = 2^m - 1$.
Corresponding author: Steffen Tarnick
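The t-UED property of the Borden code in entry 1 can be illustrated with a small membership test. This is a minimal sketch, assuming the usual definition of a Borden code (all length-n words whose Hamming weight is congruent to ⌊n/2⌋ modulo t + 1). It is not the AN-code-based checker of the paper, and the function names are hypothetical.

```python
from itertools import combinations, product

def is_borden_codeword(word, t):
    """word: sequence of 0/1 bits; t: number of unidirectional errors to detect."""
    n = len(word)
    return sum(word) % (t + 1) == (n // 2) % (t + 1)

def unidirectional_errors(word, t):
    """Yield every word reachable by 1..t bit flips that all go in the same direction."""
    for old in (0, 1):
        positions = [i for i, b in enumerate(word) if b == old]
        for count in range(1, t + 1):
            for chosen in combinations(positions, count):
                corrupted = list(word)
                for i in chosen:
                    corrupted[i] = 1 - old
                yield tuple(corrupted)

def detects_all(n, t):
    """Exhaustively confirm that no unidirectional error of weight <= t hits a codeword."""
    return all(
        not is_borden_codeword(bad, t)
        for word in product((0, 1), repeat=n)
        if is_borden_codeword(word, t)
        for bad in unidirectional_errors(word, t)
    )
```

A unidirectional error of 1 to t bits shifts the weight by 1 to t, so the weight can no longer match the required residue modulo t + 1. The case n = 6, t = 2 corresponds to q = 3, i.e. m = 2 in the abstract's notation.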
2.
This paper deals with the transformation and quantization applied to each inter-predicted residual block during video encoding, and with their reduced-complexity hardware implementation. H.264/AVC uses a 4 × 4 integer transform derived from the 4 × 4 DCT. We propose a reduced-complexity algorithm and a pipelined structure for the core forward integer transform module. A multiplier-less architecture is realized with fewer shifts and adds than existing works. The corresponding inverse transform is exactly reversible. Each transformed coefficient is quantized by a scalar quantizer, and the quantization step size can be varied from macroblock to macroblock. The proposed unified pipelined architecture outperforms many recent implementations in terms of gate count and can process a 4 × 4 residual block in 4 clock cycles.
Corresponding author: Reeba Korah
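The multiplier-less idea in entry 2 can be sketched with the standard H.264/AVC 4 × 4 core forward transform, computed with a butterfly that needs only additions, subtractions and one-bit shifts. This shows the well-known textbook butterfly, not the reduced-complexity pipeline proposed in the paper. The function names are hypothetical.

```python
# Standard H.264/AVC 4x4 core forward transform matrix (Y = CF * X * CF^T).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def transform_1d(x):
    """Apply CF to a 4-vector using only adds, subtracts and 1-bit shifts."""
    s0 = x[0] + x[3]
    s1 = x[1] + x[2]
    s2 = x[1] - x[2]
    s3 = x[0] - x[3]
    return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]

def forward_core_transform(block):
    """2-D transform of a 4x4 residual block: rows first, then columns."""
    rows = [transform_1d(r) for r in block]
    cols = [transform_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

The scaling that completes the DCT approximation is normally folded into the quantizer, which is why the core transform itself stays in integer arithmetic.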
3.
Lingfeng Li Yang Song Shen Li Takeshi Ikenaga Satoshi Goto 《Journal of Signal Processing Systems》2008,50(1):81-95
This paper presents a compact hardware architecture of Context-Based Adaptive Binary Arithmetic Coding (CABAC) codec for H.264/AVC.
The similarities between encoding algorithm and decoding algorithm are explored to achieve remarkable hardware reuse. System-level
hardware/software partition is conducted to improve overall performance. Meanwhile, the characteristics of CABAC algorithm
are utilized to implement dynamic pipeline scheme, which increases the processing throughput with very small hardware overhead.
The proposed architecture is implemented in 0.18 μm technology. Results show that the core area of the proposed design is 0.496 mm² at a maximum clock frequency of 230 MHz. It is estimated that the proposed architecture can support CABAC encoding or decoding for HD1080i resolution at 30 frames/s.
Corresponding author: Lingfeng Li
4.
A pipelined Fast Fourier Transform and inverse FFT (FFT/IFFT) processor that uses hardware resources efficiently is proposed for MIMO-OFDM WLAN 802.11n. A conventional MIMO-OFDM implementation uses as many FFT/IFFT processors as there are transmit/receive antennas; the proposed architecture instead shares hardware among multiple data sequences, reducing hardware complexity without sacrificing system throughput. Further, the proposed architecture can support 1–4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized using TSMC 0.18 μm CMOS technology and saves 25% area compared to a conventional implementation approach using the radix-$2^3$ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one 64-point input sequence and 87 mW at 75 MHz for four 128-point input sequences, and can be used efficiently for the IEEE 802.11n WLAN standard.
Corresponding author: Paul Ampadu
5.
Elliptic curve cryptography (ECC) is recognized as a fast cryptography system and has many applications in security systems.
In this paper, a novel sharing scheme is proposed to significantly reduce the number of field multiplications and the usage
of lookup tables, providing high speed operations for both hardware and software realizations.
Corresponding author: Brian King
6.
This work presents an efficient architecture design for deblocking filter in H.264/AVC using a novel fast-deblocking boundary-strength
(FDBS) technique. Based on the FDBS technique, the proposed architecture divides the deblocking process into three filtering
modes, namely offset-based, standard-based and diagonal-based filtering modes, to reduce the blocking artifact and improve
the video quality in H.264/AVC. The proposed architecture is designed in Verilog HDL, simulated with Quartus II and synthesized
using 0.18 μm CMOS cells library with the Synopsys Design Compiler. Simulation results demonstrate good performance in PSNR
improvement and bit-rate reduction. Additionally, verification results through physical chip design reveal that the proposed
architecture design can support 1,280 × 720@30 Hz processing throughput while clocking at 100 MHz. Comparisons with other
studies show the excellent properties of the proposed architecture in terms of gate count, memory size and clock-cycle/macroblock.
Corresponding author: Chun-Lung Hsu
7.
Tay-Jyi Lin Shin-Kai Chen Yu-Ting Kuo Chih-Wei Liu Pi-Chen Hsiao 《Journal of Signal Processing Systems》2008,51(3):209-223
This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications.
The DSP core embodies a distributed & ping-pong register file which, by restricting its access patterns, saves 76.8% of the silicon area and improves access time by 46.9% relative to the centralized register files found in most VLIW processors. Nevertheless, its performance (estimated in cycles) is comparable to that of state-of-the-art DSPs for multimedia applications. A hierarchical instruction encoding scheme is also adopted to reduce the program sizes to 24.1–26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M Copper Logic Process and operates at 333 MHz while consuming 189 mW. The core size is 3.2 × 3.15 mm², including 160 KB of on-chip SRAM.
Corresponding author: Chih-Wei Liu
8.
In this paper, we present a high-performance motion compensation architecture for an H.264/AVC HDTV decoder. The bottleneck of an efficient motion compensation implementation rests primarily on the high memory bandwidth demand and the complexity of six-tap fractional interpolation. To resolve this bottleneck for H.264/AVC HD applications, three combined bandwidth optimization strategies are proposed to minimize the memory bandwidth of the MB-based decoding process. To improve interpolation hardware utilization and reduce interpolation cycles, an interpolation classification scheme is proposed. By classifying the fifteen fractional pixels into five types and processing each type accordingly, the number of interpolation cycles decreases significantly. A direct mapping memory cache featuring circular addressing, byte-aligned addressing, and parallel horizontal and vertical access is designed to support the proposed scheme. The proposed motion compensation hardware is implemented at 100 MHz with 31.841 K logic gates. Memory bandwidth is reduced by 70–80% on average, and the interpolation hardware is fully utilized and interpolates one MB within 304 cycles, which satisfies the real-time constraint of an H.264/AVC HD (1,920 × 1,088) 30 fps decoder. The design is implemented in UMC 0.18 μm technology, and the synthesis results and comparisons are shown.
Corresponding author: Yu Li
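The six-tap interpolation named in entry 8 can be sketched for the simplest case, a luma half-sample position lying between two integer samples. The sketch uses the standard H.264/AVC filter taps (1, −5, 20, 20, −5, 1) with rounding and clipping to 8 bits. It does not model the paper's five-type classification or its cache, and the function name is hypothetical.

```python
def half_pel(samples):
    """Half-sample value between the 3rd and 4th of six consecutive integer samples.

    Assumes 8-bit luma samples; applies taps (1, -5, 20, 20, -5, 1), adds the
    rounding offset 16, divides by 32 with a shift, and clips to [0, 255].
    """
    e, f, g, h, i, j = samples
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return min(255, max(0, (acc + 16) >> 5))
```

Quarter-sample positions are then obtained by averaging neighbouring integer and half-sample values, which is one reason the fifteen fractional positions differ so much in cost.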
9.
This paper presents a family of uniform random number generators designed for efficient implementation in lookup table (LUT) based FPGA architectures. A generator with a period of $2^k - 1$ can be implemented using k flip-flops and k LUTs, and provides k random output bits each cycle. Each generator is based on a binary linear recurrence, with a state-transition matrix designed to make best use of all available LUT inputs in a given FPGA architecture, and to ensure that the critical path between all registers is a single LUT. This class of generator provides a higher sample rate per area than LFSR and Combined Tausworthe generators, and operates at similar or higher clock rates. The statistical quality of the generators increases with k, and the generators can pass all common empirical tests such as Diehard, Crush and the NIST cryptographic test suite. Theoretical properties such as global equidistribution can also be calculated, and best- and average-case statistics are shown. Because of the large number of random bits generated per cycle, these generators can be used as a basis for generators with even higher statistical quality, and an example involving combination through addition is demonstrated.
Corresponding author: Wayne Luk
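A binary linear recurrence of the kind described in entry 9 updates a k-bit state by multiplying it with a k × k matrix over GF(2). The sketch below uses a plain 4-bit LFSR (characteristic polynomial x^4 + x + 1) as the matrix, purely as a stand-in. The LUT-optimized matrices of the paper are not reproduced here, and all names are hypothetical.

```python
def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def step(rows, state):
    """One recurrence step: bit i of the next state is parity(rows[i] & state)."""
    nxt = 0
    for i, row in enumerate(rows):
        nxt |= parity(row & state) << i
    return nxt

def period(rows, seed):
    """Number of steps until the state returns to seed.

    Assumes the matrix is invertible over GF(2); otherwise the loop may not end.
    """
    state, count = step(rows, seed), 1
    while state != seed:
        state = step(rows, state)
        count += 1
    return count

# Illustrative matrix only: a 4-bit LFSR written as rows of bitmasks.
# new b0 = b1, new b1 = b2, new b2 = b3, new b3 = b0 XOR b1.
LFSR4 = [0b0010, 0b0100, 0b1000, 0b0011]
```

With k = 4 this reaches the maximal period $2^4 - 1 = 15$. In hardware each matrix row maps to one XOR gate or LUT feeding one flip-flop, so denser rows use more LUT inputs without lengthening the critical path.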
10.
Fast Fourier transform (FFT) plays an important role in the orthogonal frequency division multiplexing (OFDM) communication
systems. In this paper, we propose an area-efficient design of variable-length FFT processor which can perform various FFT
lengths of 512/1,024/2,048/4,096/8,192 points used in OFDM-based communication systems, such as digital audio broadcasting
(DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). To reduce computational
complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists of radix-2, radix-$2^2$ and radix-2/4/8 algorithms and by optimizing the realization through substructure sharing. Based on this architecture, an area-efficient design of a variable-length FFT processor is presented. Synthesized using the UMC 0.18 μm process, the processor occupies 2.9 mm², and the 8,192-point FFT runs correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.
Corresponding author: Shuenn-Shyang Wang
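As background for the FFT processors in entries 4 and 10, the basic radix-2 decimation-in-time decomposition can be sketched in a few lines. This is the plain textbook recursion, assuming a power-of-two length. It is not the mixed-radix or radix-$2^3$ schedule used in either paper, and the function name is hypothetical.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Higher radices such as radix-$2^2$ regroup these butterflies so that more twiddle factors become trivial (±1, ±j), which is what reduces multiplier count in hardware.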
11.
Kazuo Sakiyama Lejla Batina Bart Preneel Ingrid Verbauwhede 《Mobile Networks and Applications》2007,12(4):245-258
We present a high-speed public-key cryptoprocessor that exploits three-level parallelism in Elliptic Curve Cryptography (ECC) over GF($2^n$). The proposed cryptoprocessor employs a Parallelized Modular Arithmetic Logic Unit (P-MALU) that exploits two different types of parallelism to accelerate modular operations. The sequence of scalar multiplications is also accelerated by exploiting Instruction-Level Parallelism (ILP) and processing multiple P-MALU instructions in parallel. The system is programmable and hence independent of the type of elliptic curve and of the scalar multiplication algorithm. The synthesis results show that scalar multiplication of ECC over GF($2^{163}$) on a generic curve can be computed in 20 μs with the binary NAF (Non-Adjacent Form) method and in 16 μs with the Montgomery method. Performance can be further accelerated on a Koblitz curve, reaching 12 μs per scalar multiplication with the TNAF (τ-adic NAF) method. This performance allows over 80,000 scalar multiplications per second and enhances security in wireless mobile applications.
Corresponding author: Ingrid Verbauwhede
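The binary NAF recoding mentioned in entry 11 can be sketched as follows. The non-adjacent form rewrites a scalar with digits in {−1, 0, 1} so that no two adjacent digits are nonzero, which lowers the number of point additions in a double-and-add loop. This shows only the integer recoding step, not the cryptoprocessor's implementation. The function name is hypothetical.

```python
def naf(k):
    """Non-adjacent form of a non-negative integer, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d            # k is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits
```

For example, 7 is recoded as 8 − 1, so a scalar multiplication by 7 needs one subtraction instead of two additions.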
12.
F. Angarita M. J. Canet T. Sansaloni A. Perez-Pascual J. Valls 《Journal of Signal Processing Systems》2008,52(2):181-191
In an orthogonal frequency division multiplexing-based wireless local area network receiver, three operations can be performed by a single coordinate rotation digital computer (CORDIC) processor because they are needed at different time instants: the rotation of a vector, the computation of the angle of a vector, and the computation of the reciprocal. This paper proposes a common CORDIC architecture that implements all three operations with only a small increase in hardware cost over a single-operation CORDIC. The proposed architecture has been validated on field-programmable gate array devices, and the implementation results show an area saving of around 28% and a throughput increase of 64%.
Corresponding author: J. Valls
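Two of the three operations in entry 12, vector rotation and angle computation, share the same circular CORDIC iteration and differ only in how the rotation direction is chosen. The sketch below shows that in floating point, assuming 24 iterations and inputs inside the usual convergence range (about ±1.74 rad). The reciprocal, which uses the linear CORDIC mode, is left out, and all names are hypothetical.

```python
import math

N_ITER = 24
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Accumulated gain of the micro-rotations (about 1.6468).
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))

def cordic_rotate(x, y, angle):
    """Rotation mode: rotate (x, y) by angle, driving the residual angle z to 0."""
    z = angle
    for i, a in enumerate(ANGLES):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return x / GAIN, y / GAIN

def cordic_vector(x, y):
    """Vectoring mode: drive y to 0 and return (magnitude, angle); assumes x > 0."""
    z = 0.0
    for i, a in enumerate(ANGLES):
        d = -1.0 if y >= 0 else 1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return x / GAIN, z
```

Both functions run the identical shift-and-add update, so a hardware datapath can serve both modes with only the direction-select logic changed.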
13.
This paper proposes a novel cost-effective and programmable architecture of CAVLC decoder for H.264/AVC, including decoders
for Coeff_token, T1_sign, Level, Total_zeros and Run_before. To simplify the hardware architecture and provide programmability,
we propose four new techniques: a new group-based VLD with efficient memory (NG–VLDEM) for Coeff_token decoder, a novel combined
architecture (NCA) for level decoder, a new group-based VLD with memory access once (GMAO) for Total_zeros decoder and a new
VLD architecture based on multiplexers instead of searching memory (MISM) for Run_before decoder. With the above four techniques,
the proposed CAVLC decoder can decode every syntax element within one clock cycle. Synthesis results show that the hardware cost is 3,310 gates in 0.18 μm CMOS technology at a clock constraint of 125 MHz. The proposed design is therefore suitable for real-time applications such as H.264/AVC HD1080i video decoding.
Corresponding author: Shunliang Mei
14.
In this paper an improved Montgomery multiplier, based on modified four-to-two carry-save adders (CSAs) that reduce critical path delay, is presented. Instead of implementing a four-to-two CSA using two levels of carry-save logic, the authors propose a modified four-to-two CSA using only one level of carry-save logic, taking advantage of pre-computed input values. Also, a new
bit-sliced, unified and scalable Montgomery multiplier architecture, applicable for both RSA and ECC (Elliptic Curve Cryptography),
is proposed. In the existing word-based scalable multiplier architectures, some processing elements (PEs) do not perform useful
computation during the last pipeline cycle when the precision is not equal to an exact multiple of the word size, like in
ECC. This intrinsic limitation requires a few extra clock cycles to operate on operand lengths which are not powers of 2.
The proposed architecture eliminates the need for extra clock cycles by reconfiguring the design at bit-level and hence can
operate on any operand length, limited only by memory and control constraints. It requires 2–15% fewer clock cycles than the existing architectures for key lengths of interest in RSA and, for ECC, 11–18% fewer for binary fields and 10–14% fewer for prime fields. An FPGA implementation of the proposed architecture performs 1,024-bit modular exponentiation in about 15 ms, which is faster than existing multiplier architectures.
Corresponding author: M. B. Srinivas
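The operation that the multiplier in entry 14 accelerates can be sketched in software as bit-serial (radix-2) Montgomery multiplication for a prime field or RSA modulus. The sketch assumes an odd modulus n and operands below n. It does not model carry-save adders, word-based scaling or binary fields, and the function names are hypothetical.

```python
def mont_mul(a, b, n, k):
    """Return a * b * 2**(-k) mod n, assuming n is odd and 0 <= a, b < n < 2**k."""
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            s += b
        if s & 1:          # make s even by adding the odd modulus
            s += n
        s >>= 1            # exact division by 2
    return s - n if s >= n else s

def mont_pow(base, exp, n, k):
    """Left-to-right modular exponentiation built only from mont_mul."""
    r2 = pow(2, 2 * k, n)               # R^2 mod n with R = 2**k
    x = mont_mul(base % n, r2, n, k)    # base in Montgomery form
    acc = mont_mul(1, r2, n, k)         # 1 in Montgomery form (R mod n)
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)       # leave Montgomery form
```

Each loop iteration needs only additions and a shift, with no trial division by n. In hardware those additions are the ones kept in carry-save form to shorten the critical path.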
15.
In H.264/AVC, the concept of adapting the transform size to the block size of motion-compensated prediction residue has proven
to be an important coding tool. This paper presents a highly parallel joint circuit architecture for 8 × 8 and 4 × 4 adaptive block-size transforms in H.264/AVC. By decomposing the 8 × 8 transform into basic 4 × 4 transforms, a unified architecture is designed for both 8 × 8 and 4 × 4 transforms, and the transform data-path can be efficiently reused for six kinds of transforms, i.e., 8 × 8 forward, 8 × 8 inverse, 4 × 4 forward, 4 × 4 inverse, forward-Hadamard and inverse-Hadamard transforms. Linear shift mapping is applied to the memory buffer to support parallel access in both row and column directions, which eliminates the need for a transpose circuit. For a reusable and configurable transform data-path, a multiple-stage pipeline is designed to reduce the critical path length and increase throughput. The design is implemented in UMC 0.18 μm technology at 200 MHz with 13.651 K logic gates, and can support a 1,920 × 1,088 30 fps H.264/AVC HDTV decoder.
Corresponding author: Yu Li
16.
In this paper we combine two points made in two previous papers on negative correlation learning (NC) by different authors,
which have theoretical implications for the optimal setting of λ, a parameter of the method whose correct choice is critical for stability and good performance. An expression for the optimal
λ is derived whose value λ* depends only on the number of classifiers in the ensemble. This result arises from the form of the ambiguity decomposition
of the ensemble error, and the close links between this and the error function used in NC. By analyzing the dynamics of the
outputs we find dramatically different behavior for λ < λ*, λ = λ* and λ > λ*, providing further motivation for our choice of λ and theoretical explanations for some empirical observations in other papers on NC. These results will be illustrated using
well known synthetic and medical datasets.
Corresponding author: Bogdan Gabrys
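The ambiguity decomposition that entry 16 builds on states that, for a uniformly averaged ensemble, the squared error of the ensemble equals the average member error minus the average spread of the members around the ensemble output. The sketch below checks this identity numerically for a single target value. It does not reproduce the paper's expression for λ*, and the function name is hypothetical.

```python
def ambiguity_decomposition(outputs, target):
    """Return (ensemble_error, average_member_error, ambiguity) for one sample.

    Assumes the ensemble output is the plain mean of the member outputs.
    """
    m = len(outputs)
    mean = sum(outputs) / m
    ensemble_error = (mean - target) ** 2
    average_error = sum((f - target) ** 2 for f in outputs) / m
    ambiguity = sum((f - mean) ** 2 for f in outputs) / m
    return ensemble_error, average_error, ambiguity
```

Since the ambiguity term is never negative, the ensemble is never worse than its average member. Negative correlation learning adds a penalty related to this term, weighted by λ.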
17.
In this paper a new cost function for the Back-Propagation (BP) algorithm is studied for the modeling of air pollution time series. The underlying idea is to modify the error definition in order to improve the ability of this kind of model to forecast episodes of poor air quality. The proposed error definition can be regarded as a generalization of the traditional squared error cost function thanks to a parameter α, which yields the ordinary BP as a special case when α = 1. A criterion for choosing this parameter is stated, based on setting a priori a maximum level of allowable false alarms. The quality of the proposed approach is assessed by means of case studies on both synthetic and measured air quality data.
Corresponding author: Flavio Cannavó
18.
Qassim Nasir 《Wireless Personal Communications》2009,48(4):511-519
A frequency domain analysis is presented to optimize the Predictive Least Mean Square (PLMS) algorithm used for wireless channel
tracking. Simulation results show that the PLMS offers significant improvement in tracking performance compared to that of
the conventional LMS based method. The algorithm parameters should be carefully selected in order to gain such improvements.
The objective of this paper is to use frequency domain analysis to determine an expression for the Mean Square Tracking Error
(MSTE) and use it to obtain the optimum PLMS algorithm parameters such as step size (μ) and smoothing constant (θ) with numerical optimization methods.
Corresponding author: Qassim Nasir
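Entry 18 compares against the conventional LMS method. That baseline can be sketched as below for a noise-free, static FIR channel, where the step size μ controls how quickly the tap estimates converge. The predictive (PLMS) variant and its smoothing constant θ are not modelled, and the function name is hypothetical.

```python
def lms_identify(x, d, n_taps, mu):
    """Estimate FIR channel taps from input x and observed output d with plain LMS."""
    w = [0.0] * n_taps
    for n in range(n_taps - 1, len(x)):
        window = [x[n - k] for k in range(n_taps)]       # x[n], x[n-1], ...
        y = sum(wk * xk for wk, xk in zip(w, window))    # current channel estimate output
        e = d[n] - y                                     # estimation error
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
    return w
```

In a time-varying channel the same update has to chase moving taps, which is why the step size trades tracking speed against noise sensitivity.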
19.
In this paper, we propose a cost-effective architecture of variable length decoder (VLD) for MPEG-2 and AVS. In order to save
the buffer memory between VLD and IDCT and accelerate decoding speed, block-based pipeline buffers are adopted. Inverse scan
(IScan) and inverse quantisation (IQ) are also merged into this architecture for cost-effective implementation and for easier
system integration. A novel group-based architecture with the optimized look-up table is used for MPEG-2 and a new memory-efficient
architecture with mixed memory organization is used for AVS. We use shared modules in both MPEG-2 and AVS as much as possible,
such as the flush unit, the buffer controller and the buffers. Moreover, we propose a merged IQ scheme and a merged RAMs scheme. Based on 0.18 μm CMOS technology, the proposed design consumes about 11.5 K gates at a clock constraint of 125 MHz. The simulation results show that it can achieve real-time decoding of video in formats such as HD1080i (1,920 × 1,088 at 30 MHz) for AVS and MPEG-2.
Furthermore, we propose an effective design of the buffers between VLD and IDCT according to the IDCT architecture, a cost-efficient
IQ architecture with full flexibility and an efficient scheme for accelerating VLC decoding.
Corresponding author: Yun He
20.
Lennart Yseboodt Michael De Nil Jos Huisken Mladen Berekovic Qin Zhao Frank Bouwens Jos Hulzink Jef Van Meerbergen 《Journal of Signal Processing Systems》2009,57(1):107-119
Wireless sensor nodes span a wide range of applications. This paper focuses on the biomedical area, more specifically on healthcare
monitoring applications. Power dissipation is the dominant design constraint in this domain. This paper shows the different
steps to develop a digital signal processing architecture for a single channel electrocardiogram application, which is used
as an application example. The target power consumption is 100 μW as that is the power energy scavengers can deliver. We follow
a bottleneck-driven approach: first the algorithm is tuned to the target processor, then coarse grained clock-gating is applied,
next the static as well as the dynamic dissipation of the digital processor is reduced by tuning the core to the target domain.
The impact of each step is quantified. A solution of 11 μW is possible for both radio and DSP running the electrocardiogram
algorithm.
Corresponding author: Jef Van Meerbergen