Similar literature
20 similar documents found (search time: 15 ms)
1.
Wireless Personal Communications - Channel state information at the transmitter side is an important issue for wireless communications systems, particularly when precoding techniques are employed. Recent...

2.
In this paper, we propose a parallel systematic resampling (PSR) algorithm for particle filters, which is a new form of systematic resampling (SR). The PSR algorithm makes iterations independent, thus allowing the resampling algorithm to perform loop iterations in parallel. A fixed-point version of the PSR algorithm is also proposed, with a modification to ensure that a correct number of particles is generated. Experiments show that the fixed-point implementation of the PSR algorithm can use as few as 22 bits for representing the weights, when processing 512 particles, while achieving results equivalent to a floating-point SR implementation. Four customized instructions were designed to accelerate the proposed PSR algorithm in Application-Specific Instruction-set Processors. These four custom instructions, when configured to support four weight inputs in parallel, lead to a 73.7× speedup over a floating-point SR implementation on a general-purpose processor at a cost of 47.3 K additional gates.
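
For orientation, the classical systematic resampling (SR) baseline that this abstract compares against can be sketched as follows. This is a minimal sketch of textbook SR, not the PSR algorithm or its fixed-point variant; the function name `systematic_resample` and the optional `u0` argument (the single uniform draw, exposed here so the result is reproducible) are illustrative assumptions.

```python
import random


def systematic_resample(weights, u0=None):
    """Return resampled particle indices using classical systematic resampling.

    weights: non-negative particle weights (need not be normalized).
    u0: the single uniform draw in [0, 1); drawn at random when omitted.
    """
    n = len(weights)
    total = sum(weights)
    if u0 is None:
        u0 = random.random()

    indices = []
    i = 0
    cumulative = weights[0] / total  # running cumulative sum of normalized weights
    for j in range(n):
        # Evenly spaced thresholds sharing one random offset.
        threshold = (u0 + j) / n
        # Advance to the first particle whose cumulative weight covers the threshold.
        while threshold > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices
```

The inner `while` loop carries state (`i`, `cumulative`) from one output particle to the next, which is the loop-carried dependence that a parallel formulation has to remove.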

3.
Efficient Implementations for AES Encryption and Decryption   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes two efficient architectures for hardware implementation of the Advanced Encryption Standard (AES) algorithm. The composite field arithmetic for implementing SubBytes (S-box) and InvSubBytes (Inverse S-box) transformations investigated by several authors is used as the basis for deriving the proposed architectures. The first architecture is based, for encryption, on an optimized S-box followed by bit-wise implementation of MixColumns and AddRoundKey and, for decryption, on an optimized Inverse S-box followed by bit-wise implementation of InvMixColumns and AddMixRoundKey. The proposed S-box and Inverse S-box used in this architecture are designed as a cascade of three blocks. In the second proposed architecture, block III of the proposed S-box is combined with the MixColumns and AddRoundKey transformations, forming an integrated unit for encryption. An integrated unit for decryption, combining block III of the proposed InvSubBytes with InvMixColumns and AddMixRoundKey, is formed along similar lines. The delays of the proposed architectures for VLSI implementation are found to be the shortest compared to the state-of-the-art implementations of AES operating in non-feedback mode. Iterative and fully unrolled sub-pipelined designs including key schedule are implemented using FPGA and ASIC. The proposed designs are efficient in terms of Kgates/Giga-bits per second ratio compared with a few recent state-of-the-art ASIC (0.18-μm CMOS standard cell) based designs and in terms of throughput per area (TPA) for FPGA implementations.

4.
In the context of minimum mean-square error symmetric uniform quantization, we show that for several different distributions on the input signals, log-log plots of step size versus number of output levels and mean-square error versus number of output levels both exhibit nearly linear behavior. This observation results in a straightforward design procedure for symmetric uniform quantization.
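
The log-log relationship can be illustrated in the one case where it is exact. The sketch below assumes an input uniformly distributed on [-1, 1] and a mid-rise quantizer with step size 2/N for N output levels, so the mean-square error is step²/12 and the log-log slope of error against N is -2. The function names and the choice of distribution are illustrative assumptions; the abstract covers other distributions, for which the relation is only approximately linear.

```python
import math


def quantize(x, step, levels):
    """Symmetric mid-rise uniform quantizer with an even number of output levels."""
    half = levels // 2
    k = math.floor(x / step)
    k = max(-half, min(half - 1, k))  # clamp to the outermost cells
    return (k + 0.5) * step


def mse_uniform_input(step, levels, samples=20000):
    """Mean-square error for an input uniform on [-1, 1], by midpoint integration."""
    total = 0.0
    for i in range(samples):
        x = -1.0 + (i + 0.5) * 2.0 / samples
        e = x - quantize(x, step, levels)
        total += e * e
    return total / samples


def loglog_slope(n_lo, n_hi):
    """Slope of log(MSE) versus log(number of levels) between two level counts."""
    m_lo = mse_uniform_input(2.0 / n_lo, n_lo)
    m_hi = mse_uniform_input(2.0 / n_hi, n_hi)
    return (math.log(m_hi) - math.log(m_lo)) / (math.log(n_hi) - math.log(n_lo))
```

For example, `loglog_slope(4, 32)` evaluates the slope between 4 and 32 output levels for this uniform-input case.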

5.
H.264/AVC, also known as MPEG-4 Part 10 or JVT, is a video coding standard recently established by the Joint Video Team (JVT) of the ISO/IEC MPEG and ITU-T VCEG. The main goal of the paper is to give a broader understanding of the design considerations for the transform and quantization blocks of H.264/AVC by presenting area-optimized and speed-optimized implementations of these blocks. The area-optimized design can be used in low-performance applications such as mobile devices, while the speed-optimized designs can be used in high-definition encoders. Various designs with these blocks were synthesized with 0.18 μm TSMC technology and were also implemented on a Xilinx FPGA. The resulting gate counts ranged from 294 to 47,762 gates and the throughput ranged from 6 to 2,552 M pixels/s, depending on block and optimization. In addition, a system-on-a-programmable-chip implementation of the DCT and quantization blocks is presented, which uses a Xilinx Virtex-II Pro FPGA and its PowerPC. Using this system it is possible to process 0.8 M pixels/s.
Shahram Shirani (corresponding author)

7.
In this paper, we propose a compact threshold-based resampling algorithm and architecture for efficient hardware implementation of particle filters (PFs). By using a simple threshold-based scheme, this resampling algorithm can reduce the complexity and power consumption of the hardware implementation. Simulation results indicate that the algorithm performs approximately as well as the traditional systematic resampling (SR) algorithm in terms of root-mean-square error (RMSE) and lost tracks. Experimental comparison of the proposed hardware architecture with those based on the SR and the residual systematic resampling (RSR) algorithms was conducted on a Xilinx Virtex-II Pro field programmable gate array (FPGA) platform in the bearings-only tracking context, and the results establish the superiority of the proposed architecture in terms of high memory efficiency, low power consumption, and low latency.

8.
This paper investigates optimized synchronization techniques for shared memory on-chip multiprocessors (CMPs) based on network-on-chip (NoC) and targeted at future mobile systems. The proposed solution is based on the idea of locally performing synchronization operations that require continuous polling of a shared variable and therefore feature high contention (e.g., spin locks and barriers). A hardware (HW) module, the synchronization-operation buffer (SB), has been introduced to queue and manage the requests issued by the processors. Using this mechanism, we propose a spin lock implementation requiring a constant number of network transactions and memory accesses per lock acquisition. The SB also supports an efficient implementation of barriers. Experimental validation has been carried out using GRAPES, a cycle-accurate performance/power simulation platform for multiprocessor systems-on-chip (MPSoCs). Two different architectures have been explored to show that the proposed approach is effective independently of the caches and coherence schemes adopted. For an eight-processor target architecture, we show that the SB-based solution achieves up to 50% performance improvement and 30% energy saving with respect to synchronization based on the caching of the synchronization variables and a directory-based coherence protocol. Furthermore, we show the scalability of the proposed approach as the number of processors increases.

9.
Code density is of increasing concern in embedded system design, since higher density reduces the need for memory, a scarce resource, and also implicitly improves other important design parameters such as power consumption and performance. In this paper we introduce a novel, hardware-supported approach. Besides the code, the lookup tables (LUTs) are also compressed; these can become significant in size if the application is large and/or high compression is desired. Our scheme optimizes the number and size of generated LUTs to improve the compression ratio. To show the efficiency of our approach, we apply it to two compression schemes: “dictionary-based” and “statistical”. We achieve an average compression ratio of 48% (already including the overhead of the LUTs). Our scheme is orthogonal to approaches that take particularities of a certain instruction set architecture into account. We have conducted evaluations using a representative set of applications and have applied the scheme to three major embedded processor architectures, namely ARM, MIPS, and PowerPC.

10.
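Particle Swarm Optimization Algorithm for IIR Digital Filter Design   Total citations: 13, self-citations: 0, citations by others: 13
This paper discusses the particle swarm optimization algorithm and its performance evaluation criteria, and then focuses on the particle swarm optimization algorithm for IIR digital filter design and its implementation steps. Finally, two design examples, an IIR digital low-pass filter and an IIR digital band-pass filter, demonstrate the effectiveness of the proposed algorithm.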

11.
In this paper, we introduce a hierarchical resampling (HR) algorithm and architecture for distributed particle filters (PFs). While maintaining the same statistical accuracy as centralized resampling, the proposed HR algorithm decomposes the resampling step into two hierarchies, intermediate resampling (IR) and unitary resampling (UR), which makes PFs suitable for distributed hardware implementation. Also presented is a residual cumulative resampling (RCR) method that pipelines and accelerates the UR step. The corresponding architecture, when compared with traditional distributed architectures, eliminates the particle redistribution step and offers short execution time and high memory efficiency. A prototype containing 8 PEs has been developed on a Xilinx Virtex IV FPGA (XC4VFX100-12FF1152) for the bearings-only tracking (BOT) problem, and the results show that the input observations can be processed at 37.21 kHz with 8 K particles and a clock speed of 80 MHz.

12.
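Application of the Particle Swarm Optimization Algorithm to FIR Digital Filter Design   Total citations: 18, self-citations: 0, citations by others: 18
Li Hui, Zhang An, Zhao Min, Xu Qi. Acta Electronica Sinica (《电子学报》), 2005, 33(7): 1338-1341
Because the design of a finite impulse response (FIR) digital filter is essentially a multi-parameter optimization problem, this paper proposes a method for designing FIR digital filters with the particle swarm optimization (PSO) algorithm. The filter design problem is first converted into an optimization problem over the filter parameters, and PSO is then used to search the whole parameter space efficiently and in parallel to obtain the optimal parameters. Design examples of FIR digital low-pass and band-pass filters demonstrate the effectiveness and superiority of the method.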
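
The idea of treating filter design as a parameter search can be sketched with a generic global-best PSO. This is a minimal sketch under stated assumptions, not the authors' implementation: the cost function `lowpass_error`, the cutoff of 0.4π, the frequency grid, and the PSO constants (`inertia`, `c1`, `c2`) are all illustrative choices that do not come from the abstract.

```python
import math
import random


def lowpass_error(taps, cutoff=0.4 * math.pi, grid=32):
    """Squared error between |H(e^jw)| and an ideal low-pass response on a grid."""
    err = 0.0
    for k in range(grid):
        w = math.pi * k / (grid - 1)
        re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
        im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
        desired = 1.0 if w <= cutoff else 0.0
        err += (math.hypot(re, im) - desired) ** 2
    return err


def pso(cost, dim, particles=20, iters=60, seed=0, inertia=0.7, c1=1.5, c2=1.5):
    """Global-best PSO; returns (best position, best cost, best-cost history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history
```

A call such as `taps, best, history = pso(lowpass_error, dim=7)` searches for seven filter coefficients; `history` records the best cost after each iteration and never increases, since the global best is only replaced by a strictly better position.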

13.
This paper introduces new algorithms for joint blind equalization and decoding of convolutionally coded communication systems operating on frequency-selective channels. The proposed method is based on particle filters (PF), recursively approximating maximum a posteriori (MAP) estimates of the transmitted data without explicitly determining channel parameters. Further elaborating on previous works, we assume that both the channel order and the noise variance are unknown random variables, and develop a new formulation for PF weight propagation which allows these quantities to be analytically integrated out. We verify via numerical simulations that the proposed methods lead to near optimal performance, closely approximating that of algorithms that require exact knowledge of all channel parameters.

14.
In this paper, we treat nonlinear active noise control (NANC) with a linear secondary path (LSP) and with a nonlinear secondary path (NSP) in a unified structure by introducing a new virtual secondary path filter concept and using a general function expansion nonlinear filter. We find that using the filtered-error structure greatly reduces the computational complexity of NANC. As a result, we extend the available filtered-error-based algorithms to solve NANC/LSP problems and, furthermore, develop our adjoint filtered-error-based algorithms for NANC/NSP. This family of algorithms is computationally efficient and possesses a simple structure. We also find that the computational complexity of NANC/NSP can be reduced even more using block-oriented nonlinear models, such as the Wiener, Hammerstein, or linear-nonlinear-linear (LNL) models for the NSP. Finally, we use the statistical properties of the virtual secondary path and the robustness of our proposed methods to further reduce the computational complexity and simplify the implementation structure of NANC/NSP when the NSP satisfies certain conditions. Computational complexity and simulation results are given to confirm the efficiency and effectiveness of all of our proposed methods.

15.
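Guo Lili, Gao Fei, Sun Zhiguo. Acta Electronica Sinica (《电子学报》), 2016, 44(11): 2773-2779
In distributed estimation for wireless sensor networks, the transmission network limits both transmit power and bandwidth, so compressing source redundancy and reducing the amount of communicated data is an important problem. To this end, this paper proposes a distributed estimation method based on multi-bit quantized observations (MQS). Using asymptotic performance as the optimization criterion, it formulates a quantization-threshold optimization problem and solves it with a particle swarm algorithm to obtain the optimal quantization thresholds. An analytical expression for the Cramér-Rao lower bound is given, and the method is compared with the uniform quantization method (UQS) and the unquantized method (NQS). Theoretical analysis and simulation experiments show that MQS outperforms UQS. When the quantization depth increases to 3, the estimation performance of MQS is very close to that of NQS.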

16.
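In the LBG algorithm, the choice of the initial codebook strongly affects both the convergence speed of codebook training and the performance of the final codebook. To address this weakness, a vector quantization codebook design algorithm based on particle swarm optimization is proposed. The algorithm first generates an initial codebook with a degree of global character and then applies the LBG algorithm to refine it, yielding a codebook that also has local character. Experimental results confirm the soundness of the algorithm.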

17.
18.
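Pang Yu, He Zhilong, Wang Shaoquan, Wang Junchao, Gao Xiang, Wu Wei. Acta Electronica Sinica (《电子学报》), 2012, 40(9): 1752-1758
Range and precision analysis are important steps in high-level synthesis. Although many methods have been proposed to address these two problems, for infinite impulse response (IIR) filters they either overestimate the values or cannot handle feedback circuits of arbitrary order. For IIR filters with a given input range and error bound, we propose an efficient heuristic algorithm for range and precision analysis. The algorithm can be used to optimize the allocation of integer and fractional bit widths and so obtain an optimized circuit area. Experimental results show that the proposed algorithm converges quickly and is robust. Because a high-order IIR filter can be decomposed into lower-order filter structures, the algorithm can handle IIR filters of arbitrary order efficiently.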

19.
In this paper, we present an area-efficient storage and routing structure to be used as part of either a DWT or an IDWT filter. Such efficient structures are necessary for the single chip implementation of multidimensional DWT and IDWT filters for processing images and video. While the storage structures described in previously published architectures were adequate for the 1D DWT/IDWT filter, they do not scale well to a multidimensional implementation. The storage structure design and implementation described in this paper utilizes a combination of well-known efficient RAM cells with simple control to achieve compact size and scalability. When compared to other alternatives, the structure uses less power.
In this paper, we examine the problem of constructing, on a single chip, filters for both the multidimensional Discrete Wavelet Transform (DWT) and the multidimensional Inverse Discrete Wavelet Transform (IDWT). We will use the following example to illustrate where the difficulty lies in constructing such a chip. Consider a filter that executes transforms on 2D images at the rate of 30 images per second. Furthermore, the size N × N of the images is 1024 × 1024, the length L of the filter is 8, the number of octaves O to be generated is 4, and the arithmetic precision P is 24. In image compression, such a filter would be a good candidate for the replacement of the filters presently used to perform the block Discrete Cosine Transform (DCT).

20.
For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.
