Similar literature
 20 similar documents found; search took 314 ms
1.
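The cryptography-specific programmable logic array (CSPLA) is a dataflow-driven cryptographic processing architecture. This paper addresses how array size relates to the energy efficiency of cipher algorithm mappings. Taking the specific CSPLA hardware structure as a basis and the energy-efficient implementation of block ciphers as the entry point, it first builds an energy-efficiency model for mapping block cipher algorithms onto this structure and analyzes the factors that affect energy efficiency. It then proposes a mapping algorithm based on the basic process of mapping algorithms onto the array. Finally, several typical block ciphers are mapped onto arrays of different sizes. The results show that a larger array does not necessarily yield higher energy efficiency: to obtain the best mapping efficiency, the array size parameters should match the specific hardware resource constraints and the computational requirements of the cipher. Mappings achieve optimal energy efficiency at CSPLA sizes of 4×4 to 4×6, and the best energy efficiency for AES is 33.68 Mbps/mW. Compared with other cryptographic processing architectures, CSPLA has favorable energy-efficiency characteristics.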

2.
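The cryptography-specific programmable logic array (CSPLA) is a dataflow-driven cryptographic processing architecture. This paper addresses how array size relates to the energy efficiency of cipher algorithm mappings. Taking the specific CSPLA hardware structure as a basis and the energy-efficient implementation of block ciphers as the entry point, it first builds an energy-efficiency model for mapping block cipher algorithms onto this structure and analyzes the factors that affect energy efficiency. It then proposes a mapping algorithm based on the basic process of mapping algorithms onto the array. Finally, several typical block ciphers are mapped onto arrays of different sizes. The results show that a larger array does not necessarily yield higher energy efficiency: to obtain the best mapping efficiency, the array size parameters should match the specific hardware resource constraints and the computational requirements of the cipher. Mappings achieve optimal energy efficiency at CSPLA sizes of 4×4 to 4×6, and the best energy efficiency for AES is 33.68 Mbps/mW. Compared with other cryptographic processing architectures, CSPLA has favorable energy-efficiency characteristics.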

3.
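A numerical synthesis algorithm for antenna array patterns   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a new array synthesis algorithm, the target-pattern iteration algorithm. Unlike existing array synthesis methods, it adjusts the shape of the actual pattern by iterating on the target pattern, and it is a purely numerical array synthesis algorithm. The algorithm applies to pattern synthesis for arrays of arbitrary structure, is computationally efficient, and can meet practical engineering needs. As verification, several representative antenna arrays are synthesized, and the computed results are presented and discussed.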

4.
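A numerical synthesis algorithm for antenna array patterns   Total citations: 4, self-citations: 0, citations by others: 4
This paper proposes a new array synthesis algorithm, the target-pattern iteration algorithm. Unlike existing array synthesis methods, it adjusts the shape of the actual pattern by iterating on the target pattern, and it is a purely numerical array synthesis algorithm. The algorithm applies to pattern synthesis for arrays of arbitrary structure, is computationally efficient, and can meet practical engineering needs. As verification, several representative antenna arrays are synthesized, and the computed results are presented and discussed.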

5.
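Zhang Zhongpei (张中培), Zhou Liang (周亮), Jin Fan (靳蕃). 《电子学报》 (Acta Electronica Sinica), 2001, 29(2): 272-274
MAX-LOG-MAP is a simplified form of the Turbo code decoding algorithm. This paper proposes a parallel-array integrated-circuit implementation structure for the algorithm, gives the data flow through the array and the computation process of the decoding algorithm within the array, and analyzes the interconnection of array nodes, the data storage structure, and the simple timing relationships among the data operations. Computer simulation confirms the correctness of this parallel implementation structure.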
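
As a small illustration of the simplification that gives MAX-LOG-MAP its name, the sketch below contrasts the exact Jacobian logarithm used in Log-MAP decoding with the max-only approximation. It is a generic textbook sketch, not the array structure from the paper; the function names `max_star` and `max_log` are hypothetical.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-Log approximation: keep the max, drop the correction term."""
    return max(a, b)


if __name__ == "__main__":
    a, b = 1.0, 2.5
    print("exact      :", max_star(a, b))
    print("approximate:", max_log(a, b))
```

The dropped correction term lies between 0 and log 2, so the approximation replaces each log-sum with a comparison, which is what makes a compare-select hardware datapath possible.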

6.
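An arithmetic Fourier transform algorithm for the discrete Fourier transform   Total citations: 11, self-citations: 3, citations by others: 8
The discrete Fourier transform (DFT) plays an important role in many fields, including digital signal processing. This paper uses a new Fourier analysis technique, the arithmetic Fourier transform (AFT), to compute the DFT. The algorithm requires only O(N) multiplications. Its computation process is simple and its formulas are uniform, which overcomes the drawbacks of conventional fast algorithms (FFT) for arbitrary-length DFTs, such as complex programs and many subprocedures. The algorithm is easy to parallelize and is particularly well suited to VLSI design. For DFTs whose length contains large prime factors, and especially for prime-length DFTs, it is faster than conventional FFT methods. The algorithm opens a new approach to the fast computation of arbitrary-length DFTs.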

7.
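Wang Hongwei (王宏伟). 《电波科学学报》 (Chinese Journal of Radio Science), 2012, (4): 773-779, 796
Because of the finite word length of digital systems, the frequency bins of the sliding discrete Fourier transform (sliding DFT) algorithm suffer from unstable output. After the frequency bins of the sliding DFT are restructured with the recursive unit of a modified Goertzel algorithm, the initial spectrum values can be computed directly, and the sliding DFT can clear its frequency bins every N output values while still providing accurate new spectrum values. This ensures that the frequency bins can process input data continuously over long periods without output instability. The method is of considerable value for continuous, real-time time-frequency spectrum analysis.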
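
For context, the standard sliding DFT recurrence that the abstract refers to can be sketched as follows. This is the plain textbook recurrence only; it does not reproduce the modified Goertzel restructuring or the periodic clearing described in the paper, and the function names are hypothetical.

```python
import cmath


def direct_bin(window, k):
    """Bin k of the DFT of one window, computed directly from the definition."""
    N = len(window)
    return sum(w * cmath.exp(-2j * cmath.pi * k * m / N)
               for m, w in enumerate(window))


def sliding_dft_bin(x, N, k):
    """Return bin k for every length-N window of x, updated recursively."""
    twiddle = cmath.exp(2j * cmath.pi * k / N)
    X = direct_bin(x[:N], k)          # initial window, computed once
    out = [X]
    for n in range(N, len(x)):
        # Remove the oldest sample, add the newest, then rotate the phase.
        X = (X - x[n - N] + x[n]) * twiddle
        out.append(X)
    return out


if __name__ == "__main__":
    samples = [float((3 * i) % 7) for i in range(20)]
    for value in sliding_dft_bin(samples, N=8, k=3):
        print(value)
```

Each update reuses the previous output, so any rounding error in the twiddle factor or the accumulator is carried forward indefinitely; this is the mechanism behind the finite-word-length instability the abstract mentions.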

8.
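Spatial signal frequency estimation with arrays of arbitrary geometry   Total citations: 3, self-citations: 1, citations by others: 2
This paper presents a spatial signal frequency estimation algorithm that is based on beamspace ESPRIT and applies to arrays of arbitrary geometry. After the element-space data are mapped to DFT space, a frequency vector that does not depend on the array geometry is extracted; the Beamspace ESPRIT concept is then applied and the signal frequencies are estimated by an indirect method. Computer simulation results confirm that the algorithm measures the frequency of signals acquired by different arrays effectively and show good high-resolution, high-accuracy performance.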

9.
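The fast Fourier transform (FFT) is an algorithm that reduces the computation time of the discrete Fourier transform (DFT). The wireless communication algorithms and multimedia processing algorithms used in wireless/mobile communication systems involve a large number of matrix or vector operations, all of which can be implemented with DLP computation. The FFT algorithm studied here likewise involves many matrix operations. By analyzing the matrix form of the FFT, this paper proposes a fast algorithm that implements the FFT on an array computer under the DLP computing model, compares the conventional and improved algorithms on the MATLAB simulation platform, and proposes a parallel FFT algorithm that further reduces computation time.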

10.
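To meet the need for efficient and accurate simulation of the electromagnetic scattering characteristics of curved conformal array structures, a parallel synthetic-function method of moments scheme is proposed. The method is an improved form of the method of moments, a classical numerical algorithm in electromagnetics. Through geometric domain decomposition and synthetic basis functions, it greatly reduces memory consumption, making it possible to analyze electrically large problems and large-scale array problems on a single machine. More importantly, for periodic array structures the method supports reuse of synthetic functions and parallel processing of multiple domains, which can greatly improve overall processing efficiency. A 6×11 cylindrical conformal patch array is used to verify the method's performance. Simulation results show that, for periodic array structures, the method's accuracy is comparable to that of the multilevel fast multipole algorithm; its computational efficiency is slightly lower, but its memory consumption is an order of magnitude lower.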

11.
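Implementation of built-in self-test for the embedded ROM in LSC87   Total citations: 2, self-citations: 1, citations by others: 1
The LSC87 chip is a numeric coprocessor designed to work with the Intel8086. It has a complex architecture and a fairly large embedded ROM. Given the need for compatibility with the Intel8087 and the limits on pin count, a suitable design-for-testability approach must be chosen to improve the chip's testability. This paper studies the design and implementation of the embedded ROM circuit in the LSC87, then proposes built-in self-test for that circuit, focusing on the design and implementation of the built-in self-test, and analyzes the probability of misjudgment when built-in self-test is used. The results show that the embedded ROM built-in self-test adds only a small chip-area overhead, achieves satisfactory fault coverage, and greatly improves the testability of the whole chip.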

12.
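Design of a parallel video motion estimation coprocessor   Total citations: 2, self-citations: 1, citations by others: 1
This paper focuses on the design of the motion estimation coprocessor in a general-purpose video processor (VSP). The design proposes a novel parallel video processing architecture that combines the commonly used parallel SIMD structure with a pipelined MISD structure. Each module in the coprocessor is designed separately and is invoked by instructions to implement different algorithms. To accommodate the generality and flexibility required by video sequences of different formats, the coprocessor can activate up to 8 modules of the same type at once to work in parallel on image blocks of different formats. The structure is very simple and easy to implement. It has passed instruction-level and function-level simulation and verification of the whole VSP chip. The results show that, with an 80 MHz system clock, the motion estimation coprocessor works in coordination with the other functional units and instruction units of the VSP.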

13.
14.
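This paper discusses the design of a coprocessor chip for elliptic curve cryptography (ECC) defined over the Galois field GF(2^n). First, based on a detailed analysis of ECC algorithms over GF(2^n), the most basic and critical operations are extracted, and an ECC encryption/decryption scheme is proposed in which the coprocessor performs the critical operation steps and the main processor performs the remaining operations. Second, the architecture of the cryptographic coprocessor is designed; after weighing area, speed, and power consumption, a fully serial scheme is chosen to implement multiplication and addition over GF(2^n). The paper then discusses the circuit design, simulation, and verification of the coprocessor chip. Finally, it discusses the physical design of the chip and gives test results for sample chips.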

15.
Reconfigurable Filter Coprocessor Architecture for DSP Applications   Total citations: 1, self-citations: 0, citations by others: 1
Digital Signal Processing (DSP) is widely used in high-performance media processing and communication systems. In the majority of these applications, critical DSP functions are realized as embedded cores to meet the low-power budget and high computational complexity. Usually these cores are ASICs that cannot be easily retargeted for other similar applications that share certain commonalities. This stretches the design cycle, which affects time-to-market constraints. In this paper, we present a reconfigurable high-performance low-power filter coprocessor architecture for DSP applications. The coprocessor architecture, apart from having the performance and power advantage of its ASIC counterpart, can be reconfigured to support a wide variety of filtering computations. Since filtering computations abound in DSP applications, the implementation of this coprocessor architecture can serve as an important embedded hardware IP.

16.
17.
Traditional design for testability (DFT) is arduous and time-consuming because of the iterative process of testability assessment and design modification. To improve DFT efficiency, a DFT process based on test point allocation is proposed. In this process, the set of optimal test points is allocated automatically according to signal reachability under the constraints of the testability criteria. The iterative DFT process is thus completed by computer, and test engineers are freed to concentrate on system design rather than repetitive modification. To perform test point allocation, the dependency matrix of signal to potential test point (SP-matrix) is defined based on the multi-signal flow graph. A genetic algorithm (GA) is then adopted to search for the optimal test point allocation based on the SP-matrix. Finally, an experiment is carried out to evaluate the effectiveness of the algorithm.

18.
The spring scheduling coprocessor is a novel very large scale integration (VLSI) accelerator for multiprocessor real-time systems. The coprocessor can be used for static as well as online scheduling. Many different policies and their combinations can be used (e.g., earliest deadline first, highest value first, or resource-oriented policies such as earliest available time first). In this paper, we describe a coprocessor architecture, a CMOS implementation, an implementation of the host/coprocessor interface, and a study of the overall performance improvement. We show that the current VLSI chip speeds up the main portion of the scheduling operation by over three orders of magnitude. We also present an overall system improvement analysis by accounting for the operating system overheads and identify the next set of bottlenecks to improve. The scheduling coprocessor includes several novel VLSI features. It is implemented as a parallel architecture for scheduling that is parameterized for different numbers of tasks, numbers of resources, and internal wordlengths. The architecture was implemented using a single-phase clocking style in several novel ways. The 328 000 transistor custom 2-μm VLSI accelerator running with a 100-MHz clock, combined with careful hardware/software co-design, results in a considerable performance improvement, thus removing a major bottleneck in real-time systems.

19.
Based on the microprocessor structure, an RSA coprocessor for an improved Montgomery algorithm has been designed. The functional units of this coprocessor operate concurrently, and up to three instructions can be issued in one cycle. A mixed form of three-stage and two-stage pipelined structure is used for instruction execution, and the coprocessor and CPU core can share a common RAM memory through a set of switches under control. The structure of the coprocessor can be expanded to contain more than one multiplier-accumulator unit for higher performance.
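
For readers unfamiliar with the underlying arithmetic, the following is a minimal sketch of textbook Montgomery multiplication (the REDC step), which replaces division by the modulus with shifts and masks. It is not the improved variant from the paper, whose details the abstract does not give; the function names are hypothetical, the modulus `n` is assumed odd with `2**k > n`, and `pow(n, -1, R)` assumes Python 3.8 or later.

```python
def redc(T: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: return T * R^-1 mod n, where R = 2**k and T < R*n."""
    mask = (1 << k) - 1
    m = ((T & mask) * n_prime) & mask   # m = (T mod R) * n' mod R
    t = (T + m * n) >> k                # exact division by R
    return t - n if t >= n else t


def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Compute a*b mod n via Montgomery form (assumes n odd, a, b < n < 2**k)."""
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R      # n' = -n^-1 mod R
    a_bar = (a << k) % n                # convert operands to Montgomery form
    b_bar = (b << k) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)
    return redc(prod_bar, n, k, n_prime)  # convert the result back


if __name__ == "__main__":
    print(mont_mul(1234, 2345, 3233, 12))
```

In an RSA exponentiation the conversions into and out of Montgomery form are done once, so the inner loop consists only of multiply-accumulate and shift operations, which suits multiplier-accumulator hardware.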

20.
Low-Cost Fast VLSI Algorithm for Discrete Fourier Transform   Total citations: 1, self-citations: 0, citations by others: 1
A prime N-length discrete Fourier transform (DFT) can be reformulated into an (N-1)-length complex cyclic convolution and then implemented by systolic array or distributed arithmetic. In this paper, a recently proposed hardware-efficient fast cyclic convolution algorithm is combined with the symmetry properties of the DFT to get a new hardware-efficient fast algorithm for small-length DFTs, and then WFTA is used to control the increase of the hardware cost when the transform length N is large. Compared with previously proposed low-cost DFT and FFT algorithms with computation complexity of O(log N), the new algorithm can save 30% to 50% of multipliers on average and improve the average processing speed by a factor of 2 when the DFT length N varies from 20 to 2040. Compared with previous prime-length DFT designs, the proposed design can save a large amount of hardware cost at the same processing speed when the transform length is long. Furthermore, the proposed design offers many more choices of applicable DFT lengths, and the processing speed can be flexibly balanced against the hardware cost.
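
The reformulation of a prime-length DFT as an (N-1)-length cyclic convolution, mentioned in the first sentence of this abstract, can be sketched as below. This is the generic index-permutation idea (usually attributed to Rader), with the convolution evaluated naively for clarity; it is not the paper's fast convolution or hardware design. The function names are hypothetical and N is assumed to be an odd prime.

```python
import cmath


def primitive_root(p: int) -> int:
    """Find a generator of the multiplicative group mod an odd prime p."""
    for g in range(2, p):
        seen, v = set(), 1
        for _ in range(p - 1):
            v = v * g % p
            seen.add(v)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def dft_via_cyclic_convolution(x):
    """DFT of a prime-length sequence using one (N-1)-point cyclic convolution."""
    N = len(x)
    g = primitive_root(N)
    g_inv = pow(g, N - 2, N)                             # inverse via Fermat
    perm = [pow(g, q, N) for q in range(N - 1)]          # indices g^q
    inv_perm = [pow(g_inv, q, N) for q in range(N - 1)]  # indices g^-q
    a = [x[i] for i in perm]                             # permuted input
    b = [cmath.exp(-2j * cmath.pi * i / N) for i in inv_perm]  # permuted twiddles
    X = [0j] * N
    X[0] = sum(x)
    for p in range(N - 1):
        conv = sum(a[q] * b[(p - q) % (N - 1)] for q in range(N - 1))
        X[inv_perm[p]] = x[0] + conv                     # output bin g^-p
    return X


if __name__ == "__main__":
    for value in dft_via_cyclic_convolution([1, 2, 3, 4, 5, 6, 7]):
        print(value)
```

Because the twiddle sequence `b` is fixed for a given N, the convolution has one constant operand, which is the property that systolic-array and distributed-arithmetic implementations exploit.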
