共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents two high-throughput, low-latency converters that can be used to convert synchronous communication protocol to asynchronous one and vice versa. We have designed these two hardware components to be used in a Globally Asynchronous Locally Synchronous clusterized Multi-Processor System-on-Chip communicating by a fully asynchronous Network-on-Chip. The proposed architecture is rather generic, and allows the system designer to make various trade-offs between latency and robustness, depending on the selected synchronizer. We have physically implemented the two converters with portable ALLIANCE CMOS standard cell library and evaluated the architectures by SPICE simulation for a 90 nm CMOS fabrication process. 相似文献
2.
3.
In the coming years, the well-known synchronous design style will not be able to keep pace with the increase speed and capabilities of integration of advanced processes. New design paradigms, like core reuse of the already designed synchronous modules and asynchronous designs, are considered in order to cope with the ever increasing complexity. The future SoCs will contain multiple synchronous and asynchronous cores. Asynchronous design will become more and more common among digital designers, while synchronous-asynchronous interactions will emerge as a key issue in the future SoC designs.This paper will present test strategies for 2-phase asynchronous-synchronous interfaces and vice versa. It will be shown how test vectors can be automatically generated using commercially available ATPG tools. The generated ATPG vectors will be able to test all stuck-at-faults within the asynchronous-synchronous interfaces. 相似文献
4.
提出一种浮点型数字信号处理器(DSP)硬核结构,在兼容定点数运算的同时,也为浮点数运算提供较好支持。目前各大现场可编程门阵列(FPGA)主流厂商在实现浮点数运算功能时均采用软核实现方式,即将浮点数运算算法映射到芯片上,通过逻辑资源和DSP模块实现。相比于传统方法,提出的硬核结构在不占用FPGA中其他逻辑资源情况下,仅利用DSP模块便能完成浮点数运算。设计中,充分考虑负载和时延影响,插入多级流水线,显著提高浮点数的计算效率。采用中芯国际(MCI)28 nm工艺设计并完成所提出的浮点型DSP硬核结构。仿真结果表明,所提出的硬核结构的单个浮点数加法和乘法效率为0.4 Gflops。 相似文献
5.
L. Bissi Author Vitae Author Vitae G. Baruffa Author VitaeAuthor Vitae 《Integration, the VLSI Journal》2008,41(2):161-170
This paper presents a Viterbi decoder (VD) architecture for a programmable data transmission system, implemented using a Field Programmable Gate Array (FPGA) device. This VD has been conceived as a building block of a software defined radio (SDR) mobile transceiver, reconfigurable on request and capable to provide agility in choosing between different standards. UMTS and GPRS Viterbi decoding is achieved by choosing different coding rates and constraint lengths, and the possibility to switch, at run time, between them guarantees a high degree of programmability. The architecture has been tested and verified with a Xilinx XC2V2000 FPGA for providing a generalized co-simulation/co-design testbed. The results show that this decoder can sustain an uncoded data rate of about 2 Mbps, with an area occupancy of 46%, due to the efficient resources reuse. 相似文献
6.
7.
随着现场可编程门阵列(FPGA)器件尺寸不断增大,计算机辅助设计(CAD)工具运行时间成为突出的问题。布线是FPGA的CAD流程中最为耗时的一个阶段,一种能有效缩短布线时间的方法就是并行布线。本文提出一种减少FPGA时序驱动布线算法运行时间的多线程方法。该算法首先将信号按照线网的扇出数量进行排序,再将排序后的线网均匀分配到各个线程中,最后并发执行所有的线程。在布线质量没有受到显著影响的前提下,即线长增加2.58%,关键路径延时增加1.78%的情况下,相对于传统通用布局布线工具(VPR)时序驱动布线算法8线程下的加速比为2.46。 相似文献
8.
9.
10.
使用图像识别技术,提出了一种应用于环形工业生产线仪表数据采集与自动监控的系统架构,并结合实际的环形生产线平台,给出了详细的硬件设计和软件设计。针对工业现场仪表样本采集困难的弱点,为保障系统的识别率,利用虚拟样本改进的BP神经网络设计了模式分类器,并移植到FPGA上,发挥其并行计算能力,实现了对生产线的实时、高效监控。 相似文献
11.
12.
固定角度旋转的CORDIC(Coordinate Rotation Digital Computer)算法已经广泛的应用于高速数字信号处理、图像处理、机器人学等领域.针对固定角度旋转CORDIC算法在相位旋转过程中,存在数据吞吐率较高、占用硬件资源较多且资源消耗量大等缺点,提出了利用混合CORDIC算法,将角度旋转分为单向角度旋转和一次角度估计旋转两部分.本文根据欠阻尼理论,将固定角度旋转采用单向旋转CORDIC算法实现,减少了流水线的级数和迭代符号位的判决,然后通过对角度估计旋转的二进制表示,修正常数因子,再根据角度映射关系进行相关处理,完成高速高精度坐标旋转.最后在硬件平台上进行了仿真实验.实验结果表明,在误差范围一定的前提下,混合算法进一步的减少了迭代次数,并且资源消耗较低,提高了数据吞吐率. 相似文献
13.
为了缩短专用集成电路和片上系统的功能验证周期,该文提出FPGA硬核处理器系统加速数字电路功能验证的方法。所提方法综合软件仿真功能验证和现场可编程门阵列原型验证的优点,利用集成在片上系统现场可编程门阵列器件中的硬核处理器系统作为验证激励发生单元和功能验证覆盖率分析单元,解决了验证速度和灵活性不能统一的问题。与软件仿真验证相比,所提方法可以有效缩短数字电路的功能验证时间;在功能验证效率和验证知识产权可重用方面表现优于现有的FPGA原型验证技术。 相似文献
14.
802.11p物理层OFDM基带调制器的FPGA实现 总被引:1,自引:0,他引:1
基于IEEE 802.11p标准的车辆无线通信系统作为智能交通系统的一部分,能极大地改善交通系统的安全性、有效性和实时性。该文详细描述了作者对车辆无线通讯系统关键技术进行的相关研究内容,并且在CycloneⅢEP3C120F平台上实现了一种基于IEEE 802.11p标准的物理层OFDM基带调制系统,效果理想,有很好的移植性。 相似文献
15.
Physically Unclonable Functions (PUFs) are a promising technology and have been proposed as central building blocks in many cryptographic protocols and security architectures. Among other uses, PUFs enable chip identifier/authentication, secret key generation/storage, seed for a random number generator and Intellectual Property (IP) protection. Field Programmable Gate Arrays (FPGAs) are re-configurable hardware systems which have emerged as an interesting trade-off between the versatility of standard microprocessors and the efficiency of Application Specific Integrated Circuits (ASICs). In FPGA devices, PUFs may be instantiated directly from FPGA fabric components in order to exploit the propagation delay differences of signals caused by manufacturing process variations. PUF technology can protect the individual FPGA IP cores with less overhead. In this article, we first provide an extensive survey on the current state-of-the-art of FPGA based PUFs. Then, we provide a detailed performance evaluation result for several FPGA based PUF designs and their comparisons. Subsequently, we briefly report on some of the known attacks on FPGA based PUFs and the corresponding countermeasures. Finally, we conclude with a brief overview of the FPGA based PUF application scenarios and future research directions. 相似文献
16.
ABSTRACT This paper presents a unified reconfigurable coordinate rotation digital computer (CORDIC) processor for floating-point arithmetic. It can be configured to operate in multi-mode to achieve a variety of operations and replaces multiple single-mode CORDIC processors. A reconfigurable pipeline-parallel mixed architecture is proposed to adapt different operations, which maximises the sharing of common hardware circuit and achieves the area-delay efficiency. Compared with previous unified floating-point CORDIC processors, the consumption of hardware resources is greatly reduced. As a proof of concept, we apply it to 16384 × 16384 points target synthetic aperture radar (SAR) imaging system, which is implemented on Xilinx XC7VX690T field programmable gate array platform. The maximum relative error of each phase function between hardware and software computation and the corresponding SAR imaging result can meet the accuracy index requirements. 相似文献
17.
18.
Quantum computers have the potential to solve difficult mathematical problems efficiently, therefore meaning an important threat to Public-Key Cryptography (PKC) if large-scale quantum computers are ever built. The goal of Post-Quantum Cryptography (PQC) is to develop cryptosystems that are secure against both classical and quantum computers. DME is a new proposal of quantum-resistant PKC algorithm that was presented for NIST PQC Standardization competition in order to set the next-generation of cryptography standards. DME is a multivariate public key, signature and Key Encapsulation Mechanism (KEM) system based on a new construction of the central maps, that allows the polynomials of the public key to be of an arbitrary degree. In this paper, a high-throughput pipelined architecture of DME is presented and hardware implementations over Xilinx FPGAs have been performed. Experimental results show that the architecture here presented exhibits the lowest execution time and highest throughput when it is compared with other PQC multivariate implementations given in the literature. 相似文献
19.
A fast half-pixel motion estimation algorithm and its corresponding hardware architecture are presented. Unlike three steps are needed in typical half-pixel motion estimation algorithm, the presented algorithm needs only two steps to obtain all the interpolated pixels of an entire 8′8 block. The proposed architecture works in a parallel way and is simulated by Modelsim 6.5 SE, synthesized to the Xilinx Virtex4 XC4VLX15 Field Programmable Gate Array(FPGA) device, and verified by hardware platform. The implementation results show that this architecture can achieve 190 MHz and 11 clock cycles are reduced to complete the entire interpolation process in comparison with typical half-pixel interpolation, which meets the requirements of real-time application for very high defination videos. 相似文献