Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
2.
We present P-AC, a pipelined approach to hardware implementation of the Aho-Corasick (AC) string-matching algorithm. By incorporating pipelined processing, the state graph is reduced to a character trie that contains only forward edges. The edge reduction in P-AC is substantial and is guaranteed algorithmically. For a signature set of 4434 strings extracted from the Snort rule set, the memory cost of P-AC is only 21.5 bits/char. The simplicity of the pipeline control, plus the availability of 2-port memories, allows us to implement two pipelines sharing the same set of lookup tables on one device; by doing so, system throughput can be doubled with little overhead. The throughput of our method reaches 8.8 Gbps when the system is implemented on a 550 MHz FPGA.
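The P-AC pipeline itself is hardware-specific, but the automaton it prunes is the classic Aho-Corasick structure: a forward-edge character trie plus failure links resolved separately. A minimal software sketch of that baseline (not the paper's pipelined design):

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton. The paper's P-AC keeps only the
    forward-edge trie in its tables and handles the rest in the pipeline;
    here failure links are built explicitly for clarity."""
    def __init__(self, patterns):
        self.next = [{}]      # forward (trie) edges only
        self.fail = [0]       # failure links
        self.out = [[]]       # patterns ending at each state
        for p in patterns:    # build the character trie
            s = 0
            for c in p:
                if c not in self.next[s]:
                    self.next.append({}); self.fail.append(0); self.out.append([])
                    self.next[s][c] = len(self.next) - 1
                s = self.next[s][c]
            self.out[s].append(p)
        q = deque(self.next[0].values())
        while q:              # BFS to compute failure links
            s = q.popleft()
            for c, t in self.next[s].items():
                q.append(t)
                f = self.fail[s]
                while f and c not in self.next[f]:
                    f = self.fail[f]
                self.fail[t] = self.next[f].get(c, 0)
                self.out[t] += self.out[self.fail[t]]

    def search(self, text):
        """Return (start_index, pattern) for every match in text."""
        s, hits = 0, []
        for i, c in enumerate(text):
            while s and c not in self.next[s]:
                s = self.fail[s]
            s = self.next[s].get(c, 0)
            for p in self.out[s]:
                hits.append((i - len(p) + 1, p))
        return hits
```
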

3.
4.
5.
This paper presents a performance comparison of FPGA hardware implementations of the 64-bit block ciphers Triple-DES, IDEA, CAST-128, MISTY1, and KHAZAD. All of these ciphers were under consideration for the ISO/IEC 18033-3 standard, which aims to provide an international encryption standard for 64-bit block ciphers. Two basic architectures are implemented for each cipher. For non-feedback cipher modes, a pipelined technique between rounds is used, and the achieved throughput ranges from 3.0 Gbps for IDEA to 6.9 Gbps for Triple-DES. For feedback cipher modes, the basic iterative architecture is considered, and the achieved throughput ranges from 115 Mbps for Triple-DES to 462 Mbps for KHAZAD. Throughput, throughput per slice, latency, and area requirements are reported for all cipher implementations. Our study is an effort to determine the most suitable algorithm for hardware implementation on FPGA devices.

6.
Packet classification (matching) is one of the critical operations in networking, widely used in many different devices and tasks ranging from switching and routing to a variety of monitoring and security applications such as firewalls or IDS. To satisfy the ever-growing performance demands of current and future high-speed networks, specially designed hardware-accelerated architectures implementing packet classification are necessary. These demands are now growing to such an extent that, to keep up with the rising throughputs of network links, FPGA-accelerated architectures are required to match multiple packets in every single clock cycle. To meet this requirement a simple replication approach can be utilized – instantiate multiple copies of a processing pipeline matching incoming packets in parallel. However, simple replication of pipelines inevitably brings a significant increase in utilization of FPGA resources of all types, which is especially costly for the rather scarce on-chip memories used in matching tables. We propose and examine a unique parallel hardware architecture for hash-based exact-match classification of multiple packets in each clock cycle that reduces memory replication requirements. The core idea of the proposed architecture is to exploit the basic memory organization present in all modern FPGAs, where hundreds of individual block or distributed memory tiles are available and can be addressed independently. This way, we are able to maintain a high throughput of multiple packets matched per clock cycle even without fully replicated memory resources in matching tables. Our results show that the designed approach uses on-chip memory resources very efficiently and scales exceptionally well with increased match-table capacities.
For example, the proposed architecture achieves a throughput of more than 2 Tbps (over 3,000 Mpps) with an effective capacity of more than 40,000 IPv4 flow records at the cost of only a few hundred block memory tiles (366 BlockRAMs for Xilinx or 672 M20Ks for Intel FPGAs), utilizing only a small fraction of the available logic resources (around 68,000 LUTs for Xilinx or 95,000 ALMs for Intel).
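As a rough software model of the core idea — independently addressed memory banks serving several lookups in one cycle, stalling only on bank conflicts — consider the sketch below. The class name, bank count, hash, and stall behavior are illustrative assumptions, not the paper's design:

```python
class BankedExactMatchTable:
    """Hash table spread over independently addressable memory banks.
    Several keys can be looked up in the same 'cycle' as long as they
    hit different banks, without replicating the whole table."""
    def __init__(self, num_banks=8, bank_size=64):
        self.num_banks = num_banks
        self.bank_size = bank_size
        self.banks = [[None] * bank_size for _ in range(num_banks)]

    def _addr(self, key):
        h = hash(key)
        return h % self.num_banks, (h // self.num_banks) % self.bank_size

    def insert(self, key, value):
        b, i = self._addr(key)
        self.banks[b][i] = (key, value)   # sketch: no collision chaining

    def lookup_parallel(self, keys):
        """One 'clock cycle': each bank serves at most one request;
        conflicting requests are reported as 'stall' (retry next cycle)."""
        busy, results = set(), {}
        for k in keys:
            b, i = self._addr(k)
            if b in busy:
                results[k] = "stall"
                continue
            busy.add(b)
            slot = self.banks[b][i]
            results[k] = slot[1] if slot and slot[0] == k else None
        return results
```

Integer keys are used in the test because Python hashes them deterministically, which makes the bank assignment reproducible.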

7.
FPGA implementations of carrier-generation circuits for digital modulators are usually based on look-up tables; achieving high precision requires large amounts of ROM to build huge look-up tables. This paper proposes an implementation based on a pipelined CORDIC algorithm, which effectively saves FPGA hardware resources and increases computation speed. The circuit was implemented on the FPGA chip EP1C12Q240C8 and tested in real time with the SignalTap II embedded logic analyzer in Quartus II; the test results verify the correctness and feasibility of the design.
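For reference, the rotation-mode CORDIC iteration that such a pipeline unrolls into stages can be sketched in software as follows (floating-point here for clarity; a real FPGA design uses fixed-point shift-and-add only):

```python
import math

def cordic_sin_cos(angle, iterations=24):
    """Rotation-mode CORDIC: each loop iteration corresponds to one
    pipeline stage in hardware (a shift, an add, and a table lookup).
    Valid for |angle| up to about 1.74 rad; word widths not modeled."""
    # Precomputed arctan table and scaling constant for this depth.
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0       # rotation direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]                 # residual angle
    return K * y, K * x                   # (sin, cos)
```
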

8.
9.
Image matching based on the Hausdorff distance is quite robust but computationally expensive, so software implementations struggle to meet real-time requirements. To address this problem, this paper proposes a robust, real-time FPGA implementation built on an image matching algorithm using the local Hausdorff distance. To make full use of the FPGA's hardware resources, the traditional serial algorithm is first analyzed for parallelism and a parallel algorithm is derived; on this basis, a three-stage coarse-grained pipelined architecture is designed and mapped onto the FPGA. Experimental results show that the system outperforms other related work and achieves a speedup of nearly 50x over a software implementation on a PC (Pentium 4, 2.8 GHz).
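The distance itself is simple to state; a direct, unoptimized software definition of the (symmetric) Hausdorff distance between point sets, whose inner minimum is what the FPGA design parallelizes, is:

```python
def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of min over b in B of ||a - b||,
    for 2-D point sets given as (x, y) tuples."""
    return max(min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                   for bx, by in B)
               for ax, ay in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the larger of the two directed
    distances. O(|A|*|B|) here; the paper's pipeline hides this cost."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```
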

10.
陶涛  张云 《中国图象图形学报》2015,20(12):1639-1651
Objective: The internationally popular SIFT algorithm and its variants detect and describe feature points using difference-of-Gaussian functions, which discard high-frequency image information; as a result, their matching performance degrades sharply as image deformation increases. To address this deficiency of SIFT and its variants, this study proposes a new scale-invariant feature detection and description algorithm in log-polar coordinates that incurs no loss of image information. Method: The proposed algorithm first converts the circular patch centered on a sample point in Cartesian coordinates into a rectangular patch in log-polar coordinates, and performs feature detection and descriptor extraction on this rectangular patch. A fixed-width window is moved along the logtr axis of the sample point's log-polar radial-gradient image to decide whether the point is a feature point and to compute its characteristic scale; descriptors are extracted at the characteristic-scale positions with locally maximal window responses. The descriptor, a 192-dimensional vector based on the magnitudes and angles of the gray-level gradients of the rectangular log-polar patch, is invariant to changes in scale, rotation, illumination, and so on. Results: Using the INRIA dataset and the matching-performance metrics proposed by Mikolajczyk, the proposed algorithm is compared with SIFT and SURF. Compared with both, it shows advantages in the number of correspondences, repeatability, number of correct matches, and matching rate. Conclusion: An image matching algorithm in log-polar coordinates is proposed: converting the circular Cartesian patch centered on a sample point into a rectangular log-polar patch avoids the high-frequency information loss caused by SIFT's DoG function during feature detection, and the log-polar representation effectively reduces image variation during descriptor extraction, thereby improving matching performance.
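The Cartesian-to-log-polar patch mapping at the heart of the method can be sketched as follows; the function name, bin counts, and nearest-pixel sampling are illustrative assumptions, not the paper's exact grid or its gradient statistics:

```python
import math

def to_log_polar(image, center, r_bins=8, theta_bins=16, r_max=10.0):
    """Map a circular neighborhood of `center` in a 2-D grayscale image
    (list of rows) onto a rectangular (log r, theta) grid by nearest-pixel
    sampling. A rotation of the image becomes a cyclic shift along the
    theta axis of the output, which is what makes the patch convenient
    for rotation-invariant description."""
    cx, cy = center
    out = [[0.0] * theta_bins for _ in range(r_bins)]
    for ri in range(r_bins):
        # log-spaced radii from ~1 up to r_max
        r = math.exp(math.log(r_max) * (ri + 1) / r_bins)
        for ti in range(theta_bins):
            t = 2 * math.pi * ti / theta_bins
            x = int(round(cx + r * math.cos(t)))
            y = int(round(cy + r * math.sin(t)))
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                out[ri][ti] = image[y][x]
    return out
```
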

11.
Binary image feature descriptors require less storage and are faster to compute than floating-point descriptors. Building on an analysis of commonly used binary descriptors, this paper improves the sampling pattern of the FREAK descriptor by using the spatial-structure information among image feature points, yielding the MPFREAK descriptor with stronger descriptive power. To address the slow nearest-neighbor search during feature matching, the LSH algorithm is improved to shrink the candidate-list space, yielding MLSH, a fast binary feature matching algorithm in Hamming space. Experiments show that the MPFREAK descriptor describes features better than other algorithms, and that the matching algorithm is clearly effective and faster.
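As a rough illustration of LSH-style candidate pruning in Hamming space — the general idea behind MLSH, with the class name and parameters hypothetical and the MPFREAK/MLSH specifics not reproduced — binary descriptors (here integers) can be bucketed by random bit subsets so that near-duplicates collide and only a short candidate list is scanned:

```python
import random

class HammingLSH:
    """Sketch of locality-sensitive hashing in Hamming space: each table
    keys descriptors by a random subset of bit positions, so similar
    descriptors tend to collide and the candidate set shrinks versus a
    brute-force nearest-neighbor scan."""
    def __init__(self, n_bits, n_tables=4, bits_per_key=12, seed=0):
        rng = random.Random(seed)
        self.masks = [rng.sample(range(n_bits), bits_per_key)
                      for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    def _key(self, desc, mask):
        return tuple((desc >> b) & 1 for b in mask)

    def add(self, desc, label):
        for mask, table in zip(self.masks, self.tables):
            table.setdefault(self._key(desc, mask), []).append((desc, label))

    def query(self, desc):
        """Return the stored label with the smallest Hamming distance
        among candidates colliding in at least one table."""
        cands = {c for mask, table in zip(self.masks, self.tables)
                 for c in table.get(self._key(desc, mask), [])}
        if not cands:
            return None
        return min(cands, key=lambda c: bin(c[0] ^ desc).count("1"))[1]
```
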

12.
Binarized SIFT feature descriptors and optimized image stitching
Objective: SIFT has high computational complexity and large storage overhead, while recently proposed binary descriptors such as BRIEF (binary robust independent elementary features), ORB (oriented BRIEF), BRISK (binary robust invariant scalable keypoints), and FREAK (fast retina keypoint) suffer from weak distinctiveness and poor robustness. This paper proposes a SIFT-based binary local image feature descriptor. Method: First, the feature space and feature-vector distribution of traditional SIFT are analyzed theoretically and experimentally, and SIFT is improved on this basis by drawing on the advantages of binary descriptors. Unlike traditional binary descriptors, the proposed algorithm sorts the components of the traditional SIFT feature vector and uses the vector's median as the quantization threshold, converting the high-dimensional floating-point SIFT vector into a bit vector to obtain a binary descriptor. The easily computed Hamming distance replaces the Euclidean distance for measuring similarity between feature points, improving matching efficiency. In the matching stage, the binary descriptor is split into two parts that are matched separately, so that an initial match can discard invalid matching points and further shorten matching time. Finally, the distinctiveness and robustness of the proposed quantization algorithm are verified. Results: The quantization algorithm retains the strong robustness and distinctiveness of SIFT while achieving low storage and high matching efficiency, addressing both SIFT's high computational complexity and the poor robustness and distinctiveness of binary descriptors. In addition, an average of 77.5% of invalid matching points are discarded during the matching stage, reducing the number of RANSAC (random sample consensus) iterations. Conclusion: The proposed quantization algorithm can be used for fast matching and fast image stitching, improving matching and stitching efficiency.
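The median-threshold quantization and two-part matching described above can be sketched as follows; the function names are illustrative, and the tie handling at the median is one reasonable choice rather than necessarily the paper's:

```python
def binarize_sift(vec):
    """Quantize a floating-point SIFT vector to bits: components at or
    above the vector's median become 1, the rest 0 (median of an
    even-length vector taken as the mean of the two middle values)."""
    s = sorted(vec)
    n = len(s)
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return [1 if v >= median else 0 for v in vec]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def two_stage_match(qa, qb, threshold):
    """Two-part matching: compare first halves cheaply; only when they
    are close enough is the second half compared as well. Returns the
    full Hamming distance, or None if rejected by the initial match."""
    h = len(qa) // 2
    d1 = hamming(qa[:h], qb[:h])
    if d1 > threshold:
        return None
    return d1 + hamming(qa[h:], qb[h:])
```
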

13.
Low-Density Parity-Check (LDPC) codes, with excellent error-correction capabilities, have been widely used in both data communication and storage to construct reliable cyber-physical systems that are resilient to real-world noise. A fast-prototyped decoder based on a field-programmable gate array (FPGA) is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput by fully exploiting the characteristics of both LDPC codes and the underlying hardware while scaling effectively to large FPGA platforms. The three levels are a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that combines the features of the LDPC algorithm with specific FPGA structures (e.g., the Look-Up Table, LUT) and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balanced pipelined fashion. The top-level multi-core architecture makes full use of board-level resources to improve overall throughput. We develop LDPC C++ code with dedicated pragmas and leverage HLS tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
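A toy software model of the kind of hard-decision decoding such a unit performs is Gallager-style bit flipping; the sketch below runs on a small parity-check matrix and does not model the TLP-LDPC schedule, fixed-point formats, or pipelining:

```python
def bit_flip_decode(H, word, max_iters=10):
    """Hard-decision bit-flipping decoding: each iteration evaluates all
    parity checks, then flips the single bit involved in the most
    unsatisfied checks. H is a list of rows of 0/1 ints; word is a list
    of received hard bits."""
    word = list(word)
    m, n = len(H), len(H[0])
    for _ in range(max_iters):
        # Syndrome: which parity checks are currently violated.
        unsat = [sum(H[r][c] & word[c] for c in range(n)) % 2
                 for r in range(m)]
        if not any(unsat):
            return word                   # valid codeword reached
        # Count unsatisfied checks touching each bit.
        votes = [sum(H[r][c] & unsat[r] for r in range(m))
                 for c in range(n)]
        word[votes.index(max(votes))] ^= 1   # flip one worst bit
    return word
```
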

14.
Image matching is a key step in much remote-sensing image processing and analysis. Traditional corner-based gray-level correlation matching is not rotation invariant, so it requires manual coarse matching and cannot be fully automated. The SIFT (scale invariant feature transform) algorithm handles image rotation, scaling, and similar problems well, but for high-resolution remote-sensing images, which have clearer geometric structure and richer texture, its heavy memory consumption and slow computation are serious drawbacks. Combining the two, this paper proposes an image matching algorithm based on Harris corners and SIFT descriptors. Experimental results show that, compared with SIFT, the proposed algorithm greatly reduces computation time while retaining the rotation invariance and illumination robustness of the SIFT descriptor, and it overcomes the gray-level correlation method's inability to run fully automatically, performing well on high-resolution remote-sensing image matching.

15.
16.
The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool with diverse applications ranging from structural engineering to electromagnetic simulation. Trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), so interest has grown in the scientific community in exploiting this technology. We present the architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits the FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix-striping algorithm and a partitioning scheme that enables it to process arbitrarily large matrices without changing the number of PEs; the architecture is therefore limited only by the amount of external RAM available to the FPGA. The implemented SMVM pipeline prototype contains 8 PEs, is clocked at 110 MHz, and obtains a peak performance of 1.76 GFLOPS. For the 8 GB/s of memory bandwidth typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of peak and sustained performance can be achieved.
Our stream-through architecture has the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding the initialization time due to data loading and setup inside the FPGA's internal memory.
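The SMVM kernel that the pipeline streams can be written compactly over the common CSR (compressed sparse row) layout; the striping and partitioning scheme, and the per-PE data flow, are omitted in this sequential sketch:

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x with A in CSR form:
    `values` holds the nonzeros row by row, `col_idx` their column
    indices, and `row_ptr[r]:row_ptr[r+1]` delimits row r. The paper
    streams exactly this data from SDRAM through a linear PE array;
    here the stream is collapsed into a sequential loop."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```
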

17.
A parallel architecture for an on-line implementation of the recursive least squares (RLS) identification algorithm on a field-programmable gate array (FPGA) is presented. The main shortcoming of this algorithm for on-line applications is its computational complexity: the matrix computation that updates the error covariance consumes most of the processing time. To improve the processing speed of the RLS architecture, a multi-stage matrix multiplication (MMM) algorithm was developed. In addition, a trace technique was used to reduce the computational burden on the proposed architecture. High throughput was achieved by employing a pipelined design. The scope of the architecture was explored by estimating the parameters of a servo position-control system. No vendor-dependent modules were used in this design. The RLS algorithm was mapped to a Xilinx Virtex-5 FPGA device; the entire architecture operates at a maximum frequency of 339.156 MHz. Compared to earlier work, hardware utilization was substantially reduced. An application-specific integrated circuit (ASIC) design was also implemented in 180 nm technology with the Cadence RTL Compiler.
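One RLS update — the computation the architecture accelerates, including the covariance update that dominates the cost — can be sketched in plain software; the MMM staging and trace technique are not modeled, and the function name is illustrative:

```python
def rls_step(theta, P, phi, y, lam=1.0):
    """One recursive-least-squares update: given parameter estimate
    `theta`, covariance `P`, regressor `phi`, measurement `y`, and
    forgetting factor `lam`, return the updated (theta, P). The O(n^2)
    covariance update is the part the paper's hardware targets."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    k = [v / denom for v in Pphi]                       # gain vector
    err = y - sum(theta[i] * phi[i] for i in range(n))  # prediction error
    theta = [theta[i] + k[i] * err for i in range(n)]
    # P <- (P - k * phi^T P) / lam  (P stays symmetric)
    P = [[(P[i][j] - k[i] * Pphi[j]) / lam for j in range(n)]
         for i in range(n)]
    return theta, P
```

Fed noiseless samples of y = 2x + 1 with a large initial covariance, the estimate converges to the true parameters within a few updates.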

18.
This paper proposes a pipelined-CORDIC implementation of QAM modulation that effectively saves hardware resources and increases computation speed. The design was programmed and functionally simulated in Verilog HDL; the simulation results show that the design is practical.

19.
To realize a high-speed, configurable RSA hardware accelerator, this paper proposes a pipelined modular-multiplier architecture based on the radix-64 Montgomery algorithm, together with a matching configurable memory structure. Through the parallel operation of a five-stage pipeline and flexible memory configuration, RSA operations from 256 to 2048 bits can be performed efficiently. Experimental results show that, compared with related work, the proposed pipeline achieves a better ratio of performance to resource consumption, with clear improvements in modular-multiplier performance and data throughput. With 73k gates of hardware resources, the accelerator achieves a data throughput of 333 kbps for 1024-bit RSA.
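The arithmetic such an accelerator pipelines is Montgomery reduction; a full-width software sketch of the Montgomery product and its use in modular exponentiation (the RSA core operation) follows. The radix here is a single big shift for clarity, not the paper's radix-64 datapath:

```python
def montgomery_multiply(a, b, n, r_bits):
    """Montgomery product a*b*R^{-1} mod n with R = 2^r_bits, for odd
    n < R and a, b < n. The reduction replaces a division by n with a
    multiplication and a shift, which is what pipelines well in hardware."""
    R = 1 << r_bits
    n_prime = -pow(n, -1, R) % R        # n' with n*n' = -1 (mod R)
    t = a * b
    m = (t * n_prime) % R
    u = (t + m * n) >> r_bits           # exact: low r_bits of t+m*n are 0
    return u - n if u >= n else u

def montgomery_modexp(base, exp, n, r_bits=64):
    """Square-and-multiply exponentiation in the Montgomery domain."""
    R = 1 << r_bits
    result = R % n                      # 1 in Montgomery form
    x = (base * R) % n                  # base in Montgomery form
    while exp:
        if exp & 1:
            result = montgomery_multiply(result, x, n, r_bits)
        x = montgomery_multiply(x, x, n, r_bits)
        exp >>= 1
    return montgomery_multiply(result, 1, n, r_bits)   # convert back
```

(`pow(n, -1, R)` needs Python 3.8+.)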

20.
王蕾 《自动化信息》2011,(10):29-31,67
Based on the SIFT (scale-invariant feature transform) feature-matching idea, this paper proposes an image feature registration algorithm that applies the epipolar-geometry constraint. SIFT feature points are first extracted from the images, and an initial match set is obtained by comparing SIFT descriptors using the Euclidean distance; a homography-based sampling algorithm then computes an initial fundamental matrix, and the RANSAC algorithm computes an accurate fundamental matrix and match set, achieving image registration. Experiments show that the algorithm obtains more accurate matches and higher-precision registration results.
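The Euclidean-distance pre-matching step can be sketched as follows. The ratio test shown is a common filtering choice and an assumption here, since the abstract only states that Euclidean distance is used for the initial matching:

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def initial_matches(desc1, desc2, ratio=0.8):
    """Pre-match descriptors by nearest Euclidean distance, keeping a
    match only when the best distance is clearly below the second best
    (Lowe's ratio test). Returns (index_in_desc1, index_in_desc2) pairs;
    the epipolar/RANSAC refinement of the paper comes afterwards."""
    matches = []
    for i, d1 in enumerate(desc1):
        dists = sorted((euclidean(d1, d2), j) for j, d2 in enumerate(desc2))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```
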


Copyright©北京勤云科技发展有限公司  京ICP备09084417号