期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Distributed method for cracking WPA/WPA2‐PSK on multi‐core CPU and GPU architecture

Liu Yong‐lei Jin Zhi‐gang 《International Journal of Communication Systems》2015,28(4):723-742

To overcome the limitations of the existing brute force cracking method of Wi‐Fi Protected Access/Wi‐Fi Protected Access II (WPA/WPA2)‐pre‐shared key (PSK) based on single core CPU or one core of a multi‐core CPU, a new distributed multi‐core CPU and GPU parallel cracking method (DMCG) was first proposed. Colored Petri nets was used to validate the four‐way handshake protocol and proved that DMCG could successfully crack WPA/WPA2‐PSK. In DMCG, the PSK list was distributed to each PC reasonably using distributed technology. Multiple computing cores were made up of multi‐core CPU and GPU on single PC to crack in parallel. GPU contributed to the cracking speed improvement due to the strong computing power for intensive parallel tasks. Experimental results showed that DMCG improved the cracking speed by two orders of magnitude and would exhibit more notable advantages with high‐performance distributed system as the cracking speed improved by three or four orders of magnitude, compared with the computing power of one CPU core. An improved Amdahl's law was first proposed, by which the upper bound of the cracking speedup was analyzed. Aiming to the DMCG expansion of cloud computing based on GPU, a lightweight framework called Dandelion computing model was first proposed. Moreover, the analysis of the influences of the graphics card parameters on the cracking speed was processed, and accordingly, the decision support for choosing graphics card in DMCG based on analytic hierarchy process was provided. Finally, the performance optimization of DMCG was processed. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

2.

基于异构平台的细胞神经网络算法研究

刘昊《电子质量》2010,(12):1-4

随着GPU的发展,其计算能力和访存带宽都超过了CPU,在GPU上进行通用计算具有成本低、性能高的特点。细胞神经网络由于其特有的性质,非常适合利用GPU进行并行计算,因此,该文提出了利用CU-DA实现的基于GPU的细胞神经网络异构算法,并应用在图像边缘检测上。实验结果证明,与传统的利用CPU实现的边缘检测方法相比,在速度上,基于GPU实现的图像边缘检测方法提高了数十倍,为细胞神经网络在实时图像、视频处理上的应用提供了新的方法。相似文献

3.

字符串匹配算法的实现:CPU vs.GPU vs.FPGA

李璋杜慧敏王涌钢《电子科技》2014,27(12):5-8

针对字符串匹配算法在各平台实现的性能问题,将算法在CPU、GPU及FPGA上做了测试对比。GPU具有计算单元多的特点,使得GPU对计算密集型应用有较大的效率提升;而FPGA具有级强的灵活性、可编程性及大量的逻辑运算单元,在处理字符串匹配时的处理速度快。通过对3种实现方式在Snort规则库下做的分析,其结果表明,FPGA的处理速度最快,相比GPU的处理速度提升了10倍。而CPU的串行处理速度最慢,且FPGA的资源消耗最多,GPU次之,CPU的资源消耗最少,且实现最简单。相似文献

4.

基于GPU通用计算平台的乐谱自动识别系统设计

下载免费PDF全文

谢晨伟陆天翼汤勇明《电子器件》2015,38(4)

在GPU通用计算平台上实现了一个钢琴独奏乐曲的乐谱识别系统,它读取WAV格式音频文件,利用GPU通用计算技术加速自相关函数算法来实现音高的识别,并综合考虑短时能量和基音周期的变化进行节拍划分。通过实际测试,验证了该乐谱识别系统的准确性,并证明了GPU并行计算对系统计算效率提升的效果：将计算时间减少到传统CPU计算时间的16%左右。相似文献

5.

基于GPU的AES算法实现

商凯胡艳《电子技术》2011,38(5):9-11

近几年图形处理器GPU的通用计算能力发展迅速,现在已经发展成为具有巨大并行运算能力的多核处理器,而CUDA架构的推出突破了传统GPU开发方式的束缚,把GPU巨大的通用计算能力解放了出来.本文利用GPU来加速AES算法,即利用GPU作为CPU的协处理器,将AES算法在GPU上实现,以提高计算的吞吐量.最后在GPU和CPU... 相似文献

6.

Distributed topology control in large‐scale hybrid RF/FSO networks: SIMT GPU‐based particle swarm optimization approach

Osama Awwad Ala Al‐Fuqaha Ghassen Ben Brahim Bilal Khan Ammar Rayes 《International Journal of Communication Systems》2013,26(7):888-911

The tremendous power of graphics processing unit (GPU) computing relative to prior CPU‐only architectures presents new opportunities for efficient solutions of previously intractable large‐scale optimization problems. Although most previous work in this field focused on scientific applications in the areas of medicine and physics, here we present a Compute Unified Device Architecture‐based (CUDA) GPU solution to solve the topology control problem in hybrid radio frequency and free space optics wireless mesh networks by adapting and adjusting the transmission power and the beam‐width of individual nodes according to QoS requirements. Our approach is based on a stochastic global optimization technique inspired by the social behavior of flocking birds — so‐called ‘particle swarm optimization’ — and was implemented on the NVIDIA GeForce GTX 285 GPU. The implementation achieved a performance speedup factor of 392 over a CPU‐only implementation. Several innovations in the memory/execution structure in our approach enabled us to surpass all prior known particle swarm optimization GPU implementations. Our results provide a promising indication of the viability of GPU‐based approaches towards the solution of large‐scale optimization problems such as those found in radio frequency and free space optics wireless mesh network design. Copyright © 2011 John Wiley & Sons, Ltd. 相似文献

7.

基于GPU的宽带信号时延差与相位差估计方法

毛飞龙焦义文马宏李超高泽夫王育欣《电讯技术》2023,63(11):1779-1789

针对多天线信号合成系统对于宽带、高速、并行信号的实时合成需求,设计了基于图形处理单元（Graphics Processing Unit,GPU）的宽带信号时延差与相位差估计方法,对估计方法中的快速傅里叶变换（Fast Fourier Transform,FFT)、共轭相乘、累加平均等模块进行了相应的线程并行程序设计。为充分发挥GPU的并行运算能力,利用异步流并发的方式对估计方法进一步优化,从结构层面有效提高了数据的并行处理效率。对基于GPU的宽带信号时延差与相位差估计方法进行了实验验证,多次实验测试结果表明,数据量为512 000时,在保证估计正确性的基础上,该方法相比传统串行CPU估计方法约有125倍的加速比,采用该方法可实现对多天线信号参数的实时估计。相似文献

8.

在GPU上实现Jacobi迭代法的分析与设计

吴玫华《电子设计工程》2012,20(10):28-30

随着GPU技术的快速发展,GPU的浮点运算能力飞速提升。将GPU浮点处理能力用于非图形计算领域正成为高性能计算领域的热点研究问题。Jacobi迭代法是科学计算中常用的计算方法。在分析了GPU和Jacobi迭代法特征的基础上,基于Nvidia的CUDA平台设计并实现了Jacobi迭代算法,并通过实验表明,相对于CPU取得了较好的加速效果。相似文献

9.

基于GPU的星图配准算法并行程序设计

下载免费PDF全文

陈茜邱跃洪易红伟《红外与激光工程》2014,43(11):3756-3761

星图配准是星图处理应用中的一个重要步骤,因此星图配准的速度直接影响了星图处理的整体速度.近几年来,图形处理器(GPU)在通用计算领域得到快速的发展.结合GPU在通用计算领域的优势与星图配准面临的处理速度的问题,研究了基于GPU加速处理星图配准的算法.在已有配准算法的基础上,根据算法特点提出了相应的GPU并行设计模型,利用CUDA编程语言进行仿真实验.实验结果表明:相较于传统基于CPU的配准算法,基于GPU的并行设计模型同样达到了配准要求,且配准速度的加速比达到29.043倍. 相似文献

10.

基于GPU通用计算CUDA架构的人体检测技术

周晓阳《电子工程师》2012,(2):41-43

随着计算机硬件技术的高速发展,图形处理器（Graphic processing unit,GPU）通用计算已经发展到颇为成熟阶段,其并行运算速度已远远超过多核CPU。文章简介CUDA架构并验证其在图形处理中的加速能力,对比线性代数运算在CPU与GPU架构下的效率,将CUDA技术应用于智能视频监控人体检测系统中,实验验证其高效性及可行性。最后对CUDA的发展方向进行了展望。相似文献

11.

FPGA and GPU-based acceleration of ML workloads on Amazon cloud - A case study using gradient boosted decision tree library

《Integration, the VLSI Journal》2020

Cloud vendors such as Amazon (AWS) have started to offer FPGAs in addition to GPUs and CPU in their computing on-demand services. In this work we explore design space trade-offs of implementing a state-of-the-art machine learning library for Gradient-boosted decision trees (GBDT) on Amazon cloud and compare the scalability, performance, cost and accuracy with best known CPU and GPU implementations from literature. Our evaluation indicates that depending on the dataset, an FPGA-based implementation of the bottleneck computation kernels yields a speed-up anywhere from 3X to 10X over a GPU and 5X to 33X over a CPU. We show that smaller bin size results in better performance on a FPGA, but even with a bin size of 16 and a fixed point implementation the degradation in terms of accuracy on a FPGA is relatively small, around 1.3%–3.3% compared to a floating point implementation with 256 bins on a CPU or GPU. 相似文献

12.

Efficient GPU and CPU-based LDPC decoders for long codewords

Stefan Gr?nroos Kristian Nybom Jerker Bj?rkqvist 《Analog Integrated Circuits and Signal Processing》2012,73(2):583-595

The next generation DVB-T2, DVB-S2, and DVB-C2 standards for digital television broadcasting specify the use of low-density parity-check (LDPC) codes with codeword lengths of up to 64800 bits. The real-time decoding of these codes on general purpose computing hardware is useful for completely software defined receivers, as well as for testing and simulation purposes. Modern graphics processing units (GPUs) are capable of massively parallel computation, and can in some cases, given carefully designed algorithms, outperform general purpose CPUs (central processing units) by an order of magnitude or more. The main problem in decoding LDPC codes on GPU hardware is that LDPC decoding generates irregular memory accesses, which tend to carry heavy performance penalties (in terms of efficiency) on GPUs. Memory accesses can be efficiently parallelized by decoding several codewords in parallel, as well as by using appropriate data structures. In this article we present the algorithms and data structures used to make log-domain decoding of the long LDPC codes specified by the DVB-T2 standard??at the high data rates required for television broadcasting??possible on a modern GPU. Furthermore, we also describe a similar decoder implemented on a general purpose CPU, and show that high performance LDPC decoders are also possible on modern multi-core CPUs. 相似文献

13.

GPU Accelerated Generation of Digitally Reconstructed Radiographs for 2-D/3-D Image Registration

OM Dorgham SD Laycock MH Fisher 《IEEE transactions on bio-medical engineering》2012,59(9):2594-2603

Recent advances in programming languages for graphics processing units (GPUs) provide developers with a convenient way of implementing applications which can be executed on the CPU and GPU interchangeably. GPUs are becoming relatively cheap, powerful, and widely available hardware components, which can be used to perform intensive calculations. The last decade of hardware performance developments shows that GPU-based computation is progressing significantly faster than CPU-based computation, particularly if one considers the execution of highly parallelisable algorithms. Future predictions illustrate that this trend is likely to continue. In this paper, we introduce a way of accelerating 2-D/3-D image registration by developing a hybrid system which executes on the CPU and utilizes the GPU for parallelizing the generation of digitally reconstructed radiographs (DRRs). Based on the advancements of the GPU over the CPU, it is timely to exploit the benefits of many-core GPU technology by developing algorithms for DRR generation. Although some previous work has investigated the rendering of DRRs using the GPU, this paper investigates approximations which reduce the computational overhead while still maintaining a quality consistent with that needed for 2-D/3-D registration with sufficient accuracy to be clinically acceptable in certain applications of radiation oncology. Furthermore, by comparing implementations of 2-D/3-D registration on the CPU and GPU, we investigate current performance and propose an optimal framework for PC implementations addressing the rigid registration problem. Using this framework, we are able to render DRR images from a 256×256×133 CT volume in ～?24 ms using an NVidia GeForce 8800 GTX and in ～?2 ms using NVidia GeForce GTX 580. In addition to applications requiring fast automatic patient setup, these levels of performance suggest image-guided radiation therapy at video frame rates is technically feasible using relatively low cost PC architecture. 相似文献

14.

FPGA-based acceleration for binary neural networks in edge computing

下载免费PDF全文

《电子科技学刊:英文版》2023,21(2):100204

As a core component in intelligent edge computing, deep neural networks (DNNs) will increasingly play a critically important role in addressing the intelligence-related issues in the industry domain, like smart factories and autonomous driving. Due to the requirement for a large amount of storage space and computing resources, DNNs are unfavorable for resource-constrained edge computing devices, especially for mobile terminals with scarce energy supply. Binarization of DNN has become a promising technology to achieve a high performance with low resource consumption in edge computing. Field-programmable gate array (FPGA)-based acceleration can further improve the computation efficiency to several times higher compared with the central processing unit (CPU) and graphics processing unit (GPU). This paper gives a brief overview of binary neural networks (BNNs) and the corresponding hardware accelerator designs on edge computing environments, and analyzes some significant studies in detail. The performances of some methods are evaluated through the experiment results, and the latest binarization technologies and hardware acceleration methods are tracked. We first give the background of designing BNNs and present the typical types of BNNs. The FPGA implementation technologies of BNNs are then reviewed. Detailed comparison with experimental evaluation on typical BNNs and their FPGA implementation is further conducted. Finally, certain interesting directions are also illustrated as future work. 相似文献

15.

基于GPU加速的地震图像重建技术

许盼兮张东孙尽尧《半导体光电》2013,34(5):852-857

针对目前地层层析成像算法中正演算法存在计算量大、计算速度慢的问题,以图像处理器(GPU)为核心,研究并实现了一种基于GPU平台的时域有限差分(FDTD)正演算法。CUDA是一种由NVIDIA推出的GPU通用并行计算架构,也是目前较为成熟的GPU并行运算架构。而FDTD正演算法本身在算法特性上满足并行的要求,二者的结合将极大地加速程序的计算速度。在基于标准Marmousi速度模型的正演模拟中,程序速度提升30倍,而GPU正演图像与CPU正演结果误差小于千分之一。算例表明CUDA可以大大加速目前的FDTD正演算法,并且随着GPU硬件自身的发展和计算架构的不断改进,加速效果还将进一步提升,这将有利于后续波形反演工作的进展。相似文献

16.

Simulating 3-D lung dynamics using a programmable graphics processing unit.

Anand P Santhanam Felix G Hamza-Lup Jannick P Rolland 《IEEE transactions on information technology in biomedicine》2007,11(5):497-506

Medical simulations of lung dynamics promise to be effective tools for teaching and training clinical and surgical procedures related to lungs. Their effectiveness may be greatly enhanced when visualized in an augmented reality (AR) environment. However, the computational requirements of AR environments limit the availability of the central processing unit (CPU) for the lung dynamics simulation for different breathing conditions. In this paper, we present a method for computing lung deformations in real time by taking advantage of the programmable graphics processing unit (GPU). This will save the CPU time for other AR-associated tasks such as tracking, communication, and interaction management. An approach for the simulations of the three-dimensional (3-D) lung dynamics using Green's formulation in the case of upright position is taken into consideration. We extend this approach to other orientations as well as the subsequent changes in breathing. Specifically, the proposed extension presents a computational optimization and its implementation in a GPU. Results show that the computational requirements for simulating the deformation of a 3-D lung model are significantly reduced for point-based rendering. 相似文献

17.

基于CUDA的多路高清视频流解码器设计与实现

唐昆鹏陈庆奎《电子科技》2016,29(4):71

针对多视频流解码和显示时CPU占用率过高等问题。设计了基于统一计算设备架构(CUDA)平台上的GPU多视频流并行化处理方案,定义了表示GPU显卡设备和解码器的数据结构,通过解码函数接口的调用可适用于多种视频播放器中去。实验结果表明,所设计的解码器大幅降低了多视频解码显示中CPU的占用率,同时与JM实现的软件解码方案相比,解码单路720 p的高清视频CPU占用率同比降低约30%,所以此硬件解码方案表现出更加高效的多视频流解码处理能力。提高了系统性能和资源复用率,并能保持较低的能量消耗。相似文献

18.

High-performance template tracking

R. Cabido M. Martínez-Zarzuela 《Journal of Visual Communication and Image Representation》2012,23(2):271-286

Tracking systems are important in computer vision, with applications in video surveillance, human computer interfaces (HCI), etc. Consumer graphics processing units (GPUs) have experienced an extraordinary evolution in both computing performance and programmability, leading to a greater use of the GPU for non-rendering applications, such as image processing and computer vision tasks. In this work we show an effective particle filtering implementation for real-time template tracking based on the use of a graphics card as a streaming architecture in a translation-rotation-scale model. 相似文献

19.

基于GPU的MTD性能优化

杨千禾袁子乔扈月松《火控雷达技术》2021,50(1):86-93

为了解决传统雷达信号处理机在研发阶段面临的调试困难,计算能力受硬件限制及程序复用性差等问题,本文提出了使用GPU作为雷达计算核心的方案.在使用GPU实现雷达信号处理算法的过程中,动目标检测(MTD)部分的优化效果远低于脉冲压缩和恒虚警检测.经过分析,MTD过程中的矩阵转置与向量点乘占据了算法的大量时间.本文从GPU的数... 相似文献

20.

Towards accelerating irregular EDA applications with GPUs

Hao Qian^{Author Vitae} Yangdong DengAuthor VitaeBo WangAuthor Vitae Shuai MuAuthor Vitae 《Integration, the VLSI Journal》2012,45(1):46-60

Recently graphic processing units (GPUs) are rising as a new vehicle for high-performance, general purpose computing. It is attractive to unleash the power of GPU for Electronic Design Automation (EDA) computations to cut the design turn-around time of VLSI systems. EDA algorithms, however, generally depend on irregular data structures such as sparse matrix and graphs, which pose major challenges for efficient GPU implementations. In this paper, we propose high-performance GPU implementations for a set of important irregular EDA computing patterns including sparse matrix, graph algorithms and message-passing algorithms. In the sparse matrix domain, we solve a core problem, sparse-matrix vector product (SMVP). On a wide range of EDA problem instances, our SMVP implementation outperforms all prior work and achieves a speedup up to 50× over the CPU baseline implementation. The GPU based SMVP procedure is applied to successfully accelerate two core EDA computing engines, timing analysis and linear system solution. In the graph algorithm domain, we developed a SMVP based formulation to efficiently solve the breadth-first search (BFS) problem on GPUs. We also developed efficient solutions for two message-passing algorithms, survey propagation (SP) based SAT solution and a register-transfer level (RTL) simulation. Our results prove that GPUs have a strong potential to accelerate EDA computing through designing GPU-friendly algorithms and/or re-organizing computing structures of sequential algorithms. 相似文献