首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 125 毫秒
1.
基于GPU的MD5高速解密算法的实现   总被引:1,自引:1,他引:1       下载免费PDF全文
乐德广  常晋义  刘祥南  郭东辉 《计算机工程》2010,36(11):154-155,158
MD5快速碰撞算法由于不支持逆向过程而无法在MD5密码攻击中得到实际应用。针对上述问题,通过分析基于图形处理单元(GPU)的MD5密码并行攻击算法原理,设计基于GPU的MD5高速解密算法,在此基础上实现一个MD5高速密码攻击系统。测试结果证明,该算法能有效加快MD5密码破解速度。  相似文献   

2.
近年来GPU作为一种具有极强运算能力的多核处理器,得到了快速的发展,成为高性能计算领域的主要发展方向。各种分子动力学模拟的主流软件也纷纷使用GPU技术,其中LAMMPS较早地开发出了通用的并行GPU版本。本文利用nVIDIA公司最新Femi架构的Tesla C2050 GPU搭建了小型的基于LAMMPS的分子动力学模拟GPU并行计算集群,通过氩原子熔化的算例对集群性能进行了测试,测试的内容包括CPU集群、单节点单GPU、单节点多GPU以及多节点GPU集群。比较了各种情况的加速倍数并对造成性能改变的原因进行了讨论,分析了用于MD模拟的GPU并行计算集群性能的瓶颈所在,提出可能的解决方法,搭建集群时,充分考虑PCI总线的承受能力,对于集群效率的提高有很大好处。测试结果表明,集群的性能较高,相对于以往的单机以及CPU集群,计算的规模大大提高了,加速比也在20倍以上。可以预测,在未来一段时间内,多GPU并行是分子动力学模拟的发展方向。  相似文献   

3.
介绍布隆过滤器的相关理论,对MD5哈希算法进行较为详细的分析,对GPU和CPU的结构及运算特点进行分析比较,提出一种基于布隆过滤器并使用GPU进行URL的MD5计算的网页搜索去重方法。  相似文献   

4.
张润梅  王霄 《计算机科学》2011,38(2):302-305
由于内存、运算速度以及磁盘空间的限制,暴力破解MD5几乎无法在PC机上实现。CUDA意在使GPU的超高计算性能在数据处理和科学计算等通用计算领域发挥优势。主要研究基于CUD八架构的MD5破解方法,并使用VS2005与NVCC进行混合编译。实验选择在GeForce9600UT显卡和四核CPUQ660。上分别运行所提程序和标准C语言版程序。结果表明,在高计算负荷与巨量数据情况下,中低端显卡的计算速度比高端CPU高30~50。倍。CUDA使GPU流处理器阵列的性能得到充分发挥,极大地提高了并行计算程序的效率。  相似文献   

5.
分子动力学模拟(MD)是分子模拟的一类常用方法,为生物体系的模拟提供了重要途径。由于计算强度大,目前MD可模拟的时空尺度还不能满足真实物理过程的需要。作为CPU的加速设备,近年来,GPU为提高MD计算能力提供了新的可能。GPU编程难点主要在于如何将计算任务分解并映射到GPU端并合理组织线程及存储器,细致地平衡数据传输和指令吞吐量以发挥GPU的最大计算性能。静电效应是长程作用,广泛存在于生物现象的各个方面,对其精确模拟是MD的重要组成部分。Particle-Mesh-Ewald(PME)方法是公认的精确处理静电作用的算法之一。本文介绍在本实验室已建立的GPU加速分子动力学模拟程序GMD的基础上,基于NVIDIACUDA,采用GPU实现PME算法的策略,针对算法中组成静电作用的三个部分即实空间、傅立叶空间和能量修正项,分别采用不同的计算任务组织策略以提升整体性能。使用事实上的标准算例dhfr进行的测试结果表明,实现PME的GMD程序,性能分别是Gromacs4.5.3版单核CPU的3.93倍,8核CPU的1.5倍,基于OpenMM2.0加速的Gromacs4.5.3GPU版本的1.87倍。  相似文献   

6.
针对当前密码口令破解功能单一、破解算法种类不足、效率低下等技术问题,从基于GPU的多策略散列算法并行计算技术、基于ASIC的复杂密码破解技术和基于异构计算集群的多种计算资源调度技术三个方面展开研究,在此基础上构建多手段密码口令破解服务平台,实现密码破解的大复杂度运算和高速破解。  相似文献   

7.
使用GPU加速分子动力学模拟中的非绑定力计算   总被引:1,自引:0,他引:1  
在分子动力学模拟(MD)中,对非绑定力的计算需要花费大量的时间。本文提出了基于CUDA和Brook+的两种双精度算法,分别在NVIDIA和AMD两款主流GPU上实现了非绑定力的计算,借助GPU的计算能力加速了整个MD程序。算法对MD进行了任务分割,采用区域分解的方法将非绑定力的计算映射到GPU的计算核心上,同时针对两款GPU的各自特点提出了线程块内共享存储、最小化数据集两种优化方法。性能测试结果表明,与Intel Xeon 2.6GHzCPU的单核相比,43.2万粒子的高速粒子碰撞模拟,在配置NVIDIA Tesla C1060的系统上性能提高了6.5倍,在配置AMD HD4870的系统上性能提高了4.8倍。  相似文献   

8.
针对海量、大型遥感数据处理的难题,采用在集群计算环境中使用分布式存储、P2P通信,并与GPU相结合的策略,提出了一个基于改进集群计算的遥感数据快速处理平台。分析了该平台的体系结构与结构模型,并对其进行详细论述,最后,对平台的性能进行了分析。  相似文献   

9.
作为高性能科学计算的典型应用,利用GPU并行加速分子动力学模拟是2007年以来计算化学领域高性能计算的热点。本文概述了支持GPU加速的不同MD软件的特点和其研究进展,重点分析了Amber、GROMACS、ACEMD三个代表性软件的单GPU卡和多GPU卡计算性能,结果表明在配置相同数目GPU卡的情况下,单节点比多节点在计算性能上较有优势,桌面工作站配多块GPU卡是性价比相对较好的MD模拟计算模式。本文还考察了单精度和双精度GPU加速MD的模拟计算结果的准确性,与CPU的计算结果进行了比较,结果表明,GPU的计算结果总体而言是可信的。最后,本文对GPU并行加速MD模拟的研究现状进行总结并对未来发展做了展望。  相似文献   

10.
基于GPU的并行集群系统的各类产品遍布我国的生产,生活。本文将介绍GPU的并行集群的技术和其在我国的发展状况。  相似文献   

11.
Reactive force field (ReaxFF), a recent and novel bond order potential, allows for reactive molecular dynamics (ReaxFF MD) simulations for modeling larger and more complex molecular systems involving chemical reactions when compared with computation intensive quantum mechanical methods. However, ReaxFF MD can be approximately 10–50 times slower than classical MD due to its explicit modeling of bond forming and breaking, the dynamic charge equilibration at each time-step, and its one order smaller time-step than the classical MD, all of which pose significant computational challenges in simulation capability to reach spatio-temporal scales of nanometers and nanoseconds. The very recent advances of graphics processing unit (GPU) provide not only highly favorable performance for GPU enabled MD programs compared with CPU implementations but also an opportunity to manage with the computing power and memory demanding nature imposed on computer hardware by ReaxFF MD. In this paper, we present the algorithms of GMD-Reax, the first GPU enabled ReaxFF MD program with significantly improved performance surpassing CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with a NVIDIA C2050 GPU for coal pyrolysis simulation systems with atoms ranging from 1378 to 27,283. GMD-Reax achieved speedups as high as 12 times faster than Duin et al.’s FORTRAN codes in Lammps on 8 CPU cores and 6 times faster than the Lammps’ C codes based on PuReMD in terms of the simulation time per time-step averaged over 100 steps. GMD-Reax could be used as a new and efficient computational tool for exploiting very complex molecular reactions via ReaxFF MD simulation on desktop workstations.  相似文献   

12.
Molecular dynamics (MD) is an important research tool extensively applied in materials science. Running MD on a graphics processing unit (GPU) is an attractive new approach for accelerating MD simulations. Currently, GPU implementations of MD usually run in a one-host-process-one-GPU (OHPOG) scheme. This scheme may pose a limitation on the system size that an implementation can handle due to the small device memory relative to the host memory. In this paper, we present a one-host-process-multiple-GPU (OHPMG) implementation of MD with embedded-atom-model or semi-empirical tight-binding many-body potentials. Because more device memory is available in an OHPMG process, the system size that can be handled is increased to a few million or more atoms. In comparison with the serial CPU implementation, in which Newton’s third law is applied to improve the computational efficiency, our OHPMG implementation has achieved a 28.9x–86.0x speedup in double precision, depending on the system size, the cut-off ranges and the number of GPUs. The implementation can also handle a group of small simulation boxes in one run by combining the small boxes into a large box. This approach greatly improves the GPU computing efficiency when a large number of MD simulations for small boxes are needed for statistical purposes.  相似文献   

13.
分子动力学模拟是对微观分子原子体系在时间与空间上的运动模拟,是从微观本质上认识体系宏观性质的有力方法.针对如何提升分子动力学并行模拟性能的问题,本文以著名软件GROMACS为例,分析其在分子动力学模拟并行计算方面的实现策略,结合分子动力学模拟关键原理与测试实例,提出MPI+OpenMP并行环境下计算性能的优化策略,为并行计算环境下实现分子动力学模拟的最优化计算性能提供理论和实践参考.对GPU异构并行环境下如何进行MPI、OpenMP、GPU搭配选择以达到性能最优,本文亦给出了一定的理论和实例参考.  相似文献   

14.
wpa/wpa2-psk高速暴力破解器的设计和实现   总被引:1,自引:0,他引:1       下载免费PDF全文
针对基于单核CPU的wpa/wpa2-psk暴力破解器破解速度慢的缺点,提出一种分布式多核CPU加GPU的高速暴力破解器.采用分布式技术将密钥列表合理地分配到各台机器上,在单机上利用多核CPU和GPU形成多个计算核心并行破解,利用GPU计算密集型并行任务强大的计算能力提高破解速度.实验结果证明,该暴力破解器的破解速度相...  相似文献   

15.
汤小春  朱紫钰  毛安琪  符莹  李战怀 《软件学报》2022,33(12):4429-4451
数据密集型作业包含大量的任务,使用GPU设备来提高任务的性能是目前的主要手段.但是,在解决数据密集型作业之间的GPU资源公平共享以及降低任务所需数据在网络间的传输代价方面,现有的研究方法没有综合考虑资源公平与数据传输代价的矛盾.分析了GPU集群资源调度的特点,提出了一种基于最小代价最大任务数的GPU集群资源调度算法,解决了GPU资源的公平分配与数据传输代价较高的矛盾.将调度过程分为两个阶段:第1阶段为各个作业按照数据传输代价给出自己的最优方案;第2阶段为资源分配器合并各个作业的方案,按照公平性给出全局的最优方案.首先,给出了GPU集群资源调度框架的总体结构,各个作业给出自己的最优方案,资源分配进行全局优化;第二,给出了网络带宽估计策略以及计算任务的数据传输代价的方法;第三,给出了基于GPU数量的资源公平分配的基本算法;第四,提出了最小代价最大任务数的资源调度算法,描述了资源非抢夺、抢夺以及不考虑资源公平策略的实现策略;最后,设计了6种数据密集型计算作业,对所提出的算法进行了实验.通过实验验证,最小代价最大任务数的资源调度算法对于资源公平性能够达到90%左右,同时亦能保证作业并行运行时间最小.  相似文献   

16.
对称矩阵三对角化是求解稠密特征问题的关键计算过程.针对GPU集群采用了MPI(message passing interface)和GPU级2级并行方法设计实现了基于MPI和CUDA(compute unified device architecture )的稠密对称矩阵三对角化算法.在MPI集群级并行中,通过将2维通信域中行-列通信域间的全局数据通信设计为完全并行的点-点数据通信方式,改善了三对角化MPI并行算法的通信性能.通过改进原矩阵三对角化的MPI并行算法,避免了在GPU级并行中使用的不规则的矩阵-向量运算,这部分的并行性能提升了1倍左右.并且,将在GPU并行中存在的小粒度计算合并为较大粒度计算,该策略可通过加大计算密集度来充分地发挥GPU的计算能力,增加GPU的利用率,从而提升了算法的性能.此外,利用多个CUDA流使算法中独立的CUDA操作可以在不同的流中并发执行.并且,在并行算法中,利用CPU与GPU之间的异步数据传输,使得在不同流中的数据传输和核函数同时执行,隐藏了数据传输的时间,进一步提升了算法的性能.在中国科学院超级计算机系统“元”上,使用Nvidia Tesla K20 GPGPU测试了不同规模矩阵的基于MPI+CUDA的三对角化并行块算法的性能,取得了较好的加速效果与性能,并且具有良好的可扩展性.  相似文献   

17.
The parallel computation capabilities of modern graphics processing units (GPUs) have attracted increasing attention from researchers and engineers who have been conducting high computational throughput studies. However, current single GPU based engineering solutions are often struggling to fulfill their real-time requirements. Thus, the multi-GPU-based approach has become a popular and cost-effective choice for tackling the demands. In those cases, the computational load balancing over multiple GPU “nodes” is often the key and bottleneck that affect the quality and performance of the real-time system. The existing load balancing approaches are mainly based on the assumption that all GPU nodes in the same computer framework are of equal computational performance, which is often not the case due to cluster design and other legacy issues. This paper presents a novel dynamic load balancing (DLB) model for rapid data division and allocation on heterogeneous GPU nodes based on an innovative fuzzy neural network (FNN). In this research, a 5-state parameter feedback mechanism defining the overall cluster and node performance is proposed. The corresponding FNN-based DLB model will be capable of monitoring and predicting individual node performance under different workload scenarios. A real-time adaptive scheduler has been devised to reorganize the data inputs to each node when necessary to maintain their runtime computational performance. The devised model has been implemented on two dimensional (2D) discrete wavelet transform (DWT) applications for evaluation. Experiment results show that this DLB model enables a high computational throughput while ensuring real-time and precision requirements from complex computational tasks.  相似文献   

18.
The Building-Cube Method (BCM) based on equally-spaced Cartesian meshes has been proposed as a next generation CFD method. Due to the equally-spaced meshes, it is well suited for highly parallel computation. This paper proposes a parallel implementation scheme of BCM on a GPU cluster system, which needs efficient hierarchical parallel processing to exploit the potential of the cluster system. The proposed scheme employs the Red-Black SOR method for the pressure calculations, which is the most time-consuming part of BCM, to obtain massive data parallelism of BCM. By exploiting the coarse-grain and fine-grain parallelism of BCM, the proposed scheme hierarchically assigns equally-divided tasks into the GPU cluster system. Furthermore, to exploit the computational power of GPUs in the cluster system, the proposed scheme employs an efficient data management such as coalesced data transfer and reusing data on an on-chip memory. Experimental results show that the single GPU implementation can achieve about three times higher performance than the single CPU one. Moreover, the multiple GPU implementation can achieve an almost ideal scalability. Finally, the possibility of further acceleration of not only the pressure calculation but also the whole BCM is discussed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号