Similar literature
18 similar articles found
1.
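In molecular dynamics simulation systems, computing short-range intermolecular forces requires frequent transfers of, and accesses to, large amounts of particle data. FPGAs can be used to accelerate the computation and reduce the load on the CPU. In an FPGA-based molecular dynamics simulation system, however, the short-range force computation module faces heavy data-transfer pressure and memory access conflicts. To address these problems within the limited hardware resources of an FPGA, an interactive control system is proposed. The system consists of a data-fetch control module and a particle data parsing module. Through careful data arrangement and cooperation between the two modules, the system transfers particle data quickly and reliably from on-chip memory to the short-range force computation module. Hardware simulation and board-level experiments verify the effectiveness and reliability of the system in handling particle data.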

2.
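To improve the efficiency of short-range force computation in molecular dynamics simulation, an FPGA-based multi-pipeline short-range force computation system is designed and implemented. During multi-pipeline computation, several computation modules frequently request large amounts of particle information, which leads to high bandwidth demand and memory access conflicts. To address this, a multi-pipeline data prefetching system is proposed that reduces repeated reads of particle data, alleviates access conflicts, and keeps the computation modules efficient. Experiments on a Xilinx Virtex UltraScale+ HBM VCU128 FPGA development board show that the multi-pipeline system improves computational efficiency by a factor of 3.29 over the single-pipeline system, and also verify the effectiveness of the multi-pipeline data prefetching system.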

3.
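Traditional RTL-level FPGA development is difficult to program and slow to design, which limits the use of FPGAs in molecular dynamics simulation. This paper uses high-level synthesis (HLS), a newer FPGA programming approach, to design and implement a short-range non-bonded force accelerator for molecular dynamics on the Alveo U50 card, a reconfigurable computing platform. Starting from the design and optimization of the particle pairer and the design of the computation pipeline, it develops a reconfigurable computing method with high efficiency and low energy consumption. For the dynamic data flow that arises in non-bonded force computation, a combined HLS + HDL design method is proposed that greatly shortens design time while preserving accelerator performance.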

4.
Zhang Shuai, Xu Shun, Liu Qian, Jin Zhong. 《计算机科学》 (Computer Science), 2018, 45(10): 291-294, 299
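Molecular dynamics simulation is complex in both space and time, so accelerating it through parallelism is particularly important. Based on the data-parallel architecture of GPU hardware, this work combines the atom-decomposition and spatial-decomposition parallel strategies of molecular dynamics simulation to optimize an implementation of the Cell Verlet algorithm for short-range force computation, and it optimizes and analyzes the performance of GPU implementations of the core algorithms of molecular dynamics. The Cell Verlet implementation first uses atom decomposition to map the computation for each particle to one GPU thread, and uses spatial decomposition to divide the simulation region into cells and build a cell index table that locates particles in the simulation space in real time. When computing inter-particle forces, it introduces a Hilbert space-filling curve to preserve the local correlation between the linear storage of the data and its three-dimensional spatial distribution, so that caching speeds up GPU global memory access. It also uses memory address alignment and intra-block sharing to optimize the GPU molecular dynamics simulation. Tests and comparative analysis show that the implementation offers advantages such as strong scalability and speedup.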
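The cell index table mentioned in this abstract can be illustrated with a small CPU-only sketch. It is not the authors' GPU implementation: the function names, the cubic periodic box, and the use of plain Python lists are all assumptions made for illustration. It shows only the general cell-list idea, in which particles are binned into cells at least one cutoff wide so that each particle is compared only with particles in its own and adjacent cells.

```python
from collections import defaultdict
from itertools import product


def min_image(d, box):
    """Wrap a coordinate difference into [-box/2, box/2] (periodic box)."""
    return d - box * round(d / box)


def build_cells(positions, box, cutoff):
    """Bin particle indices by cell; each cell is at least `cutoff` wide."""
    n_cells = max(1, int(box // cutoff))      # cells per box edge
    cell_size = box / n_cells
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (int(x / cell_size) % n_cells,
               int(y / cell_size) % n_cells,
               int(z / cell_size) % n_cells)
        cells[key].append(i)
    return cells, n_cells


def neighbor_pairs(positions, box, cutoff):
    """Return the set of pairs (i, j), i < j, closer than `cutoff`."""
    cells, n = build_cells(positions, box, cutoff)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 periodic neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            other = cells.get(((cx + dx) % n, (cy + dy) % n, (cz + dz) % n), ())
            for i in members:
                for j in other:
                    if i < j:
                        d2 = sum(min_image(a - b, box) ** 2
                                 for a, b in zip(positions[i], positions[j]))
                        if d2 < cutoff ** 2:
                            pairs.add((i, j))
    return pairs
```

In a GPU version of this idea, each particle would typically be handled by its own thread, and particle data would be reordered in memory so that particles in nearby cells are also nearby in storage, which is the role the abstract assigns to the Hilbert curve.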

5.
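Dissipative particle dynamics (DPD) simulation is an important computational method for studying hydrodynamic properties. A large-scale DPD simulation is designed and implemented on the Intel MIC platform, combining the characteristics of DPD simulation itself with the features of the MIC platform. The key code for neighbor list construction and short-range force computation is vectorized, a load-balancing mechanism distributes computational tasks between the CPU and the MIC coprocessor, and load-balancing control of the number of threads within an MPI process is supported. Performance is compared and analyzed both on a prototype program and in a LAMMPS integration. The results show that the optimizations are effective and lay a foundation for further work on molecular dynamics for the MIC many-core platform.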

6.
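Accelerating non-bonded force computation in molecular dynamics simulation using GPUs   Cited by: 1 (self-citations: 0, citations by others: 1)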
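In molecular dynamics (MD) simulation, computing non-bonded forces takes a large amount of time. This paper proposes two double-precision algorithms, based on CUDA and Brook+, that implement non-bonded force computation on two mainstream GPUs from NVIDIA and AMD and use GPU computing power to accelerate the whole MD program. The algorithms partition the MD tasks and use domain decomposition to map non-bonded force computation onto the GPU compute cores. Two optimizations, shared memory within a thread block and data set minimization, are proposed to suit the characteristics of the two GPUs. Performance tests show that, compared with a single core of an Intel Xeon 2.6 GHz CPU, a high-speed particle collision simulation with 432,000 particles improves performance by a factor of 6.5 on a system with an NVIDIA Tesla C1060 and by a factor of 4.8 on a system with an AMD HD4870.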

7.
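LAMMPS is widely used open-source software for molecular dynamics simulation and related problems, and it can be used to study the properties of solids and liquids. It supports GPU acceleration through CUDA and OpenCL. Because OpenCL is cross-platform, it is the focus of this study. The design principles to observe in OpenCL kernel programming are summarized, and a modified Amdahl's law is described for measuring the theoretical speedup of heterogeneous platforms. The performance of LAMMPS short-range force computation is measured on the Y485P platform. Optimizing the kernel code for the key parts of short-range force computation, such as neighbor list construction and the force computation itself, yields better acceleration.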

8.
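Particle simulation methods such as classical molecular dynamics and the particle-in-cell (PIC) method share common features, including dynamic particle movement and good locality of particle computation. Based on these features, a particle-quantity patch data object is first proposed. The object is a group of particles on a single mesh patch, where a patch is a rectangular region containing multiple mesh cells. Parallel algorithms are then designed, covering particle migration and data exchange between objects as well as dynamic load balancing. Finally, these are implemented on the JASMIN framework, and a parallel classical molecular dynamics program and a parallel PIC program are developed. Measurements on 64 processors show that the parallel PIC program achieves 80% parallel efficiency when simulating a complex physical model with 3 million mesh cells and 20 million particles.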

9.
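GPU-based computation of van der Waals non-bonded interactions in molecular dynamics simulation   Cited by: 1 (self-citations: 1, citations by others: 0)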
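GPUs were originally designed for graphics rendering. In recent years they have evolved into highly parallel, multithreaded, general-purpose multi-core processors with powerful computing capability and very high memory bandwidth; the peak computing capability of current mainstream GPUs is typically tens of times that of CPUs. This offers a new way to tackle computationally demanding problems. Molecular dynamics simulation requires very high computing capability, so using GPUs for it is a natural choice. Based on an NVIDIA GeForce GTX295 GPU and the CUDA 2.3 development environment, this paper implements van der Waals force computation, van der Waals potential energy computation, and grid-based neighbor search. The neighbor search implementation uses different strategies for GPUs of different compute capability. Tests on a polyethylene polymer system of 360,000 particles show that the results for one time step agree with those of GROMACS, a molecular dynamics package known for its performance (run on an Intel Xeon E5405 workstation), and that performance improves substantially over a single CPU core: neighbor search is accelerated 17 times and van der Waals force computation 47 times. The boundary problem in neighbor search is also solved. Although this paper targets van der Waals force computation, the strategy is general and can serve as a reference for researchers in other fields. The results show that using GPUs to accelerate large-scale computation is worthwhile.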

10.
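Molecular dynamics simulation is commonly used to study the thermodynamic properties of crystalline silicon. Because the atoms interact through complex many-body potentials, such simulations usually face a high computational load, which limits the time and space scales that can be simulated. Graphics processing units (GPUs) use parallel multithreading for compute-intensive tasks and show great potential in molecular dynamics simulation. Making full use of the GPU hardware architecture to extend the spatiotemporal scale of molecular dynamics simulation of solid covalent crystalline silicon is therefore important for research on its heat conduction mechanisms. Based on the molecular dynamics simulation algorithm for solid covalent crystalline silicon, the design and optimization of a fixed-neighbor algorithm for GPU platforms is proposed. Data structure and branch structure optimizations address the time cost of global memory access and branching in the fixed-neighbor algorithm, reducing memory access overhead and branch divergence. Changing the parallel thread scheduling achieves high-performance parallel computation on the GPU platform and effectively resolves the computational load problem. Experimental results show that the speedup ratio between LAMMPS double-precision molecular dynamics simulation of solid crystalline silicon and the double-precision fixed-neighbor algorithm is 11.62, and that the speedup ratios between HOOMD-blue double-precision simulation and the double-precision and single-precision fixed-neighbor algorithms are 9.39 and 12.18, respectively.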

11.
We examine parallel algorithms for molecular dynamics simulations involving long-range induction interactions. The algorithms are tested by performing molecular dynamics simulations of water with an intermolecular potential that explicitly includes contributions from pair, three-body and induction interactions. Both cyclic and balanced force decomposition methods are implemented to decompose the parallelizable components of induction, pair and three-body interactions using a message passing interface. We report that more than 90% of the induction calculation, and 98% of the total calculation, can be effectively parallelized. A reasonably good speedup of 15.7 times and an efficiency of 49.1% are obtained on 32 processors with the balanced force decomposition algorithm.
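The two figures quoted in this abstract are related by the usual definition of parallel efficiency, speedup divided by processor count. The sketch below checks that relation and, as an added assumption not made in the abstract, evaluates the classic Amdahl's law bound for a 98% parallelizable workload. The function names are illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Efficiency = measured speedup / number of processors."""
    return speedup / processors


def amdahl_speedup(parallel_fraction, processors):
    """Classic Amdahl's law bound (an assumption; not used in the abstract)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


efficiency = parallel_efficiency(15.7, 32)   # 15.7 / 32 = 0.490625
bound = amdahl_speedup(0.98, 32)             # 1 / (0.02 + 0.98 / 32)

print(f"efficiency on 32 processors: {efficiency:.1%}")
print(f"Amdahl bound for 98% parallel code: {bound:.2f}x")
```

Under that assumption the bound is roughly 19.75, so the reported speedup of 15.7 sits below what a 2% serial fraction alone would permit. The remaining gap would have to come from costs that Amdahl's law ignores, such as communication and load imbalance.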

12.
《Advanced Robotics》2013,27(9):977-993
The purpose of this paper is to suggest a new three-dimensional multi-link system dynamics simulator. Previously proposed methods generally used a 'Jacobian' matrix to calculate the dynamics of a multi-link system; however, this approach has many difficulties related to the computational cost and precision of the calculations. For example, there are the problems of singularities and of the accumulation of calculation errors when proceeding from the root to the end of the link, and the problems of treating external force effects in the angular space dynamics. In this study, we consider a multi-link system as a multi-particle movement system in which the particles are connected by a kind of spring-damper model that imitates a link. Using this mechanism, the Jacobian matrix is not required in the dynamics simulation, and the complexity of the link dynamics simulation is dramatically decreased by introducing a rotational plane concept. We confirm the effectiveness of the simulator through dynamics simulations such as walking.

13.
We developed MDGRAPE-2, a hardware accelerator that calculates forces at high speed in molecular dynamics (MD) simulations. MDGRAPE-2 is connected to a PC or a workstation as an extension board. The sustained performance of one MDGRAPE-2 board is 15 Gflops, roughly equivalent to the peak performance of the fastest supercomputer processing element. One board is able to calculate all forces between 10,000 particles in 0.28 s (i.e. 310,000 time steps per day). If 16 boards are connected to one computer and operated in parallel, the calculation becomes ∼10 times faster. In addition to MD, MDGRAPE-2 can be applied to gravitational N-body simulations, the vortex method and smoothed particle hydrodynamics in computational fluid dynamics.
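The throughput figure in this abstract follows from simple arithmetic, sketched below. The step time and particle count come from the abstract. The pair count assumes that every particle evaluates the force from every other particle, without exploiting Newton's third law; the abstract does not state this.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_day(seconds_per_step):
    """Number of time steps that fit in one day at a fixed step cost."""
    return SECONDS_PER_DAY / seconds_per_step


def ordered_pairs(n_particles):
    """Pair evaluations per step, assuming all N*(N-1) ordered pairs are computed."""
    return n_particles * (n_particles - 1)


steps = steps_per_day(0.28)      # about 308,571; the abstract rounds to 310,000
pairs = ordered_pairs(10_000)    # 99,990,000 under the stated assumption

print(f"{steps:,.0f} time steps per day")
print(f"{pairs:,} pair evaluations per step")
print(f"{pairs / 0.28:.2e} pair evaluations per second")
```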

14.
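Complex event processing is a technique for analyzing event streams in dynamic environments. It is usually implemented with finite state automata. During matching, a large number of overlapping partial matches arise on the event stream, and the automaton must maintain many duplicate matching states, so methods based on this technique all suffer from redundant computation. To improve matching efficiency, a method that implements complex event processing using complex event instance covering is proposed. A chained, partitioned storage structure for temporary matches, together with a matching algorithm built on it, uses complex event instance covering to reduce redundant computation and thereby improve matching efficiency. Experiments and analysis on simulated and real data sets compare the method with two commonly used complex event processing techniques. The results show that the proposed method effectively reduces redundant computation during matching while preserving matching correctness, improving overall matching efficiency.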

15.
In molecular dynamics (MD) simulations, calculations of potentials and their derivatives with respect to coordinates, i.e., forces, in a pairwise additive manner, such as the Lennard–Jones interactions and the short-range part of the Coulombic interactions, form the main part of the arithmetic operations. It is essential to achieve high thread-level parallelization efficiency for these pairwise additive calculations of potentials and forces to use current supercomputers with many-core architectures effectively. In this paper, we propose four new thread-level parallelization algorithms for the pairwise additive potential and force calculations. We implement the four codes in an MD calculation code based on the fast multipole method. Performance benchmarks were taken on the FX100 supercomputer and the Intel Xeon Phi coprocessor. The code succeeds in achieving high thread-level parallelization efficiency with 32 threads on the FX100 and up to 60 threads on the Xeon Phi.
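For readers unfamiliar with pairwise additive forces, the following serial sketch evaluates the standard 12-6 Lennard-Jones potential, U(r) = 4ε[(σ/r)^12 − (σ/r)^6], and the corresponding force for every pair of particles. It is not one of the four algorithms proposed in the paper. The function names, the reduced units (ε = σ = 1), and the absence of a cutoff or periodic boundaries are assumptions made for illustration.

```python
import math


def lj_potential(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair potential U(r)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dU/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def total_forces(positions, epsilon=1.0, sigma=1.0):
    """Accumulate pairwise forces and the total potential energy."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            f = lj_force(r, epsilon, sigma)
            energy += lj_potential(r, epsilon, sigma)
            for k in range(3):
                fk = f * d[k] / r
                forces[i][k] += fk   # force on i from j
                forces[j][k] -= fk   # Newton's third law: equal and opposite
    return forces, energy
```

The update of `forces[j]` inside the loop over `i` is what makes thread-level parallelization nontrivial: if different threads handle different values of `i`, they can write to the same `forces[j]` entry at the same time, so a parallel version has to avoid or serialize those writes.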

16.
Visual sensor networks require low-power compression of the large amounts of video data at each camera node because of their energy-constrained and bandwidth-limited environments. In this paper, an energy-efficient architecture for Variable Block Size Motion Estimation is proposed to fully utilize the dynamic partial reconfiguration capability of programmable hardware fabric in distributed embedded vision processing nodes. Partial reconfiguration of the FPGA is exploited to support run-time reconfiguration of the proposed modular hardware architecture for motion estimation. According to the required search range, hardware reconfiguration is performed adaptively to reduce hardware resources and power consumption. A reconfigurable ME, ranging from a simple 1-D to a complex 2-D Sum of Absolute Differences (SAD) array that performs full search block matching, is selected in order to support different search window sizes. The implemented scalable SAD array can provide different resolutions and frame rates for real-time applications with multiple reconfigurable regions.
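Full search block matching with a Sum of Absolute Differences (SAD) cost, which the abstract implements as a hardware array, can be sketched in software as follows. This is a plain Python illustration with assumed function names and frames stored as lists of rows; it says nothing about the paper's hardware design. The `search_range` parameter plays the role of the search window size that the abstract varies.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current, reference, top, left, size, search_range):
    """Try every displacement within +/- search_range and keep the lowest SAD.

    Returns (cost, dy, dx), or None if no candidate lies inside the frame.
    """
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

The number of candidates grows as (2 × search_range + 1)², which is why a smaller search window needs fewer SAD evaluations and, in a hardware setting like the one described, fewer resources.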

17.
A mathematical formulation for the 3D vortex method has been developed for calculation using the special-purpose computer MDGRAPE-2, which was originally designed for molecular dynamics simulations. We assessed this hardware on a few representative problems and compared the results with and without it. It is found that generating appropriate function tables, which are used to call the libraries embedded in MDGRAPE-2, is of primary importance for retaining accuracy. The error arising from the approximation is evaluated by calculating a pair of vortex rings impinging on each other. Consequently, a speedup of about 50 times is achieved by MDGRAPE-2 while the error in statistical quantities such as kinetic energy and enstrophy remains negligible.

18.
《Computers & chemistry》1989,13(2):123-128
Methods are described which allow the addition of parameters to an existing force field in order to include new structural units. Data required for the parameterization procedure are mainly taken from quantum mechanical calculations. The methods have been applied to add parameters for aminoalkylphosphonic acids to the GROMOS force field. A molecular dynamics (MD) simulation of 64 aminomethylphosphonic acid molecules in eight unit cells has been carried out. The approximate parameters have also been used for an MD study of aminomethylphosphonic acid in SPC water at constant temperature and pressure.
