Similar literature
 Found 20 similar documents; search took 15 ms
1.
Heterogeneous systems with nodes containing more than one type of computation unit, e.g., central processing units (CPUs) and graphics processing units (GPUs), are becoming popular because of their low cost and high performance. In this paper, we have developed a Three-Level Parallelization Scheme (TLPS) for molecular dynamics (MD) simulation on heterogeneous systems. The scheme exploits multi-level parallelism combining (1) inter-node parallelism using spatial decomposition via message passing, (2) intra-node parallelism using spatial decomposition via dynamically scheduled multi-threading, and (3) intra-chip parallelism using multi-threading and short vector extension in CPUs, and employing multiple CUDA threads in GPUs. By using a hierarchy of parallelism with optimizations such as intra-node communication hiding and memory optimizations in both CPUs and GPUs, we have implemented and evaluated an MD simulation on the petascale heterogeneous supercomputer TH-1A. The results show that MD simulations can be efficiently parallelized with our TLPS scheme and can benefit from the optimizations.

2.
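Existing parallelism-identification methods have shortcomings when applied to many-core processors: when the loop dimension selected for parallelization has few iterations, severe load imbalance can result. To address this problem, a multi-dimensional parallelism-identification method for many-core processors is proposed. When existing methods cannot achieve good load balance, it selects multiple dimensions of a nested loop for parallelization, merges the iteration spaces of those dimensions, and then partitions the merged space into tasks, reducing the impact of load imbalance on parallel efficiency. The method has been implemented in the automatic parallelization system developed by the authors' research group, and in practical use it improves the parallel execution efficiency of some applications on many-core processors.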
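The merge-then-partition idea in this abstract can be illustrated with a minimal sketch. The function names, the round-robin assignment for the single-dimension case, and the block partitioning of the merged space are assumptions made for illustration; the abstract does not specify the system's actual partitioning policy.

```python
def partition_single_dim(n_outer, n_inner, workers):
    """Work per worker when only the outer loop dimension is parallelized.

    Each outer iteration carries n_inner units of work and is assigned
    round-robin (an assumed policy, for illustration only).
    """
    loads = [0] * workers
    for i in range(n_outer):
        loads[i % workers] += n_inner
    return loads


def partition_collapsed(n_outer, n_inner, workers):
    """Merge both loop dimensions into one iteration space, then block-partition.

    Returns one list of (i, j) index pairs per worker.
    """
    total = n_outer * n_inner
    base, extra = divmod(total, workers)
    chunks = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # spread the remainder evenly
        # Map each flat index k back to its (outer, inner) pair.
        chunks.append([divmod(k, n_inner) for k in range(start, start + size)])
        start += size
    return chunks


if __name__ == "__main__":
    # 3 outer iterations on 8 workers: 5 workers stay idle.
    print(partition_single_dim(3, 100, 8))
    # After merging, all 8 workers receive 37 or 38 iterations.
    print([len(c) for c in partition_collapsed(3, 100, 8)])
```

With 3 outer iterations and 8 workers, parallelizing the outer dimension alone leaves five workers idle, whereas the merged 300-iteration space divides almost evenly.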

3.
The Journal of Supercomputing - In drug discovery, molecular docking is the task in charge of estimating the position of a molecule when interacting with the docking site. This task is usually used...

4.
The Frequency Assignment Problem (FAP) is one of the key issues in the design of Global System for Mobile Communications (GSM) networks. The formulation of the FAP used here focuses on aspects that are relevant to real GSM networks. In this paper, we adapt a parallel model to tackle a multiobjectivised version of the FAP. It is a hybrid model which combines an island-based model and a hyperheuristic. The main aim of this paper is to design a strategy that facilitates the application of the current best-behaved algorithm. Specifically, our goal is to decrease the user effort required to set its parameters. At the same time, the usage of such an algorithm in parallel environments was enabled. As a result, the time required to attain high-quality solutions was decreased. We also conduct a robustness analysis of this parallel model. In this analysis we study the relationship between the migration stage of the parallel model and the quality of the resulting solutions. In addition, we also carry out a scalability study of the parallel model. In this case, we analyse the impact that the migration stage has on the scalability of the entire parallel model. Computational results with several real network instances have validated our proposed approach. The best-known frequency plans for two real-world network instances are improved with this strategy.

5.
We present techniques for exploiting fine-grained parallelism extracted from sequential programs on a fine-grained MIMD system. The system exploits fine-grained parallelism through parallel execution of instructions on multiple processors as well as the pipelined nature of individual processors. The processors can communicate data values via globally shared registers as well as dedicated channel queues. Compilation techniques are presented to utilize these mechanisms. A scheduling algorithm has been developed to distribute operations among the processors in a manner that reduces communication among the processors. The compiler identifies data dependencies which require synchronization and enforces them using channel queues. Delays that may result from attempting write operations to a full channel queue are avoided by spilling values from channels to local registers. If an interprocessor data dependency does not require synchronization, then the data value is passed through a shared register or shared memory. (Partially supported by National Science Foundation Presidential Young Investigator Award CCR-9157371 (CCR-9249143) to the University of Pittsburgh.)

6.
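One goal of molecular docking is to find the binding mode with the most stable conformation between ligand and receptor, which can be cast as a global search or optimization problem. The quantum-behaved particle swarm optimization (QPSO) algorithm presented in this paper is an effective global optimization search algorithm. The paper describes the application of QPSO to the molecular docking problem and evaluates the docking results using the scoring function of Autodock3.05. The results show that the QPSO-based QDOCK program finds more stable conformations, and that both its convergence speed and the accuracy of its docking results are better than those of Autodock3.05 with the Lamarckian genetic algorithm (LGA).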
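For readers unfamiliar with QPSO, the following is a minimal sketch of the commonly published update rule (mean-best position plus a log-distributed step around a per-particle attractor). It is not the QDOCK implementation: the function `qpso`, its parameters, the fixed contraction coefficient `beta`, and the sphere test objective are all assumptions for illustration, and the Autodock scoring function is replaced by a toy objective.

```python
import math
import random


def qpso(objective, dim, bounds, n_particles=20, iters=200, beta=0.75, seed=0):
    """Minimize `objective` over a box using a basic QPSO update rule."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [objective(x) for x in xs]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        # Mean of all personal-best positions ("mbest").
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                # Local attractor between personal best and global best.
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                x_new = attractor + step if rng.random() < 0.5 else attractor - step
                xs[i][d] = min(hi, max(lo, x_new))  # clamp to the search box
            val = objective(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective standing in for a docking score (assumption)."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, value = qpso(sphere, dim=3, bounds=(-5.0, 5.0))
    print(best, value)
```

In a docking setting, the position vector would encode ligand translation, orientation, and torsions, and `objective` would be the docking energy score.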

7.
8.
He Wei-Jia, Yang Ming-Lin, Wang Wu, Sheng Xin-Qing 《The Journal of Supercomputing》2021,77(2):1502-1516
The Journal of Supercomputing - A many-core parallel approach of the multilevel fast multipole algorithm (MLFMA) based on the Athread parallel programming model is presented on the homegrown...

9.
An extension of the computer program CICADA has been developed that allows us to use the single-coordinate-driving (SCD) method for flexible molecular docking. The docking procedure is composed of three independent space rotations, three independent translations, and the torsions selected by the user. One of the coordinates is driven; the other coordinates are relaxed. This procedure follows low-energy wells on the potential energy surface of the entire system. The program allows us to dock more than one ligand molecule to the receptor. We ran two test examples, docking N,N-dimethylformamide into alpha-cyclodextrin and R-phenoxypropionic acid into beta-cyclodextrin. The test examples showed that the SCD approach is able to overcome high-energy barriers and to cover the entire box within which the search is performed. The limitations of molecular dynamics docking in comparison with our approach also are discussed. The philosophy of the newly developed approach is not only to find the best dock for the receptor-ligand(s) system, but also to describe all the important binding modes and provide a good starting point for studying the dynamics within the cavity during the docking process.

10.
We investigate the efficient iterative solution of large-scale sparse linear systems on shared-memory multiprocessors. Our parallel approach is based on a multilevel ILU preconditioner which preserves the mathematical semantics of the sequential method in ILUPACK. We exploit the parallelism exposed by the task tree corresponding to the nested dissection hierarchy (task parallelism), employ dynamic scheduling of tasks to processors to improve load balance, and formulate all stages of the parallel PCG method conformal with the computation of the preconditioner to increase data reuse. Results on a CC-NUMA platform with 16 processors reveal the parallel efficiency of this solution.

11.
The distance geometry problem (DGP) consists in finding an embedding in a metric space of a given weighted undirected graph such that for each edge in the graph, the corresponding distance in the embedding belongs to a given distance interval. We discuss the relationship between the existence of a graph embedding in a Euclidean space and the existence of a graph embedding in a lattice. Different approaches, including two integer programming (IP) models and a constraint programming (CP) approach, are presented to test the feasibility of the DGP. The two IP models are improved with the inclusion of valid inequalities, and the CP approach is improved using an algorithm to perform a domain reduction. The main motivation for this work is to derive new pruning devices within branch-and-prune algorithms for instances occurring in real applications related to determination of molecular conformations, which is a particular case of the DGP. A computational study based on a set of small-sized instances from molecular conformations is reported. This study compares the running times of the different approaches to check feasibility.
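The DGP condition stated in this abstract (every edge distance must fall inside its interval) can be made concrete with a small checker. This sketch only verifies a candidate embedding in Euclidean space; it does not search for one, and it is unrelated to the IP and CP models of the paper. The function name, the dictionary-based data layout, and the tolerance are assumptions for illustration.

```python
import math


def is_valid_embedding(points, edge_intervals, tol=1e-9):
    """Check that a candidate embedding satisfies all distance intervals.

    points:         dict mapping vertex -> coordinate tuple
    edge_intervals: dict mapping (u, v) -> (lower, upper) distance bounds
    """
    for (u, v), (lower, upper) in edge_intervals.items():
        d = math.dist(points[u], points[v])  # Euclidean distance (Python 3.8+)
        if d < lower - tol or d > upper + tol:
            return False
    return True


if __name__ == "__main__":
    # A right triangle embedded in the plane.
    pts = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
    intervals = {(0, 1): (1.0, 1.0), (0, 2): (0.9, 1.1), (1, 2): (1.4, 1.5)}
    print(is_valid_embedding(pts, intervals))
```

A branch-and-prune solver would apply a test of this kind to partial embeddings in order to discard infeasible branches early.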

12.
Molecular docking is a bioinformatics method based on predicting the position and orientation of a small molecule or ligand when it is bound to a target macromolecule. This method can be modeled as an optimization problem where one or more objectives can be defined, typically around an energy scoring function. This paper reviews developments in the field of single- and multi-objective meta-heuristics for efficiently addressing molecular docking optimization problems. We comprehensively analyze both problem formulations and applied techniques from Evolutionary Computation and Swarm Intelligence, jointly referred to as Bio-inspired Optimization. Our prospective analysis is supported by an experimental study dealing with a molecular docking problem driven by three conflicting objectives, which is tackled by using different multi-objective heuristics. We conclude that genetic algorithms are by far the most widely used techniques, with a noted increase in the prevalence of particle swarm optimization in recent years; the latter is particularly well suited to multi-objective formulations of molecular docking problems. We end this experimental survey by outlining future research paths that should be pursued in this vibrant area.

13.
Recently there has been a trend toward using lower-power embedded media processor cores to build future high-end computing machines or supercomputers. However, the embedded solution also faces operating system (OS) design challenges: the thread-invocation overhead is high for fine-grained scientific workloads, message passing among threads is not managed efficiently enough, and the OS does not provide sufficiently convenient services for parallel programming. This paper presents a scheduler for a master-slave real-time operating system (RTOS) that manages thread execution on distributed multi/many-core systems without shared memory. The proposed scheduler exploits the data-driven nature of scientific workloads to reduce the thread-invocation overhead. It also defines two protocols: (1) one between the RTOS and the application program, which reduces the burden of parallel programming on the programmer; and (2) one between the RTOS and the networks-on-chip, which manages message passing among threads efficiently. The experimental results show that the proposed scheduler manages thread execution with lower overhead and a smaller storage requirement, thereby improving multi/many-core system performance.

14.
Molecular dynamics (MD) simulation has broad applications, and an increasing amount of computing power is needed to satisfy the large scale of real-world simulations. The advent of the many-core paradigm brings unprecedented computing power, but it remains a great challenge to harvest that power because of MD's irregular memory-access pattern. To address this challenge, this paper presents a joint application/architecture study to enhance the scalability of MD on Godson-T-like many-core architectures. First, a preprocessing approach leveraging an adaptive divide-and-conquer framework is designed to exploit locality through the memory hierarchy with software-controlled memory. Then three incremental optimization strategies (a novel data layout to improve data locality, an on-chip locality-aware parallel algorithm to enhance data reuse, and a pipelining algorithm to hide latency to shared memory) are proposed to enhance on-chip parallelism for the Godson-T many-core processor. Experiments on the Godson-T simulator exhibit a strong-scaling parallel efficiency of 0.99 on 64 cores, which is confirmed by a field-programmable gate array emulator. Also, the performance per watt of MD on Godson-T is much higher than that of MD on a 16-core Intel Core i7 symmetric multiprocessor (SMP) and 26 times higher than that of MD on an 8-core 64-thread Sun T2 processor. Detailed analysis shows that optimizations utilizing architectural features to maximize data locality and to enhance data reuse benefit scalability most. Furthermore, a hierarchical parallelization scheme is designed to map the MD algorithm to a Godson-T many-core cluster, and a simple performance model is derived, which suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found essential for these optimizations, which could guide future hardware developments.

15.
Black D.L. 《Computer》1990,23(5):35-43
The shared use of general-purpose uniprocessors is examined. Support for common uniform-memory-access architectures that have all memory equidistant from all processors in terms of access time is emphasized. This work is also applicable to non-uniform-memory-access machines, whose memory access times depend on the physical distance between the processor and the accessed memory, but it does not provide a complete solution to load-balancing problems for this class of machine. The discussion covers time-sharing scheduling, the Mach scheduler, programming models, scheduling concurrency support, processor allocation, and related work.

16.
Many computer vision applications, such as object recognition and content-based image retrieval, could function more reliably and effectively if regions of interest were isolated from their background. A new method for extracting regions of interest from color images, based on visual saliency in HSV color space, is proposed in this paper. Color saliency is calculated by a two-dimensional sigmoid function of the saturation and brightness components, which identifies regions with vivid color. Discrete Moment Transform (DMT)-based saliency can determine large areas of interest. A visual saliency map, denoted the S image, is obtained by combining color saliency and DMT-based saliency. A local homogeneity criterion, called the E image, is also calculated from the image. Based on the S image and the E image, the seed-point sets for high-saliency and low-saliency objects are determined. Seeded region growing and merging are then used to extract regions of interest. Experimental results demonstrate the effectiveness and efficiency of the method on natural color images.

17.
The field of combinatorial optimization has inspired the development of a large number of heuristic solution procedures. These methods are commonly assessed using a competitive evaluation methodology that may give an indication of which algorithm has a better performance. A next step in the experimental analysis is to uncover “why” one algorithm performs better. Which elements are responsible for good or bad performance? How does the performance of elements vary across the design space? What is the influence of the specific problem instance that is being solved? We focus on gaining a better understanding of heuristic algorithm performance and demonstrate that the application of a proper statistical methodology can provide researchers insight into how performance is affected by the different algorithm parameters and components. As an example, we apply a multilevel statistical analysis to a large neighborhood search algorithm for the vehicle routing problem with time windows.

18.
A multilevel optimization approach applicable to nonhierarchic coupled systems is presented. The approach includes a general treatment of design (or behaviour) constraints and coupling constraints at the discipline level through the use of norms. Three different types of norms are examined: the max norm, the Kreisselmeier-Steinhauser (KS) norm, and the p norm. The max norm is recommended. The approach is demonstrated on a class of hub frame structures that simulate multidisciplinary systems. The max norm is shown to produce system-level constraint functions which are nonsmooth. A cutting-plane algorithm is presented, which adequately deals with the resulting corners in the constraint functions. The algorithm is tested on hub frames with an increasing number of members (which simulate disciplines), and the results are summarized.
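For reference, the three aggregation functions named in this abstract are usually written as follows. These are the standard textbook forms, assumed here for illustration; the abstract does not give the paper's exact definitions or scaling, and the symbols $g_i$, $m$, $\rho$, and $p$ are notation introduced for this sketch.

```latex
% Assumed notation: g_i(x) <= 0, i = 1, ..., m, are the constraints of one
% discipline; rho > 0 and p >= 1 are user-chosen aggregation parameters.
\begin{align}
  g_{\max}(x) &= \max_{1 \le i \le m} g_i(x), \\
  \mathrm{KS}_{\rho}(x) &= \frac{1}{\rho} \ln \sum_{i=1}^{m} e^{\rho\, g_i(x)}, \\
  \lVert g(x) \rVert_{p} &= \Bigl( \sum_{i=1}^{m} \lvert g_i(x) \rvert^{p} \Bigr)^{1/p}.
\end{align}
% The KS function is a smooth upper bound on the max:
%   g_max(x) <= KS_rho(x) <= g_max(x) + ln(m) / rho.
```

The max function has a corner wherever two constraints are simultaneously the largest, which is consistent with the nonsmooth system-level constraints and the cutting-plane treatment described in the abstract, whereas the KS function is smooth and approaches the max as $\rho$ grows.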

19.
The Journal of Supercomputing - Molecular docking techniques are widely used in computational drug discovery. Most of these techniques simulate the way that a ligand interacts with a protein target...

20.
This paper is devoted to analyzing numerical optimization methods for solving the problem of molecular docking. Some additional requirements for optimization methods that take into account certain architectural features of graphics processing units (GPUs) have been formulated. A promising optimization method for use on graphics processors has been selected, its implementation is described, and its efficiency and accuracy have been estimated.
