1.

A common parallel computing framework for modeling hydrological processes of river basins Total citations: 2, self-citations: 0, citations by others: 2

Hao Wang Xudong Fu Guangqian Wang Tiejian Li Jie Gao 《Parallel Computing》2011,37(6-7):302-315

Restricted computing power has become one of the primary factors obstructing advancement in basin simulations for the majority of hydrological models. Parallel computing is one of the most readily available approaches to solving this problem. Using binary-tree theory, we present in this study a common parallel computing framework based on the message passing interface (MPI) protocol for modeling hydrological processes of river basins. A practical and dynamic spatial domain decomposition method, based on the binary-tree structure of the drainage network, is proposed. This framework is computationally efficient and is independent of the type of physical model chosen. The framework is tested in the Chabagou river basin of China, where two years of runoff processes of the entire basin were simulated. Results demonstrate that the system may provide efficient computing performance. However, primarily because of the constraint imposed by the binary-tree structure of the drainage network, this study finds that unlimited enhancement of computing efficiency is impossible to realize.
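The limit noted at the end of this abstract can be illustrated with a small sketch. It is not the paper's method: it assumes that each river reach is a node of a binary tree, that a reach can be simulated only after its upstream tributaries have finished, and that every reach has a fixed, made-up cost. The `Reach` type and all function names are hypothetical. Under those assumptions the most expensive headwater-to-outlet chain bounds the parallel run time, so speedup cannot exceed total work divided by the cost of that chain, however many processes are used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reach:
    """One river reach in a binary drainage tree (hypothetical type)."""
    cost: float                       # assumed simulation cost of this reach
    left: Optional["Reach"] = None    # upstream tributary, if any
    right: Optional["Reach"] = None   # upstream tributary, if any


def total_work(node: Optional[Reach]) -> float:
    """Serial run time: the sum of all reach costs."""
    if node is None:
        return 0.0
    return node.cost + total_work(node.left) + total_work(node.right)


def critical_path(node: Optional[Reach]) -> float:
    """Cost of the most expensive chain from any headwater reach down to this node."""
    if node is None:
        return 0.0
    return node.cost + max(critical_path(node.left), critical_path(node.right))


def speedup_bound(outlet: Reach) -> float:
    """Upper bound on speedup, even with unlimited processes."""
    return total_work(outlet) / critical_path(outlet)


# A small made-up basin: the outlet has two tributaries, one of which branches again.
basin = Reach(1.0, Reach(2.0, Reach(1.0), Reach(3.0)), Reach(2.0))
bound = speedup_bound(basin)  # total work 9.0, critical path 6.0
```

In this toy basin the bound is 1.5, which shows how a long main stem can cap the achievable speedup regardless of processor count.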

2.

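A hybrid neural network based on multi-core systems and its application

杨蕾 林红 《电脑学习》 2011,(4):4-6

A hybrid neural network model based on a chaotic immune genetic algorithm is built by training a BP neural network with a chaotic immune genetic optimization algorithm. To address the heavy computational workload and slow training of the chaotic immune genetic neural network, a parallel version of the model is designed and implemented with Matlab's Parallel Computing Toolbox and used to predict annual extreme ice thickness data for the Bohai Sea area. A comparison of the computational efficiency and speedup of the serial and parallel algorithms shows that the parallel design on a multi-core system improves both speedup and computational efficiency.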

3.

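A parallel pipelined Sn sweeping algorithm for solving neutron transport equations on unstructured grids Total citations: 11, self-citations: 4, citations by others: 7

莫则尧 傅连祥 阳述林 《计算机学报》 2004,27(5):587-595

The discontinuous finite element discrete ordinates (Sn) method is widely used to solve high-dimensional time-dependent neutron transport equations. It discretizes the geometric mesh space, the velocity phase space, and the neutron energy groups, so its computational cost is very high. Working on unstructured grids, this paper proposes a parallel pipelined Sn sweeping algorithm based on domain decomposition. By designing domain decomposition methods and queue insertion algorithms with different inherent degrees of parallelism and communication surface-to-volume ratios, speedups of more than 72 and 78 are obtained for two different physical models using 92 and 256 CPUs of two parallel machines, respectively. Scalability analysis shows that the algorithm's performance depends heavily on the point-to-point communication latency of the parallel machine.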

4.

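Parallel clustering algorithms for cluster computing environments Total citations: 8, self-citations: 0, citations by others: 8

周兵 沈钧毅 彭勤科 《计算机工程》 2004,30(4):4-6

This paper explores how to design parallel clustering algorithms for a cluster computing environment. A cluster is a low-cost, general-purpose parallel system whose communication capability is a bottleneck relative to the computing power of its nodes. Parallel clustering algorithms for such environments should therefore be designed around data parallelism. The factors that affect the speedup and clustering quality of a data-parallel clustering algorithm are analyzed theoretically, and the analysis is then confirmed with a validation algorithm, PCIT (Parallel clustering algorithm based on Index Tree). The results can provide a theoretical basis for designing better data-parallel clustering algorithms.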

5.

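A study of parallel clustering algorithms for cluster computing environments

周兵 冯中慧 王和兴 《计算机科学》 2007,34(10):195-199

This paper uses theoretical analysis and experiments to explore design principles for parallel clustering algorithms in a cluster computing environment. A cluster is a low-cost, general-purpose parallel system whose communication capability is a bottleneck relative to the computing power of its nodes. The paper therefore proposes that parallel clustering algorithms for such environments be designed around data parallelism. It first analyzes theoretically the factors that affect speedup under data parallelism and the choice of communication strategy, and then implements a new parallel clustering algorithm, PARC. Experiments with PARC confirm the theoretical analysis and show that the parallel clustering algorithm can achieve good clustering quality. The results can provide a theoretical basis for designing better data-parallel clustering algorithms.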

6.

A parallel implementation of the backward error propagation neural network training algorithm: experiments in event identification.

D F Sittig J A Orr 《Computers and biomedical research》1992,25(6):547-561

An event detection and alarm generation system based on an artificial neural network (ANN) has been developed to aid clinicians in the identification of critical events commonly occurring in the anesthesia breathing circuit. To detect breathing circuit problems, the system monitored CO2 gas concentration, gas flow, and airway pressure. Various parameters were extracted from each of these input waveforms and fed into an artificial neural network. To develop truly robust ANNs, investigators are required to train their networks on large training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from 6.4 to 1.5 h. By reducing the total run time of the computation through parallelism, we were able to optimize many of the neural network's initial parameters. We conclude that use of the master-worker model of parallel computation is an excellent method for speeding up the backward error propagation neural network training algorithm.
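The figures quoted in this abstract can be related through the standard definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by processor count). The sketch below only applies those definitions to the numbers in the abstract; the function names are illustrative. The ratio of the quoted run times comes out near 4.27 rather than 4.06, presumably because the run times are rounded.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup: how many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time


def efficiency(speedup_value: float, processors: int) -> float:
    """Parallel efficiency: the fraction of ideal linear speedup achieved."""
    return speedup_value / processors


# Reported maximum speedup of 4.06 on six processors.
eff = efficiency(4.06, 6)       # roughly 0.68

# Ratio of the rounded run times quoted in hours.
approx = speedup(6.4, 1.5)      # roughly 4.27
```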

7.

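Optimization of the Multi-Critical parallel molecular dynamics algorithm based on OpenMP

段振华 白明泽 豆育升 《计算机应用研究》 2012,29(7):2432-2434

To speed up molecular dynamics simulation on multi-core shared-memory servers, the Multi-Critical algorithm is proposed on the basis of existing parallel molecular dynamics algorithms. The algorithm partitions the force matrix manually so that multiple threads enter differently named critical sections, and it uses block-wise accumulation to optimize the parallel algorithm and improve parallel efficiency. Experimental results show that, compared with the earlier Critical algorithm, both speedup and parallel efficiency improve substantially.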

8.

Program Speedup in a Heterogeneous Computing Network

Donaldson V. Berman F. Paturi R. 《Journal of Parallel and Distributed Computing》1994,21(3)

Program speedup is an important measure of the performance of an algorithm on a parallel machine. Of particular importance is the near linear or superlinear speedup exhibited by the most performance-efficient algorithms for a given system. We describe network and program models for heterogeneous networks, define notions of speedup and superlinear speedup, and observe that speedup consists of both heterogeneous and parallel components. We also consider the case of linear tasks, give a lower bound for the speedup, and show that there is no theoretical upper limit on heterogeneous speedup.

9.

Parallel genetic algorithm for N-Queens problem based on message passing interface-compute unified device architecture

Cao Jianli Chen Zhikui Wang Yuxin Guo He 《Computational Intelligence》2020,36(4):1621-1637

The N-Queens problem has three variants: obtaining a specific solution, obtaining a set of solutions, and obtaining all solutions. The purpose of Variant I is to find a constructive solution, which has been solved. Variant III aims to find all solutions, and the largest number of queens resolved so far is 26. Variant II, whose purpose is to obtain a set of solutions for larger-scale problems, relies on various intelligent algorithms. In this paper, we use a master-slave model genetic algorithm that combines ideas from evolutionary algorithms and simulated annealing to solve Variant II, and use a parallel fitness function based on compute unified device architecture (CUDA). Experimental results show that our scheme achieved a maximum 60-fold speedup over the single-CPU counterpart. On this basis, a two-level parallel genetic algorithm based on the island model and master-slave model is implemented on a GPU cluster using message passing interface (MPI) technology. Using two-node and three-node GPU clusters, average speedups of 1.46 and 2.01 over a single node are obtained, respectively. Compared with the sequential genetic algorithm, the two-level parallel genetic algorithm makes full use of the parallel computing power of the GPU cluster in solving N-Queens Variant II and improves performance by 99.19 times in the best case.
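As a minimal sketch of what a fitness function for this problem might look like, the code below counts attacking queen pairs for a candidate board. This is a common choice for genetic algorithms on N-Queens, not necessarily the one used in the paper, and the encoding (one queen per row, `cols[r]` giving its column) is an assumption. Each pair is checked independently of the others, which is why this kind of evaluation lends itself to GPU parallelization.

```python
from itertools import combinations
from typing import Sequence


def conflicts(cols: Sequence[int]) -> int:
    """Count attacking queen pairs; cols[r] is the column of the queen in row r.

    Rows are distinct by construction, so only columns and diagonals are checked.
    A value of 0 means the board is a valid N-Queens solution.
    """
    count = 0
    for (r1, c1), (r2, c2) in combinations(enumerate(cols), 2):
        same_column = c1 == c2
        same_diagonal = abs(c1 - c2) == abs(r1 - r2)
        if same_column or same_diagonal:
            count += 1
    return count


solved = conflicts([1, 3, 0, 2])     # a valid 4-queens placement
diagonal = conflicts([0, 1, 2, 3])   # all four queens on one diagonal
```

A genetic algorithm would typically minimize this count, or maximize some decreasing function of it.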

10.

Parallel computing optimization in the Apollo domain network

Pekergin M.F. 《IEEE transactions on pattern analysis and machine intelligence》1992,18(4):296-303

The performance of parallel computing in a network of Apollo workstations where the processes use the remote procedure call (RPC) mechanism for communication is addressed. The speedup in such systems cannot be accurately estimated without taking into account the relatively large communication overheads. Moreover, it decreases as parallelism increases beyond a certain limit. To estimate the speedup and determine the optimum degree of parallelism, the author characterizes the parallelization and the communication overheads in the system considered. Then, parallel applications are modeled and their execution times are expressed for the general case of nonidentical tasks and workstations. The general case study allows the structural constraints of the applications to be taken into account by permitting their partitioning into heterogeneous tasks. A simple expression of the optimum degree of parallelism is obtained for identical tasks where the inherent constraints are neglected. The fact that the theoretical maximum speedup is bounded by half of the optimum degree of parallelism shows the importance of this measure.
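The relationship between maximum speedup and optimum parallelism can be seen in a toy model. The model is an assumption for illustration only and is not taken from the paper: run time on n workstations is taken to be `work / n` for computation plus `overhead * n` for communication. Minimizing that expression gives an optimum n of sqrt(work / overhead), and the speedup at that point is exactly half of the optimum n, which is consistent with the bound quoted in the abstract.

```python
import math


def run_time(work: float, overhead: float, n: float) -> float:
    """Assumed model: compute time work/n plus communication cost overhead*n."""
    return work / n + overhead * n


def optimum_parallelism(work: float, overhead: float) -> float:
    """The n that minimizes run_time (derivative -work/n**2 + overhead set to zero)."""
    return math.sqrt(work / overhead)


def max_speedup(work: float, overhead: float) -> float:
    """Speedup over the overhead-free serial time, evaluated at the optimum n."""
    n_opt = optimum_parallelism(work, overhead)
    return work / run_time(work, overhead, n_opt)


# Hypothetical numbers: 100 units of work, 1 unit of overhead per workstation.
n_opt = optimum_parallelism(100.0, 1.0)
best = max_speedup(100.0, 1.0)
beyond = 100.0 / run_time(100.0, 1.0, 20)  # speedup when n is pushed past the optimum
```

With these numbers the optimum is 10 workstations and the best speedup is 5; doubling the workstation count to 20 lowers the speedup to 4.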

11.

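Performance evaluation of optimized parallel computing

刘杰 迟利华 胡庆丰 《计算机工程与设计》 2000,21(6):4-7

The traditional performance evaluation model for parallel computing is speedup. This paper discusses the shortcomings of speedup and, on that basis, proposes a new performance evaluation model for optimized parallel computing, which the authors call optimized speedup. Optimized speedup is used to analyze the performance of the NAS benchmark programs MG and FT on an IBM SP2 (66 MHz/wn).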

12.

Optimization and performance evaluation of parallel programs Total citations: 5, self-citations: 0, citations by others: 5

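刘杰 迟利华 胡庆丰 《计算机工程与科学》 2000,22(5):67-70

This paper discusses the optimization of parallel programs and points out that it should proceed along three lines: data partitioning, communication optimization, and serial optimization. To address the shortcomings of traditional speedup, an optimized speedup model is proposed for evaluating the performance of optimized parallel programs. The NAS benchmark programs MG and FT are optimized, and the optimized speedup model is used to analyze their performance on an IBM SP2.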

13.

Hogs and slackers: Using operations balance in a genetic algorithm to optimize sparse algebra computation on distributed architectures

Una-May O’Reilly Eric Robinson Sanjeev Mohindra Julie Mullen Nadya Bliss 《Parallel Computing》2010,36(10-11):635-644

We present a framework for optimizing the distributed performance of sparse matrix computations. These computations are optimally parallelized by distributing their operations across processors in a subtly uneven balance. Because the optimal balance point depends on the non-zero patterns in the data, the algorithm, and the underlying hardware architecture, it is difficult to determine. The Hogs and Slackers genetic algorithm (GA) identifies processors with many operations – hogs, and processors with few operations – slackers. Its intelligent operation-balancing mutation operator swaps data blocks between hogs and slackers to explore new balance points. We show that this operator is integral to the performance of the genetic algorithm and use the framework to conduct an architecture study that varies network specifications. The Hogs and Slackers GA is itself a parallel algorithm with near linear speedup on a large computing cluster.

14.

Mapping the potential annual total nitrogen load in the river basins of Japan with remotely sensed imagery

Kazuo Oki Yoshifumi Yasuoka 《Remote sensing of environment》2008,112(6):3091-3098

The increase of nutrient loads such as nitrogen and phosphorus to a river due to land cover changes in surrounding areas has been one of the major sources of water pollution or eutrophication. Monitoring the influent nutrient load from river basins to rivers is now crucial in the management of river basin environments. The monitoring is not easy, however, because it requires spatial and temporal measurement tools for land cover changes in the river basin and for water quality, and it also requires models relating them. In this study, we first analyzed the relation between the land cover types estimated from monthly maximum Normalized Difference Vegetation Index (NDVI) imagery calculated from NOAA Advanced Very High Resolution Radiometer (AVHRR) imagery and the annual total nitrogen load discharged from river basins. We found that the runoff load factor from urban areas is higher than that of forested areas. We also found that the impacts of land cover such as plantation and field weed communities on the total nitrogen load of each river are higher than the impacts of other land cover types such as the Beech and Camellia japonica community types. Finally, we produced two advanced maps of the potential annual total nitrogen load (PTNL) index and the potential annual total nitrogen load for each river basin area (PTNL/area) index by considering the relationship between the land cover types and the annual total nitrogen load discharged from river basins in Japan. The PTNL map will be useful for the risk assessment of total nitrogen load impact on lakes and the sea through rivers from each basin. The PTNL/area index, which considers the effects of river basin areas, will allow evaluation of the state of river basins.

15.

Parallel implementation of multilayered neural networks based on Map-Reduce on cloud computing clusters

Hai-jun Zhang Nan-feng Xiao 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(4):1471-1483

To meet the requirements of big data processing, this paper presents an efficient mapping scheme for a fully connected multilayered neural network, which is trained using the back-propagation (BP) algorithm based on Map-Reduce on cloud computing clusters. Batch (epoch) training is applied by segmenting the samples effectively across the cluster: each segment is trained separately, and the weights are then summarized, iterating until convergence. The time required to run a parallel BP algorithm on the clusters and a serial BP algorithm on a uniprocessor is derived. Performance parameters such as speedup and the optimal and minimum numbers of data nodes are evaluated for the parallel BP algorithm on the clusters. Experimental results demonstrate that the proposed parallel BP algorithm has better speedup, a faster convergence rate, and fewer iterations than existing algorithms.

16.

Amdahl’s law for multithreaded multicore processors

Hao Che Minh Nguyen 《Journal of Parallel and Distributed Computing》2014

In this paper, we conduct performance scaling analysis of multithreaded multicore processors (MMPs) for parallel computing. We propose a thread-level closed-queuing network model covering a fairly large design space, accounting for hardware scaling models, coarse-grain, fine-grain, and simultaneous multithreading (SMT) cores, and shared resources, including cache, memory, and critical sections. We then derive a closed-form solution for this model in terms of the speedup performance measure. This solution makes it possible to analyze performance scaling properties of MMPs along multiple dimensions. In particular, we show that for the parallelizable part of the workload, the speedup, in the absence of resource contention, is no longer just a linear function of parallel processing unit counts, as predicted by Amdahl’s law, but also a strong function of workload characteristics, ranging from strongly memory-bound to strongly CPU-bound workloads. We also find that with core multithreading, superlinear speedup, higher than that predicted by Amdahl’s law, may be achieved for the parallelizable part of the workload, if core threads exhibit strong cache affinity and the workload is strongly memory-bound. Then, we derive a tight speedup upper bound in the presence of both memory resource contention and critical sections for multicore processors with single-threaded cores. This speedup upper bound indicates that with resource contention among threads, whether it is due to shared memory or critical sections, a sequential term is guaranteed to emerge from the parallelizable part of the workload, fundamentally limiting the scalability of multicore processors for parallel computing, in addition to the sequential part of the workload, as dictated by Amdahl’s law. As a result, to improve speedup performance for MMPs, one should strive to enhance memory parallelism and confine critical sections as locally as possible, e.g., to the smallest possible number of threads in the same core.
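For reference, the classical form of Amdahl's law that this abstract extends can be written as a one-line function. The sketch below shows only the textbook law, not the paper's queuing model; the function and parameter names are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Classical Amdahl's law.

    parallel_fraction: share of the serial run time that can be parallelized (0..1).
    units: number of parallel processing units.
    The parallelizable share shrinks by a factor of `units`; the rest is unchanged.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# With 90% of the work parallelizable, ten units give a speedup of about 5.26,
# and no number of units can push the speedup to 10 (the reciprocal of the serial share).
ten_units = amdahl_speedup(0.9, 10)
many_units = amdahl_speedup(0.9, 10**9)
```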

17.

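Simulation analysis of the effect of forests on hydrological dynamic processes in watershed systems

欧松 《计算机仿真》 2000,17(6):51-55

This paper presents a computer simulation analysis of how changes in forest state alter the characteristics of hydrological dynamic processes in watershed systems. The forest state of a watershed changes over time through natural growth, logging, afforestation, and so on, whereas the natural geological environment, such as topography and landforms, remains relatively stable. Changes in the characteristics of watershed hydrological dynamic processes between periods are therefore caused mainly by changes in forest state. Three watershed systems in Hunan Province within the Yangtze River basin, with catchment areas of 500-950 km², were selected, and 30 consecutive years of hydrometeorological data were collected. The data were grouped into consecutive 5-year sets to form multiple comparison sample series, and hydrological dynamic process models were identified for the different periods. The models used were ARX and fuzzy neural networks. Simulating each model's response to an impulse input gave the hydrological dynamic process characteristics of each period. Comparing these characteristics with the forest state of the corresponding period yielded some conclusions about the effect of forests on hydrological dynamic processes.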

18.

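Parallel algorithms for approximate string matching on the PRAM and LARPBS models Total citations: 15, self-citations: 1, citations by others: 15

钟诚 陈国良 《软件学报》 2004,15(2):159-169

Approximate string matching is widely used in network information search, digital libraries, pattern recognition, text mining, IP routing lookup, network intrusion detection, bioinformatics, computational music research, and other fields. Based on the CREW-PRAM (parallel random access machine with concurrent read and exclusive write) model, a dynamic-programming parallel algorithm for approximate string matching with k differences is designed that computes the edit distance matrix D directly by advancing a wavefront in parallel; it uses (m+1) processors, runs in O(n) time, and theoretically achieves linear speedup. By computing D in parallel both horizontally and diagonally, a scalable dynamic-programming parallel algorithm for approximate string matching with k differences is designed that uses ((m+1) processors and O(n/(+m) time. Based on a divide-and-conquer strategy, the optical bus system is dynamically reconfigured by flexibly splitting buses and merging sub-buses, and the bus's message broadcasting and parallel prefix-sum computation are fully exploited to compute Hamming distance in parallel. Two communication-efficient, scalable parallel algorithms for approximate string matching with k mismatches are designed on the LARPBS (linear arrays with reconfigurable pipelined bus system) model: one uses n processors and O(m) time; the other is a constant-time algorithm that uses mn processors.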
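The wavefront idea can be sketched sequentially. The code below is a standard dynamic-programming formulation of approximate string matching with k differences (row 0 of D is zero so that a match may start anywhere in the text), filled in anti-diagonal order. It is an illustration under those assumptions, not the paper's CREW-PRAM algorithm. Every cell on one anti-diagonal depends only on the two previous anti-diagonals, so on a parallel machine the inner loop could be distributed across processors.

```python
from typing import List


def k_difference_matches(pattern: str, text: str, k: int) -> List[int]:
    """Return 1-based end positions j in text where pattern matches with <= k edits."""
    m, n = len(pattern), len(text)
    # D[i][j]: fewest edits turning pattern[:i] into some substring of text ending at j.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i  # pattern[:i] against an empty substring costs i deletions

    # Sweep anti-diagonals i + j = d. Cells on the same anti-diagonal are independent.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            substitution = D[i - 1][j - 1] + (pattern[i - 1] != text[j - 1])
            D[i][j] = min(substitution, D[i - 1][j] + 1, D[i][j - 1] + 1)

    return [j for j in range(1, n + 1) if D[m][j] <= k]


exact = k_difference_matches("abc", "xabcx", 0)   # only the exact occurrence
fuzzy = k_difference_matches("abc", "xabcx", 1)   # occurrences with at most one edit
```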

19.

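A process neural network training algorithm based on data parallelism

许少华 刘丹丹 《电脑学习》 2011,(3):40-42

To address the complex spatio-temporal aggregation mechanism and long learning cycle of process neural networks, a training algorithm based on data parallelism is proposed. The method builds on batch training with gradient descent, is designed using the MPI parallel model, and performs cluster parallel computing across multiple computers on a local area network. The paper presents the data-parallel training algorithm and its implementation mechanism, reports comparative experiments for training function sample sets of different sizes and different numbers of processes, and analyzes algorithm properties such as speedup and parallel efficiency. Experimental results show that, when the parallel granularity is chosen appropriately for the network and sample size, the algorithm can substantially improve the training efficiency of process neural networks.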
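The core of data-parallel batch training can be sketched without MPI. The code below is an illustration under simplifying assumptions, not the paper's algorithm: the model is a single weight w fitted to y = w*x with a squared-error loss, workers are simulated by list shards, and a plain `sum` stands in for an MPI all-reduce. All names are hypothetical. Because the batch gradient is a sum over samples, the per-shard gradients add up to the full-batch gradient, so the parallel update matches the serial one.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def gradient(w: float, samples: List[Sample]) -> float:
    """Gradient of sum((w*x - y)**2) with respect to w over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in samples)


def split(samples: List[Sample], workers: int) -> List[List[Sample]]:
    """Deal samples round-robin into one shard per worker."""
    return [samples[r::workers] for r in range(workers)]


def data_parallel_step(w: float, samples: List[Sample],
                       workers: int, learning_rate: float) -> float:
    """One gradient-descent step with the gradient computed shard by shard."""
    partial = [gradient(w, shard) for shard in split(samples, workers)]
    total = sum(partial)  # stands in for an all-reduce (sum) across processes
    return w - learning_rate * total


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_serial = 0.0 - 0.01 * gradient(0.0, data)
w_parallel = data_parallel_step(0.0, data, workers=2, learning_rate=0.01)
```

The trade-off the abstract mentions follows from this structure: each step costs one reduction, so small shards leave communication dominating the run time.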

20.

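Research and implementation of a parallel algorithm for mutual-information-based regional registration of remote sensing images Total citations: 2, self-citations: 0, citations by others: 2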

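周海芳 杜云飞 杨学军 李思昆 《中国图象图形学报》 2010,15(1):174-180

Image registration is an important step in remote sensing applications such as image fusion, change detection, and target recognition. Because mutual information needs no preprocessing, allows a high degree of automation, and is robust, its use as a similarity measure for image registration has become a research focus in image processing in recent years. As the volume of remote sensing image data keeps growing, traditional single-machine processing can no longer meet the timeliness requirements of some applications. Based on an experimental analysis of the computational bottlenecks of the serial algorithm, a parallel algorithm for mutual-information-based regional registration of remote sensing images is studied and proposed. A data partitioning strategy and a parallel processing scheme for the mutual information computation are given, redundant boundary partitioning and binary-tree reduction are used to reduce data communication, and a quantitative complexity analysis of the algorithm is provided. Experimental results show that the algorithm scales well and is broadly applicable.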
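The similarity measure itself can be sketched in a few lines. The code below computes the mutual information of two images from their joint gray-level histogram, using the standard definition; it is a serial illustration with images flattened to equal-length lists of integer gray levels (an assumption), not the paper's parallel scheme. Since the joint histogram is a sum of per-block counts, blocks could be counted separately and merged, which is where a reduction step would fit.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information, in bits, of two equal-length sequences of gray levels."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    joint = Counter(zip(a, b))          # joint histogram of gray-level pairs
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_a[x] * count_b[y])
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi


image = [0, 0, 1, 1]
aligned = mutual_information(image, image)           # identical images
unrelated = mutual_information(image, [0, 1, 0, 1])  # statistically independent
```

A registration procedure would typically search over candidate transforms and keep the one that maximizes this value.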