Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time-accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared-memory multiprocessor (the CRAY Y-MP), and distributed-memory multiprocessors with different topologies (the IBM SP and the CRAY T3D). We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message-passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to processor speed for good single-processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms. This revised version was published online in June 2006 with corrections to the Cover Date.

3.
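To design a general-purpose parallel model for a variety of architectures, this paper analyzes the problems in existing parallel models and proposes an effective BSP parallel model. The composition of a BSP computer and its execution process are described in detail.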

4.
In this paper we present an efficient general simulation strategy for computations designed for fully operational BSP machines of n ideal processors, on n-processor dynamic-fault-prone BSP machines. The fault occurrences are fail-stop and fully dynamic, i.e., they are allowed to happen on-line at any point of the computation, subject to the constraint that the total number of faulty processors may never exceed a known fraction. The computational paradigm can be exploited for robust computations over virtual parallel settings with a volatile underlying infrastructure, such as a network of workstations (where workstations may be taken out of the virtual parallel machine by their owner). Our simulation strategy is Las Vegas (i.e., it may never fail, due to backtracking operations to robustly stored instances of the computation, in case of locally unrecoverable situations). It adopts an adaptive balancing scheme of the workload among the currently live processors of the BSP machine. Our strategy is efficient in the sense that, compared with an optimal off-line adversarial computation under the same sequence of fault occurrences, it achieves an $O\left((\log n \cdot \log\log n)^2\right)$ multiplicative factor times the optimal work (namely, this measure is in the sense of the "competitive ratio" of on-line analysis). In addition, our scheme is modular, integrated, and considers many implementation points. We comment that, to our knowledge, no previous work on robust parallel computations has considered fully dynamic faults in the BSP model, or in general distributed memory systems. Furthermore, this is the first time an efficient Las Vegas simulation in this area is achieved. Online publication October 26, 2000.

5.
The occurrence of faults in multicomputers with hundreds or thousands of nodes is a likely event that can be dealt with by hardware or software fault-tolerant approaches. This paper presents a unifying model that describes software reconfiguration strategies for parallel applications with a regular computational pattern. We show that most existing strategies can be obtained as instances of the proposed threshold-based reconfiguration meta-algorithm. Moreover, this approach is useful to discover several yet unexplored strategies, among which we consider the class of the adaptive threshold-based strategies. The performance optimization analysis demonstrates that these strategies, applied to data-parallel regular computations, give optimal results for worst fault patterns. A wide spectrum of simulations, where the system parameters have been set to those of actual multicomputers, confirms that adaptive threshold-based strategies yield the most stable performance for a variety of workloads, independently of the number and pattern of faults.

6.
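Li Guodong, Zhang Defu. Journal of Software (软件学报), 2002, 13(3): 342-353
When building parallel software for workstation clusters, computational characteristics and compositional characteristics are very important. However, for lack of an effective supporting environment, today's distributed parallel computing software systems are inefficient, most noticeably with respect to computational characteristics. This paper proposes a parallel computing framework based on distributed objects. Its goals are to support efficient development of parallel computations, to provide mechanisms for encapsulating and reusing parallel programs, and to ensure dynamic load balancing and fault tolerance. The framework is a four-layer model that includes an object group layer and a mobile object layer. Experimental results demonstrate the effectiveness of the approach.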

7.
Skeletal parallel programming enables programmers to build a parallel program from ready-made components (parallel primitives) for which efficient implementations are known to exist, making both the parallel program development and the parallelization process easier. Constructing efficient parallel programs is often difficult, however, due to difficulties in selecting a proper combination of parallel primitives and in implementing this combination without unnecessary creation and exchange of data among parallel primitives and processors. To overcome these difficulties, we propose a powerful and general parallel skeleton, accumulate, which can be used to naturally code efficient solutions to problems and can be efficiently implemented in parallel using the Message Passing Interface (MPI).

8.
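This paper presents a new multi-pattern parallel matching algorithm based on pattern-tree construction. The algorithm is efficient and simple and parallelizes the matching, which makes it particularly suitable for multi-keyword search in information retrieval, pattern recognition, intrusion detection, and similar areas. Comparative analysis shows that the new algorithm has a larger shift length and effectively reduces the amount of actual matching, lowering both time and resource consumption and increasing search speed.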

9.
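This paper presents a new multi-pattern parallel matching algorithm based on pattern-tree construction. The algorithm is efficient and simple and parallelizes the matching, which makes it particularly suitable for multi-keyword search in information retrieval, pattern recognition, intrusion detection, and similar areas. Comparative analysis shows that the new algorithm has a larger shift length and effectively reduces the amount of actual matching, lowering both time and resource consumption and increasing search speed.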

10.
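An efficient task scheduling algorithm for parallel heterogeneous systems   Total citations: 1 (self-citations: 0, citations by others: 1)
Parallel and distributed computing is one of the current hot topics in computer science. The scheduling algorithm is a key factor affecting distributed computing and is also a challenging research topic. A scheduling algorithm assigns tasks that communicate with one another to different processors in a given order. This paper proposes an efficient heuristic algorithm based on interval insertion and task duplication. Simulations on a variety of random task graphs and Gaussian-iteration task graphs show that the new algorithm is considerably more efficient than existing algorithms.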

11.
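The more kinds of neural network models a neural network processing system can implement, the more general it is and the wider its range of application. This paper proposes an architecture for a parallel neural network processor that can implement, with a high degree of parallelism, the algorithms of a typical feedforward network (the BP network) and a typical feedback network (the Hopfield network). The processor uses SIMD (Single Instruction Multiple Data) as its main computing structure and, based on the characteristics of these two network algorithms, incorporates a one-dimensional systolic array and a fully connected interconnection network, so that data can be shared among processing elements conveniently and flexibly. Experimental results show that the architecture effectively increases the running speed of neural networks.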

12.
This paper shows how a high-level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave. JEL Classifications: C13; C14; C15; C63; C87

13.
The recursive approach to block algorithms of linear algebra is considered, using the $LL^T$ decomposition (the square-root method) as an example. Computational complexity is estimated both in terms of floating-point arithmetic operations and in terms of the data-transfer operations required to generate the recursive structures. The algorithms are mainly intended for solving large-scale problems on parallel and distributed computer systems.
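To illustrate the recursive block idea in the abstract above, here is a minimal Python sketch of a recursive $LL^T$ (Cholesky) factorization. It is not the authors' algorithm: the function name `cholesky_recursive`, the split into halves, and the list-of-lists matrix representation are assumptions chosen for illustration, and the input is assumed to be symmetric positive definite.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky_recursive(a: Matrix) -> Matrix:
    """Return lower-triangular L with L * L^T == a (hypothetical helper).

    Assumes `a` is symmetric positive definite. The matrix is split as
    [[A11, A21^T], [A21, A22]] and each diagonal block is factored recursively.
    """
    n = len(a)
    if n == 1:
        return [[math.sqrt(a[0][0])]]

    k = n // 2          # size of the leading block
    rest = n - k        # size of the trailing block
    a11 = [row[:k] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]

    # Step 1: factor the leading block recursively.
    l11 = cholesky_recursive(a11)

    # Step 2: solve L21 * L11^T = A21, one row at a time (forward substitution).
    l21: Matrix = []
    for row in a21:
        x: List[float] = []
        for j in range(k):
            partial = sum(x[t] * l11[j][t] for t in range(j))
            x.append((row[j] - partial) / l11[j][j])
        l21.append(x)

    # Step 3: form the Schur complement A22 - L21 * L21^T and factor it recursively.
    schur = [
        [a22[i][j] - sum(l21[i][t] * l21[j][t] for t in range(k)) for j in range(rest)]
        for i in range(rest)
    ]
    l22 = cholesky_recursive(schur)

    # Assemble [[L11, 0], [L21, L22]].
    top = [l11[i] + [0.0] * rest for i in range(k)]
    bottom = [l21[i] + l22[i] for i in range(rest)]
    return top + bottom
```

For the matrix with rows (4, 12, -16), (12, 37, -43), (-16, -43, 98), the sketch returns the factor with rows (2, 0, 0), (6, 1, 0), (-8, 5, 3). The triangular solve and the Schur-complement update are the block operations a parallel or distributed version would hand to separate processors.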

14.
The system PARCS-JAVA provides software tools for solution of problems on computer networks. It can be installed on heterogeneous computer networks and allows users of small computers to use parallel data processing. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 25–36, January–February 2005.

15.
This paper describes new mapping algorithms for domain-oriented data-parallel computations, where the workload is distributed irregularly throughout the domain, but exhibits localized or rectilinear communication patterns. We consider the problem of partitioning the domain for parallel processing in such a way that the workload on the most heavily loaded processor is minimized, subject to the constraint that the partition be perfectly rectilinear. Rectilinear partitions are useful on architectures that have a fast local mesh network and a relatively slower global network; these partitions heuristically attempt to maximize the fraction of communication carried by the local network. We provide an improved algorithm for finding the optimal partition in one dimension, propose new algorithms for partitioning in two dimensions, and show that optimal partitioning in three dimensions is NP-complete. We discuss our application of these algorithms to real problems.
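The one-dimensional case in the abstract above (cutting a sequence of workloads into contiguous pieces so that the heaviest piece is as light as possible) can be sketched with a textbook binary search over the bottleneck value combined with a greedy feasibility probe. This is a standard technique swapped in for illustration, not the improved algorithm the paper provides; the names `fits` and `min_bottleneck` and the restriction to non-negative integer loads are assumptions.

```python
from typing import List


def fits(loads: List[int], parts: int, cap: int) -> bool:
    """Greedy probe: can `loads` be cut into at most `parts` contiguous
    blocks whose sums are all <= cap? (hypothetical helper)"""
    used, current = 1, 0
    for w in loads:
        if w > cap:
            return False          # a single cell already exceeds the cap
        if current + w > cap:
            used += 1             # start a new block at this cell
            current = w
        else:
            current += w
    return used <= parts


def min_bottleneck(loads: List[int], parts: int) -> int:
    """Smallest possible load on the most heavily loaded processor when
    `loads` is split into at most `parts` contiguous blocks."""
    lo, hi = max(loads), sum(loads)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(loads, parts, mid):
            hi = mid              # feasible: try a smaller bottleneck
        else:
            lo = mid + 1          # infeasible: the bottleneck must grow
    return lo
```

With loads 1 through 9 and three processors, `min_bottleneck` returns 17, corresponding to the blocks (1..5), (6, 7), (8, 9). The probe is correct because packing each block as full as the cap allows never uses more blocks than any other contiguous split.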

16.
17.
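Belief networks are a powerful tool for representing and reasoning about uncertain knowledge. Exact inference in belief networks is an NP-complete problem, and the main computational difficulty lies in triangulating the belief network and constructing a minimum-weight join tree. This study proposes a new triangulation algorithm, MsLB-Triang, which exploits both the Dirac property of undirected-graph triangulation and the LB-simplicial property. In both the total weight of the resulting triangulated graph and the number of added edges, it clearly outperforms the widely used Min.Weight Heuristic algorithm.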

18.
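Parallel computing: an effective way to improve the efficiency of solving SAT problems   Total citations: 4 (self-citations: 1, citations by others: 3)
Jin Renchao, Huang Wenqi. Journal of Software (软件学报), 2000, 11(3): 398-400
The Solar algorithm, which is based on quasi-physical and quasi-human ideas, is a fast algorithm for solving SAT problems. Experiments and theoretical analysis show that the Solar algorithm is easy to parallelize. Parallelizing it can greatly improve the efficiency of solving SAT problems.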

19.
Methods are developed for transforming sequential programs for iterative computations into parallel-distributed versions which execute in parallel on a cluster of workstation or PC nodes on a local area network. We focus on communication issues and present algorithms for interprocess communication implemented by UNIX TCP/IP socket commands. Results of performance tests on several application problems, such as simulation of neural networks and the Jacobi method for solving linear equations, representative of a large class of application problems, are presented. Analysis indicates that, for problems with rather intensive computation, speedups of better than 2p/3 are possible with an optimal number p of nodes on a single Ethernet bus segment. Preliminary tests on small clusters show efficient speedups even for nonoptimal p.
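As a reminder of why the Jacobi method mentioned in the abstract above lends itself to this kind of transformation, here is a minimal sequential sketch in Python. It is not the authors' code: the function name `jacobi`, the fixed iteration count, and the zero starting vector are assumptions, and the socket-based exchange of the iterate between nodes is deliberately left out.

```python
from typing import List


def jacobi(a: List[List[float]], b: List[float], iterations: int = 50) -> List[float]:
    """Approximate the solution of a * x = b by Jacobi iteration (sketch).

    Assumes `a` is square with a nonzero diagonal and that the iteration
    converges (for example, `a` is strictly diagonally dominant).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every new component is computed from the PREVIOUS iterate only,
        # so the rows could be divided among nodes and updated independently;
        # the nodes would then exchange their pieces of x before the next sweep.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x
```

For the system 4x + y = 6, 2x + 5y = 12, calling `jacobi([[4.0, 1.0], [2.0, 5.0]], [6.0, 12.0])` converges to approximately (1, 2). In a distributed version, the per-sweep exchange of `x` is the communication cost that the speedup analysis in the abstract has to account for.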

20.