20 similar documents found (search time: 16 ms)
1.
The inadequacies of conventional parallel languages for programming multicomputers are identified. The C* language is briefly reviewed, and a compiler that translates C* programs into C programs suitable for compilation and execution on a hypercube multicomputer is presented. Results illustrating the efficiency of executing data-parallel programs on a hypercube multicomputer are reported. They show the speedup achieved by three hand-compiled C* programs executing on an N-Cube 3200 multicomputer. The first two programs, Mandelbrot set calculation and matrix multiplication, have a high degree of parallelism and a simple control structure. The C* compiler can generate relatively straightforward code with performance comparable to hand-written C code. Results for a C* program that performs Gaussian elimination with partial pivoting are also presented and discussed.
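The Mandelbrot benchmark is highly data parallel: every pixel is computed independently, so a data-parallel compiler can map blocks of pixels onto physical nodes with no communication. The Python sketch below is not the C* compiler's output; it only illustrates a block-row decomposition under assumed parameters (the image bounds, `max_iter`, and all helper names are hypothetical), simulating the nodes serially so the partitioned result can be checked against a serial run.

```python
def escape_count(c: complex, max_iter: int = 64) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandel_row(y: int, width: int, height: int) -> list[int]:
    """Escape counts for one image row, mapped onto [-2, 1] x [-1.5, 1.5]."""
    im = -1.5 + 3.0 * y / (height - 1)
    return [escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
            for x in range(width)]


def block_rows(height: int, procs: int) -> list[range]:
    """Split rows 0..height-1 into `procs` contiguous, near-equal blocks."""
    base, extra = divmod(height, procs)
    blocks, start = [], 0
    for p in range(procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def mandelbrot_parallel(width: int, height: int, procs: int) -> list[list[int]]:
    """Each block would run on its own node; here the blocks are run in turn."""
    image: list[list[int]] = []
    for block in block_rows(height, procs):
        image.extend(mandel_row(y, width, height) for y in block)
    return image
```

Because no block reads another block's pixels, the only parallel overhead in this decomposition is distributing the work and collecting the rows.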
2.
Hatcher P.J. Quinn M.J. Lapadula A.J. Seevers B.K. Anderson R.J. Jones R.R. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(3):377-383
The implementation of two compilers for the data-parallel programming language Dataparallel C is described. One compiler generates code for Intel and nCUBE hypercube multicomputers; the other generates code for Sequent multiprocessors. A suite of Dataparallel C programs has been compiled and executed, and their execution times and speedups on the Intel iPSC/2, the nCUBE 3200 and the Sequent Symmetry are presented.
3.
Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper reviews these approaches and develops a straightforward but effective algorithm using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a network of high performance Unix workstations. Performance benchmarks were performed on both systems using two proteins. This algorithm offers a portable cost-effective alternative for molecular dynamics simulations. In view of the increasing numbers of networked workstations, this approach could help make molecular dynamics simulations more easily accessible to the research community.
4.
Parallel volume rendering on a network of workstations (total citations: 1; self-citations: 0; citations by others: 1)
An algorithm for parallel volume rendering on general-purpose workstations connected to a local area network (LAN) is presented. The algorithm is based on an efficient scan-line algorithm for volume rendering of irregular meshes. This algorithm computes images by intersecting the mesh with successive planes defined through each scan line and perpendicular to the screen. These planes are called scan planes. Image coherency from one scan plane to the next, and within each scan plane, speeds up image computation. The proposed algorithm is a modified version of the scan-line algorithm, suitable for parallelization and for handling large data sets efficiently. Based on an efficiency analysis of this version, it is concluded that minimal additional computing and communication are required if each processor is given the task of computing sequences of successive lines in the image. Ways of achieving good load balancing on a group of heterogeneous workstations that have arbitrary loads by other users are suggested.
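The abstract's observation that each processor should compute sequences of successive lines can be illustrated with a static partition that gives every workstation one contiguous run of scan lines, sized in proportion to its speed. This is a minimal Python sketch, not the paper's scheme: the `speeds` input is a hypothetical, externally measured relative speed per workstation, and largest-remainder rounding is an assumed choice for making the runs cover every line exactly once.

```python
def assign_scanlines(num_lines: int, speeds: list[float]) -> list[range]:
    """Give each workstation one contiguous run of scan lines, sized in
    proportion to its (assumed, externally measured) relative speed.

    Contiguity keeps scan-plane coherence inside each run; largest-remainder
    rounding makes the runs cover lines 0..num_lines-1 exactly once.
    """
    total = sum(speeds)
    exact = [num_lines * s / total for s in speeds]
    sizes = [int(e) for e in exact]
    # Hand the leftover lines to the workstations with the largest remainders.
    leftover = num_lines - sum(sizes)
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    runs, start = [], 0
    for size in sizes:
        runs.append(range(start, start + size))
        start += size
    return runs
```

For example, `assign_scanlines(100, [1.0, 1.0, 2.0])` gives runs of 25, 25 and 50 lines. When other users change the load on a workstation, the speeds would have to be re-estimated and the runs recomputed or handed out in smaller pieces; that dynamic part is not shown here.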
5.
Canto S.D. de Madrid A.P. Bencomo S.D. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(9):785-798
Standard DP (dynamic programming) algorithms are limited by the substantial computational demands they place on contemporary serial computers. In this work, the theory behind the solution of serial monadic dynamic programming problems is used to highlight the theory and application of parallel dynamic programming on a general-purpose architecture (a cluster or network of workstations). A simple and well-known technique, message passing, is used. Several parallel algorithms for serial monadic DP are proposed, based on parallelization over the state variables and parallelization over the decision variables. Algorithms with no interpolation are also proposed. It is demonstrated how constraints introduce a load imbalance that affects scalability, and why this problem is inherent to DP.
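Parallelization over the state variables can be sketched for a generic backward recursion f_n(s) = min_d [c(s, d) + f_{n+1}(t(s, d))]. In this Python sketch, threads from `concurrent.futures` stand in for the paper's message-passing processes, and the cost function, transition function and helper names are all hypothetical. At each stage every worker reads the whole previous stage (the broadcast step), updates only its own block of states, and the blocks are gathered. Parallelization over decision variables, interpolation, and constraint handling are not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, Sequence

Cost = Callable[[int, int], float]    # hypothetical stage cost c(s, d)
Trans = Callable[[int, int], int]     # hypothetical next state t(s, d)


def stage_chunk(states: range, decisions: range, cost: Cost, trans: Trans,
                f_next: Sequence[float]) -> list[float]:
    """Bellman update f_n(s) = min_d [c(s, d) + f_{n+1}(t(s, d))] on a block of states."""
    return [min(cost(s, d) + f_next[trans(s, d)] for d in decisions)
            for s in states]


def split(n: int, parts: int) -> list[range]:
    """Contiguous, near-equal blocks of 0..n-1, one per worker."""
    base, extra = divmod(n, parts)
    out, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out


def parallel_dp(terminal: Sequence[float], decisions: range, stages: int,
                cost: Cost, trans: Trans, workers: int = 4) -> list[float]:
    """Backward DP over `stages` stages, parallelised over the state variable."""
    f = list(terminal)                      # f_N
    chunks = split(len(f), workers)         # fixed state partition
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(stages):
            # Every worker sees all of f_{n+1}, but writes only its own block.
            update = partial(stage_chunk, decisions=decisions, cost=cost,
                             trans=trans, f_next=f)
            f = [v for block in pool.map(update, chunks) for v in block]
    return f
```

With an equal-size state partition, blocks finish at the same time only if every state costs the same; constraints that make some states cheaper or infeasible break that assumption, which is the load-imbalance effect the abstract describes.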
6.
Angelo Corana 《Concurrency and Computation》1998,10(10):737-762
We present a parallel algorithm for computing the correlation dimension (D2) from a time series generated by a dynamic system, using the method of correlation integrals, which essentially requires the computation of distances among a set of points in the state space. The parallelization is suitable for coarse-grained multiprocessor systems with distributed memory and is carried out using a virtually shared memory model. The algorithm simultaneously gives all the correlation integrals at various state space dimensions needed to estimate the D2. Two versions are discussed: the first computes all distances between points; the second computes only distances less than a fixed ϵ, and employs a box-assisted approach and linked lists for an efficient search of neighbouring points. The algorithms, coded in Fortran 77, are tested on a heterogeneous network of workstations consisting of various DEC Alphas of different powers, interconnected by Ethernet; the Network Linda parallel environment is used. A detailed analysis of performance is carried out using the generalization of speed-up and efficiency for heterogeneous systems. The algorithms are fully asynchronous and so intrinsically balanced. In almost all situations they achieve unit efficiency. The second version greatly reduces the computational work, thus making it possible to tackle D2 estimation even for medium and high-dimensional systems, where an extremely large number of points is involved. The algorithms can also be employed in other application contexts requiring the efficient computation of distances among a large set of points. The method proposed for the analysis of performance can be applied to similar problems. © 1998 John Wiley & Sons, Ltd.
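The two versions described above can be contrasted on the correlation sum C(ϵ) = 2 / (N(N − 1)) × #{i < j : ‖x_i − x_j‖ < ϵ}, computed over delay-embedded state vectors. The Python sketch below is serial. It uses the Euclidean norm and boxes of side ϵ held in a dictionary instead of linked lists; those choices, the delay `tau`, and all function names are assumptions, not the paper's Fortran 77/Linda code. Any pair closer than ϵ must lie in the same box or in adjacent boxes, so the boxed version counts exactly the same pairs while skipping distant ones.

```python
import itertools
import math
from collections import defaultdict


def embed(series, m, tau=1):
    """Delay vectors (s_i, s_{i+tau}, ..., s_{i+(m-1)tau}) in m dimensions."""
    n = len(series) - (m - 1) * tau
    return [tuple(series[i + k * tau] for k in range(m)) for i in range(n)]


def correlation_sum_brute(points, eps):
    """C(eps) from all N(N-1)/2 pairwise distances (the first version)."""
    n = len(points)
    close = sum(1 for a, b in itertools.combinations(points, 2)
                if math.dist(a, b) < eps)
    return 2.0 * close / (n * (n - 1))


def correlation_sum_boxed(points, eps):
    """Same C(eps), comparing only points in the same or adjacent eps-boxes."""
    boxes = defaultdict(list)
    for idx, p in enumerate(points):
        boxes[tuple(math.floor(c / eps) for c in p)].append(idx)
    dim = len(points[0])
    close = 0
    for key, members in boxes.items():
        for offset in itertools.product((-1, 0, 1), repeat=dim):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            # .get() avoids inserting empty boxes while iterating.
            for i in members:
                for j in boxes.get(neighbour, ()):
                    if i < j and math.dist(points[i], points[j]) < eps:
                        close += 1  # each unordered pair counted once (i < j)
    n = len(points)
    return 2.0 * close / (n * (n - 1))
```

The correlation dimension is then estimated from the slope of log C(ϵ) against log ϵ for increasing embedding dimensions m. The 3^m neighbouring boxes make this simple grid search grow quickly with m, so it is only meant to show the idea.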
7.
Mikhail J. Atallah Christina Lock Black Dan C. Marinescu Howard Jay Siegel Thomas L. Casavant 《Journal of Parallel and Distributed Computing》1992,16(4)
The problem of using the idle cycles of a number of high performance workstations, interconnected by a high speed network, for solving computationally intensive tasks is discussed. The classes of distributed applications examined require some form of synchronization among the subtasks, hence the need for coscheduling to guarantee that subtasks start at the same time and execute at the same pace on a group of workstations. A model of the system is presented that allows the definition of an objective function to be maximized. Then a quadratic time and linear space algorithm is derived for computing the optimal coschedule, for the given model and class of applications addressed.
8.
Joshi R.K. Ram D.J. 《IEEE transactions on pattern analysis and machine intelligence》1999,25(1):75-90
Parallel computing on interconnected workstations is becoming a viable and attractive proposition due to the rapid growth in speeds of interconnection networks and processors. In the case of workstation clusters, there is always a considerable amount of unused computing capacity available in the network. However, heterogeneity in architectures and operating systems, load variations on machines, variations in machine availability, and failure susceptibility of networks and workstations complicate the situation for the programmer. In this context, new programming paradigms that reduce the burden involved in programming for distribution, load adaptability, heterogeneity and fault tolerance gain importance. This paper identifies the issues involved in parallel computing on a network of workstations. The anonymous remote computing (ARC) paradigm is proposed to address the issues specific to parallel programming on workstation systems. ARC differs from the conventional communicating process model by treating a program as one single entity consisting of several loosely coupled remote instruction blocks instead of treating it as a collection of processes. The ARC approach results in distribution transparency and heterogeneity transparency. At the same time, it provides fault tolerance and load adaptability to parallel programs on workstations. ARC is developed in a two-tiered architecture consisting of high level language constructs and low level ARC primitives. The paper describes an implementation of the ARC kernel supporting ARC primitives.
9.
Wan Ahmad Tajuddin Wan Abdullah 《International Journal of Intelligent Systems》1992,7(6):513-519
We propose a method of doing logic programming on a Hopfield neural network. Optimization of logical consistency is carried out by the network after the connection strengths are defined from the logic program; the network relaxes to neural states corresponding to a valid (or near-valid) interpretation.
10.
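Applying the data-parallel model to MIMD machines, with loosely synchronous execution in SPMD mode, is receiving increasing attention. This paper presents the design and implementation of Multi-c, a data-parallel language for a heterogeneous parallel system environment. The Multi-c compiler, currently being implemented, works as a precompiler: it accepts programs specified in SIMD form, relaxes the synchronization requirements, and generates C programs that run in SPMD mode on the parallel system.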
11.
Road network microsimulation is computationally expensive, and existing state-of-the-art commercial tools use task parallelism and coarse-grained data parallelism on multi-core processors to achieve improved levels of performance. An alternative is to use Graphics Processing Units (GPUs) and fine-grained data parallelism. This paper describes a GPU-accelerated agent-based microsimulation model of a road network transport system. The performance for a procedurally generated grid network is evaluated against that of an equivalent multi-core CPU simulation. To utilise GPU architectures effectively, the paper describes an approach for graph traversal of neighbouring information, which is vital to providing high levels of computational performance. The graph traversal approach has been integrated within a GPU agent-based simulation framework as a generalised message traversal technique for graph-based communication. Speed-ups of up to 43× are demonstrated, with improved performance scaling behaviour. Simulation of over half a million vehicles and nearly two million detectors at 25× faster than real time is obtained on a single GPU.
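A common way to make neighbour lookups data parallel is a compressed sparse row (CSR) adjacency, where the neighbours of node v occupy one contiguous slice of a flat array. The abstract does not say which layout the paper uses, so CSR here is an assumption, and the Python below only emulates the access pattern: each iteration of the outer loop in `gather_neighbour_messages` plays the role of one GPU thread. The function names and the meaning of `messages` (for example, vehicle counts posted per node) are hypothetical.

```python
def build_csr(num_nodes, edges):
    """CSR adjacency: the neighbours of v are col[row_ptr[v]:row_ptr[v + 1]]."""
    counts = [0] * num_nodes
    for u, _ in edges:
        counts[u] += 1
    row_ptr = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        row_ptr[v + 1] = row_ptr[v] + counts[v]   # prefix sum of out-degrees
    col = [0] * len(edges)
    fill = row_ptr[:-1]                            # slicing makes a copy
    for u, v in edges:
        col[fill[u]] = v
        fill[u] += 1
    return row_ptr, col


def gather_neighbour_messages(row_ptr, col, messages):
    """One independent 'thread' per node sums the messages its neighbours posted.

    Each node only reads shared data and writes its own result, so the outer
    loop has no write conflicts and could run fully in parallel.
    """
    return [sum(messages[col[k]] for k in range(row_ptr[v], row_ptr[v + 1]))
            for v in range(len(row_ptr) - 1)]
```

For the directed edges `[(0, 1), (0, 2), (1, 2), (2, 0)]` and messages `[10, 20, 30]`, the gathered values are `[50, 30, 10]`. The contiguous slices are what let neighbouring threads read adjacent memory on a GPU.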
12.
13.
《Journal of Parallel and Distributed Computing》2004,64(10):1127-1156
Ray-tracing based radio wave propagation prediction models play an important role in the design of contemporary wireless networks as they may now take into account diverse physical phenomena including reflections, diffractions, and diffuse scattering. However, such models are computationally expensive even for moderately complex geographic environments. In this paper, we propose a computational framework that functions on a network of workstations (NOW) and helps speed up the lengthy prediction process. In ray-tracing based radio propagation prediction models, orders of diffractions are usually processed in a stage-by-stage fashion. In addition, various source points (transmitters, diffraction corners, or diffuse scattering points) and different ray-paths require different processing times. To address these widely varying needs, we propose a combination of the phase-parallel and manager/workers paradigms as the underpinning framework. The phase-parallel component is used to coordinate different computation stages, while the manager/workers paradigm is used to balance workloads among nodes within each stage. The original computation is partitioned into multiple small tasks based on either raypath-level or source-point-level granularity. Dynamic load-balancing scheduling schemes are employed to allocate the resulting tasks to the workers. We also address issues regarding main memory consumption, intermediate data assembly, and final prediction generation. We implement our proposed computational model on a NOW configuration by using the message passing interface (MPI) standard. Our experiments with real and synthetic building and terrain databases show that, when no constraint is imposed on the main memory consumption, the proposed prediction model performs very well and achieves nearly linear speedups under various workloads.
When main memory consumption is a concern, our model still delivers very promising performance rates provided that the complexity of the involved computation is high, so that the extra computation and communication overhead introduced by the proposed model do not dominate the original computation. The accuracy of prediction results and the achievable speedup rates can be significantly improved when 3D building and terrain databases are used and/or diffuse scattering effect is taken into account.
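The combination of phase-parallel and manager/workers paradigms can be sketched with a shared task queue: the manager enqueues all tasks of a stage, each idle worker pulls the next task (so workers that draw cheap tasks simply take more of them), and the stage ends only when every worker has finished. In this Python sketch, threads and `queue.Queue` stand in for the MPI processes used in the paper, and `run_phase` and its arguments are hypothetical names; ray tracing itself is not modelled.

```python
import queue
import threading
from typing import Callable, Iterable


def run_phase(tasks: Iterable, work: Callable, num_workers: int = 4) -> dict:
    """One phase of manager/workers with dynamic (self-scheduled) load balancing.

    Returns {task_index: result}. Joining all workers before returning acts
    as the barrier between consecutive phases.
    """
    todo: queue.Queue = queue.Queue()
    for task_id, task in enumerate(tasks):          # manager: enqueue the phase
        todo.put((task_id, task))
    results: dict = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                task_id, task = todo.get_nowait()   # pull the next task when idle
            except queue.Empty:
                return                              # no work left in this phase
            value = work(task)
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A hypothetical two-stage run feeds one phase's results into the next, for example `stage1 = run_phase(range(10), lambda r: r * r)` followed by `run_phase([stage1[i] for i in range(10)], lambda v: v + 1)`. The granularity question in the abstract (ray-path versus source-point tasks) corresponds to how finely `tasks` is cut.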
14.
Recommender systems in location-based social networks (LBSNs), such as Facebook Places and Foursquare, have focused on recommending friends or locations to registered users by combining information derived from explicit (i.e. friendship network) and implicit (i.e. user-item rating network, user-location network, etc.) sub-networks. However, previous models were static and failed to adequately capture users' time-varying preferences. In this paper, we provide a novel recommendation method that also takes the time dimension into account. We construct a hybrid tripartite (i.e., user, location, session) graph, which incorporates 7 different unipartite and bipartite graphs. Then, we test it with an extended version of the Random Walk with Restart (RWR) algorithm, which randomly walks through the network by using paths of 7 differently weighted edge types (i.e., user-location, user-session, user-user, etc.). We experimentally evaluate our method and compare it against three state-of-the-art algorithms on two real-life datasets, showing that our method significantly outperforms its competitors.
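Plain Random Walk with Restart computes the steady state of r = (1 − c) W r + c e_q, where W moves probability along edges, e_q puts all mass on the query user q, and c is the restart probability. The Python sketch below runs this by power iteration and lets each edge type carry its own weight. It is plain RWR, not the paper's extended version: the value c = 0.15, the `type_weight` values, and the graph encoding are assumptions for illustration.

```python
def rwr(adj, type_weight, seed, restart=0.15, iters=200, tol=1e-10):
    """Random Walk with Restart by power iteration.

    adj: {node: [(neighbour, edge_type), ...]}; an edge's transition
    probability is its type weight divided by the total weight leaving
    the node. Returns {node: steady-state score}; the scores sum to 1.
    """
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            total = sum(type_weight[t] for _, t in out)
            if total == 0:
                nxt[seed] += (1 - restart) * score[u]   # dangling node: jump home
                continue
            for v, t in out:
                nxt[v] += (1 - restart) * score[u] * type_weight[t] / total
        nxt[seed] += restart                            # restart at the query user
        delta = sum(abs(nxt[v] - score[v]) for v in nodes)
        score = nxt
        if delta < tol:
            break
    return score
```

Ranking the location nodes by their score relative to a seed user gives the recommendation list. Raising the weight of one edge type, for example user-session edges, steers more of the walk through that relation, which is the role the 7 differently weighted edge types play in the abstract.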
15.
16.
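To address the low utilization of heterogeneous processors in traditional general-purpose computing, a new general-purpose computing technique based on OpenCL (Open Computing Language), which provides a unified programming model, is presented. The features, architecture and implementation principles of OpenCL are introduced, and performance optimization strategies for OpenCL are proposed. OpenCL is compared with CUDA (Compute Unified Device Architecture) and other general-purpose computing technologies. The comparison shows that OpenCL can fully exploit the performance potential of the various processors on a heterogeneous platform and allocate tasks sensibly, providing a powerful new tool for large-scale parallel computing.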
17.
A network of workstations (NOW) can act as a single, scalable, powerful computer when a parallel and distributed computing platform is built on top of it. WAKASHI is such a platform: it supports persistent object management and makes full use of the resources of a NOW for high-performance transaction processing. One of the main difficulties to overcome is the bottleneck caused by the concurrency control mechanism. Therefore, a non-blocking locking method is designed that adopts several novel techniques to outperform other typical locking methods such as 2PL: 1) an SDG (Semantic Dependency Graph)-based non-blocking locking protocol for fast transaction scheduling; 2) a massive-virtual-memory-based backup-page undo algorithm for fast restart; and 3) a multi-processor, multi-thread transaction manager for fast execution. The new mechanisms have been implemented in WAKASHI, and performance comparison experiments with 2PL and DWDL have been carried out. The results show that the new method can outperform 2PL and DWDL under certain conditions. This is useful for choosing effective concurrency control mechanisms to improve transaction-processing performance in NOW environments.
18.
JXTA: a network programming environment (total citations: 4; self-citations: 0; citations by others: 4)
《Internet Computing, IEEE》2001,5(3):88-95
JXTA technology, from Sun Microsystems, is a network programming and computing platform that is designed to solve a number of problems in modern distributed computing, especially in the area broadly referred to as peer-to-peer (P2P) computing or P2P networking. JXTA provides a network programming platform specifically designed to be the foundation for P2P systems. As a set of protocols, the technology stays away from APIs and remains independent of programming languages. This means that heterogeneous devices with completely different software stacks can interoperate through JXTA protocols. JXTA technology is also independent of transport protocols. It can be implemented on top of TCP/IP, HTTP, Bluetooth, HomePNA, and many other protocols.
19.
Singh A. Schaeffer J. Green M. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(1):52-67
A computational model and system for the generation of distributed applications in a workstation environment are presented. The well-known RPC model is modified by a novel concept known as template attachment. A computation consists of a network of sequential procedures which have been encapsulated in templates. A small selection of templates is available from which a distributed application with the desired communication behavior can be rapidly built. The system generates all the required low-level code for correct synchronization, communication, and scheduling. This results in a system that is easy to use and flexible and can provide a programmer with the desired amount of control in using idle processing power over a network of workstations. The practical feasibility of the model has been demonstrated by implementing it for Unix-based workstation environments.
20.
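Multi-stream programming mechanisms offer heterogeneous many-core accelerators several ways of using their resources, such as pipelining and resource partitioning, but there is currently little guidance on how to choose an effective one. Based on hStreams on the Intel MIC heterogeneous many-core processor, a performance model is proposed for multi-stream programs of a single application executing across multiple hardware partitions. The model is used to analyze why multi-stream programs perform differently under different configurations and to identify the key factors affecting their performance, and a partitioning optimization strategy for multi-stream programs is proposed; the model can also help judge how well an algorithm implementation performs. Experimental results show that the error between the performance model and measured results for the multi-stream configurations is below 1%, and a multi-stream dense matrix multiplication tuned under the model's guidance achieves a 5.83% performance improvement over the single-stream program.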