Similar Literature
Found 20 similar documents (search time: 31 ms)
1.
Heterogeneous network-based distributed and parallel computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss the evolution of heterogeneous concurrent computing, in the context of the parallel virtual machine (PVM) system, a widely adopted software system for network computing. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the PVM project, and comment on ongoing and future work.
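Real PVM programs use the C library API (pvm_spawn, pvm_initsend, pvm_send, pvm_recv). As a purely illustrative, self-contained sketch of the master/worker message-passing structure that PVM applications typically follow, the Python analogue below stands in queues for PVM message buffers; all names here are assumptions for illustration, not part of PVM.

```python
# Illustrative sketch only: real PVM code uses pvm_spawn/pvm_send/pvm_recv.
# Queues play the role of PVM message buffers between master and workers.
import threading
import queue

def worker(task_q, result_q):
    # Each worker repeatedly "receives" a task message and "sends" a result.
    while True:
        task = task_q.get()
        if task is None:              # sentinel: no more work
            break
        result_q.put((task, task * task))

def master(tasks, nworkers=3):
    task_q, result_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(task_q, result_q))
               for _ in range(nworkers)]
    for w in workers:
        w.start()
    for t in tasks:
        task_q.put(t)                 # analogue of pvm_send to a worker
    for _ in workers:
        task_q.put(None)              # one sentinel per worker
    results = dict(result_q.get() for _ in tasks)  # analogue of pvm_recv
    for w in workers:
        w.join()
    return results
```

Threads are used here only to keep the sketch self-contained; PVM distributes such workers as processes across a heterogeneous network.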

2.
Network computing has evolved into a popular and effective mode of high performance computing. Network computing environments have fundamental differences from hardware multiprocessors, involving a different approach to measuring and characterizing performance, monitoring an application's progress and understanding program behavior. In this paper, we present the design and implementation of PVaniM, an experimental visualization environment we have developed for the PVM network computing system. PVaniM supports a two-phase approach whereby on-line visualization focuses on large-grained events that are influenced by and relate to the dynamic network computing environment, and postmortem visualization provides for detailed program analysis and tuning. PVaniM's capabilities are illustrated via its use on several applications and a comparison with single-phase visualization environments developed for network computing. Our experiences indicate that, for several classes of applications, the two-phase visualization scheme can provide valuable insight into the behavior, efficiency and operation of distributed and parallel programs in network computing environments. © 1998 John Wiley & Sons, Ltd.

3.
Concurrent computing environments based on loosely coupled networks have proven effective as resources for multiprocessing. Experiences with and enhancements to version 1.0 of PVM (Parallel Virtual Machine) are described in this paper. PVM is a software package that allows the utilization of a heterogeneous network of parallel and serial computers as a single computational resource. This report also describes an interactive graphical interface to PVM, and porting and performance results from production applications.

4.
Prospects for applying virtualization technology in high-performance computations on x64 systems are studied. Principal reasons for performance degradation when parallel programs are running in virtual environments are considered. The KVM/QEMU and Palacios virtualization systems are considered in detail, with the HPC Challenge and NAS Parallel Benchmarks used as benchmarks. A modern computing cluster built on the Infiniband high-speed interconnect is used in testing. The results of the study show that, in general, virtualization is reasonable for a wide class of high-performance applications. Fine tuning of the virtualization systems involved made it possible to reduce overheads from 10–60% to 1–5% on the majority of tests from the HPC Challenge and NAS Parallel Benchmarks suites. The main bottlenecks of virtualization systems are reduced performance of the memory system (which is critical only for a narrow class of problems), costs associated with hardware virtualization, and the increased noise caused by the host operating system and hypervisor. Noise can have a negative effect on performance and scalability of fine-grained applications (applications with frequent small-scale communications). The influence of noise significantly increases as the number of nodes in the system grows.

5.
The amount of big data collected during human–computer interactions requires natural language processing (NLP) applications to be executed efficiently, especially in parallel computing environments. Scalability and performance are critical in many NLP applications such as search engines or web indexers. However, there is a lack of mathematical models helping users to design and apply scheduling theory for NLP approaches. Moreover, many researchers and software architects reported various difficulties related to common NLP benchmarks. Therefore, this paper aims to introduce and demonstrate how to apply a scheduling model for a class of keyword extraction approaches. Additionally, we propose methods for the overall performance evaluation of different algorithms, which are based on processing time and correctness (quality) of answers. Finally, we present a set of experiments performed in different computing environments together with obtained results that can be used as reference benchmarks for further research in the field.
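The abstract above proposes evaluating algorithms on both processing time and answer quality. A hedged sketch of that kind of combined metric follows; the weighting scheme and function names are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative combined quality/time score for a keyword-extraction run.
# F1 measures answer quality against a reference keyword set; the time
# term rewards faster runs.  Weights are assumptions, not from the paper.
def f1(predicted, reference):
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    p = len(predicted & reference) / len(predicted)   # precision
    r = len(predicted & reference) / len(reference)   # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def combined_score(predicted, reference, seconds, w_quality=0.7):
    """Higher is better; time is folded in as 1/(1+t), so faster runs score higher."""
    return w_quality * f1(predicted, reference) + (1 - w_quality) / (1 + seconds)
```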

6.
With the current advances in computer and networking technology coupled with the availability of software tools for parallel and distributed computing, there has been increased interest in high-performance distributed computing (HPDC). We envision that HPDC environments with supercomputing capabilities will be available in the near future. However, a number of issues have to be resolved before future network-based applications can fully exploit the potential of the HPDC environment. In this paper, we present an architecture for a high-speed local area network and a communication system that provides HPDC applications with high bandwidth and low latency. We also characterize the message-passing primitives required in HPDC applications and develop a communication protocol that implements these primitives efficiently.

7.
Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources. However, the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging. To achieve a higher system performance, this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments. The collaborative scheduling strategy integrates lightweight solution selection, redundant data placement and task stealing mechanisms, optimizing task distribution and data placement to achieve efficient computing in wide-area environments. The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+, the proposed scheduling strategy reduces the makespan by 23.24%, improves computing and storage resource utilization by 8.28% and 21.73% respectively, and achieves similar global data migration costs.
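The task-stealing mechanism mentioned above can be sketched in a few lines: each worker drains its own queue first and, when idle, steals work from the back of the busiest queue. This is a generic, sequential illustration of the idea, not the paper's actual algorithm; all names are assumptions.

```python
# Minimal sequential sketch of task stealing (illustrative, not the paper's
# algorithm).  Each worker pops local work from the front of its own deque;
# an idle worker steals from the back of the longest remaining deque.
from collections import deque

def run_with_stealing(queues):
    """queues: one deque of unit work items per worker.
    Returns the number of work units each worker ends up executing."""
    done = [0] * len(queues)
    while any(queues):
        for i, q in enumerate(queues):
            if q:
                done[i] += q.popleft()             # local work first
            else:
                victim = max(queues, key=len)      # steal from the busiest
                if victim:
                    done[i] += victim.pop()        # take from the tail
    return done
```

Stealing from the tail while the owner pops from the head is the classic way to reduce contention between owner and thief.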

8.
Distributed concurrent computing based on lightweight processes can potentially address performance and functionality limits in heterogeneous systems. The TPVM framework, based on the notion of ‘exportable services’, is an extension to the PVM message-passing system, but uses threads as units of computing, scheduling, and parallelism. TPVM facilitates and supports three different distributed concurrent programming paradigms: (a) the traditional, task based, explicit message-passing model; (b) a data-driven instantiation model that enables straightforward specification of computation based on data dependencies; and (c) a partial shared-address space model via remote memory access, with naming and typing of distributed data areas. The latter models offer significantly different computing paradigms for network-based computing, while maintaining a close resemblance to, and building upon, the conventional PVM infrastructure in the interest of compatibility and ease of transition. The TPVM system comprises three basic modules: a library interface that provides access to thread-based distributed concurrent computing facilities, a portable thread interface module which abstracts the required thread-related services, and a thread server module which performs scheduling and system data management. System implementation as well as applications experiences have been very encouraging, indicating the viability of the proposed models, the feasibility of portable and efficient threads systems for distributed computing, and the performance improvements that result from multithreaded concurrent computing. © 1998 John Wiley & Sons, Ltd.

9.
With the widespread adoption of general-purpose graphics processors in high-performance computing, new parallel execution models have been proposed. Under these new models, current memory scheduling policies fail to maximize memory throughput. This paper analyzes the memory-access behavior of applications under the multi-program concurrent execution model on graphics processors and the causes of their unfair performance loss, and proposes a memory scheduling policy based on access-behavior awareness that exploits the strengths of different program types for priority scheduling. Experiments show that the method markedly alleviates the imbalance in performance loss across different program types, improving memory-system throughput and fairness over the baseline architecture by an average of 9.7% and 15.0%, respectively, across all benchmark programs.

10.
Today's distributed and high-performance applications require high computational power and high communication performance. Recently, the computational power of commodity PCs has doubled about every 18 months. At the same time, network interconnects that provide very low latency and very high bandwidth are also emerging. This is a promising trend in building high-performance computing environments by clustering - combining the computational power of commodity PCs with the communication performance of high-speed network interconnects. There are several network interconnects that provide low latency and high bandwidth. Traditionally, researchers have used simple microbenchmarks, such as latency and bandwidth tests, to characterize a network interconnect's communication performance. Later, they proposed more sophisticated models such as LogP. However, these tests and models focus on general parallel computing systems and do not address many features present in these emerging commercial interconnects. Another way to evaluate different network interconnects is to use real-world applications. However, real applications usually run on top of a middleware layer such as the message passing interface (MPI). Our results show that to gain more insight into the performance characteristics of these interconnects, it is important to go beyond simple tests such as those for latency and bandwidth. In the future, we plan to expand our microbenchmark suite to include more tests and more interconnects.
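The LogP model mentioned above characterizes a network by four parameters: L (latency), o (per-message CPU overhead at sender and receiver), g (gap, the minimum interval between consecutive message injections) and P (processor count). A small worked example of the standard cost prediction follows; the numbers are illustrative, not measurements from the paper.

```python
# LogP cost sketch: time for one processor to deliver n small messages to
# another.  The last message is injected after (n-1) gaps plus the send
# overhead, takes L to cross the network, and o to be received.
def logp_time(n_msgs, L, o, g):
    """Predicted delivery time for n_msgs back-to-back small messages.
    max(g, o) models that injections can be limited by either the
    network gap or the sender's per-message overhead."""
    return (n_msgs - 1) * max(g, o) + o + L + o

# Illustrative numbers (microseconds), not measured values:
# 10 messages with L=5, o=2, g=4  ->  9*4 + 2 + 5 + 2 = 45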

11.
Networks of workstations (NOWs) offer a cost-effective platform for high-performance, long-running parallel computations. However, these computations must be able to tolerate the changing and often faulty nature of NOW environments. We present high-performance implementations of several fault-tolerant algorithms for distributed scientific computing. The fault-tolerance is based on diskless checkpointing, a paradigm that uses processor redundancy rather than stable storage as the fault-tolerant medium. These algorithms are able to run on clusters of workstations that change over time due to failure, load, or availability. As long as there are at leastnprocessors in the cluster, and failures occur singly, the computation will complete in an efficient manner. We discuss the details of how the algorithms are tuned for fault-tolerance and present the performance results on a PVM network of Sun workstations connected by a fast, switched ethernet.  相似文献   

12.
李卓 《软件》2014,(3):38-39
目前高校也建立起了高性能计算环境体系,主要被用于从事大规模科学计算和复杂问题求解,拥有很高的计算能力和海量的存储资源,时常遭受来自互联网的入侵或攻击。由于单一的安全技术无法满足高性能计算环境复杂的安全防护需求,进一步恶化了高性能计算环境的安全状况。因此建立一套安全防御体系对高性能计算环境的正常运行有着非常重要的意义。  相似文献   

13.
LBS—基于PVM的动态任务负载平衡系统   总被引:1,自引:0,他引:1  
负载平衡问题是影响工作站机群系统并行计算性能的一个重要因素。  相似文献   

14.
【目的】本文主要分析人工智能和大数据应用随着迅速增大的数据规模,给计算机系统带来的主要挑战,并针对计算机系统的发展趋势给出了一些面向人工智能和大数据亟待解决的高效能计算的若干研究方向。【文献范围】本文广泛查阅国内外在超级计算和高性能计算平台进行大数据和人工智能计算的最新研究成果及解决的挑战性问题。【方法】大数据既为人工智能提供了日益丰富的训练数据集合,但也给计算机系统的算力提出了更高的要求。近年来我国超级计算机处于世界的前列,为大数据和人工智能的大规模应用提供了强有力的计算平台支撑。【结果】而目前以超级计算机为代表的高性能计算平台大多采用CPU+加速器构成的异构并行计算系统,其数量众多的计算核心能够为人工智能和大数据应用提供强大的计算能力。【局限性】由于体系结构复杂,在充分发挥计算能力和提高计算效率方面存在较大挑战。尤其针对有别于科学计算的人工智能和大数据领域,其并行计算效率的提升更为困难。【结论】因此需要从底层的资源管理、任务调度、以及基础算法设计、通信优化,到上层的模型并行化和并行编程等方面展开高效能计算的研究,全面提升人工智能和大数据应用在高性能计算平台上的计算能效。  相似文献   

15.
姚渺  裴巍  单珊  孟波  杨愚鲁 《计算机工程与应用》2005,41(17):156-159,196
集群系统通信性能作为影响集群性能的主要因素之一,其测量对寻找集群内部通信瓶颈具有指导作用。采用NetPIPE基准测试对PC集群系统和Sun工作站集群的通信性能进行了测量,实验结果与理论分析一致,表明在通信性能方面,MPI环境整体上优于PVM,合并一些非相关短消息为长消息能够优化集群应用。并采用性能模拟的方法,以基准测试为工具,对两个集群系统的带参数LogP通信模型进行了定量化地测量和计算,完整表征了集群通信子系统的通信性能特征。  相似文献   

16.
基于代理的网格计算中间件   总被引:11,自引:0,他引:11  
WADE系统是基于代理技术实现的一个可屏蔽异构和分布性的动态自适应的校园计算网格,提出了基于代理技术在校园网络内实现并行计算的方法,详细论述了基于代理的网格计算中间件的体系结构和主要模块功能,阐述了利用代理实现异构编译、协同计算的过程,给出了代理的Java实现方法,利用软件代理实现网格计算中间件,可以解决异构计算平台下多种并行编程环境的协同计算问题,为用户提供统一的服务接口,这将大大增强系统的可用性。  相似文献   

17.
Quantifying the Performance Differences between PVM and TreadMarks   总被引:1,自引:0,他引:1  
This paper compares two systems for parallel programming on networks of workstations: Parallel Virtual Machine (PVM), a message-passing system, and TreadMarks, a software distributed shared-memory (DSM) system. The eight applications used in this comparison are Water and Barnes–Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS), and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR) and Traveling Salesman (TSP). Two different input data sets are used for five of the applications. We use two execution environments. The first is a 155 Mbps ATM network with eight Sparc-20 model 61 workstations; the second is an eight-processor IBM SP/2. The differences in speedup between TreadMarks and PVM depend mostly on the applications, and only to a much lesser extent on the platform and the data set used. In particular, the TreadMarks speedup for six of the eight applications is within 15% of that achieved with PVM. For one application, the difference in speedup is between 15% and 30%, and for another, the difference is around 50%. We identified four important factors that contribute to the lower performance of TreadMarks: (1) extra messages due to the separation of synchronization and data transfer, (2) extra messages to handle access misses caused by the use of an invalidate protocol, (3) false sharing, and (4) diff accumulation for migratory data. We have quantified the effects of the last three factors by measuring the performance gain when each is eliminated. Of the three factors, TreadMarks' use of a separate request message per page of data accessed is the most important. The effect of false sharing is comparatively low. Reducing diff accumulation benefits migratory data only when the diffs completely overlap. 
When these performance impediments are removed, all of the TreadMarks programs perform within 25% of PVM, and for six out of eight experiments, TreadMarks is less than 5% slower than PVM.  相似文献   

18.
The computational problems that scientists face are rapidly escalating in size and scope. Moreover, the computer systems used to solve these problems are becoming significantly more complex than the familiar, well-understood sequential model on their desktops. While it is possible to re-train scientists to use emerging high-performance computing (HPC) models, it is much more effective to provide them with a higher-level programming environment that has been specialized to their particular domain. By fostering interaction between HPC specialists and the domain scientists, problem-solving environments (PSEs) provide a collaborative environment. A PSE environment allows scientists to focus on expressing their computational problem while the PSE and associated tools support mapping that domain-specific problem to a high-performance computing system.This article describes Arches, an object-oriented framework for building domain-specific PSEs. The framework was designed to support a wide range of problem domains and to be extensible to support very different high-performance computing targets. To demonstrate this flexibility, two PSEs have been developed from the Arches framework to solve problem in two different domains and target very different computing platforms. The Coven PSE supports parallel applications that require large-scale parallelism found in cost-effective Beowulf clusters. In contrast, RCADE targets FPGA-based reconfigurable computing and was originally designed to aid NASA Earth scientists studying satellite instrument data.  相似文献   

19.
Workstation and PC clusters interconnected by SCI (scalable coherent interface) are very promising technologies for high-performance cluster computing. Using commercial SBus to SCI interface cards and system software and drivers, a two-workstation cluster has been constructed for initial testing and evaluation. The PVM system has been adapted to operate on this cluster using both raw channel and shared-memory access to the SCI interconnect, and preliminary communications performance tests have been carried out. To achieve mutual exclusion in accessing shared-memory segments, two protocols were used. Our preliminary results indicate that communications throughput in the range of 17.7 Mbytes/s, and round-trip latencies of 80 μs using the first and 140 μs using the second protocol, can be obtained on SCI clusters. These figures are significantly better (by a factor of 2 to 4) for small and large messages than those attainable on Fast Ethernet LANs. Since these performance results are very encouraging, we expect that, in the very near future, SCI networks will be capable of delivering several tens of Mbytes/s bandwidth and a few tens of microseconds latencies, and will significantly enhance the viability of cluster computing. Copyright © 1999 John Wiley & Sons, Ltd.  相似文献   

20.
为了实现资源和系统环境的隔离,近年来新兴了多种虚拟化工具,容器便是其中之一。在超算资源上运行的问题通常是由软件配置引起的。容器的一个作用就是将依赖打包进轻量级可移植的环境中,这样可以提高超算应用程序的部署效率。为了解基于IB网的CPU-GPU异构超算平台上容器虚拟化技术的性能特征,使用标准基准测试工具对Docker容器进行了全面的性能评估。该方法能够评估容器在虚拟化宿主机过程中产生的性能开销,包括文件系统访问性能、并行通信性能及GPU计算性能。结果表明,容器具备近乎原生宿主机的性能,文件系统I/O开销及GPU计算开销与原生宿主机差别不大。随着网络负载的增大,容器的并行通信开销也相应增大。根据评估结果,提出了一种能够发挥超算平台容器性能的方法,为使用者有针对性地进行系统配置、合理设计应用程序提供依据。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号