20 similar documents found
1.
A. I. Avetisyan, S. S. Gaisaryan, V. P. Ivannikov, V. A. Padaryan. Automation and Remote Control, 2007, 68(5): 750-759
We study a model of parallel programs that can be interpreted efficiently on a development workstation while still predicting, with sufficient accuracy, the real running time of the modeled program on a target computer system. The model is developed for parallel programs with explicit message passing written in Java with calls to an MPI library, and it is integrated into the ParJava environment. The model is obtained by transforming the program's control tree, which can be constructed for Java programs by modifying the abstract syntax tree. Communication functions are modeled with the LogGP model, which captures the specific characteristics of the communication network of the distributed computer system.
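For reference, the LogGP model mentioned above estimates point-to-point message time from latency L, per-message overhead o, gap g, and gap-per-byte G. The C sketch below illustrates that estimate with invented parameter values; it is not taken from ParJava.

    /* Minimal sketch of a LogGP-style point-to-point time estimate.
       The parameter values are illustrative placeholders, not
       measurements of any particular network. */
    #include <stdio.h>

    typedef struct {
        double L;  /* end-to-end latency (us) */
        double o;  /* per-message CPU overhead at sender/receiver (us) */
        double g;  /* minimum gap between consecutive small messages (us) */
        double G;  /* gap per byte for long messages (us/byte) */
    } loggp_params;

    /* Estimated time to deliver one k-byte message: send overhead,
       per-byte gap for the remaining bytes, wire latency, receive overhead. */
    static double loggp_send_time(const loggp_params *p, size_t k)
    {
        return p->o + (k > 0 ? (double)(k - 1) * p->G : 0.0) + p->L + p->o;
    }

    int main(void)
    {
        loggp_params net = { 5.0, 1.5, 2.0, 0.003 };   /* assumed values */
        printf("estimated time for 64 KiB message: %.2f us\n",
               loggp_send_time(&net, 64 * 1024));
        return 0;
    }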
2.
Message passing interface (MPI) is the de facto standard for writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottlenecks or to optimize programs. To effectively analyze and predict the performance of a large and complex MPI program, an efficient and accurate communication model is needed. A series of communication models have been proposed, such as the LogP model family, which assume th...
3.
One of the key elements required for writing self-healing applications for distributed and dynamic computing environments is checkpointing. Checkpointing is a mechanism by which an application is made resilient to failures by storing its state periodically to disk. The main goal of this research is to enable non-invasive reengineering of existing applications to insert an Application-Level Checkpointing (ALC) mechanism. The Domain-Specific Language (DSL) developed in this research serves as the means to this end and is used for obtaining the ALC specifications from end-users. These specifications are used for generating and inserting the actual checkpointing code into the existing application. The performance of an application containing the generated checkpointing code is comparable to that of an application in which the checkpointing code was inserted manually. With slight modifications, the DSL developed in this research can be used for specifying the ALC mechanism in several base languages (e.g., C/C++, Java, and FORTRAN).
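As an illustration of what inserted application-level checkpointing code looks like, the C sketch below periodically saves, and on restart restores, the variables needed to resume a loop. The file name, interval, and choice of state are hypothetical and are not the output of the DSL described above.

    /* Hypothetical illustration of application-level checkpointing (ALC):
       the application periodically writes the variables it needs to restart
       from (here: the loop index and a state array) to disk. */
    #include <stdio.h>
    #include <string.h>

    #define N 1000
    #define CKPT_FILE "state.ckpt"
    #define CKPT_INTERVAL 100   /* iterations between checkpoints */

    static void save_checkpoint(int iter, const double *state)
    {
        FILE *f = fopen(CKPT_FILE, "wb");
        if (!f) return;
        fwrite(&iter, sizeof iter, 1, f);
        fwrite(state, sizeof *state, N, f);
        fclose(f);
    }

    static int restore_checkpoint(int *iter, double *state)
    {
        FILE *f = fopen(CKPT_FILE, "rb");
        if (!f) return 0;                      /* no checkpoint: start fresh */
        int ok = fread(iter, sizeof *iter, 1, f) == 1 &&
                 fread(state, sizeof *state, N, f) == N;
        fclose(f);
        return ok;
    }

    int main(void)
    {
        double state[N];
        int start;
        if (!restore_checkpoint(&start, state)) {   /* no usable checkpoint */
            start = 0;
            memset(state, 0, sizeof state);
        }

        for (int i = start; i < 10000; i++) {
            for (int j = 0; j < N; j++)        /* the "real" computation */
                state[j] += 1.0;
            if (i % CKPT_INTERVAL == 0)
                save_checkpoint(i + 1, state); /* resume after this iteration */
        }
        return 0;
    }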
4.
N. Neophytou, P. Evripidou. IEEE Transactions on Parallel and Distributed Systems, 2001, 12(9): 986-995
This paper describes Net-dbx, a tool that uses Java and other World Wide Web technologies to debug MPI programs from anywhere on the Internet. Net-dbx is a source-level interactive debugger with the full power of gdb (the GNU Debugger), augmented with the debugging functionality of public-domain MPI implementation environments. The main effort was a low-overhead yet powerful graphical interface that works over low-bandwidth connections. The portability of the tool is equally important, because it allows the tool to be used on the heterogeneous nodes that participate in an MPI multicomputer. Both needs are largely satisfied by the use of WWW browsing tools and the Java programming language. A user of the system simply points his or her browser to the Net-dbx page, logs in to the destination system, and starts debugging by interacting with the tool, just as with any GUI environment. The user can dynamically select which MPI processes to view and debug. A special WWW-based environment has been designed and implemented to host the system prototype.
5.
In-memory hash tables for accumulating text vocabularies
6.
Tobias Hilbrich, Matthias S. Müller, Bettina Krammer. International Journal of Parallel Programming, 2009, 37(3): 277-291
The MPI interface is the de facto standard for message passing applications, but it is also complex and defines several usage patterns as erroneous. A current trend is the investigation of hybrid programming techniques that use MPI processes and multiple threads per process. As a result, more and more MPI implementations support multi-threading, which is restricted by several rules of the MPI standard. To support developers of hybrid MPI applications, we present extensions to the MPI correctness checking tool Marmot. Basic extensions make it aware of OpenMP multi-threading, while further ones add new correctness checks. As a result, it is possible to detect with Marmot errors that actually occur in a run. However, some errors only occur for certain execution orders; we therefore present a novel approach using artificial data races, which allows us to employ thread checking tools, e.g., Intel Thread Checker, to detect MPI usage errors.
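A minimal example of the kind of hybrid MPI/OpenMP usage error such a checker targets is sketched below: MPI is initialized at MPI_THREAD_FUNNELED, yet worker threads call MPI_Send. The example is illustrative and not drawn from Marmot's test suite; it requires at least two ranks and assumes both ranks see the same OMP_NUM_THREADS.

    /* Illustrative hybrid MPI/OpenMP usage error: MPI is initialized with
       MPI_THREAD_FUNNELED (only the main thread may call MPI), yet every
       OpenMP thread on rank 0 calls MPI_Send. */
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            #pragma omp parallel
            {
                int tid = omp_get_thread_num();
                /* ERROR: under MPI_THREAD_FUNNELED, threads other than the
                   main thread must not make MPI calls. */
                MPI_Send(&tid, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            }
        } else if (rank == 1) {
            int buf, nmsgs = omp_get_max_threads();
            for (int i = 0; i < nmsgs; i++)
                MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }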
7.
W.-P. K. Yiu, K.-F. S. Wong, S.-H. G. Chan, Wan-Ching Wong, Qian Zhang, Wen-Wu Zhu, Ya-Qin Zhang. IEEE Transactions on Multimedia, 2006, 8(2): 219-232
We consider media streaming using application-level multicast (ALM), where packet loss has to be recovered via retransmission in a timely manner. Since packets may be lost due to congestion, node failures, and join/leave dynamics, the traditional "vertical" recovery approach, in which upstream nodes retransmit the lost packets, is no longer effective. We therefore propose lateral error recovery (LER). In LER, hosts are divided into a number of planes, each of which forms an independent ALM tree. Since error correlation across planes is low, a node effectively recovers from errors by "laterally" requesting retransmission from nearby nodes in other planes. We present an analysis of the complexity and recovery delay of LER. Using Internet-like topologies, we show via simulations that LER is an effective error recovery mechanism. It achieves low overhead in terms of delivery delay (i.e., relative delay penalty) and physical link stress. Compared with traditional recovery schemes, LER attains a much lower residual loss rate (i.e., the loss rate after retransmission) under a given deadline constraint. Its performance can be substantially improved in the presence of some reliable proxies.
8.
9.
Simon Dooms, Pieter Audenaert, Jan Fostier, Toon De Pessemier, Luc Martens. Journal of Intelligent Information Systems, 2014, 42(3): 645-669
Burdened by their popularity, recommender systems increasingly take on larger datasets while they are expected to deliver high quality results within reasonable time. To meet these ever growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. While the MapReduce paradigm is generally accepted for massive parallel data processing, it often entails complex algorithm reorganization and suboptimal efficiency because mid-computation values are typically read from and written to hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully constructing the definition of work in the context of recommender systems, we are able to partition the complete calculation process into any number of independent and equally sized jobs. An empirically validated performance model is developed to predict parallel speedup and promises high efficiencies for realistic hardware configurations. For the MovieLens 10M dataset we note efficiency values up to 71% for a configuration of 200 computing nodes (eight cores per node).
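The data-parallel partitioning idea, splitting the users to be scored into equally sized independent jobs, can be sketched as below. MPI is used here only as one possible distributed-memory vehicle, and the user range and scoring stub are invented for illustration.

    /* Sketch of data-parallel work partitioning for a content-based
       recommender: the set of users to score is split into equally sized,
       independent chunks, one per MPI rank.  The similarity computation
       itself is only stubbed out. */
    #include <mpi.h>
    #include <stdio.h>

    #define NUM_USERS 100000

    static void score_user(int user_id)
    {
        /* placeholder for the per-user content-based similarity computation */
        (void)user_id;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* contiguous block partitioning: each rank gets ~NUM_USERS/size users */
        int base = NUM_USERS / size, rem = NUM_USERS % size;
        int begin = rank * base + (rank < rem ? rank : rem);
        int count = base + (rank < rem ? 1 : 0);

        for (int u = begin; u < begin + count; u++)
            score_user(u);

        printf("rank %d scored users [%d, %d)\n", rank, begin, begin + count);
        MPI_Finalize();
        return 0;
    }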
10.
As utility computing is widely deployed, organizations and researchers are turning to the next generation of cloud systems: federating public clouds, integrating private and public clouds, and merging resources at all levels (IaaS, PaaS, SaaS). Adaptive systems can help address the challenge of managing this heterogeneous collection of resources. While services and libraries exist for basic management tasks that enable implementing decisions made by the manager, monitoring is an open challenge. We define a set of requirements for aggregating monitoring data from a heterogeneous collection of resources, sufficient to support adaptive systems. We present and implement an architecture that uses stream processing to provide near-real-time, cross-boundary, distributed, scalable, fault-tolerant monitoring. A case study illustrates the value of collecting and aggregating metrics from disparate sources. A set of experiments shows the feasibility of our prototype with regard to latency, overhead, and cost effectiveness.
11.
Vincent W. Freeh, Nandini Kappiah, David K. Lowenthal, Tyler K. Bletsch. Journal of Parallel and Distributed Computing, 2008
Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. As a result, improving energy efficiency of nodes on HPC machines has become important, and the prevalence of power-scalable clusters, where the frequency and voltage can be dynamically modified, has increased.
12.
Specifying and enforcing application-level Web security policies
Application-level Web security refers to vulnerabilities inherent in the code of a Web-application itself (irrespective of the technologies in which it is implemented or the security of the Web-server/back-end database on which it is built). In the last few months, application-level vulnerabilities have been exploited with serious consequences: Hackers have tricked e-commerce sites into shipping goods for no charge, usernames and passwords have been harvested, and confidential information (such as addresses and credit-card numbers) has been leaked. We investigate new tools and techniques which address the problem of application-level Web security. We 1) describe a scalable structuring mechanism facilitating the abstraction of security policies from large Web-applications developed in heterogeneous multiplatform environments; 2) present a set of tools which assist programmers in developing secure applications which are resilient to a wide range of common attacks; and 3) report results and experience arising from our implementation of these techniques.
13.
Most Internet services (e-commerce, search engines, etc.) suffer faults. Quickly detecting these faults can be the largest bottleneck in improving availability of the system. We present Pinpoint, a methodology for automating fault detection in Internet services by: 1) observing low-level internal structural behaviors of the service; 2) modeling the majority behavior of the system as correct; and 3) detecting anomalies in these behaviors as possible symptoms of failures. Without requiring any a priori application-specific information, Pinpoint correctly detected 89%-96% of major failures in our experiments, as compared with 20%-70% detected by current application-generic techniques.
14.
In cluster parallel computing, it is often necessary to present computational results graphically. Although the MPE package in the MPICH distribution includes basic parallel graphics output, its drawing functions are extremely simple and cannot display the output of applications such as image processing. This paper analyzes the complex core code of the MPE graphics functions and extends the MPE library, adding new graphics library functions for more complex drawing tasks, thereby reducing programming effort and improving efficiency. The same extension approach also applies to visualizing the results of MPI parallel applications in other domains.
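For orientation, the sketch below shows the kind of basic MPE graphics calls such an extension builds on: opening a shared window and drawing points from every rank. The argument order of MPE_Open_graphics and the header names follow the MPE graphics documentation as recalled here and should be checked against the installed headers.

    /* Sketch of basic MPE parallel graphics usage: every rank in the
       communicator opens the same 400x400 window and draws points into it.
       Argument order (handle, comm, display, x, y, width, height,
       collective flag) is assumed from the MPE manual; verify against
       mpe_graphics.h in the local MPE build. */
    #include <mpi.h>
    #include "mpe.h"
    #include "mpe_graphics.h"

    int main(int argc, char **argv)
    {
        int rank;
        MPE_XGraph win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* NULL display: use the DISPLAY environment variable; final 0 asks
           for independent (non-collective) drawing as in the MPE examples */
        MPE_Open_graphics(&win, MPI_COMM_WORLD, NULL, -1, -1, 400, 400, 0);

        for (int x = 0; x < 100; x++)        /* one row of points per rank */
            MPE_Draw_point(win, x, 10 + 10 * rank, MPE_RED);
        MPE_Update(win);                     /* flush buffered drawing ops */

        MPE_Close_graphics(&win);
        MPI_Finalize();
        return 0;
    }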
15.
We have studied the interaction between process-based parallel programs whose characteristics change in various ways at run time and the operation of load-balancing, as implemented by process migration. In order to do this, we propose a simple performance model, whose parameters represent features of the program's execution such as the frequency and regularity of the changes in computational characteristics, and conduct a series of experiments involving simulated executions of synthetic programs with controlled parameter values. From these we can deduce the relative importance of the parameters from the point of view of their influence on performance. We can explain our observations in terms of a simplified stochastic model that relates local changes in load to global behaviour. We show that the dynamics of load-balancing can be represented approximately by a first-order difference equation, and that the distributed process migration algorithm is consistent with a behaviour on the global scale which can be regarded as that of a traditional feedback controller.
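The observation that load-balancing dynamics resemble a first-order difference equation can be illustrated with a toy simulation in which each node sheds a fraction K of its excess over the global average, so the excess decays as (1 - K)^t. This is an invented illustration, not the authors' model; the gain and initial loads below are arbitrary.

    /* Toy illustration of load balancing as a first-order feedback loop:
       each step a node migrates a fraction K of its excess over the global
       average, so excess(t+1) = (1 - K) * excess(t). */
    #include <stdio.h>

    #define NODES 4
    #define K 0.5          /* migration gain per balancing step */

    int main(void)
    {
        double load[NODES] = { 10.0, 2.0, 6.0, 2.0 };
        double avg = 0.0;
        for (int i = 0; i < NODES; i++)
            avg += load[i] / NODES;

        for (int t = 0; t < 8; t++) {
            for (int i = 0; i < NODES; i++)
                load[i] -= K * (load[i] - avg);   /* migrate excess toward avg */
            printf("step %d: %.2f %.2f %.2f %.2f\n",
                   t, load[0], load[1], load[2], load[3]);
        }
        return 0;
    }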
16.
Targeting the characteristics of RapidIO networks, this paper analyzes the layered design of MPICH2 and the MPI communication methods built on the TCP and SCTP protocols. By redefining the CH3 layer under ADI3, an MPI device layer based on RapidIO is designed and implemented, establishing a communication channel from MPI down to RapidIO and supporting multi-stream communication. Experiments on machines equipped with RapidIO network cards show that this dedicated MPI device layer outperforms the Ethernet emulator in both bandwidth and latency, and that the advantage grows for large-volume communication.
17.
Application-level checkpointing is a fault-tolerance technique that has attracted considerable attention in large-scale scientific computing. However, it requires users to decide which critical data must be saved, which increases their burden. This paper introduces ALEC, a source-to-source precompiler based on live-variable analysis of MPI parallel programs, which assists application-level checkpointing. Five Fortran/MPI applications compiled with ALEC were evaluated on a 512-processor cluster system. The results show that ALEC effectively reduces checkpoint size and the save/restore overhead of application-level checkpointing.
18.
John Paul Walters, Vipin Chaudhary. IEEE Transactions on Parallel and Distributed Systems, 2009, 20(7): 997-1010
As computational clusters increase in size, their mean time to failure reduces drastically. Typically, checkpointing is used to minimize the loss of computation. Most checkpointing techniques, however, require central storage for storing checkpoints. This results in a bottleneck and severely limits the scalability of checkpointing, while also proving to be too expensive for dedicated checkpointing networks and storage systems. We propose a scalable replication-based MPI checkpointing facility. Our reference implementation is based on LAM/MPI; however, it is directly applicable to any MPI implementation. We extend the existing state of fault-tolerant MPI with asynchronous replication, eliminating the need for central or network storage. We evaluate centralized storage, a Sun-X4500-based solution, an EMC storage area network (SAN), and the Ibrix commercial parallel file system and show that they are not scalable, particularly after 64 CPUs. We demonstrate the low overhead of our checkpointing and replication scheme with the NAS Parallel Benchmarks and the High-Performance LINPACK benchmark with tests up to 256 nodes while demonstrating that checkpointing and replication can be achieved with a much lower overhead than that provided by current techniques. Finally, we show that the monetary cost of our solution is as low as 25 percent of that of a typical SAN/parallel-file-system-equipped storage system.
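The core replication idea, each rank keeping a copy of a partner's checkpoint instead of writing to central storage, can be sketched against the standard MPI API as follows; the checkpoint contents and partner scheme are illustrative rather than the paper's LAM/MPI-based implementation.

    /* Sketch of checkpoint replication without central storage: each rank
       prepares its checkpoint locally, then asynchronously sends a copy to
       rank (rank + 1) % size and stores the copy received from its
       predecessor. */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define CKPT_BYTES 1024

    int main(int argc, char **argv)
    {
        int rank, size;
        char mine[CKPT_BYTES], partner_copy[CKPT_BYTES];
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        memset(mine, 'A' + (rank % 26), CKPT_BYTES);  /* fake local state */
        /* the local checkpoint would be written to node-local disk here */

        int next = (rank + 1) % size;
        int prev = (rank - 1 + size) % size;

        /* asynchronous replication: overlaps with continued computation */
        MPI_Isend(mine, CKPT_BYTES, MPI_CHAR, next, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(partner_copy, CKPT_BYTES, MPI_CHAR, prev, 0, MPI_COMM_WORLD,
                  &reqs[1]);

        /* ... application work would continue here ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d holds a replica of rank %d's checkpoint\n", rank, prev);

        MPI_Finalize();
        return 0;
    }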
19.
The MPI all-to-all exchange is one of the communication operations commonly used in simulation computations on cluster computers; it exchanges the intermediate results of the previous step among the compute nodes. Because the dense many-to-many traffic of all-to-all communication easily causes blocking at the receivers and thus increases communication latency, the operation is optimized by organizing it into multiple regular, ordered communication rounds arranged in a ring, which yields a clear performance improvement for all-to-all exchanges of large data volumes.
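A ring-ordered personalized all-to-all of the kind described above can be sketched as follows: in round r, rank i sends its block to (i + r) mod P and receives from (i - r + P) mod P, so no receiver is hit by many senders in the same round. The block size and data are illustrative, not the paper's implementation.

    /* Sketch of a ring-ordered all-to-all exchange that spreads traffic
       across rounds instead of flooding every receiver at once. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK 4096   /* ints exchanged between each pair of ranks */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *sendbuf = malloc((size_t)size * BLOCK * sizeof *sendbuf);
        int *recvbuf = malloc((size_t)size * BLOCK * sizeof *recvbuf);
        for (int i = 0; i < size * BLOCK; i++)
            sendbuf[i] = rank;                 /* fake per-destination data */

        for (int r = 0; r < size; r++) {
            int dst = (rank + r) % size;
            int src = (rank - r + size) % size;
            MPI_Sendrecv(sendbuf + dst * BLOCK, BLOCK, MPI_INT, dst, r,
                         recvbuf + src * BLOCK, BLOCK, MPI_INT, src, r,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        printf("rank %d finished %d exchange rounds\n", rank, size);
        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }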
20.
MPI is a parallel program development environment widely used on cluster systems, and MPI fault tolerance is a key issue for cluster reliability. This paper discusses fault tolerance in the MPI standard and designs MPIChaRR, a checkpoint-based rollback-recovery system that combines mechanisms such as coordinated checkpointing and synchronized rollback. The system has been applied to Linux clusters, and recovery from node failures during MPICH application runs is transparent to the user.