20 similar documents found
1.
A. I. Avetisyan, S. S. Gaisaryan, V. P. Ivannikov, V. A. Padaryan. Automation and Remote Control, 2007, 68(5): 750-759
We study a model of parallel programs that can be interpreted efficiently on a development workstation while still predicting, with sufficient accuracy, the real running time of the modeled program on a target computer system. The model is developed for parallel programs with explicit message passing written in Java with calls to an MPI library, and it is integrated into the ParJava environment. The model is obtained by transforming the program's control tree, which can be constructed for Java programs by modifying the abstract syntax tree. Communication functions are modeled with the LogGP model, which captures the specific characteristics of the communication network of the distributed computer system.
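For reference, the LogGP model mentioned above estimates point-to-point message time from latency L, per-message overhead o, gap g, and gap-per-byte G. The C sketch below illustrates that estimate with invented parameter values; it is not taken from ParJava.

    /* Minimal sketch of a LogGP-style point-to-point time estimate.
       The parameter values are illustrative placeholders, not
       measurements of any particular network. */
    #include <stdio.h>

    typedef struct {
        double L;  /* end-to-end latency (us) */
        double o;  /* per-message CPU overhead at sender/receiver (us) */
        double g;  /* minimum gap between consecutive small messages (us) */
        double G;  /* gap per byte for long messages (us/byte) */
    } loggp_params;

    /* Estimated time to deliver one k-byte message: send overhead,
       per-byte gap for the remaining bytes, wire latency, receive overhead. */
    static double loggp_send_time(const loggp_params *p, size_t k)
    {
        return p->o + (k > 0 ? (double)(k - 1) * p->G : 0.0) + p->L + p->o;
    }

    int main(void)
    {
        loggp_params net = { 5.0, 1.5, 2.0, 0.003 };   /* assumed values */
        printf("estimated time for 64 KiB message: %.2f us\n",
               loggp_send_time(&net, 64 * 1024));
        return 0;
    }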
2.
Message passing interface (MPI) is the de facto standard for writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottlenecks or to optimize programs. To effectively analyze and predict the performance of a large and complex MPI program, an efficient and accurate communication model is needed. A series of communication models have been proposed, such as the LogP model family, which assume th...
3.
One of the key elements required for writing self-healing applications for distributed and dynamic computing environments is checkpointing. Checkpointing is a mechanism by which an application is made resilient to failures by storing its state periodically to disk. The main goal of this research is to enable non-invasive reengineering of existing applications to insert an Application-Level Checkpointing (ALC) mechanism. The Domain-Specific Language (DSL) developed in this research serves as the means to this end and is used for obtaining the ALC specifications from end-users. These specifications are used for generating and inserting the actual checkpointing code into the existing application. The performance of an application containing the generated checkpointing code is comparable to that of an application in which the checkpointing code was inserted manually. With slight modifications, the DSL developed in this research can be used for specifying the ALC mechanism in several base languages (e.g., C/C++, Java, and FORTRAN).
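As an illustration of what inserted application-level checkpointing code looks like, the C sketch below periodically saves, and on restart restores, the variables needed to resume a loop. The file name, interval, and choice of state are hypothetical and are not the output of the DSL described above.

    /* Hypothetical illustration of application-level checkpointing (ALC):
       the application periodically writes the variables it needs to restart
       from (here: the loop index and a state array) to disk. */
    #include <stdio.h>
    #include <string.h>

    #define N 1000
    #define CKPT_FILE "state.ckpt"
    #define CKPT_INTERVAL 100   /* iterations between checkpoints */

    static void save_checkpoint(int iter, const double *state)
    {
        FILE *f = fopen(CKPT_FILE, "wb");
        if (!f) return;
        fwrite(&iter, sizeof iter, 1, f);
        fwrite(state, sizeof *state, N, f);
        fclose(f);
    }

    static int restore_checkpoint(int *iter, double *state)
    {
        FILE *f = fopen(CKPT_FILE, "rb");
        if (!f) return 0;                      /* no checkpoint: start fresh */
        int ok = fread(iter, sizeof *iter, 1, f) == 1 &&
                 fread(state, sizeof *state, N, f) == N;
        fclose(f);
        return ok;
    }

    int main(void)
    {
        double state[N];
        int start;
        if (!restore_checkpoint(&start, state)) {   /* no usable checkpoint */
            start = 0;
            memset(state, 0, sizeof state);
        }

        for (int i = start; i < 10000; i++) {
            for (int j = 0; j < N; j++)        /* the "real" computation */
                state[j] += 1.0;
            if (i % CKPT_INTERVAL == 0)
                save_checkpoint(i + 1, state); /* resume after this iteration */
        }
        return 0;
    }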
4.
N. Neophytou, P. Evripidou. IEEE Transactions on Parallel and Distributed Systems, 2001, 12(9): 986-995
This paper describes Net-dbx, a tool that uses Java and other World Wide Web technologies to debug MPI programs from anywhere on the Internet. Net-dbx is a source-level interactive debugger with the full power of gdb (the GNU Debugger), augmented with the debugging functionality of public-domain MPI implementation environments. The main effort was a low-overhead yet powerful graphical interface that works over low-bandwidth connections. The portability of the tool is equally important, because it allows the tool to be used on the heterogeneous nodes that participate in an MPI multicomputer. Both needs are largely satisfied by the use of WWW browsing tools and the Java programming language. A user of the system simply points his or her browser to the Net-dbx page, logs in to the destination system, and starts debugging by interacting with the tool, just as with any GUI environment. The user can dynamically select which MPI processes to view and debug. A special WWW-based environment has been designed and implemented to host the system prototype.
5.
In-memory hash tables for accumulating text vocabularies
6.
Tobias Hilbrich, Matthias S. Müller, Bettina Krammer. International Journal of Parallel Programming, 2009, 37(3): 277-291
The MPI interface is the de facto standard for message passing applications, but it is also complex and defines several usage patterns as erroneous. A current trend is the investigation of hybrid programming techniques that use MPI processes and multiple threads per process. As a result, more and more MPI implementations support multi-threading, which is restricted by several rules of the MPI standard. To support developers of hybrid MPI applications, we present extensions to the MPI correctness checking tool Marmot. Basic extensions make it aware of OpenMP multi-threading, while further ones add new correctness checks. As a result, it is possible to detect with Marmot errors that actually occur in a run. However, some errors only occur for certain execution orders; we therefore present a novel approach using artificial data races, which allows us to employ thread checking tools, e.g., Intel Thread Checker, to detect MPI usage errors.
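A minimal example of the kind of hybrid MPI/OpenMP usage error such a checker targets is sketched below: MPI is initialized at MPI_THREAD_FUNNELED, yet worker threads call MPI_Send. The example is illustrative and not drawn from Marmot's test suite; it requires at least two ranks and assumes both ranks see the same OMP_NUM_THREADS.

    /* Illustrative hybrid MPI/OpenMP usage error: MPI is initialized with
       MPI_THREAD_FUNNELED (only the main thread may call MPI), yet every
       OpenMP thread on rank 0 calls MPI_Send. */
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            #pragma omp parallel
            {
                int tid = omp_get_thread_num();
                /* ERROR: under MPI_THREAD_FUNNELED, threads other than the
                   main thread must not make MPI calls. */
                MPI_Send(&tid, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            }
        } else if (rank == 1) {
            int buf, nmsgs = omp_get_max_threads();
            for (int i = 0; i < nmsgs; i++)
                MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }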
7.
W.-P. K. Yiu, K.-F. S. Wong, S.-H. G. Chan, Wan-Ching Wong, Qian Zhang, Wen-Wu Zhu, Ya-Qin Zhang. IEEE Transactions on Multimedia, 2006, 8(2): 219-232
We consider media streaming using application-level multicast (ALM), where packet loss has to be recovered via retransmission in a timely manner. Since packets may be lost due to congestion, node failures, and join/leave dynamics, the traditional "vertical" recovery approach, in which upstream nodes retransmit the lost packets, is no longer effective. We therefore propose lateral error recovery (LER). In LER, hosts are divided into a number of planes, each of which forms an independent ALM tree. Since error correlation across planes is low, a node effectively recovers from errors by "laterally" requesting retransmission from nearby nodes in other planes. We present an analysis of the complexity and recovery delay of LER. Using Internet-like topologies, we show via simulations that LER is an effective error recovery mechanism. It achieves low overhead in terms of delivery delay (i.e., relative delay penalty) and physical link stress. Compared with traditional recovery schemes, LER attains a much lower residual loss rate (i.e., the loss rate after retransmission) under a given deadline constraint. Its performance can be substantially improved in the presence of some reliable proxies.
8.
9.
Simon Dooms, Pieter Audenaert, Jan Fostier, Toon De Pessemier, Luc Martens. Journal of Intelligent Information Systems, 2014, 42(3): 645-669
Burdened by their popularity, recommender systems increasingly take on larger datasets while they are expected to deliver high quality results within reasonable time. To meet these ever growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. While the MapReduce paradigm is generally accepted for massive parallel data processing, it often entails complex algorithm reorganization and suboptimal efficiency because mid-computation values are typically read from and written to hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully constructing the definition of work in the context of recommender systems, we are able to partition the complete calculation process into any number of independent and equally sized jobs. An empirically validated performance model is developed to predict parallel speedup and promises high efficiencies for realistic hardware configurations. For the MovieLens 10M dataset we note efficiency values up to 71% for a configuration of 200 computing nodes (eight cores per node).
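The data-parallel partitioning idea, splitting the users to be scored into equally sized independent jobs, can be sketched as below. MPI is used here only as one possible distributed-memory vehicle, and the user range and scoring stub are invented for illustration.

    /* Sketch of data-parallel work partitioning for a content-based
       recommender: the set of users to score is split into equally sized,
       independent chunks, one per MPI rank.  The similarity computation
       itself is only stubbed out. */
    #include <mpi.h>
    #include <stdio.h>

    #define NUM_USERS 100000

    static void score_user(int user_id)
    {
        /* placeholder for the per-user content-based similarity computation */
        (void)user_id;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* contiguous block partitioning: each rank gets ~NUM_USERS/size users */
        int base = NUM_USERS / size, rem = NUM_USERS % size;
        int begin = rank * base + (rank < rem ? rank : rem);
        int count = base + (rank < rem ? 1 : 0);

        for (int u = begin; u < begin + count; u++)
            score_user(u);

        printf("rank %d scored users [%d, %d)\n", rank, begin, begin + count);
        MPI_Finalize();
        return 0;
    }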
10.
As utility computing is widely deployed, organizations and researchers are turning to the next generation of cloud systems: federating public clouds, integrating private and public clouds, and merging resources at all levels (IaaS, PaaS, SaaS). Adaptive systems can help address the challenge of managing this heterogeneous collection of resources. While services and libraries exist for basic management tasks that enable implementing decisions made by the manager, monitoring is an open challenge. We define a set of requirements for aggregating monitoring data from a heterogeneous collection of resources, sufficient to support adaptive systems. We present and implement an architecture that uses stream processing to provide near-real-time, cross-boundary, distributed, scalable, fault-tolerant monitoring. A case study illustrates the value of collecting and aggregating metrics from disparate sources. A set of experiments shows the feasibility of our prototype with regard to latency, overhead, and cost effectiveness.
11.
Vincent W. Freeh, Nandini Kappiah, David K. Lowenthal, Tyler K. Bletsch. Journal of Parallel and Distributed Computing, 2008
Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. As a result, improving energy efficiency of nodes on HPC machines has become important, and the prevalence of power-scalable clusters, where the frequency and voltage can be dynamically modified, has increased.
12.
Specifying and enforcing application-level Web security policies
Application-level Web security refers to vulnerabilities inherent in the code of a Web-application itself (irrespective of the technologies in which it is implemented or the security of the Web-server/back-end database on which it is built). In the last few months, application-level vulnerabilities have been exploited with serious consequences: Hackers have tricked e-commerce sites into shipping goods for no charge, usernames and passwords have been harvested, and confidential information (such as addresses and credit-card numbers) has been leaked. We investigate new tools and techniques which address the problem of application-level Web security. We 1) describe a scalable structuring mechanism facilitating the abstraction of security policies from large Web-applications developed in heterogeneous multiplatform environments; 2) present a set of tools which assist programmers in developing secure applications which are resilient to a wide range of common attacks; and 3) report results and experience arising from our implementation of these techniques.
13.
Most Internet services (e-commerce, search engines, etc.) suffer faults. Quickly detecting these faults can be the largest bottleneck in improving availability of the system. We present Pinpoint, a methodology for automating fault detection in Internet services by: 1) observing low-level internal structural behaviors of the service; 2) modeling the majority behavior of the system as correct; and 3) detecting anomalies in these behaviors as possible symptoms of failures. Without requiring any a priori application-specific information, Pinpoint correctly detected 89%-96% of major failures in our experiments, as compared with 20%-70% detected by current application-generic techniques.
14.
In cluster parallel computing, it is often necessary to present computational results graphically. Although the MPE package in the MPICH distribution includes basic parallel graphics output, its drawing functions are extremely simple and cannot display the output of applications such as image processing. This paper analyzes the complex core code of the MPE graphics functions and extends the MPE library, adding new graphics library functions for more complex drawing tasks, thereby reducing programming effort and improving efficiency. The same extension approach also applies to visualizing the results of MPI parallel applications in other domains.
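For orientation, the sketch below shows the kind of basic MPE graphics calls such an extension builds on: opening a shared window and drawing points from every rank. The argument order of MPE_Open_graphics and the header names follow the MPE graphics documentation as recalled here and should be checked against the installed headers.

    /* Sketch of basic MPE parallel graphics usage: every rank in the
       communicator opens the same 400x400 window and draws points into it.
       Argument order (handle, comm, display, x, y, width, height,
       collective flag) is assumed from the MPE manual; verify against
       mpe_graphics.h in the local MPE build. */
    #include <mpi.h>
    #include "mpe.h"
    #include "mpe_graphics.h"

    int main(int argc, char **argv)
    {
        int rank;
        MPE_XGraph win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* NULL display: use the DISPLAY environment variable; final 0 asks
           for independent (non-collective) drawing as in the MPE examples */
        MPE_Open_graphics(&win, MPI_COMM_WORLD, NULL, -1, -1, 400, 400, 0);

        for (int x = 0; x < 100; x++)        /* one row of points per rank */
            MPE_Draw_point(win, x, 10 + 10 * rank, MPE_RED);
        MPE_Update(win);                     /* flush buffered drawing ops */

        MPE_Close_graphics(&win);
        MPI_Finalize();
        return 0;
    }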
15.
We have studied the interaction between process-based parallel programs whose characteristics change in various ways at run time and the operation of load-balancing, as implemented by process migration. In order to do this, we propose a simple performance model, whose parameters represent features of the program's execution such as the frequency and regularity of the changes in computational characteristics, and conduct a series of experiments involving simulated executions of synthetic programs with controlled parameter values. From these we can deduce the relative importance of the parameters from the point of view of their influence on performance. We can explain our observations in terms of a simplified stochastic model that relates local changes in load to global behaviour. We show that the dynamics of load-balancing can be represented approximately by a first-order difference equation, and that the distributed process migration algorithm is consistent with a behaviour on the global scale which can be regarded as that of a traditional feedback controller.
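The observation that load-balancing dynamics resemble a first-order difference equation can be illustrated with a toy simulation in which each node sheds a fraction K of its excess over the global average, so the excess decays as (1 - K)^t. This is an invented illustration, not the authors' model; the gain and initial loads below are arbitrary.

    /* Toy illustration of load balancing as a first-order feedback loop:
       each step a node migrates a fraction K of its excess over the global
       average, so excess(t+1) = (1 - K) * excess(t). */
    #include <stdio.h>

    #define NODES 4
    #define K 0.5          /* migration gain per balancing step */

    int main(void)
    {
        double load[NODES] = { 10.0, 2.0, 6.0, 2.0 };
        double avg = 0.0;
        for (int i = 0; i < NODES; i++)
            avg += load[i] / NODES;

        for (int t = 0; t < 8; t++) {
            for (int i = 0; i < NODES; i++)
                load[i] -= K * (load[i] - avg);   /* migrate excess toward avg */
            printf("step %d: %.2f %.2f %.2f %.2f\n",
                   t, load[0], load[1], load[2], load[3]);
        }
        return 0;
    }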
16.
Targeting the characteristics of RapidIO networks, this paper analyzes the layered design of MPICH2 and the MPI communication methods built on the TCP and SCTP protocols. By redefining the CH3 layer under ADI3, an MPI device layer based on RapidIO is designed and implemented, establishing a communication channel from MPI down to RapidIO and supporting multi-stream communication. Experiments on machines equipped with RapidIO network cards show that this dedicated MPI device layer outperforms the Ethernet emulator in both bandwidth and latency, and that the advantage grows for large-volume communication.
17.
Application-level checkpointing is a fault-tolerance technique that has attracted considerable attention in large-scale scientific computing. However, it requires users to decide which critical data must be saved, which increases their burden. This paper introduces ALEC, a source-to-source precompiler based on live-variable analysis of MPI parallel programs, which assists application-level checkpointing. Five Fortran/MPI applications compiled with ALEC were evaluated on a 512-processor cluster system. The results show that ALEC effectively reduces checkpoint size and the save/restore overhead of application-level checkpointing.
18.
John Paul Walters, Vipin Chaudhary. IEEE Transactions on Parallel and Distributed Systems, 2009, 20(7): 997-1010
As computational clusters increase in size, their mean time to failure reduces drastically. Typically, checkpointing is used to minimize the loss of computation. Most checkpointing techniques, however, require central storage for storing checkpoints. This results in a bottleneck and severely limits the scalability of checkpointing, while also proving to be too expensive for dedicated checkpointing networks and storage systems. We propose a scalable replication-based MPI checkpointing facility. Our reference implementation is based on LAM/MPI; however, it is directly applicable to any MPI implementation. We extend the existing state of fault-tolerant MPI with asynchronous replication, eliminating the need for central or network storage. We evaluate centralized storage, a Sun-X4500-based solution, an EMC storage area network (SAN), and the Ibrix commercial parallel file system and show that they are not scalable, particularly after 64 CPUs. We demonstrate the low overhead of our checkpointing and replication scheme with the NAS Parallel Benchmarks and the High-Performance LINPACK benchmark with tests up to 256 nodes while demonstrating that checkpointing and replication can be achieved with a much lower overhead than that provided by current techniques. Finally, we show that the monetary cost of our solution is as low as 25 percent of that of a typical SAN/parallel-file-system-equipped storage system.
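The core replication idea, each rank keeping a copy of a partner's checkpoint instead of writing to central storage, can be sketched against the standard MPI API as follows; the checkpoint contents and partner scheme are illustrative rather than the paper's LAM/MPI-based implementation.

    /* Sketch of checkpoint replication without central storage: each rank
       prepares its checkpoint locally, then asynchronously sends a copy to
       rank (rank + 1) % size and stores the copy received from its
       predecessor. */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define CKPT_BYTES 1024

    int main(int argc, char **argv)
    {
        int rank, size;
        char mine[CKPT_BYTES], partner_copy[CKPT_BYTES];
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        memset(mine, 'A' + (rank % 26), CKPT_BYTES);  /* fake local state */
        /* the local checkpoint would be written to node-local disk here */

        int next = (rank + 1) % size;
        int prev = (rank - 1 + size) % size;

        /* asynchronous replication: overlaps with continued computation */
        MPI_Isend(mine, CKPT_BYTES, MPI_CHAR, next, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(partner_copy, CKPT_BYTES, MPI_CHAR, prev, 0, MPI_COMM_WORLD,
                  &reqs[1]);

        /* ... application work would continue here ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d holds a replica of rank %d's checkpoint\n", rank, prev);

        MPI_Finalize();
        return 0;
    }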
19.
The MPI all-to-all exchange is one of the communication operations commonly used in simulation computations on cluster computers; it exchanges the intermediate results of the previous step among the compute nodes. Because the dense many-to-many traffic of all-to-all communication easily causes blocking at the receivers and thus increases communication latency, the operation is optimized by organizing it into multiple regular, ordered communication rounds arranged in a ring, which yields a clear performance improvement for all-to-all exchanges of large data volumes.
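A ring-ordered personalized all-to-all of the kind described above can be sketched as follows: in round r, rank i sends its block to (i + r) mod P and receives from (i - r + P) mod P, so no receiver is hit by many senders in the same round. The block size and data are illustrative, not the paper's implementation.

    /* Sketch of a ring-ordered all-to-all exchange that spreads traffic
       across rounds instead of flooding every receiver at once. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK 4096   /* ints exchanged between each pair of ranks */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *sendbuf = malloc((size_t)size * BLOCK * sizeof *sendbuf);
        int *recvbuf = malloc((size_t)size * BLOCK * sizeof *recvbuf);
        for (int i = 0; i < size * BLOCK; i++)
            sendbuf[i] = rank;                 /* fake per-destination data */

        for (int r = 0; r < size; r++) {
            int dst = (rank + r) % size;
            int src = (rank - r + size) % size;
            MPI_Sendrecv(sendbuf + dst * BLOCK, BLOCK, MPI_INT, dst, r,
                         recvbuf + src * BLOCK, BLOCK, MPI_INT, src, r,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        printf("rank %d finished %d exchange rounds\n", rank, size);
        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }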
20.
MPI is a parallel program development environment widely used on cluster systems, and MPI fault tolerance is a key issue for cluster reliability. This paper discusses fault tolerance in the MPI standard and designs MPIChaRR, a checkpoint-based rollback-recovery system that combines mechanisms such as coordinated checkpointing and synchronized rollback. The system has been applied to Linux clusters, and recovery from node failures during MPICH application runs is transparent to the user.