首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A Performance Analysis Tool for PVM Parallel Programs   总被引:1,自引:0,他引:1  
In this paper,we introduce the design and implementation of ParaVT,which is a visual performance analysis and parallel debugging tool.In ParaVT,we propose an automated instrumentation mechanism.Based on this mechanism,ParaVT automatically analyzes the performance bottleneck of parallel applications and provides a visual user interface to monitor and analyze the performance of parallel programs.In addition,it also supports certain extensions.  相似文献   

2.
In this paper, we present a constant propagation algorithm for explicitly parallel programs, which we call the Concurrent Sparse Conditional Constant propagation algorithm. This algorithm is an extension of the Sparse Conditional Constant propagation algorithm. Without considering the interaction between threads, classical optimizations lead to an incorrect program transformation for parallel programs. To make analyzing parallel programs possible, a new intermediate representation is needed. We introduce the Concurrent Static Single Assignment (CSSA) form to represent explicitly parallel programs with interleaving semantics and synchronization. The only parallel construct considered in this paper is cobegin/coend. A new confluence function, the -assignment, which summarizes the information of interleaving statements between threads, is introduced. The Concurrent Control Flow Graph, which contains information about conflicting statements, control flow, and synchronization, is used as an underlying representation for the CSSA from.  相似文献   

3.
This paper adds temporal logic to public announcement logic (PAL) and dynamic epistemic logic (DEL). By adding a previous-time operator to PAL, we express in the language statements concerning the muddy children puzzle and sum and product. We also express a true statement that an agent’s beliefs about another agent’s knowledge flipped twice, and use a sound proof system to prove this statement. Adding a next-time operator to PAL, we provide formulas that express that belief revision does not take place in PAL. We also discuss relationships between announcements and the new knowledge agents thus acquire; such relationships are related to learning and to Fitch’s paradox. We also show how inverse programs and hybrid logic each can be used to help determine whether or not an arbitrary structure represents the play of a game. We then add a past-time operator to DEL, and discuss the importance of adding yet another component to the language in order to prove completeness.  相似文献   

4.
Perceptual Temporal Quality Metric for Compressed Video   总被引:2,自引:0,他引:2  
This paper presents a metric to quantify frame loss according to the impact on perceived temporal quality. This metric particularly aims at measuring the temporal quality degradation caused by both regular and irregular frame loss. Experimental results with subjective viewing demonstrate high performance on prediction of perceptual temporal quality.  相似文献   

5.
本文基于静态相关性分析和动态调整相结合的方法,提出了一种逻辑程序的执行模型,它不仅开发了“与“并行,同进也开发了一定的“或“并行,从而有效地加速了逻辑程序的执行。  相似文献   

6.
This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.  相似文献   

7.
Computations on two-dimensional arrays such as matrices and images are one of the most fundamental and ubiquitous things in computational science and its vast application areas, but development of efficient parallel programs on two-dimensional arrays is known to be hard. In this paper, we propose a compositional framework that supports users, even with little knowledge about parallel machines, to develop both correct and efficient parallel programs on dense two-dimensional arrays systematically. The key feature of our framework is a novel use of the abide-tree representation of two-dimensional arrays. The presentation not only inherits the advantages of tree representations of matrices where recursive blocked algorithms can be defined to achieve better performance, but also supports transformational development of parallel programs and architecture-independent implementation owing to its solid theoretical foundation – the theory of constructive algorithmics.  相似文献   

8.
目前,随着大规模并行计算的高速发展,并行程序性能分析与建模的地位日益重要,而并行程序性能数据的采集是进行性能分析的基础。硬件计数器的使用使人们能够更加便利地在程序执行过程中采集性能数据。文章讨论了OpenMP并行程序的性能数据采集技术,并介绍一种利用PAPI进行数据采集的实现方法。  相似文献   

9.
We apply the methodology of competitive analysis of algorithms to the implementation of programs on parallel machines. We consider the problem of finding the best on-line distributed scheduling strategy that executes in parallel an unknown directed acyclic graph (dag) which represents the data dependency relation graph of a parallel program and which is revealed as execution proceeds. We study the competitive ratio of some important classes of dags assuming a fixed communication delay ratio τ that captures the average interprocessor communication measured in instruction cycles. We provide competitive algorithms for divide-and-conquer dags, trees, and general dags, when the number of processors depends on the size of the input dag and when the number of processors is fixed. Our major result is a lower bound Ω (τ / log τ ) of the competitive ratio for trees; it shows that it is impossible to design compilers that produce almost optimal execution code for all parallel programs. This fundamental result holds for almost any reasonable distributed memory parallel computation model, including the LogP and BSP model. Received March 5, 1996; revised March 11, 1997.  相似文献   

10.
An important component of a programming language for writing operating systems, or other large parallel systems, is the set of access control facilities. Two principles for access control, expressive power and access validation, are discussed. Then two new language mechanisms are presented: one for expressing the static structure and access rights of parallel systems, the other for controlling dynamic access to shared objects (monitors). The use of the proposed mechanisms is illustrated by examples including a file system. Finally, the relationships between the mechanisms, access validation, and the safety problem are discussed.  相似文献   

11.
This paper introduces a formal model of a method for automated adjustment of parallel applications (autotuning). The software implementation of this model is described in the form of a flexible software framework for automatic generation of autotuners using an underlain term rewriting system and expert knowledge as a source of optimizing transformations.  相似文献   

12.
A parallel method for globally minimizing a linear program with an additional reverse convex constraint is proposed which combines the outer approximation technique and the cutting plane method. Basicallyp(≤n) processors are used for a problem withnvariables and a globally optimal solution is found effectively in a finite number of steps. Computational results are presented for test problems with a number of variables up to 80 and 63 linear constraints (plus nonnegativity constraints). These results were obtained on a distributed-memory MIMD parallel computer, DELTA, by running both serial and parallel algorithms with double precision. Also, based on 40 randomly generated problems of the same size, with 16 variables and 32 linear constraints (plusx≥ 0), the numerical results from different number processors are reported, including the serial algorithm's.  相似文献   

13.
14.
An approach to proving paralel programs correct is presented. The steps are 1) model the paralel program, 2) prove partial correctness (proper synchronization), and 3) prove the absence of deadlock, livelock, and infinite loops. The parallel program model is based on KeUler's model. The main contributions of the paper are techniques for proving the absence of deadlock and livelock. A connection is made between Keler's work and Dijkstra's work with serial non-deterministic programs. It is shown how a variant function may be used to prove finite termination, even if the variant function is not strictly decreasing, and how finite termination can be used to prove the absence of livelock. Handling of the finite delay assumption is also discussed. The illustrative examples indude one which occurred in a commercial environment and a classic synchronization problem solved without the aid of special synchronization primitives.  相似文献   

15.
Programming and Computer Software - The DVM-approach to the development of parallel programs for heterogeneous computer clusters with accelerators. The basic capabilities of DVM and SAPFOR , which...  相似文献   

16.
Parallel computers execute parallel programs that are transferred to other parallel architectures with difficultly and require special training of programmers. A parallelizing system is proposed that helps one to solve this problem.  相似文献   

17.
18.
Data distributions have a serious impact on time complexity of parallel programs, developed based on domain decomposition. A new kind of distributions—set distributions, based on set-valued mappings, is introduced. These distributions assign a data object to more than one process. The set distributions can be used especially when the number of processes is greater than the data input size, but, sometimes using set distributions can lead to efficient general parallel algorithms. The work-load properties of these distributions and their impact on the number of communications are discussed. In order to illustrate the implications of data distributions in the construction of parallel programs, some examples are presented. Two parallel algorithms for computation of Lagrange interpolation polynomial are developed, starting from simple distributions and set distributions.  相似文献   

19.
20.
The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and stringency of response time requirements has out-paced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential Aggregation Tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impacted the performance of the algorithms. For distributed result placement, we discovered that Greedy Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large number of processing nodes; for low data reduction, it only performed well up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge which was best for the remaining cases. We present a cost model that closely predicts the running time of Greedy Time Division Merge.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号