20 similar documents found.
1.
A Performance Analysis Tool for PVM Parallel Programs
Chen Wang, Yin Liu, Changjun Jiang, Zhaoqing Zhang 《计算机工程与应用》2004,40(29):103-105,112
In this paper, we introduce the design and implementation of ParaVT, a visual performance analysis and parallel debugging tool. ParaVT provides an automated instrumentation mechanism; based on this mechanism, it automatically analyzes the performance bottlenecks of parallel applications and offers a visual user interface for monitoring and analyzing the performance of parallel programs. In addition, it supports certain extensions.
2.
Jaejin Lee Samuel P. Midkiff David A. Padua 《International journal of parallel programming》1998,26(5):563-589
In this paper, we present a constant propagation algorithm for explicitly parallel programs, which we call the Concurrent Sparse Conditional Constant propagation algorithm. This algorithm is an extension of the Sparse Conditional Constant propagation algorithm. Without considering the interaction between threads, classical optimizations lead to incorrect program transformations for parallel programs. To make analyzing parallel programs possible, a new intermediate representation is needed. We introduce the Concurrent Static Single Assignment (CSSA) form to represent explicitly parallel programs with interleaving semantics and synchronization. The only parallel construct considered in this paper is cobegin/coend. A new confluence function, the π-assignment, which summarizes the information of interleaving statements between threads, is introduced. The Concurrent Control Flow Graph, which contains information about conflicting statements, control flow, and synchronization, is used as the underlying representation for the CSSA form.
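The π-assignment described above can be read as a meet over the constant-propagation lattice, applied to every definition that may reach a use through thread interleaving. The following is a minimal sketch of that idea, not the paper's implementation; the names `meet` and `pi` and the string sentinels are our own assumptions:

```python
# Constant-propagation lattice: TOP (no information) > each constant > BOT (not constant).
TOP, BOT = "TOP", "BOT"

def meet(a, b):
    """Lattice meet for the constant-propagation lattice."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    if a == b:
        return a
    return BOT  # two different constants: the value is not a compile-time constant

def pi(values):
    """pi-assignment sketch: merge all definitions that may reach a use
    via interleaving with concurrent threads."""
    result = TOP
    for v in values:
        result = meet(result, v)
    return result
```

If every interleaved definition writes the same constant, the use still folds to that constant; any conflict drops the value to BOT, which is exactly why ignoring concurrent writes makes classical constant propagation unsound.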
3.
Joshua Sack 《Journal of Logic, Language and Information》2008,17(2):183-216
This paper adds temporal logic to public announcement logic (PAL) and dynamic epistemic logic (DEL). By adding a previous-time operator to PAL, we express in the language statements concerning the muddy children puzzle and the Sum and Product puzzle. We also express a true statement that an agent's beliefs about another agent's knowledge flipped twice, and use a sound proof system to prove this statement. Adding a next-time operator to PAL, we provide formulas expressing that belief revision does not take place in PAL. We also discuss relationships between announcements and the new knowledge agents thereby acquire; such relationships are related to learning and to Fitch's paradox. We also show how inverse programs and hybrid logic can each be used to help determine whether or not an arbitrary structure represents the play of a game. We then add a past-time operator to DEL, and discuss the importance of adding yet another component to the language in order to prove completeness.
4.
Perceptual Temporal Quality Metric for Compressed Video
Kai-Chieh Yang, C.C. Guest, K. El-Maleh, P.K. Das 《IEEE Transactions on Multimedia》2007,9(7):1528-1535
This paper presents a metric to quantify frame loss according to the impact on perceived temporal quality. This metric particularly aims at measuring the temporal quality degradation caused by both regular and irregular frame loss. Experimental results with subjective viewing demonstrate high performance on prediction of perceptual temporal quality.
5.
Based on a method that combines static dependency analysis with dynamic adjustment, this paper proposes an execution model for logic programs. The model exploits not only AND-parallelism but also a degree of OR-parallelism, thereby effectively accelerating the execution of logic programs.
6.
Christopher Brown Marco Danelutto Kevin Hammond Peter Kilpatrick Archibald Elliott 《International journal of parallel programming》2014,42(4):564-582
This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.
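The combination of a skeleton and a cost model can be illustrated with a small sketch. This is not Skel (which is an Erlang library) but a generic task-farm skeleton in Python together with a toy cost model of the kind the abstract describes; the function names and the overhead parameter are our own assumptions:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def farm(worker, inputs, nworkers=4):
    """Task-farm skeleton: apply `worker` to each input in parallel,
    preserving input order in the result list."""
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(worker, inputs))

def predicted_speedup(t_task, n_tasks, n_workers, t_overhead_per_task=0.0):
    """Toy skeleton cost model: sequential time divided by the time for
    ceil(n_tasks / n_workers) rounds of parallel task execution."""
    t_seq = n_tasks * t_task
    t_par = math.ceil(n_tasks / n_workers) * (t_task + t_overhead_per_task)
    return t_seq / t_par
```

A programmer deciding whether a refactoring is worthwhile would compare `predicted_speedup` for candidate worker counts before touching the code, which is the workflow the paper's cost models enable.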
7.
Kento Emoto Zhenjiang Hu Kazuhiko Kakehi Masato Takeichi 《International journal of parallel programming》2007,35(6):615-658
Computations on two-dimensional arrays such as matrices and images are among the most fundamental and ubiquitous tasks in computational science and its vast application areas, but developing efficient parallel programs on two-dimensional arrays is known to be hard. In this paper, we propose a compositional framework that supports users, even those with little knowledge about parallel machines, in systematically developing correct and efficient parallel programs on dense two-dimensional arrays. The key feature of our framework is a novel use of the abide-tree representation of two-dimensional arrays. This representation not only inherits the advantages of tree representations of matrices, where recursive blocked algorithms can be defined to achieve better performance, but also supports transformational development of parallel programs and architecture-independent implementation owing to its solid theoretical foundation: the theory of constructive algorithmics.
8.
9.
We apply the methodology of competitive analysis of algorithms to the implementation of programs on parallel machines. We consider the problem of finding the best on-line distributed scheduling strategy that executes in parallel an unknown directed acyclic graph (dag), which represents the data dependency relation graph of a parallel program and is revealed as execution proceeds. We study the competitive ratio of some important classes of dags, assuming a fixed communication delay ratio τ that captures the average interprocessor communication measured in instruction cycles. We provide competitive algorithms for divide-and-conquer dags, trees, and general dags, both when the number of processors depends on the size of the input dag and when the number of processors is fixed. Our major result is a lower bound Ω(τ / log τ) on the competitive ratio for trees; it shows that it is impossible to design compilers that produce almost optimal execution code for all parallel programs. This fundamental result holds for almost any reasonable distributed-memory parallel computation model, including the LogP and BSP models.
Received March 5, 1996; revised March 11, 1997.
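The baseline strategy such analyses compare against is a greedy on-line schedule: run every ready task as soon as a processor is free. A minimal sketch of its makespan on p processors, scheduling the dag level by level as tasks are revealed; it deliberately ignores the communication delay τ and assumes unit-time tasks, and all names are our own:

```python
import math
from collections import defaultdict

def greedy_online_makespan(deps, p):
    """Makespan of a level-by-level greedy schedule of a dag on p processors.

    `deps` maps each task to the list of its predecessors.  Each task takes
    one unit of time; communication delays are ignored in this sketch."""
    level = {}
    def lv(t):
        # level = length of the longest dependency chain ending at t
        if t not in level:
            level[t] = 1 + max((lv(d) for d in deps[t]), default=0)
        return level[t]
    for t in deps:
        lv(t)
    width = defaultdict(int)
    for l in level.values():
        width[l] += 1
    # a level of w independent tasks needs ceil(w / p) unit time steps
    return sum(math.ceil(w / p) for w in width.values())
```

The lower bound quoted above says no on-line strategy, greedy or otherwise, can stay within better than Ω(τ / log τ) of the optimal off-line schedule on trees once communication delay is accounted for.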
10.
An important component of a programming language for writing operating systems, or other large parallel systems, is the set of access control facilities. Two principles for access control, expressive power and access validation, are discussed. Then two new language mechanisms are presented: one for expressing the static structure and access rights of parallel systems, the other for controlling dynamic access to shared objects (monitors). The use of the proposed mechanisms is illustrated by examples including a file system. Finally, the relationships between the mechanisms, access validation, and the safety problem are discussed.
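A monitor controls dynamic access to a shared object by funneling every operation through methods that hold the monitor's lock. A minimal sketch in Python (the class and its bounded-counter behavior are illustrative assumptions, not the paper's language mechanism):

```python
import threading

class BoundedCounter:
    """Monitor sketch: all access to the shared counter goes through
    methods that acquire the monitor's lock."""
    def __init__(self, limit):
        self._cond = threading.Condition()
        self._value = 0
        self._limit = limit

    def increment(self):
        with self._cond:
            # wait inside the monitor until the invariant allows the update
            while self._value >= self._limit:
                self._cond.wait()
            self._value += 1

    def decrement(self):
        with self._cond:
            self._value -= 1
            self._cond.notify()  # wake one waiter blocked in increment()

    def value(self):
        with self._cond:
            return self._value
```

The invariant (0 ≤ value ≤ limit on completed operations) is maintained because no thread can observe or modify the counter without the lock, which is the access-validation property the monitor construct provides.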
11.
This paper introduces a formal model of a method for automated adjustment of parallel applications (autotuning). The software implementation of this model is described in the form of a flexible software framework for the automatic generation of autotuners, using an underlying term rewriting system and expert knowledge as a source of optimizing transformations.
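At its core an autotuner searches a space of program variants and keeps the one with the lowest measured cost. The sketch below shows only that search loop, with exhaustive enumeration standing in for the paper's term-rewriting-driven generation of variants; the names `autotune` and `run` are our own assumptions:

```python
import itertools

def autotune(run, param_space):
    """Exhaustive autotuner sketch: measure every configuration with `run`
    (a callable returning a cost) and return the cheapest one.

    param_space maps parameter names to lists of candidate values."""
    best_cfg, best_cost = None, float("inf")
    for values in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space, values))
        cost = run(cfg)  # in practice: compile, execute, and time the variant
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost
```

A real framework would replace the exhaustive loop with transformations proposed by the rewriting system and prune the space using expert knowledge.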
12.
A parallel method for globally minimizing a linear program with an additional reverse convex constraint is proposed, combining the outer approximation technique and the cutting plane method. Basically, p (≤ n) processors are used for a problem with n variables, and a globally optimal solution is found effectively in a finite number of steps. Computational results are presented for test problems with up to 80 variables and 63 linear constraints (plus nonnegativity constraints). These results were obtained on a distributed-memory MIMD parallel computer, DELTA, by running both serial and parallel algorithms in double precision. In addition, based on 40 randomly generated problems of the same size, with 16 variables and 32 linear constraints (plus x ≥ 0), numerical results for different numbers of processors are reported, including those of the serial algorithm.
13.
14.
《IEEE Transactions on Pattern Analysis and Machine Intelligence》1979,(6):558-574
An approach to proving parallel programs correct is presented. The steps are 1) model the parallel program, 2) prove partial correctness (proper synchronization), and 3) prove the absence of deadlock, livelock, and infinite loops. The parallel program model is based on Keller's model. The main contributions of the paper are techniques for proving the absence of deadlock and livelock. A connection is made between Keller's work and Dijkstra's work on serial nondeterministic programs. It is shown how a variant function may be used to prove finite termination, even if the variant function is not strictly decreasing, and how finite termination can be used to prove the absence of livelock. Handling of the finite delay assumption is also discussed. The illustrative examples include one which occurred in a commercial environment and a classic synchronization problem solved without the aid of special synchronization primitives.
15.
Programming and Computer Software - The DVM approach to the development of parallel programs for heterogeneous computer clusters with accelerators. The basic capabilities of DVM and SAPFOR, which...
16.
E. S. Borisov 《Cybernetics and Systems Analysis》2004,40(3):428-437
Parallel computers execute parallel programs that are ported to other parallel architectures only with difficulty and require special training of programmers. A parallelizing system is proposed that helps solve this problem.
17.
18.
Virginia Niculescu 《The Journal of supercomputing》2004,29(1):5-25
Data distributions have a serious impact on the time complexity of parallel programs developed by domain decomposition. A new kind of distribution, set distributions, based on set-valued mappings, is introduced. These distributions assign a data object to more than one process. Set distributions are useful especially when the number of processes is greater than the size of the input data, but sometimes they can also lead to efficient general parallel algorithms. The work-load properties of these distributions and their impact on the number of communications are discussed. To illustrate the implications of data distributions in the construction of parallel programs, some examples are presented. Two parallel algorithms for computing the Lagrange interpolation polynomial are developed, starting from simple distributions and set distributions.
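A set distribution maps each data item to a *set* of processes rather than a single owner. A minimal block-style sketch of such a set-valued mapping (our own construction, not necessarily the paper's): when there are more processes than items, each item is replicated over a contiguous block of processes, and when there are fewer, it degenerates to an ordinary block distribution.

```python
def set_distribution(n_items, n_procs):
    """Set-valued block distribution: owners[i] is the set of processes
    holding item i.  With n_procs > n_items, items are replicated; with
    n_procs <= n_items, this is an ordinary block distribution."""
    owners = []
    for i in range(n_items):
        start = (i * n_procs) // n_items
        end = ((i + 1) * n_procs) // n_items
        # guarantee every item has at least one owner
        owners.append(set(range(start, max(end, start + 1))))
    return owners
```

Replicating an item over several processes lets all of them compute on it locally, which is exactly the situation the abstract targets when the process count exceeds the input size.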
19.
20.
Main Memory-Based Algorithms for Efficient Parallel Aggregation for Temporal Databases
Dengfeng Gao Jose Alvin G. Gendrano Bongki Moon Richard T. Snodgrass Minseok Park Bruce C. Huang Jim M. Rodrigue 《Distributed and Parallel Databases》2004,16(2):123-163
The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and the stringency of response time requirements have outpaced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential Aggregation Tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impact the performance of the algorithms. For distributed result placement, we discovered that Greedy Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large number of processing nodes; for low data reduction, it performed well only up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge, which was best for the remaining cases. We present a cost model that closely predicts the running time of Greedy Time Division Merge.
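A temporal aggregate reports one value per maximal span over which the result is constant. A sequential sketch of temporal COUNT over half-open `[start, end)` intervals, computed with an endpoint sweep; this illustrates the problem the parallel algorithms solve, not the Aggregation Tree itself, and the names are our own:

```python
def temporal_count(intervals):
    """Temporal COUNT: for each maximal constant span, the number of
    tuples whose [start, end) interval covers it.

    Returns a list of (span_start, span_end, count) triples."""
    events = []
    for s, e in intervals:
        events.append((s, 1))   # interval opens
        events.append((e, -1))  # interval closes
    events.sort()  # at equal times, closes (-1) sort before opens (+1)
    result, count, prev = [], 0, None
    for t, delta in events:
        if prev is not None and t != prev and count > 0:
            result.append((prev, t, count))
        count += delta
        prev = t
    return result
```

In the parallel setting, each node computes such spans over its partition and the per-node results are then combined, which is the merge step that Greedy Time Division Merge and Pairwise Merge organize differently.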