Similar Documents
20 similar documents found.
1.
With the increasing performance demand in real-time systems it becomes more and more important to provide feedback to programmers and software development tools on the performance-relevant code parts of a real-time program. So far, this information was limited to an estimation of the worst-case execution time (WCET) and its associated worst-case execution path (WCEP). However, both the WCET and the WCEP provide only partial information: only code parts that lie on one of the WCEPs are indicated to the programmer, while no information is provided for all other code parts. To give a comprehensive view covering the entire code base, tools in the spirit of program profiling are required. This work proposes an efficient approach to compute worst-case timing information for all code parts of a program using a complementary metric, called criticality. Every statement of a program is assigned a criticality value, expressing how critical the code is with respect to the global WCET. This gives valuable information about how close the worst execution path passing through a specific program part is to the global WCEP. We formally define the criticality metric and investigate some of its properties with respect to dominance in control-flow graphs. Exploiting some of those properties, we propose an algorithm that reduces the overhead of computing the metric for complete programs. We also investigate ways to efficiently find only those code parts whose criticality is above a given threshold. Experiments using well-established real-time benchmark programs show an interesting distribution of the criticality values, revealing considerable amounts of highly critical as well as uncritical code. The metric thus provides ideal information to programmers and software development tools to optimize the worst-case execution time of these programs.
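A minimal sketch of a criticality-style computation, assuming a DAG-shaped control-flow graph with fixed per-node execution times; the node costs, edges, and the exact formula (longest path through a node divided by the global WCET) are illustrative and not the paper's formulation:

```python
# Sketch of a criticality-style metric over a DAG control-flow graph.
# Assumption (illustrative): criticality(n) = length of the longest
# entry-to-exit path through n divided by the global WCET.
cost = {"entry": 1, "a": 5, "b": 20, "c": 2, "exit": 1}      # per-node WCET
edges = [("entry", "a"), ("entry", "b"), ("a", "c"), ("b", "c"), ("c", "exit")]
topo = ["entry", "a", "b", "c", "exit"]                      # topological order

preds = {n: [u for (u, v) in edges if v == n] for n in topo}
succs = {n: [v for (u, v) in edges if u == n] for n in topo}

# Longest path from the entry node ending at n (including n's cost).
down = {}
for n in topo:
    down[n] = cost[n] + max((down[p] for p in preds[n]), default=0)

# Longest path from n to the exit node (including n's cost).
up = {}
for n in reversed(topo):
    up[n] = cost[n] + max((up[s] for s in succs[n]), default=0)

wcet = down["exit"]                                          # global WCET
for n in topo:
    through = down[n] + up[n] - cost[n]                      # longest path via n
    print(f"{n}: criticality = {through / wcet:.2f}")
```

On this toy graph, node b lies on the WCEP and gets criticality 1.0, while node a gets about 0.38, showing how far its worst path is from the global WCEP.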

2.
Harnessing the computational capabilities of a network of workstations promises to off-load work from overloaded supercomputers onto largely idle resources overnight. Several capabilities are needed to do this, including support for an architecture-independent parallel programming environment, task migration, automatic resource allocation, and fault tolerance. The Hector distributed run-time environment is designed to present these capabilities transparently to programmers. MPI programs can be run under this environment on homogeneous clusters with no modifications to their source code needed. The design of Hector, its internal structure, and several benchmarks and tests are presented.

3.
Thread-level speculation (TLS) can exploit the latent parallelism of programs and improve the utilization of multi-core resources, but the TACLeBench kernel benchmarks have not yet been effectively analyzed under TLS parallelization. To address this problem, a profiling scheme and profiling tool for loop-level speculative execution were designed. Seven representative TACLeBench kernel benchmarks were selected. First, an initialization analysis of each program was performed and loop markers were inserted into selected hot program regions; next, these regions were cross-compiled, and data on speculative threads and the related memory addresses were recorded to profile the maximum potential loop-level parallelism; finally, the runtime characteristics of the programs (thread granularity, parallelizable coverage, dependence characteristics) and the influence of the source code on the speedup were examined. The experimental results show that: 1) these programs are well suited to TLS acceleration; compared with serial execution, most programs achieve a speedup above 2 under speculative execution of their loop structures, with a maximum speedup of 20.79; 2) when TLS is used to accelerate the TACLeBench kernel programs, most applications can effectively utilize 4 to 16 cores.

4.
There are many paradigms being promoted and explored for programming parallel computers, including modified sequential languages, new imperative languages and applicative languages. SISAL is an applicative language which has been designed by a consortium of industrial and research organizations for the specification and execution of parallel programs. It allows programs to be written with little concern for the structure of the underlying machine, thus the programmer is free to explore different ways of expressing the parallelism. A major problem with applicative languages has been their poor efficiency at handling large data structures. To counter this problem SISAL includes some advanced memory management techniques for reducing the amount of data copying that occurs. In this paper we discuss the implementation of some image processing benchmarks in SISAL and C to evaluate the effectiveness of the memory management code. In general, the SISAL program was easier to code than the C (augmented with the PARMACS macros) because we were not concerned with the parallel implementation details. We found that the SISAL performance was in general comparable to C, and that it could be brought in line with an efficient parallel C implementation by some programmer-specified code transformations.

5.
We introduce a new performance metric, called load balancing factor (LBF), to assist programmers when evaluating different tuning alternatives. The LBF metric differs from traditional performance metrics since it is intended to measure the performance implications of a specific tuning alternative rather than quantifying where time is spent in the current version of the program. A second unique aspect of the metric is that it provides guidance about moving work within a distributed or parallel program rather than reducing it. A variation of the LBF metric can also be used to predict the performance impact of changing the underlying network. The LBF metric is computed incrementally and online during the execution of the program to be tuned. We also present a case study that shows that our metric can accurately predict the actual performance gains for a test suite of six programs.
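The following toy "what-if" calculation is not the paper's LBF definition (which is not reproduced in the abstract); it only illustrates the general idea of predicting the effect of a tuning alternative that moves work between processes:

```python
# Toy estimate of the benefit of shifting `amount` seconds of work from
# process `src` to process `dst`, ignoring communication costs.
# This is an illustrative stand-in, NOT the LBF metric itself.
def predicted_gain(busy, src, dst, amount):
    before = max(busy)                       # completion time ~ slowest process
    moved = list(busy)
    moved[src] -= amount
    moved[dst] += amount
    after = max(moved)
    return before, after, before - after

busy = [12.0, 7.5, 6.0, 5.5]                 # measured busy time per process (s)
print(predicted_gain(busy, src=0, dst=3, amount=3.0))
# -> (12.0, 9.0, 3.0): moving 3 s of work from rank 0 to rank 3 is predicted
#    to shorten the run by about 3 s.
```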

6.
In this paper we introduce our estimation method for parallel execution times, based on identifying separate “parts” of the work done by parallel programs. Our run time analysis works without any source code inspection. The time of parallel program execution is expressed in terms of the sequential work and the parallel penalty. We measure these values for different problem sizes and numbers of processors and estimate them for unknown values in both dimensions using statistical methods. This allows us to predict parallel execution time for unknown inputs and non-available processor numbers with high precision. Our prediction methods require orders of magnitude less data points than existing approaches. We verified our approach on parallel machines ranging from a multicore computer to a peta-scale supercomputer.
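A minimal sketch of the prediction idea, assuming a simplified model T(p) = W/p + penalty(p) with the penalty fitted against log2(p); the model form and all measurements below are invented for illustration:

```python
# Sketch of the "sequential work + parallel penalty" idea (simplified model).
import numpy as np

W = 120.0                                   # measured sequential time (s)
p_obs = np.array([2, 4, 8, 16])             # processor counts measured
t_obs = np.array([62.0, 33.0, 19.0, 12.5])  # measured parallel times (s)

penalty = t_obs - W / p_obs                 # observed parallel penalty
coeff = np.polyfit(np.log2(p_obs), penalty, 1)   # fit penalty ~ a*log2(p) + b

def predict(p):
    """Extrapolate run time to a processor count that was never measured."""
    return W / p + np.polyval(coeff, np.log2(p))

print(round(predict(32), 2), round(predict(64), 2))
```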

7.
Programs whose parallelism stems from multiple recursion form an interesting subclass of parallel programs with many practical applications. The highly irregular shape of many recursion trees makes it difficult to obtain good load balancing with small overhead. We present a system, called REAPAR, that executes recursive C programs in parallel on SMP machines. Based on data from a single profiling run of the program, REAPAR selects a load-balancing strategy that is both effective and efficient, and it generates parallel code implementing that strategy. The performance obtained by REAPAR on a diverse set of benchmarks matches that published for much more complex systems requiring high-level, problem-oriented, explicitly parallel constructs. A case study even found REAPAR to be competitive with handwritten (low-level, machine-oriented) thread-parallel code.

8.
This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This allows useful insights to be made into the MPI communication performance of parallel computers, and in particular how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, which is difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times rather than sampling from probability distributions can give misleading results, particularly for programs running on a large number of processors.
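A small sketch of why sampling message times from a measured distribution can matter more than using the average; the per-message times below are invented and stand in for an empirically measured distribution:

```python
# Compare a Monte Carlo model (sample each message time from the measured
# distribution) with an average-based model of the same communication phase.
import random

random.seed(1)
samples = [0.2] * 90 + [2.0] * 10            # toy per-message times (ms):
mean = sum(samples) / len(samples)            # occasional contention spikes

def run_with(draw, messages=50, trials=2000):
    """Total time of `messages` sequential messages, repeated `trials` times."""
    return [sum(draw() for _ in range(messages)) for _ in range(trials)]

monte_carlo = run_with(lambda: random.choice(samples))
with_average = run_with(lambda: mean)

print(f"average-based model : {with_average[0]:.1f} ms every time")
print(f"sampled model       : min {min(monte_carlo):.1f} ms, "
      f"max {max(monte_carlo):.1f} ms")
```

The average-based model hides the spread that contention spikes cause, which is exactly the variability the probability-distribution approach is meant to capture.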

9.
D.A., P.D. Performance Evaluation, 2005, 60(1-4): 165-187.
We present a new performance modeling system for message-passing parallel programs that is based around a Performance Evaluating Virtual Parallel Machine (PEVPM). We explain how to develop PEVPM models for message-passing programs using a performance directive language that describes a program’s serial segments of computation and message-passing events. This is a novel bottom-up approach to performance modeling, which aims to accurately model when processing and message-passing occur during program execution. The times at which these events occur are dynamic, because they are affected by network contention and data dependencies, so we use a virtual machine to simulate program execution. This simulation is done by executing models of the PEVPM performance directives rather than executing the code itself, so it is very fast. The simulation is still very accurate because enough information is stored by the PEVPM to dynamically create detailed models of processing and communication events. Another novel feature of our approach is that the communication times are sampled from probability distributions that describe the performance variability exhibited by communication subject to contention. These performance distributions can be empirically measured using a highly accurate message-passing benchmark that we have developed. This approach provides a Monte Carlo analysis that can give very accurate results for the average and the variance (or even the probability distribution) of program execution time. In this paper, we introduce the ideas underpinning the PEVPM technique, describe the syntax of the performance modeling language and the virtual machine that supports it, and present results for some example parallel programs that show the power and accuracy of the methodology.
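A toy, directive-driven simulation in the spirit of a virtual parallel machine, with exchange times drawn from an empirical sample set; the directives, numbers, and two-process structure are invented and do not reproduce the PEVPM language:

```python
# Each "process" runs the same tiny trace of directives; exchanges synchronise
# both processes and pay a message cost sampled from measured-looking data.
import random

random.seed(0)
comm_samples = [0.1, 0.1, 0.12, 0.15, 0.4, 1.2]   # toy exchange times (ms)
trace = [("compute", 5.0), ("exchange", None),
         ("compute", 3.0), ("exchange", None)]

def simulate(trials=1000):
    finishes = []
    for _ in range(trials):
        clock = [0.0, 0.0]                          # per-process virtual time
        for kind, dur in trace:
            if kind == "compute":
                clock = [t + dur for t in clock]
            else:                                   # exchange: both wait, then
                start = max(clock)                  # pay a sampled message cost
                cost = random.choice(comm_samples)
                clock = [start + cost, start + cost]
        finishes.append(max(clock))
    return finishes

times = simulate()
print(f"predicted run time: mean {sum(times)/len(times):.2f} ms, "
      f"worst {max(times):.2f} ms")
```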

10.
Efficient performance tuning of parallel programs is often hard. Optimization is often done once the program has been written, as a last effort to increase performance. In a sequential program, each (executed) code segment affects the completion time. In the case of a parallel program executed on a multiprocessor this is not always true, due to dependencies between the different threads. Thus, certain code segments of the execution may not affect the completion time of the program, and optimizing such code segments will not increase performance. In this paper we present an approach to optimize performance by finding the extended critical path of the multithreaded program. The extended critical path analysis is a generalization of critical path analysis in the sense that it also deals with more threads than processors. We have implemented the extended critical path analysis in a performance optimization tool. The tool allows the user to determine the extended critical path of a multithreaded application written for the Solaris operating system for any number of processors, based on execution on a single-processor workstation.
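An illustration (not the paper's algorithm) of why the classic critical path is insufficient when there are more runnable threads than processors: with p processors, completion time is bounded below by both the longest dependence chain and the total work divided by p.

```python
# Four independent thread segments with invented execution times.
segments = {"t1": 10.0, "t2": 9.0, "t3": 9.0, "t4": 1.0}
critical_path = max(segments.values())        # classic critical path: 10.0
total_work = sum(segments.values())           # 29.0

for p in (1, 2, 4):
    bound = max(critical_path, total_work / p)
    print(f"p={p}: completion >= {bound:.1f}")
# With p=2, shortening t2 or t3 (which are not on the classic critical path)
# still reduces the bound; an analysis for a fixed processor count must
# expose exactly this kind of code segment.
```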

11.
Parallel processing systems using networks of workstations are being used to provide an alternative to expensive parallel processors. Scheduling of tasks on these networks is an important and practical problem that must be addressed. Although CPU load is an important parameter to many of the proposed scheduling schemes, no quantitative analysis of CPU load and its precise relation to the run time of application programs has to date been presented. The work in this paper describes the experimental analysis of one common load measure, the UNIX load average, and its relationship to the run time of computation-bound parallel programs. Data was gathered using a test application program designed to mimic common applications, performing long bursts of computation with occasional interprocess data exchange over the network. The resulting execution times and measured load averages were then analyzed using regression analysis to detect load-run time trends. This paper describes the test program and the experiments, then details the results of the data analysis. A technique is then presented for the evaluation of the load-run time relationship for a computation-bound program on a network of workstations.
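A minimal sketch of the kind of load/run-time regression described above, using an ordinary least-squares linear fit; the measurements are invented:

```python
# Fit run time of a computation-bound test program against the UNIX load
# average observed during the run, then predict run time at an unseen load.
import numpy as np

load_avg = np.array([0.1, 0.5, 1.0, 1.5, 2.0, 3.0])   # measured load average
run_time = np.array([101, 118, 148, 170, 205, 260])    # measured run time (s)

slope, intercept = np.polyfit(load_avg, run_time, 1)
print(f"run_time ~= {intercept:.1f} + {slope:.1f} * load_avg")
print("predicted at load 2.5:", round(intercept + slope * 2.5, 1), "s")
```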

12.
The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and communication behavior of these parallel programs were investigated. The various time interval distributions were modeled by statistical functions which were verified by a nonlinear regression technique using the empirical data. The temporal and spatial localities of message destinations were also studied. A model for the temporal locality of message length was introduced and used to analyze the communication traces. A trace-driven simulation environment, which uses the communication patterns of the parallel programs as inputs, was developed to study the behavior of the communication hardware under real workloads. Simulation results on DMA and link utilizations are reported.
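A sketch of fitting a statistical function to an empirical interval distribution with nonlinear regression; the inter-arrival times are synthetic and the exponential model is an assumed example, not the distribution used in the paper:

```python
# Fit an exponential CDF to the empirical CDF of synthetic message gaps.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
intervals = rng.exponential(scale=2.0, size=500)        # synthetic gaps (ms)

x = np.sort(intervals)
ecdf = np.arange(1, len(x) + 1) / len(x)                 # empirical CDF

def exp_cdf(t, rate):
    return 1.0 - np.exp(-rate * t)

(rate,), _ = curve_fit(exp_cdf, x, ecdf, p0=[1.0])       # nonlinear least squares
print(f"fitted rate = {rate:.3f} per ms (mean gap ~ {1/rate:.2f} ms)")
```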

13.
This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and to develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.
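A sketch of the baseline idea behind translating a shared-memory parallel loop for a distributed system: each process executes a contiguous block of the iteration space. The blocking scheme and names are illustrative and do not represent the compiler's actual output:

```python
# Block-distribute an OpenMP-style "parallel for" iteration space over
# `nprocs` processes; each rank computes only its own contiguous block.
def block_range(n_iters, rank, nprocs):
    """Contiguous block of [0, n_iters) owned by process `rank`."""
    per = (n_iters + nprocs - 1) // nprocs
    lo = rank * per
    hi = min(n_iters, lo + per)
    return range(lo, hi)

n, nprocs = 10, 4
for rank in range(nprocs):
    print(rank, list(block_range(n, rank, nprocs)))
# 0 [0, 1, 2]  1 [3, 4, 5]  2 [6, 7, 8]  3 [9]
```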

14.
Parallel computing scalability evaluates the extent to which parallel programs and architectures can effectively utilize increasing numbers of processors. In this paper, we compare a group of existing scalability metrics and evaluation models with an experimental metric which uses network latency to measure and evaluate the scalability of parallel programs and architectures. To provide insight into dynamic system performance, we have developed an integrated software environment prototype for measuring and evaluating multiprocessor scalability performance, called Scale-Graph. Scale-Graph uses a graphical instrumentation monitor to collect, measure and analyze latency-related data, and to display scalability performance based on various program execution patterns. The graphical software tool is X-Windows based and is currently implemented on standard workstations to analyze performance data of the KSR-1, a hierarchical ring-based shared-memory architecture.

15.
We study the problem of developing a set of syntax-driven transformations for automatic translation of shared memory parallel programs into sequential programs. The result is a sequential program that is semantically equivalent to the original program. Consequently, the problem of debugging parallel programs is reduced to the problem of debugging sequential programs. Moreover, the efficiency of parallel programs can be increased by sequentializing code segments that include extra parallelism. The main difficulty in developing such a system is to preserve the fairness property of any actual parallel execution, which states that no process can wait forever unserved. Thus, non-termination of the sequential version (namely an infinite loop) is allowed only if there is at least one fair parallel execution that does not halt as well (i.e., a process that executes an infinite loop whose termination does not depend on any other process). We show that it is sufficient to consider the case of two sequential programs executed in parallel in order to solve the general case. We then describe several types of transformations and check their ability to preserve fairness. The results regarding the existence of such a transformation for general parallel programs are not conclusive; however, we do show that restricted cases (which are likely to appear in practice) can be sequentialized using this set of transformations.
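A toy illustration of fair sequential interleaving, modelling two processes as Python generators and scheduling them round-robin so that neither waits forever; this is an analogy for the fairness requirement, not one of the paper's transformations:

```python
# Round-robin interleaving of two "processes": each gets a turn on every pass,
# so no process is starved, mirroring the fairness property discussed above.
def proc(name, steps):
    for i in range(steps):
        yield f"{name} step {i}"

def interleave(p, q):
    done = [False, False]
    procs = [p, q]
    while not all(done):
        for i, g in enumerate(procs):
            if not done[i]:
                try:
                    print(next(g))
                except StopIteration:
                    done[i] = True

interleave(proc("A", 3), proc("B", 2))
```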

16.
In the area of parallelizing compilers, considerable research has been carried out on data dependency analysis, parallelism extraction, as well as program and data partitioning. However, designing a practical, low complexity scheduling algorithm without sacrificing performance remains a challenging problem. A variety of heuristics have been proposed to generate efficient solutions but they take prohibitively long execution times for moderate-size or large problems. In this paper, we propose an algorithm called FASTEST (Fast Assignment and Scheduling of Tasks using an Efficient Search Technique) that has O(e) time complexity, where e is the number of edges in the task graph. The algorithm first generates an initial solution in a short time and then refines it by using a simple but robust random neighborhood search. We have also parallelized the search to further lower the time complexity. We are using the algorithm in a prototype automatic parallelization and scheduling tool which compiles sequential code and generates parallel code optimized with judicious scheduling. The proposed algorithm is evaluated with several application programs and outperforms a number of previous algorithms by generating parallelized code with shorter execution times, while taking dramatically shorter scheduling times. The FASTEST algorithm generates optimal solutions for a majority of the test cases and close-to-optimal solutions for the rest.
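A toy version of the two-phase idea, an initial schedule refined by a random neighborhood search; the task graph, costs, and scheduling rules are illustrative and not the FASTEST algorithm:

```python
# Assign a small task DAG to P processors, then improve the assignment by
# randomly moving one task at a time and keeping changes that shorten the
# schedule (communication costs are ignored in this toy model).
import random

random.seed(3)
cost = {"a": 4, "b": 3, "c": 3, "d": 2, "e": 5}
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": []}
topo = ["a", "e", "b", "c", "d"]                # a valid topological order
P = 2

def makespan(assign):
    """Finish time of a list schedule that respects dependencies."""
    free = [0.0] * P                            # next free time per processor
    finish = {}
    for t in topo:
        ready = max((finish[d] for d in deps[t]), default=0.0)
        start = max(ready, free[assign[t]])
        finish[t] = start + cost[t]
        free[assign[t]] = finish[t]
    return max(finish.values())

best = {t: 0 for t in topo}                     # naive initial: everything on P0
best_len = makespan(best)
for _ in range(200):                            # random neighborhood search
    cand = dict(best)
    cand[random.choice(topo)] = random.randrange(P)
    if makespan(cand) < best_len:
        best, best_len = cand, makespan(cand)

print(best, best_len)
```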

17.
Symbolic execution is a well-known program analysis technique which represents program inputs with symbolic values instead of concrete, initialized data and executes the program by manipulating program expressions involving the symbolic values. Symbolic execution was proposed over three decades ago, but it has recently found renewed interest in the research community, due in part to progress in decision procedures, the availability of powerful computers, and new algorithmic developments. We provide here a survey of some of the new research trends in symbolic execution, with particular emphasis on applications to test generation and program analysis. We first describe an approach that handles complex programming constructs such as input recursive data structures, arrays, as well as multithreading. Furthermore, we describe recent hybrid techniques that combine concrete and symbolic execution to overcome some of the inherent limitations of symbolic execution, such as handling native code or the availability of decision procedures for the application domain. We follow with a discussion of techniques that can be used to limit the (possibly infinite) number of symbolic configurations that need to be analyzed for the symbolic execution of looping programs. Finally, we give a short survey of interesting new applications, such as predictive testing, invariant inference, program repair, analysis of parallel numerical programs, and differential symbolic execution.
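A minimal toy symbolic executor over a two-path program, collecting path conditions and symbolic results as strings; real engines build such conditions as constraints and use decision procedures to check feasibility and generate tests:

```python
# Symbolically "execute" a tiny program on a symbolic input x:
#     y = x * 2
#     if y > 10: y = y - 10
#     return y
def symbolic_paths():
    x = "x"                                          # symbolic input
    y = f"({x} * 2)"
    paths = []
    for taken, cond in ((True, f"{y} > 10"), (False, f"not ({y} > 10)")):
        y_after = f"({y} - 10)" if taken else y      # effect of the branch
        paths.append({"path_condition": cond, "result": y_after})
    return paths

for p in symbolic_paths():
    print(p)
# {'path_condition': '(x * 2) > 10', 'result': '((x * 2) - 10)'}
# {'path_condition': 'not ((x * 2) > 10)', 'result': '(x * 2)'}
```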

18.
19.
Multi-core processors are widely used in high-performance computing, but how to effectively convert traditional serial programs into parallel code and reduce the time spent in nested loops remains a challenging problem in this field. This paper first performs dependence analysis and tiling of nested loops based on the polyhedral model, and automatically generates coarse-grained parallel code from the result. Considering the structural characteristics of multi-core array processors, a genetic algorithm is used to generate a communication-optimized sequence of tile tasks, and an effective task scheduling model is built on this basis. Finally, the method is applied to LU decomposition; the results show that, compared with traditional scheduling algorithms, it achieves better data locality and load balance.
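A highly simplified stand-in for the communication-aware ordering step: a mutation-only evolutionary search (a full genetic algorithm would also use crossover) over an invented tile-sharing relation and cost model, not the paper's scheduling model:

```python
# Order tile tasks so that consecutive tasks share data (toy proxy for
# reduced communication); refine a random order by swap mutations.
import random

random.seed(7)
tiles = list(range(8))
# shares[i][j] is True if tiles i and j touch common data (toy relation).
shares = [[abs(i - j) == 1 for j in tiles] for i in tiles]

def comm_cost(order):
    """Count adjacent pairs in the schedule that share no data."""
    return sum(not shares[a][b] for a, b in zip(order, order[1:]))

best = tiles[:]
random.shuffle(best)
for _ in range(500):
    cand = best[:]
    i, j = random.sample(range(len(cand)), 2)
    cand[i], cand[j] = cand[j], cand[i]          # swap mutation
    if comm_cost(cand) < comm_cost(best):
        best = cand

print(best, "cost:", comm_cost(best))
```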

20.
刘金硕  黄朔  邓娟 《计算机工程》2022,48(12):16-23
Using high-resolution images as input to image processing algorithms slows them down. Parallelizing such algorithms improves execution efficiency, but manually converting serial programs into parallel ones is tedious, existing automatic parallelizing translators perform unreliably, and the translated programs use only a single parallel mode. Targeting the patch-based multi-view stereo (PMVS) algorithm, an automatic two-level parallel translation method from C to CUDA is proposed. ANTLR is used to parse the source C code automatically; parallelizable loop structures are identified by analyzing data dependences and applying loop array privatization, and the algorithm is translated into code with a two-level parallel structure of CPU multithreading and GPU execution. During execution, the input images are processed on the CPU and GPU separately, reducing the total execution time of the algorithm. Experimental results show that the computational speedup of this method increases with the input image resolution, reaching a maximum of about 32, a clear improvement over the PPCG and OpenACC automatic parallel translation methods.
