期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

User transparent data and task parallel multimedia computing with Pyxis-DT

Timo van Kessel Ben van Werkhoven Niels Drost Jason Maassen Henri E. Bal Frank J. Seinstra 《Future Generation Computer Systems》2013,29(8):2252-2261

The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. As most MMCA researchers are not also HPC experts, however, there is a demand for programming models and tools that are both efficient and easy to use. Existing user transparent parallelization tools generally use a data parallel approach in which data structures (e.g. video frames) are scattered among the available nodes in a compute cluster. For certain MMCA applications a data parallel approach induces intensive communication, however, which significantly decreases performance. In these situations, we can benefit from applying alternative approaches.We present Pyxis-DT, a user transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is obtained by run-time construction and execution of a task graph consisting of strictly defined building block operations. Results show that for realistic MMCA applications the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation. Extensions for GPU clusters are also presented. 相似文献

2.

CAPLib—a ‘thin layer’ message passing library to support computational mechanics codes on distributed memory parallel systems

《Advances in Engineering Software》2001,32(1):61-83

The Computer Aided Parallelisation Tools (CAPTools) [Ierotheou, C, Johnson SP, Cross M, Leggett PF, Computer aided parallelisation tools (CAPTools)—conceptual overview and performance on the parallelisation of structured mesh codes, Parallel Computing, 1996;22:163–195] is a set of interactive tools aimed to provide automatic parallelisation of serial FORTRAN Computational Mechanics (CM) programs. CAPTools analyses the user's serial code and then through stages of array partitioning, mask and communication calculation, generates parallel SPMD (Single Program Multiple Data) messages passing FORTRAN.The parallel code generated by CAPTools contains calls to a collection of routines that form the CAPTools communications Library (CAPLib). The library provides a portable layer and user friendly abstraction over the underlying parallel environment. CAPLib contains optimised message passing routines for data exchange between parallel processes and other utility routines for parallel execution control, initialisation and debugging. By compiling and linking with different implementations of the library, the user is able to run on many different parallel environments.Even with today's parallel systems the concept of a single version of a parallel application code is more of an aspiration than a reality. However for CM codes the data partitioning SPMD paradigm requires a relatively small set of message-passing communication calls. This set can be implemented as an intermediate ‘thin layer’ library of message-passing calls that enables the parallel code (especially that generated automatically by a parallelisation tool such as CAPTools) to be as generic as possible.CAPLib is just such a ‘thin layer’ message passing library that supports parallel CM codes, by mapping generic calls onto machine specific libraries (such as CRAY SHMEM) and portable general purpose libraries (such as PVM an MPI). This paper describe CAPLib together with its three perceived advantages over other routes:

•as a high level abstraction, it is both easy to understand (especially when generated automatically by tools) and to implement by hand, for the CM community (who are not generally parallel computing specialists);
•the one parallel version of the application code is truly generic and portable;
•the parallel application can readily utilise whatever message passing libraries on a given machine yield optimum performance.

相似文献

3.

All‐uses testing of shared memory parallel programs

Cheer‐Sun D. Yang Lori L. Pollock 《Software Testing, Verification and Reliability》2003,13(1):3-24

Parallelism has become a way of life for many scientific programmers. A significant challenge in bringing the power of parallel machines to these programmers is providing them with a suite of software tools similar to the tools that sequential programmers currently utilize. Unfortunately, writing correct parallel programs remains a challenging task.In particular, automatic or semi‐automatic testing tools for parallel programs are lacking. This paper takes a first step in developing an approach to providing all‐uses coverage for parallel programs. A testing framework and theoretical foundations for structural testing are presented, including test data adequacy criteria and hierarchy, formulation and illustration of all‐uses testing problems, classification of all‐uses test cases for parallel programs, and both theoretical and empirical results with regard to what can be achieved with all‐uses coverage for parallel programs. Copyright © 2003 John Wiley & Sons, Ltd. 相似文献

4.

OREGAMI: Tools for mapping parallel computations to parallel architectures

Virginia M. Lo Sanjay Rajopadhye Samik Gupta David Keldsen Moataz A. Mohamed Bill Nitzberg Jan Arne Telle Xiaoxiong Zhong 《International journal of parallel programming》1991,20(3):237-270

相似文献

5.

A performance analysis exemplar: Parallel ray tracing

D. W. Jensen D. A. Reed 《Concurrency and Computation》1992,4(2):119-141

Among the many techniques in computer graphics, ray tracing is prized because it can render realistic images, albeit at great computational expense. Ray tracing's large computation requirements, coupled with its inherently parallel nature, make ray tracing algorithms attractive candidates for parallel implementation. In this paper we illustrate the utility and the importance of a suite of performance analysis tools when exploring the performance of several approaches to ray tracing on a distributed memory parallel system. These ray tracing algorithm variations introduce parallelism based on both ray and object partitions. Traditional timing analysis can quantify the performance effects of different algorithm choices (i.e. when an algorithm is best matched to a given problem), but it cannot identify the causes of these performance differences. We argue, by example, that a performance instrumentation system is needed that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary stapistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale, parallel computations; assimilating the enormous volume of performance data mandates visual display. 相似文献

6.

User transparency: a fully sequential programming model for efficient data parallel image processing

F. J. Seinstra D. Koelma 《Concurrency and Computation》2004,16(6):611-644

Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high‐performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high‐performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD‐style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well‐known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi‐baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user‐friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

7.

No pain and gain!—experiences with Mentat on a biological application

Andrew S. Grimshaw Emily A. West William R. Pearson 《Concurrency and Computation》1993,5(4):309-328

Throughout much of the parallel processing community there is the sense that writing software for distributed-memory parallel processors Is subject to a ‘no pain—no gain’ rule: that In order to reap the benefits of parallel computation one must first suffer the pain of converting the application to run on a parallel machine. We believe this Is the result of Inadequate programming tools and not a problem Inherent to parallel processing. We will show that one can parallelize real scientific applications and obtain good performance with little effort If the right tools are used. Our vehicle for this demonstration is a 6000-line DNA and protein sequence comparison application that we have implemented in Mental, an object-oriented parallel processing system for both parallel and distributed architectures. We briefly describe the application and present performance information for both the Mentat version and a hand-coded parallel version of the application. 相似文献

8.

Performance measurement,visualization and modeling of parallel and distributed programs using the AIMS toolkit

Jerry Yan Sekhar Sarukkai Pankaj Mehra 《Software》1995,25(4):429-461

Writing large-scale parallel and distributed scientific applications that make optimum use of the multiprocessor is a challenging problem. Typically, computational resources are underused due to performance failures in the application being executed. Performance-tuning tools are essential for exposing these performance failures and for suggesting ways to improve program performance. In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices and automated modeling techniques as ways to expose performance problems of programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs. 相似文献

9.

Language-based vectorization and parallelization using intrinsics,OpenMP, TBB and Cilk Plus

Przemysław Stpiczyński 《The Journal of supercomputing》2018,74(4):1461-1472

The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform a source code of Adaptive Simpson’s Integration to programs that can utilize multiple cores of modern processors. Using the example of Belman–Ford algorithm for solving single-source shortest path problems, we advise how to improve performance of data parallel algorithms by tuning data structures for better utilization of vector extensions of modern processors. Manual vectorization techniques based on Cilk array notation and intrinsics are presented. We also show how to simplify such optimization using Intel SIMD Data Layout Template containers. 相似文献

10.

A comparative performance study of common and popular task‐centric programming frameworks

Artur Podobas Mats Brorsson Karl‐Filip Faxn 《Concurrency and Computation》2015,27(1):1-28

Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. An increasingly popular programming model supporting structured parallel programming patterns in a portable and composable manner is the task‐centric programming model. In this study, we compare several popular task‐centric programming frameworks, including Cilk Plus, Threading Building Blocks, and various implementations of OpenMP 3.0. We have analyzed their performance on the Barcelona OpenMP Tasking Suite benchmark suite both on a 48‐core AMD Opteron 6172 server and a 64‐core TILEPro64 embedded many‐core processor. Our results show that the OpenMP offers the highest flexibility for programmers, and this flexibility comes to a cost. Frameworks supporting only a specific and more restrictive model, such as Cilk Plus and Threading Building Blocks, are generally more efficient both in terms of performance and energy consumption. However, Intel's implementation of OpenMP tasks performs the best and closest to the specialized run‐time systems. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

11.

VORD: A Versatile On-the-fly Race Detection Tool in OpenMP Programs

Young-Joo Kim Sejun Song Yong-Kee Jun 《International journal of parallel programming》2014,42(6):900-930

Shared-memory based parallel programming with OpenMP and Posix-thread APIs becomes more common to fully take advantage of multiprocessor computing environments. One of the critical risks in multithreaded programming is data races that are hard to debug and greatly damaging to parallel applications if they are uncaught. Although ample effort has been made in building specialized data race detection techniques, the state of art tools such as the Intel Thread Checker still have various functionality and performance problems. In this paper, we present a Versatile On-the-fly Race Detection (VORD) tool that provides an agile, efficient, and scalable race detection environment for various parallel programming models. VORD can automatically construct an empirically optimal set of race engines by utilizing classification and adaptation mechanisms. A Race-Detection Classification (RDC) table is created to categorize adequate engines in the aspect of labeling, detecting, and filtering. An Engine Code Property Selector (ECPS) uses the RDC table to adapt optimal engines for the given target programming models. In addition to RDC and ECPS, we have also implemented an OpenMP parser and a source instrumentor. The functionality and efficiency of VORD were compared with those of the Intel Thread Checker by using a set of OpenMP based kernel programs. The experimental results show that VORD can detect data races in more challenging programming models such as nested thread and synchronization models, and can achieve a couple of orders of a magnitude faster processing time than the Intel Thread Checker in the large parallel programs. 相似文献

12.

Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre

Andaluz-Pinedo Olaia Sanjurjo-González Hugo 《Language Resources and Evaluation》2022,56(2):651-671

Software tools are of vital importance in corpus-based research, but they can also lead to restrictions on the type of supported corpora and the range of analyses that can be performed. For example, corpus analysis tools, as general purpose software, do not include specific features to process corpora of theatre plays. This situation is even worse for parallel corpora of theatrical texts, in that there is currently a lack of software that allows for both the alignment and analysis of parallel corpora here. In this contribution, we will first outline the peculiarities of theatre texts and suggest three software features to address them: annotation of the structural units of plays, alignment at the utterance level, and concordances and statistics using the annotated units. Second, we will present the specific functionalities of TAligner and ACM to build and analyse parallel corpora of play texts, showing how new avenues of research are opening up with the development of these tools.

相似文献

13.

PORT：并行优化重构工具集

张兆庆乔如良《计算机学报》1994,17(12):908-921

ＰＯＲＴ是以ＦＯＲＴＲＡＮ７７源程序为象的并行优化重构工具集，它以自动并行重构程序为主体，辅以一组优化，静态分析，动态分析和程序执行过程视化工具，通过良好的用户界面和统一的的内部数据结构将它们集成为一个整体，本文介绍ＰＯＲＴ系统的特点，结构和若干关键技术。相似文献

14.

Falcon: On-line monitoring for steering parallel programs

Weiming Gu Greg Eisenhauer Karsten Schwan Jeffrey Vetter 《Concurrency and Computation》1998,10(9):699-736

Advances in high performance computing, communications and user interfaces enable developers to construct increasingly interactive high performance applications. The Falcon system presented in this paper supports such interactivity by providing runtime libraries, tools and user interfaces that permit the on-line monitoring and steering of large-scale parallel codes. The principal aspects of Falcon described in this paper are its abstractions and tools for capture and analysis of application-specific program information, performed on-line, with controlled latencies and scalable to parallel machines of substantial size. In addition, Falcon provides support for the on-line graphical display of monitoring information, and it allows programs to be steered during their execution, by human users or algorithmically. This paper presents our basic research motivation, outlines the Falcon system's functionality, and includes a detailed evaluation of its performance characteristics in light of its principal contributions. Falcon's functionality and performance evaluation are driven by our experiences with large-scale parallel applications being developed with end users in physics and in atmospheric sciences. The sample application highlighted in this paper is a molecular dynamics simulation program (MD) used by physicists to study the statistical mechanics of liquids. © 1998 John Wiley & Sons, Ltd. 相似文献

15.

Detecting mismatches among experts' ontologies acquired through knowledge elicitation

《Knowledge》2002,15(5-6):265-273

We have constructed a set of ontologies modelled on conceptual structures elicited from several domain experts. Protocols were collected from various experts, who advise on the selection/specification and purchase of personal computers. These protocols were analysed from the perspective of both the processes and the domain knowledge to reflect each expert's inherent conceptualisation of the domain. We are particularly interested in analysing discrepancies within and among such experts' ontologies, and have identified a range of ontology mismatches. A systematic approach to the analysis has been developed; subsequently, we shall develop software tools to support this process. 相似文献

16.

Analysis of a model for parallel image processing

S. Yalamanchili J.K. Aggarwal 《Pattern recognition》1985,18(1):1-16

An increasing awareness of the need for high speed parallel processing systems for image analysis has stimulated a great deal of interest in the design and development of such systems. Efficient processing schemes for several specific problems have been developed providing some insight into the general problems encountered in designing efficient image processing algorithms for parallel architectures. However it is still not clear what architecture or architectures are best suited for image processing in general, or how one may go about determining those which are. An approach that would allow application requirements to specify architectural features would be useful in this context. Working towards this goal, general principles are outlined for formulating parallel image processing tasks by exploiting parallelism in the algorithms and data structures employed. A synchronous parallel processing model is proposed which governs the communication and interaction between these tasks. This model presents a uniform framework for comparing and contrasting different formulation strategies. In addition, techniques are developed for analyzing instances of this model to determine a high level specification of a parallel architecture that best ‘matches’ the requirements of the corresponding application. It is also possible to derive initial estimates of the component capabilities that are required to achieve predefined performance levels. Such analysis tools are useful both in the design stage, in the selection of a specific parallel architecture, or in efficiently utilizing an existing one. In addition, the architecture independent specification of application requirements makes it a useful tool for benchmarking applications. 相似文献

17.

Virtual data space – load balancing for irregular applications

Thomas Decker 《Parallel Computing》2000,26(13-14)

Load balancing is a key issue in the development of parallel algorithms with irregular structures. Existing load balancing systems each support only one specific programming paradigm and thus are of limited use. The system VDS presented here allows concurrent use of various paradigms such as fork-join, weighted tasks, and static dags (directed acyclic graphs that are known in advance). The system provides visual performance evaluation tools to facilitate the efficient application of the system. VDS supports various communication interfaces including PVM and MPI. Thus, VDS-applications can be run on architectures ranging from workstation clusters to massively parallel systems. 相似文献

18.

A concurrent van Emde Boas array as a fast and simple concurrent dynamic set alternative

Konrad Ku&#x;akowski 《Concurrency and Computation》2014,26(2):360-379

Increasing demand for computationally efficient algorithms and processors has turned the attention of researchers toward parallel and concurrent solutions. Because the frequency of contemporary processors cannot be tweaked infinitely, the only hopes for squeezing more performance from computers are parallel processing and parallel computation. The important part of every parallel solution is concurrent data structures, which allow multithread programming environments to be taken advantage of. In this article, a new concurrent dynamic set structure is proposed. It is based on the van Emde Boas trees concept, where on every node of a tree, an array of the node's children is stored. The structure is equipped with a simple but effective locking algorithm, which allows it to be used concurrently by any number of threads. The presented algorithm idea is accompanied by an experimental implementation written in JAVA 6. Preliminary tests prove that, especially for moderately larger data sets with a predominance of read operations, the concurrent van Emde Boas array proposed in this article may be a viable alternative for other structures providing a similar functionality. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

19.

Performance modeling for SPMD message-passing programs

JÜRGEN BREHM PATRICK H. WORLEY MANISH MADHUKAR 《Concurrency and Computation》1998,10(5):333-357

Today's massively parallel machines are typically message-passing systems consisting of hundreds or thousands of processors. Implementing parallel applications efficiently in this environment is a challenging task, and poor parallel design decisions can be expensive to correct. Tools and techniques that allow the fast and accurate evaluation of different parallelization strategies would significantly improve the productivity of application developers and increase throughput on parallel architectures. This paper investigates one of the major issues in building tools to compare parallelization strategies: determining what type of performance models of the application code and of the computer system are sufficient for a fast and accurate comparison of different strategies. The paper is built around a case study employing the performance prediction tool (PerPreT) to predict performance of the parallel spectral transform shallow water model code (PSTSWM) on the Intel Paragon. PSTSWM is a parallel application code that was designed to evaluate different parallel strategies for the spectral transform method as it is used in climate modeling and weather forecasting. Multiple parallel algorithms and algorithm variants are embedded in the code. PerPreT uses a relatively simple algebraic model to predict execution time for SPMD (single program multiple data) parallel applications. Applications are modeled through parameterized formulae for communication and computation, where the parameters include the problem size, the number of processors used to execute the program, and system characteristics (e.g. setup times for communication, link bandwidth and sustained computing performance per processor). In this paper we describe performance models that predict the performance of the different algorithms in PSTSWM accurately enough to allow them to be compared, establishing the feasibility of such a demanding application of performance modeling. We also discuss issues in generating and validating the performance models, emphasizing the practical importance of tools such as PerPreT in such studies. © 1998 John Wiley & Sons, Ltd. 相似文献

20.

Controlling HEC-RAS using MATLAB

《Environmental Modelling & Software》2016

相似文献