首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
N. H. Gehani  W. D. Roome 《Software》1988,18(12):1157-1177
C++ and Concurrent C are both upward-compatible supersets of C that provide data abstraction and parallel programming facilities, respectively. Although data abstraction facilities are important for writing concurrent programs, we did not provide data abstraction facilities in Concurrent C because we did not want to duplicate the C++ research effort. Instead, we decided that we would eventually integrate C++ and Concurrent C facilities to produce a language with both data abstraction and parallel programming facilities, namely, Concurrent C++. Data abstraction and parallel programming facilities are orthogonal. Despite this, the merger of Concurrent C and C++ raised several integration issues. In this paper, we will give introductions to C++ and Concurrent C, give two examples illustrating the advantages of using data abstraction facilities in concurrent programs, and discuss issues in integrating C++ and Concurrent C to produce Concurrent C++.  相似文献   

2.
N. H. Gehani  W. D. Roome 《Software》1992,22(3):265-285
Concurrent C (C++) is a parallel superset of C (C++). Versions of Concurrent C have now been implemented for a variety of uniprocessors and multiprocessors. We first implemented a uniprocessor version of Concurrent C correctly anticipating that it would be relatively easy to extend the uniprocessor implementation to run on multiprocessors. Concurrent C is translated to C. The generated C code contains calls to the C library implementing the Concurrent C run-time system. This paper describes the ‘hard core’ details of the Concurrent C implementation: the specifics of process states, the data structures used, the library functions, and the C code generated for various Concurrent C constructs. We also give an overview of the Concurrent C facilities and of the uniprocessor, distributed and shared-memory multiprocessor implementations.  相似文献   

3.
Concurrent C, is a parallel superset of C (and of C++) that provides facilities such as specifying timeouts during process interactions, delaying program execution, accepting messages in a user-specified order, and asynchronous messages that can be used for writing real-time programs. However, Concurrent C does not provide facilities for specifying strict timing constraints, e.g., Concurrent C only ensures that the lower bounds on the specified delay and timeout periods are satisfied. Real-Time Concurrent C extends Concurrent C by providing facilities to specify periodicity or deadline constraints, to seek guarantees that timing constraints will be met, and to perform alternative actions when either the timing constraints cannot be met or the guarantees are not available.In this paper, we will discuss requirements for a real-time programming language, briefly summarize Concurrent C, and motivate and describe the real-time extensions to Concurrent C. We also discuss scheduling and other run-time facilities that have been incorporated to support the real-time extensions. A prototype implementation of Real-Time Concurrent C is nearing completion.  相似文献   

4.
Concurrent C, a superset of C providing parallel programming facilities, is considered. A uniprocessor version of Concurrent C was first implemented. After experience with this version, the Concurrent C implementation was extended to run on two types of multiple processor systems: a set of computers connected by a local area network (the distributed version) and a shared-memory multiprocessor (the multiprocessor version). Experience with implementing and using these versions of Concurrent C is described. Specifically, the language changes triggered by the multiple processor implementations, some sample programs, a comparison of the execution times on various systems, and the suitability of these multiple processor architectures are discussed  相似文献   

5.
That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the forseeable future. The current generation of parallel hardware prominently features distributed memory and high‐performance interconnection networks—very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed‐memory machines—from embarassingly parallel problems to sorting and numerical analysis. In contrast it is known that for other classes of problem PRAM‐style shared‐memory simulation on a distributed‐memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines—theoretical and actual—in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256‐processor Cray T3D, an 8‐node IBM SP/2 and a 4‐node shared‐memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct‐mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming. Copyright © 2000 John Wiley & Sons, Ltd.  相似文献   

6.
The concurrent programming facilities in both Concurrent C and the Ada language are based on the rendezvous concept. Although these facilities are similar, there are substantial differences. Facilities in Concurrent C were designed keeping in perspective the concurrent programming facilities in the Ada language and their limitations. Concurrent C facilities have also been modified as a result of experience with its initial implementations. The authors compare the concurrent programming facilities in Concurrent C and Ada and show that it is easier to write a variety of concurrent programs in Concurrent C than in Ada  相似文献   

7.
Torquati  M.  Mencagli  G.  Drocco  M.  Aldinucci  M.  De Matteis  T.  Danelutto  M. 《The Journal of supercomputing》2019,75(8):4114-4131

This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper, we consider two streaming parallel patterns and we discuss different implementation variants related to how dynamic memory is managed. The results show that the standard mechanisms provided by modern C++ are not entirely adequate for maximizing the performance. Instead, the combined use of an efficient general purpose memory allocator, a custom allocator optimized for the pattern considered and a custom variant of the C++ shared pointer mechanism, provides a performance improvement up to 16% on the best case.

  相似文献   

8.
9.
Matti Rintala 《Software》2007,37(3):231-246
Serialization of data is needed when information is passed among program entities that have no shared memory. For remote procedure calls, this includes serialization of exception objects in addition to more traditional parameters and return values. In C++, serialization of exceptions is more complicated than parameters and return values, since internal copying and passing of exceptions is handled differently from C++ parameters and return values. This article presents a light‐weight template metaprogramming‐based mechanism for passing C++ exceptions in remote procedure calls, remote method invocations, and other situations where caller and callee do not have a shared address space. This mechanism has been implemented and tested in the KC++ concurrent active object system. Copyright © 2006 John Wiley & Sons, Ltd.  相似文献   

10.
Unified Parallel C(UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space(PGAS) programming model,which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures.Therefore,UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures,such as multi-core clusters,in a more productive way,accessing remote memory by means of different high-level language constructs,such as assignments to shared variables or collective primitives.However,the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality.This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library,allowing,for example,the use of a specific source and destination thread or defining the amount of data transferred by each particular thread.This library fulfills the demands made by the UPC developers community and implements portable algorithms,independent of the specific UPC compiler/runtime being used.The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies.The results obtained confirm the suitability of the new library to provide easier programming without trading off performance,thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.  相似文献   

11.
提出了一个基于工作站网(networkofworkstations,简称NOW)的分布式程序设计语言NC++(NOWC++).它是DC++语言的扩充.NC++提供了一个完备的编程环境,包括NC++预编译器、图视编程界面、多目通信机制和测试系统.它完善了组管理机制和进程通信机制,提出了一个基于信度推理网络的分布共享内存(distributedsharedmemory,简称DSM)机制以管理C++公共变量.实践证明,NC++语言在确保编程方便性的前提下保证了分布式程序的性能.  相似文献   

12.
袁伟  孙永强 《软件学报》1998,9(1):47-52
面向对象的并行程序设计提供了类似于共享内存模型对通讯和计算的抽象能力,从而非常适合于大型并行软件系统的开发.但是基于远程对象调用的分布式对象的实现效率一直是面向对象方法在分布式/并行程序设计中得到广泛应用的障碍.本文介绍了并行机MANNA上所采用的面向对象的并行程序设计模型——Dual-Object模型.该模型通过引入从语义角度出发给出的数据一致特性的描述,在一定程度上解决了实现效率低下的问题.其次,文章通过程序设计实例详细地讨论了基于Dual-Object模型的扩展C++并行程序设计,并给出了部分实际测试结果.  相似文献   

13.
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.  相似文献   

14.
The most efficient way to parallelize computation is to build and evaluate the task graph constrained only by the data dependencies between the tasks. Both Intel's C++ Concurrent Collections (CnC) and Threading Building Blocks (TBB) libraries allow such task‐based parallel programming. CnC also adapts the macro data flow model by providing only single‐assignment data objects in its global data space. Although CnC makes parallel programming easier, by specifying data flow dependencies only through single‐assignment data objects, its macro data flow model incurs overhead. Intel's C++ CnC library is implemented on top of its C++ TBB library. We can measure the overhead of CnC by comparing its performance with that of TBB. In this paper, we analyze all three types of data dependencies in the tiled in‐place Gauss–Jordan elimination algorithm for the first time. We implement the task‐based parallel tiled Gauss–Jordan algorithm in TBB using the data dependencies analyzed and compare its performance with that of the CnC implementation. We find that the overhead of CnC over TBB is only 12%– 15% of the TBB time, and CnC can deliver as much as 87%– 89% of the TBB performance for Gauss–Jordan elimination, using the optimal tile size. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

15.
The WEDSP32C high-performance, programmable digital signal processor supports 32-bit floating-point arithmetic and is upwardly compatible with its predecessor, the WEDSP32. Because it is implemented in 0.75-μm (effective channel length) CMOS technology, the second-generation device achieves high functional density with low power consumption. The DSP32C offers the following features: 25-Mflop operation; 16-Mb/s serial-input and serial-output ports; a 160-bit, parallel I/O port for control and data transfer; interrupt facilities; single-instruction μ-law and A-law data conversions; single-instruction conversions between integers and floating-point data; a byte-addressable, on-chip memory that is extendable off chip; direct memory access to and from internal and external memory via parallel and serial I/O ports; 16 Mbytes of address space; and IEEE Std. 754 floating-point format conversion. The authors describe the DSP32C's instruction set, architecture, and application development tools. The latter includes an assembler, a simulator, an optimizing C compiler, and special-purpose hardware  相似文献   

16.
The Aurora distributed shared data system implements a shared-data abstraction on distributed-memory platforms, such as clusters, using abstract data types. Aurora programs are written in C++ and instantiate shared-data objects whose data-sharing behaviour can be optimized using a novel technique called scoped behaviour. Each object and each phase of the computation (i.e., use-context) can be independently optimized with per-object and per-context flexibility. Within the scoped behaviour framework, optimizations such as bulk-data transfer can be implemented and made available to the application programmer. Scoped behaviour carries semantic information regarding the specific data-sharing pattern through various layers of software. We describe how the optimizations are integrated from the uppermost application-programmer layers down to the lowest UDP-based layers of the Aurora system. A bulk-data transfer network protocol bypasses some bottlenecks associated with TCP/IP and achieves higher performance on an ATM network than either TreadMarks (distributed shared memory) or MPICH (message passing) for matrix multiplication and parallel sorting.  相似文献   

17.
This paper develops a formalism that precisely characterizes when class tables are required for C++ memory layouts. A memory layout is a particular choice of data structures for implementing run‐time support for object‐oriented languages. We use this formalism to quantify and evaluate, on a set of benchmarks, the space overhead for a set of C++ memory layouts. In particular, this paper studies the space overhead due to three language features: virtual dispatch, virtual inheritance, and dynamic typing. To date, there has been no scientific quantification or evaluation of C++ memory layouts. Our approach can help C++ implementors. This work has already influenced the memory layout design choices in IBM's Visual Age C++ V5 compiler. Applying our approach to a set of five benchmarks, we demonstrate that the impact of object‐oriented space overhead can vary dramatically between applications (ranging from 0.42% to 99.79% for our benchmarks). In particular, applications whose object space is dominated by instances of classes that heavily use object‐oriented language features will be significantly impacted by the choice of a memory layout. Copyright © 2003 John Wiley & Sons, Ltd.  相似文献   

18.
OpenMP is widely accepted as a de facto standard for shared memory parallel programming in Fortran, C and C++. Nested parallelization has been included in the first OpenMP specification, but it took a few years until the first commercially available compilers supported this optional part of the specification. We employed nested parallelization using OpenMP in three production codes: a C++ code for content-based image retrieval, a C++ code for the computation of critical points in multi-block CFD datasets, and a multi-block Navier-Stokes solver written in Fortran90. In this paper we discuss the opportunities as well as the deficiencies of the nested parallelization support in OpenMP.  相似文献   

19.
Incorporating dimensional-analysis facilities in a language lets an environment detect an additional class of errors, but at the expense of requiring a change to the language you used. However, if the language you used has suitable data-abstraction facilities, most of the benefits of dimensional analysis can be had without changing the language. To demonstrate how to get these benefits, the authors use the data-abstraction facilities of C++ to implement dimensional analysis. C++ is an upwardly compatible extension of C that provides data-abstraction facilities called classes. The authors define a set of classes that allows the writing of programs with automatic checking of units (i.e. dimensional analysis) and automatic conversion between consistent (compatible or equivalent) units. They discuss the pros and cons of this approach and compare the advantage C++ has over Ada for implementing dimensional analysis  相似文献   

20.
This paper proposes a new parallel architecture, which has the potential to support low-level image processing as well as intermediate and high-level vision analysis tasks efficiently. The integrated architecture consists of an SIMD mesh of processors enhanced with multiple broadcast buses, and MIMD multiprocessor with orthogonal access buses, and a two-dimensional shared memory array. Low-level image processing is performed on the mesh processor, while intermediate and high-level vision analysis is performed on the orthogonal multiprocessor. The interaction between the two levels is supported by a common shared memory. Concurrent computations and I/O are made possible by partitioning the memory into disjoint spaces so that each processor system can access a different memory space. To illustrate the power of such a two-level system, we present efficient parallel algorithms for a variety of problems from low-level image processing to high-level vision. Representative problems include matrix based computations, histogramming and key counting operations, image component labeling, pyramid computations, Hough transform, pattern clustering, and scene labeling. Through computational complexity analysis, we show that the integrated architecture meets the processing requirements of most image understanding tasks.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号