1.

Object-attribute architecture for design and modeling of distributed automation systems

S. M. Salibekyan P. B. Panfilov 《Automation and Remote Control》2012,73(3):587-595

The application of the object-attribute (OA) architecture of computing environment to implementation of distributed automation systems with computational nodes (computers or PLCs) of different hardware architectures is described. The features of OA modeling of distributed automation tools and the main techniques for modeling, programming, and debugging of such systems are shown.

2.

Prospects for Optical Interconnects in Distributed, Shared-Memory Organized MIMD Architectures

Frietman Edward E. E. Ernst Ramon J. Crosbie Roy Shimoji Masao 《The Journal of supercomputing》1999,14(2):107-128

At the opposite extreme from sequential computers, which execute tasks with a single CPU, are the parallel computers containing large numbers of computing nodes. In the shared-memory category, each node has direct access through a switching network to a memory bank, which can be composed of a single large memory or multiple medium-sized memory configurations. Opposite to this category are the distributed-memory systems, where each node is given direct access to its own local memory section. Running a program in the latter category in particular requires a mechanism that gives access to multiple address spaces, that is, one for each local memory. Transfer of data can only be done from one address space to another. Alongside these two categories are the physically distributed, shared-memory systems, which allow the nodes to explore a single globally shared address space. All categories, whose performance depends on the way the computing nodes are linked, need either a direct or a switched interconnection network for inter-node communication. Linking nodes without taking into account the prerequisite of scalability when large numbers of them are used is not realistic, especially when the applied connection scheme must provide fast and flexible communication at a reasonable cost. Different network topologies, varying from a single shared bus to a more complex elaboration of a fully connected scheme, and with them the corresponding intricate switching protocols, have been extensively explored. A different vision is introduced concerning future prospects of an optically coupled, distributed, shared-memory organized multiple-instruction, multiple-data system. In each cluster, an electrical crossbar looks after the interconnections between the nodes, the various memory modules and external I/O channels. The clusters themselves are optically coupled through a free-space-oriented data distributing system. Analogies found in the design of the Convex SPP1000 substantiate the closeness to reality of such an architecture. Following this introduction, an idealized picture of the fundamental properties of an optically based, fully connected, distributed, (virtual) shared-memory architecture is also outlined.

3.

HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems

Weikuan Yu Xinyu Que Vinod Tipparaju Jeffrey S. Vetter 《Journal of Parallel and Distributed Computing》2012

Global Address Space (GAS) programming models enable a convenient, shared-memory style addressing model. Typically this is realized through one-sided operations that can enable asynchronous communication and data movement. With the size of petascale systems reaching 10,000s of nodes and 100,000s of cores, the underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. For any solution that addresses these resource scalability challenges, equally important is the need to maintain the performance of GAS programming models. In this paper, we describe a Hierarchical COOperation (HiCOO) architecture for scalable communication in GAS programming models. HiCOO formulates a cooperative communication architecture: with inter-node cooperation amongst multiple nodes (a.k.a. multinode) and hierarchical cooperation among multinodes that are arranged in various virtual topologies. We have implemented HiCOO for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). By extensively evaluating different virtual topologies in HiCOO in terms of their impact on memory scalability, network contention, and application performance, we identify MFCG as the most suitable virtual topology. The resulting HiCOO architecture is able to realize scalable resource management and achieve resilience to network contention, while at the same time maintaining or enhancing the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.

4.

A universal parallel computer architecture

William J. Dally 《New Generation Computing》1993,11(3-4):227-249

Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.

5.

Architecture-independent parallel computation

Skillicorn D.B. 《Computer》1990,23(12):38-50

The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instruction multiple-data (MIMD) computers, hypercuboid computers and constant-valence MIMD computers. An argument that the PRAM model is universal over tightly coupled and hypercube systems, but not over constant-valence-topology, loosely coupled systems, is reviewed, showing precisely how the PRAM model is too powerful to permit broad universality. Ways in which a model of computation can be restricted to become universal over less powerful architectures are discussed. The Bird-Meertens formalism (R.S. Bird, 1989) is introduced and it is shown how it is used to express computations in a compact way. It is also shown that the Bird-Meertens formalism is universal over all four architecture classes and that nontrivial restrictions of functional programming languages exist that can be efficiently executed on disparate architectures. The use of the Bird-Meertens formalism as the basis for a programming language is discussed, and it is shown that it is expressive enough to be used for general programming. Other models and programming languages with architecture-independent properties are reviewed.
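
As a rough illustration of the compact style this abstract refers to, the sketch below writes a computation as a composition of map and reduce, two second-order operations commonly associated with the Bird-Meertens formalism. It is written in Python rather than in the formalism's own notation, and the function names are illustrative assumptions that do not come from the paper.

```python
from functools import reduce
from operator import add


def sum_of_squares(xs):
    """Compose a map (squaring) with a reduce (+): reduce(+) . map(square)."""
    return reduce(add, map(lambda x: x * x, xs), 0)


def sum_of_squares_chunked(xs, parts):
    """Compute the same value piecewise.

    Because + is associative, each chunk can be reduced independently
    (for instance on a separate processor) and the partial results
    combined afterwards. `parts` is a hypothetical chunk count.
    """
    size = max(1, -(-len(xs) // parts))  # ceiling division, at least 1
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    return reduce(add, (sum_of_squares(c) for c in chunks), 0)
```

The chunked version returns the same value as the direct one for any chunk count, which is the kind of property that lets one expression be mapped onto different machine structures.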

6.

A parallel operating system based on multiple virtual spaces and multiple mapping

陈左宁 (Chen Zuoning), 金怡濂 (Jin Yilian) 《软件学报》 (Journal of Software) 2001,12(10):1562-1568

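The scalability of high-performance computer systems is a major challenge in system design, and the NUMA (non-uniform memory architecture) structure was proposed precisely to solve the scalability problem of shared-memory systems. Research and practice show that the scalability of the whole system is closely related to the structure of the operating system. Typical multiprocessor operating systems usually adopt one of two structures: a single kernel based on sharing, or multiple kernels based on message passing. Analysis leads to the conclusion that neither structure is well suited to the needs of scalable parallel machines, especially NUMA machines. To address these problems, a new structural design idea is proposed: combining multiple virtual spaces with multiple mapping and active messages. Test and operational results show that this structure successfully solves the system's scalability problem.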

7.

An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

Stefan Goedecker Mireille Boulet 《Computer Physics Communications》2003,154(2):105-110

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on a large number of processors is non-trivial on the latest generation of parallel computers, which consist of nodes made up of shared-memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.

8.

A parallel logic system on a multicomputer architecture

M. Cannataro G. Spezzano D. Talia 《Future Generation Computer Systems》1991,6(4):317-331

This paper describes the implementation of a logic programming language on a massively parallel architecture. This implementation is based on the AND/OR Process Model, which allows the exploitation of both AND and OR parallelism in logic programs. A distributed memory model is used, and a decentralized control mechanism has been designed. The multicomputer on which the system has been implemented consists of a network of Inmos Transputers. The AND/OR processes are implemented as Occam processes mapped onto the Transputer nodes. After the presentation of the system architecture and a detailed discussion of the distributed memory management, some preliminary performance results are discussed.

9.

The parallel main-memory database system PRISMA/DB

阳国贵 (Yang Guogui), 王升 (Wang Sheng) 《计算机应用与软件》 (Computer Applications and Software) 1995,12(4):9-15,53

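PRISMA/DB is a parallel main-memory relational database management system. Its design rests on two main ideas: first, the entire database is stored in main memory to obtain high performance; second, the system is implemented modularly in an object-oriented programming language, so that this flexible organization lends itself to analysis of and experimentation with functionality and performance. A prototype has been implemented and runs on a multiprocessor with 100 nodes. This paper gives a preliminary introduction to its design and implementation details.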

10.

Experimenting with a Shared Virtual Memory Environment for Hypercubes

《Journal of Parallel and Distributed Computing》1995,29(2):228-235

This paper describes the design and implementation of a shared virtual memory (SVM) system for the nCUBE 2 machine. The SVM system provides the user a single coherent address space across all nodes. It is implemented at the user level in a C programming environment using high level constructs to support data sharing. Shared variables are treated as objects rather than pages. We have improved upon an existing algorithm for maintaining coherency in the SVM system, thus achieving a reduction in the number of internode messages required in coherency maintenance. Detailed timing analysis is conducted to analyze the feasibility of this shared environment. Experimental results indicate that parallel programs running under an SVM system show linear speedup, suggesting that SVM systems could provide an effective programming environment for the next generation of distributed memory parallel computers. The bottleneck of this implementation is associated with the expensive interrupt handling capability of the nCUBE 2.

11.

Parallelizing support vector machine training based on a hybrid programming model

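李涛 (Li Tao), 刘学臣 (Liu Xuechen), 张帅 (Zhang Shuai), 王恺 (Wang Kai), 杨愚鲁 (Yang Yulu) 《计算机研究与发展》 (Journal of Computer Research and Development) 2015,52(5):1098-1108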

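The support vector machine (SVM) is a supervised learning method widely used for statistical classification and regression analysis. SVM training based on the interior point method (IPM) has the advantages of a small memory footprint and fast iterative convergence, but as training data sets grow it still faces the dual challenge of processing speed and storage space. To address this, a hybrid parallel mechanism is proposed for large-scale SVM training on CPU-GPU heterogeneous systems. First, the compute unified device architecture (CUDA) is used to parallelize the compute-intensive parts of the IPM-based SVM training algorithm, and the algorithm is modified so that it can be implemented with the cuBLAS linear algebra library, increasing training speed. Then the message passing interface (MPI) is used to distribute the CUDA-accelerated algorithm across a cluster, using distributed storage to increase the size of the data sets that can be processed and to reduce training time. Further, page-locked memory, supported by the Fermi architecture, removes the limit that insufficient GPU device memory places on data set size. The results show that the hybrid MPI and CUDA programming model, together with the page-locked memory data storage strategy, enables efficient parallel SVM training on large data sets on CPU-GPU heterogeneous systems, improving its computing performance and applicability in big data processing.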

12.

The M-machine multicomputer

Marco Fillo Stephen W. Keckler William J. Dally Nicholas P. Carter Andrew Chang Yevgeny Gurevich Whay S. Lee 《International journal of parallel programming》1997,25(3):183-212

The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 9 function units, on-chip cache, and local memory. The multiple function units are used to exploit both instruction-level and thread-level parallelism. A user accessible message passing system yields fast communication and synchronization between nodes. Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms. This paper presents the architecture of the M-Machine and describes how its mechanisms attempt to maximize both single thread performance and overall system throughput. The architecture is complete and the MAP chip, which will serve as the M-Machine processing node, is currently being implemented.

13.

On the cost–effectiveness of PRAMs

Ferri Abolhassan Jörg Keller Wolfgang J. Paul 《Acta Informatica》1999,36(6):463-487

We introduce a formalism that allows computer architecture to be treated as a formal optimization problem. We apply this to the design of shared memory parallel machines. While present parallel computers of this type only support the programming model of a shared memory but often process simultaneous access by several processors to the shared memory sequentially, theoretical computer science offers solutions for this problem that are provably fast and asymptotically optimal. But the constants in these constructions seemed to be too large to let them be competitive. We modify these constructions under engineering aspects and improve the price/performance ratio by roughly a factor of 6. The resulting machine has a surprisingly good price/performance ratio even when compared with distributed memory machines. For almost all access patterns of all processors into the shared memory, access is as fast as the access of only a single processor. Received: 29 June 1993 / 22 June 1999

14.

Machines and models for parallel computing

Jack B. Dennis 《International journal of parallel programming》1994,22(1):47-77

It is widely believed that superscalar and superpipelined extensions of RISC style architecture will dominate future processor design, and that needs of parallel computing will have little effect on processor architecture. This belief ignores the issues of memory latency and synchronization, and fails to recognize the opportunity to support a general semantic model for parallel computing. Efforts to extend the shared-memory model using standard microprocessors have led to systems that implement no satisfactory model of computing, and present the programmer with a difficult interface on which to build parallel computing applications. A more satisfactory model for parallel computing may be obtained on the basis of functional programming concepts and the principles of modular software construction. We recommend that designs for computers be built on such a general semantic model of parallel computation. Multithreading concepts and dataflow principles can frame the architecture of these new machines.

15.

A message-driven programming system for fine-grain multicomputers

Daniel Maskit Stephen Taylor 《Software》1994,24(10):953-980

This paper describes an experimental message-driven programming system for fine-grain multicomputers. The initial target architecture is the J-machine designed at MIT. This machine combines a unique collection of architectural features that include fine-grain processes, on-chip associative memory, and hardware support for process synchronization. The programming system uses these mechanisms via a simple message-driven process model that blurs the distinction between processes and messages: messages correspond to processes that are executed elsewhere in the network. This model allows code and data to be distributed across the computers in the machine, and is supported at every stage of the program development cycle. The prototype system we have developed includes a basic set of programming tools to support the model; these include a compiler, linker, archiver, loader and microkernel. Although the concepts are language independent, our prototype system is based on GNU-C.

16.

POW! – The Programmers Open Workbench

《Journal of Systems Architecture》1999,45(11):909-918

This paper is about POW!, the Programmers Open Workbench, a flexible and extensible programming environment for personal computers. POW! has been developed for programming lectures at high schools and universities and has been used successfully for the past few years. POW! is based on a modular architecture and can be adapted to many different programming languages and compilers. So far, compiler modules exist for the languages Oberon-2, C, C++ and Java. This text covers the structure and implementation of POW! It also compares POW! with other programming environments for personal computers.

17.

PC-based Shared Memory Architecture and Language

Houzet Dominique Fatni Abdelkrim 《The Journal of supercomputing》1998,12(1-2):119-136

Image processing applications require both computing and communication power. The object of the GFLOPS project was to study all aspects concerning the design of such computers. The project's aim was to develop a parallel architecture as well as its software environment to implement these applications efficiently. A development environment, especially a C data-parallel language, has been built for this purpose. The C parallel language presented here simplifies the use of such architectures by providing the programmer with a global name space and a control mechanism to exploit the fine and medium grain parallelism of applications. The main advantage of our paradigm is that it allows a unique framework to express both data and control parallelism. We have implemented this programming environment on the GFLOPS machine, which supports up to 512 processor nodes, which are PC motherboards, connected over a scalable and cost-effective network, via the PCI bus, at a constant cost per node. The aim is to obtain at low cost a scalable virtually shared memory machine. In this paper we discuss the design of the GFLOPS machine and its C parallel language, and evaluate the effectiveness of the mechanisms incorporated. The analysis of the architecture's behaviour was conducted with microbenchmarks and image processing algorithms, written in C.

18.

Research on the scalability of directory-based cache coherence protocols

潘国腾 (Pan Guoteng), 窦强 (Dou Qiang), 谢伦国 (Xie Lunguo) 《计算机工程与科学》 (Computer Engineering & Science) 2008,30(6):131-133

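DSM multiprocessor systems based on the CC-NUMA structure are one way of implementing large-scale high-performance parallel computers. Because directory-based cache coherence protocols scale better than snooping protocols, such systems mostly adopt them. However, as system size keeps growing, directory protocols also face scalability problems. Based on an analysis of the factors that affect the scalability of directory protocols, this paper discusses several typical directory organizations in terms of storage overhead and finally proposes a two-level directory organization scheme based on a directory cache.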
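
To make the storage-overhead concern concrete, the sketch below computes the overhead of a full-map (bit-vector) directory, a standard textbook organization that keeps one presence bit per node for every memory block. The abstract does not give the paper's own figures, so the function name, the two state bits and the block size are assumptions chosen purely for illustration.

```python
def full_map_directory_overhead(num_nodes, block_bytes, state_bits=2):
    """Return directory bits divided by data bits for one memory block.

    A full-map directory entry holds one presence bit per node plus a
    few state bits (state_bits=2 is an assumed value).
    """
    directory_bits = num_nodes + state_bits
    data_bits = block_bytes * 8
    return directory_bits / data_bits


# With 64-byte blocks, the entry grows linearly with the node count:
small = full_map_directory_overhead(64, 64)    # 66 / 512
large = full_map_directory_overhead(1024, 64)  # 1026 / 512
```

Under these assumed parameters the directory costs about 13% of memory at 64 nodes but more than twice the data it describes at 1024 nodes, which is why more compact directory organizations are of interest.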

19.

Performance‐based parallel loop self‐scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters

Chao‐Tung Yang Chao‐Chin Wu Jen‐Hsiang Chang 《Concurrency and Computation》2011,23(8):721-744

Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.
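
The idea of dividing iterations according to node performance can be sketched as a single proportional split. This is a simplified illustration and not the paper's scheduling scheme: the function name is hypothetical, the weights are assumed to be positive integers (for example, core counts), and a real self-scheduling scheme would hand out work in several steps rather than all at once.

```python
def partition_iterations(total_iterations, node_weights):
    """Split loop iterations among nodes in proportion to integer weights."""
    total_weight = sum(node_weights)
    # Floor of each node's proportional share.
    shares = [total_iterations * w // total_weight for w in node_weights]
    # Hand any remaining iterations to the highest-weighted nodes first.
    leftover = total_iterations - sum(shares)
    order = sorted(range(len(node_weights)),
                   key=lambda i: node_weights[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# One 4-core node and two 2-core nodes sharing 100 iterations:
print(partition_iterations(100, [4, 2, 2]))  # [50, 25, 25]
```

Within each node, the assigned share would then be divided among threads, which is the role OpenMP plays in the hybrid model the abstract describes.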

20.

Dynamic programming on a functional memory computer

《Computers & Mathematics with Applications》1999,37(11-12):17-22

In a previous paper [1], we described the solution of dynamic programming problems on a new class of parallel processing systems, the Hawaii Parallel Computer (HPC). The HPC has a novel architecture distinguished by its incorporation of field programmable gate arrays to evaluate expressions and by its use of a decision-table data structure to represent computer programs. As specific examples, we showed how the HPC can be used to implement dynamic programming solutions of shortest-path and traveling-salesman problems. In that earlier implementation, we simply adapted algorithms intended for execution on conventional deterministic von Neumann computers. More recently, we designed a successor to the HPC, a “functional memory” computer, which includes constructs for nondeterministic computation. In this paper, we discuss how dynamic programming algorithms can be adapted to take advantage of this nondeterminism.
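
For readers unfamiliar with the shortest-path example mentioned here, the sketch below shows a conventional, deterministic dynamic programming formulation of the kind the abstract says was adapted from von Neumann machines. It is not the HPC or functional memory implementation; the function name and graph encoding are assumptions, and the graph is assumed to contain no negative-cost cycles.

```python
import math


def shortest_path_costs(num_nodes, edges, target):
    """Return f, where f[u] is the minimum cost from node u to target.

    Iterates the recurrence f(u) = min over edges (u, v) of
    cost(u, v) + f(v), with f(target) = 0. `edges` is a list of
    (u, v, cost) tuples.
    """
    f = [math.inf] * num_nodes
    f[target] = 0
    for _ in range(num_nodes - 1):
        updated = False
        for u, v, cost in edges:
            if f[v] + cost < f[u]:
                f[u] = f[v] + cost
                updated = True
        if not updated:  # values have converged early
            break
    return f


edges = [(0, 1, 1), (1, 2, 2), (0, 2, 5), (2, 3, 1)]
print(shortest_path_costs(4, edges, target=3))  # [4, 3, 1, 0]
```

Each pass evaluates the minimization for every edge in a fixed order; a nondeterministic machine of the kind the abstract describes could, in principle, explore those alternatives differently.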