Similar Documents
20 similar documents found (search time: 15 ms).
1.
In this work, we propose new techniques to analyze the behavior, the performance, and especially the scalability of High Performance Computing (HPC) applications on different computing architectures. Our final objective is to test applications on a wide range of architectures (real or merely designed) and to scale them to any number of nodes or components. This paper presents a new simulation framework, called SIMCAN, for HPC architectures. The main characteristic of the proposed simulation framework is that it can be configured to simulate a wide range of possible architectures involving any number of components. SIMCAN is designed to simulate complete HPC architectures, with special emphasis on the storage and network subsystems. The SIMCAN framework can model complete components (nodes, racks, switches, routers, etc.) as well as key elements of the storage and network subsystems (disks, caches, sockets, file systems, schedulers, etc.). We also propose several methods to implement the behavior of HPC applications, each with its own advantages and drawbacks. To evaluate the capabilities and the accuracy of the SIMCAN framework, we tested it by executing an HPC application called BIPS3D on a real computing cluster and on a modeled environment that represents that cluster. We also checked the scalability of the application on this kind of architecture by simulating it with an increasing number of computing nodes.

2.
RAMP: Research Accelerator for Multiple Processors
The RAMP project's goal is to enable the intensive, multidisciplinary innovation that the computing industry will need to tackle the problems of parallel processing. RAMP itself is an open-source, community-developed, FPGA-based emulator of parallel architectures. Its design framework lets a large, collaborative community develop and contribute reusable, composable design modules. Three complete designs, for transactional memory, distributed systems, and distributed shared memory, demonstrate the platform's potential.

3.
Diminishing returns from increased clock frequencies and instruction-level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many-core architectures have progressed at a remarkable rate, concerns arise regarding the performance and productivity of the numerous parallel-programming tools for application development. Developing parallel applications on many-core processors often requires developers to familiarize themselves with the unique characteristics of a target platform while attempting to maximize performance and maintain correctness of their applications. The family of partitioned global address space (PGAS) programming models comprises the current state of the art in balancing performance and programmability. One such PGAS approach is SHMEM, a lightweight, shared-memory programming library that has demonstrated high performance and productivity potential for parallel-computing systems with distributed-memory architectures. In this paper, we present the research, design, and analysis of a new SHMEM infrastructure specifically crafted for low-level PGAS on modern and emerging many-core processors featuring dozens of cores or more. Our approach (a new library known as TSHMEM) is investigated and evaluated atop two generations of Tilera architectures, which are among the most sophisticated and scalable many-core processors to date, and is intended to enable similar libraries atop other emerging architectures. In developing TSHMEM, we explore design decisions and their impact on parallel performance for the Tilera TILE-Gx and TILEPro many-core architectures, and then evaluate TSHMEM's designs and algorithms through microbenchmarking and application studies against other communication libraries. Our results with barrier primitives provided by the Tilera libraries show dissimilar performance between the TILE-Gx and TILEPro; TSHMEM's barrier design therefore takes an alternative approach, leveraging the on-chip mesh network to provide consistent low-latency performance. In addition, our experiments with TSHMEM show that naive collective algorithms consistently outperformed linear distributed collective algorithms when executed in an SMP-centric environment. Leveraging these insights, TSHMEM outperforms the OpenSHMEM reference implementation, achieves similar or better performance than OpenMP and OSHMPI atop MPICH, and supports similar libraries in delivering high-performance parallel computing to emerging many-core systems.
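To make the SHMEM programming style concrete, here is a minimal sketch against the standard OpenSHMEM C API (standard calls only; this is not taken from TSHMEM's sources, and the neighbor-exchange pattern is purely illustrative):

#include <shmem.h>
#include <stdio.h>

/* Each PE (processing element) writes its rank into the next PE's
 * symmetric buffer with a one-sided put, then all PEs synchronize. */
int main(void) {
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: every PE gets a remotely accessible slot. */
    int *slot = (int *) shmem_malloc(sizeof(int));
    *slot = -1;
    shmem_barrier_all();

    /* One-sided put into the right neighbor's address space. */
    shmem_int_p(slot, me, (me + 1) % npes);

    /* The barrier completes outstanding puts before we read locally. */
    shmem_barrier_all();
    printf("PE %d received %d\n", me, *slot);

    shmem_free(slot);
    shmem_finalize();
    return 0;
}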

4.
Formal description of hardware components and their composition mechanisms
For component-based router/switch platform design, this paper proposes an abstract model of basic hardware components together with a formal description of their internal processing flows. Four atomic composition mechanisms (sequence, parallel, branch, and aggregation) are extracted for building higher-level composite components, and the abstract model and formal processing-flow description of composite components are derived, making it easier to abstract larger-granularity components for assembling complex hardware platforms.
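Purely for illustration, the four atomic mechanisms could be rendered in a process-algebra-like notation (this notation is our assumption, not the paper's own formalism):

\begin{align*}
\text{sequence:}    &\quad C = P_1 \,;\, P_2 \\
\text{parallel:}    &\quad C = P_1 \parallel P_2 \\
\text{branch:}      &\quad C = [g]\,P_1 + [\lnot g]\,P_2 \\
\text{aggregation:} &\quad C = \oplus(P_1, \dots, P_n)
\end{align*}

where the P_i are component processing flows, g is a branching guard, and \oplus merges several flows into one composite flow.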

5.
Most Western Governments (USA, Japan, EEC, etc.) have now launched national programmes to develop computer systems for use in the 1990s. These so-called Fifth Generation computers are viewed as “knowledge” processing systems which support the symbolic computation underlying Artificial Intelligence applications. The major driving force in Fifth Generation computer design is to efficiently support very high level programming languages (i.e. VHLL architecture).

Historically, however, commercial VHLL architectures have been largely unsuccessful. The driving force in computer design has principally been advances in hardware, which at the present time means architectures that exploit very large scale integration (i.e. VLSI architecture).

This paper examines VHLL architectures and VLSI architectures and their probable influences on Fifth Generation computers. Interestingly, the major problem for both architecture classes is parallelism: how to orchestrate a single parallel computation so that it can be distributed across an ensemble of processors.


6.
Recent advances in artificial neural networks (ANNs) have led to the design and construction of neuroarchitectures as simulators and emulators of a variety of problems in science and engineering. Such problems include pattern recognition, prediction, optimization, associative memory, and control of dynamic systems. This paper offers an analytical overview of the most successful designs, implementations, and applications of neuroarchitectures as neurosimulators and neuroemulators. It also outlines historical notes on the formulation of the basic biological neuron, artificial computational models, network architectures, and learning processes of the most common ANNs; describes and analyzes neurosimulation on parallel architectures in both software and hardware (neurohardware); presents the simulation of ANNs on parallel architectures; gives a brief introduction to ANNs on vector microprocessor systems; and presents ANNs in terms of the "new technologies". Specifically, it discusses cellular computing, cellular neural networks (CNNs), a new proposition for unsupervised neural networks (UNNs), and pulse-coupled neural networks (PCNNs).

7.
As programming passes the 30-year mark as a professional occupation, an increasingly large number of programs are in application areas that have been automated for many years. This fact is changing the technology base of commercial programming and is opening up new markets for standard functions, reusable common systems, modules, and the tools and support needed to facilitate searching out and incorporating existing code segments. This report addresses the 1984 state of the art in the domains of reusable data, reusable architectures, reusable design, common systems, reusable programs, and reusable modules or subroutines. If current trends toward reusability continue, the amount of reused logic and reused code in commercial programming systems may approach 50 percent by 1990. However, major efforts will be needed in the areas of reusable data, reusable architectures, and reusable design before reusable code becomes a sound basic technology.

8.
Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared-memory view that simplifies code development while taking advantage of the scalability of distributed-memory architectures. UPC therefore allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developer community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.
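For context, a minimal sketch of one of the eight standard UPC collectives mentioned above (upc_all_broadcast as defined in the UPC collectives specification; the array sizes here are arbitrary). Note how the broadcast source is fixed by data affinity rather than being a freely chosen thread, which is one of the limitations the extended library lifts:

#include <upc.h>
#include <upc_collective.h>

#define N 4

/* Blocked shared arrays: one block of N ints per thread. */
shared [N] int src[N * THREADS];
shared [N] int dst[N * THREADS];

int main(void) {
    if (MYTHREAD == 0)
        for (int i = 0; i < N; i++)
            src[i] = i;            /* data has affinity to thread 0 */

    upc_barrier;

    /* Standard collective: copy N ints from the source block into
     * every thread's block of dst; entry/exit synchronization is
     * requested through the flags argument. */
    upc_all_broadcast(dst, src, N * sizeof(int),
                      UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);

    upc_barrier;
    return 0;
}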

9.
The inherent complex nature of current distributed computing architectures hinders the widespread adoption of these systems for mainstream use. In general, users have access to a highly heterogeneous set of compute resources, which may include clusters, grids, desktop grids, clouds, and other compute platforms. This heterogeneity is especially problematic when running parallel and distributed applications. Software is needed that easily combines as many resources as possible into one coherent computing platform. In this paper, we introduce Zorilla: peer-to-peer (P2P) middleware that creates a single distributed environment from any available set of compute resources. Zorilla imposes minimal requirements on the resources used, is platform independent, and does not rely on central components. In addition to providing functionality on bare resources, Zorilla can exploit locally available middleware. Zorilla explicitly supports distributed and parallel applications, and allows resources from multiple sites to cooperate in a single computation. Zorilla makes extensive use of both virtualization and P2P techniques. We demonstrate how virtualization and P2P combine into a simple design, while enhancing functionality and ease of use. Together, these techniques bring our goal a step closer: transparent, easy use of resources, even on very heterogeneous distributed systems.

10.
11.
The time-triggered protocol is the communication protocol required by the TTA architecture, used to interconnect electronic modules in distributed fault-tolerant real-time systems that demand high reliability. At present, the time-triggered controller, a key component of time-triggered communication systems, mainly implements protocol processing on a processor, which incurs considerable protocol overhead. The FPGA-based design of a time-triggered protocol controller presented here adopts an encoding scheme with good synchronization properties and a well-chosen frame format, optimizes the protocol-processing state machine on the basis of an established global time base, and exploits the parallel processing capability of the FPGA to reduce protocol overhead, improve bus efficiency, and increase clock-synchronization precision and fault tolerance. Simulation results show that the FPGA-based time-triggered protocol controller performs well.

12.
BSPlib: The BSP programming library
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics.
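A minimal sketch of the DRMA style in BSPlib (standard BSPlib calls; the neighbor-exchange pattern is illustrative rather than taken from the paper):

#include <bsp.h>
#include <stdio.h>

/* One superstep: each process puts its pid into its right
 * neighbor's registered inbox, then synchronizes. */
int main(void) {
    bsp_begin(bsp_nprocs());
    int p = bsp_nprocs(), pid = bsp_pid();

    int inbox = -1;
    bsp_push_reg(&inbox, sizeof(int));   /* make inbox remotely writable */
    bsp_sync();                          /* registration takes effect    */

    int msg = pid;
    bsp_put((pid + 1) % p, &msg, &inbox, 0, sizeof(int));
    bsp_sync();                          /* superstep boundary: puts land */

    printf("process %d got %d\n", pid, inbox);

    bsp_pop_reg(&inbox);
    bsp_end();
    return 0;
}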

13.
Much progress has been made in distributed computing in the areas of distribution structure, open computing, fault tolerance, and security. Yet, writing distributed applications remains difficult because the programmer has to manage models of these areas explicitly. A major challenge is to integrate the four models into a coherent development platform. Such a platform should make it possible to cleanly separate an application’s functionality from the other four concerns. Concurrent constraint programming, an evolution of concurrent logic programming, has both the expressiveness and the formal foundation needed to attempt this integration. As a first step, we have designed and built a platform that separates an application’s functionality from its distribution structure. We have prototyped several collaborative tools with this platform, including a shared graphic editor whose design is presented in detail. The platform efficiently implements Distributed Oz, which extends the Oz language with constructs to express the distribution structure and with basic primitives for open computing, failure detection and handling, and resource control. Oz appears to the programmer as a concurrent object-oriented language with dataflow synchronization. Oz is based on a higher-order, state-aware, concurrent constraint computation model. Seif Haridi, Ph.D.: He received his Ph.D. in computer science in 1981 from the Royal Institute of Technology, Sweden. After spending 18 months at IBM T. J. Watson Research Center, he moved to the Swedish Institute of Computer Science (SICS) to form a research lab on logic programming and parallel systems. Dr. Haridi is currently the research director of the Swedish Institute of Computer Science. He has been an active researcher in the area of logic and constraint programming and parallel processing since the beginning of the eighties. His earlier work includes contributions to the design of SICStus Prolog, various parallel Prolog systems and a class of scalable cache-coherent multiprocessors known as Cache-Only Memory Architecture (COMA). During the nineties most of his work focused on the design of multiparadigm programming systems based on Concurrent Constraint Programming (CCP). Currently, he is interested in programming systems and software methodology for distributed and agent-based applications. Peter Van Roy, Ph.D.: He obtained an engineering degree from the Vrije Universiteit Brussel (1983), Masters and Ph.D. degrees from the University of California at Berkeley (1984, 1990), and the Habilitation à Diriger des Recherches from Paris VII Denis Diderot (1996). He has made major contributions to logic language implementation. His research showed for the first time that Prolog can be implemented with the same execution efficiency as C. He was principal developer or codeveloper of Aquarius Prolog, Wild_Life, Logical State Threads, and FractaSketch. He joined the Oz project in 1994 and is currently working on Distributed Oz. His research interests are motivated by the desire to provide increased expressivity and efficiency to application developers. Per Brand: He is a researcher at the Swedish Institute of Computer Science. He has previously worked on the design and implementation of OR-parallel Prolog (the Aurora project) and optimized compilation techniques for Concurrent Constraint Programming Languages (in particular, AKL). He has been a member of the Distributed Oz design team since the project began. 
His research interests are focused on techniques, languages, and methodology for distributed programming. Christian Schulte: He studied computer science at the University of Karlsruhe, Germany, from 1987 to 1992, where he received his diploma. Since 1992 he has been a member of the Programming Systems Lab at DFKI. He is one of the principal designers of Oz. His research interests include the design, implementation, and application of concurrent and distributed programming languages as well as constraint programming.

14.
Information Sciences, 1986, 38(2): 165–180
One complication in using distributed computer systems is the increased complexity of developing distributed software systems. These software systems are composed of asynchronously executing components which communicate via message passing. Current software design techniques are not adequate for use in the design of distributed software systems. New design methods which explicitly address the problem of system partitioning are needed. An overall distributed software design approach is presented. The key to the design approach is the presentation of a distributed processing component (DPC) partitioning algorithm for clustering functional modules in order to derive a set of distributed processing components. The design approach is oriented towards producing a software system which is hierarchical, which exploits potential concurrency that exists between functional modules, and which avoids nonprofitable message traffic.

15.
The abundance of parallel and distributed computing platforms, such as MPPs, SMPs, and Beowulf clusters, to name just a few, has added many more possibilities and challenges to high performance computing (HPC), parallel I/O, mass data storage, scalable architectures, and large-scale simulations, which traditionally belong to the realm of custom-tailored parallel systems. The intent of this special issue is to discuss problems and solutions, to identify new issues, and to help shape future research directions in these areas. From these perspectives, this special issue addresses the problems encountered at the hardware, architectural, and application levels, while providing conceptual as well as empirical treatments of the current issues in high performance computing and the I/O architectures and systems utilized therein.

16.
Rapid prototyping is a key aspect of the development of innovative robotic applications. A modular, platform-based approach is the way to obtain this result. Modular approaches are common in software development, but hardware is still crafted by hand, often re-inventing solutions every time. As a consequence, the resources that should be invested in the development of a new robot often get drained by the implementation of a physical, working prototype to test the application idea. To overcome this problem, we propose R2P (Rapid Robot Prototyping), a framework to implement real-time, high-quality architectures for robotic systems from off-the-shelf basic modules (e.g., sensors, actuators, and controllers), integrating hardware and software, which can be assembled in a plug-and-play way. R2P provides hardware modules, a protocol for real-time communication, a middleware to connect components, and tools to support the development of software on the modules. R2P aims to dramatically reduce the time and effort required to build a prototype robot, making it possible to focus resources on the development of new robotic applications instead of struggling with their implementation. This also enables people with experience in a specific application domain, but little technical background, to actively participate in the development of new robotic applications. R2P is open source in both its software and hardware, to promote its diffusion in the robotics community and novel business models that will substantially reduce the costs of designing a new robotic product.

17.
Autonomous robotics projects encompass the rich nature of integrated systems, including mechanical, electrical, and computational software components. The availability of smaller and cheaper hardware components has helped make possible a new dimension in operational autonomy. This paper describes a mobile robotic platform consisting of several integrated modules: a laptop computer that serves as the main control module, a microcontroller-based motion control module, a vision processing module, a sensor interface module, and a navigation module. The laptop computer module contains the main software development environment with a user interface to access and control all other modules. Programming-language independence is achieved by using standard input/output computer interfaces, including the RS-232 serial port, USB, networking, audio input and output, and parallel port devices. However, with the same hardware technology available to all, the distinguishing factor for intelligent systems in most cases becomes the software design. The software for autonomous robots must intelligently control the hardware so that it functions in unstructured, dynamic, and uncertain environments while maintaining autonomous adaptability. This paper describes how we introduced fuzzy logic control to one robot platform in order to solve the 2003 Intelligent Ground Vehicle Competition (IGVC) Autonomous Challenge problem. It also describes the introduction of a hybrid software design that utilizes fuzzy evolutionary artificial neural network techniques. In this design, rather than using a directly coded control program, the robot's artificial neural net is first trained on a training data set, using evolutionary optimization techniques to adjust the weight values between neurons. The trained neural network, with a weighted-average defuzzification method, was able to make correct decisions for unseen vision patterns in the IGVC Autonomous Challenge. A comparison of the Lawrence Technological University robot designs with the designs of the other competing schools shows that our platforms were the most affordable robot systems to use as tools for computer science and engineering education.
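The abstract does not spell out the defuzzification step; assuming the textbook weighted-average method, the crisp output would be computed as

y^{*} = \frac{\sum_{i=1}^{n} \mu_i \, \bar{y}_i}{\sum_{i=1}^{n} \mu_i},

where \mu_i is the firing strength of rule (or output neuron) i and \bar{y}_i is the representative value of its output set.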

18.
A survey of QoS architectures
Over the past several years there has been a considerable amount of research within the field of quality-of-service (QoS) support for distributed multimedia systems. To date, most of the work has been within the context of individual architectural layers such as the distributed system platform, operating system, transport subsystem and network layers. Much less progress has been made in addressing the issue of overall end-to-end support for multimedia communications. In recognition of this, a number of research teams have proposed the development of QoS architectures which incorporate QoS-configurable interfaces and QoS driven control and management mechanisms across all architectural layers. This paper examines the state-of-the-art in the development of QoS architectures. The approach taken is to present QoS terminology and a generalized QoS framework for understanding and discussing QoS in the context of distributed multimedia systems. Following this, we evaluate a number of QoS architectures that have emerged in the literature.

19.
Autonomous agent architectures are design methodologies—collections of knowledge and strategies which are applied to the problem of creating situated intelligence. This article attempts to integrate this knowledge across several architectural traditions, paying particular attention to features which have tended to be selected under the pressure of extensive use in real-world systems. We determine that the following strategies provide significant assistance in the design of autonomous intelligent agents: (i) modularity, which simplifies both design and control; (ii) hierarchically organized action selection, which focusses attention and provides prioritization when different modules conflict; and (iii) parallel environment monitoring which allows a system to be responsive and opportunistic by allowing attention to shift and priorities to be re-evaluated. We offer a review of four architectural paradigms: behaviour-based AI; two- and three-layered systems; belief, desire and intention architectures (particularly PRS); and Soar/ACT-R. By documenting trends within each of these communities towards establishing the components above, we argue that this convergent evolution is strong evidence for the components' utility. We then use this information to recommend specific strategies for researchers working under each paradigm to further exploit the knowledge and experience of the field as a whole.

20.
Randomized algorithms are gaining ground in high-performance computing applications as they have the potential to outperform deterministic methods while still providing accurate results. We propose a randomized solver for distributed multicore architectures to efficiently solve large dense symmetric indefinite linear systems that are encountered, for instance, in parameter estimation problems or electromagnetism simulations. The contribution of this paper is to propose efficient kernels for applying random butterfly transformations and a new distributed implementation combined with a runtime (PaRSEC) that automatically adjusts data structures, data mappings, and the scheduling as systems scale up. Both the parallel distributed solver and the supporting runtime environment are innovative. To our knowledge, the randomization approach associated with this solver has never been used in public-domain software for symmetric indefinite systems. The underlying runtime framework allows seamless data mapping and task scheduling, mapping its capabilities to the underlying hardware features of heterogeneous distributed architectures. The performance of our software is similar to that obtained for symmetric positive definite systems, while it requires only half the execution time and half the data storage of a general dense solver.
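For reference, the random butterfly transformation as published in earlier RBT work (the paper's exact kernels may differ) uses butterfly blocks of the form

B^{\langle n \rangle} = \frac{1}{\sqrt{2}}
\begin{pmatrix} R & S \\ R & -S \end{pmatrix},

where R and S are random diagonal matrices of order n/2. A recursive butterfly U built from such blocks transforms the symmetric indefinite system into U^{T} A U \, y = U^{T} b with x = U y, after which an LDL^{T} factorization can proceed without pivoting.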
