Similar Literature
Found 20 similar documents (search time: 187 ms)
1.
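The "Parallel Processing Simulator" is intended to help users deepen their understanding of how typical parallel computer systems work. It provides a basic parallel program development environment in which users can design parallel programs and execute them in simulation. By simulating the basic characteristics of the architecture, compiler, and operating system of an MIMD multiprocessor system, it supports both explicit and implicit exploitation of parallelism at the job, job-step, and DO-loop levels. It can also automatically detect the size of the system configuration and accurately locate connection faults.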

2.
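This paper presents a decoupling-decomposition parallel algorithm for solving general banded linear systems on MIMD multiprocessor systems, and reports numerical experiments on the S10-2 parallel machine. Theoretical analysis and the numerical results both show that the parallel algorithm is effective.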

3.
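Jing Xiaojun (景晓军), Fang Binxing (方滨兴). 《软件学报》 (Journal of Software), 1996, 7(7): 401-408
SIMC (SIMD C) is a parallel language for SIMD (single instruction, multiple data) parallel programming, obtained by extending the syntax of C without extending its semantics. SIMC makes it easy to describe SIMD parallel algorithms, can define SIMD computer system architectures, and supports research on parallel algorithms across a variety of architectures. A simulated execution system for SIMC has been implemented on a uniprocessor machine. It serves as the parallel programming language of the SIMD programming and performance evaluation simulation environment developed by the authors, and is used to evaluate the performance of SIMD algorithms and architectures.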

4.
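This paper discusses parallel processing for hidden-surface removal of three-dimensional objects. A class of MIMD parallel depth-buffer algorithms is presented and implemented on a multi-Transputer system. The efficiency of these algorithms is also compared.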

5.
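SICE (SIMDC Emulator), implemented in this work, is a software package that emulates SIMD computer programming in a serial-machine environment. SIC (SIMDC) is a SIMD parallel extension language based on C, defined by the authors. It supports parallel statements that reflect the characteristics of SIMD architectures and, more importantly, supports the definition of SIMD architectures, which makes it convenient for algorithm research on SIMD machines.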

6.
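Research on the Interacting Multiple Model Filter and Its Parallel Implementation*   Cited by: 5 (self-citations: 1, citations by others: 5)
This paper studies the interacting multiple model filter (IMMF). The IMMF performs well in estimation for a class of hybrid systems whose dynamic mode undergoes random abrupt changes. The paper first gives a mathematical description of the IMMF and reveals its underlying mechanism. Based on the structural characteristics of the IMMF, and using the PD-100 multiprocessor system as the platform, it then studies the processor topology for a parallel IMMF implementation, IMMF task partitioning and allocation, and the parallel mapping of the IMMF, and gives a parallel implementation algorithm. Simulations on the PD-100 system show that the proposed parallel algorithm has good speedup linearity, …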

7.
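Applying the data-parallel model to MIMD machines, with loose synchronization in SPMD mode, is attracting growing attention. This paper presents the design and implementation of Multi-c, a data-parallel language that targets a heterogeneous parallel system. The Multi-c compiler, currently being implemented, works as a precompiler: it accepts program specifications written in SIMD form, relaxes the synchronization requirements, and generates C programs that run on the parallel system in SPMD mode.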

8.
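A User Interface Management System (UIMS) is a design aid that gives user interface designers a good design environment. This paper first discusses the special characteristics of human-computer interaction software and, from a methodological perspective, studies its development process and proposes a development method for human-computer interaction software systems, HCSDM (Human-Computer Interaction System Development Method). It then discusses the issues to consider in UIMS research and design, starting from the basic characteristics of a UIMS, UIMS models, and UIMS specifications. Finally, it offers an outlook on UIMS research and development.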

9.
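We refer to the parallel interpretation of (conventional, sequential) PROLOG programs on a multiprocessor system as PROLOG parallel processing. This paper discusses issues in PROLOG parallel processing, such as its background, parallel models of PROLOG interpretation, and the corresponding processing approaches (including clause distribution, control strategies for parallel interpretation, allocation and scheduling of processors and processes, preservation of PROLOG procedural semantics, control of communication complexity, and the supporting multiprocessor architecture). Finally, it introduces the "A" parallel processing approach for PROLOG that we have proposed and plan to implement, a backward-recursive, assertion-parallel interpretation approach that incorporates several parallel modes.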

10.
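Parallel processing capabilities were developed, and parallel computation applied, on the CPUs of an IBM 4381-P03 computer, achieving a relatively high speedup.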

11.
Fast neural net simulation with a DSP processor array   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-bit floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system, with 3.8 Gflops peak performance, consumes less than 800 W of electrical power and fits into a 19-inch rack. While reaching the speed of modern supercomputers, MUSIC can still be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a single user computing performance that was previously unthinkable. The system's real-time interfaces make it especially useful for embedded applications.

12.
Parallel Computing, 1997, 22(13): 1837-1851
The PAPS (Performance Analysis of Parallel Systems) toolset is a testbed for the model-based performance prediction of message-passing parallel applications executed on private-memory multiprocessor computer systems. PAPS allows the execution behavior of the computer hardware and operating system software resources to be described at a very detailed level. This enables very accurate performance prediction of parallel applications, even in the case of substantial performance degradation due to contention for shared resources. In this paper the fundamental design principles and implementation methodologies for the development of the PAPS toolset are presented, and the PAPS parallel system specification formalisms are described. A simplified performance study of a parallel Gaussian elimination application on the nCUBE 2 multiprocessor system is used to demonstrate the usage of the tool.

13.
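Design and Implementation of the Multiprocessor Simulator SimDSM   Cited by: 2 (self-citations: 1, citations by others: 1)
Simulators are a powerful tool for computer architecture research. Using the multiprocessor simulator SimDSM as an example, this paper discusses general principles for designing simulators and criteria for evaluating them. SimDSM was designed for analyzing and studying distributed shared memory systems. It is based on Intel's CISC architecture and, by using execution-driven simulation and a custom thread implementation, it simulates efficiently and accurately. Experimental results show that the simulator is practical and effective.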

14.
Energy efficient scheduling of parallel tasks on multiprocessor computers   Cited by: 2 (self-citations: 1, citations by others: 1)
In this paper, scheduling parallel tasks on multiprocessor computers with dynamically variable voltage and speed is addressed as a combinatorial optimization problem. Two problems are defined, namely, minimizing schedule length with an energy consumption constraint and minimizing energy consumption with a schedule length constraint. The first problem has applications in general multiprocessor and multicore processor computing systems where energy consumption is an important concern and in mobile computers where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and environments where the timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay product is optimized by fixing one factor and minimizing the other. Power-aware scheduling of parallel tasks has rarely been discussed before. Our investigation in this paper makes an initial attempt at energy-efficient scheduling of parallel tasks on multiprocessor computers with dynamic voltage and speed. Our scheduling problems contain three nontrivial subproblems, namely, system partitioning, task scheduling, and power supplying. Each subproblem should be solved efficiently, so that heuristic algorithms with overall good performance can be developed. This decomposition of our optimization problems into three subproblems makes the design and analysis of heuristic algorithms tractable. A unique feature of our work is that we compare the performance of our algorithms with optimal solutions analytically and validate our results experimentally, rather than only comparing heuristic algorithms among themselves experimentally. The harmonic system partitioning and processor allocation scheme is used, which divides a multiprocessor computer into clusters of equal sizes and schedules tasks of similar sizes together to increase processor utilization. A three-level energy/time/power allocation scheme is adopted for a given schedule, such that the schedule length is minimized while consuming a given amount of energy, or the energy consumed is minimized without missing a given deadline. The performance of our heuristic algorithms is analyzed, and accurate performance bounds are derived. Simulation data that validate our analytical results are also presented. It is found that our analytical results provide very accurate estimates of the expected normalized schedule length and the expected normalized energy consumption, and that our heuristic algorithms are able to produce solutions very close to the optimum.

15.
Mehdi Badii. Software, 1998, 28(5): 463-480
This paper presents the implementation of the multitasking functions of DYNIX Sequent computers on the UNIX operating system. The Sequent computers are shared-memory multiprocessor computers running the DYNIX operating system. These functions support data and function partitioning. They let the user execute subprograms in parallel on the processors of a Sequent computer. The functions can synchronize, lock, and unlock data and program segments. As a result, the simulator allows users to develop their multitasking programs on a uniprocessor computer such as a SUN workstation, and later port them to a Sequent computer. Further, the simulator adds a level of abstraction on top of UNIX for concurrent programming. The functions of the simulator allow the user to handle the communication and synchronization of the processes in a program at a higher level of abstraction, while concentrating on the design of multitasking algorithms. The simulator is applied to a parallel selection algorithm. © 1998 John Wiley & Sons, Ltd.

16.
The paper addresses the simulation and analysis of hierarchical multiprocessor systems oriented to database applications. Requirements for a parallel database system model are given. A survey and comparative analysis of known parallel database system models are presented. A new multiprocessor database system model is introduced. This model allows us to simulate and evaluate arbitrary hierarchical multiprocessor configurations in the context of OLTP-class database applications. Examples of using the database multiprocessor model for simulation studies of multiprocessor database systems are presented.

17.
Fork-join structures have gained increased importance in recent years as a means of modeling parallelism in computer and storage systems. The basic fork-join model is one in which a job arriving at a parallel system splits into K independent tasks that are assigned to K unique, homogeneous servers. In the paper, a simple response time approximation is derived for parallel systems with exponential service time distributions. The approximation holds for networks modeling several devices, both parallel and nonparallel. (In the case of closed networks containing a stand-alone parallel system, a mean response time bound is derived.) In addition, the response time approximation is extended to cover the more realistic case wherein a job splits into an arbitrary number of tasks upon arrival at a parallel system. Simulation results for closed networks with stand-alone parallel subsystems and exponential service time distributions indicate that the response time approximation is, on average, within 3 percent of the seeded response times. Similarly, simulation results with nonexponential distributions also indicate that the response time approximation is close to the seeded values. Potential applications of our results include the modeling of data placement in disk arrays and the execution of parallel programs in multiprocessor and distributed systems.
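
The abstract does not state the approximation itself, so the following is only a minimal sketch of the simplest case of the basic fork-join model: a single job in an otherwise idle system (no queueing), whose K tasks each take an exponentially distributed time with rate `mu`. In that case the job finishes when the slowest task finishes, and the mean completion time is the K-th harmonic number divided by `mu`. The function names are illustrative and do not come from the paper.

```python
import random
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    """Return the K-th harmonic number H_K = 1 + 1/2 + ... + 1/K exactly."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(Fraction(1, i) for i in range(1, k + 1))


def expected_fork_join_time(k: int, mu: float) -> float:
    """Mean time until all K tasks finish, each exponential with rate mu.

    Assumption: one job in an idle system, so there is no queueing delay.
    The mean of the maximum of K i.i.d. exponentials is H_K / mu.
    """
    return float(harmonic(k)) / mu


def simulate_fork_join_time(k: int, mu: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for cross-checking."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The job completes when its slowest task completes.
        total += max(rng.expovariate(mu) for _ in range(k))
    return total / trials
```

Comparing `expected_fork_join_time(4, 1.0)` with `simulate_fork_join_time(4, 1.0, 20000)` illustrates the synchronization cost of waiting for the slowest of the K tasks. Queueing at the servers, which the paper's approximation addresses, is deliberately left out of this sketch.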

18.
Two current trends in real-time and embedded systems are multiprocessor architectures and the partitioning technology that enables several isolated applications with different criticality levels to share the same computer. This paper presents a real-time platform for multiprocessor and partitioned systems, in which communication requirements are also considered. The paper describes the adaptation of MaRTE OS (a monoprocessor real-time operating system) to the XtratuM hypervisor for the multiprocessor Intel x86 architecture. This adaptation makes two contributions to ease the development process of future mixed-criticality applications: firstly, it integrates the hypervisor technology and the fully partitioned scheduling in a multiprocessor environment, and secondly, it provides the basis to interconnect partitioned and non-partitioned applications via a homogeneous communication subsystem. Copyright © 2016 John Wiley & Sons, Ltd.

19.
A hardware accelerator for self-organizing feature maps is presented. We have developed a massively parallel architecture that, on the one hand, allows a resource-efficient implementation of small or medium-sized maps for embedded applications, requiring only small areas of silicon. On the other hand, large maps can be simulated with systems that consist of several integrated circuits that work in parallel. Apart from the learning and recall of self-organizing feature maps, the hardware accelerates data pre- and postprocessing. For the verification of our architectural concepts in a real-world environment, we have implemented an ASIC that is integrated into our heterogeneous multiprocessor system for neural applications. The performance of our system is analyzed for various simulation parameters. Additionally, the performance that can be achieved with future microelectronic technologies is estimated.

20.
In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial due to the interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing the parallel loop. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedback, automatically determines at run time the optimal number of threads for each parallel loop in the parallel application. We evaluate our technique using a set of standard numerical applications, running them on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessor or multicore systems.
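
The abstract does not describe the generated code, so the following is only a rough sketch of the general measure-then-choose idea: time a trial portion of a loop under each candidate thread count and keep the fastest. The names `pick_thread_count` and `toy_loop_time`, and the cost model inside `toy_loop_time`, are assumptions made for illustration and are not taken from the article.

```python
from typing import Callable, Iterable


def pick_thread_count(run_trial: Callable[[int], float],
                      candidates: Iterable[int]) -> int:
    """Return the candidate thread count with the smallest measured time.

    run_trial(n) is assumed to execute a trial portion of the loop with
    n threads and return the elapsed time.
    """
    best_n = None
    best_time = float("inf")
    for n in candidates:
        elapsed = run_trial(n)
        if elapsed < best_time:
            best_n, best_time = n, elapsed
    if best_n is None:
        raise ValueError("candidates must not be empty")
    return best_n


def toy_loop_time(n: int, work: float = 64.0, interference: float = 4.0) -> float:
    """Hypothetical cost model standing in for a real measurement.

    The work divides evenly across n threads, while interference and
    parallel overhead grow linearly with n.
    """
    return work / n + interference * n
```

Under this toy model, `pick_thread_count(toy_loop_time, range(1, 9))` selects 4 of the 8 hardware contexts, which mirrors the abstract's point that using every available thread is not always fastest. A real implementation would replace `toy_loop_time` with timed executions of the actual loop.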
