Similar Literature
20 similar documents found.
1.
Fifth generation computers are analogous to LEGO building blocks, with each block corresponding to a microcomputer and a group of blocks working together as a computer system. These computers will represent a unification of currently separate areas of research into parallel processing and into VLSI processors. Parallel processing based on data driven and demand driven computer organisations is under investigation in well over thirty laboratories in the United States, Japan and Europe. Basically, in data driven (e.g. data flow) computers the availability of operands triggers the execution of the operation to be performed on them; whereas in demand driven (e.g. reduction) computers the requirement for a result triggers the operation that will generate the value. VLSI processors exploit very large scale integration and the new simplified chip design methodology pioneered in US universities by Mead and Conway, allowing users to design their own chips. These novel VLSI processors are implementable with simple replicated cells and use extensive pipelining and multiprocessing to achieve high performance. Examples range from a powerful image processing device configured from identical special-purpose chips, to a large parallel computer built from replicated general-purpose microcomputers. This paper outlines these topics contributing to fifth generation computers, and speculates on their effect on computing.

2.
The resolution of combinatorial optimization problems can greatly benefit from the parallel and distributed processing that is characteristic of neural network paradigms. Nevertheless, the fine-grain parallelism of the usual neural models cannot be implemented entirely efficiently either on general-purpose multicomputers or on networks of computers, which are nowadays the most common parallel computer architectures. We therefore present a parallel implementation of a modified Boltzmann machine in which the neurons are distributed among the processors of the multicomputer, which asynchronously compute the evolution of their subset of neurons using values for the other neurons that might not be up to date, thus reducing the communication requirements. Several alternatives for allowing the processors to work cooperatively are analyzed and their performance detailed. Among the proposed schemes, we have identified one that allows the corresponding Boltzmann machine to converge to high-quality solutions and that provides a high speedup over the execution of the Boltzmann machine on uniprocessor computers.
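As a rough illustration of the update scheme this abstract describes (a toy serial simulation of mine, not the paper's implementation), the sketch below partitions the neurons among simulated processors that update their own subset while reading possibly stale values for remote neurons; the network size, weights, and annealing schedule are assumptions.

```python
# Toy asynchronous Boltzmann machine: neurons are partitioned among
# simulated "processors"; each updates its own neurons using fresh local
# states but possibly stale remote states. Sizes, weights, and the
# temperature schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, procs = 12, 3
W = rng.normal(0, 1, (n, n))
W = (W + W.T) / 2                       # symmetric weights
np.fill_diagonal(W, 0)
state = rng.integers(0, 2, n).astype(float)
parts = np.array_split(np.arange(n), procs)   # neurons owned by each processor

def sweep(T):
    stale = state.copy()                # snapshot seen as "remote" values
    for part in parts:                  # each "processor" in turn
        local = np.isin(np.arange(n), part)
        for i in part:
            s = np.where(local, state, stale)   # fresh local, stale remote
            dE = W[i] @ s               # gain from setting neuron i to 1
            p = 1.0 / (1.0 + np.exp(-dE / T))
            state[i] = 1.0 if rng.random() < p else 0.0

for T in np.geomspace(2.0, 0.05, 40):   # simple annealing schedule
    sweep(T)
print(state)
```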

3.
This paper analyzes the fault-tolerance features, in terms of architecture and organization, of two fault-tolerant computers used respectively as the front-end processor and the network switch of communication nodes in computer networks. These representative examples are a valuable reference for the fault-tolerant design of network communication node machines in computer communication networks.

4.
PeiZong Lee, Parallel Computing, 1995, 21(12): 1895-1923
It is widely accepted that distributed memory parallel computers will play an important role in solving computation-intensive problems. However, the design of an algorithm in a distributed memory system is time-consuming and error-prone, because a programmer is forced to manage both parallelism and communication. In this paper, we present techniques for compiling programs on distributed memory parallel computers. We study the storage management of data arrays and the arrangement of execution schedules for Do-loop programs on distributed memory parallel computers. First, we introduce formulas for representing the data distribution of specific data arrays across processors. Then, we define communication costs for some message-passing communication operations. Next, we derive a dynamic programming algorithm for data distribution. After that, we show how to improve the communication time by pipelining data, and illustrate how to use data-dependence information for pipelining data. Jacobi's iterative algorithm and the Gaussian elimination algorithm for linear systems are used to illustrate our method. We also present experimental results on a 32-node nCUBE-2 computer.
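To make the notion of a data-distribution formula concrete, here is a minimal sketch of the textbook block and cyclic mappings (illustrative standard formulas, not necessarily the paper's exact notation):

```python
# Owning processor and local index of global array element i under block
# or cyclic distribution of an n-element array over p processors.
def block_owner(i, n, p):
    b = -(-n // p)                 # ceil(n / p), the block size
    return i // b, i % b           # (processor, local index)

def cyclic_owner(i, n, p):
    return i % p, i // p           # wrap elements round-robin

n, p = 10, 4
print([block_owner(i, n, p) for i in range(n)])
print([cyclic_owner(i, n, p) for i in range(n)])
```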

5.
Research on the Data Communication Performance of a Distributed Parallel Simulation System
张会生, 李军, 刘永文, 苏明, 《计算机仿真》 (Computer Simulation), 2005, 22(1): 126-127, 131
This paper studies the data communication performance between different computers in the distributed parallel computer system of a hardware-in-the-loop simulation platform. The results show that, using ordinary commercial computers, real-time execution of the simulation program can be achieved by raising the program's scheduling priority, and that using a network switch avoids blocking and collisions in data communication, effectively solving the network communication problem between the distributed computers. This enables hardware-in-the-loop real-time simulation of a gas turbine system, providing a general-purpose platform for real-time simulation and thereby saving computing resources.

6.
Based on serial communication between a PC and multiple single-chip microcontrollers (MCUs), an intelligent digital transmitter is designed in both hardware and software. The transmitter implements serial communication between one host computer (PC) and eight slave devices (MCUs). The host software, written in Visual Basic, provides an intuitive human-machine interface; data from the slave devices are transferred to the host over the serial port, the host can return acknowledgement signals, and it can also issue data requests to the slave MCUs. The system supports data modification, display, and alarm functions, and such designs are widely used in industrial control.
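A minimal sketch of this kind of one-master, eight-slave polling over a serial port, using the pyserial library; the port name, baud rate, and 4-byte reply framing are illustrative assumptions, not the paper's Visual Basic implementation:

```python
# Hypothetical master-side polling loop for 8 slave MCUs on one serial port.
# Port, baud rate, and the one-byte-address / 4-byte-reply framing are
# assumptions, not the protocol from the paper.
import serial  # pyserial

PORT, BAUD, NUM_SLAVES = "COM1", 9600, 8

def poll_slaves():
    with serial.Serial(PORT, BAUD, timeout=1.0) as ser:
        readings = {}
        for addr in range(1, NUM_SLAVES + 1):
            ser.write(bytes([addr]))          # request data from slave `addr`
            frame = ser.read(4)               # assume a fixed 4-byte reply
            if len(frame) == 4 and frame[0] == addr:
                readings[addr] = int.from_bytes(frame[1:3], "big")  # 16-bit value
            # else: slave timed out; an alarm could be raised here
        return readings

if __name__ == "__main__":
    print(poll_slaves())
```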

7.
8.
This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers of processors, we develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications. The contribution of our paper is that it provides quantitative data about real parallel scientific applications in a manner that is largely independent of the specific machine on which the application was run. Such data, which are clearly very valuable to an architect who is designing a new parallel computer, were not previously available. For example, the majority of research papers in interconnection networks have used simulated communication loads consisting of fixed-size messages. Our data, which show that using such simulated loads is unrealistic, can be used to generate more realistic communication loads.

9.
Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically for protocol processing. This convention reduces communication latency and increases effective bandwidth, but also reduces peak performance, since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor, and the Floating policy, where all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols; (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node; (iii) multiprocessor node performance is not as sensitive to interrupt overhead as uniprocessor node performance, because a message arrival is likely to find an idle processor on a multiprocessor node, thereby eliminating interrupts; (iv) the system with the lowest cost/performance ratio will include a dedicated protocol processor when interrupt overheads are much higher than protocol weight, as in light-weight protocols.
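A back-of-the-envelope model of this tradeoff (my illustration with assumed overhead fractions, not the paper's measurements) reproduces the four-processors-per-node rule of thumb:

```python
# Toy throughput model: "Fixed" dedicates one of P processors per node to
# protocol processing; "Floating" lets all P compute but charges each a
# fraction f for protocol work plus i for interrupt overhead. The values
# of f and i are illustrative assumptions.
def fixed(P):
    return P - 1

def floating(P, f=0.20, i=0.05):
    return P * (1 - f - i)

for P in (2, 4, 8, 16):
    print(f"P={P:2d}  fixed={fixed(P):2d}  floating={floating(P):5.2f}")
# Floating wins at P=2, the policies tie at P=4, and Fixed wins beyond.
```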

10.
Consider a distributed system consisting of n computers connected by a number of identical broadcast channels. All computers may receive messages from all channels. We distinguish between two kinds of systems: systems in which the computers may send on any channel (dynamic allocation) and systems where the send port of each computer is statically allocated to a particular channel. A distributed task (application) is executed on the distributed system. A task performs execution as well as communication between its subtasks. We compare the completion time of the communication for such a task using dynamic allocation and channels with the completion time using static allocation and channels. Some distributed tasks will benefit very much from allowing dynamic allocation, whereas others will work fine with static allocation. In this paper we define optimal upper and lower bounds on the gain (or loss) of using dynamic allocation and channels compared to static allocation and channels. Our results show that, for some tasks, the gain of permitting dynamic allocation is substantial; e.g., for certain numbers of computers and channels, there are tasks which will complete 1.89 times faster using dynamic allocation compared to using the best possible static allocation, but there are no tasks with a higher such ratio.
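A toy illustration of why dynamic allocation can win (my simplification, in which individual messages rather than send ports are bound to channels; message lengths are made up):

```python
# Compare static vs. dynamic allocation of messages to broadcast channels.
# Dynamic: each message grabs the earliest-free channel (greedy scheduling).
# Static: each message is wired to one channel in advance.
import heapq

def dynamic_makespan(msgs, k):
    channels = [0.0] * k
    heapq.heapify(channels)
    for m in sorted(msgs, reverse=True):
        t = heapq.heappop(channels)
        heapq.heappush(channels, t + m)   # channel busy until t + m
    return max(channels)

def static_makespan(msgs, assignment, k):
    loads = [0.0] * k                     # total traffic wired to each channel
    for m, ch in zip(msgs, assignment):
        loads[ch] += m
    return max(loads)

msgs, k = [5, 4, 3, 3, 2, 1], 2
print("dynamic:", dynamic_makespan(msgs, k))                        # 9.0
print("static (round-robin wiring):",
      static_makespan(msgs, [0, 1, 0, 1, 0, 1], k))                 # 10.0
```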

11.
Low-Cost, High-Availability Design for Computer Network Interconnection
This paper studies the principles and techniques for building low-cost, high-availability computer networks, including the design goals of such networks and the design of low-cost, high-availability structures for network communication links, hosts, servers, and network devices (such as hubs, switches, and routers, as well as program-controlled telephone exchanges).

12.
Ray tracing is a well known technique to generate life-like images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory storage. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates to perform ray tracing. However, the computational cost of rendering pixels and the patterns of data access cannot be predicted until runtime. To parallelize such an application efficiently on distributed memory parallel computers, the issues of database distribution, dynamic data management and dynamic load balancing must be addressed. In this paper, we present a parallel implementation of a ray tracing algorithm on the Intel Delta parallel computer. In our database distribution, a small fraction of the database is duplicated on each processor, while the remaining part is evenly distributed among groups of processors. In the system, there are multiple copies of the entire database in the memory of groups of processors. Dynamic data management is achieved by an ALRU cache scheme which can exploit image coherence to reduce data movements in ray tracing consecutive pixels. We balance load among processors by distributing subimages to processors in a global fashion based on previous workload requests. The success of our implementation depends crucially on a number of parameters which are experimentally evaluated. © 1997 John Wiley & Sons, Ltd.
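A minimal sketch of the plain LRU caching that the paper's ALRU scheme builds on (the ALRU refinements and the parallel machinery are not reproduced; the fetch function and capacity are placeholders):

```python
# Minimal LRU cache for remotely stored scene objects. `fetch_remote` is a
# placeholder for pulling an object from the group of processors that owns it.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote   # called on a cache miss
        self.store = OrderedDict()

    def get(self, obj_id):
        if obj_id in self.store:
            self.store.move_to_end(obj_id)     # mark as recently used
            return self.store[obj_id]
        obj = self.fetch_remote(obj_id)        # miss: costly remote fetch
        self.store[obj_id] = obj
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)     # evict least recently used
        return obj

# Rays traced through neighbouring pixels tend to hit the same objects
# (image coherence), so consecutive get() calls mostly hit the cache.
cache = LRUCache(capacity=1024, fetch_remote=lambda oid: f"object-{oid}")
```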

13.
High-performance computing in finance: The last 10 years and the next
Almost two decades ago supercomputers and massively parallel computers promised to revolutionize the landscape of large-scale computing and provide breakthrough solutions in several application domains. Massively parallel processors achieve today teraFLOPS performance – trillion floating point operations per second – and they deliver on their promise. However, the anticipated breakthroughs in application domains have been more subtle and gradual. They came about as a result of combined efforts with novel modeling techniques, algorithmic developments based on innovative mathematical theories, and the use of high-performance computers that vary from top-range workstations, to distributed networks of heterogeneous processors, and to massively parallel computers. An application that benefited substantially from high-performance computing is that of finance and financial planning. The advent of supercomputing coincided with the so-called “age of the quants” in Wall Street, i.e., the mathematization of problems in finance and the strong reliance of financial managers on quantitative analysts. These scientists, aided by mathematical models and computer simulations, aim at a better understanding of the peculiarities of the financial markets and the development of models that deal proactively with the uncertainties prevalent in these markets. In this paper we give a modest synthesis of the developments of high-performance computing in finance. We focus on three major developments: (1) the use of Monte Carlo simulation methods for security pricing and Value-at-Risk (VaR) calculations; (2) the development of integrated financial product management tools and practices – also known as integrative risk management or enterprise-wide risk management; and (3) financial innovation and the computer-aided design of financial products.
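To make development (1) concrete, here is a minimal Monte Carlo VaR sketch (a generic one-position example with an assumed lognormal return model and made-up parameters, not a model from the paper):

```python
# Monte Carlo estimate of 1-day 99% Value-at-Risk for a single position.
# The lognormal return model and all parameter values are illustrative.
import numpy as np

def monte_carlo_var(value=1e6, mu=0.0, sigma=0.02, alpha=0.99, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    returns = rng.normal(mu, sigma, n)            # simulated 1-day returns
    pnl = value * (np.exp(returns) - 1.0)         # profit and loss per path
    return -np.quantile(pnl, 1.0 - alpha)         # loss exceeded 1% of the time

print(f"99% 1-day VaR: {monte_carlo_var():,.0f}")
```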

14.
This paper analyzes the effect of communication delay on the optimal distribution of processing loads in distributed computing networks. The processing load is assumed to satisfy the property of arbitrary divisibility. The objective is to divide and distribute this processing load among the various processors in the network in order to minimize the processing time. An asymptotic analysis of the performance of such networks is carried out to obtain a limit on the performance enhancement obtained by using additional processors. The architectures considered are linear and single-level tree configurations. Cases in which the processors are and are not equipped with front-ends are considered.
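As a simplified worked example of the divisible-load idea (a star network with front-ends and concurrent distribution, a special case I am assuming for illustration rather than the paper's linear and tree analyses): if child i spends alpha_i * z_i receiving its fraction and alpha_i * w_i computing it, with the two overlapped, it finishes at time alpha_i * (z_i + w_i), so equalizing finish times gives alpha_i proportional to 1/(z_i + w_i).

```python
# Divisible-load sketch: split a unit load so that all processors finish
# simultaneously. Model: processor i finishes at alpha_i * (z_i + w_i);
# this is a simplified special case, not the paper's recursions.
def optimal_fractions(z, w):
    inv = [1.0 / (zi + wi) for zi, wi in zip(z, w)]
    total = sum(inv)
    return [v / total for v in inv]

z = [0.2, 0.2, 0.5]        # per-unit communication delays (assumed)
w = [1.0, 2.0, 1.0]        # per-unit compute times (assumed)
alpha = optimal_fractions(z, w)
finish = [a * (zi + wi) for a, zi, wi in zip(alpha, z, w)]
print(alpha, finish)       # equal finish times: the makespan is minimized
```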

15.
For PVM-based network parallel computing environments built from networked desktop PCs, where the processors are relatively fast but inter-processor communication is relatively slow, a parallel algorithm for solving triangular systems of equations on a local area network is presented. The algorithm partitions the coefficient matrix and right-hand side of the triangular system into row blocks and stores the blocks on the processors in a wrapped (round-robin) fashion; already-computed solution components are forwarded cyclically to reduce inter-processor communication overhead, and the algorithm is easy to implement. The algorithm was programmed and tested numerically on a parallel platform of 1-4 desktop PCs connected by a LAN, running PVM 3.4 on Windows 2000 with VC 6.0; the experimental results show that the algorithm is effective.
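A serial sketch of the wrapped-row forward substitution described above (the PVM message passing is only indicated in comments; the matrix size and four-processor count are assumptions):

```python
# Forward substitution for a lower-triangular system with rows assigned to
# processors in a wrapped (round-robin) fashion. The "processors" are
# simulated here; a real run would place each row block on a PVM task.
import numpy as np

def wrapped_forward_solve(L, b, num_procs=4):
    n = len(b)
    owner = [i % num_procs for i in range(n)]   # wrapped row distribution
    x = np.zeros(n)
    for i in range(n):
        # In the parallel version, processor owner[i] computes x[i] and
        # forwards it cyclically so others can update their pending rows.
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.tril(np.random.rand(6, 6)) + 6 * np.eye(6)
b = np.random.rand(6)
assert np.allclose(wrapped_forward_solve(L, b), np.linalg.solve(L, b))
```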

16.
Quisquater, J.-J., Desmedt, Y.G., Computer, 1991, 24(11): 14-22
It is demonstrated that some problems can be solved inexpensively using widely distributed computers instead of an expensive supercomputer. This is illustrated by discussing how to make a simple fault-tolerant exhaustive code-breaking machine. The solution, which uses distributed processors, is based on some elementary concepts of probability theory (lotto). The need for communication between processors is almost nil. Two approaches, deterministic and random, are compared. How to hide such a machine and how to build larger versions are also discussed.
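The probabilistic core of the "lotto" approach fits in one formula (my illustration; the keyspace size is made up): after k independent random trials over a keyspace of size N, the success probability is 1 - (1 - 1/N)^k. Random search needs no coordination between processors, at the cost of some duplicated work relative to a deterministic sweep.

```python
# Random ("lotto") vs. deterministic exhaustive key search: probability of
# having found the key after k total trials. N is an illustrative keyspace.
def p_random(k, N):
    return 1.0 - (1.0 - 1.0 / N) ** k   # at least one trial hits the key

N = 2 ** 24
for frac in (0.5, 1.0, 2.0):
    k = int(frac * N)
    print(f"trials = {frac:.1f}*N: random {p_random(k, N):.3f}, "
          f"deterministic {min(k / N, 1.0):.3f}")
# At k = N, the uncoordinated random search succeeds with probability
# about 1 - 1/e ~ 0.632, versus certainty for the coordinated sweep.
```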

17.
It is well known that parallel computers can be used very effectively for image processing at the pixel level, by assigning a processor to each pixel or block of pixels, and passing information as necessary between processors whose blocks are adjacent. This paper discusses the use of parallel computers for processing images at the region level, assigning a processor to each region and passing information between processors whose regions are related. The basic difference between the pixel and region levels is that the regions (e.g. obtained by segmenting the given image) and relationships differ from image to image, and even for a given image, they do not remain fixed during processing. Thus, one cannot use the standard type of cellular parallelism, in which the set of processors and interprocessor connections remain fixed, for processing at the region level. Reconfigurable cellular computers, in which the set of processors that each processor can communicate with can change during a computation, are more appropriate. A class of such computers is described, and general examples are given illustrating how such a computer could initially configure itself to represent a given decomposition of an image into regions, and dynamically reconfigure itself, in parallel, as regions merge or split.
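A sketch of the region-level bookkeeping such a machine maintains: a region adjacency graph whose merge operation mirrors processors re-wiring their connections (my illustration; the region IDs are made up):

```python
# Region adjacency graph that "reconfigures" as regions merge, analogous
# to processors re-wiring their interconnections during a computation.
class RegionGraph:
    def __init__(self, edges):
        self.adj = {}
        for a, b in edges:
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)

    def merge(self, a, b):
        # Region a absorbs region b; neighbours of b are re-connected to a,
        # mirroring a processor taking over another's communication links.
        for n in self.adj.pop(b):
            self.adj[n].discard(b)
            if n != a:
                self.adj[n].add(a)
                self.adj[a].add(n)
        self.adj[a].discard(a)

g = RegionGraph([(1, 2), (2, 3), (3, 1), (3, 4)])
g.merge(1, 2)               # regions 1 and 2 merge; 3 is now adjacent to 1
print(g.adj)                # {1: {3}, 3: {1, 4}, 4: {3}}
```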

18.
A main objective of scheduling independent jobs composed of multiple sequential tasks in shared-memory and distributed-memory multiprocessor computer systems is the assignment of these tasks to processors in a manner that ensures efficient operation of the system. Achieving this objective requires the analysis of a fundamental tradeoff between maximizing parallel execution, suggesting that the tasks of a job be spread across all system processors, and minimizing synchronization and communication overheads, suggesting that the job's tasks be executed on a single processor. The authors consider a class of scheduling policies that represent the essential aspects of this processor allocation tradeoff, and model the system as a distributed fork-join queueing system. They derive an approximation for the expected job response time, which includes the important effects of various parallel processing overheads (such as task synchronization and communication) induced by the processor allocation policy.
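The tradeoff can be seen in a toy simulation (my illustration with exponential task times and an assumed per-task overhead, not the paper's analytical approximation): response time first falls with more parallelism, then rises as synchronization and overhead dominate.

```python
# Fork-join tradeoff in miniature: a job of fixed total work is split into
# k exponential tasks on k processors; the job waits for the slowest task
# (the join) and pays a per-task overhead. All parameters are illustrative.
import random

def mean_response(k, work=1.0, overhead=0.02, trials=20000,
                  rng=random.Random(1)):
    total = 0.0
    for _ in range(trials):
        tasks = [rng.expovariate(k / work) for _ in range(k)]  # mean work/k
        total += max(tasks) + overhead * k                     # join + overhead
    return total / trials

for k in (1, 2, 4, 8, 16, 32):
    print(k, round(mean_response(k), 3))   # U-shaped: an interior k is best
```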

19.
Epsilon is a testbed for monitoring distributed applications involving heterogeneous computers, including microcomputers, interconnected by a local area network. Such a hardware configuration is usual but raises difficulties for the programmer. First, the interprocess communication mechanisms provided by the operating systems are rather cumbersome to use. Second, they are different from one system to another. Third, the programmer of distributed applications should not worry about system and/or network aspects that are not relevant for the application level. The authors present the solution chosen in Epsilon. A set of high-level communication primitives has been designed and implemented to provide the programmer with an interface independent of the operating system and of the underlying interprocess communication facilities. A program participating in a distributed application can be executed on any host without any change in the source code except for host names.
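A hypothetical sketch of what such OS-independent primitives might look like, with TCP sockets as the hidden transport; the name registry, length-prefixed framing, and the send/receive names are my assumptions, not Epsilon's actual interface:

```python
# Application code addresses peers by logical name; the wrapper hides the
# transport and framing. Registry contents and framing are assumptions.
import socket
import struct

REGISTRY = {"hostA": ("127.0.0.1", 5000)}      # logical name -> (addr, port)

def send(dest: str, payload: bytes) -> None:
    with socket.create_connection(REGISTRY[dest]) as s:
        s.sendall(struct.pack("!I", len(payload)) + payload)  # length-prefixed

def receive(port: int = 5000) -> bytes:
    with socket.socket() as srv:
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            header = _recv_exact(conn, 4)
            return _recv_exact(conn, struct.unpack("!I", header)[0])

def _recv_exact(conn, n):
    buf = b""
    while len(buf) < n:                        # TCP may deliver partial reads
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf
```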

20.