Similar articles
 20 similar articles found (search time: 328 ms)
1.
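This paper discusses design considerations for a tightly coupled multiprocessor system composed of computers of different speeds. A system of this type offers flexibility in system organization; another important advantage is that system performance can be improved by assigning different types of tasks according to the differing performance characteristics of the machines. The paper describes the operating system needed to control such a multiprocessor system. Asymmetric task dispatching takes into account the appropriate distribution of computational load among the processors and the conversion of processor clock values needed to compensate for differences in computing speed. The operating system OS_7, which supports such a multiprocessor system and can accommodate up to four HITAC8700 or HITAC8800 processors, is also discussed. Finally, the analysis and evaluation of the performance of the nonuniform multiprocessor system are described.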

2.
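Analysis of the Cache Coherence Problem   Cited by: 3 (self-citations: 0, cited by others: 3)
Cache coherence is an important problem in multiprocessor systems. The paper first describes the data inconsistency that arises in a multiprocessor system between main memory and each processor's private cache, and among the private caches themselves. It then gives a detailed qualitative analysis of the methods for resolving the inconsistency (snooping, directory-based, and software methods), pointing out the advantages and disadvantages of each for designers' reference. It also proposes that a combination of software and hardware can resolve cache incoherence more effectively.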

3.
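This paper describes a concrete scheme considered for the operating system of the HDS-9 machine; most of its functions have already been implemented. The paper does not dwell on problems common to uniprocessor and multiprocessor systems, but focuses on job-scheduling techniques in the design and implementation of a multiprocessor system, such as the hardware and software characteristics of multiprocessor systems, job types and their handling, hierarchical scheduling policies, and the handling of shared databases.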

4.
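The Ada language provides a tool for concurrent processes within a program by means of tasks. Several asynchronously executing tasks can be written one after another in a single Ada program. Communication and synchronization between tasks are ensured by the rendezvous mechanism. This paper describes one way of implementing Ada tasks on a multiprocessor system. The Intel 80286 processor is used as an example of such a system, and the code needed to implement communication and synchronization between tasks, which may execute on different processors, is illustrated. Message passing between tasks is an important aspect of Ada applications. In this paper, effective address information can be passed throughout the shared memory.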

5.
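Analysis of Multiprocessor Systems   Cited by: 3 (self-citations: 1, cited by others: 3)
In a multiprocessor system, an individual processor can access memory modules in two ways: through centralized shared memory or through distributed shared memory. To match memory access time to the high-speed processors, multiprocessor systems use caches. The resulting cache coherence problem is generally solved with write-invalidate and write-update policies. This paper mainly discusses multiprocessor architecture and the cache coherence problem and its solutions, with Sun Enterprise as an example.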
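The difference between the write-invalidate and write-update policies named in this abstract can be illustrated with a toy model. The sketch below is not taken from the paper: the class `CoherentCaches`, its method names, and the write-through simplification (memory is updated on every write) are assumptions made only to keep the example short.

```python
class CoherentCaches:
    """Toy model of private caches on a shared bus (hypothetical, for illustration)."""

    def __init__(self, num_caches, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}                                   # address -> value
        self.caches = [dict() for _ in range(num_caches)]  # one private cache per CPU

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                # miss: fetch the line from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        self.caches[cpu][addr] = value
        self.memory[addr] = value            # assumption: write-through to memory
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache:
                continue
            if self.policy == "invalidate":
                del cache[addr]              # other copies are dropped
            else:
                cache[addr] = value          # other copies are refreshed in place
```

Under `"invalidate"`, the next read by another processor misses and refetches the new value; under `"update"`, the other copy already holds it, at the cost of broadcasting data on every write.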

6.
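Internationally, computer scientists are discussing next-generation processor architectures and future computer technology. With the rapid development of microelectronics processes, it has become possible to integrate one billion transistors on a single chip. Such extremely large scale integration (ELSI) opens up a broad technical field and design space for computer architects, allowing entirely new processors and computer chips to be designed. This paper surveys four "Billion Transistors on Chip" architectures currently regarded as promising in the United States: a single very powerful processor (UNIPROC), the simultaneous multithreading processor (SMT), the chip multiprocessor (CMP), and intelligent RAM (IRAM). These four architectures make full use either of on-chip instruction-level parallelism (ILP) and thread-level parallelism, of the very high bandwidth of on-chip data exchange, or of a high-bandwidth bus between the CPU and memory. A one-billion-transistor chip could reach a peak single-chip speed of 16 billion operations per second, which will have a profound impact on future computer technology. In designing such chips, architects must also pay close attention to certain special problems that microelectronics processes raise at extremely high levels of integration.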

7.
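1. Introduction. Efficient communication between processors is one of the important issues in multiprocessor systems. In a modular general-purpose computing system composed of hundreds or even thousands of microprocessors, communication may become the most important issue. In a recent paper [7], we proposed a new architecture for large multiprocessor systems as an answer to the following question: "In large-scale integration and very-large-scale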

8.
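For multiprocessor systems with shared multiport memory, this paper studies methods of communication synchronization and, based on the computational characteristics of artificial neural networks, proposes an ingenious instruction-level synchronization method for tightly coupled TMS320C30 processors. A single cycle suffices to synchronize three processing units, which speeds up communication; the method is simple and efficient.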

9.
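Performance Study of Crossbar Switches   Cited by: 1 (self-citations: 0, cited by others: 1)
This paper analyzes the effective memory bandwidth when a crossbar switch is used as the processor-memory interconnection network of a multiprocessor system. Existing models are extended to cover processor memory access requests that follow an arbitrary distribution, and the range of applicability of the model is examined through simulation experiments.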
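As background for the bandwidth model mentioned in this abstract, a common textbook baseline can be sketched as follows. It is an assumption here, not the paper's model: every processor issues one request per cycle, requests are independent, and all processors share the same probability `q_j` of addressing module `j`. The expected number of busy modules per cycle is then the sum over modules of `1 - (1 - q_j) ** p`. The function names are hypothetical.

```python
def crossbar_bandwidth(num_processors, module_probs):
    """Expected number of memory modules serving a request in one cycle.

    module_probs[j] is the probability that a given processor addresses
    module j; the same distribution is assumed for every processor.
    """
    if abs(sum(module_probs) - 1.0) > 1e-9:
        raise ValueError("module probabilities must sum to 1")
    # A module is busy unless none of the processors addresses it.
    return sum(1.0 - (1.0 - q) ** num_processors for q in module_probs)


def uniform_bandwidth(num_processors, num_modules):
    """Special case: requests spread uniformly over the modules."""
    return crossbar_bandwidth(num_processors, [1.0 / num_modules] * num_modules)
```

With two processors and two modules under uniform requests, the sketch gives 1.5 busy modules per cycle; a skewed distribution lowers the figure because requests collide on the popular module more often.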

10.
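For the problem of scheduling tasks (processes) across the processors of a class of multiprocessor systems, an implementation-oriented heuristic algorithm (one that seeks a suboptimal solution) is proposed. The goal is to balance the load across the processors while minimizing the amount of code and data transferred. Here "distributed" means that the local parts of the algorithm are distributed across the processors, and "centralized" means that there is globally shared control data. We carried out a preliminary simulation of the algorithm in assembly language on a PDP 11/23 (initially in assembly language on a TRS 80).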
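The abstract does not give the algorithm itself, so the following is only a generic greedy sketch of the stated goal: balance processor load while penalizing the movement of code and data. The function `schedule`, the task tuple layout, and the flat `transfer_cost` are all assumptions introduced for illustration.

```python
def schedule(tasks, num_processors, transfer_cost=1.0):
    """Greedy placement: longest tasks first, each on the cheapest processor.

    tasks is a list of (name, run_time, home_processor) tuples, where
    home_processor is where the task's code and data currently reside.
    Returns (placement, loads).
    """
    loads = [0.0] * num_processors
    placement = {}
    for name, run_time, home in sorted(tasks, key=lambda t: -t[1]):
        def finish_time(p):
            # Running away from home adds a fixed transfer penalty.
            penalty = 0.0 if p == home else transfer_cost
            return loads[p] + run_time + penalty

        best = min(range(num_processors), key=finish_time)
        loads[best] = finish_time(best)
        placement[name] = best
    return placement, loads
```

For example, `schedule([("a", 4, 0), ("b", 3, 0), ("c", 2, 1)], 2)` keeps `a` at home on processor 0, moves `b` to processor 1 despite the transfer penalty, and leaves `c` at home on processor 1.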

11.
Previous investigations have suggested the use of multiple communicating processors for executing logic programs. However, this strategy lacks efficiency due to competition for memory and communication bandwidth, and this is a problem that has been largely neglected. In this paper we propose a realistic model for executing logic programs with low overhead on multiple processors. Our proposal does not involve shared memory or copying computation state between processors. The model organises computations over the nondeterministic proof tree so that different processors explore unique deterministic computation paths independently, in order to exploit the “OR-parallelism” present in a program. We discuss the advantages of this approach over previous ones, and suggest control strategies for making it effective in practice.

12.
The 80386 is a high-performance third-generation microprocessor that is now standard in most top-of-the-range PCs. Like all similar processors operating at clock rates above 30 MHz, the 80386 must use cache memory if it is to operate efficiently. Without cache memory, the user must either pay a very high price for very fast RAM or employ slower memory by introducing wait states. This application note describes the 80386 bus interface and demonstrates how it can be interfaced to IDT cache tag RAMs to create a cache system. Although the report describes a relatively basic cache system, it covers all design considerations ranging from system timing to the programming of the PALs needed to implement the interface. A.C.

13.
A method of operating a multiprocessor system consisting of a large number of processors accessing a common memory is presented. Access to the memory is performed in a deterministic manner, which eliminates the need for arbitration logic. An analysis of the method is given and a comparison is made against crossbar switch and common bus systems with serial daisy chain and parallel arbitration logic. The key feature of the method is that the memory offers access to locations rather than the processors making asynchronous requests. The scheme has particular application to macro-dataflow when a common memory is used to hold function parameters.
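One simple way to picture memory that "offers access" deterministically is a fixed rotation of banks over processors. This is an illustrative assumption, not the scheme analysed in the paper: with as many banks as processors, processor `i` is connected to bank `(i + t) mod N` in time slot `t`, so no two processors meet at a bank and no arbiter is needed. Both function names are hypothetical.

```python
def bank_for(processor, slot, num_banks):
    """Bank offered to `processor` during time slot `slot` (fixed rotation)."""
    return (processor + slot) % num_banks


def wait_slots(processor, wanted_bank, current_slot, num_banks):
    """Slots a processor waits until the rotation offers it `wanted_bank`."""
    return (wanted_bank - processor - current_slot) % num_banks
```

The price of removing arbitration in this sketch is latency: a processor may wait up to `num_banks - 1` slots for the bank it needs, although the wait is known in advance.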

14.
Continuous improvements in semiconductor fabrication density are supporting new classes of System-on-a-Chip (SoC) architectures that combine extensive processing logic/processor with high-density memory. Such architectures are generally called Processor-in-Memory (PIM) or Intelligent Memory (I-RAM) and can support high-performance computing by reducing the performance gap between the processor and the memory. The PIM architecture combines various processors in a single system. These processors are characterized by their computation and memory-access capabilities. Therefore, a novel strategy must be developed to identify their capabilities and dispatch the most appropriate jobs to them in order to exploit them fully. Accordingly, this study presents an automatic source-to-source parallelizing system, called statement-analysis-grouping-evaluation (SAGE), to exploit the advantages of PIM architectures. Unlike conventional iteration-based parallelizing systems, SAGE adopts statement-based analyzing approaches. This study addresses the configuration of a PIM architecture with one host processor (i.e., the main processor in state-of-the-art computer systems) and one memory processor (i.e., the computing logic integrated with the memory). The strategy of the SAGE system, in which the original program is decomposed into blocks and a feasible execution schedule is produced for the host and memory processors, is investigated as well. The experimental results for real benchmarks are also discussed.

15.
A Conflict-Free Address Generation Method for FFT Processors   Cited by: 8 (self-citations: 2, cited by others: 6)
Ma Yutai (马余泰). Chinese Journal of Computers (《计算机学报》), 1995, 18(11): 875-880
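This paper proposes a new conflict-free address generation method that allows the butterfly unit to read two operands simultaneously in one cycle. Because the address parity-check circuit is eliminated, the memory bank control logic is simplified and input/output address generation is also faster. The method is equally applicable to radix-4 FFT processors.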
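For context, the classical parity-based bank assignment that such designs start from can be sketched as below. It is not the method proposed in this paper, which removes the parity circuit. In an in-place radix-2 FFT the two operands of a butterfly have indices that differ in exactly one bit, so storing each word in the bank given by the parity of its index places the two operands in different banks. The function names are hypothetical.

```python
def bank(index):
    """Classical assignment: bank number is the parity of the index bits."""
    return bin(index).count("1") & 1


def butterfly_pairs(n_points, stage):
    """Operand index pairs of one in-place radix-2 stage (stride 2**stage)."""
    stride = 1 << stage
    for i in range(n_points):
        if i & stride == 0:
            yield i, i | stride
```

Because each pair differs in a single bit, `bank(a) != bank(b)` holds at every stage, so a two-bank memory can deliver both operands in the same cycle.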

16.

Weak memory models implemented on modern multicore processors are known to affect the correctness of concurrent code. They can also affect whether or not the concurrent code is secure. This is particularly the case in programs where the security levels of variables are value-dependent, i.e., depend on the values of other variables. In this paper, we illustrate how instruction reordering allowed by ARM and POWER multicore processors leads to vulnerabilities in such programs, and present a compositional, flow-sensitive information-flow logic which can be used to detect such vulnerabilities. The logic allows step-local reasoning (one instruction at a time) about a thread’s security by tracking information about dependencies between instructions which guarantee their order of occurrence. Program security can then be established from individual thread security using rely/guarantee reasoning. The logic has been proved sound with respect to existing operational semantics using Isabelle/HOL, and implemented in an automatic symbolic execution tool.


17.
This paper presents instruction set architectural guidelines for improving general-purpose embedded processors to optimally accommodate packet-processing applications. Similar to other embedded processors such as media processors, packet-processing engines are deployed in embedded applications, where cost and power are as important as performance. In this domain, the growing demands for higher bandwidth and performance, together with the ongoing development of new networking protocols and applications, call for flexible power- and performance-optimized engines. The instruction set architectural guidelines are extracted from an exhaustive simulation-based profile-driven quantitative analysis of different packet-processing workloads on 32-bit versions of two well-known general-purpose processors, ARM and MIPS. This extensive study has revealed the main performance challenges and tradeoffs in developing an evolution path for such general-purpose processors that optimally accommodates packet-processing functions for future switching-intensive applications. Architectural guidelines include types of instructions, branch offset size, displacement and immediate addressing modes for memory access along with the effective size of these fields, data types of memory operations, and also new branch instructions. The effectiveness of the proposed guidelines is evaluated with the development of a retargetable compilation and simulation framework. Developing the HDL model of the optimized base processor for networking applications and using a logic synthesis tool, we show that enhanced area, power, delay, and power per watt measures are achieved.

18.
The availability of low cost, high performance microprocessors has led to various designs of shared memory multiprocessor systems. As a result, commercial products based on shared memory have proliferated. Such a multiprocessor system is heavily influenced by the structure of the memory system, and most configurations include local cache memories. The more processors a system carries, the larger the local cache memory needed to keep the traffic to and from the shared memory at a reasonable level. The implementation of local cache memories, however, is not a simple task because of environmental limitations. In particular, the general lack of board space presents a formidable problem. A cache memory system needs space mostly to support its complex control logic circuits for the cache itself and network interfaces such as snooping logic circuits for the shared bus. Although packaging can be made denser to reduce system size, there are still multiple processors per board. This requires a more area-efficient cache memory architecture. This paper presents a design of a shared cache for the dual-processor board of bus-based symmetric multiprocessors. The design and implementation issues are described first, and then the evaluation and measurement results are discussed. The shared cache proposed in this paper has been determined to be quite area-efficient without significant loss of throughput or scalability. It has been implemented as a plug-in unit for TICOM, a prevalent commercial multiprocessor system.

19.
Simulation of complex digital electronic systems requires powerful machines and algorithms. Distributed simulation could improve both the execution time and the availability of a large distributed memory for complex models. Model partitioning onto the available processors has a major impact on simulation efficiency. We report on how various partitioning algorithms affect timewarp-based distributed simulation of combinational and synchronous sequential logic circuits, and try to determine the relationship between circuit parameters (the number of gates, topological levels and the degree of activity in the circuit) and the structure of the partition having the fastest simulation on a heterogeneous network of Sun workstations.

20.
Thurber  K.J. 《Computer》1981,14(2):11-12
Computer system hardware continues to advance by leaps and bounds. The cost of many types of logic and memory circuits is decreasing at annualized rates of 25 to 50 percent, and new interconnection strategies are being developed as circuits on a chip begin to resemble complex processors. We can foresee complete systems on a single chip, a computer in every home, an automated system in every office, and international assemblies of computer systems communicating via satellite networks. The future is now.
