19 similar articles found (search time: 46 ms)
1.
2.
3.
4.
5.
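This paper introduces the bus transactions and the cache attributes of memory regions in multiprocessor systems based on the P6 bus. It discusses the hardware snooping mechanism of the P6 bus and the MESI state transitions used by the Pentium III processor. Finally, it examines how multiple processors and the P6 bus cooperate to keep the caches of the whole system coherent.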
6.
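This paper explores improving overall processor performance by optimizing the design of the bus interface unit. The optimizations focus on reducing the number of processor memory accesses and lowering the bus load. Simulation and verification results show that these methods are feasible and effective.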
7.
8.
A bus-based shared-memory mechanism for multiprocessors  Total citations: 3, self-citations: 1, citations by others: 3
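Bus-based distributed multiprocessor architectures are currently a common hardware architecture for high-performance routers. During development of the "High-Performance Secure Router", a major project of the "863" program, at Tsinghua University, a multiprocessor shared-memory mechanism was implemented on a PowerPC multiprocessor platform based on the CompactPCI bus. This shared-memory mechanism (the SM mechanism) implements a set of core objects, including SM memory, SM semaphores, SM message queues, and SM task control blocks. This paper describes the design and implementation of the SM mechanism in detail and gives performance test results.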
9.
10.
Design and implementation of a shared cache for multiprocessors  Total citations: 1, self-citations: 0, citations by others: 1
张剑飞 (Zhang Jianfei) 《计算机与数字工程》 (Computer & Digital Engineering) 2008,36(9)
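A cache is a small, fast memory between the central processing unit (CPU) and main memory. It balances and matches the data-processing speeds of the two and helps improve overall system performance. Multiprocessors (SMP) support caching of both shared and private data, and a cache coherence protocol is used to handle the data consistency problems that arise when multiple processors share data. The paper describes a shared cache design suitable for 64-bit multicore processors, including how multiprocessor cache coherence is achieved and the full-custom back-end implementation of the design.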
11.
This paper presents a new cache consistency scheme for hierarchically structured shared-memory multiprocessors. The scheme is simple, fast and efficient, and it does not require a large amount of state information to be maintained. The scheme exploits the broadcast capability of these systems, but limits the extent of the broadcasts by means of a novel filtering mechanism. As a specific example, it is shown how the proposed cache consistency scheme can be implemented on the Hector multiprocessor architecture. Using trace-driven simulations, we demonstrate that the scheme is scalable and performs well for common applications.
12.
David Bernstein Mauricio Breternitz Jr. Ahmed M. Gheith Bilha Mendelson 《International journal of parallel programming》1995,23(1):83-103
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware-controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
13.
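A multiprocessor architecture based on the SDZX-MV-02 core is designed. A shared-memory bus switch solves the problem of sharing data among the processors, and an I/O latch handles the exchange of information, commands, and status between them. A block diagram, implementation code, and simulation results are given. The design satisfactorily addresses the problem of performing high-end machine vision processing with low-end microprocessors.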
14.
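To solve the distributed data consistency problem in bus networks, a distributed consistency algorithm for bus networks is proposed. The algorithm uses a mod operation to reduce multi-node arbitration to single-node arbitration, which cuts the number of messages required and lowers the system load. It also reuses messages to reduce the number of message types the algorithm needs and to shorten response latency. Theoretical performance analysis and simulation experiments show that the algorithm has lower message complexity and time complexity than traditional algorithms.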
15.
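Multiprocessors communicate through shared main memory, a shared input/output subsystem, or a high-speed communication network. Multiple processors are used for multitasking, either cooperating to solve a large, complex problem faster or relying on redundant processors and their reconfiguration capability to improve system reliability, adaptability, and availability. This article introduces the development of microprocessors, multiprocessor buses, the development of communication and storage techniques in processor systems, and two special multiprocessor system architectures.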
16.
In an earlier paper we introduced an indirect binary n-cube memory server network which has adaptive properties making it useful in a parallel vector processing environment. The memory server network, due to a special choice in the design of the basic switch node, has the property that N vector processors issuing vector fetches with similar strides are forced into lock step after an initial startup investment.
In this paper we extend this work to the case of the indirect k-ary n-cube. As this network has a more favorable memory latency scaling of log_k N, one expects that the short vector performance will be improved as k is increased for a given N. We find this to be the case. We also find that the cost of the memory server system scales in a manner which prefers modest values of k above 2.
17.
Don Fay 《Microprocessors and Microsystems》1984,8(1):3-15
It is assumed that a host processor computes the corner coordinates of surfaces and outputs these sequentially, in ranked order, to the components described in an occam program. The data is precomputed and stored in a sequential file. A scheduler controls the activity of a number of zone management processors (ZMPs), all running in parallel, and a special memory buffer. Each ZMP handles only one surface at a time. A processor can pick up a new surface for display when the previous surface has been completed. Only one ZMP can write into a given raster scanline at one time. Others may be writing into the same column of other lines at the same time. Hidden surface elimination is achieved by processing the surfaces in an order ranked on distance from the viewing point. This ranking is done in the host processor. The ranked data is held on a file, which is read sequentially in 512 byte blocks. This data has been previously computed and stored as a sequence of double-byte integers in the required order for a series of picture frames, one frame per 512 byte block. The occam implementation on the Apple II europlus running under UCSD version 4 is very slow. It is postulated that an implementation using separate occam processor hardware units for each appropriate process would run in real time. There is considerable communication between the processors. The activity of each processor is generally sequential and all the processors run in parallel. Comments are made about some of the problems and advantages of programming in occam in an appendix.
18.
Yigal Hoffner 《Microprocessors and Microsystems》1983,7(3):111-116
A multiprocessor system, based on the 6800 family of microprocessors, is described. It is designed for experimentation in hardware and software aspects of distributed processing. System organization, hardware and software are all given. Listings of test programs are provided.
19.
Peter K. K. Loh 《Microprocessors and Microsystems》1995,19(10):591-597
We examine the suitability of three heuristic search algorithms (Greedy Constructive Scheme, Best First Search and A*) for use as routing strategies on a faulty multiprocessor network. Our search space is a simulated 5 × 5 × 5 (125-node) multiprocessor mesh network. Each virtual node comprises a processor and a communications switch supporting explicit message backtracking. Their performances are compared for up to 20% of randomly generated faulty links. The results show that heuristic search algorithms can be implemented as fault-tolerant routing strategies and that the modified Best First Search routing strategy performed consistently better with significantly less degradation than the Greedy Constructive Scheme and the A* strategies.