The “fractional tree” algorithm for broadcasting and reduction is introduced. Its communication pattern interpolates between two well known patterns—sequential pipeline and pipelined binary tree. The speedup over the best of these simple methods can approach two for large systems and messages of intermediate size. For networks which are not very densely connected the new algorithm seems to be the best known method for the important case that each processor has only a single (possibly bidirectional) channel into the communication network. 相似文献
A switch matrix operating on baseband or microwave signals is a critical element of communications satellites employing multiple beam antennas and on-board switching. Optical switching by spatial light modulators (SLMs) offers a means of implementing large and highly flexible switch arrays capable of routeing signals at baseband or microwave frequencies. This approach offers potential mass, power and size advantages compared to alternative technologies. The paper reviews the essential features of optical crossbar switch architectures based on SLMs and discusses options for the lasers, SLMs, interface optics and photodetectors. Proof-of-concept demonstrators for optical crossbar switches operating on both baseband and microwave signals are described. Finally, an outline design for a compact switch module is described and the critical component developments needed to realize this are identified. 相似文献
The Earth Simulator (ES), developed under the Japanese government’s initiative “Earth Simulator project”, is a highly parallel vector supercomputer system. In this paper, an overview of ES, its architectural features, hardware technology and the result of performance evaluation are described.
In May 2002, the ES was acknowledged to be the most powerful computer in the world: 35.86 teraflop/s for the LINPACK HPC benchmark and 26.58 teraflop/s for an atmospheric general circulation code (AFES). Such a remarkable performance may be attributed to the following three architectural features; vector processor, shared-memory and high-bandwidth non-blocking interconnection crossbar network.
The ES consists of 640 processor nodes (PN) and an interconnection network (IN), which are housed in 320 PN cabinets and 65 IN cabinets. The ES is installed in a specially designed building, 65 m long, 50 m wide and 17 m high. In order to accomplish this advanced system, many kinds of hardware technologies have been developed, such as a high-density and high-frequency LSI, a high-frequency signal transmission, a high-density packaging, and a high-efficiency cooling and power supply system with low noise so as to reduce whole volume of the ES and total power consumption.
For highly parallel processing, a special synchronization means connecting all nodes, Global Barrier Counter (GBC), has been introduced. 相似文献
Architectures based on a non-blocking fabric, such as a crosspoint switch, are attractive for use in high-speed LAN switches, IP routers, and ATM switches. When operating at the highest speed, memory bandwidth limitations dictate that queues be placed at the input of the switch. But it is well known that input-queueing can lead to low throughput, and does not allow the control of latency through the switch. This is in contrast to output-queueing which maximizes throughput and permits the accurate control of packet latency through scheduling. We ask the question: Can a switch with combined input and output queueing be designed to behave identically to an output-queued switch? In this paper, we prove that if the switch uses virtual output queueing and has an internal speedup of just four, it is possible for it to behave identically to an output-queued switch, regardless of the nature of the arriving traffic. Our proof is based on a novel scheduling algorithm, called Most Urgent Cell First. We find that with a speedup of four the most urgent cell first algorithm (or MUCFA) enables perfect emulation of a FIFO output-queued switch, i.e. one in which packets depart in the same order that they arrived. We extend this result to show that with a small modification, the MUCFA algorithm enables perfect emulation of a variety of output scheduling policies, including strict priorities and weighted fair-queueing. This result makes possible switches that perform as if they were output-queued, yet use memories that run more slowly. 相似文献
Resistive-switching memory (RRAM) is receiving a growing deal of research interest as a possible solution for high-density, 3D nonvolatile memory technology. One of the main obstacle toward size reduction of the memory cell and its scaling is the typically large current Ireset needed for the reset operation. In fact, a large Ireset negatively impacts the scaling possibilities of the select diode in a cross-bar array structure. Reducing Ireset is therefore mandatory for the development of high-density RRAM arrays. This work addresses the reduction of Ireset in NiO-based RRAM by control of the filament size in 1 transistor-1 resistor (1T1R) cell devices. Ireset is demonstrated to be scalable and controllable below 10 μA. The significance of these results for the future scaling of diode-selected cross-bar arrays is finally discussed. 相似文献