The combined input and crosspoint queued (CICQ) switch is receiving significant attention as a next-generation high-speed packet switch because of its scalability; however, a multi-cabinet implementation of a CICQ switch unavoidably introduces a large round-trip time (RTT) latency between the line cards and the switch fabric, resulting in a large crosspoint (CP) buffer requirement. In this paper, virtual crosspoint queues (VCQs) that significantly reduce the CP buffer requirement of the CICQ switch are investigated. The VCQ unit resides inside the switch fabric, is dynamically shared among the virtual output queues (VOQs) from the same source port, and is operated at the line rate, making the implementation practical. A threshold-based exhaustive round-robin (T-ERR) arbitration is employed to reduce buffer hogging at the VCQ. The T-ERR at the VCQ and CP arbiters serves packets residing in a longer queue more frequently than packets residing in a shorter queue. Consequently, T-ERR drastically increases the throughput of the CICQ switch with small CP buffers. A multi-cabinet implementation of the CICQ switch does not support multicast traffic well, since the combination of a small CP buffer in the switch fabric and a large RTT latency between the line cards and the switch fabric results in non-work conservation of the intra-switch link. Deploying a multicast FIFO buffer between the input buffer and the CP buffer shows promise. With its ability to achieve high throughput independent of RTT and switch port size, the integration of the VCQ architecture and the T-ERR scheduler into the CICQ switch is ideal for supporting ever-increasing Internet traffic that requires higher data rates, larger switch sizes, and efficient multicasting.
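The abstract above states only that T-ERR serves longer queues more frequently than shorter ones; it does not give the arbitration rule itself. The following is a minimal Python sketch of one plausible reading, assuming the arbiter keeps its grant on the current queue while that queue's remaining occupancy is at or above a threshold and otherwise advances round-robin. The class name `TErrArbiter`, the `threshold` parameter, and the exact comparison are illustrative assumptions, not the authors' design.

```python
from collections import deque


class TErrArbiter:
    """Illustrative threshold-based exhaustive round-robin arbiter (assumed rule)."""

    def __init__(self, num_queues, threshold):
        self.queues = [deque() for _ in range(num_queues)]
        self.threshold = threshold  # hypothetical tuning parameter
        self.pointer = 0            # queue that is examined first in the next slot

    def enqueue(self, q, packet):
        self.queues[q].append(packet)

    def serve(self):
        """Serve one packet per time slot; return (queue_index, packet) or None."""
        n = len(self.queues)
        for offset in range(n):
            q = (self.pointer + offset) % n
            if self.queues[q]:
                packet = self.queues[q].popleft()
                if len(self.queues[q]) >= self.threshold:
                    # Exhaustive part: a long queue keeps the grant.
                    self.pointer = q
                else:
                    # Short queue: fall back to plain round-robin.
                    self.pointer = (q + 1) % n
                return q, packet
        return None


arbiter = TErrArbiter(num_queues=3, threshold=2)
for i in range(4):
    arbiter.enqueue(0, f"a{i}")   # long queue
arbiter.enqueue(1, "b0")
arbiter.enqueue(2, "c0")

order = []
while (served := arbiter.serve()) is not None:
    order.append(served[0])
print(order)
```

Under this assumed rule the long queue is drained until it drops below the threshold before the short queues receive a turn, which is the qualitative behaviour the abstract attributes to T-ERR.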
To process complex, large-scale graph data, numerous distributed graph-parallel computing platforms have been proposed. PowerGraph is an excellent representative of them: it has exhibited better performance than the others, including a faster graph-processing rate and higher scalability. However, as in other distributed graph computing systems, unnecessary and excessive communication among computing nodes in PowerGraph not only aggravates the network I/O workload of the underlying hardware but may also degrade runtime performance. In this paper, we propose and implement a mechanism called L-PowerGraph, which reduces the communication overhead in PowerGraph. First, L-PowerGraph identifies and eliminates the avoidable communications in PowerGraph. Second, to further reduce the required communications, L-PowerGraph proposes an edge direction-aware master appointment strategy, in which it appoints the replica with both incoming and outgoing edges as the master. Third, L-PowerGraph proposes an edge direction-aware graph partition strategy, which optimally isolates the outgoing edges from the incoming edges of a vertex during the graph partition process. We have conducted extensive experiments using real-world datasets, and the results verify the effectiveness of the proposed mechanism. For example, compared with PowerGraph under the Random partition scenario, L-PowerGraph reduces the communication overhead by up to 30.5% and the runtime by up to 20.3% for the PageRank algorithm on the Live-journal dataset. The experimental results also verify the performance improvement of L-PowerGraph over our precursor work, LightGraph, which only reduces the synchronizing communication overhead.
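The edge direction-aware master appointment described in the abstract above can be sketched as follows. This is a minimal Python illustration, not L-PowerGraph's code: the `Replica` record, the `appoint_master` function, the fallback when no replica holds both edge directions, and the tie-breaking rule are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    machine: int    # id of the machine holding this replica (hypothetical field)
    in_edges: int   # number of local incoming edges of the vertex
    out_edges: int  # number of local outgoing edges of the vertex


def appoint_master(replicas):
    """Prefer a replica that holds both incoming and outgoing edges.

    Assumed fallback: if no replica has both directions, consider all replicas.
    Assumed tie-break: most local edges first, then the lowest machine id.
    """
    both = [r for r in replicas if r.in_edges > 0 and r.out_edges > 0]
    candidates = both or replicas
    return max(candidates, key=lambda r: (r.in_edges + r.out_edges, -r.machine))


replicas = [Replica(0, 5, 0), Replica(1, 1, 1), Replica(2, 0, 9)]
master = appoint_master(replicas)
print(master.machine)
```

Here the replica on machine 1 is chosen even though it holds the fewest edges, because it is the only one with edges in both directions.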
Parallel applications suffer from I/O latency. Pre-execution I/O prefetching, in which a dedicated pre-execution prefetching thread fetches data for the main thread in advance, is effective in hiding this latency. However, existing pre-execution prefetching approaches pay no attention to the relationship between the main thread and the prefetching thread. They simply pre-execute the I/O accesses in the prefetching thread as early as possible, without coordinating them with the operations of the main thread. This drawback has a series of adverse effects on pre-execution prefetching, such as diminishing the degree of parallelism between computation and I/O, delaying the I/O accesses of the main threads, and aggravating competition for I/O resources across the whole system. In this paper, we propose a new method that overcomes this drawback by scheduling the I/O operations among the main threads and the pre-execution prefetching threads. The results of extensive experiments on four popular parallel I/O benchmarks demonstrate the benefits of the proposed approach.
In some wireless sensor network applications, sensor nodes will be deployed in harsh communication environments. In such environments, the deployment may not be adequately controlled, and nodes may have to communicate with a single destination node. For nodes to alert the destination to critical data that has been sensed, they must overcome not only the harsh communication environment but also the contention resulting from both the deployment and the network density. In this paper, we create theoretical models for the behavior of the Timeout-MAC (T-MAC) protocol and evaluate five possible solutions, each designed to be easy to implement on a device by simply tuning T-MAC parameters, so as to overcome these environment-specific issues and effectively alert the destination to critical data. Our results indicate that slight changes to the behavior of the network can improve the destination's awareness of critical regions in the environment, and that these changes have different levels of effectiveness at different network densities.
Interface defects related to negative-bias temperature instability (NBTI) in an ultrathin plasma-nitrided SiON/Si(100) system were characterized by using conductance–frequency measurements, electron-spin resonance measurements, and synchrotron radiation X-ray photoelectron spectroscopy. It was confirmed that NBTI is reduced by using D2-annealing instead of the usual H2-annealing. Interfacial Si dangling bonds (Pb1 and Pb0 centers) were detected in a sample subjected to negative-bias temperature stress (NBTS). Although we suggest that NBTS also generates non-Pb defects, it does not seem to generate nitrogen dangling bonds. These results show that NBTI of the plasma-nitrided SiON/Si system is predominantly due to Pb depassivation. Plasma nitridation was also found to increase the Pb1/Pb0 density ratio, modify the Pb1 defect structure, and increase the latent interface trap density by generating Si suboxides at the interface. These changes are likely to be the causes of NBTI in ultrathin plasma-nitrided SiON/Si systems.
We have investigated the adsorption of atomic hydrogen on vertically aligned carbon nanotube (CNT) films using in situ synchrotron-radiation-based core-level (CL) photoelectron spectroscopy and Raman spectroscopy. From C 1s CL spectra, we identified a CL peak component due to C-H bonds of carbon atoms in single-walled carbon nanotubes (SWCNTs). We also found the suppression of π-plasmon excitation, indicating that the hydrogen adsorption deforms the bonding structure. Raman spectra of the SWCNT film indicated that the radial-breathing-mode intensities of SWCNTs decreased due to the adsorption-induced bonding-structure deformation. Moreover, the decrease for small-diameter SWCNTs was more severe than that for large-diameter SWCNTs. Our results strongly suggest that the hydrogen adsorption, which induces the structure deformation from sp2 to sp3-like bonding, depends on the diameter of SWCNTs.
In this paper, we present an original design for an Ethernet switch with a crosspoint-queued crossbar switching fabric and analyze its performance. Recently, significant progress has been made on the performance analysis of crosspoint-queued crossbar switches using analytical and simulation methods. We propose a hardware implementation on the NetFPGA platform that provides reliable results obtained in an experimental environment. It is shown that the proposed design performs as expected and outperforms the reference design under some incoming traffic conditions.
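For readers unfamiliar with the architecture named in the abstract above, the following Python sketch models a crosspoint-queued crossbar in software: one finite FIFO per (input, output) crosspoint, with each output serving one packet per time slot. It is a toy illustration under stated assumptions, not the NetFPGA design. The class name `CrosspointQueuedSwitch`, the drop-on-full policy, and the per-output round-robin arbiter are all assumed, since the abstract does not specify the scheduler.

```python
from collections import deque


class CrosspointQueuedSwitch:
    """Toy slot-based model of a crosspoint-queued crossbar (assumed policies)."""

    def __init__(self, ports, buffer_size):
        self.ports = ports
        self.buffer_size = buffer_size
        # xp[src][dst] is the FIFO buffer at crosspoint (src, dst).
        self.xp = [[deque() for _ in range(ports)] for _ in range(ports)]
        self.rr = [0] * ports  # per-output round-robin pointer (assumed arbiter)
        self.dropped = 0

    def arrive(self, src, dst, packet):
        """Store a packet at its crosspoint; drop it if the buffer is full."""
        queue = self.xp[src][dst]
        if len(queue) >= self.buffer_size:
            self.dropped += 1
            return False
        queue.append(packet)
        return True

    def tick(self):
        """One time slot: each output sends at most one packet. Returns {dst: packet}."""
        sent = {}
        for dst in range(self.ports):
            for offset in range(self.ports):
                src = (self.rr[dst] + offset) % self.ports
                if self.xp[src][dst]:
                    sent[dst] = self.xp[src][dst].popleft()
                    self.rr[dst] = (src + 1) % self.ports
                    break
        return sent


switch = CrosspointQueuedSwitch(ports=2, buffer_size=1)
switch.arrive(0, 1, "a")
switch.arrive(0, 1, "b")   # crosspoint (0, 1) is full, so this packet is dropped
switch.arrive(1, 1, "c")
first = switch.tick()
second = switch.tick()
print(first, second, switch.dropped)
```

Because all buffering sits at the crosspoints, each output can arbitrate independently of the others, which is what makes the per-output loop in `tick` sufficient for this model.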