41.
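Interconnection networks are a key technology in the design of SIMD computers. By designing a Lee conflict-free access interconnection network, this paper presents a method for designing and implementing the interconnection network of a shared-memory SIMD computer.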
42.
43.
Ronald C. Unrau, Orran Krieger, Benjamin Gamsa, Michael Stumm. The Journal of Supercomputing, 1995, 9(1-2): 105-134
We introduce the concept of hierarchical clustering as a way to structure shared-memory multiprocessor operating systems for scalability. The concept is based on clustering and hierarchical system design. Hierarchical clustering leads to a modular system, composed of easy-to-design and efficient building blocks. The resulting structure is scalable because it 1) maximizes locality, which is key to good performance in NUMA (non-uniform memory access) systems and 2) provides for concurrency that increases linearly with the number of processors. At the same time, there is tight coupling within a cluster, so the system performs well for local interactions that are expected to constitute the common case. A clustered system can easily be adapted to different hardware configurations and architectures by changing the size of the clusters. We show how this structuring technique is applied to the design of a microkernel-based operating system called Hurricane. This prototype system is the first complete and running implementation of its kind and demonstrates the feasibility of a hierarchically clustered system. We present performance results based on the prototype, demonstrating the characteristics and behavior of a clustered system. In particular, we show how clustering trades off the efficiencies of tight coupling for the advantages of replication, increased locality, and decreased lock contention.
44.
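To address the difficulty of achieving fast real-time simulation of large-scale power systems, this paper proposes a parallel algorithm for transient stability simulation based on the Adomian decomposition method. First, the large system is partitioned into several subsystems with METIS, taking node weights into account, and the system is then solved in parallel with the waveform relaxation method. To accelerate the iteration within each subsystem, all state variables are discretized with the implicit trapezoidal integration scheme and then solved by an Adomian-decomposition-based iterative algorithm combined with the dishonest Newton method. To further improve the overall convergence of waveform relaxation, windowing, preconditioning, and waveform prediction techniques are also applied. Finally, the algorithm is tested on two cases with 2,383 and 12,685 nodes, in which the generators use detailed models that include excitation and governor systems, and it is implemented in a shared-memory parallel environment. The test results show that the algorithm achieves good convergence speed and parallel speedup, and that it enables faster-than-real-time simulation of systems with more than ten thousand nodes.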
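To make the combination of waveform relaxation and implicit trapezoidal integration mentioned in this abstract more concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the two-variable linear system, the function names, and the Gauss-Jacobi sweep order are all assumptions chosen for illustration, and the Adomian decomposition, METIS partitioning, dishonest Newton, windowing, preconditioning, and waveform prediction steps are omitted.

```python
def integrate_subsystem(a_self, a_cpl, x0, other, h):
    """Integrate x' = a_self*x + a_cpl*other(t) with the implicit trapezoidal rule.

    `other` is the neighbouring subsystem's waveform from the previous sweep,
    sampled on the same time grid, so this subsystem can be solved on its own.
    """
    x = [x0]
    for n in range(len(other) - 1):
        rhs = (1 + 0.5 * h * a_self) * x[n] + 0.5 * h * a_cpl * (other[n] + other[n + 1])
        x.append(rhs / (1 - 0.5 * h * a_self))
    return x


def waveform_relaxation(A, x0, h, steps, sweeps):
    """Gauss-Jacobi waveform relaxation for the 2x2 linear system x' = A x."""
    # Initial guess: hold each variable constant over the whole time window.
    w = [[x0[i]] * (steps + 1) for i in range(2)]
    for _ in range(sweeps):
        # Both calls read only the previous sweep's waveforms, so they are
        # independent and could run on separate processors.
        new0 = integrate_subsystem(A[0][0], A[0][1], x0[0], w[1], h)
        new1 = integrate_subsystem(A[1][1], A[1][0], x0[1], w[0], h)
        w = [new0, new1]
    return w


def monolithic_trapezoid(A, x0, h, steps):
    """Reference: trapezoidal rule on the coupled system, solved by Cramer's rule."""
    m00, m01 = 1 - 0.5 * h * A[0][0], -0.5 * h * A[0][1]
    m10, m11 = -0.5 * h * A[1][0], 1 - 0.5 * h * A[1][1]
    det = m00 * m11 - m01 * m10
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        b0 = (1 + 0.5 * h * A[0][0]) * x[0] + 0.5 * h * A[0][1] * x[1]
        b1 = 0.5 * h * A[1][0] * x[0] + (1 + 0.5 * h * A[1][1]) * x[1]
        xs.append([(b0 * m11 - m01 * b1) / det, (m00 * b1 - m10 * b0) / det])
    return xs


if __name__ == "__main__":
    A = [[-2.0, 1.0], [1.0, -2.0]]  # hypothetical weakly coupled test system
    w = waveform_relaxation(A, [1.0, 0.0], h=0.01, steps=100, sweeps=30)
    ref = monolithic_trapezoid(A, [1.0, 0.0], h=0.01, steps=100)
    err = max(abs(w[i][n] - ref[n][i]) for i in range(2) for n in range(101))
    print("max deviation from coupled solve:", err)
```

In this sketch each sweep solves the subsystems independently over the whole time window, which is what makes the method attractive on shared-memory machines; after enough sweeps the waveforms agree with the coupled trapezoidal solution.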
45.
Bus-based multiprocessors constitute a cost-effective class of shared-memory multiprocessors. Private caches are the key to an efficient utilization of the shared bus, and most such systems use a write-invalidate cache-coherence protocol to keep the caches coherent. Two important factors that limit the performance of the system are cache misses that lead to long-latency reads and bus congestion because of read misses and coherence traffic. While hybrid write-invalidate/write-update snooping protocols lead to fewer read misses than write-invalidate protocols, previous studies have shown them to be incapable of providing consistent performance improvements because of heavily increased coherence traffic. In this paper, we analyze how the deficiencies of hybrid snooping protocols can be dramatically reduced by using write caches and read snarfing (also called read-broadcast) under release consistency. Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors including migratory sharing as well as producer–consumer sharing. We show that one of the evaluated hybrid protocols, extended with write caches as well as read snarfing, manages to reduce the number of coherence misses by between 83 and 93% as compared to a write-invalidate protocol for all five applications in this study. In addition, the number of bus transactions is reduced substantially. However, we also show that read snarfing and hybrid snooping protocols might lead to higher cache occupancy because of increased sharing. Because of the small implementation cost of the hybrid protocol and the two extensions, we believe the combination to be an effective approach to boosting the performance of bus-based multiprocessors.
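As a rough illustration of why read snarfing can cut coherence misses under producer-consumer sharing, the following toy model tracks a single cache block under a write-invalidate protocol. It is a sketch under strong assumptions, not the paper's simulator: the three-state model, the function name `count_read_misses`, and the access trace are hypothetical, and write caches, the hybrid update protocol, and release consistency are not modeled.

```python
def count_read_misses(trace, num_caches, snarfing):
    """Count read misses for one block under a toy write-invalidate protocol.

    Each cache holds the block in one of three states:
      "absent"  - never loaded, no tag allocated
      "invalid" - tag allocated, but the copy was invalidated by a remote write
      "valid"   - usable copy
    With `snarfing` enabled, a read miss by one cache also refills every
    cache that still holds the block's tag in the "invalid" state.
    """
    state = ["absent"] * num_caches
    misses = 0
    for proc, op in trace:
        if op == "W":
            # Write-invalidate: all other valid copies become invalid.
            for other in range(num_caches):
                if other != proc and state[other] == "valid":
                    state[other] = "invalid"
            state[proc] = "valid"
        else:  # "R"
            if state[proc] != "valid":
                misses += 1
                state[proc] = "valid"
                if snarfing:
                    # Caches with an invalidated copy grab the data off the bus.
                    for other in range(num_caches):
                        if state[other] == "invalid":
                            state[other] = "valid"
    return misses


if __name__ == "__main__":
    # Hypothetical producer-consumer trace: cache 0 writes, caches 1-3 read.
    trace = [(0, "W"), (1, "R"), (2, "R"), (3, "R")] * 10
    print(count_read_misses(trace, 4, snarfing=False))  # 30
    print(count_read_misses(trace, 4, snarfing=True))   # 12
```

In this trace every consumer misses once per round without snarfing. With snarfing, only the first consumer misses after the initial cold round, because the other two pick up the block from the same bus transaction.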