Similar literature
 19 similar documents found (search time: 125 ms)
1.
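The design and implementation of a parallel processor's peripheral subsystem directly affect the price/performance ratio of the whole system. Based on the characteristics of the SPP architecture and the needs of practical applications, this paper designs a dedicated I/O processor between the front-end server and the SM/SSM, so that the system's I/O devices and the SM/SSM exchange data directly at high speed, which greatly improves the system's I/O performance. The I/O processor adopts an overall structure of i860 + 82380 + SRAM, which allows processor accesses to main memory and DMA controller accesses to SRAM to proceed in parallel.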

2.
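Design of a Parallel File System   Total citations: 2, self-citations: 0, citations by others: 2
孙凝晖 《计算机学报》 (Chinese Journal of Computers), 1994, 17(12): 938-945
In the design of massively parallel processing (MPP) supercomputers, improving I/O performance is as important as improving computing and communication capability. A parallel file system (PFS) distributes file systems and the disk blocks of files across multiple disks on multiple I/O nodes, converts file reads and writes issued at the compute nodes into multiple direct I/O requests for physical blocks, and uses read-ahead, preallocation, disk buffer caches, and asynchronous I/O to increase I/O concurrency. Under specific file usage patterns, which are also the dominant I/O patterns of MPP applications, it achieves high I/O efficiency.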
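The abstract above does not say how the PFS lays out blocks, so the following is only a minimal sketch of the general idea, assuming plain round-robin striping. The function names `locate_block` and `split_request` and all parameters are hypothetical and do not come from the paper. It shows how one byte-range request at a compute node can be turned into per-I/O-node lists of physical block requests.

```python
from collections import defaultdict


def locate_block(logical_block, num_io_nodes, disks_per_node):
    """Map a logical file block to (io_node, disk, physical_block).

    Assumption: round-robin striping that visits every I/O node before
    moving to the next disk on each node. The paper's layout may differ.
    """
    total_disks = num_io_nodes * disks_per_node
    disk_index = logical_block % total_disks
    io_node = disk_index % num_io_nodes        # consecutive blocks hit different nodes
    disk = disk_index // num_io_nodes          # then different disks on a node
    physical_block = logical_block // total_disks
    return io_node, disk, physical_block


def split_request(offset, length, block_size, num_io_nodes, disks_per_node):
    """Split a byte-range request into per-I/O-node physical block requests."""
    if length <= 0:
        return {}
    first = offset // block_size
    last = (offset + length - 1) // block_size
    requests = defaultdict(list)
    for logical_block in range(first, last + 1):
        node, disk, physical = locate_block(logical_block, num_io_nodes, disks_per_node)
        requests[node].append((disk, physical))
    return dict(requests)
```

With this layout a large sequential read touches every I/O node, so the per-node request lists can be issued concurrently; read-ahead and asynchronous I/O would sit on top of such a mapping.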

3.
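The VPP500 vector parallel processor is a highly parallel distributed-memory supercomputer with a performance range of 6.4 to 355 GFLOPS and a main memory capacity of 1 to 222 GB. The system supports 4 to 222 processors interconnected by a high-bandwidth crossbar network. Three key features that set the VPP500 sharply apart from current massively parallel systems determine its architecture. First, its building block is a 1.6 GFLOPS vector processor, an order of magnitude faster than the processors used in massively parallel processors (MPPs). This extremely high single-processor performance reduces the system …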

4.
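Using the idea of merge-based selection and the optimal algorithm on heaps, this paper presents a new algorithm for the selection problem together with its parallelization. The complexity of the serial merge-selection algorithm is reduced from n log k + O(n) to (n log k)/2 + (n log log k)/2 + O(n), while the structure of the original parallel algorithm is preserved. On the parallel computation model of a SIMD tree machine, the parallel running …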

5.
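刘键  谢卫 《计算机学报》 (Chinese Journal of Computers), 1996, 19(7): 520-529
This paper proposes a new concept, allocation dependence, together with a corresponding new viewpoint and method for the parallel partitioning of DO-loops based on equivalence classification of the iteration space. The main features of the method are: (1) it is a general, unified method for DO-loop parallel partitioning that can solve the partitioning problem for all DO-loops; (2) it accurately extracts all the parallelism of the DO-loops in a program and at the same time automatically performs data partitioning and computation partitioning; (3) it is best suited to coarse-grained parallel partitioning for MIMD and SPMD; (4) it can be combined with task-level parallel partitioning techniques, vector …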

6.
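An object-oriented multimedia database system (OODBMS) lays a solid foundation for the development and application of multimedia management information systems (MMIS). This paper describes the main characteristics and composition of an MMIS; discusses the latter two of the three aspects of the functional requirements that an MMIS places on an MDBS, namely requirements on the data model and on the sharing and manipulation of multimedia objects; and finally discusses several issues in OODBMS support for MMIS.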

7.
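The Neuron Chip provides 11 programmable I/O pins (IO0 to IO10), which can operate in 34 modes, for example bit I/O, byte I/O, asynchronous serial I/O, and parallel I/O. The asynchronous serial communication mode can operate only in half duplex. Based on the needs of a practical application, the authors designed and implemented a full-duplex mode of asynchronous serial communication, which has been deployed in real nodes.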

8.
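景晓军  方滨兴 《软件学报》 (Journal of Software), 1996, 7(7): 401-408
SIMC (SIMD C) is a parallel language that supports SIMD (single instruction, multiple data) parallel programming, obtained by extending the syntax of the C language without extending its semantics. SIMC can conveniently describe SIMD parallel algorithms, can define SIMD computer system architectures, and supports research on parallel algorithms for a variety of architectures. A simulated execution system for SIMC has been implemented on a uniprocessor. It serves as the parallel programming language of the SIMD programming and performance-evaluation simulation environment developed by the authors, and is used to evaluate the performance of SIMD algorithms and architectures.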

9.
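Parallel file systems have become one of the most common means of increasing I/O bandwidth in supercomputers. Because supercomputers differ in architecture, the design and implementation of their parallel file systems also differ. This paper introduces the parallel file system (PFS) of the Intel TFLOPS.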

10.
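Analysis of Implementation Techniques for High-Performance Parallel I/O   Total citations: 1, self-citations: 0, citations by others: 1
This paper compares the technical issues involved in achieving high-performance parallel I/O and concludes that an external parallel I/O architecture with an independent I/O network is the platform best suited to it. Consequently, near-ideal performance is attainable only by starting from the study of application algorithms, identifying data layout types suited to parallel I/O, and realizing those layouts and parallel I/O accesses with support from the language, the compiler, and the OS.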

11.
This paper presents further results on the design and implementation of various optimizations based on our earlier work of developing a parallel pipelined model for computationally intensive applications that have multiple processing tasks. Performance evaluation of this model was done by using a real-time airborne radar application that employs a Space-Time Adaptive Processing (STAP) algorithm. This paper focuses on the following four issues: (1) The tradeoffs between increasing the throughput and reducing the latency are examined in more detail when allocating processors among different processing tasks. (2) A multi-threaded design is incorporated into the pipeline model and implemented on a massively parallel computer with symmetric multi-processor nodes, which shows enhanced performance. (3) The disk I/O is incorporated into the parallel pipeline to study its effect on performance, for which two I/O task designs have been implemented: embedding I/O in the pipeline or having a separate I/O task. By using a double buffering approach together with asynchronous I/O, the overall pipeline performance scales well as the number of processors increases. (4) From the comparison of the two I/O implementations, it is discovered that the latency may be improved when merging multiple tasks into a single task. The effect of reorganizing the task structure of the pipeline is discussed in detail. All the performance results shown in this work demonstrate the linear scalability the parallel pipeline model can achieve using a production radar application. Although this paper focuses on the implementation of the parallel pipeline model and uses the results from a STAP application to support the claims of the discovered properties for this pipeline, this model is also applicable to many other types of applications with similar computational characteristics.

12.
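Routing and Switching Schemes Based on Network Processors   Total citations: 4, self-citations: 0, citations by others: 4
解超杰  武波 《微机发展》 (Microcomputer Development), 2005, 15(6): 60-61, 64
The network processor is the core component of the new generation of network equipment, and the development of routers and switches based on network processors is a hot topic. Network processors emerged because ASICs and general-purpose CPUs, each with its own limitations, could not meet the demands of growing network traffic and services. A network processor typically uses a general-purpose processor as the control CPU, multiple forwarding engines that process packets in parallel to hide the latency of accessing I/O devices, and coprocessors that accelerate functions such as route lookup and CRC computation. By analyzing the architecture of network processors and taking into account their current state of development, this paper proposes several routing and switching system schemes designed around network processors and analyzes the characteristics and application scenarios of each.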

13.
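Parallel Algorithms for Unsteady Monte Carlo Transport Problems   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a parallel algorithm for unsteady Monte Carlo (hereafter MC) transport problems, and discusses and optimizes the loading and execution mode of the parallel program. For parallel MC computation, a parallel random number generator algorithm is designed that requires no communication in the ideal case. Dynamic MC transport problems involve a large number of I/O operations; in particular, reading the residual-particle data file takes a large amount of I/O time. To address the I/O problem, three parallel I/O algorithms are proposed. Finally, performance test results for the parallel algorithm are given: compared with the serial computing time, the parallel computing time on 64 processors is shorter by a factor of 30.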
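The abstract above does not describe how its communication-free parallel random number generator works. One common way to obtain that property is the leapfrog technique, sketched below as an assumption rather than as the paper's algorithm: every processor derives its own stride through a single linear congruential sequence from the seed, its rank, and the processor count alone. The class name `LeapfrogLCG` is hypothetical, and the default constants are those of Knuth's MMIX generator, chosen only for illustration.

```python
class LeapfrogLCG:
    """Leapfrog view of one LCG sequence x[n+1] = (a*x[n] + c) mod m.

    Rank r of nprocs draws elements r+1, r+1+nprocs, r+1+2*nprocs, ...
    of the serial sequence, so ranks never need to exchange state.
    """

    def __init__(self, seed, rank, nprocs,
                 a=6364136223846793005, c=1442695040888963407, m=2**64):
        # Compose the base step nprocs times: x -> A*x + C jumps nprocs elements.
        A, C = 1, 0
        for _ in range(nprocs):
            A, C = (a * A) % m, (a * C + c) % m
        self.A, self.C, self.m = A, C, m
        # Advance the shared seed to this rank's first element.
        x = seed % m
        for _ in range(rank + 1):
            x = (a * x + c) % m
        self.state = x

    def next_uint(self):
        """Return this rank's next integer and jump ahead by nprocs elements."""
        value = self.state
        self.state = (self.A * value + self.C) % self.m
        return value

    def random(self):
        """Return a float in [0, 1) built from the top 53 bits of a 64-bit draw."""
        return (self.next_uint() >> 11) / 2**53
```

Interleaving the streams of all ranks reproduces the serial sequence exactly, which makes parallel runs reproducible against a serial reference. Note that `random()` assumes the default modulus of 2**64.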

14.
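§1. Introduction. Solving the Boltzmann equation by Monte Carlo simulation (hereafter MC) with continuous cross sections and exact angular distributions yields good results. However, the long computing time of the MC method is its greatest weakness relative to other methods; parallel computing with a high speedup is a feasible way to overcome this weakness.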

15.
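Digital image processing requires a large amount of data computation and demands very high data throughput from the system. A parallel processing architecture meets this requirement well. This paper introduces a SIMD parallel multi-DSP digital image processing system. The system avoids conflicts, processes image data continuously, keeps interprocessor communication and I/O simple, and has modular hardware and software.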

16.
We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n^4) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n^3 log_2 n) time. One of our new methods also takes O(n^3 log_2 n) time, but the other one takes only O(n^3) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.
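For readers unfamiliar with the sweep variants compared above, here is a minimal pure-Python sketch of a sequential cyclic sweep Jacobi iteration. It is an illustration under stated assumptions, not the authors' implementation: the row-by-row rotation order, the function name `jacobi_eigenvalues`, and the parameters `tol` and `max_sweeps` are all choices made for this example, and neither of the paper's two new rotation-selection schemes is shown.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic sweep Jacobi rotations.

    Each sweep visits every off-diagonal pair (p, q) once and zeroes
    a[p][q] with a plane rotation, costing O(n^3) per sweep here.
    """
    n = len(matrix)
    a = [list(row) for row in matrix]          # work on a copy
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:                          # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:      # nothing to eliminate
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):             # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):             # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

In a parallel sweep, rotations on disjoint index pairs within one sweep can be applied simultaneously, which is the property the parallel variants discussed in the abstract exploit.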


17.
As the technology used to implement computer network infrastructure advances, networking resources are becoming more vulnerable to attack. Recent router designs are based on general-purpose programmable processors, which increase their potential vulnerability. To address this issue, a Secure Packet Processing platform has been developed that can flexibly protect emerging router systems. Both instruction-level operation of embedded processors and I/O operations of router ports are monitored to detect anomalous behavior. If such behavior is detected, a recovery system is invoked to restore the system into an operational state. Experimental results show that processor-based attacks can generally be determined by a processing monitor within a single instruction. I/O anomalies, including unexpected packet broadcast or delay, can be detected by an I/O monitor with limited overhead. Overall, the system overhead for secure monitoring is limited to a fraction of the overall system space, memory, and power budget.

18.
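A Method for Improving the Parallel I/O Efficiency of Scalable Parallel Clusters   Total citations: 10, self-citations: 0, citations by others: 10
As CPU performance rises rapidly, insufficient system I/O capability is increasingly the bottleneck to improving the overall performance of NOW systems. Based on an analysis of existing parallel I/O algorithms for NOW systems, a method for finding the optimal mapping between compute processes and compute nodes is derived theoretically. During data redistribution, this method keeps the communication volume of each compute node small, thereby improving the system's parallel I/O efficiency.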

19.
In general, message passing multiprocessors suffer from communication overhead between processors and shared memory multiprocessors suffer from memory contention. Also, in computer vision tasks, data I/O overhead limits performance. In particular, high level vision tasks, which are complex and require nondeterministic communication, are strongly affected by these disadvantages. This paper proposes a flexibly (tightly/loosely) coupled hypercube multiprocessor (FCHM) for high level vision to alleviate these problems. A variable address space memory scheme is proposed in which a set of adjacent memory modules can be merged into a shared memory module by a dynamically partitionable hypercube topology. The architecture is quantitatively analyzed using computational models and simulated on Intel's Personal SuperComputer (iPSC/I), a hypercube multiprocessor. A parallel algorithm for exhaustive search is simulated on FCHM using the iPSC/I, showing significant performance improvements over that of the iPSC/I. This research was supported in part by IBM Corporation.
