
1.

吴中海叶澄清《小型微型计算机系统》1995,16(7):15-20

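The design and implementation of the peripheral subsystem of a parallel processor directly affect the price/performance ratio of the whole system. Based on the characteristics of the SPP architecture and the needs of practical applications, this paper designs a dedicated I/O processor between the front-end server and the SM/SSM, so that data is transferred at high speed directly between the system's I/O devices and the SM/SSM, greatly improving the system's I/O performance. The I/O processor adopts an overall structure of i860 + 82380 + SRAM, which allows the processor's accesses to main memory to proceed in parallel with the DMA controller's accesses to SRAM.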

2.

Design of a Parallel File System (total citations: 2; self-citations: 0; citations by others: 2)

孙凝晖《计算机学报》1994,17(12):938-945

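In the design of massively parallel processing (MPP) supercomputers, improving I/O performance is as important as improving computing and communication capability. A parallel file system (PFS) distributes the file system and the disk blocks of files across multiple disks on multiple I/O nodes, converts file reads and writes on the compute nodes into multiple direct I/O requests for physical blocks, and uses read-ahead, pre-allocation, disk buffers, and asynchronous I/O to increase I/O concurrency. Under specific file usage patterns, which are also the main I/O patterns of MPP applications, it achieves very high I/O efficiency.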
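The conversion of one file request into several per-disk block requests can be sketched as follows. This is a minimal illustration that assumes simple round-robin striping; the abstract does not say which layout the PFS uses, and the function name `stripe_request` and its parameters are hypothetical.

```python
from collections import defaultdict


def stripe_request(offset, length, block_size, num_disks):
    """Split a byte range of a file into per-disk physical block requests.

    Assumed layout (not taken from the paper): logical block b is stored
    on disk b % num_disks at local block index b // num_disks.
    Returns {disk_id: [local_block_index, ...]}.
    """
    if length <= 0:
        return {}
    first = offset // block_size                 # first logical block touched
    last = (offset + length - 1) // block_size   # last logical block touched
    requests = defaultdict(list)
    for b in range(first, last + 1):
        requests[b % num_disks].append(b // num_disks)
    return dict(requests)


# Reading six 4 KiB blocks from the start of a file striped over 4 disks:
print(stripe_request(offset=0, length=6 * 4096, block_size=4096, num_disks=4))
# {0: [0, 1], 1: [0, 1], 2: [0], 3: [0]}
```

Each per-disk list can be issued as an independent request, which is what lets the disks on different I/O nodes work concurrently.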

3.

Architecture of the VPP500 Parallel Supercomputer

王广益《电子计算机》1996,(6):42-50

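The VPP500 vector parallel processor is a highly parallel distributed-memory supercomputer with a performance range of 6.4 to 355 GFLOPS and a main memory capacity of 1 to 222 GB. The system supports 4 to 222 processors interconnected by a high-bandwidth crossbar network. Three key features that set the VPP500 apart from current massively parallel systems determine its architecture. First, its building block is a 1.6 GFLOPS vector processor, an order of magnitude faster than the processors used in massively parallel processors (MPPs). This extremely high single-processor performance reduces the system…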

4.

A Selection Algorithm and Its Parallelization

武继刚《计算机工程与设计》1996,17(5):60-64,F003

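Using the idea of merge selection together with an optimal algorithm on heaps, this paper gives a new algorithm for the selection problem and a corresponding parallelization. It reduces the complexity of the serial merge-selection algorithm from n log k + O(n) to (n log k)/2 + (n log log k)/2 + O(n) while preserving the structure of the original parallel algorithm. On the SIMD tree-machine model of parallel computation, the parallel running…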
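For orientation, the selection problem itself (find the k-th smallest of n values) can be solved with a plain bounded heap in O(n log k) time. The sketch below shows only that textbook baseline; it is not the merge-selection algorithm of the paper, and the function name `kth_smallest` is a hypothetical choice.

```python
import heapq


def kth_smallest(values, k):
    """Return the k-th smallest element (1-based) using a heap of size k.

    A max-heap (stored as negated values) holds the k smallest elements
    seen so far; each of the remaining n - k elements costs at most one
    O(log k) heap operation.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    heap = [-v for v in values[:k]]
    heapq.heapify(heap)
    for v in values[k:]:
        if v < -heap[0]:                 # smaller than the current k-th smallest
            heapq.heapreplace(heap, -v)
    return -heap[0]


print(kth_smallest([7, 2, 9, 4, 1], 3))  # 4
```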

5.

A New View and a New Method for Parallel Partitioning of DO-loops

刘键谢卫《计算机学报》1996,19(7):520-529

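This paper proposes a new concept, allocation dependence, and a corresponding new view and method for parallel partitioning of DO-loops based on equivalence classification of the iteration space. The main features of the method are: (1) it is a general, unified method for DO-loop parallel partitioning that can solve the parallel partitioning problem for all DO-loops; (2) it accurately extracts all the parallelism of the DO-loops in a program and automatically completes data partitioning and computation partitioning at the same time; (3) it is best suited to coarse-grained parallel partitioning for MIMD and SPMD; (4) it can be combined with task-level parallel partitioning techniques, vector…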

6.

Functional Requirements of Multimedia MIS on the MDBMS and OODBMS Support for Multimedia MIS

王森肖健宇《计算机工程与应用》1997,33(10):51-54

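Object-oriented multimedia database systems (OODBMS) lay a solid foundation for the development and application of multimedia management information systems (MMIS). This paper describes the main characteristics and composition of an MMIS; discusses the latter two of the three aspects of the functional requirements an MMIS places on the MDBS, namely the data model and the sharing and manipulation of multimedia objects; and finally discusses several issues in OODBMS support for MMIS.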

7.

A LON Node Implementing Full-Duplex Asynchronous Serial Communication

杨育红曲保章《微计算机信息》1997,13(4):18-19

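The Neuron Chip provides 11 programmable I/O pins (IO0 to IO10), which can operate in 34 modes, for example bit I/O, byte I/O, asynchronous serial I/O, and parallel I/O. Of these, the asynchronous serial communication mode can operate only in half duplex. Based on the needs of a practical application, the authors designed and implemented a full-duplex mode for asynchronous serial communication, which has been used in real nodes.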

8.

A SIMD Parallel Programming Language Based on an Extension of C

景晓军方滨兴《软件学报》1996,7(7):401-408

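SIMC (SIMD C) is a parallel language that supports SIMD (single instruction, multiple data) parallel programming, obtained by extending the syntax of C (without extending its semantics). SIMC can conveniently describe SIMD parallel algorithms, can define SIMD computer architectures, and supports research on parallel algorithms across a variety of architectures. A simulated execution system for SIMC has been implemented on a single machine; it serves as the parallel programming language of the SIMD programming and performance-evaluation simulation environment developed by the authors, and is used to evaluate the performance of SIMD algorithms and architectures.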

9.

Design and Implementation of Parallel File Systems

倪永年《电子计算机》1999,(1):23-27

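Parallel file systems have become one of the most common ways for supercomputers to increase I/O bandwidth. Because supercomputer architectures differ, the design and implementation methods differ as well. This paper introduces the parallel file system (PFS) of the Intel TFLOPS.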

10.

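Analysis of Implementation Techniques for High-Performance Parallel I/O (total citations: 1; self-citations: 0; citations by others: 1)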

程乐祥《电子计算机》2000,(6):22-25

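This paper compares the technical issues in implementing high-performance parallel I/O and concludes that an external parallel I/O structure with an independent I/O network is the platform best suited to high-performance parallel I/O. Consequently, near-ideal performance becomes attainable only by starting from the study of application algorithms, identifying data layout types suited to parallel I/O, and implementing those layouts and parallel I/O accesses with support from the language, compiler, and OS.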

11.

Performance Evaluation of a Parallel Pipeline Computational Model for Space-Time Adaptive Processing

Wei-Keng Liao Alok Choudhary Donald Weiner Pramod Varshney 《The Journal of supercomputing》2005,31(2):137-160

This paper presents further results on the design and implementation of various optimizations based on our earlier work of developing a parallel pipelined model for computationally intensive applications that have multiple processing tasks. Performance evaluation of this model was done by using a real-time airborne radar application that employs a Space-Time Adaptive Processing (STAP) algorithm. This paper focuses on the following four issues: (1) The tradeoffs between increasing the throughput and reducing the latency are examined in more detail when allocating processors among different processing tasks. (2) A multi-threaded design is incorporated into the pipeline model and implemented on a massively parallel computer with symmetric multi-processor nodes, which shows enhanced performance. (3) The disk I/O is incorporated into the parallel pipeline to study its effect on performance in which two I/O task designs have been implemented: embedding I/O in the pipeline or having a separate I/O task. By using a double buffering approach together with the asynchronous I/O, the overall pipeline performance scales well as the number of processors increases. (4) From the comparison of the two I/O implementations, it is discovered that the latency may be improved when merging multiple tasks into a single task. The effect of reorganizing the task structure of the pipeline is discussed in detail. All the performance results shown in this work demonstrate the linear scalability the parallel pipeline model can achieve using a production radar application. Although this paper focuses on the implementation of the parallel pipeline model and uses the results from a STAP application to support the claims of the discovered properties for this pipeline, this model is also applicable to many other types of applications with similar computational characteristics.
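The double-buffering idea mentioned in point (3) can be sketched in a few lines. This is a generic illustration using a reader thread and two reusable buffers, not the paper's MPI-based implementation; `run_pipeline`, `fake_read`, and `checksum` are hypothetical names, and the "disk read" is simulated.

```python
import queue
import threading


def run_pipeline(read_into, process, num_blocks, block_size):
    """Overlap I/O with computation using exactly two buffers.

    A reader thread fills whichever buffer is free while the main thread
    processes the other one, so reading block i + 1 overlaps with
    computing on block i.
    """
    free, filled = queue.Queue(), queue.Queue()
    for _ in range(2):                      # the two buffers
        free.put(bytearray(block_size))

    def reader():
        for i in range(num_blocks):
            buf = free.get()                # wait for a free buffer
            read_into(i, buf)               # the (simulated) asynchronous read
            filled.put(buf)
        filled.put(None)                    # sentinel: no more data

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    results = []
    while True:
        buf = filled.get()
        if buf is None:
            break
        results.append(process(buf))
        free.put(buf)                       # hand the buffer back to the reader
    thread.join()
    return results


def fake_read(i, buf):
    """Stand-in for a disk read: fill the buffer with the byte value i."""
    buf[:] = bytes([i]) * len(buf)


def checksum(buf):
    return sum(buf)


print(run_pipeline(fake_read, checksum, num_blocks=4, block_size=8))
# [0, 8, 16, 24]
```

Two buffers are enough to hide read latency when reading and processing a block take comparable time; more buffers only help when the two rates fluctuate.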

12.

Routing and Switching Schemes Based on Network Processors (total citations: 4; self-citations: 0; citations by others: 4)

解超杰武波《微机发展》2005,15(6):60-61,64

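Network processors are the core components of the new generation of network devices, and developing routers and switches based on them is an active topic. Because ASICs and general-purpose CPUs each have limitations that prevent them from meeting growing network traffic and service demands, network processors emerged. A network processor typically uses a general-purpose processor as the control CPU, has multiple forwarding engines process packets in parallel to hide the latency of accessing I/O devices, and uses coprocessors to accelerate functions such as route lookup and CRC computation. By analyzing network processor architecture and taking into account the current state of network processor development, the paper proposes several routing and switching system designs based on network processors and analyzes the characteristics and application scenarios of each.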

13.

Parallel Algorithms for Time-Dependent Monte Carlo Transport Problems (total citations: 1; self-citations: 0; citations by others: 1)

刘杰邓力胡庆丰袁国兴李晓梅《计算机学报》2004,27(1):99-106

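This paper presents a parallel algorithm for time-dependent Monte Carlo (hereafter MC) transport problems, and discusses and optimizes the loading and execution mode of the parallel program. For parallel MC computation, it designs a parallel random number generator algorithm that requires no communication in the ideal case. Dynamic MC transport problems involve a large number of I/O operations; in particular, reading the data file of remaining particles takes substantial I/O time. To address this, the paper proposes three parallel I/O algorithms. Finally, it reports performance test results for the parallel algorithm: compared with the serial computation time, the parallel computation time on 64 processors is shorter by a factor of 30.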
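One standard way to obtain a communication-free parallel random number generator is leapfrogging a linear congruential generator (LCG): each process takes every P-th value of one global sequence, computed locally from a precomputed stride. The abstract does not say which scheme the authors use, so the sketch below is an assumption for illustration only; the constants and function names are hypothetical, and a plain LCG is not of production statistical quality.

```python
from itertools import islice

# Illustrative LCG constants (assumed, not from the paper).
LCG_A, LCG_C, LCG_M = 1103515245, 12345, 2**31


def lcg_stream(seed, a=LCG_A, c=LCG_C, m=LCG_M):
    """Serial LCG: x_{n+1} = (a * x_n + c) mod m, yielding x_1, x_2, ..."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def leapfrog_params(a, c, m, stride):
    """Coefficients (A, C) such that x_{n+stride} = (A * x_n + C) mod m."""
    A, C = 1, 0
    for _ in range(stride):
        A, C = (A * a) % m, (C * a + c) % m
    return A, C


def leapfrog_stream(seed, rank, nprocs, a=LCG_A, c=LCG_C, m=LCG_M):
    """Values x_{rank+1}, x_{rank+1+nprocs}, ... of the serial sequence."""
    x = seed
    for _ in range(rank + 1):               # advance to this rank's first value
        x = (a * x + c) % m
    A, C = leapfrog_params(a, c, m, nprocs)
    while True:
        yield x
        x = (A * x + C) % m                 # jump nprocs steps at once


# Interleaving the four per-rank streams reproduces the serial sequence.
serial = list(islice(lcg_stream(42), 12))
streams = [list(islice(leapfrog_stream(42, r, 4), 3)) for r in range(4)]
interleaved = [streams[r][i] for i in range(3) for r in range(4)]
print(interleaved == serial)  # True
```

Because every rank derives its values from the shared seed and its own rank number, no messages are exchanged after start-up, and the combined result is reproducible regardless of scheduling.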

14.

Adaptive Parallel Monte Carlo Computation for Time-Dependent Particle Transport

邓力袁国兴黄正丰许海燕王瑞宏李树《数值计算与计算机应用》2003,24(2):111-115

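§1. Introduction. Solving the Boltzmann equation by Monte Carlo simulation (hereafter MC) with continuous cross sections and exact angular distributions can give ideal results. However, the long computing time of the MC method is its greatest shortcoming relative to other methods, and parallel computing with high speedup is a feasible way to overcome it.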

15.

Research and Design of a SIMD Multi-DSP Digital Image Processing System

李勇齐同斌张瑞生《电子技术应用》2007,33(11):71-73

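Digital image processing requires a large volume of data operations and demands very high data throughput from the system. Parallel processing structures meet this requirement well. This paper introduces a SIMD parallel multi-DSP digital image processing system. The system avoids conflicts, can process image data continuously, has simple inter-processor communication and I/O, and has modular hardware and software.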

16.

An improved parallel Jacobi method for diagonalizing a symmetric matrix

Alan H. Karp John Greenstadt 《Parallel Computing》1987,5(3):281-294

We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n⁴) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n³ log₂n) time. One of our new methods also takes O(n³ log₂n) time, but the other one takes only O(n³) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.
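As a point of reference for the methods compared above, a sequential sweep Jacobi iteration can be sketched as follows. This is the textbook cyclic variant written in plain Python for readability; it is not code from the paper, it does not implement the parallel sweep or the two new rotation-selection schemes, and the function name and tolerances are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic (sequential sweep) Jacobi.

    Each sweep visits every off-diagonal pair (p, q) once and applies a
    plane rotation that zeroes a[p][q]. Sweeps repeat until the
    off-diagonal norm falls below tol.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]          # work on a copy
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle with tan(2*theta) = 2*a_pq / (a_qq - a_pp), |theta| <= pi/4
                diff = a[q][q] - a[p][p]
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # A <- A * J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):          # A <- J^T * A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


print([round(v, 6) for v in jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]])])
# [1.0, 3.0]
```

Rotations on disjoint index pairs touch different rows and columns, which is the property the parallel sweep variants exploit.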

17.

Securing the data path of next-generation router systems

Tilman Wolf Russell Tessier Gayatri Prabhu 《Computer Communications》2011,34(4):598-606

As the technology used to implement computer network infrastructure advances, networking resources are becoming more vulnerable to attack. Recent router designs are based on general-purpose programmable processors, which increase their potential vulnerability. To address this issue, a Secure Packet Processing platform has been developed that can flexibly protect emerging router systems. Both instruction-level operation of embedded processors and I/O operations of router ports are monitored to detect anomalous behavior. If such behavior is detected, a recovery system is invoked to restore the system into an operational state. Experimental results show that processor-based attacks can generally be determined by a processing monitor within a single instruction. I/O anomalies, including unexpected packet broadcast or delay, can be detected by an I/O monitor with limited overhead. Overall, the system overhead for secure monitoring is limited to a fraction of the overall system space, memory, and power budget.

18.

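A Method for Improving Parallel I/O Efficiency on Scalable Parallel Clusters (total citations: 10; self-citations: 0; citations by others: 10)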

龙翔李忠泽高小鹏李未《计算机研究与发展》2000,37(6):650-656

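As CPU performance rises rapidly, insufficient system I/O capability is increasingly becoming the bottleneck in improving the overall performance of NOW systems. Building on an analysis of existing parallel I/O algorithms for NOW systems, the paper derives theoretically a method for finding the optimal mapping between compute processes and compute nodes. During data redistribution, the method keeps the communication volume of the computations small, thereby improving the system's parallel I/O efficiency.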

19.

A flexibly coupled hypercube multiprocessor for high level vision

Myung H. Sunwoo J. K. Aggarwal 《Machine Vision and Applications》1992,5(2):127-138

In general, message passing multiprocessors suffer from communication overhead between processors and shared memory multiprocessors suffer from memory contention. Also, in computer vision tasks, data I/O overhead limits performance. In particular, high level vision tasks, which are complex and require nondeterministic communication, are strongly affected by these disadvantages. This paper proposes a flexibly (tightly/loosely) coupled hypercube multiprocessor (FCHM) for high level vision to alleviate these problems. A variable address space memory scheme in which a set of adjacent memory modules can be merged into a shared memory module by a dynamically partitionable hypercube topology is proposed. The architecture is quantitatively analyzed using computational models and simulated on Intel's Personal SuperComputer (iPSC/I), a hypercube multiprocessor. A parallel algorithm for exhaustive search is simulated on FCHM using the iPSC/I showing significant performance improvements over that of the iPSC/I. This research was supported in part by IBM corporation.