1.

Parallel Implementations of Block-Based Motion Vector Estimation for Video Compression on Four Parallel Processing Systems

Min Tan Janet M. Siegel Howard Jay Siegel 《International journal of parallel programming》1999,27(3):195-225

Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from the execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two MIMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines.
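As a rough illustration of the exhaustive search technique named in this abstract, the following is a minimal sequential sketch, not the authors' parallel implementation. It assumes the sum of absolute differences (SAD) as the matching cost, which the abstract does not specify. The names `sad`, `full_search`, and `stripe_rows`, and the default block and search sizes, are hypothetical. `stripe_rows` shows one simple way a stripe partition could assign contiguous bands of image rows to processing elements; a real implementation would also need the border rows of neighboring stripes.

```python
def sad(cur, ref, top, left, dy, dx, block):
    """Sum of absolute differences between a block of `cur` and the
    block of `ref` displaced by (dy, dx). SAD is an assumed cost."""
    total = 0
    for r in range(block):
        for c in range(block):
            total += abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
    return total


def full_search(cur, ref, top, left, block=4, search=2):
    """Test every displacement in [-search, search] x [-search, search] and
    return (dy, dx, cost) with the smallest SAD; ties keep the first found."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue
            cost = sad(cur, ref, top, left, dy, dx, block)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


def stripe_rows(num_rows, num_pes, pe):
    """Rows owned by processing element `pe` under a stripe partition:
    contiguous bands, with any remainder spread over the first PEs."""
    base, extra = divmod(num_rows, num_pes)
    start = pe * base + min(pe, extra)
    return range(start, start + base + (1 if pe < extra else 0))
```

Calling `full_search(cur, ref, top, left)` for each block of the current frame yields one motion vector per block; because the blocks are independent, the calls can be divided among processors according to whichever partitioning scheme is chosen.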

2.

Data management and control-flow aspects of an SIMD/SPMD parallel language/compiler

Nichols M.A. Siegel H.J. Dietz H.G. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):222-234

Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines.

3.

Task Scheduling on the PASM Parallel Processing System

《IEEE transactions on pattern analysis and machine intelligence》1985,(2):145-157

PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM.

4.

Parallel stereocorrelation on a reconfigurable multi-ring network

Hamid R. Arabnia Suchendra M. Bhandarkar 《The Journal of supercomputing》1996,10(3):243-269

A reconfigurable network, termed the reconfigurable multi-ring network (RMRN), is described. The RMRN is shown to be a truly scalable network in that each node in the network has a fixed degree of connectivity and the reconfiguration mechanism ensures a network diameter of O(log₂ N) for an N-processor network. Algorithms for the two-dimensional mesh and the SIMD or SPMD n-cube are shown to map very elegantly onto the RMRN. Basic message passing and reconfiguration primitives for the SIMD/SPMD RMRN are designed for use as building blocks for more complex parallel algorithms. The RMRN is shown to be a viable architecture for image processing and computer vision problems using the parallel computation of the stereocorrelation imaging operation as an example. Stereocorrelation is one of the most computationally intensive imaging tasks. It is used as a visualization tool in many applications, including remote sensing, geographic information systems and robot vision. An earlier version of this paper was presented at the 1995 International Conference on Parallel and Distributed Processing Techniques and Applications.
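To make the kind of n-cube algorithm mentioned in this abstract concrete, here is a small sequential simulation of a classic hypercube all-reduce (every node ends up with the global sum). It is a generic textbook pattern, not code from the paper, and it says nothing about how the RMRN itself is reconfigured. The function name `hypercube_allreduce` is hypothetical. In step d, node i exchanges its partial sum with the neighbor whose address differs in bit d, so log₂ N steps suffice for N nodes.

```python
def hypercube_allreduce(values):
    """Simulate an all-reduce (sum) on an n-cube with len(values) nodes.

    Returns (per-node results, number of exchange steps)."""
    n_nodes = len(values)
    if n_nodes == 0 or n_nodes & (n_nodes - 1):
        raise ValueError("node count must be a power of two")
    current = list(values)
    steps = 0
    dim = 1  # bit mask selecting the cube dimension used in this step
    while dim < n_nodes:
        # Every node adds the partial sum held by its neighbor across `dim`.
        current = [current[i] + current[i ^ dim] for i in range(n_nodes)]
        dim <<= 1
        steps += 1
    return current, steps
```

The step count equals the cube dimension, which matches the O(log₂ N) diameter quoted in the abstract.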

5.

Global Automatic Data Partitioning for SIMD Machines

Lin Jin, Zhu Ningning, Zhang Zhaoqing, Qiao Ruliang 《计算机学报》 (Chinese Journal of Computers) 1999,22(6):596-602

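A global automatic data partitioning algorithm for SIMD machines is proposed. The algorithm can handle multiple non-perfectly nested loop nests in which array subscripts are linear expressions of the loop variables. First, the communication patterns in the computation are abstracted through data and iteration mappings. Formal conditions for recognizing regular communication patterns are then given. Next, a data-iteration graph containing alignment information and the corresponding communication costs is built, and a heuristic algorithm based on this graph is proposed to compute good data and iteration distributions, so as to optimize the communication cost among processing elements. By analyzing the multiple array mappings involved in multiple loop nests and …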

6.

A Block-Based Mode Selection Model for SIMD/SPMD Parallel Environments

《Journal of Parallel and Distributed Computing》1994,21(3):271-288

One of the challenges for parallel compilers and compiler-related tools is, given a machine-independent parallel language, to generate executable code for a variety of computational models, and to identify those specific parallel modes for which a program is well-suited. One portion of this problem, developing a method for estimating the relative execution time of a data-parallel algorithm in an environment capable of the SIMD and SPMD (MIMD) modes of parallelism, is presented. Given a data-parallel program in a language whose syntax is mode-independent and empirical information about instruction execution time characteristics, the goal is to use static source-code analysis to determine an implementation that results in an optimal execution time for a mixed-mode machine capable of SIMD and SPMD parallelism. Statistical information about individual operation execution times and paths of execution through a parallel program is assumed. A secondary goal of this study is to indicate language, algorithm, and machine characteristics that must be researched to learn how to provide the information needed to obtain an optimal assignment of parallel modes to program segments.

7.

Design and Implementation of a Data-Parallel Language

Chen Siyu, Huang Linpeng 《计算机工程》 (Computer Engineering) 1997,23(3):3-6

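Applying the data-parallel model to MIMD machines, so that programs run in a loosely synchronous SPMD mode, is receiving increasing attention. This paper presents the design and implementation of Multi-c, a data-parallel language whose target environment is a heterogeneous parallel system. The Multi-c compiler, currently being implemented, works as a precompiler: it accepts program specifications written in SIMD form, relaxes their synchronization requirements, and generates C programs that can run on the parallel system in SPMD mode.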

8.

An approximate method for filtering out data dependencies with a sufficiently large distance between memory references

Patricio Bulić Tomaž Dobravec 《The Journal of supercomputing》2011,56(2):226-244

9.

Parallel Loop Partitioning for Optimized Data Transfer on SIMD Machines (total citations: 2; self-citations: 1; citations by others: 2)

Lin Jin, Zhang Zhaoqing, Zhu Mingfa 《计算机学报》 (Chinese Journal of Computers) 1998,21(7):577-585

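This paper proposes a loop partitioning framework for SIMD machines with distributed local memory, aimed at optimizing the data transfers required during computation. The framework uses matrices to represent the iteration space, the data space, and the array access expressions. We introduce the notion of data transfer and build a simple, effective data transfer model to estimate the cost of moving data between global memory and local memory. Finally, for a given loop nest, we give a loop partitioning algorithm that produces optimized loop blocks, minimizing the data transfer cost of the loop nest and greatly reducing the synchronization overhead between data transfer and computation. Experimental results show …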

10.

A flexibly coupled hypercube multiprocessor for high level vision

Myung H. Sunwoo J. K. Aggarwal 《Machine Vision and Applications》1992,5(2):127-138

In general, message passing multiprocessors suffer from communication overhead between processors and shared memory multiprocessors suffer from memory contention. Also, in computer vision tasks, data I/O overhead limits performance. In particular, high level vision tasks, which are complex and require nondeterministic communication, are strongly affected by these disadvantages. This paper proposes a flexibly (tightly/loosely) coupled hypercube multiprocessor (FCHM) for high level vision to alleviate these problems. A variable address space memory scheme in which a set of adjacent memory modules can be merged into a shared memory module by a dynamically partitionable hypercube topology is proposed. The architecture is quantitatively analyzed using computational models and simulated on Intel's Personal SuperComputer (iPSC/I), a hypercube multiprocessor. A parallel algorithm for exhaustive search is simulated on FCHM using the iPSC/I, showing significant performance improvements over that of the iPSC/I. This research was supported in part by IBM corporation.

11.

Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP

《Parallel Computing》2013,39(10):586-602

Multimedia applications have become increasingly important in daily computing. These applications are composed of heterogeneous regions of code mixed with data-level parallelism (DLP) and instruction-level parallelism (ILP). A standard solution for a multimedia coprocessor integrates single-instruction multiple-data (SIMD) engines into architectures exploiting ILP at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA). However, the ILP regions fail to scale with the increased vector length to achieve high performance in the DLP regions. Furthermore, the register-to-register nature of SIMD instructions causes current SIMD engines to have limitations in handling memory alignment, data reorganization, and control flow. Many supporting instructions, such as data permutations, address generations, and loop branches, are required to aid in the execution of the real SIMD computation instructions. To mitigate these problems, we propose optimized SIMD engines that have the capabilities for combining VLIW or TTA processing with unified scalar and long vector computations as well as efficient SIMD hardware for real computation. Our new architecture is based on TTA and is called multimedia coprocessor (MCP). This architecture includes the following features: (1) a simple coprocessor structure with 8-way TTA, (2) cost-effective SIMD hardware capable of performing floating-point operations, (3) long vector capabilities built upon existing SIMD hardware and a single register file and processor data path for both scalar operands and vector elements, and (4) an optimized SIMD architecture that addresses the SIMD limitations. Our experimental evaluations show that MCP can outperform conventional SIMD techniques by an average of 39% and 12% in performance for multimedia kernels and applications, respectively.

12.

Improvement and Implementation of a Resource Allocation Algorithm for a Shared-Main-Memory Two-Dimensional SIMD Architecture

Li Chuhui, Wang Wei, Xiao Wei 《计算机工程与科学》 (Computer Engineering & Science) 2008,30(9):99-102

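Shared-main-memory two-dimensional SIMD architectures are widely used in multimedia acceleration units, and their data parallelism can greatly increase a processor's computing capability. Some research already exists on compiler optimization for such architectures, and these techniques effectively improve the speedup of various multimedia applications. Analysis shows, however, that the average resource utilization achieved by these optimization methods is only about 50%. Based on an analysis of how multimedia applications execute on a shared-main-memory two-dimensional SIMD architecture, and by suitably modifying the classical graph-coloring register allocation algorithm on top of the original algorithm, this paper proposes an improved resource allocation … Experimental results show that the improved algorithm significantly raises the performance of most multimedia applications.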

13.

Directive Extensions Suited to Cluster OpenMP Systems (total citations: 1; self-citations: 0; citations by others: 1)

Zhang Longbing, Wu Shaogang, Cai Fei, Hu Weiwu 《计算机学报》 (Chinese Journal of Computers) 2004,27(8):1129-1136

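OpenMP has become the programming standard for shared-memory architectures thanks to its ease of use and its support for incremental parallelization. A cluster OpenMP system implements the OpenMP computing environment on a cluster, combining the programmability of OpenMP with the scalability of clusters, which makes it a worthwhile approach. OpenMP programs are written mainly in one of two styles, loop-level or SPMD; the loop-level style is easy to program, whereas the SPMD style is hard to program. To obtain high-performance OpenMP programs on a cluster OpenMP system, however, the SPMD style must be used. This paper describes a simple subset of OpenMP directive extensions suited to cluster OpenMP systems (including data distribution directives and loop scheduling modes) and implements it on the cluster OpenMP system OpenMP/JIAJIA. Application tests show that programming with these directive extensions keeps the programmability of the loop-level style while achieving performance comparable to the SPMD style, making it an effective way to program.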

14.

Data-Parallel Computing: Concepts, Models, and Systems (total citations: 3; self-citations: 2; citations by others: 1)

Li Xiaoming 《计算机科学》 (Computer Science) 2000,27(6):1-5

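1. Introduction. Parallel computing, or parallel processing, refers to the effort, and the associated research, of using multiple components with computing capability to complete a computation jointly, so as to finish faster than a single component could. This is clearly a natural idea. Historically, the idea and practice of parallel processing have existed almost as long as computers themselves. From the late 1980s to the early 1990s, in seeking to address a number of major … facing humanity …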

15.

A Method for Partial Reuse of Vector Registers in SIMD Optimization (total citations: 1; self-citations: 0; citations by others: 1)

Qian Xinglong, Zang Binyu, Zhu Chuanqi 《计算机工程与科学》 (Computer Engineering & Science) 2007,29(5):141-146

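SIMD architectures are used for multimedia acceleration and are widely deployed in modern general-purpose processors. The data parallelism of a SIMD architecture can greatly increase a processor's computing capability, but because the memory system is far too slow to keep pace, application performance is hard to improve further. Based on the memory access characteristics of SIMD architectures, this paper therefore proposes a method for partial reuse of vector registers to improve memory access efficiency. It also gives a corresponding program transformation algorithm that, through data dependence analysis, generates optimized code using partial vector register reuse when an application is vectorized. Experimental results show that the algorithm significantly improves the performance of multimedia applications.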

16.

Tlib—a library to support programming with hierarchical multi-processor tasks

《Journal of Parallel and Distributed Computing》2005,65(3):347-360

The paper considers modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. The result is a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is a potential to reduce the communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and leads to only a marginal overhead while allowing static planning and dynamic restructuring.

17.

A sliding memory plane array processor

Sunwoo M.H. Aggarwal J.K. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(6):601-612

A mesh-connected single-instruction multiple-data (SIMD) architecture called a sliding memory plane (SliM) array processor is proposed. Differing from existing mesh-connected SIMD architectures, SliM has several salient features such as a sliding memory plane that provides inter-PE communication during computation. Two I/O planes provide an I/O overlapping capability. Thus, inter-PE communication and I/O overhead can be overlapped with computation. Inter-PE communication time is invisible in most image processing tasks because the computation time is larger than the communication time on SliM. The ability to overlap inter-PE communication with computation, regardless of window size and shape and without using a coprocessor or an on-chip DMA controller, is unique to SliM.

18.

An efficient memory system for the SIMD construction of a Gaussian pyramid

Jong Won Park Harper D.T. III 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(8):855-860

In this paper, a memory system is introduced for the efficient construction of a Gaussian pyramid. The memory system consists of an address calculating circuit, an address routing circuit, a memory module selection circuit, and 2ⁿ+1 memory modules. The memory system provides parallel access to 2ⁿ image points whose patterns are a block, a row or a column, where the interval of the column and the block is 1 and the interval of the row is 2^l, l ≥ 0. The performance of a generic SIMD (single-instruction multiple-data) processor using the proposed memory system is compared with one using an interleaved memory system for the construction of a Gaussian pyramid. The ratio of the time of the construction of level 2 and level 10 from the original image (level 0) of an SIMD processor with an interleaved memory system to that of the proposed memory system is 1.485 and 1.633, respectively.

19.

A new parallel algorithm for parsing arithmetic infix expressions

Y. N. Srikant Priti Shankar 《Parallel Computing》1987,4(3):291-304

A new parallel algorithm for transforming an arithmetic infix expression into a parse tree is presented. The technique is based on a result due to Fischer (1980) which enables the construction of the parse tree by appropriately scanning the vector of precedence values associated with the elements of the expression. The algorithm presented here is suitable for execution on a shared memory model of an SIMD machine with no read/write conflicts permitted. It uses O(n) processors and has a time complexity of O(log²n) where n is the expression length. Parallel algorithms for generating code for an SIMD machine are also presented.

20.

A Communication Set Generation Algorithm for a Class of Irregular Parallel Applications

Hu Changjun, Li Jing, Wang Jue, Yao Guangli, Li Yonghong, Ding Liang, Li Jianjiang 《计算机学报》 (Chinese Journal of Computers) 2008,31(1):120-126

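Irregular computation is a key problem that is pervasive in large-scale parallel applications and limits their efficiency. In the distributed-memory data-parallel paradigm, efficiently generating local memory access sequences and communication sets for irregular array references is an important problem that a parallel compiler must solve when generating SPMD node programs. This paper addresses a class of parallel applications containing irregular array references in doubly nested loops, where the bounds of the inner loop are linear or nonlinear functions of the outer loop variable and the array subscripts are nonlinear functions of both loop variables, and proposes an algebraic algorithm that generates the communication sets at compile time. For cyclic(k) data distribution and linear alignment templates, it uses the concept of integer lattices to give a compile-time method for converting between global and local addresses. The corresponding communication-optimized SPMD node program is also given. Finally, the correctness of the algorithm is verified on examples. The significance of the algorithm is that it avoids the Inspector phase of the traditional Inspector/Executor model of irregular computation, saving the large run-time overhead that phase incurs by exhaustively enumerating subscripts to generate communication sets.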
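For readers unfamiliar with cyclic(k) distribution, the following minimal sketch shows the standard global-to-local index arithmetic for a one-dimensional array dealt out in blocks of k elements to p processors in round-robin order. It assumes an identity alignment (array element g sits at template position g) and does not reproduce the paper's integer-lattice method for linear alignments or its communication set generation. The function names are hypothetical.

```python
def owner(g, k, p):
    """Processor that owns global index g under cyclic(k) over p processors."""
    return (g // k) % p


def global_to_local(g, k, p):
    """Local index of global index g on its owning processor."""
    block = g // k                     # which block of k elements g falls in
    return (block // p) * k + g % k    # full local blocks before it, plus offset


def local_to_global(local, proc, k, p):
    """Inverse mapping: global index of local index `local` on processor `proc`."""
    return ((local // k) * p + proc) * k + local % k
```

With k = 2 and p = 3, for example, global index 7 lies in block 3, which belongs to processor 0 and is that processor's second block, so it maps to local index 3; `local_to_global(3, 0, 2, 3)` maps it back to 7.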