期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallelization support for coupled grid applications with small meshes

Lorie M. Liebrock Ken Kennedy 《Concurrency and Computation》1996,8(8):581-615

Composite grid problems arise in important application areas, e.g. reactor simulation. Related physical phenomena are inherently parallel and their simulations are computationally intensive. Unfortunately, parallel languages, such as High Performance Fortran, provide little support for these problems. We illustrate topological connections via a coupling statement, develop a programming style and transformation system to support composite grid code development, and develop an algorithm that automatically determines distributions for composite grid problems with small meshes. A mesh is classified as small if the amount of computational work associated with the mesh is less than the amount of work to be assigned to a single processor. Precompiler transformations, such as cloning for alignment specification, are described. Excerpts from a High Performance Fortran program before and after transformation illustrate user programming style and transformation issues. Our distribution algorithm's alignment and distribution specifications are input to the transformed High Performance Fortran programs which applies the mapping for execution of the simulation code. Some advantages of this approach are: transformations are applied before compilation and allow communication optimization; data distribution may be determined for any number of problems without recompilation; user determined distribution for parallelization is unnecessary; portability is improved. We validate the topology-based data distribution algorithm using a number of reactor configurations. Two random distribution algorithms provide a basis of comparison with measures of load balance and communication cost. Experiments show that the topology-based distribution algorithm almost always obtains load balance at least as good as, and often significantly better than, random algorithms while reducing the total communication per iteration from 50% to as much as a factor of ten. 相似文献

2.

在流水线结构上的基于Shear-Wrap的并行体数据绘制算法 总被引：1，自引：0，他引：1

下载免费PDF全文

黄小虎李维郑南宁《中国图象图形学报》1998,3(2):115-122

在以前的基于目标空间划分的并行体数据绘制算法中，局部绘制和图象融合是两个串行的过程，在节点机的局部绘制阶段几乎没有数据通讯，但在数据融合阶段数据通讯量非常大，出现总线争用甚至通讯阻塞，而且在这个阶段有非常大的同步开销。本文利用流水线结构，让局部体数据绘制和图象融合并行执行，很好地解决了上述缺点。并在一个基于微机的流水线结构上实现了一个新的基于目标空间划分的并行体数据绘制算法。相似文献

3.

并行系统的通讯效率问题

林洪陈国良《小型微型计算机系统》1996,(1)

巨量并行处理（ＭＰＰ）强调并行系统结构和并行算法的可扩放性。在一个可扩放的并行系统结构上，可扩放的并行算法应该能够有效地利用不断增加的处理机，算法的有效性通常以算法运行时的处理机效率来衡量。一个被普遍忽视的因素是通讯效率，这是一个具有一般性的问题。本文给出了通讯效率的定义，研究了它与处理机效率的关系，并通过对一个典型算法的运行情况分析，研究了几个常见的并行系统结构的通讯效率。本文的结果表明：处理机效率和通讯效率的综合才能全面地评价算法的可扩放性并指导并行系统结构的设计。相似文献

4.

Rendering gems with aterism or chatoyancy

Shigeki Yokoi Kosuke Kurashige Jun-ichiro Toriwaki 《The Visual computer》1986,2(5):307-312

In this paper we describe an algorithm to render the asterism or chatoyancy effect in gems. This effect is due to light dispersion caused by inclusions consisting of vast numbers of thin fibrous needles. We develop models for distribution of the microfacet directions of the inclusions and of the light dispersion by them. Based on these models we derive an algorithm to render gems with the asterism effect. By introducting a ray distribution map, a new algorithm for tracing dispersed light rays is developed. 相似文献

5.

Slow annealing versus multiple fast annealing runs — an empirical investigation

Nigel Dodd 《Parallel Computing》1990,16(2-3):269-272

The effects of splitting a single annealing run into several parallel shorter annealing runs is detailed for one specific application of combinatorial optimisation by simulated annealing. This form of parallelisation of the algorithm is extremely easy to implement since it requires no communication between the processors. It is shown that, for a considerable range of parallelisation, the probability of finding the global minimum is largely unchanged. 相似文献

6.

A high performance multiple sequence alignment system for pyrosequencing reads from multiple reference genomes

Fahad SaeedAuthor VitaeAlan Perez-RathkeAuthor Vitae Jaroslaw GwarnickiAuthor VitaeTanya Berger-WolfAuthor Vitae Ashfaq KhokharAuthor Vitae 《Journal of Parallel and Distributed Computing》2012,72(1):83-93

Genome resequencing with short reads generated from pyrosequencing generally relies on mapping the short reads against a single reference genome. However, mapping of reads from multiple reference genomes is not possible using a pairwise mapping algorithm. In order to align the reads w.r.t each other and the reference genomes, existing multiple sequence alignment(MSA) methods cannot be used because they do not take into account the position of these short reads with respect to the genome, and are highly inefficient for a large number of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align such a large number of reads from single or multiple reference genomes. The proposed alignment algorithm accurately aligns the erroneous reads, and has been implemented on a cluster of workstations using MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, quality of the alignments, and the ability of the algorithm to handle reads from multiple haplotypes. We report high quality multiple alignment of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits super-linear speedups with increasing number of processors. 相似文献

7.

Algorithms for rendering realistic terrain image sequences and their parellel implementation

Gennady Agranov Craig Gotsman 《The Visual computer》1995,11(9):455-464

We present algorithms for rendering realistic images of large terrains and their implementation on a parallel computer for rapid production of terrain-animation sequences. “Large” means datasets too large for RAM. A hybrid ray-casting and projection technique incorporates quadtree subdivision techniques and filtering using precomputed bit masks. Hilbert space-filling curves determine the imagepixel rendering order. A parallel version of the algorithm is based on a Meiko parallel computer architecture, designed to relieve dataflow bottlenecks and exploit temporal image coherence. Our parallel system, incorporating 26 processors, can generate a full color-terrain image at video resolution (without noticable aliasing artifacts) every 2 s, including I/O and communication overheads. 相似文献

8.

串行程序在大粒度级的并行分解及可并行执行包的形成 总被引：1，自引：0，他引：1

罗昕于月芬罗静敏《小型微型计算机系统》1996,(8):35-40

本文提出了针对由划分阶段所形成的任务图［７］进行优化、合并的技术及相应的算法，用于在并行与通信开销间进行折衷，以使分解出的并行成份有尽可能高的执行效率。本文还给出了根据综合后的任务图形成可并行执行包，并在其中自动插入通信原语的方法。相似文献

9.

基于OpenGL的三维地形场景绘制及并行方法探讨

杨菲张小龙《现代计算机》2005,(10):104-107

三维地形场景的真实感绘制常被应用于一些大规模综合分布仿真系统中.这些仿真系统有时有并行绘制的需求,例如远程显示场景等.而OpenGL作为一种开放式的图形工业标准,利用它可以方便高效地实现三维场景的真实感绘制.本文在分析OpenGL内在特点的基础上,举例说明基于OpenGL的三维地形场景真实感绘制的一般过程,然后讨论基于局域网的OpenGL三维地形场景真实感绘制程序的并行方法. 相似文献

10.

Communication latency tolerant parallel algorithm for particle swarm optimization 总被引：1，自引：0，他引：1

Bo Li Koichi Wada 《Parallel Computing》2011,37(1):1-10

Particle swarm optimization (PSO) algorithm is a population-based algorithm for finding the optimal solution. Because of its simplicity in implementation and fewer adjustable parameters compared to the other global optimization algorithms, PSO is gaining attention in solving complex and large scale problems. However, PSO often requires long execution time to solve those problems. This paper proposes a parallel PSO algorithm, called delayed exchange parallelization, which improves performance of PSO on distributed environment by hiding communication latency efficiently. By overlapping communication with computation, the proposed algorithm extracts parallelism inherent in PSO. The performance of our proposed parallel PSO algorithm was evaluated using several applications. The results of evaluation showed that the proposed parallel algorithm drastically improved the performance of PSO, especially in high-latency network environment. 相似文献

11.

一种改进的数字图像加密算法 总被引：2，自引：0，他引：2

张雪锋范九伦《计算机工程与应用》2007,43(9):49-52

为了能对经过剪切变换攻击的加密图像进行有效恢复,给出了一种基于混沌序列的数字图像加密算法,该方法在预处理的基础上,基于数字图像灰度矩阵位平面进行置乱加密,在恢复过程中通过象素邻域灰度信息进行恢复,能够有效抵御针对加密图像的剪切变换攻击,同时给出了一种非线性的、应用高维混沌系统产生二值序列的方法。实验表明,该方法具有较好的效率和安全性。相似文献

12.

大规模双语句对自动获取技术

王澍郑德权赵铁军《电脑学习》2012,2(3):72-75

从互联网上挖掘大量双语平行句对,可以快速有效地构建大规模双语资源,服务于统计机器翻译。从挖掘对象的不同,将网络数据源分成对照网页和平行网页两类,提出一种抽取双语句对的方法。首先,从上述两类网页中分别抽取平行文本段,对照网页文本段抽取的主要方法为页面过滤和模板匹配,而平行网页依赖于网页结构的相似,采用对应节点匹配方法;其次,采用Gale—Church算法进行句对齐,得到平行句对;最后统一进行后处理。实验结果表明,从对照网页获取平行句对的准确率达到93．3％,平行网页为93．5％。相似文献

13.

Alignment of three-dimensional molecules using an image recognition algorithm

Richmond NJ Willett P Clark RD 《Journal of molecular graphics & modelling》2004,23(2):199-209

This paper describes a novel approach, based on image recognition in two dimensions, for the atom-based alignment of two rigid molecules in three dimensions. The atoms are characterised by their partial charges and their positions relative to the remaining atoms in the molecule. Based on this information, a cost of matching a pair of atoms, one from each molecule, is assigned to all possible pairs. A preliminary set of intermolecular atom equivalences that minimises the total atom matching cost is then determined using an algorithm for solving the linear assignment problem. Several geometric heuristics are described that aim to reduce the number of atom equivalences that are inconsistent with the 3D structures. Those that remain are used to calculate an alignment transformation that achieves an optimal superposition of atoms that have a similar local geometry and partial charge. This alignment is then refined by calculating a new set of equivalences consisting of atom pairs that are approximately overlaid, irrespective of partial charge. A range of examples is provided to demonstrate the efficiency and effectiveness of the method. 相似文献

14.

非规则数据场并行体绘制算法 总被引：1，自引：0，他引：1

任继成刘慎权《计算机学报》1998,21(9):813-818

并行算法是实现体绘制加速的重要途径，然而现有的并行体制绘制算法大部分是针对规则数据场的。相似文献

15.

Parallel algorithms for bipartite matching problems on distributed memory computers

Johannes Langguth Md. Mostofa Ali PatwaryFredrik Manne 《Parallel Computing》2011,37(12):820-845

We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph suitable for distributed memory computers.The presented algorithm is based on the Push-Relabel algorithm which is known to be one of the fastest algorithms for the bipartite matching problem. Previous attempts at developing parallel implementations of it have focused on shared memory computers using only a limited number of processors.We first present a straightforward adaptation of these shared memory algorithms to distributed memory computers. However, this is not a viable approach as it requires too much communication. We then develop our new algorithm by modifying the previous approach through a sequence of steps with the main goal being to reduce the amount of communication and to increase load balance. The first goal is achieved by changing the algorithm so that many push and relabel operations can be performed locally between communication rounds and also by selecting augmenting paths that cross processor boundaries infrequently. To achieve good load balance, we limit the speed at which global relabelings traverse the graph. In several experiments on a large number of instances, we study weak and strong scalability of our algorithm using up to 128 processors.The algorithm can also be used to find ?-approximate matchings quickly. 相似文献

16.

A space-efficient parallel sequence comparison algorithm for a message-passing multiprocessor

Xiaoqiu Huang 《International journal of parallel programming》1989,18(3):223-239

We present a parallel algorithm for computing an optimal sequence alignment in efficient space. The algorithm is intended for a message-passing architecture with one-dimensional-array topology. The algorithm computes an optimal alignment of two sequences of lengthsM andN inO((M+N) ²/P) time andO((M+N)/P) space per processor, where the number of processors isP>=max(M, N). Thus, whenP=max(M, N) it achieves linear speedup and requires constant space per processor. Some experimental results on an Intel hypercube are provided.This research was supported by NIH Grant LM05110 from the National Library of Medicine. 相似文献

17.

Adaptive Lucas-Kanade tracking

《Image and vision computing》2019

Dense image alignment, when the displacement between the frames is large, can be a challenging task. This paper presents a novel dense image alignment algorithm, the Adaptive Forwards Additive Lucas-Kanade (AFA-LK) tracking algorithm, which considers the scale-space representation of the images, parametrized by a scale parameter, to estimate the geometric transformation between an input image and the corresponding template. The main result in this framework is the optimization of the scale parameter along with the transformation parameters, which permits to significantly increase the convergence domain of the proposed algorithm while keeping a high estimation precision. The performance of the proposed method was tested in various computer-based experiments, which reveal its interest in comparison with geometric as well as learning-based methods from the literature, both in terms of precision and convergence rate. 相似文献

18.

A bandwidth latency tradeoff for broadcast and reduction

Peter Sanders Jop F Sibeyn 《Information Processing Letters》2003,86(1):33-38

The “fractional tree” algorithm for broadcasting and reduction is introduced. Its communication pattern interpolates between two well known patterns—sequential pipeline and pipelined binary tree. The speedup over the best of these simple methods can approach two for large systems and messages of intermediate size. For networks which are not very densely connected the new algorithm seems to be the best known method for the important case that each processor has only a single (possibly bidirectional) channel into the communication network. 相似文献

19.

Optimizing conformality of NURBS surfaces by general bilinear transformations

《Computer aided design》2015

The conformality of NURBS surfaces greatly affects the results of rendering and tessellation applications. To improve the conformality of NURBS surfaces, an optimization algorithm using general bilinear transformations is presented in this paper. The conformality energy is first formulated and its numerical approximation is then constructed using the composite simpson’s rule. The initial general bilinear transformation is obtained by approximating the conformal mapping of its 3D discretized mesh using a least square method, which is further optimized by the Levenberg–Marquardt method. Examples are given to show the performance of our algorithm for rendering and tessellation applications. 相似文献

20.

基于译文的英汉双语句子自动对齐 总被引：1，自引：0，他引：1

钱丽萍赵铁军杨沫昀高光来《小型微型计算机系统》2001,22(1):123-125

本文利用英汉互译译文间的内在联系,提出了基于译文的方法,通过使用一部翻译较完整的词典作为桥梁,将英汉句子间的对应关系连结起来,根据英语文本中的单词,在词典中找其对应的译文,并以译文到汉语句子中去匹配,根据评价函数和动态规划算法找到对齐句对,实验结果证明这种对齐方法消除了基于长度做法中错误蔓延的情况。并且普遍适用于任何文本,它大大地提高了对齐的精度,其效果是令人满意的。相似文献