首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
遥感图像的镶嵌处理具有数据量大,流程复杂,算法处理耗时巨大的特点,并行计算是加速镶嵌处理过程速度的有效手段。但是,传统的并行镶嵌算法由于任务分配采用静态策略,导致计算节点负载不均衡,并行效率不高。同时,由于传统并行镶嵌算法中存在大量非常耗时的数据存取操作,并且在重采样和匀色过程中存在不合理的流程配置,使得并行效率降低,难以得到比较线性的加速比。本文提出的基于动态任务分配和多线程并行I/O的并行镶嵌算法,较好地解决了上述问题,通过对比分析和实验表明,本算法对大规模图像的镶嵌处理,具有较好的并行处理速度,以及理想的线性并行加速比曲线,节点扩展能力较强。  相似文献   

2.
A platform program that performs biological sequence comparison provides a case study to compare the relative advantages of a machine-independent approach to parallel computation versus a machine-specific approach. The program consists of two routines: (i) PSCANLIB, which compares a single biological sequence against a database of sequences, and (ii) PCOMPLIB, which compares a database of sequences against another database of sequences, or against itself. The program was first parallelized to run on the Intel Hypercube parallel computer using native Hypercube commands to coordinate the parallel computation. The parallelization logic of the program was then translated into a machine-independent parallel programming language, Linda. These two approaches to parallelization are contrasted in terms of: (i) the expressive power of the logic that coordinates the parallel computation, (ii) the portability of the machine-independent version to other parallel machines and (iii) the relative efficiency of the two versions of the program. In the benchmark tests reported, the benefits of the machine-independent approach were achieved with only a modest sacrifice in efficiency.  相似文献   

3.
基于TBB的傅里叶变换多核并行化实现   总被引:2,自引:0,他引:2       下载免费PDF全文
杨川  杨斌 《计算机工程》2010,36(16):288-290
通过对传统傅里叶变换的分析,发现其运行的瓶颈主要是循环体的运算效率低下,并且程序执行时只会被分配到一个硬件核上,并没有充分利用多核。针对上述问题,通过对英特尔线程构建模块(TBB)的研究与应用,使得循环体内的运算被划分为各个相互独立的空间,并把这些空间的运算尽可能分配到多核上,实现了对传统傅里叶变换的并行化改造,并取得较好的效果。  相似文献   

4.
非定常Monte Carlo输运问题的并行算法   总被引:1,自引:0,他引:1  
文中给出了非定常MonteCarlo(下文简写为MC)输运问题的并行算法 ,对并行程序的加载运行模式进行了讨论和优化设计 .针对MC并行计算设计了一种理想情况下无通信的并行随机数发生器算法 .动态MC输运问题有大量的I/O操作 ,特别是读取剩余粒子数据文件需要大量的I/O时间 ,文中针对I/O问题 ,提出了三种并行I/O算法 .最后给出了并行算法的性能测试结果 ,对比串行计算时间 ,使用 6 4台处理机时的并行计算时间缩短了 30倍  相似文献   

5.
In this paper we consider the problem of computing the connected components of the complement of a given graph. We describe a simple sequential algorithm for this problem, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time. Moreover, unlike previous linear co-connectivity algorithms, this algorithm admits efficient parallelization, leading to an optimal O(log n)-time and O((n+m)log n)-processor algorithm on the EREW PRAM model of computation. It is worth noting that, for the related problem of computing the connected components of a graph, no optimal deterministic parallel algorithm is currently available. The co-connectivity algorithms find applications in a number of problems. In fact, we also include a parallel recognition algorithm for weakly triangulated graphs, which takes advantage of the parallel co-connectivity algorithm and achieves an O(log2 n) time complexity using O((n+m2) log n) processors on the EREW PRAM model of computation.  相似文献   

6.
为有效解决粒子群算法在求解路网问题时易陷入局部最优的缺点,根据高校地理数据和多核系统并行处理方式,将自平衡策略和变异思想结合且并行化,提出一种并行求解高校路网问题的正序变异的混合PSO算法。该算法引入适合此问题的自平衡正序变异策略且采用并行处理方式,使其生成相互独立子群体且并行求解,来提高算法求解精度,保证算法多样性及收敛,降低计算时间。实验以Visual Studio 2005中C++编程实现仿真,结果表明此算法不但能有效求解高校路网问题,而且比离散PSO算法、并行自平衡PSO算法的解更优。  相似文献   

7.
随着处理器速度和网络传输速度的快速提升,I/O成为限制计算机性能的一大瓶颈,并行I/O提供了解决这一问题的有效途径。在并行I/O的实现中,数据访问调度策略对系统的性能有着重要的影响。文章通过对集群计算中现有I/O数据访问调度策略的研究,针对调度器瓶颈问题,提出了自主协助动态I/O调度策略,实验结果表明了所提出策略的有效性。  相似文献   

8.
A platform for biological sequence comparison on parallel computers   总被引:1,自引:0,他引:1  
We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16 node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32 node Intel iPSC/860 hypercube.  相似文献   

9.
Analyzing gene expression patterns is becoming a highly relevant task in the Bioinformatics area. This analysis makes it possible to determine the behavior patterns of genes under various conditions, a fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, which is the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, even though biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck due to its NP-Completeness, so its parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate the implementation of a parallel version of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size, being able to handle severe load imbalances that are inherent to the problem. Further more, the parallelization strategy is applicable to any depth-first searches.  相似文献   

10.
Heuristics for scheduling I/O operations   总被引:1,自引:0,他引:1  
The I/O bottleneck in parallel computer systems has recently begun receiving increasing interest. Most attention has focused on improving the performance of I/O devices using fairly low level parallelism in techniques such as disk striping and interleaving. Widely applicable solutions, however, will require an integrated approach which addresses the problem at multiple system levels, including applications, systems software, and architecture. We propose that within the context of such an integrated approach, scheduling parallel I/O operations will become increasingly attractive and can potentially provide substantial performance benefits. We describe a simple I/O scheduling problem and present approximate algorithms for its solution. The costs of using these algorithms in terms of execution time, and the benefits in terms of reduced time to complete a batch of I/O operations, are compared with the situations in which no scheduling is used, and in which an optimal scheduling algorithm is used. The comparison is performed both theoretically and experimentally. We have found that, in exchange for a small execution time overhead, the approximate scheduling algorithms can provide substantial improvements in I/O completion times  相似文献   

11.
This paper presents new efficient shortest path algorithms to solve single origin shortest path problems (SOSP problems) and multiple origins shortest path problems (MOSP problems) for hierarchically clustered data networks. To solve an SOSP problem for a network with n nodes, the distributed version of our algorithm reaches the time complexity of O(log(n)), which is less than the time complexity of O(log 2 (n)) achieved by the best existing algorithm. To solve an MOSP problem, our algorithm minimizes the needed computation resources, including computation processors and communication links for the computation of each shortest path so that we can achieve massive parallelization. The time complexity of our algorithm for an MOSP problem is O(m log(n)), which is much less than the time complexity of O(M log2 (0)) of the best previous algorithm. Here, M is the number of the shortest paths to be computed and m is a positive number related to the network topology and the distribution of the nodes incurring communications, m is usually much smaller than M. Our experiment shows that m is almost a constant when the network size increases. Accordingly, our algorithm is significantly faster than the best previous algorithms to solve MOSP problems for large data networks  相似文献   

12.
本文提出了一种新颖的并行程序配置优化算法。这种算法利用黑板系统将配置优化问题分解组织为不同层次的知识领域,并利用A^*算法对决策树进行搜索。研究了并行程序的任务调度、存储服务器的数据分配、自适应分片、协同I/O和数据筛选五个知识领域。  相似文献   

13.
FFT(快速傅里叶变换)是基于提高DFT(离散傅里叶变换)计算的高效算法,它在众多科学和工程领域都得到了广泛的应用。自FFT算法出现以后,从早期的以降低复杂度到近年以来的大规模并行FFT计算,各种优化算法得到广泛的研究。在并行运算领域中,随着可编程的、并行化GPU的不断推广,特别是通用并行统一计算架构CUDA的出现,极大增强了GPU的计算能力,在编程和优化等方面都有显著地提升。鉴于此,本文在分析FFT算法实现的基础上,研究了一种适合GPU运算的FFT并行计算方法,并通过CUDA架构实现了FFT算法在GPU上的运算。该方法的引入在理论不计算数据传输的情况下,使一维FFT运算时间的复杂度由O(N logN2)可以降到O(N/rlogN2)。通过验证,本文提出的CUDA的并行FFT方法得到较好的加速效果,在精度计算上也符合实际的要求,从而证明了该方法的正确性和有效性。  相似文献   

14.
背包问题无存储冲突的并行三表算法   总被引:4,自引:0,他引:4  
背包问题属于经典的NP难问题,在信息密码学和数论等研究中具有极重要的应用,将求解背包问题著名的二表算法的设计思想应用于三表搜索中,利用分治策略和无存储冲突的最优归并算法,提出一种基于EREW-SIMD共享存储模型的并行三表算法,算法使用O(2^n/4)个处理机单元和O(2^3n/8)的共享存储空间,在O(2^3n/8)时间内求解n维背包问题.将提出的算法与已有文献结论进行的对比分析表明:文中算法明显改进了现有文献的研究结果,是一种可在小于O(2^n/2)的硬件资源上,以小于O(2n/2)的计算时问求解背包问题的无存储冲突并行算法。  相似文献   

15.
Today's massively parallel machines are typically message-passing systems consisting of hundreds or thousands of processors. Implementing parallel applications efficiently in this environment is a challenging task, and poor parallel design decisions can be expensive to correct. Tools and techniques that allow the fast and accurate evaluation of different parallelization strategies would significantly improve the productivity of application developers and increase throughput on parallel architectures. This paper investigates one of the major issues in building tools to compare parallelization strategies: determining what type of performance models of the application code and of the computer system are sufficient for a fast and accurate comparison of different strategies. The paper is built around a case study employing the performance prediction tool (PerPreT) to predict performance of the parallel spectral transform shallow water model code (PSTSWM) on the Intel Paragon. PSTSWM is a parallel application code that was designed to evaluate different parallel strategies for the spectral transform method as it is used in climate modeling and weather forecasting. Multiple parallel algorithms and algorithm variants are embedded in the code. PerPreT uses a relatively simple algebraic model to predict execution time for SPMD (single program multiple data) parallel applications. Applications are modeled through parameterized formulae for communication and computation, where the parameters include the problem size, the number of processors used to execute the program, and system characteristics (e.g. setup times for communication, link bandwidth and sustained computing performance per processor). In this paper we describe performance models that predict the performance of the different algorithms in PSTSWM accurately enough to allow them to be compared, establishing the feasibility of such a demanding application of performance modeling. We also discuss issues in generating and validating the performance models, emphasizing the practical importance of tools such as PerPreT in such studies. © 1998 John Wiley & Sons, Ltd.  相似文献   

16.
N-body问题涉及了科学和工程中的许多领域,它的主要特点就是O(N~2)的计算量,采用并行计算方法是解决N-body问题巨大计算量的终极选择。针对该类问题的具体特点以及不同的并行计算机体系结构,目前有多种算法有效地减少了计算量,加快了求解速度。本文介绍了N-body问题的几种常见算法和它们的并行化方法。  相似文献   

17.
分层分布式狄利克雷分布(HD-LDA)算法是一个对潜在狄利克雷分布(LDA)进行改进的基于概率增长模型的文本分类算法,与只能在单机上运行的LDA算法相比,可以运行在分布式框架下,进行分布式并行处理。Mahout在Hadoop框架下实现了HD-LDA算法,但是因为单节点算法的计算量大,仍然存在对大数据分类运行时间太长的问题。而大规模文本集合分散到多个节点上迭代推导,单个节点上文档集合的推导仍是顺序进行的,所以处理大规模文本集合时仍然需要很长时间才能完成全部文本的分类。为此,提出将Hadoop与图形处理器(GPU)相结合,将单节点文本集合的推导过程转移到GPU上运行,实现单节点多个文档并行推导,利用多台并行的GPU对HD-LDA算法进行加速。应用结果表明,使用该方法能使分布式框架下的HD-LDA算法对大规模文本集合处理达到7倍的加速比。  相似文献   

18.
如何有效地解决I/O瓶颈问题,一直是高性能并行计算机有待解决的关键技术。该文提出了一种高效共享的并行I/O系统——HPPIO,该系统基于CC-NUMA并行系统结构,采用了一系列高效共享、并行I/O技术。该文对其分布与集中相结合的高效共享并行I/O系统结构、基于PCI Express的高性能I/O控制器设计等进行了介绍。  相似文献   

19.
头脑风暴优化BSO算法是一种新型的群体智能优化算法,启发于众人集思广益求解问题的模式,适合求解复杂多峰函数优化问题。但是,BSO求解多峰极值时需进行重复的迭代运算,面对大规模数据集时会出现计算效率与求解精度过低的现象。为解决上述问题,设计并实现了一种基于Spark的并行化头脑风暴优化算法,通过将BSO算法中计算复杂度最高的聚类与新解产生过程并行化,以提高算法的加速比与计算效率。特别地,基于并行化思想,将种群划分为多个子群进行协同演化,每个子群独立产生新解来保持种群多样性,提高算法的收敛速度。最后,利用并行化BSO算法求解多峰函数。实验表明,在并行节点的总核心数为10的情况下,并行化BSO算法计算时间节省一半,计算精度和串行BSO算法基本持平,收敛速度明显提高,实验结果说明了并行化BSO的有效性。  相似文献   

20.
传统的Apriori算法要多次扫描数据集,随着数据量的快速增长,传统的Apriori算法已经不能很好地适用于大数据分析,针对该情况设计了IPApriori算法。首先通过剪枝策略设计了一种适用于多维数据的IApriori算法,再将IApriori算法与Hadoop分布式框架相结合,实现了多维关联规则挖掘算法的并行化。将IPApriori算法运用到手机用户行为预测关联分析中,分析影响手机用户行为的一些主要因素,挖掘出手机用户行为与年龄维度、性别维度、时间维度、地点维度和手机品牌维度属性之间可能存在的某种关联。最后通过实验证明,算法的并行化和建立结构的方法可以降低系统的I/O负荷,提高算法的执行效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号