Similar literature
 20 similar documents found (search time: 156 ms)
1.
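Parallelism analysis is a key analysis technique in parallelizing compilers and an active research topic in the field. Its goal is to analyze the dependences in a serial program, extract the parallelizable parts, and on that basis transform and partition the serial program. This paper discusses the design framework and implementation of the parallelism analysis module in JAPS, a Java-based automatic parallelizing compiler system.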

2.
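Parallel genetic algorithms (PGA) combine the high-speed parallelism of parallel computers with the inherent parallelism of genetic algorithms, which has greatly advanced the research and application of genetic algorithms. This paper summarizes and reviews recent work on the models, performance analysis, algorithmic improvements, and implementation platforms of parallel genetic algorithms, and looks ahead to their main future research directions and prospects.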

3.
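Parallel genetic algorithms (PGA) combine the high-speed parallelism of parallel computers with the inherent parallelism of genetic algorithms, which has greatly advanced the research and application of genetic algorithms. This paper summarizes and reviews recent work on the models, performance analysis, algorithmic improvements, and implementation platforms of parallel genetic algorithms, and looks ahead to their main future research directions and prospects.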

4.
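A parallel genetic algorithm for the non-identical parallel machine scheduling problem   Total citations: 4, self-citations: 0, citations by others: 4
高家全  方蕾 《计算机工程》(Computer Engineering) 2007,33(1):198-199
For the class of non-identical parallel machine scheduling problems that minimize the makespan, a hybrid genetic algorithm is proposed. Based on the characteristics of the problem, the algorithm adopts a natural encoding scheme that corresponds one-to-one with schedules, and the initial population, crossover, and mutation methods are studied. Exploiting the natural parallelism of genetic algorithms, a parallel hybrid genetic algorithm is implemented in a master-slave controlled network mode. Computational results show that the parallel hybrid genetic algorithm is effective, outperforms both heuristic algorithms and the plain genetic algorithm, has a high degree of parallelism, and is suitable for large-scale non-identical parallel machine scheduling problems.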
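The abstract does not spell out the encoding, so the following is only a minimal sketch under an assumption: a chromosome is a list in which position `j` holds the machine assigned to job `j`, and fitness is the makespan (the largest machine load). The names `makespan`, `assignment`, and `proc_time` are hypothetical and not taken from the paper.

```python
from typing import List


def makespan(assignment: List[int], proc_time: List[List[float]]) -> float:
    """Return the makespan of a schedule on non-identical machines.

    assignment[j] is the machine chosen for job j (assumed encoding).
    proc_time[m][j] is the processing time of job j on machine m.
    """
    loads = [0.0] * len(proc_time)
    for job, machine in enumerate(assignment):
        # Each job contributes its machine-specific processing time.
        loads[machine] += proc_time[machine][job]
    return max(loads)


# Hypothetical instance: 2 machines, 3 jobs.
proc_time = [
    [2.0, 4.0, 3.0],  # times of jobs 0..2 on machine 0
    [3.0, 1.0, 5.0],  # times of jobs 0..2 on machine 1
]
assignment = [0, 1, 0]  # jobs 0 and 2 on machine 0, job 1 on machine 1
example_makespan = makespan(assignment, proc_time)
```

Under this encoding every list of machine indices is a valid schedule, which is one way an encoding can correspond one-to-one with schedules. In a master-slave setup, fitness evaluations like this one are the natural unit of work to distribute to the slaves.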

5.
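1. Introduction. The development of parallel supercomputers is closely tied to the development of parallel processing techniques (the exploitation and use of parallelism). There are many kinds of parallelism; the two most important in parallel processing today are: (1) control parallelism, which allows multiple different operations to proceed at the same time. Typical examples of exploiting control parallelism are pipelining and multiple functional units.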

6.
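PBASE architecture and parallelism   Total citations: 1, self-citations: 0, citations by others: 1
王珊  陈红 《计算机学报》(Chinese Journal of Computers) 1996,19(3):186-190
Parallel database systems aim for high transaction throughput and low response time, and a good architecture greatly helps achieve this goal. This paper introduces the overall structure of the parallel database system PBASE, the structure of its dispatch processes, and how PBASE achieves inter-transaction and inter-query parallelism. The architecture adopted by PBASE makes parallelism easy to implement and meets the expected goals of a parallel database.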

7.
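To study parallel graphics rendering, this paper introduces the pipeline process of graphics rendering, analyzes its inherent parallelizability, and examines ways of implementing parallel rendering, including pipeline parallelism, data parallelism, and job parallelism, as well as pre-distribution, mid-distribution, and post-distribution tiled compositing. It also discusses the main problems facing parallel rendering and its development trends.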

8.
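Research on parallel I/O techniques   Total citations: 7, self-citations: 0, citations by others: 7
Starting from an analysis of ways to improve I/O performance, this paper analyzes and discusses the issues involved in parallel file systems that exploit storage-system parallelism for data access in distributed-memory high-performance computers, and then introduces several well-known parallel file systems.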

9.
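胡雷刚  肖明清  王磊 《计算机测量与控制》(Computer Measurement & Control) 2008,16(10):1373-1375,1379
Different groups of UUT test tasks show different gains in parallel test efficiency in parallel test systems. To describe this inherent property of test tasks during parallel testing, the concept of the parallelizability of test tasks is proposed. A degree-of-parallelizability metric is then given to quantitatively guide the allocation of instrument resources during the development of parallel test systems. Finally, an application example verifies the soundness of the parallelizability concept and the practicality of the metric. The concept enriches the theoretical basis of the testing field and offers guidance for allocating instrument resources in parallel test systems.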

10.
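Traditional parallelizing compilation techniques can perform dependence analysis at compile time and effectively parallelize loops, but they cannot exploit parallelism that only becomes apparent at run time. Parallelizing compilers must therefore use run-time dependence analysis to exploit as much loop-level parallelism as possible. This paper proposes affine dependence relations, which eliminate loop-iteration dependences, and, based on the idea of speculative parallelism, proposes the SPAD method. A case analysis shows that SPAD is effective. Compared with the LRPD and SPNT methods, SPAD makes important improvements and is therefore a more general speculative parallelization scheme.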

11.
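Parallelism analysis based on a generalized method-invocation model   Total citations: 3, self-citations: 0, citations by others: 3
This paper presents a method for analyzing parallelism between object methods that takes into account polymorphism and object-reference aliasing in object-oriented languages. The method is used for parallelism analysis when parallelizing object-oriented languages. The paper first gives a generalized method-invocation model and then, based on this model, gives algorithms for expression simplification and for intraprocedural and interprocedural analysis. These algorithms compute the definition and use sets of variables, which are then used for parallelism analysis. A simple example given in the paper distinguishes this work from related work. The technique has been implemented in JAPS-Ⅱ, a Java parallelizing compiler developed by the authors.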
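The abstract does not say which test is applied to the definition and use sets. One standard choice is Bernstein's conditions, used here as an assumption: two code regions may run in parallel when neither writes a variable the other reads or writes. The function name and the variable sets below are hypothetical, and the sketch ignores the aliasing and polymorphism that the paper's model is designed to handle.

```python
from typing import Set


def can_run_in_parallel(
    def_a: Set[str], use_a: Set[str], def_b: Set[str], use_b: Set[str]
) -> bool:
    """Check Bernstein's conditions on two regions' def/use sets."""
    no_flow_dependence = not (def_a & use_b)    # A writes, B reads
    no_anti_dependence = not (use_a & def_b)    # A reads, B writes
    no_output_dependence = not (def_a & def_b)  # both write
    return no_flow_dependence and no_anti_dependence and no_output_dependence


# Hypothetical method bodies summarized by their def/use sets.
independent = can_run_in_parallel({"x"}, {"a"}, {"y"}, {"b"})
flow_dependent = can_run_in_parallel({"x"}, {"a"}, {"y"}, {"x"})
```

With aliasing, two different names can refer to the same object, so a real analysis would have to compare sets of possible objects rather than variable names.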

12.
This paper presents a system for parallel execution of Prolog supporting both independent conjunctive and disjunctive parallelism. The system is intended for distributed memory architectures and is composed of a set of workers with a hierarchically structured scheduler. The execution model has been designed in such a way that each worker's environment does not contain references to terms in other environments, thus reducing communication overhead. In order to guarantee that exploiting parallelism improves performance, granularity control has been introduced for each kind of parallelism. For conjunctive parallelism PDP applies a control based on the estimation provided by CASLOG. The features of the system allow this control to be introduced without adding overhead. For disjunctive parallelism PDP controls granularity by applying a heuristic-based method, which can be adapted to other parallel Prolog systems. Different scheduling policies have also been tested. The system has been implemented on a transputer network and performance results show that it provides a high speedup for coarse-grain parallel programs.

13.
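The new generation of video coding standards achieves higher coding efficiency but also increases the computational load. Parallel HEVC (High Efficiency Video Coding) algorithms can speed up encoding, and developing parallel coding algorithms suited to multi-core processors is important for real-time transmission and large-scale real-time sharing of high-definition video. The paper analyzes the data dependences that arise when the intra prediction algorithm processes pixels and designs fine-grained parallelism based on prediction modes. Blocks are processed in a pipelined fashion to reduce the execution time of the intra prediction algorithm. The algorithm is verified on a dynamically programmable, reconfigurable video array processor. Experimental results show that, compared with the official HM16.0 test standard, the signal-to-noise ratio improves by 10% and the execution time of the algorithm is reduced by about 70%.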

14.
Wei Xing, Hu Huiqi, Duan Huichao, Qian Weining, Zhou Aoying 《World Wide Web》2019,22(6):2561-2587

To support large-scale analytics for Web applications, the backend distributed data management system must provide a service for accessing massive data. Thus, the scan operation becomes a critical step. To improve the performance of scan operations, modern data management systems usually rely on simple partitioned parallelism. Under partitioned parallelism, tables consist of several partitions, and each scan operation can access multiple partitions separately. It is a simple and effective solution for a single scan operation. In this paper, we consider managing multiple scan operations together, where the situation is no longer straightforward. To address the problem, we propose a parallel strategy that schedules batched scan operations together, going beyond simple partitioned parallelism. For the sake of performance, first, we utilize replicas to increase the parallelism and propose an effective load balancing strategy over replica nodes based on linear programming. Second, we propose an effective chunk-based scheduling algorithm for multi-threaded parallelism on each node to guarantee that all threads have even workloads under a qualified cost model. Finally, we integrate our parallel scan strategy into an open-source distributed data management system. Experimental evaluation shows that our parallel scan strategy significantly improves the performance of scan operations.
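The abstract gives no details of the chunk-based scheduling algorithm or its cost model. As a rough illustration only, the sketch below assumes that the cost of a chunk is its row count and uses a greedy "give the next chunk to the least-loaded thread" rule; it is not the authors' algorithm. The names `split_into_chunks` and `assign_chunks` are hypothetical.

```python
import heapq
from typing import List, Tuple

Chunk = Tuple[int, int, int]  # (partition id, start row, end row)


def split_into_chunks(num_rows: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Cut the row range [0, num_rows) into pieces of at most chunk_size rows."""
    return [
        (start, min(start + chunk_size, num_rows))
        for start in range(0, num_rows, chunk_size)
    ]


def assign_chunks(
    partition_rows: List[int], chunk_size: int, num_threads: int
) -> Tuple[List[List[Chunk]], List[int]]:
    """Greedily assign chunks to threads, largest chunk first."""
    chunks = []
    for pid, rows in enumerate(partition_rows):
        for start, end in split_into_chunks(rows, chunk_size):
            chunks.append((end - start, pid, start, end))
    chunks.sort(reverse=True)

    # Min-heap of (current load, thread id): the root is the least-loaded thread.
    heap = [(0, t) for t in range(num_threads)]
    plan: List[List[Chunk]] = [[] for _ in range(num_threads)]
    for cost, pid, start, end in chunks:
        load, thread = heapq.heappop(heap)
        plan[thread].append((pid, start, end))
        heapq.heappush(heap, (load + cost, thread))

    loads = [sum(end - start for _, start, end in p) for p in plan]
    return plan, loads


# Hypothetical input: three partitions of very different sizes, four threads.
plan, loads = assign_chunks([1000, 300, 100], chunk_size=100, num_threads=4)
```

Assigning whole partitions to threads would leave one thread with 1000 rows and another with 100. Cutting partitions into chunks keeps the gap between the busiest and idlest thread within one chunk under this simple cost assumption.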

15.
As technology improves and transistor feature sizes continue to shrink, the effects of on-chip interconnect wire latencies on processor clock speeds will become more important. In addition, as we reach the limits of instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that can efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest neighbor connections. Memory is included on-chip with the processors and the architecture includes hardware support for communication and the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires will be determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.

16.
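This paper proposes MASA, a 64-bit stream architecture for scientific computing. Its features include strong locality, parallelism, and the decoupling of memory-access operations from compute operations, which make it particularly suitable for compute-intensive parallel applications. Using a cycle-accurate simulator, the authors evaluate the performance of typical fluid-dynamics applications on MASA. The results show that MASA at 500 MHz achieves a speedup of nearly 4x over a 1.6 GHz Itanium 2, confirming the great potential of stream architectures in high-performance computing.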

17.
This paper proposes a predicate named dosim which provides a new function for parallel execution of logic programs. The parallelism achieved by this predicate is a simultaneous mapping operation such as the bagof and setof predicates. However, the degree of parallelism can be easily decided by arranging the arguments of the dosim goal. The parallel processing system with dosim was realized on a tightly coupled multiprocessor machine. To control the degree of parallelism and reduce the amount of memory required for execution, we introduce a grouping method for the goals executed in parallel and some variations of the dosim predicate. The effectiveness of the proposed method is demonstrated by the results of the execution of several applications.

18.
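陈嘉  安虹  刘圆  王莉 《计算机仿真》(Computer Simulation) 2007,24(6):81-85
Multi-core architectures use a parallel programming model explicitly directed by the user, with locks and synchronization variables for synchronization. The transactional memory model can resolve a series of problems caused by locking and improve program concurrency. This paper introduces a parallel programming model for a proposed multi-core architecture based on transactional memory (Transactional-Memory based Chip Multiple-Superscaler, TMCMS), together with an execution model for loops. Taking an FFT program as an example, it describes the parallelization method for loop structures and the compilation and transformation process. In preliminary experiments, as the number of processing units increases from 1 to 16, IPC (Instructions Per Cycle) grows nearly linearly with the support of the proposed programming model, indicating that the model can fully exploit the latent fine-grained thread-level parallelism in programs while keeping parallel programming simple.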

19.
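叶孝斌  杨树强 《计算机工程》(Computer Engineering) 2000,26(3):57-58,76
Parallel I/O is one of the effective ways to improve the performance of shared-nothing parallel database systems. It provides high-bandwidth I/O through parallel disk service and parallelized network transfer. This paper designs and implements parallel I/O for a shared-nothing parallel database system and discusses several key design issues and implementation techniques.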

20.
Developing parallel object-oriented programs in the framework of VDM   Total citations: 2, self-citations: 0, citations by others: 2
After surveying rely-guarantee and some related approaches to extending VDM to develop parallel programs, two main problems are found. One problem is that all exploration of parallelism is done in the stage of operation decomposition or afterwards, so that the degree of parallelism is restricted. The other problem is that atomicity is fixed at one level and the development complexity cannot be controlled effectively, because there is no natural means to put the level of granularity under the flexible control of the designer. In order to solve these two problems, we introduce a new concept, data decomposition, which is based on the ideas of model split, modularisation and operation decomposition, and combine it with VDM to form a more general formal development method, DD-VDM, in which some kinds of operation decomposition, i.e., operation split, can be done before some data reifications. Then a nested parallel object-oriented structure is proposed. Combining these ideas into a unified framework, this paper presents a hierarchical object-oriented design methodology in which two kinds of parallelism, that is, internal parallelism and service parallelism, can be exploited gradually and a kind of virtual atomicity is provided. This research is partially supported by China's National Foundation for Excellent Young Scientists.
