20 similar documents found; search took 93 ms
1.
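This paper proposes DAPM, a new AND-parallel execution model that supports non-independent AND-parallelism. It establishes a producer-consumer style synchronization dependency among shared variables to prevent binding conflicts when they are executed in parallel. Compared with other models, DAPM can exploit more AND-parallelism. The paper also analyzes the run-time cost of DAPM theoretically; the analysis shows that DAPM needs only modest run-time support.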
2.
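Applying the data-parallel model to MIMD machines, with loose synchronization in SPMD mode, is attracting growing attention. This paper presents the design and implementation of Multi-c, a data-parallel language that targets a heterogeneous parallel system. The Multi-c compiler, currently being implemented, works as a precompiler: it accepts program descriptions in SIMD form, relaxes the synchronization requirements, and generates C programs that run on the parallel system in SPMD mode.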
3.
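This paper describes BTJ/WYSE, a Prolog-based parallel inference system developed on the WYSE Series 7000i parallel computer. It gives users a good software development environment for logic programming in Prolog and supports parallel execution of logic programs.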
4.
5.
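SICE (SIMDC Emulator), implemented in this work, is a software package that emulates SIMD programming on a sequential machine. SIC (SIMDC) is a SIMD parallel extension of the C language defined by the authors. It supports parallel statements that reflect the characteristics of SIMD architectures and, more importantly, supports the definition of SIMD structures, which makes it convenient for algorithm research on SIMD machines.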
6.
7.
8.
9.
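This paper presents two methods for re-entering MS-DOS and gives the program flow of a practical interface routine for safe DOS re-entry. An external utility procedure linked through this interface routine can re-enter DOS safely and execute in parallel with the foreground program.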
10.
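System software of the BJ-01 parallel computer. Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the design and implementation of MOS, the operating system of the BJ-01 parallel computer, the parallel C language PCL, and the interface software. It also discusses the parallel execution environment of the BJ-01 and its parallel program debugging tools.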
11.
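In a parallel-system simulation environment, we collected run-time data for one instance of an iterative parallel program, analyzed the main factors that affect its run time, and built a model for estimating parallel program run time. The model predicts run time along three dimensions: the number of iterations, the input data size, and the configuration of the parallel system. Experimental data show that the model is quite accurate and can save a large amount of simulation time.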
12.
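The gastrointestinal tract is a major digestive organ, and gastrointestinal diseases are common and have complex causes. To make the diagnosis and treatment of these diseases more intelligent and precise, and to pass on physicians' extensive experience effectively, this paper proposes a framework for a parallel gastrointestinal diagnosis and treatment system based on ACP theory. ACP theory, the core of parallel intelligence, consists of three parts: Artificial Societies, Computational Experiments, and Parallel Execution. In the proposed system, an artificial gastrointestinal tract (A) is built to model real diagnosis and treatment; computational experiments (C) run diagnosis and treatment experiments for various gastrointestinal diseases on the artificial platform and evaluate the best treatment plans; and parallel execution (P) guides actual diagnosis and treatment in real time while continuously updating the artificial system and optimizing treatment plans, so that virtual and actual diagnosis and treatment interact. The framework integrates knowledge graphs, deep learning, virtual reality/augmented reality, knowledge automation, and other frontier technologies, with the aim of optimizing gastrointestinal diagnosis and treatment and advancing the Healthy China initiative.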
13.
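As a large-scale processing framework for data centers, a MapReduce cluster contains hundreds or thousands of nodes and commonly relies on speculative execution to deal with straggler tasks in parallel computation. For target jobs in the cluster that have strict real-time requirements and a small number of tasks, this paper proposes a speculative-execution optimization algorithm based on the MapReduce model. Its goal is to minimize the completion time of the target job while meeting the real-time requirements. First, the task model and time model are analyzed and a 0-1 programming model is introduced to minimize the overall job completion time. Next, a heuristic algorithm of polynomial complexity is designed to approach the optimum as closely as the available resources allow. Finally, extensive simulation experiments verify the performance of the algorithm.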
14.
In this paper we introduce our estimation method for parallel execution times, based on identifying separate "parts" of the work done by parallel programs. Our run-time analysis works without any source code inspection. The execution time of a parallel program is expressed in terms of the sequential work and the parallel penalty. We measure these values for different problem sizes and numbers of processors and estimate them for unknown values in both dimensions using statistical methods. This allows us to predict parallel execution time for unknown inputs and unavailable processor counts with high precision. Our prediction methods require orders of magnitude fewer data points than existing approaches. We verified our approach on parallel machines ranging from a multicore computer to a peta-scale supercomputer.
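The abstract does not give the exact form of the decomposition into sequential work and parallel penalty. As a minimal sketch, assume the additive form T(n, p) = W(n)/p + Q(n, p), where W is the sequential work and Q the parallel penalty; this form and the function names below are illustrative assumptions, not the authors' definitions.

```python
def parallel_penalty(seq_work, processors, measured_time):
    """Penalty under the ASSUMED model T = W/p + Q: measured time minus the ideal share of work."""
    return measured_time - seq_work / processors


def predicted_time(seq_work, processors, penalty):
    """Predicted parallel time under the same assumed additive model."""
    return seq_work / processors + penalty


# Hypothetical measurement: 10 s of sequential work takes 3 s on 4 processors.
q = parallel_penalty(10.0, 4, 3.0)
print(q, predicted_time(10.0, 4, q))
```

In a scheme of this kind, W and Q would be measured at a few problem sizes and processor counts and then extrapolated statistically; the extrapolation step is not shown here.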
15.
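GPU-accelerated algorithms for extracting the Normalized Difference Vegetation Index (NDVI) usually adopt the GPU multithreaded parallel model, which suffers from problems such as time-consuming data transfers among weakly dependent computations and between the CPU and GPU, and these limit further speedup. To address these problems, and based on the characteristics of NDVI extraction, this paper proposes an NDVI extraction algorithm built on a GPU multi-stream concurrent parallel model. Using CUDA streams and the Hyper-Q feature, this model overlaps data transfers with weakly dependent computations, and weakly dependent computations with one another, which further raises the algorithm's parallelism and GPU resource utilization. The paper first optimizes the NDVI extraction algorithm with the GPU multithreaded parallel model and decomposes the optimized computation to find the parts that contain data transfers and weakly dependent computations. It then restructures those parts and optimizes them with the multi-stream concurrent parallel model so that weakly dependent computations overlap with each other and with data transfers. Finally, using remote sensing images captured by the Gaofen-1 satellite as experimental data, it validates both GPU-based NDVI extraction algorithms experimentally. The results show that, compared with the conventional NDVI extraction algorithm based on the GPU multithreaded parallel model, the proposed algorithm achieves an average speedup of about 1.5x on images larger than 12000 × 12000 pixels, and a speedup of about 260x over the serial extraction algorithm, giving better acceleration and parallelism.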
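For reference, the serial computation that such algorithms parallelize is the standard per-pixel formula NDVI = (NIR - Red) / (NIR + Red). The sketch below is a plain serial baseline in Python, not the paper's GPU code; the function name, the list-of-rows band layout, and the choice to return 0.0 where the denominator is zero are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized bands.

    Bands are lists of rows of reflectance values. Pixels whose denominator
    is zero are set to 0.0 (an assumed convention, not taken from the paper).
    """
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            denom = n + r
            row.append((n - r) / denom if denom != 0 else 0.0)
        result.append(row)
    return result


# Hypothetical 1 x 2 image: one vegetated pixel and one empty pixel.
print(ndvi([[0.5, 0.0]], [[0.1, 0.0]]))
```

Each pixel is independent of every other pixel, which is why the computation maps well onto GPU threads and why the remaining cost is dominated by data movement.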
16.
We present a portable, parallel implementation of an urban air quality model. The parallel model runs on the Intel Delta, Intel Paragon, IBM SP2, and Cray T3D, using a variety of standard communication libraries. We analyze the performance of the air quality model on these platforms based on a model derived from the parallel communication behavior and sequential execution time of the air quality model. We predict the performance of the next generation air quality models based on this analysis.
17.
18.
In this paper, we present a software tool, RTS (real time simulator), that analyses the time cost behaviour of parallel computations through simulation. RTS assumes that the computer system supporting the execution of parallel computations has a limited number of processors, that all processors have the same speed, and that they communicate with each other through a shared memory. In RTS, the time cost of a parallel computation is defined as a function of the input, the algorithm, the data structure, the processor speed, the number of processors, the processor power allocation, the communication and the execution environment. The paper first discusses how RTS models the time cost. In the model, a locking technique is used to control access to the shared memory, processing power is equally allocated among all the operations that are currently being performed in parallel in the computer system, and the number of operations in the execution environment of a parallel computation changes from time to time. How RTS works and how the simulation is used to do time cost analysis are also discussed.
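The equal allocation of processing power can be illustrated with a small sketch. It is not RTS itself: it assumes all operations start at time 0, ignores locking and communication, and treats the machine as one pool of processing power shared equally among the unfinished operations. The function name and parameters are hypothetical.

```python
def finish_times(work, power=1.0):
    """Finish time of each operation when `power` is split equally among
    all operations that are still running (simplified processor sharing)."""
    order = sorted((w, i) for i, w in enumerate(work))
    done = [0.0] * len(work)
    now = 0.0
    served = 0.0          # work every still-running operation has received
    active = len(work)    # number of operations still running
    for w, i in order:
        # Each running operation progresses at rate power / active, so the
        # smallest remaining one needs (w - served) * active / power time.
        now += (w - served) * active / power
        served = w
        done[i] = now
        active -= 1
    return done


# Three operations needing 1, 2 and 3 units of work on one unit of power.
print(finish_times([1, 2, 3]))
```

As operations finish, the survivors each receive a larger share, which is one way the changing number of operations in the execution environment affects the time cost.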
19.
20.
Parallel Computing, 2014, 40(10): 661-680
Data-flow is a natural approach to parallelism. However, describing dependencies and control between fine-grained data-flow tasks can be complex and introduce unwanted overheads. TALM (TALM is an Architecture and Language for Multi-threading) introduces a user-defined coarse-grained parallel data-flow model, where programmers identify code blocks, called super-instructions, to be run in parallel and connect them in a data-flow graph. TALM has been implemented as a hybrid Von Neumann/data-flow execution system: the Trebuchet. We have observed that TALM's usefulness largely depends on how programmers specify and connect super-instructions. Thus, we present Couillard, a full compiler that creates, based on an annotated C program, a data-flow graph and C code corresponding to each super-instruction. We show that our toolchain allows one to benefit from data-flow execution and explore sophisticated parallel programming techniques with little effort. To evaluate our system we have executed a set of real applications on a large multi-core machine. Comparison with popular parallel programming methods shows competitive speedups, while providing an easier parallel programming approach. More specifically, for an application that follows the wavefront method, running with big inputs, Trebuchet achieved up to 4.7% speedup over Intel® TBB's novel flow-graph approach and up to 44% over OpenMP.