Similar documents
 20 similar documents found
1.
Dynamic Data Prefetching in Home-Based Software DSMs   (total citations: 1; self-citations: 0; citations by others: 1)
1 Introduction. Software Distributed Shared Memory (DSM) provides the illusion of shared memory on top of distributed memory hardware. Most software DSM systems are page-based, using virtual memory protection to trap accesses to shared memory. These systems suffer from the high communication and coherence-induced overheads caused by the high level of implementation and large granularity of coherence. Many techniques, such as multiple writer protocol[1], lazy release consistency[2], and data …

2.
Optimizing inter-processor (PE) communication is crucial for parallelizing compilers for message-passing parallel machines to achieve high performance. In this paper, we propose a technique to eliminate redundant inter-PE messages. This technique uses data-flow analysis to find a definition point that corresponds to a use point where the definition and the use occur in different PEs. If several read accesses in the same PE use the data defined at the same definition point in another PE, redundant inter-PE messages are eliminated as follows: only one inter-PE communication is performed, for the earliest read access, and the previously received data are used for the following reads. To guarantee the consistency of the data, a valid flag and a sent flag are provided for each chunk of received data. The control of these flags is equivalent to coherence control by self-invalidation in a compiler-aided cache coherence scheme.
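
The reuse rule in this abstract can be illustrated with a small receiver-side sketch. It is not the paper's compiler transformation: the `ReceiveBuffer` class, the `fetch` callback, and the `messages` counter are hypothetical names introduced here, and only the valid flag is modeled (the sent flag on the defining PE is left out).

```python
from dataclasses import dataclass, field


@dataclass
class ReceiveBuffer:
    """Receiver-side copies of chunks defined on another PE (hypothetical)."""
    data: dict = field(default_factory=dict)   # chunk id -> received value
    valid: dict = field(default_factory=dict)  # chunk id -> valid flag
    messages: int = 0                          # inter-PE messages issued so far

    def read(self, chunk_id, fetch):
        # Only the earliest read of a definition triggers a message;
        # later reads reuse the previously received copy.
        if not self.valid.get(chunk_id, False):
            self.data[chunk_id] = fetch(chunk_id)  # stands in for one inter-PE message
            self.valid[chunk_id] = True
            self.messages += 1
        return self.data[chunk_id]

    def invalidate(self, chunk_id):
        # Clear the flag when a new definition point is reached,
        # analogous to self-invalidation in a cache coherence scheme.
        self.valid[chunk_id] = False
```

Two consecutive `read` calls for the same chunk issue a single message; after `invalidate`, the next read fetches again.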

3.
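Because of the memory limits of a single machine, radar remote sensing images are usually processed by randomly accessing the image file on disk through I/O functions, so processing a whole image takes a large amount of time and rarely meets the needs of practical applications. Built on the distributed shared memory network system JIAJIA, the software links the physical memory of several PCs into one larger shared memory space, allowing multiple PCs to process remote sensing images synchronously, conveniently, and quickly. By analyzing the serial algorithms for SAR image geometric correction, image filtering, and supervised classification, corresponding parallel algorithms were developed and tested on eight Pentium II PCs running Linux, each with a 400 MHz clock and 256 MB of memory; all of them achieved superlinear speedup.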

4.
Quantitative Evaluation of Register Pressure on Software Pipelined Loops   (total citations: 3; self-citations: 0; citations by others: 3)
Software pipelining is a loop scheduling technique that extracts loop parallelism by overlapping the execution of several consecutive iterations. One of the drawbacks of software pipelining is its high register requirements, which increase with the number of functional units and their degree of pipelining. This paper analyzes the register requirements of software pipelined loops. It also evaluates the effects on performance of the addition of spill code. Spill code is needed when the number of registers required by the software pipelined loop is larger than the number of registers of the target machine. This spill code increases memory traffic and can reduce performance. Finally, compilers can apply transformations in order to reduce the number of memory accesses and increase functional unit utilization. The paper also evaluates the negative effect on register requirements that some of these transformations might produce on loops.

5.
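To meet the needs of the development and verification stage of embedded system-on-chip (SoC) designs, a general-purpose SoC hardware/software co-simulation platform is presented. The software side of the simulation is written in C/C++ and assembly language, the hardware side is built on the VMM verification methodology, and the SoC design itself is written in RTL code. The ARM core in the SoC design code is replaced by a DSM model, and the VCS compiler ties software and hardware together so that they can exchange information, achieving a fast, highly realistic, easy-to-debug...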

6.
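Error-elimination theory (消错学), a frontier theory in management science and engineering, is applied to software project management. An error-elimination model of software project management is built, and the "fifteen, six, three" (十五、六、三) error-elimination method is used to analyze the causes and mechanisms of errors in software project management and to propose ways of eliminating them. A worked example illustrates how the model is applied. The work offers a theoretical model and method for studying and discovering the patterns by which errors arise in software project management, and a reference for software project management practice.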

7.
OpenMP on Networks of Workstations for Software DSMs   (total citations: 3; self-citations: 0; citations by others: 3)
This paper describes the implementation of a sizable subset of OpenMP on networks of workstations (NOWs); the source-to-source OpenMP compiler (AutoPar) is used for the JIAJIA home-based shared virtual memory (SVM) system. The paper suggests some simple modifications and extensions to the OpenMP standard to account for the differences between SVM and SMP (symmetric multiprocessor), at which the OpenMP specification is aimed. The OpenMP translator is based on an automatic parallelization compiler, so it is possible to check the correctness of the semantics of OpenMP programs, which is not required in an OpenMP-compliant implementation. AutoPar is measured for five applications, including both programs from the NAS Parallel Benchmarks and real applications, on a cluster of eight Pentium II PCs connected by 100 Mbps switched Ethernet. The evaluation shows that parallelization by annotating OpenMP directives is simple and that the performance of the generated JIAJIA code is still acceptable on NOWs.

8.
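This paper describes the implementation of the software DSM system JIAJIA on an SCI network. The communication module of JIAJIA was redesigned and reimplemented using SISCI, the programming interface of SCI, and its performance was then measured and analyzed. The results show that JIAJIA performs substantially better on the SCI network; notably, the improvement for small messages is clearly smaller than the improvement for large messages.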

9.
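A Dependence-Based Synchronization Optimization Algorithm   (total citations: 3; self-citations: 1; citations by others: 3)
A synchronization optimization algorithm based on the data dependence graph is presented. It runs as an independent pass in an automatic parallelizing compiler and uses the compiler's dependence analysis of the program to optimize barrier synchronization at compile time.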

10.
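To address the shortcomings of existing software fault localization methods, a fault localization method based on code inspection is proposed. An embedded module records the sequence of modules executed when the software fails, from which the set of suspicious modules and their fault coefficients are derived. The faulty modules then undergo categorized code inspection, and the results of these steps are combined to obtain the set of suspicious code and its fault coefficients. Auxiliary code analysis tools are then used to check the candidates and locate the fault. The method has been applied successfully to fault diagnosis in software-intensive systems and locates software faults quickly and effectively.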

11.
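A Discussion of Software Methods for Removing Zero Drift   (total citations: 12; self-citations: 1; citations by others: 11)
In data acquisition systems, the acquired data contain zero drift caused by the acquisition hardware or the environment, and this drift must be removed during data preprocessing. A purely hardware approach gives the system a poor cost-performance ratio and suppresses the drift poorly, so a software method for removing zero drift is proposed. The software method is simple to apply and works well. Combining the software method with hardware to remove zero drift has been applied in the HT-7 Tokamak nuclear fusion experiment with very good results.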
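
The abstract does not say which algorithm the authors used. As an illustration only, one simple software approach is to estimate the zero offset from samples recorded while no signal is present and subtract it from the whole record. The function name `remove_zero_drift` and the parameter `baseline_count` are hypothetical, and the sketch assumes the offset stays constant over the record.

```python
def remove_zero_drift(samples, baseline_count):
    """Subtract the mean of the first `baseline_count` samples from every sample.

    Assumes (for illustration) that those leading samples were recorded
    with no signal present, so their mean estimates the zero offset.
    """
    if not 0 < baseline_count <= len(samples):
        raise ValueError("baseline_count must be between 1 and len(samples)")
    offset = sum(samples[:baseline_count]) / baseline_count
    return [s - offset for s in samples]
```

A drift that changes during the record would need a time-varying baseline (for example, a line fitted between quiet intervals before and after the signal) rather than a single offset.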

12.
Both hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherent in shared-memory multiprocessors; however, both types of prefetching have their shortcomings. While software schemes require less hardware support than hardware schemes, they must generate address calculation instructions and a prefetch instruction for each datum that needs to be prefetched. Hardware schemes, however, must become progressively more complex to be able to compute data access strides and to increase the prefetching lookahead. In this paper, we propose an integrated hardware/software prefetching method that uses simple hardware that can handle most data accesses and software prefetching for the few remaining accesses. A compile time algorithm analyzes the access streams formed by array references and determines sequences of consecutive memory accesses to an access stream that can be prefetched by the hardware mechanism. This analysis is based on the relative memory locations of consecutive accesses to an access stream and the number of intervening data references between consecutive accesses to an access stream. In addition, the prefetching lookahead can be set separately for each access stream. Our approach yields an effective scheme that minimizes both CPU overhead and hardware costs. Execution-driven simulations show our method to be very effective.

13.
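The domain design model and the application system design model are important products of the development and customization stages of a software product line. To keep the two models consistent throughout the product line life cycle, they must be synchronized by automated or semi-automated means. To address this problem, a GRoundTram-based method for synchronizing software product line design models, called SPLSync-GRoundTram, is proposed. The method converts the synchronization problem between the domain design model and the application system design model into a graph-based bidirectional model transformation problem, and uses GRoundTram to perform the synchronization automatically. The concrete steps of the method are given, and its effectiveness is demonstrated on the design model of an "online bookstore".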

14.
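ParGIP (parallel geographical image processing) is a prototype system for large-scale parallel geographic image processing on a PC cluster with shared virtual memory (SVM). It adopts a client-server computing model and uses a software distributed shared memory (software DSM) middle layer to organize the PC cluster into a parallel computing platform with logically shared memory. Geographic image processing can exploit the large shared memory and parallel processing capacity that ParGIP provides to improve performance and shorten processing time, which addresses the memory shortage and insufficient computing power of traditional serial processing on a single machine. ParGIP also organizes the disks distributed across the cluster nodes to provide the mass storage space and parallel I/O capacity that a geographic image library requires. Tests show that ParGIP reaches a parallel I/O bandwidth of 102.6 MB/s on eight machines, and typical image processing algorithms achieve near-linear speedup.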

15.
Barriers are widely used for synchronization in parallel programs. Since a process that arrives earlier than the others must wait at the barrier, total processor utilization decreases. In this paper, to find the sources of the barrier waiting time, parallel programs are executed with various grain sizes through execution-driven simulations. In these simulation studies, we found that even if approximately equal amounts of work are distributed to each processor, all processes may not arrive at a barrier at the same time. The reason is that different numbers of cache misses and instructions within the partitioned grains lead to different arrival times of processors at the barrier. In this paper, the two-phased barrier is considered as a way to reduce the blind waiting time of the traditional barrier scheme; it can be constructed simply by dividing the one specific synchronization stage into two stages. At each stage, processes decide whether or not to stall, depending on the current execution state of the grains running on the given processors. Simulation results show that the reduced barrier waiting times attributed to the two-phased barrier contribute to the performance improvement of parallel programs.

16.
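This paper studies the tendency of cloned and non-cloned code to produce bugs, comparing different clone types and non-cloned code on the basis of bugs. First, the buggy code in a software system is extracted and a clone detection tool is used to find the system's cloned code. Second, the cloned and non-cloned code that can produce bugs is extracted separately, and the BOC bug density and LOC bug density are computed for each clone type and for non-cloned code. Finally, the bug densities of type-1, pure type-2, and pure type-3 clones and of non-cloned code are compared, the types of bugs produced in the code are classified and analyzed, and the Mann-Whitney test (WMM) is used to verify the validity of the results. The experiments show that type-1 clones are more likely to produce bugs, while pure type-3 clones are relatively less likely to introduce them. The study also finds that cloned and non-cloned code each have a set of bugs that occur with high frequency. These findings improve the understanding of clone characteristics and help software designers and developers reduce the negative effects of code cloning on software.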

17.
Load balance is an important issue for the performance of software distributed shared memory (DSM) systems. One way to address this issue is to exploit dynamic thread migration. To reduce the data consistency communication added by thread migration, an effective load balance scheme must carefully choose the threads and destination nodes for workload migration. In this paper, a group-based load balance scheme is proposed to resolve this problem. The main characteristic of this scheme is that it classifies the overloaded nodes and the lightly loaded nodes into a sender group and a receiver group, and then considers all the threads of the sender group and all the nodes of the receiver group for each decision. The experimental results show that the group-based scheme reduces more communication than the previous schemes. In addition, this paper resolves the problem of the high costs caused by group-based schemes. As a result, the performance of the test programs is effectively enhanced after minimizing the communication added by thread migration.
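
The grouping step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's scheme: the thresholds `low` and `high`, the `cost` callback, and both function names are hypothetical, and the exhaustive search below ignores the cost-reduction techniques the abstract mentions.

```python
def form_groups(loads, low, high):
    """Split nodes into a sender group (overloaded) and a receiver group (lightly loaded)."""
    senders = sorted(n for n, load in loads.items() if load > high)
    receivers = sorted(n for n, load in loads.items() if load < low)
    return senders, receivers


def choose_migration(sender_threads, receivers, cost):
    """Consider every thread of the sender group against every receiver node.

    `cost(thread, dest)` is an assumed estimate of the extra consistency
    communication if `thread` migrates to `dest`; the cheapest pair wins.
    Returns (thread, dest), or None when there is nothing to migrate.
    """
    candidates = [(cost(t, d), t, d) for t in sender_threads for d in receivers]
    if not candidates:
        return None
    _, thread, dest = min(candidates)
    return thread, dest
```

Evaluating all thread and node pairs is what makes the decision global across both groups, and it is also the source of the higher decision cost that the abstract says must be controlled.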

18.
Small organisations can now have access to high raw processing power using networks of workstations (NOW) as parallel computing platforms. Software Distributed Shared Memory (Software DSM) packages have been developed to facilitate the programming of such systems. However, because of the high interprocess latencies in a NOW, the performance of a software DSM application is more susceptible to the partitioning of the problem than might be expected. This paper presents an approach for a tool to visualise the execution of a program in a way that highlights performance bottlenecks. The tool associates identified bottlenecks with the corresponding source code lines in order to determine which piece of code is the cause of poor performance. The visualisation technique is demonstrated in two case studies. They clearly show that the visualisation is indeed useful and provides an effective way to acquire an understanding of what characterises an application's sharing behaviour.

19.
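Ye Weidong, Chen Fei. 《测控技术》 (Measurement & Control Technology), 2007, 26(8): 52-54
The clock synchronization principle of PTP (Precision Time Protocol) is described in detail, the factors that affect clock synchronization accuracy are analyzed, and a scheme is given for implementing the IEEE 1588 protocol purely in software on a LAN-based ARM hardware platform.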
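
The synchronization principle referred to here can be illustrated with the standard IEEE 1588 delay request-response calculation. The sketch below is not taken from the article; the function name is hypothetical, and the result is exact only under the usual assumption that the network delay is the same in both directions.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay from four timestamps.

    t1: master sends Sync            (master clock)
    t2: slave receives Sync          (slave clock)
    t3: slave sends Delay_Req        (slave clock)
    t4: master receives Delay_Req    (master clock)
    Assumes a symmetric path: equal delay in both directions.
    """
    master_to_slave = t2 - t1  # path delay + offset
    slave_to_master = t4 - t3  # path delay - offset
    delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return offset, delay
```

The slave corrects its clock by subtracting `offset`. Any asymmetry between the two directions appears directly as an offset error, and in a pure software implementation the jitter in when timestamps are taken adds to that error.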

20.