Similar literature
 20 similar articles found; search took 156 ms
1.
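Zhu Yiqing (朱一清). Computer Engineering (《计算机工程》), 2012, 38(18): 30-33
To address the nondeterminism and complexity of concurrent programs and the difficulty of obtaining their atomicity properties, this paper proposes a method for extracting atomicity properties from concurrent programs. Synchronized regions in a concurrent program are converted into concurrent operation graphs that capture the concurrency-related operations, and a frequent-subgraph mining algorithm then automatically extracts atomic graphs from the program. These atomic graphs characterize the program's atomicity properties, including the concurrent operations and the control dependences among them. Experimental results show that the method effectively extracts the atomicity properties of concurrent programs with a low false-detection rate.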

2.
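Concurrent program slicing is an important technique for analyzing concurrent programs. Based on the program reachability graph, one can construct a concurrent program dependence graph whose nodes are pairs of a program state and a statement and whose dependence relations are transitive; this resolves the intransitivity of dependences and improves slicing precision. The program reachability graph simulates concurrent activity through interleaved execution, so the analysis is costly. Partial-order reduction is a highly effective technique for reducing the state space of concurrent systems, and the reduced state space contains a representative of every execution of the concurrent program. To improve efficiency, this paper extends partial-order reduction to the reduction of program reachability graphs and, building on partial-order reduction theory, proves that the concurrent program dependence graphs constructed from the unreduced and the reduced reachability graphs are equivalent for slice computation. Experimental results show that partial-order reduction significantly improves the efficiency of reachability-graph-based slicing without any loss of slicing precision. Compared with other high-precision slicing methods, slicing based on the reduced program reachability graph is more precise and, in most cases, also somewhat more efficient.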

3.
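Concurrent program slicing is an important technique for analyzing concurrent programs. Targeting the shared-variable communication mechanism of multithreaded programs, this work obtains basic program information with the program analysis tool CodeSurfer, constructs the program reachability graph from it, and generates a concurrent program dependence graph whose nodes are pairs of a program state and a statement, yielding a prototype system for reachability-graph-based concurrent program slicing. Preliminary experimental results show that, compared with traditional slicing methods, the reachability-graph-based method effectively resolves the intransitivity of dependences and produces high-precision slices of concurrent programs.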

4.
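Concurrent programs and concurrent systems can achieve very high execution efficiency and faster response than sequential systems, and they are widely used in practice. However, it is often hard to guarantee that their implementations are correct, and errors in running applications can have serious consequences. Moreover, the nondeterminism of concurrent execution makes correctness verification very difficult. Among formal verification methods, interactive theorem provers allow concurrent programs to be verified rigorously. This paper summarizes the verification goals that can be used to state the correctness of concurrent programs in interactive theorem proving: Hoare triples, linearizability, contextual refinement, and logical atomicity. Interactive theorem proving commonly uses program logics to verify programs; this paper analyzes a series of results and the corresponding formalizations based on concurrent separation logic, rely-guarantee logic, relational Hoare logic, and related theories, and summarizes the program verification tools and verification results that use these methods.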

5.
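Wang Yuyu (王于愚), Li Xiang (李祥). Journal of Computer Applications (《计算机应用》), 2006, 26(Z2): 260-262
In JDK 5.0, Java introduced the Concurrent library package, which gives programmers many new features for developing concurrent programs, such as synchronization objects, concurrent collections, and executors, and provides a more convenient and safer way to design concurrent programs. Based on an analysis and study of the concurrency package, this paper uses its lock mechanism and atomic operations to implement a simulation system that models elevator control.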

6.
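Constraint solving is applied in many areas of program analysis and has been used extensively in the analysis of concurrent programs. Concurrent programs have become widespread with the rapid development of multicore processors, but concurrency bugs seriously affect their security and reliability, so detecting concurrency bugs is particularly important. The thread-interleaving explosion caused by nondeterministic thread execution poses a challenge for concurrency bug detection. Existing detection algorithms prune invalid thread interleavings to lower the cost of exploring the state space of a concurrent program. For example, the maximal causal model algorithm converts the exploration of a concurrent program's state space into a constraint-solving problem. However, its constraint construction generates many redundant and conflicting constraints, which greatly increases constraint-solving time and the number of solver invocations and lowers the efficiency of state-space exploration. To address these problems, a directed-graph-constraint-guided concurrency bug detection method, GC-MCR (directed graph constraint-guided maximal causality reduction), is proposed. The method uses a directed graph to filter and reduce constraints, thereby speeding up constraint solving and further improving the efficiency of state-space exploration. Experimental results show that the directed graph built by GC-MCR effectively optimizes the constraint expressions, thereby increasing the solving speed of the constraint solver and reducing the number of solver invocations...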

7.
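Research goal: to provide a graphical method and a supporting tool for designing Ada concurrent software, so that the design of concurrent software can be expressed graphically and a skeleton of the concurrent program (the division into concurrent program units and the communication among them) can be generated. The benefit is higher software productivity and better software quality. Building on the rendezvous order graph proposed for understanding concurrent software, the paper gives the syntax, the semantics, and the graphical representation (swimlanes) of the concurrent design language CONDL, and briefly introduces CONDLAS, a tool developed to generate Ada code skeletons.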

8.
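When a parallel multithreaded program executes concurrently, different memory-access orders produce different results. Because an error from the first run is hard to reproduce in a rerun, debugging parallel programs is very difficult. Deterministic replay is one approach to this problem: it records the nondeterministic events that occur during the execution of a parallel program and then uses the recorded events to reproduce the original execution. However, existing deterministic replay methods produce large record logs; reducing the log is a research hotspot in the field and a very challenging problem in practice. To reduce logging overhead, this paper proposes a logical-time-based method for reducing memory-access dependences and presents a concrete implementation technique for processors that support relaxed memory consistency models. The method performs the reduction by exploiting the order relation between the logical times associated with memory-access dependences. The performance and scalability of the method are evaluated by simulation. On an 8-core simulated platform running the Splash2 benchmarks, the proposed recording method has an average log overhead of 0.11 bytes per kilo-instruction, a 75% improvement over Timetraveler, the best existing memory-access dependence reduction method. Evaluations on 4-core, 8-core, and 16-core platforms indicate that the reduction method scales well.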

9.
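Interprocedural analysis of concurrent programs is an undecidable problem, and understanding the source of this undecidability is the basis for developing an effective analysis algorithm. The existing proof [1] establishes undecidability by constructing a PCP problem instance with three concurrent tasks. Using the idea of reflection, this paper constructs a PCP problem instance with only two concurrent tasks, proving that interprocedural concurrent program analysis is undecidable even with two concurrent tasks.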

10.
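Aimed at designing highly reliable, high-quality parallel multitasking programs in Java, this paper analyzes the principles of Java's multithreading mechanism and its implementation techniques, studies the synchronization and interactive communication mechanisms used during concurrent execution, compares the implementation structures of operating-system-level and Java-thread-level concurrency mechanisms, and summarizes a set of programming rules and strategies for preventing deadlock in concurrent programs. The paper also constructs an example framework with fully concurrent synchronization, which has some practical value.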

11.
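A Distributed Object Database Server System on the NT Platform   Cited by: 4 (self-citations: 0, citations by others: 4)
FISH is a new-generation distributed object database system for supporting advanced applications (such as GIS, EC, and CIMS). The system adopts many novel techniques, such as DSVM (distributed shared virtual memory), persistent heaps, paged objects, transparent locking, and compact commit. This paper focuses on the overall architecture and design ideas of the system, especially the low-level techniques involved in implementing FISH on Windows NT, including memory mapping, shared memory, remote procedure call, multithreaded connections, and page-fault handling. Performance tests based on OO7 show that FISH achieves distributed execution efficiency in an NT cluster environment as high as in a distributed UNIX environment.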

12.
This paper discusses the performance analysis of two generic fundamental parallel search techniques on shared memory multi-processor systems in solving the constraint satisfaction problem (CSP). A probabilistic analysis of their expected number of computation steps and their inherent load-balancing capability is performed. Corresponding experimental results are also provided to verify the correctness of the proposed analysis. This fundamental analysis approach can be further applied to various advanced parallel search techniques or various problem solving techniques on parallel platforms. This research was supported in part by the University of Texas at San Antonio under the Faculty Research Award program.

13.
With the falling price of memory, an increasing number of multimedia servers and proxies are now equipped with a large memory space. Caching media objects in the memory of a proxy helps to reduce the network traffic, the disk I/O bandwidth requirement, and the data delivery latency. The running buffer approach and its alternatives are representative techniques for caching streaming data in memory. The existing techniques have two limitations. First, although multiple running buffers for the same media object co-exist in a given processing period, data sharing among multiple buffers is not considered. Second, user access patterns are not insightfully considered in the buffer management. In this paper, we propose two techniques based on shared running buffers in the proxy to address these limitations. Considering user access patterns and characteristics of the requested media objects, our techniques adaptively allocate memory buffers to fully utilize the currently buffered data of streaming sessions, with the aim of reducing both the server load and the network traffic. In an experimental comparison with several existing techniques, we show that the proposed techniques achieve significant performance improvement by effectively using the shared running buffers.

14.
Traditional software distributed shared memory (SDSM) systems modify the semantics of a real hardware shared memory system by relaxing the coherence semantics and by limiting the memory regions that are actually shared. These semantic modifications are made to improve the performance of the applications that use them. In this paper, we show that an SDSM system that behaves like a real shared memory system (without the aforementioned relaxations) can also be used to execute OpenMP applications and achieve speedups similar to those obtained by traditional SDSM systems. This performance can be achieved by encouraging cooperation between the SDSM and the OpenMP runtime instead of relaxing the semantics of the shared memory. In addition, techniques like boundaries alignment and page presend are shown to be very useful in overcoming the limitations of current SDSM systems.

15.
Stunkel, C.B.; Janssens, B.; Fuchs, W.K. Computer, 1991, 24(1): 31-38
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

16.
Determination of data dependences is a task typically performed with high-level language source code in today's optimizing and parallelizing compilers. Very little work has been done in the field of data dependence analysis on assembly language code, but this area will be of growing importance, e.g., for increasing instruction-level parallelism. A central element of a data dependence analysis in this case is a method for memory reference disambiguation which decides whether two memory operations may access (or definitely access) the same memory location. In this paper we describe a new approach for the determination of data dependences in assembly code. Our method is based on a sophisticated algorithm for symbolic value propagation, and it can derive value-based dependences between memory operations instead of just address-based dependences. We have integrated our method into the Salto system for assembly language optimization. Experimental results show that our approach greatly improves the precision of the dependence analysis in many cases.

17.
Clustering algorithms are routinely used in biomedical disciplines, and are a basic tool in bioinformatics. Depending on the task at hand, there are two most popular options, the central partitional techniques and the agglomerative hierarchical clustering techniques and their derivatives. These methods are well studied and well established. However, both categories have some drawbacks related to data dimensionality (for partitional algorithms) and to the bottom-up structure (for hierarchical agglomerative algorithms). To overcome these limitations, motivated by the problem of gene expression analysis with DNA microarrays, we present a hierarchical clustering algorithm based on a completely different principle, which is the analysis of shared farthest neighbors. We present a framework for clustering using ranks and indexes, and introduce the shared farthest neighbors (SFN) clustering criterion. We illustrate the properties of the method and present experimental results on different data sets, using the strategy of evaluating data clustering by extrinsic knowledge given by class labels.

18.
Memory reservations are used to provide real-time tasks with guaranteed memory access to a specified amount of physical memory. However, previous work on memory reservation primarily focused on private pages, and did not pay attention to shared pages, which are widely used in current operating systems. With previous schemes, a real-time task may experience unexpected timing delays from other tasks through pages that it shares with another process, even though the task has enough free pages in its own reservation. In this paper, we first describe the problems that arise when real-time tasks share pages. We then propose a shared-page management framework which enhances the temporal isolation provided by memory reservations in resource kernels that use the resource reservation approach. Our proposed solution consists of two schemes, Shared-Page Conservation (SPC) and Shared-Page Eviction Lock (SPEL), each of which prevents timing penalties caused by the seemingly arbitrary eviction of shared pages. The framework can manage shared data for inter-process communication and shared libraries, as well as pages shared by the kernel’s copy-on-write technique and file caches. We have implemented and evaluated our schemes on the Linux/RK platform, but they can also be applied to other operating systems with paged virtual memory.

19.
Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling with the goal of one iteration per cycle and implementation techniques to support decoupling synchronization between the token generator and the collector. This paper also introduces the techniques of exploiting both data dependences of intra- and inter-iteration, with the help of two instructions for special data reuses in the loop-carried dependences. The experimental results show that the number of memory accesses reaches on average 3% of a RISC processor simulator with no memory optimization. In a practical image matching application, LEAP architecture achieves about 34 times of speedup in execution cycles, compared with general-purpose processors. Supported by the National Natural Science Foundation of China (Grant No. 60633050, 60621003) and the National High Technology Research and Development Program of China (Grant No. 2007AA01Z06).

20.
We consider in this paper the effectiveness of a new approach called compiler-controlled updating to reduce coherence-miss penalties in shared-memory multiprocessors. A key part of the method is a compiler algorithm that identifies the last store instruction to a memory block in a flow graph using classic dataflow analysis techniques. Such stores are marked and replaced by update instructions that at run time make the memory copy clean. Whereas this static method shortens the read-miss latency for actively shared blocks, it can cause useless traffic for shared blocks that are effectively private. We therefore complement the static analysis with a simple dynamic heuristic in the cache coherence protocol aiming at classifying blocks as private or shared at run time. We evaluate the performance effects of compiler-controlled updating using six scientific parallel applications compiled by an optimizing compiler that incorporates our static analysis and then running them on a detailed CC-NUMA architectural simulation model. We have found that the compiler algorithm can convert between 83 and 100% of the dirty misses into clean misses. By adding the private/shared heuristic, the update traffic of private memory blocks can be practically eliminated. Overall, the static analysis in combination with the dynamic heuristic is shown to reduce the execution time by as much as 32%.
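
The "last store to a memory block" idea in the abstract above can be illustrated with a much simplified sketch. This is not the paper's algorithm: it handles only a single straight-line instruction sequence rather than a flow graph, and the instruction format `(op, address)`, the function name `mark_last_stores`, and the 32-byte block size are all assumptions made for illustration.

```python
def mark_last_stores(instructions, block_size=32):
    """Return the indices of stores that are the last store to their
    memory block within one straight-line instruction sequence.

    `instructions` is a list of (op, address) tuples, a hypothetical
    format chosen for this sketch. A real compiler pass would run a
    dataflow analysis over the whole flow graph instead.
    """
    last_store = {}  # block number -> index of the most recent store seen
    for index, (op, address) in enumerate(instructions):
        if op == "store":
            # A later store to the same block overwrites the earlier entry.
            last_store[address // block_size] = index
    return sorted(last_store.values())


# Hypothetical trace: addresses 0 and 8 share block 0, address 64 is block 2.
trace = [("store", 0), ("load", 4), ("store", 8), ("store", 64), ("load", 64)]
marked = mark_last_stores(trace)
```

In this sketch, the stores at the returned indices are the candidates that a compiler could replace with update instructions; loads never affect the result.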

