20 similar documents retrieved (search time: 15 ms)
1.
A Logic-Path Sizing Optimization Method Using Extended Logical Effort (total citations: 1, self-citations: 0, citations by others: 1)
To address the problem of optimizing logic-path delay in integrated-circuit physical design while accounting for interconnect effects, this paper proposes an extended logical effort (ELE) that incorporates interconnect load, and presents an ELE-based flow that simultaneously optimizes the size of every logic gate along a path and the length of every interconnect segment. ELE retains the original logical effort parameters while using Π-model interconnect parameters, obtained from parasitic-extraction software, to describe and estimate the propagation delay of logic gates driving interconnect loads. The path optimization flow uses an effort-delay allocation strategy as its initial condition to capture the contribution of each interconnect segment's load to the total effort delay, takes the physical dimension data of the target cell library and fabrication process as constraints, performs the optimization around the ELE expressions, and applies dynamic programming, so that all results are obtained in a single pass without iteration. Experimental results show that the flow is computationally light and resource-efficient, quickly and accurately produces the required gate sizes and interconnect lengths, and yields clear, reasonable results that are fully compatible with the target cell library and process library.
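For orientation, the abstract builds on the standard logical effort delay model; a minimal sketch of how a wire load could enter the delay expression (the wire terms are an illustrative assumption, not the paper's ELE formulation):

```latex
% Standard logical effort: per-stage delay and path delay
d_i = g_i h_i + p_i, \qquad D = \sum_{i=1}^{N}\bigl(g_i h_i + p_i\bigr)
% Illustrative extension: a wire between stages i and i+1 adds its
% extracted (\Pi-model) capacitance C_{w,i} to the electrical effort
h_i = \frac{C_{\mathrm{in},\,i+1} + C_{w,i}}{C_{\mathrm{in},\,i}}
```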
2.
Granularity Analysis for Exploiting Adaptive Parallelism of Declarative Programs on Multiprocessors
1 Introduction. Automatic parallel execution of declarative language programs (e.g. function programs and logic programs) is attractive, as it makes the use of parallel computers very easy, and the programmer need not be concerned with the specifics of the underlying parallel architecture. However, if several processors are executing concurrently, exploiting adaptive parallelism is hard due to non-determinism of task granularity and data dependencies among tasks. The early solution proposed by Conery and Kibler [2] uses an ordering algorithm to determine dependencies at run…
3.
Prefetching is one of several techniques for hiding and tolerating the large memory latencies of scalable multiprocessors. In this paper, we present a performance model for analyzing the limits and effectiveness of data prefetching. The model incorporates the effects of program behavior, network characteristics, cache coherency protocols, and the memory consistency model. Our results indicate that, as long as there is enough extra network bandwidth, prefetching is very effective in hiding large latencies. In machines with caches large enough to hold the program working set, the intra- and inter-node cache interference is low enough not to have any significant impact on prefetching performance. Furthermore, we show that the effective prefetch distance plays a vital role and adapts extremely well to changes in cache miss rates and remote latencies, thus allowing prefetches to be more effective in hiding latency. An adaptive algorithm is provided to optimize the prefetch distance, based on the dynamic behavior of the application, interconnection network, and distributed caches and memories. This optimization of the prefetch distance constitutes a significant advantage of prefetching over other latency-tolerating techniques, such as multithreading. We show that the prefetch distance can be chosen constant, program-dependent, or decided by performance information; the optimal distance can be adaptively determined using both compile-time and runtime conditions. Our results are therefore useful not only to compiler writers, but also for the development of runtime support systems in multiprocessors. In large-scale systems, in which network traffic dominates performance, the ultimate goal is to match program behavior with machine behavior.
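To make the prefetch-distance discussion concrete: the classic rule issues a prefetch roughly miss-latency/iteration-time iterations ahead, and the distance can then be tuned from runtime feedback. The adaptation step below is a hypothetical illustration, not the paper's algorithm:

```python
import math

def base_prefetch_distance(miss_latency_cycles: float, cycles_per_iter: float) -> int:
    # Classic rule: issue the prefetch enough iterations ahead that the
    # line arrives before the iteration that uses it.
    return max(1, math.ceil(miss_latency_cycles / cycles_per_iter))

def adapt_distance(distance: int, miss_rate: float,
                   low: float = 0.02, high: float = 0.10) -> int:
    # Hypothetical feedback: lengthen the distance while prefetches arrive
    # too late (misses stay high); shorten it when they arrive too early.
    if miss_rate > high:
        return distance + 1
    if miss_rate < low and distance > 1:
        return distance - 1
    return distance

d = base_prefetch_distance(miss_latency_cycles=200, cycles_per_iter=40)
print(d)  # start 5 iterations ahead, then adjust with adapt_distance()
```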
4.
Trends in modern multicore architecture design require software developers to develop and debug multithreaded programs. Consequently, software developers face new challenges from bug patterns that occur at runtime and from the non-deterministic behavior of multithreaded program executions. This calls for new defect-localization techniques. There has been much work on defect localization for sequential programs on the one hand and on the localization of specific multithreading bugs on the other, but we are not aware of any general technique for multithreaded programs. This paper proposes such an approach. It generalizes data-mining-based defect-localization techniques for sequential programs, which work by analyzing call graphs. More specifically, we propose new graph representations of multithreaded program executions as well as two mining-based localization approaches based on these representations. Our evaluation shows that our technique yields good results and is able to find defects that other approaches cannot localize.
5.
6.
Journal of Parallel and Distributed Computing, 1993, 18(3): 371-389
This paper describes an experimental study of three dataflow paradigms, namely no dataflow, pipelined dataflow, and network dataflow, in multithreaded database transitive closure algorithms on shared-memory multiprocessors. The study shows that the dataflow paradigm directly influences performance parameters such as the amount of inter-thread communication, how data are partitioned among the threads, whether access to each page of data is exclusive or shared, whether locks are needed for concurrency control, and how calculation termination is detected. The algorithm designed with no dataflow outperforms the algorithms with dataflow. Approximately linear speedup is achieved by the no-dataflow algorithm given sufficient workload and primary memory. An exclusive-access working-set model and a shared-access working-set model describe the interactions between two or more threads' working sets when access to each page of data is exclusive or shared among the threads, respectively. These models are experimentally verified.
7.
The design structure matrix (DSM) is extended into a dynamic DSM and a static DSM. An element of the dynamic DSM is the set of parameters modified in an instance; an element of the static DSM is the importance of a modified-parameter set to another module. A matching algorithm maps the dynamic DSM onto the static DSM, and the static DSM is then restructured, optimizing the instance-modification path. When design activities change, the dynamic DSM is updated and re-mapped onto the static DSM, enabling optimization at any stage of the design activity. Finally, the instance design process of a key component of an astronomical telescope is used as an example to demonstrate the feasibility and practicality of the method.
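As a rough illustration of the dynamic-to-static mapping described above, the sketch below represents a dynamic-DSM entry as a set of modified parameters and the static DSM as importance weights; the matching rule and all names are invented for illustration:

```python
# Dynamic DSM: for each modified instance, the set of parameters it changes.
dynamic_dsm = {"mirror_cell": {"stiffness", "mass"}, "truss": {"length"}}

# Static DSM: importance of a module's modified-parameter set to another module.
static_dsm = {
    ("mirror_cell", "support"): 0.8,
    ("mirror_cell", "drive"): 0.2,
    ("truss", "drive"): 0.6,
}

def impacted_modules(source: str) -> list[tuple[str, float]]:
    # Map a dynamic-DSM entry onto the static DSM and rank affected modules
    # by importance, approximating an optimized modification path.
    hits = [(tgt, w) for (src, tgt), w in static_dsm.items() if src == source]
    return sorted(hits, key=lambda t: -t[1])

print(impacted_modules("mirror_cell"))  # [('support', 0.8), ('drive', 0.2)]
```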
8.
Write-invalidate protocols suffer from memory-access penalties due to coherence misses. While write-update or hybrid update/invalidate protocols can reduce coherence misses, the update traffic can increase memory-system contention. We show in this paper that update-based cache protocols can perform significantly better than write-invalidate protocols by incorporating a write cache in each processing node. Because it is legal to delay the propagation of modifications of a block until the next synchronization under relaxed memory consistency models, a write cache can significantly reduce traffic by exploiting locality in write accesses. By concentrating on a cache-coherent NUMA architecture, we study the implementation aspects of augmenting a write-invalidate, a write-update and two hybrid update/invalidate protocols with write caches. Through detailed architectural simulations using five benchmark programs, we find that write caches, with only a few blocks each, help write-invalidate protocols to cut the false-sharing miss rate and hybrid update/invalidate protocols to keep other copies, including the memory copy, clean at an acceptable write traffic level. Overall, the memory-access penalty associated with coherence misses is drastically reduced.
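The mechanism being exploited, that relaxed consistency allows modifications to be buffered and coalesced until the next synchronization, can be sketched as a toy model (not the simulated protocols from the paper):

```python
class WriteCache:
    """Toy write cache: buffer modifications per block and propagate them
    to the rest of the system only at synchronization points."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending: dict[int, dict[int, int]] = {}  # block -> {offset: value}

    def write(self, block: int, offset: int, value: int) -> None:
        # Coalesce repeated writes to the same block into one pending update.
        self.pending.setdefault(block, {})[offset] = value
        if len(self.pending) > self.capacity:
            oldest = next(iter(self.pending))          # FIFO capacity eviction
            self.propagate(oldest, self.pending.pop(oldest))

    def synchronize(self) -> None:
        # Relaxed consistency lets all buffered updates wait until here.
        for block, updates in self.pending.items():
            self.propagate(block, updates)
        self.pending.clear()

    def propagate(self, block: int, updates: dict[int, int]) -> None:
        print(f"update block {block}: {updates}")  # stand-in for bus traffic
```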
9.
10.
Lunjin Lu, Higher-Order and Symbolic Computation, 2003, 16(4): 341-377
This paper presents an abstract semantics that uses information about execution paths to improve the precision of data-flow analyses of logic programs. The abstract semantics is illustrated by abstracting execution paths using call strings of fixed length and the last transfer of control. Abstract domains that have been developed for logic program analyses can be used with the new abstract semantics without modification.
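The call-string abstraction of fixed length is simple enough to show directly; the sketch truncates the string to its k most recent call sites, and k = 1 corresponds to keeping only the last transfer of control:

```python
def extend_call_string(cs: tuple, site: str, k: int) -> tuple:
    # Call strings of fixed length: keep only the k most recent call sites.
    return (cs + (site,))[-k:]

cs = ()
for site in ["main:3", "p:7", "q:2"]:  # hypothetical call sites
    cs = extend_call_string(cs, site, k=2)
print(cs)  # ('p:7', 'q:2')
```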
11.
An Overall Energy Optimization Method for CPU-GPU Heterogeneous Systems Based on Critical-Path Analysis (total citations: 1, self-citations: 0, citations by others: 1)
The powerful computing performance of GPUs has made the CPU-GPU heterogeneous architecture a hot research direction in high-performance computing. Although GPUs offer a high performance/power ratio, power consumption remains one of the key factors limiting the operation of large-scale computing systems. Existing GPU power-optimization research focuses mainly on reducing the power consumption of the GPU itself, without considering the CPU and GPU together as a whole. This paper analyzes in depth the runtime characteristics of CUDA programs on CPU-GPU heterogeneous systems, summarizes the task dependencies within them, presents a method of representing program execution with an AOV network, analyzes the critical path of program execution on that basis, identifies the parts of the program amenable to energy optimization, and solves for the corresponding frequency-scaling amounts, minimizing the program's overall energy consumption while keeping its performance unchanged.
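Critical-path analysis over an AOV-style task graph follows the classic earliest/latest-start computation; in the sketch below (graph and durations invented for illustration), tasks with zero slack form the critical path, and positive slack bounds how much a task can be slowed by frequency scaling without extending total runtime:

```python
from graphlib import TopologicalSorter

# AOV-style task graph: each task maps to the tasks it depends on.
pred = {"h2d": [], "cpu_pre": [], "kernel": ["h2d", "cpu_pre"], "d2h": ["kernel"]}
dur = {"h2d": 2, "cpu_pre": 5, "kernel": 8, "d2h": 2}

succ = {t: [s for s in pred if t in pred[s]] for t in pred}
order = list(TopologicalSorter(pred).static_order())  # dependencies first

earliest = {}
for t in order:
    earliest[t] = max((earliest[p] + dur[p] for p in pred[t]), default=0)
finish = max(earliest[t] + dur[t] for t in order)

latest = {}
for t in reversed(order):
    latest[t] = min((latest[s] - dur[t] for s in succ[t]), default=finish - dur[t])

# Zero slack marks the critical path; positive slack is headroom for
# lowering that task's frequency without extending total runtime.
slack = {t: latest[t] - earliest[t] for t in order}
print(slack)  # {'h2d': 3, 'cpu_pre': 0, 'kernel': 0, 'd2h': 0}
```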
12.
This paper discusses the optimization of parallel programs, pointing out that such optimization should proceed along three lines: data partitioning, communication optimization, and serial optimization. To address the shortcomings of the traditional speedup metric, we propose an optimized-speedup model for evaluating the performance of optimized parallel programs. The NAS benchmarks MG and FT were optimized, and the optimized-speedup model was used to analyze the performance of these two programs on the IBM SP2.
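The abstract does not state the optimized-speedup formula. For orientation, the traditional speedup it criticizes is the standard ratio below, and one plausible refinement (an assumption here, not necessarily the paper's definition) normalizes both the serial and parallel times after optimization, so that gains from serial tuning are not mistaken for parallel scalability:

```latex
% Traditional speedup, and an assumed "optimized speedup" variant:
S(p) = \frac{T_1}{T_p}
\qquad\longrightarrow\qquad
S_{\mathrm{opt}}(p) = \frac{T_1^{\mathrm{opt}}}{T_p^{\mathrm{opt}}}
% T_1: serial time, T_p: parallel time on p processors;
% the "opt" superscript denotes times measured after optimization.
```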
13.
14.
15.
Multiprocessor scheduling is a key problem affecting system performance, and duplication-based scheduling algorithms are among the more effective approaches to it. This paper analyzes several typical duplication-based algorithms and proposes a multiprocessor task-assignment algorithm based on dynamic critical tasks (DCT). The DCT algorithm focuses on overcoming the shortcomings of greedy algorithms: during scheduling it dynamically computes task timing parameters, accurately identifies each processor's critical task, optimizes the schedule around that critical task, and progressively improves the schedule until an optimal result is obtained. Analysis and experiments show that the DCT algorithm outperforms other existing algorithms of its kind.
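The core move in duplication-based scheduling, which the DCT abstract revolves around, is: compute when each parent's data arrives, call the latest-arriving parent the critical task, and duplicate it locally if that lets the child start earlier. A rough sketch under simplifying assumptions (all names are hypothetical; this is not the published DCT algorithm):

```python
def data_ready_times(task, proc, finish, comm, placed_on):
    # When each parent's result becomes available on `proc`: immediately if
    # the parent ran there, after a message delay otherwise.
    def arrival(p):
        return finish[p] + (0 if placed_on[p] == proc else comm[(p, task)])
    parents = [p for (p, t) in comm if t == task]
    return {p: arrival(p) for p in parents}

def ready_with_duplication(task, proc, finish, comm, placed_on, dur, free_at):
    arrivals = data_ready_times(task, proc, finish, comm, placed_on)
    crit = max(arrivals, key=arrivals.get)   # the critical task for `proc`
    if placed_on[crit] != proc:
        # Duplicate the critical parent locally: its result is then ready
        # when `proc` can recompute it (assuming its own inputs are local,
        # a simplification of the real algorithm).
        arrivals[crit] = min(arrivals[crit], free_at[proc] + dur[crit])
    return max(arrivals.values())
```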
16.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1980, (3): 278-286
There are a number of practical difficulties in performing a path testing strategy for computer programs. One problem is deciding which paths, out of a possible infinity, to use as test cases. A hierarchy of structural test metrics is suggested to direct the choice and to monitor the coverage of test paths. Another problem is that many of the chosen paths may be infeasible in the sense that no test data can ever execute them. Experience with the use of "allegations" to circumvent this problem and prevent the static generation of many infeasible paths is reported.
17.
An Application-Level Checkpointing Mechanism for OpenMP Programs Based on Extended Data-Flow Analysis (total citations: 1, self-citations: 0, citations by others: 1)
With the increasingly widespread use of multicore processor architectures in high-performance computing, fault tolerance for shared-memory parallel programs has become a hot research topic. In recent years, checkpointing has become the dominant fault-tolerance mechanism in this field. There has been some research on checkpointing for OpenMP programs, but the vast majority of solutions depend on special runtime libraries or hardware platforms. This paper proposes a compiler-assisted, application-level checkpointing scheme for OpenMP that is platform-independent: an extended data-flow analysis tailored to OpenMP selects only the "essential" variables to be saved in the checkpoint image, reducing the fault-tolerance overhead, while a non-blocking protocol maintains the global consistency of checkpoints. The paper discusses the key issues of this mechanism, and experimental evaluation and comparison with related work demonstrate the fault-tolerance performance advantages of the proposed checkpointing mechanism.
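Selecting only the "essential" variables is essentially a liveness question: a variable must be saved at a checkpoint only if some later path may read it before overwriting it. A textbook backward data-flow sketch (not the paper's OpenMP-specific extension) follows:

```python
# Live-variable analysis on a tiny control-flow graph: a variable needs to
# be saved at a checkpoint only if it is live there.
blocks = {
    "B1": {"use": {"n"}, "def": {"i", "s"}, "succ": ["B2"]},
    "B2": {"use": {"i", "n"}, "def": set(), "succ": ["B3", "B4"]},  # loop test
    "B3": {"use": {"s", "i"}, "def": {"s", "i"}, "succ": ["B2"]},   # loop body
    "B4": {"use": {"s"}, "def": set(), "succ": []},                 # exit
}

live_in = {b: set() for b in blocks}
live_out = {b: set() for b in blocks}
changed = True
while changed:  # iterate the backward equations to a fixed point
    changed = False
    for b, info in blocks.items():
        out = set().union(*(live_in[s] for s in info["succ"])) if info["succ"] else set()
        inn = info["use"] | (out - info["def"])
        if out != live_out[b] or inn != live_in[b]:
            live_out[b], live_in[b] = out, inn
            changed = True

# A checkpoint placed at the top of B2 would save only live_in["B2"].
print(sorted(live_in["B2"]))  # ['i', 'n', 's']
```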
18.
Cloud storage systems based on broadcast encryption have attracted researchers' attention. However, basic broadcast encryption schemes cannot adapt to the dynamic changes of users and permissions in cloud storage environments. To address the high overhead of key management and distribution in broadcast encryption, this paper proposes an optimization based on extending the public key: part of the private parameters used when the public key was initially generated are retained, and when users join or leave the system, these retained parameters are used to generate a new public key for encrypting data. Legitimate users can thus still decrypt data encrypted under the new public key with the private keys distributed earlier, avoiding frequent public-key changes and repeated key distribution as users change. A lazy-revocation mechanism further reduces the overhead of permission changes and periodic key updates. Test results show that with the optimized scheme, system performance improves considerably when users are added or permissions are revoked.
19.
As a process executes on a processor, it builds up state in that processor's cache. In multiprogrammed workloads, the opportunity to reuse this state may be lost when a process gets rescheduled, either because intervening processes destroy its cache state or because the process may migrate to another processor. In this paper, we explore affinity scheduling, a technique that helps reduce cache misses by preferentially scheduling a process on a processor where it has run recently. Our study focuses on a bus-based multiprocessor executing a variety of workloads, including mixes of scientific, software development, and database applications. In addition to quantifying the performance benefits of exploiting affinity, our study is distinctive in that it provides low-level data from a hardware performance monitor that details why the workloads perform as they do. Overall, for the workloads studied, we show that affinity scheduling reduces the number of cache misses by 7-36%, resulting in execution time improvements of up to 10%. Although the overall improvements are small, modifying the operating system scheduler to exploit affinity appears worthwhile: affinity has no negative impact on the workloads and we show that it is extremely simple to add to existing schedulers.
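The scheduler modification the authors describe as extremely simple to add amounts to preferring the CPU where a process last ran; a toy sketch (names and tie-breaking are invented):

```python
def pick_cpu(process: str, idle_cpus: set[int], last_ran_on: dict[str, int]) -> int:
    # Affinity scheduling: prefer the CPU where the process last ran, since
    # its cache state may still be warm there; otherwise take any idle CPU.
    preferred = last_ran_on.get(process)
    if preferred in idle_cpus:
        return preferred
    return min(idle_cpus)  # arbitrary tie-break among idle CPUs

last_ran_on = {"p1": 2}
print(pick_cpu("p1", {0, 2, 3}, last_ran_on))  # 2 -- warm cache
print(pick_cpu("p1", {0, 3}, last_ran_on))     # 0 -- fall back to any idle CPU
```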
20.