Similar documents
 20 similar documents found
1.
Coarse-grain coherence tracking is a new technique that extends a conventional coherence mechanism and optimizes coherence enforcement. It monitors the coherence status of large regions of memory and uses that information to avoid unnecessary broadcasts and filter unnecessary cache tag lookups, thus improving system performance and reducing power consumption.
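
As a rough illustration of the idea in this abstract, the sketch below keeps one flag per memory region and skips the broadcast when the region is known to be unshared. The class name `RegionTracker`, the 1024-byte `REGION_SIZE`, and the two-state model are assumptions made for illustration; the abstract does not describe the actual hardware structures.

```python
REGION_SIZE = 1024  # bytes per tracked region (assumed value, not from the abstract)


class RegionTracker:
    """Toy per-processor table: is a region known to be cached by nobody else?"""

    def __init__(self):
        # region index -> True when no other processor is known to cache lines in it
        self.exclusive = {}

    def region_of(self, addr):
        return addr // REGION_SIZE

    def needs_broadcast(self, addr):
        # Broadcast unless the whole region is known to be unshared.
        return not self.exclusive.get(self.region_of(addr), False)

    def record_snoop_response(self, addr, others_have_region):
        # After a broadcast, remember whether any other processor reported the region.
        self.exclusive[self.region_of(addr)] = not others_have_region

    def observe_external_request(self, addr):
        # Another processor touched the region, so it may now be shared.
        self.exclusive[self.region_of(addr)] = False


tracker = RegionTracker()
first = tracker.needs_broadcast(0x1000)    # unknown region: broadcast required
tracker.record_snoop_response(0x1000, others_have_region=False)
second = tracker.needs_broadcast(0x1040)   # same region, known unshared: broadcast skipped
tracker.observe_external_request(0x1080)
third = tracker.needs_broadcast(0x10C0)    # region possibly shared again: broadcast required
```

The point of tracking at region granularity is that one broadcast can settle the status of many neighbouring cache lines at once.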

2.
Coherence graphs     
We study the consistency of a number of probability distributions, which are allowed to be imprecise. To make the treatment as general as possible, we represent those probabilistic assessments as a collection of conditional lower previsions. The problem then becomes proving Walley's (strong) coherence of the assessments. In order to maintain generality in the analysis, we assume that we are given almost no information about the numbers that make up the lower previsions in the collection. Under this condition, we investigate the extent to which the above global task can be decomposed into simpler and more local ones. This is done by introducing a graphical representation of the conditional lower previsions that we call the coherence graph: we show that the coherence graph allows one to isolate some subsets of the collection whose coherence is sufficient for the coherence of all the assessments; and we provide a polynomial-time algorithm that finds the subsets efficiently. We show some of the implications of our results by focusing on three models and problems: Bayesian and credal networks, of which we prove coherence; the compatibility problem, for which we provide an optimal graphical decomposition; probabilistic satisfiability, for which we show that some intractable instances can instead be solved efficiently by exploiting coherence graphs.

3.
Cache coherence in shared-memory multiprocessor systems has been studied mostly from an architecture viewpoint, often by means of aggregating metrics. In many cases, aggregate events provide insufficient information for programmers to understand and optimize the coherence behavior of their applications. A better understanding would be given by source code correlations of not only aggregate events, but also finer granularity metrics directly linked to high-level source code constructs, such as source lines and data structures. In this paper, we explore a novel application-centric approach to studying coherence traffic. We develop a coherence analysis framework based on incremental coherence simulation of actual reference traces. We provide tool support to extract these reference traces and synchronization information from OpenMP threads at runtime using dynamic binary rewriting of the application executable. These traces are fed to ccSIM, our cache-coherence simulator. The novelty of ccSIM lies in its ability to relate low-level cache coherence metrics (such as coherence misses and their causative invalidations) to high-level source code constructs including source code locations and data structures. We explore the degree of freedom in interleaving data traces from different processors and assess simulation accuracy in comparison to metrics obtained from hardware performance counters. Our quantitative results show that: 1) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches the corresponding hardware performance counters. 2) Detailed, high-level coherence statistics are very useful in detecting, isolating, and understanding coherence bottlenecks. We use ccSIM with several well-known benchmarks and find coherence optimization opportunities leading to significant reductions in coherence traffic and savings in wall-clock execution time.

4.
Brushe, Gary D., and Waller, Jeremy R., On the Computation of an Averaged Coherence Function, Digital Signal Processing 11 (2001) 110–119. The coherence function provides a measure of the statistical independence of two stochastic processes and is computed by using the cross spectrum and auto spectra of those processes. This paper examines the computation of an averaged coherence function from multiple blocks of data obtained from two sensors. In signal detection theory, it is generally assumed that the channels through which a signal passes are wide-sense stationary. However, in practical situations this assumption may not be valid and the signal may actually have passed through nonstationary phase channels. This paper demonstrates that in these situations it may be better to average the cross spectrum (used to compute the coherence function) in the polar coordinate system instead of the Cartesian coordinate system. It is demonstrated that averaging the coherence function under the Cartesian coordinate system can produce an averaged estimate that is inconsistent with the meaning of the individual coherence functions that have been averaged. However, averaging the coherence function under the polar coordinate system produces an averaged coherence function that is consistent with the meaning of the individual coherence functions that have been averaged.
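
The contrast between the two averaging schemes can be seen on a toy case. The sketch below takes three per-block cross-spectrum values at a single frequency bin, all with unit magnitude but with phases that differ from block to block, and averages them both ways. The function names and the rule used for polar averaging (arithmetic mean of magnitudes and of phases, without unwrapping) are assumptions; the abstract does not spell out the paper's exact estimator.

```python
import cmath
import math


def cartesian_average(values):
    """Average real and imaginary parts, i.e. the ordinary complex mean."""
    return sum(values) / len(values)


def polar_average(values):
    """Average magnitudes and phases separately, then recombine.

    Assumption: phases are averaged arithmetically, with no unwrapping.
    """
    mean_mag = sum(abs(v) for v in values) / len(values)
    mean_phase = sum(cmath.phase(v) for v in values) / len(values)
    return cmath.rect(mean_mag, mean_phase)


# Unit-magnitude cross-spectrum values whose phase changes from block to block.
phases = [0.0, 2 * math.pi / 3, -2 * math.pi / 3]
blocks = [cmath.rect(1.0, p) for p in phases]

cart = cartesian_average(blocks)  # the three vectors cancel, magnitude near 0
pol = polar_average(blocks)       # magnitude stays at 1
```

Each block on its own indicates a strong relationship between the two signals, yet the Cartesian mean collapses towards zero because the phases cancel, while the polar mean keeps the magnitude.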

5.
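As multicore processors keep growing in scale and inter-core communication mechanisms become more complex, maintaining cache coherence becomes more difficult. Starting from the background of the cache coherence problem in multicore processors, this paper analyzes the implementation mechanisms of snooping protocols, directory protocols, Token protocols, and Hammer protocols, along with their advantages and disadvantages in multicore environments. It then analyzes and discusses in detail the development trends and technical challenges of future cache coherence protocol design for multicore processors, from the perspectives of co-design of coherence protocols with on-chip interconnect structures, protocol optimization strategies for low-power applications, and cache coherence protocol verification and fault-tolerance mechanisms.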

6.
This article proposes and demonstrates a technique enabling polygon-based scanline hidden-surface algorithms to be used in applications that require a moderate degree of user interaction. Interactive speeds have been achieved through the use of screen-area coherence, a derivative of frame-to-frame coherence and object coherence. This coherence takes advantage of the fact that most of the area of the screen does not change from one frame to the next in applications that have constant viewing positions for a number of frames and in which a majority of the image remains the same. One such application, the user interface of constructive solid geometry (CSG) based modelers, allows a user to modify a model by adding, deleting, repositioning, and performing volumetric Boolean operations on solid geometric primitives. Other possible applications include robot simulation, NC verification, facility layout, surface modeling, and some types of animation. In this article, screen-area coherence is used as the rationale for recalculating only those portions of an image that correspond to a geometric change. More specifically, this article describes a scanline hidden-surface removal procedure that uses screen-area coherence to achieve interactive speeds. A display algorithm using screen-area coherence within a CSG-based scanline hidden-surface algorithm was implemented and tested. Screen-area coherence reduced the average frame update time to about one quarter of the original time for three test sequences of CSG modeling operations.

7.
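Cui Jie (崔洁), Liu Hui (刘辉). Software (《软件》), 2011, 32(9): 61-63, 66
To study the coherence bandwidth of wireless channels in mine tunnels, a formula for the correlation coefficient as a function of frequency difference is derived on the basis of a ray-tracing model. In the derived formula, carrier frequency, propagation distance, and antenna position are the important parameters affecting the channel coherence bandwidth, and the effect of each on the coherence bandwidth is analyzed by simulation. The conclusion is that the coherence bandwidth in underground tunnels is proportional to frequency and inversely proportional to propagation distance.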

8.
In this paper, we investigate a class of coherence strategies in which an abstraction for shared data at the program-level, referred to as Shared Regions (SR), is used to manage caches dynamically through software. The practical value of these strategies is measured by their performance relative to existing hardware coherence protocols, and the complexity of the SR programming interface. We present detailed quantitative results highlighting the performance of a wide array of SR coherence algorithms, including some novel algorithms introduced in this paper that use direct cache-to-cache data transfers via software to improve performance. These algorithms are studied using execution-driven simulation and compared to a representative hardware strategy for a suite of parallel applications. The experimental results show that the best SR coherence strategy for each application is comparable to or significantly better than the representative hardware strategy for all of the applications that we examine. Our study of programming complexity in SR finds that, for the types of applications that we study, inserting SR annotations is a relatively simple and methodical task.

9.
This paper proposes that self-deception results from the emotional coherence of beliefs with subjective goals. We apply the HOTCO computational model of emotional coherence to simulate a rich case of self-deception from Hawthorne's The Scarlet Letter. We argue that this model is more psychologically realistic than other available accounts of self-deception, and discuss related issues such as wishful thinking, intention, and the division of the self.

10.
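A new method is proposed for verifying the correctness of parameterized protocols by finding invariants of cache coherence protocols. The difficulty in verifying a cache coherence protocol is that the protocol must be proved correct for parameterized systems of arbitrary size. We compute auxiliary invariants by looking for correspondences between invariants and protocol rules, which helps in deriving and verifying cache coherence protocols. We designed and implemented an invariant-finding tool, applied it to the German protocol to compute its auxiliary invariants, and successfully verified the protocol's safety properties.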

11.
Quantum coherence plays a central role in quantum mechanics and provides essential power for quantum information processing. In this paper, we study the dynamics of the \(l_1\) norm coherence in one-dimensional quantum walk on cycles for two initial states. For the first initial state, the walker starts from a single position. The coherence increases with the number of steps at the beginning and then fluctuates over time after approaching saturation. The coherence with an odd number of sites is much larger than that with an even number of sites. A second initial state, the equal superposition state, is also considered. The coherence of the whole system is proved to be \(N-1\) (\(2N-1\)) for any odd (even) time step, where \(N\) is the number of sites. We also investigate the influence of two unitary noises, i.e., noisy Hadamard operator and broken link, on the coherence evolution.
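
For reference, the \(l_1\) norm of coherence of a density matrix in a fixed basis is the sum of the absolute values of its off-diagonal entries. The sketch below computes it for plain nested lists; the helper names are illustrative, and the quantum walk dynamics from the abstract are not simulated here.

```python
def l1_coherence(rho):
    """Sum of absolute values of the off-diagonal entries of a density matrix."""
    n = len(rho)
    return sum(abs(rho[i][j]) for i in range(n) for j in range(n) if i != j)


def uniform_superposition_density(d):
    """Density matrix of the equal superposition of d basis states: every entry is 1/d."""
    return [[1.0 / d for _ in range(d)] for _ in range(d)]


d = 6
c_uniform = l1_coherence(uniform_superposition_density(d))  # d * (d - 1) entries of size 1/d
c_basis = l1_coherence([[1.0, 0.0], [0.0, 0.0]])            # a basis state has no off-diagonal terms
```

For a \(d\)-dimensional equal superposition the result is \(d-1\), which is the largest value this measure can take in dimension \(d\).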

12.
Virtual-memory-based cache coherence is a mechanism that relies only on hardware that already exists on the microprocessors of a shared-memory multiprocessor system, yet dynamically detects and resolves potential cache inconsistencies using virtual-memory techniques. The key feature of the approach is that the virtual-memory translation hardware on each processor is used to detect shared accesses that could lead to memory incoherencies, and VM page fault handlers execute the appropriate actions to maintain cache coherence. VM-based cache coherence basically trades off design simplicity for increased software overheads. The work presented in this paper evaluates this trade-off. We show that VM-based cache coherence performs well for scientific applications that require significant aggregate memory bandwidth.

13.
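To address decorrelation caused by height differences between scattering centers in densely vegetated areas, a Pol-InSAR coherence registration method incorporating optimal coherence computation is proposed. Several key issues that need to be considered when registering Pol-InSAR complex images are discussed, and registration results for SIR-C/SLC complex images of the Tianshan area demonstrate the effectiveness of the method.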

14.
Circuit-switched networks can significantly lower the communication latency between processor cores, when compared to packet-switched networks, since once circuits are set up, communication latency approaches pure interconnect delay. However, if circuits are not frequently reused, the long set up time and poorer interconnect utilization can hurt overall performance. To combat this problem, we propose a hybrid router design which intermingles packet-switched flits with circuit-switched flits. Additionally, we co-design a prediction-based coherence protocol that leverages the existence of circuits to optimize pair-wise sharing between cores. The protocol allows pair-wise sharers to communicate directly with each other via circuits and drives up circuit reuse. Circuit-switched coherence provides overall system performance improvements of up to 17% with an average improvement of 10% and reduces network latency by up to 30%.

15.
16.
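Discourse coherence modeling is a fundamental problem in natural language processing research. Mainstream discourse coherence models fall into two classes: entity-grid-based models and neural-network-based models. Entity-grid-based models require feature extraction, while deep-learning-based models do not fully consider the important role that entity links between sentences in a discourse play in coherence modeling. For this reason, this paper first extracts entity information from adjacent sentences in a discourse and represents it in distributed form, then fuses this information into a sentence-level bidirectional LSTM deep learning model through several simple and effective vector operations. Experiments on two tasks over Chinese and English discourse corpora, sentence ordering and coherence detection for Chinese-English machine translation, show that the proposed model improves on existing models, with a significant improvement on Chinese.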

17.
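Cui Jie (崔洁), Xu Zhao (徐钊), Huo Yu (霍羽). Industry and Mine Automation (《工矿自动化》), 2012, 38(8): 36-40
Based on the multimode theory of electromagnetic wave propagation in mine tunnels and on ray propagation theory, a channel model for wireless signals in coal mine tunnels is established. Taking into account the important parameters affecting channel coherence bandwidth, namely propagation distance, tunnel cross-section size, carrier frequency, antenna polarization, antenna position, and the electrical parameters of the medium, a formula for the coherence bandwidth of wireless channels in coal mine tunnels is derived. The effect of these parameters on coherence bandwidth is simulated in a rectangular tunnel 4 m wide and 3 m high, leading to the following conclusions: the coherence bandwidth in coal mine tunnels is proportional to frequency and inversely proportional to propagation distance, tunnel cross-section size, and the electrical parameters of the medium, and changing the antenna position within the tunnel cross-section has little effect on the coherence bandwidth.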

18.
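The basic concept of the color coherence vector is described, a method for computing a similarity measure on block-based color coherence vectors and a fast search algorithm for relevant regions are proposed, and together they form an image retrieval algorithm based on block-based color coherence vectors. Experiments show that the algorithm agrees better with human subjective perception.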
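
A color coherence vector records, for each quantized color, how many pixels lie in large connected regions of that color (coherent) and how many do not (incoherent). The sketch below is a minimal whole-image version, assuming 4-connectivity, a size threshold `tau`, and an L1 distance between vectors; these choices and all function names are illustrative, and the block partitioning and fast region search from the abstract are not reproduced.

```python
from collections import deque


def color_coherence_vector(image, tau):
    """Return {color: [coherent_pixels, incoherent_pixels]} for a 2D grid of quantized colors.

    A pixel is coherent when its 4-connected same-color region has at least tau pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    ccv = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = image[r][c]
            # Breadth-first search measures the connected region containing (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            counts = ccv.setdefault(color, [0, 0])
            counts[0 if size >= tau else 1] += size
    return ccv


def ccv_distance(a, b):
    """L1 distance between two CCVs over coherent and incoherent counts per color."""
    total = 0
    for color in set(a) | set(b):
        a_coh, a_inc = a.get(color, [0, 0])
        b_coh, b_inc = b.get(color, [0, 0])
        total += abs(a_coh - b_coh) + abs(a_inc - b_inc)
    return total


# Two toy images with the same color histogram but different spatial layout.
clustered = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
scattered = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
ccv_clustered = color_coherence_vector(clustered, tau=3)
ccv_scattered = color_coherence_vector(scattered, tau=3)
dist = ccv_distance(ccv_clustered, ccv_scattered)
```

A plain color histogram cannot tell these two images apart, whereas their coherence vectors differ in every entry. A block-based variant would presumably compute one such vector per image block and compare blocks pairwise.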

19.
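How to implement cache coherence efficiently in shared-memory systems is a key and difficult problem in architecture design. Existing directory-based protocols are hard to implement, complex to verify, and costly in storage space. Targeting on-chip many-core processors, this paper proposes a hardware-supported, synchronization-based cache coherence protocol. The scheme does not use a directory; instead it represents coherence information with a bloom filter and maintains cache coherence at the synchronization points of parallel programs. Compared with existing directory-based cache coherence protocols, the scheme reduces the implementation and verification complexity of directory protocols. Evaluation with the SPLASH-2 benchmark suite shows that the synchronization-based protocol achieves performance comparable to directory-based protocols.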
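
To make the bloom-filter idea concrete, the sketch below summarizes a set of cache-line addresses in a fixed number of bits and clears the summary at a synchronization point. What the paper actually records in its filters is not stated in the abstract, so the choice to track written line addresses, the filter size, the hash construction, and the names `BloomFilter` and `written_lines` are all assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hashes: false positives possible, false negatives not."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def clear(self):
        self.bits = 0


# Hypothetical use: record lines written since the last synchronization point.
written_lines = BloomFilter()
for line_addr in (0x1000, 0x1040, 0x2000):
    written_lines.add(line_addr)

hit = written_lines.might_contain(0x1040)          # an added address is always reported
written_lines.clear()                              # e.g. after a barrier has been handled
after_clear = written_lines.might_contain(0x1040)  # an empty filter reports nothing
```

The filter's storage cost stays constant no matter how many lines are recorded, at the price of occasional false positives that would cause harmless extra invalidations.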

20.
Previous work in scalable hardware distributed shared memory (DSM) multiprocessors has established the critical and dominant role that protocol processing bandwidth (or its inverse, occupancy) plays in determining overall performance in architectures with standalone memory/coherence controllers. However, with recent architectural trends toward integrated (on-chip) memory controllers and the well-known fact that processor frequency is increasing more rapidly than memory systems, we must ask whether parallel coherence processing engines (either multiple integrated protocol processors/cores or multiple protocol threads) are needed in DSM machines constructed from modern processor architectures and, if so, when. We construct a useful analytical model to give the designer insight into when parallel coherence streams will improve performance and verify our model via detailed simulation on 64-threaded microbenchmarks and parallel applications and on single-node multiprogrammed workloads. Surprisingly, and contrary to related work, we find that, in these architectures, adding a second coherence engine has almost no impact on performance. Further, for less-tuned applications that suffer from hot spots (contentious requests to the same memory line), additional engines offer no benefit whatsoever. Even with double the memory bandwidth (or channels), an additional coherence processing stream yields only slight performance improvement. Only for a special class of DSM machines employing directoryless broadcast protocols over unordered interconnects does parallel "snoop" processing offer reasonable performance improvement for communication-intensive applications. Overall, given the architectural trends, this is good news for DSM designers who want to minimize the resources necessary (protocol threads or integrated protocol processor cores for maintaining internode coherence, respectively) to create SMTp-based or multi-CMP-based scalable DSM machines using directory protocols.  
