Found 20 similar documents; search time: 15 ms
1.
The microprocessor industry is rapidly moving to the use of multicore chips as general-purpose processors. Whereas the current generation of chip multiprocessors (CMPs) targets server applications, future desktop processors will likely have tens of multithreaded cores on a single die. Various redundant multithreading (RMT) approaches exploit the multithreaded capability of current general-purpose microprocessors. These approaches replicate the entire program, running it as a separate thread using time or space redundancy. This guards the processor core against all errors, including those in combinational logic. Because RMT exploits the existing multithreaded hardware, it requires only a modest amount of additional hardware support for comparing results and, depending on the implementation, duplicating inputs.
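The replicate-and-compare idea in this abstract can be illustrated in software. The following is a minimal sketch, not the hardware mechanism the paper describes: it assumes time redundancy (the same computation run twice on a duplicated input) and an integer result that can be compared directly. The names `run_redundantly`, `make_faulty`, and `TransientFaultDetected` are hypothetical and introduced only for illustration.

```python
from typing import Callable


class TransientFaultDetected(Exception):
    """Raised when the two redundant executions disagree."""


def run_redundantly(compute: Callable[[int], int], replicated_input: int) -> int:
    # Time redundancy: execute the same computation twice on the same input.
    leading = compute(replicated_input)
    trailing = compute(replicated_input)
    # Result comparison: any mismatch is treated as a detected fault.
    if leading != trailing:
        raise TransientFaultDetected(f"results differ: {leading} vs {trailing}")
    return leading


def make_faulty(compute: Callable[[int], int], flip_on_call: int) -> Callable[[int], int]:
    """Wrap `compute` so that one chosen call returns a result with its lowest bit flipped.

    This only simulates a transient fault for demonstration purposes.
    """
    calls = {"n": 0}

    def wrapped(x: int) -> int:
        calls["n"] += 1
        result = compute(x)
        if calls["n"] == flip_on_call:
            return result ^ 1  # simulated single-bit upset
        return result

    return wrapped
```

Running a fault-free function returns its normal result, while a function wrapped with `make_faulty(..., flip_on_call=2)` makes the second (trailing) execution disagree, so the comparison raises `TransientFaultDetected`. Detection alone does not say which copy was wrong; recovery needs extra state.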
2.
Transient-fault recovery for chip multiprocessors (total citations: 1; self-citations: 0; citations by others: 1)
Chip-level redundant threading with recovery (CRTR) for chip multiprocessors extends previous transient-fault detection schemes to provide fault recovery. To hide interprocessor latency, CRTR uses a long slack enabled by asymmetric commit and uses the trailing thread state for recovery. CRTR increases bandwidth supply by pipelining communication paths and reduces bandwidth demand by extending the dependence-based checking elision.
3.
Wu K.-L. Fuchs W.K. Patel J.H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):231-240
The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
4.
Proposed hardware optimizations to CC-NUMA machines (shared-memory multiprocessors that use cache consistency protocols) can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each.
5.
In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three can successfully detect an attack in about one fifth of the time required to complete it. We observed no false positives in our test environment, and the overhead caused by the detection systems is negligible. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization we are confident these systems can be used in real-world scenarios.
6.
Chiung-San Lee Tai-Ming Parng 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(7):755-767
A methodology, called Subsystem Access Time (SAT) modeling, is proposed for the performance modeling and analysis of shared-bus multiprocessors. The methodology is subsystem-oriented because it is based on a Subsystem Access Time Per Instruction (SATPI) concept, in which we treat major components other than processors (e.g., off-chip cache, bus, memory, I/O) as subsystems and model for each of them the mean access time per instruction from each processor. The SAT modeling methodology is derived from the Customized Mean Value Analysis (CMVA) technique, which is request-oriented in the sense that it models the weighted total mean delay for each type of request processed in the subsystems. The subsystem-oriented view of the proposed methodology facilitates divide-and-conquer modeling and bottleneck analysis, which have rarely been addressed previously. These distinguishing features lead to a simple, general, and systematic approach to the analytical modeling and analysis of complex multiprocessor systems. To illustrate the key ideas and features that are different from CMVA, an example performance model of a particular shared-bus multiprocessor architecture is presented. The model is used to conduct performance evaluation for throughput prediction. The SATPIs of the subsystems are then used directly to identify the bottleneck subsystem and find the requests or subsystem components that cause the bottleneck. Furthermore, the SATPIs of the subsystems are employed to explore the impact of several performance-influencing factors, including memory latency, number of processors, data bus width, and DMA transfer.
7.
In this paper, we explore two techniques for reducing memory latency in bus-based multiprocessors. The first one, designed for sector caches, is a snoopy cache coherence protocol that uses a large transfer block to take advantage of spatial locality, while using a small coherence block (called a subblock) to avoid false sharing. The second technique is read snarfing (or read broadcasting), in which all caches can acquire data transmitted in response to a read request to update invalid blocks in their own cache.
We evaluated the two techniques by simulating 6 applications that exhibit a variety of reference patterns. We compared the performance of the new protocol against that of the Illinois protocol with both small and large block sizes and found that it was effective in reducing memory latency and provided more consistently good results than the Illinois protocol with a given line size. Read snarfing also improved performance, mostly for protocols that use large line sizes.
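Read snarfing as described above can be sketched with a toy bus model. This is an illustrative simplification, not the protocol evaluated in the paper: it assumes write-through caches (so memory always holds current data), a single coherence granularity, and no subblocks. The `Cache` and `Bus` classes and the `bus_reads` counter are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    name: str
    # Maps block address -> [valid flag, data]. An entry with valid == False
    # models a line whose tag is still present but whose contents were invalidated.
    lines: dict = field(default_factory=dict)

    def invalidate(self, addr):
        if addr in self.lines:
            self.lines[addr][0] = False

    def snarf(self, addr, data):
        """Pick up data seen on the bus if this cache holds an invalid copy of the block."""
        line = self.lines.get(addr)
        if line is not None and not line[0]:
            line[0] = True
            line[1] = data
            return True
        return False


class Bus:
    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches
        self.bus_reads = 0  # number of read transactions placed on the bus

    def read(self, requester, addr):
        line = requester.lines.get(addr)
        if line is not None and line[0]:
            return line[1]  # cache hit, no bus traffic
        self.bus_reads += 1
        data = self.memory[addr]
        requester.lines[addr] = [True, data]
        # Read snarfing: every other cache may refresh its invalid copy
        # from the data transmitted in response to this read.
        for cache in self.caches:
            if cache is not requester:
                cache.snarf(addr, data)
        return data

    def write(self, requester, addr, data):
        # Write-through (an assumption of this sketch): update memory,
        # keep the writer's copy valid, invalidate all other copies.
        self.memory[addr] = data
        requester.lines[addr] = [True, data]
        for cache in self.caches:
            if cache is not requester:
                cache.invalidate(addr)
```

In this model, after a write invalidates two sharers, the first sharer's read miss also revalidates the second sharer's copy, so the second sharer's next read is a hit and adds no bus read.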
8.
Arun K. Nanda Honda Shing Ten-Hwan Tzen Lionel M. Ni 《Journal of Parallel and Distributed Computing》1991,12(4)
A standard metric conventionally employed to compare the performance of different multiprocessor systems is speedup. Although providing a measure of the improvement in execution speed achievable on a system, this metric does not yield any insight into the factors responsible for limiting the potential improvement in speed. This paper studies the performance degradation in shared-memory multiprocessors as a result of contention for shared-memory resources. A replicate workload framework with a flexible mechanism for workload specification is proposed for measuring performance. Two normalized performance metrics, efficiency and overhead factor, are introduced to quantify the factors limiting performance and facilitate comparison across architectures. Finally, the proposed model is employed to measure and compare the performance of three contemporary shared-memory systems, with special emphasis on the newly released BBN Butterfly-II (TC2000), currently undergoing Beta test.
9.
10.
The majority of analyzers in modern formal text languages are able to detect more than one error in one pass. Little attention, though, has been devoted to the problem of syntax error recovery in diagrams of graphical languages. In this paper, a method for analysis of formal graphical languages with error recovery is suggested.
11.
Dexterous manipulation is an important function for working robots. Manipulator tasks such as assembly and disassembly can generally be divided into several motion primitives. We call such motion primitives “skills,” and explain how most manipulator tasks can be composed of sequences of these skills. We are currently planning to construct a maintenance robot for household electrical appliances. We considered establishing a hierarchy of the manipulation tasks of this robot since the maintenance of such appliances has become more complex than ever before. In addition, as errors seem likely to increase in complex tasks, it is important to implement an effective error recovery technology. This article presents our proposal for a new type of error recovery that uses the concepts of task stratification and error classification.
12.
Jean G. Vaucher 《Software》1979,9(11):925-929
Sequence checking is such a common operation in commercial data processing that one assumes it no longer presents any programming problems; however, many examples presented in the current literature are either incorrect or misleading inasmuch as they do not state the necessary conditions for proper operation. Checking algorithms generally detect sequence errors and decide which record is in error based on a comparison between two successive record key values (current and previous). We show that identification of the record in error with any degree of certainty requires at least four successive key values and suggest that simplistic recovery in the face of sequence errors should not be attempted.
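The ambiguity of the two-key comparison can be seen in a small sketch. This is not the algorithm from the paper; it assumes keys are meant to be in non-decreasing order, and the helper names `find_sequence_breaks` and `candidates_in_error` are hypothetical. The second helper simply asks which of the two records around a break could be dropped to leave the whole list in order.

```python
def find_sequence_breaks(keys):
    """Return indices i where keys[i] < keys[i - 1] (the classic current-vs-previous check)."""
    return [i for i in range(1, len(keys)) if keys[i] < keys[i - 1]]


def candidates_in_error(keys, break_index):
    """Return the indices, among the two records around a break, whose removal restores order.

    Two candidates means the comparison of the two keys alone cannot
    say which record is the one in error.
    """
    suspects = []
    for suspect in (break_index - 1, break_index):
        remaining = keys[:suspect] + keys[suspect + 1:]
        if all(a <= b for a, b in zip(remaining, remaining[1:])):
            suspects.append(suspect)
    return suspects
```

For the keys `[10, 20, 15, 30]` the break is detected at index 2, but removing either 20 or 15 restores order, so both records remain suspects. For `[10, 50, 20, 30]` the surrounding keys single out 50, which shows why context beyond the two compared keys matters.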
13.
Lutz R.R. Wong J.S.K. 《IEEE transactions on pattern analysis and machine intelligence》1992,18(8):749-760
A mechanism for modeling timing, precedence, and data-consistency constraints on concurrently executing processes is presented. The model allows durations and intervals between events to be specified. An algorithm is provided to detect schedules which may be unsafe with respect to the constraints. This work, motivated by the design and validation of autonomous error-recovery strategies on the Galileo spacecraft, appears to be applicable to a variety of asynchronous real-time systems.
14.
This paper presents the performance analysis results for the RAP-WAM AND-Parallel Prolog architecture on shared-memory multiprocessor organizations. The goal of this parallel model is to provide inference speeds beyond those attainable in sequential systems, while supporting conventional logic programming semantics. Special emphasis is placed on sequential performance, storage efficiency, and low control overhead. First, the concepts and techniques used in the parallel execution model are described, along with the general methodology, benchmarks, and simulation tools used for its evaluation. Results are given both at the memory reference level and at the memory organization level. A two-level shared-memory architecture model is presented together with an analysis of various solutions to the cache coherency problem. Finally, RAP-WAM shared-memory simulation results are presented. It is argued that the RAP-WAM model can exploit coherent caches and attain speeds in excess of 2 MLIPS with current shared-memory multiprocessing technology for real applications that exhibit medium degrees of parallelism.
MCC
15.
《International Journal of Computer Mathematics》2012,89(2):107-119
The paper is the second in a series of three papers devoted to a detailed study of LR(k) parsing with error recovery and correction. Error recovery in LR(k) parsing of a context-free grammar is formalized by extending an LR(k) parser of the grammar such that it accepts all strings over the terminal vocabulary. The parse produced by this extension for a terminal string is a right parse if the string is in the language. In the case of a string not in the language the parse produced by the extension contains so-called error productions which represent the error recovery actions performed by the extension. The treatment is based on the formalization of LR(k) parsing presented in the first paper in the series and it covers practically all error recovery methods designed for LR(k) parsing.
16.
Hierarchical ring-based multiprocessor systems are attractive and enjoy several advantages over other types of systems. They ensure unique paths between nodes, simple node interfaces, and simple cross-ring connections. Furthermore, employing point-to-point links allows the system to run at a high clock rate, which increases bandwidth and decreases latency. This paper investigates the performance of hierarchical ring-based shared-memory multiprocessors. Rings in the hierarchy are composed of point-to-point, unidirectional links and apply the Scalable Coherent Interface (SCI) protocol. We place special emphasis on the impact of locality on processor and interconnection design issues such as the number of outstanding requests and ring topology. We find that in order to exploit the power of hierarchical multiprocessors, an accurate and appropriate model of locality must be used. Hierarchical multiprocessors that are well balanced (uniform) tend to provide lower latency and higher system throughput. For non-uniform systems, a high degree of locality is required for the hierarchies to perform well. However, restricting the number of outstanding transactions per processor is important in decreasing packet latency and avoiding network contention.
17.
《Robotics》1987,3(3-4):353-359
Flexible Manufacturing Systems involving a large number of independent control computers require convenient interfaces between the computers, and if the system is to be largely or entirely under computer control, it is necessary to include facilities for automatic error recovery. The main purpose of this paper is to describe the logical organization of a robotic interface, implemented in our laboratory, that supports automatic error recovery in a convenient way. This organization was inspired by, but is different from, the approach to error condition handling implemented in a widely used computer operating system.
18.
Douglas C. Burger Rahmat S. Hyder Barton P. Miller David A. Wood 《The Journal of supercomputing》1996,10(1):87-104
Massively parallel processors have begun using commodity operating systems that support demand-paged virtual memory. To evaluate the utility of virtual memory, we measured the behavior of seven shared-memory parallel application programs on a simulated distributed-shared-memory machine. Our results (1) confirm the importance of gang CPU scheduling, (2) show that a page-faulting processor should spin rather than invoke a parallel context switch, (3) show that our parallel programs frequently touch most of their data, and (4) indicate that memory, not just CPUs, must be gang scheduled. Overall, our experiments demonstrate that demand paging has limited value on current parallel machines because of the applications' synchronization and memory reference patterns and the machines' high page-fault and parallel context-switch overheads. An earlier version of this paper was presented at Supercomputing '94. This work is supported in part by NSF Presidential Young Investigator Award CCR-9157366; NSF Grants MIP-9225097, CCR-9100968, and CDA-9024618; Office of Naval Research Grant N00014-89-J-1222; Department of Energy Grant DE-FG02-93ER25176; and donations from Thinking Machines Corporation, Xerox Corporation, and Digital Equipment Corporation.
19.
The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.
20.
Cooperative cache-based data access in ad hoc networks (total citations: 1; self-citations: 0; citations by others: 1)
Cooperative caching, in which multiple nodes share and coordinate cached data, is widely used to improve Web performance in wired networks. However, resource constraints and node mobility have limited the application of these techniques in ad hoc networks. We propose caching techniques that use the underlying routing protocols to overcome these constraints and further improve performance.