Similar Documents
20 similar documents found.
1.
Most real-time systems employ N-modular — most commonly triple-modular — redundancy for fault-tolerance. When a processor in a triad fails permanently, a spare processor (if available) must be switched in to take its place. If the failure is transient, the affected processor will be brought back into the triad after it recovers. In either case, it is necessary to make the memory of all three members of the triad consistent. This can be done by copying into the recovering or substitute processor the writeable memory of the two processors that are still functional. The time required for this can depend on the workload, and the rate at which this workload writes into its memory. Until the processor has recovered, is resynchronized with its colleagues in the triad, and resumes normal operation, the triad is effectively a duplex and will suffer fatal failure if one more of its processors fails. In this paper, we study the impact of workload on the recovery time, and therefore on the reliability, of processor triads.
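As a rough illustration of the workload dependence described above, the following sketch (mine, not from the paper) models the recovery window as the time needed to copy the writeable memory while the workload keeps writing to it, and estimates the chance that a second failure strikes while the triad is running as a duplex. The parameter names and the exponential-failure assumption are illustrative.

```python
import math

def p_fatal_during_recovery(lambda_fail, writable_mb, write_rate_mb_s, copy_rate_mb_s):
    """Probability that one of the two surviving processors fails before the
    recovering/substitute processor is resynchronized.

    Assumptions (not from the paper): exponential failure times with rate
    lambda_fail per processor, and a recovery time dominated by copying the
    writeable memory while the workload keeps dirtying it.
    """
    if write_rate_mb_s >= copy_rate_mb_s:
        return 1.0  # the copy never converges; the triad stays a duplex
    # Effective copy time: the copier must also catch up with new writes.
    recovery_time = writable_mb / (copy_rate_mb_s - write_rate_mb_s)
    # Two survivors, each failing at rate lambda_fail during the window.
    return 1.0 - math.exp(-2.0 * lambda_fail * recovery_time)

# Example: 512 MB of writeable memory, light vs. heavy write workload.
for wr in (1.0, 40.0):
    p = p_fatal_during_recovery(lambda_fail=1e-5, writable_mb=512,
                                write_rate_mb_s=wr, copy_rate_mb_s=50.0)
    print(f"write rate {wr:5.1f} MB/s -> P(fatal during recovery) = {p:.2e}")
```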

2.
Real-time computers are frequently used in harsh environments, such as space or industry. Lightning strikes, streams of elementary particles, and other manifestations of a harsh operating environment can cause transient failures in processors. Since the entire system is in the same environment, an especially severe disturbance can result in a momentary, correlated failure of all the processors. To have the system survive transient correlated failures and still execute all its critical workload on time, designers must use time redundancy. To survive permanent or transient independently-occurring failures, processor redundancy must be used, and the computer configured into redundant clusters. Given a fixed total number of processors, there is a tradeoff between processor- and time-redundancy. This paper considers the tradeoffs between configuring the system into duplexes and triplexes. There are pessimistic and optimistic reliability models for each configuration. For the range of pertinent parameters, these models are very close, indicating that they are quite accurate. The duplex-triplex tradeoff is between the effects of permanent, independent-transient, and correlated-transient failures. Configuring the system in triplexes provides better protection against permanent and independent-transient failures, but diminishes protection against correlated-transient failures. The better configuration is given for each application.
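A minimal sketch of the duplex-versus-triplex tradeoff, under simplified assumptions of my own (majority voting per cluster, independent processor failures, and a single lumped probability that re-execution masks a correlated transient); it is not the paper's pessimistic or optimistic model.

```python
import math

def cluster_survives_mission(n, k_needed, p_proc_fail):
    """P(at least k_needed of n processors survive independent failures)."""
    ok = 0.0
    for alive in range(k_needed, n + 1):
        ok += math.comb(n, alive) * (1 - p_proc_fail) ** alive * p_proc_fail ** (n - alive)
    return ok

def system_reliability(total_procs, cluster_size, p_proc_fail, p_correlated,
                       p_recover_by_retry):
    """Very rough sketch: every cluster must survive independent failures, and
    a correlated transient hitting all processors must be masked by time
    redundancy (re-execution), which succeeds with p_recover_by_retry.
    Smaller clusters leave more clusters and hence more slack for retries.
    """
    clusters = total_procs // cluster_size
    k_needed = cluster_size // 2 + 1          # majority must be fault-free (both, for a duplex)
    r_indep = cluster_survives_mission(cluster_size, k_needed, p_proc_fail) ** clusters
    r_corr = (1 - p_correlated) + p_correlated * p_recover_by_retry
    return r_indep * r_corr

# 12 processors: six duplexes (more retry slack) vs. four triplexes.
print("duplexes :", system_reliability(12, 2, 1e-3, 1e-4, p_recover_by_retry=0.99))
print("triplexes:", system_reliability(12, 3, 1e-3, 1e-4, p_recover_by_retry=0.90))
```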

3.
A simple yet effective method for improving multicomputer/multiprocessor system reliability via redundant allocation of tasks to computers (processors) is described. Given any known (nonredundant) scheduling strategy, tasks are allocated to processors statically and redundantly using a k-circular shifting (KCS) algorithm, so that if some processors fail during the execution, all tasks can still be completed on the remaining processors (though with a longer completion time). Redundant allocation of independent tasks to identical processors (computers), subject to real-time constraints on total execution time, is discussed in detail, and analytic reliability estimates are derived. Longest-processing-time scheduling is given as an example of nonredundant deterministic scheduling of independent tasks. Processor utilization for redundant task allocation is discussed and compared with standby redundancy: the authors' KCS algorithm achieves much higher processor utilization than standby redundancy.
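A small sketch of KCS-style redundant allocation as the abstract describes it, assuming a nonredundant primary assignment is already given; the exact shifting rule and data structures are my reading, not the paper's algorithm.

```python
def kcs_allocation(base_assignment, num_procs, k):
    """Redundantly allocate tasks: each task also goes to the processors
    obtained by circularly shifting its primary processor index by 1..k.
    `base_assignment` maps task -> primary processor (any nonredundant
    schedule, e.g. longest-processing-time).  Returns processor -> task list.
    """
    alloc = {p: [] for p in range(num_procs)}
    for task, primary in base_assignment.items():
        for shift in range(k + 1):                      # primary copy plus k backups
            alloc[(primary + shift) % num_procs].append(task)
    return alloc

def surviving_tasks(alloc, failed):
    """Tasks still completable when the processors in `failed` are down."""
    return {t for p, tasks in alloc.items() if p not in failed for t in tasks}

# Example: 6 tasks on 4 processors, LPT-like primary assignment, k = 1 backup copy.
base = {"t0": 0, "t1": 1, "t2": 2, "t3": 3, "t4": 0, "t5": 1}
alloc = kcs_allocation(base, num_procs=4, k=1)
print(alloc)
print("tasks still runnable if processor 2 fails:", surviving_tasks(alloc, {2}))
```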

4.
A new strategy for fault diagnosis and reconfiguration of linear processor arrays is proposed. This strategy can be implemented in a distributed manner, suitable for arrays with a large number of processors such as those implemented by VLSI and WSI techniques. The proposed fault diagnosis is based on the distributed voting technique. Additional links are added for fault diagnosis. These links are also used, when there is a processor failure, for communication between the processors. Some reconfiguration schemes are also presented emphasizing the distributed approach.
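A toy sketch of diagnosis by distributed voting in a linear array, assuming each processor can compare its test result with a small neighbourhood reachable over the regular and additional links; the neighbourhood size and voting rule here are illustrative guesses, not the paper's scheme.

```python
def diagnose_by_distributed_voting(results, radius=2):
    """Flag processor i as faulty when its test result disagrees with the
    majority of the processors it can reach (up to `radius` positions away
    in the linear array, via the regular and additional links)."""
    n = len(results)
    faulty = []
    for i in range(n):
        neighbours = [results[j] for j in range(max(0, i - radius),
                                                min(n, i + radius + 1)) if j != i]
        agree = sum(1 for r in neighbours if r == results[i])
        if agree < len(neighbours) / 2:          # minority view -> flagged as faulty
            faulty.append(i)
    return faulty

# Processor 3 returns a wrong test result and is singled out by its neighbours.
print(diagnose_by_distributed_voting([7, 7, 7, 9, 7, 7, 7, 7]))   # -> [3]
```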

5.
A processor is any self-contained computer of at least personal-computer capability. The paper explores how much the processor mean time-to-failure can be improved by replacing it with an N-processor module, where each processor in the module consists of a copy of the original processor augmented with a communication protocol unit. The copy of the original processor is faulty with probability pc, and the protocol unit is faulty with probability p. The asynchronous N-processor module uses a Byzantine agreement (F-ID-P) algorithm to identify which of its processors disagreed with a module consensus. The identified processors are presumed faulty, and the module replaces them with duplicates from a set of standbys. The F-ID-P algorithm is a modification of Bracha's, which guarantees that in a module of 3t+1 processors, up to t faults can be identified by at least t+1 non-faulty processors. The module fails if faults in more than t of its processors prevent it from: 1) obtaining a correct consensus, or 2) executing the algorithm. The F-ID-P algorithm departs from Bracha's by using a random rather than an adversary scheduler of message delays. Simulation showed that the F-ID-P algorithm almost always correctly identified all of a module's faulty processors if more than half of the module's processors were nonfaulty. Thus the F-ID-P algorithm was about 3/2 times more fault tolerant than guaranteed. Also, compared to a single processor's mean number of decisions to failure, the F-ID-P module was 841 times better when N=37, down to 5.1 times better when N=10.
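The sketch below is not the F-ID-P algorithm itself; it only illustrates the module-level arithmetic under a simplification of mine: if identified faulty processors are swapped for standbys after every decision, the rounds become independent and the mean number of decisions to failure is geometric in the per-round probability of exceeding t = floor((N-1)/3) simultaneous faults. The fault probability used is arbitrary, so the output will not reproduce the paper's 841x and 5.1x figures.

```python
import math

def p_round_fatal(n, p_fault):
    """P(more than t = floor((n-1)/3) of the n processors are faulty in one
    decision round), assuming identified faulty processors are replaced by
    good standbys after every round so rounds are independent."""
    t = (n - 1) // 3
    return sum(math.comb(n, k) * p_fault**k * (1 - p_fault)**(n - k)
               for k in range(t + 1, n + 1))

def mean_decisions_to_failure(n, p_fault):
    # Independent rounds -> decisions-to-failure is geometrically distributed.
    return 1.0 / p_round_fatal(n, p_fault)

p = 0.2                       # illustrative per-round fault probability
single = 1.0 / p              # a lone processor fails on its first faulty decision
for n in (10, 37):
    m = mean_decisions_to_failure(n, p)
    print(f"N={n:2d}: module ~{m:,.0f} decisions to failure "
          f"vs. {single:.0f} for a single processor ({m / single:,.0f}x)")
```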

6.
This paper addresses embedded multiprocessor implementation of iterative, real-time applications, such as digital signal and image processing, that are specified as dataflow graphs. Scheduling dataflow graphs on multiple processors involves assigning tasks to processors (processor assignment), ordering the execution of tasks within each processor (task ordering), and determining when each task must commence execution. We consider three scheduling strategies: fully-static, self-timed and ordered transactions, all of which perform the assignment and ordering steps at compile time. Run time costs are small for the fully-static strategy; however it is not robust with respect to changes or uncertainty in task execution times. The self-timed approach is tolerant of variations in task execution times, but pays the penalty of high run time costs, because processors need to explicitly synchronize whenever they communicate. The ordered transactions approach lies between the fully-static and self-timed strategies; in this approach the order in which processors communicate is determined at compile time and enforced at run time. The ordered transactions strategy retains some of the flexibility of self-timed schedules and at the same time has lower run time costs than the self-timed approach. In this paper we determine an order of processor transactions that is nearly optimal given information about task execution times at compile time, and for a given processor assignment and task ordering. The criterion for optimality is the average throughput achieved by the schedule. Our main result is that it is possible to choose a transaction order such that the resulting ordered transactions schedule incurs no performance penalty compared to the more flexible self-timed strategy, even when the higher run time costs implied by the self-timed strategy are ignored.
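One plausible way to realize the compile-time choice the abstract describes, sketched under my own simplification: order the inter-processor communications by the start times they would have in the self-timed schedule estimated from compile-time execution times, then enforce that order at run time.

```python
def transaction_order(comm_ops):
    """Pick a bus-transaction order for the ordered-transactions strategy.
    comm_ops: list of (name, estimated start time in the self-timed schedule).
    The run-time transaction controller then enforces this fixed order.
    """
    return [name for name, start in sorted(comm_ops, key=lambda op: op[1])]

# Example: three inter-processor communications with compile-time estimated starts.
ops = [("P1->P2: x", 12.0), ("P0->P1: y", 4.5), ("P2->P0: z", 20.0)]
print(transaction_order(ops))   # ['P0->P1: y', 'P1->P2: x', 'P2->P0: z']
```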

7.
This article presents a new architecture for the VoIP media gateway using only a communications processor and digital signal processors. The new architecture can be used by telecommunications equipment manufacturers to replace a network processor and a general-purpose processor with a single communications processor, thereby reducing the system cost, power consumption, printed circuit board (PCB) area, software complexity, and time to market. In the new architecture the modules are interconnected via Ethernet interfaces, which makes voice-packet encapsulation possible in the digital signal processors. This relieves the network processor, which in voice over IP (VoIP) media gateways is most commonly used for the routing of VoIP packets and voice-packet encapsulation, and means it can be replaced by a communications processor. The presented media gateway architecture makes it possible to combine the data- and control-plane applications on a single communications processor, but only with properly optimized program code and an optimized Ethernet driver. Therefore, the main part of the article is dedicated to a presentation of the methodology for the analysis and optimization of the presented systems. In order to support this methodology, a new tool named performance monitor (PM) was developed. The PM tool is presented here, and was used for optimizing the Ethernet driver. The Ethernet driver was optimized and modified in such a way as to put a minimal load on the microprocessor core of the communications processor when routing the VoIP packets to the digital signal processors and back. The article ends with a presentation of the experimental optimization results, which were acquired from a real telecommunications system.

8.
Real-time computers are often used in embedded, life-critical applications where high reliability is important. A common approach to making such systems dependable is to vote on redundant processors executing multiple copies of the same task. The most popular redundant structure is triple modular redundancy (TMR). The processors that make up such systems are subject not only to independently occurring permanent and transient faults, but also to correlated transient faults, such as those caused by electromagnetic interference (EMI) from the operating environment. This paper proposes two new scheduling strategies for TMR computer-controllers. Both strategies can tolerate correlated faults as well as independent faults. These strategies, TMR-R (TMR with rotated task group) and TMR-Q (TMR with quintuple computation), are developed using task grouping and assignment. To evaluate the reliability of these strategies, a discrete-time Markov model for control systems is devised. Reliability equations for TMR-R and TMR-Q are derived from the state transitions of sampling intervals based on the Markov model. The reliability of these strategies is demonstrated by comparing them with a conventional TMR using numerical analysis. The proposed strategies are anticipated to be useful for control systems operating in harsh environments, such as the controllers of airplanes or nuclear power plants.
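A highly simplified discrete-time sketch (mine, not the paper's Markov model) of why time redundancy matters per sampling interval: a conventional TMR is lost to a correlated transient, whereas a strategy that re-executes the affected task group, standing in here for TMR-R or TMR-Q, can also mask that case.

```python
def tmr_reliability(intervals, p_ind, p_corr, tolerates_correlated):
    """Reliability over `intervals` sampling intervals.  Per interval, each
    processor suffers an independent transient with probability p_ind, and
    with probability p_corr a correlated transient hits all three.  A plain
    TMR survives only if at most one processor is hit; a time-redundant
    strategy is assumed (my simplification) to also mask the correlated case.
    """
    # P(at most one of three independent transients) in one interval.
    ok_ind = (1 - p_ind) ** 3 + 3 * p_ind * (1 - p_ind) ** 2
    ok_corr = 1.0 if tolerates_correlated else 0.0
    p_interval_ok = (1 - p_corr) * ok_ind + p_corr * ok_corr
    return p_interval_ok ** intervals        # must survive every sampling interval

N = 10_000                                   # sampling intervals in the mission
print("conventional TMR:", tmr_reliability(N, 1e-5, 1e-6, tolerates_correlated=False))
print("TMR-R / TMR-Q   :", tmr_reliability(N, 1e-5, 1e-6, tolerates_correlated=True))
```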

9.
Security protocols, such as IPSec and SSL, are being increasingly deployed in the context of networked embedded systems. The resource-constrained nature of embedded systems and, in particular, the modest capabilities of embedded processors make it challenging to achieve satisfactory performance while executing security protocols. A promising approach for improving performance in embedded systems is to use application-specific instruction set processors that are designed based on configurable and extensible processors. In this paper, we perform a comprehensive performance analysis of the IPSec protocol on a state-of-the-art configurable and extensible embedded processor (Xtensa from Tensilica Inc.). We present performance profiles of a lightweight embedded IPSec implementation running on the Xtensa processor, and examine in detail the various factors that contribute to the processing latencies, including cryptographic and protocol processing. In order to improve the efficiency of IPSec processing on embedded devices, we then study the impact of customizing an embedded processor by synergistically 1) configuring architectural parameters, such as instruction and data cache sizes, processor-memory interface width, write buffers, etc., and 2) extending the base instruction set of the processor with custom instructions for both cryptographic and protocol processing. Our experimental results demonstrate that up to a 3.2x speedup in IPSec processing is possible over a popular embedded IPSec software implementation.
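The way such customizations compose can be illustrated with an Amdahl-style sketch; the profile fractions and local speedups below are placeholders, not the paper's measured Xtensa numbers.

```python
def overall_speedup(fractions, speedups):
    """Amdahl-style composition: `fractions` is the share of total IPSec
    processing time spent in each component, `speedups` the local speedup
    achieved for it by custom instructions or architectural tuning.
    All numbers below are illustrative, not measured values.
    """
    assert abs(sum(fractions.values()) - 1.0) < 1e-9
    new_time = sum(f / speedups.get(name, 1.0) for name, f in fractions.items())
    return 1.0 / new_time

profile = {"crypto": 0.60, "protocol": 0.30, "other": 0.10}
print("crypto insns only      :", overall_speedup(profile, {"crypto": 6.0}))
print("crypto + protocol insns:", overall_speedup(profile, {"crypto": 6.0, "protocol": 3.0}))
```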

10.
Performance trends in high-end processors
Based on a first-order cycle-time model, performance trends and limits are projected for both bipolar and CMOS processors. The key to identifying trends is understanding the pivotal factors at any given stage of technology progression. One such parameter is the physical area of the processor. In coming technologies, opposing demands will be placed on the system's area, stemming from the need both to reduce the proportion of interconnection capacitance and to send signals across the processor. Contrary to the usual perception, delays resulting from wiring capacitance decrease as the processor area increases, while the minimization of signal travel times favors reducing the area. The system-size tradeoff in the case of bipolar processors is primarily determined by power density, while CMOS processor sizes are determined by wirability requirements. To achieve the full potential of CMOS, interconnections will have to be carefully planned. The performance limits of bipolar and room-temperature CMOS uniprocessors are shown to be very similar. The highest-performance technology on the horizon is liquid-nitrogen-temperature CMOS. Alternate technologies, based on III-V compound devices or more exotic quantum structures, are not expected to play a role in future general-purpose high-end systems.
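A toy version of a first-order cycle-time model that captures the opposing area trends the abstract mentions; the functional forms and constants are arbitrary illustrations, not the paper's model.

```python
def cycle_time(area_cm2, a=2.0, b=0.5, c=0.3):
    """Toy first-order model (illustrative only): a fixed logic-delay term, a
    wiring-capacitance term that shrinks as the processor area grows, and a
    signal-travel term that grows with the chip's linear dimension.
    The constants a, b, c are arbitrary placeholders in nanoseconds."""
    return c + a / area_cm2 ** 0.5 + b * area_cm2 ** 0.5

# Sweep the area and pick the minimum -- the system-size tradeoff described above.
best = min(((cycle_time(A / 10), A / 10) for A in range(1, 101)), key=lambda x: x[0])
print(f"minimum cycle time {best[0]:.2f} ns at ~{best[1]:.1f} cm^2")
```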

11.
With the arrival of the B5G era, edge computing is being widely used to process the massive volumes of raw data generated at smart-expressway network nodes. In practical deployments, however, device-cost considerations mean that many edge processors are not provisioned with ample redundant computing resources or storage space; when handling unexpected events they may be unable to allocate resources appropriately, leading to high latency, outages, and other problems that in turn affect the stability and reliability of the smart-expressway system. To this end, the paper proposes an approach for smart-expressway network-node edge process…

12.
Including multiple cores on a single chip has become the dominant mechanism for scaling processor performance. Exponential growth in the number of cores on a single processor is expected to lead in a short time to mainstream computers with hundreds of cores. Scalable implementations of parallel algorithms will be necessary in order to achieve improved single-application performance on such processors. In addition, memory access will continue to be an important limiting factor on achieving performance, and heterogeneous systems may make use of cores with varying capabilities and performance characteristics. An appropriate programming model can address scalability and can expose data locality while making it possible to migrate application code between processors with different parallel architectures and variable numbers and kinds of cores. We survey and evaluate a range of multicore processor architectures and programming models with a focus on GPUs and the Cell BE processor. These processors have a large number of cores and are available to consumers today, but the scalable programming models developed for them are also applicable to current and future multicore CPUs.

13.
The reliability of general systems using dynamic and static redundancy schemes is derived, and communication protocols are considered as a representative example. The system reliability for three broadcast protocols using various redundancy-allocation policies is studied. The analytic and simulation results show that, in some cases, static redundancy yields a more reliable system than dynamic redundancy. This is essential for distributed system applications. In some cases, the failure detection time is substantial, so that the hardware reliability and hence the system reliability are adversely affected when using dynamic redundancy. This can be a critical factor for distributed system applications, because a large overhead of communication can be required for error detection. In these cases, unreliable protocols can provide better system reliability than reliable protocols, especially when the communication network is highly reliable and when the machine failure rate is relatively large. Since unreliable protocols generate less load and less resource contention, they are preferable in such cases. The reliability should be analyzed to determine the optimal balance between reliable and unreliable protocols. Static redundancy can be more reliable than dynamic redundancy if the failure-detection time is large.
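A small sketch of the crossover the abstract points to, using textbook closed forms and a single "coverage" parameter of my own that lumps failure detection and switching together; when slow detection pushes the coverage down, static (TMR) redundancy overtakes dynamic (standby) redundancy.

```python
import math

def r_static_tmr(lam, t):
    """Static redundancy: TMR with a perfect voter, failures masked instantly."""
    return 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)

def r_dynamic_standby(lam, t, coverage):
    """Dynamic redundancy: a primary plus one standby.  `coverage` lumps the
    probability that a primary failure is detected and the spare switched in
    before the failure becomes fatal; long detection times push it down.
    The closed form and the lumping are my simplification, not the paper's model.
    """
    return math.exp(-lam * t) * (1 + coverage * lam * t)

lam, t = 1e-4, 1000.0                 # failure rate per hour, mission hours
for cov in (0.999, 0.6):
    print(f"coverage {cov:0.3f}: dynamic {r_dynamic_standby(lam, t, cov):.4f}  "
          f"vs. static TMR {r_static_tmr(lam, t):.4f}")
```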

14.
The increasing use of embedded software, often implemented on a core processor in a single-chip system, is a clear trend in the telecommunications, multimedia, and consumer electronics industries. A companion paper (Paulin et al., 1997) presents a survey of application and architecture trends for embedded systems in these growth markets. However, the lack of suitable design technology remains a significant obstacle in the development of such systems. One of the key requirements is more efficient software compilation technology. Especially in the case of fixed-point digital signal processor (DSP) cores, it is often cited that commercially available compilers are unable to take full advantage of the architectural features of the processor. Moreover, due to the shorter lifetimes and the architectural specialization of many processor cores, processor designers are often compelled to neglect the issue of compiler support. This situation has resulted in an increased research activity in the area of design tool support for embedded processors. This paper discusses design technology issues for embedded systems using processor cores, with a focus on software compilation tools. Architectural characteristics of contemporary processor cores are reviewed and tool requirements are formulated. This is followed by a comprehensive survey of both existing and new software compilation techniques that are considered important in the context of embedded processors.

15.
The authors address the issue of optimal design (in terms of the number of processors) of a distributed system that is based on a recursive algorithm for fault tolerance (RAFT). The reliability and performance of the system using RAFT are determined as a function of the reliability of the individual processors and the number of fault modes in a processor. The authors also discuss how to determine design policies when the objective is to minimize the average system failure. Several numerical examples illustrate the results.

16.
This paper proposes a pure software technique, "error detection by duplicated instructions" (EDDI), for detecting errors during usual system operation. Compared to other error-detection techniques that use hardware redundancy, EDDI does not require any hardware modifications to add error-detection capability to the original system. EDDI duplicates instructions during compilation and uses different registers and variables for the new instructions. Especially for faults in the code segment of memory, formulas are derived to estimate the error-detection coverage of EDDI using probabilistic methods. These formulas use statistics of the program that are collected during compilation. EDDI was applied to eight benchmark programs and the error-detection coverage was estimated. The estimates were then verified by simulation, in which a fault injector forced a bit-flip in the code segment of the executable machine code. The simulation results validated the estimated fault coverage and show that approximately 1.5% of injected faults produced incorrect results in the eight benchmark programs with EDDI, while on average 20% of injected faults produced undetected incorrect results in the programs without EDDI. Based on the theoretical estimates and actual fault-injection experiments, EDDI can provide over 98% fault coverage without any extra hardware for error detection. This pure software technique is especially useful when designers cannot change the hardware but need dependability in the computer system. To reduce the performance overhead, EDDI schedules the instructions that are added for detecting errors such that instruction-level parallelism (ILP) is maximized. Performance overhead can be reduced by increasing ILP within a single superscalar processor. The execution-time overhead in a 4-way superscalar processor is less than the execution-time overhead in processors that can issue two instructions per cycle.
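A source-level analogue of the EDDI idea (the real technique duplicates machine instructions at compile time and uses disjoint registers and variables); the fault injector here flips a bit in one result with some probability, which is my stand-in for a code-segment fault.

```python
import random

def run_with_possible_fault(fn, p_flip):
    """Execute fn(); with probability p_flip, emulate a transient fault by
    flipping one bit of the integer result."""
    result = fn()
    if random.random() < p_flip:
        result ^= 1 << random.randrange(16)
    return result

def eddi_compute(fn, p_flip):
    """EDDI-style duplication: execute the 'instruction' twice (conceptually
    on different registers/variables) and compare before the result is used."""
    primary = run_with_possible_fault(fn, p_flip)
    shadow = run_with_possible_fault(fn, p_flip)
    if primary != shadow:
        raise RuntimeError("EDDI: shadow comparison failed, error detected")
    return primary

# A fault that hits only one of the two copies is caught by the comparison;
# only faults producing identical wrong values in both copies would escape.
detected = 0
for _ in range(1000):
    try:
        eddi_compute(lambda: 40 * 2 + 3, p_flip=0.05)
    except RuntimeError:
        detected += 1
print(f"detected {detected} mismatches in 1000 protected executions")
```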

17.
T. Wolf and Ning Weng, IEEE Network, 2007, 21(4):29-37
Network processors promise a flexible, programmable packet processing infrastructure for network systems. To make full use of the capabilities of network processors, it is imperative to provide the ability to dynamically adapt to changing traffic patterns in the form of a network processor runtime system. The differences from existing operating systems and the main challenges lie in the multiprocessor nature of NPs, their on-chip resource constraints, and real-time processing requirements. In this article we explore the key design trade-offs that need to be considered when designing a network processor operating system. In particular, we explore the performance impact of application analysis on partitioning, traffic characterization, workload mapping, and runtime adaptation. We present and discuss qualitative and quantitative results in the context of a particular application analysis and mapping framework. The observations and conclusions are generally applicable to any runtime environment for network processors.

18.
Large fault trees are evaluated in a distributed fashion by pooling the computing power of several networked LISP processors. Direct evaluation of fault trees of complex systems is computationally intensive and can take a long time when performed on a single processor. An iterative top-down algorithm successively recognizes and reduces known patterns, and decomposes the problem into subtasks at each level of iteration. These subtasks are distributed to multiple machines on the network. Thus, subtree evaluations which are tackled serially on a single processor, are performed in parallel by the distributed network. The reductions in elapsed computer time afforded by this approach are examined. Questions of optimal resource allocation and of scheme limitations are considered.
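A minimal sketch of the decomposition idea: evaluate the independent subtrees under the top gate in parallel and combine the results at the top. Python threads stand in for the networked LISP processors, basic events are assumed statistically independent, and the paper's known-pattern reductions are omitted.

```python
from concurrent.futures import ThreadPoolExecutor

# A fault tree as nested tuples: ("AND"|"OR", child, child, ...); leaves are
# basic-event probabilities, assumed independent.
def evaluate(node):
    if isinstance(node, (int, float)):
        return float(node)
    gate, *children = node
    probs = [evaluate(c) for c in children]
    if gate == "AND":                       # all inputs must occur
        p = 1.0
        for q in probs:
            p *= q
        return p
    if gate == "OR":                        # at least one input occurs
        p = 1.0
        for q in probs:
            p *= (1.0 - q)
        return 1.0 - p
    raise ValueError(gate)

def evaluate_distributed(top):
    """Decompose at the top gate and farm the independent subtrees out to a
    pool of workers, then combine the subtree probabilities at the top."""
    gate, *subtrees = top
    with ThreadPoolExecutor() as pool:
        probs = list(pool.map(evaluate, subtrees))
    return evaluate((gate, *probs))

tree = ("OR",
        ("AND", 1e-3, ("OR", 1e-2, 2e-2)),
        ("AND", 5e-4, 5e-4),
        ("OR", 1e-4, ("AND", 1e-2, 1e-2)))
print(f"top-event probability: {evaluate_distributed(tree):.3e}")
```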

19.
As more processors are integrated into multiprocessor systems-on-chip (MPSoCs) through relentless technology scaling, the mean time to failure (MTTF) is reduced to the point where unexpected processor failures must be considered at design time. A popular approach to tolerating processor failures is to migrate the tasks on the faulty processor to live processors. This approach, however, is not suitable for real-time digital signal processing (DSP) applications, since it may not guarantee real-time constraints. In this paper, we propose re-scheduling the entire application to minimize throughput degradation under a latency constraint, given that the application is specified by a Synchronous Data Flow (SDF) graph. We obtain sub-optimal re-scheduling results using a genetic algorithm for each processor-failure scenario at compile time. If a failure is detected at run time, the live processors obtain the saved schedule, perform task transfer, and execute the remaining tasks of the current iteration. We compare preemptive and non-preemptive migration policies and propose a hybrid policy to obtain better performance. We demonstrate the viability of the proposed technique through experiments with real-life DSP applications as well as randomly generated graphs under timing constraints and random fault scenarios.
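A compact sketch of the compile-time/run-time split the abstract describes, with a greedy longest-task-first scheduler of independent tasks standing in for the paper's genetic algorithm over SDF graphs; the scenario table is indexed by the set of failed processors, and all task and processor names are invented.

```python
from itertools import combinations

def list_schedule(tasks, procs):
    """Greedy longest-task-first scheduling (a stand-in for the paper's
    genetic algorithm).  tasks: {name: exec_time}.  Returns (makespan, mapping)."""
    finish = {p: 0.0 for p in procs}
    mapping = {}
    for name, t in sorted(tasks.items(), key=lambda kv: -kv[1]):
        p = min(finish, key=finish.get)      # earliest-available processor
        mapping[name] = p
        finish[p] += t
    return max(finish.values()), mapping

def precompute_failure_schedules(tasks, procs, max_failures=1):
    """Compile-time step: build one schedule per failure scenario, so the
    run-time system only has to look the saved schedule up after a fault."""
    table = {frozenset(): list_schedule(tasks, procs)}
    for k in range(1, max_failures + 1):
        for failed in combinations(procs, k):
            live = [p for p in procs if p not in failed]
            table[frozenset(failed)] = list_schedule(tasks, live)
    return table

tasks = {"fft": 8, "filt": 5, "mix": 4, "enc": 6, "io": 2}
table = precompute_failure_schedules(tasks, ["P0", "P1", "P2"])
print("no failure   :", table[frozenset()])
print("P1 has failed:", table[frozenset({"P1"})])   # run-time lookup after fault detection
```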

20.
In this paper, we propose a novel reconfigurable processor using dynamically partitioned single-instruction multiple-data (DP-SIMD) execution, which is able to process multimedia data. SIMD processors and parallel SIMD (P-SIMD) processors, the latter composed of a number of SIMD processors, are commonly used today. These processors are inefficient, however, because all processing units (PUs) must perform the same operation at any given time; the PUs can perform different operations only when every SIMD group operation is predefined. We propose a processor control method that can dynamically partition the parallel processors into multiple SIMD-based processors to enhance efficiency. For a performance evaluation of the proposed method, we carried out the inverse transform, inverse quantization, and motion compensation operations of H.264 using processors based on SIMD, P-SIMD, and DP-SIMD control. Experimental results show that the DP-SIMD control method is more efficient than the SIMD and P-SIMD control methods by about 15% and 14%, respectively.
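A toy model of the DP-SIMD control idea: regroup the processing units each step so that PUs requesting the same operation form one SIMD group and issue together, rather than forcing every PU through the same operation (plain SIMD) or fixing the groups in advance (P-SIMD). The opcode set and data layout are invented for illustration.

```python
from collections import defaultdict

def dp_simd_issue(requests):
    """Dynamically partition the PUs by requested opcode and issue one SIMD
    operation per dynamic group.  `requests` maps PU id -> (opcode, operands).
    Returns the per-PU results and the number of issue slots used."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b,
           "sub": lambda a, b: a - b}
    groups = defaultdict(list)
    for pu, (opcode, operands) in requests.items():
        groups[opcode].append((pu, operands))
    results, issue_slots = {}, 0
    for opcode, members in groups.items():      # one SIMD issue per dynamic group
        issue_slots += 1
        for pu, (a, b) in members:
            results[pu] = ops[opcode](a, b)
    return results, issue_slots

# 8 PUs wanting three different operations: three issue slots with every PU busy,
# whereas a plain SIMD machine would step through each operation with most PUs idle.
reqs = {i: ("add", (i, 1)) for i in range(4)}
reqs.update({4: ("mul", (3, 3)), 5: ("mul", (2, 5)), 6: ("sub", (9, 4)), 7: ("sub", (1, 1))})
print(dp_simd_issue(reqs))
```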
