
1.

A parametrized algorithm that implements sequential, causal, and cache memory consistencies

Ernesto Jiménez Vicent Cholvi 《Journal of Systems and Software》2008,81(1):120-131

In this paper, we present an algorithm that can be used to implement sequential, causal, or cache consistency in distributed shared memory (DSM) systems. For this purpose, it includes a parameter that selects the consistency model to be implemented. If all processes run the algorithm with the same value of this parameter, the corresponding consistency is achieved. (Additionally, the algorithm tolerates processes using certain combinations of parameter values.) This characteristic allows a concrete consistency model to be chosen and implemented with the most efficient algorithm in each case (depending on the requirements of the applications). Additionally, as far as we know, this is the first algorithm proposed that implements cache coherence. In our algorithm, all read and write operations are executed locally when implementing causal and cache consistency (i.e., they are fast). It is known that no algorithm implementing sequential consistency has only fast memory operations. In our algorithm, however, all write operations and some read operations are fast when implementing sequential consistency. The algorithm uses propagation and full replication, where the values written by a process are propagated to the rest of the processes. It works in a cyclic turn fashion, with each process of the DSM system broadcasting one message in turn. The values written by the process are sent in that message (instead of sending one message for each write operation); however, unnecessary values are excluded. All this allows the message traffic generated by the algorithm to be controlled.

2.

Rambo: a robust, reconfigurable atomic memory service for dynamic networks

Seth Gilbert Nancy A. Lynch Alexander A. Shvartsman 《Distributed Computing》2010,23(4):225-272

In this paper, we present Rambo, an algorithm for emulating a read/write distributed shared memory in a dynamic, rapidly changing environment. Rambo provides a highly reliable, highly available service, even as participants join, leave, and fail. In fact, the entire set of participants may change during an execution, as the initial devices depart and are replaced by a new set of devices. Even so, Rambo ensures that data stored in the distributed shared memory remains available and consistent. There are two basic techniques used by Rambo to tolerate dynamic changes. Over short intervals of time, replication suffices to provide fault-tolerance. While some devices may fail and leave, the data remains available at other replicas. Over longer intervals of time, Rambo copes with changing participants via reconfiguration, which incorporates newly joined devices while excluding devices that have departed or failed. The main novelty of Rambo lies in the combination of an efficient reconfiguration mechanism with a quorum-based replication strategy for read/write shared memory. The Rambo algorithm can tolerate a wide variety of aberrant behavior, including lost and delayed messages, participants with unsynchronized clocks, and, more generally, arbitrary asynchrony. Despite such behavior, Rambo guarantees that its data is stored consistently. We analyze the performance of Rambo during periods when the system is relatively well-behaved: messages are delivered in a timely fashion, reconfiguration is not too frequent, etc. We show that in these circumstances, read and write operations are efficient, completing in at most eight message delays.
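The quorum-based replication idea mentioned in this abstract can be illustrated with a generic sketch. It is not the Rambo algorithm: it assumes a fixed set of replicas, static majority quorums, no failures, and no reconfiguration, and every name in it (`Replica`, `query`, `propagate`, `write`, `read`) is hypothetical. Each operation first queries one majority for the highest tag, then propagates a tag/value pair to a majority; because any two majorities intersect, a read that follows a completed write sees that write's tag.

```python
import random


class Replica:
    """One copy of the shared register (hypothetical illustration)."""

    def __init__(self):
        self.tag = (0, 0)   # (sequence number, writer id), compared lexicographically
        self.value = None


def majority(replicas, rng):
    """Pick an arbitrary majority quorum; any two majorities share a replica."""
    return rng.sample(replicas, len(replicas) // 2 + 1)


def query(quorum):
    """Phase 1: return the highest tag in the quorum and its value."""
    best = max(quorum, key=lambda r: r.tag)
    return best.tag, best.value


def propagate(quorum, tag, value):
    """Phase 2: install (tag, value) on every quorum member holding an older tag."""
    for r in quorum:
        if tag > r.tag:
            r.tag, r.value = tag, value


def write(replicas, writer_id, value, rng):
    (seq, _), _ = query(majority(replicas, rng))
    propagate(majority(replicas, rng), (seq + 1, writer_id), value)


def read(replicas, rng):
    tag, value = query(majority(replicas, rng))
    # Write back before returning so that later reads cannot observe an older value.
    propagate(majority(replicas, rng), tag, value)
    return value
```

In this sketch the quorums are chosen at random only to show that correctness does not depend on which majority is contacted; a real service would contact whichever replicas respond first.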

3.

A reconfigurable HexCell-based systolic array architecture for evolvable hardware on FPGA

《Microprocessors and Microsystems》2020

Evolvable hardware is a system that modifies its architecture and behavior to adapt to changes in the environment. It is formed by reconfigurable processing elements driven by an evolutionary algorithm. In this paper, we study a reconfigurable HexCell-based systolic array architecture for evolvable systems on FPGA. HexCell is a processing element with a tile-able hexagonal-shaped cell for reconfigurable systolic arrays on FPGAs. The cell has three input ports that feed into an internal functional unit connected to three output ports. The functional unit is configured using dynamic partial reconfiguration (DPR), and the output ports, in contrast, are configured using a virtual reconfiguration circuit (VRC). Our proposed architecture combines the merits of both DPR and VRC to achieve fast reconfiguration and accelerated evolution. A HexCell-based 4 × 4 array was implemented on FPGA and utilized 32.5% of the look-up tables, 31.3% of the registers, and 1.4% of the block RAMs of an Artix-7 (XC7Z020), while a same-size conventional array consumed 8.7%, 5.1%, and 20.7% of the same FPGA, respectively. As a case study, we used an adaptive image filter as a test application. Results showed that the best filters generated by our proposed architecture were generally fitter than those generated by the conventional state-of-the-art systolic array on the selected application. Also, performing 900,000 evaluations on the HexCell array was 2.6× faster than on the conventional one.

4.

Multi-level reconfigurable architectures in the switch model

Sebastian Lange Martin Middendorf 《Journal of Systems Architecture》2010,56(2-3):103-115

In this paper, we propose a concept for multi-level reconfigurable architectures with more than two levels of reconfiguration, and study these architectures theoretically and experimentally. The proposed architectures are extensions of 2-level reconfigurable architectures where the reconfiguration operations on the lowest level correspond to the reconfiguration operations of standard 1-level reconfigurable architectures, and the reconfigurable units are simple switches. It is shown that finding an optimal number of reconfiguration levels and a corresponding reconfiguration scheme that minimizes the number of reconfiguration bits for a given algorithm can be done in polynomial time. But finding the optimal number of reconfiguration levels is NP-hard for heterogeneous multi-level architectures, where the number of reconfiguration levels varies for the different reconfigurable units. Experimental results for different test applications show that 3–4 reconfiguration levels are optimal with respect to the number of reconfiguration bits needed. The number of reconfiguration bits is reduced by 35–86% compared to 1-level reconfiguration and by 8–34% compared to 2-level reconfiguration. The heterogeneous architecture reduces the number of necessary reconfiguration bits by an additional 1–5% and also needs fewer SRAM cells.

5.

Using reconfiguration for efficient management of replicated data (Total citations: 2; self-citations: 0; citations by others: 2)

Agrawal D. El Abbadi A. 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(5):786-801

Replicated data management protocols have been proposed that exploit a logically structured set of copies. These protocols have the advantage that they provide limited fault-tolerance at low communication cost. The proposed protocols can be viewed as analogues of the read-one write-all protocol in the context of logical structures. In this paper, we start by generalizing these protocols in two ways for logical structures. First, the quorum-based approach is applied to develop protocols that use structured read and write quorums, thus attaining a high degree of data availability for both read and write operations. Next, the reconfiguration or views approach is developed for these structures, resulting in protocols that attain high degrees of availability at significantly low communication cost for read operations. In this sense, the proposed protocols have the advantages of the read-one write-all protocol for low-cost read operations as well as the majority quorum protocol for high data availability. Finally, we generalize the reconfiguration approach to allow for the dynamic reconfiguration of the database system from one replica management protocol to another. This allows database systems to adapt to an evolving and dynamic application environment.

6.

Formally verified on-line diagnosis

Walter C.J. Lincoln P. Suri N. 《IEEE transactions on pattern analysis and machine intelligence》1997,23(11):684-721

A reconfigurable fault-tolerant system achieves the attributes of dependability of operations through fault detection, fault isolation and reconfiguration, typically referred to as the FDIR paradigm. Fault diagnosis is a key component of this approach, requiring an accurate determination of the health and state of the system. An imprecise state assessment can lead to catastrophic failure due to an optimistic diagnosis, or conversely, result in underutilization of resources because of a pessimistic diagnosis. Differing from classical testing and other off-line diagnostic approaches, we develop procedures for maximal utilization of the system state information to provide for continual, on-line diagnosis and reconfiguration capabilities as an integral part of the system operations. Our diagnosis approach, unlike existing techniques, does not require administered testing to gather syndrome information but is based on monitoring the system message traffic among redundant system functions. We present comprehensive on-line diagnosis algorithms capable of handling a continuum of faults of varying severity at the node and link level. Not only are the proposed algorithms on-line in nature, but they are themselves tolerant to faults in the diagnostic process. Formal analysis is presented for all proposed algorithms. These proofs both offer insight into the algorithm operations and facilitate a rigorous formal verification of the developed algorithms.

7.

Design space exploration for partially reconfigurable architectures in real-time systems

《Journal of Systems Architecture》2013,59(8):571-581

In this paper, we introduce FoRTReSS (Flow for Reconfigurable archiTectures in Real-time SystemS), a methodology for the generation of partially reconfigurable architectures with real-time constraints, enabling Design Space Exploration (DSE) at the early stages of the development. FoRTReSS can be completely integrated into existing partial reconfiguration flows to generate physical constraints describing the architecture in terms of reconfigurable regions that are used to floorplan the design, with key metrics such as partially reconfigurable area, real-time or external fragmentation. The flow is based upon our SystemC simulator for real-time systems that helps develop and validate scheduling algorithms with respect to application timing constraints and partial reconfiguration physical behaviour. We tested our approach with a video stream encryption/decryption application together with Error Correcting Code and showed that partial reconfiguration may lead to an area improvement of up to 38% on some resources without compromising application performance, in a very small amount of time: less than 30 s.

8.

A distributed synchronization algorithm for bus networks based on read/write characteristics

Wu Jianjun (吴建军), Wang Zheng (王征) 《Journal of Computer Applications (计算机应用)》2008,28(1):11-13

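To solve the read/write synchronization problem of distributed systems on bus networks, a synchronization algorithm based on read/write characteristics is proposed. The algorithm increases concurrency by distinguishing read operations from write operations; it uses hashing to reduce the distributed synchronization problem to a single-node synchronization problem, which improves performance and reduces the number of messages required; and it shortens response latency through techniques such as message forwarding. Performance analysis and simulation experiments show that the algorithm has low message complexity and low time complexity.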

9.

Optimizing the costs of hierarchical quorum consensus

Akhil Kumar Kavindra Malik 《Acta Informatica》1996,33(3):255-275

We study the problem of how to minimize the cost of maintaining consistency among at least N copies of an object in an environment where the mix of read and write operations can vary. Quorum consensus requires that read and write operations must assemble appropriate quorums before an operation can succeed. The cost of an operation is proportional to the size of a quorum, and the objective is obviously to minimize the cost while still maintaining consistency. It is known that the quorum size can be reduced by organizing a number of copies into logical structures such as grids and hierarchies. In this paper, we show (a) how methods based on grids and hierarchies can be treated in a common framework, and (b) how these hierarchies can be optimized so as to minimize the cost of consensus. Of course, the optimal solution depends upon the mix of read and write operations that is present. Consequently, given N copies of an object and a ratio of write operations F, our algorithms determine the optimal values for the number of levels in the hierarchy and the read/write quorum sizes at each level. The algorithms, which run in O(N^1.63) and O(N^2) time, were tested, and the results are reported and discussed. Received September 1, 1992/February 16, 1995
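The cost model described in this abstract can be illustrated for the simplest case, a flat (single-level) quorum system. The sketch below is not the paper's hierarchical algorithm. It assumes the standard quorum consensus intersection conditions, r + w > N and 2w > N, and takes the expected cost per operation to be (1 - F)·r + F·w, where r and w are the read and write quorum sizes. The function name `best_flat_quorum` is hypothetical.

```python
def best_flat_quorum(n, write_ratio):
    """Return (r, w, cost) minimizing expected quorum size for n copies.

    Assumptions (not taken from the paper): a single-level quorum system,
    read/write intersection r + w > n, write/write intersection 2w > n,
    and cost = (1 - write_ratio) * r + write_ratio * w.
    """
    best = None
    for w in range(n // 2 + 1, n + 1):      # every w with 2w > n
        r = n - w + 1                       # smallest r with r + w > n
        cost = (1 - write_ratio) * r + write_ratio * w
        if best is None or cost < best[2]:  # strict '<' keeps the smallest w on ties
            best = (r, w, cost)
    return best
```

With a read-only workload (F = 0) the search ends at read-one write-all; with a write-only workload (F = 1) it ends at majority quorums, which matches the trade-off the abstract describes between the mix of operations and the optimal quorum sizes.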

10.

Implementing hybrid consistency with high-level synchronization operations

Roy Friedman 《Distributed Computing》1995,9(3):119-129

Summary. In recent years, there has been a growing tendency to support high-level synchronization operations, such as read-modify-write, FIFO queues and stacks, as part of the programmer’s shared memory model. This paper examines the problem of implementing hybrid consistency with high-level synchronization operations. It is shown that for any implementation of weak consistency, the time required to execute a read-modify-write, a dequeue or a pop operation is Ω(d), where d is the network delay. Following this, an efficient and simple algorithm for providing hybrid consistency that supports most types of high-level synchronization operations and weak read and weak write operations is presented. Weak read and weak write operations are executed instantaneously, while the time required to execute strong operations is O(d). This is within a constant factor of the lower bounds for most of the commonly used types of operations. Received: August 1994 / Accepted: June 1995

11.

Dynamic reconfiguration of node location in wormhole networks

《Journal of Systems Architecture》2000,46(10):873-888

Several techniques have been developed to increase the performance of parallel computers. Reconfigurable networks can be used as an alternative to increase the performance. Network reconfiguration can be carried out in different ways. Our research has focused on distributed memory systems with dynamic reconfiguration of node location. Briefly, this technique consists of positioning the processors in the network depending on the existing communication pattern among them, to suit the requirements of each computation. In this article, we present a dynamic reconfiguration technique for wormhole networks. We have used both a crossbar and a multistage interconnection network to implement a reconfigurable logical two-dimensional (2-D) torus topology. The reconfiguration mechanism is based on a distributed reconfiguration algorithm. The algorithm is based on a cost function that requires only local information. We discuss reconfiguration features and adjust the different parameters of the reconfiguration algorithm. We have also studied the deadlock problem in reconfigurable wormhole networks, and give details of our solution. Finally, we have evaluated the performance of this technique under several workloads.

12.

Developing a Consistent Domain-Oriented Distributed Object Service

Georgiou C. Musial P.M. Shvartsman A.A. 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(11):1567-1585

This paper presents a new algorithm for a reconfigurable distributed domain-oriented atomic object service, called DO-RAMBO, which stands for Domain-Oriented Reconfigurable Atomic Memory for Basic Objects. This service is suitable for inclusion as a middleware system service for distributed applications requiring atomic read/write data. The implementation substantially extends and refines the abstract RAMBO algorithm of Lynch and Shvartsman that supports individual atomic objects. In this paper, domains are introduced to allow the users to group related atomic objects. The new implementation manages configurations on the basis of domains, significantly improving the utility and the performance of the resulting service. DO-RAMBO guarantees consistency under asynchrony, message loss, node crashes, new node arrivals, and node departures. We present the formal algorithm development for DO-RAMBO and give analytical and empirical results that illustrate the benefit of the new approach.

13.

GeoQuorums: implementing atomic memory in mobile ad hoc networks

Shlomi Dolev Seth Gilbert Nancy A. Lynch Alexander A. Shvartsman Jennifer L. Welch 《Distributed Computing》2005,18(2):125-155

We present a new approach, the GeoQuorums approach, for implementing atomic read/write shared memory in mobile ad hoc networks. Our approach is based on associating abstract atomic objects with certain geographic locations. We assume the existence of focal points, geographic areas that are normally “populated” by mobile nodes. For example, a focal point may be a road junction, a scenic observation point, or a water resource in the desert. Mobile nodes that happen to populate a focal point participate in implementing a shared atomic object, using a replicated state machine approach. These objects, which we call focal point objects, are prone to occasional failures when the corresponding geographic areas are depopulated. The GeoQuorums algorithm uses the fault-prone focal point objects to implement atomic read/write operations on a fault-tolerant virtual shared object. The GeoQuorums algorithm uses a quorum-based strategy in which each quorum consists of a set of focal point objects. The quorums are used to maintain the consistency of the shared memory and to tolerate limited failures of the focal point objects, which may be caused by depopulation of the corresponding geographic areas. We present a mechanism for changing the set of quorums on the fly, thus improving efficiency. Overall, the new GeoQuorums algorithm efficiently implements read and write operations in a highly dynamic, mobile network.

14.

RMR-efficient implementations of comparison primitives using read and write operations

Wojciech Golab Vassos Hadzilacos Danny Hendler Philipp Woelfel 《Distributed Computing》2012,25(2):109-162

We consider asynchronous multiprocessors where processes communicate only by reading or writing shared memory. We show how to implement consensus, compare-and-swap and other comparison primitives, as well as load-linked/store-conditional (LL/SC) using only a constant number of remote memory references (RMRs), in both the cache-coherent and the distributed-shared-memory models of such multiprocessors. Our implementations are blocking, rather than wait-free: they ensure progress provided all processes that invoke the implemented primitive are live. Our results imply that any algorithm using read and write operations, and either comparison primitives or LL/SC, can be simulated by an algorithm that uses read and write operations only, with at most a constant-factor increase in RMR complexity.

15.

Reliability study of a JBits-based reconfigurable data processing system (Total citations: 1; self-citations: 0; citations by others: 1)

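Ren Xiaoxi (任小西), Li Renfa (李仁发), Jin Shengzhen (金声震), Zhang Kehuan (张克环), Wu Qiang (吴强) 《Journal of Computer Research and Development (计算机研究与发展)》2007,44(4):722-728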

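The Space Solar Telescope (SST) is a scientific satellite for observing the Sun; it uses FPGA chips to preprocess the large volume of data collected each day. Given the high construction cost and the harsh operating environment, ensuring high reliability of SST data is a demanding task. This work improves the conventional TMR (triple modular redundancy) structure and proposes a configuration-data-based method for detecting and repairing faults in reconfigurable hardware, using the JBits tool to simplify operations on the configuration data. The structure and method detect faults promptly and eliminate them through hardware reconfiguration, improving system reliability. System reliability is analyzed using Markov process theory, and the results show that reliability can be significantly improved.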
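For readers unfamiliar with the conventional TMR (triple modular redundancy) structure that this abstract takes as its starting point, the following minimal sketch shows a bitwise majority voter and a simple check for which module disagrees with the voted result. It models plain TMR in software only; it does not represent the improved structure or the configuration-data-based repair method of the paper, and the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant module outputs (integers)."""
    # Each result bit is 1 exactly when at least two of the three inputs have it set.
    return (a & b) | (a & c) | (b & c)


def faulty_modules(a, b, c):
    """Return the indices (0, 1, 2) of modules whose output differs from the vote."""
    voted = tmr_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]
```

A single faulty module is masked by the vote and identified by the check, which is the point at which a reconfigurable system could trigger repair of that module.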

16.

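A flash cache algorithm using page reconstruction and data temperature identification (Total citations: 1; self-citations: 0; citations by others: 1)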

Zeng Xiangwei (曾祥伟), Deng Yuhui (邓玉辉) 《Journal of Frontiers of Computer Science and Technology (计算机科学与探索)》2021,15(1):84-95

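Flash-based solid-state drives (SSDs) offer better performance than magnetic disks and are gradually replacing them in desktop systems. However, even with DRAM embedded in the SSD as a cache, flash memory can exhibit unstable write performance under sustained writes, mainly because writes to logical pages frequently trigger out-of-place writes and garbage collection. To address this problem, a new flash cache management method called PRLRU is proposed, which uses a page reconstruction mechanism and data temperature identification...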

17.

Research and design of a four-level reconfigurable ALU for a vocoder

Jing Tao (荆涛), Wang Qin (王沁) 《Journal of Chinese Computer Systems (小型微型计算机系统)》2008,29(12)

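In the design of a high-performance vocoder for implementing speech coding and decoding algorithms, an ALU that supports a variable-length VLIW instruction set is a key element in meeting the design goals. This paper proposes a four-level reconfigurable ALU design built around a prefix adder. By reconfiguring operands and resources, it can complete 81 kinds of compound arithmetic and logic operations in a single cycle, while its control encoding is compressed by 58.93% to fit the width constraints of the instruction set. The design efficiently exploits the high parallelism latent in the algorithms and meets the needs of computation-intensive applications well.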

18.

A multi-agent model of reconfigurable manufacturing systems (Total citations: 1; self-citations: 0; citations by others: 1)

He Peng (贺鹏), Wang Qinglin (王庆林), Xu Xueqiang (徐学强) 《Computer Engineering and Applications (计算机工程与应用)》2006,42(13):185-187,221

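After introducing reconfiguration methods for reconfigurable manufacturing systems, the paper presents the structure of a multi-agent model of such systems and describes in detail a reconfiguration algorithm for a reconfigurable shop-floor machining system based on this model. Finally, a simulation of the reconfiguration algorithm verifies its feasibility.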

19.

Synchronization-oriented reconfiguration of FPAI under graduation intelligent manufacturing system in the COVID-19 pandemic and beyond

《Journal of Manufacturing Systems》2021

Companies with manufacturing systems that are more responsive and resilient will be able to survive or even gain market share in the face of unpredictable events such as an outbreak similar to the COVID-19 pandemic. Motivated by an industrial company restructuring its manufacturing system with the layout of fixed-position assembly islands (FPAI) during the COVID-19 pandemic, this paper introduces the synchronization-oriented reconfiguration of FPAI under the Graduation Intelligent Manufacturing System (GiMS). Inspired by the graduation ceremony, a novel manufacturing mode, the Graduation Manufacturing System (GMS) with ticket-based reconfigurable structures, is designed for organizing production operations with simplicity and resilience for the layout of FPAI. The IIoT and digital twin-enabled GiMS is developed for transforming real-time visibility in operations to support the reconfiguration of the manufacturing system. A synchronization-oriented reconfiguration mechanism is proposed to achieve the synchronous interaction among changing customer demand, island configuration, and production activities allocation rapidly and cost-effectively. Cloud services integrating the proposed reconfiguration mechanism are developed for managers and onsite operators to support successful reconfiguration with enhanced operational visibility. Through the case study of an industrial company, the effectiveness of the proposed concept and approach is verified.

20.

A new fast-path mechanism for mutual exclusion

James H. Anderson Yong-Jik Kim 《Distributed Computing》2001,14(1):17-29

Summary. Several years ago, Yang and Anderson presented an N-process algorithm for mutual exclusion under read/write atomicity that has time complexity, where “time” is measured by counting remote memory references. In this algorithm, instances of a two-process mutual exclusion algorithm are embedded within a binary arbitration tree. In the two-process algorithm that was used, all busy-waiting is done by “local spinning.” Performance studies presented by Yang and Anderson showed that their N-process algorithm exhibits scalable performance under heavy contention. One drawback of using an arbitration tree, however, is that each process is required to perform remote memory operations even when there is no contention. To remedy this problem, Yang and Anderson presented a variant of their algorithm that includes a “fast-path” mechanism that allows the arbitration tree to be bypassed in the absence of contention. This algorithm has the desirable property that contention-free time complexity is O(1). Unfortunately, the fast-path mechanism that was used caused time complexity under contention to rise to in the worst case. To this day, the problem of designing a read/write mutual exclusion algorithm with O(1) time complexity in the absence of contention and O(logN) time complexity under contention has remained open. In this paper, we close this problem by presenting a fast-path mechanism that achieves these time complexity bounds when used in conjunction with Yang and Anderson's arbitration-tree algorithm. Received: July 1999 / Accepted: July 2000