期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Data-Driven Multithreading Using Conventional Microprocessors

Kyriacou C. Evripidou P. Trancoso P. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(10):1176-1188

This paper describes the Data-Driven Multithreading (DDM) model and how it may be implemented using off-the-shelf microprocessors. Data-Driven Multithreading is a nonblocking multithreading execution model that tolerates internode latency by scheduling threads for execution based on data availability. Scheduling based on data availability can be used to exploit cache management policies that reduce significantly cache misses. Such policies include firing a thread for execution only if its data is already placed in the cache. We call this cache management policy the CacheFlow policy. The core of the DDM implementation presented is a memory mapped hardware module that is attached directly to the processor's bus. This module is responsible for thread scheduling and is known as the Thread Synchronization Unit (TSU). The evaluation of DDM was performed using simulation of the Data-Driven Network of Workstations ({rm{D}}^2{rm{NOW}}). {rm{D}}^2{rm{NOW}} is a DDM implementation built out of regular workstations augmented with the TSU. The simulation was performed for nine scientific applications, seven of which belong to the SPLASH-2 suite. The results show that DDM can tolerate well both the communication and synchronization latency. Overall, for 16 and 32-node {rm{D}}^2{rm{NOW}} machines the speedup observed was 14.4 and 26.0, respectively. 相似文献

2.

Block scheduling of iterative algorithms and graph-level priorityscheduling in a simulated data-flow multiprocessor

Evripidou P. Gaudiot J.-L. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(4):398-413

Iterative methods for solving linear systems are discussed. Although these methods are inherently highly sequential, it is shown that much parallelism could be exploited in a data-flow system by scheduling the iterative part of the algorithms in blocks and by looking ahead across several iterations. This approach is general and will apply to other iterative and loop-based problems. It is also demonstrated by simulation that relying solely on data-driven scheduling of parallel and unrolled loops results in low resource utilization and poor performance. A graph-level priority scheduling mechanism has been developed that greatly improves resource utilization and yields higher performance 相似文献

3.

A Case for Chip Multiprocessors Based on the Data-Driven Multithreading Model

Pedro Trancoso Paraskevas Evripidou Kyriakos Stavrou Costas Kyriacou 《International journal of parallel programming》2006,34(3):213-235

Current high-end microprocessors achieve high performance as a result of adding more features and therefore increasing complexity. This paper makes the case for a Chip-Multiprocessor based on the Data-Driven Multithreading (DDM-CMP) execution model in order to overcome the limitations of current design trends. Data-Driven Multithreading (DDM) is a multithreading model that effectively hides the communication delay and synchronization overheads. DDM-CMP avoids the complexity of other designs by combining simple commodity microprocessors with a small hardware overhead for thread scheduling and an interconnection network. Preliminary experimental results show that a DDM-CMP chip of the same hardware budget as a high-end commercial microprocessor, clocked at the same frequency, achieves a speedup of up to 18.5 with a 78–81% power consumption of the commercial chip. Overall, the estimated results for the proposed DDM-CMP architecture show a significant benefit in terms of both speedup and power consumption making it an attractive architecture for future processors. 相似文献

4.

Advanced array optimizations for high performance functionallanguages

Cann D.C. Evripidou P. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(3):229-239

We discuss and evaluate three optimizations for reducing memory management overhead and data copying costs in SISAL 1.2 programs that build arrays. The first, called framework preconstruction, eliminates superfluous allocate-deallocate sequences in cyclic computations. The second, called aggregate storage subsumption, reduces the management overhead for compound array components. The third, called predictive storage preallocation, eliminates superfluous data copying in filtered array constructions and simplifies their parallelization. We have added all three optimizations to the Optimizing SISAL Compiler with rewarding improvements in SISAL program performance on vector-parallel machines such as those built by Cray Computer Corporation, Convex, and Cray Research 相似文献

5.

Introduction

Paraskevas Evripidou 《International journal of parallel programming》2000,28(6):535-536

相似文献

6.

Mobile Agents for Wireless Computing: The Convergence of Wireless Computational Models with Mobile-Agent Technologies

Constantinos Spyrou George Samaras Evaggelia Pitoura Paraskevas Evripidou 《Mobile Networks and Applications》2004,9(5):517-528

Wireless mobile computing breaks the stationary barrier and allows users to compute and access information from anywhere and at anytime. However, this new freedom of movement does not come without new challenges. The mobile computing environment is constrained in many ways. Mobile elements are resource-poor and unreliable. Their network connectivity is often achieved through low-bandwidth wireless links. Furthermore, connectivity is frequently lost for variant periods of time. The difficulties raised by these constraints are compounded by mobility that induces variability in the availability of both communication and computational resources. These severe restrictions have a great impact on the design and structure of mobile computing applications and motivate the development of new software models. To this end, a number of extensions to the traditional distributed system architectures have been proposed [26]. These new software models, however, are static and require a priori set up and configuration. This in effect limits their potential in dynamically serving the mobile client; the client cannot access a site at which an appropriate model is not configured in advance. The contribution of this paper is twofold. First, the paper shows how an implementation of the proposed models using mobile agents eliminates this limitation and enhances the utilization of the models. Second, new frameworks for Web-based distributed access to databases are proposed and implemented. 相似文献

7.

Guest Editorial: Special Issue on 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XI)

John McAllister Luigi Carro Skevos Evripidou 《International journal of parallel programming》2013,41(2):161-162

相似文献

8.

Parallel Implementations of the Selection Problem: A Case Study

Marc Daumas Paraskevas Evripidou 《International journal of parallel programming》2000,28(1):103-131

The selection problem has been studied extensively on sequential machines. A linear average time solution and a linear worst-case solution are considered as the standard by most researchers. Theoretical work is also available on parallel models, but it has not been widely implemented on parallel machines. This paper presents an in-depth analysis of the implementation of the standard algorithms, on a number of multiprocessors and supercomputers from the entire spectrum of Flynn's classification, using both an imperative (C based languages with vendor specific parallel extensions) and a functional (SISAL) language. Very interesting results were obtained for all of the experiments performed, leading us to the conclusion that the selection problem has very efficient parallel implementations. Hand-tuned C programs with parallel extensions provided good efficiency but were time-consuming in terms of development. On the other hand, the SISAL code is fully portable and the same program was used on all the machines. The performances of SISAL implementations were comparable to the ones of the hand-tuned C implementations. On all the tests, the routines were able to sustain good speed-up and reasonable efficiency, even with a large number of processors. In two cases (one machine using SISAL, and one using a C-based language), we were able to obtain an efficiency higher than 80% with a configuration close or equal to the maximum number of processors. 相似文献

9.

Metacomputing with Mobile Agents

Paraskevas Evripidou George Samaras 《International journal of parallel programming》2006,34(5):429-458

In this paper we present Parallel Computing with Mobile Agents (PaCMAn), a mobile agent based Metacomputer that enables its users to utilize idle resources on the internet to tackle computational problems that could not be handled efficiently with their own resources. The PaCMAn launches multiple mobile agents that cooperate and communicate to solve problems in parallel. Each agent supports the basic communication and synchronization tasks of the classical parallel worker assuming the role of a process in a parallel processing application. Application tasks, however, are assigned dynamically to the PaCMAn’s mobile agents via TaskHandlers. TaskHandlers are Java objects capable of implementing particular tasks of the application. The PaCMAn consists of three major components: Broker, Server and Client. A server machine has to be explicitly registered in order to take part in the PaCMAn Metacomputer. A number of brokers keep track of the available resources. In the PaCMAn system both server and client machines can be located anywhere in the Internet. The clients select the servers that they will utilize based on the specific resource requirements. We have developed and tested prototype systems with several applications. These prototypes provide proof of concept of our proposed Metacomputing philosophy. Furthermore they have demonstrated that PaCMAn provides parallel efficiency. We also demonstrate that the PaCMAn Metacomputer can be used as the computational engine for the creation of sophisticated Pervasive Services anywhere anytime. 相似文献

10.

On the computation of the inverse of a two-variable polynomial matrix by interpolation

Nicholas P. Karampetakis Alexandros Evripidou 《Multidimensional Systems and Signal Processing》2012,23(1-2):97-118

Two interpolation algorithms are presented for the computation of the inverse of a two variable polynomial matrix. The first interpolation algorithm, is based on the Lagrange interpolation method that matches pre-assigned data of the determinant and the adjoint of a two-variable polynomial matrix, on a set of points on several circles centered at the origin. The second interpolation algorithm is using discrete fourier transforms (DFT) techniques or better fast fourier transforms which are very efficient algorithms available both in software and hardware and that they are greatly benefitted by the existence of a parallel environment (through symmetric multiprocessing or other techniques). The complexity of both algorithms is discussed and illustrated examples are given. The DFT-algorithm is implemented in the Mathematica programming language and tested in comparison to the respective built-in function of Mathematica. 相似文献