
1.

A scalable high-performance computing solution for networks on chips

《IEEE Micro》2002,22(5):46-55

The Eclipse network-on-a-chip architecture uses a sophisticated parallel programming model, realized through multithreaded processors, interleaved memory modules, and a high-capacity interconnection network to support system-on-a-chip designs.
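
The abstract mentions interleaved memory modules. The sketch below illustrates the general idea only, assuming simple low-order interleaving, which may not be the scheme Eclipse actually uses: consecutive words map to consecutive modules, so sequential accesses are spread across modules and can be served in parallel. The function name and the word size are hypothetical.

```python
def interleave(address, num_modules, word_bytes=8):
    """Map a byte address to (module, offset within module).

    Hypothetical low-order interleaving: num_modules and word_bytes are
    illustrative parameters, not taken from the Eclipse design.
    """
    word = address // word_bytes      # index of the word containing this byte
    module = word % num_modules       # consecutive words go to consecutive modules
    offset = word // num_modules      # position of that word inside its module
    return module, offset


if __name__ == "__main__":
    # Eight consecutive 8-byte words spread round-robin over 4 modules.
    for addr in range(0, 64, 8):
        print(addr, interleave(addr, num_modules=4))
```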

2.

PARCSIM: a parallel computing simulator for scalable software optimization

Cámara Jesús, Cano José-Carlos, Cuenca Javier, Saura-Sánchez Mariano 《The Journal of supercomputing》2022,78(15):17231-17246

PARCSIM is a parallel software simulator that allows a user to capture, through a graphical interface, matrix algorithm schemes that solve scientific problems. With...

3.

A new platform for network parallel computing

李代平, 罗寿文, 张信一, 方海翔 《Computer Engineering and Design》2005,26(1):24-26,137

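Writing network parallel computing programs is difficult for most people: decomposing and allocating user tasks and handling the interactions among subtasks all require considerable skill. With the aim of improving the user's parallel programming environment, this paper presents a new platform for network parallel computing. It explains the structure and implementation of the platform and the components that make it up: the task descriptor, the task scheduler, and the task controller. In this new architecture for network parallel computing, users only need to submit data and the operations to perform on them, leaving the complex issues to the system; this is undoubtedly a worthwhile attempt at advancing network parallel computing methods.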

4.

The design of an operating system for a scalable parallel computing engine

Paul Austin, Kevin Murray, Andy Wellings 《Software》1991,21(10):989-1013

There are substantial benefits to be gained from building computing systems from a number of processors working in parallel. One of the frequently-stated advantages of parallel and distributed systems is that they may be scaled to the needs of the user. This paper discusses some of the problems associated with designing a general-purpose operating system for a scalable parallel computing engine and then describes the solutions adopted in our experimental parallel operating system. We explain why a parallel computing engine composed of a collection of processors communicating through point-to-point links provides a suitable vehicle in which to realize the advantages of scaling. We then introduce a parallel-processing abstraction which can be used as the basis of an operating system for such a computing engine. We consider how this abstraction can be implemented and retain the ability to scale. As a concrete example of the ideas presented here we describe our own experimental scalable parallel operating-system project, concentrating on the Wisdom nucleus and the Sage file system. Finally, after introducing related work, we describe some of the lessons learnt from our own project.

5.

A parallel computing engine for a class of time critical processes

Nabhan T.M., Zomaya A.Y. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》1997,27(5):774-786

This paper focuses on the efficient parallel implementation of numerically intensive systems over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constraints. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and near-optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.
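
To make the simulated-annealing mapping step concrete, here is a minimal sketch, not the PCE scheduler itself. It assumes a deliberately simplified, hypothetical cost: the busiest processor's compute load plus the total weight of task-graph edges whose endpoints land on different processors. The paper's cost function also accounts for network topology and congestion, which this sketch ignores. All names are hypothetical.

```python
import math
import random


def cost(mapping, work, edges, num_procs):
    """Hypothetical cost: max processor load plus total cut communication."""
    load = [0.0] * num_procs
    for task, proc in enumerate(mapping):
        load[proc] += work[task]
    comm = sum(w for (u, v, w) in edges if mapping[u] != mapping[v])
    return max(load) + comm


def anneal(work, edges, num_procs, steps=20000, t0=10.0, alpha=0.9995, seed=0):
    """Map tasks to processors by simulated annealing; returns (best mapping, cost)."""
    rng = random.Random(seed)
    mapping = [rng.randrange(num_procs) for _ in work]
    current = cost(mapping, work, edges, num_procs)
    best, best_cost = mapping[:], current
    temp = t0
    for _ in range(steps):
        task = rng.randrange(len(work))
        old = mapping[task]
        mapping[task] = rng.randrange(num_procs)   # propose moving one task
        new = cost(mapping, work, edges, num_procs)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = mapping[:], new
        else:
            mapping[task] = old                      # reject: undo the move
        temp *= alpha                                # geometric cooling schedule
    return best, best_cost
```

With two heavily communicating tasks, for example `anneal([1, 1], [(0, 1, 100)], 2)`, the search should settle on placing both tasks on the same processor, since splitting them costs far more in communication than it saves in load.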

6.

A highly scalable parallel encoder version of the emergent JEM video encoder

López-Granado O., Migallón H., Martínez-Rach M., Galiano V., Malumbres M. P., Van Wallendael Glenn 《The Journal of supercomputing》2019,75(3):1429-1442

In 2016, 73% of total Internet traffic came from video transmission and this percentage is expected to reach 82% by 2021. These figures show the importance of using...

7.

Optically interconnected parallel computing systems

Ishikawa M., McArdle N. 《Computer》1998,31(2):61-68

Tomorrow's systems will need high-bandwidth and dense communication paths at various levels. Commercial high-performance computers are now beginning to use optical interconnections at the inter-cabinet level. These connections usually consist of optical fiber ribbons, with each fiber carrying signals at 1 to 2 Gbit/s over distances of 200 to 300 m. The aggregate bandwidth is as much as 30 Gbit/s. The authors propose integrating suitable optoelectronic devices with silicon electronics, which will allow designers to use optical communication channels to transfer data on and off chips. The authors describe an optically interconnected architecture for high-speed computation, image processing and robotic vision systems. They conclude that optoelectronic parallel processing systems will overcome some of the interconnection problems facing conventional electronic technology, allowing high-speed computers powerful enough for vision and image processing applications.

8.

A run-time load balancing strategy for highly parallel systems

Didier Y. Hinz 《Acta Informatica》1992,29(1):63-94

We discuss a simple run-time load balancing strategy which applies to numerical applications working on planar domains with localized data dependencies. We develop an iterative and adaptive partitioner that can work in a distributed way among the processors of a parallel system. Our algorithm subdivides the data space into general quadrilaterals, with each processor working on the data of one area. The topology of these domains is that of a rectangular grid and does not change during execution, which yields a very simple and efficient communication structure. The administration overhead due to the irregular geometry is small. The overhead caused by periodically readjusting the load balance is also rather small, because the partitioning algorithm is adaptive and parallel. We ran a scientific application to compare our method with a method based on recursive bisection, and obtained satisfactory results.
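
The key idea of the partitioner is that boundaries between neighbouring areas are adjusted iteratively, with each decision depending only on the two areas involved. The sketch below reduces this to one dimension under stated assumptions: the domain is a row of columns with known work, each processor owns a contiguous strip, and a boundary moves by one column only when that strictly reduces the imbalance between the two neighbours. It is a hypothetical simplification, not the paper's quadrilateral partitioner.

```python
def rebalance(col_work, bounds, iterations=100):
    """Iteratively shift strip boundaries toward equal load (hypothetical 1-D sketch).

    col_work[c] is the work in column c; strip p owns columns
    bounds[p] .. bounds[p + 1] - 1, with bounds[0] == 0 and
    bounds[-1] == len(col_work). Each interior boundary moves at most one
    column per sweep, using only the loads of the two strips it separates.
    """
    bounds = list(bounds)
    for _ in range(iterations):
        moved = False
        for p in range(1, len(bounds) - 1):
            lo, b, hi = bounds[p - 1], bounds[p], bounds[p + 1]
            left = sum(col_work[lo:b])
            right = sum(col_work[b:hi])
            if left > right and b - 1 > lo:            # give the left strip's last column away
                shift = col_work[b - 1]
                if abs((left - shift) - (right + shift)) < left - right:
                    bounds[p] = b - 1
                    moved = True
            elif right > left and b + 1 < hi:          # take the right strip's first column
                shift = col_work[b]
                if abs((left + shift) - (right - shift)) < right - left:
                    bounds[p] = b + 1
                    moved = True
        if not moved:                                  # no neighbour pair can improve
            break
    return bounds
```

For two strips over eight unit-work columns, `rebalance([1] * 8, [0, 1, 8])` converges to `[0, 4, 8]`. Because every decision is local and columns are indivisible, neighbouring strips can stall one unit apart: `rebalance([1] * 9, [0, 1, 2, 9])` stops at `[0, 2, 5, 9]` (loads 2, 3, 4).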

9.

A language and programming environment for high-performance parallel computing on heterogeneous networks

A. L. Lastovetsky, A. Ya. Kalinov, I. N. Ledovskikh, D. M. Arapov, M. A. Posypkin 《Programming and Computer Software》2000,26(4):216-236

The mpC language, designed specifically for programming high-performance computations on heterogeneous networks, is described. An mpC program explicitly defines an abstract computing network and distributes data, computations, and communications over it. At runtime, the mpC programming environment uses this information, together with information about the actual network, to distribute the processes over the actual network so that the program executes as efficiently as possible. Experience in using mpC to solve problems on local networks of heterogeneous workstations is discussed.
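
A core step in running on a heterogeneous network is giving faster processors proportionally more work. The following is a hypothetical helper, not part of mpC, that illustrates that step: it splits an integer amount of work in proportion to relative processor speeds, using largest-remainder rounding so that the shares always sum to the total.

```python
def distribute(total_work, speeds):
    """Split total_work integer units across processors in proportion to speed.

    Hypothetical helper (not an mpC API): floor each ideal share, then hand
    out the leftover units to the largest fractional remainders.
    """
    total_speed = sum(speeds)
    exact = [total_work * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]                 # floor of each ideal share
    leftover = total_work - sum(shares)
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For instance, `distribute(7, [3, 1])` returns `[5, 2]`. The ideal shares are 5.25 and 1.75, and the single leftover unit goes to the larger remainder.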

10.

Supporting schedules of resource co-allocation for distributed computing in scalable systems

V. V. Toporkov 《Programming and Computer Software》2008,34(3):160-172

This paper proposes a scheduling model and validates methods of resource co-allocation for distributed computations in scalable systems. Solving the problem of allocating heterogeneous computing resources to complex sets of tasks (jobs) is related to forming strategies (families of admissible supporting schedules). The choice of a specific schedule depends on the nature of the events occurring in the distributed environment, which relate primarily to the load and accessibility of computing nodes.
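
Co-allocation means finding resources that are free at the same time. As a minimal sketch, and not the scheduling model proposed in the paper, the hypothetical helper below searches for the earliest start time at which `k` nodes are simultaneously free for a given duration. Each node is described only by its busy intervals. The earliest feasible start is always either the release time or the end of some busy interval, so only those candidate times need to be checked.

```python
def free_during(busy, start, end):
    """True if none of the node's busy intervals [s, e) overlaps [start, end)."""
    return all(e <= start or s >= end for s, e in busy)


def co_allocate(nodes, k, duration, release=0):
    """Earliest time at which k nodes are all free for `duration` (hypothetical helper).

    nodes maps a node name to a list of busy intervals (s, e).
    Returns (start, [node names]) or None if fewer than k nodes exist.
    """
    candidates = {release} | {e for busy in nodes.values() for _, e in busy if e >= release}
    for t in sorted(candidates):
        free = [n for n, busy in nodes.items() if free_during(busy, t, t + duration)]
        if len(free) >= k:
            return t, sorted(free)[:k]
    return None
```

With `nodes = {"a": [(0, 5)], "b": [(0, 2)], "c": [(3, 10)]}`, asking for two nodes for three time units gives `(5, ["a", "b"])`. No earlier window has two nodes free together.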

11.

Congestion control for asynchronous parallel computing on workstation networks

《Parallel Computing》1997,23(13):1855-1875

Asynchronous parallel computing can result in high message generation rates, thus triggering network congestion. First, we characterize the communication requirements of a large class of supercomputing applications falling under the category of fixed-point problems amenable to solution by parallel iterative methods. In particular, we concentrate on asynchronous iterative algorithms whose communication/computation ratio is especially high, resulting in degraded effective throughput if communication is not managed properly. Second, we show the effects of network contention and asynchrony on application performance in a local-area network environment and investigate methods of solution. Our approach is based on a congestion control algorithm called ‘warp control’ whose adaptive properties are exploited to yield significant performance enhancements when network contention is high. Although tested in a LAN environment for experimental control purposes, our solution follows the end-to-end paradigm and refrains from exploiting special MAC-layer properties to achieve applicability to general WAN environments. Third, we provide a framework wherein efficient congestion control can be facilitated, encompassing methods acting at the application layer as well as the transport/network layer, with emphasis on application-driven control. We conclude with a discussion of our experimental results and special issues arising in high-bandwidth ATM networks.

12.

A hierarchical optical ring interconnection network for parallel computing systems

魏进民, 庞亚红, 毛幼菊 《Computer Engineering and Design》2005,26(4):961-963,992

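This paper introduces a new interconnection network for large-scale parallel computing, the hierarchical optical ring interconnect, a scalable network suitable for multiprocessors and multicomputers. The hierarchical optical ring interconnect consists of a non-blocking, fault-tolerant, single-hop, scalable interconnection topology, and uses wavelength-division multiple access to make full use of the terahertz (THz) bandwidth of optical fiber. The optical network combines the attractive features of hierarchical rings, such as simple node interfaces, constant node degree, and fault tolerance, with the advantages of optical communication. The paper proposes the hierarchical optical ring topology, analyzes its structural properties, describes the optical design method, and presents a brief feasibility study of the hierarchical optical ring interconnect.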

13.

Extending Unix for scalable computing

DeBenedictis E.P., Johnson S.C. 《Computer》1993,26(11):43-53

Because it retrieves all instructions and data from a single memory, the von Neumann computer architecture has a fundamental speed limit. The scalable multicomputer architecture, which uses many microprocessors together to solve a single problem and can run at teraflop speeds, may be a solution. While teraflop processor technology is known, the scalable operating system and I/O technology necessary for those speeds is not. The authors describe how Unix can be extended to scalable computing, permitting teraflop speeds and offering parallel computing to users unfamiliar with parallel programming. They designed this technology into the system software of the Ncube-2, the predecessor to Ncube's announced teraflop parallel computer. The authors describe the system in detail and provide some performance results.

14.

A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems

M. Mezmaz, N. Melab, Y. Kessaci, Y.C. Lee, E.-G. Talbi, A.Y. Zomaya, D. Tuyttens 《Journal of Parallel and Distributed Computing》2011,71(11):1497-1508

In this paper, we investigate the problem of scheduling precedence-constrained parallel applications on heterogeneous computing systems (HCSs) such as cloud computing infrastructures. This kind of application has been studied and used in many research works. Most of these works propose algorithms to minimize the completion time (makespan) without paying much attention to energy consumption. We propose a new parallel bi-objective hybrid genetic algorithm that takes into account not only makespan but also energy consumption. We particularly focus on the island parallel model and the multi-start parallel model. Our new method is based on dynamic voltage scaling (DVS) to minimize energy consumption. In terms of energy consumption, the obtained results show that our approach outperforms previous scheduling methods by a significant margin. In terms of completion time, the obtained schedules are also shorter than those of other algorithms. Furthermore, our study demonstrates the potential of DVS.
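
A bi-objective algorithm keeps the set of schedules that no other schedule beats on both makespan and energy (the Pareto front). The sketch below shows that filter, plus a toy DVS model used only to produce candidate points. The model is an assumption, not the paper's energy model: voltage is taken to scale with frequency, so dynamic power grows as f³ and energy for a fixed number of cycles grows as f², while running time shrinks as 1/f.

```python
def dominates(a, b):
    """a dominates b: no worse in both objectives and not identical."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(points):
    """Keep the (makespan, energy) pairs not dominated by any other pair."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


def dvs_point(cycles, freq):
    """Toy DVS model (assumption): time = cycles / f, energy ~ cycles * f**2."""
    return cycles / freq, cycles * freq ** 2
```

Lowering the frequency trades makespan for energy. `dvs_point(100, 0.5)` gives `(200.0, 25.0)` and `dvs_point(100, 1.0)` gives `(100.0, 100.0)`. Neither dominates the other, so both stay on the front.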

15.

Distributed computing with high-speed optical networks

Vetter R.J., Du D.H.C. 《Computer》1993,26(2):8-18

An environment that uses wavelength division multiplexing techniques and optical switching and processing to provide large bandwidths, short delays, and multiple data streams for distributed processing is described. The focus is on the interrelationship between application needs and network services. The system level, a conceptual layer designed to bridge the gap between application requirements and underlying high-speed network services, is proposed. The system level is a logical view of the physical network represented by a virtual topology projected onto the physical network. Embedding this virtual topology introduces many new problems and performance tradeoffs into the design of the network. A few of these problems are outlined, and some initial research efforts in this area are discussed. The physical network level, the collection of optical fiber links interconnecting the nodes in the network, and the application level, a logical view of an application's computational topology and representation of the application's communication and computing requirements, are also described.

16.

F-MPJ: scalable Java message-passing communications on parallel systems

Guillermo L. Taboada, Juan Touriño, Ramón Doallo 《The Journal of supercomputing》2012,60(1):117-140

This paper presents F-MPJ (Fast MPJ), a scalable and efficient Message-Passing in Java (MPJ) communication middleware for parallel computing. The increasing interest in Java as the programming language of the multi-core era demands scalable performance on hybrid architectures (with both shared and distributed memory spaces). However, current Java communication middleware lacks efficient communication support. F-MPJ addresses this situation by: (1) providing efficient non-blocking communication, which allows communication overlapping and thus scalable performance; (2) taking advantage of shared memory systems and high-performance networks through our high-performance Java sockets implementation, JFS (Java Fast Sockets); (3) avoiding the use of communication buffers; and (4) optimizing MPJ collective primitives. Thus, F-MPJ significantly improves the scalability of current MPJ implementations. A performance evaluation on an InfiniBand multi-core cluster has shown that F-MPJ communication primitives outperform representative MPJ libraries by up to 60 times. Furthermore, the use of F-MPJ in communication-intensive MPJ codes has increased their performance by up to seven times.

17.

Future scenarios of parallel computing: Distributed sensor networks

《Journal of Visual Languages and Computing》2007,18(5):484-491

Over the past few years, motivated by the accelerating technological convergence of sensing, computing and communications, there has been a growing interest in the potential and technological challenges of wireless sensor networks. This paper introduces a wide range of current basic research lines dealing with ad hoc networks of spatially distributed systems, data rate requirements and constraints, real-time fusion and registration of data from distributed sensors, cooperative control, hypothesis generation, and network consensus filtering. This technical domain has matured to the point where a number of industrial products and systems have appeared. The presentation also describes the state of the art regarding current and soon-to-appear applications.
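
Network consensus is one of the research lines listed above. A textbook form is linear average consensus, shown here as an illustrative sketch and not as material from this paper. Each node repeatedly moves its value toward its neighbours' values, x_i ← x_i + ε Σ_{j∈N(i)} (x_j − x_i). On a connected undirected graph, with ε below 1 / (maximum degree), all values converge to the initial average using only local exchanges.

```python
def consensus(values, neighbours, eps, iterations):
    """Synchronous linear average consensus on an undirected graph.

    neighbours[i] lists the nodes adjacent to node i; eps should be below
    1 / max degree for convergence. Every update uses only neighbour values.
    """
    x = list(values)
    for _ in range(iterations):
        # The comprehension reads the previous x, so all nodes update simultaneously.
        x = [xi + eps * sum(x[j] - xi for j in neighbours[i]) for i, xi in enumerate(x)]
    return x
```

On a three-node path with readings 0, 3 and 6, `consensus([0, 3, 6], [[1], [0, 2], [1]], 0.3, 200)` drives every node to approximately 3, the average. No node ever sees the whole network.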

18.

A scalable parallel method for large collision detection problems

Hammad Mazhar, Toby Heyn, Dan Negrut 《Multibody System Dynamics》2011,26(1):37-55

This paper discusses a parallel collision detection algorithm. Implemented using software executed on ubiquitous Graphics Processing Unit (GPU) cards, the algorithm demonstrates a speedup of two orders of magnitude over a state-of-the-art sequential implementation when handling collision detection tasks involving millions of objects. GPUs are composed of many (on the order of hundreds) scalar processors that can execute an operation simultaneously; this strength is leveraged in the proposed algorithm, which combines the use of multiple CPU cores with multiple GPUs. The software implementation of the algorithm can be used to detect collisions between five million objects in less than two seconds and was used to detect 1.4 billion contact events in less than 40 seconds. When dealing with collision detection between bodies with complex geometries, a spherical padding approach is used to represent surface geometries as large collections of spheres. The proposed methodology is expected to be relevant in computational mechanics, with applications in granular flow dynamics and smoothed particle hydrodynamics (SPH), where the number of contact events ranges from millions to billions.
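
Since bodies are represented as collections of spheres, collision detection reduces to finding overlapping sphere pairs. The serial sketch below uses a uniform-grid broad phase, a common technique that is assumed here and is not necessarily the paper's GPU method. The cell size is set to the largest sphere diameter, so any two overlapping spheres have centres in the same or adjacent cells. Only the 27 neighbouring cells therefore need an exact distance test.

```python
import math
from collections import defaultdict
from itertools import product


def colliding_pairs(centers, radii):
    """Return sorted index pairs (i, j), i < j, of overlapping spheres (3-D).

    Uniform-grid broad phase (assumed technique, not the paper's code):
    cell size = largest diameter, so overlaps only occur between the same or
    adjacent cells; an exact sphere-sphere test follows as the narrow phase.
    """
    cell = 2 * max(radii)
    grid = defaultdict(list)
    for i, c in enumerate(centers):
        grid[tuple(math.floor(coord / cell) for coord in c)].append(i)
    pairs = set()
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in grid.get(other, ()):
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
                        if d2 < (radii[i] + radii[j]) ** 2:   # strict overlap test
                            pairs.add((i, j))
    return sorted(pairs)
```

Each sphere's neighbourhood search is independent of the others, which is what makes this style of broad phase attractive for data-parallel hardware such as GPUs.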

19.

A scalable, parallel algorithm for maximal clique enumeration

Matthew C. Schmidt, Nagiza F. Samatova, Kevin Thomas, Byung-Hoon Park 《Journal of Parallel and Distributed Computing》2009

The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms require large amounts of time to enumerate the maximal cliques in networks arising from these problems that contain hundreds, thousands, or larger numbers of vertices. Previous attempts to provide practical solutions to the MCE problem through parallel implementation have had limited success, largely due to a number of challenges inherent to the nature of the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree whose structure is highly irregular and hard or impossible to predict; therefore, almost any static decomposition of the search tree by parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported for only up to a couple hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly for thousands of processors on real-world application graphs with hundreds and thousands of nodes.
Its scalability and efficiency are attributed to the proposed: (a) representation of the search tree decomposition to enable parallelization; (b) parallel depth-first backtracking search to both constrain the search space and minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing intelligently coupled with work stack splitting to minimize computing elements’ idle time. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve a linear scaling runtime using up to 2048 processors on Cray XT machines for a number of real-world biological networks.
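
For reference, the depth-first backtracking search that parallel MCE algorithms distribute can be illustrated with the classic serial Bron–Kerbosch algorithm with pivoting. This is a standard algorithm shown for illustration; it is not the paper's parallel method. R is the clique being built, P holds the candidates that can still extend it, and X holds vertices already explored. R is maximal exactly when P and X are both empty. Each branch of the loop is an independent subtree of the search, so such subtrees are the natural units for work stealing.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch plus pivoting (serial).

    adj maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))          # nothing can extend r: it is maximal
            return
        # Pivot on the vertex covering most candidates to prune redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):         # branch only on non-neighbours of the pivot
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}                        # v's cliques are done; exclude it from later branches
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)
```

For a triangle 1-2-3 with an extra edge 3-4, the result is `[[1, 2, 3], [3, 4]]`.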

20.

A parallel algorithm for solving special tridiagonal systems on ring networks

K.-L. Chung, W.-M. Yan, J.-G. Wu 《Computing》1996,56(4):385-395

The solution of special linear, circulant-tridiagonal systems is considered. In this paper, a fast parallel algorithm for solving these special tridiagonal systems, which include the skew-symmetric and tridiagonal-Toeplitz systems, is presented. By exploiting the diagonally dominant property, our parallel solver needs only local communication between adjacent processors on a ring network. An error analysis is also given. Experimental results on the nCUBE/2E multiprocessor demonstrate the good performance of our stable parallel solver.
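
A useful baseline for the parallel solver is the serial Thomas algorithm for tridiagonal systems. It is shown here for orientation only and is not the paper's ring algorithm. Like the parallel solver, it relies on diagonal dominance to stay stable without pivoting. It runs a forward elimination sweep followed by back substitution, which is inherently sequential. The periodic corner entries of a circulant system would need an additional correction step (for example, via the Sherman–Morrison formula), which is not shown.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (serial baseline).

    a: sub-diagonal (a[0] unused), b: main diagonal, c: super-diagonal
    (c[-1] unused), d: right-hand side. Stable without pivoting when the
    matrix is diagonally dominant.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For the diagonally dominant matrix with 4 on the diagonal and 1 off it, `thomas([0, 1, 1], [4, 4, 4], [1, 1, 0], [6, 12, 14])` recovers x ≈ [1, 2, 3].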