1.
This paper presents the implementation of two connected component labelling algorithms on the BLITZEN massively parallel processor that was developed recently for NASA. The topology of BLITZEN is a two-dimensional mesh that can be dynamically configured to also support diagonal data transfers. It is shown that an algorithm based on Levialdi's connected component shrinking process performs much better than a straightforward algorithm for connected component labelling.
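The abstract gives no code, but Levialdi's shrinking process is well documented; a minimal serial sketch of it (my own illustration, not the paper's BLITZEN implementation) is below. Each component shrinks, without splitting or merging, until it vanishes, and is counted at the single step where it is an isolated pixel.

```python
import numpy as np

def levialdi_step(img):
    """One Levialdi shrinking step on a 0/1 binary image.
    A 1 is erased when its north, west and north-west neighbours are all 0;
    a 0 becomes 1 when both its north and west neighbours are 1."""
    img = np.asarray(img, dtype=np.int64)
    h = lambda x: (x >= 1).astype(np.int64)        # threshold function h
    b = np.pad(img, ((1, 0), (1, 0)))              # zero border on top and left
    north, west, nw = b[:-1, 1:], b[1:, :-1], b[:-1, :-1]
    return h(h(img + north + west - 1) + h(img + nw - 1))

def count_components(img):
    """Count 8-connected components: each component passes through an
    isolated-pixel state exactly once, one step before it disappears."""
    img = np.asarray(img, dtype=np.int64)
    count = 0
    while img.any():
        full = np.pad(img, 1)
        m, n = img.shape
        neigh = sum(full[1 + di:1 + di + m, 1 + dj:1 + dj + n]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))
        count += int(((img == 1) & (neigh == 0)).sum())  # isolated 1s
        img = levialdi_step(img)
    return count
```

Because the update rule only reads a pixel's north, west and north-west neighbours, it maps naturally onto a mesh with diagonal transfers such as BLITZEN's.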
2.
With the development of networks, network processors are playing an increasingly important role in building future network systems. Starting from the historical background in which network processors emerged, this paper analyzes the main aspects by which their hardware architectures are evaluated, examines the architectures of several representative network processors, summarizes the overall architectural characteristics of network processors, and discusses the evolution of existing architectures and directions for further work.
3.
Two parallel computer paradigms available today are multi-core accelerators such as the Sony, Toshiba and IBM Cell or graphics processing units (GPUs), and massively parallel message-passing machines such as the IBM Blue Gene (BG). The solution of systems of linear equations is one of the most CPU-intensive steps in engineering and simulation applications and can greatly benefit from the multitude of processing cores and vectorisation on today's parallel computers. We parallelise the conjugate gradient (CG) linear equation solver on the Cell Broadband Engine and the IBM Blue Gene/L machine. We perform a scalability analysis of CG on both machines, across 1, 8 and 16 synergistic processing elements on the Cell and 1–32 cores on BG, with heptadiagonal matrices. The results indicate that the multi-core Cell system outperforms the massively parallel BG system by three to four times, owing to the Cell's higher communication bandwidth and accelerated vector processing capability.
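For reference, the CG iteration being parallelized is standard; a minimal serial NumPy sketch (my own illustration, not the paper's Cell/BG code, with illustrative defaults) looks like this:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A. The product
    A @ p is the hot spot that gets distributed and vectorised."""
    x = np.zeros(len(b))
    r = b - A @ x                      # initial residual
    p = r.copy()                       # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # optimal step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # new direction, conjugate to previous
        rs_old = rs_new
    return x
```

For a heptadiagonal matrix like those used in the paper, A has seven non-zero bands, so A @ p reduces to a fixed stencil over the vector, which is what makes the kernel amenable to the Cell's vector units.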
4.
In this paper we present a massively parallel open source solver for Richards equation, named the RichardsFOAM solver. This solver has been developed in the framework of the open source generalist computational fluid dynamics toolbox OpenFOAM® and is capable of dealing with large-scale problems in both space and time. The source code for RichardsFOAM may be downloaded from the CPC program library website.
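For context (not spelled out in the abstract), the mixed form of Richards' equation that such solvers typically discretize is

```latex
\frac{\partial \theta(h)}{\partial t} = \nabla \cdot \left[ K(h)\, \nabla (h + z) \right]
```

where h is the pressure head, θ(h) the volumetric water content, K(h) the hydraulic conductivity, and z the vertical coordinate; the nonlinearity of θ and K is what makes large space-time problems expensive.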
5.
A variety of approaches to exploiting on-chip transistor resources from different angles have already emerged; the Raw computer architecture proposed by the MIT Laboratory for Computer Science is one of the more distinctive. Unlike other approaches to parallel computing, the Raw architecture gives full consideration to fine-grained parallelism and achieves good parallel efficiency. It is also inherently scalable, and represents a new direction in computer architecture research.
6.
POTENTIAL is a virtual database machine based on general computing platforms, especially parallel computing platforms. It provides a complete solution to high-performance database systems by a 'virtual processor + virtual data bus + virtual memory' architecture. Virtual processors manage all CPU resources in the system, on which the various operations run. The virtual data bus is responsible for managing data transmission between associated operations, and forms the hinge of the entire system. Virtual memory provides efficient data storage and buffering mechanisms that conform to data reference behaviors in database systems. The architecture of POTENTIAL is very clear and has many good features, including high efficiency, high scalability, high extensibility and high portability.
7.
One issue which is central in developing a general purpose FFT subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. In this paper we present an FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications. We have also addressed the problem of rearranging the data after computing the FFT. We have evaluated the performance of our implementation on a distributed memory parallel machine, the Intel iPSC/860.
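The paper's distribution-independent scheme is not reproduced here, but the row-column (four-step) decomposition that underlies most distributed-memory FFTs can be sketched serially; the axis conventions below are my own:

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """Length n1*n2 FFT via n1-point and n2-point sub-FFTs. On a
    distributed-memory machine the two FFT phases are purely local
    (each processor owns whole rows or columns) and the reshuffle
    between them becomes an all-to-all exchange -- which is why the
    choice of data distribution matters so much."""
    N = n1 * n2
    A = x.reshape(n2, n1).T                    # A[j1, j2] = x[j1 + n1*j2]
    B = np.fft.fft(A, axis=1)                  # n1 independent size-n2 FFTs
    tw = np.exp(-2j * np.pi * np.outer(np.arange(n1), np.arange(n2)) / N)
    C = np.fft.fft(B * tw, axis=0)             # twiddle, then n2 size-n1 FFTs
    return C.reshape(N)                        # X[k2 + n2*k1] = C[k1, k2]

x = np.random.rand(12)
assert np.allclose(four_step_fft(x, 3, 4), np.fft.fft(x))
```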
8.
The design of a fast, flexible and dynamically microprogrammable pipelined image processor is presented. The machine is especially suited, though not exclusively dedicated, to performing local operations (up to 16 × 16) of both logical and arithmetic character on pictures stored in a random access image memory with a 256-level grey scale. Separate parts of the machine take care of data manipulation and address generation. The machine's functioning is illustrated by discussing how arithmetic N × N neighbourhood operations and binary 3 × 3 neighbourhood operations were implemented; finally, the software supporting microprogram development and debugging, together with the run-time support software, is described.
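To make "arithmetic N × N neighbourhood operation" concrete, here is a plain serial sketch of such an operation on an 8-bit image; nothing in it is specific to the machine described:

```python
import numpy as np

def neighbourhood_op(img, kernel):
    """Arithmetic N x N local operation (a convolution-style weighted sum
    over each pixel's neighbourhood), clamped back to the 256-level range."""
    n = kernel.shape[0]
    r = n // 2
    padded = np.pad(img.astype(np.int64), r)
    out = np.zeros(img.shape, dtype=np.int64)
    for di in range(n):                      # accumulate one shifted copy
        for dj in range(n):                  # of the image per kernel weight
            out += kernel[di, dj] * padded[di:di + img.shape[0],
                                           dj:dj + img.shape[1]]
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
laplacian = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])
edges = neighbourhood_op(img, laplacian)
```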
9.
We describe the evolution of a distributed shared memory (DSM) system, Mirage, and the difficulties encountered when moving the system from a Unix-based kernel on the VAX to a Unix-based kernel on personal computers. Mirage provides a network-transparent form of shared memory for a loosely coupled environment. The system hides network boundaries for processes that are accessing shared memory and is upward compatible with the Unix System V Interface Definition. This paper addresses the architectural dependencies in the design of the system and evaluates the performance of the implementation. The new version, MIRAGE+, performs well compared to Mirage even though eight times as much data is sent on each page fault because of the larger page size used in the implementation. We show that the performance of systems with a large ratio of page size to network packet size can be dramatically improved on conventional hardware by applying three well-known techniques: packet blasting, compression, and running at interrupt level. The measured time for a page fault in MIRAGE+ has been reduced by 37 per cent by sending a page using packet blasting instead of using a handshake for each portion of the page. When compression was added to MIRAGE+, the time to fault a page across the network improved by a further 47 per cent when the page compressed into one network packet. Our measured performance compares favorably with the amount of time it takes to fault a page from disk. Lastly, running at interrupt level may improve performance by 16 per cent when faulting pages without compression.
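To make the compression result concrete, here is a toy sketch of the decision MIRAGE+ effectively makes on a page fault: whether the compressed page fits in one packet, or must be blasted as back-to-back fragments. The page and MTU sizes are assumptions of mine, not taken from the paper:

```python
import zlib

PAGE_SIZE = 4096      # assumed page size, bytes
MTU = 1500            # typical Ethernet payload budget

def plan_page_send(page: bytes):
    """Compress a faulted page; if the result fits in one packet we avoid
    fragmentation entirely. Otherwise 'packet blasting' sends the
    fragments back-to-back with no per-fragment handshake."""
    compressed = zlib.compress(page)
    data = compressed if len(compressed) < len(page) else page
    return [data[i:i + MTU] for i in range(0, len(data), MTU)]

page = bytes(PAGE_SIZE)                 # an all-zero page compresses well
print(len(plan_page_send(page)))        # -> 1: the single-packet fast path
```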
10.
The HIRLAM (high resolution limited area modelling) limited-area atmospheric model was originally developed and optimized for shared memory vector-based computers, and has been used for operational weather forecasting on such machines for several years. This paper describes the algorithms applied to obtain a highly parallel implementation of the model, suitable for distributed memory machines. The performance results presented indicate that the parallelization effort has been successful, and the Norwegian Meteorological Institute will run the parallel version in production on a Cray T3E.
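The abstract does not detail the algorithms, but limited-area grid models are commonly distributed by domain decomposition with halo (ghost-cell) exchange; a toy single-process illustration of the halo update, with invented names and sizes, is:

```python
import numpy as np

def exchange_halos(subdomains):
    """One-cell-wide halo update between horizontally adjacent subdomains.
    Column 0 and column -1 of each subdomain are ghost columns; on a real
    distributed-memory machine each copy below is a point-to-point message."""
    for left, right in zip(subdomains, subdomains[1:]):
        left[:, -1] = right[:, 1]    # my east ghost <- neighbour's first interior column
        right[:, 0] = left[:, -2]    # neighbour's west ghost <- my last interior column

# three 8x6 subdomains of a larger grid, split along the x axis
parts = [np.random.rand(8, 6) for _ in range(3)]
exchange_halos(parts)
```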
11.
This paper analyzes network processor architectures that can satisfy the demands of both high performance and flexibility, requirements that traditional network devices, built purely on ASICs or on RISC-based general-purpose processors (GPPs), find hard to meet simultaneously. Based on the processing space of network processors, the paper maps it onto five logical modules, which are implemented by the various functional units of a network processor. It then analyzes the two parallel organizations used in network processors, SMP and pipeline, and further examines latency hiding and other acceleration techniques. Finally, it analyzes the challenges that evolving network applications pose to network processor architecture design and proposes solutions.
12.
The Earth Simulator (ES), developed under the Japanese government's initiative "Earth Simulator project", is a highly parallel vector supercomputer system. In this paper, an overview of the ES, its architectural features, hardware technology and performance evaluation results are described. In May 2002, the ES was acknowledged to be the most powerful computer in the world: 35.86 teraflop/s for the LINPACK HPC benchmark and 26.58 teraflop/s for an atmospheric general circulation code (AFES). Such remarkable performance may be attributed to three architectural features: vector processors, shared memory and a high-bandwidth non-blocking crossbar interconnection network. The ES consists of 640 processor nodes (PN) and an interconnection network (IN), housed in 320 PN cabinets and 65 IN cabinets. The ES is installed in a specially designed building, 65 m long, 50 m wide and 17 m high. To accomplish this advanced system, many kinds of hardware technology have been developed, such as high-density and high-frequency LSI, high-frequency signal transmission, high-density packaging, and a high-efficiency, low-noise cooling and power supply system, so as to reduce the overall volume of the ES and its total power consumption. For highly parallel processing, a special synchronization mechanism connecting all nodes, the Global Barrier Counter (GBC), has been introduced.
13.
In wireless sensor networks (WSNs), energy is valuable because it is scarce, so a network's lifetime is determined by its ability to use the available energy in an effective and frugal manner. In most early sensor network applications the main requirement was data collection, but transmitting all of the raw data out of the network may be prohibitively expensive (in terms of communication) or impossible at given data collection rates. In the last decade, the database paradigm has emerged as a feasible way to manage data in a WSN context. There are various sensor network query processors (SNQPs), implementing in-network declarative query processing, that provide data reduction, aggregation, logging, and auditing facilities. These SNQPs view the wireless sensor network as a distributed database over which a declarative query processor can be used to program a WSN application with much less effort. They allow users to pose declarative queries that provide an effective and efficient means to obtain data about the physical environment: users need not be concerned with how sensors acquire the data, or how nodes transform and/or transmit it. This paper surveys the approaches to query processing found in the current SNQP literature, the expressiveness of their query languages, the support provided by their compilers/optimizers to generate efficient query plans, and the kinds of queries supported. We introduce the challenges and opportunities of research in the field of in-network sensor network query processing, and illustrate the current status of research and future research scope in this field.
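As an illustration of the kind of declarative query such SNQPs accept, here is a TinyDB-style acquisitional query, shown as a Python string; the exact syntax and the attribute names vary per system and are illustrative here:

```python
# A TinyDB-style query: the SNQP, not the user, decides which nodes
# sample, how partial aggregates are combined in-network, and how
# often radios wake up to meet the requested sample period.
query = """
SELECT   nodeid, AVG(temperature)
FROM     sensors
WHERE    temperature > 30
GROUP BY nodeid
SAMPLE PERIOD 60s
"""
```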
14.
Driven by the requirements of real-time image processing for aerospace applications, this paper designs a master-slave parallel computer system based on MPP technology, and mainly describes how the master and slave machines communicate through a shared data memory and work under mutual exclusion. Following the principles of real-time performance, reliability and high precision, the system adopts an organization in which programs and data are stored separately.
15.
This paper describes the architecture of a cellular processor capable of directly and efficiently executing reduction languages as defined by Backus. The processor consists of two interconnected networks of microprocessors, one of which is a linear array of identical cells, and the other a tree-structured network of identical cells. Both kinds of cells have modest processing and storage requirements. The processor directly interprets a high-level language, and its efficient operation is not restricted to any special class of problems. Memory space permitting, the processor accommodates the unbounded parallelism allowed by reduction languages in any single user program; it is also able to execute many user programs simultaneously.
16.
Context: The way global software development (GSD) activities are managed impacts knowledge transactions between team members. The first is captured in governance decisions, and the latter in a transactive memory system (TMS), a shared cognitive system for encoding, storing and retrieving knowledge between members of a group. Objective: We seek to identify how different governance decisions (such as business strategy, team configuration and task allocation) affect the structure of transactive memory systems as well as the processes developed within those systems. Method: We use both a quantitative and a qualitative approach. We collect quantitative data through an online survey to identify transactive memory systems. We analyze transactive memory structures using social network analysis techniques, and we build a latent variable model to measure transactive memory processes. We further support and triangulate our results by means of interviews, which also help us examine the GSD governance modes of the participating projects. We analyze governance modes as a set of decisions based on three aspects: business strategy, team structure and composition, and task allocation. Results: Our results suggest that different governance decisions have a different impact on transactive memory systems. Offshore insourcing as a business strategy, for instance, creates tightly connected clusters, which in turn leads to better developed transactive memory processes. We also find that within the composition and structure of GSD teams there are boundary spanners (formal or informal) who have a better overview of the network's activities and become central members within their network. An interesting mapping between task allocation and the composition of the network core suggests that the way tasks are allocated among distributed teams is an indicator of where expertise resides. Conclusion: We present an analytical method to examine GSD governance decisions and their effect on transactive memory systems. Our method can be used by both practitioners and researchers as a "cause and effect" tool for improving the collaboration of global software teams.
17.
This paper studies the performance of peer-to-peer storage and backup systems (P2PSS). These systems are based on three pillars: data fragmentation and dissemination among the peers, redundancy mechanisms to cope with peer churn, and repair mechanisms to recover lost or temporarily unavailable data. Usually, redundancy is achieved either by replication or by erasure codes; a new class of network codes (regenerating codes) has been proposed recently, so our analysis covers all three redundancy schemes. We introduce two mechanisms for recovering lost data and evaluate their performance by modeling them through absorbing Markov chains. Specifically, we evaluate the quality of service provided to users in terms of durability and availability of stored data for each recovery mechanism and deduce the impact of its parameters on system performance. The first mechanism is centralized and based on the use of a single server that can recover multiple losses at once. The second mechanism is distributed: reconstruction of lost fragments is iterated sequentially on many peers until the required level of redundancy is attained. The key assumptions made in this work, in particular those on the recovery process and the peer on-time distribution, are in agreement with the analyses in [1] and [2] respectively. The models are thereby general enough to be applicable to many distributed environments, as shown through numerical computations. We find that, in stable environments such as local area or research institute networks where machines are usually highly available, the distributed-repair scheme in erasure-coded systems offers a reliable, scalable and cheap storage/backup solution. In highly dynamic environments the distributed-repair scheme is in general inefficient, in particular for maintaining high data availability, unless the data redundancy is high; using regenerating codes overcomes this limitation. P2PSS with the centralized-repair scheme are efficient in any environment but have the disadvantage of relying on a centralized authority. The analysis of the overhead costs (e.g. computation, bandwidth and complexity) of the different redundancy schemes, weighed against their advantages (e.g. simplicity), is left for future work.
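The absorbing-Markov-chain evaluation can be illustrated with a much-simplified continuous-time toy model of my own (not the authors' models from [1] and [2]): states track how many of the n fragments survive, data is lost once fewer than k remain, and the expected time to loss solves one linear system.

```python
import numpy as np

def mean_time_to_data_loss(n, k, lam, mu):
    """Expected time until fewer than k of n fragments survive.
    Transient states are i = k..n live fragments; dropping below k is the
    absorbing 'loss' state. Each fragment fails at rate lam; a repair
    restores one fragment at rate mu (a deliberately simplistic model)."""
    states = list(range(k, n + 1))
    m = len(states)
    Q = np.zeros((m, m))                   # generator over transient states
    for idx, i in enumerate(states):
        Q[idx, idx] -= i * lam             # outflow due to a failure
        if idx > 0:
            Q[idx, idx - 1] = i * lam      # i -> i-1 (failure from state k absorbs)
        if i < n:
            Q[idx, idx] -= mu
            Q[idx, idx + 1] = mu           # i -> i+1 on a repair
    t = np.linalg.solve(Q, -np.ones(m))    # first-passage times: Q t = -1
    return t[-1]                           # starting with all n fragments alive

print(mean_time_to_data_loss(n=9, k=6, lam=1 / 1000, mu=1 / 10))
```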
18.
We construct a parallel algorithm, suitable for distributed memory architectures, of an explicit shock-capturing finite volume method for solving the two-dimensional shallow water equations. The finite volume method is based on the very popular approximate Riemann solver of Roe and is extended to second-order spatial accuracy by an appropriate TVD technique. The parallel code is applied to distributed memory architectures using domain decomposition techniques, and we investigate its performance on a grid computer and on a distributed shared memory supercomputer. The effectiveness of the parallel algorithm is assessed on specific benchmark test cases; the measured execution times and speedup factors reveal the efficiency of the implementation.
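For reference, the conservation form of the 2D shallow water equations that such a finite volume scheme discretizes (homogeneous form, source terms omitted) is

```latex
\frac{\partial \mathbf{U}}{\partial t}
  + \frac{\partial \mathbf{F}(\mathbf{U})}{\partial x}
  + \frac{\partial \mathbf{G}(\mathbf{U})}{\partial y} = \mathbf{0},
\qquad
\mathbf{U} = \begin{pmatrix} h \\ hu \\ hv \end{pmatrix},\quad
\mathbf{F} = \begin{pmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{pmatrix},\quad
\mathbf{G} = \begin{pmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{pmatrix}
```

with h the water depth, (u, v) the depth-averaged velocities and g gravity; Roe's solver upwinds the numerical fluxes F and G at cell interfaces, and the TVD limiter restores second-order accuracy away from discontinuities.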
19.
System logs constitute a rich source of information for the detection and prediction of anomalies. However, they can include a huge volume of data, which is usually unstructured or semi-structured. We introduce DILAF, a framework for distributed analysis of large-scale system logs for anomaly detection. DILAF comprises several processes to facilitate log parsing, feature extraction, and machine learning activities. It has two distinguishing features with respect to existing tools. First, it does not require the availability of the source code of the analyzed system. Second, it is designed to perform all processes in a distributed manner to support scalable analysis in the context of large-scale distributed systems. We discuss the software architecture of DILAF and introduce an implementation of it. We conducted controlled experiments based on two datasets to evaluate the effectiveness of the framework; in particular, we evaluated its performance and scalability under various degrees of parallelism. Results showed that DILAF can maintain the same accuracy levels while achieving more than 30% performance improvement on average as the system scales, compared to baseline approaches that do not employ fully distributed processing.
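The abstract names log parsing and feature extraction as separate stages; a minimal single-machine sketch of the two (the masking regexes and window size are my own inventions, and real parsers are considerably smarter) is:

```python
import re
from collections import Counter

def to_template(line: str) -> str:
    """Crude template extraction: mask IPs, hex values and numbers so that
    log lines with the same 'shape' collapse to one event type."""
    line = re.sub(r"\b\d+(\.\d+){3}\b", "<IP>", line)
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line).strip()

def event_count_features(lines, window=100):
    """Turn a raw log stream into per-window event-count vectors, the
    usual input to a downstream anomaly-detection model."""
    windows = [lines[i:i + window] for i in range(0, len(lines), window)]
    return [Counter(to_template(l) for l in w) for w in windows]
```

Both stages operate line-by-line and window-by-window, which is what makes them natural to shard across machines in a fully distributed pipeline.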