
1.

Parallel Image Correlation: Case Study to Examine Trade-Offs in Algorithm-to-Machine Mappings

Armstrong James B., Maheswaran Muthucumaru, Theys Mitchell D., Siegel Howard Jay, Nichols Mark A., Casey Kenneth H. 《The Journal of Supercomputing》1998,12(1-2):7-35

Performance of a parallel algorithm on a parallel machine depends not only on the time complexity of the algorithm, but also on how the underlying machine supports the fundamental operations used by the algorithm. This study analyzes various mappings of image correlation algorithms in SIMD, MIMD, and mixed-mode environments. Experiments were conducted on the Intel Paragon, MasPar MP-1, nCUBE 2, and PASM prototype. The machine features considered in this study include: modes of parallelism, communication/computation ratio, network topology and implementation, SIMD CU/PE overlap, and communication/computation overlap. Performance of an implementation can be enhanced by using algorithmic techniques that match the machine features. Some algorithmic techniques discussed here are additional communication versus redundant computation, data block transfers, and communication/computation overlap. The results presented are applicable to a large class of image processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping the subtasks of an application task, or a set of independent application tasks, onto a heterogeneous suite of parallel machines.
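
The abstract does not give the correlation kernel or the mappings themselves. As a rough illustration of one trade-off it names (additional communication versus redundant computation), here is a minimal sketch assuming a valid-region 2-D cross-correlation and a row-block mapping onto hypothetical processing elements. Each PE reads th - 1 "halo" rows owned by its neighbour, which a real machine would have to fetch by communication. The function names `correlate` and `partitioned_correlate` are illustrative, not taken from the paper.

```python
def correlate(image, template):
    """Valid-region 2-D cross-correlation:
    out[i][j] = sum over (u, v) of image[i + u][j + v] * template[u][v]."""
    th, tw = len(template), len(template[0])
    rows = len(image) - th + 1
    cols = len(image[0]) - tw + 1
    return [
        [
            sum(image[i + u][j + v] * template[u][v]
                for u in range(th) for v in range(tw))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def partitioned_correlate(image, template, num_pes):
    """Simulate a row-block mapping of correlate() onto num_pes PEs.

    PE k owns a contiguous block of output rows. To compute them it needs
    its own image rows plus th - 1 halo rows belonging to the next PE; on a
    distributed-memory machine those rows would arrive by message passing.
    """
    th = len(template)
    out_rows = len(image) - th + 1
    block = -(-out_rows // num_pes)  # ceiling division
    result = []
    for pe in range(num_pes):
        start = pe * block
        stop = min(start + block, out_rows)
        if start >= stop:
            continue  # more PEs than output rows: this PE is idle
        local = image[start:stop + th - 1]  # owned rows plus halo rows
        result.extend(correlate(local, template))
    return result
```

Because every PE computes only its own output rows, the partitioned result matches the serial one for any PE count. Enlarging the halo so that neighbours recompute overlapping rows would be the redundant-computation alternative.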

2.

Total citations: 1; self-citations: 0; citations by others: 1


SUN Ninghui 《Journal of Computer Science and Technology》1999,14(3):206-223

The Scalable I/O (SIO) Initiative's Low-Level Application Programming Interface (SIO LLAPI) provides file system implementers with a simple low-level interface for supporting high-level parallel I/O interfaces efficiently and effectively. This paper describes a reference implementation of the SIO LLAPI on the Intel Paragon multicomputer, together with its evaluation. The implementation provides a file system structure and striping algorithm compatible with the Intel Paragon's Parallel File System (PFS), and runs either inside the kernel or as a user-level library. Scatter-gather addressed read/write, asynchronous I/O, client caching and prefetching, a file access hint mechanism, collective I/O, and a highly efficient file copy have been implemented. Preliminary experience shows that the SIO LLAPI offers opportunities for significant performance improvement and is easy to implement. Several high-level file system interfaces and applications, such as PFS, ADIO, and a Hartree-Fock application, have also been implemented on top of SIO. The PFS built on SIO performs at least as well as Intel's native PFS, and in many cases, such as small sequential file accesses, very large I/O requests, and collective I/O, its performance is stable and much better. The SIO features make it easier and quicker to support high-level interfaces efficiently, and caching, prefetching, and hints help improve performance under different access patterns. The scalability and performance of SIO are limited by network latency, scalable network bandwidth, memory copy bandwidth, memory size, and the pattern of I/O requests. The tradeoff between generality and efficiency should be considered in the implementation.
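
The abstract mentions a striping algorithm compatible with PFS but does not describe it. The sketch below shows a generic round-robin striping layout, which is an assumption here, not the documented PFS algorithm. It uses a fixed stripe unit and a fixed number of I/O nodes. The names `locate`, `split_request`, `stripe_unit`, and `num_io_nodes` are hypothetical. `split_request` breaks one contiguous request into per-node pieces, roughly the scatter-gather work a striped file system has to do.

```python
def locate(offset, stripe_unit, num_io_nodes):
    """Map a logical file offset to (I/O node, offset within that node's data).

    Assumes round-robin striping: stripe unit k lives on node k % num_io_nodes.
    """
    stripe_index = offset // stripe_unit
    node = stripe_index % num_io_nodes
    local_offset = (stripe_index // num_io_nodes) * stripe_unit + offset % stripe_unit
    return node, local_offset


def split_request(offset, length, stripe_unit, num_io_nodes):
    """Break a contiguous request into per-node (node, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        node, local = locate(offset, stripe_unit, num_io_nodes)
        # Never cross a stripe-unit boundary within one piece.
        nbytes = min(stripe_unit - offset % stripe_unit, end - offset)
        pieces.append((node, local, nbytes))
        offset += nbytes
    return pieces
```

For example, with a 64-byte stripe unit on 4 nodes, an 80-byte read at offset 60 touches three nodes: 4 bytes on node 0, 64 bytes on node 1, and 12 bytes on node 2.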

3.

Parallel Distributive Join Algorithm on the Intel Paragon

Chung Soon M., Chatterjee Arindam 《The Journal of Supercomputing》1999,13(2):151-169

In this paper, we analyze the performance of the parallel Distributive Join algorithm that we proposed in Chung and Yang 1995. We implemented the algorithm on an Intel Paragon machine and analyzed the effect of the number of processors and the join selectivity on its performance. We also compared the performance of the Distributive Join (DJ) algorithm with that of the Hybrid-Hash (HH) join algorithm. Our results show that DJ performs comparably to HH across all processor counts and join selectivities tested. A major advantage of the parallel DJ algorithm over the HH join algorithm is that it can easily support non-equijoin operations. The results can also be used to estimate the performance of file-I/O-intensive applications to be implemented on the Intel Paragon.

4.

Capacity Expansion of Parallel File Systems and Load-Balancing Strategies for I/O Nodes

Qu Fengshan, Huang Liusheng, Wu Xuemei, Li Jiajun 《Computer Engineering》2000,26(10):121-123

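In commercial applications of massively parallel processing (MPP) systems, parallel file system capacity and load balancing across I/O nodes have long been key issues in exploiting the parallelism of the whole system and thereby improving system performance. Starting from an analysis of the architecture of the Paragon massively parallel processing system and of its parallel file system, this paper presents a capacity-expansion strategy for the parallel file system and, using a theoretical model of I/O-node load balancing in MPP systems, derives a load-balancing strategy for practical engineering applications.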
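
The abstract does not state the paper's load-balancing model. Purely as a generic illustration of balancing work across I/O nodes, the sketch below swaps in the well-known longest-processing-time-first (LPT) greedy heuristic. This is not the authors' strategy. Each item, such as a file or stripe set with an estimated I/O load, goes to the currently least-loaded node. The function name `assign_lpt` and the load figures are hypothetical.

```python
import heapq


def assign_lpt(loads, num_io_nodes):
    """Greedy LPT placement: the largest remaining load goes to the least-loaded node.

    loads maps an item name to its estimated I/O load (any consistent unit).
    Returns (placement, per_node_totals).
    """
    heap = [(0, node) for node in range(num_io_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    placement = {}
    for item, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        total, node = heapq.heappop(heap)  # least-loaded node so far
        placement[item] = node
        heapq.heappush(heap, (total + load, node))
    per_node = [0] * num_io_nodes
    for item, node in placement.items():
        per_node[node] += loads[item]
    return placement, per_node
```

Rerunning the placement with a larger `num_io_nodes` gives a simple picture of capacity expansion: the same loads spread over more nodes, at the cost of moving some existing data.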

5.

The Ecological Facades of Patrick Blanc

Matthew Gandy 《Architectural Design》2010,80(3):28-33

The urban geographer Matthew Gandy explores the work of French botanist Patrick Blanc, who applies his scientific knowledge and preoccupations to urban design. Since inventing the mur végétal (green wall), a botanical and structural system for greening buildings, in 1988, Blanc has gone well beyond creating merely living walls. Through his landscape schemes Blanc has recognised the city's rich potential for verdant metamorphosis, transforming fern- and moss-covered streets and buildings into unlikely ravines or rainforests. Copyright © 2010 John Wiley & Sons, Ltd.

6.

MCGS: A Modified Conjugate Gradient Squared Algorithm for Nonsymmetric Linear Systems

Total citations: 1; self-citations: 0; citations by others: 1

Maheswaran Muthucumaru, Webb Kevin J., Siegel Howard Jay 《The Journal of Supercomputing》1999,14(3):257-280

The conjugate gradient squared (CGS) algorithm is a Krylov subspace algorithm that can be used to obtain fast solutions of linear systems Ax = b whose coefficient matrix A is complex, nonsymmetric, very large, and very sparse. Using electromagnetic scattering problems as examples, this paper presents a study of the performance and scalability of this algorithm on two MIMD machines. It also proposes a modified CGS (MCGS) algorithm, in which the synchronization overhead is effectively reduced by a factor of two. This is achieved by changing the computation sequence of the CGS algorithm. Both experimental and theoretical analyses are performed to investigate the impact of this modification on the overall execution time. The analyses show that CGS is faster than MCGS for smaller numbers of processors, and MCGS outperforms CGS as the number of processors increases. Based on this observation, a "set of algorithms" approach is proposed, in which either CGS or MCGS is selected depending on the dimension of the A matrix (N) and the number of processors (P). The set approach provides an algorithm that is more scalable than either the CGS or MCGS algorithm alone. Experiments performed on a 128-processor mesh Intel Paragon and on a 16-processor IBM SP2 with a multistage network indicate that MCGS is approximately 20% faster than CGS.
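
For reference, here is a minimal pure-Python sketch of the standard, unpreconditioned CGS iteration (Sonneveld's recurrences), with support for complex systems. It is the baseline CGS only. The abstract does not say how MCGS reorders the computation, so that modification is not shown. The two inner products marked "global reduction" are where a distributed implementation has to synchronize all processors in each iteration. The helper names (`matvec`, `dot`, `axpy`, `cgs`) and the dense list-of-lists matrix are assumptions for illustration, and no breakdown handling is included.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product that conjugates its first argument (works for complex vectors)."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y elementwise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def cgs(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned conjugate gradient squared, starting from x0 = 0.

    Returns (x, iterations). No safeguard against breakdown (rho == 0).
    """
    n = len(b)
    x = [0j] * n
    r = list(b)              # r0 = b - A x0 with x0 = 0
    r_shadow = list(r)       # fixed shadow residual
    u = list(r)
    p = list(r)
    rho = dot(r_shadow, r)
    b_norm = abs(dot(b, b)) ** 0.5 or 1.0
    for it in range(max_iter):
        if abs(dot(r, r)) ** 0.5 <= tol * b_norm:
            return x, it
        v = matvec(A, p)
        alpha = rho / dot(r_shadow, v)               # global reduction
        q = axpy(-alpha, v, u)                       # q = u - alpha v
        u_plus_q = [ui + qi for ui, qi in zip(u, q)]
        x = axpy(alpha, u_plus_q, x)                 # x += alpha (u + q)
        r = axpy(-alpha, matvec(A, u_plus_q), r)     # r -= alpha A (u + q)
        rho_new = dot(r_shadow, r)                   # global reduction
        beta = rho_new / rho
        u = axpy(beta, q, r)                         # u = r + beta q
        p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        rho = rho_new
    return x, max_iter
```

Each iteration needs two matrix-vector products and two inner products. On a message-passing machine, each inner product is an all-to-all reduction, which is why synchronization cost becomes significant as P grows.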