1.
A. Namazifard I. D. Parsons 《International journal for numerical methods in engineering》2004,61(8):1173-1208
We describe the parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems. We focus on a distributed memory programming model and use the MPI library to perform the required interprocessor communications. We present an algebraic framework for our parallel computations, and describe an object‐based programming methodology using Fortran90. The performance of the implementation is measured by solving both fixed‐ and scaled‐size problems on three different parallel computers (an SGI Origin2000, an IBM SP2 and a Cray T3E). The code performs well in terms of speedup, parallel efficiency and scalability. However, the floating point performance is considerably below the peak values attributed to these machines. Lazy processors are documented on the Origin that produce reduced performance statistics. The solution of two problems on an SGI Origin2000, an IBM PowerPC SMP and a Linux cluster demonstrates that the algorithm performs well when applied to the unstructured meshes required for practical engineering analysis. Copyright © 2004 John Wiley & Sons, Ltd.
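The speedup and parallel efficiency measures named in the abstract above follow standard definitions. The sketch below shows how they might be computed for fixed-size and scaled-size runs; the function names and all timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Fixed-size speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Fixed-size efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


def scaled_efficiency(t_one_proc, t_n_procs):
    """Scaled-size efficiency: the problem grows with the processor count,
    so the ideal run time stays constant and the ideal value is 1.0."""
    return t_one_proc / t_n_procs


# Hypothetical timings in seconds (illustrative only).
s = speedup(120.0, 20.0)
e = parallel_efficiency(120.0, 20.0, 8)
w = scaled_efficiency(10.0, 12.5)
```

A fixed-size study keeps the mesh constant while adding processors, whereas a scaled-size study keeps the work per processor roughly constant; the two efficiencies answer different questions and are usually reported separately.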
2.
Traditionally, to program a distributed memory multiprocessor, a programmer is responsible for partitioning an application program into modules or tasks, scheduling tasks on processors, inserting communication primitives, and generating parallel codes for each processor manually. As both the number of processors and the complexity of problems to be solved increase, programming distributed memory multiprocessors becomes difficult and error‐prone. In a distributed memory multiprocessor, program partitioning and scheduling play an important role in the performance of a parallel program. However, finding the program partitioning and scheduling that give the best performance of a parallel program on a distributed memory multiprocessor is not an easy task. In this paper, we present a parallel programming tool, PPT, to help programmers find the best program partitioning and scheduling and automatically generate the parallel code for the single program multiple data (SPMD) model on a distributed memory multiprocessor. An example of designing a parallel FFT program by using PPT on an NCUBE‐2 is also presented.
3.
4.
Arora V. Jone W.B. Huang D.C. Das S.R. 《IEEE transactions on instrumentation and measurement》2004,53(4):915-932
In this paper, we propose a built-in self-diagnostic march-based algorithm that identifies faulty memory cells based on a recently introduced nontraditional fault model. It is developed based on the DiagRSMarch algorithm, which is a diagnostic algorithm to identify traditional faults for embedded memory arrays. A minimal set of additional operations is added to DiagRSMarch for identifying the nontraditional faults without affecting the diagnostic coverage of the traditional faults. The embedded memory arrays are accessed using a bidirectional serial interfacing architecture which minimizes the routing overhead introduced by the diagnosis hardware. Using the concepts of the bidirectional interfacing technique, parallel testing, and redundant-tolerant operations, the diagnostic process can be accomplished efficiently at-speed with minimal hardware overhead.
5.
A numerical solution of the three-dimensional frictionless contact problem and its data-parallel implementation on the Connection Machine system CM-2 are presented. The numerical solution is obtained by means of a boundary element discretization of a variational inequality and the related extremum principle; the associated Green's function is approximated by means of a standard direct boundary element procedure. The numerical method is applicable to any geometry of the contacting bodies under arbitrary loading. The examples presented illustrate the ability of the method to capture the influence of the shape and size of a body on the contact area and the pressure acting in it. It is demonstrated that the symmetry properties of the Green's operator hold only asymptotically for the discretized problem.
6.
B. Li M. Stalzer M. Ortiz 《International journal for numerical methods in engineering》2014,100(1):40-61
Presented is a massively parallel implementation of the Optimal Transportation Meshfree (pOTM) method (Li et al., 2010) for explicit solid dynamics. Its implementation is based on a two‐level scheme using Message Passing Interface between compute servers and threaded parallelism on the multi‐core processors within each server. Both layers dynamically subdivide the problem to provide excellent parallel scalability. pOTM is used on three problems and compared to experiments to demonstrate accuracy and performance. For both a Taylor‐anvil and a hypervelocity impact problem, the pOTM implementation scales nearly perfectly to about 8000 cores. Copyright © 2014 John Wiley & Sons, Ltd.
7.
Valerie E. Taylor Bahram Nour-Omid 《International journal for numerical methods in engineering》1994,37(22):3809-3823
In this paper we investigate the additional storage overhead needed for a parallel implementation of finite element applications. In particular, we compare the storage requirements for the factorization of the sparse matrices that would occur on a parallel processor vs. a uniprocessor. This variation in storage results from the factorization fill-in. We address the question of whether the storage overhead is so large for parallel implementations that it imposes severe limitations on the problem size in contrast to the problems executed sequentially on a uniprocessor. The storage requirements for the parallel implementation are based upon a new ordering scheme, the combination mesh-based scheme. This scheme uses a domain decomposition method which attempts to balance the processors' loads and decrease the interprocessor communication. The storage requirements for the sequential implementation are based upon the minimum degree algorithm. The difference between the two storage requirements corresponds to the storage overhead attributed to the parallel scheme. Experiments were conducted on regular and irregular, 2-D and 3-D problems. The meshes were decomposed into 2–256 subdomains which can be executed on 2–256 processors, respectively. The total storage requirements or fill-in for most of the 2-D problems were less than a factor of two increase over the sequential execution. In contrast, large 3-D problems had zero increase in storage or fill-in over the sequential execution; the fill-in was less for the parallel execution than the sequential execution. Thus, we conclude that the storage overhead attributed to the use of parallel processors will not impose severe constraints on the problem size. Further, for large 3-D applications, the combination mesh-based algorithm does better than minimum degree for reducing the fill-in.
8.
Optoelectronic parallel watershed implementation for segmentation of magnetic resonance brain images
An optoelectronic implementation for the morphological watershed transform is proposed. Fiber-optic programmable logic arrays are used in the implementation because of their high fan factors at high clock speeds. Image segmentation is one of the main applications of the watershed transform. Based on the optoelectronic implementation, an algorithm for the segmentation of axial magnetic resonance (MR) head images to extract information on brain matter is presented. Simulation results for the different steps of the segmentation process are presented.
9.
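To address the problem that ordinary low-density parity-check (LDPC) codes limit the parallelism achievable by joint row-column (JRC) decoding algorithms, an LDPC code construction method for parallel JRC decoding is proposed, based on the block progressive edge-growth (BPEG) algorithm. The base matrix of the quasi-cyclic LDPC (QC-LDPC) codes constructed by this method consists of row groups of r rows each (r being an integer greater than 1), which allows the r rows within a row group to undergo JRC operations in parallel. Simulation results show that LDPC codes built with this construction method achieve bit-error performance comparable to that of BPEG codes. Hardware implementation shows that a parallel decoder using the constructed codes can reach more than three times the throughput of a typical conventional quasi-cyclic decoder, providing an example of decoder-oriented LDPC code construction.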
10.
Reconfigurable manufacturing systems (RMSs) have been recognized as a new manufacturing paradigm. In light of their enhanced flexibility and responsiveness, RMSs are considered to be mostly applicable to the very dynamic and unpredictable marketplaces of the near future. However, systematic approaches to the design and ramp-up of an RMS have not been well addressed. This paper presents a virtual production line-based (VPL) approach to the design and operation of a reconfigurable manufacturing system. Shop floor attributed finite-capacity automaton and VPL attributed finite-capacity automaton are proposed for modelling the control of an RMS, which leads to ease of control software development. Algorithms for balancing VPLs to maximize the productivity of an RMS are discussed. The results of simulation runs of the proposed methodology and algorithms applied to simplified back-end semiconductor manufacturing are provided.
11.
《高技术通讯》2015,(7)
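To improve the memory access performance of virtual machine interpreters on non-uniform memory access (NUMA) architectures, this work studies memory access optimization techniques for interpreters under NUMA, proposes an interpreter memory access optimization scheme for NUMA architectures, and designs and implements static and dynamic instruction dispatch optimization methods for the interpreter. Under this scheme, the virtual machine first obtains the NUMA node information at startup and automatically generates, in each NUMA node, all the data structures the interpreter requires; at run time, the interpreter uses static or dynamic instruction dispatch to localize the memory accesses of its execution threads to their NUMA nodes. Experimental results show that the method significantly improves interpreter performance on NUMA systems: overall performance on the DaCapo benchmark suite improved by 8%, with a maximum improvement of 23%. The algorithm is also inexpensive to implement and is applicable to the vast majority of NUMA server systems.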
12.
The effect of double erasure on Monolayer Bacteriorhodopsin (BR) protein films after photonic excitation to the ultra stable Q-state is studied. It was found that the pronounced emission of 755 nm light occurs only as the protein is made to transition from the Q-state to the ground state via irradiation with blue light. Requirements for the implementation of a next generation Protein-Based Memory (PBM) device utilizing monolayer BR films are considered. The finite element method was used to simulate the optical intensity distribution of nano-aperture waveguides for Red (650 nm), Green (510 nm) and Blue (475 nm) light to analyze the utility of nanoaperture transducers for use in a Protein Based Memory device. The minimum output power required to induce a photochromic transition in BR is calculated to be between 20 nW and 27 nW on a 30 nm spot depending upon the operating wavelength.
13.
14.
Kim H Lee IM Lee B 《Journal of the Optical Society of America. A, Optics, image science, and vision》2007,24(8):2313-2327
An extended and refined scattering-matrix method is proposed for the efficient full parallel implementation of rigorous coupled-wave analysis of multilayer structures. The total electromagnetic field distribution in the rigorous coupled-wave analysis is represented by the linear combination of the eigenmodes with their own coupling coefficients. In the proposed scheme, a refined recursion relation of the coupling coefficients of the eigenmodes is defined for complete parallel computation of the electromagnetic field distributions within multilayer structures.
15.
This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos method uses a restart scheme in order to improve the convergence of the desired portion of the spectrum, addressing issues such as memory requirements and computational costs related to the generation and storage of the Krylov basis. The presented implementation is suitable for distributed memory architectures, especially PC clusters. In the parallel solution, a subdomain-by-subdomain approach was implemented and overlapping and non-overlapping mesh partitions were tested. Compressed data structures in the formats CSRC and CSRC/CSR were used to store the coefficient matrices. The parallelization of the numerical linear algebra operations present in both Krylov and implicitly restarted methods is discussed. Numerical examples are shown in order to point out the efficiency and applicability of the proposed method.
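The abstract above names compressed row-based storage for the coefficient matrices. As a rough illustration of the idea only, the sketch below uses the plain CSR (compressed sparse row) layout, not the CSRC variant the authors describe; the function names and the small test matrix are hypothetical.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in the values array
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x, the core kernel of Krylov methods."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Hypothetical 3x3 matrix (illustrative only).
A = [[4, 0, 1],
     [0, 3, 0],
     [1, 0, 2]]
values, col_idx, row_ptr = csr_from_dense(A)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

Only the nonzero entries are stored and touched, which is why the matrix-vector product dominating each Arnoldi/Lanczos iteration benefits from such layouts.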
16.
This paper addresses the coding of subimage-transformed elemental images to solve the problems of data transmission and storage in three-dimensional (3D) integral imaging. First, we use the subimage transform for preprocessing of the elemental image array (EIA). Because of the similarity of correlation distributions between the subimage array (SIA) and multiview video, we present a hierarchical prediction structure for SIA coding based on the hierarchical B picture (HBP) structure for multiview video coding. Moreover, we design a multithreaded parallel implementation for the proposed structure according to inter-row prediction dependencies. Experiments are performed on both EIAs and SIAs. The results show that, employing the same coding strategy, the proposed parallel-implemented HBP scheme achieves not only higher image quality and better 3D effect but also lower coding delay at low bit rates compared with the previously reported Hilbert-curve-based scheme.
17.
We propose a multilayer associative memory with a winner-take-all operation on the inner product between an input and stored exemplars. The winner-take-all operation is performed by a unit-step operation with an adaptive-threshold strategy. We show that the multilayer-associative-memory unit-step operation with an adaptive-threshold strategy has a high noise immunity and a large storage capacity, and it is also capable of extending to a gray-level associative memory with a phase-representation technique. A hybrid optical implementation with a proof-of-concept experiment is also provided.
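The recall scheme outlined in the abstract above (inner products with stored exemplars, then a unit-step winner-take-all) can be sketched as follows. The abstract does not specify the adaptive-threshold rule, so the rule used here (raise the threshold to the next score until a single unit stays active) is an assumption, as are the function name and the bipolar test vectors.

```python
def recall(exemplars, x):
    """Return the stored exemplar whose inner product with x is largest."""
    # First layer: inner product of the input with every stored exemplar.
    scores = [sum(a * b for a, b in zip(e, x)) for e in exemplars]

    # Assumed adaptive-threshold strategy: start at the lowest score and
    # raise the threshold until the unit step leaves one active unit.
    threshold = min(scores)
    active = [1 if s >= threshold else 0 for s in scores]
    while sum(active) > 1:
        higher = [s for s in scores if s > threshold]
        if not higher:
            break  # exact tie among the winners; keep the first one
        threshold = min(higher)
        active = [1 if s >= threshold else 0 for s in scores]

    # Second layer: the single active unit reads out its exemplar.
    return exemplars[active.index(1)]


# Hypothetical bipolar exemplars and a noisy probe (illustrative only).
stored = [[1, -1, 1, -1],
          [1, 1, -1, -1],
          [-1, -1, 1, 1]]
noisy = [1, 1, -1, 1]  # second exemplar with its last element flipped
result = recall(stored, noisy)
```

Because only the best-matching unit survives the threshold, the output is always a clean stored pattern and never a blend of several exemplars.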
18.
Wenchuang Hu Sarveswaran K. Lieberman M. Bernstein G.H. 《Nanotechnology, IEEE Transactions on》2005,4(3):312-316
Electron beam lithography (EBL) patterning of poly(methylmethacrylate) (PMMA) is a versatile tool for defining molecular structures on the sub-10-nm scale. We demonstrate lithographic resolution to about 5 nm using a cold-development technique. Liftoff of sub-10-nm Au nanoparticles and metal lines proves that cold development completely clears the PMMA residue on the exposed areas. Molecular liftoff is performed to pattern DNA rafts with high fidelity at linewidths of about 100 nm. High-resolution EBL and molecular liftoff can be applied to pattern Creutz-Taube molecules on the scale of a few nanometers for quantum-dot cellular automata.
19.