Similar literature
1.
We describe the parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems. We focus on a distributed memory programming model and use the MPI library to perform the required interprocessor communications. We present an algebraic framework for our parallel computations, and describe an object-based programming methodology using Fortran90. The performance of the implementation is measured by solving both fixed- and scaled-size problems on three different parallel computers (an SGI Origin2000, an IBM SP2 and a Cray T3E). The code performs well in terms of speedup, parallel efficiency and scalability. However, the floating point performance is considerably below the peak values attributed to these machines. Lazy processors that reduce the measured performance statistics are documented on the Origin. The solution of two problems on an SGI Origin2000, an IBM PowerPC SMP and a Linux cluster demonstrates that the algorithm performs well when applied to the unstructured meshes required for practical engineering analysis. Copyright © 2004 John Wiley & Sons, Ltd.

2.
Traditionally, to program a distributed memory multiprocessor, a programmer is responsible for partitioning an application program into modules or tasks, scheduling tasks on processors, inserting communication primitives, and generating parallel code for each processor manually. As both the number of processors and the complexity of the problems to be solved increase, programming distributed memory multiprocessors becomes difficult and error-prone. In a distributed memory multiprocessor, program partitioning and scheduling play an important role in the performance of a parallel program. However, finding the partitioning and schedule that give the best performance on a distributed memory multiprocessor is not an easy task. In this paper, we present a parallel programming tool, PPT, that helps programmers find the best program partitioning and scheduling and automatically generates the parallel code for the single program multiple data (SPMD) model on a distributed memory multiprocessor. An example of designing a parallel FFT program using PPT on an NCUBE-2 is also presented.

3.
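The general parallel cyclic redundancy check (CRC) encoding structure is studied, the main factors limiting encoding speed are analyzed, and a general method for parallel CRC encoding is derived from polynomial theory. On this basis, a similarity transformation is applied to the feedback matrix of the encoding structure using the rational canonical form from linear algebra, which yields a high-speed pipelined parallel CRC encoding structure. CRC encoders with several degrees of parallelism are designed and implemented. The design results show that, compared with other structures, the high-speed pipelined parallel CRC encoder has the best encoding speed and the best timing characteristics, and can meet the requirements of high-speed data integrity checking.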
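The idea behind parallel CRC encoding can be illustrated with a minimal Python sketch. It relies only on the fact that the CRC register update is linear over GF(2), so the effect of w input bits can be collapsed into one step built from precomputed matrix columns. The CRC-8 polynomial 0x07 and all function names are assumptions chosen for illustration; the sketch does not implement the similarity transformation or the pipelined structure described in the abstract.

```python
POLY = 0x07   # CRC-8 polynomial x^8 + x^2 + x + 1 (assumed for illustration)
WIDTH = 8     # register width in bits
MASK = (1 << WIDTH) - 1


def serial_step(state, bit):
    """Advance the CRC register by one input bit (MSB-first)."""
    feedback = ((state >> (WIDTH - 1)) & 1) ^ bit
    state = (state << 1) & MASK
    if feedback:
        state ^= POLY
    return state


def serial_crc(bits, state=0):
    """Bit-serial reference: one register update per input bit."""
    for bit in bits:
        state = serial_step(state, bit)
    return state


def build_parallel_columns(w):
    """Columns of the w-step update matrices over GF(2).

    state_cols[i] is the next state when only state bit i is set.
    data_cols[j] is the next state when only data bit j of the block is set.
    """
    state_cols = [serial_crc([0] * w, 1 << i) for i in range(WIDTH)]
    data_cols = []
    for j in range(w):
        block = [0] * w
        block[j] = 1
        data_cols.append(serial_crc(block, 0))
    return state_cols, data_cols


def parallel_step(state, block, state_cols, data_cols):
    """Consume w bits at once by XOR-ing the selected columns."""
    nxt = 0
    for i in range(WIDTH):
        if (state >> i) & 1:
            nxt ^= state_cols[i]
    for j, bit in enumerate(block):
        if bit:
            nxt ^= data_cols[j]
    return nxt


def parallel_crc(bits, w):
    """CRC of a bit list whose length is a multiple of w, w bits per step."""
    if len(bits) % w != 0:
        raise ValueError("bit count must be a multiple of w")
    state_cols, data_cols = build_parallel_columns(w)
    state = 0
    for k in range(0, len(bits), w):
        state = parallel_step(state, bits[k:k + w], state_cols, data_cols)
    return state
```

In hardware, each call to `parallel_step` corresponds to one clock cycle of XOR logic, so a larger w trades wider combinational logic for fewer cycles. The depth of that logic is the kind of speed limit the abstract refers to.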

4.
In this paper, we propose a built-in self-diagnostic march-based algorithm that identifies faulty memory cells based on a recently introduced nontraditional fault model. It is developed based on the DiagRSMarch algorithm, which is a diagnostic algorithm to identify traditional faults for embedded memory arrays. A minimal set of additional operations is added to DiagRSMarch for identifying the nontraditional faults without affecting the diagnostic coverage of the traditional faults. The embedded memory arrays are accessed using a bidirectional serial interfacing architecture which minimizes the routing overhead introduced by the diagnosis hardware. Using the concepts of the bidirectional interfacing technique, parallel testing, and redundant-tolerant operations, the diagnostic process can be accomplished efficiently at-speed with minimal hardware overhead.

5.
A numerical solution of the three-dimensional frictionless contact problem and its data-parallel implementation on the Connection Machine system CM-2 are presented. The numerical solution is obtained by means of a boundary element discretization of a variational inequality and the related extremum principle; the associated Green's function is approximated by means of a standard direct boundary element procedure. The numerical method is applicable to any geometry of the contacting bodies under arbitrary loading. The examples presented illustrate the ability of the method to capture the influence of the shape and size of a body on the contact area and the pressure acting in it. It is demonstrated that the symmetry properties of the Green's operator hold only asymptotically for the discretized problem.

6.
Presented is a massively parallel implementation of the Optimal Transportation Meshfree (pOTM) method (Li et al., 2010) for explicit solid dynamics. Its implementation is based on a two-level scheme using the Message Passing Interface between compute servers and threaded parallelism on the multi-core processors within each server. Both layers dynamically subdivide the problem to provide excellent parallel scalability. pOTM is used on three problems and compared to experiments to demonstrate accuracy and performance. For both a Taylor-anvil and a hypervelocity impact problem, the pOTM implementation scales nearly perfectly to about 8000 cores. Copyright © 2014 John Wiley & Sons, Ltd.

7.
In this paper we investigate the additional storage overhead needed for a parallel implementation of finite element applications. In particular, we compare the storage requirements for the factorization of the sparse matrices that would occur on a parallel processor vs. a uniprocessor. This variation in storage results from the factorization fill-in. We address the question of whether the storage overhead is so large for parallel implementations that it imposes severe limitations on the problem size in contrast to the problems executed sequentially on a uniprocessor. The storage requirements for the parallel implementation are based upon a new ordering scheme, the combination mesh-based scheme. This scheme uses a domain decomposition method which attempts to balance the processors' loads and decrease the interprocessor communication. The storage requirements for the sequential implementation are based upon the minimum degree algorithm. The difference between the two storage requirements corresponds to the storage overhead attributed to the parallel scheme. Experiments were conducted on regular and irregular, 2-D and 3-D problems. The meshes were decomposed into 2–256 subdomains which can be executed on 2–256 processors, respectively. The total storage requirements or fill-in for most of the 2-D problems were less than a factor of two increase over the sequential execution. In contrast, large 3-D problems had zero increase in storage or fill-in over the sequential execution; the fill-in was less for the parallel execution than the sequential execution. Thus, we conclude that the storage overhead attributed to the use of parallel processors will not impose severe constraints on the problem size. Further, for large 3-D applications, the combination mesh-based algorithm does better than minimum degree for reducing the fill-in.

8.
Michael N, Arrathoon R. Applied Optics, 1997, 36(35): 9269-9286
An optoelectronic implementation for the morphological watershed transform is proposed. Fiber-optic programmable logic arrays are used in the implementation because of their high fan factors at high clock speeds. Image segmentation is one of the main applications of the watershed transform. Based on the optoelectronic implementation, an algorithm for the segmentation of axial magnetic resonance (MR) head images to extract information on brain matter is presented. Simulation results for the different steps of the segmentation process are presented.

9.
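To address the problem that ordinary low-density parity-check (LDPC) codes limit the parallelism of the joint row-column (JRC) decoding algorithm, a method for constructing LDPC codes for parallel JRC decoding is proposed, based on the block progressive edge-growth (BPEG) algorithm. The base matrix of the quasi-cyclic LDPC (QC-LDPC) code constructed by this method consists of row groups of r rows each (r is an integer greater than 1), and the r rows within a row group can undergo JRC operations in parallel. Simulation results show that LDPC codes constructed in this way have bit-error performance comparable to that of BPEG codes. A hardware implementation shows that a parallel decoder for the constructed code reaches more than 3 times the rate of a typical conventional quasi-cyclic decoder, providing an example of decoder-oriented LDPC code construction.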

10.
Reconfigurable manufacturing systems (RMSs) have been recognized as a new manufacturing paradigm. In light of their enhanced flexibility and responsiveness, RMSs are considered to be mostly applicable to the very dynamic and unpredictable marketplaces of the near future. However, systematic approaches to the design and ramp-up of an RMS have not been well addressed. This paper presents a virtual production line-based (VPL) approach to the design and operation of a reconfigurable manufacturing system. Shop floor attributed finite-capacity automaton and VPL attributed finite-capacity automaton are proposed for modelling the control of an RMS, which leads to ease of control software development. Algorithms for balancing VPLs to maximize the productivity of an RMS are discussed. The results of simulation runs of the proposed methodology and algorithms applied to simplified back-end semiconductor manufacturing are provided.

11.
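To improve the memory access performance of virtual machine interpreters on non-uniform memory access (NUMA) architectures, memory access optimization techniques for interpreters under NUMA are studied, a memory access optimization scheme for interpreters on NUMA architectures is proposed, and static and dynamic instruction dispatch optimization methods for the interpreter are designed and implemented. Under this scheme, the virtual machine first obtains NUMA node information at startup and automatically generates all data structures required by the interpreter in each NUMA node. At run time, the interpreter uses static or dynamic instruction dispatch to localize the memory accesses of its execution threads on their NUMA nodes. Experimental results show that these methods significantly improve interpreter performance on NUMA systems: overall performance on the DaCapo benchmark suite improves by 8%, with a maximum improvement of 23%. The implementation cost is low, and the approach is applicable to the vast majority of NUMA server systems.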

12.
The effect of double erasure on monolayer bacteriorhodopsin (BR) protein films after photonic excitation to the ultra-stable Q-state is studied. It was found that the pronounced emission of 755 nm light occurs only as the protein is made to transition from the Q-state to the ground state via irradiation with blue light. Requirements for the implementation of a next-generation protein-based memory (PBM) device utilizing monolayer BR films are considered. The finite element method was used to simulate the optical intensity distribution of nano-aperture waveguides for red (650 nm), green (510 nm) and blue (475 nm) light to analyze the utility of nano-aperture transducers in a protein-based memory device. The minimum output power required to induce a photochromic transition in BR is calculated to be between 20 nW and 27 nW on a 30 nm spot, depending upon the operating wavelength.

13.
Nanotechnologies, including molecular quantum dot cellular automata (QCA), are susceptible to high error rates. In this paper, we present the design of concurrently testable latches (D latch, T latch, JK latch, and SR latch), which are based on reversible conservative logic for molecular QCA. Conservative reversible circuits are a specific type of reversible circuit in which, in addition to the one-to-one mapping, the outputs contain the same number of 1's as the inputs. Thus, conservative logic is parity-preserving, i.e., the parity of the input vectors is equal to that of the output vectors. We analyzed the fault patterns in the conservative reversible Fredkin gate due to a single missing/additional cell defect in molecular QCA. We found that if there is a fault in the molecular QCA implementation of the Fredkin gate, there is a parity mismatch between the inputs and the outputs; otherwise, the input parity is the same as the output parity. Any permanent or transient fault in molecular QCA can be concurrently detected if implemented with the conservative Fredkin gate. The design of QCA layouts and the verification of the latch designs using the QCADesigner and the HDLQ tool are presented.
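The parity-preserving property of the Fredkin gate can be checked exhaustively with a short Python sketch. The function names are hypothetical, and the fault model used here (a single flipped output bit) is a simplifying assumption for illustration; it is not the missing/additional-cell defect model analyzed in the abstract.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: when the control bit c is 1, swap a and b."""
    return (c, b, a) if c else (c, a, b)


def parity(bits):
    """Parity (0 or 1) of a tuple of bits."""
    return sum(bits) % 2


def is_conservative_and_reversible():
    """Check all 8 input vectors: the count of 1's is preserved and
    the mapping is one-to-one."""
    seen_outputs = set()
    for inputs in product((0, 1), repeat=3):
        outputs = fredkin(*inputs)
        if sum(outputs) != sum(inputs):
            return False
        seen_outputs.add(outputs)
    return len(seen_outputs) == 8


def fault_detected(inputs, observed_outputs):
    """Concurrent check: a parity mismatch between inputs and outputs
    signals a fault."""
    return parity(inputs) != parity(observed_outputs)


def flip_bit(bits, index):
    """Assumed fault model: invert a single output bit."""
    return tuple(b ^ 1 if i == index else b for i, b in enumerate(bits))
```

Because a fault-free Fredkin gate keeps the number of 1's unchanged, any single-bit flip at the output changes the parity and is caught by `fault_detected`. An even number of simultaneous flips would go unnoticed under this simple check.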

14.
An extended and refined scattering-matrix method is proposed for the efficient full parallel implementation of rigorous coupled-wave analysis of multilayer structures. The total electromagnetic field distribution in the rigorous coupled-wave analysis is represented by the linear combination of the eigenmodes with their own coupling coefficients. In the proposed scheme, a refined recursion relation of the coupling coefficients of the eigenmodes is defined for complete parallel computation of the electromagnetic field distributions within multilayer structures.

15.
This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos method uses a restart scheme to improve the convergence of the desired portion of the spectrum, addressing issues such as memory requirements and computational costs related to the generation and storage of the Krylov basis. The presented implementation is suitable for distributed memory architectures, especially PC clusters. In the parallel solution, a subdomain-by-subdomain approach was implemented, and overlapping and non-overlapping mesh partitions were tested. Compressed data structures in the CSRC and CSRC/CSR formats were used to store the coefficient matrices. The parallelization of the numerical linear algebra operations present in both Krylov and implicitly restarted methods is discussed. Numerical examples are shown to point out the efficiency and applicability of the proposed method.

16.
Wei J, Wang S, Zhao Y, Jin F. Applied Optics, 2011, 50(12): 1707-1716
In this paper, we are concerned with the coding of subimage-transformed elemental images to solve the problems of data transmission and storage in three-dimensional (3D) integral imaging. First, we use the subimage transform for preprocessing of the elemental image array (EIA). Because of the similarity of correlation distributions between the subimage array (SIA) and multiview video, we present a hierarchical prediction structure for SIA coding based on the hierarchical B picture (HBP) structure for multiview video coding. Moreover, we design a multithreaded parallel implementation for the proposed structure according to inter-row prediction dependencies. Experiments are performed on both EIAs and SIAs. The results show that, with the same coding strategy, the proposed parallel HBP scheme achieves not only higher image quality and a better 3D effect but also lower coding delay at low bit rates compared with the previously reported Hilbert-curve-based scheme.

17.
Lu G, Lu M, Yu FT. Applied Optics, 1995, 34(23): 5109-5117
We propose a multilayer associative memory with a winner-take-all operation on the inner product between an input and stored exemplars. The winner-take-all operation is performed by a unit-step operation with an adaptive-threshold strategy. We show that the multilayer-associative-memory unit-step operation with an adaptive-threshold strategy has a high noise immunity and a large storage capacity, and it is also capable of extending to a gray-level associative memory with a phase-representation technique. A hybrid optical implementation with a proof-of-concept experiment is also provided.
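The recall scheme summarized above can be sketched in Python as follows. This is an illustrative software analogue under stated assumptions, not the authors' optical implementation: exemplars are assumed to be bipolar (+1/-1) vectors so that inner products are integers, the adaptive threshold is assumed to rise in unit steps until at most one unit remains active, and all function names are hypothetical.

```python
def inner_product(x, y):
    """Similarity score between an input and one stored exemplar."""
    return sum(a * b for a, b in zip(x, y))


def unit_step(value):
    """Unit-step nonlinearity: 1 for non-negative input, otherwise 0."""
    return 1 if value >= 0 else 0


def winner_take_all(scores):
    """Raise the threshold until a single unit survives (assumed strategy).

    If several units tie for the maximum score, they all stay active.
    """
    threshold = min(scores)
    active = [1] * len(scores)
    while sum(active) > 1:
        threshold += 1  # integer scores assumed (bipolar exemplars)
        candidate = [unit_step(s - threshold) for s in scores]
        if sum(candidate) == 0:
            break  # tie at the maximum: keep the previous active set
        active = candidate
    return active


def recall(pattern, exemplars):
    """First layer scores the input; second layer reads out the winner."""
    scores = [inner_product(pattern, e) for e in exemplars]
    active = winner_take_all(scores)
    return exemplars[active.index(1)]
```

For example, with three stored 8-element bipolar exemplars, an input that differs from the first exemplar in one position scores 6 against it and 2 against the other two, so `recall` returns the first exemplar.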

18.
Electron beam lithography (EBL) patterning of poly(methylmethacrylate) (PMMA) is a versatile tool for defining molecular structures on the sub-10-nm scale. We demonstrate lithographic resolution to about 5 nm using a cold-development technique. Liftoff of sub-10-nm Au nanoparticles and metal lines proves that cold development completely clears the PMMA residue on the exposed areas. Molecular liftoff is performed to pattern DNA rafts with high fidelity at linewidths of about 100 nm. High-resolution EBL and molecular liftoff can be applied to pattern Creutz-Taube molecules on the scale of a few nanometers for quantum-dot cellular automata.
