1.

A distributed memory parallel implementation of the multigrid method for solving three‐dimensional implicit solid mechanics problems

A. Namazifard I. D. Parsons 《International journal for numerical methods in engineering》2004,61(8):1173-1208

We describe the parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems. We focus on a distributed memory programming model and use the MPI library to perform the required interprocessor communications. We present an algebraic framework for our parallel computations, and describe an object‐based programming methodology using Fortran90. The performance of the implementation is measured by solving both fixed‐ and scaled‐size problems on three different parallel computers (an SGI Origin2000, an IBM SP2 and a Cray T3E). The code performs well in terms of speedup, parallel efficiency and scalability. However, the floating point performance is considerably below the peak values attributed to these machines. Lazy processors that produce reduced performance statistics are documented on the Origin. The solution of two problems on an SGI Origin2000, an IBM PowerPC SMP and a Linux cluster demonstrates that the algorithm performs well when applied to the unstructured meshes required for practical engineering analysis. Copyright © 2004 John Wiley & Sons, Ltd.

2.

PPT: A parallel programming tool for distributed memory multiprocessors

Yeh‐Ching Chung Wu‐Hsun Ho Chia‐Cheng Liu 《Journal of the Chinese Institute of Engineers》2013,36(3):365-378

Traditionally, to program a distributed memory multiprocessor, a programmer is responsible for partitioning an application program into modules or tasks, scheduling tasks on processors, inserting communication primitives, and generating parallel codes for each processor manually. As both the number of processors and the complexity of the problems to be solved increase, programming distributed memory multiprocessors becomes difficult and error‐prone. In a distributed memory multiprocessor, program partitioning and scheduling play an important role in the performance of a parallel program. However, finding the program partitioning and scheduling that achieve the best performance of a parallel program on a distributed memory multiprocessor is not an easy task. In this paper, we present a parallel programming tool, PPT, that helps programmers find the best program partitioning and scheduling and automatically generates the parallel code for the single program multiple data (SPMD) model on a distributed memory multiprocessor. An example of designing a parallel FFT program by using PPT on an NCUBE‐2 is also presented.

3.

Parallel pipelined CRC implementation based on matrix similarity transformation

Su Li Jin Depeng Zeng Lieguang 《Chinese High Technology Letters》2007,17(9):902-906

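This paper studies a general parallel cyclic redundancy check (CRC) encoding structure, analyzes the main factors that limit encoding speed, and derives a general method for parallel CRC encoding from polynomial theory. On this basis, a similarity transformation based on the rational canonical form from linear algebra is applied to the feedback matrix of the encoding structure, yielding a high-speed pipelined parallel CRC encoding structure, and CRC encoders with several degrees of parallelism are designed and implemented. The design results show that, compared with other structures, the high-speed pipelined parallel CRC encoder structure achieves the best encoding speed and timing characteristics, and can meet the demands of high-speed data integrity checking.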
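The general idea behind parallel CRC encoding can be illustrated with a minimal sketch. It assumes an MSB-first register with zero initial state and no final XOR, and it represents the generator polynomial by its low-order coefficients (for example `0x07` for x^8 + x^2 + x + 1). Because the bit-serial update is linear over GF(2), processing w bits at once amounts to applying a fixed matrix to the state and another to the input block. The function names below are illustrative, and the sketch does not reproduce the similarity transformation or the pipelined structure described in the abstract.

```python
def crc_serial(bits, poly, n, state=0):
    """Bit-serial CRC register update (MSB first, no reflection, no final XOR)."""
    mask = (1 << n) - 1
    for b in bits:
        feedback = ((state >> (n - 1)) & 1) ^ b
        state = (state << 1) & mask
        if feedback:
            state ^= poly
    return state


def build_parallel_tables(poly, n, w):
    """Columns of the two GF(2) maps used by a w-bit parallel step.

    state_cols[i] is the image of the unit state with bit i set (zero input);
    input_cols[j] is the image of the unit input block with bit j set (zero state).
    """
    zero_block = [0] * w
    state_cols = [crc_serial(zero_block, poly, n, 1 << i) for i in range(n)]
    input_cols = []
    for j in range(w):
        unit_block = [0] * w
        unit_block[j] = 1
        input_cols.append(crc_serial(unit_block, poly, n, 0))
    return state_cols, input_cols


def crc_parallel(bits, poly, n, w):
    """Consume w bits per step: next_state = F^w * state XOR G * block over GF(2)."""
    if len(bits) % w != 0:
        raise ValueError("message length must be a multiple of w")
    state_cols, input_cols = build_parallel_tables(poly, n, w)
    state = 0
    for k in range(0, len(bits), w):
        nxt = 0
        for i in range(n):
            if (state >> i) & 1:
                nxt ^= state_cols[i]
        for j, b in enumerate(bits[k:k + w]):
            if b:
                nxt ^= input_cols[j]
        state = nxt
    return state
```

For any block width w that divides the message length, `crc_parallel` returns the same register value as `crc_serial`. In hardware, each table column corresponds to a fixed XOR network, and the depth of the state feedback path is what limits clock speed.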

4.

A parallel built-in self-diagnostic method for nontraditional faults of embedded memory arrays

Arora V. Jone W.B. Huang D.C. Das S.R. 《IEEE transactions on instrumentation and measurement》2004,53(4):915-932

In this paper, we propose a built-in self-diagnostic march-based algorithm that identifies faulty memory cells based on a recently introduced nontraditional fault model. It is developed based on the DiagRSMarch algorithm, which is a diagnostic algorithm to identify traditional faults for embedded memory arrays. A minimal set of additional operations is added to DiagRSMarch for identifying the nontraditional faults without affecting the diagnostic coverage of the traditional faults. The embedded memory arrays are accessed using a bidirectional serial interfacing architecture which minimizes the routing overhead introduced by the diagnosis hardware. Using the concepts of the bidirectional interfacing technique, parallel testing, and redundant-tolerant operations, the diagnostic process can be accomplished efficiently at-speed with minimal hardware overhead.

5.

Unilateral contact problem for finite bodies - parallel implementation

V. L. Rabinovich S. R. Sipcic 《Computational Mechanics》1994,13(6):414-426

A numerical solution of the three dimensional frictionless contact problem and its data parallel implementation on the Connection Machine system CM-2 are presented. The numerical solution is obtained by means of boundary element discretization of a variational inequality and related extremum principle; the associated Green's function is approximated by means of a standard direct boundary element procedure. The numerical method is applicable to any kind of geometry of the contacting bodies under arbitrary loading. The examples presented illustrate a distinct ability of the method to capture the influence of the shape and the size of a body on the contact area and the pressure acting in it. It has been demonstrated that the symmetry properties of the Green's operator hold only asymptotically for the discretized problem.

6.

A massively parallel implementation of the Optimal Transportation Meshfree method for explicit solid dynamics

B. Li M. Stalzer M. Ortiz 《International journal for numerical methods in engineering》2014,100(1):40-61

Presented is a massively parallel implementation of the Optimal Transportation Meshfree (pOTM) method (Li et al., 2010) for explicit solid dynamics. Its implementation is based on a two‐level scheme using Message Passing Interface between compute servers and threaded parallelism on the multi‐core processors within each server. Both layers dynamically subdivide the problem to provide excellent parallel scalability. pOTM is used on three problems and compared to experiments to demonstrate accuracy and performance. For both a Taylor‐anvil and a hypervelocity impact problem, the pOTM implementation scales nearly perfectly to about 8000 cores. Copyright © 2014 John Wiley & Sons, Ltd.

7.

A study of the factorization fill-in for a parallel implementation of the finite element method

Valerie E. Taylor Bahram Nour-Omid 《International journal for numerical methods in engineering》1994,37(22):3809-3823

In this paper we investigate the additional storage overhead needed for a parallel implementation of finite element applications. In particular, we compare the storage requirements for the factorization of the sparse matrices that would occur on a parallel processor vs. a uniprocessor. This variation in storage results from the factorization fill-in. We address the question of whether the storage overhead is so large for parallel implementations that it imposes severe limitations on the problem size in contrast to the problems executed sequentially on a uniprocessor. The storage requirements for the parallel implementation are based upon a new ordering scheme, the combination mesh-based scheme. This scheme uses a domain decomposition method which attempts to balance the processors' loads and decrease the interprocessor communication. The storage requirements for the sequential implementation are based upon the minimum degree algorithm. The difference between the two storage requirements corresponds to the storage overhead attributed to the parallel scheme. Experiments were conducted on regular and irregular, 2-D and 3-D problems. The meshes were decomposed into 2–256 subdomains which can be executed on 2–256 processors, respectively. The total storage requirements or fill-in for most of the 2-D problems were less than a factor of two increase over the sequential execution. In contrast, large 3-D problems had zero increase in storage or fill-in over the sequential execution; the fill-in was less for the parallel execution than the sequential execution. Thus, we conclude that the storage overhead attributed to the use of parallel processors will not impose severe constraints on the problem size. Further, for large 3-D applications, the combination mesh-based algorithm does better than minimum degree for reducing the fill-in.

8.

Optoelectronic parallel watershed implementation for segmentation of magnetic resonance brain images

Michael N Arrathoon R 《Applied optics》1997,36(35):9269-9286

An optoelectronic implementation for the morphological watershed transform is proposed. Fiber-optic programmable logic arrays are used in the implementation because of their high fan factors at high clock speeds. Image segmentation is one of the main applications of the watershed transform. Based on the optoelectronic implementation, an algorithm for the segmentation of axial magnetic resonance (MR) head images to extract information on brain matter is presented. Simulation results for the different steps of the segmentation process are presented.

9.

Quasi-cyclic LDPC code construction for implementation of parallel joint row-column decoding

Dong Mingke Wang Da Zheng Yadan Xiang Haige 《Chinese High Technology Letters》2012,22(6)

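To address the problem that ordinary low-density parity-check (LDPC) codes limit the parallelism achievable by the joint row-column (JRC) decoding algorithm, a construction method for LDPC codes suited to parallel JRC decoding is proposed, based on the block progressive edge-growth (BPEG) algorithm. The base matrix of the quasi-cyclic LDPC (QC-LDPC) codes constructed by this method consists of row groups of r rows each (r being an integer greater than 1), which allows the r rows within a row group to undergo JRC processing in parallel. Simulation results show that the LDPC codes built with this construction have bit error performance comparable to that of BPEG codes. Hardware implementation shows that a parallel decoder using the constructed code reaches more than 3 times the throughput of a typical conventional quasi-cyclic decoder, providing an example of decoder-oriented LDPC code construction.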

10.

Integrated design approach for virtual production line-based reconfigurable manufacturing systems

Y. Tang R. G. Qiu 《International Journal of Production Research》2013,51(18):3803-3822

Reconfigurable manufacturing systems (RMSs) have been recognized as a new manufacturing paradigm. In light of their enhanced flexibility and responsiveness, RMSs are considered to be mostly applicable to the very dynamic and unpredictable marketplaces of the near future. However, systematic approaches to the design and ramp-up of an RMS have not been well addressed. This paper presents a virtual production line-based (VPL) approach to the design and operation of a reconfigurable manufacturing system. Shop floor attributed finite-capacity automaton and VPL attributed finite-capacity automaton are proposed for modelling the control of an RMS, which leads to ease of control software development. Algorithms for balancing VPLs to maximize the productivity of an RMS are discussed. The results of simulation runs of the proposed methodology and algorithms applied to simplified back-end semiconductor manufacturing are provided.

11.

Design and implementation of interpreter memory access optimization based on the NUMA architecture

《Chinese High Technology Letters》2015,(7)

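To improve the memory access performance of virtual machine interpreters on non-uniform memory access (NUMA) architectures, memory access optimization techniques for interpreters under NUMA are studied, a memory access optimization scheme for interpreters on NUMA architectures is proposed, and static and dynamic instruction dispatch optimization methods for the interpreter are designed and implemented. Under this scheme, the virtual machine first obtains the NUMA node information at startup and automatically generates all data structures required by the interpreter on each NUMA node; at run time, the interpreter uses static or dynamic instruction dispatch to localize the memory accesses of its execution threads to their NUMA nodes. Experimental results show that these methods significantly improve interpreter performance on NUMA systems: overall performance on the DaCapo benchmark suite improves by 8%, with a maximum improvement of 23%. The implementation cost is low, and the approach is suitable for the vast majority of NUMA server systems.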

12.

Considerations for the implementation of 2D protein based memory

Hudgins M Khizroev S 《Journal of nanoscience and nanotechnology》2011,11(3):2520-2523

The effect of double erasure on Monolayer Bacteriorhodopsin (BR) protein films after photonic excitation to the ultra stable Q-state is studied. It was found that the pronounced emission of 755 nm light occurs only as the protein is made to transition from the Q-state to the ground state via irradiation with blue light. Requirements for the implementation of a next generation Protein-Based Memory (PBM) device utilizing monolayer BR films are considered. The finite element method was used to simulate the optical intensity distribution of nano-aperture waveguides for Red (650 nm), Green (510 nm) and Blue (475 nm) light to analyze the utility of nanoaperture transducers for use in a Protein Based Memory device. The minimum output power required to induce a photochromic transition in BR is calculated to be between 20 nW and 27 nW on a 30 nm spot depending upon the operating wavelength.

13.

Reversible Logic-Based Concurrently Testable Latches for Molecular QCA

Thapliyal H. Ranganathan N. 《Nanotechnology, IEEE Transactions on》2010,9(1):62-69

Nanotechnologies, including molecular quantum dot cellular automata (QCA), are susceptible to high error rates. In this paper, we present the design of concurrently testable latches (D latch, T latch, JK latch, and SR latch), which are based on reversible conservative logic for molecular QCA. Conservative reversible circuits are a specific type of reversible circuit in which the outputs contain the same number of 1's as the inputs, in addition to the one-to-one mapping. Thus, conservative logic is parity-preserving, i.e., the parity of the input vectors is equal to that of the output vectors. We analyzed the fault patterns in the conservative reversible Fredkin gate due to a single missing/additional cell defect in molecular QCA. We found that if there is a fault in the molecular QCA implementation of the Fredkin gate, there is a parity mismatch between the inputs and the outputs; otherwise the input parity is the same as the output parity. Any permanent or transient fault in molecular QCA can be concurrently detected if implemented with the conservative Fredkin gate. The design of QCA layouts and the verification of the latch designs using the QCADesigner and the HDLQ tool are presented.
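The parity-preserving property of the Fredkin gate can be checked with a short sketch. It models the gate as a logical controlled swap and models a fault as a single flipped output bit. That fault model is an assumption made for illustration and is not the missing/additional QCA cell defect analysis of the paper. The function names are hypothetical.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: pass (a, b) through when c == 0, swap them when c == 1."""
    return (c, b, a) if c else (c, a, b)


def parity(bits):
    """Parity (sum modulo 2) of a sequence of 0/1 values."""
    return sum(bits) % 2


def parity_mismatch(inputs, outputs):
    """Concurrent check: True when input and output parities differ."""
    return parity(inputs) != parity(outputs)


def single_bit_faults(outputs):
    """Yield every output vector that differs from `outputs` in exactly one bit."""
    for k in range(len(outputs)):
        faulty = list(outputs)
        faulty[k] ^= 1
        yield tuple(faulty)


if __name__ == "__main__":
    for inputs in product((0, 1), repeat=3):
        outputs = fredkin(*inputs)
        # Fault-free gate: same number of 1's on both sides, so no mismatch.
        clean = not parity_mismatch(inputs, outputs)
        # Every single-bit output fault changes the parity and is flagged.
        caught = all(parity_mismatch(inputs, f) for f in single_bit_faults(outputs))
        print(inputs, outputs, clean, caught)
```

Under this simplified fault model, a parity comparison flags every single-bit error, while faults that flip an even number of output bits would pass unnoticed.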

14.

Extended scattering-matrix method for efficient full parallel implementation of rigorous coupled-wave analysis

Kim H Lee IM Lee B 《Journal of the Optical Society of America. A, Optics, image science, and vision》2007,24(8):2313-2327

An extended and refined scattering-matrix method is proposed for the efficient full parallel implementation of rigorous coupled-wave analysis of multilayer structures. The total electromagnetic field distribution in the rigorous coupled-wave analysis is represented by the linear combination of the eigenmodes with their own coupling coefficients. In the proposed scheme, a refined recursion relation of the coupling coefficients of the eigenmodes is defined for complete parallel computation of the electromagnetic field distributions within multilayer structures.

15.

A parallel subdomain by subdomain implementation of the implicitly restarted Arnoldi/Lanczos method

G. O. Ainsworth Jr. F. L. B. Ribeiro C. Magluta 《Computational Mechanics》2011,48(5):563-577

This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos method uses a restart scheme in order to improve the convergence of the desired portion of the spectrum, addressing issues such as memory requirements and computational costs related to the generation and storage of the Krylov basis. The presented implementation is suitable for distributed memory architectures, especially PC clusters. In the parallel solution, a subdomain by subdomain approach was implemented and overlapping and non-overlapping mesh partitions were tested. Compressed data structures in the formats CSRC and CSRC/CSR were used to store the coefficient matrices. The parallelization of the numerical linear algebra operations present in both Krylov and implicitly restarted methods is discussed. Numerical examples are shown in order to point out the efficiency and applicability of the proposed method.
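The matrix-vector product is the core kernel of Krylov methods such as Arnoldi/Lanczos, and compressed storage determines how it is computed. The sketch below shows the plain compressed sparse row (CSR) layout only. The CSRC variant used in the paper stores the matrix differently and is not reproduced here. The helper names are illustrative.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to CSR arrays.

    Returns (values, col_idx, row_ptr): the nonzero entries row by row, their
    column indices, and the offsets at which each row starts in `values`.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    a = [[4, -1, 0],
         [-1, 4, -1],
         [0, -1, 4]]
    values, col_idx, row_ptr = dense_to_csr(a)
    print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))
```

In a subdomain by subdomain setting, each process would typically apply such a kernel to its local rows and then exchange interface values with its neighbours; that communication step is outside this sketch.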

16.

Hierarchical prediction structure for subimage coding and multithreaded parallel implementation in integral imaging

Wei J Wang S Zhao Y Jin F 《Applied optics》2011,50(12):1707-1716

In this paper, we are concerned with the coding of subimage-transformed elemental images to solve the problems of data transmission and storage in three-dimensional (3D) integral imaging. First, we use the subimage transform for preprocessing of the elemental image array (EIA). Because of the similarity of correlation distributions between the subimage array (SIA) and multiview video, we present a hierarchical prediction structure for SIA coding based on the hierarchical B picture (HBP) structure for multiview video coding. Moreover, we design a multithreaded parallel implementation for the proposed structure according to inter-row prediction dependencies. Experiments are performed on both EIAs and SIAs. The results show that, with the same coding strategy, the proposed parallel HBP scheme achieves not only higher image quality and a better 3D effect but also lower coding delay at low bit rates compared with the previously reported Hilbert-curve-based scheme.

17.

Multilayer associative memory and its hybrid optical implementation

Lu G Lu M Yu FT 《Applied optics》1995,34(23):5109-5117

We propose a multilayer associative memory with a winner-take-all operation on the inner product between an input and stored exemplars. The winner-take-all operation is performed by a unit-step operation with an adaptive-threshold strategy. We show that the multilayer-associative-memory unit-step operation with an adaptive-threshold strategy has a high noise immunity and a large storage capacity, and it is also capable of extending to a gray-level associative memory with a phase-representation technique. A hybrid optical implementation with a proof-of-concept experiment is also provided.
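One way to picture a winner-take-all recall driven by a unit-step operation is sketched below. It assumes bipolar (+1/-1) exemplars and a threshold that starts at the largest possible inner product and is lowered in fixed steps until at least one unit fires. This lowering rule is a stand-in chosen for illustration, since the abstract does not specify the adaptive-threshold strategy, and the function name is hypothetical.

```python
def recall(exemplars, x, step=1):
    """Return (recalled exemplar, final threshold) for input vector x.

    First layer: inner products between x and each stored exemplar.
    Unit step: a unit fires when its inner product reaches the threshold.
    The threshold is lowered until some unit fires (assumed strategy).
    """
    scores = [sum(e_i * x_i for e_i, x_i in zip(e, x)) for e in exemplars]
    theta = len(x)  # largest possible inner product for bipolar vectors
    while True:
        fired = [1 if s >= theta else 0 for s in scores]
        if sum(fired) >= 1:
            break
        theta -= step
    winner = fired.index(1)  # ties resolve to the first firing unit
    return exemplars[winner], theta


if __name__ == "__main__":
    stored = [
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, -1, 1, -1, 1, -1, 1, -1],
        [1, 1, -1, -1, 1, 1, -1, -1],
    ]
    noisy = [-1, -1, 1, -1, 1, -1, 1, -1]  # second exemplar with one element flipped
    pattern, theta = recall(stored, noisy)
    print(pattern, theta)
```

With mutually orthogonal exemplars, as in the usage above, a single flipped element still leaves the correct exemplar with the largest inner product, so it is the first unit to fire as the threshold drops.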

18.

High-resolution electron beam lithography and DNA nano-patterning for molecular QCA

Wenchuang Hu Sarveswaran K. Lieberman M. Bernstein G.H. 《Nanotechnology, IEEE Transactions on》2005,4(3):312-316

Electron beam lithography (EBL) patterning of poly(methylmethacrylate) (PMMA) is a versatile tool for defining molecular structures on the sub-10-nm scale. We demonstrate lithographic resolution to about 5 nm using a cold-development technique. Liftoff of sub-10-nm Au nanoparticles and metal lines proves that cold development completely clears the PMMA residue on the exposed areas. Molecular liftoff is performed to pattern DNA rafts with high fidelity at linewidths of about 100 nm. High-resolution EBL and molecular liftoff can be applied to pattern Creutz-Taube molecules on the scale of a few nanometers for quantum-dot cellular automata.

19.

Optical parallel array logic system. 2: A new system architecture without memory elements

Tanida J Ichioka Y 《Applied optics》1986,25(20):3751

20.

Truth-table look-up parallel data processing using an optical content-addressable memory

Mirsalehi MM Gaylord TK 《Applied optics》1986,25(14):2277
