
1.

A high performance crashworthiness simulation system based on GPU

《Advances in Engineering Software》2015

A crashworthiness simulation system is one of the key computer-aided engineering (CAE) tools for the automobile industry and implies two potentially conflicting requirements: accuracy and efficiency. A parallel crashworthiness simulation system based on graphics processing unit (GPU) architecture and the explicit finite element (FE) method is developed in this work. Implementation details with compute unified device architecture (CUDA) are considered. The entire parallel simulation system involves a parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty contact force calculation algorithm. Three basic GPU-based parallel strategies are suggested to match the natural parallelism of the explicit FE algorithm. Two free GPU-based numerical calculation libraries, cuBLAS and Thrust, are introduced to reduce the programming difficulty. Furthermore, a mixed array and a thread-map-to-element strategy are proposed to improve the performance of the test pair search. The outer loop of the nested loop through the mixed array is unrolled to realize parallel searching. An efficient storage strategy based on data sorting is presented to realize data transfer between different hierarchies with coalesced access during the contact pair search. A thread-map-to-element pattern is implemented to calculate the penetrations and the penetration forces; a double-precision floating-point atomic operation is used to scatter contact forces. The simulation results of three different models on an Intel Core i7-930 and an NVIDIA GeForce GTX 580 demonstrate the precision and efficiency of the developed parallel crashworthiness simulation system.
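The penalty contact step described in this abstract can be illustrated with a minimal serial sketch in Python. It is not the authors' CUDA implementation: the `ContactPair` type, the `scatter_penalty_forces` function, and the stiffness value are hypothetical names and numbers chosen for illustration. Each loop iteration stands in for one GPU thread mapped to one contact pair, and the `+=` stands in for the atomic add that a GPU kernel would need when several pairs write to the same node.

```python
from dataclasses import dataclass


@dataclass
class ContactPair:
    node: int            # index of the penetrating node (hypothetical layout)
    penetration: float   # penetration depth, positive when in contact
    normal: tuple        # unit normal of the contacted segment


def scatter_penalty_forces(pairs, n_nodes, stiffness):
    """Accumulate penalty contact forces on nodes.

    One iteration corresponds to one thread mapped to one contact pair.
    The accumulation is safe here because the loop is serial; on a GPU
    the same update would have to be an atomic add.
    """
    forces = [[0.0, 0.0, 0.0] for _ in range(n_nodes)]
    for pair in pairs:
        if pair.penetration <= 0.0:
            continue  # no penetration, so no contact force
        magnitude = stiffness * pair.penetration  # penalty law: f = k * depth
        for axis in range(3):
            forces[pair.node][axis] += magnitude * pair.normal[axis]
    return forces


# Two pairs share node 0, so their forces must be summed, not overwritten.
pairs = [
    ContactPair(node=0, penetration=0.5, normal=(0.0, 0.0, 1.0)),
    ContactPair(node=0, penetration=0.25, normal=(0.0, 1.0, 0.0)),
    ContactPair(node=1, penetration=-0.1, normal=(1.0, 0.0, 0.0)),
]
forces = scatter_penalty_forces(pairs, n_nodes=2, stiffness=100.0)
```

The two pairs that share node 0 show why the scatter needs an atomic operation in a parallel setting: without it, one thread's contribution could overwrite the other's.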

2.

A contribution to the real-time simulation of coupled finite element models of machine tools – A numerical comparison

S. Hoher S. Röck 《Simulation Modelling Practice and Theory》2011,19(7):1627-1639

In this paper the real-time simulation of finite element (FE) models of machine tools on a multi-processor architecture is presented. The simulation model is based on several FE component models that are connected by non-linear couplings. These couplings allow relative motions of the components in a wide range. The coupled linear FE models are decomposed at the non-linear coupling nodes and each component is solved locally. The linear structure of the components can be used for efficient simulation methods and the components can be distributed to several processors for a parallel computation. Methods that differ in numerical accuracy and stability, computational effort and real-time capacity will be presented. By means of a complex example, it will be illustrated that a parallel, stable computation can be realized time-deterministically.

3.

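Application of the h-adaptive finite element method to sheet metal stamping

刘习洲 王城璟 王琥 《图学学报》 (Journal of Graphics) 2021,42(6):970-978

When sheet metal stamping is simulated with the finite element method, it is difficult to analyze accurately the regions of stress concentration and large strain gradient where field variables change sharply, and balancing accuracy against efficiency is the key to stamping simulation. Therefore, based on the theory of nonlinear large-deformation finite elements and targeting the key mesh-adaptivity techniques for dynamic simulation, a sheet metal stamping algorithm under an adaptive analysis mode is established. To improve computational accuracy, an energy error based on the element strain energy increment is proposed...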

4.

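Finite element post-processing technology based on Direct3D

甘海 董湘怀 《计算机工程与科学》 (Computer Engineering & Science) 2011,33(8):122-127

To meet the display requirements of simulation results for sheet metal forming and resin filling processes, a finite element post-processing system was developed, following object-oriented software development principles and drawing on computer graphics theory. The system adopts a three-tier MVC architecture, is based on the Direct3D graphics programming interface, and is developed on the .NET framework. This paper focuses on the application of several key techniques in post-processing, including data interpolation, contour generation, primitive picking, spatial...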

5.

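A balanced and scalable distributed simulation method for computer architecture

徐传福 车永刚 王正华 彭宇行 《软件学报》 (Journal of Software) 2014,25(8):1844-1857

Distributed parallel simulation is one of the effective techniques for speeding up architecture simulation. First, a general performance analysis model for distributed parallel simulation is established, and properties of typical systems such as parallel speedup and parallel efficiency are analyzed theoretically, yielding some useful conclusions. On this basis, a balanced and scalable distributed parallel simulation method, SEDSim (scalable and evenly distributed simulation), is proposed. To address load imbalance among simulation nodes, SEDSim introduces CoMEPA (cost model guided evenly partition and allocation), a cost-model-guided strategy for evenly partitioning and allocating instruction intervals. To integrate distributed parallel simulation efficiently with non-contiguous sampled simulation intervals of arbitrary number, it introduces MinEC, an instruction interval allocation strategy based on the minimum equivalent cost (MinEC). SEDSim was implemented on top of sim-outorder, and its speed and accuracy were tested with some of the programs in SPEC CPU2000. Both the theoretical analysis and the test results show the advantages of SEDSim: compared with commonly used methods or strategies, CoMEPA and MinEC achieve performance improvements of up to about 1.6 times and 1.4 times, respectively.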

6.

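A fast CUDA-based method for measuring the flapping angle of helicopter rotor blades

熊邦书 汪建勇 黄建萍 余磊 《测控技术》 (Measurement & Control Technology) 2016,35(6):30-32

The serial CPU algorithm for stereo-vision-based measurement of the helicopter rotor blade flapping angle is time-consuming and inefficient. To address this, a fast parallel algorithm based on CUDA (compute unified device architecture) is proposed, exploiting the parallel computing advantages of the graphics processing unit (GPU). First, the three most time-consuming parts of the algorithm, namely image denoising, threshold segmentation, and connected-component labeling, are redesigned for parallel execution. Then, a multi-level parallel strategy distributes the large volume of intensive computation across different processing units for parallel execution, using shared memory and shared registers to speed up data access. Finally, repeated measurement experiments show that the method is markedly more efficient than the serial CPU method and meets the requirement for fast measurement of the rotor blade flapping angle.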

7.

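A flexible incremental forming technology for sheet metal based on a high-pressure water jet

李赳华 何凯 罗群 毛贺 杜如虚 《集成技术》 (Journal of Integration Technology) 2012,1(3):42-46

This paper proposes a new dieless forming method for sheet metal, namely high-pressure water jet flexible incremental forming, and designs a five-axis forming apparatus based on it. The nozzle has two rotational degrees of freedom and the worktable has three translational degrees of freedom, which makes the apparatus highly flexible and well suited to multi-variety, small-batch production and to trial manufacture of new products. The subsystems of the apparatus are described in detail, and a new simulation analysis method is proposed for the forming process that simplifies the relatively complex fluid-structure interaction problem into the application of an equivalent pressure load. Finally, a simulation analysis of single-point water column forming reveals the influence of jet pressure and sheet thickness on the formability of sheet metal.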

8.

Development of parallel 3D RKPM meshless bulk forming simulation system

《Advances in Engineering Software》2007,38(2):87-101

A parallel computational implementation of a modern meshless system is presented for explicit 3D bulk forming simulation problems. The system is implemented with the reproducing kernel particle method. Aspects of a coarse-grain parallel paradigm, the domain decomposition method, are detailed for a Lagrangian formulation using model partitioning. Integration cells are uniquely assigned to each processing element, and particles overlap in boundary zones. A multilevel recursive spectral bisection partitioning scheme is applied. The parallel contact search algorithm is also presented. Explicit message passing interface statements are used for all communication among partitions on different processors. The parallel 3D system is developed and applied to 3D bulk metal forming problems, and the simulation results demonstrate the efficiency of the developed parallel reproducing kernel particle method system.

9.

A Practical Implementation of Parallel Dynamic Load Balancing for Adaptive Computing in VLSI Device Simulation

Y. Li S.M. Sze T.-S. Chao 《Engineering with Computers》2002,18(2):124-137

We present a new parallel semiconductor device simulation using the dynamic load balancing approach. This semiconductor device simulation, based on the adaptive finite volume method with a posteriori error estimation, has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library. A constructive monotone iterative technique is also applied to solve the system of nonlinear algebraic equations. Two different parallel versions of the algorithm to perform a complete device simulation are proposed. The first is a dynamic parallel domain decomposition approach, and the second is a parallel current-voltage characteristic points simulation. This implementation shows that a well-designed load balancing simulation can reduce the execution time by up to an order of magnitude. Numerical results on various submicron VLSI devices are compared with measured data to show the accuracy and efficiency of the method.

10.

One point quadrature shell elements for sheet metal forming analysis

Rui P. R. Cardoso Jeong-Whan Yoon 《Archives of Computational Methods in Engineering》2005,12(1):3-66

Numerical simulation of sheet metal forming processes is reviewed in this work. Accurate and efficient elements, material modelling and contact procedures are three major considerations for a reliable numerical analysis of plastic forming processes. Two new quadrilaterals with a reduced integration scheme are introduced for shell analysis in order to improve computational efficiency without sacrificing accuracy: the first is formulated for the plane stress condition and the second is designed to include through-thickness effects by considering the normal stress along the thickness direction. Barlat’s yield criterion, which was reported to be adequate to model the anisotropy of aluminum alloy sheets, is used together with a multi-stage return mapping method to account for the plastic anisotropy of the rolled sheet. A brief review of contact algorithms is included, especially the computational aspects of their numerical implementation in the sheet metal forming context. Various examples are given to demonstrate the accuracy and robustness of the proposed formulations.

11.

Lneuro 1.0: a piece of hardware LEGO for building neural network systems

Mauduit N. Duranton M. Gobert J. Sirat J.-A. 《Neural Networks, IEEE Transactions on》1992,3(3):414-422

Neural network simulations on a parallel architecture are reported. The architecture is scalable and flexible enough to be useful for simulating various kinds of networks and paradigms. The computing device is based on an existing coarse-grain parallel framework (INMOS transputers), improved with finer-grain parallel abilities through VLSI chips, and is called the Lneuro 1.0 (for LEP neuromimetic) circuit. The modular architecture of the circuit makes it possible to build various kinds of boards to match the expected range of applications or to increase the power of the system by adding more hardware. The resulting machine remains reconfigurable to accommodate a specific problem to some extent. A small-scale machine has been realized using 16 Lneuros, to experimentally test the behavior of this architecture. Results are presented on an integer version of Kohonen feature maps. The speedup factor increases regularly with the number of clusters involved (to a factor of 80). Some ways to improve this family of neural network simulation machines are also investigated.

12.

Reliability assessment for sheet metal forming operations

《Computer Methods in Applied Mechanics and Engineering》2002,191(39-40):4511-4532

Methodology developed for reliability calculations of structures is applied to estimate the reliability of sheet metal forming operations. Sheet forming operations are among the most common technological processes, but tool and process design is still a difficult engineering problem. Product defects are often encountered in industrial practice. Material breakage, wrinkling, and shape defects due to springback are the most frequent defects in sheet metal forming operations. Numerical simulation allows us to evaluate product manufacturability and predict defects at early stages of the design process. In the paper the so-called forming limit diagrams (FLD) are used as a criterion of material breakage in the manufacturing process. A zone of an FLD where good results are guaranteed with sufficient probability is considered a safe zone. Sheet forming operations are characterized by a significant scatter of the results. This can be caused by differences that occur in the forming of each part. Small differences in the contact conditions, for instance, can lead to significant changes in the deformation state of the sheet. In a reliability-like approach we try to quantify the intuitive notion of the probability of failure or success of forming operations, given some uncertainty in the parameters characterizing a forming process, such as friction parameters or blankholding force. Since gradient-based reliability techniques are of very limited use because of the numerical noise introduced by the explicit dynamic algorithm used for the sheet stamping simulation, the method of adaptive Monte Carlo simulations was chosen for reliability assessment.
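The idea of estimating a failure probability by sampling uncertain process parameters can be sketched as follows. This is a plain Monte Carlo estimate, not the adaptive scheme the abstract refers to, and everything specific in it is an assumption made for illustration: the `toy_margin` limit-state function, its coefficients, the strain limit of 0.45, and the parameter distributions are invented and do not come from the paper. A negative margin is taken to mean that the strain state falls outside the safe zone of the forming limit diagram.

```python
import random


def failure_probability(limit_state, sample_inputs, n_samples, seed=0):
    """Plain Monte Carlo estimate of P(limit_state(x) < 0)."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    failures = 0
    for _ in range(n_samples):
        if limit_state(sample_inputs(rng)) < 0.0:
            failures += 1
    return failures / n_samples


def sample_process(rng):
    """Draw one set of uncertain process parameters (hypothetical values)."""
    friction = rng.gauss(0.12, 0.02)          # friction coefficient
    blankholding_force = rng.gauss(100.0, 10.0)  # blankholding force, kN
    return friction, blankholding_force


def toy_margin(params):
    """Toy stand-in for a stamping simulation.

    Returns the distance of the major strain from an assumed forming
    limit of 0.45; a negative value counts as material breakage.
    """
    friction, force = params
    major_strain = 0.2 + 1.0 * friction + 0.001 * force
    return 0.45 - major_strain


p_fail = failure_probability(toy_margin, sample_process, n_samples=20000)
```

In a real study each call to the limit-state function would be a full stamping simulation, which is why the number of samples, and hence an adaptive sampling scheme, matters.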

13.

Modeling and distributed simulation of a broadband-ISDN network

Chai A. Ghosh S. 《Computer》1993,26(9):37-51

A distributed approach to communication network simulation using a network of workstations configured as a loosely coupled parallel processor to model and simulate the broadband integrated services digital network (B-ISDN) is proposed. In a loosely coupled parallel processor system, a number of concurrently executable processors communicate asynchronously using explicit messages over high-speed links. Since this architecture is similar to that of B-ISDN networks, it constitutes a realistic testbed for their modeling and simulation. The authors describe an implementation of this approach on 50 Sun workstations at Brown University. Performance results, based on representative B-ISDN networks and realistic traffic models, indicate that the distributed approach is efficient and accurate.

14.

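Impact of trace generation on large-scale parallel performance simulation and strategies for improvement

徐传福 王荣 车永刚 王正华 《计算机工程与科学》 (Computer Engineering & Science) 2012,34(3):67-73

Trace generation is an indispensable step in trace-driven architecture simulation. Traces not only occupy a large amount of storage; generating them may also perturb the simulated execution of the target application to some degree, causing errors in the performance data. Because of the design and implementation characteristics of trace-driven parallel performance simulators and the diversity of the parallel host platforms they run on, the impact of trace generation has its own particularities. This paper selects the typical parallel simulator BigSim and several target parallel programs with different computation-to-communication ratios, and designs experiments on three host platforms that support different trace I/O modes to evaluate the impact of trace generation on parallel performance simulation. The results show that trace generation has a considerable impact on both simulation efficiency and accuracy. The relationship between this impact and the parallel simulator implementation and the host platform I/O mode is analyzed, and several feasible improvements are discussed, providing guidance for the design, implementation, and use of trace-driven parallel simulators.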

15.

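Case analysis of numerical simulation of panel forming defects (total citations: 4, self-citations: 2, citations by others: 4)

项辉宇 冷崇杰 张媛 程建璞 《计算机仿真》 (Computer Simulation) 2009,26(12):226-229

To study the forming behavior of large automotive panels and analyze the causes of their defects, the application status, basic theory, and application steps of dynamic explicit finite element numerical simulation for sheet metal forming are introduced. Through a case study, an approach to simulating and analyzing the causes of panel forming defects with finite element numerical simulation is investigated. The simulation results show that numerical simulation can be used to analyze the deformation behavior of panels and to understand the distribution and direction of stress and strain during stamping, the forming limit diagram distribution, and the defects, and thereby to improve the die structure and stamping process, eliminate forming defects, and improve product quality. Numerical simulation is an effective tool for solving forming defect problems in large automotive panels.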

16.

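Real-time interactive simulation with position-based fluids

王坤 于歌 梁骥 郭丽丽 《计算机系统应用》 (Computer Systems & Applications) 2018,27(2):169-174

Fluid interaction simulation based on smoothed particle hydrodynamics suffers from low efficiency and insufficiently realistic interaction detail. To address this, a method that uses position-based fluids to simulate the interaction between a rigid tool and a fluid is proposed. The method improves on the traditional smoothed particle hydrodynamics algorithm, simulates the interaction in real time on the CUDA parallel computing platform, and outputs the interaction force in real time through a haptic interaction device. Experimental results show that the interaction force during simulation matches expectations, and the continuity and stability of the interaction force are verified while the accuracy of the fluid simulation is maintained.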

17.

ArchSim: A System-Level Parallel Simulation Platform for the Architecture Design of High Performance Computer (total citations: 2, self-citations: 0, citations by others: 2)

Yong-Qin Huang 《计算机科学技术学报》 (Journal of Computer Science and Technology) 2009,24(5):901-912

A high performance computer (HPC) is a huge, complex system whose architecture design faces increasing difficulties and risks. Traditional methods, such as theoretical analysis, component-level simulation and sequential simulation, are not applicable to system-level simulations of HPC systems. Even parallel simulation on large-scale parallel machines has many difficulties in scalability, reliability, generality, and efficiency. According to the current needs of HPC architecture design, this paper proposes a system-level parallel simulation platform: ArchSim. We first introduce the architecture of the ArchSim simulation platform, which is composed of a global server (GS), local server agents (LSA) and entities. Secondly, we emphasize some key techniques of ArchSim, including the synchronization protocol, the communication mechanism and the distributed checkpointing/restart mechanism. We then run a combined test of some main performance indices of ArchSim with the phold benchmark and analyze the extra overhead generated by ArchSim. Finally, based on ArchSim, we construct a parallel event-driven interconnection network simulator and a system-level simulator for a small-scale HPC system with 256 processors. The results of the performance test and HPC system simulations demonstrate that ArchSim can achieve a high speedup ratio and high scalability on a parallel host machine and can support system-level simulations for the architecture design of HPC systems.

18.

Multiagent simulation subsystem of diagnostic complexes based on device models

V. N. Vagin P. V. Os’kin 《Journal of Computer and Systems Sciences International》2006,45(6):970-982

Application of the multiagent approach in diagnostic systems based on device behavior models is considered. The architecture of a multiagent diagnostic system, as well as the semantic and spatial methods of the distribution of a device model among the agents, is presented. Working algorithms for a simulation subsystem are given and the efficiency of the multi-agent approach in diagnostic systems based on device behavior models is estimated. The described approach is tested for the semantic distribution of a device model among the agents. Our results confirm the efficiency of applying the multi-agent approach in diagnostic systems based on device behavior models.

19.

Porting an industrial sheet metal forming code to a distributed memory parallel computer

G.P. Nikishkov M. Kawka A. Makinouchi G. Yagawa S. Yoshimura 《Computers & Structures》1998,67(6):439-449

The parallel version of the sheet metal forming semi-implicit finite element code ITAS3D has been developed using the domain decomposition method (DDM) and direct solution methods at both subdomain and interface levels. The IBM Message Passing Library is used for data communication between tasks of the parallel code. Solutions of some sheet metal forming problems on an IBM SP2 computer show that the adopted DDM algorithm with the direct solver provides acceptable parallel efficiency using a moderate number of processors. A speedup of 6.7 is achieved for the problem with 20000 degrees of freedom on the 8-processor configuration.
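The reported figures can be turned into standard parallel metrics with a short calculation. The sketch below uses the usual definitions, efficiency as speedup divided by processor count and the Karp-Flatt experimentally determined serial fraction; the function names are chosen here for illustration and the metrics themselves are not quoted in the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup, processors):
    """Karp-Flatt metric e = (1/S - 1/p) / (1 - 1/p).

    Estimates the effective serial fraction (including parallel overhead)
    from a measured speedup on more than one processor.
    """
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Figures from the abstract: speedup 6.7 on 8 processors.
efficiency = parallel_efficiency(6.7, 8)
serial_fraction = karp_flatt_serial_fraction(6.7, 8)
```

With these inputs the efficiency is about 0.84 and the estimated serial fraction about 0.028, which is consistent with the abstract's description of acceptable parallel efficiency on a moderate number of processors.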

20.

Finite element implementation of stress-jump and stress-continuity conditions at porous-medium, clear-fluid interface

Hua Tan 《Computers & Fluids》2009,38(6):1118-1131

The boundary conditions at the interface between clear-fluid and porous-medium domains are very important for solving flow through an open domain adjoining a porous medium. In this Galerkin finite-element (FE) based simulation of such interface flows employing Stokes and Brinkman equations, the traditional interfacial condition based on the continuity of stress in fluid and porous media is compared with the stress-jump condition proposed by Ochoa-Tapia and Whitaker using the rigorous volume averaging method. A novel FE formulation employing a second-order adjustable tensor is proposed to implement this new stress-jump condition for full three-dimensional flows. The paper also clarifies the hitherto obscure relationship between flow variables in the fluid and porous media for the conventional stress-continuity condition. In the first validation study, involving numerical predictions of flow parallel to the interface, our FE implementation of the new stress-jump condition agrees very well with the analytical solution, thereby proving the soundness of our adjustable tensor approach. Similar excellent results were obtained for the FE implementation of the stress-continuity condition as well. A good match with the analytical solution for a constant cross-flow superimposed on the parallel flow was also achieved, while differences in velocity profiles near the interfaces were studied for the two conditions. Lastly, a complex 3D flow simulation involving a fluid and porous media interface within the unit cell of a non-crimp stitched fiber mat, used in the liquid composite molding process during the manufacture of composite materials, is undertaken. The permeability of this dual-scale fibrous porous medium, estimated using the newly implemented stress-jump condition, agrees well with the experimental result, thereby pointing to the accuracy of the FE implementation of the condition.
Our simulations reveal that the stress-jump condition leads to a much smaller boundary layer within the porous medium near the interface than the stress-continuity condition does, and hence to a lower, more accurate net flow rate through the unit cell. However, the two interfacial conditions yield similar results as the porosity decreases.