Similar documents
20 similar documents found
1.
To verify the effect of the artificial anisotropy parameters in the one‐step leapfrog hybrid implicit‐explicit finite‐difference time‐domain (FDTD) method, we simulated several microwave components with different characteristics. An auxiliary field variable is introduced, which simplifies programming and improves computational efficiency without additional computation time or memory cost. The numerical results show that, for the same simulated example, the calculation time is reduced to about one‐sixth of that of the traditional FDTD method, while the memory cost and relative error remain at a good level. Numerical experiments on a microwave circuit and an antenna demonstrate the applicability of the method.

2.
In this work, we propose a one‐step leapfrog hybrid implicit‐explicit finite‐difference time‐domain (HIE‐FDTD) method for bodies of revolution (BOR), together with its convolutional perfectly matched layer (CPML) absorbing boundary condition. In this method, the implicit difference is applied in the angular direction. All the resulting update equations remain explicit, yet the stability condition of the proposed method is relaxed: the analysis shows that its time step is determined only by the smaller of the spatial increments Δρ and Δz. A scattering example is provided to demonstrate the new algorithm, and the relative reflection error of the CPML is given in comparison with that of the Mur absorbing boundary condition.

3.
The simulation of electromagnetic (EM) wave propagation in dielectric media is presented using a Compute Unified Device Architecture (CUDA) implementation of the finite‐difference time‐domain (FDTD) method on a graphics processing unit (GPU). The FDTD formulation for dielectric media is derived in detail, and the GPU‐accelerated FDTD method based on the CUDA programming model is described with a flowchart. The accuracy and speedup of the CUDA‐implemented FDTD method are validated by simulating, on the GPU, EM waves propagating from free space into lossless and lossy dielectric media, and comparing the results with those of the original FDTD method on the CPU. This comparison demonstrates that the CUDA‐implemented FDTD method on the GPU achieves a good application speedup with reasonable accuracy. © 2016 Wiley Periodicals, Inc. Int J RF and Microwave CAE 26:512–518, 2016.
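
The dielectric-media FDTD update described above can be illustrated with a small one-dimensional CPU sketch. It assumes a z-polarised field travelling along x, a soft Gaussian source, a PEC wall at the left end and an abrupt slab from cell `slab_start` onward; the function name and every parameter value are hypothetical, and no CUDA code is shown. Each inner loop is the part a CUDA port would typically map to one thread per cell.

```python
import math

C0 = 299_792_458.0          # speed of light in vacuum (m/s)
EPS0 = 8.854187817e-12      # vacuum permittivity (F/m)
MU0 = 4e-7 * math.pi        # vacuum permeability (H/m)


def fdtd_1d_dielectric(n_cells=400, n_steps=800, slab_start=250,
                       eps_r=4.0, sigma=0.04, dx=1e-3):
    """Hypothetical 1D Yee sketch: a Gaussian pulse travels from free space into
    a dielectric slab (lossless when sigma == 0). Returns the final Ez array."""
    dt = 0.5 * dx / C0                      # Courant number 0.5, stable in 1D
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    ca = [1.0] * n_cells                    # Ez self-coefficient (1 in free space)
    cb = [dt / (EPS0 * dx)] * n_cells       # Ez curl coefficient
    eps = EPS0 * eps_r
    loss = sigma * dt / (2.0 * eps)         # conduction term averaged in time
    for k in range(slab_start, n_cells):
        ca[k] = (1.0 - loss) / (1.0 + loss)
        cb[k] = dt / (eps * dx) / (1.0 + loss)
    ch = dt / (MU0 * dx)
    for n in range(n_steps):
        # H update from mu dHy/dt = dEz/dx (one thread per cell in a CUDA port)
        for k in range(n_cells - 1):
            hy[k] += ch * (ez[k + 1] - ez[k])
        # E update from eps dEz/dt = dHy/dx - sigma Ez; ez[0] stays 0 (PEC wall)
        for k in range(1, n_cells):
            ez[k] = ca[k] * ez[k] + cb[k] * (hy[k] - hy[k - 1])
        ez[50] += math.exp(-((n - 60) / 20.0) ** 2)   # soft Gaussian source
    return ez
```

Running it once with `sigma=0.0` and once with a positive `sigma`, and comparing the peak field inside the slab, shows the attenuation introduced by the conduction term.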

4.
Collective behaviour of winged insects is a wondrous and familiar phenomenon in the real world. In this paper, we introduce a highly efficient field‐based approach to simulating various insect swarms. Its core idea is to construct a smooth yet noise‐aware governing velocity field that can be decomposed into two sub‐fields: (i) a divergence‐free curl‐noise field that models noise‐induced movements of individual insects in a swarm, and (ii) an enhanced global velocity field that controls the navigational paths along which all the insects in a swarm fly through a complex environment. Through simulation experiments and comparisons with existing crowd simulation approaches, we demonstrate that our approach effectively simulates various insect swarm behaviours, including aggregation, positive phototaxis, sedation and mass migration, among others. Besides its high efficiency, our approach is well suited to parallel implementation on GPUs (e.g. GPU acceleration achieves a speedup of more than 50 when more than 10 000 insects are simulated on an off‐the‐shelf computer). Our approach is the first multi‐agent modelling system that introduces curl‐noise into agents' velocity field and uses its non‐scattering nature to maintain non‐colliding movements in 3D crowd simulation.
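
The divergence-free property of curl noise can be checked numerically. The sketch below is a 2D simplification (the abstract describes 3D swarms): a hypothetical analytic function `potential` stands in for a noise field, and the velocity is its 2D curl v = (∂ψ/∂y, -∂ψ/∂x), so ∇·v = 0 holds by construction. It is not the authors' field design, and the navigation sub-field is omitted.

```python
import math


def potential(x, y):
    """Hypothetical smooth scalar potential standing in for a noise function."""
    return math.sin(1.3 * x) * math.cos(0.7 * y) + 0.5 * math.sin(0.4 * x + 1.1 * y)


def curl_velocity(x, y, h=1e-3):
    """2D curl of the potential, v = (dpsi/dy, -dpsi/dx), by central differences."""
    dpsi_dx = (potential(x + h, y) - potential(x - h, y)) / (2 * h)
    dpsi_dy = (potential(x, y + h) - potential(x, y - h)) / (2 * h)
    return dpsi_dy, -dpsi_dx


def divergence(x, y, h=1e-3):
    """Central-difference divergence of curl_velocity, using the same step h."""
    dvx_dx = (curl_velocity(x + h, y, h)[0] - curl_velocity(x - h, y, h)[0]) / (2 * h)
    dvy_dy = (curl_velocity(x, y + h, h)[1] - curl_velocity(x, y - h, h)[1]) / (2 * h)
    return dvx_dx + dvy_dy
```

Because the same central-difference operators appear in both terms, the discrete divergence cancels up to rounding error; this is the property that keeps agents advected by such a field from converging or diverging locally.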

5.
This paper presents a parallel framework for simulating fluids with the Smoothed Particle Hydrodynamics (SPH) method. To keep the computational cost per simulation step low, efficient parallel neighbourhood queries are proposed and compared. To further minimize the computing time for entire simulation sequences, strategies for maximizing the time step and their consequences for parallel implementations are investigated. The presented experiments illustrate that the parallel framework can efficiently compute large numbers of time steps for large scenarios. For neighbourhood queries, the paper presents optimizations for two efficient instances of uniform grids, namely spatial hashing and index sort. For implementations on parallel architectures with shared memory, the paper discusses techniques that improve the cache‐hit rate and reduce memory transfer. The performance of the parallel implementations of both optimized data structures is compared. The proposed solutions focus on systems with multiple CPUs; benefits and challenges of potential GPU implementations are only briefly discussed.
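
Spatial hashing, one of the two uniform-grid variants mentioned above, can be sketched as follows. Cells have the size of the support radius h, each cell is mapped to a bucket with the prime-XOR hash commonly used for this purpose, and a query visits the 27 surrounding cells. The table size and primes are assumptions, and this serial version ignores the cache and memory-transfer optimizations the paper addresses.

```python
import math

PRIMES = (73856093, 19349663, 83492791)   # commonly used spatial-hash primes


def cell_of(p, h):
    """Integer grid cell containing point p for cell size h."""
    return tuple(math.floor(c / h) for c in p)


def hash_cell(cell, table_size):
    i, j, k = cell
    return ((i * PRIMES[0]) ^ (j * PRIMES[1]) ^ (k * PRIMES[2])) % table_size


def build_table(points, h, table_size):
    """Bucket list: table[b] holds indices of points whose cell hashes to b."""
    table = [[] for _ in range(table_size)]
    for idx, p in enumerate(points):
        table[hash_cell(cell_of(p, h), table_size)].append(idx)
    return table


def neighbours(idx, points, table, h):
    """Indices of points closer than h to points[idx] (cell size == support radius)."""
    ci, cj, ck = cell_of(points[idx], h)
    px, py, pz = points[idx]
    found = set()   # a set, because distinct cells may hash to the same bucket
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                bucket = table[hash_cell((ci + di, cj + dj, ck + dk), len(table))]
                for j in bucket:
                    if j == idx:
                        continue
                    qx, qy, qz = points[j]
                    if (qx - px) ** 2 + (qy - py) ** 2 + (qz - pz) ** 2 < h * h:
                        found.add(j)
    return found
```

Because different cells can share a bucket, candidates are filtered by distance and collected in a set; index sort would instead sort particles by cell index and store per-cell start offsets.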

6.
A software framework that takes advantage of the parallel processing capabilities of CPUs and GPUs is designed for real‐time interactive cutting simulation of deformable objects. Deformable objects are modelled as voxels connected by links. The voxels are embedded in an octree mesh used for deformation. Cutting is performed by disconnecting the links swept by the cutting tool and then adaptively refining octree elements near the cutting‐tool trajectory. A surface mesh used for visual display is reconstructed from the disconnected links using the dual contouring method. Spatial hashing of the octree mesh and topology‐aware interpolation of the distance field are used for collision handling. Our framework uses a novel GPU implementation for inter‐object collision and object self‐collision, while tool‐object collision, cutting and deformation are assigned to the CPU, using multiple threads whenever possible. A novel method that splits cutting operations into four independent tasks running in parallel is designed. Our framework also performs data transfers between CPU and GPU concurrently with other tasks to reduce their impact on performance. Simulation tests show that, compared with three‐threaded CPU implementations, our GPU‐accelerated collision handling is 53–160% faster and the overall simulation frame rate is 47–98% higher.

7.
The subset‐sum problem is a well‐known non‐deterministic polynomial‐time complete (NP‐complete) decision problem. This paper proposes a novel and efficient implementation of a parallel two‐list algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. Implementing the three stages effectively on a GPU is not easy. Ways to achieve better performance, reasonable task distribution between CPU and GPU, effective GPU memory management, and minimization of CPU–GPU communication cost are discussed. The generation stage of the algorithm adopts a typical recursive divide‐and‐conquer strategy. Because recursion is not well supported on GPUs with compute capability below 3.5, a new vector‐based iterative mechanism is designed to replace the explicit recursion. Furthermore, to optimize the performance of the GPU implementation, this paper improves all three stages of the algorithm. The experimental results show that the GPU implementation performs much better than the CPU implementation and achieves high speedups on different GPU cards. They also show that the improved algorithm brings significant performance benefits to the GPU implementation. Copyright © 2014 John Wiley & Sons, Ltd.
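
The generation, pruning and search stages of the classic two-list algorithm can be sketched serially. This version assumes non-negative integer weights (which is what makes discarding sums above the target safe), and it builds each sorted list of subset sums iteratively by merging, in the spirit of replacing explicit recursion; it is not the paper's CUDA implementation.

```python
def sorted_subset_sums(weights):
    """Generation stage: all subset sums in ascending order, built iteratively
    by merging the current list with a shifted copy (no recursion)."""
    sums = [0]
    for w in weights:
        shifted = [s + w for s in sums]      # adding a constant keeps it sorted
        merged, i, j = [], 0, 0
        while i < len(sums) and j < len(shifted):
            if sums[i] <= shifted[j]:
                merged.append(sums[i])
                i += 1
            else:
                merged.append(shifted[j])
                j += 1
        merged.extend(sums[i:])
        merged.extend(shifted[j:])
        sums = merged
    return sums


def two_list_subset_sum(weights, target):
    """Return True if some subset of the non-negative `weights` sums to `target`."""
    half = len(weights) // 2
    # Pruning stage: with non-negative weights, partial sums above target are useless.
    a = [s for s in sorted_subset_sums(weights[:half]) if s <= target]
    b = [s for s in sorted_subset_sums(weights[half:]) if s <= target]
    # Search stage: walk a upward and b downward with two pointers.
    i, j = 0, len(b) - 1
    while i < len(a) and j >= 0:
        total = a[i] + b[j]
        if total == target:
            return True
        if total < target:
            i += 1
        else:
            j -= 1
    return False
```

Each half produces 2^(n/2) sums, and both the merge-based generation and the two-pointer search are linear in that size, versus 2^n subsets for brute-force enumeration.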

8.
Real‐time rendering of large‐scale engineering computer‐aided design (CAD) models is recognized as a challenging task. Because of the limited memory size and computational capacity of graphics processing units (GPUs), a massive model with hundreds of millions of triangles cannot be loaded and rendered in real time on most modern GPUs. In this paper, an efficient GPU out‐of‐core framework is proposed for interactively visualizing large‐scale CAD models. To make data fetching from CPU host memory to GPU device memory more efficient, a parallel offline geometry compression scheme is introduced that minimizes the storage cost of each primitive by compressing the level‐of‐detail (LOD) geometries into a highly compact format. At the rendering stage, occlusion culling and LOD processing algorithms are integrated and implemented with an efficient GPU‐based approach to determine the minimal set of primitives to be transferred for each frame. A prototype software system is developed to preprocess and render massive CAD models with the proposed framework. Experimental results show that, with our framework, users can walk through massive CAD models with hundreds of millions of triangles at high frame rates. Copyright © 2016 John Wiley & Sons, Ltd.
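
The abstract does not specify the compact format, but a common way to shrink geometry before host-to-device transfer is uniform quantization against the bounding box. The sketch below quantizes positions to 16-bit integers per axis and packs them into 6 bytes per vertex instead of 12 for three float32 values; the function names and bit depth are assumptions, not the paper's scheme.

```python
import struct


def quantize_vertices(vertices, bits=16):
    """Uniformly quantize 3D positions to `bits`-bit integers per axis,
    relative to the axis-aligned bounding box. Returns (codes, origin, step)."""
    lo = [min(v[d] for v in vertices) for d in range(3)]
    hi = [max(v[d] for v in vertices) for d in range(3)]
    levels = (1 << bits) - 1
    step = [(hi_d - lo_d) / levels if hi_d > lo_d else 1.0
            for lo_d, hi_d in zip(lo, hi)]
    codes = [tuple(round((v[d] - lo[d]) / step[d]) for d in range(3)) for v in vertices]
    return codes, lo, step


def pack_uint16(codes):
    """Little-endian uint16 triples: 6 bytes per vertex (requires bits <= 16)."""
    return b"".join(struct.pack("<3H", *c) for c in codes)


def dequantize(codes, lo, step):
    """Reconstruct approximate float positions from the integer codes."""
    return [tuple(lo[d] + c[d] * step[d] for d in range(3)) for c in codes]
```

The reconstruction error per axis is at most half a quantization step, so the bit depth can be chosen per LOD level to trade precision against bandwidth.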

9.
Since the layered low‐density parity‐check (LDPC) decoding algorithm was proposed, it has been possible to exploit the diversity gain to achieve performance comparable to traditional two‐phase message passing (TPMP) decoding, but with about twice as fast decoding convergence. To reduce the decoding time of the layered LDPC decoder, a graphics processing unit (GPU) is used as the modem processor so that the decoding procedure can be processed in parallel by numerous GPU threads. In this paper, we present parallel algorithms and efficient GPU implementations for two different layered message‐passing schemes: row‐layered and column‐layered decoding. In the experiments, the quasi‐cyclic LDPC codes for WiFi (802.11n) and WiMAX (802.16e) are decoded by the proposed layered LDPC decoders. The experimental results show that our decoder has bit error ratio (BER) performance comparable to that of a TPMP decoder. The peak throughput is 712 Mbps, about two orders of magnitude faster than a CPU implementation and comparable to dedicated hardware solutions. Compared with the fastest existing GPU‐based implementation, the presented decoder achieves a 2.3× performance improvement. Copyright © 2013 John Wiley & Sons, Ltd.
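
A serial sketch of row-layered decoding with the min-sum approximation illustrates the idea: each parity-check row is a layer, and the a-posteriori LLRs are updated immediately after each layer, which is the source of the faster convergence compared with two-phase (flooding) schedules. The function name is hypothetical, the sketch assumes every row has at least two ones, and a GPU version would typically process independent rows or codewords in parallel threads.

```python
def layered_min_sum(H, llr, max_iters=10):
    """Row-layered min-sum LDPC decoding. H: list of 0/1 rows; llr > 0 favours bit 0.
    Returns the hard-decision bits after convergence or max_iters iterations."""
    n = len(llr)
    rows = [[v for v in range(n) if H[c][v]] for c in range(len(H))]
    L = list(llr)                                    # a-posteriori LLRs
    R = [[0.0] * len(vs) for vs in rows]             # check-to-variable messages
    bits = [1 if x < 0 else 0 for x in L]
    for _ in range(max_iters):
        for c, vs in enumerate(rows):                # each parity-check row is a layer
            q = [L[v] - R[c][k] for k, v in enumerate(vs)]   # variable-to-check
            for k, v in enumerate(vs):
                others = q[:k] + q[k + 1:]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                R[c][k] = sign * min(abs(x) for x in others)
                L[v] = q[k] + R[c][k]                # posterior updated immediately
        bits = [1 if x < 0 else 0 for x in L]
        if all(sum(bits[v] for v in vs) % 2 == 0 for vs in rows):
            break                                    # all parity checks satisfied
    return bits
```

For example, with the (7,4) Hamming parity-check matrix `[[1,0,1,0,1,0,1],[0,1,1,0,0,1,1],[0,0,0,1,1,1,1]]` and LLRs `[2, 2, 2, 2, 2, 2, -0.5]`, the first layer already flips the weak negative LLR and the decoder returns the all-zero codeword.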

10.
In this paper, we provide a smooth extension of the energy‐aware Gauss‐Seidel iteration to the Position‐Based Dynamics (PBD) method. This extension is inspired by equalizing the changes in kinetic and potential energy and builds on the recent extended version of the PBD algorithm (XPBD). The proposed method is not meant to conserve the total energy of the system; it modifies each position constraint based on the equality of the kinetic and potential energy changes within the Gauss‐Seidel process of the XPBD algorithm. Our extension provides an implicit solution that yields relatively better stiffness when simulating elastic objects. We apply our solution directly within each Gauss‐Seidel iteration, and it is independent of both the simulation step size and the integration method. To demonstrate the benefits of the proposed extension at higher frame rates, we develop an efficient and practical mesh coloring algorithm for the XPBD method that enables parallel processing on a GPU. During the initialization phase, all mesh primitives are grouped according to their connectivity. Afterwards, during the simulation phase, these groups are computed in parallel on the GPU. We demonstrate the benefits of our method with many spring‐potential and strain‐based continuous material constraints. The proposed algorithm is easy to implement and fits seamlessly into existing position‐based frameworks.
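
The basic XPBD Gauss-Seidel step that the extension builds on, together with a greedy coloring that lets constraints be solved in parallel, can be sketched as follows. The distance-constraint update uses the standard XPBD compliance form α̃ = α/Δt²; the energy-equalization modification proposed in the paper is not reproduced, and the coloring is a simple greedy stand-in for the authors' algorithm.

```python
import math


def xpbd_distance_step(x1, x2, w1, w2, rest, compliance, lam, dt):
    """One XPBD projection of C = |x1 - x2| - rest (w1, w2: inverse masses).
    Returns the updated positions and the accumulated Lagrange multiplier."""
    d = [a - b for a, b in zip(x1, x2)]
    length = math.sqrt(sum(c * c for c in d))
    if length < 1e-12:
        return x1, x2, lam                      # degenerate: no usable gradient
    n = [c / length for c in d]                 # gradient of C w.r.t. x1 (x2 gets -n)
    C = length - rest
    alpha_tilde = compliance / (dt * dt)
    dlam = (-C - alpha_tilde * lam) / (w1 + w2 + alpha_tilde)
    x1 = [a + w1 * dlam * c for a, c in zip(x1, n)]
    x2 = [b - w2 * dlam * c for b, c in zip(x2, n)]
    return x1, x2, lam + dlam


def greedy_color(constraints):
    """Assign each (i, j) constraint the smallest color not used by any other
    constraint touching particle i or j."""
    colors, used_at = [], {}
    for i, j in constraints:
        used = used_at.get(i, set()) | used_at.get(j, set())
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        used_at.setdefault(i, set()).add(c)
        used_at.setdefault(j, set()).add(c)
    return colors
```

Constraints that receive the same color share no particle, so all constraints of one color can be projected at the same time (for example, one GPU thread each) without write conflicts. With zero compliance, a single step restores the rest length exactly.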

11.
Using the finite‐difference time‐domain (FDTD) method and Compute Unified Device Architecture (CUDA) technology, this paper studies the propagation characteristics of electromagnetic waves in left‐handed materials (LHM). An LHM slab matched to free space was modelled, and its secondary focusing phenomenon was simulated. Comparison with a serial FDTD program showed that this method is highly accurate. The phase compensation effect and the inverse Snell effect of LHM were also examined with the CUDA‐based parallel FDTD method, and the results were consistent with theoretical studies. A comparison of the calculation times of the traditional FDTD program and the CUDA‐based parallel FDTD program shows that the latter is more efficient. This parallel method therefore offers a more efficient way to study LHM.
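
Two textbook relations behind the observed effects can be checked with a few lines of arithmetic. Snell's law with a negative refractive index sends the refracted ray to the same side of the normal (the inverse Snell effect), and in ray optics an ideal n = -1 slab of thickness d focuses a source at distance a < d twice: once inside the slab and once behind it. The function names are hypothetical, and the sketch ignores losses and dispersion, which an FDTD model of a real LHM must include.

```python
import math


def refraction_angle_deg(n1, n2, theta_i_deg):
    """Snell's law n1 sin(theta_i) = n2 sin(theta_t); n2 < 0 models an ideal LHM.
    Returns None for total internal reflection."""
    s = n1 * math.sin(math.radians(theta_i_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def flat_lens_foci(source_distance, slab_thickness):
    """Ray-optics foci of an ideal n = -1 slab in free space: depth of the first
    focus inside the slab and distance of the second focus behind the back face."""
    if not 0 < source_distance < slab_thickness:
        raise ValueError("source must be closer to the slab than its thickness")
    return source_distance, slab_thickness - source_distance
```

A negative refraction angle from `refraction_angle_deg(1.0, -1.0, 30.0)` is the inverse Snell effect, and the second value returned by `flat_lens_foci` is the secondary focus that such simulations look for.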

12.
The hybrid implicit‐explicit (HIE) finite‐difference time‐domain (FDTD) method with the convolutional perfectly matched layer (CPML) is extended to a full three‐dimensional scheme in this article. To better demonstrate the application of the CPML, the entire derivation is presented, in which the fine‐scale structure is moved from the y‐direction to the z‐direction of propagation. Numerical examples are used to verify the efficiency and accuracy of the proposed method. Numerical results show that HIE‐FDTD with CPML truncation has a relative reflection error similar to that of FDTD with CPML, and much lower than that of methods using the Mur absorbing boundary. Even when the Courant‐Friedrichs‐Lewy number rises to 8, the maximum relative error of the proposed HIE‐CPML remains below −71 dB, and the CPU time is nearly 72.1% less than that of FDTD‐CPML. As an example, a low‐pass filter is simulated using the FDTD‐CPML and HIE‐CPML methods. The curves obtained by the two methods agree closely, with maximum errors below −79 dB. Furthermore, the CPU time savings are even larger: for the same example, HIE‐CPML needs only 26.8% of the CPU time of the FDTD‐CPML method.
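
For reference, the sketch below evaluates the standard recursive-convolution CPML coefficients b = exp(-(σ/κ + α)Δt/ε₀) and a = σ(b - 1)/(σκ + κ²α), with polynomially graded σ and κ and a linearly decreasing α, plus the dB conversion used for relative errors. The grading order, κ_max, α_max and the σ_max rule of thumb are assumptions, and the HIE-specific update equations are not shown.

```python
import math

EPS0 = 8.854187817e-12      # vacuum permittivity (F/m)
ETA0 = 376.730313668        # free-space impedance (ohm)


def cpml_coefficients(depth, thickness, dx, dt, m=3, kappa_max=5.0, alpha_max=0.05):
    """CPML coefficients (b, a, kappa) at `depth` metres into a layer `thickness`
    metres thick. sigma_max uses the common 0.8 (m + 1) / (eta0 dx) rule of thumb."""
    r = depth / thickness
    sigma_max = 0.8 * (m + 1) / (ETA0 * dx)
    sigma = sigma_max * r ** m                    # polynomial grading, 0 at interface
    kappa = 1.0 + (kappa_max - 1.0) * r ** m
    alpha = alpha_max * (1.0 - r)                 # largest at the interface
    b = math.exp(-(sigma / kappa + alpha) * dt / EPS0)
    denom = sigma * kappa + kappa ** 2 * alpha
    a = sigma * (b - 1.0) / denom if denom > 0 else 0.0
    return b, a, kappa


def relative_error_db(error, reference):
    """Relative error in dB: 20 log10(|error| / |reference|)."""
    return 20.0 * math.log10(abs(error) / abs(reference))
```

On this scale, -71 dB corresponds to a relative error of about 2.8 × 10⁻⁴ (10^(-71/20)).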

13.
In this article, a hybrid algorithm based on the traditional finite‐difference time‐domain (FDTD) algorithm and the weakly conditionally stable finite‐difference time‐domain (WCS‐FDTD) algorithm is proposed. In this algorithm, the calculation domain is divided into a fine‐grid region and a coarse‐grid region. The traditional FDTD method is used to calculate the field values in the coarse‐grid region, while the WCS‐FDTD method is used in the fine‐grid region. A spatial interpolation scheme is applied at the interface between the coarse‐grid and fine‐grid regions to ensure the stability and precision of the hybrid algorithm. As a result, a relatively large time step, determined only by the spatial cell sizes in the coarse‐grid region, is applied to the entire calculation domain. Numerical results confirm that this scheme significantly reduces both computation time and memory requirements compared with the conventional FDTD and WCS‐FDTD methods.
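
The benefit of letting the coarse grid set the time step can be seen from the Courant limit of the conventional 3D Yee scheme, Δt ≤ 1/(c√(1/Δx² + 1/Δy² + 1/Δz²)). The cell sizes below are hypothetical, and the sketch does not reproduce the WCS-FDTD stability condition used in the fine-grid region.

```python
import math

C0 = 299_792_458.0   # speed of light in vacuum (m/s)


def yee_cfl_dt(dx, dy, dz, c=C0):
    """Courant time-step limit of the conventional 3D Yee FDTD scheme."""
    return 1.0 / (c * math.sqrt(1.0 / dx ** 2 + 1.0 / dy ** 2 + 1.0 / dz ** 2))


coarse = 3e-3   # hypothetical coarse cell size (m)
fine = 1e-3     # hypothetical fine cell size (m)
dt_coarse = yee_cfl_dt(coarse, coarse, coarse)
dt_fine = yee_cfl_dt(fine, fine, fine)
step_ratio = dt_coarse / dt_fine   # equals coarse / fine for cubic cells
```

With cubic cells, the admissible time step scales with the cell size, so refining the whole domain from 3 mm to 1 mm cells would triple the number of time steps; keeping the coarse-grid time step avoids that penalty.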

14.
This article presents a new design of a multiband planar inverted‐F antenna with a slotted ground plane and an S‐etched slot on the radiating patch. The proposed antenna is optimized with an efficient global hybrid optimization method that combines bacterial swarm optimization and the Nelder‐Mead algorithm (BSO‐NM) to cover six important service bands, GSM900, GPS1575, DCS1800, PCS1900, ISM2450 and 4G5000 MHz, with enhanced bandwidths. The BSO‐NM algorithm, written in Matlab, is linked to the CST Microwave Studio software to simulate the antenna. To validate the results, the antenna is also analyzed using the finite‐difference time‐domain (FDTD) method, and good agreement is obtained between the EM simulation results and those of the FDTD method. © 2012 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2013.
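
The global-then-local structure of a hybrid optimizer can be sketched without any EM solver. Here plain uniform random sampling stands in for bacterial swarm optimization (a deliberate simplification), and a minimal Nelder-Mead simplex search refines the best sample. In the paper the objective would come from CST Microwave Studio; the function names and settings below are hypothetical.

```python
import random


def nelder_mead(f, x0, step=0.5, max_iter=500, tol=1e-12):
    """Minimal Nelder-Mead search (reflection, expansion, contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if d == i else 0.0) for d, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[d] for p in simplex[:-1]) / n for d in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:                     # try to expand further
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:                                   # inside contraction
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:                               # shrink toward the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    k = min(range(n + 1), key=values.__getitem__)
    return simplex[k], values[k]


def hybrid_minimize(f, bounds, n_samples=200, seed=0):
    """Global stage (random sampling standing in for the swarm search),
    then Nelder-Mead refinement from the best sample."""
    rng = random.Random(seed)
    samples = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    return nelder_mead(f, min(samples, key=f))
```

For example, `hybrid_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5, 5), (-5, 5)])` returns a point close to (1, -2).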

15.
We present a GPU‐based streaming algorithm to perform high‐resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response and velocity updating, to GPU‐based kernels and data structures. Our algorithm handles intra‐object and inter‐object collisions, contacts and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues related to obtaining high throughput on many‐core GPUs. In practice, our algorithm can perform high‐fidelity simulation of a cloth mesh with 2M triangles using 3GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high‐end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement compared with a single‐threaded CPU‐based algorithm, and about one order of magnitude improvement over a 16‐core CPU‐based parallel implementation.

16.
The computing power of graphics processing units (GPUs) has increased rapidly, and there has been extensive research on general‐purpose computing on GPUs (GPGPU) for cryptographic algorithms such as RSA, the Elliptic Curve Cryptosystem (ECC), NTRU and the Advanced Encryption Standard. With the rise of GPGPU, commodity computers have become complex heterogeneous GPU+CPU systems. This new architecture poses new challenges and opportunities in high‐performance computing. In this paper, we present high‐speed parallel implementations of the rainbow method based on perfect tables, known as the most efficient time‐memory trade‐off, on a heterogeneous GPU+CPU system. We give a complete analysis of the effect of multiple checkpoints on reducing the cost of false alarms and use it for load balancing between GPU and CPU. On a GTX460, our implementation is about 1.86 and 3.25 times faster than the other GPU‐accelerated implementations RainbowCrack and Cryptohaze, respectively; on a GTX580, it is 1.53 and 2.40 times faster. Copyright © 2014 John Wiley & Sons, Ltd.
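
A toy rainbow table shows the roles of the column-dependent reduction function, chain endpoints and false alarms. The password space (four lowercase letters), MD5, the reduction function and the chain length are all hypothetical choices; the checkpoint technique and the GPU/CPU load balancing from the paper are not shown.

```python
import hashlib

CHARSET = "abcdefghijklmnopqrstuvwxyz"   # hypothetical password space
PW_LEN = 4


def md5(pw):
    return hashlib.md5(pw.encode()).digest()


def reduce_hash(digest, column):
    """Column-dependent reduction: maps a digest back into the password space."""
    n = int.from_bytes(digest[:8], "big") + column
    chars = []
    for _ in range(PW_LEN):
        n, r = divmod(n, len(CHARSET))
        chars.append(CHARSET[r])
    return "".join(chars)


def chain_end(start, chain_len):
    pw = start
    for col in range(chain_len):
        pw = reduce_hash(md5(pw), col)
    return pw


def build_perfect_table(starts, chain_len):
    """Keep one chain per distinct endpoint, so no two stored chains merge."""
    table = {}
    for s in starts:
        table.setdefault(chain_end(s, chain_len), s)
    return table


def lookup(target, table, chain_len):
    """Return a password whose MD5 equals `target` if it lies on a stored chain."""
    for col in range(chain_len - 1, -1, -1):     # guess the target's column
        pw = reduce_hash(target, col)
        for c in range(col + 1, chain_len):
            pw = reduce_hash(md5(pw), c)
        start = table.get(pw)
        if start is None:
            continue
        cand = start                             # rebuild the chain from its start
        for c in range(chain_len):
            if md5(cand) == target:
                return cand
            cand = reduce_hash(md5(cand), c)
        # endpoint matched but target not on this chain: a false alarm
    return None
```

A false alarm occurs when a guessed column leads to a stored endpoint although the target is not on that chain; rebuilding the chain is the wasted work that checkpoints are meant to reduce.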

17.
Branch‐and‐bound (B&B) algorithms are attractive methods for solving combinatorial optimization problems to optimality, using an implicit enumeration of a dynamically built tree‐based search space. Nevertheless, they are time‐consuming for large problem instances. Therefore, pruning tree nodes (subproblems) is traditionally used as a powerful mechanism to reduce the size of the explored search space. Pruning requires the bounding operation, which applies a lower bound function to the subproblems generated during the exploration process. Preliminary experiments on the flow‐shop scheduling problem (FSP) have shown that the bounding operation consumes over 98% of the execution time of the B&B algorithm. In this paper, we investigate the use of graphics processing unit (GPU) computing as a major complementary way to speed up the search. We revisit the design and implementation of the parallel bounding model on GPU accelerators. The proposed approach enables data access optimization. Extensive experiments have been carried out on well‐known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared with a single‐core CPU execution on an Intel Core i7‐970 processor without a GPU, speedups of more than 100× are achieved for large problem instances. At equivalent peak performance, GPU‐accelerated B&B is twice as fast as its multi‐core counterpart. Copyright © 2013 John Wiley & Sons, Ltd.
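
The bounding operation that dominates B&B run time can be illustrated on the permutation flow-shop problem. The sketch below uses a simple one-machine lower bound and a serial depth-first search; the bound is a textbook choice, not the one used in the paper. In a GPU design along the lines described above, the bounds of all children of a node are independent and could be evaluated in parallel.

```python
def completion_times(prefix, p):
    """Completion time of the last scheduled job on each machine.
    p[j][k] is the processing time of job j on machine k."""
    finish = [0] * len(p[0])
    for j in prefix:
        finish[0] += p[j][0]
        for k in range(1, len(finish)):
            finish[k] = max(finish[k], finish[k - 1]) + p[j][k]
    return finish


def lower_bound(prefix, remaining, p):
    """One-machine bound: every remaining job still needs machine k, and the last
    of them must afterwards run its tail on machines k+1..m-1."""
    finish = completion_times(prefix, p)
    if not remaining:
        return finish[-1]
    bound = 0
    for k in range(len(finish)):
        work = sum(p[j][k] for j in remaining)
        tail = min(sum(p[j][k + 1:]) for j in remaining)
        bound = max(bound, finish[k] + work + tail)
    return bound


def branch_and_bound(p):
    """Depth-first B&B for the permutation flow-shop makespan: (makespan, order)."""
    best = [float("inf"), None]

    def explore(prefix, remaining):
        if lower_bound(prefix, remaining, p) >= best[0]:
            return                              # prune: cannot beat the incumbent
        if not remaining:
            best[0], best[1] = completion_times(prefix, p)[-1], list(prefix)
            return
        for j in sorted(remaining):             # branch on the next job
            remaining.remove(j)
            prefix.append(j)
            explore(prefix, remaining)
            prefix.pop()
            remaining.add(j)

    explore([], set(range(len(p))))
    return best[0], best[1]
```

The bound is valid because machine k cannot finish the remaining work earlier than its current completion time plus that work, and the job it processes last still has to pass through the downstream machines.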

18.
A novel dispersion formulation of the 2D alternating‐direction implicit (ADI) finite‐difference time‐domain (FDTD) method is presented. The formulation is based on an increasing process analysis of the monochromatic wave in free space. A numerical experiment scheme is designed to verify the accuracy of the proposed formulation. The results obtained from the proposed formulation agree well with those from the numerical experiments, and the proposed formulation is more accurate than those reported in the literature. © 2006 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2006.

19.
We present an efficient algorithm for object‐space proximity queries between multiple deformable triangular meshes. Our approach uses the rasterization capabilities of the GPU to produce an image‐space representation of the vertices. Using this image‐space representation, inter‐object vertex‐triangle distances and closest points lying under a user‐defined threshold are computed in parallel by conservative rasterization of bounding primitives and sorted using atomic operations. We additionally introduce a similar technique to detect penetrating vertices. We show how mechanisms of modern GPUs such as mipmapping, Early‐Z and Early‐Stencil culling can optimize the performance of our method. Our algorithm is able to compute dense proximity information for complex scenes made of more than a hundred thousand triangles in real time, outperforming a CPU implementation based on bounding volume hierarchies by more than an order of magnitude.
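
As a CPU reference for what such a query returns, the sketch below computes, for every vertex and triangle, the closest point on the triangle (the region-based method popularized in Ericson's Real-Time Collision Detection) and keeps the pairs under a threshold, sorted by distance. It is a brute-force stand-in, not the rasterization-based GPU method, and the function names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc, found by testing Voronoi regions."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                     # vertex region A
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                     # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        t = d1 / (d1 - d3)                           # edge region AB
        return (a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2])
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                     # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        t = d2 / (d2 - d6)                           # edge region AC
        return (a[0] + t * ac[0], a[1] + t * ac[1], a[2] + t * ac[2])
    va = d3 * d6 - d5 * d4
    if va <= 0 and d4 - d3 >= 0 and d5 - d6 >= 0:
        t = (d4 - d3) / ((d4 - d3) + (d5 - d6))      # edge region BC
        bc = sub(c, b)
        return (b[0] + t * bc[0], b[1] + t * bc[1], b[2] + t * bc[2])
    denom = 1.0 / (va + vb + vc)                     # face region (barycentric)
    v, w = vb * denom, vc * denom
    return (a[0] + ab[0] * v + ac[0] * w,
            a[1] + ab[1] * v + ac[1] * w,
            a[2] + ab[2] * v + ac[2] * w)


def proximity_pairs(vertices, triangles, threshold):
    """(distance, vertex index, triangle index, closest point) for every pair
    closer than `threshold`, sorted by distance."""
    hits = []
    for vi, p in enumerate(vertices):
        for ti, (a, b, c) in enumerate(triangles):
            q = closest_point_on_triangle(p, a, b, c)
            d = math.dist(p, q)
            if d < threshold:
                hits.append((d, vi, ti, q))
    hits.sort(key=lambda h: h[0])
    return hits
```

The brute-force loop costs one closest-point test per vertex-triangle pair, which is exactly the work that culling and image-space techniques aim to avoid.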

20.
This paper presents a novel method that improves the efficiency of high‐quality surface reconstructions for particle‐based fluids using Marching Cubes. By constructing the scalar field only in a narrow band around the surface, the computational complexity and the memory consumption scale with the fluid surface instead of the volume. Furthermore, a parallel implementation of the method is proposed. The presented method works with various scalar field construction approaches. Experiments show that our method reconstructs high‐quality surface meshes efficiently even on single‐core CPUs. It scales nearly linearly on multi‐core CPUs and runs up to fifty times faster on GPUs compared to the original scalar field construction approaches.
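
The narrow-band idea can be sketched in 2D: find grid cells that contain particles, keep those next to an empty cell as surface cells, and evaluate the scalar field only at grid nodes around them. The kernel, band width and 2D setting are assumptions, Marching Cubes itself is omitted, and the field evaluation loops over all particles for brevity, where a real implementation would reuse a neighbourhood grid.

```python
import math


def narrow_band_field(particles, h, radius, band=1):
    """2D sketch: scalar field evaluated only on grid nodes near the fluid surface.
    particles: list of (x, y); h: grid spacing; radius: kernel support."""
    occupied = {(math.floor(x / h), math.floor(y / h)) for x, y in particles}

    def ring(cx, cy):
        return [(cx + i, cy + j) for i in (-1, 0, 1) for j in (-1, 0, 1) if i or j]

    # Surface cells: occupied cells with at least one empty 8-neighbour.
    surface = {c for c in occupied if any(nb not in occupied for nb in ring(*c))}

    # Narrow band of grid nodes: corners of surface cells, widened by `band` cells.
    nodes = {(cx + i, cy + j)
             for cx, cy in surface
             for i in range(-band, band + 2)
             for j in range(-band, band + 2)}

    # Hypothetical polynomial kernel, summed over particles within `radius`.
    r2 = radius * radius
    field = {}
    for nx, ny in nodes:
        px, py = nx * h, ny * h
        total = 0.0
        for x, y in particles:
            d2 = (x - px) ** 2 + (y - py) ** 2
            if d2 < r2:
                total += (1.0 - d2 / r2) ** 3
        field[(nx, ny)] = total
    return field
```

For a densely filled disc, the evaluated node count then grows with the circumference rather than the area, which is the scaling argument made above.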
