Found 20 similar articles (search time: 0 ms)
1.
Yousef Elkurdi Evgueni Souleimanov Warren J. Gross 《Computer Physics Communications》2008,178(8):558-570
The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool with diverse applications ranging from structural engineering to electromagnetic simulation. Trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), and interest in exploiting this technology has grown in the scientific community. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme which enables it to process arbitrarily large matrices without changing the number of PEs in the architecture. Therefore, this architecture is limited only by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz, obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth, typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved. Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.
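The operation this abstract accelerates can be illustrated in software. The sketch below is a plain-Python sparse matrix-vector product in compressed sparse row (CSR) form, not the paper's striping scheme or hardware pipeline. The helper `peak_gflops` reproduces the quoted 1.76 GFLOPS peak under the assumption, not stated in the abstract, that each PE completes one multiply and one add per clock cycle. All function names are hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def peak_gflops(num_pes, clock_mhz, flops_per_pe_per_cycle=2):
    """Peak rate, assuming each PE does one multiply and one add per cycle."""
    return num_pes * clock_mhz * flops_per_pe_per_cycle / 1000.0


# 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] in CSR form.
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
prototype_peak = peak_gflops(num_pes=8, clock_mhz=110)
```

Under that two-operations-per-cycle assumption, 8 PEs at 110 MHz give 8 × 110 × 2 = 1760 MFLOPS, which matches the stated peak.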
2.
Rui P.S. Fartaria Pedro C.R. Rodrigues Fernando M.S. Silva Fernandes 《Computer Physics Communications》2006,175(2):116-121
A time-saving algorithm for the Metropolis Monte Carlo method is presented. The technique is tested with different potential models and numbers of particles. The coupling of the method with neighbor lists, linked lists, Ewald sum and reaction field techniques is also analyzed. It is shown that the proposed algorithm is particularly suitable for computationally heavy intermolecular potentials.
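For orientation, the baseline that such an algorithm speeds up is the standard Metropolis acceptance rule. The sketch below shows only that textbook rule, applied to a one-dimensional harmonic potential chosen here for illustration. It is not the time-saving variant, which the abstract does not describe, and the function names are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, beta, rng):
    """Accept downhill moves always, uphill moves with probability exp(-beta * delta_e)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-beta * delta_e)


def sample_harmonic(n_steps, beta=1.0, step=1.0, seed=0):
    """Sample x from exp(-beta * x^2 / 2) with single-particle trial moves."""
    rng = random.Random(seed)
    x = 0.0
    energy = 0.5 * x * x
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        trial_energy = 0.5 * trial * trial
        if metropolis_accept(trial_energy - energy, beta, rng):
            x, energy = trial, trial_energy
        # A rejected move repeats the old state in the chain.
        samples.append(x)
    return samples


samples = sample_harmonic(20000)
mean_square = sum(s * s for s in samples) / len(samples)  # should be near 1 / beta
```

The energy evaluation for each trial move is the expensive part for heavy intermolecular potentials, which is where a time-saving scheme would pay off.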
3.
Andrei Afanasev Alexander Ilyichev Vladimir Zykunov 《Computer Physics Communications》2007,176(3):218-231
The Monte Carlo generator MERADGEN 1.0 for the simulation of radiative events in parity conserving doubly-polarized Møller scattering has been developed. Analytical integration, wherever possible, makes the generation fast and accurate. Some numerical tests and histograms are presented.
Program summary
Program title: MERADGEN 1.0
Catalogue identifier: ADYM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYM_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Programming language: FORTRAN 77
Computer(s) for which the program has been designed: all
Operating system(s) for which the program has been designed: Linux
RAM required to execute with typical data: 1 MB
No. of lines in distributed program, including test data, etc.: 2196
No. of bytes in distributed program, including test data, etc.: 23 501
Distribution format: tar.gz
Has the code been vectorized or parallelized?: no
Number of processors used: 1
Supplementary material: none
External routines/libraries used: none
CPC Program Library subprograms used: none
Nature of problem: Simulation of radiative events in parity conserving doubly-polarized Møller scattering.
Solution method: Monte Carlo method for simulation within QED, analytical integration wherever it is possible that provides rather fast and accurate generation.
Restrictions: none
Unusual features: none
Additional comments: none
Running time: The simulation of 108 radiative events for itest:=1 takes up to 45 seconds on AMD Athlon 2.80 GHz processor.
4.
This paper focuses on the implementation and performance analysis of a smooth particle mesh Ewald method on several parallel computers. We present the details of the algorithms and of our implementation, which are used to optimize parallel efficiency on these machines.
5.
V.A. Poghosyan 《Computer Physics Communications》2005,170(3):287-295
This paper describes a package for calculations of expressions with Dirac matrices. Advantages over existing similar packages are described. The MatrixExp package is intended for simplification of complex expressions involving γ-matrices, providing such tools as automatic Feynman parameterization, integration in d-dimensional space, and sorting and grouping of results in a given order. In comparison with the existing similar package Tracer, MatrixExp also offers enhanced input capabilities. User-available functions of the MatrixExp package are described in detail. An example calculation of a Feynman diagram for the process b→sγg using functions of the MatrixExp package is also presented.
Program summary
Title of program: MatrixExp
Catalogue identifier: ADWB
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWB
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Programming language: MATHEMATICA
Computer: PC Pentium
Operating system: Windows
No. of lines in distributed program, including test data, etc.: 1551
No. of bytes in distributed program, including test data, etc.: 16 040
Distribution format: tar.gz
RAM: loading the package uses approx. 3 500 000 bytes of RAM. However memory required for calculations depends heavily on the expressions in the view, as the package uses recursive functions, and MATHEMATICA dynamically allocates memory. Package has been tested to work on PC Pentium II 233 MHz with 128 Mb of memory calculating typical diagrams of contemporary calculations.
Nature of problem: Feynman diagram calculation, simplification of expressions with γ-matrices
Solution method: Analytic transformations, dimensional regularization, Feynman parameterization
Restrictions: MatrixExp package works only with single line of expressions (G[l1,]), in contrast to the Tracer package that works with multiple lines, i.e., the following is possible in Tracer, but not in MatrixExp: G[l1,]**G[l2,]**G[l3,], which will return the result of G[l1,]**G[l1,]**G[l1,]….
Unusual features: none
Running time: Seconds for expressions with several different γ-matrices on Pentium IV 1.8 GHz and of the order of a minute on Pentium II 233 MHz. Calculation times rise with the number of matrices.
6.
Yiming Li 《Computer Physics Communications》2003,153(3):359-372
Various self-consistent semiconductor device simulation approaches require the solution of the Poisson equation, which describes the potential distribution for a specified doping profile (or charge density). In this paper, we solve the multi-dimensional semiconductor nonlinear Poisson equation numerically with the finite volume method and the monotone iterative method on a Linux cluster. Based on the nonlinear property of the Poisson equation, the proposed method converges monotonically for arbitrary initial guesses. Compared with Newton's iterative method, it is easy to implement, relatively robust, and fast, requiring much less computation time, and its algorithm is inherently parallel in large-scale computing. The presented method has been successfully implemented; the developed parallel nonlinear Poisson solver, tested on a variety of devices, shows good efficiency and robustness. Benchmarks are also included to demonstrate the excellent parallel performance of the method.
7.
In this paper, we propose a basis set approach by the Constrained Interpolation Profile (CIP) method for the calculation of bound and continuum wave functions of the Schrödinger equation. This method uses a simple polynomial basis set that is easily extendable to any desired higher-order accuracy. The interpolating profile is chosen so that the subgrid scale solution approaches the local real solution by the constraints from the spatial derivative of the original equation. Thus the solution even on the subgrid scale becomes consistent with the master equation. By increasing the order of the polynomial, this solution quickly converges. The method is tested on the one-dimensional Schrödinger equation and is proven to give solutions a few orders of magnitude higher in accuracy than conventional methods for the lower-lying eigenstates. The method is straightforwardly applicable to various types of partial differential equations.
8.
The phase behaviour of three soft core spherocylinder models is investigated with a view to producing an effective potential for use in coarse-grained simulations of liquid crystal phases and polymers composed of rigid and flexible segments. Provided potentials are not made too soft, two of the soft core models are found to work well in terms of successfully reproducing mesophases and in providing considerable improvements in computational speed over other commonly used coarse-grained models. In Monte Carlo simulations a soft-core spherocylinder model in which a cut and shifted Lennard-Jones potential is truncated with a linear tangential potential is found to be particularly effective; while for molecular dynamics a better model is provided by a DPD-like quadratic potential. Here, computational speed-ups of 20-30× are seen in equilibration times in comparison to the well-known soft repulsive spherocylinder (SRS) model. The quadratic potential is used in an additional set of coarse-grained simulations of a liquid crystal with a flexible chain, which exhibits spontaneous formation of a nematic phase. The use of different types of interaction sites is also illustrated by the simulation of a spherocylinder with two “tails” formed from spheres. Here, varying the hardness of the sphere-spherocylinder interaction potential allows the formation of a smectic-A phase which exhibits microphase separation.
9.
Antonio Soto Meca Francisco Alhama López Carlos González Fernández 《Computer Physics Communications》2007,177(9):720-728
The network simulation method, based on the formal equivalence between physical systems and electrical networks, solves numerical problems of relative mathematical complexity in a versatile, efficient and computationally fast way. In this paper, the method is applied for the first time to the design of a general purpose model for simulating two-dimensional transient density-driven flow and solute transport through porous media, a mathematical model made up of coupled, nonlinear differential equations. Using the Boussinesq approximation and the stream function formulation, the model is used to solve two typical problems related to groundwater flows. Isochlor concentration and stream function curves are presented and successfully compared with those of other authors. Simulation is carried out using the digital computer program Pspice with relatively low computing times.
10.
We present two sequential and one parallel global optimization codes that belong to the stochastic class, and an interface routine that enables the use of the Merlin/MCL environment as a non-interactive local optimizer. This interface proved extremely important, since it provides flexibility, effectiveness and robustness to the local search task that is in turn employed by the global procedures. We demonstrate the use of the parallel code on a molecular conformation problem.
Program summary
Title of program: PANMIN
Catalogue identifier: ADSU
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSU
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computer for which the program is designed and others on which it has been tested: PANMIN is designed for UNIX machines. The parallel code runs on either shared memory architectures or on a distributed system. The code has been tested on a SUN Microsystems ENTERPRISE 450 with four CPUs, and on a 48-node cluster under Linux, with both the GNU g77 and the Portland group compilers. The parallel implementation is based on MPI and has been tested with LAM MPI and MPICH
Installation: University of Ioannina, Greece
Programming language used: Fortran-77
Memory required to execute with typical data: Approximately O(n^2) words, where n is the number of variables
No. of bits in a word: 64
No. of processors used: 1 or many
Has the code been vectorised or parallelized?: Parallelized using MPI
No. of bytes in distributed program, including test data, etc.: 147163
No. of lines in distributed program, including the test data, etc.: 14366
Distribution format: gzipped tar file
Nature of physical problem: A multitude of problems in science and engineering are often reduced to minimizing a function of many variables. There are instances that a local optimum does not correspond to the desired physical solution and hence the search for a better solution is required. Local optimization techniques can be trapped in any local minimum. Global Optimization is then the appropriate tool. For example, solving a non-linear system of equations via optimization, one may encounter many local minima that do not correspond to solutions, i.e. they are far from zero
Method of solution: PANMIN is a suite of programs for Global Optimization that take advantage of the Merlin/MCL optimization environment [1,2]. We offer implementations of two algorithms that belong to the stochastic class and use local searches either as intermediate steps or as solution refinement
Restrictions on the complexity of the problem: The only restriction is set by the available memory of the hardware configuration. The software can handle bound constrained problems. The Merlin Optimization environment must be installed. Availability of an MPI installation is necessary for executing the parallel code
Typical running time: Depending on the objective function
References: [1] D.G. Papageorgiou, I.N. Demetropoulos, I.E. Lagaris, Merlin-3.0. A multidimensional optimization environment, Comput. Phys. Commun. 109 (1998) 227-249. [2] D.G. Papageorgiou, I.N. Demetropoulos, I.E. Lagaris, The Merlin Control Language for strategic optimization, Comput. Phys. Commun. 109 (1998) 250-275.
11.
Stochastic optimization for the calculation of the time dependency of the physiological demand during exercise and recovery (total citations: 1; self-citations: 0; citations by others: 1)
The stochastic optimization method ALOPEX IV is successfully applied to the problem of estimating the time dependency of the physiological demand in response to exercise. This is a fundamental and unsolved problem in the area of exercise physiology, where the lack of appropriate tools and techniques forces the assumption and the use of a constant demand during exercise. By the use of an appropriate partition of the physiological time series and by means of stochastic optimization, the time dependency of the physiological demand during heavy intensity exercise and its subsequent recovery is, for the first time, revealed.
12.
13.
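This work takes the large multi-layer flat-sheet membrane module of a typical membrane distillation system as its object of study and investigates the performance of this type of module by numerical simulation. A mathematical model of the module is established and solved simultaneously with the mathematical models of the other equipment in the system. The simulation results show that, in a membrane distillation system, a large membrane module exhibits performance trends completely different from those of the small modules used in laboratories. The module flux rises linearly with the external heat supply. For a fixed external heat supply, raising the KA value of the heat-recovery heat exchanger does increase the module flux, but the effect becomes less and less significant as KA grows. Increasing the membrane area of the module reduces the module flux, although the total output increases. Increasing the flow rates of the fluids on both sides of the membrane lowers the module flux. The greatly increased fluid residence time produces a very wide fluid temperature distribution inside a large membrane module, and this is what makes energy recovery in the membrane distillation system possible. The changes in module performance with design and operating conditions described above are the result of changes in the energy recovery ratio.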
14.
Piotr Piecuch Stanisław A. Kucharski Karol Kowalski Monika Musiał 《Computer Physics Communications》2002,149(2):71-96
The recently proposed renormalized (R) and completely renormalized (CR) coupled-cluster (CC) methods of the CCSD[T] and CCSD(T) types have been implemented using recursively generated intermediates and fast matrix multiplication routines. The details of this implementation, including the complete set of equations that have been used in writing efficient computer codes, memory requirements, and typical CPU timings, are discussed. The R-CCSD[T], R-CCSD(T), CR-CCSD[T], and CR-CCSD(T) computer codes and similar codes for the standard CC methods, including the LCCD, CCD, CCSD, CCSD[T], and CCSD(T) approaches, have been incorporated into the GAMESS package. Information about the main features of this new set of CC programs is provided.
15.
The equation of motion for a balloon in an atmosphere is generalized and placed in proper context by taking into account some fluid theory results and a few factors not considered in previous works. A computer program is needed to find solutions. A code that performs 2D simulations of open-balloon flights is developed. The coupled integrodifferential nature of the problem represented a significant challenge for a satisfactory implementation.
16.
17.
F. Garcia J. Mesa O. Helene F. Milian T.E. Rodrigues 《Computer Physics Communications》2007,176(5):347-361
The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures, the Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameters to the time evolution of concentration functions, is achieved via a Monte Carlo simulation procedure.
Program summary
Title of program: STATFLUX
Catalogue identifier: ADYS_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Computer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHz
Installation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, Brazil
Operating system: Windows 2000 and Windows XP
Programming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as g77, f77, ifort and NAG95 are not able to compile the code and therefore it has not been possible for the CPC Program Library to test the program.
Memory required to execute with typical data: 8 Mbytes of RAM memory and 100 MB of Hard disk memory
No. of bits in a word: 16
No. of lines in distributed program, including test data, etc.: 6912
No. of bytes in distributed program, including test data, etc.: 229 541
Distribution format: tar.gz
Nature of the physical problem: The investigation of transport mechanisms for radioactive substances, through environmental pathways, is very important for radiological protection of populations. One such pathway, associated with the food chain, is the grass-animal-man sequence. The distribution of trace elements in humans and laboratory animals has been intensively studied over the past 60 years [R.C. Pendleton, C.W. Mays, R.D. Lloyd, A.L. Brooks, Differential accumulation of iodine-131 from local fallout in people and milk, Health Phys. 9 (1963) 1253-1262]. In addition, investigations on the incidence of cancer in humans, and a possible causal relationship to radioactive fallout, have been undertaken [E.S. Weiss, M.L. Rallison, W.T. London, W.T. Carlyle Thompson, Thyroid nodularity in southwestern Utah school children exposed to fallout radiation, Amer. J. Public Health 61 (1971) 241-249; M.L. Rallison, B.M. Dobyns, F.R. Keating, J.E. Rall, F.H. Tyler, Thyroid diseases in children, Amer. J. Med. 56 (1974) 457-463; J.L. Lyon, M.R. Klauber, J.W. Gardner, K.S. Udall, Childhood leukemia associated with fallout from nuclear testing, N. Engl. J. Med. 300 (1979) 397-402]. Of the pathways by which radionuclides enter the human (or animal) body, ingestion is the most important because it is closely related to life-long alimentary (or dietary) habits. Those radionuclides which are able to enter the living cells by either metabolic or other processes give rise to localized doses which can be very high. The evaluation of these internally localized doses is of paramount importance for the assessment of radiobiological risks and radiological protection. The time behavior of trace concentration in organs is the principal input for prediction of internal doses after acute or chronic exposure. The General Multiple-Compartment Model (GMCM) is a powerful and well-accepted method for biokinetic studies, which allows the calculation of the concentration of trace elements in organs as a function of time when the flow parameters of the model are known. However, few biokinetic data exist in the literature, and the determination of flow and transfer parameters by statistical fitting for each system is an open problem.
Restriction on the complexity of the problem: This version of the code works with the constant volume approximation, which is valid for many situations where the biological half-life of a trace is lower than the volume rise time. Another restriction is related to the central flux model. The model considered in the code assumes that there is one central compartment (e.g., blood) that connects the flow with all compartments; flow between the other compartments is not included.
Typical running time: Depends on the choice for calculations. Using the Derivative Method the time is very short (a few minutes) for any number of compartments considered. When the Gauss-Marquardt iterative method is used the calculation time can be approximately 5-6 hours when ∼15 compartments are considered.
18.
In this paper, we use hat basis functions to solve the system of Fredholm integral equations (SFIEs) and the system of Volterra integral equations (SVIEs) of the second kind. This method converts the system of integral equations into a linear or nonlinear system of algebraic equations. We also consider the order of convergence of the method and show that it is O(h^2). Application of the method to some examples shows its accuracy and efficiency.
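The conversion to an algebraic system can be sketched for a single linear Fredholm equation of the second kind, u(x) = f(x) + ∫₀¹ K(x,t) u(t) dt. One simple way to use hat functions, assumed here and not necessarily the paper's exact formulation, is to interpolate the integrand at the nodes with hat functions and integrate them exactly, which yields weights h at interior nodes and h/2 at the ends. The test problem (kernel x·t, f(x) = 2x/3, exact solution u(x) = x) and all function names are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fredholm_hat(kernel, f, n):
    """Approximate u(x) = f(x) + int_0^1 kernel(x, t) u(t) dt on n + 1 nodes."""
    h = 1.0 / n
    nodes = [i * h for i in range(n + 1)]
    # Integral of each hat function over [0, 1]: h inside, h / 2 at the ends.
    weights = [h / 2 if j in (0, n) else h for j in range(n + 1)]
    # Collocation at the nodes gives the linear system (I - K W) u = f.
    a = [[(1.0 if i == j else 0.0) - kernel(nodes[i], nodes[j]) * weights[j]
          for j in range(n + 1)] for i in range(n + 1)]
    b = [f(x) for x in nodes]
    return nodes, solve_linear(a, b)


def max_error(n):
    """Largest nodal error for the assumed test problem with exact solution u(x) = x."""
    nodes, u = fredholm_hat(lambda x, t: x * t, lambda x: 2.0 * x / 3.0, n)
    return max(abs(ui - xi) for ui, xi in zip(u, nodes))


# Halving h should cut the error by about 4 if the method is O(h^2).
ratio = max_error(8) / max_error(16)
```

For this test problem the nodal error behaves like h^2/4, so doubling the number of intervals reduces it by a factor close to 4, consistent with the stated order of convergence.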
19.
《Ergonomics》2012,55(8):979-995
An apparatus to measure the coefficient of kinetic friction (μk) between the shoe sole and the underfoot surface was constructed, and a method including criteria to evaluate the risk of slipping during walking was developed. The apparatus is a prototype stationary step simulator capable of simulating the movements of a human foot and the forces applied to the underfoot surface during an actual slip, and the drainage capability of the contact surface between the shoe sole and the flooring when different lubricants or contaminants are used. The apparatus consists of a movable artificial foot controlled by a computer with the aid of three hydraulic cylinders. The frictional force (Fμ), the normal force (FN) and their ratio (μk = Fμ/FN) are measured with a two-way force platform when the foot slides along its surface. Two separate gait patterns, heel-slide (μk1) and sole-slide (μk2), are used for the evaluations. The method classifies studied shoe, lubricant and underfoot surface combinations into five slip resistance classes according to the measured μk1. The slip resistance assessments are specified with some complementary safety criteria, e.g., the ratio μk1/μk2. The reliability of the developed measurement method was assessed in an international comparison test. According to available results discussed in this paper, our method seems to be valid and the slip resistance measurements seem to be repeatable.
20.
《Expert systems with applications》2014,41(16):7316-7327
Using the balanced scorecard approach based on sustainable development parameters is a powerful and useful methodology for evaluating the sustainable performance of an organization or company. In this paper, a new approach based on the sustainability balanced scorecard (SBSC) and multi criteria decision making (MCDM) approaches is developed for evaluating the performance of oil producing companies in Iran. To reflect the interdependent relationships among the factors influencing the problem under consideration, the analytical network process (ANP), a branch of the MCDM techniques, is employed. However, using the ANP method to calculate the preference ratings of alternatives is a time-consuming and cumbersome process; therefore, the COPRAS (COmplex PRoportional ASsessment) technique is adopted to prioritize the feasible alternatives in terms of linguistic variables. The results of this study demonstrate the effectiveness of the proposed model. The proposed performance evaluation model, which combines the MCDM methods with the SBSC approach, helps authorities work toward a competitive advantage.