期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel Ray Tracing Using the Message Passing Interface

Cameron C.B. 《IEEE transactions on instrumentation and measurement》2008,57(2):228-234

Ray-tracing software is available for lens design and for general optical systems modeling. It tends to be designed to run on a single processor and can be very time consuming if the number of rays traced is large. Previously, multiple digital signal processors (DSPs) have been used to perform such simulations. This approach is attractive because DSPs are inexpensive, and the time saved through parallel processing can be significant. In this paper, we report a nearly linear relationship between the number of processors, and the rate of ray tracing with as many as 839 processors operating in parallel on the Naval Research Laboratory's Cray XD-1 computer with the Message Passing Interface (MPI). In going from 1 to 839 processors, we achieved an efficiency of 97.9% and a normalized ray-tracing rate of in a system with 22 planar surfaces, two paraboloid reflectors, and one hyperboloid refractor. The need for a load-balancing software was obviated by the use of a prime number of processors. 相似文献

2.

A‐scalability and an integrated computational technology and framework for non‐linear structural dynamics. Part 2: Implementation aspects and parallel performance results

R. Kanapady K. K. Tamma 《International journal for numerical methods in engineering》2003,58(15):2295-2323

An integrated framework and computational technology is described that addresses the issues to foster absolute scalability (A‐scalability) of the entire transient duration of the simulations of implicit non‐linear structural dynamics of large scale practical applications on a large number of parallel processors. Whereas the theoretical developments and parallel formulations were presented in Part 1, the implementation, validation and parallel performance assessments and results are presented here in Part 2 of the paper. Relatively simple numerical examples involving large deformation and elastic and elastoplastic non‐linear dynamic behaviour are first presented via the proposed framework for demonstrating the comparative accuracy of methods in comparison to available experimental results and/or results available in the literature. For practical geometrically complex meshes, the A‐scalability of non‐linear implicit dynamic computations is then illustrated by employing scalable optimal dissipative zero‐order displacement and velocity overshoot behaviour time operators which are a subset of the generalized framework in conjunction with numerically scalable spatial domain decomposition methods and scalable graph partitioning techniques. The constant run times of the entire simulation of ‘fixed‐memory‐use‐per‐processor’ scaling of complex finite element mesh geometries is demonstrated for large scale problems and large processor counts on at least 1024 processors. Copyright © 2003 John Wiley & Sons, Ltd. 相似文献

3.

The MasPar MP-1 As a Computer Arithmetic Laboratory

Michael A. Anuta Daniel W. Lozier Peter R. Turner 《Journal of research of the National Institute of Standards and Technology》1996,101(2):165-174

相似文献

4.

Concurrent optimal allocation of distributed manufacturing resources using extended Teaching-Learning-Based Optimization

Wenyu Zhang Shuai Zhang Shanshan Guo Yushu Yang Yong Chen 《国际生产研究杂志》2017,55(3):718-735

The optimal allocation of distributed manufacturing resources is a challenging task for supply chain deployment in the current competitive and dynamic manufacturing environments, and is characterised by multiple objectives including time, cost, quality and risk that require simultaneous considerations. This paper presents an improved variant of the Teaching-Learning-Based Optimisation (TLBO) algorithm to concurrently evaluate, select and sequence the candidate distributed manufacturing resources allocated to subtasks comprising the supply chain, while dealing with the trade-offs among multiple objectives. Several algorithm-specific improvements are suggested to extend the standard form of TLBO algorithm, which is only well suited for the one-dimensional continuous numerical optimisation problem well, to solve the two-dimensional (i.e. both resource selection and resource sequencing) discrete combinatorial optimisation problem for concurrent allocation of distributed manufacturing resources through a focused trade-off within the constrained set of Pareto optimal solutions. The experimental simulation results showed that the proposed approach can obtain a better manufacturing resource allocation plan than the current standard meta-heuristic algorithms such as Genetic Algorithm, Particle Swarm Optimisation and Harmony Search. Moreover, a near optimal resource allocation plan can be obtained with linear algorithmic complexity as the problem scale increases greatly. 相似文献

5.

Partial structure factors of a simulated polymer melt

F. Alvarez A. Arbe J. Colmenero R. Zorn D. Richter 《Computational Materials Science》2002,25(4):596-605

Fully atomistic molecular dynamic simulations were carried out by using the Insight (Insight II 4.0.0 P version) and the Discover-3 programs from MSI with the polymer consortium force field. The model system used in these simulations was built using the Amorphous Cell module. The polymer system simulated was glassy polyisoprene (PI) as used in previous neutron scattering (NS) measurements. A first molecular dynamics at 363 K was run for 1 ns using the Discover-3 program collecting data every 0.01 ps and a subsequent one (taking the previous output sample as an input for the following dynamics) was run for 2 ns collecting data every 0.5 ps. The results of the second run agreed to those of the first run, indicating that the sample was well equilibrated at this high temperature. Starting from the obtained atomic trajectories we have calculated the partial static structure factors for NS corresponding to different PI samples with different levels of deuteration (PId3, i.e., methyl group deuterated and main chain protonated; PId5, i.e., methyl group protonated and main chain deuterated; PId8, i.e., fully deuterated and PIh8, i.e., fully protonated). The results obtained are compared to the coherent NS cross-sections measured on real samples by means of D7 spectrometer with polarization analysis (ILL, Grenoble). A good agreement is obtained between experimental and simulated data validating the simulated sample. Moreover, the dynamic evolution of these correlations has also been calculated from the simulations. With these time dependent functions, the magnitude measured in a neutron spin echo (NSE) experiment can be constructed. Here we present two examples dealing with the fully deuterated sample PId8 and a partially deuterated sample, PId5, that show how computer simulation constitutes an invaluable tool for interpreting NSE results. 相似文献

6.

Evaluation of a grid based molecular dynamics approach for polypeptide simulations

Merelli I Morra G Milanesi L 《IEEE transactions on nanobioscience》2007,6(3):229-234

Molecular dynamics is very important for biomedical research because it makes possible simulation of the behavior of a biological macromolecule in silico. However, molecular dynamics is computationally rather expensive: the simulation of some nanoseconds of dynamics for a large macromolecule such as a protein takes very long time, due to the high number of operations that are needed for solving the Newton's equations in the case of a system of thousands of atoms. In order to obtain biologically significant data, it is desirable to use high-performance computation resources to perform these simulations. Recently, a distributed computing approach based on replacing a single long simulation with many independent short trajectories has been introduced, which in many cases provides valuable results. This study concerns the development of an infrastructure to run molecular dynamics simulations on a grid platform in a distributed way. The implemented software allows the parallel submission of different simulations that are singularly short but together bring important biological information. Moreover, each simulation is divided into a chain of jobs to avoid data loss in case of system failure and to contain the dimension of each data transfer from the grid. The results confirm that the distributed approach on grid computing is particularly suitable for molecular dynamics simulations thanks to the elevated scalability. 相似文献

7.

The application of scalable distributed memory computers to the finite element modeling of electromagnetic scattering

Tom Cwik Daniel S. Katz Cinzia Zuffada Vahraz Jamnejad 《International journal for numerical methods in engineering》1998,41(4):759-776

Large-scale parallel computation can be an enabling resource in many areas of engineering and science if the parallel simulation algorithm attains an appreciable fraction of the machine peak performance, and if undue cost in porting the code or in developing the code for the parallel machine is not incurred. The issue of code parallelization is especially significant when considering unstructured mesh simulations. The unstructured mesh models considered in this paper result from a finite element simulation of electromagnetic fields scattered from geometrically complex objects (either penetrable or impenetrable.) The unstructured mesh must be distributed among the processors, as must the resultant sparse system of linear equations. Since a distributed memory architecture does not allow direct access to the irregularly distributed unstructured mesh and sparse matrix data, partitioning algorithms not needed in the sequential software have traditionally been used to efficiently spread the data among the processors. This paper presents a new method for simulating electromagnetic fields scattered from complex objects; namely, an unstructured finite element code that does not use traditional mesh partitioning algorithms. © 1998 This paper was produced under the auspices of the U.S. Government and it is therfore not subject to copyright in the U.S. 相似文献

8.

Dynamic and static job allocation for multi-server systems

Liming Liu Xiaoming Liu 《IIE Transactions》1998,30(9):845-854

We consider the optimal assignment of groups of jobs to a fixed number of time periods over a finite horizon to minimize the total facility idling and job waiting costs. The capacity of the facility varies randomly in the sense that the time that each one of the multiple servers becomes available is random (servers arrive late). The service times are also random and are independent and identically distributed. With approximations, we formulate a dynamic optimization model for this problem. With a simple modification, we can apply this dynamic model to a static outpatient appointment problem. We propose two methods to compute the capacity distribution: (1) Poisson approximation and (2) simulation. While the Poisson approximation works well for exponential service times, the simulation scheme enables us to use the dynamic model without actually specifying the service time distribution. The performance measures of the schedules obtained with these two methods compare well with those of the optimal allocation obtained from (exhaustive) simulation. We also conduct numerical studies to investigate the dynamics between the idling and waiting costs ratio and the number of scheduling periods. 相似文献

9.

V-BLAST/OFDM系统的多用户分集性能研究

肖啸朱光喜周林何业军《高技术通讯》2007,17(4):357-361

为改善V-BLAST/OFDM系统的性能,提出了子载波动态分配准则.该准则可以有效地利用多用户分集,明显提高V-BLAST/OFDM系统的整体性能.采用随机矩阵和排序统计的数学方法对取得的性能增益作了定量分析,仿真试验结果证明了分析的正确性.将取得的性能增益与空间分集系统中取得的多用户分集增益进行了比较,比较结果说明,将基于V-BLAST的多用户OFDM系统与子载波动态分配相结合,更能改善系统的整体性能. 相似文献

10.

Fast and reliable top-level simulation strategy for mixed-signal integrated circuits and its application to DC?DC converters

Forghani-Zadeh H.P. Rincon-Mora G.A. 《Circuits, Devices & Systems, IET》2007,1(2):143-150

Top-level, transient, transistor-based simulations are a critical step in the product-development cycle of mixed-signal integrated circuits. These simulations are normally performed just before fabrication and unfortunately impose cumbersome bottlenecks in the design flow. Verification is an iterative process by nature, whereby each problem found requires another simulation to ensure a proper fix is in place, and because of the complexity of a large system, minor errors can cost days, increasing design time and time-to-market. A top-level transistor-based simulation strategy is proposed with minimal time overhead. The strategy is to start with a quick, all macro-model system simulation and gradually substitute one transistor-level sub-block at a time for each additional run. For optimal results, less computationally intensive blocks, which can be determined from a proposed set of screening simulations, are replaced first. The proposed strategy was tested and applied to a buck, current-mode switching regulator, and the results show that simulation overhead is least for linear analogue functions (e.g. op-amps) and worst for high-speed nonlinear circuits (e.g. signal generators). Nonlinear and bi-stable analogue blocks such as bandgap references take more time to simulate than op-amps and less than low frequency digital functions such as power-on-reset, which in turn are less intensive than ramp and pulse generators 相似文献

11.

An algorithm for optimal buffer placement in reliable serial lines

JOHN H. HARRIS STEPHEN G. POWELL 《IIE Transactions》1999,31(4):287-302

The optima] allocation of buffer capacity in unbalanced production lines with reliable but variable workstations is a complex and little-researched topic. Analytic formulas for the throughput of these lines do not exist, so simulation is the only practical alternative for estimating throughput. Exhaustive search over all possible buffer allocations quickly becomes impractical beyond short lines and few buffers. Thus an algorithm is needed to efficiently find optimal or near-optimal allocations. We develop a simple search algorithm for determining the optimal allocation of a fixed amount of buffer capacity in an n-station serial line. The algorithm, which is an adaptation of the Spendley-Hext and Nelder-Mead simplex search algorithms, uses simulation to estimate throughput for every allocation considered. An important feature of the algorithm is that the simulation run length is adjusted during the running of the algorithm to save simulation run time when high precision in throughput estimates is not needed, and 10 ensure adequate precision when it is needed. We describe the algorithm and show that it can reliably find the known optimal allocation in balanced lines. Then we test the ability of the algorithm to find optimal allocations in unbalanced lines, first for cases in which the optimal allocation is known, and subsequently for cases in which the optimal allocation is not known. We focus particularly on lines with multiple imbalances in means and variances. In general, our algorithm proves highly efficient in finding a near-optimal allocation with short simulation run times. It also usually finds the true optimal allocation, but it is in the nature of this problem that many buffer allocations differ in throughput by small amounts that are difficult to resolve even with long simulation runs. 相似文献

12.

Theoretical assessment of a synthetic aperture beamformer for real-time 3-D imaging

Hazard CR Lockwood GR 《IEEE transactions on ultrasonics, ferroelectrics, and frequency control》1999,46(4):972-980

A real-time 3-D imaging system requires the development of a beamformer that can generate many beams simultaneously. In this paper, we discuss and evaluate a suitable synthetic aperture beamformer. The proposed beamformer is based on a pipelined network of high speed digital signal processors (DSP). By using simple interpolation-based beamforming, only a few calculations per pixel are required for each channel, and an entire 2-D synthetic aperture image can be formed in the time of one transmit event. The performance of this beamformer was explored using a computer simulation of the radiation pattern. The simulations were done for a full 64-element array and a sparse array with the same receive aperture but only five transmit elements. We assessed the effects of changing the sampling rate and amplitude quantization by comparing the relative levels of secondary lobes in the radiation patterns. The results show that the proposed beamformer produces a radiation pattern equivalent to a conventional beamformer using baseband demodulation, provided that the sampling rate is approximately 10 times the center frequency of the transducer (34% bandwidth pulse). The simulations also show that the sparse array is not significantly more sensitive to delay or amplitude quantization than the full array. 相似文献

13.

Computational steering in Monte Carlo simulations of thin film polystyrene

Mason DR Sutton AP 《Philosophical transactions. Series A, Mathematical, physical, and engineering sciences》2005,363(1833):1961-1974

High molecular weight polymer systems show very long relaxation times, of the order of milliseconds or more. This time-scale proves practically inaccessible for atomic-scale dynamical simulation such as molecular dynamics. Even with a Monte Carlo (MC) simulation, the generation of statistically independent configurations is non-trivial. Many moves have been proposed to enhance the efficiency of MC simulation of polymers. Each is described by a proposal density Q(x'; x): the probability of selecting the trial state x' given that the system is in the current state x. This proposal density must be parametrized for a particular chain length, chemistry and temperature. Choosing the correct set of parameters can greatly increase the rate at which the system explores its configuration space. Computational steering (CS) provides a new methodology for a systematic search to optimize the proposal densities for individual moves, and to combine groups of moves to greatly improve the equilibration of a model polymer system. We show that monitoring the correlation time of the system is an ideal single parameter for characterizing the efficiency of a proposal density function, and that this is best evaluated by a distributed network of replicas of the system, with the operator making decisions based on the averages generated over these replicas. We have developed an MC code for simulating an anisotropic atomistic bead model which implements the CS paradigm. We report simulations of thin film polystyrene. 相似文献

14.

Sequential resource allocation utilizing agents

Andrew Wallace 《国际生产研究杂志》2013,51(11):2481-2499

Many manufacturing systems allocate resources, such as machines, sequentially. Sequential allocation of resources can be viewed as a digraph where each vertex represents resources forming nodes in a distributed system and the arcs represent the allocation. The allocation of such resources can be considered to be a distributed problem. Agents are a distributed artificial intelligence paradigm applicable to distributed problems and, therefore, have a potential to be applicable to sequential resource allocation. This paper presents a method of sequential resource allocation utilizing agents, and an AGV system is presented as an example application area. This system was utilized in experiments to test the agent application. Results and an analysis are also presented. 相似文献

15.

The Effect of the Coefficient of Variation of Operation Times on the Allocation of Storage Space in Production Line Systems 总被引：2，自引：0，他引：2

Frederick S. Hillier Kut C. So 《IIE Transactions》1991,23(2):198-206

This paper studies the effect of the coefficient of variation of operation times on the optimal allocation of storage space in production line systems. The operation times at each station are modelled by a two-stage Coxian distribution. This work extends the results of our previous study of the storage allocation problem with exponentially distributed operation times. Interpreting Stage 1 of the two-stage Coxian distribution as the normal service for an item at a station and Stage 2 as down time at the station, our model can also be used to study the effect of breakdowns on the allocation of storage space in production line systems. The results show that the “bowl effect” whereby the center stations should be given preferential treatment becomes more pronounced with higher variability in the operation times. Another general conclusion is that the overall optimal storage allocation commonly follows a “storage bowl phenomenon” whereby the allocation of buffer storage space fits an inverted bowl pattern when the total storage space is also a decision variable 相似文献

16.

Heuristic control of multiple batch processors with incompatible job families and future job arrivals

John Benedict C. Tajan Appa Iyer Sivakumar Stanley B. Gershwin 《国际生产研究杂志》2013,51(15):4206-4219

We analyse the problem of minimising the mean cycle time of a batch processing stage containing K?>?1 batch processors in parallel with incompatible job families and future job arrivals. We provide an integer linear programming formulation and a dynamic program formulation for small problem instances. For larger problem instances, we propose an online heuristic policy MPC_REPEAT. At each instance a decision has to be made, MPC_REPEAT decomposes the problem of simultaneously assigning multiple batches to multiple processors into sequentially assigning multiple batches to multiple processors. When job families are uncorrelated, we show via simulation experiments that MPC_REPEAT has significantly lower mean cycle time than a previously proposed look-ahead method except when: (MPC_REPEAT ignores some job families AND the traffic intensity is high.) Our experiments also reveal that increasing the job family correlation of consecutive job arrivals results, with a few exceptions, in a mean cycle-time reduction, for both policies evaluated. This reduction in cycle time generally increases with: increasing number of job families, decreasing number of processors, and increasing time between job arrivals. Our findings imply that controlling the upstream processors, such that job families of consecutive job arrivals are correlated, can reduce the cycle time at the batch processing stage. Furthermore, the expected mean cycle time reduction due to this strategy can be substantially larger than that expected from switching to a more complex batch processing stage policy, under less stringent conditions. 相似文献

17.

Frequency and fundamental signal measurement algorithms for distributed control and protection applications

Roscoe A.J. Burt G.M. McDonald J.R. 《Generation, Transmission & Distribution, IET》2009,3(5):485-495

Increasing penetration of distributed generation within electricity networks leads to the requirement for cheap, integrated, protection and control systems. To minimise cost, algorithms for the measurement of AC voltage and current waveforms can be implemented on a single microcontroller, which also carries out other protection and control tasks, including communication and data logging. This limits the frame rate of the major algorithms, although analogue to digital converters (ADCs) can be oversampled using peripheral control processors on suitable microcontrollers. Measurement algorithms also have to be tolerant of poor power quality, which may arise within grid-connected or islanded (e.g. emergency, battlefield or marine) power system scenarios. This study presents a 'Clarke-FLL hybrid' architecture, which combines a three-phase Clarke transformation measurement with a frequency-locked loop (FLL). This hybrid contains suitable algorithms for the measurement of frequency, amplitude and phase within dynamic three-phase AC power systems. The Clarke-FLL hybrid is shown to be robust and accurate, with harmonic content up to and above 28% total harmonic distortion (THD), and with the major algorithms executing at only 500 samples per second. This is achieved by careful optimisation and cascaded use of exact-time averaging techniques, which prove to be useful at all stages of the measurements: from DC bias removal through low-sample-rate Fourier analysis to sub-harmonic ripple removal. Platform-independent algorithms for three-phase nodal power flow analysis are benchmarked on three processors, including the Infineon TC1796 microcontroller, on which only 10% of the 2000 mus frame time is required, leaving the remainder free for other algorithms. 相似文献

18.

Parallel DSMC method using dynamic domain decomposition

J.‐S. Wu K.‐C. Tseng 《International journal for numerical methods in engineering》2005,63(1):37-76

A general parallel direct simulation Monte Carlo method using unstructured mesh is introduced, which incorporates a multi‐level graph‐partitioning technique to dynamically decompose the computational domain. The current DSMC method is implemented on an unstructured mesh using particle ray‐tracing technique, which takes the advantages of the cell connectivity information. In addition, various strategies applying the stop at rise (SAR) (IEEE Trans Comput 1988; 39: 1073–1087) scheme is studied to determine how frequent the domain should be re‐decomposed. A high‐speed, bottom‐driven cavity flow, including small, medium and large problems, based on the number of particles and cells, are simulated. Corresponding analysis of parallel performance is reported on IBM‐SP2 parallel machine up to 64 processors. Analysis shows that degree of imbalance among processors with dynamic load balancing is about ??½ of that without dynamic load balancing. Detailed time analysis shows that degree of imbalance levels off very rapidly at a relatively low value with increasing number of processors when applying dynamic load balancing, which makes the large problem size fairly scalable for processors more than 64. In general, optimal frequency of activating SAR scheme decreases with problem size. At the end, the method is applied to compute two two‐dimensional hypersonic flows, a three‐dimensional hypersonic flow and a three‐dimensional near‐continuum twin‐jet gas flow to demonstrate its superior computational capability and compare with experimental data and previous simulation data wherever available. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

19.

A NOTE ON THE REEL ALLOCATION PROBLEM

Richard L. Francis Tomas Horak 《IIE Transactions》1994,26(3):111-114

The reel allocation problem is to choose the numbers of reels of each of a number of types of items to be used in populating a printed circuit board by a SMT machine so as to maximize the length of an uninterrupted machine production run while using no more slots for reels than are available. We show this problem can be solved to optimality very efficiently with a quite simple and robust bisection search algorithm. Algorithm run times are less than 1 second on a 486 PC. 相似文献

20.

一种OFDM系统动态子载波比特和功率分配联合算法

高欢芹酆广增朱琦《中国工程科学》2006,8(3):62-65

提出一种多用户OFDM(orthogonal frequency division multiplexing)系统下行链路，具有信道变化实时性的动态子载波、比特和功率分配联合算法(UA)，在满足各个用户数据速率和BER要求的同时使总的发送功率最小。提出的算法与动态子载波分配算法(WSA)相比，计算复杂度相当，在移动信道环境下仿真结果表明性能有一定的改善。相似文献