Similar Documents (20 results)
1.
Automatic process partitioning is the operation of automatically rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. Hybrid shared memory systems provide a hierarchy of globally accessible memories. To achieve high performance on such machines one must carefully distribute the work and the data so as to keep the workload balanced while optimizing the access to nonlocal data. In this paper we consider a semi-automatic approach to process partitioning in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting set of tasks. This approach is illustrated with a picture processing example written in BLAZE, which is transformed by the compiler into a task system maximizing locality of memory reference. Research supported by an IBM Graduate Fellowship. Research supported under NASA Contract No. 520-1398-0356. Research supported by NASA Contract No. NAS1-18107 while the last two authors were in residence at ICASE, NASA, Langley Research Center.

2.
In this paper we propose a fast method for solving wave guide problems. In particular, we consider the guide to be inhomogeneous, and allow propagation of waves of higher-order modes. Such techniques have been handled successfully for acoustic wave propagation problems with a single mode and finite length. This paper extends the concept to electromagnetic wave guides with several modes and infinite length. The method is presented together with computational results. Research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-18107 while the first author was in residence at ICASE, NASA Langley Research Center, Hampton, VA 23665-5225, and by NASA Grant No. NAG-1-624.

3.
This paper presents an analytically robust, globally convergent approach to managing the use of approximation models of varying fidelity in optimization. By robust global behaviour we mean the mathematical assurance that the iterates produced by the optimization algorithm, started at an arbitrary initial iterate, will converge to a stationary point or local optimizer for the original problem. The approach presented is based on the trust region idea from nonlinear programming and is shown to be provably convergent to a solution of the original high-fidelity problem. The proposed method for managing approximations in engineering optimization suggests ways to decide when the fidelity, and thus the cost, of the approximations might be fruitfully increased or decreased in the course of the optimization iterations. The approach is quite general. We make no assumptions on the structure of the original problem, in particular, no assumptions of convexity and separability, and place only mild requirements on the approximations. The approximations used in the framework can be of any nature appropriate to an application; for instance, they can be represented by analyses, simulations, or simple algebraic models. This paper introduces the approach and outlines the convergence analysis. This research was supported by the Dept. of Energy grant DEFG03-95ER25257 and Air Force Office of Scientific Research grant F49620-95-1-0210. This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480 while the author was in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681, USA. This research was supported by the Air Force Office of Scientific Research grant F49620-95-1-0210 and by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480 while the author was in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681, USA.

4.
Many large-scale finite element problems are intractable on current generation production supercomputers. High-performance computer architectures offer effective avenues to bridge the gap between computational needs and the power of computational hardware. The biggest challenge lies in the substitution of the key algorithms in an application program with redesigned algorithms which exploit the new architectures and use better or more appropriate numerical techniques. A methodology for implementing non-linear finite element analysis on a homogeneous distributed processing network is discussed. The method can also be extended to heterogeneous networks comprised of different machine architectures provided that they have a mutual communication interface. This unique feature has greatly facilitated the port of the code to the 8-node Intel Touchstone Gamma and then the 512-node Intel Touchstone Delta. The domain is decomposed serially in a preprocessor. Separate input files are written for each subdomain. These files are read in by local copies of the program executable operating in parallel. Communication between processors is addressed utilizing asynchronous and synchronous message passing. The basic kernel of message passing is the internal force exchange which is analogous to the computed interactions between sections of physical bodies in static stress analysis. Benchmarks for the Intel Delta are presented. Performance exceeding 1 gigaflop was attained. Results for two large-scale finite element meshes are presented.

5.
In irregular scientific computational problems one is periodically forced to choose a delay point, where some overhead cost is suffered to ensure correctness or to improve subsequent performance. Examples of delay points are problem remappings and global synchronizations. One sometimes has considerable latitude in choosing the placement and frequency of delay points; we consider the problem of scheduling delay points so as to minimize the overall execution time. We illustrate the problem with two examples: a regridding method which changes the problem discretization during the course of the computation, and a method for solving sparse triangular systems of linear equations. We show that one can optimally choose delay points in polynomial time using dynamic programming. However, the cost models underlying this approach are often unknown. We consequently examine a scheduling heuristic based on maximizing performance locally, and empirically show it to be nearly optimal on both problems. We explain this phenomenon analytically by identifying underlying assumptions which imply that overall performance is maximized asymptotically if local performance is maximized. This research was supported in part by the National Aeronautics and Space Administration under NASA contract NAS1-18107 while the author consulted at ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, Virginia 23665. Supported in part by NASA contract NAS1-18107, the Office of Naval Research under Contract No. N00014-86-K-0654, and NSF Grant DCR 8106181.
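The dynamic-programming idea in this abstract can be sketched concretely. The cost model below is hypothetical (the per-segment cost function `cost(i, j)` and the fixed per-delay overhead `delay` are illustrative assumptions, not the paper's model); it only shows the optimal-substructure argument: the best schedule for the first j phases ends in some final delay-free segment.

```python
# Hypothetical sketch of scheduling delay points (e.g. remappings)
# between n computation phases so that total time is minimized.
# cost(i, j) = assumed time to run phases i..j (inclusive) with no
# intervening delay point; delay = assumed fixed delay-point overhead.

def schedule_delay_points(n, cost, delay):
    """Return (optimal total time, delay-point positions) for phases 0..n-1."""
    # best[j] = minimal time to finish phases 0..j-1
    best = [0.0] + [float("inf")] * n
    choice = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):  # final delay-free segment is phases i..j-1
            t = best[i] + (delay if i > 0 else 0.0) + cost(i, j - 1)
            if t < best[j]:
                best[j], choice[j] = t, i
    # Walk the choices backwards to recover the delay-point placement.
    points, j = [], n
    while j > 0:
        i = choice[j]
        if i > 0:
            points.append(i)
        j = i
    return best[n], sorted(points)
```

With a cost that grows quadratically in segment length, e.g. `cost = lambda i, j: (j - i + 1) ** 2` and `delay = 2.0`, six phases are optimally cut into three segments of two phases each, reflecting the trade-off between delay overhead and the degradation of long delay-free runs.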

6.
The Fourier spectral method can achieve exponential accuracy both at the approximation level and for solving partial differential equations when the solutions are analytic. For a linear PDE with discontinuous solutions, the Fourier spectral method produces poor point-wise accuracy without post-processing, but still maintains exponential accuracy for all moments against analytic functions. In this note we assess the accuracy of the Fourier spectral method applied to nonlinear conservation laws through a numerical case study. We found that the moments against analytic functions are no longer very accurate. However, the numerical solution does contain accurate information, which can be extracted by a Gegenbauer polynomial based post-processing. Research supported by ARO Grant DAAL03-91-G-0123 and DAAH04-94-G-0205, NSF Grant DMS-9211820, NASA Grant NAG1-1145 and contract NAS1-19480 while the first author was in residence at ICASE, NASA Langley Research Center, Hampton, Virginia 23681-0001, and AFOSR Grant 93-0090.

7.
A concurrent processing algorithm is developed for a materially nonlinear analysis of hollow square and rectangular structural sections and implemented on a special purpose multiprocessor computer at NASA Langley Research Center referred to as the Finite Element Machine (FEM). The cross-sectional thrust-moment-curvature relations are generated concurrently using a tangent stiffness approach, and yield surfaces are obtained that represent the interaction between axial load and biaxial moments. For the study, a maximum speed-up factor of 7.69 is achieved on eight processors.
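As a quick sanity check on the reported figure, speed-up and parallel efficiency relate as follows; only the 7.69 speed-up and the processor count come from the abstract, the rest is standard bookkeeping.

```python
# Derive parallel efficiency from the reported speed-up of 7.69
# on eight processors of the Finite Element Machine.

def speedup(t_serial, t_parallel):
    """Classic speed-up: serial time over parallel time."""
    return t_serial / t_parallel

def efficiency(s, p):
    """Fraction of ideal linear speed-up attained on p processors."""
    return s / p

s = 7.69   # reported speed-up (from the abstract)
p = 8      # processors used (from the abstract)
print(round(efficiency(s, p), 3))  # 0.961, i.e. about 96% efficiency
```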

8.
Since 1988 NASA Langley Research Center has supported a formal methods research program. From its inception, a primary goal of the program has been to transfer formal methods technology into aerospace industries focusing on applications in commercial air transport. The overall program has been described elsewhere. This paper gives an account of the technology transfer strategy and its evolution.

9.
In wind tunnel testing, model attitude is a key element of experimental data correction, and the accuracy of its measurement has an important influence on the test results. Motivated by wind tunnel testing practice, this paper proposes a model attitude measurement technique based on binocular stereo vision and describes its application in the wind tunnel. Tests show that the method achieves accuracy comparable to the vision-based attitude measurement system used in the 31-inch Mach 10 wind tunnel at the NASA Langley Research Center, and it holds broad promise for practical use.

10.
《Real》2002,8(2):157-172
The high-speed civil transport (HSCT) aircraft has been designed with limited cockpit visibility. To compensate, the National Aeronautics and Space Administration (NASA) has proposed an external visibility system (XVS) to aid pilots in overcoming this lack of visibility. XVS obtains video images using high-resolution cameras mounted on and directed outside the aircraft. Images captured by the XVS are analyzed automatically in real time, alerting pilots to potential flight-path hazards; the system thus helps pilots avoid air collisions. In this study, a system was configured to capture image sequences from an on-board high-resolution digital camera at a live video rate, record the images into a high-speed disk array through a fiber channel, and process the images using a Datacube MaxPCI machine with multiple pipelined processors to perform real-time obstacle detection. In this paper, we describe the design, implementation, and evaluation of this computer vision system. Using this system, real-time obstacle detection was performed and digital image data were obtained successfully in flight tests conducted at NASA Langley Research Center in January and September 1999. The system is described in detail so that other researchers can easily replicate the work.

11.
This paper describes the parallel implementation of a numerical model for the simulation of problems from fluid dynamics on distributed memory multiprocessors. The basic procedure is to apply a fully explicit upwind finite difference approximation on a staggered grid. A theoretical time complexity analysis shows that a perfect speedup is achieved asymptotically. Experimental results on the Intel Touchstone Delta System confirm the analytical performance model. © 1997 John Wiley & Sons, Ltd.
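A minimal sketch (not the paper's CFD model) of the fully explicit first-order upwind scheme the abstract refers to, shown for the 1-D linear advection equation u_t + a u_x = 0 with a > 0 and periodic boundaries:

```python
# Explicit first-order upwind update for u_t + a u_x = 0, a > 0.
# Stable when the CFL number a*dt/dx <= 1.  Periodic boundaries:
# u[-1] wraps around to the last cell.

def upwind_step(u, a, dt, dx):
    """Advance the solution by one explicit upwind time step."""
    c = a * dt / dx  # CFL number
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]
```

Because each grid point depends only on itself and its upwind neighbor, the scheme parallelizes with a single one-cell halo exchange per time step, which is what makes fully explicit methods attractive on distributed memory machines.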

12.
Wood  W.A. Kleb  W.L. 《Software, IEEE》2003,20(3):30-36
Can we successfully apply XP (Extreme Programming) in a scientific research context? A pilot project at the NASA Langley Research Center tested XP's applicability in this context. Since the cultural environment at a government research center differs from the customer-centric business view, eight of XP's 12 practices seemed incompatible with the existing research culture. Despite initial awkwardness, the authors determined that XP can function in situations for which it appears to be ill suited.

13.
Parallel computation of two-dimensional convective flows in cavities with adiabatic horizontal boundaries, driven by differential heating of the two vertical end walls, is investigated using the Intel Paragon, Intel Touchstone Delta, Cray T3D and IBM SP2. The numerical scheme, including a parallel multigrid solver, and the domain decomposition techniques for parallel computing are discussed in detail. Performance comparisons are made for the different parallel systems, and numerical results using various numbers of processors are discussed. © 1997 John Wiley & Sons, Ltd.

14.
In a previous work we studied the concurrent implementation of a numerical model, CONDIFP, developed for the analysis of depth-averaged convection–diffusion problems. Initial experiments were conducted on the Intel Touchstone Delta System, using up to 512 processors and different problem sizes. As for other computation-intensive applications, the results demonstrated an asymptotic trend to unity efficiency as the computational load comes to dominate the communication load. This paper reports further numerical experiments, in both one and two space dimensions with various choices of initial and boundary conditions, carried out on the Intel Paragon XP/S Model L38 to illustrate the parallel solver's versatility and reliability.

15.
Parallel implementation of large-scale structural optimization
Advances in computer technology and performance allow researchers to pose useful optimization problems that were previously too large for consideration. For example, NASA Langley Research Center is investigating the large structural optimization problems that arise in aircraft design. The total number of design variables and constraints for these nonlinear optimization problems is now an order of magnitude larger than anything previously reported. To find solutions in a reasonable amount of time, a coarse-grained parallel-processing algorithm is recommended. This paper studies the effects of problem size on sequential and parallel versions of this algorithm. For initial testing of this algorithm, a hub frame optimization problem is devised such that the size of the problem can be adjusted by adding members and load cases. Numerous convergence histories demonstrate that the algorithm performs correctly and in a robust manner. Timing profiles for a wide range of randomly generated problems highlight the changes in the subroutine timings that are caused by the increase in problem size. The potential benefits and drawbacks associated with the parallel approach are summarized.

16.
We consider the problem of optimally assigning the modules of a parallel/pipelined program to the processors of a multiple processor system under certain restrictions on the interconnection structure of the program as well as the multiple computer system. We show that for a variety of such problems, it is possible to determine whether a partition of the modular program exists in which the load on any processor is within a certain bound. This method, when combined with a binary search over a fixed range, provides an optimal solution to the partitioning problem. The specific problems we consider are partitioning of (1) a chain structured parallel program over a chain-like computer system, (2) multiple chain-like programs over a host-satellite system, and (3) a tree structured parallel program over a host-satellite system. For a problem with N modules and M processors, the complexity of our algorithm is no worse than O(M log(N) log(W_T/ε)), where W_T is the cost of assigning all modules to one processor and ε is the desired accuracy. This improves on the best previously known algorithm, which runs in O(MN log(N)) time. This research was supported by a grant from the Division of Research Extension and Advisory Services, University of Engineering and Technology Lahore, Pakistan. Further support was provided by NASA Contracts NAS1-17070 and NAS1-18107 while the author was resident at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, Virginia, USA.
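The "bound check plus binary search" idea for the chain-on-chain case can be sketched as follows. The greedy feasibility probe and the sample weights are illustrative assumptions, not the paper's algorithm (which also handles host-satellite and tree structures and achieves a better probe complexity); the sketch only shows why searching over the bottleneck bound works.

```python
# Chain partitioning sketch: cut a chain of module weights into at most
# m contiguous pieces, minimizing the heaviest piece (the bottleneck).

def feasible(weights, m, bound):
    """Greedy probe: can the chain fit on m processors under this bound?"""
    parts, load = 1, 0.0
    for w in weights:
        if w > bound:
            return False            # a single module already exceeds the bound
        if load + w > bound:
            parts, load = parts + 1, w   # start a new contiguous piece
        else:
            load += w
    return parts <= m

def min_bottleneck(weights, m, eps=1e-6):
    """Binary search the bound over [max weight, W_T] to accuracy eps."""
    lo, hi = max(weights), sum(weights)  # sum(weights) plays the role of W_T
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(weights, m, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

The monotonicity that justifies the search is simple: if a bound is achievable, every larger bound is too, so the feasible region is a half-line and binary search over [max(weights), W_T] converges to the optimum within the desired accuracy.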

17.
A concurrent processing algorithm is developed for materially nonlinear stability analysis of imperfect columns with biaxial partial rotational end restraints. The algorithm for solving the governing nonlinear ordinary differential equations is implemented on a multiprocessor computer called the finite element machine, developed at the NASA Langley Research Center. Numerical results are obtained on up to nine concurrent processors. A substantial computational gain is achieved in using the parallel processing approach.

18.
In an earlier paper we described how uniformization can be used as the basis of a conservative parallel simulation algorithm for simulating a continuous time Markov chain (CTMC). The fundamental notion is that uniformization permits the calculation (in advance of actually running the simulation) of instants where processors will synchronize, achieving much lower synchronization overhead than is usually attributed to conservative methods. In this paper we extend the idea further, showing how to use uniformization in the context of an optimistic parallel simulation to reduce the frequency of state-saving, schedule intelligently, and eliminate the Global Virtual Time (GVT) calculation. We demonstrate the efficiency of the method by implementation on a 16-processor Intel iPSC/2 and on 256 processors of the Intel Touchstone Delta.
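The uniformization trick underlying both papers can be illustrated on a serial toy example (this is not the parallel algorithm itself; the two-state chain and its rates below are made up). With a uniformization rate Λ at least as large as every state's total exit rate, all transitions can be scheduled at the jumps of a single Poisson(Λ) clock: a jump from state i moves to j with probability q[i][j]/Λ and is otherwise a self-loop, so event instants can be fixed before the simulation runs.

```python
import random

# Uniformized simulation of a CTMC: one Poisson(LAM) clock drives all
# transitions; extra clock ticks become self-loops ("pseudo-events").

def uniformized_step(state, q, lam, rng):
    """One uniformized jump: returns the (possibly unchanged) next state."""
    u = rng.random() * lam          # uniform draw on [0, LAM)
    acc = 0.0
    for nxt, rate in q[state].items():
        acc += rate
        if u < acc:
            return nxt              # real transition, probability rate/LAM
    return state                    # pseudo-event: the chain stays put

# Hypothetical two-state chain: 0 -> 1 at rate 1.0, 1 -> 0 at rate 2.0.
q = {0: {1: 1.0}, 1: {0: 2.0}}
rng = random.Random(0)
state, visits = 0, [0, 0]
for _ in range(100_000):
    state = uniformized_step(state, q, 2.0, rng)  # LAM = 2.0 covers both rates
    visits[state] += 1
print(visits[0] / sum(visits))      # approaches 2/3, the stationary mass of state 0
```

Because the self-loops leave the embedded chain's stationary distribution equal to the CTMC's, the long-run fraction of visits to state 0 converges to 2/3, matching the balance condition π0 · 1.0 = π1 · 2.0.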

19.
Major problems are faced in the aerospace industry today concerning safety in the crowded skies around airports and continuing increases in fuel prices. Lockheed-California Company in collaboration with the NASA Langley Research Center has been working on a development that tackles both of these problems: an airborne four-dimensional computer capability for the L-1011 Tristar jetliner. A trial installation plane is flying with colour electronic displays on a portion of its instrument panel to achieve these ends. The display system is intended to control flights to such a degree that arrival times can be predicted to within a matter of seconds, substantially reducing the congestion and delays of today's airways. Accurate on-board prediction of arrival times in conjunction with an en route traffic metering technique should make air traffic flow much more efficient and lead to a substantial reduction in fuel consumption.

20.
Numerical experiments on the accuracy of ENO and modified ENO schemes
In this paper we make further numerical experiments assessing an accuracy degeneracy phenomenon reported by A. Rogerson and E. Meiburg (this issue, 1990). We also propose a modified ENO scheme, which recovers the correct order of accuracy for all the test problems with smooth initial conditions and gives results comparable to the original ENO schemes for discontinuous problems. Research supported by NSF grant No. DMS88-10150, NASA Langley contract No. NAS1-18605, and AFOSR grant No. 90-0093. Computation supported by NAS.
