Similar Articles (20 results)
1.
The CYBER 205 is a new computer system produced by Control Data Corporation at Arden Hills, Minnesota. The CYBER 205 is a large-scale vector processor with substantial capabilities in scalar processing, one to four million words of main memory and concurrent I/O facilities. The CYBER 205 is available to potential users now, in the last half of 1981. These facilities are found in Control Data CYBERNET Centers as well as in several installations throughout the world. The technology employed in the construction of the CYBER 205 emphasizes the latest advances in large-scale integration of both logic and memory. The CYBER 205 system concept exploits the CDC Loosely Coupled Network (LCN). Standard FORTRAN is supplemented by various extensions embedded in the provided FORTRAN compiler. The arithmetic properties of the computer are designed to support the functional requirements of FORTRAN in 32-bit and 128-bit formats. A multiprogramming operating system based on the CYBER 205 virtual memory is also provided by Control Data Corporation. A variety of programs have now been run on the CYBER 205. The results of some of these runs are shown in tabular form for parametric study purposes. Analysis of these tables is converted into (tentative) timing formulas applicable to the specific algorithms described. The net effect of the algorithm development, the language support, the CYBER 200 libraries, the input/output support system and the CYBER 200 operating system is to extend the utility of supercomputers in a user environment. This system is now operational (in 1981).

2.
In mid-1973, the Division of Computing Research, CSIRO, took delivery of a Control Data CYBER76 computer, which acts as the primary processing power of the CSIRO computer network (CSIRONET), replacing the Control Data 3600. The 3600, running under the DAD operating system, has been retained as a ‘front-end’ to the CYBER76 and continues to support the following functions:
  1. An interactive system allowing both editing and CYBER76 job submission
  2. Input of job files centrally from 3600 input devices, remotely from CSIRONET devices, or from interactive console users, and output of results similarly
  3. A permanent file (document) system, with tape archiving
This paper describes the linking of the CYBER76 to the 3600. Software for a CYBER76 PPU has been written, and some changes to the 3600 and CYBER76 operating systems have been required.

3.
We present a Monte Carlo programme version, written in Vector-FORTRAN 200, which allows fast computation of the thermodynamic properties of dense model fluids on the CYBER 205 vector processing computer. A comparison of the execution speed of this programme, a scalar version and a vectorized molecular dynamics programme showed the following: (i) the vectorized form of the Monte Carlo programme runs about a factor of 8 faster on the CYBER 205 than the scalar version does on the conventional computer CYBER 855; (ii) for small ensembles of 32–108 particles, the Monte Carlo programme runs at about the same speed as the molecular dynamics one. However, for larger numbers of particles the molecular dynamics programme executes vastly faster on the CYBER 205 than the Monte Carlo programme, particularly when neighbour tables are used. We propose a technique to accelerate the Monte Carlo programme for larger ensembles.

4.
Computers & Chemistry, 1989, 13(4): 313–317
A large fraction of the time needed to calculate the energy of a configuration of polarizable water molecules goes into calculating the electric field and polarization energy. This paper describes vectorization strategies for such calculations on the CYBER 205. For a cluster of 215 waters and the Lybrand-Kollman model, the vectorized calculation on the CYBER 205 executes at about 47 times VAX 8650 speed, or about 300 times VAX 11/780 speed.

5.
Memory interleaving and multi-access ports are key features of state-of-the-art supercomputers, by which the powerful CPU is supplied with data at an adequate speed. If these parallel features of the memory organisation cannot be fully exploited, for example because of an unfavourable distribution of data in memory, data transport slows down and consequently the performance of the CPU is reduced.

We present here a model of the memory access of supercomputers, especially vector computers of the SIEMENS VP series. The model results in a formula that gives quantitative predictions of the access times for vector elements with a constant stride. Model parameters are given for the SIEMENS VP series, and the theoretical and measured values are compared.
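The abstract does not reproduce the formula itself. As a rough, generic illustration of this kind of access-time model (not the SIEMENS VP formula from the paper; all parameter values below are made up), a common textbook approximation penalises strides that collide with the memory-bank interleave:

```python
from math import gcd

def access_time(n, stride, banks=16, t_cycle=1.0, t_startup=10.0):
    """Rough estimate of the time to load n vector elements with a constant
    stride from an interleaved memory with `banks` banks.

    Elements mapping to the same bank must wait for that bank's cycle time,
    so the effective slowdown factor is gcd(stride, banks): a stride coprime
    to the number of banks streams at full speed, while a stride equal to the
    interleave hammers a single bank.  Illustrative values, not SIEMENS VP data.
    """
    conflict = gcd(stride, banks)
    return t_startup + n * t_cycle * conflict

if __name__ == "__main__":
    for s in (1, 2, 8, 16, 17):
        print(f"stride {s:2d}: {access_time(10_000, s):8.0f} cycles")
```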


6.
The CRAY-2 is considered to be one of the most powerful supercomputers. Its state-of-the-art technology features a faster clock and more memory than any other supercomputer available today. In this report the single-processor performance of the CRAY-2 is compared with that of the older, more mature CRAY X-MP. Benchmark results are included for both the slow-memory and the fast-memory (DRAM MOS) CRAY-2 models. Our comparison is based on a kernel benchmark set aimed at evaluating the performance of these two machines on some standard tasks in scientific computing. Particular emphasis is placed on evaluating the impact of the availability of large real memory on the CRAY-2 versus fast secondary memory on the CRAY X-MP with SSD. Our benchmark includes large linear equation solvers and FFT routines, which test the capabilities of the different approaches to providing large memory. We find that in spite of its higher processor speed the CRAY-2 does not perform as well as the CRAY X-MP on the Fortran kernel benchmark. We also find that for large-scale applications which have regular and predictable memory access patterns, a high-speed secondary memory device such as the SSD can provide performance equal to the large real memory of the CRAY-2. (The author is an employee of the SCA Division of Boeing Computer Services.)

7.
Most benchmarks are smaller than actual application programs. One reason is to improve benchmark universality by demanding resources every computer is likely to have. However, users dynamically increase the size of application programs to match the power available, whereas most benchmarks are static and of a size appropriate for the computers available when the benchmark was created; this is particularly true for parallel computers. Thus, the benchmark overstates computer performance, since smaller problems spend more time in cache. Scalable benchmarks, such as HINT, examine the full spectrum of performance through various memory regimes, and express a superset of the information given by any particular fixed-size benchmark. Using 5,000 experimental measurements, we have found that performance on the NAS Parallel Benchmarks, SPEC, LINPACK, and other benchmarks is predicted accurately by subsets of the HINT performance curve. Correlations are typically better than 0.995. Predicted rankings are often perfect.

8.
Recently, a number of advanced architecture machines have become commercially available. These new machines promise better cost performance than traditional computers, and some of them have the potential of competing with current supercomputers, such as the CRAY X-MP, in terms of maximum performance. This paper describes the methodology and results of a pilot study of the performance of a broad range of advanced architecture computers using a number of complete scientific application programs. The computers evaluated include:
  1. shared-memory bus-architecture machines such as the Alliant FX/8, the Encore Multimax, and the Sequent Balance and Symmetry
  2. shared-memory network-connected machines such as the Butterfly
  3. distributed-memory machines such as the NCUBE, Intel and Jet Propulsion Laboratory (JPL)/Caltech hypercubes
  4. very long instruction word machines such as the Cydrome Cydra-5
  5. SIMD machines such as the Connection Machine
  6. ‘traditional’ supercomputers such as the CRAY X-MP, CRAY-2 and SCS-40.
Seven application codes from a number of scientific disciplines have been used in the study, although not all the codes were run on every machine. The methodology and guidelines for establishing a standard set of benchmark programs for advanced architecture computers are discussed. The CRAYs offer the best performance on the benchmark suite; the shared-memory multiprocessor machines generally permitted some parallelism and, when coupled with substantial floating-point capabilities (as in the Alliant FX/8 and Sequent Symmetry), provided an order of magnitude less speed than the CRAYs. Likewise, the early-generation hypercubes studied here generally ran slower than the CRAYs, but permitted substantial parallelism from each of the application codes.

9.
Extended, or Level 2, BLAS is intended to improve the performance of portable programs on high-performance computers. In this paper we examine where Extended BLAS routines may be inserted in LINPACK so that no changes to the parameter lists have to be made. We also discuss why, for some algorithms, a simple restructuring in terms of Level 2 BLAS fails. We do not attempt to redesign the algorithms or to change the data structures. We concentrate on the translation of calls to the original (Level 1) BLAS into calls to Level 2 BLAS to improve readability, modularity, and efficiency. This examination results in a still portable subset of LINPACK with better performance than the original routines. The measured performance of original and modified LINPACK routines on the CDC CYBER 990, CDC CYBER 205, CRAY X-MP, and NEC SX-2 is compared and analyzed.
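As a minimal NumPy sketch of the restructuring idea (an illustration, not the LINPACK code discussed in the paper): an inner loop of AXPY-style column updates, the Level 1 style, collapses into one rank-1 update of the trailing block, the kind of operation a single Level 2 call such as GER performs.

```python
import numpy as np

def update_level1(a, k):
    """Update the trailing block column by column with AXPY-style
    operations (y <- y + alpha * x), the Level 1 BLAS style in which
    the original LINPACK routines are written."""
    n = a.shape[1]
    for j in range(k + 1, n):
        a[k + 1:, j] += a[k, j] * a[k + 1:, k]
    return a

def update_level2(a, k):
    """The same trailing-block update expressed as one rank-1 update,
    the operation a single Level 2 BLAS call (e.g. GER) performs."""
    a[k + 1:, k + 1:] += np.outer(a[k + 1:, k], a[k, k + 1:])
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((6, 6))
    assert np.allclose(update_level1(m.copy(), 0), update_level2(m.copy(), 0))
    print("Level 1 loop and Level 2 rank-1 update agree")
```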

10.
Supercomputers such as the CRAY-1, CRAY X-MP, CYBER 205, ETA10, etc. have regularly been used for solving numerical problems. It is very rare that supercomputers are used to solve combinatorial problems. In this paper we present an efficient vectorized algorithm to solve the set cover problem, which has been proved to be NP-complete, on a supercomputer, the ETA10-Q108. This algorithm fully utilizes vector instructions. Experiments were performed on both the ETA10-Q108 and a VAX/8550 for comparison. The VAX/8550 takes 1174.5 seconds to solve a set of problem instances, while the ETA10-Q108 takes only 26.6 seconds to solve the same set. For a problem instance involving 7000 elements in a set, the supercomputer takes 47.74 seconds; the VAX/8550 would need roughly 15 hours. Thus we conclude that it is quite feasible to solve the set cover problem using supercomputers.
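The abstract does not say which set cover algorithm was vectorized. As a sketch of how this kind of combinatorial search maps onto whole-array (vector) operations, here is a greedy cover heuristic with the inner "best remaining set" search expressed in NumPy; the random instance and all names are illustrative only.

```python
import numpy as np

def greedy_cover(membership):
    """Greedy set cover over a boolean membership matrix
    (rows = candidate sets, columns = elements).  The inner search for
    the set covering the most uncovered elements is written as
    whole-array operations, the kind of work a vector machine streams
    through its pipes.  Illustrative heuristic, not the paper's algorithm."""
    n_sets, n_elems = membership.shape
    uncovered = np.ones(n_elems, dtype=bool)
    chosen = []
    while uncovered.any():
        gain = (membership & uncovered).sum(axis=1)   # vectorized counts
        best = int(gain.argmax())
        if gain[best] == 0:
            break                                      # remaining elements are uncoverable
        chosen.append(best)
        uncovered &= ~membership[best]
    return chosen

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m = rng.random((200, 7000)) < 0.01                 # sparse random instance
    print(len(greedy_cover(m)), "sets chosen")
```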

11.
Existing algorithms for training large-scale support vector machines require large amounts of memory and long training times, and are usually only practical on large parallel clusters. This paper proposes an efficient algorithm for large-scale support vector machines (SVMs) that can be run on an ordinary personal computer. It consists of three steps: first, the large sample set is sub-sampled to reduce the data size; then a random Fourier mapping is used to explicitly construct a random feature space, in which a linear SVM uniformly approximates the Gaussian-kernel SVM; finally, a parallel implementation of the linear SVM for multi-core environments is given to further improve efficiency. Comparative experiments on standard data sets verify the feasibility and efficiency of the proposed algorithm.
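A minimal sketch of the random-Fourier-feature step using scikit-learn (not the authors' implementation; the synthetic data set, gamma, feature count and C below are illustrative assumptions): the explicit random map lets a cheap linear SVM stand in for a Gaussian-kernel SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.svm import LinearSVC

# Map the data into an explicit random feature space in which a linear SVM
# approximates a Gaussian-kernel SVM, then train the cheap linear model.
X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)

rff = RBFSampler(gamma=0.1, n_components=500, random_state=0)  # z(x) approximating the RBF kernel
Z = rff.fit_transform(X)

clf = LinearSVC(C=1.0, max_iter=5000)
clf.fit(Z, y)
print("training accuracy:", clf.score(Z, y))
```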

12.
A method for developing network parallel programs based on PVM in a microcomputer environment
PVM (Parallel Virtual Machine) is a general-purpose environment for developing network parallel programs. It allows networked supercomputers, massively parallel machines, workstations and microcomputers to be used together as one large parallel machine, on which users can develop parallel algorithms or run parallel systems. This paper introduces the basics of PVM and its latest developments, discusses methods for developing network parallel programs based on PVM, and finally gives a concrete example.

13.
A new Monte Carlo algorithm for the 3D Ising model and its implementation on a CDC CYBER 205 are presented. The approach is applicable to lattices with sizes between 3×3×3 and 192×192×192 with periodic boundary conditions, and is adjustable to various kinetic models. It simulates a canonical ensemble at a given temperature, generating a new random number for each spin flip. For the Metropolis transition probability the speed is 27 ns per update on a two-pipe CDC CYBER 205 with 2 million words of physical memory, i.e. 1.35 times the cycle time per update, or 38 million updates per second.
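The abstract does not spell out the update scheme. A common way to expose the independence that vector hardware needs is a checkerboard decomposition, sketched below in NumPy (an illustration, not the paper's algorithm; the lattice size and temperature are arbitrary).

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One vectorized Metropolis sweep of a 3D Ising lattice with periodic
    boundaries.  The two checkerboard sublattices are updated in turn, so
    every spin updated in one pass has fixed neighbours -- the independence
    property that lets the sweep run as long vector operations."""
    x, y, z = np.indices(spins.shape)
    parity = (x + y + z) % 2
    for color in (0, 1):
        # sum of the six nearest neighbours with periodic wrap-around
        nn = sum(np.roll(spins, s, axis=a) for a in range(3) for s in (1, -1))
        dE = 2.0 * spins * nn                       # energy cost of flipping each spin
        flip = (rng.random(spins.shape) < np.exp(-beta * dE)) & (parity == color)
        spins[flip] *= -1
    return spins

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.choice([-1, 1], size=(32, 32, 32))
    for _ in range(100):
        metropolis_sweep(s, beta=0.25, rng=rng)
    print("magnetization per spin:", s.mean())
```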

14.
Intensive use of computer experiments in modern science introduces qualitative changes into experimental resources. This implies a change in the techniques used to solve the relevant problems. An analysis of technological chains (from problem statement to solution) shows that a particular problem can often be solved in a variety of ways with the use of modern multiprocessor computers, also called supercomputers. The multiplicity of approaches to solving a problem requires that researchers possess certain skills in using supercomputers. It is difficult for novice users of multiprocessor computers to find their bearings when developing software for solving applied problems. Practice shows that the main difficulties reveal themselves when portable and efficient parallel software has to be developed. This is because tools that facilitate development and provide full access to debugging information have yet to be elaborated. Actually, the problem is the absence of standards for development and debugging tools for supercomputers, which is explained by the fact that computer science is still young. For the same reason, no logically complete basic texts for concurrent programming courses for novices are available. On the basis of Russian-language literature, an attempt is made at setting up beacons that mark certain common and promising technologies for using supercomputers. The emphasis is placed on problems encountered by programmers when solving applied problems with supercomputers. The development of multiprocessor computers is closely related to concurrent programming technologies, both universal and oriented to specific supercomputer architectures. By programming technology, i.e., by memory management, we mean the use of tools designed for managing a particular computer system. It should be noted that when developing software for supercomputers (both management software and programs for solving applied problems), one must pay special attention to programming technique, i.e., to designing the logical architecture of a program. This implies developing and extending parallelizing algorithms, which enhances the efficiency of execution on multiprocessor computers. This review was compiled on the basis of publications in Russian journals and in the Russian Internet zone.

15.
R. Baird, Software, 1973, 3(4): 385–395
This article describes a technique for controlling computer system activity in order to measure selected hardware or software functions as they interact within an application under real operating conditions. APET is an approach for constructing synthetic benchmark, or yardstick, programs that exercise the computer system.

16.
Lattice gauge theory is a technique for studying quantum field theory free of divergences. All Monte Carlo computer calculations up to now have been performed on scalar machines. A technique has been developed for effectively vectorizing this class of Monte Carlo problems. The key to vectorizing is finding groups of points on the space-time lattice which are independent of each other. This requires a particular ordering of the points along diagonals. A matrix-multiply technique is used which enables one to obtain the whole result matrix in one pass. The CDC CYBER 205 is most suitable for this class of problems, which use random “index lists” (arising from the ordering algorithm and the use of random numbers), owing to the hardware implementation of GATHER and SCATTER operations, which perform at streaming rate. A preliminary implementation of this method has executed 5 times faster than on the CDC 7600 system.
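As a sketch of the gather/scatter pattern described above (illustrative NumPy, not the paper's code; the lattice, update and index list are made up): mutually independent sites are gathered through an index list, updated as one long vector, and scattered back.

```python
import numpy as np

# Sites that are mutually independent are collected through an index list,
# updated as one dense vector, and scattered back into the lattice.  On the
# CYBER 205 the GATHER/SCATTER hardware streams exactly this indexed access.
rng = np.random.default_rng(0)
lattice = rng.standard_normal(16 * 16 * 16)         # flattened lattice fields

# Hypothetical index list of mutually independent sites (e.g. one diagonal).
index_list = np.arange(0, lattice.size, 7)

gathered = lattice[index_list]                       # GATHER into a dense vector
gathered = 0.5 * gathered + rng.standard_normal(gathered.size)  # vector update
lattice[index_list] = gathered                       # SCATTER back into place
print("updated", index_list.size, "sites")
```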

17.
18.
The vectorization of FORTRAN programmes for the computation of the forces in molecular dynamics (MD) calculations is described. For systems containing linear molecules, two equivalent MD methods can be used: the Singer method and the constraints method. The FORTRAN vector code is presented and discussed for both methods, and a comparison of computation times on the CYBER 205 is given. For the two-centre Lennard-Jones potential, the constraints algorithm becomes increasingly less efficient than the Singer algorithm when executed on the CYBER 205. The reason for this is the difference in the neighbour list, which is built for the centre of each molecule in the Singer method but for each site of the molecule in the constraints method. Both programmes run about a factor of 15 faster on the CYBER 205 than on the conventional computer CYBER 175, for 108 or 256 linear molecules.
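To make the neighbour-list difference concrete, here is a toy NumPy comparison (illustrative only, not the paper's code; the cutoff, box size and two-site geometry are assumptions): building the list per molecule centre, as in the Singer method, yields far fewer pairs than building it per interaction site, as in the constraints method.

```python
import numpy as np

def neighbour_pairs(points, cutoff):
    """Count point pairs closer than `cutoff` (brute force, no periodic box).
    Toy illustration of per-centre versus per-site neighbour lists."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return int(np.count_nonzero(np.triu(d < cutoff, k=1)))

rng = np.random.default_rng(0)
centres = rng.random((108, 3)) * 10.0                     # one centre per molecule
offsets = rng.standard_normal((108, 2, 3)) * 0.2          # two sites per molecule
sites = (centres[:, None, :] + offsets).reshape(-1, 3)

print("centre-centre pairs:", neighbour_pairs(centres, cutoff=3.0))
print("site-site pairs:    ", neighbour_pairs(sites, cutoff=3.0))
```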

19.
Several practical control programs for a speed tracker with a digital control computer are studied in this paper. The stability and adaptability of the closed-loop system are discussed. A real-time hybrid computer simulation is presented.

20.
Cyclic reduction, originally proposed by Hockney and Golub, is the most popular algorithm for solving tridiagonal linear systems on SIMD-type computers like the CRAY-1 or the CDC CYBER 205. That algorithm seems to be the adequate one for the IBM 3090 VF (uniprocessor) too, although the overall expected speedup over Gaussian elimination specialized for tridiagonal systems is not as high as for the CRAY-1 or the CYBER 205. That is because the excellent scalar speed of the IBM 3090 makes its vector-to-scalar speed ratio relatively moderate.

The idea of the cyclic reduction algorithm can be generalized and modified in various directions. A polyalgorithm can be derived which takes the given architecture of the IBM 3090 VF into account much better than the ‘pure’ cyclic reduction algorithm as described, for instance, by Kershaw. This is mainly achieved by introducing more locality into the formulae. For large systems of equations the well-known cache problems are prevented.
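For reference, a textbook sketch of plain cyclic reduction in NumPy (not the modified polyalgorithm of the paper; the test system is arbitrary): at every level the odd-indexed unknowns are eliminated in mutually independent operations, which is what makes the method attractive on vector hardware.

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve the tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]
    (with a[0] = c[-1] = 0) by cyclic reduction.  Each level folds the odd
    equations into their even neighbours; all eliminations on one level are
    independent of each other.  Textbook sketch, not the IBM 3090 VF code."""
    n = len(b)
    if n == 1:
        return d / b
    even = np.arange(0, n, 2)
    lo = np.maximum(even - 1, 0)            # odd neighbour below
    hi = np.minimum(even + 1, n - 1)        # odd neighbour above
    alpha = np.where(even > 0, -a[even] / b[lo], 0.0)
    gamma = np.where(even < n - 1, -c[even] / b[hi], 0.0)

    a2 = alpha * a[lo]
    b2 = b[even] + alpha * c[lo] + gamma * a[hi]
    c2 = gamma * c[hi]
    d2 = d[even] + alpha * d[lo] + gamma * d[hi]

    x = np.zeros(n)
    x[even] = cyclic_reduction(a2, b2, c2, d2)   # solve the halved system
    odd = np.arange(1, n, 2)
    up = np.minimum(odd + 1, n - 1)
    x[odd] = (d[odd] - a[odd] * x[odd - 1] - c[odd] * x[up]) / b[odd]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 257
    a = rng.random(n); a[0] = 0.0
    c = rng.random(n); c[-1] = 0.0
    b = 4.0 + rng.random(n)                  # diagonally dominant test system
    d = rng.random(n)
    x = cyclic_reduction(a, b, c, d)
    full = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    print("max residual:", np.abs(full @ x - d).max())
```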

