
1.

Vector performance estimation for CRAY X-MP/Y-MP supercomputers

Allen R. Hainline Steven R. Thompson Lawrence L. Halcomb 《The Journal of supercomputing》1992,6(1):49-70

Optimization of vector-intensive applications for the CRAY X-MP/Y-MP often requires arranging the operations to take full advantage of such architectural features as the memory system, independent memory ports, chaining, and independent functional units. Estimation of performance is not straightforward since many operations can occur concurrently. As a tool for making trades between vector algorithms, a method has been developed and used successfully at E-Systems Inc. to predict the execution time of a sequence of vector operations without resorting to actual code development. This method reduced our software development time, produced significantly more efficient code, and provided for a systematic approach to optimization. The performance estimation is generally accurate to within 10% and accounts for memory conflicts that result from fixed stride references.
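
The abstract mentions memory conflicts caused by fixed-stride references. As a generic illustration only (not the authors' estimation method), the sketch below assumes an interleaved memory in which word address `a` lives in bank `a mod B`. Under that assumption a stream with stride `s` touches only `B / gcd(s, B)` distinct banks, so strides sharing a large factor with the bank count revisit the same bank sooner. The bank count of 64 and the function name are placeholders chosen for the example, not figures from the paper.

```python
from math import gcd

NUM_BANKS = 64  # assumed bank count, chosen only for illustration


def banks_touched(stride: int, num_banks: int = NUM_BANKS) -> int:
    """Distinct banks hit by a fixed-stride stream, assuming bank = address mod num_banks."""
    return num_banks // gcd(stride, num_banks)


if __name__ == "__main__":
    for stride in (1, 2, 3, 8, 64):
        # Fewer banks touched means the same bank is revisited after fewer accesses.
        print(f"stride {stride:2d}: {banks_touched(stride):2d} of {NUM_BANKS} banks")
```

Whether a revisit actually stalls depends on the bank busy time of the machine, which this sketch does not model.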

2.

Ultrahigh-performance FFTs for the CRAY-2 and CRAY Y-MP supercomputers

David A. Carlson 《The Journal of supercomputing》1992,6(2):107-116

In this paper a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum length vectors throughout all stages of small- to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that on the CRAY-2 they can later be fetched from local memory in parallel with common memory accesses. Performance results for Fortran implementations using these techniques demonstrate that they are faster than Cray's library FFT routine CFFT2. The actual speedups obtained, which depend on the size of the FFT being computed and the supercomputer being used, range from about 5 to over 300%.

3.

Performance comparison of the CRAY-2 and CRAY X-MP/416 supercomputers

Margaret L. Simmons Harvey J. Wasserman 《The Journal of supercomputing》1990,4(2):153-167

The serial and parallel performance of one of the world's fastest general purpose computers, the CRAY-2, is analyzed using the standard Los Alamos Benchmark Set plus codes adapted for parallel processing. For comparison, architectural and performance data are also given for the CRAY X-MP/416. Factors affecting performance, such as memory bandwidth, size and access speed of memory, and software exploitation of hardware, are examined. The parallel processing environments of both machines are evaluated, and speedup measurements for the parallel codes are given. An earlier version of this paper was presented at Supercomputing '88. This work was performed under the auspices of the U.S. Department of Energy.

4.

A Benchmark Comparison of Three Supercomputers: Fujitsu VP-200, Hitachi S810/120, and Cray X-MP/2

Lubeck O. Moore J. Mendez R. 《Computer》1985,18(12):10-24


5.

Shared-Memory Parallel Vector Implementation of the Immersed Boundary Method for the Computation of Blood Flow in the Beating Mammalian Heart (total citations: 3; self-citations: 0; citations by others: 3)

McQueen David Peskin Charles 《The Journal of supercomputing》1997,11(3):213-236

This paper describes the parallel implementation of the immersed boundary method on a shared-memory machine such as the Cray C-90 computer. In this implementation, outer loops are parallelized and inner loops are vectorized. The sustained computation rates achieved are 0.258 Gflops with a single processor, 1.89 Gflops with 8 processors, and 2.50 Gflops with 16 processors. An application to the computer simulation of blood flow in the heart is presented.
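
The sustained rates quoted in this abstract can be turned into speedup and parallel efficiency with a little arithmetic. The sketch below uses only the three Gflops figures given above; the function names are illustrative, and treating the single-processor rate as the baseline is an assumption of this example rather than something the abstract states.

```python
# Sustained rates from the abstract, in Gflops, keyed by processor count.
RATES_GFLOPS = {1: 0.258, 8: 1.89, 16: 2.50}


def speedup(rate: float, baseline: float = RATES_GFLOPS[1]) -> float:
    """Speedup relative to the single-processor sustained rate."""
    return rate / baseline


def efficiency(rate: float, processors: int) -> float:
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(rate) / processors


if __name__ == "__main__":
    for p, rate in RATES_GFLOPS.items():
        print(f"{p:2d} CPUs: speedup {speedup(rate):5.2f}, "
              f"efficiency {efficiency(rate, p):6.1%}")
```

On these figures the speedup is roughly 7.3 on 8 processors and 9.7 on 16, so efficiency falls from about 92% to about 61% as the processor count doubles.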

6.

A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems

《Future Generation Computer Systems》2014

We present here a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores but at large scale in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology but the first to use its Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the way in which nodes are allocated in each type of system can significantly affect achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. In addition, we examine the differences in performance variability observed on each system and quantify the lost performance using a combination of empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.

7.

($r_\infty$, $n_{1/2}$, $s_{1/2}$) measurements on the 2-CPU CRAY X-MP

Roger W Hockney 《Parallel Computing》1985,2(1):1-14

We report performance measurements made on the 2-CPU CRAY X-MP at ECMWF, Reading. Vector (SIMD) performance on one CPU is interpreted by the two parameters ($r_\infty$, $n_{1/2}$), and we find for dyadic operations using FORTRAN $r_\infty = 70$ Mflop/s, $n_{1/2} = 53$ flop. All vector triadic operations produce $r_\infty = 107$ Mflop/s, $n_{1/2} = 45$ flop; and a triadic operation with two vectors and one scalar gives $r_\infty = 148$ Mflop/s and $n_{1/2} = 60$ flop. MIMD performance using both CPUs on one job is interpreted with the two parameters ($r_\infty$, $s_{1/2}$), where $s_{1/2}$ is the amount of arithmetic that could have been done during the time taken to synchronize the two CPUs. We find, for dyadic operations using the TSKSTART and TSKWAIT synchronization primitives, that $r_\infty = 130$ Mflop/s and $s_{1/2} = 5700$ flop. This means that a job must contain more than ~6000 floating-point operations if it is to run at more than 50% of the maximum performance when split between both CPUs by this method. Less expensive synchronization methods using LOCKS and EVENTS reduce $s_{1/2}$ to 4000 flop and 2000 flop respectively. A simplified form of LOCK synchronization written in CAL code further reduces $s_{1/2}$ to 220 flop. This is probably the minimum possible value for synchronization overhead on the CRAY X-MP.
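
The abstract's remark that a job needs roughly 6000 floating-point operations to exceed 50% of peak follows from the usual two-parameter form of this model, in which the achieved rate for a job of size n is r_∞ / (1 + n_half / n), with n_half standing for either the half-performance vector length or the synchronization overhead. The sketch below assumes that form (the abstract does not spell it out) and plugs in the dyadic figures quoted above; the function names are illustrative.

```python
def achieved_rate(r_inf: float, half_param: float, work: float) -> float:
    """Rate in Mflop/s for a job of `work` flop, assuming r = r_inf / (1 + half_param / work)."""
    return r_inf / (1.0 + half_param / work)


def work_for_fraction(half_param: float, fraction: float) -> float:
    """Job size needed to reach `fraction` (0 < fraction < 1) of r_inf under the same model."""
    return half_param * fraction / (1.0 - fraction)


if __name__ == "__main__":
    # Single-CPU dyadic vector operations: r_inf = 70 Mflop/s, half-length 53.
    for n in (10, 53, 500):
        print(f"vector length {n:4d}: {achieved_rate(70, 53, n):5.1f} Mflop/s")

    # Two CPUs with TSKSTART/TSKWAIT: r_inf = 130 Mflop/s, overhead 5700 flop.
    for s in (5700, 57000):
        print(f"job of {s:6d} flop: {achieved_rate(130, 5700, s):5.1f} Mflop/s")

    # Work needed for 90% of peak with each synchronization method in the abstract.
    for name, s_half in (("TSKSTART/TSKWAIT", 5700), ("LOCKS", 4000),
                         ("EVENTS", 2000), ("CAL LOCK", 220)):
        print(f"{name:17s}: {work_for_fraction(s_half, 0.9):8.0f} flop for 90% of peak")
```

Setting the job size equal to the half parameter gives exactly half the asymptotic rate, which is where the parameter gets its subscript.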

8.

Multilevel Robustness for 2D Vector Field Feature Tracking, Selection and Comparison

Lin Yan Paul Aaron Ullrich Luke P. Van Roekel Bei Wang Hanqi Guo 《Computer Graphics Forum》2023,42(6):e14799

Critical point tracking is a core topic in scientific visualization for understanding the dynamic behaviour of time-varying vector field data. The topological notion of robustness has been introduced recently to quantify the structural stability of critical points, that is, the robustness of a critical point is the minimum amount of perturbation to the vector field necessary to cancel it. A theoretical basis has been established previously that relates critical point tracking with the notion of robustness, in particular, critical points could be tracked based on their closeness in stability, measured by robustness, instead of just distance proximity within the domain. However, in practice, the computation of classic robustness may produce artifacts when a critical point is close to the boundary of the domain; thus, we do not have a complete picture of the vector field behaviour within its local neighbourhood. To alleviate these issues, we introduce a multilevel robustness framework for the study of 2D time-varying vector fields. We compute the robustness of critical points across varying neighbourhoods to capture the multiscale nature of the data and to mitigate the boundary effect suffered by the classic robustness computation. We demonstrate via experiments that such a new notion of robustness can be combined seamlessly with existing feature tracking algorithms to improve the visual interpretability of vector fields in terms of feature tracking, selection and comparison for large-scale scientific simulations. We observe, for the first time, that the minimum multilevel robustness is highly correlated with physical quantities used by domain scientists in studying a real-world tropical cyclone dataset. Such an observation helps to increase the physical interpretability of robustness.

9.

Multi‐objective H2/L2 performance controller synthesis for LPV systems

Wei Xie 《Asian journal of control》2012,14(5):1273-1281

This paper provides a new matrix inequality formulation for multi‐objective H2/L2 performance controller synthesis of linear parameter varying systems. These new matrix inequalities enable us to parameterize controllers without involving the Lyapunov variables in the formulation. Taking advantage of this feature, we can readily design multi‐objective controllers with non‐common parameter‐dependent Lyapunov variables and two adjustable scalars. Furthermore, to obtain possibly lower values of the performance criteria, a linear matrix inequality (LMI)‐based optimization problem is solved by gridding the space spanned by these two scalars. Finally, a numerical example is included to illustrate the effectiveness of the proposed method. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

10.

Level estimation, classification and probability distribution architectures for trading the EUR/USD exchange rate

Andreas Lindemann Christian L. Dunis Paulo Lisboa 《Neural computing & applications》2005,14(3):256-271

Dunis and Williams (Derivatives: use, trading and regulation 8(3):211–239, 2002; Applied quantitative methods for trading and investment. Wiley, Chichester, 2003) have shown the superiority of a Multi-layer perceptron network (MLP), outperforming its benchmark models such as a moving average convergence divergence technical model (MACD), an autoregressive moving average model (ARMA) and a logistic regression model (LOGIT) on a Euro/Dollar (EUR/USD) time series. The motivation for this paper is to investigate the use of different neural network architectures. This is done by benchmarking three different neural network designs representing a level estimator, a classification model and a probability distribution predictor. More specifically, we present the Multi-layer perceptron network, the Softmax cross entropy model and the Gaussian mixture model and benchmark their respective performance on the Euro/Dollar (EUR/USD) time series as reported by Dunis and Williams. As it turns out, the Multi-layer perceptron does best when used without confirmation filters and leverage, while the Softmax cross entropy model and the Gaussian mixture model outperform the Multi-layer perceptron when using more sophisticated trading strategies and leverage. This might be due to the ability of both models using probability distributions to successfully identify trades with a high Sharpe ratio.


11.

Ordered mesoporous Pd/SnO₂ synthesized by a nanocasting route for high hydrogen sensing performance

Jing Zhao Weinan Wang Yinping Liu Jinming Ma Xiaowei Li Yu Du Geyu Lu 《Sensors and actuators. B, Chemical》2011,160(1):604

Ordered mesoporous SnO₂ and mesoporous Pd/SnO₂ have been successfully synthesized via a nanocasting method using the hexagonal mesoporous SBA-15 as template. Two different procedures, impregnation and direct synthesis, were used to dope the mesoporous SnO₂ with Pd. The results of small angle X-ray diffraction (SAXD), nitrogen adsorption–desorption and transmission electron microscopy (TEM) demonstrate that the SnO₂ and Pd/SnO₂ display ordered mesoporous structures and high surface areas. Wide angle X-ray diffraction (WAXD) and X-ray photoelectron spectroscopy (XPS) reveal the tetragonal structure of SnO₂ and the presence of Pd. The H₂ sensing properties of mesoporous SnO₂ and mesoporous Pd/SnO₂ were measured. The sensor using mesoporous Pd/SnO₂ prepared by direct synthesis exhibits excellent response and recovery behavior and much higher sensitivity to H₂ than those using mesoporous SnO₂ and mesoporous Pd/SnO₂ prepared by impregnation. It is believed that its high gas sensing performance derives from the large surface area, the high activity and good dispersion of the Pd additive, and the high porosity, which lead to highly effective surface interaction between the target gas molecules and the surface active sites.

12.

Behavioral modeling for concurrent dual‐band power amplifiers using 2D Hammerstein/Wiener models

You‐Jiang Liu Wenhua Chen Jie Zhou Bang‐Hua Zhou F. M. Ghannouchi 《International Journal of RF and Microwave Computer‐Aided Engineering》2013,23(6):646-654

Behavioral modeling for the concurrent dual‐band power amplifier (PA) is a critical problem in practical applications. The nonlinear distortion in the concurrent dual‐band PA is quite different from that in the conventional single‐band PA. This article analyzes the nonlinearities in the concurrent dual‐band PA and reveals that both input signals in the dual bands are important for the behavioral modeling. The 2D Hammerstein model and 2D Wiener model are proposed for the first time for the concurrent dual‐band PA. They are extended versions of conventional Hammerstein and Wiener structures used in the single‐band PA by including the cross‐band intermodulation in the static nonlinearity block. The proposed 2D models require far fewer coefficients than the original 2D‐DPD model. Experiments were carried out for an 880 MHz/1960 MHz concurrent dual‐band Doherty PA to demonstrate the effectiveness of the proposed models. The results clearly show that normalized mean square errors (NMSEs) below −40 dB are obtained in both bands in the behavioral modeling. © 2012 Wiley Periodicals, Inc. Int J RF and Microwave CAE 23: 646–654, 2013.

13.

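MPEG-2/H.264 spatial-resolution transcoding that reuses motion vectors and prediction residuals

Yang Gaobo Lei Jing Chen Weiwei 《Journal of Chinese Computer Systems》2009,30(7)

This paper proposes a new MPEG-2 to H.264 video transcoding algorithm. By making full use of information obtained during MPEG-2 decoding, such as motion vectors and macroblock coding residuals, it significantly reduces the computational complexity of macroblock coding-mode decision and motion estimation in H.264 encoding, and produces the final H.264 video stream. Simulation results on several typical video test sequences show that the algorithm incurs little loss in video quality and has stable rate-distortion performance, which favors real-time transcoding.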

14.

Total ionizing radiation effects of 2-T SONOS for 130 nm/4 Mb NOR flash memory technology

QIAO FengYing PAN LiYang YU Xiao MA HaoZhi WU Dong XU Jun 《Science China Information Sciences》2014,57(6):1-9

In this paper, we have studied the total ionizing dose (TID) radiation response up to 2 Mrad(Si) of silicon-oxide-nitride-oxide-silicon (SONOS) memory cells and memory circuits, fabricated in a 130 nm complementary metal-oxide-semiconductor (CMOS) SONOS technology. We explored the threshold voltage (VT) degradation mechanism and found that the VT shifts of SONOS cells depend on the charge state; simply programming the cell to a higher VT cannot compensate for the radiation-induced VT loss. The off-state current (Ioff) increase in the SONOS cell is also studied in this paper. Both VT and Ioff degradation would affect the memory system. Read data failures are mainly caused by VT shifts under irradiation, and program and erase failures are mainly caused by increased Ioff, which overloads the charge pumping circuit. By varying the reference current, our 4 Mb NOR flash chip has the potential to survive a radiation dose of 1 Mrad(Si) in read mode.

15.

Utility of spectral vegetation index for estimation of gross CO₂ flux under varied sky conditions

Tatsuro Nakaji Reiko Ide Nobuko Saigusa 《Remote sensing of environment》2007,109(3):274-284

To estimate the gross CO₂ flux (F_CO₂) of deciduous coniferous forest from canopy spectral reflectance, we introduced spectral vegetation indices (VIs) into a light use efficiency (LUE) model of mature Japanese larch (Larix kaempferi) forest. We measured the eddy covariance CO₂ flux and spectral reflectance of larch canopy at half-hourly intervals during one growing season, and investigated the relationships between the parameters of the LUE model (FAPAR, ε) and 3 types of VIs (NDVI, PRI, EVI) in both clear sky and cloudy conditions. FAPAR (fraction of absorbed photosynthetically active radiation) had a positive linear relationship with both NDVI (normalized difference vegetation index) and EVI (enhanced vegetation index), and the sky condition had little effect on the relationships. The relative RMSE (root mean square error) of the APAR (absorbed photosynthetically active radiation) based on the incoming PAR and estimated FAPAR from a linear function of NDVI was less than 10.5%, irrespective of sky condition. Half-hourly values of ε (conversion efficiency of absorbed energy) showed both seasonal variation related to leaf phenology and short-term variation related to light intensity due to varied sun position and sky condition. Both EVI and PRI (photochemical reflectance index) were significantly correlated with ε. EVI showed a positive linear relationship with ε as a result of their similar seasonal variation. However, since EVI did not detect short-term variation of ε, their relationship differed among sky conditions. On the other hand, although PRI could trace the short-term variation of ε in green needles, the relationship became non-linear due to drastic reduction of PRI in the senescent needles. EVI/(PRI/PRI_min), a combined index based on a 6-day moving minimum value of PRI (PRI_min), showed a linear relationship with half-hourly values of ε throughout the seasons irrespective of sky condition. This index allows us to estimate ε in all sky conditions with a smaller error (rRMSE = 35.2%) than using EVI or PRI alone (38.7%-48.7%). Consequently, this combined index-derived ε and NDVI-based FAPAR gave a low estimation error of F_CO₂ (rRMSE = 36.4%, RMSE = 8.3 μmol m⁻² s⁻¹). Although there are still various issues to resolve, including adaptive limit and combination of vegetation index type, we conclude that the combination of PRI and EVI increased the accuracy of estimation of CO₂ uptake in deciduous forest even though sky conditions varied.

16.

Using a TiO₂/ZnO double-layer film for improving the sensing performance of ZnO based NO gas sensor

Chia-Yu Lin Jiang-Ging Chen Wei-Yi Feng Chii-Wann Lin Ju-Wen Huang James J. Tunney Kuo-Chuan Ho 《Sensors and actuators. B, Chemical》2011,157(2):361-367

NO gas sensors, based on ZnO thin film (ZnO_film), TiO₂ nanoparticulate film (TiO₂NP), and TiO₂NP/ZnO_film double-layer film, were fabricated, and their sensing characteristics towards NO gas were investigated in this study. The maximal response towards NO gas of a ZnO_film deposited onto a rougher Al₂O₃ substrate was higher than that of a ZnO_film deposited on a smoother glass substrate. Although the sensing response of the TiO₂NP film itself towards NO gas was minute, the TiO₂NP/ZnO_film double-layer film showed an enhanced response compared with the TiO₂NP or ZnO_film single-layer film. In addition, the sensor response of the TiO₂NP/ZnO_film double-layer film was strongly influenced by the annealing time used in film preparation; the maximum response to NO was enhanced about 6.2 times as the annealing time was increased from 30 min to 2 h. Based on the XPS results, an increase in the transition zone between TiO₂NP and ZnO_film, along with the appearance of the Ti³⁺ state, was observed when the annealing time was increased. With the highly sensitive TiO₂NP/ZnO_film/Al₂O₃ electrode, a limit of detection (S/N = 3) of 8.8 ppb can be achieved. The double-layer TiO₂NP/ZnO_film also showed improved selectivity with respect to NO₂ and CO.

17.

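A highway SO₂ concentration prediction algorithm based on fuzzy time series and support vector machines

Yue Pengcheng Zhang Linliang Ma Yuejun 《Computer Systems & Applications》2017,26(6):1-8

To address problems in existing SO₂ concentration prediction methods, such as the lack of consensus on pollutant sources and influencing factors, sensitivity to small-sample data, and a tendency to become trapped in local optima, this paper proposes a highway SO₂ concentration prediction algorithm based on fuzzy time series and support vector machines, providing reliable theoretical support for building a highway environmental health monitoring system. Following the seasonal variation of SO₂ concentration, the method takes the season as the time series and 24 h as the granulation window width, extracts feature values from the raw sample data with a Gaussian kernel function, feeds them into a support vector machine to train the model, and optimizes the model parameters using k-fold cross-validation combined with grid search. The method is applied to build an SO₂ concentration prediction model, using hourly SO₂ concentration measurements from a monitoring site on the Taijiu Expressway in Shanxi Province from April 2014 to March 2015 as sample data, with the computation implemented using the LIBSVM tool on the MATLAB platform. The results show that the algorithm is not constrained by mechanistic theoretical research, supports small-sample learning, fits nonlinear data well, and generalizes well.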

18.

On ?_∞, robust stability and performance with ?₂ and ?_∞ perturbations

Mohammed Dahleh Alexandre Megretski Bassam Bamieh 《Systems & Control Letters》1996,28(1):1

In this paper we will consider systems with linear time-invariant perturbations. We will analyze robust performance in the ?₂ and ?_∞ settings. The ?₂ setting gives rise to the familiar case of structured singular values, and a stability criterion is given by the “small μ” theorem. We show that although the necessary and sufficient criterion of robust stability for the ?_∞ case (?_∞ stability with structured ?_∞-gain bounded perturbations) is the same “small μ” criterion, a system with ?₂-gain bounded perturbations is never ?_∞ stable.

19.

APL and FORTRAN programs for a new equation of state for H₂O, CO₂, and their mixtures at supercritical conditions

G.K. Jacobs D.M. Kerrick 《Computers & Geosciences》1981,7(2):131-143

APL and FORTRAN programs utilizing a new modified hard-sphere Redlich-Kwong equation calculate volumes and fugacity coefficients for pure H₂O and CO₂, and activities in H₂O-CO₂ mixtures, throughout most of the crustal and upper mantle P–T conditions. The new modification allows the term of the equation representing attractive intermolecular forces to vary as a function of both temperature and pressure, in contrast to earlier versions where this term was considered a function of temperature only. Compared with previous modified Redlich-Kwong (MRK) equations, this equation predicts thermodynamic properties for pure H₂O and CO₂ which are in better agreement with those derived from experimental P–V–T data. These programs are versatile and can be incorporated into existing routines which calculate mixed-volatile (H₂O–CO₂) phase equilibria for petrologic systems.
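
For orientation, the sketch below evaluates the classic Redlich-Kwong equation, P = RT/(V − b) − a/(√T · V(V + b)), which is the starting point that the modified equations in this abstract build on. It is not the authors' modified hard-sphere form: here the attractive parameter a is a constant derived from critical constants, whereas the abstract describes a term that varies with both temperature and pressure. The CO₂ critical constants are commonly tabulated values supplied for the example, and the function names are illustrative.

```python
R = 8.314462618  # molar gas constant, J/(mol K)

# Commonly tabulated CO2 critical constants (assumed inputs for this example).
TC_CO2 = 304.13      # K
PC_CO2 = 7.3773e6    # Pa


def rk_parameters(tc: float, pc: float) -> tuple[float, float]:
    """Classic Redlich-Kwong a and b from critical temperature (K) and pressure (Pa)."""
    a = 0.42748 * R**2 * tc**2.5 / pc
    b = 0.08664 * R * tc / pc
    return a, b


def rk_pressure(t: float, v: float, a: float, b: float) -> float:
    """Pressure in Pa at temperature t (K) and molar volume v (m^3/mol), with v > b."""
    repulsive = R * t / (v - b)
    attractive = a / (t**0.5 * v * (v + b))
    return repulsive - attractive


if __name__ == "__main__":
    a, b = rk_parameters(TC_CO2, PC_CO2)
    t = 800.0  # K, a supercritical temperature chosen for illustration
    for v in (1e-3, 2e-4, 1e-4):
        p = rk_pressure(t, v, a, b)
        z = p * v / (R * t)  # compressibility factor
        print(f"V = {v:.1e} m^3/mol: P = {p / 1e6:7.2f} MPa, Z = {z:.3f}")
```

A quick sanity check on the constants is that the classic equation reproduces the critical pressure at T = Tc and V = RTc/(3Pc), since its critical compressibility factor is 1/3.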

20.

Analytical IMC-PID design in terms of performance/robustness tradeoff for integrating processes: From 2-Dof to 1-Dof

《Journal of Process Control》2014,24(3):22-32

This communication addresses analytical PID tuning rules for integrating processes. First, this paper provides an analytical tuning method for a two-degree-of-freedom (2-Dof) PID controller using an enhanced internal model control (IMC) principle. On the basis of the robustness analyses, the presented method can easily achieve the performance/robustness tradeoff by specifying a desired robustness degree. Second, an analytical tuning method for a one-degree-of-freedom (1-Dof) PID controller is also proposed in terms of performance/robustness and servo/regulator tradeoffs, which are not commonly considered in 1-Dof controller design. The servo/regulator tradeoff is formulated as a constrained optimization problem to provide output responses as similar as possible to those produced by the 2-Dof PID controller. The presented PID settings are applicable to a wide range of integrating processes. Simulation studies show the effectiveness and merits of the proposed method.