Similar literature
1.
Optimization of vector-intensive applications for the CRAY X-MP/Y-MP often requires arranging the operations to take full advantage of such architectural features as the memory system, independent memory ports, chaining, and independent functional units. Estimation of performance is not straightforward since many operations can occur concurrently. As a tool for making trades between vector algorithms, a method has been developed and used successfully at E-Systems Inc. to predict the execution time of a sequence of vector operations without resorting to actual code development. This method reduced our software development time, produced significantly more efficient code, and provided for a systematic approach to optimization. The performance estimation is generally accurate to within 10% and accounts for memory conflicts that result from fixed stride references.

2.
In this paper a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum length vectors throughout all stages of small- to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that on the CRAY-2 they can later be fetched from local memory in parallel with common memory accesses. Performance results for Fortran implementations using these techniques demonstrate that they are faster than Cray's library FFT routine CFFT2. The actual speedups obtained, which depend on the size of the FFT being computed and the supercomputer being used, range from about 5% to over 300%.

3.
The serial and parallel performance of one of the world's fastest general purpose computers, the CRAY-2, is analyzed using the standard Los Alamos Benchmark Set plus codes adapted for parallel processing. For comparison, architectural and performance data are also given for the CRAY X-MP/416. Factors affecting performance, such as memory bandwidth, size and access speed of memory, and software exploitation of hardware, are examined. The parallel processing environments of both machines are evaluated, and speedup measurements for the parallel codes are given. An earlier version of this paper was presented at Supercomputing '88. This work was performed under the auspices of the U.S. Department of Energy.

4.
5.
This paper describes the parallel implementation of the immersed boundary method on a shared-memory machine such as the Cray C-90 computer. In this implementation, outer loops are parallelized and inner loops are vectorized. The sustained computation rates achieved are 0.258 Gflops with a single processor, 1.89 Gflops with 8 processors, and 2.50 Gflops with 16 processors. An application to the computer simulation of blood flow in the heart is presented.
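The sustained rates quoted in this abstract can be turned into parallel speedup and efficiency figures. The sketch below is only illustrative arithmetic on the three reported numbers; the function names are hypothetical, and it assumes the usual definitions (speedup = rate on p processors / rate on one processor, efficiency = speedup / p).

```python
# Sustained rates reported in the abstract (Gflops), keyed by processor count.
rates = {1: 0.258, 8: 1.89, 16: 2.50}


def speedup(rate_p, rate_1):
    """Speedup of a p-processor run relative to the single-processor run."""
    return rate_p / rate_1


def efficiency(rate_p, rate_1, p):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(rate_p, rate_1) / p


for p, rate in rates.items():
    s = speedup(rate, rates[1])
    e = efficiency(rate, rates[1], p)
    print(f"{p:2d} processors: speedup {s:5.2f}, efficiency {e:5.1%}")
```

On these numbers the speedup is roughly 7.3 on 8 processors and roughly 9.7 on 16, so efficiency falls from about 92% to about 61% as the processor count doubles.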

6.
We present here a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores but at large scale in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology but the first to use its Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the way in which nodes are allocated in each type of system can significantly affect achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. In addition we also examine the differences in performance variability observed on each system and quantify the lost performance using a combination of both empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand Clusters in comparison to Blue Gene/Q.

7.
We report performance measurements made on the 2-CPU CRAY X-MP at ECMWF, Reading. Vector (SIMD) performance on one CPU is interpreted by the two parameters (r∞, n1/2), and we find for dyadic operations using FORTRAN r∞ = 70 Mflop/s, n1/2 = 53 flop. All vector triadic operations produce r∞ = 107 Mflop/s, n1/2 = 45 flop; and a triadic operation with two vectors and one scalar gives r∞ = 148 Mflop/s and n1/2 = 60 flop. MIMD performance using both CPUs on one job is interpreted with the two parameters (r∞, s1/2), where s1/2 is the amount of arithmetic that could have been done during the time taken to synchronize the two CPUs. We find, for dyadic operations using the TSKSTART and TSKWAIT synchronization primitives, that r∞ = 130 Mflop/s and s1/2 = 5700 flop. This means that a job must contain more than ~6000 floating-point operations if it is to run at more than 50% of the maximum performance when split between both CPUs by this method. Less expensive synchronization methods using LOCKS and EVENTS reduce s1/2 to 4000 flop and 2000 flop respectively. A simplified form of LOCK synchronization written in CAL code further reduces s1/2 to 220 flop. This is probably the minimum possible value for synchronization overhead on the CRAY X-MP.
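The two-parameter characterization used in this abstract can be made concrete with a few lines of arithmetic. The sketch below assumes the standard form of Hockney's model, in which the time for a job of size s is (s + s_half) / r_inf, so the achieved rate is r_inf / (1 + s_half / s); here r_inf is the asymptotic rate and s_half (or n_half for vector length) is the half-performance parameter. The function names are hypothetical and only the numbers come from the abstract.

```python
def vector_rate(n, r_inf, n_half):
    """Achieved rate on a vector of length n, in the units of r_inf."""
    return r_inf / (1.0 + n_half / n)


def parallel_rate(s, r_inf, s_half):
    """Achieved rate for a job of s flop split across the CPUs."""
    return r_inf / (1.0 + s_half / s)


def min_work_for_fraction(fraction, s_half):
    """Smallest job size (flop) that reaches `fraction` of r_inf.

    From r / r_inf = 1 / (1 + s_half / s) >= fraction it follows that
    s >= s_half * fraction / (1 - fraction).
    """
    return s_half * fraction / (1.0 - fraction)


# TSKSTART/TSKWAIT figures from the abstract: r_inf = 130 Mflop/s, s_half = 5700 flop.
print(parallel_rate(5700, 130.0, 5700))    # half of the asymptotic rate
print(min_work_for_fraction(0.5, 5700))    # job size needed for 50% of r_inf
print(min_work_for_fraction(0.5, 220))     # same threshold with the CAL lock
```

Under this model the 50% threshold is exactly s_half, which matches the abstract's statement that a job needs roughly 6000 floating-point operations with TSKSTART/TSKWAIT, and far fewer with the cheaper synchronization methods.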

8.
Critical point tracking is a core topic in scientific visualization for understanding the dynamic behaviour of time-varying vector field data. The topological notion of robustness has been introduced recently to quantify the structural stability of critical points, that is, the robustness of a critical point is the minimum amount of perturbation to the vector field necessary to cancel it. A theoretical basis has been established previously that relates critical point tracking with the notion of robustness, in particular, critical points could be tracked based on their closeness in stability, measured by robustness, instead of just distance proximity within the domain. However, in practice, the computation of classic robustness may produce artifacts when a critical point is close to the boundary of the domain; thus, we do not have a complete picture of the vector field behaviour within its local neighbourhood. To alleviate these issues, we introduce a multilevel robustness framework for the study of 2D time-varying vector fields. We compute the robustness of critical points across varying neighbourhoods to capture the multiscale nature of the data and to mitigate the boundary effect suffered by the classic robustness computation. We demonstrate via experiments that such a new notion of robustness can be combined seamlessly with existing feature tracking algorithms to improve the visual interpretability of vector fields in terms of feature tracking, selection and comparison for large-scale scientific simulations. We observe, for the first time, that the minimum multilevel robustness is highly correlated with physical quantities used by domain scientists in studying a real-world tropical cyclone dataset. Such an observation helps to increase the physical interpretability of robustness.

9.
This paper provides a new matrix-inequality formulation for multi‐objective H2/L2 performance controller synthesis of linear parameter-varying systems. These new matrix inequalities enable us to parameterize controllers without involving the Lyapunov variables in the formulation. Taking advantage of this feature, we can readily design multi‐objective controllers with non‐common parameter‐dependent Lyapunov variables and two adjustable scalars. Furthermore, to obtain possibly lower values of the performance criteria, a linear matrix inequality (LMI)‐based optimization problem is solved by gridding the space spanned by these two scalars. Finally, a numerical example is included to illustrate the effectiveness of the proposed method. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

10.
Dunis and Williams (Derivatives: use, trading and regulation 8(3):211–239, 2002; Applied quantitative methods for trading and investment. Wiley, Chichester, 2003) have shown the superiority of a Multi-layer perceptron network (MLP), outperforming its benchmark models such as a moving average convergence divergence technical model (MACD), an autoregressive moving average model (ARMA) and a logistic regression model (LOGIT) on a Euro/Dollar (EUR/USD) time series. The motivation for this paper is to investigate the use of different neural network architectures. This is done by benchmarking three different neural network designs representing a level estimator, a classification model and a probability distribution predictor. More specifically, we present the Multi-layer perceptron network, the Softmax cross entropy model and the Gaussian mixture model and benchmark their respective performance on the Euro/Dollar (EUR/USD) time series as reported by Dunis and Williams. As it turns out, the Multi-layer perceptron does best when used without confirmation filters and leverage, while the Softmax cross entropy model and the Gaussian mixture model outperform the Multi-layer perceptron when using more sophisticated trading strategies and leverage. This might be due to the ability of both models using probability distributions to successfully identify trades with a high Sharpe ratio.

11.
Ordered mesoporous SnO2 and mesoporous Pd/SnO2 have been successfully synthesized via a nanocasting method using the hexagonal mesoporous SBA-15 as template. Two different procedures, impregnation technique and direct synthesis, were utilized for the doping of Pd in the mesoporous SnO2. The results of small angle X-ray diffraction (SAXD), nitrogen adsorption–desorption and transmission electron microscopy (TEM) demonstrate that the SnO2 and Pd/SnO2 display ordered mesoporous structures and high surface areas. Wide angle X-ray diffraction (WAXD) and X-ray photoelectron spectroscopy (XPS) reveal the tetragonal structure of SnO2 and the existence of the Pd element. The sensing properties of mesoporous SnO2 and mesoporous Pd/SnO2 for H2 were measured. The sensor utilizing mesoporous Pd/SnO2 via the direct synthesis method exhibits excellent response and recovery behavior and much higher sensitivity to H2, compared to those using mesoporous SnO2 and mesoporous Pd/SnO2 via the impregnation technique. It is believed that its high gas sensing performance is derived from the large surface area, high activity and good dispersion of the Pd additive, as well as high porosity, which lead to highly effective surface interaction between the target gas molecules and the surface active sites.

12.
Behavioral modeling for the concurrent dual‐band power amplifier (PA) is a critical problem in practical applications. The nonlinear distortion in the concurrent dual‐band PA is quite different from that in the conventional single‐band PA. This article analyzes the nonlinearities in the concurrent dual‐band PA and reveals that both input signals in the dual bands are important for the behavioral modeling. The 2D Hammerstein model and 2D Wiener model are proposed for the first time for the concurrent dual‐band PA. They are extended versions of conventional Hammerstein and Wiener structures used in the single‐band PA by including the cross‐band intermodulation in the static nonlinearity block. The proposed 2D models require far fewer coefficients than the original 2D‐DPD model. Experiments were carried out for an 880 MHz/1960 MHz concurrent dual‐band Doherty PA to demonstrate the effectiveness of the proposed models. The results clearly show that normalized mean square errors (NMSEs) below −40 dB are obtained in both bands in the behavioral modeling. © 2012 Wiley Periodicals, Inc. Int J RF and Microwave CAE 23: 646–654, 2013.
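The abstract reports model accuracy as a normalized mean square error in dB. A minimal sketch of that metric follows, assuming the common definition NMSE = 10·log10(Σ|y − ŷ|² / Σ|y|²) over complex baseband samples; the abstract does not spell out the definition it uses, and the function name and sample values are hypothetical.

```python
import math


def nmse_db(measured, modeled):
    """NMSE in dB between measured and modeled complex samples.

    Raises ValueError if the two sequences are identical (zero error),
    because log10(0) is undefined.
    """
    err = sum(abs(y - yh) ** 2 for y, yh in zip(measured, modeled))
    ref = sum(abs(y) ** 2 for y in measured)
    return 10.0 * math.log10(err / ref)


# Toy data: a 1% amplitude error on every sample.
measured = [1 + 0j] * 4
modeled = [1.01 + 0j] * 4
print(nmse_db(measured, modeled))
```

With this definition a 1% amplitude error corresponds to about −40 dB, which gives a feel for the accuracy level quoted for the dual-band models.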

13.
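This paper proposes a new MPEG-2 to H.264 video transcoding algorithm. By making full use of information obtained during MPEG-2 decoding, such as motion vectors and macroblock coding residuals, it significantly reduces the computational complexity of macroblock coding-mode decision and motion estimation in H.264 encoding, and produces the final H.264 video stream. Simulation results on several typical video test sequences show that the algorithm incurs little loss of video quality, has stable rate-distortion performance, and is well suited to real-time transcoding.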

14.
In this paper, we have studied the total ionizing dose (TID) radiation response up to 2 Mrad(Si) of silicon-oxide-nitride-oxide-silicon (SONOS) memory cells and memory circuits, fabricated in a 130 nm complementary metal-oxide-semiconductor (CMOS) SONOS technology. We explored the threshold voltage (VT) degradation mechanism and found that the VT shifts of SONOS cells depend on the charge state; simply programming the cell to a higher VT cannot compensate for the radiation-induced VT loss. The off-state current (Ioff) increase in the SONOS cell is also studied in this paper. Both VT and Ioff degradation would affect the memory system. Read data failures are mainly caused by VT shifts under irradiation, and program and erase failures are mainly caused by increased Ioff, which overloads the charge pumping circuit. By varying the reference current, our 4 Mb NOR flash chip has the potential to survive a radiation dose of 1 Mrad(Si) in read mode.

15.
To estimate the gross CO2 flux (FCO2) of deciduous coniferous forest from canopy spectral reflectance, we introduced spectral vegetation indices (VIs) into a light use efficiency (LUE) model of mature Japanese larch (Larix kaempferi) forest. We measured the eddy covariance CO2 flux and spectral reflectance of larch canopy at half-hourly intervals during one growing season, and investigated the relationships between the parameters of the LUE model (FAPAR, ε) and 3 types of VIs (NDVI, PRI, EVI) in both clear sky and cloudy conditions. FAPAR (fraction of absorbed photosynthetically active radiation) had a positive linear relationship with both NDVI (normalized difference vegetation index) and EVI (enhanced vegetation index), and the sky condition had little effect on the relationships. The relative RMSE (root mean square error) of the APAR (absorbed photosynthetically active radiation) based on the incoming PAR and estimated FAPAR from a linear function of NDVI was less than 10.5%, irrespective of sky condition. Half-hourly values of ε (conversion efficiency of absorbed energy) showed both seasonal variation related to leaf phenology and short-term variation related to light intensity due to varied sun position and sky condition. Both EVI and PRI (photochemical reflectance index) were significantly correlated with ε. EVI showed a positive linear relationship with ε as a result of their similar seasonal variation. However, since EVI did not detect short-term variation of ε, their relationship differed among sky conditions. On the other hand, although PRI could trace the short-term variation of ε in green needles, the relationship became non-linear due to drastic reduction of PRI in the senescent needles. EVI/(PRI/PRImin), a combined index based on a 6-day moving minimum value of PRI (PRImin), showed a linear relationship with half-hourly values of ε throughout the seasons irrespective of sky condition. This index allows us to estimate ε in all sky conditions with a smaller error (rRMSE = 35.2%) than using EVI or PRI alone (38.7%-48.7%). Consequently, this combined index-derived ε and NDVI-based FAPAR gave a low estimation error of FCO2 (rRMSE = 36.4%, RMSE = 8.3 μmol m−2 s−1). Although there are still various issues to resolve, including adaptive limit and combination of vegetation index type, we conclude that the combination of PRI and EVI increased the accuracy of estimation of CO2 uptake in deciduous forest even though sky conditions varied.
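The light use efficiency chain described in this abstract can be sketched in a few lines: FAPAR is taken as a linear function of NDVI, the conversion efficiency (written `eps` here) as a linear function of the combined index EVI/(PRI/PRImin), and the flux as their product with incoming PAR. Everything numeric below is a placeholder: the abstract gives no regression coefficients, and whether the 6-day minimum is trailing or centred is not stated, so the trailing window is an assumption, as are all function names.

```python
def moving_min(values, window):
    """Trailing moving minimum over the last `window` samples (inclusive)."""
    return [min(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def combined_index(evi, pri, pri_min):
    """EVI / (PRI / PRImin); undefined when PRI or PRImin is zero."""
    return evi / (pri / pri_min)


def gross_co2_flux(par, ndvi, index, fapar_coef, eps_coef):
    """LUE model: flux = eps * FAPAR * PAR, both factors linear in an index.

    fapar_coef and eps_coef are (intercept, slope) pairs; the values used
    in the demo below are hypothetical, not taken from the paper.
    """
    a0, a1 = fapar_coef
    e0, e1 = eps_coef
    fapar = a0 + a1 * ndvi
    eps = e0 + e1 * index
    return eps * fapar * par


# Half-hourly sampling means a 6-day window spans 6 * 48 = 288 samples.
pri_series = [-0.05, -0.08, -0.10, -0.06]
pri_min = moving_min(pri_series, 288)
idx = combined_index(0.5, pri_series[-1], pri_min[-1])
print(gross_co2_flux(1000.0, 0.8, idx, (0.0, 1.0), (0.0, 0.02)))
```

Dividing PRI by its recent minimum rescales the short-term PRI signal before it modulates EVI, which is how the abstract describes combining seasonal and short-term variation in one index.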

16.
NO gas sensors, based on ZnO thin film (ZnOfilm), TiO2 nanoparticulate film (TiO2NP), and TiO2NP/ZnOfilm double-layer film, were fabricated, and their sensing characteristics towards NO gas were investigated in this study. The maximal response of a ZnOfilm deposited onto a rougher Al2O3 substrate, towards NO gas, was higher than that of a ZnOfilm deposited on a smoother glass substrate. Although the sensing response of the TiO2NP film itself towards NO gas was minute, the TiO2NP/ZnOfilm double-layer film showed enhanced response as compared with TiO2NP or ZnOfilm single-layer film. In addition, the sensor response of the TiO2NP/ZnOfilm double-layer film was strongly influenced by the annealing time for the film preparation; the maximum response to NO was enhanced about 6.2 times as the annealing time was increased from 30 min to 2 h. Based on the XPS results, an increase in the transition zone between TiO2NP and ZnOfilm, along with the appearance of the Ti3+ state, was observed when the annealing time was increased. With the highly sensitive TiO2NP/ZnOfilm/Al2O3 electrode, a limit of detection (S/N = 3) of 8.8 ppb can be achieved. The double-layer TiO2NP/ZnOfilm also showed improved selectivities with respect to NO2 and CO.

17.
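To address problems in existing SO2 concentration prediction methods, such as inconsistent understanding of pollutant sources and influencing factors, sensitivity to small-sample data, and a tendency to fall into local optima, this paper proposes an SO2 concentration prediction algorithm for expressways based on fuzzy time series and support vector machines, providing reliable theoretical support for building an expressway environmental health monitoring system. Following the seasonal variation of SO2 concentration, the method takes the season as the time series and 24 h as the granulation window width, extracts feature values from the raw sample data with a Gaussian kernel function, feeds them into a support vector machine to train the model, and optimizes the model parameters using k-fold cross-validation combined with grid search. The method is applied to build an SO2 concentration prediction model, using hourly SO2 concentrations measured from April 2014 to March 2015 at a monitoring point on the Taijiu Expressway in Shanxi Province as sample data, with the computation implemented using the LIBSVM toolbox on the MATLAB platform. The results show that the proposed algorithm is not constrained by mechanistic theoretical research, supports small-sample learning, fits nonlinear data well, and generalizes well.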

18.
In this paper we will consider systems with linear time-invariant perturbations. We will analyze robust performance in the ?2 and ? settings. The ?2 setting gives rise to the familiar case of structured singular values, and a stability criterion is given by the “small μ” theorem. We show that although the necessary and sufficient criterion of robust stability for the ? case (? stability with structured ?-gain bounded perturbations) is the same “small μ” criterion, a system with ?2-gain bounded perturbations is never ? stable.

19.
APL and FORTRAN programs utilizing a new modified hard-sphere Redlich-Kwong equation calculate volumes and fugacity coefficients for pure H2O and CO2, and activities in H2O-CO2 mixtures, throughout most of the crustal and upper mantle P–T conditions. The new modification allows the term of the equation representing attractive intermolecular forces to vary as a function of both temperature and pressure, in contrast to earlier versions where this term was considered a function of temperature only. Compared with previous modified Redlich-Kwong (MRK) equations, this equation predicts thermodynamic properties for pure H2O and CO2 which are in better agreement with those derived from experimental P–V–T data. These programs are versatile and can be incorporated into existing routines which calculate mixed-volatile (H2O–CO2) phase equilibria for petrologic systems.
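For orientation, the sketch below evaluates the classical Redlich-Kwong equation, P = RT/(V − b) − a/(√T · V · (V + b)), which is the starting point that the abstract's equation modifies. It is not the modified hard-sphere form from the paper: in that form the repulsive term is replaced and the attractive term varies with temperature and pressure, neither of which is reproduced here. The constants a and b use the textbook critical-point expressions, and the CO2 critical values are approximate.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def rk_constants(tc, pc):
    """Classical RK constants from critical temperature (K) and pressure (Pa)."""
    a = 0.42748 * R ** 2 * tc ** 2.5 / pc
    b = 0.08664 * R * tc / pc
    return a, b


def rk_pressure(t, v, a, b):
    """Pressure (Pa) at temperature t (K) and molar volume v (m^3/mol)."""
    return R * t / (v - b) - a / (math.sqrt(t) * v * (v + b))


# Approximate critical constants for CO2 (assumed values for illustration).
a_co2, b_co2 = rk_constants(304.13, 7.377e6)
print(rk_pressure(500.0, 1.0e-3, a_co2, b_co2))
```

The first term is the repulsive contribution and the second the attractive one; the abstract's modification acts on the second term by letting it depend on pressure as well as temperature.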

20.
This communication addresses the analytical PID tuning rules for integrating processes. First, this paper provides an analytical tuning method of two-degree-of-freedom (2-Dof) PID controller using an enhanced internal model control (IMC) principle. On the basis of the robustness analyses, the presented method can easily achieve the performance/robustness tradeoff by specifying a desired robustness degree. Second, an analytical tuning method of one-degree-of-freedom (1-Dof) PID is also proposed in terms of performance/robustness and servo/regulator tradeoffs, which are not commonly considered for 1-Dof controller design. The servo/regulator tradeoff is formulated as a constrained optimization problem to provide output responses as similar as possible to those produced by the 2-Dof PID controller. The presented PID settings are applicable for a wide range of integrating processes. Simulation studies show the effectiveness and merits of the proposed method.
