1.
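Stream architecture is a new kind of architecture suited to advances in VLSI technology, and whether it is effective for scientific computing programs is a question of broad interest. This paper selects MG, a data-intensive program from the NASA parallel benchmark suite, and studies its implementation and optimization on FT64, a 64-bit stream processor designed for scientific computing. Measurements on FT64 show that, after optimization targeting the on-chip memory hierarchy, FT64 achieves performance comparable to that of the Itanium2 processor.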
2.
Designers of distributed algorithms typically assume strong memory consistency guarantees, but system implementations provide weaker guarantees for better performance and scalability. This motivates the study of how to implement programs designed for sequential consistency on platforms with weaker consistency models. Typically, such implementations are impossible using only read and write operations to shared variables. One variant of processor consistency originally proposed by Goodman and called here PC-G is an exception because it provides just enough consistency to implement mutual exclusion using only reads and writes. This paper investigates the existence of compilers to convert arbitrary programs that use shared read/write variables with sequentially consistent memory semantics, to programs that use read/write variables with PC-G consistency semantics. We first provide a simple program transformation, and prove that it correctly compiles any 2-process program to a PC-G memory system, while preserving wait-freedom. We next prove that even a substantial generalization of this transformation cannot be a compiler for even a very restricted class of 3-process programs. Even though our program transformation is not a general compiler for three or more processes, it does correctly transform some specific n-process programs. In particular, for the special case of the (necessarily randomized) Test&Set algorithm of Tromp and Vitanyi, our transformation extends to any number of processes, thus providing the first algorithm for expected wait-free Test&Set on any weak memory system, using only read/write variables.
3.
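To generate the self-test signals, speed commands, and other signals needed when testing a radar antenna system, a mixed-signal generation system based on the C8051F206 and the FT245R was designed. The system uses the FT245R USB interface chip to communicate with the host computer and, relying on the high programmability of the C8051F206 and the flexibility of the C language, reliably responds to host commands and sends the self-test signals, speed commands, and other signal sources required by the unit under test in real time. The principles of the hardware and software design of the mixed-signal generation system are described. Practical use shows that the system accurately generates the self-test and speed command signals required by the radar antenna system under test and feeds back test results in real time.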
4.
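Reference [1] proposed an inter-array data fusion optimization method and tested its effect on an IA-32 server platform. The results showed that, on IA-32 machines, data fusion optimization guided by a performance cost model improves cache utilization fairly well for applications with non-contiguous data access patterns. How, then, does data fusion optimization perform on the new-generation IA-64 architecture? Using an Intel IA-32 server and an HP Itanium server as platforms, this paper compiles and runs programs before and after the data fusion transformation with the Intel Fortran compilers ifc and efc and the free-software compiler g95, and obtains execution times and related performance data on both platforms. The results show that source-level data fusion optimization does not work well together with the advanced optimizations of the efc compiler on the IA-64 platform: under the O3 optimization switch, the optimization effect is negative. This result further indicates that advanced compiler optimizations such as data prefetching, loop transformations, and data transformations must be considered together with the characteristics of the architecture in order to achieve a good global optimization effect. The paper is offered as a starting point for research on the performance portability to IA-64 of compiler optimization algorithms designed for the IA-32 architecture.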
5.
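EP and GEMM are computational kernels commonly used in scientific computing and are widely applied in evaluating the performance of high-performance computer architectures. Based on stream architecture, a currently popular architecture, parallel algorithms for these two benchmark programs were designed and implemented on the FT64 parallel system, with very good results.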
6.
Aniruddha Gokhale, Jaiganesh Balasubramanian, Gan Deng, Jeffrey Parsons, Douglas C. Schmidt 《Science of Computer Programming》2008,73(1):39-58
Distributed real-time and embedded (DRE) systems have become critical in domains such as avionics (e.g., flight mission computers), telecommunications (e.g., wireless phone services), tele-medicine (e.g., robotic surgery), and defense applications (e.g., total ship computing environments). These types of systems are increasingly interconnected via wireless and wireline networks to form systems of systems. A challenging requirement for these DRE systems involves supporting a diverse set of quality of service (QoS) properties, such as predictable latency/jitter, throughput guarantees, scalability, 24x7 availability, dependability, and security that must be satisfied simultaneously in real-time. Although increasing portions of DRE systems are based on QoS-enabled commercial-off-the-shelf (COTS) hardware and software components, the complexity of managing long lifecycles (often ∼15-30 years) remains a key challenge for DRE developers and system integrators. For example, substantial time and effort are spent retrofitting DRE applications when the underlying COTS technology infrastructure changes. This paper provides two contributions that help improve the development, validation, and integration of DRE systems throughout their lifecycles. First, we illustrate the challenges in creating and deploying QoS-enabled component middleware-based DRE applications and describe our approach to resolving these challenges based on a new software paradigm called Model Driven Middleware (MDM), which combines model-based software development techniques with QoS-enabled component middleware to address key challenges faced by developers of DRE systems, particularly composition, integration, and assured QoS for end-to-end operations.
Second, we describe the structure and functionality of CoSMIC (Component Synthesis using Model Integrated Computing), which is an MDM toolsuite that addresses key DRE application and middleware lifecycle challenges, including partitioning the components to use distributed resources effectively, validating software configurations, assuring multiple simultaneous QoS properties in real-time, and safeguarding against rapidly changing technology.
7.
Embedding a number of displacement features into a base surface is common in industrial product design and modeling, where displaced surface regions are blended with the unmodified surface region. The cubic Hermite interpolant is usually adopted for surface blending, in which tangent plane smoothness across the boundary curve is achieved. However, the polynomial degree of the tangent field curve obtained symbolically is considerably higher, and the reduction of the degree of a freeform curve is a non-trivial task. In this work, an approximation surface blending approach is proposed to achieve tangential continuity across the boundary curve. The boundary curve is first offset in the tangent field with the user-specified tolerance, after which it is refined to be compatible with the offset curve for surface blending. Since the boundary curve is offset in a three-dimensional (3D) space, the local self-intersection in the offset curve is addressed in a 2D space by approximately mapping the offset vectors in the respective tangent planes to the parameter space of the base surface. The proposed algorithm is validated using examples, and the normal vector deviation along the boundary curve is investigated.
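As a rough illustration of the cubic Hermite interpolant mentioned in the abstract above, the sketch below evaluates the standard Hermite basis and uses it as a blend weight along a single 1D profile. This is not the approximation method the paper proposes; the function names `hermite` and `blended_height` and the reduction to one dimension are assumptions made for illustration only.

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolant on t in [0, 1].

    p0, p1 are the end values and m0, m1 the end slopes.
    """
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1   # weight of p0
    h10 = t3 - 2 * t2 + t       # weight of m0
    h01 = -2 * t3 + 3 * t2      # weight of p1
    h11 = t3 - t2               # weight of m1
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def blended_height(base, displaced, t):
    """Blend a displaced height into a base height across a transition band.

    Hypothetical 1D stand-in for surface blending: the weight runs from 1
    (fully displaced) at t = 0 to 0 (unmodified base) at t = 1, with zero
    slope at both ends.
    """
    w = hermite(1.0, 0.0, 0.0, 0.0, t)
    return w * displaced + (1.0 - w) * base


if __name__ == "__main__":
    for step in range(5):
        t = step / 4
        print(t, blended_height(0.0, 2.0, t))
```

Because the weight has zero slope at both ends, the blended profile takes on the slope of the displaced side at t = 0 and of the base side at t = 1, which is the 1D analogue of tangent continuity across the boundary.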
8.
This article is about testing the equality of several normal means when the variances are unknown and arbitrary, i.e., the setup of the one-way ANOVA. Even though several tests are available in the literature, none of them performs well in terms of Type I error probability under various sample size and parameter combinations. In fact, Type I errors can be highly inflated for some of the commonly used tests; a serious issue that appears to have been overlooked. We propose a parametric bootstrap (PB) approach and compare it with three existing location-scale invariant tests: the Welch test, the James test and the generalized F (GF) test. The Type I error rates and powers of the tests are evaluated using Monte Carlo simulation. Our studies show that the PB test is the best among the four tests with respect to Type I error rates. The PB test performs very satisfactorily even for small samples while the Welch test and the GF test exhibit poor Type I error properties when the sample sizes are small and/or the number of means to be compared is moderate to large. The James test performs better than the Welch test and the GF test. It is also noted that the same tests can be used to test the significance of the random effect variance component in a one-way random model under unequal error variances. Such models are widely used to analyze data from inter-laboratory studies. The methods are illustrated using some examples.
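The abstract above does not spell out the PB procedure, so the following is only a minimal sketch of how a parametric bootstrap test of equal means under unequal variances can be set up. It assumes the test statistic is the between-group sum of squares weighted by n_i / s_i^2, and that bootstrap samples are drawn from the fitted null model (sample means from N(0, s_i^2 / n_i), sample variances from s_i^2 * chi2(n_i - 1) / (n_i - 1)). The names `weighted_stat` and `pb_test` are hypothetical.

```python
import random
from statistics import mean, variance


def weighted_stat(means, variances, sizes):
    """Between-group sum of squares with weights n_i / s_i^2."""
    weights = [n / v for n, v in zip(sizes, variances)]
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - grand) ** 2 for w, m in zip(weights, means))


def pb_test(groups, n_boot=2000, seed=0):
    """Return a parametric bootstrap p-value for H0: all group means are equal."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variances (n - 1 divisor)
    observed = weighted_stat([mean(g) for g in groups], variances, sizes)

    exceed = 0
    for _ in range(n_boot):
        # The statistic is location invariant, so the common mean is set to 0.
        b_means = [rng.gauss(0.0, (v / n) ** 0.5)
                   for v, n in zip(variances, sizes)]
        # chi2 with k degrees of freedom is Gamma(shape=k/2, scale=2).
        b_vars = [v * rng.gammavariate((n - 1) / 2, 2.0) / (n - 1)
                  for v, n in zip(variances, sizes)]
        if weighted_stat(b_means, b_vars, sizes) > observed:
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    data = [[1.0, 1.2, 0.9, 1.1, 1.05],
            [5.0, 5.3, 4.8, 5.1, 4.9],
            [9.0, 9.2, 8.7, 9.1, 9.3]]
    print("p-value:", pb_test(data))
```

The p-value is the fraction of bootstrap statistics that exceed the observed one, so a small value suggests the group means differ; each group needs at least two observations with non-zero variance for the weights to be defined.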
9.
We present a method for refining n-sided polygons on a given piecewise linear model by using local computation, where the curved polygons generated by our method interpolate the positions and normals of vertices on the input model. Firstly, we construct a Bézier curve for each silhouette edge. Secondly, we employ a new method to obtain C1 continuous cross-tangent functions that are constructed on these silhouette curves. An important feature of our method is that the cross tangent functions are produced solely by their corresponding facet parameters. Gregory patches can therefore be locally constructed on every polygon while preserving G1 continuity between neighboring patches. To provide a flexible shape control, several local schemes are provided to modify the cross-tangent functions so that the sharp features can be retained on the resultant models. Because of the localized construction, our method can be easily accelerated by graphics hardware and fully run on the Graphics Processing Unit (GPU).
10.
T. Hoang Ngan Le, Chia-Chen Lin, Chin-Chen Chang, Hoai Bac Le 《Digital Signal Processing》2011,21(6):734-745
Many secret sharing schemes for digital images have been developed in recent decades. Traditional schemes typically must deal with the problem of computational complexity, and other visual secret sharing schemes come with a higher transmission cost and storage cost; that is, each shadow size is m times as big as the original secret image. The new (2,n) secret sharing scheme for grayscale images proposed in this paper is based on a combination of acceptable image quality using block truncation coding (BTC), high compression ratio discrete wavelet transform (DWT) and good subjective performance of the vector quantization (VQ) technique. Experimental results confirm that our proposed scheme not only generates a high quality reconstructed original image but also generates small, random-like grayscale shadows.
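For readers unfamiliar with block truncation coding, one of the building blocks named in the abstract above, the sketch below shows the classic moment-preserving form of BTC on a single block of grayscale values. It does not reproduce the paper's secret sharing scheme, and the function names `btc_encode` and `btc_decode` are hypothetical.

```python
from math import sqrt


def btc_encode(block):
    """Encode a flat list of pixel values as (bitmap, low, high).

    The two output levels are chosen so that the decoded block keeps the
    mean and the (population) variance of the original block.
    """
    m = len(block)
    avg = sum(block) / m
    std = sqrt(sum((x - avg) ** 2 for x in block) / m)
    bitmap = [1 if x >= avg else 0 for x in block]
    q = sum(bitmap)  # number of pixels at or above the mean
    if q == m:
        # Every pixel is >= the mean, so the block is flat.
        return bitmap, avg, avg
    low = avg - std * sqrt(q / (m - q))
    high = avg + std * sqrt((m - q) / q)
    return bitmap, low, high


def btc_decode(bitmap, low, high):
    """Rebuild the block: 1-bits take the high level, 0-bits the low level."""
    return [high if bit else low for bit in bitmap]


if __name__ == "__main__":
    block = [2, 9, 12, 15, 2, 11, 12, 15, 2, 3, 12, 15, 3, 3, 4, 14]
    bitmap, low, high = btc_encode(block)
    print(bitmap, round(low, 2), round(high, 2))
    print(btc_decode(bitmap, low, high))
```

Each block is thus stored as one bit per pixel plus two levels, which is where the compression comes from; practical coders typically round the two levels to integers, which this sketch leaves out.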