首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
2.
3.
4.
5.
一种面向分布主存多处理机的有效数据分布方法   总被引:1,自引:0,他引:1       下载免费PDF全文
本文针对分布主存多处理机中的数据分布问题,在程序已经过并行性分析的基础之上,提出了一种基于数据变换技术的有效数据分布方法。该方法能对多个嵌套循环中具有一般仿射数组下标的任意维数组进行有效的数据分布,并且该方法还考虑了偏移常量的对准问题,从而能使得数据通信量尽量小。实验结果表明了该方法的有效性。  相似文献   

6.
This paper presents a data layout optimization technique for sequential and parallel programs based on the theory of hyperplanes from linear algebra. Given a program, our framework automatically determines suitable memory layouts that can be expressed by hyperplanes for each array that is referenced. We discuss the cases where data transformations are preferable to loop transformations and show that under certain conditions a loop nest can be optimized for perfect spatial locality by using data transformations. We argue that data transformations can also optimize spatial locality for some arrays without distorting temporal/spatial locality exhibited by others. We divide the problem of optimizing data layout into two independent subproblems: 1) determining optimal static data layouts, and 2) determining data transformation matrices to implement the optimal layouts. By postponing the determination of the transformation matrix to the last stage, our method can be adapted to compilers with different default layouts. We then present an algorithm that considers optimizing parallelism and spatial locality simultaneously. Our results on eight programs on two distributed shared-memory multiprocessors, the Convex Exemplar SPP-2000 and the SGI Origin 2000, show that the layout optimizations are effective in optimizing spatial locality and parallelism  相似文献   

7.
8.
9.
Minimizing data communication over processors is the key to compile programs for distributed memory multicomputers. In this paper, we propose new data partition and alignment techniques for partitioning and aligning data arrays with a program in a way of minimizing communication over processors. We use skewed alignment instead of the dimension-ordered alignment techniques to align data arrays. By developing the skewed scheme, we can solve more complex programs with minimized data communication than that of the dimension-ordered scheme. Finally, we compare the proposed scheme with the dimension-ordered alignment one by experimental results. The experimental results show that our proposed scheme has more opportunities to align data arrays such that data communications over processors can be minimized.  相似文献   

10.
11.
In this paper, we propose a new automatic data alignment model called segmented alignment. The conventional data alignment model, such as that used in High-Performance Fortran (HPF), aligns arrays with the whole index domain. The principle of our proposed segmented alignment is to allow alignment relations within delimited index domains. We first provide motivating examples to illustrate how code fragments of HPF with EOSHIFT or CSHIFT operations, or produced by synthesis operations can benefit from our enhanced alignment scheme. Second, we show that this new model can be implemented in HPF-like languages by adding WHEN and IN constructs to them. In addition, we show that the new proposed schemes for WHEN and IN constructs can be emulated using standard HPF syntax. Finally, we address issues related to automatic data alignment for the new proposed model, and present an algorithm to automatically align programs using our segmented alignment scheme. Since the optimal algorithm to do this is NP-hard, a practical heuristic is also given. Our experiments were performed on a DEC Alpha Farm with HPF environments. Our experiments confirm our theory that our proposed alignment scheme can significantly enhance not only the performance of HPF code fragments with EOSHIFT or CSHIFT operations, but also that of codes produced by synthesis operations.  相似文献   

12.
13.
This paper investigates the use of nonlinear direct output feedback in the control of linear time-invariant systems. Inparticular, it is concerned with those nonlinear controllers which result in a closed loop system which is quadratically stable. That is, the closed loop nonlinear system is asymptotically stable and furthermore, a quadratic Lyapunov function can be used to establish this stability. The main result of this paper can be stated roughly as follows: Any linear system which can be made quadratically stable using nonlinear direct output feedback control can also be stabilized using linear direct output feedback control.  相似文献   

14.
This paper investigates some conditions that can provide stabilizability for linear switched systems with polytopic uncertainties via their closed loop linear quadratic state feedback regulator. The closed loop switched systems can stabilize unstable open loop systems or stable open loop systems but in which there is no solution for a common Lyapunov matrix. For continuous time switched linear systems, we show that if there exists solution in an associated Riccati equation for the closed loop systems sharing one common Lyapunov matrix, the switched linear systems are stable. For the discrete time switched systems, we derive a Linear Matrix Inequality (LMI) to calculate a common Lyapunov matrix and solution for the stable closed loop feedback systems. These closed loop linear quadratic state feedback regulators guarantee the global asymptotical stability for any switched linear systems with any switching signal sequence.  相似文献   

15.
In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. For the sake of reducing or even eliminating the interprocessor communication, the array elements in programs must be carefully distributed to local memory of processors for parallel execution. We devote our efforts to the techniques of allocating array elements of nested loops onto multicomputers in a communication-free fashion for parallelizing compilers. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks without interblock communication. The arrays can be partitioned under the communication-free criteria with nonduplicate or duplicate data. Finally, a heuristic method for mapping the partitioned array elements and iterations onto the fixed-size multicomputers under the consideration of load balancing is proposed. Based on these methods, the nested loops can execute without any communication overhead on the distributed memory multicomputers. Moreover, the performance of the strategies with nonduplicate and duplicate data for matrix multiplication is studied  相似文献   

16.
Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement to a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is called scalarization.Scalarization presents a significant technical problem because an array assignment needs to be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can cause performance problems in cache, many techniques have been developed to avoid their use or minimize their size.In this paper, we present a novel application of two compiler strategies—loop alignment and loop skewing—to address this problem. We show that these strategies can achieve the asymptotically minimal memory allocation for stencil computations. Our experiments with loop alignment and loop skewing demonstrate that it is extremely effective in improving memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The result should be applicable to other array languages, such as MATLAB.  相似文献   

17.
刘勇  陆林生  何王全 《软件学报》2010,21(Z1):290-297
考虑到硬件管理Cache 多级存储结构在功耗和面积方面的开销过大,众核处理器倾向于采用软件管理的多级存储结构,这就需要软件规划好程序的数据在各级存储上的布局和传输.尝试了一种依赖程序原有循环结构和问题规模的简易数据自动分块方法,根据循环层内的数据访存范围进行相应的分块,避免数据复杂的依赖关系分析,使得该方法易于在编译器中实现.同时可根据需要进一步结合程序变换如循环交换、循环联合和循环分裂等方法得到更佳的分块参数.实验结果表明,在大多数问题规模下与一般分块方法的优化性能相当,但在某些特定问题规模下能够获得较高的优化性能.  相似文献   

18.
Over the past 20 years, increases in processor speed have dramatically outstripped performance increases for standard memory chips. To bridge this gap, compilers must optimize applications so that data fetched into caches are reused before being displaced. Existing compiler techniques can efficiently optimize simple loop structures such as sequences of perfectly nested loops. However, on more complicated structures, existing techniques are either ineffective or require too much computation time to be practical for a commercial compiler. To optimize complex loop structures both effectively and inexpensively, we present a novel loop transformation, dependence hoisting, for optimizing arbitrarily nested loops, and an efficient framework that applies the new technique to aggressively optimize benchmarks for better locality. Our technique is as inexpensive as the traditional unimodular loop transformation techniques and thus can be incorporated into commercial compilers. In addition, it is highly effective and is able to block several linear algebra kernels containing highly challenging loop structures, in particular, Cholesky, QR, LU factorization without pivoting, and LU with partial pivoting. The automatic blocking of QR and pivoting LU is a notable achievement—to our knowledge, few previous compiler techniques, including theoretically more general loop transformation frameworks [1, 21, 23, 27, 31], were able to completely automate the blocking of these kernels, and none has produced the same blocking as produced by our technique. These results indicate that with low compilation cost, our technique can in practice match the effectiveness of much more expensive frameworks that are theoretically more powerful.  相似文献   

19.
Effective optimization for fuzzy model predictive control   总被引:4,自引:0,他引:4  
This paper addresses the optimization in fuzzy model predictive control. When the prediction model is a nonlinear fuzzy model, nonconvex, time-consuming optimization is necessary, with no guarantee of finding an optimal solution. A possible way around this problem is to linearize the fuzzy model at the current operating point and use linear predictive control (i.e., quadratic programming). For long-range predictive control, however, the influence of the linearization error may significantly deteriorate the performance. In our approach, this is remedied by linearizing the fuzzy model along the predicted input and output trajectories. One can further improve the model prediction by iteratively applying the optimized control sequence to the fuzzy model and linearizing along the so obtained simulated trajectories. Four different methods for the construction of the optimization problem are proposed, making difference between the cases when a single linear model or a set of linear models are used. By choosing an appropriate method, the user can achieve a desired tradeoff between the control performance and the computational load. The proposed techniques have been tested and evaluated using two simulated industrial benchmarks: pH control in a continuous stirred tank reactor and a high-purity distillation column.  相似文献   

20.
提出了两种高基Montgomery模乘线性阵列结构。两种线性阵列结构分别利用两种不同的并行化开发方法,沿不同的循环维度进行任务分配和调度,都能够充分开发算法的流水线并行。在Xilinx XC5VLX330 FPGA上实现了两种256位宽、基为216的模乘阵列结构。实验结果表明,两种结构具有84个时钟周期的延迟,吞吐率分别为1/17和1/21,与相关结构相比吞吐率更高。两种结构在性能和实现代价间能够达到合理平衡。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号