
1.

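刘斌 赵银亮 韩博 李玉祥 吉烁 冯博琴 武万杰 《电子与信息学报》2014,36(11):2768-2774

Thread-level speculation (TLS) is a thread-level automatic parallelization technique for accelerating sequential programs on multicore processors. Loops have a regular structure and account for a large share of execution time, which makes them ideal candidates for exploiting parallelism. However, deciding which loops to parallelize so that the program actually speeds up is difficult. To address this problem, the paper proposes a loop selection method based on performance prediction. Profiling information from a pre-execution of the program on a training input set is combined with various speculation factors to build a performance prediction model for loop structures. The prediction quantifies the speedup of speculatively parallelizing a loop and determines whether that loop is suitable for parallel execution at run time. Experimental results show that the method effectively predicts the parallelism available in loops and, based on the predictions, accurately selects the loops that benefit from speculative parallelization; speedup on the Olden benchmark suite improves by 12.34% on average.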

2.

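A Parallelization Method for Self-Dependent Loops in CFD Programs

傅游 花嵘 丁晓宁 康继昌 《微电子学与计算机》2003,20(4):23-25

For the self-dependent loop structures common in CFD programs, the paper analyzes why wavefront parallelization cannot be applied to them directly. Based on the nature of the dependences, it proposes a mirror decomposition technique for self-dependent loops: by eliminating the cross-iteration anti-dependences, it enables wavefront parallelism for these loops and thereby parallelizes them.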

3.

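Parallelization of the HEVC Intra-Prediction Angular Modes on BWDSP

汪辉 段苓丽 郎文辉 佘成龙 《电视技术》2018,(3):33-39

Based on the characteristics of the HEVC intra-prediction angular mode algorithm, the paper proposes a parallel implementation of the angular prediction modes. Using the BWDSP1041 simulation platform, the method analyzes the parallelizability of the angular mode algorithm, proposes a data allocation scheme suited to parallel computation on multiple multipliers, and, taking the processor's hardware resources into account, designs a program in which multiple arithmetic units work in parallel. Experimental results show that, with the hardware resources of the BWDSP1041 exploited in parallel, angular prediction mode 20 and vertical mode 26 reach parallel speedups of 161.68 and 344.65, respectively. The parallel algorithm reduces video encoding time, and its data allocation scheme offers a useful reference for parallelizing intra-prediction algorithms on multicore and multi-arithmetic-unit architectures.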

4.

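Analysis and Prospects of Parallel Database Technology

《信息通信》2016,(12)

A parallel database system is a new generation of high-performance database system built on massively parallel processors (MPP) and cluster parallel computing environments. The technology originated in database machine research in the 1970s, which concentrated on parallelizing relational algebra operations and designing dedicated hardware for relational operations; that line of research ultimately failed. Since the 1990s, with advances in processors, storage, networking, and other underlying technologies, the focus of parallel database research has shifted to temporal and spatial parallelism in data operations. By using multiple CPUs and disks in parallel, operations such as loading data, building indexes, and executing queries are parallelized to improve database system performance. The two key ingredients are parallelism and distribution.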

5.

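Parallelizing Restructuring of Multiway Control Transfers

范植华 范路 《电子学报》1999,27(8):120-122

Multiway control transfers, as expressed by statements such as CASE in PASCAL, SWITCH in C, and the computed GOTO in FORTRAN, are among the most complex control structures in programming languages. By themselves, or in combination with unconditional GOTOs, they have so far been excluded from parallelism detection both in China and abroad; that is, they are unconditionally kept sequential, which forfeits the considerable parallel potential of the hardware. Through parallelizing restructuring, this paper equivalently eliminates the various forms of multi-valued logic and then performs parallelism analysis on them, extracting all of the latent parallelism hidden within.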

6.

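Parallel Implementation of a Character Segmentation Algorithm Based on System Generator

亓静 刘萍 《现代电子技术》2009,32(14):10-13

Exploiting the parallelism of FPGAs, the paper proposes a structure suited to parallel hardware implementation for character segmentation, an important component of license plate recognition systems, and builds and optimizes the model in System Generator. The parallel operation is divided into two phases: in the first, the upper and lower character boundaries are found by loop iteration; in the second, the enable control for the upper and lower boundary positions and the control signal for the character segmentation positions act on the data path in parallel to produce the valid pixels. Hardware simulation results meet the timing requirements and confirm the feasibility of the structure. Thanks to the parallel logic, the implementation is much faster, demonstrating the strong performance advantage of FPGA parallelism.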

7.

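Parallelizing Restructuring of Multiway Control Transfers

范植华 范路 《电子学报》1999,27(8):120-122

Multiway control transfers, as expressed by statements such as CASE in PASCAL, SWITCH in C, and the computed GOTO in FORTRAN, are among the most complex control structures in programming languages. By themselves, or in combination with unconditional GOTOs, they have so far been excluded from parallelism detection both in China and abroad; that is, they are unconditionally kept sequential, which forfeits the considerable parallel potential of the hardware. Through parallelizing restructuring, this paper equivalently eliminates the various forms of multi-valued logic and then performs parallelism analysis on them, extracting all of the latent parallelism hidden within.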

8.

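Application Programs, Special-Purpose Programs, Utility Programs

《电子科技文摘》1999,(4)

9905156 Application of dependence distance to the parallelizing restructuring of loop statements [journal article] / 周鹏 // 计算机工程与应用. 1998, 34(8): 54-56 (C)

Let L be a sequentially executed DO loop containing assignment statements or IF-THEN-ELSE conditional statements. By analyzing data dependences and computing dependence distances, the parallelism inherent in L can be extracted, and L can be transformed completely or partially into a DOALL loop.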
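To illustrate the role of the dependence distance, here is a minimal Python sketch. It is not taken from the cited article: the loop body `a[i] = a[i - d] + 1`, the constant distance `d`, and the function names are all assumptions chosen for illustration. Because iteration `i` depends only on iteration `i - d`, any `d` consecutive iterations are mutually independent, so the loop can be partially converted into a sequential outer loop over blocks with a DOALL-style inner loop.

```python
def run_sequential(a, d):
    """Reference DO loop: a[i] = a[i - d] + 1 for i = d .. len(a) - 1."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + 1
    return a


def run_blocked(a, d):
    """Partial DOALL form: blocks of d iterations, order-independent inside a block."""
    a = list(a)
    n = len(a)
    for start in range(d, n, d):  # outer loop over blocks stays sequential
        block = range(start, min(start + d, n))
        # Iterations inside a block only read elements written by earlier
        # blocks (or initial values), so any order is valid. Reversed order
        # is used here purely to make that independence visible.
        for i in reversed(block):
            a[i] = a[i - d] + 1
    return a


if __name__ == "__main__":
    data = [0] * 10
    assert run_blocked(data, 3) == run_sequential(data, 3)
```

In a real parallel runtime the inner loop would be distributed across workers; the sketch only demonstrates that reordering inside a block does not change the result.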

9.

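Parallel Design of the HEVC Intra-Prediction Planar and DC Mode Algorithms

谢晓燕 徐卫芳 《电视技术》2015,39(5)

Based on the characteristics of the HEVC intra-prediction Planar and DC mode algorithms, the paper proposes a parallel implementation of these two modes. The method analyzes and derives the parallelizability between the Planar and DC mode algorithms and builds on PAAG (Polymorphic Array Architecture for Graphics and Image Processing), an array processor for graphics and image applications designed at Xi'an University of Posts and Telecommunications. Using an optimal data allocation scheme, it designs a program in which multiple processing elements work in parallel. Experimental results show that the parallel implementations of the Planar and DC prediction modes on multiple processing elements are 84% and 81% faster, respectively, than serial single-core execution, with serial-to-parallel speedups of 6.34 and 5.44. The parallel algorithm reduces video encoding and decoding time, and its data allocation scheme offers a useful reference for parallelizing intra-prediction algorithms on multicore architectures.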
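As a rough consistency check on the two sets of figures (an interpretation, not something the abstract states): if the percentage improvement is read as the reduction in execution time, then a speedup S corresponds to a reduction of 1 - 1/S.

```latex
\[
1 - \frac{1}{6.34} \approx 0.842, \qquad 1 - \frac{1}{5.44} \approx 0.816
\]
```

Under that reading, the speedups of 6.34 and 5.44 correspond to time reductions of roughly 84% and 82%, close to the reported 84% and 81%.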

10.

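Applications of Computers in Electronics

《中国无线电电子学文摘》2001,(6)

TP31 01061771 Mining parallelism in software processes / 李彤, 王黎霞 (Yunnan University) // 计算机应用与软件. 2001, 18(5): 27-31

Mining the parallelism in a software process, so that its activities proceed in parallel as far as possible, is an important means of improving software productivity. The paper proposes a technique that analyzes the dependences between activities to find the parallelizable elements of a software process, extracts the activities that can proceed in parallel, and then constructs a parallelized software process model represented as a Petri net; it achieves fairly good parallelism mining results. 5 figures, 11 references. (午)

… and behavior, the three aspects used to characterize components. JB/5 ADL makes it convenient to construct, refine, and verify software architectures, supports rapid prototype generation, and also supports automatic generation of code skeletons and of the system architecture …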

11.

Locality‐Conscious Nested‐Loops Parallelization

Saeed Parsa Mohammad Hamze 《ETRI Journal》2014,36(1):124-133

To speed up data‐intensive programs, two complementary techniques, namely nested loops parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse grain parallelism through exploiting the largest possible groups of outer permutable loops in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.

12.

Compact Code Generation for Tightly-Coupled Processor Arrays

Srinivas Boppu Frank Hannig Jürgen Teich 《Journal of Signal Processing Systems》2014,77(1-2):5-29

In this paper, we consider programmable tightly-coupled processor arrays consisting of interconnected small light-weight VLIW cores, which can exploit both loop-level parallelism and instruction-level parallelism. These arrays are well suited for compute-intensive nested loop applications, often providing higher power and area efficiency than commercial off-the-shelf processors. They are ideal candidates for accelerating the computation of nested loop programs in future heterogeneous systems, where energy efficiency is one of the most important design goals for overall system-on-chip design. In this context, we present a novel design methodology for the mapping of nested loop programs onto such processor arrays. Key features of our approach are: (1) design entry in the form of a functional programming language and loop parallelization in the polyhedron model, and (2) support of zero-overhead looping not only for innermost loops but also for arbitrarily nested loops. Processors of such arrays are often limited in instruction memory size to reduce area and power consumption. Hence, (3) we present methods for code compaction and code generation, and integrate these methods into a design tool. Finally, (4) we evaluate selected benchmarks by comparing our code generator with the Trimaran and VEX compiler frameworks. As the results show, our approach can reduce the size of the generated processor code by up to 64% (Trimaran) and 55% (VEX) while at the same time achieving significantly higher throughput.

13.

Loop Transforming for Reducing Data Alignment on Multi-Core SIMD Processors

Yi Wang Linfeng Pan Zili Shao Yong Guan Minyi Guo 《Journal of Signal Processing Systems》2014,74(2):137-150

Multimedia SIMD extensions are commonly employed today to speed up media processing. When performing vectorization for SIMD architectures, one of the major issues is handling memory alignment. Prior studies focused on either vectorizing loops in which all memory references are properly aligned, or introducing extra operations to deal with misaligned memory references. On the other hand, multi-core SIMD architectures require coarse-grain parallelism. Therefore, it is important to study how to parallelize and vectorize loop nests with awareness of data misalignment. This paper presents a loop transformation scheme that maximizes the parallelism of outermost loops while reducing the misaligned memory references in innermost loops. The basic idea of our technique is to align each level of loops in the nest, considering the constraint of dependence relations. To reduce data misalignments, we establish a mathematical model with a concept of offset-collection and propose an effective heuristic algorithm. For coarser-grain parallelism, we propose some rules to analyze the outermost loop. When transformations are applied, the inner loops are involved to maximize the parallelism. To avoid introducing more data misalignments, the involved innermost loop is handled from other levels of loops. Experimental results show that misaligned memory references can be reduced by 7% to 37% (18.4% on average). Simulations on CELL show that a 1.1x speedup can be reached by reducing the misaligned data, while a 6.14x speedup can be achieved by enhancing the parallelism for multi-core.

14.

Parametric Analysis of Polyhedral Iteration Spaces

Philippe Clauss Vincent Loechner 《The Journal of VLSI Signal Processing》1998,19(2):179-194

In the area of automatic parallelization of programs, analyzing and transforming loop nests with parametric affine loop bounds requires fundamental mathematical results. The most common geometrical model of iteration spaces, called the polytope model, is based on mathematics dealing with convex and discrete geometry, linear programming, combinatorics and geometry of numbers. In this paper, we present automatic methods for computing the parametric vertices and the Ehrhart polynomial, i.e., a parametric expression of the number of integer points, of a polytope defined by a set of parametric linear constraints. These methods have many applications in analysis and transformations of nested loop programs. The paper is illustrated with exact symbolic array dataflow analysis, estimation of execution time, and with the computation of the maximum available parallelism of given loop nests.
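As a small illustration of what a parametric count of integer points looks like (a sketch, not the authors' method), consider the triangular iteration space of a loop nest with `0 <= j <= i <= n`. Its Ehrhart polynomial is (n + 1)(n + 2) / 2, which the Python code below compares against brute-force enumeration. The function names are illustrative.

```python
def count_iterations(n):
    """Enumerate the integer points (i, j) with 0 <= j <= i <= n."""
    return sum(1 for i in range(n + 1) for j in range(i + 1))


def ehrhart_triangle(n):
    """Closed-form point count for the same triangle, as a polynomial in n."""
    return (n + 1) * (n + 2) // 2


if __name__ == "__main__":
    # The closed form agrees with enumeration for every tested parameter value.
    assert all(count_iterations(n) == ehrhart_triangle(n) for n in range(20))
```

A closed form of this kind gives the iteration count of a loop nest for any value of the parameter without running the loops, which is what makes it useful for estimating execution time.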

15.

Outer Loop Pipelining for Application Specific Datapaths in FPGAs

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(10):1268-1280

Most hardware compilers apply loop pipelining to increase the parallelism achieved, but pipelining is restricted to only the innermost level in a nested loop. In this work we extend and adapt an existing outer loop pipelining approach known as single dimension software pipelining to generate schedules for field-programmable gate-array (FPGA) hardware coprocessors. Each loop level in nine test loops is pipelined and the resulting schedules are implemented in VHDL and targeted to an Altera Stratix II FPGA. The results show that the fastest solution for all but one of the loops occurs when pipelining is applied one to three levels above the innermost loop. Across the nine test loops we achieve an acceleration over the innermost loop solution of up to seven times, with a mean speedup of 3.2 times. The results suggest that inclusion of outer loop pipelining in future hardware compilers may be worthwhile as it can allow significantly improved results to be achieved at the cost of a small increase in compile time.

16.

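Loop Skewing Revisited

金国华 陈福接 《电子学报》1994,22(5):25-31

This paper revisits the loop skewing transformation proposed by Wolfe in 1986. By introducing the concepts of the dependence distance matrix and the dependence direction matrix, it gives a general method for skewing perfectly nested multi-level loops. It then analyzes the effect of loop skewing on parallelism and data locality, and finally discusses its relationship to other transformation techniques.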
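The effect of skewing on parallelism can be seen in a minimal Python sketch. It is not the general method of the paper: the recurrence `a[i][j] = a[i-1][j] + a[i][j-1]`, with dependence distance vectors (1, 0) and (0, 1), and the function names are assumptions chosen for illustration. Skewing the inner index by the outer one (t = i + j) and then interchanging the loops yields a sequential loop over wavefronts t whose inner iterations are mutually independent.

```python
def sequential(n, m):
    """Original loop nest; neither loop is parallel as written."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def skewed(n, m):
    """Skewed and interchanged nest: outer loop over wavefronts t = i + j."""
    a = [[1] * m for _ in range(n)]
    for t in range(2, n + m - 1):        # wavefronts run in order
        i_lo = max(1, t - (m - 1))       # keeps j = t - i <= m - 1
        i_hi = min(n - 1, t - 1)         # keeps j = t - i >= 1
        for i in range(i_lo, i_hi + 1):  # independent: reads only wavefront t - 1
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    assert skewed(4, 5) == sequential(4, 5)
```

Both operands of each update lie on wavefront t - 1, which is why the inner loop of the skewed version could be executed in parallel.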

17.

Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping

Chun Xue Zili Shao Edwin H.-M. Sha 《The Journal of VLSI Signal Processing》2007,47(2):153-167

The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for these nested loops. Most of the existing loop transformation techniques either cannot achieve maximum parallelism, or can achieve maximum parallelism but with complicated loop bounds and loop index calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.

18.

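Tree-Graph Construction of LDPC Codes

张焕明 叶梧 冯穗力 《电讯技术》2007,47(4):166-168

LDPC codes are decoded with the belief propagation (BP) algorithm, but the presence of cycles causes the decoder to iterate over repeated information, and short cycles in particular degrade the performance of LDPC codes. To address this, the paper uses a tree-graph method to analyze the cycles of LDPC codes and their properties, and gives a method for finding the length of a cycle and the nodes it passes through, which is well suited to computer implementation. The tree-graph method is also used to construct LDPC codes, so that the number and lengths of cycles can be tracked while the tree is being generated.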

19.

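A Cycle-Elimination Algorithm for Constructing Low-Density Parity-Check Codes

谭星 宋文涛 张海滨 《电信快报》2005,(10):34-36

The performance of low-density parity-check (LDPC) codes depends on several factors, including the degree distribution pair, the code length, and the distribution of cycles. Cycles affect the decoding threshold and the error floor of LDPC codes, and short cycles in particular have a large impact on performance. It is therefore necessary to eliminate short cycles when constructing LDPC codes. The paper presents an effective cycle-elimination algorithm that lowers the error floor of LDPC codes.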
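Detecting the shortest possible cycles is the first step of any such construction. The Python sketch below is not the algorithm of the cited paper; it relies on the standard fact that the Tanner graph of a parity-check matrix H contains a cycle of length 4 exactly when two rows of H have ones in at least two common columns. The matrices and the function name are illustrative.

```python
from itertools import combinations


def has_four_cycle(H):
    """Return True if the Tanner graph of the binary matrix H has a 4-cycle."""
    # Column positions of the ones in each row (each row is a check node).
    supports = [{c for c, bit in enumerate(row) if bit} for row in H]
    # Two checks sharing two or more variable nodes close a cycle of length 4.
    return any(len(r1 & r2) >= 2 for r1, r2 in combinations(supports, 2))


if __name__ == "__main__":
    with_cycle = [[1, 1, 0],
                  [1, 1, 1]]
    without_cycle = [[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [1, 0, 0, 1]]
    assert has_four_cycle(with_cycle)
    assert not has_four_cycle(without_cycle)
```

A construction procedure could call such a check after each candidate edge placement and reject placements that create a 4-cycle; longer cycles need a graph search instead of pairwise row comparison.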